RAID Storage
RAID stands for Redundant Array of Inexpensive (or Independent) Disks. RAID takes two or more disks and configures them to work together as a single unit, providing higher performance, higher availability of data, or both. RAID is usually implemented through the disk controller, so any disks that are compatible with your controller can be RAIDed together.
Levels
RAID 0: Striping
Not a part of the original specification, this configuration stripes data across two or more disks, allowing more data to be written or read at the same time, as well as reducing seek latency on the disk. No additional availability of data is provided. Should one drive fail, all data on the array will be lost.
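The striping described above can be sketched as a simple round-robin mapping. This is a minimal illustration, assuming one block per stripe unit; the function name locate_block and the two-disk example are made up for the sketch and do not come from any controller's firmware.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk)."""
    if num_disks < 2:
        raise ValueError("RAID 0 needs at least two disks")
    disk = logical_block % num_disks      # round-robin across the disks
    offset = logical_block // num_disks   # row (stripe) number on that disk
    return disk, offset


# Eight consecutive logical blocks on a hypothetical 2-disk array.
for block in range(8):
    disk, offset = locate_block(block, 2)
    print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Consecutive blocks land on different disks, which is why a large sequential transfer can keep every disk busy at once, and also why losing any one disk leaves holes throughout the data.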
RAID 1: Mirroring
RAID level 1 mirrors data between disks, meaning a copy of the same data is stored in the same place on two different disks. This provides the highest availability of data - one drive can fail completely and the RAID set will continue to function exactly as before the failure.
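As a toy model of the mirroring described above, the sketch below treats each disk as a dictionary of block number to data. The class name MirrorSet and its methods are hypothetical, chosen only for illustration.

```python
class MirrorSet:
    """Toy RAID 1 set: each 'disk' is a dict of block number -> data."""

    def __init__(self, num_disks=2):
        self.disks = [{} for _ in range(num_disks)]
        self.failed = set()

    def write(self, block, data):
        # Every working disk receives the same data at the same block number.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def fail(self, index):
        # Simulate a complete drive failure.
        self.failed.add(index)
        self.disks[index].clear()

    def read(self, block):
        # Any surviving disk can answer the read.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block]
        raise IOError("all disks in the mirror have failed")


mirror = MirrorSet()
mirror.write(0, b"payroll")
mirror.fail(0)               # lose the first drive completely
print(mirror.read(0))        # the second drive still holds the data
```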
RAID 2: Bit Level Striping with ECC
This isn't used often, as it's not implemented on most RAID controllers. Data is split at the bit level, with Hamming ECC information written to additional disks (10 data and 4 ECC, or 32 data and 7 ECC). It's weird; don't think too much about it.
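The disk counts quoted above can be related to how many check bits a Hamming code needs. A minimal sketch, assuming a plain single-error-correcting Hamming code, where r check bits can protect m data bits when 2^r >= m + r + 1. The function name is made up for illustration.

```python
def hamming_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


for data_disks in (10, 32):
    print(data_disks, "data disks need at least",
          hamming_check_bits(data_disks), "check disks")
```

Under this assumption, 10 data disks need 4 check disks, which matches the first figure. For 32 data disks the bound gives 6; the quoted 7 would be consistent with one extra overall parity bit for double-error detection, although the text does not say which code is meant.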
RAID 3: Byte Level Striping with Dedicated Parity
This setup requires at least 3 drives. Data is stored on all but one of the disks in the array; the last disk holds all the parity information. One drive can be lost and the system will continue to function. This RAID level performs best with large files. Write performance suffers due to the cost of calculating the parity information and the bottleneck of writing to one parity disk.
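The parity idea can be sketched in a few lines, assuming the usual XOR parity (the text does not name the parity function, so treat that as an assumption). The helper name xor_blocks and the sample data are illustrative only.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks and one dedicated parity disk.
data = [b"RAID", b"disk", b"data"]
parity = xor_blocks(data)

# Lose the second data disk, then rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

Because XOR is its own inverse, the same routine both computes the parity and reconstructs a missing disk. It also shows why every write has to touch the parity disk.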
RAID 4: Block Level Striping with Dedicated Parity
Much like RAID level 3, RAID 4 requires a dedicated parity disk, and the bottleneck of writing to that disk is similar. Read and write performance can be better than RAID 3, as data is read and written at the block level. The block (stripe) size can be adjusted to improve performance, depending on the data being manipulated. The downside is that parity is also calculated at the block level, which adds write overhead and, because the controller has to handle more data, increases cost. Stick with RAID 3 or 5.
RAID 5: Block Level Striping with Distributed Parity
Similar to RAID 4, parity is calculated on a block level, but the parity is rotated through the set, reducing the bottleneck imposed by a dedicated parity drive.
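To make the rotation concrete, the sketch below prints which disk holds parity in each stripe. It assumes one particular rotation (parity starts on the last disk and moves one disk to the left per stripe); real controllers may use other layouts, and the function name is hypothetical.

```python
def raid5_stripe(stripe, num_disks):
    """Return the labels stored on each disk for one stripe."""
    parity_disk = (num_disks - 1 - stripe) % num_disks
    data_index = stripe * (num_disks - 1)   # first data block in this stripe
    row = []
    for disk in range(num_disks):
        if disk == parity_disk:
            row.append("P%d" % stripe)
        else:
            row.append("D%d" % data_index)
            data_index += 1
    return row


# Four stripes on a hypothetical 4-disk array.
for stripe in range(4):
    print("  ".join(label.ljust(3) for label in raid5_stripe(stripe, 4)))
```

Over any four consecutive stripes each disk holds parity exactly once, so parity writes are spread evenly instead of queuing on a single drive.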
RAID 6: Block Level Striping with Dual Distributed Parity
Just like RAID 5, but two independent parity blocks are stored for each stripe, distributed across the drives. This reduces the amount of data the array can hold and increases the overhead of calculating the parity, but the array can withstand 2 drives failing. Performance is highly degraded while the array is rebuilding. RAID 6 becomes a better option than RAID 5 when the array will contain numerous drives (as calculated from the MTBF and the time needed to rebuild the array after a disk failure).
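The MTBF argument can be roughed out numerically. The sketch below is a back-of-the-envelope estimate, assuming drives fail independently with a constant failure rate (an exponential model); the MTBF and rebuild time used are hypothetical placeholders, not measurements.

```python
import math


def second_failure_probability(num_drives, mtbf_hours, rebuild_hours):
    """Chance that another drive fails while one failed drive is rebuilding."""
    remaining = num_drives - 1
    return 1.0 - math.exp(-remaining * rebuild_hours / mtbf_hours)


# Hypothetical figures: 500,000 hour MTBF and a 24 hour rebuild.
for drives in (4, 12, 24):
    risk = second_failure_probability(drives, 500_000, 24)
    print(f"{drives} drives: {risk:.4%} chance of a second failure during rebuild")
```

In a RAID 5 array a second failure during the rebuild loses the data, while RAID 6 survives it. The risk grows with both the drive count and the rebuild time, which is the trade-off the paragraph above describes.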
Nested RAID Levels
While any combination of RAID levels could be achieved through a combination of hardware and software RAID, the most common levels are 01, 10, 50 and 60.
RAID 01: Mirror of Stripes
Data is first striped across the disks of a RAID 0 array, then mirrored from that array onto a second, identical RAID 0 array:
    --RAID 1--
-RAID 0-  -RAID 0-
 D1  D2    D3  D4
 A1  A2    A1  A2
 B1  B2    B1  B2
 ... ...   ... ...
This drive combination is only guaranteed to be fault tolerant for one disk. A single failure takes its whole RAID 0 set offline, so a second failure in that same set does no further harm, but if any disk in the other RAID 0 set fails, the data becomes unrecoverable.
RAID 10: Striped Mirroring
Data is mirrored on sub-arrays and striped across all arrays:
    -------RAID 0-------
-RAID 1-  -RAID 1-  -RAID 1-
 D1  D2    D3  D4    D5  D6
 A1  A1    A2  A2    A3  A3
 B1  B1    B2  B2    B3  B3
 ... ...   ... ...   ... ...
This drive combination is fault tolerant for one disk in each RAID 1 array. The fault tolerance of this set is high, and its performance is better than that of RAID 5 because no parity calculation is needed.
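The "one disk in each RAID 1 array" rule can be checked by brute force. The sketch below enumerates every combination of failed disks for the six-disk layout in the diagram above, using only the rule that the array survives while each mirror keeps at least one working disk. The names are illustrative.

```python
from itertools import combinations

# The three mirrored pairs from the diagram above.
MIRRORS = [("D1", "D2"), ("D3", "D4"), ("D5", "D6")]
ALL_DISKS = [disk for pair in MIRRORS for disk in pair]


def raid10_survives(failed_disks):
    """True if every mirror still has at least one working disk."""
    failed = set(failed_disks)
    return all(set(pair) - failed for pair in MIRRORS)


def count_survivable(num_failures):
    """Return (surviving combinations, total combinations)."""
    cases = list(combinations(ALL_DISKS, num_failures))
    ok = sum(1 for case in cases if raid10_survives(case))
    return ok, len(cases)


for n in (1, 2, 3):
    ok, total = count_survivable(n)
    print(f"{n} failed disk(s): {ok} of {total} combinations survive")
```

Any single failure is survivable, and so are most double failures; the array is lost only when both halves of the same mirror fail.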
RAID 50 & 60: Striped RAID 5 and 6
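RAID 50 (60) consists of multiple RAID 5 (RAID 6) arrays striped together. These levels offer increased performance and fault tolerance over a single RAID 5 (6) array built from the same disks, at the expense of usable disk space, since every sub-array gives up one (two) disks' worth of capacity to parity.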
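The space cost of striping several parity arrays together is simple arithmetic. A minimal sketch, assuming equal-sized disks and identical sub-arrays; the function name and the 12-disk example are hypothetical.

```python
def striped_parity_capacity(num_groups, disks_per_group, parity_per_group, disk_size_tb):
    """Usable capacity of num_groups parity arrays striped together."""
    data_disks = disks_per_group - parity_per_group
    if data_disks < 1:
        raise ValueError("each group needs at least one data disk")
    return num_groups * data_disks * disk_size_tb


# Twelve hypothetical 1 TB disks arranged three ways.
print("RAID 5,  1 x 12:", striped_parity_capacity(1, 12, 1, 1), "TB")
print("RAID 50, 2 x 6: ", striped_parity_capacity(2, 6, 1, 1), "TB")
print("RAID 60, 2 x 6: ", striped_parity_capacity(2, 6, 2, 1), "TB")
```

Each additional sub-array costs another one (RAID 5) or two (RAID 6) disks of parity, in exchange for tolerating a failure in every sub-array at once.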
RAID 100, 500 & 600: Striped, Striped Mirroring
A striped array of mirrors (or of parity arrays, in the case of 500 and 600) is itself striped with other identically sized arrays. Typically the top layer of striping is implemented in software rather than hardware.
RAID Layer
Software
Software RAID is implemented by the operating system (e.g. md in Linux and Dynamic Disks in Windows). As such, if the operating system becomes corrupt or a drive fails, recovery takes longer: the drive setup has to be performed again, using identical options.
Hardware
This is the usual layer at which RAID is implemented. By using hardware, the task of calculating parity is offloaded to an optimized coprocessor on the RAID card. RAID cards usually have more than one "back-plane", splitting the physical drives logically so that data is written and read with more available bandwidth, thereby increasing performance.
RAID and Disk Technology
As RAID is a logical implementation, separate from the physical media, any drive type can be used; this even includes iPods. ATA (including ATAPI and SATA) and SCSI drives are most commonly used.
ATA Drives and RAID
ATA drives are very inexpensive, and newer motherboards typically come with support for RAID 0, 1, 10 (at times 01) and 5.
SCSI Drives and RAID
The advantage of using SCSI (and particularly Serial Attached SCSI) drives over ATA is that the SCSI command set is much more robust and geared towards enterprise applications such as RAID. One example of this is spindle-locking (spindle synchronization), where the drives rotate in step so that parallel stripes are simultaneously under each drive's head. SCSI drives also typically have a higher spindle speed and therefore lower latency and higher bandwidth. SCSI drives are usually attached via a dedicated RAID card rather than through support on the motherboard. This eliminates the need for parity calculations on the CPU and allows you to select a card with features that fit your specific needs.
Copyright (C) 2007 John Zak