Preface

What is RAID? To borrow a lazy quote from Wikipedia: RAID stands for Redundant Array of Independent Disks, previously known as Redundant Array of Inexpensive Disks, or simply a disk array. The basic idea is to combine a number of relatively cheap hard disks into one array so that its performance matches or even exceeds that of a single expensive, large-capacity hard disk. Depending on the level you choose, RAID offers one or more of the following advantages over a single hard disk: increased data integration, improved fault tolerance, and increased throughput or capacity. In addition, the array looks like a single hard disk or logical storage unit to the computer. This article is not just an introduction to the concepts and usage, though. More importantly, it looks at how to choose a reasonable RAID configuration and tune its parameters for different business scenarios. For SSDs, I draw on the experimental data of Xiaomi's operations team. I also believe that distributed storage will gradually mature: OpenStack, VSAN, and Nutanix, which represent the software-defined and hyperconverged approach, have already begun to compete.

The essence of RAID is to balance availability against cost


Update history

March 11, 2016 – First draft

Read the original – http://wsgzao.github.io/post/raid/

Further reading

RAID – https://zh.wikipedia.org/wiki/RAID

RAID technology development review – http://blog.csdn.net/liuaigui/article/details/4581970

SSD array card scheme optimization: Consider RAID 50 instead of RAID 10 – http://noops.me/?p=1805


RAID Basics

Thanks to @Liu Aigui; see the further reading for more details.

Basic principles

RAID stands for Redundant Array of Independent Disks. In simple terms, RAID combines multiple independent high-performance disk drives into one disk subsystem that offers higher storage performance and data redundancy than a single disk. RAID is a class of multi-disk management technology that gives the host environment high-performance, highly reliable storage at low cost. SNIA defines RAID as a disk array in which part of the physical storage space is used to record redundant information about the user data stored in the remaining space. When one of the disks or access paths fails, the redundant information can be used to reconstruct the user data. Disk striping is often called RAID as well (namely RAID0), even though it does not match this definition.

RAID was designed to provide high-end storage capabilities and redundant data security for large servers. Within the overall system, RAID is treated as a single storage space made up of two or more disks, and it improves the I/O performance of the storage system by reading and writing data on multiple disks concurrently. Most RAID levels also have robust parity and error-correction measures, or even mirroring, which increase fault tolerance and greatly enhance system reliability. This is where the word Redundant comes from.

A quick mention here of JBOD (Just a Bunch Of Disks). Originally, JBOD referred to a collection of disks with no control software to coordinate them, which is the main factor that distinguishes it from RAID. Today JBOD often refers to a disk enclosure, whether or not it provides RAID functionality.

The two key goals of RAID are to improve data reliability and I/O performance. In a disk array, data is spread across multiple disks, but to the computer system it looks like a single disk. Redundancy is achieved either by writing the same data to multiple disks at the same time (typically mirroring) or by writing computed parity data to the array, so that no data is lost when a single disk fails. Some RAID levels tolerate more simultaneous disk failures, such as RAID6, where two disks can fail at the same time. With this redundancy mechanism, a failed disk can be replaced with a new one, and the RAID automatically rebuilds the lost data from the data and parity on the remaining disks, ensuring data consistency and integrity. Because data is scattered across multiple disks, concurrent reads and writes perform much better than on a single disk, giving higher aggregate I/O bandwidth. Of course, a disk array reduces the total usable storage space of its disks, sacrificing space for greater reliability and performance. For example, RAID1 has a storage utilization of only 50%, and RAID5 gives up the capacity of one disk, for a space utilization of (n-1)/n.

A disk array can keep the system running without interruption even if some of its disks (one or several, depending on the implementation) are damaged. While the data of a failed disk is being rebuilt onto a new disk, the system continues to function normally, although with some performance degradation. Some disk arrays must be stopped when a disk is added or removed, while others support hot swapping, which allows disk drives to be replaced without stopping. Such high-end disk arrays are intended for applications that require high availability, with no downtime or as little downtime as possible. In general, RAID is not an alternative to data backup, and it cannot protect against data loss that is not caused by disk failure, such as viruses, vandalism, or accidental deletion. In those cases the data is lost only from the point of view of the operating system, file system, volume manager, or application. As far as the RAID system is concerned, the data is intact and nothing was lost. Data backup, disaster recovery, and other data protection measures are therefore still necessary. They complement RAID and protect data at different levels to prevent data loss.

There are three key concepts and techniques in RAID: mirroring, data striping, and data parity. Mirroring copies data to multiple disks, which improves reliability and also improves read performance, because data can be read concurrently from two or more copies. Naturally, mirroring has somewhat lower write performance, since it takes more time to make sure the data is written correctly to multiple disks. Data striping splits data into pieces that are stored on multiple different disks. The pieces together make up one complete copy of the data, as opposed to the multiple copies of a mirror, and the technique is usually chosen for performance reasons. Striping allows finer-grained concurrency: when data is accessed, different disks can be read and written at the same time, resulting in significant I/O performance gains. Data parity uses redundant data to detect and repair data errors. The redundant data is usually computed with Hamming codes, XOR operations, or similar algorithms. Parity can greatly improve the reliability, robustness, and fault tolerance of the disk array. However, it needs to read data from multiple places for calculation and comparison, which affects system performance. Different RAID levels use one or more of these three techniques to achieve different levels of data reliability, availability, and I/O performance. Which RAID to design (even a new level or type) or which type to use should be decided from a deep understanding of the system's requirements, trading off reliability, performance, and cost.

The idea of RAID has been widely accepted by the industry since it was proposed, and the storage industry has invested a great deal of time and money in researching and developing related products. With the continuous development of processors, memory, computer interfaces, and other technologies, RAID has kept evolving and is now widely used in computer storage, gradually extending from high-end systems to common low-end ones. RAID is this popular because its characteristics and advantages meet most data storage needs. In general, the main advantages of RAID are as follows:

(1) Large capacity. This is an obvious advantage of RAID: it expands disk capacity, and a RAID system made up of multiple disks has a large amount of storage space. Now that a single disk can hold more than a terabyte, RAID storage capacity can reach petabytes, which satisfies most storage requirements. In general, the usable capacity of a RAID is less than the total capacity of all member disks. Each RAID level needs some redundancy overhead, and the exact overhead depends on the algorithm used. If you know the RAID algorithm and the raw capacity, you can calculate the usable capacity. Typically, RAID capacity utilization is between 50% and 90%.

(2) High performance. The high performance of RAID comes from data striping. The I/O performance of a single disk is limited by its interface, bandwidth, and other hardware factors, so it easily becomes the bottleneck of system performance. Through data striping, RAID spreads data I/O across member disks, so aggregate I/O performance is a multiple of that of a single disk.

(3) Reliability. Availability and reliability are another important feature of RAID. In theory, a RAID system made up of multiple disks should be less reliable than a single disk, but that reasoning rests on the implicit assumption that a single disk failure renders the entire RAID unusable. RAID uses data redundancy techniques such as mirroring and data parity to break this assumption. Mirroring is the most basic redundancy technique: the data on one set of disk drives is copied completely to another set, so that a copy of the data is always available. Data parity has far less overhead than the 50% redundancy of mirroring, and it uses the redundant information to check and correct data. RAID redundancy greatly improves the availability and reliability of data, ensuring that a certain number of disk errors neither lose data nor interrupt the operation of the system.

(4) Manageability. In essence, RAID is a virtualization technology that presents multiple physical disk drives as a single high-capacity logical drive. To an external host system, RAID is a single, fast, reliable, high-capacity disk drive on which users can organize and store application data. From the application's point of view, this makes the storage system simple to use and convenient to manage. Because RAID does a lot of storage management work internally, administrators only need to manage a single virtual drive, which saves a lot of administrative work. RAID can add and remove disk drives dynamically, and parity checking and data reconstruction happen automatically, which greatly simplifies administration.

Key technologies

Mirroring

Mirroring is a redundancy technique that protects against disk failure and data loss. In RAID, mirroring typically keeps two identical copies of the data in the array, placed on two different groups of disk drives. Mirroring provides complete data redundancy. When one copy of the data becomes unavailable, the external system can still access the other copy without affecting the operation or performance of the application. Mirroring also needs no additional computation or parity, and fault repair is very fast because the data is simply copied. Mirroring can read data concurrently from multiple replicas, providing higher read I/O performance, but writes gain nothing from this: every replica must be written, which causes some I/O performance degradation.

Mirroring provides very high data security but is very expensive, requiring at least double the storage space. The high cost limits its wide use, so it is mainly applied to critical data whose loss would be very costly. In addition, a mirror can be "split" to obtain a snapshot of the data at a specific point in time, which allows data backup with an almost zero backup window.

Data striping

The performance bottleneck of disk storage is head seeking, a slow mechanical motion that cannot keep up with a fast CPU. A single disk drive also has physical performance limits, so its I/O performance is very limited. RAID consists of multiple disks, and data striping distributes data in blocks across them so that the data can be processed concurrently. Reads and writes can then be performed on multiple disks at the same time, producing very high aggregate I/O, effectively improving overall I/O performance, and scaling almost linearly. The benefit is especially noticeable for large volumes of data. Without striping, the data can only be stored sequentially on the disks of the array and read back sequentially when needed. With striping, performance can be several times that of sequential access.

The choice of block size is very important in data striping. The stripe granularity can range from one byte to several kilobytes. The smaller the block, the greater the parallelism and the higher the data access speed, but the randomness of block access and the block addressing time also increase. In practice, the block size should be chosen according to the characteristics and requirements of the data, balancing the randomness of data access against concurrent processing ability to get the best possible overall performance.

Data striping is designed purely to improve I/O performance. It focuses only on performance and does nothing for data reliability or availability. In fact, if any one of the stripes is corrupted, the entire data set becomes unusable, so adopting data striping increases the probability of data loss.
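The increase in risk can be made concrete with a small calculation. The sketch below assumes that every disk fails independently with the same probability p over some period (an illustrative assumption, not a figure from this article). A stripe set with no redundancy then loses data as soon as any one of its n disks fails. The function name is hypothetical.

```python
def stripe_loss_probability(p: float, n: int) -> float:
    """Probability that an n-disk stripe set with no redundancy loses data.

    Assumes each disk fails independently with probability p; the stripe
    set survives only if every single disk survives.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # p = 0.03 is a made-up per-disk failure probability for illustration.
    for n in (1, 2, 4, 8):
        print(n, round(stripe_loss_probability(0.03, n), 4))
```

Under this assumption the loss probability grows with every disk added to the stripe set, which is why striping is normally paired with mirroring or parity.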

Data parity

Mirroring offers high security and high read performance, but its redundancy overhead is too expensive. Data striping improves performance through concurrency but gives no consideration to data security or reliability. Data parity is a redundancy technique that uses parity (check) data to provide data security: it can detect data errors and, where its capability permits, reconstruct the data. Compared with mirroring, parity significantly reduces the redundancy overhead and achieves excellent data integrity and reliability at a small cost. Striping provides high performance and parity provides data security, so RAID levels often combine the two.

When parity is used, RAID computes the parity while data is being written and stores the resulting parity data on the member disks. Parity data can be stored on a single dedicated disk or spread across multiple disks, and it can even be striped itself. Different RAID levels do this differently. When part of the data is damaged, the remaining data and the parity data can be run through the inverse calculation to reconstruct what was lost. Compared with mirroring, the advantage of parity is that it saves a lot of space. However, every read and write involves a large number of parity operations, which places high demands on computing speed and usually calls for a hardware RAID controller. Data reconstruction and recovery are also much more complex and slower with parity than with mirroring.

Hamming codes and XOR parity are the two most commonly used data check algorithms. The Hamming code was proposed by Richard Hamming. It can not only detect errors but also locate them and correct them automatically. The basic idea is to divide the valid information into several groups according to a fixed rule, compute a parity bit for each group, and use the resulting set of checks to locate and correct the faulty bit. A Hamming code is therefore essentially a multiple parity check. XOR parity is produced by the exclusive-or logic operation: XORing a piece of valid data with a given initial value yields the parity information. If the valid data is damaged, XORing the parity information with the initial value restores the correct data.
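The XOR property described above extends to any number of blocks: the parity is the XOR of all data blocks, and any single missing block equals the XOR of the parity with the surviving blocks. The following is a minimal sketch of that idea. The function name and the sample data are made up for illustration.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length byte blocks together, one byte column at a time."""
    if not blocks:
        raise ValueError("need at least one block")
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


if __name__ == "__main__":
    data = [b"disk", b"raid", b"xor!"]   # blocks held by three data disks
    parity = xor_blocks(data)            # block held by the parity disk

    # Pretend the second disk is lost: XOR the survivors with the parity.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])
```

Because XOR is its own inverse, the same function serves for both computing parity and rebuilding a lost block.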

Common RAID Types

A comparison of the five common RAID types, where n is the number of disks. See the further reading for a detailed introduction.

| RAID level | RAID0 | RAID1 | RAID5 | RAID6 | RAID10 |
| --- | --- | --- | --- | --- | --- |
| Alias | Stripe | Mirror | Distributed parity stripe | Double parity stripe | Mirror plus stripe |
| Fault tolerance | No | Yes | Yes | Yes | Yes |
| Redundancy | No | Yes | Yes | Yes | Yes |
| Hot spare | No | Yes | Yes | Yes | Yes |
| Read performance | High | Low | High | High | High |
| Random write performance | High | Low | Medium | Low | Medium |
| Sequential write performance | High | Low | Low | Low | Medium |
| Number of disks required | n ≥ 1 | 2n (n ≥ 1) | n ≥ 3 | n ≥ 4 | 2n (n ≥ 2), so at least 4 |
| Available capacity | All | 50% | (n-1)/n | (n-2)/n | 50% |
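The last two rows of the comparison can be turned into a small helper for capacity planning. This is only a sketch under the assumption that all disks are the same size. The function name and the level strings are illustrative choices, not part of any RAID tool.

```python
def usable_fraction(level: str, n: int) -> float:
    """Fraction of raw capacity left for user data with n equal-sized disks."""
    minimum_disks = {"RAID0": 1, "RAID1": 2, "RAID5": 3, "RAID6": 4, "RAID10": 4}
    if level not in minimum_disks:
        raise ValueError(f"unsupported level: {level}")
    if n < minimum_disks[level]:
        raise ValueError(f"{level} needs at least {minimum_disks[level]} disks")
    if level in ("RAID1", "RAID10") and n % 2:
        raise ValueError(f"{level} needs an even number of disks")

    if level == "RAID0":
        return 1.0              # no redundancy, all space is usable
    if level in ("RAID1", "RAID10"):
        return 0.5              # every block is stored twice
    if level == "RAID5":
        return (n - 1) / n      # one disk's worth of parity
    return (n - 2) / n          # RAID6: two disks' worth of parity


if __name__ == "__main__":
    for level, disks in [("RAID5", 4), ("RAID6", 8), ("RAID10", 8)]:
        print(level, disks, usable_fraction(level, disks))
```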

RAID levels

Standard RAID levels

SNIA, Berkeley, and other organizations define the seven levels RAID0, RAID1, RAID2, RAID3, RAID4, RAID5, and RAID6 as the standard RAID levels, a classification recognized by both industry and academia. The standard levels are the most basic RAID configurations and use data striping, mirroring, and data parity either individually or in combination. Standard levels can also be combined with each other into combination RAID levels, to meet the needs of storage applications with higher performance, security, and reliability requirements.

JBOD

JBOD (Just a Bunch Of Disks) is not a standard RAID level. The term usually refers to a collection of disks with no control software to coordinate them. JBOD chains multiple physical disks together to provide one large logical disk. Data is stored sequentially starting from the first disk, and once the current disk is full, storage continues on the next disk. JBOD storage performance is exactly the same as that of a single disk, and it provides no data protection. It merely offers a way to expand storage space: the usable capacity of a JBOD equals the sum of the capacities of all member disks. Today JBOD often refers to a disk enclosure, whether or not it provides RAID functionality.

RAID0

RAID0 is simple data striping with no data parity. Strictly speaking it is not really a RAID, because it provides no form of redundancy. RAID0 stripes the member disks into one large storage space and spreads the data across all of them, so that multiple disks can be read and written in parallel through independent accesses. Because I/O operations run concurrently, bus bandwidth is fully utilized. Combined with the fact that no parity has to be computed, this gives RAID0 the highest performance of any RAID level. In theory, a RAID0 made up of n disks can perform n times better than a single disk. In practice, bus bandwidth and other constraints keep the actual improvement below the theoretical value.

RAID0 offers low cost, high read and write performance, and 100% storage utilization, but it provides no data redundancy: once data is corrupted, it cannot be recovered. RAID0 is therefore generally suited to applications with strict performance requirements but low data security and reliability requirements, such as video and audio storage or temporary data caches.
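How striping spreads consecutive blocks over the member disks can be sketched with simple modular arithmetic. The sketch assumes a stripe unit of exactly one block and round-robin placement, which is a simplification; real arrays use configurable stripe sizes. The function name is hypothetical.

```python
from typing import Tuple


def raid0_locate(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes one block per stripe unit and round-robin placement.
    """
    if logical_block < 0 or num_disks < 1:
        raise ValueError("invalid block number or disk count")
    return logical_block % num_disks, logical_block // num_disks


if __name__ == "__main__":
    # Eight consecutive logical blocks on a four-disk array.
    for block in range(8):
        disk, offset = raid0_locate(block, 4)
        print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Consecutive logical blocks land on different disks, which is what allows a large sequential request to be served by all disks at once.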

RAID1

RAID1, called mirroring, writes data identically to both a working disk and a mirror disk, for a disk space utilization of 50%. RAID1 affects response time when data is written, but not when data is read. RAID1 provides the best data protection. If the working disk fails, the system automatically reads data from the mirror disk without affecting the user's work.

RAID1 is the exact opposite of RAID0: to enhance data security it keeps the data on the two disks fully mirrored, which gives good security, simple technology, and convenient management. RAID1 is completely fault-tolerant but expensive to implement. It is used for applications where sequential read and write performance is critical and data protection is very important, such as protecting the data of mail systems.

RAID5

RAID5, probably the most common RAID level by far, is similar in principle to RAID4, except that the parity data is distributed across all disks in the array instead of being kept on a dedicated parity disk. Writes of data and of parity can therefore happen simultaneously on completely different disks, so RAID5 does not have the parity-disk bottleneck that RAID4 has for concurrent writes. RAID5 also scales well. As the number of disks in the array increases, so does its capacity for parallel operations, and it supports more disks than RAID4, resulting in higher capacity and higher performance.

Data blocks and the corresponding parity are stored on different disks. When a data disk is damaged, the system can reconstruct the damaged data from the other data blocks and the parity in the same stripe. As with other RAID levels, RAID5 performance suffers while data is being rebuilt.

RAID5 balances storage performance, data security, and storage cost. It can be understood as a compromise between RAID0 and RAID1, and it is currently the data protection scheme with the best overall balance. RAID5 meets the needs of most storage applications, and most data centers use it to protect application data.
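The way parity moves from disk to disk can be sketched as a small layout generator. The rotation rule below (parity shifts one disk to the left on each stripe) is an assumed convention chosen for illustration, since real controllers offer several rotation schemes, and the function name is hypothetical.

```python
from typing import List


def raid5_layout(num_disks: int, num_stripes: int) -> List[List[str]]:
    """Return a stripe-by-disk grid: 'P' marks parity, 'D<k>' is data block k.

    Parity rotates one disk to the left on each stripe (assumed convention).
    """
    if num_disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    layout = []
    next_block = 0
    for stripe in range(num_stripes):
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(f"D{next_block}")
                next_block += 1
        layout.append(row)
    return layout


if __name__ == "__main__":
    for row in raid5_layout(num_disks=4, num_stripes=4):
        print("  ".join(f"{cell:>3}" for cell in row))
```

Every stripe holds exactly one parity block and every disk holds a share of the parity, so no single disk becomes the write bottleneck.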

RAID6

Each of the RAID levels described so far can only protect against data loss caused by the failure of a single disk. If two disks fail at the same time, the data cannot be recovered. RAID6 introduces double parity, which protects the array from data loss even when two disks fail at the same time. RAID6 was designed to further strengthen data protection on top of RAID5 and can be viewed as an extended RAID5.

RAID6 supports recovery not only of data but also of parity data, so it is expensive to implement, and its controller is more complex and costly to design than those of other levels. The most common way to implement RAID6 is to use two independent parity algorithms, called P and Q. The parity data can be stored on two dedicated parity disks or scattered across all member disks. When two disks fail at the same time, the data on both can be reconstructed by solving a system of two equations in two unknowns.

RAID6 has fast read performance and higher fault tolerance. However, it costs much more than RAID5, has poor write performance, and is very complex to design and implement. RAID6 is therefore rarely used in practice and is mainly found where the required level of data security is very high. It is generally an economical alternative to RAID10 solutions.
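One widely used way to realize P and Q is to take P as plain XOR parity and Q as a Reed-Solomon-style checksum over the finite field GF(2^8). That choice is an assumption here, since the article does not say which algorithms a given controller uses. The sketch below uses the reduction polynomial 0x11D and the generator 2 as illustrative parameters, and all function names are hypothetical. It shows how two lost data blocks fall out of the two equations.

```python
from typing import List, Optional, Tuple


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using reduction polynomial 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, exponent: int) -> int:
    """Raise a to a non-negative power in GF(2^8)."""
    result = 1
    for _ in range(exponent):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse: a^254, because a^255 = 1 for nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)


def compute_pq(data: List[bytes]) -> Tuple[bytes, bytes]:
    """P = XOR of all blocks, Q = XOR of 2^i * block_i in GF(2^8)."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, block in enumerate(data):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(data: List[Optional[bytes]], x: int, y: int,
                p: bytes, q: bytes) -> Tuple[bytes, bytes]:
    """Rebuild data blocks x and y (both missing) from survivors, P and Q."""
    pxy, qxy = bytearray(p), bytearray(q)
    for i, block in enumerate(data):
        if i in (x, y):
            continue
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            pxy[j] ^= byte                  # leaves Dx ^ Dy
            qxy[j] ^= gf_mul(coeff, byte)   # leaves gx*Dx ^ gy*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    scale = gf_inv(gx ^ gy)
    # Solve: Dx = (Qxy ^ gy*Pxy) / (gx ^ gy), then Dy = Pxy ^ Dx.
    dx = bytes(gf_mul(scale, qxy[j] ^ gf_mul(gy, pxy[j])) for j in range(len(p)))
    dy = bytes(pxy[j] ^ dx[j] for j in range(len(p)))
    return dx, dy


if __name__ == "__main__":
    blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    p, q = compute_pq(blocks)
    damaged = [blocks[0], None, blocks[2], None]   # disks 1 and 3 lost
    print(recover_two(damaged, 1, 3, p, q) == (blocks[1], blocks[3]))
```

The byte-by-byte loops keep the arithmetic visible. Production implementations rely on lookup tables or vector instructions instead.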

RAID Combination Levels

Standard RAID levels each have advantages and disadvantages. It is natural to combine multiple RAID levels so that they complement each other's strengths and make up for each other's weaknesses, producing a RAID system that scores higher on performance, data security, and other measures. The combination levels currently mentioned in industry and academic research mainly include RAID00, RAID01, RAID10, RAID100, RAID30, RAID50, RAID53, and RAID60. However, only two of them, RAID01 and RAID10, have been widely used. Combination levels are generally very expensive to implement and are used only in a few specific situations.

RAID10 and RAID01

While some literature treats these two RAID levels as equivalent, this article considers them different. RAID01 stripes first and then mirrors, so it mirrors a virtual (striped) disk. RAID10 mirrors first and then stripes, so it mirrors the physical disks. With the same configuration, RAID10 generally has better fault tolerance than RAID01, because a RAID01 array loses a whole stripe set as soon as any one disk in that set fails.

RAID01 combines the advantages of both RAID0 and RAID1: it stripes data within each of two disk sets and mirrors one set onto the other. RAID01 data is written to both disk sets at the same time. If one of the sets is damaged, the array can continue to work on the other, ensuring data security while improving performance. Both RAID01 and RAID10 contain a RAID1 layer, so overall disk utilization is only 50%.
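The difference in fault tolerance between the two layouts can be checked by brute force. The sketch below takes a hypothetical four-disk array, with disks numbered 0 to 3 and grouped as (0, 1) and (2, 3), and counts how many two-disk failure combinations each layout survives. The grouping and the function names are assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, Set

GROUPS = [(0, 1), (2, 3)]   # hypothetical grouping of four disks
DISKS = range(4)


def raid10_survives(failed: Set[int]) -> bool:
    """RAID10: each group is a mirror pair, and the pairs are striped.

    Data is lost only when both disks of the same mirror pair have failed.
    """
    return not any(set(pair) <= failed for pair in GROUPS)


def raid01_survives(failed: Set[int]) -> bool:
    """RAID01: each group is a stripe set, and the sets mirror each other.

    A stripe set is unusable once any of its disks fails, so at least one
    stripe set must be completely intact.
    """
    return any(not (set(stripe_set) & failed) for stripe_set in GROUPS)


def count_survivable(survives: Callable[[Set[int]], bool], failures: int = 2) -> int:
    """Count failure combinations of the given size that the layout survives."""
    return sum(survives(set(combo)) for combo in combinations(DISKS, failures))


if __name__ == "__main__":
    total = len(list(combinations(DISKS, 2)))
    print("RAID10 survives", count_survivable(raid10_survives), "of", total)
    print("RAID01 survives", count_survivable(raid01_survives), "of", total)
```

With four disks there are six possible two-disk failures. The mirror-then-stripe layout survives four of them, while the stripe-then-mirror layout survives only two.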



RAID 50

RAID 50 is a combination of RAID 5 and RAID 0: RAID 5 is built first and RAID 0 on top of it, that is, data is striped across multiple RAID 5 groups. Since RAID 5 requires at least three hard disks, it takes at least six hard disks to form a RAID 50. In that minimum configuration, the six disks are divided into two groups of three, each group forms a RAID 5, and the two RAID 5 groups are then combined into a RAID 0.

RAID 50 keeps working when one disk fails in any one or more of the underlying RAID 5 groups, but if two or more disks fail in the same RAID 5 group, the entire RAID 50 fails.

Because RAID 50 stripes across multiple RAID 5 groups at the top layer, its performance is higher than that of RAID 5 alone, while its capacity utilization is lower. For example, with the same 9 hard disks, a RAID 50 built as a RAID 0 over three RAID 5 groups of 3 disks each gives up one disk per group, for a utilization of (1-3/9), whereas a single RAID 5 has a utilization of (1-1/9).
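The failure rule and the utilization figures above can be checked with a short sketch. It assumes equal-sized disks numbered consecutively, so that disk i belongs to group i // disks_per_group. That numbering and the function names are illustrative assumptions.

```python
from typing import Iterable


def raid50_utilization(num_groups: int, disks_per_group: int) -> float:
    """Usable fraction of raw capacity: each RAID 5 group gives up one disk."""
    if num_groups < 2 or disks_per_group < 3:
        raise ValueError("RAID 50 needs at least 2 groups of at least 3 disks")
    total = num_groups * disks_per_group
    return 1 - num_groups / total


def raid50_survives(failed: Iterable[int], num_groups: int,
                    disks_per_group: int) -> bool:
    """True if no RAID 5 group has lost more than one disk."""
    failures_per_group = [0] * num_groups
    for disk in set(failed):
        if not 0 <= disk < num_groups * disks_per_group:
            raise ValueError(f"no such disk: {disk}")
        failures_per_group[disk // disks_per_group] += 1
    return all(count <= 1 for count in failures_per_group)


if __name__ == "__main__":
    # Nine disks as three RAID 5 groups of three, as in the example above.
    print(raid50_utilization(3, 3))           # equals 1 - 3/9
    print(raid50_survives([0, 3, 6], 3, 3))   # one failure in each group
    print(raid50_survives([0, 1], 3, 3))      # two failures in one group
```

The same check shows why a layout with three RAID 5 groups can ride out up to three failed disks, provided each failure hits a different group.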

RAID 60

RAID 60 is a combination of RAID 6 and RAID 0: RAID 6 is built first and RAID 0 on top of it. In other words, data is striped across two or more RAID 6 groups. RAID 6 requires at least four hard disks, so RAID 60 requires a minimum of eight.

Since the bottom layer is made up of RAID 6, RAID 60 keeps working with up to two failed hard disks in any RAID 6 group. However, as soon as three hard disks fail in any one of the underlying RAID 6 groups, the entire RAID 60 fails, although the probability of this happening is quite low.

The upper layer of RAID 60 stripes across multiple RAID 6 groups, giving higher performance than RAID 6 alone. However, the high entry threshold and low capacity utilization are significant drawbacks.

About RAID parameter tuning

It is normally recommended to keep the system volume (RAID 1) separate from the data volume (RAID 5 or RAID 10). Here is a passage quoted from @Ye Jinrong:

  1. Use SSD or PCIe SSD devices for at least a hundredfold, or even a thousandfold, increase in IOPS.

  2. Buy array cards equipped with both Cache and BBU modules, which can significantly improve IOPS (this applies mainly to mechanical disks, not to SSD or PCIe SSD). The health of the Cache and BBU modules should also be checked regularly to make sure data is not lost in an accident.

  3. When you have an array card, set the array write policy to WB, or even Force WB (if you have dual power protection, or if data security requirements are not particularly high), and do not use the WT policy. Also disable the array read-ahead policy.

  4. Use RAID-10 instead of RAID-5 whenever possible (debatable).

  5. When using mechanical disks, choose high-speed ones where possible, such as 15K RPM rather than 7.2K RPM. The price difference is small.

Optimization of SSD array card scheme

Thanks to the @Xiaomi NOOPS operations team. See the further reading for the detailed experimental data.



Performance test conclusion

The performance test shows that R50 and R10 of the same capacity perform similarly. R50 is better than R10 for random reads of small blocks, and R50 is also better than R10 for random 4K writes, although the difference between the two there is 28%. R50 and R10 are very close in sequential reads and writes.

In terms of fault tolerance, R50 is close to R10: the probability of surviving a second disk failure is very close for R50 and R10, with a difference of 30%. The advantage of R10 lies mainly in the fact that it has some probability of tolerating a third or even a fourth disk failure. However, that tolerance is not 100% guaranteed, and R50 has shown good fault tolerance that is at least better than R5, although there is still some gap between R50 and R10. R50 is also flexible to configure: you can even specify 3 R5 groups to tolerate up to 3 failed disks.

On the cost side, R50 has a big advantage: in this configuration R50 costs only 3/4 as much as R10.

Conclusion

RAID 50 offers performance and availability close to RAID 10 at a cost close to RAID 5, for a better overall price-performance ratio, so consider replacing RAID 10 with RAID 50.