Background

Most of us have a mechanical hard drive in our laptop, where the JDK and development tools are stored. If you want better performance or like to play large games, you may want to install a solid-state drive (SSD). SSDs are small, light, and fast at reading and writing, which is why they are so popular.

While SSDs give us faster reads and writes, they also raise a cost problem.

There is a big price gap between solid state and mechanical hard drives of the same capacity.

In an enterprise, disk storage capacity is a prerequisite for working with large amounts of data. Expanding storage, however, is not only a matter of adding disks. The quality of the storage matters too, because it largely determines fault tolerance.

Frequent read/write operations over a long period can cause disk faults and data loss. To balance storage capacity, performance, reliability, and cost, virtual storage technology combines multiple disks into one or more disk arrays that improve performance and add redundancy. This technology is RAID, the Redundant Array of Inexpensive Disks (today more often expanded as Redundant Array of Independent Disks).

Virtual Storage Technology

In this section, we introduce what virtual storage technology is.

Start with virtual machines

The term virtualization may seem abstract. Even if you are not familiar with virtualization itself, you have probably used a virtual machine.

Common VMs include VirtualBox, VMware Workstation, and Virtual PC.

Java development also has the JVM. What is the difference?

A virtual machine virtualizes a real environment for the sake of convenience, but virtual machines come in different kinds. There are three types of VMs:

  • System VMs, for example, VMware.
  • Program virtual machines, such as the Java Virtual Machine (JVM).
  • Operating system layer virtualization, for example, Docker.

They are targeted at different levels of virtualization, including hardware, operating system, programming language, and storage. These virtualization scenarios can solve different problems.

What is virtualization?

Concept first: virtualization is the process of creating software-based (or virtual) representations of components such as applications, servers, storage, and networks. It is an effective way for businesses of all sizes to reduce IT overhead while increasing their efficiency and agility.

What are the benefits of virtualization?

One of the most obvious benefits is that virtualization can reduce the number of server deployments in the same business scenario.

Virtualization lets each application and operating system run in its own isolated environment, which is the virtual machine mentioned above. Virtual machines are independent of one another, while computing, storage, and network resources are drawn from a shared resource pool.

For example, our own computer is like a centralized resource pool from which multiple virtual machines draw their resources.

Storage device-based virtualization

Virtualization at the programming-language level gave us the JVM, which lets Java applications be platform independent: “compile once, run anywhere”. How, then, should we understand the virtualization of data storage?

In storage device virtualization, a logical layer is added on top of the storage devices, and storage resources are accessed through that layer.

O&M administrators can then adjust storage resources easily and improve storage utilization.

For end users, centralized storage devices provide better read/write performance and ease of use.

The logical layer above the storage devices is similar to the relationship between an interface and its implementation in Java development. Physical resources and logical resources no longer correspond one to one. The layer hides a large number of details and abstracts and isolates the underlying hardware, which improves scalability and makes management easier.
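The interface analogy can be made concrete with a small Java sketch. Every name in it (`BlockStore`, `InMemoryDisk`, `ConcatenatedVolume`) is hypothetical and invented for illustration, and the "disks" are just arrays in memory. Real storage virtualization is done by controllers, drivers, or volume managers rather than application code.

```java
/** Logical view of storage: callers see numbered blocks, not devices. */
interface BlockStore {
    int blockCount();
    byte[] read(int block);
    void write(int block, byte[] data);
}

/** Stand-in for one physical disk (hypothetical, backed by memory). */
class InMemoryDisk implements BlockStore {
    private final byte[][] blocks;

    InMemoryDisk(int blockCount) { this.blocks = new byte[blockCount][]; }

    public int blockCount() { return blocks.length; }

    public byte[] read(int block) { return blocks[block]; }

    public void write(int block, byte[] data) { blocks[block] = data.clone(); }
}

/** Logical layer: presents several disks as one larger block store. */
class ConcatenatedVolume implements BlockStore {
    private final BlockStore[] disks;

    ConcatenatedVolume(BlockStore... disks) { this.disks = disks; }

    public int blockCount() {
        int total = 0;
        for (BlockStore disk : disks) total += disk.blockCount();
        return total;
    }

    public byte[] read(int block) {
        int[] loc = locate(block);
        return disks[loc[0]].read(loc[1]);
    }

    public void write(int block, byte[] data) {
        int[] loc = locate(block);
        disks[loc[0]].write(loc[1], data);
    }

    /** Maps a logical block to {disk index, block index on that disk}. */
    private int[] locate(int block) {
        if (block < 0) throw new IndexOutOfBoundsException("block " + block);
        int remaining = block;
        for (int i = 0; i < disks.length; i++) {
            if (remaining < disks[i].blockCount()) return new int[] {i, remaining};
            remaining -= disks[i].blockCount();
        }
        throw new IndexOutOfBoundsException("block " + block);
    }

    public static void main(String[] args) {
        BlockStore volume = new ConcatenatedVolume(new InMemoryDisk(2), new InMemoryDisk(3));
        volume.write(3, new byte[] {42});        // lands on the second disk, block 1
        System.out.println(volume.blockCount()); // 5
        System.out.println(volume.read(3)[0]);   // 42
    }
}
```

The caller only ever talks to `BlockStore`, so disks could be added or replaced behind the logical layer without changing the calling code.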

Redundant Array of Disks (RAID)

Having introduced virtualization technology, we now turn to RAID, the redundant array of disks.

RAID is a redundant array made up of multiple hard disks. It combines several physical disks into one logical disk, so that from the upper layers it is used just like a single physical disk, while improving data storage quality and efficiency.

Simply put, although a RAID array consists of multiple hard drives, the operating system treats it as a single large storage device. RAID is often used on servers and usually combines identical hard drives. As hard disk prices continue to fall and RAID functions are integrated more effectively into motherboards, it is also becoming an option for ordinary users, especially for work that requires large storage space, such as video and audio production.

Characteristics of RAID

  • Security

RAID can improve data security (not every RAID level offers this, as discussed later). Data is stored redundantly across disks, so if one disk is damaged, its contents can be recovered from the others and data loss is avoided.

  • Enhanced parallelism

Compared with a single physical disk, a disk group provides larger storage capacity and higher storage performance. Instead of reading and writing to a single disk, it reads and writes to multiple disks simultaneously.

Classification of RAID

RAID0

Strictly speaking, RAID0 is not really a redundant array.

It provides no fault tolerance at all, and it actually increases the chance of data loss. In RAID0 no data is copied. Instead, the data is striped across two or more disks. If any one disk is damaged, the data of the whole array becomes unavailable.

You might be wondering, what’s the point of RAID0?

As mentioned earlier, RAID0 exists for speed. It gives up redundancy, but spreading the data across the disks improves read/write parallelism and so speeds up reading and writing.
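As a minimal sketch of how striping spreads data, the following Java snippet maps a logical block number to a disk and a position on that disk. The class and method names are hypothetical, and the stripe unit is assumed to be a single block. Real arrays use configurable stripe sizes.

```java
/** Hypothetical illustration of RAID0 striping with a one-block stripe unit. */
class Raid0Mapping {

    /** Returns {disk index, block index on that disk} for a logical block. */
    static int[] locate(int logicalBlock, int diskCount) {
        if (logicalBlock < 0 || diskCount < 1) {
            throw new IllegalArgumentException("invalid block or disk count");
        }
        // Consecutive logical blocks rotate across the disks.
        return new int[] {logicalBlock % diskCount, logicalBlock / diskCount};
    }

    public static void main(String[] args) {
        int diskCount = 2;
        for (int block = 0; block < 6; block++) {
            int[] loc = locate(block, diskCount);
            System.out.println("logical block " + block
                    + " -> disk " + loc[0] + ", block " + loc[1]);
        }
    }
}
```

With two disks, logical blocks 0, 1, 2, 3 land on disks 0, 1, 0, 1, so neighboring blocks can be read or written in parallel. The same mapping shows why losing one disk leaves every large file with holes in it.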

RAID1

RAID1 mirrors data from one disk to another, providing 100% data redundancy but only 50% disk utilization. It suits scenarios such as enterprise monitoring, critical data, and high-availability services.

RAID1 requires at least two disks. A two-disk RAID1 array can tolerate the failure of one disk, which should be replaced promptly. If the remaining disk also fails before the replacement is in place, the data is lost.
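The mirroring behavior can be sketched in a few lines of Java. `MirroredPair` is a hypothetical in-memory model of a two-disk mirror, not how a real controller is implemented. It only shows that writes go to both disks and that reads survive one failure but not two.

```java
/** Hypothetical in-memory model of a two-disk RAID1 mirror. */
class MirroredPair {
    private final byte[][][] disks;               // disks[disk][block] holds the block data
    private final boolean[] failed = new boolean[2];

    MirroredPair(int blockCount) {
        this.disks = new byte[2][blockCount][];
    }

    /** Writes the same data to every disk that is still healthy. */
    void write(int block, byte[] data) {
        for (int d = 0; d < 2; d++) {
            if (!failed[d]) disks[d][block] = data.clone();
        }
    }

    /** Reads from the first healthy disk. */
    byte[] read(int block) {
        for (int d = 0; d < 2; d++) {
            if (!failed[d]) return disks[d][block];
        }
        throw new IllegalStateException("both disks failed: data is lost");
    }

    /** Simulates the failure of one disk (0 or 1). */
    void fail(int disk) {
        failed[disk] = true;
    }

    public static void main(String[] args) {
        MirroredPair mirror = new MirroredPair(4);
        mirror.write(1, new byte[] {7});
        mirror.fail(0);                            // one disk lost
        System.out.println(mirror.read(1)[0]);     // still readable from the mirror: 7
    }
}
```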

RAID5

RAID5 is a storage solution that balances storage performance, data security and storage cost.

Unlike RAID1, which dedicates a whole mirror disk to redundancy, RAID5 records parity information. It does not keep a second copy of the data. Instead, the parity for each stripe is stored on a different disk from the data it protects, and the parity is distributed across all disks in the array. When a disk is damaged, the data and parity on the remaining disks are used to reconstruct what was lost.

RAID5 requires at least three disks and can tolerate the failure of only one disk. If a second disk fails before the first has been rebuilt, the data in the array is lost.

Parity is used to recover data when a disk fails. Because all the parity together takes up about one disk's worth of capacity, the space left for real data is reduced. With four 1TB disks, for example, about 3TB is available for data, because parity information takes up about 1TB.
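RAID5 parity is typically computed with XOR, and a short Java sketch shows why that is enough to rebuild a lost block. The class name and the tiny three-byte "blocks" are made up for illustration. A real array works on much larger stripes and rotates the parity position from stripe to stripe.

```java
/** Hypothetical illustration of XOR parity as used by RAID5. */
class XorParity {

    /** XORs equally sized blocks together, byte by byte. */
    static byte[] xor(byte[]... blocks) {
        byte[] result = new byte[blocks[0].length];
        for (byte[] block : blocks) {
            for (int i = 0; i < result.length; i++) {
                result[i] ^= block[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] d0 = {1, 2, 3};
        byte[] d1 = {4, 5, 6};
        byte[] d2 = {7, 8, 9};
        byte[] parity = xor(d0, d1, d2);       // stored on a fourth disk

        // Pretend the disk holding d1 has failed: XOR everything that survived.
        byte[] rebuilt = xor(d0, d2, parity);
        System.out.println(java.util.Arrays.equals(rebuilt, d1)); // true
    }
}
```

Because XOR-ing a value with itself cancels out, combining the surviving data blocks with the parity leaves exactly the missing block. It also shows why only one failure is survivable: with two blocks missing, a single parity block cannot separate them.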

RAID10

RAID 10 (also written RAID 1+0) is a combination of RAID0 and RAID1. It provides high performance and high availability, and it generally writes faster than RAID5 because no parity has to be computed. It is especially suitable for write-intensive applications, but the cost is relatively high: no matter how many disks you have, you lose half of the total capacity.

At least four disks are needed. Disks A and B each store half of the data, and disks C and D mirror A and B respectively. To my mind this is the ideal setup, since no RAID5-style parity is required. Obviously, it also costs more.
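The capacity trade-offs of the four levels can be summarized in a small Java helper. This is only a sketch: `RaidCapacity` and `usableTb` are hypothetical names, all disks are assumed to be the same size, and file system and metadata overhead is ignored.

```java
/** Hypothetical helper: usable capacity per RAID level, equal-sized disks assumed. */
class RaidCapacity {

    static long usableTb(String level, int diskCount, long diskSizeTb) {
        switch (level) {
            case "RAID0":
                require(diskCount >= 2, "RAID0 needs at least 2 disks");
                return diskCount * diskSizeTb;          // no redundancy at all
            case "RAID1":
                require(diskCount >= 2, "RAID1 needs at least 2 disks");
                return diskSizeTb;                      // every disk holds the same data
            case "RAID5":
                require(diskCount >= 3, "RAID5 needs at least 3 disks");
                return (diskCount - 1) * diskSizeTb;    // one disk's worth of parity
            case "RAID10":
                require(diskCount >= 4 && diskCount % 2 == 0,
                        "RAID10 needs an even number of disks, at least 4");
                return (diskCount / 2) * diskSizeTb;    // half is used for mirrors
            default:
                throw new IllegalArgumentException("unknown level: " + level);
        }
    }

    private static void require(boolean ok, String message) {
        if (!ok) throw new IllegalArgumentException(message);
    }

    public static void main(String[] args) {
        System.out.println(usableTb("RAID5", 4, 1));   // 3
        System.out.println(usableTb("RAID10", 4, 1));  // 2
    }
}
```

For four 1TB disks this gives 3TB under RAID5 and 2TB under RAID10, which is the cost difference described in the text.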

How to select an appropriate RAID policy

The choice of RAID policy rests on three factors: data availability, I/O performance, and cost. If availability is not required, select RAID0 for maximum performance. If availability and performance matter and cost is not a major concern, select RAID1, or RAID10 when four or more disks are available. If availability, cost, and performance are equally important, select RAID5, taking the typical data transfer pattern and the number of disks into account. In practice, weigh availability, performance, and cost against the characteristics of the data and the specific requirements of your users and service scenarios.
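The selection rules above can be written down as a rough decision function. This Java sketch is a simplification under stated assumptions, not a sizing tool: the names `RaidAdvisor` and `suggest` are hypothetical, "cost sensitive" is reduced to a single flag, and the disk-count thresholds (three for RAID5, four for RAID10) are the minimums mentioned earlier in the article.

```java
/** Hypothetical, simplified encoding of the RAID selection rules of thumb. */
class RaidAdvisor {

    static String suggest(boolean needsAvailability, boolean costSensitive, int diskCount) {
        if (!needsAvailability) {
            return "RAID0";                             // performance only, no redundancy
        }
        if (costSensitive) {
            // Assumption: fall back to a mirror when there are too few disks for parity.
            return diskCount >= 3 ? "RAID5" : "RAID1";
        }
        // Availability and performance matter, cost does not.
        return diskCount >= 4 ? "RAID10" : "RAID1";
    }

    public static void main(String[] args) {
        System.out.println(suggest(false, true, 2));    // RAID0
        System.out.println(suggest(true, true, 4));     // RAID5
        System.out.println(suggest(true, false, 4));    // RAID10
    }
}
```

A real decision would also weigh rebuild time, workload mix, and controller support, which this sketch deliberately leaves out.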

Conclusion

This article started with a comparison between mechanical hard disks and SSDs, moved on to virtual machines and virtualization technologies, and introduced the different levels of virtualization: system-level virtualization, represented by VMware; programming-language-level virtualization, represented by the JVM; and operating-system-level virtualization, represented by Docker. It also noted that an enterprise needs to balance storage capacity, reliability, performance, and cost.

The composition, security, and parallelism of redundant disk arrays were also discussed, along with how to select an appropriate RAID policy for different service scenarios.

You are welcome to follow the development notes of Xia Dream. This is a warm Java community where we share experience, pool our wisdom, answer questions efficiently, help others, and improve ourselves. We would be glad to have you join us and move forward together.