Object storage: MINIO overview and setup
[Preface] With the rapid growth of the Internet, we face ever larger volumes of unstructured data to store. Against this background, object storage solutions make file access both convenient and reliable. This article uses the open source object storage tool MINIO as an example to introduce the past and present of object storage.
01. Why object storage
Before we talk about object storage, let's look at the two main types of storage that have been dominating the storage market: block storage and file storage.
Block storage is like a raw hard disk: it is attached directly to the host as a volume or disk. It knows nothing about the content or format of the stored data; it only handles reads and writes and does not care how the data is related or used. However, it sits too close to the hardware layer, which makes it hard to scale. Common forms are DAS (direct-attached storage) and SAN (storage area network).
DAS means that a storage device is connected directly to a server through the Small Computer System Interface (SCSI). The storage media is attached directly to an internal bus, and the data store is part of the overall server structure.
A SAN is a dedicated storage system that connects one or more storage devices to servers through a high-speed network. A SAN can be regarded as a network, including various elements such as disk arrays and switches.
File storage organizes data as files and directories, with multi-level access paths and a directory structure provided by the file system. Data is accessed as files, and some advanced management functions, such as file-level access control, can be implemented. File storage is easy to share and widely used, but its read and write speed is relatively slow. A common form is NAS (network-attached storage).
A NAS device moves the file system from the local host to a device on the IP network. Multiple clients can share the same file system on the same NAS.
Because of these characteristics, block storage and file storage are poorly suited to public cloud storage and are generally used only within LANs. Meanwhile, as Internet demand grows, data volumes are exploding and constantly eating up storage resources, data types are diversifying, and the share of unstructured data keeps rising. How can these new storage requirements be met? Object storage was created in response.
02. What is object storage
Object storage (object-based storage, OBS) combines the advantages of NAS and SAN: it provides the high-speed direct access of a SAN and the distributed data sharing of a NAS. It is well suited to storing massive numbers of images, videos, log files, backups, and container images.
In 2006, Amazon released AWS S3 (Simple Storage Service), which formally introduced object storage into cloud computing as a cloud storage service and opened the golden age of object storage. S3 is now a mainstream object storage protocol standard, and many vendors offer implementations that are compatible with it.
The underlying hardware of object storage is still the hard disk, no different from block storage or file storage, but the system built on top of that hardware is completely different.
Object storage is very simple, with only two core concepts: buckets and objects. Stored data is organized in a tenant, bucket, object hierarchy:
- Tenants are created to control access to different buckets.
- Tenants isolate storage resources from each other; each tenant can create its own buckets and objects.
- A bucket is a container that holds objects. A bucket can hold many objects, all in a flat namespace.
- An object is the basic unit of stored data, similar to a hash table entry. The key is the object name, the value is the object content, and the pair is stored in key-value (KV) form.
The object storage system uses REST APIs (HTTP verbs) to operate on resources, so the interface commands are simple. The main storage protocols are S3 and Swift. The flat data organization also makes the system highly scalable.
When choosing an object storage system, we usually want the simplest option that still meets basic performance requirements. Among the many open source object storage solutions, MINIO stands out for its minimalist guiding philosophy.
Minimalist concept: adopt a simple and reliable cluster management scheme and abandon complex scheduling for very large clusters. This reduces risk factors and performance bottlenecks and keeps the focus on the product's core functions: a highly reliable cluster, flexible expansion, and very high performance.
Building-block expansion: build many small and medium-sized clusters that are easy to manage and can be aggregated across data centers into one large resource pool, rather than a single large, centrally managed distributed cluster.
03. MINIO Principle
MINIO macro architecture (decentralized)
A MINIO cluster consists of multiple nodes that all play the same role. There is no special node, so the failure of any single node does not affect the cluster as a whole. Objects are split into fragments and stored on multiple disks on different nodes, while the cluster still presents a unified namespace. Requests are balanced across the nodes through a web load balancer or DNS round robin. Every node exposes an S3-compatible interface.
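Because every node plays the same role, any node can serve any request. A minimal sketch of round-robin selection follows; the node names and the `pick_node` helper are hypothetical, and a real deployment would leave this to a load balancer or DNS.

```shell
#!/bin/sh
# Sketch: round-robin selection over identical nodes.
# Assumption: node names are placeholders, not real hosts.

NODES="node1 node2 node3 node4"

# pick_node REQUEST_NUMBER -> prints the node that serves that request
pick_node() {
    count=0
    for n in $NODES; do count=$(( count + 1 )); done
    index=$(( $1 % count ))
    i=0
    for n in $NODES; do
        if [ "$i" -eq "$index" ]; then echo "$n"; return 0; fi
        i=$(( i + 1 ))
    done
}

# Six consecutive requests wrap around the four nodes.
for req in 0 1 2 3 4 5; do
    echo "request $req -> $(pick_node "$req")"
done
```

Since no node holds special state, removing a failed node from the list is enough to keep requests flowing.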
MINIO Node storage
Before looking at node storage, a few concepts need to be defined:
1) Drive: a disk on which data is stored.
2) Set: a group of Drives. The Drives of each Set are spread across different nodes as far as possible, and each object is stored on one Set.
3) Bucket: the logical location where objects are stored. For clients, it is equivalent to the top-level folder that holds the files.
Instead of storing multiple full replicas, MINIO encodes the original data into multiple blocks and stores them on the Drives of a Set.
A cluster contains multiple Sets. The object name is hashed and mapped to exactly one Set, so data is distributed evenly across all Drives.
By default, MINIO calculates the number of Sets in a cluster automatically from the cluster size; the number can also be specified manually. Following the principle of not putting all eggs in one basket, the Drives of a Set are spread across different nodes as far as possible to ensure data reliability.
As shown in the figure above, each row on the right is a machine node, each small block is a Drive, and the yellow and blue blocks in the figure form a Set. When an object is written, the original data is encoded into N blocks, where N is the number of Drives in a Set. The object name is then hashed to find its unique Set, and each block is written to the corresponding Drive in that Set, which completes the storage of the object.
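The mapping from object name to Set can be sketched as a hash followed by a modulo. This is only an illustration, not MINIO's actual placement code: it assumes CRC32 from the POSIX `cksum` utility as a stand-in hash, and `pick_set` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: map an object name to one Set by hashing the name.
# Assumption: cksum (CRC32) stands in for whatever hash MINIO really uses.

# pick_set OBJECT_NAME SET_COUNT -> prints a Set index in [0, SET_COUNT)
pick_set() {
    name=$1
    set_count=$2
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    crc=$(printf '%s' "$name" | cksum | cut -d ' ' -f 1)
    echo $(( crc % set_count ))
}

# The same name always lands on the same Set.
for obj in MyObject photo.jpg logs/2024/app.log; do
    echo "$obj -> set $(pick_set "$obj" 4)"
done
```

Because the result depends only on the name and the number of Sets, any node can locate an object without consulting a central index.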
Data reliability assurance
MINIO uses erasure codes and checksums to ensure data reliability. Data can be recovered even if half of the disks (N/2) are lost.
1) Erasure code
An object is encoded into several data blocks and parity blocks; for convenience, the two are collectively called code blocks. An erasure code is a mechanism that can restore the whole object from only a subset of its code blocks.
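The idea of rebuilding lost data from the surviving code blocks can be shown with the simplest possible erasure code, a single XOR parity block. This is a deliberately simplified sketch: MINIO uses Reed-Solomon codes, which tolerate more than one lost block, and the helper names and block values here are hypothetical.

```shell
#!/bin/sh
# Sketch: single XOR parity, the simplest erasure code (tolerates ONE lost block).
# Assumption: each "block" is one small integer to keep the arithmetic visible.

# xor_parity D1 D2 D3 -> prints the parity of three data blocks
xor_parity() {
    echo $(( $1 ^ $2 ^ $3 ))
}

# recover_block PARITY A B -> prints the missing data block, given the
# parity and the two surviving data blocks
recover_block() {
    echo $(( $1 ^ $2 ^ $3 ))
}

d1=72; d2=105; d3=7                  # three data blocks
p=$(xor_parity "$d1" "$d2" "$d3")    # one parity block

# Pretend the drive holding d2 failed, then rebuild it from the rest.
rebuilt=$(recover_block "$p" "$d1" "$d3")
echo "parity=$p rebuilt_d2=$rebuilt"
```

XOR works here because applying it twice with the same value cancels out, so the parity combined with the surviving blocks leaves exactly the missing one.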
In a Set, half of the Drives hold data blocks and half hold parity blocks. This split gives the highest reliability, at the cost of the highest redundancy for erasure coding: the total size of all code blocks is twice the size of the source object. Compared with multi-replica storage, however, the overhead is still much lower, because only one extra copy's worth of data is stored. In general, the loss or damage of half of the disks can be tolerated.
In essence, erasure coding uses the spare CPU capacity of the storage machines to guarantee data reliability through computation.
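The storage overhead can be checked with a little arithmetic. The sketch below compares the half-data, half-parity layout with three-way replication; the 8-drive Set, the 100 MB object, and the choice of three replicas are assumptions made only for illustration.

```shell
#!/bin/sh
# Sketch: raw capacity consumed per stored object under the two schemes.
# Sizes are in MB; integer arithmetic only.

# erasure_raw_size OBJECT_MB DATA_BLOCKS PARITY_BLOCKS
# Each block is OBJECT_MB / DATA_BLOCKS, and DATA + PARITY blocks are stored.
erasure_raw_size() {
    echo $(( $1 * ($2 + $3) / $2 ))
}

# replica_raw_size OBJECT_MB COPIES
replica_raw_size() {
    echo $(( $1 * $2 ))
}

object_mb=100
drives=8                     # drives in one Set (hypothetical)
data=$(( drives / 2 ))       # half of the drives hold data blocks
parity=$(( drives / 2 ))     # half of the drives hold parity blocks

echo "erasure coded: $(erasure_raw_size "$object_mb" "$data" "$parity") MB, tolerates $parity lost drives"
echo "3 replicas:    $(replica_raw_size "$object_mb" 3) MB, tolerates 2 lost copies"
```

Under these assumptions the erasure-coded layout uses 200 MB of raw space and survives four lost drives, while three replicas use 300 MB and survive two lost copies.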
Storage media have another potential problem: bit rot. Under the influence of wear, dust, radiation, heat, and other factors, the integrity of data on a storage medium degrades slowly. This slight damage often goes undetected by the OS and the hardware, yet it can have serious consequences. For example, a bit stream is written to the storage medium, and when it is read back some time later, the two no longer match.
To deal with bit rot, MINIO computes a checksum of each code block with the HighwayHash algorithm to verify its correctness, and provides a management tool to repair damaged code blocks.
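The detection step can be sketched by storing a digest at write time and comparing it with a fresh digest at read time. This sketch swaps in `sha256sum` for HighwayHash, since the latter has no standard command-line tool; `block_checksum` and `verify_block` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: detect silent corruption by comparing a stored checksum with a fresh one.
# Assumption: sha256sum replaces HighwayHash purely for illustration.

# block_checksum FILE -> prints the hex digest of FILE
block_checksum() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# verify_block FILE EXPECTED_DIGEST -> exit status 0 if intact, non-zero if not
verify_block() {
    [ "$(block_checksum "$1")" = "$2" ]
}

workdir=$(mktemp -d)
block="$workdir/part.1"

printf 'hello object storage' > "$block"
stored=$(block_checksum "$block")      # digest saved at write time

verify_block "$block" "$stored" && echo "before: intact"

# Simulate bit rot by changing one character of the block.
printf 'hellp object storage' > "$block"
verify_block "$block" "$stored" || echo "after: corruption detected"

rm -rf "$workdir"
```

Once a block fails verification, it can be treated like a lost block and rebuilt from the remaining code blocks.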
From the perspective of data reliability, an object is first split into several fragments of equal length. The erasure code algorithm then turns them into several data fragments and parity fragments, and each fragment is stored on a Drive.
Data storage form
Suppose an erasure set in the MINIO cluster contains four disks, the object to store is named MyObject, and it belongs to a bucket named MyBucket. If hashing selects disk1 through disk4, the subpath MyBucket/MyObject is created on all four disks. The MyObject directory contains two files: xl.json, which stores the metadata, and part.1, the shard held by that disk.
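The layout described above can be reproduced in a scratch directory to see what each disk would hold. This is a sketch only: `layout_object` is a hypothetical helper, the files are empty placeholders named xl.json and part.1 after the text, and real files are written by MINIO itself.

```shell
#!/bin/sh
# Sketch: recreate the described on-disk layout inside a scratch directory.
# Assumption: file contents are empty placeholders, not real MINIO data.

# layout_object ROOT BUCKET OBJECT
# Creates ROOT/diskN/BUCKET/OBJECT/ with xl.json and part.1 for N = 1..4.
layout_object() {
    root=$1; bucket=$2; object=$3
    for n in 1 2 3 4; do
        dir="$root/disk$n/$bucket/$object"
        mkdir -p "$dir"
        : > "$dir/xl.json"   # metadata for the object
        : > "$dir/part.1"    # this disk's shard of the object
    done
}

scratch=$(mktemp -d)
layout_object "$scratch" MyBucket MyObject
find "$scratch" -type f | sort
rm -rf "$scratch"
```

Note that the object appears as a directory on every disk of the Set, while each part.1 holds only one shard of the data.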
How does object storage manage shard data, and how do you find the data you want? This is where object metadata management comes in. Object metadata describes how the data is stored and fragmented, as well as the data itself. Because metadata is stored separately, an object can carry as much custom metadata as needed, which is not possible with file storage.
Object storage is flat: all objects sit directly in their buckets. To make objects look like a file system with a directory hierarchy, give the object names a hierarchical form, which produces the effect of multi-level directories (e.g., the key Temp/MyObject).
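The directory effect is nothing more than string matching on key prefixes, as the following sketch shows. The key list and the `list_prefix` helper are hypothetical.

```shell
#!/bin/sh
# Sketch: a flat list of object keys, with "directories" emulated by key prefixes.

# list_prefix PREFIX -> reads keys on stdin, prints those that start with PREFIX
list_prefix() {
    prefix=$1
    while IFS= read -r key; do
        case $key in
            "$prefix"*) printf '%s\n' "$key" ;;
        esac
    done
}

keys='Temp/MyObject
Temp/logs/app.log
Photos/cat.jpg'

# Looks like listing the folder Temp/, but it is only a prefix match on flat keys.
printf '%s\n' "$keys" | list_prefix 'Temp/'
```

No directory object exists anywhere; deleting the last key under a prefix makes the apparent folder disappear.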
04. Cluster deployment and construction
A production environment should have at least four nodes. Perform the following steps on each node:
1. Create the directories
Create the directory /opt/minio for the startup script and the binary, and save the minio program in it.
Create the data storage directory data.
Create the configuration file directory /etc/minio.
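Step 1 can be scripted roughly as follows. The text only names the data directory "data", so the /data path used here is a placeholder, and `create_minio_dirs` is a hypothetical helper that takes a root prefix so it can be tried safely in a scratch directory first.

```shell
#!/bin/sh
# Sketch: create the three directories from step 1.
# Assumption: /data is a placeholder for the data storage directory.

# create_minio_dirs ROOT DATA_DIR
# ROOT is prepended to every path; pass an empty ROOT to create the real
# paths (which needs root privileges).
create_minio_dirs() {
    root=$1
    data_dir=$2
    mkdir -p "$root/opt/minio"    # startup script and minio binary
    mkdir -p "$root$data_dir"     # object data
    mkdir -p "$root/etc/minio"    # configuration files
}

# Dry run inside a scratch directory.
scratch=$(mktemp -d)
create_minio_dirs "$scratch" /data
find "$scratch" -type d | sort
rm -rf "$scratch"
```

On a real node the call would be `create_minio_dirs "" /data`, with the data path adjusted to the disk that should hold the objects.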
2. Edit the cluster startup script
vim /opt/minio/run.sh
MINIO_ACCESS_KEY: the user name; it must contain at least five characters.
MINIO_SECRET_KEY: the password; it must not be too simple, otherwise the service may fail to start.
--config-dir: the directory that holds the cluster configuration files.
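The script body itself is not shown here, so the following is only a hedged sketch of what such a run.sh might look like, wrapped in a hypothetical `write_run_sh` helper that generates the file. The credentials, the host names node1 to node4, port 9000, and the /data path are all placeholders. It targets MINIO releases that accept MINIO_ACCESS_KEY, MINIO_SECRET_KEY, and --config-dir; newer releases use MINIO_ROOT_USER and MINIO_ROOT_PASSWORD instead.

```shell
#!/bin/sh
# Sketch: generate a run.sh along the lines step 2 describes.
# Assumptions: credentials, node1..node4, port 9000 and /data are placeholders.

# write_run_sh TARGET -> writes the startup script to TARGET, makes it executable
write_run_sh() {
    cat > "$1" <<'EOF'
#!/bin/bash
export MINIO_ACCESS_KEY=minioadmin          # user name, at least five characters
export MINIO_SECRET_KEY=change-this-secret  # password, must not be too simple

# Every node runs the same command and lists the data directory of all nodes.
/opt/minio/minio server --config-dir /etc/minio \
    http://node1:9000/data http://node2:9000/data \
    http://node3:9000/data http://node4:9000/data
EOF
    chmod +x "$1"
}

# Dry run: write the script to a temporary file and print it.
target=$(mktemp)
write_run_sh "$target"
cat "$target"
rm -f "$target"
```

On a real node the call would be `write_run_sh /opt/minio/run.sh`, after replacing the placeholder values.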
3. Configure the system service minio.service
vim /usr/lib/systemd/system/minio.service
WorkingDirectory: binary directory
ExecStart: specifies the cluster startup script
[Unit]
Description=Minio service
Documentation=https://docs.minio.io/

[Service]
WorkingDirectory=/opt/minio/
ExecStart=/opt/minio/run.sh
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
4. Modify permissions
Add execute permission to all files involved: the service file, the minio binary, and the cluster startup script.
chmod +x /usr/lib/systemd/system/minio.service
chmod +x /opt/minio/minio
chmod +x /opt/minio/run.sh
5. Start the cluster
systemctl daemon-reload   # systemd caches unit files, so after a unit file changes, tell systemd to reload all unit files
systemctl enable minio    # enable the service so it starts at boot
systemctl start minio     # start the service
systemctl status minio    # check the service status