What is NoSQL?

The data model

Structured data

Structured data is data that is logically expressed and implemented as two-dimensional tables, strictly following fixed format and length specifications

Unstructured data

Unstructured data is data whose structure is irregular or incomplete. It has no predefined data model and cannot easily be represented in two-dimensional logical tables. Examples include text, images, HTML, video, and audio

Semi-structured data

Semi-structured data is a form of structured data. Although it does not fit the two-dimensional table model, it carries tags that separate semantic elements and give records and fields a hierarchy, as in XML and JSON
Semi-structured data is stored in a tree or graph data structure
With structured data, the structure is usually defined first and the data follows; with semi-structured data, the data comes first and the structure follows from it
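The "structure first" versus "data first" contrast can be sketched in a few lines of Python. This is only an illustration: the `users` table, the sample JSON document, and the `field_paths` helper are all made up for the example.

```python
import json
import sqlite3

# Structured: the schema (structure) is declared first, then rows are inserted.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
)
conn.execute("INSERT INTO users (id, name, age) VALUES (?, ?, ?)", (1, "alice", 30))
structured_row = conn.execute("SELECT id, name, age FROM users").fetchone()

# Semi-structured: the data arrives first, and its structure is discovered
# from the tags (keys) it carries.
raw = '{"id": 2, "name": "bob", "tags": ["admin"], "address": {"city": "Paris"}}'
doc = json.loads(raw)


def field_paths(value, prefix=""):
    """Walk the tree and list the path of every leaf field (hypothetical helper)."""
    if isinstance(value, dict):
        paths = []
        for key, child in value.items():
            paths.extend(field_paths(child, f"{prefix}.{key}" if prefix else key))
        return paths
    if isinstance(value, list):
        paths = []
        for index, child in enumerate(value):
            paths.extend(field_paths(child, f"{prefix}[{index}]"))
        return paths
    return [prefix]


discovered = field_paths(doc)
```

The table rejects anything that does not match its declared columns, while the JSON document can gain or lose fields from one record to the next.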

Relational database

Architectural evolution of relational database storage

Phase 1: In the early stage of the business, a single application server is paired with a single relational database, and every request reads from or writes to that database
Phase 2: As the business grows, the application server becomes the performance bottleneck. More application servers are added, with Nginx as a load-balancing layer at the traffic entry point
Phase 3: As the business keeps growing, the database becomes the performance bottleneck. Reads and writes are then separated between a primary database and secondary databases, and data is synchronized from the primary to the secondaries through the binlog
Phase 4: As the business grows further, load keeps rising even with read/write separation. More databases are added and the data is sharded, splitting tables vertically and databases horizontally
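Phases 3 and 4 both come down to a routing decision in front of the databases. The sketch below is a minimal illustration, not a production router: the `Router` class, the node names, and the choice of CRC32 as the shard hash are all assumptions made for the example.

```python
import zlib


class Router:
    """Routes statements to a primary or secondary, and keys to a shard."""

    def __init__(self, primary, secondaries, shard_count):
        self.primary = primary
        self.secondaries = secondaries
        self.shard_count = shard_count
        self._next_secondary = 0

    def node_for(self, sql):
        # Phase 3: writes go to the primary, reads rotate over the secondaries.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read or not self.secondaries:
            return self.primary
        node = self.secondaries[self._next_secondary % len(self.secondaries)]
        self._next_secondary += 1
        return node

    def shard_for(self, user_id):
        # Phase 4 (horizontal split): a stable hash of the key picks the shard.
        return zlib.crc32(str(user_id).encode("utf-8")) % self.shard_count


router = Router("db-primary", ["db-secondary-1", "db-secondary-2"], shard_count=4)
```

A stable hash matters here: Python's built-in `hash()` for strings changes between processes, so it would send the same key to different shards after a restart.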

Advantages and disadvantages of relational databases

Advantages

Easy to operate: standard SQL makes relational databases convenient to work with and supports joins and other complex queries
Data consistency: ACID transactions keep data consistent
Data stability: data is persisted to disk, so the risk of data loss is low, and massive data storage is supported
Stable service: the most widely used relational database products, MySQL and Oracle, offer excellent performance and stable service

Disadvantages

High I/O pressure under high concurrency: data is stored by row. Even if an operation touches only one column, the entire row is read from the storage device into memory, resulting in heavy I/O
High index maintenance costs: every data update also updates all secondary indexes, which slows down reads and writes. The more indexes there are, the worse the read and write performance
High cost of maintaining data consistency: the SQL standard defines four transaction isolation levels, from lowest to highest: read uncommitted, read committed, repeatable read, and serializable. The higher the isolation level, the worse the read and write performance
Problems with horizontal scaling: data migration, cross-database joins, and distributed transactions all have to be dealt with once a database is split
Inconvenient changes to the table structure: modifying the table structure requires running DDL, which can lock the table and make some services unavailable
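The first point, the I/O cost of row-oriented storage, is easy to see with a toy model. The sketch below counts how many values have to be loaded to sum one column under each layout. The sample rows and both function names are invented for the illustration, and real storage engines read pages rather than individual values.

```python
rows = [
    {"id": 1, "name": "alice", "city": "Paris", "amount": 10},
    {"id": 2, "name": "bob", "city": "Rome", "amount": 20},
    {"id": 3, "name": "carol", "city": "Oslo", "amount": 30},
]


def sum_amount_row_layout(rows):
    """Row layout: every field of every row is loaded to reach one column."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row comes off storage
        total += row["amount"]
    return total, values_read


# Column layout: the same data, stored one column at a time.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_amount_column_layout(columns):
    """Column layout: only the requested column is loaded."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)
```

Both functions return the same total, but the row layout loads every field of every row while the column layout loads only the `amount` values.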

Non-relational databases

A non-relational database (NoSQL, "Not Only SQL") is a database management system that differs from a traditional relational database. It mainly addresses the need for highly concurrent reads and writes, mass storage, and high scalability.

Advantages of NoSQL databases

High scalability: NoSQL records have no relationships to one another, so the data is easy to spread across nodes
High performance: with no relationships to maintain, NoSQL also achieves high read and write performance
Flexible data model: NoSQL does not require fields to be defined before data is stored, so custom data formats can be stored at any time

Classification of NoSQL databases

Category	Key-value store	Column-family store	Document store	Graph store
Storage structure	Key/value pairs	Column families	JSON-like objects	Graph structure
Application scenarios	Content caching	Distributed data storage and management	Web applications	Relationship graphs
Typical representatives	`Redis`, `Memcached`	`Cassandra`, `HBase`	`MongoDB`, `CouchDB`	`Neo4j`, `InfiniteGraph`
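To make the four storage structures concrete, here is a rough sketch of each one using plain Python containers. It only illustrates the shape of the data. None of this is the API of Redis, HBase, MongoDB, or Neo4j, and every key and value is made up.

```python
# Key-value store: an opaque value looked up by key.
key_value = {"session:42": "alice"}

# Column-family store: row key -> column family -> column -> value.
column_family = {"user1": {"info": {"name": "alice"}, "stats": {"logins": 7}}}

# Document store: one nested JSON-like object per record.
document = {"_id": 1, "name": "alice", "orders": [{"sku": "A1", "qty": 2}]}

# Graph store: nodes plus labelled edges between them.
nodes = {"alice", "bob", "carol"}
edges = [("alice", "FOLLOWS", "bob"), ("bob", "FOLLOWS", "carol")]


def neighbours(node, label):
    """Follow every outgoing edge with the given label (hypothetical helper)."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]
```

The graph model is the only one where the relationships themselves are first-class data, which is why it suits relationship graphs.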

CAP + BASE

NoSQL systems usually run on multiple nodes and rely on BASE theory to handle data consistency

CAP theory

In July 2000, Professor Eric Brewer of the University of California, Berkeley proposed the CAP conjecture at the ACM PODC conference. Two years later, Seth Gilbert and Nancy Lynch of the Massachusetts Institute of Technology proved it formally. Since then, CAP has been an accepted theorem in distributed computing.

A distributed system can satisfy at most two of Consistency, Availability, and Partition tolerance at the same time
- Consistency (strong consistency): once a successful update has been returned to the client, the data on all nodes is the same at the same moment. (Weak consistency and eventual consistency are not covered by the CAP constraint.)
- Availability: the service is always available and responds within a normal time
- Partition tolerance: the loss of messages or the failure of part of the system does not stop the system from operating
Trade-offs among C, A, and P
- CP without A: when a network fault or message loss occurs, service resumes only after all data is consistent again. Distributed storage systems such as Redis and HBase, and distributed coordination components such as ZooKeeper, are the examples usually given for putting data consistency first
- AP without C: when network problems occur, each node can serve only its local data, which leads to global data inconsistency. Many Web applications give up strong consistency and settle for eventual consistency in order to stay highly available (see BASE theory below).
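The difference between the CP and AP choices can be shown with a toy two-replica cluster. This is a sketch under strong simplifying assumptions: the `Cluster` and `Replica` classes are hypothetical, replication is synchronous, and a "partition" is just a flag.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Cluster:
    """Two replicas; `partitioned` means they cannot reach each other."""

    def __init__(self, mode):
        self.mode = mode  # "CP" or "AP"
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse writes that cannot reach every replica.
                raise RuntimeError("unavailable during partition")
            # Availability first: accept the write locally, replicas now diverge.
            replica.data[key] = value
            return
        replica.data[key] = value
        other.data[key] = value
```

In CP mode the cluster stays consistent but rejects the write; in AP mode the write succeeds but the two replicas return different values until they are reconciled.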

BASE theory

BASE theory dates from 2008, when eBay architect Dan Pritchett published it in ACM Queue.

BASE stands for Basically Available, Soft state, and Eventually consistent.
Eventually consistent: all copies of the data in the system reach a consistent state after a period of synchronization
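Eventual consistency can be sketched as replicas that diverge for a while and then converge after a synchronization round. The example below assumes a last-write-wins rule based on a per-key version number, which is only one of several possible reconciliation strategies. The `merge` and `sync` functions and the sample data are invented for the illustration.

```python
def merge(local, remote):
    """Keep, for every key, the (version, value) pair with the higher version."""
    merged = dict(local)
    for key, (version, value) in remote.items():
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, value)
    return merged


def sync(replicas):
    """One synchronization round: every replica ends up with the merged state."""
    combined = {}
    for state in replicas:
        combined = merge(combined, state)
    return [dict(combined) for _ in replicas]


# Before synchronization the two replicas disagree.
replica_a = {"cart": (2, ["book", "pen"])}
replica_b = {"cart": (1, ["book"]), "theme": (1, "dark")}

# After one round they hold the same data.
replica_a, replica_b = sync([replica_a, replica_b])
```

Between the write and the synchronization round a reader may see stale data, which is the window that the "soft state" part of BASE accepts.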

What is HBase?

HBase is a highly available, high-performance, multi-version distributed NoSQL database built on Apache Hadoop. It is an open source implementation of Google Bigtable and provides high-performance random reads and writes over massive data sets.

HBase Storage Structure

The data model

HBase is essentially a key-value database
The key consists of RowKey + ColumnFamily + Column Qualifier + TimeStamp (the version) + KeyType, and the value is the actual stored value
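The composite key can be modelled as a tuple, which also shows how multiple versions of one column coexist. This is a simplified in-memory sketch, not the HBase client API: `CellKey`, `put`, and `get_latest` are hypothetical names, and delete markers are ignored.

```python
from typing import NamedTuple


class CellKey(NamedTuple):
    row: str
    family: str
    qualifier: str
    timestamp: int
    key_type: str  # for example "Put" or "Delete"


cells = {}


def put(row, family, qualifier, timestamp, value):
    """Store one version of one column under its full composite key."""
    cells[CellKey(row, family, qualifier, timestamp, "Put")] = value


def get_latest(row, family, qualifier):
    """Return the value with the highest timestamp for one column, or None."""
    versions = [
        (key.timestamp, value)
        for key, value in cells.items()
        if (key.row, key.family, key.qualifier, key.key_type)
        == (row, family, qualifier, "Put")
    ]
    if not versions:
        return None
    return max(versions, key=lambda item: item[0])[1]
```

Because the timestamp is part of the key, writing the same column twice adds a second entry instead of overwriting the first.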

System architecture

Client: provides the interfaces for accessing HBase and maintains a cache to speed up access
ZooKeeper: stores HBase metadata. The Client obtains the metadata from ZooKeeper to find out which machine to read from or write to
HRegionServer: handles read and write requests from clients and interacts with HDFS. It is the node that does the actual work
HMaster: handles metadata changes and monitors the status of the RegionServers
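The client-side cache described above can be sketched as follows. Everything here is a stand-in: the `REGIONS` table, the server names, and `lookup_in_meta` are hypothetical and only mimic the idea of resolving a row key to a server once and reusing the answer.

```python
import bisect

# Hypothetical region table: (start key, end key, server), sorted by start key.
# An end key of None means "no upper bound".
REGIONS = [("", "g", "rs1"), ("g", "p", "rs2"), ("p", None, "rs3")]


def lookup_in_meta(row_key):
    """Stand-in for the metadata lookup: find the region containing row_key."""
    starts = [start for start, _, _ in REGIONS]
    return REGIONS[bisect.bisect_right(starts, row_key) - 1]


class Client:
    def __init__(self):
        self._cache = []  # regions that have already been located
        self.meta_lookups = 0

    def locate(self, row_key):
        for start, end, server in self._cache:
            if start <= row_key and (end is None or row_key < end):
                return server  # cache hit: no metadata round trip
        self.meta_lookups += 1
        region = lookup_in_meta(row_key)
        self._cache.append(region)
        return region[2]
```

A second request for a row key in an already located region is answered from the cache, so the metadata lookup count does not grow.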

HRegionServer structure

The data in a table is split horizontally into HRegions by RowKey. The HRegion is the smallest unit of distributed storage and load balancing in HBase, and one HRegionServer can host multiple HRegions
The data in an HRegion is split vertically into Stores by ColumnFamily. The Store is HBase's core storage unit and consists of a MemStore and StoreFiles
HBase writes data to the MemStore first. When the MemStore exceeds a certain threshold, the in-memory data is written to disk to form a StoreFile
At the bottom layer, a StoreFile is stored in the HFile format, which is the data format HBase uses on disk
To avoid losing in-memory data if a machine goes down before the MemStore is flushed to disk, every write to the MemStore is also recorded in an HLog
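The write path described above (log, then MemStore, then flush to a StoreFile) can be sketched in memory. This is a simplified model under stated assumptions: the `Store` class and its `flush_threshold` of three entries are hypothetical, StoreFiles are plain sorted lists instead of HFiles, and the log is cleared on every flush, which is cruder than what HBase does.

```python
class Store:
    """One store: a write-ahead log, an in-memory MemStore, and flushed files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.hlog = []        # stand-in for the HLog
        self.memstore = {}    # recent writes held in memory
        self.storefiles = []  # immutable sorted snapshots, newest last

    def put(self, key, value):
        self.hlog.append((key, value))  # log first, so a crash cannot lose it
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        self.storefiles.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.hlog = []  # flushed edits no longer need replaying

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for storefile in reversed(self.storefiles):  # newest file wins
            for stored_key, value in storefile:
                if stored_key == key:
                    return value
        return None

    def recover(self):
        """After a crash the MemStore is gone; rebuild it by replaying the log."""
        self.memstore = dict(self.hlog)
```

Simulating a crash by emptying `memstore` and then calling `recover()` brings back the unflushed writes, which is the whole purpose of logging before writing to memory.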
