
Three waves of informationization

| Wave | Approximate time | Marker | Problem solved | Representative companies |
| --- | --- | --- | --- | --- |
| First wave | Around 1980 | Personal computer | Information processing | Intel, AMD, IBM |
| Second wave | Around 1995 | The Internet | Information transmission | Yahoo, Google, Alibaba |
| Third wave | Around 2010 | Internet of Things, cloud computing, big data | Information explosion | Amazon, Google, Aliyun |

 

Characteristics of big data

  • Large volume of data (volume)
  • Wide variety of data types (variety)
  • Fast processing speed (velocity)
  • Low value density (value)
  • Veracity

Final storage location of HDFS data blocks

The blocks of a file are ultimately stored on the local disks of the DataNodes.
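To see this for a concrete file, `hdfs fsck` can list each block together with the DataNodes holding its replicas (the path below is illustrative):

```shell
# List each block of a file and the DataNodes storing its replicas
hdfs fsck /user/hadoop/input/data.txt -files -blocks -locations
```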

The role of the Master (primary server)

The Master server manages tables and Regions:

  • Handles user operations on tables, such as create, delete, alter, and query
  • Implements load balancing across Region servers
  • Adjusts the distribution of Regions after a Region is split or merged
  • Migrates the Regions of a failed Region server
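These Master-mediated operations can be tried from the HBase shell (started with bin/hbase shell; the table and column-family names below are illustrative):

```shell
# Inside the HBase shell; all of these operations go through the Master
create 'student', 'info', 'score'   # create a table with two column families
alter 'student', NAME => 'extra'    # add a column family
disable 'student'                   # a table must be disabled before dropping
drop 'student'                      # delete the table
balancer                            # manually trigger Region load balancing
```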

The role of the Region server

A Region server maintains the Regions assigned to it by the Master, handles the I/O requests for those Regions, and splits Regions that grow too large during operation.
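Splits normally happen automatically once a Region grows past its size threshold, but one can also be requested by hand from the HBase shell ('student' is an illustrative table name):

```shell
split 'student'   # ask the Region server to split this table's Region(s)
```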


HBase

HBase is a highly reliable, high-performance, scalable, distributed column-oriented database that supports real-time reads and writes; it generally uses HDFS as its underlying data storage.

HBase Access Interface

Native Java API. Features: the most conventional and efficient access method. Scenarios: suitable for parallel batch processing of HBase table data in Hadoop MapReduce jobs.

HBase Shell. Features: HBase's command-line tool and the simplest interface. Scenarios: suitable for HBase management and maintenance.
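A minimal HBase shell session, assuming a running HBase and an illustrative table named 'student':

```shell
create 'student', 'info'                     # table with one column family
put 'student', 'row1', 'info:name', 'Alice'  # write one cell
get 'student', 'row1'                        # read a single row
scan 'student'                               # scan the whole table
```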

HBase Programming Practice

 

Format the NameNode: ./bin/hdfs namenode -format

Create a directory: hadoop fs -mkdir [-p] <path> (-p creates parent directories as needed)
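A minimal first-run sketch, assuming a freshly formatted pseudo-distributed installation (the /user/hadoop paths are illustrative):

```shell
./sbin/start-dfs.sh                          # start the NameNode and DataNode
hadoop fs -mkdir -p /user/hadoop/input       # -p also creates missing parents
hadoop fs -put etc/hadoop/core-site.xml /user/hadoop/input   # upload a local file
hadoop fs -ls /user/hadoop/input             # verify
```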

Within a Region, each Store corresponds to one column family.

Hadoop core configuration files

core-site.xml

```xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
```

hadoop.tmp.dir: the base directory for temporary files generated while Hadoop is running

fs.defaultFS: the URI of the default file system
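Once HDFS is running, the effective value of a configuration key can be checked with `hdfs getconf`:

```shell
hdfs getconf -confKey fs.defaultFS   # should print hdfs://localhost:9000
```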

hdfs-site.xml



```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
```

dfs.replication: the number of replicas kept for each block; 1 is appropriate for a single-node (pseudo-distributed) setup

dfs.namenode.name.dir: the local disk directory where the NameNode stores its metadata (the fsimage file)

dfs.datanode.data.dir: the local disk directory where the DataNode stores data blocks
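Both the configured default and the replication of an individual file can be inspected from the command line (the file path is illustrative):

```shell
hdfs getconf -confKey dfs.replication              # configured default: 1
hadoop fs -stat "%r" /user/hadoop/input/data.txt   # actual replication of a file
```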

Characteristics of Hadoop

  • High reliability
  • High efficiency
  • High scalability
  • High fault tolerance
  • Low cost
  • Runs on the Linux platform
  • Supports multiple programming languages

Name nodes and data nodes

| NameNode | DataNode |
| --- | --- |
| Stores metadata | Stores file contents |
| Metadata is kept in memory | File contents are kept on disk |
| Maintains the mapping from file blocks to DataNodes | Maintains the mapping from blocks to local files on the DataNode |
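This division of labor is visible in the output of `hdfs dfsadmin -report`, which prints cluster-wide capacity from the NameNode followed by the status of each DataNode:

```shell
hdfs dfsadmin -report
```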

HBase functional components

  • Library functions (linked into every client)
  • One Master server
  • Multiple Region servers
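The Master and Region servers of a running deployment can be summarized from the HBase shell:

```shell
status             # counts of active/backup masters, Region servers, average load
status 'detailed'  # per-Region-server detail
```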

Cloud computing

· Cloud computing has three typical service models: IaaS (Infrastructure as a Service), PaaS (Platform as a Service), and SaaS (Software as a Service). Addendum: DaaS (Data as a Service).

· Cloud computing has three deployment types: public cloud, private cloud, and hybrid cloud.

· Key technologies of cloud computing: virtualization, distributed storage, distributed computing, multi-tenancy, etc.

· The concept of cloud computing: cloud computing provides scalable, inexpensive distributed computing capability over the network; wherever users have network access, they can obtain whatever IT resources they need. It is the most representative network computing technology and model of recent years, standing for a dynamic, scalable network application infrastructure with virtualization as its core technology and low cost as its goal.

The Internet of things

  • The Internet of Things has four layers: the perception layer, the network layer, the processing layer, and the application layer.

  • Connections among big data, cloud computing, and the Internet of Things: cloud computing provides the technical foundation for big data, while big data gives cloud computing a place to prove its worth; the Internet of Things is a major source of big data, while big data technology supports the analysis of IoT data; cloud computing provides the Internet of Things with massive data storage capacity, while the Internet of Things offers cloud computing a broad space of applications.

  • Differences among big data, cloud computing, and the Internet of Things: big data focuses on storing, processing, and analyzing massive data, discovering value in it to serve production and daily life; cloud computing essentially aims to integrate and optimize IT resources and deliver them cheaply to users as services over the network; the development goal of the Internet of Things is to connect things, and application innovation is the core of its development.