Before we systematically study big data, we should first understand the platform on which big data development takes place. That means learning Linux before anything else, and this part is particularly important.

1. Linux learning

Learn how to install a Linux image in VMware —— Get to know the Linux desktop environment and the shell environment —— Perform file system operations in the shell; master more than 20 common commands such as more, touch, cp, mv, and rm —— Learn Linux network management; master IP and hostname settings —— Learn VMware-to-Linux communication settings; master the host-only, bridged, and NAT network connection modes —— Learn Linux process management; master how to view and kill processes —— Learn Linux software management; master installing Java and MySQL —— Learn environment variable configuration; master how to set environment variables —— Learn SSH management on Linux; master how to set up password-free login —— Learn Linux firewall management; master how to disable the firewall and open specific ports —— Learn Linux job scheduling; master how to use crontab

These cover most of the Linux learning points; only by mastering this part can you pick up the later topics with ease.
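To make the process-management and environment-variable items above concrete, here is a minimal Java sketch; the `ps` command it launches and the presence of JAVA_HOME are assumptions, not requirements:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class LinuxBasicsDemo {
    public static void main(String[] args) throws Exception {
        // Read an environment variable (assumes JAVA_HOME has been set)
        System.out.println("JAVA_HOME = " + System.getenv("JAVA_HOME"));

        // Launch a shell command as a child process and print its output,
        // mirroring the "view processes" exercise from the list above
        Process p = new ProcessBuilder("bash", "-c", "ps -ef | head -n 5").start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            r.lines().forEach(System.out::println);
        }
        System.out.println("exit code: " + p.waitFor());
    }
}
```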

2. Hadoop learning

Pseudo-distributed experiment environment —— HDFS architecture, plus the shell and Java ways of operating it —— MapReduce architecture and its various algorithms

Of course, there are many more Hadoop knowledge points than these, and they are the core of the whole curriculum; students who want to learn systematically can follow a full, systematic learning path.
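As a taste of the HDFS Java operation mode mentioned above, here is a minimal sketch; the NameNode address `hdfs://localhost:9000` and the file paths are assumptions for a typical pseudo-distributed setup:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDemo {
    public static void main(String[] args) throws Exception {
        // Connect to the (assumed) pseudo-distributed NameNode
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), new Configuration());
        Path dir = new Path("/demo");
        if (!fs.exists(dir)) {
            fs.mkdirs(dir);
        }
        // Copy a local file into HDFS, then list the directory
        fs.copyFromLocalFile(new Path("/tmp/sample.txt"), new Path("/demo/sample.txt"));
        for (FileStatus st : fs.listStatus(dir)) {
            System.out.println(st.getPath() + "  " + st.getLen() + " bytes");
        }
        fs.close();
    }
}
```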

3. Zookeeper learning

What is Zookeeper? —— Setting up a Zookeeper cluster environment —— How to operate Zookeeper from the CLI —— How to operate Zookeeper from Java
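For the Java operation point, a minimal sketch follows; it assumes a ZooKeeper server on `localhost:2181`, and the `/demo` znode is purely illustrative:

```java
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // Connect to the (assumed) server and wait until the session is live
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();
        // Create a znode, read it back, then delete it
        zk.create("/demo", "hello".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        System.out.println(new String(zk.getData("/demo", false, null)));
        zk.delete("/demo", -1);
        zk.close();
    }
}
```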

4. HBase learning

HBase overview —— HBase data model —— HBase table design —— HBase pseudo-distributed and cluster installation —— HBase shell operations —— HBase Java API operations —— HBase data migration —— HBase data backup and recovery —— Using HBase together with Hive —— HBase cluster management —— HBase performance tuning
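To illustrate the Java API point, here is a minimal sketch; it assumes an `hbase-site.xml` on the classpath and an existing table `user` with a column family `info` (both hypothetical):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseDemo {
    public static void main(String[] args) throws Exception {
        // Connection settings come from hbase-site.xml on the classpath
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("user"))) {
            // Write one cell: row "row1", family "info", qualifier "name"
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
            table.put(put);
            // Read the cell back
            byte[] value = table.get(new Get(Bytes.toBytes("row1")))
                    .getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(value));
        }
    }
}
```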

5. CM+CDH cluster management learning

CM + CDH cluster installation —— Basic management of CM hosts and the various service components —— CDH cluster configuration and parameter tuning —— HA configuration and upgrades for a CDH cluster —— CM monitoring and management —— Points to note in cluster management

6. Hive learning

Hive supported data types —— Hive data management —— Hive queries —— Hive functions —— Hive file formats —— Hands-on project
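One common way to issue the Hive queries above from code is HiveServer2's JDBC driver; this sketch assumes HiveServer2 on `localhost:10000` and a hypothetical `logs` table:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveDemo {
    public static void main(String[] args) throws Exception {
        // Register the HiveServer2 JDBC driver (hive-jdbc must be on the classpath)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             // Aggregate over the hypothetical `logs` table
             ResultSet rs = stmt.executeQuery(
                     "SELECT level, COUNT(*) FROM logs GROUP BY level")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```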

7. Flume learning

Flume architecture —— Flume agent configuration —— How Flume dynamically monitors file changes in a folder —— How Flume imports data into HDFS —— How Flume dynamically monitors log file changes and imports them into HDFS
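A minimal agent configuration sketch for the folder-to-HDFS flow above; the agent name `a1`, the spool directory, and the HDFS path are all assumptions:

```properties
# Agent "a1": spooling-directory source -> memory channel -> HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Watch a folder for newly arrived files (directory is an assumption)
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/app
a1.sources.r1.channels = c1

# Buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Write events into HDFS, partitioned by day (NameNode address is an assumption)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```

A configuration like this is typically started with `flume-ng agent --conf-file <file> --name a1`.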

The knowledge points above can be said to be the most important links in systematic big data learning, and together they form one major stage of the curriculum. Of course, beyond them there is still a lot of knowledge to learn.

Such as:

Machine learning knowledge: R language —— Mahout

Storm stream computing: Kafka —— Storm —— Redis

Spark in-memory computing: Scala programming —— Spark Core —— Spark SQL —— Spark Streaming —— Spark MLlib —— Spark GraphX —— Python machine learning —— Spark Python programming (a minimal Spark Core sketch in Java appears at the end of this section)

Cloud computing platforms: Docker —— KVM —— OpenStack

And so on…
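As the sketch referenced in the Spark item above: a word count against Spark Core's Java API, assuming a local master and a hypothetical input file:

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        // Run locally on all cores; a real cluster master would replace local[*]
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("/tmp/input.txt"); // hypothetical input path
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);
            counts.collect().forEach(t -> System.out.println(t._1() + "\t" + t._2()));
        }
    }
}
```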