A high-quality big data blog for 2021 that takes you from beginner to mastery. The blog is updated every day and gradually builds out a complete big data knowledge system to help you learn more efficiently.

If you are interested in big data, you can follow the WeChat official account: Sanbang Big Data

Introduction to Hadoop

Hadoop is an open-source software framework written in Java and maintained under Apache. It is a software platform for developing and running large-scale data processing applications, and it allows distributed processing of large data sets across large clusters of computers using a simple programming model.

In a narrow sense, Hadoop refers to Apache Hadoop itself, an open-source framework whose core components are:

HDFS (distributed file system): provides storage for massive data

MapReduce (distributed computing programming framework): handles computation over massive data

YARN (framework for job scheduling and cluster resource management): handles resource and task scheduling
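
To make the "simple programming model" of MapReduce more concrete, here is a minimal sketch of a word count in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. It runs in a single process and only imitates the map, shuffle, and reduce steps that a real cluster would distribute across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key.

    In a real cluster the framework performs this step between map and
    reduce; here it is just an in-memory dictionary (an assumption made
    for illustration).
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values that share one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = [
        "Hadoop stores data in HDFS",
        "Hadoop processes data with MapReduce",
    ]
    print(word_count(sample))
```

The point of the split is that the user only writes the map and reduce logic, while storage (HDFS) and resource scheduling (YARN) are handled by the platform.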


In a broad sense, Hadoop usually refers to a wider concept: the Hadoop ecosystem.


Hadoop has since grown into a large system. As the ecosystem has expanded, more and more projects have emerged, some of them not led by Apache, that either complement Hadoop or provide higher-level abstractions on top of it. For example:

Framework    Purpose
HDFS         Distributed file system
MapReduce    Distributed computing program development framework
ZooKeeper    Base component for distributed coordination services
Hive         Distributed data warehouse based on Hadoop, providing SQL-based data queries
Flume        Log data collection framework
Oozie        Workflow scheduling framework
Sqoop        Data import and export tool (for example, between MySQL and HDFS)
Impala       Real-time SQL query analysis based on Hive
Mahout       Machine learning algorithm library based on distributed computing frameworks such as MapReduce, Spark, and Flink
  • 📒 Blog homepage: lansonli.blog.csdn.net
  • 📒 You are welcome to like 👍, bookmark ⭐, and leave a comment 📝. If you find any errors, please point them out!
  • 📒 This post was written by Lansonli and originally published on the CSDN blog 🙉
  • 📒 The big data series is updated every day. When you stop to rest, don't forget that others are still running. I hope you can seize the time to learn and strive for a better life ✨