A 2021 quality blog series on big data that takes you from beginner to expert. The blog is updated every day, gradually building up a complete big data knowledge system to help you learn more efficiently.

If you are interested in big data, you can follow the WeChat public account sanbang Big Data.

Directory

A brief history of Hadoop

Hadoop features and advantages

A brief history of Hadoop

Hadoop was created by Doug Cutting, the founder of Apache Lucene. It grew out of Nutch, which was originally a subproject of Lucene. Nutch was designed to be a large-scale, full-web search engine, covering web crawling, indexing, querying, and so on, but as the number of crawled pages grew it ran into a serious scalability problem: how to store and index billions of web pages.

In 2003, Google published a paper that offered a possible solution to this problem. The paper described Google's distributed file system, the Google File System (GFS), which met Google's storage needs for the large files generated while crawling and indexing the web.

In 2004, Google published a paper introducing the world to its version of MapReduce.

At the same time, based on Google's papers, the Nutch developers completed corresponding open-source implementations of HDFS and MapReduce, which were then spun off from Nutch into an independent project, Hadoop. In January 2008, Hadoop became a top-level Apache project and entered a period of rapid development.

In 2006, Google published a paper on BigTable, which inspired the development of HBase.

It is clear, then, that Hadoop and its ecosystem could not have grown without Google's contributions.

Hadoop features and advantages

Scalable: Hadoop distributes data and computation across clusters of available machines, and these clusters can easily be scaled to thousands of nodes.

Economical: Hadoop distributes and processes data on clusters built from ordinary, inexpensive commodity machines, keeping costs low.

Efficient: Because data is distributed, Hadoop can move and process it in parallel across the nodes of the cluster, which makes processing very fast.

Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks when a task fails, so its ability to store and process data can be trusted.
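To make the reliability point a bit more concrete, here is a minimal sketch (not from the original post) that uses the standard Hadoop Java client API to request a specific block replication factor when writing a file to HDFS; the path /tmp/replication-demo.txt and the replication value of 2 are just example choices.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        // Standard Hadoop client configuration; it picks up core-site.xml /
        // hdfs-site.xml from the classpath if they are present.
        Configuration conf = new Configuration();

        // dfs.replication controls how many copies of each block HDFS keeps
        // (the default is 3); here we request 2 copies purely as an example.
        conf.set("dfs.replication", "2");

        FileSystem fs = FileSystem.get(conf);

        // Hypothetical example path; adjust to your cluster's layout.
        Path path = new Path("/tmp/replication-demo.txt");
        try (FSDataOutputStream out = fs.create(path)) {
            out.writeUTF("Hadoop keeps multiple copies of each block for fault tolerance.");
        }

        // Report the replication factor HDFS actually assigned to the file.
        short replication = fs.getFileStatus(path).getReplication();
        System.out.println("Replication factor of " + path + ": " + replication);
    }
}
```

The same setting can also be applied cluster-wide through the dfs.replication property in hdfs-site.xml, or changed for an existing file with the `hdfs dfs -setrep` command.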

  • 📢 : lansonli.blog.csdn.net
  • 📢 Welcome to like 👍, collect ⭐, and leave a comment 📝; if there is an error, please point it out!
  • 📢 This article was originally written by Lansonli and first appeared on the CSDN blog 🙉
  • 📢 The big data series of articles is updated every day. When you stop to rest, don't forget that others are still running; I hope we all seize the time to learn and strive for a better life ✨