Issue 2: Big data Q&A summary, watch for updates
A: Learning anything works the same way: there is a bit of a hurdle at the beginning. I love reading books, especially ones that are easy to get started with. As for big data, my own research direction is machine learning applied to large-scale data, so I need to master the following basic concepts first. Of course, if you already have calculus, linear algebra, and probability theory, you can basically get into machine learning. I highly recommend a few books.
Big Data Interview (Hadoop)
1. How does the HA NameNode work? Main responsibilities of the ZKFailoverController: 1) Health monitoring: it periodically sends health-check commands to the NameNode (NN) it monitors to determine whether that NameNode is healthy; if the machine...
Hadoop HA
HA stands for high availability. An HA cluster is an effective way to ensure service continuity. It generally consists of two or more nodes, divided into active and standby nodes. The node currently serving business traffic is called the active node, and the node that backs it up is called the standby node. When a problem occurs on the active node, the running business (...
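The active/standby idea can be sketched with a toy selection function. This is only an illustration of the general pattern, not how Hadoop implements failover: the function name `pick_active`, the node names `nn1` and `nn2`, and the `healthy`/`down` states are all hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of active/standby failover (hypothetical names, not Hadoop code).
# Usage: pick_active CURRENT_ACTIVE node=state [node=state ...]
# Prints the node that should hold the active role, or "none".
pick_active() {
  local current="$1"; shift
  local pair node state first_healthy=""
  for pair in "$@"; do
    node="${pair%%=*}"
    state="${pair#*=}"
    if [ "$state" = "healthy" ]; then
      # The current active node keeps its role while it stays healthy.
      if [ "$node" = "$current" ]; then
        echo "$node"
        return 0
      fi
      # Otherwise remember the first healthy standby as the candidate.
      if [ -z "$first_healthy" ]; then
        first_healthy="$node"
      fi
    fi
  done
  if [ -n "$first_healthy" ]; then
    echo "$first_healthy"
    return 0
  fi
  echo "none"
  return 1
}

pick_active nn1 nn1=healthy nn2=healthy   # prints nn1
pick_active nn1 nn1=down nn2=healthy      # prints nn2
```

A real HA setup also has to prevent two nodes from acting as active at the same time; this sketch leaves that out.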
Hadoop - Single-node pseudo-distributed construction
$(/usr/libexec/java_home); I use zsh, so I changed .zshrc; don't forget to source it. 3. Pseudo-distributed configuration: dfs.replication, HDFS file save...
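A minimal sketch of the two steps mentioned in this excerpt, under stated assumptions: `/usr/libexec/java_home` is the macOS helper that prints the default JDK path, the replication value `1` is the usual choice for a single-machine setup rather than something the excerpt states, and `CONF_DIR` is a hypothetical scratch location used so that no real Hadoop configuration is overwritten.

```shell
#!/usr/bin/env bash
# On macOS, /usr/libexec/java_home prints the path of the default JDK.
# The guard keeps this snippet harmless on systems without that helper.
if [ -x /usr/libexec/java_home ]; then
  export JAVA_HOME="$(/usr/libexec/java_home)"
fi

# CONF_DIR is a hypothetical scratch directory (assumption for this sketch).
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# With a single machine holding every block, one replica is enough (assumed value).
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/hdfs-site.xml"
```

For zsh, the `export JAVA_HOME=...` line would go into `~/.zshrc`, followed by `source ~/.zshrc` so the current shell picks it up.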
Pseudo-distributed Hadoop configuration
Hadoop itself is a distributed application, but most of the time there is no need to build a cluster for simple testing. Pseudo-distributed mode essentially means configuring a standalone installation of Hadoop
Setting up pseudo-distributed Hadoop 2.7.3 on Aliyun CentOS 7.2 in practice
Use >> instead of >, because if other hosts (such as host A) also need passwordless login, you can append host A's public key to the authorized_keys file as well. Host A can then SSH to this machine without a password. Go to the JAR file directory and execute the following command.
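The difference between `>>` and `>` can be checked safely in a scratch directory. The key strings below are placeholders, not real keys, and the file names are made up for this sketch; nothing under `~/.ssh` is touched.

```shell
#!/usr/bin/env bash
# Scratch directory and placeholder keys (hypothetical values for illustration).
work="$(mktemp -d)"
echo "ssh-rsa PLACEHOLDER_LOCAL_KEY user@local" > "$work/local.pub"
echo "ssh-rsa PLACEHOLDER_HOSTA_KEY user@hostA" > "$work/hostA.pub"

# '>>' appends: each key is added and earlier entries survive.
cat "$work/local.pub" >> "$work/authorized_keys"
cat "$work/hostA.pub" >> "$work/authorized_keys"
key_count="$(wc -l < "$work/authorized_keys" | tr -d ' ')"
echo "keys after appending: $key_count"

# '>' truncates the file first: only the most recent key remains.
cat "$work/local.pub" > "$work/overwritten"
cat "$work/hostA.pub" > "$work/overwritten"
overwritten_count="$(wc -l < "$work/overwritten" | tr -d ' ')"
echo "keys after overwriting: $overwritten_count"
```

On a real machine the target would be `~/.ssh/authorized_keys`, which is commonly restricted with `chmod 600` so that sshd accepts it.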
[Performance Optimization Secrets] How Hadoop can speed up uploads of large terabyte-scale files by 100x
In our last article, we discussed how the NameNode in Hadoop writes the edits log. When the edits log is written to disk and over the network, segmented locking and double buffering greatly improve its throughput, which lets the NameNode support highly concurrent access. If you haven't read that article, you can...
Hadoop Enterprise Production Tuning Manual (Part 2)
"This is the 27th day of my participation in the November Writing Challenge. See details of the event: 2021 Final Writing Challenge". Previous section: Hadoop Enterprise Production Tuning Manual (Part 1). V. HDFS Storage Optimization. Note: demonstrating erasure coding and heterogeneous storage requires five VMs in total. As far as possible...
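To make the erasure-coding trade-off concrete, here is a small arithmetic sketch. It assumes Reed-Solomon layouts described by K data units and M parity units, as in Hadoop 3's built-in RS-3-2 and RS-6-3 policies; the helper names `ec_overhead` and `ec_width` are hypothetical. The excerpt does not say which policy it uses, so reading the five VMs as one stripe of 3 data plus 2 parity units is an assumption.

```shell
#!/usr/bin/env bash
# ec_overhead K M: extra storage as a percentage of the raw data size
# (integer division, so the result is rounded down).
ec_overhead() {
  local k="$1" m="$2"
  echo $(( 100 * m / k ))
}

# ec_width K M: number of storage units that one stripe spans.
ec_width() {
  local k="$1" m="$2"
  echo $(( k + m ))
}

echo "RS-3-2: width $(ec_width 3 2), overhead $(ec_overhead 3 2)%"
echo "RS-6-3: width $(ec_width 6 3), overhead $(ec_overhead 6 3)%"
# Three-way replication keeps two extra copies of every block.
echo "3x replication: overhead $(( 100 * 2 / 1 ))%"
```

The comparison shows why erasure coding is attractive for cold data: it tolerates lost units with far less extra storage than keeping three full copies.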
DKHadoop installation and configuration tutorial and FAQ solutions
Last week I wrote two articles, one on preparing for the DKHadoop installation and one on configuring the server operating system. This is my first attempt at a systematic series, so there are bound to be omissions, for which I apologize. Today I am sharing the DKHadoop installation and FAQ solutions. Step: Run the following command. 3. If the Hue page cannot be opened, the system displays...
Summary of Hadoop project development cases and solutions
Big data and Hadoop application development are in full swing. Big data is no longer limited to the Internet field; it has been elevated to the level of national strategy, and it is profoundly changing the way we live and work. Hadoop application development is too low-level and difficult for most of us to understand. Some people say, isn't it all code flipping...