Kaiyuan Big Data Weekly - Issue 9 - Moment For Technology

Abstract:

Aliyun E-MapReduce updates

E-MapReduce team

Version 1.3.2 (released):

Master node high availability (HA)

Version 1.3.3 (coming soon):

Commercial launch

Version 1.4 (under development):

Custom alerts for user execution plans and cluster running status
A dashboard showing overall cluster performance
Expert recommendations for clusters, for example capacity expansion warnings
One-click cluster restart

News

Cainiao ventures into the arena: with logistics plus big data, can it save the equivalent of Guangdong's and Jiangsu's GDP? Where is Chinese logistics headed? Cainiao believes that by promoting the smart transformation of the logistics industry, the total cost of logistics can eventually be reduced to 5% of China’s GDP. That would be a significant contribution of the new logistics model to society, equivalent to saving the combined GDP of Guangdong and Jiangsu provinces every year.

Big data, the future is here. The other day, at a big data conference in Chengdu, the top experts all raised the idea that big data is far from mature. I agree. Big data is still a frontier science today, and it remains immature in many industries and market segments. But that doesn’t mean we have to wait and do nothing. On the contrary, in a few niches, at a few specific “points”, I have seen sparks, no different from those in other emerging markets I have experienced. I firmly believe that these sparks will start a prairie fire. In 2016, for big data, the future is here.

It took a decade of research to decipher human DNA for the first time. Now, 13 years later, it can be done in 24 hours. Our data processing tools keep improving, and the amount of data has exploded over the past decade. So is there still room for innovation? Will the future still bring novel revelations that surprise us? We need not guess. Let’s look at how the top experts in data science see big data in the next decade, and how they think it will change the world.

There is no doubt that the era of big data has arrived; it is quietly and irresistibly changing how people behave and think. While computer science, e-commerce, and other fields have made remarkable progress in developing and applying big data technology, how should statistics, which takes data as its object of study, respond? With indifference, or by blindly following? The right attitude is to treat it rationally, follow it actively, change established ways of thinking, and seek development.

When Hadoop runs on the cloud, many people worry about performance. Because virtualization is assumed to carry a cost, they jump to the biased conclusion that running on the cloud must be worse than running on physical machines. If you virtualize 10 separate physical machines yourself and run Hadoop on them, there is definitely some performance overhead. On a public cloud this is not the case, because the cost of virtualization is ultimately borne by the platform. First, the platform has economies of scale when purchasing machines; second, the platform can oversell some resources while still guaranteeing the performance of virtual machines.

Spark Performance Tuning Guide (Advanced): an in-depth analysis of data skew tuning and shuffle tuning, aimed at solving trickier performance problems.
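The item above names data skew tuning without describing a technique. One widely used remedy for a skewed aggregation is key salting with two-stage aggregation. The following is a minimal plain-Python sketch of that idea, not code from the guide; the function name `salted_count` and the sample data are made up for illustration, and a real Spark job would express the same two stages with its own aggregation operators.

```python
import random
from collections import Counter


def salted_count(records, num_salts=4, seed=0):
    """Count records per key in two stages to spread out a hot key.

    Stage 1 counts per (key, salt) pair, so a single hot key is split
    into up to `num_salts` smaller groups. Stage 2 drops the salt and
    merges the partial counts into the final per-key totals.
    """
    rng = random.Random(seed)

    # Stage 1: partial aggregation on salted keys.
    partial = Counter()
    for key in records:
        salt = rng.randrange(num_salts)
        partial[(key, salt)] += 1

    # Stage 2: strip the salt and merge the partial results.
    total = Counter()
    for (key, _salt), count in partial.items():
        total[key] += count

    return partial, total


# Hypothetical skewed input: one key dominates the data set.
records = ["hot"] * 1000 + ["a", "b", "c"]
partial, total = salted_count(records)
```

The final totals are the same as an unsalted count, but no single group in the first stage has to process every record of the hot key.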

Sqoop is a solid bridge for extracting and transferring data between Hadoop and relational databases. It can import and export data between relational databases and the storage systems of the Hadoop ecosystem (HDFS, Hive, HBase).
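As a rough illustration of the import direction, the sketch below assembles a `sqoop import` command line in Python. The helper `build_sqoop_import` is hypothetical, and the JDBC URL, user, table, and HDFS path are placeholder values; only the Sqoop 1 options `--connect`, `--username`, `--table`, `--target-dir`, and `--num-mappers` are real command-line flags.

```python
def build_sqoop_import(jdbc_url, username, table, target_dir, num_mappers=4):
    """Return the argument list for a basic Sqoop 1 table import into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source database
        "--username", username,             # database user
        "--table", table,                   # source table to import
        "--target-dir", target_dir,         # destination directory in HDFS
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


# Placeholder values, for illustration only.
cmd = build_sqoop_import(
    jdbc_url="jdbc:mysql://db.example.com/shop",
    username="etl",
    table="orders",
    target_dir="/user/etl/orders",
)
print(" ".join(cmd))
```

The list could be passed to `subprocess.run` on a machine where Sqoop is installed; password handling (for example the `-P` prompt option) is left out of the sketch.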

A year on, Apache Pig 0.16.0 has been released, with Pig on Tez support.
