Demand for big data has always run hot; it is one of the defining waves of this era. Yet because big data systems are so complex, the industry has periodically heard voices declaring big data dead, and those voices grew louder when MapR was acquired by HPE and Cloudera's stock kept sliding. In fact, the need for big data never went away; what needs rethinking is the traditional way big data systems are built. Containers, by virtue of their standardization and build-once, run-anywhere nature, are very well suited to building and managing big data systems, and container technology is in high demand right now.
1 Huawei Cloud's BigData Pro big data solution wins the industry's annual gold award
On the evening of December 3, 2019, Huawei Cloud's BigData Pro big data solution won the "2019 big data product gold award" at the annual award ceremony of the 2019 China Data and Storage Summit, once again demonstrating Huawei Cloud's strength in the big data field. The China Data and Storage Summit (DSS) is the top technology event in the data and storage field in China. The awards presented by DSS carry real weight, having witnessed the rapid development of China's data storage technology and industry over the past ten years. The selection spans private cloud big data, public cloud big data, big data software, big data solutions, and other fields and dimensions. That Huawei Cloud BigData Pro took the gold medal outright is well deserved.
2 Containerization of big data is the general trend
A large number of big data systems already support running natively on Kubernetes. For example, the official Spark release has been able to run on K8s without any modification since version 2.3, and "running better on K8s" is an important feature direction for subsequent releases. This shows how important K8s has become to big data systems.
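As a sketch of what this looks like in practice, Spark's standard spark-submit can target a K8s cluster directly; the API server address and image name below are placeholders to be adapted to your own environment:

```shell
# Submit the bundled SparkPi example to a Kubernetes cluster.
# <k8s-apiserver> and <your-spark-image> are placeholders.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples.jar
```

In this mode the driver itself runs as a pod and the executors are spawned as pods, so the whole job lifecycle is visible to ordinary kubectl tooling.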
3 Teammates are speeding up, can you feel it?
Because container technology gives big data such a boost, a number of technically strong head players are already putting it into practice.
For example: China Unicom's containerized big data platform practice; JD.com using Kubernetes to manage its big data center; NetEase building its Mammoth big data platform on Kubernetes and Docker; Eggplant Technology running large numbers of production big data tasks directly on K8s; the containerization of Huawei Cloud's DLI service; Alibaba Cloud's Flink on K8s; and so on. All of this public information suggests that the trend is already accelerating. If you are struggling to maintain your big data system, stop and look at what your teammates are doing.
The most immediate advantage of big data on K8s is not a performance increase but cost reduction.
(1) A resource scheduling platform with high utilization. Services originally scattered across multiple clusters can be merged into one unified cluster; long and short tasks can be mixed, and peak shaving and valley filling across the peak hours of different services can push cluster resource utilization to its maximum.
(2) A unified technology stack. The original YARN scheduling and node-management stack pursues the same goal as K8s, today's de facto standard cluster scheduling system. Maintaining both stacks increases engineering labor cost; unifying on one infrastructure stack reduces that cost significantly.
(3) Container automation capability. Standardization is one of the driving forces behind the sustainable development of IT. The core idea of container technology, build once and run anywhere, is standardization itself. Building an operations system through the standardized practice of container technology and the surrounding container ecosystem can greatly reduce the operation and maintenance cost of business systems, and even the cost of building and using the operations tooling itself.
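As one minimal sketch of the long/short task mixing in point (1), assuming a K8s cluster: a low PriorityClass lets batch jobs soak up idle capacity while being preempted whenever higher-priority online services spike. The class name and value here are illustrative, not a prescribed configuration:

```shell
# Create an illustrative low-priority class for batch workloads.
cat <<'EOF' | kubectl apply -f -
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low            # illustrative name
value: 1000                  # lower than the online services' priority class
preemptionPolicy: Never      # batch pods never evict other pods themselves
globalDefault: false
description: "Batch jobs fill the valleys; higher-priority online pods preempt them at peaks."
EOF
```

Batch pods then set `priorityClassName: batch-low` in their spec, so the scheduler evicts them first when online traffic needs the capacity back.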
4 Containers + storage separation: for both speed and cost
Today's big data systems couple compute and storage together, a choice made when distributed architectures were first being built. However, when the community modified HDFS to support erasure coding in Hadoop 3.0, it accepted that the nearest-read (data locality) policy could no longer be supported. This represents a new trend: the ratio of storage capacity to compute power should be flexible, with each side built out independently to suit different scenarios.
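The storage-flexibility point can be made concrete with a back-of-the-envelope calculation comparing classic 3-way replication with the RS(6,3) erasure-coding layout that Hadoop 3.0 ships, used here purely for illustration:

```shell
# Storage overhead = extra bytes kept per byte of user data.

# 3-way replication: every block is stored 3 times.
rep_total=3; rep_data=1
rep_overhead=$(( (rep_total - rep_data) * 100 / rep_data ))   # 200%

# RS(6,3) erasure coding: 6 data blocks + 3 parity blocks,
# while still tolerating the loss of any 3 blocks.
ec_total=9; ec_data=6
ec_overhead=$(( (ec_total - ec_data) * 100 / ec_data ))       # 50%

echo "replication overhead: ${rep_overhead}%"
echo "erasure coding overhead: ${ec_overhead}%"
```

Cutting overhead from 200% to 50% comes at the price of locality-aware reads, which is exactly why decoupled, independently scalable storage becomes attractive.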
"Decoupling computing and storage is proving useful in big data deployments, providing higher resource utilization, greater flexibility and lower costs," the IDC China report noted. This judgment coincides with the architectural transformation that many enterprises are already undertaking.
At the same time, as container technology matures and is widely applied across industries, enterprises increasingly recognize that its advantages can resolve the difficulties big data platforms currently face. With finer granularity, lighter weight, faster deployment, and flexible task scheduling, containers can further improve resource utilization and easily handle capacity expansion under large numbers of concurrent tasks.
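To make the decoupling concrete, one hypothetical sketch: compute comes from the container cluster while the data lives in an S3-compatible object store, wired up through the Hadoop S3A connector. All angle-bracket values are placeholders, and the image is assumed to bundle the S3A jars:

```shell
# Hypothetical storage/compute-separated job: executors scale in K8s,
# input data is read from an S3-compatible object store via s3a://.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<spark-image-with-s3a-jars> \
  --conf spark.hadoop.fs.s3a.endpoint=<object-store-endpoint> \
  --conf spark.hadoop.fs.s3a.access.key=<access-key> \
  --conf spark.hadoop.fs.s3a.secret.key=<secret-key> \
  --class org.apache.spark.examples.JavaWordCount \
  local:///opt/spark/examples/jars/spark-examples.jar \
  s3a://<bucket>/input/
```

Because the bucket's capacity and the executor count scale independently, the storage-to-compute ratio can be tuned per workload rather than fixed at hardware-purchase time.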
5 On top of Kunpeng, Volcano helps out
The Kunpeng processor independently developed by Huawei Cloud offers multi-core, high-concurrency capability and can provide users with compute at various granularities, including bare-metal servers, cloud servers, containers, and Serverless, greatly improving performance in distributed big data scenarios.
Among these, the Kunpeng big data container offers extremely elastic scheduling, able to launch 1,000 containers per second, which reduces the wait for elastic resources and improves computing efficiency. Bare-metal container technology further improves server utilization by significantly reducing virtualization overhead. Container clusters in Serverless mode support unlimited on-demand scaling and can run Spark big data tasks, easily processing PB-scale data jobs.
The Volcano project is an open-source enhanced scheduler for K8s developed by Huawei's container team. It was originally created to address native K8s's lack of gang scheduling. Later, as business domains such as AI and big data placed growing demands on K8s, the team distilled its practical experience in these scenarios into a valuable technical product and contributed it to the community.
Volcano achieves higher container scheduling speed through high-performance scheduling algorithms, and its built-in algorithm plug-ins can greatly improve cluster resource utilization. Volcano also fills the gaps between the native K8s scheduler and the YARN scheduler, such as queue-based resource management, effectively giving big data container solutions wings of fire.
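A minimal Volcano gang-scheduled job might look like the following sketch (the job name, queue, and image are illustrative): `minAvailable` ensures the three worker pods start together or not at all, which is the behavior native K8s lacks:

```shell
# Apply an illustrative gang-scheduled Volcano Job.
cat <<'EOF' | kubectl apply -f -
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: demo-gang-job        # illustrative name
spec:
  schedulerName: volcano     # hand the pods to Volcano, not the default scheduler
  minAvailable: 3            # gang scheduling: run only when all 3 pods fit
  queue: default             # YARN-style queue for resource sharing
  tasks:
    - replicas: 3
      name: worker
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: worker
              image: busybox
              command: ["sh", "-c", "sleep 30"]
EOF
```

Without gang scheduling, a distributed job could start half its pods, hold their resources, and deadlock against another half-started job; `minAvailable` avoids exactly that.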
6 The big data world, cheered on by containers
BigData Pro is the industry's first Kunpeng big data solution. It adopts a storage-compute-separated architecture on the public cloud, using infinitely elastic Kunpeng compute as the computing resource and the OBS object storage service, with native multi-protocol support, as a unified storage data lake. It delivers a new public cloud big data solution of "storage-compute separation, extreme elasticity, extreme efficiency", greatly improving the resource utilization of big data clusters, effectively addressing the bottlenecks in the current big data industry, helping enterprises meet the new challenges of the 5G + cloud + intelligence era, and enabling their intelligent transformation and upgrading.
The Kunpeng big data container solution, an important member of the BigData Pro family, provides a complete containerized big data offering.