Introduction

Huya is the first publicly listed game livestreaming company in China. Its products include the well-known game livestreaming platform Huya Live and NimoTV, a game livestreaming platform popular in Southeast Asia and South America. The products cover PC, web, and mobile. Huya Live alone has reached 150 million monthly active users.

An important mission that the Huya Big Data team (hereinafter referred to as Huya) has long been exploring, and will continue to explore, is how to use massive business data to connect the platform’s high-quality content with end users more intelligently and efficiently, and how to give the company’s operations and business development more effective data support. To pursue this vision, Huya chose to work with the Tencent Cloud EMR team and adopt a cloud big data solution.

This article gives an in-depth look at Huya’s cloud big data practice through a case study.

1. Big data analysis scenarios at Huya Live

1. Big data scenario

Recognizing the value of data to the business, Huya established a professional big data team of nearly 100 members early on. The team combines strong data engineering skills with business understanding, and handles efficient storage and computation of massive data, algorithm development, business insight, and related work.

After years of building, the team has made remarkable progress in all areas around data, truly making data the catalyst for connecting quality content to end users.

Building on the open Hadoop technology stack, the Huya Big Data team quickly built a robust company-wide big data platform. It supports efficient storage and computing of offline and real-time streaming data at a scale of nearly 100 PB, as well as data science exploration. The team also explores data applications such as accurate content recommendation, business analysis, and user experience improvement.

With this company-wide platform, every Huya business line can access its data quickly and at low cost, and can keep evolving its technology on top of the platform.

End users get timely, personalized, high-quality content closely matched to their interests (esports, console games, mobile games, food, anime, and so on) and enjoy an immersive experience.

Streamers, as content providers, can also analyze their own broadcast data and adjust their style and content to attract more viewers.

2. Challenges of big data analysis

As data grows at scale and business demands rise, the requirements placed on data tools become more practical. Over time, the traditional self-built big data analysis platform in the IDC has shown two problems: slow response and high cost.

1. Response timeliness challenge

The first challenge is responding in time to urgent, unplanned tasks. Routine tasks can be grouped by schedule into monthly, weekly, daily, hourly, and minute-level jobs. These jobs are spread evenly across the big data analysis platform, so the platform load stays within a reasonable (fairly saturated) range for long periods and makes good use of expensive IDC hardware.

But as big data analysis plays an increasingly important role in uncovering business value, urgent and newly added analysis tasks become more frequent. At that point, hardware whose utilization is already near saturation becomes the bottleneck. Getting from an equipment budget request to new machines joining the analysis cluster usually takes up to two weeks, which often delays results.

A related problem is that reserving more hardware in advance lowers cost-effectiveness.

2. Cost challenge

The second challenge is the cost of storing cold data. Over time, more and more data becomes historical: it occupies the same hardware resources while being accessed less and less. Reducing the storage cost of cold data while still being able to analyze it quickly when needed is a challenging problem.
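One common way to reason about cold data is to classify datasets by how long ago they were last accessed. The following is a minimal sketch of that idea, not Huya's actual policy: the function name `storage_tier` and both thresholds are assumptions chosen only for illustration.

```python
from datetime import date

# Hypothetical thresholds in days since last access; not taken from the article.
WARM_AFTER_DAYS = 30
COLD_AFTER_DAYS = 180


def storage_tier(last_accessed: date, today: date) -> str:
    """Classify a dataset as 'hot', 'warm', or 'cold' by days since last access."""
    age_days = (today - last_accessed).days
    if age_days >= COLD_AFTER_DAYS:
        return "cold"
    if age_days >= WARM_AFTER_DAYS:
        return "warm"
    return "hot"
```

Datasets that fall into the warm or cold tiers would be the candidates for cheaper storage, while hot data stays close to the compute cluster.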

3. Cloud big data solutions

To address these challenges and bottlenecks in big data analysis, the Huya Big Data team keeps exploring solutions that better fit the actual needs of the business. Over years of development, the team has accumulated rich experience in big data analysis for livestreaming. While benefiting from the flexible, open, and rich products and services of the cloud platform, the Huya Big Data team is also working with the Tencent Cloud big data team to plan general-purpose, industry-oriented open source solutions, so that cloud vendors and Internet companies can combine their big data technologies and jointly promote the evolution of big data technologies and industry solutions.

Recently, an urgent task gave the Huya Big Data team an opportunity to work with the Tencent Cloud big data team. The task involved analyzing the data for all of 2019 and delivering the results over the weekend.

Based on past experience, handling such a task without disrupting routine jobs would require expanding the IDC cluster. That was clearly not the best option for a task with such a tight deadline: applying for new machines takes a long time, and the extra capacity would sit idle afterwards, wasting money over the long term.

Cloud big data solutions are flexible, efficient, and cost saving. They support continuous service evolution and rapid global deployment, and they deliver substantial cost savings on warm and cold data storage and on redundant computing resources.

Given these advantages, the Huya Big Data team began to try using elastic cloud resources for the task. After several rounds of evaluating product performance and cost, and after discussions with the Tencent Cloud big data team, the two teams jointly finalized the cloud big data solution:

First, warm and cold data were imported into Tencent Cloud COS over the dedicated line between the Huya IDC environment and Tencent Cloud, with the 2019 data needed for this analysis imported first. Then, a Hadoop cluster created with the Tencent Cloud Elastic MapReduce (EMR) product was used to analyze the data already imported into COS.
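To make the "analyze data already in COS" step concrete, here is a small sketch of how a job might enumerate one year of daily input paths. It assumes a day-partitioned layout (`dt=YYYY-MM-DD`) and the `cosn://` URI scheme of the Hadoop-COS connector; the bucket name, path, and the helper `daily_partitions` are hypothetical and do not come from the article.

```python
from datetime import date, timedelta
from typing import List


def daily_partitions(base_uri: str, start: date, end: date) -> List[str]:
    """Return one partition URI per day in [start, end], inclusive."""
    days = (end - start).days + 1
    return [
        f"{base_uri}/dt={(start + timedelta(days=i)).isoformat()}"
        for i in range(days)
    ]


# Hypothetical bucket and directory layout, for illustration only.
BASE = "cosn://example-bucket-1250000000/warehouse/events"
paths_2019 = daily_partitions(BASE, date(2019, 1, 1), date(2019, 12, 31))
```

A list like `paths_2019` could then be handed to whatever engine runs on the cluster as the set of input locations, so the compute nodes read directly from object storage instead of from local HDFS.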

The analysis results were delivered on time: an analysis cluster with hundreds of nodes was created with EMR within 20 minutes, the analysis task was deployed within 2 hours, and the results were available a day and a half later, ahead of schedule.

After the analysis task finished, the temporary EMR cluster was destroyed, so it incurred no further cost. COS, as the unified storage medium for warm and cold data, continues to support future urgent tasks: an EMR Hadoop cluster can be created in the cloud at any time to analyze the data in COS.
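The create, run, destroy lifecycle described above can be sketched as a context manager that always tears the cluster down, even when the job fails. This is an illustration only: `InMemoryClusterClient` is a hypothetical stand-in that merely records state, not the Tencent Cloud EMR API, and the node count is made up.

```python
from contextlib import contextmanager


class InMemoryClusterClient:
    """Hypothetical stand-in for a cloud cluster API; it only records state."""

    def __init__(self):
        self.active = {}
        self._next_id = 1

    def create(self, node_count):
        cluster_id = f"cluster-{self._next_id}"
        self._next_id += 1
        self.active[cluster_id] = node_count
        return cluster_id

    def destroy(self, cluster_id):
        self.active.pop(cluster_id, None)


@contextmanager
def ephemeral_cluster(client, node_count):
    """Create a cluster for the duration of a `with` block, then always destroy it."""
    cluster_id = client.create(node_count)
    try:
        yield cluster_id
    finally:
        # Runs on success and on failure, so no cluster is left billing.
        client.destroy(cluster_id)


client = InMemoryClusterClient()
with ephemeral_cluster(client, node_count=200) as cid:
    # The analysis job would be submitted here; we just snapshot the state.
    running_during_job = dict(client.active)
```

Because the data stays in object storage, destroying the cluster loses nothing, which is what makes this pattern safe for one-off tasks.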

4. Core value of moving big data to the cloud

The success of this trial most directly reflects the two core values of cloud-based big data analysis: flexibility and efficiency on the one hand, and cost savings on the other.

1. Flexible and efficient: cluster creation in minutes

Because Tencent Cloud EMR separates storage from computing, data is stored in COS in a unified manner. EMR analysis clusters can be created whenever a task needs them and destroyed once the task has run, which is the flexibility of the cloud. Creating an EMR cluster of hundreds of nodes takes only a little over 10 minutes, which is the efficiency of the cloud.

2. Cost saving: up to 60% saved through elasticity

Cloud big data solutions provide two layers of cost savings:

(1) Tencent Cloud Object Storage (COS) serves as the unified storage medium for warm and cold data, replacing expensive IDC equipment. This is the first, direct layer of cost saving.

(2) The second layer of cost saving comes from the elastic EMR architecture. An EMR analysis cluster can analyze the data in COS directly, so clusters can be created and destroyed as needed without maintaining redundant machines over the long term. This suits urgent, unplanned tasks very well.

Based on previous customer experience with EMR products, this elasticity can save up to 60% in cost.
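The reasoning behind the second layer is simple arithmetic: reserved capacity is paid for every hour, while elastic capacity is paid for only while it is busy. The sketch below shows that calculation under stated assumptions. The function `elastic_saving_ratio` and its `on_demand_premium` parameter are hypothetical, and the inputs in the usage note are invented for illustration; they are not Huya's or Tencent Cloud's figures.

```python
def elastic_saving_ratio(busy_hours: float,
                         total_hours: float,
                         on_demand_premium: float = 1.0) -> float:
    """Fraction of cost saved by paying only for busy hours.

    Reserved cost is proportional to total_hours; elastic cost is proportional
    to busy_hours * on_demand_premium, where the premium is the assumed price
    ratio of on-demand to reserved capacity.
    """
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if busy_hours < 0 or busy_hours > total_hours:
        raise ValueError("busy_hours must be between 0 and total_hours")
    return 1.0 - (busy_hours * on_demand_premium) / total_hours
```

For example, if the extra nodes are busy for 288 of the 720 hours in a month and on-demand capacity costs the same per hour as reserved capacity, the ratio is 0.6. A premium above 1.0 shrinks the saving, and the ratio turns negative once the nodes are busy most of the time, which is when keeping reserved capacity is the cheaper choice.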

5. Advantages of cloud data architecture

Cloud vendors provide a rich set of big data products and services on the cloud, covering everything from big data infrastructure and an end-to-end data toolchain to domain-specific data applications.

Open, cloud-based big data technologies and products let enterprise users build and migrate their data architectures quickly, and even integrate existing big data architectures into the cloud seamlessly.

Thanks to the massive storage and computing facilities in the cloud and cloud vendors’ large-scale investment in open big data technology, cloud big data products and services stand out for their technical openness, end-to-end coverage, and flexibility.

These qualities have earned cloud big data infrastructure products broad recognition from the data and IT teams of Internet companies, and more and more enterprises are coming to appreciate the less visible value of cloud vendors’ strong technical support. Using cloud big data infrastructure to drive data-based business and operational innovation has become the industry consensus and the mainstream trend among the new generation of Internet companies.

The cooperation between Huya and Tencent Cloud EMR illustrates this trend well: both sides benefit, and more of the value in the data is released. The Tencent Cloud big data team will continue to refine its products and explore cloud practices so that more industry scenarios can benefit.