Author: Li Jingxia Source: Light Finance

In the era of “data is king”, financial big data is known as a “gold mine waiting to be mined”, and its value is widely accepted.

Since big data was first written into the government work report as a national strategy in 2014, financial institutions have continuously introduced big data platforms and built big data systems.

Big data has long been a key part of the core competitiveness of financial institutions. In particular, the data center and the big data platform have become key to their comprehensive digital transformation. Financial institutions increasingly rely on “digital” capabilities to serve customers, innovate products and manage internal operations.

It is worth noting that in recent years the data center has become the hottest topic in the financial industry, while the big data platform is discussed relatively little. With the rise of cloud computing, AI and other technologies and their deepening integration with big data, the big data platform now stands at a new juncture.

01 New juncture

Big data technology, combined with artificial intelligence and other technologies, is turning banks’ data into high-value assets, driving technology enablement and innovation in scenario-based applications, and in turn pushing banks to rebuild their internal IT systems and reform their organizational structures.

“Establish and improve enterprise-level big data platforms, and fully release the core value of big data as basic strategic resources,” states the FinTech Development Plan (2019-2021) issued by the central bank. So what is a big data platform?

According to the “General Requirements for Financial Big Data Platform” (hereinafter the “Requirements”), issued on December 29, 2021, a financial big data platform is an enterprise-level, distributed, open and unified big data platform that shall include components for data access, data storage, data processing, data analysis and data services.

The overall goal of the financial big data platform is to help financial institutions develop, deploy and manage financial big data applications more efficiently and quickly, shifting from a transaction-centered to a data-centered approach, so as to cope with the challenges posed by Internet business and by data that is more multidimensional, larger in volume and more real-time.

Any discussion of big data computing technology has to include the open source big data suite Apache Hadoop. Cloudera launched its own Hadoop distribution, CDH (Cloudera’s Distribution Including Apache Hadoop), in 2008, when Hadoop was fully incubated. CDH is also open source, but it is easier to work with in terms of stability, management, deployment, and operation and maintenance, which helped Hadoop get adopted in production.

Around 2011, Hadoop technology entered its mature stage. With the rise of Internet finance, data volumes expanded rapidly and traditional data systems could no longer meet the needs of financial institutions, so the distributed Hadoop system made it onto these institutions’ shortlists.

It took about two more years for financial institutions to put Hadoop-based big data platforms into practice. For example, Agricultural Bank of China started to build an independent and controllable big data platform in 2013, and finally chose an architecture of MPP database + Hadoop. In 2014, ICBC formally built a big data platform based on Hadoop technology.

After 2015, the mobile Internet accelerated changes in customer behavior, and financial institutions entered a new period of digital transformation. They not only had to handle ever larger volumes of data, but also had to analyze customer data in response to changing behavior and carry out precision marketing. At this point, many institutions began moving functions such as data analysis onto Hadoop systems.

Of the 40 to 50 big data platforms tested by the China Academy of Information and Communications Technology (CAICT) in 2019, more than 70% were built through secondary development on the community editions of CDH and HDP.

The big data platform now stands at a new juncture.

On the one hand, Cloudera previously announced the end of CDH6 and HDP3 service support in late 2021 and March 2022, in favor of a new product, CDP. This means that the CDH and HDP systems used by financial institutions are facing a comprehensive migration, and new alternative solutions are urgently needed.

On the other hand, amid the wave of fintech innovation, localizing the big data platforms of financial institutions has become the prevailing trend. According to the Fintech Development Plan (2022-2025) of the People’s Bank of China, it is necessary to speed up the formulation and implementation of security plans for key hardware and software information infrastructure in the financial industry, and to effectively improve the security capability of that infrastructure.

In this context, where should the big data platforms of financial institutions go? At this new juncture, domestic third-party fintech vendors have emerged, drawing on the capabilities and experience they have built up over the years to offer financial institutions a rich set of big data platform solutions.

02 New trends

In addition to changes in the industry environment, big data platform technology itself is showing some new trends, leading financial institutions to place higher demands and expectations on their big data platforms.

First, integration. One aspect is the cloud: the integration of big data with cloud computing, AI and other technologies has made cloud deployment of platforms a general trend. However, because of the financial industry’s risk and security concerns about using the public cloud, hybrid cloud architecture is the main approach at present. Cloudera’s CDP is a hybrid cloud/multi-cloud big data platform.

The other aspect is integration with AI, where intelligent algorithms can be applied to big data. On the one hand, big data provides the data that AI depends on. On the other hand, some conventional algorithms used in AI can be fed back into the big data platform and combined with the characteristics of its data to make accurate product recommendations to customers.

IDC China’s big data platform market share report for the first half of 2021 shows that the overall market reached 5.42 billion yuan, up 43.5% year on year. “The driving force of market growth comes from digital transformation, the deployment of artificial intelligence, the construction of industry cloud and the policy drive of new infrastructure.”

Second, real-time. After years of investment in big data platforms, financial institutions have gradually built up their infrastructure, and the new requirement is to support business scenarios in a timely way. With the deep integration of big data with cloud computing, AI and other technologies, a widely held view in the market is that “big data” is rapidly moving into the era of “fast data”. For financial institutions, this means improving the real-time capability of big data.

For example, in 2020 ICBC began building time-sensitive big data scenarios. That is, in addition to batch computing, the big data platform needs real-time computing, online analysis, data API and other platforms in order to shorten the end-to-end closed-loop time of data, provide online high-concurrency access, and improve the timeliness with which data supports business services.

Third, forward-looking. Big data platforms enable financial institutions to understand customers better and to serve them in a forward-looking manner. The Requirements also state that the specific functions and technologies of the financial big data platform can be divided into basic requirements and enhanced requirements. The enhanced requirements are derived from technology trends and from the anticipated needs of financial users. This means that financial institutions need to proactively improve their big data platforms based on customer needs.

Finally, security. Both the independent and controllable security of big data platform technology and the security requirements for the data itself have been raised to a higher level. This places higher demands on financial institutions when they choose partners or build big data platforms.

With third-party vendors joining the market, financial institutions have more choices when it comes to technology independence and control. The localization trend has also opened a period of strategic opportunity for third-party service providers.

The Yuli Data development and management platform launched by NetEase is a one-stop big data management and development platform that includes a big data platform and a data center, mainly covering big data development, task scheduling, data quality, data governance and data services.

Its big data platform layer is likewise a Hadoop distribution. Compared with the community version, it integrates the latest version of Spark and has complete permission control and audit capabilities, which can greatly improve the efficiency of offline ETL jobs. In addition, Shufan has implemented a number of feature enhancements and performance optimizations for the Impala component to ensure stability and performance in use.

It is worth paying attention to whether domestic products can meet the needs of financial institutions. How do financial institutions choose the new direction of big data platform?

03 New options

The answer to that question starts with clarifying what financial institutions need right now.

First of all, independent and controllable fintech, controllable data security, cost control and fast service response are the key words of financial institutions’ current demand for big data platform. Finance focuses on security, and its technical requirements for data security and business continuity assurance are usually higher than those of other industries.

Take cost control as an example. One financial institution with strong in-house IT capabilities runs more than a dozen clusters, with an estimated several hundred nodes. Its data platform currently incurs 2 to 3 million in software partnership costs. In addition, because the CDH version it uses is no longer updated, it needs to train a dedicated team to take charge of maintenance, which also increases the cost.

As a result, financial institutions often choose third-party vendors’ products as the base software for their big data platforms. Faced with this situation, they may either continue on to CDP or migrate to domestically developed base software for the big data platform.

Second, no matter what kind of product they choose, financial institutions pay attention to the “popularity” of big data platform products, that is, whether the underlying platform is widely adopted, as Hadoop and Spark are. They also want the product to be open source.

“Financial institutions are increasingly dependent on the whole system of big data.” Jiang Hongxiang, senior architect and head of the big data basic technology platform at NetEase Shufan, told Light Finance that the big data platform runs on low-cost servers and can scale out indefinitely in a distributed manner, so its cost, scalability and stability make it a good choice for financial institutions.

In addition to the products themselves, financial institutions increasingly pay attention to the strength of third-party fintech companies and their product services. Strong technical support, comprehensive ecosystem compatibility, timely vulnerability fixes, and rapid updates and iteration are all capabilities required of suppliers.

In the current environment, domestic big data platforms have developed the following advantages: they are independent and controllable, keeping control in the enterprise’s own hands; they offer quick response and smooth communication through local services; and they support cooperation and co-creation, going deep into the business and supporting customized needs.

Take NetEase Shufan’s data development and management platform as an example. It has an open source base and an ecosystem compatible with CDH core components. On that basis, its components are upgraded and extended in line with technology trends, and it supports the customized needs of financial institutions. For example, on top of the standard product, it can also support custom development for 20% to 30% of requirements.

In building a big data platform with one securities company, NetEase Shufan led joint development around several architectural modules, including data management, the security center, data standards and data quality. At the same time, it customized the platform for the special requirements of the securities industry, such as enhanced user profiling and trading-day scheduling, in which data processing runs only on trading days. The result is a platform solution that better fits the characteristics of the industry.
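The trading-day scheduling mentioned above can be illustrated with a minimal Python sketch. It is not NetEase Shufan’s implementation; the function names and the holiday dates are hypothetical, and a real deployment would load the official exchange calendar instead of a hard-coded set.

```python
from datetime import date, timedelta

# Hypothetical holiday set used only for illustration; a real scheduler
# would load the exchange's published trading calendar.
HOLIDAYS = {date(2022, 1, 3), date(2022, 2, 1)}


def is_trading_day(day, holidays=HOLIDAYS):
    """Assume a trading day is Monday to Friday and not a listed holiday."""
    return day.weekday() < 5 and day not in holidays


def trading_days(start, end, holidays=HOLIDAYS):
    """Yield every trading day in the closed range [start, end]."""
    day = start
    while day <= end:
        if is_trading_day(day, holidays):
            yield day
        day += timedelta(days=1)


def run_if_trading_day(day, job, holidays=HOLIDAYS):
    """Run job(day) only on trading days; return None when the day is skipped."""
    if is_trading_day(day, holidays):
        return job(day)
    return None
```

A scheduler built this way would call run_if_trading_day once per calendar day, so batch jobs are simply skipped on weekends and holidays rather than failing on missing market data.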

NetEase Shufan Financial Big data solution architecture

NetEase Shufan also offers a one-stop data center and a rich set of data products. On top of the underlying components of its big data distribution, it can selectively provide one-stop data center services and data products so that business teams can use them out of the box. At present, NetEase Shufan has served a number of customers in the financial industry, including the fintech subsidiary of a state-owned bank, Huatai Securities, Northeast Securities, China Asset Management, Huafu Securities and others, and its applicability has been fully verified.

The company was able to launch products that meet the current needs of financial institutions at this juncture for big data platforms mainly because of its years of deep work in the big data field, through which it has built up a complete big data research and development ecosystem and rich experience in production operation and maintenance.

Before Hadoop even existed, NetEase started to build its own distributed storage system in 2006. It introduced the Hadoop system in 2011-12 to support its mailbox, news and other services. In 2015, to solve the problem of scattered components and the lack of unified management, NetEase began to develop big data platform tools and carried out platform integration similar to CDH. In 2018, when big data was booming, NetEase Shufan developed the Data Center, which became a universal tool for all business units.

In the four years since, NetEase Shufan has also developed its own data center methodology. Research and development in big data technology needs the support of a strong team of technical talent. Currently, the big data platform and data center team of NetEase Shufan has hundreds of people, who provide technical support, customer operation and maintenance, and core research and development services.

Thanks to its excellent technology, strong product compatibility and service advantages, the big data platform products of NetEase Shufan have attracted the attention of many financial institutions.

“Many financial customers prefer private cloud deployment, so Shufan has been a little slower in moving big data platforms to the cloud in the financial industry. In the non-financial sector, we’ve actually moved to cloud platforms,” Jiang Hongxiang said of the trend toward cloud-based big data platforms.

According to Statista’s estimates, the global Hadoop and big data market was about USD 34 billion in 2019, with a five-year CAGR of 28.5%. As the digital transformation of the financial industry deepens, financial institutions are becoming more and more dependent on big data, and the market for big data platforms will keep growing.
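For readers unfamiliar with the compound annual growth rate (CAGR) cited above, the following Python sketch shows the standard formula and its inverse. The article does not say which five-year window the 28.5% figure covers, so the function names and any projection made with them are illustrative assumptions, not a forecast from the source.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value reached after compounding start_value at a constant annual rate."""
    return start_value * (1 + rate) ** years


# Illustration only: ASSUMES the 28.5% rate applies to the five years
# following the 2019 base of USD 34 billion, which the source does not state.
hypothetical_later_value = project(34.0, 0.285, 5)
```

The two functions are inverses: feeding the projected value back into cagr with the same base and horizon recovers the original rate.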

The entry of technology vendors with domestically developed big data platforms and new products is an inevitable choice for the financial industry, and the financial institutions that move first are expected to gain an early lead.