Today, with the rapid development of big data and artificial intelligence, laws, regulations and trust issues seriously hinder the flow of data between enterprises, and data islands act like an invisible hand keeping enterprises apart. Without valuable data cooperation, customer-acquisition costs remain high across industries. To enable secure data sharing between enterprises, unlock the value of data and promote business innovation, Tencent’s “Aegis – Federated Computing” platform came into being!

Research on secure multi-party computation and other technologies for data security and privacy protection can be traced back to the 1970s, while federated learning, a newer concept, has been booming in China since 2019.

The “Aegis – Federated Computing” platform also took shape during this period. After two to three months of system evaluation, security and computation assessments, and an on-site defense, in December 2019 Tencent’s “Aegis – Federated Computing” received the certificate for data circulation products based on secure multi-party computation from the China Academy of Information and Communications Technology (CAICT). It was one of the first five teams in the country to obtain this certificate. Aegis is also leading the development of CAICT’s federated learning standards.

Tencent’s “Aegis – Federated Computing” platform came into being

Big data and artificial intelligence are developing rapidly, yet laws, regulations and trust problems seriously hinder the flow of data between enterprises, and data islands act like an invisible hand keeping them apart. Because valuable data is lacking, customer-acquisition costs are high across industries, the share of bad accounts among bank credit card users has risen across the board, the cost of credit review in finance has soared, and AI development has hit unprecedented bottlenecks. To let these enterprises cooperate on data in a way that is legally compliant, secure, efficient and lossless, Tencent’s “Aegis – Federated Computing” platform came into being!

Aegis is a distributed computing platform built mainly on federated learning, secure multi-party computation (MPC), blockchain, trusted computing and other security technologies. It adapts machine learning algorithms for privacy protection so that modeling can be completed without data leaving its owner’s premises, while maximizing the value of each partner’s data.

Depending on the actual requirements of both parties, its upper layer can serve mainstream businesses such as risk control, marketing, recommendation and AI. “Aegis – Federated Computing” also acts as a bridge between business and data, connecting business parties that need data with data parties looking to realize its value.

The product team first refined the details of joint modeling, such as data format specifications, secure intersection, feature engineering and algorithm parameter tuning, and then carried out in-depth research into the core underlying technologies of data security and privacy protection, producing a number of groundbreaking, industry-leading results.

These results include proposing and deploying the concept of asymmetric federated learning for the first time; proposing and deploying a secure information retrieval scheme for the first time; innovations in and applications of several MPC technologies, covering homomorphic encryption, oblivious transfer and private set intersection; optimization of mainstream federated learning protocols to improve accuracy and efficiency; removal of the trusted third party; and the introduction of a one-way federated network strategy. Here are a few of the key breakthroughs.

First asymmetric federated learning framework

In the standard process of vertical federated learning, two participants holding different features for overlapping samples need to perform the following two operations:

1. ID alignment

Relying mainly on private set intersection (PSI) [2, 3], this step outputs to each participant the intersection of all participants’ sample ID sets.

2. Encrypted model training

On the aligned samples in that intersection, each participant computes intermediate variables from its own raw data and exchanges them with the others in encrypted form.
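The ID-alignment step can be illustrated with the classic PSI construction built on commutative exponentiation, the Pohlig-Hellman-style cipher of [2]. Each party raises hashed IDs to its own secret exponent, and because (h^a)^b = (h^b)^a, doubly encrypted values match exactly when the IDs match. The sketch below is a minimal single-process illustration under stated assumptions: the toy modulus 2**127 - 1, the SHA-256 hash-to-group mapping and all function names are our own choices, not the platform’s implementation. In this variant only party A learns the intersection; it can forward the result so that every participant obtains it.

```python
import hashlib
import math
import secrets

# Toy public modulus (assumption): the Mersenne prime 2**127 - 1.
# Real deployments use a standardized elliptic-curve group instead.
P = 2**127 - 1


def hash_to_group(identifier: str) -> int:
    """Map a sample ID to a nonzero residue mod P (toy hash-to-group)."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def keygen() -> int:
    """Secret exponent coprime to P - 1, so x -> x**k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 2) + 1
        if math.gcd(k, P - 1) == 1:
            return k


def exponentiate(values, key):
    """Pohlig-Hellman-style encryption: raise each group element to `key`."""
    return [pow(v, key, P) for v in values]


def psi(ids_a, ids_b):
    """Two-party PSI in which party A learns the intersection."""
    key_a, key_b = keygen(), keygen()
    # A -> B: H(x)^a for each of A's IDs (A remembers the order).
    a_once = exponentiate([hash_to_group(x) for x in ids_a], key_a)
    # B -> A: (H(x)^a)^b in the same order, plus H(y)^b for B's own IDs
    # (B would shuffle the latter in practice).
    a_twice = exponentiate(a_once, key_b)
    b_once = exponentiate([hash_to_group(y) for y in ids_b], key_b)
    # A: (H(y)^b)^a. Commutativity makes matching IDs collide, barring hash collisions.
    b_twice = set(exponentiate(b_once, key_a))
    return [x for x, v in zip(ids_a, a_twice) if v in b_twice]
```

For example, psi(["u1", "u2", "u3", "u7"], ["u2", "u3", "u5"]) returns ["u2", "u3"]. The intersection itself is revealed, and that exposure is the problem that asymmetric ID alignment targets.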

In frontier federated learning research, a great deal of work has been devoted to encrypted model training, including the design of new federated protocols [4], optimization of federated communication mechanisms [5, 6] and the design of federated incentive systems [7], but there are few systematic studies of ID alignment.

In real vertical federated learning scenarios, we find that one party usually has a small ID set with strong business attributes, and this is exactly the information the ID owner wants to protect. Because this small set typically falls almost entirely inside the other party’s much larger set, the intersection produced by ID alignment exposes nearly all of its IDs, leaving the party with fewer IDs “vulnerable”.

For example, credit companies in a federation need to feed their customers’ default records into the federated learning system to build risk-control predictions. Each of these records was obtained at the cost of a real economic loss, so they are among the company’s most closely guarded trade secrets.

To solve this problem, the product gives top priority to all-round privacy protection of all three elements (IDs, features and labels), removing data security concerns in highly sensitive fields. We pioneered the concept of asymmetric federated learning and were the first to invent techniques such as Asymmetrical fudge-PSI and Genuine with Dummy. The platform supports a complete asymmetric federated learning data pipeline: asymmetrically encrypted entity alignment, asymmetrically encrypted feature engineering and asymmetrically encrypted model training. Part of this work will be presented at FL-IJCAI’20 [8].
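One way to picture the Genuine with Dummy idea is sketched below. It is our own simplified reading, not the exact construction of [8]. The weak party pads its genuine IDs with dummy IDs before alignment and keeps a private mask. During training it sends a zero residual for every dummy, so dummies do not change the strong party’s gradient. The sketch rests on several assumptions:

- the dummy pool consists of IDs the strong party also holds;
- predictions are given;
- plaintext floats stand in for what would be semantically secure ciphertexts (otherwise the zeros would reveal which IDs are dummies);
- all function names are hypothetical.

```python
import random


def pad_with_dummies(genuine_ids, dummy_pool, ratio, seed=None):
    """Weak party: hide each genuine ID among `ratio` dummy IDs.

    Returns the shuffled padded list (what enters ID alignment) and a private
    mask kept only by the weak party. Assumes `dummy_pool` holds IDs the strong
    party also has, so dummies survive alignment. random.Random is a toy RNG;
    a real system would use secrets.SystemRandom.
    """
    rng = random.Random(seed)
    genuine = set(genuine_ids)
    candidates = sorted(set(dummy_pool) - genuine)
    dummies = rng.sample(candidates, ratio * len(genuine))
    padded = sorted(genuine) + dummies
    rng.shuffle(padded)
    mask = [x in genuine for x in padded]
    return padded, mask


def masked_residuals(padded, mask, predictions, labels):
    """Weak (label-holding) party: real residual for genuine IDs, 0 for dummies.

    In a real protocol each entry would be encrypted under a semantically secure
    scheme, so an encrypted 0 is indistinguishable from a genuine residual.
    """
    return [predictions[x] - labels[x] if g else 0.0
            for x, g in zip(padded, mask)]


def passive_gradient(padded, residuals, features):
    """Strong (feature-holding) party: gradient sum_i r_i * x_i over aligned IDs."""
    grad = [0.0] * len(features[padded[0]])
    for x, r in zip(padded, residuals):
        for j, value in enumerate(features[x]):
            grad[j] += r * value
    return grad
```

With two genuine IDs and ratio=2, the strong party sees six aligned IDs. The gradient it computes still equals the gradient over the two genuine samples alone, because every dummy contributes 0 * x.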

Pioneered secure information retrieval technology for federated sharing of results

Asymmetric federated learning prevents sample IDs from leaking during training, but the user list can still leak through queries once the model is in production. One way to protect the queried user list is to have the model owner return predicted scores for its entire user base. That approach makes volume-based billing impractical, which is a commercial obstacle.

The Aegis – Federated Computing platform builds closely on business scenarios and requirements. It pioneered secure information retrieval for sharing federated results, which solves an important privacy problem in federated learning applications. It also completes a new end-to-end secure information flow: sample preprocessing, data mining, federated inference and secure sharing of federated results. Secure information retrieval addresses multi-party result sharing in federated learning engineering practice and fills the last gap in operating a federated learning system.

The secure information retrieval protocol is built on Pohlig-Hellman commutative encryption and oblivious transfer (OT) from MPC. It ensures that the sender of federated results shares the inference results for the target customers accurately, while fully protecting the privacy of the recipient’s target customer list. The Aegis – Federated Computing platform has filed several Chinese patent applications based on this work.
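Oblivious transfer is the primitive that lets the recipient fetch only the result it asks for without revealing which one. Below is a minimal 1-out-of-2 sketch in the style of Chou and Orlandi’s “simplest OT”; it is not the platform’s protocol. The sketch makes the following assumptions:

- the group is the integers modulo the Mersenne prime 2**127 - 1 with base 3 (a toy choice);
- each message is masked with a one-time pad derived from SHA-256;
- messages are at most 32 bytes;
- both roles are simulated in one function.

Retrieving one customer’s score out of many would use a 1-out-of-n generalization.

```python
import hashlib
import secrets

# Toy group parameters (assumption). Real OT uses a standardized elliptic-curve group.
P = 2**127 - 1
G = 3


def kdf(element: int) -> bytes:
    """Derive a 32-byte pad from a group element (< 2**127, so 16 bytes suffice)."""
    return hashlib.sha256(element.to_bytes(16, "big")).digest()


def xor(data: bytes, pad: bytes) -> bytes:
    if len(data) > len(pad):
        raise ValueError("toy OT supports messages of at most 32 bytes")
    return bytes(d ^ p for d, p in zip(data, pad))


def oblivious_transfer(m0: bytes, m1: bytes, choice: int) -> bytes:
    """1-out-of-2 OT: the receiver gets m_choice; the sender never learns choice."""
    # Sender: secret a, publishes A = G^a.
    a = secrets.randbelow(P - 2) + 1
    big_a = pow(G, a, P)
    # Receiver: secret b. B = G^b for choice 0, A * G^b for choice 1;
    # either way B looks like a random group element to the sender.
    b = secrets.randbelow(P - 2) + 1
    big_b = pow(G, b, P) if choice == 0 else big_a * pow(G, b, P) % P
    # Sender: one key per message, k0 = H(B^a), k1 = H((B / A)^a).
    k0 = kdf(pow(big_b, a, P))
    k1 = kdf(pow(big_b * pow(big_a, -1, P) % P, a, P))
    c0, c1 = xor(m0, k0), xor(m1, k1)
    # Receiver: can only compute H(A^b) = H(G^(ab)), the key for its own choice.
    k_choice = kdf(pow(big_a, b, P))
    return xor((c0, c1)[choice], k_choice)
```

For example, oblivious_transfer(b"score=0.91", b"score=0.12", 1) returns b"score=0.12". Under the computational Diffie-Hellman assumption, the receiver cannot derive the key for the other message without knowing a.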

Pioneering high-performance homomorphic encryption with semantic security

First-time users of federated learning systems clearly notice the performance gap between federated learning and traditional scalable distributed machine learning frameworks such as Spark MLlib and TensorFlow, and may therefore doubt such “inefficient” federated services.

The Aegis – Federated Computing platform improves the performance of federated services starting from homomorphic encryption, the core privacy-protection technology of federated learning, and pioneered a high-performance homomorphic encryption technique with semantic security. In unit tests, its computation is more than a thousand times faster than existing homomorphic encryption, and end-to-end model training time is reduced by 87%.

Among the federated protocols widely used in industry today, homomorphic encryption is one of the most general and portable secure multi-party computation techniques. It cleanly decouples the roles of data provider and computing party while protecting privacy, which fits the privacy-oriented distributed computing nature of federated learning.
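The decoupling can be seen with Paillier encryption [10], a widely used additively homomorphic scheme. The data provider encrypts, a computing party combines ciphertexts without any key, and only the key holder can decrypt the result. The sketch is a toy under stated assumptions, not the platform’s encryption:

- the primes are tiny, publicly known Mersenne primes instead of random primes of 1024 bits or more;
- plaintexts are non-negative integers;
- the function names are hypothetical.

```python
import math
import secrets


def keygen(p: int = 2**89 - 1, q: int = 2**107 - 1):
    """Toy Paillier keys. The Mersenne primes p, q are public knowledge and far
    too small; they only keep the example fast."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                                 # valid because g = n + 1
    return n, (n, lam, mu)


def encrypt(n: int, m: int) -> int:
    """E(m) = (1 + n)^m * r^n mod n^2, using (1 + n)^m = 1 + m*n (mod n^2)."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(private_key, c: int) -> int:
    n, lam, mu = private_key
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n


def add(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: E(m1) * E(m2) = E(m1 + m2 mod n)."""
    return c1 * c2 % (n * n)


def scale(n: int, c: int, k: int) -> int:
    """Multiplication by a plaintext constant: E(m)^k = E(k * m mod n)."""
    return pow(c, k, n * n)
```

For example, decrypting add(n, encrypt(n, 20), encrypt(n, 22)) yields 42. Each operation involves modular exponentiation on numbers twice the size of n. That cost is what makes schemes like this expensive at scale.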

Homomorphic cryptography has attracted wide attention from researchers, and a large body of work studies homomorphic schemes that support deeper circuits, more operation types and higher security levels [9-11]. However, modern processors have limited performance, and real business scenarios demand high throughput and low latency. As a result, many feature-complete but complex homomorphic schemes cannot finish enough rounds of federated training on sufficiently large datasets within an acceptable time, even on much more powerful servers. This is the main reason users perceive a performance gap between federated learning and traditional distributed modeling systems.

To speed up federated learning by improving the underlying homomorphic encryption, we drew on two existing designs. From the classic symmetric affine cipher we took its group operations, and from the asymmetric ElGamal cipher we took the idea of obfuscating a ciphertext with multiple group elements. The result is the world’s first Randomized Iterative Affine Cipher (RIAC). RIAC greatly improves the efficiency of homomorphic operations. It does so while preserving two properties of classical homomorphic cryptography: concealment of how many operations have been applied, and semantic security. It leads related technologies in China. The Aegis – Federated Computing platform has filed several Chinese patent applications based on this work.
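The additive building block behind an iterative affine cipher can be sketched as follows. This is a deliberately simplified toy, not RIAC. Each round applies an affine map with zero offset, x -> a_i * x mod n_i, over moduli of increasing size. Ciphertexts can therefore be added modulo the outermost modulus and decrypted layer by layer. The toy is deterministic (equal plaintexts give equal ciphertexts), so it is not semantically secure; that is the gap the randomization in RIAC is described as closing. Key sizes, the headroom parameter and the function names are illustrative assumptions.

```python
import math
import random


def keygen(rounds: int = 3, base_bits: int = 64, headroom_bits: int = 16, seed=None):
    """Toy key: one (modulus, multiplier, inverse) triple per round.

    Each modulus is headroom_bits longer than the previous one, so roughly
    2**(headroom_bits - 1) ciphertexts can be added before an inner layer
    would wrap around. random.Random is NOT a cryptographic generator.
    """
    rng = random.Random(seed)
    key, bits = [], base_bits
    for _ in range(rounds):
        n = rng.getrandbits(bits) | (1 << (bits - 1))   # exactly `bits` bits long
        a = rng.randrange(2, n)
        while math.gcd(a, n) != 1:
            a = rng.randrange(2, n)
        key.append((n, a, pow(a, -1, n)))
        bits += headroom_bits
    return key


def encrypt(key, m: int) -> int:
    if not 0 <= m < key[0][0]:
        raise ValueError("plaintext must lie in [0, n_0)")
    x = m
    for n, a, _ in key:                  # innermost (smallest) modulus first
        x = a * x % n
    return x


def decrypt(key, c: int) -> int:
    x = c
    for n, _, a_inv in reversed(key):    # peel layers from the outside in
        x = a_inv * x % n
    return x


def add(key, c1: int, c2: int) -> int:
    """Ciphertext addition modulo the outermost modulus."""
    return (c1 + c2) % key[-1][0]
```

Each homomorphic operation here is a single modular multiplication or addition, whereas schemes such as Paillier need a modular exponentiation. That difference hints at where the speed of affine-style ciphers comes from.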

First peer-to-peer distributed secure aggregation technology

In a federated learning system, data privacy depends on various internal security sub-protocols, such as federated sub-protocols for addition, multiplication and aggregation [13, 14]. Aggregation plays a central role among them. While protecting each participant’s data privacy, it gathers the intermediate variables that model updates need from all parties, including gradients and residuals, model estimates such as weights, and model predictions.

Secure summation is one of the most intuitive implementations of an aggregation protocol and a common benchmark for secure aggregation techniques.
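Below is a minimal sketch of secure summation by additive secret sharing, in the spirit of the secret-sharing approach [16]. Each party splits its value into random shares and sends one share to every party. Each party then sums the shares it receives, and only the sum of those partial sums is published. The sketch assumes semi-honest parties that all stay online, and a public modulus 2**61 - 1 larger than any total. Function names are hypothetical, and all parties are simulated in one process.

```python
import secrets

# Public modulus for the shares (assumption); totals must stay below it.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n additive shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: list[int]) -> int:
    n = len(private_values)
    # Party i sends its j-th share to party j.
    dealt = [share(v, n) for v in private_values]
    # Party j adds up the shares it received; on its own this looks uniformly random.
    partial = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Publishing the partial sums reveals only the total.
    return sum(partial) % MODULUS
```

For example, secure_sum([10, 20, 30, 40]) returns 100. Every party’s partial sum is needed to reconstruct the total, so a party that drops out mid-protocol stalls the computation. That is one face of the dynamic-environment problem.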

Popular secure summation schemes in academia and industry include the efficient secure sum protocol [15], homomorphic encryption [10, 11], secret sharing [16] and privacy-preserving consensus protocols [17, 18]. When applied in federated protocols, however, each of these schemes has problems:

- the threat of collusion [15];
- relatively heavy computation [10, 11, 18];
- loss of accuracy [17];
- lack of full decentralization [10, 11];
- poor handling of dynamic environments [19].
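The efficient ring-based secure sum of [15], and its collusion threat, can be reproduced in a few lines. In this single-process simulation, the initiator masks its value with a random number R. Each site adds its own value to the running total it receives and passes it on, and the initiator removes R at the end. The two neighbours of any site see the running total just before and just after that site, so together they can recover its private value. The modulus and function names are illustrative.

```python
import secrets


def ring_secure_sum(values, modulus):
    """Simulate the ring protocol; values[0] belongs to the initiating site.

    Returns the total and the messages, where transcript[k] is what site k
    sends to the next site (the last message goes back to site 0).
    """
    r = secrets.randbelow(modulus)                 # initiator's secret mask R
    transcript = [(r + values[0]) % modulus]
    for v in values[1:]:
        transcript.append((transcript[-1] + v) % modulus)
    total = (transcript[-1] - r) % modulus         # initiator removes R
    return total, transcript


def neighbours_collude(transcript, i, modulus):
    """Sites i-1 and i+1 (wrapping to site 0) pool the message sent to site i
    and the message site i sent on, exposing site i's value (i >= 1)."""
    return (transcript[i] - transcript[i - 1]) % modulus
```

For example, ring_secure_sum([5, 7, 11, 13], 1000) returns a total of 36, and neighbours_collude(transcript, 2, 1000) recovers site 2’s value, 11.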

Unfortunately, few summation protocols address these federated learning requirements in depth. As a reliable subroutine for federated learning systems, we pioneered a privacy-preserving iterative summation protocol [12]. It has a fully decentralized structure and provides secure summation with good security, high accuracy and strong resilience. It has no time limit and works in dynamic environments where participants’ devices frequently join and leave. It suits the various secure aggregation needs of federated learning protocols. In April 2020, this work was published in IEEE Intelligent Systems.

First unidirectional federated network policy

Mainstream federated learning products and open-source frameworks on the market all require two-way network communication between the modeling parties. In industries with extremely sensitive data, such as banking, this raises network security concerns. If an inbound entry into the bank’s internal network is opened, attackers can scan the open port, forge the source IP of packets and launch malicious attacks.

Opening only outbound (egress) access, with no inbound (ingress) access, therefore greatly strengthens data and network security. On this basis, banks, internet finance companies and other sensitive industries can add dynamic egress IPs, dynamic port mapping and similar measures to protect their networks and data from attackers. Data cooperation also passes compliance review more easily, which makes it easier to get started.

Tencent’s “Aegis – Federated Computing” platform pioneered a one-way federated network architecture. In it, the more security-sensitive partner can run in one-way mode, opening only outbound access and never inbound access. In testing, Aegis’s one-way connectivity architecture had no impact on performance or model effectiveness while significantly improving data security.
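The one-way pattern lets the sensitive party dial out and then carries the partner’s requests back over that same outbound connection. As a result, no inbound port is ever opened on the sensitive side. Below is a minimal single-machine simulation using Python’s standard socket library. The line-delimited JSON messages, the task name and the function names are hypothetical, and a real deployment would add TLS, authentication and reconnection.

```python
import json
import socket
import threading


def partner_side(listening_sock, results):
    """Partner (non-sensitive) side: the only side that listens for connections."""
    conn, _ = listening_sock.accept()
    with conn, conn.makefile("rw") as f:
        # Push a task down the connection that the bank opened.
        f.write(json.dumps({"task": "partial_sum", "values": [1, 2, 3]}) + "\n")
        f.flush()
        results.append(json.loads(f.readline()))


def bank_side(host, port):
    """Bank side: only opens an outbound connection and never binds or listens."""
    with socket.create_connection((host, port)) as s, s.makefile("rw") as f:
        task = json.loads(f.readline())
        f.write(json.dumps({"task": task["task"], "result": sum(task["values"])}) + "\n")
        f.flush()


def run_demo():
    results = []
    with socket.socket() as srv:
        srv.bind(("127.0.0.1", 0))      # ephemeral port on localhost
        srv.listen(1)
        port = srv.getsockname()[1]
        partner = threading.Thread(target=partner_side, args=(srv, results))
        partner.start()
        bank_side("127.0.0.1", port)
        partner.join()
    return results[0]
```

run_demo() returns {"task": "partial_sum", "result": 6}. The partner’s task reached the bank and the answer came back, yet the bank side never called bind() or listen().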

Tencent Aegis joins hands with PowerFL and FATE

PowerFL is a federated learning platform developed by Tencent TEG that requires no trusted third party. It is built on Angel, Tencent’s open-source machine learning platform, and includes, among other things, a vertical federated learning system framework and a range of algorithms. Aegis and PowerFL are partners in Tencent’s “Federated Learning” open-source collaboration Oteam. From the outset they have carried out joint research in different directions, including the basic framework, federated algorithms and applications.

FATE, the world’s first industrial-grade open-source federated learning project, caught the Aegis team’s attention during the same period. After more than half a year of discussions on federated technologies and applications, the two sides set up a joint project team in September 2019. The team has jointly developed and iteratively optimized a series of federated learning products. The two sides have also cooperated closely on technical research and industry application standards, making data cooperation under privacy protection no longer difficult.

Drawing on the joint research with PowerFL and FATE, Aegis redesigned its federated product architecture around federated learning, MPC and blockchain technologies and refined the details. With the current Tencent “Aegis – Federated Computing”, a novice with only a little algorithm knowledge can complete the entire federated modeling process through simple settings, without writing any scripts.

In addition, much of the high-value work by Aegis described above has been contributed to the FATE open-source community, and the team’s core members became the community’s first Level 1 contributors in 2019.

Tencent Aegis brings in heavyweight data partner TalkingData

Aegis’s first mission is to be a federated computing platform that lets enterprises cooperate on data and complete joint modeling securely. But valuable data is the key to all of this. Banks have high-quality user samples and transaction-history features. Game publishers have samples of high-quality players. Online education institutions have categorized samples and features of education users. Connecting Internet giants with valuable data cooperation has therefore become another part of Aegis’s mission.

Put simply, any party seeking data can quickly find high-value partners on the Aegis platform, and any enterprise can carry out secure data cooperation through the Aegis – Federated Computing platform.

TalkingData is a leading data intelligence service provider in China. Yan Hui, its product lead, has a long history with Tencent’s big data team. The two have discussed and cooperated in depth on products such as precision marketing, EMR, ES, statistical analysis, BI-driven fine-grained operations and enterprise profiling. Federated learning, a frontier field of big data, is naturally no exception.

The two sides agreed on the value of federated learning and entered into a strategic partnership in early 2020. TalkingData will work with Tencent “Aegis – Federated Computing” to provide rich, secure and multi-dimensional federated data services to customers.

This is just the beginning. The Tencent “Aegis – Federated Computing” team still has a long way to go toward its vision: “make data cooperation between enterprises no longer difficult!” The team will keep forging ahead for three goals: a better data cooperation environment for all industries, more room for market growth for enterprises, and better service for users.

References:

[1] Yang, Qiang, et al. “Federated machine learning: Concept and applications.” ACM Transactions on Intelligent Systems and Technology (TIST) 10.2 (2019): 1-19.

[2] Pohlig, Stephen, and Martin Hellman. “An improved algorithm for computing logarithms over GF(p) and its cryptographic significance (Corresp.).” IEEE Transactions on Information Theory 24.1 (1978): 106-110.

[3] De Cristofaro, Emiliano, and Gene Tsudik. “Practical private set intersection protocols with linear complexity.” International Conference on Financial Cryptography and Data Security. Springer, Berlin, Heidelberg, 2010.

[4] Cheng, Kewei, et al. “SecureBoost: A lossless federated learning framework.” arXiv preprint arXiv:1901.08755 (2019).

[5] Liu, Yang, et al. “A communication efficient vertical federated learning framework.” arXiv preprint arXiv:1912.11187 (2019).

[6] Zhuo, Hankz Hankui, et al. “Federated deep reinforcement learning.” arXiv preprint arXiv:1901.08277 (2019).

[7] Wang, Tengyun, et al. “A revenue-maximizing bidding strategy for demand-side platforms.” IEEE Access 7 (2019): 68692-68706.

[8] Liu, Yang, Xiong Zhang, and Libin Wang. “Asymmetrically vertical federated learning.” arXiv preprint arXiv:2004.07427 (2020).

[9] Rivest, Ronald L., Len Adleman, and Michael L. Dertouzos. “On data banks and privacy homomorphisms.” Foundations of secure computation 4.11 (1978): 169-180.

[10] Paillier, Pascal. “Public-key cryptosystems based on composite degree residuosity classes.” International conference on the theory and applications of cryptographic techniques. Springer, Berlin, Heidelberg, 1999.

[11] Gentry, Craig. “Fully homomorphic encryption using ideal lattices.” Proceedings of the forty-first annual ACM symposium on Theory of computing. 2009.

[12] Liu, Yang, et al. “Distributed privacy preserving iterative summation protocols.” arXiv preprint arXiv:2004.06348 (2020).

[13] Bonawitz, Keith, et al. “Practical secure aggregation for privacy-preserving machine learning.” Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2017.

[14] McMahan, H. Brendan, et al. “Federated learning of deep networks using model averaging.” (2016).

[15] Clifton, Chris, et al. “Tools for privacy preserving distributed data mining.” ACM SIGKDD Explorations Newsletter 4.2 (2002): 28-34.

[16] Damgård, Ivan, et al. “Multiparty computation from somewhat homomorphic encryption.” Annual Cryptology Conference. Springer, Berlin, Heidelberg, 2012.

[17] Mo, Yilin, and Richard M. Murray. “Privacy preserving average consensus.” IEEE Transactions on Automatic Control 62.2 (2016): 753-765.

[18] Ruan, Minghao, Huan Gao, and Yongqiang Wang. “Secure and privacy-preserving consensus.” IEEE Transactions on Automatic Control 64.10 (2019): 4035-4049.

[19] Wang, Jianyu, and Gauri Joshi. “Adaptive communication strategies to achieve the best error-runtime trade-off in local-update SGD.” arXiv preprint arXiv:1810.08313 (2018).
