This article was first published on Feng Yu’s blog, Yu-feng.top

Preface

These notes are mainly based on the paper “Federated Machine Learning: Concept and Applications” and the book *Federated Learning*. As introductory notes, they can serve as a translation of the paper or as a framework for further study. They follow the structure of the paper and try to integrate relevant parts of the book *Federated Learning*, preferring the book’s treatment where the two overlap.

Abstract

AI today still faces two major challenges:

  1. In most industries, data exists in the form of isolated “data islands”
  2. Growing emphasis on data privacy and security

A possible solution to the above challenges is proposed: Secure Federated Learning

The paper gives a comprehensive introduction to the secure federated learning framework, including:

1) Horizontal Federated Learning

2) Vertical Federated Learning

3) Federated Transfer Learning

It provides a comprehensive survey of FL definitions, architectures, applications, and existing work in this area. In addition, building data networks among organizations based on a federated mechanism is proposed as an effective way to share knowledge without compromising user privacy.

1. Introduction

AI is now showing its strength in almost every industry. However, looking back at its development, AI has inevitably experienced several peaks and troughs. **Will AI fall again? When, and why?** The current availability of big data is part of what drives public interest in AI: in 2016, AlphaGo achieved excellent results using 200,000 games as training data.

**However, the real world can be disappointing: outside a few industries, most fields have only limited or low-quality data, making the application of AI more difficult than we might think. Is it possible to merge the data into a common place by moving it between organizations? In fact, breaking down the barriers between data sources is very difficult, if not impossible, in many cases.** Data integration even between different departments of the same company faces serious obstacles due to industrial competition, privacy and security concerns, and complex administrative procedures. Consolidating data across countries and institutions is almost impossible, or at least prohibitively costly.

At the same time, awareness of the damage that large companies can cause to data security and user privacy, and the resulting emphasis on data protection, have become major issues worldwide. News of public data leaks attracts great attention from the media and governments; Facebook’s recent data breach, for example, caused widespread protest. In response, states around the world are strengthening laws that protect data security and privacy. For example, the General Data Protection Regulation (GDPR), implemented by the European Union on May 25, 2018, aims to protect users’ privacy and data security. It requires companies to use clear language in user agreements and grants users the “right to be forgotten,” meaning that users can have their personal data deleted or withdrawn; companies that violate the regulation face stiff fines. China’s 2017 Cybersecurity Law and the General Principles of the Civil Law require Internet businesses not to leak or tamper with personal information when conducting data transactions with third parties, and to ensure that proposed contracts comply with legal data-protection obligations. The establishment of these regulations will clearly contribute to a more civilized society, but it also brings new challenges to the data-transaction procedures commonly used in today’s AI.

More specifically, traditional AI data processing typically involves simple data-trading models, in which one party collects data and transfers it to another party, the second party cleans and fuses the data, and finally a third party takes the integrated data and builds models for others to use. The models are usually sold as the end product of a service. This traditional process is challenged by the new data regulations: because users may not be clear about the future uses of those models, such transactions can violate laws such as the GDPR.

As a result, we face a dilemma: our data exists in the form of isolated islands, yet in many cases we are forbidden to collect, merge, and use data from different places for AI processing. How to legally solve the problem of data fragmentation and isolation is a major challenge facing AI researchers and practitioners today.

2. FL overview

Google proposed the concept of FL, whose main idea is to build machine learning models on datasets distributed across multiple devices while preventing data leakage. Recent improvements have focused on overcoming statistical challenges [60, 77] and improving security [9, 23] in federated learning; there is also work on making federated learning more personalized [13, 60]. All of the above work focuses on on-device federated learning, where communication cost in massively distributed mobile user interactions, unbalanced data distributions, and device reliability are some of the main factors for optimization. In addition, the data is partitioned by user ID or device ID, i.e., horizontally in the data space. As stated in [58], this line of work is highly relevant to privacy-preserving machine learning, since it also considers data privacy in a decentralized collaborative learning setting. To extend the concept of federated learning to collaborative learning between organizations, we generalize the original “federated learning” to cover all privacy-preserving decentralized collaborative machine learning techniques. In [71], the authors give a preliminary overview of federated learning and federated transfer learning techniques. In this paper, we further survey the relevant security foundations and explore the relationship with several other related fields, such as multi-agent theory and privacy-preserving data mining. In this section, we provide a more comprehensive definition of federated learning that takes data partitioning, security, and applications into account. We also describe the workflow and architecture of a federated learning system.

2.1 Definition

Define $N$ data owners $\{F_1, \ldots, F_N\}$, each holding a dataset $D_i$, who wish to train a machine learning model by consolidating their data. A conventional method is to gather all the data in one place and use $D = D_1 \cup D_2 \cup \ldots \cup D_N$ to train a model $M_{SUM}$, whose accuracy is denoted $V_{SUM}$. A federated learning system is instead a learning process in which the data owners collaboratively train a model $M_{FED}$ without exposing their data $D_i$ to each other; the accuracy of $M_{FED}$ is denoted $V_{FED}$, and it should be very close to $V_{SUM}$. Formally, let $\sigma$ be a non-negative real number. If

$$|V_{FED} - V_{SUM}| < \sigma,$$

then we say the federated learning algorithm has a $\sigma$-accuracy loss.

2.2 Privacy in FL

  • Secure Multiparty Computation (SMC)

    The SMC security model involves multiple parties and provides security proofs in a well-defined simulation framework to guarantee complete zero knowledge, i.e., each party knows nothing except its own inputs and outputs. Zero knowledge is highly desirable, but this property usually requires complex computation protocols and may not be achievable efficiently. In some scenarios, partial knowledge disclosure may be considered acceptable if security guarantees are provided. It is possible to build a security model with SMC under lower security requirements in exchange for efficiency [16] (Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification, 2004). More recently, a study [46] used an SMC framework to train machine learning models with two servers under a semi-honest assumption. In [33], MPC protocols are used for model training and verification without requiring users to reveal sensitive data. One of the state-of-the-art SMC frameworks is Sharemind [8] (A Framework for Fast Privacy-Preserving Computations, 2008). In [44] (a mixed-protocol framework for machine learning, 2018), the authors propose a 3PC model [5, 21, 45] with an honest majority and consider security under both semi-honest and malicious assumptions. These works require participants’ data to be secret-shared among non-colluding servers; a toy sketch of the idea follows.
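As a purely illustrative sketch (not any framework’s actual API), additive secret sharing lets several compute servers add two private values without any single server seeing either of them:

```python
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(secret, n_parties):
    """Split a secret into n additive shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Two data owners each secret-share their private value across 3 servers.
a_shares = share(123, 3)
b_shares = share(456, 3)

# Each server adds the two shares it holds; no single server ever sees
# 123 or 456, yet the reconstructed sum is correct.
sum_shares = [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))  # 579
```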

  • Differential Privacy

    Another line of work uses differential privacy (Differential Privacy: A Survey of Results, 2008) or k-anonymity [63] (k-Anonymity: A Model for Protecting Privacy, 2002) for data privacy protection. These approaches [3] (Privacy-Preserving Data Mining, 2000) involve adding noise to the data, or using generalization to obscure certain sensitive attributes until a third party cannot distinguish between individuals, making the data impossible to restore and thereby protecting user privacy. In essence, however, these approaches still require the data to be transferred elsewhere, and they usually involve a trade-off between accuracy and privacy. In [23] (Differentially Private Federated Learning: A Client-Level Perspective, 2017), the authors introduce a differential privacy approach for federated learning that protects client data by hiding clients’ contributions during training.
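A minimal sketch of the Laplace mechanism, the basic building block of differential privacy (the function name and clipping bounds are made up for this example; clipping bounds each record’s influence, which determines the sensitivity):

```python
import numpy as np

def laplace_mean(values, lo, hi, epsilon, rng):
    """Release the mean of `values` with epsilon-differential privacy."""
    values = np.clip(values, lo, hi)        # bound each record's influence
    sensitivity = (hi - lo) / len(values)   # max change from altering one record
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.mean() + noise

rng = np.random.default_rng(42)
incomes = rng.uniform(20_000, 120_000, size=10_000)
# Smaller epsilon -> more noise -> stronger privacy, lower accuracy.
print(laplace_mean(incomes, 0, 150_000, epsilon=0.5, rng=rng))
```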

  • Homomorphic Encryption

    Homomorphic encryption [53] (On Data Banks and Privacy Homomorphisms, 1978) has also been adopted in machine learning to protect user data privacy through parameter exchange under an encryption scheme [24, 26, 48]. **Unlike differential privacy, the data and the model are never transmitted in plaintext, nor can they be guessed from another party’s data.** The likelihood of leakage at the raw-data level is therefore minimal. Recent work has used homomorphic encryption to centralize and train data in the cloud [75, 76]. In practice, additively homomorphic encryption [2] (A Survey on Homomorphic Encryption Schemes: Theory and Implementation, 2018) is widely used, and polynomial approximations are needed to evaluate the nonlinear functions in machine learning algorithms, which again trades accuracy for privacy [4, 35].
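A minimal sketch of additively homomorphic aggregation, assuming the python-paillier (`phe`) package is available; the gradient values are made up for the example:

```python
# pip install phe  (python-paillier, assumed available)
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Two parties encrypt their local gradient components with the same public
# key; the aggregator adds the ciphertexts without ever decrypting them.
grad_a = [0.12, -0.05, 0.33]
grad_b = [0.07, 0.20, -0.11]
enc_a = [public_key.encrypt(g) for g in grad_a]
enc_b = [public_key.encrypt(g) for g in grad_b]

enc_sum = [x + y for x, y in zip(enc_a, enc_b)]   # additive homomorphism

# Only the private-key holder can decrypt the aggregated result.
print([round(private_key.decrypt(c), 2) for c in enc_sum])
# [0.19, 0.15, 0.22]
```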

2.2.1 Indirect information disclosure

Pioneering work on federated learning exposes intermediate results, such as the parameter updates of optimization algorithms like stochastic gradient descent (SGD) [41, 58]. However, no security guarantee is provided, and when exposed together with the data structure, such as in the case of image pixels, leakage of these gradients may actually reveal important information about the data [51] (Privacy-Preserving Deep Learning via Additively Homomorphic Encryption, 2018).

In [6] (How To Backdoor Federated Learning, 2018), the authors demonstrate that it is possible to insert hidden backdoors into the federated global model, proposing a new **constrain-and-scale** model-poisoning technique that evades anomaly detection.

In [43] (Inference Attacks Against Collaborative Learning, 2018), researchers identified potential vulnerabilities in collaborative machine learning systems: the training data used by the parties in collaborative learning is vulnerable to inference attacks. They showed that adversarial participants can infer membership as well as properties associated with a subset of the training data, and they discussed possible defenses against these attacks.

In [62] (Securing Distributed Machine Learning in High Dimensions, 2018), the authors reveal potential security issues in gradient exchange between parties and propose a secure variant of gradient descent, showing that it can tolerate a constant fraction of Byzantine workers.

Researchers are also beginning to consider blockchain as a platform to facilitate federated learning. In On-Device Federated Learning via Blockchain and its Latency Analysis (2018), researchers propose a blockchained federated learning (BlockFL) architecture, in which blockchain is used to exchange and verify the local model updates of mobile devices. They consider optimal block generation, network scalability, and robustness.

2.3 Categorization of FL

We discuss the categorization of federated learning according to how the data is distributed. $D_i$ denotes the data owned by participant $i$. Each row of the data matrix represents a sample and each column a feature; some datasets may also contain labels. Let $X$ denote the feature space, $Y$ the label space, and $I$ the sample ID space; together, $(I, X, Y)$ constitutes a complete training dataset. A toy illustration of the two basic partitioning schemes follows.
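The sketch below uses purely illustrative shapes and IDs to show sample-space versus feature-space partitioning:

```python
import numpy as np

# A toy "complete" dataset: 6 samples (rows) x 4 features (columns).
X = np.arange(24).reshape(6, 4)
sample_ids = np.array([101, 102, 103, 104, 105, 106])

# Horizontal (sample) partition: same feature space, disjoint sample IDs.
party1_h, party2_h = X[:3, :], X[3:, :]

# Vertical (feature) partition: same sample IDs, disjoint feature spaces.
party1_v, party2_v = X[:, :2], X[:, 2:]

print(sample_ids[:3], party1_h.shape)  # party 1 holds IDs 101-103, all 4 features
print(sample_ids, party1_v.shape)      # both parties hold all 6 IDs, 2 features each
```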

2.3.1 Horizontal federated learning

Horizontal Federated Learning (HFL), also called sample-partitioned federated learning, applies to scenarios in which the participants’ datasets share the same feature space but differ in sample space, similar to horizontally partitioning a table of data. For example, two regional commercial banks may have very different customer groups in their respective regions, so the intersection of their customers is small and their datasets have different sample IDs; but their businesses are very similar, so the feature spaces of their datasets are the same. The two banks can then join in horizontal federated learning to build a better risk-control model. The conditions for horizontal federated learning are summarized as:

$$X_i = X_j, \quad Y_i = Y_j, \quad I_i \neq I_j, \qquad \forall D_i, D_j, \; i \neq j$$

HFL security definition

It is generally assumed that the participants of an HFL system are honest and that the object to guard against is an honest-but-curious aggregation server; that is, usually only the server can threaten the privacy of the data participants.

Related research

  • Practical Secure Aggregation for Privacy-Preserving Machine Learning (2017) proposes a secure aggregation method for protecting aggregated user model updates or gradient information under a federated learning framework;
  • Privacy-Preserving Deep Learning via Additively Homomorphic Encryption (2018) proposes aggregating model parameters with additively homomorphic encryption (AHE), which prevents the central server in a federated learning system from stealing model information or data privacy;
  • Federated Multi-Task Learning (2017) proposes a multi-task federated learning system that allows multiple participants to complete different machine learning tasks while sharing knowledge and preserving privacy. The multi-task learning model can also address high communication costs, network delay (stragglers), and fault tolerance.
  • Federated Learning of Deep Networks Using Model Averaging (2016) proposes a secure client-server architecture, in which data is stored on the clients (participants), each client trains a machine learning model locally on its own data, each client sends its model parameters to a federated learning server (i.e., the coordinator), and the server builds a global model by aggregating (e.g., averaging) the received models. The model construction process guarantees that the data is never exposed, protecting data security and user privacy. Further, Federated Optimization: Distributed Machine Learning for On-Device Intelligence (2016) proposes methods for reducing communication overhead when training centralized models on data distributed across mobile devices.
  • Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training (2017) proposes a compression method called Deep Gradient Compression (DGC), which can greatly reduce the communication bandwidth required in large-scale distributed training (a minimal sketch of its top-k sparsification core follows this list).
  • Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning (2017) considers a malicious-user threat model, bringing new security challenges to federated learning: when federated model training is complete, the parameters of the aggregated model and the entire model are exposed to all participants.
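As promised above, here is a minimal sketch of the top-k gradient sparsification at the core of DGC; the paper’s momentum correction, gradient clipping, and other refinements are omitted, and the function name is illustrative:

```python
import numpy as np

def topk_sparsify(grad, ratio=0.01):
    """Keep only the largest-magnitude `ratio` of gradient entries.
    The zeroed-out remainder (the residual) is kept locally and
    accumulated into later rounds instead of being discarded."""
    k = max(1, int(grad.size * ratio))
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of top-k magnitudes
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    residual = flat - sparse                       # kept locally, not transmitted
    return sparse.reshape(grad.shape), residual.reshape(grad.shape)

grad = np.random.default_rng(0).normal(size=(1000,))
sent, residual = topk_sparsify(grad, ratio=0.01)
print(np.count_nonzero(sent))  # 10 values transmitted instead of 1000
```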

2.3.2 Vertical federated learning

Vertical Federated Learning (VFL), also understood as feature-partitioned federated learning, applies to federations of participants whose datasets share the same sample space but differ in feature space. Datasets owned by different organizations often have different feature spaces because of their different business purposes, yet these organizations may share a large common user population. With VFL, we can use the heterogeneous data distributed across these organizations to build better machine learning models without exchanging or exposing the private data.

The conditions for vertical federated learning are:

$$X_i \neq X_j, \quad Y_i \neq Y_j, \quad I_i = I_j, \qquad \forall D_i, D_j, \; i \neq j$$

In the VFL setting, several assumptions are made to achieve security and privacy protection:

  1. VFL assumes that the participants are honest but curious. (That is, a participant follows the security protocol but tries to deduce as much as possible from the information it receives from other participants. Since the participants also want to build a more accurate model, they do not collude with each other.)
  2. VFL assumes that the transmission of information is secure and reliable enough to withstand attacks. In addition, it is assumed that the communication is lossless and does not change the content of the intermediate results.

VFL security definition

Consider a VFL system with honest-but-curious participants. In a two-party scenario, for example, the parties do not collude and at most one of them is compromised by an adversary. Security is then defined as follows: the adversary can learn data only from the participant it has compromised, and nothing about the other participants’ data beyond what the inputs and outputs reveal. To facilitate secure computation between the two parties, a semi-honest third party (STP) is sometimes introduced, which is assumed not to collude with either party. MPC provides formal privacy proofs for these protocols (How to Play Any Mental Game, 1987). At the end of the learning process, each participant holds only the model parameters associated with its own features; at inference time, therefore, the two parties also need to collaborate to produce the output.

Related research

Privacy-preserving machine learning algorithms have been proposed for vertically partitioned data, including collaborative statistical analysis [15], association rule mining [65], secure linear regression [22, 32, 55], classification [16], and gradient descent [68].

Recently, [27] (Private Federated Learning on Vertically Partitioned Data via Entity Resolution and Additively Homomorphic Encryption, 2017) and [49] (Entity Resolution and Federated Learning Get a Federated Resolution, 2018) proposed training a privacy-preserving logistic regression model on vertically partitioned data. The authors studied the effect of entity resolution on learning performance and applied a Taylor approximation to the loss and gradient functions, so that homomorphic encryption can be used for the privacy-preserving computation.
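Concretely, a second-order Taylor expansion of the logistic loss around zero (the standard trick in this line of work) replaces the loss with a low-degree polynomial, which additively homomorphic encryption can evaluate using only additions and multiplications:

```latex
\ell(z) = \log\!\left(1 + e^{-z}\right), \qquad z = y\,\mathbf{w}^{\top}\mathbf{x}, \; y \in \{-1, +1\}

\ell(z) \approx \ell(0) + \ell'(0)\,z + \tfrac{1}{2}\,\ell''(0)\,z^{2}
       = \log 2 - \tfrac{1}{2}\,z + \tfrac{1}{8}\,z^{2}

\frac{\partial \ell}{\partial \mathbf{w}}
  \approx \left(\tfrac{1}{4}\,\mathbf{w}^{\top}\mathbf{x} - \tfrac{1}{2}\,y\right)\mathbf{x}
```

Here the gradient uses $y^{2} = 1$; the approximation is accurate near $z = 0$ and degrades for large $|z|$, which is the accuracy/privacy trade-off mentioned earlier.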

2.3.3 Federated transfer learning

HFL and VFL require all participants to have the same feature space or sample space, so as to build an effective shared machine learning model.

However, in more practical cases, the data sets owned by each participant may be highly different:

  • There may be only a few overlapping samples and features between participants’ datasets;
  • The distribution of these data sets can vary widely;
  • The size of these data sets can vary greatly;
  • Some participants may have little or no labeled data

To address these problems, federated learning can be combined with transfer learning techniques so that it applies to a wider range of business scenarios. This helps participants that have only a small amount of data (few overlapping samples and features) or weak supervision (few labels) to build effective and accurate machine learning models, while complying with data privacy and security regulations. We call this combination federated transfer learning (FTL).

FTL is an important extension of existing federated learning systems because it can handle problems beyond the capabilities of HFL and VFL:

$$X_i \neq X_j, \quad Y_i \neq Y_j, \quad I_i \neq I_j, \qquad \forall D_i, D_j, \; i \neq j$$

FTL security definition

A federated transfer learning system usually involves two parties. As shown in the next section, its protocol is similar to that of VFL, so the VFL security definition can be extended to this case.

FTL differs from traditional TL

Technically, the differences fall into two main aspects:

  • Federated transfer learning builds models on data distributed across multiple parties, and each party’s data cannot be gathered centrally or exposed to the other parties; traditional transfer learning has no such restriction;
  • Federated transfer learning must protect user privacy and data (and even model) security, which is not a major concern in traditional transfer learning.

2.4 Federated Learning System architecture

2.4.1 Horizontal federated learning

HFL systems come in two architectures: the client-server architecture and the peer-to-peer (P2P) architecture.

1) Client-server architecture

The client-server architecture of a typical HFL system, also known as the master-worker or hub-and-spoke architecture, is shown in the following figure.

HFL system training process:

  1. Each participant computes the model gradient locally, masks the gradient information using homomorphic encryption, differential privacy, or secret sharing, and sends the masked result (referred to as the encrypted gradient) to the aggregation server;
  2. The server performs secure aggregation, e.g., a weighted average over homomorphically encrypted values;
  3. The server sends the aggregated result back to each participant;
  4. Each participant decrypts the received aggregate and updates its local model parameters with the decrypted gradient.

The federated averaging algorithm (FedAvg) comes in two flavors, gradient averaging and model averaging; in both, the aggregation server computes a weighted average after receiving the participants’ gradient information or model parameters. A minimal model-averaging sketch follows.
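The sketch below is an unencrypted, plain-NumPy illustration of model averaging with linear regression as the local task; all names and data are made up, and a real HFL deployment would add the encryption/masking steps listed above:

```python
import numpy as np

def local_train(weights, X, y, lr=0.1, epochs=5):
    """A participant runs a few epochs of plain gradient descent for
    linear regression on its private data and returns its local weights."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(global_w, clients):
    """Model averaging: aggregate local models weighted by sample count."""
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_train(global_w, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):                        # 3 participants with private data
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(20):                       # 20 communication rounds
    w = fedavg_round(w, clients)
print(w.round(2))                         # close to [ 2. -1.]
```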

Security analysis

If the federated averaging algorithm is combined with secure multi-party computation (SMC) or homomorphic encryption, the framework can protect against a semi-honest server and data leakage. However, during collaborative learning, if a malicious participant trains a generative adversarial network (GAN), the system may be vulnerable to attack (Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning, 2017).

2) Peer-to-peer network architecture

Since there is no central server in the peer-to-peer network architecture, trainers must agree in advance on the order in which model parameters are sent and received. There are two main ways to achieve this:

  • Cyclic transfer
  • Random transfer

Advantages and disadvantages

**Advantage:** one obvious advantage of the peer-to-peer architecture over the client-server architecture is that it removes the central server (i.e., the parameter/aggregation server, or coordinator), which may be difficult to obtain or set up in some practical applications.

**Disadvantage:** in the cyclic transfer mode, because there is no central server, the weight parameters are updated serially rather than in batch, so training the model takes more time. A minimal sketch of the cyclic mode follows.
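The sketch below illustrates only the agreed transmission order of the cyclic mode (plain NumPy, linear regression as the local task; serialization and protection of the transmitted weights are omitted, and all names are made up):

```python
import numpy as np

def train_step(w, X, y, lr=0.05, steps=10):
    """One trainer updates the received model on its own private data."""
    for _ in range(steps):
        w = w - lr * 2.0 * X.T @ (X @ w - y) / len(y)
    return w

rng = np.random.default_rng(2)
true_w = np.array([1.5, 0.5])
parties = []
for _ in range(4):                    # 4 trainers, each with private data
    X = rng.normal(size=(40, 2))
    parties.append((X, X @ true_w + 0.01 * rng.normal(size=40)))

# Cyclic transfer: no server; the current model is passed around the ring
# and each trainer updates it in turn before handing it to the next one.
w = np.zeros(2)
for _ in range(5):                    # 5 laps around the ring
    for X, y in parties:              # the pre-agreed transmission order
        w = train_step(w, X, y)
print(w.round(2))                     # close to [1.5 0.5]
```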

2.4.2 Longitudinal federated learning

The VFL training process generally consists of two parts:

  • Encrypted entity alignment (since the user populations of the Party A and Party B companies differ, the system uses an encryption-based user ID alignment technique)

    • Privacy-Preserving Inter-Database Operations, 2004

    • Privacy-Preserving Schema and Data Matching, 2007

      As described in the two articles above, this technique ensures that Party A and Party B can find their common users without exposing their raw data to each other. As shown in Figure 5-2.

  • Encrypted model training (four steps)

    • Coordinator C creates a key pair and sends the public key to Party A and Party B;
    • Party A and Party B encrypt and exchange their intermediate results, which are used to compute the gradients and the loss;
    • Party A and Party B each compute their encrypted gradient and add an additional random mask; Party B also computes the encrypted loss; both send their encrypted results to C;
    • C decrypts the gradients and loss and sends the results back to Party A and Party B; the parties remove their masks from the gradient information and update their model parameters accordingly (see the sketch after this list).
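As promised above, here is a minimal sketch of the mask-then-decrypt exchange in steps 3 and 4, assuming the python-paillier (`phe`) package. The computation of the encrypted gradient from exchanged intermediate results is omitted; the gradient values and mask range are made up for the example:

```python
# pip install phe  (python-paillier, assumed available)
import random
from phe import paillier

# Step 1: coordinator C generates the key pair and distributes the public key.
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Party A's true gradient (in the real protocol, computed under encryption
# from the intermediate results exchanged with Party B).
grad_a = [0.21, -0.34]

# Step 3: A adds a random mask under encryption before sending to C, so C,
# who holds the private key, still cannot see A's real gradient.
mask = [random.uniform(-1, 1) for _ in grad_a]
enc_masked = [public_key.encrypt(g) + m for g, m in zip(grad_a, mask)]

# Step 4: C decrypts and returns the masked plaintext gradient ...
masked_plain = [private_key.decrypt(c) for c in enc_masked]

# ... and A removes its mask to recover the gradient and update its model.
recovered = [mp - m for mp, m in zip(masked_plain, mask)]
print([round(g, 2) for g in recovered])   # [0.21, -0.34]
```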

VFL algorithms

The paper presents two representative algorithms: secure federated linear regression and the secure federated boosting tree (SecureBoost, 2019).

2.4.3 Federated transfer learning

Suppose that in the VFL example above, parties A and B have only a very small set of overlapping samples, and we are interested in learning labels for all of Party A’s dataset. The VFL architecture described so far applies only to the overlapping samples. To extend its coverage to the entire sample space, we introduce transfer learning. This does not change the overall VFL architecture, but it does change the details of the intermediate results exchanged between Party A and Party B. Specifically, transfer learning typically involves learning a common representation between the features of parties A and B, and minimizing the error in predicting labels for the target-domain party (here, Party A) by exploiting the labels of the source-domain party (here, Party B). The gradient computations of Party A and Party B therefore differ from those of the vertical federated learning scheme; at inference time, both parties are still needed to compute a prediction. A toy illustration of the common-representation idea follows.
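The sketch below is a purely illustrative, plaintext toy (synthetic NumPy data, hypothetical setup); it omits all of the encryption and masking that the actual FTL protocol applies to the exchanged intermediate results, and uses closed-form least squares instead of the real gradient-based training:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, dA, dB = 200, 3, 6, 5

# Toy generative setup: a hidden common factor Z drives both parties'
# features, and only Party B observes labels.
Z = rng.normal(size=(n, k))
XA = Z @ rng.normal(size=(k, dA)) + 0.05 * rng.normal(size=(n, dA))
XB = Z @ rng.normal(size=(k, dB)) + 0.05 * rng.normal(size=(n, dB))
y = np.sign(Z @ rng.normal(size=k))

# Party B learns a projection to a latent space plus a linear label scorer.
WB, *_ = np.linalg.lstsq(XB, Z, rcond=None)       # B's representation
wy, *_ = np.linalg.lstsq(XB @ WB, y, rcond=None)  # B's label predictor

# The first 50 samples are the small overlap; A aligns its own projection
# to B's latents there, then predicts labels on samples B never sees.
WA, *_ = np.linalg.lstsq(XA[:50], XB[:50] @ WB, rcond=None)
pred = np.sign((XA[50:] @ WA) @ wy)
print((pred == y[50:]).mean())                    # well above chance
```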

Secure Federated Transfer Learning (2018) presents a secure, feature-based federated transfer learning framework.

The following figure depicts the data view of the federated learning framework:

2.4.4 Incentive mechanism

In federated learning, the key to incentivizing participants to keep contributing to the data federation is to share the profits generated by the federation with them fairly and justly. The Federated Learning Incentivizer (FLI) payoff-sharing scheme dynamically allocates a given budget among the participants so as to maximize the sustainable operation of the federation while minimizing unfairness among participants. It can also be extended into a moderation mechanism that helps the federation defend against malicious actors.

  • Game Theory for Data Science: Eliciting Truthful Information, 2017

3 Related work

3.1 Privacy-preserving machine learning

Federated learning can be thought of as privacy-preserving and decentralized collaborative machine learning. Therefore, it is closely related to multi-party privacy protection machine learning. In the past, much research has been devoted to this area.

For example, the authors of [17, 67] propose secure multi-party decision-tree algorithms for vertically partitioned data.

Vaidya and Clifton proposed secure association rule mining [65], secure k-means [66], and a secure naive Bayes classifier [64] for vertically partitioned data.

[31] proposes an algorithm for secure association rule mining over horizontally partitioned data.

Secure support vector machine algorithms have been developed for vertically segmented data [73] and horizontally segmented data [74].

[16] proposes secure protocols for multivariate linear regression and classification.

[68] proposes a secure multi-party gradient descent method. All of these works use SMC [25, 72] for privacy protection.

See Chapter 2 of *Federated Learning*.

3.2 Federated learning vs. distributed machine learning

At first glance, horizontal federated learning resembles distributed machine learning. Distributed machine learning covers many aspects, including distributed storage of training data, distributed execution of computing tasks, and distributed serving of model results. The parameter server [30] (More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server, 2013) is a typical element of distributed machine learning: to speed up training, it stores data on distributed worker nodes and allocates data and computing resources through a central scheduling node so that models are trained more efficiently. In HFL, by contrast, a worker node represents a data owner with full autonomy over its local data: it can decide when and how to join federated learning, whereas in the parameter-server setting the central node always takes control. FL therefore faces a more complex learning environment. In addition, federated learning emphasizes protecting the data owners’ data privacy during model training; effective privacy-protection measures will help cope with the increasingly strict regulation of data privacy and data security in the future.

As in distributed machine learning settings, federated learning also needs to handle non-IID data. The authors of [77] (Federated Learning with Non-IID Data, 2018) show that non-IID local data can greatly reduce the performance of federated learning; in response, they offer a new approach to the problem similar to transfer learning. A sketch of how such skewed partitions are commonly simulated is given below.
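Dirichlet label partitioning is a common way in the literature to simulate non-IID federated data; the sketch below assumes that generic technique, not necessarily the exact setup of [77]:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, rng):
    """Split sample indices across clients with a Dirichlet label prior:
    small alpha -> highly non-IID (each client dominated by few classes)."""
    idx_by_class = [np.where(labels == c)[0] for c in np.unique(labels)]
    clients = [[] for _ in range(n_clients)]
    for idx in idx_by_class:
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for cid, part in enumerate(np.split(idx, cuts)):
            clients[cid].extend(part.tolist())
    return clients

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=5000)
clients = dirichlet_partition(labels, n_clients=5, alpha=0.1, rng=rng)
for cid, idx in enumerate(clients):
    print(cid, np.bincount(labels[idx], minlength=10))  # very skewed counts
```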

See Chapter 3 of *Federated Learning*.

3.3 Federated learning vs. edge computing

Because federated learning provides learning protocols for coordination and security, it can be thought of as an operating system for edge computing.

The authors of [69] (When Edge Meets Learning: Adaptive Control for Resource-Constrained Distributed Machine Learning, 2018) consider a general class of machine learning models trained with gradient descent. From a theoretical standpoint, they analyze the convergence bound of distributed gradient descent and, based on it, propose a control algorithm that determines the best trade-off between local updates and global parameter aggregation, so as to minimize the loss function under a given resource budget.

3.4 Federated learning vs. federated database systems

A federated database system [57] (Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases, 1990) integrates multiple database units and manages the integrated system as a whole. The federated database concept was proposed to achieve interoperability among multiple independent databases. Federated database systems typically use distributed storage for the database units, and in practice the data in each unit is heterogeneous, so they have much in common with federated learning in terms of data types and data storage. However, federated database systems do not involve any privacy protection mechanism in the interactions among units, and all database units are fully visible to the management system. Moreover, a federated database system focuses on the basic operations on data (insert, delete, search, merge), whereas the purpose of federated learning is to build a federated model for the data owners while protecting data privacy, so that the values and patterns contained in the data can serve us better.

4 Applications

  • Finance
  • Healthcare
  • Education
  • Urban computing and smart cities
  • Edge computing and smart cities
  • Blockchain
  • 5G

See Chapter 10 of *Federated Learning*.

5 Federated Learning and Enterprise Data Alliances

As we all know, the cloud computing model is being challenged by the growing importance of data privacy and security and by the tight coupling between corporate profits and data. The business model of federated learning, however, provides a new paradigm for big data applications. **When the isolated data held by each institution fails to produce an ideal model on its own, the mechanism of federated learning makes it possible for institutions and enterprises to share a unified model without exchanging data.** Furthermore, **with the consensus mechanism of blockchain technology, federated learning can set fair rules for distributing profits.** Data owners, regardless of the scale of their data, will thus be motivated to join the data federation and earn their share of the profits. We believe that the business model of data alliances should be developed together with the technical mechanism of federated learning, and that federated learning standards should be developed across fields so that they can be put into use as soon as possible.

6 Challenges and Prospects

HFL

  • In an HFL system, we cannot view or inspect the distributed training data. This makes it difficult to choose the hyperparameters of a machine learning model and to set up the optimizer, especially when training DNN models. It is generally assumed that the coordinator or server has an initial model and instructions on how to train it, but in practice, without collecting any training data in advance, it is almost impossible to choose the right DNN hyperparameters and optimizer settings;

    • Federated Learning, 2018.05, Federated Learning, 2018.08
  • How to effectively motivate companies and institutions to participate in the HFL system?

    • Effective data protection policies, appropriate incentives and commercial models for HFL need to be designed
  • How to prevent cheating by participants?

    • A holistic approach to protecting honest participants needs to be devised
  • To achieve large-scale commercial use of HFL, mechanisms for governing the training process also need to be studied. For example, since model training and evaluation are performed locally at each participant, we need to explore new ways to avoid overfitting and to decide when to stop training early

  • How to manage participants with different levels of reliability?

VFL

  • VFL training is vulnerable to communication failures and requires a reliable and efficient communication mechanism

    • It may be necessary to design a streaming communication mechanism that efficiently schedules each participant’s training and communication to offset delays in data transmission. Fault-tolerance mechanisms that can tolerate crashes during the VFL process are also details that must be considered when implementing a VFL system
  • Because VFL typically requires closer and direct interaction between participants, flexible and efficient security protocols are required to meet the security needs of each party

    • Previous work (Server-Aided Secure Computation with Off-line Parties, 2017; Machine Learning Classification over Encrypted Data, 2015) shows that only security tools tailored to the computation at hand can achieve the best results
  • Efficient and secure entity alignment techniques are needed

FTL

  • A scheme for learning transferable knowledge needs to be developed, one that captures the invariants among participants well. In sequential and centralized transfer learning, the transferred knowledge is usually represented by a pre-trained model; in federated transfer learning, it is learned jointly by the participants’ local models. Since each participant retains full control over the design and training of its local model, a balance must be struck between participant autonomy and the generalization performance of the federated model.
  • It is necessary to determine how to protect the privacy of the representations shared by the participants, and how to learn transferable knowledge representations in a distributed environment.
  • Efficient security protocols that can be deployed in FTL need to be designed; when designing or choosing a security protocol, security must be balanced against computational overhead

Recently, data isolation and the emphasis on data privacy have become the next challenges for AI, but federated learning brings new hope: it can build a unified model for multiple enterprises while keeping their data local, so that enterprises can collaborate with data security ensured.

References

  • Yang Q, Liu Y, Chen T, et al. Federated machine learning: Concept and applications[J]. ACM Transactions on Intelligent Systems and Technology (TIST), 2019, 10(2): 1-19.
  • Yang Qiang, et al. Federated Learning [M].

Given my limited ability, I will stop organizing the notes here for now and update them later as my research progresses.

— fZHIY. updated at 18:27 on 23 August 2020