Abstract:

At the start of the New Year, three data-security stories about the BAT companies (Baidu, Alibaba, and Tencent) drew widespread public attention and raised concerns about how large Internet companies collect and use user data. This article objectively analyzes those concerns and explains how Internet companies actually use such data, making it a good primer on data security.

Information

  • At the start of the New Year, three data-security stories about the BAT companies successfully captured public attention and raised concerns about how large companies collect and use user data. This article objectively analyzes those concerns and explains how Internet companies actually use the data, making it a good popularization of data security.
  • Machine learning and deep learning have always had a relatively high barrier to entry: you need some grounding in algorithms, reasonably solid math, and at least enough background to understand what a loss function means. AutoML Vision lets ordinary users create and train their own models with a few clicks on a web page, so even companies with little deep-learning expertise can build their own AI systems. This matters for popularizing artificial intelligence.
  • AI systems beating humans is nothing new. This time the task was “reading comprehension”: machines and humans both read passages and answer questions based on their understanding of them. The end result is that Microsoft’s and Alibaba’s systems each narrowly surpassed the human baseline.

Technology

  • The big data ecosystem has gradually evolved from the early, purely batch-processing stage (Hadoop, Hive, Spark) to one that also integrates stream processing (Kafka, Flink) and machine learning (TensorFlow), realizing a one-stop process from data collection to storage, analysis, and mining. With ever more ecosystem components and application scenarios, integrating these components into a complete data pipeline is a major challenge. This article introduces the idea of implementing a data pipeline with container technology: relying entirely on cloud services and using Kubernetes to provide unified orchestration. In this solution, Hadoop, Spark, TensorFlow, and so on are containerized, and Kubernetes controls the data flows between the containers. This amounts to a serverless framework (see the linked article for more on serverless technologies). The article also mentions Nuclio, another serverless framework, and links to Eliran Bivas’ talk at KubeCon 2017.
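To make the orchestration idea concrete, here is a minimal sketch of one containerized pipeline step expressed as a Kubernetes `batch/v1` Job manifest, built as a plain Python dict. The image names and commands are hypothetical placeholders, not taken from the article.

```python
import json

def make_job(name, image, command):
    """Build a Kubernetes batch/v1 Job manifest as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "command": command,
                    }],
                    "restartPolicy": "Never",
                }
            }
        },
    }

# Two hypothetical pipeline steps: a Spark batch job followed by a
# TensorFlow training job. In the article's design, Kubernetes would
# orchestrate these containers and the data flow between them.
spark_step = make_job("spark-etl", "spark:2.2", ["spark-submit", "etl.py"])
tf_step = make_job("tf-train", "tensorflow:1.4", ["python", "train.py"])
print(json.dumps(spark_step, indent=2))
```

In a real deployment these manifests would be applied with `kubectl` or a workflow engine rather than printed, but the dict shape shows what each containerized stage looks like to the orchestrator.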
  • Understanding recommendation systems from theory to strategy, algorithms, architecture, and product | A Spark practice case. This is a purely technical article introducing the principles and implementation of recommendation systems. Broadly speaking, a recommendation system, like a search engine or an advertising system, solves a relevance-ranking problem. The core of that problem is how to define relevance, which leads to a variety of similarity measures. In the simplest case, when a user searches for or clicks on an item, the system recommends the items most similar to it. The process is as complicated as it is simple: the article systematically analyzes the factors that must be carefully considered in concrete similarity computation, such as the long-tail effect (the Matthew effect), overly large computation matrices, and weighting multiple factors. It then gives Spark implementations for two classic scenarios (or techniques), personalized recommendation and collaborative filtering, which beginners can use as practice when learning recommendation systems. The article's value is that, beyond algorithms and technology, it also discusses product forms and technical architecture, areas ordinary engineers often lack; readers who want to become well-rounded should read that part closely.
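To make the similarity idea concrete, here is a minimal, self-contained sketch of item-based collaborative filtering with cosine similarity in plain Python. The ratings matrix is a toy example invented for illustration; the article itself works with Spark.

```python
import math

# Toy user -> item ratings, invented for illustration.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 1, "C": 5},
    "carol": {"A": 1, "B": 5},
}

def item_vector(item):
    """Ratings of one item across all users, as a sparse {user: rating} dict."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    common = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user, k=1):
    """Score items the user has not rated by similarity to items they have."""
    seen = ratings[user]
    unseen = {i for r in ratings.values() for i in r} - set(seen)
    scores = {
        i: sum(cosine(item_vector(i), item_vector(j)) * seen[j] for j in seen)
        for i in unseen
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("carol"))  # carol has not rated item C
```

At scale, exactly the problems the article raises appear: the item-item similarity matrix grows quadratically, and popular head items dominate the scores, which is why Spark and long-tail corrections enter the picture.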
  • The link above is the first in a series of articles on Apache Ranger. Across the series, the author analyzes research and selection, testing, and underlying principles in depth, making it a good introduction to Ranger. If you are interested in Hadoop security options more broadly, see the linked English article.
  • This document describes Spark security in terms of authentication, authorization, data/link encryption, and secure interaction with other systems. For authentication it covers SPNEGO (a Kerberos-based HTTP authentication mechanism), LDAP, and SASL (described in the data-link-encryption section). The authorization section explains how to use Spark ACLs, and the encryption section identifies which data and which links in a Spark system need to be encrypted. The article closes by introducing how Spark interacts securely with other systems.