Abstract:

News

  • Impala graduates to a top-level Apache project. Cloudera announced the Impala project five years ago, and Impala joined the Apache Software Foundation three years later; it has now graduated as a top-level Apache project. Impala is a high-performance analytics database for fast, distributed SQL queries against petabytes of data stored in Apache Hadoop clusters. Alibaba Cloud's E-MapReduce product also recently added support for Impala.
  • Apache Hadoop 3.0.0 GA released. After four alpha releases and one beta, Hadoop 3.0 has been released with new features such as HDFS erasure coding, YARN Timeline Service v.2, and S3Guard. Hadoop 3.0 is also the first release to officially support the Alibaba Cloud OSS file system.
  • The pre-research team of Tencent's Security Platform Department has found that TensorFlow, Google's machine learning system, contains security risks that attackers could exploit to pose a security threat. The team has reported the risk to Google.

Technology

  • The author is a big data development engineer at China Minsheng Bank. This article covers the key steps in migrating from an Oracle-based storage and query system to the distributed HBase platform, including technology selection, RowKey design, and writing to HBase from Spark. The author also records the pitfalls encountered along the way.
  • This article, translated from a Confluent blog post, takes an in-depth look at transactions in Kafka. The author covers the main use cases the transaction API was designed for, Kafka's transaction semantics, the details of the Java client's transaction API, and some interesting aspects of the implementation. The article also discusses several important considerations when using the API.
  • This article presents an approach to audio classification (by category, scene, and so on) using TensorFlow, with detailed guidance on candidate models, candidate data sets, data set preparation, model training, and result extraction. In particular, the author describes how to implement a web interface and integrate with IoT.
  • This article was written by the then team leader and architect of the recommendation department at Jingdong (JD.com) and is included in Jingdong's technical book. It introduces the overall business architecture of the recommender system, the construction of the underlying data platform, and the key technical points. Open-source big data products such as Hadoop, Spark, HBase, and Kafka play important roles in both offline and online computing.