Apache's First Asia Technology Summit: A detailed introduction to the speakers of the Big Data forum

Introduction

As more and more enterprises begin their digital transformation, the big data industry is developing at an unprecedented pace, and that growth brings both opportunities and challenges to the technologies of the big data ecosystem. If you work with big data technology, you are almost certainly familiar with Apache: the vast majority of open source big data technologies come from the Apache Software Foundation. Today, I would like to introduce Apache's annual event, ApacheCon.

ApacheCon

The official global conference series

ApacheCon is the official global conference series of the Apache Software Foundation (ASF), held annually. A prestigious open source gathering, it is one of the most anticipated conferences in the open source industry.

Since its inception in 1998, ApacheCon has attracted more than 350 technology projects and their communities. It brings together industry experts and educators from home and abroad to share the latest global technology trends and practices and to discuss "tomorrow's technology", so that technology enthusiasts can see where each technology is heading and which trends and developments can help them upgrade their tech stack.

Until now, however, ApacheCon has been held overseas for more than a decade. This year, for the first time, the organizing committee is holding an online ApacheCon for the Asia-Pacific region: ApacheCon Asia. The 140+ talks from China, Japan, India, the United States, and other countries are divided into 14 forums, including Big Data, Incubator, API/Microservices, Middleware, Workflow and Data Governance, Data Visualization, Observability, Stream Processing, Message Systems, IoT and Industrial IoT, Integration, Open Source Community/Culture, and Web Server/Tomcat.

By attending ApacheCon Asia on August 6-8, 2021, you will get:

· The latest technology trends and practices from around the world

· Opportunities to talk with more than 200 top experts from home and abroad

· A 3-day event with 140+ talks, completely free to attend

Official website:

https://www.apachecon.com/aca…

Agenda details:

https://apachecon.com/acasia2…

About the Big Data forum

Big Data is one of the most important topics at Apache, and the big data field has been very busy this year. Projects covered include Arrow, Atlas, Bigtop, CarbonData, Cassandra, DolphinScheduler, Doris (incubating), Druid, Flink, Hadoop, HBase, Hive, Hudi, Impala, Kylin, Kyuubi (incubating), Liminal (incubating), Nemo, Pinot, Pulsar, Spark, YuniKorn (incubating), and other top-level or incubating projects, as well as currently popular open source projects such as Milvus and openLooKeng. Over the 3-day conference, attendees can learn about the cutting-edge trends of these technologies, along with practical experience, principles, and architecture analysis from front-line users.

Producers

Because big data technology is so popular, the agenda fills all three days. Today, we take a detailed look at the first day's speakers from home and abroad.

The Big Data forum has also invited three presenters.

August 6 agenda highlights @ Apache Big Data

Scaling Impala: common mistakes and best practices

Speaker: Manish Maheshwari

Time: 13:30 on August 6th

Topic introduction:

Apache Impala is a complex engine that requires a thorough technical understanding to use to its full potential. In this talk, we'll discuss ingestion best practices that keep an Impala deployment scalable, as well as access control configurations that provide a consistent experience for end users. We'll also take a high-level look at Impala's query profile, which is the first stop for any performance troubleshooting. In addition, we'll discuss common mistakes that users and BI tools make when interacting with Impala. Finally, we'll discuss an ideal configuration that puts all of this into practice.

Speaker introduction:

Manish Maheshwari

Principal Sales Engineer at Cloudera with over 15 years of experience in building very large-scale data warehouses and analytical solutions. Experienced in Apache Hadoop, DI and BI tools, data mining and forecasting, data modeling, master data and metadata management, and dashboard tools. Proficient in Hadoop, SAS, R, Informatica, Teradata, and QlikView.

How DBS Bank's [Development Bank of Singapore's] data platform leverages Apache CarbonData to drive real-time insights and analytics

Speakers: Ravindra Pesala / Kumar Vishal

Time: 13:30 on August 6th

Topic introduction:

DBS, a leading bank based in Singapore, already holds terabytes of structured and unstructured data that is important for shaping its strategy. In 2020, DBS invested in a data platform based on CarbonData to drive real-time analytics and unlock insights from existing data from a variety of sources. In this presentation, we will look at how DBS Bank is using the Spark and Presto engines to move from a traditional data warehouse to a data lake based on CarbonData.

Speaker introduction:

Ravindra Pesala

Senior Vice President, Head of Big Data Platform, DBS Bank, Singapore

Apache CarbonData PMC member

Leads the big data engineering platform, including ingestion, computing, data access, streaming, and metadata.

Kumar Vishal

Apache CarbonData PMC member

Senior big data engineer

Works on the big data engineering platform, including ingestion, computing, data access, and streaming.

The challenges of building a distributed, fault-tolerant, and scalable analytics stack

Speaker: Nishant Bangarwa

Time: 14:10 on August 6th

Topic introduction:

To date, the Apache Druid cluster holds more than 50 trillion events, equivalent to more than 500 petabytes of raw data, and it is still growing. In this presentation, we will introduce the design of a distributed, fault-tolerant, scalable analytics stack and its challenges, and describe our path to developing Apache Druid into a powerful distributed, fault-tolerant, scalable analytics data store.

Speaker introduction:

Nishant Bangarwa

Co-founder and Head of Engineering at Rill Data.

Active open source contributor: PMC member of Apache Druid and Apache Superset, and committer to Apache Calcite and Apache Hive.

Prior to Rill Data, he was a member of Cloudera's data warehouse team and the Metamarkets Druid team, where he was responsible for managing large-scale Apache Druid deployments.

Holds a bachelor's degree in computer science from the National Institute of Technology, Kurukshetra, India.

How is security achieved in Apache Ozone

Speakers: Bharat Viswanadham / Shashikant Banerjee

Time: 14:10 on August 6th

Topic introduction:

Apache Ozone is a scalable, redundant, distributed object store for Hadoop that became an Apache top-level project in 2020. Apache Ozone has two metadata services: the Storage Container Manager (SCM), which handles block/container allocation and replication as well as certificate and node management, and the Ozone Manager, which manages metadata. In this talk, we will discuss how security is achieved in Ozone.

Speaker introduction:

Bharat Viswanadham: Software engineering specialist with over 7 years of experience in designing and building scalable, high-performance distributed storage systems. Committer and PMC member of Apache Hadoop and Apache Ozone.

Shashikant Banerjee: Distributed storage systems expert with 8+ years of experience. Committer and PMC member of the Apache Hadoop, Apache Ozone, and Apache Ratis communities.

Analysis and application of the openLooKeng heuristic index framework

Speaker: Li Zheng

Time: 14:50 on August 6th

Topic introduction:

As big data technology is applied and developed, data types multiply, data becomes more widely distributed, and query scenarios grow more complex, all of which makes data harder to process. To improve the usability of big data, Huawei launched openLooKeng, an open source data virtualization engine project.

openLooKeng provides a unified SQL interface and basic interactive query and analysis capabilities, and continues to evolve in areas such as cross-data-center/cloud queries, data source extension, performance, reliability, and security in order to simplify big data. This presentation will focus on the openLooKeng heuristic index framework, the key indexing techniques built on it, and their implementation and application challenges.

Speaker introduction:

Pan li

Holds a doctorate from Huazhong University of Science and Technology and joined Huawei in June 2018. He currently focuses on performance optimization research for openLooKeng and is deeply involved in the architecture design and implementation of big data query and analysis engines and related work.

Kyuubi: NetEase's exploration and practice of Serverless Spark

Speaker: Yao Qin

Time: 14:50 on August 6th

Topic introduction:

This talk covers the architecture, implementation principles, and application scenarios of Kyuubi, NetEase's open source big data component. Through real cases, it demonstrates how Kyuubi helps businesses inside NetEase run Serverless Spark, along with the process and the thinking behind it. It also describes how we worked directly with the Spark open source community on troubleshooting and feature optimization along the way.

Speaker introduction:

Yao Qin

Lead author of the Apache Kyuubi project

Apache Spark Committer

Apache Submarine Committer

Member of the NetEase big data team

Cross-data-source analytics at China Merchants Bank

Speaker: Wu Qiumin

Time: 15:30, August 6

Topic introduction:

China Merchants Bank (CMB) has petabytes of data stored in RDBMSs, NoSQL databases, object storage, and big data frameworks such as Apache Hadoop, Spark, and Flink. Moving data between these sources with ETL is expensive, so openLooKeng was introduced to connect the different data sources and process data in place across data centers and hybrid clouds.

This presentation will provide an overview of CMB's data processing engine, which enables in-situ analysis of geographically remote data sources. It will also show how we use openLooKeng features such as high availability, automatic scaling, built-in caching, and indexing support to meet the reliability requirements of enterprise workloads.

Speaker introduction:

Wu Qiumin

Big data technical expert at China Merchants Bank with 9 years of big data experience in financial technology. Responsible for the architecture design, implementation, and maintenance of China Merchants Bank's big data platform. openLooKeng PMC member.

Inside the storage and query engine of Apache Druid

Speaker: Gian Merlino

Time: 15:30, August 6

Topic introduction:

Apache Druid is an open source columnar database known for its large scale and high performance; its largest deployment consists of thousands of servers. But regardless of scale, high performance starts from a good foundation. This presentation will dig into those fundamentals by exploring the inner workings of a single data server. We'll look at how Apache Druid stores data, what compression it uses, how the storage engine is connected to the query processing engine, and how the system handles resource management and multithreading.

Speaker introduction:

Gian Merlino

Co-founder and CTO of Imply and one of the main committers of Druid. He previously led the data ingestion team at Metamarkets and held senior engineering positions at Yahoo. He holds a Bachelor of Science in Computer Science from the California Institute of Technology.

Speeding up big data analysis with Apache CarbonData's indexes

Speakers: Akash R Nilugal / Kunal Kapoor

Time: 16:10 on August 6th

Topic introduction:

Data in the 21st century is like oil in the 18th century: a huge, untapped, and valuable asset, if handled intelligently. Storing and analyzing big data is challenging and expensive in both cost and time, and analytical solutions must constantly adapt to keep up with the exponential growth of data. Apache CarbonData is a unified storage solution and file format designed to optimize query performance and thus reduce analysis costs. It has been adopted by over 100 open source users. In a database, an index is one of the main features that helps answer queries without scanning every row. Taking inspiration from this concept, Apache CarbonData supports custom indexes such as min/max, Bloom, Lucene, secondary indexes, and materialized views to speed up row-level updates, deletes, OLAP, and point queries. This presentation highlights CarbonData's custom indexing architecture and its distributed index cache server, which help deliver faster query results, as well as the challenges and future scope.
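To make the indexing idea above concrete, here is a minimal Python sketch of min/max pruning: each block of rows stores the smallest and largest value of a column, and a point query skips every block whose range cannot contain the target. The names `Block`, `build_blocks`, and `point_query` are hypothetical and chosen only for illustration; this is not CarbonData's actual implementation or API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A chunk of rows plus the min/max of the indexed column (hypothetical layout)."""
    rows: List[int]
    min_value: int
    max_value: int


def build_blocks(values: List[int], block_size: int) -> List[Block]:
    """Split values into fixed-size blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(rows=chunk, min_value=min(chunk), max_value=max(chunk)))
    return blocks


def point_query(blocks: List[Block], target: int) -> Tuple[List[int], int]:
    """Return the matching rows and the number of blocks actually scanned."""
    matches: List[int] = []
    scanned = 0
    for block in blocks:
        # Prune: the target cannot be in a block whose [min, max] range excludes it.
        if target < block.min_value or target > block.max_value:
            continue
        scanned += 1
        matches.extend(v for v in block.rows if v == target)
    return matches, scanned


if __name__ == "__main__":
    data = list(range(1000))  # sorted data, so block ranges do not overlap
    blocks = build_blocks(data, block_size=100)
    rows, scanned = point_query(blocks, target=742)
    print(rows, "blocks scanned:", scanned, "of", len(blocks))
```

Pruning of this kind pays off most when data is sorted or clustered on the indexed column; on randomly ordered data, most block ranges overlap the target and few blocks can be skipped.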

Speaker introduction:

Akash R Nilugal

Apache CarbonData PMC member and committer

Senior technical leader on the Cloud and AI/Data Platform team at Huawei's Bangalore Research Center.

5 years of experience in big data. Interested in index support for big data, materialized views, CDC for big data, Spark SQL query optimization, Spark Structured Streaming, and data lake and data warehouse functionality.

Kunal Kapoor

Apache CarbonData PMC member and committer, and system architect on the Cloud and AI/Data Platform team. Mainly responsible for the distributed index cache server, Hive + CarbonData integration, pre-aggregation support, S3 support for CarbonData, CarbonData secondary indexes, and Spark SQL query optimization in CarbonData.

A Java-based machine learning solution for big data

Speaker: Lan Qing

Time: 16:10 on August 6th

Topic introduction:

The success of machine learning (ML) applications depends on the use of big data. Most big data comes in an unstructured format, and it may be available offline or online. While many options for ML tasks exist in Python, integrating Python applications into existing Java/Scala-based big data pipelines can be quite challenging. Beyond that, there are few options in Java/Scala that bridge the gap between working with big data and using the same library for ML workloads.

To address these issues, we will use DJL, a machine learning framework in Java, to demonstrate a big data ML solution in Java. DJL supports a variety of ML engines, including TensorFlow, PyTorch, Apache MXNet (incubating), PaddlePaddle, and ONNX Runtime. With Apache Flink and Apache Spark, users can easily set up their online/offline ML pipelines. By the end of the session, attendees will be able to build an easy-to-use, high-performance ML pipeline for all of these scenarios.

Speaker introduction:

Lan Qing

Software development engineer on the Amazon AWS machine learning platform, with deep knowledge of big data and of application architectures for machine learning in production environments.

One of the co-authors of DJL (djl.ai)

Apache MXNet PPMC member

Master's degree in computer engineering from Columbia University

Insights into the secrets of the open source community – best practices for data-driven community operations

Speakers: Zhong Jun / Jiang Yikun / Peng Lei

Time: 16:50 on August 6th

Topic introduction:

As an open source community evolves, data-driven insight into its current state is very valuable in helping it grow healthily. Data-driven operations therefore play a key role in the community. In this talk, we will introduce best practices for data-driven community operations. This operations management system helps several of China's most active open source communities (such as openEuler, openGauss, openLooKeng, and MindSpore) measure health, activity, and other key metrics efficiently and scientifically. The talk will also describe how the data-driven operations system was implemented, based on real cases from the openEuler community; how to use powerful Apache big data projects to build a first usable version (including data storage, analysis, insight, and visualization); and the improvements we have contributed to the upstream Apache projects.

Speaker introduction:

Zhong Jun

Has been involved in open source communities for over 6 years and is responsible for the digital operations system of the openEuler, MindSpore, openGauss, and openLooKeng projects. A core contributor to several communities: maintainer of the Infra SIG in the openEuler community, maintainer of the Infra SIG in the openGauss community, and a core member of the OpenStack Manila project.

Jiang Yikun

Senior software engineer on Huawei's open source development team. He has been involved in open source communities for more than 5 years and is committed to multi-architecture support and improvement of projects in the big data field. He has five years of experience in cloud computing and big data optimization and was previously a committer on an OpenStack storage project.

Peng Lei

Senior software engineer on Huawei's open source development team, where he has worked on multi-architecture support and improvement for MySQL. He has 5 years of SQL development and big data experience, has studied the MySQL kernel, including MySQL Group Replication, and has worked on kernel development for distributed databases. He also has 2 years of experience with big data projects such as Spark, Kafka, and Hadoop.

Apache Hudi on AWS

Speaker: Fei Lianghong

Time: 16:50 on August 6th

Topic introduction:

An introduction to Apache Hudi on AWS, covering an overview of Apache Hudi, common use cases, Hudi storage types, writing and querying Hudi datasets, and some tips.

Speaker introduction:

Fei Lianghong

Lead Developer Evangelist at Amazon Web Services (AWS)

He draws on 20 years of experience to support innovation and help start-ups and companies bring their ideas to life. He focuses on software development and cloud native architecture, as well as the technical and business impact of machine learning and data analytics. Before joining AWS, he worked at Apple and Microsoft. His interests include artificial intelligence, data science, and photography.

These are the highlights of the first day of the ApacheCon Asia Big Data forum. Stay tuned for the big names on the second and third days!

Having read this far, what are you waiting for? Sign up now!

Registration form

ApacheCon Asia 2021

From August 6 to 8

14 forums, 100+ technology projects

140+ topic speeches

Talk with technology gurus and experts from around the world

3 days of round-the-clock exchange

Free registration

ApacheCon's first online conference for Asia

August 6-8, 2021

We look forward to seeing you there

Click "here" to sign up
