Open Source Big Data Community & Aliyun EMR Live Broadcast Series, Episode 6

Theme: EMR Spark on ACK product demonstration and best practices

Lecturer: Shi Lei, technical expert of Aliyun EMR team

Content framework:

• Cloud-native challenges and Alibaba practice • Spark containerized solution • Product introduction and demonstration

Live playback: scan the QR code at the bottom of the article to join the DingTalk group and watch the replay, or enter the link…

I. Cloud-Native Challenges and Alibaba Practice

Big data technology trends

Challenges of going cloud native

Separation of computation and storage

How to build an HCFS file system based on object storage

• Fully compatible with existing HDFS • Performance on par with HDFS, at lower cost

Separation of Shuffle storage and computation

How to handle mixed heterogeneous machine types on ACK

• The community discussion in [SPARK-25299], which supports Spark dynamic resources, has become an industry consensus

Caching scheme

How to effectively support hybrid cloud across data centers and across dedicated lines

• A caching system needs to be supported inside the container

ACK scheduling

How to address scheduling performance bottlenecks

• Performance on par with YARN • Multi-level queue management

• Peak-shifting scheduling • Mutual awareness of node resources for YARN on ACK

Alibaba Practice – EMR on ACK

Introduction to the overall scheme

• Jobs are submitted to different execution platforms through the data development cluster or scheduling platform • Peak-shifting scheduling, adjusted according to business peak and off-peak strategies • Cloud-native data lake architecture; ACK has strong elastic scaling capacity • Hybrid scheduling across cloud and on-premises resources over dedicated lines • ACK manages clusters of heterogeneous machine types with good flexibility

II. Spark Containerized Solution

Solution overview


1. Why is the Remote Shuffle Service (RSS) needed?

• With RSS, Spark jobs no longer need cloud disks mounted on Executor Pods. Mounting cloud disks hurts scalability and is impractical in large-scale production. • The required cloud disk size cannot be determined in advance: too large wastes space, and too small causes Shuffle to fail. RSS is designed specifically for storage-compute separation scenarios. • Executors write Shuffle data to the RSS system, which manages that data; once an Executor is idle, it can be reclaimed. [SPARK-25299] • Full support for dynamic resources, which keeps long-tail tasks caused by data skew from holding on to Executor resources.
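The link between remote Shuffle and dynamic resources can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Spark's actual allocation logic: the `Executor` class, the `releasable` function, and the rule it applies are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Executor:
    name: str
    running_tasks: int
    shuffle_bytes_written: int  # shuffle output this executor has produced


def releasable(executors, shuffle_is_remote):
    """Return the names of executors that could be released.

    Toy rule (an assumption, not Spark's real policy): an executor must be
    idle, and if shuffle output is stored on the executor itself, it must
    not hold any output that reducers may still need to fetch.
    """
    result = []
    for ex in executors:
        idle = ex.running_tasks == 0
        pinned_by_shuffle = ex.shuffle_bytes_written > 0 and not shuffle_is_remote
        if idle and not pinned_by_shuffle:
            result.append(ex.name)
    return result


executors = [
    Executor("exec-1", running_tasks=0, shuffle_bytes_written=4 << 30),
    Executor("exec-2", running_tasks=0, shuffle_bytes_written=0),
    Executor("exec-3", running_tasks=2, shuffle_bytes_written=1 << 30),
]

local_result = releasable(executors, shuffle_is_remote=False)
remote_result = releasable(executors, shuffle_is_remote=True)
print(local_result)   # ['exec-2']
print(remote_result)  # ['exec-1', 'exec-2']
```

In this model, the idle `exec-1` stays pinned while its Shuffle output lives on its own disk, and becomes reclaimable once that output is held by a remote service.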

2. How does RSS fare on performance, cost, and scalability?

• RSS is deeply optimized for Shuffle and designed specifically for storage-compute separation scenarios and K8S elastic scenarios. • In the Shuffle Fetch phase, the random reads of the Reduce phase become sequential reads, which greatly improves job stability and performance. • RSS can be deployed directly on disks in the existing K8S cluster, with no extra cloud disks needed for Shuffle. This is highly cost-effective and flexible to deploy.

Spark Shuffle

• NumMapper * NumReducer blocks • Sequential writes, random reads • Spills on write • Single replica; data loss requires recomputing the stage
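To get a feel for the NumMapper * NumReducer term, the arithmetic can be worked through for one case. The mapper and reducer counts below are hypothetical, chosen only for illustration; the 5.6 TB total is borrowed from the TeraSort note in this article and treated as decimal terabytes.

```python
def shuffle_block_count(num_mappers, num_reducers):
    """Each mapper produces one block per reducer."""
    return num_mappers * num_reducers


def avg_block_bytes(total_shuffle_bytes, num_mappers, num_reducers):
    """Average block size if the shuffle data were spread evenly."""
    return total_shuffle_bytes // shuffle_block_count(num_mappers, num_reducers)


# Hypothetical job shape (assumption, not from the benchmark).
num_mappers = 10_000
num_reducers = 5_000
total_shuffle_bytes = 5_600 * 10**9  # about 5.6 TB of compressed shuffle data

blocks = shuffle_block_count(num_mappers, num_reducers)
avg_bytes = avg_block_bytes(total_shuffle_bytes, num_mappers, num_reducers)
print(blocks)     # 50000000
print(avg_bytes)  # 112000
```

Under these assumed counts, reducers would fetch 50 million blocks of roughly 112 KB each, which is the kind of small random read that the bullet above refers to.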

EMR Remote Shuffle Service

• Append writes, sequential reads

• No spill on write

• Two replicas; replication is considered complete once the data has been copied to memory

• Replicas are backed up via Intranet, no public network bandwidth is required
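The difference between the two layouts can be sketched with a small in-memory model. This is a simplified illustration, not the EMR RSS implementation: the function names, the list-based "logs", and the way replicas are written are all assumptions made for the example.

```python
from collections import defaultdict


def local_shuffle_layout(map_outputs):
    """Local shuffle: one block per (mapper, reducer) pair."""
    blocks = {}
    for mapper_id, records in enumerate(map_outputs):
        for reducer_id, payload in records:
            blocks.setdefault((mapper_id, reducer_id), []).append(payload)
    return blocks


def remote_shuffle_layout(map_outputs, replicas=2):
    """Remote shuffle (toy model): every record for a reducer is appended
    to a single per-partition log, and each log is kept `replicas` times."""
    stores = [defaultdict(list) for _ in range(replicas)]
    for records in map_outputs:
        for reducer_id, payload in records:
            for store in stores:
                store[reducer_id].append(payload)
    return stores


# Three mappers, two reducers; each record is (reducer_id, payload).
map_outputs = [
    [(0, "a0"), (1, "a1")],
    [(0, "b0"), (1, "b1")],
    [(0, "c0"), (1, "c1")],
]

blocks = local_shuffle_layout(map_outputs)
primary, backup = remote_shuffle_layout(map_outputs)

# Reducer 0 must fetch one block from every mapper in the local layout...
fetches_for_reducer_0 = sum(1 for (_, r) in blocks if r == 0)
# ...but reads a single appended log in the remote layout.
log_for_reducer_0 = primary[0]

print(fetches_for_reducer_0)  # 3
print(log_for_reducer_0)      # ['a0', 'b0', 'c0']
print(primary == backup)      # True
```

In the local layout the number of fetches per reducer grows with the number of mappers, while in the append-based layout each reducer reads one contiguous log, and the second copy allows recovery without recomputing the stage.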

RSS TeraSort Benchmark

• Note: Taking a 10 TB TeraSort as an example, the compressed Shuffle volume is about 5.6 TB. For jobs of this magnitude, performance improves significantly with RSS because Shuffle reads become sequential.

Spark on ECI results


This article is original content from Aliyun and may not be reproduced without permission.