As the Internet has grown, big data has become one of the most talked-about technologies, and it now touches almost every industry. Spark is one of the most important frameworks in the big data ecosystem. Here is how to get started with it.

Apache Spark is one of the most widely used in-memory computing frameworks in the big data industry. Studying the properties and use of RDDs in particular helps you understand how Spark works, including its task submission process and caching mechanism.

The following tutorials cover Spark environment setup, the task scheduling process, and RDD programming.

Course Contents:

Chapter 1 Introduction to Spark

01_Why learn Spark

02_Spark and MapReduce comparison

03_Spark framework system

04_Spark download

05_Spark running modes

06_Installing a Spark cluster

07_Spark program execution process

08_Spark terminology explained

09_SparkShellLocal

10_SparkShellCluster

11_Comparison between the Spark 2.2 and Spark 1.6 shells

Chapter 2 Maven and IDEA

12_Maven and IDEA download

13_Maven installation

14_IDEA installation

15_Configuring Maven in IDEA

16_Scala environment installation and Scala plug-in configuration in IDEA

17_Creating a Spark project in IDEA

18_Developing a WordCount program in Spark

19_Packaging a Spark program

20_Running the package on a Spark cluster

Chapter 3 RDD fundamentals

21_RDD concept

22_RDD execution process

23_RDD properties

24_RDD resilience

25_Two ways to create an RDD

26_RDD programming APIs

Chapter 4 Transformation operators

27_Transformation operators

28_Action operators

29_map

30_filter

31_flatMap

32_sample

33_union

34_intersection

35_distinct

36_join

37_leftOuterJoin

38_rightOuterJoin

39_cartesian

40_groupBy

41_mapPartitions

42_mapPartitionsWithIndex

43_sortBy

44_sortByKey

45_repartition

46_coalesce

47_partitionBy

48_repartitionAndSortWithinPartitions

49_reduce

50_reduceByKey

51_aggregateByKey

52_combineByKey

Chapter 5 Action operators

53_collect

54_count

55_top

56_take

57_takeOrdered

58_first

59_saveAsTextFile

60_foreach

61_Other operators: countByKey

62_Other operators: countByValue

63_Other operators: filterByRange

64_Other operators: flatMapValues

65_Other operators: foreachPartition

66_Other operators: keyBy

67_Other operators: keys and values

68_Other operators: collectAsMap

69_Passing functions to RDD operations

70_RDD dependencies

71_RDD task division

72_RDD lineage

73_RDD caching (persistence)