This is the fourth day of my participation in the Gwen Challenge.

Introduction

  • In terms of data operations, RDD operators can be classified into two kinds: transformations and actions.

    • Transformation operator: converts one RDD into another RDD; this merely layers one function over another and is not actually executed (decorator design pattern).
    • Action operator: actually triggers the SparkContext to submit a job.

This article describes the action operators.
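To make the lazy/eager distinction concrete, here is a minimal sketch (the app name and values are illustrative): the map call only wraps the RDD, while collect actually submits the job.

// Illustrative example

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

val sc: SparkContext = new SparkContext(
    new SparkConf()
        .setMaster("local")
        .setAppName("LazyVsEager"))
val rdd: RDD[Int] = sc.makeRDD(List(1, 2, 3))
// Transformation: builds a new RDD, nothing is executed yet
val doubled: RDD[Int] = rdd.map(_ * 2)
// Action: triggers SparkContext to submit the job and compute the result
println(doubled.collect().mkString(",")) // 2,4,6

sc.stop()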

1. reduce

Aggregates all elements in the RDD, first within each partition and then across partitions

// Use examples

val sc: SparkContext = new SparkContext(
    new SparkConf()
        .setMaster("local")
        .setAppName("RddMem"))val rdd: RDD[Int] = sc.makeRDD(
    List(1.2.4.5.6))/ / the aggregation
val sum = rdd.reduce(_ + _)
println(sum)

sc.stop()

2. collect

Returns all elements of the dataset to the driver as an Array

// Use examples

val sc: SparkContext = new SparkContext(
    new SparkConf()
        .setMaster("local")
        .setAppName("RddMem"))val rdd: RDD[Int] = sc.makeRDD(
    List(1.2.4.5.6), 3
)

val list = rdd.collect()
println(list.mkString(","))

sc.stop()

3. count

Returns the number of elements in the RDD

// Use examples

val sc: SparkContext = new SparkContext(
    new SparkConf()
        .setMaster("local")
        .setAppName("RddMem"))val rdd: RDD[Int] = sc.makeRDD(
    List(1.2.4.5.6), 3
)

val count = rdd.count()
println(count)

sc.stop()

4. first

Returns the first element of the RDD

// Use examples

val sc: SparkContext = new SparkContext(
    new SparkConf()
        .setMaster("local")
        .setAppName("RddMem"))val rdd: RDD[Int] = sc.makeRDD(
    List(1.2.4.5.6), 3
)
// take the first one
val firstNum = rdd.first()
println(firstNum)

sc.stop()


5. take

Returns an array of the first n elements of the RDD

// Use examples

val sc: SparkContext = new SparkContext(
    new SparkConf()
        .setMaster("local")
        .setAppName("RddMem"))val rdd: RDD[Int] = sc.makeRDD(
    List(1.2.4.5.6), 3
)
// take the first three
val numList = rdd.take(3)
println(numList.mkString(","))

sc.stop()

6. takeOrdered

Returns an array of the first n elements of the RDD after sorting

// Use examples

val sc: SparkContext = new SparkContext(
    new SparkConf()
        .setMaster("local")
        .setAppName("RddMem"))val rdd: RDD[Int] = sc.makeRDD(
    List(1.7.4.5.6), 3
)

// ascending
//val numList = rdd.takeOrdered(3)
// descending
val numList = rdd.takeOrdered(3)(Ordering.Int.reverse)
println(numList.mkString(","))

sc.stop()

7. aggregate

Aggregates the data within each partition using the initial value, then combines the per-partition results across partitions, again applying the initial value

// Use examples

val sc: SparkContext = new SparkContext(
    new SparkConf()
        .setMaster("local")
        .setAppName("RddMem"))val rdd: RDD[Int] = sc.makeRDD(
    List(1.7.4.5), 2
)

// 0+1+7 = 8
// 0+4+5 = 9
// 0+8+9 = 17
val res: Int = rdd.aggregate(0)(
    (i: Int, j: Int) => i + j,
    (i: Int, j: Int) => i + j
)
println(res) // 17

// 1+1+7 = 9
// 1+4+5 = 10
// 1*9*10 = 90
val res2: Int = rdd.aggregate(1)(_ + _, _ * _)
println(res2) // 90

sc.stop()

8. fold

Aggregates all elements in the RDD, first within partitions and then across partitions, using the same function for both steps

It is equivalent to aggregate with identical intra- and inter-partition functions.

// Use examples

val sc: SparkContext = new SparkContext(
    new SparkConf()
        .setMaster("local")
        .setAppName("RddMem"))val rdd: RDD[Int] = sc.makeRDD(
    List(1.7.4.5), 2
)

// equivalent to rdd.aggregate(0)(_ + _, _ + _)
val res: Int = rdd.fold(0)(_ + _)
println(res) // 17

sc.stop()
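A practical difference from reduce is worth noting here: on an empty RDD, reduce throws an exception, while fold simply returns the zero value. A minimal sketch (assumes an active SparkContext sc as above):

// Illustrative example

val empty: RDD[Int] = sc.makeRDD(List.empty[Int])
// empty.reduce(_ + _) // would throw java.lang.UnsupportedOperationException: empty collection
println(empty.fold(0)(_ + _)) // prints 0, the zero value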

9. countByKey

Counts the number of occurrences of each key

// Use examples

val sc: SparkContext = new SparkContext(
    new SparkConf()
        .setMaster("local")
        .setAppName("RddMem"))val rdd: RDD[(String.Int)] = sc.makeRDD(
    List(("a".1), ("b".2), ("a".1), ("b".2)), 2
)

val res = rdd.countByKey()
println(res) // Map(a -> 2, b -> 2): counts key occurrences, not value sums

sc.stop()

10. saveXxxxx

Saves the data to files in different formats

// Use examples

val sc = new SparkContext(
    new SparkConf().setMaster("local[*]").setAppName("MapPartitions"))
val rdd = sc.makeRDD(List(("a", 1), ("a", 2), ("b", 3), ("b", 3), ("b", 3)), 1)

/** Text file */
rdd.saveAsTextFile("text")

/** Sequence file */
rdd.saveAsSequenceFile("sequence")

/** Object file */
rdd.saveAsObjectFile("object")

sc.stop()
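For completeness, a minimal read-back sketch (this would run before sc.stop() above; the paths are the directories just written): each save format has a matching read method on SparkContext.

// Illustrative example: reading the saved files back

val textRdd = sc.textFile("text")                      // RDD[String], one element per line
val seqRdd = sc.sequenceFile[String, Int]("sequence")  // RDD[(String, Int)]
val objRdd = sc.objectFile[(String, Int)]("object")    // RDD[(String, Int)]
println(objRdd.collect().mkString(","))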

11. foreach

Iterates over each element of the RDD in a distributed fashion, calling the specified function on each element

// Use examples

val sc = new SparkContext(
    new SparkConf().setMaster("local[*]").setAppName("MapPartitions"))
val rdd = sc.makeRDD(List(("a", 1), ("a", 2), ("b", 3), ("b", 3), ("b", 3)), 1)

// distributed traversal
rdd.foreach(println(_))

sc.stop()
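One nuance worth demonstrating: rdd.foreach runs the function on the executors, so element order is not guaranteed and, on a real cluster, the output appears in the executor logs rather than the driver console; collecting first moves the data to the driver and iterates there. A small comparison sketch (would run before sc.stop() above):

// Illustrative example

// Executed on the executors: no ordering guarantee
rdd.foreach(println)
// Pulled to the driver first, then iterated locally in order
rdd.collect().foreach(println)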