Apache DolphinScheduler is a decentralized, easily extensible, visual DAG workflow task scheduling system. It is dedicated to untangling the complex dependencies in data processing so that the scheduling system can be used out of the box.

Recently, community contributor Bo Yi shared a very detailed analysis of the core workflow table structure. If you find it useful, please share it.

1. Overall storage structure of workflow

All workflow definitions (templates) created in DolphinScheduler are stored in the t_ds_process_definition table.

The database table structure is shown in the following table:

| No. | Field | Type | Description |
|----|------|------|-------------|
| 1 | id | int(11) | Primary key |
| 2 | name | varchar(255) | Process definition name |
| 3 | version | int(11) | Process definition version |
| 4 | release_state | tinyint(4) | Release state of the process definition: 0 offline, 1 online |
| 5 | project_id | int(11) | Project id |
| 6 | user_id | int(11) | Id of the user who owns the process definition |
| 7 | process_definition_json | longtext | Process definition JSON |
| 8 | description | text | Process definition description |
| 9 | global_params | text | Global parameters |
| 10 | flag | tinyint(4) | Whether the process is available: 0 unavailable, 1 available |
| 11 | locations | text | Node coordinate information |
| 12 | connects | text | Node connection information |
| 13 | receivers | text | Recipients |
| 14 | receivers_cc | text | Cc recipients |
| 15 | create_time | datetime | Creation time |
| 16 | timeout | int(11) | Timeout |
| 17 | tenant_id | int(11) | Tenant id |
| 18 | update_time | datetime | Update time |
| 19 | modify_by | varchar(36) | Last modified by |
| 20 | resource_ids | varchar(255) | Resource ids |

The core field is process_definition_json: it defines the task information of the DAG and is stored as JSON.

Its common top-level structure is as follows:

| No. | Field | Type | Description |
|----|------|------|-------------|
| 1 | globalParams | Array | Global parameters |
| 2 | tasks | Array | Collection of tasks in the process (the structure of each task type is described in the sections below) |
| 3 | tenantId | Int | Tenant id |
| 4 | timeout | Int | Timeout |

Data example:

{
    "globalParams":[
        {
            "prop":"golbal_bizdate",
            "direct":"IN",
            "type":"VARCHAR",
            "value":"${system.biz.date}"
        }
    ],
    "tasks":Array[1],
    "tenantId":0,
    "timeout":0
}
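
Since process_definition_json is plain JSON, it is easy to inspect outside of DolphinScheduler. The following is a minimal, illustrative Python sketch (not part of DolphinScheduler itself) that parses a definition string and rebuilds the DAG edges from each task's preTasks list; the sample definition is hypothetical:

import json

# Illustrative only: parse a process_definition_json value and print the
# DAG edges implied by each task's preTasks list (task *names*, as in the
# examples below).
definition_json = """
{
    "globalParams": [],
    "tenantId": 0,
    "timeout": 0,
    "tasks": [
        {"id": "tasks-1", "name": "Shell Task", "type": "SHELL", "preTasks": []},
        {"id": "tasks-2", "name": "SqlTask-Query", "type": "SQL", "preTasks": ["Shell Task"]}
    ]
}
"""

definition = json.loads(definition_json)
for task in definition["tasks"]:
    print(task["type"], task["name"])
    for pre in task["preTasks"]:
        print("  edge:", pre, "->", task["name"])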

2. Detailed description of the storage structure of each task type

2.1 Shell node

**Shell node data structure is as follows:**

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | SHELL |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | rawScript | String | Shell script | |
| 6 | localParams | Array | Custom parameters | |
| 7 | resourceList | Array | Resource files | |
| 8 | description | String | Description | |
| 9 | runFlag | String | Run flag | |
| 10 | conditionResult | Object | Conditional branch | |
| 11 | successNode | Array | Nodes to jump to on success | |
| 12 | failedNode | Array | Nodes to jump to on failure | |
| 13 | dependence | Object | Task dependency | Mutually exclusive with params |
| 14 | maxRetryTimes | String | Maximum number of retries | |
| 15 | retryInterval | String | Retry interval | |
| 16 | timeout | Object | Timeout control | |
| 17 | taskInstancePriority | String | Task priority | |
| 18 | workerGroup | String | Worker group | |
| 19 | preTasks | Array | Predecessor tasks | |

Shell node data example:

{
    "type":"SHELL",
    "id":"tasks-80760",
    "name":"Shell Task",
    "params":{
        "resourceList":[
            {
                "id":3,
                "name":"run.sh",
                "res":"run.sh"
            }
        ],
        "localParams":[
        ],
        "rawScript":"echo "This is a shell script""
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
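
One detail worth noting: because rawScript is stored inside a JSON string, any quotes in the script are escaped when the definition is persisted. A quick sketch with Python's json module shows the stored form:

import json

# The rawScript value lives inside a JSON string, so inner quotes end up
# escaped in the persisted definition:
params = {
    "resourceList": [{"id": 3, "name": "run.sh", "res": "run.sh"}],
    "localParams": [],
    "rawScript": 'echo "This is a shell script"',
}
print(json.dumps(params))
# ... "rawScript": "echo \"This is a shell script\"" ...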

2.2 SQL node

Queries and updates data on the specified data source through SQL.

**SQL node data structure is as follows:**

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | SQL |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | type | String | Database type | |
| 6 | datasource | Int | Datasource id | |
| 7 | sql | String | SQL statement to run | |
| 8 | udfs | String | UDF functions | UDF function ids, comma-separated |
| 9 | sqlType | String | SQL node type | 0 query, 1 non-query |
| 10 | title | String | Email title | |
| 11 | receivers | String | Recipients | |
| 12 | receiversCc | String | Cc recipients | |
| 13 | showType | String | Email display type | TABLE table, ATTACHMENT attachment |
| 14 | connParams | String | Connection parameters | |
| 15 | preStatements | Array | Pre-SQL | |
| 16 | postStatements | Array | Post-SQL | |
| 17 | localParams | Array | Custom parameters | |
| 18 | description | String | Description | |
| 19 | runFlag | String | Run flag | |
| 20 | conditionResult | Object | Conditional branch | |
| 21 | successNode | Array | Nodes to jump to on success | |
| 22 | failedNode | Array | Nodes to jump to on failure | |
| 23 | dependence | Object | Task dependency | Mutually exclusive with params |
| 24 | maxRetryTimes | String | Maximum number of retries | |
| 25 | retryInterval | String | Retry interval | |
| 26 | timeout | Object | Timeout control | |
| 27 | taskInstancePriority | String | Task priority | |
| 28 | workerGroup | String | Worker group | |
| 29 | preTasks | Array | Predecessor tasks | |

**Example of SQL node data:**

{
    "type":"SQL",
    "id":"tasks-95648",
    "name":"SqlTask-Query",
    "params":{
        "type":"MYSQL",
        "datasource":1,
        "sql":"select id , namge , age from emp where id =  ${id}",
        "udfs":"",
        "sqlType":"0",
        "title":"[email protected]",
        "receivers":"[email protected]",
        "receiversCc":"",
        "showType":"TABLE",
        "localParams":[
            {
                "prop":"id",
                "direct":"IN",
                "type":"INTEGER",
                "value":"1"
            }
        ],
        "connParams":"",
        "preStatements":[
            "insert into emp ( id,name ) value (1,'Li' )"
        ],
        "postStatements":[
        ]
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
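
Conceptually, the worker substitutes localParams into the ${...} placeholders and executes preStatements, then the query, then postStatements. The sketch below is illustrative only; the real substitution happens inside the Java worker:

import re

def render_sql(sql, local_params):
    # Illustrative only: replace ${prop} placeholders with localParams values.
    values = {p["prop"]: p["value"] for p in local_params}
    return re.sub(r"\$\{(\w+)\}", lambda m: values.get(m.group(1), m.group(0)), sql)

local_params = [{"prop": "id", "direct": "IN", "type": "INTEGER", "value": "1"}]
sql = "select id , name , age from emp where id =  ${id}"
pre = ["insert into emp ( id,name ) value (1,'Li' )"]
post = []

# Execution order: preStatements -> sql -> postStatements
for stmt in pre + [render_sql(sql, local_params)] + post:
    print(stmt)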

2.3 Spark node

**Spark node data structure is as follows:**

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | SPARK |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | mainClass | String | Main class to run | |
| 6 | mainArgs | String | Run arguments | |
| 7 | others | String | Other parameters | |
| 8 | mainJar | Object | Program jar | |
| 9 | deployMode | String | Deployment mode | local, client, cluster |
| 10 | driverCores | String | Driver cores | |
| 11 | driverMemory | String | Driver memory | |
| 12 | numExecutors | String | Number of executors | |
| 13 | executorMemory | String | Executor memory | |
| 14 | executorCores | String | Executor cores | |
| 15 | programType | String | Program type | JAVA, SCALA, PYTHON |
| 16 | sparkVersion | String | Spark version | SPARK1, SPARK2 |
| 17 | localParams | Array | Custom parameters | |
| 18 | resourceList | Array | Resource files | |
| 19 | description | String | Description | |
| 20 | runFlag | String | Run flag | |
| 21 | conditionResult | Object | Conditional branch | |
| 22 | successNode | Array | Nodes to jump to on success | |
| 23 | failedNode | Array | Nodes to jump to on failure | |
| 24 | dependence | Object | Task dependency | Mutually exclusive with params |
| 25 | maxRetryTimes | String | Maximum number of retries | |
| 26 | retryInterval | String | Retry interval | |
| 27 | timeout | Object | Timeout control | |
| 28 | taskInstancePriority | String | Task priority | |
| 29 | workerGroup | String | Worker group | |
| 30 | preTasks | Array | Predecessor tasks | |

**Example of Spark node data:**

{
    "type":"SPARK",
    "id":"tasks-87430",
    "name":"SparkTask",
    "params":{
        "mainClass":"org.apache.spark.examples.SparkPi",
        "mainJar":{
            "id":4
        },
        "deployMode":"cluster",
        "resourceList":[
            {
                "id":3,
                "name":"run.sh",
                "res":"run.sh"
            }
        ],
        "localParams":[
        ],
        "driverCores":1,
        "driverMemory":"512M",
        "numExecutors":2,
        "executorMemory":"2G",
        "executorCores":2,
        "mainArgs":"10",
        "others":"",
        "programType":"SCALA",
        "sparkVersion":"SPARK2"
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
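
Conceptually, these params map onto a spark-submit invocation. The sketch below is illustrative only; resolving mainJar["id"] to a file path through the resource center is represented here by a hypothetical jar_path argument:

def build_spark_submit(params, jar_path):
    # Illustrative only: map the node params onto spark-submit flags.
    # jar_path stands in for resolving mainJar["id"] in the resource center.
    return " ".join([
        "spark-submit",
        "--class", params["mainClass"],
        "--deploy-mode", params["deployMode"],
        "--driver-cores", str(params["driverCores"]),
        "--driver-memory", params["driverMemory"],
        "--num-executors", str(params["numExecutors"]),
        "--executor-memory", params["executorMemory"],
        "--executor-cores", str(params["executorCores"]),
        jar_path,
        params["mainArgs"],
    ])

params = {
    "mainClass": "org.apache.spark.examples.SparkPi",
    "deployMode": "cluster",
    "driverCores": 1, "driverMemory": "512M",
    "numExecutors": 2, "executorMemory": "2G", "executorCores": 2,
    "mainArgs": "10",
}
print(build_spark_submit(params, "spark-examples.jar"))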

2.4 MapReduce (MR) node

**MapReduce (MR) node data structure is as follows:**

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | MR |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | mainClass | String | Main class to run | |
| 6 | mainArgs | String | Run arguments | |
| 7 | others | String | Other parameters | |
| 8 | mainJar | Object | Program jar | |
| 9 | programType | String | Program type | JAVA, PYTHON |
| 10 | localParams | Array | Custom parameters | |
| 11 | resourceList | Array | Resource files | |
| 12 | description | String | Description | |
| 13 | runFlag | String | Run flag | |
| 14 | conditionResult | Object | Conditional branch | |
| 15 | successNode | Array | Nodes to jump to on success | |
| 16 | failedNode | Array | Nodes to jump to on failure | |
| 17 | dependence | Object | Task dependency | Mutually exclusive with params |
| 18 | maxRetryTimes | String | Maximum number of retries | |
| 19 | retryInterval | String | Retry interval | |
| 20 | timeout | Object | Timeout control | |
| 21 | taskInstancePriority | String | Task priority | |
| 22 | workerGroup | String | Worker group | |
| 23 | preTasks | Array | Predecessor tasks | |

**Example of MapReduce (MR) node data:**

{
    "type":"MR",
    "id":"tasks-28997",
    "name":"MRTask",
    "params":{
        "mainClass":"wordcount",
        "mainJar":{
            "id":5
        },
        "resourceList":[
            {
                "id":3,
                "name":"run.sh",
                "res":"run.sh"
            }
        ],
        "localParams":[
        ],
        "mainArgs":"/tmp/wordcount/input /tmp/wordcount/output/",
        "others":"",
        "programType":"JAVA"
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
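
An MR node conceptually becomes a hadoop jar invocation. Again an illustrative sketch, with jar_path standing in for the resolved mainJar resource:

def build_hadoop_command(params, jar_path):
    # Illustrative only: hadoop jar <jar> <mainClass> <mainArgs>
    return " ".join(["hadoop", "jar", jar_path, params["mainClass"], params["mainArgs"]])

print(build_hadoop_command(
    {"mainClass": "wordcount", "mainArgs": "/tmp/wordcount/input /tmp/wordcount/output/"},
    "hadoop-mapreduce-examples.jar",
))
# hadoop jar hadoop-mapreduce-examples.jar wordcount /tmp/wordcount/input /tmp/wordcount/output/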

2.5 Python node

**Python node data structure is as follows:**

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | PYTHON |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | rawScript | String | Python script | |
| 6 | localParams | Array | Custom parameters | |
| 7 | resourceList | Array | Resource files | |
| 8 | description | String | Description | |
| 9 | runFlag | String | Run flag | |
| 10 | conditionResult | Object | Conditional branch | |
| 11 | successNode | Array | Nodes to jump to on success | |
| 12 | failedNode | Array | Nodes to jump to on failure | |
| 13 | dependence | Object | Task dependency | Mutually exclusive with params |
| 14 | maxRetryTimes | String | Maximum number of retries | |
| 15 | retryInterval | String | Retry interval | |
| 16 | timeout | Object | Timeout control | |
| 17 | taskInstancePriority | String | Task priority | |
| 18 | workerGroup | String | Worker group | |
| 19 | preTasks | Array | Predecessor tasks | |

Python node data example:

{
    "type":"PYTHON",
    "id":"tasks-5463",
    "name":"Python Task",
    "params":{
        "resourceList":[
            {
                "id":3,
                "name":"run.sh",
                "res":"run.sh"
            }
        ],
        "localParams":[
        ],
        "rawScript":"print("This is a python script")"
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}

2.6 Flink node

Flink node data structure is as follows:

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | FLINK |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | mainClass | String | Main class to run | |
| 6 | mainArgs | String | Run arguments | |
| 7 | others | String | Other parameters | |
| 8 | mainJar | Object | Program jar | |
| 9 | deployMode | String | Deployment mode | local, client, cluster |
| 10 | slot | String | Number of slots | |
| 11 | taskManager | String | Number of TaskManagers | |
| 12 | taskManagerMemory | String | TaskManager memory size | |
| 13 | jobManagerMemory | String | JobManager memory size | |
| 14 | programType | String | Program type | JAVA, SCALA, PYTHON |
| 15 | localParams | Array | Custom parameters | |
| 16 | resourceList | Array | Resource files | |
| 17 | description | String | Description | |
| 18 | runFlag | String | Run flag | |
| 19 | conditionResult | Object | Conditional branch | |
| 20 | successNode | Array | Nodes to jump to on success | |
| 21 | failedNode | Array | Nodes to jump to on failure | |
| 22 | dependence | Object | Task dependency | Mutually exclusive with params |
| 23 | maxRetryTimes | String | Maximum number of retries | |
| 24 | retryInterval | String | Retry interval | |
| 25 | timeout | Object | Timeout control | |
| 26 | taskInstancePriority | String | Task priority | |
| 27 | workerGroup | String | Worker group | |
| 28 | preTasks | Array | Predecessor tasks | |

**Flink node data example:**

{
    "type":"FLINK",
    "id":"tasks-17135",
    "name":"FlinkTask",
    "params":{
        "mainClass":"com.flink.demo",
        "mainJar":{
            "id":6
        },
        "deployMode":"cluster",
        "resourceList":[
            {
                "id":3,
                "name":"run.sh",
                "res":"run.sh"
            }
        ],
        "localParams":[
        ],
        "slot":1,
        "taskManager":"2",
        "jobManagerMemory":"1G",
        "taskManagerMemory":"2G",
        "executorCores":2,
        "mainArgs":"100",
        "others":"",
        "programType":"SCALA"
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
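
The Flink params similarly map onto a flink run command. The sketch below is illustrative only and assumes the Flink 1.x YARN session flags (-ys for slots, -yn for TaskManagers, -yjm/-ytm for memory); jar_path again stands in for the resolved mainJar resource:

def build_flink_command(params, jar_path):
    # Illustrative only: map node params onto Flink 1.x CLI flags.
    return " ".join([
        "flink", "run", "-m", "yarn-cluster",
        "-ys", str(params["slot"]),
        "-yn", str(params["taskManager"]),
        "-yjm", params["jobManagerMemory"],
        "-ytm", params["taskManagerMemory"],
        "-c", params["mainClass"],
        jar_path,
        params["mainArgs"],
    ])

params = {
    "mainClass": "com.flink.demo", "slot": 1, "taskManager": "2",
    "jobManagerMemory": "1G", "taskManagerMemory": "2G", "mainArgs": "100",
}
print(build_flink_command(params, "flink-demo.jar"))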

2.7 Http node

The Http node data structure is as follows:

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | HTTP |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | url | String | Request URL | |
| 6 | httpMethod | String | Request method | GET, POST, HEAD, PUT, DELETE |
| 7 | httpParams | Array | Request parameters | |
| 8 | httpCheckCondition | String | Check condition | Default: response code 200 |
| 9 | condition | String | Check content | |
| 10 | localParams | Array | Custom parameters | |
| 11 | description | String | Description | |
| 12 | runFlag | String | Run flag | |
| 13 | conditionResult | Object | Conditional branch | |
| 14 | successNode | Array | Nodes to jump to on success | |
| 15 | failedNode | Array | Nodes to jump to on failure | |
| 16 | dependence | Object | Task dependency | Mutually exclusive with params |
| 17 | maxRetryTimes | String | Maximum number of retries | |
| 18 | retryInterval | String | Retry interval | |
| 19 | timeout | Object | Timeout control | |
| 20 | taskInstancePriority | String | Task priority | |
| 21 | workerGroup | String | Worker group | |
| 22 | preTasks | Array | Predecessor tasks | |

**Example of Http node data:**

{
    "type":"HTTP",
    "id":"tasks-60499",
    "name":"HttpTask",
    "params":{
        "localParams":[
        ],
        "httpParams":[
            {
                "prop":"id",
                "httpParametersType":"PARAMETER",
                "value":"1"
            },
            {
                "prop":"name",
                "httpParametersType":"PARAMETER",
                "value":"Bo"
            }
        ],
        "url":"https://www.xxxxx.com:9012",
        "httpMethod":"POST",
        "httpCheckCondition":"STATUS_CODE_DEFAULT",
        "condition":""
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
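
The check semantics can be summarized as: with httpCheckCondition set to STATUS_CODE_DEFAULT the task succeeds when the response code is 200, while body-based check conditions compare the response body against condition. A simplified, illustrative sketch:

def http_task_passes(params, status_code, body):
    # Illustrative only: STATUS_CODE_DEFAULT succeeds on HTTP 200;
    # body-based checks compare the response body with `condition`.
    if params["httpCheckCondition"] == "STATUS_CODE_DEFAULT":
        return status_code == 200
    return params["condition"] in body

params = {"httpCheckCondition": "STATUS_CODE_DEFAULT", "condition": ""}
print(http_task_passes(params, 200, ""))  # True
print(http_task_passes(params, 500, ""))  # False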

2.8 DataX node

**DataX node data structure is as follows:**

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | DATAX |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | customConfig | Int | Whether to use a custom configuration | 0 not custom, 1 custom |
| 6 | dsType | String | Source database type | |
| 7 | dataSource | Int | Source database id | |
| 8 | dtType | String | Target database type | |
| 9 | dataTarget | Int | Target database id | |
| 10 | sql | String | SQL statement | |
| 11 | targetTable | String | Target table | |
| 12 | jobSpeedByte | Int | Rate limit (bytes) | |
| 13 | jobSpeedRecord | Int | Rate limit (record count) | |
| 14 | preStatements | Array | Pre-SQL | |
| 15 | postStatements | Array | Post-SQL | |
| 16 | json | String | Custom configuration | Takes effect when customConfig = 1 |
| 17 | localParams | Array | Custom parameters | Takes effect when customConfig = 1 |
| 18 | description | String | Description | |
| 19 | runFlag | String | Run flag | |
| 20 | conditionResult | Object | Conditional branch | |
| 21 | successNode | Array | Nodes to jump to on success | |
| 22 | failedNode | Array | Nodes to jump to on failure | |
| 23 | dependence | Object | Task dependency | Mutually exclusive with params |
| 24 | maxRetryTimes | String | Maximum number of retries | |
| 25 | retryInterval | String | Retry interval | |
| 26 | timeout | Object | Timeout control | |
| 27 | taskInstancePriority | String | Task priority | |
| 28 | workerGroup | String | Worker group | |
| 29 | preTasks | Array | Predecessor tasks | |

DataX node data example:

{
    "type":"DATAX",
    "id":"tasks-91196",
    "name":"DataxTask-DB",
    "params":{
        "customConfig":0,
        "dsType":"MYSQL",
        "dataSource":1,
        "dtType":"MYSQL",
        "dataTarget":1,
        "sql":"select id, name ,age from user ",
        "targetTable":"emp",
        "jobSpeedByte":524288,
        "jobSpeedRecord":500,
        "preStatements":[
            "truncate table emp "
        ],
        "postStatements":[
            "truncate table user"
        ]
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
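
With customConfig = 0, the node params are conceptually expanded into a DataX job configuration: the source SQL becomes the reader query, targetTable and the pre/post statements configure the writer, and the jobSpeed fields become speed limits. A heavily simplified, illustrative sketch (connection details, which DolphinScheduler fills in from the referenced datasource ids, are omitted):

import json

def build_datax_job(params):
    # Illustrative only: sketch of a DataX job built from the node params.
    return json.dumps({
        "job": {
            "setting": {"speed": {
                "byte": params["jobSpeedByte"],
                "record": params["jobSpeedRecord"],
            }},
            "content": [{
                "reader": {"name": "mysqlreader",
                           "parameter": {"connection": [{"querySql": [params["sql"]]}]}},
                "writer": {"name": "mysqlwriter",
                           "parameter": {"table": [params["targetTable"]],
                                         "preSql": params["preStatements"],
                                         "postSql": params["postStatements"]}},
            }],
        }
    }, indent=2)

params = {
    "sql": "select id, name ,age from user ",
    "targetTable": "emp",
    "jobSpeedByte": 524288,
    "jobSpeedRecord": 500,
    "preStatements": ["truncate table emp "],
    "postStatements": ["truncate table user"],
}
print(build_datax_job(params))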

2.9 Sqoop node

Sqoop node data structure is as follows:

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | SQOOP |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | concurrency | Int | Concurrency | |
| 6 | modelType | String | Flow direction | import, export |
| 7 | sourceType | String | Data source type | |
| 8 | sourceParams | String | Data source parameters | JSON format |
| 9 | targetType | String | Target data type | |
| 10 | targetParams | String | Target data parameters | JSON format |
| 11 | localParams | Array | Custom parameters | |
| 12 | description | String | Description | |
| 13 | runFlag | String | Run flag | |
| 14 | conditionResult | Object | Conditional branch | |
| 15 | successNode | Array | Nodes to jump to on success | |
| 16 | failedNode | Array | Nodes to jump to on failure | |
| 17 | dependence | Object | Task dependency | Mutually exclusive with params |
| 18 | maxRetryTimes | String | Maximum number of retries | |
| 19 | retryInterval | String | Retry interval | |
| 20 | timeout | Object | Timeout control | |
| 21 | taskInstancePriority | String | Task priority | |
| 22 | workerGroup | String | Worker group | |
| 23 | preTasks | Array | Predecessor tasks | |

Example of Sqoop node data:

{ "type":"SQOOP", "id":"tasks-82041", "name":"Sqoop Task", "params":{ "concurrency":1, "modelType":"import", "sourceType":"MYSQL", "targetType":"HDFS", "sourceParams":"{"srcType":"MYSQL","srcDatasource":1,"srcTable":"","srcQueryType":"1","srcQuerySql":"selec id , name from user","srcColumnType":"0","srcColumns":"","srcConditionList":[],"mapColumnHive":[{"prop":"hivetype-key","direct":"IN","t ype":"VARCHAR","value":"hivetype-value"}],"mapColumnJava":[{"prop":"javatype-key","direct":"IN","type":"VARCHAR","value" :"javatype-value"}]}", "targetParams":"{"targetPath":"/user/hive/warehouse/ods.db/user","deleteTargetDir":false,"fileType":"--as-avrodatafile", "compressionCodec":"snappy","fieldsTerminated":",","linesTerminated":"@"}", "localParams":[ ] }, "description":"", "runFlag":"NORMAL", "conditionResult":{ "successNode":[ "" ], "failedNode":[ "" ] }, "dependence":{ }, "maxRetryTimes":"0", "retryInterval":"1", "timeout":{ "strategy":"", "interval":null, "enable":false }, "taskInstancePriority":"MEDIUM", "workerGroup":"default", "preTasks":[ ] }Copy the code

2.10 Conditional branch nodes

Conditional branch node data structure is as follows:

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | CONDITIONS |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | null |
| 5 | description | String | Description | |
| 6 | runFlag | String | Run flag | |
| 7 | conditionResult | Object | Conditional branch | |
| 8 | successNode | Array | Nodes to jump to on success | |
| 9 | failedNode | Array | Nodes to jump to on failure | |
| 10 | dependence | Object | Task dependency | Mutually exclusive with params |
| 11 | maxRetryTimes | String | Maximum number of retries | |
| 12 | retryInterval | String | Retry interval | |
| 13 | timeout | Object | Timeout control | |
| 14 | taskInstancePriority | String | Task priority | |
| 15 | workerGroup | String | Worker group | |
| 16 | preTasks | Array | Predecessor tasks | |

Conditional branch node data example:

{" type ":" the CONDITIONS ", "id" : "the tasks - 96189", "name" : "CONDITIONS", "params" : {}, "description" : ""," runFlag ":" NORMAL ", "conditionResult":{ "successNode":[ "test04" ], "failedNode":[ "test05" ] }, "dependence":{ "relation":"AND", "dependTaskList":[ ] }, "maxRetryTimes":"0", "retryInterval":"1", "timeout":{ "strategy":"", "interval":null, "enable":false }, "taskInstancePriority":"MEDIUM", "workerGroup":"default", "preTasks":[ "test01", "test02" ] }Copy the code

2.11 Sub-process node

The data structure of sub-process nodes is as follows:

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | SUB_PROCESS |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | processDefinitionId | Int | Process definition id | |
| 6 | description | String | Description | |
| 7 | runFlag | String | Run flag | |
| 8 | conditionResult | Object | Conditional branch | |
| 9 | successNode | Array | Nodes to jump to on success | |
| 10 | failedNode | Array | Nodes to jump to on failure | |
| 11 | dependence | Object | Task dependency | Mutually exclusive with params |
| 12 | maxRetryTimes | String | Maximum number of retries | |
| 13 | retryInterval | String | Retry interval | |
| 14 | timeout | Object | Timeout control | |
| 15 | taskInstancePriority | String | Task priority | |
| 16 | workerGroup | String | Worker group | |
| 17 | preTasks | Array | Predecessor tasks | |

Example of subprocess node data:

{
            "type":"SUB_PROCESS",
            "id":"tasks-14806",
            "name":"SubProcessTask",
            "params":{
                "processDefinitionId":2
            },
            "description":"",
            "runFlag":"NORMAL",
            "conditionResult":{
                "successNode":[
                    ""
                ],
                "failedNode":[
                    ""
                ]
            },
            "dependence":{
            },
            "timeout":{
                "strategy":"",
                "interval":null,
                "enable":false
            },
            "taskInstancePriority":"MEDIUM",
            "workerGroup":"default",
            "preTasks":[
            ]
        }

2.12 DEPENDENT node

The data structure of the DEPENDENT node is as follows:

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | DEPENDENT |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | rawScript | String | Shell script | |
| 6 | localParams | Array | Custom parameters | |
| 7 | resourceList | Array | Resource files | |
| 8 | description | String | Description | |
| 9 | runFlag | String | Run flag | |
| 10 | conditionResult | Object | Conditional branch | |
| 11 | successNode | Array | Nodes to jump to on success | |
| 12 | failedNode | Array | Nodes to jump to on failure | |
| 13 | dependence | Object | Task dependency | Mutually exclusive with params |
| 14 | relation | String | Relation | AND, OR |
| 15 | dependTaskList | Array | List of dependent tasks | |
| 16 | maxRetryTimes | String | Maximum number of retries | |
| 17 | retryInterval | String | Retry interval | |
| 18 | timeout | Object | Timeout control | |
| 19 | taskInstancePriority | String | Task priority | |
| 20 | workerGroup | String | Worker group | |
| 21 | preTasks | Array | Predecessor tasks | |

DEPENDENT node data example:

{
            "type":"DEPENDENT",
            "id":"tasks-57057",
            "name":"DenpendentTask",
            "params":{
            },
            "description":"",
            "runFlag":"NORMAL",
            "conditionResult":{
                "successNode":[
                    ""
                ],
                "failedNode":[
                    ""
                ]
            },
            "dependence":{
                "relation":"AND",
                "dependTaskList":[
                    {
                        "relation":"AND",
                        "dependItemList":[
                            {
                                "projectId":1,
                                "definitionId":7,
                                "definitionList":[
                                    {
                                        "value":8,
                                        "label":"MRTask"
                                    },
                                    {
                                        "value":7,
                                        "label":"FlinkTask"
                                    },
                                    {
                                        "value":6,
                                        "label":"SparkTask"
                                    },
                                    {
                                        "value":5,
                                        "label":"SqlTask-Update"
                                    },
                                    {
                                        "value":4,
                                        "label":"SqlTask-Query"
                                    },
                                    {
                                        "value":3,
                                        "label":"SubProcessTask"
                                    },
                                    {
                                        "value":2,
                                        "label":"Python Task"
                                    },
                                    {
                                        "value":1,
                                        "label":"Shell Task"
                                    }
                                ],
                                "depTasks":"ALL",
                                "cycle":"day",
                                "dateValue":"today"
                            }
                        ]
                    },
                    {
                        "relation":"AND",
                        "dependItemList":[
                            {
                                "projectId":1,
                                "definitionId":5,
                                "definitionList":[
                                    {
                                        "value":8,
                                        "label":"MRTask"
                                    },
                                    {
                                        "value":7,
                                        "label":"FlinkTask"
                                    },
                                    {
                                        "value":6,
                                        "label":"SparkTask"
                                    },
                                    {
                                        "value":5,
                                        "label":"SqlTask-Update"
                                    },
                                    {
                                        "value":4,
                                        "label":"SqlTask-Query"
                                    },
                                    {
                                        "value":3,
                                        "label":"SubProcessTask"
                                    },
                                    {
                                        "value":2,
                                        "label":"Python Task"
                                    },
                                    {
                                        "value":1,
                                        "label":"Shell Task"
                                    }
                                ],
                                "depTasks":"SqlTask-Update",
                                "cycle":"day",
                                "dateValue":"today"
                            }
                        ]
                    }
                ]
            },
            "maxRetryTimes":"0",
            "retryInterval":"1",
            "timeout":{
                "strategy":"",
                "interval":null,
                "enable":false
            },
            "taskInstancePriority":"MEDIUM",
            "workerGroup":"default",
            "preTasks":[
            ]
        }
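
The dependence block is a two-level tree: the outer relation combines the groups in dependTaskList, and each group's relation combines its dependItemList. An illustrative sketch of the evaluation, where the check of whether a referenced task instance succeeded in the given cycle ("day"/"today" above) is injected as item_ok:

def evaluate_dependence(dependence, item_ok):
    # Illustrative only: fold the two-level AND/OR tree of a DEPENDENT node.
    outer = all if dependence["relation"] == "AND" else any
    return outer(
        (all if group["relation"] == "AND" else any)(
            item_ok(item) for item in group["dependItemList"]
        )
        for group in dependence["dependTaskList"]
    )

demo = {"relation": "AND", "dependTaskList": [
    {"relation": "AND", "dependItemList": [
        {"depTasks": "ALL", "cycle": "day", "dateValue": "today"},
    ]},
]}
print(evaluate_dependence(demo, lambda item: True))  # True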

DolphinScheduler Community

Apache DolphinScheduler is a very diverse community, with nearly 100 contributors from over 30 different companies, and its WeChat group has 3,000 users.

Apache DolphinScheduler User Cases (in no particular order)

More than 300 companies and research institutes use DolphinScheduler for scheduling and timed tasks, and more than 500 companies have run trials of it:

Apache DolphinScheduler project origins – Requirements determination

Apache DolphinScheduler has four main features

Apache DolphinScheduler features:

  • Tasks are associated according to their dependencies in a DAG, and their running status can be monitored visually in real time

  • Supports rich task types: Shell, MR, Spark, Flink, SQL (MySQL, PostgreSQL, Hive, Spark SQL), Python, Http, Sub_Process, and Procedure

  • Supports scheduled, dependency-based, and manual scheduling, as well as manual pause/stop/resume; also supports failure retries/alarms, recovery from a specified failed node, and killing tasks

  • Supports workflow priority, task priority, task failover, and task timeout alarms/failure

  • Supports workflow-level global parameters and node-level custom parameters

  • Supports online upload/download and management of resource files, and online file creation and editing

  • Supports online viewing and tailing of task logs, and online log download

  • Implements cluster HA: the Master cluster and the Worker cluster are both decentralized via ZooKeeper

  • Supports online viewing of Master/Worker CPU load, memory, and CPU usage

  • Supports tree and Gantt chart views of workflow run history, as well as task status and process status statistics

  • Supports backfilling data

  • Supports multi-tenancy

  • Supports internationalization

What's new in Apache DolphinScheduler 1.3

  • Worker refactored for better performance

  • Netty-based communication introduced between Master and Worker

  • Task queue removed from ZooKeeper

  • Database operations removed from the Worker

  • Three load-balancing strategies for selecting Worker nodes: random, round-robin, and linear weighting by CPU and memory

  • Resource center supports multiple directories

  • New if/else conditional tasks

  • New sqoop/datax tasks

  • Support for K8s deployment

  • One-click formatting of DAG flowcharts

  • Flowchart beautification

  • Support for Ambari plug-in installation

  • Bulk export and import of workflows

  • Support for copying process definitions

  • Greatly simplified configuration items and deployment

Online DEMO Trial

http://106.75.43.194:8888/

DolphinScheduler Slogan

Join the Apache DolphinScheduler community

If you have any questions, ideas, or suggestions while using DolphinScheduler, you can join the community through the Apache mailing list.

You are welcome to contribute to the open source community by submitting your first PR:

  • Find an issue marked "easy to fix" or some other very simple issue (a spelling mistake, for example) and use your first PR to get familiar with the submission process. If you have any questions, feel free to contact the community.
