Apache DolphinScheduler is a decentralized, easily extensible, visual DAG workflow task scheduling system. It is dedicated to untangling the complex dependencies in data processing so that the scheduling system can be used out of the box.

Recently, community contributor Bo Yi shared a very detailed analysis of the core workflow table structure. If you find it useful, please share it.

1. Overall storage structure of workflow

All workflow definitions (templates) created in DolphinScheduler are stored in the t_ds_process_definition table.

The database table structure is shown in the following table:

| No. | Field | Type | Description |
|----|------|------|-------------|
| 1 | id | int(11) | Primary key |
| 2 | name | varchar(255) | Process definition name |
| 3 | version | int(11) | Process definition version |
| 4 | release_state | tinyint(4) | Release state of the process definition: 0 offline, 1 online |
| 5 | project_id | int(11) | Project id |
| 6 | user_id | int(11) | Id of the user who owns the process definition |
| 7 | process_definition_json | longtext | Process definition JSON |
| 8 | description | text | Process definition description |
| 9 | global_params | text | Global parameters |
| 10 | flag | tinyint(4) | Whether the process is available: 0 unavailable, 1 available |
| 11 | locations | text | Node coordinate information |
| 12 | connects | text | Node connection information |
| 13 | receivers | text | Recipients |
| 14 | receivers_cc | text | Cc recipients |
| 15 | create_time | datetime | Creation time |
| 16 | timeout | int(11) | Timeout |
| 17 | tenant_id | int(11) | Tenant id |
| 18 | update_time | datetime | Update time |
| 19 | modify_by | varchar(36) | Last modified by |
| 20 | resource_ids | varchar(255) | Resource ids |

The core field is process_definition_json: it defines the task information of the DAG and is stored as JSON.

Its common top-level structure is as follows:

| No. | Field | Type | Description |
|----|------|------|-------------|
| 1 | globalParams | Array | Global parameters |
| 2 | tasks | Array | Collection of tasks in the process (the structure of each task type is described in the sections below) |
| 3 | tenantId | Int | Tenant id |
| 4 | timeout | Int | Timeout |

Data example:

{
    "globalParams":[
        {
            "prop":"golbal_bizdate",
            "direct":"IN",
            "type":"VARCHAR",
            "value":"${system.biz.date}"
        }
    ],
    "tasks":Array[1],
    "tenantId":0,
    "timeout":0
}
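
Since process_definition_json is plain JSON, it is easy to inspect outside of DolphinScheduler. The following is a minimal, illustrative Python sketch (not part of DolphinScheduler itself) that parses a definition string and rebuilds the DAG edges from each task's preTasks list; the sample definition is hypothetical:

import json

# Illustrative only: parse a process_definition_json value and print the
# DAG edges implied by each task's preTasks list (task *names*, as in the
# examples below).
definition_json = """
{
    "globalParams": [],
    "tenantId": 0,
    "timeout": 0,
    "tasks": [
        {"id": "tasks-1", "name": "Shell Task", "type": "SHELL", "preTasks": []},
        {"id": "tasks-2", "name": "SqlTask-Query", "type": "SQL", "preTasks": ["Shell Task"]}
    ]
}
"""

definition = json.loads(definition_json)
for task in definition["tasks"]:
    print(task["type"], task["name"])
    for pre in task["preTasks"]:
        print("  edge:", pre, "->", task["name"])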

2. Detailed description of the storage structure of each task type

2.1 Shell node

**Shell node data structure is as follows:**

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | SHELL |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | rawScript | String | Shell script | |
| 6 | localParams | Array | Custom parameters | |
| 7 | resourceList | Array | Resource files | |
| 8 | description | String | Description | |
| 9 | runFlag | String | Run flag | |
| 10 | conditionResult | Object | Conditional branch | |
| 11 | successNode | Array | Nodes to jump to on success | |
| 12 | failedNode | Array | Nodes to jump to on failure | |
| 13 | dependence | Object | Task dependency | Mutually exclusive with params |
| 14 | maxRetryTimes | String | Maximum number of retries | |
| 15 | retryInterval | String | Retry interval | |
| 16 | timeout | Object | Timeout control | |
| 17 | taskInstancePriority | String | Task priority | |
| 18 | workerGroup | String | Worker group | |
| 19 | preTasks | Array | Predecessor tasks | |

Shell node data example:

{
    "type":"SHELL",
    "id":"tasks-80760",
    "name":"Shell Task",
    "params":{
        "resourceList":[
            {
                "id":3,
                "name":"run.sh",
                "res":"run.sh"
            }
        ],
        "localParams":[
        ],
        "rawScript":"echo "This is a shell script""
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
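
One detail worth noting: because rawScript is stored inside a JSON string, any quotes in the script are escaped when the definition is persisted. A quick sketch with Python's json module shows the stored form:

import json

# The rawScript value lives inside a JSON string, so inner quotes end up
# escaped in the persisted definition:
params = {
    "resourceList": [{"id": 3, "name": "run.sh", "res": "run.sh"}],
    "localParams": [],
    "rawScript": 'echo "This is a shell script"',
}
print(json.dumps(params))
# ... "rawScript": "echo \"This is a shell script\"" ...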

2.2 SQL node

Queries and updates data on the specified data source through SQL.

**SQL node data structure is as follows:**

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | SQL |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | type | String | Database type | |
| 6 | datasource | Int | Datasource id | |
| 7 | sql | String | SQL statement to run | |
| 8 | udfs | String | UDF functions | UDF function ids, comma-separated |
| 9 | sqlType | String | SQL node type | 0 query, 1 non-query |
| 10 | title | String | Email title | |
| 11 | receivers | String | Recipients | |
| 12 | receiversCc | String | Cc recipients | |
| 13 | showType | String | Email display type | TABLE table, ATTACHMENT attachment |
| 14 | connParams | String | Connection parameters | |
| 15 | preStatements | Array | Pre-SQL | |
| 16 | postStatements | Array | Post-SQL | |
| 17 | localParams | Array | Custom parameters | |
| 18 | description | String | Description | |
| 19 | runFlag | String | Run flag | |
| 20 | conditionResult | Object | Conditional branch | |
| 21 | successNode | Array | Nodes to jump to on success | |
| 22 | failedNode | Array | Nodes to jump to on failure | |
| 23 | dependence | Object | Task dependency | Mutually exclusive with params |
| 24 | maxRetryTimes | String | Maximum number of retries | |
| 25 | retryInterval | String | Retry interval | |
| 26 | timeout | Object | Timeout control | |
| 27 | taskInstancePriority | String | Task priority | |
| 28 | workerGroup | String | Worker group | |
| 29 | preTasks | Array | Predecessor tasks | |

**Example of SQL node data:**

{
    "type":"SQL",
    "id":"tasks-95648",
    "name":"SqlTask-Query",
    "params":{
        "type":"MYSQL",
        "datasource":1,
        "sql":"select id , namge , age from emp where id =  ${id}",
        "udfs":"",
        "sqlType":"0",
        "title":"[email protected]",
        "receivers":"[email protected]",
        "receiversCc":"",
        "showType":"TABLE",
        "localParams":[
            {
                "prop":"id",
                "direct":"IN",
                "type":"INTEGER",
                "value":"1"
            }
        ],
        "connParams":"",
        "preStatements":[
            "insert into emp ( id,name ) value (1,'Li' )"
        ],
        "postStatements":[
        ]
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
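
Conceptually, the worker substitutes localParams into the ${...} placeholders and executes preStatements, then the query, then postStatements. The sketch below is illustrative only; the real substitution happens inside the Java worker:

import re

def render_sql(sql, local_params):
    # Illustrative only: replace ${prop} placeholders with localParams values.
    values = {p["prop"]: p["value"] for p in local_params}
    return re.sub(r"\$\{(\w+)\}", lambda m: values.get(m.group(1), m.group(0)), sql)

local_params = [{"prop": "id", "direct": "IN", "type": "INTEGER", "value": "1"}]
sql = "select id , name , age from emp where id =  ${id}"
pre = ["insert into emp ( id,name ) value (1,'Li' )"]
post = []

# Execution order: preStatements -> sql -> postStatements
for stmt in pre + [render_sql(sql, local_params)] + post:
    print(stmt)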

2.3 Spark node

**Spark node data structure is as follows:**

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | SPARK |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | mainClass | String | Main class to run | |
| 6 | mainArgs | String | Run arguments | |
| 7 | others | String | Other parameters | |
| 8 | mainJar | Object | Program jar | |
| 9 | deployMode | String | Deployment mode | local, client, cluster |
| 10 | driverCores | String | Driver cores | |
| 11 | driverMemory | String | Driver memory | |
| 12 | numExecutors | String | Number of executors | |
| 13 | executorMemory | String | Executor memory | |
| 14 | executorCores | String | Executor cores | |
| 15 | programType | String | Program type | JAVA, SCALA, PYTHON |
| 16 | sparkVersion | String | Spark version | SPARK1, SPARK2 |
| 17 | localParams | Array | Custom parameters | |
| 18 | resourceList | Array | Resource files | |
| 19 | description | String | Description | |
| 20 | runFlag | String | Run flag | |
| 21 | conditionResult | Object | Conditional branch | |
| 22 | successNode | Array | Nodes to jump to on success | |
| 23 | failedNode | Array | Nodes to jump to on failure | |
| 24 | dependence | Object | Task dependency | Mutually exclusive with params |
| 25 | maxRetryTimes | String | Maximum number of retries | |
| 26 | retryInterval | String | Retry interval | |
| 27 | timeout | Object | Timeout control | |
| 28 | taskInstancePriority | String | Task priority | |
| 29 | workerGroup | String | Worker group | |
| 30 | preTasks | Array | Predecessor tasks | |

**Example of Spark node data:**

{
    "type":"SPARK",
    "id":"tasks-87430",
    "name":"SparkTask",
    "params":{
        "mainClass":"org.apache.spark.examples.SparkPi",
        "mainJar":{
            "id":4
        },
        "deployMode":"cluster",
        "resourceList":[
            {
                "id":3,
                "name":"run.sh",
                "res":"run.sh"
            }
        ],
        "localParams":[
        ],
        "driverCores":1,
        "driverMemory":"512M",
        "numExecutors":2,
        "executorMemory":"2G",
        "executorCores":2,
        "mainArgs":"10",
        "others":"",
        "programType":"SCALA",
        "sparkVersion":"SPARK2"
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
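
Conceptually, these params map onto a spark-submit invocation. The sketch below is illustrative only; resolving mainJar["id"] to a file path through the resource center is represented here by a hypothetical jar_path argument:

def build_spark_submit(params, jar_path):
    # Illustrative only: map the node params onto spark-submit flags.
    # jar_path stands in for resolving mainJar["id"] in the resource center.
    return " ".join([
        "spark-submit",
        "--class", params["mainClass"],
        "--deploy-mode", params["deployMode"],
        "--driver-cores", str(params["driverCores"]),
        "--driver-memory", params["driverMemory"],
        "--num-executors", str(params["numExecutors"]),
        "--executor-memory", params["executorMemory"],
        "--executor-cores", str(params["executorCores"]),
        jar_path,
        params["mainArgs"],
    ])

params = {
    "mainClass": "org.apache.spark.examples.SparkPi",
    "deployMode": "cluster",
    "driverCores": 1, "driverMemory": "512M",
    "numExecutors": 2, "executorMemory": "2G", "executorCores": 2,
    "mainArgs": "10",
}
print(build_spark_submit(params, "spark-examples.jar"))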

2.4 MapReduce (MR) node

**MapReduce (MR) node data structure is as follows:**

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | MR |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | mainClass | String | Main class to run | |
| 6 | mainArgs | String | Run arguments | |
| 7 | others | String | Other parameters | |
| 8 | mainJar | Object | Program jar | |
| 9 | programType | String | Program type | JAVA, PYTHON |
| 10 | localParams | Array | Custom parameters | |
| 11 | resourceList | Array | Resource files | |
| 12 | description | String | Description | |
| 13 | runFlag | String | Run flag | |
| 14 | conditionResult | Object | Conditional branch | |
| 15 | successNode | Array | Nodes to jump to on success | |
| 16 | failedNode | Array | Nodes to jump to on failure | |
| 17 | dependence | Object | Task dependency | Mutually exclusive with params |
| 18 | maxRetryTimes | String | Maximum number of retries | |
| 19 | retryInterval | String | Retry interval | |
| 20 | timeout | Object | Timeout control | |
| 21 | taskInstancePriority | String | Task priority | |
| 22 | workerGroup | String | Worker group | |
| 23 | preTasks | Array | Predecessor tasks | |

**Example of MapReduce (MR) node data:**

{
    "type":"MR",
    "id":"tasks-28997",
    "name":"MRTask",
    "params":{
        "mainClass":"wordcount",
        "mainJar":{
            "id":5
        },
        "resourceList":[
            {
                "id":3,
                "name":"run.sh",
                "res":"run.sh"
            }
        ],
        "localParams":[
        ],
        "mainArgs":"/tmp/wordcount/input /tmp/wordcount/output/",
        "others":"",
        "programType":"JAVA"
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
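
An MR node conceptually becomes a hadoop jar invocation. Again an illustrative sketch, with jar_path standing in for the resolved mainJar resource:

def build_hadoop_command(params, jar_path):
    # Illustrative only: hadoop jar <jar> <mainClass> <mainArgs>
    return " ".join(["hadoop", "jar", jar_path, params["mainClass"], params["mainArgs"]])

print(build_hadoop_command(
    {"mainClass": "wordcount", "mainArgs": "/tmp/wordcount/input /tmp/wordcount/output/"},
    "hadoop-mapreduce-examples.jar",
))
# hadoop jar hadoop-mapreduce-examples.jar wordcount /tmp/wordcount/input /tmp/wordcount/output/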

2.5 Python node

**Python node data structure is as follows:**

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | PYTHON |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | rawScript | String | Python script | |
| 6 | localParams | Array | Custom parameters | |
| 7 | resourceList | Array | Resource files | |
| 8 | description | String | Description | |
| 9 | runFlag | String | Run flag | |
| 10 | conditionResult | Object | Conditional branch | |
| 11 | successNode | Array | Nodes to jump to on success | |
| 12 | failedNode | Array | Nodes to jump to on failure | |
| 13 | dependence | Object | Task dependency | Mutually exclusive with params |
| 14 | maxRetryTimes | String | Maximum number of retries | |
| 15 | retryInterval | String | Retry interval | |
| 16 | timeout | Object | Timeout control | |
| 17 | taskInstancePriority | String | Task priority | |
| 18 | workerGroup | String | Worker group | |
| 19 | preTasks | Array | Predecessor tasks | |

Python node data example:

{
    "type":"PYTHON",
    "id":"tasks-5463",
    "name":"Python Task",
    "params":{
        "resourceList":[
            {
                "id":3,
                "name":"run.sh",
                "res":"run.sh"
            }
        ],
        "localParams":[
        ],
        "rawScript":"print("This is a python script")"
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}

2.6 Flink node

Flink node data structure is as follows:

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | FLINK |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | mainClass | String | Main class to run | |
| 6 | mainArgs | String | Run arguments | |
| 7 | others | String | Other parameters | |
| 8 | mainJar | Object | Program jar | |
| 9 | deployMode | String | Deployment mode | local, client, cluster |
| 10 | slot | String | Number of slots | |
| 11 | taskManager | String | Number of TaskManagers | |
| 12 | taskManagerMemory | String | TaskManager memory size | |
| 13 | jobManagerMemory | String | JobManager memory size | |
| 14 | programType | String | Program type | JAVA, SCALA, PYTHON |
| 15 | localParams | Array | Custom parameters | |
| 16 | resourceList | Array | Resource files | |
| 17 | description | String | Description | |
| 18 | runFlag | String | Run flag | |
| 19 | conditionResult | Object | Conditional branch | |
| 20 | successNode | Array | Nodes to jump to on success | |
| 21 | failedNode | Array | Nodes to jump to on failure | |
| 22 | dependence | Object | Task dependency | Mutually exclusive with params |
| 23 | maxRetryTimes | String | Maximum number of retries | |
| 24 | retryInterval | String | Retry interval | |
| 25 | timeout | Object | Timeout control | |
| 26 | taskInstancePriority | String | Task priority | |
| 27 | workerGroup | String | Worker group | |
| 28 | preTasks | Array | Predecessor tasks | |

**Flink node data example:**

{
    "type":"FLINK",
    "id":"tasks-17135",
    "name":"FlinkTask",
    "params":{
        "mainClass":"com.flink.demo",
        "mainJar":{
            "id":6
        },
        "deployMode":"cluster",
        "resourceList":[
            {
                "id":3,
                "name":"run.sh",
                "res":"run.sh"
            }
        ],
        "localParams":[
        ],
        "slot":1,
        "taskManager":"2",
        "jobManagerMemory":"1G",
        "taskManagerMemory":"2G",
        "executorCores":2,
        "mainArgs":"100",
        "others":"",
        "programType":"SCALA"
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
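
The Flink params similarly map onto a flink run command. The sketch below is illustrative only and assumes the Flink 1.x YARN session flags (-ys for slots, -yn for TaskManagers, -yjm/-ytm for memory); jar_path again stands in for the resolved mainJar resource:

def build_flink_command(params, jar_path):
    # Illustrative only: map node params onto Flink 1.x CLI flags.
    return " ".join([
        "flink", "run", "-m", "yarn-cluster",
        "-ys", str(params["slot"]),
        "-yn", str(params["taskManager"]),
        "-yjm", params["jobManagerMemory"],
        "-ytm", params["taskManagerMemory"],
        "-c", params["mainClass"],
        jar_path,
        params["mainArgs"],
    ])

params = {
    "mainClass": "com.flink.demo", "slot": 1, "taskManager": "2",
    "jobManagerMemory": "1G", "taskManagerMemory": "2G", "mainArgs": "100",
}
print(build_flink_command(params, "flink-demo.jar"))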

2.7 Http node

The Http node data structure is as follows:

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | HTTP |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | url | String | Request URL | |
| 6 | httpMethod | String | Request method | GET, POST, HEAD, PUT, DELETE |
| 7 | httpParams | Array | Request parameters | |
| 8 | httpCheckCondition | String | Check condition | Default: response code 200 |
| 9 | condition | String | Check content | |
| 10 | localParams | Array | Custom parameters | |
| 11 | description | String | Description | |
| 12 | runFlag | String | Run flag | |
| 13 | conditionResult | Object | Conditional branch | |
| 14 | successNode | Array | Nodes to jump to on success | |
| 15 | failedNode | Array | Nodes to jump to on failure | |
| 16 | dependence | Object | Task dependency | Mutually exclusive with params |
| 17 | maxRetryTimes | String | Maximum number of retries | |
| 18 | retryInterval | String | Retry interval | |
| 19 | timeout | Object | Timeout control | |
| 20 | taskInstancePriority | String | Task priority | |
| 21 | workerGroup | String | Worker group | |
| 22 | preTasks | Array | Predecessor tasks | |

**Example of Http node data:**

{
    "type":"HTTP",
    "id":"tasks-60499",
    "name":"HttpTask",
    "params":{
        "localParams":[
        ],
        "httpParams":[
            {
                "prop":"id",
                "httpParametersType":"PARAMETER",
                "value":"1"
            },
            {
                "prop":"name",
                "httpParametersType":"PARAMETER",
                "value":"Bo"
            }
        ],
        "url":"https://www.xxxxx.com:9012",
        "httpMethod":"POST",
        "httpCheckCondition":"STATUS_CODE_DEFAULT",
        "condition":""
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
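
The check semantics can be summarized as: with httpCheckCondition set to STATUS_CODE_DEFAULT the task succeeds when the response code is 200, while body-based check conditions compare the response body against condition. A simplified, illustrative sketch:

def http_task_passes(params, status_code, body):
    # Illustrative only: STATUS_CODE_DEFAULT succeeds on HTTP 200;
    # body-based checks compare the response body with `condition`.
    if params["httpCheckCondition"] == "STATUS_CODE_DEFAULT":
        return status_code == 200
    return params["condition"] in body

params = {"httpCheckCondition": "STATUS_CODE_DEFAULT", "condition": ""}
print(http_task_passes(params, 200, ""))  # True
print(http_task_passes(params, 500, ""))  # False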

2.8 DataX node

**DataX node data structure is as follows:**

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | DATAX |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | customConfig | Int | Whether to use a custom configuration | 0 not custom, 1 custom |
| 6 | dsType | String | Source database type | |
| 7 | dataSource | Int | Source database id | |
| 8 | dtType | String | Target database type | |
| 9 | dataTarget | Int | Target database id | |
| 10 | sql | String | SQL statement | |
| 11 | targetTable | String | Target table | |
| 12 | jobSpeedByte | Int | Rate limit (bytes) | |
| 13 | jobSpeedRecord | Int | Rate limit (record count) | |
| 14 | preStatements | Array | Pre-SQL | |
| 15 | postStatements | Array | Post-SQL | |
| 16 | json | String | Custom configuration | Takes effect when customConfig = 1 |
| 17 | localParams | Array | Custom parameters | Takes effect when customConfig = 1 |
| 18 | description | String | Description | |
| 19 | runFlag | String | Run flag | |
| 20 | conditionResult | Object | Conditional branch | |
| 21 | successNode | Array | Nodes to jump to on success | |
| 22 | failedNode | Array | Nodes to jump to on failure | |
| 23 | dependence | Object | Task dependency | Mutually exclusive with params |
| 24 | maxRetryTimes | String | Maximum number of retries | |
| 25 | retryInterval | String | Retry interval | |
| 26 | timeout | Object | Timeout control | |
| 27 | taskInstancePriority | String | Task priority | |
| 28 | workerGroup | String | Worker group | |
| 29 | preTasks | Array | Predecessor tasks | |

DataX node data example:

{
    "type":"DATAX",
    "id":"tasks-91196",
    "name":"DataxTask-DB",
    "params":{
        "customConfig":0,
        "dsType":"MYSQL",
        "dataSource":1,
        "dtType":"MYSQL",
        "dataTarget":1,
        "sql":"select id, name ,age from user ",
        "targetTable":"emp",
        "jobSpeedByte":524288,
        "jobSpeedRecord":500,
        "preStatements":[
            "truncate table emp "
        ],
        "postStatements":[
            "truncate table user"
        ]
    },
    "description":"",
    "runFlag":"NORMAL",
    "conditionResult":{
        "successNode":[
            ""
        ],
        "failedNode":[
            ""
        ]
    },
    "dependence":{
    },
    "maxRetryTimes":"0",
    "retryInterval":"1",
    "timeout":{
        "strategy":"",
        "interval":null,
        "enable":false
    },
    "taskInstancePriority":"MEDIUM",
    "workerGroup":"default",
    "preTasks":[
    ]
}
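
With customConfig = 0, the node params are conceptually expanded into a DataX job configuration: the source SQL becomes the reader query, targetTable and the pre/post statements configure the writer, and the jobSpeed fields become speed limits. A heavily simplified, illustrative sketch (connection details, which DolphinScheduler fills in from the referenced datasource ids, are omitted):

import json

def build_datax_job(params):
    # Illustrative only: sketch of a DataX job built from the node params.
    return json.dumps({
        "job": {
            "setting": {"speed": {
                "byte": params["jobSpeedByte"],
                "record": params["jobSpeedRecord"],
            }},
            "content": [{
                "reader": {"name": "mysqlreader",
                           "parameter": {"connection": [{"querySql": [params["sql"]]}]}},
                "writer": {"name": "mysqlwriter",
                           "parameter": {"table": [params["targetTable"]],
                                         "preSql": params["preStatements"],
                                         "postSql": params["postStatements"]}},
            }],
        }
    }, indent=2)

params = {
    "sql": "select id, name ,age from user ",
    "targetTable": "emp",
    "jobSpeedByte": 524288,
    "jobSpeedRecord": 500,
    "preStatements": ["truncate table emp "],
    "postStatements": ["truncate table user"],
}
print(build_datax_job(params))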

2.9 Sqoop node

Sqoop node data structure is as follows:

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | SQOOP |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | concurrency | Int | Concurrency | |
| 6 | modelType | String | Flow direction | import, export |
| 7 | sourceType | String | Data source type | |
| 8 | sourceParams | String | Data source parameters | JSON format |
| 9 | targetType | String | Target data type | |
| 10 | targetParams | String | Target data parameters | JSON format |
| 11 | localParams | Array | Custom parameters | |
| 12 | description | String | Description | |
| 13 | runFlag | String | Run flag | |
| 14 | conditionResult | Object | Conditional branch | |
| 15 | successNode | Array | Nodes to jump to on success | |
| 16 | failedNode | Array | Nodes to jump to on failure | |
| 17 | dependence | Object | Task dependency | Mutually exclusive with params |
| 18 | maxRetryTimes | String | Maximum number of retries | |
| 19 | retryInterval | String | Retry interval | |
| 20 | timeout | Object | Timeout control | |
| 21 | taskInstancePriority | String | Task priority | |
| 22 | workerGroup | String | Worker group | |
| 23 | preTasks | Array | Predecessor tasks | |

Example of Sqoop node data:

{ "type":"SQOOP", "id":"tasks-82041", "name":"Sqoop Task", "params":{ "concurrency":1, "modelType":"import", "sourceType":"MYSQL", "targetType":"HDFS", "sourceParams":"{"srcType":"MYSQL","srcDatasource":1,"srcTable":"","srcQueryType":"1","srcQuerySql":"selec id , name from user","srcColumnType":"0","srcColumns":"","srcConditionList":[],"mapColumnHive":[{"prop":"hivetype-key","direct":"IN","t ype":"VARCHAR","value":"hivetype-value"}],"mapColumnJava":[{"prop":"javatype-key","direct":"IN","type":"VARCHAR","value" :"javatype-value"}]}", "targetParams":"{"targetPath":"/user/hive/warehouse/ods.db/user","deleteTargetDir":false,"fileType":"--as-avrodatafile", "compressionCodec":"snappy","fieldsTerminated":",","linesTerminated":"@"}", "localParams":[ ] }, "description":"", "runFlag":"NORMAL", "conditionResult":{ "successNode":[ "" ], "failedNode":[ "" ] }, "dependence":{ }, "maxRetryTimes":"0", "retryInterval":"1", "timeout":{ "strategy":"", "interval":null, "enable":false }, "taskInstancePriority":"MEDIUM", "workerGroup":"default", "preTasks":[ ] }Copy the code

2.10 Conditional branch nodes

Conditional branch node data structure is as follows:

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | CONDITIONS |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | null |
| 5 | description | String | Description | |
| 6 | runFlag | String | Run flag | |
| 7 | conditionResult | Object | Conditional branch | |
| 8 | successNode | Array | Nodes to jump to on success | |
| 9 | failedNode | Array | Nodes to jump to on failure | |
| 10 | dependence | Object | Task dependency | Mutually exclusive with params |
| 11 | maxRetryTimes | String | Maximum number of retries | |
| 12 | retryInterval | String | Retry interval | |
| 13 | timeout | Object | Timeout control | |
| 14 | taskInstancePriority | String | Task priority | |
| 15 | workerGroup | String | Worker group | |
| 16 | preTasks | Array | Predecessor tasks | |

Conditional branch node data example:

{" type ":" the CONDITIONS ", "id" : "the tasks - 96189", "name" : "CONDITIONS", "params" : {}, "description" : ""," runFlag ":" NORMAL ", "conditionResult":{ "successNode":[ "test04" ], "failedNode":[ "test05" ] }, "dependence":{ "relation":"AND", "dependTaskList":[ ] }, "maxRetryTimes":"0", "retryInterval":"1", "timeout":{ "strategy":"", "interval":null, "enable":false }, "taskInstancePriority":"MEDIUM", "workerGroup":"default", "preTasks":[ "test01", "test02" ] }Copy the code

2.11 Sub-process node

The data structure of sub-process nodes is as follows:

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | SUB_PROCESS |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | processDefinitionId | Int | Process definition id | |
| 6 | description | String | Description | |
| 7 | runFlag | String | Run flag | |
| 8 | conditionResult | Object | Conditional branch | |
| 9 | successNode | Array | Nodes to jump to on success | |
| 10 | failedNode | Array | Nodes to jump to on failure | |
| 11 | dependence | Object | Task dependency | Mutually exclusive with params |
| 12 | maxRetryTimes | String | Maximum number of retries | |
| 13 | retryInterval | String | Retry interval | |
| 14 | timeout | Object | Timeout control | |
| 15 | taskInstancePriority | String | Task priority | |
| 16 | workerGroup | String | Worker group | |
| 17 | preTasks | Array | Predecessor tasks | |

Example of subprocess node data:

{
            "type":"SUB_PROCESS",
            "id":"tasks-14806",
            "name":"SubProcessTask",
            "params":{
                "processDefinitionId":2
            },
            "description":"",
            "runFlag":"NORMAL",
            "conditionResult":{
                "successNode":[
                    ""
                ],
                "failedNode":[
                    ""
                ]
            },
            "dependence":{
            },
            "timeout":{
                "strategy":"",
                "interval":null,
                "enable":false
            },
            "taskInstancePriority":"MEDIUM",
            "workerGroup":"default",
            "preTasks":[
            ]
        }

2.12 DEPENDENT node

The data structure of the DEPENDENT node is as follows:

| No. | Parameter | Type | Description | Notes |
|----|-----------|------|-------------|-------|
| 1 | id | String | Task code | |
| 2 | type | String | Task type | DEPENDENT |
| 3 | name | String | Task name | |
| 4 | params | Object | Custom parameters | JSON format |
| 5 | rawScript | String | Shell script | |
| 6 | localParams | Array | Custom parameters | |
| 7 | resourceList | Array | Resource files | |
| 8 | description | String | Description | |
| 9 | runFlag | String | Run flag | |
| 10 | conditionResult | Object | Conditional branch | |
| 11 | successNode | Array | Nodes to jump to on success | |
| 12 | failedNode | Array | Nodes to jump to on failure | |
| 13 | dependence | Object | Task dependency | Mutually exclusive with params |
| 14 | relation | String | Relation | AND, OR |
| 15 | dependTaskList | Array | List of dependent tasks | |
| 16 | maxRetryTimes | String | Maximum number of retries | |
| 17 | retryInterval | String | Retry interval | |
| 18 | timeout | Object | Timeout control | |
| 19 | taskInstancePriority | String | Task priority | |
| 20 | workerGroup | String | Worker group | |
| 21 | preTasks | Array | Predecessor tasks | |

DEPENDENT node data example:

{
            "type":"DEPENDENT",
            "id":"tasks-57057",
            "name":"DenpendentTask",
            "params":{
            },
            "description":"",
            "runFlag":"NORMAL",
            "conditionResult":{
                "successNode":[
                    ""
                ],
                "failedNode":[
                    ""
                ]
            },
            "dependence":{
                "relation":"AND",
                "dependTaskList":[
                    {
                        "relation":"AND",
                        "dependItemList":[
                            {
                                "projectId":1,
                                "definitionId":7,
                                "definitionList":[
                                    {
                                        "value":8,
                                        "label":"MRTask"
                                    },
                                    {
                                        "value":7,
                                        "label":"FlinkTask"
                                    },
                                    {
                                        "value":6,
                                        "label":"SparkTask"
                                    },
                                    {
                                        "value":5,
                                        "label":"SqlTask-Update"
                                    },
                                    {
                                        "value":4,
                                        "label":"SqlTask-Query"
                                    },
                                    {
                                        "value":3,
                                        "label":"SubProcessTask"
                                    },
                                    {
                                        "value":2,
                                        "label":"Python Task"
                                    },
                                    {
                                        "value":1,
                                        "label":"Shell Task"
                                    }
                                ],
                                "depTasks":"ALL",
                                "cycle":"day",
                                "dateValue":"today"
                            }
                        ]
                    },
                    {
                        "relation":"AND",
                        "dependItemList":[
                            {
                                "projectId":1,
                                "definitionId":5,
                                "definitionList":[
                                    {
                                        "value":8,
                                        "label":"MRTask"
                                    },
                                    {
                                        "value":7,
                                        "label":"FlinkTask"
                                    },
                                    {
                                        "value":6,
                                        "label":"SparkTask"
                                    },
                                    {
                                        "value":5,
                                        "label":"SqlTask-Update"
                                    },
                                    {
                                        "value":4,
                                        "label":"SqlTask-Query"
                                    },
                                    {
                                        "value":3,
                                        "label":"SubProcessTask"
                                    },
                                    {
                                        "value":2,
                                        "label":"Python Task"
                                    },
                                    {
                                        "value":1,
                                        "label":"Shell Task"
                                    }
                                ],
                                "depTasks":"SqlTask-Update",
                                "cycle":"day",
                                "dateValue":"today"
                            }
                        ]
                    }
                ]
            },
            "maxRetryTimes":"0",
            "retryInterval":"1",
            "timeout":{
                "strategy":"",
                "interval":null,
                "enable":false
            },
            "taskInstancePriority":"MEDIUM",
            "workerGroup":"default",
            "preTasks":[
            ]
        }
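
The dependence block is a two-level tree: the outer relation combines the groups in dependTaskList, and each group's relation combines its dependItemList. An illustrative sketch of the evaluation, where the check of whether a referenced task instance succeeded in the given cycle ("day"/"today" above) is injected as item_ok:

def evaluate_dependence(dependence, item_ok):
    # Illustrative only: fold the two-level AND/OR tree of a DEPENDENT node.
    outer = all if dependence["relation"] == "AND" else any
    return outer(
        (all if group["relation"] == "AND" else any)(
            item_ok(item) for item in group["dependItemList"]
        )
        for group in dependence["dependTaskList"]
    )

demo = {"relation": "AND", "dependTaskList": [
    {"relation": "AND", "dependItemList": [
        {"depTasks": "ALL", "cycle": "day", "dateValue": "today"},
    ]},
]}
print(evaluate_dependence(demo, lambda item: True))  # True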

DolphinScheduler Community

Apache DolphinScheduler is a very diverse community, with nearly 100 contributors from over 30 different companies, and its WeChat group has 3,000 users.

Apache DolphinScheduler User Cases (in no particular order)

More than 300 companies and research institutes use DolphinScheduler for scheduling and timed tasks, and more than 500 companies have run trials of it:

Apache DolphinScheduler project origins – Requirements determination

Apache DolphinScheduler has four main features

Apache DolphinScheduler features:

  • Tasks are associated according to their dependencies in a DAG, and their running status can be monitored visually in real time

  • Supports rich task types: Shell, MR, Spark, Flink, SQL (MySQL, PostgreSQL, Hive, Spark SQL), Python, Http, Sub_Process, and Procedure

  • Supports scheduled, dependency-based, and manual scheduling, as well as manual pause/stop/resume; also supports failure retries/alarms, recovery from a specified failed node, and killing tasks

  • Supports workflow priority, task priority, task failover, and task timeout alarms/failure

  • Supports workflow-level global parameters and node-level custom parameters

  • Supports online upload/download and management of resource files, and online file creation and editing

  • Supports online viewing and tailing of task logs, and online log download

  • Implements cluster HA: the Master cluster and the Worker cluster are both decentralized via ZooKeeper

  • Supports online viewing of Master/Worker CPU load, memory, and CPU usage

  • Supports tree and Gantt chart views of workflow run history, as well as task status and process status statistics

  • Supports backfilling data

  • Supports multi-tenancy

  • Supports internationalization

What's new in Apache DolphinScheduler 1.3

  • Worker refactored for better performance

  • Netty-based communication introduced between Master and Worker

  • Task queue removed from ZooKeeper

  • Database operations removed from the Worker

  • Three load-balancing strategies for selecting Worker nodes: random, round-robin, and linear weighting by CPU and memory

  • Resource center supports multiple directories

  • New if/else conditional tasks

  • New sqoop/datax tasks

  • Support for K8s deployment

  • One-click formatting of DAG flowcharts

  • Flowchart beautification

  • Support for Ambari plug-in installation

  • Bulk export and import of workflows

  • Support for copying process definitions

  • Greatly simplified configuration items and deployment

Online DEMO Trial

http://106.75.43.194:8888/

DolphinScheduler Slogan

Join the Apache DolphinScheduler community

If you have any questions, ideas, or suggestions while using DolphinScheduler, you can join the community through the Apache mailing list.

You are welcome to contribute to the open source community by submitting your first PR:

  • Find an issue marked "easy to fix" or some other very simple issue (a spelling mistake, for example) and use your first PR to get familiar with the submission process. If you have any questions, feel free to contact the community.
