This article mainly focuses on the use, detailed introduction of how to parse an SQL statement and SQL rewriting and other advanced operations, SQL parsing is widely used in the field of distributed data, such as sub-database sub-table component routing module, business SQL parsing, extract conditional routing. Typical representatives include Mycat, etc. How to work involves SQL parsing or rewriting requirements, this article should be able to solve the basic requirements.

The content involved in SQL parser is mainly composed of three parts as follows:

Parser A, Parser B, Parser 2, AST(Abstract Syntax Tree) 3, Visitor: SQL elements iteratedCopy the code

AST is introduced

In computer science, AbstractSyntaxTree (AST), or Syntax tree for short, is an abstract representation of the syntactic structure of source code. Druid parses SQL and builds the AST into a Syntax tree by following certain rules.

The main function of Parser is to generate AST. The Parser mainly consists of two parts: lexical analysis and syntax analysis

SQL > select a, b, from userTable where user_id =10 from userTable where user_id =10

And then when you do the lexical analysis, you do the parsing, you do the parsing, you do the parsing, you do the parsing, you do the parsing, you do the parsing, you do the parsing, you do the parsing, you do the parsing, The result of the analysis is to determine the meaning of the word.

The AST is just a representation of the semantics, but to express the semantics, you need to access the AST to see what it means. The syntax tree is usually traversed by using the VISITOR pattern, traversing from the root node to the last leaf node. During the traversal, information is continuously collected into a context. When the traversal is complete, the syntax meaning of the tree has been saved into the context. Sometimes one walk is not enough, you need a second walk. Traversal, breadth-first traversal is the most common, or without visitor, we can manually traverse the AST structure after we know it by ourselves, but it is very tedious. I think the difference between manual traversal and using a visitor is similar to the difference between you using JQuery and JavaScript to implement the same function. Sometimes to implement a function using a visitor might be a dozen lines of code, and traversing a Statement by yourself might require hundreds of lines of code.

The complete AST abstract syntax tree is generated by Parser.

Druid AST node type in SQL

There are three main types of AST nodes in common use: SQLObject, SQLExpr, and SQLStatement, the most commonly used of which is SQLStatement, DruidSelectStatement, DruidInsertStatement, DruidUpdateStatement, etc

public interface SQLStatement extends SQLObject {}
public interface SQLObject {}
public interface SQLExpr extends SQLObject.Cloneable {}
Copy the code

2.1. Common SQLExpr

package com.alibaba.druid.sql.ast.expr;

SQLName is a TYPE of SQLExpr Expr, including SQLIdentifierExpr, SQLPropertyExpr, etc
public interface SQLName extends SQLExpr {}

For example, if ID = 3, the ID is a SQLIdentifierExpr
class SQLIdentifierExpr implements SQLExpr.SQLName {
    String name;
} 

// a.id = 3 where a.id is a SQLPropertyExpr
class SQLPropertyExpr implements SQLExpr.SQLName {
    SQLExpr owner;
    String name;
} 

// For example ID = 3 this is a SQLBinaryOpExpr
// left is ID (SQLIdentifierExpr)
// Right is 3 (SQLIntegerExpr)
class SQLBinaryOpExpr implements SQLExpr {
    SQLExpr left;
    SQLExpr right;
    SQLBinaryOperator operator;
}

Select * from where id =? Here? Is a SQLVariantRefExpr, name is '? '
class SQLVariantRefExpr extends SQLExprImpl { 
    String name;
}

// for example ID = 3 where 3 is an SQLIntegerExpr
public class SQLIntegerExpr extends SQLNumericLiteralExpr implements SQLValuableExpr { 
    Number number;

    // All SQLExpr implementations of the SQLValuableExpr interface can call this method directly to evaluate
    @Override
    public Object getValue(a) {
        return this.number; }}// NAME = 'jobs' where 'jobs' is a SQLCharExpr
public class SQLCharExpr extends SQLTextLiteralExpr implements SQLValuableExpr{
    String text;
}
Copy the code

2.2 common SQLStatement

The most commonly used Statement, of course, is the SELECT/UPDATE/DELETE/INSERT, they are

package com.alibaba.druid.sql.ast.statement;

class SQLSelectStatement implements SQLStatement {
    SQLSelect select;
}
class SQLUpdateStatement implements SQLStatement {
    SQLExprTableSource tableSource;
     List<SQLUpdateSetItem> items;
     SQLExpr where;
}
class SQLDeleteStatement implements SQLStatement {
    SQLTableSource tableSource; 
    SQLExpr where;
}
class SQLInsertStatement implements SQLStatement {
    SQLExprTableSource tableSource;
    List<SQLExpr> columns;
    SQLSelect query;
}
Copy the code

2.3 Common SQLTableSource

Common SQLTableSource include SQLExprTableSource, SQLJoinTableSource, SQLSubqueryTableSource, SQLWithSubqueryClause. Entry

class SQLTableSourceImpl extends SQLObjectImpl implements SQLTableSource { 
    String alias;
}

// For example, select * from EMp where I = 3
// expr is an SQLIdentifierExpr whose name=emp
class SQLExprTableSource extends SQLTableSourceImpl {
    SQLExpr expr;
}

// For example, select * from emp e inner join org o on e.org_id = o.id
// Left 'emp e' is a SQLExprTableSource and right 'org o' is a SQLExprTableSource
Condition 'e.org_id = o.id' is a SQLBinaryOpExpr
class SQLJoinTableSource extends SQLTableSourceImpl {
    SQLTableSource left;
    SQLTableSource right;
    JoinType joinType; // INNER_JOIN/CROSS_JOIN/LEFT_OUTER_JOIN/RIGHT_OUTER_JOIN/...
    SQLExpr condition;
}

Select * from(select * from temp) a; Is a SQLSubqueryTableSource
SQLSubqueryTableSource extends SQLTableSourceImpl {
    SQLSelect select;
}

/* WITH RECURSIVE rooted AS (SELECT * FROM org UNION SELECT f.* FROM org f, ancestors a WHERE f.id = a.parent_id ) SELECT * FROM ancestors; Here rooted to the same piece AS (...) Is a SQLWithSubqueryClause. Entry * /
class SQLWithSubqueryClause {
    static class Entry extends SQLTableSourceImpl { SQLSelect subQuery; }}Copy the code

SQLSelect & SQLSelectQuery

SQLSelectStatement contains a SQLSelect, and SQLSelect contains a SQLSelectQuery, both of which are constituent relationships. SQLSelectQuery has two main derived classes, SQLSelectQueryBlock and SQLUnionQuery.

class SQLSelect extends SQLObjectImpl { 
    SQLWithSubqueryClause withSubQuery;
    SQLSelectQuery query;
}

interface SQLSelectQuery extends SQLObject {}

class SQLSelectQueryBlock implements SQLSelectQuery {
    List<SQLSelectItem> selectList;
    SQLTableSource from;
    SQLExprTableSource into;
    SQLExpr where;
    SQLSelectGroupByClause groupBy;
    SQLOrderBy orderBy;
    SQLLimit limit;
}

class SQLUnionQuery implements SQLSelectQuery {
    SQLSelectQuery left;
    SQLSelectQuery right;
    SQLUnionOperator operator; // UNION/UNION_ALL/MINUS/INTERSECT
}
Copy the code

2.5, SQLCreateTableStatement

Table building clauses contain a series of methods for facilitating various operations

public class SQLCreateTableStatement extends SQLStatementImpl implements SQLDDLStatement.SQLCreateStatement {
    SQLExprTableSource tableSource;
    List<SQLTableElement> tableElementList;
    Select select;

    SQLColumnDefinition in SQLCreateTableStatement
    public SQLColumnDefinition findColumn(String columName) {}

    // Select the index associated with column in SQLCreateTableStatement
    public SQLTableElement findIndex(String columnName) {}

    // Whether the foreign key depends on another table
    public boolean isReferenced(String tableName) {}}Copy the code

3. Generate abstract syntax tree (AST)

The AST is generated from the incoming text, namely SQL, using Mysql’s Parser

  public SQLStatement parserSQL(String originSql) throws SQLSyntaxErrorException {
        SQLStatementParser parser = new MySqlStatementParser(originSql);

        /** * thrown SQL SyntaxError if parser error */
        try {
            List<SQLStatement> list = parser.parseStatementList();
            if (list.size() > 1) {
                throw new SQLSyntaxErrorException("MultiQueries is not supported,use single query instead ");
            }
            return list.get(0);
        } catch (Exception t) {
            LOGGER.info("routeNormalSqlWithAST", t);
            if(t.getMessage() ! =null) {
                throw new SQLSyntaxErrorException(t.getMessage());
            } else {
                throw newSQLSyntaxErrorException(t); }}}Copy the code

Or by database type. Druid generates the AST for different database types

/ * * *@author Qi.qingshan
 */
public class SqlParser {
    public static void main(String[] args) throws SQLSyntaxErrorException {
        String sql = "";
        String dbType = "oracle";
        SQLStatement statement = parser(sql, dbType);

    }
    public static SQLStatement parser(String sql,String dbType) throws SQLSyntaxErrorException {
        List<SQLStatement> list = SQLUtils.parseStatements(sql, dbType);
        if (list.size() > 1) {
            throw new SQLSyntaxErrorException("MultiQueries is not supported,use single query instead ");
        }
        return list.get(0); }}Copy the code

After the AST is generated, we can extract what we need based on the composition of the syntax tree. For example, we want to extract the table name of select name,id from ACCt where ID =10 and replace it with acct_1

/ * * *@author Qi.qingshan
 */
public class SqlParser {
    public static void main(String[] args) throws SQLSyntaxErrorException {
        String sql = "select name ,id from acct where id =10";
        String dbType = "mysql";
        System.out.println("Original SQL is:"+sql);
        SQLSelectStatement statement = (SQLSelectStatement) parser(sql, dbType);
        SQLSelect select = statement.getSelect();
        SQLSelectQueryBlock query = (SQLSelectQueryBlock) select.getQuery();
        SQLExprTableSource tableSource = (SQLExprTableSource) query.getFrom();
        String tableName = tableSource.getExpr().toString();
        System.out.println("Obtain tableName tableName:" + tableName);
        Alter table acct_1
        tableSource.setExpr("acct_1");
        System.out.println(Alter table name alter table name alter table name + statement.toString() +"]");
    }
    public static SQLStatement parser(String sql,String dbType) throws SQLSyntaxErrorException {
        List<SQLStatement> list = SQLUtils.parseStatements(sql, dbType);
        if (list.size() > 1) {
            throw new SQLSyntaxErrorException("MultiQueries is not supported,use single query instead ");
        }
        return list.get(0); }}Copy the code

The function of table division is basically implemented in the database middleware based on this principle. It is best to clone a new object to modify the Statement, and the clone method is provided internally

4, the Visitor

Druid provides multiple default implementations of the Visitor to meet basic requirements, and you can implement your own custom Visitor if the default does not meet your requirements.

Select name, ID, money from user where ID =10 SELECT name, ID, money from user where ID =10 Below is the internal structure

It is also easy to use the default Visitor implementation to extract table names from SQL statements, or to use a custom implementation

/** * @author Qi.qingshan * @date 20190526 */ public class SqlParser { public static void main(String[] args) throws SQLSyntaxErrorException { String sql = "select money from acct where id =10"; String dbType = "mysql"; System.out.println(" original SQL = "+ SQL); SQLSelectStatement statement = (SQLSelectStatement) parser(sql, dbType); MySqlSchemaStatVisitor visitor = new MySqlSchemaStatVisitor(); statement.accept(visitor); System.out.println(visitor.getTables().toString()); } public static SQLStatement parser(String sql,String dbType) throws SQLSyntaxErrorException { List<SQLStatement> list =  SQLUtils.parseStatements(sql, dbType); if (list.size() > 1) { throw new SQLSyntaxErrorException("MultiQueries is not supported,use single query instead "); } return list.get(0); }}Copy the code