We recently made some improvements to the TiDB code that greatly simplify the process of adding builtin functions. This tutorial shows you how to add a builtin function to TiDB. It starts with some necessary background, then describes the overall process of adding a builtin function, and ends with a worked example.

Background knowledge

After a SQL statement is sent to TiDB, the parser first turns the text into an AST (abstract syntax tree). The Query Optimizer then generates an executable plan, and executing that plan produces the result. Along the way, TiDB has to decide how to fetch data from tables, how to filter, compute, sort, aggregate, and deduplicate it, and how to evaluate expressions.

For a builtin function, the two important parts are syntax parsing and evaluation. Syntax parsing requires knowing how to write yacc and how to modify TiDB's lexer, which is quite tedious. We have done this work in advance, so the syntax parsing for most builtin functions is already complete. Evaluation is done within TiDB's expression evaluation framework. Each builtin function is treated as an expression and is represented by a ScalarFunction. From the function name and arguments, TiDB obtains the corresponding function type and function signature, and then evaluates the function through that signature.

Overall, this process is fairly complex for anyone not familiar with TiDB, so we have handled the procedural and more tedious parts in a unified way. For most unimplemented builtin functions, the syntax parsing and the function signature lookup are already done, but the function implementation is left blank. In other words, you only need to find a blank function implementation and fill it in, and that can be submitted as a PR.
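
To make the lookup path concrete, here is a minimal sketch of resolving a function name to a function class, then to a signature, and finally evaluating it. It is only an illustration: every type and name in it (signature, functionClass, fixedArgClass, call, and the string-based arguments) is a hypothetical simplification and not TiDB's actual API. The sha2 entry plays the role of a function whose parsing and lookup already work but whose implementation is still blank.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// NOTE: all types below are hypothetical simplifications, not TiDB's real types.

// signature is a resolved builtin function that can be evaluated.
type signature interface {
	eval(args []string) (string, error)
}

// functionClass validates the arguments and returns a matching signature.
type functionClass interface {
	getFunction(argCount int) (signature, error)
}

var errFunctionNotExists = errors.New("function does not exist")

// upperSig is an implemented signature.
type upperSig struct{}

func (upperSig) eval(args []string) (string, error) {
	return strings.ToUpper(args[0]), nil
}

// sha2Sig is a placeholder: lookup succeeds, but eval is not implemented yet.
type sha2Sig struct{}

func (sha2Sig) eval(args []string) (string, error) {
	return "", fmt.Errorf("SHA2: %w", errFunctionNotExists)
}

// fixedArgClass returns sig when the argument count matches.
type fixedArgClass struct {
	argCount int
	sig      signature
}

func (c fixedArgClass) getFunction(argCount int) (signature, error) {
	if argCount != c.argCount {
		return nil, fmt.Errorf("expected %d arguments, got %d", c.argCount, argCount)
	}
	return c.sig, nil
}

// funcs maps a lowercase function name to its function class.
var funcs = map[string]functionClass{
	"upper": fixedArgClass{argCount: 1, sig: upperSig{}},
	"sha2":  fixedArgClass{argCount: 2, sig: sha2Sig{}},
}

// call looks up the class by name, resolves the signature, and evaluates it.
func call(name string, args ...string) (string, error) {
	fc, ok := funcs[strings.ToLower(name)]
	if !ok {
		return "", fmt.Errorf("%s: %w", name, errFunctionNotExists)
	}
	sig, err := fc.getFunction(len(args))
	if err != nil {
		return "", err
	}
	return sig.eval(args)
}

func main() {
	fmt.Println(call("UPPER", "tidb"))       // implemented: value and nil error
	fmt.Println(call("SHA2", "tidb", "256")) // placeholder: empty value and an error
}
```

In this sketch, contributing a function amounts to replacing the body of the placeholder eval method, which is the same shape of change the tutorial describes.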

Overall flow for adding a builtin function

  • Search for errFunctionNotExists in the expression directory of the TiDB source to find all unimplemented functions, and pick one that interests you, such as SHA2:

    func (b *builtinSHA2Sig) eval(row []types.Datum) (d types.Datum, err error) {
        return d, errFunctionNotExists.GenByArgs("SHA2")
    }
  • Next, implement the eval method. See the MySQL documentation for the expected behavior of the function.

  • Add the return type of the function to handleFuncCallExpr() in plan/typeinferer.go. Keep it consistent with MySQL. See MySQL Const for the full list of type definitions.

    * Note that for most functions you need to determine the length of the return value in addition to filling in the return value type.
  • Write unit tests. In the expression directory, add unit tests for the function implementation, and also add unit tests for the type inference to plan/typeinferer_test.go.

  • Run make dev to make sure all test cases pass.
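
For the SHA2 function picked in the first step, the core of the eval method could look roughly like the standalone sketch below. It follows the behavior described in the MySQL documentation: the second argument must be 224, 256, 384, 512, or 0 (equivalent to 256), and any other value yields NULL. The helper name sha2Hex and the boolean used to model NULL are assumptions made for this illustration; a real implementation would work on types.Datum inside TiDB.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"fmt"
)

// sha2Hex is a hypothetical helper, not TiDB code.
// It returns the hex digest of data for the requested bit length.
// ok == false models a SQL NULL result for an unsupported length.
func sha2Hex(data []byte, hashLength int) (digest string, ok bool) {
	switch hashLength {
	case 224:
		return fmt.Sprintf("%x", sha256.Sum224(data)), true
	case 0, 256: // 0 is treated the same as 256
		return fmt.Sprintf("%x", sha256.Sum256(data)), true
	case 384:
		return fmt.Sprintf("%x", sha512.Sum384(data)), true
	case 512:
		return fmt.Sprintf("%x", sha512.Sum512(data)), true
	default:
		return "", false
	}
}

func main() {
	for _, bits := range []int{0, 224, 256, 384, 512, 100} {
		digest, ok := sha2Hex([]byte("abc"), bits)
		fmt.Printf("bits=%d ok=%v hexLen=%d\n", bits, ok, len(digest))
	}
}
```

The length of each hex string is also what the type inference step would need as the return value length for that variant.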

Example

We use the PR that added the SHA1() function as an example. First, see expression/builtin_encryption.go:

func (b *builtinSHA1Sig) eval(row []types.Datum) (d types.Datum, err error) {
    // First, evaluate the arguments
    args, err := b.evalArgs(row)
    if err != nil {
        return types.Datum{}, errors.Trace(err)
    }
    // Refer to the MySQL documentation for the meaning of each parameter
    // SHA/SHA1 accepts only one parameter
    arg := args[0]
    if arg.IsNull() {
        // NULL input gives NULL output
        return d, nil
    }
    // Convert the argument to bytes; see util/types/datum.go for the implementation
    bin, err := arg.ToBytes()
    if err != nil {
        return d, errors.Trace(err)
    }
    hasher := sha1.New()
    hasher.Write(bin)
    data := fmt.Sprintf("%x", hasher.Sum(nil))
    // Set the return value
    d.SetString(data)
    return d, nil
}
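
The hashing logic can also be tried outside TiDB. The following is a minimal standalone sketch of the same steps (NULL check, conversion to bytes, hashing, hex encoding) using only the Go standard library. The value type and the sha1Value function are hypothetical stand-ins for types.Datum and the eval method, introduced only for this illustration.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// value is a hypothetical stand-in for a SQL value; isNull marks SQL NULL.
type value struct {
	s      string
	isNull bool
}

// sha1Value mirrors the steps of the eval method:
// NULL in gives NULL out, otherwise hash the bytes and hex-encode the digest.
func sha1Value(arg value) value {
	if arg.isNull {
		return value{isNull: true}
	}
	sum := sha1.Sum([]byte(arg.s)) // 20-byte digest
	return value{s: fmt.Sprintf("%x", sum)}
}

func main() {
	inputs := []value{{s: "test"}, {s: "foobar"}, {isNull: true}}
	for _, in := range inputs {
		out := sha1Value(in)
		fmt.Printf("in=%q null=%v -> out=%q null=%v\n", in.s, in.isNull, out.s, out.isNull)
	}
}
```

A quick check like this is a convenient way to produce expected values for the unit test cases before wiring the logic into TiDB.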

Next, add a unit test for the function implementation; see expression/builtin_encryption_test.go:

var shaCases = []struct {
    origin interface{}
    crypt  string
}{
    {"test", "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3"},
    {"c4pt0r", "034923dcabf099fc4c8917c0ab91ffcd4c2578a6"},
    {"pingcap", "73bf9ef43a44f42e2ea2894d62f0917af149a006"},
    {"foobar", "8843d7f92416211de9ebb963ff4ce28125932878"},
    {1024, "128351137a9c47206c4507dcf2e6fbeeca3a9079"},
    {123.45, "22f8b438ad7e89300b51d88684f3f0b9fa1d7a32"},
}

func (s *testEvaluatorSuite) TestShaEncrypt(c *C) {
    defer testleak.AfterTest(c)() // goroutine leak detection; this line can be copied as-is
    fc := funcs[ast.SHA]
    for _, test := range shaCases {
        in := types.NewDatum(test.origin)
        f, _ := fc.getFunction(datumsToConstants([]types.Datum{in}), s.ctx)
        crypt, err := f.eval(nil)
        c.Assert(err, IsNil)
        res, err := crypt.ToString()
        c.Assert(err, IsNil)
        c.Assert(res, Equals, test.crypt)
    }
    // test NULL input for sha
    var argNull types.Datum
    f, _ := fc.getFunction(datumsToConstants([]types.Datum{argNull}), s.ctx)
    crypt, err := f.eval(nil)
    c.Assert(err, IsNil)
    c.Assert(crypt.IsNull(), IsTrue)
}

* Note: besides the normal cases, it is best to add some exceptional cases as well, such as a nil input value or arguments of several different types.

Finally, add the type inference information and its test cases; see plan/typeinferer.go and plan/typeinferer_test.go:

case ast.SHA, ast.SHA1:
        tp = types.NewFieldType(mysql.TypeVarString)
        chs = v.defaultCharset
        tp.Flen = 40

        {`sha1(123)`, mysql.TypeVarString, "utf8"},
        {`sha(123)`, mysql.TypeVarString, "utf8"},
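
The length 40 comes from the digest size: SHA1 produces a 20-byte digest, and each byte is rendered as two hexadecimal characters. As a small illustration of how the length could be derived for other hash functions, the sketch below computes it from the digest sizes exported by the Go standard library. The helper name hexFlen is hypothetical and is not part of TiDB.

```go
package main

import (
	"crypto/sha1"
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
)

// hexFlen is a hypothetical helper: it returns the number of characters
// needed to display a digest of digestBytes bytes as a hex string.
func hexFlen(digestBytes int) int {
	return hex.EncodedLen(digestBytes) // two hex characters per byte
}

func main() {
	fmt.Println("SHA1:", hexFlen(sha1.Size))
	fmt.Println("SHA2-224:", hexFlen(sha256.Size224))
	fmt.Println("SHA2-256:", hexFlen(sha256.Size))
	fmt.Println("SHA2-384:", hexFlen(sha512.Size384))
	fmt.Println("SHA2-512:", hexFlen(sha512.Size))
}
```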

Editor's note: Add the TiDB Robot on WeChat to join the TiDB Contributor Club. There is no barrier to taking part in open source projects, and you can start changing the world from here.
