Background

In my previous article, “A Simple Process for Migrating Oracle Offline Data to TiDB”, I ran into a bug while importing CSV files into TiDB with Lightning. Here is how it happened.

Table names in the Oracle source database are all uppercase. After creating the tables in TiDB with the method described in that article, the data synchronization process went very smoothly.

The next day I walked a novice friend through the whole procedure. He picked a table and tried it out — and it failed. After all our analysis and retries went nowhere, the case issue finally crossed my mind. I pulled out the CSV file and saw that its file name was all lowercase. I renamed it to match the uppercase table name, tried again, and the import succeeded.


In TiDB, the lower-case-table-names parameter can only be set to 2: table names are stored with the case they were declared in, but compared case-insensitively. It is therefore recommended that table names in TiDB be all lowercase.

This behavior is basically consistent with MySQL; MySQL just supports more scenarios. For details, see dev.mysql.com/doc/refman/…

In other words, TiDB table names are case-insensitive.
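As a rough illustration of those lower-case-table-names = 2 semantics (a standalone sketch, not TiDB's actual implementation), a name keeps its original case in metadata, while comparisons are done on the lowercase form:

```go
package main

import (
	"fmt"
	"strings"
)

// nameEqual sketches lower_case_table_names = 2 semantics:
// names keep their original case when stored, but comparisons
// use the lowercase form, making them case-insensitive.
func nameEqual(a, b string) bool {
	return strings.ToLower(a) == strings.ToLower(b)
}

func main() {
	// "LIGHTNING_BUG" and "lightning_bug" refer to the same table in TiDB.
	fmt.Println(nameEqual("LIGHTNING_BUG", "lightning_bug")) // true
}
```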

Reproducing the bug

The description above is still a bit abstract, so let's reproduce the problem with the following steps.

The TiDB version I prepared for testing is v5.2.2, the same version in which I originally found the bug, and Lightning uses the matching version. The problem can also be reproduced on the latest master branch.

Create a test table with all caps:

use test;

create table LIGHTNING_BUG (f1 varchar(50),f2 varchar(50),f3 varchar(50));


Prepare a CSV file named test.lightning_bug.csv to be imported:

111|aaa|%%%
222|bbb|###


Lightning’s complete configuration file:

[lightning]
level = "info"
file = "tidb-lightning.log"
index-concurrency = 2
table-concurrency = 5
io-concurrency = 5

[tikv-importer]
backend = "local"
sorted-kv-dir = "/tmp/tidb/lightning_dir"

[mydumper]
data-source-dir = "/tmp/tidb/data"
no-schema = true
filter = ['*.*']

[mydumper.csv]
# Field separator. One or more characters are supported; the default is ','.
separator = '|'
# Quoting delimiter. Set to an empty string for no quotes.
delimiter = ''
# End-of-line terminator. One or more characters are supported. When set to empty
# (the default), both "\n" (LF) and "\r\n" (CRLF) mark the end of a line.
terminator = ""
# Whether the CSV file has a header. If header = true, the first line is skipped.
header = false
# Whether the CSV file may contain NULL.
# If not-null = true, no CSV column is parsed as NULL.
not-null = false
# If not-null = false (i.e. CSV can contain NULL), fields with the following value are parsed as NULL.
null = '\N'
# Whether to interpret backslash escapes ('\') inside fields.
backslash-escape = true
# If a line ends with the separator, remove the trailing separator.
trim-last-separator = false

[tidb]
host = "x.x.x.x"
port = 4000
user = "root"
password = ""
status-port = 10080
pd-addr = "x.x.x.x:2379"

[checkpoint]
enable = false

[post-restore]
checksum = false
analyze = false

Run the following command to start the import task:

./tidb-lightning --config tidb-lightning.toml --check-requirements=false


Error message:

[Screenshot: Lightning error output — WeChat Work screenshot_20220124163536.png (asktug.com/uploads/def…)]

The log is full of INFO entries. Apart from the fact that "tidb lightning exit" is not printed as it normally would be, there is no error at all — a picture of perfect calm:

[Screenshot: Lightning log output — WeChat Work screenshot_20220124163630.png (asktug.com/uploads/def…)]

In my opinion, the main problem here is that the panic is very unfriendly and the message is not informative enough. Although the panic says it is a null pointer exception, that has no diagnostic value. At the time I was misled by the "segmentation violation" for quite a while, always suspecting something was wrong with the data format.

I figured this bug should not be hard to locate, so I pulled a copy of the TiDB source code and started tracking down the problem.

Lightning's processing flow

The entry file for Lightning is br/cmd/tidb-lightning/main.go, while its core implementation is in the br/pkg/lightning directory.

I retraced the whole Lightning import process according to the stack trace in the error, and first located line 1311 of the restore.go file, where I saw the following code:



Based on intuition, I guessed that tableInfo holds a nil value, so fetching tableInfo.Name triggers the null pointer exception. If that were the case it would mean the table does not exist — yet I did not remember seeing "table does not exist" anywhere in the error message:



So somewhere before this point, the uppercase and lowercase table names must have been matched against each other. Let's keep going up the call stack.

rc.dbMetas and rc.dbInfos are the two map objects being compared. The error is caused by a table that exists in dbMetas not being found in dbInfos.

Looking at the call relationships of restoreTables — the method containing this line of code — we find Lightning's main import flow:

func (rc *Controller) Run(ctx context.Context) error {
	opts := []func(context.Context) error{
		rc.setGlobalVariables,
		rc.restoreSchema,
		rc.preCheckRequirements,
		rc.restoreTables,
		rc.fullCompact,
		rc.switchToNormalMode,
		rc.cleanCheckpoints,
	}
	....
	for i, process := range opts {
		err = process(ctx)
		....
	}
	....
}

The key steps here are restoreSchema and restoreTables, which we'll come back to in a minute; for now, let's keep going up.

The next level up is the run method in the lightning.go file, and here we find where dbMetas comes from:

func (l *Lightning) run(taskCtx context.Context, taskCfg *config.Config, g glue.Glue) (err error) {
	...
	dbMetas := mdl.GetDatabases()
	web.BroadcastInitProgress(dbMetas)
	var procedure *restore.Controller
	procedure, err = restore.NewRestoreController(ctx, dbMetas, taskCfg, s, g)
	if err != nil {
		log.L().Error("restore failed", log.ShortError(err))
		return errors.Trace(err)
	}
	defer procedure.Close()
	err = procedure.Run(ctx)
	return errors.Trace(err)
}

Tracing further, dbMetas gets the database and table names by parsing the names of the files to be imported. In other words, it holds the schema information to be imported — which is why CSV files are named {dbName}.{tableName}.csv.

Tips: this naming format can actually be customized via [mydumper.files]; the above is just the default.
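A minimal sketch of that default parsing (the real logic lives in Lightning's mydump package and handles many more cases; parseCSVName here is a hypothetical helper for illustration only):

```go
package main

import (
	"fmt"
	"strings"
)

// parseCSVName mimics how Lightning derives the target database and
// table from a data file named {dbName}.{tableName}.csv (default format).
// This is a hypothetical helper, not Lightning's actual code.
func parseCSVName(fileName string) (db, table string, ok bool) {
	name := strings.TrimSuffix(fileName, ".csv")
	if name == fileName {
		return "", "", false // not a .csv file
	}
	parts := strings.SplitN(name, ".", 2)
	if len(parts) != 2 {
		return "", "", false // no {dbName}.{tableName} structure
	}
	return parts[0], parts[1], true
}

func main() {
	db, table, _ := parseCSVName("test.lightning_bug.csv")
	fmt.Println(db, table) // test lightning_bug
}
```

Note that the table name recovered here is exactly the case used in the file name — this detail matters later.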

Above that is the RunOnce method, which is called from the main function and receives an empty context object together with the configuration information:

/// br > pkg > lightning > lightning.go
func (l *Lightning) RunOnce(taskCtx context.Context, taskCfg *config.Config, glue glue.Glue) error {
	if err := taskCfg.Adjust(taskCtx); err != nil {
		return err
	}
	taskCfg.TaskID = time.Now().UnixNano()
	...
	return l.run(taskCtx, taskCfg, glue)
}

/// br > cmd > tidb-lightning > main.go
func main() {
	globalCfg := config.Must(config.LoadGlobalConfig(os.Args[1:], nil))
	....
	err = func() error {
		if globalCfg.App.ServerMode {
			return app.RunServer()
		}
		cfg := config.NewConfig()
		if err := cfg.LoadFromGlobal(globalCfg); err != nil {
			return err
		}
		return app.RunOnce(context.Background(), cfg, nil)
	}()
	....
}

The overall flow is fairly clear, and the core processing logic lives in the restore Controller.

With the analysis so far, it seems all I need to do is add a nil check where the error occurs — but then what? That feels like treating the symptom rather than the cause, so further analysis is needed.

Thinking about the bug

One more phenomenon before diving in: when I removed the --check-requirements=false parameter from the original import command, the following message was displayed:



It seems that Lightning itself can recognize the case difference (at this point I assumed the fix would simply be this "table does not exist" hint). Combined with the "table schema not found" error mentioned above, things felt a little odd.

Lightning checks the upstream and downstream schemas very carefully. This logic is encapsulated in the SchemaIsValid method, which runs only when --check-requirements=true. The checks cover database and table names, column counts, data files, CSV headers, and so on.

Table schema not found

As mentioned earlier, dbMetas is obtained by parsing the file name, so let’s look at how dbInfos is obtained. Going back to the restoreSchema method mentioned earlier, I see the following code:

getTableFunc := rc.backend.FetchRemoteTableModels
....
err := worker.makeJobs(rc.dbMetas, getTableFunc)
....
dbInfos, err := LoadSchemaInfo(ctx, rc.dbMetas, getTableFunc)
if err != nil {
	return errors.Trace(err)
}
rc.dbInfos = dbInfos
....

FetchRemoteTableModels reads the table list of the target database remotely through the backend. In local mode, it obtains it by calling the status port of tidb-server:

curl http://{tidb-server}:10080/schema/test


The makeJobs method is the core of schema creation, which mainly covers three parts: restoring databases, restoring table structures, and restoring views. Part of its code is shown below:

// 2. restore tables, execute statements concurrency
for _, dbMeta := range dbMetas {
	// we can ignore error here, and let check failed later if schema not match
	tables, _ := getTables(worker.ctx, dbMeta.Name)
	tableMap := make(map[string]struct{})
	for _, t := range tables {
		tableMap[t.Name.L] = struct{}{}
	}
	for _, tblMeta := range dbMeta.Tables {
		if _, ok := tableMap[strings.ToLower(tblMeta.Name)]; ok {
			// we already has this table in TiDB.
			// we should skip ddl job and let SchemaValid check.
			continue
		} else if tblMeta.SchemaFile.FileMeta.Path == "" {
			return errors.Errorf("table `%s`.`%s` schema not found", dbMeta.Name, tblMeta.Name)
		}
		...
	}
	...
}

Note that the matching here is done entirely in lowercase, which is not consistent with how SchemaIsValid behaves.

The LoadSchemaInfo method is the source of dbInfos. This object holds the actual schema information of the target database; the following code is the key part:

func LoadSchemaInfo(
	ctx context.Context,
	schemas []*mydump.MDDatabaseMeta,
	getTables func(context.Context, string) ([]*model.TableInfo, error),
) (map[string]*checkpoints.TidbDBInfo, error) {
	result := make(map[string]*checkpoints.TidbDBInfo, len(schemas))
	for _, schema := range schemas {
		tables, err := getTables(ctx, schema.Name)
		if err != nil {
			return nil, err
		}
		tableMap := make(map[string]*model.TableInfo, len(tables))
		for _, tbl := range tables {
			tableMap[tbl.Name.L] = tbl
		}
		dbInfo := &checkpoints.TidbDBInfo{
			Name:   schema.Name,
			Tables: make(map[string]*checkpoints.TidbTableInfo),
		}
		for _, tbl := range schema.Tables {
			tblInfo, ok := tableMap[strings.ToLower(tbl.Name)]
			if !ok {
				return nil, errors.Errorf("table '%s' schema not found", tbl.Name)
			}
			tableName := tblInfo.Name.String()
			if tblInfo.State != model.StatePublic {
				err := errors.Errorf("table [%s.%s] state is not public", schema.Name, tableName)
				metric.RecordTableCount(metric.TableStatePending, err)
				return nil, err
			}
			metric.RecordTableCount(metric.TableStatePending, err)
			if err != nil {
				return nil, errors.Trace(err)
			}
			tableInfo := &checkpoints.TidbTableInfo{
				ID:   tblInfo.ID,
				DB:   schema.Name,
				Name: tableName,
				Core: tblInfo,
			}
			dbInfo.Tables[tableName] = tableInfo
		}
		result[schema.Name] = dbInfo
	}
	return result, nil
}

The first half of the matching is always done in lowercase, but when it comes to tableName this seems to have been forgotten: tableName takes the original-cased name returned by TiDB, and that is the key stored in dbInfo.Tables.
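Stripped to its essentials, the inconsistency looks like this (a standalone sketch, not Lightning's actual code): the lookup key is lowercased, but the key stored in the result map keeps the original case, so a later lookup by the file-derived lowercase name misses and yields nil:

```go
package main

import (
	"fmt"
	"strings"
)

// tableInfo is a toy stand-in for TiDB's table metadata.
type tableInfo struct {
	O string // original name, e.g. "LIGHTNING_BUG"
	L string // lowercase name, e.g. "lightning_bug"
}

func main() {
	// Tables reported by TiDB, keyed by lowercase name (like tableMap above).
	tableMap := map[string]*tableInfo{
		"lightning_bug": {O: "LIGHTNING_BUG", L: "lightning_bug"},
	}

	// Table name derived from the CSV file test.lightning_bug.csv.
	fromFile := "lightning_bug"

	// First half: the lowercase lookup succeeds.
	tbl := tableMap[strings.ToLower(fromFile)]

	// Second half: the result map is keyed by the ORIGINAL-cased name...
	dbTables := map[string]*tableInfo{tbl.O: tbl}

	// ...so the later lookup by the file-derived name misses and returns nil.
	got := dbTables[fromFile]
	fmt.Println(got == nil) // true: dereferencing got here would panic
}
```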

Finally, let's see what tblInfo.Name.String() returns:

// CIStr is case insensitive string.
type CIStr struct {
	O string `json:"O"` // Original string.
	L string `json:"L"` // Lower case string.
}

// String implements fmt.Stringer interface.
func (cis CIStr) String() string {
	return cis.O
}


In this way, SchemaIsValid is in fact affected by LoadSchemaInfo, creating the illusion that Lightning is case-sensitive.

My fix idea

The analysis above already hinted at my repair ideas, which can be summarized as the following two approaches:

First, add a nil check where the error occurs and report clearly that the table schema does not exist. But whether the task should skip the table and continue, or abort entirely, after hitting this prompt needs careful thought — as does whether similar problems elsewhere should be patched the same way.

Second, use lowercase for the matching logic throughout, solving the problem at the root. I see two advantages in this: it avoids new bugs caused by case differences, and it is consistent with TiDB table names themselves being case-insensitive.

Next, I will submit a PR using the second approach to try to fix the problem.

Incidentally, this bug has a mirror case: when the table name in the database is lowercase and the file name is uppercase, my tests show the same problem occurs.

Conclusion

When naming schema objects in TiDB, make a habit of using lowercase to avoid unnecessary trouble.

When using Lightning, do not turn off check-requirements lightly; it helps you spot many risks ahead of time.

From my experience with several TiDB ecosystem tools, many of their error messages are not very friendly and send users on long detours. I hope the team puts some effort into improving this.

One more thing: don't panic when the program panics (the folks at the customer site actually did, a bit). Gnawing on the source code turns out to be quite interesting.