Learning to Reason Over Tables from Less Data

The task of recognizing textual entailment, also known as natural language inference, consists of determining whether one piece of text (a premise) can be implied or contradicted (or neither) by another piece of text (a hypothesis). While this problem is widely considered an important test of the reasoning skills of machine learning (ML) systems and has been studied in depth for plain-text input, much less effort has gone into applying such models to structured data such as websites, tables, and databases. Yet recognizing textual entailment is especially important whenever the contents of a table need to be accurately summarized and presented to a user, and it is essential for high-fidelity question answering systems and virtual assistants.

In our paper published in Findings of EMNLP 2020, “Understanding Tables with Intermediate Pre-training”, we introduce the first pre-training tasks customized for table parsing, enabling models to learn better and faster from less data. We build on our earlier TAPAS model, an extension of BERT’s bidirectional Transformer with special embeddings for finding answers in tables. Applying our new pre-training objectives to TAPAS yields a new state of the art across multiple datasets involving tables. On TabFact, for example, it reduces the gap between model and human performance by about 50%. We also systematically benchmark methods for selecting relevant inputs to improve efficiency, achieving 4x gains in speed and memory while retaining 92% of the results. All models for the different tasks and sizes are released on our GitHub repository, where you can try them out yourself in a Colab notebook.

Textual Entailment

The task of textual entailment is more challenging when applied to tabular data than to plain text. Consider, for example, a table from Wikipedia together with sentences derived from its content. Evaluating whether the table entails or contradicts a sentence may require looking at multiple columns and rows, and possibly performing simple numerical operations such as averaging, summing, or differencing.
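To make the kind of reasoning involved concrete, here is a small illustration with a hypothetical mini-table (the values are placeholders, not the actual Wikipedia data) of statements whose verification requires comparing rows and aggregating a column:

```python
import pandas as pd

# Hypothetical mini-table in the spirit of the example above.
table = pd.DataFrame({
    "player":   ["Greg Norman", "Billy Mayfair", "Corey Pavin"],
    "rank":     [1, 1, 3],
    "earnings": [1654959, 1543192, 1340079],
})
by_player = table.set_index("player")

# "Greg Norman and Billy Mayfair tie in rank."
# Checking this means comparing values across two rows, not spotting keywords.
tie = by_player.loc["Greg Norman", "rank"] == by_player.loc["Billy Mayfair", "rank"]

# "The three players earned more than 4 million in total."
# Checking this requires summing an entire column.
over_four_million = table["earnings"].sum() > 4_000_000

print(tie, over_four_million)
```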

Following the approach used in TAPAS, we encode the statement and the contents of the table together, pass them through a Transformer model, and obtain a single number: the probability that the statement is entailed or refuted by the table.
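The following is a minimal sketch of this encode-and-classify step using the Hugging Face port of TAPAS rather than the official GitHub/Colab code; the checkpoint name, the label order, and the exact API are assumptions of this sketch, not the paper’s reference implementation.

```python
import pandas as pd
import torch
from transformers import TapasForSequenceClassification, TapasTokenizer

# Assumed checkpoint id for a TabFact-fine-tuned TAPAS model on the Hugging Face hub.
model_name = "google/tapas-base-finetuned-tabfact"
tokenizer = TapasTokenizer.from_pretrained(model_name)
model = TapasForSequenceClassification.from_pretrained(model_name)

# TAPAS expects every table cell as a string.
table = pd.DataFrame({
    "player": ["Greg Norman", "Billy Mayfair"],
    "rank": ["1", "2"],
})
statement = "Greg Norman and Billy Mayfair tie in rank."

# The statement and the flattened table are encoded together as one sequence.
inputs = tokenizer(table=table, queries=statement, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# A single probability that the statement is entailed by the table
# (index 1 is assumed to be the "entailed" class in this checkpoint).
prob_entailed = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(entailed) = {prob_entailed:.3f}")
```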

Since the only information in the training examples is a binary value (i.e., “correct” or “incorrect”), training a model to understand whether a statement is entailed is challenging, and it highlights the difficulty of achieving generalization in deep learning, especially when the training signal is scarce. Seeing isolated entailed or refuted examples, a model can easily pick up on spurious patterns in the data to make its predictions, for example the presence of the word “tie” in “Greg Norman and Billy Mayfair tie in rank”, instead of truly comparing their ranks, which is what is needed to apply the model successfully beyond the original training data.

Pre-training tasks can be used to “warm up” the model by providing it with large amounts of readily available unlabeled data. However, pre-training typically consists mainly of plain text rather than tabular data. In fact, TAPAS was originally pre-trained with a simple masked language modeling objective that was not designed for tabular data applications. To improve the model’s performance on tabular data, we introduce two novel binary-classification pre-training tasks, counterfactual and synthetic statements, which can be applied as a second stage of pre-training (often called intermediate pre-training).

In the counterfactual task, we take sentences from Wikipedia that mention an entity (a person, place, or thing) that also appears in a given table. Then, 50% of the time, we modify the statement by swapping the entity for an alternative. To keep the statement realistic, we choose the replacement from the same column of the table. The model is trained to recognize whether the statement has been modified. This pre-training task includes millions of such examples, and while the reasoning they require is not complex, they typically still sound natural.

For the synthetic statements, we follow a method similar to semantic parsing, in which we generate statements using a simple set of grammar rules. These rules require the model to understand basic mathematical operations, such as sums and averages (for example, “the total earnings”), or to understand how to filter the elements of a table using some condition (for example, “the country is Australia”). Although these statements are artificial, they help improve the model’s numerical and logical reasoning.
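The snippet below is a rough sketch, not the authors’ exact data-generation pipeline, of how counterfactual and synthetic statements could be produced; the table, the tiny grammar, the thresholds, and the label convention are hypothetical placeholders.

```python
import random
import pandas as pd

# Hypothetical table used as the source of entities and numbers.
table = pd.DataFrame({
    "player":   ["Greg Norman", "Billy Mayfair", "Corey Pavin"],
    "country":  ["Australia", "United States", "United States"],
    "earnings": [1654959, 1543192, 1340079],
})

def counterfactual(statement: str, entity: str, column: str, rng=random):
    """With 50% probability, swap the mentioned entity for another entity
    drawn from the same table column, and label whether it was modified."""
    if rng.random() < 0.5:
        return statement, 1  # unmodified -> positive label (illustrative convention)
    alternatives = [v for v in table[column] if v != entity]
    return statement.replace(entity, rng.choice(alternatives)), 0  # modified -> negative

def synthetic(rng=random):
    """Generate a statement from a tiny grammar exercising aggregation and
    filtering; the truth value is computed directly from the table."""
    country = rng.choice(list(table["country"].unique()))
    threshold = rng.choice([1_000_000, 2_000_000, 5_000_000])
    total = table.loc[table["country"] == country, "earnings"].sum()
    statement = f"the total earnings where the country is {country} is more than {threshold}"
    return statement, int(total > threshold)

print(counterfactual("Greg Norman is from Australia.", "Greg Norman", "player"))
print(synthetic())
```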

We compare the baseline TAPAS model against two prior models that have shown success in the textual entailment domain, LogicalFactChecker (LFC) and Structure Aware Transformer (SAT). The baseline TAPAS model improves on LFC and SAT, but the pre-trained model (TAPAS+CS) performs significantly better, reaching a new state of the art. We also applied TAPAS+CS to a question answering task on the SQA dataset, which requires the model to find answers in the table contents in a dialog setting. Including the CS objectives improves the previous best performance by more than 4 points, showing that this approach generalizes beyond textual entailment.

Another benefit of the counterfactual and synthetic pre-training tasks concerns data and compute efficiency: since the models are already pre-trained for binary classification, they can be applied to TabFact without any fine-tuning. We explored what happens to each model when it is trained on only a subset (or none) of the data. Without looking at a single example, the TAPAS+CS model is competitive with the strong Table-BERT baseline, and when only 10% of the data is included, the results are comparable to the previous state of the art.

A common concern when trying to handle tables with large models like these is that their high computational demands make it difficult to parse very large tables. To address this, we investigated whether we could heuristically select a subset of the input to pass through the model in order to optimize its computational efficiency. We conducted a systematic study of different approaches to filtering the input and found that a simple method, selecting full columns by their word overlap with the statement, gives the best results. By dynamically selecting which input tokens to include, we can use fewer resources or work on larger inputs at the same cost. The challenge is doing so without losing important information and hurting accuracy. For example, the models discussed above all use sequences of 512 tokens, which is near the typical limit for Transformer models (although recent efficiency methods such as the Reformer or Performer are proving effective in scaling the input size). The column selection method we present here allows for faster training while still achieving high accuracy on TabFact. With 256 input tokens, the accuracy drop is very small, but we can now pre-train the model, fine-tune it, and make predictions about twice as fast. With 128 tokens, the model still outperforms the previous state of the art, with an even more significant speed-up: four times faster across the board.
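As a rough illustration of that heuristic (a minimal sketch assuming whitespace tokenization and a simple per-word cost, not the paper’s exact tokenizer or budget accounting), full columns can be ranked by word overlap with the statement and kept until the token budget is exhausted:

```python
import pandas as pd

def select_columns(table: pd.DataFrame, statement: str, budget: int = 256) -> pd.DataFrame:
    """Keep whole columns, ranked by word overlap with the statement,
    until a rough token budget is filled."""
    statement_words = set(statement.lower().split())

    def words_in(column: str) -> list:
        # Header plus every cell of the column, split into whitespace words.
        return " ".join(str(cell) for cell in [column, *table[column]]).lower().split()

    def overlap(column: str) -> int:
        return len(set(words_in(column)) & statement_words)

    ranked = sorted(table.columns, key=overlap, reverse=True)

    kept, used = [], 0
    for column in ranked:
        cost = len(words_in(column))  # crude proxy for the column's token count
        if used + cost <= budget:
            kept.append(column)
            used += cost
    return table[kept]

# Example: shrink the table before encoding it with the model.
# pruned = select_columns(table, statement, budget=128)
```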

Using our proposed column selection method and novel pre-training tasks, we can create table parsing models that require less data and less computational power to achieve better results.

We have released the new models and pre-training techniques in our GitHub repository, where you can try them out for yourself in Colab. To make this approach more accessible, we also share models of various sizes, all the way down to “tiny”. We hope these results will help spur the development of tabular reasoning in the wider research community.
