Natural language generation (NLG) is an important frontier technology in artificial intelligence. Any deployment of this technology faces a difficult problem: how can we ensure that the text generated by a model is factually consistent with its input, i.e., avoid generating false or fabricated information? To promote research on this problem, the Natural Language Generation Committee of the Chinese Information Processing Society of China and the Thousand Words open-source dataset project jointly organized the “Fact-congruent-oriented Generation Evaluation Competition”. Registration opens on August 3, and the evaluation workshop and award ceremony will be held on November 7 at the first Chinese Conference on Natural Language Generation (CCNLG-2021).

Thousand Words open-source dataset project: www.luge.ai

This competition centers on factual consistency and provides three typical generation tasks with high factual-consistency requirements: copywriting generation [1], summarization [2], and question generation [3]. Competing systems will be evaluated comprehensively on two metrics: text fluency and factual consistency.

In addition to the challenging tasks, Baidu will sponsor the event and provide the winning teams with generous prizes: 20,000 RMB for first place, 10,000 RMB for second place, and 5,000 RMB for third place.

Registration and competition details link:

aistudio.baidu.com/aistudio/co…

1. Competition background and a brief introduction to factual consistency

With the rapid development of deep neural text generation models and pre-trained language models, the readability and fluency of natural language generation have steadily improved. However, automatically generated text often contains incorrect facts that contradict the input. This is known as the “factual consistency problem in natural language generation”. A concrete example, taken from the automatic summarization task, follows:

**Input:** A research institution previously reported the end of gold’s long bull market in 2013, cutting its 3-, 6- and 12-month price forecasts to $1,825, $1,805 and $1,800, respectively. In a recent research report, commodities analyst Damien Courvalin further predicted that the international gold price could fall to $1,200 an ounce by 2018, on the grounds that, in addition to real interest rates, the link between gold and currencies also affects the gold price; the three-month forecast, however, remains bullish.

**Reference summary:** The gold price may fall to $1,200 an ounce in 2018

This is a sample from the LCSTS dataset [2]; the algorithm must produce a condensed summary of the input text.

Next, let’s look at two results obtained by natural language generation algorithms:

**Result A (factually consistent):** A research institution expects the international gold price to rise first and then fall

**Result B (factually inconsistent):** Analysts: gold could fall to $1,800 in 2018

As you can see, result A is correct. Result B seems fluent at first glance, but it deviates factually from the original (“$1,800” vs. “$1,200”).

At present, BLEU and ROUGE are the metrics most commonly used to evaluate natural language generation; both score the generated text by its literal overlap with a reference answer. However, if we compute the literal overlap of results A and B with the reference summary, the factually wrong result B can actually receive the higher score, because it shares more surface tokens with the reference. This competition was designed and launched to address exactly this problem.
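The failure mode can be reproduced with a few lines of code. The sketch below implements a simplified ROUGE-1-style unigram recall (real ROUGE implementations use stemming and more careful tokenization; plain whitespace splitting is an illustrative assumption) over English translations of the example above:

```python
# Simplified ROUGE-1-style unigram recall, illustrating why surface overlap
# can reward the factually wrong summary. Tokenization is a plain lowercase
# split -- an illustrative assumption, not how ROUGE toolkits tokenize.

def rouge1_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams that also appear in the candidate."""
    cand_tokens = set(candidate.lower().split())
    ref_tokens = reference.lower().split()
    return sum(1 for tok in ref_tokens if tok in cand_tokens) / len(ref_tokens)

# English translations of the LCSTS example above.
reference = "gold price may fall to $1,200 an ounce in 2018"
result_a = "research institution expects international gold price to rise first and then fall"
result_b = "analysts : gold could fall to $1,800 in 2018"

score_a = rouge1_recall(result_a, reference)  # factually consistent summary
score_b = rouge1_recall(result_b, reference)  # factually wrong ($1,800)
print(score_a, score_b)  # B overlaps more with the reference than A does
assert score_b > score_a
```

Result B copies more surface tokens from the source (“fall to … in 2018”) while getting the key number wrong, so a literal-overlap metric ranks it above the factually correct result A.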

2. Schedule

To ensure fairness, the official competition will be held in three phases:

  • **Phase 1:** Test set 1 is released. Teams can tune their models on test set 1; after they submit results to the Thousand Words platform, scores are returned online and leaderboard 1 is updated in real time;
  • **Phase 2 (final test submission):** Test set 2 is released. Teams compute results on test set 2 and submit them to the Thousand Words platform;
  • **Phase 3 (manual evaluation):** Based on the automatic evaluation results of Phase 2, the top 10 teams advance to manual evaluation. To avoid bias in the automatic evaluation metrics, the final ranking will be confirmed and published based on the manual evaluation.

3. Awards for participation

The final winning teams of the Thousand Words “Fact-congruent-oriented Generation Evaluation” competition will receive:

**(1) Generous prize money:** 20,000 RMB for first place, 10,000 RMB for second place, and 5,000 RMB for third place.

**(2) Certificate of honor:** The organizers will issue an authoritative certificate of honor to each winning team.

Participants will also receive the following benefits:

**(1) Learning and exchange opportunities:** in-depth exchanges with fellow participants and the organizers in the event group;

**(2) Cutting-edge learning materials:** advanced materials covering methods and metrics for improving the factual accuracy of natural language generation;

**(3) Gifts & certificate of participation:** every member of a team that officially registers and submits a final result will receive a customized Thousand Words gift and a certificate of participation.

4. Competition organization

Supervisor: Chinese Information Processing Society of China

Sponsor: Natural Language Generation Committee of the Chinese Information Processing Society of China (in preparation)

Organizers: Tsinghua University, Harbin Institute of Technology (Shenzhen), Baidu

Evaluation Committee: Huang Minlie (Tsinghua University), Hu Baotian (Harbin Institute of Technology (Shenzhen)), Xiao Xinyan (Baidu)

Click on the link to learn more about the competition and sign up!

Aistudio.baidu.com/aistudio/co…

References

[1] Zhihong Shao, Minlie Huang, Jiangtao Wen, Wenfei Xu, Xiaoyan Zhu. 2019. Long and Diverse Text Generation with Planning-based Hierarchical Variational Model. In Proceedings of EMNLP 2019.

[2] Baotian Hu, Qingcai Chen, Fangze Zhu. 2015. LCSTS: A Large Scale Chinese Short Text Summarization Dataset. In Proceedings of EMNLP 2015.

[3] Wei He, Kai Liu, Jing Liu, Yajuan Lyu, Shiqi Zhao, Xinyan Xiao, Yuan Liu, Yizhong Wang, Hua Wu, Qiaoqiao She, Xuan Liu, Tian Wu, Haifeng Wang. 2018. DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications. In Proceedings of the ACL 2018 MRQA Workshop.