Question Difficulty Prediction for READING Problems in Standard Tests

Motivation

  1. Predicting question difficulty is important for assembling problem sets and similar tasks
  2. In previous work, difficulty was labeled by experts, which is labor-intensive, subjective, and prone to bias
  3. The massive accumulation of problem texts and student answer logs gives us the opportunity to build an approach without human intervention

Challenges

  1. How to semantically represent the question text
  2. The difficulty (wrong answer rate) of questions from different exams is not directly comparable, so a suitable way to train the model is needed

Framework

The framework diagram is as follows:

Input

  1. The question text, which is composed of sentences, which in turn are composed of words
  2. Students' answer logs

Sentence CNN Layer

This part uses a CNN to encode the sentences in the text; a CNN can capture semantic information from local to global. The paper adopts wide convolution with p-max pooling.

It is not clear why this CNN structure is adopted instead of 2-D convolution as in TextCNN.

The CNN operation: the embeddings of k consecutive words are concatenated and then multiplied by a weight matrix to obtain a new hidden-layer vector.
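
With $x_i$ the embedding of the $i$-th word, $\oplus$ denoting concatenation, and $W$, $b$ the layer parameters (my notation, with the tanh nonlinearity as an assumption), this is roughly:

$$z_i = \tanh\left(W \left[ x_i \oplus x_{i+1} \oplus \cdots \oplus x_{i+k-1} \right] + b\right)$$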
The pooling operation takes, in each dimension, the maximum over $p$ consecutive positions.
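
With $z_{i,j}$ the $j$-th dimension of the $i$-th convolution output and pool size $p$ (again my notation), this is roughly:

$$\hat{z}_{i,j} = \max\left( z_{p(i-1)+1,\,j},\ z_{p(i-1)+2,\,j},\ \ldots,\ z_{pi,\,j} \right)$$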
The overall schematic is as follows. Through several convolution and pooling layers, each sentence is finally encoded as a fixed-length vector.
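
To make the layer concrete, here is a minimal PyTorch sketch of one wide-convolution + p-max-pooling layer as I read it; the class name, padding scheme, tanh nonlinearity, embedding size of 100, and the final max over positions are all my assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WideConvPool(nn.Module):
    """One sentence-CNN layer: wide 1-D convolution over word positions
    followed by p-max pooling in each dimension. Hypothetical sketch,
    not the authors' implementation."""

    def __init__(self, in_dim, out_dim, k=3, p=3):
        super().__init__()
        # padding = k-1 makes the convolution "wide": every word appears
        # in k windows, so even very short sentences produce outputs.
        self.conv = nn.Conv1d(in_dim, out_dim, kernel_size=k, padding=k - 1)
        self.p = p

    def forward(self, x):
        # x: (batch, seq_len, in_dim); Conv1d expects (batch, dim, seq_len)
        h = torch.tanh(self.conv(x.transpose(1, 2)))
        # p-max pooling: max over every p consecutive positions, per dimension
        h = F.max_pool1d(h, kernel_size=self.p)
        return h.transpose(1, 2)  # (batch, seq_len', out_dim)

# Stacking four layers with the sizes reported in the settings below
# (feature maps 200/400/600/600, kernel size 3, pool sizes 3/3/2/1):
encoder = nn.Sequential(
    WideConvPool(100, 200, k=3, p=3),   # 100 = assumed embedding dim
    WideConvPool(200, 400, k=3, p=3),
    WideConvPool(400, 600, k=3, p=2),
    WideConvPool(600, 600, k=3, p=1),
)
sentence = torch.randn(1, 40, 100)            # 40 words per sentence
vector = encoder(sentence).max(dim=1).values  # final max -> fixed-length vector
```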

Attention Layer

This layer attends over the sentence representations of the reading material with respect to the question (and likewise for the options), producing question-aware representations of the reading material and the options.

The author's explanation is that the same material should yield different representations for different questions. This step presumably weights each sentence by its contribution to the question, though it does not seem particularly intuitive to me.

The attention scores are computed with a cosine-similarity function.
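
A minimal PyTorch sketch of my reading of this layer: score each sentence of the material against the question vector with cosine similarity, normalize the scores, and take a weighted sum. The softmax normalization is my assumption; the paper may normalize differently.

```python
import torch
import torch.nn.functional as F

def cosine_attention(material, question):
    """material: (num_sentences, dim) sentence vectors of the reading
    material; question: (dim,) question vector. Returns a question-aware
    representation of the material. Hypothetical sketch."""
    scores = F.cosine_similarity(material, question.unsqueeze(0), dim=1)
    weights = torch.softmax(scores, dim=0)  # normalization is an assumption
    return (weights.unsqueeze(1) * material).sum(dim=0)
```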

Loss function

This part is a key contribution of the paper. First, Figure 1(b) shows that the wrong answer rates of the same two questions differ across tests, which means the difficulty of questions from different tests is not comparable. Therefore, the wrong answer rate cannot be used directly as the target in the loss.

The author observes that each test contains multiple questions answered by the same group of students, so the difficulty of questions within the same test is comparable. On this basis, the author proposes a test-dependent pairwise training strategy. The loss function has the following form:
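
Based on the description, with $\hat{d}_i$ the predicted difficulty and $d_i$ the observed wrong answer rate of question $i$, and pairs drawn within each test $T$ (my notation; check the paper for the exact form), the loss is roughly:

$$\mathcal{L} = \sum_{T} \sum_{(i,j) \in T} \left( (\hat{d}_i - \hat{d}_j) - (d_i - d_j) \right)^2$$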
The network parameters are trained on the difference in difficulty between two questions from the same test.
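
A minimal sketch of that training step under the same assumptions (all questions in `pred`/`true` come from one test, so every pair is comparable):

```python
import torch

def pairwise_loss(pred, true):
    """pred, true: (n,) predicted difficulties and observed wrong answer
    rates for the n questions of a single test. The squared pairwise form
    is an assumption; check the paper for the exact loss."""
    diff_pred = pred.unsqueeze(0) - pred.unsqueeze(1)  # (n, n) predicted gaps
    diff_true = true.unsqueeze(0) - true.unsqueeze(1)  # (n, n) observed gaps
    return ((diff_pred - diff_true) ** 2).mean()
```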

Experiments

Data

The dataset used in the paper is not public. There is also a conflict in the data description (the number of logs mentioned in the text is close to 3 million, but the table says 28 million; I don't know which digit is a typo); perhaps the PDF is a pre-print.

Embedding

Word embeddings were pre-trained with word2vec; OOV words were initialized randomly.
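
For concreteness, a common way to build such an embedding matrix (my sketch; the dimension of 100 and the uniform initialization range are assumptions):

```python
import numpy as np

def build_embedding_matrix(vocab, w2v, dim=100):
    """vocab: word -> row index; w2v: dict-like map from word to a
    pre-trained word2vec vector. OOV words keep a random initialization,
    as described above."""
    emb = np.random.uniform(-0.1, 0.1, size=(len(vocab), dim)).astype("float32")
    for word, idx in vocab.items():
        if word in w2v:
            emb[idx] = w2v[word]
    return emb
```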

Model settings

Sentence sequence length: 25
Word sequence length: 40
Feature maps: 200, 400, 600, 600
Kernel size (k): 3
Pool size (p): 3, 3, 2, 1
Batch size: 32
Dropout: 0.2

Metrics

  1. RMSE
  2. Degree of agreement (DOA) ?? (see the sketch after this list)
  3. Pearson correlation coefficient
  4. T-test passing ratio ??
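
DOA is not spelled out in my notes; as I understand it from related work, it is the fraction of question pairs (within the same test) that the prediction ranks in the same order as the observed wrong answer rate. A sketch under that assumption:

```python
def degree_of_agreement(pred, true):
    """Fraction of within-test question pairs ordered the same way by the
    prediction and by the observed wrong answer rate. My reading of DOA --
    check the paper for the exact definition."""
    pairs = [(i, j) for i in range(len(pred)) for j in range(i + 1, len(pred))]
    agree = sum((pred[i] - pred[j]) * (true[i] - true[j]) > 0 for i, j in pairs)
    return agree / len(pairs)
```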

The paper also compares the model against expert annotations.

Four reading materials from 12 tests, 16 questions in total, and seven experts were selected for the evaluation.

Then the model predictions and the expert annotations were each evaluated against the ground truth (wrong answer rate). The results show that the model outperforms all of the experts, which also suggests that expert annotations carry bias.

