Paper-pdf

Authors: Sewon Min, Victor Zhong, Richard Socher, Caiming Xiong (Salesforce) – ACL 2018

Hypothesis

  • Successful neural QA models build codependent representations of the document and question.
  • Learning the full context over the document is challenging and inefficient.
  • Such models are prone to errors given adversarial inputs.
  • The authors present a QA system that scales to large documents. They do so using a sentence selector module that selects a few sentences relevant to the query. They also show on the SQuAD-Adversarial dataset that their method is more robust than previous methods.

Model

  • The authors conduct a human analysis and observe that 88% of the examples in TriviaQA are answerable given the full document, and of those, 95% are answerable using one or two sentences.
  • The proposed model consists of a sentence selector and a QA model
  • The sentence selector scores each sentence with respect to the query.

Encoder:
First the encoder computes:
$D$: sentence embeddings
$Q$: question embedding
$D^q$: question-aware sentence embeddings
$\alpha_i = \mathrm{softmax}(D_i W_1 Q)$      # attention of the $i$-th sentence word over the question words
$D_i^q = \sum_j \alpha_{i,j} Q_j$      # $D_i$ is the hidden state of the $i$-th word in the sentence

$D^{enc} = \mathrm{RNN}([D_i; D_i^q])$
$Q^{enc} = \mathrm{RNN}([Q_u])$
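A minimal numpy sketch of the question-aware attention step above (the RNN itself is omitted; all sizes and weights here are hypothetical stand-ins, not the paper's trained parameters):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: L_d sentence words, L_q question words, hidden size h.
L_d, L_q, h = 5, 4, 8
rng = np.random.default_rng(0)
D = rng.standard_normal((L_d, h))   # sentence word embeddings D
Q = rng.standard_normal((L_q, h))   # question word embeddings Q
W1 = rng.standard_normal((h, h))    # learned weight W_1 (random here for illustration)

# alpha_i = softmax(D_i W_1 Q): attention of each sentence word over question words
alpha = softmax(D @ W1 @ Q.T, axis=-1)         # (L_d, L_q), rows sum to 1
D_q = alpha @ Q                                # D^q: question-aware embeddings, (L_d, h)

# The encoder RNN then consumes the concatenation [D_i; D_i^q].
rnn_input = np.concatenate([D, D_q], axis=-1)  # (L_d, 2h)
```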

Decoder:
Computes the score for the sentence by calculating bilinear similarities between sentence encodings and question encodings.

To get a single hidden representation of the question they use a weighted sum according to the following:

$\beta = \mathrm{softmax}(w^T Q^{enc})$      # weights for each query word hidden state
$q^{enc} = \sum_j \beta_j Q_j^{enc}$      # a hidden-size vector representing the query
$h_i = D^{enc}_i W_2 q^{enc}$       # how similar each sentence is to the query
$\tilde{h} = \max(h_1, \dots, h_L)$      # the most similar sentence
$\mathrm{score} = W_3^T \tilde{h}$ where $W_3 \in \mathbb{R}^{h \times 2}$      # the two dimensions of the score indicate whether the question is answerable or non-answerable given the sentence
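A numpy sketch of the decoder scoring, under the assumption that the bilinear term $h_i$ is a vector-valued interaction per encoded position and the max is taken elementwise (dimensions and random weights are hypothetical, not the paper's parameters):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: L_q question positions, L_d sentence positions, hidden size h.
L_q, L_d, h = 4, 6, 8
rng = np.random.default_rng(1)
Q_enc = rng.standard_normal((L_q, h))  # Q^enc: encoded question word states
D_enc = rng.standard_normal((L_d, h))  # D^enc: encoded sentence word states
w = rng.standard_normal(h)             # learned vector w
W2 = rng.standard_normal((h, h))       # bilinear weight W_2
W3 = rng.standard_normal((h, 2))       # final projection W_3

beta = softmax(w @ Q_enc.T)            # weight for each question word, sums to 1
q_enc = beta @ Q_enc                   # q^enc: single question vector, (h,)

# h_i = D^enc_i W_2 q^enc, read here as one h-dim interaction vector per position
H = (D_enc @ W2) * q_enc               # (L_d, h)
h_tilde = H.max(axis=0)                # elementwise max over positions, (h,)
score = W3.T @ h_tilde                 # (2,): answerable vs. non-answerable logits
```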

Weaknesses

  • The modeling contribution does not seem strong enough for a full paper. The main contribution is essentially adding a sentence selector module on top of an existing QA model.
  • A simple TF-IDF baseline performs very close to the proposed sentence selection component. It is also unclear why the speedup from TF-IDF sentence selection is reported as similar to that of the proposed selector: TF-IDF is completely unsupervised and should be much faster than a learned model.
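For reference, the TF-IDF baseline amounts to ranking sentences by cosine similarity to the question in TF-IDF space. A stdlib-only sketch (tokenization, IDF variant, and the example sentences are my own illustrative choices, not the paper's setup):

```python
import math
from collections import Counter

def tfidf_rank(question, sentences):
    """Rank sentence indices by TF-IDF cosine similarity to the question."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # Document frequency over sentences, smoothed IDF
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    def cos(a, b):
        dot = sum(v * b.get(t, 0.0) for t, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    qv = vec(question.lower().split())
    scores = [cos(qv, vec(d)) for d in docs]
    return sorted(range(n), key=lambda i: -scores[i])

sents = ["The Eiffel Tower is in Paris .",
         "It was completed in 1889 .",
         "Bananas are rich in potassium ."]
order = tfidf_rank("When was the Eiffel Tower completed ?", sents)
```

An unrelated sentence with no lexical overlap (the banana one above) falls to the bottom of the ranking, which is the intuition behind why this unsupervised baseline is competitive for selection.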