Authors: Sewon Min, Victor Zhong, Richard Socher, Caiming Xiong (Salesforce) – ACL 2018


  • Successful neural QA models build codependent representations of the document and the question.
  • Learning the full context over the document is challenging and inefficient.
  • Such models are prone to errors given adversarial inputs.
  • The authors present a QA system that scales to large documents. It uses a sentence selector module that picks a few sentences relevant to the query. They also show on the SQuAD-Adversarial dataset that their method is more robust than previous methods.


  • The authors conduct a human analysis and observe that 88% of the examples in TriviaQA are answerable given the full document and that, of those, 95% are answerable using one or two sentences.
  • The proposed model consists of a sentence selector and a QA model.
  • The sentence selector scores each sentence with respect to the query.

First the encoder computes:
$D$: sentence embeddings
$Q$: question embedding
$D^q$: question-aware sentence embeddings
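$\alpha_i = \mathrm{softmax}(D_i W_1 Q)$      # attention weights over the question words; $D_i$ is the hidden state of the $i$-th word in the sentence embedding
$D_i^q = \sum_j \alpha_{i,j} Q_j$      # attention-weighted sum of the question word embeddings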
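
The attention step above can be sketched in plain Python. This is only an illustration of the two formulas, not the authors' code: the function name `question_aware_embeddings`, the list-of-lists layout (one row per word), and the toy dimensions are assumptions made for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def question_aware_embeddings(D, Q, W1):
    """Compute D^q from word embeddings.

    D  : L_d x h list of sentence word embeddings (assumed layout)
    Q  : L_q x h list of question word embeddings (assumed layout)
    W1 : h x h weight matrix
    Returns an L_d x h list where row i is sum_j alpha[i][j] * Q[j].
    """
    h = len(W1)
    Dq = []
    for D_i in D:
        # Row vector D_i times W1.
        proj = [sum(D_i[a] * W1[a][b] for a in range(h)) for b in range(h)]
        # alpha_i = softmax over question words of (D_i W1) . Q_j
        logits = [sum(p * q for p, q in zip(proj, Q_j)) for Q_j in Q]
        alpha_i = softmax(logits)
        # D_i^q = attention-weighted sum of the question word embeddings.
        Dq.append([sum(alpha_i[j] * Q[j][b] for j in range(len(Q)))
                   for b in range(h)])
    return Dq
```

With an all-zero `W1` every question word gets the same weight, so each row of the result is simply the mean of the question embeddings; a trained `W1` is what lets a sentence word attend to the question words it matches.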

$D^{enc} = \mathrm{RNN}([D_i; D_i^q])$
$Q^{enc} = \mathrm{RNN}(Q_j)$

The selector then scores the sentence by computing bilinear similarities between the sentence encodings and the question encoding.

To get a single hidden representation of the question they use a weighted sum according to the following:

$\beta = \mathrm{softmax}(w^T Q^{enc})$      # weights for each query word hidden state
$q^{enc} = \sum_j \beta_j Q^{enc}_j$      # a vector of hidden-layer size representing the query
$h_i = D^{enc}_i W_2 q^{enc}$      # how similar the $i$-th word of the sentence is to the query
$\tilde{h} = \max(h_1, \ldots, h_L)$      # max-pooling over the $L$ words of the sentence
$\mathrm{score} = W_3^T \tilde{h}$ where $W_3 \in \mathbb{R}^{h \times 2}$      # the two dimensions of the score correspond to the question being answerable or non-answerable given the sentence
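
The scoring equations above can be traced with a small plain-Python sketch. It rests on assumptions not stated in these notes: since $W_3 \in \mathbb{R}^{h \times 2}$ is applied to the pooled vector, each $h_i$ is taken to be a vector of size $h$, which requires $W_2$ to be an $h \times h \times h$ tensor, and the max is taken element-wise. The function name `sentence_score` and the list-based layout are illustrative only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sentence_score(D_enc, Q_enc, w, W2, W3):
    """Score one sentence against the question.

    D_enc : L x h encoded sentence words
    Q_enc : L_q x h encoded question words
    w     : length-h vector used to weight the question words
    W2    : h x h x h tensor (assumption); W2[k] is the h x h matrix
            that produces output dimension k of h_i
    W3    : h x 2 matrix
    Returns a 2-element list: [answerable, non-answerable] scores.
    """
    h = len(w)
    # beta: one weight per question word.
    beta = softmax([sum(a * b for a, b in zip(w, Q_j)) for Q_j in Q_enc])
    # q_enc: weighted sum of the question word encodings.
    q = [sum(beta[j] * Q_enc[j][b] for j in range(len(Q_enc)))
         for b in range(h)]
    # h_i[k] = D_i^T W2[k] q, a bilinear similarity per output dimension.
    H = [[sum(D_i[a] * W2[k][a][b] * q[b]
              for a in range(h) for b in range(h))
          for k in range(h)]
         for D_i in D_enc]
    # Element-wise max-pooling over the words of the sentence.
    h_tilde = [max(h_i[k] for h_i in H) for k in range(h)]
    # score = W3^T h_tilde
    return [sum(W3[k][c] * h_tilde[k] for k in range(h)) for c in range(2)]
```

At inference time the sentences would then be ranked by the answerable component of this score, and the top few passed to the QA model.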


  • The modeling contribution does not seem strong enough for a full paper. The main contribution is essentially adding a sentence selector module in front of an existing QA model.
  • A simple TF-IDF model performs very close to the proposed sentence selection component. It is also not clear why the speedup from TF-IDF sentence selection is so similar to that of the proposed sentence selector: TF-IDF is completely unsupervised and should be much faster than the proposed model.
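
For reference, an unsupervised TF-IDF sentence selector of the kind mentioned in the last comment can be sketched in a few lines. This is a generic baseline written for illustration, not the exact variant evaluated in the paper: the whitespace tokenizer, the smoothed IDF formula, the cosine ranking, and the name `tfidf_rank` are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def tfidf_rank(question, sentences):
    """Return sentence indices sorted by TF-IDF cosine similarity
    to the question, most similar first."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vec(counts):
        # Terms never seen in any sentence are ignored.
        return {t: tf * idf[t] for t, tf in counts.items() if t in idf}

    def cosine(u, v):
        num = sum(u[t] * v[t] for t in u if t in v)
        den = (math.sqrt(sum(x * x for x in u.values()))
               * math.sqrt(sum(x * x for x in v.values())))
        return num / den if den else 0.0

    q = vec(Counter(tokenize(question)))
    scores = [cosine(q, vec(d)) for d in docs]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# Usage: keep only the top-ranked sentence as context for the QA model.
sentences = [
    "The capital of France is Paris",
    "Berlin is in Germany",
    "Cats sleep a lot",
]
ranking = tfidf_rank("what is the capital of France", sentences)
```

No parameters are trained here and scoring needs only term counts, which is why one would expect this baseline to be cheaper to run than a neural selector.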