Learned in Translation: Contextualized Word Vectors

Paper-pdf

Authors: Bryan McCann, James Bradbury, Caiming Xiong, Richard Socher (Salesforce)

Hypothesis

Adding context vectors learned from a machine translation (MT) task can improve performance on a variety of NLP tasks over using only unsupervised word and character vectors.

Model

Use the LSTM encoder of an English-to-German MT model to encode the words and obtain a contextual representation of each one.
The hidden states of the MT encoder are called CoVe (context vectors). For downstream tasks they are concatenated with the pretrained word vectors.
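
A rough sketch of the idea, to show the shapes involved rather than the paper's actual model. It assumes a toy bidirectional tanh RNN with random weights as a stand-in for the trained MT LSTM encoder; the names `run_rnn`, `cove_like_vectors` and `contextualize` are hypothetical. Each word gets one forward and one backward hidden state, and these are appended to the original word vector.

```python
import math
import random


def make_matrix(rows, cols, rng):
    # Small random weights; a real CoVe encoder would load weights trained on MT.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def run_rnn(inputs, w_in, w_rec):
    """Run a simple tanh RNN over `inputs` and return one hidden state per step."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
        states.append(hidden)
    return states


def cove_like_vectors(word_vectors, hidden_size=3, seed=0):
    """Return one context vector per word: [forward state; backward state]."""
    rng = random.Random(seed)
    dim = len(word_vectors[0])
    fwd_in, fwd_rec = make_matrix(hidden_size, dim, rng), make_matrix(hidden_size, hidden_size, rng)
    bwd_in, bwd_rec = make_matrix(hidden_size, dim, rng), make_matrix(hidden_size, hidden_size, rng)
    forward = run_rnn(word_vectors, fwd_in, fwd_rec)
    # Run right-to-left, then flip back so index i lines up with word i.
    backward = run_rnn(word_vectors[::-1], bwd_in, bwd_rec)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def contextualize(word_vectors, **kwargs):
    """Append each word's context vector to its original word vector."""
    context = cove_like_vectors(word_vectors, **kwargs)
    return [w + c for w, c in zip(word_vectors, context)]
```

With 2-dimensional word vectors and `hidden_size=3`, `contextualize` returns 8-dimensional vectors (2 + 3 + 3). The same word appearing at two positions gets different context vectors because the hidden states depend on the neighbouring words.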

Classification:

Use a biattentive classification network (BCN) with the CoVe vectors as part of its input.
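
A minimal sketch of what a biattention step can look like, assuming dot-product affinities between two encoded sequences and a softmax over the attended sequence. This is an illustration under those assumptions, not the paper's full network: the encoders, integration layers, pooling and output layers are omitted, and the function name `biattention` is hypothetical.

```python
import math


def softmax(values):
    # Subtract the max for numerical stability.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]


def biattention(x, y):
    """Attend in both directions between sequences x and y.

    x is a list of n vectors and y a list of m vectors of the same dimension.
    Returns (y_summary_per_x, x_summary_per_y):
      y_summary_per_x[i] is a weighted average of y, weighted by affinity to x[i];
      x_summary_per_y[j] is a weighted average of x, weighted by affinity to y[j].
    """
    # Affinity matrix: affinity[i][j] = x[i] . y[j]
    affinity = [[dot(xi, yj) for yj in y] for xi in x]
    y_summary_per_x = [weighted_sum(softmax(row), y) for row in affinity]
    columns = [[affinity[i][j] for i in range(len(x))] for j in range(len(y))]
    x_summary_per_y = [weighted_sum(softmax(col), x) for col in columns]
    return y_summary_per_x, x_summary_per_y
```

For a single-sentence task, the same sequence can be passed as both `x` and `y`, so the sentence attends to itself.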