Paper

Hypothesis

• A variational autoencoder can be used for inference in a generative model whose latent variable is language itself.
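Concretely (a sketch of the setup; $s$ is the source sentence, $c$ the latent compression, written in natural language), the marginal likelihood sums over all possible compressions:

$$
p(s) = \sum_{c} p(c)\, p_\theta(s \mid c)
$$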

Interesting methods

• Generative auto-encoding sentence compression model (ASC)
• Other generative methods use continuous latent variables; this work uses discrete latent variables (words)
• For autoencoding, rather than embedding inputs as points in a vector space, they represent them as explicit natural language sentences.
• Discrete variational auto encoder is a natural fit for sentence compression.
• Because the latent variable is discrete, gradients cannot flow through the sampling step (the reparameterisation trick does not apply). Instead they use the REINFORCE algorithm, which in turn requires mitigating the high gradient variance of sampling-based variational inference.
• In early training stages it is difficult to generate reasonable compression samples (at each step the hidden state can emit any of $\vert V \vert$ possible words)
• They combat this by constructing the variational distribution with a pointer network, which restricts the latent space to word sequences drawn from the source sentence – the softmax is over the words of the input sentence instead of the full vocabulary $\vert V \vert$
• Forced-attention sentence compression (FSC)
• The FSC model shares the ASC model's pointer network and combines it with a softmax output layer over the whole vocabulary, so it can switch between copying words from the source sentence and generating them from the background distribution.
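A minimal numpy sketch of the two output layers described above. The dot-product attention scoring and the scalar copy gate are illustrative assumptions, not the paper's exact parameterisation:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def pointer_distribution(decoder_state, source_states):
    # Pointer-network output: a softmax over the POSITIONS of the source
    # sentence (size = source length), not over the full vocabulary |V|.
    # Dot-product scoring is a simplifying assumption.
    scores = source_states @ decoder_state
    return softmax(scores)

def fsc_output(copy_dist, source_ids, gen_dist, copy_gate):
    # FSC-style output: mix a copy distribution over source positions with
    # a generation distribution over the whole vocabulary.
    # `copy_gate` (probability of copying) is an assumed gating form.
    mixed = (1.0 - copy_gate) * gen_dist.copy()
    for pos, word_id in enumerate(source_ids):
        mixed[word_id] += copy_gate * copy_dist[pos]
    return mixed

rng = np.random.default_rng(0)
h = rng.normal(size=4)               # decoder hidden state
src = rng.normal(size=(3, 4))        # encoder states for a 3-word source
p_copy = pointer_distribution(h, src)        # distribution over 3 positions
p_gen = softmax(rng.normal(size=10))         # distribution over a 10-word vocab
p_out = fsc_output(p_copy, [2, 5, 7], p_gen, copy_gate=0.6)
```

Note that both `p_copy` and `p_out` are proper probability distributions (non-negative, summing to one), which is what lets the pointer softmax shrink the effective output space from $\vert V \vert$ to the input length.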

Details

• ASC
• ASC consists of four recurrent networks: an encoder, a compressor, a decoder, and a language model
• compression model: the inference network $q_\phi (c \vert s)$ that produces the latent compression $c$ from the source sentence $s$
• reconstruction model: a generative network $p_\theta (s \vert c)$ that reconstructs the source sentence $s$ from the latent compression
• A language model $p(c)$ serves as the prior distribution, regularizing the latent compressions
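Putting these pieces together, training maximizes the standard variational lower bound, here with the language-model prior $p(c)$ (a sketch in the usual VAE form):

$$
\log p(s) \;\ge\; \mathbb{E}_{q_\phi(c \mid s)}\big[\log p_\theta(s \mid c)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(c \mid s)\,\Vert\,p(c)\big) \;=\; \mathcal{L}
$$

Because $c$ is discrete, the gradient with respect to $\phi$ is estimated with the REINFORCE (score-function) estimator, typically with a baseline $b$ subtracted from the learning signal to reduce variance:

$$
\nabla_\phi \mathcal{L} \;\approx\; \big(\log p_\theta(s \mid c) + \log p(c) - \log q_\phi(c \mid s) - b\big)\,\nabla_\phi \log q_\phi(c \mid s), \qquad c \sim q_\phi(c \mid s)
$$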