### Paper (arXiv)

Title: Mapping Text to Knowledge Graph Entities using Multi-Sense LSTMs
Authors: Dimitri Kartsaklis, Mohammad Taher Pilehvar, Nigel Collier (University of Cambridge)
Venue: EMNLP 2018

### Hypothesis

• Many KBs are sparse, lacking connections between many entities. Extending a KB with additional contextual features can enrich it and connect the missing dots.
• This extended KB can be leveraged to create a synthetic corpus on which skipgram models can be trained to learn entity embeddings.
• To map text to entities in this KB, one needs to consider the multiple senses associated with the words in the text. To do this, one can use an LSTM whose input at each position is a weighted average of the word’s sense embeddings (weighted by attention with respect to the word’s context).

The task: mapping a textual description to KB entities (e.g., “can’t sleep, too tired to think straight” -> “Insomnia”), also called grounding or normalization.

### Model

• How to embed entities?
• Use a set of reliable anchors or textual features that would link entities with textual descriptions.
• The textual features are terms from the textual descriptions of the entity, weighted by tf-idf values.
• Tf-idf values are computed by treating all textual descriptions of an entity as one document: term frequency (tf) within that document, and document frequency (df) across the documents of all other entities (a tf-idf sketch follows this list).
• After extending the KB, the entities are embedded with a skip-gram model trained on a synthetic corpus built by random walks on this extended KB.
• At each step, the random walk chooses either a neighbouring entity node or a textual-feature node according to a fixed probability distribution (see the random-walk sketch after this list).
• Text-to-entity mapping: Transformation from a textual vector space to a KB vector space.
• An LSTM encodes a sentence (description of the entity) into a point in the embedding space.
• The authors raise the issue of polysemy (a word carrying multiple, subtly different senses). As an example, while the lemma for the word “fever” in a dictionary usually contains only two or three definitions, the term occurs in many dozens of different forms and contexts in SNOMED.
• They address this by extending a standard LSTM into a so-called multi-sense LSTM that considers multiple sense vectors per word, selected by context, during training.
• Each word is associated with a single generic embedding and $k$ sense embeddings.
• For each word $w_i$, a context vector $c_i$ is computed as the average of the generic vectors for all other words in the sentence.
• The probability of each sense vector $s_j$ given this context $c_i$ is then calculated via attention, capturing the similarity between the context and the sense (see the numpy sketch after this list): $p(s_j \mid c_i) = \frac{\exp(\tanh(W s_j + U c_i))}{\sum_{l=1}^{k} \exp(\tanh(W s_l + U c_i))}$
• Each sense vector is then updated by adding the context vector weighted by its similarity to that sense: $s_j \leftarrow s_j + (s_j \cdot c_i)\, c_i$
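
A minimal sketch of the tf-idf weighting step described above, using scikit-learn's `TfidfVectorizer`; the entity names and descriptions are illustrative, not SNOMED data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Each entity's textual descriptions are concatenated into one "document";
# tf-idf is then computed across the per-entity documents.
# The entities and descriptions below are illustrative, not SNOMED data.
entity_docs = {
    "Insomnia": "difficulty sleeping sleeplessness cannot sleep at night",
    "Fatigue": "tired exhausted lack of energy",
    "Fever": "high temperature pyrexia elevated body temperature",
}

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(entity_docs.values())   # (n_entities, n_terms)
terms = vectorizer.get_feature_names_out()

# The top-weighted terms per entity become the new textual-feature nodes.
for row, entity in zip(tfidf.toarray(), entity_docs):
    top = sorted(zip(terms, row), key=lambda t: -t[1])[:3]
    print(entity, [(t, round(w, 2)) for t, w in top if w > 0])
```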
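A minimal sketch of the random-walk corpus generation over the extended KB. The toy graph, the tf-idf weights, and the `p_feature` parameter are illustrative assumptions, not values from the paper; the resulting walks would then be fed to any skip-gram implementation (e.g., gensim's Word2Vec with `sg=1`) to obtain the entity embeddings:

```python
import random
from collections import defaultdict

# Toy extended KB: entity-entity edges plus entity-feature edges weighted
# by tf-idf. Node names, weights and `p_feature` are illustrative.
entity_edges = {
    "Insomnia": ["Sleep_disorder", "Fatigue"],
    "Sleep_disorder": ["Insomnia"],
    "Fatigue": ["Insomnia"],
}
feature_edges = {
    "Insomnia": [("sleeplessness", 0.9), ("tired", 0.4)],
    "Sleep_disorder": [("sleeplessness", 0.7)],
    "Fatigue": [("tired", 0.8)],
}
feature_to_entities = defaultdict(list)
for ent, feats in feature_edges.items():
    for feat, _ in feats:
        feature_to_entities[feat].append(ent)

def random_walk(start, length=10, p_feature=0.5):
    """One walk that alternates between entity nodes and feature nodes."""
    walk, node, is_entity = [start], start, True
    for _ in range(length - 1):
        if is_entity:
            # From an entity: hop to a textual feature (sampled proportionally
            # to its tf-idf weight) or to a neighbouring entity.
            if random.random() < p_feature and feature_edges.get(node):
                feats, weights = zip(*feature_edges[node])
                node = random.choices(feats, weights=weights)[0]
                is_entity = False
            elif entity_edges.get(node):
                node = random.choice(entity_edges[node])
            else:
                break
        else:
            # From a feature node: jump back to an entity that carries it.
            node = random.choice(feature_to_entities[node])
            is_entity = True
        walk.append(node)
    return walk

# The corpus of walks is treated as sentences for skip-gram training.
corpus = [random_walk(e) for e in entity_edges for _ in range(5)]
print(corpus[0])
```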
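A minimal numpy sketch of the multi-sense attention and the resulting LSTM input (the weighted average of a word's sense vectors). Dimensions are illustrative, and $W$ and $U$ are taken here as $1 \times d$ matrices so that the score $\tanh(W s_j + U c_i)$ is a scalar, which is one possible reading of the formula above:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, sent_len = 8, 3, 5                         # embedding dim, senses per word, words

generic = rng.normal(size=(sent_len, d))         # one generic vector per word
senses = rng.normal(size=(sent_len, k, d))       # k sense vectors per word
W = rng.normal(size=(1, d))                      # attention parameters (assumed 1 x d)
U = rng.normal(size=(1, d))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_sense_input(i):
    """Attention-weighted average of word i's sense vectors (the LSTM input)."""
    # Context vector: average of the generic vectors of all *other* words.
    c_i = np.delete(generic, i, axis=0).mean(axis=0)
    # p(s_j | c_i) proportional to exp(tanh(W s_j + U c_i))
    scores = np.array([np.tanh(W @ senses[i, j] + U @ c_i).item() for j in range(k)])
    p = softmax(scores)
    return p @ senses[i]                         # weighted average of the k senses

for i in range(sent_len):
    print(i, multi_sense_input(i).shape)         # a (d,)-vector fed to the LSTM
```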

### Results

• Evaluation is done on three tasks
• Text to entity mapping
• A dataset of size 21,000 constructed from SNOMED concepts associated with multi-word textual descriptions.

• Reverse dictionary
• A dataset from WordNet by Hill et al. (2016), where the goal is to return a word given its definition. Their method achieves up to 0.96 accuracy@10, a 3-point improvement over the best baseline.
• Document classification
• Cora (2,708 papers in 7 categories)
• They achieve up to 88% accuracy, a 1-point improvement over the best baseline.

### Conclusions

Using a graph embedding space as a target for mapping text to entities is an effective approach.