Remi Lebret, Ronan Collobert
Hypothesis / Main methods
- An n-gram can be represented by averaging or summing the vectors of its words. K-means clustering over these n-gram vectors then groups semantically similar n-grams (concepts) into K clusters. Each n-gram is assigned to exactly one cluster, and each document is represented by a K-dimensional feature vector whose elements are count-based features over the clusters.
- The approach achieves better results than LDA and LSA.
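
The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy word vectors, the choice of bigrams (n = 2), K = 2, the plain Lloyd-style K-means, and all function names are assumptions made for illustration, and raw counts stand in for the "count-based features".

```python
import random
from collections import Counter

def ngram_vector(ngram, word_vectors):
    """Represent an n-gram as the average of its word vectors."""
    vecs = [word_vectors[w] for w in ngram]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda k: sq_dist(point, centroids[k]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style K-means; returns the list of centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[j] = [sum(m[i] for m in members) / len(members)
                                for i in range(dim)]
    return centroids

def extract_ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def document_features(tokens, n, word_vectors, centroids):
    """K-dimensional vector: number of the document's n-grams in each cluster."""
    counts = Counter(
        nearest(ngram_vector(g, word_vectors), centroids)
        for g in extract_ngrams(tokens, n)
    )
    return [counts.get(j, 0) for j in range(len(centroids))]

# Toy 2-dimensional word vectors (made up for this example).
word_vectors = {
    "good": [1.0, 0.0], "great": [0.9, 0.1],
    "bad": [0.0, 1.0], "awful": [0.1, 0.9],
    "movie": [0.5, 0.5],
}
docs = [["good", "great", "movie"], ["bad", "awful", "movie"]]
N, K = 2, 2

# Cluster the unique n-grams of the corpus (sorted for a reproducible order).
vocab = sorted({g for d in docs for g in extract_ngrams(d, N)})
centroids = kmeans([ngram_vector(g, word_vectors) for g in vocab], K)

# One K-dimensional count vector per document.
features = [document_features(d, N, word_vectors, centroids) for d in docs]
print(features)
```

Each row of `features` could then be passed to any standard classifier. Replacing the average in `ngram_vector` with a sum gives the other representation mentioned above.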