Advanced Word Representations for Deep Learning
Winter Semester 2018/2019
Table of Contents
1 Course information
Warning: this course can only be attended by students who received a confirmation of their registration before the first lecture.
1.1 Description
Over the last five years, distributed word representations have largely replaced traditional word representations in natural language processing. In traditional representations, words were treated as disjoint units. Increasingly, we now use distributed representations, which rely on distributional semantics to structure a vector space such that semantically or syntactically similar words have similar representations. In this Hauptseminar, we will focus on three main questions:
- How can distributed representations be constructed?
- What information do distributed representations encode?
- How can representations for larger units be derived from smaller units?
This course is structured as an intensive reading and research class. Every week we will discuss one or two papers. The papers will not be presented, but discussed in detail, and each weekly discussion is led by a student. Throughout the course, we will carry out experiments based on the questions that come up during these discussions.
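To make the notion of "similar representations" concrete, here is a minimal, self-contained sketch. The vectors are toy values invented for illustration (real embeddings are trained on a corpus and typically have 100 or more dimensions); it shows cosine similarity between word vectors and one very crude way to derive a representation for a larger unit from smaller units, namely element-wise addition.

```python
import numpy as np

# Toy 4-dimensional word vectors; the values are made up for illustration.
embeddings = {
    "cat":   np.array([0.9, 0.1, 0.0, 0.3]),
    "dog":   np.array([0.8, 0.2, 0.1, 0.4]),
    "car":   np.array([0.1, 0.9, 0.7, 0.0]),
    "black": np.array([0.2, 0.3, 0.9, 0.1]),
}

def cosine(u, v):
    """Cosine similarity: close to 1 for similar directions, 0 for orthogonal ones."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Semantically similar words should end up with a higher cosine similarity.
print(cosine(embeddings["cat"], embeddings["dog"]))  # relatively high
print(cosine(embeddings["cat"], embeddings["car"]))  # relatively low

# A simple way to derive a representation for a larger unit from smaller
# units: element-wise addition of the word vectors.
black_cat = embeddings["black"] + embeddings["cat"]
print(cosine(black_cat, embeddings["cat"]))
```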
1.2 Time/location
- Lecturer: Daniël de Kok <daniel.de-kok@uni-tuebingen.de>
- Local organizer: Corina Dima <corina.dima@uni-tuebingen.de>
- When:
- Wednesday 11:15-13:00, Room 2.31, Verfügungsgebäude
- First lecture: Wednesday, October 24
2 Schedule
The reading schedule will be adjusted throughout the course based on the interests and research questions of the attendees.
| Date | Reading material | Discussion leader |
|---|---|---|
| October 24 | Jurafsky & Martin, 2018; Schütze, 1993 | Daniël |
| October 31 | Bengio et al., 2003 | Tobias |
| November 7 | Collobert & Weston, 2008; Mikolov et al., 2013 | Lukas |
| November 14 | Levy & Goldberg, 2014 | Neele |
| November 21 | Ling et al., 2015; Levy & Goldberg, 2014b; Bojanowski et al., 2017 | Sebastian |
| November 28 | Pennington et al., 2014 | Kevin |
| December 5 🎅🏻 | Peters et al., 2018 | Madeesh |
| December 12 | Devlin et al., 2018 | Michael |
| December 19 | Joulin et al., 2016 | Daniël |
| January 9 | Linzen, 2016 | Patricia |
3 Projects
3.1 The teams
3.1.1 Composition: Corina, Neele & Patricia
3.1.2 Topological field labeling: Sebastian & Tobias
Possibly also constituency parsing.
3.1.3 Named entity recognition: Lukas & Madeesh
3.1.4 Sentence similarity: Kevin & Michael
3.2 Embeddings for comparison
We agreed on the following variations (where applicable); a small training sketch follows after this list:
- Dimensionality: 100, 300
- Context size: 5, 10
- Skip-gram negative sampling: 5, 15
- Minimum frequency cut-off: 5, 30
- For word2vec and GloVe: word and context embeddings.
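As an illustration of this grid, here is a minimal sketch of how the skip-gram variants could be trained with gensim. This is an assumption on our part, not the agreed tooling: the corpus file name is hypothetical, iterating over the full Cartesian product is only one way to organize the runs, and the parameter names assume gensim ≥ 4.

```python
from itertools import product

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Hypothetical tokenized corpus, one sentence per line.
corpus = LineSentence("corpus.txt")

# The hyper-parameter variations agreed on above.
for dim, window, negative, min_count in product((100, 300), (5, 10), (5, 15), (5, 30)):
    model = Word2Vec(
        corpus,
        sg=1,                 # skip-gram with negative sampling
        vector_size=dim,      # dimensionality: 100, 300
        window=window,        # context size: 5, 10
        negative=negative,    # negative samples: 5, 15
        min_count=min_count,  # minimum frequency cut-off: 5, 30
        workers=4,
    )
    model.save(f"sgns-d{dim}-w{window}-n{negative}-mc{min_count}.model")

    # Word embeddings live in model.wv; the context (output) embeddings of the
    # negative-sampling objective are stored in model.syn1neg (the attribute
    # name may differ between gensim versions).
    word_vectors = model.wv.vectors
    context_vectors = model.syn1neg
```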
Work division:
- Word2Vec skip-gram: Team NER
- GloVe: Team composition
- Dependency/tag embeddings: Team topological fields
- Structured skip-gram + subword units: Daniël
- ELMo: Neele (training time permitting)
4 Literature
- Vector Semantics, Daniel Jurafsky & James H. Martin, Speech and Language Processing, 3rd edition, 2018
- Word Space, Hinrich Schütze, Advances in Neural Information Processing Systems, 1993
- A Neural Probabilistic Language Model, Yoshua Bengio, Réjean Ducharme, Pascal Vincent & Christian Jauvin, Journal of Machine Learning Research 3, 2003
- A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning, Ronan Collobert & Jason Weston, Proceedings of the 25th International Conference on Machine Learning, 2008
- Efficient Estimation of Word Representations in Vector Space, Tomas Mikolov, Kai Chen, Greg Corrado & Jeffrey Dean, 2013
- Neural Word Embedding as Implicit Matrix Factorization, Omer Levy & Yoav Goldberg, Proceedings of the 27th International Conference on Neural Information Processing Systems, 2014
- Two/Too Simple Adaptations of Word2Vec for Syntax Problems, Wang Ling, Chris Dyer, Alan Black & Isabel Trancoso, NAACL, 2015
- Dependency-Based Word Embeddings, Omer Levy & Yoav Goldberg, ACL, 2014
- Enriching Word Vectors with Subword Information, Piotr Bojanowski, Edouard Grave, Armand Joulin & Tomas Mikolov, TACL, 2017
- GloVe: Global Vectors for Word Representation, Jeffrey Pennington, Richard Socher & Christopher D. Manning, EMNLP, 2014
- Deep contextualized word representations, Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee & Luke Zettlemoyer, NAACL, 2018
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee & Kristina Toutanova, 2018
- FastText.zip: Compressing text classification models, Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou & Tomas Mikolov, 2016
- Issues in evaluating semantic spaces using word analogies, Tal Linzen, 2016
5 Policy
5.1 Grading
The course is concluded by an oral exam (3 CP) or a paper (6 CP) describing the research that you have conducted throughout this course.
The paper should be formatted in LaTeX using the ACL style. The paper should be solely your own work. Plagiarism will be reported to the faculty.
5.2 Attendance
Since this is an intensive course, attendance is required, and you are expected to read the weekly literature. If you cannot attend due to illness, please provide a doctor's note.