Advanced Word Representations for Deep Learning
Winter Semester 2017/2018

1 Course information

Warning: this course can only be attended by students who received a confirmation of their registration before the first lecture.

1.1 Description

Over the last five years, distributed word representations have largely replaced traditional word representations in natural language processing. In traditional representations, words were treated as disjoint units. Distributed representations instead use distributional semantics to structure a vector space such that semantically or syntactically similar words have similar representations. In this Hauptseminar, we will focus on three main questions:

  1. How can distributed representations be constructed?
  2. What information do distributed representations encode?
  3. How can representations for larger units be derived from smaller units?
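
To make the contrast in the description concrete, here is a minimal sketch in Python. The three-dimensional vectors are made up for illustration (they do not come from any trained model), and the function and variable names are hypothetical. With one-hot vectors every pair of distinct words is equally dissimilar, whereas with dense vectors related words can end up close together.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Traditional view: each word is a disjoint unit (one-hot vector).
one_hot = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}

# Distributed view: dense vectors. These values are invented for
# illustration only; real embeddings are learned from corpus data.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

for name, table in (("one-hot", one_hot), ("distributed", toy_embeddings)):
    cat_dog = cosine_similarity(table["cat"], table["dog"])
    cat_car = cosine_similarity(table["cat"], table["car"])
    print(f"{name}: sim(cat, dog)={cat_dog:.2f}, sim(cat, car)={cat_car:.2f}")
```

In the one-hot case both similarities are zero. In the toy distributed case "cat" is closer to "dog" than to "car", which is the kind of structure the three questions above are about.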

This course is structured as an intensive reading and research class. Every week we will discuss one or two papers. The papers will not be presented, but discussed in detail, and each weekly discussion is led by a student. Throughout the course, we will do experiments based on the questions that come up during these discussions.

1.2 Time/location

1.3 Registration

The course is only open to people who registered on time, as outlined in the semester introduction.

2 Schedule

The reading schedule will be adjusted throughout the course based on the interests and research questions of the attendees.

Date           Reading material                                                    Discussion leader
October 24     Jurafsky & Martin, 2018; Schütze, 1993                              Daniël
October 31     Bengio et al., 2003                                                 Tobias
November 7     Collobert & Weston, 2008; Mikolov et al., 2013                      Lukas
November 14    Levy & Goldberg, 2014                                               Neele
November 21    Ling et al., 2015; Levy & Goldberg, 2014b; Bojanowski et al., 2017  Sebastian
November 28    Pennington et al., 2014                                             Kevin
December 5 🎅🏻  Peters et al., 2018                                                 Madeesh
December 12    Devlin et al., 2018                                                 Michael
December 19    Joulin et al., 2016                                                 Daniel
January 9      Linzen, 2016                                                        Patricia

3 Projects

3.1 The teams

3.1.1 Composition: Corina, Neele & Patricia

3.1.2 Topological field labeling: Sebastian & Tobias

Possibly also constituency parsing.

3.1.3 Named entity recognition: Lukas & Madeesh

3.1.4 Sentence similarity: Kevin & Michael

3.2 Embeddings for comparison

We agreed on the following variations (where applicable):

  • Dimensionality: 100, 300
  • Context size: 5, 10
  • Negative samples (skip-gram): 5, 15
  • Minimum frequency cut-off: 5, 30
  • For word2vec and GloVe: word and context embeddings.
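
As a rough sketch of what the numeric variations above amount to, the following Python snippet enumerates every combination. The dictionary keys and the function name are hypothetical labels chosen for this example, not the parameter names of any particular toolkit. Since the variations only hold where applicable (negative sampling, for instance, only applies to skip-gram models), each team would filter the grid for its own method.

```python
from itertools import product

# The agreed variations; key names are illustrative assumptions.
GRID = {
    "dimensionality": [100, 300],
    "context_size": [5, 10],
    "negative_samples": [5, 15],
    "min_frequency": [5, 30],
}


def configurations(grid):
    """Yield one dict per combination of the values in the grid."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(configurations(GRID))
print(len(configs), "configurations")
for config in configs[:3]:
    print(config)
```

Four binary choices give 16 configurations per method. For word2vec and GloVe, keeping both the word and the context embeddings doubles the number of embedding sets to compare.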

Work division:

  • word2vec skip-gram: Team NER
  • GloVe: Team composition
  • Dependency/tag embeddings: Team topological fields
  • Structured skip-gram + subword units: Daniël
  • ELMo: Neele (training time permitting)

4 Literature

5 Policy

5.1 Grading

The course concludes with either an oral exam (3 CP) or a paper (6 CP) describing the research that you conducted during the course.

The paper should be formatted in LaTeX using the ACL style. The paper should be solely your own work. Plagiarism will be reported to the faculty.

5.2 Attendance

Since this is an intensive course, attendance is required. Moreover, you should read the weekly literature. If you cannot be present due to illness, a doctor's note should be presented.

Author: Daniël de Kok

Created: 2019-01-02 Wed 19:06