NLP News Cypher 11/10/2019

Echoes from EMNLP and GPT-2 Strikes Back

Wow, what a week it was. The EMNLP conference gave us many treats to chew on, such as the growing popularity of cross-lingual learning and the continued adoption of knowledge graphs in language models.

Because of all this action, this week’s Cypher will be a bit longer than usual.

🤯 EMNLP 2019 🤯

New QA Leaderboard Attempting to Mitigate SQuAD Problems

GPT-2 Doesn’t Bring Armageddon

Chollet’s New Formulation of Intelligence

Unsupervised Cross-lingual Representation Learning

Compute Growth Goes Hyperbolic

What were some of the top keywords in EMNLP papers?

Accepted Paper Top Keywords

Paper Digest: EMNLP 2019 Highlights

To help the community quickly catch up on the work presented in this conference, Paper Digest Team…

EMNLP 2019

Stephen Mayhew et al. were live tweeting during the conference (thank you) and sharing all the action. Here are a few threads that caught our eye:

1. Chris Manning discusses the GQA dataset, which consists of natural language questions generated from scene graphs (based on the Visual Genome project), and shares new leaderboard results from his Neural State Machine paper, to be presented at NeurIPS next month. Full thread and link to GQA below:

Visual Reasoning in the Real World

Question Answering on Image Scene Graphs…

Chris Manning keynote: multi-step reasoning for answering complex questions. #conll2019 #emnlp2019

Manning: Lehnert (1977) says that NLU can be measured by asking questions. #conll2019 #emnp2019 Manning: SQuAD leaderboard…

2. Allen Institute’s Matt Gardner shares his slides on the limitations of reading comprehension tasks in NLP. He proposes an open reading benchmark that can evaluate multiple problems in reading comprehension (e.g. Sentence-level linguistic structure, Discrete Reasoning Over Paragraphs, Question-based coreference resolution, Reasoning Over Paragraph Effects in Situations, time, grounding and others) all at once. Slides:

2019-11-04 – How will we know when machines can read?

3. If you haven’t heard of GNNs (Graph Neural Networks), you should get familiar. Below is the presentation Graph Neural Networks for Natural Language Processing by Shikhar Vashishth, Naganand Yadati and Partha Talukdar. Warning: it’s 315 slides long. Great work!


The repository contains code examples for GNN-for-NLP tutorial at EMNLP 2019.
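For readers new to GNNs, here is a minimal sketch of the core idea, message passing, where each node updates its features by aggregating those of its neighbors. This is not taken from the tutorial or its repository. The function name `message_passing_step` and the toy graph are made up for illustration, and real GNN layers typically add learned weight matrices and a nonlinearity on top of this aggregation.

```python
def message_passing_step(features, edges):
    """One round of mean aggregation over an undirected graph.

    features: list of per-node feature vectors (lists of floats)
    edges: list of (u, v) node-index pairs
    Hypothetical helper for illustration only.
    """
    n = len(features)
    # Each node also "hears" itself (a self-loop), as many GNN variants do.
    neighbors = {i: [i] for i in range(n)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    dim = len(features[0])
    updated = []
    for i in range(n):
        nbrs = neighbors[i]
        # New feature = average of the node's own and its neighbors' features.
        updated.append(
            [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        )
    return updated


# Toy chain graph: 0 - 1 - 2, one scalar feature per node.
toy_features = [[1.0], [3.0], [5.0]]
toy_edges = [(0, 1), (1, 2)]
smoothed = message_passing_step(toy_features, toy_edges)
```

Stacking several such steps lets information travel further across the graph, which is roughly why these models are attractive for structures like dependency trees and knowledge graphs.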

4. My prayers were answered when Michael Galkin summarized all the knowledge graph insights from EMNLP. I won’t even bother discussing it since he did such a great job in the column below. Part 2 is dropping soon!

Knowledge Graphs & NLP @ EMNLP 2019 Part I

The review post of the papers from ACL 2019 on knowledge graphs (KGs) in NLP was well-received so I thought …

New QA Leaderboard Attempting to Mitigate SQuAD Problems

IBM Research introduced TechQA, a new leaderboard for enterprise Question Answering systems based on questions posted in IBM DeveloperWorks. A sobering insight we already knew:

“Natural Questions was created by harvesting users’ questions of Google’s search engine and then finding answers by using turkers. When a SQuAD system is tested on the Natural Questions leaderboard the F measure drops dramatically to 6% (on short answers — it is 2% for a SQuAD v1.1 system) illustrating the brittleness of SQuAD trained systems.”

Question Answering for Enterprise Use Cases

Answering users’ questions in an enterprise domain remains a challenging proposition. Businesses are increasingly …
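For context on the "F measure" in the quote above: QA leaderboards in the SQuAD style commonly score a predicted answer by token-overlap F1 against a reference answer. The sketch below assumes that is the metric meant. It is a simplified version (lowercasing and whitespace splitting only), not the official evaluation script, which also strips punctuation and articles. The name `token_f1` is ours.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Simplified token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# A prediction with one extra token: precision 2/3, recall 1.
score = token_f1("the cat sat", "the cat")
```

A leaderboard number is this score averaged over all questions, so a 6% average means predictions share almost no tokens with the reference answers.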

GPT-2 Doesn’t Bring Armageddon

OpenAI finally unveiled its 1.5-billion-parameter transformer to the world. They also released a model for detecting AI-written text, which thinks all my stuff is written by AI. 😂😂

… 55 minutes later, Adam King, the creator of put GPT-2 up. 🧐🧐

… 80 minutes later, Hugging Face put it up…🧐🧐


Hard Drive

Chollet’s New Formulation of Intelligence

François Chollet (of Keras fame) dropped his thesis on defining and measuring intelligence, along with a new eval dataset called ARC (Abstraction and Reasoning Corpus). Apparently he had been working on this for the past two years.

“ARC can be seen as a general artificial intelligence benchmark, as a program synthesis benchmark, or as a psychometric intelligence test. It is targeted at both humans and artificially intelligent systems that aim at emulating a human-like form of general fluid intelligence.”

I’ve just released a fairly lengthy paper on defining & measuring intelligence…

I’ve just released a fairly lengthy paper on defining & measuring intelligence, as well as a new AI evaluation dataset, the “Abstraction and Reasoning Corpus”…

Unsupervised Cross-lingual Representation Learning

Last week we showed a picture we took during Sebastian Ruder’s talk at NYU. In his blog post, he shares some of the slides he used and more:

Unsupervised Cross-lingual Representation Learning

This post expands on the ACL 2019 tutorial on Unsupervised Cross-lingual Representation Learning…

Compute Growth Goes Hyperbolic

Nothing to see here, move along…

Compute Usage In AI

This column is a weekly round-up of NLP news and code drops from researchers worldwide.

Follow us on Twitter for more Code & Demos: @Quantum_Stat