Towards AI and Infinity
And… we’re back! What a week!
First, we are happy to announce we are now publishing our weekly blog on Towards AI's platform💪💪! We're excited about this publishing partnership, as we intend to bring NLP trends to a more global audience, from developers in NYC to business professionals in Hong Kong.
And talking about global…
In case you were busy this past week: we dropped “The Big Bad NLP Database”, a large collection of datasets for ML and NLP developers! The database continues to grow, and we have already received excellent recommendations from our users. Updates coming very soon!
Wednesday’s Announcement Article:
With the advent of deep learning and the necessity for more and diverse data, researchers are constantly hunting for the most up-to-date datasets…
Thank you for all of your support!
– The Reformer Transformer
– Speaking Thoughts and Minds
– Recognizing Facebook’s Real-Time Speech
– Deep Hacks
– Wolfram Webinars on Data Analytics
– CoNLL Meets spaCy
– I Recommend Research Papers
– Dataset of the Week: ReCoRD
– Meanwhile, Back at the Vegas Ranch…
The Reformer Transformer
“A Transformer model designed to handle context windows of up to 1 million words, all on a single accelerator and using only 16GB of memory.”
Google started the new year with a bang and a new transformer. Google’s new model aims to solve two problems that weigh on transformers with large input sequences: attention and memory.
Attention is difficult to scale to large numbers of words, so Google introduced a locality-sensitive hashing technique that allows the model to efficiently “connect” similar vectors and divide them into chunks. Applying attention only within these segments reduces the computational load.
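To make the idea concrete, here is a minimal sketch (not Google's actual implementation) of the two steps: hashing vectors into buckets with random projections so that similar vectors tend to collide, then applying softmax attention only among positions that share a bucket. The function names and bucket-count parameter are illustrative, not from the Reformer codebase.

```python
import numpy as np

def lsh_buckets(vectors, n_buckets=8, seed=0):
    """Angular LSH: project onto random directions; similar vectors
    tend to land in the same bucket."""
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(vectors.shape[-1], n_buckets // 2))
    rotated = vectors @ proj  # (seq_len, n_buckets // 2)
    # Bucket = argmax over the concatenation [rotated, -rotated].
    return np.argmax(np.concatenate([rotated, -rotated], axis=-1), axis=-1)

def bucketed_attention(q, k, v, buckets):
    """Attend only within same-bucket positions (a sketch; the real
    Reformer sorts by bucket and attends over contiguous chunks)."""
    out = np.zeros_like(v)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        scores = q[idx] @ k[idx].T / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[idx] = weights @ v[idx]
    return out
```

Since each softmax now runs over a small bucket rather than the full sequence, the quadratic attention cost shrinks to roughly the sum of squared bucket sizes.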
The memory problem arises in a multi-layered model because the activations at every layer must be saved for the backward pass. This can exhaust your GPU’s memory, i.e., OOM errors.
To mitigate this issue, Google turned to reversible layers (a technique discussed in the paper above). Instead of storing each layer’s activations in memory, the model recomputes them on the backward pass through a clever technique.
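The trick behind reversible layers can be shown in a few lines: split the hidden state into two halves, and define the layer so its inputs can be recovered exactly from its outputs. This is a toy sketch, with `tanh` standing in for the attention and feed-forward sublayers the real model uses.

```python
import numpy as np

def F(x):  # stand-in for the attention sublayer
    return np.tanh(x)

def G(x):  # stand-in for the feed-forward sublayer
    return np.tanh(x)

def reversible_forward(x1, x2):
    # Reversible residual layer: y1 = x1 + F(x2), y2 = x2 + G(y1)
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def recover_inputs(y1, y2):
    # Invert the layer: no activations need to be stored,
    # they are recomputed from the outputs during the backward pass.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2
```

Because each layer is invertible, the backward pass can rebuild activations layer by layer from the output, so memory use no longer grows with depth.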
Understanding sequential data — such as language, music or videos — is a challenging task, especially when there is dependence on extensive surrounding context…
Colab for Text Generation:
This notebook was designed to run on TPU. To use TPUs in Colab, click “Runtime” on the main menu bar and select