PODCAST: a review of a failed peer review. The Mamba paper was submitted to ICLR 2024, described as one of the most prestigious machine learning conferences in the world. In January 2024, the paper was rejected by its peer reviewers.

Explore how Mamba positions itself as a competitor to the Transformer architecture. The provided source introduces Mamba, a novel neural network architecture that significantly outperforms transformers in language modeling, especially at smaller model sizes, while requiring less computational power. It explains that Mamba’s efficiency stems from its near-linear O(n log(n)) compute complexity, compared to the transformer’s quadratic O(n^2), enabling much larger context sizes. The text details how Mamba extends recurrent neural networks (RNNs), overcoming their historical limitations (slow parallelization and training difficulties) by employing linear recurrences, a specialized parallel scan algorithm, and an initialization strategy that addresses vanishing and exploding gradients. Finally, it highlights the controversial rejection of the Mamba paper from a prestigious conference, attributing it to the reviewers’ apparent misunderstandings and misinterpretations of the architecture and its evaluation.
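To make the “linear recurrence plus parallel scan” idea concrete, here is a minimal NumPy sketch. It is illustrative only, not Mamba’s actual selective-scan kernel: the function names and the scalar state are assumptions made for brevity. It shows that the recurrence h_t = a_t · h_{t-1} + x_t can be expressed through an associative combine operator, which is the property that lets GPUs evaluate it in logarithmic depth rather than strictly step by step.

```python
import numpy as np

def recurrence_sequential(a, x):
    """Plain O(n) loop: h_t = a_t * h_{t-1} + x_t, with h_0 = x_0."""
    h = np.empty_like(x)
    h[0] = x[0]
    for t in range(1, len(x)):
        h[t] = a[t] * h[t - 1] + x[t]
    return h

def combine(left, right):
    """Associative combine of two affine maps h -> a*h + b.
    Associativity is what allows a parallel (log-depth) scan."""
    a1, b1 = left
    a2, b2 = right
    return a2 * a1, a2 * b1 + b2

def recurrence_scan(a, x):
    """Same recurrence expressed as an inclusive scan over (a, x) pairs.
    Written as a serial prefix fold for clarity; a GPU implementation
    would apply `combine` in a tree to reach O(log n) depth."""
    acc = (a[0], x[0])
    out = [acc[1]]
    for pair in zip(a[1:], x[1:]):
        acc = combine(acc, pair)
        out.append(acc[1])
    return np.array(out)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.uniform(0.5, 1.0, size=8)
    x = rng.normal(size=8)
    # Both evaluation orders give the same hidden states.
    assert np.allclose(recurrence_sequential(a, x), recurrence_scan(a, x))
```

Because `combine` is associative, the prefix results can be grouped in a tree and computed in O(log n) parallel steps, which is the parallelization the summary above refers to.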
ICLR 2024 refers to the 12th International Conference on Learning Representations, which took place in Vienna, Austria, from May 7 to May 11, 2024. It is a leading conference in the field of machine learning, particularly deep learning. The event featured a main conference track with paper presentations, workshops, and invited talks. [1, 2, 3, 4, 5]
This video provides an overview of ICLR 2024: https://m.youtube.com/shorts/uiZWo1c2Fr4
Key Aspects of ICLR 2024:
- Call for Papers: The conference had a call for papers with abstract submission deadlines in September 2023 and final decisions in January 2024. [7]
- Workshops: A variety of workshops were held on May 11, covering topics such as Tackling Climate Change with Machine Learning and the 5th Workshop on African Natural Language Processing (AfricaNLP 2024). [2, 4, 8]
- Invited Talks: The conference included invited talks, one of which focused on the “emerging science of benchmarks”. [5]
- Virtual Component: ICLR 2024 had a virtual component accessible through the conference website, including schedule information, paper browsing, and search functionality. [9]
- Sponsors and Exhibitors: The conference had various sponsors and exhibitors, including Cambridge University Press and Intelligent Computing. [10]
- Tiny Papers Track: A special track for “Tiny Papers” was also organized, encouraging shorter, more focused research papers. [11, 12]
Overall, ICLR 2024 was a significant event for the machine learning community, showcasing cutting-edge research and fostering discussions on important topics within the field. [2, 3, 6, 13]
[1] https://dblp.org/db/conf/iclr/iclr2024
[2] https://www.climatechange.ai/events/iclr2024
[3] https://en.wikipedia.org/wiki/International_Conference_on_Learning_Representations
[4] https://iclr.cc/Conferences/2024/CallForWorkshops
[5] https://iclr.cc/virtual/2024/eventlistwithbios/InvitedTalks-2024
[6] https://iclr-blogposts.github.io/2024/about/
[7] https://iclr.cc/Conferences/2024/CallForPapers
[8] https://iclr.cc/Downloads/2024
[9] https://iclr.cc/virtual/2024/index.html
[10] https://iclr.cc/Conferences/2024/Sponsors
[11] https://blog.iclr.cc/category/iclr-2024/
https://notebooklm.google.com/notebook/17336321-6b43-4817-952f-388f4c30b4d8
Recurrent Neural Networks (RNNs) aim to improve upon Convolutional Neural Networks (CNNs) by addressing their difficulty in understanding long-range dependencies in input sequences.
Here’s how RNNs improve upon CNNs:
* **Addressing Long-Range Dependencies**: CNNs apply a neural net to only a small, consecutive group of vectors at a time. This means that information from vectors that are far apart cannot be combined until multiple convolutional layers have been applied, making it challenging for CNNs to understand long-range dependencies, which are common in natural language text.
* **Incorporating All Prior Information**: RNNs take a different approach: instead of applying the neural net to consecutive input vectors, they apply it to one input vector and the *previous output* of the neural net. This seemingly small change has **profound consequences**:
    * Each output vector in an RNN **contains information from all of the input vectors prior to it**, rather than just one previous vector.
    * The final output vector can contain information from every vector in the input sequence, regardless of its length.
* **Efficiency**: RNNs achieve this ability to incorporate long-range information **without using more compute than a convolutional layer**.
In essence, RNNs incorporate long-range information “for free”, whereas CNNs struggle with it unless many layers are stacked. The sketch below contrasts the two approaches.
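Here is a minimal NumPy sketch of that contrast (the layer names, window size, and dimensions are made up for the example): the convolution-style layer mixes only a small window of k consecutive vectors per output, while the recurrent layer feeds its previous output back in, so each state can carry information from every earlier input at roughly the same per-step cost.

```python
import numpy as np

def conv_layer(xs, w, k=3):
    """1-D convolution-style layer: each output sees only the k most
    recent input vectors (a small, local receptive field)."""
    outs = []
    for t in range(len(xs)):
        window = xs[max(0, t - k + 1): t + 1]      # k consecutive vectors
        outs.append(np.tanh(sum(w @ v for v in window)))
    return outs

def rnn_layer(xs, w_in, w_rec):
    """Recurrent layer: each output depends on the current input and the
    previous output, so information from every earlier vector can reach
    the final state."""
    h = np.zeros(w_rec.shape[0])
    outs = []
    for x in xs:
        h = np.tanh(w_in @ x + w_rec @ h)          # previous output fed back in
        outs.append(h)
    return outs

if __name__ == "__main__":
    d, n = 4, 10
    rng = np.random.default_rng(0)
    xs = [rng.normal(size=d) for _ in range(n)]
    conv_out = conv_layer(xs, rng.normal(size=(d, d)))
    rnn_out = rnn_layer(xs, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```

Note that both layers perform about one matrix multiply per input vector, which is the sense in which the RNN gains its long-range reach without using more compute than a convolutional layer.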