Are LLMs and Diffusion Models Overshadowing Other AI Advances?
Zian (Andy) Wang
In recent years, the field of artificial intelligence has seen a surge in the popularity and research development of diffusion models and Large Language Models (LLMs), reflecting a significant shift in the capabilities and focus of AI research. LLMs in particular have become increasingly prevalent, finding their way into every corner of the internet, from ChatGPT to the support chats of many businesses. It seems as though every website or tool now has its own “specific” LLM designated to support its users. Their ability to understand and generate human-like text has made them an indispensable tool in various sectors, revolutionizing the way we interact with and harness the power of AI.
Simultaneously, diffusion models have made equally impressive strides, particularly in the realm of image generation. Initially inspired by non-equilibrium statistical physics, they have taken revolutionary steps forward in image generation, achieving what people thought was impossible with traditional Generative Adversarial Networks (GANs). These models, known for transforming random noise into detailed and coherent images based on text inputs from the user, have redefined the standards for visual content creation. Notable examples like Midjourney and Stable Diffusion have showcased the remarkable potential of these models, producing stunningly realistic and creative visuals that blur the lines between AI-generated and human-created art. The advancements in both LLMs and diffusion models not only highlight the rapid evolution of AI technologies but also underscore their growing influence and potential to reshape various industries.
A Shift in Focus
The rise of Large Language Models (LLMs) and diffusion models in the AI landscape has inadvertently led to a lessened focus on other pivotal AI domains, or at least it appears so. More than half a decade ago, the "LLM" of machine learning was Reinforcement Learning. Unfortunately, the development of Reinforcement Learning in the past 2-3 years can even be described as “halting” compared to its past glory.
The Rise of Reinforcement Learning
In 2016, DeepMind’s AlphaGo symbolized the pinnacle of Reinforcement Learning and of AI in general. Its superhuman play in the complex board game of Go attracted the entire world’s attention. Its match against the legendary Go player Lee Sedol was watched by more than 200 million people worldwide. AlphaGo dominated the match and won decisively, 4 games to 1. The success of AlphaGo, and in turn DeepMind, was far ahead of its time and sparked widespread interest in the potential of Artificial Intelligence.
Following the resounding success of AlphaGo, DeepMind released its successor, AlphaGo Zero, a model capable of learning without any human game data. Merely months later, DeepMind announced AlphaZero, a program that mastered not only Go but also chess and shogi. This is what science fiction pictured AI to be: learning by itself and mastering multiple sets of skills based entirely on its own experience of the world. AlphaZero quickly surpassed the performance of all previous programs, including AlphaGo, underscoring the potential of RL for general problem-solving.
Up until 2019, DeepMind almost single-handedly swept through the Reinforcement Learning space. Beyond its earlier accomplishments, the development of MuZero and AlphaStar pushed the research even further. In particular, AlphaStar was able to master the real-time strategy game StarCraft II, a feat that was previously unthinkable for AI due to the game’s intricate dynamics and the necessity for long-term strategic planning.
However, DeepMind’s strongest competitor, OpenAI, never failed to impress either. With the release of the Gym toolkit in 2016, OpenAI built the foundation of Reinforcement Learning for every machine learning enthusiast, providing them with simple yet useful environments to train and deploy RL agents on.
But OpenAI’s contributions stretch far beyond the Gym library, particularly with the development of OpenAI Five. This AI system, designed for the complex multiplayer game Dota 2, showcased remarkable strategic depth and team coordination, achieving victories against top human e-sports teams in 2018 and 2019. In addition, OpenAI’s algorithmic contributions, such as the Proximal Policy Optimization (PPO) algorithm introduced in 2017, have become a cornerstone of state-of-the-art RL.
In robotics, OpenAI’s Dactyl project marked a notable milestone in the integration of machine learning into hardware. Here, a robotic hand, trained via RL, learned to solve a Rubik’s Cube with human-like dexterity. This feat was significant in showcasing RL’s application to real-world physical tasks, highlighting its adaptability and precision.
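For readers unfamiliar with it, the core idea behind Proximal Policy Optimization (PPO) fits in a few lines. What follows is a minimal sketch of the clipped surrogate objective described in the PPO paper, not OpenAI’s implementation: the function names and the plain-Python batch format are assumptions made for illustration, and the default epsilon of 0.2 is simply a commonly used value.

```python
def clip(x, low, high):
    """Restrict x to the closed interval [low, high]."""
    return max(low, min(x, high))


def ppo_clipped_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch (hypothetical helper).

    ratios[i]     : new_policy_prob / old_policy_prob for the sampled action
    advantages[i] : estimated advantage of that action
    epsilon       : how far the ratio may move from 1 before being clipped
    """
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        unclipped = ratio * advantage
        clipped = clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
        # Taking the minimum makes the objective a pessimistic bound:
        # the policy gains nothing from pushing the ratio past the clip range.
        terms.append(min(unclipped, clipped))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Illustrative values only, not data from any experiment.
    print(ppo_clipped_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0]))
```

In training, this quantity would be maximized with respect to the new policy’s parameters. The clipping removes the incentive to move the new policy far from the old one in a single update, which is a large part of why the method is considered stable and easy to tune.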
The Fall of RL
However, although Reinforcement Learning offered the world wonder and excitement, it seemed to lack real-world applications.
It’s clear that in recent years, the development of Reinforcement Learning has slowed significantly, with companies and researchers pushing instead for the development of diffusion models, Large Language Models, and text-to-speech models. OpenAI hasn’t published any ground-breaking RL-related research since 2019, while DeepMind has shifted its focus to the AlphaFold series of models. This is not to say that RL is absent from the modern machine learning landscape. In fact, RL is crucial for LLMs like ChatGPT, which use Reinforcement Learning from Human Feedback to achieve conversation-like behavior. It is the development of purely RL-focused algorithms that has slowed down, or rather, been overshadowed by the LLM and diffusion model revolution.
In 2021, OpenAI disbanded its robotics team, citing a lack of training data compared with other domains in ML. Earlier, in 2019, OpenAI decided to abandon the nonprofit nature of its organization, shifting to a “capped-profit” model in order to attract more capital to support its research. In the current world, Reinforcement Learning, with its lack of real-world applications and its heavy theoretical research, is not necessarily profitable. Instead, large-scale pre-trained models such as DALL-E, GPT, and CLIP hold massive business and industry potential that can be used to generate profit. In fact, OpenAI has a projected revenue of $1 billion by 2024 from its ChatGPT Plus subscriptions and paid APIs.
In addition, because OpenAI was no longer devoting substantial resources to maintaining its Reinforcement Learning library, Gym, it handed control of the library to the Farama Foundation. Over time, OpenAI seems to be breaking its ties with Reinforcement Learning and shifting its focus to advancing large-scale, much more capable models.
Similarly, DeepMind seems to have moved on from its RL-driven research. Its AlphaFold series of models was a major milestone in biomedicine, with AlphaFold 2 reaching near-experimental accuracy in its predictions. Recently, DeepMind released a blog post on the next generation of AlphaFold, whose predictions cover almost all molecules in the Protein Data Bank.
But the shift in focus from two of the largest Reinforcement Learning research organizations does not signify an end to the untapped potential of RL. Swift, an autonomous system trained for competitive drone racing, achieved a first-ever win for a robot in a competitive sport. It was able to defeat three human champions using only onboard sensors and computation. A major component of the system’s training is an on-policy, model-free RL algorithm.
It is Not a Halt, It’s Overshadowing
The rise and fall of Reinforcement Learning provides a stark contrast to the flourishing developments of diffusion models and LLMs in recent years, but this observation shouldn’t be generalized to other areas in machine learning.
Understandably, AI newsletters and blog posts are more likely to present content that appeals to a larger audience, and talking about LLMs and diffusion models is typically the way to go. To the public, it indeed looks like other fields of machine learning are progressing at a much slower rate. But the progress in these fields proves otherwise.
For example, Google released EfficientNetV2 in 2021, a successor to the original EfficientNets, promising better accuracy in addition to smaller model sizes. Around the same time, DeepMind came up with NFNets, a normalization-free ResNet family of models that surpass EfficientNet models of the same size. Both were major leaps for the computer vision field. More recently, in 2023, Ultralytics launched YOLOv8, a state-of-the-art object detection model with incredible efficiency, accuracy, and usability.
Furthermore, although it may appear that the larger tech companies are all jumping on the LLM hype, this does not mean that other research goals and areas are being neglected to make room for the rapid improvement of LLMs. In October of 2023, Meta released a blog post on its current research into real-time thought-to-image generation from brain activity. Models were able to generate images of similar categories to what the participants were seeing at an astonishing speed. If the speed of generation is not taken into account, Meta’s research shows that images decoded from fMRI activity can be extremely accurate, down to the colors and details of each object.
On the other hand, what most people see on the outside is Meta’s Llama series of LLMs, which paved the way for many open source models later on.
The truth is, a far larger audience can understand and relate to a blog post about how to start a side hustle with ChatGPT than to a research paper on how a new Reinforcement Learning algorithm achieved a state-of-the-art result on some benchmark.
Moreover, an obvious reason for large corporations stepping into the LLM space is its profitability. ChatGPT alone could generate billions of dollars for OpenAI through a simple subscription model. Additionally, LLMs’ capabilities have the potential to replace manual labor in repetitive tasks such as data entry and analysis jobs, reducing the cost of employment while increasing the efficiency of work. Countless startups are trying to jump on this hype train, promoting everything from AI-based education systems to AI counselors. In fact, according to Crunchbase, computer vision companies took 10+ years to raise $22 billion, while generative AI companies raised more than that in 3 years.
It is better to view the “takeover” of LLMs and diffusion models on the internet as a standalone “trend”, separate from the research news in machine learning. More than enough evidence shows that these new “technologies” are far from overtaking other fields of machine learning. In fact, the advancements in those other fields are nothing less than impressive, and the new “gimmicks” are simply casting a shadow on their accomplishments. Those fields have not stagnated in favor of LLMs or generative AI.
What Does This All Mean?
The current prominence of Large Language Models (LLMs) and diffusion models in the AI landscape, while significant, does not signal the stagnation of other domains like Reinforcement Learning (RL). This perceived “halting” is more about the alignment of market viability, public visibility, and immediate applicability than a comprehensive indicator of the entire field’s progress. LLMs and diffusion models have found widespread applications, making them more visible and relatable to the general public, which has naturally drawn more attention and investment. However, this doesn’t diminish the ongoing advancements in other AI domains.
The advancements in LLMs and diffusion models are not mutually exclusive with progress in other areas of AI. Instead, they often complement and synergize with each other. For example, Reinforcement Learning from Human Feedback (RLHF) is integral to refining LLMs, showcasing how different AI domains are interconnected and mutually beneficial. This symbiosis suggests that progress in one area can catalyze advancements across the field. Thus, the current focus on LLMs and diffusion models doesn’t preclude the importance or development of other technologies within AI.
Furthermore, the AI field thrives on diversity and the integration of various approaches. The spotlight on certain technologies at any given time reflects a combination of technical feasibility, societal needs, and economic factors, rather than a zero-sum game where the rise of one domain leads to the decline of others. The future of AI is not about one technology overshadowing the rest, but rather about the continuous evolution and amalgamation of different methods and ideas to solve complex problems.
In essence, while LLMs and diffusion models currently dominate public discourse in AI, this trend is not a halt but a phase in the ever-evolving landscape of AI research and development. Other areas like RL continue to progress, often in less conspicuous but equally significant ways, contributing to the rich tapestry of advancements that define the AI field. The trajectory of AI is shaped by a myriad of factors, and its future will undoubtedly involve a diverse array of technologies, each playing a crucial role in the broader narrative of AI development.