For many years, robotics has operated on a relatively simple premise: program a machine to follow specific commands, and it will carry them out. The process of getting there is usually long and painstaking. Robots have to be built for a specific purpose with very few deviations, and training them is expensive and takes many iterations, but it has worked until now. Recent advances in large language models have opened a window of opportunity for rethinking how robots are built and operated, and how large language models could make them more generalized and intuitive. We already know that generative AI built on large language models, like ChatGPT and Bard, can generate new content and insights from the data it has seen, but what if the same could be done with robotics?

🤖 History of language interfaces with robots

Robots today have come a long way from Unimate, the first digitally operated programmable robot, invented in 1954. Back then, button interfaces were the main way to operate robots while limiting direct human interaction, which was one of the primary goals at the time: with buttons, robots could receive commands from a distance. As industrial robots spread through factories in the United States in the following decades, there was a need for interfaces that could reliably control them, which led to increasingly sophisticated button-based control systems, with some robots carrying thousands of buttons.

The rise of natural language interfaces and speech recognition systems has ushered in a new wave of human-robot interaction. Through these systems, robots can communicate with humans more intuitively and learn from those interactions. This collaborative approach means robots can learn and adapt to new environments quickly. With large language models now entering the picture, these robots can draw on even stronger problem-solving and language-understanding abilities.

💻 The current state of the art

Large language models are now driving a complete shift in how robotics is regarded. Things that would have been impossible a couple of years ago, such as robots having a contextual understanding of a particular scenario, are now realistic goals. One of the most recent examples is Amazon’s home robot, Astro. When the robot was released last year, it was announced as a self-aware domestic robot with impressive navigational skills that could serve as a security guard and pet. While impressive, the robot could not do much more than its main functions, and it did not do even those very well.

This will all change soon: according to internal documents, Amazon is adding a layer of intelligence and a conversational spoken interface to Astro using language AI. In an example outlined in the documents, the upgraded Astro would be able to see broken glass on the floor, identify it as a potential hazard, and prioritize sweeping it up before anyone steps on it, a series of connected steps that used to be impossible for robots.

Researchers at Google are also using language models to help robots learn new skills. Using PaLM, Google’s own large language model, robots can build a contextual understanding of their environment that lets them respond to requests like “Bring me a snack and something to wash it down with.” One of these models, PaLM-E, works by framing robot tasks and vision-language tasks together through a common representation, essentially taking images and text as input and outputting text. In an impressive display, when asked to take a bag of chips to a person, PaLM-E successfully produced a plan to find the specific drawer, open it, and update the plan as it executed the task.
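To make that images-and-text-in, text-out pattern concrete, here is a minimal sketch of what such a planning loop might look like. Everything in it is an assumption for illustration: `query_vlm` and `execute_step` are hypothetical placeholders rather than PaLM-E’s real interface, and the prompt format is invented.

```python
# Sketch of a vision-language planning loop: show the model the current scene
# plus the instruction and the steps taken so far, and let it emit the next step.

def query_vlm(image, prompt: str) -> str:
    """Placeholder: send an image and a text prompt to a multimodal model
    and return its text output (e.g. the next step of a plan)."""
    raise NotImplementedError("swap in a real vision-language model here")

def execute_step(step: str) -> None:
    """Placeholder: hand a text step to a low-level robot controller."""
    raise NotImplementedError

def run_task(camera, instruction: str, max_steps: int = 10) -> None:
    history = []  # steps already executed, fed back so the model can re-plan
    for _ in range(max_steps):
        image = camera.capture()  # current view of the scene
        prompt = (
            f"Instruction: {instruction}\n"
            f"Steps completed so far: {history}\n"
            "What is the next step? Reply 'done' if the task is finished."
        )
        step = query_vlm(image, prompt)
        if step.strip().lower() == "done":
            break
        execute_step(step)
        history.append(step)  # the growing history is what lets the model revise its plan
```

Because the model sees a fresh image and the updated history on every iteration, it can adjust the plan mid-task, which is the behavior described in the chip-bag example above.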

Google’s latest model, RT-2, is a vision-language-action model that can see and analyze the world around it and tell a robot how to move. It can also make abstract connections between concepts, such as picking a dinosaur out of a group of animal figures when instructed to pick the extinct animal, or picking up a soccer ball after being told to pick up Lionel Messi.

👏 Other projects and startups

Apart from large companies like Google and Amazon, startups and research groups are also experimenting with large language models and robotics on a smaller scale. One of those startups, Furhat Robotics, is on a mission to build the world’s most advanced social robot, a task made possible by its use of large language models and conversational intelligence. As a result, the Furhat robot can not only learn new skills but also engage in natural conversations with rapid turn-taking.

TidyBot, a collaborative project between Princeton University, Stanford University, The Nueva School, and Columbia University, also uses large language models to provide personalized robot assistance. By having a large language model summarize a handful of example placements into general rules, the researchers achieved a level of generalization that lets a household robot learn a user’s preferences from just a few examples.
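A rough sketch of that summarization idea follows. The `ask_llm` helper, the prompts, and the example placements are all hypothetical, not TidyBot’s actual code; the point is only to show how a few demonstrations can be compressed into rules that then cover objects the user never demonstrated.

```python
# Sketch: summarize a few observed placements into general tidying rules,
# then apply those rules to a new, unseen object.

def ask_llm(prompt: str) -> str:
    """Placeholder for a text-in, text-out call to a large language model."""
    raise NotImplementedError

EXAMPLE_PLACEMENTS = [
    ("yellow shirt", "laundry basket"),
    ("dark purple shirt", "laundry basket"),
    ("white socks", "laundry basket"),
    ("black wool coat", "closet"),
]

def summarize_preferences(examples) -> str:
    lines = "\n".join(f"- {obj} -> {place}" for obj, place in examples)
    return ask_llm(
        "Here is where a user put a few objects:\n"
        f"{lines}\n"
        "Summarize their tidying preferences as short general rules."
    )

def place_new_object(rules: str, obj: str) -> str:
    # The summarized rules, not the raw examples, decide where new objects go.
    return ask_llm(
        f"Rules: {rules}\nWhere should '{obj}' go? Answer with a single location."
    )
```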

Recent research by Microsoft also explores the potential of using large language models to control robots intuitively with language. By giving ChatGPT the controls of a real drone, researchers were able to determine that it could serve as a highly intuitive language-based interface between a user and a robot.
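The general pattern is to describe a small set of high-level robot functions to the model and let it compose them from a plain-language request. The sketch below is illustrative only: the function names, the prompt, and the `ask_llm` wrapper are assumptions, not Microsoft’s actual drone interface.

```python
# Sketch: expose a tiny, well-documented API to the language model and ask it
# to write code against that API from a natural-language request.

DRONE_API_DESCRIPTION = """
You can control a drone using only these Python functions:
  takeoff()
  land()
  fly_to(x, y, z)        # coordinates in meters
  get_position()         # returns (x, y, z)
Write Python code that uses only these functions.
"""

def ask_llm(prompt: str) -> str:
    """Placeholder for a chat-model call via whatever provider you use."""
    raise NotImplementedError

def command_drone(user_request: str) -> str:
    # The model returns code against the declared API; a human should review
    # the generated code before it runs on real hardware.
    return ask_llm(DRONE_API_DESCRIPTION + "\nUser request: " + user_request)

# Example usage:
# code = command_drone("Take off, fly two meters forward, and land.")
```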

📎 Conclusion

As robotics technology advances, the possibilities of what could be achieved with language AI seem endless. Language AI is already helping to overcome the limitations of traditional interfaces, creating a pathway for more natural, intuitive interactions. However, some of the issues with large language models, such as privacy and bias, could carry over to robots built on them, and robots may still struggle with the full complexity of human language. These are problems that can be worked on as the technology develops. For now, the odds of seeing intelligent robots powered by large language models look pretty good.
