Today, Google debuted two new AI models: Gemini Robotics and Gemini Robotics-ER (embodied reasoning). Both are built on Gemini 2.0, which Google describes as its most capable AI to date. Gemini Robotics goes beyond conventional outputs such as text and images: prompts given to the model direct robots to perform physical actions.
Google said it will partner with Apptronik to build the next generation of humanoid robots using Gemini 2.0. Apptronik is a Texas-based robotics startup that has been in business since 2016 and has previously worked with NASA and Nvidia. Its humanoid robots compete with Tesla's Optimus, the robot backed by Elon Musk.
In the demo video, Google showed different types of physical actions that a robot could perform. Gemini Robotics is an advanced vision-language-action (VLA) model that adds physical actions as a new output, allowing it to control robots directly. Gemini Robotics-ER, by contrast, is an advanced spatial understanding model that enables robots to sense and respond to changes in their surroundings.
In its blog post, Google wrote:
“To be useful and helpful to people, AI models for robotics need three principal qualities: they have to be general, meaning they’re able to adapt to different situations; they have to be interactive, meaning they can understand and respond quickly to instructions or changes in their environment; and they have to be dexterous, meaning they can do the kinds of things people generally can do with their hands and fingers, like carefully manipulate objects.”