• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Thursday, April 16, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

Google DeepMind Releases Gemini Robotics-ER 1.6: Bringing Enhanced Embodied Reasoning and Instrument Reading to Physical AI

Josh by Josh
April 15, 2026
in Al, Analytics and Automation
0
Google DeepMind Releases Gemini Robotics-ER 1.6: Bringing Enhanced Embodied Reasoning and Instrument Reading to Physical AI


Google DeepMind research team introduced Gemini Robotics-ER 1.6, a significant upgrade to its embodied reasoning model designed to serve as the ‘cognitive brain’ of robots operating in real-world environments. The model specializes in reasoning capabilities critical for robotics, including visual and spatial understanding, task planning, and success detection — acting as the high-level reasoning model for a robot, capable of executing tasks by natively calling tools like Google Search, vision-language-action models (VLAs), or any other third-party user-defined functions.

Here is the key architectural idea to understand: Google DeepMind takes a dual-model approach to robotics AI. Gemini Robotics 1.5 is the vision-language-action (VLA) model — it processes visual inputs and user prompts and directly translates them into physical motor commands. Gemini Robotics-ER, on the other hand, is the embodied reasoning model: it specializes in understanding physical spaces, planning, and making logical decisions, but does not directly control robotic limbs. Instead, it provides high-level insights to help the VLA model decide what to do next. Think of it as the difference between a strategist and an executor — Gemini Robotics-ER 1.6 is the strategist.

https://deepmind.google/blog/gemini-robotics-er-1-6/?

What’s New in Gemini Robotics-ER 1.6

Gemini Robotics-ER 1.6 shows significant improvement over both Gemini Robotics-ER 1.5 and Gemini 3.0 Flash, specifically enhancing spatial and physical reasoning capabilities such as pointing, counting, and success detection. But the key addition is a capability that did not exist in prior versions at all: instrument reading.

Pointing as a Foundation for Spatial Reasoning

Pointing — the model’s ability to identify precise pixel-level locations in an image — is far more powerful than it sounds. Points can be used to express spatial reasoning (precision object detection and counting), relational logic (making comparisons such as identifying the smallest item in a set, or defining from-to relationships like ‘move X to location Y’), motion reasoning (mapping trajectories and identifying optimal grasp points), and constraint compliance (reasoning through complex prompts like “point to every object small enough to fit inside the blue cup”).

https://deepmind.google/blog/gemini-robotics-er-1-6/?

In internal benchmarks, Gemini Robotics-ER 1.6 demonstrates a clear advantage over its predecessor. Gemini Robotics-ER 1.6 correctly identifies the number of hammers, scissors, paintbrushes, pliers, and garden tools in a scene, and does not point to requested items that are not present in the image — such as a wheelbarrow and Ryobi drill. In comparison, Gemini Robotics-ER 1.5 fails to identify the correct number of hammers or paintbrushes, misses scissors altogether, and hallucinates a wheelbarrow. For AI Robotics professionals this matters because hallucinated object detections in robotic pipelines can cause cascading downstream failures — a robot that ‘sees’ an object that isn’t there will attempt to interact with empty space.

Success Detection and Multi-View Reasoning

In robotics, knowing when a task is finished is just as important as knowing how to start it. Success detection serves as a critical decision-making engine that allows an agent to intelligently choose between retrying a failed attempt or progressing to the next stage of a plan.

This is a harder problem than it looks. Most modern robotics setups include multiple camera views such as an overhead and wrist-mounted feed. This means a system needs to understand how different viewpoints combine to form a coherent picture at each moment and across time. Gemini Robotics-ER 1.6 advances multi-view reasoning, enabling it to better fuse information from multiple camera streams, even in occluded or dynamically changing environments.

Instrument Reading: A Real-World Breakthrough

The genuinely new capability in Gemini Robotics-ER 1.6 is instrument reading — the ability to interpret analog gauges, pressure meters, sight glasses, and digital readouts in industrial settings. This task stems from facility inspection needs, a critical focus area for Boston Dynamics. Spot, a Boston Dynamics robot, is able to visit instruments throughout a facility and capture images of them for Gemini Robotics-ER 1.6 to interpret.

Instrument reading requires complex visual reasoning: one must precisely perceive a variety of inputs — including the needles, liquid level, container boundaries, tick marks, and more — and understand how they all relate to each other. In the case of sight glasses, this involves estimating how much liquid fills the sightglass while accounting for distortion from the camera perspective. Gauges typically have text describing the unit, which must be read and interpreted, and some have multiple needles referring to different decimal places that need to be combined.

https://deepmind.google/blog/gemini-robotics-er-1-6/?

Gemini Robotics-ER 1.6 achieves its instrument readings by using agentic vision (a capability that combines visual reasoning with code execution, introduced with Gemini 3.0 Flash and extended in Gemini Robotics-ER 1.6). The model takes intermediate steps: first zooming into an image to get a better read of small details in a gauge, then using pointing and code execution to estimate proportions and intervals, and ultimately applying world knowledge to interpret meaning.

Gemini Robotics-ER 1.5 achieves a 23% success rate on instrument reading, Gemini 3.0 Flash reaches 67%, Gemini Robotics-ER 1.6 reaches 86%, and Gemini Robotics-ER 1.6 with agentic vision hits 93%. One important caveat: Gemini Robotics-ER 1.5 was evaluated without agentic vision because it does not support that capability. The other three models were evaluated with agentic vision enabled for the instrument reading task, making the 23% baseline less a performance gap and more a fundamental architectural difference. For AI developers evaluating model generations, this distinction matters — you are not comparing apples to apples across the full benchmark column.

Key Takeaways

  • Gemini Robotics-ER 1.6 is a reasoning model, not an action model: It acts as the high-level ‘brain’ of a robot — handling spatial understanding, task planning, and success detection — while the separate VLA model (Gemini Robotics 1.5) handles the actual physical motor commands.
  • Pointing is more powerful than it looks: Gemini Robotics-ER 1.6’s pointing capability goes far beyond simple object detection — it enables relational logic, motion trajectory mapping, grasp point identification, and constraint-based reasoning, all of which are foundational to reliable robotic manipulation.
  • Instrument reading is the biggest new capability: Built in collaboration with Boston Dynamics’ Spot robot for industrial facility inspection, Gemini Robotics-ER 1.6 can now read analog gauges, pressure meters, and sight glasses with 93% accuracy using agentic vision — up from just 23% in Gemini Robotics-ER 1.5, which lacked the capability entirely.
  • Success detection is what enables true autonomy: Knowing when a task is actually complete — across multiple camera views, in occluded or dynamic environments — is what allows a robot to decide whether to retry or move to the next step without human intervention.

Check out the Technical details and Model Information. Also, feel free to follow us on Twitter and don’t forget to join our 130k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.

Need to partner with us for promoting your GitHub Repo OR Hugging Face Page OR Product Release OR Webinar etc.? Connect with us




Source_link

READ ALSO

Human-machine teaming dives underwater | MIT News

Structured Outputs vs. Function Calling: Which Should Your Agent Use?

Related Posts

Human-machine teaming dives underwater | MIT News
Al, Analytics and Automation

Human-machine teaming dives underwater | MIT News

April 15, 2026
Structured Outputs vs. Function Calling: Which Should Your Agent Use?
Al, Analytics and Automation

Structured Outputs vs. Function Calling: Which Should Your Agent Use?

April 15, 2026
Q&A: MIT SHASS and the future of education in the age of AI | MIT News
Al, Analytics and Automation

Q&A: MIT SHASS and the future of education in the age of AI | MIT News

April 14, 2026
TinyFish Launches Full Web Infrastructure Platform for AI Agents — Search, Fetch, Browser, and Agent Under One API Key
Al, Analytics and Automation

TinyFish Launches Full Web Infrastructure Platform for AI Agents — Search, Fetch, Browser, and Agent Under One API Key

April 14, 2026
How to Implement Tool Calling with Gemma 4 and Python
Al, Analytics and Automation

How to Implement Tool Calling with Gemma 4 and Python

April 14, 2026
Multilingual Audio Datasets for Speech Recognition AI
Al, Analytics and Automation

Multilingual Audio Datasets for Speech Recognition AI

April 14, 2026
Next Post
On-Prem Marketing Automation

On-Prem Marketing Automation

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

November 4, 2025

EDITOR'S PICK

Comment mesurer le ROI de vos actions marketing ?

Comment mesurer le ROI de vos actions marketing ?

June 10, 2025
Experts Discuss Measurement and Brand Safety Risks

Experts Discuss Measurement and Brand Safety Risks

March 27, 2026
Matt Oldford Nova Scotia: Building Quality, Vision, and Community Growth

Matt Oldford Nova Scotia: Building Quality, Vision, and Community Growth

January 26, 2026
Value Rules Are Here – Jon Loomer Digital

Value Rules Are Here – Jon Loomer Digital

June 12, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • The 9 Best AI Image Generators of 2026 (+ Examples)
  • How to Get the Most Out of Twitter Ads
  • How to Create Floating Cars & Make another Goat Float Using Helium in Goat Simulator 3
  • NASA Wants to Put Nuclear Reactors on the Moon
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions