• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Tuesday, May 12, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Technology And Software

Thinking Machines shows off preview of near-realtime AI voice and video conversation with new 'interaction models'

Josh by Josh
May 12, 2026
in Technology And Software
0
Thinking Machines shows off preview of near-realtime AI voice and video conversation with new 'interaction models'



Is AI leaving the era of "turn-based" chat?

READ ALSO

Humanoid robots: Can Tesla and other companies deliver on the hype?

Daybreak Is OpenAI’s Response To Anthropic’s Claude Mythos

Right now, all of us who use AI models regularly for work or in our personal lives know that the basic interaction mode across text, imagery, audio, and video remains the same: the human user provides an input, waits anywhere between milliseconds to minutes (or in some cases, for particularly tough queries, hours and days), and the AI model provides an output.

But if AI is to really take on the load of jobs requiring natural interaction, it will need to do more than provide this kind of "turn-based" interactivity — it will ultimately need to respond more fluidly and naturally to human inputs, even responding while also processing the next human input, be it text or another format.

That at least seems to be the contention of Thinking Machines, the well-funded AI startup founded last year by former OpenAI chief technology officer Mira Murati and former OpenAI researcher and co-founder John Schulman, among others.

Today, the firm announced a research preview of what it deems to be "interaction models, a new class of native multimodal systems that treats interactivity as a first-class citizen of model architecture rather than an external software "harness," scoring some impressive gains on third-party benchmarks and reduced latency as a result.

However, the models are not yet available to the general public or even enterprises — the company says in its announcement blog post: "In the coming months, we will open a limited research preview to collect feedback, with a wider release later this year."

'Full duplex' simultaneous input/output processing

At the heart of this announcement is a fundamental shift in how AI perceives time and presence. Current frontier models typically experience reality in a single thread; they wait for a user to finish an input before they begin processing, and their perception freezes while they generate a response.

In their blog post, the Thinking Machines researchers described the status quo as a limitation that forces humans to "contort themselves" to AI interfaces, phrasing questions like emails and batching their thoughts.

To solve this "collaboration bottleneck," Thinking Machines has moved away from the standard alternating token sequence.

Instead, they use a multi-stream, micro-turn design that processes 200ms chunks of input and output simultaneously.

This "full-duplex" architecture allows the model to listen, talk, and see in real time, enabling it to backchannel while a user speaks or interject when it notices a visual cue—such as a user writing a bug in a code snippet or a friend entering a video frame. Technically, the model utilizes encoder-free early fusion.

Rather than relying on massive standalone encoders like Whisper for audio, the system takes in raw audio signals as dMel and image patches (40×40) through a lightweight embedding layer, co-training all components from scratch within the transformer.

Dual model system

The research preview introduces TML-Interaction-Small, a 276-billion parameter Mixture-of-Experts (MoE) model with 12 billion active parameters. Because real-time interaction requires near-instantaneous response times that often conflict with deep reasoning, the company has architected a two-part system:

  1. The Interaction Model: Stays in a constant exchange with the user, handling dialog management, presence, and immediate follow-ups.

  2. The Background Model: An asynchronous agent that handles sustained reasoning, web browsing, or complex tool calls, streaming results back to the interaction model to be woven naturally into the conversation.

This setup allows the AI to perform tasks like live translation or generating a UI chart while continuing to listen to user feedback—a capability demonstrated in the announcement video where the model provided typical human reaction times for various cues while simultaneously generating a bar chart.

Impressive performance on major benchmarks against other leading AI labs' fast interaction models

To prove the efficacy of this approach, the lab utilized FD-bench, a benchmark specifically designed to measure interaction quality rather than just raw intelligence.The results show that TML-Interaction-Small significantly outperforms existing real-time systems:

  • Responsiveness: It achieved a turn-taking latency of 0.40 seconds, compared to 0.57s for Gemini-3.1-flash-live and 1.18s for GPT-realtime-2.0 (minimal).

  • Interaction Quality: On FD-bench V1.5, it scored 77.8, nearly doubling the scores of its primary competitors (GPT-realtime-2.0 minimal scored 46.8).

  • Visual Proactivity: In specialized tests like RepCount-A (counting physical repetitions in video) and ProactiveVideoQA, Thinking Machines’ model successfully engaged with the visual world while other frontier models remained silent or provided incorrect answers.

Metric

TML-Interaction-Small

GPT-realtime-2.0 (min)

Gemini-3.1-flash-live (min)

Turn-taking latency (s)

0.40

1.18

0.57

Interaction Quality (Avg)

77.8

46.8

54.3

IFEval (VoiceBench)

82.1

81.7

67.6

Harmbench (Refusal %)

99.0

99.5

99.0

A potentially huge boon to enterprises — once the models are made available

If made available to the enterprise sector, Thinking Machines' interaction models would represent a fundamental shift in how businesses integrate AI into their operational workflows.

A native interaction model like TML-Interaction-Small allows for several enterprise capabilities that are currently impossible or highly brittle with standard multimodal models:

Current enterprise AI requires a "turn" to be completed before it can analyze data. In a manufacturing or lab setting, a native interaction model can monitor a video feed and proactively interject the moment it detects a safety violation or a deviation from a protocol — without waiting for the worker to ask for feedback.

The model's success in visual benchmarks like RepCount-A (accurate repetition counting) and ProactiveVideoQA (answering questions as visual evidence appears) suggests it could serve as a real-time auditor for high-stakes physical tasks.

The primary friction in voice-based customer service is the 1–2 second "processing" delay common in 2026's standard APIs. Thinking Machines' model achieves a turn-taking latency of 0.40 seconds, roughly the speed of a natural human conversation.

Because it handles simultaneous speech natively, an enterprise support bot could listen to a customer's frustration, provide "backchannel" cues (like "I see" or "mm-hmm") without interrupting the user, and offer live translation that feels like a natural conversation rather than a series of disjointed recordings.

Standard LLMs lack an internal clock; they "know" time only if it is provided in a text prompt. Interaction models are natively time-aware, allowing them to manage time-sensitive processes like "Remind me to check the temperature every 4 minutes" or "Alert me if this process takes longer than the last one". This is critical for industrial maintenance and pharmaceutical research where timing is an essential variable.

Background on Thinking Machines

This release marks the second major milestone for Thinking Machines following the October 2025 launch of Tinker, a managed API for fine-tuning language models that lets researchers and developers control their data and training methods while Thinking Machines handles the infrastructure burden of distributed training.

The company said Tinker supports both small and large open-weight models, including mixture-of-experts models, and early users included groups at Princeton, Stanford, Berkeley and Redwood Research.

At launch in early 2025, Thinking Machines framed itself as an AI research and product company trying to make advanced AI systems “more widely understood, customizable and generally capable.”

In July 2025, Thinking Machines said it had raised about $2 billion at a $12 billion valuation in a round led by Andreessen Horowitz, with participation from Nvidia, Accel, ServiceNow, Cisco, AMD and Jane Street, described by WIRED as the largest seed funding round in history.

The Wall Street Journal reported in August 2025 that rival tech CEO Mark Zuckerberg approached Murati about acquiring Thinking Machines Lab and, after she declined, Meta pursued more than a dozen of the startup’s roughly 50 employees.

In March and April 2026, the company also became known for its compute ambitions: it announced a Nvidia partnership to deploy at least one gigawatt of next-generation Vera Rubin systems, then expanded its Google Cloud relationship to use Google’s AI Hypercomputer infrastructure with Nvidia GB300 systems for model research, reinforcement learning workloads, frontier model training and Tinker.

By April 2026, Business Insider reported that Meta had hired seven founding members from Thinking Machines, including Mark Jen and Yinghai Lu, while another Thinking Machines researcher, Tianyi Zhang, also moved to Meta. The same reporting said Joshua Gross, who helped build Thinking Machines’ flagship fine-tuning product Tinker, had joined Meta Superintelligence Labs, and that the company had grown to about 130 employees despite the departures.

Thinking Machines was not simply losing people, however: it also hired Meta veteran Soumith Chintala, creator of PyTorch, as CTO, and added other high-profile technical talent such as Neal Wu. TechCrunch separately reported in April 2026 that Weiyao Wang, an eight-year Meta veteran who worked on multimodal perception systems, had joined Thinking Machines, underscoring that the talent flow was not one-way.

Thinking Machines previously stated it was committed to "significant open source components" in its releases to empower the research community. It's unclear if these new interaction models models will fall under the same ethos and release terms.

But one thing is certain: by making interactivity native to the model, Thinking Machines believes that scaling a model will now make it both smarter and a more effective collaborator.



Source_link

Related Posts

Humanoid robots: Can Tesla and other companies deliver on the hype?
Technology And Software

Humanoid robots: Can Tesla and other companies deliver on the hype?

May 12, 2026
Daybreak Is OpenAI’s Response To Anthropic’s Claude Mythos
Technology And Software

Daybreak Is OpenAI’s Response To Anthropic’s Claude Mythos

May 12, 2026
Testing for ‘Bad Cholesterol’ Doesn’t Tell the Whole Story
Technology And Software

Testing for ‘Bad Cholesterol’ Doesn’t Tell the Whole Story

May 11, 2026
Daniel Ek-backed defense tech Helsing to raise $1.2B at $18B valuation
Technology And Software

Daniel Ek-backed defense tech Helsing to raise $1.2B at $18B valuation

May 11, 2026
AI tool poisoning exposes a major flaw in enterprise agent security
Technology And Software

AI tool poisoning exposes a major flaw in enterprise agent security

May 11, 2026
Samsung’s Bespoke Update Is Big Step Towards A Useful AI For Your Fridge
Technology And Software

Samsung’s Bespoke Update Is Big Step Towards A Useful AI For Your Fridge

May 11, 2026
Next Post
LinkedIn Crossclimb Answer Today for May 12, 2026 (Puzzle #742)

LinkedIn Crossclimb Answer Today for May 12, 2026 (Puzzle #742)

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

November 4, 2025

EDITOR'S PICK

Unfiltered AI Companion Chatbots with Phone Calls: Top Picks

Unfiltered AI Companion Chatbots with Phone Calls: Top Picks

August 31, 2025
Novoloop is making tons of upcycled plastic

Novoloop is making tons of upcycled plastic

June 24, 2025

Communications Leadership Council Roundtable Recap: Redefining communications’ value

October 26, 2025
Agoda Open Sources APIAgent to Convert Any REST pr GraphQL API into an MCP Server with Zero Code

Agoda Open Sources APIAgent to Convert Any REST pr GraphQL API into an MCP Server with Zero Code

February 17, 2026

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • Top 15 White Label SEO Agencies to Outsource Projects
  • LinkedIn Crossclimb Answer Today for May 12, 2026 (Puzzle #742)
  • Thinking Machines shows off preview of near-realtime AI voice and video conversation with new 'interaction models'
  • Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions