• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Monday, June 8, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

Meet CoAct-1: A Novel Multi-Agent System that Synergistically Combines GUI-based Control with Direct Programmatic Execution

Josh by Josh
August 8, 2025
in Al, Analytics and Automation
0
Meet CoAct-1: A Novel Multi-Agent System that Synergistically Combines GUI-based Control with Direct Programmatic Execution


A Team of researchers from USC, Salesforce AI and University of Washington have introduced CoAct-1, a pioneering multi-agent computer-using agent (CUA) that marks a significant leap in autonomous computer operation. By elevating coding to a first-class action—on par with traditional GUI manipulation—CoAct-1 overcomes longstanding challenges of efficiency and reliability in complex, long-horizon computer tasks. On the demanding OSWorld benchmark, CoAct-1 sets a new gold standard, achieving a state-of-the-art (SOTA) success rate of 60.76%, making it the first CUA agent to surpass the 60% mark.

Why CoAct-1? Bridging the Efficiency Gap in Computer-Using Agents

Conventional CUA agents rely solely on pixel-based GUI interaction—emulating human users by clicking, typing, and navigating interfaces. While this approach mimics user workflows, it proves fragile and inefficient for intricate, multi-step tasks, especially those involving dense UI layouts, multi-app pipelines, or complex OS operations. Single errors such as a mis-click can derail entire workflows, and sequence lengths balloon as tasks increase in complexity.

READ ALSO

Microsoft AI Introduces MAI-Transcribe-1.5: 2.4% WER on Artificial Analysis, Best-in-Class FLEURS Accuracy, and Up to 5x Faster Long-Audio Transcription

Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation

Efforts to mitigate these issues have included augmenting GUI agents with high-level planners, as seen in systems like GTA-1 and modular multi-agent frameworks. However, these methods cannot escape the bottleneck of GUI-centric action spaces, ultimately limiting both efficiency and robustness.

CoAct-1: Hybrid Architecture with Coding as Action

CoAct-1 takes a fundamentally different approach by integrating three specialized agents:

  • Orchestrator: The high-level planner that decomposes complex tasks and dynamically delegates each subtask either to the Programmer or the GUI Operator based on task requirements.
  • Programmer: Executes backend operations—file management, data processing, environment configuration—directly via Python or Bash scripts, bypassing cumbersome GUI action sequences.
  • GUI Operator: Uses a vision-language model to interact with visual interfaces when human-like UI navigation is indispensable.

This hybrid model enables CoAct-1 to strategically substitute brittle and lengthy mouse-keyboard operations with concise, reliable code execution, while still leveraging GUI interactions where necessary.

Evaluation on OSWorld: Record-Setting Performance

OSWorld—a leading benchmark featuring 369 tasks spanning office productivity, IDEs, browsers, file managers, and multi-app workflows—proves an exacting testbed for agentic systems. Each task mirrors real-world language goals and is assessed by a granular rule-based scoring system.

Results

  • Overall SOTA Success Rate: CoAct-1 achieves 60.76% on the 100+ step category—the first CUA agent to cross the 60-point threshold. This outpaces GTA-1 (53.10%), OpenAI CUA 4o (31.40%), UI-TARS-1.5 (29.60%), and other leading frameworks.
  • Stepped Allowance Performance: At a 100-step budget, CoAct-1 scores 59.93%, again leading all competitors.
  • Efficiency: Completes tasks with an average of 10.15 steps per successful task, compared to 15.22 for GTA-1, 14.90 for UI-TARS, and with much higher success than OpenAI CUA 4o, which, despite fewer steps (6.14), achieves only 31.40% success.

Breakdown

CoAct-1 dominates across task types, with especially large gains in workflows benefitting from code execution:

  • Multi-App: 47.88% (vs. GTA-1’s 38.34%)
  • OS Tasks: 75.00%
  • VLC: 66.07%
  • In productivity and IDE domains (LibreOffice Calc, Writer, VSCode), it consistently leads or ties with the SOTA.

Key Insights: What Drives CoAct-1’s Gains?

  • Coding Actions Replace Redundant GUI Sequences: For operations like batch image resizing or advanced file manipulations, single scripts replace dozens of error-prone clicks, reducing both steps and risk of failure.
  • Dynamic Delegation: The Orchestrator’s flexible task assignment ensures optimal use of coding vs. GUI actions.
  • Improvement with Stronger Backbones: The best configuration uses OpenAI CUA 4o for the GUI Operator, OpenAI o3 for the Orchestrator, and o4-mini for the Programmer, reaching the top 60.76% score. Systems using only smaller or less capable backbones score significantly lower.
  • Efficiency Correlates with Reliability: Fewer steps directly reduce opportunities for error—the single strongest predictor of successful completion.

Conclusion: A Leap Forward in Generalized Computer Automation

By making coding a first-class system action alongside GUI manipulation, CoAct-1 delivers both a quantum leap in success and efficiency, and illustrates the practical path forward for scalable, reliable autonomous computer agents. Its hybrid architecture and dynamic execution logic set a new high-water mark for the CUA field, heralding robust advances in real-world computer automation.


Check out the Paper and Technical details. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.


Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.



Source_link

Related Posts

Microsoft AI Introduces MAI-Transcribe-1.5: 2.4% WER on Artificial Analysis, Best-in-Class FLEURS Accuracy, and Up to 5x Faster Long-Audio Transcription
Al, Analytics and Automation

Microsoft AI Introduces MAI-Transcribe-1.5: 2.4% WER on Artificial Analysis, Best-in-Class FLEURS Accuracy, and Up to 5x Faster Long-Audio Transcription

June 8, 2026
Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation
Al, Analytics and Automation

Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation

June 7, 2026
Best 21 Low-Code and No-Code AI Tools in 2026
Al, Analytics and Automation

Best 21 Low-Code and No-Code AI Tools in 2026

June 7, 2026
Tod Machover receives George Peabody Medal for contributions to music and technology | MIT News
Al, Analytics and Automation

Tod Machover receives George Peabody Medal for contributions to music and technology | MIT News

June 6, 2026
Moonshot AI Releases Kimi Code CLI: A Terminal AI Coding Agent Built in TypeScript for Next-Gen Agents
Al, Analytics and Automation

Moonshot AI Releases Kimi Code CLI: A Terminal AI Coding Agent Built in TypeScript for Next-Gen Agents

June 6, 2026
The crucial human component in computing and AI | MIT News
Al, Analytics and Automation

The crucial human component in computing and AI | MIT News

June 6, 2026
Next Post
Performance Marketing Definition, Principles, and Importance

Performance Marketing Definition, Principles, and Importance

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

November 4, 2025

EDITOR'S PICK

8 Best Docker Containers for Home Servers in 2025

8 Best Docker Containers for Home Servers in 2025

July 24, 2025
Could Contact-Tracing Apps Help With the Hantavirus? Not Really

Could Contact-Tracing Apps Help With the Hantavirus? Not Really

May 11, 2026
Why your LLM bill is exploding — and how semantic caching can cut it by 73%

Why your LLM bill is exploding — and how semantic caching can cut it by 73%

January 11, 2026
Average email marketing conversion rate in 2025 (+ pro tips)

Average email marketing conversion rate in 2025 (+ pro tips)

June 3, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • We Built a Custom Workflow with the Buffer API — and Tripled Our X Impressions
  • Apple Reintroduces The AI-Powered Siri It Announced At WWDC 2024
  • How to Level-up From SEO Tactician to Search Visibility Leader
  • NotebookLM’s Gemini 3.5 upgrade adds a cloud computer and help finding sources
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions