mGrowTech

New vision model from Cohere runs on two GPUs, beats top-tier VLMs on visual tasks

by Josh
August 2, 2025
in Technology And Software

The rise of Deep Research features and other AI-powered analysis tools has spurred more models and services that aim to simplify research and analysis and to read more of the documents businesses actually use.

Canadian AI company Cohere is banking on its models, including a newly released vision model, to make the case that Deep Research features should also be optimized for enterprise use cases.

The company has released Command A Vision, a vision model specifically targeting enterprise use cases and built on its Command A model. The 112-billion-parameter model can “unlock valuable insights from visual data, and make highly accurate, data-driven decisions through document optical character recognition (OCR) and image analysis,” the company says.

“Whether it’s interpreting product manuals with complex diagrams or analyzing photographs of real-world scenes for risk detection, Command A Vision excels at tackling the most demanding enterprise vision challenges,” the company said in a blog post.


This means Command A Vision can read and analyze the most common types of images enterprises need: graphs, charts, diagrams, scanned documents and PDFs.

@cohere just dropped Command A Vision on @huggingface

Designed for enterprise multimodal use cases: interpreting product manuals, analyzing photos, asking about charts…

A 112B dense vision-language model with SOTA performance – check out the benchmark metrics in… pic.twitter.com/ORMfM5f8cF

— Jeff Boudier (@jeffboudier) July 31, 2025

Because it is built on Command A’s architecture, Command A Vision, like the text model, runs on two or fewer GPUs. The vision model also retains Command A’s text capabilities, so it can read words in images, and it understands at least 23 languages. Cohere said that, unlike other models, Command A Vision reduces the total cost of ownership for enterprises and is fully optimized for business retrieval use cases.
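
For a rough sense of what the two-GPU figure implies, the sketch below estimates how much memory a 112-billion-parameter model’s weights need at a few common numeric precisions. It assumes 80 GB GPUs and counts weights only; the article does not say which GPUs or precision Cohere’s figure is based on, and a real deployment also needs room for the KV cache and activations.

```python
# Back-of-the-envelope weight-memory estimate for a dense model.
# Counts weights only; KV cache, activations and framework overhead are ignored.
PARAMS = 112e9  # parameter count reported for Command A Vision

# Bytes per parameter at common precisions. Assumption: the article does not
# state which precision Cohere's two-GPU figure uses.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1, "int4": 0.5}

GPU_MEMORY_GB = 80  # assumption: an 80 GB data-center GPU
NUM_GPUS = 2


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes (1e9 bytes)."""
    return params * bytes_per_param / 1e9


def fits(params: float, bytes_per_param: float,
         gpus: int = NUM_GPUS, gpu_gb: float = GPU_MEMORY_GB) -> bool:
    """True if the weights alone fit in the combined memory of `gpus` cards."""
    return weight_memory_gb(params, bytes_per_param) <= gpus * gpu_gb


for precision, nbytes in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(PARAMS, nbytes)
    print(f"{precision}: ~{gb:.0f} GB of weights, "
          f"fits on {NUM_GPUS} x {GPU_MEMORY_GB} GB: {fits(PARAMS, nbytes)}")
```

Under these assumptions, 16-bit weights (about 224 GB) would not fit on two 80 GB cards, while 8-bit weights (about 112 GB) would.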

How Cohere is architecting Command A

Cohere said it followed a LLaVA architecture to build Command A Vision. This architecture converts visual features into soft vision tokens, and images can be split into multiple tiles.

The tokens from these tiles are then passed into the Command A text tower, “a dense, 111B parameters textual LLM,” the company said. “In this manner, a single image consumes up to 3,328 tokens.”
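
The 3,328-token ceiling factors neatly as 13 × 256. The sketch below assumes, purely for illustration, that an image is cut into at most 12 fixed-size tiles plus one low-resolution thumbnail, each encoded as 256 soft tokens. The tile size, tile cap and per-tile token count are hypothetical values chosen to reproduce Cohere’s figure, not parameters the article publishes, and the tiling is simplified (real pipelines usually resize the image to a tile grid first).

```python
import math

# Hypothetical parameters chosen so the maximum matches the 3,328-token
# figure Cohere reports (13 * 256 = 3,328). Not published values.
TILE_SIZE = 512        # pixels per tile side (assumption)
MAX_TILES = 12         # cap on high-resolution tiles (assumption)
TOKENS_PER_TILE = 256  # soft vision tokens per tile (assumption)
USE_THUMBNAIL = True   # extra low-resolution global view (assumption)


def num_tiles(width: int, height: int) -> int:
    """Tiles needed to cover the image at TILE_SIZE, capped at MAX_TILES."""
    cols = math.ceil(width / TILE_SIZE)
    rows = math.ceil(height / TILE_SIZE)
    return min(cols * rows, MAX_TILES)


def image_tokens(width: int, height: int) -> int:
    """Soft vision tokens one image would occupy in the text tower's context."""
    tiles = num_tiles(width, height) + (1 if USE_THUMBNAIL else 0)
    return tiles * TOKENS_PER_TILE


print(image_tokens(512, 512))    # 512: one tile plus the thumbnail
print(image_tokens(2480, 3508))  # 3328: an A4 scan at 300 dpi hits the tile cap
```

Under this model, token cost grows with resolution until the tile cap is reached, after which every larger image costs the same 3,328 tokens.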

Cohere said it trained the visual model in three stages: vision-language alignment, supervised fine-tuning (SFT) and post-training reinforcement learning with human feedback (RLHF).

“This approach enables the mapping of image encoder features to the language model embedding space,” the company said of the alignment stage. “In contrast, during the SFT stage, we simultaneously trained the vision encoder, the vision adapter and the language model on a diverse set of instruction-following multimodal tasks.”
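
The contrast Cohere draws between the first two training stages can be written down as a table of which components are updated. In this sketch, the SFT row follows the company’s description; the alignment row (adapter only, with the encoder and language model frozen) is an assumption based on the common LLaVA recipe, and RLHF is left out because the article does not say what it updates. The component names are hypothetical labels.

```python
COMPONENTS = ("vision_encoder", "vision_adapter", "language_model")

# Components that receive gradient updates in each stage.
# "sft" follows Cohere's description (all three trained together).
# "alignment" is an assumption based on the usual LLaVA recipe; the article
# only says this stage maps encoder features into the LM embedding space.
# RLHF is omitted because the article does not say which parts it updates.
TRAINABLE_BY_STAGE = {
    "alignment": {"vision_adapter"},
    "sft": {"vision_encoder", "vision_adapter", "language_model"},
}


def trainable_flags(stage: str) -> dict[str, bool]:
    """Map each component to True (updated) or False (frozen) for `stage`."""
    if stage not in TRAINABLE_BY_STAGE:
        raise ValueError(f"unknown or unspecified stage: {stage!r}")
    active = TRAINABLE_BY_STAGE[stage]
    return {name: name in active for name in COMPONENTS}


for stage in ("alignment", "sft"):
    print(stage, trainable_flags(stage))
```

In a framework such as PyTorch, these flags would correspond to setting `requires_grad` on each submodule’s parameters before the stage begins.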

Visualizing enterprise AI

Benchmark tests showed Command A Vision outperforming other models with similar visual capabilities.

Cohere pitted Command A Vision against OpenAI’s GPT-4.1, Meta’s Llama 4 Maverick, and Mistral’s Pixtral Large and Mistral Medium 3 across nine benchmark tests. The company did not say whether it tested the model against Mistral’s OCR-focused API, Mistral OCR.

It enables agents to securely see inside your organization’s visual data, unlocking the automation of tedious tasks involving slides, diagrams, PDFs, and photos. pic.twitter.com/iHZnUWekrk

— cohere (@cohere) July 31, 2025

Command A Vision outscored the other models on tests such as ChartQA, OCRBench, AI2D and TextVQA. Overall, Command A Vision averaged 83.1%, compared with 78.6% for GPT-4.1, 80.5% for Llama 4 Maverick and 78.3% for Mistral Medium 3.
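
The reported averages are easier to compare as percentage-point gaps. This small calculation uses only the figures Cohere reported; Pixtral Large is omitted because no average is given for it here.

```python
# Average benchmark scores (%) as reported by Cohere.
AVERAGES = {
    "Command A Vision": 83.1,
    "Llama 4 Maverick": 80.5,
    "GPT-4.1": 78.6,
    "Mistral Medium 3": 78.3,
}


def margins(scores: dict[str, float], leader: str) -> dict[str, float]:
    """Percentage-point lead of `leader` over every other model, rounded to 0.1."""
    base = scores[leader]
    return {model: round(base - score, 1)
            for model, score in scores.items() if model != leader}


print(margins(AVERAGES, "Command A Vision"))
```

On these numbers, the lead ranges from 2.6 points over Llama 4 Maverick to 4.8 points over Mistral Medium 3.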

Most large language models (LLMs) today are multimodal, meaning they can generate or understand visual media such as photos or videos. Enterprises, however, tend to rely on more graphical documents such as charts and PDFs, and extracting information from these unstructured data sources often proves difficult.

With Deep Research on the rise, models that can read, analyze and even download unstructured data have become more important.

Cohere also said it is offering Command A Vision as an open-weights model, in hopes that enterprises looking to move away from closed or proprietary models will start using its products. Early reactions show some interest from developers.

Very impressed at its accuracy extracting hand handwritten notes from an image!

— Adam Sardo (@sardo_adam) July 31, 2025

Finally, an AI that won’t judge my terrible doodles.

— Martha Wisener (@martwisener) August 1, 2025
