• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Tuesday, June 9, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing

Josh by Josh
August 17, 2025
in Al, Analytics and Automation
0
Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing






dots.ocr is an open-source vision-language transformer model developed for multilingual document layout parsing and optical character recognition (OCR). It performs both layout detection and content recognition within a single architecture, supporting over 100 languages and a wide variety of structured and unstructured document types.

Architecture

  • Unified Model: dots.ocr combines layout detection and content recognition into a single transformer-based neural network. This eliminates the complexity of separate detection and OCR pipelines, allowing users to switch tasks by adjusting input prompts.
  • Parameters: The model contains 1.7 billion parameters, balancing computational efficiency with performance for most practical scenarios.
  • Input Flexibility: Inputs can be image files or PDF documents. The model features preprocessing options (such as fitz_preprocess) for optimizing quality on low-resolution or dense multi-page files.

Capabilities

  • Multilingual: dots.ocr is trained on datasets spanning more than 100 languages, including major world languages and less common scripts, reflecting broad multilingual support.
  • Content Extraction: The model extracts plain text, tabular data, mathematical formulas (in LaTeX), and preserves reading order within documents. Output formats include structured JSON, Markdown, and HTML, depending on the layout and content type.
  • Preserves Structure: dots.ocr maintains document structure, including table boundaries, formula regions, and image placements, ensuring extracted data remains faithful to the original document.

Benchmark Performance

dots.ocr has been evaluated against modern document AI systems, with results summarized below:

Benchmark dots.ocr Gemini2.5-Pro
Table TEDS accuracy 88.6% 85.8%
Text edit distance 0.032 0.055
  • Tables: Outperforms Gemini2.5-Pro in table parsing accuracy.
  • Text: Demonstrates lower text edit distance (indicating higher precision).
  • Formulas and Layout: Matches or exceeds leading models in formula recognition and document structure reconstruction.
https://github.com/rednote-hilab/dots.ocr/blob/master/assets/blog.md

Deployment and Integration

  • Open-Source: Released under the MIT license, with source, documentation, and pre-trained models available on GitHub. The repository provides installation instructions for pip, Conda, and Docker-based deployments.
  • API and Scripting: Supports flexible task configuration via prompt templates. The model can be used interactively or within automated pipelines for batch document processing.
  • Output Formats: Extracted results are supplied in structured JSON for programmatic use, with options for Markdown and HTML where appropriate. Visualization scripts enable inspection of detected layouts.

Conclusion

dots.ocr provides a technical solution for high-accuracy, multilingual document parsing by unifying layout detection and content recognition in a single, open-source model. It is particularly suited for scenarios requiring robust, language-agnostic document analysis and structured information extraction in resource-constrained or production environments.


Check out the GitHub Page. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.






Previous articleAmazon Unveils Bedrock AgentCore Gateway: Redefining Enterprise AI Agent Tool Integration




Source_link

READ ALSO

ClawHub Security Signals: A Coding Guide to End-to-End Security Signal Analysis and Verdict Classification on the AI Skills Dataset

Microsoft AI Introduces MAI-Transcribe-1.5: 2.4% WER on Artificial Analysis, Best-in-Class FLEURS Accuracy, and Up to 5x Faster Long-Audio Transcription

Related Posts

ClawHub Security Signals: A Coding Guide to End-to-End Security Signal Analysis and Verdict Classification on the AI Skills Dataset
Al, Analytics and Automation

ClawHub Security Signals: A Coding Guide to End-to-End Security Signal Analysis and Verdict Classification on the AI Skills Dataset

June 8, 2026
Microsoft AI Introduces MAI-Transcribe-1.5: 2.4% WER on Artificial Analysis, Best-in-Class FLEURS Accuracy, and Up to 5x Faster Long-Audio Transcription
Al, Analytics and Automation

Microsoft AI Introduces MAI-Transcribe-1.5: 2.4% WER on Artificial Analysis, Best-in-Class FLEURS Accuracy, and Up to 5x Faster Long-Audio Transcription

June 8, 2026
Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation
Al, Analytics and Automation

Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation

June 7, 2026
Best 21 Low-Code and No-Code AI Tools in 2026
Al, Analytics and Automation

Best 21 Low-Code and No-Code AI Tools in 2026

June 7, 2026
Tod Machover receives George Peabody Medal for contributions to music and technology | MIT News
Al, Analytics and Automation

Tod Machover receives George Peabody Medal for contributions to music and technology | MIT News

June 6, 2026
Moonshot AI Releases Kimi Code CLI: A Terminal AI Coding Agent Built in TypeScript for Next-Gen Agents
Al, Analytics and Automation

Moonshot AI Releases Kimi Code CLI: A Terminal AI Coding Agent Built in TypeScript for Next-Gen Agents

June 6, 2026
Next Post
5 Best Modern Virtualization Solutions (Updated)

5 Best Modern Virtualization Solutions (Updated)

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

November 4, 2025

EDITOR'S PICK

Google updating its ‘G’ icon for the first time in 10 years

Google updating its ‘G’ icon for the first time in 10 years

May 31, 2025
My Top 9 Free File Recovery Software Picks for 2026

My Top 9 Free File Recovery Software Picks for 2026

December 31, 2025
Save $60 on the DJI Mic Mini Kit—Price Drops to Just $109 Today

Save $60 on the DJI Mic Mini Kit—Price Drops to Just $109 Today

September 9, 2025
My Picks for Legal Teams

My Picks for Legal Teams

December 27, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • The Stella Artois Clay Bar, Maple Street’s Biscuit Blaster
  • The Scoop: Tim Cook makes a play for his legacy at final WWDC
  • 12 best online reputation management tools for 2026
  • Researchers trained an open source AI search agent, Harness-1, that outperforms GPT-5.4 on recalling relevant information
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions