• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Sunday, June 14, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

IBM AI Releases Granite-Docling-258M: An Open-Source, Enterprise-Ready Document AI Model

Josh by Josh
September 18, 2025
in Al, Analytics and Automation
0
IBM AI Releases Granite-Docling-258M: An Open-Source, Enterprise-Ready Document AI Model


IBM has released Granite-Docling-258M, an open-source (Apache-2.0) vision-language model designed specifically for end-to-end document conversion. The model targets layout-faithful extraction—tables, code, equations, lists, captions, and reading order—emitting a structured, machine-readable representation rather than lossy Markdown. It is available on Hugging Face with a live demo and MLX build for Apple Silicon.

What’s new compared to SmolDocling?

Granite-Docling is the product-ready successor to SmolDocling-256M. IBM replaced the earlier backbone with a Granite 165M language model and upgraded the vision encoder to SigLIP2 (base, patch16-512) while retaining the Idefics3-style connector (pixel-shuffle projector). The resulting model has 258M parameters and shows consistent accuracy gains across layout analysis, full-page OCR, code, equations, and tables (see metrics below). IBM also addressed instability failure modes observed in the preview model (e.g., repetitive token loops).

READ ALSO

Databricks Open-Sources Omnigent: A Meta-Harness That Composes, Governs, and Shares AI Agents Across Claude Code, Codex, and Pi

Serving Multiple Users at Once: How Continuous Batching Keeps LLM Inference Efficient

Architecture and training pipeline

  • Backbone: Idefics3-derived stack with SigLIP2 vision encoder → pixel-shuffle connector → Granite 165M LLM.
  • Training framework: nanoVLM (lightweight, pure-PyTorch VLM training toolkit).
  • Representation: Outputs DocTags, an IBM-authored markup designed for unambiguous document structure (elements + coordinates + relationships), which downstream tools convert to Markdown/HTML/JSON.
  • Compute: Trained on IBM’s Blue Vela H100 cluster.

Quantified improvements (Granite-Docling-258M vs. SmolDocling-256M preview)

Evaluated with docling-eval, LMMS-Eval, and task-specific datasets:

  • Layout: MAP 0.27 vs. 0.23; F1 0.86 vs. 0.85.
  • Full-page OCR: F1 0.84 vs. 0.80; lower edit distance.
  • Code recognition: F1 0.988 vs. 0.915; edit distance 0.013 vs. 0.114.
  • Equation recognition: F1 0.968 vs. 0.947.
  • Table recognition (FinTabNet @150dpi): TEDS-structure 0.97 vs. 0.82; TEDS with content 0.96 vs. 0.76.
  • Other benchmarks: MMStar 0.30 vs. 0.17; OCRBench 500 vs. 338.
  • Stability: “Avoids infinite loops more effectively” (production-oriented fix).

Multilingual support

Granite-Docling adds experimental support for Japanese, Arabic, and Chinese. IBM marks this as early-stage; English remains the primary target.

How the DocTags pathway changes Document AI

Conventional OCR-to-Markdown pipelines lose structural information and complicate downstream retrieval-augmented generation (RAG). Granite-Docling emits DocTags—a compact, LLM-friendly structural grammar—which Docling converts into Markdown/HTML/JSON. This preserves table topology, inline/floating math, code blocks, captions, and reading order with explicit coordinates, improving index quality and grounding for RAG and analytics.

Inference and integration

  • Docling Integration (recommended): The docling CLI/SDK automatically pulls Granite-Docling and converts PDFs/office docs/images to multiple formats. IBM positions the model as a component inside Docling pipelines rather than a general VLM.
  • Runtimes: Works with Transformers, vLLM, ONNX, and MLX; a dedicated MLX build is optimized for Apple Silicon. A Hugging Face Space provides an interactive demo (ZeroGPU).
  • License: Apache-2.0.

Why Granite-Docling?

For enterprise document AI, small VLMs that preserve structure reduce inference cost and pipeline complexity. Granite-Docling replaces multiple single-purpose models (layout, OCR, table, code, equations) with a single component that emits a richer intermediate representation, improving downstream retrieval and conversion fidelity. The measured gains—in TEDS for tables, F1 for code/equations, and reduced instability—make it a practical upgrade from SmolDocling for production workflows.

Demo

Summary

Granite-Docling-258M marks a significant advancement in compact, structure-preserving document AI. By combining IBM’s Granite backbone, SigLIP2 vision encoder, and the nanoVLM training framework, it delivers enterprise-ready performance across tables, equations, code, and multilingual text—all while remaining lightweight and open-source under Apache 2.0. With measurable gains over its SmolDocling predecessor and seamless integration into Docling pipelines, Granite-Docling provides a practical foundation for document conversion and RAG workflows where precision and reliability are critical.


Check out the Models on Hugging Face and Demo here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.


Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.

🔥[Recommended Read] NVIDIA AI Open-Sources ViPE (Video Pose Engine): A Powerful and Versatile 3D Video Annotation Tool for Spatial AI



Source_link

Related Posts

Databricks Open-Sources Omnigent: A Meta-Harness That Composes, Governs, and Shares AI Agents Across Claude Code, Codex, and Pi
Al, Analytics and Automation

Databricks Open-Sources Omnigent: A Meta-Harness That Composes, Governs, and Shares AI Agents Across Claude Code, Codex, and Pi

June 14, 2026
Serving Multiple Users at Once: How Continuous Batching Keeps LLM Inference Efficient
Al, Analytics and Automation

Serving Multiple Users at Once: How Continuous Batching Keeps LLM Inference Efficient

June 14, 2026
How to Build a QwenPaw Agent Workspace with Custom Skills, Model Providers, Console Access, and Streaming API Testing
Al, Analytics and Automation

How to Build a QwenPaw Agent Workspace with Custom Skills, Model Providers, Console Access, and Streaming API Testing

June 14, 2026
The Roadmap for Mastering LLMOps in 2026
Al, Analytics and Automation

The Roadmap for Mastering LLMOps in 2026

June 13, 2026
When it comes to predicting people’s preferences, it pays to consider “the power of three” | MIT News
Al, Analytics and Automation

When it comes to predicting people’s preferences, it pays to consider “the power of three” | MIT News

June 13, 2026
Moonshot AI Releases Kimi K2.7-Code: a Coding Model Reporting +21.8% on Kimi Code Bench v2 Over K2.6
Al, Analytics and Automation

Moonshot AI Releases Kimi K2.7-Code: a Coding Model Reporting +21.8% on Kimi Code Bench v2 Over K2.6

June 13, 2026
Next Post
14 Best Cheap Cloud Storage Solutions of 2025

14 Best Cheap Cloud Storage Solutions of 2025

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

November 4, 2025

EDITOR'S PICK

20 SEO Automation Tools Worth Using in 2026

20 SEO Automation Tools Worth Using in 2026

May 15, 2026
Supercharge your integration workflow with the Google Pay & Wallet Developer MCP server

Supercharge your integration workflow with the Google Pay & Wallet Developer MCP server

May 29, 2026
The Role of AI Virtual Health Assistant in Healthcare

The Role of AI Virtual Health Assistant in Healthcare

July 7, 2025
eCommerce Logistics for Australian Businesses: Key Strategies

eCommerce Logistics for Australian Businesses: Key Strategies

August 5, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • How to Choose a Crisis Management PR Agency
  • Why communicators are trading employee engagement for employee experience
  • Age, Gender, and Placement Restrictions
  • As AI companies race to go public, who else is along for the ride?
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions