• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Tuesday, December 2, 2025
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

IBM AI Releases Granite-Docling-258M: An Open-Source, Enterprise-Ready Document AI Model

Josh by Josh
September 18, 2025
in Al, Analytics and Automation
0
IBM AI Releases Granite-Docling-258M: An Open-Source, Enterprise-Ready Document AI Model
0
SHARES
1
VIEWS
Share on FacebookShare on Twitter


IBM has released Granite-Docling-258M, an open-source (Apache-2.0) vision-language model designed specifically for end-to-end document conversion. The model targets layout-faithful extraction—tables, code, equations, lists, captions, and reading order—emitting a structured, machine-readable representation rather than lossy Markdown. It is available on Hugging Face with a live demo and MLX build for Apple Silicon.

What’s new compared to SmolDocling?

Granite-Docling is the product-ready successor to SmolDocling-256M. IBM replaced the earlier backbone with a Granite 165M language model and upgraded the vision encoder to SigLIP2 (base, patch16-512) while retaining the Idefics3-style connector (pixel-shuffle projector). The resulting model has 258M parameters and shows consistent accuracy gains across layout analysis, full-page OCR, code, equations, and tables (see metrics below). IBM also addressed instability failure modes observed in the preview model (e.g., repetitive token loops).

READ ALSO

Study Shows ChatGPT and Gemini Still Trickable Despite Safety Training

MIT Sea Grant students explore the intersection of technology and offshore aquaculture in Norway | MIT News

Architecture and training pipeline

  • Backbone: Idefics3-derived stack with SigLIP2 vision encoder → pixel-shuffle connector → Granite 165M LLM.
  • Training framework: nanoVLM (lightweight, pure-PyTorch VLM training toolkit).
  • Representation: Outputs DocTags, an IBM-authored markup designed for unambiguous document structure (elements + coordinates + relationships), which downstream tools convert to Markdown/HTML/JSON.
  • Compute: Trained on IBM’s Blue Vela H100 cluster.

Quantified improvements (Granite-Docling-258M vs. SmolDocling-256M preview)

Evaluated with docling-eval, LMMS-Eval, and task-specific datasets:

  • Layout: MAP 0.27 vs. 0.23; F1 0.86 vs. 0.85.
  • Full-page OCR: F1 0.84 vs. 0.80; lower edit distance.
  • Code recognition: F1 0.988 vs. 0.915; edit distance 0.013 vs. 0.114.
  • Equation recognition: F1 0.968 vs. 0.947.
  • Table recognition (FinTabNet @150dpi): TEDS-structure 0.97 vs. 0.82; TEDS with content 0.96 vs. 0.76.
  • Other benchmarks: MMStar 0.30 vs. 0.17; OCRBench 500 vs. 338.
  • Stability: “Avoids infinite loops more effectively” (production-oriented fix).

Multilingual support

Granite-Docling adds experimental support for Japanese, Arabic, and Chinese. IBM marks this as early-stage; English remains the primary target.

How the DocTags pathway changes Document AI

Conventional OCR-to-Markdown pipelines lose structural information and complicate downstream retrieval-augmented generation (RAG). Granite-Docling emits DocTags—a compact, LLM-friendly structural grammar—which Docling converts into Markdown/HTML/JSON. This preserves table topology, inline/floating math, code blocks, captions, and reading order with explicit coordinates, improving index quality and grounding for RAG and analytics.

Inference and integration

  • Docling Integration (recommended): The docling CLI/SDK automatically pulls Granite-Docling and converts PDFs/office docs/images to multiple formats. IBM positions the model as a component inside Docling pipelines rather than a general VLM.
  • Runtimes: Works with Transformers, vLLM, ONNX, and MLX; a dedicated MLX build is optimized for Apple Silicon. A Hugging Face Space provides an interactive demo (ZeroGPU).
  • License: Apache-2.0.

Why Granite-Docling?

For enterprise document AI, small VLMs that preserve structure reduce inference cost and pipeline complexity. Granite-Docling replaces multiple single-purpose models (layout, OCR, table, code, equations) with a single component that emits a richer intermediate representation, improving downstream retrieval and conversion fidelity. The measured gains—in TEDS for tables, F1 for code/equations, and reduced instability—make it a practical upgrade from SmolDocling for production workflows.

Demo

Summary

Granite-Docling-258M marks a significant advancement in compact, structure-preserving document AI. By combining IBM’s Granite backbone, SigLIP2 vision encoder, and the nanoVLM training framework, it delivers enterprise-ready performance across tables, equations, code, and multilingual text—all while remaining lightweight and open-source under Apache 2.0. With measurable gains over its SmolDocling predecessor and seamless integration into Docling pipelines, Granite-Docling provides a practical foundation for document conversion and RAG workflows where precision and reliability are critical.


Check out the Models on Hugging Face and Demo here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.


Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.

🔥[Recommended Read] NVIDIA AI Open-Sources ViPE (Video Pose Engine): A Powerful and Versatile 3D Video Annotation Tool for Spatial AI



Source_link

Related Posts

Study Shows ChatGPT and Gemini Still Trickable Despite Safety Training
Al, Analytics and Automation

Study Shows ChatGPT and Gemini Still Trickable Despite Safety Training

December 2, 2025
MIT Sea Grant students explore the intersection of technology and offshore aquaculture in Norway | MIT News
Al, Analytics and Automation

MIT Sea Grant students explore the intersection of technology and offshore aquaculture in Norway | MIT News

December 2, 2025
MiniMax-M2: Technical Deep Dive into Interleaved Thinking for Agentic Coding Workflows
Al, Analytics and Automation

MiniMax-M2: Technical Deep Dive into Interleaved Thinking for Agentic Coding Workflows

December 2, 2025
Pretrain a BERT Model from Scratch
Al, Analytics and Automation

Pretrain a BERT Model from Scratch

December 1, 2025
How to Design an Advanced Multi-Page Interactive Analytics Dashboard with Dynamic Filtering, Live KPIs, and Rich Visual Exploration Using Panel
Al, Analytics and Automation

How to Design an Advanced Multi-Page Interactive Analytics Dashboard with Dynamic Filtering, Live KPIs, and Rich Visual Exploration Using Panel

December 1, 2025
The Journey of a Token: What Really Happens Inside a Transformer
Al, Analytics and Automation

The Journey of a Token: What Really Happens Inside a Transformer

December 1, 2025
Next Post
14 Best Cheap Cloud Storage Solutions of 2025

14 Best Cheap Cloud Storage Solutions of 2025

POPULAR NEWS

Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
7 Best EOR Platforms for Software Companies in 2025

7 Best EOR Platforms for Software Companies in 2025

June 21, 2025

EDITOR'S PICK

AI vs. AI: Prophet Security raises $30M to replace human analysts with autonomous defenders

AI vs. AI: Prophet Security raises $30M to replace human analysts with autonomous defenders

July 30, 2025
Let’s talk about tariffs: How marketers can brace for the looming future

Let’s talk about tariffs: How marketers can brace for the looming future

June 2, 2025
New York Is Doing Its Part to Make America Worth Visiting Again

New York Is Doing Its Part to Make America Worth Visiting Again

June 19, 2025
A Complete Guide on HealthTech Regulations for CTOs

A Complete Guide on HealthTech Regulations for CTOs

July 26, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • Predictive Crisis Communications Using AI and Real-Time Data
  • How To Fix Roblox Thinks You’re On Mobile
  • Arcee aims to reboot U.S. open source AI with new Trinity models released under Apache 2.0
  • Study Shows ChatGPT and Gemini Still Trickable Despite Safety Training
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions

Are you sure want to unlock this post?
Unlock left : 0
Are you sure want to cancel subscription?