Thursday, April 2, 2026
mGrowTech

IBM Releases Granite 4.0 3B Vision: A New Vision Language Model for Enterprise Grade Document Data Extraction

by Josh
April 2, 2026
in AI, Analytics and Automation

IBM has announced the release of Granite 4.0 3B Vision, a vision-language model (VLM) engineered specifically for enterprise-grade document data extraction. Departing from the monolithic approach of larger multimodal models, the 4.0 Vision release is architected as a specialized adapter designed to bring high-fidelity visual reasoning to the Granite 4.0 Micro language backbone.

This release represents a transition toward modular, extraction-focused AI that prioritizes structured data accuracy—such as converting complex charts to code or tables to HTML—over general-purpose image captioning.

Architecture: Modular LoRA and DeepStack Integration

The Granite 4.0 3B Vision model is delivered as a LoRA (Low-Rank Adaptation) adapter with approximately 0.5B parameters. This adapter is designed to be loaded on top of the Granite 4.0 Micro base model, a 3.5B parameter dense language model. This design allows for a ‘dual-mode’ deployment: the base model can handle text-only requests independently, while the vision adapter is activated only when multimodal processing is required.
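The dual-mode idea can be illustrated with a toy LoRA layer. This is a minimal sketch in plain Python, not IBM's implementation: the class `LoRALinear`, the function `handle_request`, and the `scale` parameter are hypothetical names chosen for illustration. It only shows the general LoRA pattern, where a low-rank update is added to a frozen base weight when the adapter is switched on.

```python
from dataclasses import dataclass


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


@dataclass
class LoRALinear:
    """Toy linear layer with an optional low-rank (LoRA) update.

    All names here are illustrative assumptions, not Granite internals.
    """
    weight: list          # frozen base weight, shape d_out x d_in
    lora_a: list          # down-projection, shape r x d_in
    lora_b: list          # up-projection, shape d_out x r
    scale: float = 1.0    # LoRA scaling factor (often alpha / r)
    adapter_enabled: bool = False

    def forward(self, x):
        out = matvec(self.weight, x)
        if self.adapter_enabled:
            # Low-rank path: B @ (A @ x), scaled and added to the base output.
            delta = matvec(self.lora_b, matvec(self.lora_a, x))
            out = [o + self.scale * d for o, d in zip(out, delta)]
        return out


def handle_request(layer, x, has_image):
    """Enable the adapter only when the request carries an image."""
    layer.adapter_enabled = has_image
    return layer.forward(x)
```

With the adapter disabled, the layer returns exactly the base model's output, which is what lets one deployment serve text-only traffic without paying for the vision path.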

Vision Encoder and Patch Tiling

The visual component utilizes the google/siglip2-so400m-patch16-384 encoder. To maintain high resolution across diverse document layouts, the model employs a tiling mechanism. Input images are decomposed into 384×384 patches, which are processed alongside a downscaled global view of the entire image. This approach ensures that fine details—such as subscripts in formulas or small data points in charts—are preserved before they reach the language backbone.
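As a rough illustration of the tiling step, the sketch below computes tile coordinates and a global-view size for a given image. Only the 384×384 tile size comes from the article. The non-overlapping grid, the clipping of edge tiles, and the function names `tile_boxes` and `global_view_size` are assumptions, and the actual preprocessing may pad or choose grids differently.

```python
import math

TILE = 384  # tile edge length stated in the article


def tile_boxes(width, height, tile=TILE):
    """Return (left, top, right, bottom) boxes in row-major order.

    Assumes a simple non-overlapping grid; edge tiles are clipped to the
    image bounds and would need padding before being encoded.
    """
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * tile, r * tile
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes


def global_view_size(width, height, tile=TILE):
    """Size of a downscaled whole-image view whose longer side fits one tile."""
    scale = min(1.0, tile / max(width, height))
    return (max(1, round(width * scale)), max(1, round(height * scale)))
```

For an 800×400 page scan, this produces a 3×2 grid of six tiles plus one global view, so small glyphs stay at native resolution in the tiles while the global view keeps the overall layout.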

The DeepStack Backbone

To bridge the vision and language modalities, IBM utilizes a variant of the DeepStack architecture. This involves deeply stacking visual tokens into the language model across 8 specific injection points. By routing visual features into multiple layers of the transformer, the model achieves a tighter alignment between the ‘what’ (semantic content) and the ‘where’ (spatial layout), which is critical for maintaining structure during document parsing.
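The multi-layer injection can be sketched with a toy forward pass. The article states only that there are 8 injection points. Which layers receive them, and the use of a residual addition at the visual-token positions, are assumptions made here for illustration, as are the function names `evenly_spaced_layers` and `run_with_injection`.

```python
def evenly_spaced_layers(num_layers, num_points=8):
    """Pick injection layer indices spread evenly over the stack (an assumption)."""
    return [i * num_layers // num_points for i in range(num_points)]


def run_with_injection(hidden, layers, visual_feats, injection_layers, visual_positions):
    """Run a stack of layers, adding visual features at selected depths.

    hidden:            list of token vectors (lists of floats)
    layers:            list of callables mapping hidden -> hidden
    visual_feats:      one feature set per injection point; each set holds
                       one vector per visual token position
    injection_layers:  layer indices that receive an injection
    visual_positions:  indices of the visual tokens within `hidden`
    """
    hidden = [list(token) for token in hidden]  # copy so the caller's data is untouched
    for idx, layer in enumerate(layers):
        if idx in injection_layers:
            feats = visual_feats[injection_layers.index(idx)]
            for pos, feat in zip(visual_positions, feats):
                # Residual addition at the visual token positions only.
                hidden[pos] = [h + f for h, f in zip(hidden[pos], feat)]
        hidden = layer(hidden)
    return hidden
```

Because each injection point can carry a different feature set, a design like this lets different depths of the language model see different levels of visual detail, while text tokens pass through unchanged.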

Training Curriculum: Focused on Chart and Table Extraction

The training of Granite 4.0 3B Vision reflects a strategic shift toward specialized extraction tasks. Rather than relying solely on general image-text datasets, IBM utilized a curated mixture of instruction-following data focused on complex document structures.

  • ChartNet Dataset: The model was refined using ChartNet, a million-scale multimodal dataset designed for robust chart understanding.
  • Code-Guided Pipeline: A key technical highlight of the training involves a “code-guided” approach for chart reasoning. This pipeline uses aligned data consisting of the original plotting code, the resulting rendered image, and the underlying data table, allowing the model to learn the structural relationship between visual representations and their source data.
  • Extraction Tuning: The model was fine-tuned on a mixture of datasets focusing on Key-Value Pair (KVP) extraction, table structure recognition, and converting visual charts into machine-readable formats like CSV, JSON, and OTSL.
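The aligned triple described in the code-guided pipeline (plotting code, rendered image, data table) can be pictured as a simple record from which supervised examples are derived. The sketch below is hypothetical: the `ChartSample` fields, the prompt wording, and the `chart2csv_example` helper are illustrative and do not reflect ChartNet's actual schema.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ChartSample:
    """One aligned training triple (field names are assumptions)."""
    plotting_code: str   # source code that rendered the chart
    image_path: str      # path to the rendered image
    table: list          # underlying data as rows; first row is the header


def table_to_csv(rows):
    """Serialize table rows to a CSV string with Unix line endings."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def chart2csv_example(sample):
    """Build a chart-to-CSV instruction example from an aligned triple."""
    return {
        "image": sample.image_path,
        "prompt": "Convert this chart to CSV.",  # hypothetical prompt text
        "target": table_to_csv(sample.table),
    }
```

The same triple could also yield a chart-to-code example by using `plotting_code` as the target, which is one way a single aligned record can supervise several extraction tasks.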

Performance and Evaluation Benchmarks

In technical evaluations, Granite 4.0 3B Vision has been benchmarked against several industry-standard suites for document understanding. Datasets such as PubTables-v2 and OmniDocBench are used as evaluation benchmarks to verify the model's zero-shot performance in real-world scenarios.

Task | Evaluation Benchmark | Metric
KVP Extraction | VAREX | 85.5% Exact Match (Zero-Shot)
Chart Reasoning | ChartNet (Human-Verified Test Set) | High Accuracy in Chart2Summary
Table Extraction | TableVQA-Bench & OmniDocBench | Evaluated via TEDS and HTML extraction

The model currently ranks 3rd among models in the 2–4B parameter class on the VAREX leaderboard (as of March 2026), demonstrating its efficiency in structured extraction despite its compact size.
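To make the exact-match figure for KVP extraction concrete, here is a minimal sketch of how a field-level exact-match rate could be computed. The article does not describe VAREX's scoring rules, so the whitespace stripping and the per-field counting below are assumptions, and `exact_match_rate` is a hypothetical helper.

```python
def exact_match_rate(predictions, references):
    """Fraction of reference fields whose predicted value matches exactly.

    predictions, references: parallel lists of dicts mapping field name to value.
    A missing predicted field counts as a mismatch. Values are compared as
    strings after stripping surrounding whitespace (an assumed normalization).
    """
    total = 0
    matched = 0
    for pred, ref in zip(predictions, references):
        for key, expected in ref.items():
            total += 1
            if str(pred.get(key, "")).strip() == str(expected).strip():
                matched += 1
    return matched / total if total else 0.0
```

Under this definition a prediction of "42.00" does not match a reference of "42.0", which shows why exact match is a strict metric for structured extraction.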

https://huggingface.co/blog/ibm-granite/granite-4-vision

Key Takeaways

  • Modular LoRA Architecture: The model is a 0.5B parameter LoRA adapter that operates on the Granite 4.0 Micro (3.5B) backbone. This design allows a single deployment to handle text-only workloads efficiently while activating vision capabilities only when needed.
  • High-Resolution Tiling: Utilizing the google/siglip2-so400m-patch16-384 encoder, the model processes images by tiling them into 384×384 patches alongside a global downscaled view, ensuring that fine details in complex documents are preserved.
  • DeepStack Injection: To improve layout awareness, the model uses a DeepStack approach with 8 injection points. This routes semantic features to earlier layers and spatial details to later layers, which is critical for accurate table and chart extraction.
  • Specialized Extraction Training: Beyond general instruction following, the model was refined using ChartNet and a ‘code-guided’ pipeline that aligns plotting code, images, and data tables to help the model internalize the logic of visual data structures.
  • Developer-Ready Integration: The release is Apache 2.0 licensed and features native support for vLLM (via a custom model implementation) and Docling, IBM’s tool for converting unstructured PDFs into machine-readable JSON or HTML.
