mGrowTech
Zhipu AI Releases ‘Glyph’: An AI Framework for Scaling the Context Length through Visual-Text Compression

By Josh
October 28, 2025
In AI, Analytics and Automation


Can long texts be rendered as images and processed by a vision-language model (VLM), achieving 3–4× token compression while preserving accuracy and scaling a 128K context toward 1M-token workloads? A team of researchers from Zhipu AI has released Glyph, an AI framework that scales context length through visual-text compression. Glyph renders ultra-long text into page images, and a VLM processes those pages end to end. Each visual token encodes many characters, so the effective token sequence shortens while semantics are preserved. Glyph achieves 3–4× token compression on long text sequences without performance degradation, enabling significant gains in memory efficiency, training throughput, and inference speed.

https://arxiv.org/pdf/2510.17800

Why Glyph?

Conventional methods expand positional encodings or modify attention, but compute and memory still scale with token count. Retrieval trims inputs, but it risks missing evidence and adds latency. Glyph instead changes the representation: it converts text to images and shifts the burden to a VLM that has already learned OCR, layout, and reasoning. This increases information density per token, so a fixed token budget covers more of the original context. Under extreme compression, the research team shows that a 128K-context VLM can address tasks originating from 1M-token-level text.
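The information-density argument can be made concrete with a back-of-the-envelope sketch. Assuming a ViT-style patch grid for visual tokens and the usual ~4 characters per English text token (both are illustrative assumptions, not the paper's exact tokenizers):

```python
def visual_tokens(page_w_px, page_h_px, patch=28):
    """Visual tokens for one page under an assumed ViT-style patch grid."""
    return (page_w_px // patch) * (page_h_px // patch)

def text_tokens(text, chars_per_token=4):
    """Rough text-token estimate: ~4 characters per token for English."""
    return len(text) / chars_per_token

def compression_ratio(text, pages, page_w_px=800, page_h_px=1000, patch=28):
    """Effective compression: original text tokens / visual tokens consumed."""
    return text_tokens(text) / (pages * visual_tokens(page_w_px, page_h_px, patch))

# A 120,000-character document rendered onto 10 hypothetical 800x1000 px pages:
doc = "x" * 120_000
print(round(compression_ratio(doc, pages=10), 1))  # prints 3.1
```

Under these assumed page and patch sizes, the estimate lands in the same ~3× range the paper reports; denser typography (more characters per page) pushes the ratio higher.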


System design and training

The method has three stages: continual pre-training, LLM-driven rendering search, and post-training. Continual pre-training exposes the VLM to large corpora of rendered long text with diverse typography and styles; the objective aligns visual and textual representations and transfers long-context skills from text tokens to visual tokens. The rendering search is a genetic loop driven by an LLM: it mutates page size, dpi, font family, font size, line height, alignment, indentation, and spacing, then evaluates candidates on a validation set to optimize accuracy and compression jointly. Post-training uses supervised fine-tuning and reinforcement learning with Group Relative Policy Optimization (GRPO), plus an auxiliary OCR alignment task. The OCR loss improves character fidelity when fonts are small and spacing is tight.
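The rendering-search stage can be sketched as a small genetic loop. The parameter space, mutation rule, and fitness weights below are illustrative placeholders, not the paper's actual search space; in Glyph an LLM proposes the mutations, whereas this sketch uses random choice:

```python
import random

# Hypothetical search space over rendering parameters (illustrative values).
SPACE = {
    "dpi": [72, 96, 120],
    "font_size": [8, 10, 12, 14],
    "line_height": [1.0, 1.2, 1.5],
    "columns": [1, 2],
}

def random_config():
    return {k: random.choice(v) for k, v in SPACE.items()}

def mutate(cfg):
    """Mutate one rendering parameter, as an LLM-guided loop might."""
    child = dict(cfg)
    key = random.choice(list(SPACE))
    child[key] = random.choice(SPACE[key])
    return child

def fitness(cfg, eval_accuracy):
    """Jointly score accuracy and compression (weights are illustrative)."""
    acc = eval_accuracy(cfg)  # validation-set accuracy in [0, 1]
    compression = (120 / cfg["dpi"]) * (12 / cfg["font_size"]) * cfg["columns"]
    return acc + 0.1 * compression

def genetic_search(eval_accuracy, population=8, generations=5):
    """Keep the fitter half each generation and refill with mutants."""
    pop = [random_config() for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, eval_accuracy), reverse=True)
        survivors = pop[: population // 2]
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(pop, key=lambda c: fitness(c, eval_accuracy))
```

With a constant accuracy oracle, the loop drifts toward denser typography (lower dpi, smaller fonts), which is exactly the accuracy-versus-compression tension the real search has to balance.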


Results: performance and efficiency

LongBench and MRCR establish accuracy and compression under long dialogue histories and document tasks. The model achieves an average effective compression ratio of about 3.3 on LongBench, with some tasks near 5, and about 3.0 on MRCR. These gains scale with longer inputs, since every visual token carries more characters. Reported speedups versus the text backbone at 128K inputs are about 4.8× for prefill, about 4.4× for decoding, and about 2× for supervised fine-tuning throughput. The Ruler benchmark confirms that higher dpi at inference time improves scores, since crisper glyphs help OCR and layout parsing. The research team reports dpi 72 with average compression 4.0 and maximum 7.7 on specific subtasks, dpi 96 with average compression 2.2 and maximum 4.4, and dpi 120 with average 1.2 and maximum 2.8. The 7.7 maximum belongs to Ruler, not to MRCR.
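As a quick illustration of what these ratios buy, the original-text context a fixed visual-token budget can cover is simply the budget times the compression ratio (using the averages and maximum quoted above):

```python
def effective_context(model_context_tokens, compression):
    """Original text tokens a fixed visual-token budget can cover."""
    return int(model_context_tokens * compression)

# Average compression per rendering dpi, as reported in the article:
for dpi, c in [(72, 4.0), (96, 2.2), (120, 1.2)]:
    print(dpi, effective_context(128_000, c))
# dpi 72 covers 512,000 original tokens on average; at the reported
# maximum of 7.7, a 128K budget reaches 985,600 tokens, i.e. roughly 1M.
```

This is the arithmetic behind the "128K context toward 1M-token workloads" framing: the extreme case relies on the most aggressive typography, which is also where OCR fidelity is most fragile.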


So, what? Applications

Glyph also benefits multimodal document understanding: training on rendered pages improves performance on MMLongBench-Doc relative to the base visual model, indicating that the rendering objective is a useful pretext for real document tasks that include figures and layout. The main failure mode is sensitivity to aggressive typography: very small fonts and tight spacing degrade character accuracy, especially for rare alphanumeric strings, and the research team excludes the UUID subtask on Ruler for this reason. The approach also assumes server-side rendering and a VLM with strong OCR and layout priors.

Key Takeaways

  • Glyph renders long text into images, then a vision language model processes those pages. This reframes long-context modeling as a multimodal problem and preserves semantics while reducing tokens.
  • The research team reports 3–4× token compression with accuracy comparable to strong 8B text baselines on long-context benchmarks.
  • Prefill speedup is about 4.8×, decoding speedup is about 4.4×, and supervised fine-tuning throughput is about 2×, all measured at 128K inputs.
  • The system uses continual pretraining on rendered pages, an LLM driven genetic search over rendering parameters, then supervised fine tuning and reinforcement learning with GRPO, plus an OCR alignment objective.
  • Evaluations include LongBench, MRCR, and Ruler, with an extreme case showing a 128K context VLM addressing 1M token level tasks. Code and model card are public on GitHub and Hugging Face.

Glyph treats long-context scaling as visual-text compression: it renders long sequences into images and lets a VLM process them, reducing tokens while preserving semantics. The research team claims 3–4× token compression with accuracy comparable to Qwen3-8B baselines, about 4× faster prefill and decoding, and about 2× faster SFT throughput. The pipeline is disciplined: continual pre-training on rendered pages, an LLM-driven genetic rendering search over typography, then post-training. The approach is pragmatic for million-token workloads under extreme compression, yet it depends on OCR and typography choices, which remain tuning knobs. Overall, visual-text compression offers a concrete path to scale long context while controlling compute and memory.


Check out the Paper, Weights and Repo.


