• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Thursday, April 23, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

Zhipu AI Releases ‘Glyph’: An AI Framework for Scaling the Context Length through Visual-Text Compression

Josh by Josh
October 28, 2025
in Al, Analytics and Automation
0
Zhipu AI Releases ‘Glyph’: An AI Framework for Scaling the Context Length through Visual-Text Compression


Can we render long texts as images and use a VLM to achieve 3–4× token compression, preserving accuracy while scaling a 128K context toward 1M-token workloads? A team of researchers from Zhipu AI release Glyph, an AI framework for scaling the context length through visual-text compression. It renders long textual sequences into images and processes them using vision–language models. The system renders ultra long text into page images, then a vision language model, VLM, processes those pages end to end. Each visual token encodes many characters, so the effective token sequence shortens, while semantics are preserved. Glyph can achieve 3-4x token compression on long text sequences without performance degradation, enabling significant gains in memory efficiency, training throughput, and inference speed.

https://arxiv.org/pdf/2510.17800

Why Glyph?

Conventional methods expand positional encodings or modify attention, compute and memory still scale with token count. Retrieval trims inputs, but risks missing evidence and adds latency. Glyph changes the representation, it converts text to images and shifts burden to a VLM that already learns OCR, layout, and reasoning. This increases information density per token, so a fixed token budget covers more original context. Under extreme compression, the research team show a 128K context VLM can address tasks that originate from 1M token level text.

https://arxiv.org/pdf/2510.17800

System design and training

The method has three stages, continual pre training, LLM driven rendering search, and post training. Continual pre training exposes the VLM to large corpora of rendered long text with diverse typography and styles. The objective aligns visual and textual representations, and transfers long context skills from text tokens to visual tokens. The rendering search is a genetic loop driven by an LLM. It mutates page size, dpi, font family, font size, line height, alignment, indent, and spacing. It evaluates candidates on a validation set to optimize accuracy and compression jointly. Post training uses supervised fine tuning and reinforcement learning with Group Relative Policy Optimization, plus an auxiliary OCR alignment task. The OCR loss improves character fidelity when fonts are small and spacing is tight.

https://arxiv.org/pdf/2510.17800

Results, performance and efficiency…

LongBench and MRCR establish accuracy and compression under long dialogue histories and document tasks. The model achieves an average effective compression ratio about 3.3 on LongBench, with some tasks near 5, and about 3.0 on MRCR. These gains scale with longer inputs, since every visual token carries more characters. Reported speedups versus the text backbone at 128K inputs are about 4.8 times for prefill, about 4.4 times for decoding, and about 2 times for supervised fine tuning throughput. The Ruler benchmark confirms that higher dpi at inference time improves scores, since crisper glyphs help OCR and layout parsing. The research team reports dpi 72 with average compression 4.0 and maximum 7.7 on specific sub tasks, dpi 96 with average compression 2.2 and maximum 4.4, and dpi 120 with average 1.2 and maximum 2.8. The 7.7 maximum belongs to Ruler, not to MRCR.

https://arxiv.org/pdf/2510.17800

So, what? Applications

Glyph benefits multimodal document understanding. Training on rendered pages improves performance on MMLongBench Doc relative to a base visual model. This indicates that the rendering objective is a useful pretext for real document tasks that include figures and layout. The main failure mode is sensitivity to aggressive typography. Very small fonts and tight spacing degrade character accuracy, especially for rare alphanumeric strings. The research team exclude the UUID subtask on Ruler. The approach assumes server side rendering and a VLM with strong OCR and layout priors.

Key Takeaways

  • Glyph renders long text into images, then a vision language model processes those pages. This reframes long-context modeling as a multimodal problem and preserves semantics while reducing tokens.
  • The research team reports token compression is 3 to 4 times with accuracy comparable to strong 8B text baselines on long-context benchmarks.
  • Prefill speedup is about 4.8 times, decoding speedup is about 4.4 times, and supervised fine tuning throughput is about 2 times, measured at 128K inputs.
  • The system uses continual pretraining on rendered pages, an LLM driven genetic search over rendering parameters, then supervised fine tuning and reinforcement learning with GRPO, plus an OCR alignment objective.
  • Evaluations include LongBench, MRCR, and Ruler, with an extreme case showing a 128K context VLM addressing 1M token level tasks. Code and model card are public on GitHub and Hugging Face.

Glyph treats long context scaling as visual text compression, it renders long sequences into images and lets a VLM process them, reducing tokens while preserving semantics. The research team claims 3 to 4 times token compression with accuracy comparable to Qwen3 8B baselines, about 4 times faster prefilling and decoding, and about 2 times faster SFT throughput. The pipeline is disciplined, continual pre training on rendered pages, an LLM genetic rendering search over typography, then post training. The approach is pragmatic for million token workloads under extreme compression, yet it depends on OCR and typography choices, which remain knobs. Overall, visual text compression offers a concrete path to scale long context while controlling compute and memory.


Check out the Paper, Weights and Repo. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.


Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.

🙌 Follow MARKTECHPOST: Add us as a preferred source on Google.



Source_link

READ ALSO

“Your Next Coworker May Not Be Human” as Google Bets Everything on AI Agents to Power the Office

Google Cloud AI Research Introduces ReasoningBank: A Memory Framework that Distills Reasoning Strategies from Agent Successes and Failures

Related Posts

“Your Next Coworker May Not Be Human” as Google Bets Everything on AI Agents to Power the Office
Al, Analytics and Automation

“Your Next Coworker May Not Be Human” as Google Bets Everything on AI Agents to Power the Office

April 23, 2026
Google Cloud AI Research Introduces ReasoningBank: A Memory Framework that Distills Reasoning Strategies from Agent Successes and Failures
Al, Analytics and Automation

Google Cloud AI Research Introduces ReasoningBank: A Memory Framework that Distills Reasoning Strategies from Agent Successes and Failures

April 23, 2026
The Most Efficient Approach to Crafting Your Personal AI Productivity System
Al, Analytics and Automation

The Most Efficient Approach to Crafting Your Personal AI Productivity System

April 23, 2026
Teaching AI models to say “I’m not sure” | MIT News
Al, Analytics and Automation

Teaching AI models to say “I’m not sure” | MIT News

April 23, 2026
Alibaba Qwen Team Releases Qwen3.6-27B: A Dense Open-Weight Model Outperforming 397B MoE on Agentic Coding Benchmarks
Al, Analytics and Automation

Alibaba Qwen Team Releases Qwen3.6-27B: A Dense Open-Weight Model Outperforming 397B MoE on Agentic Coding Benchmarks

April 22, 2026
Inside the AI Power Move That Could Redefine Finance
Al, Analytics and Automation

Inside the AI Power Move That Could Redefine Finance

April 22, 2026
Next Post
Navigating Relocation Challenges With Confidence And Care

Navigating Relocation Challenges With Confidence And Care

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

November 4, 2025

EDITOR'S PICK

Ethics of Facial Recognition: Key Issues and Solutions

Ethics of Facial Recognition: Key Issues and Solutions

August 12, 2025
1Password Coupon: Score a Free Trial in November

1Password Coupon: Score a Free Trial in November

November 11, 2025
Beyond the terminal: Gemini CLI comes to Zed

Beyond the terminal: Gemini CLI comes to Zed

August 27, 2025
AI in the Workplace Statistics 2025–2035

AI in the Workplace Statistics 2025–2035

February 15, 2026

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • OpenAI's GPT-5.5 is here, and it's no potato: narrowly beats Anthropic's Claude Mythos Preview on Terminal-Bench 2.0
  • “Your Next Coworker May Not Be Human” as Google Bets Everything on AI Agents to Power the Office
  • Victoria Beckham Debuts a Modern Wardrobe Collaboration with Gap
  • Chatbots vs Conversational AI vs Live Chat
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions