• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Sunday, January 18, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

ByteDance Researchers Introduce DetailFlow: A 1D Coarse-to-Fine Autoregressive Framework for Faster, Token-Efficient Image Generation

Josh by Josh
June 7, 2025
in Al, Analytics and Automation
0
ByteDance Researchers Introduce DetailFlow: A 1D Coarse-to-Fine Autoregressive Framework for Faster, Token-Efficient Image Generation
0
SHARES
3
VIEWS
Share on FacebookShare on Twitter


Autoregressive image generation has been shaped by advances in sequential modeling, originally seen in natural language processing. This field focuses on generating images one token at a time, similar to how sentences are constructed in language models. The appeal of this approach lies in its ability to maintain structural coherence across the image while allowing for high levels of control during the generation process. As researchers began to apply these techniques to visual data, they found that structured prediction not only preserved spatial integrity but also supported tasks like image manipulation and multimodal translation effectively.

Despite these benefits, generating high-resolution images remains computationally expensive and slow. A primary issue is the number of tokens needed to represent complex visuals. Raster-scan methods that flatten 2D images into linear sequences require thousands of tokens for detailed images, resulting in long inference times and high memory consumption. Models like Infinity need over 10,000 tokens for a 1024×1024 image. This becomes unsustainable for real-time applications or when scaling to more extensive datasets. Reducing the token burden while preserving or improving output quality has become a pressing challenge.

READ ALSO

How to Build a Self-Evaluating Agentic AI System with LlamaIndex and OpenAI Using Retrieval, Tool Use, and Automated Quality Checks

Google AI Releases TranslateGemma: A New Family of Open Translation Models Built on Gemma 3 with Support for 55 Languages

Efforts to mitigate token inflation have led to innovations like next-scale prediction seen in VAR and FlexVAR. These models create images by predicting progressively finer scales, which imitates the human tendency to sketch rough outlines before adding detail. However, they still rely on hundreds of tokens—680 in the case of VAR and FlexVAR for 256×256 images. Moreover, approaches like TiTok and FlexTok use 1D tokenization to compress spatial redundancy, but they often fail to scale efficiently. For example, FlexTok’s gFID increases from 1.9 at 32 tokens to 2.5 at 256 tokens, highlighting a degradation in output quality as the token count grows.

Researchers from ByteDance introduced DetailFlow, a 1D autoregressive image generation framework. This method arranges token sequences from global to fine detail using a process called next-detail prediction. Unlike traditional 2D raster-scan or scale-based techniques, DetailFlow employs a 1D tokenizer trained on progressively degraded images. This design allows the model to prioritize foundational image structures before refining visual details. By mapping tokens directly to resolution levels, DetailFlow significantly reduces token requirements, enabling images to be generated in a semantically ordered, coarse-to-fine manner.

The mechanism in DetailFlow centers on a 1D latent space where each token contributes incrementally more detail. Earlier tokens encode global features, while later tokens refine specific visual aspects. To train this, the researchers created a resolution mapping function that links token count to target resolution. During training, the model is exposed to images of varying quality levels and learns to predict progressively higher-resolution outputs as more tokens are introduced. It also implements parallel token prediction by grouping sequences and predicting entire sets at once. Since parallel prediction can introduce sampling errors, a self-correction mechanism was integrated. This system perturbs certain tokens during training and teaches subsequent tokens to compensate, ensuring that final images maintain structural and visual integrity.

The results from the experiments on the ImageNet 256×256 benchmark were noteworthy. DetailFlow achieved a gFID score of 2.96 using only 128 tokens, outperforming VAR at 3.3 and FlexVAR at 3.05, both of which used 680 tokens. Even more impressive, DetailFlow-64 reached a gFID of 2.62 using 512 tokens. In terms of speed, it delivered nearly double the inference rate of VAR and FlexVAR. A further ablation study confirmed that the self-correction training and semantic ordering of tokens substantially improved output quality. For example, enabling self-correction dropped the gFID from 4.11 to 3.68 in one setting. These metrics demonstrate both higher quality and faster generation compared to established models.

By focusing on semantic structure and reducing redundancy, DetailFlow presents a viable solution to long-standing issues in autoregressive image generation. The method’s coarse-to-fine approach, efficient parallel decoding, and ability to self-correct highlight how architectural innovations can address performance and scalability limitations. Through their structured use of 1D tokens, the researchers from ByteDance have demonstrated a model that maintains high image fidelity while significantly reducing computational load, making it a valuable addition to image synthesis research.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 95k+ ML SubReddit and Subscribe to our Newsletter.


Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.



Source_link

Related Posts

How to Build a Self-Evaluating Agentic AI System with LlamaIndex and OpenAI Using Retrieval, Tool Use, and Automated Quality Checks
Al, Analytics and Automation

How to Build a Self-Evaluating Agentic AI System with LlamaIndex and OpenAI Using Retrieval, Tool Use, and Automated Quality Checks

January 17, 2026
Google AI Releases TranslateGemma: A New Family of Open Translation Models Built on Gemma 3 with Support for 55 Languages
Al, Analytics and Automation

Google AI Releases TranslateGemma: A New Family of Open Translation Models Built on Gemma 3 with Support for 55 Languages

January 17, 2026
Pricing Overview and Feature Breakdown
Al, Analytics and Automation

Pricing Overview and Feature Breakdown

January 17, 2026
Black Forest Labs Releases FLUX.2 [klein]: Compact Flow Models for Interactive Visual Intelligence
Al, Analytics and Automation

Black Forest Labs Releases FLUX.2 [klein]: Compact Flow Models for Interactive Visual Intelligence

January 16, 2026
UK Regulator Keeps X Under Pressure
Al, Analytics and Automation

UK Regulator Keeps X Under Pressure

January 16, 2026
Al, Analytics and Automation

How to Build a Safe, Autonomous Prior Authorization Agent for Healthcare Revenue Cycle Management with Human-in-the-Loop Controls

January 16, 2026
Next Post
Why Advertisers Are Betting Big on YouTube Connected TV

Why Advertisers Are Betting Big on YouTube Connected TV

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Google announced the next step in its nuclear energy plans 

Google announced the next step in its nuclear energy plans 

August 20, 2025

EDITOR'S PICK

How to Unlock Tier 2 Seed Shop in Grow a Garden

How to Unlock Tier 2 Seed Shop in Grow a Garden

August 31, 2025
Oracle AI World 2025 | Marketing Cube

Oracle AI World 2025 | Marketing Cube

October 29, 2025

Building brand reputation in the age of AI

September 24, 2025
How to Migrate Azure File Shares with Zero Downtime

How to Migrate Azure File Shares with Zero Downtime

September 19, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • How I Do Content Gap Analysis SEO With Ahrefs
  • How to Fix Server Failed to Boot in Hytale
  • Musk’s Starlink in Iran only works if things don’t go wrong in outer space
  • How to Build a Self-Evaluating Agentic AI System with LlamaIndex and OpenAI Using Retrieval, Tool Use, and Automated Quality Checks
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions

Are you sure want to unlock this post?
Unlock left : 0
Are you sure want to cancel subscription?