• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Monday, March 16, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

Qwen Team Introduces Qwen-Image-Edit: The Image Editing Version of Qwen-Image with Advanced Capabilities for Semantic and Appearance Editing

Josh by Josh
August 18, 2025
in Al, Analytics and Automation
0
Qwen Team Introduces Qwen-Image-Edit: The Image Editing Version of Qwen-Image with Advanced Capabilities for Semantic and Appearance Editing


In the domain of multimodal AI, instruction-based image editing models are transforming how users interact with visual content. Just released in August 2025 by Alibaba’s Qwen Team, Qwen-Image-Edit builds on the 20B-parameter Qwen-Image foundation to deliver advanced editing capabilities. This model excels in semantic editing (e.g., style transfer and novel view synthesis) and appearance editing (e.g., precise object modifications), while preserving Qwen-Image’s strength in complex text rendering for both English and Chinese. Integrated with Qwen Chat and available via Hugging Face, it lowers barriers for professional content creation, from IP design to error correction in generated artwork.

Architecture and Key Innovations

Qwen-Image-Edit extends the Multimodal Diffusion Transformer (MMDiT) architecture of Qwen-Image, which comprises a Qwen2.5-VL multimodal large language model (MLLM) for text conditioning, a Variational AutoEncoder (VAE) for image tokenization, and the MMDiT backbone for joint modeling. For editing, it introduces dual encoding: the input image is processed by Qwen2.5-VL for high-level semantic features and the VAE for low-level reconstructive details, concatenated in the MMDiT’s image stream. This enables balanced semantic coherence (e.g., maintaining object identity during pose changes) and visual fidelity (e.g., preserving unmodified regions).

READ ALSO

Moonshot AI Releases π‘¨π’•π’•π’†π’π’•π’Šπ’π’ π‘Ήπ’†π’”π’Šπ’…π’–π’‚π’π’” to Replace Fixed Residual Mixing with Depth-Wise Attention for Better Scaling in Transformers

A Coding Implementation to Design an Enterprise AI Governance System Using OpenClaw Gateway Policy Engines, Approval Workflows and Auditable Agent Execution

The Multimodal Scalable RoPE (MSRoPE) positional encoding is augmented with a frame dimension to differentiate pre- and post-edit images, supporting tasks like text-image-to-image (TI2I) editing. The VAE, fine-tuned on text-rich data, achieves superior reconstruction with 33.42 PSNR on general images and 36.63 on text-heavy ones, outperforming FLUX-VAE and SD-3.5-VAE. These enhancements allow Qwen-Image-Edit to handle bilingual text edits while retaining original font, size, and style.

Key Features of Qwen-Image-Edit

  • Semantic and Appearance Editing: Supports low-level visual appearance editing (e.g., adding, removing, or modifying elements while keeping other regions unchanged) and high-level visual semantic editing (e.g., IP creation, object rotation, and style transfer, allowing pixel changes with semantic consistency).
  • Precise Text Editing: Enables bilingual (Chinese and English) text editing, including direct addition, deletion, and modification of text in images, while preserving the original font, size, and style.
  • Strong Benchmark Performance: Achieves state-of-the-art results on multiple public benchmarks for image editing tasks, positioning it as a robust foundation model for generation and manipulation.

Training and Data Pipeline

Leveraging Qwen-Image’s curated dataset of billions of image-text pairs across Nature (55%), Design (27%), People (13%), and Synthetic (5%) domains, Qwen-Image-Edit employs a multi-task training paradigm unifying T2I, I2I, and TI2I objectives. A seven-stage filtering pipeline refines data for quality and balance, incorporating synthetic text rendering strategies (Pure, Compositional, Complex) to address long-tail issues in Chinese characters.

Training uses flow matching with a Producer-Consumer framework for scalability, followed by supervised fine-tuning and reinforcement learning (DPO and GRPO) for preference alignment. For editing-specific tasks, it integrates novel view synthesis and depth estimation, using DepthPro as a teacher model. This results in robust performance, such as correcting calligraphy errors through chained edits.

Advanced Editing Capabilities

Qwen-Image-Edit shines in semantic editing, enabling IP creation like generating MBTI-themed emojis from a mascot (e.g., Capybara) while preserving character consistency. It supports 180-degree novel view synthesis, rotating objects or scenes with high fidelity, achieving 15.11 PSNR on GSOβ€”surpassing specialized models like CRM. Style transfer transforms portraits into artistic forms, such as Studio Ghibli, maintaining semantic integrity.

For appearance editing, it adds elements like signboards with realistic reflections or removes fine details like hair strands without altering surroundings. Bilingual text editing is precise: changing β€œHope” to β€œQwen” on posters or correcting Chinese characters in calligraphy via bounding boxes. Chained editing allows iterative corrections, e.g., fixing β€œη¨½β€ step-by-step until accurate.

Benchmark Results and Evaluations

Qwen-Image-Edit leads editing benchmarks, scoring 7.56 overall on GEdit-Bench-EN and 7.52 on CN, outperforming GPT Image 1 (7.53 EN, 7.30 CN) and FLUX.1 Kontext [Pro] (6.56 EN, 1.23 CN). On ImgEdit, it achieves 4.27 overall, excelling in tasks like object replacement (4.66) and style changes (4.81). Depth estimation yields 0.078 AbsRel on KITTI, competitive with DepthAnything v2.

Human evaluations on AI Arena position its base model third among APIs, with strong text rendering advantages. These metrics highlight its superiority in instruction-following and multilingual fidelity.

Deployment and Practical Usage

Qwen-Image-Edit is deployable via Hugging Face Diffusers:

from diffusers import QwenImageEditPipeline
import torch
from PIL import Image

pipeline = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit")
pipeline.to(torch.bfloat16).to("cuda")

image = Image.open("input.png").convert("RGB")
prompt = "Change the rabbit's color to purple, with a flash light background."
output = pipeline(image=image, prompt=prompt, num_inference_steps=50, true_cfg_scale=4.0).images
output.save("output.png")

Alibaba Cloud’s Model Studio offers API access for scalable inference. Licensed under Apache 2.0, the GitHub repository provides training code.

Future Implications

Qwen-Image-Edit advances vision-language interfaces, enabling seamless content manipulation for creators. Its unified approach to understanding and generation suggests potential extensions to video and 3D, fostering innovative applications in AI-driven design.


Check out theΒ Technical Details, Models on Hugging Face and Try the Chat here. Feel free to check out ourΒ GitHub Page for Tutorials, Codes and Notebooks.Β Also,Β feel free to follow us onΒ TwitterΒ and don’t forget to join ourΒ 100k+ ML SubRedditΒ and Subscribe toΒ our Newsletter.


Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.



Source_link

Related Posts

Moonshot AI Releases π‘¨π’•π’•π’†π’π’•π’Šπ’π’ π‘Ήπ’†π’”π’Šπ’…π’–π’‚π’π’” to Replace Fixed Residual Mixing with Depth-Wise Attention for Better Scaling in Transformers
Al, Analytics and Automation

Moonshot AI Releases π‘¨π’•π’•π’†π’π’•π’Šπ’π’ π‘Ήπ’†π’”π’Šπ’…π’–π’‚π’π’” to Replace Fixed Residual Mixing with Depth-Wise Attention for Better Scaling in Transformers

March 16, 2026
A Coding Implementation to Design an Enterprise AI Governance System Using OpenClaw Gateway Policy Engines, Approval Workflows and Auditable Agent Execution
Al, Analytics and Automation

A Coding Implementation to Design an Enterprise AI Governance System Using OpenClaw Gateway Policy Engines, Approval Workflows and Auditable Agent Execution

March 16, 2026
SoulSpark Chatbot Review: Key Features & Pricing
Al, Analytics and Automation

SoulSpark Chatbot Review: Key Features & Pricing

March 15, 2026
LangChain Releases Deep Agents: A Structured Runtime for Planning, Memory, and Context Isolation in Multi-Step AI Agents
Al, Analytics and Automation

LangChain Releases Deep Agents: A Structured Runtime for Planning, Memory, and Context Isolation in Multi-Step AI Agents

March 15, 2026
Influencer Marketing in Numbers: Key Stats
Al, Analytics and Automation

Influencer Marketing in Numbers: Key Stats

March 15, 2026
How to Build Type-Safe, Schema-Constrained, and Function-Driven LLM Pipelines Using Outlines and Pydantic
Al, Analytics and Automation

How to Build Type-Safe, Schema-Constrained, and Function-Driven LLM Pipelines Using Outlines and Pydantic

March 15, 2026
Next Post
Nvidia’s open Nemotron-Nano-9B-v2 has toggle on/off reasoning

Nvidia's open Nemotron-Nano-9B-v2 has toggle on/off reasoning

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Google announced the next step in its nuclear energy plansΒ 

Google announced the next step in its nuclear energy plansΒ 

August 20, 2025

EDITOR'S PICK

How to Get LLMs to Mention Your Brand

How to Get LLMs to Mention Your Brand

August 4, 2025

The Scoop: McDonald’s reassures DEI policies still central to organization amid nationwide boycott

June 26, 2025
Early sales on Apple TV+, MasterClass, Fubo, Rosetta Stone and more

Early sales on Apple TV+, MasterClass, Fubo, Rosetta Stone and more

November 23, 2025
How 8 Companies Are Evolving Their SEO Strategies

How 8 Companies Are Evolving Their SEO Strategies

August 11, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • Craft Food Strawberry Snowball Recipe
  • Walmart-backed PhonePe shelves IPO as global tensions rattle markets
  • Moonshot AI Releases π‘¨π’•π’•π’†π’π’•π’Šπ’π’ π‘Ήπ’†π’”π’Šπ’…π’–π’‚π’π’” to Replace Fixed Residual Mixing with Depth-Wise Attention for Better Scaling in Transformers
  • The New Rules of Enterprise Marketing Operations
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions