Liquid AI’s New LFM2-24B-A2B Hybrid Architecture Blends Attention with Convolutions to Solve the Scaling Bottlenecks of Modern LLMs

by Josh
February 25, 2026
in AI, Analytics and Automation


The generative AI race has long been a game of ‘bigger is better.’ But as the industry runs into power-consumption and memory bottlenecks, the conversation is shifting from raw parameter counts to architectural efficiency. The Liquid AI team is leading this charge with the release of LFM2-24B-A2B, a 24-billion-parameter model that redefines what we should expect from edge-capable AI.

https://www.liquid.ai/blog/lfm2-24b-a2b

The ‘A2B’ Architecture: A 1:3 Ratio for Efficiency

The ‘A2B’ in the model’s name stands for Attention-to-Base. In a traditional Transformer, every layer uses Softmax Attention, which scales quadratically (O(N²)) with sequence length. This leads to massive KV (Key-Value) caches that devour VRAM.

The Liquid AI team bypasses this by using a hybrid structure. The ‘Base’ layers are efficient gated short convolution blocks, while the ‘Attention’ layers use Grouped Query Attention (GQA).

In the LFM2-24B-A2B configuration, the model uses a 1:3 ratio:

  • Total Layers: 40
  • Convolution Blocks: 30
  • Attention Blocks: 10

By interspersing a small number of GQA blocks with a majority of gated convolution layers, the model retains the high-resolution retrieval and reasoning of a Transformer while maintaining the fast prefill and low memory footprint of a linear-complexity model.
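
To make that layout concrete, here is a minimal sketch of how a 1:3 attention-to-convolution interleave could be stacked. The block internals are simplified placeholders (plain multi-head attention stands in for GQA), and the assumption that every fourth layer is an attention block is ours for illustration; the post does not publish the exact placement.

```python
# Minimal sketch of a 1:3 attention-to-convolution hybrid stack.
# Block internals are placeholders, not Liquid AI's actual LFM2 blocks.
import torch
import torch.nn as nn

class GatedShortConvBlock(nn.Module):
    """Placeholder for a linear-complexity gated short-convolution block."""
    def __init__(self, d_model: int, kernel_size: int = 4):
        super().__init__()
        self.gate = nn.Linear(d_model, d_model)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size - 1, groups=d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        g = torch.sigmoid(self.gate(x))
        c = self.conv(x.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return x + g * c                       # gated residual update

class GQABlock(nn.Module):
    """Placeholder for a grouped-query-attention block (plain MHA stands in)."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return x + out

def build_hybrid_stack(d_model: int = 512, n_layers: int = 40, ratio: int = 4):
    """Every `ratio`-th layer is attention: 40 layers -> 10 GQA + 30 conv."""
    layers = [GQABlock(d_model) if (i + 1) % ratio == 0 else GatedShortConvBlock(d_model)
              for i in range(n_layers)]
    return nn.Sequential(*layers)

stack = build_hybrid_stack()
print(sum(isinstance(layer, GQABlock) for layer in stack))   # -> 10 attention blocks
```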

Sparse MoE: 24B Intelligence on a 2B Budget

The most important feature of LFM2-24B-A2B is its Mixture of Experts (MoE) design. While the model contains 24 billion parameters, it activates only 2.3 billion parameters per token.

This is a game-changer for deployment. Because the active parameter path is so lean, the model can fit into 32GB of RAM. This means it can run locally on high-end consumer laptops, desktops with integrated GPUs (iGPUs), and dedicated NPUs without needing a data-center-grade A100. It effectively provides the knowledge density of a 24B model with the inference speed and energy efficiency of a 2B model.
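
As a rough sanity check on those deployment numbers, the back-of-the-envelope arithmetic below shows why a 24B-total / 2.3B-active split is plausible on a 32GB machine. The precision levels are our assumption; the post does not state which quantization the 32GB figure assumes.

```python
# Back-of-the-envelope memory math for the sparse-MoE deployment claim above.
# Precision choices are illustrative assumptions, not figures from Liquid AI.
total_params = 24e9    # all expert weights, which must be resident in RAM
active_params = 2.3e9  # weights actually touched per token (drives latency/energy)

for bits in (16, 8, 4):
    gb = total_params * bits / 8 / 1e9
    print(f"{bits:>2}-bit weights -> ~{gb:.0f} GB to hold the full 24B model")

# At 8-bit (~24 GB) or 4-bit (~12 GB) the full model fits under 32 GB of RAM,
# while only ~10% of the weights participate in any single forward pass.
print(f"active fraction per token: {active_params / total_params:.1%}")
```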

Benchmarks: Punching Up

The Liquid AI team reports that the LFM2 family follows a predictable, log-linear scaling behavior. Despite its smaller active parameter count, the 24B-A2B model consistently outperforms larger rivals.

  • Logic and Reasoning: In tests like GSM8K and MATH-500, it rivals dense models twice its size.
  • Throughput: When benchmarked on a single NVIDIA H100 using vLLM, it reached 26.8K total tokens per second at 1,024 concurrent requests, significantly outpacing OpenAI’s gpt-oss-20b and Qwen3-30B-A3B (see the vLLM sketch after this list).
  • Long Context: The model features a 32k token context window, optimized for privacy-sensitive RAG (Retrieval-Augmented Generation) pipelines and local document analysis.
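
For readers who want to reproduce a throughput test in the same setup the post cites, a hedged vLLM serving sketch follows. The Hugging Face model ID is an assumed placeholder, not a confirmed repository name; check Liquid AI's release page for the actual weights.

```python
# Minimal vLLM serving sketch, in the spirit of the H100 benchmark cited above.
# The model ID below is a hypothetical placeholder, not a confirmed repo.
from vllm import LLM, SamplingParams

llm = LLM(
    model="LiquidAI/LFM2-24B-A2B",  # assumed Hugging Face repo id
    max_model_len=32768,            # matches the 32k context window reported here
)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize the LFM2-24B-A2B hybrid architecture."], params)
print(outputs[0].outputs[0].text)
```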

Technical Cheat Sheet

  • Total Parameters: 24 Billion
  • Active Parameters: 2.3 Billion
  • Architecture: Hybrid (Gated Conv + GQA)
  • Layers: 40 (30 Base / 10 Attention)
  • Context Length: 32,768 Tokens
  • Training Data: 17 Trillion Tokens
  • License: LFM Open License v1.0
  • Native Support: llama.cpp, vLLM, SGLang, MLX
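
Since llama.cpp is listed among the natively supported runtimes, here is a hedged sketch of a local run via the llama-cpp-python bindings on a 32GB-class machine. The GGUF filename and quantization level are placeholders invented for illustration.

```python
# Local-inference sketch with llama-cpp-python, one of the runtimes listed above.
# The GGUF filename and quantization level are hypothetical placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="lfm2-24b-a2b-q4_k_m.gguf",  # assumed quantized weights file
    n_ctx=32768,                            # the 32,768-token context from the table
    n_gpu_layers=-1,                        # offload layers if a GPU backend is compiled in
)
out = llm("Q: What trade-off does a hybrid conv/attention stack make?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```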

Key Takeaways

  • Hybrid ‘A2B’ Architecture: The model uses a 1:3 ratio of Grouped Query Attention (GQA) to Gated Short Convolutions. By utilizing linear-complexity ‘Base’ layers for 30 out of 40 layers, the model achieves much faster prefill and decode speeds with a significantly reduced memory footprint compared to traditional all-attention Transformers.
  • Sparse MoE Efficiency: Despite having 24 billion total parameters, the model only activates 2.3 billion parameters per token. This ‘Sparse Mixture of Experts’ design allows it to deliver the reasoning depth of a large model while maintaining the inference latency and energy efficiency of a 2B-parameter model.
  • True Edge Capability: Optimized via hardware-in-the-loop architecture search, the model is designed to fit in 32GB of RAM. This makes it fully deployable on consumer-grade hardware, including laptops with integrated GPUs and NPUs, without requiring expensive data-center infrastructure.
  • State-of-the-Art Performance: LFM2-24B-A2B outperforms larger competitors like Qwen3-30B-A3B and OpenAI’s gpt-oss-20b in throughput. Benchmarks put it at approximately 26.8K tokens per second on a single H100, with near-linear scaling and high efficiency in long-context tasks up to its 32k token window.

Check out the Technical details and Model weights.



