Tuesday, June 2, 2026
mGrowTech

JetBrains Releases Mellum2: A 12B MoE Model for Fast, Specialized Tasks in Multi-Model AI Pipelines

by Josh
June 2, 2026


JetBrains released Mellum2, open-sourcing the weights under the Apache 2.0 license. The first version of Mellum was a completion-focused 4B dense model. Mellum2 is its successor: a general-purpose model specialized in software engineering. It covers code generation and editing, debugging, multi-step reasoning, tool use and function calling, agentic coding, and conversational programming assistance.

The JetBrains team positions Mellum2 as a “focal model”: a fast, specialized component inside larger AI systems, not a standalone replacement for frontier models.

Architecture

Mellum2 uses a Mixture-of-Experts (MoE) architecture with 12B total parameters and 2.5B active parameters per token. In MoE models, only a subset of parameters runs on each token. Here, the model has 64 experts and activates 8 per token. This keeps per-token compute equivalent to a 2.5B dense model, while the total parameter count provides higher capacity for specialization.
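To make the "8 of 64 experts" idea concrete, here is a minimal sketch of top-k expert routing for a single token. The expert count and the number of active experts come from the article; the softmax-over-selected-logits weighting, the toy scalar "experts", and the function names are illustrative assumptions, not Mellum2's documented router.

```python
import math
import random

NUM_EXPERTS = 64  # total experts (from the article)
TOP_K = 8         # experts activated per token (from the article)


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    That normalization is an assumption made for this sketch.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:top_k]
    peak = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_token(router_logits))


# Toy usage: each "expert" is a scalar function standing in for a feed-forward block.
rng = random.Random(0)
experts = [lambda x, scale=i + 1: x * scale for i in range(NUM_EXPERTS)]
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
output = moe_forward(1.0, experts, logits)
print(len(selected), "experts ran out of", NUM_EXPERTS)
```

Only the eight selected expert functions are called, which is where the compute saving comes from: the other 56 experts contribute parameters to the model but no work for this token.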

Key architectural details:

  • Layers: 28
  • Hidden size: 2304
  • MoE experts: 64 total, 8 activated per token
  • Attention: Grouped-Query Attention (GQA) with 32 query heads and 4 KV heads
  • Sliding Window Attention (SWA): Applied to three of every four layers, with a window size of 1,024. Full attention runs on the remaining layer.
  • Context length: 131,072 tokens
  • Multi-Token Prediction (MTP) head: Serves as an auxiliary pre-training objective and as a built-in draft model for speculative decoding
  • Precision: bfloat16
  • Vocabulary size: 98,304
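The attention settings above largely determine how much memory the KV cache needs at long context. The following back-of-the-envelope sketch uses the listed figures plus two assumptions that the article does not state: that the head dimension equals hidden size divided by query heads (72), and that exactly one layer in each group of four uses full attention.

```python
LAYERS = 28
HIDDEN_SIZE = 2304
QUERY_HEADS = 32
KV_HEADS = 4
SWA_WINDOW = 1024
CONTEXT = 131_072
BYTES_PER_VALUE = 2  # bfloat16

# Assumption: head_dim = hidden_size / query_heads. Some models decouple the two.
HEAD_DIM = HIDDEN_SIZE // QUERY_HEADS

# Assumption: one full-attention layer per group of four layers.
FULL_LAYERS = LAYERS // 4
SWA_LAYERS = LAYERS - FULL_LAYERS


def kv_bytes_per_token_per_layer():
    """Keys plus values for every KV head at one position in one layer."""
    return 2 * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE


def kv_cache_bytes(seq_len):
    """Full-attention layers cache every token; SWA layers cache at most the window."""
    per_token = kv_bytes_per_token_per_layer()
    full = FULL_LAYERS * seq_len * per_token
    windowed = SWA_LAYERS * min(seq_len, SWA_WINDOW) * per_token
    return full + windowed


def kv_cache_bytes_all_full(seq_len):
    """Baseline for comparison: the same model with full attention in every layer."""
    return LAYERS * seq_len * kv_bytes_per_token_per_layer()


GIB = 1024 ** 3
print(f"full/SWA layers: {FULL_LAYERS}/{SWA_LAYERS}")
print(f"KV cache at {CONTEXT} tokens: {kv_cache_bytes(CONTEXT) / GIB:.2f} GiB")
print(f"same with no SWA: {kv_cache_bytes_all_full(CONTEXT) / GIB:.2f} GiB")
```

Under these assumptions, a single sequence at full context needs roughly 1.0 GiB of KV cache, against about 3.9 GiB if all 28 layers used full attention. The exact figures depend on the real head dimension and layer pattern.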

The model handles natural language and code. It is not multimodal — there is no image or video input.
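The MTP head listed above doubles as a draft model for speculative decoding. As a rough illustration of the general technique (not of Mellum2's actual implementation), the sketch below shows the greedy variant: a cheap draft proposes a few tokens, the main model checks them, and the output is guaranteed to match what the main model would have produced alone. The `draft_next` and `target_next` callables are hypothetical stand-ins for the MTP head and the full model.

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """One round of greedy speculative decoding.

    The draft proposes k tokens. The target accepts the longest prefix it agrees
    with, then contributes one token of its own (a correction at the first
    mismatch, or a bonus token if every draft token was accepted).
    """
    # Draft phase: propose k tokens autoregressively with the cheap model.
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        token = draft_next(ctx)
        proposed.append(token)
        ctx.append(token)

    # Verify phase: a real system scores all k positions in one batched
    # forward pass of the target; this loop only mimics the result.
    accepted, ctx = [], list(prefix)
    for token in proposed:
        expected = target_next(ctx)
        if token != expected:
            accepted.append(expected)  # replace the first wrong token and stop
            return accepted
        accepted.append(token)
        ctx.append(token)
    accepted.append(target_next(ctx))  # all accepted: take one bonus token
    return accepted


def generate(prefix, draft_next, target_next, max_new_tokens, k=4):
    """Repeat speculative steps until max_new_tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[: len(prefix) + max_new_tokens]


# Toy models over digit tokens: the target counts upward modulo 10,
# and the draft agrees except right after the token 3.
def target_next(ctx):
    return (ctx[-1] + 1) % 10


def draft_next(ctx):
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


result = generate([0], draft_next, target_next, max_new_tokens=12)
print(result)
```

Each round costs one target verification but can emit up to k + 1 tokens, so the speed-up grows with how often the draft agrees with the target. Sampling-based decoding needs a rejection-sampling acceptance rule instead of the equality check used here.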

Pre-Training

Pre-training spans approximately 10.6 trillion tokens through a three-phase curriculum. The data mixture progressively shifts from diverse web content toward curated code and mathematical content across the three phases.

Training used the Muon optimizer under FP8 hybrid precision, with a Warmup-Hold-Decay learning rate schedule whose final phase decays linearly to zero.
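For readers unfamiliar with the schedule shape, a Warmup-Hold-Decay schedule with linear decay to zero can be sketched as a small function. The step counts and peak learning rate below are placeholder values chosen for illustration; the article does not give Mellum2's actual hyperparameters.

```python
def warmup_hold_decay(step, total_steps, peak_lr, warmup_steps, decay_steps):
    """Learning rate at a given step for a Warmup-Hold-Decay schedule.

    Warmup: linear ramp up to peak_lr over warmup_steps.
    Hold:   constant peak_lr.
    Decay:  linear ramp from peak_lr down to zero over the last decay_steps.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    decay_start = total_steps - decay_steps
    if step < decay_start:
        return peak_lr
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / decay_steps


# Hypothetical settings, not taken from the Mellum2 report.
TOTAL, PEAK, WARMUP, DECAY = 1000, 1e-3, 100, 200
for step in (0, 99, 500, 800, 900, 1000):
    print(step, warmup_hold_decay(step, TOTAL, PEAK, WARMUP, DECAY))
```

The long constant phase is what distinguishes this shape from cosine schedules: the decay can be scheduled late, once the total token budget is known.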

After pre-training, the base model’s context window was extended to 128K tokens using a layer-selective YaRN method before post-training began.

The Model Family

The JetBrains team released six checkpoints covering the full training pipeline:

Checkpoint                      | Description
Mellum2-12B-A2.5B-Base-Pretrain | Base checkpoint before long-context extension
Mellum2-12B-A2.5B-Base          | Final base model after context extension
Mellum2-12B-A2.5B-Instruct-SFT  | Supervised fine-tuned instruction checkpoint
Mellum2-12B-A2.5B-Thinking-SFT  | Supervised thinking checkpoint
Mellum2-12B-A2.5B-Instruct      | RL-tuned instruction model
Mellum2-12B-A2.5B-Thinking      | RL-tuned thinking model

Post-training follows two stages: supervised fine-tuning (SFT), then reinforcement learning with verifiable rewards (RLVR) on math, executable coding, tool use, instruction following, reasoning, and knowledge tasks.
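What makes a reward "verifiable" is that it can be computed by a program rather than judged by another model. For the executable-coding case, a minimal sketch might run a generated function against known test cases and return a binary score. The function name, the test-case format, and the all-or-nothing scoring are assumptions for illustration; the article does not describe JetBrains' reward implementation.

```python
def verifiable_code_reward(candidate_source, function_name, test_cases):
    """Return 1.0 if the candidate passes every test case, otherwise 0.0.

    test_cases is a list of (args_tuple, expected_result) pairs.
    WARNING: exec runs arbitrary code. A real pipeline would execute candidates
    in an isolated sandbox with time and memory limits.
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)
        fn = namespace[function_name]
        for args, expected in test_cases:
            if fn(*args) != expected:
                return 0.0
    except Exception:
        # Syntax errors, missing functions, and runtime errors all score zero.
        return 0.0
    return 1.0


tests = [(("abc",), "cba"), (("",), ""), (("a",), "a")]
good = "def reverse(s):\n    return s[::-1]\n"
bad = "def reverse(s):\n    return s\n"
broken = "def reverse(s)\n    return s[::-1]\n"

print(verifiable_code_reward(good, "reverse", tests))
print(verifiable_code_reward(bad, "reverse", tests))
print(verifiable_code_reward(broken, "reverse", tests))
```

Math and tool-use rewards follow the same pattern with a different checker, for example comparing a final numeric answer or validating a function call against a schema.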

The Instruct variant answers directly, without an externalized chain of thought. Use it for low-latency tasks: direct answers, tool use, and instruction following.

The Thinking variant emits an explicit reasoning trace before its final answer. Use it for complex debugging, multi-step planning, or agentic flows where step-by-step reasoning matters.

Benchmark Results

All numbers below are self-reported by JetBrains. The comparison set is open-weight models in the 4B–14B range.

Coding:

Benchmark        | Mellum2 Instruct | Qwen3.5 (4B) | Qwen3.5 (9B) | Ministral 3 (14B) | OLMo-3 (7B) | Seed-Coder (8B)
LiveCodeBench v6 | 37.2             | 51.0         | 63.7         | 42.4              | 28.2        | 28.1
EvalPlus         | 78.4             | 69.4         | 71.8         | 74.1              | 67.3        | 73.8
MultiPL-E        | 67.1             | 51.0         | 67.1         | 71.5              | 36.1        | 77.0

Tool Use:

Benchmark | Mellum2 Instruct | Qwen3.5 (4B) | Qwen3.5 (9B) | Ministral 3 (14B) | OLMo-3 (7B)
BFCL v3   | 66.3             | 64.1         | 70.5         | 52.7              | 41.9
BFCL v4   | 44.2             | 52.0         | 60.6         | 38.8              | 19.8

Math:

Benchmark      | Mellum2 Instruct | Qwen3.5 (4B) | Qwen3.5 (9B) | Ministral 3 (14B) | OLMo-3 (7B)
AIME 2025+2026 | 41.7             | 38.3         | 58.3         | 33.3              | 40.0
GSM-Plus       | 80.5             | 85.2         | 87.9         | 86.6              | 85.8

Knowledge and Conversational:

Benchmark    | Mellum2 Instruct | Qwen3.5 (4B) | Qwen3.5 (9B) | Ministral 3 (14B) | OLMo-3 (7B)
MMLU-Redux   | 78.1             | 87.5         | 91.1         | 85.9              | 71.8
GPQA Diamond | 40.9             | 76.8         | 79.8         | 58.6              | 40.9
IFEval       | 75.8             | 82.1         | 83.9         | 67.3              | 83.2
MixEval      | 62.2             | 65.9         | 71.1         | 71.2              | 59.4

Benchmark notes:

  • EvalPlus is the mean of HumanEval+ and MBPP+
  • AIME is the mean of AIME 2025 and AIME 2026 (30 questions each)
  • BFCL v4 is the macro-average of five subtasks: v1, v2, v3, web search, memory
  • Seed-Coder (8B) does not support native tool calling; BFCL scores are not listed for it
Source: https://blog.jetbrains.com/ai/2026/06/mellum2-goes-open-source-a-fast-model-for-ai-workflows/
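
With eleven benchmarks and up to six models, it is easier to see where Mellum2 Instruct stands by computing its rank in each row. The short script below does that using only the self-reported scores from the tables above. `None` marks a model with no listed score, and ties are counted in Mellum2's favor (only strictly higher scores push its rank down).

```python
MODELS = ["Mellum2 Instruct", "Qwen3.5 (4B)", "Qwen3.5 (9B)",
          "Ministral 3 (14B)", "OLMo-3 (7B)", "Seed-Coder (8B)"]

# Scores copied from the tables above, in MODELS order.
SCORES = {
    "LiveCodeBench v6": [37.2, 51.0, 63.7, 42.4, 28.2, 28.1],
    "EvalPlus":         [78.4, 69.4, 71.8, 74.1, 67.3, 73.8],
    "MultiPL-E":        [67.1, 51.0, 67.1, 71.5, 36.1, 77.0],
    "BFCL v3":          [66.3, 64.1, 70.5, 52.7, 41.9, None],
    "BFCL v4":          [44.2, 52.0, 60.6, 38.8, 19.8, None],
    "AIME 2025+2026":   [41.7, 38.3, 58.3, 33.3, 40.0, None],
    "GSM-Plus":         [80.5, 85.2, 87.9, 86.6, 85.8, None],
    "MMLU-Redux":       [78.1, 87.5, 91.1, 85.9, 71.8, None],
    "GPQA Diamond":     [40.9, 76.8, 79.8, 58.6, 40.9, None],
    "IFEval":           [75.8, 82.1, 83.9, 67.3, 83.2, None],
    "MixEval":          [62.2, 65.9, 71.1, 71.2, 59.4, None],
}


def mellum_rank(row):
    """Return (rank, models_compared) for the first entry in a score row."""
    listed = [score for score in row if score is not None]
    higher = sum(1 for score in listed[1:] if score > row[0])
    return higher + 1, len(listed)


ranks = {name: mellum_rank(row) for name, row in SCORES.items()}
for name, (rank, count) in ranks.items():
    print(f"{name:<17} rank {rank} of {count}")
```

The result shows a model that leads or places near the top on EvalPlus, BFCL v3, and AIME, while sitting in the lower half on knowledge-heavy benchmarks such as MMLU-Redux and GPQA Diamond.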

Use Cases

JetBrains identifies four production scenarios where Mellum2’s latency and efficiency profile is relevant:

  • Routing and orchestration: In a multi-model system, a router analyzes incoming prompts and selects the appropriate model or tool for each task. Mellum2’s low per-token compute makes it suitable for this high-frequency classification step.
  • Low-latency RAG pipelines: Retrieval-Augmented Generation (RAG) systems retrieve relevant context, summarize it, and generate a response. Mellum2 handles retrieval summarization at lower latency than larger dense models.
  • Sub-agents in complex workflows: Agent pipelines break tasks into steps: context gathering, planning, validation, and execution. Mellum2 can handle repetitive or latency-sensitive steps instead of routing every step through a single large frontier model.
  • Private and local deployment: The Apache 2.0 license permits self-hosting and commercial use, subject to its attribution and notice requirements. Engineers can run Mellum2 on their own infrastructure, keeping code and data under their control.
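
The routing scenario above can be sketched as a thin wrapper around a chat call: ask the small model to name one route, then fall back to a default if the reply is not recognized. The route names, the prompt wording, and the `ask_model` callable are all hypothetical. In practice `ask_model` would send the messages to a served Mellum2 endpoint and return the reply text; here a stub stands in so the example runs offline.

```python
# Hypothetical routes for a multi-model coding assistant.
ROUTES = {
    "quick_edit": "small, local code changes",
    "deep_debug": "multi-file debugging that needs long reasoning",
    "web_search": "questions that need fresh information from the internet",
}


def build_router_messages(user_prompt, routes):
    """Build chat messages asking the model to pick exactly one route name."""
    options = "\n".join(f"- {name}: {desc}" for name, desc in routes.items())
    system = ("Classify the request into exactly one route. "
              "Reply with the route name only.\n" + options)
    return [{"role": "system", "content": system},
            {"role": "user", "content": user_prompt}]


def parse_route(reply, routes, default):
    """Map the model's reply to a known route, or fall back to the default."""
    cleaned = reply.strip().strip(".`'\"").lower()
    return cleaned if cleaned in routes else default


def route(user_prompt, ask_model, routes=ROUTES, default="deep_debug"):
    """ask_model takes a list of chat messages and returns the reply text."""
    reply = ask_model(build_router_messages(user_prompt, routes))
    return parse_route(reply, routes, default)


# Offline stub standing in for a real model call.
def fake_model(messages):
    text = messages[-1]["content"].lower()
    return "quick_edit." if "rename" in text else "I am not sure"


print(route("Rename this variable to user_id", fake_model))
print(route("Why does this service deadlock under load?", fake_model))
```

Falling back to a default route on unrecognized output keeps the pipeline predictable even when the classifier reply is malformed, which matters for a step that runs on every request.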

Strengths and Limitations

Strengths:

  • MoE design activates only 2.5B of 12B parameters per token — per-token compute equivalent to a 2.5B dense model
  • MTP head enables speculative decoding without a separate draft model
  • 131,072 token context window
  • Full checkpoint set released: base pretrain, base, SFT, and RL-tuned variants for both Instruct and Thinking
  • Apache 2.0 license — permits commercial use, self-hosting, and fine-tuning
  • Strong EvalPlus (78.4) and BFCL v3 (66.3) scores relative to 4B–14B comparisons
  • vLLM support, including optional tool-calling via --tool-call-parser hermes

Limitations:

  • Text and code only — no image or multimodal input
  • LiveCodeBench v6 (37.2) trails Qwen3.5 4B (51.0), Qwen3.5 9B (63.7), and Ministral 3 14B (42.4)
  • GPQA Diamond (40.9) and MMLU-Redux (78.1) are below most models in the comparison set
  • GSM-Plus (80.5) is below all comparable models listed
  • Not designed for frontier-level tasks — JetBrains explicitly positions Mellum2 as a component model

Marktechpost’s Visual Explainer

Overview

JetBrains Open-Sources Mellum2

A 12B Mixture-of-Experts model released under Apache 2.0 on June 2, 2026. Trained from scratch on ~10.6 trillion tokens for software engineering tasks.

Architecture

How Mellum2 Is Built

MoE activates 8 of 64 experts per token — per-token compute stays equivalent to a 2.5B dense model. An MTP head enables speculative decoding without a separate draft model.

  • Experts (total / active): 64 / 8
  • SWA window: 1,024 (¾ of layers)

Pre-Training

Training Pipeline

Three-phase curriculum progressively shifts from diverse web data toward curated code and math. Context extended to 128K via layer-selective YaRN before post-training.

  • Data: ~10.6 trillion tokens across three curriculum phases
  • Optimizer: Muon under FP8 hybrid precision
  • LR Schedule: Warmup-Hold-Decay with linear decay to zero
  • Context Extension: Layer-selective YaRN to 128K tokens
  • Post-Training: SFT → RLVR on coding, math, tool use, reasoning, knowledge
  • Design Constraint: Inference efficiency on commodity GPUs validated by ablation

Model Family

Six Checkpoints Released

Full pipeline from base pretrain through RL-tuned variants. Use Instruct for direct low-latency answers. Use Thinking for explicit step-by-step reasoning traces.

  • BASE: Mellum2-12B-A2.5B-Base-Pretrain (before context extension)
  • BASE: Mellum2-12B-A2.5B-Base (after YaRN extension)
  • SFT: Mellum2-12B-A2.5B-Instruct-SFT (supervised instruction)
  • SFT: Mellum2-12B-A2.5B-Thinking-SFT (supervised thinking)
  • RLVR: Mellum2-12B-A2.5B-Instruct (RL-tuned, no CoT)
  • RLVR: Mellum2-12B-A2.5B-Thinking (RL-tuned, explicit CoT)

Benchmarks

Evaluation Results (Instruct Variant)

All numbers self-reported by JetBrains. Comparison set: open-weight models in the 4B–14B range.

Benchmark        | Mellum2 | Qwen3.5 9B | Ministral 3 14B | OLMo-3 7B
LiveCodeBench v6 | 37.2    | 63.7       | 42.4            | 28.2
EvalPlus         | 78.4    | 71.8       | 74.1            | 67.3
MultiPL-E        | 67.1    | 67.1       | 71.5            | 36.1
BFCL v3          | 66.3    | 70.5       | 52.7            | 41.9
AIME 2025+2026   | 41.7    | 58.3       | 33.3            | 40.0
IFEval           | 75.8    | 83.9       | 67.3            | 83.2

Use Cases

Where Mellum2 Fits in Production

JetBrains positions Mellum2 as a “focal model” — handling high-frequency, latency-sensitive steps inside larger AI pipelines.

  • Routing & Orchestration — Analyze prompts and select the right model or tool per task
  • RAG Pipelines — Summarize retrieved context at low latency before response generation
  • Sub-Agents — Handle repetitive steps in agent pipelines (context gathering, validation, planning)
  • Private Deployment — Apache 2.0 permits full self-hosting with no external API calls required

Strengths & Limitations

What Works and What Doesn’t

Mellum2 is designed for efficiency in component roles, not frontier-level capability across all benchmarks.

✓ Strengths

  • 2.5B active params — compute of a dense 2.5B model
  • MTP head enables built-in speculative decoding
  • 131K token context window
  • Strong EvalPlus (78.4) and BFCL v3 (66.3)
  • Apache 2.0 — commercial use, fine-tuning, self-hosting
  • vLLM support with tool-calling

✗ Limitations

  • Text and code only — no multimodal input
  • LiveCodeBench v6 (37.2) below Qwen3.5 9B (63.7)
  • GPQA Diamond (40.9) below most comparisons
  • GSM-Plus (80.5) trails all models listed
  • Not a frontier replacement — component role only

Quick Start

Deploy with vLLM

Install vLLM and serve the Instruct variant. Enable tool-calling with the hermes parser for function-calling workflows.

pip install vllm

# Basic serve
vllm serve JetBrains/Mellum2-12B-A2.5B-Instruct \
  --max-model-len 131072

# With tool calling
vllm serve JetBrains/Mellum2-12B-A2.5B-Instruct \
  --max-model-len 131072 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes

Model weights: huggingface.co/JetBrains/mellum-2  ·  Technical report: arXiv:2605.31268

Getting Started

Serve Mellum2 with vLLM:

pip install vllm
vllm serve JetBrains/Mellum2-12B-A2.5B-Instruct --max-model-len 131072

With tool calling enabled:

vllm serve JetBrains/Mellum2-12B-A2.5B-Instruct \
  --max-model-len 131072 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
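
Once the server is running with tool calling enabled, a client can send an OpenAI-style request that includes a `tools` list. The sketch below builds such a request and parses tool calls out of a response using only the standard library. It assumes the server listens on vLLM's default `http://localhost:8000`. The `count_lines` tool is a made-up example, and the sample response is hand-written to show the expected shape, not real model output.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumes vLLM's default host and port
MODEL = "JetBrains/Mellum2-12B-A2.5B-Instruct"


def build_tool_request(question):
    """Chat-completions payload that offers the model one hypothetical tool."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "count_lines",  # hypothetical tool for illustration
                "description": "Count the lines in a source file.",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
        "tool_choice": "auto",
    }


def extract_tool_calls(response):
    """Return (tool_name, arguments_dict) pairs from a chat-completions response."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    return [(c["function"]["name"], json.loads(c["function"]["arguments"]))
            for c in calls]


def send(payload):
    """POST the payload to the server. Requires a running vLLM instance."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


# Hand-written sample in the OpenAI response shape, so this runs without a server.
sample_response = {"choices": [{"message": {"role": "assistant", "tool_calls": [
    {"type": "function",
     "function": {"name": "count_lines", "arguments": "{\"path\": \"main.py\"}"}}
]}}]}

payload = build_tool_request("How long is main.py?")
print(extract_tool_calls(sample_response))
# With a live server: extract_tool_calls(send(payload))
```

The application is responsible for executing the named tool and sending its result back in a follow-up message with the `tool` role; the server only produces the structured call.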

Using the Hugging Face Transformers library:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "JetBrains/Mellum2-12B-A2.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format the conversation with the model's chat template and tokenize it.
messages = [{"role": "user", "content": "Write a Python function to reverse a string."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the new tokens (everything after the prompt).
outputs = model.generate(**inputs, max_new_tokens=512)
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

