Mistral AI Releases Devstral 2507 for Code-Centric Language Modeling

Mistral AI, in collaboration with All Hands AI, has released updated versions of its developer-focused large language models under the Devstral 2507 label. The release includes two models—Devstral Small 1.1 and Devstral Medium 2507—designed to support agent-based code reasoning, program synthesis, and structured task execution across large software repositories. These models are optimized for performance and cost, making them applicable for real-world use in developer tools and code automation systems.

Devstral Small 1.1: Open Model for Local and Embedded Use

Devstral Small 1.1 (also called devstral-small-2507) is based on the Mistral-Small-3.1 foundation model and contains approximately 24 billion parameters. It supports a 128k token context window, which allows it to handle multi-file code inputs and long prompts typical in software engineering workflows.

Sakana AI Releases Fugu-Cyber: An Orchestration Model Reporting 86.9% on CyberGym and 72.1% on CTI-REALM

Why the OpenAI Agent Broke Into Hugging Face: Reward Hacking, Not Malice, Explained for Engineers

The model is fine-tuned specifically for structured outputs, including XML and function-calling formats. This makes it compatible with agent frameworks such as OpenHands and suitable for tasks like program navigation, multi-step edits, and code search. It is licensed under Apache 2.0 and available for both research and commercial use.

Source: https://mistral.ai/news/devstral-2507

Performance: SWE-Bench Results

Devstral Small 1.1 achieves 53.6% on the SWE-Bench Verified benchmark, which evaluates the model’s ability to generate correct patches for real GitHub issues. This represents a noticeable improvement over the previous version (1.0) and places it ahead of other openly available models of comparable size. The results were obtained using the OpenHands scaffold, which provides a standard test environment for evaluating code agents.

While not at the level of the largest proprietary models, this version offers a balance between size, inference cost, and reasoning performance that is practical for many coding tasks.

Deployment: Local Inference and Quantization

The model is released in multiple formats. Quantized versions in GGUF are available for use with llama.cpp, vLLM, and LM Studio. These formats make it possible to run inference locally on high-memory GPUs (e.g., RTX 4090) or Apple Silicon machines with 32GB RAM or more. This is beneficial for developers or teams that prefer to operate without dependency on hosted APIs.

Mistral also makes the model available via their inference API. The current pricing is $0.10 per million input tokens and $0.30 per million output tokens, the same as other models in the Mistral-Small line.

Devstral Medium 2507: Higher Accuracy, API-Only

Devstral Medium 2507 is not open-sourced and is only available through the Mistral API or through enterprise deployment agreements. It offers the same 128k token context length as the Small version but with higher performance.

The model scores 61.6% on SWE-Bench Verified, outperforming several commercial models, including Gemini 2.5 Pro and GPT-4.1, in the same evaluation framework. Its stronger reasoning capacity over long contexts makes it a candidate for code agents that operate across large monorepos or repositories with cross-file dependencies.

API pricing is set at $0.40 per million input tokens and $2 per million output tokens. Fine-tuning is available for enterprise users via the Mistral platform.

Comparison and Use Case Fit

Model	SWE-Bench Verified	Open Source	Input Cost	Output Cost	Context Length
Devstral Small 1.1	53.6%	Yes	$0.10/M	$0.30/M	128k tokens
Devstral Medium	61.6%	No	$0.40/M	$2.00/M	128k tokens

Devstral Small is more suitable for local development, experimentation, or integrating into client-side developer tools where control and efficiency are important. In contrast, Devstral Medium provides stronger accuracy and consistency in structured code-editing tasks and is intended for production services that benefit from higher performance despite increased cost.

Integration with Tooling and Agents

Both models are designed to support integration with code agent frameworks such as OpenHands. The support for structured function calls and XML output formats allows them to be integrated into automated workflows for test generation, refactoring, and bug fixing. This compatibility makes it easier to connect Devstral models to IDE plugins, version control bots, and internal CI/CD pipelines.

For example, developers can use Devstral Small for prototyping local workflows, while Devstral Medium can be used in production services that apply patches or triage pull requests based on model suggestions.

Conclusion

The Devstral 2507 release reflects a targeted update to Mistral’s code-oriented LLM stack, offering users a clearer tradeoff between inference cost and task accuracy. Devstral Small provides an accessible, open model with sufficient performance for many use cases, while Devstral Medium caters to applications where correctness and reliability are critical.

The availability of both models under different deployment options makes them relevant across various stages of the software engineering workflow—from experimental agent development to deployment in commercial environments.

Check out the Technical details, Devstral Small model weights at Hugging Face and Devstral Medium will also be available on Mistral Code for enterprise customers and on finetuning API. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter, and Youtube and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.

Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.

Source_link

Mistral AI Releases Devstral 2507 for Code-Centric Language Modeling

READ ALSO

Sakana AI Releases Fugu-Cyber: An Orchestration Model Reporting 86.9% on CyberGym and 72.1% on CTI-REALM

Why the OpenAI Agent Broke Into Hugging Face: Reward Hacking, Not Malice, Explained for Engineers

Related Posts

Sakana AI Releases Fugu-Cyber: An Orchestration Model Reporting 86.9% on CyberGym and 72.1% on CTI-REALM

Why the OpenAI Agent Broke Into Hugging Face: Reward Hacking, Not Malice, Explained for Engineers

Datalab’s Marker 2 vs MinerU, Docling and LiteParse: 76.0 on olmOCR-bench at 5× MinerU’s Throughput

Working to automate nuclear plant operations | MIT News

How to Build an End-to-End OCR Pipeline with Baidu’s Unlimited-OCR for High-Resolution Images and Multi-Page PDF Parsing

MIT projects selected for funding under US Department of Energy’s Genesis Mission | MIT News

The best Amazon Prime Day deals under $50 that you can get before the event is over

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

Communication Effectiveness Skills For Business Leaders

App Development Cost in Singapore: Pricing Breakdown & Insights

Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

EDITOR'S PICK

Google’s AI model is getting really good at spoofing phone photos

Fans ‘Find Their Mountain’ at Paramount+’s The Lodge

Finland Embedded Conference Announced for 2026

The Scoop: Malört made its bad flavor part of the brand experience

About

Categories

Recent Posts