How Sakana AI's new evolutionary algorithm builds powerful AI models without expensive retraining

From human clicks to machine intent: Preparing the web for agentic AI

Enshittification by Cory Doctorow: Why Google, Amazon, and Facebook are worse than ever.

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now

A new evolutionary technique from Japan-based AI lab Sakana AI enables developers to augment the capabilities of AI models without costly training and fine-tuning processes. The technique, called Model Merging of Natural Niches (M2N2), overcomes the limitations of other model merging methods and can even evolve new models entirely from scratch.

M2N2 can be applied to different types of machine learning models, including large language models (LLMs) and text-to-image generators. For enterprises looking to build custom AI solutions, the approach offers a powerful and efficient way to create specialized models by combining the strengths of existing open-source variants.

What is model merging?

Model merging is a technique for integrating the knowledge of multiple specialized AI models into a single, more capable model. Instead of fine-tuning, which refines a single pre-trained model using new data, merging combines the parameters of several models simultaneously. This process can consolidate a wealth of knowledge into one asset without requiring expensive, gradient-based training or access to the original training data.

For enterprise teams, this offers several practical advantages over traditional fine-tuning. In comments to VentureBeat, the paper’s authors said model merging is a gradient-free process that only requires forward passes, making it computationally cheaper than fine-tuning, which involves costly gradient updates. Merging also sidesteps the need for carefully balanced training data and mitigates the risk of “catastrophic forgetting,” where a model loses its original capabilities after learning a new task. The technique is especially powerful when the training data for specialist models isn’t available, as merging only requires the model weights themselves.

AI Scaling Hits Its Limits

Power caps, rising token costs, and inference delays are reshaping enterprise AI. Join our exclusive salon to discover how top teams are:

Turning energy into a strategic advantage

Architecting efficient inference for real throughput gains

Unlocking competitive ROI with sustainable AI systems

Secure your spot to stay ahead: https://bit.ly/4mwGngO

Early approaches to model merging required significant manual effort, as developers adjusted coefficients through trial and error to find the optimal blend. More recently, evolutionary algorithms have helped automate this process by searching for the optimal combination of parameters. However, a significant manual step remains: developers must set fixed sets for mergeable parameters, such as layers. This restriction limits the search space and can prevent the discovery of more powerful combinations.

How M2N2 works

M2N2 addresses these limitations by drawing inspiration from evolutionary principles in nature. The algorithm has three key features that allow it to explore a wider range of possibilities and discover more effective model combinations.

Model Merging of Natural Niches Source: arXiv

First, M2N2 eliminates fixed merging boundaries, such as blocks or layers. Instead of grouping parameters by pre-defined layers, it uses flexible “split points” and “mixing ration” to divide and combine models. This means that, for example, the algorithm might merge 30% of the parameters in one layer from Model A with 70% of the parameters from the same layer in Model B. The process starts with an “archive” of seed models. At each step, M2N2 selects two models from the archive, determines a mixing ratio and a split point, and merges them. If the resulting model performs well, it is added back to the archive, replacing a weaker one. This allows the algorithm to explore increasingly complex combinations over time. As the researchers note, “This gradual introduction of complexity ensures a wider range of possibilities while maintaining computational tractability.”

Second, M2N2 manages the diversity of its model population through competition. To understand why diversity is crucial, the researchers offer a simple analogy: “Imagine merging two answer sheets for an exam… If both sheets have exactly the same answers, combining them does not make any improvement. But if each sheet has correct answers for different questions, merging them gives a much stronger result.” Model merging works the same way. The challenge, however, is defining what kind of diversity is valuable. Instead of relying on hand-crafted metrics, M2N2 simulates competition for limited resources. This nature-inspired approach naturally rewards models with unique skills, as they can “tap into uncontested resources” and solve problems others can’t. These niche specialists, the authors note, are the most valuable for merging.

Third, M2N2 uses a heuristic called “attraction” to pair models for merging. Rather than simply combining the top-performing models as in other merging algorithms, it pairs them based on their complementary strengths. An “attraction score” identifies pairs where one model performs well on data points that the other finds challenging. This improves both the efficiency of the search and the quality of the final merged model.

M2N2 in action

The researchers tested M2N2 across three different domains, demonstrating its versatility and effectiveness.

The first was a small-scale experiment evolving neural network–based image classifiers from scratch on the MNIST dataset. M2N2 achieved the highest test accuracy by a substantial margin compared to other methods. The results showed that its diversity-preservation mechanism was key, allowing it to maintain an archive of models with complementary strengths that facilitated effective merging while systematically discarding weaker solutions.

Next, they applied M2N2 to LLMs, combining a math specialist model (WizardMath-7B) with an agentic specialist (AgentEvol-7B), both of which are based on the Llama 2 architecture. The goal was to create a single agent that excelled at both math problems (GSM8K dataset) and web-based tasks (WebShop dataset). The resulting model achieved strong performance on both benchmarks, showcasing M2N2’s ability to create powerful, multi-skilled models.

A model merge with M2N2 combines the best of both seed models Source: arXiv

Finally, the team merged diffusion-based image generation models. They combined a model trained on Japanese prompts (JSDXL) with three Stable Diffusion models primarily trained on English prompts. The objective was to create a model that combined the best image generation capabilities of each seed model while retaining the ability to understand Japanese. The merged model not only produced more photorealistic images with better semantic understanding but also developed an emergent bilingual ability. It could generate high-quality images from both English and Japanese prompts, even though it was optimized exclusively using Japanese captions.

For enterprises that have already developed specialist models, the business case for merging is compelling. The authors point to new, hybrid capabilities that would be difficult to achieve otherwise. For example, merging an LLM fine-tuned for persuasive sales pitches with a vision model trained to interpret customer reactions could create a single agent that adapts its pitch in real-time based on live video feedback. This unlocks the combined intelligence of multiple models with the cost and latency of running just one.

Looking ahead, the researchers see techniques like M2N2 as part of a broader trend toward “model fusion.” They envision a future where organizations maintain entire ecosystems of AI models that are continuously evolving and merging to adapt to new challenges.

“Think of it like an evolving ecosystem where capabilities are combined as needed, rather than building one giant monolith from scratch,” the authors suggest.

The researchers have released the code of M2N2 on GitHub.

The biggest hurdle to this dynamic, self-improving AI ecosystem, the authors believe, is not technical but organizational. “In a world with a large ‘merged model’ made up of open-source, commercial, and custom components, ensuring privacy, security, and compliance will be a critical problem.” For businesses, the challenge will be figuring out which models can be safely and effectively absorbed into their evolving AI stack.

Daily insights on business use cases with VB Daily

If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

Read our Privacy Policy

Thanks for subscribing. Check out more VB newsletters here.

An error occured.