
Build a Low-Footprint AI Coding Assistant with Mistral Devstral

By Josh
June 25, 2025
in AI, Analytics and Automation


This Ultra-Light Mistral Devstral tutorial is a Colab-friendly guide designed specifically for users facing disk-space constraints. Running large language models like Mistral can be a challenge in environments with limited storage and memory, but this tutorial shows how to deploy the powerful Devstral Small model anyway. With aggressive 4-bit quantization via BitsAndBytes, careful cache management, and efficient token generation, it walks you through building a lightweight assistant that is fast, interactive, and disk-conscious. Whether you are debugging code, writing small tools, or prototyping on the go, this setup delivers maximum performance with a minimal footprint.

!pip install -q kagglehub mistral-common bitsandbytes transformers --no-cache-dir
!pip install -q accelerate torch --no-cache-dir


import shutil
import os
import gc

The tutorial begins by installing the essential lightweight packages kagglehub, mistral-common, bitsandbytes, and transformers with pip's --no-cache-dir flag so that no download cache is kept on disk, along with accelerate and torch for efficient model loading and inference. The shutil, os, and gc modules are then imported so that pre-existing cache and temporary directories can be cleared in the next step.

def cleanup_cache():
    """Clean up unnecessary files to save disk space"""
    cache_dirs = ['/root/.cache', '/tmp/kagglehub']
    for cache_dir in cache_dirs:
        if os.path.exists(cache_dir):
            shutil.rmtree(cache_dir, ignore_errors=True)
    gc.collect()


cleanup_cache()
print("🧹 Disk space optimized!")

To maintain a minimal disk footprint throughout execution, the cleanup_cache() function is defined to remove redundant cache directories like /root/.cache and /tmp/kagglehub. This proactive cleanup helps free up space before and after key operations. Once invoked, the function confirms that disk space has been optimized, reinforcing the tutorial’s focus on resource efficiency.

import warnings
warnings.filterwarnings("ignore")


import torch
import kagglehub
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

To ensure smooth execution without distracting warning messages, we suppress all runtime warnings using Python's warnings module. We then import the essential libraries for model interaction, including torch for tensor computation, kagglehub for streaming the model, and transformers for loading the quantized LLM. Mistral-specific classes such as UserMessage, ChatCompletionRequest, and MistralTokenizer are also imported to handle tokenization and request formatting tailored to Devstral's architecture.

class LightweightDevstral:
    def __init__(self):
        print("📦 Downloading model (streaming mode)...")

        self.model_path = kagglehub.model_download(
            'mistral-ai/devstral-small-2505/Transformers/devstral-small-2505/1',
            force_download=False
        )

        quantization_config = BitsAndBytesConfig(
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_storage=torch.uint8,
            load_in_4bit=True
        )

        print("⚡ Loading ultra-compressed model...")
        self.model = AutoModelForCausalLM.from_pretrained(
            self.model_path,
            torch_dtype=torch.float16,
            device_map="auto",
            quantization_config=quantization_config,
            low_cpu_mem_usage=True,
            trust_remote_code=True
        )

        self.tokenizer = MistralTokenizer.from_file(f'{self.model_path}/tekken.json')

        cleanup_cache()
        print("✅ Lightweight assistant ready! (~2GB disk usage)")

    def generate(self, prompt, max_tokens=400):
        """Memory-efficient generation"""
        tokenized = self.tokenizer.encode_chat_completion(
            ChatCompletionRequest(messages=[UserMessage(content=prompt)])
        )

        input_ids = torch.tensor([tokenized.tokens])
        if torch.cuda.is_available():
            input_ids = input_ids.to(self.model.device)

        with torch.inference_mode():
            output = self.model.generate(
                input_ids=input_ids,
                max_new_tokens=max_tokens,
                temperature=0.6,
                top_p=0.85,
                do_sample=True,
                pad_token_id=self.tokenizer.eos_token_id,
                use_cache=True
            )[0]

        del input_ids
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

        return self.tokenizer.decode(output[len(tokenized.tokens):])


print("🚀 Initializing lightweight AI assistant...")
assistant = LightweightDevstral()

We define the LightweightDevstral class, the core component of the tutorial, which handles model loading and text generation in a resource-efficient manner. It begins by streaming the devstral-small-2505 model through kagglehub, avoiding redundant downloads. The model is then loaded with aggressive 4-bit quantization via BitsAndBytesConfig, significantly reducing memory and disk usage while still enabling performant inference. The Mistral tokenizer is initialized from the tekken.json file bundled with the model, and the cache is cleared immediately afterward. The generate method follows memory-safe practices, such as torch.inference_mode() and empty_cache(), to produce responses efficiently, making this assistant suitable even for environments with tight hardware constraints.
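
As an optional sanity check (not part of the original notebook), you can verify the reduced footprint right after loading by reporting how much GPU memory the 4-bit model actually holds:

# Optional, illustrative check: report approximate GPU memory held by the quantized model.
if torch.cuda.is_available():
    allocated_gb = torch.cuda.memory_allocated() / 1024**3
    print(f"📊 GPU memory allocated: {allocated_gb:.2f} GB")
else:
    print("📊 Running on CPU; GPU memory check skipped.")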

def run_demo(title, prompt, emoji="🎯"):
    """Run a single demo with cleanup"""
    print(f"\n{emoji} {title}")
    print("-" * 50)

    result = assistant.generate(prompt, max_tokens=350)
    print(result)

    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


run_demo(
    "Quick Prime Finder",
    "Write a fast prime checker function `is_prime(n)` with explanation and test cases.",
    "🔢"
)


run_demo(
    "Debug This Code",
    """Fix this buggy function and explain the issues:
```python
def avg_positive(numbers):
    total = sum([n for n in numbers if n > 0])
    return total / len([n for n in numbers if n > 0])
```""",
    "🐛"
)


run_demo(
    "Text Tool Creator",
    "Create a simple `TextAnalyzer` class with word count, char count, and palindrome check methods.",
    "🛠️"
)

Here we showcase the model’s coding abilities through a compact demo suite using the run_demo() function. Each demo sends a prompt to the Devstral assistant and prints the generated response, immediately followed by memory cleanup to prevent buildup over multiple runs. The examples include writing an efficient prime-checking function, debugging a Python snippet with logical flaws, and building a mini TextAnalyzer class. These demonstrations highlight the model’s utility as a lightweight, disk-conscious coding assistant capable of real-time code generation and explanation.

def quick_coding():
    """Lightweight interactive session"""
    print("\n🎮 QUICK CODING MODE")
    print("=" * 40)
    print("Enter short coding prompts (type 'exit' to quit)")

    session_count = 0
    max_sessions = 5

    while session_count < max_sessions:
        prompt = input(f"\n[{session_count+1}/{max_sessions}] Your prompt: ")

        if prompt.lower() in ['exit', 'quit', '']:
            break

        try:
            result = assistant.generate(prompt, max_tokens=300)
            print("💡 Solution:")
            print(result[:500])

            gc.collect()
            if torch.cuda.is_available():
                torch.cuda.empty_cache()

        except Exception as e:
            print(f"❌ Error: {str(e)[:100]}...")

        session_count += 1

    print("\n✅ Session complete! Memory cleaned.")

We introduce Quick Coding Mode, a lightweight interactive interface that allows users to submit short coding prompts directly to the Devstral assistant. Designed to limit memory usage, the session caps interaction to five prompts, each followed by aggressive memory cleanup to ensure continued responsiveness in low-resource environments. The assistant responds with concise, truncated code suggestions, making this mode ideal for rapid prototyping, debugging, or exploring coding concepts on the fly, all without overwhelming the notebook’s disk or memory capacity.
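
Note that quick_coding() is only defined above and not invoked automatically; a minimal sketch for starting the capped interactive session in a fresh notebook cell, assuming the earlier cells have already run, looks like this:

# Start the interactive session defined above; it reads prompts via input()
# and stops after five turns or when the user types 'exit'.
quick_coding()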

def check_disk_usage():
    """Monitor disk usage"""
    import subprocess
    try:
        result = subprocess.run(['df', '-h', '/'], capture_output=True, text=True)
        lines = result.stdout.split('\n')
        if len(lines) > 1:
            usage_line = lines[1].split()
            used = usage_line[2]
            available = usage_line[3]
            print(f"💾 Disk: {used} used, {available} available")
    except Exception:
        print("💾 Disk usage check unavailable")




print("\n🎉 Tutorial Complete!")
cleanup_cache()
check_disk_usage()


print("\n💡 Space-Saving Tips:")
print("• Model uses ~2GB vs original ~7GB+")
print("• Automatic cache cleanup after each use") 
print("• Limited token generation to save memory")
print("• Use 'del assistant' when done to free ~2GB")
print("• Restart runtime if memory issues persist")

Finally, we offer a cleanup routine and a helpful disk usage monitor. Using the df -h command via Python’s subprocess module, it displays how much disk space is used and available, confirming the model’s lightweight nature. After re-invoking cleanup_cache() to ensure minimal residue, the script concludes with a set of practical space-saving tips.
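
To act on the space-saving tips above, here is a minimal teardown sketch (assuming the assistant object and helper functions from the earlier cells) for releasing the roughly 2 GB the model occupies once you are done:

# Illustrative teardown: drop the quantized model, reclaim Python objects, and clear GPU and disk caches.
del assistant
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
cleanup_cache()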

In conclusion, we can now leverage the capabilities of Mistral’s Devstral model in space-constrained environments like Google Colab, without compromising usability or speed. The model loads in a highly compressed format, performs efficient text generation, and ensures memory is promptly cleared after use. With the interactive coding mode and demo suite included, users can test their ideas quickly and seamlessly.


Check out the codes. All credit for this research goes to the researchers of this project.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.



