
How to Build an Autonomous Machine Learning Research Loop in Google Colab Using Andrej Karpathy’s AutoResearch Framework for Hyperparameter Discovery and Experiment Tracking

By Josh
March 13, 2026
in AI, Analytics and Automation


In this tutorial, we implement a Colab-ready version of the AutoResearch framework originally proposed by Andrej Karpathy. We build an automated experimentation pipeline that clones the AutoResearch repository, prepares a lightweight training environment, and runs a baseline experiment to establish initial performance metrics. We then create an automated research loop that programmatically edits the hyperparameters in train.py, runs new training iterations, evaluates each resulting model on the validation bits-per-byte (bpb) metric, and logs every experiment in a structured results table. Running this workflow in Google Colab reproduces the core idea of autonomous machine learning research: iteratively modifying training configurations, evaluating performance, and preserving the best configurations, all without specialized hardware or complex infrastructure.

import os, sys, subprocess, json, re, random, shutil, time
from pathlib import Path


def pip_install(pkg):
   subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", pkg])


# Install any dependency that is not already importable.
for pkg in [
    "numpy", "pandas", "pyarrow", "requests",
    "rustbpe", "tiktoken", "openai",
]:
    try:
        __import__(pkg)
    except ImportError:
        pip_install(pkg)


import pandas as pd


if not Path("autoresearch").exists():
   subprocess.run(["git","clone","https://github.com/karpathy/autoresearch.git"])


os.chdir("autoresearch")


OPENAI_API_KEY = None
try:
    # Colab's secret store, available only when running inside Colab.
    from google.colab import userdata
    OPENAI_API_KEY = userdata.get("OPENAI_API_KEY")
except Exception:
    OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")


if OPENAI_API_KEY:
    os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

We begin by importing the core Python libraries required for the automated research workflow. We install all necessary dependencies and clone the autoresearch repository directly from GitHub, ensuring the environment includes the original training framework. We also configure access to the OpenAI API key, if available, allowing the system to optionally support LLM-assisted experimentation later in the pipeline.


prepare_path=Path("prepare.py")
train_path=Path("train.py")
program_path=Path("program.md")


prepare_text=prepare_path.read_text()
train_text=train_path.read_text()


prepare_text=re.sub(r"MAX_SEQ_LEN = \d+","MAX_SEQ_LEN = 512",prepare_text)
prepare_text=re.sub(r"TIME_BUDGET = \d+","TIME_BUDGET = 120",prepare_text)
prepare_text=re.sub(r"EVAL_TOKENS = .*","EVAL_TOKENS = 4 * 65536",prepare_text)


train_text=re.sub(r"DEPTH = \d+","DEPTH = 4",train_text)
train_text=re.sub(r"DEVICE_BATCH_SIZE = \d+","DEVICE_BATCH_SIZE = 16",train_text)
train_text=re.sub(r"TOTAL_BATCH_SIZE = .*","TOTAL_BATCH_SIZE = 2**17",train_text)
train_text=re.sub(r'WINDOW_PATTERN = "SSSL"','WINDOW_PATTERN = "L"',train_text)


prepare_path.write_text(prepare_text)
train_path.write_text(train_text)


program_path.write_text("""
Goal:
Run autonomous research loop on Google Colab.


Rules:
Only modify train.py hyperparameters.


Metric:
Lower val_bpb is better.
""")


subprocess.run(["python","prepare.py","--num-shards","4","--download-workers","2"])

We modify key configuration parameters inside the repository to make the training workflow compatible with Google Colab hardware. We reduce the context length, training time budget, and evaluation token counts so the experiments run within limited GPU resources. After applying these patches, we prepare the dataset shards required for training so that the model can immediately begin experiments.
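A caveat with the `re.sub` patches above: if a pattern fails to match (for example, because the upstream repository renamed a constant), the substitution silently does nothing and the run proceeds with the old value. Below is a hedged sketch of a checked variant, applied to a toy config string rather than the real train.py (the `patch_config` helper name is ours):

```python
import re

def patch_config(text, pattern, replacement):
    # re.subn reports how many substitutions happened, so we can fail loudly.
    new_text, count = re.subn(pattern, replacement, text)
    if count == 0:
        raise ValueError(f"pattern not found: {pattern!r}")
    return new_text

# Toy stand-in for train.py; the real file's contents may differ.
config = "DEPTH = 12\nDEVICE_BATCH_SIZE = 32\n"
config = patch_config(config, r"DEPTH = \d+", "DEPTH = 4")
config = patch_config(config, r"DEVICE_BATCH_SIZE = \d+", "DEVICE_BATCH_SIZE = 16")
print(config)
```

Raising on a zero-match substitution turns a silent misconfiguration into an immediate, debuggable error.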

subprocess.run("python train.py > baseline.log 2>&1",shell=True)


def parse_run_log(log_path):
   text=Path(log_path).read_text(errors="ignore")
   def find(p):
       m=re.search(p,text,re.MULTILINE)
       return float(m.group(1)) if m else None
   return {
       "val_bpb":find(r"^val_bpb:\s*([0-9.]+)"),
       "training_seconds":find(r"^training_seconds:\s*([0-9.]+)"),
       "peak_vram_mb":find(r"^peak_vram_mb:\s*([0-9.]+)"),
       "num_steps":find(r"^num_steps:\s*([0-9.]+)")
   }


baseline=parse_run_log("baseline.log")


results_path=Path("results.tsv")


rows=[{
   "commit":"baseline",
   "val_bpb":baseline["val_bpb"] if baseline["val_bpb"] else 0,
   "memory_gb":round((baseline["peak_vram_mb"] or 0)/1024,1),
   "status":"keep",
   "description":"baseline"
}]


pd.DataFrame(rows).to_csv(results_path,sep="\t",index=False)


print("Baseline:",baseline)

We execute the baseline training run to establish an initial performance reference for the model. We implement a log-parsing function that extracts key training metrics, including validation bits-per-byte, training time, GPU memory usage, and optimization steps. We then store these baseline results in a structured experiment table so that all future experiments can be compared against this starting configuration.
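To see how the parser behaves, here is a self-contained check against a fabricated log snippet. The metric line format (`val_bpb: <number>` at the start of a line) matches what the anchored regexes above expect; real training logs may differ:

```python
import re

def parse_metrics(text):
    # Same anchored-regex approach as parse_run_log, on an in-memory string.
    def find(pattern):
        m = re.search(pattern, text, re.MULTILINE)
        return float(m.group(1)) if m else None
    return {
        "val_bpb": find(r"^val_bpb:\s*([0-9.]+)"),
        "training_seconds": find(r"^training_seconds:\s*([0-9.]+)"),
    }

sample_log = "step 100 | loss 1.23\nval_bpb: 1.0425\ntraining_seconds: 118.7\n"
print(parse_metrics(sample_log))   # both fields parsed as floats
print(parse_metrics("no metrics"))  # every field is None
```

Returning `None` for missing fields (rather than raising) is what lets the research loop later treat a failed run as "no score" and discard it.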

TRAIN_FILE=Path("train.py")
BACKUP_FILE=Path("train.base.py")


if not BACKUP_FILE.exists():
   shutil.copy2(TRAIN_FILE,BACKUP_FILE)


HP_KEYS=[
"WINDOW_PATTERN",
"TOTAL_BATCH_SIZE",
"EMBEDDING_LR",
"UNEMBEDDING_LR",
"MATRIX_LR",
"SCALAR_LR",
"WEIGHT_DECAY",
"ADAM_BETAS",
"WARMUP_RATIO",
"WARMDOWN_RATIO",
"FINAL_LR_FRAC",
"DEPTH",
"DEVICE_BATCH_SIZE"
]


def read_text(path):
   return Path(path).read_text()


def write_text(path,text):
   Path(path).write_text(text)


def extract_hparams(text):
   vals={}
   for k in HP_KEYS:
       m=re.search(rf"^{k}\s*=\s*(.+?)$",text,re.MULTILINE)
       if m:
           vals[k]=m.group(1).strip()
   return vals


def set_hparam(text,key,value):
   return re.sub(rf"^{key}\s*=.*$",f"{key} = {value}",text,flags=re.MULTILINE)


base_text=read_text(BACKUP_FILE)
base_hparams=extract_hparams(base_text)


SEARCH_SPACE={
"WINDOW_PATTERN":['"L"','"SSSL"'],
"TOTAL_BATCH_SIZE":["2**16","2**17","2**18"],
"EMBEDDING_LR":["0.2","0.4","0.6"],
"MATRIX_LR":["0.01","0.02","0.04"],
"SCALAR_LR":["0.3","0.5","0.7"],
"WEIGHT_DECAY":["0.05","0.1","0.2"],
"ADAM_BETAS":["(0.8,0.95)","(0.9,0.95)"],
"WARMUP_RATIO":["0.0","0.05","0.1"],
"WARMDOWN_RATIO":["0.3","0.5","0.7"],
"FINAL_LR_FRAC":["0.0","0.05"],
"DEPTH":["3","4","5","6"],
"DEVICE_BATCH_SIZE":["8","12","16","24"]
}


def sample_candidate():
   keys=random.sample(list(SEARCH_SPACE.keys()),random.choice([2,3,4]))
   cand=dict(base_hparams)
   changes={}
   for k in keys:
       cand[k]=random.choice(SEARCH_SPACE[k])
       changes[k]=cand[k]
   return cand,changes


def apply_hparams(candidate):
   text=read_text(BACKUP_FILE)
   for k,v in candidate.items():
       text=set_hparam(text,k,v)
   write_text(TRAIN_FILE,text)


def run_experiment(tag):
   log=f"{tag}.log"
   subprocess.run(f"python train.py > {log} 2>&1",shell=True)
   metrics=parse_run_log(log)
   metrics["log"]=log
   return metrics

We build the core utilities that enable automated hyperparameter experimentation. We extract the hyperparameters from train.py, define the searchable parameter space, and implement functions that can programmatically edit these values. We also create mechanisms to generate candidate configurations, apply them to the training script, and run experiments while recording their outputs.
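The sampling strategy is easiest to see on a toy search space: copy the base configuration, then overwrite a small random subset of searchable keys. This sketch passes an explicit `random.Random` instance for reproducibility, a small variation on the tutorial's module-level `random` calls:

```python
import random

BASE = {"DEPTH": "4", "MATRIX_LR": "0.02", "WEIGHT_DECAY": "0.1"}
SPACE = {"DEPTH": ["3", "4", "5"], "MATRIX_LR": ["0.01", "0.02", "0.04"]}

def sample_candidate(base, space, rng):
    # Perturb 1-2 searchable keys; every other key keeps its base value.
    keys = rng.sample(list(space), rng.choice([1, 2]))
    cand, changes = dict(base), {}
    for k in keys:
        cand[k] = rng.choice(space[k])
        changes[k] = cand[k]
    return cand, changes

cand, changes = sample_candidate(BASE, SPACE, random.Random(0))
print(changes)               # only the perturbed subset
print(cand["WEIGHT_DECAY"])  # non-searchable keys are untouched
```

Returning the `changes` dict separately is what makes the experiment log readable: each row records only the deltas from the base configuration, not the full hyperparameter set.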

N_EXPERIMENTS=3


df=pd.read_csv(results_path,sep="\t")
best=df["val_bpb"].replace(0,999).min()


for i in range(N_EXPERIMENTS):
    tag = f"exp_{i+1}"
    candidate, changes = sample_candidate()
    apply_hparams(candidate)
    metrics = run_experiment(tag)

    # Keep the new config only if it strictly improves validation bpb;
    # otherwise restore the best-known train.py from the backup.
    if metrics["val_bpb"] and metrics["val_bpb"] < best:
        status = "keep"
        best = metrics["val_bpb"]
        shutil.copy2(TRAIN_FILE, BACKUP_FILE)
    else:
        status = "discard"
        shutil.copy2(BACKUP_FILE, TRAIN_FILE)

    row = {
        "commit": tag,
        "val_bpb": metrics["val_bpb"] or 0,
        "memory_gb": round((metrics["peak_vram_mb"] or 0)/1024, 1),
        "status": status,
        "description": str(changes),
    }

    df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
    df.to_csv(results_path, sep="\t", index=False)

    print("Experiment", tag)
    print("Changes:", changes)
    print("Metrics:", metrics)
    print("Status:", status)
    print()


print("Final Results")
print(df.sort_values("val_bpb"))


try:
    # When running in Colab, offer the tuned script and results for download.
    from google.colab import files
    files.download("train.py")
    files.download("results.tsv")
except Exception:
    pass

We run the automated research loop that repeatedly proposes new hyperparameter configurations and evaluates their performance. For each experiment, we modify the training script, run the training process, and compare the resulting validation score with the best configuration discovered so far. We log all experiment results, preserve improved configurations, and export the best training script along with the experiment history for further analysis.
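Once the loop finishes, results.tsv can be analyzed like any experiment table. A short sketch with fabricated rows mirroring the schema written above (commit, val_bpb, memory_gb, status), where a zero val_bpb denotes a failed log parse:

```python
import pandas as pd

# Fabricated rows with the same schema as results.tsv above.
df = pd.DataFrame([
    {"commit": "baseline", "val_bpb": 1.10, "memory_gb": 5.2, "status": "keep"},
    {"commit": "exp_1",    "val_bpb": 1.07, "memory_gb": 5.4, "status": "keep"},
    {"commit": "exp_2",    "val_bpb": 0.0,  "memory_gb": 0.0, "status": "discard"},
])

# A val_bpb of 0 means the run's log could not be parsed, so exclude
# those rows before ranking, mirroring the replace(0, 999) trick above.
valid = df[df["val_bpb"] > 0]
best = valid.loc[valid["val_bpb"].idxmin()]
print(best["commit"], best["val_bpb"])  # exp_1 1.07
```

Filtering explicitly rather than substituting a sentinel like 999 makes the intent clearer and avoids a magic number leaking into downstream statistics.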

In conclusion, we constructed a complete automated research workflow that demonstrates how machines can iteratively explore model configurations and improve training performance with minimal manual intervention. Throughout the tutorial, we prepared the dataset, established a baseline experiment, and implemented a search loop that proposes new hyperparameter configurations, runs experiments, and tracks results across multiple trials. By maintaining experiment logs and automatically preserving improved configurations, we created a reproducible and extensible research process that mirrors the workflow used in modern machine learning experimentation. This approach illustrates how we can combine automation, experimentation tracking, and lightweight infrastructure to accelerate model development and enable scalable research directly from a cloud notebook environment.


