A Coding Implementation for an Agentic AI Framework that Performs Literature Analysis, Hypothesis Generation, Experimental Planning, Simulation, and Scientific Reporting

By Josh
November 28, 2025


In this tutorial, we build a small scientific discovery agent step by step and see how its components fit together into a coherent research workflow. We begin by loading a literature corpus and constructing the retrieval and LLM modules, then assemble agents that search papers, generate hypotheses, design experiments, and produce structured reports. The snippets below show how the pipeline takes a scientific question from an initial query to a written analysis within a single system. Check out the FULL CODES here.

import sys, subprocess


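def install_deps():
   # torch is needed too: the model is loaded with PyTorch and generate_text uses return_tensors="pt".
   pkgs = ["transformers", "torch", "scikit-learn", "numpy"]
   subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + pkgs)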


try:
   from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
   from sklearn.feature_extraction.text import TfidfVectorizer
   from sklearn.metrics.pairwise import cosine_similarity
   import numpy as np
except ImportError:
   install_deps()
   from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
   from sklearn.feature_extraction.text import TfidfVectorizer
   from sklearn.metrics.pairwise import cosine_similarity
   import numpy as np


from dataclasses import dataclass
from typing import List, Dict, Any


np.random.seed(42)


LITERATURE = [
   {"id": "P1","title": "Self-Supervised Protein Language Models for Structure Prediction","field": "computational biology",
    "abstract": "We explore transformer-based protein language models trained on millions of sequences. The models learn residue-level embeddings that improve secondary structure prediction and stability estimation."},
   {"id": "P2","title": "CRISPR Off-Target Detection Using Deep Learning","field": "genome editing",
    "abstract": "We propose a convolutional neural network architecture for predicting CRISPR-Cas9 off-target effects directly from genomic sequences, achieving state-of-the-art accuracy on GUIDE-seq datasets."},
   {"id": "P3","title": "Foundation Models for Scientific Equation Discovery","field": "scientific ML",
    "abstract": "Large language models are combined with symbolic regression to recover governing equations from noisy experimental observations in physics and fluid dynamics."},
   {"id": "P4","title": "Active Learning for Materials Property Optimization","field": "materials science",
    "abstract": "We integrate Bayesian optimization with graph neural networks to actively select candidate materials that maximize target properties while reducing experimental cost."},
   {"id": "P5","title": "Graph-Based Retrieval for Cross-Domain Literature Review","field": "NLP for science",
    "abstract": "We construct a heterogeneous citation and concept graph over multi-domain scientific papers and show that graph-aware retrieval improves cross-domain literature exploration."},
]


corpus_texts = [p["abstract"] + " " + p["title"] for p in LITERATURE]
vectorizer = TfidfVectorizer(stop_words="english")
corpus_matrix = vectorizer.fit_transform(corpus_texts)


MODEL_NAME = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)


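def generate_text(prompt: str, max_new_tokens: int = 256) -> str:
   # Tokenize the prompt; prompts longer than the tokenizer's limit are truncated.
   inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
   # Beam search with 4 beams; early_stopping ends the search once enough beams have finished.
   outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4, early_stopping=True)
   return tokenizer.decode(outputs[0], skip_special_tokens=True)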

We lay the foundation for our scientific agent by importing the libraries (installing them first if needed), defining a five-paper literature corpus, and loading the google/flan-t5-small model. We fit a TF-IDF vectorizer on each paper's abstract and title so that we can later retrieve relevant papers by similarity. The generate_text helper wraps tokenization, beam-search generation, and decoding, and every later step that calls the model reuses it.
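To make the retrieval setup concrete, here is a minimal pure-Python sketch of the idea behind TF-IDF weighting and cosine similarity. It is an illustration only: the helper names (tokenize, tfidf_vectors, vectorize_query, cosine) and the toy documents are assumptions, not part of the tutorial. The IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1, but the whitespace tokenizer and the lack of stop-word filtering mean the scores will not match TfidfVectorizer exactly.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real tokenizer is more careful.
    return text.lower().split()


def tfidf_vectors(docs):
    # Build one sparse {term: weight} vector per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def vectorize_query(query, idf):
    # Terms never seen in the corpus are ignored.
    counts = Counter(t for t in tokenize(query) if t in idf)
    return {t: tf * idf[t] for t, tf in counts.items()}


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus (hypothetical, much shorter than the tutorial's abstracts).
docs = [
    "protein language models predict structure",
    "deep learning predicts crispr off-target effects",
    "bayesian optimization selects candidate materials",
]
vectors, idf = tfidf_vectors(docs)
query_vec = vectorize_query("crispr off-target prediction", idf)
scores = [cosine(query_vec, v) for v in vectors]
best = max(range(len(docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

Only the second document shares terms with the query, so it is the only one with a non-zero score. Note that "prediction" matches neither "predict" nor "predicts": TF-IDF compares exact tokens, which is one reason a purely lexical retriever can miss relevant papers.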

@dataclass
class PaperHit:
   paper: Dict[str, Any]
   score: float


class LiteratureAgent:
   def __init__(self, vectorizer, corpus_matrix, papers: List[Dict[str, Any]]):
       self.vectorizer = vectorizer
       self.corpus_matrix = corpus_matrix
       self.papers = papers


   def search(self, query: str, k: int = 3) -> List[PaperHit]:
       q_vec = self.vectorizer.transform([query])
       sims = cosine_similarity(q_vec, self.corpus_matrix)[0]
       idxs = np.argsort(-sims)[:k]
       hits = [PaperHit(self.papers[i], float(sims[i])) for i in idxs]
       return hits

We implement the literature-search component of our agent. The LiteratureAgent transforms a query with the fitted vectorizer, computes its cosine similarity to every paper in the corpus, and returns the k highest-scoring papers as PaperHit objects. This gives the system a way to ground its reasoning in the closest-matching prior work.
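The ranking step can also be sketched without NumPy. The function below mirrors the np.argsort(-sims)[:k] selection using sorted, and adds a hypothetical min_score filter (not part of the original code) so that papers with zero similarity are not returned as hits. The name top_k and the sample scores are assumptions for illustration.

```python
from typing import List, Tuple


def top_k(scores: List[float], k: int = 3, min_score: float = 0.0) -> List[Tuple[int, float]]:
    # Rank indices by descending score; sorted() is stable, so ties keep corpus order.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Keep at most k entries and drop anything at or below the threshold.
    return [(i, scores[i]) for i in ranked[:k] if scores[i] > min_score]


# Hypothetical similarity scores for a five-paper corpus.
sims = [0.12, 0.0, 0.47, 0.0, 0.31]
hits = top_k(sims, k=3)
print(hits)  # [(2, 0.47), (4, 0.31), (0, 0.12)]
```

Without the threshold, a query that matches only one paper would still return k hits, and the later prompt would cite unrelated work.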

@dataclass
class ExperimentPlan:
   system: str
   hypothesis: str
   variables: Dict[str, Any]
   protocol: List[str]


@dataclass
class ExperimentResult:
   plan: ExperimentPlan
   metrics: Dict[str, float]


class ExperimentAgent:
   def design_experiment(self, question: str, hypothesis: str, hits: List[PaperHit]) -> ExperimentPlan:
       top_field = hits[0].paper["field"] if hits else "computational science"
       protocol = [
           f"Construct dataset combining ideas from: {', '.join(h.paper['id'] for h in hits)}.",
           "Split data into train/validation/test.",
           "Compare baseline model vs. augmented model implementing the hypothesis.",
           "Evaluate using appropriate metrics and perform ablation analysis.",
       ]
       variables = {
           "baseline_model": "sequence CNN",
           "augmented_model": "protein language model + CNN",
           "n_train_samples": 5000,
           "n_validation_samples": 1000,
           "metric": "AUROC",
       }
       system = f"{top_field} system related to: {question}"
       return ExperimentPlan(system=system, hypothesis=hypothesis, variables=variables, protocol=protocol)


   def run_experiment(self, plan: ExperimentPlan) -> ExperimentResult:
       base = 0.78 + 0.02 * np.random.randn()
       gain = abs(0.05 + 0.01 * np.random.randn())
       metrics = {
           "baseline_AUROC": round(base, 3),
           "augmented_AUROC": round(base + gain, 3),
           "estimated_gain": round(gain, 3),
       }
       return ExperimentResult(plan=plan, metrics=metrics)

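We design and simulate experiments based on the retrieved literature and the generated hypothesis. The design step builds a four-step protocol that names the retrieved paper IDs and attaches a fixed set of variables (the two models, the sample counts, and AUROC as the metric). The run step does not train anything: it draws a baseline AUROC around 0.78 and a gain around 0.05 from seeded Gaussian noise, which gives the report stage placeholder numbers to work with. This lets us move from a hypothesis to a concrete, if simulated, experimental plan.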
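The simulated run can be made reproducible without relying on global random state. The sketch below is a variant under stated assumptions, not the tutorial's code: it uses Python's standard random.Random with a local seed instead of np.random, and the function name simulate_auroc and its parameters are hypothetical. The numbers it returns are synthetic and say nothing about real model performance.

```python
import random
from typing import Dict


def simulate_auroc(seed: int = 42, base_mean: float = 0.78, gain_mean: float = 0.05) -> Dict[str, float]:
    # A local generator keeps the draw independent of any other random calls.
    rng = random.Random(seed)
    base = base_mean + 0.02 * rng.gauss(0.0, 1.0)
    # abs() keeps the simulated gain non-negative, as in the original snippet.
    gain = abs(gain_mean + 0.01 * rng.gauss(0.0, 1.0))
    # AUROC is bounded by 1.0, so clip the augmented score.
    augmented = min(base + gain, 1.0)
    return {
        "baseline_AUROC": round(base, 3),
        "augmented_AUROC": round(augmented, 3),
        "estimated_gain": round(gain, 3),
    }


first = simulate_auroc(seed=42)
second = simulate_auroc(seed=42)
print(first == second)  # True: the same seed gives the same synthetic metrics
```

Passing the seed explicitly means two calls with the same seed agree regardless of what else in the program has consumed random numbers in between.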

class ReportAgent:
   def write_report(self, question: str, hits: List[PaperHit], plan: ExperimentPlan, result: ExperimentResult) -> str:
       related_work = "\n".join(f"- {h.paper['title']} ({h.paper['field']})" for h in hits)
       protocol_str = "\n".join(f"- {step}" for step in plan.protocol)
       prompt = f"""
You are an AI research assistant writing a concise research-style report.


Research question:
{question}


Hypothesis:
{plan.hypothesis}


Relevant prior work:
{related_work}


Planned experiment:
System: {plan.system}
Variables: {plan.variables}
Protocol:
{protocol_str}


Simulated results:
{result.metrics}


Write a clear report with the following sections:
1. Background
2. Proposed Approach
3. Experimental Setup
4. Results and Discussion
5. Limitations and Future Work
"""
       return generate_text(prompt.strip(), max_new_tokens=320)

We generate a research-style report using the LLM. We assemble the question, hypothesis, related work, protocol, and simulated results into a single prompt that asks for five named sections, and generate up to 320 new tokens. This turns the pipeline's raw outputs into a readable write-up, although a model as small as flan-t5-small may not follow the five-section structure reliably, so a larger instruction-tuned model is a natural upgrade.
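Building the prompt in its own function makes it easy to inspect before any model is involved. The sketch below is one possible arrangement, assuming plain strings and dicts as inputs. The name build_report_prompt, the SECTIONS constant, and the sample values are hypothetical, and the metric values are placeholders rather than results.

```python
from typing import Dict, List

SECTIONS = [
    "Background",
    "Proposed Approach",
    "Experimental Setup",
    "Results and Discussion",
    "Limitations and Future Work",
]


def build_report_prompt(
    question: str,
    hypothesis: str,
    titles: List[str],
    protocol: List[str],
    metrics: Dict[str, float],
) -> str:
    # Render each list as bullet lines so the prompt stays readable.
    related = "\n".join(f"- {t}" for t in titles)
    steps = "\n".join(f"- {s}" for s in protocol)
    results = "\n".join(f"- {name}: {value}" for name, value in metrics.items())
    outline = "\n".join(f"{i}. {name}" for i, name in enumerate(SECTIONS, start=1))
    # Blank lines separate the blocks, mirroring the tutorial's prompt layout.
    return "\n\n".join([
        "You are an AI research assistant writing a concise research-style report.",
        f"Research question:\n{question}",
        f"Hypothesis:\n{hypothesis}",
        f"Relevant prior work:\n{related}",
        f"Protocol:\n{steps}",
        f"Simulated results:\n{results}",
        f"Write a clear report with the following sections:\n{outline}",
    ])


prompt = build_report_prompt(
    question="Toy question?",
    hypothesis="Toy hypothesis.",
    titles=["Paper A", "Paper B"],
    protocol=["Split data.", "Compare models."],
    metrics={"baseline_AUROC": 0.0, "augmented_AUROC": 0.0},  # placeholders
)
print(prompt)
```

Printing the prompt first also shows how long it is. Because generate_text truncates its input, a prompt that exceeds the tokenizer's limit would lose its final lines, which here are the section instructions.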

class ScientificAgent:
   def __init__(self):
       self.lit_agent = LiteratureAgent(vectorizer, corpus_matrix, LITERATURE)
       self.exp_agent = ExperimentAgent()
       self.report_agent = ReportAgent()


   def propose_hypothesis(self, question: str, hits: List[PaperHit]) -> str:
       context = " ".join(h.paper["abstract"] for h in hits)
       prompt = f"""
You are an AI scientist. Given a research question and related abstracts,
propose a single, testable hypothesis in 2-3 sentences.


Research question:
{question}


Related abstracts:
{context}
"""
       return generate_text(prompt.strip(), max_new_tokens=96)


   def run_pipeline(self, question: str) -> str:
       hits = self.lit_agent.search(question, k=3)
       hypothesis = self.propose_hypothesis(question, hits)
       plan = self.exp_agent.design_experiment(question, hypothesis, hits)
       result = self.exp_agent.run_experiment(plan)
       report = self.report_agent.write_report(question, hits, plan, result)
       return report


if __name__ == "__main__":
   research_question = (
       "How can protein language model embeddings improve CRISPR off-target "
       "prediction compared to sequence-only CNN baselines?"
   )
   agent = ScientificAgent()
   final_report = agent.run_pipeline(research_question)
   print(final_report)

We orchestrate the entire pipeline: searching the literature, generating a hypothesis, designing the experiment, running the simulation, and writing the report. We then run the system on a sample research question about protein language model embeddings and CRISPR off-target prediction, and print the resulting report. This step brings all the modules together into a single scientific agent.
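The data flow between stages can be exercised without downloading a model by injecting each stage as a callable. The sketch below is a hypothetical variant of run_pipeline: the stage parameters and stub lambdas are assumptions for illustration, and unlike the original it returns the intermediate outputs alongside the report.

```python
from typing import Any, Callable, Dict, List


def run_pipeline(
    question: str,
    search: Callable[[str], List[Any]],
    hypothesize: Callable[[str, List[Any]], str],
    design: Callable[[str, str, List[Any]], Any],
    run: Callable[[Any], Any],
    report: Callable[[str, List[Any], Any, Any], str],
) -> Dict[str, Any]:
    # Each stage consumes the outputs of the stages before it.
    hits = search(question)
    hypothesis = hypothesize(question, hits)
    plan = design(question, hypothesis, hits)
    result = run(plan)
    text = report(question, hits, plan, result)
    # Returning every intermediate value makes the run easy to inspect.
    return {"hits": hits, "hypothesis": hypothesis, "plan": plan, "result": result, "report": text}


# Stub stages stand in for the retriever, the LLM, and the simulator.
trace = run_pipeline(
    "toy question",
    search=lambda q: ["P1", "P2"],
    hypothesize=lambda q, hits: f"Hypothesis grounded in {len(hits)} papers",
    design=lambda q, h, hits: {"hypothesis": h, "papers": hits},
    run=lambda plan: {"metric": 0.0},  # placeholder value, not a result
    report=lambda q, hits, plan, result: f"Report on '{q}' citing {', '.join(hits)}",
)
print(trace["report"])
```

With this shape, any single stage can be swapped for a real implementation while the others stay stubbed, which makes each stage testable on its own.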

In conclusion, we see how a compact codebase can chain literature search, hypothesis generation, simulated experimentation, and report writing into one workflow. We understand how each snippet contributes to the full pipeline and how each stage feeds the next. From here, we can extend the agent with richer literature sources, real experiments in place of the simulated metrics, and more sophisticated experimental logic.




Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views.
