
How to Design a Swiss Army Knife Research Agent with Tool-Using AI, Web Search, PDF Analysis, Vision, and Automated Reporting

by Josh
February 20, 2026
in AI, Analytics and Automation


In this tutorial, we build a “Swiss Army Knife” research agent that goes far beyond simple chat interactions and actively solves multi-step research problems end-to-end. We combine a tool-using agent architecture with live web search, local PDF ingestion, vision-based chart analysis, and automated report generation to demonstrate how modern agents can reason, verify, and produce structured outputs. By wiring together smolagents, OpenAI models, and practical data-extraction utilities, we show how a single agent can explore sources, cross-check claims, and synthesize findings into professional-grade Markdown and DOCX reports.

%pip -q install -U smolagents openai trafilatura duckduckgo-search pypdf pymupdf python-docx pillow tqdm


import os, re, json, getpass, base64
from typing import List, Dict, Any
import requests
import trafilatura
from duckduckgo_search import DDGS
from pypdf import PdfReader
import fitz  # PyMuPDF
from docx import Document
from docx.shared import Pt
from datetime import datetime


from openai import OpenAI
from smolagents import CodeAgent, OpenAIModel, tool


if not os.environ.get("OPENAI_API_KEY"):
   os.environ["OPENAI_API_KEY"] = getpass.getpass("Paste your OpenAI API key (hidden): ").strip()
print("OPENAI_API_KEY set:", "YES" if os.environ.get("OPENAI_API_KEY") else "NO")


if not os.environ.get("SERPER_API_KEY"):
   serper = getpass.getpass("Optional: Paste SERPER_API_KEY for Google results (press Enter to skip): ").strip()
   if serper:
       os.environ["SERPER_API_KEY"] = serper
print("SERPER_API_KEY set:", "YES" if os.environ.get("SERPER_API_KEY") else "NO")


client = OpenAI()


def _now():
   return datetime.utcnow().strftime("%Y-%m-%d %H:%M:%SZ")


def _safe_filename(s: str) -> str:
   s = re.sub(r"[^a-zA-Z0-9._-]+", "_", s).strip("_")
   return s[:180] if s else "file"

We set up the full execution environment and securely load all required credentials without hardcoding secrets. We import all dependencies required for web search, document parsing, vision analysis, and agent orchestration. We also initialize shared utilities to standardize timestamps and file naming throughout the workflow.
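Before moving on, we can sanity-check these helpers in isolation. The snippet below is purely illustrative (the sample filename is a made-up placeholder) and simply confirms that timestamps and sanitized names come out as expected.

print(_now())  # current UTC timestamp, e.g. "2026-02-20 14:03:11Z"
print(_safe_filename("Q1 Report: AI/ML (v2).pdf"))  # spaces and punctuation collapse to underscores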

try:
   from google.colab import files
   os.makedirs("/content/pdfs", exist_ok=True)
   uploaded = files.upload()
   for name, data in uploaded.items():
       if name.lower().endswith(".pdf"):
           with open(f"/content/pdfs/{name}", "wb") as f:
               f.write(data)
   print("PDFs in /content/pdfs:", os.listdir("/content/pdfs"))
except Exception as e:
   print("Upload skipped:", str(e))


def web_search(query: str, k: int = 6) -> List[Dict[str, str]]:
   serper_key = os.environ.get("SERPER_API_KEY", "").strip()
   if serper_key:
       resp = requests.post(
           "https://google.serper.dev/search",
           headers={"X-API-KEY": serper_key, "Content-Type": "application/json"},
           json={"q": query, "num": k},
           timeout=30,
       )
       resp.raise_for_status()
       data = resp.json()
       out = []
       for item in (data.get("organic") or [])[:k]:
           out.append({
               "title": item.get("title",""),
               "url": item.get("link",""),
               "snippet": item.get("snippet",""),
           })
       return out


   out = []
   with DDGS() as ddgs:
       for r in ddgs.text(query, max_results=k):
           out.append({
               "title": r.get("title",""),
               "url": r.get("href",""),
               "snippet": r.get("body",""),
           })
   return out


def fetch_url_text(url: str) -> Dict[str, Any]:
   try:
       downloaded = trafilatura.fetch_url(url)  # trafilatura manages download timeouts via its own config
       if not downloaded:
           return {"url": url, "ok": False, "error": "fetch_failed", "text": ""}
       text = trafilatura.extract(downloaded, include_comments=False, include_tables=True)
       if not text:
           return {"url": url, "ok": False, "error": "extract_failed", "text": ""}
       title_guess = next((ln.strip() for ln in text.splitlines() if ln.strip()), "")[:120]
       return {"url": url, "ok": True, "title_guess": title_guess, "text": text}
   except Exception as e:
       return {"url": url, "ok": False, "error": str(e), "text": ""}

We enable local PDF ingestion and establish a flexible web search pipeline that works with or without a paid search API. We show how we gracefully handle optional inputs while maintaining a reliable research flow. We also implement robust URL fetching and text extraction to prepare clean source material for downstream reasoning.
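To see this search-and-fetch pipeline working outside the agent loop, we can call the helpers directly. The query below is only an illustrative placeholder; results depend on which search backend is active.

results = web_search("design patterns for tool-using AI agents", k=3)
for r in results:
    print(r["title"], "-", r["url"])

if results:
    page = fetch_url_text(results[0]["url"])
    if page["ok"]:
        print("Title guess:", page["title_guess"])
        print(page["text"][:500])  # first 500 characters of the cleaned article text
    else:
        print("Fetch failed:", page["error"])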

def read_pdf_text(pdf_path: str, max_pages: int = 30) -> Dict[str, Any]:
   reader = PdfReader(pdf_path)
   pages = min(len(reader.pages), max_pages)
   chunks = []
   for i in range(pages):
       try:
           chunks.append(reader.pages[i].extract_text() or "")
       except Exception:
           chunks.append("")
   return {"pdf_path": pdf_path, "pages_read": pages, "text": "\n\n".join(chunks).strip()}


def extract_pdf_images(pdf_path: str, out_dir: str = "/content/extracted_images", max_pages: int = 10) -> List[str]:
   os.makedirs(out_dir, exist_ok=True)
   doc = fitz.open(pdf_path)
   saved = []
   pages = min(len(doc), max_pages)
   base = _safe_filename(os.path.basename(pdf_path).rsplit(".", 1)[0])


   for p in range(pages):
       page = doc[p]
       img_list = page.get_images(full=True)
       for img_i, img in enumerate(img_list):
           xref = img[0]
           pix = fitz.Pixmap(doc, xref)
           if pix.n - pix.alpha >= 4:
               pix = fitz.Pixmap(fitz.csRGB, pix)
           img_path = os.path.join(out_dir, f"{base}_p{p+1}_img{img_i+1}.png")
           pix.save(img_path)
           saved.append(img_path)


   doc.close()
   return saved


def vision_analyze_image(image_path: str, question: str, model: str = "gpt-4.1-mini") -> Dict[str, Any]:
   # Encode the image as a base64 data URL so it can be sent inline to the Responses API
   with open(image_path, "rb") as f:
       b64 = base64.b64encode(f.read()).decode("utf-8")


   resp = client.responses.create(
       model=model,
       input=[{
           "role": "user",
           "content": [
               {"type": "input_text", "text": f"Answer concisely and accurately.\n\nQuestion: {question}"},
               {"type": "input_image", "image_url": f"data:image/png;base64,{b64}"},
           ],
       }],
   )
   return {"image_path": image_path, "answer": resp.output_text}

We focus on deep document understanding by extracting structured text and visual artifacts from PDFs. We integrate a vision-capable model to interpret charts and figures instead of treating them as opaque images. We ensure that numerical trends and visual insights can be converted into explicit, text-based evidence.
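As an illustrative manual pass (assuming at least one PDF was uploaded to /content/pdfs), we can exercise the document pipeline by hand before handing it to the agent:

import glob

pdfs = sorted(glob.glob("/content/pdfs/*.pdf"))
if pdfs:
    info = read_pdf_text(pdfs[0], max_pages=10)
    print(f"Read {info['pages_read']} pages, {len(info['text'])} characters of text")

    figures = extract_pdf_images(pdfs[0], max_pages=5)
    print("Extracted figures:", figures)

    if figures:
        analysis = vision_analyze_image(figures[0], "What trend does this chart show? Mention any visible numbers.")
        print(analysis["answer"])
else:
    print("No PDFs uploaded; skipping the manual check.")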

def write_markdown(path: str, content: str) -> str:
   os.makedirs(os.path.dirname(path), exist_ok=True)
   with open(path, "w", encoding="utf-8") as f:
       f.write(content)
   return path


def write_docx_from_markdown(docx_path: str, md: str, title: str = "Research Report") -> str:
   os.makedirs(os.path.dirname(docx_path), exist_ok=True)
   doc = Document()
   t = doc.add_paragraph()
   run = t.add_run(title)
   run.bold = True
   run.font.size = Pt(18)
   meta = doc.add_paragraph()
   meta.add_run(f"Generated: {_now()}").italic = True
   doc.add_paragraph("")
   for line in md.splitlines():
       line = line.rstrip()
       if not line:
           doc.add_paragraph("")
           continue
       if line.startswith("# "):
           doc.add_heading(line[2:].strip(), level=1)
       elif line.startswith("## "):
           doc.add_heading(line[3:].strip(), level=2)
       elif line.startswith("### "):
           doc.add_heading(line[4:].strip(), level=3)
       elif re.match(r"^\s*[-*]\s+", line):
           p = doc.add_paragraph(style="List Bullet")
           p.add_run(re.sub(r"^\s*[-*]\s+", "", line).strip())
       else:
           doc.add_paragraph(line)
   doc.save(docx_path)
   return docx_path


@tool
def t_web_search(query: str, k: int = 6) -> str:
   """Search the web and return results as a JSON list of {title, url, snippet} objects.

   Args:
       query: The search query.
       k: Maximum number of results to return.
   """
   return json.dumps(web_search(query, k), ensure_ascii=False)


@tool
def t_fetch_url_text(url: str) -> str:
   """Download a web page and return its extracted text as a JSON object.

   Args:
       url: The URL to fetch and extract.
   """
   return json.dumps(fetch_url_text(url), ensure_ascii=False)


@tool
def t_list_pdfs() -> str:
   """List available PDF files under /content/pdfs as a JSON array of paths."""
   pdf_dir = "/content/pdfs"
   if not os.path.isdir(pdf_dir):
       return json.dumps([])
   paths = [os.path.join(pdf_dir, f) for f in os.listdir(pdf_dir) if f.lower().endswith(".pdf")]
   return json.dumps(sorted(paths), ensure_ascii=False)


@tool
def t_read_pdf_text(pdf_path: str, max_pages: int = 30) -> str:
   """Extract text from a local PDF and return it as a JSON object.

   Args:
       pdf_path: Path to the PDF file.
       max_pages: Maximum number of pages to read.
   """
   return json.dumps(read_pdf_text(pdf_path, max_pages=max_pages), ensure_ascii=False)


@tool
def t_extract_pdf_images(pdf_path: str, max_pages: int = 10) -> str:
   """Extract embedded images (figures, charts) from a PDF and return their file paths as JSON.

   Args:
       pdf_path: Path to the PDF file.
       max_pages: Maximum number of pages to scan for images.
   """
   imgs = extract_pdf_images(pdf_path, max_pages=max_pages)
   return json.dumps(imgs, ensure_ascii=False)


@tool
def t_vision_analyze_image(image_path: str, question: str) -> str:
   """Analyze an image with a vision model and return the answer as JSON.

   Args:
       image_path: Path to the image file.
       question: The question to ask about the image.
   """
   return json.dumps(vision_analyze_image(image_path, question), ensure_ascii=False)


@tool
def t_write_markdown(path: str, content: str) -> str:
   """Write Markdown content to a file and return the saved path.

   Args:
       path: Destination file path.
       content: Markdown text to write.
   """
   return write_markdown(path, content)


@tool
def t_write_docx_from_markdown(docx_path: str, md_path: str, title: str = "Research Report") -> str:
   """Convert a saved Markdown file into a DOCX report and return the DOCX path.

   Args:
       docx_path: Destination path for the DOCX file.
       md_path: Path of the Markdown file to convert.
       title: Title placed at the top of the document.
   """
   with open(md_path, "r", encoding="utf-8") as f:
       md = f.read()
   return write_docx_from_markdown(docx_path, md, title=title)

We implement the full output layer by generating Markdown reports and converting them into polished DOCX documents. We expose all core capabilities as explicit tools that the agent can reason about and invoke step by step. We ensure that every transformation from raw data to final report remains deterministic and inspectable.
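As a quick standalone check of the output layer, we can feed it a small hand-written Markdown string; the file names below (example.md, example.docx) are arbitrary placeholders.

sample_md = """# Example Report

## Key Findings
- Finding one, supported by a source [1]
- Finding two

## Sources
- [1] https://example.com
"""

md_path = write_markdown("/content/report/example.md", sample_md)
docx_path = write_docx_from_markdown("/content/report/example.docx", sample_md, title="Example Report")
print("Wrote:", md_path, "and", docx_path)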

model = OpenAIModel(model_id="gpt-5")


agent = CodeAgent(
   tools=[
       t_web_search,
       t_fetch_url_text,
       t_list_pdfs,
       t_read_pdf_text,
       t_extract_pdf_images,
       t_vision_analyze_image,
       t_write_markdown,
       t_write_docx_from_markdown,
   ],
   model=model,
   add_base_tools=False,
   additional_authorized_imports=["json","re","os","math","datetime","time","textwrap"],
)


SYSTEM_INSTRUCTIONS = """
You are a Swiss Army Knife Research Agent.
"""


def run_research(topic: str):
   os.makedirs("/content/report", exist_ok=True)
   prompt = f"""{SYSTEM_INSTRUCTIONS.strip()}


Research question:
{topic}


Steps:
1) List available PDFs (if any) and decide which are relevant.
2) Do web search for the topic.
3) Fetch and extract the text of the best sources.
4) If PDFs exist, extract text and images.
5) Visually analyze figures.
6) Write a Markdown report to /content/report/report.md and convert it to /content/report/report.docx.
"""
   return agent.run(prompt)


topic = "Build a research brief on the most reliable design patterns for tool-using agents (2024-2026), focusing on evaluation, citations, and failure modes."
out = run_research(topic)
print(out[:1500] if isinstance(out, str) else out)


try:
   from google.colab import files
   files.download("/content/report/report.md")
   files.download("/content/report/report.docx")
except Exception as e:
   print("Download skipped:", str(e))

We assemble the complete research agent and define a structured execution plan for multi-step reasoning. We guide the agent to search, analyze, synthesize, and write using a single coherent prompt. We demonstrate how the agent produces a finished research artifact that can be reviewed, shared, and reused immediately.
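As a hypothetical extension, the same entry point can be reused for a small batch of follow-up questions; the topics below are placeholders, and each run produces its own report files.

follow_up_topics = [
    "Summarize evaluation strategies for retrieval-augmented agents (2024-2026).",
    "Catalog common failure modes of code-executing agents and proposed mitigations.",
]

for t in follow_up_topics:
    print("=" * 60)
    print("Topic:", t)
    result = run_research(t)
    print(result[:800] if isinstance(result, str) else result)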

In conclusion, we demonstrated how a well-designed tool-using agent can function as a reliable research assistant rather than a conversational toy. We showcased how explicit tools, disciplined prompting, and step-by-step execution allow the agent to search the web, analyze documents and visuals, and generate traceable, citation-aware reports. This approach offers a practical blueprint for building trustworthy research agents that emphasize evaluation, evidence, and failure awareness, capabilities increasingly essential for real-world AI systems.


