• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Saturday, March 7, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

How to Build a Self-Evaluating Agentic AI System with LlamaIndex and OpenAI Using Retrieval, Tool Use, and Automated Quality Checks

Josh by Josh
January 17, 2026
in Al, Analytics and Automation
0
How to Build a Self-Evaluating Agentic AI System with LlamaIndex and OpenAI Using Retrieval, Tool Use, and Automated Quality Checks


In this tutorial, we build an advanced agentic AI workflow using LlamaIndex and OpenAI models. We focus on designing a reliable retrieval-augmented generation (RAG) agent that can reason over evidence, use tools deliberately, and evaluate its own outputs for quality. By structuring the system around retrieval, answer synthesis, and self-evaluation, we demonstrate how agentic patterns go beyond simple chatbots and move toward more trustworthy, controllable AI systems suitable for research and analytical use cases.

!pip -q install -U llama-index llama-index-llms-openai llama-index-embeddings-openai nest_asyncio


import os
import asyncio
import nest_asyncio
nest_asyncio.apply()


from getpass import getpass


if not os.environ.get("OPENAI_API_KEY"):
   os.environ["OPENAI_API_KEY"] = getpass("Enter OPENAI_API_KEY: ")

We set up the environment and install all required dependencies for running an agentic AI workflow. We securely load the OpenAI API key at runtime, ensuring that credentials are never hardcoded. We also prepare the notebook to handle asynchronous execution smoothly.

from llama_index.core import Document, VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding


Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0.2)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")


texts = [
   "Reliable RAG systems separate retrieval, synthesis, and verification. Common failures include hallucination and shallow retrieval.",
   "RAG evaluation focuses on faithfulness, answer relevancy, and retrieval quality.",
   "Tool-using agents require constrained tools, validation, and self-review loops.",
   "A robust workflow follows retrieve, answer, evaluate, and revise steps."
]


docs = [Document(text=t) for t in texts]
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine(similarity_top_k=4)

We configure the OpenAI language model and embedding model and build a compact knowledge base for our agent. We transform raw text into indexed documents so that the agent can retrieve relevant evidence during reasoning.

from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator


faith_eval = FaithfulnessEvaluator(llm=Settings.llm)
rel_eval = RelevancyEvaluator(llm=Settings.llm)


def retrieve_evidence(q: str) -> str:
   r = query_engine.query(q)
   out = []
   for i, n in enumerate(r.source_nodes or []):
       out.append(f"[{i+1}] {n.node.get_content()[:300]}")
   return "\n".join(out)


def score_answer(q: str, a: str) -> str:
   r = query_engine.query(q)
   ctx = [n.node.get_content() for n in r.source_nodes or []]
   f = faith_eval.evaluate(query=q, response=a, contexts=ctx)
   r = rel_eval.evaluate(query=q, response=a, contexts=ctx)
   return f"Faithfulness: {f.score}\nRelevancy: {r.score}"

We define the core tools used by the agent: evidence retrieval and answer evaluation. We implement automatic scoring for faithfulness and relevancy so the agent can judge the quality of its own responses.

from llama_index.core.agent.workflow import ReActAgent
from llama_index.core.workflow import Context


agent = ReActAgent(
   tools=[retrieve_evidence, score_answer],
   llm=Settings.llm,
   system_prompt="""
Always retrieve evidence first.
Produce a structured answer.
Evaluate the answer and revise once if scores are low.
""",
   verbose=True
)


ctx = Context(agent)

We create the ReAct-based agent and define its system behavior, guiding how it retrieves evidence, generates answers, and revises results. We also initialize the execution context that maintains the agent’s state across interactions. It step brings together tools and reasoning into a single agentic workflow.

async def run_brief(topic: str):
   q = f"Design a reliable RAG + tool-using agent workflow and how to evaluate it. Topic: {topic}"
   handler = agent.run(q, ctx=ctx)
   async for ev in handler.stream_events():
       print(getattr(ev, "delta", ""), end="")
   res = await handler
   return str(res)


topic = "RAG agent reliability and evaluation"
loop = asyncio.get_event_loop()
result = loop.run_until_complete(run_brief(topic))


print("\n\nFINAL OUTPUT\n")
print(result)

We execute the full agent loop by passing a topic into the system and streaming the agent’s reasoning and output. We allow the agent to complete its retrieval, generation, and evaluation cycle asynchronously.

In conclusion, we showcased how an agent can retrieve supporting evidence, generate a structured response, and assess its own faithfulness and relevancy before finalizing an answer. We kept the design modular and transparent, making it easy to extend the workflow with additional tools, evaluators, or domain-specific knowledge sources. This approach illustrates how we can use agentic AI with LlamaIndex and OpenAI models to build more capable systems that are also more reliable and self-aware in their reasoning and responses.


Check out the FULL CODES here. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.


Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.



Source_link

READ ALSO

Pay for the data you’re using

A Coding Guide to Build a Scalable End-to-End Machine Learning Data Pipeline Using Daft for High-Performance Structured and Image Data Processing

Related Posts

Pay for the data you’re using
Al, Analytics and Automation

Pay for the data you’re using

March 6, 2026
Al, Analytics and Automation

A Coding Guide to Build a Scalable End-to-End Machine Learning Data Pipeline Using Daft for High-Performance Structured and Image Data Processing

March 6, 2026
Liquid AI Releases LocalCowork Powered By LFM2-24B-A2B to Execute Privacy-First Agent Workflows Locally Via Model Context Protocol (MCP)
Al, Analytics and Automation

Liquid AI Releases LocalCowork Powered By LFM2-24B-A2B to Execute Privacy-First Agent Workflows Locally Via Model Context Protocol (MCP)

March 6, 2026
CamSoda AI Chatbot Features and Pricing Model
Al, Analytics and Automation

CamSoda AI Chatbot Features and Pricing Model

March 6, 2026
OpenAI Releases Symphony: An Open Source Agentic Framework for Orchestrating Autonomous AI Agents through Structured, Scalable Implementation Runs
Al, Analytics and Automation

OpenAI Releases Symphony: An Open Source Agentic Framework for Orchestrating Autonomous AI Agents through Structured, Scalable Implementation Runs

March 5, 2026
YuanLab AI Releases Yuan 3.0 Ultra: A Flagship Multimodal MoE Foundation Model, Built for Stronger Intelligence and Unrivaled Efficiency
Al, Analytics and Automation

YuanLab AI Releases Yuan 3.0 Ultra: A Flagship Multimodal MoE Foundation Model, Built for Stronger Intelligence and Unrivaled Efficiency

March 5, 2026
Next Post
Musk’s Starlink in Iran only works if things don’t go wrong in outer space

Musk’s Starlink in Iran only works if things don’t go wrong in outer space

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Google announced the next step in its nuclear energy plans 

Google announced the next step in its nuclear energy plans 

August 20, 2025

EDITOR'S PICK

Who are the leading SEO service providers?

Who are the leading SEO service providers?

September 28, 2025
What Is a Title Tag? How to Optimize Your SEO Titles

What Is a Title Tag? How to Optimize Your SEO Titles

February 24, 2026
White House plan signals “open-weight first” era—and enterprises need new guardrails

White House plan signals “open-weight first” era—and enterprises need new guardrails

July 24, 2025
Google acquiring part of HTC Vive team to boost Android XR

Google acquiring part of HTC Vive team to boost Android XR

June 8, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • How to Get Leads from ChatGPT With Commercial Citations
  • Is your social media strategy failing? Here’s why.
  • Self-driving cars could cut crashes — but make traffic and sprawl worse
  • Sevio Releases 2025 Programmatic Monetization Performance Report for Financial Audience Publishers
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions