• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Monday, February 9, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

How to Build a Self-Evaluating Agentic AI System with LlamaIndex and OpenAI Using Retrieval, Tool Use, and Automated Quality Checks

Josh by Josh
January 17, 2026
in Al, Analytics and Automation
0
How to Build a Self-Evaluating Agentic AI System with LlamaIndex and OpenAI Using Retrieval, Tool Use, and Automated Quality Checks
0
SHARES
1
VIEWS
Share on FacebookShare on Twitter


In this tutorial, we build an advanced agentic AI workflow using LlamaIndex and OpenAI models. We focus on designing a reliable retrieval-augmented generation (RAG) agent that can reason over evidence, use tools deliberately, and evaluate its own outputs for quality. By structuring the system around retrieval, answer synthesis, and self-evaluation, we demonstrate how agentic patterns go beyond simple chatbots and move toward more trustworthy, controllable AI systems suitable for research and analytical use cases.

!pip -q install -U llama-index llama-index-llms-openai llama-index-embeddings-openai nest_asyncio


import os
import asyncio
import nest_asyncio
nest_asyncio.apply()


from getpass import getpass


if not os.environ.get("OPENAI_API_KEY"):
   os.environ["OPENAI_API_KEY"] = getpass("Enter OPENAI_API_KEY: ")

We set up the environment and install all required dependencies for running an agentic AI workflow. We securely load the OpenAI API key at runtime, ensuring that credentials are never hardcoded. We also prepare the notebook to handle asynchronous execution smoothly.

from llama_index.core import Document, VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding


Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0.2)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")


texts = [
   "Reliable RAG systems separate retrieval, synthesis, and verification. Common failures include hallucination and shallow retrieval.",
   "RAG evaluation focuses on faithfulness, answer relevancy, and retrieval quality.",
   "Tool-using agents require constrained tools, validation, and self-review loops.",
   "A robust workflow follows retrieve, answer, evaluate, and revise steps."
]


docs = [Document(text=t) for t in texts]
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine(similarity_top_k=4)

We configure the OpenAI language model and embedding model and build a compact knowledge base for our agent. We transform raw text into indexed documents so that the agent can retrieve relevant evidence during reasoning.

from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator


faith_eval = FaithfulnessEvaluator(llm=Settings.llm)
rel_eval = RelevancyEvaluator(llm=Settings.llm)


def retrieve_evidence(q: str) -> str:
   r = query_engine.query(q)
   out = []
   for i, n in enumerate(r.source_nodes or []):
       out.append(f"[{i+1}] {n.node.get_content()[:300]}")
   return "\n".join(out)


def score_answer(q: str, a: str) -> str:
   r = query_engine.query(q)
   ctx = [n.node.get_content() for n in r.source_nodes or []]
   f = faith_eval.evaluate(query=q, response=a, contexts=ctx)
   r = rel_eval.evaluate(query=q, response=a, contexts=ctx)
   return f"Faithfulness: {f.score}\nRelevancy: {r.score}"

We define the core tools used by the agent: evidence retrieval and answer evaluation. We implement automatic scoring for faithfulness and relevancy so the agent can judge the quality of its own responses.

from llama_index.core.agent.workflow import ReActAgent
from llama_index.core.workflow import Context


agent = ReActAgent(
   tools=[retrieve_evidence, score_answer],
   llm=Settings.llm,
   system_prompt="""
Always retrieve evidence first.
Produce a structured answer.
Evaluate the answer and revise once if scores are low.
""",
   verbose=True
)


ctx = Context(agent)

We create the ReAct-based agent and define its system behavior, guiding how it retrieves evidence, generates answers, and revises results. We also initialize the execution context that maintains the agent’s state across interactions. It step brings together tools and reasoning into a single agentic workflow.

async def run_brief(topic: str):
   q = f"Design a reliable RAG + tool-using agent workflow and how to evaluate it. Topic: {topic}"
   handler = agent.run(q, ctx=ctx)
   async for ev in handler.stream_events():
       print(getattr(ev, "delta", ""), end="")
   res = await handler
   return str(res)


topic = "RAG agent reliability and evaluation"
loop = asyncio.get_event_loop()
result = loop.run_until_complete(run_brief(topic))


print("\n\nFINAL OUTPUT\n")
print(result)

We execute the full agent loop by passing a topic into the system and streaming the agent’s reasoning and output. We allow the agent to complete its retrieval, generation, and evaluation cycle asynchronously.

In conclusion, we showcased how an agent can retrieve supporting evidence, generate a structured response, and assess its own faithfulness and relevancy before finalizing an answer. We kept the design modular and transparent, making it easy to extend the workflow with additional tools, evaluators, or domain-specific knowledge sources. This approach illustrates how we can use agentic AI with LlamaIndex and OpenAI models to build more capable systems that are also more reliable and self-aware in their reasoning and responses.


Check out the FULL CODES here. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.


Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.



Source_link

READ ALSO

37 AI Companions Statistics in 2025

Study: Platforms that rank the latest LLMs can be unreliable | MIT News

Related Posts

37 AI Companions Statistics in 2025
Al, Analytics and Automation

37 AI Companions Statistics in 2025

February 9, 2026
Study: Platforms that rank the latest LLMs can be unreliable | MIT News
Al, Analytics and Automation

Study: Platforms that rank the latest LLMs can be unreliable | MIT News

February 9, 2026
ByteDance Releases Protenix-v1: A New Open-Source Model Achieving AF3-Level Performance in Biomolecular Structure Prediction
Al, Analytics and Automation

ByteDance Releases Protenix-v1: A New Open-Source Model Achieving AF3-Level Performance in Biomolecular Structure Prediction

February 9, 2026
Erogen AI Image Generator Prices, Capabilities, and Feature Breakdown
Al, Analytics and Automation

Erogen AI Image Generator Prices, Capabilities, and Feature Breakdown

February 8, 2026
Al, Analytics and Automation

How to Design Production-Grade Mock Data Pipelines Using Polyfactory with Dataclasses, Pydantic, Attrs, and Nested Models

February 8, 2026
Feature Set and Subscription Pricing
Al, Analytics and Automation

Feature Set and Subscription Pricing

February 8, 2026
Next Post
Musk’s Starlink in Iran only works if things don’t go wrong in outer space

Musk’s Starlink in Iran only works if things don’t go wrong in outer space

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Google announced the next step in its nuclear energy plans 

Google announced the next step in its nuclear energy plans 

August 20, 2025

EDITOR'S PICK

Frictionless Growth: An Interview with Ryan Hamilton

Frictionless Growth: An Interview with Ryan Hamilton

December 2, 2025
Shokz’ new earbuds effectively reduce noise while keeping your ears open

Shokz’ new earbuds effectively reduce noise while keeping your ears open

January 6, 2026
Are you leading diversity conversations at your company, yet?

Are you leading diversity conversations at your company, yet?

June 15, 2025
New Logo & Branding for Big Cartel by How & How — BP&O

New Logo & Branding for Big Cartel by How & How — BP&O

July 10, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • Data Commons Hosted MCP: Zero-Install Public Data for AI
  • AI Personalization Strategies Beauty Brands Use to Boost Sales
  • A four-pack of first-gen AirTags is on sale for $64
  • 37 AI Companions Statistics in 2025
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions

Are you sure want to unlock this post?
Unlock left : 0
Are you sure want to cancel subscription?