mGrowTech

Confidence in agentic AI: Why eval infrastructure must come first

by Josh
July 3, 2025
in Technology And Software


As AI agents enter real-world deployment, organizations are under pressure to define where they belong, how to build them effectively, and how to operationalize them at scale. At VentureBeat’s Transform 2025, tech leaders gathered to talk about how they’re transforming their businesses with agents: Joanne Chen, general partner at Foundation Capital; Shailesh Nalawadi, VP of project management with Sendbird; Thys Waanders, SVP of AI transformation at Cognigy; and Shawn Malhotra, CTO of Rocket Companies.

A few top agentic AI use cases

“The initial attraction of any of these deployments for AI agents tends to be around saving human capital — the math is pretty straightforward,” Nalawadi said. “However, that undersells the transformational capability you get with AI agents.”

At Rocket, AI agents have proven to be powerful tools for increasing website conversion.

“We’ve found that with our agent-based experience, the conversational experience on the website, clients are three times more likely to convert when they come through that channel,” Malhotra said.

But that’s just scratching the surface. For instance, a Rocket engineer built an agent in just two days to automate a highly specialized task: calculating transfer taxes during mortgage underwriting.

“That two days of effort saved us a million dollars a year in expense,” Malhotra said. “In 2024, we saved more than a million team member hours, mostly off the back of our AI solutions. That’s not just saving expense. It’s also allowing our team members to focus their time on people making what is often the largest financial transaction of their life.”

Agents are essentially supercharging individual team members. The million hours saved aren’t someone’s entire job replicated many times over; they’re fractions of jobs, the tasks employees don’t enjoy or that weren’t adding value for the client. And those saved hours give Rocket the capacity to handle more business.

“Some of our team members were able to handle 50% more clients last year than they were the year before,” Malhotra added. “It means we can have higher throughput, drive more business, and again, we see higher conversion rates because they’re spending the time understanding the client’s needs versus doing a lot of more rote work that the AI can do now.”

Tackling agent complexity

“Part of the journey for our engineering teams is moving from the mindset of software engineering – write once and test it and it runs and gives the same answer 1,000 times – to the more probabilistic approach, where you ask the same thing of an LLM and it gives different answers through some probability,” Nalawadi said. “A lot of it has been bringing people along. Not just software engineers, but product managers and UX designers.”
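One way to picture the shift Nalawadi describes is to compare two kinds of test. A deterministic test asserts one exact answer. A probabilistic test asks the same question many times and asserts that the share of acceptable answers clears a threshold. The Python sketch below is a minimal illustration under stated assumptions: `fake_agent` is a hypothetical, seeded stand-in for an LLM call, not a real API, and the 90% threshold and trial count are arbitrary.

```python
import random
from typing import Callable


def pass_rate(agent: Callable[[str], str], prompt: str,
              is_acceptable: Callable[[str], bool], trials: int = 100) -> float:
    """Ask the agent the same thing `trials` times; return the fraction of acceptable answers."""
    passed = sum(1 for _ in range(trials) if is_acceptable(agent(prompt)))
    return passed / trials


# Hypothetical stand-in for an LLM-backed agent (not a real API): phrasing varies
# from call to call, and roughly 1 answer in 20 is unhelpful.
_rng = random.Random(7)


def fake_agent(prompt: str) -> str:
    if _rng.random() < 0.05:
        return "I'm not sure."
    return _rng.choice(["Paris.", "The capital is Paris.", "It's Paris, France."])


# An exact-match check ("Paris." only) would fail on harmless rephrasings,
# so the acceptance rule looks for the key fact instead.
rate = pass_rate(fake_agent, "What is the capital of France?",
                 lambda answer: "paris" in answer.lower(), trials=1000)
assert rate >= 0.90, f"pass rate too low: {rate:.1%}"
print(f"pass rate: {rate:.1%}")
```

The trial count and threshold trade off cost against how precisely the true pass rate is estimated. A small number of trials can pass or fail by luck.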

What’s helped is that LLMs have come a long way, Waanders said. Teams that built something 18 months or two years ago really had to pick the right model, or the agent would not perform as expected. Now, he said, most of the mainstream models behave very well and are more predictable. The challenge today is combining models, ensuring responsiveness, orchestrating the right models in the right sequence, and weaving in the right data.

“We have customers that push tens of millions of conversations per year,” Waanders said. “If you automate, say, 30 million conversations in a year, how does that scale in the LLM world? That’s all stuff that we had to discover, simple stuff, from even getting the model availability with the cloud providers. Having enough quota with a ChatGPT model, for example. Those are all learnings that we had to go through, and our customers as well. It’s a brand-new world.”
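The scale Waanders mentions can be turned into a rough capacity estimate. Spread evenly, 30 million conversations a year works out to about 0.95 conversations per second. However, provider quotas are commonly expressed as tokens or requests per minute, and traffic is rarely even. The sketch below is back-of-the-envelope arithmetic only. Turns per conversation, tokens per turn, and the peak-to-average factor are hypothetical inputs, not figures from the panel.

```python
from __future__ import annotations

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def required_throughput(conversations_per_year: float,
                        turns_per_conversation: float,
                        tokens_per_turn: float,
                        peak_to_average: float) -> dict[str, float]:
    """Back-of-the-envelope load estimate; every input except volume is an assumption."""
    avg_conversations_per_second = conversations_per_year / SECONDS_PER_YEAR
    avg_tokens_per_minute = (avg_conversations_per_second * 60
                             * turns_per_conversation * tokens_per_turn)
    return {
        "avg_conversations_per_second": avg_conversations_per_second,
        "avg_tokens_per_minute": avg_tokens_per_minute,
        # Quota has to cover busy hours, not the yearly average.
        "peak_tokens_per_minute": avg_tokens_per_minute * peak_to_average,
    }


# 30 million conversations/year comes from the quote above; the other three
# numbers are hypothetical placeholders to be replaced with real telemetry.
estimate = required_throughput(30_000_000, turns_per_conversation=8,
                               tokens_per_turn=1_500, peak_to_average=4)
for name, value in estimate.items():
    print(f"{name}: {value:,.2f}")
```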

A layer above orchestrating the LLM is orchestrating a network of agents, Malhotra said. A conversational experience has a network of agents under the hood, and the orchestrator decides which of the available agents to farm each request out to.

“If you play that forward and think about having hundreds or thousands of agents who are capable of different things, you get some really interesting technical problems,” he said. “It’s becoming a bigger problem, because latency and time matter. That agent routing is going to be a very interesting problem to solve over the coming years.”
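Here is a toy version of the orchestrator Malhotra describes. It keeps a registry of agents, each advertising what it can handle, and a router that picks the best match or falls back to a human. This is a minimal sketch built on loud assumptions: keyword overlap stands in for whatever classifier or LLM a production router would use, and the agent names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    name: str
    keywords: frozenset[str]          # what this agent claims it can handle
    handle: Callable[[str], str]      # the agent's own logic (stubbed below)


class Orchestrator:
    """Hypothetical router: send each request to the agent whose keywords overlap it most."""

    def __init__(self, agents: list[Agent]):
        self.agents = agents

    def route(self, request: str) -> Agent | None:
        words = set(re.findall(r"[a-z]+", request.lower()))
        best, best_score = None, 0
        for agent in self.agents:  # linear scan: cost grows with the number of agents
            score = len(agent.keywords & words)
            if score > best_score:
                best, best_score = agent, score
        return best

    def handle(self, request: str) -> str:
        agent = self.route(request)
        if agent is None:
            return "escalated to a human"  # no agent matched the request
        return agent.handle(request)


orchestrator = Orchestrator([
    Agent("transfer_tax", frozenset({"transfer", "tax", "taxes"}),
          lambda r: "transfer_tax agent: calculating transfer taxes"),
    Agent("rate_quote", frozenset({"rate", "rates", "quote"}),
          lambda r: "rate_quote agent: preparing a quote"),
])
print(orchestrator.handle("What transfer taxes apply to my loan?"))
print(orchestrator.handle("Can I change my password?"))
```

The scan in `route` touches every registered agent on every request. With the hundreds or thousands of agents Malhotra anticipates, that routing step becomes a latency cost in its own right, which is part of why he calls agent routing an interesting problem.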

Tapping into vendor relationships

Up to this point, the first step for most companies launching agentic AI has been building in-house, because specialized tools didn’t yet exist. But building generic LLM or AI infrastructure doesn’t differentiate a company or create value. Going beyond the initial build to debug, iterate on, improve, and maintain what’s been built also takes specialized expertise.

“Often we find the most successful conversations we have with prospective customers tend to be someone who’s already built something in-house,” Nalawadi said. “They quickly realize that getting to a 1.0 is okay, but as the world evolves and as the infrastructure evolves and as they need to swap out technology for something new, they don’t have the ability to orchestrate all these things.”

Preparing for agentic AI complexity

In theory, agentic AI will only grow more complex: the number of agents in an organization will rise, they’ll start learning from each other, and the number of use cases will explode. How can organizations prepare for that challenge?

“It means that the checks and balances in your system will get stressed more,” Malhotra said. “For something that has a regulatory process, you have a human in the loop to make sure that someone is signing off on this. For critical internal processes or data access, do you have observability? Do you have the right alerting and monitoring so that if something goes wrong, you know it’s going wrong? It’s doubling down on your detection, understanding where you need a human in the loop, and then trusting that those processes are going to catch if something does go wrong. But because of the power it unlocks, you have to do it.”
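Malhotra's checks and balances combine two things: human sign-off for regulated steps, and monitoring that notices when something goes wrong. Both can be sketched as a thin wrapper around agent actions. The `Guardrail` class, its 5% alert threshold, and the 100-action window are illustrative assumptions, not a description of Rocket's systems.

```python
from __future__ import annotations

import logging
from collections import deque
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("agent_guardrail")


@dataclass(frozen=True)
class Action:
    name: str
    regulated: bool  # True when a regulatory process requires human sign-off


class Guardrail:
    """Hypothetical wrapper: human sign-off for regulated actions, plus a rolling error-rate alert."""

    def __init__(self, alert_threshold: float = 0.05, window: int = 100):
        self.alert_threshold = alert_threshold
        self.outcomes: deque[bool] = deque(maxlen=window)  # True = action succeeded
        self.alerting = False

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def execute(self, action: Action, run: Callable[[Action], bool],
                approve: Callable[[Action], bool]) -> str:
        # Human in the loop: a regulated action never runs without explicit sign-off.
        if action.regulated and not approve(action):
            return "blocked"
        ok = run(action)
        self.outcomes.append(ok)
        # Monitoring: raise an alert when recent failures cross the threshold.
        self.alerting = self.error_rate() > self.alert_threshold
        if self.alerting:
            logger.warning("%s: error rate %.0f%% over the last %d actions",
                           action.name, 100 * self.error_rate(), len(self.outcomes))
        return "succeeded" if ok else "failed"


guard = Guardrail()
print(guard.execute(Action("close_loan", regulated=True),
                    run=lambda a: True, approve=lambda a: False))  # prints "blocked"
print(guard.execute(Action("update_notes", regulated=False),
                    run=lambda a: True, approve=lambda a: True))  # prints "succeeded"
```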

So how can you have confidence that an AI agent will behave reliably as it evolves?

“That part is really difficult if you haven’t thought about it at the beginning,” Nalawadi said. “The short answer is, before you even start building it, you should have an eval infrastructure in place. Make sure you have a rigorous environment in which you know what good looks like, from an AI agent, and that you have this test set. Keep referring back to it as you make improvements. A very simplistic way of thinking about eval is that it’s the unit tests for your agentic system.”
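Taking Nalawadi's analogy literally, an eval suite can look a lot like a unit-test suite. It has a fixed test set that encodes what good looks like, a score, and a check that fails the build when a change makes the score drop. The sketch below assumes simple string checks as graders and a toy rule-based agent. Real suites often use richer graders such as human labels or model-based judges, and every name here is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # encodes "what good looks like" for this prompt


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case once and return the fraction that passed."""
    passed = sum(case.check(agent(case.prompt)) for case in cases)
    return passed / len(cases)


def assert_no_regression(score: float, baseline: float, tolerance: float = 0.02) -> None:
    """Fail like a unit test when a change drops the score below the recorded baseline."""
    if score < baseline - tolerance:
        raise AssertionError(f"eval score {score:.0%} regressed from baseline {baseline:.0%}")


# Hypothetical test set, written before the agent exists and reused after every change.
CASES = [
    EvalCase("How do I reset my password?", lambda r: "reset" in r.lower()),
    EvalCase("I want a refund for my order", lambda r: "refund" in r.lower()),
    # Out-of-scope requests should be escalated, not answered.
    EvalCase("What's the meaning of life?", lambda r: "specialist" in r.lower()),
]


def toy_agent(prompt: str) -> str:  # stand-in for the real agent under test
    p = prompt.lower()
    if "password" in p:
        return "You can reset your password from the sign-in page."
    if "refund" in p:
        return "I've started a refund request for your order."
    return "Let me connect you with a specialist."


score = run_evals(toy_agent, CASES)
assert_no_regression(score, baseline=1.0)
print(f"eval score: {score:.0%}")
```

`assert_no_regression` raises explicitly rather than using a bare `assert`, so the check still runs when Python is started with optimizations (`-O`) that strip assertions.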

The problem is that agents are non-deterministic, Waanders added. Unit testing is critical, but the biggest challenge is that you don’t know what you don’t know: what incorrect behaviors an agent could display, or how it might react in any given situation.

“You can only find that out by simulating conversations at scale, by pushing it under thousands of different scenarios, and then analyzing how it holds up and how it reacts,” Waanders said.
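Waanders' point about simulating conversations at scale can be sketched as a scenario grid. Combine personas, intents, and conversational styles, run each combination many times, then look for the dimensions where failures cluster. In this minimal sketch the scenario lists are hypothetical. `fake_simulation` is a seeded stand-in for a real simulated conversation plus grader. Its injected 30% failure rate for typo-heavy messages exists only to show that the analysis surfaces it.

```python
from __future__ import annotations

import itertools
import random
from collections import defaultdict
from typing import Callable

# Hypothetical scenario dimensions; real grids are usually far larger.
PERSONAS = ["new customer", "returning customer", "frustrated customer"]
INTENTS = ["check status", "update details", "ask about fees"]
STYLES = ["polite", "terse", "typo-heavy"]

Scenario = tuple[str, str, str]


def simulate_at_scale(simulate: Callable[[Scenario, int], bool],
                      runs_per_scenario: int = 20) -> dict[Scenario, float]:
    """Run every scenario repeatedly and return its failure rate."""
    rates = {}
    for scenario in itertools.product(PERSONAS, INTENTS, STYLES):
        failures = sum(not simulate(scenario, seed) for seed in range(runs_per_scenario))
        rates[scenario] = failures / runs_per_scenario
    return rates


def worst_dimensions(rates: dict[Scenario, float], top: int = 3) -> list[tuple[str, float]]:
    """Average failure rate per persona/intent/style value, worst first."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for scenario, rate in rates.items():
        for value in scenario:
            buckets[value].append(rate)
    averages = {value: sum(r) / len(r) for value, r in buckets.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:top]


def fake_simulation(scenario: Scenario, seed: int) -> bool:
    """Seeded stand-in for 'run a simulated conversation and grade it'."""
    rng = random.Random(f"{'|'.join(scenario)}|{seed}")
    failure_probability = 0.30 if scenario[2] == "typo-heavy" else 0.02
    return rng.random() >= failure_probability


rates = simulate_at_scale(fake_simulation, runs_per_scenario=50)
for value, rate in worst_dimensions(rates):
    print(f"{value}: {rate:.0%} failure rate")
```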


