Thursday, July 3, 2025
mGrowTech

Confidence in agentic AI: Why eval infrastructure must come first

By Josh
July 3, 2025
in Technology And Software


As AI agents enter real-world deployment, organizations are under pressure to define where agents belong, how to build them effectively, and how to operationalize them at scale. At VentureBeat’s Transform 2025, four tech leaders discussed how they’re transforming their businesses with agents: Joanne Chen, general partner at Foundation Capital; Shailesh Nalawadi, VP of project management at Sendbird; Thys Waanders, SVP of AI transformation at Cognigy; and Shawn Malhotra, CTO of Rocket Companies.

A few top agentic AI use cases

“The initial attraction of any of these deployments for AI agents tends to be around saving human capital — the math is pretty straightforward,” Nalawadi said. “However, that undersells the transformational capability you get with AI agents.”

At Rocket, AI agents have proven to be powerful tools for increasing website conversion.

“We’ve found that with our agent-based experience, the conversational experience on the website, clients are three times more likely to convert when they come through that channel,” Malhotra said.

But that’s just scratching the surface. For instance, a Rocket engineer built an agent in just two days to automate a highly specialized task: calculating transfer taxes during mortgage underwriting.

“That two days of effort saved us a million dollars a year in expense,” Malhotra said. “In 2024, we saved more than a million team member hours, mostly off the back of our AI solutions. That’s not just saving expense. It’s also allowing our team members to focus their time on people making what is often the largest financial transaction of their life.”

Agents are essentially supercharging individual team members. The million hours saved isn’t someone’s entire job replicated many times over. It’s fractions of jobs: tasks employees don’t enjoy, or that weren’t adding value for the client. And those million hours give Rocket the capacity to handle more business.

“Some of our team members were able to handle 50% more clients last year than they were the year before,” Malhotra added. “It means we can have higher throughput, drive more business, and again, we see higher conversion rates because they’re spending the time understanding the client’s needs versus doing a lot of more rote work that the AI can do now.”

Tackling agent complexity

“Part of the journey for our engineering teams is moving from the mindset of software engineering – write once and test it and it runs and gives the same answer 1,000 times – to the more probabilistic approach, where you ask the same thing of an LLM and it gives different answers through some probability,” Nalawadi said. “A lot of it has been bringing people along. Not just software engineers, but product managers and UX designers.”

What’s helped is that LLMs have come a long way, Waanders said. Teams that built something 18 months or two years ago really had to pick the right model, or the agent would not perform as expected. Now, he said, most mainstream models behave very well and are more predictable. Today’s challenge is combining models, ensuring responsiveness, orchestrating the right models in the right sequence, and weaving in the right data.

“We have customers that push tens of millions of conversations per year,” Waanders said. “If you automate, say, 30 million conversations in a year, how does that scale in the LLM world? That’s all stuff that we had to discover, simple stuff, from even getting the model availability with the cloud providers. Having enough quota with a ChatGPT model, for example. Those are all learnings that we had to go through, and our customers as well. It’s a brand-new world.”
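Waanders’s scaling question is, at its core, arithmetic. The sketch below turns an annual conversation volume into the per-minute capacity you would need to request from a cloud provider, whose quotas are commonly expressed in requests and tokens per minute. Only the 30 million figure comes from his example; the calls per conversation, tokens per call, and peak-to-average ratio are placeholder assumptions to replace with measured values.

```python
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


@dataclass
class Workload:
    conversations_per_year: int
    llm_calls_per_conversation: int  # assumed: turns plus any routing/tool calls
    tokens_per_call: int  # assumed: prompt + completion tokens per call
    peak_to_average: float  # assumed: busiest minute relative to the yearly average


def peak_quota(w: Workload) -> tuple[float, float]:
    """Return (requests per minute, tokens per minute) needed at peak load."""
    avg_conversations_per_min = w.conversations_per_year / MINUTES_PER_YEAR
    peak_requests_per_min = (
        avg_conversations_per_min * w.llm_calls_per_conversation * w.peak_to_average
    )
    return peak_requests_per_min, peak_requests_per_min * w.tokens_per_call


# 30 million conversations/year from the quote; every other number is hypothetical.
rpm, tpm = peak_quota(
    Workload(30_000_000, llm_calls_per_conversation=8, tokens_per_call=1_500, peak_to_average=3.0)
)
print(f"~{rpm:,.0f} requests/min, ~{tpm:,.0f} tokens/min at peak")
```

Even under these rough assumptions, a volume that sounds like "about one conversation per second" becomes roughly 1,400 model requests and about 2 million tokens per minute at peak, which is why quota availability surfaced as an early lesson.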

A layer above orchestrating the LLM is orchestrating a network of agents, Malhotra said. A conversational experience has a network of agents under the hood, and an orchestrator decides which of the available agents to farm each request out to.

“If you play that forward and think about having hundreds or thousands of agents who are capable of different things, you get some really interesting technical problems,” he said. “It’s becoming a bigger problem, because latency and time matter. That agent routing is going to be a very interesting problem to solve over the coming years.”
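To make the routing idea concrete, here is a minimal Python sketch of an orchestrator that sends each request to one agent from a registry. The agent names, the keyword-overlap scoring, and the fallback behavior are all hypothetical; none of the panelists described their implementation, and production routers more often use an LLM or a trained classifier to make this decision.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    """A hypothetical specialist agent: a name, trigger keywords, and a handler."""
    name: str
    keywords: set[str]
    handle: Callable[[str], str]


class Orchestrator:
    """Routes each request to the registered agent whose keywords best match it."""

    def __init__(self, agents: list[Agent], fallback: Agent) -> None:
        self.agents = agents
        self.fallback = fallback

    def route(self, request: str) -> Agent:
        words = set(re.findall(r"[a-z0-9]+", request.lower()))
        # Score each agent by how many of its keywords appear in the request;
        # max() keeps the first registered agent when scores tie.
        best_score, best_agent = max(
            ((len(agent.keywords & words), agent) for agent in self.agents),
            key=lambda pair: pair[0],
            default=(0, self.fallback),
        )
        # With no keyword overlap at all, hand off to the fallback agent.
        return best_agent if best_score > 0 else self.fallback

    def dispatch(self, request: str) -> str:
        return self.route(request).handle(request)


# Hypothetical agents, loosely echoing Rocket's transfer-tax agent.
tax_agent = Agent("transfer_tax", {"transfer", "tax"}, lambda r: "tax agent")
rate_agent = Agent("rates", {"rate", "apr"}, lambda r: "rates agent")
general = Agent("general", set(), lambda r: "general agent")
orch = Orchestrator([tax_agent, rate_agent], fallback=general)
print(orch.dispatch("What transfer tax applies here?"))
```

Keyword scoring is used here only because it is cheap and easy to read: it adds almost no latency, which matters given Malhotra’s point, but it will misroute paraphrased requests. With hundreds or thousands of agents, the scoring step is exactly where the "really interesting technical problems" would appear.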

Tapping into vendor relationships

Up to this point, the first step for most companies launching agentic AI has been building in-house, because specialized tools didn’t yet exist. But companies can’t differentiate or create value by building generic LLM or AI infrastructure. Going beyond the initial build to debug, iterate on, and improve what’s been built, and to maintain the underlying infrastructure, takes specialized expertise.

“Often we find the most successful conversations we have with prospective customers tend to be someone who’s already built something in-house,” Nalawadi said. “They quickly realize that getting to a 1.0 is okay, but as the world evolves and as the infrastructure evolves and as they need to swap out technology for something new, they don’t have the ability to orchestrate all these things.”

Preparing for agentic AI complexity

In theory, agentic AI will only grow in complexity: the number of agents in an organization will rise, agents will start learning from each other, and the number of use cases will explode. How can organizations prepare for the challenge?

“It means that the checks and balances in your system will get stressed more,” Malhotra said. “For something that has a regulatory process, you have a human in the loop to make sure that someone is signing off on this. For critical internal processes or data access, do you have observability? Do you have the right alerting and monitoring so that if something goes wrong, you know it’s going wrong? It’s doubling down on your detection, understanding where you need a human in the loop, and then trusting that those processes are going to catch if something does go wrong. But because of the power it unlocks, you have to do it.”

So how can you have confidence that an AI agent will behave reliably as it evolves?

“That part is really difficult if you haven’t thought about it at the beginning,” Nalawadi said. “The short answer is, before you even start building it, you should have an eval infrastructure in place. Make sure you have a rigorous environment in which you know what good looks like, from an AI agent, and that you have this test set. Keep referring back to it as you make improvements. A very simplistic way of thinking about eval is that it’s the unit tests for your agentic system.”
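Nalawadi’s "unit tests for your agentic system" analogy can be sketched in Python as a fixed eval set: each case pairs an input with a check that encodes what good looks like, and the whole set is rerun after every change. The `agent` function is a hypothetical stand-in (a real one would call an LLM), and the case names and checks are illustrative assumptions, not Sendbird’s actual eval setup.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One entry in the eval set: an input and a predicate defining 'good'."""
    name: str
    prompt: str
    is_good: Callable[[str], bool]


def agent(prompt: str) -> str:
    """Hypothetical stand-in for an AI agent; a real one would call an LLM."""
    if "refund" in prompt.lower():
        return "I can help with your refund. Could you share your order number?"
    return "Let me connect you with a specialist."


EVAL_SET = [
    EvalCase("asks_for_order_number", "I want a refund",
             lambda out: "order number" in out.lower()),
    EvalCase("no_promises", "Refund me now!",
             lambda out: "guarantee" not in out.lower()),
    EvalCase("escalates_unknown", "My account was hacked",
             lambda out: "specialist" in out.lower()),
]


def run_evals(agent_fn: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case and report pass/fail per case, like a unit-test suite."""
    return {case.name: case.is_good(agent_fn(case.prompt)) for case in cases}


results = run_evals(agent, EVAL_SET)
print(results)
print(f"{sum(results.values())}/{len(results)} passed")
```

Because the set stays fixed, any change that flips a case from pass to fail shows up as a regression, which is what makes it possible to "keep referring back to it" as the agent is improved.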

The problem is that agents are non-deterministic, Waanders added. Unit testing is critical, but the biggest challenge is that you don’t know what you don’t know: what incorrect behaviors an agent could display, or how it might react in any given situation.

“You can only find that out by simulating conversations at scale, by pushing it under thousands of different scenarios, and then analyzing how it holds up and how it reacts,” Waanders said.
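Because the same input can produce different outputs, a single pass/fail run says little. A minimal sketch of the simulation approach, assuming a hypothetical agent that goes off-script at random: replay each scenario many times and report a pass rate per scenario rather than one verdict. The `flaky_agent`, its 10% failure probability, the scenarios, and the 95% acceptance threshold are all invented for illustration.

```python
import random
from typing import Callable


def flaky_agent(prompt: str, rng: random.Random) -> str:
    """Hypothetical non-deterministic agent: usually escalates, sometimes overpromises."""
    if rng.random() < 0.1:  # assumed 10% chance of an off-script answer
        return "Sure, I guarantee approval!"
    return f"Escalating to a specialist: {prompt}"


def simulate(
    agent_fn: Callable[[str, random.Random], str],
    scenarios: list[str],
    is_good: Callable[[str], bool],
    runs_per_scenario: int = 1000,
    seed: int = 0,
) -> dict[str, float]:
    """Replay every scenario many times and return the fraction of good responses."""
    rng = random.Random(seed)  # seeded so the simulation itself is reproducible
    rates: dict[str, float] = {}
    for scenario in scenarios:
        passes = sum(is_good(agent_fn(scenario, rng)) for _ in range(runs_per_scenario))
        rates[scenario] = passes / runs_per_scenario
    return rates


scenarios = ["Can I get a lower rate?", "Is my loan approved?", "Cancel my application"]
rates = simulate(flaky_agent, scenarios, is_good=lambda out: "guarantee" not in out.lower())
for scenario, rate in rates.items():
    flag = "OK" if rate >= 0.95 else "INVESTIGATE"  # assumed acceptance threshold
    print(f"{rate:.1%}  {flag}  {scenario}")
```

Reporting a rate per scenario is what separates this from a plain unit test: it shows which situations are fragile and how often they fail. A fuller setup might generate thousands of scenarios automatically instead of hard-coding three.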



