
Safeguarding Agentic AI Systems: NVIDIA’s Open-Source Safety Recipe

by Josh
July 29, 2025
in AI, Analytics and Automation


As large language models (LLMs) evolve from simple text generators into agentic systems that can plan, reason, and act autonomously, both their capabilities and their associated risks increase significantly. Enterprises are rapidly adopting agentic AI for automation, but this trend exposes organizations to new challenges: goal misalignment, prompt injection, unintended behaviors, data leakage, and reduced human oversight. To address these concerns, NVIDIA has released an open-source software suite and a post-training safety recipe designed to safeguard agentic AI systems throughout their lifecycle.

The Need for Safety in Agentic AI

Agentic LLMs leverage advanced reasoning and tool use, enabling them to operate with a high degree of autonomy. However, this autonomy can result in:


  • Content moderation failures (e.g., generation of harmful, toxic, or biased outputs)
  • Security vulnerabilities (prompt injection, jailbreak attempts)
  • Compliance and trust risks (failure to align with enterprise policies or regulatory standards)

Traditional guardrails and content filters often fall short as models and attacker techniques rapidly evolve. Enterprises require systematic, lifecycle-wide strategies for aligning open models with internal policies and external regulations.

NVIDIA’s Safety Recipe: Overview and Architecture

NVIDIA’s agentic AI safety recipe provides a comprehensive end-to-end framework to evaluate, align, and safeguard LLMs before, during, and after deployment:

  • Evaluation: Before deployment, the recipe enables testing against enterprise policies, security requirements, and trust thresholds using open datasets and benchmarks.
  • Post-Training Alignment: Using Reinforcement Learning (RL), Supervised Fine-Tuning (SFT), and on-policy dataset blends, models are further aligned with safety standards.
  • Continuous Protection: After deployment, NVIDIA NeMo Guardrails and real-time monitoring microservices provide ongoing, programmable guardrails, actively blocking unsafe outputs and defending against prompt injections and jailbreak attempts.
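
The "continuous protection" stage boils down to wrapping every model call in programmable checks on both the prompt and the output. As a rough illustration only (these names and rules are invented for the sketch, not NVIDIA's NeMo Guardrails API), the pattern looks like:

```python
# Toy sketch of a deployment-time guardrail wrapper: screen the prompt
# for injection attempts, then screen the output against enterprise
# policy before it reaches the user. BLOCKED_TOPICS, INJECTION_MARKERS,
# and guarded_generate are illustrative stand-ins.

BLOCKED_TOPICS = {"credentials", "self-harm"}          # enterprise policy terms
INJECTION_MARKERS = ("ignore previous instructions",)  # crude jailbreak signal

def check_prompt(prompt: str) -> bool:
    """Reject prompts that look like injection attempts."""
    p = prompt.lower()
    return not any(marker in p for marker in INJECTION_MARKERS)

def check_output(text: str) -> bool:
    """Reject outputs that touch blocked topics."""
    t = text.lower()
    return not any(topic in t for topic in BLOCKED_TOPICS)

def guarded_generate(model, prompt: str) -> str:
    """Run the model only if both input and output checks pass."""
    if not check_prompt(prompt):
        return "[blocked: possible prompt injection]"
    output = model(prompt)
    if not check_output(output):
        return "[blocked: policy violation]"
    return output
```

A production guardrail stack replaces these string checks with trained safety classifiers and topic-control models, but the control flow (screen input, generate, screen output) is the same.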

Core Components

  • Pre-Deployment Evaluation: Nemotron Content Safety Dataset, WildGuardMix, garak scanner (purpose: test safety and security)
  • Post-Training Alignment: RL, SFT, open-licensed data (purpose: fine-tune safety and alignment)
  • Deployment & Inference: NeMo Guardrails, NIM microservices for content safety, topic control, and jailbreak detection (purpose: block unsafe behaviors)
  • Monitoring & Feedback: garak, real-time analytics (purpose: detect and resist new attacks)

Open Datasets and Benchmarks

  • Nemotron Content Safety Dataset v2: Used for pre- and post-training evaluation, this dataset screens for a wide spectrum of harmful behaviors.
  • WildGuardMix Dataset: Targets content moderation across ambiguous and adversarial prompts.
  • Aegis Content Safety Dataset: Over 35,000 annotated samples, enabling fine-grained filter and classifier development for LLM safety tasks.
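
Evaluation against datasets like these reduces to running labeled prompts through the model, scoring each response with a content-safety classifier, and reporting a pass rate. A minimal sketch, where `classify_safe` is a placeholder for a real safety classifier (for example, one trained on the Nemotron Content Safety Dataset):

```python
# Minimal sketch of a pre-deployment safety evaluation loop.
# classify_safe is an illustrative stand-in for a trained safety
# classifier; safety_pass_rate is the benchmark-style metric.

def classify_safe(response: str) -> bool:
    # Placeholder rule: a real classifier is an LLM or fine-tuned model.
    return "harmful" not in response.lower()

def safety_pass_rate(model, prompts) -> float:
    """Fraction of prompts whose responses the classifier deems safe."""
    results = [classify_safe(model(p)) for p in prompts]
    return sum(results) / len(results)
```

The pass rate computed this way is the kind of number the recipe tracks before and after post-training (see the content-safety figures below in the original sense of benchmark scores).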

Post-Training Process

NVIDIA’s post-training recipe for safety is distributed as an open-source Jupyter notebook or as a launchable cloud module, ensuring transparency and broad accessibility. The workflow typically includes:

  1. Initial Model Evaluation: Baseline testing on safety/security with open benchmarks.
  2. On-policy Safety Training: Response generation by the target/aligned model, supervised fine-tuning, and reinforcement learning with open datasets.
  3. Re-evaluation: Re-running safety/security benchmarks post-training to confirm improvements.
  4. Deployment: Trusted models are deployed with live monitoring and guardrail microservices (content moderation, topic/domain control, jailbreak detection).
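
The four steps above form an evaluate, post-train, re-evaluate gate before deployment. A hedged sketch of that control loop (the `evaluate` and `post_train` callables and the deployment threshold are assumptions for illustration, not part of NVIDIA's published recipe code):

```python
# Sketch of the iterative hardening loop: measure a baseline, apply a
# round of safety post-training, re-measure, and stop once the safety
# benchmark score clears a deployment threshold.

def harden(model, evaluate, post_train, threshold=0.9, max_rounds=3):
    """Iterate safety training until the benchmark score clears the bar.

    Returns the (possibly updated) model and its final score.
    """
    score = evaluate(model)          # step 1: initial model evaluation
    for _ in range(max_rounds):
        if score >= threshold:
            break                    # step 4: trusted enough to deploy
        model = post_train(model)    # step 2: on-policy safety training
        score = evaluate(model)      # step 3: re-evaluation
    return model, score
```

Capping the number of rounds matters in practice: each post-training pass costs compute, and scores can plateau, so the loop should terminate rather than chase the threshold indefinitely.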

Quantitative Impact

  • Content Safety: Improved from 88% to 94% after applying the NVIDIA safety post-training recipe, a gain of 6 percentage points, with no measurable loss of accuracy.
  • Product Security: Resilience against adversarial prompts (e.g., jailbreak attempts) improved from 56% to 63%, a gain of 7 percentage points.

Collaborative and Ecosystem Integration

NVIDIA’s approach extends beyond internal tools: partnerships with leading cybersecurity providers (Cisco AI Defense, CrowdStrike, Trend Micro, ActiveFence) enable the integration of continuous safety signals and incident-driven improvements across the AI lifecycle.

How To Get Started

  1. Open Source Access: The full safety evaluation and post-training recipe (tools, datasets, guides) is publicly available for download and as a cloud-deployable solution.
  2. Custom Policy Alignment: Enterprises can define custom business policies, risk thresholds, and regulatory requirements—using the recipe to align models accordingly.
  3. Iterative Hardening: Evaluate, post-train, re-evaluate, and deploy as new risks emerge, ensuring ongoing model trustworthiness.

Conclusion

NVIDIA’s safety recipe for agentic LLMs represents an industry-first, openly available, systematic approach to hardening LLMs against modern AI risks. By operationalizing robust, transparent, and extensible safety protocols, enterprises can confidently adopt agentic AI, balancing innovation with security and compliance.


Check out the NVIDIA AI safety recipe and technical details. All credit for this research goes to the researchers of this project.

