Monday, May 11, 2026
mGrowTech

Getting Started with Mirascope: Removing Semantic Duplicates using an LLM

by Josh
July 17, 2025
in AI, Analytics and Automation


Mirascope is a Python library that provides a unified interface for working with a wide range of Large Language Model (LLM) providers, including OpenAI, Anthropic, Mistral, Google (Gemini and Vertex AI), Groq, Cohere, LiteLLM, Azure AI, and Amazon Bedrock. It covers tasks ranging from text generation and structured data extraction to building complex AI-powered workflows and agent systems.

In this guide, we’ll focus on using Mirascope’s OpenAI integration to identify and remove semantic duplicates (entries that may differ in wording but carry the same meaning) from a list of customer reviews. 


Installing the dependencies

pip install "mirascope[openai]"

OpenAI Key

To get an OpenAI API key, visit https://platform.openai.com/settings/organization/api-keys and generate a new key. If you’re a new user, you may need to add billing details and make a minimum payment of $5 to activate API access.

import os
from getpass import getpass
os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')
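If the key is already exported in your shell, prompting again is unnecessary. The following is a minimal sketch of a helper that prompts only when the variable is missing; the name ensure_api_key is hypothetical and not part of Mirascope or the OpenAI SDK.

```python
import os
from getpass import getpass


def ensure_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, prompting only if it is missing."""
    key = os.environ.get(var_name)
    if not key:
        # Hidden prompt, so the key is not echoed to the terminal or notebook.
        key = getpass(f"Enter {var_name}: ")
        os.environ[var_name] = key
    return key
```

Calling ensure_api_key() once at the top of the notebook can stand in for the direct assignment above.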

Defining the list of customer reviews

customer_reviews = [
    "Sound quality is amazing!",
    "Audio is crystal clear and very immersive.",
    "Incredible sound, especially the bass response.",
    "Battery doesn't last as advertised.",
    "Needs charging too often.",
    "Battery drains quickly -- not ideal for travel.",
    "Setup was super easy and straightforward.",
    "Very user-friendly, even for my parents.",
    "Simple interface and smooth experience.",
    "Feels cheap and plasticky.",
    "Build quality could be better.",
    "Broke within the first week of use.",
    "People say they can't hear me during calls.",
    "Mic quality is terrible on Zoom meetings.",
    "Great product for the price!"
]

These reviews capture key customer sentiments: praise for sound quality and ease of use, complaints about battery life, build quality, and call/mic issues, along with a positive note on value for money. They reflect common themes found in real user feedback.

Defining a Pydantic Schema

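This Pydantic model defines the structure of the response for a semantic deduplication task on customer reviews. It has two fields: duplicates, a list of groups in which each group holds reviews that mean the same thing, and reviews, the deduplicated list of core feedback themes. The same schema can structure and validate the output of a language model that clusters or deduplicates other natural language input (e.g., user feedback, bug reports, product reviews).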

from pydantic import BaseModel, Field

class DeduplicatedReviews(BaseModel):
    duplicates: list[list[str]] = Field(
        ..., description="A list of semantically equivalent customer review groups"
    )
    reviews: list[str] = Field(
        ..., description="The deduplicated list of core customer feedback themes"
    )
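To make the shape concrete, here is a sketch of what a conforming response could look like as JSON, checked with plain Python types. The payload is hand-written for illustration and is not real model output, and has_expected_shape is a hypothetical helper; Pydantic performs this kind of validation on its own when the model is instantiated.

```python
import json

# Hand-written illustration of the expected shape, not real model output.
example_payload = """
{
  "duplicates": [
    ["Battery doesn't last as advertised.", "Needs charging too often."]
  ],
  "reviews": ["Battery life falls short of expectations."]
}
"""


def has_expected_shape(payload: dict) -> bool:
    """Check the two fields and their nesting using plain Python types."""
    duplicates = payload.get("duplicates")
    reviews = payload.get("reviews")
    if not isinstance(duplicates, list) or not isinstance(reviews, list):
        return False
    # duplicates must be a list of lists of strings.
    groups_ok = all(
        isinstance(group, list) and all(isinstance(r, str) for r in group)
        for group in duplicates
    )
    # reviews must be a flat list of strings.
    return groups_ok and all(isinstance(r, str) for r in reviews)


data = json.loads(example_payload)
print(has_expected_shape(data))
```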

Defining a Mirascope @openai.call for Semantic Deduplication

This code defines a semantic deduplication function using Mirascope’s @openai.call decorator, which sends the prompt to OpenAI’s gpt-4o model. The deduplicate_customer_reviews function takes a list of customer reviews, and the @prompt_template decorator supplies the structured prompt that guides the LLM in identifying and grouping semantically similar reviews.

The system message instructs the model to use the meaning, tone, and implication of each review, clustering those that convey the same feedback even if worded differently. The function returns a response that conforms to the DeduplicatedReviews Pydantic model, which has two fields: reviews, a list of distinct review sentiments, and duplicates, a list of grouped duplicates.

Because the response is validated against the schema, the output is machine-readable, which suits customer feedback analysis, survey deduplication, or product review clustering. Schema validation checks structure only; the quality of the grouping still depends on the model.

from mirascope.core import openai, prompt_template

@openai.call(model="gpt-4o", response_model=DeduplicatedReviews)
@prompt_template(
    """
    SYSTEM:
    You are an AI assistant helping to analyze customer reviews. 
    Your task is to group semantically similar reviews together -- even if they are worded differently.

    - Use your understanding of meaning, tone, and implication to group duplicates.
    - Return two lists:
      1. A deduplicated list of the key distinct review sentiments.
      2. A list of grouped duplicates that share the same underlying feedback.

    USER:
    {reviews}
    """
)
def deduplicate_customer_reviews(reviews: list[str]): ...
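As written, the {reviews} placeholder receives the Python list itself, so the prompt most likely contains its string representation, brackets and quotes included. If you prefer one review per line, a small formatter is one option. This is a sketch only: format_reviews is a hypothetical helper, and using it would mean passing a string rather than a list to the decorated function.

```python
def format_reviews(reviews: list[str]) -> str:
    """Render reviews as a numbered block, one review per line."""
    return "\n".join(f"{i}. {text}" for i, text in enumerate(reviews, start=1))


sample = ["Sound quality is amazing!", "Needs charging too often."]
print(format_reviews(sample))
```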

The following code executes the deduplicate_customer_reviews function using a list of customer reviews and prints the structured output. First, it calls the function and stores the result in the response variable. To ensure that the model’s output conforms to the expected format, it uses an assert statement to validate that the response is an instance of the DeduplicatedReviews Pydantic model.

Once validated, it prints the deduplicated results in two sections. The first section, labeled “✅ Distinct Customer Feedback,” displays the list of unique review sentiments identified by the model. The second section, “🌀 Grouped Duplicates,” lists clusters of reviews that were recognized as semantically equivalent.

response = deduplicate_customer_reviews(customer_reviews)

# Ensure response format
assert isinstance(response, DeduplicatedReviews)

# Print Output
print("✅ Distinct Customer Feedback:")
for item in response.reviews:
    print("-", item)

print("\n🌀 Grouped Duplicates:")
for group in response.duplicates:
    print("-", group)
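The isinstance check confirms the type of the response but says nothing about its content. A minimal sketch of an extra sanity check follows, assuming the model is expected to copy reviews verbatim into the groups; check_groups is a hypothetical helper, not part of Mirascope.

```python
def check_groups(original: list[str], groups: list[list[str]]) -> dict[str, list[str]]:
    """Compare grouped reviews against the original input.

    Returns reviews that are not in the input ("unknown") and reviews
    that appear more than once across the groups ("repeated").
    """
    known = set(original)
    seen: set[str] = set()
    unknown: list[str] = []
    repeated: list[str] = []
    for group in groups:
        for review in group:
            if review not in known:
                # The model reworded or invented this entry.
                unknown.append(review)
            elif review in seen:
                # The same input review was placed in the groups twice.
                repeated.append(review)
            seen.add(review)
    return {"unknown": unknown, "repeated": repeated}
```

With the variables from this guide, the call would be check_groups(customer_reviews, response.duplicates); two empty lists mean every grouped review came from the input and was placed once.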

The output shows a clean summary of customer feedback by grouping semantically similar reviews. The Distinct Customer Feedback section highlights key insights, while the Grouped Duplicates section captures different phrasings of the same sentiment. This helps eliminate redundancy and makes the feedback easier to analyze.
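To actually drop the duplicates from the original list, one option is to keep the first review listed in each group and remove the rest. This is a sketch under that assumption; remove_duplicates is a hypothetical helper, and the grouping in the example is hand-written rather than real model output.

```python
def remove_duplicates(original: list[str], groups: list[list[str]]) -> list[str]:
    """Keep the first review of each group and drop the other members.

    Reviews that appear in no group are kept unchanged, and the
    original ordering is preserved.
    """
    to_drop: set[str] = set()
    for group in groups:
        # Everything after the first member of a group is a duplicate.
        to_drop.update(group[1:])
    return [review for review in original if review not in to_drop]


reviews = [
    "Battery doesn't last as advertised.",
    "Needs charging too often.",
    "Great product for the price!",
]
# Hand-written grouping for illustration, not real model output.
groups = [["Battery doesn't last as advertised.", "Needs charging too often."]]
print(remove_duplicates(reviews, groups))
```

In the guide's own flow, the inputs would be customer_reviews and response.duplicates.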




I am a Civil Engineering Graduate (2022) from Jamia Millia Islamia, New Delhi, and I have a keen interest in Data Science, especially Neural Networks and their application in various areas.


