Monday, May 25, 2026
mGrowTech

Build a Complete Langfuse Observability and Evaluation Pipeline for Tracing, Prompt Management, Scoring, and Experiments

by Josh
May 25, 2026
in AI, Analytics and Automation


# Part 5 assumes the earlier parts already defined `langfuse` (the client)
# and `llm_chat` (the traced chat helper).
from langfuse import Evaluation  # harmless if already imported in an earlier part

print("\nPART 5 ── Datasets & experiments --------------------------------------")
DATASET = "capital-cities-tutorial"
langfuse.create_dataset(name=DATASET, description="Capital-city QA benchmark")

# (question, expected answer) pairs that make up the benchmark.
_items = [
    ("What is the capital of France?",  "Paris"),
    ("What is the capital of Germany?", "Berlin"),
    ("What is the capital of Japan?",   "Tokyo"),
    ("What is the capital of Italy?",   "Rome"),
]
# Stable ids keep re-runs from adding duplicate items to the dataset.
for i, (q, a) in enumerate(_items):
    langfuse.create_dataset_item(dataset_name=DATASET, id=f"cap-{i}",
                                 input={"question": q}, expected_output=a)


def capital_task(*, item, **kwargs):
    """Task under test: send the item's question to the model and return its answer."""
    question = item.input["question"] if isinstance(item.input, dict) else item.input
    return llm_chat([{"role": "user", "content": question}], name="experiment-answer")


def accuracy(*, input, output, expected_output, metadata=None, **kwargs):
    """Item-level score: 1.0 if the expected answer appears anywhere in the output."""
    hit = bool(expected_output) and expected_output.lower() in (output or "").lower()
    return Evaluation(name="accuracy", value=1.0 if hit else 0.0,
                      comment="case-insensitive substring check")


def conciseness(*, input, output, **kwargs):
    """Item-level score: length of the output in characters."""
    return Evaluation(name="char_length", value=float(len(output or "")))


def mean_accuracy(*, item_results, **kwargs):
    """Run-level score: average of the per-item accuracy values."""
    vals = [e.value for r in item_results for e in r.evaluations if e.name == "accuracy"]
    avg = sum(vals) / len(vals) if vals else 0.0
    return Evaluation(name="mean_accuracy", value=avg, comment=f"{avg:.0%} correct")


dataset = langfuse.get_dataset(DATASET)
result = dataset.run_experiment(
    name="capitals-baseline",
    description="Baseline run from the Colab tutorial",
    task=capital_task,
    evaluators=[accuracy, conciseness],   # scored once per dataset item
    run_evaluators=[mean_accuracy],       # scored once over the whole run
    max_concurrency=4,
)
print(result.format())
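
The scoring logic in the evaluators above can be sanity-checked offline, without a Langfuse project or a model call. The sketch below is only an illustration under stated assumptions: `Evaluation` and `ItemResult` here are hypothetical local dataclasses that mimic just the fields the evaluators touch (`name`, `value`, `comment`, `evaluations`), not the SDK's own classes, and the sample model outputs are invented for the test.

```python
from dataclasses import dataclass, field


@dataclass
class Evaluation:
    """Hypothetical stand-in for the SDK's Evaluation (only the fields used here)."""
    name: str
    value: float
    comment: str = ""


@dataclass
class ItemResult:
    """Hypothetical stand-in for one experiment item result."""
    evaluations: list = field(default_factory=list)


def accuracy(*, input, output, expected_output, metadata=None, **kwargs):
    # Case-insensitive substring check, same logic as the tutorial's evaluator.
    hit = bool(expected_output) and expected_output.lower() in (output or "").lower()
    return Evaluation(name="accuracy", value=1.0 if hit else 0.0,
                      comment="case-insensitive substring check")


def mean_accuracy(*, item_results, **kwargs):
    # Average the per-item accuracy scores; an empty run scores 0.0.
    vals = [e.value for r in item_results for e in r.evaluations if e.name == "accuracy"]
    avg = sum(vals) / len(vals) if vals else 0.0
    return Evaluation(name="mean_accuracy", value=avg, comment=f"{avg:.0%} correct")


# Invented (question, model output, expected answer) triples for illustration.
samples = [
    ("What is the capital of France?", "The capital of France is Paris.", "Paris"),
    ("What is the capital of Japan?", "I am not sure.", "Tokyo"),
    ("What is the capital of Italy?", None, "Rome"),       # missing output
    ("What is the capital of Germany?", "BERLIN", "Berlin"),  # case differs
]

item_results = [
    ItemResult(evaluations=[accuracy(input=q, output=o, expected_output=e)])
    for q, o, e in samples
]
summary = mean_accuracy(item_results=item_results)
print(summary.value, summary.comment)  # 0.5 50% correct
```

Because the check is a substring match, an answer such as "Paris is not the capital" would still count as a hit; a stricter comparison may be preferable for benchmarks where that matters.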


