mGrowTech

How to Design Production-Grade Mock Data Pipelines Using Polyfactory with Dataclasses, Pydantic, Attrs, and Nested Models

by Josh
February 8, 2026
in AI, Analytics and Automation


In this tutorial, we walk through Polyfactory end to end, focusing on how to generate rich, realistic mock data directly from Python type hints. We start by setting up the environment, then progressively build factories for dataclasses, Pydantic models, and attrs-based classes, demonstrating customization, overrides, calculated fields, and nested object generation along the way. With each snippet, we show how to control randomness, enforce constraints, and model real-world structures, so the material applies directly to testing, prototyping, and data-driven development workflows.

import subprocess
import sys


def install_package(package):
   subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])


packages = [
   "polyfactory",
   "pydantic",
   "email-validator",
   "faker",
   "msgspec",
   "attrs"
]


for package in packages:
   try:
       install_package(package)
       print(f"✓ Installed {package}")
   except Exception as e:
       print(f"✗ Failed to install {package}: {e}")


print("\n")


print("=" * 80)
print("SECTION 2: Basic Dataclass Factories")
print("=" * 80)


from dataclasses import dataclass
from typing import List, Optional
from datetime import datetime, date
from uuid import UUID
from polyfactory.factories import DataclassFactory


@dataclass
class Address:
   street: str
   city: str
   country: str
   zip_code: str


@dataclass
class Person:
   id: UUID
   name: str
   email: str
   age: int
   birth_date: date
   is_active: bool
   address: Address
   phone_numbers: List[str]
   bio: Optional[str] = None


class PersonFactory(DataclassFactory[Person]):
   pass


person = PersonFactory.build()
print(f"Generated Person:")
print(f"  ID: {person.id}")
print(f"  Name: {person.name}")
print(f"  Email: {person.email}")
print(f"  Age: {person.age}")
print(f"  Address: {person.address.city}, {person.address.country}")
print(f"  Phone Numbers: {person.phone_numbers[:2]}")
print()


people = PersonFactory.batch(5)
print(f"Generated {len(people)} people:")
for i, p in enumerate(people, 1):
   print(f"  {i}. {p.name} - {p.email}")
print("\n")

We set up the environment and ensure all required dependencies are installed. We also introduce the core idea of using Polyfactory to generate mock data from type hints. By initializing the basic dataclass factories, we establish the foundation for all subsequent examples.

print("=" * 80)
print("SECTION 3: Customizing Factory Behavior")
print("=" * 80)


from faker import Faker
from polyfactory.fields import Use, Ignore


@dataclass
class Employee:
   employee_id: str
   full_name: str
   department: str
   salary: float
   hire_date: date
   is_manager: bool
   email: str
   internal_notes: Optional[str] = None


class EmployeeFactory(DataclassFactory[Employee]):
   __faker__ = Faker(locale="en_US")
   __random_seed__ = 42


   @classmethod
   def employee_id(cls) -> str:
       return f"EMP-{cls.__random__.randint(10000, 99999)}"


   @classmethod
   def full_name(cls) -> str:
       return cls.__faker__.name()


   @classmethod
   def department(cls) -> str:
       departments = ["Engineering", "Marketing", "Sales", "HR", "Finance"]
       return cls.__random__.choice(departments)


   @classmethod
   def salary(cls) -> float:
       return round(cls.__random__.uniform(50000, 150000), 2)


   @classmethod
   def email(cls) -> str:
       return cls.__faker__.company_email()


employees = EmployeeFactory.batch(3)
print("Generated Employees:")
for emp in employees:
   print(f"  {emp.employee_id}: {emp.full_name}")
   print(f"    Department: {emp.department}")
   print(f"    Salary: ${emp.salary:,.2f}")
   print(f"    Email: {emp.email}")
   print()
print()


print("=" * 80)
print("SECTION 4: Field Constraints and Calculated Fields")
print("=" * 80)


@dataclass
class Product:
   product_id: str
   name: str
   description: str
   price: float
   discount_percentage: float
   stock_quantity: int
   final_price: Optional[float] = None
   sku: Optional[str] = None


class ProductFactory(DataclassFactory[Product]):
   @classmethod
   def product_id(cls) -> str:
       return f"PROD-{cls.__random__.randint(1000, 9999)}"


   @classmethod
   def name(cls) -> str:
       adjectives = ["Premium", "Deluxe", "Classic", "Modern", "Eco"]
       nouns = ["Widget", "Gadget", "Device", "Tool", "Appliance"]
       return f"{cls.__random__.choice(adjectives)} {cls.__random__.choice(nouns)}"


   @classmethod
   def price(cls) -> float:
       return round(cls.__random__.uniform(10.0, 1000.0), 2)


   @classmethod
   def discount_percentage(cls) -> float:
       return round(cls.__random__.uniform(0, 30), 2)


   @classmethod
   def stock_quantity(cls) -> int:
       return cls.__random__.randint(0, 500)


   @classmethod
   def build(cls, **kwargs):
       instance = super().build(**kwargs)
       # Optional fields may be filled with random values, so derive them
       # whenever the caller did not pass an explicit override.
       if "final_price" not in kwargs:
           instance.final_price = round(
               instance.price * (1 - instance.discount_percentage / 100), 2
           )
       if "sku" not in kwargs:
           name_part = instance.name.replace(" ", "-").upper()[:10]
           instance.sku = f"{instance.product_id}-{name_part}"
       return instance


products = ProductFactory.batch(3)
print("Generated Products:")
for prod in products:
   print(f"  {prod.sku}")
   print(f"    Name: {prod.name}")
   print(f"    Price: ${prod.price:.2f}")
   print(f"    Discount: {prod.discount_percentage}%")
   print(f"    Final Price: ${prod.final_price:.2f}")
   print(f"    Stock: {prod.stock_quantity} units")
   print()
print()

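We move beyond the defaults and customize how individual fields are generated. We attach a locale-specific Faker instance and a fixed random seed to the employee factory, define classmethods named after the fields they populate, and override build() to derive values such as the final price and SKU from fields that were already generated. Together, these steps give us realistic records with consistent internal relationships instead of arbitrary strings and numbers.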

print("=" * 80)
print("SECTION 6: Complex Nested Structures")
print("=" * 80)


from enum import Enum


class OrderStatus(str, Enum):
   PENDING = "pending"
   PROCESSING = "processing"
   SHIPPED = "shipped"
   DELIVERED = "delivered"
   CANCELLED = "cancelled"


@dataclass
class OrderItem:
   product_name: str
   quantity: int
   unit_price: float
   total_price: Optional[float] = None


@dataclass
class ShippingInfo:
   carrier: str
   tracking_number: str
   estimated_delivery: date


@dataclass
class Order:
   order_id: str
   customer_name: str
   customer_email: str
   status: OrderStatus
   items: List[OrderItem]
   order_date: datetime
   shipping_info: Optional[ShippingInfo] = None
   total_amount: Optional[float] = None
   notes: Optional[str] = None


class OrderItemFactory(DataclassFactory[OrderItem]):
   @classmethod
   def product_name(cls) -> str:
       products = ["Laptop", "Mouse", "Keyboard", "Monitor", "Headphones",
                  "Webcam", "USB Cable", "Phone Case", "Charger", "Tablet"]
       return cls.__random__.choice(products)


   @classmethod
   def quantity(cls) -> int:
       return cls.__random__.randint(1, 5)


   @classmethod
   def unit_price(cls) -> float:
       return round(cls.__random__.uniform(5.0, 500.0), 2)


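   @classmethod
   def build(cls, **kwargs):
       instance = super().build(**kwargs)
       # Always derive the line total unless the caller overrides it;
       # the Optional field may otherwise hold an unrelated random value.
       if "total_price" not in kwargs:
           instance.total_price = round(instance.quantity * instance.unit_price, 2)
       return instance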


class ShippingInfoFactory(DataclassFactory[ShippingInfo]):
   @classmethod
   def carrier(cls) -> str:
       carriers = ["FedEx", "UPS", "DHL", "USPS"]
       return cls.__random__.choice(carriers)


   @classmethod
   def tracking_number(cls) -> str:
       return ''.join(cls.__random__.choices('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=12))


class OrderFactory(DataclassFactory[Order]):
   @classmethod
   def order_id(cls) -> str:
       return f"ORD-{datetime.now().year}-{cls.__random__.randint(100000, 999999)}"


   @classmethod
   def items(cls) -> List[OrderItem]:
       return OrderItemFactory.batch(cls.__random__.randint(1, 5))


   @classmethod
   def build(cls, **kwargs):
       instance = super().build(**kwargs)
       # Derive dependent fields unless the caller overrode them explicitly;
       # Optional fields may otherwise hold unrelated random values.
       if "total_amount" not in kwargs:
           instance.total_amount = round(sum(item.total_price for item in instance.items), 2)
       if "shipping_info" not in kwargs:
           shipped = instance.status in (OrderStatus.SHIPPED, OrderStatus.DELIVERED)
           instance.shipping_info = ShippingInfoFactory.build() if shipped else None
       return instance


orders = OrderFactory.batch(2)
print("Generated Orders:")
for order in orders:
   print(f"\n  Order {order.order_id}")
   print(f"    Customer: {order.customer_name} ({order.customer_email})")
   print(f"    Status: {order.status.value}")
   print(f"    Items ({len(order.items)}):")
   for item in order.items:
       print(f"      - {item.quantity}x {item.product_name} @ ${item.unit_price:.2f} = ${item.total_price:.2f}")
   print(f"    Total: ${order.total_amount:.2f}")
   if order.shipping_info:
       print(f"    Shipping: {order.shipping_info.carrier} - {order.shipping_info.tracking_number}")
print("\n")

We build more complex domain logic by introducing calculated and dependent fields within factories. We show how we can derive values such as final prices, totals, and shipping details after object creation. This allows us to model realistic business rules directly inside our test data generators.

print("=" * 80)
print("SECTION 7: Attrs Integration")
print("=" * 80)


import attrs
from polyfactory.factories.attrs_factory import AttrsFactory


@attrs.define
class BlogPost:
   title: str
   author: str
   content: str
   views: int = 0
   likes: int = 0
   published: bool = False
   published_at: Optional[datetime] = None
   tags: List[str] = attrs.field(factory=list)


class BlogPostFactory(AttrsFactory[BlogPost]):
   @classmethod
   def title(cls) -> str:
       templates = [
           "10 Tips for {}",
           "Understanding {}",
           "The Complete Guide to {}",
           "Why {} Matters",
           "Getting Started with {}"
       ]
       topics = ["Python", "Data Science", "Machine Learning", "Web Development", "DevOps"]
       template = cls.__random__.choice(templates)
       topic = cls.__random__.choice(topics)
       return template.format(topic)


   @classmethod
   def content(cls) -> str:
       return " ".join(cls.__faker__.sentences(nb=cls.__random__.randint(3, 8)))


   @classmethod
   def views(cls) -> int:
       return cls.__random__.randint(0, 10000)


   @classmethod
   def likes(cls) -> int:
       return cls.__random__.randint(0, 1000)


   @classmethod
   def tags(cls) -> List[str]:
       all_tags = ["python", "tutorial", "beginner", "advanced", "guide",
                  "tips", "best-practices", "2024"]
       return cls.__random__.sample(all_tags, k=cls.__random__.randint(2, 5))


posts = BlogPostFactory.batch(3)
print("Generated Blog Posts:")
for post in posts:
   print(f"\n  '{post.title}'")
   print(f"    Author: {post.author}")
   print(f"    Views: {post.views:,} | Likes: {post.likes:,}")
   print(f"    Published: {post.published}")
   print(f"    Tags: {', '.join(post.tags)}")
   print(f"    Preview: {post.content[:100]}...")
print("\n")


print("=" * 80)
print("SECTION 8: Building with Specific Overrides")
print("=" * 80)


custom_person = PersonFactory.build(
   name="Alice Johnson",
   age=30,
   email="alice@example.com"
)
print(f"Custom Person:")
print(f"  Name: {custom_person.name}")
print(f"  Age: {custom_person.age}")
print(f"  Email: {custom_person.email}")
print(f"  ID (auto-generated): {custom_person.id}")
print()


vip_customers = PersonFactory.batch(
   3,
   bio="VIP Customer"
)
print("VIP Customers:")
for customer in vip_customers:
   print(f"  {customer.name}: {customer.bio}")
print("\n")

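We extend Polyfactory usage to attrs-based classes and then build instances with explicit overrides. We show that AttrsFactory accepts the same classmethod generators as the dataclass factories, and that keyword arguments passed to build() or batch() pin specific fields while every other field stays random. It ensures our mock data remains compatible with real application schemas while still letting us fix the values a given test cares about.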

print("=" * 80)
print("SECTION 9: Field-Level Control with Use and Ignore")
print("=" * 80)


from polyfactory.fields import Use, Ignore


@dataclass
class Configuration:
   app_name: str
   version: str
   debug: bool
   created_at: datetime
   api_key: str
   secret_key: str


class ConfigFactory(DataclassFactory[Configuration]):
   app_name = Use(lambda: "MyAwesomeApp")
   version = Use(lambda: "1.0.0")
   debug = Use(lambda: False)


   @classmethod
   def api_key(cls) -> str:
       return f"api_key_{''.join(cls.__random__.choices('0123456789abcdef', k=32))}"


   @classmethod
   def secret_key(cls) -> str:
       return f"secret_{''.join(cls.__random__.choices('0123456789abcdef', k=64))}"


configs = ConfigFactory.batch(2)
print("Generated Configurations:")
for config in configs:
   print(f"  App: {config.app_name} v{config.version}")
   print(f"    Debug: {config.debug}")
   print(f"    API Key: {config.api_key[:20]}...")
   print(f"    Created: {config.created_at}")
   print()
print()


print("=" * 80)
print("SECTION 10: Model Coverage Testing")
print("=" * 80)


from pydantic import BaseModel, ConfigDict
from polyfactory.factories.pydantic_factory import ModelFactory
from typing import Union


class PaymentMethod(BaseModel):
   model_config = ConfigDict(use_enum_values=True)
   type: str
   card_number: Optional[str] = None
   bank_name: Optional[str] = None
   verified: bool = False


class PaymentMethodFactory(ModelFactory[PaymentMethod]):
   __model__ = PaymentMethod


payment_methods = [
   PaymentMethodFactory.build(type="card", card_number="4111111111111111"),
   PaymentMethodFactory.build(type="bank", bank_name="Chase Bank"),
   PaymentMethodFactory.build(verified=True),
]


print("Payment Method Coverage:")
for i, pm in enumerate(payment_methods, 1):
   print(f"  {i}. Type: {pm.type}")
   if pm.card_number:
       print(f"     Card: {pm.card_number}")
   if pm.bank_name:
       print(f"     Bank: {pm.bank_name}")
   print(f"     Verified: {pm.verified}")
print("\n")


print("=" * 80)
print("TUTORIAL SUMMARY")
print("=" * 80)
print("""
This tutorial covered:


1. ✓ Basic Dataclass Factories - Simple mock data generation
2. ✓ Custom Field Generators - Controlling individual field values
3. ✓ Field Constraints - Overriding build() for calculated fields
4. ✓ Pydantic Integration - Working with validated models
5. ✓ Complex Nested Structures - Building related objects
6. ✓ Attrs Support - Alternative to dataclasses
7. ✓ Build Overrides - Customizing specific instances
8. ✓ Use and Ignore - Explicit field control
9. ✓ Coverage Testing - Ensuring comprehensive test data


Key Takeaways:
- Polyfactory automatically generates mock data from type hints
- Customize generation with classmethods and Use
- Supports multiple libraries: dataclasses, Pydantic, attrs, msgspec
- Override build() for calculated/dependent fields
- Override specific values while keeping others random
- Perfect for testing, development, and prototyping


For more information:
- Documentation: https://polyfactory.litestar.dev/
- GitHub: https://github.com/litestar-org/polyfactory
""")
print("=" * 80)

We cover advanced usage patterns such as explicit overrides, constant field values, and coverage testing scenarios. We show how we can intentionally construct edge cases and variant instances for robust testing. This final step ties everything together by demonstrating how Polyfactory supports comprehensive and production-grade test data strategies.

In conclusion, we demonstrated how Polyfactory enables us to create comprehensive, flexible test data with minimal boilerplate while still retaining fine-grained control over every field. We showed how to handle simple entities, complex nested structures, and Pydantic model validation, as well as explicit field overrides, within a single, consistent factory-based approach. Overall, we found that Polyfactory enables us to move faster and test more confidently, as it reliably generates realistic datasets that closely mirror production-like scenarios without sacrificing clarity or maintainability.

