This post is a step-by-step deep dive into Pydantic, with clear examples and industry use cases, structured so that each section builds on the previous one.


1. What is Pydantic?

Pydantic is a data validation and settings management library for Python, primarily using Python type annotations.

  • It ensures that your data is correct, consistent, and type-safe.
  • Under the hood, it parses input data, converts types if possible, and raises detailed errors if the data is invalid.
  • Used heavily in FastAPI, data pipelines, configuration management, and APIs.

Think of Pydantic as a guard that sits between your raw data (JSON, dicts, environment variables) and your Python code, making sure the data is structured correctly.


2. Key Features of Pydantic

Feature | Explanation
Type validation | Ensures data types match what you declare.
Automatic type conversion | Converts compatible types (e.g., string "123" → int 123).
Nested models | Supports nested structures (dicts inside dicts).
Data parsing | Converts JSON or dicts to Python objects.
Error reporting | Gives detailed, structured error messages.
Immutable models | Can make data read-only (frozen=True).
Integration with FastAPI | Powers request/response validation automatically.
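
Two of the features in the table, data parsing and immutable models, can be sketched briefly. This is a minimal illustration assuming Pydantic v2; the Point model is a made-up example, not part of any real API.

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class Point(BaseModel):  # hypothetical model used only for illustration
    # frozen=True makes instances read-only after creation
    model_config = ConfigDict(frozen=True)

    x: int
    y: int

# Data parsing: build a model instance directly from a JSON string
point = Point.model_validate_json('{"x": 1, "y": 2}')
print(point)  # x=1 y=2

# Immutable models: assigning to a field of a frozen model is rejected
try:
    point.x = 10
except ValidationError as exc:
    frozen_error_type = exc.errors()[0]["type"]
    print(frozen_error_type)  # frozen_instance
```

A frozen model is handy for configuration objects or messages that should not change after validation.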

3. Installing Pydantic

pip install pydantic

The examples below target Pydantic v2, the current major version. Its API differs from v1 in several places (see section 12).


4. Basic Example

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str
    is_active: bool = True   # default value

# Input data (possibly from API)
input_data = {
    "id": "101",  # string instead of int
    "name": "Rajeev",
    "email": "rajeev@example.com"
}

user = User(**input_data)
print(user)
print(user.id, type(user.id))  # automatically converted to int

Output:

id=101 name='Rajeev' email='rajeev@example.com' is_active=True
101 <class 'int'>

✅ Notice: Pydantic automatically converted "101" (str) to 101 (int).


5. Type Validation

When a value cannot be converted to the declared type, Pydantic raises a ValidationError:

from pydantic import ValidationError

try:
    user = User(id="abc", name="Rajeev", email="rajeev@example.com")
except ValidationError as e:
    print(e.json())

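Output (formatted for readability; v2 also adds a url field, and the exact wording can vary between releases):

[
  {
    "type": "int_parsing",
    "loc": ["id"],
    "msg": "Input should be a valid integer, unable to parse string as an integer",
    "input": "abc"
  }
]

🔹 Pydantic reports the exact location of the error along with a message.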
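
By default Pydantic validates in a lenient ("lax") mode, which is why "101" was accepted for an int field earlier. The sketch below, assuming Pydantic v2, contrasts that with strict mode, where no conversion is attempted. The Account model is a hypothetical example.

```python
from pydantic import BaseModel, ValidationError

class Account(BaseModel):  # hypothetical model used only for illustration
    id: int
    is_active: bool = True

# Default (lax) mode: the numeric string is converted to int
lax = Account.model_validate({"id": "101"})
print(lax.id)  # 101

# Strict mode: the same input is rejected because "101" is a str, not an int
try:
    Account.model_validate({"id": "101"}, strict=True)
except ValidationError as exc:
    strict_errors = exc.errors()
    for err in strict_errors:
        print(err["loc"], err["type"])  # ('id',) int_type
```

exc.errors() returns the same information as exc.json(), but as a list of Python dicts, which is easier to inspect in code.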


6. Nested Models

You can define models inside models:

class Address(BaseModel):
    street: str
    city: str
    zipcode: str

class User(BaseModel):
    id: int
    name: str
    address: Address

data = {
    "id": 1,
    "name": "Rajeev",
    "address": {"street": "MG Road", "city": "Bangalore", "zipcode": "560001"}
}

user = User(**data)
print(user.address.city)  # Bangalore

✅ Useful for APIs with complex JSON payloads.
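
When a nested payload is invalid, the error location includes the path through the nested model. A minimal sketch, assuming Pydantic v2 and a hypothetical Customer model:

```python
from pydantic import BaseModel, ValidationError

class Address(BaseModel):
    street: str
    city: str
    zipcode: str

class Customer(BaseModel):  # hypothetical model used only for illustration
    id: int
    address: Address

# The nested address is missing its zipcode
bad_payload = {"id": 1, "address": {"street": "MG Road", "city": "Bangalore"}}

try:
    Customer(**bad_payload)
except ValidationError as exc:
    first = exc.errors()[0]
    print(first["loc"], first["type"])  # ('address', 'zipcode') missing
```

The loc tuple makes it possible to point an API client at the exact field that failed, even several levels deep.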


7. List, Dict, and Optional Types

from typing import List, Optional
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    tags: List[str] = []              # default empty list
    nickname: Optional[str] = None    # optional field; v2 requires an explicit default for it to be omittable

data = {"id": 1, "name": "Rajeev", "tags": ["python", "AI"]}
user = User(**data)
print(user)
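
The section title also mentions Dict, so here is a small sketch covering all three kinds of field, assuming Pydantic v2. The Profile model and its field names are made up for illustration.

```python
from typing import Dict, List, Optional
from pydantic import BaseModel

class Profile(BaseModel):  # hypothetical model used only for illustration
    name: str
    tags: List[str] = []            # Pydantic copies this default for each instance
    nickname: Optional[str] = None  # may be omitted or set to None
    scores: Dict[str, int] = {}     # dict values are validated as int

a = Profile(name="A")
b = Profile(name="B", scores={"math": "90"})
a.tags.append("python")

print(a.tags)      # ['python']
print(b.tags)      # []
print(b.scores)    # {'math': 90}
print(a.nickname)  # None
```

Unlike a plain Python class attribute or function default, the empty list is not shared between instances, so appending to a.tags leaves b.tags untouched.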

8. Validators (Custom Rules)

Pydantic allows custom validation functions:

from pydantic import BaseModel, field_validator

class User(BaseModel):
    id: int
    name: str
    email: str

    # v2 uses @field_validator (the v1 @validator decorator is deprecated)
    @field_validator('email')
    @classmethod
    def email_must_contain_at(cls, v: str) -> str:
        if '@' not in v:
            raise ValueError('Invalid email')
        return v

User(id=1, name='Rajeev', email='rajeev@example.com')  # ✅ OK
User(id=1, name='Rajeev', email='invalid_email')        # ❌ Raises ValidationError
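
Validators can also transform values, not just reject them. The sketch below, assuming Pydantic v2, shows a normal validator that cleans a string and a mode="before" validator that reshapes raw input before type checking. The Article model is a hypothetical example.

```python
from typing import List
from pydantic import BaseModel, field_validator

class Article(BaseModel):  # hypothetical model used only for illustration
    title: str
    tags: List[str]

    @field_validator("title")
    @classmethod
    def strip_title(cls, v: str) -> str:
        # Runs after type validation, so v is already a str
        return v.strip()

    @field_validator("tags", mode="before")
    @classmethod
    def split_tags(cls, v):
        # Runs before type validation: accept "a, b" as well as ["a", "b"]
        if isinstance(v, str):
            return [part.strip() for part in v.split(",") if part.strip()]
        return v

article = Article(title="  Pydantic tips ", tags="python, validation")
print(article.title)  # Pydantic tips
print(article.tags)   # ['python', 'validation']
```

Whatever a validator returns becomes the stored field value, so it should always return something, even when it only checks.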

9. Environment & Settings Management

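BaseSettings reads configuration from environment variables. In Pydantic v2 it lives in the separate pydantic-settings package (pip install pydantic-settings):

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # also read values from a .env file
    model_config = SettingsConfigDict(env_file=".env")

    app_name: str
    admin_email: str
    debug: bool = False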

settings = Settings()
print(settings.app_name)

Industry use case: 12-factor apps, where config is in environment variables.


10. Industry Use Cases

Use Case | Example
API request validation | FastAPI uses Pydantic models to parse JSON requests and validate them automatically.
Data pipelines | ETL scripts validate incoming CSV/JSON data before processing.
Configuration management | Load .env or JSON configs safely into structured Python objects.
Microservices | Ensure strict data contracts between services.
ML/AI projects | Validate feature inputs before feeding into models to avoid runtime errors.

11. FastAPI Example (Industry Standard)

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class User(BaseModel):
    id: int
    name: str
    email: str

@app.post("/users/")
async def create_user(user: User):
    return {"message": f"User {user.name} created", "user": user.model_dump()}

✅ Pydantic automatically validates the incoming POST request JSON.


12. Pydantic V2 Differences (if upgrading)

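  • field_validator replaces @validator for field-level validation.
  • model_validator replaces @root_validator for whole-model validation.
  • model_dump() and model_dump_json() replace dict() and json().
  • BaseSettings moved to the separate pydantic-settings package.
  • The validation core was rewritten in Rust, so v2 is considerably faster.
  • Some config syntax has changed (model_config instead of an inner Config class).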
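
As a sketch of whole-model validation in v2, the example below checks a rule that spans two fields. It assumes Pydantic v2; the DateRange model and its field names are hypothetical.

```python
from pydantic import BaseModel, ValidationError, model_validator

class DateRange(BaseModel):  # hypothetical model used only for illustration
    start_day: int
    end_day: int

    @model_validator(mode="after")
    def check_order(self):
        # Runs after all fields are validated, so both values are ints here
        if self.end_day < self.start_day:
            raise ValueError("end_day must not be before start_day")
        return self

valid_range = DateRange(start_day=1, end_day=5)
print(valid_range.model_dump())  # {'start_day': 1, 'end_day': 5}

try:
    DateRange(start_day=5, end_day=1)
except ValidationError as exc:
    range_error = exc.errors()[0]
    print(range_error["msg"])  # Value error, end_day must not be before start_day
```

A mode="after" validator receives the constructed model instance and must return it.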

13. Summary – Why Pydantic is Powerful

  1. Safe and validated data at runtime.
  2. Automatic type conversion.
  3. Readable errors – helpful for debugging and API clients.
  4. Deeply nested models supported.
  5. Integrates seamlessly with FastAPI, data pipelines, configs.
  6. Custom validation logic is easy to implement.

Recommended Next Steps

  1. Practice with nested models and lists/dicts.
  2. Try FastAPI integration with Pydantic models.
  3. Explore BaseSettings for env-based configs.
  4. Implement custom validators for real-world rules.

The rest of this post is a hands-on, real-world Pydantic + FastAPI tutorial that walks through all the recommended next steps, one at a time.

It covers:

  • ✅ Nested & List/Dict Models
  • ✅ FastAPI Integration
  • ✅ BaseSettings for config
  • ✅ Custom Validators for real-world data rules
  • ✅ A “cheat sheet” summary at the end

🧠 Pydantic Advanced Tutorial (with FastAPI Integration)


1️⃣ Setup

Install everything first:

pip install fastapi pydantic pydantic-settings uvicorn

2️⃣ Step 1: Practice with Nested Models and Lists/Dicts

Let’s simulate a real-world use case — a company API managing employees.

# file: models.py
from pydantic import BaseModel
from typing import List, Dict

class Skill(BaseModel):
    name: str
    level: str  # Beginner, Intermediate, Expert

class Address(BaseModel):
    street: str
    city: str
    zipcode: str

class Employee(BaseModel):
    id: int
    name: str
    department: str
    address: Address
    skills: List[Skill]     # list of Skill objects
    metadata: Dict[str, str] = {}  # dynamic info

# ✅ Example data (runs only when this file is executed directly,
# not when another module such as main.py imports it)
if __name__ == "__main__":
    data = {
        "id": 101,
        "name": "Rajeev Lochan",
        "department": "Data Engineering",
        "address": {"street": "MG Road", "city": "Bangalore", "zipcode": "560001"},
        "skills": [
            {"name": "PySpark", "level": "Expert"},
            {"name": "FastAPI", "level": "Intermediate"},
        ],
        "metadata": {"project": "AI Platform", "role": "Lead Engineer"}
    }

    employee = Employee(**data)
    print(employee)
    print(employee.skills[0].name)

🔹 What you learn here:

  • How to nest models (Address, Skill inside Employee).
  • How to use List and Dict fields.
  • How Pydantic auto-parses dicts into objects.
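
The reverse direction, turning nested models back into dicts or JSON, works the same way. A minimal round-trip sketch, assuming Pydantic v2 and a hypothetical Engineer model:

```python
from typing import List
from pydantic import BaseModel

class Skill(BaseModel):
    name: str
    level: str

class Engineer(BaseModel):  # hypothetical model used only for illustration
    name: str
    skills: List[Skill]

eng = Engineer(name="Rajeev", skills=[{"name": "PySpark", "level": "Expert"}])

as_dict = eng.model_dump()       # nested models become plain dicts
as_json = eng.model_dump_json()  # JSON string
restored = Engineer.model_validate_json(as_json)

print(as_dict["skills"][0])  # {'name': 'PySpark', 'level': 'Expert'}
print(restored == eng)       # True
```

Models with the same type and field values compare equal, which makes round-trip checks like this convenient in tests.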

3️⃣ Step 2: Integrate with FastAPI

Now, use these models in a real API.

# file: main.py
from fastapi import FastAPI
from models import Employee

app = FastAPI()

@app.post("/employees/")
async def create_employee(employee: Employee):
    return {
        "message": f"Employee {employee.name} added successfully!",
        "skills": [s.name for s in employee.skills],
        "city": employee.address.city,
    }

Run the server:

uvicorn main:app --reload

Then visit:
👉 http://127.0.0.1:8000/docs

Try sending this JSON in Swagger UI:

{
  "id": 101,
  "name": "Rajeev",
  "department": "AI",
  "address": {"street": "MG Road", "city": "Bangalore", "zipcode": "560001"},
  "skills": [
    {"name": "Python", "level": "Expert"},
    {"name": "FastAPI", "level": "Intermediate"}
  ],
  "metadata": {"project": "Chatbot"}
}

✅ FastAPI auto-validates everything using Pydantic before hitting your function.


4️⃣ Step 3: Using BaseSettings for Config Management

You can manage API configs using environment variables (common in production apps).

Create .env file:

APP_NAME=EmployeeAPI
ADMIN_EMAIL=admin@company.com
DEBUG=True

Now load it using BaseSettings:

# file: config.py
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # v2 style: model_config replaces the inner Config class
    model_config = SettingsConfigDict(env_file=".env")

    app_name: str
    admin_email: str
    debug: bool = False

settings = Settings()

print(settings.app_name)
print(settings.admin_email)

✅ Benefits:

  • No hardcoded secrets/configs in code.
  • Ideal for 12-factor app deployments (e.g., AWS, Docker, Kubernetes).

5️⃣ Step 4: Custom Validators for Real-World Rules

We’ll add validation logic for our Employee model.

Example rules:

  • Email must contain “@”.
  • Skill level must be one of [Beginner, Intermediate, Expert].
  • Zipcode must be numeric.

# file: validators_demo.py
from pydantic import BaseModel, field_validator, ValidationError
from typing import List

class Skill(BaseModel):
    name: str
    level: str

    @field_validator('level')
    @classmethod
    def validate_level(cls, v: str) -> str:
        valid_levels = ["Beginner", "Intermediate", "Expert"]
        if v not in valid_levels:
            raise ValueError(f"Invalid level: {v}. Must be one of {valid_levels}")
        return v

class Employee(BaseModel):
    name: str
    email: str
    zipcode: str
    skills: List[Skill]

    @field_validator('email')
    @classmethod
    def email_must_contain_at(cls, v: str) -> str:
        if '@' not in v:
            raise ValueError("Invalid email format")
        return v

    @field_validator('zipcode')
    @classmethod
    def check_zip(cls, v: str) -> str:
        if not v.isdigit():
            raise ValueError("Zipcode must be numeric")
        return v

# ✅ Test
try:
    employee = Employee(
        name="Rajeev",
        email="rajeev@example.com",
        zipcode="560001",
        skills=[{"name": "FastAPI", "level": "Expert"}]
    )
    print("✅ Valid employee:", employee)
except ValidationError as err:
    # all failing fields are reported together
    print(err.json())

Try changing:

email="invalid_email"
zipcode="56A001"

and see the error details Pydantic gives — it’s clear and structured.
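
Simple rules like these can often be declared on the field itself instead of written as validator functions. The following is an alternative sketch, assuming Pydantic v2; SkillV2, Contact, and the regular expressions are illustrative choices, and the email pattern is deliberately loose rather than a full email check.

```python
from typing import Literal
from pydantic import BaseModel, Field, ValidationError

class SkillV2(BaseModel):  # hypothetical model used only for illustration
    name: str
    level: Literal["Beginner", "Intermediate", "Expert"]  # only these values pass

class Contact(BaseModel):  # hypothetical model used only for illustration
    email: str = Field(pattern=r"^[^@\s]+@[^@\s]+$")  # something@something
    zipcode: str = Field(pattern=r"^\d+$")            # digits only

# Both fields are invalid, and both errors are reported in one exception
try:
    Contact(email="invalid_email", zipcode="56A001")
except ValidationError as exc:
    failed = [(err["loc"], err["type"]) for err in exc.errors()]
    print(failed)
    # [(('email',), 'string_pattern_mismatch'), (('zipcode',), 'string_pattern_mismatch')]

try:
    SkillV2(name="FastAPI", level="Guru")
except ValidationError as exc:
    level_error = exc.errors()[0]["type"]
    print(level_error)  # literal_error
```

Declarative constraints also show up in the generated JSON schema, so FastAPI's docs page can display them; custom validator functions remain the better fit for rules that need real logic.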


6️⃣ Bonus: Combining All Together (Mini Project)

Now let’s combine everything in a small FastAPI mini app that uses:

  • Nested Models
  • Validation
  • Config

main.py

from fastapi import FastAPI
from pydantic import BaseModel, field_validator
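from pydantic_settings import BaseSettings, SettingsConfigDict
from typing import List

# ---------------- Config ----------------
class Settings(BaseSettings):
    # v2 style: model_config replaces the inner Config class
    model_config = SettingsConfigDict(env_file=".env")

    app_name: str = "EmployeeAPI"
    admin_email: str   # required: must come from the environment or .env
    debug: bool = True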

settings = Settings()

# ---------------- Models ----------------
class Skill(BaseModel):
    name: str
    level: str
    @field_validator('level')
    @classmethod
    def valid_level(cls, v):
        if v not in ["Beginner", "Intermediate", "Expert"]:
            raise ValueError("Invalid skill level")
        return v

class Address(BaseModel):
    street: str
    city: str
    zipcode: str

class Employee(BaseModel):
    id: int
    name: str
    email: str
    address: Address
    skills: List[Skill]

    @field_validator('email')
    @classmethod
    def valid_email(cls, v):
        if '@' not in v:
            raise ValueError("Invalid email format")
        return v

# ---------------- API ----------------
app = FastAPI(title=settings.app_name)

@app.post("/employees/")
async def add_employee(employee: Employee):
    return {
        "message": f"Employee {employee.name} added successfully!",
        "city": employee.address.city,
        "skills": [s.name for s in employee.skills],
        "contact_admin": settings.admin_email,
    }

Then run:

uvicorn main:app --reload

Visit: http://127.0.0.1:8000/docs

✅ You now have a production-style FastAPI app powered by Pydantic models + validation + config.


7️⃣ Cheat Sheet Summary

Concept | Example | Purpose
BaseModel | class User(BaseModel) | Defines structured models
Nested models | address: Address | Hierarchical data
Lists & Dicts | skills: List[Skill] | Arrays & key-value pairs
Optional fields | Optional[str] = None | Non-mandatory fields
Validators | @field_validator | Custom rules
BaseSettings | class Settings(BaseSettings) | Config via .env
model_dump() method | user.model_dump() | Convert model → dict
JSON serialization | user.model_dump_json() | Model → JSON
Integration | FastAPI | Auto validation
Error handling | ValidationError | Catch & display errors

Industry Use Case Summary

Use Case | Pydantic Role
ETL Frameworks | Validate and type-check dynamic metadata JSON, configs, and parameters before Spark job execution
FastAPI Backends | Validate incoming API payloads automatically
AI/ML Inference APIs | Ensure feature inputs are correctly typed
Config Management | Load environment variables into typed settings objects
Data Quality Layers | Detect invalid records early in pipelines
