This post takes a step-by-step look at Pydantic, with clear examples and industry use cases, organized into short numbered sections.
1. What is Pydantic?
Pydantic is a data validation library for Python that is driven by type annotations; in v2, settings management is provided by the companion pydantic-settings package.
- It checks at runtime that your data matches the types and constraints you declare.
- Under the hood, it parses input data, converts types where possible, and raises detailed errors if the data is invalid.
- It is used heavily in FastAPI, data pipelines, configuration management, and APIs.
Think of Pydantic as a guard that sits between your raw data (JSON, dicts, environment variables) and your Python code, making sure the data is structured correctly.
2. Key Features of Pydantic
| Feature | Explanation |
|---|---|
| Type validation | Ensures data types match what you declare. |
| Automatic type conversion | Converts compatible types (e.g., string "123" → int 123). |
| Nested models | Supports nested structures (dicts inside dicts). |
| Data parsing | Converts JSON or dicts to Python objects. |
| Error reporting | Gives detailed, structured error messages. |
| Immutable models | Can make data read-only (frozen=True). |
| Integration with FastAPI | Powers request/response validation automatically. |
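The "Immutable models" row is the one feature in this table that the later sections do not demonstrate, so here is a minimal sketch of it, assuming Pydantic v2. The `Point` model is hypothetical and exists only for illustration.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Point(BaseModel):
    # frozen=True makes instances read-only after creation
    model_config = ConfigDict(frozen=True)

    x: int
    y: int


point = Point(x=1, y=2)
error_type = None

try:
    point.x = 10  # assignment on a frozen model is rejected
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]

print("x is still", point.x, "| error type:", error_type)
```

Frozen models are a reasonable fit for configuration objects or messages that should not change once validated.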
3. Installing Pydantic
pip install pydantic
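The examples use Pydantic v2, the current major version, whose API differs from v1 in several places (see section 12). The settings examples additionally need the separate pydantic-settings package (pip install pydantic-settings).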
4. Basic Example

    from pydantic import BaseModel


    class User(BaseModel):
        id: int
        name: str
        email: str
        is_active: bool = True  # default value


    # Input data (possibly from an API)
    input_data = {
        "id": "101",  # string instead of int
        "name": "Rajeev",
        "email": "rajeev@example.com",
    }

    user = User(**input_data)
    print(user)
    print(user.id, type(user.id))  # automatically converted to int

Output:

    id=101 name='Rajeev' email='rajeev@example.com' is_active=True
    101 <class 'int'>

✅ Notice: Pydantic automatically converted "101" (str) to 101 (int).
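This conversion is Pydantic's default "lax" behaviour. If silent coercion is not wanted, v2 also offers a strict mode. The sketch below assumes Pydantic v2; `StrictUser` is a hypothetical name used only here.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictUser(BaseModel):
    # strict=True turns off lax coercion such as "101" -> 101
    model_config = ConfigDict(strict=True)

    id: int
    name: str


strict_ok = StrictUser(id=101, name="Rajeev")

try:
    StrictUser(id="101", name="Rajeev")
    coerced = True
except ValidationError:
    coerced = False

print(strict_ok.id, coerced)
```

Strict mode tends to suit service-to-service contracts, while lax mode is convenient for form data, query strings, and environment variables, which always arrive as strings.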
5. Type Validation
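By default, Pydantic converts compatible values, but it raises a ValidationError when a value cannot be converted to the declared type:

    from pydantic import ValidationError

    try:
        user = User(id="abc", name="Rajeev", email="rajeev@example.com")
    except ValidationError as e:
        # include_url=False omits the link to the online error documentation
        print(e.json(indent=2, include_url=False))

Output (Pydantic v2):

    [
      {
        "type": "int_parsing",
        "loc": [
          "id"
        ],
        "msg": "Input should be a valid integer, unable to parse string as an integer",
        "input": "abc"
      }
    ]

🔹 Pydantic reports the exact location (loc) of each error, along with a message and an error type.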
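Besides `e.json()`, the same information is available as a list of dicts through `e.errors()`, which is easier to work with in code. The following is a small sketch assuming Pydantic v2; the helper `describe_errors` is a hypothetical name, not part of Pydantic.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str
    email: str


def describe_errors(payload: dict) -> list[str]:
    """Return one 'field: message' string per validation problem."""
    try:
        User(**payload)
    except ValidationError as exc:
        return [
            f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
            for err in exc.errors()
        ]
    return []


# Two problems at once: id is not an integer and email is missing
problems = describe_errors({"id": "abc", "name": "Rajeev"})
for line in problems:
    print(line)
```

Note that Pydantic collects every failing field in one pass instead of stopping at the first error.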
6. Nested Models
You can define models inside models:

    class Address(BaseModel):
        street: str
        city: str
        zipcode: str


    class User(BaseModel):
        id: int
        name: str
        address: Address


    data = {
        "id": 1,
        "name": "Rajeev",
        "address": {"street": "MG Road", "city": "Bangalore", "zipcode": "560001"},
    }

    user = User(**data)
    print(user.address.city)  # Bangalore
✅ Useful for APIs with complex JSON payloads.
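When the payload arrives as a raw JSON string instead of a dict, the model can parse it directly. This is a minimal sketch assuming Pydantic v2, where `model_validate_json` is the class method for this; the JSON string is the same sample data as above.

```python
from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    zipcode: str


class User(BaseModel):
    id: int
    name: str
    address: Address


raw_json = (
    '{"id": 1, "name": "Rajeev", '
    '"address": {"street": "MG Road", "city": "Bangalore", "zipcode": "560001"}}'
)

# Parse and validate the JSON text in one step
user = User.model_validate_json(raw_json)
print(user.address.city)
```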
7. List, Dict, and Optional Types

    from typing import List, Optional


    class User(BaseModel):
        id: int
        name: str
        tags: List[str] = []  # default empty list
        # In Pydantic v2, Optional[str] without a default is still a required field,
        # so give it a default of None to make it truly optional
        nickname: Optional[str] = None


    data = {"id": 1, "name": "Rajeev", "tags": ["python", "AI"]}
    user = User(**data)
    print(user)
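A variation on the model above, sketched under the assumption of Pydantic v2: `Field(default_factory=list)` states explicitly that every instance gets its own list, and `nickname` defaults to `None` so it can be left out. The second user name is made up for the example.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class User(BaseModel):
    id: int
    name: str
    tags: List[str] = Field(default_factory=list)  # fresh list per instance
    nickname: Optional[str] = None  # may be omitted or set to None


first = User(id=1, name="Rajeev")
second = User(id=2, name="Asha")

# Mutating one instance's list does not affect the other
first.tags.append("python")
print(first.tags, second.tags, first.nickname)
```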
8. Validators (Custom Rules)
Pydantic allows custom validation functions. In v2 they are declared with field_validator (the v1 @validator decorator is deprecated):

    from pydantic import BaseModel, field_validator


    class User(BaseModel):
        id: int
        name: str
        email: str

        @field_validator('email')
        @classmethod
        def email_must_contain_at(cls, v):
            if '@' not in v:
                raise ValueError('Invalid email')
            return v


    User(id=1, name='Rajeev', email='rajeev@example.com')  # ✅ OK
    User(id=1, name='Rajeev', email='invalid_email')  # ❌ Raises ValidationError
9. Environment & Settings Management
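In Pydantic v2, BaseSettings lives in the separate pydantic-settings package (pip install pydantic-settings) and reads configuration from environment variables:

    from pydantic_settings import BaseSettings, SettingsConfigDict


    class Settings(BaseSettings):
        # read values from a .env file as well as the process environment
        model_config = SettingsConfigDict(env_file=".env")

        app_name: str
        admin_email: str
        debug: bool = False


    settings = Settings()
    print(settings.app_name)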
Industry use case: 12-factor apps, where config is in environment variables.
10. Industry Use Cases
| Use Case | Example |
|---|---|
| API request validation | FastAPI uses Pydantic models to parse JSON requests and validate them automatically. |
| Data pipelines | ETL scripts validate incoming CSV/JSON data before processing. |
| Configuration management | Load .env or JSON configs safely into structured Python objects. |
| Microservices | Ensure strict data contracts between services. |
| ML/AI projects | Validate feature inputs before feeding into models to avoid runtime errors. |
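To make the data-pipeline row concrete, here is a rough sketch of validating incoming rows and setting aside the ones that fail. It assumes Pydantic v2; the `Record` shape and the `split_records` helper are hypothetical and not taken from any particular ETL framework.

```python
from pydantic import BaseModel, ValidationError


class Record(BaseModel):
    # Hypothetical record shape for illustration
    id: int
    amount: float


def split_records(rows: list[dict]) -> tuple[list[Record], list[dict]]:
    """Separate rows that validate from rows that do not."""
    valid, rejected = [], []
    for row in rows:
        try:
            valid.append(Record.model_validate(row))
        except ValidationError as exc:
            # Keep the original row and the reasons it was rejected
            rejected.append({"row": row, "errors": exc.errors()})
    return valid, rejected


rows = [
    {"id": "1", "amount": "19.99"},  # strings, as read from a CSV file
    {"id": "x", "amount": "5"},      # id cannot be parsed as an integer
    {"id": 3},                       # amount is missing
]
valid, rejected = split_records(rows)
print(len(valid), "valid,", len(rejected), "rejected")
```

Keeping the rejected rows together with their error details makes it possible to write them to a quarantine table or log instead of failing the whole job.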
11. FastAPI Example (Industry Standard)

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()


    class User(BaseModel):
        id: int
        name: str
        email: str


    @app.post("/users/")
    async def create_user(user: User):
        # model_dump() is the Pydantic v2 replacement for .dict()
        return {"message": f"User {user.name} created", "user": user.model_dump()}
✅ Pydantic automatically validates the incoming POST request JSON.
12. Pydantic V2 Differences (if upgrading)
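- field_validator replaces @validator for field-level validation, and model_validator replaces @root_validator for whole-model checks.
- Validation is faster (the core is written in Rust) and more types are parsed.
- Some config syntax has changed (model_config instead of the nested Config class).
- .dict() and .json() are now .model_dump() and .model_dump_json().
- BaseSettings has moved to the separate pydantic-settings package.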
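The two v2 decorators can be seen side by side in the sketch below, which assumes Pydantic v2. `DateRange` is a hypothetical model: `field_validator` checks each field on its own, while `model_validator(mode="after")` runs once the whole model is built and can compare fields.

```python
from pydantic import BaseModel, ValidationError, field_validator, model_validator


class DateRange(BaseModel):
    # Hypothetical model used only to show both decorators
    start_day: int
    end_day: int

    @field_validator("start_day", "end_day")
    @classmethod
    def day_in_range(cls, value: int) -> int:
        if not 1 <= value <= 31:
            raise ValueError("day must be between 1 and 31")
        return value

    @model_validator(mode="after")
    def start_before_end(self):
        # Runs after all fields are validated, so both values are available
        if self.start_day > self.end_day:
            raise ValueError("start_day must not exceed end_day")
        return self


ok = DateRange(start_day=1, end_day=5)

try:
    DateRange(start_day=9, end_day=3)
    cross_field_rejected = False
except ValidationError:
    cross_field_rejected = True

print(ok, cross_field_rejected)
```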
13. Summary – Why Pydantic is Powerful
- ✅ Safe and validated data at runtime.
- ✅ Automatic type conversion.
- ✅ Readable errors – helpful for debugging and API clients.
- ✅ Deeply nested models supported.
- ✅ Integrates seamlessly with FastAPI, data pipelines, configs.
- ✅ Custom validation logic is easy to implement.
Recommended Next Steps
- Practice with nested models and lists/dicts.
- Try FastAPI integration with Pydantic models.
- Explore BaseSettings for env-based configs.
- Implement custom validators for real-world rules.
The rest of this post is a hands-on Pydantic + FastAPI tutorial that walks through each of these next steps in turn.
It covers:
- ✅ Nested & List/Dict Models
- ✅ FastAPI Integration
- ✅ BaseSettings for config
- ✅ Custom Validators for real-world data rules
- ✅ A “cheat sheet” summary at the end
🧠 Pydantic Advanced Tutorial (with FastAPI Integration)
1️⃣ Setup
Install everything first:
pip install fastapi pydantic pydantic-settings uvicorn
2️⃣ Step 1: Practice with Nested Models and Lists/Dicts
Let’s simulate a real-world use case — a company API managing employees.

    # file: models.py
    from pydantic import BaseModel
    from typing import List, Dict


    class Skill(BaseModel):
        name: str
        level: str  # Beginner, Intermediate, Expert


    class Address(BaseModel):
        street: str
        city: str
        zipcode: str


    class Employee(BaseModel):
        id: int
        name: str
        department: str
        address: Address
        skills: List[Skill]  # list of Skill objects
        metadata: Dict[str, str] = {}  # dynamic info


    # Run the demo only when executed directly, not when imported by main.py
    if __name__ == "__main__":
        # ✅ Example data
        data = {
            "id": 101,
            "name": "Rajeev Lochan",
            "department": "Data Engineering",
            "address": {"street": "MG Road", "city": "Bangalore", "zipcode": "560001"},
            "skills": [
                {"name": "PySpark", "level": "Expert"},
                {"name": "FastAPI", "level": "Intermediate"},
            ],
            "metadata": {"project": "AI Platform", "role": "Lead Engineer"},
        }
        employee = Employee(**data)
        print(employee)
        print(employee.skills[0].name)
🔹 What you learn here:
- How to nest models (Address, Skill inside Employee).
- How to use List and Dict fields.
- How Pydantic auto-parses dicts into objects.
3️⃣ Step 2: Integrate with FastAPI
Now, use these models in a real API.

    # file: main.py
    from fastapi import FastAPI
    from models import Employee

    app = FastAPI()


    @app.post("/employees/")
    async def create_employee(employee: Employee):
        return {
            "message": f"Employee {employee.name} added successfully!",
            "skills": [s.name for s in employee.skills],
            "city": employee.address.city,
        }
Run the server:
uvicorn main:app --reload
Then visit:
👉 http://127.0.0.1:8000/docs
Try sending this JSON in Swagger UI:

    {
      "id": 101,
      "name": "Rajeev",
      "department": "AI",
      "address": {"street": "MG Road", "city": "Bangalore", "zipcode": "560001"},
      "skills": [
        {"name": "Python", "level": "Expert"},
        {"name": "FastAPI", "level": "Intermediate"}
      ],
      "metadata": {"project": "Chatbot"}
    }
✅ FastAPI auto-validates everything using Pydantic before hitting your function.
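Conceptually, that validation step resembles the sketch below. This is a simplification written with Pydantic v2 alone and is not FastAPI's actual implementation; `handle_request` is a hypothetical stand-in, and the models are trimmed-down versions of the ones above. The 422 status and the "detail" key mirror what FastAPI returns for an invalid request body.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Skill(BaseModel):
    name: str
    level: str


class Employee(BaseModel):
    id: int
    name: str
    skills: List[Skill]


def handle_request(payload: dict) -> tuple[int, dict]:
    """Hypothetical stand-in for a framework: validate first, then run the handler."""
    try:
        employee = Employee.model_validate(payload)
    except ValidationError as exc:
        # The handler body is never reached for invalid input
        return 422, {"detail": exc.errors(include_url=False)}
    return 200, {"message": f"Employee {employee.name} added successfully!"}


good = {"id": 101, "name": "Rajeev", "skills": [{"name": "Python", "level": "Expert"}]}
bad = {"id": 101, "name": "Rajeev", "skills": [{"name": "Python"}]}  # level missing

print(handle_request(good)[0])
print(handle_request(bad)[0])
```

The error location for the bad payload points into the nested list (`skills`, index 0, `level`), which is how API clients can tell exactly which part of a large body was wrong.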
4️⃣ Step 3: Using BaseSettings for Config Management
You can manage API configs using environment variables (common in production apps).
Create .env file:
APP_NAME=EmployeeAPI
ADMIN_EMAIL=admin@company.com
DEBUG=True
Now load it using BaseSettings:

    # file: config.py
    from pydantic_settings import BaseSettings, SettingsConfigDict


    class Settings(BaseSettings):
        # model_config replaces the nested Config class used in Pydantic v1
        model_config = SettingsConfigDict(env_file=".env")

        app_name: str
        admin_email: str
        debug: bool = False


    settings = Settings()
    print(settings.app_name)
    print(settings.admin_email)
✅ Benefits:
- No hardcoded secrets/configs in code.
- Ideal for 12-factor app deployments (e.g., AWS, Docker, Kubernetes).
5️⃣ Step 4: Custom Validators for Real-World Rules
We’ll add validation logic for our Employee model.
Example rules:
- Email must contain “@”.
- Skill level must be one of [Beginner, Intermediate, Expert].
- Zipcode must be numeric.

    # file: validators_demo.py
    from pydantic import BaseModel, field_validator, ValidationError
    from typing import List


    class Skill(BaseModel):
        name: str
        level: str

        @field_validator('level')
        @classmethod
        def validate_level(cls, v):
            valid_levels = ["Beginner", "Intermediate", "Expert"]
            if v not in valid_levels:
                raise ValueError(f"Invalid level: {v}. Must be one of {valid_levels}")
            return v


    class Employee(BaseModel):
        name: str
        email: str
        zipcode: str
        skills: List[Skill]

        @field_validator('email')
        @classmethod
        def email_must_contain_at(cls, v):
            if '@' not in v:
                raise ValueError("Invalid email format")
            return v

        @field_validator('zipcode')
        @classmethod
        def check_zip(cls, v):
            if not v.isdigit():
                raise ValueError("Zipcode must be numeric")
            return v


    # ✅ Test
    try:
        employee = Employee(
            name="Rajeev",
            email="rajeev@example.com",
            zipcode="560001",
            skills=[{"name": "FastAPI", "level": "Expert"}],
        )
        print("✅ Valid employee:", employee)
    except ValidationError as exc:
        print(exc.json())
Try changing the inputs to:

    email="invalid_email"
    zipcode="56A001"

and look at the error details Pydantic returns; they are clear and structured.
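With both bad values supplied at once, Pydantic reports both failures in a single ValidationError. The following sketch assumes Pydantic v2; `Contact` is a hypothetical, trimmed-down version of the Employee model with the same two validators.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Contact(BaseModel):
    # Trimmed-down, hypothetical version of the Employee model
    email: str
    zipcode: str

    @field_validator("email")
    @classmethod
    def email_must_contain_at(cls, v: str) -> str:
        if "@" not in v:
            raise ValueError("Invalid email format")
        return v

    @field_validator("zipcode")
    @classmethod
    def check_zip(cls, v: str) -> str:
        if not v.isdigit():
            raise ValueError("Zipcode must be numeric")
        return v


try:
    Contact(email="invalid_email", zipcode="56A001")
    failed_fields = []
except ValidationError as exc:
    # One entry per failing field, in field-declaration order
    failed_fields = [err["loc"][0] for err in exc.errors()]

print(failed_fields)
```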
6️⃣ Bonus: Combining All Together (Mini Project)
Now let’s combine everything in a small FastAPI mini app that uses:
- Nested Models
- Validation
- Config
main.py

    from fastapi import FastAPI
    from pydantic import BaseModel, field_validator
    from pydantic_settings import BaseSettings, SettingsConfigDict
    from typing import List


    # ---------------- Config ----------------
    class Settings(BaseSettings):
        model_config = SettingsConfigDict(env_file=".env")

        app_name: str = "EmployeeAPI"
        admin_email: str  # required: must come from the environment or .env
        debug: bool = True


    settings = Settings()


    # ---------------- Models ----------------
    class Skill(BaseModel):
        name: str
        level: str

        @field_validator('level')
        @classmethod
        def valid_level(cls, v):
            if v not in ["Beginner", "Intermediate", "Expert"]:
                raise ValueError("Invalid skill level")
            return v


    class Address(BaseModel):
        street: str
        city: str
        zipcode: str


    class Employee(BaseModel):
        id: int
        name: str
        email: str
        address: Address
        skills: List[Skill]

        @field_validator('email')
        @classmethod
        def valid_email(cls, v):
            if '@' not in v:
                raise ValueError("Invalid email format")
            return v


    # ---------------- API ----------------
    app = FastAPI(title=settings.app_name)


    @app.post("/employees/")
    async def add_employee(employee: Employee):
        return {
            "message": f"Employee {employee.name} added successfully!",
            "city": employee.address.city,
            "skills": [s.name for s in employee.skills],
            "contact_admin": settings.admin_email,
        }
Then run:
uvicorn main:app --reload
Visit: http://127.0.0.1:8000/docs
✅ You now have a production-style FastAPI app powered by Pydantic models + validation + config.
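As an alternative to the hand-written skill-level validator, a fixed set of allowed strings can also be expressed in the type itself with `typing.Literal`. This is a sketch of that option under Pydantic v2, not a change to the app above; the value "Guru" is just an example of an invalid level.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Skill(BaseModel):
    name: str
    # Only these three strings pass validation
    level: Literal["Beginner", "Intermediate", "Expert"]


skill = Skill(name="FastAPI", level="Expert")

try:
    Skill(name="FastAPI", level="Guru")
    rejected = False
except ValidationError:
    rejected = True

print(skill.level, rejected)
```

One practical difference is that a Literal shows up as an enumerated value in the generated JSON schema, so the allowed values become visible in the FastAPI docs page, whereas a custom validator function does not.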
7️⃣ Cheat Sheet Summary
| Concept | Example | Purpose |
|---|---|---|
| BaseModel | class User(BaseModel) | Defines structured models |
| Nested models | address: Address | Hierarchical data |
| Lists & Dicts | skills: List[Skill] | Arrays & key-value pairs |
| Optional fields | Optional[str] | Non-mandatory fields |
| Validators | @field_validator | Custom rules |
| BaseSettings | class Settings(BaseSettings) | Config via .env |
| model_dump() method | user.model_dump() | Convert model → dict |
| JSON serialization | user.model_dump_json() | Model → JSON |
| Integration | FastAPI | Auto validation |
| Error handling | ValidationError | Catch & display errors |
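The serialization rows of the cheat sheet can be tried out with the short round trip below, assuming Pydantic v2 method names. The `User` model here is a small stand-in for the ones used earlier.

```python
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str
    is_active: bool = True


user = User(id=1, name="Rajeev")

as_dict = user.model_dump()  # model -> dict
as_json = user.model_dump_json()  # model -> JSON string
restored = User.model_validate_json(as_json)  # JSON string -> model

print(as_dict)
print(restored == user)
```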
✅ Industry Use Case Summary
| Use Case | Pydantic Role |
|---|---|
| ETL Frameworks | Validate and type-check dynamic metadata JSON, configs, and parameters before Spark job execution |
| FastAPI Backends | Validate incoming API payloads automatically |
| AI/ML Inference APIs | Ensure feature inputs are correctly typed |
| Config Management | Load environment variables securely |
| Data Quality Layers | Detect invalid records early in pipelines |