Excellent.
This is PHASE 9 — Files, JSON, CSV & Serialization, taught at production-grade + senior interview depth.
This phase is where Python meets the real world: filesystems, data exchange, APIs, pipelines, and persistence.
After this, you’ll stop writing “toy file code” and start writing safe, scalable, failure-aware IO.
📂 PHASE 9 — FILES, JSON, CSV & SERIALIZATION (DEEP)
9.1 FILE I/O IS A RESOURCE PROBLEM (NOT JUST READ/WRITE)
A file is:
- An OS-managed resource
- Limited in number (file descriptors are capped per process)
- Only safe if closed deterministically
f = open("data.txt")
data = f.read()  # if this raises, close() below never runs
f.close()
❌ Bug-prone
❌ Leaks file descriptors on any exception between open() and close()
9.2 CONTEXT MANAGER IS NON-NEGOTIABLE
with open("data.txt", "r", encoding="utf-8") as f:
data = f.read()
✔ Auto-close
✔ Exception-safe
✔ Production-grade
📌 Interview line
Files must always be managed via context managers.
9.3 FILE MODES (YOU MUST KNOW THESE)
| Mode | Meaning |
|---|---|
| r | Read |
| w | Write (truncate) |
| a | Append |
| x | Create (fail if exists) |
| b | Binary |
| t | Text (default) |
| + | Read & write |
Examples:
open("x.txt", "rb")
open("x.txt", "a+")
9.4 TEXT VS BINARY FILES (CRITICAL)
Text mode
- Returns str
- Applies encoding/decoding
- Newline translation
Binary mode
- Returns bytes
- No encoding
- Exact byte control
open("img.png", "rb")
📌 Interview rule
Text for human-readable data, binary for everything else.
9.5 ENCODING — SOURCE OF MOST PRODUCTION BUGS
open("data.txt") # ❌ OS-dependent encoding
✅ Always specify:
open("data.txt", encoding="utf-8")
9.6 READING STRATEGIES (PERFORMANCE)
❌ Bad for large files
data = f.read()
✅ Stream safely
for line in f:
    process(line)
Or:
f.read(1024) # chunked
📌 Interview insight
Streaming avoids memory blowups.
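A minimal chunked-read loop; process() and the file name are placeholders:
with open("big.bin", "rb") as f:
    while chunk := f.read(64 * 1024):  # 64 KiB at a time; empty bytes ends the loop
        process(chunk)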
9.7 FILE POINTER & SEEKING
f.tell()
f.seek(0)
Used in:
- Parsers
- Resume reads
- Binary formats
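A small illustrative sketch (the 8-byte header is an assumption):
with open("data.bin", "rb") as f:
    header = f.read(8)     # consume a fixed-size header
    body_start = f.tell()  # byte offset where the body begins
    f.seek(0)              # rewind to the start
    f.seek(body_start)     # jump straight back to the body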
9.8 ATOMIC FILE WRITES (VERY IMPORTANT)
❌ Dangerous
with open("config.json", "w") as f:
    f.write(data)
Crash mid-write → truncated, corrupted file
✅ Atomic pattern
import os, tempfile

with tempfile.NamedTemporaryFile("w", encoding="utf-8", dir=".", delete=False) as tmp:
    tmp.write(data)
os.replace(tmp.name, "config.json")  # atomic rename; dir="." keeps the temp file on the same filesystem
📌 Used in configs, checkpoints, state files.
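The same pattern as a reusable helper. This is a sketch; the name atomic_write is mine, not stdlib:
import os
import tempfile

def atomic_write(path: str, data: str) -> None:
    """Write data so readers never observe a half-written file."""
    dir_name = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, text=True)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(data)
        os.replace(tmp_path, path)  # atomic swap over the target
    except BaseException:
        os.unlink(tmp_path)  # remove the temp file if anything failed
        raise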
🧾 JSON (REAL WORLD, NOT TOY)
9.9 JSON BASICS (YOU ALREADY KNOW — BUT READ THIS)
import json
json.dumps(obj)    # object → str
json.loads(s)      # str → object
json.dump(obj, f)  # object → open file
json.load(f)       # open file → object
9.10 WHAT JSON CAN & CANNOT SERIALIZE
JSON supports:
- dict
- list
- str
- int
- float
- bool
- null
JSON cannot serialize:
- datetime
- Decimal
- set
- custom objects
❌ This fails:
json.dumps(datetime.now())  # TypeError: Object of type datetime is not JSON serializable
9.11 CUSTOM JSON SERIALIZATION (INTERVIEW FAVORITE)
Method 1 — default
def encode(obj):
    if isinstance(obj, set):
        return list(obj)
    raise TypeError(f"Not serializable: {type(obj)}")

json.dumps(data, default=encode)
Method 2 — Custom Encoder
class Encoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        return super().default(obj)
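Usage (pass the encoder class via cls):
json.dumps({"tags": {"a", "b"}}, cls=Encoder)  # e.g. '{"tags": ["a", "b"]}' (set order is arbitrary)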
9.12 DECODING BACK INTO OBJECTS
def decode(d):
    if "type" in d:
        return MyClass(**d)
    return d
json.loads(s, object_hook=decode)
📌 Used in:
- APIs
- Event systems
- Persistence layers
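A self-contained round trip combining 9.11 and 9.12 (Point and the "type" tag are illustrative):
import json

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def encode(obj):
    if isinstance(obj, Point):
        return {"type": "Point", "x": obj.x, "y": obj.y}
    raise TypeError(f"Not serializable: {type(obj)}")

def decode(d):
    if d.get("type") == "Point":
        return Point(d["x"], d["y"])
    return d

s = json.dumps(Point(1, 2), default=encode)
p = json.loads(s, object_hook=decode)  # p is a Point again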
9.13 JSON PERFORMANCE TIPS
- Use ensure_ascii=False
- Avoid pretty-printing in production
- Consider orjson / ujson (if allowed)
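What ensure_ascii=False changes:
import json

print(json.dumps({"city": "München"}))                      # {"city": "M\u00fcnchen"}
print(json.dumps({"city": "München"}, ensure_ascii=False))  # {"city": "München"}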
📊 CSV (PRODUCTION-GRADE USAGE)
9.14 WHY NOT SPLIT STRINGS FOR CSV?
❌ Wrong
line.split(",")
Breaks on:
- Commas inside quoted fields
- Escaped values
- Newlines inside fields
9.15 CORRECT CSV HANDLING
import csv
with open("data.csv", newline="", encoding="utf-8") as f:
reader = csv.reader(f)
for row in reader:
process(row)
📌 newline="" is required everywhere, not just Windows: without it, newlines inside quoted fields break, and on Windows an extra \r sneaks into every written row.
9.16 CSV DICT READER (VERY USEFUL)
reader = csv.DictReader(f)
Returns:
{"name": "A", "age": "30"}
9.17 WRITING CSV SAFELY
with open("out.csv", "w", newline="", encoding="utf-8") as f:
writer = csv.writer(f)
writer.writerow(["name", "age"])
writer.writerow(["A", 30])
9.18 CSV PITFALLS (INTERVIEW FAVORITES)
- Everything is str
- Encoding issues
- Line-ending bugs
- Missing headers
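That first pitfall in practice: convert at the boundary (row shape from 9.16):
row = {"name": "A", "age": "30"}  # every CSV field arrives as str
age = int(row["age"])             # convert explicitly; raises ValueError on bad data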
🧠 SERIALIZATION (BEYOND JSON)
9.19 PICKLE — POWERFUL BUT DANGEROUS
import pickle

pickle.dump(obj, f)  # f must be opened in binary mode ("wb")
pickle.load(f)       # f opened with "rb"
🚨 SECURITY WARNING
Never unpickle untrusted data
Why?
- Arbitrary code execution
📌 Interview line:
Pickle is unsafe for untrusted inputs.
9.20 WHEN PICKLE IS ACCEPTABLE
✔ Internal caching
✔ ML models (controlled environment)
✔ Session persistence
❌ APIs
❌ User uploads
9.21 OTHER SERIALIZATION OPTIONS
| Format | Use |
|---|---|
| JSON | APIs |
| CSV | Tabular |
| Pickle | Internal |
| MsgPack | Compact |
| Parquet | Big data |
| Avro | Schema-based |
📌 Mentioning this shows senior awareness.
9.22 VERSIONING SERIALIZED DATA (VERY IMPORTANT)
Always store:
{
  "version": 2,
  "payload": {...}
}
Why?
- Backward compatibility
- Safe migrations
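A sketch of the matching read path (migrate_v1_to_v2 is a hypothetical migration step):
def load_state(raw: dict) -> dict:
    version = raw.get("version", 1)  # treat unversioned data as v1
    payload = raw["payload"]
    if version < 2:
        payload = migrate_v1_to_v2(payload)  # hypothetical upgrade function
    return payload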
9.23 ERROR HANDLING IN FILE IO
try:
    with open("x.txt") as f:
        ...
except FileNotFoundError:
    ...
except PermissionError:
    ...
📌 Be specific.
9.24 STREAMING JSON (ADVANCED)
For huge JSON:
- Don’t json.load() the whole file
- Use incremental parsing (e.g. the third-party ijson)
📌 Mention in senior interviews.
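A minimal sketch, assuming the file holds one top-level JSON array (ijson is a third-party package; process() is a placeholder):
import ijson  # third-party: pip install ijson

with open("huge.json", "rb") as f:
    for record in ijson.items(f, "item"):  # yields one array element at a time
        process(record)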
🧪 PRACTICE (INTERVIEW-LEVEL)
Q1
Why must files always be opened using with?
Q2
Explain atomic file writes.
Q3
Why is pickle dangerous?
Q4
Design JSON serialization for a custom class.
Q5
How do you safely process a 50GB CSV file?
🎯 INTERVIEW CHECKPOINT (CRITICAL)
You must now be able to explain:
✅ Text vs binary files
✅ Encoding pitfalls
✅ Streaming large files
✅ Atomic writes
✅ JSON serialization limits
✅ Custom encoders/decoders
✅ CSV correctness
✅ Pickle security risks
✅ When to use which format
If you can explain these cleanly → you handle data like a production Python engineer.
🚀 NEXT PHASE (STRICT CONTINUATION)
Reply with ONLY ONE number:
1 → PHASE 10: Testing, Debugging & Logging (pytest, logging, mocks — deep)
2 → 25-question Files/JSON/CSV interview drill
We continue A→Z without deviation.