True A→Z → PhD-level Python Mastery Course, Part 2

Excellent.
This is PHASE 9 — Files, JSON, CSV & Serialization, taught at production-grade + senior interview depth.

This phase is where Python meets the real world: filesystems, data exchange, APIs, pipelines, and persistence.
After this, you’ll stop writing “toy file code” and start writing safe, scalable, failure-aware IO.


📂 PHASE 9 — FILES, JSON, CSV & SERIALIZATION (DEEP)


9.1 FILE I/O IS A RESOURCE PROBLEM (NOT JUST READ/WRITE)

A file handle is:

  • An OS-managed resource
  • A limited resource (file descriptors are finite per process)
  • Something that must be closed deterministically

f = open("data.txt")
f.read()
f.close()

❌ Bug-prone: if read() raises, close() never runs
❌ Leaks file descriptors


9.2 CONTEXT MANAGER IS NON-NEGOTIABLE

with open("data.txt", "r", encoding="utf-8") as f:
    data = f.read()

✔ Auto-close
✔ Exception-safe
✔ Production-grade

📌 Interview line

Files must always be managed via context managers.
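
Under the hood, with is roughly equivalent to this try/finally (a simplified sketch; the real protocol uses __enter__/__exit__):

f = open("data.txt", "r", encoding="utf-8")
try:
    data = f.read()
finally:
    f.close()   # runs even if read() raises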


9.3 FILE MODES (YOU MUST KNOW THESE)

Mode   Meaning
r      Read
w      Write (truncate)
a      Append
x      Create (fail if exists)
b      Binary
t      Text (default)
+      Read & write

Examples:

open("x.txt", "rb")
open("x.txt", "a+")

9.4 TEXT VS BINARY FILES (CRITICAL)

Text mode

  • Returns str
  • Applies encoding/decoding
  • Newline translation

Binary mode

  • Returns bytes
  • No encoding
  • Exact byte control
open("img.png", "rb")

📌 Interview rule

Text for human-readable data, binary for everything else.


9.5 ENCODING — SOURCE OF MOST PRODUCTION BUGS

open("data.txt")   # ❌ OS-dependent encoding

✅ Always specify:

open("data.txt", encoding="utf-8")

9.6 READING STRATEGIES (PERFORMANCE)

❌ Bad for large files

data = f.read()

✅ Stream safely

for line in f:
    process(line)

Or:

f.read(1024)   # chunked
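
A minimal chunked-read loop (walrus operator, Python 3.8+; process is a placeholder):

with open("big.bin", "rb") as f:
    while chunk := f.read(64 * 1024):   # 64 KiB at a time; b"" ends the loop
        process(chunk)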

📌 Interview insight

Streaming avoids memory blowups.


9.7 FILE POINTER & SEEKING

f.tell()
f.seek(0)

Used in:

  • Parsers
  • Resume reads
  • Binary formats
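
A small sketch of saving and restoring a read position (hypothetical file):

with open("data.bin", "rb") as f:
    f.read(10)        # consume the first 10 bytes
    pos = f.tell()    # remember the current offset
    f.seek(0)         # rewind to the start
    f.seek(pos)       # jump back to the saved offset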

9.8 ATOMIC FILE WRITES (VERY IMPORTANT)

❌ Dangerous

with open("config.json", "w") as f:
    f.write(data)

A crash mid-write leaves a truncated, corrupted file.

✅ Atomic pattern

import os, tempfile

# Create the temp file in the target's directory so os.replace()
# is an atomic rename on the same filesystem.
with tempfile.NamedTemporaryFile("w", encoding="utf-8",
                                 dir=".", delete=False) as tmp:
    tmp.write(data)
    tmp.flush()
    os.fsync(tmp.fileno())   # force bytes to disk before the rename

os.replace(tmp.name, "config.json")

📌 Used in configs, checkpoints, state files.


🧾 JSON (REAL WORLD, NOT TOY)


9.9 JSON BASICS (YOU ALREADY KNOW — BUT READ THIS)

import json

json.dumps(obj)
json.loads(s)
json.dump(obj, f)
json.load(f)
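
A quick round trip (dumps/loads work with strings, dump/load with file objects):

payload = {"id": 1, "tags": ["a", "b"]}
s = json.dumps(payload)            # dict -> str
assert json.loads(s) == payload    # str  -> dict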

9.10 WHAT JSON CAN & CANNOT SERIALIZE

JSON supports:

  • dict
  • list
  • str
  • int
  • float
  • bool
  • null

JSON cannot serialize:

  • datetime
  • Decimal
  • set
  • custom objects

❌ This fails:

from datetime import datetime

json.dumps(datetime.now())   # TypeError: Object of type datetime is not JSON serializable

9.11 CUSTOM JSON SERIALIZATION (INTERVIEW FAVORITE)

Method 1 — default

def encode(obj):
    if isinstance(obj, set):
        return list(obj)
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

json.dumps(data, default=encode)

Method 2 — Custom Encoder

class Encoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        return super().default(obj)
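
The encoder is passed via the cls parameter:

json.dumps(data, cls=Encoder)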

9.12 DECODING BACK INTO OBJECTS

def decode(d):
    if d.pop("type", None) == "MyClass":   # tag written at encode time
        return MyClass(**d)                # remaining keys feed the constructor
    return d

json.loads(s, object_hook=decode)

📌 Used in:

  • APIs
  • Event systems
  • Persistence layers
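
A self-contained round trip combining both hooks, with a hypothetical Point class tagged via a "type" field:

import json
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

def encode(obj):
    if isinstance(obj, Point):
        return {"type": "Point", "x": obj.x, "y": obj.y}
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

def decode(d):
    if d.get("type") == "Point":
        return Point(d["x"], d["y"])
    return d

s = json.dumps(Point(1, 2), default=encode)
p = json.loads(s, object_hook=decode)   # Point(x=1, y=2)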

9.13 JSON PERFORMANCE TIPS

  • Use ensure_ascii=False
  • Avoid pretty-printing in production
  • Consider orjson / ujson (if allowed)
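
For example, ensure_ascii=False keeps non-ASCII characters as-is instead of escaping them (smaller output, still valid JSON):

json.dumps({"name": "Zoë"})                      # '{"name": "Zo\u00eb"}'
json.dumps({"name": "Zoë"}, ensure_ascii=False)  # '{"name": "Zoë"}'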

📊 CSV (PRODUCTION-GRADE USAGE)


9.14 WHY NOT SPLIT STRINGS FOR CSV?

❌ Wrong

line.split(",")

Breaks on:

  • Quoted commas
  • Escaped values
  • Newlines
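
The difference in one line (a quoted comma), a minimal demonstration:

import csv

row = 'name,"Doe, John",30'
row.split(",")            # ['name', '"Doe', ' John"', '30']  ❌
next(csv.reader([row]))   # ['name', 'Doe, John', '30']       ✔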

9.15 CORRECT CSV HANDLING

import csv

with open("data.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    for row in reader:
        process(row)

📌 newline="" is required (per the csv docs): without it, newlines inside quoted fields break, and writes on Windows gain extra \r characters.


9.16 CSV DICT READER (VERY USEFUL)

reader = csv.DictReader(f)

Each row becomes a dict keyed by the header row:

{"name": "A", "age": "30"}

9.17 WRITING CSV SAFELY

with open("out.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "age"])
    writer.writerow(["A", 30])

9.18 CSV PITFALLS (INTERVIEW FAVORITES)

  • Everything is str
  • Encoding issues
  • Line-ending bugs
  • Missing headers
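
The first pitfall in practice: convert types yourself, because csv never does.

row = {"name": "A", "age": "30"}   # as produced by DictReader
age = int(row["age"])              # explicit conversion required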

🧠 SERIALIZATION (BEYOND JSON)


9.19 PICKLE — POWERFUL BUT DANGEROUS

import pickle

with open("cache.pkl", "wb") as f:   # pickle requires binary mode
    pickle.dump(obj, f)

with open("cache.pkl", "rb") as f:
    obj = pickle.load(f)

🚨 SECURITY WARNING

Never unpickle untrusted data

Why?

  • Arbitrary code execution

📌 Interview line:

Pickle is unsafe for untrusted inputs.


9.20 WHEN PICKLE IS ACCEPTABLE

✔ Internal caching
✔ ML models (controlled environment)
✔ Session persistence

❌ APIs
❌ User uploads


9.21 OTHER SERIALIZATION OPTIONS

Format    Use
JSON      APIs
CSV       Tabular
Pickle    Internal
MsgPack   Compact
Parquet   Big data
Avro      Schema-based

📌 Mentioning this shows senior awareness.


9.22 VERSIONING SERIALIZED DATA (VERY IMPORTANT)

Always store:

{
  "version": 2,
  "payload": {...}
}

Why?

  • Backward compatibility
  • Safe migrations
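
A minimal load-with-migration sketch (migrate_v1_to_v2 is a hypothetical upgrade function):

import json

def load_state(path):
    with open(path, encoding="utf-8") as f:
        doc = json.load(f)
    if doc["version"] == 1:
        doc["payload"] = migrate_v1_to_v2(doc["payload"])   # hypothetical
        doc["version"] = 2
    return doc["payload"]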

9.23 ERROR HANDLING IN FILE I/O

try:
    with open("x.txt") as f:
        ...
except FileNotFoundError:
    ...
except PermissionError:
    ...

📌 Be specific.


9.24 STREAMING JSON (ADVANCED)

For huge JSON:

  • Don’t json.load() the entire file into memory
  • Use incremental parsing (e.g. ijson), as sketched below
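
A sketch using the third-party ijson library, assuming the file holds one top-level JSON array (the "item" prefix selects each element):

import ijson

with open("huge.json", "rb") as f:
    for record in ijson.items(f, "item"):   # streams one element at a time
        process(record)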

📌 Mention in senior interviews.


🧪 PRACTICE (INTERVIEW-LEVEL)

Q1

Why must files always be opened using with?


Q2

Explain atomic file writes.


Q3

Why is pickle dangerous?


Q4

Design JSON serialization for a custom class.


Q5

How do you safely process a 50GB CSV file?


🎯 INTERVIEW CHECKPOINT (CRITICAL)

You must now be able to explain:

✅ Text vs binary files
✅ Encoding pitfalls
✅ Streaming large files
✅ Atomic writes
✅ JSON serialization limits
✅ Custom encoders/decoders
✅ CSV correctness
✅ Pickle security risks
✅ When to use which format

If you can explain these cleanly → you handle data like a production Python engineer.


🚀 NEXT PHASE (STRICT CONTINUATION)

Reply with ONLY ONE number:

1 → PHASE 10: Testing, Debugging & Logging (pytest, logging, mocks — deep)
2 → 25-question Files/JSON/CSV interview drill

We continue A→Z without deviation.