Excellent.
This is PHASE 9 — Files, JSON, CSV & Serialization, taught at production-grade + senior interview depth.
This phase is where Python meets the real world: filesystems, data exchange, APIs, pipelines, and persistence.
After this, you’ll stop writing “toy file code” and start writing safe, scalable, failure-aware IO.
📂 PHASE 9 — FILES, JSON, CSV & SERIALIZATION (DEEP)
9.1 FILE I/O IS A RESOURCE PROBLEM (NOT JUST READ/WRITE)
A file is:
- An OS-managed resource
- Limited in number (file descriptors are capped per process)
- Only safe if closed deterministically
f = open("data.txt")
data = f.read()  # if this raises, close() below never runs
f.close()
❌ Bug-prone
❌ Leaks file descriptors on any exception between open() and close()
9.2 CONTEXT MANAGER IS NON-NEGOTIABLE
with open("data.txt", "r", encoding="utf-8") as f:
data = f.read()
✔ Auto-close
✔ Exception-safe
✔ Production-grade
📌 Interview line
Files must always be managed via context managers.
9.3 FILE MODES (YOU MUST KNOW THESE)
| Mode | Meaning |
|---|---|
| r | Read |
| w | Write (truncate) |
| a | Append |
| x | Create (fail if exists) |
| b | Binary |
| t | Text (default) |
| + | Read & write |
Examples:
open("x.txt", "rb")
open("x.txt", "a+")
9.4 TEXT VS BINARY FILES (CRITICAL)
Text mode
- Returns str
- Applies encoding/decoding
- Newline translation
Binary mode
- Returns bytes
- No encoding
- Exact byte control
open("img.png", "rb")
📌 Interview rule
Text for human-readable data, binary for everything else.
9.5 ENCODING — SOURCE OF MOST PRODUCTION BUGS
open("data.txt") # ❌ OS-dependent encoding
✅ Always specify:
open("data.txt", encoding="utf-8")
9.6 READING STRATEGIES (PERFORMANCE)
❌ Bad for large files
data = f.read()
✅ Stream safely
for line in f:
    process(line)
Or:
f.read(1024) # chunked
📌 Interview insight
Streaming avoids memory blowups.
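A minimal chunked-read loop; process() and the file name are placeholders:
with open("big.bin", "rb") as f:
    while chunk := f.read(64 * 1024):  # 64 KiB at a time; empty bytes ends the loop
        process(chunk)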
9.7 FILE POINTER & SEEKING
f.tell()
f.seek(0)
Used in:
- Parsers
- Resume reads
- Binary formats
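A small illustrative sketch (the 8-byte header is an assumption):
with open("data.bin", "rb") as f:
    header = f.read(8)     # consume a fixed-size header
    body_start = f.tell()  # byte offset where the body begins
    f.seek(0)              # rewind to the start
    f.seek(body_start)     # jump straight back to the body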
9.8 ATOMIC FILE WRITES (VERY IMPORTANT)
❌ Dangerous
with open("config.json", "w") as f:
    f.write(data)
Crash mid-write → truncated, corrupted file
✅ Atomic pattern
import os, tempfile

with tempfile.NamedTemporaryFile("w", encoding="utf-8", dir=".", delete=False) as tmp:
    tmp.write(data)
os.replace(tmp.name, "config.json")  # atomic rename; dir="." keeps the temp file on the same filesystem
📌 Used in configs, checkpoints, state files.
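The same pattern as a reusable helper. This is a sketch; the name atomic_write is mine, not stdlib:
import os
import tempfile

def atomic_write(path: str, data: str) -> None:
    """Write data so readers never observe a half-written file."""
    dir_name = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, text=True)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(data)
        os.replace(tmp_path, path)  # atomic swap over the target
    except BaseException:
        os.unlink(tmp_path)  # remove the temp file if anything failed
        raise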
🧾 JSON (REAL WORLD, NOT TOY)
9.9 JSON BASICS (YOU ALREADY KNOW — BUT READ THIS)
import json
json.dumps(obj)    # object → str
json.loads(s)      # str → object
json.dump(obj, f)  # object → open file
json.load(f)       # open file → object
9.10 WHAT JSON CAN & CANNOT SERIALIZE
JSON supports:
- dict
- list
- str
- int
- float
- bool
- null
JSON cannot serialize:
- datetime
- Decimal
- set
- custom objects
❌ This fails:
json.dumps(datetime.now())  # TypeError: Object of type datetime is not JSON serializable
9.11 CUSTOM JSON SERIALIZATION (INTERVIEW FAVORITE)
Method 1 — default
def encode(obj):
    if isinstance(obj, set):
        return list(obj)
    raise TypeError(f"Not serializable: {type(obj)}")

json.dumps(data, default=encode)
Method 2 — Custom Encoder
class Encoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        return super().default(obj)
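Usage (pass the encoder class via cls):
json.dumps({"tags": {"a", "b"}}, cls=Encoder)  # e.g. '{"tags": ["a", "b"]}' (set order is arbitrary)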
9.12 DECODING BACK INTO OBJECTS
def decode(d):
    if "type" in d:
        return MyClass(**d)
    return d
json.loads(s, object_hook=decode)
📌 Used in:
- APIs
- Event systems
- Persistence layers
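A self-contained round trip combining 9.11 and 9.12 (Point and the "type" tag are illustrative):
import json

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def encode(obj):
    if isinstance(obj, Point):
        return {"type": "Point", "x": obj.x, "y": obj.y}
    raise TypeError(f"Not serializable: {type(obj)}")

def decode(d):
    if d.get("type") == "Point":
        return Point(d["x"], d["y"])
    return d

s = json.dumps(Point(1, 2), default=encode)
p = json.loads(s, object_hook=decode)  # p is a Point again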
9.13 JSON PERFORMANCE TIPS
- Use ensure_ascii=False
- Avoid pretty-printing in production
- Consider orjson / ujson (if allowed)
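What ensure_ascii=False changes:
import json

print(json.dumps({"city": "München"}))                      # {"city": "M\u00fcnchen"}
print(json.dumps({"city": "München"}, ensure_ascii=False))  # {"city": "München"}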
📊 CSV (PRODUCTION-GRADE USAGE)
9.14 WHY NOT SPLIT STRINGS FOR CSV?
❌ Wrong
line.split(",")
Breaks on:
- Commas inside quoted fields
- Escaped values
- Newlines inside fields
9.15 CORRECT CSV HANDLING
import csv
with open("data.csv", newline="", encoding="utf-8") as f:
reader = csv.reader(f)
for row in reader:
process(row)
📌 newline="" is required everywhere, not just Windows: without it, newlines inside quoted fields break, and on Windows an extra \r sneaks into every written row.
9.16 CSV DICT READER (VERY USEFUL)
reader = csv.DictReader(f)
Returns:
{"name": "A", "age": "30"}
9.17 WRITING CSV SAFELY
with open("out.csv", "w", newline="", encoding="utf-8") as f:
writer = csv.writer(f)
writer.writerow(["name", "age"])
writer.writerow(["A", 30])
9.18 CSV PITFALLS (INTERVIEW FAVORITES)
- Everything is str
- Encoding issues
- Line-ending bugs
- Missing headers
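That first pitfall in practice: convert at the boundary (row shape from 9.16):
row = {"name": "A", "age": "30"}  # every CSV field arrives as str
age = int(row["age"])             # convert explicitly; raises ValueError on bad data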
🧠 SERIALIZATION (BEYOND JSON)
9.19 PICKLE — POWERFUL BUT DANGEROUS
import pickle

pickle.dump(obj, f)  # f must be opened in binary mode ("wb")
pickle.load(f)       # f opened with "rb"
🚨 SECURITY WARNING
Never unpickle untrusted data
Why?
- Arbitrary code execution
📌 Interview line:
Pickle is unsafe for untrusted inputs.
9.20 WHEN PICKLE IS ACCEPTABLE
✔ Internal caching
✔ ML models (controlled environment)
✔ Session persistence
❌ APIs
❌ User uploads
9.21 OTHER SERIALIZATION OPTIONS
| Format | Use |
|---|---|
| JSON | APIs |
| CSV | Tabular |
| Pickle | Internal |
| MsgPack | Compact |
| Parquet | Big data |
| Avro | Schema-based |
📌 Mentioning this shows senior awareness.
9.22 VERSIONING SERIALIZED DATA (VERY IMPORTANT)
Always store:
{
  "version": 2,
  "payload": {...}
}
Why?
- Backward compatibility
- Safe migrations
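A sketch of the matching read path (migrate_v1_to_v2 is a hypothetical migration step):
def load_state(raw: dict) -> dict:
    version = raw.get("version", 1)  # treat unversioned data as v1
    payload = raw["payload"]
    if version < 2:
        payload = migrate_v1_to_v2(payload)  # hypothetical upgrade function
    return payload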
9.23 ERROR HANDLING IN FILE IO
try:
    with open("x.txt") as f:
        ...
except FileNotFoundError:
    ...
except PermissionError:
    ...
📌 Be specific.
9.24 STREAMING JSON (ADVANCED)
For huge JSON:
- Don’t json.load() the whole file
- Use incremental parsing (e.g. the third-party ijson)
📌 Mention in senior interviews.
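A minimal sketch, assuming the file holds one top-level JSON array (ijson is a third-party package; process() is a placeholder):
import ijson  # third-party: pip install ijson

with open("huge.json", "rb") as f:
    for record in ijson.items(f, "item"):  # yields one array element at a time
        process(record)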
🧪 PRACTICE (INTERVIEW-LEVEL)
Q1
Why must files always be opened using with?
Q2
Explain atomic file writes.
Q3
Why is pickle dangerous?
Q4
Design JSON serialization for a custom class.
Q5
How do you safely process a 50GB CSV file?
🎯 INTERVIEW CHECKPOINT (CRITICAL)
You must now be able to explain:
✅ Text vs binary files
✅ Encoding pitfalls
✅ Streaming large files
✅ Atomic writes
✅ JSON serialization limits
✅ Custom encoders/decoders
✅ CSV correctness
✅ Pickle security risks
✅ When to use which format
If you can explain these cleanly → you handle data like a production Python engineer.
🚀 NEXT PHASE (STRICT CONTINUATION)
Reply with ONLY ONE number:
1 → PHASE 10: Testing, Debugging & Logging (pytest, logging, mocks — deep)
2 → 25-question Files/JSON/CSV interview drill
We continue A→Z without deviation.