Excellent.
DAY 25 is where Python meets the real world: disks, OS buffers, encodings, and performance tradeoffs. Many production bugs live here.


🧠 DAY 25 — Files, I/O & OS Interaction

(Buffering, Text vs Binary, Context Managers & Pitfalls)


🔑 CORE IDEA OF DAY 25

File I/O is buffered, stateful, and encoding-sensitive.
Correctness and performance depend on understanding these layers.


1️⃣ The I/O Stack (From Python to Disk)

Your Code
  ↓
Python IO Objects (TextIO / BufferedIO)
  ↓
OS Kernel Buffers
  ↓
Filesystem
  ↓
Disk

Key takeaway: writes don’t hit disk immediately.


2️⃣ open() — What You Really Get

f = open("data.txt", "r")
  • Returns a file object
  • File object wraps OS file descriptor
  • Mode controls behavior
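
To make the "file object wraps a file descriptor" point concrete, here is a small sketch that peels back the layers of a text-mode file object. The scratch file path is an assumption made up for the demo; the attributes used (`buffer`, `raw`, `fileno()`) are standard on objects returned by `open()` in text mode.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("hello\n")

with open(path, "r", encoding="utf-8") as f:
    layer_names = [
        type(f).__name__,             # text layer: decodes bytes to str
        type(f.buffer).__name__,      # buffered layer: batches reads
        type(f.buffer.raw).__name__,  # raw layer: talks to the OS
    ]
    fd = f.fileno()                   # the underlying OS file descriptor (an int)

print(layer_names)  # ['TextIOWrapper', 'BufferedReader', 'FileIO']
```

Each layer adds one responsibility, which is why text mode and binary mode behave so differently later in this lesson.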

Common modes

Mode    Meaning
"r"     read (text)
"w"     write (truncate)
"a"     append
"rb"    read (binary)
"wb"    write (binary)

3️⃣ Text vs Binary Mode (CRITICAL)

Text mode

open("file.txt", "r", encoding="utf-8")
  • Reads str
  • Applies encoding/decoding
  • Handles newline translation

Binary mode

open("file.bin", "rb")
  • Reads bytes
  • No encoding
  • Exact data

🧠 Text is a view over bytes.
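
A quick way to see "text is a view over bytes" is to read the same file both ways. This is a minimal sketch; the scratch file and its contents are assumptions chosen for the demo.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# "caf\u00e9" is "café"; newline="\n" disables newline translation on write.
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("caf\u00e9\n")

with open(path, "r", encoding="utf-8") as f:
    as_text = f.read()    # str: decoded from UTF-8

with open(path, "rb") as f:
    as_bytes = f.read()   # bytes: exactly what is stored on disk

print(type(as_text).__name__, len(as_text))    # str 5
print(type(as_bytes).__name__, len(as_bytes))  # bytes 6
```

The lengths differ because "é" is one character but two bytes in UTF-8.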


4️⃣ Encoding & Decoding (Common Failure Point)

text = "नमस्ते"
data = text.encode("utf-8")   # str → bytes
text2 = data.decode("utf-8")  # bytes → str

Rules:

  • Decode once at boundaries
  • Don’t mix encodings
  • Always specify encoding in production

🚨 Relying on default encoding causes bugs across machines.
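
Here is a small sketch of what an encoding mismatch looks like in practice. It reuses the string from the example above; the variable names `garbled` and `ascii_failed` are just illustrative.

```python
text = "नमस्ते"
data = text.encode("utf-8")

# Wrong codec, silent failure: Latin-1 maps every byte to some character,
# so decoding "succeeds" but produces mojibake.
garbled = data.decode("latin-1")

# Wrong codec, loud failure: ASCII rejects any byte >= 0x80.
try:
    data.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(garbled == text)   # False
print(ascii_failed)      # True
```

The silent case is the dangerous one: nothing raises, and the corrupted text travels further into your system.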


5️⃣ Buffering: Why write() Doesn’t Write

f.write("hello")

This writes to:

  • Python's buffer (inside your process)
  • then, after a flush, the OS buffer (kernel page cache)

Not necessarily disk.

Flush points

  • Buffer full
  • f.flush()
  • f.close()
  • Program exit (not guaranteed on crash)

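Use:

f.flush()
os.fsync(f.fileno())

for critical writes. flush() only hands the data to the OS; os.fsync() asks the OS to commit it to disk.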
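
The sketch below makes the buffering visible by checking the file size before and after flushing. The path and record text are assumptions for the demo, and the "before" size depends on buffer settings, so treat it as typical rather than guaranteed.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "journal.log")

with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("critical record\n")
    # Typically 0 here: the data is still sitting in Python's buffer.
    size_before_flush = os.path.getsize(path)

    f.flush()              # Python buffer -> OS
    os.fsync(f.fileno())   # OS cache -> storage device
    size_after_sync = os.path.getsize(path)

print(size_before_flush, size_after_sync)
```

Calling `os.fsync()` on every write is slow, so it is usually reserved for records you cannot afford to lose.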


6️⃣ Line-by-Line vs Chunked Reading

Line-by-line (good for text)

for line in f:
    process(line)
  • Lazy
  • Memory efficient
  • Uses internal buffering

Chunked (good for binary)

while chunk := f.read(8192):
    process(chunk)

Avoid:

f.read()   # loads entire file into memory

unless file is small.
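
As one possible use of the chunked pattern, here is a sketch that hashes a file of any size while holding only one chunk in memory. The helper name `sha256_of_file` and the demo payload are assumptions, not part of the lesson.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=8192):
    """Hash a file incrementally, one chunk at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):   # empty bytes at EOF ends the loop
            digest.update(chunk)
    return digest.hexdigest()

# Demo: write a hypothetical payload, then hash it by streaming.
payload = b"0123456789" * 5000
demo_path = os.path.join(tempfile.mkdtemp(), "blob.bin")
with open(demo_path, "wb") as f:
    f.write(payload)

streamed = sha256_of_file(demo_path)
print(streamed == hashlib.sha256(payload).hexdigest())  # True
```

Memory use stays near `chunk_size` no matter how large the file grows.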


7️⃣ Context Managers Are NOT Optional

❌ Bad:

f = open("x.txt")
data = f.read()
# forgot close

✅ Good:

with open("x.txt") as f:
    data = f.read()

Why:

  • Guarantees close
  • Handles exceptions
  • Frees OS resources

🧠 File handles are finite OS resources.
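
To see the "handles exceptions" guarantee in action, this sketch raises inside a `with` block and checks the file afterwards. The scratch file and the simulated error are assumptions for the demo.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "x.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("payload")

try:
    with open(path, encoding="utf-8") as f:
        data = f.read()
        raise ValueError("simulated failure while processing")
except ValueError:
    pass

closed_after_error = f.closed   # the with-block closed it on the way out
print(closed_after_error)       # True
```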


8️⃣ Appending vs Writing (Subtle Difference)

open("f.txt", "w")  # truncates
open("f.txt", "a")  # appends

Append mode:

  • Always writes at end
  • Even if you seek()
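
The sketch below checks the append behavior directly. The file name and contents are assumptions; the result shown is what mainstream platforms produce, though the Python docs describe append-regardless-of-seek as platform behavior, so verify it on unusual systems.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "f.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("AAAA")

with open(path, "a", encoding="utf-8") as f:
    f.seek(0)        # try to move back to the start...
    f.write("BB")    # ...but the write still lands at the end

with open(path, encoding="utf-8") as f:
    content = f.read()

print(content)  # AAAABB
```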

9️⃣ File Pointer & seek() / tell()

f.tell()        # current position
f.seek(0)       # move to beginning

Binary-safe:

f.seek(0, 2)    # move to end

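Text files:

  • seek() only accepts 0 or a position previously returned by tell() (plus seek(0, 2) to jump to the end)
  • Variable-width encodings mean character counts do not map to byte offsets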
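
A safe pattern in text mode is to treat `tell()` as an opaque bookmark and only `seek()` back to values it gave you. This is a minimal sketch with a made-up two-line file.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("h\u00e9llo\nworld\n")   # "héllo": é takes two bytes in UTF-8

with open(path, "r", encoding="utf-8") as f:
    first = f.readline()
    bookmark = f.tell()     # opaque position; do not do arithmetic on it
    second = f.readline()
    f.seek(bookmark)        # safe: the value came from tell()
    again = f.readline()

print(second == again)  # True
```

Note that this uses `readline()` rather than a `for` loop, because `tell()` is disabled while iterating a text file with `next()`.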

🔟 Common I/O Pitfalls (🔥 Interview Favorites)

Pitfall 1: Assuming write == disk

f.write("data")
# crash → data lost

Pitfall 2: Encoding mismatch

open("file.txt")  # default encoding differs per OS

Pitfall 3: Reading huge files at once

data = f.read()   # memory spike

11️⃣ Atomic File Writes (Production Pattern)

Avoid partial writes:

import os, tempfile

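with tempfile.NamedTemporaryFile(dir=".", delete=False) as tmp:  # same filesystem as final.txt
    tmp.write(data)          # default mode is binary, so data must be bytes
    tmp.flush()
    os.fsync(tmp.fileno())   # make sure the bytes are on disk before the rename

os.replace(tmp.name, "final.txt")  # atomic rename over the old file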

Ensures:

  • Either old file or new file exists
  • Never a half-written file
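
If you want this pattern as a reusable function, one possible shape is sketched below. The helper name `atomic_write_text` is hypothetical. It assumes the temp file must live in the destination directory so that `os.replace()` never crosses filesystems, and it cleans up the temp file if anything fails.

```python
import os
import tempfile

def atomic_write_text(path, text, encoding="utf-8"):
    """Hypothetical helper: replace `path` with `text` in one atomic step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())     # data on disk before the rename
        os.replace(tmp_path, path)     # atomic swap
    except BaseException:
        # Leave no stray temp file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

# Demo against a hypothetical target file.
target_dir = tempfile.mkdtemp()
target = os.path.join(target_dir, "final.txt")
atomic_write_text(target, "v1")
atomic_write_text(target, "v2")
with open(target, encoding="utf-8") as f:
    final_content = f.read()
leftovers = os.listdir(target_dir)
```

Readers of `final.txt` see either "v1" or "v2" in full, never a mix of the two.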

12️⃣ File I/O Performance Tips

  • Prefer buffered reads
  • Avoid frequent small writes
  • Batch writes
  • Use binary mode for large data
  • Let OS buffering work for you
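
As a rough illustration of the "batch writes" tip, this sketch hands many lines to the file object in a single `writelines()` call and lets the default buffering group them into a few OS-level writes. The file name and row format are assumptions for the demo.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "rows.txt")

rows = [f"row {i}\n" for i in range(1000)]

with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.writelines(rows)   # one call; buffering batches the actual OS writes

with open(path, encoding="utf-8") as f:
    line_count = sum(1 for _ in f)   # lazy, line-by-line count

print(line_count)  # 1000
```

What hurts performance is defeating the buffer, for example by calling `flush()` after every tiny write when durability is not required.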

🔥 INTERVIEW TRAPS (DAY 25)

Q1

Why is with open() safer than a manual close()?

✔ Ensures close on exception
✔ Deterministic cleanup


Q2

Difference between text and binary mode?

✔ Encoding
✔ Newline handling
✔ Data type (str vs bytes)


Q3

Why might writes “disappear” after a crash?

✔ Buffered
✔ Not flushed to disk


🧠 DAY 25 MENTAL CHECKLIST

Before doing file I/O:

  1. Text or binary?
  2. Encoding specified?
  3. Large or small file?
  4. Buffered or immediate write?
  5. Using context manager?

📝 DAY 25 ASSIGNMENT (MANDATORY)

1️⃣ Predict behavior (NO RUNNING):

f = open("x.txt", "w")
f.write("hello")
# program crashes here

Will "hello" be in the file? Why/why not?


2️⃣ Rewrite safely:

f = open("data.txt")
data = f.read()

3️⃣ Design question:

How would you process a 100GB log file to extract error lines efficiently in Python?

(Hint: generators + buffered I/O)


🔜 DAY 26 PREVIEW

DAY 26 — Concurrency Model
(GIL, threading vs multiprocessing vs asyncio)

You’ll finally understand:

  • Why Python threads don’t scale CPU
  • When threads DO help
  • How asyncio actually works
  • How to choose the right concurrency model

When ready, say 👉 “START DAY 26”