Let’s build a solid, foundations-first explanation of how processes, threads, stacks, memory, and process creation work from an OS viewpoint, aimed at someone starting out with OS concepts. I’ll keep it clear, conceptual, and practical, with Python-relevant notes and small exercises you can try.
✅ Big picture: process vs thread (short)
- Process: a running instance of a program with its own virtual address space, OS resources (such as open file descriptors), and execution state. Processes are isolated from one another by default.
- Thread: a flow of execution inside a process. Threads share the same process address space (code, data, heap) but each thread has its own stack and CPU registers.
🧭 Program memory layout (virtual address space)
A typical process memory is divided into segments (conceptual — exact layout varies by OS):
- Text / Code segment
- Holds executable instructions. Mapped read-only, so processes running the same binary (or using the same shared library) can share the same physical pages.
- Data segment
- Initialized data (global/static variables with initial values).
- BSS (uninitialized globals -> zero-initialized).
- Heap
- Dynamically allocated memory (malloc, new, Python objects). Grows upward. Shared by all threads of the process.
- Stack(s)
- Each thread gets its own stack region for local variables, return addresses, function frames. Grows downward (typical).
- Thread stacks are separate — local variables are not shared.
- Memory-mapped region / shared libraries
- mmap’d files, shared libraries (DLLs/.so), sometimes used for IPC via shared memory.
- Kernel space
- Separate — not directly accessible by user code.
🔁 How memory is “shared” and isolated
- Within a process: all threads see the same heap / global variables. So if Thread A modifies a global variable, Thread B sees it (race conditions possible).
- Between processes: default is isolation — one process cannot directly read another process’s memory (virtual address spaces are separate).
- Exceptions / ways to share between processes:
- Shared memory (mmap, POSIX shared memory, System V shm) — regions mapped into more than one process address space.
- Files / memory-mapped files — processes map same file into memory.
- Pipes, sockets, message queues — copy data across.
- Higher-level mechanisms in Python: multiprocessing.Queue() (a pipe plus pickling under the hood) and multiprocessing.Manager() (a server process that exposes shared objects through proxies).
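A small sketch of both halves of this: a thread’s write to a module-level object is visible to the rest of the process, while a forked child writes into its own copy-on-write copy and the parent never sees the change. The fork part uses os.fork, so it only runs on Unix; counter, thread_demo and fork_demo are illustrative names.

```python
import os
import threading

# A module-level object lives on the heap and is visible to every thread in this process.
counter = {"value": 0}


def bump():
    counter["value"] += 1


def thread_demo():
    """Run bump() in another thread; the main thread sees the write afterwards."""
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter["value"]


def fork_demo():
    """Unix only: the child changes its private copy; the parent's value is unchanged.

    Returns (value seen by the child after its write, value seen by the parent).
    """
    read_fd, write_fd = os.pipe()  # the pipe is the only channel back to the parent
    pid = os.fork()
    if pid == 0:  # child process
        os.close(read_fd)
        counter["value"] += 100  # lands in the child's copy-on-write copy of the page
        os.write(write_fd, str(counter["value"]).encode())
        os._exit(0)  # exit at once, skipping atexit handlers inherited from the parent
    os.close(write_fd)
    child_value = int(os.read(read_fd, 64).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child
    return child_value, counter["value"]


if __name__ == "__main__":
    print("after thread:", thread_demo())
    if hasattr(os, "fork"):
        print("(child, parent):", fork_demo())
```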
🧱 Stack vs Heap — concrete differences
- Stack
- Per-thread, stores function call frames, local variables.
- Fast allocation/deallocation — LIFO.
- Limited size (stack overflow if too deep recursion).
- Heap
- Shared among threads (in same process), used for dynamic allocation.
- Managed by allocator (malloc / free), or language runtime (Python’s memory manager / GC).
- Slower to allocate than the stack, and the allocator needs synchronization to be thread-safe.
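You can see both sides of this from Python code, with the caveat that CPython manages its own frames: a local name belongs to a per-call frame that disappears when the function returns, the object it refers to lives on the heap for as long as something references it, and runaway recursion is cut off by CPython’s recursion limit with a RecursionError rather than crashing the thread’s stack. A minimal sketch; make_list and depth are illustrative names.

```python
import sys


def make_list():
    data = [1, 2, 3]  # the name `data` belongs to this call's frame
    return data       # the frame goes away, but the list object (on the heap) survives


def depth(n):
    # Each call pushes a new frame; CPython does no tail-call elimination.
    return 0 if n == 0 else 1 + depth(n - 1)


def hits_recursion_limit():
    limit = sys.getrecursionlimit()  # 1000 by default in CPython
    try:
        depth(limit * 2)
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    print(make_list())
    print("recursion limit:", sys.getrecursionlimit())
    print("RecursionError raised:", hits_recursion_limit())
```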
⚙️ How a process is created (Unix vs Windows)
Unix (fork + exec model)
- fork(): OS creates a new process by duplicating the calling process:
- The child gets a copy of the parent’s memory layout.
- Copy-On-Write (COW) is used: pages are shared read-only until either process writes — then OS copies the page. This makes fork cheap unless you modify lots of memory.
- File descriptors are duplicated (refer to same open files).
- exec(): Replace the current process image (text, data, heap, stack) with a new program (used to run a new program after fork).
- Common pattern: parent calls fork() → child calls exec() to load the new program → parent continues (usually calling wait() to collect the child’s exit status).
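A minimal Unix-only sketch of the fork → exec → wait pattern using the os module. The child replaces itself with a fresh Python interpreter that just exits with status 7 (an arbitrary value chosen for the demo), and the parent collects that status. In everyday code, subprocess.run() wraps this pattern for you.

```python
import os
import sys


def fork_exec_wait(code="import sys; sys.exit(7)"):
    """Fork, exec a new Python interpreter in the child, and return its exit code."""
    pid = os.fork()
    if pid == 0:  # child: replace this process image with a new program
        try:
            os.execv(sys.executable, [sys.executable, "-c", code])
        finally:
            os._exit(127)  # only reached if execv itself failed
    # parent: block until the child terminates, then decode its status
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)  # Python 3.9+


if __name__ == "__main__":
    if hasattr(os, "fork"):
        print("child exit code:", fork_exec_wait())
```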
Windows (CreateProcess)
- Windows does not have fork semantics. It uses CreateProcess, which creates a new process and loads the binary — more explicit and often heavier. Python’s multiprocessing accounts for this difference (spawn vs fork).
Python specifics
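- On Linux, the default multiprocessing start method has long been fork (fast, but watch out when mixing threads and fork); Python 3.14 switches the default to forkserver. macOS has used spawn by default since Python 3.8.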
- On Windows default is spawn (new Python interpreter; safer but more overhead — objects must be picklable to be sent).
- Python also supports the forkserver start method (available on POSIX platforms that can pass file descriptors over Unix pipes, such as Linux) to mitigate fork + threads issues.
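To check which start method your interpreter uses, and to opt into a different one without changing the global default, the multiprocessing module provides get_start_method(), get_all_start_methods() and get_context(). A small sketch; the printed values differ by platform and Python version.

```python
import multiprocessing as mp


def describe_start_methods():
    """Report the current start method and the ones this platform supports."""
    return {
        "default": mp.get_start_method(),         # e.g. 'fork', 'forkserver' or 'spawn'
        "available": mp.get_all_start_methods(),  # the platform default is listed first
    }


def spawn_context():
    """Return a context that always uses 'spawn', leaving the global default alone.

    A context has the same API as the module: ctx.Process, ctx.Pool, ctx.Queue, ...
    """
    return mp.get_context("spawn")


if __name__ == "__main__":
    print(describe_start_methods())
    print(spawn_context())
```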
🔁 Threads vs Processes: lifecycle & costs
- Threads
- Creation: relatively cheap.
- Context switch: lower overhead (same address space, just switch registers/stack).
- Memory: share memory → less RAM overhead.
- Danger: shared-memory bugs (race conditions, deadlocks).
- Processes
- Creation: heavier (new address space). Fork with COW reduces cost on Unix.
- Context switch: costlier (TLB flushes, address space switch).
- Memory: separate; safer (isolation).
- Good for CPU-bound parallelism in Python (bypasses GIL because each process has its own interpreter and GIL).
🧠 The GIL (Global Interpreter Lock) — how it fits in
- GIL is a mutex inside CPython that ensures only one thread executes Python bytecode at a time.
- Why? Simpler memory management and object reference counting safety without per-object locking in CPython.
- Effect:
- I/O-bound threads: threads often release the GIL while waiting (e.g., performing I/O), so threading helps.
- CPU-bound threads: GIL becomes a bottleneck — threads take turns running, so no true multi-core parallelism. Use multiprocessing for CPU-bound tasks.
- Workarounds:
- Use multiprocessing (separate interpreters & GILs).
- Use native extensions that release the GIL (e.g., NumPy heavy computations written in C).
- Use alternative Python implementations: Jython and IronPython have no GIL, while PyPy has its own. CPython 3.13 also added an optional free-threaded build (PEP 703).
🔐 Synchronization primitives (why needed, how they map)
- Race condition: two threads write the same shared variable concurrently — result depends on interleaving.
- Mutex / Lock (threading.Lock): mutual exclusion for critical sections.
- RLock: re-entrant lock (the thread that holds it may acquire it again).
- Semaphore — limit concurrency to N.
- Condition — thread waits for condition variable (notify/wait).
- Event — simple notify flag among threads.
- Barrier — wait for multiple threads to reach same point.
- Atomic operations — small ops that are indivisible (some languages/CPUs provide atomic ops).
- In processes: use multiprocessing.Lock, multiprocessing.Queue, or shared multiprocessing.Value / multiprocessing.Array with synchronization.
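A short sketch of two of these primitives working together: a threading.Semaphore caps how many workers are inside the guarded section at once (2 here), and a threading.Event releases all workers at the same moment instead of each one polling a flag. A plain Lock protects the bookkeeping counters. The worker count and the 0.05 s sleep are arbitrary demo values.

```python
import threading
import time

MAX_PARALLEL = 2
slots = threading.Semaphore(MAX_PARALLEL)  # at most 2 holders at a time
go = threading.Event()                     # start signal shared by all workers
state_lock = threading.Lock()              # protects `active` and `peak`
active = 0
peak = 0


def worker():
    global active, peak
    go.wait()  # blocks (no busy-waiting) until the main thread calls go.set()
    with slots:
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # pretend to use a limited resource
        with state_lock:
            active -= 1


def run(n_workers=6):
    """Start n_workers threads, release them together, return the peak concurrency."""
    global active, peak
    active = peak = 0
    go.clear()
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    go.set()  # release everyone at once
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    print("peak concurrency:", run())
```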
🔁 Inter-process Communication (IPC) — options & tradeoffs
- Pipes: unidirectional byte stream. Good for parent-child comms.
- Unix domain sockets / TCP sockets: general-purpose, can be local or networked.
- Message queues: OS-provided or user-space (RabbitMQ, Kafka).
- Shared memory / mmap: fastest if you need to share large data but requires synchronization.
- Files: simplest but slow; good for logging or persistence.
- Python tools:
- multiprocessing.Queue (uses a pipe plus a background feeder thread that handles pickling).
- multiprocessing.Manager() for proxies to shared objects.
- multiprocessing.shared_memory (Python 3.8+) for efficiently sharing buffers, such as NumPy arrays, across processes.
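A minimal sketch of multiprocessing.shared_memory: one side creates a named block and writes bytes into it, another side attaches by name and reads them back with no pickling involved. To keep it self-contained, both handles live in one process here; in practice you would pass shm.name to a child process. The payload is arbitrary.

```python
from multiprocessing import shared_memory


def roundtrip(payload: bytes) -> bytes:
    """Write payload into a new shared block, then read it back through a second handle."""
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        shm.buf[: len(payload)] = payload  # writes straight into the shared pages
        # Another process would do exactly this, given only the block's name.
        other = shared_memory.SharedMemory(name=shm.name)
        try:
            # The block may be rounded up to a page size, so slice to the payload length.
            return bytes(other.buf[: len(payload)])
        finally:
            other.close()  # detach this handle
    finally:
        shm.close()
        shm.unlink()  # free the block; do this once, from the creator


if __name__ == "__main__":
    print(roundtrip(b"hello from shared memory"))
```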
⚡ Context switching & scheduling (brief)
- Context switch: OS saves CPU registers of current thread/process and loads next — cheaper for threads (same address space); heavier for processes (need address space switch).
- Scheduler: decides which thread/process runs. Modern OSes use multi-level feedback queues, priorities, and fairness algorithms.
- Preemptive multitasking: OS interrupts running tasks to give CPU to others.
- Cooperative multitasking: tasks yield control voluntarily (e.g., asyncio coroutines yield with await).
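Cooperative switching is easy to observe with asyncio: each coroutine runs until it reaches an await, and only then can the event loop run another one. In this sketch, await asyncio.sleep(0) serves purely as a yield point; the worker names are illustrative.

```python
import asyncio


async def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}{i}")
        await asyncio.sleep(0)  # explicit yield point: lets the event loop run another task


async def run_two():
    log = []
    await asyncio.gather(worker("A", 3, log), worker("B", 3, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(run_two()))  # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
```

Remove the await line and each worker runs to completion before the other starts, because nothing hands control back to the loop.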
🧪 Examples & gotchas in Python (practical)
- Fork after threads exist (Unix): if your process has threads and you call fork(), only the calling thread is duplicated in the child. Locks held by other threads (including inside libraries) can stay locked forever in the child, leaving library state inconsistent. That’s why the fork-then-exec pattern and multiprocessing’s forkserver start method exist; or prefer spawn.
- Large memory + fork: fork uses COW, so memory isn’t duplicated until written. But if either process then writes to many of those pages, memory use spikes (in CPython, even updating an object’s reference count is a write).
- Pickling cost: multiprocessing passes data between processes via pickling, so large objects are serialized, which adds overhead. Use shared memory or memory-mapped files for big data.
- I/O-bound work: use threads or asyncio. If you rely on blocking libraries (e.g., requests), threads are easier. For thousands of concurrent requests, aiohttp + asyncio is more efficient.
- Avoid busy-wait loops: use blocking waits (queues, Event.wait(), Condition.wait()) so a waiting thread doesn’t burn CPU.
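One way to follow the last point: hand work over through a queue.Queue, whose get() puts the consumer thread to sleep until an item arrives, instead of polling a shared list in a loop. A sketch; the None sentinel used to stop the consumer is a common convention, not a feature of queue.Queue.

```python
import queue
import threading

STOP = None  # sentinel telling the consumer there is no more work


def consumer(q, results):
    while True:
        item = q.get()  # sleeps until an item is available; no busy-waiting
        if item is STOP:
            break
        results.append(item * 2)


def run(items):
    q = queue.Queue()
    results = []
    t = threading.Thread(target=consumer, args=(q, results))
    t.start()
    for item in items:
        q.put(item)
    q.put(STOP)
    t.join()
    return results


if __name__ == "__main__":
    print(run([1, 2, 3]))
```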
🛠 Short exercises (try these)
- Write a program that starts 2 threads, each incrementing the same global counter 1,000,000 times. Look for wrong results, then add a threading.Lock to fix it.
- Compare runtime: run a heavy CPU function in (a) two threads and (b) two processes. Measure the times; for pure-Python CPU work on a multi-core machine, processes should win.
- On Unix, write a small script that fork()s, then have the parent and child both write to the same inherited file handle. Observe the interleaving and the shared descriptor behavior.
- Practice multiprocessing with Pool.map vs manual Process creation, and measure the overhead.
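A possible starting point for the first exercise, with the iteration count reduced to keep it quick (raise N to 1,000,000 to match the exercise). On recent CPython versions the unlocked version may happen to print the correct total, because of where the interpreter chooses to switch threads; that doesn’t make it safe, and only the lock guarantees the result.

```python
import threading

N = 200_000  # per thread; the exercise uses 1,000,000
counter = 0
lock = threading.Lock()


def unsafe_increment():
    global counter
    for _ in range(N):
        counter += 1  # read, add, write: another thread can interleave in between


def safe_increment():
    global counter
    for _ in range(N):
        with lock:  # only one thread at a time runs the read-add-write
            counter += 1


def run(target):
    """Reset the counter, run `target` in two threads, and return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=target) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print("unsafe:", run(unsafe_increment), "expected", 2 * N)
    print("safe:  ", run(safe_increment), "expected", 2 * N)
```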
🔎 Interview-style summary + top questions (with short answers)
- What’s the difference between a process and a thread?
A process has its own independent address space; a thread is lighter-weight, shares its process’s address space, and has its own stack.
- What is virtual memory?
An abstraction in which each process gets its own virtual address space, mapped by page tables onto physical memory (or onto disk, for swapped-out or file-backed pages).
- Explain the stack and heap.
Stack: per-thread, LIFO call frames. Heap: dynamic memory shared by all threads of a process.
- How does fork() work and what is copy-on-write?
fork() clones the calling process. With copy-on-write, parent and child share the same physical pages until one of them writes, at which point the written page is copied.
- Why does CPython have a GIL? Pros & cons?
To simplify memory management (reference counting). Pros: a simpler C API and fast single-threaded code. Con: it limits multi-core parallelism for Python threads.
- How do threads communicate safely?
Locks, semaphores, condition variables, queues.
- How do processes share large data efficiently?
Shared memory (mmap, multiprocessing.shared_memory), memory-mapped files, or zero-copy IPC where available.
- What is a race condition and how do you avoid it?
An incorrect outcome caused by unsynchronized interleaving of operations on shared state. Avoid it with synchronization (locks), or by designing around immutable data and message passing.
- What happens to file descriptors on fork?
They are duplicated; parent and child refer to the same underlying open file description, including its file offset.
- When would you use spawn vs fork?
Use spawn (the default on Windows and macOS) for safety when your program has threads or complicated state; fork is faster on Unix but can be unsafe with threads.
✅ Final practical checklist for writing concurrent Python programs
- Choose threads for I/O, processes for CPU.
- Understand pickling/serialization costs with processes.
- Avoid forking after launching threads (or use spawn / forkserver).
- Use proper synchronization primitives for shared data.
- Prefer immutability or message passing to limit shared-state bugs.
- Use async when libraries support non-blocking I/O.
If you like, I can now:
- Walk you through the small exercises with ready-to-run Python code.
- Create visual diagrams (ASCII or simple SVG) showing memory layout and fork behavior.
- Show concrete examples of fork() vs spawn behavior on Unix vs Windows with Python code and safe patterns.
Let’s learn Multithreading, Multiprocessing, and the Global Interpreter Lock (GIL) in Python — in a way that’s both interactive and industry-focused.
By the end, you’ll not only understand them deeply but also be ready for interview questions.
🧠 1️⃣ What is Multithreading?
Definition
👉 Multithreading means running multiple threads (lightweight tasks) within the same process.
All threads share the same memory space and resources.
A thread is like a mini-program running inside your main program.
🧩 Simple Analogy
Imagine you’re cooking:
- You are the main process (the cook).
- You have two hands doing work simultaneously — threads (cutting veggies 🍅 & boiling water 💧).
Both hands share the same kitchen (memory), but they perform different tasks at the same time.
⚙️ Code Example: Multithreading
import threading
import time
def task(name):
    print(f"Starting {name}")
    time.sleep(2)  # simulated I/O wait; sleeping releases the GIL
    print(f"Finished {name}")
# Create threads
t1 = threading.Thread(target=task, args=("Task 1",))
t2 = threading.Thread(target=task, args=("Task 2",))
# Start threads
t1.start()
t2.start()
# Wait for both to finish
t1.join()
t2.join()
print("All tasks done!")
🧾 Explanation:
- Two threads start almost together.
- While one thread is waiting (sleeping), the other can work.
- Great for I/O-bound tasks like:
- Reading files
- Fetching APIs
- Waiting for user input
- Database queries
🔒 2️⃣ What is the Global Interpreter Lock (GIL)?
The GIL is a mutex (lock) in the CPython interpreter that allows only one thread to execute Python bytecode at a time — even if you have multiple CPU cores!
⚠️ So what does this mean?
- Python threads don’t truly run in parallel for CPU-bound tasks (like heavy computation).
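- They take turns holding the GIL: CPython asks the running thread to release it every few milliseconds (sys.getswitchinterval() returns 0.005 s by default), so threads are switched preemptively, but only one runs bytecode at any moment.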
That’s why multithreading ≠ parallelism in Python for CPU-heavy code.
🧩 CPU-bound example showing GIL limitation
import threading
import time
def count():
    x = 0
    for i in range(10**7):
        x += 1
start = time.time()
t1 = threading.Thread(target=count)
t2 = threading.Thread(target=count)
t1.start(); t2.start()
t1.join(); t2.join()
print("Time taken:", time.time() - start)
Now try the same with multiprocessing 👇
You’ll see the difference clearly.
⚡ 3️⃣ What is Multiprocessing?
Definition
👉 Multiprocessing means running multiple processes — each with its own memory space and Python interpreter.
So, each process gets its own GIL, allowing true parallel execution on multiple CPU cores.
🧩 Analogy
Imagine not one cook, but two separate cooks, each in their own kitchen — both can cook at the same time.
They don’t share memory (so you need communication tools like Queues or Pipes).
⚙️ Code Example: Multiprocessing
from multiprocessing import Process
import time
def count():
    x = 0
    for i in range(10**7):
        x += 1

if __name__ == "__main__":
    start = time.time()
    p1 = Process(target=count)
    p2 = Process(target=count)
    p1.start(); p2.start()
    p1.join(); p2.join()
    print("Time taken:", time.time() - start)
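✅ On a machine with at least two free CPU cores, this should finish noticeably faster than the threading version, because both processes run in true parallel (starting the processes adds a small fixed cost).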
⚙️ 4️⃣ When to Use Which?
| Scenario | Use | Why |
|---|---|---|
| 🧠 CPU-bound tasks (math, ML training, image processing) | Multiprocessing | Avoids GIL, true parallelism |
| 🌐 I/O-bound tasks (API calls, file I/O, DB queries, web scraping) | Multithreading | Overlaps waiting time, efficient |
| ⚙️ I/O-bound tasks with async-capable libraries | Async (asyncio) | Even lighter than threads, single-threaded cooperative model |
🏭 5️⃣ Real-World & Industry Use Cases
| Industry Use Case | Technique | Example |
|---|---|---|
| Web servers (FastAPI, Flask) | Multithreading / Async | Handle many requests simultaneously |
| Data scraping (requests, Selenium) | Multithreading | Fetch data from multiple URLs concurrently |
| Image / video processing | Multiprocessing | Process frames in parallel |
| Machine Learning | Multiprocessing | Run model training or feature extraction in parallel |
| ETL Pipelines (Spark, Pandas) | Multiprocessing | Parallelize heavy data transformations |
🧩 6️⃣ Combine with Async for Modern Concurrency
Example: using asyncio + aiohttp for I/O concurrency.
import aiohttp
import asyncio
async def fetch(session, url):
    async with session.get(url) as response:
        print(f"Fetched {url}")
        return await response.text()

async def main():
    urls = ["https://example.com", "https://google.com", "https://python.org"]
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        await asyncio.gather(*tasks)
asyncio.run(main())
⚙️ This runs all HTTP requests concurrently on a single event loop, without you creating any threads or processes.
💬 7️⃣ Top Interview Questions & Answers
| # | Question | Short Answer |
|---|---|---|
| 1 | What is the GIL and why does it exist? | It’s a lock in CPython that ensures only one thread executes Python bytecode at a time, simplifying memory management. |
| 2 | Does Python multithreading achieve true parallelism? | No, not for CPU-bound tasks, due to GIL. |
| 3 | How can you achieve true parallelism in Python? | Use multiprocessing — each process has its own GIL. |
| 4 | When to use multithreading vs multiprocessing? | I/O-bound → threading; CPU-bound → multiprocessing. |
| 5 | Can GIL be removed? | Not easily, since CPython’s memory management relies on it, though CPython 3.13 added an optional free-threaded build (PEP 703). Jython and IronPython have no GIL; PyPy does have one. |
| 6 | How does asyncio differ from multithreading? | asyncio uses a single thread and event loop with cooperative multitasking. |
| 7 | How do threads share data safely? | Use threading.Lock, RLock, or Queue to prevent race conditions. |
| 8 | What is a daemon thread? | A background thread that doesn’t keep the program alive: when only daemon threads remain, the program exits and they are stopped abruptly. |
| 9 | What’s the overhead of multiprocessing? | Higher — separate memory spaces, slower startup, data serialization. |
| 10 | Can multiprocessing improve I/O-bound tasks? | Usually no, threading or async is better. |
🎯 8️⃣ Mini Practice Task (Try it yourself)
Let’s make it hands-on 💪
Task:
Build a small program that:
- Uses multithreading to fetch multiple URLs (like a mini scraper).
- Then, modify it to use multiprocessing for computing SHA256 hash of large files.
I can walk you through this step-by-step as a mini project next if you want.
Perfect! 🚀 Let’s do a hands-on mini-project to compare multithreading vs multiprocessing in Python.
We’ll do it in two parts:
- I/O-bound task: Fetch multiple URLs (simulate waiting with time.sleep or make actual requests).
- CPU-bound task: Compute SHA256 of large files (simulate heavy computation).
We’ll measure time and see the difference between threading and multiprocessing.
Step 0: Setup
First, make sure you have:
pip install requests
We’ll use requests for URL fetching.
Part 1: I/O-bound Task (Multithreading vs Sequential)
We will fetch multiple URLs sequentially, then with threads.
Step 1A: Sequential URL fetching
import requests
import time
urls = [
"https://www.google.com",
"https://www.python.org",
"https://www.github.com",
]
def fetch(url):
    print(f"Fetching: {url}")
    response = requests.get(url, timeout=10)  # always set a timeout on network calls
    print(f"Done: {url} (status: {response.status_code})")

start = time.time()
for url in urls:
    fetch(url)
end = time.time()
print(f"Sequential fetch took: {end - start:.2f} seconds")
✅ This is single-threaded. Notice the time taken.
Step 1B: Multithreading URL fetching
import threading
# Reuses `urls`, `fetch`, and `time` from Step 1A (run both steps in the same script).
threads = []
start = time.time()
for url in urls:
    t = threading.Thread(target=fetch, args=(url,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
end = time.time()
print(f"Multithreaded fetch took: {end - start:.2f} seconds")
Observation:
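- Total time should be close to the slowest single request rather than the sum of all of them, because the threads wait on the network at the same time.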
- Threads share memory; I/O tasks like HTTP requests benefit the most.
Part 2: CPU-bound Task (Multiprocessing vs Threading)
We will compute SHA256 hashes of large files.
For demo, we can simulate large files with random bytes.
Step 2A: CPU task function
import hashlib
import os
def compute_sha256(size_mb):
    data = os.urandom(size_mb * 1024 * 1024)  # simulate large file
    sha = hashlib.sha256()
    sha.update(data)
    return sha.hexdigest()
Step 2B: Threading for CPU-bound
import threading
import time
start = time.time()
threads = []
for _ in range(4):  # simulate 4 files
    t = threading.Thread(target=compute_sha256, args=(50,))  # 50 MB
    t.start()
    threads.append(t)
for t in threads:
    t.join()
end = time.time()
print(f"Threading CPU task took: {end - start:.2f} seconds")
Observation:
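- The result may surprise you: hashlib releases the GIL while hashing data larger than 2047 bytes, so the hashing itself can run in parallel across threads. The GIL only serializes threads while they execute Python bytecode, so a pure-Python CPU loop shows the limitation much more clearly.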
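If you want this step to expose the GIL, the work inside each thread has to be Python bytecode rather than a C routine that releases the lock. A sketch with a deliberately slow pure-Python loop; busy_sum and the loop size are illustrative, and the timings depend entirely on your machine and on running a standard (GIL-enabled) CPython build.

```python
import threading
import time


def busy_sum(n):
    """Pure-Python CPU work: every iteration needs the GIL to execute."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def time_sequential(n, jobs=4):
    start = time.perf_counter()
    for _ in range(jobs):
        busy_sum(n)
    return time.perf_counter() - start


def time_threaded(n, jobs=4):
    start = time.perf_counter()
    threads = [threading.Thread(target=busy_sum, args=(n,)) for _ in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1_000_000
    print(f"sequential: {time_sequential(n):.2f}s")
    print(f"threaded:   {time_threaded(n):.2f}s  (roughly the same under the GIL)")
```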
Step 2C: Multiprocessing for CPU-bound
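from multiprocessing import Process
import time

# compute_sha256 (Step 2A) must be defined at module level in this same file so
# that child processes can find it. Under the spawn/forkserver start methods the
# children re-import this file, so keep the earlier steps' top-level code under
# the same guard (or move this step into its own script).
if __name__ == "__main__":
    start = time.time()
    processes = []
    for _ in range(4):  # simulate 4 files
        p = Process(target=compute_sha256, args=(50,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    end = time.time()
    print(f"Multiprocessing CPU task took: {end - start:.2f} seconds")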
Observation:
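- On a multi-core machine the work is spread across cores, since each process has its own interpreter and GIL.
- Starting processes costs time, so the gain is clearest when each task does substantial work.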
✅ Key Takeaways
- I/O-bound → Multithreading works best.
- CPU-bound → Multiprocessing works best.
- The GIL restricts CPU-bound threading of pure-Python code.
- Threads are lightweight, processes are heavier but allow parallelism.
💡 Next Step Challenge
- Modify I/O-bound example to fetch 20+ URLs concurrently using ThreadPoolExecutor.
- Modify CPU-bound example to compute hashes for 10 files using ProcessPoolExecutor.
This will mirror real-world web scraping & data processing scenarios.
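A sketch of both challenges with concurrent.futures. fetch_many accepts any fetch function so you can try it offline, and hash_many reuses the random-data approach from Step 2A. The worker counts are illustrative, and, as with Process, the process pool must be used from under the `if __name__ == "__main__":` guard.

```python
import hashlib
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fetch_status(url):
    """Blocking HTTP GET via requests (third-party, installed in Step 0)."""
    import requests  # imported lazily so the rest of the module works without it
    return url, requests.get(url, timeout=10).status_code


def fetch_many(urls, fetch=fetch_status, workers=20):
    """I/O-bound: run blocking fetches in a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))


def sha256_of_random(size_mb):
    """CPU-bound stand-in for hashing a file, as in Step 2A."""
    data = os.urandom(size_mb * 1024 * 1024)
    return hashlib.sha256(data).hexdigest()


def hash_many(sizes_mb, workers=None):
    """CPU-bound: one task per simulated file; workers=None defaults to the CPU count."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sha256_of_random, sizes_mb))
```

For example, call fetch_many(urls) with a list of 20+ URLs, and hash_many([20] * 10) for ten simulated 20 MB files from inside the `__main__` guard.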
Excellent ⚡ — this next version is where you’ll really see modern concurrency in action using asyncio + aiohttp for fetching, and ProcessPoolExecutor for CPU work.
This mirrors real production design patterns used in:
- Web crawlers
- Async ETL data pipelines
- AI data preparation
- Log aggregation systems
🧩 Project 2: “Async + Multiprocessing Hybrid Pipeline”
We’ll:
- Fetch 100 URLs asynchronously (non-blocking I/O)
- Process their content in parallel using multiprocessing
- Log timing & save results to CSV
Step 1: Install Requirements
pip install aiohttp
Step 2: Complete Code
import asyncio
import aiohttp
import hashlib
import csv
import time
from concurrent.futures import ProcessPoolExecutor
# ------------------- CONFIG -------------------
URLS = [
"https://www.google.com",
"https://www.python.org",
"https://www.github.com",
"https://www.wikipedia.org",
"https://www.stackoverflow.com",
] * 20 # 100 URLs
MAX_CONCURRENT = 20 # simultaneous async connections
PROCESS_WORKERS = 4 # CPU processes
OUTPUT_FILE = "async_url_hash_results.csv"
# ------------------- STAGE 1: ASYNC FETCH -------------------
async def fetch_url(session, url):
    """Fetch URL content asynchronously."""
    try:
        # ClientTimeout is aiohttp's documented timeout type; total=5 caps the whole request
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as resp:
            text = await resp.text()
            print(f"[Async] Fetched {url} ({len(text)} chars)")
            return url, text
    except Exception as e:
        print(f"[Async] ❌ Failed {url}: {e}")
        return url, ""
async def fetch_all(urls):
    """Fetch all URLs concurrently."""
    results = []
    start = time.time()
    connector = aiohttp.TCPConnector(limit=MAX_CONCURRENT)
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [fetch_url(session, url) for url in urls]
        for coro in asyncio.as_completed(tasks):
            results.append(await coro)
    print(f"\n✅ Async fetch completed in {time.time() - start:.2f}s\n")
    return results

# ------------------- STAGE 2: MULTIPROCESS HASH -------------------
def compute_hash(item):
    """CPU-bound: Compute SHA256 hash."""
    url, text = item
    if not text:
        return url, "FAILED"
    sha = hashlib.sha256(text.encode("utf-8")).hexdigest()
    print(f"[Process] {url} → {sha[:10]}...")
    return url, sha

def process_hashes(data):
    """Run hashing in parallel using multiprocessing."""
    start = time.time()
    results = []
    with ProcessPoolExecutor(max_workers=PROCESS_WORKERS) as executor:
        for res in executor.map(compute_hash, data):
            results.append(res)
    print(f"\n✅ Hashing completed in {time.time() - start:.2f}s\n")
    return results

# ------------------- STAGE 3: SAVE RESULTS -------------------
def save_results(results):
    with open(OUTPUT_FILE, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["URL", "SHA256"])
        writer.writerows(results)
    print(f"📁 Results saved to {OUTPUT_FILE} ({len(results)} records)\n")

# ------------------- PIPELINE DRIVER -------------------
async def main():
    print("🚀 Starting Async + Multiprocessing Pipeline...\n")
    fetched = await fetch_all(URLS)
    processed = process_hashes(fetched)
    save_results(processed)
    print("🏁 Pipeline finished successfully!")

if __name__ == "__main__":
    asyncio.run(main())
⚙️ How It Works
| Stage | Technique | Parallelism Type | Purpose |
|---|---|---|---|
| Fetch | aiohttp + asyncio | Concurrent (non-blocking I/O) | Many requests at once |
| Hash | ProcessPoolExecutor | Parallel (true multi-core) | CPU-heavy hashing |
| Save | Sequential | N/A | Lightweight I/O |
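In main(), process_hashes() is a plain blocking call, which is harmless here because all fetching has finished by the time it runs. If you later want the event loop to keep working while hashing runs (for example, to fetch the next batch), one option is loop.run_in_executor(). A sketch; hash_in_executor and demo are illustrative names, and compute_hash is copied from the pipeline (minus its print) so the block runs on its own.

```python
import asyncio
import hashlib
from concurrent.futures import Executor, ThreadPoolExecutor


def compute_hash(item):
    """Same logic as the pipeline's compute_hash, without the print."""
    url, text = item
    if not text:
        return url, "FAILED"
    return url, hashlib.sha256(text.encode("utf-8")).hexdigest()


async def hash_in_executor(items, executor: Executor):
    """Hand each item to the executor and await the results without blocking the loop."""
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(executor, compute_hash, item) for item in items]
    return await asyncio.gather(*futures)  # results keep input order


async def demo():
    items = [("https://example.com", "hello"), ("https://example.org", "")]
    # Swap in a ProcessPoolExecutor (created under the __main__ guard) for real CPU
    # parallelism; a ThreadPoolExecutor keeps this quick local test simple.
    with ThreadPoolExecutor(max_workers=2) as pool:
        return await hash_in_executor(items, pool)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```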
💡 Performance Comparison
| Type | Used For | Concurrency Level | Benefit |
|---|---|---|---|
| Threading | I/O-bound tasks | ~10–100 threads | Simple, limited by GIL |
| Asyncio | I/O-bound tasks | Thousands of coroutines | Non-blocking, lightweight |
| Multiprocessing | CPU-bound tasks | # of cores | True parallelism |
🏭 Industry Use Case Example
| Domain | Use | Example |
|---|---|---|
| Web scraping | Fetch & parse hundreds of sites | News or e-commerce crawlers |
| Data engineering | Download data → process JSON/XML | ETL pipelines |
| AI/ML preprocessing | Load & hash documents | Chunking + embedding |
| Monitoring systems | Collect metrics concurrently | Async polling of APIs |
🧠 Interview Talking Points
| # | Question | Key Answer |
|---|---|---|
| 1 | Why use asyncio instead of threading? | It’s lighter and scales to thousands of I/O tasks efficiently. |
| 2 | Does asyncio give true parallelism? | No, it’s cooperative concurrency — only one task runs at a time, but switches rapidly during waits. |
| 3 | Why combine asyncio with multiprocessing? | Async handles I/O concurrency, multiprocessing handles CPU parallelism. |
| 4 | What’s a coroutine? | A special async function that can be paused (await) and resumed without blocking. |
| 5 | When is ThreadPoolExecutor still useful? | For blocking I/O tasks (like requests or legacy libraries) that aren’t async-aware. |
| 6 | What are common asyncio performance tunings? | Increase connection limits, use uvloop, handle retries, or batch async tasks. |
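Two of these tunings, a concurrency cap and retries, can be sketched without any HTTP library: an asyncio.Semaphore limits how many coroutines are inside the "request" at once, and a small loop retries with an exponentially growing delay. flaky_request is a hypothetical stand-in for a real call such as session.get(), and the limit, retry count and delays are arbitrary.

```python
import asyncio

LIMIT = 5  # max coroutines inside the request section at once


async def flaky_request(i, attempts):
    """Hypothetical request: fails on the first try for every third id."""
    attempts[i] = attempts.get(i, 0) + 1
    await asyncio.sleep(0.01)  # simulated network latency
    if i % 3 == 0 and attempts[i] == 1:
        raise ConnectionError(f"transient failure for {i}")
    return i * 10


async def with_retries(coro_fn, retries=3, base_delay=0.01):
    """Await coro_fn(), retrying on ConnectionError with exponential backoff."""
    for attempt in range(retries):
        try:
            return await coro_fn()
        except ConnectionError:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)


async def run(n=20):
    sem = asyncio.Semaphore(LIMIT)
    attempts = {}
    in_flight = 0
    peak = 0

    async def bounded(i):
        nonlocal in_flight, peak
        async with sem:  # at most LIMIT coroutines get past this line at once
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await with_retries(lambda: flaky_request(i, attempts))
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(bounded(i) for i in range(n)))
    return results, peak, attempts


if __name__ == "__main__":
    results, peak, attempts = asyncio.run(run())
    print("results:", results)
    print("peak in flight:", peak)
```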