Excellent — that’s one of the most important and misunderstood questions about Python concurrency.
Let’s go deep but keep it intuitive — by actually visualizing how the GIL behaves inside CPython when you run multiple threads, and why CPU-bound threads never run truly in parallel, while I/O threads seem to do so.
🧩 Step 1: What is the GIL really?
The Global Interpreter Lock (GIL) is a mutex (lock) that protects the CPython interpreter’s internal state — mainly the object memory manager and reference counts.
In short:
👉 Only one thread can execute Python bytecode at any given time in a single CPython process.
🧱 Step 2: Why does it exist?
Python objects use reference counting for garbage collection:
    x = [1, 2, 3]
    y = x
    del x  # refcount decreases
Each assignment, deletion, and many operations increase/decrease reference counts.
If two threads modified x’s refcount simultaneously without synchronization, memory corruption or crashes could happen.
Rather than adding fine-grained locks to every object (which would be slow and complex), CPython designers added one global lock to ensure only one thread touches Python objects at a time.
So, the GIL makes CPython safe but not fully parallel.
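To make the reference counting above tangible, here is a small sketch using `sys.getrefcount` from the standard library. The helper name `show_refcounts` is made up for this illustration, and the absolute numbers are a CPython implementation detail (the call itself can add a temporary reference), so only the differences between the readings matter.

```python
import sys

def show_refcounts():
    """Return the refcount of a list before and after binding a second name."""
    x = [1, 2, 3]
    before = sys.getrefcount(x)     # absolute value is an implementation detail
    y = x                           # a second name now points at the same list
    after = sys.getrefcount(x)
    del y                           # dropping the name lowers the count again
    after_del = sys.getrefcount(x)
    return before, after, after_del

before, after, after_del = show_refcounts()
print("before:", before, "| after y = x:", after, "| after del y:", after_del)
```

Every one of those increments and decrements is a write to shared state, which is what the GIL serializes.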
⚙️ Step 3: What does the GIL do to threads?
Every Python thread must acquire the GIL before executing Python bytecode.
If a thread doesn’t hold the GIL, it can’t execute Python code — it just waits.
Internally:
- A thread gets the GIL.
- Runs bytecode for a short time slice.
- Releases the GIL when it blocks (for example on I/O) or when another thread has been waiting longer than the switch interval. Older versions (Python 2) counted bytecode instructions instead.
- Another waiting thread may acquire it next.
💡 In CPython (3.2 and later):
- A running thread is asked to give up the GIL after 5 milliseconds by default if another thread is waiting (configurable with sys.setswitchinterval()).
- So threads take turns running small chunks of bytecode sequentially, not simultaneously.
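As a quick sketch of the setting mentioned above, the snippet below reads and temporarily changes the switch interval with `sys.getswitchinterval()` and `sys.setswitchinterval()`. The helper `with_switch_interval` is a hypothetical name used only here, and the value 0.01 is an arbitrary choice.

```python
import sys

def with_switch_interval(seconds):
    """Temporarily set the switch interval and return (old, new) values."""
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    new = sys.getswitchinterval()
    sys.setswitchinterval(old)      # restore the previous setting
    return old, new

old, new = with_switch_interval(0.01)
print("interval before:", old)      # typically 0.005 on CPython
print("temporary interval:", new)
```

A larger interval means fewer forced switches (less overhead, worse responsiveness); a smaller one does the opposite. It does not make threads run in parallel.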
🔥 Step 4: Why CPU-bound threads don’t run in parallel
Let’s simulate mentally 👇
CPU-bound task example:
    import threading, time

    def cpu_task():
        start = time.time()
        total = 0
        for i in range(10**8):
            total += i*i
        print("Done in:", time.time() - start)

    threads = [threading.Thread(target=cpu_task) for _ in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
What happens internally:
- Both threads start, but only one thread can hold the GIL at a time.
- So thread A runs bytecode for ~5 ms → releases GIL → thread B acquires → runs 5 ms → releases → repeat.
- The CPU context switches frequently between them.
- Result: total runtime ≈ the time to run both tasks back to back in one thread, sometimes slower (due to switching overhead).
🧠 CPU utilization:
Even on an 8-core CPU, your program uses only one core effectively because only one thread executes Python code at once.
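To check this on your own machine, a minimal sketch is to time the same pure-Python loop run twice in a row and then in two threads. The function names and the workload size `n` are assumptions for this illustration; on a standard (GIL) build of CPython the two timings should come out close to each other.

```python
import threading
import time

def cpu_task(n):
    """Pure-Python busy loop; returns the sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_sequential(n, repeats):
    """Run cpu_task 'repeats' times in the current thread and time it."""
    start = time.perf_counter()
    for _ in range(repeats):
        cpu_task(n)
    return time.perf_counter() - start

def run_threaded(n, repeats):
    """Run cpu_task in 'repeats' threads at once and time it."""
    threads = [threading.Thread(target=cpu_task, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 2_000_000  # assumed workload size; raise it for a clearer signal
    print("sequential:", round(run_sequential(n, 2), 2), "s")
    print("threaded:  ", round(run_threaded(n, 2), 2), "s")
```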
🌊 Step 5: When can threads actually run concurrently?
When they are not executing Python bytecode, i.e., when the GIL is released.
Many C extensions and I/O operations release the GIL internally while they wait (e.g., reading from disk, network, or sleeping).
Example:
    import threading, requests, time

    def io_task(url):
        requests.get(url)

    start = time.time()
    threads = [threading.Thread(target=io_task, args=("https://httpbin.org/delay/2",)) for _ in range(5)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("Total:", time.time() - start)
Each thread spends most of its time waiting for network I/O.
During those waits, the C code in CPython's socket module (which requests reaches through urllib3) releases the GIL, so other threads can run.
✅ Result: 5 threads fetching in parallel (I/O overlapped).
⚠️ But note: the actual Python code inside each thread still takes turns with the GIL.
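If you want to see the same overlap without a network connection, here is a minimal sketch that uses `time.sleep` as a stand-in for a blocking I/O call (it also releases the GIL while waiting). The names `fake_io` and `run_overlapped` are invented for this example.

```python
import threading
import time

def fake_io(seconds):
    """Stand-in for a blocking network call; time.sleep releases the GIL."""
    time.sleep(seconds)

def run_overlapped(count, seconds):
    """Start 'count' waiting threads at once and return the total wall time."""
    threads = [threading.Thread(target=fake_io, args=(seconds,)) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = run_overlapped(5, 0.2)
print(f"5 waits of 0.2s finished in about {elapsed:.2f}s")
```

The waits overlap, so the total stays close to one wait rather than five.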
🧠 Step 6: How does CPython decide when to switch threads?
Internally, CPython (3.2 and later) uses a timer rather than a bytecode counter.
- A thread waiting for the GIL waits for up to the switch interval (sys.getswitchinterval(), 5 ms by default).
- If the GIL is still held when that wait times out, the waiting thread sets a "please drop the GIL" request flag.
- The running thread checks this flag regularly while executing bytecode and releases the GIL once it is set.
So thread switching happens roughly every 5 ms (default) when several threads want to run.
But actual switching depends on the OS scheduler too — another layer of unpredictability.
🔄 Step 7: Visualization (simplified timeline)
Imagine 2 threads (A & B) on a dual-core CPU:
Time (ms) Thread A Thread B
-------------------------------------------------
0-5 🟢 Runs (holds GIL) ⚪ Waiting
5-10 ⚪ Waiting 🟢 Runs (holds GIL)
10-15 🟢 Runs (holds GIL) ⚪ Waiting
15-20 ⚪ Waiting 🟢 Runs (holds GIL)
...
Even though both cores exist, only one executes Python bytecode at a time (🟢 = active thread).
💪 Step 8: How to bypass GIL for CPU-bound code
- Use multiprocessing
  - Each process has its own GIL and interpreter.
  - True parallel CPU execution on multiple cores.

        from multiprocessing import Pool

        def cpu_task(n):
            # n is unused; it only gives map() one item per task
            total = 0
            for i in range(10**8):
                total += i*i
            return total

        if __name__ == "__main__":
            with Pool(2) as p:
                p.map(cpu_task, [1, 2])

- Use NumPy or C extensions
  - Many NumPy operations are implemented in C and release the GIL while they run.
  - So heavy numeric computations can utilize multiple cores internally.
- Use alternative interpreters
  - PyPy: JIT-compiled, often faster for pure-Python loops, but it still has a GIL.
  - Jython, IronPython: no GIL (but different ecosystems).
🧩 Step 9: Summary table
| Type of Workload | Recommended | Why |
|---|---|---|
| CPU-bound (e.g. math loops) | multiprocessing | Each process has its own GIL |
| I/O-bound (e.g. API calls, file reads) | threading or asyncio | Threads can release GIL during I/O |
| Mixed | Combine both | e.g. worker processes with threads inside |
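The table mentions asyncio as an alternative for I/O-bound work. As a rough sketch of that option, the snippet below overlaps five simulated requests on a single thread. `fake_request` and `fetch_all` are hypothetical names, and `asyncio.sleep` stands in for a real awaitable network call.

```python
import asyncio
import time

async def fake_request(name, delay):
    """Stand-in for an awaitable network call."""
    await asyncio.sleep(delay)   # yields control to the event loop while waiting
    return name

async def fetch_all(delay):
    # gather() runs the coroutines concurrently on one thread
    # and returns their results in the order they were passed in
    return await asyncio.gather(*(fake_request(f"req-{i}", delay) for i in range(5)))

start = time.perf_counter()
results = asyncio.run(fetch_all(0.2))
elapsed = time.perf_counter() - start
print(results, f"in about {elapsed:.2f}s")
```

Here there is only one thread, so the GIL is never contended; the overlap comes entirely from waiting.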
🧠 Step 10: Top Interview Questions (on GIL)
| # | Question | Key Idea |
|---|---|---|
| 1 | What is the GIL and why does it exist? | A lock ensuring one thread executes bytecode at a time for memory safety. |
| 2 | Does the GIL prevent true parallelism? | Yes, for CPU-bound tasks in CPython. |
| 3 | How can Python still do concurrent I/O? | I/O operations release the GIL while waiting. |
| 4 | What’s the difference between threading and multiprocessing regarding GIL? | Threads share one GIL; processes each have their own. |
| 5 | How often does Python switch threads? | Roughly every 5ms (configurable). |
| 6 | How can you remove the GIL? | Not in a standard build; use multiprocessing or C extensions that release it. CPython 3.13 and later also offer an optional free-threaded build (PEP 703). |
| 7 | What happens if a C extension doesn’t release the GIL? | It blocks all other threads during its execution. |
| 8 | How can you measure GIL impact? | Run CPU-bound code in 1 vs multiple threads; check CPU utilization. |
| 9 | Why can’t the GIL just be removed? | CPython’s memory model (reference counting) isn’t thread-safe without it. |
| 10 | How do libraries like NumPy overcome GIL limits? | They run heavy computations in C and release the GIL while doing so. |
💯 Excellent question — and you’ve hit the core confusion most developers have when learning Python concurrency.
Let’s clarify this precisely, with examples and analogies, so you’ll never be confused again.
🧠 Short answer
✅ Threads in Python can run concurrently (overlap in time), but cannot execute Python code in parallel because of the GIL (Global Interpreter Lock).
✅ Processes, on the other hand, can run both concurrently and truly in parallel on multiple CPU cores, because each process has its own Python interpreter and its own GIL.
⚙️ Step-by-step conceptual difference
| Term | Meaning | Python Threading | Python Multiprocessing |
|---|---|---|---|
| Concurrency | Tasks appear to run simultaneously (overlapping in time) | ✅ Yes (threads take turns quickly) | ✅ Yes |
| Parallelism | Tasks actually run simultaneously on different cores | ❌ No (GIL prevents it) | ✅ Yes (each process has own GIL) |
| Best for | I/O-bound (waiting) tasks | ✅ | 🚫 |
| Best for | CPU-bound (computation) tasks | 🚫 | ✅ |
🧩 Step 1: Concurrency vs Parallelism (Everyday Analogy)
Imagine a kitchen 🍳:
- 👩‍🍳 Threading → One cook (the CPU) juggling multiple dishes (tasks).
  Only one dish is actively being cooked at a time, but the cook switches rapidly between dishes: chopping, stirring, etc.
  It feels like multiple dishes are progressing at once (concurrent), but only one is cooked at any exact moment (no parallelism).
- 👩‍🍳👨‍🍳 Multiprocessing → Multiple cooks (CPU cores), each cooking their own dish independently.
  True parallel cooking: several dishes are really being cooked at the same time.
🧱 Step 2: Why threads cannot run Python code in parallel
Because of the GIL, only one thread in a CPython interpreter can hold the lock and execute Python bytecode at a time.
Even if you have a 16-core CPU, the GIL ensures only one thread’s Python code runs at any given instant.
However:
- When a thread calls a blocking I/O operation (like socket.recv() or a file read), it releases the GIL so that another thread can run while it’s waiting for I/O.
That’s why I/O threads can overlap, but CPU-heavy threads can’t.
🧩 Step 3: How it looks under the hood
Example: two CPU-bound threads
    import threading, time

    def cpu_task():
        start = time.time()
        total = 0
        for i in range(10**8):
            total += i*i
        print("Done in:", time.time() - start)

    t1 = threading.Thread(target=cpu_task)
    t2 = threading.Thread(target=cpu_task)
    start = time.time()
    t1.start(); t2.start()
    t1.join(); t2.join()
    print("Total time:", time.time() - start)
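Illustrative output on a 4-core CPU (exact numbers vary by machine):
    Done in: 16.4
    Done in: 16.5
    Total time: 16.5 ← no speed-up!
Each thread's own timer also reads about 16 s, because both threads start together and share the GIL for the whole run (one task alone takes about 8 s).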
👉 They took turns running because the GIL only lets one run at a time.
So even with multiple cores, it behaves like single-threaded execution.
🧩 Step 4: Now use multiprocessing
    from multiprocessing import Process
    import time

    def cpu_task():
        start = time.time()
        total = 0
        for i in range(10**8):
            total += i*i
        print("Done in:", time.time() - start)

    if __name__ == "__main__":
        p1 = Process(target=cpu_task)
        p2 = Process(target=cpu_task)
        start = time.time()
        p1.start(); p2.start()
        p1.join(); p2.join()
        print("Total time:", time.time() - start)
Output:
Done in: 8.2
Done in: 8.3
Total time: 8.4 ← almost 2x faster!
✅ Each process has its own interpreter and its own GIL,
so they truly run in parallel on different cores.
🧩 Step 5: When threads do help (I/O-bound case)
    import threading, time, requests

    def fetch(url):
        requests.get(url)
        print("Fetched:", url)

    urls = ["https://httpbin.org/delay/2"] * 5
    threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
    start = time.time()
    for t in threads: t.start()
    for t in threads: t.join()
    print("Total time:", time.time() - start)
Each call takes ~2 seconds, but total time ≈ 2–2.5s, not 10s.
✅ Because while one thread is waiting for the network, it releases the GIL,
letting others do their work.
That’s true concurrency — even though not parallel CPU execution.
🧩 Step 6: Quick internal timeline (CPU vs I/O threads)
CPU-bound threads:
t=0ms Thread A holds GIL, executes
t=5ms Thread B gets GIL, executes
t=10ms Thread A again...
(take turns; the thread without the GIL waits, so the extra cores sit idle)
I/O-bound threads:
t=0ms Thread A requests network, releases GIL
t=1ms Thread B starts, requests network, releases GIL
t=2ms Thread C runs...
All waiting → network responds → they resume
→ They overlap in waiting (true concurrency).
🧩 Step 7: Key takeaway summary
| Concept | Python Threading | Python Multiprocessing |
|---|---|---|
| Runs on multiple cores? | ❌ No (GIL limits it) | ✅ Yes |
| True parallelism? | ❌ | ✅ |
| Useful for I/O-bound? | ✅ Excellent | Moderate |
| Useful for CPU-bound? | ❌ | ✅ |
| Memory shared? | ✅ Yes | ❌ Separate (unless shared memory used) |
| Lightweight creation? | ✅ | ❌ Heavier |
| Communication | Shared memory, locks | Queues, Pipes |
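The "Communication" row can be sketched for the multiprocessing side as follows: each process sends its result back through a `multiprocessing.Queue` instead of touching shared variables. The names `sum_squares` and `worker` and the inputs 10 and 20 are assumptions made for this illustration.

```python
from multiprocessing import Process, Queue

def sum_squares(n):
    """Sum of i*i for i in range(n)."""
    return sum(i * i for i in range(n))

def worker(n, out_queue):
    # Each process computes independently and sends its result back.
    out_queue.put((n, sum_squares(n)))

if __name__ == "__main__":
    out_queue = Queue()
    procs = [Process(target=worker, args=(n, out_queue)) for n in (10, 20)]
    for p in procs:
        p.start()
    # Read the results before join() so children never block on a full pipe.
    results = dict(out_queue.get() for _ in procs)
    for p in procs:
        p.join()
    print(results)
```

Results may arrive in any order, which is why they are keyed by their input here.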
🧠 Analogy (final wrap)
| Type | Analogy | Result |
|---|---|---|
| Threading | One worker rapidly switching between tasks (only one active) | Concurrency only |
| Multiprocessing | Many workers doing tasks at once | True parallelism |
✅ So in short:
🧩 Threads can run concurrently, not in parallel.
🧩 Processes can run both concurrently and in parallel.
    '''
    Write a program that starts 2 threads incrementing the same
    global counter 1,000,000 times each.
    Observe wrong results; add a threading.Lock to fix it.
    '''
    import threading
    import time
    global x
    x=0
    def increment_count(name,x):
        x=x+1000000
        time.sleep(4)
        print(f"{name} the Value of X: {x}")
    thr1=threading.Thread(target=increment_count,args=("Task1",x))
    thr2=threading.Thread(target=increment_count,args=("Task2",x))
    thr1.start()
    thr2.start()
    thr1.join()
    thr2.join()
Is this correct? Please show.
Good question 👍 — you’re on the right track, but your code isn’t correctly modifying the shared global variable x.
Let’s walk through why and then I’ll show you the fixed and Lock-protected version.
🚫 What’s wrong in your code
Here’s your code’s core issue:
    def increment_count(name, x):
        x = x + 1000000
- When you pass x as an argument, the parameter x becomes a new local name bound to the same integer object.
- Integers are immutable, so x = x + 1000000 creates a new integer and rebinds only the local name. The local x shadows the global one, so you’re not actually modifying the global variable.
Even with threading, you’re just incrementing a local copy of x, not the shared x.
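Here is a tiny sketch of that difference, with no threads involved. The function names `rebind_parameter` and `update_global` are made up for this example.

```python
x = 0

def rebind_parameter(x):
    # 'x' here is a local name; this assignment never touches the global.
    x = x + 1_000_000
    return x

def update_global():
    global x            # tells Python to assign to the module-level name
    x = x + 1_000_000

local_result = rebind_parameter(x)
after_rebind = x        # the global is unchanged
update_global()
after_global = x        # the global was updated this time
print(local_result, after_rebind, after_global)  # prints: 1000000 0 1000000
```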
✅ Correct version without Lock (to show race condition)
Let’s first fix the global access but not use a lock yet — to see wrong results.
    import threading
    import time

    x = 0  # shared variable

    def increment_count():
        global x
        for _ in range(1000000):
            x += 1  # critical section

    # Create 2 threads
    t1 = threading.Thread(target=increment_count)
    t2 = threading.Thread(target=increment_count)
    # Start them
    t1.start()
    t2.start()
    # Wait for both to finish
    t1.join()
    t2.join()
    print("Final value of x (without lock):", x)
⚙️ Expected Output (without lock)
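You’d expect x = 2,000,000
But you may get something like:
    Final value of x (without lock): 1743821
The exact number changes from run to run. On newer CPython versions (3.10 and later) this particular loop often happens to print 2000000, because thread switches are checked at fewer points, but the code is still not thread-safe.
💡 Why?
Because the two threads interleave their read-modify-write steps on x, one thread can overwrite the other’s update: a race condition.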
✅ Correct version with Lock (to fix the race)
Now let’s protect the shared section using threading.Lock():
    import threading
    import time

    x = 0
    lock = threading.Lock()

    def increment_count():
        global x
        for _ in range(1000000):
            with lock:  # acquire and release automatically
                x += 1

    t1 = threading.Thread(target=increment_count)
    t2 = threading.Thread(target=increment_count)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    print("Final value of x (with lock):", x)
✅ Expected Output (with Lock)
Final value of x (with lock): 2000000
Perfect ✅ — now every increment is protected.
🧠 Why Lock fixes it
- Without Lock → both threads do:
  - read x
  - add 1
  - write x
  - (but get interrupted in between!)
- With Lock → only one thread can enter the with lock: block at a time, so the shared value isn’t corrupted.
⚡ Optional: To see the GIL effect
Even though Python has a Global Interpreter Lock (GIL), it doesn’t prevent such race conditions, because:
- The GIL only ensures that one thread executes bytecode at a time; it does not make a multi-step statement atomic.
- x += 1 compiles to several bytecode instructions (load, add, store), and a thread switch can happen between them, so the updates of two threads can interleave.
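You can inspect those separate steps yourself with the standard `dis` module. This is only a sketch: the names `counter` and `increment` are invented here, and the exact opcode names differ between CPython versions, so the output is not shown.

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

# Collect the opcode names the function compiles to (exact names vary by version).
opnames = [ins.opname for ins in dis.get_instructions(increment)]
print(opnames)
```

In the printed list, the load of the global and the store back to it are separate instructions, with the addition in between.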
If you want to actually see parallelism (true parallel CPU execution), you need multiprocessing, not multithreading.
Would you like me to show the same program using multiprocessing, where both CPU cores really work in parallel (no GIL restriction)?
Perfect 👍
Let’s now see how multiprocessing solves both the GIL limitation and the shared memory problem.
🧩 Step 1: Understanding Why Multiprocessing Works
🧠 Remember:
- In multithreading, threads share the same memory space (same process) — affected by GIL.
- In multiprocessing, each process has its own Python interpreter and memory space — so no GIL conflict.
So, while threads take turns executing on one CPU, processes run truly in parallel on multiple CPUs.
⚙️ Step 2: Example — Same Task Using multiprocessing
We’ll run 2 processes that each increment a shared counter 1,000,000 times.
🧪 Code Without Lock (race condition between processes)
    from multiprocessing import Process, Value
    import time

    def increment_count(x):
        for _ in range(1000000):
            x.value += 1  # not safe yet

    if __name__ == "__main__":
        x = Value('i', 0)  # shared integer between processes
        p1 = Process(target=increment_count, args=(x,))
        p2 = Process(target=increment_count, args=(x,))
        start = time.time()
        p1.start()
        p2.start()
        p1.join()
        p2.join()
        print("Final value of x (without lock):", x.value)
        print("Time taken:", time.time() - start)
🔎 Expected Output
Final value of x (without lock): 1834271
Same race problem — both processes modify x at the same time.
✅ Step 3: Fix It with Lock (Safe Version)
    from multiprocessing import Process, Value, Lock
    import time

    def increment_count(x, lock):
        for _ in range(1000000):
            with lock:
                x.value += 1

    if __name__ == "__main__":
        x = Value('i', 0)
        lock = Lock()
        p1 = Process(target=increment_count, args=(x, lock))
        p2 = Process(target=increment_count, args=(x, lock))
        start = time.time()
        p1.start()
        p2.start()
        p1.join()
        p2.join()
        print("Final value of x (with lock):", x.value)
        print("Time taken:", time.time() - start)
✅ Output
Final value of x (with lock): 2000000
Now both processes safely update the shared value.
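As a variation, a `Value` created with the default `lock=True` already carries its own lock, reachable through `get_lock()`, so a separate `Lock` object is optional. The sketch below assumes a smaller loop count (100,000) and adds a `times` parameter purely to keep the run short.

```python
from multiprocessing import Process, Value

def increment_count(x, times):
    for _ in range(times):
        with x.get_lock():      # the lock that Value creates by default
            x.value += 1

if __name__ == "__main__":
    x = Value('i', 0)           # lock=True is the default
    procs = [Process(target=increment_count, args=(x, 100_000)) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("Final value:", x.value)
```

Note that the built-in lock only guards each individual read or write of `x.value`; the read-modify-write in `x.value += 1` still needs the explicit `with x.get_lock():` around it.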
🚀 Step 4: Observe True Parallelism
In this multiprocessing version:
- Both processes run on separate CPU cores.
- The Global Interpreter Lock (GIL) does not apply — each process has its own Python interpreter.
- CPU utilization = near 200% (if you check with Task Manager or htop).
You can test the time difference by timing the same CPU-bound function run twice in a row versus in two processes. With two free cores, the two-process run should take roughly half as long.
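As a minimal sketch of measuring the parallel speed-up, the code below times a CPU-bound function run twice in a row and then in two processes. The function names and the workload size `n` are assumptions for this illustration, and the actual ratio depends on your machine and on how many cores are free.

```python
import time
from multiprocessing import Process

def cpu_task(n):
    """Pure-Python busy loop; returns the sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed_sequential(n, repeats):
    """Run cpu_task 'repeats' times in this process and time it."""
    start = time.perf_counter()
    for _ in range(repeats):
        cpu_task(n)
    return time.perf_counter() - start

def timed_processes(n, repeats):
    """Run cpu_task in 'repeats' separate processes and time it."""
    procs = [Process(target=cpu_task, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 5_000_000  # assumed workload size
    print("sequential:", round(timed_sequential(n, 2), 2), "s")
    print("processes: ", round(timed_processes(n, 2), 2), "s")
```

Process start-up has a cost, so very small workloads may show no gain at all.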
🧭 When to Use What
| Type | Shared Memory | Parallelism | GIL Affected? | Best For |
|---|---|---|---|---|
| Multithreading | ✅ Yes | ❌ No (I/O only) | ✅ Yes | I/O-bound tasks (web requests, DB, file I/O) |
| Multiprocessing | ❌ No (isolated) | ✅ Yes (real parallelism) | ❌ No | CPU-bound tasks (math, ML, image processing) |
🏭 Real Industry Uses
| Use Case | Typical Approach |
|---|---|
| Web crawlers, API calls, DB queries | Multithreading / AsyncIO |
| Image processing, ML model training | Multiprocessing / GPU parallelism |
| ETL pipelines, data transformation | Multiprocessing (PySpark, Dask) |
| Logging, monitoring tools | Multithreading |
🎯 Top Interview Questions
| Question | Hint/Expected Answer |
|---|---|
| What is the GIL and why does it exist? | GIL ensures one thread executes Python bytecode at a time for memory safety in CPython. |
| Why doesn’t multithreading speed up CPU-bound code in Python? | Because of the GIL. Only one thread can execute Python code at a time. |
| How can you achieve real parallelism in Python? | Use multiprocessing, not threading. |
| How does multiprocessing communicate between processes? | Through Queue, Pipe, shared memory (Value, Array), or proxy objects from a Manager. |
| What are I/O-bound vs CPU-bound tasks? | I/O-bound waits on external resources; CPU-bound uses heavy computation. |
| What’s the difference between threading.Lock and multiprocessing.Lock? | Both prevent concurrent access, but one is for threads (shared memory), the other for processes (shared state). |
| How can you share data between processes safely? | Use multiprocessing.Manager() or shared memory objects (Value, Array). |