Excellent — that’s one of the most important and misunderstood questions about Python concurrency.

Let’s go deep but keep it intuitive — by actually visualizing how the GIL behaves inside CPython when you run multiple threads, and why CPU-bound threads never run truly in parallel, while I/O threads seem to do so.


🧩 Step 1: What is the GIL really?

The Global Interpreter Lock (GIL) is a mutex (lock) that protects the CPython interpreter’s internal state — mainly the object memory manager and reference counts.

In short:

👉 Only one thread can execute Python bytecode at any given time in a single CPython process.


🧱 Step 2: Why does it exist?

Python objects use reference counting for garbage collection:

x = [1, 2, 3]
y = x
del x  # refcount decreases

Each assignment, deletion, and many operations increase/decrease reference counts.
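To make that bookkeeping visible, here is a small sketch using sys.getrefcount. The helper name refcount_steps is made up for illustration, and the absolute numbers depend on the CPython version (the call itself may hold a temporary reference), so only the differences are meaningful.

```python
import sys

def refcount_steps():
    """Return the refcount of a list before aliasing, while aliased, and after del."""
    data = [1, 2, 3]
    before = sys.getrefcount(data)       # baseline (absolute value varies by version)
    alias = data                         # a second name bound to the same list
    with_alias = sys.getrefcount(data)   # one more reference than the baseline
    del alias                            # drop the second name again
    after_del = sys.getrefcount(data)    # back to the baseline
    return before, with_alias, after_del

before, with_alias, after_del = refcount_steps()
print("extra references while aliased:", with_alias - before)
print("back to baseline after del:", after_del == before)
```

Every one of those count changes is a write to the object's header, which is exactly the kind of update that needs protection when several threads are involved.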

If two threads modified x’s refcount simultaneously without synchronization, memory corruption or crashes could happen.
Rather than adding fine-grained locks to every object (which would be slow and complex), CPython designers added one global lock to ensure only one thread touches Python objects at a time.

So, the GIL makes CPython safe but not fully parallel.


⚙️ Step 3: What does the GIL do to threads?

Every Python thread must acquire the GIL before executing Python bytecode.
If a thread doesn’t hold the GIL, it can’t execute Python code — it just waits.

Internally:

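  1. A thread gets the GIL.
  2. It runs Python bytecode for a short time slice.
  3. When another thread is waiting, the running thread is asked to release the GIL once the switch interval has elapsed. It also releases the GIL voluntarily around blocking calls such as I/O.
  4. Another waiting thread may acquire it next.

💡 In CPython (3.2 and later):

  • The switch interval defaults to 5 milliseconds (read it with sys.getswitchinterval(), change it with sys.setswitchinterval()).
  • Older releases, including Python 2, switched after a fixed number of bytecode "ticks" instead of a time interval.
  • So threads take turns running small chunks of bytecode sequentially, not simultaneously.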
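
A quick way to inspect and change the interval, as a minimal sketch. The helper name try_switch_interval is hypothetical; it restores the previous setting before returning so the rest of the program is unaffected.

```python
import sys

def try_switch_interval(seconds):
    """Temporarily set the switch interval; return (previous, observed) values."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        observed = sys.getswitchinterval()
    finally:
        sys.setswitchinterval(previous)  # always restore the original setting
    return previous, observed

previous, observed = try_switch_interval(0.01)
print("interval before the change (s):", previous)
print("interval while changed (s):", observed)
```

A longer interval means fewer hand-offs between CPU-bound threads; a shorter one makes threads more responsive to each other at the cost of more switching.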

🔥 Step 4: Why CPU-bound threads don’t run in parallel

Let’s simulate mentally 👇

CPU-bound task example:

import threading, time

def cpu_task():
    start = time.time()
    total = 0
    for i in range(10**8):
        total += i*i
    print("Done in:", time.time() - start)

threads = [threading.Thread(target=cpu_task) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()

What happens internally:

  • Both threads start, but only one thread can hold the GIL at a time.
  • So thread A runs bytecode for ~5 ms → releases GIL → thread B acquires → runs 5 ms → releases → repeat.
  • The CPU context switches frequently between them.
  • Result: total runtime ≈ the same as running the two tasks one after another on a single thread, sometimes slower (due to switching overhead).

🧠 CPU utilization:
Even on an 8-core CPU, your program uses only one core effectively because only one thread executes Python code at once.
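
If you want to measure this yourself, the following sketch times the same pure-Python loop run twice in a row and then in two threads. The function names and the workload size are arbitrary choices for illustration, and the timings depend on your machine and CPython build.

```python
import threading
import time

def sum_of_squares(n):
    """Pure-Python CPU-bound loop (holds the GIL while it runs)."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def time_sequential(n, repeats=2):
    """Run the loop `repeats` times back to back on the current thread."""
    start = time.perf_counter()
    for _ in range(repeats):
        sum_of_squares(n)
    return time.perf_counter() - start

def time_threaded(n, workers=2):
    """Run the loop once in each of `workers` threads and wait for all of them."""
    threads = [threading.Thread(target=sum_of_squares, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

n = 2_000_000
print(f"sequential: {time_sequential(n):.2f}s")
print(f"threaded:   {time_threaded(n):.2f}s")
```

On a standard (GIL-enabled) CPython build the two numbers should come out close to each other.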


🌊 Step 5: When can threads actually run concurrently?

When they are not executing Python bytecode, i.e., when the GIL is released.

Many C extensions and I/O operations release the GIL internally while they wait (e.g., reading from disk, network, or sleeping).

Example:

import threading, requests, time

def io_task(url):
    requests.get(url)

start = time.time()
threads = [threading.Thread(target=io_task, args=("https://httpbin.org/delay/2",)) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()
print("Total:", time.time() - start)

Each thread spends most of its time waiting for network I/O.
During those waits, the blocking socket calls underneath requests (reached via urllib3) release the GIL, so other threads can run.

✅ Result: 5 threads fetching concurrently (their I/O waits overlap).

⚠️ But note: the actual Python code inside each thread still takes turns with the GIL.
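
The same overlap can be reproduced without a network connection. In this sketch time.sleep stands in for a blocking I/O call (it releases the GIL while waiting); fake_io and run_overlapped are made-up names for illustration.

```python
import threading
import time

def fake_io(delay):
    """Stand-in for a blocking I/O call: sleeping releases the GIL."""
    time.sleep(delay)

def run_overlapped(n_threads=5, delay=0.2):
    """Start n_threads simulated I/O waits at once; return total elapsed seconds."""
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = run_overlapped()
print(f"5 waits of 0.2s each finished in {elapsed:.2f}s")
```

The total should land near a single wait (about 0.2 s) rather than the 1 s you would get by waiting five times in a row.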


🧠 Step 6: How does CPython decide when to switch threads?

Internally, CPython (3.2+) uses a timed request rather than a bytecode counter.

  • A thread that wants the GIL waits for up to the switch interval (sys.getswitchinterval(), 5 ms by default).
  • If the GIL has not been released by then, the waiting thread sets a “drop request” flag.
  • The running thread checks this flag at safe points in the interpreter loop and releases the GIL when it sees it.

So under contention, thread switching happens roughly every 5 ms (default).
Which waiting thread runs next also depends on the OS scheduler, which adds another layer of unpredictability.


🔄 Step 7: Visualization (simplified timeline)

Imagine 2 threads (A & B) on a dual-core CPU:

Time (ms)   Thread A           Thread B
-------------------------------------------------
0-5         🟢 Runs (holds GIL)   ⚪ Waiting
5-10        ⚪ Waiting             🟢 Runs (holds GIL)
10-15       🟢 Runs (holds GIL)   ⚪ Waiting
15-20       ⚪ Waiting             🟢 Runs (holds GIL)
...

Even though both cores exist, only one executes Python bytecode at a time (🟢 = active thread).


💪 Step 8: How to bypass GIL for CPU-bound code

  1. Use multiprocessing
    • Each process has its own GIL and interpreter.
    • True parallel CPU execution on multiple cores.

    from multiprocessing import Pool

    def cpu_task(n):
        # n only labels the task; every call does the same amount of work
        total = 0
        for i in range(10**8):
            total += i*i
        return total

    if __name__ == "__main__":
        with Pool(2) as p:
            p.map(cpu_task, [1, 2])

  2. Use NumPy or C extensions
    • Many NumPy operations are implemented in C and release the GIL while they run.
    • So heavy numeric computations can utilize multiple cores.
  3. Use alternative interpreters
    • PyPy: JIT-compiled, often faster for pure-Python loops, but it still has a GIL.
    • Jython, IronPython: no GIL (but different ecosystems).
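
As a small illustration of option 2 that needs nothing outside the standard library: hashlib is documented to release the GIL while hashing larger inputs, so threads can hash several big buffers at the same time. The helper names and buffer sizes below are arbitrary, and whether you see a speed-up depends on your machine.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def sha256_hex(payload):
    """Hash one bytes object; the C implementation drops the GIL for large inputs."""
    return hashlib.sha256(payload).hexdigest()

def hash_all_threaded(payloads, workers=4):
    """Hash every payload in a thread pool, keeping results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sha256_hex, payloads))

# Four buffers of about 5 MB each, filled with different byte values
payloads = [bytes([i]) * 5_000_000 for i in range(4)]
digests = hash_all_threaded(payloads)
print("first digest starts with:", digests[0][:16])
```

The pattern is the same for any extension that releases the GIL: keep the heavy work inside the C call and let plain threads fan it out.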

🧩 Step 9: Summary table

| Type of Workload | Recommended | Why |
|---|---|---|
| CPU-bound (e.g. math loops) | multiprocessing | Each process has its own GIL |
| I/O-bound (e.g. API calls, file reads) | threading or asyncio | Threads can release GIL during I/O |
| Mixed | Combine both | e.g. worker processes with threads inside |

🧠 Step 10: Top Interview Questions (on GIL)

| # | Question | Key Idea |
|---|---|---|
| 1 | What is the GIL and why does it exist? | A lock ensuring one thread executes bytecode at a time for memory safety. |
| 2 | Does the GIL prevent true parallelism? | Yes, for CPU-bound tasks in CPython. |
| 3 | How can Python still do concurrent I/O? | I/O operations release the GIL while waiting. |
| 4 | What’s the difference between threading and multiprocessing regarding GIL? | Threads share one GIL; processes each have their own. |
| 5 | How often does Python switch threads? | Roughly every 5ms (configurable). |
| 6 | How can you remove the GIL? | Not in a standard CPython build; work around it with multiprocessing or C extensions that release it. CPython 3.13+ also offers an experimental free-threaded build (PEP 703). |
| 7 | What happens if a C extension doesn’t release the GIL? | It blocks all other threads during its execution. |
| 8 | How can you measure GIL impact? | Run CPU-bound code in 1 vs multiple threads; check CPU utilization. |
| 9 | Why can’t the GIL just be removed? | CPython’s memory model (reference counting) isn’t thread-safe without it. |
| 10 | How do libraries like NumPy overcome GIL limits? | They run heavy computations in C and release the GIL while doing so. |

💯 Excellent question — and you’ve hit the core confusion most developers have when learning Python concurrency.

Let’s clarify this precisely, with examples and analogies, so you’ll never be confused again.


🧠 Short answer

Threads in Python can run concurrently (overlap in time), but cannot execute Python code in parallel because of the GIL (Global Interpreter Lock).

Processes, on the other hand, can both concur and run truly in parallel on multiple CPU cores — because each process has its own Python interpreter and its own GIL.


⚙️ Step-by-step conceptual difference

| Term | Meaning | Python Threading | Python Multiprocessing |
|---|---|---|---|
| Concurrency | Tasks appear to run simultaneously (overlapping in time) | ✅ Yes (threads take turns quickly) | ✅ Yes |
| Parallelism | Tasks actually run simultaneously on different cores | ❌ No (GIL prevents it) | ✅ Yes (each process has own GIL) |
| Best for | I/O-bound (waiting) tasks | ✅ | 🚫 |
| Best for | CPU-bound (computation) tasks | 🚫 | ✅ |

🧩 Step 1: Concurrency vs Parallelism (Everyday Analogy)

Imagine a kitchen 🍳:

  • 👩‍🍳 Threading → One cook (the CPU) juggling multiple dishes (tasks).
    Only one dish is actively being cooked at a time, but the cook switches rapidly between dishes — chopping, stirring, etc.
    It feels like multiple dishes are progressing at once (concurrent), but only one is cooked at any exact moment (no parallelism).
  • 👩‍🍳👨‍🍳 Multiprocessing → Multiple cooks (CPU cores), each cooking their own dish independently.
    True parallel cooking — several dishes are really being cooked at the same time.

🧱 Step 2: Why threads cannot run Python code in parallel

Because of the GIL, only one thread in a CPython interpreter can hold the lock and execute Python bytecode at a time.

Even if you have a 16-core CPU, the GIL ensures only one thread’s Python code runs at any given instant.

However:

  • When a thread calls an I/O operation (like socket.recv() or file read), it releases the GIL so that another thread can run while it’s waiting for I/O.

That’s why I/O threads can overlap, but CPU-heavy threads can’t.


🧩 Step 3: How it looks under the hood

Example: two CPU-bound threads

import threading, time

def cpu_task():
    start = time.time()
    total = 0
    for i in range(10**8):
        total += i*i
    print("Done in:", time.time() - start)

t1 = threading.Thread(target=cpu_task)
t2 = threading.Thread(target=cpu_task)

start = time.time()
t1.start(); t2.start()
t1.join(); t2.join()
print("Total time:", time.time() - start)

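Illustrative output on a 4-core CPU, assuming a single cpu_task takes about 8.3 s on its own (exact timings vary by machine):

Done in: 16.4
Done in: 16.5
Total time: 16.5   ← no speed-up!

👉 They took turns running because the GIL only lets one run at a time, so each thread needs about twice as long as it would alone.
So even with multiple cores, the total matches running the two tasks back to back.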


🧩 Step 4: Now use multiprocessing

from multiprocessing import Process
import time

def cpu_task():
    start = time.time()
    total = 0
    for i in range(10**8):
        total += i*i
    print("Done in:", time.time() - start)

if __name__ == "__main__":
    p1 = Process(target=cpu_task)
    p2 = Process(target=cpu_task)

    start = time.time()
    p1.start(); p2.start()
    p1.join(); p2.join()
    print("Total time:", time.time() - start)

Output:

Done in: 8.2
Done in: 8.3
Total time: 8.4   ← almost 2x faster!

✅ Each process has its own interpreter and its own GIL,
so they truly run in parallel on different cores.


🧩 Step 5: When threads do help (I/O-bound case)

import threading, time, requests

def fetch(url):
    requests.get(url)
    print("Fetched:", url)

urls = ["https://httpbin.org/delay/2"] * 5
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]

start = time.time()
for t in threads: t.start()
for t in threads: t.join()
print("Total time:", time.time() - start)

Each call takes ~2 seconds, but total time ≈ 2–2.5s, not 10s.

✅ Because while one thread is waiting for the network, it releases the GIL,
letting others do their work.
That’s true concurrency — even though not parallel CPU execution.


🧩 Step 6: Quick internal timeline (CPU vs I/O threads)

CPU-bound threads:

t=0ms   Thread A holds GIL, executes
t=5ms   Thread B gets GIL, executes
t=10ms  Thread A again...
(take turns; only one thread runs Python code at any instant, so the other cores stay idle)

I/O-bound threads:

t=0ms   Thread A requests network, releases GIL
t=1ms   Thread B starts, requests network, releases GIL
t=2ms   Thread C runs...
All waiting → network responds → they resume

→ They overlap in waiting (true concurrency).


🧩 Step 7: Key takeaway summary

| Concept | Python Threading | Python Multiprocessing |
|---|---|---|
| Runs on multiple cores? | ❌ No (GIL limits it) | ✅ Yes |
| True parallelism? | ❌ | ✅ |
| Useful for I/O-bound? | ✅ Excellent | Moderate |
| Useful for CPU-bound? | ❌ | ✅ |
| Memory shared? | ✅ Yes | ❌ Separate (unless shared memory used) |
| Lightweight creation? | ✅ | ❌ Heavier |
| Communication | Shared memory, locks | Queues, Pipes |

🧠 Analogy (final wrap)

| Type | Analogy | Result |
|---|---|---|
| Threading | One worker rapidly switching between tasks (only one active) | Concurrency only |
| Multiprocessing | Many workers doing tasks at once | True parallelism |

✅ So in short:

🧩 Threads can run concurrently, not in parallel.

🧩 Processes can run both concurrently and in parallel.


'''
Write a program that starts 2 threads incrementing the same
global counter 1,000,000 times each.
Observe wrong results; add a threading.Lock to fix it.
'''

import threading
import time
global x
x=0
def increment_count(name,x):
    x=x+1000000
    time.sleep(4)
    print(f"{name} the Value of X: {x}")
thr1=threading.Thread(target=increment_count,args=("Task1",x))
thr2=threading.Thread(target=increment_count,args=("Task2",x))
thr1.start()
thr2.start()

thr1.join()
thr2.join()

Is this correct? Please show.



Good question 👍 — you’re on the right track, but your code isn’t correctly modifying the shared global variable x.
Let’s walk through why and then I’ll show you the fixed and Lock-protected version.


🚫 What’s wrong in your code

Here’s your code’s core issue:

def increment_count(name, x):
    x = x + 1000000
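  • When you pass x as an argument, the parameter x becomes a new local name bound to the same integer object. Because integers are immutable, x = x + 1000000 creates a new integer and rebinds only that local name.
  • So, inside the function, the local variable x shadows the global one, and you’re not actually modifying the global variable.
  • The function also adds 1,000,000 in one step instead of incrementing 1,000,000 times, so there is no repeated read-modify-write for the threads to race on.

Even with threading, each thread just updates its own local x, not the shared x.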
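
Here is a tiny single-threaded sketch of that difference. All names (counter, rebind_parameter, update_global, demo) are made up for illustration.

```python
counter = 0  # module-level (global) name

def rebind_parameter(value):
    """Rebinds only the local name; the caller's variable is untouched."""
    value = value + 1_000_000
    return value

def update_global():
    """Declares the global name, so the assignment changes the shared variable."""
    global counter
    counter = counter + 1_000_000
    return counter

def demo():
    """Return (local change, whether the global stayed put, global change)."""
    start = counter
    local_result = rebind_parameter(counter)
    unchanged = counter == start          # the global was not modified
    updated = update_global()             # now it is
    return local_result - start, unchanged, updated - start

print(demo())  # (1000000, True, 1000000)
```

The first function behaves like the original increment_count: it computes a new value but nobody outside ever sees it.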


✅ Correct version without Lock (to show race condition)

Let’s first fix the global access but not use a lock yet — to see wrong results.

import threading
import time

x = 0  # shared variable

def increment_count():
    global x
    for _ in range(1000000):
        x += 1  # critical section

# Create 2 threads
t1 = threading.Thread(target=increment_count)
t2 = threading.Thread(target=increment_count)

# Start them
t1.start()
t2.start()

# Wait for both to finish
t1.join()
t2.join()

print("Final value of x (without lock):", x)

⚙️ Expected Output (without lock)

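You’d expect x = 2,000,000
But you can get something smaller, like:

Final value of x (without lock): 1743821

The exact number changes from run to run. On some CPython versions (3.10 and later in particular) this simple loop may even print 2000000 most of the time, because the interpreter switches threads at fewer points; the code is still unsafe.

💡 Why?
Because x += 1 is a read, an add, and a write. A thread switch between the read and the write lets the two threads overwrite each other’s updates: a race condition.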


✅ Correct version with Lock (to fix the race)

Now let’s protect the shared section using threading.Lock():

import threading
import time

x = 0
lock = threading.Lock()

def increment_count():
    global x
    for _ in range(1000000):
        with lock:       # acquire and release automatically
            x += 1

t1 = threading.Thread(target=increment_count)
t2 = threading.Thread(target=increment_count)

t1.start()
t2.start()

t1.join()
t2.join()

print("Final value of x (with lock):", x)

✅ Expected Output (with Lock)

Final value of x (with lock): 2000000

Perfect ✅ — now every increment is protected.


🧠 Why Lock fixes it

  • Without Lock → both threads do:
    • read x
    • add 1
    • write x
    • (but get interrupted in between!)
  • With Lock → only one thread can enter the with lock: block at a time, so the shared value isn’t corrupted.
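
Since a real race is timing-dependent, it can help to replay the bad interleaving by hand. This single-threaded sketch plays both “threads” step by step; the function and variable names are purely illustrative.

```python
def lost_update_demo():
    """Replay one unlucky interleaving of two x += 1 operations."""
    shared = {"x": 0}

    a_seen = shared["x"]        # thread A reads 0
    b_seen = shared["x"]        # thread B reads 0 before A has written

    shared["x"] = a_seen + 1    # thread A writes 1
    shared["x"] = b_seen + 1    # thread B also writes 1, overwriting A's update

    return shared["x"]

print(lost_update_demo())  # 1, although two increments were performed
```

A lock rules out this ordering because thread B cannot read until thread A has finished its write.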

⚡ Optional: To see the GIL effect

Even though Python has a Global Interpreter Lock (GIL), it doesn’t prevent such race conditions, because:

  • The GIL only ensures that one thread executes bytecode at a time. It does not make a multi-instruction statement such as x += 1 atomic.
  • The GIL can be released between bytecode instructions, so threads can interleave the load, add, and store steps of x += 1.
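
You can check the “several instructions” claim with the dis module. This is a minimal sketch; bump and opnames are hypothetical names, and the exact instruction list differs between CPython versions, so no output is shown.

```python
import dis

counter = 0

def bump():
    global counter
    counter += 1

def opnames(func):
    """List the bytecode instruction names CPython generated for func."""
    return [ins.opname for ins in dis.get_instructions(func)]

names = opnames(bump)
print(names)  # includes a separate load and a separate store of the global
```

Look for a LOAD_GLOBAL near the start and a STORE_GLOBAL later on: anything that lets another thread run between those two can cause a lost update.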

If you want to actually see parallelism (several cores executing Python code at once), you need multiprocessing, not multithreading.


Would you like me to show the same program using multiprocessing, where both CPU cores really work in parallel (no GIL restriction)?

Perfect 👍
Let’s now see how multiprocessing gets around the GIL limitation, and what changes once memory is no longer shared automatically.


🧩 Step 1: Understanding Why Multiprocessing Works

🧠 Remember:

  • In multithreading, threads share the same memory space (same process) — affected by GIL.
  • In multiprocessing, each process has its own Python interpreter and memory space — so no GIL conflict.

So, while threads take turns executing Python code, processes can run truly in parallel on multiple cores.


⚙️ Step 2: Example — Same Task Using multiprocessing

We’ll run 2 processes that each increment a shared counter 1,000,000 times.

🧪 Code Without Lock (race condition between processes)

from multiprocessing import Process, Value
import time

def increment_count(x):
    for _ in range(1000000):
        x.value += 1  # not atomic: the read and the write are separate steps

if __name__ == "__main__":
    x = Value('i', 0)  # shared integer between processes

    p1 = Process(target=increment_count, args=(x,))
    p2 = Process(target=increment_count, args=(x,))

    start = time.time()

    p1.start()
    p2.start()

    p1.join()
    p2.join()

    print("Final value of x (without lock):", x.value)
    print("Time taken:", time.time() - start)

🔎 Expected Output

Final value of x (without lock): 1834271

Same race problem: both processes modify x at the same time, so updates get lost. The exact number varies from run to run.


✅ Step 3: Fix It with Lock (Safe Version)

from multiprocessing import Process, Value, Lock
import time

def increment_count(x, lock):
    for _ in range(1000000):
        with lock:
            x.value += 1

if __name__ == "__main__":
    x = Value('i', 0)
    lock = Lock()

    p1 = Process(target=increment_count, args=(x, lock))
    p2 = Process(target=increment_count, args=(x, lock))

    start = time.time()

    p1.start()
    p2.start()

    p1.join()
    p2.join()

    print("Final value of x (with lock):", x.value)
    print("Time taken:", time.time() - start)

✅ Output

Final value of x (with lock): 2000000

Now both processes safely update the shared value.
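
A variation worth knowing: a Value created with the default settings already carries its own lock, available through get_lock(), so a separate Lock object is optional. The sketch below assumes that default; the reduced iteration count is only there to keep the demo quick.

```python
from multiprocessing import Process, Value

def increment_count(shared, times):
    """Increment a shared Value `times` times, holding its built-in lock each time."""
    for _ in range(times):
        with shared.get_lock():   # the lock that Value creates by default
            shared.value += 1

if __name__ == "__main__":
    x = Value("i", 0)
    workers = [Process(target=increment_count, args=(x, 100_000)) for _ in range(2)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print("Final value of x (with get_lock):", x.value)
```

Passing an explicit Lock, as in the version above, works just as well; get_lock() simply saves one argument.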


🚀 Step 4: Observe True Parallelism

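In this multiprocessing version:

  • Both processes can run on separate CPU cores.
  • No single GIL is shared between them, because each process has its own Python interpreter and its own GIL.
  • CPU utilization can approach 200% (check with Task Manager or htop), although the shared lock forces the two processes to take turns on every increment, so this particular counter is not a good speed benchmark.

To measure the speed-up, time a lock-free CPU-bound function (such as the earlier cpu_task) with two threads and then with two processes. The process version should finish in roughly half the time.

Don’t use time.sleep() for this test: sleeping releases the GIL, so threads overlap just as well as processes and the comparison shows nothing.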
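
One way to compare threads and processes on CPU-bound work without any shared lock is sketched below, using concurrent.futures. The names sum_of_squares and timed_map and the workload size are illustrative assumptions, and the actual timings depend on your machine.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def sum_of_squares(n):
    """Pure-Python CPU-bound loop with no shared state."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed_map(executor_cls, n, workers=2):
    """Run sum_of_squares(n) once per worker; return (elapsed_seconds, results)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(sum_of_squares, [n] * workers))
    return time.perf_counter() - start, results

if __name__ == "__main__":
    n = 3_000_000  # small enough to finish quickly; raise it for a clearer gap
    thread_time, _ = timed_map(ThreadPoolExecutor, n)
    process_time, _ = timed_map(ProcessPoolExecutor, n)
    print(f"2 threads:   {thread_time:.2f}s")
    print(f"2 processes: {process_time:.2f}s")
```

For small workloads the cost of starting processes can hide the gain, so increase n if the two numbers look similar.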


🧭 When to Use What

| Type | Shared Memory | Parallelism | GIL Affected? | Best For |
|---|---|---|---|---|
| Multithreading | ✅ Yes | ❌ No (only I/O waits overlap) | ✅ Yes | I/O-bound tasks (web requests, DB, file I/O) |
| Multiprocessing | ❌ No (isolated) | ✅ Yes (real parallelism) | ❌ No | CPU-bound tasks (math, ML, image processing) |

🏭 Real Industry Uses

| Use Case | Typical Approach |
|---|---|
| Web crawlers, API calls, DB queries | Multithreading / AsyncIO |
| Image processing, ML model training | Multiprocessing / GPU parallelism |
| ETL pipelines, data transformation | Multiprocessing (PySpark, Dask) |
| Logging, monitoring tools | Multithreading |

🎯 Top Interview Questions

| Question | Hint/Expected Answer |
|---|---|
| What is the GIL and why does it exist? | GIL ensures one thread executes Python bytecode at a time for memory safety in CPython. |
| Why doesn’t multithreading speed up CPU-bound code in Python? | Because of the GIL. Only one thread can execute Python code at a time. |
| How can you achieve real parallelism in Python? | Use multiprocessing, not threading. |
| How does multiprocessing communicate between processes? | Through Queue, Pipe, shared memory objects (Value, Array), or proxy objects served by a Manager process. |
| What are I/O-bound vs CPU-bound tasks? | I/O-bound waits on external resources; CPU-bound uses heavy computation. |
| What’s the difference between threading.Lock and multiprocessing.Lock? | Both prevent concurrent access, but one is for threads (shared memory), the other for processes (shared state). |
| How can you share data between processes safely? | Use multiprocessing.Manager() or shared memory objects (Value, Array). |
