Excellent 👏 — welcome to your Interactive Python Regex Lab 🧠💻
Here’s how it’ll work:
- You’ll solve small real-world challenges step by step.
- I’ll give you a goal, then hint(s), then solution + explanation — like a mentor walking you through regex thinking.
- Each level gets slightly harder and more useful for real data tasks.
Ready? Let’s begin 🚀
🧩 Level 1: Find All Emails in Text
Goal:
Extract all email addresses from the given string.
text = "Contact us at support@example.com, sales@myshop.in or admin@openai.com"
🧠 Hint:
Emails usually have:
- a local part: word characters (letters, digits, underscore), plus dots and hyphens
- an @ symbol
- a domain part (letters and dots)
💬 Try writing a pattern that captures these.
👉 Your turn:
Try writing a regex pattern and pass it to re.findall().
✅ Solution (after you try):
import re
text = "Contact us at support@example.com, sales@myshop.in or admin@openai.com"
pattern = r"[\w\.-]+@[\w\.-]+\.\w+"
emails = re.findall(pattern, text)
print(emails)
✅ Output:
['support@example.com', 'sales@myshop.in', 'admin@openai.com']
📘 Explanation:
- [\w\.-]+ → any word character, dot, or hyphen (1+ times)
- @ → literal at symbol
- [\w\.-]+\.\w+ → domain name + dot + top-level domain
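To see which piece of the pattern grabs which piece of each address, here is a minimal sketch that wraps the two halves in named groups. The group names local and domain are my own labels for illustration, not something the original pattern requires.

```python
import re

text = "Contact us at support@example.com, sales@myshop.in or admin@openai.com"

# Same pattern as in the solution, with each half wrapped in a named group.
# The names "local" and "domain" are illustrative labels only.
pattern = re.compile(r"(?P<local>[\w\.-]+)@(?P<domain>[\w\.-]+\.\w+)")

# finditer yields match objects, so each group can be read separately.
parts = [(m.group("local"), m.group("domain")) for m in pattern.finditer(text)]

for local, domain in parts:
    print(f"{local:10} | {domain}")
```

Note that the comma after example.com is left out, because a comma is not in either character class.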
🧩 Level 2: Extract Phone Numbers
Goal:
Extract all 10-digit Indian mobile numbers from this text:
text = "Call 9876543210 or 9123456789 for help. Old number 12345 is invalid."
🧠 Hint:
Use \b for a word boundary and \d{10} for exactly 10 digits.
✅ Solution:
phones = re.findall(r"\b\d{10}\b", text)
print(phones)
Output:
['9876543210', '9123456789']
📘 Explanation:
\b ensures we match only standalone 10-digit numbers (not parts of longer ones).
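The pattern above accepts any standalone run of 10 digits. As a sketch of a stricter version, assuming Indian mobile numbers start with 6, 7, 8, or 9, you can pin the first digit. The sample string below is made up for illustration.

```python
import re

# Hypothetical sample: the last number has 10 digits but starts with 1.
text = "Call 9876543210 or 9123456789. Landline-style 1234567890 is not a mobile."

# Any standalone 10-digit run.
any_ten = re.findall(r"\b\d{10}\b", text)

# Assumption: a mobile number begins with 6-9, followed by nine more digits.
mobiles = re.findall(r"\b[6-9]\d{9}\b", text)

print(any_ten)
print(mobiles)
```

The first list contains all three numbers; the second drops the one that starts with 1.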
🧩 Level 3: Clean Text for NLP
Goal:
Remove every character except letters and spaces.
text = "Hi!!! My name is Rajeev##...I love Python$@#"
🧠 Hint:
You can use re.sub() to replace unwanted characters.
Negate a character class: [^A-Za-z\s]
✅ Solution:
clean = re.sub(r"[^A-Za-z\s]", "", text)
print(clean)
Output:
Hi My name is RajeevI love Python
📘 Explanation:
[^A-Za-z\s] → matches anything that is not a letter or whitespace.
We replace each match with an empty string.
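Notice that deleting the symbols glues "Rajeev" and "I" together. If you want word boundaries preserved for NLP, one possible variation (a sketch, not the only way) is to replace each run of unwanted characters with a space and then collapse the extra whitespace.

```python
import re

text = "Hi!!! My name is Rajeev##...I love Python$@#"

# Step 1: turn every run of non-letter, non-whitespace characters into one space.
spaced = re.sub(r"[^A-Za-z\s]+", " ", text)

# Step 2: collapse whitespace runs and trim the ends.
clean = re.sub(r"\s+", " ", spaced).strip()

print(clean)  # Hi My name is Rajeev I love Python
```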
🧩 Level 4: Extract Hashtags from Social Media Text
Goal:
Get all hashtags.
text = "Loving #Python #AI and #MachineLearning"
🧠 Hint:
Each hashtag starts with # and is followed by word characters.
✅ Solution:
hashtags = re.findall(r"#\w+", text)
print(hashtags)
Output:
['#Python', '#AI', '#MachineLearning']
🧩 Level 5: Extract Date & Time from Log
Goal:
Extract date and time separately.
log = "2025-10-29 14:55:33 INFO - Process started"
🧠 Hint:
- Date pattern: \d{4}-\d{2}-\d{2}
- Time pattern: \d{2}:\d{2}:\d{2}
Wrap each one in parentheses to capture it, then read both with match.groups().
✅ Solution:
pattern = r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})"
match = re.search(pattern, log)
print(match.groups())
Output:
('2025-10-29', '14:55:33')
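In real log processing you usually want a proper timestamp rather than two strings. Here is a minimal sketch using named groups and datetime.strptime; the group names date and time are my own choice, and the None check covers lines that have no timestamp.

```python
import re
from datetime import datetime

log = "2025-10-29 14:55:33 INFO - Process started"

# Named groups make the two captures self-documenting.
pattern = re.compile(r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2})")
match = pattern.search(log)

stamp = None
if match:  # search() returns None when nothing matches
    stamp = datetime.strptime(
        f"{match.group('date')} {match.group('time')}", "%Y-%m-%d %H:%M:%S"
    )

print(stamp)  # 2025-10-29 14:55:33
```

A side benefit: strptime rejects impossible values such as month 13, which the regex alone would accept.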
🧩 Level 6: Validate Email
Goal:
Check if an email is valid.
🧠 Hint:
Use re.match() — it checks from the start of string.
✅ Solution:
def is_valid_email(email):
    return bool(re.match(r"^[\w\.-]+@[\w\.-]+\.\w+$", email))
print(is_valid_email("rajeev@test.com")) # True
print(is_valid_email("not-an-email")) # False
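One subtlety worth knowing: $ also matches just before a trailing newline, so an input ending in "\n" still passes. A stricter sketch uses re.fullmatch(), which requires the whole string to match. The name is_valid_email_strict is hypothetical, and the pattern is still a simple approximation of what counts as an email.

```python
import re

EMAIL_RE = re.compile(r"[\w\.-]+@[\w\.-]+\.\w+")

def is_valid_email_strict(email):
    # fullmatch succeeds only if the entire string matches the pattern.
    return EMAIL_RE.fullmatch(email) is not None

# The same input with a trailing newline, checked both ways.
loose = bool(re.match(r"^[\w\.-]+@[\w\.-]+\.\w+$", "rajeev@test.com\n"))
strict = is_valid_email_strict("rajeev@test.com\n")

print(loose, strict)  # True False
```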
🧩 Level 7: Extract Domain Names from Emails
Goal:
Get only the domain part (example.com) from each email.
emails = ["user1@gmail.com", "contact@openai.com", "raj@iitm.ac.in"]
🧠 Hint:
Use capturing groups ( ) after @.
✅ Solution:
domains = [re.search(r"@([\w\.-]+\.\w+)", email).group(1) for email in emails]
print(domains)
Output:
['gmail.com', 'openai.com', 'iitm.ac.in']
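The one-liner above raises AttributeError if any string has no @, because re.search() returns None. With messy real data, a small helper is safer. This is a sketch; extract_domain and the extra sample entry are hypothetical.

```python
import re

DOMAIN_RE = re.compile(r"@([\w\.-]+\.\w+)")

def extract_domain(email):
    # Return the captured domain, or None when the string has no match.
    match = DOMAIN_RE.search(email)
    return match.group(1) if match else None

# Hypothetical input with one malformed entry.
emails = ["user1@gmail.com", "not-an-email", "raj@iitm.ac.in"]
domains = [extract_domain(e) for e in emails]

print(domains)  # ['gmail.com', None, 'iitm.ac.in']
```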
🧩 Level 8: Extract Prices from E-commerce Text
Goal:
Extract all prices like ₹999, ₹1500, ₹49.99, etc.
text = "Deals: ₹999 only! ₹1500 discounted! ₹49.99 special!"
🧠 Hint:
Currency + digits + optional decimal part
Pattern idea: ₹\d+(\.\d+)?
✅ Solution:
prices = re.findall(r"₹\d+(?:\.\d+)?", text)
print(prices)
Output:
['₹999', '₹1500', '₹49.99']
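📘 (?: ) is a non-capturing group. It matters here because re.findall() returns only the group contents when the pattern contains a capturing group, so the hint pattern with plain ( ) would not give back the full prices.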
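You can check the difference between the two group styles yourself. The sketch below runs both patterns on the same text and then converts the full matches to numbers; the variable names are just illustrative.

```python
import re

text = "Deals: ₹999 only! ₹1500 discounted! ₹49.99 special!"

# Capturing group: findall returns only what the group matched
# (an empty string when the optional group did not take part).
captured = re.findall(r"₹\d+(\.\d+)?", text)

# Non-capturing group: findall returns the whole match.
full = re.findall(r"₹\d+(?:\.\d+)?", text)

# Strip the currency symbol to get numeric values.
amounts = [float(p.lstrip("₹")) for p in full]

print(captured)  # ['', '', '.99']
print(full)      # ['₹999', '₹1500', '₹49.99']
print(amounts)   # [999.0, 1500.0, 49.99]
```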
🧩 Level 9: Replace Multiple Spaces with Single Space
text = "This   is    spaced     out   sentence"
🧠 Hint:
Use \s+ in re.sub() to match runs of whitespace (spaces, tabs, and newlines).
✅ Solution:
clean_text = re.sub(r"\s+", " ", text)
print(clean_text)
Output:
This is spaced out sentence
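Keep in mind that \s+ also swallows tabs and newlines. If you only want to squeeze repeated spaces and keep line breaks intact, a pattern like " {2,}" is one option. A quick comparison on a made-up string:

```python
import re

# Hypothetical sample with a blank line, a run of spaces, and a tab.
messy = "Line one\n\nLine   two\tend"

# \s+ collapses every kind of whitespace, including the newlines.
all_ws = re.sub(r"\s+", " ", messy)

# " {2,}" touches only runs of two or more literal spaces.
spaces_only = re.sub(r" {2,}", " ", messy)

print(repr(all_ws))       # 'Line one Line two end'
print(repr(spaces_only))  # 'Line one\n\nLine two\tend'
```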
🧩 Level 10: Extract IP Addresses (Log Analysis)
Goal:
Extract IPv4 addresses.
log = "Login from 192.168.1.1 failed. Backup from 10.0.0.5 succeeded."
🧠 Hint:
Each IP block: 1–3 digits, separated by dots.
Pattern: (\d{1,3}\.){3}\d{1,3}
✅ Solution:
ips = re.findall(r"(?:\d{1,3}\.){3}\d{1,3}", log)
print(ips)
Output:
['192.168.1.1', '10.0.0.5']
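The pattern only checks the shape, so something like 999.300.1.1 would also match. One way to filter those out (a sketch, assuming you are happy to lean on the standard library) is to pass each candidate through the ipaddress module. The function name extract_ipv4 and the extra bad value in the sample are hypothetical.

```python
import ipaddress
import re

IPV4_SHAPE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def extract_ipv4(text):
    """Return dotted-quad strings from text that are valid IPv4 addresses."""
    valid = []
    for candidate in IPV4_SHAPE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # raises ValueError if out of range
        except ValueError:
            continue
        valid.append(candidate)
    return valid

log = ("Login from 192.168.1.1 failed. Bad value 999.300.1.1 ignored. "
       "Backup from 10.0.0.5 succeeded.")
ips = extract_ipv4(log)

print(ips)  # ['192.168.1.1', '10.0.0.5']
```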
🌟 Bonus Challenge (Pro Level)
Extract all URLs from a string like:
text = "Visit https://openai.com or http://example.org for info."
Try yourself using what you learned.
💡 Hint: https?://[A-Za-z0-9./]+
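Once you have tried it yourself, you can compare against this sketch. It uses the hint pattern as-is, then shows one caveat: when a URL ends a sentence, the trailing full stop gets captured, so stripping punctuation afterwards can help. The second sample string is made up.

```python
import re

URL_RE = re.compile(r"https?://[A-Za-z0-9./]+")

text = "Visit https://openai.com or http://example.org for info."
urls = URL_RE.findall(text)
print(urls)  # ['https://openai.com', 'http://example.org']

# Hypothetical sample where each URL is followed directly by a full stop.
tricky = "Docs at https://openai.com. Mirror: http://example.org/info."
raw = URL_RE.findall(tricky)

# Trim sentence punctuation that the character class picked up.
trimmed = [u.rstrip(".,") for u in raw]
print(raw)
print(trimmed)
```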
Would you like me to now build the “Regex Practice Playground” —
a Python notebook-style step-by-step practice system (where you type code, see results, get hints dynamically like coding game levels)?
It can include:
- 20+ real-world exercises
- instant code outputs
- short explanations for each concept
Would you like me to create that next?