Regular Expressions (Regex) using Python

Excellent 👏 — welcome to your Interactive Python Regex Lab 🧠💻

Here’s how it’ll work:

  • You’ll solve small real-world challenges step by step.
  • I’ll give you a goal, then hint(s), then solution + explanation — like a mentor walking you through regex thinking.
  • Each level gets slightly harder and more useful for real data tasks.

Ready? Let’s begin 🚀


🧩 Level 1: Find All Emails in Text

Goal:
Extract all email addresses from the given string.

text = "Contact us at support@example.com, sales@myshop.in or admin@openai.com"

🧠 Hint:
Emails usually have:

  • a local part: word characters (letters, digits, underscore), plus dots and hyphens
  • @
  • a domain part: labels separated by dots, ending in a top-level domain such as .com

💬 Try writing a pattern that captures these.

👉 Your turn:
Try to write a regex pattern and use re.findall().


Solution (after you try):

import re

text = "Contact us at support@example.com, sales@myshop.in or admin@openai.com"

pattern = r"[\w\.-]+@[\w\.-]+\.\w+"
emails = re.findall(pattern, text)
print(emails)

✅ Output:

['support@example.com', 'sales@myshop.in', 'admin@openai.com']

📘 Explanation:

  • [\w\.-]+ → any word/dot/hyphen (1+ times)
  • @ → literal at symbol
  • [\w\.-]+\.\w+ → domain name + dot + top-level domain
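
The pattern above is deliberately loose, so it helps to see where it falls short. The sketch below runs it on an address with a + tag and compares it with a slightly wider pattern. The sample address and the name `stricter` are made up for illustration, and neither pattern is a full validator for real-world addresses.

```python
import re

sample = "Write to first.last+news@example.co.uk or support@example.com."

# Pattern from the solution above: "+" is not in the character class,
# so the match can only start after the plus sign.
basic = r"[\w\.-]+@[\w\.-]+\.\w+"
basic_hits = re.findall(basic, sample)
print(basic_hits)     # ['news@example.co.uk', 'support@example.com']

# Hypothetical variant: allow "+" in the local part and require
# one or more dot-separated labels after the first domain label.
stricter = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
stricter_hits = re.findall(stricter, sample)
print(stricter_hits)  # ['first.last+news@example.co.uk', 'support@example.com']
```

Notice that both patterns leave out the sentence-ending period after example.com, because the final part of each pattern must end on a word character.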

🧩 Level 2: Extract Phone Numbers

Goal:
Extract all 10-digit Indian mobile numbers from this text:

text = "Call 9876543210 or 9123456789 for help. Old number 12345 is invalid."

🧠 Hint:
Use \b for word boundary and \d{10} for 10 digits.


Solution:

phones = re.findall(r"\b\d{10}\b", text)
print(phones)

Output:

['9876543210', '9123456789']

📘 Explanation:

  • \b ensures we match only standalone 10-digit numbers (not parts of longer ones).
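
Keep in mind that \d{10} accepts any 10-digit run, not only mobile numbers. A minimal sketch of a tighter check follows, assuming Indian mobile numbers start with 6, 7, 8 or 9 (that rule and the sample text are assumptions for illustration, not something the exercise states).

```python
import re

text = "Call 9876543210 or 9123456789. Ref no. 1234567890, tracking 98765432101234."

# Any standalone run of exactly 10 digits, as in the solution above.
any_ten = re.findall(r"\b\d{10}\b", text)
print(any_ten)   # ['9876543210', '9123456789', '1234567890']

# Assumption: a mobile number starts with a digit from 6 to 9.
mobiles = re.findall(r"\b[6-9]\d{9}\b", text)
print(mobiles)   # ['9876543210', '9123456789']
```

The 14-digit tracking number is skipped by both patterns because \b cannot match between two digits.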

🧩 Level 3: Clean Text for NLP

Goal:
Remove every character except letters and whitespace.

text = "Hi!!! My name is Rajeev##...I love Python$@#"

🧠 Hint:
You can use re.sub() to replace unwanted characters.
Negate a character class: [^A-Za-z\s]


Solution:

clean = re.sub(r"[^A-Za-z\s]", "", text)
print(clean)

Output:

Hi My name is RajeevI love Python

📘 Explanation:
[^A-Za-z\s] → matches any character that is not a letter or whitespace.
Each match is replaced with an empty string, which is why "Rajeev" and "I" run together in the output.
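
If the joined words are a problem for your NLP step, one option is to replace each run of unwanted characters with a space and then tidy up the whitespace. This is only a sketch; the variable name `spaced` is made up here.

```python
import re

text = "Hi!!! My name is Rajeev##...I love Python$@#"

# Replace each run of unwanted characters with one space instead of nothing.
spaced = re.sub(r"[^A-Za-z\s]+", " ", text)

# Collapse repeated whitespace and trim the ends.
clean = re.sub(r"\s+", " ", spaced).strip()
print(clean)  # Hi My name is Rajeev I love Python
```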


🧩 Level 4: Extract Hashtags from Social Media Text

Goal:
Get all hashtags.

text = "Loving #Python #AI and #MachineLearning"

🧠 Hint:
Each hashtag starts with # and is followed by word characters.


Solution:

hashtags = re.findall(r"#\w+", text)
print(hashtags)

Output:

['#Python', '#AI', '#MachineLearning']
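
📘 One thing to watch: #\w+ also picks up a # that sits inside another token, such as a URL fragment. A possible refinement, sketched below with a made-up sample string, uses a negative lookbehind so the # must not follow a word character.

```python
import re

text = "Loving #Python and #AI, docs at example.com/guide#intro"

# Plain pattern: also matches the URL fragment.
loose = re.findall(r"#\w+", text)
print(loose)  # ['#Python', '#AI', '#intro']

# (?<!\w) is a negative lookbehind: "#" must not come right after a word character.
tags = re.findall(r"(?<!\w)#\w+", text)
print(tags)   # ['#Python', '#AI']
```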

🧩 Level 5: Extract Date & Time from Log

Goal:
Extract date and time separately.

log = "2025-10-29 14:55:33 INFO - Process started"

🧠 Hint:

  • Date pattern: \d{4}-\d{2}-\d{2}
  • Time pattern: \d{2}:\d{2}:\d{2}

Use capturing groups ( ) to capture both.

Solution:

pattern = r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})"
match = re.search(pattern, log)
print(match.groups())

Output:

('2025-10-29', '14:55:33')
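
📘 A variation worth trying: named groups make the result easier to read, and checking for None avoids a crash on lines without a timestamp. The sketch below also parses the text with datetime.strptime; the group names date and time are my own choice.

```python
import re
from datetime import datetime

log = "2025-10-29 14:55:33 INFO - Process started"

# (?P<name>...) gives each group a name instead of just a number.
pattern = r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2})"
match = re.search(pattern, log)

if match:  # re.search returns None when nothing matches
    date_str = match.group("date")
    time_str = match.group("time")
    # Turn the two strings into a real datetime object.
    stamp = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
    print(date_str, time_str, stamp.hour)  # 2025-10-29 14:55:33 14
else:
    print("no timestamp found")
```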

🧩 Level 6: Validate Email

Goal:
Check if an email is valid.

🧠 Hint:
Use re.match(), which checks from the start of the string. Anchor the pattern with ^ and $ so it has to cover the whole address.


Solution:

def is_valid_email(email):
    return bool(re.match(r"^[\w\.-]+@[\w\.-]+\.\w+$", email))

print(is_valid_email("rajeev@test.com"))  # True
print(is_valid_email("not-an-email"))     # False
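
📘 A small gotcha: $ also matches just before a trailing newline, so the function above accepts "rajeev@test.com\n". A minimal sketch using re.fullmatch(), which must consume the entire string, is shown below. The name is_valid_email_strict is made up for this example, and the pattern is still a rough check, not full email validation.

```python
import re

EMAIL = r"[\w\.-]+@[\w\.-]+\.\w+"

def is_valid_email_strict(email):
    # fullmatch succeeds only if the pattern covers the entire string.
    return re.fullmatch(EMAIL, email) is not None

with_newline = "rajeev@test.com\n"

# "$" matches before the final newline, so the original approach says True.
loose_result = bool(re.match(r"^" + EMAIL + r"$", with_newline))
strict_result = is_valid_email_strict(with_newline)

print(loose_result)   # True
print(strict_result)  # False
print(is_valid_email_strict("rajeev@test.com"))  # True
```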

🧩 Level 7: Extract Domain Names from Emails

Goal:
Get only the domain part (example.com) from each email.

emails = ["user1@gmail.com", "contact@openai.com", "raj@iitm.ac.in"]

🧠 Hint:
Put a capturing group ( ) around the part after @.


Solution:

domains = [re.search(r"@([\w\.-]+\.\w+)", email).group(1) for email in emails]
print(domains)

Output:

['gmail.com', 'openai.com', 'iitm.ac.in']
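
📘 The one-liner above raises AttributeError if any entry has no match, because re.search() returns None. A defensive sketch follows; the helper name extract_domain and the sample list are hypothetical.

```python
import re

# Compile once when the same pattern is reused many times.
DOMAIN = re.compile(r"@([\w\.-]+\.\w+)")

def extract_domain(address):
    match = DOMAIN.search(address)
    # Return None instead of crashing when there is no domain.
    return match.group(1) if match else None

entries = ["user1@gmail.com", "not-an-email", "raj@iitm.ac.in"]
domains = [extract_domain(e) for e in entries]
print(domains)  # ['gmail.com', None, 'iitm.ac.in']
```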

🧩 Level 8: Extract Prices from E-commerce Text

Goal:
Extract all prices like ₹999, ₹1500, ₹49.99, etc.

text = "Deals: ₹999 only! ₹1500 discounted! ₹49.99 special!"

🧠 Hint:
Currency + digits + optional decimal part
Pattern idea: ₹\d+(\.\d+)?


Solution:

prices = re.findall(r"₹\d+(?:\.\d+)?", text)
print(prices)

Output:

['₹999', '₹1500', '₹49.99']

📘 (?: ) is a non-capturing group. It matters here because re.findall() returns only the captured groups when a pattern contains any; with (?: ) it returns the whole match.
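
To see the difference for yourself, here is a short sketch that runs both versions of the pattern and then converts the matched prices to numbers (the variable names are illustrative).

```python
import re

text = "Deals: ₹999 only! ₹1500 discounted! ₹49.99 special!"

# Capturing group: findall returns only what the group captured.
captured = re.findall(r"₹\d+(\.\d+)?", text)
print(captured)  # ['', '', '.99']

# Non-capturing group: findall returns each full match.
whole = re.findall(r"₹\d+(?:\.\d+)?", text)

# Drop the currency symbol and convert to float.
amounts = [float(p[1:]) for p in whole]
print(amounts)   # [999.0, 1500.0, 49.99]
```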


🧩 Level 9: Replace Multiple Spaces with Single Space

text = "This    is   spaced   out   sentence"

🧠 Hint:
Use \s+ in re.sub() to match runs of one or more whitespace characters.


Solution:

clean_text = re.sub(r"\s+", " ", text)
print(clean_text)

Output:

This is spaced out sentence
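
📘 Note that \s also matches tabs and newlines, so \s+ flattens line breaks too. If you only want to squeeze repeated spaces, a pattern like " {2,}" is one option. The sketch below compares the two on a made-up two-line string.

```python
import re

text = "line one   has    gaps\nline two\tis   fine"

# \s+ treats newlines and tabs as whitespace, so everything ends up on one line.
all_ws = re.sub(r"\s+", " ", text)
print(repr(all_ws))       # 'line one has gaps line two is fine'

# " {2,}" touches only runs of two or more space characters.
spaces_only = re.sub(r" {2,}", " ", text)
print(repr(spaces_only))  # 'line one has gaps\nline two\tis fine'
```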

🧩 Level 10: Extract IP Addresses (Log Analysis)

Goal:
Extract IPv4 addresses.

log = "Login from 192.168.1.1 failed. Backup from 10.0.0.5 succeeded."

🧠 Hint:
Each IP block: 1–3 digits, separated by dots.
Pattern: (\d{1,3}\.){3}\d{1,3}
With re.findall(), write the group as (?: ) so the whole address is returned, not just the group.


Solution:

ips = re.findall(r"(?:\d{1,3}\.){3}\d{1,3}", log)
print(ips)

Output:

['192.168.1.1', '10.0.0.5']
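
📘 The pattern checks the shape only, so something like 999.10.300.1 would also match. One way to filter out impossible addresses, sketched here with a made-up log line and a hypothetical helper is_ipv4, is to hand each candidate to the standard library's ipaddress module.

```python
import ipaddress
import re

log = "Login from 192.168.1.1 failed. Bogus 999.10.300.1 ignored. Backup from 10.0.0.5 succeeded."

# Step 1: the regex finds everything that looks like an IPv4 address.
candidates = re.findall(r"(?:\d{1,3}\.){3}\d{1,3}", log)
print(candidates)  # ['192.168.1.1', '999.10.300.1', '10.0.0.5']

def is_ipv4(value):
    # IPv4Address raises a ValueError subclass for out-of-range octets.
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False

# Step 2: keep only the candidates that are real addresses.
valid = [c for c in candidates if is_ipv4(c)]
print(valid)       # ['192.168.1.1', '10.0.0.5']
```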

🌟 Bonus Challenge (Pro Level)

Extract all URLs from a string like:

text = "Visit https://openai.com or http://example.org for info."

Try yourself using what you learned.

💡 Hint:
https?://[A-Za-z0-9./]+
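
Once you have tried it, you can compare with the rough sketch below. It uses the hinted pattern as-is and adds one extra sentence to the sample text (made up for illustration) to show a side effect: because the character class includes the dot, a sentence-ending period gets pulled into the match, so the sketch strips it afterwards. Real URLs can contain many more characters than this class allows.

```python
import re

text = "Visit https://openai.com or http://example.org for info. Docs: https://example.org/guide."

# s? makes the "s" optional, so both http and https are matched.
raw = re.findall(r"https?://[A-Za-z0-9./]+", text)
print(raw)   # ['https://openai.com', 'http://example.org', 'https://example.org/guide.']

# Remove a trailing period that belongs to the sentence, not the URL.
urls = [u.rstrip(".") for u in raw]
print(urls)  # ['https://openai.com', 'http://example.org', 'https://example.org/guide']
```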


Would you like me to build a “Regex Practice Playground”: a notebook-style, step-by-step practice system in Python where you type code, see results, and get hints as you go, like levels in a coding game?

It can include:

  • 20+ real-world exercises
  • instant code outputs
  • short explanations for each concept

Would you like me to create that next?