Great question — this is one of those deceptively tricky Python string topics 👀
Let’s go deep into how to split a string using multiple delimiters (space, comma, period, tab, etc.).
🧠 Background — str.split() Default Behavior
text = "Hello, world. Welcome\tto Python"
print(text.split())
Output:
['Hello,', 'world.', 'Welcome', 'to', 'Python']
👉 Default split() with no argument splits on any whitespace (' ', '\t', '\n', etc.)
but NOT commas or periods.
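Worth noting: split() with no argument is not the same as split(' ') — the no-argument form collapses runs of whitespace and never returns empty strings, while split(' ') treats every single space as a separator:

```python
# Contrast: split() vs split(' ') on text with a double space and a tab.
text = "Hello,  world.\tWelcome"

print(text.split())     # ['Hello,', 'world.', 'Welcome']
print(text.split(' '))  # ['Hello,', '', 'world.\tWelcome']
```

The second call keeps the empty string from the double space and does not touch the tab — a common source of surprise.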
❌ Problem
You want to split on spaces, commas, periods, and tabs — like this:
"Hi, there. How are\tyou?"
# Desired output: ['Hi', 'there', 'How', 'are', 'you']
✅ Option 1 — Use re.split() (Recommended)
Regular expressions let you specify multiple delimiters.
import re
text = "Hi, there. How are\tyou?"
words = [w for w in re.split(r"[ ,.?\t]+", text) if w]
print(words)
Output:
['Hi', 'there', 'How', 'are', 'you']
Explanation:
- [ ,.?\t] is a character class: it matches a space, comma, period, question mark, or tab. Inside [], the period is already literal, so no escaping is needed.
- + means one or more occurrences, so a run of consecutive delimiters becomes a single split point.
- The trailing "if w" filter drops the empty strings re.split() returns when the text starts or ends with a delimiter (here, the final ?).
✅ Handles multiple consecutive delimiters automatically (e.g. multiple spaces or commas).
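A side note, not needed for the recipe above: if you wrap the pattern in a capturing group, re.split() also returns the delimiters it matched, which is occasionally useful when you need to inspect or reconstruct the original separators:

```python
import re

# A capturing group in the pattern makes re.split() keep the delimiters.
parts = re.split(r"([ ,.\t]+)", "Hi, there.")
print(parts)  # ['Hi', ', ', 'there', '.', '']
```

The trailing '' appears because the string ends with a delimiter — the same edge case the "if w" filter handles in the main recipe.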
✅ Option 2 — If You Only Need Simple Replacement (No Regex)
You can normalize all separators into a space, then split:
text = "Hi, there. How are\tyou?"
for sep in [',', '.', '\t', '?']:
    text = text.replace(sep, ' ')
words = text.split()
print(words)
Output:
['Hi', 'there', 'How', 'are', 'you']
✅ Works fine for a small, known set of delimiters,
❌ But the replace chain gets unwieldy as the set grows (e.g. ?!;: and beyond), and each pass copies the whole string.
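A middle ground worth knowing (standard library only, no regex): str.translate() can map every delimiter to a space in a single pass over the string, avoiding one full copy per delimiter:

```python
# Build a translation table mapping each delimiter character to a space,
# then split on whitespace as usual.
delims = ",.?\t"
table = str.maketrans({ch: " " for ch in delims})

text = "Hi, there. How are\tyou?"
words = text.translate(table).split()
print(words)  # ['Hi', 'there', 'How', 'are', 'you']
```

Like the replace loop, this only works for single-character delimiters, but it scales better when the list grows.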
✅ Option 3 — Use Regex to Remove All Non-Alphanumeric (like word tokenizer)
If your goal is to extract only words:
import re
text = "Hi, there. How are\tyou?"
words = re.findall(r"[A-Za-z0-9]+", text)
print(words)
Output:
['Hi', 'there', 'How', 'are', 'you']
This extracts pure tokens (ignores punctuation entirely).
This is similar to tokenization in NLP preprocessing.
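One caveat to be aware of: [A-Za-z0-9]+ splits contractions apart and drops underscores. The common \w+ variant also matches underscores and, in Python 3, Unicode word characters:

```python
import re

# [A-Za-z0-9]+ breaks on the apostrophe in a contraction:
print(re.findall(r"[A-Za-z0-9]+", "don't stop"))  # ['don', 't', 'stop']

# \w+ keeps underscores and non-ASCII letters together:
print(re.findall(r"\w+", "naïve_var 42"))  # ['naïve_var', '42']
```

Pick the pattern based on what counts as a "word" for your use case.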
🧾 Summary Cheat Table
| Method | Use Case | Example |
|---|---|---|
| split() | Split on whitespace only | text.split() |
| split(',') | Split on a single delimiter | text.split(',') |
| re.split(r"[ ,.?\t]+", text) | Split on multiple delimiters | ✅ Recommended |
| Replace → Split | Quick workaround | text.replace(',', ' ').split() |
| re.findall(r"[A-Za-z0-9]+", text) | Extract words only | NLP-style tokenization |
🚀 Best Practice
For clean multi-separator splitting, use re.split():
import re
words = [w for w in re.split(r"[ ,.?\t]+", text) if w]
It’s readable, efficient, and robust.
Would you like me to show how to extend this to also split on symbols like ;, :, !, ?, and parentheses (basically a universal “word tokenizer”)?