Great question — this is one of those deceptively tricky Python string topics 👀
Let’s go deep into how to split a string using multiple delimiters (space, comma, period, tab, etc.).
🧠 Background — str.split() Default Behavior
text = "Hello, world. Welcome\tto Python"
print(text.split())
Output:
['Hello,', 'world.', 'Welcome', 'to', 'Python']
👉 Default split() with no argument splits on any whitespace (' ', '\t', '\n', etc.)
but NOT commas or periods.
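Worth noting: split() with no argument is not the same as split(' ') — the no-argument form collapses runs of whitespace and never returns empty strings, while split(' ') treats every single space as a separator:

```python
# Contrast: split() vs split(' ') on text with a double space and a tab.
text = "Hello,  world.\tWelcome"

print(text.split())     # ['Hello,', 'world.', 'Welcome']
print(text.split(' '))  # ['Hello,', '', 'world.\tWelcome']
```

The second call keeps the empty string from the double space and does not touch the tab — a common source of surprise.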
❌ Problem
You want to split on spaces, commas, periods, and tabs — like this:
"Hi, there. How are\tyou?"
# Desired output: ['Hi', 'there', 'How', 'are', 'you']
✅ Option 1 — Use re.split() (Recommended)
Regular expressions let you specify multiple delimiters.
import re
text = "Hi, there. How are\tyou?"
words = [w for w in re.split(r"[ ,.?\t]+", text) if w]
print(words)
Output:
['Hi', 'there', 'How', 'are', 'you']
Explanation:
- [ ,.?\t] is a character class: it matches a space, comma, period, question mark, or tab. Inside [], the period is already literal, so no escaping is needed.
- + means one or more occurrences, so a run of consecutive delimiters becomes a single split point.
- The trailing "if w" filter drops the empty strings re.split() returns when the text starts or ends with a delimiter (here, the final ?).
✅ Handles multiple consecutive delimiters automatically (e.g. multiple spaces or commas).
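A side note, not needed for the recipe above: if you wrap the pattern in a capturing group, re.split() also returns the delimiters it matched, which is occasionally useful when you need to inspect or reconstruct the original separators:

```python
import re

# A capturing group in the pattern makes re.split() keep the delimiters.
parts = re.split(r"([ ,.\t]+)", "Hi, there.")
print(parts)  # ['Hi', ', ', 'there', '.', '']
```

The trailing '' appears because the string ends with a delimiter — the same edge case the "if w" filter handles in the main recipe.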
✅ Option 2 — If You Only Need Simple Replacement (No Regex)
You can normalize all separators into a space, then split:
text = "Hi, there. How are\tyou?"
for sep in [',', '.', '\t', '?']:
    text = text.replace(sep, ' ')
words = text.split()
print(words)
Output:
['Hi', 'there', 'How', 'are', 'you']
✅ Works fine for a small, known set of delimiters,
❌ But the replace chain gets unwieldy as the set grows (e.g. ?!;: and beyond), and each pass copies the whole string.
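A middle ground worth knowing (standard library only, no regex): str.translate() can map every delimiter to a space in a single pass over the string, avoiding one full copy per delimiter:

```python
# Build a translation table mapping each delimiter character to a space,
# then split on whitespace as usual.
delims = ",.?\t"
table = str.maketrans({ch: " " for ch in delims})

text = "Hi, there. How are\tyou?"
words = text.translate(table).split()
print(words)  # ['Hi', 'there', 'How', 'are', 'you']
```

Like the replace loop, this only works for single-character delimiters, but it scales better when the list grows.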
✅ Option 3 — Use Regex to Remove All Non-Alphanumeric (like word tokenizer)
If your goal is to extract only words:
import re
text = "Hi, there. How are\tyou?"
words = re.findall(r"[A-Za-z0-9]+", text)
print(words)
Output:
['Hi', 'there', 'How', 'are', 'you']
This extracts pure tokens (ignores punctuation entirely).
This is similar to tokenization in NLP preprocessing.
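One caveat to be aware of: [A-Za-z0-9]+ splits contractions apart and drops underscores. The common \w+ variant also matches underscores and, in Python 3, Unicode word characters:

```python
import re

# [A-Za-z0-9]+ breaks on the apostrophe in a contraction:
print(re.findall(r"[A-Za-z0-9]+", "don't stop"))  # ['don', 't', 'stop']

# \w+ keeps underscores and non-ASCII letters together:
print(re.findall(r"\w+", "naïve_var 42"))  # ['naïve_var', '42']
```

Pick the pattern based on what counts as a "word" for your use case.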
🧾 Summary Cheat Table
| Method | Use Case | Example |
|---|---|---|
| split() | Split on whitespace only | text.split() |
| split(',') | Split on a single delimiter | text.split(',') |
| re.split(r"[ ,.?\t]+", text) | Split on multiple delimiters | ✅ Recommended |
| Replace → Split | Quick workaround | text.replace(',', ' ').split() |
| re.findall(r"[A-Za-z0-9]+", text) | Extract words only | NLP-style tokenization |
🚀 Best Practice
For clean multi-separator splitting, use re.split():
import re
words = [w for w in re.split(r"[ ,.?\t]+", text) if w]
It’s readable, efficient, and robust.
Would you like me to show how to extend this to also split on symbols like ;, :, !, ?, and parentheses (basically a universal “word tokenizer”)?