Python Strings- A complete guide

Let’s do this step-by-step: first build a deep mental model of Python strings, then walk through all the exercises with solid patterns (brute-force → optimal where it matters).

1️⃣ Python Strings – Deep Mental Model

1.1 What actually is a string in Python?

Type: str
Conceptually: an immutable sequence of Unicode code points.
Implementation (CPython):
- Stored as a PyUnicodeObject (C struct).
- Contains:
  - Length (len(s))
  - Hash (cached after first computation)
  - Pointer to raw character data
  - Encoding details (internally can be 1/2/4 bytes per char depending on the max code point → “compact unicode representation”).

Immutability

Once created, contents cannot change.
Operations like replace, upper, strip return new strings.
Doing:s = "hello" s = s + "!" creates a new string "hello!" and rebinds s to it.

This is why repeated concatenation in a loop is bad:

result = ""
for part in parts:
    result += part  # O(n^2) overall

Better:

result = "".join(parts)  # O(total_length)

1.2 Interning (why `a is b` sometimes True)

Python may intern some strings:

All identifiers (variable names, keywords).
Short strings, or strings that look like identifiers.
You can also manually intern: sys.intern(s).

Interning means: keep a single copy in memory, reuse it → faster comparisons because you can sometimes compare pointers before content.

But never rely on this behavior in logic (== is correct, is is not).

1.3 Indexing & slicing

s[i] is O(1) in CPython (offset arithmetic plus correct width).
s[i:j] creates a new string (copy of that slice).
Python strings are 0-indexed; negative indexes wrap from the end.

s = "abcdef"
s[1:4]  # "bcd"
s[-2:]  # "ef"

1.4 Common operations & inner behavior

Key point: most string methods are O(n) in length of the string (or n + m for substring).

len(s) – O(1) (Python stores length).
in (substr in s) – O(n*m) worst case, but CPython uses optimized search (like Boyer–Moore variants) so it’s much faster in practice.
s.find(sub), s.index(sub) – same underlying C search.
s.replace(old, new) – single pass scanning, building new output.
s.split(sep) – scan and cut into pieces; each piece is a new string.
s.join(list_of_strings) – allocated once with correct final size, then copies parts → very efficient.

Case conversions:

s.lower(), s.upper() – iterate characters, apply Unicode mapping. For non-ASCII, rules are more complex.

Comparison:

s1 == s2, <, etc. – lexicographical comparison, char by char.

Hashing:

String hash is computed on first use, then cached; dictionary/set lookups are very fast.

1.5 Encodings & bytes

Python str is Unicode.
Binary data → bytes (immutable) or bytearray (mutable).

Conversions:

s = "नमस्ते"
b = s.encode("utf-8")     # str -> bytes
s2 = b.decode("utf-8")    # bytes -> str

Errors here are due to encoding mismatch, not “string issues”.

2️⃣ Key String Methods & How They’re Typically Implemented

High level (pseudo-internals):

find(sub) → efficient substring search (e.g., Boyer–Moore / Two-way algorithm).
startswith(prefix) / endswith(suffix) → compare slices directly; no scanning of full string.
strip(chars) → scan from left & right while characters are in chars, then slice.
replace(old, new):
1. Count occurrences to know final size.
2. Allocate final string.
3. Copy characters, substituting new for each old.

You can mentally model them as linear scans plus some cleverness.

3️⃣ Exercises – Basics 🔤

I’ll give: problem → core idea → clean solution.

3.1 Is It a Vowel?

def is_vowel(ch: str) -> bool:
    return len(ch) == 1 and ch.lower() in "aeiou"

3.2 Reverse String

def reverse_string(s: str) -> str:
    return s[::-1]

# manual (for understanding):
def reverse_string_manual(s: str) -> str:
    chars = []
    for i in range(len(s) - 1, -1, -1):
        chars.append(s[i])
    return "".join(chars)

3.3 Remove Vowels from a String

def remove_vowels(s: str) -> str:
    vowels = set("aeiouAEIOU")
    return "".join(ch for ch in s if ch not in vowels)

3.4 Replace Spaces with Hyphens

def replace_spaces_with_hyphens(s: str) -> str:
    return s.replace(" ", "-")

3.5 Valid Anagram (LeetCode)

from collections import Counter

def is_anagram(s: str, t: str) -> bool:
    if len(s) != len(t):
        return False
    return Counter(s) == Counter(t)

3.6 Remove Adjacent Duplicates (LeetCode style: `"abbaca" -> "ca"`)

Use a stack of characters:

def remove_adjacent_duplicates(s: str) -> str:
    stack = []
    for ch in s:
        if stack and stack[-1] == ch:
            stack.pop()
        else:
            stack.append(ch)
    return "".join(stack)

3.7 Pangram Checker (a–z present at least once)

def is_pangram(s: str) -> bool:
    letters = {ch.lower() for ch in s if ch.isalpha()}
    return len(letters) == 26

3.8 Longest Common Prefix (LeetCode)

def longest_common_prefix(strs) -> str:
    if not strs:
        return ""
    prefix = strs[0]
    for s in strs[1:]:
        while not s.startswith(prefix):
            prefix = prefix[:-1]
            if not prefix:
                return ""
    return prefix

3.9 First Unique Character in String (LeetCode)

from collections import Counter

def first_unique_char(s: str) -> int:
    freq = Counter(s)
    for i, ch in enumerate(s):
        if freq[ch] == 1:
            return i
    return -1

3.10 String Compression (LeetCode, e.g. `"aabcccccaaa" -> "a2b1c5a3"`)

def compress_string(s: str) -> str:
    if not s:
        return ""
    result = []
    count = 1
    for i in range(1, len(s)):
        if s[i] == s[i-1]:
            count += 1
        else:
            result.append(s[i-1] + str(count))
            count = 1
    result.append(s[-1] + str(count))
    compressed = "".join(result)
    return compressed if len(compressed) < len(s) else s

3.11 Longest Palindromic Substring

Use expand around center:

def longest_palindrome(s: str) -> str:
    if not s:
        return ""
    
    def expand(l, r):
        while l >= 0 and r < len(s) and s[l] == s[r]:
            l -= 1
            r += 1
        return l + 1, r  # [start, end)

    best_start, best_end = 0, 1
    for i in range(len(s)):
        # odd length
        l1, r1 = expand(i, i)
        if r1 - l1 > best_end - best_start:
            best_start, best_end = l1, r1
        # even length
        l2, r2 = expand(i, i + 1)
        if r2 - l2 > best_end - best_start:
            best_start, best_end = l2, r2
    return s[best_start:best_end]

3.12 Is Subsequence (LeetCode)

def is_subsequence(s: str, t: str) -> bool:
    i = 0
    for ch in t:
        if i < len(s) and s[i] == ch:
            i += 1
    return i == len(s)

3.13 Check if Two Strings Are Rotations of Each Other

def are_rotations(s1: str, s2: str) -> bool:
    if len(s1) != len(s2):
        return False
    return s2 in (s1 + s1)

3.14 Check if String is Palindrome (Ignore Case & Non-Alphanumerics)

def is_palindrome_clean(s: str) -> bool:
    filtered = [ch.lower() for ch in s if ch.isalnum()]
    i, j = 0, len(filtered) - 1
    while i < j:
        if filtered[i] != filtered[j]:
            return False
        i += 1
        j -= 1
    return True

3.15 Count Vowels and Consonants in a String

def count_vowels_consonants(s: str):
    vowels_set = set("aeiouAEIOU")
    vowels = consonants = 0
    for ch in s:
        if ch.isalpha():
            if ch in vowels_set:
                vowels += 1
            else:
                consonants += 1
    return vowels, consonants

3.16 Find First Non-Repeating Character in a String

(Almost same as “First Unique Character”, but return char instead of index.)

from collections import Counter

def first_non_repeating_char(s: str) -> str | None:
    freq = Counter(s)
    for ch in s:
        if freq[ch] == 1:
            return ch
    return None

3.17 Group Anagrams (LeetCode)

Use sorted string or frequency tuple as key:

from collections import defaultdict

def group_anagrams(strs):
    groups = defaultdict(list)
    for s in strs:
        key = "".join(sorted(s))  # or tuple of counts for O(n)
        groups[key].append(s)
    return list(groups.values())

4️⃣ Intermediate & Advanced 📚

4.1 Implement `strStr()` / Substring Search (no built-in search)

Return index of first occurrence of needle in haystack or -1.

Brute force (O(n*m)):

def str_str(haystack: str, needle: str) -> int:
    if needle == "":
        return 0
    n, m = len(haystack), len(needle)
    for i in range(n - m + 1):
        j = 0
        while j < m and haystack[i + j] == needle[j]:
            j += 1
        if j == m:
            return i
    return -1

(You can later upgrade to KMP for O(n + m)).

4.2 Longest Substring Without Repeating Characters

Sliding window with a map of last seen indices:

def length_of_longest_substring(s: str) -> int:
    last = {}
    left = 0
    best = 0
    for right, ch in enumerate(s):
        if ch in last and last[ch] >= left:
            left = last[ch] + 1
        last[ch] = right
        best = max(best, right - left + 1)
    return best

4.3 Longest Substring with At Most K Distinct Characters

Sliding window + count of characters:

from collections import Counter

def longest_substring_k_distinct(s: str, k: int) -> int:
    if k == 0:
        return 0
    freq = Counter()
    left = 0
    best = 0
    distinct = 0

    for right, ch in enumerate(s):
        if freq[ch] == 0:
            distinct += 1
        freq[ch] += 1

        while distinct > k:
            left_ch = s[left]
            freq[left_ch] -= 1
            if freq[left_ch] == 0:
                distinct -= 1
            left += 1
        
        best = max(best, right - left + 1)
    return best

4.4 Minimum Window Substring

Classic sliding window (a bit longer but important):

from collections import Counter

def min_window(s: str, t: str) -> str:
    if not s or not t:
        return ""
    need = Counter(t)
    missing = len(t)
    left = start = end = 0

    for right, ch in enumerate(s, 1):  # right is 1-based here
        if need[ch] > 0:
            missing -= 1
        need[ch] -= 1

        while missing == 0:
            if end == 0 or right - left < end - start:
                start, end = left, right
            left_ch = s[left]
            need[left_ch] += 1
            if need[left_ch] > 0:
                missing += 1
            left += 1

    return s[start:end]

4.5 Valid Palindrome II (remove at most one character)

def valid_palindrome_ii(s: str) -> bool:
    def is_pal_range(i, j):
        while i < j:
            if s[i] != s[j]:
                return False
            i += 1
            j -= 1
        return True

    i, j = 0, len(s) - 1
    while i < j:
        if s[i] != s[j]:
            return is_pal_range(i + 1, j) or is_pal_range(i, j - 1)
        i += 1
        j -= 1
    return True

4.6 Decode String (`3[a2[c]] → accaccacc`)

Use stack:

def decode_string(s: str) -> str:
    num_stack = []
    str_stack = []
    current = []
    num = 0

    for ch in s:
        if ch.isdigit():
            num = num * 10 + int(ch)
        elif ch == '[':
            num_stack.append(num)
            str_stack.append("".join(current))
            num = 0
            current = []
        elif ch == ']':
            repeat = num_stack.pop()
            prev = str_stack.pop()
            current = [prev + "".join(current) * repeat]
        else:
            current.append(ch)
    return "".join(current)

4.7 Basic Calculator (only `+`, `-`, `(`, `)`, spaces)

Evaluate expression like "1 + (2 - 3)".

Idea: scan characters, use a stack to remember signs before parentheses.

def basic_calculator(expr: str) -> int:
    stack = [1]  # sign context
    sign = 1
    result = 0
    i = 0
    n = len(expr)

    while i < n:
        ch = expr[i]
        if ch.isdigit():
            num = 0
            while i < n and expr[i].isdigit():
                num = num * 10 + int(expr[i])
                i += 1
            result += sign * num
            continue  # i already moved
        elif ch == '+':
            sign = stack[-1] * 1
        elif ch == '-':
            sign = stack[-1] * -1
        elif ch == '(':
            stack.append(sign)
        elif ch == ')':
            stack.pop()
        i += 1
    return result

4.8 Multiply Two Large Numbers Given as Strings

Manual multiplication:

def multiply_strings(num1: str, num2: str) -> str:
    if num1 == "0" or num2 == "0":
        return "0"
    m, n = len(num1), len(num2)
    res = [0] * (m + n)

    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            mul = (ord(num1[i]) - ord('0')) * (ord(num2[j]) - ord('0'))
            total = mul + res[i + j + 1]
            res[i + j + 1] = total % 10
            res[i + j] += total // 10

    # skip leading zeros
    i = 0
    while i < len(res) and res[i] == 0:
        i += 1
    return "".join(str(d) for d in res[i:])

4.9 Add Two Binary Strings

def add_binary(a: str, b: str) -> str:
    i, j = len(a) - 1, len(b) - 1
    carry = 0
    result = []

    while i >= 0 or j >= 0 or carry:
        bit_a = int(a[i]) if i >= 0 else 0
        bit_b = int(b[j]) if j >= 0 else 0
        total = bit_a + bit_b + carry
        result.append(str(total % 2))
        carry = total // 2
        i -= 1
        j -= 1

    return "".join(reversed(result))

4.10 Integer to Roman

def int_to_roman(num: int) -> str:
    values = [
        (1000, "M"), (900, "CM"),
        (500, "D"),  (400, "CD"),
        (100, "C"),  (90, "XC"),
        (50, "L"),   (40, "XL"),
        (10, "X"),   (9, "IX"),
        (5, "V"),    (4, "IV"),
        (1, "I"),
    ]
    result = []
    for val, sym in values:
        while num >= val:
            num -= val
            result.append(sym)
    return "".join(result)

4.11 Roman to Integer

def roman_to_int(s: str) -> int:
    values = {
        "I": 1, "V": 5, "X": 10, "L": 50,
        "C": 100, "D": 250, "M": 1000
    }
    # correct values:
    values["D"] = 500  # fix typo
    total = 0
    i = 0
    while i < len(s):
        if i + 1 < len(s) and values[s[i]] < values[s[i + 1]]:
            total += values[s[i + 1]] - values[s[i]]
            i += 2
        else:
            total += values[s[i]]
            i += 1
    return total

(Note: I fixed D to 500 inside code.)

5️⃣ How to Actually Master This (not just read)

Here’s how I’d suggest you practice this string pack:

Phase 1 – Internals
- Play with:
  - id() of small vs large strings, try sys.intern.
  - Time .join vs += in a loop using timeit.
- Print small experiments to see immutability in action.
Phase 2 – Basics (All 17 problems)
- Try each one from scratch before looking at the solution.
- For each problem, write:
  - Brute force version.
  - Improved version using a pattern (two pointers, hash map, stack, etc.).
Phase 3 – Sliding Window Problems
- Group together:
  - Longest substring without repeating.
  - Longest substring with at most K distinct.
  - Minimum window substring.
- Recognize pattern: window + frequency + shrink while condition broken.
Phase 4 – Stack & Expression Problems
- Decode string.
- Basic calculator.
Phase 5 – Number/String Conversions
- multiply_strings, add_binary, int↔Roman.

If you like, next I can:

Turn this into a multi-day roadmap (like “Deep Python Strings in 7 days”), and
Add diagrams for 2–3 tricky ones (minimum window, decode string, multiply strings) with step-by-step state of variables.

Pages: 1 2 3 4 5 6 7