Build mountain with each pebble 🧗

Python String Manipulation: From Basics to Advanced Techniques

This pebble is designed to be your go-to resource for mastering string manipulation in Python. We'll start with the fundamentals and gradually explore more complex and lesser-known methods. Let's dive in!

1. The Basics: What are Strings?

In Python, a string is a sequence of characters enclosed in single quotes ('...'), double quotes ("..."), or triple quotes ('''...''' or """..."""). Strings are immutable: once a string is created, its contents cannot be changed. Every operation that seems to modify a string, such as concatenation or replace(), actually creates and returns a new string.
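A quick way to see immutability in action is to try assigning to a single position. In this minimal sketch (the variable names are just illustrative), item assignment raises a TypeError, and the usual workaround is to build a new string from slices:

```python
word = "jello"

# Strings do not support item assignment, so this raises TypeError
try:
    word[0] = "h"
except TypeError as exc:
    error_message = str(exc)

# Build a new string instead; the original object is left untouched
fixed = "h" + word[1:]
print(error_message)  # 'str' object does not support item assignment
print(word, fixed)    # jello hello
```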

Creating Strings


# Single quotes
my_string1 = 'Hello, World!'
# Double quotes
my_string2 = "Python is fun."
# Triple quotes for multi-line strings
my_string3 = """This is a
multi-line
string."""
print(my_string1)
print(my_string2)
print(my_string3)

String Concatenation and Repetition

You can combine strings using the + operator and repeat them using the * operator.


first_name = "Santosh"
last_name = "V"
# Concatenation
full_name = first_name + " " + last_name
print(full_name)  # Output: Santosh V
# Repetition
separator = "-" * 10
print(separator)  # Output: ----------
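Two practical notes on the + operator: it only joins strings to strings, so numbers and other types must be converted first (or placed in an f-string), and when you are combining many pieces, str.join() is the idiomatic choice. A small sketch:

```python
age = 30

# "Age: " + age would raise TypeError, so convert the int explicitly
label = "Age: " + str(age)
print(label)  # Age: 30

# join() places the separator between every element of an iterable of strings
parts = ["Build", "mountain", "with", "each", "pebble"]
sentence = " ".join(parts)
print(sentence)  # Build mountain with each pebble

csv_row = ",".join(["a", "b", "c"])
print(csv_row)  # a,b,c
```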

2. Formatting Strings with f-Strings, .format() and Specifiers

Properly formatting strings is crucial for creating readable output. Python offers several ways to do this.

F-Strings (Formatted String Literals)

This is the modern and preferred way. You prefix the string with an f or F and place variables or expressions directly inside curly braces {}.

Basic F-String Usage


name = "Santosh"
age = 30
greeting = f"My name is {name} and I am {age} years old."
print(greeting)
# Output: My name is Santosh and I am 30 years old.
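The braces are not limited to variable names: any valid Python expression works inside them, and since Python 3.8 a trailing = prints both the expression and its value, which is handy for quick debugging. A short sketch reusing the same variables:

```python
name = "Santosh"
age = 30

# Arbitrary expressions are evaluated inside the braces
next_year = f"Next year I will be {age + 1}."
shout = f"{name.upper()}!"

# Python 3.8+: the '=' specifier shows the expression text and its repr()
debug = f"{name=}, {age=}"

print(next_year)  # Next year I will be 31.
print(shout)      # SANTOSH!
print(debug)      # name='Santosh', age=30
```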

The .format() Method

Before f-strings, the .format() method was the standard. It works by placing placeholder curly braces in the string and then calling the method with the values to insert.


name = "Santosh"
age = 25
# Using positional arguments
greeting_pos = "My name is {} and I am {} years old.".format(name, age)
print(greeting_pos)  # Output: My name is Santosh and I am 25 years old.
# Using keyword arguments
greeting_key = "My name is {name} and I am {age} years old.".format(name="Santosh", age=35)
print(greeting_key)  # Output: My name is Santosh and I am 35 years old.
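.format() also accepts numbered placeholders, which let you reuse or reorder arguments. It stays useful when the template is defined separately from the values, for example as a constant. A sketch (TEMPLATE and order_id are illustrative names, not anything from a library):

```python
# Numbered placeholders can be repeated and reordered
echo = "{0}, {1}, {0}!".format("pebble", "mountain")
print(echo)  # pebble, mountain, pebble!

# A template defined once and filled in later with keyword arguments
TEMPLATE = "Dear {name}, your order #{order_id} has shipped."
message = TEMPLATE.format(name="Santosh", order_id=1042)
print(message)  # Dear Santosh, your order #1042 has shipped.
```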

Format Specifiers for Advanced Formatting

F-strings (and .format()) also accept format specifiers, which give you fine-grained control over how embedded values are displayed. The general syntax is {value:format_spec}.

Alignment and Padding

You can align text to the left (<), right (>), or center (^) within a specified width.


text = "Python"
# Left-aligned, width of 20
print(f"'{text:<20}'")  # Output: 'Python              '
# Right-aligned, width of 20
print(f"'{text:>20}'")  # Output: '              Python'
# Centered, width of 20
print(f"'{text:^20}'")  # Output: '       Python       '
# You can also specify a fill character
print(f"'{text:*^20}'") # Output: '*******Python*******'
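The width does not have to be hard-coded: a format specifier can contain a nested {} that is filled in from a variable, and .format() understands exactly the same specifier syntax. A minimal sketch, where width is an illustrative variable:

```python
text = "Python"
width = 12

# Nested braces supply the width at runtime
right = f"{text:>{width}}"
print(f"'{right}'")     # '      Python'

# .format() uses the same mini-language: fill '*', center '^', width from argument 1
centered = "{0:*^{1}}".format(text, width)
print(f"'{centered}'")  # '***Python***'
```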

Number Formatting

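Format specifiers are especially handy for numbers: you can round to a fixed number of decimal places, add thousands separators, or display a value as a percentage.

number = 1234.56789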
# Limiting decimal places
print(f"Two decimal places: {number:.2f}")  # Output: 1234.57
# Adding a comma as thousands separator
print(f"With comma: {number:,.2f}")  # Output: 1,234.57
# Displaying as a percentage
percentage = 0.85
print(f"Percentage: {percentage:.1%}") # Output: 85.0%
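A few more specifiers from the same mini-language that tend to come up in practice: zero padding, an explicit sign, scientific notation, other bases, and underscore separators. Each line below is an independent illustration:

```python
value = 3.14159
count = 42

padded = f"{count:05d}"    # '00042'     zero-padded to width 5
signed = f"{value:+.2f}"   # '+3.14'     always show the sign
sci = f"{0.000123:.2e}"    # '1.23e-04'  scientific notation
binary = f"{count:08b}"    # '00101010'  binary, zero-padded to 8 digits
hex_str = f"{255:#x}"      # '0xff'      hexadecimal with the 0x prefix
grouped = f"{1234567:_}"   # '1_234_567' underscore as thousands separator

print(padded, signed, sci, binary, hex_str, grouped)
```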

3. Checking String Content: isdigit(), isnumeric(), and isdecimal()

These three methods help you check if a string consists of numeric characters, but they have subtle and important differences.

isdecimal()

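This is the most restrictive method. It returns True only if every character is a decimal digit, meaning a character in Unicode category Nd. That covers the ASCII digits 0-9, but also decimal digits from other scripts, such as the Arabic-Indic "٣" (three).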

s1 = "12345"
s2 = "123.45"
s3 = "\u00B2"  # Superscript two
print(f"'{s1}'.isdecimal(): {s1.isdecimal()}")  # Output: True
print(f"'{s2}'.isdecimal(): {s2.isdecimal()}")  # Output: False
print(f"'{s3}'.isdecimal(): {s3.isdecimal()}")  # Output: False

isdigit()

This method is broader than isdecimal(). It returns True if all characters are digits, which includes decimal characters and superscripts/subscripts.


s1 = "12345"
s2 = "\u00B2"  # Superscript two
s3 = "\u2153"  # Vulgar fraction one-third
print(f"'{s1}'.isdigit(): {s1.isdigit()}")  # Output: True
print(f"'{s2}'.isdigit(): {s2.isdigit()}")  # Output: True
print(f"'{s3}'.isdigit(): {s3.isdigit()}")  # Output: False

isnumeric()

This is the most general of the three. It returns True for digits, fractions, subscripts, superscripts, and other numeric characters in various languages.

s1 = "12345"
s2 = "\u00B2"  # Superscript two
s3 = "\u2153"  # Vulgar fraction one-third
s4 = "一二三" # Chinese numerals
print(f"'{s1}'.isnumeric(): {s1.isnumeric()}")  # Output: True
print(f"'{s2}'.isnumeric(): {s2.isnumeric()}")  # Output: True
print(f"'{s3}'.isnumeric(): {s3.isnumeric()}")  # Output: True
print(f"'{s4}'.isnumeric(): {s4.isnumeric()}")  # Output: True

Key Takeaway:

Use isdecimal() when you need to be sure the string can be passed to int(): every string for which isdecimal() returns True converts cleanly. isdigit() accepts a broader range of digit-like characters, such as superscripts, that int() rejects. isnumeric() is the most inclusive.

Method	Returns True when every character is...	Examples that return True
isdecimal()	a decimal digit (Unicode category Nd): ASCII 0-9 plus decimal digits from other scripts	"123", "٣"
isdigit()	a decimal digit or another digit character, such as a superscript or subscript	"123", "²"
isnumeric()	any character with a numeric value, including fractions, Roman numerals, and CJK numerals	"123", "²", "⅓", "一二三"
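The difference matters most when the result goes to int(). The sketch below (to_int_or_none is a hypothetical helper name) uses the superscript "²" to show isdigit() accepting a character that int() rejects. Note that none of the three methods accept a sign, a decimal point, or surrounding whitespace, and all three return False for an empty string, so "-5" fails the check even though int("-5") works.

```python
def to_int_or_none(text):
    """Hypothetical helper: convert text to int only if it is all decimal digits."""
    return int(text) if text.isdecimal() else None

superscript_two = "\u00B2"  # '²'
print(superscript_two.isdigit())    # True
print(superscript_two.isdecimal())  # False

# int() rejects the superscript even though isdigit() accepted it
try:
    int(superscript_two)
except ValueError:
    print("int() raised ValueError")

print(to_int_or_none("2024"))  # 2024
print(to_int_or_none("-5"))    # None: the sign makes isdecimal() False
print("".isdecimal())          # False
```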

4. Finding and Counting Substrings

These methods help you locate and count characters or substrings within a string.

find() and rfind()

find(substring): Returns the lowest index in the string where substring is found. If it's not found, it returns -1.
rfind(substring): Returns the highest index (searches from the right). Also returns -1 if not found.

sentence = "The quick brown fox jumps over the lazy dog."
# Find the first occurrence of "fox"
print(f'Index of "fox": {sentence.find("fox")}')  # Output: 16
# Find the first occurrence of "the" (case-sensitive, so "The" at index 0 does not match)
print(f'Index of "the": {sentence.find("the")}')  # Output: 31
# "o" appears several times: find() returns the first index, rfind() the last
print(f'First "o": {sentence.find("o")}')  # Output: 12
print(f'Last "o": {sentence.rfind("o")}')  # Output: 41
# What if the substring isn't there?
print(f'Index of "cat": {sentence.find("cat")}')  # Output: -1
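Both find() and rfind() also accept optional start and end arguments that limit the search to part of the string. A common pattern built on this is collecting every position of a substring; here is a small sketch (find_all is a hypothetical helper name):

```python
def find_all(text, sub):
    """Hypothetical helper: return the starting index of every occurrence of sub."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are found too
        start = text.find(sub, start + 1)
    return positions

sentence = "The quick brown fox jumps over the lazy dog."
print(sentence.find("o", 20))   # 26: search starts at index 20
print(find_all(sentence, "o"))  # [12, 17, 26, 41]
```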

index() and rindex()

These are nearly identical to find() and rfind(), with one critical difference: if the substring is not found, they raise a ValueError instead of returning -1.

When to use find() vs index()?

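Use find() when a missing substring is an expected case and you want an index without handling exceptions. Always compare the result against -1 (if my_string.find(...) != -1:), because -1 is also a valid index in Python and would silently point at the last character. If you only need to know whether a substring exists, the in operator ("fox" in sentence) is simpler.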

Use index() if you expect the substring to be present and want the program to stop with an error if it's not. This can help catch unexpected data issues.
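Here is what that difference looks like in practice. This sketch wraps index() in try/except so the missing-substring case is handled explicitly instead of crashing the program:

```python
sentence = "The quick brown fox jumps over the lazy dog."

print(sentence.index("fox"))  # 16, same as find()
print(sentence.rindex("o"))   # 41, same as rfind()

try:
    position = sentence.index("cat")
except ValueError as exc:
    position = None
    print(f"Not found: {exc}")  # Not found: substring not found
```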

count()

Returns the number of non-overlapping occurrences of a substring.

sentence = "the three thieves thought that they could trick the other three."
# count() matches substrings, so the "the" inside "they" and "other" is counted too
print(f'The substring "the" appears {sentence.count("the")} times.')  # Output: 4
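Two details are easy to miss: count() looks for substrings rather than whole words, and its matches never overlap. The sketch below counts whole words by splitting on whitespace (stripping only the trailing period, a simplification that suits this example sentence) and contrasts count() with an overlapping search:

```python
sentence = "the three thieves thought that they could trick the other three."

# Whole-word count: split into words first (punctuation handling is simplified)
words = sentence.rstrip(".").split()
whole_word_the = words.count("the")
print(whole_word_the)  # 2

# Non-overlapping: after matching "aa" at index 0, the search resumes at index 2
print("aaaa".count("aa"))  # 2

# An overlapping count needs a manual check at every position
overlapping = sum(1 for i in range(len("aaaa")) if "aaaa".startswith("aa", i))
print(overlapping)  # 3
```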

5. Modifying and Cleaning Strings

Since strings are immutable, these methods return a new, modified copy of the string.

Case Modification

upper(): Converts the entire string to uppercase.
lower(): Converts the entire string to lowercase.
capitalize(): Makes the first character uppercase and the rest lowercase.
title(): Capitalizes the first letter of each word and lowercases the rest.
swapcase(): Swaps the case of every letter (upper becomes lower, lower becomes upper).
casefold(): A more aggressive version of lower(), useful for caseless comparisons across different languages (e.g., German "ß" becomes "ss").

text = "weLCoMe to tHe jUnGle"
print(f"Original: {text}")
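print(f"upper(): {text.upper()}")            # Output: WELCOME TO THE JUNGLE
print(f"lower(): {text.lower()}")            # Output: welcome to the jungle
print(f"capitalize(): {text.capitalize()}")  # Output: Welcome to the jungle
print(f"title(): {text.title()}")            # Output: Welcome To The Jungle
print(f"swapcase(): {text.swapcase()}")      # Output: WElcOmE TO ThE JuNgLE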
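One quirk of title() is that it treats any non-letter as a word boundary, so apostrophes produce results like "They'Re". If that matters, string.capwords() from the standard library capitalizes whitespace-separated words instead:

```python
import string

phrase = "they're bill's friends"
titled = phrase.title()
capped = string.capwords(phrase)

print(titled)  # They'Re Bill'S Friends
print(capped)  # They're Bill's Friends
```

Keep in mind that capwords() splits with str.split() and rejoins with single spaces, so runs of whitespace collapse.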

Stripping Whitespace

strip(): Removes leading and trailing whitespace (spaces, tabs, newlines).
lstrip(): Removes leading (left) whitespace only.
rstrip(): Removes trailing (right) whitespace only.

You can also pass a string of characters to strip those specific characters instead of whitespace.

messy_string = "   \n  some text here   \t "
# repr() shows escape sequences, so the hidden whitespace stays visible
print(repr(messy_string.strip()))   # Output: 'some text here'
print(repr(messy_string.lstrip()))  # Output: 'some text here   \t '
print(repr(messy_string.rstrip()))  # Output: '   \n  some text here'
# Stripping specific characters
path = "///documents/file.txt///"
print(path.strip('/')) # Output: documents/file.txt
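The argument to strip() is treated as a set of characters, not as a prefix or suffix: characters keep being removed from both ends as long as they belong to that set. To remove one exact prefix or suffix, Python 3.9 added removeprefix() and removesuffix():

```python
url = "www.example.com"

# Every leading/trailing character found in "cmowz." is removed, in any order
print(url.strip("cmowz."))       # example

# removeprefix()/removesuffix() remove one exact substring, if present (Python 3.9+)
print(url.removeprefix("www."))  # example.com
print(url.removesuffix(".com"))  # www.example
print(url.removeprefix("ftp."))  # www.example.com (unchanged: no such prefix)
```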

replace()

Replaces all occurrences of a substring with another. You can also provide a third argument to limit the number of replacements.


sentence = "I like cats. Cats are cute."
new_sentence = sentence.replace("cats", "dogs")
print(new_sentence) # Output: I like dogs. Cats are cute. (replace() is case-sensitive)
# Limit the number of replacements with the third argument
counting = "one, two, one, two"
limited_replace = counting.replace("one", "1", 1)
print(limited_replace) # Output: 1, two, one, two
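Because replace() is case-sensitive, a case-insensitive replacement needs another tool. One option (a swapped-in technique from the standard re module, not a str method) is re.sub() with the re.IGNORECASE flag:

```python
import re

sentence = "I like cats. Cats are cute."

# re.escape() guards against regex metacharacters in the search text
result = re.sub(re.escape("cats"), "dogs", sentence, flags=re.IGNORECASE)
print(result)  # I like dogs. dogs are cute.
```

Note that the replacement text is inserted exactly as written, so "Cats" becomes lowercase "dogs".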

6. Lesser-Known but Useful String Functions

Let's explore some string methods that are not as commonly taught but can be incredibly helpful.

casefold()

This method is a more aggressive version of lower(). It's designed for caseless string comparisons and can handle more Unicode characters. For example, the German letter "ß" is equivalent to "ss".

german_string = "Der Fluß"
print(f"lower(): {german_string.lower()}")      # Output: der fluß
print(f"casefold(): {german_string.casefold()}")  # Output: der fluss
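The practical payoff shows up in comparisons. In this sketch (same_text is a hypothetical helper name), lower() treats "Fluß" and "FLUSS" as different, while casefold() treats them as equal:

```python
def same_text(a, b):
    """Hypothetical helper: caseless comparison using casefold()."""
    return a.casefold() == b.casefold()

lower_equal = "Fluß".lower() == "FLUSS".lower()
print(lower_equal)                 # False: 'fluß' != 'fluss'
print(same_text("Fluß", "FLUSS"))  # True: both become 'fluss'
```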

expandtabs()

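This method replaces each tab character (\t) with enough spaces to reach the next tab stop. Tab stops fall every tabsize columns (8 by default), so the number of spaces inserted depends on where the tab appears in the line.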

text = "Name:\tJohn\nAge:\t30"
print("Default tab size (8):")
print(text.expandtabs())
print("\nTab size of 15:")
print(text.expandtabs(15))
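Because each tab advances to the next multiple of the tab size, the number of spaces varies from tab to tab, and the column count restarts after every newline. This sketch makes both effects visible:

```python
sample = "01\t012\t0123\t01234"
padded = sample.expandtabs(4)
print(padded)  # 01  012 0123    01234

# After each newline the column resets, so both values start at column 8
text = "Name:\tJohn\nAge:\t30"
lines = text.expandtabs().splitlines()
print(lines)   # ['Name:   John', 'Age:    30']
starts = [line.index(value) for line, value in zip(lines, ["John", "30"])]
print(starts)  # [8, 8]
```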

ljust(), rjust(), center()

These methods pad a string to a given width with a fill character (a space by default). They mirror the <, >, and ^ alignment options we saw in f-string format specifiers.

text = "menu"
print(f"'{text.ljust(10, '-')}'") # Output: 'menu------'
print(f"'{text.rjust(10, '-')}'") # Output: '------menu'
print(f"'{text.center(10, '-')}'") # Output: '---menu---'
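Two edge cases worth knowing: if the string is already at least as wide as the requested width, it is returned unchanged (never truncated), and the fill argument must be exactly one character, otherwise a TypeError is raised:

```python
text = "menu"

short = text.ljust(2)
print(short)  # menu  (width smaller than len(text): no padding, no truncation)

try:
    text.center(10, "-=")  # two-character fill is rejected
except TypeError as exc:
    print(f"TypeError: {exc}")
```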

zfill()

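This method pads a string on the left with zeros until it reaches the given width. If the string starts with a sign (+ or -), the zeros are inserted after the sign. It's particularly useful for creating fixed-width numeric representations.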

number_str = "42"
print(number_str.zfill(5))  # Output: 00042
negative_number_str = "-42"
print(negative_number_str.zfill(5)) # Output: -0042
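zfill() also handles a leading plus sign and never truncates. For values that are still numbers, an f-string with a zero-padded width gives the same result, as this sketch compares:

```python
plus = "+7".zfill(4)
wide = "12345".zfill(3)
print(plus)  # +007
print(wide)  # 12345  (already wider than 3: unchanged)

n = -42
via_zfill = str(n).zfill(5)
via_fstring = f"{n:05d}"  # width 5 includes the sign
print(via_zfill, via_fstring)  # -0042 -0042
```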

partition() and rpartition()

These methods split a string into three parts around a separator. partition() splits at the first occurrence of the separator, and rpartition() splits at the last. Both return a tuple containing the part before the separator, the separator itself, and the part after.

url = "https://www.example.com/path/to/file.txt"
# Partitioning from the left
print(url.partition("://"))
# Output: ('https', '://', 'www.example.com/path/to/file.txt')
# Partitioning from the right to get the filename
print(url.rpartition('/'))
# Output: ('https://www.example.com/path/to', '/', 'file.txt')
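Unlike split(), partition() always returns a 3-tuple, even when the separator is missing, which makes it convenient for unpacking. A small sketch parsing key=value text (parse_setting and the sample settings are hypothetical, for illustration only):

```python
def parse_setting(line):
    """Hypothetical helper: split 'key=value' at the first '='."""
    key, sep, value = line.partition("=")
    if not sep:  # an empty separator means '=' was not found
        return key, None
    return key, value

print(parse_setting("timeout=30=s"))  # ('timeout', '30=s'): only the first '=' splits
print(parse_setting("debug"))         # ('debug', None)

# When the separator is missing, rpartition() puts the whole string last instead
print("debug".partition("="))         # ('debug', '', '')
print("debug".rpartition("="))        # ('', '', 'debug')
```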

translate() and maketrans()

These two methods work together to perform many single-character replacements (and deletions) in one pass.

str.maketrans() creates a translation table, and translate() applies it.

Simple Two-Argument maketrans()

This maps characters from the first string to the corresponding characters in the second string.


# Replace vowels with numbers
input_str = "hello world"
translation_table = str.maketrans("aeiou", "12345")
translated_str = input_str.translate(translation_table)
print(translated_str) # Output: h2ll4 w4rld
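With the two-argument form, both strings must have the same length, because each character is paired with the one at the same position; mismatched lengths raise a ValueError. And because translate() maps every character in a single pass, it can swap characters, something chained replace() calls cannot do directly:

```python
try:
    str.maketrans("abc", "xy")  # lengths 3 and 2
except ValueError as exc:
    print(f"ValueError: {exc}")

# 'a' -> 'b' and 'b' -> 'a' at the same time
swap = str.maketrans("ab", "ba")
swapped = "abba".translate(swap)
print(swapped)  # baab
```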

Three-Argument maketrans()

The third argument specifies characters to be removed.

# Remove punctuation
input_str = "This is a sentence."
translation_table = str.maketrans("", "", ".,")
translated_str = input_str.translate(translation_table)
print(translated_str) # Output: This is a sentence
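str.maketrans() also accepts a single dictionary that maps characters to replacement strings (of any length) or to None for deletion. Combined with string.punctuation from the standard library, the three-argument form gives a compact way to remove all ASCII punctuation:

```python
import string

# Dictionary form: keys are single characters, values are strings or None
table = str.maketrans({"&": " and ", "@": " at ", "#": None})
expanded = "Tom&Jerry@home #1".translate(table)
print(expanded)  # Tom and Jerry at home 1

# Remove every ASCII punctuation character
no_punct = str.maketrans("", "", string.punctuation)
cleaned = "Hello, world! (Really?)".translate(no_punct)
print(cleaned)  # Hello world Really
```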

encode()

This method returns an encoded version of the string as a bytes object. This is essential when dealing with files, network requests, or any situation where data needs to be represented as a sequence of bytes. The most common encoding is UTF-8.


text = "Hello, world! 😊"
# Encode the string into bytes using UTF-8
encoded_text = text.encode('utf-8')
print(f"Original string: {text}")
print(f"Type of original: {type(text)}")
print(f"Encoded bytes: {encoded_text}")
print(f"Type of encoded: {type(encoded_text)}")
# To get it back, you would use .decode()
decoded_text = encoded_text.decode('utf-8')
print(f"Decoded string: {decoded_text}")
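A detail that often surprises people: the number of characters and the number of bytes can differ, because UTF-8 uses one to four bytes per character. The errors argument controls what happens when a character cannot be represented in the target encoding. A short sketch:

```python
text = "Hello, world! \U0001F60A"  # ends with the 😊 emoji
encoded = text.encode()  # the encoding defaults to 'utf-8'

print(len(text))     # 15 characters
print(len(encoded))  # 18 bytes: the emoji alone takes 4

word = "caf\u00e9"  # 'café'
# ASCII has no 'é'; errors= decides how to handle it instead of raising
replaced = word.encode("ascii", errors="replace")
ignored = word.encode("ascii", errors="ignore")
print(replaced)  # b'caf?'
print(ignored)   # b'caf'
# Without errors=, encode("ascii") would raise UnicodeEncodeError
```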

You’re the Expert!