From Basics to Advanced Techniques: This pebble is designed to be your go-to resource for mastering string manipulation in Python. We'll start with the fundamentals and gradually explore more complex and lesser-known functions. Let's dive in!
In Python, a string is a sequence of characters enclosed in either single quotes ('...'), double quotes ("..."), or triple quotes ('''...''' or """..."""). Strings are immutable, which means that once a string is created, it cannot be changed. However, we can perform various operations to create new strings based on the original.
# Single quotes
my_string1 = 'Hello, World!'
# Double quotes
my_string2 = "Python is fun."
# Triple quotes for multi-line strings
my_string3 = """This is a
multi-line
string."""
print(my_string1)
print(my_string2)
print(my_string3)
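Here is a small sketch of immutability in action. Assigning to an index of a string raises a TypeError, so the usual approach is to build a new string and rebind the name. The variable names are illustrative.

```python
greeting = "Hello, World!"

# Strings do not support item assignment
try:
    greeting[0] = "J"
except TypeError as exc:
    error_message = str(exc)
    print(f"TypeError: {error_message}")

# Build a new string from pieces of the old one instead
new_greeting = "J" + greeting[1:]
print(new_greeting)  # Output: Jello, World!
print(greeting)      # Output: Hello, World!  (the original is unchanged)
```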
You can combine strings using the + operator and repeat them using the * operator.
first_name = "Santosh"
last_name = "V"
# Concatenation
full_name = first_name + " " + last_name
print(full_name) # Output: Santosh V
# Repetition
separator = "-" * 10
print(separator)
# Output: ----------
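One catch with +: both operands must be strings, so adding a number raises a TypeError. The sketch below shows the str() fix, plus str.join(), a common way to combine many strings with a separator in one call. The values are illustrative.

```python
age = 30

# "Age: " + age raises TypeError because age is an int
try:
    label = "Age: " + age
except TypeError:
    label = "Age: " + str(age)  # convert the number to a string first
print(label)  # Output: Age: 30

# join() concatenates an iterable of strings, inserting the separator between items
parts = ["Santosh", "V"]
joined = " ".join(parts)
print(joined)  # Output: Santosh V
```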
Properly formatting strings is crucial for creating readable output. Python offers several ways to do this.
F-strings (formatted string literals, available since Python 3.6) are the modern and preferred way. You prefix the string with an f or F and place variables or expressions directly inside curly braces {}.
name = "Santosh"
age = 30
greeting = f"My name is {name} and I am {age} years old."
print(greeting)
# Output: My name is Santosh and I am 30 years old.
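Since the braces accept expressions, not only variable names, you can compute values inline. The `=` suffix shown last requires Python 3.8 or later. A short sketch reusing the same sample values:

```python
name = "Santosh"
age = 30

# Any expression can go inside the braces, not only plain variable names
next_year = f"Next year I will be {age + 1}."
shout = f"Shouting: {name.upper()}"
print(next_year)  # Output: Next year I will be 31.
print(shout)      # Output: Shouting: SANTOSH

# Python 3.8+: a trailing "=" shows the expression text alongside its value
debug_line = f"{age=}"
print(debug_line)  # Output: age=30
```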
Before f-strings, the .format() method was the standard. You write {} placeholders in the string and then call .format() with the values to insert.
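name = "Santosh"
age = 30
# Using positional arguments: values fill the {} placeholders in order
greeting_pos = "My name is {} and I am {} years old.".format(name, age)
print(greeting_pos) # Output: My name is Santosh and I am 30 years old.
# Using keyword arguments: placeholders are matched by name
greeting_key = "My name is {name} and I am {age} years old.".format(name=name, age=age)
print(greeting_key) # Output: My name is Santosh and I am 30 years old.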
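.format() placeholders can also carry index numbers, which lets you reorder or reuse arguments, and they accept a format specifier after a colon, as f-strings do. A short sketch with illustrative values:

```python
# Numbered placeholders refer to positional arguments by index
reordered = "{1}, {0}".format("Santosh", "V")
print(reordered)  # Output: V, Santosh

# The same index can appear more than once
echo = "{0}-{0}-{0}".format("ha")
print(echo)  # Output: ha-ha-ha

# A format specifier follows a colon inside the placeholder
price = "Price: {:.2f}".format(3.14159)
print(price)  # Output: Price: 3.14
```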
F-strings also accept format specifiers, which give you finer control over how an embedded expression is rendered. The general syntax is {value:format_spec}.
You can align text to the left (<), right (>), or center (^) within a specified width.
text = "Python"
# Left-aligned, width of 20
print(f"'{text:<20}'") # Output: 'Python              '
# Right-aligned, width of 20
print(f"'{text:>20}'") # Output: '              Python'
# Centered, width of 20
print(f"'{text:^20}'") # Output: '       Python       '
# You can also specify a fill character
print(f"'{text:*^20}'") # Output: '*******Python*******'
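Alignment is handy for lining up columns. The width can itself come from a variable by nesting braces inside the specifier. A minimal sketch with made-up data and an arbitrary column width:

```python
rows = [("Apples", 3), ("Bananas", 12), ("Kiwis", 150)]
width = 10  # column width for the names, chosen for this example

lines = []
for fruit, qty in rows:
    # Left-align the name in `width` columns, right-align the quantity in 5
    line = f"{fruit:<{width}}|{qty:>5}"
    lines.append(line)
    print(line)
# Output:
# Apples    |    3
# Bananas   |   12
# Kiwis     |  150
```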
F-strings are powerful for formatting numbers.
number = 1234.56789
# Limiting decimal places
print(f"Two decimal places: {number:.2f}") # Output: 1234.57
# Adding a comma as thousands separator
print(f"With comma: {number:,.2f}") # Output: 1,234.57
# Displaying as a percentage
percentage = 0.85
print(f"Percentage: {percentage:.1%}") # Output: 85.0%
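A few more number specifiers that come up often: zero padding, forced signs, other bases, and scientific notation. A quick sketch with sample values:

```python
count = 42
value = 1234.5

padded = f"{count:05d}"    # zero-pad to width 5
signed = f"{count:+d}"     # always show the sign
hex_form = f"{count:x}"    # lowercase hexadecimal
bin_form = f"{count:b}"    # binary
sci_form = f"{value:.2e}"  # scientific notation, 2 decimal places

print(padded)    # Output: 00042
print(signed)    # Output: +42
print(hex_form)  # Output: 2a
print(bin_form)  # Output: 101010
print(sci_form)  # Output: 1.23e+03
```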
The isdecimal(), isdigit(), and isnumeric() methods all check whether a string consists of numeric characters, but they differ in subtle and important ways.
isdecimal() is the most restrictive. It returns True only if every character is a decimal digit: 0-9 or an equivalent base-10 digit from another script, such as the Arabic-Indic digits.
s1 = "12345"
s2 = "123.45"
s3 = "\u00B2" # Superscript two
print(f"'{s1}'.isdecimal(): {s1.isdecimal()}") # Output: True
print(f"'{s2}'.isdecimal(): {s2.isdecimal()}") # Output: False
print(f"'{s3}'.isdecimal(): {s3.isdecimal()}") # Output: False
isdigit() is broader than isdecimal(). It returns True if all characters are digits, which includes decimal characters plus digit-like characters such as superscripts and subscripts.
s1 = "12345"
s2 = "\u00B2" # Superscript two
s3 = "\u2153" # Vulgar fraction one-third
print(f"'{s1}'.isdigit(): {s1.isdigit()}") # Output: True
print(f"'{s2}'.isdigit(): {s2.isdigit()}") # Output: True
print(f"'{s3}'.isdigit(): {s3.isdigit()}") # Output: False
isnumeric() is the most general of the three. It returns True for digits, fractions, subscripts, superscripts, and other numeric characters such as Roman numerals and Chinese numerals.
s1 = "12345"
s2 = "\u00B2" # Superscript two
s3 = "\u2153" # Vulgar fraction one-third
s4 = "一二三" # Chinese numerals
print(f"'{s1}'.isnumeric(): {s1.isnumeric()}") # Output: True
print(f"'{s2}'.isnumeric(): {s2.isnumeric()}") # Output: True
print(f"'{s3}'.isnumeric(): {s3.isnumeric()}") # Output: True
print(f"'{s4}'.isnumeric(): {s4.isnumeric()}") # Output: True
Key Takeaway:
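Use isdecimal() when you need to be sure int() can convert the string. Note that it returns False for signs, spaces, and decimal points, so "-5" and "3.14" are rejected. isdigit() accepts a broader range of digit-like characters, including some (such as "²") that int() cannot convert. isnumeric() is the most inclusive.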
| isdigit() | isnumeric() | isdecimal() |
|---|---|---|
| True if all characters in the string are digits | True if all characters in the string are numeric | True if all characters in the string are decimal characters |
| Includes 0-9 plus superscripts, subscripts, and other digit-like Unicode characters | Includes digits, fractions, Roman numerals, and other numeric Unicode characters | The most restrictive of the three: only base-10 digits |
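To make the takeaway concrete, here is a sketch of a hypothetical helper, parse_non_negative_int(), that only calls int() after isdecimal() approves the string. It also shows a case where isdigit() says yes but int() still fails.

```python
def parse_non_negative_int(text):
    """Hypothetical helper: return int(text) if every character is a decimal digit, else None."""
    if text.isdecimal():
        return int(text)
    return None

print(parse_non_negative_int("2024"))                # Output: 2024
print(parse_non_negative_int("\u0661\u0662\u0663"))  # Arabic-Indic digits; Output: 123
print(parse_non_negative_int("-5"))                  # Output: None (the sign is not a decimal character)

# isdigit() accepts a superscript two, but int() cannot convert it
sup_two = "\u00B2"
print(sup_two.isdigit())  # Output: True
try:
    int(sup_two)
    int_rejected = False
except ValueError:
    int_rejected = True
print(int_rejected)  # Output: True
```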
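These methods help you locate and count characters or substrings within a string.
find() and rfind()
find() returns the lowest index at which the substring starts, and rfind() returns the highest. Both return -1 when the substring is not found.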
sentence = "The quick brown fox jumps over the lazy dog."
# Find the first occurrence of "fox"
print(f'Index of "fox": {sentence.find("fox")}') # Output: 16
# Find the first occurrence of "the" (case-sensitive)
print(f'Index of "the": {sentence.find("the")}') # Output: 31
# Find the last occurrence of "the"
print(f'Index of "the" from the right: {sentence.rfind("the")}') # Output: 31
# What if the substring isn't there?
print(f'Index of "cat": {sentence.find("cat")}') # Output: -1
index() and rindex() are nearly identical to find() and rfind(), with one critical difference: if the substring is not found, they raise a ValueError instead of returning -1.
Use find() when the substring may be missing and you want an index without handling exceptions. It's convenient in conditional checks such as if my_string.find(...) != -1:, although for a plain yes/no test the in operator ("fox" in sentence) is simpler.
Use index() if you expect the substring to be present and want the program to stop with an error if it's not. This can help catch unexpected data issues.
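A short sketch of index() and rindex() on the same sample sentence, catching the ValueError that a missing substring produces:

```python
sentence = "The quick brown fox jumps over the lazy dog."

print(sentence.index("fox"))  # Output: 16

# index() raises ValueError instead of returning -1
try:
    position = sentence.index("cat")
except ValueError:
    position = None
    print('"cat" not found')

# rindex() searches from the right, like rfind()
print(sentence.rindex("o"))  # Output: 41 (the "o" in "dog")
```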
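count() returns the number of non-overlapping occurrences of a substring. It matches substrings, not whole words, so "the" inside "they" and "other" is counted too.
sentence = "the three thieves thought that they could trick the other three."
print(f'The substring "the" appears {sentence.count("the")} times.') # Output: The substring "the" appears 4 times.
# To count whole words instead, split on whitespace first
print(f'The word "the" appears {sentence.split().count("the")} times.') # Output: The word "the" appears 2 times.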
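"Non-overlapping" means that once a match is found, the search resumes after it. count() also accepts optional start and end positions, interpreted like slice indices. A couple of quick checks:

```python
# After matching "aa" at index 0, the search resumes at index 2
overlap_demo = "aaaa".count("aa")
print(overlap_demo)  # Output: 2 (not 3)

word = "banana"
print(word.count("a"))        # Output: 3
print(word.count("a", 2))     # Output: 2 (searches word[2:] == "nana")
print(word.count("a", 2, 4))  # Output: 1 (searches word[2:4] == "na")
```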
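The case-conversion methods upper(), lower(), capitalize(), title(), and swapcase() follow the same rule: since strings are immutable, they return a new, modified copy of the string.
text = "weLCoMe to tHe jUnGle"
print(f"Original: {text}") # Output: Original: weLCoMe to tHe jUnGle
print(f"upper(): {text.upper()}") # Output: upper(): WELCOME TO THE JUNGLE
print(f"lower(): {text.lower()}") # Output: lower(): welcome to the jungle
print(f"capitalize(): {text.capitalize()}") # Output: capitalize(): Welcome to the jungle
print(f"title(): {text.title()}") # Output: title(): Welcome To The Jungle
print(f"swapcase(): {text.swapcase()}") # Output: swapcase(): WElcOmE TO ThE JuNgLE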
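One quirk worth knowing: title() starts a new "word" after any non-letter, so apostrophes produce odd capitals. The standard library's string.capwords() splits on whitespace only (and collapses runs of whitespace), which usually gives the expected result for such text. A sketch with an illustrative phrase:

```python
import string

phrase = "they're bill's friends"

titled = phrase.title()
capworded = string.capwords(phrase)
print(titled)     # Output: They'Re Bill'S Friends
print(capworded)  # Output: They're Bill's Friends
```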
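strip() removes leading and trailing whitespace, while lstrip() and rstrip() remove it only from the left or the right end. You can also pass a string of characters to strip those specific characters instead of whitespace; the argument is treated as a set of characters, not as an exact prefix or suffix.
messy_string = " \n some text here \t "
# repr() shows the remaining whitespace as escape sequences
print(repr(messy_string.strip())) # Output: 'some text here'
print(repr(messy_string.lstrip())) # Output: 'some text here \t '
print(repr(messy_string.rstrip())) # Output: ' \n some text here'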
# Stripping specific characters
path = "///documents/file.txt///"
print(path.strip('/')) # Output: documents/file.txt
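Treating the argument as a character set can remove more than you expect. Since Python 3.9, removeprefix() and removesuffix() remove one exact prefix or suffix instead. A short comparison with illustrative strings:

```python
# Every leading/trailing character found in "cmowz." is removed
stripped = "www.example.com".strip("cmowz.")
print(stripped)  # Output: example

line = "Arthur: three!"
# lstrip() keeps going while characters are in the set, eating "thr" too
over_stripped = line.lstrip("Arthur: ")
print(over_stripped)  # Output: ee!

# removeprefix()/removesuffix() (Python 3.9+) remove the exact text once
exact = line.removeprefix("Arthur: ")
no_ext = "report.txt".removesuffix(".txt")
print(exact)   # Output: three!
print(no_ext)  # Output: report
```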
replace() returns a copy in which every occurrence of a substring is replaced with another. The match is case-sensitive. You can also provide a third argument to limit the number of replacements.
sentence = "I like cats. Cats are cute."
new_sentence = sentence.replace("cats", "dogs")
print(new_sentence) # Output: I like dogs. Cats are cute.
# Only the lowercase "cats" matched; "Cats" is left alone
# The third argument caps the number of replacements
repeated = "cat cat cat"
print(repeated.replace("cat", "dog", 2)) # Output: dog dog cat
Let's explore some string methods that are not as commonly taught but can be incredibly helpful.
casefold() is a more aggressive version of lower(). It's designed for caseless string comparisons and handles more Unicode characters. For example, it converts the German letter "ß" to "ss", which lower() leaves unchanged.
german_string = "Der Fluß"
print(f"lower(): {german_string.lower()}") # Output: der fluß
print(f"casefold(): {german_string.casefold()}") # Output: der fluss
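A typical use is a case-insensitive equality check. The helper name below is hypothetical; the point is that both sides are casefolded before comparing:

```python
def equals_ignoring_case(a, b):
    """Hypothetical helper: compare two strings caselessly using casefold()."""
    return a.casefold() == b.casefold()

lower_match = "STRASSE".lower() == "straße".lower()
fold_match = equals_ignoring_case("STRASSE", "straße")
print(lower_match)  # Output: False
print(fold_match)   # Output: True
```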
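expandtabs() replaces each tab character (\t) with enough spaces to reach the next tab stop. Tab stops occur every tabsize columns (8 by default), so a tab does not always become the same number of spaces.
text = "Name:\tJohn\nAge:\t30"
print("Default tab size (8):")
print(text.expandtabs())
# Name:   John
# Age:    30
print("\nTab size of 15:")
print(text.expandtabs(15))
# Name:          John
# Age:           30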
ljust(), rjust(), and center() are the method equivalents of the <, >, and ^ alignment options in f-strings: they pad a string with a specified character (a space by default) to a given width.
text = "menu"
print(f"'{text.ljust(10, '-')}'") # Output: 'menu------'
print(f"'{text.rjust(10, '-')}'") # Output: '------menu'
print(f"'{text.center(10, '-')}'") # Output: '---menu---'
zfill() pads a numeric string on the left with zeros up to the given width. It's particularly useful for fixed-width numeric representations, and it keeps a leading + or - sign in front of the zeros.
number_str = "42"
print(number_str.zfill(5)) # Output: 00042
negative_number_str = "-42"
print(negative_number_str.zfill(5)) # Output: -0042
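The sign handling is what sets zfill() apart from padding with rjust(). A quick comparison, including the equivalent format specifier for actual int values:

```python
# rjust() with "0" pads blindly, so the sign ends up in the middle
blind = "-42".rjust(5, "0")
signed_pad = "-42".zfill(5)
print(blind)       # Output: 00-42
print(signed_pad)  # Output: -0042

# zfill() never truncates; a string that is already wide enough comes back unchanged
unchanged = "123456".zfill(3)
print(unchanged)   # Output: 123456

# For int values rather than strings, a format specifier gives the same padding
formatted = f"{-42:05d}"
print(formatted)   # Output: -0042
```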
partition() and rpartition() split a string into three parts around a separator. partition() splits at the first occurrence of the separator, and rpartition() splits at the last. Both return a tuple containing the part before the separator, the separator itself, and the part after.
url = "https://www.example.com/path/to/file.txt"
# Partitioning from the left
print(url.partition("://"))
# Output: ('https', '://', 'www.example.com/path/to/file.txt')
# Partitioning from the right to get the filename
print(url.rpartition('/'))
# Output: ('https://www.example.com/path/to', '/', 'file.txt')
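Because the result is always a 3-tuple, you can unpack it directly, and the middle element tells you whether the separator was found. A sketch using a hypothetical "key = value" line:

```python
config_line = "timeout = 30"

# Split the key/value pair at the first "="
key, sep, value = config_line.partition("=")
print(repr(key.strip()), repr(value.strip()))  # Output: 'timeout' '30'

# If the separator is missing, partition() returns (whole, "", "")
# and rpartition() returns ("", "", whole)
missing_left = "no separator".partition("=")
missing_right = "no separator".rpartition("=")
print(missing_left)   # Output: ('no separator', '', '')
print(missing_right)  # Output: ('', '', 'no separator')
```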
str.maketrans() and translate() work together to perform character-by-character replacements.
str.maketrans() creates a translation table, and translate() applies it.
With two arguments of equal length, str.maketrans() maps each character in the first string to the character at the same position in the second string.
# Replace vowels with numbers
input_str = "hello world"
translation_table = str.maketrans("aeiou", "12345")
translated_str = input_str.translate(translation_table)
print(translated_str) # Output: h2ll4 w4rld
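str.maketrans() also accepts a single dictionary. Its keys must be single characters (or code points), and each value can be a replacement string of any length or None to delete the character. A small sketch with an illustrative input:

```python
# "&" becomes " and ", "#" is deleted
table = str.maketrans({"&": " and ", "#": None})
result = "salt&pepper #1".translate(table)
print(result)  # Output: salt and pepper 1
```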
An optional third argument lists characters to delete from the result.
# Remove punctuation
input_str = "This is a sentence."
translation_table = str.maketrans("", "", ".,")
translated_str = input_str.translate(translation_table)
print(translated_str) # Output: This is a sentence
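Rather than listing punctuation by hand, you can build the deletion table from string.punctuation. Note that it only covers ASCII punctuation, so characters such as curly quotes would survive. A sketch with a sample sentence:

```python
import string

# Build the deletion table once from every ASCII punctuation character
remove_punct = str.maketrans("", "", string.punctuation)

text = "Hello, world! (It's a test...)"
cleaned = text.translate(remove_punct)
print(cleaned)  # Output: Hello world Its a test
```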
encode()
This method returns an encoded version of the string as a bytes object. This is essential when dealing with files, network requests, or any situation where data needs to be represented as a sequence of bytes. The most common encoding is UTF-8.
text = "Hello, world! 😊"
# Encode the string into bytes using UTF-8
encoded_text = text.encode('utf-8')
print(f"Original string: {text}")
print(f"Type of original: {type(text)}")
print(f"Encoded bytes: {encoded_text}")
print(f"Type of encoded: {type(encoded_text)}")
# To get it back, you would use .decode()
decoded_text = encoded_text.decode('utf-8')
print(f"Decoded string: {decoded_text}")
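Two practical consequences of encoding: the byte length can differ from the character length, and an encoding that cannot represent a character raises UnicodeEncodeError unless you pass an errors= policy. A sketch with sample strings ("caf\u00e9" is "café" written with an escape so the é is unambiguous):

```python
text = "Hello, world! 😊"
encoded = text.encode("utf-8")

# len() counts characters for str but bytes for bytes; the emoji takes 4 bytes in UTF-8
print(len(text))     # Output: 15
print(len(encoded))  # Output: 18

# ASCII cannot represent "é", so the default "strict" policy raises an error
try:
    "caf\u00e9".encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)  # Output: True

# The errors parameter chooses another policy
replaced = "caf\u00e9".encode("ascii", errors="replace")
ignored = "caf\u00e9".encode("ascii", errors="ignore")
print(replaced)  # Output: b'caf?'
print(ignored)   # Output: b'caf'
```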