Python Gotcha: strip, lstrip, rstrip can remove more than expected
Posted on Fri 29 March 2024 in Technical Solutions
Introduction
As a software engineer, you've cleaned your fair share of dirty strings. Removing leading or trailing spaces is probably one of the most common things done to user input.
In Python, this is done with the .strip(), .lstrip() or .rstrip() functions and generally looks like this:
>>> " Andrew Wegner ".lower().strip()
'andrew wegner'
>>> " Andrew Wegner ".lower().lstrip()
'andrew wegner '
>>> " Andrew Wegner ".lower().rstrip()
' andrew wegner'
That's pretty straightforward and nothing unexpected is going on.
Gotcha
The gotcha is that each of these functions takes an optional argument: a list of characters that can be removed.
>>> "Andrew Wegner".lower().rstrip(" wegner")
'and'
What happened? Why wasn't the result just 'andrew'?
Explanation
Read that line again, carefully:
A list of characters
Not a list of strings.
This is explicitly spelled out in the documentation, with an example showing what the implications are. However, for a new developer, it's unexpected behavior. After all, these seem like intuitive functions.
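One quick way to convince yourself of this is to shuffle or duplicate the characters in the argument. The snippet below is only an illustrative sketch (the variable names are made up for this example), but it shows that the order and repetition of the characters make no difference:

```python
text = "andrew wegner"

# The argument is treated as a collection of individual characters,
# so neither their order nor any repeats change the result.
original = text.rstrip(" wegner")
shuffled = text.rstrip("rngew ")
repeated = text.rstrip("  wweeggnneerr")

print(original, shuffled, repeated)
```

All three calls strip exactly the same characters from the right-hand side.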
The example with my name does the following:
- Receives a list of characters to remove. In this case it is all the letters in my last name, plus the space character: ' wegner'
- Lowercases all letters in the input string, resulting in 'andrew wegner'
- From the right-hand side of the string, begins removing characters that are in the input list, and stops at the first character not in the list. In this case, that means 'rengew wer' is removed (right to left), and then the 'd' in 'andrew' is encountered, so the rstrip function stops.
- Returns the remaining string, 'and'
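The steps above can be sketched as a small loop. This is a rough illustration of the behavior, not CPython's actual implementation, and rstrip_sketch is a hypothetical name used only here:

```python
def rstrip_sketch(text: str, chars: str) -> str:
    """Illustrative stand-in for str.rstrip(chars); not the real implementation."""
    removable = set(chars)  # the argument is just a collection of characters
    end = len(text)
    # Walk leftwards from the end while the current character is removable.
    while end > 0 and text[end - 1] in removable:
        end -= 1
    # Everything to the left of the first non-removable character is kept.
    return text[:end]


lowered = "Andrew Wegner".lower()
print(rstrip_sketch(lowered, " wegner"))
```

The loop never looks at the argument as a whole string; it only asks whether each trailing character is a member of the set, which is why it runs straight through the space and into my first name.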
Solution
Python has two functions that will correctly remove a string: .removesuffix() and .removeprefix(), for right and left side removals respectively.
>>> "Andrew Wegner".lower().removesuffix(" wegner")
'andrew'
These two functions were introduced in Python 3.9 as part of PEP 616. The PEP explicitly calls out the confusion users have about the *strip() functions and how they behave, and the two new functions were added to provide the desired behavior.
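If you are stuck on a Python version older than 3.9, the same behavior can be approximated with startswith(), endswith() and slicing. The helper names remove_suffix and remove_prefix below are hypothetical, and this is a minimal sketch rather than a drop-in backport:

```python
def remove_suffix(text: str, suffix: str) -> str:
    # Guard against an empty suffix: text[:-0] would return an empty string.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


def remove_prefix(text: str, prefix: str) -> str:
    # An empty prefix is harmless here because text[0:] is the whole string.
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


name = "Andrew Wegner".lower()
print(remove_suffix(name, " wegner"))
print(remove_prefix(name, "andrew "))
```

Unlike the *strip() functions, both helpers compare the argument as a whole string and leave the input untouched when it does not match.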
One important note is that these two remove* functions will remove at most one instance of the string.
>>> "Andrew Wegner Wegner".lower().removesuffix(" wegner")
'andrew wegner'
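If you do want every repeated copy gone, one option is to call removesuffix() in a loop. The helper below, remove_all_suffixes, is a hypothetical name for this sketch, and it assumes Python 3.9 or newer:

```python
def remove_all_suffixes(text: str, suffix: str) -> str:
    # An empty suffix always "matches", so bail out to avoid looping forever.
    if not suffix:
        return text
    # Keep removing one whole copy of the suffix until none is left at the end.
    while text.endswith(suffix):
        text = text.removesuffix(suffix)
    return text


print(remove_all_suffixes("Andrew Wegner Wegner".lower(), " wegner"))
```

This still matches the suffix as a whole string each time, so it stops as soon as the end of the text no longer matches, rather than eating into the rest of the string.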