Python Gotcha: Comparisons
Posted on Wed 18 October 2023 in Technical Solutions
Float Equality Comparisons
Like most programming languages, Python stores floats in binary, so it can't represent many decimal fractions, such as 0.1, exactly. I'm sure many Computer Science students have lost many hours learning how floating-point representation works. I remember that class well.
In any case, let's get into the problem with comparing float values in Python and how to handle it.
>>> 0.1 + 0.2 == 0.3
False
You, me, and anyone with a few years of elementary school under their belt can see that this should be True.
What's happening here? We can see the problem by breaking down the component parts of this comparison.
>>> 0.3
0.3
>>> 0.1 + 0.2
0.30000000000000004
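And now we see the floating-point representation that is causing the problem: neither 0.1 nor 0.2 can be stored exactly, and the tiny errors add up to a result that is not the float closest to 0.3.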
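To make that concrete, here is a small sketch of my own using the standard library's fractions module, which converts a float into the exact ratio it stores. The variable names are only for illustration.

```python
from fractions import Fraction

# Fraction(float) captures the exact binary value the float stores,
# with no rounding for display.
stored_tenth = Fraction(0.1)
true_tenth = Fraction(1, 10)

# The stored value is not one tenth; it is slightly larger.
print(stored_tenth == true_tenth)  # False
print(stored_tenth > true_tenth)   # True

# The float sum and the float literal 0.3 are different stored values.
print(Fraction(0.1 + 0.2) == Fraction(0.3))  # False
```

Values that are exact in binary, such as 0.5, don't have this problem, which is why some float comparisons appear to work and others don't.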
So, how do we deal with this?
Decimal
There are a couple of options, both of which have their drawbacks. Let's start with Decimal.
The decimal module provides support for fast correctly rounded decimal floating point arithmetic.
This sounds good, but there is an important gotcha here too: Decimal treats a float argument very differently from a string argument.
>>> from decimal import Decimal
>>> Decimal(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')
>>> Decimal('0.1')
Decimal('0.1')
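The float 0.1 already carries its binary error before Decimal ever sees it, while the string '0.1' is parsed exactly. This means that, to accomplish the comparison above, we need to pass each of our numbers as a string.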
>>> Decimal('0.1') + Decimal('0.2') == Decimal('0.3')
True
That's annoying, but it works.
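In practice the string requirement is less painful when the numbers already arrive as text, for example from a CSV file or a form. Here is a minimal sketch under that assumption; the total helper and the prices list are hypothetical names of mine.

```python
from decimal import Decimal


def total(prices):
    """Sum a list of price strings exactly using Decimal."""
    # Start from Decimal("0") so the result is always a Decimal.
    return sum((Decimal(p) for p in prices), Decimal("0"))


prices = ["0.10", "0.20"]

exact = total(prices)
as_float = sum(float(p) for p in prices)

print(exact == Decimal("0.30"))  # True
print(as_float == 0.3)           # False
```

Converting the strings straight to Decimal means the values never pass through a float, so no binary rounding error is picked up along the way.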
isclose
Python 3.5 implemented PEP 485, which adds a test for approximate equality. This is done with the math.isclose function.
>>> import math
>>> math.isclose(0.1 + 0.2, 0.3)
True
That's cleaner than wrapping everything in strings, but it is also more verbose than a simple == comparison. It makes your code a little less clean, but it does give you a comparison that tolerates floating-point rounding error.
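One thing worth knowing, going by the documented defaults of rel_tol=1e-09 and abs_tol=0.0: the default check is purely relative, so comparing a tiny value against zero fails unless you supply an absolute tolerance. The sketch below shows this; the value of tiny and the tolerance of 1e-9 are arbitrary choices of mine, not recommendations.

```python
import math

# The default relative tolerance handles ordinary rounding error.
print(math.isclose(0.1 + 0.2, 0.3))  # True

# Against zero, a relative tolerance scales down to nothing,
# so even a very small value is reported as "not close".
tiny = 1e-12
print(math.isclose(tiny, 0.0))  # False

# An explicit absolute tolerance covers comparisons near zero.
print(math.isclose(tiny, 0.0, abs_tol=1e-9))  # True
```

What counts as "close enough" depends on your data, so treat both tolerances as values to choose deliberately.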
is vs. ==
Another comparison that I've commonly seen misapplied is developers using is when they mean ==. Put simply, is should ONLY be used to check whether two references point to the same object. == compares values by calling the object's underlying __eq__ method.
Let's see this in action:
>>> a = [1, 2, 3]
>>> b = a
>>> c = [1, 2, 3]
>>> d = [3, 2, 1]
>>> a == b
True
>>> a == c
True
>>> a == d
False
So far, nothing unusual. a has the same values, in the same order, as b and c. d holds the same values in a different order, so it is not equal to a. Now let's use is:
>>> a is b
True
>>> b is a
True
>>> a is c
False
>>> a is d
False
>>> b is c
False
Here, the only True results come from comparing a and b. This is because b was initialized with the statement b = a, which binds b to the same list object rather than copying it. The other two variables were each created from their own list literal, so they are separate objects. Remember, is compares object references. If they refer to the same object, it returns True.
>>> id(a), id(b), id(c), id(d)
(2267170738432, 2267170738432, 2267170545600, 2267170359040)
Since a and b are the same object, they share an id and we get True when comparing them with is. The others are different objects, hence the False. The exact id values will differ on your machine.
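The same split shows up with your own classes. Below is a minimal sketch using a hypothetical Point class of mine: == goes through the __eq__ method I define, while is ignores it entirely and only looks at object identity.

```python
class Point:
    """Hypothetical value class used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Called by ==; compares coordinates, not identity.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
q = Point(1, 2)  # equal values, separate object
r = p            # second name for the same object

print(p == q)  # True
print(p is q)  # False
print(p is r)  # True
```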
nan == nan
nan, or Not a Number, is a floating point value that cannot be converted to anything other than a float and is considered not equal to all other values. It's a common way to represent missing values in a data set.
There is a key phrase in that description above that is the basis for this Gotcha:
is considered not equal to all other values
It's common for software to check if two values are equal to one another prior to taking an action. For nan, that does not work:
>>> float('nan') == float('nan')
False
This prevents code like this from entering the if block of an if/else statement:
>>> a = float('nan')
>>> b = float('nan')
>>> if a == b:
...     print('equal')      # never runs for nan
... else:
...     print('not equal')
...
not equal
In this example, they are never equal.
This leads to an interesting, if unintuitive, way of checking whether a variable holds a nan value. Since nan is not equal to any value, it is not even equal to itself.
>>> a != a
True
Like I said, "interesting". But when other people read your code, it's also "confusing". There is a clearer way to show that you are checking for a nan value: math.isnan.
>>> import math
>>> a = float('nan')
>>> b = 5
>>> c = float('infinity')
>>> math.isnan(a)
True
>>> math.isnan(b)
False
>>> math.isnan(c)
False
To me, that's a much clearer signal that we are checking whether the value is nan. In practice, you probably aren't just assigning nan to a single variable. You're more likely using a library like NumPy or Pandas. In that case, each of those libraries has its own function that can check for nan in a performant way.
- In NumPy, the function has the same name but lives in the NumPy library: numpy.isnan(value).
- In Pandas, the function has a slightly different name: pandas.isna(value).
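If you have neither library at hand, the same check can be sketched with math.isnan over a plain list. This is only an illustration with made-up readings, and it will be slower than the vectorized library functions on large data.

```python
import math

# Hypothetical sensor readings with two missing values.
readings = [1.5, float("nan"), 3.0, float("nan")]

# Count and filter missing values with an explicit nan check.
missing = sum(1 for r in readings if math.isnan(r))
clean = [r for r in readings if not math.isnan(r)]

# An equality test against nan never matches anything.
naive = sum(1 for r in readings if r == float("nan"))

print(missing)  # 2
print(clean)    # [1.5, 3.0]
print(naive)    # 0
```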
Conclusion
Comparisons aren't always as straightforward as we'd like. I covered a few common comparison problems in Python here.
Floating-point comparison problems are common across languages. Python has a few ways of making this easier for developers. I recommend using math.isclose, as it keeps the code a bit cleaner and avoids the need to wrap numbers in strings, as the Decimal module requires.
is should only be used to check whether two names refer to the same object. In any other case, it's not doing the check you want it to be doing.
Lastly, nan is not equal to anything, including itself. It's important to be aware of that before you start comparing values in your dataset to one another.