Python Gotcha: Comparisons

Posted on Wed 18 October 2023 in Technical Solutions

Float Equality Comparisons

Like every other programming language, Python can't accurately represent floating-point numbers. I'm sure many Computer Science students have lost many hours learning how floating representation works. I remember that class well.

In any case, let's get into the problem with comparing float values in Python and how to handle it.

>>> 0.1 + 0.2 == 0.3
False

You, me, and anyone with a few years of elementary school under their belt can see that this should be True.

What's happening here? We can see the problem by breaking down the component parts of this comparison.

>>> 0.3
0.3
>>> 0.1 + 0.2
0.30000000000000004

And now we see the floating point representation that is causing a problem.

So, how do we deal with this?

Decimal

There are a couple options, both of which have their drawbacks. Let's start with Decimal.

The decimal module provides support for fast correctly rounded decimal floating point arithmetic.

This sounds good, but an important gotcha here too is how it handles numerals vs. strings.

>>> from decimal import Decimal
>>> Decimal(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')
>>> Decimal('0.1')
Decimal('0.1')

This means, to accomplish the comparison above, we need to wrap each of our numbers in a string.

>>> Decimal('0.1') + Decimal('0.2') == Decimal('0.3')
True

That's annoying, but does function.

isclose

Python 3.5 implemented PEP 485 to test for approximate equality. This is done in the isclose function.

>>> import math
>>> math.isclose(0.1+0.2,0.3)
True

That's cleaner than wrapping everything in strings. But, it also is more verbose than a simple == statement. It makes your code less clean, but does provide accurate comparisons.

is vs. ==

Another comparison that I've commonly seen misapplied is developers using is when they mean ==. Put simply, is should ONLY be used if you are checking if two references refer to the same object. == is used to compare value by calling underlying __eq__ methods.

Let's see this in action:

>>> a = [1, 2, 3]
>>> b = a
>>> c = [1, 2, 3]
>>> d = [3, 2, 1]
>>> a == b
True
>>> a == c
True
>>> a == d
False

So far, nothing unusual. a has the same values of b and c and different values from d. Now let's use is

>>> a is b
True
>>> b is a
True
>>> a is c
False
>>> a is d
False
>>> b is c
False

Here, the only True statements are the comparison between a and b. This is because b was initialized with the statement b = a. The other two variables were initialized as their own statements and values. Remember, is compares object references. If they are the same, it returned True.

>>> id(a), id(b), id(c), id(d)
(2267170738432, 2267170738432, 2267170545600, 2267170359040)

Since a and b are the same object, we get a True on their comparison. The others are different, hence the False.

nan == nan

nan, or Not a Number, is a floating point value that can not be converted to anything other than a float and is considered not equal to all other values. It's a common way to represent missing values in a data set.

There is a key phrase in that description above that is the basis for this Gotcha:

is considered not equal to all other values

It's common for software to check if two values are equal to one another prior to taking an action. For nan, that does not work:

>>> float('nan') == float('nan')
False

This prevents code like this from entering the if block of an if/else statement

>>> a = float('nan')
>>> b = float('nan')
>>> if a == b:
...   .. ## Do something if equal
... else:
...   .. ## Do something if not equal

In this example, they are never equal.

This leads to an interesting, if not unintutive, way of checking if a variable is a nan value. Since nan is not equal to all other values, it is not equal to itself.

>>> a != a
True

Like I said, "Interesting". But, when your code is looked at by others it's also "confusing". There is an easier way to show that you are checking for a nan value. isnan

>>> import math
>>> a = float('nan')
>>> b = 5
>>> c = float('infinity')
>>> math.isnan(a)
True
>>> math.isnan(b)
False
>>> math.isnan(c)
False

To me, that's a much clearer check that we want to see if the value is nan. It's likely you aren't just passing nan to a single variable. You're probably using a library like NumPy or Pandas. In that case, you have functions in each of those libraries that can check for nan in a performant way.

  • In NumPy the function has the same name but in the NumPy library: numpy.isnan(value).
  • In Pandas, the function has a slightly different name: pandas.isna(value)

Conclusion

Comparisons aren't always as straight forward as we'd like. I covered a few common comparison problems in Python here.

Floating point comparisons are common across languages. Python has a few ways of making this easier for developers. I recommend utilizing isclose as it keeps the code a bit cleaner and eliminates the need to wrap numbers in strings if using the Decimal module.

is should only being used to check if two items are referring to the same object. In any other case, it's not doing the check you want it to be doing.

Lastly, nan is equal to nothing else. It's important to be aware of that before you start comparing values in your dataset to one another.


- is a father, an engineer and a computer scientist. He is interested in online community building, tinkering with new code and building new applications. He writes about his experiences with each of these.