Benchmarking Python Heaps

I was working on some software that needs a cache. This is pretty common in the DNS world, and indeed caches with a timeout have many uses. The basic operations are:

  • Add an entry with a key and a given expiration time
    This is the basic operation when we have a new entry.
  • Lookup an entry by some key
    This is what we do to see if an entry is already in the cache.
  • Update an entry to have a new expiration time
    When we want to refresh an item in the cache, for example when it has been used, we want to update the time when it is scheduled to expire.
  • Find or remove the entry for the next expiration time
    This operation is for when we want to clean items out of the cache. (Note that we can either remove items when they are scheduled to expire, or simply remove items when we are out of space. We need the same thing in both cases: find the item with the oldest expiration time.)

In Python, we have a built-in dictionary data structure which works fine for looking up an entry by a key. We also have the heapq module as part of the standard Python library, which implements a heap; this works fine for adding, removing, and updating based on the expiration time. Unfortunately these two data structures don’t work so well together. The basic problem is that when an item changes position in the heap, the object does not know it, so if you reference it from the dictionary you have to search through the entire heap to remove the entry.
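To make the mismatch concrete, here is a rough sketch (my illustration, not code from the software in question) of refreshing an entry when the expiration times live in a plain heapq heap: the dictionary part is instant, but the heap has to be scanned and rebuilt.

import heapq

cache = {}         # key -> value, for O(1) lookup
expirations = []   # heap of (expire_time, key) tuples

def add(key, value, expire_time):
    cache[key] = value
    heapq.heappush(expirations, (expire_time, key))

def refresh(key, new_expire_time):
    # The painful part: the heap offers no way to find an entry by key,
    # so we scan the whole list, rebuild the heap, and push the new time.
    global expirations
    expirations = [(t, k) for (t, k) in expirations if k != key]
    heapq.heapify(expirations)
    heapq.heappush(expirations, (new_expire_time, key))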

I checked the always-useful Python Package Index (PyPI) and found the HeapDict module, which does exactly what I want. Fantastic! Problem solved.
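Here is a minimal sketch of the four cache operations listed above using HeapDict; I am assuming its mapping-style interface, where assigning to a key sets that key's priority and peekitem()/popitem() return the entry with the smallest priority.

import time
from heapdict import heapdict

expirations = heapdict()   # maps each key to its expiration time

# Add an entry with a given expiration time.
expirations["example.com"] = time.time() + 300

# Lookup an entry by key, like any dictionary.
if "example.com" in expirations:
    print("cached")

# Update an entry to a new expiration time; the heap reorders itself.
expirations["example.com"] = time.time() + 600

# Find (or remove) the entry with the next expiration time.
key, expire_time = expirations.peekitem()   # look without removing
key, expire_time = expirations.popitem()    # remove the soonest-to-expire entry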

But…

I couldn’t help wondering if maybe one could somehow use the built-in heapq and get better performance, if for no other reason than heapq is coded in C instead of pure Python. The secret would be to have a way to update information in the contents of the heap when they changed position in the underlying list. I even coded up a list implementation that provided this functionality before I realized that I had not actually measured the performance to see if it made sense.
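The idea looks roughly like this. This is only a sketch of the concept, not the implementation mentioned above; it assumes the heap entries are objects with a position attribute, and it only helps with the pure-Python heapq, since the C version manipulates the underlying list directly.

from collections import UserList

class Entry:
    # A heap entry that always knows where it currently sits in the heap.
    def __init__(self, expire_time, key):
        self.expire_time = expire_time
        self.key = key
        self.position = None

    def __lt__(self, other):
        return self.expire_time < other.expire_time

class TrackingList(UserList):
    # A list that writes the new index back into an entry whenever the
    # heap algorithm moves it, so a dictionary can find it in O(1).
    def __setitem__(self, index, entry):
        super().__setitem__(index, entry)
        entry.position = index

    def append(self, entry):
        super().append(entry)
        entry.position = len(self.data) - 1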

So I decided to benchmark not only the heapdict and heapq modules, but also all of the various other heap implementations in PyPI.

All benchmarks were run single-threaded in Python on my laptop. I used the perf module, which I saw presented at FOSDEM 2017.

The Tests

In order to benchmark the various heap implementations, we use each module's published API to check the two fundamental operations:

  • Adding something to the heap.
  • Removing the smallest (or largest) item from the heap.

We do NOT check the operations:

  • Convert an unordered list into a heap.
  • Remove an arbitrary item from the heap.
  • Change the priority of an item on the heap.

Converting an unordered list into a heap may be interesting, but not so much for implementing a cache.

On a basic heap, both removing an arbitrary item and changing an item's priority require a linear search through the heap, so they are not the main strength of a heap implementation.

In the tests, each item that we add is a single-element tuple, something like this:

import heapq

h = []
heapq.heappush(h, (42,))

The reason for this is that a heap is very rarely used to store lists of numbers; much more often it is used to order objects of some kind. A single-element tuple should be about the cheapest non-numeric item that we can add.
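The actual benchmark driver uses the perf module and is linked in the references below; a simplified stand-in using only the standard library's timeit might look something like this for the ascending-order insertion case:

import timeit

# Time pushing 1000 single-element tuples onto a heapq heap in ascending order.
# (The descending and random cases only change how the keys are generated.)
elapsed = timeit.timeit(
    stmt="""
h = []
for i in range(1000):
    heapq.heappush(h, (i,))
""",
    setup="import heapq",
    number=1000,
)
print("ascending insertion, 1000 items x 1000 runs: %.3f seconds" % elapsed)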

The Modules

I did a simple search of PyPI using:

$ python3 -m pip search heap

I ended up with the following modules:

  • HeapDict
    This module acts as both a heap and a dictionary. It is pure Python.
  • heapq
    This is included in the Python standard library. It uses a C implementation if available, otherwise a Python version.
  • pyheapq
    This is a version of heapq that I copied from the Python standard library, with the part that imports the C implementation removed. The idea is to make a fair comparison of the heapq algorithm against the other modules written in pure Python. Also, because the C implementation of heapq updates the list that represents the heap directly, you cannot use a UserList or any other list with modified behavior; the pure-Python version works as expected with such a list.
  • binaryheap
    This is a fairly straightforward Python implementation, which seems to be relatively new (8 months old).
  • heapqueue
    A Python implementation that appears to be under active development, with the last GitHub commit only a few days ago. Looking at the current GitHub code we see stubbed-out versions of both binomial and Fibonacci heaps, although those have not yet been implemented. So perhaps there are larger plans for this module.
  • fibonacci-heap-mod
    This module is a bit different from the others. It implements a Fibonacci heap, rather than a binary heap. This is a more expensive data structure, but it has better run time for large heaps. It is a port of a Java implementation, so it is a bit less Pythonic than the other modules, although this may actually result in better performance. (See the sketch just after this list.)
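A rough sketch of using fibonacci-heap-mod follows. Note that the names Fibonacci_heap, enqueue(), dequeue_min(), get_value() and get_priority() are from memory of its Java-style interface, so treat them as assumptions and check the module's own documentation.

import fibonacci_heap_mod

# Assumed interface: enqueue(value, priority) adds an entry with an explicit
# numeric priority; dequeue_min() removes and returns the lowest-priority entry.
heap = fibonacci_heap_mod.Fibonacci_heap()
heap.enqueue("example.org", 17)
heap.enqueue("example.com", 42)

entry = heap.dequeue_min()
print(entry.get_value(), entry.get_priority())   # expected: example.org 17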

Note that the binaryheap and heapqueue modules may be more recent in GitHub than in PyPI. I did not spend any effort trying the latest versions. Also, I did some profiling and made some performance tweaks to the HeapDict module, and got about a 30% speed boost (you can find that in my GitHub fork); the benchmarks here are for the module available in PyPI.

Insertion Results

For insertion in a small heap, the results are as follows:

[Chart: insertion times, CPython, 1,000 elements (cpython-heap-insertion-n1000)]

We see the same behavior for all of the binary-heap implementations: inserting in ascending order is fastest, descending order is slowest, and random order is in the middle. The standard library's C-based module is the fastest overall. We also see that the standard library's Python-based module is the fastest of the pure-Python binary heaps, which shows us that it is an excellent implementation regardless of the language.

It is interesting to see that the Fibonacci heap works as advertised, with constant insertion time regardless of the order.

For comparison, I also ran the same benchmark with PyPy, an alternate Python implementation written in RPython (a simplified Python). This includes a just-in-time (JIT) compiler, and generally executes much faster than CPython (the reference Python implementation):

[Chart: insertion times, PyPy, 1,000 elements (pypy-heap-insertion-n1000)]

One thing to note is that the heapq and pyheapq times are the same in PyPy. This is because on PyPy the Python code is as fast as the C implementation would be, so only the Python version is used.

We see the same pattern in PyPy as in CPython: ascending is fast, descending is slow, random in between. The heap implementation in the standard library is the fastest. We also see that the Fibonacci heap has a fast, constant-time insertion.

What is different is that the Fibonacci heap is much faster at insertion than even the built-in heap implementation. This may be because you assign a numeric weight to each item in the heap, so comparisons are numeric and PyPy can optimize these better.

Now let's look at insertion time for a larger heap, with 1 million elements:

[Chart: insertion times, CPython, 1,000,000 elements (cpython-heap-insertion-n1000000)]

I kept the scale in microseconds so that you can compare the chart with the times for 1000 elements. Here are the results for PyPy:

[Chart: insertion times, PyPy, 1,000,000 elements (pypy-heap-insertion-n1000000)]

The picture for 1K and 1M elements is basically the same, which is not unexpected but nice to see.

Removal Results

We run fewer tests for removal, for a couple of reasons. First, we assume that once items are in the heap, it does not matter in which order they were inserted. (This may be incorrect, but it is my assumption.) Second, the tests take longer to run, because we have to build the heap that we want to pop from before timing the removal. (There may be clever ways to combine an insertion and removal test, but I did not pursue this.)
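A simplified stand-in for the removal test, again using the standard library's timeit rather than the perf module: the heap construction goes into the setup so that only the pops are timed.

import timeit

# Build the heap in the setup (not timed), then time popping everything off.
elapsed = timeit.timeit(
    stmt="""
while h:
    heapq.heappop(h)
""",
    setup="""
import heapq, random
h = [(random.random(),) for _ in range(1000)]
heapq.heapify(h)
""",
    number=1,
)
print("popping 1000 items: %.6f seconds" % elapsed)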

Here are the results for 1000 elements in Python:

[Chart: removal times, CPython, 1,000 elements (cpython-heap-pop-n1000)]

And for 1000 elements in PyPy:

[Chart: removal times, PyPy, 1,000 elements (pypy-heap-pop-n1000)]

One surprising result here is that the standard library’s heapq module is about 2x faster in CPython than in PyPy.

Here are the results for 1 million elements in Python:

[Chart: removal times, CPython, 1,000,000 elements (cpython-heap-pop-n1000000)]

And here are the results for 1 million elements in PyPy:

[Chart: removal times, PyPy, 1,000,000 elements (pypy-heap-pop-n1000000)]

Just like with adding to the heap, the relative performance of the various implementations remains basically the same for heaps with 1 million elements as for heaps with 1000 elements.

Even though the heapq implementation is about 2x faster in stock CPython than in PyPy, it is still slower than the Fibonacci heap on PyPy, which is the fastest at removal.

Conclusion

The heap implementation included in the Python standard library is well implemented and efficient; if you are able to use it, it is probably the best choice for most things.

The heap + dictionary module is handy, so if performance is not a major consideration it is certainly “good enough”. I have used it in the past for an A* path-finding implementation, and it worked great.

The other basic heap implementations in PyPI are very new, and may become faster or grow other features in the future.

I was impressed at the performance of the Fibonacci heap module. For a cache, it is the module that has the best overall performance in terms of runtime. There may be a size penalty, which can of course be important for a large cache. Anyone who wants to use this data structure may want to research this.

Indeed the performance of the Fibonacci heap module is good enough that it makes me wonder whether something like this should be added to the Python standard library.

Finally, use PyPy. 😉

References

You can find the code used to do the benchmarking here:

https://github.com/shane-kerr/heapbench

You can see a spreadsheet with the results that I got in the attached heapbench-results file.
