Measure Code Execution Time Accurately in Python

Measuring code execution times is hard. Learn how to eliminate systematic and random measurement errors and obtain more reliable results.

We often need to measure how long a specific part of our code takes to execute. Unfortunately, simply reading the system time before and after a function call is not very robust: the result is susceptible to both systematic and random measurement errors. This is especially true when measuring very short intervals (< 100 milliseconds).

Systematic and Random Errors

So what is wrong with the following way of measuring?

import time

time_start = time.perf_counter()
my_function()  # the code under test
time_end = time.perf_counter()
execution_time = time_end - time_start  # elapsed time in seconds

First, there is a systematic error: the calls to time.perf_counter() take time themselves, so an unknown amount of overhead is added to the measured execution time of my_function(). How much? This depends on the OS, the particular implementation and other uncontrollable factors.

Second, there is a random error: the execution time of the call to my_function() varies from run to run, for example because of OS scheduling and caching effects.

We can combat the random error by repeating the measurement several times and averaging the results. The systematic error is much harder to remove, because it shifts every measurement in the same direction and therefore does not average out.
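As a minimal sketch of the averaging approach, the snippet below repeats the naive measurement and takes the mean. The body of my_function(), the helper name average_time and the repetition count are illustrative assumptions, not part of the original method.

```python
import statistics
import time


def my_function():
    # Hypothetical stand-in for the code under test.
    sum(range(10_000))


def average_time(func, repetitions=20):
    """Return the mean wall-clock time of `func` over several runs, in seconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


mean_seconds = average_time(my_function)
print(f"mean: {mean_seconds * 1000:.3f} ms")
```

Note that every sample still contains the overhead of the timer calls, so the mean converges to the true execution time plus that overhead rather than to the true execution time alone.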

Straight Line Fitting

Carlos Moreno and Sebastian Fischmeister presented a novel technique to combat this systematic error. The basic idea is to first measure the time of one function call, then the time of two, then the time of three, and so on. The resulting method may look like this:

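time_1 = time.perf_counter()
my_function()  # one call
time_2 = time.perf_counter()
my_function()  # two calls
my_function()
time_3 = time.perf_counter()
my_function()  # three calls
my_function()
my_function()
time_4 = time.perf_counter()  # ... and so on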
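The pattern of timing one call, then two, then three can also be written as a loop. The following is only a sketch under assumptions: the helper name measure_batches, the max_calls parameter and the body of my_function() are made up for illustration.

```python
import time


def my_function():
    # Hypothetical stand-in for the code under test.
    sum(range(10_000))


def measure_batches(func, max_calls=10):
    """Time 1, 2, ..., max_calls consecutive calls of `func`.

    Returns a list of (number_of_calls, elapsed_seconds) pairs.
    """
    measurements = []
    for n_calls in range(1, max_calls + 1):
        start = time.perf_counter()
        for _ in range(n_calls):
            func()
        elapsed = time.perf_counter() - start
        measurements.append((n_calls, elapsed))
    return measurements


measurements = measure_batches(my_function, max_calls=5)
for n_calls, elapsed in measurements:
    print(n_calls, f"{elapsed * 1000:.3f} ms")
```

Each batch is bracketed by exactly two timer reads, so the timer overhead is the same for every batch regardless of its size. The inner loop adds a small per-call cost of its own, which ends up in the per-call estimate; for very short functions, unrolling the calls may be preferable.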

You can then fit a straight line through the measurements, with the number of function calls on the x-axis and the measured time on the y-axis.

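The execution time of a single call is then given by the slope a of the straight line y = a x + b. The constant measurement overhead is the same for every data point, so it ends up in the intercept b and does not affect the slope.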

In the above example, the straight line is y = 205.91 x + 29.56; therefore, the execution time equals 205.91 milliseconds.
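To make the fitting step concrete, here is a small sketch that assumes an ordinary least-squares fit (the text above does not say which fitting procedure is used). The function name fit_line and the timings are made up for illustration and are not the data behind the numbers quoted above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a = s_xy / s_xx
    b = mean_y - a * mean_x
    return a, b


# Synthetic, made-up timings in milliseconds (not the article's data):
# roughly 200 ms per call plus about 30 ms of constant overhead.
calls = [1, 2, 3, 4, 5]
times_ms = [231.0, 429.0, 632.0, 828.0, 1030.0]

slope, intercept = fit_line(calls, times_ms)
print(f"estimated time per call: {slope:.2f} ms, overhead: {intercept:.2f} ms")
```

The slope is the per-call estimate, while the intercept collects whatever cost is paid once per measurement.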

The authors note that this type of measurement is very robust against occasional measurements with large errors. This can be visualized by artificially changing the 4th measurement and rerunning the line-fitting process.

Even though one value is completely off, the resulting slope (201.15) is still very close to the previously measured value.
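The same experiment can be sketched in code. Everything here is an assumption made for illustration: the data is synthetic (not the measurements behind the slopes quoted above), the fit is ordinary least squares, and slope_of_best_fit is a hypothetical helper name.

```python
def slope_of_best_fit(xs, ys):
    """Least-squares slope a of the line y = a * x + b."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx


# Synthetic, made-up data: exactly 200 ms per call plus 30 ms overhead.
calls = list(range(1, 11))
clean_ms = [200.0 * n + 30.0 for n in calls]

# Corrupt the 4th measurement by adding 300 ms.
noisy_ms = list(clean_ms)
noisy_ms[3] += 300.0

clean_slope = slope_of_best_fit(calls, clean_ms)
noisy_slope = slope_of_best_fit(calls, noisy_ms)
print(f"clean: {clean_slope:.2f} ms, with outlier: {noisy_slope:.2f} ms")
```

With these ten synthetic points, the corrupted fit stays within about 3% of the clean slope. How much a single bad value shifts the result depends on how many points are fitted and on where the bad value sits: points near the ends of the range pull harder on the slope than points in the middle.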

To learn more about the mathematical basics of this method, I invite you to read the original paper by Moreno and Fischmeister.

Python Implementation

You can find my implementation of the presented algorithm in my public GitLab repository.

All credit for the algorithm and the idea goes to Moreno and Fischmeister.

Edit January 18 2020: Corrected the usage of perf_counter().

Bernhard Knasmüller on Software Development