SteadBytes |TPP Topic 39: Algorithm Speed

See the first post in The Pragmatic Programmer 20th Anniversary Edition series for an introduction.

The first two challenges recommend some (excellent) books to the reader, however do not provide a specific challenge for me to write about here. I shall, therefore, begin with the third challenge.

Challenge 3

In the first exercise that follows we look at sorting arrays of long integers. What is the impact if the keys are more complex, and the overhead of key comparison is high? Does the key structure affect the efficiency of the sort algorithms, or is the fastest sort always fastest?

The algorithmic complexity of sorting algorithms is usually formulated in terms of comparisons so if the overhead ok key comparison is high then it is all the more important to choose the most efficient algorithm possible.

Key structure certainly does affect the efficiency of sorting algorithms. By ‘fastest sort’, the authors are referring to the sorting algorithm with the lowest algorithmic complexity. However, algorithms can also be analysed in terms of space/memory complexity which indicates how the memory usage of an algorithm grows with respect to input size. If the key structure is large an algorithm with a higher algorithmic complexity but a lower memory complexity may outperform an faster algorithm with a higher memory complexity due to the overhead of copying the keys.

Exercise 28

We coded a set of simple sort routines in Rust. Run them on various machines available to you. Do your figures follow the expected curves? What can you deduce about the relative speeds of your machines? What are the effects of various compiler optimization settings?

The code provided includes Rust implementations of bubble sort, insertion sort, selection sort and quicksort along with a simple benchmark program which times the execution of each sorting algorithm with various input sizes.

rustc (the Rust compiler) provides an opt-level flag which can be used to control the level of optimisation performed by during compilation. Using a simple script i ran the benchmark program with each opt-level (from 0 to 3 inclusive) on 3 different devices:

Lenovo Y540
- Intel Core i7-9750H CPU
- 16 GiB DDR4 RAM
- 500GB Samsung 970 EVO NVMe SSD
- Nvidia GeForce RTC 2060 GPU
- Pop!_OS 19.10
Lenovo ThinkPad X1 Carbon
- Intel Core i7-8565U CPU
- 16 GiB DDR4 RAM
- Intel UHD Graphics
- 512 GB Samsung (very long model number) NVMe SSD
- Pop!_OS 19.10
Raspberry Pi 3 Model B+
- Broadcom BCM2837B0, Cortex-A53 (ARMv8) 64-bit SoC
- 1GB LPDDR2 SDRAM
- Raspbian Buster 2019-09-26

The following graphs compare benchmarks of the sorting algorithms across each of the devices. These results are using opt-level=3 as this is the default level for release builds with cargo.

Lenovo Y540 results

Raspberry Pi results

X1 Carbon results

Unsurprisingly, the Lenovo Y540 (being the most powerful device) was the fastest and the Raspberry Pi the slowest. These are not particularly revolutionary findings!

Comparing optimisation levels is somewhat more interesting; each of the following graphs shows the average (arithmetic mean) execution times for each input size across all 3 devices for each sorting algorithm.

Bubble sort results

Insertion sort results

Quicksort results

Selection sort results

Unsurprisingly again, the higher the optimisation level, the faster the execution time. However, it’s worth pointing out that opt-level=2 and opt-level=3 are extremely close and for small input sizes the effect of optimisation is far less.

Exercise 29

In Common Sense Estimation, on page 206, we claimed that a binary chop is O (lg n). Can you prove this?

Yep!

As a quick review, the binary chop is a general algorithm technique in which the set of items to be consider halves upon each iteration. This is most commonly found in the binary search algorithm for finding the position of a target value in a sorted array (although it does generalise to other data structures). Assuming an array sorted by ascending values:

Pick the middle element
If this is the target value, return.
Else, if this is less than the target value, repeat from step 1 but on the portion of the array above the middle element.
Else, repeat from step 1 but on the portion of the array below the middle element.
If the beginning or the end of the array is reached the target does not exist in the array, return indicating failure.

Each time step 1 is executed, the search space is half the size than on the previous iteration:

Iteration	Number of Elements
1	n
2	n / 2
3	(n / 2) / 2
4	((n / 2) / 2) / 2

In general, at iteration k there will be n / 2k. On the final iteration there must be 1 element remaining:

n/2k = 1
n = 2k

Taking the logarithm of both sides:

log2n = k log22
k = log2n

Done!

Exercise 30

Yep!

In Big O notation, constant factors are removed i.e. O(2n) is the same as O(3n). Converting between logarithm bases involves multiplying by a constant factor: logbx = logdx / logdb.

TPP Topic 39: Algorithm Speed

Contents

Challenge 3

Exercise 28

Exercise 29

Exercise 30

Feedback