The SteadBlog

A mixture of Computer Science and Software Engineering writings, study notes, book reviews and general ramblings. Opinions are my own - enjoy!

TL;DR: Replace s_0 with s_a, s_1 with s_b and so on (assuming the largest number to replace is less than 26):

s/s_\(\d\+\)/\='s_'.(nr2char(97+submatch(1)))/g
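For readers less familiar with Vim's `\=` expression replacement, the same mapping can be sketched in Python. This is only an illustrative equivalent, not part of the original tip: the function name `number_to_letter_suffix` is hypothetical, and raising an error for numbers of 26 or more is my own choice (the Vim command does no such check).

```python
import re


def number_to_letter_suffix(text: str) -> str:
    """Rewrite s_0 -> s_a, s_1 -> s_b, ... mirroring nr2char(97 + N)."""

    def replace(match: re.Match) -> str:
        index = int(match.group(1))  # the captured digits, like submatch(1)
        if index >= 26:
            # Assumption: fail loudly instead of producing a non-letter.
            raise ValueError(f"s_{index} has no single-letter equivalent")
        return "s_" + chr(97 + index)  # 97 is the code point of 'a'

    return re.sub(r"s_(\d+)", replace, text)


print(number_to_letter_suffix("s_0 + s_1 * s_25"))
```

As in the Vim version, the whole run of digits is captured, so s_10 becomes s_k rather than s_b followed by a stray 0.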

Read more…

Reservoir Sampling refers to a family of algorithms for sampling a fixed number of elements, with uniform probability, from an input of unknown length. In this article, I aim to give an overview of Reservoir Sampling usage, implementation and testing.

When/why is Reservoir Sampling useful?

The main benefit of Reservoir Sampling is that it provides an upper bound on memory usage that is independent of the input stream length. This allows sampling from input streams that are far larger than the available memory of a system.
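To make the memory bound concrete, here is a minimal sketch of one member of the family, commonly known as Algorithm R. It is not necessarily the implementation used in the full article; the function name `reservoir_sample` and the optional `rng` parameter are assumptions made for illustration.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items drawn uniformly from a stream of unknown length."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(iter(range(1_000_000)), 5)
print(sample)
```

Only the `k`-element list is ever held in memory, no matter how many items the stream yields. If the stream has fewer than `k` items, all of them are returned.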

Read more…

A solution and, importantly, a proof for LeetCode Problem 11 - Container with Most Water.

Read more…

PySpark seemingly allows Python code to run on Apache Spark - a JVM-based computing framework. How is this possible? I recently needed to answer this question and, although the PySpark API itself is well documented, there is little in-depth information on its implementation. This article contains my findings from diving into the Spark source code to find out what’s really going on.

Spark vs PySpark

For the purposes of this article, Spark refers to the Spark JVM implementation as a whole.

Read more…

See the first post in The Pragmatic Programmer 20th Anniversary Edition series for an introduction. The first two challenges recommend some (excellent) books to the reader, but do not provide a specific challenge for me to write about here. I shall, therefore, begin with the third challenge.

Challenge 3

In the first exercise that follows we look at sorting arrays of long integers. What is the impact if the keys are more complex, and the overhead of key comparison is high?

Read more…