Ben Steadman

I strive for efficient, effective and easy-to-use systems that achieve complex goals. I engineer software that meets real needs, solving problems and helping others. Outside of my professional work, I can also be found doing interesting things for no reason other than fun or intellectual curiosity.

Take a look at:

  • My blog for articles on software engineering, programming, education and other tangential topics.
  • More about Me.
  • My GitHub account for my public and Open Source work.
  • My CV.

Recent Posts

Reservoir Sampling refers to a family of algorithms for sampling a fixed number of elements from an input of unknown length with uniform probability. In this article, I aim to give an overview of Reservoir Sampling usage, implementation and testing. When/why is Reservoir Sampling useful? The main benefit of Reservoir Sampling is that it provides an upper bound on memory usage that is independent of the input stream length. This allows sampling input streams that are far larger than the available memory of a system.
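
As a rough illustration of the idea in this teaser, here is a minimal Python sketch of one well-known member of the family, commonly called Algorithm R. It is not necessarily the variant covered in the article. The function name `reservoir_sample` and its parameters are hypothetical and chosen only for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items drawn uniformly at random from stream.

    Hypothetical helper for illustration: only the k-item reservoir is
    held in memory, regardless of how long the stream is.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item replaces an existing
            # one only if the slot falls inside the reservoir, which
            # happens with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 values from a generator without materialising it.
    print(reservoir_sample((n * n for n in range(1_000_000)), 5))
```

Because the stream is consumed one item at a time, the same function works on generators, file handles or any other iterable whose length is not known in advance.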

Read more…

A solution and, importantly, a proof for LeetCode Problem 11 - Container with Most Water.

Read more…

PySpark seemingly allows Python code to run on Apache Spark - a JVM-based computing framework. How is this possible? I recently needed to answer this question and, although the PySpark API itself is well documented, there is little in-depth information on its implementation. This article contains my findings from diving into the Spark source code to find out what’s really going on. Spark vs PySpark: for the purposes of this article, Spark refers to the Spark JVM implementation as a whole.

Read more…