Code Snippets

When I'm bored, I sometimes write small "snips" of code that implement some interesting problem, or just solve a problem in an unconventional manner.

A Knight's Tour is a sequence of moves such that a knight will step on each square of a m × n chess board only once. More technically, a tour is a Hamiltonian cycle of a chessboard for a Knight. This snippet finds one such tour on a board via the backtracking method. While it is guaranteed to find a solution (or the lack of a solution), it can be very slow (i.e. an 8 × 8 board may takes hours to solve).

This is a monte carlo simulation of the electoral college. It has a web front-end, so you can run your own simulations.

Pi, of course, is the ratio of a circle's circumference to the same circle's diameter. It is commonly abbreviated to 3.14(159), but is actually an irrational number and thus has an infinite number of digits. Monte Carlo is an approximation method wherein we use many random runs or rolls to approximate the real solution. In this case, we have inscribed a unit circle (circle of radius 1) in a box. We randomly generate points within the box. The total area of the box is 4; the total area of the circle is pi. Every generated point will land inside the box, but only pi/4 will fall into the circle. By counting how many points land inside the circle versus outside, we can approximate pi.

This method does not converge very fast and requires many millions of iterations to start becoming useful. However, it is an easy demonstration of how monte carlo methods work. The table below shows the convergence of the method as the number of iterations increase. Error values are rounded to five decimal places. Since the values are randomly generated, further runs will produce similar, but not necessarily identical results. (Although, as iterations increase, the similarity should increase as well.)

Iterations Value for pi Error
1 0 3.14159
10 3.2 0.05841
100 3.16 0.01841
1,000 3.164 0.02241
10,000 3.144 0.00241
100,000 3.15028 0.00869
1,000,000 3.140412 0.00118
10,000,000 3.1417688 0.00018

Given a list, this method prints the mth entry from the end. The problem is trivial, except that this method is designed to work on singly linked lists and is recursive. This snippet is more of a demonstration of existence than a good solution; the two pointer solution to the problem is superior.

Find a list of primes using the Sieve of Eratosthenes method. This code demonstrates the use of filter and the fact that Python does not fully optimize tail recursive calls. I wrote this because I became irritated at some Mathematica code that I felt was wasteful of time and space.

A trie (pronounced tree) is a space-efficient structure for storing sets of strings. The depth of the tree is based on the longest string or word in the tree. All similar word prefixes are stored in the same position in memory, thus making the structure space-efficiency dependent on the similarity of the data. (Telephone numbers would work well, because of their common prefixes.)

Since a trie is typically used as a set, the common operations are "add" and "contains", where add adds an element to the set and contains tests for existence. I benchmarked my trie against Python's list and dictionary data structures. This is not a fair comparison; both the list and dictionary are written in C and have been highly optimized.

The benchmark consists of two tests, an insert and contains test. For the first, I benched how long it took the structure to add the entire contents of the Unix words file (the default dictionary in Unix). The version of words on my system has 483,523 words and 4,992,010 characters. When inserted into the trie, the number of characters stored was 2,892,791. Thus, the trie (when ignoring the actual tree overhead) yielded a 58% compression in size.

The second test took a random hundred, thousand, and ten-thousand words from the same file and asked the structure if they contained that string. For the list, this was an O(N) operation for each test, while it was (effectively) an O(1) operation for the dictionary and trie.

Times are in seconds. The file contains the benchmark routines so you may run them on your machine as well.

Insert Contains - 100 Contains - 1,000 Contains - 10,000
Trie 43.8 0.04 0.01 0.09
Dict 2.45 <0.01 <0.01 <0.01
List 1.47 7.67 76.9 767.6

Systems often have some "jitter" when benchmarking. Thus, I'm not too worried about the trie's contains tests not following a curve. From these tests, you can see that the list does very poorly at contains tests, but that is to be expected. Instead, I find it encouraging that the trie compares so favorably with the dictionary. Although it is slower, several magnitudes in fact, it still finishes in less than a tenth of a second. In this case, the compression offered by the trie was pretty good, but is highly dependent on your data set.