Levenshtein Word Distance

A while ago I wrote an implementation of the Soundex Algorithm, which attempts to assign the same encoding to words which are pronounced the same but spelled differently. In this post I'll cover the Levenshtein Word Distance algorithm, a related concept which measures the "cost" of transforming one word into another by counting the minimum number of letters which need to be inserted, deleted or substituted.

The Levenshtein Word Distance has a fairly obvious use in helping spell checkers decide which words to suggest as alternatives to misspelled words: if the distance between a misspelled word and an actual word is low then it is likely that the actual word is what the user intended to type. However, it can be used in any situation where strings of characters need to be compared, such as DNA matching.
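As a rough illustration of the idea, here is a minimal Python sketch of the distance calculation using the usual row-by-row dynamic programming approach. The function name `levenshtein` and the choice to give every insertion, deletion and substitution a cost of 1 are assumptions made for this example, not necessarily what the full post uses.

```python
def levenshtein(source: str, target: str) -> int:
    """Return the minimum number of single-letter edits turning source into target."""
    # previous[j] is the distance between the letters of source handled so far
    # and the first j letters of target. With no letters of source handled,
    # the only option is j insertions.
    previous = list(range(len(target) + 1))

    for i, source_letter in enumerate(source, start=1):
        # Turning the first i letters of source into an empty string
        # takes i deletions.
        current = [i]
        for j, target_letter in enumerate(target, start=1):
            # A substitution costs nothing if the letters already match.
            substitution = previous[j - 1] + (source_letter != target_letter)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current

    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A spell checker could call a function like this for each dictionary word and suggest the words with the lowest distances. Keeping only two rows at a time means the memory used depends on the length of one word rather than both.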


Estimating Pi

I started off calling this project "Calculating Pi" but soon realised that I needed to rename it "Estimating Pi". Pi is an irrational number which starts 3.14159 and then carries on for an infinite number of digits without ever settling into a repeating pattern. Therefore it cannot be calculated exactly, just estimated to (in principle) any number of digits.
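To make the idea of estimating concrete, here is a small Python sketch using the Leibniz series, pi = 4 × (1 − 1/3 + 1/5 − 1/7 + ...). The excerpt above does not say which methods the project uses, so the choice of series and the function name `estimate_pi_leibniz` are assumptions for illustration only.

```python
import math


def estimate_pi_leibniz(terms: int) -> float:
    """Estimate pi by summing the given number of terms of the Leibniz series."""
    total = 0.0
    for k in range(terms):
        # Terms alternate in sign and have odd denominators: 1, -1/3, 1/5, ...
        total += (-1) ** k / (2 * k + 1)
    return 4 * total


estimate = estimate_pi_leibniz(100_000)
print(estimate, abs(estimate - math.pi))
```

This series converges very slowly, so adding more terms only ever gets closer to pi without reaching it, which is exactly why "estimating" is the better word.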


Serializing and Deserializing Data Structures

If you want to convert the contents of a data structure to a string, either to save it to disc or to transmit it over the wire, you will probably choose either XML or JSON. Both of these have the advantage of including metadata with the data itself, so it is possible to recreate the data structure purely from the XML or JSON without any additional knowledge. The downside is that both these formats can be very bloated, as the metadata is repeated for every item of data.

In a restricted environment where you have control over all the code which writes and reads the data you can get away without a data + metadata format, and just save the data itself. In this project I will write a few functions to create an array of structs, and then serialize/deserialize them to/from a file.
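As a rough sketch of saving the data without any metadata, the Python below writes fixed-size binary records to a file and reads them back using the standard `struct` module. The `Account` record, its three fields, the 20-byte name limit and the function names are all hypothetical, invented for this example. The reader and writer must agree on the layout in advance, since the file itself does not describe it.

```python
import struct
from typing import NamedTuple

# Assumed layout: little-endian 32-bit int, 20-byte name, 64-bit float.
RECORD = struct.Struct("<i20sd")


class Account(NamedTuple):
    """Hypothetical record type standing in for a struct."""
    id: int
    name: str
    balance: float


def save_records(path, records):
    """Write each record as one fixed-size block of bytes."""
    with open(path, "wb") as f:
        for r in records:
            # Names shorter than 20 bytes are padded with zero bytes;
            # longer names are truncated by the "20s" format.
            f.write(RECORD.pack(r.id, r.name.encode("utf-8"), r.balance))


def load_records(path):
    """Read fixed-size blocks back into a list of records."""
    records = []
    with open(path, "rb") as f:
        while chunk := f.read(RECORD.size):
            id_, raw_name, balance = RECORD.unpack(chunk)
            # Strip the zero padding added when the name was packed.
            name = raw_name.rstrip(b"\x00").decode("utf-8")
            records.append(Account(id_, name, balance))
    return records
```

Each record here takes exactly 32 bytes, with no field names or punctuation stored alongside the values. The price is that any change to the layout breaks files written by older code.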
