# Levenshtein Word Distance

A while ago I wrote an implementation of the Soundex algorithm, which attempts to assign the same encoding to words that are pronounced the same but spelled differently. In this post I'll cover the Levenshtein Word Distance algorithm, a related concept that measures the "cost" of transforming one word into another by counting the minimum number of letters which need to be inserted, deleted or substituted.

The Levenshtein Word Distance has a fairly obvious use in helping spell checkers decide which words to suggest as alternatives to misspelled words: if the distance between a misspelled word and an actual word is low, then it is likely that the actual word is what the user intended to type. However, it can be used in any situation where strings of characters need to be compared, such as DNA matching.
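As a rough illustration of the idea, here is a minimal sketch in C of the usual dynamic-programming approach, keeping only two rows of the cost grid in memory. The function names `levenshtein` and `min3` are my own choices for this sketch, and it assumes plain single-byte characters; it is not necessarily how the full post implements the algorithm.

```c
#include <stdlib.h>
#include <string.h>

/* Smallest of three values. */
static size_t min3(size_t a, size_t b, size_t c)
{
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Returns the Levenshtein distance between s and t,
   or (size_t)-1 if memory allocation fails. */
size_t levenshtein(const char *s, const char *t)
{
    size_t m = strlen(s);
    size_t n = strlen(t);

    /* prev and curr hold two adjacent rows of the (m+1) x (n+1) cost grid. */
    size_t *prev = malloc((n + 1) * sizeof *prev);
    size_t *curr = malloc((n + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return (size_t)-1;
    }

    /* Row 0: turning "" into the first j letters of t takes j insertions. */
    for (size_t j = 0; j <= n; j++)
        prev[j] = j;

    for (size_t i = 1; i <= m; i++) {
        /* Turning the first i letters of s into "" takes i deletions. */
        curr[0] = i;
        for (size_t j = 1; j <= n; j++) {
            size_t cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
            curr[j] = min3(prev[j] + 1,          /* deletion */
                           curr[j - 1] + 1,      /* insertion */
                           prev[j - 1] + cost);  /* substitution or match */
        }
        /* The row just filled becomes the previous row for the next pass. */
        size_t *tmp = prev;
        prev = curr;
        curr = tmp;
    }

    size_t distance = prev[n];
    free(prev);
    free(curr);
    return distance;
}
```

Calling `levenshtein("kitten", "sitting")` gives 3: two substitutions and one insertion.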

# Estimating Pi

I started off calling this project "Calculating Pi" but soon realised that I needed to rename it "Estimating Pi". Pi is an irrational number which starts 3.14159 and then carries on for an infinite number of digits with no pattern that anybody has ever discovered. It therefore cannot be calculated in full, only estimated to (in principle) any number of digits.
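To show what "estimating" means in practice, here is a small sketch using the Leibniz series, pi/4 = 1 - 1/3 + 1/5 - 1/7 + ... This is just one of many possible methods and is chosen here only for its simplicity; the function name `estimate_pi_leibniz` is made up for this example.

```c
/* Estimates pi by summing the first `terms` terms of the Leibniz series
   pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
   The series converges slowly: the error shrinks roughly as 1/terms. */
double estimate_pi_leibniz(unsigned long terms)
{
    double sum = 0.0;
    double sign = 1.0;

    for (unsigned long k = 0; k < terms; k++) {
        sum += sign / (2.0 * (double)k + 1.0);
        sign = -sign; /* alternate between adding and subtracting */
    }

    return 4.0 * sum;
}
```

More terms give a closer estimate, but no finite number of terms gives pi exactly.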

# Serializing and Deserializing Data Structures

If you want to convert the contents of a data structure to a string, either to save it to disk or to transmit it over the wire, you will probably choose either XML or JSON. Both of these have the advantage of including metadata with the data itself, so it is possible to recreate the data structure purely from the XML or JSON without any additional knowledge. The downside is that both these formats can be very bloated.

In a restricted environment where you have control over all the code that writes and reads the data, you can get away without a data + metadata format and just save the data itself. In this project I will write a few functions to create an array of structs, and then serialize/deserialize them to/from a file.
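A minimal sketch of the data-only approach might look like the following. The `record` struct and the functions `save_records` and `load_records` are hypothetical names invented for this example, and the file layout (an element count followed by the raw struct bytes) is an assumption rather than the format used in the post. Because it writes raw bytes, the file is only readable by code built with the same struct layout, byte order and type sizes.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record type: fixed-size fields only, no pointers. */
typedef struct {
    int id;
    double value;
    char name[16];
} record;

/* Writes the element count followed by the raw struct bytes.
   Returns 0 on success, -1 on failure. */
int save_records(const char *path, const record *items, size_t count)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    int ok = fwrite(&count, sizeof count, 1, fp) == 1
          && fwrite(items, sizeof *items, count, fp) == count;

    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}

/* Reads a file written by save_records. On success returns a malloc'd
   array (the caller frees it) and stores its length in *count.
   Returns NULL, with *count set to 0, on failure or for an empty file. */
record *load_records(const char *path, size_t *count)
{
    *count = 0;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    record *items = NULL;
    size_t n = 0;

    /* Reject counts that would overflow the allocation size. */
    if (fread(&n, sizeof n, 1, fp) == 1 && n > 0 && n <= SIZE_MAX / sizeof *items) {
        items = malloc(n * sizeof *items);
        if (items != NULL && fread(items, sizeof *items, n, fp) != n) {
            free(items);
            items = NULL;
        }
    }

    fclose(fp);

    if (items != NULL)
        *count = n;
    return items;
}
```

There are no field names or type information in the file at all, which is what keeps it compact and also what ties it so closely to the code that wrote it.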

# Boolean Bits

The C99 standard introduced the _Bool type as well as stdbool.h, which allows you to use bool, true and false. A _Bool typically occupies a whole byte to store true/false, yes/no, on/off or whatever the semantics of your program might be, but of course you only really need 1 bit, so 7 bits are wasted. Most of the time this isn't worth worrying about, but in some rare cases where you need a lot of Booleans it might be worthwhile looking into ways of being a bit more efficient with memory usage. This article presents my particular approach to doing this.
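One common way of packing Booleans is to store them as individual bits in an array of unsigned char and use masks to set and read them. The sketch below shows that general technique; the names `BITS_BYTES`, `bits_set` and `bits_get` are invented for this example and it is not necessarily the approach taken in the article.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of bytes needed to hold n one-bit flags. */
#define BITS_BYTES(n) (((n) + CHAR_BIT - 1) / CHAR_BIT)

/* Sets or clears the flag at position index. */
void bits_set(unsigned char *bits, size_t index, bool value)
{
    unsigned char mask = (unsigned char)(1u << (index % CHAR_BIT));

    if (value)
        bits[index / CHAR_BIT] |= mask;                 /* switch the bit on */
    else
        bits[index / CHAR_BIT] &= (unsigned char)~mask; /* switch the bit off */
}

/* Returns the flag at position index. */
bool bits_get(const unsigned char *bits, size_t index)
{
    return (bits[index / CHAR_BIT] >> (index % CHAR_BIT)) & 1u;
}
```

With 8-bit bytes, an array declared as `unsigned char flags[BITS_BYTES(20)] = {0};` holds 20 flags in 3 bytes rather than the 20 bytes an array of bool would usually need.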

# Redirection and Piping

This post will demonstrate programs which each perform a single, very specific task but which can be chained together so that the output of one forms the input of the next. Connecting programs like this, or piping, to use the correct terminology, enables more complex workflows or processes to be run.

This post includes just three short programs to carry out the following tasks:

• Generating data

• Filtering data from the data generating program

• Calculating totals from the filtering program
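To give a flavour of how the filtering and totalling steps listed above could work, here is a sketch of two functions that read from one stream and, in the case of the filter, write to another. The data format (one whole number per line) and the names `filter_at_least` and `total` are assumptions made for this example, not taken from the programs in the post.

```c
#include <stdio.h>

/* Copies every number in `in` that is at least `minimum` to `out`,
   one per line. Returns how many numbers were kept. */
size_t filter_at_least(FILE *in, FILE *out, long minimum)
{
    long value;
    size_t kept = 0;

    while (fscanf(in, "%ld", &value) == 1) {
        if (value >= minimum) {
            fprintf(out, "%ld\n", value);
            kept++;
        }
    }

    return kept;
}

/* Adds up every number read from `in`. */
long total(FILE *in)
{
    long value;
    long sum = 0;

    while (fscanf(in, "%ld", &value) == 1)
        sum += value;

    return sum;
}
```

A filtering program's main function would then only need to call `filter_at_least(stdin, stdout, 10)`, and a totalling program would print the result of `total(stdin)`. Because both work purely on standard input and output, the shell can connect them with the `|` operator without either program knowing about the other.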