For this post I will write a simple implementation of a 1-dimensional cellular automaton in C. The concept of cellular automata has existed since the middle of the 20th century and has grown into a vast field with many practical and theoretical applications.
A cellular automaton consists of any number of "cells" arranged in 1, 2, 3s or more dimensions. Each has a state associated with it (in the simplest case just on or off) and each cell and therefore the entire automaton transitions from one state to the next over time according to a rule or set of rules.
The number of dimensions, the number of possible cell states, and the rules can become arbitrarily large and complex but for this project I will implement the simplest type of one-dimensional cellular automaton, known as an elementary cellular automaton.
I recently wrote a post on Pascal's Triangle and in this post I will write a program in C to implement a Galton Board simulator, a Galton Board being an actual physical gadget following the Pascal Triangle's probabilistic characteristics.
A while ago I wrote an implementation of the Soundex Algorithm which attempts to assign the same encoding to words which are pronounced the same but spelled differently. In this post I'll cover the Levenshtein Word Distance algorithm which is a related concept measuring the "cost" of transforming one word into another by totalling the number of letters which need to be inserted, deleted or substituted.
The Levenshtein Word Distance has a fairly obvious use in helping spell checkers decided which words to suggest as alternatives to mis-spelled words: if the distance is low between a mis-spelled word and an actual word then it is likely that word is what the user intended to type. However, it can be used in any situation where strings of characters need to be compared, such as DNA matching.
I started off calling this project "Calculating Pi" but soon realised that I needed to rename it "Estimating Pi". Pi is an irrational number starting off 3.14159 and then carrying on for an infinite number of digits with no pattern which anybody has ever discovered. Therefore it cannot be calculated as such, just estimated to (in principle) any number of digits.
If you want to convert the contents of a data structure to a string to either save it to disc or transmit it over the wire you will probably choose either XML or JSON. Both of these have the advantage of including metadata with the data itself, so it is possible to recreate the data structure purely from the XML or JSON without any additional knowledge. The downside is that both these formats can be very bloated.
In a restricted environment where you have control over all the code writing and reading data you can get away without a data + metadata format, and just save the data itself. In this project I will write a few functions to create an array of structs, and then serialize/deserialize them to/from a file.
The C99 standard introduced the _Bool type as well as stdbool.h which allows you to use bool, true and false. _Bool uses a byte to store true/false, yes/no, on/off or whatever the semantics of your program might be, but of course you only really need 1 bit so 7 bits are wasted. Most of the time this isn't worth worrying about but in some rare cases where you need a lot of Booleans it might be worthwhile looking into ways of being a bit more efficient with memory usage. This article presents my particular approach to doing this.
This post will demonstrate programs which each perform a single very specific task but which can be chained together in such a way that the output of one forms the input of another. Connecting programs like this, or piping to use the correct terminology, enables more complex workflows or processes to be run.
This post includes just three short programs to carry out the following tasks:
- Generating data
- Filtering data from the data generating program
- Calculating totals from the filtering program
This post covers the first part of a project to develop a star catalog in C. The aims of this initial stage are quite modest - we will write a program to import star data from a file into a data structure and then print it to the screen in a neat format.
Future plans for the project include calculating a few extra pieces of information from existing data, sorting and filtering the data, and exporting it to various other formats.
All these things are specific to stellar data but of course the principles and coding techniques can be applied to any field.
The numbers in the graphic below form the first five rows of Pascal's Triangle. The first row consists of a single number 1. In subsequent rows, each of which is has one more number than the previous, values are calculated by adding the two numbers above left and above right. For the first and last values in each row we just take the single value above, therefore these are always 1.
Pascal's Triangle in its conventional centred layout
(If you do an image search for Pascal's Triangle you will find plenty more, most frequently in a honeycomb grid, sometimes animated, but always nicer than mine. I'm a software engineer, not a graphic designer!)
C is not, of course, an object oriented language and does not even have any discernible features of one. The three core characteristics of object oriented programming are frequently stated to be encapsulation, inheritance and polymorphism, but a more fundamental characteristic is the combining of data (or properties) and the functions (or methods) which work on that data into a single entity which can be used to access both. This can be simulated in C by adding function pointers to a struct. The nuts and bolts of doing so are not as slick as in a real OOP language such as C++ and frankly look a bit clunky, but in this post I will write a short demonstration of the principle. Whether it is actually worthwhile is a moot point!