Serializing and Deserializing Data Structures

If you want to convert the contents of a data structure to a string to either save it to disc or transmit it over the wire you will probably choose either XML or JSON. Both of these have the advantage of including metadata with the data itself, so it is possible to recreate the data structure purely from the XML or JSON without any additional knowledge. The downside is that both these formats can be very bloated.

In a restricted environment where you have control over all the code writing and reading data you can get away without a data + metadata format, and just save the data itself. In this project I will write a few functions to create an array of structs, and then serialize/deserialize them to/from a file.

For the purposes of this demonstration I will create a very simple struct called thing. It doesn't represent any real "thing" but just has a set of members chosen to represent a selection of data types. I will also write a set of functions to create a list of "things", add items to it, print the data and free the memory. On top of this there will be two functions to serialize and deserialize the list to/from a file.

Create a new folder and within it create the following empty files. You can download the source code from the Downloads page or download/clone from Github if you prefer.

  • thinglist.h
  • thinglist.c
  • main.c

Now open thinglist.h and enter or paste the following code.

thinglist.h

#include<stdbool.h>
#include<time.h>

//--------------------------------------------------------
// STRUCT thing
//--------------------------------------------------------
typedef struct thing
{
    char* name;
    int count;
    double value;
    time_t timestamp;
    bool trueorfalse;
} thing;

//--------------------------------------------------------
// STRUCT thinglist
//--------------------------------------------------------
typedef struct thinglist
{
    thing* things;
    int size;
} thinglist;

//--------------------------------------------------------
// FUNCTION PROTOTYPES
//--------------------------------------------------------
thinglist* thinglist_new(void);
bool thinglist_append(thinglist* tl, char* name, int count, double value, time_t timestamp, bool trueorfalse);
void thinglist_print(thinglist* tl);
void thinglist_free(thinglist* tl);
bool thinglist_serialize(thinglist* tl, char* filepath);
thinglist* thinglist_deserialize(char* filepath);

Firstly we have a couple of #includes: stdbool.h lets us use bool, true and false instead of _Bool, 1 and 0, while time.h gives us time_t and the other types and functions for handling dates and times.

After this we have the thing struct which, as I stated above, is contrived to give us a few different data types to experiment with. After this comes the thinglist struct, designed to hold a list of thing structs as well as the list size. Lastly come the function prototypes for handling a thinglist, which I will describe as we come to them.

Now we can move on to thinglist.c, which I'll describe in several stages.

thinglist.c part 1: #includes and thinglist_new function

#include<stdlib.h>
#include<stdio.h>
#include<string.h>

#include"thinglist.h"

//--------------------------------------------------------
// FUNCTION thinglist_new
//--------------------------------------------------------
thinglist* thinglist_new(void)
{
    thinglist* tl = malloc(sizeof(thinglist));

    if(tl != NULL)
    {
        *tl = (thinglist){.things = NULL, .size = 0};

        return tl;
    }
    else
    {
        return NULL;
    }
}

After the #includes comes the thinglist_new function which uses malloc to get us enough memory for a thinglist struct, checks the malloc was successful by checking it for NULL, and if so assigns a new struct to the memory, after which it returns the new pointer. If the malloc was unsuccessful we return NULL which must be checked for by the calling code.

thinglist.c part 2: thinglist_append function

//--------------------------------------------------------
// FUNCTION thinglist_append
//--------------------------------------------------------
bool thinglist_append(thinglist* tl, char* name, int count, double value, time_t timestamp, bool trueorfalse)
{
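    thing* grown;

    if(tl->things == NULL)
    {
        grown = malloc(sizeof(thing));
    }
    else
    {
        // if realloc fails it returns NULL but leaves the existing block allocated,
        // so tl->things is left untouched until we know the call succeeded
        grown = realloc(tl->things, sizeof(thing) * (tl->size + 1) );
    }

    if(grown != NULL)
    {
        tl->things = grown;
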
        *(tl->things + (tl->size)) = (thing){.name = NULL, .count = count, .value = value, .timestamp = timestamp, .trueorfalse = trueorfalse};

        (tl->things + (tl->size))->name = malloc(strlen(name) + 1);

        if((tl->things + (tl->size))->name != NULL)
        {
            strcpy((tl->things + (tl->size))->name, name);

            tl->size++;

            return true;
        }
        else
        {
            return false;
        }
    }
    else
    {
        return false;
    }
}

The append function firstly uses malloc or realloc to grab enough memory for the new size of the list. If there are not yet any items in the list, tl->things will be NULL, in which case we use malloc to obtain enough memory for the one item being added. If tl->things is not NULL there are already items in the list, so we use realloc to increase the memory to the size of a single "thing" multiplied by the list's current size + 1, the + 1 allowing for the new item being added.

If the malloc/realloc was successful we set the end item to a struct, most of which is initialized to the arguments passed to the function. The exception is name which, being a char array or string, needs its own malloc. If that malloc is successful we strcpy the value to it, increment size and return true to indicate success. If any of the malloc or realloc calls failed we return false.
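One detail of realloc is worth spelling out: if it fails it returns NULL but leaves the original block allocated, so assigning its result straight back to the only pointer to that block loses the memory. A common pattern is to hold the result in a temporary pointer until it has been checked. The following is a minimal sketch of that pattern, using a hypothetical intlist struct (a list of ints standing in for the list of "things") rather than the project's own code.

```c
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical growable list of ints, used only for this illustration
typedef struct intlist
{
    int* items;
    int size;
} intlist;

// Appends value to the list.
// Returns false, leaving the list exactly as it was, if memory cannot be obtained.
bool intlist_append(intlist* il, int value)
{
    // hold the result in a temporary pointer so il->items is not lost on failure
    int* grown = realloc(il->items, sizeof(int) * ((size_t)il->size + 1));

    if(grown == NULL)
    {
        // il->items is still valid and still owned by the list
        return false;
    }

    il->items = grown;
    il->items[il->size] = value;
    il->size++;

    return true;
}
```

Because realloc behaves like malloc when it is passed NULL, the sketch does not need a separate branch for the first item; the list just has to start out as {NULL, 0}.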

thinglist.c part 3: thinglist_print and thinglist_free functions

//--------------------------------------------------------
// FUNCTION thinglist_print
//--------------------------------------------------------
void thinglist_print(thinglist* tl)
{
    for(int i = 0; i < tl->size; i++)
    {
        printf("name:          %-32s\n", (tl->things + i)->name);
        printf("count:         %d\n", (tl->things + i)->count);
        printf("value:         %lf\n", (tl->things + i)->value);
        printf("timestamp:     %ld\n", (long)(tl->things + i)->timestamp);
        printf("Date and Time: %s", ctime(&((tl->things + i)->timestamp)));
        printf("trueorfalse:   %-32s\n", (tl->things + i)->trueorfalse == true ? "true" : "false");

        printf("---------------------------------------\n");
    }
}

//--------------------------------------------------------
// FUNCTION thinglist_free
//--------------------------------------------------------
void thinglist_free(thinglist* tl)
{
    for(int i = 0; i < tl->size; i++)
    {
        free((tl->things + i)->name);
    }

    free(tl->things);

    free(tl);
}

These functions are quite straightforward: thinglist_print simply iterates the list and prints out the members. The timestamp is printed as a number, but experienced programmers will know that users can be scornful of dates and times displayed as the number of seconds before or after a certain date, so I have added a line which prints out the date and time in a more human-friendly way using the ctime function from time.h. The thinglist_free function actually has three tasks: firstly it calls free on the memory used for each of the name members, then on the memory used to hold the list of thing structs, and finally on the thinglist itself.
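If you want more control over the layout than ctime gives, strftime from time.h lets you supply your own format string. The sketch below is only an illustration: format_timestamp is a hypothetical helper that is not part of the project, and it uses gmtime so the result is in UTC rather than local time.

```c
#include <stdio.h>
#include <time.h>

// Hypothetical helper: writes t into buffer as "YYYY-MM-DD HH:MM:SS" in UTC.
// Returns the number of characters written (excluding the terminator),
// or 0 if the time could not be converted or the buffer is too small.
size_t format_timestamp(time_t t, char* buffer, size_t buffersize)
{
    struct tm* utc = gmtime(&t);

    if(utc == NULL)
    {
        return 0;
    }

    return strftime(buffer, buffersize, "%Y-%m-%d %H:%M:%S", utc);
}
```

Given the timestamp 1521024497 and a buffer of 20 bytes or more, this should produce 2018-03-14 10:48:17 on systems where time_t counts seconds since 1 January 1970.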

thinglist.c part 4: thinglist_serialize function

//--------------------------------------------------------
// FUNCTION thinglist_serialize
//--------------------------------------------------------
bool thinglist_serialize(thinglist* tl, char* filepath)
{
    FILE* fp;

    fp = fopen(filepath, "w");

    if(fp != NULL)
    {
        for(int i = 0; i < tl->size; i++)
        {
            fprintf(fp,
                    "%s\n%d\n%lf\n%ld\n%d\n",
                    (tl->things + i)->name,
                    (tl->things + i)->count,
                    (tl->things + i)->value,
                    (long)(tl->things + i)->timestamp,
                    (tl->things + i)->trueorfalse);
        }

        fclose(fp);

        return true;
    }
    else
    {
        return false;
    }
}

With thinglist_serialize we get to the first of the two functions at the heart of this project. The function takes a pointer to the thinglist to serialize and the path of the file to write to. Firstly we create and try to open a FILE pointer for writing, checking whether it is NULL and returning false if so. If the file is opened successfully we can then enter a for loop to iterate the list.

With each iteration we write a thing struct using the fprintf function. This works in almost exactly the same way as printf except that its first argument is a FILE*. (In fact you can think of printf as a special version of fprintf which always writes to stdout, a "built-in" FILE* which by default points to the monitor.)

The fprintf call is rather long so I have split it across seven lines. The second line looks like a jumble of random characters but if you look closely you will see it is just the format specifiers for the various types in the thing struct separated by \n new lines. The last five lines are just the various values from the "thing". After the for loop we can close the file and return true.
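To see exactly what one record looks like in the file, the same kind of format string can be given to snprintf, which writes to a char buffer instead of a FILE*. This is just a sketch for experimenting: format_record is a hypothetical helper, and the timestamp is passed as a long so that it matches the %ld specifier.

```c
#include <stdio.h>

// Hypothetical helper: formats one record, one value per line, into buffer.
// Returns the value returned by snprintf: the number of characters the full
// record needs, not counting the terminating null character.
int format_record(char* buffer, size_t buffersize, const char* name, int count, double value, long timestamp, int trueorfalse)
{
    return snprintf(buffer,
                    buffersize,
                    "%s\n%d\n%f\n%ld\n%d\n",
                    name,
                    count,
                    value,
                    timestamp,
                    trueorfalse);
}
```

For the first "thing" used in this project the buffer would hold five lines: A Thing, 123, 1.234000, the timestamp and 1. Note that the bool is stored as 1 or 0 rather than as the words true or false.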

thinglist.c part 5: thinglist_deserialize function

//--------------------------------------------------------
// FUNCTION thinglist_deserialize
//--------------------------------------------------------
thinglist* thinglist_deserialize(char* filepath)
{
    char name[256];
    int count;
    double value;
    long timestamp;
    int trueorfalse;

    FILE* fp;

    fp = fopen(filepath, "r");

    if(fp != NULL)
    {
        thinglist* tl = thinglist_new();

        if(tl != NULL)
        {
            // fscanf returns the number of values it assigned, so the loop
            // ends at end of file or at the first malformed record
            while(fscanf(fp,
                    "%255[^\n]\n%d\n%lf\n%ld\n%d\n",
                    name,
                    &count,
                    &value,
                    &timestamp,
                    &trueorfalse) == 5)
            {
                thinglist_append(tl, name, count, value, (time_t)timestamp, trueorfalse);
            }

            fclose(fp);

            return tl;
        }
        else
        {
            fclose(fp);

            return NULL;
        }
    }
    else
    {
        return NULL;
    }
}

Lastly we have the thinglist_deserialize function, which takes a file path and returns a thinglist pointer. We need five variables to hold the various items of data being read from the file - these will be passed to the thinglist_append function. The timestamp is declared as a long so that it matches the %ld format specifier. After these are declared we create and attempt to open a FILE pointer for reading. If this is successful we create a thinglist pointer using thinglist_new, and then enter a while loop through the file.

This loop uses fscanf, which is the equivalent of fprintf but for reading. It works in a very similar way, first taking a FILE pointer and then a format string, after which come the variables to hold the data as it is read in. It returns the number of values it managed to read, so the loop carries on only while all five values of a record were read successfully. There are a couple of points to note here:

  • The format string starts with the bizarre looking "%255[^\n]". This means "read up to 255 characters until you hit a newline character". The limit is one less than the 256-byte input buffer because fscanf also writes a terminating null character after the text.

  • fscanf needs pointers to variables to write data to, therefore we use the & character before the variable names, apart from the name string which is a char pointer anyway.

At the end of every iteration of the loop we call thinglist_append to stick a new "thing" onto the list using the values read in from the file by fscanf. After the loop exits we can close the file and return the new thinglist pointer.
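The format string can be tried out without a file by using sscanf, which reads from a string but otherwise takes the same kind of arguments as fscanf. In this sketch parse_record is a hypothetical helper; it passes back the number of values read so the caller can tell a complete record (5) from a damaged one. It assumes a width of 255 for the name so that a 256-byte buffer has room for the terminating null character.

```c
#include <stdio.h>

// Hypothetical helper: parses one record, one value per line, from text.
// name must point to a buffer of at least 256 bytes.
// Returns the number of values successfully read, which is 5 for a complete record.
int parse_record(const char* text, char* name, int* count, double* value, long* timestamp, int* trueorfalse)
{
    return sscanf(text,
                  "%255[^\n]\n%d\n%lf\n%ld\n%d",
                  name,
                  count,
                  value,
                  timestamp,
                  trueorfalse);
}
```

Checking the return value against 5 is what lets the caller stop at the first record that is incomplete or contains text where a number was expected, rather than appending half-read data to the list.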

The thinglist.c file is now finished so we just need a main function to try it out.

main.c

#include<stdio.h>
#include<stdlib.h>

#include"thinglist.h"

//--------------------------------------------------------
// FUNCTION main
//--------------------------------------------------------
int main(int argc, char* argv[])
{
    puts("---------------------------------------------\n| code-in-c.com - Serialize and Deserialize |\n---------------------------------------------");

    thinglist* tl = thinglist_new();

    if(tl != NULL)
    {
        thinglist_append(tl, "A Thing", 123, 1.234, time(NULL), true);
        thinglist_append(tl, "Another Thing", 234, 2.345, time(NULL), false);
        thinglist_append(tl, "Yet Another Thing", 345, 3.456, time(NULL), true);

        puts("\nData to serialize\n=================\n");

        thinglist_print(tl);

        if(thinglist_serialize(tl, "thinglist1.dat"))
        {
            puts("\nDeserialized data\n=================\n");

            thinglist* tld = thinglist_deserialize("thinglist1.dat");

            if(tld != NULL)
            {
                thinglist_print(tld);

                thinglist_free(tld);
            }
        }

        thinglist_free(tl);
    }

    return EXIT_SUCCESS;
}

This should be reasonably self-explanatory. Firstly we attempt to create a new thinglist and add three "things" to it (I omitted checking the return value of thinglist_append for brevity) and then print and serialize it. We then deserialize the file and print the deserialized data to prove it's the same as the original.

We can now compile and run the program with these commands in the terminal.

Compile and run

gcc main.c thinglist.c -std=c11 -g -lm -o main
./main

Program output

---------------------------------------------
| code-in-c.com - Serialize and Deserialize |
---------------------------------------------

Data to serialize
=================

name:          A Thing
count:         123
value:         1.234000
timestamp:     1521024497
Date and Time: Wed Mar 14 10:48:17 2018
trueorfalse:   true
---------------------------------------
name:          Another Thing
count:         234
value:         2.345000
timestamp:     1521024497
Date and Time: Wed Mar 14 10:48:17 2018
trueorfalse:   false
---------------------------------------
name:          Yet Another Thing
count:         345
value:         3.456000
timestamp:     1521024497
Date and Time: Wed Mar 14 10:48:17 2018
trueorfalse:   true
---------------------------------------

Deserialized data
=================

name:          A Thing
count:         123
value:         1.234000
timestamp:     1521024497
Date and Time: Wed Mar 14 10:48:17 2018
trueorfalse:   true
---------------------------------------
name:          Another Thing
count:         234
value:         2.345000
timestamp:     1521024497
Date and Time: Wed Mar 14 10:48:17 2018
trueorfalse:   false
---------------------------------------
name:          Yet Another Thing
count:         345
value:         3.456000
timestamp:     1521024497
Date and Time: Wed Mar 14 10:48:17 2018
trueorfalse:   true
---------------------------------------

This code is of course tailored to serialize and deserialize a specific list of structs. However, the principle can be easily adapted to deal with any data structure.
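As a rough illustration of adapting the idea, here is the same one-value-per-line approach applied to a different struct. The point struct and the point_write and point_read functions are invented for this sketch and are not part of the project; %.17g is used so that a double survives the round trip without losing digits.

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical struct, used only for this illustration
typedef struct point
{
    int x;
    int y;
    double weight;
} point;

// Writes one point to fp as three lines of text.
// Returns true if fprintf reported that characters were written.
bool point_write(FILE* fp, const point* p)
{
    return fprintf(fp, "%d\n%d\n%.17g\n", p->x, p->y, p->weight) > 0;
}

// Reads one point from fp.
// Returns true only if all three values were read.
bool point_read(FILE* fp, point* p)
{
    return fscanf(fp, "%d\n%d\n%lf\n", &p->x, &p->y, &p->weight) == 3;
}
```

The pattern is the same for any struct: one format string for writing, a matching one for reading, and a check that the expected number of values came back.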

Please let me have your comments and suggestions, and follow Code in C on Twitter for news of future posts and other C programming stuff.

One thought on “Serializing and Deserializing Data Structures”

  1. So…realloc. I had to re-read the docs for realloc a few times before it made sense to me. It also doesn’t help that some online resources explain realloc incorrectly. I ended up doing a ‘man realloc’, then double checked it in both the 1989 and 1999 C standards. This is why I like your blog.

    So, given the function prototype:

    ```void *realloc(void *ptr, size_t size);```

    1) if ptr is NULL, realloc functions the same as malloc, so you can actually get rid of the first if-else statement in thinglist_append(). The way you have it currently still works though, so this is more of a refactoring thing. The way you have it is arguably more readable too, though it’s an argument I’d be happy to avoid instead of taking one side or the other.

    2) Now the weird part. If realloc fails to allocate, the memory pointed to by tl->things prior to the realloc call will not be freed, however, realloc will still return NULL. The way you have things, tl->things will then be set to NULL on a failed realloc call, leaving the unfreed memory still allocated, but with nothing pointing to it. I made this exact same mistake while using the curl C API at a previous job. The curl examples used realloc this way and I just copied what they did without thinking twice about it.

    Other things: possible data corruption or seg fault in thinglist_append on failed call to:
    ```(tl->things + (tl->size))->name = malloc(strlen(name) + 1);```

    If this allocation fails, tl->things that was either alloced or realloced will contain junk for the “name” member, and who knows where that junk points. That memory is never freed. Suggested implementation:
    ```
    struct thing *tmp = realloc(tl->things, sizeof(thing) * (tl->size + 1));
    if (tmp == NULL)
        return false;

    /* realloc succeeded, so the old pointer must not be used again */
    tl->things = tmp;

    (tl->things + tl->size)->name = malloc(strlen(name) + 1);
    if ((tl->things + tl->size)->name == NULL)
        return false; /* tl->size is unchanged, so the spare slot is simply unused */

    /* then do strcpy, tl->size++, etc here */
    return true;
    ```
    Some nit-picky things. Return values for calls to thinglist_append in main are never checked. Returns EXIT_SUCCESS even if call to thinglist_new fails.

    (I’m trying to include markdown in this comment. I have no idea if it will work.)
