File Byte Reader

Writing code to read or write text files can be tricky because a text editor does not show a file's exact contents when it includes non-printable characters such as line feeds or carriage returns. This simple utility program takes a filename as a command line argument and prints out the file's exact contents, including descriptions of any non-printable or whitespace characters.

Let's get straight into coding by creating a folder and within it an empty document called filebytereader.c. You can also download the source code from the Downloads page if you prefer. Open the file and enter the following code.

filebytereader.c (part 1)

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<ctype.h>
#include<stdbool.h>

//--------------------------------------------------------
// FUNCTION PROTOTYPES
//--------------------------------------------------------
void readfile(char* filepath);
void populate_mappings(char** mappings);
bool set_value(char** array, int index, char* value);

//--------------------------------------------------------
// FUNCTION main
//--------------------------------------------------------
int main(int argc, char* argv[])
{
    puts("----------------------------------\n| code-in-c.com - FileByteReader |\n----------------------------------\n");

    if(argc != 2)
    {
        // report the problem on stderr and signal failure to the caller
        fprintf(stderr, "input file must be specified\n");

        return EXIT_FAILURE;
    }
    else
    {
        printf("File: %s\n\n", argv[1]);

        readfile(argv[1]);
    }

    return EXIT_SUCCESS;
}

The main function is very simple: it checks that exactly one command line argument has been supplied and, if so, prints it out before calling the readfile function. The name of the executable is normally passed as the first argument to main, so argc is normally at least 1. The first actual command line argument is therefore argv[1], which is what we pass to the readfile function.
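The argument check described above can be isolated in a small helper. This is only a minimal sketch, not part of filebytereader.c; the function name input_path_from_args is an assumption made for illustration.

```c
#include <stddef.h>

// Hypothetical helper (not part of filebytereader.c): returns the single
// command line argument, or NULL if the argument count is wrong.
// argv[0] conventionally holds the program name, so the first real
// argument is argv[1] and a valid invocation has argc == 2.
const char* input_path_from_args(int argc, char* argv[])
{
    if(argc != 2)
    {
        return NULL;
    }

    return argv[1];
}
```

A main function could call this helper with its own argc and argv and exit with EXIT_FAILURE when NULL comes back.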

At the core of this program is an array of strings which maps each character in the file to the text that is actually displayed. Most characters (letters, numerals, punctuation etc.) will be displayed as themselves, but whitespace and non-printable characters will show a description instead. For example, a space will be displayed as [space] and a line feed will be displayed as [line feed]. We therefore need a function populate_mappings, so go back to the source code and enter the following.

filebytereader.c (part 2)

//--------------------------------------------------------
// FUNCTION populate_mappings
//--------------------------------------------------------
void populate_mappings(char** mappings)
{
    // initialize to default values
    for(int i = 0; i <= 127; i++)
    {
        mappings[i] = malloc(2);

        sprintf(mappings[i], "%c", i);
    }

    // replace non-printable characters with descriptions
    set_value(mappings, 0, "[null]");
    set_value(mappings, 1, "[start of heading]");
    set_value(mappings, 2, "[start of text]");
    set_value(mappings, 3, "[end of text]");
    set_value(mappings, 4, "[end of transmission]");
    set_value(mappings, 5, "[enquiry]");
    set_value(mappings, 6, "[acknowledge]");
    set_value(mappings, 7, "[bell]");
    set_value(mappings, 8, "[backspace]");
    set_value(mappings, 9, "[tab]");
    set_value(mappings, 10, "[line feed]");
    set_value(mappings, 11, "[vertical tab]");
    set_value(mappings, 12, "[form feed]");
    set_value(mappings, 13, "[carriage return]");
    set_value(mappings, 14, "[shift out]");
    set_value(mappings, 15, "[shift in]");
    set_value(mappings, 16, "[data link escape]");
    set_value(mappings, 17, "[device control 1]");
    set_value(mappings, 18, "[device control 2]");
    set_value(mappings, 19, "[device control 3]");
    set_value(mappings, 20, "[device control 4]");
    set_value(mappings, 21, "[negative acknowledge]");
    set_value(mappings, 22, "[synchronous idle]");
    set_value(mappings, 23, "[end of trans. block]");
    set_value(mappings, 24, "[cancel]");
    set_value(mappings, 25, "[end of medium]");
    set_value(mappings, 26, "[substitute]");
    set_value(mappings, 27, "[escape]");
    set_value(mappings, 28, "[file separator]");
    set_value(mappings, 29, "[group separator]");
    set_value(mappings, 30, "[record separator]");
    set_value(mappings, 31, "[unit separator]");
    set_value(mappings, 32, "[space]");
    set_value(mappings, 127, "[delete]");
}

//--------------------------------------------------------
// FUNCTION set_value
//--------------------------------------------------------
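bool set_value(char** array, int index, char* value)
{
    // resize the existing buffer to fit the new string plus its terminator
    char* resized = realloc(array[index], strlen(value) + 1);

    if(resized == NULL)
    {
        // the original buffer is left untouched on failure
        return false;
    }

    array[index] = resized;
    strcpy(array[index], value);

    return true;
}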

The populate_mappings function first initializes each element to a one-character string holding the character corresponding to the array index, e.g. mappings[97] will be initialized to "a". We then need to replace the non-printable and whitespace characters with their descriptions. To avoid repeating the strlen, realloc and strcpy calls for every replacement, I have written a separate function set_value to do this. Within populate_mappings this is called once for each of the codes 0 to 32 and for 127 to carry out the replacements.
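As a point of comparison, the same character-to-description mapping can be sketched without any dynamic memory by using a constant lookup table. This is an alternative technique, not the approach the program above takes; the names CONTROL_NAMES and describe_code are assumptions made for illustration.

```c
#include <stddef.h>

// Descriptions for ASCII codes 0 to 32, in code order.
static const char* const CONTROL_NAMES[33] = {
    "[null]", "[start of heading]", "[start of text]", "[end of text]",
    "[end of transmission]", "[enquiry]", "[acknowledge]", "[bell]",
    "[backspace]", "[tab]", "[line feed]", "[vertical tab]",
    "[form feed]", "[carriage return]", "[shift out]", "[shift in]",
    "[data link escape]", "[device control 1]", "[device control 2]",
    "[device control 3]", "[device control 4]", "[negative acknowledge]",
    "[synchronous idle]", "[end of trans. block]", "[cancel]",
    "[end of medium]", "[substitute]", "[escape]", "[file separator]",
    "[group separator]", "[record separator]", "[unit separator]",
    "[space]"
};

// Hypothetical alternative to the mappings array: returns the display
// text for an ASCII code, or NULL if the code is outside 0 to 127.
// Printable characters are written into the caller's 2-byte buffer.
const char* describe_code(int code, char buffer[2])
{
    if(code < 0 || code > 127)
    {
        return NULL;
    }

    if(code <= 32)
    {
        return CONTROL_NAMES[code];
    }

    if(code == 127)
    {
        return "[delete]";
    }

    // printable character: build a one-character string
    buffer[0] = (char)code;
    buffer[1] = '\0';

    return buffer;
}
```

The trade-off is that the caller has to supply a small buffer for the printable characters, but nothing needs to be freed afterwards.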

We can now polish off the coding with the readfile function.

filebytereader.c (part 3)

//--------------------------------------------------------
// FUNCTION readfile
//--------------------------------------------------------
void readfile(char* filepath)
{
    char* mappings[128];

    populate_mappings(mappings);

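    // attempt to open file in binary mode so that no bytes are translated
    FILE* fpinput;
    fpinput = fopen(filepath, "rb");

    // fgetc returns an int so that EOF is distinct from every byte value
    int c;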

    int i = 1;

    if(fpinput != NULL)
    {
        puts("------------------------------------------------------");
        puts("| Pos   | Code | Printable? | Character              |");
        puts("------------------------------------------------------");

        // iterate input file,
        // printing corresponding values from mappings array
        while((c = fgetc(fpinput)) != EOF)
        {
            if(c >= 0 && c <= 127)
            {
                printf("| %-5d | %-4d | %-10s | %-22s |\n", i, c, isprint(c) ? "Yes" : "No",  mappings[c]);
            }
            else
            {
                printf("| %-5d | %-4s | %-10s | %-22s |\n", i, "", "",  "Outside ASCII range");
            }

            i++;
        }

        fclose(fpinput);

        puts("------------------------------------------------------");
    }
    else
    {
        printf("Cannot open %s\n", filepath);
    }

    // free up malloc'ed memory
    for(int i = 0; i <= 127; i++)
    {
        free(mappings[i]);
    }
}

In readfile we create an array of 128 char pointers and pass it to populate_mappings, and then attempt to open the specified file. If the file was opened successfully, we iterate over it one character at a time using fgetc. This program is intended to work only on ASCII files, not files containing Extended ASCII or Unicode, so we check that the character is between 0 and 127. If so, we print its position in the file (1-based rather than 0-based), the ASCII code, whether it is printable according to the isprint function, and finally the value from the mappings array, which is either the character itself or a description. For characters outside the 0-127 range we just print a suitable message. After we hit EOF the while loop exits and we can fclose the file. Each string in the mappings array is dynamically allocated, so we need to call free on every element.
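One detail of the fgetc loop is worth a separate illustration: fgetc returns an int, with each byte delivered as a value from 0 to 255 and EOF as a distinct negative value, so the result should be kept in an int rather than a char. The following is a minimal sketch of such a loop; the helper name count_bytes is an assumption, and it expects a stream that has already been opened, ideally in binary mode.

```c
#include <stdio.h>

// Hypothetical helper: returns the number of bytes in an already opened
// stream and stores in *non_ascii how many fall outside 0 to 127.
long count_bytes(FILE* fp, long* non_ascii)
{
    long total = 0;
    int c; // int, not char, so that EOF stays distinct from byte 255

    *non_ascii = 0;

    while((c = fgetc(fp)) != EOF)
    {
        // fgetc delivers each byte as an unsigned char converted to int,
        // so valid bytes are always in the range 0 to 255
        if(c > 127)
        {
            (*non_ascii)++;
        }

        total++;
    }

    return total;
}
```

With a char variable, a byte of value 255 could compare equal to EOF and end the loop early on platforms where char is signed, while on platforms where char is unsigned the comparison would never succeed.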

I have included a text file called ascii.txt with the download which simply contains the characters 0 to 127 in order. We can now compile and run the program, and if you run it with ascii.txt it will produce the output shown below.
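If you do not have the download to hand, a similar test file could be generated with a few lines of C. This is a sketch under the assumption that ascii.txt holds exactly the byte values 0 to 127 in order and nothing else; the helper name write_ascii_bytes is hypothetical.

```c
#include <stdio.h>
#include <stdbool.h>

// Hypothetical helper: writes the byte values 0 to 127 in order to an
// already opened stream. Returns false if any write fails.
bool write_ascii_bytes(FILE* fp)
{
    for(int code = 0; code <= 127; code++)
    {
        if(fputc(code, fp) == EOF)
        {
            return false;
        }
    }

    return true;
}
```

The stream passed in should be opened in binary mode, for example with fopen("ascii.txt", "wb"), so that the line feed and carriage return bytes are written unchanged on every platform.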

Compile and Run

gcc filebytereader.c -std=c11 -o filebytereader

./filebytereader ascii.txt

Program Output (partial)

----------------------------------
| code-in-c.com - FileByteReader |
----------------------------------

File: ascii.txt

------------------------------------------------------
| Pos   | Code | Printable? | Character              |
------------------------------------------------------
| 1     | 0    | No         | [null]                 |
| 2     | 1    | No         | [start of heading]     |
| 3     | 2    | No         | [start of text]        |
| 4     | 3    | No         | [end of text]          |
| 5     | 4    | No         | [end of transmission]  |
| 6     | 5    | No         | [enquiry]              |
| 7     | 6    | No         | [acknowledge]          |
| 8     | 7    | No         | [bell]                 |
| 9     | 8    | No         | [backspace]            |
| 10    | 9    | No         | [tab]                  |
| 11    | 10   | No         | [line feed]            |
| 12    | 11   | No         | [vertical tab]         |
| 13    | 12   | No         | [form feed]            |
| 14    | 13   | No         | [carriage return]      |
| 15    | 14   | No         | [shift out]            |
| 16    | 15   | No         | [shift in]             |
| 17    | 16   | No         | [data link escape]     |
| 18    | 17   | No         | [device control 1]     |
| 19    | 18   | No         | [device control 2]     |
| 20    | 19   | No         | [device control 3]     |
| 21    | 20   | No         | [device control 4]     |
| 22    | 21   | No         | [negative acknowledge] |
| 23    | 22   | No         | [synchronous idle]     |
| 24    | 23   | No         | [end of trans. block]  |
| 25    | 24   | No         | [cancel]               |
| 26    | 25   | No         | [end of medium]        |
| 27    | 26   | No         | [substitute]           |
| 28    | 27   | No         | [escape]               |
| 29    | 28   | No         | [file separator]       |
| 30    | 29   | No         | [group separator]      |
| 31    | 30   | No         | [record separator]     |
| 32    | 31   | No         | [unit separator]       |
| 33    | 32   | Yes        | [space]                |
| 34    | 33   | Yes        | !                      |
| 35    | 34   | Yes        | "                      |
| 36    | 35   | Yes        | #                      |
| 37    | 36   | Yes        | $                      |
| 38    | 37   | Yes        | %                      |
| 39    | 38   | Yes        | &                      |
| 40    | 39   | Yes        | '                      |
| 41    | 40   | Yes        | (                      |
| 42    | 41   | Yes        | )                      |
| 43    | 42   | Yes        | *                      |
| 44    | 43   | Yes        | +                      |
| 45    | 44   | Yes        | ,                      |
| 46    | 45   | Yes        | -                      |
| 47    | 46   | Yes        | .                      |
| 48    | 47   | Yes        | /                      |
| 49    | 48   | Yes        | 0                      |
| 50    | 49   | Yes        | 1                      |
| 51    | 50   | Yes        | 2                      |
| 52    | 51   | Yes        | 3                      |
| 53    | 52   | Yes        | 4                      |
| 54    | 53   | Yes        | 5                      |
| 55    | 54   | Yes        | 6                      |
| 56    | 55   | Yes        | 7                      |
| 57    | 56   | Yes        | 8                      |
| 58    | 57   | Yes        | 9                      |
| 59    | 58   | Yes        | :                      |
| 60    | 59   | Yes        | ;                      |
| 61    | 60   | Yes        | <                      |
| 62    | 61   | Yes        | =                      |
| 63    | 62   | Yes        | >                      |
| 64    | 63   | Yes        | ?                      |
| 65    | 64   | Yes        | @                      |
| 66    | 65   | Yes        | A                      |
| 67    | 66   | Yes        | B                      |
| 68    | 67   | Yes        | C                      |
| 69    | 68   | Yes        | D                      |
| 70    | 69   | Yes        | E                      |
| 71    | 70   | Yes        | F                      |

Please let me have your comments and suggestions, and follow Code in C on Twitter for news of future posts.
