Graphing Data Using a Logarithmic Plot

The majority of data can easily be plotted on a graph with equal intervals on the axes, for example 1, 2, 3 or 100, 200, 300. Some data, typically data which increases or decreases exponentially, cannot comfortably be graphed on such a scale without squashing the data up so much at one end that it becomes incomprehensible. The solution to this problem is to use a logarithmic scale.

The Problem

Consider the data in the following table. Graphing this data with equal axis intervals of, say, 100,000 would make the differences in the lower values indistinguishable, and a scale to show them distinctly would make the graph impossibly large.

Table 1
Label          Data
1910              2
1920              6
1930             29
1940             84
1950            364
1960            622
1970           4106
1980           6951
1990          15994
2000          81022
2010         198240
2020         765008

The Solution

To show the lower values distinctly but still fit all the data on a reasonably sized graph we need to plot the logarithms of the data rather than the data itself, using a scale which increases exponentially. Assuming we are using a base 10 scale, the increments on the axis would be 1, 10, 100, 1000 and so on.

Let's look at the data again, this time including the logarithm (to base 10) of the data.

Table 2
Label          Data   log10(Data)
1910              2      0.301030
1920              6      0.778151
1930             29      1.462398
1940             84      1.924279
1950            364      2.561101
1960            622      2.793790
1970           4106      3.613419
1980           6951      3.842047
1990          15994      4.203957
2000          81022      4.908603
2010         198240      5.297191
2020         765008      5.883666

We have now reduced the data to a range of approximately 0.3 to 5.9, which can comfortably be shown on a graph with an axis of 0 to 6. Note though that the axis will not be labeled 0 to 6, but instead with 10 (or whatever base we are using) raised to the powers 0 to 6, as shown in the following table.

Table 3
Interval Values   Power Equation   Axis Label
6                 10^6             1000000
5                 10^5             100000
4                 10^4             10000
3                 10^3             1000
2                 10^2             100
1                 10^1             10
0                 10^0             1

For this project we will write a short program to create a logarithmic plot of the sample data shown above, and save it as an SVG file looking like this.

[Image: logarithmic plot]

The sample data is only very approximately exponential but is still reduced to roughly a straight line when plotted here. If the data were exactly exponential the points on the logarithmic plot would lie on an exact straight line, but would have an ever-increasing gradient if plotted on an equal-interval scale.

This project uses the SVG library I wrote for an earlier post. I won't include that code here but you might wish to take a look at the post to get an idea how the SVG library works.

Coding

Create a new folder somewhere and in it create the following empty files. You can download the source code from the Downloads page if you prefer, and the source code zip also contains the SVG library files.

  • data.h
  • data.c
  • logarithmicplot.c

Open data.h and enter the following.

data.h

//--------------------------------------------------------
// FUNCTION PROTOTYPES
//--------------------------------------------------------
void populate_data(double data[12], double labels[12]);

Then open data.c and enter the function body.

data.c

#include<string.h>

#include"data.h"

//--------------------------------------------------------
// FUNCTION populate_data
//--------------------------------------------------------
void populate_data(double data[12], double labels[12])
{
    memcpy(labels, (double[12]){1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010,2020}, sizeof(double[12]));

    memcpy(data, (double[12]){2, 6, 29, 84, 364, 622, 4106, 6951, 15994, 81022, 198240, 765008}, sizeof(double[12]));
}

The data.h and data.c files simply implement a quick and dirty way of getting some data suitable for plotting on a logarithmic scale. We can now move on to writing the code to create the actual graph, so open logarithmicplot.c and enter the #includes, function prototypes and main function.

logarithmicplot.c (part 1)

#include<stdio.h>
#include<math.h>
#include<time.h>
#include<locale.h>
#include<stdlib.h>

#include"data.h"
#include"svg.h"

//--------------------------------------------------------
// FUNCTION PROTOTYPES
//--------------------------------------------------------
void print_data(double* data, double* labels, int size);
void draw_logarithmic_plot(int width,
                           int height,
                           char* title,
                           double* data,
                           double* labels,
                           int size,
                           int maxpower,
                           char* filename);

//--------------------------------------------------------
// FUNCTION main
//--------------------------------------------------------
int main(int argc, char* argv[])
{
    puts("------------------------------------\n| code-in-c.com - Logarithmic Plot |\n------------------------------------\n");

    double data[12];
    double labels[12];

    populate_data(data, labels);

    print_data(data, labels, 12);

    draw_logarithmic_plot(720, 540, "Logarithmic Plot", data, labels, 12, 6, "logarithmicplot1.svg");

    return EXIT_SUCCESS;
}

I'll discuss the print_data and draw_logarithmic_plot functions when we actually implement them, but for the moment let's just look at the main function. Firstly it creates a couple of double arrays (for the purposes of creating sample data the size is hard coded as 12) and then passes them to populate_data. We then call print_data to show the data on screen, and draw_logarithmic_plot to create and save the graph.

Now let's look at the print_data function which can be added to logarithmicplot.c.

logarithmicplot.c (part 2)

//--------------------------------------------------------
// FUNCTION print_data
//--------------------------------------------------------
void print_data(double* data, double* labels, int size)
{
    puts("       label         data  log10(data)\n--------------------------------------");

    for(int i = 0; i < size; i++)
    {
        printf("%12.0lf %12.0lf %12.6lf\n", labels[i], data[i], log10(data[i]));
    }
}

This prints out the data in the same format as in the second table above. Of course it is not necessary for the main task of creating a graph, but does provide a useful indicator of how data maps to its corresponding logarithmic values. Finally we can move on to the draw_logarithmic_plot function.

logarithmicplot.c (part 3)

//--------------------------------------------------------
// FUNCTION draw_logarithmic_plot
//--------------------------------------------------------
void draw_logarithmic_plot(int width, int height, char* title, double* data, double* labels, int size, int maxpower, char* filename)
{
    int topmargin = 64;
    int bottommargin = 32;
    int leftmargin = 86;
    int rightmargin = 32;

    int graph_height = height - topmargin - bottommargin;
    int graph_width = width - leftmargin - rightmargin;
    double pixels_per_unit_x = (double)graph_width / (double)(size - 1);
    double pixels_per_unit_y = (double)graph_height / (double)maxpower;
    double x;
    double y;
    char number_string[32]; // roomy enough for axis labels well beyond 10^6

    // Create svg struct
    svg* psvg;
    psvg = svg_create(width, height);

    if(psvg == NULL)
    {
        puts("psvg is NULL");
    }
    else
    {
        svg_fill(psvg, "#FFFFFF");

        // header text and border lines
        svg_text(psvg, width/2, 38, "sans-serif", 16, "#000000", "#000000", "middle", title);
        svg_line(psvg, "#808080", 2, leftmargin, topmargin, leftmargin, height - bottommargin);
        svg_line(psvg, "#808080", 2, leftmargin, height - bottommargin, width - rightmargin, height - bottommargin);

        // y axis indexes and values
        y = height - bottommargin;
        for(int power = 0; power <= maxpower; power++)
        {
            svg_line(psvg, "#808080", 1, leftmargin - 8, y, leftmargin, y);

            sprintf(number_string, "%.0lf", pow(10, power));
            svg_text(psvg, leftmargin - 12, y + 4, "sans-serif", 10, "#000000", "#000000", "end", number_string);

            y -= pixels_per_unit_y;
        }

        // x axis indexes and values
        x = leftmargin;
        for(int i = 0; i < size; i++)
        {
            svg_line(psvg, "#808080", 1, x, height - bottommargin, x, height - bottommargin + 8);

            sprintf(number_string, "%.0lf", labels[i]);
            svg_text(psvg, x, height - bottommargin + 24, "sans-serif", 10, "#000000", "#000000", "middle", number_string);

            x += pixels_per_unit_x;
        }

        // plot data
        x = leftmargin;

        for(int d = 0; d < size; d++)
        {
            y = height - bottommargin - (log10(data[d]) * pixels_per_unit_y);

            svg_circle(psvg, "#0000FF", 0, "#0000FF", 3, x, y);

            x += pixels_per_unit_x;
        }

        // finish off
        svg_finalize(psvg);

        svg_save(psvg, filename);

        puts("File saved");

        svg_free(psvg);
    }
}

In the draw_logarithmic_plot function we first create a few variables:

  • topmargin, bottommargin, leftmargin and rightmargin - the sizes of the four margins in pixels
  • graph_height and graph_width - the size of the actual graph inside the margins
  • pixels_per_unit_x and pixels_per_unit_y - the number of pixels used to represent each unit of data
  • x and y - these will be used several times for the location of the various elements of the graph
  • number_string - we will sprintf numbers to this to get them in a string form suitable for drawing on the graph

We can then create an SVG struct - refer to the SVG Library post if you want to know the full details of how this works. If the struct creation is successful we can then fill its background, which I have hardcoded as white, and then draw the title and axis lines.

We then use a pair of for loops to draw the indices and values on the two axes, and a third for loop to calculate the position of and draw a small circle for each data point. Note the use of the log10 function in the calculation; this function lives, not surprisingly, in math.h.

That's the graph drawn so we then call svg_finalize, which basically just adds a closing tag, then svg_save to write the SVG to a file. Then we just write out a message and call svg_free to free up the dynamic memory used by the SVG library.

The code is now finished so we can compile and run it - enter this in your terminal.

Compile and Run

gcc logarithmicplot.c data.c svg.c -std=c11 -lm -o logarithmicplot

./logarithmicplot

The program output itself isn't hugely exciting, basically just the contents of Table 2 above.

Program Output

------------------------------------
| code-in-c.com - Logarithmic Plot |
------------------------------------

       label         data  log10(data)
--------------------------------------
        1910            2     0.301030
        1920            6     0.778151
        1930           29     1.462398
        1940           84     1.924279
        1950          364     2.561101
        1960          622     2.793790
        1970         4106     3.613419
        1980         6951     3.842047
        1990        15994     4.203957
        2000        81022     4.908603
        2010       198240     5.297191
        2020       765008     5.883666
File saved

But if you open the folder where you saved your source code you'll find a newly-created file called logarithmicplot1.svg, which you can double click to open with your default image viewer.

Bases Other Than 10

The code in this project uses base 10, which is likely to be the most appropriate for the majority of data. However, there is no reason why you shouldn't use another base if necessary. As an example, if you were plotting the growth of computer memory over the years, base 2 would be more appropriate.

Dealing With Fractions

The sample data used for this project consisted only of values >= 1. Fractions and negatives can also be plotted using logarithmic scales and in this section I'll show a couple of tables demonstrating values between 0 and 1.

Firstly let's look at 10 to the power of negative integers. This table is equivalent to Table 3 above, and shows values getting an order of magnitude smaller each step instead of larger.

Table 4: 10 to negative powers
Interval Values   Power Equation   Axis Label
0                 10^0             1
-1                10^-1            0.1
-2                10^-2            0.01
-3                10^-3            0.001
-4                10^-4            0.0001
-5                10^-5            0.00001
-6                10^-6            0.000001

Now let's look at some sample data with its base 10 logarithms.

Table 5: Fractional data
Label   Data        log10(data) to 6 dp
1910    0.9         -0.045757
1920    0.36        -0.443697
1930    0.081       -1.091515
1940    0.052       -1.283997
1950    0.0064      -2.193820
1960    0.0012      -2.920819
1970    0.00092     -3.036212
1980    0.00049     -3.309804
1990    0.000051    -4.292430
2000    0.000011    -4.958607
2010    0.0000077   -5.113509
2020    0.0000029   -5.537602

The raw data ranges from 0.0000029 to 0.9 which, as with the data in Table 1, is too wide a range to plot sensibly as it is. Using the logarithms instead reduces the values to fit neatly within the range 0 to -6 shown in Table 4.

Dealing With Negative Data

Negative values can also be plotted on a logarithmic scale but are rather fiddly. The absolute (positive) values must be used for calculating the logarithm, so the logarithm increases as the actual negative data value decreases. It is therefore necessary to invert the plot. This should be clearer with another table.

Table 6: Negative Data
Label          Data     abs(Data)   log10(abs(Data))
1910             -2             2           0.301030
1920             -6             6           0.778151
1930            -29            29           1.462398
1940            -84            84           1.924279
1950           -364           364           2.561101
1960           -622           622           2.793790
1970          -4106          4106           3.613419
1980          -6951          6951           3.842047
1990         -15994         15994           4.203957
2000         -81022         81022           4.908603
2010        -198240        198240           5.297191
2020        -765008        765008           5.883666

The data in this table are the negatives of the sample data we plotted. Therefore if we take the logarithms of the absolute values we end up plotting the exact same numbers, which of course is wrong. However, if we plot downwards instead of upwards, effectively mirroring the graph along the x-axis, we will get the correct result.

Combining Negative, Fractional and Positive Data Values

Combining positive and fractional data is no problem - we can just extend the solution we developed above so that the powers run from negative through to positive, as shown in Table 7 below, which combines Tables 3 and 4.

Including negative values presents a bit of a problem though, which can only be resolved by dealing with the negative values separately, both when drawing the indexes and plotting the data points.

Table 7: Negative Through to Positive Powers
Interval Values   Power Equation   Axis Label
6                 10^6             1000000
5                 10^5             100000
4                 10^4             10000
3                 10^3             1000
2                 10^2             100
1                 10^1             10
0                 10^0             1
-1                10^-1            0.1
-2                10^-2            0.01
-3                10^-3            0.001
-4                10^-4            0.0001
-5                10^-5            0.00001
-6                10^-6            0.000001

Conclusion

This has been a very basic introduction to the rather esoteric topic of logarithmic plots, but I hope I have got the principles across sufficiently to give a foundation on which to build should you need to do so. Please let me have your comments and suggestions, and follow Code in C on Twitter for news of future posts and other C programming stuff.
