Pointers in C

Posted on 26.12.2007 by Kim N. Lesmer.
This is an article on how pointers work in the C programming language. It is addressed to beginners. Rather than dwelling on the more technical aspects of pointers or using complex examples, it aims for a solid understanding of the basic idea. A little prior knowledge of C helps, such as assigning values to variables, compiling, printing to the screen and using comments, but it is not strictly necessary.

One of the greatest strengths of C is that it gives the programmer low-level access to the machine, such as memory locations. That strength is also its greatest weakness, because with great power comes great responsibility. Actually I don't know if it is right to call it a weakness, but many bugs arise from the improper use of C.

It is often said that pointers are one of the greatest hurdles a beginner must overcome in using the C programming language.

Variables

In reality a pointer is nothing more than a simple variable. A variable is a symbolic representation consisting of letters and/or numbers that represents a value of some sort. Variables are best thought of as boxes stacked upon one another. The boxes can hold values, and each box is labeled with some symbolic representation so that we can tell the different boxes apart. Besides the symbolic representation each box also has a number. Box number 1 is at the bottom, box number 2 is placed upon box number 1, box number 3 is placed upon box number 2, and so forth.

To access the boxes we can use either the symbolic representation or the box number. When we access the different boxes we can insert items and remove items from the boxes. When we enter data into our computer memory we are accessing our boxes. Computer memory, just like the boxes, can be represented with a symbolic representation and each memory location has its own address.

In mid-level and high-level programming languages the compiler or interpreter will take care of handling the boxes for us. We don't have to think much about what box resides at what location in memory. If we have to manage this ourselves it becomes extremely time consuming and difficult to program, but that's how it actually was done in the beginning.

The old way

Let's imagine that you need to store some data in the computer memory. Say you are using the computer to calculate some numbers and need to save the result for later use.

To achieve this the old way, yet simplified, you would first have to figure out what parts of memory are free for usage. Next you have to make sure that the free memory isn't reserved for some other usage by the operating system. Next you have to reserve the part of memory that you need and then you need to fill it with the relevant data.

Let's imagine that you want to store the integer "5" in memory. After gaining access to the memory the operating system supplies you with a memory address, and just for the fun of it, let's call that address "Box number 27".

Now let's imagine that after a while you need to store another number in memory as well, the number "6". Again you ask the operating system for access to the memory, but you can't get box number 28 because it has been occupied by some other program. Instead you get "Box number 578".

So "Box number 27" holds the value "5" and "Box number 578" holds the value "6".

What if you had 255 different numbers of different lengths? It would become quite difficult to keep track. That's where the blessings of a mid-level or high-level programming language come in.

The new way

From this point on I will just refer to both mid-level and high-level languages as high-level.

Using a high-level programming language means that you no longer need to keep track of memory. All you have to do is to use symbolic representations and the compiler or interpreter will take care of the rest.

A variable serves as an easy way to access memory. Think of a variable as one of the boxes from before. Inside variables we can store data such as numeric values, but in C we can also store the physical memory addresses of other boxes.

So "Box number 27" can hold the value "5", which can represent the amount of money in my bank account, but it can also hold the value "578", which is in reality the physical address of "Box number 578". The same goes for variables.

In reality a variable is the memory. The memory can contain values such as the amount of money on my bank account, or it can contain addresses of other blocks of memory.

Pointers

When a variable contains the address of some memory block location it is called a "pointer" because it is "pointing" at that particular block of memory.

All programming languages contain variables of some kind, but only a few contain pointers. That's because pointers give direct access to physical memory locations anywhere inside the computer.

With pointers it is possible to try to access any memory location and change the data at that location, even data that belongs to the operating system itself. A modern operating system will usually stop such an access and terminate the program, and that's partly why computer programs sometimes crash.

Working with pointers

Enough theory, let's get to work.

To use variables in C they must first be declared. A lot of other high-level programming languages, PHP for example, allow you to work with variables without declaring them. Such languages are much safer because they don't allow improper use of variables, but they also take the real power away, and power is sometimes needed, especially if you are writing a kernel for an operating system or a device driver for a network card.

To declare a variable in C you make up a symbolic representation (I will not address the issue of what characters may be used or which words are reserved). Next you have to let the compiler know what kind of data you are going to store in that particular variable. The compiler will then create the symbolic representation and reserve the right amount of space in memory to hold that value.

Let's declare a variable called my_var and reserve room in memory to hold an integer value:

#include <stdio.h>

int main()
{
    int my_var;
    return 0;
}

The variable my_var is now ready to receive some data, and we can define the data of the variable by assigning it a specific value. Let's take it a step further and do that:

1. #include <stdio.h>
2.
3. int main()
4. {
5.      int my_var;
6.      my_var = 5;
7.
8.      return 0;
9. }

On line 5 the variable gets declared with the name my_var, and it is declared to hold an integer value. On line 6 the variable is assigned the value "5".

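Now, where in the computer memory is the number "5" located? We don't know, but we can find out using the & ampersand symbol. (On most modern operating systems the address a program sees is a virtual address that the system maps onto physical memory, but for our purposes the difference doesn't matter.)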

Let's print out both the value of the variable my_var and the physical memory location of my_var:

#include <stdio.h>

int main()
{
    int my_var;
    my_var = 5;

    /* Print out the value of my_var. */
    printf ("%d\n", my_var);

    /* Print out the physical memory address of my_var. */
    /* %p expects a void pointer, hence the cast. */
    printf ("%p\n", (void *) &my_var);

    return 0;
}

If you compile the program and run it, it will print "5" to the screen and, in my case, the memory address "0xbfae625c". To compile it, save it as "mytest.c" and run the command (assuming you are using the GNU GCC compiler): gcc -o mytest mytest.c. Then run the program with the command: ./mytest

Now, what if we want to add another variable, let's call it mem_var, to contain the physical memory address of the first variable?

We can do that by declaring a variable as a pointer, and that is done using the asterisk sign * like this:

int *mem_var;

The variable mem_var now becomes a pointer, but since it hasn't been initialized with any memory address, it just holds some random number. That random number can literally point to anywhere in the computer memory, and for safety reasons it is always a good idea to default its value to NULL upon declaration, like this:

int *mem_var = NULL;

Once a variable has been declared as a pointer it is dangerous to mess with it. You cannot keep ordinary values inside the variable, it is only supposed to contain memory addresses.

In other words: A variable in C can normally contain numbers, chars, etc., depending on how you declare them, but once a variable is declared as a pointer, it MUST only contain memory location addresses. If you assign a number to a variable that has been declared as a pointer, the compiler will automatically assume that the number is a valid memory location - no matter what that number is!

Some programming languages like Ada also make use of pointers, but pointers in Ada are set to null by default, which makes them safer.

NULL is a special pointer value used to signify that a pointer intentionally does not point to any address yet. Such a pointer is called a null pointer in C.

Make it a strong habit to ALWAYS initialize C pointers to NULL right away.

Keeping things apart

The method I use to keep variables and pointers apart is to think of the asterisk sign (*) as a rifle's aim, "pointing" at my target. The star is the eye of the aim.

The way I remember that the & ampersand means the address of the memory where the data is stored is by thinking of the "A" in "Ampersand" as the "A" for "Address".

Maybe this isn't the best way, but I manage to keep them apart like that.

When declaring a pointer the asterisk sign can also be located next to the type declaration like this:

int* mem_var;

But I prefer to keep it next to the variable.

Continue working with pointers

So now we have got a variable, and we have got the address in memory where the data of that variable is located, and we have also got a pointer pointing to NULL.

Let's make the pointer point to the address of the variable my_var. I will now expand our little program a bit:

1.  #include <stdio.h>
2.
3.  int main()
4.  {
5.      int my_var = 5;
6.      int *mem_var = NULL;
7.
8.      mem_var = &my_var;
9.
10.     /* Adds a newline for better readability. */
11.     printf ("\n");
12.
13.     /* Print out the value of my_var. */
14.     printf ("%d\n", my_var);
15.
16.     /* Print out the physical memory address of my_var. */
17.     printf ("%p", (void *) &my_var);
18.
19.     printf ("\n");
20.
21.     /* Print out the physical memory address of my_var using */
22.     /* the pointer mem_var. */
23.     printf ("%p", (void *) mem_var);
24.     printf ("\n");
25.
26.     return 0;
27. }

On line 5 the variable my_var gets declared to hold an integer and is assigned the value 5 upon declaration. On line 6 the variable mem_var gets declared as a pointer and is assigned NULL. On line 8 the variable "mem_var" gets assigned a new value, namely the memory address where the value "5" is actually stored. In my case the value "5" is located at memory address "0xbfae625c" (most likely different on your computer).

We get the address by the use of the & ampersand sign in front of the variable like this "&my_var", and the pointer "mem_var" now points to that address location.

A bit of confusion can arise at this point because of the asterisk sign * and the & ampersand sign.

The & ampersand sign is easy to remember, just think of the "A" in ampersand as the "A" in address. We ONLY use it in front of a variable to get the memory address of that particular variable.

Now the asterisk sign on the other hand is a bit more confusing. When we declare a variable to be a "pointer" to some address, we use the asterisk sign in front of that variable like this: int *mem_var;, but when we assign an actual memory location to the pointer, by using the address of another variable, we don't use the asterisk sign any longer: mem_var = &my_var;

In my personal opinion this makes it much more difficult to keep ordinary variables apart from pointers, but there is a solution to the problem. What most people do is that they ALWAYS name their pointers with something obvious like "point" or "ptr". To do that our program now looks like this:

#include <stdio.h>

int main()
{
    int my_var = 5;
    int *ptr_mem_var = NULL;

    ptr_mem_var = &my_var;

    /* Adds a newline for better readability. */
    printf ("\n");

    /* Print out the value of my_var. */
    printf ("%d\n", my_var);

    /* Print out the physical memory address of my_var. */
    printf ("%p", (void *) &my_var);

    printf ("\n");

    /* Print out the physical memory address of my_var using */
    /* the pointer ptr_mem_var. */
    printf ("%p", (void *) ptr_mem_var);
    printf ("\n");

    return 0;
}

Make it a strong habit to ALWAYS and ONLY use something like "ptr_", short for pointer, in front of all your pointer variables, and NEVER use it anywhere else. That way you can more easily keep pointer variables apart from ordinary variables.

Working a bit more with pointers

What if we want to change the value located at our current memory address?

We can do that by changing the value of our variable my_var like this:

my_var = 17;

By changing the value of the variable my_var we are indirectly accessing the memory location that holds that value. To access that memory location in a more "direct" approach, we could now use the pointer that is pointing to the physical location like this:

*ptr_mem_var = 17;

A bit more confusion (perhaps)..

When we declare a variable to be a "pointer" to some address, we use the asterisk sign in front of that variable like this: int *ptr_mem_var;. When we assign an actual memory location to the pointer, by using the address of another variable, we don't use the asterisk sign any longer: ptr_mem_var = &my_var;. But when we need to change the value located at the address which the pointer is pointing to, we again need the asterisk sign like this: *ptr_mem_var = 17;

In my opinion this is what actually makes the subject difficult to understand, and not the subject of memory access itself. The way that C, C++ and D access memory through pointers is NOT very human readable, but that's the way it is.

The & ampersand business is easy, "A" in ampersand like "A" in address, but the asterisk sign of pointers is harder to remember.

When we assign memory addresses to a pointer we DON'T use the asterisk sign *.

When we assign values to memory addresses pointed to by pointers we USE the asterisk sign *.

The way I remember it is by thinking of the asterisk sign * as the aim of a rifle. If the "aim is on" we are pointing to a memory location ready to "shoot some value into it". If "the aim is off" we are NOT pointing at any memory location, and by assigning a new value, we are actually moving our aim to another target.

Some common mistakes

One of the most common mistakes is to assign a value to a pointer rather than a memory address by omitting the & ampersand sign like this:

/* Wrong code: */
ptr_mem_var = my_var;
/* Right code: */
ptr_mem_var = &my_var;

In the above example the pointer ptr_mem_var will point to the memory address located at "0x5" rather than the actual address of my_var, and that's because we have omitted the & ampersand sign. Without the ampersand sign we are not talking about addresses anymore.

Because C is about giving you power, you are allowed to write the wrong code above. Who knows, maybe that's what you actually want to do! Maybe you need to access the physical memory location "0x5", and you are allowed to do it, but the compiler should at least give you a warning saying something like: "mytest.c:11: warning: assignment makes pointer from integer without a cast". Newer compiler versions may even refuse to compile it until you add an explicit cast.

The warning perhaps doesn't make sense, but at least you get a line number, and you can then take a look at what is going on.

Another common mistake is to forget to initialize a pointer.

When a pointer is first declared, and we forget to use the NULL value as initialization, the pointer holds some random data, and that data could actually be pointing towards a real memory location. The risk of accessing illegal memory locations is big, and all kinds of strange things might happen. We might access some specific part of the operating system, or some part of the memory stack.

In either case we need to make sure that the pointer actually points to a safe memory location. A safe memory location is a location that we know holds the value of one of our variables or, if you are actually dealing with physical hardware, some memory location that you need to access and that you know is safe.

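/* Wrong use: */
int *ptr_mem_var;
/* This is the mistake: the pointer points at some random location. */
*ptr_mem_var = 5;

/* Wrong use, but safer: */
int *ptr_mem_var = NULL;
/* Still a mistake, but on most systems the program now stops right */
/* here instead of silently overwriting some random memory. */
*ptr_mem_var = 5;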

In the first example above we have forgotten to initialize the pointer with the NULL value to make it safe, and in both examples we have forgotten to make the pointer point at a variable before using it. The way it should have looked from the beginning is like this:

int *ptr_mem_var = NULL;
ptr_mem_var = &my_var;
*ptr_mem_var = 5;

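If the phone rings just before you assign some safe memory location to your pointer, and you later forget to do it, the pointer still holds NULL. Writing through a NULL pointer is still a bug, but on most systems the program stops at once, so the mistake is easy to find and no other memory gets overwritten.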

A note on arrays

In most expressions C treats the name of an array as if it were a pointer to the first element.

If you define an array like this:

char my_text[] = "This is a string of text.";

Then *my_text is the same as my_text[0]. If you declare a character pointer and want it to point to the address location of the character array "my_text" then you don't need the & ampersand sign, as you do when you take the address of an ordinary variable such as an integer.

The code below is wrong:

#include <stdio.h>

int main()
{
    char my_text[] = "This is a string of text.";
    char *ptr_text = NULL;
    ptr_text = &my_text;

    return 0;
}

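The code is wrong because my_text already behaves as a pointer to the first character. If you put the & ampersand in front of it, you get a pointer to the whole array instead. That pointer has a different type (pointer to an array of char) than ptr_text (pointer to char), and the compiler will complain about incompatible pointer types.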

The code below is right:

#include <stdio.h>

int main()
{
    char my_text[] = "This is a string of text.";
    char *ptr_text = NULL;
    /* my_text already acts as a pointer to its first character. */
    ptr_text = my_text;

    return 0;
}

It is usually fine to think of an array name as a pointer to its first element, because that is how it behaves in most expressions.

Conclusion

The use of pointers is a powerful tool and it is one of the strongest aspects of C, but at the same time it is dangerous.

In my personal opinion the difficulty lies not in understanding the subject as much as it does in remembering how to use pointers. If time passes and you don't program in C often, you tend to forget how it works.

With the help of my imaginary analogies I find it easier to remember how to use pointers, and I hope they will help you as well. Perhaps you can make up your own way of remembering.

In a sense all variables are pointers. What makes the difference between when we call it a variable and when we call it a pointer is what it is actually pointing at. We can point at a memory location, and we then call it a "pointer", or we can point at the value located inside a memory location, and we then call it a variable.

If you have any comments or corrections feel free to email them to me.