Linked Lists

The Basics

Once you have a good understanding of pointers and data structures, you can begin to combine them to form more complex structures which can be very useful in programming.

One such structure is called a Linked List. It consists of a series of nodes that contain some sort of data, and at least one pointer to another node.

The main advantage of Linked Lists is that they can be dynamically created or destroyed. What this means is that you only have as many items in the list as you need. So you aren't wasting memory by having allocated but unused storage, and in most cases, you don't have to worry about running out of storage either. Not sure how many elements you are going to need on your stack? Make your stack with a linked list and you don't have to worry about it.

The main disadvantage is that they take slightly more storage space per element than an array would (because of the pointers), and they can be very confusing to work with, especially if you don't understand how pointers work.

Usage

I'm going to start with the simplest form of a linked list. It contains an integer variable and a pointer to the next item, or node, of the list.

struct listnode{
  int data;
  struct listnode *next;
};

This is the structure definition, and as you can see, it contains a data member which stores an int value, and a pointer to a variable of type struct listnode. Of course, you have to create nodes before you can make any use of this. To do so, use declarations like these:

struct listnode firstnode;
struct listnode secondnode;
struct listnode thirdnode;

Figure 1 - Declared Nodes

The orange 'next' indicates a pointer.

Then we link the two nodes together...

firstnode.next = &secondnode;
secondnode.next = &thirdnode;

And since there is no fourth node yet, we assign the 'next' pointer in thirdnode the value of NULL so we know it is the last element. NULL is a special C device that is used to denote an 'empty' variable. On most systems, NULL is actually equal to zero.

thirdnode.next = NULL;

Figure 2 - Linked Nodes

Now, let's create a pointer to the first node.

struct listnode *listPTR;
listPTR = &firstnode;

These two C statements create a pointer and make it point to the first node of the list. Note that the * makes this variable a pointer, not a data structure. listPTR is only a pointer to a variable of type 'struct listnode', it is not a node itself, and does not contain any of the members of a node.

Figure 3 - Pointer to the linked list

Now would be a good time to introduce a new operator. When working with pointers to structures in C, it's best to use the -> operator. It essentially works the same as the dot operator for declared structures, but is specific to pointers.

listPTR->data = 14;
firstnode.data = 14;

As you could probably already guess, both of those statements perform the same function, although one access the member directly, and the first uses a pointer. Similarly, the following are the same, though the use of pointers:

listPTR->next->data = 35;
secondnode.data = 35;

In the first statement, listPTR is pointing to the 'next' member of the node it is pointing to (in this case, firstnode). 'next' itself is a pointer, and is pointing to secondnode, where the member 'data' is changed by the statement. You are probably wondering why you would want to use a pointer when you can access the data directly, and the answer is dynamic allocation of storage.

secondnode.next->data = 97;
thirdnode.data = 97;

Again, these are equivalent statements showing how the pointers and data members of the nodes work. In the end, we get:

Figure 4 - Linked List

Dynamic Storage Allocation

This is where the purpose for using pointers and linked lists begins to make some sense. Up until this point, all of the nodes we have used in linked lists have been explicitly declared. So what happens when you don't know how many nodes you are going to need? You want a programming technique that will allow you to have as little or as many storage spaces as possible, and one that will allow you to change those as the program runs.

If you have a stack implemented with an array, you have a set number of elements. Say you make a 10 element stack, and run the program, and it only uses 4 of those spaces. That's wasted storage. Likewise, if you run the program again with different parameters and the stack requires 17 places, you are going to have a stack overflow.

If you implement a stack with a linked list, however, you can start off with no elements, and add them as you need them. So as long as you free the memory after you are finished with it, you will never waste space, and as long as you don't run out of physical memory for the program to run in, you will never run out of space.

So how do we go about implementing a linked list that can change it's size as it needs to? With pointers, of course.

One of the hardest concepts to grasp about dynamically allocated linked lists is that the nodes are not declared as variables. In fact, the nodes don't even have names, they are just anonymous storage space out in the void of memory, and the only way to access them is by having something pointing to them. That is why a solid understanding of pointers is so important; if you allocate storage without assigning a pointer to it, or remove or overwrite a pointer to one of your nodes, then that storage is lost, and in many cases your linked list is broken and unusable.

How do you create a node without a name? By the use of a C function called malloc(). malloc() is a function that takes one argument and returns one value. The argument is the amount of memory storage you require in bytes. The return value is a memory address for use in a pointer.

An int is 4 bytes, so if you were to allocate storage space for an int, you would use malloc(4). The problem is accessing that storage. To do that, you need a pointer.

int *intPTR;
listPTR = malloc(4);

This creates a pointer to an int data type called 'intPTR,' then it sets intPTR to point to the 4 bytes of storage that malloc() just created. Note that the value returned by malloc() is the address of the first byte of allocated storage, therefore you do not need to use the & operator to assign the pointer.

Another function that goes hand in hand with malloc() is a C function called sizeof(). sizeof() accepts a data type as an argument, and returns the size in bytes that the data type requires for storage. Therefore, sizeof(int) is 4, so you could use:

malloc(sizeof(int));

This is just as valid as the first use of malloc(), and actually, is the better way of doing it. In many cases, you will be using malloc() to create complex nodes that you may not know the size of. sizeof() does all that work for you.

So, let's start creating a dynamic linked list. The first thing you will need is a pointer for list manipulation.

struct listnode *listPTR;
listPTR = NULL;

This creates a pointer called 'listPTR' that points to the data type 'struct listnode' which was defined earlier in this section. Then it sets the value of the pointer to NULL, since there are no elements in the list.

Figure 5 - List Pointer

Now that you have a pointer to use, you can create your first node. You need to allocate enough memory to store a variable of type struct listnode, the earlier defined structure. To do this, you need to use sizeof() to determine the amount of allocation that each node requires, and then malloc() to allocate the storage. Remember that you must use a pointer when you use malloc() or the storage space is lost.

listPTR = malloc(sizeof(struct listnode));

That one compact line just created your first node. The only way to access that storage space right now is through the pointer, listPTR.

Figure 6 - First Node

Note that the node does not have a variable name.

Once you add nodes to your list, you are going to have to use listPTR to manipulate the data and pointers in your list, which means there will no longer be anything pointing to your first node. For this reason, you need to add a second pointer which will always point to the first node of the linked list.

struct listnode *headPTR;
headPTR = listPTR;

These two lines create a new pointer, the sole purpose of which is to point to the first node of the linked list. The second line makes headPTR point to the same thing that listPTR is pointing to, in this case, the first node of the list. Note that the actual value of listPTR is the address that listPTR points to, therefore you do not need to use the & operator.

Figure 7 - Head Pointer

You can have as many pointers to a single address as you like

Now that we have a node, we need to make sure our list is linked, and add some data to it. Since we only have one node, we need to set that node's pointer to NULL, since it is the last node in the list right now. Generally speaking, the pointer should be set to NULL or pointed to another node immediately after it is created, but for learning purposes we'll be a little lenient. Let's also give the data member of the node a value.

listPTR->next = NULL;
listPTR->data = 82;

Here, we use listPTR to access the node, and access the 'next' member, which we set to NULL, then the 'data' member, which we set to 82. You could have just as easily used headPTR to set the values, but I want to try to keep you from forming that habit. headPTR is only used as a placeholder so we can find the beginning of the list at any time; all value changing should be done with listPTR.

Now, let's add a second node to the list, and set it's value to 78.

listPTR = malloc(sizeof(struct listnode));
listPTR->data = 78;

Now we have a new node, pointed to by listPTR, with a data value of 78. The only problem is, it isn't linked to the list.

Figure 8 - Second Node

As you can see in the above figure, when you set listPTR to point to the new node, it no longer points to the first node. The pointer is overwritten with the new address.

Now we want to link the second node to the list. There are two things we have to do here, first we have to set the second node's pointer to NULL, since it will be the last node of the list, and then we have to set the first node's pointer to point to the second node.

listPTR->next = NULL;
headPTR->next = listPTR;

This sets the second node's pointer to NULL and points the first node to the second node.

Figure 9 - Linked Nodes

As I'm sure you noticed, I used headPTR after telling you never to use headPTR. This illustrates one of the tricky thing with linked lists; you have to know how to handle multiple pointers to be able to make them work. Not only do you have to deal with the pointers inside each node, but you will have to use a number of pointers to handle the list as a whole.

I'm trying to write this in a way that will show you how to do this, but not tell you how. There are still some things you need to figure out yourself, or there is no point in even learning C. But I will tell you this much.. I usually use three pointers for a singly linked list:

headPTR - Always points to the first node of the list.
newPTR - Used to create new nodes and assign values to them.
listPTR - Used to search the list, and to help add nodes.

Figuring out just how to use those three together to manage a list is up to you, but by now it should be clear.