Java 101: Datastructures and algorithms, Part 2

Explore linked lists, stacks, queues, and trees

Last month's column brought you into the computer science world of datastructures and algorithms by focusing on the array datastructure and associated algorithms. Developers view the array as a fundamental datastructure because it serves as the basis for more complex datastructures, such as stacks, queues, and trees. Developers also view the fundamental linked-list datastructure as a foundation for more complex datastructures.

This article wraps up a two-part series that explores datastructures and algorithms. You first learn about the linked-list datastructure in terms of its single, double, and circular variants. While learning those variants, you discover the related linked-list algorithms for node-insertion/deletion, concatentation, inversion, and insertion-sort. I also discuss the advantages and disadvantages of the linked list and array, a rule-of-thumb for deciding when to use each datastructure in your own programs, and whether or not both can integrate into useful hybrid datastructures. Finally, you receive an introduction to the more complex stack, queue, and tree datastructures.

The linked-list datastructure

In addition to the array, another datastructure that receives much use is the linked list. The linked-list datastructure involves four concepts: the self-referential class, node, link field, and link:

  • Self-referential class: a class with at least one field whose reference type is the class name:

    class Employee
    {
       private int empno;
       private String name;
       private double salary;
       public Employee next;
       // Other members
    }
    

    Employee is a self-referential class because its next field has type Employee.

  • Node: an object you create from a self-referential class.
  • Link field: a field whose reference type is the class name. In the code fragment above, next is a link field. In contrast, empno, name, and salary are nonlink fields.
  • Link: the reference in a link field. In the code fragment above, next's reference to an Employee node is a link.

The four concepts above lead to the following definition: a linked list is a sequence of nodes that interconnect via the links in their link fields. Computer scientists use a special notation to illustrate linked lists. A variant of that notation, which I use in this article, appears in Figure 1.

Figure 1. A special notation for illustrating linked lists

Figure 1 presents three nodes: A, B, and C. Each node divides into a contents area (in orange) and one or more link areas (in green). The contents area represents all nonlink fields, and each link area represents a link field. A's single link area and C's link areas incorporate an arrow to signify a reference to some other node of the same type (or a subtype). B's single link area incorporates an X to signify a null reference. In other words, B doesn't connect to any other node.

Although you can create many kinds of linked lists, three popular linked list variants are the single, double, and circular linked lists. Let's explore those variants, beginning with the singly linked list.

Singly linked lists

A singly linked list is a linked list of nodes, where each node has a single link field. A reference variable holds a reference to the first node, each node (except the last) links to the next node, and the last node's link field contains null to signify the list's end. Although the reference variable is commonly named top, you can choose any name you want. Figure 2 presents a three-node singly linked list, where top references the A node, A connects to B, B connects to C, and C is the final node.

Figure 2. A three-node singly linked list contains connected nodes A, B, and C

One common singly linked list algorithm is node-insertion. That algorithm is somewhat involved because it must deal with four cases: when the node must be inserted before the first node; when the node must be inserted after the last node; when the node must be inserted between two nodes; and when the singly linked list does not exist. Before studying each case, consider the following pseudocode:

DECLARE CLASS Node
  DECLARE STRING name
  DECLARE Node next
END DECLARE
DECLARE Node top = NULL

The pseudocode above declares a Node self-referential class with a name nonlink field and a next link field. The pseudocode also declares a top reference variable (of type Node) that holds a reference to the first Node in a singly linked list. Because the list does not yet exist, top's initial value is NULL. Each of the following four cases assumes the Node and top declarations:

1 2 Page 1