Java 101: Datastructures and algorithms, Part 1

Explore datastructures, algorithms, flowcharts, pseudocode, and arrays

Computer science emphasizes two important topics: datastructures and algorithms. Those topics are important because the choices you make for a program's datastructures and algorithms affect that program's memory usage (for datastructures) and CPU time (for algorithms that interact with those datastructures). When choosing a datastructure or algorithm, you sometimes discover an inverse relationship between memory usage and CPU time: the less memory a datastructure uses, the more CPU time associated algorithms need to process the datastructure's data items, which are primitive type values or objects, via references. Also, the more memory a datastructure uses, the less CPU time associated algorithms need to process the data items—and faster algorithms result. This inverse relationship appears in Figure 1.

Figure 1. The inverse relationship between memory usage and CPU time

An example of the inverse relationship between memory usage and CPU time involves the one-dimensional array and doubly-linked list datastructures, and their insertion/deletion algorithms. For any given list of data items, a one-dimensional array occupies less memory than a doubly linked list: a doubly linked list needs to associate links with data items to find each data item's predecessor and successor, which requires extra memory. In contrast, a one-dimensional array's insertion/deletion algorithms are slower than a doubly linked list's equivalent algorithms: inserting a data item into or deleting a data item from a one-dimensional array requires data item movement to expose an empty element for insertion or close an element made empty by deletion. (I explore one-dimensional arrays later in this article, and doubly linked lists in next month's article.)

This article initiates a two-part series that explores datastructures and algorithms. The article begins with a presentation of basic concepts and continues with a tour of the array datastructure. You learn about one-dimensional, two-dimensional, and ragged arrays, plus linear-search, bubble-sort, binary-search, and matrix-multiplication array-oriented algorithms. The article ends by asserting that Java's arrays are objects.

Note: You can download the source code that accompanies this article from Resources.

Datastructure and algorithm basics

Before we explore specific datastructures and algorithms, we need to examine three basic questions: What is a datastructure? What is an algorithm? How do you represent an algorithm? Knowledge of those concepts helps you understand this series.

What is a datastructure?

Datastructures have been around since the structured programming era. A definition from that era: a datastructure is a set of types, a designated type from that type set, a set of functions, and a set of axioms. That definition implies that a datastructure is a type with implementation. In our object-oriented programming era, type with implementation means class.

The datastructure is a class definition is too broad because it embraces Employee, Vehicle, Account, and many other real-world entity-specific classes as datastructures. Although those classes structure various data items, they do so to describe real-world entities (in the form of objects) instead of describing container objects for other entity (and possibly container) objects. This containment idea leads to a more appropriate datastructure definition: a datastructure is a container class that provides storage for data items, and capabilities for storing and retrieving data items. Examples of container datastructures: arrays, linked lists, stacks, and queues. (I will explore the linked-list, stack, and queue datastructures next month.)

What is an algorithm?

Algorithms often associate with datastructures. An algorithm is a sequence of instructions that accomplishes a task in a finite time period. The algorithm receives zero or more inputs, produces at least one output, consists of clear and unambiguous instructions, terminates after a finite number of steps, and is basic enough that a person can carry out the algorithm using a pencil and paper. In contrast, a program is not necessarily finite: the program, such as a Web server, may never terminate without external intervention. Examples of algorithms associated with datastructures: linear-search, bubble-sort, binary-search, matrix-multiplication, linked-list-concatenation, stack-push-and-pop, and queue-insert-and-delete. (I will explore the linked-list-concatenation, stack-push-and-pop, and queue-insert-and-delete algorithms next month.)

Algorithm representation

How do you represent an algorithm? The most obvious representation: Java source code. However, writing source code before you fully understand an algorithm often leads to difficult-to-find bugs. One technique for overcoming those bugs involves flowcharts.

A flowchart is a visual representation of an algorithm's control flow. That representation illustrates statements that need to execute, decisions that need to be made, logic flow (for iteration and other purposes), and terminals that indicate start and end points. To present that visual representation, a flowchart uses various symbols, which Figure 2 shows.

Figure 2. Flowchart symbols for statements, decisions, logic flow, and terminals

What does a flowchart look like? Suppose you have a simple algorithm that initializes a counter to 0, reads characters until a new-line (\n) character is seen, increments the counter for each read digit character, and prints the counter's value after the new-line character has been read. Figure 3's flowchart illustrates this algorithm's control flow.

Figure 3. A flowchart for counting digit characters. Click on thumbnail to view full-size image.

A flowchart's advantages include its simplicity and its ability to present an algorithm's control flow visually. Flowcharts also have disadvantages:

  • Highly-detailed flowcharts can introduce errors or inaccuracies.
  • Extra time is required to position, label, and connect a flowchart's symbols, even though tools speed up this process. This delay might slow your understanding of an algorithm.
  • Because flowcharts are tools of the structured programming era, they aren't as useful in an object-oriented context. The Unified Modeling Language (UML) is more appropriate for providing object-oriented visual representations.

An alternative to flowcharts is pseudocode: a textual representation of an algorithm that approximates the final source code. Pseudocode is useful for quickly writing down an algorithm's representation. Because syntax is not a concern, no rules define how to write pseudocode. Consider the following example:

  READ ch
  IF ch IS '0' THROUGH '9' THEN
UNTIL ch IS '\n'
PRINT count

The example above presents the pseudocode equivalent of Figure 3's flowchart. Although locating control flow in pseudocode can prove harder than finding it in a flowchart, typically, writing pseudocode takes less time than drawing a flowchart.

This series uses pseudocode to represent algorithms.

The array datastructure

The array is one of the most widely used datastructures because of its flexibility in deriving complex datastructures and its simplicity. Because arrays are so prevalent and because I needed to use the one-dimensional array variant in several of this column's earlier programs, I briefly introduced the array datastructure in "Non-Object-Oriented Language Basics, Part 1." This section builds on that earlier article's introduction and begins with a definition: an array is a sequence of elements, where each element (a group of memory bytes that stores a single data item) associates with at least one index (nonnegative integer). That definition raises four interesting points:

  • Each element occupies the same number of bytes; the exact number depends on the type of the element's data item.
  • All elements share the same type.
  • We tend to think of an array's elements as occupying consecutive memory locations. When I discuss two-dimensional arrays, you'll discover that isn't always the case.
  • The number of indexes associated with any element is the array's dimension.
This section focuses exclusively on arrays of one and two dimensions because higher-dimensional arrays are not used as frequently.

One-dimensional arrays

The simplest kind of array has one dimension: each element associates with a single index. Java provides three techniques for creating a one-dimensional array: use only an initializer, use only keyword new, and use keyword new with an initializer. Because I showed you how to create a one-dimensional array with only an initializer in "Non-Object-Oriented Language Basics, Part 1," I focus on the latter two techniques in this section.

Use only keyword new to create a one-dimensional array. That technique requires either of the following syntaxes:

type variable_name '[' ']' '=' 
    'new' type '[' integer_expression ']' ';'
type '[' ']' variable_name '=' 
    'new' type '[' integer_expression ']' ';'

Either syntax:

  • Specifies variable_name as the name of the one-dimensional array variable.
  • Specifies type as each element's type. Because the one-dimensional array variable holds a reference to the one-dimensional array, the variable's type is type [ ].
  • Specifies keyword new, followed by type, followed by integer_expression between square brackets ([ ]), which identifies the number of elements. new allocates memory for a one-dimensional array's elements and zeroes all bits in each element's bytes, which means that each element contains a default value that you interpret based on type.
  • Specifies = to assign the one-dimensional array's reference to variable_name.
Java developers often place square bracket characters after type (int [] test_scores) rather than after variable_name (int test_scores []) when declaring an array variable. Keeping all type information in one place improves the source code's readability.

The following code fragment use only keyword new to create a one-dimensional array that stores data items based on a primitive type:

int [] test_scores = new int [4];

int [] test_scores declares a one-dimensional array variable (test_scores) along with that variable's type (int []). The int [] reference type signifies that each element must contain a data item of primitive type integer. new int [4] creates a one-dimensional array by allocating memory for four consecutive integer elements. Each element holds a single integer and initializes to 0. Operator equal-to (=) assigns the one-dimensional array's reference to test_scores. Figure 4 illustrates the resulting one-dimensional array variable and elements.

Figure 4. The test_scores one-dimensional array reference variable contains a reference to a four-element one-dimensional array of integers. Click on thumbnail to view full-size image.
When creating a one-dimensional array based on a primitive type, the compiler requires the same primitive type keyword to appear on both sides of the equal-to operator. Otherwise, the compiler reports an error. For example, int [] test_scores = new long [20]; is illegal because keywords int and long represent incompatible primitive types.

Primitive type one-dimensional arrays store data items that are primitive values. In contrast, reference type one-dimensional arrays store data items that reference objects. The following code fragment uses only keyword new to create a pair of one-dimensional arrays that each store data items based on a reference type:

Clock [] c1 = new Clock [3];
Clock [] c2 = new AlarmClock [3];

Clock [] c1 = new Clock [3]; declares a one-dimensional array variable, (c1) of type Clock [], allocates memory for a one-dimensional Clock array consisting of three consecutive elements, and assigns the Clock array's reference to c1. Each element must contain a reference to either a Clock object (assuming Clock is a concrete class) or an object created from a concrete Clock subclass, and initializes to null. Clock [] c2 = new AlarmClock [3]; resembles the previous declaration, except that a one-dimensional AlarmClock array is created, and its reference assigns to Clock [] variable c2. (Assume AlarmClock subclasses Clock.)

Use keyword new with an array initializer to create a one-dimensional array. That technique requires either of the following syntaxes:

type variable_name '[' ']' '=' 
    'new' type '[' ']' initializer ';'
type '[' ']' variable_name '=' 
    'new' type '[' ']' initializer ';'

where initializer has the following syntax:

'{' [ initial_value [ ',' ... ] ] '}'

Either syntax:

  • Specifies variable_name as the name of the one-dimensional array variable.
  • Specifies type as the type of each element. Because the one-dimensional array variable holds a reference to the one-dimensional array, the variable's type is type [ ].
  • Specifies keyword new, followed by type, followed by empty square brackets, followed by initializer. You do not specify the number of elements between the square brackets because the compiler determines that count from the number of initializer's entries. new allocates memory for a one-dimensional array and assigns each of initializer's entries to an element in a left-to-right fashion.
  • Specifies = to assign the one-dimensional array's reference to variable_name.
A one-dimensional array (or an array of higher dimension) created by keyword new with an array intializer is sometimes referred to as an anonymous array.
1 2 3 4 Page 1