Java programming fundamentals: Datastructures and algorithms in Java

Java 101: Datastructures and algorithms in Java, Part 2

Searching and sorting one-dimensional arrays

Java 101: Datastructures and algorithms, Part 2
Mikael Hvidtfeldt Christensen (CC BY-SA 2.0)

An array is a fundamental datastructure category, and a building block for more complex datastructures. In this second part of my Java 101 introduction to datastructures and algorithms, you will learn how arrays are understood and used in Java programming. I introduce the concept of an array and how arrays are represented in the Java language. Then you'll learn about one-dimensional arrays and the three ways that you can introduce them to your Java programs. Finally, we'll explore five algorithms used to search and sort one-dimensional arrays.

Note that this article builds on Datastructures and algorithms, Part 1, which introduced the theoretical side of datastructures and the algorithms associated with them. That article includes an in-depth discussion of algorithms and how to use space and time complexity factors to evaluate and select the most efficient algorithm for your Java program. This article will be much more hands-on, and assumes you have already read and digested Part 1.

What is an array?

An array is a sequence of elements where each element is associated with at least one index. An element is a group of memory locations that store a single data item. An index is a nonnegative integer, which in this case is used to uniquely identify an element. This relationship is similar to how a box number uniquely identifies a house on a given street.

The number of indexes associated with any element is the array's dimension. In this article, we'll be talking about one-dimensional arrays. The next article in this series introduces multi-dimensional arrays.

Java supports arrays. Each element occupies the same number of bytes, and the exact number depends on the type of the element's data item. Furthermore, all elements share the same type.

One-dimensional arrays

The simplest kind of array has one dimension. A one-dimensional array associates each element with one index. One-dimensional arrays are used to store lists of data items. There are three techniques for creating one-dimensional arrays in Java:

  • Use only an initializer
  • Use only keyword new
  • Use keyword new with an initializer

Creating a one-dimensional array with only an initializer

Here is the syntax to create a one-dimensional array using just an initializer:


'{' [expr (',' expr)*] '}'

This syntax states that a one-dimensional array is an optional, comma-separated list of expressions appearing between open and close brace characters. Furthermore, all expressions must evaluate to compatible types. For example, in a two-element one-dimensional array of doubles, both elements might be of type double, or one element might be a double while the other element is a float or an integer type (such as int).

Example:


{ 'J', 'a', 'v', 'a' }

Creating a one-dimensional array with the keyword new

The keyword new allocates memory for an array and returns its reference. Here's the syntax for this approach:


'new' type '[' int_expr ']'

This syntax states that a one-dimensional array is a region of (positive) int_expr elements that share the same type. Furthermore, all elements are zeroed, and are interpreted as 0, 0L, 0.0F, 0.0, false, null, or '\u0000'.

Example:


new char[4]

Creating a one-dimensional array with the keyword new and an initializer

Here is the syntax to create a one-dimensional array using the keyword new with an initializer. As you see, it blends the syntax from the previous two approaches:


'new' type '[' ']' '{' [expr (',' expr)*] '}'

In this case, because the number of elements can be determined from the comma-separated list of expressions, it isn't necessary (or allowed) to provide an int_expr between the square brackets.

Example:


new char[] { 'J', 'a', 'v', 'a' }

Something to note is that the syntax for creating an array with only an initializer is no different in effect from the syntax using an initializer and a keyword. The initializer-only syntax is an example of syntactic sugar, which means syntax that makes the language sweeter, or easier, to use.

Array variables

By itself, a newly-created one-dimensional array is useless. Its reference must be assigned to an array variable of a compatible type, either directly or via a method call. The following two lines of syntax show how you would declare this variable:


type var_name '[' ']'
type '[' ']' var_name

Each syntax declares an array variable that stores a reference to a one-dimensional array. Although you can use either syntax, placing the square brackets after type is preferred.

Examples:


char[] name1 = { 'J', 'a', 'v', 'a' };
char[] name2 = new char[4];
char[] name3 = new char[] { 'J', 'a', 'v', 'a' };
output(new char[] { 2, 3 }); // output({ 2, 3 }); results in a compiler error
static void output(char[] name)
{
   // ...
}

In the examples, name1, name2, name3, and name are array variables. The single pair of square brackets states that each stores references to one-dimensional arrays.

Keyword char indicates that each element must store a value of char type. However, you can specify a non-char value if Java can convert it to a char. For example, char[] chars = { 'A', 10 }; is legal because 10 is a small enough positive int (meaning that it fits into the char range of 0 through 65535) to be converted to a char. In contrast, char[] chars = { 'A', 80000 }; would be illegal.

An array variable is associated with a .length property that returns the length of the associated one-dimensional array as a positive int; for example, name1.length returns 4.

Given an array variable, you can access any element in a one-dimensional array by specifying an expression that agrees with the following syntax:


array_var '[' index ']'

Here, index is a positive int that ranges from 0 (Java arrays are zero-based) to one less than the value returned from the .length property.

Examples:


char ch = names[0]; // Get value.
names[1] = 'A';     // Set value.

If you specify a negative index or an index that is greater than or equal to the value returned by the array variable's .length property, Java creates and throws a java.lang.ArrayIndexOutOfBoundsException object.

Algorithms for searching and sorting

It is a very common task to search one-dimensional arrays for specific data items, and there are a variety of algorithms for doing it. One of the most popular search algorithms is called Linear Search. Another option is Binary Search, which is usually more performant but also more demanding: in order to use Binary Search, the array's data items must first be sorted, or ordered. Although not very performant, Bubble Sort, Selection Sort, and Insertion Sort are all simple algorithms for sorting a one-dimensional array. Each works well enough for shorter arrays.

The next sections introduce each of these algorithms for searching and sorting one-dimensional arrays.

Linear Search

Linear Search searches a one-dimensional array of n data items for a specific one. It functions by comparing data items from the lowest index to the highest until it finds the specified data item, or until there are no more data items to compare.

The following pseudocode expresses Linear Search used for a one-dimensional array of integers:


DECLARE INTEGER i, srch = ...
DECLARE INTEGER x[] = [ ... ]
FOR i = 0 TO LENGTH(x) - 1
   IF x[i] EQ srch THEN
      PRINT "Found ", srch
      END
   END IF
NEXT i
PRINT "Not found", srch
END

Consider a one-dimensional unordered array of five integers [ 1, 4, 3, 2, 6 ], where integer 1 is located at index 0 and integer 6 is located at index 4. The pseudocode performs the following tasks to find integer 3 in this array:

  1. Compare the integer at index 0 (1) with 3.
  2. Because there's no match, compare the integer at index 1 (4) with 3.
  3. Because there's still no match, compare the integer at index 2 (3) with 3.
  4. Because there's a match, print Found 3 and exit.

Linear Search has a time complexity of O(n), which is pronounced "Big Oh of n." For n data items, this algorithm requires a maximum of n comparisons. On average, it performs n/2 comparisons. Linear Search offers linear performance.

Explore Linear Search

To let you experiment with Linear Search, I've created the LinearSearch Java application in Listing 1.

Listing 1. LinearSearch.java


{
   public static void main(String[] args)
   {
      // Validate command line arguments count.

      if (args.length != 2)
      {
         System.err.println("usage: java LinearSearch integers integer");
         return;
      }

      // Read integers from first command-line argument. Return if integers 
      // could not be read.

      int[] ints = readIntegers(args[0]);
      if (ints == null)
         return;

      // Read search integer; NumberFormatException is thrown if the integer
      // isn't valid.

      int srchint = Integer.parseInt(args[1]);

      // Perform the search and output the result.

      System.out.println(srchint + (search(ints, srchint) ? " found"
                                                          : " not found"));
   }

   private static int[] readIntegers(String s)
   {
      String[] tokens = s.split(",");
      int[] integers = new int[tokens.length];
      for (int i = 0; i < tokens.length; i++)
         integers[i] = Integer.parseInt(tokens[i]);
      return integers;
   }

   private static boolean search(int[] x, int srchint)
   {
      for (int i = 0; i < x.length; i++)
         if (srchint == x[i])
            return true;

      return false;
   }
}

The LinearSearch application reads a comma-separated list of integers from its first command-line argument. It searches the array for the integer identified by the second command-line argument, and outputs a found/not found message.

To experiment with this application, start by compiling Listing 1:


javac LinearSearch.java

Next, run the resulting application as follows:


java LinearSearch "4,5,8" 5

You should observe the following output:


5 found

Run the resulting application a second time, as follows:


java LinearSearch "4,5,8" 15

You should observe the following output:


15 not found

Binary Search

The Binary Search algorithm searches an ordered one-dimensional array of n data items for a specific data item. This algorithm consists of the following steps:

  1. Set low and high index variables to the indexes of the array's first and last data items, respectively.
  2. Terminate if the low index is greater than the high index. The searched-for data item is not in the array.
  3. Calculate the middle index by summing the low and high indexes and dividing the sum by 2.
  4. Compare the searched-for data item with the middle-indexed data item. Terminate if they are the same. The searched-for data item has been found.
  5. If the searched-for data item is greater than the middle-indexed data item, set the low index to the middle index plus one and transfer execution to Step 2. Binary Search repeats the search in the upper half of the array.
  6. The searched-for data item must be smaller than the middle-indexed data item, so set the high index to the middle index minus one and transfer execution to Step 2. Binary Search repeats the search in the lower half of the array.

Here is pseudocode representing the Binary Search algorithm for a one-dimensional array of integers:


DECLARE INTEGER x[] = [ ... ]
DECLARE INTEGER loIndex = 0
DECLARE INTEGER hiIndex = LENGTH(x) - 1
DECLARE INTEGER midIndex, srch = ...
WHILE loIndex LE hiIndex
   midIndex = (loIndex + hiIndex) / 2
   IF srch GT x[midIndex] THEN
      loIndex = midIndex + 1
   ELSE
   IF srch LT x[midIndex] THEN
      hiIndex = midIndex - 1
   ELSE
      EXIT WHILE
   END IF
END WHILE
IF loIndex GT hiIndex THEN
   PRINT srch, " not found"
ELSE
   PRINT srch, " found"
END IF
END

Binary Search isn't hard to understand. For example, consider a one-dimensional ordered array of six integers [ 3, 4, 5, 6, 7, 8 ], where integer 3 is located at index 0 and integer 8 is located at index 5. The pseudocode does the following to find integer 6 in this array:

  1. Obtain the low index (0) and high index (5).
  2. Calculate the middle index: (0+5)/2 = 2.
  3. Because the integer at index 2 (5) is less than 6, set the low index to 2+1 = 3.
  4. Calculate the middle index: (3+5)/2 = 4.
  5. Because the integer at index 4 (7) is greater than 6, set the high index to 4-1 = 3.
  6. Calculate the middle index: (3+3)/2 = 3.
  7. Because the integer at index 3 (6) equals 6, print 3 found and exit.
1 2 3 Page 1
Page 1 of 3