May 26, 2015 12:10 PM PT
Java 101: Foundations

Java 101: Elementary Java language features

Using comments, identifiers, types, literals, and variables in your Java programs

Java is an object-oriented programming language, but there's more to Java than programming with objects. This article begins a three-part miniseries that introduces some of the non-object-oriented features and syntax that are fundamental to the Java language. Find out why Unicode has replaced ASCII as the universal encoding standard for Java, then learn how to use comments, identifiers, types, literals, and variables in your Java programs.

Note that examples in this article were written using Java 8.

download
Source code for "Java 101: Elementary Java language features." Created by Jeff Friesen for JavaWorld.

Unicode and character encoding

When you save a program's source code (typically in a text file), the characters are encoded for storage. Historically, ASCII (the American Standard Code for Information Interchange) was used to encode these characters. Because ASCII is limited to the English language, Unicode was developed as a replacement.

Unicode is a computing industry standard for consistently encoding, representing, and handling text that's expressed in most of the world's writing systems. Unicode uses a character encoding to encode characters for storage. Two commonly used encodings are UTF-8 and UTF-16. You'll learn later in this article how Java's support for Unicode can impact your source code and compilation.
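
To make the difference between these two encodings concrete, here is a minimal sketch that counts the bytes needed to store a character in each one. The EncodingSizes class and its method names are my own inventions for illustration; the only library calls used are String.getBytes() and the StandardCharsets constants.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class (not part of the article's download).
class EncodingSizes
{
   // Number of bytes needed to store the text in UTF-8.
   static int utf8Length(String text)
   {
      return text.getBytes(StandardCharsets.UTF_8).length;
   }

   // Number of bytes needed to store the text in UTF-16
   // (big-endian order, without a byte-order mark).
   static int utf16Length(String text)
   {
      return text.getBytes(StandardCharsets.UTF_16BE).length;
   }

   public static void main(String[] args)
   {
      String latin = "A";
      String greek = "\u03c0"; // Greek small letter pi, written as an escape
      System.out.println(utf8Length(latin) + " " + utf16Length(latin)); // 1 2
      System.out.println(utf8Length(greek) + " " + utf16Length(greek)); // 2 2
   }
}
```

UTF-8 stores ASCII characters in a single byte and uses more bytes for other characters, whereas UTF-16 uses at least two bytes for every character.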

Comments: Three ways to document your Java code

Suppose you are working in the IT department for a large company. Your boss instructs you to write a program consisting of a few thousand lines of source code. After a few weeks, you finish the program and deploy it. A few months later, users begin to notice that the program occasionally crashes. They complain to your boss and he orders you to fix it. After searching your projects archive, you encounter a folder of text files that list the program's source code. Unfortunately, you find that the source code makes little sense. You've worked on other projects since creating this one, and you can't remember why you wrote the code the way that you did. It could take you hours or even days to decipher your code, but your boss wanted a solution yesterday. Talk about major stress! What do you do?

You can avoid this stress by documenting the source code with meaningful descriptions. Though frequently overlooked, documenting source code while writing a program's logic is one of a developer's most important tasks. As my example illustrates, given some time away from the code, even the original programmer might not understand the reasoning behind certain decisions.

In Java, you can use the comment feature to embed documentation in your source code. A comment is a delimited block of text that's meaningful to humans but not to the compiler. When you compile the source code, the Java compiler ignores all comments; it doesn't generate bytecodes for them. Java supports single-line, multiline, and Javadoc comments. Let's look at examples for each of these.

Single-line comments

A single-line comment spans a single line. It begins with // and continues to the end of the current line. The compiler ignores all characters from // through the end of that line. The following example presents a single-line comment:

System.out.println((98.6 - 32) * 5 / 9);
// Output Celsius equivalent of 98.6 degrees Fahrenheit.

A single-line comment is useful for specifying a short meaningful description of the intent behind a given line of code.

Multiline comments

A multiline comment spans multiple lines. It begins with /* and ends with */. All characters from /* through */ are ignored by the compiler. The following example presents a multiline comment:

/*
   An amount of $2,200.00 is deposited in a bank paying an annual
   interest rate of 2%, which is compounded quarterly. What is
   the balance after 10 years?
   Compound Interest Formula:
   A = P(1 + r/n)^(nt)
   A = amount of money accumulated after t years, including interest
   P = principal amount (the initial amount you deposit)
   r = annual rate of interest (expressed as a decimal fraction)
   n = number of times the interest is compounded per year
   t = number of years for which the principal has been deposited
*/
double principal = 2200;
double rate = 2 / 100.0;
double t = 10;
double n = 4;
System.out.println(principal * Math.pow(1 + rate / n, n * t));

As you can see, a multiline comment is useful for documenting multiple lines of code. Alternatively, you could use multiple single-line comments for this purpose, as I've done below:

// Create a ColorVSTextComponent object that represents a component
// capable of displaying lines of text in different colors and which
// provides a vertical scrolling capability. The width and height of
// the displayed component are set to 180 pixels and 100 pixels,
// respectively.
ColorVSTextComponent cvstc = new ColorVSTextComponent(180, 100);

Another use for multiline comments is in commenting out blocks of code that you don't want compiled, but still want to keep because you might need them in the future. The following source code demonstrates this scenario:

/*
      if (!version.startsWith("1.3") && !version.startsWith("1.4"))
      {
         System.out.println("JRE " + version + " not supported.");
         return;
      }
*/

Don't nest multiline comments because the compiler will report an error. For example, the compiler outputs an error message when it encounters /* This /* nested multiline comment (on a single line) */ is illegal */.

Javadoc comments

A Javadoc comment is a special multiline comment. It begins with /** and ends with */. All characters from /** through */ are ignored by the compiler. The following example presents a Javadoc comment:

/**
 *  Application entry point
 *
 *  @param args array of command-line arguments passed to this method
 */
public static void main(String[] args)
{
   // TODO code application logic here
}

This example's Javadoc comment describes the main() method. Sandwiched between /** and */ is a description of the method and the @param Javadoc tag (an @-prefixed instruction to the javadoc tool).

Consider these commonly used Javadoc tags:

  • @author identifies the source code's author.
  • @deprecated identifies a source code entity (e.g., method) that should no longer be used.
  • @param identifies one of a method's parameters.
  • @see provides a see-also reference.
  • @since identifies the software release where the entity first originated.
  • @return identifies the kind of value that the method returns.
  • @throws documents an exception thrown from a method.
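
As a rough illustration of how several of these tags might be combined, the following sketch documents a small conversion method. The TemperatureConverter class, its toCelsius() method, and the 1.0 release number are hypothetical names chosen for this example.

```java
// Hypothetical class used only to illustrate Javadoc tags.
class TemperatureConverter
{
   /**
    * Converts a temperature from degrees Fahrenheit to degrees Celsius.
    *
    * @param fahrenheit temperature in degrees Fahrenheit
    * @return the equivalent temperature in degrees Celsius
    * @throws IllegalArgumentException if the argument is not a number (NaN)
    * @since 1.0
    */
   static double toCelsius(double fahrenheit)
   {
      if (Double.isNaN(fahrenheit))
         throw new IllegalArgumentException("temperature must be a number");
      return (fahrenheit - 32) * 5 / 9;
   }

   public static void main(String[] args)
   {
      System.out.println(toCelsius(212)); // 100.0
   }
}
```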

Although ignored by the compiler, Javadoc comments are processed by javadoc, which compiles them into HTML-based documentation. For example, the following command generates documentation for a hypothetical Checkers class:

javadoc Checkers

The generated documentation includes an index file (index.html) that describes the documentation's start page. For example, Figure 1 shows the start page from the Java SE 8 update 45 runtime library API documentation.

Figure 1. Java SE 8u45 runtime library API documentation was generated by javadoc.

Identifiers: Naming classes, methods, and more in your Java code

Various source code entities such as classes and methods must be named so that they can be referenced in code. Java provides the identifiers feature for this purpose, where an identifier is nothing more than a name for a source code entity.

An identifier consists of letters (A-Z, a-z, or equivalent uppercase/lowercase letters in other human alphabets), digits (0-9 or equivalent digits in other human alphabets), connecting punctuation characters (such as the underscore), and currency symbols (such as the dollar sign). This name must begin with a letter, a currency symbol, or a connecting punctuation character. Furthermore, it cannot wrap from one line to the next.

Below are some examples of valid identifiers:

  • i
  • count2
  • loanAmount$
  • last_name
  • $balance
  • π (Greek letter Pi -- 3.14159)

Many character sequences are not valid identifiers. Consider the following examples:

  • 5points, because it starts with a digit
  • your@email_address, because it contains an @ symbol
  • last name, because it includes a space
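
If you'd like to test candidate names yourself, the following sketch applies the rules above using the Character class's isJavaIdentifierStart() and isJavaIdentifierPart() methods. The IdentifierCheck class and its isValidIdentifier() method are hypothetical, and the check deliberately ignores reserved words, which are discussed next.

```java
// Hypothetical helper; it checks characters only, not reserved words.
class IdentifierCheck
{
   static boolean isValidIdentifier(String name)
   {
      if (name.isEmpty())
         return false;
      // The first character must be a letter, currency symbol, or
      // connecting punctuation character.
      int first = name.codePointAt(0);
      if (!Character.isJavaIdentifierStart(first))
         return false;
      // Each remaining character may additionally be a digit.
      for (int i = Character.charCount(first); i < name.length(); )
      {
         int cp = name.codePointAt(i);
         if (!Character.isJavaIdentifierPart(cp))
            return false;
         i += Character.charCount(cp);
      }
      return true;
   }

   public static void main(String[] args)
   {
      System.out.println(isValidIdentifier("count2"));    // true
      System.out.println(isValidIdentifier("$balance"));  // true
      System.out.println(isValidIdentifier("\u03c0"));    // true
      System.out.println(isValidIdentifier("5points"));   // false
      System.out.println(isValidIdentifier("last name")); // false
   }
}
```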

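Almost any valid identifier can be chosen to name a class, method, or other source code entity. However, Java reserves some identifiers for special purposes; they are known as reserved words. Java reserves the following identifiers:

abstract, assert, boolean, break, byte, case, catch, char, class, const, continue, default, do, double, else, enum, extends, false, final, finally, float, for, goto, if, implements, import, instanceof, int, interface, long, native, new, null, package, private, protected, public, return, short, static, strictfp, super, switch, synchronized, this, throw, throws, transient, true, try, void, volatile, and while.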

The compiler outputs an error message when it detects any of these reserved words being used outside of its usage contexts; for example, as the name of a class or method. Java also reserves but doesn't use const and goto.

Types: Classifying values in your Java code

Java applications process characters, integers, floating-point numbers, strings, and other kinds of values. All values of the same kind share certain characteristics. For example, integers don't have fractions and strings are sequences of characters with the concept of length.

Java provides the type feature for classifying values. A type is a set of values, their representation in memory, and a set of operations for manipulating these values, often transforming them into other values. For example, the integer type describes a set of numbers without fractional parts, a two's-complement representation (I'll explain two's-complement shortly), and operations such as addition and subtraction that produce new integers.

Java supports primitive types, reference types, and array types.

Primitive types

A primitive type is a type that's defined by the language and whose values are not objects. Java supports a handful of primitive types:

  • Boolean
  • Character
  • Byte integer
  • Short integer
  • Integer
  • Long integer
  • Floating-point
  • Double precision floating-point

We'll consider each of these before moving on to reference and array types.

Boolean

The Boolean type describes true/false values. The JVM specification indicates that Boolean values stored in an array (discussed later) are represented as 8-bit (binary digit) integer values in memory. Furthermore, when they appear in expressions, these values are represented as 32-bit integers. Java supplies AND, OR, and NOT operations for manipulating Boolean values. Also, its boolean reserved word identifies the Boolean type in source code.

Note that the JVM offers very little support for Boolean values. The Java compiler transforms them into 32-bit values with 1 representing true and 0 representing false.

Character

The character type describes character values (for instance, the uppercase letter A, the digit 7, and the asterisk [*] symbol) in terms of their assigned Unicode numbers. (As an example, 65 is the Unicode number for the uppercase letter A.) Character values are represented in memory as 16-bit unsigned integer values. Operations performed on characters include classification, for instance classifying whether a given character is a digit.

Extending the Unicode standard beyond its original 16-bit design (to accommodate more writing systems, such as Egyptian hieroglyphs) somewhat complicated the character type. It now describes Basic Multilingual Plane (BMP) code points, including the surrogate code points, or code units of the UTF-16 encoding. If you want to learn about BMP, code points, and code units, study the Character class's Java API documentation. For the most part, however, you can simply think of the character type as accommodating character values.
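
The following sketch shows these ideas in code: a character's Unicode number, a classification operation, and a character outside the BMP that needs two 16-bit values. The CharacterDemo class and its methods are hypothetical names of mine; Character.isDigit() and Character.toChars() are standard library methods.

```java
// Hypothetical demo class for the character type.
class CharacterDemo
{
   // Unicode number of a character value, widened to an integer.
   static int unicodeNumber(char c)
   {
      return c;
   }

   // Number of 16-bit character values needed to store one code point.
   static int charsNeeded(int codePoint)
   {
      return Character.toChars(codePoint).length;
   }

   public static void main(String[] args)
   {
      System.out.println(unicodeNumber('A'));     // 65
      System.out.println(Character.isDigit('7')); // true
      System.out.println(Character.isDigit('*')); // false
      System.out.println(charsNeeded('A'));       // 1
      System.out.println(charsNeeded(0x13000));   // 2 (an Egyptian hieroglyph)
   }
}
```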

Integer types

Java supports four integer types for space and precision reasons: byte integer, short integer, integer, and long integer. Arrays based on shorter integers don't consume as much space. Calculations involving longer integers give you greater precision. Unlike the unsigned character type, the integer types are signed.

Byte integer

The byte integer type describes integers that are represented in 8 bits; it can accommodate integer values ranging from -128 through 127. As with the other integer types, byte integers are stored as two's-complement values. In two's-complement representation, a negative number is formed from its positive counterpart by flipping all of the bits, from one to zero and from zero to one, and then adding the number one to the result. The leftmost bit is referred to as the sign bit (0 for non-negative values and 1 for negative values) and all other bits refer to the number's magnitude. This representation is illustrated in Figure 2.

Figure 2. The internal representation of positive and negative 8-bit integers consists of sign and magnitude.

Byte integers are most useful for storing small values in an array. The compiler generates bytecode to convert a byte integer value to an integer value before performing a mathematical operation such as addition. Java's byte reserved word identifies the byte integer type in source code.
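
You can observe two's-complement for yourself with a few lines of code. In this sketch, the TwosComplement class and its toBits() and negate() methods are hypothetical helpers; negate() follows the flip-the-bits-and-add-one recipe described above rather than using the unary minus operator.

```java
// Hypothetical demo class for two's-complement byte integers.
class TwosComplement
{
   // Returns the 8-bit pattern of a byte integer as a string of 0s and 1s.
   static String toBits(byte b)
   {
      String bits = Integer.toBinaryString(b & 0xFF);
      StringBuilder sb = new StringBuilder();
      for (int i = bits.length(); i < 8; i++)
         sb.append('0'); // pad with leading zeros
      return sb.append(bits).toString();
   }

   // Negates a value by flipping all bits and adding one.
   static byte negate(byte b)
   {
      // ~b + 1 is computed as an int, so a cast back to byte is required.
      return (byte) (~b + 1);
   }

   public static void main(String[] args)
   {
      byte five = 5;
      System.out.println(toBits(five));         // 00000101
      System.out.println(toBits(negate(five))); // 11111011
      System.out.println(negate(five));         // -5
   }
}
```

The cast in negate() is needed because, as noted above, byte integer values are converted to integer values before arithmetic is performed.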

Short integer

The short integer type describes integers that are represented in 16 bits; it can accommodate integer values ranging from -32,768 to 32,767. It possesses the same internal representation as byte integer, but with more bits to accommodate its larger magnitude. The compiler generates bytecode to convert a short integer value to an integer value before performing a mathematical operation. Java's short reserved word identifies the short integer type in source code.

Integer type

The integer type describes integers that are represented in 32 bits; it can accommodate integer values ranging from -2,147,483,648 to 2,147,483,647. It possesses the same internal representation as byte integer and short integer, but with more bits to accommodate its larger magnitude. Java's int reserved word identifies the integer type in source code.

Long integer

The long integer type describes integers that are represented in 64 bits; it can accommodate integer values ranging from -2^63 to 2^63-1. It possesses the same internal representation as byte integer, short integer, and integer, but with more bits to accommodate its larger magnitude. Java's long reserved word identifies the long integer type in source code.
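
Rather than memorizing these ranges, you can ask Java for them. The sketch below prints the MIN_VALUE and MAX_VALUE constants of the standard Byte, Short, Integer, and Long classes; the IntegerRanges class and its next() method are hypothetical, and next() is included only to show that exceeding a range silently wraps around.

```java
// Hypothetical demo class for integer type ranges.
class IntegerRanges
{
   // Adds one to an integer; the result wraps around on overflow.
   static int next(int value)
   {
      return value + 1;
   }

   public static void main(String[] args)
   {
      System.out.println(Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);       // -128 to 127
      System.out.println(Short.MIN_VALUE + " to " + Short.MAX_VALUE);     // -32768 to 32767
      System.out.println(Integer.MIN_VALUE + " to " + Integer.MAX_VALUE); // -2147483648 to 2147483647
      System.out.println(Long.MIN_VALUE + " to " + Long.MAX_VALUE);       // -9223372036854775808 to 9223372036854775807
      System.out.println(next(Integer.MAX_VALUE) == Integer.MIN_VALUE);   // true
   }
}
```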

Floating-point types

Java supports two floating-point types for space and precision reasons. The smaller type is useful in an array context, but cannot accommodate as large a range of values. Although it occupies more space in an array context, the larger type can accommodate a greater range.

The floating-point type describes floating-point values that are represented in 32 bits; it can accommodate floating-point values ranging from approximately +/-1.18x10^-38 to approximately +/-3.4x10^38. It is represented in IEEE 754 format in which the leftmost bit is the sign bit (0 for positive and 1 for negative), the next eight bits hold the exponent, and the final 23 bits hold the mantissa, resulting in about 6-9 decimal digits of precision. Java's float reserved word identifies the floating-point type in source code.

The double precision floating-point type describes floating-point values that are represented in 64 bits; it can accommodate floating-point values ranging from approximately +/-2.23x10^-308 to approximately +/-1.8x10^308. It is represented in IEEE 754 format in which the leftmost bit is the sign bit (0 for positive and 1 for negative), the next 11 bits hold the exponent, and the final 52 bits hold the mantissa, resulting in about 15-17 decimal digits of precision. Java's double reserved word identifies the double precision floating-point type in source code.
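
To see the sign, exponent, and mantissa fields of a 32-bit floating-point value, you can take it apart with Float.floatToIntBits(). The FloatBits class and its three methods below are hypothetical helpers of mine. Note that the exponent field is stored with a bias of 127, so the value 126 corresponds to an actual exponent of -1.

```java
// Hypothetical demo class that splits a float into its IEEE 754 fields.
class FloatBits
{
   // Leftmost bit: 0 for positive, 1 for negative.
   static int sign(float f)
   {
      return Float.floatToIntBits(f) >>> 31;
   }

   // Next eight bits: the exponent, stored with a bias of 127.
   static int exponent(float f)
   {
      return (Float.floatToIntBits(f) >>> 23) & 0xFF;
   }

   // Final 23 bits: the mantissa.
   static int mantissa(float f)
   {
      return Float.floatToIntBits(f) & 0x7FFFFF;
   }

   public static void main(String[] args)
   {
      float value = -0.75f; // -1.5 x 2^-1
      System.out.println(sign(value));     // 1
      System.out.println(exponent(value)); // 126
      System.out.println(Integer.toHexString(mantissa(value))); // 400000
      System.out.println(Float.MIN_NORMAL); // about 1.18 x 10^-38
      System.out.println(Float.MAX_VALUE);  // about 3.4 x 10^38
      // 16777217 needs more precision than 23 mantissa bits provide.
      System.out.println((float) 16777217 == 16777216f); // true
   }
}
```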

Reference types

A reference type is a type from which objects are created or referenced, where a reference is some kind of pointer to the object. (A reference could be an actual memory address, an index into a table of memory addresses, or something else.) Reference types are also known as user-defined types because they are typically created by language users.

Java developers use the class feature to create reference types. A class is either a placeholder for an application's main() method (see the HelloWorld application in "Learn Java from the ground up" for an example of main()) or various static methods, or it's a template for manufacturing objects, which I demonstrate below:

class Cat
{
   String name; // String is a special reference type for describing strings
   Cat(String catName)
   {
      name = catName;
   }
   String name()
   {
      return name;
   }
}

This class declaration introduces a Cat class for describing felines. Its name field stores the cat's name as a string, its constructor initializes this data member to a cat name, and its name() method returns the cat's name. The following code snippet, which presumably would be located in a main() method, shows how to manufacture a cat and obtain its name:

Cat cat = new Cat("Garfield");
System.out.println(cat.name()); // Output: Garfield

The interface feature lets you reference an object without concern for the object's class type. As long as the object's class implements the interface, the object is also considered to be a member of the interface type. Consider the following example, which declares a Shape interface along with Circle and Rectangle classes:

interface Shape
{
   void draw();
}
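class Circle implements Shape
{
   // Interface methods are implicitly public, so the implementation must be public too.
   public void draw()
   {
      System.out.println("I am a circle.");
   }
}
class Rectangle implements Shape
{
   public void draw()
   {
      System.out.println("I am a rectangle.");
   }
}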

The next code snippet instantiates Circle and Rectangle, assigns their references to Shape variables, and asks them to draw themselves:

Shape shape = new Circle();
shape.draw(); // Output: I am a circle.
shape = new Rectangle();
shape.draw(); // Output: I am a rectangle.

You can use interfaces to abstract commonality from a set of otherwise dissimilar classes. As an example, an Inventory interface would extract commonality from Goldfish, Car, and Hammer classes, because each of these items can be inventoried. Interfaces offer considerable power when combined with arrays and loops, which you'll learn about later in this series.

Array types

Array is the last of our three types. An array type is a special reference type that denotes an array, which is a region of memory that stores values in slots that are of equal size and are (typically) contiguous. These values are commonly referred to as elements. The array type is composed of the element type (a primitive type or a reference type) and one or more pairs of square brackets that indicate the number of dimensions (extents) occupied by the array. A single pair of brackets signifies a one-dimensional array (a vector); two pairs of brackets signify a two-dimensional array (a table); three pairs of brackets signify a one-dimensional array of two-dimensional arrays (a vector of tables); and so on. For example, int[] signifies a one-dimensional array (with int as the element type), and String[][] signifies a two-dimensional array (with String as the element type).
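
As a quick sketch of these array types in action, the following code creates a one-dimensional and a two-dimensional array and asks each one about its extents and type. The ArrayTypes class and its rows() and columns() methods are hypothetical names chosen for the example.

```java
// Hypothetical demo class for array types.
class ArrayTypes
{
   // Number of rows in a two-dimensional array (a table).
   static int rows(String[][] table)
   {
      return table.length;
   }

   // Number of columns in the table's first row.
   static int columns(String[][] table)
   {
      return table[0].length;
   }

   public static void main(String[] args)
   {
      int[] vector = new int[3];           // one pair of brackets
      String[][] table = new String[2][3]; // two pairs of brackets
      System.out.println(vector.length);   // 3
      System.out.println(rows(table));     // 2
      System.out.println(columns(table));  // 3
      System.out.println(vector.getClass().getSimpleName()); // int[]
      System.out.println(table.getClass().getSimpleName());  // String[][]
   }
}
```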

Literals: Specifying values in your Java code

Java provides the literals language feature for embedding values in source code. A literal is a value's character representation. Each primitive type is associated with its own set of literals:

The Boolean primitive type is associated with the literals true or false.

The character primitive type is associated with character literals, which often consist of single characters placed between single quotes, as in capital letter A ('A'). Alternatively, you could specify an escape sequence or a Unicode escape sequence. Consider each option:

  • An escape sequence is a representation for a character that cannot be expressed literally in a character literal or a string literal. An escape sequence begins with a backslash character (\) and is followed by one of \, ', ", b, f, n, r, or t. You must always escape a backslash that's to be expressed literally to inform the compiler that it isn't introducing an escape sequence. You must always escape a single quote expressed literally in a character literal to inform the compiler that the single quote isn't ending the character literal. Similarly, you must always escape a double quote expressed literally in a string literal to inform the compiler that the double quote isn't ending the string literal. The other escape sequences are for characters with no symbolic representation: \b represents a backspace, \f represents a form feed, \n represents a new-line, \r represents a carriage return, and \t represents a horizontal tab. Escape sequences appear between single quotes in a character literal context (e.g., '\n').
  • A Unicode escape sequence is a representation for an arbitrary Unicode character. It consists of a \u prefix immediately followed by four hexadecimal digits. For example, \u0041 represents capital letter A, and \u3043 represents a Hiragana letter. Unicode escape sequences appear between single quotes in a character literal context (e.g., '\u3043').
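
Here is a short sketch that exercises both kinds of escape sequences. The EscapeSequences class and its codeOf() method are hypothetical; codeOf() simply returns a character's Unicode number so that you can see which character each escape sequence denotes.

```java
// Hypothetical demo class for character escape sequences.
class EscapeSequences
{
   // Unicode number of a character value.
   static int codeOf(char c)
   {
      return c;
   }

   public static void main(String[] args)
   {
      System.out.println(codeOf('\n'));    // 10 (new-line)
      System.out.println(codeOf('\t'));    // 9 (horizontal tab)
      System.out.println(codeOf('\\'));    // 92 (backslash)
      System.out.println(codeOf('\''));    // 39 (single quote)
      System.out.println("\"quoted\"");    // "quoted"
      System.out.println('\u0041' == 'A'); // true
   }
}
```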

The integer types are associated with literals consisting of sequences of digits, with optionally embedded underscore characters. By default, an integer literal is assigned the integer (int) type. You must suffix the literal with capital letter L (or lowercase letter l, which might be confused with digit 1) to represent a long integer value. Integer literals can be specified in binary, decimal, hexadecimal, and octal formats:

  • Binary consists of the numbers zero and one and is prefixed with 0b or 0B. Example: 0b01111010.
  • Decimal consists of the numbers zero through nine and has no prefix. Example: 2200.
  • Hexadecimal consists of the numbers zero through nine, lowercase letters a through f, and uppercase letters A through F. This literal is prefixed with 0x or 0X. Example: 0xAF.
  • Octal consists of the numbers zero through seven and is prefixed with 0. Example: 077.

To improve legibility, you can insert underscore characters between digits; for example, 1234_5678_9012_3456L. You cannot specify a leading underscore, as in _1234, because the compiler would assume that an identifier was being specified. You also cannot specify a trailing underscore.
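
The example literals above can be checked with a small program. In this sketch, the IntegerLiterals class and its constant names are hypothetical; the literals themselves are the ones shown in this section.

```java
// Hypothetical demo class for integer literal formats.
class IntegerLiterals
{
   static final int BINARY = 0b01111010;
   static final int DECIMAL = 2200;
   static final int HEXADECIMAL = 0xAF;
   static final int OCTAL = 077;
   static final long GROUPED = 1234_5678_9012_3456L; // long integer literal

   public static void main(String[] args)
   {
      System.out.println(BINARY);      // 122
      System.out.println(DECIMAL);     // 2200
      System.out.println(HEXADECIMAL); // 175
      System.out.println(OCTAL);       // 63
      System.out.println(GROUPED);     // 1234567890123456
   }
}
```

Whatever format you choose for the literal, the stored value is the same integer; output methods such as System.out.println() display it in decimal.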

The floating-point types are associated with literals consisting of a non-fractional part, a decimal point, a fractional part, an optional exponent, and either optional double precision floating-point type letter D or d, or a floating-point type letter F or f. Examples of floating-point literals include 2.7818, 0.8D, -57.2E+31, and 3.14159f. If neither D, d, F, nor f is present, the type defaults to double precision floating-point. If D or d is present, the type is also double precision floating-point. However, if F or f is specified, the type is floating-point.

For floating-point types you can also insert underscore characters between digits; for example, 1.234_567e+56. You cannot specify a leading underscore (e.g., _1.234) because the compiler would assume that an identifier was being specified. You also cannot specify a trailing underscore (e.g., 1.5_), an underscore on either side of the decimal point (e.g., 2_.3 or 2._3), an underscore before or after the e or E character when an exponent is present (e.g., 1.2_e3 or 1.2E_3), or an underscore on either side of any + or - character that follows e or E (e.g., 2.8e_+2 or 3.1E-_5).
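
One way to confirm the type that the compiler assigns to a floating-point literal is to pass the literal to a pair of overloaded methods. The FloatingLiterals class and its typeOf() methods are hypothetical helpers written for this sketch; the compiler selects the overload whose parameter type matches the literal's type.

```java
// Hypothetical demo class for floating-point literal types.
class FloatingLiterals
{
   // Chosen when the argument's type is floating-point.
   static String typeOf(float value)
   {
      return "float";
   }

   // Chosen when the argument's type is double precision floating-point.
   static String typeOf(double value)
   {
      return "double";
   }

   public static void main(String[] args)
   {
      System.out.println(typeOf(2.7818));        // double
      System.out.println(typeOf(0.8D));          // double
      System.out.println(typeOf(3.14159f));      // float
      System.out.println(typeOf(1.234_567e+56)); // double
      System.out.println(1.5e3);                 // 1500.0
   }
}
```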

Variables: Storing values in your Java code

Applications manipulate values that are stored in memory. Java's variables feature symbolically represents memory in source code. A variable is a named memory location that stores a value of some type. For a primitive type, the value is stored directly in the variable. For a variable of reference type, a reference is stored in the variable and the object referred to by the reference is stored elsewhere. Variables that store references are often called reference variables.
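
The practical consequence of this distinction shows up when you copy a variable. In the following sketch, the ValueVersusReference class and its methods are hypothetical; StringBuilder is a standard library class that I've chosen simply because its objects can be modified.

```java
// Hypothetical demo class contrasting primitive and reference variables.
class ValueVersusReference
{
   static int copyPrimitive()
   {
      int a = 1;
      int b = a; // b receives a copy of the value
      b = 2;     // changing b leaves a untouched
      return a;
   }

   static String copyReference()
   {
      StringBuilder first = new StringBuilder("Java");
      StringBuilder second = first; // second receives a copy of the reference
      second.append(" 8");          // the one shared object is modified
      return first.toString();
   }

   public static void main(String[] args)
   {
      System.out.println(copyPrimitive()); // 1
      System.out.println(copyReference()); // Java 8
   }
}
```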

You must declare a variable before it is used. A variable declaration minimally consists of a type name, optionally followed by a sequence of square bracket pairs, followed by a name, optionally followed by a sequence of square bracket pairs, and terminated with a semicolon character (;). Consider the following examples:

int age;             	// Declare integer variable age.
float interest_rate; 	// Declare floating-point variable interest_rate.
String name;         	// Declare String variable name.
Car car;             	// Declare Car variable car.
char[] text;         	// Declare one-dimensional character array variable text.
double[][] temps;    	// Declare two-dimensional double precision floating-point array variable temps.

The above variables need to be initialized before they are used. You can initialize a variable as part of its declaration:

int age = 25;
float interest_rate = 4.0F;
String name = "Java";
Car car = new Car();
char[] text = { 'J', 'a', 'v', 'a' };
double[][] temps = { { 25.0, 96.2, -32.5 }, { 0.0, 212.0, -41.0 }};

Each initialization requires = followed by a literal, an object-creation expression that begins with new, or an array initializer (for array types only). The array initializer consists of a brace-delimited and comma-separated list of literals and (for multi-dimensional arrays) nested array initializers.

Note that the text example creates a one-dimensional array of characters consisting of four elements. The temps example creates a two-row-by-three-column two-dimensional array of double precision floating-point values. The array initializer specifies two row arrays with each row array containing three column values.

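Alternatively, you can initialize a variable after its declaration by assigning to it without repeating the type. Note that a bare array initializer is permitted only in a declaration; in a later assignment, it must be preceded by new and the array type:

age = 25;
interest_rate = 4.0F;
name = "Java";
car = new Car();
text = new char[] { 'J', 'a', 'v', 'a' };
temps = new double[][] { { 25.0, 96.2, -32.5 }, { 0.0, 212.0, -41.0 }};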

Accessing a variable's value

To access a variable's value, specify the variable's name (for primitive types and String), de-reference the object and access a member, or use an array-index notation to identify the element whose value is to be accessed:

System.out.println(age);           	// Output: 25
System.out.println(interest_rate); 	// Output: 4.0
System.out.println(name);          	// Output: Java
System.out.println(cat.name());    	// Output: Garfield
System.out.println(text[0]);       	// Output: J
System.out.println(temps[0][1]);   	// Output: 96.2

In order to de-reference an object you must place a period character between the reference variable (cat) and the member (name()). In this case, the name() method is called and its return value is output.

Array access requires a zero-based integer index to be specified for each dimension. For text, only a single index is needed: 0 identifies the first element in this one-dimensional array. For temps, two indexes are required: 0 identifies the first row and 1 identifies the second column in the first row in this two-dimensional array.

You can declare multiple variables in one declaration by separating each variable from its predecessor with a comma, as demonstrated by the following example:

int a, b[], c;

This example declares three variables named a, b, and c. Each variable shares the same type, which happens to be integer. Unlike a and c, which each store one integer value, b[] denotes a one-dimensional array where each element stores an integer. No array is yet associated with b.

Note that the square brackets must appear after the variable name when the array is declared in the same declaration as the other variables. If you place the square brackets before the variable name, as in int a, []b, c;, the compiler reports an error. If you place the square brackets after the type name, as in int[] a, b, c;, all three variables signify one-dimensional arrays of integers.
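
The following sketch contrasts the two legal bracket placements. The MultipleDeclarations class and its method names are hypothetical, and I've added initializers so that each variable can be used right away.

```java
// Hypothetical demo class for declaring multiple variables at once.
class MultipleDeclarations
{
   static int bracketsAfterName()
   {
      int a = 1, b[] = { 2, 3 }, c = 4; // only b is an array
      return a + b.length + c;
   }

   static int bracketsAfterType()
   {
      int[] a = { 1 }, b = { 2, 3 }, c = { 4, 5, 6 }; // a, b, and c are all arrays
      return a.length + b.length + c.length;
   }

   public static void main(String[] args)
   {
      System.out.println(bracketsAfterName()); // 7
      System.out.println(bracketsAfterType()); // 6
   }
}
```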

Earlier, I mentioned that Java supports Unicode. In the next section, we'll find out how this support can affect source code and compilation.

Experimenting with Java's Unicode support

Java program listings are typically stored in files where they are encoded according to the native platform's character encoding. For example, my Windows 7 platform uses Cp1252 as its character encoding. When the JVM starts running, such as when you start the Java-based Java compiler via the javac tool, it tries to obtain this encoding. If the JVM cannot obtain it, the JVM chooses UTF-8 as the default character encoding.
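
If you're curious about which default encoding the JVM has settled on for your platform, a sketch such as the following should reveal it. The DefaultEncoding class and its defaultEncodingName() method are hypothetical; Charset.defaultCharset() and the file.encoding system property are standard. The output depends on your platform, so I haven't shown any.

```java
import java.nio.charset.Charset;

// Hypothetical demo class that reports the JVM's default character encoding.
class DefaultEncoding
{
   static String defaultEncodingName()
   {
      return Charset.defaultCharset().name();
   }

   public static void main(String[] args)
   {
      System.out.println(defaultEncodingName());
      System.out.println(System.getProperty("file.encoding"));
   }
}
```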

Cp1252 doesn't support many characters beyond the traditional ASCII character set, which can cause problems. For instance, if you attempt to use the Windows Notepad editor to save Listing 1, the editor will complain that characters in the Unicode format will be lost. Can you figure out why?

Listing 1. Symbolically naming an identifier (version 1)

class PrintPi
{
   public static void main(String[] args)
   {
      double π = 3.14159;
      System.out.println(π);
   }
}

The problem is that the above source includes the Greek letter Pi (π) as a variable's name, which causes the editor to balk. Fortunately, we can resolve this issue.

First, try saving Listing 1 to a file named PrintPi.java: From Notepad's Save As dialog box, enter PrintPi.java as the file's name and select Unicode, which corresponds to UTF-16 (little-endian order), from the Encoding drop-down list of encoding options. Then press the Save button.

Next, attempt to compile PrintPi.java, as follows:

javac PrintPi.java

In response you'll receive many error messages because the text file's contents were encoded as UTF-16, but javac assumes (on my platform) that the contents were encoded as Cp1252. To fix this problem, we must tell javac that the contents were encoded as UTF-16. We do this by passing the -encoding Unicode option to this program, as follows:

javac -encoding Unicode PrintPi.java

This time, the code compiles without error. When you execute PrintPi.class via java PrintPi, you'll observe the following output:

3.14159

You can also embed symbols from other alphabets by specifying their Unicode escape sequences without the surrounding quotes. This way, you don't have to specify an encoding when saving a listing or compiling the saved text because the text was encoded according to the native platform's encoding (e.g., Cp1252). For example, Listing 2 replaces π with the \u03c0 Unicode escape sequence for this symbol.

Listing 2. Symbolically naming an identifier (version 2)

class PrintPi
{
   public static void main(String[] args)
   {
      double \u03c0 = 3.14159;
      System.out.println(\u03c0);
   }
}

Compile the source code without the -encoding Unicode option (javac PrintPi.java) -- the same class file is generated -- and run the application as before (java PrintPi). You'll observe identical output.

In conclusion

Java has many fundamental language features that you should grasp before getting to the really interesting parts of the language. In this article, you learned how Unicode, comments, identifiers, types, literals, and variables work in Java programs. Next time we'll tackle Java expressions and their operators.