Control flow
With code samples, tables, and a Java virtual machine simulation, here's a look at the bytecodes of the Java virtual machine
that deal with control flow
By Bill Venners, JavaWorld.com, 03/01/97
Java offers all the control-flow constructs that C++ programmers found endearing: if, if-else, while, do-while, for, and switch. (Java doesn't offer the goto, but that was never endearing, not to real C++ programmers anyway.)
Decisions, decisions: keep it simple
The simplest control-flow construct Java offers is the if statement. But in bytecodes, the if is not so simple. When a Java program is compiled, the if statement may be translated to a variety of opcodes. Each opcode pops one or two values from the top of the stack and does
a comparison. The opcodes that pop only one value off the top of the stack compare that value with zero. The opcodes that
pop two values off the stack compare one of the popped values to the other popped value. If the comparison succeeds (success
is defined differently by each individual opcode), the Java virtual machine (JVM) branches -- or jumps -- to the offset given
as an operand to the comparison opcode. In this manner, the if statement provides many ways for you to make the Java virtual machine decide between two alternative paths of program flow.
All you ever wanted to know about the if opcode
One family of if opcodes performs integer comparisons against zero. When the JVM encounters one of these opcodes, it pops one int off the stack and compares it with zero.
Conditional branch: Integer comparison with zero
| Opcode | Operand(s) | Description |
|---|---|---|
| ifeq | branchbyte1, branchbyte2 | pop int value, if value == 0, branch to offset |
| ifne | branchbyte1, branchbyte2 | pop int value, if value != 0, branch to offset |
| iflt | branchbyte1, branchbyte2 | pop int value, if value < 0, branch to offset |
| ifle | branchbyte1, branchbyte2 | pop int value, if value <= 0, branch to offset |
| ifgt | branchbyte1, branchbyte2 | pop int value, if value > 0, branch to offset |
| ifge | branchbyte1, branchbyte2 | pop int value, if value >= 0, branch to offset |
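To make the table concrete, here is a minimal Java sketch of how an interpreter might handle this family. The class and method names (ZeroBranch, succeeds, branchOffset, nextPc) are hypothetical and exist only for illustration; the opcode names and the two branch bytes come from the instruction set. The sketch assumes the two operand bytes combine into a signed 16-bit offset measured from the address of the branch opcode itself.

```java
final class ZeroBranch {
    /** Returns true if the named opcode would branch for the popped int value. */
    static boolean succeeds(String opcode, int value) {
        switch (opcode) {
            case "ifeq": return value == 0;
            case "ifne": return value != 0;
            case "iflt": return value < 0;
            case "ifle": return value <= 0;
            case "ifgt": return value > 0;
            case "ifge": return value >= 0;
            default:
                throw new IllegalArgumentException("not a zero-compare opcode: " + opcode);
        }
    }

    /** Combines branchbyte1 and branchbyte2 into a signed 16-bit offset. */
    static int branchOffset(int branchbyte1, int branchbyte2) {
        return (short) (((branchbyte1 & 0xff) << 8) | (branchbyte2 & 0xff));
    }

    /** Address of the next instruction; the opcode plus its two operand bytes is 3 bytes long. */
    static int nextPc(String opcode, int value, int pc, int branchbyte1, int branchbyte2) {
        return succeeds(opcode, value)
                ? pc + branchOffset(branchbyte1, branchbyte2) // branch taken
                : pc + 3;                                     // fall through
    }
}
```

Because the offset is signed, the same opcodes can branch backward (as at the bottom of a loop) as well as forward.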
Another family of if opcodes pops two integers off the top of the stack and compares them against one another. The Java virtual machine branches
if the comparison succeeds. Just before these opcodes are executed, value2 is on the top of the stack; value1 is just beneath
value2.
Conditional branch: Comparison of two integers
| Opcode | Operand(s) | Description |
|---|---|---|
| if_icmpeq | branchbyte1, branchbyte2 | pop int value2 and value1, if value1 == value2, branch to offset |
| if_icmpne | branchbyte1, branchbyte2 | pop int value2 and value1, if value1 != value2, branch to offset |
| if_icmplt | branchbyte1, branchbyte2 | pop int value2 and value1, if value1 < value2, branch to offset |
| if_icmple | branchbyte1, branchbyte2 | pop int value2 and value1, if value1 <= value2, branch to offset |
| if_icmpgt | branchbyte1, branchbyte2 | pop int value2 and value1, if value1 > value2, branch to offset |
| if_icmpge | branchbyte1, branchbyte2 | pop int value2 and value1, if value1 >= value2, branch to offset |
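The order of the two pops is easy to get backward, so a small sketch may help. It models the operand stack with a java.util.ArrayDeque, which is an illustrative choice and not how a real virtual machine stores its stack; IntCompareBranch and ifIcmplt are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class IntCompareBranch {
    /** Models if_icmplt: pops value2, then value1, and reports whether value1 < value2. */
    static boolean ifIcmplt(Deque<Integer> operandStack) {
        int value2 = operandStack.pop(); // top of the stack
        int value1 = operandStack.pop(); // just beneath value2
        return value1 < value2;
    }

    public static void main(String[] args) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(3); // value1, pushed first
        stack.push(7); // value2, pushed last
        System.out.println(ifIcmplt(stack)); // prints "true": 3 < 7, so the branch is taken
    }
}
```

In other words, the operands are compared in the order they were pushed: the left-hand side of the comparison goes onto the stack first.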
The opcodes shown above operate on ints. These opcodes also are used for comparisons of types short, byte, and char -- the JVM always manipulates types smaller than int by first converting them to ints and then manipulating the ints.
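One visible consequence of this widening shows up when a byte meets a char. The sketch below (SmallTypePromotion and sameValue are hypothetical names) compares two values that share the bit pattern 0xFF; a compiler would presumably emit an ordinary int comparison for the == here, though the exact opcode is an assumption.

```java
final class SmallTypePromotion {
    /** Compares a byte with a char; both are widened to int before the comparison. */
    static boolean sameValue(byte b, char c) {
        return b == c;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF; // widens to the int -1 (sign-extended)
        char c = (char) 0xFF; // widens to the int 255 (zero-extended)
        System.out.println(sameValue(b, c)); // prints "false": -1 != 255
    }
}
```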
A third family of opcodes takes care of comparisons of the other primitive types: long, float, and double. These opcodes don't cause a branch by themselves. Instead, they push the int value that represents the result of the comparison -- 0 for equal to, 1 for greater than, and -1 for less than. The compiler then follows the comparison with one of the int compare-with-zero opcodes introduced above (such as iflt or ifge) to force the actual branch.
Comparison of longs, floats, and doubles
| Opcode | Operand(s) | Description |
|---|---|---|
| lcmp | (none) | pop long value2 and value1, compare, push int result |
| fcmpg | (none) | pop float value2 and value1, compare, push int result (1 if either value is NaN) |
| fcmpl | (none) | pop float value2 and value1, compare, push int result (-1 if either value is NaN) |
| dcmpg | (none) | pop double value2 and value1, compare, push int result (1 if either value is NaN) |
| dcmpl | (none) | pop double value2 and value1, compare, push int result (-1 if either value is NaN) |
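The two-step pattern, compare first and branch second, can be sketched in Java as follows. LongCompare and its methods are hypothetical names, and pairing lcmp with iflt is just one example of the combinations a compiler could choose.

```java
final class LongCompare {
    /** Models lcmp: returns 1, 0, or -1 for value1 >, ==, or < value2. */
    static int lcmp(long value1, long value2) {
        if (value1 > value2) return 1;
        if (value1 == value2) return 0;
        return -1;
    }

    /** Models "lcmp followed by iflt": true if the branch would be taken. */
    static boolean lcmpThenIflt(long value1, long value2) {
        int result = lcmp(value1, value2); // step 1: lcmp pushes the int result
        return result < 0;                 // step 2: iflt pops it and compares it with zero
    }
}
```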
The two opcodes for float comparisons (fcmpg and fcmpl) differ only in how they handle NaN ("not a number"). In the Java virtual machine, comparisons of floating-point numbers always fail if one of the values being compared is NaN. If neither value being compared is NaN, both the fcmpg and fcmpl instructions push a 0 if the values are equal, a 1 if value1 is greater than value2, and a -1 if value1 is less than value2. But if one or both of the values is NaN, the fcmpg instruction pushes a 1, whereas the fcmpl instruction pushes a -1. Because both of these opcodes are available, a compiler can pick whichever one makes a NaN operand push the same result as a failed comparison, so the branch that follows needs no separate NaN check. This is also true for the two opcodes that compare double values: dcmpg and dcmpl.
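Here is a minimal sketch of the two float opcodes and of why having both is convenient. FloatCompare and its methods are hypothetical names. Which opcode a particular compiler emits for a particular operator is an assumption in this sketch: the pairing below is simply the one that makes every comparison against NaN come out false.

```java
final class FloatCompare {
    /** Models fcmpg: pushes 1 when either operand is NaN. */
    static int fcmpg(float value1, float value2) {
        if (Float.isNaN(value1) || Float.isNaN(value2)) return 1;
        return value1 > value2 ? 1 : (value1 == value2 ? 0 : -1);
    }

    /** Models fcmpl: pushes -1 when either operand is NaN. */
    static int fcmpl(float value1, float value2) {
        if (Float.isNaN(value1) || Float.isNaN(value2)) return -1;
        return value1 > value2 ? 1 : (value1 == value2 ? 0 : -1);
    }

    /** a < b via fcmpg: NaN yields 1, which never counts as "less than". */
    static boolean lessThan(float a, float b) {
        return fcmpg(a, b) < 0;
    }

    /** a > b via fcmpl: NaN yields -1, which never counts as "greater than". */
    static boolean greaterThan(float a, float b) {
        return fcmpl(a, b) > 0;
    }
}
```

With only one of the two opcodes, either lessThan or greaterThan would need an extra NaN test before the branch.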
A fourth family of if opcodes pops one object reference off the top of the stack and compares it with null. If the comparison succeeds, the JVM
branches.
Conditional branch: object reference comparison with null
| Opcode | Operand(s) | Description |
|---|---|---|
| ifnull | branchbyte1, branchbyte2 | pop reference value, if value == null, branch to offset |
| ifnonnull | branchbyte1, branchbyte2 | pop reference value, if value != null, branch to offset |
The last family of if opcodes pops two object references off the stack and compares them with each other. In this case, there are only two comparisons
that make sense: "equals" and "not equals." If the references are equal, then they refer to the exact same object on the heap.
If not, they refer to two different objects. As with all the other if opcodes, if the comparison succeeds, the JVM branches.
Conditional branch: Comparison of two object references
| Opcode | Operand(s) | Description |
|---|---|---|
| if_acmpeq | branchbyte1, branchbyte2 | pop reference value2 and value1, if value1 == value2, branch to offset |
| if_acmpne | branchbyte1, branchbyte2 | pop reference value2 and value1, if value1 != value2, branch to offset |
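The difference between comparing references and comparing contents is worth a quick illustration. In this sketch, ReferenceCompare, ifAcmpeq, and ifNull are hypothetical names that stand in for the tests the opcodes perform.

```java
final class ReferenceCompare {
    /** Models if_acmpeq: true only if both references point to the same object. */
    static boolean ifAcmpeq(Object value1, Object value2) {
        return value1 == value2;
    }

    /** Models ifnull: true if the reference is null. */
    static boolean ifNull(Object value) {
        return value == null;
    }

    public static void main(String[] args) {
        String a = new String("jvm");
        String b = new String("jvm"); // equal contents, but a distinct object on the heap
        System.out.println(ifAcmpeq(a, a)); // prints "true"
        System.out.println(ifAcmpeq(a, b)); // prints "false"
        System.out.println(a.equals(b));    // prints "true": equals() is a method call, not an opcode test
    }
}
```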
It's unconditional: goto opcodes
Those are all of the opcodes that cause the Java virtual machine to branch conditionally. One other family of opcodes, however,
causes the JVM to branch unconditionally. Not surprisingly, these opcodes are called "goto." Although goto is a reserved word in the Java programming language, it can't be used in your programs because it won't compile. The reason
goto is a reserved word is so that a mischievous programmer can't make a variable named "goto" in order to freak out their peers. But, when you compile a Java program, the bytecodes generated will likely contain lots
of goto instructions.
| Opcode | Operand(s) | Description |
|---|---|---|
| goto | branchbyte1, branchbyte2 | branch to offset |
| goto_w | branchbyte1, branchbyte2, branchbyte3, branchbyte4 | branch to offset |
The above opcodes, which perform comparisons and both conditional and unconditional branches, are sufficient to express to a Java virtual machine the control flow that Java source code specifies with an if, if-else, while, do-while, or for statement. The above opcodes also could be used to express a switch statement, but the JVM's instruction set includes two opcodes specially designed for the switch statement: tableswitch and lookupswitch.
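To see a conditional and an unconditional branch cooperate, consider a toy interpreter running a hand-assembled loop. The byte sequence below is one plausible encoding of "int i = 0; while (i < 5) i++; return i;" and is an assumption made for illustration: a real compiler may lay the loop out differently. TinyVm, run, and offset16 are hypothetical names, and the interpreter supports only the eight opcodes this example needs.

```java
final class TinyVm {
    /** Hand-assembled loop; the leading numbers in the comments are byte addresses. */
    static final byte[] LOOP = {
        0x03,                                  //  0: iconst_0
        0x3b,                                  //  1: istore_0
        (byte) 0xa7, 0x00, 0x06,               //  2: goto +6       (to address 8)
        (byte) 0x84, 0x00, 0x01,               //  5: iinc 0, 1
        0x1a,                                  //  8: iload_0
        0x10, 0x05,                            //  9: bipush 5
        (byte) 0xa1, (byte) 0xff, (byte) 0xfa, // 11: if_icmplt -6  (to address 5)
        0x1a,                                  // 14: iload_0
        (byte) 0xac                            // 15: ireturn
    };

    /** Reads the signed 16-bit branch offset that follows the opcode at pc. */
    static int offset16(byte[] code, int pc) {
        return (short) (((code[pc + 1] & 0xff) << 8) | (code[pc + 2] & 0xff));
    }

    /** Executes code until ireturn and returns the int on top of the stack. */
    static int run(byte[] code) {
        int[] locals = new int[1];
        int[] stack = new int[4];
        int sp = 0; // index of the next free stack slot
        int pc = 0; // address of the current opcode
        while (true) {
            int op = code[pc] & 0xff;
            switch (op) {
                case 0x03: stack[sp++] = 0; pc += 1; break;                        // iconst_0
                case 0x10: stack[sp++] = code[pc + 1]; pc += 2; break;             // bipush
                case 0x1a: stack[sp++] = locals[0]; pc += 1; break;                // iload_0
                case 0x3b: locals[0] = stack[--sp]; pc += 1; break;                // istore_0
                case 0x84: locals[code[pc + 1] & 0xff] += code[pc + 2]; pc += 3; break; // iinc
                case 0xa1: {                                                       // if_icmplt
                    int value2 = stack[--sp];
                    int value1 = stack[--sp];
                    pc += (value1 < value2) ? offset16(code, pc) : 3;
                    break;
                }
                case 0xa7: pc += offset16(code, pc); break;                        // goto
                case 0xac: return stack[--sp];                                     // ireturn
                default: throw new IllegalStateException("unsupported opcode " + op);
            }
        }
    }
}
```

Calling TinyVm.run(TinyVm.LOOP) returns 5. The goto jumps forward over the loop body to the test, and the if_icmplt jumps backward to the body for as long as the test succeeds.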
The nitty gritty of tableswitch and lookupswitch
The tableswitch and lookupswitch instructions both include one default branch offset and a variable-length set of case value/branch offset pairs. Both instructions pop the key (the value of the expression in the parentheses immediately following
the switch keyword) from the stack. The key is compared with all the case values. If a match is found, the branch offset associated
with the case value is taken. If no match is found, the default branch offset is taken.
The difference between tableswitch and lookupswitch is in how they indicate the case values. The lookupswitch instruction is more general-purpose than tableswitch, but tableswitch is usually more efficient. Both instructions are followed by zero to three bytes of padding -- enough so that the byte immediately
following the padding starts at an address that is a multiple of four bytes from the beginning of the method. (These two instructions,
by the way, are the only ones in the entire Java virtual machine instruction set that involve alignment on a greater than
one-byte boundary.) For both instructions, the four bytes after the padding hold the default branch offset.
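The amount of padding follows directly from the address of the opcode. A minimal sketch, with the hypothetical names SwitchPadding and paddingAfter, and with addresses counted from the beginning of the method:

```java
final class SwitchPadding {
    /** Number of padding bytes (0 to 3) after a switch opcode located at opcodeAddress. */
    static int paddingAfter(int opcodeAddress) {
        int firstByteAfterOpcode = opcodeAddress + 1;
        return (4 - (firstByteAfterOpcode % 4)) % 4;
    }
}
```

For example, an opcode at address 0 is followed by three padding bytes so that the default branch offset starts at address 4, while an opcode at address 3 needs no padding at all.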
After the zero- to three-byte padding and the four-byte default branch offset, the lookupswitch opcode is followed by a four-byte value, npairs, which indicates the number of case value/branch offset pairs that will follow. The case value is an int; this highlights the fact that switch statements in Java require a key expression that is an int, short, char, or byte. If you attempt to use a long, float, or double as a switch key, your program won't compile. The branch offset associated with each case value is another four-byte offset.
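The layout just described can be decoded with a few reads from a java.nio.ByteBuffer, which is big-endian by default. The sketch below is illustrative only: LookupSwitchDecoder, target, and sample are hypothetical names, the sample bytes are made up, and branch offsets are assumed to be relative to the address of the lookupswitch opcode.

```java
import java.nio.ByteBuffer;

final class LookupSwitchDecoder {
    /** Returns the address to continue at for a lookupswitch whose opcode sits at pc. */
    static int target(byte[] code, int pc, int key) {
        int operands = pc + 1 + (4 - ((pc + 1) % 4)) % 4; // skip the opcode and the padding
        ByteBuffer in = ByteBuffer.wrap(code);
        in.position(operands);
        int defaultOffset = in.getInt();
        int npairs = in.getInt();
        for (int i = 0; i < npairs; i++) {
            int caseValue = in.getInt();
            int offset = in.getInt();
            if (caseValue == key) {
                return pc + offset; // match found
            }
        }
        return pc + defaultOffset;  // ran out of case values
    }

    /** Builds a made-up method body holding a single lookupswitch at address 0. */
    static byte[] sample() {
        ByteBuffer out = ByteBuffer.allocate(36);
        out.put((byte) 0xab);        // lookupswitch opcode
        out.put(new byte[3]);        // 3 padding bytes, so the operands start at address 4
        out.putInt(100);             // default branch offset
        out.putInt(3);               // npairs
        out.putInt(1).putInt(40);    // case 1
        out.putInt(10).putInt(50);   // case 10
        out.putInt(1000).putInt(60); // case 1000
        return out.array();
    }
}
```

With this sample, a key of 10 leads to address 50 and a key of 7 falls through to the default at address 100. The pairs appear in ascending order of case value, which the class file format requires.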
In the tableswitch instruction, the zero- to three-byte padding and the four-byte default branch offset are followed by low and high int values. The low and high values indicate the endpoints of a range of case values included in this tableswitch instruction. Following the low and high values are high - low + 1 branch offsets -- one branch offset for high, one for low,
and one for each integer case value in between high and low. The branch offset for low immediately follows the high value.
Thus, when the Java virtual machine encounters a lookupswitch instruction, it must check the key against each case value until it finds a match or runs out of case values. If it runs
out of case values, it uses the default branch offset. On the other hand, when the JVM encounters a tableswitch instruction, it can simply check to see if the key is within the range defined by low and high. If not, it takes the default
branch offset. If so, it just subtracts low from key to get an offset into the list of branch offsets. In this manner, it
can determine the appropriate branch offset without having to check each case value.
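The two strategies can be put side by side in a short sketch. It works on already-decoded arrays instead of raw bytes, and the names SwitchDispatch, lookup, and table are hypothetical. The linear scan in lookup follows the description above; it is not meant to suggest how any particular virtual machine searches the pairs.

```java
final class SwitchDispatch {
    /** lookupswitch style: check the key against each case value in turn. */
    static int lookup(int key, int[] caseValues, int[] offsets, int defaultOffset) {
        for (int i = 0; i < caseValues.length; i++) {
            if (caseValues[i] == key) {
                return offsets[i];
            }
        }
        return defaultOffset;
    }

    /** tableswitch style: one range check, then index directly with key - low. */
    static int table(int key, int low, int high, int[] offsets, int defaultOffset) {
        if (key < low || key > high) {
            return defaultOffset;
        }
        return offsets[key - low]; // offsets holds high - low + 1 entries
    }
}
```

The table version does the same amount of work no matter how many cases there are, but it needs an entry for every integer between low and high. That is why lookupswitch remains the better fit when the case values are widely scattered.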