The lean, mean, virtual machine

An introduction to the basic structure and functionality of the Java Virtual Machine

Welcome to the first installment of "Under The Hood." In this column I'd like to explore topics concerning the inner workings of Java. Each month I'll focus on one area and attempt to demystify it. My aim is to help programmers understand what is actually going on when they compile and run their Java programs. In this installment, I provide an introduction to the basic structure and functionality of the Java Virtual Machine.



What is the Java Virtual Machine? Why is it here?

The Java Virtual Machine, or JVM, is an abstract computer that runs compiled Java programs. The JVM is "virtual" because it is generally implemented in software on top of a "real" hardware platform and operating system. All Java programs are compiled for the JVM. Therefore, the JVM must be implemented on a particular platform before compiled Java programs will run on that platform.



The JVM plays a central role in making Java portable. It provides a layer of abstraction between the compiled Java program and the underlying hardware platform and operating system: compiled Java programs run on the JVM, independent of whatever may be underneath a particular JVM implementation.



What makes the JVM lean and mean? The JVM is lean because it is small when implemented in software. It was designed to be small so that it can fit in as many places as possible -- places like TV sets, cell phones, and personal computers. The JVM is mean because of its ambition. "Ubiquity!" is its battle cry. It wants to be everywhere, and its success is indicated by the extent to which programs written in Java will run everywhere.



Java bytecodes

Java programs are compiled into a form called Java bytecodes. The JVM executes Java bytecodes, so Java bytecodes can be thought of as the machine language of the JVM. The Java compiler reads Java language source (.java) files, translates the source into Java bytecodes, and places the bytecodes into class (.class) files. The compiler generates one class file per class in the source.



To the JVM, a stream of bytecodes is a sequence of instructions. Each instruction consists of a one-byte opcode and zero or more operands. The opcode tells the JVM what action to take. If the JVM requires more information to perform the action than just the opcode, the required information immediately follows the opcode as operands.



A mnemonic is defined for each bytecode instruction. The mnemonics can be thought of as an assembly language for the JVM. For example, there is an instruction that will cause the JVM to push a zero onto the stack. The mnemonic for this instruction is iconst_0, and its bytecode value is 03 hex. This instruction takes no operands. Another instruction causes program execution to unconditionally jump forward or backward in memory. This instruction requires one operand, a 16-bit signed offset from the current memory location. By adding the offset to the current memory location, the JVM can determine the memory location to jump to. The mnemonic for this instruction is goto, and its bytecode value is a7 hex.
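To make the instruction format concrete, here is a minimal sketch of a disassembler that understands only the two instructions just described. The class name TinyDisassembler and its method are hypothetical, invented for this illustration. The opcode values (03 hex for iconst_0, a7 hex for goto) are the ones assigned by the JVM specification, and the sketch assumes the goto operand is stored high byte first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: decodes only iconst_0 and goto.
final class TinyDisassembler {
    static final int ICONST_0 = 0x03; // push the int constant 0; no operands
    static final int GOTO = 0xa7;     // unconditional jump; one 16-bit signed operand

    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xff; // opcodes are unsigned bytes
            switch (opcode) {
                case ICONST_0:
                    lines.add(pc + ": iconst_0");
                    pc += 1; // opcode only
                    break;
                case GOTO: {
                    // Assemble the two operand bytes into a signed 16-bit offset.
                    int offset = (short) (((code[pc + 1] & 0xff) << 8) | (code[pc + 2] & 0xff));
                    // The jump target is the address of the goto plus the offset.
                    lines.add(pc + ": goto " + (pc + offset));
                    pc += 3; // opcode plus two operand bytes
                    break;
                }
                default:
                    throw new IllegalArgumentException(
                            "unhandled opcode 0x" + Integer.toHexString(opcode));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // iconst_0 at address 0, then a goto at address 1 with offset -1 (back to 0).
        byte[] code = { 0x03, (byte) 0xa7, (byte) 0xff, (byte) 0xff };
        disassemble(code).forEach(System.out::println);
    }
}
```

Running main prints "0: iconst_0" followed by "1: goto 0". The offset bytes ff ff decode to -1, so the jump target is one byte before the goto itself.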



Virtual parts

The "virtual hardware" of the Java Virtual Machine can be divided into four basic parts: the registers, the stack, the garbage-collected heap, and the method area. These parts are abstract, just like the machine they compose, but they must exist in some form in every JVM implementation.



The size of an address in the JVM is 32 bits. The JVM can, therefore, address up to 4 gigabytes (2 to the power of 32 bytes) of memory, with each memory location containing one byte. Each register in the JVM stores one 32-bit address. The stack, the garbage-collected heap, and the method area reside somewhere within the 4 gigabytes of addressable memory. The exact location of these memory areas is a decision of the implementor of each particular JVM.
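The 4-gigabyte figure follows directly from the address size. As a quick check of the arithmetic, here is a small sketch; the class and method names are hypothetical, chosen only for this example.

```java
// Hypothetical illustration of the address-space arithmetic.
final class AddressSpace {
    // Number of distinct byte addresses available with the given address width.
    // Uses a long because 2^32 does not fit in a 32-bit int.
    static long addressableBytes(int addressBits) {
        return 1L << addressBits;
    }

    public static void main(String[] args) {
        long bytes = addressableBytes(32);
        // Shifting right by 30 divides by 2^30, the number of bytes in a gigabyte.
        System.out.println(bytes + " bytes = " + (bytes >> 30) + " gigabytes");
    }
}
```

This prints "4294967296 bytes = 4 gigabytes".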



A word in the Java Virtual Machine is 32 bits. The JVM has a small number of primitive data types: byte (8 bits), short (16 bits), int (32 bits), long (64 bits), float (32 bits), double (64 bits), and char (16 bits). With the exception of char, which is an unsigned Unicode character, all the numeric types are signed. These types conveniently map to the types available to the Java programmer. One other primitive type is the object handle, which is a 32-bit address that refers to an object on the heap.
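Because these types map to the ones available to the Java programmer, their sizes can be checked from Java itself. The following sketch uses the SIZE constants of the standard wrapper classes; the class name PrimitiveSizes is hypothetical. It does not cover the object handle, which has no counterpart that a Java program can measure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: reports the width in bits of each primitive type.
final class PrimitiveSizes {
    static Map<String, Integer> sizesInBits() {
        Map<String, Integer> sizes = new LinkedHashMap<>(); // keeps insertion order
        sizes.put("byte", Byte.SIZE);
        sizes.put("short", Short.SIZE);
        sizes.put("int", Integer.SIZE);
        sizes.put("long", Long.SIZE);
        sizes.put("float", Float.SIZE);
        sizes.put("double", Double.SIZE);
        sizes.put("char", Character.SIZE);
        return sizes;
    }

    public static void main(String[] args) {
        sizesInBits().forEach((type, bits) -> System.out.println(type + ": " + bits + " bits"));
        // char and short are both 16 bits wide, but only short is signed.
        System.out.println("char range: " + (int) Character.MIN_VALUE
                + " to " + (int) Character.MAX_VALUE);
        System.out.println("short range: " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
    }
}
```

The last two lines of output show the difference signedness makes: char runs from 0 to 65535, while short, with the same number of bits, runs from -32768 to 32767.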



The method area, because it contains bytecodes, is aligned on byte boundaries. The stack and garbage-collected heap are aligned on word (32-bit) boundaries.



The proud, the few, the registers

The JVM has a program counter and three registers that manage the stack. It has few registers because the bytecode instructions of the JVM operate primarily on the stack. This stack-oriented design helps keep the JVM's instruction set and implementation small.



The JVM uses the program counter, or pc register, to keep track of where in memory it should be executing instructions. The other three registers -- optop register, frame register, and vars register -- point to various parts of the stack frame of the currently executing method. The stack frame of an executing method holds the state (local variables, intermediate results of calculations, etc.) for a particular invocation of the method.



The method area and the program counter

The method area is where the bytecodes reside. The program counter always points to (contains the address of) some byte in the method area, and it is used to keep track of the thread of execution. After executing a bytecode instruction, the JVM sets the program counter to the address of the instruction that immediately follows, unless the executed instruction specifically demanded a jump. Either way, the program counter then contains the address of the next instruction to execute.
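The way the program counter advances can be sketched as a small fetch loop. This is an illustration under simplifying assumptions, not how any particular JVM is implemented: the method area is modeled as a plain byte array, only three opcodes are handled (iconst_0 at 03 hex, goto at a7 hex, and return at b1 hex, as assigned by the JVM specification), and the effect of iconst_0 on the stack is ignored. The class name PcTrace is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: records every address the program counter visits.
final class PcTrace {
    static final int ICONST_0 = 0x03;
    static final int GOTO = 0xa7;
    static final int RETURN = 0xb1;

    static List<Integer> trace(byte[] methodArea) {
        List<Integer> visited = new ArrayList<>();
        int pc = 0; // start at the first instruction
        while (true) {
            visited.add(pc);
            int opcode = methodArea[pc] & 0xff;
            switch (opcode) {
                case ICONST_0:
                    pc += 1; // no jump: fall through to the next instruction
                    break;
                case GOTO:
                    // Jump: add the signed 16-bit offset to the address of the goto.
                    pc += (short) (((methodArea[pc + 1] & 0xff) << 8) | (methodArea[pc + 2] & 0xff));
                    break;
                case RETURN:
                    return visited; // end of the method
                default:
                    throw new IllegalStateException(
                            "unhandled opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        byte[] methodArea = {
            0x03,                   // 0: iconst_0
            (byte) 0xa7, 0x00, 0x04, // 1: goto +4 (to address 5)
            0x03,                   // 4: iconst_0 (skipped by the jump)
            (byte) 0xb1             // 5: return
        };
        System.out.println(trace(methodArea));
    }
}
```

This prints [0, 1, 5]. The program counter steps from address 0 to address 1 in sequence, then the goto moves it directly to address 5, so the instruction at address 4 is never executed.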



The Java stack and related registers

The Java stack is used to store parameters for and results of bytecode instructions, to pass parameters to and return values from methods, and to keep the state of each method invocation. The state of a method invocation is called its stack frame. The vars, frame, and optop registers point to different parts of the current stack frame.
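As a rough sketch of how instructions use the stack, the following code runs a four-instruction method that adds its two int parameters. The layout is an assumption made for illustration: the stack is an array of 32-bit words, vars is modeled as the index of the first local variable, and optop as the index of the next free operand slot. The frame register is left out entirely. The opcodes (iload_0, iload_1, iadd, ireturn) and their values come from the JVM specification; the class name TinyFrame is hypothetical.

```java
// Hypothetical illustration: a single stack frame with vars and optop modeled as indices.
final class TinyFrame {
    static final int ILOAD_0 = 0x1a; // push local variable 0
    static final int ILOAD_1 = 0x1b; // push local variable 1
    static final int IADD = 0x60;    // pop two ints, push their sum
    static final int IRETURN = 0xac; // pop an int and return it to the caller

    static int run(byte[] code, int arg0, int arg1) {
        int[] stack = new int[16]; // each slot is one 32-bit word
        int vars = 0;              // base of the local variables
        stack[vars] = arg0;        // parameters arrive as local variables
        stack[vars + 1] = arg1;
        int optop = vars + 2;      // operand stack starts just above the locals
        int pc = 0;
        while (true) {
            int opcode = code[pc++] & 0xff; // every opcode here has no operands
            switch (opcode) {
                case ILOAD_0:
                    stack[optop++] = stack[vars];
                    break;
                case ILOAD_1:
                    stack[optop++] = stack[vars + 1];
                    break;
                case IADD: {
                    int right = stack[--optop];
                    int left = stack[--optop];
                    stack[optop++] = left + right; // intermediate result stays on the stack
                    break;
                }
                case IRETURN:
                    return stack[--optop];
                default:
                    throw new IllegalStateException(
                            "unhandled opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        byte[] add = { 0x1a, 0x1b, 0x60, (byte) 0xac };
        System.out.println(run(add, 2, 3));
    }
}
```

Running main prints 5. Note that none of the four instructions names a register or carries an operand; each one finds its inputs on the stack and leaves its result there.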
