If you try to map instruction op codes to instruction sizes, you’ll get the following discouraging table:
0 - 15 1 bytes
16 2 bytes
17 3 bytes
18 2 bytes
19 - 20 3 bytes
21 - 25 2 bytes
26 - 53 1 bytes
54 - 58 2 bytes
59 - 131 1 bytes
132 3 bytes
133 - 152 1 bytes
153 - 168 3 bytes
169 2 bytes
170 - 171 special handling
172 - 177 1 bytes
178 - 184 3 bytes
185 - 186 5 bytes
187 3 bytes
188 2 bytes
189 3 bytes
190 - 191 1 bytes
192 - 193 3 bytes
194 - 195 1 bytes
196 special handling
197 4 bytes
198 - 199 3 bytes
200 - 201 5 bytes
In other words, there is no size information encoded in the instruction’s numeric value nor its bit pattern, but there is another property, which you can consider some sort of pattern: out of the ~200 defined instructions, roughly 150 instructions have the size of one byte, leaving only ~50 instructions which require any handling at all. Even this small group of instructions can be subdivided further into logical groups, the majority taking three bytes, the second biggest group taking two bytes.
So the code of a method rushing through the instructions may look like:
static void readByteCode(ByteBuffer bb) {
while(bb.hasRemaining()) {
switch(bb.get()&0xff) {
case BIPUSH: // one byte embedded constant
case LDC: // one byte embedded constant pool index
// follow-up: one byte embedded local variable index
case ILOAD: case LLOAD: case FLOAD: case DLOAD: case ALOAD:
case ISTORE: case LSTORE: case FSTORE: case DSTORE: case ASTORE: case RET:
case NEWARRAY: // one byte embedded array type
bb.get();
break;
case IINC: // one byte local variable index, another one for the constant
case SIPUSH: // two bytes embedded constant
case LDC_W: case LDC2_W: // two bytes embedded constant pool index
// follow-up: two bytes embedded branch offset
case IFEQ: case IFNE: case IFLT: case IFGE: case IFGT: case IFLE:
case IF_ICMPEQ: case IF_ICMPNE: case IF_ICMPLT: case IF_ICMPGE:
case IF_ICMPGT: case IF_ICMPLE: case IF_ACMPEQ: case IF_ACMPNE:
case GOTO: case JSR: case IFNULL: case IFNONNULL:
// follow-up: two bytes embedded constant pool index to member or type
case GETSTATIC: case PUTSTATIC: case GETFIELD: case PUTFIELD:
case INVOKEVIRTUAL: case INVOKESPECIAL: case INVOKESTATIC: case NEW:
case ANEWARRAY: case CHECKCAST: case INSTANCEOF:
bb.getShort();
break;
case MULTIANEWARRAY:// two bytes pool index, one byte dimension
bb.getShort();
bb.get();
break;
// follow-up: two bytes embedded constant pool index to member, two reserved
case INVOKEINTERFACE: case INVOKEDYNAMIC:
bb.getShort();
bb.getShort();
break;
case GOTO_W: case JSR_W:// four bytes embedded branch offset
bb.getInt();
break;
case LOOKUPSWITCH:
// special handling left as an exercise for the reader...
break;
case TABLESWITCH:
// special handling left as an exercise for the reader...
break;
case WIDE:
int widened=bb.get()&0xff;
bb.getShort(); // local variable index
if(widened==IINC) {
bb.getShort(); // constant offset value
}
break;
default: // one of the ~150 instructions taking one byte
}
}
}
I intentionally kept some of the instructions separated having the same number of follow-up bytes, but with a different meaning. After all, you want to insert some actual logic at certain places, I guess.
Note that the handling of the two switch
bytecode instructions is left out, they require padding whose implementation requires knowledge about the code alignment within the buffer, which is in control of the caller. So that’s up to your specific application. Refer to the documentation of lookupswitch
and tableswitch
.
Of course, handling of all single byte instructions as default
implies that the code won’t catch unknown or invalid instructions. If you want safety, you’ll have to insert the cases…
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…