Virtual Machine Implementation
Byte-Code and Assembler
Every executable, compiled executable that is, is in binary. Which is a sequence of zeros and ones, but int the grand scheme of things, the binary actually represents instructions to the CPU that are in the form of Byte-Code.
Byte-Code by the name means that the binary could be split into different sections of Bytes. Where each section either represents a sequence of instructions or raw data (decimal or characters).
At first, there is the magic number or the indicator of what type of binary is this. This is standarised and all the major formats has their own magic numbers. Then there could be a sort of a header for the file, it may contains information about the program.
Standard types also include information about sizes, like the sizes of the different sections of the binary.
And final, the assembler is the program responsible of turning a set of assembly instructions into binary. Thus, encoding each instruction and its arguments and parameters as well as data into Byte-Code.
.name "zork"
.comment "I'M ALIIIVE"
l2: sti r1, %:live, %1
and r1, %0, r1
live: live %1
zjmp %:live
Above is a simple program written in a language similar to Assembly, it has its own assembler. The Byte-Code looks like this.
0000000 00 ea 83 f3 7a 6f 72 6b 00 00 00 00 00 00 00 00
0000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0000080 00 00 00 00 00 00 00 00 00 00 00 17 49 27 4d 20
0000090 41 4c 49 49 49 49 56 45 00 00 00 00 00 00 00 00
00000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0000890 0b 68 01 00 0f 00 01 06 64 01 00 00 00 00 01 01
00008a0 00 00 00 01 09 ff fb
00008a7
Virtual Machine and Execution
The VM does the reverse, it decodes the Byte-Code, reads the data and executes the instructions. Each Assembly Language has a table of instruction to indicate how many Bytes each instruction will take up in the binary and what each encoded value represent.
For example, 0x01
could either mean the value 1
or the first register depending on the instruction table.
Sample project
This is a project that messes around all those ideas and more. It is written in C so very native. It also has docs.