Programming Like It's 1982. Part IV – Instructions :: doismiu

In the previous post we built the map. We know where data lives, how memory is organized into addresses, and why certain regions were worth more than others. Now for the missing piece: what the processor actually does while running a program.

The program is also bytes

In the post about the execution cycle we saw that the processor reads a byte, interprets it as an instruction, executes, and moves to the next. But we never talked about how a byte becomes an instruction.

The answer is simple. Every processor instruction has a fixed number that represents it. That number is called an opcode. The processor reads the byte $A9 and knows, by construction in silicon, that it must load the next byte into register A. It reads $8D and knows it must take the value from A and write it to a memory address specified by the next two bytes. It reads $60 and knows it must return from a subroutine.

The processor does not read text. It does not read names. It reads numbers, and each number has a meaning burned into the chip permanently.

Assembly is the layer that exists so humans do not have to memorize those numbers. Instead of writing $A9 $05, you write LDA #$05. Instead of $8D $00 $02, you write STA $0200. The assembler, the program that compiles assembly code, converts that text back into the bytes the processor understands. It is a direct translation, with no interpretation, no optimization. Each line becomes exactly the corresponding bytes.

What this means in practice: when you write LDA #$05, you are not describing an intention to a smart compiler. You are literally specifying which bytes will sit in memory, in what order, and what the processor will do when it gets to them.
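To make the "direct translation" idea concrete, here is a minimal sketch of an assembler in Python that knows only the three opcodes mentioned so far. The table `OPCODES` and the function `assemble_line` are names invented for this illustration, and the parsing is deliberately naive. A real assembler handles labels, many more addressing modes, and error reporting.

```python
# Hypothetical three-entry opcode table: (mnemonic, addressing mode) -> opcode.
OPCODES = {
    ("LDA", "immediate"): 0xA9,
    ("STA", "absolute"): 0x8D,
    ("RTS", "implied"): 0x60,
}


def assemble_line(line):
    """Translate one line such as 'LDA #$05' into its machine-code bytes."""
    parts = line.split(";")[0].split()  # drop the comment, split mnemonic/operand
    mnemonic = parts[0].upper()
    if len(parts) == 1:
        # No operand: the instruction is a single opcode byte.
        return bytes([OPCODES[(mnemonic, "implied")]])
    operand = parts[1]
    if operand.startswith("#$"):
        # Immediate mode: opcode followed by the literal value.
        return bytes([OPCODES[(mnemonic, "immediate")], int(operand[2:], 16)])
    # Otherwise assume a 16-bit address written like '$0200'.
    address = int(operand[1:], 16)
    # The address is emitted low byte first, then high byte.
    return bytes([OPCODES[(mnemonic, "absolute")], address & 0xFF, address >> 8])


print(assemble_line("LDA #$05").hex(" "))   # a9 05
print(assemble_line("STA $0200").hex(" "))  # 8d 00 02
print(assemble_line("RTS").hex(" "))        # 60
```

Each line of text goes in, a fixed handful of bytes comes out, and nothing else happens in between.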

The instructions you will use all the time

With that model in mind, the instructions become natural. The 6510 has around 56 mnemonics in total, but a small set shows up in practically every program.

LDA and STA are the data movement pair. LDA loads a value into the accumulator A, STA writes the value of A to a memory address. Most of what enters and leaves the processor passes through them. The versions for X and Y follow the same pattern: LDX/STX and LDY/STY.

ADC and SBC are addition and subtraction. ADC adds a value, plus the carry flag, to what is already in A. SBC subtracts a value from A, taking a borrow into account through the same flag. The 6510 has no native multiplication or division. Those operations were implemented by hand, for example with loops of repeated additions or subtractions. Compilers still do the same kind of thing today on architectures that lack those instructions.
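As a rough illustration of multiplication by repeated addition, here is a Python sketch that mimics an 8-bit accumulator. The function name `multiply_by_repeated_addition` is made up for this example, and the result is truncated to 8 bits for simplicity. A real 6510 routine would normally keep a 16-bit result across two bytes.

```python
def multiply_by_repeated_addition(a, b):
    """Multiply two 8-bit values by adding `a` to a running total `b` times."""
    total = 0
    counter = b
    while counter != 0:
        # Add and keep only the low 8 bits, like an 8-bit accumulator would.
        total = (total + a) & 0xFF
        # Count down toward zero, the way DEX would on a loop counter.
        counter -= 1
    return total


print(multiply_by_repeated_addition(6, 7))    # 42
print(multiply_by_repeated_addition(20, 20))  # 144, because 400 does not fit in 8 bits
```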

INX, INY, DEX, DEY increment and decrement the X and Y registers. They are the i++ and i-- of assembly, and appear in practically every loop.

JMP, JSR, and RTS control the flow of execution. JMP is an unconditional goto. The Program Counter jumps to the specified address and the processor continues from there. JSR calls a subroutine, saving the return address on the stack. RTS returns from that subroutine, reading the address back from the stack and going back to where it came from. It is the mechanism behind every function call in assembly.

When JSR jumps to a subroutine, the processor has a problem. It needs to remember where to return to when RTS executes. The place it stores that information is the stack, a fixed region of memory from $0100 to $01FF that works like a physical stack of plates.

JSR pushes the two-byte return address onto the top of the stack and the SP moves down by two bytes to reflect that something was added. RTS reads the address back off the top, the SP moves back up by two bytes, and execution continues from where the call was made. Each JSR pushes, each RTS pops, always balanced.

This is why nested subroutine calls work without breaking anything. Each call pushes a new return address, each return pops one. The stack holds 256 bytes, enough for around 128 nested calls before overflowing.
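The push and pop behavior can be sketched in a few lines of Python. The class `Stack6510` and its method names are invented for this illustration, and the starting SP value of $FF is an assumption of the sketch. One simplification is worth flagging: the real chip stores the return address minus one and RTS adds the one back, while this sketch stores the address directly.

```python
class Stack6510:
    """Toy model of the 6510 stack page ($0100-$01FF) and its stack pointer."""

    def __init__(self):
        self.ram = bytearray(0x10000)
        self.sp = 0xFF  # offset inside page $01; assumed empty-stack value

    def push(self, value):
        self.ram[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF  # SP moves down after each byte

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF  # SP moves up before each read
        return self.ram[0x0100 + self.sp]

    def jsr(self, return_address):
        # Two bytes per call: high byte first, then low byte.
        self.push(return_address >> 8)
        self.push(return_address & 0xFF)

    def rts(self):
        low = self.pull()
        high = self.pull()
        return (high << 8) | low


cpu = Stack6510()
cpu.jsr(0x0810)         # outer call
cpu.jsr(0x0950)         # nested call
print(hex(cpu.sp))      # 0xfb, four bytes below the start
print(hex(cpu.rts()))   # 0x950, the inner call returns first
print(hex(cpu.rts()))   # 0x810
print(hex(cpu.sp))      # 0xff, balanced again
```

Two bytes per call is where the figure of around 128 nested calls comes from.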

The detail that trips everyone up, `#` or no `#`

There is a small syntactic distinction that causes more confusion than anything else when starting out. The # symbol before a value completely changes what the instruction does.

LDA #$05    ; A = 5  (the number five itself)
LDA $05     ; A = RAM[$05]  (whatever is stored at address 5)

With #, you are working with the literal value. Without #, you are working with the contents of the memory address. In Python, the difference would be between x = 5 and x = RAM[5].

In practice, LDA #$00 zeroes the accumulator. LDA $00 loads into the accumulator whatever is stored at position zero of the Zero Page, which could be anything depending on the program. They are completely different instructions that differ by a single character.
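The two forms are not just different in meaning, they assemble to different opcodes: $A9 for LDA with a literal value and $A5 for LDA from a Zero Page address. The Python sketch below models only those two cases. The function `lda` and the value placed at address $05 are hypothetical, chosen just to make the difference visible.

```python
ram = bytearray(0x10000)
ram[0x05] = 0x2A  # hypothetical value sitting at address $05


def lda(opcode, operand):
    """Return the new value of A for the two LDA forms discussed here."""
    if opcode == 0xA9:       # LDA #$nn: the operand is the value itself
        return operand
    if opcode == 0xA5:       # LDA $nn: the operand is a Zero Page address
        return ram[operand]
    raise ValueError(f"opcode ${opcode:02X} is not modeled in this sketch")


print(hex(lda(0xA9, 0x05)))  # 0x5, the literal five
print(hex(lda(0xA5, 0x05)))  # 0x2a, whatever is stored at address $05
```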

A complete program, line by line

This program loads the number 5 into the accumulator and stores it at address $0200:

LDA #$05    ; A = 5
STA $0200   ; RAM[$0200] = A
RTS         ; return

Simple enough. But the interesting part is seeing what this becomes in memory. Each instruction occupies a different number of bytes depending on how much information it needs to carry:

LDA #$05 becomes 2 bytes, the opcode $A9 followed by the value $05
STA $0200 becomes 3 bytes, the opcode $8D, the low byte of the address $00, and the high byte $02
RTS becomes 1 byte, the opcode $60, alone

Six bytes total. That is the entire program. Assuming it is loaded starting at $0801, the processor reads the first byte at $0801 and executes the LDA. When it advances to $0803, it executes the STA. When it gets to $0806, it returns. There is nothing beyond that.
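To watch those six bytes being executed, here is a minimal fetch-and-execute sketch in Python. It understands only the three opcodes used in this program, and the names `run`, `PROGRAM`, and `LOAD_ADDRESS` are invented for the illustration. It is not a 6510 emulator, just the read-interpret-advance loop reduced to its bare minimum.

```python
PROGRAM = bytes([0xA9, 0x05, 0x8D, 0x00, 0x02, 0x60])
LOAD_ADDRESS = 0x0801


def run(ram, pc):
    """Execute from `pc` until RTS; return (A, address where RTS was found)."""
    a = 0
    while True:
        opcode = ram[pc]
        if opcode == 0xA9:        # LDA immediate: 2 bytes
            a = ram[pc + 1]
            pc += 2
        elif opcode == 0x8D:      # STA absolute: 3 bytes, address low byte first
            address = ram[pc + 1] | (ram[pc + 2] << 8)
            ram[address] = a
            pc += 3
        elif opcode == 0x60:      # RTS: 1 byte, stop the sketch here
            return a, pc
        else:
            raise ValueError(f"opcode ${opcode:02X} is not modeled in this sketch")


ram = bytearray(0x10000)
ram[LOAD_ADDRESS:LOAD_ADDRESS + len(PROGRAM)] = PROGRAM

a, rts_address = run(ram, LOAD_ADDRESS)
print(hex(ram[0x0200]))  # 0x5
print(hex(rts_address))  # 0x806
```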

Notice the order of the address bytes in STA. The low byte comes before the high byte. $0200 is stored as $00 $02, not $02 $00. This is called little-endian, and it is a characteristic of the entire 6502 family. It will appear every time an address is stored in two consecutive bytes in memory.
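The byte order is easy to check in Python, whose integers can be converted to and from bytes in either order. The helper names `split_address` and `join_address` are made up for this sketch.

```python
def split_address(address):
    """Return the two bytes of a 16-bit address in 6502 order: low, then high."""
    return address.to_bytes(2, "little")


def join_address(low, high):
    """Rebuild a 16-bit address from its low and high bytes."""
    return int.from_bytes(bytes([low, high]), "little")


print(split_address(0x0200).hex(" "))  # 00 02
print(hex(join_address(0x00, 0x02)))   # 0x200
```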

FUN FACT!

Before the assembler, there was the monitor

Back in the C64 era, having an assembler was not guaranteed. The computer came with BASIC in ROM and nothing else. For anyone who wanted to write assembly without an external tool, the alternative was the machine monitor, a utility that let you view and edit memory byte by byte, directly in hexadecimal.

On the C64, the monitor was typically loaded from a cartridge or disk, or accessed by jumping to a specific ROM address using SYS. Once inside, you saw something like this:

.M 0801
0801: 00 00 00 00 00 00 00 00

To write the program above manually:

.M 0801
0801: A9 05 8D 00 02 60

You typed the opcodes in hex, one by one, knowing by heart what each number meant. $A9 is LDA immediate, $8D is STA absolute, $60 is RTS. Nothing was there to check whether you had made a mistake. If you typed $A8 instead of $A9, the program would do something completely different without any warning.

Many programmers of the era operated this way by choice, not for lack of options. It was faster than loading an external assembler from a floppy disk, and whoever worked this way knew the instruction set cold.

In the next post we put all of this together and write something that actually does something visible. A loop that fills the screen, using what we now know about registers, memory, instructions, and the execution cycle.
