Welcome back. Continuing from our last post, today we’re going to peek behind the curtain and see how the Little Computer 3 (LC-3) actually “understands” our instructions. Imagine you’re writing a letter to someone who only reads binary: you’d need a very specific format for them to understand your message. That’s exactly what we’re doing when we write LC-3 assembly code: we’re writing human-readable instructions that need to be translated into the 1s and 0s that the computer understands.
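Here is a simple example. When you write ADD R1, R2, R3 in assembly, you’re saying “take the values in registers R2 and R3, add them together, and put the result in R1.” But the computer needs this instruction packaged in a very specific way: as a 16-bit pattern that tells it exactly what to do. Think of it like filling out a standardized form where each box has a specific meaning:
- Bits 15-12: 0001 (the opcode for ADD)
- Bits 11-9: 001 (destination register, R1)
- Bits 8-6: 010 (first source register, R2)
- Bit 5: 0 (register mode)
- Bits 4-3: 00 (unused)
- Bits 2-0: 011 (second source register, R3)
Put together, the instruction is 0001 001 010 0 00 011, or x1283.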
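To make the "standardized form" concrete, here is a minimal Python sketch that packs the fields of a register-mode ADD into one 16-bit word. The function name `encode_add_reg` is our own invention for illustration, not part of any LC-3 tool.

```python
def encode_add_reg(dr: int, sr1: int, sr2: int) -> int:
    """Pack ADD DR, SR1, SR2 (register mode) into a 16-bit word."""
    for reg in (dr, sr1, sr2):
        if not 0 <= reg <= 7:
            raise ValueError(f"register out of range: {reg}")
    opcode = 0b0001  # ADD
    # Bits 5-3 (mode bit and the two unused bits) stay 0 in register mode.
    return (opcode << 12) | (dr << 9) | (sr1 << 6) | sr2


word = encode_add_reg(1, 2, 3)  # ADD R1, R2, R3
print(f"{word:016b}")           # 0001001010000011
print(f"x{word:04X}")           # x1283
```

Each shift simply slides a field into its box on the form, and the bitwise OR glues the boxes together.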
Quick Refresher: The Building Blocks
Before we dive deeper, let’s understand what we’re working with. Think of the LC-3 as a small office with these key components:
- Eight filing cabinets (our registers R0-R7) for quick access to values we’re working with.
- A massive storage room with exactly 65,536 numbered shelves (our memory locations). Why 65,536? Because with a 16-bit address, we can specify any location from x0000 to xFFFF—that’s 2^16 unique addresses. Each shelf holds one 16-bit value.
- A set of standardized procedures (instructions) for moving and processing information. Each procedure starts with a 4-bit code (opcode) that tells us what kind of operation we’re performing.
This setup gives us a perfect balance of speed (registers) and space (memory), while keeping our instructions simple and uniform—every instruction fits in exactly 16 bits.
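As a quick sanity check on those numbers, the sketch below computes the memory size from the address width and pulls the 4-bit opcode out of a 16-bit word. The `OPCODES` table and the `opcode_of` helper are hypothetical names of ours, and the table lists only the instructions this post covers.

```python
ADDRESS_BITS = 16
MEMORY_SIZE = 2 ** ADDRESS_BITS  # one shelf per possible 16-bit address

# Opcodes for the instructions discussed in this post.
OPCODES = {
    0b0001: "ADD", 0b0101: "AND", 0b1001: "NOT",
    0b0010: "LD", 0b0011: "ST", 0b1110: "LEA",
    0b0000: "BR", 0b1100: "JMP", 0b0100: "JSR",
    0b1111: "TRAP",
}


def opcode_of(word: int) -> int:
    """Return the top four bits of a 16-bit instruction word."""
    return (word >> 12) & 0xF


print(MEMORY_SIZE)                 # 65536
print(OPCODES[opcode_of(0x1283)])  # ADD
```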
Basic Arithmetic: ADD and AND
Let’s start with the most fundamental operations. These are like the basic arithmetic you do every day, just formatted in a very specific way for the computer.
The ADD Instruction
ADD comes in two flavors, just like you might add either two numbers from your calculator’s memory or a number from memory plus a small constant. In LC-3 terms:
- Register Mode: ADD R3, R1, R2 (Add the contents of R1 and R2)
- Immediate Mode: ADD R3, R1, #5 (Add 5 to the contents of R1)
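Let’s see how ADD R3, R1, #5 gets encoded:
- Bits 15-12: 0001 (the opcode for ADD)
- Bits 11-9: 011 (destination register, R3)
- Bits 8-6: 001 (source register, R1)
- Bit 5: 1 (immediate mode)
- Bits 4-0: 00101 (the constant 5 as a 5-bit two’s complement value, which is why immediates range from -16 to 15)
The full pattern is 0001 011 001 1 00101, or x1665.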
When would you use this? Imagine you’re counting items in a loop: you’d use ADD R0, R0, #1 to increment your counter by 1.
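Here is a small Python sketch of the immediate-mode encoding, including how a negative constant is squeezed into five bits. The helper name `encode_add_imm` is an assumption of ours, chosen for this example.

```python
def encode_add_imm(dr: int, sr1: int, imm: int) -> int:
    """Pack ADD DR, SR1, #imm (immediate mode) into a 16-bit word."""
    if not (0 <= dr <= 7 and 0 <= sr1 <= 7):
        raise ValueError("register out of range")
    if not -16 <= imm <= 15:
        raise ValueError(f"immediate does not fit in 5 bits: {imm}")
    imm5 = imm & 0x1F  # keep the low 5 bits (two's complement)
    return (0b0001 << 12) | (dr << 9) | (sr1 << 6) | (1 << 5) | imm5


print(f"x{encode_add_imm(3, 1, 5):04X}")   # x1665  (ADD R3, R1, #5)
print(f"x{encode_add_imm(0, 0, 1):04X}")   # x1021  (ADD R0, R0, #1)
print(f"x{encode_add_imm(0, 0, -1):04X}")  # x103F  (ADD R0, R0, #-1)
```

The only differences from register mode are bit 5, which is set to 1, and the last five bits, which hold the constant instead of a second register.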
The AND Operation
AND uses the same two formats as ADD (register mode and immediate mode), with opcode 0101 instead of 0001, but performs a bitwise AND operation instead of addition. It’s particularly useful when you need to check specific bits in a value, like checking if a number is odd by ANDing it with 1.
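The odd-number check can be sketched in Python as follows. The instruction shown, AND R1, R0, #1, is just one way to write the test, and both helper names are ours.

```python
def encode_and_imm(dr: int, sr1: int, imm: int) -> int:
    """Pack AND DR, SR1, #imm into a 16-bit word (same layout as ADD)."""
    if not -16 <= imm <= 15:
        raise ValueError(f"immediate does not fit in 5 bits: {imm}")
    return (0b0101 << 12) | (dr << 9) | (sr1 << 6) | (1 << 5) | (imm & 0x1F)


def is_odd(value: int) -> bool:
    """Mirror what AND with 1 computes: only the lowest bit survives."""
    return (value & 1) == 1


print(f"x{encode_and_imm(1, 0, 1):04X}")  # x5221  (AND R1, R0, #1)
print(is_odd(7), is_odd(10))              # True False
```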
The NOT Operation
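NOT is a simpler operation that flips all the bits in a value, turning every 1 to 0 and every 0 to 1. The instruction takes a destination register and a single source register, in the form NOT DR, SR. For example, NOT R3, R1 writes the bitwise complement of R1 into R3.
In machine code, NOT has a unique pattern: opcode 1001, then the 3-bit destination register, the 3-bit source register, and six 1 bits. NOT R3, R1 therefore encodes as 1001 011 001 111111, or x967F.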
You might use NOT when you need to find the opposite of a binary pattern, or as part of calculating other operations like subtraction (by NOTing a number and adding 1, you get its negative).
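The "NOT, then add 1" trick is easy to try out in Python if we mask every result to 16 bits, the way an LC-3 register would. The function names below are hypothetical helpers for this sketch.

```python
MASK16 = 0xFFFF


def encode_not(dr: int, sr: int) -> int:
    """Pack NOT DR, SR into a 16-bit word; the low six bits are all 1s."""
    return (0b1001 << 12) | (dr << 9) | (sr << 6) | 0b111111


def lc3_not(value: int) -> int:
    """Flip all 16 bits of a register value."""
    return ~value & MASK16


def negate(value: int) -> int:
    """Two's complement negation: NOT the value, then add 1."""
    return (lc3_not(value) + 1) & MASK16


print(f"x{encode_not(3, 1):04X}")  # x967F  (NOT R3, R1)
print(f"x{lc3_not(0x00FF):04X}")   # xFF00
print(f"x{negate(5):04X}")         # xFFFB, the 16-bit pattern for -5
```

Adding `negate(5)` to 5 and masking to 16 bits gives 0, which is why subtraction can be built from NOT and ADD.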
Working with Memory: Loading and Storing
Now, let’s talk about moving data between our registers and memory. Think of this like retrieving files from your storage room (loading) or filing them away (storing).
Loading Values (LD)
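When you write an instruction such as LD R0, MESSAGE, the LC-3 needs to know how far away MESSAGE is from our current position. We call this the “offset,” and it’s like saying “walk forward 4 shelves” or “go back 3 shelves” in our storage room analogy. The offset is counted from the address of the next instruction (the already-incremented PC) and must fit in 9 signed bits, so it can range from -256 to +255. The assembler calculates this distance for us and encodes it in the instruction, right after the opcode 0010 and the 3-bit destination register.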
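Here is a rough sketch of the offset calculation an assembler might perform for LD. The addresses are made up for illustration, and `pc_offset` and `encode_ld` are names of our own.

```python
def pc_offset(instr_addr: int, target_addr: int, bits: int = 9) -> int:
    """Distance from the incremented PC to the target, range-checked."""
    offset = target_addr - (instr_addr + 1)
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= offset <= high:
        raise ValueError(f"offset {offset} does not fit in {bits} bits")
    return offset


def encode_ld(dr: int, instr_addr: int, target_addr: int) -> int:
    """Pack LD DR, LABEL given the instruction and label addresses."""
    offset = pc_offset(instr_addr, target_addr)
    return (0b0010 << 12) | (dr << 9) | (offset & 0x1FF)


# "Walk forward 4 shelves": LD at x3000, label at x3005.
print(pc_offset(0x3000, 0x3005))                # 4
print(f"x{encode_ld(0, 0x3000, 0x3005):04X}")   # x2004
# "Go back 3 shelves": LD at x3010, label at x300E.
print(pc_offset(0x3010, 0x300E))                # -3
print(f"x{encode_ld(0, 0x3010, 0x300E):04X}")   # x21FD
```

A label that sits too far away triggers the range check, which is the situation where real programs reach for a different addressing mode.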
Storing Values (ST)
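ST is the opposite of LD: it takes what’s in a register and saves it to memory. You’ll use this when you need to save results for later use. For example, ST R0, RESULT (with RESULT standing in for whatever label you gave the destination) copies the value in R0 to that memory location. The encoding mirrors LD, with opcode 0011 followed by the 3-bit source register and a 9-bit PC-relative offset.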
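To see what the machine does with an ST word, here is a minimal sketch that decodes one and writes to a dictionary standing in for memory. The register contents, addresses, and function names are all assumptions made for the example.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def execute_st(word: int, instr_addr: int, regs: list, memory: dict) -> int:
    """Carry out one ST instruction and return the address written."""
    if (word >> 12) != 0b0011:
        raise ValueError("not an ST instruction")
    sr = (word >> 9) & 0x7
    offset = sign_extend(word & 0x1FF, 9)
    address = (instr_addr + 1 + offset) & 0xFFFF  # relative to incremented PC
    memory[address] = regs[sr]
    return address


regs = [0, 42, 0, 0, 0, 0, 0, 0]  # R1 holds 42
memory = {}
# x3202 is 0011 001 000000010: ST R1 with offset +2.
address = execute_st(0x3202, 0x3000, regs, memory)
print(f"x{address:04X}", memory[address])  # x3003 42
```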
Getting Addresses (LEA)
Sometimes you don’t want the contents of a memory location; you want its address instead. That’s where LEA (Load Effective Address) comes in. Think of it like getting the shelf number instead of what’s on the shelf. For example, LEA R0, MESSAGE puts the address of MESSAGE into R0 without reading memory at all.
This is particularly useful when you’re working with strings or arrays and need to remember where they start in memory. The machine code follows the same pattern as LD, but with opcode 1110.
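The difference between LD and LEA comes down to one memory read, as this sketch tries to show. The memory contents and helper names are invented for illustration.

```python
def effective_address(instr_addr: int, offset: int) -> int:
    """Address an instruction refers to: incremented PC plus offset."""
    return (instr_addr + 1 + offset) & 0xFFFF


def ld(memory: dict, instr_addr: int, offset: int) -> int:
    """LD returns what is ON the shelf."""
    return memory.get(effective_address(instr_addr, offset), 0)


def lea(instr_addr: int, offset: int) -> int:
    """LEA returns the shelf NUMBER and never touches memory."""
    return effective_address(instr_addr, offset)


def encode_lea(dr: int, offset: int) -> int:
    """Same layout as LD, but with opcode 1110."""
    return (0b1110 << 12) | (dr << 9) | (offset & 0x1FF)


memory = {0x3005: ord("H")}            # first character of a string
print(ld(memory, 0x3000, 4))           # 72, the character code for 'H'
print(f"x{lea(0x3000, 4):04X}")        # x3005, where the string starts
print(f"x{encode_lea(0, 4):04X}")      # xE004  (LEA R0 with offset +4)
```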
Control Flow: Making Decisions
Sometimes your program needs to make decisions or jump to different sections. LC-3 provides several ways to do this.
Branching (BR)
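Branching is like a road sign that says “if condition X is true, go this way.” For example, BRz DONE jumps to the label DONE only if the last result was zero.
The BR instruction looks at the “condition codes” (negative, zero, or positive) set by the most recent instruction that wrote a value to a register, and decides whether to jump or continue straight ahead. In machine code, it is opcode 0000, followed by three flag bits (n, z, p) and a 9-bit PC-relative offset.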
For example, BRz sets only the ‘z’ bit (010), while BRnzp sets all three bits (111). You can combine these flags any way you need: BRnp would jump if the last result was either negative or positive (but not zero).
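The flag combinations can be sketched in Python like this. We assume the flags arrive as a string such as "z" or "np", and both function names are our own.

```python
def encode_br(flags: str, offset: int) -> int:
    """Pack BR<flags> with a 9-bit PC-relative offset (opcode 0000)."""
    if not flags or set(flags) - set("nzp"):
        raise ValueError(f"bad condition flags: {flags!r}")
    if not -256 <= offset <= 255:
        raise ValueError(f"offset does not fit in 9 bits: {offset}")
    n = 0b100 if "n" in flags else 0
    z = 0b010 if "z" in flags else 0
    p = 0b001 if "p" in flags else 0
    return ((n | z | p) << 9) | (offset & 0x1FF)


def branch_taken(nzp: int, result: int) -> bool:
    """Would a branch with these flag bits jump after this signed result?"""
    if result < 0:
        current = 0b100
    elif result == 0:
        current = 0b010
    else:
        current = 0b001
    return (nzp & current) != 0


print(f"x{encode_br('z', 3):04X}")     # x0403
print(f"x{encode_br('nzp', -2):04X}")  # x0FFE
print(branch_taken(0b101, 0))          # False: BRnp skips a zero result
print(branch_taken(0b101, -4))         # True
```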
Jumping Around (JMP and RET)
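Sometimes you want to jump unconditionally to a specific location. The JMP instruction (opcode 1100) does exactly this: it takes a single register and continues execution at the address that register holds. For example, JMP R2 encodes as 1100 000 010 000000, with the register number in bits 8-6 and every other operand bit zero.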
RET, which we use to return from subroutines, is actually just a special case of JMP: specifically, JMP R7. Since we always store our return address in R7, this gets us back to where we came from.
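A tiny sketch makes the "RET is just JMP R7" point visible in the bits. The function name `encode_jmp` and the constant `RET` are names we picked for this example.

```python
def encode_jmp(base_reg: int) -> int:
    """Pack JMP BaseR: opcode 1100, register number in bits 8-6."""
    if not 0 <= base_reg <= 7:
        raise ValueError(f"register out of range: {base_reg}")
    return (0b1100 << 12) | (base_reg << 6)


RET = encode_jmp(7)  # RET is the same word as JMP R7

print(f"x{encode_jmp(2):04X}")  # xC080  (JMP R2)
print(f"x{RET:04X}")            # xC1C0  (RET)
```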
Subroutine Calls (JSR/JSRR)
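Sometimes you want to jump to another part of your program but remember where you came from, like marking your page in a book before checking the index. JSR (Jump to Subroutine) and JSRR (Jump to Subroutine Register) handle this. Both use opcode 0100, and bit 11 tells them apart: JSR sets it to 1 and carries an 11-bit PC-relative offset to a label, while JSRR clears it and names a base register in bits 8-6 that holds the subroutine’s address.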
This is particularly useful for code you use repeatedly, like printing numbers or handling input. The computer automatically saves the return address in R7 (that’s why we call R7 the “link register”).
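Here is a minimal sketch of both encodings, plus the bookkeeping that happens on a call. The `call` helper is a simplification of ours that models only the R7 update and the jump.

```python
def encode_jsr(offset: int) -> int:
    """Pack JSR: opcode 0100, bit 11 set, 11-bit PC-relative offset."""
    if not -1024 <= offset <= 1023:
        raise ValueError(f"offset does not fit in 11 bits: {offset}")
    return (0b0100 << 12) | (1 << 11) | (offset & 0x7FF)


def encode_jsrr(base_reg: int) -> int:
    """Pack JSRR BaseR: opcode 0100, bit 11 clear, register in bits 8-6."""
    if not 0 <= base_reg <= 7:
        raise ValueError(f"register out of range: {base_reg}")
    return (0b0100 << 12) | (base_reg << 6)


def call(instr_addr: int, target: int, regs: list) -> int:
    """Save the return address in R7, then return the new PC."""
    regs[7] = (instr_addr + 1) & 0xFFFF
    return target


regs = [0] * 8
new_pc = call(0x3000, 0x3006, regs)
print(f"x{encode_jsr(5):04X}")              # x4805
print(f"x{encode_jsrr(3):04X}")             # x40C0
print(f"x{new_pc:04X}", f"x{regs[7]:04X}")  # x3006 x3001
```

When the subroutine finishes with RET, execution resumes at the address left in R7, which is the instruction right after the call.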
A Complete Example
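Let’s put it all together with a simple program that counts down from 5 to 0. One way to write it takes five instructions starting at x3000:
- AND R0, R0, #0 clears R0.
- ADD R0, R0, #5 sets the counter to 5.
- ADD R0, R0, #-1, labeled LOOP, subtracts 1 from the counter.
- BRp LOOP jumps back to LOOP while the counter is still positive.
- HALT stops the program once the counter reaches 0.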
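To watch a countdown like this run, here is a deliberately tiny simulator sketch in Python. It assumes one possible hand-assembled version of the program (clear R0, add 5, then subtract 1 and branch while positive) and understands only the handful of instructions that version needs, so treat it as an illustration and not a full LC-3.

```python
# Hand-assembled countdown, assuming the program starts at x3000.
PROGRAM = {
    0x3000: 0x5020,  # AND R0, R0, #0    ; clear R0
    0x3001: 0x1025,  # ADD R0, R0, #5    ; R0 = 5
    0x3002: 0x103F,  # LOOP: ADD R0, R0, #-1
    0x3003: 0x03FE,  # BRp LOOP          ; offset -2
    0x3004: 0xF025,  # HALT (TRAP x25)
}


def sign_extend(value: int, bits: int) -> int:
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def run(memory: dict, start: int = 0x3000, max_steps: int = 1000) -> list:
    """Run until HALT; return every value that ADD wrote to a register."""
    regs, pc, nzp, trace = [0] * 8, start, 0b010, []
    for _ in range(max_steps):
        word = memory[pc]
        pc = (pc + 1) & 0xFFFF              # PC is incremented first
        opcode = word >> 12
        if opcode in (0b0001, 0b0101):      # ADD or AND
            dr, sr1 = (word >> 9) & 7, (word >> 6) & 7
            if word & 0x20:                 # immediate mode
                operand = sign_extend(word & 0x1F, 5)
            else:                           # register mode
                operand = regs[word & 7]
            if opcode == 0b0001:
                result = (regs[sr1] + operand) & 0xFFFF
                trace.append(result)
            else:
                result = regs[sr1] & operand & 0xFFFF
            regs[dr] = result
            # Update condition codes from the value just written.
            nzp = 0b010 if result == 0 else 0b100 if result & 0x8000 else 0b001
        elif opcode == 0b0000:              # BR
            if (word >> 9) & 7 & nzp:
                pc = (pc + sign_extend(word & 0x1FF, 9)) & 0xFFFF
        elif word == 0xF025:                # HALT
            return trace
        else:
            raise ValueError(f"unsupported instruction x{word:04X}")
    raise RuntimeError("step limit reached")


print(run(PROGRAM))  # [5, 4, 3, 2, 1, 0]
```

Notice that the branch offset is applied to the already-incremented PC, which is why jumping from x3003 back to x3002 needs an offset of -2.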
System Calls and Pseudo-Ops
TRAP Instructions
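The LC-3 provides several built-in routines through TRAP instructions. Think of these as pre-written helper functions that handle common tasks:
- OUT (TRAP x21): write the character in R0 to the console
- IN (TRAP x23): prompt for a character and read it into R0
- HALT (TRAP x25): stop the program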
Each TRAP instruction has an 8-bit “trap vector” (like x21, x23, x25) that tells the computer which helper routine to use. In machine code, TRAP uses opcode 1111 followed by four unused zero bits and then the 8-bit vector number, so HALT (TRAP x25) is xF025.
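The TRAP encoding is simple enough to sketch in a few lines. The `TRAPS` table below lists only the three vectors mentioned above, and `encode_trap` is a name of our own.

```python
# Assembler names for the trap vectors mentioned in this post.
TRAPS = {"OUT": 0x21, "IN": 0x23, "HALT": 0x25}


def encode_trap(vector: int) -> int:
    """Pack TRAP: opcode 1111, four zero bits, then the 8-bit vector."""
    if not 0 <= vector <= 0xFF:
        raise ValueError(f"trap vector does not fit in 8 bits: {vector}")
    return (0b1111 << 12) | vector


for name, vector in TRAPS.items():
    print(name, f"x{encode_trap(vector):04X}")
# OUT xF021
# IN xF023
# HALT xF025
```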
Assembler Directives (Pseudo-ops)
While not actual CPU instructions, pseudo-ops help organize our program:
- .ORIG x3000: “Start putting the program here in memory”
- .END: “This is the end of our program”
- .FILL x1234: “Put this value (x1234) at this spot in memory”
- .BLKW #5: “Reserve 5 words of memory here”
- .STRINGZ "Hello": “Store this string here, with a zero at the end”
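To get a feel for what these directives produce, here is a rough sketch that expands three of them into memory words starting at an origin address. It assumes reserved .BLKW words are filled with zeros, which can vary between assemblers, and the function names are ours.

```python
def expand(directive: str, operand) -> list:
    """Turn one data directive into the list of words it occupies."""
    if directive == ".FILL":
        return [operand & 0xFFFF]
    if directive == ".BLKW":
        return [0] * operand                      # assumption: zero-filled
    if directive == ".STRINGZ":
        return [ord(ch) for ch in operand] + [0]  # one word per character
    raise ValueError(f"unsupported directive: {directive}")


def layout(origin: int, directives: list) -> dict:
    """Place each directive's words one after another, starting at origin."""
    image, address = {}, origin
    for directive, operand in directives:
        for word in expand(directive, operand):
            image[address] = word
            address += 1
    return image


image = layout(0x3000, [(".FILL", 0x1234), (".BLKW", 5), (".STRINGZ", "Hello")])
print(len(image))             # 12 words: 1 + 5 + 6
print(f"x{image[0x3000]:04X}")  # x1234
print(chr(image[0x3006]))     # H
print(image[0x300B])          # 0, the terminator after "Hello"
```

Each character of the string takes a full 16-bit word, so "Hello" occupies six shelves once the terminating zero is counted.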
Quick Reference Guide
Here’s a handy reference for the terms we’ve covered:
- Opcode: The 4-bit code that starts each instruction (like 0001 for ADD)
- Immediate Value: A constant number built right into the instruction
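- Offset: The distance to a memory location, counted from the address of the next instruction and used in PC-relative instructions such as LD, ST, LEA, BR, and JSR
- Condition Codes: Flags (negative/zero/positive) set whenever an instruction such as ADD, AND, NOT, or LD writes a value to a register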
- Trap Vector: An 8-bit code identifying which system routine to call
- Link Register: R7, used to store return addresses for subroutines
Wrapping Up: From Assembly to Bits
Understanding how assembly maps to machine code demystifies how computers work at their core. Each 16-bit instruction is like a tiny, self-contained message telling the CPU exactly what to do. While we’ve covered the basics here, in our next post we’ll dive into building an actual assembler that can translate assembly code into these precise patterns.
Whether we’re planning to write low-level code or work with high-level AI models, I believe this foundation helps us understand how computers actually process our instructions under the hood.
That’s it for today. Until next time, happy coding—and happy bit-wrangling!