# nand2tetris, Part 2.1

2022-08-25

Last week I completed nand2tetris part 1 and wrote [a blogpost about it](../nand2tetris_1). It was great fun; I liked how I got to build a computer from NAND gates. Without missing a beat, I went on to work on the sequel: nand2tetris, Part 2.

So we have a computer. Now what? Hang it on the wall for the lolz? Of course not — let's force the emulation of silicon to think.

Again, this course expects you to finish 6 projects (actually 7, but "module 0" is just a replica of project 04 from part 1). As of the time of writing, I've finished two of them. Over time I discovered that these two projects alone are already complicated enough. If I waited until I finished all six, I wouldn't have the motivation to cover them in one single monolithic blogpost. I find it more sensible to mark a checkpoint, write a blogpost, then carry on with project 09.

__SPOILER WARNING: this blogpost contains my solutions to many challenges. If you haven't taken the course but wish to, please do that before reading this.__

## Stack machine

Let's pull out an inventory of things we made in project 06:

- A computer built on the HACK architecture
- HACK assembly language
- An assembler from HACK assembly to machine code

What do we do now? Write Tetris in assembly? Of course not. We need a high-level language and a compiler. However, the authors consider the jump from assembly straight to a high-level language too steep. Therefore, we are introduced to an abstraction: the __stack machine__, also called the virtual machine (VM). Our ultimate task in projects 07 and 08 is to write a VM translator that turns VM commands into one single assembly file, in which we emulate the stack machine on HACK hardware.

What's a stack? Sure, it's a kind of data structure, but right now we are talking about a bunch of contiguous words in RAM and a pointer to the top. The stack starts at RAM[256] and can grow up to RAM[2047]. RAM[0], which our assembler also calls SP, contains the pointer to the top of our stack (although technically, it is the address of the topmost word plus one; more on that later). Initially, since the stack is empty, SP = 256.

Of course, our machine is much more than that, even in our first two projects. Our focus in this blogpost is to understand how the stack machine works, how it helps our computer be a computer, and how we implement it in HACK assembly.

Our complete RAM is charted as follows. It'll prove useful sooner or later.

![](img/nand2tetris_2.1/ram_areas.png)

Nomenclature: Except in assembly code, the symbols SP/LCL/etc. refer to the values stored in RAM[0]/RAM[1]/etc., and an asterisk followed by a symbol (with or without an offset) dereferences it C-style, e.g. \*(LCL+1) refers to RAM[LCL+1], i.e. RAM[RAM[1] + 1].
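One more piece of groundwork: nothing sets SP to 256 by magic. A minimal bootstrap sketch that a translator could emit at the top of its output (the full project 08 bootstrap also calls Sys.init, which I'm leaving out here):

```
// bootstrap: point SP at the base of the stack
@256
D=A
@SP
M=D
```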
## Memory segments

You can do many things to a stack, but the most elementary operations are push and pop. Let's say you want to push a number, say 420, onto the stack. All you need to do is:

- read SP
- go to RAM[SP]
- write 420
- increment SP by one

The VM command for this is `push constant 420`. You can repeat this as many times as you please, and SP will be kept up to date.

![SP=263, stack: 420 (7 times)](img/nand2tetris_2.1/stack_push.png)

What if we need to push some non-constant number, like RAM[420]? In this case, the VM imposes some restrictions. You cannot just ask for any address in RAM; you have to specify one of eight "memory segments". Four of them are tied to the four companions of SP: LCL, ARG, THIS, and THAT. In project 06 their purposes were never clearly stated, but they're actually pointers to useful chunks of memory at a given point in the program. To push from these segments, you follow those pointers.

Suppose LCL (RAM[1]) is 420, and I run this VM command: `push local 3`. The result is:

- SP is incremented by one
- \*(SP-1) is equal to RAM[420+3]

LCL is not an assemble-time constant, so the 420+3 is computed at runtime like this:

```
@LCL
D=M // D is 420
@3
A=D+A // A is 423
D=M // now D is RAM[423]
```

The other memory segments are dead simple, so I will not elaborate.

`pop` is the exact opposite of `push` and shares a similar syntax. It's worth mentioning that:

- You decrement SP _before_ copying the top of the stack into the memory segment
- There's no need to erase what you just popped off the stack
- You can't pop to the constant segment

<details markdown="1">
<summary markdown="1">__Watch me struggle to implement `pop local 3`__</summary>

All I have to do is copy \*(--SP) into \*(LCL+3) in assembly. Easy, right? Here goes:

```
// find out what's on top of stack
@SP
AM=M-1
D=M // D = RAM[SP]
// go to destination
@LCL
D=M
@3
A=D+A
M=D // RAM[LCL+3] = D
```

Do you spot the problem? When we did `D=M` the second time, we overwrote our copy of \*SP with LCL. And that's bad, because we need \*SP on the last line.

The solution is a bit convoluted. You need to:

- add 3 to LCL
- decrement SP by one
- copy \*SP into D
- go to \*LCL
- write D into \*LCL
- subtract 3 from LCL

The main difference is that D is no longer touched while it relays the data: the destination address is prepared in advance by temporarily adding 3 to LCL itself. The assembly is 14 instructions long, compared to 10 for `push local 3`, as a result of a limitation I call "not enough registers".

I would like to thank John Downey for [his implementation](https://github.com/jtdowney/nand2tetris/blob/master/07/commands/pop_command.rb), which I referred to when I was stuck.
</details>

OK, so we're done with pushing and popping. It doesn't sound very exciting, does it? After all, it's just copying stuff into other stuff. We still need to discover what we can actually do with our stack.

## Stack arithmetics

When we program, we will need to do math at some point (gasp, who could've thought). Thus, we will be implementing `add`, `sub` and `neg`. These commands do not take "parameters" like push and pop do, because they always operate on the top of the stack.
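Concretely, here's roughly what `add` compiles down to: pop one value into D, then add it into the word that is now on top. This is a sketch in the spirit of my translator's output; `sub` only differs in the final ALU operation.

```
@SP
AM=M-1  // SP--; A now points at the value that was on top
D=M     // D = that value
A=A-1   // move to the new top of the stack
M=D+M   // add D into it, in place
```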
Suppose we have two numbers, a and b, with b being the topmost word on our stack, i.e. \*(SP-1).

![SP=259, Stack: 420, a, b](img/nand2tetris_2.1/stack_before_add.png)

After we do `add`, the stack becomes this:

![SP=258, Stack: 420, a+b, b](img/nand2tetris_2.1/stack_after_add.png)

Similar to `pop`, stack arithmetic commands do not erase anything at or beyond the SP marker, because the stack machine only cares about what is _in_ the stack, i.e. addresses 256 through SP-1.

Now it should be clear what `sub` does; instead of a+b, it leaves a-b. `neg` takes only one operand, so the top of the stack simply gets negated, i.e. b becomes -b.

## Logical commands

I classify the so-called logical commands into two categories:

- Bitwise operators: `and`, `or`, and `not`
- Comparison: `lt`, `gt`, and `eq`

### Bitwise operators

According to Schocken, logical commands return booleans. There are two problems with this statement:

1. Booleans are indistinguishable from integers on the hardware layer;
2. _What is the numeric value of a boolean?_

Without thinking, I assumed false is 0 and true is 1, as in `stdbool.h`. However, upon testing, the CPU emulator told me that true is in fact -1 (or 65535 if you read it as unsigned). Not a single word was said about this specification in the lectures. I suppose the authors had it written somewhere in their book but forgot about it in the videos.

<details markdown="1">
<summary markdown="1">__In hindsight, this is obvious__</summary>

Suppose false is 0 and true is 1:

```
false = 0000 0000 0000 0000
true  = 0000 0000 0000 0001
```

Now we do some logical commands:

```
false & true = 0000 0000 0000 0000 = false
false | true = 0000 0000 0000 0001 = true
```

So far so good, but problems arise as soon as we do a bitwise NOT, which is the only native instruction related to inverting a boolean:

```
~false = 1111 1111 1111 1111 = true, probably?
~true  = 1111 1111 1111 1110 = definitely not false
```

Thus we have proven that, if true is 1, a bitwise NOT does not invert it to false. For that to work, true has to be 16 ones, i.e. -1.
</details>

### Comparison

We proceed to implement the comparison commands: `lt`, `gt`, and `eq`. They are extremely similar in logic:

- Pop the topmost word on the stack into D (SP is decremented whenever I say "pop")
- Pop the new topmost word and subtract D from it, so D ends up holding a-b (in the notation from earlier)
- Compare D to zero
- Push true onto the stack if [condition] holds, false otherwise

There are two things to figure out at this point:

- What [condition] is for each of the three commands
- How to make this "if … otherwise …" work

To be honest, I was reluctant to use branching in assembly, because that would mean generating a lot of labels. After quite a while of messing with the D register trying to find an "elegant" solution, I quit. Branches are not that bad.

<details markdown="1">
<summary markdown="1">__Sample assembly for `lt`__</summary>

Suppose the stack is once again in this state:

![SP=259, Stack: 420, a, b](img/nand2tetris_2.1/stack_before_add.png)

```
@SP
AM=M-1 // SP--
D=M // D = b
A=A-1
D=M-D // D = a-b
M=0 // RAM[257] = false
@END_LT_0
D;JGE // if (D >= 0) goto END_LT_0
@SP
A=M-1
M=-1 // RAM[257] = true
(END_LT_0)
```
</details>

`gt` and `eq` reuse the same skeleton; only the jump condition changes (`D;JLE` and `D;JNE` respectively).
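By contrast, the bitwise commands need no branching and no labels at all, since the ALU computes them directly. A sketch for `and`, again in the spirit of my translator's output:

```
@SP
AM=M-1  // SP--; A points at the value that was on top
D=M     // D = that value
A=A-1   // move to the new top
M=D&M   // AND it into the new top, in place
```

`or` is the same with `M=D|M`, and `not` is even shorter: `@SP`, `A=M-1`, `M=!M`.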
## Function commands

No language is practical without functions. Although I couldn't be more familiar with the notion of functions, their implementation remained a mystery to me. The fascinating thing about functions is that a call is not a simple goto; the program has to return to where it came from. In other words, every function call involves a context switch, which is reversed once the function returns.

Suppose we are in function f(x), and we're calling g(y). We are going to have two states: one used by f just before g is called, and one used by g.

![two blobs labeled "f's state" and "g's state"](img/nand2tetris_2.1/function_states_blobs.png)

What's in a function's state? As it turns out, it's four pointers and one stack:

![f/g's LCL/ARG/THIS/THAT](img/nand2tetris_2.1/function_states.png)

The neat thing about our VM is that there's only one global stack; each function's stack simply starts somewhere in the middle of it, rather than at 256.

Suppose I, function f, am about to call function g at a point in time when SP=301. I must first push the arguments it needs, let's say 60 and 9:

![SP=303, LCL=299, ARG=292, THIS=1024, THAT=2048, stack (from 301): 60, 9](img/nand2tetris_2.1/stack_before_call.png)

Now I execute `call g 2`, where the 2 is the number of arguments that g takes. But before I do anything risky, I need to back up my state. One piece of it is the "return address": the address of the instruction I would be executing next had I not called g.

<details markdown="1">
<summary markdown="1">__Wait, you're telling me we're predicting the future?__</summary>

It sounds like wizardry, but it's not. This is done with the assembler's symbol substitution: I insert a label (for example, `(Main$ret.0)`) just before the first instruction of the next VM command. This is where I would end up if `call g 2` weren't there. Knowing the assembler will take care of this, I just write:

```
@Main$ret.0
D=A
@SP
A=M
M=D // *SP is the return address
@SP
M=M+1 // SP++
// more code
@Main.g
0;JMP
(Main$ret.0)
// assembly translated from next VM command
```

If `0;JMP` is instruction 419, then the return address pushed is 420.
</details>

Then we just push LCL, ARG, THIS and THAT:

![SP=308, LCL=299, ARG=292, THIS=1024, THAT=2048, stack (from 301): 60, 9, 420, 299, 292, 1024, 2048](img/nand2tetris_2.1/stack_call_backup.png)

The section of memory from 303 to 307 (inclusive) is called the "frame" of f, because it contains all the information f needs to recover its state.

Now that the clones of LCL et al. are safely contained within the frame, we can commit all sorts of atrocities to the originals. ARG becomes the address of the first argument I pushed for g, namely 301; LCL becomes the first address after the frame ends. We will not be touching THIS and THAT just yet.

![SP=308, LCL=308, ARG=301, THIS=1024, THAT=2048, stack (from 301): 60, 9, 420, 299, 292, 1024, 2048](img/nand2tetris_2.1/stack_after_call.png)

At this point f is ready to jump to g. Suppose function g begins with this VM command: `function g 3`. The 3 here is not the number of arguments; it's the number of local variables g needs. This VM command therefore pushes that many zeroes onto the stack:

![SP=311, LCL=308, ARG=301, THIS=1024, THAT=2048, stack (from 301): 60, 9, 420, 299, 292, 1024, 2048, 0, 0, 0](img/nand2tetris_2.1/stack_after_function.png)

From now on, RAM[311-2047] is the stack that g has access to. It can also access `argument 0-1` and `local 0-2`.
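As a sketch, `function g 3` could translate to something like the following. The label name is illustrative, and this is the naive version (one push of zero per local), not the optimized three-part scheme I describe later:

```
(g)     // entry point of the function
@SP     // push a 0 for local 0
A=M
M=0
@SP
M=M+1
// ...the same five instructions repeated for local 1 and local 2
```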
A dozen commands later, g decides to return. First it copies its topmost word on the stack — the return value — to somewhere convenient for f to use. Then it exits and hands control back to f. In fact, g has no idea it came from f; it only knows where to find and restore the pre-saved context.

Suppose g pushed a few things, touched some local variables, and now decides to return.

![SP=313, LCL=308, ARG=301, THIS=1024, THAT=2048, stack (from 301): 60, 9, 420, 299, 292, 1024, 2048, 69, 51, 0, 2, 3519](img/nand2tetris_2.1/stack_before_return.png)

The return value is 3519, and we know that SP was 301 before f started pushing g's arguments — an address that is always equal to our current ARG. So we copy \*(SP-1) to \*ARG.

When the number of arguments is zero, \*ARG happens to be the slot where we stored the return address (420 in our example). We could overwrite it and lose context! That's why we need to back up the return address somewhere safe and jump via that copy; I chose R13 in my implementation and have had no problems yet.

After this is done, everything from LCL onward is useless, so we move SP down to LCL:

![SP=308, LCL=308, ARG=301, THIS=1024, THAT=2048, stack (from 301): 3519, 9, 420, 299, 292, 1024, 2048, 69, 51, 0, 2, 3519](img/nand2tetris_2.1/stack_before_restore.png)

Then we restore THAT, THIS, ARG and LCL by popping them one by one, and move SP to just past the return value.

![SP=302, LCL=299, ARG=292, THIS=1024, THAT=2048, stack (from 301): 3519, 9, 420, 299, 292, 1024, 2048, 69, 51, 0, 2, 3519](img/nand2tetris_2.1/stack_after_return.png)

Finally, we jump to the address saved in R13, which is 420. And thus this grand waltz of function call and return is complete.

## Writing the VM translator

As a person with limited wits and patience, I decided to write the VM translator in Python 3.10 to leverage the power of structural pattern matching, aka `match`/`case`, as described in [PEP 636](https://peps.python.org/pep-0636/). This means I can match VM commands like

```
match cmd.split():
    case []:
        continue  # blank line, nothing to translate
    case [("push" | "pop") as action, segment, index]:
        ...  # handle memory command
    case [("add" | "sub" | "neg" | "and" | "or" | "not") as command]:
        ...  # handle arithmetic / logical command
    # more cases here
    case _:
        raise ValueError(f"invalid VM command: {cmd!r}")
```

This level of syntactic elegance is, in my opinion, unmatched among all languages (pun intended). Another characteristic of the VM translator is that it's essentially context-free: each command can be translated on its own. In contrast, the assembler I wrote in project 06 has to read the entire file to decide whether a given symbol refers to a label or a variable. The translator does depend on _some_ global state, though, for example a counter used to generate unique labels for each `lt` command.

Over the course of the implementation, I realized that the hardest part is that you are programming in two languages at once: you are using one language to generate another. Of course, the assembly side is way harder (I forgot `A=M` _multiple times_; instead of debugging for hours in the CPU emulator, you should probably recheck whether your assembly matches your pseudocode in terms of pointer dereferencing). But the Python caveats showed up in unexpected places. For example, my optimized implementation of `function <name> <varc>` consists of three parts: snippet A, snippet B repeated (varc - 1) times, and snippet C. As you can imagine, things took a dramatic turn when varc hit zero. It turns out Python does not raise an exception for `snippet_b * (-1)`; it silently returns an empty string, which meant at least one local variable was initialized no matter what. The bug is fixed, but I learned one more reason why Python is weird.
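To make that pitfall concrete, here is a minimal Python sketch of the three-part scheme with the guard added. The snippet contents are illustrative stand-ins, not the actual strings from my translator:

```
def translate_function(name: str, varc: int) -> str:
    """Emit assembly for `function <name> <varc>` (illustrative sketch)."""
    snippet_a = f"({name})\n@SP\nA=M\nM=0\n"  # entry label, write 0 into local 0
    snippet_b = "@SP\nAM=M+1\nM=0\n"          # write 0 into the next local
    snippet_c = "@SP\nM=M+1\n"                # finally move SP past the locals
    if varc == 0:
        # Guard the edge case: snippet_b * (varc - 1) silently becomes ""
        # and the function would still get one local it never asked for.
        return f"({name})\n"
    return snippet_a + snippet_b * (varc - 1) + snippet_c
```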
### Code statistics

- 408 lines of code
- About half of that is multiline strings of assembly

## Is the stack machine any good?

The stack machine, or any kind of virtual machine used by industrial languages, is an intermediate step between high-level source code and machine code. Not every language needs one, though; C is a counterexample. Virtual machines offer you one more level of abstraction, at the cost of efficiency and size.

<details markdown="1">
<summary markdown="1">__One example of how the VM increases executable size__</summary>

Let's say we need to do:

```
push constant 1337
pop local 69
```

Our VM translator would output these lines:

```
// push constant 1337
@1337
D=A
@SP
A=M
M=D
@SP
M=M+1
// pop local 69
@69
D=A
@LCL
M=M+D
@SP
AM=M-1
D=M
@LCL
A=M
M=D
@69
D=A
@LCL
M=M-D
```

This is 21 instructions, but some of them are totally unnecessary, like incrementing SP only to decrement it right after. If we treat the two commands as one atomic operation and rewrite the assembly from scratch:

```
@69
D=A
@LCL
M=M+D
@1337
D=A
@LCL
A=M
M=D
@69
D=A
@LCL
M=M-D
```

Down to 13 instructions now.

</details>

The merit of a half-human, half-machine abstraction is increased control and security. Suppose a function declares itself as `function f 4` but then asks us to `pop local 9`. The VM translator could, in theory, refuse to translate such code, which could well be malicious (if it even runs, that is). Mine does not, because I want to keep it as simple as possible. Furthermore, the VM is a better troubleshooting tool than assembly when you're debugging your compiler.

## Conclusion

In projects 07 and 08 of nand2tetris part 2, we built a VM translator that translates code for an imaginary architecture into code for a concrete one. This added layer of abstraction bridges the gap between high-level source code and machine language. The course discusses the benefits and costs of this design in the Perspectives unit, but the specifics are out of scope here.

Philosophically, the lesson I learned is that in modern computer engineering, universality often beats efficiency. The stack machine is clever in that it is a complete abstraction of assembly. In other words, everything you expect a computer to do, if you don't care about speed, can be done on a stack machine without writing a single line of assembly. Incidentally, this also describes high-level languages. Efficiency is lost, yes, but there are two justifications:

1. Most applications are not efficiency-critical, and if yours is extremely so, you probably shouldn't be using a VM;
2. If you _really_ need efficiency on a VM, you can revise your translator to generate more efficient assembly.

I would like to say the following about this course:

- Schocken spoke in _every_ single lecture. Impressive.
- I'm not sure if I'm the only one, but I found the video lectures a lot more repetitive than those in part 1.
- At least one technical specification (booleans) was not communicated well.
- The project is arguably more challenging (intellectually, not syntactically) than writing the assembler.

That's all for projects 07 and 08; see you in Part 2.2!