2.9 State Machines and Indirect Jumps
Another control structure commonly found in assembly language programs is the state machine. A state machine uses a state variable to control program flow. The FORTRAN programming language provides this capability with the assigned GOTO statement. Certain variants of C (e.g., GNU's GCC from the Free Software Foundation) provide similar features. In assembly language, the indirect jump provides a mechanism to easily implement state machines.
So what is a state machine? In very basic terms, it is a piece of code1 that keeps track of its execution history by entering and leaving certain "states". For the purposes of this chapter, we'll not use a very formal definition of a state machine. We'll just assume that a state machine is a piece of code which (somehow) remembers the history of its execution (its state) and executes sections of code based upon that history.
In a very real sense, all programs are state machines. The CPU registers and values in memory constitute the "state" of that machine. However, we'll use a much more constrained view. Indeed, for most purposes only a single variable (or the value in the EIP register) will denote the current state.
Now let's consider a concrete example. Suppose you have a procedure which you want to perform one operation the first time you call it, a different operation the second time you call it, yet something else the third time you call it, and then something new again on the fourth call. After the fourth call it repeats these four different operations in order. For example, suppose you want the procedure to ADD EAX and EBX the first time, subtract them on the second call, multiply them on the third, and divide them on the fourth. You could implement this procedure as follows:procedure StateMachine; static State:byte := 0; begin StateMachine; cmp( State, 0 ); jne TryState1; // State 0: Add EBX to EAX and switch to state 1: add( ebx, eax ); inc( State ); exit StateMachine; TryState1: cmp( State, 1 ); jne TryState2; // State 1: subtract ebx from EAX and switch to state 2: sub( ebx, eax ); inc( State ); // State 1 becomes State 2. exit StateMachine; TryState2: cmp( State, 2 ); jne MustBeState3; // If this is state 2, multiply EAX by EAX and switch to state 3: intmul( ebx, eax ); inc( State ); // State 2 becomes State 3. exit StateMachine; // If it isn't one of the above states, we must be in state 3 // So divide eax by ebx and switch back to state zero. MustBeState3: push( edx ); // Preserve this `cause it gets whacked by DIV. xor( edx, edx ); // Zero extend EAX into EDX. div( ebx, edx:eax); pop( edx ); // Restore EDX's value preserved above. mov( 0, State ); // Reset the state back to zero. end StateMachine;
Technically, this procedure is not the state machine. Instead, it is the variable State and the CMP/JNE instructions which constitute the state machine.
There is nothing particularly special about this code. It's little more than a SWITCH statement implemented via the IF..THEN..ELSEIF construct. The only thing special about this procedure is that it remembers how many times it has been called2 and behaves differently depending upon the number of calls. While this is a correct implementation of the desired state machine, it is not particularly efficient. The astute reader, of course, would recognize that this code could be made a little faster using an actual SWITCH statement rather than the IF..THEN..ELSEIF implementation. However, there is a better way...
The more common implementation of a state machine in assembly language is to use an indirect jump. Rather than having a state variable which contains a value like zero, one, two, or three, we could load the state variable with the address of the code to execute upon entry into the procedure. By simply jumping to that address, the state machine could save the tests above needed to execute the proper code fragment. Consider the following implementation using the indirect jump:procedure StateMachine; static State:dword := &State0; begin StateMachine; jmp( State ); // State 0: Add EBX to EAX and switch to state 1: State0: add( ebx, eax ); mov( &State1, State ); exit StateMachine; State1: // State 1: subtract ebx from EAX and switch to state 2: sub( ebx, eax ); mov( &State2, State ); // State 1 becomes State 2. exit StateMachine; State2: // If this is state 2, multiply EAX by EAX and switch to state 3: intmul( ebx, eax ); mov( &State3, State ); // State 2 becomes State 3. exit StateMachine; // State 3: divide eax by ebx and switch back to state zero. State3: push( edx ); // Preserve this `cause it gets whacked by DIV. xor( edx, edx ); // Zero extend EAX into EDX. div( ebx, edx:eax); pop( edx ); // Restore EDX's value preserved above. mov( &State0, State ); // Reset the state back to zero. end StateMachine;
The JMP instruction at the beginning of the StateMachine procedure transfers control to the location pointed at by the State variable. The first time you call StateMachine it points at the State0 label. Thereafter, each subsection of code sets the State variable to point at the appropriate successor code.
2.10 Spaghetti Code
One major problem with assembly language is that it takes several statements to realize a simple idea encapsulated by a single HLL statement. All too often an assembly language programmer will notice that s/he can save a few bytes or cycles by jumping into the middle of some program structure. After a few such observations (and corresponding modifications) the code contains a whole sequence of jumps in and out of portions of the code. If you were to draw a line from each jump to its destination, the resulting listing would end up looking like someone dumped a bowl of spaghetti on your code, hence the term "spaghetti code".
Spaghetti code suffers from one major drawback- it's difficult (at best) to read such a program and figure out what it does. Most programs start out in a "structured" form only to become spaghetti code at the altar of efficiency. Alas, spaghetti code is rarely efficient. Since it's difficult to figure out exactly what's going on, it's very difficult to determine if you can use a better algorithm to improve the system. Hence, spaghetti code may wind up less efficient than structured code.
While it's true that producing some spaghetti code in your programs may improve its efficiency (e.g., destructuring your code, see "Efficient Implementation of IF Statements in Assembly Language" on page 772), doing so should always be a last resort after you've tried everything else and you still haven't achieved what you need. Always start out writing your programs with straight-forward IFs and SWITCH statements. Start combining sections of code (via JMP instructions) once everything is working and well understood. Of course, you should never obliterate the structure of your code unless the gains are worth it.
A famous saying in structured programming circles is "After GOTOs, pointers are the next most dangerous element in a programming language." A similar saying is "Pointers are to data structures what GOTOs are to control structures." In other words, avoid excessive use of pointers. If pointers and gotos are bad, then the indirect jump must be the worst construct of all since it involves both GOTOs and pointers! Seriously though, the indirect jump instructions should be avoided for casual use. They tend to make a program harder to read. After all, an indirect jump can (theoretically) transfer control to any label within a program. Imagine how hard it would be to follow the flow through a program if you have no idea what a pointer contains and you come across an indirect jump using that pointer. Therefore, you should always exercise care when using jump indirect instructions.
1Note that state machines need not be software based. Many state machines' implementation are hardware based.
2Actually, it remembers how many times, MOD 4, that it has been called.