7.10 Interrupts and Polled I/O
Polling is constantly testing a port to see if data is available. That is, the CPU polls (asks) the port if it has data available or if it is capable of accepting data. The REPEAT..UNTIL loop in the previous section is a good example of polling. The CPU continually polls the port to see if the printer is ready to accept data. Polled I/O is inherently inefficient. Consider what happens in the previous section if the printer takes ten seconds to accept another byte of data - the CPU spins in a loop doing nothing (other than testing the printer status port) for those ten seconds.
In early personal computer systems, this is exactly how a program would behave; when it wanted to read a key from the keyboard it would poll the keyboard status port until a key was available. Such computers could not do other operations while waiting for the keyboard.
The solution to this problem is to provide an interrupt mechanism. An interrupt is an external hardware event (such as the printer becoming ready to accept another byte) that causes the CPU to interrupt the current instruction sequence and call a special interrupt service routine. (ISR). An interrupt service routine typically saves all the registers and flags (so that it doesn't disturb the computation it interrupts), does whatever operation is necessary to handle the source of the interrupt, it restores the registers and flags, and then it resumes execution of the code it interrupted. In many computer systems (e.g., the PC), many I/O devices generate an interrupt whenever they have data available or are able to accept data from the CPU. The ISR quickly processes the request in the background, allowing some other computation to proceed normally in the foreground.
An interrupt is essentially a procedure call that the hardware makes (rather than explicit call to some procedure, like a call to the stdout.put routine). The most important thing to remember about an interrupt is that it can pause the execution of some program at any point between two instructions when an interrupt occurs. Therefore, you typically have no guarantee that one instruction always executes immediately after another in the program because an interrupt could occur between the two instructions. If an interrupt occurs in the middle of the execution of some instruction, then the CPU finishes that instruction before transferring control to the appropriate interrupt service routine. However, the interrupt generally interrupts execution before the start of the next instruction1. Suppose, for example, that an interrupt occurs between the execution of the following two instructions:
add( i, eax );
<---- Interrupt occurs here.
mov( eax, j );
When the interrupt occurs, control transfers to the appropriate ISR that handles the hardware event. When that ISR completes and executes the IRET (interrupt return) instruction, control returns back to the point of interruption and execution of the original code continues with the instruction immediately after the point of interrupt (e.g., the MOV instruction above). Imagine an interrupt service routine that executes the following code:
mov( 0, eax );
If this ISR executes in response to the interrupt above, then the main program will not produce a correct result. Specifically, the main program should compute "j := eax +i;" Instead, it computes "j := 0;" (in this particular case) because the interrupt service routine sets EAX to zero, wiping out the sum of i and the previous value of EAX. This highlights a very important fact about ISRs: ISRs must preserve all registers and flags whose values they modify. If an ISR does not preserve some register or flag value, this will definitely affect the correctness of the programs running when an interrupt occurs. Usually, the ISR mechanism itself preserves the flags (e.g., the interrupt pushes the flags onto the stack and the IRET instruction restores those flags). However, the ISR itself is responsible for preserving any registers that it modifies.
Although the preceding discussion makes it clear that ISRs must preserve registers and the flags, your ISRs must exercise similar care when manipulating any other resources the ISR shares with other processes. This includes variables, I/O ports, etc. Note that preserving the values of such objects isn't always the correct solution. Many ISRs communicate their results to the foreground program using shared variables. However, as you will see, the ISR and the foreground program must coordinate access to shared resources or they may produce incorrect results. Writing code that correctly works with shared resources is a difficult challenge; the possibility of subtle bugs creeping into the program is very great. We'll consider some of these issues a little later in this chapter; the messy details will have to wait for a later volume of this text.
CPUs that support interrupts must provide some mechanism that allows the programmer to specify the address of the ISR to execute when an interrupt occurs. Typically, an interrupt vector is a special memory location that contains the address of the ISR to execute when an interrupt occurs. PCs typically support up to 16 different interrupts.
After an ISR completes its operation, it generally returns control to the foreground task with a special "return from interrupt" instruction. On the Y86 hypothetical processor, for example, the IRET (interrupt return) instruction handles this task. This same instruction does a similar task on the 80x86. An ISR should always end with this instruction so the ISR can return control to the program it interrupted.
7.11 Using a Circular Queue to Buffer Input Data from an ISR
A typical interrupt-driven input system uses the ISR to read data from an input port and buffer it up whenever data becomes available. The foreground program can read that data from the buffer at its leisure without losing any data from the port. A typical foreground/ISR arrangement appears in Figure 7.10. In this diagram the ISR reads a value from the peripheral device and then stores the data into a common buffer that the ISR shares with the foreground application. Sometime later, the foreground process removes the data from the buffer. If (during a burst of input) the device and ISR produce data faster than the foreground application reads data from the buffer, the ISR will store up multiple unread data values in the buffer. As long as the average consumption rate of the foreground process matches the average production rate of the ISR, and the buffer is large enough to hold bursts of data, there will be no lost data.
Figure 7.10 Interrupt Service Routine as a Data Produce/Application as a Data Consumer
If the foreground process in Figure 7.10 consumes data faster than the ISR produces it, sooner or later the buffer will become empty. When this happens the foreground process will have to wait for the background process to produce more data. Typically the foreground process would poll the data buffer (or, in a more advanced system, block execution) until additional data arrives. Then the foreground process can easily extract the new data from the buffer and continue execution.
There is nothing special about the data buffer. It is just a block of contiguous bytes in memory and a few additional pieces of information to maintain the list of data in the buffer. While there are lots of ways to maintain data in a buffer such as this one, probably the most popular technique is to use a circular buffer. A typical circular buffer implementation contains three objects: an array that holds the actual data, a pointer to the next available data object in the buffer, and a length value that specifies how many objects are currently in the buffer.
Later in this text you will see how to declare and use arrays. However, in the chapter on Memory Access you saw how to allocate a block of data in the STATIC section (see "The Static Sections" on page167) or how to use malloc to allocate a block of bytes (see "Dynamic Memory Allocation and the Heap Segment" on page187). For our purposes, declaring a block of bytes in the STATIC section is just fine; the following code shows one way to set aside 16 bytes for a buffer:static buffer: byte := 0; // Reserves one byte. byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 // 15 additional bytes.
Of course, this technique would not be useful if you wanted to set aside storage for a really large buffer, but it works fine for small buffers (like our example above). See the chapter on arrays (appearing later in this text) if you need to allocate storage for a larger buffer.
In addition to the buffer data itself, a circular buffer also needs at least two other values: an index into the buffer that specifies where the next available data object appears and a count of valid items in the buffer. Given that the 80x86's addressing modes all use 32-bit registers, we'll find it most convenient to use a 32-bit unsigned integer for this purpose even though the index and count values never exceed 16. The declaration for these values might be the following:static index: uns32 := 0; // Start with first element of the array. count: uns32 := 0; // Initially, there is no data in the array.
The data producer (the ISR in our example) inserts data into the buffer by following these steps:
Suppose that the producer wishes to add a character to the initially empty buffer. The count is zero so we don't have to deal with a buffer overflow. The index value is also zero, so ((index+count) MOD 16) is zero and we store our first data byte at index zero in the array. Finally, we increment count by one so that the producer will put the next byte at offset one in the array of bytes.
If the consumer never removes any bytes and the producer keeps producing bytes, sooner or later the buffer will fill up and count will hit 16. Any attempt to insert additional data into the buffer is an error condition. The producer needs to decide what to do at that point. Some simple routines may simply ignore any additional data (that is, any additional incoming data from the device will be lost). Some routines may signal an exception and leave it up to the main application to deal with the error. Some other routines may attempt to expand the buffer size to allow additional data in the buffer. The corrective action is application-specific. In our examples we'll assume the program either ignores the extra data or immediately stops the program if a buffer overflow occurs.
You'll notice that the producer stores the data at location ((index+count) MOD buffer_size) in the array. This calculation, as you'll soon see, is how the circular buffer obtains its name. HLA does provide a MOD instruction that will compute the remainder after the division of two values, however, most buffer routines don't compute remainder using the MOD instruction. Instead, most buffer routines rely on a cute little trick to compute this value much more efficiently than with the MOD instruction. The trick is this: if a buffer's size is a power of two (16 in our case), you can compute (x MOD buffer_size) by logically ANDing x with buffer_size - 1. In our case, this means that the following instruction sequence computes ((index+count) MOD 16) in the EBX register:mov( index, ebx ); add( count, ebx ); and( 15, ebx );
Remember, this trick only works if the buffer size is an integral power of two. If you look at most programs that use a circular buffer for their data, you'll discover that they commonly use a buffer size that is an integral power of two. The value is not arbitrary; they do this so they can use the AND trick to efficiently compute the remainder.
To remove data from the buffer, the consumer half of the program follows these steps:
- The consumer first checks to the count to see if there is any data in the buffer. If not, the consumer waits until data is available.
- If (or when) data is available, the consumer fetches the value at the location index specifies within the buffer.
- The consumer then decrements the count and computes index := (index + 1) MOD buffer_size.
To remove a byte from the circular buffer in our current example, you'd use code like the following:// wait for data to appear in the buffer. repeat until( count <> 0 ); // Remove the character from the buffer. mov( index, ebx ); mov( buffer[ ebx ], al ); // Fetch the byte from the buffer. dec( count ); // Note that we've removed a character. inc( ebx ); // Index := Index + 1; and( 15, ebx ); // Index := (index + 1) mod 16; mov( ebx, index ); // Save away the new index value.
As the consumer removes data from the circular queue, it advances the index into the array. If you're wondering what happens at the end of the array, well that's the purpose of the MOD calculation. If index starts at zero and increments with each character, you'd expect the sequence 0, 1, 2, ... At some point or another the index will exceed the bounds of the buffer (i.e., when index increments to 16). However, the MOD operation resets this value back to zero (since 16 MOD 16 is zero). Therefore, the consumer, after that point, will begin removing data from the beginning of the buffer.
Take a close look at the REPEAT..UNTIL loop in the previous code. At first blush you may be tempted to think that this is an infinite loop if count initially contains zero. After all, there is no code in the body of the loop that modifies count's value. So if count contains zero upon initial entry, how does it ever change? Well, that's the job of the ISR. When an interrupt comes along the ISR suspends the execution of this loop at some arbitrary point. Then the ISR reads a byte from the device, puts the byte into the buffer, and updates the count variable (from zero to one). Then the ISR returns and the consumer code above resumes where it left off. On the next loop iteration, however, count's value is no longer zero, so the loop falls through to the following code. This is a classic example of how an ISR communicates with a foreground process - by writing a value to some shared variable.
There is a subtle problem with the producer/consumer code in this section. It will fail if the producer is attempting to insert data into the buffer at exactly the same time the consumer is removing data. Consider the following sequence of instructions:// wait for data to appear in the buffer. repeat until( count <> 0 ); // Remove the character from the buffer. mov( index, ebx ); mov( buffer[ ebx ], al ); // Fetch the byte from the buffer. dec( count ); // Note that we've removed a character. *** Assume the interrupt occurs here, so we begin executing *** the data insertion sequence: mov( index, ebx ); add( count, ebx ); and( 15, ebx ); mov( al, buffer[ebx] ); inc( count ); *** now the ISR returns to the consumer code (assume we've preserved EBX): inc( ebx ); // Index := Index + 1; and( 15, ebx ); // Index := (index + 1) mod 16; mov( ebx, index ); // Save away the new index value.
The problem with this code, which is very subtle, is that the consumer has decremented the count variable and an interrupt occurs before the consumer can update the index variable as well. Therefore, upon arrival into the ISR, the count value and the index value are inconsistent. That is, index+count now points at the last value placed in the buffer rather than the next available location. Therefore, the ISR will overwrite the last byte in the buffer rather than properly placing this byte after the (current) last byte. Worse, once the ISR returns to the consumer code, the consumer will update the index value and effectively add a byte of garbage to the end of the circular buffer. The end result is that we wipe out the next to last value in the buffer and add a garbage byte to the end of the buffer.
Note that this problem doesn't occur all the time, or even frequently for that matter. In fact, it only occurs in the very special case where the interrupt occurs between the "dec( count );" and "mov(ebx, index);" instructions in this code. If this code executes a very tiny percentage of the time, the likelihood of encountering this error is quite small. This may seem good, but this is actually worse than having the problem occur all the time; the fact that the problem rarely occurs just means that it's going to be really hard to find and correct this problem when you finally do detect that something has gone wrong. ISRs and concurrent programs are among the most difficult programs in the world to test and debug. The best solution is to carefully consider the interaction between foreground and background tasks when writing ISRs and other concurrent programs. In a later volume, this text will consider the issues in concurrent programming, for now, be very careful about using shared objects in an ISR.
There are two ways to correct the problem that occurs in this example. One way is to use a pair of (somewhat) independent variables to manipulate the queue. The original PC's type ahead keyboard buffer, for example, used two index variables rather than an index and a count to maintain the queue. The ISR would use one index to insert data and the foreground process would use the second index to remove data from the buffer. The only sharing of the two pointers was a comparison for equality, which worked okay even in an interrupt environment. Here's how the code worked:// Declarations static buffer: byte := 0; byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0; Ins: uns32 := 0; // Insert bytes starting here. Rmv: uns32 := 0; // Remove bytes starting here. // Insert a byte into the queue (the ISR ): mov( Ins, ebx ); inc( ebx ); and( 15, ebx ); if( ebx <> Rmv ) then mov( al, buffer[ ebx ] ); mov( ebx, Ins ); else // Buffer overflow error. // Note that we don't update INS in this case. endif; // Remove a byte from the queue (the consumer process). mov( Rmv, ebx ); repeat // Wait for data to arrive until( ebx <> Ins ); mov( buffer[ ebx ], al ); inc( ebx ); and( 15, ebx ); mov( ebx, Rmv );
If you study this code (very) carefully, you'll discover that the two code sequences don't interfere with one another. The difference between this code and the previous code is that the foreground and background processes don't write to a (control) variable that the other routine uses. The ISR only writes to Ins while the foreground process only writes to Rmv. In general, this is not a sufficient guarantee that the two code sequences won't interfere with one another, but it does work in this instance.
One drawback to this code is that it doesn't fully utilize the buffer. Specifically, this code sequence can only hold 15 characters in the buffer; one byte must go unused because this code determines that the buffer is full when the value of Ins is one less than Rmv (MOD 16). When the two indices are equal, the buffer is empty. Since we need to test for both these conditions, we can't use one of the bytes in the buffer.
A second solution, that many people prefer, is to protect that section of code in the foreground process that could fail if an interrupt comes along. There are lots of ways to protect this critical section2 of code. Alas, most of the mechanisms are beyond the scope of this chapter and will have to wait for a later volume in this text. However, one simple way to protect a critical section is to simply disable interrupts during the execution of that code. The 80x86 family provides two instructions, CLI and STI that let you enable and disable interrupts. The CLI instruction (clear interrupt enable flag) disables interrupts by clearing the "I" bit in the flags register (this is the interrupt enable flag). Similarly, the STI instruction enables interrupts by setting this flag. These two instructions use the following syntax:cli(); // Disables interrupts from this point forward... . . . sti(); // Enables interrupts from this point forward...
You can surround a critical section in your program with these two instructions to protect that section from interrupts. The original consumer code could be safely written as follows:// wait for data to appear in the buffer. repeat until( count <> 0 ); // Remove the character from the buffer. cli(); // Protect the following critical section. mov( index, ebx ); mov( buffer[ ebx ], al ); // Fetch the byte from the buffer. dec( count ); // Note that we've removed a character. inc( ebx ); // Index := Index + 1; and( 15, ebx ); // Index := (index + 1) mod 16; mov( ebx, index ); // Save away the new index value. sti(); // Critical section is done, restore interrupts. Perhaps a better sequence to use is to push the EFLAGs register (that contains the I flag) and turn off the interrupts. Then, rather than blindly turning interrupts back on, you can restore the original I flag setting using a POPFD instruction: // Remove the character from the buffer. pushfd(); // Preserve current I flag value. cli(); // Protect the following critical section. mov( index, ebx ); mov( buffer[ ebx ], al ); // Fetch the byte from the buffer. dec( count ); // Note that we've removed a character. inc( ebx ); // Index := Index + 1; and( 15, ebx ); // Index := (index + 1) mod 16; mov( ebx, index ); // Save away the new index value. popfd(); // Restore original I flag value
This mechanism is arguably safer since it doesn't turn the interrupts on even if they were already off before executing this sequence.
In our simple example (with a single producer and a single consumer) there is no need to protect the code in the ISR. However, if it were possible for two different ISRs to insert data into the buffer, and one ISR could interrupt another, then you would have to protect the code inside the ISR as well.
You must be very careful about turning the interrupts on and off. If you turn the interrupts off and forget to turn them back on, the next time you enter a loop like one of the REPEAT..UNTIL loops in this section the program will lock up because the loop control variable (count) will never change if an ISR cannot execute and update its value. This situation is called deadlock and you must take special care to avoid it.
Note that applications under Windows or Linux cannot change the state of the interrupt disable flag. This technique is useful mainly in embedded system or under simpler operating systems like DOS. Fortunately, advanced 32-bit operating systems like Linux and Windows provide other mechanisms for protecting critical sections.
7.12 Using a Circular Queue to Buffer Output Data for an ISR
You can also use a circular buffer to hold data waiting for transmission. For example, a program can buffer up data in bursts while an output device is busy and then the output device can empty the buffer at a steady state. The queuing and dequeuing routines are very similar to those found in the previous section with one major difference: output devices don't automatically initiate the transfer like input devices. This problem is a source of many bugs in output buffering routines and one that we'll pay careful attention to in this section.
As noted above, one advantage of an input device is that it automatically interrupts the system whenever data is available. This activates the corresponding ISR that reads the data from the device and places the data in the buffer. No special processing is necessary to prepare the interrupt system for the next input value.
There is a subtle difference between the interrupts an input device generates and the interrupts an output device generates. An input device generates an interrupt when data is available, output devices generate an interrupt when they are ready to accept more data. For example, a keyboard device generates an interrupt when the user presses a key and the system has to read the character from the keyboard. A printer device, on the other hand, generates an interrupt once it is done with the last character transmitted to it and it's ready to accept another character. Whenever the user presses a keyboard for the very first time, the system will generate an interrupt in response to that event. However, the printer does not generate an interrupt when the system first powers up to tell the system that it's ready to accept a character. Even if it did, the system would ignore the interrupt since it (probably) doesn't have any data to transmit to the printer at that point. Later, when the system puts data in the printer's buffer for transmission, there is no interrupt that activates the ISR to remove a character from the buffer and send it to the printer. The printer device only sends interrupts when it is done processing a character; if it isn't processing any characters, it won't generate any interrupts.
This creates a bit of a problem. If the foreground process places characters in the queue and the background process (the ISR, which is the consumer in this case) only removes those characters when an interrupt occurs, the system will never activate the ISR since the device isn't currently processing anything. To correct this problem, the producer code (the foreground process) must maintain a flag that indicates whether the output device is currently processing a character; if so, then the producer can simply queue up the character in the buffer. If the device is not currently processing any data, then the producer should send the data directly to the device rather than queue up the data in the buffer3. Old time programmers refer to this as "priming the pump" since we have to put data in the transmission pipeline in order to get the process working properly.
Once the producer "primes the pump" the process continues automatically as long as there is data in the buffer. After the output device processes the current byte it generates an interrupt. The ISR removes a byte from the buffer and transmits this data to the device. When that byte completes transmission the device generates another interrupt and the process repeats. This process repeats automatically as long as there is data in the buffer to transmit to the output device.
When the ISR transmits the last character from the buffer, the output device still generates an interrupt at the end of the transmission. The ISR, upon noting that the buffer is empty, returns without sending any new data to the output device. Since there is no pending data transmission to the output device, there will be no new interrupts to activate the ISR when new data appears in the buffer. Once again the foreground process (producer) will have to prime the pump to get the process going when it attempts to put data in the buffer.
Perhaps the easiest way to handle this process is to use a boolean variable to indicate whether the output device is currently transmitting data (and will generate an interrupt to process the next byte). If the flag is set, the foreground process can simply enqueue the data; if the flag is clear, the foreground process must transmit the data directly to the device (or call the code that does this). In this latter case, the foreground process must also set the flag to denote a transmission in progress.
Here is some code that can implement this functionality:static OutBuf: byte := 0; byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0; Index: uns32 := 0; Count: uns32 := 0; Xmitting: boolean := false; // Flag to denote transmission in progress. . . . // Code to enqueue a byte (foreground process executes this) if( Count = 16 ) then // Error, buffer is full. Do whatever processing is necessary to // deal with this problem. . . . elseif( Xmitting ) then // If we're currently transmitting data, just add the byte to the queue. pushfd(); // Critical region! Turn off the interrupts. cli(); mov( Index, ebx ); // Store the new byte at address (Index+Count) mod 16 add( Count, ebx ); and( 15, ebx ); mov( al, OutBuf[ ebx ] ); inc( Count ); popfd(); // Restore the interrupt flag's value. else // The buffer is empty and there is no character in transmission. // Do whatever is necessary to transmit the character to the output // device. . . . // Be sure to set the Xmitting flag since a character is now being // transmitted to the output device. mov( true, Xmitting ); endif; . . . // Here's the code that would appear inside the ISR to remove a character // from the buffer and send it to the output device. The system calls this // ISR whenever the device finishes processing some character. // (Presumably, the ISR preserves all registers this code sequence modifies) if( Count > 0 ) then // Okay, there are characters in the buffer. Remove the first one // and transmit it to the device: mov( Index, ebx ); mov( OutBuf[ ebx ], al ); // Get the next character to output inc( ebx ); // Point Index at the next byte in the and( 15, ebx ); // circular buffer. mov( ebx, Index ); dec( Count ); // Decrement count since we removed a char. << At this point, do whatever needs to be done in order to transmit a character to the output device >> . . . else // At this point, the ISR was called but the buffer is empty. // Simply clear the Xmitting flag and return. (this will force the // next buffer insertion operation to transmit the data directly to the // device.) mov( true, Xmitting ); endif;
1The situation is somewhat fuzzy if you have pipelines and superscalar operation. Exactly what instruction does an interrupt precede if there are multiple instructions executing simultaneously? The answer is somewhat irrelevant, however, since the interrupt does take place between the execution of some pair of instructions; in reality, the interrupt may occur immediately after the last instruction to enter the pipeline when the interrupt occurs. Nevertheless, the system does interrupt the execution of the foreground process after the execution of some instruction.
2A critical section is a region of code during which certain resources have to be protected from other processes. For example, the consumer code that fetches data from the buffer needs to be protected from the ISR.
3Another possibility is to go ahead and queue up the data and then manually activate the code that dequeues the data and sends it to the output device.