[Next] [Art of Assembly][Randall Hyde] [WEBster Home Page]



Art of Assembly Language: Chapter Fifteen



Art of Assembly/Win32 Edition is now available. Let me read that version.


Important Notice: As you have probably discovered by now, I am no longer updating this document. The reason is quite simple: I'm working on a Windows version of "The Art of Assembly Language Programming". In the past I have encouraged individuals to send me corrections to this text. However, as I am no longer updating this material, don't expect those corrections to appear in a future release. I am collecting errata that I will post to Webster someday, so feel free to continue sending corrections to AoA/DOS (16-bit) to rhyde@cs.ucr.edu. If you're more interested in leading edge material, please see the information about the Win/32 edition, above.




Chapter 15 - Strings and Character Sets
15.0 - Chapter Overview
15.1 - The 80x86 String Instructions
15.1.1 - How the String Instructions Operate
15.1.2 - The REP/REPE/REPZ and REPNZ/REPNE Prefixes
15.1.3 - The Direction Flag
15.1.4 - The MOVS Instruction
15.1.5 - The CMPS Instruction
15.1.6 - The SCAS Instruction
15.1.7 - The STOS Instruction
15.1.8 - The LODS Instruction
15.1.9 - Building Complex String Functions from LODS and STOS
15.1.10 - Prefixes and the String Instructions
15.2 - Character Strings
15.2.1 - Types of Strings
15.2.2 - String Assignment
15.2.3 - String Comparison
15.3 - Character String Functions
15.3.1 - Substr
15.3.2 - Index
15.3.3 - Repeat
15.3.4 - Insert
15.3.5 - Delete
15.3.6 - Concatenation
15.4 - String Functions in the UCR Standard Library
15.4.1 - StrBDel, StrBDelm
15.4.2 - Strcat, Strcatl, Strcatm, Strcatml
15.4.3 - Strchr
15.4.4 - Strcmp, Strcmpl, Stricmp, Stricmpl
15.4.5 - Strcpy, Strcpyl, Strdup, Strdupl
15.4.6 - Strdel, Strdelm
15.4.7 - Strins, Strinsl, Strinsm, Strinsml
15.4.8 - Strlen
15.4.9 - Strlwr, Strlwrm, Strupr, Struprm
15.4.10 - Strrev, Strrevm
15.4.11 - Strset, Strsetm
15.4.12 - Strspan, Strspanl, Strcspan, Strcspanl
15.4.13 - Strstr, Strstrl
15.4.14 - Strtrim, Strtrimm
15.4.15 - Other String Routines in the UCR Standard Library
15.5 - The Character Set Routines in the UCR Standard Library
15.6 - Using the String Instructions on Other Data Types
15.6.1 - Multi-precision Integer Strings
15.6.2 - Dealing with Whole Arrays and Records
15.7 - Sample Programs
15.7.1 - Find.asm
15.7.2 - StrDemo.asm
15.7.3 - Fcmp.asm



Chapter 15 Strings and Character Sets


A string is a collection of objects stored in contiguous memory locations. Strings are usually arrays of bytes, words, or (on 80386 and later processors) double words. The 80x86 microprocessor family supports several instructions specifically designed to cope with strings. This chapter explores some of the uses of these string instructions.

The 8088, 8086, 80186, and 80286 can process two types of strings: byte strings and word strings. The 80386 and later processors also handle double word strings. They can move strings, compare strings, search for a specific value within a string, initialize a string to a fixed value, and do other primitive operations on strings. The 80x86's string instructions are also useful for manipulating arrays, tables, and records. You can easily assign or compare such data structures using the string instructions. Using string instructions may speed up your array manipulation code considerably.


15.0 Chapter Overview


This chapter presents a review of the operation of the 80x86 string instructions. Then it discusses how to process character strings using these instructions. Finally, it concludes by discussing the string functions available in the UCR Standard Library. The sections below that have a "*" prefix are essential. Those sections with an "o" prefix discuss advanced topics that you may want to put off for a while.

* The 80x86 string instructions.

* Character strings.

* Character string functions.

* String functions in the UCR Standard Library.

o Using the string instructions on other data types.


15.1 The 80x86 String Instructions


All members of the 80x86 family support five different string instructions: movs, cmps, scas, lods, and stos[1]. They are the string primitives since you can build most other string operations from these five instructions. How you use these five instructions is the topic of the next several sections.


15.1.1 How the String Instructions Operate


The string instructions operate on blocks (contiguous linear arrays) of memory. For example, the movs instruction moves a sequence of bytes from one memory location to another. The cmps instruction compares two blocks of memory. The scas instruction scans a block of memory for a particular value. These string instructions often require three operands: a destination block address, a source block address, and (optionally) an element count. For example, when using the movs instruction to copy a string, you need a source address, a destination address, and a count (the number of string elements to move).

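Unlike other instructions which operate on memory, the string instructions are single-byte instructions which don't have any explicit operands. The operands for the string instructions include the si (source index) register, the di (destination index) register, the cx (count) register, the ax register, and the direction flag in the flags register.
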
For example, one variant of the movs (move string) instruction copies a string from the source address specified by ds:si to the destination address specified by es:di, of length cx. Likewise, the cmps instruction compares the string pointed at by ds:si, of length cx, to the string pointed at by es:di.

Not all instructions have source and destination operands (only movs and cmps support them). For example, the scas instruction (scan a string) compares the value in the accumulator to values in memory. Despite their differences, the 80x86's string instructions all have one thing in common - using them requires that you deal with two segments, the data segment and the extra segment.
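
As a rough sketch of how these implicit operands fit together, the following fragment copies a 16-byte block with one repeated movs instruction. The names dseg, Source, and Dest are hypothetical, chosen only for this illustration. The fragment loads both ds and es because movs reads from ds:si and writes to es:di. Here movsb is the byte form of movs; the rep prefix and the cld instruction are described in the sections that follow.

dseg            segment para public 'data'
Source          byte    16 dup ('A')            ;Hypothetical source block.
Dest            byte    16 dup (?)              ;Hypothetical destination block.
dseg            ends
                 .
                 .
                 .
                mov     ax, dseg
                mov     ds, ax                  ;Source segment in ds.
                mov     es, ax                  ;Destination segment in es.
                mov     si, offset Source       ;ds:si points at the source.
                mov     di, offset Dest         ;es:di points at the destination.
                mov     cx, 16                  ;Number of elements to move.
                cld                             ;Process from low to high addresses.
        rep     movsb                           ;Copy cx bytes from ds:si to es:di.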


15.1.2 The REP/REPE/REPZ and REPNZ/REPNE Prefixes


The string instructions, by themselves, do not operate on strings of data. The movs instruction, for example, will move a single byte, word, or double word. When executed by itself, the movs instruction ignores the value in the cx register. The repeat prefixes tell the 80x86 to do a multi-byte string operation. The syntax for the repeat prefix is:














Field:
Label   repeat  mnemonic operand        ;comment

For MOVS:
        rep     movs    {operands}

For CMPS:
        repe    cmps    {operands}      
        repz    cmps    {operands}
        repne   cmps    {operands}
        repnz   cmps    {operands}

For SCAS:
        repe    scas    {operands}
        repz    scas    {operands}
        repne   scas    {operands}
        repnz   scas    {operands}

For STOS:
        rep     stos    {operands}

You don't normally use the repeat prefixes with the lods instruction.

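As you can see, the presence of the repeat prefixes introduces a new field in the source line - the repeat prefix field. This field appears only on source lines containing string instructions. In your source file, the label field should always begin in column one, the repeat field should begin at the first tab stop, and the mnemonic field should begin at the second tab stop.
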
When you specify a repeat prefix before a string instruction, the instruction repeats cx times[2]. Without the repeat prefix, the instruction operates only on a single byte, word, or double word.

You can use repeat prefixes to process entire strings with a single instruction. You can use the string instructions, without the repeat prefix, as string primitive operations to synthesize more powerful string operations.
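
To illustrate a conditional repeat prefix, here is a minimal sketch that computes the length of a zero-terminated string with repne scasb (scasb is the byte form of scas). The name Msg is hypothetical, and the fragment assumes es already points at the segment containing Msg. Because the instruction stops as soon as it finds the zero byte, it repeats at most cx times rather than exactly cx times.

Msg             byte    "Hello", 0              ;Hypothetical zero-terminated string.
                 .
                 .
                 .
                mov     di, offset Msg          ;es:di points at the string.
                mov     al, 0                   ;Value to search for.
                mov     cx, 0FFFFh              ;Maximum number of bytes to scan.
                cld                             ;Scan from low to high addresses.
        repne   scasb                           ;Stop after the zero byte or when cx=0.
                not     cx                      ;cx = number of bytes scanned,
                dec     cx                      ; less one for the zero byte (5 for Msg).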

The operand field is optional. If present, MASM simply uses it to determine the size of the string to operate on. If the operand field is the name of a byte variable, the string instruction operates on bytes. If the operand is a word address, the instruction operates on words. Likewise for double words. If the operand field is not present, you must append a "B", "W", or "D" to the end of the string instruction to denote the size, e.g., movsb, movsw, or movsd.
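
The two ways of specifying the element size might look like the following sketch. The lines are alternatives, not a sequence, and each assumes that ds:si, es:di, and cx are already set up. In the last line the operands serve only to tell MASM the element size; the instruction still uses ds:si and es:di.

        rep     movsb                           ;Move cx bytes.
        rep     movsw                           ;Move cx words.
        rep     movsd                           ;Move cx double words (80386 and later).
        rep     movs    word ptr es:[di], word ptr ds:[si]      ;Same as rep movsw.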


15.1.3 The Direction Flag


Besides the si, di, cx, and ax registers, one other register controls the 80x86's string instructions - the flags register. Specifically, the direction flag in the flags register controls how the CPU processes strings.

If the direction flag is clear, the CPU increments si and di after operating upon each string element. For example, if the direction flag is clear, then executing movs will move the byte, word, or double word at ds:si to es:di and will increment si and di by one, two, or four. If you specify the rep prefix before this instruction, the CPU increments si and di for each element in the string. At completion, the si and di registers will be pointing at the first item beyond the string.

If the direction flag is set, then the 80x86 decrements si and di after processing each string element. After a repeated string operation, the si and di registers will be pointing at the first byte, word, or double word before the strings if the direction flag was set.
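
As a sketch of the second case, the following fragment copies a 16-byte block starting at its last byte and working down to its first. Source and Dest are hypothetical 16-byte variables, and the fragment assumes ds and es already point at their segment. The std instruction sets the direction flag and cld clears it.

                mov     si, offset Source+15    ;ds:si points at the last source byte.
                mov     di, offset Dest+15      ;es:di points at the last destination byte.
                mov     cx, 16                  ;Number of bytes to move.
                std                             ;Process from high to low addresses.
        rep     movsb                           ;si and di drop by one per byte.
                cld                             ;Leave the direction flag clear.

When the rep movsb finishes, si holds the offset of Source minus one and di holds the offset of Dest minus one, that is, each points at the byte just before its block.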

The direction flag may be set or cleared using the cld (clear direction flag) and std (set direction flag) instructions. When using these instructions inside a procedure, keep in mind that they modify the machine state. Therefore, you may need to save the direction flag during the execution of that procedure. The following example exhibits the kinds of problems you might encounter:














StringStuff:
                cld
        <do some operations>
                call    Str2
        <do some string operations requiring D=0>
                 .
                 .
                 .
Str2            proc    near
                std
        <Do some string operations>
                ret
Str2            endp

This code will not work properly. The calling code assumes that the direction flag is clear after Str2 returns. However, this isn't true. Therefore, the string operations executed after the call to Str2 will not function properly.

There are a couple of ways to handle this problem. The first, and probably the most obvious, is always to insert the cld or std instructions immediately before executing a string instruction. The other alternative is to save and restore the direction flag using the pushf and popf instructions. Using these two techniques, the code above would look like this:

Always issuing cld or std before a string instruction:














StringStuff:
                cld
        <do some operations>
                call    Str2
                cld
        <do some string operations requiring D=0>
                 .
                 .
                 .
Str2            proc    near
                std
        <Do some string operations>
                ret
Str2            endp

Saving and restoring the flags register:














StringStuff:
                cld
        <do some operations>
                call    Str2
        <do some string operations requiring D=0>
                 .
                 .
                 .
Str2            proc    near
                pushf
                std
        <Do some string operations>
                popf
                ret
Str2            endp

If you use the pushf and popf instructions to save and restore the flags register, keep in mind that you're saving and restoring all the flags. Therefore, any flag values such a subroutine computes before the popf are lost when it returns. For example, you will not be able to return an error condition that an instruction left in the carry flag before the popf executed.
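
One possible workaround, sketched below, is to restore the flags first and adjust the carry flag afterwards. This is only an illustration: the label Str2Err and the convention that al is nonzero on error are assumptions made for this example. The cmp, clc, and stc instructions do not touch the direction flag, so the caller still gets back the direction setting it had, although cmp does change the other arithmetic flags.

Str2            proc    near
                pushf                           ;Save the caller's flags, including D.
                std
        <Do some string operations, leaving al nonzero on error>
                popf                            ;Restore the caller's direction flag.
                cmp     al, 0                   ;Assumed convention: al<>0 means error.
                jne     Str2Err
                clc                             ;No error: return with carry clear.
                ret
Str2Err:        stc                             ;Error: return with carry set.
                ret
Str2            endp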


[1] The 80186 and later processors support two additional string instructions, INS and OUTS, which input strings of data from an input port or output strings of data to an output port. We will not consider these instructions in this chapter.
[2] Except for the cmps and scas instructions, which repeat at most the number of times specified in the cx register.


Art of Assembly: Chapter Fifteen - 28 SEP 1996
