CHAPTER EIGHT: MASM: DIRECTIVES & PSEUDO-OPCODES (Part 5)
8.12.4 - Type Operators
8.12.5 - Operator Precedence
The "xxxx ptr" coercion
operator is an example of a type operator. MASM expressions possess two major attributes:
a value and a type. The arithmetic, logical, and relational operators change an
expression's value. The type operators change its type. The previous section demonstrated
how the ptr operator could change an expression's type. There are several
additional type operators as well.
| Operator | Syntax | Description |
|---|---|---|
| PTR | byte ptr expr | Coerce expr to point at a byte. |
|  | word ptr expr | Coerce expr to point at a word. |
|  | dword ptr expr | Coerce expr to point at a dword. |
|  | qword ptr expr | Coerce expr to point at a qword. |
|  | tbyte ptr expr | Coerce expr to point at a tbyte. |
|  | near ptr expr | Coerce expr to a near value. |
|  | far ptr expr | Coerce expr to a far value. |
| short | short expr | expr must be within about ±128 bytes of the current instruction (typically a JMP instruction). This operator forces the JMP instruction to be two bytes long (if possible). |
| this | this type | Returns an expression of the specified type whose value is the current location counter. |
| seg | seg label | Returns the segment address portion of label. |
| offset | offset label | Returns the offset address portion of label. |
| .type | .type label | Returns a byte that indicates whether this symbol is a variable, statement label, or structure name. Superseded by opattr. |
| opattr | opattr label | Returns a 16 bit value that gives information about label. |
| length | length variable | Returns the number of array elements for a single dimension array. If a multi-dimension array, this operator returns the number of elements for the first dimension. |
| lengthof | lengthof variable | Returns the number of items in array variable. |
| type | type symbol | Returns an expression whose type is the same as symbol and whose value is the size, in bytes, of the specified symbol. |
| size | size variable | Returns the number of bytes allocated for single dimension array variable. Useless for multi-dimension arrays. Superseded by sizeof. |
| sizeof | sizeof variable | Returns the size, in bytes, of array variable. |
| low | low expr | Returns the L.O. byte of expr. |
| lowword | lowword expr | Returns the L.O. word of expr. |
| high | high expr | Returns the H.O. byte of expr. |
| highword | highword expr | Returns the H.O. word of expr. |
The short operator works exclusively with the jmp
instruction. Remember, there are two direct near jmp instructions: one
with a range of about 128 bytes around the jmp, and one with a range of about 32,768
bytes around the current instruction. MASM will automatically generate a short jump if the
target address is up to 128 bytes before the current instruction. This operator is mainly
present for compatibility with old MASM (pre-6.0) code.
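As a small illustration of the short operator, the fragment below forces a two-byte jump over a single instruction. The label name Skip is made up for this sketch, and the fragment assumes it sits inside an ordinary 16-bit code segment.

```asm
        jmp     short Skip      ;Two-byte jump: opcode plus an 8-bit displacement
        mov     ax, 0           ;This instruction is skipped
Skip:   mov     bx, ax          ;Execution continues here
```

If the target were outside the short range, the assembler would report an error rather than silently generating the longer form.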
The this operator forms an expression with the
specified type whose value is the current location counter. The instruction mov bx,
this word, for example, will load the bx register with the first two bytes of
its own encoding: 8Bh and 1Eh, the opcode bytes for mov bx, memory (read as a
little-endian word, that is the value 1E8Bh). The address this word is
the address of the opcode for this very instruction! You mostly use the this operator
with the equ directive to give a symbol some type other than constant. For
example, consider the following statement:
HERE equ this near
This statement assigns the current location counter value
to HERE and sets the type of HERE to near. This, of course,
could have been done much more easily by simply placing the label HERE: on the
line by itself. However, the this operator with the equ
directive does have some useful applications. Consider the following:
WArray equ this word
BArray byte 200 dup (?)
In this example the symbol BArray is of type
byte. Therefore, instructions accessing BArray must contain byte operands
throughout. MASM would flag a mov ax, BArray+8 instruction as an error.
However, using the symbol WArray lets you access the same exact memory
locations (since WArray has the value of the location counter immediately
before encountering the byte pseudo-opcode) so mov ax,WArray+8
accesses location BArray+8. Note that the following two instructions are
identical:
mov ax, word ptr BArray+8
mov ax, WArray+8
The seg operator does two things. First, it
extracts the segment portion of the specified address; second, it converts the type of the
specified expression from address to constant. An instruction of the form mov ax,
seg symbol always loads the accumulator with the constant corresponding to the
segment portion of the address of symbol. If the symbol is the name of a
segment, MASM will automatically substitute the paragraph address of the segment for the
name. However, it is perfectly legal to use the seg operator as well. The
following two statements are identical if dseg is the name of a segment:
mov ax, dseg
mov ax, seg dseg
Offset works like seg, except it
returns the offset portion of the specified expression rather than the segment portion. If
VAR1 is a word variable, mov ax, VAR1 will always load the two
bytes at the address specified by VAR1 into the ax register. The
mov ax, offset VAR1 instruction, on the other hand, loads the offset (address) of VAR1
into the ax register. Note that you can use the lea instruction
or the mov instruction with the offset operator to load the
address of a scalar variable into a 16 bit register. The following two instructions both
load bx with the address of variable J:
mov bx, offset J
lea bx, J
The lea instruction is more flexible since you
can specify any memory addressing mode; the offset operator only allows a
single symbol (i.e., displacement only addressing). Most programmers use the mov
form for scalar variables and the lea instruction for other addressing modes.
This is because the mov instruction was faster on earlier processors.
One very common use for the seg and offset
operators is to initialize a segment and pointer register with the segmented address of
some object. For example, to load es:di with the address of SomeVar, you
could use the following code:
mov di, seg SomeVar
mov es, di
mov di, offset SomeVar
Since you cannot load a constant directly into a segment
register, the code above copies the segment portion of the address into di
and then copies di into es before copying the offset into di.
This code uses the di register to copy the segment portion of the address
into es so that it will affect as few other registers as possible.
Opattr returns a 16 bit value providing
specific information about the expression that follows it. The .type operator
is an older version of opattr that returns the L.O. eight bits of this value.
Each bit in the value of these operators has the following meaning:
| Bit(s) | Meaning |
|---|---|
| 0 | References a label in the code segment if set. |
| 1 | References a memory variable or relocatable data object if set. |
| 2 | Is an immediate (absolute/constant) value if set. |
| 3 | Uses direct memory addressing if set. |
| 4 | Is a register name, if set. |
| 5 | References no undefined symbols and there is no error, if set. |
| 6 | Is an SS: relative reference, if set. |
| 7 | References an external name. |
| 8-10 | Language type: 000 - no language type; 001 - C/C++; 010 - SYSCALL; 011 - STDCALL; 100 - Pascal; 101 - FORTRAN; 110 - BASIC |
The language bits are for programmers writing code that interfaces with high level languages like C++ or Pascal. Such programs use the simplified segment directives and MASM's HLL features.
You would normally use these values with MASM's conditional assembly directives and macros. This allows you to generate different instruction sequences depending on the type of a macro parameter or the current assembly configuration. For more details, see "Conditional Assembly" and "Macros".
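As a rough sketch of how opattr and conditional assembly fit together, the hypothetical macro below clears its operand, choosing a different instruction depending on whether the argument is a register (bit four of the opattr value). The macro name Zero and the variable Count are invented for this illustration. The macro assumes that any non-register argument is a memory variable with a known size.

```asm
; Hypothetical macro: set "arg" to zero.
Zero    macro   arg:req
        if      (opattr (arg)) and 10h  ;;Bit 4 set: arg is a register
            xor     arg, arg            ;;Shorter encoding for registers
        else                            ;;Otherwise assume a typed memory operand
            mov     arg, 0
        endif
        endm
```

With these assumptions, Zero bx would expand to xor bx, bx, while Zero Count (where Count is a word variable) would expand to mov Count, 0.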
The size, sizeof, length,
and lengthof operators compute the sizes of variables (including arrays) and
return that size as their value. You shouldn't normally use size and length.
The sizeof and lengthof operators have superseded these
operators. Size and length do not always return reasonable
values for arbitrary operands. MASM 6.x includes them to remain compatible with older
versions of the assembler. However, you will see an example later in this chapter where
you can use these operators.
The sizeof variable operator returns the
number of bytes directly allocated to the specified variable. The following examples
illustrate the point:
a1 byte ? ;SIZEOF(a1) = 1
a2 word ? ;SIZEOF(a2) = 2
a4 dword ? ;SIZEOF(a4) = 4
a8 real8 ? ;SIZEOF(a8) = 8
ary0 byte 10 dup (0) ;SIZEOF(ary0) = 10
ary1 word 10 dup (10 dup (0)) ;SIZEOF(ary1) = 200
You can also use the sizeof operator to
compute the size, in bytes, of a structure or other data type. This is very useful for
computing an index into an array using the formula from Chapter Four:
Element_Address := base_address + index*Element_Size
You may obtain the element size of an array or structure
using the sizeof operator. So if you have an array of structures, you can
compute an index into the array as follows:
.286 ;Allow 80286 instructions.
s struct
<some number of fields>
s ends
.
.
.
array s 16 dup ({}) ;An array of 16 "s" elements
.
.
.
imul bx, I, sizeof s ;Compute BX := I * elementsize
mov al, array[bx].fieldname
You can also apply the sizeof operator to
other data types to obtain their size in bytes. For example, sizeof byte
returns 1, sizeof word returns two, and sizeof dword returns 4.
Of course, applying this operator to MASM's built-in data types is questionable since the
size of those objects is fixed. However, if you create your own data types using typedef,
it makes perfect sense to compute the size of the object using the sizeof
operator:
integer typedef word
Array integer 16 dup (?)
.
.
.
imul bx, bx, sizeof integer
.
.
.
In the code above, sizeof integer would return
two, just like sizeof word. However, if you change the typedef
statement so that integer is a dword rather than a word,
the sizeof integer operand would automatically change its value to four to
reflect the new size of an integer.
The lengthof operator returns the total number
of elements in an array. For the Array variable above, lengthof Array
would return 16. If you have a two dimensional array, lengthof returns the
total number of elements in that array.
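The following fragment sketches one way lengthof might be used together with type to walk an array. The names Scores and ZeroLp are invented for this example. The data declaration would normally live in a data segment and the instructions in a code segment.

```asm
Scores  word    16 dup (?)              ;Hypothetical array of 16 words

        mov     cx, lengthof Scores     ;CX := 16, the element count
        mov     bx, 0                   ;BX := byte offset of element 0
ZeroLp: mov     Scores[bx], 0           ;Clear the current element
        add     bx, type Scores         ;Advance by one element (2 bytes)
        loop    ZeroLp                  ;Repeat for all 16 elements
```

Because the loop count and the element size both come from the declaration, changing the array to, say, 32 dwords would not require touching the loop.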
When you use the lengthof and sizeof
operators with arrays, you must keep in mind that it is possible for you to declare arrays
in ways that MASM can misinterpret. For example, the following statements all declare
arrays containing eight words:
A1 word 8 dup (?)
A2 word 1, 2, 3, 4, 5, 6, 7, 8
; Note: the "\" is a "line continuation" symbol. It tells MASM to append
; the next line to the end of the current line.
A3 word 1, 2, 3, 4, \
5, 6, 7, 8
A4 word 1, 2, 3, 4
word 5, 6, 7, 8
Applying the sizeof and lengthof
operators to A1, A2, and A3 produces sixteen
(sizeof) and eight (lengthof). However, sizeof A4 produces eight and
lengthof A4 produces four. This happens because MASM treats each data declaration
as a complete array. Although the A4 declaration sets
aside eight consecutive words, just like the other three declarations above, MASM thinks
that the two word directives declare two separate arrays rather than a single array. So if
you want to initialize the elements of a large array or a multidimensional array and you
also want to be able to apply the lengthof and sizeof operators
to that array, you should use A3's form of declaration rather than A4's.
The type operator returns a constant that is
the number of bytes of the specified operand. For example, type(word) returns
the value two. This revelation, by itself, isn't particularly interesting since the size
and sizeof operators also return this value. However, when you use the type
operator with the comparison operators (eq, ne, le, lt, gt, and ge), the comparison
produces a true result only if the types of the operands are the same. Consider the
following definitions:
Integer typedef word
J word ?
K sword ?
L integer ?
M word ?
byte type (J) eq word ;value = 0FFh
byte type (J) eq sword ;value = 0
byte type (J) eq type (L) ;value = 0FFh
byte type (J) eq type (M) ;value = 0FFh
byte type (L) eq integer ;value = 0FFh
byte type (K) eq dword ;value = 0
Since the code above typedef'd Integer
to word, MASM treats integers and words as the same type. Note that with the
exception of the last example above, the value on either side of the eq
operator is two. So when you use the comparison operators with the type
operator, MASM compares more than just the value; it compares the types as well.
Consequently, type and sizeof are not synonymous. E.g.,
byte type (J) eq type (K) ;value = 0
byte (sizeof J) eq (sizeof K) ;value = 0FFh
The type operator is especially useful when
using MASM's conditional assembly directives. See "Conditional Assembly" for
more details.
The examples above also demonstrate another interesting
MASM feature. If you use a type name within an expression, MASM treats it as though you'd
entered "type(name)" where name is a symbol of the given type. In
particular, specifying a type name returns the size, in bytes, of an object of that type.
Consider the following examples:
Integer typedef word
s struct
d dword ?
w word ?
b byte ?
s ends
byte word ;value = 2
byte sword ;value = 2
byte byte ;value = 1
byte dword ;value = 4
byte s ;value = 7
byte word eq word ;value = 0FFh
byte word eq sword ;value = 0
byte b eq dword ;value = 0
byte s eq byte ;value = 0
byte word eq Integer ;value = 0FFh
The high and low operators, like offset
and seg, change the type of an expression from whatever it was to a constant.
These operators also affect the value of the expression: they decompose it into a high
order byte and a low order byte. The high operator extracts bits eight
through fifteen of the expression; the low operator extracts and returns bits
zero through seven. Highword and lowword extract the H.O. and
L.O. 16 bits of an expression.
You can extract bits 16-23 and 24-31 using expressions of
the form low( highword( expr )) and high( highword(
expr )), respectively.
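To make the bit ranges concrete, the fragment below applies these operators to a 32-bit constant. The symbol Value is made up for this sketch, and the results in the comments follow from the bit positions described above.

```asm
Value   =       12345678h               ;Hypothetical 32-bit constant

        mov     al, low Value           ;AL := 78h   (bits 0-7)
        mov     ah, high Value          ;AH := 56h   (bits 8-15)
        mov     bx, lowword Value       ;BX := 5678h (bits 0-15)
        mov     cx, highword Value      ;CX := 1234h (bits 16-31)
        mov     dl, low (highword Value)  ;DL := 34h (bits 16-23)
        mov     dh, high (highword Value) ;DH := 12h (bits 24-31)
```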
Although you will rarely need to use a complex address expression employing more than two operands and a single operator, the need does arise on occasion. MASM supports a simple operator precedence convention based on the following rules:
| Precedence | Operators |
|---|---|
| (Highest) | |
| 1 | length, lengthof, size, sizeof, ( ), [ ], < > |
| 2 | . (structure field name operator) |
| 3 | CS: DS: ES: FS: GS: SS: (Segment override prefixes) |
| 4 | ptr offset seg type opattr this |
| 5 | high, low, highword, lowword |
| 6 | + - (unary) |
| 7 | * / mod shl shr |
| 8 | + - (binary) |
| 9 | eq ne lt le gt ge |
| 10 | not |
| 11 | and |
| 12 | or xor |
| 13 | short .type |
| (Lowest) | |
Parentheses should only surround expressions. Some
operators, like sizeof and lengthof, require type names, not
expressions. They do not allow you to put parentheses around the name. Therefore, "(sizeof
X)" is legal, but "sizeof(X)" is not. Keep this in mind
when using parentheses to override operator precedence in an expression. If MASM generates
an error, you may need to rearrange the parentheses in your expression.
As is true for expressions in a high level language, it is a good idea to always use parentheses to explicitly state the precedence in all complex address expressions (complex meaning that the expression has more than one operator). This generally makes the expression more readable and helps avoid precedence related bugs.
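A brief sketch of the difference parentheses make, using constant expressions so the results are easy to check by hand. The array name Table is invented for this example.

```asm
Table   word    8 dup (?)               ;Hypothetical word array

        mov     ax, 2 + 3 * 4           ;AX := 14; "*" binds tighter than binary "+"
        mov     ax, (2 + 3) * 4         ;AX := 20; parentheses force the addition first
        mov     bx, (offset Table) + (2 * (type Table)) ;Offset of the third word of Table
```

The last line would assemble the same way without the parentheses, given the precedence levels in the table, but the fully parenthesized form leaves nothing for the reader to look up.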
Chapter Eight: MASM: Directives &
Pseudo-Opcodes (Part 5)
26 SEP 1996