Chapter Nine Managing Large Programs
9.1 Chapter Overview
When writing larger HLA programs you do not typically write the whole program as a single source file. This chapter discusses how to break up a large project into smaller pieces and assemble the pieces separately. This radically reduces development time on large projects.
9.2 Managing Large Programs
Most assembly language programs are not totally stand-alone programs. In general, you will call various Standard Library or other routines that are not defined in your main program. For example, you've probably noticed by now that the 80x86 doesn't provide any machine instructions like "read", "write", or "printf" for doing I/O operations. Of course, you can write your own procedures to accomplish this. Unfortunately, writing such routines is a complex task, and beginning assembly language programmers are not ready for such tasks. That's where the HLA Standard Library comes in. This is a package of procedures, such as stdout.put, that you can call to perform simple I/O operations.
The HLA Standard Library contains tens of thousands of lines of source code. Imagine how difficult programming would be if you had to merge these thousands of lines of code into your simple programs; imagine how slow compiling your programs would be if you had to compile those tens of thousands of lines with each program you write. Fortunately, you don't have to.
For small programs, working with a single source file is fine. For large programs this gets very cumbersome (consider the example above of having to include the entire HLA Standard Library into each of your programs). Furthermore, once you've debugged and tested a large section of your code, continuing to assemble that same code when you make a small change to some other part of your program is a waste of time. The HLA Standard Library, for example, takes several minutes to assemble, even on a fast machine. Imagine having to wait five or ten minutes on a fast Pentium machine to assemble a program to which you've made a one line change!
As with high level languages, the solution is separate compilation. First, you break up your large source files into manageable chunks. Then you compile the separate files into object code modules. Finally, you link the object modules together to form a complete program. If you need to make a small change to one of the modules, you only need to reassemble that one module; you do not need to reassemble the entire program.
The HLA Standard Library works in precisely this way. The Standard Library is already compiled and ready to use. You simply call routines in the Standard Library and link your code with the Standard Library using a linker program. This saves a tremendous amount of time when developing a program that uses the Standard Library code. Of course, you can easily create your own object modules and link them together with your code. You could even add new routines to the Standard Library so they will be available for use in future programs you write.
"Programming in the large" is a term software engineers have coined to describe the processes, methodologies, and tools for handling the development of large software projects. While everyone has their own idea of what "large" is, separate compilation, and some conventions for using separate compilation, are among the more popular techniques that support "programming in the large." The following sections describe the tools HLA provides for separate compilation and how to effectively employ these tools in your programs.
9.3 The #INCLUDE Directive
The #INCLUDE directive, when encountered in a source file, switches program input from the current file to the file specified in the parameter list of the include directive. This allows you to construct text files containing common constants, types, source code, and other HLA items, and include such a file into the assembly of several separate programs. The syntax for the include directive is

    #include( "filename" )

Filename must be a valid filename. HLA merges the specified file into the compilation at the point of the #INCLUDE directive. Note that you can nest #INCLUDE statements inside files you include. That is, a file being included into another file during assembly may itself include a third file. In fact, the "stdlib.hhf" header file you see in most example programs contains the following [1]:

    #include( "hla.hhf" )
    #include( "x86.hhf" )
    #include( "misctypes.hhf" )
    #include( "hll.hhf" )
    #include( "excepts.hhf" )
    #include( "memory.hhf" )
    #include( "args.hhf" )
    #include( "conv.hhf" )
    #include( "strings.hhf" )
    #include( "cset.hhf" )
    #include( "patterns.hhf" )
    #include( "tables.hhf" )
    #include( "arrays.hhf" )
    #include( "chars.hhf" )
    #include( "math.hhf" )
    #include( "rand.hhf" )
    #include( "stdio.hhf" )
    #include( "stdin.hhf" )
    #include( "stdout.hhf" )

Program 9.1 The stdlib.hhf Header File, as of 01/01/2000
By including "stdlib.hhf" in your source code, you automatically include all the HLA library modules. It's often more efficient (in terms of compile time and size of code generated) to provide only those #INCLUDE statements for the modules you actually need in your program. However, including "stdlib.hhf" is extremely convenient and takes up less space in this text, which is why most programs appearing in this text use "stdlib.hhf".
Note that the #INCLUDE directive does not need to end with a semicolon. If you put a semicolon after the #INCLUDE, that semicolon becomes part of the source file and is the first character following the included file during compilation. HLA generally allows extra semicolons in various parts of the program, so you will often see a #INCLUDE statement ending with a semicolon that does no harm. In general, though, you should not get in the habit of putting semicolons after #INCLUDE statements because there is a slight possibility this could create a syntax error in certain circumstances.
Using the #INCLUDE directive by itself does not provide separate compilation. You could use the include directive to break up a large source file into separate modules and join these modules together when you compile your file. The following example would include the PRINTF.HLA and PUTC.HLA files during the compilation of your program:

    #include( "printf.hla" )
    #include( "putc.hla" )
Now your program will benefit from the modularity gained by this approach. Alas, you will not save any development time. The #INCLUDE directive inserts the source file at the point of the #INCLUDE during compilation, exactly as though you had typed that code in yourself. HLA still has to compile the code and that takes time. Were you to include all the files for the Standard Library routines in this manner, your compilations would take forever.
In general, you should not use the include directive to include source code as shown above [2]. Instead, you should use the #INCLUDE directive to insert a common set of constants, types, external procedure declarations, and other such items into a program. Typically an assembly language include file does not contain any machine code outside of a macro (see the chapter on Macros and the Compile-Time Language for details). The purpose of using #INCLUDE files in this manner will become clearer after you see how the external declarations work.
9.4 Ignoring Duplicate Include Operations
As you begin to develop sophisticated modules and libraries, you eventually discover a big problem: some header files will need to include other header files (e.g., the stdlib.hhf header file includes all the other Standard Library Header files). Well, this isn't actually a big problem, but a problem will occur when one header file includes another, and that second header file includes another, and that third header file includes another, and ..., and that last header file includes the first header file. Now this is a big problem.
There are two problems with a header file indirectly including itself. First, this creates an infinite loop in the compiler. The compiler will happily go on about its business including all these files over and over again until it runs out of memory or some other error occurs. Clearly this is not a good thing. The second problem that occurs (usually before the problem above) is that the second time HLA includes a header file, it starts complaining bitterly about duplicate symbol definitions. After all, the first time it reads the header file it processes all the declarations in that file; the second time around it views all those symbols as duplicate symbols.
HLA provides a special include directive that eliminates this problem: #INCLUDEONCE. You use this directive exactly like you use the #INCLUDE directive, e.g.,

    #includeonce( "myHeaderFile.hhf" )

If myHeaderFile.hhf directly or indirectly includes itself (with a #INCLUDEONCE directive), then HLA will ignore the new request to include the file. Note, however, that if you use the #INCLUDE directive, rather than #INCLUDEONCE, HLA will include the file a second time. This was done in case you really do need to include a header file twice, for some reason (though it is hard to imagine needing to do this).
The bottom line is this: you should always use the #INCLUDEONCE directive to include header files you've created. In fact, you should get in the habit of always using #INCLUDEONCE, even for header files created by others (the HLA Standard Library already has provisions to prevent recursive includes, so you don't have to worry about using #INCLUDEONCE with the Standard Library header files).
There is another technique you can use to prevent recursive includes - using conditional compilation. For details on this technique, see the chapter on the HLA Compile-Time Language in a later volume.
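As a rough sketch of the conditional compilation approach, a header file can wrap its entire contents in a compile-time test for a marker symbol that the header itself defines the first time HLA reads it. The header name, the marker symbol myHeaderFile_hhf, and the MaxItems constant below are hypothetical names chosen only for illustration; the sketch assumes the compile-time #if directive, the @defined function, and the "?" compile-time assignment behave as described in the chapter on the HLA Compile-Time Language.

```hla
// myHeaderFile.hhf (hypothetical header file)
//
// The body is processed only if the marker symbol is not yet defined,
// so a second inclusion of this file contributes nothing.

#if( !@defined( myHeaderFile_hhf ) )

    // First inclusion: create the marker symbol (hypothetical name).
    ?myHeaderFile_hhf := true;

    const
        MaxItems := 16;     // hypothetical shared constant

    // ... remaining constants, types, and external declarations ...

#endif
```

With a guard like this in place, even a plain #INCLUDE of the file a second time should be harmless, because the #if test skips everything between the #if and the #endif once the marker exists.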
9.5 UNITs and the EXTERNAL Directive
Technically, the #INCLUDE directive provides you with all the facilities you need to create modular programs. You can create several modules, each containing some specific routine, and include those modules, as necessary, in your assembly language programs using #INCLUDE. However, HLA provides a better way: external and public symbols.
One major problem with the include mechanism is that once you've debugged a routine, including it into a compilation still wastes a lot of time since HLA must recompile bug-free code every time you assemble the main program. A much better solution would be to preassemble the debugged modules and link the object code modules together rather than reassembling the entire program every time you change a single module. This is what the EXTERNAL directive allows you to do.
To use the external facilities, you must create at least two source files. One file contains a set of variables and procedures used by the second. The second file uses those variables and procedures without knowing how they're implemented. The only problem is that if you create two separate HLA programs, the linker will get confused when you try to combine them. This is because both HLA programs have their own main program. Which main program does the OS run when it loads the program into memory? To resolve this problem, HLA uses a different type of compilation module, the UNIT, to compile programs without a main program. The syntax for an HLA UNIT is actually simpler than that for an HLA program; it takes the following form:

    unit unitname;

        << declarations >>

    end unitname;

With one exception (the VAR section), anything that can go in the declaration section of an HLA program can go into the declaration section of an HLA unit. Notice that a unit does not have a BEGIN clause and there are no program statements in the unit [3]; a unit only contains declarations.
In addition to the fact that a unit does not contain any executable statements, there is one other difference between units and programs. Units cannot have a VAR section. This is because the VAR section declares variables that are local to the main program's source code. Since a unit has no main program code, VAR sections are illegal [4].
To demonstrate, consider the following two modules:

    unit Number1;

    static
        Var1: uns32;
        Var2: uns32;

    procedure Add1and2;
    begin Add1and2;

        push( eax );
        mov( Var2, eax );
        add( eax, Var1 );
        pop( eax );         // restore the EAX value pushed above

    end Add1and2;

    end Number1;

Program 9.2 Example of a Simple HLA Unit

    program main;
    #include( "stdlib.hhf" );

    begin main;

        mov( 2, Var2 );
        mov( 3, Var1 );
        Add1and2();
        stdout.put( "Var1=", Var1, nl );

    end main;

Program 9.3 Main Program that References External Objects
The main program references Var1, Var2, and Add1and2, yet these symbols are external to this program (they appear in unit Number1). If you attempt to compile the main program as it stands, HLA will complain that these three symbols are undefined.
Therefore, you must declare them external with the EXTERNAL option. An external procedure declaration looks just like a forward declaration except you use the reserved word EXTERNAL rather than FORWARD. To declare external static variables, simply follow those variables' declarations with the reserved word EXTERNAL. The following is a modification to the previous main program that includes the external declarations:

    program main;
    #include( "stdlib.hhf" );

    procedure Add1and2; external;

    static
        Var1: uns32; external;
        Var2: uns32; external;

    begin main;

        mov( 2, Var2 );
        mov( 3, Var1 );
        Add1and2();
        stdout.put( "Var1=", Var1, nl );

    end main;

Program 9.4 Modified Main Program with EXTERNAL Declarations
If you attempt to compile this second version of main using the typical HLA compilation command "HLA main2.hla", you will be somewhat disappointed. The program will actually compile without error. However, when HLA attempts to link this code, it will report that the symbols Var1, Var2, and Add1and2 are undefined. This happens because you haven't compiled and linked in the associated unit with this main program. Before you try that, and discover that it still doesn't work, you should know that all symbols in a unit are, by default, private to that unit. This means that those symbols are inaccessible in code outside that unit unless you explicitly declare those symbols as public symbols. To declare symbols as public, you simply put external declarations for those symbols in the unit before the actual symbol declarations. If an external declaration appears in the same source file as the actual declaration of a symbol, HLA assumes that the name is needed externally and makes that symbol a public (rather than private) symbol. The following is a correction to the Number1 unit that properly declares the external objects:

    unit Number1;

    static
        Var1: uns32; external;
        Var2: uns32; external;

    procedure Add1and2; external;

    static
        Var1: uns32;
        Var2: uns32;

    procedure Add1and2;
    begin Add1and2;

        push( eax );
        mov( Var2, eax );
        add( eax, Var1 );
        pop( eax );         // restore the EAX value pushed above

    end Add1and2;

    end Number1;

Program 9.5 Correct Number1 Unit with External Declarations
It may seem redundant to declare these symbols twice, as occurs in Program 9.5, but you'll soon see that you don't normally write the code this way.
If you attempt to compile the main program or the Number1 unit using the typical HLA statement, i.e.,

    HLA main2.hla
    HLA unit2.hla

you'll quickly discover that the linker still returns errors. It returns an error on the compilation of main2.hla because you still haven't told HLA to link in the object code associated with unit2.hla. Likewise, the linker complains if you attempt to compile unit2.hla by itself because it can't find a main program. The simple solution is to compile both of these modules together with the following single command:

    HLA main2.hla unit2.hla
This command will properly compile both modules and link together their object code.
Unfortunately, the command above defeats one of the major benefits of separate compilation. When you issue this command it will compile both main2 and unit2 prior to linking them together. Remember, a major reason for separate compilation is to reduce compilation time on large projects. While the above command is convenient, it doesn't achieve this goal.
To separately compile the two modules you must run HLA separately on them. Of course, we saw earlier that attempting to compile these modules separately produced linker errors. To get around this problem, you need to compile the modules without linking them. The "-c" (compile-only) HLA command line option achieves this. To compile the two source files without running the linker, you would use the following commands:

    HLA -c main2.hla
    HLA -c unit2.hla

This produces two object code files, main2.obj and unit2.obj, that you can link together to produce a single executable. You could run the linker program directly, but an easier way is to use the HLA compiler to link the object modules together for you:

    HLA main2.obj unit2.obj

Under Windows, this command produces an executable file named main2.exe [5]; under Linux, this command produces a file named main2. You could also type the following command to compile the main program and link it with a previously compiled unit2 object module:

    HLA main2.hla unit2.obj
In general, HLA looks at the suffixes of the filenames that follow the HLA command. If a filename doesn't have a suffix, HLA assumes it to be ".HLA". If the filename has a suffix, then HLA will do the following with the file:
- If the suffix is ".HLA", HLA will compile the file with the HLA compiler.
- If the suffix is ".ASM", HLA will assemble the file with MASM.
- If the suffix is ".OBJ" or ".LIB" (Windows), or ".o" or ".a" (Linux), then HLA will link that module with the rest of the compilation.
9.5.1 Behavior of the EXTERNAL Directive
Whenever you declare a symbol EXTERNAL using the external directive, keep in mind several limitations of EXTERNAL objects:
- Only one EXTERNAL declaration of an object may appear in a given source file. That is, you cannot define the same symbol twice as an EXTERNAL object.
- Only PROCEDURE, STATIC, READONLY, and STORAGE variable objects can be external. VAR and parameter objects cannot be external.
- External objects must be at the global declaration level. You cannot declare EXTERNAL objects within a procedure or other nested structure.
- EXTERNAL objects publish their name globally. Therefore, you must carefully choose the names of your EXTERNAL objects so they do not conflict with other symbols.
This last point is especially important to keep in mind. As this text is being written, the HLA compiler translates your HLA source code into assembly code. HLA assembles the output by using MASM (the Microsoft Macro Assembler), Gas (Gnu's as), or some other assembler. Finally, HLA links your modules using a linker. At each step in this process, your choice of external names could create problems for you.
Consider the following HLA external/public declaration:

    static
        extObj: uns32; external;
        extObj: uns32;
        localObject: uns32;

When you compile a program containing these declarations, HLA automatically generates a "munged" name for the localObject variable that probably isn't ever going to have any conflicts with system-global external symbols [6]. Whenever you declare an external symbol, however, HLA uses the object's name as the default external name. This can create some problems if you inadvertently use some global name as your variable name. Worse still, the assembler will not be able to properly process HLA's output if you happen to choose an identifier that is legal in HLA but is one of the assembler's reserved words. For example, if you attempt to compile the following code fragment as part of an HLA program (producing MASM output), it will compile properly but MASM will not be able to assemble the code:

    static
        c: char; external;
        c: char;

MASM has trouble with this because HLA writes the identifier "c" to the assembly language output file, and "c" is a MASM reserved word (MASM uses it to denote C-language linkage).
To get around the problem of conflicting external names, HLA supports an additional syntax for the EXTERNAL option that lets you explicitly specify the external name. The following example demonstrates this extended syntax:

    static
        c: char; external( "var_c" );
        c: char;
If you follow the EXTERNAL keyword with a string constant enclosed by parentheses, HLA will continue to use the declared name (c in this example) as the identifier within your HLA source code. Externally (i.e., in the assembly code) HLA will substitute the name var_c whenever you reference c. This feature helps you avoid problems with the misuse of assembler reserved words, or other global symbols, in your HLA programs.
You should also note that this feature of the EXTERNAL option lets you create aliases. For example, you may want to refer to an object by the name StudentCount in one module while referring to the same object as PersonCount in another module (you might do this because you have a general library module that deals with counting people and you want to use the object in a program that deals only with students). Using a declaration like the following lets you do this:

    static
        StudentCount: uns32; external( "PersonCount" );
Of course, you've already seen some of the problems you might encounter when you start creating aliases. So you should use this capability sparingly in your programs. Perhaps a more reasonable use of this feature is to simplify certain OS APIs. For example, Win32 uses some really long names for certain procedure calls. You can use the EXTERNAL directive to provide a more meaningful name than the standard one supplied by the operating system.
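To make the aliasing idea concrete, the following sketch shows how the two sides of such an alias might look. The unit name People, the program name students, and the file names are assumptions made up for this illustration; only the EXTERNAL syntax itself comes from the discussion above. The library unit publishes the variable under its own name:

```hla
// people.hla (hypothetical library unit)
unit People;

static
    PersonCount: uns32; external;   // external declaration makes the symbol public
    PersonCount: uns32 := 0;        // the actual variable

end People;
```

A program that deals only with students can then refer to the same storage by a different name:

```hla
// students.hla (hypothetical main program)
program students;
#include( "stdlib.hhf" )

static
    // Known as StudentCount here, but linked to the external name PersonCount.
    StudentCount: uns32; external( "PersonCount" );

begin students;

    inc( StudentCount );
    stdout.put( "StudentCount=", StudentCount, nl );

end students;
```

Both names refer to one variable after linking, which is exactly why aliases deserve caution: a reader of students.hla has no obvious hint that other modules modify the same location under another name.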
9.5.2 Header Files in HLA
HLA's technique of using the same EXTERNAL declaration to define public as well as external symbols may seem somewhat counter-intuitive. Why not use a PUBLIC reserved word for public symbols and the EXTERNAL keyword for external definitions? Well, as counter-intuitive as HLA's external declarations may seem, they are founded on decades of solid experience with the C/C++ programming languages, which use a similar approach to public and external symbols [7]. Combined with a header file, HLA's external declarations make large program maintenance a breeze.
An important benefit of the EXTERNAL directive (versus separate PUBLIC and EXTERNAL directives) is that it lets you minimize duplication of effort in your source files. Suppose, for example, you want to create a module with a bunch of support routines and variables for use in several different programs (e.g., the HLA Standard Library). In addition to sharing some routines and some variables, suppose you want to share constants, types, and other items as well.
The #INCLUDE file mechanism provides a perfect way to handle this. You simply create a #INCLUDE file containing the constants, macros, and external definitions and include this file in the module that implements your routines and in the modules that use those routines (see Figure 9.1).
Figure 9.1 Using Header Files in HLA Programs
A typical header file contains only CONST, VAL, TYPE, STATIC, READONLY, and STORAGE declarations and procedure prototypes (plus a few others we haven't looked at yet, like macros). Objects in the STATIC, READONLY, and STORAGE sections, as well as all procedure declarations, are always EXTERNAL objects. In particular, you generally should not put any VAR objects in a header file, nor should you put any non-external variables or procedure bodies in a header file. If you do, HLA will make duplicate copies of these objects in the different source files that include the header file. Not only will this make your programs larger, but it will cause them to fail under certain circumstances. For example, you generally put a variable in a header file so you can share the value of that variable amongst several different modules. However, if you fail to declare that symbol as external in the header file and just put a standard variable declaration there, each module that includes the header file will get its own separate variable; the modules will not share a common variable.
If you create a standard header file, containing CONST, VAL, and TYPE declarations, and external objects, you should always be sure to include that file in the declaration section of all modules that need the definitions in the header file. Generally, HLA programs include all their header files in the first few statements after the PROGRAM or UNIT header.
This text adopts the HLA Standard Library convention of using an ".hhf" suffix for HLA header files ("HHF" stands for HLA Header File).
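As an illustration of this arrangement, here is one way the Number1 unit and its main program from earlier in this chapter might be reorganized around a shared header file. The header name number1.hhf is an assumption made for this sketch, and the file names main2.hla and unit2.hla simply follow the command-line examples above. The header holds only the external declarations:

```hla
// number1.hhf (hypothetical header shared by both modules)
static
    Var1: uns32; external;
    Var2: uns32; external;

procedure Add1and2; external;
```

The unit includes the header, which makes its symbols public, and then supplies the actual definitions:

```hla
// unit2.hla
unit Number1;
#includeonce( "number1.hhf" )

static
    Var1: uns32;
    Var2: uns32;

procedure Add1and2;
begin Add1and2;

    push( eax );
    mov( Var2, eax );
    add( eax, Var1 );       // Var1 := Var1 + Var2
    pop( eax );             // restore the EAX value pushed above

end Add1and2;

end Number1;
```

The main program includes the same header to gain access to those symbols:

```hla
// main2.hla
program main;
#include( "stdlib.hhf" )
#includeonce( "number1.hhf" )

begin main;

    mov( 2, Var2 );
    mov( 3, Var1 );
    Add1and2();
    stdout.put( "Var1=", Var1, nl );

end main;
```

Because the external declarations now live in one place, a change to a variable's type or a procedure's name only has to be made in the header and in the implementing unit, rather than in every module that uses the symbol.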
[1] Note that this file changes over time as new library modules appear in the HLA Standard Library, so this file is probably not up to date. Furthermore, there are some minor differences between the Linux and Windows versions of this file. The OS-specific entries do not appear in this example.
[2] There is nothing wrong with this, other than the fact that it does not take advantage of separate compilation.
[3] Of course, units may contain procedures and those procedures may have statements, but the unit itself does not have any executable instructions associated with it.
[4] Of course, procedures in the unit may have their own VAR sections, but the procedure's declaration section is separate from the unit's declaration section.
[5] If you want to explicitly specify the name of the output file, HLA provides a command-line option to achieve this. You can get a menu of all legal command line options by entering the command "HLA -?".
[6] Typically, HLA creates a name like ?001A_localObject out of localObject. This is a legal MASM identifier, but it is unlikely to conflict with any other global symbols when HLA compiles the program with MASM.
[7] Actually, C/C++ is a little different. All global symbols in a module are assumed to be public unless explicitly declared private. HLA's approach (forcing the declaration of public items via EXTERNAL) is a little safer.