The Assembler Developer's Kit

As far as computer languages go, most assembly languages have a fairly simple syntax. As a result, many programmers have written their own assemblers. Though many open source assemblers exist, and one could argue that there is no real reason for writing an assembler from scratch, there are many benefits to doing exactly that. These benefits include:
  • Writing an assembler will give a programmer a good appreciation of the instruction encoding
  • Writing an assembler will let the programmer insert the features they want into the assembler
  • Writing an assembler allows the author to design a syntax for the assembly language that they prefer
  • Writing an assembler is a good medium-sized project that many beginning to intermediate programmers can handle, allowing them to sharpen their programming skills on a practical project.

Unfortunately, there are some disadvantages to writing an assembler, as well:

  • Creating a "hobby-quality" assembler isn't a difficult task, but creating a "commercial-quality" assembler with a professional feature set is a large project, often requiring skills that beginning to intermediate programmers don't possess.
  • Creating a modern assembler requires a lot of advanced compiler knowledge that, again, most beginning to intermediate programmers don't have.
  • Creating a fast assembler, one that others will want to use, requires a commanding knowledge of data structures and algorithms. It's easy enough to whip out a little "toy" assembler that works fine for small projects; it's a bit more difficult to create a high-performance system that handles large projects just as well as small projects.
  • While writing code to process individual machine instructions is fun and interesting, a professional-quality modern assembler requires a lot of other code to handle declarations, data types, macros, and other advanced features. These features are not particularly easy or obvious to implement.
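For a sense of scale, the kind of "toy" assembler mentioned above can be sketched in a few dozen lines. The following Python sketch is only an illustration under stated assumptions: the 8-bit machine, its mnemonics, and its opcode values are all invented here, and nothing in it comes from the ADK or HLA. It shows the classic two-pass approach (collect label addresses first, then emit bytes), and it has none of the declarations, data types, macros, or performance work that a professional assembler needs.

```python
# Toy two-pass assembler for a made-up 8-bit machine.
# The mnemonics and opcode values are hypothetical, chosen for illustration.
OPCODES = {"nop": 0x00, "load": 0x01, "add": 0x02, "jmp": 0x03, "halt": 0xFF}
HAS_OPERAND = {"load", "add", "jmp"}  # these take one 8-bit operand


def parse(source):
    """Yield (label, mnemonic, operand) for each non-blank source line."""
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if not line:
            continue
        label = None
        if ":" in line:
            label, line = (part.strip() for part in line.split(":", 1))
        fields = line.split()
        mnemonic = fields[0].lower() if fields else None
        operand = fields[1] if len(fields) > 1 else None
        yield label, mnemonic, operand


def assemble(source):
    """Return the assembled program as bytes."""
    statements = list(parse(source))

    # Pass 1: assign an address to every label.
    symbols, address = {}, 0
    for label, mnemonic, _ in statements:
        if label is not None:
            if label in symbols:
                raise ValueError(f"duplicate label: {label}")
            symbols[label] = address
        if mnemonic is not None:
            if mnemonic not in OPCODES:
                raise ValueError(f"unknown mnemonic: {mnemonic}")
            address += 2 if mnemonic in HAS_OPERAND else 1

    # Pass 2: emit opcodes, resolving label references.
    code = bytearray()
    for _, mnemonic, operand in statements:
        if mnemonic is None:
            continue  # label-only line
        code.append(OPCODES[mnemonic])
        if mnemonic in HAS_OPERAND:
            if operand is None:
                raise ValueError(f"{mnemonic} needs an operand")
            # Operand is a label, or a number in Python literal form (5, 0x10).
            value = symbols[operand] if operand in symbols else int(operand, 0)
            code.append(value & 0xFF)
    return bytes(code)


program = """
start:  load 5      ; accumulator = 5
        add  0x10
        jmp  done   ; forward reference, resolved in pass 2
        nop
done:   halt
"""
print(assemble(program).hex())  # prints 01050210030700ff
```

The forward reference to `done` is the reason for the two passes: the label's address is not known when the `jmp` is first seen.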

The purpose of the Assembler Developer's Kit is to provide documentation and source code to those individuals who are interested in writing a professional quality assembler, without all the work needed to create such a product from scratch. Using the ADK will allow a programmer to concentrate on the interesting and fun parts of writing an assembler (e.g., working on the instructions and the encoding of those instructions) while sparing themselves all the "grunt" work (e.g., writing high-performance symbol table management code, writing a lexical analyzer, parsing declarations, and so on).

The ADK is based on the code written for the High-Level Assembler v3.0 (or, perhaps, it's better to say that HLA v3.0 is being written around the use of the ADK). This code is written in assembly language using good algorithms. As such, it executes very fast! Because the ADK is designed to implement the HLA v3.0 feature set, you'll find that the ADK provides a very rich set of advanced assembler features. Because the ADK is written in assembly language using HLA (v2.x), the code is very easy to read and understand. Here are some of the advantages of writing an assembler based on the ADK:

  • Assemblers designed around the ADK will be very fast (most of the time-consuming algorithms you'll find in an assembler are already efficiently implemented in the ADK).
  • The ADK contains over 75,000 lines of code that an assembler author will not have to write themselves.
  • The ADK is very modular. You can easily eliminate features you don't want.
  • The ADK is based on the HLA v3.0 feature set, perhaps the most advanced x86 assembler ever designed.
  • Unlike most open source assemblers, the ADK contains documentation that explains the internal operation of the code, so you can more easily work out how the system operates before making modifications.
  • If you decide to adopt an HLA-like syntax for declarations, you can use the ADK code almost as-is, supplying only the instructions needed to implement the assembly of your instructions.
  • The ADK is being designed to be portable. Code will compile under Windows and Linux. If you exercise care when writing your portion of the assembler, you'll be able to port your assembler to different operating systems with minimal effort.
  • Along with HLA v3.0, the ADK is under continuing development. Expect new features and facilities as time passes.
  • The ADK is great for those who would like to create an HLA-like assembler for processors other than the x86 (though there's no reason you can't use it to create an x86 high-level assembler; that's exactly what the ADK was developed for in the first place).
  • With some minor modifications, you could even use the ADK as the basis for creating a high-level language compiler.

Some beginners will find drawbacks with the ADK, among them:

  • Using the ADK allows you to avoid learning some theoretical material needed to write an assembler from scratch (though some might consider this to be an advantage, this does cheat the ADK user out of this educational experience).
  • As the ADK is written in assembly language, it is not portable in the sense that the code will run on other processors. It is, however, portable between Windows, Linux, Mac OS X, and FreeBSD on the 80x86 (as HLA directly supports these OSes).
  • The ADK provides a system that is very "feature-rich" and may contain many more features than an assembler author desires (fortunately, it is very easy to strip features out of the ADK).
  • The ADK contains a lot of code that takes considerable time to master (fortunately, you generally don't have to understand how every line of code in the ADK works in order to use it).
  • Those who do study the internal ADK code will quickly realize that it really helps to have a basic understanding of compiler theory. Though many "low-end" assemblers have been written without such prior knowledge, a high-end system like HLA v3.0 (on which the ADK is based) does require this knowledge. Those wanting to create the best possible assembler will want to learn (or know) some compiler theory.

The ADK is planned to contain the following components:

  • A lexical analyzer module (scanner)
  • Symbol table management code
  • Declarations parsing code
  • A compile-time language/macro processor module
  • A set of object-code generation modules (for different object code file formats)
  • Documentation for the internal operation of the ADK code
  • User-level documentation (that you can edit) that describes the user-visible features of the ADK components (to which you would add your specific assembler's documentation).

Not all of these components are in place at this time, but a fair amount of code is currently available. As development proceeds on the HLA v3.0 project, you can expect to see the ADK's feature set grow. Please see the feature matrix below for features that are currently implemented.
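As a rough illustration of what the first component in the list above does, here is a minimal scanner sketch in Python. It is not the ADK's scanner (which is written in HLA), and the token classes and the `;` comment syntax are assumptions made for this example. It only shows the general job of a lexical analyzer: turning raw source text into a stream of classified tokens for the parser to consume.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    line: int


# Token classes for a hypothetical assembly syntax (not the ADK's token set).
TOKEN_SPEC = [
    ("COMMENT", r";[^\n]*"),
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t\r]+"),
    ("NUMBER", r"0[xX][0-9A-Fa-f]+|\d+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("PUNCT", r"[,:()\[\]+\-*/]"),
    ("ERROR", r"."),  # anything else is an illegal character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source: str) -> Iterator[Token]:
    """Yield tokens, skipping whitespace and comments and tracking line numbers."""
    line = 1
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "NEWLINE":
            line += 1
        elif kind in ("SKIP", "COMMENT"):
            continue
        elif kind == "ERROR":
            raise SyntaxError(f"line {line}: unexpected character {text!r}")
        else:
            yield Token(kind, text, line)


for token in scan("start: mov eax, 0x10 ; set\n ret"):
    print(token)
```

A single combined regular expression keeps the sketch short. A production scanner would more likely be hand-coded for speed, and would also have to cope with macros and other compile-time language features.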

Note that the ADK source code is written in HLA v2.x, so you will need to download and install a recent version of HLA in order to modify the ADK source code. Because the ADK source code is very modular and uses standard calling conventions, it is perfectly possible to write your code using a different assembler and link it with the ADK object code. However, it's probably going to be easier to maintain your assembler if you write the whole thing with HLA.

The ADK project is public domain and open source. Anyone wishing to contribute may do so, as long as they are willing to release their code to the public domain (of course, anyone may contribute to the project any way they like, but the "official" components of the ADK must all be in the public domain).

Download the ADK source code and documentation

hla3

ADK Documentation
Documentation for the ADK lexical analyzer
Documentation for the ADK symbol table management routines
Documentation describing the ADK grammar
Documentation describing the ADK declarations parsing code (Updated June 5, 2004)
Compiler Theory Documentation
P.D. Terry has granted permission to include his text on compiler construction on Webster. If you are not familiar with compiler concepts like grammars, lexical analysis, parsing, and other subjects, this book is an excellent introduction to the subject. It assumes the reader is familiar with HLLs like C++ or Pascal, though you should be able to pick up the fundamental theory from this textbook even if you don't know these languages. If you're interested in this subject, you'll definitely want to visit P.D. Terry's website at

Table of Contents contents.pdf
Preface preface.pdf
Chapter One: Introduction
A generic introduction to compilers and compiler concepts. Easy reading and short.
chap01.pdf
Chapter Two: Translator Classification and Structure
This chapter provides a large set of definitions in common use in compiler theory. For example, it defines terms such as "assembler" or "compiler" as well as many technical terms you'll find in use in the ADK documentation. Definitely an important chapter to read.
chap02.pdf
Chapter Three: Compiler Construction and Bootstrapping
This chapter describes the process one must go through in order to create a compiler from scratch (i.e., "bootstrapping" a compiler by writing it in some other language).
chap03.pdf
Chapter Four: Machine Emulation
This chapter discusses writing an interpreter for a hypothetical machine, one that can process compiler output (for the compiler the book builds). As the ADK deals with creating an assembler for a very real machine (the 80x86), this chapter is probably of little interest to those who want to write an assembler for the x86 processor family.
chap04.pdf
Chapter Five: Language Specification
Ouch! Greek letters and math ahead! This chapter discusses formal language theory/automata theory, grammars, regular expressions, and the like. Lots of mathematical notation. But this chapter is a must-read! The ADK documentation assumes you are familiar with, and know how to read, grammars. In fact, if you only read one chapter in P.D. Terry's book, this is the one to read; definitely a prerequisite for the rest of the documentation in the ADK.
chap05.pdf
Chapter Six: Simple Assemblers
This chapter provides a description of a very simple assembler. Indeed, this is exactly the type of assembler a beginner might write were they to start off on their own without a whole lot of formal training (or without using the ADK!). Of course, all of the features of a simple assembler are found in a more complex assembler, so someone wanting to learn how to write an assembler will definitely want to read through this chapter.
chap06.pdf
Chapter Seven: Advanced Assembler Features
Despite the name of this chapter, it doesn't cover the really advanced features you'll find in a modern assembler (such as one you can create with the ADK). In fact, most of the features this chapter discusses are common even in "traditional" assemblers that aren't generally considered "advanced". Nevertheless, this chapter does discuss topics like macros, conditional assembly, hash tables, and other facilities you'll want to incorporate into an assembler. The ADK, using the information in this chapter, provides the information and code you'll need to write a really, really advanced assembler.
chap07.pdf
Chapter Eight: Grammars and Their Classification
More Greek symbols and math! Yet another important chapter you'll need to read and understand in order to make full use of the ADK. Among other things, this chapter discusses grammar transformations that will be important when translating a context-free grammar into assembly code while writing your assembler.
chap08.pdf
Chapter Nine: Deterministic Top-Down Parsing
The parser the ADK provides is an example of a deterministic top-down parser. Therefore, to understand the coding style present in the parser appearing in the ADK, it's helpful to know a little bit about deterministic top-down/recursive descent parsers; this chapter provides a discussion of these types of parsers.
chap09.pdf
Chapter Ten: Parser and Scanner Construction
This chapter describes how to write a recursive-descent parser (like the one in the ADK). You can safely ignore the information on LR parsers and the use of parser-generators in this chapter if you're interested only in writing an assembler (most assembly languages are simple enough that they do not require LR parsing techniques).
chap10.pdf
Chapter Eleven: Syntax-Directed Translation
An assembler based on the ADK (such as HLA) will probably use a technique known as syntax-directed translation in order to generate actual machine code. This chapter provides a basic discussion of the terms and techniques associated with syntax-directed translation.
chap11.pdf
Chapter Twelve: Using Coco/R - Overview
This chapter describes a compiler-generator system the author has developed. It will probably be of little interest to programmers writing an assembler with the ADK, but if you're interested in the general subject of compiler writing you may want to read this chapter.
chap12.pdf
Chapter Thirteen: Using Coco/R - Case Studies
Another chapter on the Coco/R system. See the comments above.
chap13.pdf
Chapter Fourteen: A Simple Compiler - The Front End
This chapter discusses some techniques for generating a parser by hand (as well as via the Coco/R system). This chapter also provides a brief discussion of symbol table manipulation.
chap14.pdf
Chapter Fifteen: A Simple Compiler - The Back End
This chapter discusses code generation for the simple (hypothetical) machine language the book presents earlier. It mainly discusses things like how to translate HLL-like control structures into machine code. If you're interested in developing a high-level assembler like MASM, TASM, or HLA that has HLL-like control structures in it, you'll probably want to take a look at this chapter.
chap15.pdf
Chapter Sixteen: Simple Block Structure
This chapter discusses run-time memory management for a block structured language. The ADK supports nested procedure declarations. If you intend to enable this facility in your assembler, you'll want to read this chapter to learn about concepts like static links and displays at run time.
chap16.pdf
Chapter Seventeen: Parameters and Functions
This chapter discusses high-level procedure declarations and invocations. If you want your assembler to support high-level procedure declarations and calls (e.g., like MASM or HLA), then you'll want to read this chapter. Note that the ADK fully supports high-level procedure declarations.
chap17.pdf
Chapter Eighteen: Concurrent Programming
This chapter probably isn't of much interest to programmers writing an assembler.
chap18.pdf
Appendix A: Software Resources for this Book appa.pdf
Appendix B: Source code for the Clang Compiler/Interpreter appb.pdf
Appendix C: Cocol Grammar for the Clang Compiler/Interpreter appc.pdf
Appendix D: Source Code for a Macro Assembler appd.pdf
Bibliography biblio.pdf
Read Me File readme.pdf
Index index.pdf
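Several of the chapters listed above (Five, Nine, Ten, and Eleven) come together in one small technique: a hand-written recursive-descent parser that computes its result while it parses. The Python sketch below is a minimal illustration of that idea for constant expressions. The grammar, the class name `ConstExprParser`, and the `symbols` dictionary are assumptions made for this example; the ADK's own expression parser is written in HLA and is not reproduced here.

```python
import re

# Grammar (EBNF), one parsing method per rule:
#   expr   = term   { ("+" | "-") term } .
#   term   = factor { ("*" | "/") factor } .
#   factor = NUMBER | IDENT | "-" factor | "(" expr ")" .
TOKEN = re.compile(r"\s*(0[xX][0-9A-Fa-f]+|\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at column {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class ConstExprParser:
    def __init__(self, text, symbols=None):
        self.tokens = tokenize(text)
        self.pos = 0
        self.symbols = symbols or {}  # hypothetical table of named constants

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return value

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value //= self.factor()  # integer division (Python floors)
        return value

    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of expression")
        if token == "-":
            return -self.factor()
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token[0].isdigit():
            return int(token, 16) if token[:2].lower() == "0x" else int(token)
        if token in self.symbols:
            return self.symbols[token]
        raise SyntaxError(f"undefined symbol or bad token {token!r}")


print(ConstExprParser("2 + 3 * (SIZE - 1)", {"SIZE": 8}).parse())  # prints 23
```

Each method decides what to do by looking only at the next token, which is what makes the parser deterministic and top-down. The arithmetic attached to each rule is a simple form of syntax-directed translation; an assembler would emit or record code at those points instead of computing a number.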
Assembler Benchmark Generator Program
The "Assembler Benchmark Generator Program" is an application (written in HLA) that creates large synthetic assembly language source files (in many different formats) that you can use to test the performance of your new assembler.
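To give an idea of what such a generator does, here is a minimal Python sketch. It is not the HLA program described above, and the output syntax (a generic Intel-style `op dest, src` form), the function name `generate_benchmark`, and its parameters are assumptions made for illustration. It emits a large, repeatable source file with many labels and instructions, which is the kind of input that exposes slow symbol-table or scanning code.

```python
import random


def generate_benchmark(num_procs, stmts_per_proc, seed=0):
    """Return synthetic assembly source text as a single string."""
    rng = random.Random(seed)  # fixed seed makes runs repeatable
    registers = ["eax", "ebx", "ecx", "edx"]
    lines = []
    for p in range(num_procs):
        lines.append(f"proc_{p}:")
        for s in range(stmts_per_proc):
            dest, src = rng.sample(registers, 2)  # two distinct registers
            op = rng.choice(["mov", "add", "sub"])
            lines.append(f"    {op} {dest}, {src}")
            if s % 8 == 7:
                # Sprinkle in extra labels to stress the symbol table.
                lines.append(f"L{p}_{s}:")
        lines.append("    ret")
    return "\n".join(lines) + "\n"


source = generate_benchmark(num_procs=1000, stmts_per_proc=100)
print(len(source.splitlines()), "lines generated")
```

Timing your assembler on files of increasing size (for example, doubling `num_procs` each run) is a simple way to check whether assembly time grows roughly linearly with input size.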
The ADK Feature Matrix
The following entries describe features planned and currently implemented in the ADK, along with the implementor and any historical notes on the code.
Feature | Description | Implemented Date | Author
Lexical Analyzer (scanner) | HLA v1.x "context-free" macros are working. Still need to implement templates. | Feb, 2005 | Randall Hyde
Symbol Table Management Routines | Complete, for the most part. | Feb, 2004 | Randall Hyde
Expression Parsing | Constant expressions are complete, but still need to parse memory address expressions. Still need to write an external document for this code. | June, 2004 | Randall Hyde
Declaration Parsing | Handling constant, type, variable, and procedure declarations. Still need to write the code to handle label declarations. This module still needs to be documented. | Feb, 2005 | Randall Hyde
Compile-time Function Facilities | Most of this work is complete. Still need to implement templates. This module still needs to be documented. | Feb, 2005 | Randall Hyde
Intermediate code format documentation | This document will describe the format for the ADK intermediate code (that the object file generators will use). Work has begun, but a lot of work remains to be done on this documentation. Current documentation is not available on-line. | Feb, 2004 | Randall Hyde
Macro and template parsing and expansion | Handles standard and "context-free" macro declarations and invocations/expansions. Still need to write template processing code. | Feb 2005 | Randall Hyde
Procedure declaration and call parsing | Handle procedure declarations. Handle HLL-like procedure calls to those procedures. | Sept 2004 - Declarations (still need to do HLL-like calls) | Randall Hyde
Machine instruction parsing | TBD | TBD | TBD
HLL-like control statement parsing | TBD | TBD | TBD
Displacement optimization | TBD | TBD | TBD
Native code generation | TBD | TBD | TBD
Object module generation | TBD | TBD | TBD

Other Assembler Projects

The ADK isn't the only solution if you want to work on an assembler. There are many open-source assemblers out there whose projects you may contribute to. The following is a list of some of the more popular open-source assembler projects.

Assembler Contact
NASM http://sourceforge.net/projects/nasm/
FASM http://flatassembler.net/
WASM http://www.openwatcom.com