7.4 Random Access Files

7.4 Random Access Files

The problem with sequential files is that they are, well, sequential. They are great for dumping and retrieving large blocks of data all at once, but they are not suitable for applications that need to read, write, and rewrite the same data in a file multiple times. In those situations random access files provide the only reasonable alternative.

Windows and Linux don't differentiate sequential and random access files anymore than the CPU differentiates byte and character values in memory; it's up to your application to treat the files as sequential or random access. As such, you use many of the same functions to manipulate random access files as you use to manipulate sequential access files; you just use them differently is all.

You still open files with fileio.open and fileio.openNew. Random access files are generally opened for reading or reading and writing. You rarely open a random access file as write-only since a program typically needs to read data if it's jumping around in the file.

You still close the files with fileio.close.

You can read and write the files with fileio.get and fileio.put, although you would not normally use these functions for random access file I/O because each record you read or write has to be exactly the same length and these functions aren't particularly suited for fixed-length record I/O. Most of the time you will use one of the following functions to read and write fixed-length data:
fileio.write( fileHandle, buffer, count );
 
fileio.read( fileHandle, buffer, count );
 

 
The fileHandle parameter is the usual file handle value (a dword variable). The count parameter is an uns32 object that specifies how many bytes to read or write. The buffer parameter must be an array object with at least count bytes. This parameter supplies the address of the first byte in memory where the I/O transfer will take place. These functions return the number of bytes read or written in the EAX register. For fileio.read, if the return value in EAX does not equal count's value, then you've reached the end of the file. For fileio.write, if EAX does not equal count then the disk is full.

Here is a typical call to the fileio.read function that will read a record from a file:
	fileio.read( myHandle, myRecord, @size( myRecord ) );
 

 
If the return value in EAX does not equal @size( myRecord ) and it does not equal zero (indicating end of file) then there is something seriously wrong with the file since the file should contain an integral number of records.

Writing data to a file with fileio.write uses a similar syntax to fileio.read.

You can use fileio.read and fileio.write to read and write data from/to a sequential file, just as you can use routines like fileio.get and fileio.put to read/write data from/to a random access file. You'd typically use these routines to read and write data from/to a binary sequential file.

The functions we've discussed to this point don't let you randomly access records in a file. If you call fileio.read several times in a row, the program will read those records sequentially from the text file. To do true random access I/O we need the ability to jump around in the file. Fortunately, the HLA Standard Library's file module provides several functions you can use to accomplish this.

The fileio.position function returns the current offset into the file in the EAX register. If you call this function immediately before reading or writing a record to a file, then this function will tell you the exact position of that record. You can use this value to quickly locate that record for a future access. The calling sequence for this function is
	fileio.position( fileHandle ); // Returns current file position in EAX.
 

 
The fileio.seek function repositions the file pointer to the offset you specify as a parameter. The following is the calling sequence for this function:
	fileio.seek( fileHandle, offset ); // Repositions file to specified offset.
 

 
The function call above will reposition the file pointer to the byte offset specified by the offset parameter. If you feed this function the value returned by fileio.position, then the next read or write operation will access the record written (or read) immediately after the fileio.position call.

You can pass any arbitrary offset value as a parameter to the fileio.seek routine; this value does not have to be one that the fileio.position function returns. For random access file I/O you would normally compute this offset file by specifying the index of the record you wish to access multiplied by the size of the record. For example, the following code computes the byte offset of record index in the file, repositions the file pointer to that record, and then reads the record:
	intmul( @size( myRecord ), index, ebx );
 
	fileio.seek( fileHandle, ebx );
 
	fileio.read( fileHandle, (type byte myRecord), @size( myRecord ) );
 

 
You can use essentially this same code sequence to select a specific record in the file for writing.

Note that it is not an error to seek beyond the current end of file and then write data. If you do this, the OS will automatically fill in the intervening records with uninitialized data. Generally, this isn't a great way to create files, but it is perfectly legal. On the other hand, be aware that if you do this by accident, you may wind up with garbage in the file and no error to indicate that this has happened.

The fileio module provides another routine for repositioning the file pointer: fileio.rSeek. This function's calling sequence is very similar to fileio.seek, it is
	fileio.rSeek( fileHandle, offset );
 

 
The difference between this function and the regular fileio.seek function is that this function repositions the file pointer offset bytes from the end of the file (rather than offset bytes from the start of the file). The "r" in "rSeek" stands for "reverse" seek.

Repositioning the file pointer, especially if you reposition it a fair distance from its current location, can be a time-consuming process. If you reposition the file pointer and then attempt to read a record from the file, the system may need to reposition a disk arm (a very slow process) and wait for the data to rotate underneath the disk read/write head. This is why random access I/O is much less efficient than sequential I/O.

The following program demonstrates random access I/O by writing and reading a file of records:
 
program RandomAccessDemo;
 
#include( "stdlib.hhf" )
 

 
type
 
    fileRec:
 
        record
 
            x:int16;
 
            y:int16;
 
            magnitude:uns8;
 
        endrecord;
 
        
 
const
 

 
    // Some arbitrary data we can use to initialize the file:
 

 
    fileData:=
 
        [
 
            fileRec:[ 2000, 1, 1 ],
 
            fileRec:[ 1000, 10, 2 ],
 
            fileRec:[ 750, 100, 3 ],
 
            fileRec:[ 500, 500, 4 ],
 
            fileRec:[ 100, 1000, 5 ],
 
            fileRec:[ 62, 2000, 6 ],
 
            fileRec:[ 32, 2500, 7 ],
 
            fileRec:[ 10, 3000, 8 ]
 
        ];
 

 

 
static
 
    fileHandle:         dword;
 
    RecordFromFile:     fileRec;
 
    InitialFileData:    fileRec[ 8 ] := fileData;
 

 

 
                        
 
begin RandomAccessDemo; 
 

 
    fileio.openNew( "fileRec.bin" );
 
    mov( eax, fileHandle );
 

 
    // Okay, write the initial data to the file in a sequential fashion:
 

 
    for( mov( 0, ebx ); ebx < 8; inc( ebx )) do
 

 
        intmul( @size( fileRec ), ebx, ecx );   // Compute index into fileData
 
        fileio.write
 
        ( 
 
            fileHandle, 
 
            (type byte InitialFileData[ecx]), 
 
            @size( fileRec )
 
        );
 

 
    endfor;
 

 
    // Okay, now let's demonstrate a random access of this file
 
    // by reading the records from the file backwards.
 

 
    stdout.put( "Reading the records, backwards:" nl );
 
    for( mov( 7, ebx ); (type int32 ebx) >= 0; dec( ebx )) do
 

 
        intmul( @size( fileRec ), ebx, ecx );   // Compute file offset
 
        fileio.seek( fileHandle, ecx );
 
        fileio.read
 
        ( 
 
            fileHandle, 
 
            (type byte RecordFromFile), 
 
            @size( fileRec )
 
        );
 
        if( eax = @size( fileRec )) then
 

 
            stdout.put
 
            ( 
 
                "Read record #", 
 
                (type uns32 ebx),
 
                ", values:" nl
 
                "  x: ", RecordFromFile.x, nl
 
                "  y: ", RecordFromFile.y, nl
 
                "  magnitude: ", RecordFromFile.magnitude, nl nl
 
            );
 

 
        else
 

 
            stdout.put( "Error reading record number ", (type uns32 ebx), nl );
 

 
        endif;
 

 
    endfor;
 
    fileio.close( fileHandle );
 
        
 
end RandomAccessDemo;
 
            
 

 
Program 7.6	 Random Access File I/O Example
 
7.5 ISAM (Indexed Sequential Access Method) Files

ISAM is a trick that attempts to allow random access to variable-length records in a sequential file. This is a technique employed by IBM on their mainframe data bases in the 1960's and 1970's. Back then, disk space was very precious (remember why we wound up with the Y2K problem?) and IBM's engineers did everything they could to save space. At that time disks held about five megabytes, or so, were the size of washing machines, and cost tens of thousands of dollars. You can appreciate why they wanted to make every byte count. Today, data base designers have disk drives with hundreds of gigabytes per drive and RAID¹ devices with dozens of these drives installed. They don't bother trying to conserve space at all ("Heck, I don't know how big the person's name can get, so I'll allocate 256 bytes for it!"). Nevertheless, even with large disk arrays, saving space is often a wise idea. Not everyone has a terabyte (1,000 gigabytes) at their disposal and a user of your application may not appreciate your decision to waste their disk space. Therefore, techniques like ISAM that can reduce disk storage requirements are still important today.

ISAM is actually a very simple concept. Somewhere, the program saves the offset to the start of every record in a file. Since offsets are four bytes long, an array of dwords will work quite nicely². Generally, as you construct the file you fill in the list (array) of offsets and keep track of the number of records in the file. For example, if you were creating a text file and you wanted to be able to quickly locate any line in the file, you would save the offset into the file of each line you wrote to the file. The following code fragment shows how you could do this:
static
 
	outputLine: string;
 
	ISAMarray: dword[ 128*1024 ]; // allow up to 128K records.
 
			.
 
			.
 
			.
 
	mov( 0, ecx );         // Keep record count here.
 
	forever
 

 
		<< create a line of text in "outputLine" >>
 

 
		fileio.position( fileHandle );
 
		mov( eax, ISAMarray[ecx*4] );  // Save away current record offset.
 
		fileio.put( fileHandle, outputLine, nl ); // Write the record.
 
		inc( ecx );  // Advance to next element of ISAMarray.
 

 
		<< determine if we're done and BREAK if we are >>
 

 
	endfor;
 

 
	<< At this point, ECX contains the number of records and >>
 
	<< ISAMarray[0]..ISAMarray[ecx-1] contain the offsets to >>
 
	<< each of the records in the file.                      >>
 

 
After building the file using the code above, you can quickly jump to an arbitrary line of text by fetching the index for that line from the ISAMarray list. The following code demonstrates how you could read line recordNumber from the file:
	mov( recordNumber, ebx );
 
	fileio.seek( fileHandle, ISAMarray[ ebx*4 ] );
 
	fileio.a_gets( fileHandle, inputString );
 

 
As long as you've precalculated the ISAMarray list, accessing an arbitrary line in this text file is a trivial matter.

Of course, back in the days when IBM programmers were trying to squeeze every byte from their databases as possible so they would fit on a five megabyte disk drive, they didn't have 512 kilobytes of RAM to hold 128K entries in the ISAMarray list. Although a half a megabyte is no big deal today, there are a couple of reasons why keeping the ISAMarray list in a memory-based array might not be such a good idea. First, databases are much larger these days. Some databases have hundreds of millions of entries. While setting aside a half a megabyte for an ISAM table might not be a bad thing, few people are willing to set aside a half a gigabyte for this purpose. Even if your database isn't amazingly big, there is another reason why you might not want to keep your ISAMarray in main memory - it's the same reason you don't keep the file in memory - memory is volatile and the data is lost whenever the application quits or the user removes power from the system. The solution is exactly the same as for the file data: you store the ISAMarray data in its own file. A program that builds the ISAM table while writing the file is a simple modification to the previous ISAM generation program. The trick is to open two files concurrently and write the ISAM data to one file while you're writing the text to the other file:
static
 
	fileHandle: dword;     // file handle for the text file.
 
	outputLine: string;    // file handle for the ISAM file.
 
	CurrentOffset: dword;  // Holds the current offset into the text file.
 
			.
 
			.
 
			.
 
	forever
 

 
		<< create a line of text in "outputLine" >>
 

 
		// Get the offset of the next record in the text file
 
		// and write this offset (sequentially) to the ISAM file.
 

 
		fileio.position( fileHandle );
 
		mov( eax, CurrentOffset );
 
		fileio.write( isamHandle, (type byte CurrentOffset), 4 );
 

 
		// Okay, write the actual text data to the text file:
 

 
		fileio.put( fileHandle, outputLine, nl ); // Write the record.
 

 
		<< determine if we're done and BREAK if we are >>
 

 
	endfor;
 

 
If necessary, you can count the number of records as before. You might write this value to the first record of the ISAM file (since you know the first record of the text file is always at offset zero, you can use the first element of the ISAM list to hold the count of ISAM/text file records).

Since the ISAM file is just a sequence of four-byte integers, each record in the file (i.e., an integer) has the same length. Therefore, we can easily access any value in the ISAM file using the random access file I/O mechanism. In order to read a particular line of text from the text file, the first task is to read the offset from the ISAM file and then use that offset to read the desired line from the text file. The code to accomplish this is as follows:
 
	// Assume we want to read the line specified by the "lineNumber" variable.
 

 
	if( lineNumber <> 0 ) then
 

 
		// If not record number zero, then fetch the offset to the desired
 
		// line from the ISAM file:
 

 
		intmul( 4, lineNumber, eax );   // Compute the index into the ISAM file.
 
		fileio.seek( isamHandle, eax );
 
		fileio.read( isamHandle, (type byte CurrentOffset), 4 ); // Read offset
 

 
	else
 

 
		mov( 0, eax );  // Special case for record zero because the file
 
		                // contains the record count in this position.
 

 
	endif;
 
	fileio.seek( fileHandle, CurrentOffset ); // Set text file position.
 
	fileio.a_gets( fileHandle, inputLine );   // Read the line of text.
 

 
This operation runs at about half the speed of having the ISAM array in memory (since it takes four file accesses rather than two to read the line of text from the file), but the data is non-volatile and is not limited by the amount of available RAM.

If you decide to use a memory-based array for your ISAM table, it's still a good idea to keep that data in a file somewhere so you don't have to recompute it (by reading the entire file) every time your application starts. If the data is present in a file, all you've got to do is read that file data into your ISAMarray list. Assuming you've stored the number of records in element number zero of the ISAM array, you could use the following code to read your ISAM data into the ISAMarray variable:
static
 
	isamSize: uns32;
 
	isamHandle: dword;
 
	fileHandle: dword;
 
	ISAMarray: dword[ 128*1024 ];
 
		.
 
		.
 
		.
 
	// Read the first record of the ISAM file into the isamSize variable:
 

 
	fileio.read( isamHandle, (type byte isamSize), 4 );
 

 
	// Now read the remaining data from the ISAM file into the ISAMarray
 
	// variable:
 

 
	if( isamSize >= 128*1024 ) then
 

 
		raise( ex.ValueOutOfRange );
 

 
	endif;
 
	intmul( 4, isamSize, ecx );  // #records * 4 is number of bytes to read.
 
	fileio.read( isamHandle, (type byte ISAMarray), ecx );
 

 
	// At this point, ISAMarray[0]..ISAMarray[isamSize-1] contain the indexes
 
	// into the text file for each line of text.
 

 
¹Redundant array of inexpensive disks. RAID is a mechanism for combining lots of cheap disk drives together to form the equivalent of a really large disk drive.

²This assumes, of course, that your files have a maximum size of four gigabytes.

7.4 Random Access Files

7.5 ISAM (Indexed Sequential Access Method) Files

Web Site Hits Since Jan 1, 2000

Web Site Hits Since
Jan 1, 2000