Reverse Engineering Sierra's Adventure Game Interpreter - Part 2

17th September 2018 | Programming

In Part 1 of this blog series, I displayed some examples on how to parse out various data from a game that used Sierra's Adventure Game Interpreter (AGI). In this post, I will cover aspects from a more complex program that is used to extract the views from a game. The views represent pictures such as animations and inventory objects.

In an age where game sizes are measured in gigabytes instead of kilobytes, the effort to save a paltry byte here or there is hardly worth it, but in the 80s, every byte counted. This post will cover a couple of the interesting techniques which were used to make the most out of the limited resources of the computers and media of the 1980s.

Due to the memory and space constraints of the 1980s, Sierra's programmers came up with some interesting techniques to optimize the size of their files. Some of their programming trickery, such as requiring the program to be decrypted against a key phrase or switching between using both big and little endian was likely employed to obfuscate the files so casual hackers could not easily take a peek under the covers to garner some hints about the game. However, a basic compression method called Run Length Encoding (RLE) is used to reduce the size of the views, which works fairly well, especially when the same color is repeated in a row. However, RLE does not work well if there is a lot of variance in the picture, such as random static on an old TV.

Big + Little Endian

If one peruses an old Sierra catalog, they will see that Sierra supported a multitude of different operating systems and computers in the mid to late 80s. AGI was then built to support each of these different systems so the same game resources and logic could effectively be ported to many different systems (in theory, at least — applying this task was likely much trickier).

The following table lists the various systems which Sierra supported in the latter half of the 1980s.

Computer Processor Endianness
MS-DOS PC Intel 8088/x86 Little
Atari ST Motorola 680x0 Big
Macintosh Motorola 68000 Big
Apple IIe/c 6502/65C02 Little
Apple IIGS 65C816 Little
Amiga Motorola 680x0 Big

As you can see from the table, Sierra supported a wide range of architectures, which included both big and little endian processors. In the programs I've written to parse out the data from the AGI files, they use an odd combination of reading data using both big and little endian methods. There seems to be little reason to do this other than to obfuscate the file format and less on the processor type.

The endianness of a machine was a topic which was much more carefully observed in the 80s with a variety of systems available. It's one of those areas we probably learned about in school, but haven't actively had to worry about so much these days. This section will be a quick refresher for all of us.

Let's quickly review the difference between little and big endian. A brief overview from the web page Byte Order - Big and Little Endian:

Little Endian

If the hardware is built so that the lowest, least significant byte of a multi-byte scalar is stored "first", at the lowest memory address, then the hardware is said to be "little-endian"; the "little" end of the integer gets stored first, and the next bytes get stored in higher (increasing) memory locations. Little-Endian byte order is "littlest end goes first (to the littlest address)".

Machines such as the Intel/AMD x86, Digital VAX, and Digital Alpha, handle scalars in Little-Endian form.

Big Endian

If the hardware is built so that the highest, most significant byte of a multi-byte scalar is stored "first", at the lowest memory address, then the hardware is said to be "big-endian"; the "big" end of the integer gets stored first, and the next bytes get stored in higher (increasing) memory locations. Big-Endian byte order is "biggest end goes first (to the lowest address)".

Machines such as IBM mainframes, the Motorola 680x0, Sun SPARC, PowerPC, and most RISC machines, handle scalars in Big-Endian form.

Since an unsigned 8-byte integer can only go up to 255 (28 = 256 values which are 0 - 255), if a larger value is needed, such as an offset to find a resource in one of the AGI VOL files (which contain much of the game's resources), then two bytes are needed to save a larger number.

In this example, the decimal number 49619 will be stored as two 8-bit numbers, 193 and 211. This is calculated by multiplying the high byte by 256 and then adding the low byte to the result.

193*256 + 211 = 49619

In binary, the numbers 193, 211, and 49619 are represented in binary as follows:


193   = 1100 0001
211   = 1101 0011
49619 = 1100 0001 1101 0011
For little endian systems, the least significant byte (the "little" end, which represents the value 1101 0011 in this example) would actually be stored first in memory, so it would be stored like 1101 0011 1100 0001.

Little Endian
Memory Location 0 1
Data 1101 0011 1100 0001

The example code is taken from export_view.m, which grabs two bytes from the file, and then calculates the value. In this instance, the low byte is first and the high byte is second.


// Little Endian : Low - High
int lowResByte  = getc(volFile); // res len byte 1 
int highResByte = getc(volFile); // res len byte 2
int reslen = highResByte*256 + lowResByte;

In contrast, there is big endian which stores the high and low bytes in memory which looks more "correct" and conventional to how we read numbers by placing the first part (the big end) in memory first.

Big Endian
Memory Location 0 1
Data 1100 0001 1101 0011

The example code shows that the high byte is read first and the low byte read second.


// Big Endian : High - Low
int ms_byte = getc(volFile); // high byte, most significant
int ls_byte = getc(volFile); // low byte, least significant
long signature = ms_byte*256 + ls_byte;

Run Length Encoding

One crafty technique which is implemented to conserve space with the views is run length encoding. When retrieving the data for a view, a byte is read from the file, which cleverly holds both the color and the number of repeated pixels of that color. Since there is only a maximum of 16 colors which can be used with these games, only 4 bits (half a byte) are needed.

This leaves the remaining 4 bits to detail how many times to repeat that color across the row. This might seem limiting, but for each pixel that is read, two pixels are drawn to the screen, so 4 bits can theoretically represent up to 32 pixels on the screen. If you look closely at how pictures are drawn (take Graham's nose from King's Quest 1 or 2), one will notice that the pixels are fairly wide, but they can be fairly short in the vertical aspect.

Going through a number of the exported views, few of them are ever overly wide. Many of the character sprites might only be 22 pixels in width, so 4 bits can hold enough data for a majority of the time.

But what about background images, which might have large swaths of the same color (such as the sky or ground)? Backgrounds are actually drawn with vectors. If one plays an early AGI game (such as King's Quest 1 on an Apple ][), one can see the backgrounds being slowly drawn. For a plethora of other examples of backgrounds being drawn, check out the slow motion drawings at the @agistuff Twitter account.

The following is a code snippet from export_view.m to read and parse the byte containing the pixel data. The color index is stored in the first half of the byte. The number of times (loop indicator) to display the color is in the latter half of the byte. The example data will be the byte: 01000111


int pixelByte  = getc(volFile);          // NOTE 1
int colorIndex = pixelByte >> 4;         // NOTE 2
int numPixels  = pixelByte & 0b00001111; // NOTE 3

NOTE 1: Grab one byte from the file and store it as an integer into the variable pixelByte. In this example, the number retrieved is 4, which is represented as 0100 in binary. The latter half of the byte, 0111 represents the decimal number 7, which indicates how many times to repeat the color (times two, since the width of the screen is doubled).

NOTE 2: To isolate the color index value, byte shift the value to the right by 4. As seen in the example below, each value in 01000111 is shifted to the right by four places. The four high bits are shifted to the lower section, which isolates the value. In this example, it leaves the binary value of 0100, which is 4 in decimal. This is the index used for a look up table to determine which color to use (in this case, a deep red). To see the list of predefined colors, look at the initialization of the colorPalette array in export_view.m.

01000111 >> 4 = 00000100

NOTE 3: To determine the number of pixels to draw, we'll need to isolate the lower half of the byte. Perform a bitwise AND operation against pixelByte with the value 00001111. Reviewing our truth tables, only 1 & 1 will result in 1, whereas all other combinations of 0 and 1 will result in 0. So, to null out any of the high bits, we perform a bitwise AND operator with the operand 0000 on the first four bits, and then 1111 on the lower four bits to ensure that those bits are preserved.


  01000111
& 00001111
----------
  00000111

To isolate the concept of Run Length Encoding, I created a simple example program in Swift to further exemplify how to take a given string and perform run-length encoding on it.

Constructing + Saving an Image With NSBitmapImageRep

Once we finally get to the cel's data (that's not a typo — that's cel as in cel animation.), we need to take it and load it into an appropriate container object and then save it to disk. Since most of my examples are programmed in a mix of C and Objective-C, I use the Cocoa class NSBitmapImageRep.


NSBitmapImageRep *bitmap = nil;
bitmap = [[NSBitmapImageRep alloc] initWithBitmapDataPlanes:NULL
                                                 pixelsWide:image_width
                                                 pixelsHigh:cel_height
                                              bitsPerSample:8
                                            samplesPerPixel:4
                                                   hasAlpha:YES
                                                   isPlanar:NO
                                             colorSpaceName:NSCalibratedRGBColorSpace
                                                bytesPerRow:4 * image_width
                                               bitsPerPixel:32];
The initializer for NSBitmapImageRep looks daunting at first, but it is not quite as bad as it first seems. The most important thing to keep in mind when constructing this image, that each pixel will be comprised of an NSColor object, each which has four components (RGBA - Red, Green, Blue, Alpha), and each of these components (or samples) takes up one byte (8 bits). A bit of quick math then shows that 8 bits x 4 = 32 bits/pixel.

Setting the pixel data in the NSBitmapImageRep is straightforward by only needing to set the color of the pixel at a given position. Since I defined the NSBitmapImageRep to use an NSCalibratedRGBColorSpace, I defined each of the NSColors in my color palette with the [NSColor colorWithCalibratedRed:green:blue:alpha] method.

An example color of bright green is defined by the following NSColor: [NSColor colorWithCalibratedRed: 0.0 green: 1.0 blue: 80.0/255.0 alpha: 1.0]. As you can see, each component can have a fractional range from 0.0 to 1.0, where 0 is dark and 1 is light, so an RGBA value (0, 0, 0, 1) is black and (1, 1, 1, 1) is white.

When looping through the data, each pixel color is set easily with the method [bitmap setColor: pixelColor atX: x y: y]; After reaching the end of the cel image data, the bitmap is saved out as an image. In this example, the image is being saved as a PNG, but it is possible to also save in a variety of other modern image formats, including JPG, GIF, TIFF, etc.


NSString *imagePath = [NSString stringWithFormat:@"%@/export_views/%@_%d_%d.png", agiDir, key, i, k];
NSData *data = [bitmap representationUsingType: NSPNGFileType properties: nil];
[data writeToFile: imagePath atomically: NO];

Conclusion

Much of the enjoyment I have derived from reverse engineering how the AGI engine works is due to the clever techniques used during an era where the computing power was still woefully underpowered, so the programmers of the day had to be inventive in how they could stretch out the capabilities of these early microcomputers. Concepts and techniques I might have casually brushed against in a computer science curriculum (byte shifting, big vs. little endian, run length encoding) are actually practiced here. Today we enjoy far greater resources so we need not have to resort to methods to pinch and save every last spare bit and byte, but it does not dismiss the practice that we shouldn't be more mindful about out code and assets and consider ways we can always try and optimize what we are working on.

References