Digraph compression

Given that the target platform for U3.5 is severely memory constrained, I figured it would make sense to compress the text. But since I already use about two thirds of the CPU time just to update the attribute area of the screen, decompression needs to be really fast. So I opted for digraph compression.

The character set is 7-bit ASCII (with a couple of control characters that switch the offset for the Runic and Elvish characters). When bit 7 of a byte is set, the byte encodes a pair of letters instead. There are 16 possibilities for the first character and 8 for the second, which exactly fills the 128 codes available. The available characters were chosen by how frequently they appear in English. I had tried to be clever and use, for the second set, the characters that most often appear second in the most common digraphs. But it turns out it's more efficient to just use the most common characters. On the plus side, that cuts the size of the lookup table by 8 bytes, since the second set is then simply the first eight entries of the first. More info on letter frequency and digraphs here: http://www.simonsingh.net/The_Black_Chamber/hintsandtips.html
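To make the byte layout concrete, here is a minimal Python model of the decoding step. It is only an illustration of the idea, not the U3.5 decoder. The actual character table isn't listed in the post, so `DIGRAPH_TABLE` below is a hypothetical frequency-ordered guess, and the split of the low seven bits (bits 6-3 select the first character, bits 2-0 the second) is an assumption.

```python
# HYPOTHETICAL table: the real character set isn't given in the post.
# 16 entries in assumed English frequency order; the first 8 double as
# the second-character set, so only one table is needed.
DIGRAPH_TABLE = " etaoinshrdlcumw"


def decode(data: bytes) -> str:
    """Expand digraph-compressed bytes back into text."""
    out = []
    for b in data:
        if b & 0x80:
            # Assumed layout: bits 6-3 pick one of 16 first characters,
            # bits 2-0 pick one of 8 second characters.
            out.append(DIGRAPH_TABLE[(b >> 3) & 0x0F])
            out.append(DIGRAPH_TABLE[b & 0x07])
        else:
            # Plain 7-bit ASCII (including control codes) passes through.
            out.append(chr(b))
    return "".join(out)
```

Each compressed byte costs one test, one shift, one mask and two table lookups, which is the property that makes the scheme cheap enough when most of the CPU time is already spoken for.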

Oh, and I wrote a Perl script to automate the conversion from plain text.
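The Perl script itself isn't shown here. As a rough illustration of what such a converter has to do, this Python sketch encodes text greedily from left to right, using a hypothetical 16-character table and an assumed bit layout (bits 6-3 for the first character, bits 2-0 for the second). Greedy pairing is my choice for the example and may not match what the actual script does.

```python
# HYPOTHETICAL table and layout; not taken from the original script.
DIGRAPH_TABLE = " etaoinshrdlcumw"


def encode(text: str) -> bytes:
    """Greedily pack pairs of common characters into single bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        first = DIGRAPH_TABLE.find(text[i])
        second = DIGRAPH_TABLE.find(text[i + 1]) if i + 1 < len(text) else -1
        # A pair qualifies if the first char is in the 16-entry set and
        # the second is in the 8-entry set (the first 8 table entries).
        if first >= 0 and 0 <= second < 8:
            out.append(0x80 | (first << 3) | second)
            i += 2
        else:
            code = ord(text[i])
            if code > 0x7F:
                raise ValueError("only 7-bit characters can be encoded")
            out.append(code)
            i += 1
    return bytes(out)
```

For example, `encode("in the east")` packs 11 characters into 6 bytes. A greedy parse is simple but not always optimal; an encoder that considers alternative pairings can sometimes save a few more bytes.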
