3.3.1 Run-length encoding

Run-Length-Encoding is a simple and fast procedure for coding symbols which appear often in succession. For example a line/vector graphic (with few lines) will contain mainly pixel in the background color. Each repeating symbol is coded once together with the repetition count.

We consider two flavors of this coding:

7-Bit-Run-Length-Encoding:

If the symbols are (mainly) in the range 00h-7Fh (like it is the case with ASCII-text) we code repeating symbols by first writing the count () with highest bit set to 1 followed by the symbol. For symbols with highest bit 1 (like umlauts or other accented symbols) we have to precede them by , since this is not a useful count indicator:

[0 $\vert$ symbol] or [1 $\vert$ count][0 $\vert$ symbol] or [1 $\vert$ 0000001][symbol]

Standard-Run-Length-Encoding:

In the standard run-length-encoding one codes multiple appearing bytes by three bytes: the original byte, 90h, and the count. Since the count has to be longer chains of the same symbol have to be coded by multiple of such triples. For the symbol 90h another coding has to be used, since 90h in the output means `count follows'. So 90h will be coded as 90 00h, since this is does not represent a reasonable count.

Some coding procedures apply first run-length-encoding before the proper compression algorithm is applied.

Pseudo-code for the general run-length-algorithm looks as follows:

  Loop: count = 0
        REPEAT
          get next symbol
          count = count + 1
        UNTIL (symbol unequal to next one)
				output symbol
        IF count > 1
          output count
        GOTO Loop

Andreas Kriegl 2003-07-23