TSQRT: A New Trick for an Old Dog

Algorithms

TSQRT: A New Trick for an Old Dog

Don Bockenfeld

The square root function is widely used, and amenable to a number of interesting specializations. Author Bockenfeld shares with us his joy in discovering one of those special approaches. Introduction

I present a variation on traditional table lookup with interpolation as applied to the square root function: tsqrt (table square root). Miroslav Sálek, of the Aeronautical Research & Test Institute (VZLU) in Prague, gave me this algorithm in 1992. We use it in an embedded avionics application where all math operations are performed on scaled integers — there is no floating point.

The return value of tsqrt(x) is 128 * x; that is, tsqrt(30) returns 701. For an input domain of 0 to 65,535, the largest error magnitude is 2 counts (corresponding to 2/128), and the largest percent error magnitude (0.15%) comes from resolution, not accuracy, limits. This feat is accomplished by a fast algorithm using only 258 bytes of lookup table.

Description

tsqrt follows the standard approach of first finding the pair of table entries that bracket the input and then using linear interpolation to estimate the output. (Source code and a test program are shown in Listing 1.) The following equations summarize this process:
Find bracketing table entries

1a: x0 < = x < x0+dx
1b: x0 = (int)((x-MIN)/dx) * dx + MIN

Table lookup

2a: y0 = 128*sqrt(x0)
2b: y1 = 128*sqrt(x0+dx)

Linear interpolation

3: y = (x-x0) * (y1-y0) / dx + x0
where MIN is the input value corresponding to the first table entry.

What makes this algorithm interesting is how the interpretation of the lookup table changes for different subdomains. Each successive subdomain uses the same lookup table entries even though their sizes increase exponentially.

To compute the lookup table index corresponding to (x0, y0):
4a: i0 = x                            (range 1)
4b: i0 = (int)((x-MIN)/dx) + 32       (ranges 2-6)
Table 1 shows the various ranges (or subdomains), and the corresponding parameters for each range.

The lookup table, in turn, consists of 129 entries defined as follows:
for (i=0; i<129; i++)
    table[i] = (unsigned short)
        (16*128*sqrt(i) + 0.5);
You may notice that this algorithm is amenable to considerable speed optimization. The equations for subdomain 1 reduce to simple table lookup without any interpolation. In addition, all of the multipliers and divisors are powers of two, so shift instructions may be used.

Analysis

This "folded" lookup table technique was new to me. How could such a small table be used and reused over such a large domain without any loss of resolution?

Let's examine a typical table entry that is used in all six subdomains (for simplicity I chose the 32nd). The x0 values that map to this entry are 32, 128, 512, 2048, 8192, and 32768, as shown in Table 2.

The coefficients in the right-most column of Table 2 are proportional to those in the y0 column of Table 1. The table lookup portion of the algorithm relies on the distributive nature of the square root function. But what about the interpolation portion? Why is the algorithm still accurate with these ever increasing dx values? The only way for a linear approximation to be accurate is if the function is approximately linear. The second derivative of the square root function falls so precipitously that this is essentially true once we get beyond the first or second subdomain.

In conclusion, this algorithm is accurate, tight, and fast. It also illustrates a more general technique that was missing from my armamentarium. What more could you ask for? o

Don Bockenfeld currently works for Flight Visions writing software for commercial Heads-Up Displays. He has been programming embedded systems for 17 years, split about evenly between assembly language and C. Don received a B.Sc. in Electrical Engineering from Northwestern back before they were a Rose Bowl contender. He may be reached via surface mail at 43W752 Route 30, Sugar Grove, Illinois, 60554.