The Learning C/C++urve

Columns

The Learning C/C++urve

Bobby Schmidt

Controlling Silent Conversions

A properly designed Boolean will convert silently with some types, and raise a clamor with others. Bobby tweaks his boolean class to handle conversions as sensibly as a builtin type.

This Old Boolean

With this month's episode, we conclude our exploration of fundamental type design. Last month we left boolean with its most solid foundation yet, although it still had a few problems. In this column, I repair some of those problems, spackle over others, and leave some untouched, discussing the benefits (and costs) of your repairing them on your own.

We have two major problems still facing us:

Some types silently convert to boolean when we don't want them to.

Conversely, boolean never converts to any other type, silently or otherwise, preventing conversions we do want.

I'll warn you now that I've rigged this game, for there is no solution that comfortably satisfies all my test criteria. In this concluding installment, I'll offer my compromise solution, leaving you to embellish as you see fit.

Type Alchemy, Take 2

Let's look at these two major problems in turn, starting with other types turning themselves into boolean. If you cast back to the February column (the second installment of this series), I engaged in what I then called "a little alchemy," turning FALSE and TRUE into boolean. FALSE and TRUE were their own type, boolean_value; to change them into boolean, I introduced the constructor boolean(boolean_value).

That constructor is a specific example of a more general rule: for any arbitrary source type s, and any target class type t, all of the constructors[1]
t::t(s);
t::t(s const);
t::t(s &);
t::t(s const &);
turn an existing object of source type s into a new object of target type t. Note that these last two alternatives, for the case of s and t being the same type, amount to
t::t(t &);
t::t(t const &);
which act as class t copy constructors. I leave it as an exercise for the student to explain why removing the references in that scenario, giving
t::t(t);
t::t(t const);
yields invalid C++ (try them with your own compiler, just to see).

Such single-argument constructors are called "conversion constructors," for they convert objects of one type into objects of another type. The example of boolean(boolean_value) bore this notion out, for it indeed converted one type (boolean_value) into another (boolean). Last month, I removed boolean_type and the conversion constructor along with it; however, in that same column, I ended up creating a new conversion constructor boolean(int) to allow initialization of FALSE and TRUE, and to support statements like
boolean b = ('x' != 0);
and its equivalent:
// more obviously a call to
// boolean(int) constructor
boolean b('x' != 0);
That these statements compile implies the expression ('x' != 0) either is of type int, or is convertible to type int. Which possibility is correct depends on the vintage of your compiler.

If your compiler is state-of-the-art, it falls under the purview of the ANSI/ISO C++ standard, which says the results of relational, equality, and logical operators are of type bool — the very builtin integral type bool we're emulating. As an integral type, bool converts to other integral types (like our constructor parameter, int).

For us Luddites with bool-less compilers, I refer to the ARM [2] . There we learn that these operators return int directly, meaning expressions using these operators pass unconverted as arguments to our constructor. For the rest of this column, I'm assuming your compiler falls into this category[3] .

Allowing such expressions to convert to boolean I deem, in best Martha Stewart fashion, A Good Thing. Where my opinion sours is for expressions such as
boolean b('x');
which also compile. Initializing a boolean with relational, equality, and logical expressions to me makes consummate sense; initializing a boolean with a character does not.

Thwarting Unwanted Conversions

In general, given an object x of type t, and a class type c, for the statement
// 'c' is of some class type
// 'a' is an object of type'c'
// 'x' is of any arbitrary type 't'
c a(x);
to compile, c must have a constructor that either accepts a t (directly or by reference), or accepts an intermediate type that can be formed by conversion from a type t. Now consider our specific case of
boolean b('x');
Given that the only boolean constructor takes not a char (the type of 'x') but an int, the 'x' must be turning itself into an int before passing to the constructor. This is in fact the case, one of the many implicit integral conversions maintained for C compatibility.

Short of rewriting the compiler, we can't change the language to disallow these implicit conversions — but we can stop the conversions from happening, by removing the compiler's incentive for performing the conversions in the first place. The compiler converts arguments if it can't find an exact match. If we provide constructors that match the original argument types directly, the compiler won't bother converting the arguments.

In the case of boolean, this means writing a second constructor, boolean(char). Adding this constructor to Version 10 from last month gives the boolean definition
class boolean
    {
    friend ostream &operator<<
        (ostream &, boolean const &);
public:
    boolean(int = 0);
    boolean(char); // new constructor
private:
    char value_;
    };
Now boolean b('x') will call this new constructor instead of calling boolean(int). You may well wonder what this gains us. After all, we didn't want char to turn into a boolean in the first place, regardless of which constructor got called.

Appearances aside, we have gained an important advantage: previously both int and char arguments were funneled into the same constructor, whereas now they fork into different constructors. Since these different argument types use different constructors, we can now control which constructor calls actually compile. We want the calls with int arguments (including relational and other expressions) to continue compiling correctly, and want the calls with char arguments to fail to compile.

Drawing the Shades

We achieve the above control by declaring the boolean(char) constructor (so the compiler will try to use that constructor for char arguments), but making that definition hidden (so the compiler, even though it wants to, can't access the constructor). The trick, simply enough, is to declare boolean(char) as private, giving the new class definition as
class boolean
    {
    friend ostream &operator<<
        (ostream &, boolean const &);
public:
    boolean(int = 0);
private:
    boolean(char); // was public, now private
    char value_;
    };
If you now try to compile boolean('x'), you'll get a complaint from the compiler to the effect of[4] .
'boolean::boolean' : cannot access private member
declared in class 'boolean'
Note that, even though boolean(char) is hidden, the compiler will still try to call it, rather than calling the accessible public constructor boolean(int). When resolving overload ambiguities, the compiler looks at type compatibility, not at accessibility. boolean(char) is a better type match for the argument, so that's what the compiler calls.

There's a bonus to this scheme that makes this slicker still: bear in mind that we intend for this compiler diagnostic to crop up every time we try to call boolean(char). Put another way, if a translation unit successfully compiles, it evidently contained no calls to boolean(char). If all translation units compile, the linker won't have to resolve boolean(char), since no one called it. And that mean you don't ever have to implement the body of boolean(char)!

Some people find this a bit disconcerting, purposely declaring class members they never intend to implement. Trust me, this is a safe and sane C++ practice. In later columns, we'll apply this same technique in other ways (e.g., disabling inadvertent object copying); for now, if you extrapolate to all the builtin types that can turn themselves into int, you end up with a pretty long list of private function members[5] :
    boolean(char);
    boolean(signed char);
    boolean(unsigned char);
    boolean(short);
    boolean(unsigned short);
    boolean(unsigned int);
    boolean(long);
    boolean(unsigned long);
    boolean(float);
    boolean(double);
    boolean(long double);
You can't include all relevant constructor argument types here. For example, as I showed a couple months back, enumerations can turn into int; clearly, there's no way boolean can list every possible enumeration type it may ever come across, so some lack of type safety remains. C unfortunately considers many kinds of expressions to be int or convertible to int, and C++ is still living with that legacy.

Inserting the above members into the Version 10 boolean definition from last month gives the complete boolean Version 11, shown in Listing 1.

As usual, append this version to booltest.h and compile the test suites, as listed in my January column. I summarize my results in Table 1, where the success ratio has gone up again, this time to 83%.

Type Alchemy, Take 3

The above targeted the first major problem identified at the start of this column — that of too many other types turning into boolean. I now turn attention to the second of those problems, boolean not turning into enough other types, which is in a sense a counterpoint to the first.

Let's start with the opposite of what we did earlier. Instead of turning an int into a boolean, we'll try turning a boolean into an int. Think back to when I introduced conversion constructors that let a source type s turn into another target type t. Those conversion constructors were all of the general form
t::t(s); // creates a new target object of type 't' from
         // a source object of type 's'
In the example of turning boolean into int, boolean takes the place of s (the source) and int takes the place of t (the target), giving a conversion constructor of
int::int(boolean); // creates a new target object of
                   // type 'int' from a source object of
                   // type 'boolean'
Hmmm. What's wrong with this picture ... anyone? Anyone? Bueller? Bzzzt, time's up. This example assumes int can have constructors. But int is a builtin type, and builtin types don't have constructors. A dead end.

Or maybe not. Conversion constructors are more than just a generic way to turn a source s into a target t. They are more specifically a property of the type t, saying "I will turn you into one of me." What we need is an analogous property of s that says "I will turn myself into one of you." Well seek and ye shall find, for C++ supports such properties, and they are called conversion operators.

Turning On Desired Conversions

Just as conversion constructors have general properties, so do conversion operators. For the case of any source class type s and any arbitrary target type t, the conversion operator
s::operator t();
lets an s source object turn itself into a t target object. For our case of boolean turning itself into an int, the operator would be defined as
boolean::operator int();
As with constructors and destructors, conversion operator declarations show no explicit return type. However, unlike constructors and destructors, conversion operators do actually return a value. The operator's name implies the returned type, meaning operator int() returns a result of type int. You also can make conversion operators const member functions[6] , something else you can't do with constructors and destructors, so that our conversion operator can become:
boolean::operator int() const;
The implementation of this operator is probably what you would expect:
inline boolean::operator int() const
    {
    return value_;
    }
Again note that, although there is no return type declared, the function still returns a value. With the exception of constructors, destructors, and conversion operators, you must supply a return type for all function declarations[7] .

Add operator int to boolean, call the result Version 12, append to booltest.h, and compile the test suites. You may suffer sticker shock when reading my results (also shown in Table 1) , for the success ratio has plummeted to 56%! The good news is that all but two of the acceptance cases pass now; the bad news is that many more rejection suites fail.

Let's look at the upside first. Equality, relational, and logical expressions result in values of type int; thus, by converting to int, boolean is now type-compatible with these expressions as well, allowing such desired sequences as
boolean a, b;
// logical-and returns 'int'
boolean c = a && b;
Unfortunately, subtle bugs can still creep in. If, for example, you accidentally wrote & instead of && above, so that the statements read
boolean a, b;
// oops -- accidentally used
// bitwise-and, which also returns
// 'int'
boolean c = a & b;
the compiler would not complain.

This whole business of type safety has been a process of successive approximations: first we had too much safety, then too little, then too much, and so on, like a pendulum. Now we are still unsafe, in an apparently major way. But amazingly enough, with one tweak — my last — we can fix most all of the remaining type interaction problems.

All these operators (&& and the like) accept int operands and return int results. By turning itself into int, boolean can interact with these operators. Unfortunately, we can't control the contexts in which boolean interacts with them. We need a type into which boolean can turn itself that offers more control; the trick is not to turn boolean into int directly, but rather to turn it into a type that is compatible with int where we want such compatibility, and incompatible where we don't.

Rounding Third and Heading for Home

By reading various and sundry parts of the language standard, I find that

Of the operators that we want to have accept boolean operands, many happen to accept pointer operands.

Of the operators that we want to have reject boolean operands, many happen to reject pointer operands.

Operators accepting pointers won't mix pointer and non-pointer operands in the same expression.

Conditional statements accept pointer expressions.

So it seems that, by turning boolean into a pointer rather than into an int, we may achieve more control over type interaction. For reasons I'll make clear shortly, I'm turning boolean into one particular pointer type: void *. Just as I defined operator int() const to turn boolean into an int, I now define operator void *() const to turn boolean into a void *:
inline boolean::operator void *() const
    {
    static char dummy;
    return value_ ? &dummy : 0;
    }
I don't intend for a user of this operator to ever dereference the returned pointer, so you may wonder why I bother returning a valid address, instead of just returning something like (void *) value_. Even if your client code never tries to dereference the pointer, turning a non-pointer into a pointer — that is, treating an arbitrary integer as if it were a real memory address — is not well-defined C++ behavior.

The conversion operator returns void *, as opposed to some other pointer type, to minimize the number of ways you can accidentally mix boolean and other types. In C++, you can assign void * only to another void *[8] . Were we to return another pointer type, we could assign that pointer to both its type and to void *, increasing the chance of misuse.

Substituting operator void * for operator int, and making one other change, yields my final boolean Version 13 (Listing 2) .

The "one other change" is to no longer declare operator<< as a friend. Before, operator<< had to interrogate the private data member value_; now boolean returns its state through the public function member operator void *.

Compiling this version gives me the results shown in the third column pair of Table 1. As I warned near the beginning, we have not reached 100% success (here it's 86%). Still, I find this boolean class is more than adequate for most of my work; indeed, this is the very implementation I typically use in The Real World.

As a final measure, I mentally ran the builtin bool through the same test suites. I reckon bool succeeds on about 50% of the tests, depending on how bool is implemented on a particular compiler, and how accurately I eyeballed both the code and the standard. bool scores about the same as Version 3's enum boolean did back in January, meaning our boolean class is much safer than is the equivalent builtin bool type.

Extra Innings

If you want to play around more with this class, here are some considerations to mull over:

You may be tempted to have the conversion operator return void const * instead of void *, to prevent a user from accidentally writing through the pointer. This is tempting, but wrong: C++ requires this pointer be non-const to serve in the roles we need.

Because operator void * doesn't return value_, there is now a dissonance between the internal state of boolean (as implemented by value_) and the externally perceived state (as betrayed by operator void *).

To behave more like builtin bool, boolean must "decay" to either 0 (for false) or 1 (for true) when used in expressions. As written, boolean decays to either a null pointer (0), or to some non-null pointer (which almost certainly won't be 1).

This also means boolean doesn't observe predicate calculus identities like b == !!b, unless you change how boolean decays, or overload operators (operator== or operator! in this example) specifically for boolean.

Overloading these operators also lets you catch more improper boolean use at run time. Remember that equality, relational, and logical expressions result in 0 or 1. If any other integer value is mixed with a boolean, it may imply something other than these types of expressions is involved.

You can't write b < TRUE because pointers (which the boolean conversion operator returns) can't be used as operands of relational expressions. If you want to allow such expressions, you need to write your own versions of the relational operators, although I question the semantic merits of asking if one boolean value is less than another.

Similarly, you can't write b++ without rolling your own postfix operator++, or creating conversion operators that return modifiable lvalues. The wisdom of such a boolean operator escapes me; after all, what is the meaning of b++ when b already equals TRUE? Perhaps unsurprisingly, builtin bool supports b++.

Conclusion

In the past four months, we've surveyed many of the fundamental considerations underlying user-defined types. Starting with a high-level notion of what we wanted to model, we journeyed from primitive macro implementations to enumerations, and finally to successively refined classes. Along the way we touched on a fair number of basic design concepts like information hiding and data type conversion, and surveyed many C++ language components, such as function members and constants objects.

I developed boolean some months back for my own project use, but soon after found it would make a fine catalyst for demonstrating type design. Rather than gloss over that development, I've instead carefully dissected my thought processes in the open. This way, you can witness the surprisingly large number of interdependencies and tradeoffs involved in creating even a simple class like boolean. o

Erratica

This month's educational mail comes from intrepid reader Russ Noseworthy, who writes that GNU C++ version 2.7.2 supports builtin bool. I am deathly afraid of constructing and running the GNU system on my twinkie machine, so I'm still unable to run the test suite against the real bool (although according to Russ, my "best guess" of how bool should work is apparently spot on). Fortunately, by the time you read this, I will have purchased vastly improved hardware — probably an Apple PowerBook — and may well have a bool-supporting compiler installed.

Next month I will offer a new topic, so new that even I don't know what it is yet. But rest assured, it will be the E-Ticket attraction you've come to know and love. Speaking of attractions, Dan Saks and I pontificate at the Embedded Systems Conference in Boston the first week of April, which marks my first time ever in New England. [I'll be there too I'm giving the keynote. Besides, I live just down the road. pjp] If you are in town, stroll by and give us a hearty "g'day." I hear that Stan Kelly-Bootle might be in Boston then . . . could give Dan a chance to figure out who he is.

Footnotes and References

[1] I use "constructors" as a plural although you can define only one of these overloads for a particular class. If you try to define more than one, you'll get an error — the compiler cannot tell which one you want called.

[2] Margaret A. Ellis and Bjarne Stroustrup. The Annotated C++ Reference Manual (Addison-Wesley, 1990).

[3] I do not want to open the semantic Pandora's Box of mixing builtin bool and our boolean class. Safe to say, you can mix the two, but the considerations are beyond the scope of this article.

[4] This assumes you are not calling boolean(char) from with a boolean member function. Remember private boolean members are accessible to all boolean members, so you can still call this constructor by mistake.

[5] I'm consciously omitting the builtin type wchar_t here because many compilers still don't support it as a distinct type.

[6] This is relatively new, as discussed by Dan Saks in his October 1995 CUJ column, and trumps the old "implicit int" rule inherited from C, which assumed a missing function return type was really int.

[7] I have purposely avoided discussing const members, deferring to a later column (or series) devoted to the topic. I note them here only for completeness. For the case of returning an rvalue or non-modifiable lvalue, as we are here with int, you are generally best off making the operator a const member.

[8] This is a change from C, which lets you assign from void * to any pointer type.

Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also an alumnus of Microsoft, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him at 3543 167th Ct NE #BB-301, Redmond WA 98052; by phone at (206) 881-6990, or via Internet e-mail as rschmidt@netcom.com.