

The Learning C/C++urve

Bobby Schmidt

Creating a Boolean Inserter

One way to find out if your new data type behaves like a builtin is to hook it up to an inserter. Bobby uses what he learns from this exercise to spruce up his boolean data type.


Introduction

Last month I left our continuing boolean saga with a version fettered with too much type safety. I also promised to rectify the weaknesses of that version this time around. As a means to that end, I'm changing course a bit by introducing boolean inserters; as we explore such inserters and the boolean limitations that hamper them, we'll discover solutions that not only enable inserters, but also solve problems posed last month.

CS 101

I'll wager the first C program you wrote was something along the lines of

#include <stdio.h>
int main(void)
    {
    printf("Hello, world!\n");
    return 0;
    }

printf is a function sporting a variable argument list. You should avoid such functions like the proverbial plague, for they completely defeat the language's ability to ensure the caller's arguments match what the function actually expects.

Unfortunately, C does not have the equivalent of Pascal's intrinsic writeln statement, which acts as a sort of inline-expanded printf with the format string implied; instead, C must call an external function and hope the arguments match the format string.

To the rescue comes the white knight of C++. While you can still use printf, the comp.lang.c++ elite will sniff at your proletarian tastes. Much more hip and cool are C++ inserters, syntactically represented by the operator <<. In fact, the canonical first C++ program is the above C program updated as

#include <iostream.h>
int main()
    {
    cout << "Hello, world!" << endl;
    return 0;
    }


Now you C programmers, and C++ newcomers, may wonder what is so spiffy about this << operator; after all, such bit shifting has been part of C since Dennis Ritchie was a pup. Well, to paraphrase the old car ads, this is not your father's bit-shift operator. C++ (or more accurately, the C++ standard library) has, in certain contexts, trumped the normal meaning of <<. Take a look in the standard C++ header file iostream.h[1]. There you will find a cornucopia of ostream member function declarations such as

ostream &operator<<(int);

where function operator<< is overloaded — that is, declared multiple times, with different parameter lists in each declaration — for all the builtin arithmetic types and a few pointer types.
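
To watch that overload selection at work, here is a minimal sketch. It assumes a standard C++ library (<sstream> and namespace std, which the compilers used in this column predate), and the helper name image_of is hypothetical. The helper inserts one value into a string stream and hands back the text the chosen operator<< produced.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: insert one value into a stream and return
// the text image that the selected operator<< overload produced.
template <typename T>
std::string image_of(T const &value)
    {
    std::ostringstream out;
    out << value;   // overload chosen from the static type of value
    return out.str();
    }

// Each call below resolves to a different operator<< overload.
void demo_overloads()
    {
    assert(image_of(123) == "123");      // int inserter
    assert(image_of(1.5) == "1.5");      // double inserter
    assert(image_of('x') == "x");        // char inserter
    assert(image_of("text") == "text");  // char const * inserter
    }
```

The caller never names a format; the argument's type alone picks the function, which is exactly the checking printf cannot offer.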

Typically you don't use ostream directly, but instead use a type derived from ostream. One such type, also defined in iostream.h, is ostream_withassign. While you can declare your own objects of this type, more often you use the ostream_withassign objects predefined in iostream.h. In fact, the canonical first C++ program above references one of these objects, cout, which is the ostream analog of the familiar stdout.

Life is But a Stream

All this permits writing the short program

#include <iostream.h>
int main()
    {
    int n = 123;
    cout.operator<<(n);
    return 0;
    }

which you can alternately (and more aesthetically) write

#include <iostream.h>
int main()
    {
    int n = 123;
    cout << n;
    return 0;
    }

Here the token << does not mean the builtin bit-shift operator. Instead it means the function operator<<, acting as an inserter; that is, it acts as an operator inserting the text image of a value (123 here) into a stream (cout). Note that you can chain inserters together, like UNIX pipes, so that

cout << "n = " << n << endl;

inserts the text sequence n = 123 (followed by a newline, courtesy of endl) into the stream cout.
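
Chaining works because each inserter returns the stream it was handed, so the result of one call becomes the left operand of the next. The sketch below spells that out with explicit function calls. It assumes standard C++, where the char const * inserter happens to be a non-member in namespace std while the int inserter is a member (in the iostream.h of this column both would be members); the function names chained and spelled_out are hypothetical.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// The chained form used in the column.
std::string chained(int n)
    {
    std::ostringstream out;
    out << "n = " << n;
    return out.str();
    }

// The same insertion written as explicit function calls. Each call
// returns the stream, so the next call can be applied to the result.
std::string spelled_out(int n)
    {
    std::ostringstream out;
    std::operator<<(out, "n = ").operator<<(n);
    return out.str();
    }

void demo_chaining()
    {
    assert(chained(123) == "n = 123");
    assert(spelled_out(123) == chained(123));
    }
```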

I want to use an inserter with boolean, so that

#include <iostream.h>
#include "booltest.h"
int main()
    {
    boolean b = TRUE;
    cout << "boolean b = " << b << endl;
    return 0;
    }

inserts the line boolean b = TRUE into cout. This leads me to add Rule #6 to the informal set of Abstract Data Type (ADT) rules I started creating last month: types acting builtin work directly with inserters.

As you peruse the list of inserters advertised in iostream.h, you'll note the distinct lack of an inserter accepting a boolean. That is, you won't find anything remotely like

ostream &operator<<(boolean);

as a member of class ostream. This is unsurprising, as ostream can't possibly know about the existence of our boolean. You could rewrite your compiler's ostream to recognize boolean, but I gently dissuade you from doing so; instead you should write a non-member function that achieves the same end.

Recall the earlier ostream member

ostream &operator<<(int);

Recall further that non-static function members implicitly have a hidden first parameter, the infamous this pointer. When operator<<(int) gets called, it is passed arguments as if it had really been called as operator<<(ostream * const this, int).

Similarly, if ostream had a member that understood boolean, it would be passed arguments as if it were really called as operator<<(ostream * const this, boolean).

Insert Tab B(oolean) Into Slot O(stream)

Since we can't reasonably augment ostream with such a function, we'll instead craft a non-member function with the same arguments. Where the member function makes the first parameter (this) hidden and implicit, the non-member function makes it overt and explicit:

ostream &operator<<(ostream * const, boolean);


Unfortunately, such a literal extrapolation from function members means that, instead of using

cout << b; // or operator<<(cout, b);

we are stuck with

&cout << b; // or operator<<(&cout, b);

since the first parameter is a pointer to an ostream. Better to define the function as

ostream &operator<<(ostream &, boolean);

so we can safely write [2]

cout << b;
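
A small sketch makes the difference at the call site concrete. It assumes standard C++ headers, and the two empty types by_pointer and by_reference are hypothetical stand-ins that exist only so both signatures can live in one file.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in types, one per signature style.
struct by_pointer { };
struct by_reference { };

// Literal translation of the hidden this parameter: a pointer.
std::ostream &operator<<(std::ostream *const left, by_pointer)
    {
    return *left << "via pointer";
    }

// The friendlier form: a reference.
std::ostream &operator<<(std::ostream &left, by_reference)
    {
    return left << "via reference";
    }

void demo_first_parameter()
    {
    std::ostringstream out;
    &out << by_pointer();          // caller must take the address
    out << ' ' << by_reference();  // reads like a builtin inserter
    assert(out.str() == "via pointer via reference");
    }
```

Note too that the pointer form cannot sit in the middle of a chain, since the preceding inserter yields a reference rather than a pointer.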


Augmenting class boolean with such a function yields Version 7:

enum boolean_value
    {
    FALSE,
    TRUE
    };
class boolean
    {
public:
    boolean(boolean_value = FALSE);
private:
    char value_;
    };
boolean::boolean(boolean_value value) :
        value_(value ? TRUE : FALSE)
    {
    }
ostream &operator<<(ostream &left, boolean right)
    {
    return left <<
            (right.value_ ? "TRUE" : "FALSE");
    }

Friends, References, Countrymen

Sadly, if you append Version 7 to booltest.h and try compiling our simple test of

#include <iostream.h>
#include "booltest.h"
int main()
    {
    boolean b = TRUE;
    cout << "boolean b = " << b << endl;
    return 0;
    }

the compiler will object with something like

'operator<<' : cannot access private member of class 'boolean'


Our operator<< references the boolean private data member value_. The only functions that can reference private data are either members of the class defining the data, or friends of that class. operator<< cannot be a boolean member (otherwise its left operand would have to be a boolean instead of an ostream), so we must make it a boolean friend.

Also note that I pass the boolean argument to operator<< by value (boolean b) rather than by constant reference (boolean const &b). Conventional wisdom says you should generally pass objects of class type by constant reference to avoid the overhead of constructing temporaries. Here that wisdom is not so clear. boolean objects contain a single character of data (value_), while references are generally implemented as pointers; thus a boolean object may be smaller than the corresponding reference, meaning less stuff is passed during the call. However, what is gained in size may be lost in speed as the temporary boolean object is constructed.

Instead of optimizing the data passed during the function call, I'm probably better off optimizing the call itself through inlining[3]. A clever compiler could then net all this out so there were no actual arguments passed in either scenario. Still, as a columnist I'm supposed to be a paragon of C++ virtue, so I'll defer to conventional wisdom and pass the parameter by constant reference. All these changes yield Version 8:

enum boolean_value
    {
    FALSE,
    TRUE
    };
class boolean
    {
    friend ostream &operator<<
            (ostream &, boolean const &);
public:
    boolean(boolean_value = FALSE);
private:
    char value_;
    };
inline boolean::boolean(boolean_value value) :
        value_(value ? TRUE : FALSE)
    {
    }
inline ostream& operator<<
        (ostream &left, boolean const &right)
    {
    return left << (right.value_ ? "TRUE" : "FALSE");
    }

Saved into booltest.h, Version 8 works swimmingly for our tiny test program. If you link and run that program on a system that lets you capture what's written to cout, you'll see correct results.
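
If your system offers no easy way to capture cout, a string stream can stand in for it. The sketch below is Version 8 transcribed to standard C++ (<ostream>, <sstream>, and the std:: prefix are assumptions beyond what this column's compilers provide), plus a hypothetical helper named captured that returns the inserted text.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

enum boolean_value
    {
    FALSE,
    TRUE
    };

class boolean
    {
    // Friendship lets the non-member inserter read value_.
    friend std::ostream &operator<<(std::ostream &, boolean const &);
public:
    boolean(boolean_value = FALSE);
private:
    char value_;
    };

inline boolean::boolean(boolean_value value) :
        value_(value ? TRUE : FALSE)
    {
    }

inline std::ostream &operator<<(std::ostream &left, boolean const &right)
    {
    return left << (right.value_ ? "TRUE" : "FALSE");
    }

// Hypothetical helper: insert into a string stream instead of cout
// and return what was written.
std::string captured(boolean const &b)
    {
    std::ostringstream out;
    out << "boolean b = " << b;
    return out.str();
    }

void demo_version8()
    {
    boolean b = TRUE;
    assert(captured(b) == "boolean b = TRUE");
    assert(captured(boolean()) == "boolean b = FALSE");
    }
```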

I'm no Bool, no Sirree

Now change the program to

#include <iostream.h>
#include "booltest.h"
int main()
    {
    cout << "TRUE = " << TRUE << endl;
    return 0;
    }

and the test is less cooperative. My Microsoft compiler gives a diagnostic claiming operator<< is ambiguous. MetaWare does better, successfully compiling and linking. However, rather than writing the desired

TRUE = TRUE

the resulting program writes

TRUE = 1


That number 1 should look familiar: it is the integral value of the enumeration constant TRUE. It would seem that operator<< is interpreting TRUE not as an enumeration constant, but as an integral. But why is this even an issue? After all, shouldn't the program call the operator<< we defined above?

Remember that TRUE is of type boolean_value. The operator<< overload we defined for Version 7 accepts a boolean, not a boolean_value. In fact, there is no operator<< that accepts a boolean_value. However, as we saw earlier, there are several operator<< overloads that accept integral types — and enumeration types (like boolean_value) easily decay to their underlying integral type.

So, instead of passing the enumeration constant TRUE directly to the function operator<<, C++ will let TRUE decay into its underlying integral type, then pass that integral type to the appropriate operator<<. While Microsoft's compiler apparently can't choose a single best fit, leading to the "ambiguous overload" message, MetaWare's can. Even so, the results are incorrect; we want FALSE and TRUE to behave just like boolean.
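
The decay is easy to reproduce in isolation. This minimal sketch assumes a standard-conforming C++ compiler and library, where the enumeration is promoted to int and the int inserter is selected (consistent with the MetaWare result above); the helper name image_of is hypothetical.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

enum boolean_value
    {
    FALSE,
    TRUE
    };

// Hypothetical helper: return the text an inserter produces for value.
std::string image_of(boolean_value value)
    {
    std::ostringstream out;
    out << value;   // no boolean_value inserter exists, so value is
                    // promoted to int and the int inserter is chosen
    return out.str();
    }

void demo_decay()
    {
    assert(image_of(TRUE) == "1");
    assert(image_of(FALSE) == "0");
    }
```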

Since adding an operator<< that accepted boolean worked for that type, perhaps adding an operator<< that accepts boolean_value will work for FALSE and TRUE. Adding such a function produces Version 9:

enum boolean_value
    {
    FALSE,
    TRUE
    };
class boolean
    {
    friend ostream &operator<<
        (ostream &, boolean const &);
public:
    boolean(boolean_value = FALSE);
private:
    char value_;
    };
inline boolean::boolean(boolean_value value) :
        value_(value ? TRUE : FALSE)
    {
    }
inline ostream &operator<<
        (ostream &left, boolean const &right)
    {
    return left << (right.value_ ? "TRUE" : "FALSE");
    }
inline ostream &operator<<
        (ostream &left, boolean_value const &right)
    {
    return left << (right ? "TRUE" : "FALSE");
    }

Too Many Notes

Hmmm. A bit redundant, isn't it? I'm troubled that we need two separate operator<< overloads for what are conceptually the same kind of objects. This is a red flag, indicating that we probably have a hole in our design strategy. We need two operator<< overloads because boolean is not boolean_value. Perhaps the time has come to reevaluate our use of boolean_value.

We originally introduced boolean_value in Version 3, before making boolean a class. In that version, boolean_value added type safety. But classes are obscenely more type safe than enumerations. Further, the only "objects" of type boolean_value are FALSE and TRUE. A more ideal solution would both keep FALSE and TRUE type-safe, and make them compatible with boolean, eliminating the need for two operator<< overloads.

The trick, simple and perhaps too obvious, is to actually make FALSE and TRUE of type boolean. This trick bears out the adage "what goes around comes around," for we wrote such a solution in Version 2:

static boolean const FALSE = 0;
static boolean const TRUE = 1;

In C++, these statements are equivalent to

boolean const FALSE(0);
boolean const TRUE(1);


Points to ponder:

I want to explore these last two points (the constructor call and removing boolean_value) more fully. The statement

boolean const FALSE(0);

suggests FALSE is created by a constructor accepting 0 as an argument. As written, boolean has no such constructor. In fact, by removing type boolean_value, we also have to remove boolean(boolean_value), leaving us with no explicit constructors[6].

To support our new definitions of FALSE and TRUE, I introduce a new constructor accepting an int. You may be tempted to write that constructor's implementation as

inline boolean::boolean(int value) :
        value_(value ? TRUE : FALSE)
    {
    }

modeled after the (now obsolete) boolean(boolean_value) implementation. This new constructor, like that old one, initializes the boolean object from FALSE and TRUE. Now consider what happens if the objects we are constructing are the actual FALSE and TRUE objects. By using this constructor, FALSE and TRUE would be initialized from ... FALSE and TRUE! Turtles, turtles everywhere.

Perhaps surprisingly, this would actually compile. C++ fills static objects with all zeroes at (conceptual) load-time, so that FALSE.value_ and TRUE.value_ would contain zero before they were ever constructed. Thus, before these two objects are initialized, they hold values equivalent to FALSE after construction. The net result, then, is that FALSE would construct properly, but TRUE would be initialized as if it were FALSE.

Little Change, Big Effect

To get around this, change the constructor slightly to

inline boolean::boolean(int value) :
        value_(value ? (char) 1 : 0)
    {
    }

"Slightly" is probably misapplied here, for while the source code change is small, the effect is actually quite large (as we'll see shortly). Also notice I'm casting 1 to be a char; otherwise the compiler may complain of data loss (squeezing an int constant into a smaller char object)[7]. Adapting all these changes to Version 9 gives us Version 10:

#include <iostream.h>
class boolean
    {
    friend ostream &operator<<
        (ostream &, boolean const &);
public:
    boolean(int = 0);
private:
    char value_;
    };
inline boolean::boolean(int value) :
        value_(value ? (char) 1 : 0)
    {
    }
inline ostream &operator<<
        (ostream &left, boolean const &right)
    {
    return left << (right.value_ ? "TRUE" : "FALSE");
    }
boolean const FALSE(0);
boolean const TRUE(1);

If I append this to booltest.h and compile the little test suite

#include <iostream.h>
#include "booltest.h"
int main()
    {
    boolean b(TRUE);
    cout << "b = " << b << endl;
    cout << "TRUE = " << TRUE << endl;
    return 0;
    }

I find the program builds and runs correctly with both my compilers.
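
For readers with a newer compiler, here is the same check as a sketch in standard C++. The standard headers, the std:: prefix, and the hypothetical helper captured are assumptions; the class itself is Version 10 unchanged. One inserter now serves both boolean variables and the FALSE and TRUE constants.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

class boolean
    {
    friend std::ostream &operator<<(std::ostream &, boolean const &);
public:
    boolean(int = 0);
private:
    char value_;
    };

inline boolean::boolean(int value) :
        value_(value ? (char) 1 : 0)
    {
    }

inline std::ostream &operator<<(std::ostream &left, boolean const &right)
    {
    return left << (right.value_ ? "TRUE" : "FALSE");
    }

// The constants are now ordinary boolean objects.
boolean const FALSE(0);
boolean const TRUE(1);

// Hypothetical helper: return "label = image" for a boolean.
std::string captured(char const *label, boolean const &b)
    {
    std::ostringstream out;
    out << label << " = " << b;
    return out.str();
    }

void demo_version10()
    {
    boolean b(TRUE);
    assert(captured("b", b) == "b = TRUE");
    assert(captured("TRUE", TRUE) == "TRUE = TRUE");
    assert(captured("FALSE", FALSE) == "FALSE = FALSE");
    }
```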

A better test, of course, is compiling January's test suites with this version. My results of such compilation appear in Table 1, and show the success percentage has leapt to a whopping 78%, far and away the highest yet. Highlights:

Overall, the high success percentage tells me we are getting very close to what I consider ideal behavior for a simple boolean type.

Next month's column wraps up our survey of fundamental type design. I will add the remaining boolean members, augment the test suite, and discuss the final version's merits (and limitations). I will also run the genuine builtin bool type through the same tests, as a control measurement against our implementation[8].

Erratica

Note that in my examples this month my main returns an int. In previous listings it returned void so that I wouldn't need an actual return statement. I reasoned that return was not pertinent to what I was demonstrating, and that using it might induce you to wonder how the returned value fit with the problem at hand. But as diligent reader James Stern points out, returning void from main is not strictly conforming, and I had promised in my first column to stick with strictly conforming constructs.

Like programming in general, teaching programming is an exercise in compromise. Were I literally to use only strictly conforming code, the resulting examples would often be pedantic and contain much distracting overhead. Upon reflection, I don't think that returning int from main is such a case, so I am comfortable amending my wicked ways there. But I am also backing off from my doctrine of strictly conforming code where such code would inhibit the clarity and conciseness of my examples. I am not crafting production-quality maintainable code in a "real world" environment; I am writing small examples as demonstration tools. Given the choice, I will sacrifice some portability if doing so makes for more cogent teaching.

Footnotes and References

[1] The C++ Working Paper denotes inclusion of this header as #include <iostream>, without the trailing .h. I have yet to use a compiler that supports this nomenclature, so for the time being, I'll include the .h in my examples. Also, some compilers (such as MSVC 2.x and earlier) define class ostream in the file ostream.h.

[2] You may be wondering, especially if you program primarily in C, why adding & in the operator<< parameter list means omitting & before cout. Such is the mystery of references, which unfortunately use the same token (&) as the address of operator. For now, just take it on faith; later this year, I'll cover references in greater detail.

[3] I generally avoid inline functions, except for code that demonstrably requires the performance tuning, and even then only in release versions. Inline code often requires more frequent recompilation, may be difficult or impossible to step into with a debugger, and can actually make code slower.

[4] If your compiler supports namespaces, you can define FALSE and TRUE in a namespace and make them look global with a using declaration.

[5] This is a general annoyance with extern objects: you cannot know in which order they are constructed and initialized. This problem crops up with the standard library's predefined stream objects (cout and the like), which you are allowed to use in constructors for your own objects — even though the streams may not yet be constructed. The library employs smoke and mirrors to pull this off, a topic covered by P.J. Plauger in the May 1994 CUJ.

[6] But we still have the implicit language-synthesized default and copy constructors.

[7] I could use one of the new and improved RTTI casts here. However, your compiler may not fully support RTTI. Further, such casts would add little to this discussion, at the cost of confusing C and beginning C++ programmers.

[8] At the time of writing, I still don't have available to me a compiler implementing builtin bool. If this condition persists, I will have to "play compiler" and translate the code by eye.

Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also an alumnus of Microsoft, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him at 3543 167th Ct NE #BB-301, Redmond WA 98052; by phone at (206) 881-6990, or via Internet e-mail as rschmidt@netcom.com.