Columns


Standard C/C++

P. J. Plauger

The Header <utility>

It's not the sexiest header in the Standard Template Library, but <utility> defines templates used throughout STL. Becoming familiar with them now lays important groundwork for understanding the rest of STL.


Introduction

The Standard Template Library (STL for short) is a major component of the draft Standard C++ library. It consists of thirteen headers, as summarized in Table 1. After several months of overviews, it's time to get down to the nitty gritty. I begin this month the detailed review of these headers.

The first item on the list is one of the smallest, the header <utility>. It is also arguably the least interesting of the lot. That's not an auspicious way to begin a long journey, but it is necessary in many ways. The templates defined in the header <utility> are used throughout STL. The sooner you become familiar with them the better.

I hope to compensate for the mundane nature of <utility>, at least in part, by covering several more global issues. Any large collection of source files needs some overarching design criteria, if only to avoid repeating decisions over and over again. It also helps to have clearly stated goals for how the software is to be used, if only to lend a certain consistency to otherwise arbitrary choices. So I begin by discussing the general design of the version of STL I will be presenting in these pages.

Header Conventions

To keep the discussion concrete, take an early look at Listing 1. It shows one way to implement the header <utility>. I'll discuss the details of this small header later in this installment. For now, I focus on the overall structure of the header.

First the obvious. Bracketing the header is the usual explanatory comment and macro guard:

// utility standard header
#ifndef _UTILITY_
#define _UTILITY_
.....
#endif /* _UTILITY_ */


Many C and C++ programmers will recognize this as a common idiom for protecting an include file against multiple inclusions. A typedef or a macro definition can be seen multiple times in a translation unit, so long as the resultant definition is the same each time. (This is called "benign redefinition.") But essentially all other entities must be defined at most once. Here, defining the macro _UTILITY_ serves as a first-time switch to ensure that the body of the header is considered just once, even if it is mentioned in multiple include directives within a translation unit. That protects against troublesome multiple definitions, while encouraging liberal use of include directives in user code.

The macro name _UTILITY_ is chosen with similar care. Both the C Standard and the draft C++ Standard reserve certain sets of names for use by the implementor. In particular, names that begin with an underscore followed by an upper case letter are strictly off limits to all but implementors. Thus, an implementation that uses names just from this set can be certain not to interfere with the proper translation of any conforming program. (Implementors of different parts of a library must still take care not to tread on each others' toes, however.)

The next line of interest is the include directive:

#include <iosfwd>

The header <iosfwd> is yet another one of those inventions of the committees standardizing C++. It supplies several forward references (declarations of incomplete types) for classes defined as part of the iostreams headers. Civilians seldom have occasion to include this header, but to us implementors it is often a godsend. It helps break up any number of circular dependencies among the components of the draft Standard C++ library.

The primary reason for so many of these circular dependencies is historical. To be more precise, it stems from a lack of history. Not to put too fine a point on the matter, the vast bulk of this library was standardized long before it was ever implemented. Those of us supplying the first implementations of the library are essentially debugging the draft C++ Standard as we proceed.

Strictly speaking, the header <utility> has no use for any of the declarations in <iosfwd>. In practice, however, I've had to spend days tinkering with the structure of header inclusion after making the changes voted in at each meeting of the committees to get everything working again. When I find a simple scheme that works, I tend to stick with it. In this particular case, other headers that include <utility> tend to need <iosfwd> in turn. And <iosfwd> itself includes still other widely useful items. I therefore find it convenient to lob in this apparently gratuitous include directive here.

I could have glossed over this implementation detail, dismissing it as an unwelcome distraction. I prefer, however, to underscore how difficult it is to get large quantities of software to cooperate successfully. This particular example also emphasizes that STL does not exist in a vacuum. Self contained as it is in many ways, it is still part of a (much) larger library.

Namespaces and Other Novelties

Proceeding from the outside in, the next bit of structure in the header <utility> is a bracketing pair of comments:

///namespace std {
.....
///};// end of namespace std

Obviously, this does nothing. But if you were to remove the curious triple-slash comment delimiters, you would turn on a namespace declaration. This particular declaration ensures that all (non-macro) names defined inside the braces occupy the namespace called std.

Indeed, the draft C++ Standard has for some time now mandated that all names defined in the library be confined to namespace std. If you want to use a library name, you basically have one of three choices:

1) Qualify every name from the library, as in:

	#include <cstdio>
	int main()
	{std::printf("hello world\n");
	return (0); }

2) "Hoist" the names you choose to use into the global namespace, as in:

	#include <cstdio>
	using std::printf;
	int main()
	{printf("hello world\n");
	return (0); }

3) Hoist all the names, as in:

	#include <cstdio>
	using namespace std;
	int main()
	{std::printf("hello world\n");
	return (0); }

Subtle differences exist in the actual usability of library names with each of these approaches, but I won't begin to address them here. Mostly, they all deal about the same way with the presence of namespace std.

You should be pleased to note that including the traditional header <stdio.h> essentially behaves the same as the first two lines of #2 above, except that it hoists all the names declared in the header. Put simply, it's just like the good old days. (Unfortunately, most of the similar machinery for the C++-specific headers has been dropped from the draft C++ Standard along the way. You'll soon get used to writing using declarations at the top of source files.)

Whatever you may think of namespaces, good or bad, they present a problem for me. Compilers are just appearing in the marketplace that support namespaces. Ones that don't will remain popular for years to come. So do I present code that conforms to the letter of the draft, or do I present code that is likely to be widely executable today? Multiple arguments exist both ways.

It gets worse. The library makes use of all sorts of "new" C++ language features besides namespaces. I put "new" in quotes because some of these features were added years ago, but have yet to find their way into popular compilers. Any given compiler will have some interesting mixture of capabilities. I know of none available today that supplies all capabilities that are required to fully support the draft Standard C++ library.

Real production code is thus chock full of conditional directives. A given implementation turns on whatever features it can. It disables, or supplies a simpler alternate form, for those it cannot. You get a Chinese menu of possibilities at the cost of headers that border on the illegible.

I do have a production version of the entire draft Standard C++ library (coming soon to a commercial compiler near you). Showing you that code here would be a real education in the nitty gritty details of writing portable commercial code. Indeed, I have shown that sort of stuff in the past — with code not quite so heavily parametrized.

But my primary goal in these installments is to describe the intricacies of STL. That's quite a heavy enough undertaking without adding further distractions. Hence, I've chosen to derive from my production code a tutorial version that does essentially the same thing. The major difference is that it is designed to be significantly more readable.

I cut the Gordian knot with a simple bit of notation. The code as presented is designed to work with a variety of popular compilers available today. (I've stuffed it through Borland 4.5, Microsoft v4.0, and Edison Design Group's widely used front end, for example.) Anything likely to cause trouble is commented out with a triple-slash comment.

To be more precise, any code to the left of a triple-slash comment is "commented in." You convert from a pragmatic, workable-today version to a fully conforming, workable-some-day version by a simple edit. Just delete everything on a line up to and including three slashes. Whatever is left represents the future ideal. Any intermediate stages have been dropped in translation.

Every STL header uses this treatment to turn off namespaces for now. In future installments, you'll see much more aggressive use of this notation, however. It's not always wonderfully readable, but it sure beats all those ifdef directives.

The Two-Edged Sword

I've made one additional concession to readability. Listing 1 makes liberal use of traditional names for types, such as T1 and T2, and for objects, such as x and y. These fairly arbitrary names often, if not always, agree with those used in the draft C++ Standard itself. They certainly follow widespread conventions for naming things in C++. Thus, names "internal" to the headers are chosen as much as possible to maximize understanding.

The only problem is, they don't conform to the requirements of the draft C++ Standard. No, there's no problem with "breaking" user code. Parameter names, and names nested inside classes and functions, don't leak out to pollute the containing environment. Rather, the problem is the other way around. Under a variety of circumstances, names chosen by the user can percolate into library declarations and definitions, altering their meanings in pernicious ways. The net effect is often a mysterious diagnostic. On occasion, you can even get a program that compiles but quietly misbehaves.

The easiest way to cause trouble is to define a macro with a common name like T1 or x. If you do so before the first inclusion of a library header, the effect is to rewrite any declarations that make use of these names. Whether you cleverly subvert the library or inadvertently clobber it, serious programming is seldom well served by a library vulnerable to such attacks.

Yes, I know that T1 and x are lousy names for macros. I'm sure a significant fraction of the readership of this magazine wouldn't dream of indulging in such antics. Most C and C++ programmers soon learn to stylize names for defined types, macros, etc. to minimize the problems with later maintenance. But the draft C++ Standard nevertheless permits a program to define macros with these names they are part of the name space reserved for use by the user. (A name space should not be confused with the C++ namespace declaration, despite their similarities.)

You can, in fact, often cause trouble just by declaring types with these common names. I won't try to contrive an example here, but it is a well known problem in both C and C++. In too many contexts, a type in a containing environment alters the interpretation of a declaration, or even some expressions, in a nested environment.

Production library code should treat all names with the same caution it extends to macro names it contrives for its own purposes. That means writing stuff like:

template<class _T1, class _T2> inline
bool operator==(const pair<_T1, _T2>& _X,
    const pair<_T1, _T2>& _Y)
{return (_X.first == _Y.first
    && _X.second == _Y.second); }

instead of the more readable equivalent in Listing 1. I've gotten pretty accustomed to writing and reading this sort of thing all the time. (The example above is an actual excerpt from the production code.) You can understand, however, why I might choose not to inflict yards of such code on people more interested in learning STL.

The Header <utility>

With those lengthy preliminaries out of the way, we can now focus on what the header <utility> actually does. The first item on the list is the template class pair. It is simply a way to specify a structure containing an ordered pair of objects of arbitrary types. Several occasions exist within STL where a function needs to return a pair of related values, or a container needs to store such a pair. This template class is a notational convenience for those cases.

You can construct a pair with explicit initial values, or let the default constructor supply default initial values. (This is not my favorite way of writing the default constructor. It just happens to be the one form I've found that avoids common parsing bugs in existing compilers.) No fancy information hiding here you can access the first member object of the pair x as x.first, and the second as x.second.

The template function make_pair is a handy companion. With it, you can often contrive the pair object you need on the fly. Sadly, however, you can't always do what you want. Template functions ignore any const attributes when determining template parameters. At least they're supposed to current compilers are erratic in this area. So you can't reliably contrive with make_pair a pair object that has one or more const member objects. Still, the template function has its uses.

Sometimes it makes sense to compare two pair objects x and y. The next two templates define functions that perform the essential comparison operations. Equality is the easier of the two to justify. x and y are arguably equal if their corresponding member objects are equal. But what does it mean to say that one pair is "less than" another?

You can begin by assuming that the first of the two member objects carries more weight than the second. In that case, x is less than y for sure if x.first < y.first. You can then go on to add the case where x.first == x.second && x.second < y.second. Curiously, however, the code doesn't say exactly that. Rather, it is written purely in terms of operator<, as defined for each of the member object types. With a bit of head scratching, you can convince yourself that the form actually used is equivalent to the logic I just outlined — at least if the comparison operators have their usual intuitive definition for the member object types.

One reason for this peculiar form is simple elegance. Much better to define a template in terms of just one operator function per parameter type instead of two. Another reason is even subtler, however. Defining ordering relationships in terms of just a single operator< function is a bit more robust, and more powerful, than requiring a whole family of interrelated comparison operators. I'll explain why in more detail in conjunction with some of the algorithms that impose order on sequences. All I'll say for now is that STL worries considerably about seemingly small details like this. That's one of its strengths.

Perhaps now you can guess why the four remaining templates are present in the header <utility>. The first supplies a sensible definition of operator!= for any type T that defines operator==(const T&, const T&). STL code freely intermixes the use of these two operators, for example, in comparing iterators to control loops over sequences. In the presence of this template, all you have to do is define operator== for a class and the library fills in the obvious blanks.

The other three template functions flesh out operator< in a similar way, to provide compatible definitions of the other three order comparison operators: operator<=, operator>, and operator>=. Here, however, the result is somewhat more controversial. The definitions provided enforce what is called a total ordering on objects of type T. In many cases, that's exactly what the programmer expects, but not always.

Some classes may define only a partial ordering. (Scissors cut paper, paper covers rock, rock smashes scissors.) Others may choose to define these operators in ways that have little or nothing to do with conventional ordering rules. In those cases, the presence of these template functions is a downright nuisance. They cause ambiguities, or generate surprising code, or support notation that the class designer might find actively distasteful.

The committees standardizing C++ have been arguing off and on for over a year about these template functions. It doesn't help that they were a last-minute addition to the STL proposal. Hence, there's very little field experience with them, particularly compared to STL as a whole.

For the moment, however, they are very much present. And their presence shapes the implementation of the entire Standard C++ library. I personally don't think that's such a bad thing. But then, I tend to be rather conservative in how I overload operators.

Whatever the eventual outcome, I intend to continue coding as if the header <utility> is always present, and these template operator functions are always visible. I advise you to do the same. o

P.J. Plauger is senior editor of C/C++ Users Journal. He is convener of the ISO C standards committee, WG14, and active on the C++ committee, WG21. His latest books are The Draft Standard C++ Library, Programming on Purpose (three volumes), and Standard C (with Jim Brodie), all published by Prentice-Hall. You can reach him at pjp@plauger.com.