Questions And Answers

Pete Becker

Using Conversion Operators

There are very few places where the Standard C++ library provides conversion operators in standard classes. Pete explains why, and warns of some pitfalls in using conversion operators.

To ask Pete a question about C or C++, send e-mail to pbecker@wpo.borland.com, use subject line: Questions and Answers, or write to Pete Becker, C/C++ Users Journal, 1601 W. 23rd St., Ste. 200, Lawrence, KS 66046.

Q
Did I read the last answer in your column in the July 1996 C/C++ Users Journal correctly? Has the C++ Standards Committee decided not to recommend the use of casting operators (conversion operators)?

For example:
class A
{
public:
    operator long ();
};

A var;
long MyLong;
// convert from "class A"
// to "long" and assign
MyLong = (long)var;
I tend to agree -- the usefulness of these casting operators tends to be overshadowed by the introduction of obscure bugs when the compiler uses them "secretly." Much better is to code the operation as a member function, e.g.
class A
{
public:
    long to_long();
};

A var;
long MyLong;
// call member function which returns "long"
MyLong = var.to_long();
-- Dave Bartrum

A
Yes, you read my answer correctly: there are very few places where the Standard C++ library provides conversion operators in standard classes. That's not quite the same as deciding not to recommend them, since it is not the function of the ANSI/ISO committee to recommend how the language should be used. However, we do regard the standard library as a sample of how we believe good code can be written, and quite a bit of time has been spent during the standardization process removing conversion operators from the standard classes. Conversion operators do, indeed, lead to problems in classes like the one in your example. In that class, the problem doesn't arise from the explicit use of the cast; it arises in those cases where there is no cast in the source code, but the compiler uses the conversion. For example:
long MyLong1 = var; // no cast
Despite the absence of a cast in this expression, the compiler will use the conversion operator that class A defines, and convert var to long in order to perform this initialization. In a simple example such as this one it's obvious what's going on, but when things get more complicated -- as in an expression where more than one conversion is available -- it can be very difficult to figure out what's happening. Here's a tricky example:
class CharArray
{
public:
 int operator[](long) const;
 operator char *() const;
};

int main()
{
CharArray arr;
// ambiguous!
cout << arr[1] << endl;
return 0;
}
If you don't know why the output expression is ambiguous, welcome to the club. This one gets past almost everyone. In evaluating the expression arr[1] there are two possible interpretations that the compiler can use. First, it can apply the index operator to the object of type CharArray, getting back an int. Second, it can apply the conversion to char *, then apply the built-in index operator to the resulting pointer, just as if arr had been an array of char. In deciding which of these to use, the compiler applies the usual rules for deciding between overloaded functions. This requires several steps.

First, the compiler looks at the first argument and decides which of the two alternatives uses the better conversion. Using the CharArray index operator requires no conversion, while using the conversion to char * requires a user-defined conversion. So the index operator wins on the first argument.

Second, the compiler looks at the second argument, and decides which of the two alternatives uses the better conversion. Using the CharArray index operator requires converting the int argument to a long, while using the conversion to char * and the built-in index operator requires no conversion. So the conversion to char * wins on the second argument.

Third, the compiler takes the set of functions that best match on the first argument and the set of functions that best match on the second argument and computes their intersection. Since there are no functions that are in both of these sets, the function call is ambiguous. That's the rule.
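The two readings of arr[1] that the compiler weighs can also be written out by hand, which takes the choice away from overload resolution. What follows is only a sketch: the stored text "hello", the function bodies, and the helper names via_member and via_pointer are assumptions made for illustration and are not part of the example above.

```cpp
#include <cassert>

// Same shape as the CharArray above. The data member and the
// function bodies are invented so that the sketch can run.
class CharArray
{
public:
    // Reading 1: the class's own index operator.
    int operator[](long i) const { return text_[i]; }

    // Reading 2 begins with this user-defined conversion.
    operator char *() const { return const_cast<char *>(text_); }

private:
    char text_[6] = "hello";
};

// Names the member operator directly, so only reading 1 is possible.
int via_member(const CharArray &arr)
{
    return arr.operator[](1L);
}

// Forces the conversion to char *, then applies the built-in
// index operator to the resulting pointer (reading 2).
char via_pointer(const CharArray &arr)
{
    return static_cast<char *>(arr)[1];
}

// In this sketch both readings happen to fetch the same character.
void both_readings_agree()
{
    CharArray arr;
    assert(via_member(arr) == via_pointer(arr));
}
```

Neither helper leaves anything for the compiler to decide. The trouble described above arises only when the source says arr[1] and both routes are candidates.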

Now, there are a couple of things you can do to avoid the problem in this example. You can change your index operator to take an int instead of a long, and the ambiguity goes away. If you need convincing, start at the second step in disambiguating arr[1]: now both of the two possible interpretations can be used without any conversion of the second argument, so the set of best matches on the second argument contains both functions. The intersection of the two sets now consists of exactly one function, the index operator, and that's the one the compiler chooses.
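Here is a minimal sketch of that change. To make it visible which function is chosen, the index operator returns a marker value (1000 plus the index) instead of a character; that marker, the stored text, and the helper name index_one are assumptions made for illustration only.

```cpp
#include <cassert>

class CharArray
{
public:
    // Takes int, so the literal 1 needs no conversion here.
    // Returns 1000 + index purely as a marker for this sketch.
    int operator[](int i) const { return 1000 + i; }

    // Still present: the conversion to char * is a candidate too.
    operator char *() const { return const_cast<char *>(text_); }

private:
    char text_[6] = "hello";
};

// arr[1] now resolves to CharArray::operator[](int): it is at least
// as good on the second argument and better on the first.
int index_one()
{
    CharArray arr;
    return arr[1];
}

void check_member_wins()
{
    // The marker shows the member ran, not the built-in operator
    // applied to the converted pointer (which would yield 'e').
    assert(index_one() == 1001);
}
```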

If there was a good reason for making the index operator take a long in the first place, changing the argument type is not really a good solution. If you need a long value for this index, the need doesn't go away just because you'd like to make the call unambiguous. Don't change design decisions just to make the compiler happy.

A better solution is to get rid of the conversion operator and replace it with an explicit function. That's what the ANSI/ISO committee did with the string class: it has an explicit function named c_str() to get a const char *, and there is no conversion operator to char *.

Now, you may be objecting that this is a design change that's being made just to make the compiler happy. I disagree. The design is still intact, at least at a high level: there is still a mechanism for getting a char * from objects of this type. What has changed is the mechanism that is used for doing it. This is purely a syntactic change -- that is, all that has changed is what your code must say in order to invoke this conversion. With this change, the output statement that was ambiguous is no longer ambiguous. You can still invoke the char * conversion and use the built-in index operator if you do this:
cout << arr.c_str()[1] << endl;
If that makes you wince, that's good. Be careful about using pointers to the internals of classes. It's easy to go wrong, and there's no good way for the class to check what you're doing. For example, if you accidentally wrote
cout << arr.c_str()[1000000] << endl;
you'd probably be in trouble.
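A sketch of CharArray after this change might look like the following. The stored text, the marker value returned by the index operator, and the two helper names are assumptions for illustration. The sketch also has c_str() return const char *, as the standard string class does, which is an assumption about how you would write it for CharArray.

```cpp
#include <cassert>

class CharArray
{
public:
    // Index operator still takes long, as the design called for.
    // Returns 1000 + index purely as a marker for this sketch.
    int operator[](long i) const { return 1000 + static_cast<int>(i); }

    // Named function replaces the conversion operator.
    const char *c_str() const { return text_; }

private:
    char text_[6] = "hello";
};

// Only one interpretation is left: the member index operator.
int index_via_operator()
{
    CharArray arr;
    return arr[1];
}

// The pointer route still exists, but the caller must ask for it.
char index_via_c_str()
{
    CharArray arr;
    return arr.c_str()[1];
}

void check_both_routes()
{
    assert(index_via_operator() == 1001);
    assert(index_via_c_str() == 'e');
}
```

The index passed through c_str() is still unchecked, so the warning above about out-of-range indices applies to index_via_c_str() just as it does to the one-line example.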

There's another source of conversions that we haven't talked about yet: single-argument constructors. They can lead to a similar set of problems:
class string
{
public:
 // initialize from c array
 string( const char * );

 // pre-allocate specified number
 // of bytes
 string( size_t );
};

void show( string s )
{
 cout << s;
}

int main()
{
 show(3);
}
The call to show() creates a string with space allocated for three bytes and passes that string to the function show(). The result is almost certainly not what the writer of this code intended. The problem here is that the compiler treats string(size_t) as a conversion, and implicitly uses it in order to be able to call the function show(), which expects a string. This problem can't be solved by changing a name, as we did with the conversion operator; the name of a constructor must be the same as the name of its class. That's why the language now has the keyword explicit, which tells the compiler that it cannot use a constructor marked explicit to perform implicit conversions. The change to our source code is simple:
class string
{
public:
 // initialize from c array
 string( const char * );
 // pre-allocate specified number
 // of bytes
 explicit string( size_t );
};
Now the call to show() is invalid, because the compiler cannot convert the argument 3 to a string.
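The effect of explicit can be checked at compile time. The sketch below assumes a C++11 compiler and uses the standard type traits std::is_convertible and std::is_constructible. The class names LooseString and StrictString are hypothetical stand-ins for the two versions of the string class shown above, with empty constructor bodies.

```cpp
#include <cstddef>
#include <type_traits>

// Stand-in for the first version: both constructors are converting.
class LooseString
{
public:
    LooseString(const char *) {}
    LooseString(std::size_t) {}
};

// Stand-in for the second version: the size constructor is explicit.
class StrictString
{
public:
    StrictString(const char *) {}
    explicit StrictString(std::size_t) {}
};

// An int converts implicitly to LooseString, which is what lets a
// call such as show(3) compile.
constexpr bool loose_converts =
    std::is_convertible<int, LooseString>::value;

// With explicit, the implicit conversion is gone...
constexpr bool strict_converts =
    std::is_convertible<int, StrictString>::value;

// ...but direct construction, as in StrictString(3), still works.
constexpr bool strict_constructs =
    std::is_constructible<StrictString, int>::value;

// The const char * constructor is unaffected.
constexpr bool strict_from_cstr =
    std::is_convertible<const char *, StrictString>::value;

static_assert(loose_converts, "int converts implicitly");
static_assert(!strict_converts, "explicit blocks the implicit conversion");
static_assert(strict_constructs, "direct construction still allowed");
static_assert(strict_from_cstr, "const char * still converts");
```

A caller who really does want a pre-allocated string can still get one by saying so, for example by writing show(string(3)) against the second version of the class.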

Be wary of implicit conversions, whether from conversion operators or single-argument constructors. They can get you in trouble.

Pete Becker is Senior Development Manager for C++ Quality Assurance at Borland International. He has been involved with C++ as a developer and manager at Borland for the past six years, and is Borland's principal representative to the ANSI/ISO C++ standardization committee.