Porting a C++ Application to Java

Danny Kalev

There's more to porting code to Java than just changing the keywords, as you might have guessed.


Introduction

A few years ago, when C++ came into fashion, I had to port legacy C code into C++. That was a straightforward process, but when I was asked to port C++ code into Java some months ago, it was a different story altogether. In this article, I will describe my experience migrating C++ to Java and discuss:

The System to be Migrated

The original system, developed in C++, was a management system for VDOLive servers (i.e., a large database of video films stored on a disk that serves clients' requests to broadcast a film in real time). This system is used to control the server from virtually anywhere via a communication link. It enables the operator to start and stop the server's action, request performance statistics, disconnect clients, check clients' authorization to view a requested film, and report failures. The VDOLive Management system consists of several units: GUI, customers' database, proprietary communication infrastructure, and the messaging unit.

The original messaging unit was mainly a single large and complex C++ class with the addition of some low-level external C-style functions. Because the unit was installed on various platforms, portability was a high priority. However, the lack of compatibility among different C++ compilers, as well as operating systems, turned that into an unacceptable burden. It was decided, therefore, to port this subsystem to Java, maintaining the messaging protocol in its current form since all other units of the system remained intact.

The messaging protocol is a request-response type, consisting of a fixed header containing data fields such as message code, message time, cyclic redundancy check value, and size of accompanying data. Accompanying data can be of any type and size and may contain values such as film name, IP address, channel bandwidth, and log report file. These data values are grouped in units termed data items. Each item consists of three components: item code, item size, and data.
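The three-part item layout described above can be sketched in Java as follows. This is only an illustration: the class name ItemWire and the field widths (a 4-byte item code and a 4-byte item size) are assumptions, since the actual widths used by the protocol are not given here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper, not part of the original system.
class ItemWire {
    // Assumed layout: 4-byte item code, 4-byte item size, then `size` bytes of data.
    static byte[] encode(int code, byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(code);
            out.writeInt(data.length);
            out.write(data);
            out.flush();
        } catch (IOException e) {
            throw new IllegalStateException(e); // in-memory streams should not fail
        }
        return buffer.toByteArray();
    }

    // Reads the item code back from the first four bytes.
    static int decodeCode(byte[] wire) {
        try {
            return new DataInputStream(new ByteArrayInputStream(wire)).readInt();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // The size field tells the receiver exactly how many data bytes follow.
    static byte[] decodeData(byte[] wire) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
            in.readInt();                 // skip the item code
            int size = in.readInt();
            byte[] data = new byte[size];
            in.readFully(data);
            return data;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the size travels with the item, the receiver never has to guess where one item ends and the next begins.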

Because the entire message is transmitted as a continuous stream of bytes, the receiving side must know the exact size of the incoming message. It then reconstitutes the header and the data items (if they exist) from the incoming byte stream. The entire message object contains the header, its data items, and most importantly, accessors and mutators.

These accessors and mutators enable an application to:

Obviously, message objects are located on both the operator's system (the management tools) as well as on the server side.

The Design Process

At first, we tested some C++-to-Java automatic source converters, all of which failed badly. Then, we tried to manually convert some C++ code samples to Java in order to assess the time and resources required for the task. We soon realized that the migration project was not going to be a trivial one; even our tiny samples required considerable redesign, so two more colleagues helped me occasionally. The syntax resemblance between C++ and Java turned out to be merely superficial and, hence, very misleading. Most of the design changes were needed due to the lack of C++ features in Java or due to different syntactic requirements.

Design Considerations in Java

One of the most powerful mechanisms in Java is the package. It allows several logically related classes to be packed within a "capsule," similar to the namespace mechanism in C++ but richer and less complex:

We therefore decided to pack all our project code within one package, mgr.

Converting Header Files to Packages

Java does not make use of header files, so each class method is implemented within its class definition. On the other hand, the original C++ subsystem had three header files: a class declaration, a constants pool (macros, enums, and typedefs), and global function prototypes. The first and third header files were not needed anymore; the second was transformed into an interface (see below) within our package. We also decided to split the original C++ class into smaller, autonomous Java classes within a common package. Splitting classes is not easily done in C++ because you end up with a bulk of unrelated headers and implementation files, nested #includes and #ifdefs, and longer compilation time. I never tested it scientifically, but my impression is that an average class in C++ is bigger than an average Java class for that reason. Splitting the original C++ class into smaller components in Java enabled more efficient teamwork, a faster development and debugging process, and easier future maintenance. Recompilation is required only for the class that has actually been changed.

The original class was split into three basic components: the Message class and two more contained classes — Header and Item, which had previously been "Plain Old Data" (POD) members of the original C++ class and were now promoted into independent classes. Three types of constructors were implemented in these classes: a default constructor, a "standard" constructor, and a "reconstituting" constructor. The latter reconstructs a message from a stream of incoming bytes.

The Message class (Listing 1) contains a single Header object and a varying number of Item objects in a dynamic Vector. (Fortunately, Java supplies a Vector class that has some of the functionality of STL's vector.) Message takes care of coordinating Header data and its accompanying Item(s) into a single abstract entity. For instance, when a data item is appended to the message, the Header object keeps track of the overall data size. This bookkeeping is one of the Message class's responsibilities. The Message class also has three constructors: a default constructor, which creates an empty Message object ready to be filled with data, say, from an operator's console; a standard constructor, which builds a Message object from the supplied arguments (as required by the client application); and a "reconstituting" constructor, defined as before.
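The bookkeeping described above might look roughly like the following sketch. It is not the code of Listings 1 to 3: the class names (SketchHeader, SketchItem, SketchMessage) are hypothetical, and the assumed wire size of an item (a 4-byte code plus a 4-byte size plus the payload) is an illustration only.

```java
import java.util.Vector;

// Hypothetical, simplified stand-ins for the article's Header, Item, and Message classes.
class SketchHeader {
    private int dataSize;                       // total bytes of accompanying data
    int getDataSize() { return dataSize; }
    void addToDataSize(int bytes) { dataSize += bytes; }
}

class SketchItem {
    final int code;
    final byte[] data;
    SketchItem(int code, byte[] data) { this.code = code; this.data = data; }
    // Assumed wire size: 4-byte code + 4-byte size + payload.
    int wireSize() { return 8 + data.length; }
}

class SketchMessage {
    private final SketchHeader header = new SketchHeader();
    private final Vector<SketchItem> items = new Vector<SketchItem>();

    // Appending an item also updates the header, so the two never disagree.
    void append(SketchItem item) {
        items.addElement(item);
        header.addToDataSize(item.wireSize());
    }

    int itemCount() { return items.size(); }
    int dataSize() { return header.getDataSize(); }
}
```

Keeping the size update inside append means that no caller can add an item and forget to adjust the header.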

The Header class (Listing 2) is an abstraction of the header part of a message. It has a fixed size and its fields are located in fixed positions. The Header class contains information such as the total number of bytes that the accompanying data occupy, creation timestamp, and various flags. It has two constructors: one for an empty (new) message object and a reconstituting constructor that builds a header object from a previously serialized message.

The Item class (Listing 3) is an abstraction of a data item as defined previously, so each data item in a message is an object by itself. As mentioned earlier, each item can contain data of any type and size. To support this capability, the Item class must contain size and type member fields, thus enabling the receiver to reconstitute the item exactly as it was before being serialized and transmitted [1]. The third member is of type Object, which in Java can refer to an object of any class. Note that each class in Java is implicitly derived from Object, so generic methods can be declared as taking an argument of type Object. The actual type can be detected at run time by using Java's instanceof operator. There is one caveat, however. Primitive types such as int or char are not considered objects in Java, so they have to be "wrapped" by a corresponding object: Integer and Character, respectively.
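As a small illustration of the point about Object and instanceof, the sketch below classifies a payload held in an Object reference. The class name PayloadKind and the set of recognized types are assumptions made for the example.

```java
// Hypothetical sketch: detect the run-time type of a payload held as Object.
class PayloadKind {
    static String describe(Object payload) {
        if (payload instanceof Integer)   return "int";     // wrapped primitive int
        if (payload instanceof Character) return "char";    // wrapped primitive char
        if (payload instanceof String)    return "string";
        if (payload instanceof byte[])    return "bytes";   // arrays are objects too
        return "unknown";
    }
}
```

A call such as PayloadKind.describe(Integer.valueOf(42)) passes the primitive value inside its wrapper, as described above. Later versions of Java perform this wrapping automatically, but the value still arrives as an Integer object.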

Implementation

With this design skeleton in mind, we began the implementation process, which turned into a continuous challenge. Many of the basic building blocks of C++ were either missing or had a different meaning in Java. Thus, we had to redesign or find a workaround to reconcile the discrepancies to achieve the same functionality as well as retain conformance with the messaging protocol.

Pointers

Consider the following snippet:

//C++
long n=1000, *p = &n;
//get the first memory byte of n
char firstByte = * ( (char *) p);

Pointer manipulations such as this were heavily used in the original C++ subsystem, since it had to serialize a message object into an array of bytes that would later be transmitted to the receiver. Reconstitution of the original object from an array of bytes arriving at the receiver's port was also required. Java, however, does not have a pointer type at all! It also allows only "safe" type casting, so casting a plain int into a byte array won't do. The messaging subsystem is communication-oriented, resulting in a lot of low-level manipulations of packing, transmitting, and unpacking bits in order to implement object persistence (i.e., writing an object into a file or a stream).
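Without pointers, the individual bytes of an integral value can still be reached with shifts. The sketch below is one possible approach, not necessarily the one used in the project; the class name LongBytes is hypothetical. Note that a Java long is always 64 bits, and that the sketch numbers bytes from the most significant end, whereas the "first memory byte" in the C++ snippet depends on the platform's byte order.

```java
// Hypothetical sketch: extract the bytes of a long without pointers.
class LongBytes {
    // Returns the byte at position index, where 0 is the most significant byte.
    static byte byteAt(long value, int index) {
        int shift = 8 * (7 - index);
        return (byte) (value >>> shift);
    }

    // Returns all eight bytes, most significant first.
    static byte[] toBytes(long value) {
        byte[] out = new byte[8];
        for (int i = 0; i < 8; i++) {
            out[i] = byteAt(value, i);
        }
        return out;
    }
}
```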

In C++, implementing object persistence is a rather easy task: you either declare an operator char *() or use pointer manipulation as shown. In Java, however, you must use streams. Fortunately, Java supplies a rich hierarchy of streams. Two of these, DataInputStream and DataOutputStream, can do most of the low-level work of type casting, pointer arithmetic, and iteration for you. However, they cannot serialize a complete object all at once [2]. Instead, you must activate the same process on each and every data member, assuming it is of a primitive type. But what if a member is in itself an object, such as the Header member of Message? Then, the process must be performed recursively; each object defines its own serialize method, which writes the bytes of its members into a stream if they are of basic types. Otherwise, it sends a serialize request to the embedded object. The result of the process is a stream of aligned bytes, containing the binary representation of the entire Message object in memory, ready to be transmitted.
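The recursive scheme can be sketched as follows. The classes WireHeader, WireItem, and WireMessage are hypothetical simplifications, and their fields and field widths are assumptions; only DataOutputStream and ByteArrayOutputStream are real library classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical, simplified classes; field choices and widths are assumptions.
class WireHeader {
    int messageCode;
    int dataSize;

    // All members are primitives, so they are written directly.
    void serialize(DataOutputStream out) throws IOException {
        out.writeInt(messageCode);
        out.writeInt(dataSize);
    }
}

class WireItem {
    int code;
    byte[] data = new byte[0];

    void serialize(DataOutputStream out) throws IOException {
        out.writeInt(code);
        out.writeInt(data.length);
        out.write(data);
    }
}

class WireMessage {
    WireHeader header = new WireHeader();
    WireItem[] items = new WireItem[0];

    // Members that are objects receive a serialize request of their own.
    void serialize(DataOutputStream out) throws IOException {
        header.serialize(out);
        for (WireItem item : items) {
            item.serialize(out);
        }
    }

    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            DataOutputStream out = new DataOutputStream(buffer);
            serialize(out);
            out.flush();
        } catch (IOException e) {
            throw new IllegalStateException(e); // in-memory streams should not fail
        }
        return buffer.toByteArray();
    }
}
```

DataOutputStream writes multi-byte values high byte first, so the resulting layout does not depend on the machine that produced it.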

Enumerations

The original system used dozens of enums for message type, error codes, data item type, and transaction type, to name a few. Yet, enum types do not exist in Java. Two competing solutions were suggested. The first solution was a manually enumerated set of constants grouped in an interface, which is implemented by all classes in our hierarchy, as follows:

//C++
enum transaction {
    select,
    insert,
    // ...
};

//Java
interface Transaction {
    static final int select = 0;
    static final int insert = 1;
    // ...
}

A more complicated solution, yet safer and easier for maintenance, is creating a base class (even an empty class will do) that represents the enum tag and from which other classes, each representing an entry in the tag, inherit:

//Java
class Transaction {}
class Select extends Transaction {}
class Insert extends Transaction {}
void performTransaction (Transaction actionCode)
{
    //runtime type identification
    if (actionCode instanceof Select)
    {
        Db.select();
    }
    else if (actionCode instanceof Insert)
    {
        Db.insert();
    }
    // ...
}

The second solution has three major drawbacks. Since each class in Java should be implemented within its own source file, we would have needed to use hundreds of tiny files containing null classes! Also, from the conceptual point of view, deriving these dozens or hundreds of null classes from a null base class seemed a cure worse than the disease. Finally, the excessive use of RTTI would impose severe performance penalties. Eventually, we chose the former solution in spite of its inelegance. (Java designers could have made this a lot easier had they chosen not to exclude enum types from the language.)
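With integer constants, the instanceof chain shown earlier becomes an ordinary switch. The following is a minimal sketch with hypothetical names (TxCodes, TxDispatcher); the strings returned stand in for the real database calls.

```java
// Hypothetical names modeled on the Transaction example above.
interface TxCodes {
    int SELECT = 0;   // interface fields are implicitly public, static, and final
    int INSERT = 1;
}

class TxDispatcher implements TxCodes {
    // Dispatches on a plain int; the constants are usable as case labels.
    static String perform(int actionCode) {
        switch (actionCode) {
            case SELECT:
                return "select";
            case INSERT:
                return "insert";
            default:
                // Any int can be passed in, so unknown codes must be rejected at run time.
                throw new IllegalArgumentException("unknown code " + actionCode);
        }
    }
}
```

The default branch shows the price of this solution: unlike a C++ enum parameter, an int parameter accepts any value, so the check that the compiler used to perform moves into the code.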

Preprocessor

Java has no preprocessor and hence no header files. Conditional compilation is therefore out of the question, but this is compensated for by the fact that Java code is almost entirely portable. Coding conventions in Java state that each class be written in a separate file whose name is identical to the name of the class it contains. Each class or interface is grouped in a dedicated package (whose name should also match the folder in which the classes' source files are stored). This contributed to the decision to split the original subsystem into three independent entities as described previously.

Globals

Because Java is a pure OO programming language (as opposed to C++, which is a "multi-paradigm language that supports object-oriented and other useful styles of programming" [3]), neither functions (including main) nor variables may be declared outside an enclosing class/interface. The original subsystem, though, had hundreds of global symbols in the form of enums, consts, and macros accessible to all units of the Management system. The use of a base class containing all these symbols and inherited by the previous three classes was out of the question. It would have caused a massive reduplication of data to each and every object instance! Declaring this data static would have prevented reduplication, but it wasn't an ideal solution for other reasons, such as:

Using a public interface was a much better solution. We added an interface termed Codes in which all "global" symbols were declared. Each class in our project/package was declared as implementing that interface and hence was allowed to use all of these symbols directly (and safely). This is, in fact, the Java convention in such cases.
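The sketch below illustrates why the interface avoids the duplication problem: fields declared in an interface are implicitly static, so there is one copy per interface rather than one per object. The interface name Codes comes from the text above, but the two symbols in it and the helper classes CodesUser and CodesCheck are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

interface Codes {
    int MAX_ITEMS = 64;      // hypothetical symbol
    int ERR_TIMEOUT = -1;    // hypothetical symbol
}

class CodesUser implements Codes {
    // Symbols from Codes are used directly, without qualification.
    boolean accepts(int itemCount) {
        return itemCount <= MAX_ITEMS;
    }
}

class CodesCheck {
    // True if the named field of Codes is static, that is, shared rather than per object.
    static boolean isShared(String fieldName) {
        try {
            Field f = Codes.class.getField(fieldName);
            return Modifier.isStatic(f.getModifiers());
        } catch (NoSuchFieldException e) {
            return false;
        }
    }
}
```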

Minor Obstacles

These were the major difficulties encountered during our project. However, we also had to tackle other smaller obstacles.

The following is a list of minor incompatibilities between C++ and Java. They're minor in that they did not, in our case, require redesign or rethinking of our basic migration strategy, but they had to be dealt with as we carried out the migration.

      //C++
      template <class T>
      void swap ( T& f, T& s);

      //Java
      void swap( Object f, Object s);

      //full version
      int send( int size, boolean confirm) {...}
      //final arg omitted
      int send( int size) { return send(size, true);}

Besides the features lacking in Java, there are some superficial similarities that may be rather misleading. Java designers purposefully made Java syntax resemble C++ so as to ease the process of learning the new language [5]. But these similarities sometimes have entirely different meanings, which are not always easily tracked by inexperienced Java programmers:

Should You Migrate?

Learning a new programming language can be both interesting and fun, since you can compare it with the languages you already know and think of how to improve one by borrowing features from the other. (C and C++ have been doing this for almost two decades.) Java has many advantages over C++, such as automatic garbage collection, bounds checking, rich standard libraries, packages, and most importantly for us, portability. On the other hand, due to the removal of crucial constructs such as pointers, bit fields, and explicit memory management, Java may not be the ideal choice for low-level programming, as required in embedded systems, hardware drivers, and time-critical software. So, if you are contemplating migrating to Java, you should consider the following factors:

Many features were excluded from Java for the sake of simplicity. We sometimes had the feeling that Java designers were interested more in easy compiler writing than easy programming. This is the opposite of the C++ approach [6, p. 7]. The exclusion of enums is probably the best example of that. I suspect that as Java becomes more widespread, it will have to be extended.

Conclusion

The migration process was definitely not the kind of thing we had all been accustomed to when porting C code into C++. But the trouble was worthwhile. Our goal was to reach a state where one version of software would compile and run on all environments. In that respect, Java fulfilled its promise. Those of you who have experience with writing portable code in C++ know what it means: conditional compilation; avoiding enhanced features such as namespaces and STL (and even exception handling, since at least one compiler does not support it); and chasing nested #ifdefs and #includes.

Our goal was achieved but not without a price. We were aware that Java was inferior to C++ in terms of performance and anticipated a moderate performance degradation (about double the original response time on average). There are some tricks to enhance performance [1], but don't be misled into thinking you're ever going to get the speed of C++.

Java is not a "lightweight C++" or "a better C++." These two languages are indeed different programming languages, as I've attempted to show. In spite of the fact that these languages resemble one another, they impose disparate design strategies when it comes to programming in the real world.

Notes

[1]. You may wonder at the use of a switch statement on this type of an object, which may indicate poor design [6, pp. 306-308]. Indeed, we tried to use a more OO approach at first, but finally settled on a type-field solution for three reasons:

a. Lack of sizeof operator.

b. Lack of RTTI support in Java for primitive types (as opposed to typeid in C++, which makes no such discrimination).

c. Most important: performance. The average message object may contain dozens of data items. As a result, the reconstituting constructor may be invoked dozens of times on a single message object. We discovered that the use of RTTI would impose a severe performance penalty, which was unacceptable. In fact, many high-level OO features, such as virtual methods, typeid, and dynamic_cast<>, are avoided even in C++ when performance matters. This is one of the differences between clean textbook code samples and real-world programming where compromises have to be made.

[2]. The 1.1 and the current 1.1.4 JDK releases do support object persistence, but still we could not simply use the ObjectOutputStream.writeObject method. It would breach backward compatibility with the messaging protocol, since the size and the layout of the byte stream produced are not identical to those produced by our original protocol. For instance, serialized arrays are not just the sum of their members' bytes but also contain a Java internal header part. So, even a one-char string may occupy more than four bytes. This demonstrates once more that designing from scratch is much easier than migrating legacy code.

[3]. Bjarne Stroustrup's FAQ page at URL: http://www.research.att.com/~bs/bs_faq.html

[4]. Ken Arnold and James Gosling. The Java Programming Language (Addison-Wesley, 1996).

[5]. M. Grand. Java Language Reference (O'Reilly & Associates, 1997).

[6]. Bjarne Stroustrup. The C++ Programming Language, third ed. (Addison-Wesley, 1997).

Danny Kalev is a system analyst and software engineer with several years of experience, specializing in C++ and OOD. He is an active member of the ANSI/ISO J16 C++ Standardization Committee. His current technical interests involve networking, compiler technology, artificial intelligence, and distributed data systems. He can be reached at danni@zoot.tau.ac.il.