Nico Vuyge's blog

Taking the blue AND the red pill

Google's performance analysis comparing C++, Java, Scala and Go

Nico Vuyge

The Register recently commented on a Google research paper comparing the performance of C++, Java, Scala and Go. I'd like to provide my own comments here.

The performance results show that the optimised C++ code is by far the fastest solution. And 'by far' is really an understatement: the Java code is between 4 and 6 times slower, depending on whether you're running 32-bit or 64-bit code. To put this in perspective, this (single-threaded) C++ code on a 1998-era 450 MHz Pentium II would run about as fast as the Java code on today's 3 GHz desktop CPUs (3 GHz / 450 MHz is roughly a factor of 6.7 in clock speed alone). We're basically wasting 15 years of CPU progress by not using the fastest language.

"We find that in regards to performance, C++ wins out by a large margin. However, it also required the most extensive tuning efforts, many of which were done at a level of sophistication that would not be available to the average programmer."

I don't agree with the second part of this quote. The optimisations described in the article are rather trivial, like choosing the most appropriate container type (hash_map instead of map) and initializing the containers with a properly chosen initial size.
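To give an idea of how small these two changes are, here is a minimal sketch. It is not code from the paper: the word-counting task, the function names and the expected_distinct parameter are hypothetical and only there for illustration. It also assumes a C++11 compiler and uses std::unordered_map, the standardised counterpart of the pre-standard hash_map extension.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Baseline: std::map is an ordered, tree-based container. Every lookup
// costs O(log n) comparisons and every new key is a separate node allocation.
std::map<std::string, int> count_words_baseline(
    const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (const auto& w : words) {
        ++counts[w];
    }
    return counts;
}

// Tuned: a hash-based container with average O(1) lookups, pre-sized up front.
// expected_distinct is a hypothetical, caller-supplied estimate of the number
// of distinct keys; as long as the real number stays at or below it, the
// table does not have to rehash while it is being filled.
std::unordered_map<std::string, int> count_words_tuned(
    const std::vector<std::string>& words, std::size_t expected_distinct) {
    std::unordered_map<std::string, int> counts;
    counts.reserve(expected_distinct);
    for (const auto& w : words) {
        ++counts[w];
    }
    return counts;
}
```

Both functions return the same counts; only the container type and the one extra reserve call differ. How much this gains depends on the data set and the quality of the hash function, so it has to be measured rather than assumed.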

I wouldn't be surprised if a C# implementation of the test produced results similar to the Java results. I have some anecdotal evidence: last year, I took a week off to experiment with the new parallel language features in Visual Studio 2010 (yes, I'm one of those people who consider that a 'holiday'), both in C# and in C++. My test consisted of the same small problem (but with a very large data set), implemented in a straightforward way in C# and in C++. I then optimised both implementations (a test case for the VS2010 profilers), and built various parallel versions with the new .NET and C++ parallel class libraries. My conclusion was that the basic single-threaded version in C++ (using 1 core) was similar in performance to the optimised parallel version in C# that used all 8 (hyper-threaded) logical cores of my quad-core laptop. The parallel version in C++ scaled very well, with predictable performance. The parallel C# version showed more variability in performance, probably due to the effects of garbage collection.

This reminds me of a statement by Herb Sutter in a panel discussion on the future of software development at PDC 2009. He said that it won't be long before we've exhausted all gains possible by parallelisation, and that from then on, plain old software optimisation will become fashionable again. I would add that this is best done from the best starting point, and that is the most performant language.