Distributed compiling in Xcode 1.5
Last year, Apple added distributed compiling to Xcode. It lets Xcode spawn many instances of gcc (the compiler Xcode uses) across multiple CPUs and across other computers on your network. The goal is to speed up compilation and shorten your work cycle.
We do all of our Mac development work in CodeWarrior, which uses only a single CPU and has no distributed build option. Last September, I decided to benchmark a "real world" example, Knights of the Old Republic. Read on to see how the compile times and performance stack up.
KotOR is a C++ application built on a cross-platform engine. It uses OpenGL natively on both the PC and the Mac. By my rough count, it contains about 500 moderately sized source files, and it uses precompiled headers in both the CodeWarrior and Xcode builds.
First, let's establish baseline times on my test Mac, a dual 2.5 GHz G5 with 1 GB of RAM. I used CodeWarrior 9.2 and Xcode 1.5. These are the total build times in CodeWarrior and in Xcode (using both CPUs):
Debug build:
CodeWarrior: 6m 30s
gcc (Xcode): 10m
Release build:
CodeWarrior: 11m 35s
gcc (Xcode): 10m 10s
Interestingly, gcc's compile time for the debug build (i.e., with the optimizer turned off) is almost identical to its compile time with the optimizer turned all the way up. Relatedly, gcc's non-optimized build performs much better than CodeWarrior's. For KotOR at least, this means a debug build in Xcode isn't worth it compared with a release build.
With that baseline in place, let's get to the meat. I had at my disposal a dual 800 MHz G4 Mac, a 1.25 GHz G4 PowerBook, and a sad little 500 MHz G3 iBook. :) I tried hosting the Xcode build on different Macs, and also took one of the host's CPUs out of the mix to see how that affected performance. I had heard that in a distributed setup it's sometimes best not to use both CPUs on the host, to keep overhead manageable. Here are the results for all the permutations I tested:
800 MHz G4 x 2 (host), 2.5 GHz G5 x 2 (remote): 12m 13s
800 MHz G4 x 1 (host), 2.5 GHz G5 x 2 (remote): 12m 13s
2.5 GHz G5 x 2 (host), 800 MHz G4 x 2 (remote): 8m 25s
2.5 GHz G5 x 1 (host), 800 MHz G4 x 2 (remote): 10m 20s
2.5 GHz G5 x 2 (host), 800 MHz G4 x 2 (remote), 1.25 GHz G4 (remote): 7m 33s
2.5 GHz G5 x 2 (host), 800 MHz G4 x 2 (remote), 500 MHz G3 (remote), 1.25 GHz G4 (remote): 8m 20s
As you can see, I started beating CodeWarrior once I hosted on the 2.5 GHz G5 and added the dual 800 MHz G4. Adding the 1.25 GHz PowerBook trimmed nearly another minute. When I got greedy and threw the iBook into the mix, I lost nearly all of the gain from adding the PowerBook.
What's interesting here is how long it takes to generate the precompiled headers and do the final link. While running these tests, I noticed that these steps take significantly longer in Xcode than in CodeWarrior. They also run only on the host, so the host's speed is critical to the overall performance of a distributed build. Here are the combined times for those two steps on the two Macs I used as hosts:
2.5 GHz G5 host (precompiled headers + link): 1m 45s
800 MHz G4 host (precompiled headers + link): 3m 20s
If you subtract those steps from the total build times above, you get a better feel for how the distributed compile itself affects things:
2.5 GHz G5 x 2 (local only): 8m 15s
800 MHz G4 x 2 (host), 2.5 GHz G5 x 2 (remote): 8m 53s
2.5 GHz G5 x 2 (host), 800 MHz G4 x 2 (remote): 6m 40s
2.5 GHz G5 x 2 (host), 800 MHz G4 x 2 (remote), 1.25 GHz G4 (remote): 5m 48s
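The subtraction above carries the main argument: precompiled-header generation and linking run only on the host, so adding remote CPUs can only shrink the rest of the build. Here is a small C sketch of that arithmetic using the times reported above. The function names are illustrative only and do not come from Xcode or any other tool.

```c
#include <assert.h>

/* Convert a time reported as "Xm Ys" into seconds. */
int to_seconds(int minutes, int seconds)
{
    assert(minutes >= 0 && seconds >= 0 && seconds < 60);
    return minutes * 60 + seconds;
}

/* Time left for the distributable compile phase: the total build time
   minus the host-only steps (precompiled headers plus final link). */
int distributed_phase(int total_s, int host_only_s)
{
    assert(total_s >= host_only_s);
    return total_s - host_only_s;
}

/* Fraction of the whole build spent in host-only steps. Remote CPUs
   shrink only the other part, so with a fixed host-only time this
   share grows as the total build time drops. */
double host_only_share(int total_s, int host_only_s)
{
    assert(total_s > 0 && host_only_s <= total_s);
    return (double)host_only_s / (double)total_s;
}
```

For the dual 800 MHz G4 host, distributed_phase(to_seconds(12, 13), to_seconds(3, 20)) returns 533 seconds, the 8m 53s listed above. On the G5 host with both remotes, host_only_share(to_seconds(7, 33), to_seconds(1, 45)) is about 0.23, so nearly a quarter of the fastest build was work that no remote machine could take over.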
Bottom line: Xcode's distributed compiling can beat CodeWarrior on release builds if you have a decent dual-CPU Mac sitting around to throw at the job. I didn't test the distributed compiler with debug builds, but if those times track the optimized builds as closely as they did without distribution, you'll need more CPUs than I had available to approach CodeWarrior's debug build time (6m 30s). Most of our development turnaround is with debug builds, and right now CodeWarrior retains the edge.
For those of you curious about the game's actual runtime performance when built with CodeWarrior versus Xcode, it's a virtual dead heat. In my tests, the CodeWarrior build has roughly a 1-2 frames-per-second advantage, which is well within the margin of error.
With gcc 4.0 and Xcode 2.0 arriving in Tiger, I hope to run these benchmarks again. I'm not expecting compile times to improve much, although I hear the optimizer in gcc 4.0 is much better.
Comments
Interesting post Brad. I certainly look forward to seeing how gcc4.0/XCode 2 stacks up against these numbers.
Posted by: ben | April 30, 2005 10:22 AM
Likewise, I'm looking forward to the GCC4 numbers.
Posted by: Nicholas Shanks | May 1, 2005 11:55 PM
I'm interested to see these results, too, especially since KOTOR 1.0b3 appears to be broken under Mac OS X 10.4 :-(
Posted by: Matt Vaughn | May 3, 2005 06:30 AM
"I'm interested to see these results, too, especially since KOTOR 1.0b3 appears to be broken under Mac OS X 10.4 :-("
How did you get your hands on an old beta release of KOTOR? You should be using 1.03c, the last public release.
I am aware of a crashing issue in KOTOR and 10.4 that is specific to nVidia cards. If you are experiencing a crash when you launch KOTOR and don't have an nVidia card, let me know.
Posted by: Brad Oliver | May 3, 2005 01:15 PM
Maybe he meant 1.03b?
Posted by: a2daj | May 3, 2005 02:45 PM
Have you tried ZeroLink with the debug build? That significantly cuts the link time.
Posted by: Ted Goldstein | May 12, 2005 10:21 PM
"Have you tried ZeroLink with the debug build? That significantly cuts the link time."
ZeroLink won't work as we have to override some system functions (fopen, open, printf) with our own variants at link time.
Posted by: Brad Oliver | May 13, 2005 12:04 AM
Why are you overriding system functions? It's not really a common practice, and it really isn't something you can depend on. Libsystem is compiled with external linkage, but many libraries are not. It may not be possible to override these functions in the future.
Posted by: Ted Goldstein | May 15, 2005 08:50 AM
"Why are you overriding system functions?"
We're intercepting and tweaking some parameters on their way to the real system calls.
"Libsystem is compiled with external linkage, but many libraries are not. It may not be possible to override these functions in the future."
If the linkage of printf and fopen in libSystem changes to internal, then no app will be able to use those calls. That's not even remotely likely to happen.
Posted by: Brad Oliver | May 15, 2005 02:12 PM
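The technique discussed in this thread, defining your own fopen, tweaking its parameters, and then passing them on to the real call, can be sketched in C roughly as below. This is an illustration under stated assumptions, not KotOR's code: it assumes a POSIX-style system where dlsym(RTLD_NEXT, "fopen") finds the C library's fopen (it is written against glibc conventions; symbol-binding details on Mac OS X may differ), and the backslash-to-slash path rewrite is a hypothetical tweak chosen only for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc: exposes RTLD_NEXT in <dlfcn.h> */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Signature of the C library's fopen, for the pointer we look up. */
typedef FILE *(*fopen_fn)(const char *, const char *);

/* This definition satisfies the program's own calls to fopen at link
   time. It tweaks the arguments, then forwards to the real fopen. */
FILE *fopen(const char *path, const char *mode)
{
    static fopen_fn real_fopen = NULL;   /* looked up on first call */
    char fixed[1024];
    size_t len;

    assert(path != NULL && mode != NULL);

    if (real_fopen == NULL)
        real_fopen = (fopen_fn)dlsym(RTLD_NEXT, "fopen");
    if (real_fopen == NULL)
        return NULL;                     /* could not find the real fopen */

    len = strlen(path);
    if (len >= sizeof fixed)             /* too long to copy: pass through */
        return real_fopen(path, mode);

    /* Hypothetical tweak: turn DOS-style '\' separators into '/',
       copying the terminating NUL as well. */
    for (size_t i = 0; i <= len; i++)
        fixed[i] = (path[i] == '\\') ? '/' : path[i];

    return real_fopen(fixed, mode);
}
```

With this linked in, a call such as fopen("saves\\game.sav", "rb") made from the program's own code would reach the C library as fopen("saves/game.sav", "rb") (both file names are hypothetical). The lazy lookup of real_fopen is not thread-safe on the first call; a production version would do it once up front, for example with pthread_once.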