First, construct a benchmark. This will have at least these benefits:
- It will force you to get specific about the software under test. For any significant and complex piece of software, there are a forest of inter-related variables: which features are you interested in tuning, what configuration will be used to run them, what workload will be applied, what duration of test, etc.
- It will give you some real running software to measure, tune, adjust, etc. Without real running software, you won't be able to run experiments, and without experiments, you won't know if you're making progress or not.
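The shape of such a harness can be sketched in a few lines. Everything specific here (the sorting workload, the warmup and run counts) is a placeholder standing in for your own software under test and its configuration:

```java
import java.util.Arrays;
import java.util.Random;

public class SimpleBenchmark {
    static final int WARMUP_RUNS = 5;    // let the JIT compile hot paths before measuring
    static final int MEASURED_RUNS = 20; // enough runs to see the spread, not just the mean

    // Stand-in for the software under test: sort a copy of the input.
    static int[] workload(int[] data) {
        int[] copy = Arrays.copyOf(data, data.length);
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(200_000).toArray();

        // Warmup runs: discard these results.
        for (int i = 0; i < WARMUP_RUNS; i++) workload(data);

        long[] millis = new long[MEASURED_RUNS];
        for (int i = 0; i < MEASURED_RUNS; i++) {
            long start = System.currentTimeMillis();
            workload(data);
            millis[i] = System.currentTimeMillis() - start;
        }
        // Report every run, not just an average: variance is data too.
        System.out.println("runs=" + MEASURED_RUNS + " times=" + Arrays.toString(millis));
    }
}
```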
OK, so now you have a benchmark. The next step is to run it! You need to run it in order to start getting data points about how it performs, and in order to start developing theories about how it is behaving. Ideally, save all your benchmark results, recording as much information as you can about the configurations that were used, the software that was tested, and the results of the benchmark.
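One low-tech way to keep that history is to append a line per run to a CSV file. The file name and fields below are just illustrative; record whatever describes your own configurations, software versions, and results:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;

public class ResultLog {
    // Append one timestamped result line; create the file on first use.
    static void record(Path file, String config, String softwareVersion, double result) {
        String line = Instant.now() + "," + config + "," + softwareVersion
                    + "," + result + System.lineSeparator();
        try {
            Files.writeString(file, line,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical config name, build id, and throughput number.
        record(Path.of("benchmark-results.csv"), "4-threads", "build-1234", 812.5);
    }
}
```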
While the benchmark is running, watch it run. At this point, you're just trying to get an overall sense of how the benchmark runs. In particular, pay attention to the four major components of overall system performance:
- CPU cycles
- Disk activity
- Memory usage
- Network activity
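You'll mostly watch these from outside the process with OS tools (top, vmstat, iostat, netstat), but the JVM's standard management beans give a quick in-process look at two of the four. This is only a sketch; note that getSystemLoadAverage may return -1 on platforms that don't support it:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;

public class ResourceSnapshot {
    // One-line summary of system load and heap usage, via standard JMX beans.
    public static String snapshot() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        return "loadAvg=" + os.getSystemLoadAverage()
             + " heapUsed=" + heap.getUsed()
             + " heapMax=" + heap.getMax();
    }

    public static void main(String[] args) {
        // Print a snapshot periodically while the benchmark runs.
        System.out.println(snapshot());
    }
}
```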
Perhaps, while running your benchmark, you didn't seem to be bound by any resource? Your CPU was never fully busy, you always had spare memory, your disks and network were never very active? Here are two other possibilities that you don't want to overlook at this stage:
- Perhaps your benchmark just isn't giving the system enough work to do? Many systems are designed to handle lots of concurrently-overlapping work, so make sure that you're giving your benchmark a large enough workload so that it is busy. Try adding more work (more clients, or more requests, or larger tasks...) and see if throughput goes up, indicating that there was actually spare capacity available.
- Perhaps your benchmark is fine, but your system has some internal contention or concurrency bottlenecks which are preventing it from using all the resources you're giving it. If it seems like you're not using all your resources, but adding more work doesn't help, then you should start to think about whether some critical resource is experiencing contention.
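That throughput experiment can be sketched directly in code: run the same task at increasing client counts and watch whether throughput keeps scaling. The busy-loop task and the counts below are made up for illustration; substitute real client requests:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ScalingProbe {
    // Placeholder task; substitute a real client request.
    static void task() {
        long x = 0;
        for (int i = 0; i < 100_000; i++) x += i;
        if (x == -1) System.out.println(x); // defeat dead-code elimination
    }

    // Run clients * tasksPerClient tasks on a pool of the given size
    // and return tasks completed per second.
    static double throughput(int clients, int tasksPerClient) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        long start = System.nanoTime();
        for (int i = 0; i < clients * tasksPerClient; i++) {
            pool.submit(ScalingProbe::task);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        double seconds = (System.nanoTime() - start) / 1e9;
        return clients * tasksPerClient / seconds;
    }

    public static void main(String[] args) throws InterruptedException {
        // If throughput stops rising while resources sit idle, suspect contention.
        for (int clients = 1; clients <= 8; clients *= 2) {
            System.out.printf("clients=%d throughput=%.0f tasks/sec%n",
                              clients, throughput(clients, 200));
        }
    }
}
```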
If you're going to be this serious about Java performance tuning, you should be reading, because there are 15 years of smart people thinking about this and you can (and should) benefit from all their hard work. Here are some places to start:
- Read every issue in the archive of Heinz Kabutz's wonderful newsletter.
- Read Brian Goetz's archived series on IBM developerWorks.
- Spend some time on Jack Shirazi and Kirk Pepperdine's web site: javaperformancetuning.com
First, while you are running your benchmark, try pressing Control-Break a few times (on Windows). If you are running on a Linux or Solaris system, try sending signal QUIT (kill -QUIT my-pid) to your java process. This will tell your JVM (at least, if it's the Sun JVM) to dump a stack trace of the current location of all running threads in the virtual machine. Do this a half-dozen times, spaced randomly through your benchmark run. Then read through the stack traces, and consider what they are telling you: are you surprised about which threads are doing what? Did you see any enormously deep or intricate stack traces? Did you see a lot of threads waiting for monitor entry on some master synchronization object? I call this the "poor man's profiler", and it's amazing what it can tell you. It's particularly nice because you don't have to set anything up ahead of time; you can just walk up to a misbehaving system, toss a couple of stack trace requests into it, and see if anything jumps out at you in the output.
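If you'd rather take those snapshots from inside the program, Thread.getAllStackTraces gives you the same per-thread stack information that the signal-triggered dump prints. A minimal sketch:

```java
import java.util.Map;

public class StackSnapshot {
    // Build a text dump of every live thread's name, state, and stack.
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            sb.append('"').append(e.getKey().getName()).append("\" ")
              .append(e.getKey().getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```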
The next step is to instrument your program, at a fairly high level, using the built-in Java clock: System.currentTimeMillis. This handy little routine is quite inexpensive to call, and for many purposes can give you a very good first guess at where problems may be lurking. Just write some code like this:
long myStartTime = System.currentTimeMillis();
int numLoops = 0;
// ... lotsa code that might be revealing here ...
long myElapsedTime = System.currentTimeMillis() - myStartTime;
System.out.println("My loop ran " + numLoops +
                   " times, and took " + myElapsedTime + " milliseconds.");
Usually it doesn't take too long to sprinkle some calls to currentTimeMillis around, and get some first-level reporting on where the time is going.
One thing to keep in mind when using this technique is that on some platforms (notably Windows) the currentTimeMillis clock ticks rather irregularly: it tends to "jump" by 15-16 milliseconds at a time, so it's hard to measure anything more accurately than about 1/60th of a second. Starting with JDK 1.5, the System.nanoTime call gets around this problem with much finer resolution.
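The same timing pattern with nanoTime might look like this (the million-iteration loop is just a stand-in workload):

```java
public class NanoTiming {
    // Time a piece of work using the monotonic, high-resolution clock.
    static long timeNanos(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long nanos = timeNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
        });
        System.out.println("Took " + nanos / 1_000_000.0 + " ms");
    }
}
```

Note that nanoTime measures elapsed intervals only; unlike currentTimeMillis, its absolute value is not a wall-clock time.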
If you've gotten this far, you should have:
- A reliable benchmark, with a history of results
- Some high-level numbers and theories about where the time is going
- Some initial ideas about whether your system is CPU-bound, Disk-bound, memory-bound, etc.
If you can't yet figure out what to change, but you at least know which code concerns you, it's time to put a serious profiler to work. You can get profilers from a number of places: there are commercial tools, such as JProfiler; tools built into Eclipse and NetBeans; and standalone tools like VisualVM. At any rate, find one and get it installed.
When using a profiler, there are essentially 3 things you can do:
- You can look at elapsed time
- You can look at method invocation counts
- You can look at memory allocation behaviors
Analyzing memory allocation is also a great way to find code which is run a lot, and removing unnecessary and wasteful memory allocation can be a good way to speed up Java code. However, modern garbage collectors are spectacularly, phenomenally efficient, and in recent years I've found that cutting back on the generation of garbage, by itself, often doesn't really help much. But if the generation of garbage is a clue (a "code smell", as they say) which leads you to locate a bunch of unnecessary and wasteful work that can be eliminated, then it's a good way to quickly locate a problem.
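A classic instance of that smell is string concatenation in a loop, where each += allocates a fresh String and recopies everything accumulated so far. The garbage itself is cheap to collect; the quadratic copying it signals is the real cost:

```java
public class AllocationSmell {
    // Each += builds a new String and copies all prior characters: O(n^2) work.
    static String concatInLoop(int n) {
        String s = "";
        for (int i = 0; i < n; i++) s += i + ",";
        return s;
    }

    // A single StringBuilder grows in place: amortized O(n) work, far less garbage.
    static String buildInLoop(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append(i).append(',');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same result either way; the allocation behavior is what differs.
        System.out.println(concatInLoop(5).equals(buildInLoop(5)));
    }
}
```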
Lastly, there is the profiler's elapsed-time analysis. Sometimes, this can work really well, and the profiler can show you exactly where your code is spending its time. However, in a complicated server process, with lots of background threads, I/O operations, etc., figuring out how to interpret the elapsed-time graphs can be very tricky. I've often thought that the profiler was pointing a giant Flying Fickle Finger of Fate directly at some section of my code, worked industriously to isolate and improve that code, only to find that, overall, it didn't make any difference in the actual benchmark results.