For the most part, I've been extremely happy with the Java platform: it's solid & reliable, performance of the underlying compilers and virtual machines has steadily improved, the class library is rich and well-documented.
However, it's been a persistent aggravation that Java performance tuning is much harder that it seems like it needs to be. I spent several fruitless years in a previous job struggling to improve the performance of some quite complex Java software. Time and time again I'd construct a benchmark, execute it under various profiling regimens, study the results, attempt to improve the performance, and be aggravated to discover that my changes had no visible effect.
Other times, guided either by instinct or by the knowledge and suspicions of fellow developers, I'd make performance-related changes that, according to the profilers, should have had no benefit at all, only to see substantial improvements in benchmark results.
So I was quite interested to happen upon this recent paper: Evaluating the Accuracy of Java Profilers. The paper does a very thorough and approachable job of explaining why current Java profilers produce misleading results, as well as suggesting ways that future profilers could be improved.
From the paper:
The contributions of this paper are as follows:
1. We show that commonly-used profilers often disagree with each other and thus are often incorrect.
2. We use causality analysis to determine whether or not a profiler is producing an actionable profile. This approach enables us to evaluate a profiler without knowing the “correct” profile; prior work on evaluating the accuracy of profilers assumes the existence of a “correct” profile (Section 8.1). For example, Buytaert et al. use a profile obtained using HPMs and frequent sampling to evaluate the accuracy of other profilers.
3. We show, using causality analysis, that commonly-used profil- ers often do not produce actionable profiles.
4. We show that the observer effect biases our profiles. In partic- ular, dynamic optimizations interact with a profiler’s sampling mechanism to produce profiler disagreement.
5. We introduce a proof-of-concept profiler that addresses the above mentioned source of profiler bias. As a consequence our profiler produces actionable profiles.
I thought that the most interesting part of the paper was section 6: "Understanding the cause of profiler disagreement". The key observation that is made in this section has to do with the interaction between the profiler's sampling technique and the underlying JVM's implementation of "yield points":
To understand why our profilers were not randomly picking sam- ples from the program run, we took a closer look at their imple- mentation. We determined that all four profilers take samples only at yield points. More specifically, when a profiler wishes to take a sample, it waits for the program’s execution to reach a yield point.
Yield points are a mechanism for supporting quasi-preemptive thread scheduling; they are program locations where it is “safe” to run a garbage collector (e.g., all the GC tables are in a consistent state). Because yield points are not free, compilers often optimize their placement. For example, as long as application code does not allocate memory and does not run for an unbounded amount of time, the JVM can delay garbage collection until after the appli- cation code finishes; thus, a compiler may omit yield points from a loop if it can establish that the loop will not do any allocation and will not run indefinitely. This clearly conflicts with the goal of profilers; in the worst case, the profiler may wish to take a sample in a hot loop, but because that loop does not have a yield point, the profiler actually takes a sample sometime after the execution of the loop. Thus, the sample may be incorrectly attributed to some method other than the one containing the loop.
Using an alternate technique, the authors prototype a different sampling-based profiling technique, which they call tprof, and test out their profiler on several common benchmarks, and demonstrate dramatically improved profiling results. Unfortunately, they also run up against a number of limitations in their profiling technique, due to the behaviors of current JVMs and JITs, but it seems like these limitations should be surmountable, so I really hope that this paper gets some attention in the Java profiling community and leads to improved profiler behaviors.