Ubuntu has done it again.
I've upgraded my development machine to Karmic Koala, a.k.a. Ubuntu 9.10.
Once again, the Ubuntu upgrade process was trouble-free, although I think I might have hit some congestion at the download servers because the download was slow for a bit. A bit of patience and all was well again.
Now I am up and running on 9.10, and hoping to start learning about all that is new.
Short notes and essays about stuff that interests me (mostly technical stuff).
Saturday, October 31, 2009
Tuesday, October 27, 2009
Visiting in the reverse direction
Although we had family visitors this week, I am thinking of a different sort of visiting.
For some time now, the Derby code has taken advantage of the Visitor Pattern to organize various code which manipulates query trees during compilation. For example, in DERBY-2085, I enhanced the visitor which examines aggregate functions (SUM, MAX, COUNT, etc.) in the query tree; the visitor pattern made it straightforward to issue a clear error message for an invalid query.
Recently, a Derby user discovered that the following two queries were processed differently:
WHERE 1=1 AND c1 >= ?
versus
WHERE 2>1 AND c1 >= ?
Since 1=1 and 2>1 both are constant expressions which evaluate to TRUE, it seems reasonable to expect that both would behave the same, but this was not the case, which led to the filing of DERBY-4416.
Knut Anders Hatlen, who has been doing some great work on Derby recently, was studying DERBY-4416 when he realized why examining the query tree for constant expressions (which could be simplified and pre-computed at compile time) was so complicated: in part, it was because the Derby visitor pattern was visiting the tree top-down. If we instead had a way to visit the tree bottom-up, the expression simplification would be much easier to implement.
So now we have a proposed enhancement to the query tree visitor which allows for visiting the tree in either fashion: bottom-up, or top-down. Both styles appear to be valuable, and it's quite straightforward to make the visitor implementation handle both, with a simple ordering flag which controls the behavior.
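To make the difference concrete, here is a minimal sketch of a visitor with an ordering flag. None of these classes are Derby's; `TreeVisitor`, `Node`, `Const`, `Leaf`, `BinaryOp`, `ConstantFolder`, and `VisitorDemo` are hypothetical names invented for illustration, and the folding rules cover only the two comparisons from the queries above.

```java
// Hypothetical mini query tree; none of these are Derby's real classes.
interface TreeVisitor {
    Node visit(Node node);         // may return a replacement node
    boolean visitChildrenFirst();  // true = bottom-up, false = top-down
}

abstract class Node {
    abstract Node accept(TreeVisitor v);
}

final class Const extends Node {
    final Object value;
    Const(Object value) { this.value = value; }
    Node accept(TreeVisitor v) { return v.visit(this); }
}

// Stands in for a column reference or a "?" parameter.
final class Leaf extends Node {
    final String name;
    Leaf(String name) { this.name = name; }
    Node accept(TreeVisitor v) { return v.visit(this); }
}

final class BinaryOp extends Node {
    final String op;
    Node left, right;
    BinaryOp(String op, Node left, Node right) {
        this.op = op; this.left = left; this.right = right;
    }
    private void visitChildren(TreeVisitor v) {
        left = left.accept(v);
        right = right.accept(v);
    }
    Node accept(TreeVisitor v) {
        if (v.visitChildrenFirst()) {
            visitChildren(v);          // bottom-up: children are simplified first
            return v.visit(this);
        }
        Node replaced = v.visit(this); // top-down: this node is examined first
        if (replaced instanceof BinaryOp) {
            ((BinaryOp) replaced).visitChildren(v);
        }
        return replaced;
    }
}

// Folds integer comparisons of two constants, and rewrites "TRUE AND x" to "x".
final class ConstantFolder implements TreeVisitor {
    private final boolean bottomUp;
    ConstantFolder(boolean bottomUp) { this.bottomUp = bottomUp; }
    public boolean visitChildrenFirst() { return bottomUp; }
    public Node visit(Node node) {
        if (!(node instanceof BinaryOp)) return node;
        BinaryOp b = (BinaryOp) node;
        if (b.op.equals("AND") && b.left instanceof Const
                && Boolean.TRUE.equals(((Const) b.left).value)) {
            return b.right;
        }
        if (b.left instanceof Const && b.right instanceof Const) {
            Object l = ((Const) b.left).value;
            Object r = ((Const) b.right).value;
            if (l instanceof Integer && r instanceof Integer) {
                int x = (Integer) l, y = (Integer) r;
                if (b.op.equals("=")) return new Const(x == y);
                if (b.op.equals(">")) return new Const(x > y);
            }
        }
        return node;
    }
}

final class VisitorDemo {
    // Builds the tree for: WHERE <a cmp b> AND c1 >= ?
    static Node sample(String cmp, int a, int b) {
        Node constant = new BinaryOp(cmp, new Const(a), new Const(b));
        Node predicate = new BinaryOp(">=", new Leaf("c1"), new Leaf("?"));
        return new BinaryOp("AND", constant, predicate);
    }
}
```

In this sketch, a bottom-up pass reduces both sample trees to the bare `c1 >= ?` predicate in one traversal, because the AND node is examined only after its left child has become a constant. A top-down pass looks at the AND node while its left child is still an unevaluated comparison, so it leaves `TRUE AND c1 >= ?` behind and would need a second pass.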
Monday, October 26, 2009
Linux desktop shift-select
I think this is a significant mis-behavior I've noticed with Ubuntu/Gnome.
Suppose that you have several files "on your desktop", and suppose that they are named "a", "b", "c", and "d".
Furthermore, suppose that you have physically dragged the icons for these files around on your desktop so that they are arranged, from top to bottom on your screen, in the order: "c", "a", "b", "d". That is, the icon for "c" is in the top left, "a" is below it, "b" below that, and "d" at the bottom.
Now, suppose that you single-click on the icon for "d", and then shift-click on the icon for "b". Shift-click is the indication for "range-selection", and it's supposed to select everything that is "in-between" the previous selection and the newly-selected item.
On Windows, what happens in this situation is that two icons are selected, those for "b" and "d". This is because Windows processes the shift-click range selection spatially, and since there are no other icons spatially between "b" and "d" on the screen, only those two icons are selected.
On Gnome, at least in my case, the icon for "c" in the top left of the screen was also selected, so the result of the shift-click on "b" is that three icons are selected: "c", "b", and "d".
I assume this is because Gnome is treating the "range-selection" behavior as meaning "all icons whose names are between the previous selection and the newly-selected item", instead of interpreting it as meaning icons whose position on the desktop is between the selections.
Did I misunderstand the behavior here?
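Here is a small sketch of the two interpretations, just to check that the "by name" guess reproduces what I saw. It is not Gnome or Windows code; `Icon`, `RangeSelect`, and the integer `y` position are hypothetical stand-ins for the desktop layout described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical desktop icon: a name plus a vertical position (0 = top).
final class Icon {
    final String name;
    final int y;
    Icon(String name, int y) { this.name = name; this.y = y; }
}

final class RangeSelect {
    // Spatial interpretation: everything positioned between the two icons.
    static List<String> byPosition(List<Icon> icons, Icon anchor, Icon clicked) {
        int lo = Math.min(anchor.y, clicked.y);
        int hi = Math.max(anchor.y, clicked.y);
        List<String> selected = new ArrayList<>();
        for (Icon icon : icons) {
            if (icon.y >= lo && icon.y <= hi) selected.add(icon.name);
        }
        return selected;
    }

    // Name interpretation: everything whose name sorts between the two names.
    static List<String> byName(List<Icon> icons, Icon anchor, Icon clicked) {
        boolean anchorFirst = anchor.name.compareTo(clicked.name) <= 0;
        String lo = anchorFirst ? anchor.name : clicked.name;
        String hi = anchorFirst ? clicked.name : anchor.name;
        List<String> selected = new ArrayList<>();
        for (Icon icon : icons) {
            if (icon.name.compareTo(lo) >= 0 && icon.name.compareTo(hi) <= 0) {
                selected.add(icon.name);
            }
        }
        return selected;
    }

    // The layout from the example: "c", "a", "b", "d" from top to bottom.
    static List<Icon> desktop() {
        return Arrays.asList(new Icon("c", 0), new Icon("a", 1),
                             new Icon("b", 2), new Icon("d", 3));
    }
}
```

With "d" as the anchor and "b" as the shift-clicked icon, `byPosition` selects "b" and "d", while `byName` selects "c", "b", and "d", which matches the two behaviors described above.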
Tomcat static file caching
Tomcat caches static files by default.
I think this is probably well-known, but since it was a surprise to a few people that I mentioned it to, I thought I'd discuss it in just slightly more detail.
I was working with a test suite which does the following:
- Copies a simple text file into Tomcat's webapps/ROOT folder.
- Fetches the file via HTTP from Tomcat.
- Copies an altered version of the file over the first file.
- Fetches the file again via HTTP.
After looking at this a little bit, I realized that Tomcat was caching the file, and was not recognizing that the contents of the file had changed, perhaps because Tomcat may use file timestamps for cache coherency and it's possible to execute the above steps on a fast machine so quickly that the file's timestamp doesn't appear to change.
(The file's size, and checksum, and probably other observable characteristics do change, but perhaps Tomcat's static file cache doesn't notice this?)
At any rate, the fix to these tests is fairly simple: I arranged it so that the tests create a context.xml file in webapps/ROOT/META-INF, with the contents:
<?xml version="1.0" encoding="UTF-8"?>
<Context cachingAllowed="false"/>
And now the test seems to reliably retrieve the latest copy of the file when re-fetching it via HTTP.
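For what it's worth, the step that creates that file can be sketched in a few lines of Java. The class and method names here are made up for illustration, and the sketch assumes the attribute is spelled `cachingAllowed` on a `Context` element, which is how I understand the Tomcat 6 configuration to read (XML names are case-sensitive).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical test helper; not part of Tomcat.
final class ContextXmlWriter {
    // Assumed spelling of the Tomcat 6 attribute that disables static file caching.
    static final String CONTEXT_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<Context cachingAllowed=\"false\"/>\n";

    // Writes META-INF/context.xml under the given web application root
    // (for example webapps/ROOT) and returns the path of the new file.
    static Path writeContextXml(Path webappRoot) throws IOException {
        Path metaInf = webappRoot.resolve("META-INF");
        Files.createDirectories(metaInf);
        Path contextXml = metaInf.resolve("context.xml");
        Files.write(contextXml, CONTEXT_XML.getBytes(StandardCharsets.UTF_8));
        return contextXml;
    }
}
```

A test setup method would call `ContextXmlWriter.writeContextXml(...)` with the path to `webapps/ROOT` before the first HTTP fetch.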
Thursday, October 22, 2009
OSGi bundles and the Export-Package manifest
I can at times be rather a technology dilettante.
There is so much technology, and so little time, that I am often forced to be hurried in trying to learn just enough about some particular technology to get the job done.
A case in point is DERBY-4120, a request by a user to improve the contents of the derbyclient.jar manifest file so that it can be handled as an OSGi bundle.
I don't use OSGi myself, and don't plan to. But this is a simple request, and I think that it doesn't adversely affect any other uses of derbyclient.jar, and so I'd like to cross this item off the list and go on to the next Derby request.
So I've done just enough reading and searching of the web to make me think that I merely have to ensure that the MANIFEST.MF file contains:
Bundle-SymbolicName: derbyclient
DynamicImport-Package: *
Export-Package: org.apache.derby.jdbc
Unfortunately, since I'm not an OSGi user, it's hard for me to know whether this is correct or not. But still, I can:
- Make this change and verify that it works as expected
- Run some tests to ensure that at the very least this doesn't break anything
- Propose the change and commit it to the trunk, and if it isn't good enough for what the requester needs, hope that they will come back and describe more clearly what is needed
And that's the way that software evolves.
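The first of those steps, checking that the headers actually made it into the manifest, can be sketched with the standard `java.util.jar` classes. The `BundleManifestCheck` class is a hypothetical helper, and it only checks that the three headers listed above are present with those values; it says nothing about whether an OSGi container would be satisfied with them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper for checking the three headers discussed above.
final class BundleManifestCheck {
    // True when the main section carries the expected header values.
    static boolean hasBundleHeaders(Manifest manifest) {
        Attributes main = manifest.getMainAttributes();
        return "derbyclient".equals(main.getValue("Bundle-SymbolicName"))
            && "*".equals(main.getValue("DynamicImport-Package"))
            && "org.apache.derby.jdbc".equals(main.getValue("Export-Package"));
    }

    // Parses manifest text; handy for testing without building a jar.
    static Manifest parse(String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new Manifest(new ByteArrayInputStream(bytes));
    }
}
```

Against a real jar, the manifest would come from `new java.util.jar.JarFile("derbyclient.jar").getManifest()` instead of `parse`.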
IntelliJ IDEA Community Edition
My colleague Andrew pointed out to me that JetBrains has decided to open-source their phenomenal IntelliJ IDEA development environment.
I think this is wonderful news, and I'm quite interested in learning more about IDEA and seeing whether a true community forms.
To start, I've begun experimenting with IDEA and Derby:
- I downloaded and installed the new Community Edition IDE on my Ubuntu development machine.
- I enabled several of the core plug-ins, including those for basic Java development, those for Ant support, and several of the source code control plug-ins
- I checked out a fresh copy of the Derby trunk and created a new IDEA project against that source tree.
- I used the Ant support in IDEA to build Derby.
- I defined a configuration to run the Derby "ij" command line tool
- I ran the IDEA debugger, and was able to set breakpoints and step through the Derby source in the debugger.
This was a great first step, and very exciting. Now I'm motivated to learn more about IDEA, and see if it can make me more productive in my Derby work.
Wednesday, October 21, 2009
Data visualization and charting with YUI
(I fear that title is going to promise more than I can deliver.)
I've been working, with several colleagues, on a suite of multi-machine performance tests. These tests exercise a large and complex distributed system, with a workload which is accomplished by routing work requests around to the various servers on those machines.
At the very beginning, the goal was just to get the suite to run. At all.
Then, we moved on to measuring the overall elapsed time of the complete benchmark.
The next step was to try to break down the behavior of the individual machines during the benchmark, so we could understand which machine was the bottleneck at various points during the test.
At this point, I started working with the Windows TypePerf tool, which is a marvelous low-level tool for gathering performance information. I enhanced our harness so that each machine has a TypePerf instance running in the background, gathering performance data and saving it to a file.
This means that each machine gathers low-level data that looks like this:
"10/20/2009 08:01:44.272","0.000000","1114927104.000000","6688.000000","0.009847","313399.348675","54.226956"
"10/20/2009 08:01:45.272","0.000000","1120014336.000000","6674.000000","0.125010","123076.715079","40.229407"
"10/20/2009 08:01:46.272","0.000000","1137823744.000000","6649.000000","0.069106","194001.835504","71.091345"
This is powerful data, but very low-level. Each line records the activity level on that machine at that time, in the areas of disk activity, memory availability, CPU usage, network I/O, etc. For the first pass, we were interested in the patterns of CPU usage on the various machines during the time of the test.
So the next thing I did was to write analysis software which took a collection of these samples, one per machine in the test, and correlated and aggregated the data up into a single HTML table, so that the table has:
- a row for each minute
- a column for each machine in the test
- and the contents of the table cell for that row/column pair is the average CPU usage on that machine during that minute
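The per-minute averaging for a single machine can be sketched roughly as follows. This is not the actual analysis code; `CpuPerMinute` is a hypothetical name, and the sketch assumes that the CPU counter is the last column of each TypePerf line, which depends entirely on the order in which the counters were requested.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical aggregator for one machine's TypePerf output.
final class CpuPerMinute {
    // Maps "MM/dd/yyyy HH:mm" to the average of the last column in that minute.
    // Assumes every line is fully quoted CSV, as in the sample data above.
    static Map<String, Double> averageByMinute(List<String> lines) {
        Map<String, double[]> sums = new LinkedHashMap<>(); // minute -> {sum, count}
        for (String line : lines) {
            // Drop the outer quotes, then split on the "," separators.
            String[] fields = line.substring(1, line.length() - 1).split("\",\"");
            String minute = fields[0].substring(0, 16);  // trim seconds and millis
            double cpu = Double.parseDouble(fields[fields.length - 1]);
            double[] acc = sums.computeIfAbsent(minute, k -> new double[2]);
            acc[0] += cpu;
            acc[1] += 1;
        }
        Map<String, Double> averages = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : sums.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }
}
```

Running this once per machine gives one column of the table; lining the columns up by minute gives the rows.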
The HTML table then looked something like this:
Date | Node1 | Node2 | Node3 | Node4 | Node5 | Node6 | Node7 | Node8 | Overall |
10/20/2009 05:26 | 0.0 | 0.0 | 0.0 | 5.5 | 0.0 | 30.8 | 0.0 | 36.8 | 23.7 |
10/20/2009 05:27 | 0.0 | 0.0 | 0.0 | 1.4 | 0.0 | 0.9 | 0.0 | 0.3 | 0.9 |
This is much better, but still fairly low level.
So at this point, I turned to the marvelous YUI charting tool.
The YUI charting tool, in its standard invocation, knows how to take a simple HTML table and display it as a YUI chart. Pretty much all you have to do is:
- Define an HTML Table DataSource which points at your HTML table
- Define a YUI Line Chart widget, and feed it your HTML DataSource
It's barely a dozen lines of JavaScript; YUI does all the heavy lifting. Chris Heilmann's blog is a great overview of the process, with clear examples to show you the results.
And lo! We have an elegant line chart which brings the patterns in the data instantly to the front.
It's very enjoyable to get so much value from such a small amount of code. Way to go YUI!
Monday, October 19, 2009
The Oracle/Sun merger and the MySQL issue
On April 20th, Oracle agreed to buy Sun Microsystems, yet the merger still has not been completed.
The deal was approved by the US authorities in August, but in September the European Union announced that it was not yet ready to approve the deal, and wanted more time to review the particulars.
The issue appears to involve Oracle's likely treatment of the MySQL database product, which Sun purchased in January 2008.
Oracle's relationship with MySQL has a long history; Oracle purchased part of the MySQL product line, namely the InnoDB storage engine, in 2005, although four years later commentators are still discussing the results of that transition.
Now, over the weekend, Michael Widenius has released his opinions on the Internet; namely, that the MySQL product needs to be spun off in order for the Oracle/Sun merger to be approved.
I believe that the EU has given itself until January 2010 to make a decision, so this public discussion about the future of MySQL will probably continue to make news all fall.
Update: This morning's New York Times has an analysis of the current state of the deal by "The Deal Professor".
Update 2: An interesting essay with various links and comments.
Wednesday, October 14, 2009
Curious use of 'synchronized(this)'
I've been running across a somewhat odd coding pattern recently.
I keep seeing code that looks like this:
public void myMethod(args...)
{
    synchronized(this)
    {
        // method code here...
    }
}
I'm trying to figure out, is this at all different from:
public synchronized void myMethod(args...)
{
    // method code here...
}
I understand the difference in the case where the synchronized(this) { } code block does not contain the entire body of the method, because then it is indicating that some, but not all, of the code in the method is synchronized on this object instance.
But in the case that I showed above, where the synchronized(this) { } code block includes the entire body of the method, is there any difference between that and just declaring the method synchronized?
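One way to poke at the question is a small experiment. As far as I can tell, both forms hold the monitor of `this` for the whole method body, so they exclude each other in the same way; the difference I can observe is that only the second form carries the `synchronized` modifier, which is visible through reflection (and, in the compiled class, shows up as a method flag rather than as explicit monitor instructions). The `SyncForms` class below is a hypothetical test class, not code from any of the projects I mentioned.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical class containing both forms of the pattern.
final class SyncForms {
    // Block form: the whole body is wrapped in synchronized(this).
    boolean blockForm() {
        synchronized (this) {
            return Thread.holdsLock(this);
        }
    }

    // Method form: the method itself is declared synchronized.
    synchronized boolean methodForm() {
        return Thread.holdsLock(this);
    }

    // Reports whether the named no-argument method has the synchronized modifier.
    static boolean declaredSynchronized(String name) throws NoSuchMethodException {
        Method m = SyncForms.class.getDeclaredMethod(name);
        return Modifier.isSynchronized(m.getModifiers());
    }
}
```

Both `blockForm()` and `methodForm()` report that the calling thread holds the lock on the instance, while `declaredSynchronized` is true only for `methodForm`. So tools that inspect method modifiers can tell the two apart, even though the locking behavior looks the same.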
Literature
I'm currently quite caught up in Thomas Foster's How to Read Literature like a Professor.
I stumbled upon this book by accident, and was in the perfect position to be intrigued by it. Although I studied mathematics in school, and am a software guy by profession, my mother, my wife, and my eldest daughter are all literature majors, and I've always been fascinated by how they approach the books they read.
So I've been working my way through Foster's book, and enjoying it quite a bit.
There are parts of the book which are hard for me to get a lot of value from, because he uses as examples works that I'm not very familiar with: I never read any Joyce, or any Eliot, and had never even heard of Angela Carter. So there are significant parts of Foster's book that I just skim through and try to absorb what I can.
But other parts have a real resonance for me. For example, I was reading the section on water and rain. Although Foster discusses how these elements are handled in Joyce's Ulysses, and in some Thomas Hardy novels that I haven't read, I could still immediately comprehend Foster's points about various ways that authors use the themes of water and rain in their work:
- As references to the Noah's Ark story from the Bible, and with it the notion of water as a spiritual cleansing,
- As references to more physical notions of clean and dirty,
- As mood-setting mechanisms,
- As plot devices, for example Foster shows how Hardy uses rain to force two characters to meet in an unlikely fashion.
- In Marilynne Robinson's Housekeeping, we have of course the lake, and the bridge over the lake, but also a great flood.
- In Zora Neale Hurston's Their Eyes Were Watching God, there is a climactic scene with a hurricane.
- In E.M. Forster's A Passage to India, there is a major scene near the end with boating on the lake, with fireworks and various discussions about the role that water and floods play in India.
And now, back to the world of software :)
Tuesday, October 13, 2009
Derby-3002
The initial version of ROLLUP support is now in the Derby trunk!
I've been working on a project to add support for the ROLLUP style of GROUP BY processing to the Derby engine. This project has been underway for two and a half years!
I finally got to the point where I felt that the code was solid enough to warrant being committed to the trunk, where the entire Derby development community can work with it and continue to improve it.
There is still lots more to do in this area:
- I need to write documentation
- There are already some open bugs, specifically involving the nullability of columns when a ROLLUP query is used as a sub-query of a larger query (!)
- The initial implementation only implements part of the SQL standard, so I need to understand where the current implementation falls short of the standard, and what it will take to fully implement the standard.
It's exciting that, at the same time, Dag Wanvik is also working in this part of Derby, enhancing and extending the OFFSET/FETCH, ROW_NUMBER, and other window-related features. Hopefully we will find some users of Derby who want to take advantage of these new features.
Even if we don't, though, I learned an enormous amount about the internals of the Derby SQL execution engine by doing this work, so I'm happy just to have made it this far.
What is DB2 pureScale?
Where can I find more detailed information about DB2 pureScale? The press release doesn't contain very much by way of technical background, and the IBM web sites don't seem to have much more (yet?).
Wednesday, October 7, 2009
Snap dumps
While working on the mystery of the extra database read, I added the beginnings of a simple "snap dump" capability to the object-relational mapping library that I maintain at work.
I'm not sure how common the notion of snap dumps is anymore.
I learned about snap dumps 30 years ago, when I was working on mainframes; these dumps were, and perhaps still are, quite common in the mainframe world.
I think that the word "snap" may come from "snapshot", although I remember that IBM mainframe guys also had some sort of tortured acronym for them. The idea of a snap dump is:
- It's initiated by the application software, not by the operating system itself
- It contains information about the contents of program memory which is particularly relevant to the application itself (as opposed to an exhaustive dump of all of the known memory)
- It is often formatted and organized for direct reading by developers; that is, it is emitted in text form, not binary form
- It is intended for post-mortem diagnosis of serious internally-detected error conditions
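A bare-bones sketch of the idea in Java might look like the following. This is not the code from my library; the `SnapDump` class, its methods, and the output layout are all hypothetical, chosen only to show an application-initiated, text-formatted dump of selected state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical snap dump: the application chooses what to record and when.
final class SnapDump {
    private final String reason;
    private final Map<String, Object> sections = new LinkedHashMap<>();

    SnapDump(String reason) { this.reason = reason; }

    // Records one piece of application state under a readable label.
    SnapDump add(String label, Object state) {
        sections.put(label, state);
        return this;
    }

    // Renders the dump as plain text, in the order the state was added.
    String render() {
        StringBuilder out = new StringBuilder();
        out.append("=== SNAP DUMP: ").append(reason).append(" ===\n");
        for (Map.Entry<String, Object> e : sections.entrySet()) {
            out.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');
        }
        return out.toString();
    }
}
```

The code that detects the internal error would build one of these, add whatever cache or session state seems relevant, and write `render()` to a log file for later reading.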
For now, I'm pleased to have some basic infrastructure in place, since a primary rule of agile programming is to get something simple that works, then evolve it later.
And, best of all, I think I found the cache bug! The snap dump didn't directly show me the problem, but it ruled out a number of other possibilities, and finally the (obvious all along) answer was right there, staring me in the face.
It's always a great feeling to find the bug, though I also feel moderately foolish that it eluded me for so long.
That's just the way bugs are.
Increasing the code weight, just slightly
There's an old computer joke, which was wonderfully executed in a Dilbert strip some years ago.
Three engineers are reminiscing about the good old days:
- The first says "in my day, we had to write everything in assembly language"
- The second says "in my day, we had to write everything in 1's and 0's"
- The third says "you had 1's? all we had were 0's!"
(The point of the fix is that the integer argument to the BaseCache constructor specifies the cache size; the default is 500, and 0 means "unlimited". For this particular instance of this particular cache, we wish the cache to be unlimited.)
65c65
< private BaseCache tCache = new BaseCache();
---
> private BaseCache tCache = new BaseCache(0);
I was a bit worried about the effect on the code size, but since a 0 is considerably lighter than a 1, I'm feeling fairly sanguine.
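For anyone curious what "0 means unlimited" might look like inside a cache, here is a sketch. It is not the real BaseCache; `SketchCache` is a hypothetical class built on `LinkedHashMap`, and the least-recently-used eviction policy is my own assumption for the example. Only the default of 500 and the meaning of 0 come from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical cache where a maximum size of 0 disables eviction.
final class SketchCache<K, V> {
    static final int DEFAULT_SIZE = 500;

    private final Map<K, V> entries;

    SketchCache() { this(DEFAULT_SIZE); }

    SketchCache(final int maxSize) {
        // Access-ordered map; the eldest entry is the least recently used.
        entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // maxSize == 0 means unlimited, so never evict in that case.
                return maxSize > 0 && size() > maxSize;
            }
        };
    }

    void put(K key, V value) { entries.put(key, value); }
    V get(K key) { return entries.get(key); }
    int size() { return entries.size(); }
}
```

With the default constructor the sketch holds at most 500 entries; with `new SketchCache<>(0)` it keeps everything it is given.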
Monday, October 5, 2009
On the non-scalability of the Ant fileset task
There's an old grade school joke: "will everybody who's not here please raise your hands!"
We've been fighting with a fairly annoying problem in our build scripts: build jobs have been failing with a fairly terse error:
BUILD FAILED
java.lang.OutOfMemoryError
Well, that didn't give us much information, so we tried running with 'ant -debug', and we got a little bit more:
BUILD FAILED
java.lang.OutOfMemoryError
at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1225)
at org.apache.tools.ant.Project.executeTarget(Project.java:1185)
at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:40)
at org.apache.tools.ant.Project.executeTargets(Project.java:1068)
at org.apache.tools.ant.Main.runBuild(Main.java:668)
at org.apache.tools.ant.Main.startAnt(Main.java:187)
at org.apache.tools.ant.launch.Launcher.run(Launcher.java:246)
at org.apache.tools.ant.launch.Launcher.main(Launcher.java:67)
Caused by: java.lang.OutOfMemoryError
--- Nested Exception ---
java.lang.OutOfMemoryError
This wasn't a lot better, unfortunately, as the code in Project.java is afflicted with one of the great sins of Java coding: wrapping the inner exception and losing the good stuff:
} catch (Throwable ex) {
    if (!(keepGoingMode)) {
        throw new BuildException(ex);
    }
    thrownException = ex;
}
All of the good information is in the inner Throwable, but Ant only reports the information from the BuildException.
Sigh.
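When a wrapper does keep the original exception as its cause, the useful detail can still be recovered by walking the cause chain before reporting. Here is a small sketch of that; `RootCause` and its methods are hypothetical helpers, and nothing here is Ant code.

```java
// Hypothetical helper for reporting the innermost exception in a chain.
final class RootCause {
    // Follows getCause() links until there are no more.
    static Throwable rootCauseOf(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    // Describes the innermost exception and, when available, where it was thrown.
    static String describe(Throwable t) {
        Throwable root = rootCauseOf(t);
        StackTraceElement[] frames = root.getStackTrace();
        String where = frames.length > 0 ? frames[0].toString() : "(no stack trace)";
        return root.getClass().getName() + " at " + where;
    }
}
```

A reporting layer that printed `RootCause.describe(ex)` alongside the wrapper's own message would at least point at the frame where the trouble started, whenever the inner exception carries a stack trace.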
Eventually, I figured out that the error was coming from what I thought was a fairly simple Ant target, to remove unwanted JUnit output from my build tree:
<target name="cleanTests">
    <delete>
        <fileset dir="${SRCROOT}" includes="**/TEST-*.txt"/>
    </delete>
</target>
At this point my colleague Tom got the great idea of re-running the Ant command with the -XX:+HeapDumpOnOutOfMemoryError flag so that we could get a memory dump when we ran out of memory.
Once we had the memory dump, a few minutes with the ever-wonderful Eclipse Memory Analyzer tool made the problem obvious.
It was clear that running this target uses memory proportional to the size of my entire source tree, rather than using memory proportional to the number of JUnit test output files in my tree, which is what I expected.
Reading the heap dump, we discovered that all the memory was consumed by the 'filesNotIncluded' and 'dirsNotIncluded' Vector objects in the DirectoryScanner class.
That's right: in addition to computing the set of files that match the pattern that I specified in my fileset specification, Ant is also computing the set of files that don't match the pattern.
Will everyone who is not here please raise their hands?
Is there actually a use for this information? I've never seen this behavior from the Ant fileset task, and right now it just seems annoying, but perhaps I'm just unaware of the beneficial reason for having it?
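For comparison, here is a sketch of the behavior I expected: a scan that remembers only the files that match. It uses the standard `java.nio.file` classes rather than anything from Ant, and `MatchOnlyScanner` is a hypothetical name. The pattern is applied to the file name alone, which for this case should be equivalent to `**/TEST-*.txt`.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical scanner that keeps matches and forgets everything else.
final class MatchOnlyScanner {
    // Returns every regular file under root whose name matches TEST-*.txt.
    static List<Path> findTestOutput(Path root) throws IOException {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:TEST-*.txt");
        List<Path> included = new ArrayList<>();
        // Files.walk is lazy, so non-matching paths are never accumulated.
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> matcher.matches(p.getFileName()))
                 .forEach(included::add);
        }
        return included;
    }
}
```

The memory held by the result list grows with the number of JUnit output files, not with the size of the whole source tree.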