This function has been around for a long time, but I recently started thinking about it as a possible tool for controlling memory usage. As you'll see, the Sun documentation describes the function as a tool for altering the behavior of String comparisons:
> Returns a canonical representation for the string object.
>
> A pool of strings, initially empty, is maintained privately by the class String.
>
> When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
>
> It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
>
> All literal strings and string-valued constant expressions are interned. String literals are defined in §3.10.5 of the Java Language Specification.

In my particular case, we maintain a large cache of object graphs, where the object data is retrieved from a database. Furthermore, it so happens that these object graphs contain a large number of strings which are used and re-used quite frequently.
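To make the documented contract concrete, here is a minimal sketch. The class name, the helper method, and the sample values are made up for illustration; only String.intern and String.equals come from the documentation.

```java
class InternContractDemo {
    // True when interning both strings yields the very same object.
    static boolean sameCanonical(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal contents,
        // much like two rows read separately from a database.
        String a = new String("customer-name");
        String b = new String("customer-name");

        System.out.println("a == b:            " + (a == b));
        System.out.println("a.equals(b):       " + a.equals(b));
        System.out.println("interned a and b:  " + sameCanonical(a, b));
        System.out.println("interned a, other: " + sameCanonical(a, "other"));
    }
}
```

The two objects start out distinct, but once both are interned, only one canonical copy has to stay reachable, which is where any memory saving would come from.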
So I was recently pawing through an enormous memory dump, skimming the list of all the active String objects, and I was struck by how much duplication there was. That made me wonder whether we were using String.intern appropriately.
So I did some research, and found several quite interesting essays on the topic.
My reaction so far is that:
- Yes, it looks like String intern'ing could really help.
- Unfortunately, the need to potentially configure PermGen space is a bummer.
- And, it seems important to have a really good handle on what strings are worth interning. Too few, and I've just changed a bunch of code to no real effect. Too many, and I've exchanged a memory waste problem for a PermGen configuration problem, plus possibly burdened the VM by making it do more work on allocations for little gain.
And, as we've discussed previously in this blog, memory is becoming cheap and widely available.
So it isn't immediately obvious that intern'ing will be worth it. In general it seems like a bad strategy to ask the CPU to do more work in order to conserve memory, unless we have a strong reason to believe that there is a lot of memory duplication and that the memory savings are either
- so substantial that they will outweigh the extra expense and hassle of managing the intern pool, or
- so substantial that the conservation of that much memory will open up a broad new range of applications for the code (e.g., we can now handle some problem sizes that were just way too large for us to handle without interning).
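One rough way to judge whether the savings might be substantial is to count duplicate values in a sample of the cached strings. The sketch below is only an illustration under stated assumptions: the class, the method names, and the sample data are hypothetical, and it counts by value, so it overestimates the waste if some of those strings already share one object.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicateStringEstimate {
    // Counts how many times each distinct string value occurs.
    static Map<String, Integer> countOccurrences(Iterable<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    // Characters held by redundant copies: (occurrences - 1) * length,
    // summed over every distinct value.
    static long redundantChars(Map<String, Integer> counts) {
        long total = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            total += (long) (e.getValue() - 1) * e.getKey().length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for strings pulled from the cache.
        List<String> sample = Arrays.asList(
                "ACTIVE", "ACTIVE", "PENDING", "ACTIVE", "PENDING", "order-17");
        Map<String, Integer> counts = countOccurrences(sample);
        System.out.println("distinct values: " + counts.size());
        System.out.println("redundant chars: " + redundantChars(counts));
    }
}
```

If the redundant character count is small relative to the total, interning is probably not worth the extra work. If it dominates, the values that repeat most often are the natural candidates to intern.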
Are there profiling features that look at a benchmark underway, and analyze whether or not interning would have been useful?
Yes, there are commands in the Eclipse Memory Analyzer to find duplicate Strings.
Check for example http://kohlerm.blogspot.com/2008/05/analyzing-memory-consumption-of-eclipse.html
Regards,
Markus