This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
The patch eliminates a few String creations/iterations in loadClass. (I was actually curious whether it would be possible to display a wait cursor when doing extended classloading, and got sidetracked.) I patched java.lang.String's constructor to see if it helps; it eliminates around 8000 String creations during startup. Of course, out of a total of about 600000, that's not huge.
Created attachment 8187 [details] Patch file for core/bootstrap/src/org/netbeans/ProxyClassLoader
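The kind of allocation the patch targets can be illustrated like this. This is a sketch of the general idea, not the attached patch itself; the class and method names are invented:

```java
// Sketch of the allocation-reduction idea: loadClass must turn a class
// name like "foo.bar.Baz" into the resource path "foo/bar/Baz.class".
// Names here are illustrative, not from the actual patch.
public class ResourceName {

    // Straightforward version: replace() and concatenation each create
    // intermediate String (and, with old javac, StringBuffer) objects
    // on every call.
    static String naive(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Reduced-allocation version: build the result in one char[] and
    // create exactly one String from it.
    static String lowAlloc(String className) {
        int len = className.length();
        char[] buf = new char[len + 6]; // room for ".class"
        className.getChars(0, len, buf, 0);
        for (int i = 0; i < len; i++) {
            if (buf[i] == '.') {
                buf[i] = '/';
            }
        }
        ".class".getChars(0, 6, buf, len);
        return new String(buf);
    }
}
```

Both variants produce the same result; the second simply allocates fewer intermediate objects per call.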
Tim: As I understand it, this is a performance improvement, right?
Passing to Jesse; if that's wrong, please pass it along further.
As I recall I had this exact same thought a few months back, tried more or less exactly the patch you suggest, and found it did not help anything or even hurt a little bit (according to OptimizeIt!). I did not pursue the matter much and I don't remember the exact conclusion, only that I decided to revert the tentative patch before committing. Of course it saves String creations. It correspondingly increases StringBuffer creations. Please analyze for actual total impact on startup time and memory consumption, otherwise I would not bother with it.
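For context on the String-vs-StringBuffer tradeoff mentioned above: with the javac of that era, String concatenation was desugared into StringBuffer calls, so removing concatenations and managing a buffer by hand does not automatically reduce total allocations. This is an illustrative equivalence, not NetBeans code:

```java
// Illustration of why saving String creations can just shift the cost
// to StringBuffer creations: old javac compiled concatenation roughly
// into the explicit StringBuffer form below.
public class ConcatDesugar {

    // What the source says:
    static String concat(String pkg, String name) {
        return pkg + "/" + name + ".class";
    }

    // Roughly what javac emitted for it: one StringBuffer, several
    // appends, one toString() -- so each concatenation expression
    // already costs a buffer plus a final String.
    static String desugared(String pkg, String name) {
        return new StringBuffer()
                .append(pkg)
                .append('/')
                .append(name)
                .append(".class")
                .toString();
    }
}
```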
Created attachment 8354 [details] Another patch, or perhaps proof of my insanity :-)
I wanted to get more skilled at working with OptimizeIt, so I took the opportunity to play with this further. The numbers are semi-inconclusive. I wrote a test script containing only the original and modified string algorithms for loadClass and getPackage. It looks like the modifications to getPackage might shave a second or so off of startup time (it's called about 42000 times). The modifications to loadClass are more dubious: with memory profiling on, the difference is significant; without it, they come in about the same. In both cases, though, less memory is allocated, which means less to collect - though perhaps not significantly, since the young-generation GC is very fast.

Here are the numbers:

getPackage, original string code, 42000 iterations:
  Time spent in method: 1445/1255/1445 ms
  char[] count: 42000 = ~4345K
  String count: 84000 = ~3342K

Patched getPackage, 42000 iterations:
  Time spent in method: 896/898/895 ms
  char[] count: 42000 = ~4345K
  String count: 42000 = ~1671K

loadClass, original string code, 100000 iterations (there seems to be some resolution limit to OptimizeIt - with 10000 iterations, the amount of time spent in the method varies absurdly widely):
  Time spent in method: 5260/5714/5575 ms
  char[] count: 20000 = ~1407K
  String count: 30000 = ~1172K

Patched loadClass, 100000 iterations:
  Time spent in method: 5445/5455/6065 ms
  char[] count: 20000 = ~1329K
  String count: 20000 = ~781K

Who was it who said "put micro trust in micro-optimizations"? Testing in situ would probably be a better idea - when I start NB in OptimizeIt, ProxyClassLoader.getPackage at least does show up as a hot spot. For anyone interested in the testing code, I'll attach it.
Created attachment 8355 [details] Test code used to generate numbers from OptimizeIt
Please don't work on this issue right now; spend time on something else. I am investigating other stuff in ProxyClassLoader which, if it comes to anything, might obsolete any work done here - either by substantially reducing the number of calls to getPackage, or by completely rewriting the class and rendering a patch to the old version pretty hard to apply. I suspect there are *structural* inefficiencies in the class loader (as opposed to local inlinable code paths like you are examining here), and any structural optimizations have to be done first. I didn't realize you were actively doing anything on this class.

The OptimizeIt! results here look interesting, but remember that comparative measurements done in a profiler are pretty much worthless - profilers introduce overhead which skews some kinds of results substantially, and probably HotSpot is not running at its best in this environment. The profiler is invaluable for *finding* the problem areas and getting a clue why they might be problematic, but in order to evaluate a patch you need to do comparative timing runs in a plain VM. I wrote apisupport's bin/unsupported/time-delta.pl to help with that sort of thing. It is imprecise, but better than running in a profiler.

It is especially important to measure results from a patch like this, which makes already-complex code harder to read and potentially buggier - if there is no measurable improvement, it is better to leave in the simpler code. Certainly a 1000 ms improvement, if real, would be well worthwhile.

BTW don't catch ArrayIndexOutOfBoundsException - intentionally throwing an unchecked exception is poor style and probably not good for efficiency either. Check the array bounds ahead of time.
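A plain-VM comparative timing run of the kind suggested above can be sketched like this. This is a generic harness under my own assumptions (warm-up, best-of-N batches), not the time-delta.pl script; the Workload interface and all names are invented:

```java
// Sketch of comparative timing outside the profiler: warm the code up
// so HotSpot compiles it, then time several batches and keep the best
// (least-disturbed) run for each variant under comparison.
public class PlainVmTiming {

    interface Workload {
        void run();
    }

    static long bestOf(Workload w, int batches, int iterationsPerBatch) {
        // Warm-up pass so JIT compilation does not skew the first batch.
        for (int i = 0; i < iterationsPerBatch; i++) {
            w.run();
        }
        long best = Long.MAX_VALUE;
        for (int b = 0; b < batches; b++) {
            long start = System.currentTimeMillis();
            for (int i = 0; i < iterationsPerBatch; i++) {
                w.run();
            }
            long elapsed = System.currentTimeMillis() - start;
            if (elapsed < best) {
                best = elapsed;
            }
        }
        return best;
    }
}
```

One would run the original and patched code paths through `bestOf` in the same VM invocation (and then again in reversed order, to guard against ordering effects) and compare the results.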
Okay - it was more of an academic exercise to get more familiar with OptimizeIt and learn the real performance characteristics of String/StringBuffer/allocating char[]s anyway.

> BTW don't catch ArrayIndexOutOfBoundsException - intentionally
> throwing an unchecked exception is poor style and probably not good
> for efficiency either. Check the array bounds ahead of time.

I did this first, and found the check became a minor bottleneck, so instead I decided to catch the exception. In testing the patch, I set the array length to 0 so the exception would fire several times while the array was below capacity. In that case, the array was typically grown 4 times during startup; if the array capacity is set to a reasonable value that covers all known cases in the NB codebase, it's probably worth it. Unless somebody comes out with a standard mandating >120-character class names, doing this test thousands of times is probably more expensive than throwing one ArrayIndexOutOfBoundsException - but that should be measured.

I agree, code like this is not terribly readable - it's effectively doing in Java what an assembly programmer would do. For critical code, though, it might be worth it. Some real startup-time measurements are needed to determine whether it makes a difference. I don't trust the results from OptimizeIt that much either, and I saw enough wild variation when executing lower numbers of iterations of the methods that I'm not sure the higher numbers are meaningful. Things like cache misses are definitely not covered by OptimizeIt, but might be an issue with the extra defensive copies of char arrays made by creating additional strings - it's JVM-implementation-dependent, but one way to avoid the issue is to not make the copies in the first place.

Of course, if you can reduce calls to these methods, it might make this not worth bothering with - good luck!
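The two buffer-growth strategies under discussion look roughly like this. A sketch for illustration only - neither method is the literal patch, and all names are invented:

```java
// Illustrative comparison of the two strategies discussed above for a
// reusable char[] scratch buffer.
public class GrowBuffer {

    private char[] buf = new char[128]; // sized to cover typical class names

    // One approach: check capacity up front and grow before writing,
    // paying a (cheap but per-call) comparison every time.
    char[] ensureCapacityFirst(int needed) {
        if (needed > buf.length) {
            char[] bigger = new char[Math.max(needed, buf.length * 2)];
            System.arraycopy(buf, 0, bigger, 0, buf.length);
            buf = bigger;
        }
        return buf;
    }

    // The other approach: write optimistically and grow only when the
    // (rare) ArrayIndexOutOfBoundsException fires.
    char[] growOnException(String name) {
        try {
            for (int i = 0; i < name.length(); i++) {
                buf[i] = name.charAt(i); // throws AIOOBE if buf is too small
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            buf = new char[name.length() * 2];
            for (int i = 0; i < name.length(); i++) {
                buf[i] = name.charAt(i);
            }
        }
        return buf;
    }
}
```

Whether the per-call bounds check or the rare exception is cheaper depends on call frequency and on how often the buffer is actually undersized, which is exactly the measurement question raised above.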
I know getPackage is called a bunch of times - in issue #28686 I tried (so far unsuccessfully) to greatly reduce the number of times. It is probably more fruitful to reduce how often it is called, or to cache it better, than to micro-optimize it just yet.

BTW be careful with OptimizeIt! when you see a "hot spot". JCL.simpleFindClass always shows up as a major time consumer in OI. What is so damn slow about that method, I often wondered. It turns out that if you run with line precision rather than method precision, almost all the time is consumed by the (native) call to defineClass, which is dumped into the time for simpleFindClass - I guess because that is the nearest enclosing non-native method. Clearly the time here is taken by the VM, not Java code. So I would suggest double-checking "hot spot" results by using line resolution.

BTW also, re. your last patch:

1. You keep a static char[] and yet do not synchronize against it when loading from different loader instances. It should probably be an instance variable (loading from *one* loader is always synchronous).

2. Best to take all the ugly stuff and put it in well-named separate methods nearer the bottom of the file, covered by precise Javadoc explaining the effect of each method call - otherwise the main control flow is impossible to see.
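Applied to the patch, those two suggestions would look structurally like this. This is a sketch with invented names, not the real ProxyClassLoader:

```java
// Structural sketch of suggestions 1 and 2 above: the scratch buffer is
// a per-instance field (loading through one loader instance is
// synchronous, so no static sharing across loaders), and the ugly fast
// path lives in small, documented helpers near the bottom of the file.
// All names here are invented for illustration.
public class SketchLoader {

    /**
     * Per-instance scratch buffer; safe without synchronization because
     * loading through a single loader instance is serialized.
     */
    private char[] nameBuffer = new char[128];

    public String resourcePathFor(String className) {
        return appendClassSuffix(dotsToSlashes(className));
    }

    /**
     * Replaces every '.' in {@code className} with '/' using the
     * instance scratch buffer, avoiding intermediate Strings.
     */
    private String dotsToSlashes(String className) {
        int len = className.length();
        if (len > nameBuffer.length) {
            nameBuffer = new char[len * 2];
        }
        className.getChars(0, len, nameBuffer, 0);
        for (int i = 0; i < len; i++) {
            if (nameBuffer[i] == '.') {
                nameBuffer[i] = '/';
            }
        }
        return new String(nameBuffer, 0, len);
    }

    /** Appends the ".class" suffix to a slash-separated path. */
    private String appendClassSuffix(String path) {
        return path + ".class";
    }
}
```

The point is less the string handling itself than the shape: the main control flow calls two well-named, Javadoc'd helpers, so the optimization details stay out of the way.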
See issue #29834. Re. your last attached patch:

- ProxyClassLoader.loadClass consumes < 0.25% of startup time in my profiler runs. YMMV.

- In my work on #29834, getPackage already gets the package-with-slashes passed in from elsewhere; there is no need to recompute it, which probably obsoletes the second half of your patch (insofar as I can understand it).
Looks like Jesse's optimization obviates the need for this.
Well, you can check the first part of your patch again - I did not touch this code (creation of "foo/bar.class" from "foo.bar"). I *suspect* that overhead is marginal compared to other stuff, but I am not sure I ever measured it directly - it just never appeared as a hotspot in the profiler that I can remember.