This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
The patch eliminates a few String creations/iterations in loadClass. (I was actually curious whether it would be possible to display a wait cursor when doing extended classloading, and got sidetracked.) I patched java.lang.String's constructor to see if it helps; it eliminates around 8000 String creations during startup. Of course, out of a total of about 600000, that's not huge.
Created attachment 8187 [details] Patch file for core/bootstrap/src/org/netbeans/ProxyClassLoader
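The kind of allocation the patch targets can be illustrated like this. This is a sketch of the general idea, not the attached patch itself; the class and method names are invented:

```java
// Sketch of the allocation-reduction idea: loadClass must turn a class
// name like "foo.bar.Baz" into the resource path "foo/bar/Baz.class".
// Names here are illustrative, not from the actual patch.
public class ResourceName {

    // Straightforward version: replace() and concatenation each create
    // intermediate String (and, with old javac, StringBuffer) objects
    // on every call.
    static String naive(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Reduced-allocation version: build the result in one char[] and
    // create exactly one String from it.
    static String lowAlloc(String className) {
        int len = className.length();
        char[] buf = new char[len + 6]; // room for ".class"
        className.getChars(0, len, buf, 0);
        for (int i = 0; i < len; i++) {
            if (buf[i] == '.') {
                buf[i] = '/';
            }
        }
        ".class".getChars(0, 6, buf, len);
        return new String(buf);
    }
}
```

Both variants produce the same result; the second simply allocates fewer intermediate objects per call.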
Tim: As I understand it, this is a performance improvement, right?
Passing to Jesse; if that's wrong, please pass it along further.
As I recall I had this exact same thought a few months back, tried more or less exactly the patch you suggest, and found it did not help anything or even hurt a little bit (according to OptimizeIt!). I did not pursue the matter much and I don't remember the exact conclusion, only that I decided to revert the tentative patch before committing. Of course it saves String creations. It correspondingly increases StringBuffer creations. Please analyze for actual total impact on startup time and memory consumption, otherwise I would not bother with it.
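For context on the String-vs-StringBuffer tradeoff mentioned above: with the javac of that era, String concatenation was desugared into StringBuffer calls, so removing concatenations and managing a buffer by hand does not automatically reduce total allocations. This is an illustrative equivalence, not NetBeans code:

```java
// Illustration of why saving String creations can just shift the cost
// to StringBuffer creations: old javac compiled concatenation roughly
// into the explicit StringBuffer form below.
public class ConcatDesugar {

    // What the source says:
    static String concat(String pkg, String name) {
        return pkg + "/" + name + ".class";
    }

    // Roughly what javac emitted for it: one StringBuffer, several
    // appends, one toString() -- so each concatenation expression
    // already costs a buffer plus a final String.
    static String desugared(String pkg, String name) {
        return new StringBuffer()
                .append(pkg)
                .append('/')
                .append(name)
                .append(".class")
                .toString();
    }
}
```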
Created attachment 8354 [details] Another patch, or perhaps proof of my insanity :-)
I wanted to get more skilled at working with OptimizeIt, so I took the opportunity to play with this further. The numbers are semi-inconclusive. I wrote a test script containing only the original and modified string algorithms for loadClass and getPackage. It looks like the modifications to getPackage might shave a second or so off of startup time (it's called about 42000 times). The modifications to loadClass are more dubious: with memory profiling on, the difference is significant; without it, they come in about the same. In both cases, though, less memory is allocated, which means less to collect - though perhaps not significantly, since the young-generation GC is very fast.

Here are the numbers:

getPackage, original string code, 42000 iterations:
  Time spent in method: 1445/1255/1445 ms
  char[] count: 42000 = ~4345K
  String count: 84000 = ~3342K

Patched getPackage, 42000 iterations:
  Time spent in method: 896/898/895 ms
  char[] count: 42000 = ~4345K
  String count: 42000 = ~1671K

loadClass, original string code, 100000 iterations (there seems to be some resolution limit to OptimizeIt - with 10000 iterations, the amount of time spent in the method varies absurdly widely):
  Time spent in method: 5260/5714/5575 ms
  char[] count: 20000 = ~1407K
  String count: 30000 = ~1172K

Patched loadClass, 100000 iterations:
  Time spent in method: 5445/5455/6065 ms
  char[] count: 20000 = ~1329K
  String count: 20000 = ~781K

Who was it who said "put micro trust in micro-optimizations"? Testing in situ would probably be a better idea - when I start NB in OptimizeIt, ProxyClassLoader.getPackage at least does show up as a hot spot. For anyone interested in the testing code, I'll attach it.
Created attachment 8355 [details] Test code used to generate numbers from OptimizeIt
Please don't work on this issue right now; spend time on something else. I am investigating other stuff in ProxyClassLoader which, if it comes to anything, might obsolete any work done here - either by substantially reducing the number of calls to getPackage, or by completely rewriting the class and rendering a patch to the old version pretty hard to apply. I suspect there are *structural* inefficiencies in the class loader (as opposed to local inlinable code paths like you are examining here), and any structural optimizations have to be done first. I didn't realize you were actively doing anything on this class.

The OptimizeIt! results here look interesting, but remember that comparative measurements done in a profiler are pretty much worthless - profilers introduce overhead which skews some kinds of results substantially, and probably HotSpot is not running at its best in this environment. The profiler is invaluable for *finding* the problem areas and getting a clue why they might be problematic, but in order to evaluate a patch you need to do comparative timing runs in a plain VM. I wrote apisupport's bin/unsupported/time-delta.pl to help with that sort of thing. It is imprecise, but better than running in a profiler.

It is especially important to measure results from a patch like this, which makes already-complex code harder to read and potentially buggier - if there is no measurable improvement, it is better to leave in the simpler code. Certainly a 1000 ms improvement, if real, would be well worthwhile.

BTW don't catch ArrayIndexOutOfBoundsException - intentionally throwing an unchecked exception is poor style and probably not good for efficiency either. Check the array bounds ahead of time.
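A plain-VM comparative timing run of the kind suggested above can be sketched like this. This is a generic harness under my own assumptions (warm-up, best-of-N batches), not the time-delta.pl script; the Workload interface and all names are invented:

```java
// Sketch of comparative timing outside the profiler: warm the code up
// so HotSpot compiles it, then time several batches and keep the best
// (least-disturbed) run for each variant under comparison.
public class PlainVmTiming {

    interface Workload {
        void run();
    }

    static long bestOf(Workload w, int batches, int iterationsPerBatch) {
        // Warm-up pass so JIT compilation does not skew the first batch.
        for (int i = 0; i < iterationsPerBatch; i++) {
            w.run();
        }
        long best = Long.MAX_VALUE;
        for (int b = 0; b < batches; b++) {
            long start = System.currentTimeMillis();
            for (int i = 0; i < iterationsPerBatch; i++) {
                w.run();
            }
            long elapsed = System.currentTimeMillis() - start;
            if (elapsed < best) {
                best = elapsed;
            }
        }
        return best;
    }
}
```

One would run the original and patched code paths through `bestOf` in the same VM invocation (and then again in reversed order, to guard against ordering effects) and compare the results.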
Okay - it was more of an academic exercise to get more familiar with OptimizeIt and learn the real performance characteristics of String/StringBuffer/allocating char[]s anyway.

> BTW don't catch ArrayIndexOutOfBoundsException - intentionally
> throwing an unchecked exception is poor style and probably not good
> for efficiency either. Check the array bounds ahead of time.

I did this first, and found the check became a minor bottleneck, so instead I decided to catch the exception. In testing the patch, I set the array length to 0 so the exception would fire several times while the array was below capacity. In that case, the array was typically grown 4 times during startup; if the array capacity is set to a reasonable value that covers all known cases in the NB codebase, it's probably worth it. Unless somebody comes out with a standard mandating >120-character class names, doing this test thousands of times is probably more expensive than throwing one ArrayIndexOutOfBoundsException - but that should be measured.

I agree, code like this is not terribly readable - it's effectively doing in Java what an assembly programmer would do. For critical code, though, it might be worth it. Some real startup-time measurements are needed to determine whether it makes a difference. I don't trust the results from OptimizeIt that much either, and I saw enough wild variation when executing lower numbers of iterations of the methods that I'm not sure the higher numbers are meaningful. Things like cache misses are definitely not covered by OptimizeIt, but might be an issue with the extra defensive copies of char arrays made by creating additional strings - it's JVM-implementation-dependent, but one way to avoid the issue is to not make the copies in the first place.

Of course, if you can reduce calls to these methods, it might make this not worth bothering with - good luck!
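The two buffer-growth strategies under discussion look roughly like this. A sketch for illustration only - neither method is the literal patch, and all names are invented:

```java
// Illustrative comparison of the two strategies discussed above for a
// reusable char[] scratch buffer.
public class GrowBuffer {

    private char[] buf = new char[128]; // sized to cover typical class names

    // One approach: check capacity up front and grow before writing,
    // paying a (cheap but per-call) comparison every time.
    char[] ensureCapacityFirst(int needed) {
        if (needed > buf.length) {
            char[] bigger = new char[Math.max(needed, buf.length * 2)];
            System.arraycopy(buf, 0, bigger, 0, buf.length);
            buf = bigger;
        }
        return buf;
    }

    // The other approach: write optimistically and grow only when the
    // (rare) ArrayIndexOutOfBoundsException fires.
    char[] growOnException(String name) {
        try {
            for (int i = 0; i < name.length(); i++) {
                buf[i] = name.charAt(i); // throws AIOOBE if buf is too small
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            buf = new char[name.length() * 2];
            for (int i = 0; i < name.length(); i++) {
                buf[i] = name.charAt(i);
            }
        }
        return buf;
    }
}
```

Whether the per-call bounds check or the rare exception is cheaper depends on call frequency and on how often the buffer is actually undersized, which is exactly the measurement question raised above.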
I know getPackage is called a bunch of times - in issue #28686 I tried (so far unsuccessfully) to greatly reduce the number of times. It is probably more fruitful to reduce how often it is called, or to cache it better, than to micro-optimize it just yet.

BTW be careful with OptimizeIt! when you see a "hot spot". JCL.simpleFindClass always shows up as a major time consumer in OI. What is so damn slow about that method, I often wondered. It turns out that if you run with line precision rather than method precision, almost all the time is consumed by the (native) call to defineClass, which is dumped into the time for simpleFindClass - I guess because that is the nearest enclosing non-native method. Clearly the time here is taken by the VM, not Java code. So I would suggest double-checking "hot spot" results by using line resolution.

BTW also, re. your last patch:

1. You keep a static char[] and yet do not synchronize against it when loading from different loader instances. It should probably be an instance variable (loading from *one* loader is always synchronous).

2. Best to take all the ugly stuff and put it in well-named separate methods nearer the bottom of the file, covered by precise Javadoc explaining the effect of each method call - otherwise the main control flow is impossible to see.
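Applied to the patch, those two suggestions would look structurally like this. This is a sketch with invented names, not the real ProxyClassLoader:

```java
// Structural sketch of suggestions 1 and 2 above: the scratch buffer is
// a per-instance field (loading through one loader instance is
// synchronous, so no static sharing across loaders), and the ugly fast
// path lives in small, documented helpers near the bottom of the file.
// All names here are invented for illustration.
public class SketchLoader {

    /**
     * Per-instance scratch buffer; safe without synchronization because
     * loading through a single loader instance is serialized.
     */
    private char[] nameBuffer = new char[128];

    public String resourcePathFor(String className) {
        return appendClassSuffix(dotsToSlashes(className));
    }

    /**
     * Replaces every '.' in {@code className} with '/' using the
     * instance scratch buffer, avoiding intermediate Strings.
     */
    private String dotsToSlashes(String className) {
        int len = className.length();
        if (len > nameBuffer.length) {
            nameBuffer = new char[len * 2];
        }
        className.getChars(0, len, nameBuffer, 0);
        for (int i = 0; i < len; i++) {
            if (nameBuffer[i] == '.') {
                nameBuffer[i] = '/';
            }
        }
        return new String(nameBuffer, 0, len);
    }

    /** Appends the ".class" suffix to a slash-separated path. */
    private String appendClassSuffix(String path) {
        return path + ".class";
    }
}
```

The point is less the string handling itself than the shape: the main control flow calls two well-named, Javadoc'd helpers, so the optimization details stay out of the way.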
See issue #29834. Re. your last attached patch:

- ProxyClassLoader.loadClass consumes < 0.25% of startup time in my profiler runs. YMMV.

- In my work on #29834, getPackage already gets the package-with-slashes passed in from elsewhere; there is no need to recompute it, which probably obsoletes the second half of your patch (insofar as I can understand it).
Looks like Jesse's optimization obviates the need for this.
Well, you can check the first part of your patch again - I did not touch this code (creation of "foo/bar.class" from "foo.bar"). I *suspect* that overhead is marginal compared to other stuff, but I am not sure I ever measured it directly - it just never appeared as a hotspot in the profiler that I can remember.