49371 – Enormous monolithic source roots too slow to work on

Status:	RESOLVED INVALID

Alias:	None

Product:	java
Classification:	Unclassified
Component:	Source
Version:	4.x
Hardware:	All All

Importance:	P3 blocker
Assignee:	Tomas Zezula

URL:
Keywords:	PERFORMANCE

Duplicates (4):	52112 64495 70181 89628
Depends on:	51151 54065 87968
Blocks:	41535 59792

Reported:	2004-09-21 17:34 UTC by mcico
Modified:	2008-02-19 18:44 UTC
CC List:	5 users

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
Log file (26.60 KB, application/x-gzip) 2007-01-08 23:18 UTC, Jesse Glick
New log file showing various warnings (43.69 KB, text/plain) 2007-01-09 18:56 UTC, Jesse Glick

Description mcico 2004-09-21 17:34:11 UTC

I tried to import a large source base into a 4.0
project, and it was basically not practical.

The problem has to do with how NB 4.0 allows the
specification of src packages and how our project
src base is laid out.  The top level package dirs
are parallel to non-java and lib dirs, and so
there's no clean "src" vs. "lib" separation. 
Since the tree is large, I end up with a huge
number of elements in the package tree, and the
IDE quickly bogs down.

Ideally, src directory selection would follow
include/exclude semantics, similar to an Ant
fileset.  Then we could limit the number of src
packages that we deal with, and make the project
tree more usable.

Comment 1 David Simonek 2004-09-22 11:31:27 UTC

Hello mcico,
I've changed your report from enhancement to lower priority defect and
I'm passing it to project guys.

However, please add more specific numbers about how your source base
looks. How many top packages? How many subdirs? How big is your source
base? overall? etc. so that we can at least simulate your working
context and see what we could do.

Also I don't quite understand your include/exclude suggestions, but I
hope project guys will understand.

Comment 2 Jesse Glick 2004-09-22 15:27:24 UTC

Re. "the top level package dirs are parallel to non-java and lib dirs,
and so there's no clean "src" vs. "lib" separation." - don't know what
this means exactly, no example given.

Re. OOME - should be filed separately with precise steps to reproduce
from scratch. Or read release notes re. increasing your memory - some
configurations involving a lot of files simply require a larger max heap.

Re. include/exclude semantics - no plans to implement. Doesn't match
javac's semantics well (javac would not honor your excludes anyway).
Prefer to address concrete problems by other means. You may however
look at the Maven project type (mevenide.codehaus.org) which may
support such things, I am not sure.

Re. package view being unwieldy - try unsupported switch in issue #42151.

Comment 3 mcico 2004-09-23 02:43:28 UTC

By large project, I meant a directory structure with 12K+ directories
and over 73K+ files.

The directory structure is like

   src
    |-wls
       |- 3rdparty
       |- weblogic  (actual product src code)
       |- coconut
       |- infra
       |- env
       (119 top-level directories)

"weblogic" is the only dir here that contains product src code.  If I
point to "src/wls" as the root to maintain the package structure, I
end up with hundreds and hundreds of "packages" in the project view,
and the system bogs down.

If I point to "src/wls/weblogic" as the root, to minimize the number
of files scanned, the packages are incorrectly labeled in the tree.

So, either the former or the latter needs to be addressed.  The latter
issue is easier to resolve, as the package structures in the IDE tree
view should be derived from the src code, and not the path to the files.
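
A minimal sketch of what "derived from the src code" could mean in practice: read the package declaration from the file text instead of inferring it from the directory path. The class and method names here are hypothetical, and the regular expression is an assumption that ignores edge cases such as a declaration-like line inside a block comment.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds the package a source file declares, independent of its path. */
final class PackageNameReader {
    // Matches "package a.b.c;" at the start of a line. This is a simplification:
    // it does not parse comments or annotations, so it can be fooled by unusual sources.
    private static final Pattern PACKAGE_DECL =
            Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);

    /** Returns the declared package, or empty if the source has no package declaration. */
    static Optional<String> declaredPackage(String sourceText) {
        Matcher m = PACKAGE_DECL.matcher(sourceText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

With this approach a file under src/wls/weblogic would be labeled by whatever package its own text declares, regardless of which directory was chosen as the root.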

By "include/exclude" semantics, I'm not referring to how the code gets
compiled.  I'm strictly referring to how the project setup is used to
scan a directory structure and discover the source files and resources
in the project.  That is how I would see this being used to implement
a solution to the first problem.
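
The include/exclude idea described above can be sketched roughly as follows. This is only an illustration using the JDK's glob matcher, not anything NetBeans provides; the class name SourceRootFilter and the "weblogic/generated" directory are made up for the example. Paths are taken relative to the chosen source root.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical include/exclude filter for paths relative to a source root. */
final class SourceRootFilter {
    private final List<PathMatcher> includes;
    private final List<PathMatcher> excludes;

    SourceRootFilter(List<String> includeGlobs, List<String> excludeGlobs) {
        this.includes = compile(includeGlobs);
        this.excludes = compile(excludeGlobs);
    }

    // Turns glob strings such as "weblogic/**" into JDK path matchers.
    private static List<PathMatcher> compile(List<String> globs) {
        return globs.stream()
                .map(g -> FileSystems.getDefault().getPathMatcher("glob:" + g))
                .collect(Collectors.toList());
    }

    /** A path is accepted if at least one include matches it and no exclude does. */
    boolean accepts(Path relativePath) {
        boolean included = includes.stream().anyMatch(m -> m.matches(relativePath));
        boolean excluded = excludes.stream().anyMatch(m -> m.matches(relativePath));
        return included && !excluded;
    }
}
```

For the tree shown earlier, a filter built with the include "weblogic/**" would keep the product sources and drop 3rdparty, coconut, infra, env and the rest, while "src/wls" stays the root so package names remain correct.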

Comment 4 Jesse Glick 2004-09-23 04:43:48 UTC

Would require extended API to inform Java parser to ignore certain
directories - currently project system just tells it which folders are
source roots and the Java scanning infrastructure creates databases
based on that. Potentially complicated, would require API changes; no
plans for 4.0.

If at all possible, you are encouraged to break apart the monolithic
source tree into logical units (projects or binary JARs) that can be
considered separately. Of course, projects with an existing fixed
source structure cannot generally do this.

Comment 5 Jesse Glick 2004-09-23 04:45:03 UTC

Reassigning to Tomas since this would require (as yet unstudied)
changes to the Classpath API.

Comment 6 Tomas Zezula 2004-09-27 12:55:49 UTC

This seems more like an enhancement. The issue requires an extension of
the ClassPath API/SPI, as Jesse mentioned above. This will not be done in
NB 4.0 (too late for such a change).

Comment 7 Jesse Glick 2004-09-28 15:57:49 UTC

Related to issue #49026.

Comment 8 Jesse Glick 2004-12-09 18:33:41 UTC

*** Issue 52112 has been marked as a duplicate of this issue. ***

Comment 9 Jesse Glick 2004-12-09 18:46:41 UTC

Martin, your opinion on this issue would be appreciated. We have
repeatedly heard that this issue is a blocker for some people
migrating to 4.x (usually people working on huge projects that for
whatever reason cannot be reorganized into a more digestible directory
structure): unlike in 3.x, it is impossible to simply ignore unwanted
portions of a huge source tree, because the classpath scanner only
deals with complete source roots. There are other related issues (in
building and in display) which the projects team would be able to
resolve with a modest amount of work, I think, but without fixing the
scanning it would be pointless.

Is there any possibility to address this in javacore, say in F?
Probably this would involve adding some kind of refinement to the
ClassPath API which would allow GlobalPathRegistry to include only
portions of a source root (e.g. a list of named package/wildcard
includes or excludes). Javacore would then need to create an
independent scanner database for each source root fragment, I guess,
or leave portions of a single scanner database intentionally empty (to
be filled on demand). If you did e.g. a refactoring operation, only
classes within included packages would be considered. Opening a
database should load only metadata applicable to requested packages,
even if metadata were available on disk for excluded packages, to
conserve memory.

Comment 10 Martin Matula 2004-12-09 20:13:05 UTC

It should be quite easy to implement some kind of filtering per
classpath element URL in the scanner (i.e. restricting which
subdirectories under a given URL should be scanned and which should
not). If there is an API where we could send root URL and relative
name of a directory and the API would return true or false indicating
whether it should be ignored or not, it could be a matter of hours to
make our scanner use this.
Being able to have multiple CP roots with the same URL but different
filters applied on it would be more tricky.
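
As a rough illustration of the callback described in this comment, the sketch below passes a root and a relative directory name to a filter and prunes any subtree the filter rejects. ScanFilter and FilteringScanner are hypothetical names, not part of the javacore scanner or any NetBeans API, and java.net.URI stands in for the URL mentioned above.

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback: true means the directory under the root is not scanned. */
interface ScanFilter {
    boolean isIgnored(URI root, String relativeDir);
}

final class FilteringScanner {
    /** Collects .java files under root, skipping every subtree the filter rejects. */
    static List<Path> scan(Path root, ScanFilter filter) throws IOException {
        List<Path> sources = new ArrayList<>();
        URI rootUri = root.toUri();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // Relative name with '/' separators, e.g. "3rdparty/ant"; empty for the root.
                String rel = root.relativize(dir).toString().replace(File.separatorChar, '/');
                boolean skip = !rel.isEmpty() && filter.isIgnored(rootUri, rel);
                return skip ? FileVisitResult.SKIP_SUBTREE : FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (file.getFileName().toString().endsWith(".java")) {
                    sources.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return sources;
    }
}
```

Because the filter is consulted before a directory is entered, ignored subtrees cost nothing beyond the single callback, which is the property that matters for a tree with 12K+ directories.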

Comment 11 Jesse Glick 2004-12-10 22:02:05 UTC

Based on Martin's encouraging comments, I am marking this a target for
F. Have heard from more than one source that it is important for
developers working on massive legacy projects with poor layout.

More to Martin: the way I am currently thinking it would work would be
that

1. GlobalPathRegistry would have the ability to register (or
unregister) a ClassPath with a list of package includes (i.e. resource
prefixes such as "org/foo/whatever/"). This would be all that would be
required on the SPI side - the project would handle figuring out what
to register.

2. GPR already automatically removes duplicates at the ClassPath
level, though not at the Entry level. It would need to somehow also
keep track of common Entry URLs and merge includes.

3. From the Javacore side, all you should have to do is ask for a set
of entry URLs to scan, and in each case you should be able to confirm
for each package in turn that it should be included (using some method
in GPR).

Still need to work this out more precisely - have to study the usage
of the GPR API from the Javacore side, in particular in
MergedClassPathImplementation, to see what makes sense. It's not just
about scanning - any operation which goes through all classes in a
root (e.g. Find Usages) would need to check the includes list, to
avoid having any slowdown in operations for unused packages.
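
The three points above could be modeled along these lines. This is a sketch under stated assumptions only: IncludeRegistry is a hypothetical stand-in and does not reflect the real GlobalPathRegistry API. It merges include prefixes for entries that share a root, and lets a scanner ask per resource whether it is included. Unregistering is left out, since it would need per-registration bookkeeping.

```java
import java.net.URI;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical registry mapping each source root to its merged package includes. */
final class IncludeRegistry {
    private final Map<URI, Set<String>> includesByRoot = new HashMap<>();

    /** Registers a root with resource-prefix includes such as "org/foo/whatever/". */
    synchronized void register(URI root, Collection<String> includePrefixes) {
        // Registrations for the same root are merged rather than duplicated.
        includesByRoot.computeIfAbsent(root, r -> new TreeSet<>()).addAll(includePrefixes);
    }

    /** The set of entry roots a scanner would need to visit. */
    synchronized Set<URI> rootsToScan() {
        return new HashSet<>(includesByRoot.keySet());
    }

    /** True if the resource path falls under any include registered for the root. */
    synchronized boolean isIncluded(URI root, String resourcePath) {
        Set<String> prefixes = includesByRoot.get(root);
        if (prefixes == null) {
            return false;
        }
        for (String prefix : prefixes) {
            if (resourcePath.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Any operation that walks all classes in a root, such as Find Usages, would call isIncluded in the same way as the scanner does.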

Again a caveat: the IDE will not *enforce* that the packages you as a
user decide to include are really self-contained: if you accidentally
refer to classes in an excluded package from an included package,
javac due to its nature will quietly compile against them without
errors and you will have potentially messed up your project (depending
on what your policies are). It is much better to rearrange your
sources into smaller units to enforce your intended dependencies, so
you get the benefit of full error checking (and this scenario will
also work well with NetBeans).

It is technically possible to coerce <javac> into compiling only
included packages from a big source tree in isolation, while ensuring
that error messages link back to the real source files - cf. e.g.
openide/build.xml#do-lib-javac in NB sources - but it is quite ugly
and does not work at all with Jikes, so this is not very advisable for
use in build-impl.xml.

Comment 12 Jesse Glick 2005-01-21 16:46:56 UTC

By the way, undocumented experimental startup option to exclude
certain package trees from classpath scanning (NO EFFECT on Ant builds):

-J-Dorg.netbeans.javacore.ignorePackages="sun sunw org.foo.generatedclasses ..."

Not tested, use at your own risk, may be removed/changed without notice.
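
For illustration only, here is one way a space-separated property like this might be interpreted. It is not the actual javacore implementation; the class IgnoredPackages and the rule that a listed name also covers its subpackages are assumptions based on the phrase "package trees" above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical reader for a space-separated list of ignored package trees. */
final class IgnoredPackages {
    private final Set<String> roots = new LinkedHashSet<>();

    IgnoredPackages(String propertyValue) {
        if (propertyValue != null) {
            for (String name : propertyValue.trim().split("\\s+")) {
                if (!name.isEmpty()) {
                    roots.add(name);
                }
            }
        }
    }

    /** Reads the property named in the comment above; unset means nothing is ignored. */
    static IgnoredPackages fromSystemProperty() {
        return new IgnoredPackages(System.getProperty("org.netbeans.javacore.ignorePackages"));
    }

    /** A package is ignored if it is a listed package or lies beneath one. */
    boolean isIgnored(String packageName) {
        for (String root : roots) {
            if (packageName.equals(root) || packageName.startsWith(root + ".")) {
                return true;
            }
        }
        return false;
    }
}
```

Matching on the name plus a trailing dot keeps "sun" from accidentally covering an unrelated package whose name merely starts with the same letters.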

Comment 13 Jesse Glick 2005-12-29 19:24:45 UTC

Obviously not for 5.0.

Comment 14 Jesse Glick 2006-01-05 17:04:04 UTC

*** Issue 70181 has been marked as a duplicate of this issue. ***

Comment 15 Tomas Zezula 2006-01-24 17:37:19 UTC

*** Issue 64495 has been marked as a duplicate of this issue. ***

Comment 16 Jesse Glick 2006-11-20 01:28:44 UTC

*** Issue 89628 has been marked as a duplicate of this issue. ***

Comment 17 Jesse Glick 2006-11-29 20:04:58 UTC

To avoid misinterpretation, treating this as purely a performance issue with
large source roots treated as single compilation units. Use issue #49026 for
issues relating to treating a source root as multiple compilation units.

Simple to characterize and reproduce. Run a NB dev build on a fresh userdir.
Make a Java project from existing sources and point to j2se/src/share/classes
from a JDK 6 source checkout. Takes several minutes just to unblock EQ (someone
is calling ProjectUtilities.openAndSelectNewObject, have not fully
investigated). Then you get an OOME.

NB 5.5 fares a little better - initial scan completes with no OOME. But opening
some files such as JTable.java will still yield OOME (if the project is not
compiled).

Comment 18 Jesse Glick 2006-11-29 23:21:19 UTC

I have corrected some miscellaneous performance bugs related to creating and
opening the project. So now the OOME from scanning the source root can be seen
more clearly.

Comment 19 Jesse Glick 2006-12-05 21:35:54 UTC

In 061201, with -Xmx256m, opening nbbuild w/ all deps, scanning did not complete:

WARNING [global]: Not enough memory to compile folder:
/space/src/nb_all/j2ee/ejbjarproject/src

This folder has 80 sources. What's wrong?

After the failed scanning, Find Usages would run, but give incomplete results.
This is pretty bad IMHO - the only reason I knew something went wrong with
scanning was because (1) the "Preparing usages view" task was still running (and
could not be cancelled!); (2) I knew for a fact there were more usages than were
displayed.

Comment 20 Jan Lahoda 2006-12-06 12:53:35 UTC

On a custom build from recent sources, I was able to scan nbbuild (recent
checkout of stable_nowww) with all dependencies with default memory settings
(-Xmx128m) on both JDK1.5.0_09 and JDK1.6 b104. Is there anything special in
your setup? Any special additional modules?

Comment 21 Jesse Glick 2006-12-06 16:53:34 UTC

My attempted scan of nbbuild was in my regular userdir, which has lots of
additional modules. I don't know which ones might have significantly affected
memory usage, of course. The scan of the JDK was in a fresh userdir with default
module config.

Comment 22 Tomas Zezula 2007-01-08 16:12:56 UTC

The cache update was redesigned so that the signature files are created
as far as possible. Now I am able to open the JDK project (j2se/share/classes +
j2se/solaris/classes) with 128MB in less than 4 min.

Comment 23 Jesse Glick 2007-01-08 23:16:24 UTC

Really?

I run the IDE (fresh dev from sources, new userdir) under JDK 7 and make New
Project from Existing Sources. Select j2se/src/share/classes only.

time       event
----------------
 0.00      start
 0.30      wiz dialog closes, projects window appears
 1.30 (?)  all packages displayed in projects window, Retouche scan continues
 5.00      IAEs start being thrown (see attachment)
 8.45      still going, at ~300 exceptions I give up and kill IDE

Similar attempt in Eclipse 3.2.1:

time       event
----------------
 0.00      start (pressed Finish in wizard)
 0.20      project opened, building workspace
 3.00      "analyzing sources"
 3.40      "building workspace"
  ... here it gets confused by **/.svn/text-base files...
 7.30      I lose patience and kill at 25% completion; IDE mostly unresponsive

Now in NB 5.5:

time       event
----------------
 0.00      start
 0.52      project opened, scanning classpaths
 1.54      scanning complete
 3.45      I have JTable.java open, Navigator populated, green box
 4.47      I have found all usages of ComponentOrientation
 6.00      IDE hung at 100% CPU, has to be killed

Comment 24 Jesse Glick 2007-01-08 23:18:00 UTC

Created attachment 37155
Log file

Comment 25 Tomas Zezula 2007-01-09 12:17:39 UTC

The IAE is being evaluated by Honza; adding Honza to CC.

Comment 26 Jan Lahoda 2007-01-09 15:01:41 UTC

The IAE should be fixed now. See issue #91745 for more information.

Comment 27 Jesse Glick 2007-01-09 18:55:15 UTC

So trying again w/ today's build:

time       event
----------------
 0.00      start
 0.15      wiz dialog closes, projects window appears
 1.12      all packages displayed in projects window, Retouche scan continues
 2.53      "compiling classes", start to get some warnings on console (attached)
 6.30      scan finishes
 7.40      I have JTable.java open, Navigator populated, yellow box
 8.30      I ask for all usages of ComponentOrientation
 9.05      after freezing for a while, I get an OOME

So... better than it was, somewhat worse than 5.5, still not really usable.

BTW part of the frustration with the speed of scanning is due to issue #87968 -
there is no detailed information about the progress of the scan, unlike in 5.5.

Comment 28 Jesse Glick 2007-01-09 18:56:57 UTC

Created attachment 37197
New log file showing various warnings

Comment 29 Petr Hrebejk 2007-05-06 08:38:56 UTC

What is the status of this issue? AFAIK we are able to open src/classes, and the
exclude mechanism was implemented as well. Should we close it?

Comment 30 Tomas Zezula 2007-05-07 06:59:37 UTC

I think there are still things to solve, like interference between the package
view children and the initial scan. Maybe you can decrease the priority.

Comment 31 Petr Hrebejk 2007-05-17 10:04:09 UTC

Fine, lowering the priority and removing the plan60 keyword, as most of the things
really needed for 6.0 were done.

Comment 32 Tomas Zezula 2007-06-22 07:33:21 UTC

The issue is very broad. The additional performance optimizations I want to do in NB 6.0:
1) Prescan of archives like in NB 5.0
2) Partial parse (parse of changed method only)
3) Force writers of tasks to fix cancel ()

Comment 33 Jesse Glick 2007-06-22 19:07:18 UTC

Issue should perhaps be closed or broken down into more specific perf RFEs.

Comment 34 camerojo 2007-07-26 02:57:16 UTC

I don't know the exact reasons for the performance issues but they need to be taken very seriously.

NetBeans 6.0 is almost unusable in its current state when using a large code base because it periodically disappears for
extended periods (at 100% CPU use). If this is not fixed in 6.0, I don't think that you have a seriously competitive
product, which is a shame since there is a lot of goodwill for this project and a lot of good work already done.

What can we beta testers do to help you track down these performance issues?

Comment 35 Jan Becicka 2008-02-18 16:17:01 UTC

We made several perf improvements in the 6.1 timeframe, including partial reparse.
Is this issue still valid?

Comment 36 Jesse Glick 2008-02-19 18:44:03 UTC

Probably best to close; any specific performance problems still reproducible in 6.1 builds should be filed
independently with details to reproduce.