47415 – [perf] Scanning Project Classpaths takes too long

Summary: [perf] Scanning Project Classpaths takes too long

Status:	RESOLVED DUPLICATE of bug 43909

Alias:	None

Product:	java
Classification:	Unclassified
Component:	Unsupported
Version:	4.x
Hardware:	All All

Importance:	P2 blocker with 1 vote
Assignee:	issues@java

URL:
Keywords:	PERFORMANCE

Duplicates (3):	47669 48424 49291
Depends on:	50947
Blocks:

Reported:	2004-08-18 19:29 UTC by jhoffman
Modified:	2007-09-26 09:14 UTC
CC List:	6 users

See Also:
Issue Type:	ENHANCEMENT
Exception Reporter:

Description jhoffman 2004-08-18 19:29:46 UTC

The first time a project is created in the IDE the
Scanning Project Classpaths dialog takes a long
time (almost 30 seconds on my 3.2GHz PC with 1.5GB
of memory).

Creating the WindowsSampleApplication shows the
Scanning Project Classpaths dialog for almost 10
seconds.

With three projects loaded (a simple web app, the
TomcatJSPSampleProject and the
WindowsSampleApplication), subsequent IDE launches
show the Scanning Project Classpaths dialog for 13
seconds.

There must be a way to cache this information so
that the dialog does not have to appear.  Project
information rarely changes between shutdown of the
IDE and its next restart.

Comment 1 Martin Matula 2004-08-18 21:01:05 UTC

Jeff, we have spent a lot of time optimizing this. The numbers
(especially for subsequent IDE runs) do not depend much on the
processor speed and amount of memory, so the numbers should be similar
even on slower machines - most of the time is spent in I/O. During the
scanning the cached indexes (that were created during the longer
initial scanning) are deserialized - so we already do the caching you
suggested. I don't see how this could be made faster (unless we store
even less information than we do now - but I don't see what else we
could cut). And since the information from the indexes is used all the
time by the parser (whenever somebody opens a file, etc.) there is no
clean way of making this a background task. Could you please describe
what exactly you consider to be a defect (where do you see some
inefficiencies, is the initial scanning the problem, or is it
subsequent starts of the IDE)? And what would be the acceptable
numbers? And why is this a P2?

Comment 2 jhoffman 2004-08-18 21:58:53 UTC

Thank you for your explanation.  However, I see this as a defect
because none of our competitors exhibit this behavior (but they
provide the same functionality).  It will be considered a defect by
our users, who will not expect to wait an extra 30 seconds the first
time they open a project and an extra 15 seconds when they reopen the
IDE.  I also consider it a problem for any product that intends to use
the NetBeans platform.

Comment 3 Martin Matula 2004-08-18 22:45:32 UTC

> none of our competitors exhibit this behavior

AFAIK e.g. IntelliJ IDEA does this and it takes even more time.

Anyway, you haven't answered my questions regarding the acceptable
numbers. How should they be determined? How can we measure whether we
can close this issue?

To me defect=bug=something unintended. In this case the scanning was
intended, agreed upon with our UI team and it currently works as
designed. So getting rid of it or changing the way it is presented
(making it non-modal, or whatever) looks to me more like an
enhancement request.

Comment 4 jhoffman 2004-08-18 22:53:47 UTC

Some products built on the NetBeans platform have other competitors
besides Java IDEs.  For example, MS Visual Studio.  The platform must
be suitable for different user sets.

However, the requirement is that this functionality should be a
background task, and the progress indicator should only be surfaced if
the user performs an action that would require the scanning to be
done.  The measurement I would use is that when I open the IDE, there
is no additional "scanning" visible to the user.  I would also expect
that the delay on opening the first project in the IDE would be
removed.  If possible, this scanning should take place on
installation, when a 30 second delay would be less intrusive.

Comment 5 Martin Matula 2004-08-18 23:25:46 UTC

> However, the requirement is that this functionality should be a
> background task, and the progress indicator should only be surfaced if
> the user performs an action that would require the scanning to be
> done.

As I explained before, scanning = deserialization of indexes.
The indexes are used when a file is parsed. A file needs to be parsed
when it is opened in the editor (to show code folding, override
annotations, navigation combo box, etc.) This means the scanning needs
to take place immediately after IDE's startup if the files were open
in the editor during its last shutdown (which is quite common), since
they will be reopened during the startup. Is this acceptable?
In fact the parsing could theoretically work without (or with a very
limited set of) the indexes, but there are some known problems with
that and I am afraid that there are even more unknown problems.
Override annotations won't work until the indexes are complete. If a
user makes changes to any of the files while the files are being
scanned it may cause inconsistencies. The files that were parsed
before the indexes were complete need to be reparsed after the
scanning finishes to make sure all the identifiers in those files are
resolved (to enable go to source/override annotations/refactoring
features, etc.). Code completion won't work.
I believe this can be solved, but not for 4.0 - we have been
feature-frozen for 2 months now, the beta 1 is almost out and I
consider this to be quite risky. Is implementing this after 4.0 OK?

Re the initial parsing - we will very likely distribute the
pre-scanned JDK with the IDE or do the scanning in the installer, so
this should be eliminated. Even if we don't do it for NetBeans, the
installer solution can be easily done for other products building on
top of NetBeans.

Comment 6 jhoffman 2004-08-19 00:14:57 UTC

If there are approaches to eliminate the 30 second delay on first
project creation, then they need to be documented for those using the
platform -- and this bug can be changed to an enhancement if the only
issue left is the scanning on subsequent IDE launches.  However, I am
still concerned about the length of time that takes, so I suggest that
there be a plan to correct this for future promotions.  Thanks!

Also -- just an idea -- would it be possible to scan only the file (or
perhaps the associated package or project) that is currently visible
when the IDE is launched?  When another file is surfaced, then scan
its associated package or project.  That way the user would
experience smaller delays at more predictable intervals.

Comment 7 Peter Zavadsky 2004-08-19 00:40:25 UTC

Is it possible to have this scenario:

1) Scan in the background, informing the user about it with a modeless
dialog.

2) Let the user do work which doesn't depend on the scanned/parsed
data. I believe opening sources and plain editing should be available. Of
course without the code completion, annotations support, navigation
toolbar etc., but that should be OK. (I guess code folding shouldn't
depend on scanning, does it?)

3) The services dependent on scanned/parsed data would become available
only after the scanning is finished.
Until then, if the user requests such a service, e.g. when she tries
to invoke code completion, a message dialog would appear informing her that
scanning is still in progress and that the service is not available yet.


Would this kind of solution be possible and acceptable?
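
For illustration only, the scenario above (scan in the background, keep
index-dependent services unavailable until the scan completes) could be
sketched roughly as follows. This is not NetBeans code: the class
BackgroundScan, its methods, and the idea of representing the index as a
plain list of class names are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch; names are illustrative and not taken from NetBeans.
class BackgroundScan {
    // The "index" is simplified to a list of fully qualified class names.
    private final CompletableFuture<List<String>> index;

    // Starts the scan on a background thread as soon as the object is created.
    BackgroundScan(Supplier<List<String>> scanner) {
        this.index = CompletableFuture.supplyAsync(scanner);
    }

    // An index-dependent service (here: completion by prefix).
    // An empty Optional means "scanning is still in progress".
    Optional<List<String>> complete(String prefix) {
        if (!index.isDone()) {
            return Optional.empty();
        }
        List<String> matches = new ArrayList<>();
        for (String name : index.join()) {
            if (name.startsWith(prefix)) {
                matches.add(name);
            }
        }
        return Optional.of(matches);
    }

    // Blocks until the background scan has finished.
    void awaitScan() {
        index.join();
    }
}
```

A caller would show the "scanning is still in progress" message whenever
complete(...) returns an empty Optional, and plain editing never has to
call awaitScan().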

Comment 8 Martin Matula 2004-08-19 08:43:42 UTC

Jeff, thanks for understanding.

Jeff and Peter, as I said, it is possible to work with the limited set
of indexes. What I meant by that is what Jeff asks for - scan just the
visible files eagerly, rest can be done as a background task. We are
converging to that and already implemented some things that are
necessary to achieve it - so now the infrastructure theoretically
supports it. But as I said, there are still a few (solvable) problems
related to such a change and it would have a significant impact on the
UI, not just in javacore module, so it is too risky to implement that
for 4.0 now that the beta 1 is almost out.

Comment 9 Antonin Nebuzelsky 2004-08-19 15:00:06 UTC

> What I meant by that is what Jeff asks for - scan just the
> visible files eagerly, rest can be done as a background task. We are
> converging to that...

I don't like this idea at all. This is how the old code-completion
database was updated and what warmup tasks are doing in the background
right after start and it caused too many UI responsiveness problems
right after IDE start. Hiding the update process in the background and
pretending that the IDE is ready for work caused a lot of disappointment
for users with slow machines, who could not work with the IDE anyway
until the background tasks after start finished.

(see for example user comments in the issue #40232)

Comment 10 Martin Matula 2004-08-23 08:37:24 UTC

*** Issue 47669 has been marked as a duplicate of this issue. ***

Comment 11 _ rkubacki 2004-09-14 08:35:33 UTC

*** Issue 48424 has been marked as a duplicate of this issue. ***

Comment 12 _ gtzabari 2004-09-14 14:41:43 UTC

Are you absolutely certain that on subsequent "scanning for changes in
X.jar" the majority of time is spent deserializing indexes and not
simply rescanning the contents of the JAR? If not, did you try using
MD5 hashes against the JAR to prevent needless rescanning?

If the index deserialization is a problem, consider bundling an
embedded DB with Netbeans, like hsql. I assume it would be much faster
for deserializing than the J2SE. Furthermore, it might make all of
Netbeans much faster. There are also other embedded DBs out there that
should be even faster than hsql.

Comment 13 Martin Matula 2004-09-19 09:54:51 UTC

To Gili: We do use an embedded b-tree database. But not for the indexes
that get deserialized during the startup, since in their case we need
to read them as a whole - the fastest way to do it (AFAIK) is to read
them sequentially using a buffered stream.
We do not use MD5 or any other hashcode, since its computation would
take longer than timestamp matching, which is sufficient. However, you
can see detailed scanning of classpaths even after restart under some
circumstances:
1) if you do a "hard" shutdown of netbeans (e.g. using Ctrl+C) - in
this case the serialized indexes may not be consistent (since the IDE
was not shut down properly so the storage caches may not have been flushed)
2) if you install a newer build of netbeans (since we were still making
some minor changes to the storage format)
3) if you rebuild the jar
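
As a rough illustration of the two points above (timestamp matching
instead of hashing, and reading an index as a whole through a buffered
stream), a minimal sketch might look like this. The class
IndexCacheCheck and its methods are hypothetical and do not reflect the
actual NetBeans storage code or file format.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical helper; names are illustrative and not taken from NetBeans.
class IndexCacheCheck {

    // The cached index is treated as fresh when the timestamp recorded at
    // indexing time equals the jar's current last-modified time.
    // No file content is read, which is why this is cheaper than a hash.
    static boolean isUpToDate(File jar, long recordedTimestamp) {
        return jar.exists() && jar.lastModified() == recordedTimestamp;
    }

    // Reads a whole index file sequentially through a buffered stream.
    static byte[] readWhole(File indexFile) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(indexFile)))) {
            byte[] data = new byte[(int) indexFile.length()];
            in.readFully(data);
            return data;
        }
    }
}
```

Rebuilding the jar changes its last-modified time, so isUpToDate(...)
returns false and a full rescan would be triggered, matching case 3
above.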

Comment 14 _ gtzabari 2004-09-19 18:02:18 UTC

Martin,

I'm not complaining about how long it takes to parse the classpath the
first time around; that's fine for now. The original issue that I
filed and was duped against this one was that:

1) The "scanning classpath" should be part of the Netbeans startup
progress bar. Once the main window opens, Netbeans should be ready to
use, period.

2) Scanning the classpath for changes should never take over 3 seconds.

I think this is a usability issue more than a technical one. It is
simply very annoying having to see yet another progress bar once
Netbeans comes up.

Comment 15 Martin Matula 2004-09-21 09:33:43 UTC

*** Issue 49291 has been marked as a duplicate of this issue. ***

Comment 16 Martin Matula 2004-09-21 09:35:50 UTC

See also comments from the last duplicate.

Comment 17 kjmcdonald 2004-09-22 16:47:14 UTC

I agree with a recent comment.

It would be nice if this progress bar was part of the splash screen
progress bar.

Also the 3 second limit seems like a good target, but 5 might be ok too.

NB4beta1 on my SunBlade 2000 (2x900MHZ, 4096MB, Solaris 10_66,
JDS3_16): just the 'initializing' step takes 5 seconds. Scanning a
small project (< 20 class files) and its 2 dependent projects (<10
classes each) takes another 5 seconds.

Why does initializing take as long as the scanning does?

The initial parsing though easily took over a minute (maybe 2 even.)

Comment 18 xxiii 2005-03-19 00:12:34 UTC

If it's a matter of I/O time (as I gather from some of the previous comments),
would it be possible to write/read the cached version with compression on faster
machines? (where the cpu time spent decompressing is likely to be less than the
I/O time saved)
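
A minimal sketch of this idea, assuming the cached index is just a byte
array written to a single file (the real cache format is not described
in this thread), could use the JDK's GZIP streams. The class name
CompressedIndexIO is made up for the example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; not part of NetBeans.
class CompressedIndexIO {

    // Writes the index compressed, so fewer bytes go to disk.
    static void write(File file, byte[] index) throws IOException {
        try (OutputStream out = new GZIPOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.write(index);
        }
    }

    // Reads and decompresses the index, trading CPU time for less I/O.
    static byte[] read(File file) throws IOException {
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        }
    }
}
```

Whether this is a net win depends on how compressible the index is and
on the ratio of disk to CPU speed, so it would need to be measured.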

Comment 19 _ gtzabari 2005-03-19 01:22:25 UTC

Also, migrating from Xerces to dom4j or xom would likely really help too.

Comment 20 tomzi 2005-04-04 12:15:58 UTC

I think it would just be a sufficient solution if the whole scanning would NOT
block the rest of the ide as pzavadsky suggested. 

Any user perfectly understands that as long as classpath scanning is not
finished, the functionality is restricted. 

A progress bar as part of the ide (see VCS update/checkout) would be sufficient.
If background scanning has finished all the functionality linked to that would
then work again.

Many things could be done until the background scanning has been finished.

On the other hand I also could blame the ide for being forced to drink coffee...
:) Don't know what my boss would say to that *gg*

Comment 21 Martin Matula 2005-04-04 12:26:58 UTC

Agreed. Background scanning is a work in progress.

Comment 22 Jan Becicka 2005-04-04 12:30:35 UTC

Scanning in NB is pretty fast (at least as fast as in competitive IDEs) but we
can do it in the background. See issue 43909.

*** This issue has been marked as a duplicate of 43909 ***

Comment 23 Quality Engineering 2007-09-20 12:44:48 UTC

Reorganization of java component