This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 183557 - Unrelated projects slow down code completion
Summary: Unrelated projects slow down code completion
Status: VERIFIED FIXED
Alias: None
Product: projects
Classification: Unclassified
Component: Maven (show other bugs)
Version: 6.x
Hardware: PC Mac OS X
: P2 normal (vote)
Assignee: Antonin Nebuzelsky
URL:
Keywords: PERFORMANCE
Depends on: 186024
Blocks:
  Show dependency tree
 
Reported: 2010-04-06 12:14 UTC by Petr Jiricka
Modified: 2010-06-23 14:41 UTC (History)
4 users (show)

See Also:
Issue Type: DEFECT
Exception Reporter:


Attachments
Exception in the NB output (4.64 KB, text/plain)
2010-04-06 12:14 UTC, Petr Jiricka
Details
A few more stack traces showing that Maven's FileOwnerQuery implementation can be very slow. (18.00 KB, text/plain)
2010-04-06 13:53 UTC, Petr Jiricka
Details
Profiler snapshot (33.79 KB, application/octet-stream)
2010-04-27 09:00 UTC, Petr Jiricka
Details
Another profiler snapshot (152.10 KB, application/octet-stream)
2010-05-11 16:52 UTC, Petr Jiricka
Details
Patch with incremental update of cache for newly added projects and for project property change events. (9.47 KB, patch)
2010-05-11 21:22 UTC, Antonin Nebuzelsky
Details | Diff
Patch 2 - one line change of the patch, preventing endless iteration if A is in B's list and B in A's list (9.54 KB, patch)
2010-05-12 12:55 UTC, Antonin Nebuzelsky
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Petr Jiricka 2010-04-06 12:14:28 UTC
Created attachment 96761 [details]
Exception in the NB output

I was invoking code completion in a Maven project, and had another unrelated Maven project open. The code completion took a very long time to come up, and meanwhile, a lot of exceptions were printed to the log file, that refer to the other unrelated Maven project. I attached one of the exception that was printed out.
Comment 1 Petr Jiricka 2010-04-06 13:47:12 UTC
This is actually worse than I thought, raising to P2. The exception continues to be printed in the output and the IDE is slowed down to an unacceptable level.
Comment 2 Petr Jiricka 2010-04-06 13:53:27 UTC
Created attachment 96777 [details]
A few more stack traces showing that Maven's FileOwnerQuery implementation can be very slow.
Comment 3 David Simonek 2010-04-06 14:13:05 UTC
But what could we do? Did you look at the stack traces? In order to implement the query, project has to be loaded using MavenEmbedder anyway and it can be slow. If we ever start to use maven 3, it claims to be faster. I don't know how to make things faster now, sorry.
Comment 4 Petr Jiricka 2010-04-07 07:59:48 UTC
Did you try to reproduce? Here are the steps:

1. Check out the GlassFish project per instructions here: http://wiki.glassfish.java.net/Wiki.jsp?page=V3FullBuildInstructions
I.e. check out the v3 workspace: svn checkout https://svn.dev.java.net/svn/glassfish-svn/trunk/v3
Do not build it.
2. Open the top-level project in NetBeans. It will open with the broken icon, as some dependencies are typically missing in your local Maven repository.
3. Open another (small) Maven project and try developing it (code completion, go to type, etc.)

The performance is really unacceptable, please try it out first at least.

As for the solution, would it help to cache the fact that the project was not opened correctly, so the IDE does not try to parse it again and again? If it's slow the first time, then that's life, but it should not slow the IDE down forever.
Comment 5 David Simonek 2010-04-07 08:11:48 UTC
I don't compromise the fact that performance is bad in that case. I'm just stating that issue is not fixable as far as my knowledge goes.
Comment 6 Petr Jiricka 2010-04-07 08:36:41 UTC
I don't know whether and how it is possible to fix this, though I was hoping my suggestion could be applicable. Either way, if a P2 bug can not be fixed, then it can not be closed as WONTFIX - it can potentially be waived with appropriate justification.
Comment 7 _ tboudreau 2010-04-07 18:11:38 UTC
Looking at the thread dumps, a couple of things pop out:

1. org.netbeans.modules.maven.queries.MavenFileOwnerQueryImpl.getAllKnownProjects() - do we know if this is repeatedly reading the same things, and if so, could it use some sort of cache?

2. hidden.org.codehaus.plexus.util.xml.Xpp3DomBuilder.build is in many of the stack traces.  So, we know that Maven is using DOM (okay in a build environment, terrible idea in an IDE w/ near-real-time requirements).  

How much of the information in the built DOM will NetBeans actually consume?  Could we write a SAX parser that tries to get the info we need quickly out of the POM (and perhaps caches the info somewhere with a file timestamp?), but which will fall back to calling Maven's code if it encounters things it cannot understand?  Now, that might make things *slower* in the pathological case that *every* project is doing something strange, but if most Maven projects are using a known subset of features, it could work well.

3. Does *all* of I/O really need to run inside ProjectManager.mutex().readAccess()?  

4.  I see a comment on line 259 of MavenFileOwnerQueryImplementation:
//TODO performance.. this could be expensive, maybe cache somehow
which certainly suggests this is a good place to start.

5. MavenFileOwnerQueryImpl *does* keep a cache of projects (in a Set<NbMavenProjectImpl> owned by the project), BUT:  it dumps the cache very aggressively on any property change from the project or whenever another Maven project is opened or closed (see usages of MFOQI.add/removeMavenProject).  

So it looks like, if you open a list of maven projects, this cache is rebuilt list.size() times - so we're iterating every visible project list.size() times.

If, instead of dumping the cache, we could simply add/remove the newly opened project and enqueue/dequeue the work to scan that too, it would probably dramatically reduce the number of iterations.

6.  Maven generally forces a standard directory structure.  It might be possible to answer an ownership query without fully loading the maven project - just check for a parent directory that contains a pom.xml and create a wrapper Project implementation which will load the real project on-demand.  That could also allow us to escape from ProjectManager.mutex().  Some (project instanceof NbMavenProjectImpl) tests would need to be fixed.


I think those are a few ways this could be optimized from 10 minutes of looking at the code - it looks like the code is clean but no serious attempt at optimization was ever done.

Making the optimizations *correct* is harder, but there are certainly places to start.
Comment 8 _ tboudreau 2010-04-07 18:13:46 UTC
owned by the project -> owned by the query (which is a singleton)
Comment 9 Milos Kleint 2010-04-07 19:03:20 UTC
first and foremost: glassfish root project is *not* the typical maven project out there. Try to open similar scale project(s) in other IDEs before you make claims about unacceptable speed. There is a long term smell around glassfish maven projects that I might want to assert that their  projects are misconfigured and corner-casey..

re 1. maven3 has ways to cache parsed models. That shall improve quite a bit, even though we do have some caching now as well..

re 2, feel free to hack around but I would say It's not the best place to look for improvements. migrating to maven3 is the way with highest net result.. hacking the m3 codebase then if it's not sufficient 

re 3, 4. file owner query shall be deterministic. There is no way to listen to "changes" so if we miss out some initial calls to it, we might never get a chance to be asked again. That results in serious misbehaviour.

re 5. that's a possible improvement for sure.

re 6. you are misinterpreting what this FOQimpl is about, tim. This query impl claims ownership of files in -local maven repository- which is used for interdependencies between projects. In maven there are no hard links to sibling projects, everything is handled via the FOQimpl. Any opened project and it's children for that matter claim their artifact jar file. when this jar file is used as a dependency in another project, we link them together. Then hyperlinks, code completion and everything else starts to work.


I would start with 5, that's probably the low hanging fruit. But I've learned to respect this code as every little change can cause deadlocks or break unrelated codebase in unexpected ways.. You have been warned :)
Comment 10 _ tboudreau 2010-04-07 19:54:55 UTC
> file owner query shall be deterministic

Okay, perhaps you can't avoid the mutex.  What I'm thinking might be useful, along with the optimization suggested in 5, is that we use some sort of blocking queue - i.e. if another thread enters (or the same thread reenters) building the cache, what it wants to scan is added to the work that needs to be done before exiting, and, if necessary, that thread is blocked until all scanning work is complete.  I.e. whoever enters getAllKnownProjects() first does all of the search work for any other thread that tries to enter getAllKnownProjects() while the first thread is running.  So, if project A is opened and a search of its dependencies is done, and then project B is opened while A's scan is happening, we do not do work twice or contend for ProjectManager.mutex() itself (the second thread would enter ProjectManager.mutex() afterward, see there is no pending work and immediately exit).  Yes, you're blocking on something else, but probably doing the scan in a single pass on a known thread will be more efficient.
Comment 11 Petr Jiricka 2010-04-27 09:00:52 UTC
Created attachment 98099 [details]
Profiler snapshot

Yesterday I bumped into this again. I am attaching a profiler snapshot taken during the scanning, hopefully it will help.
Comment 12 Antonin Nebuzelsky 2010-04-29 21:31:51 UTC
For the record, a few observations from investigating MavenFileOwnerQueryImpl so far:

* getOwner returns quickly null if the file in question is not a jar from local repository, only for jars from local repo more computation is done

* getOwner blocks on getting a list of all known projects (opened projects and their subprojects) - done in getAllKnownProjects and cached in cachedProjects set

* getAllKnownProjects does not do the full computation more than once or twice during IDE start with the ~200 GF projects - but later on, it recomputes quite often - see the bullets below

* cachedProjects is cleared up whenever a new project gets opened, closed, or fires a property change

* optimizing the caching by not clearing up the cache for newly added (opened) project is easy - incrementally adding this new project and its subprojects to cachedProjects set

* but, the property change event which clears up the cache anyway, gets fired more often than expected - e.g. the first property change for a newly created project comes right after the project is created and opened and source+javadoc download task is completed - so, unless such property change events get eliminated, the optimization in incremental addition of a project+subprojects to the cache is pointless

* unfortunately many changes which trigger property change (project's pom.xml modification, modification of other related poms, or also activation a project's profile)  can really affect which subprojects are detected for the project - these events cannot be ignored, because cachedProjects would not only contain more projects than desired but also some new subprojects would be missing and FOQ would not identify owners where it should

* total majority of getAllKnownProjects's time is spent in calls to the Maven embedder asking to re(read) project and its properties - a question is if there is any possible improvement here other than replacing the old m2 embedder with newer (and as they say faster) m3 embedder
Comment 13 Petr Jiricka 2010-05-11 16:52:49 UTC
Created attachment 98787 [details]
Another profiler snapshot

I bumped into this problem again when trying to verify bug 185080, attaching a new snapshot.
Comment 14 Antonin Nebuzelsky 2010-05-11 21:22:35 UTC
Created attachment 98804 [details]
Patch with incremental update of cache for newly added projects and for project property change events.

Milosi, Dafe, please review this patch.

I am keeping the full clean-up of cachedProjects only for removeMavenProject() and I am changing the behavior to an incremental addition of a project+subprojects to the cache for both addMavenProject() and propertyChange().

I think it should work correctly like this, with whatever propertyChange events might be triggered for a maven project because what is only important here is to have all current subprojects of the project in cachedProjects, and my change makes sure this is the case.

If there are no objections, I will integrate tomorrow.
Comment 15 Milos Kleint 2010-05-12 05:45:39 UTC
you seem to se projectsToAddToCache to null outside of a lock.

otherwise it looks ok, but it's a big change to review...
Comment 16 Antonin Nebuzelsky 2010-05-12 12:55:51 UTC
Created attachment 98846 [details]
Patch 2 - one line change of the patch, preventing endless iteration if A is in B's list and B in A's list

One line change of the patch at

-           iteratinglist.add((NbMavenProjectImpl) p);
+           if (!iteratinglist.contains((NbMavenProjectImpl)p))
+               iteratinglist.add((NbMavenProjectImpl) p);

preventing endless loop.
Comment 17 Antonin Nebuzelsky 2010-05-12 13:20:47 UTC
http://hg.netbeans.org/core-main/rev/ba04ed2a41f4

Fixed in 6.9 with this hopefully good-enough solution.

It is worth noting that there are two flaws in this approach but with the big performance penalty without the fix I think it makes sense living with it:

* if number of subprojects decreases for a project with propertyChange, the old ones are still kept in the cache until the cache is cleared up later - a minor "memory leak" which is tolerable IMO

* code in propertyChange handler adds this project and its currently active subprojects to the cache (if they are not there yet), but if the propertyChange of this project should affect list of subprojects of a different open project, this will not be detected
Comment 18 David Simonek 2010-05-12 13:29:20 UTC
Well no objections from me mainly because, hate to say this, but I don't fully
understand the situation, sorry.
Comment 19 Jesse Glick 2010-05-12 14:34:37 UTC
For the next release I would recommend very different behavior, as mentioned offline: ignore subprojects, and persistently cache artifact:group -> /project/source/location in e.g. NbPreferences.

(In reply to comment #16)
> -           iteratinglist.add((NbMavenProjectImpl) p);
> +           if (!iteratinglist.contains((NbMavenProjectImpl)p))
> +               iteratinglist.add((NbMavenProjectImpl) p);

BTW LinkedHashSet is the usual way of doing this.
Comment 20 Antonin Nebuzelsky 2010-05-12 15:36:10 UTC
> For the next release I would recommend very different behavior,
> as mentioned offline: ignore subprojects, and persistently cache
> artifact:group -> /project/source/location in e.g. NbPreferences.

Agreed. I filed issue 186024 to keep track.
Comment 21 Quality Engineering 2010-05-13 05:07:12 UTC
Integrated into 'main-golden', will be available in build *201005122200* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main/rev/
User: 
Log:
Comment 22 Jaroslav Pospisil 2010-06-23 14:41:48 UTC
v.