This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
-create new ear w/ ejb module and appclient -press finish in the new ear wizard => deadlock
Created attachment 29506 [details] thread dump
It's a deadlock between MergedClassPathImplementation from javacore and ProjectManager.mutex. Not sure on which side it should be fixed yet. The first thread is spawned by AppClientProjectGenerator(* - see below) and the second one by NNListenerManager. See the simplified attached thread dump and search for XXX comments which shows possibly problematic places. (*)It can happen also for other project types - possibly also outside of j2ee; since the code for all project types is the same(**). So it seems that it is rather problem of j2ee/metadata - NNListenerManager; or deeper (e.g. projects or javacore). I think we can get out of the flow of the second thread (ProjectGenerator) but maybe that NNListenerManager could from its thread with the fork to ProjectManager.mutex.readAccess -> and thus wait for all pending write-access calls. But this could wrong from the NNListenerManager vs. JMManager semantics points of view.... So (temporarily?) reassinging to Martin who seems to be the author of the code in the NNListenerManager (cvs ann). Please continue in evaluation from the NNListenerManager's point of view so we can move further.
Created attachment 29628 [details] simplified-commented-threaddump.txt
Created attachment 29629 [details] simplified-commented-threaddump.txt (previsous one was not simplified, just adjusted)
Reassigning to javacore for evaluation.
Also see the dependent issue if you want another threaddump. I think the dependent issue is the third deadlock between ProjectManager.MUTEX and MergedClassPathImplementation I saw. Note that you are calling foreign code under your own lock. But maybe it is rightful. Please evalutate, so we can move on. Thanks.
I guess NNListenerManager (now most of the code moved to NNMDRListener) is doing nothing wrong, but let me investigate more.
NNListenerManager is just one case. Other deadlocks do not contain NNLM in any stacktrace in a threaddump.
Created attachment 30164 [details] Proposed Patch
Please test attached patch. The deadlock is pretty complicated. There are too many strange things there. Events are fired under locks, save() is called during project creation etc...
Mariane could you try the patch. You said that you have 100% reproducible case (issue 75635). Let me know through the email if you need the built nbm. Thanks.
Ummm, just guessing... Although the call addClassPathResources(cp) in addClassPath() method escapes the MergedClassPathImplementation lock, it will finally call the line in the addClassPathResources() method: entries = cp.entries(); under MCPI lock which seems to be the problem of issue 75635 (see the deadlock there) since it will try to acquire ProjectManager.MUTEX read access. Maybe it should be switched with the line below? I.e. synchronized(this) { .. entries = cp.entries(); } would be replaced with synchronized(this) { .. } entries = cp.entries(); or not? :)
Created attachment 30168 [details] Improved patch. Thanks to Martin K.
Then you do not need the local variable at all for entries, just cp.entries().iterator(). Nitpicking :) I'll send to Marian a new patch.
Marian is still able to reproduce. See: http://www.netbeans.org/nonav/issues/showattachment.cgi/30173/exc.txt which is threaddump with the last patch applied. Problem starts with line 262 in patched version.
The real problem, from my point of view is, that project infrastructure fires events while holding Mutex. Maybe we can find another workaround for this deadlock, but I'm not sure. It looks like the real problem probably lies in fact, that it is not a good idea to fire events while holding some lock.. Any help appreciated...
We agreed this is not a showstopper for NB 5.5 beta, but still very important. Thanks.
I've seen the deadlock once and even wrote a test for it with Hrebejk and Tomas Z. I agree that the real problem is firing changes while holding the project's mutex. The fix is simple: Just before entering Mutex.xyzAccess, do FileSystem.runAtomicAction, that will prevent the events to be delivered. Imho this is the change that shall be done. Of course with having a test for the deadlock...
In this case the fix should be done in projects...
*** Issue 75164 has been marked as a duplicate of this issue. ***
Jarda, do you have the test? I am afraid that runAtomicAction may not help - the event that caused the deadlock in issue #75635 was not a file event.
I am not sure if it is really the same, but test for similar problem - e.g. fileChangeEvent under writeAccess has been written for issue 50259. Re. 75635: It is a different deadlock, imho. That is why it needs different test ;-)
This issue and issue #75635 are different: in this one the problem is caused by writing into project properties, the second one by write into project.xml. So fix of one of them will not (probably) fix the other.
Re the runAtomicAction: I do not like it very much, as the filesystem events are not a cause of the problem, they are only intermediate. I could simply write a piece of code that would deadlock without any filesystem events. So I do not think this would be a proper fix. I think it would be more appropriate if the listener ("AppClientProjectClassPathExtender$4.run") would be rewritten not save project synchronously. I will check with jungi and madamek if this is possible.
Also be aware of web and ejb modules (copy-pasted code). Thus: $NB_CVS_55/j2ee/clientproject/src/org/netbeans/modules/j2ee/clientproject/classpath/AppClientProjectClassPathExtender.java $NB_CVS_55/j2ee/ejbjarproject/src/org/netbeans/modules/j2ee/ejbjarproject/classpath/EjbJarProjectClassPathExtender.java $NB_CVS_55/web/project/src/org/netbeans/modules/web/project/classpath/WebProjectClassPathExtender.java
Ok, you will write the test (e.g. piece of code) that deadlocks without fs events. I'll wait for your fix and then I am going write one that does deadlock with due to fs events.
Aha, I probably understand what you mean: you propose to wrap the writeAccess in the ProjectManager.saveProject with runAtomicAction. Yes this makes sense to me, but would not solve this problem - the fs events would be fired at the end of saveProject method, and this is still under the writeAccess (acquired in org.netbeans.spi.project.support.ant.AntProjectHelper.putProperties). To solve this (particular) deadlock fully with runAtomicAction, one would have to wrap the writeAccess in putProperties into an atomic action, and this does not make much sense to me.
I doubt that the [J2EE|Web]*ClassPathExtender can be rewritten, it is called under the Mutex.readAccess, so it cannot perform the Mutex.writeAccess it has to post the WriteAccess. There are two possible solutions, either not to throw under the read access or try to minimize the synchronization in the MergedClassPath.
(Sorry for the double-post, some technical problems). Regarding the [J2EE|Web|AppClient]ClassPathExtender, I do not think it should be rewritten, but simply not calling PM.saveProject synchronously (ie. rescheduling it into another thread) would (I hope) solve this problem (with atomic action in saveProject itself, maybe). Or maybe whole storeLibLocations method can be rescheduled into another thread? Do you think this is impossible? Re firing under read access: I do not think it is possible to change it now: it would be huge semantically incompatible change.
Yes, I think that it may be fine to call storeLibraries asynchronously, it is important to take the libraries to be saved synchronously to avoid race conditions. I also think that changing the firing is not good idea at least for now, I've only mentioned it as one possible solution among others.
I would like to propose the following solution: 1. For 5.5, post PM.saveProject in [J2EE|Web|AppClient]ClassPathExtender into another thread. (I think this will not cause any race conditions: if the project metadata are accessed through AntProjectHelp, they will be always consistent, and if accessed through files on disk, they will also be consistent.) 2. For trunk, do 1. + wrap content of ProjectManager.saveProject with runAtomicAction (+test) Any objections or comments?
> 1. For 5.5, post PM.saveProject in [J2EE|Web|AppClient]ClassPathExtender... Seems like the ad-hoc workaround for this issue. It will not solve issue 75635 (at least it seems so from that threaddump) and maybe others yet-unrevealed similar issues. But probably we could workaround them in the similar way if there is not better solution and use the robust one for trunk as you supposed.
>Seems like the ad-hoc workaround for this issue. It will not solve issue 75635 >(at least it seems so from that threaddump) and maybe others yet-unrevealed >similar issues. But probably we could workaround them in the similar way if >there is not better solution and use the robust one for trunk as you supposed. Actually, I do not know about any solution (at least not on the projects side) which would solve both this issue and issue #75635. Wrapping content of PM.saveProject into runAtomicAction does not solve this deadlock - though it may prevent slightly different deadlocks in the future.
After our offline talk, go ahead "probably" ;) Since the postponement, we should write that test we talked about. Feel free to reassign since the proposed 5.5 solution needs to be done actually on the J2EE side.
Honzo, can you proceed with the 5.5 solution ASAP?
I committed the solution for NB5.5 as described above: Checking in j2ee/clientproject/src/org/netbeans/modules/j2ee/clientproject/classpath/AppClientProjectClassPathExtender.java; /cvs/j2ee/clientproject/src/org/netbeans/modules/j2ee/clientproject/classpath/Attic/AppClientProjectClassPathExtender.java,v <-- AppClientProjectClassPathExtender.java new revision: 1.1.4.4; previous revision: 1.1.4.3 done RCS file: /cvs/j2ee/clientproject/test/unit/src/org/netbeans/modules/j2ee/clientproject/classpath/Attic/AppClientProjectClassPathExtenderTest.java,v done Checking in j2ee/clientproject/test/unit/src/org/netbeans/modules/j2ee/clientproject/classpath/AppClientProjectClassPathExtenderTest.java; /cvs/j2ee/clientproject/test/unit/src/org/netbeans/modules/j2ee/clientproject/classpath/Attic/AppClientProjectClassPathExtenderTest.java,v <-- AppClientProjectClassPathExtenderTest.java new revision: 1.1.2.1; previous revision: 1.1 done Checking in j2ee/ejbjarproject/src/org/netbeans/modules/j2ee/ejbjarproject/classpath/EjbJarProjectClassPathExtender.java; /cvs/j2ee/ejbjarproject/src/org/netbeans/modules/j2ee/ejbjarproject/classpath/EjbJarProjectClassPathExtender.java,v <-- EjbJarProjectClassPathExtender.java new revision: 1.6.36.2.2.1; previous revision: 1.6.36.2 done Checking in web/project/src/org/netbeans/modules/web/project/classpath/WebProjectClassPathExtender.java; /cvs/web/project/src/org/netbeans/modules/web/project/classpath/WebProjectClassPathExtender.java,v <-- WebProjectClassPathExtender.java new revision: 1.10.32.3.2.1; previous revision: 1.10.32.3 done
The Martin's improved patch is fine and will work fine.
Thank you guys.