This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
Steps to reproduce:
1. Create a new web project on GlassFish.
2. Create a new web service.
3. Deploy the project.
4. Create another web project.
5. Use the wizard to create a new web service client: select the web service from the first project, type some package, and press Finish.
6. The message "File .../catalog.xml was modified externally. Do you want to reload it?" is shown. Press true -> deadlock happens.

Similar problems were reported in:
http://www.netbeans.org/issues/show_bug.cgi?id=74198
http://www.netbeans.org/issues/show_bug.cgi?id=112951
http://www.netbeans.org/issues/show_bug.cgi?id=109557

The key to reproducing it is to use a "slow" connection to the filesystem holding the projects. With NFS it reproduces 100% of the time, very easily.
Created attachment 53585 [details] message snapshot
Created attachment 53586 [details] thread dump
There are in fact two problems.

First, CloneableEditorSupport.checkReload() is triggered by the multiple XDM model modifications. It is called from CES$Listener.propertyChange() after a check of the file modification timestamp. However, it looks like the timestamp is the modification time on the remote server, which is in the future compared to the local time. Because of this, the CES "thinks" the file was modified externally and invokes the checkReload() action, which really refreshes the document content. This document change triggers an org.netbeans.modules.xml.xam.dom.AbstractDocumentModel$WeakDocumentListener. The code then attempts to acquire a monitor that is already held by the DefaultRequestProcessor thread.

The second problem is, IMHO, the fact that the XDM model is accessed without the document read lock. Inside the synchronized XDM code the read lock is then requested, which leads to the deadlock.
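To illustrate the lock-ordering problem described in the analysis above, here is a minimal Java sketch. It is not the NetBeans code: LockOrderSketch, documentLock, modelMonitor, reloadDocument() and syncModel() are hypothetical names, and java.util.concurrent's ReentrantReadWriteLock only stands in for the document lock. The sketch shows the general remedy, assuming both code paths can be changed to take the document lock before the model monitor.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-ins only; none of these names come from NetBeans. */
final class LockOrderSketch {
    // Stand-in for the document's read/write lock.
    private final ReentrantReadWriteLock documentLock = new ReentrantReadWriteLock();
    // Stand-in for the monitor guarding the synchronized model code.
    private final Object modelMonitor = new Object();
    private int syncCount; // guarded by modelMonitor

    /** Editor path: document changed under its write lock, then the listener enters the model. */
    void reloadDocument() {
        documentLock.writeLock().lock();
        try {
            synchronized (modelMonitor) { // order: document first, model second
                syncCount++;
            }
        } finally {
            documentLock.writeLock().unlock();
        }
    }

    /**
     * Model path: takes the document read lock BEFORE the model monitor.
     * Taking the monitor first and the read lock second would invert the
     * order used by reloadDocument() and could deadlock.
     */
    void syncModel() {
        documentLock.readLock().lock();
        try {
            synchronized (modelMonitor) {
                syncCount++;
            }
        } finally {
            documentLock.readLock().unlock();
        }
    }

    /** Runs both paths concurrently and returns the total number of model entries. */
    static int run(int iterations) {
        LockOrderSketch s = new LockOrderSketch();
        Thread editor = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.reloadDocument();
        });
        Thread model = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.syncModel();
        });
        editor.start();
        model.start();
        try {
            editor.join();
            model.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (s.modelMonitor) {
            return s.syncCount;
        }
    }

    public static void main(String[] args) {
        System.out.println(run(10_000));
    }
}
```

Because both threads acquire the two locks in the same order, neither can hold one lock while waiting for the other in the reverse direction.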
I had also reported this same bug (including screenshot and dump): http://www.netbeans.org/issues/show_bug.cgi?id=112951 It was closed at that time because the person was not able to reproduce it. By the way, after I removed all the remains of NetBeans (.nbi, .netbeans, .netbeans-derby) and installed the new version, I did not encounter the problem (though that might be for another reason). Mac
wmac, did you also use the remote server, or was the web project on your local disk? It is important for us to know this.
No, it was local only. I did not use remote.
Not reproducible for me locally. I have looked at all the similar issues mentioned here; however, none of us (engg. and QE) have been able to reproduce it in the past. OTOH, I'm not ruling this out as a bug, but I will not call it a P1 for the following reasons:
1. IMO, having a slow connection like NFS is not a common use case.
2. None of the issues reported earlier were reproducible, and the comment from Mac, who filed 112951, confirms that.
3. There is a workaround: do not use NFS if that gets you into trouble.
It's P1 for me and no workaround works: I have no access to local disks on my working machine. I only have access to my home directory, which is mounted from a different FS.
Samaresh, I think resolving this bug is very important to universities that use SunRay, where home directories are mounted via NFS. Please consider that university endorsement is our goal for this fiscal year, hence the issue should be tracked as P1 in this case. Did you try to reproduce the issue on Sunray? Also, Marek provided a great analysis of the problem. I think this issue is a candidate for a patch.
The deadlock (the P1 part of this) is completely clear from the thread dump: a lock ordering problem between a document lock and (a whole bunch of) xdm/xam/xml locks. The deadlock is caused by 0x80b401c0 this time, but I'd be very careful about where to actually add the wrapping document read lock in the stack of the thread (nid=0x50a).
I would be even more careful making such a patch ... if you look at the thread dump from issue #112951 you'll see that the triggered xdm code may also write to the document, so it asks for a writeLock(). AFAIK a document writeLock() after a document readLock() causes the thread to lock itself, even if both locks are held by the same thread.
btw, it would be helpful to know *for sure* why CES.checkReload() is invoked. It seems that the problem on the remote filesystem causes this, but I didn't debug it deeply; it was just an assumption. Moreover, the problem on the local disk doesn't fit the theory, so maybe it is another bug?!? I am sorry for just jabbing, but it is not my area. Anyway, this doesn't change the fact that the real problem is the xdm lock ordering. btw2, I also think the problem is important, but couldn't the workaround be to slightly shift the local Sunray time forward??? I am curious whether it helps. Priserko, can you try it just for fun? ;-)
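The read-then-write concern above can be illustrated with a small Java sketch. This is an analogy under stated assumptions, not the NetBeans document lock: it uses java.util.concurrent's ReentrantReadWriteLock, which likewise does not allow upgrading a held read lock to a write lock. LockUpgradeSketch and its method names are hypothetical. tryLock() is used so the example returns instead of hanging.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical demo class; ReentrantReadWriteLock stands in for the document lock. */
final class LockUpgradeSketch {

    /** Can a thread that already holds the read lock also get the write lock? */
    static boolean canUpgradeReadToWrite() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            // tryLock() rather than lock(): a plain lock() would block forever here,
            // because the write lock waits for all readers, including this thread.
            boolean acquired = lock.writeLock().tryLock();
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** The opposite direction: taking the read lock while holding the write lock. */
    static boolean canDowngradeWriteToRead() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();
        try {
            boolean acquired = lock.readLock().tryLock();
            if (acquired) {
                lock.readLock().unlock();
            }
            return acquired;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("read -> write: " + canUpgradeReadToWrite());
        System.out.println("write -> read: " + canDowngradeWriteToRead());
    }
}
```

With this stand-in lock, code that may need to write has to take the write lock from the start, which is why simply wrapping existing code in a read lock can introduce a new self-deadlock.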
I tried to create a "slow" environment: I connected my x86 machine to SWAN via VPN and mounted my home dir to simulate the issue. The slowness was evident from the fact that each project creation (and everything else) was taking at least 10 mins. However, I'm yet to see the problem. Each time, the WS and client were created w/o any issues. Will resume tomorrow. In the meanwhile, if you guys can think of some other tricks to get to the problem, please let me know.
I also tried 6 different combinations once again (WS on glassfish and Tomcat) and creating Client(selection of project, wsdl URL and file) in LOCAL mode (everything on my local disk) and everything was Ok.
I was unable to reproduce on a Sunray nor was I able to reproduce by mounting my sunray home directory on a remote machine.
One more hint for reproducing it. On the machine where I can reproduce it, the NFS server time and the local time are not synchronized:

    -bash-3.00$ touch pokus; ls --full-time pokus; date
    -rw-r--r-- 1 js201828 wheel 0 2007-12-04 10:00:54.780153000 +0100 pokus
    Tue Dec 4 09:53:13 CET 2007

You can see that the local computer's clock is about seven minutes behind the FS server's. This may be one more hint on how to reproduce the problem, because it could cause the call to CloneableEditorSupport.checkReload() described by Marek. But the root of the problem is probably hidden somewhere else, because three more issues are mentioned in this report and none of them said anything about NFS or another remote FS.
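As a rough illustration of how such a clock skew could make a freshly saved file look externally modified, here is a minimal Java sketch. The comparison in looksExternallyModified() is an assumed simplification, not the actual CloneableEditorSupport logic, and ClockSkewSketch and its members are hypothetical names. The two timestamps are taken from the shell output above, with the fractional seconds dropped.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.OffsetDateTime;

/** Hypothetical sketch; the check below is an assumption, not the real CES code. */
final class ClockSkewSketch {
    // mtime stamped by the file server, from the `ls --full-time` output.
    static final Instant FILE_MTIME =
            OffsetDateTime.parse("2007-12-04T10:00:54+01:00").toInstant();
    // Local clock at roughly the same moment, from the `date` output (CET is +01:00).
    static final Instant LOCAL_NOW =
            OffsetDateTime.parse("2007-12-04T09:53:13+01:00").toInstant();

    /** How far the server clock is ahead of the local clock. */
    static Duration skew() {
        return Duration.between(LOCAL_NOW, FILE_MTIME);
    }

    /**
     * Assumed check: the file counts as externally modified when its mtime is
     * later than the time the editor recorded for its own last save.
     */
    static boolean looksExternallyModified(Instant fileMtime, Instant lastSaveLocal) {
        return fileMtime.isAfter(lastSaveLocal);
    }

    public static void main(String[] args) {
        System.out.println("skew in seconds: " + skew().getSeconds());
        // The editor saved "now" by the local clock, but the server stamped a later time.
        System.out.println("false positive: "
                + looksExternallyModified(FILE_MTIME, LOCAL_NOW));
    }
}
```

Under this assumption, any save recorded with the local clock would compare as older than the server-stamped mtime for as long as the server clock runs ahead.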
As per Petr Nejedly's suggestion, I'm attaching a patch. This patch seemed to have worked for Sedek. Do note that, this is a temporary solution and a real fix requires a thorough look at the threading model in xam/retriever. xam/retriever are core to a lot of domain models in xml, hence it must be done with caution.
Created attachment 53970 [details] xam diff
Created attachment 53971 [details] xam patch
Please note that the fix doesn't fix the lock from the thread dump attached to related issue #112951: http://www.netbeans.org/nonav/issues/showattachment.cgi/47255/dump1.txt The fix seems to fix just a subset of the potentially deadlock-prone situations. There are more synchronized methods on the AbstractModel object, like start/endTransactions, which may be called without the read lock. Another problem is the case when the catalog model hasn't been created yet and so the document content is being replaced by the model data. In such a case a writeLock is required. This may possibly cause a deadlock if called from within a readLock. See the thread dump above. I am not saying that Petr's fix is incorrect, and I do not have any better solution; I just want to warn you that there might be more P1s not explored yet...
Created attachment 54102 [details] second patch made by Petr Nejedly
I've used today's build with the second patch and it works well. Thanks for fixing it.
samaresh, could you please merge the fix into trunk?
Sam: I don't see any potential usage issues with the patch, so please go ahead with integration. Peter: Regarding the other deadlock from http://www.netbeans.org/nonav/issues/showattachment.cgi/47255/dump1.txt, we could use the same strategy to obtain a document write lock before calling super.endTransaction(). Could you help provide a patch for it? Thanks.
Fix integrated to trunk: /cvs/xml/xam/src/org/netbeans/modules/xml/xam/dom/AbstractDocumentModel.java,v <-- AbstractDocumentModel.java new revision: 1.12; previous revision: 1.11
see 125528 for regression caused by this.
Hi priserka, Since Sam and Tony couldn't reproduce it before the fix, they can't verify the fix. I notice you verified it with a patch. Would you please help to verify the fix again with the latest NB 6.1 Milestone1 build: http://bits.netbeans.org/netbeans/6.1/m1/ ? Thanks in advance, Hong
And whoever creates the patch, please do look at issue 125528 for additional integration.
Verified through developer conversations: snippet: "The change I committed for 122528, take care of deadlocks from both issues. It includes the part that reverts the patch for 122943 (it was this patch that cause the deadlock in issue 122528). On top of that I remove over-synchronization to make sure we don't hit the deadlock in issue 122943."
See comments from samaresh on Fri Jan 25 for commit log.
Fixing the bug in Patch1 is risky.