122943 – deadlock while creating webservice client - modified catalog.xml


Bug 122943 - deadlock while creating webservice client - modified catalog.xml

Summary: deadlock while creating webservice client - modified catalog.xml

Status:	VERIFIED FIXED

Alias:	None

Product:	xml
Classification:	Unclassified
Component:	XAM
Version:	6.x
Hardware:	Sun All

Importance:	P1 blocker
Assignee:	Samaresh Panda

URL:
Keywords:

Depends on:
Blocks:	125528

Reported:	2007-11-28 10:15 UTC by priserka
Modified:	2008-05-29 14:27 UTC
CC List:	9 users

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
message snapshot (5.42 KB, image/png) 2007-11-28 10:21 UTC, priserka	Details
thread dump (24.87 KB, text/plain) 2007-11-28 10:22 UTC, priserka	Details
xam diff (1.39 KB, text/plain) 2007-12-07 01:40 UTC, Samaresh Panda	Details
xam patch (219.82 KB, application/octet-stream) 2007-12-07 01:41 UTC, Samaresh Panda	Details
second patch made by Petr Nejedly (1.43 KB, text/plain) 2007-12-10 17:14 UTC, Jindrich Sedek	Details

Description priserka 2007-11-28 10:15:30 UTC

steps to reproduce:

Create new web project on glassfish
Create new web service 
deploy project
Create another web project
use wizard to create new web service client -> select webservice from first project, type some package, press Finish
message: "File .../catalog.xml was modified externally. Do you want to reload it?" is shown.
Press true -> deadlock happens



similar problems were reported in:
http://www.netbeans.org/issues/show_bug.cgi?id=74198
http://www.netbeans.org/issues/show_bug.cgi?id=112951
http://www.netbeans.org/issues/show_bug.cgi?id=109557


The key to reproducing this is a "slow" connection to the filesystem that holds the projects. With NFS it reproduces 100% of the time, very easily.

Comment 1 priserka 2007-11-28 10:21:56 UTC

Created attachment 53585 [details]
message snapshot

Comment 2 priserka 2007-11-28 10:22:35 UTC

Created attachment 53586 [details]
thread dump

Comment 3 Marek Fukala 2007-11-28 11:03:51 UTC

There are two problems in fact:

The CloneableEditorSupport.checkReload() is triggered by the multiple XDM model modifications. It is called from
CES$Listener.propertyChange() after a check of the file modification timestamp. However, it looks like the timestamp is
the modification time on the remote server, which is in the future compared to the local time. Because of this the CES
"thinks" the file was modified externally and invokes the checkReload() action, which really refreshes the document
content. This document change triggers an org.netbeans.modules.xml.xam.dom.AbstractDocumentModel$WeakDocumentListener.
Then the code attempts to acquire a monitor which is already held by the DefaultRequestProcessor thread.

The second problem is IMHO the fact that the XDM model is accessed without the document readlock. Inside the
synchronized XDM code the readlock is then requested, which leads to the deadlock.
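
The lock-ordering idea in this comment can be illustrated with a minimal sketch. It is not the NetBeans code: Model is a hypothetical stand-in for a document-backed model, and the sketch assumes a plain Swing PlainDocument. The only point it shows is that the document read lock is taken first (through Document.render) and the model monitor second, so the two locks are always acquired in the same order.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.PlainDocument;

public class LockOrderSketch {

    /** Hypothetical stand-in for a document-backed model; not the NetBeans class. */
    static final class Model {
        private final Document doc;
        private int syncCount; // guarded by the model monitor

        Model(Document doc) {
            this.doc = doc;
        }

        /** Takes the document read lock first, then the model monitor. */
        int sync() {
            int[] length = new int[1];
            // Document.render runs the Runnable while the document is read-locked
            // (this holds for AbstractDocument subclasses such as PlainDocument).
            doc.render(() -> {
                synchronized (this) { // model monitor is acquired second
                    syncCount++;
                    length[0] = doc.getLength();
                }
            });
            return length[0];
        }
    }

    /** Builds a small document and syncs the model once; returns the length seen. */
    static int demo() {
        try {
            Document doc = new PlainDocument();
            doc.insertString(0, "<catalog/>", null);
            return new Model(doc).sync();
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("length seen under both locks: " + demo());
    }
}
```

The sketch covers only the read path. Code that may also write to the document from inside the model needs a different arrangement, since a read lock cannot simply be wrapped around it.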

Comment 4 wmac 2007-11-28 11:47:49 UTC

I had also reported this same bug (including screenshot and dump) 

http://www.netbeans.org/issues/show_bug.cgi?id=112951

It was closed at that time because the person was not able to reproduce it.

By the way, after I removed all the remains of NetBeans (.nbi, .netbeans, .netbeans-derby) and installed the new
version, I did not encounter the problem (it might be for another reason, though).

Mac

Comment 5 Marek Fukala 2007-11-28 12:06:27 UTC

wmac, did you also use a remote server, or was the web project on your local disk? It is important for us to know this.

Comment 6 wmac 2007-11-28 15:31:07 UTC

No, it was local only. I did not use a remote server.

Comment 7 Samaresh Panda 2007-11-28 16:53:09 UTC

Not reproducible for me locally.

I have looked at all the similar issues mentioned here; however, none of us (engg. and QE) have been able to reproduce it
in the past. OTOH, I'm not ruling this out as a bug, but I would not call it a P1 for the following reasons:
1. IMO, having a slow connection like NFS is not a common use case.
2. None of the issues reported earlier were reproducible, and the comment from Mac, who filed 112951, confirms that.
3. There is a workaround: do not use NFS if it gets you into trouble.

Comment 8 priserka 2007-11-28 18:27:46 UTC

It's P1 for me and no workaround works. I have no access to local disks on my working machine; I only have access to
my home directory, which is mounted from a different FS.

Comment 9 Petr Blaha 2007-11-29 10:15:54 UTC

Samaresh, I think resolving this bug is very important to universities that use SunRay, where home directories are
mounted via NFS. Please consider that University endorsement is our goal for this fiscal year, hence the issue should be
tracked as P1 in this case. Did you try to reproduce the issue on Sunray? Also, Marek provided a great analysis of the
problem. I think this issue is a candidate for a patch.

Comment 10 Petr Nejedly 2007-11-29 11:29:49 UTC

The deadlock (the P1 part of this) is completely clear from the thread dump: a lock ordering problem between a document
lock and (a whole bunch of) xdm/xam/xml locks. The deadlock is caused by 0x80b401c0 this time, but I'd be very careful
about where to actually add the wrapping document read lock in the thread(nid=0x50a)'s stack.

Comment 11 Marek Fukala 2007-11-29 12:04:42 UTC

I would be even more careful about making such a patch ... if you look at the thread dump from issue #112951 you'll see
that the triggered xdm code may also write to the document, so it asks for a writeLock(). AFAIK a document writeLock()
after a document readLock() causes the thread to lock itself up, even though both locks are requested by the same thread.
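
The read-then-write hazard described here can be shown by analogy. The sketch below uses java.util.concurrent.locks.ReentrantReadWriteLock, not the Swing or NetBeans document lock, so it is only an assumption that the two behave alike in this respect; the class and method names are made up for illustration. It uses tryLock() so that the failed upgrade is reported instead of blocking forever.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class UpgradeSketch {

    /** Returns true if a thread holding the read lock can also take the write lock. */
    static boolean canUpgrade() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            // tryLock() instead of lock(): a blocking lock() would never return here.
            boolean acquired = lock.writeLock().tryLock();
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Returns true if a thread holding the write lock can also take the read lock. */
    static boolean canDowngrade() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();
        try {
            boolean acquired = lock.readLock().tryLock();
            if (acquired) {
                lock.readLock().unlock();
            }
            return acquired;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("read -> write (upgrade) possible:   " + canUpgrade());
        System.out.println("write -> read (downgrade) possible: " + canDowngrade());
    }
}
```

With this lock class the upgrade is refused and the downgrade is allowed, which is why wrapping code that may later need a write lock inside a read lock is risky.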

Comment 12 Marek Fukala 2007-11-29 12:14:24 UTC

btw, it would be helpful to know *for sure* why CES.checkReload() is invoked. It seems that the remote
filesystem causes this, but I didn't debug it deeply; it was just an assumption. Moreover, the problem on the local disk
doesn't fit the theory, maybe another bug?!? I am sorry for just jabbing, but it is not my area. Anyway, this doesn't
change the fact that the real problem is the xdm lock ordering.

btw2, I also think the problem is important, but couldn't the workaround be to slightly shift the local Sunray time
forward??? I am curious if it helps. Priserko, can you try it just for fun? ;-)

Comment 13 Samaresh Panda 2007-11-30 06:57:54 UTC

I tried to create a "slow" environment. I had my x86 connected to SWAN via VPN and mounted my home dir to simulate the
issue. The slowness was evident from the fact that each project creation (and everything else) was taking at least 10 mins.
However, I have yet to see the problem. Each time, the WS and client were created w/o any issues. Will resume tomorrow.

In the meantime, if you guys can think of some other tricks to get to the problem, please let me know.

Comment 14 wmac 2007-11-30 11:55:18 UTC

I also tried 6 different combinations once again (WS on GlassFish and Tomcat; client created by selecting a project,
a WSDL URL, and a file) in LOCAL mode (everything on my local disk) and everything was OK.

Comment 15 tonybeckham 2007-11-30 21:19:59 UTC

I was unable to reproduce on a Sunray nor was I able to reproduce by mounting my sunray home directory on a remote machine.

Comment 16 Jindrich Sedek 2007-12-04 09:11:42 UTC

One more hint for reproducing: on the machine where I can reproduce it, the NFS time and the local time are not synchronized:

-bash-3.00$ touch pokus; ls --full-time pokus; date
-rw-r--r--  1 js201828 wheel 0 2007-12-04 10:00:54.780153000 +0100 pokus
Tue Dec  4 09:53:13 CET 2007


You can see that the local computer's clock is about seven minutes behind the FS server's clock. This may be one
more hint for reproducing the problem, because it could cause the call to CloneableEditorSupport.checkReload()
described by Marek. But the root of the problem is probably hidden somewhere else, because three more
issues are mentioned in this report and none of them said anything about NFS or any other remote FS.
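
To make the skew concrete, here is a small sketch that computes it from the two timestamps in the shell output above. The looksExternallyModified check is a hypothetical, naive comparison written only for illustration; it is an assumption about the kind of test involved, not the actual CloneableEditorSupport logic.

```java
import java.time.Duration;
import java.time.OffsetDateTime;

public class ClockSkewSketch {

    /** How far the file's modification time lies ahead of the local clock. */
    static Duration skew(OffsetDateTime fileMtime, OffsetDateTime localNow) {
        return Duration.between(localNow, fileMtime);
    }

    /**
     * Hypothetical naive check: any modification time later than the last save,
     * as recorded by the local clock, counts as an external change.
     */
    static boolean looksExternallyModified(OffsetDateTime fileMtime, OffsetDateTime lastLocalSave) {
        return fileMtime.isAfter(lastLocalSave);
    }

    /** Skew in whole seconds for the timestamps quoted in the comment above. */
    static long demoSkewSeconds() {
        // Values copied from the `ls --full-time` and `date` output (CET is UTC+01:00).
        OffsetDateTime mtime = OffsetDateTime.parse("2007-12-04T10:00:54.780153+01:00");
        OffsetDateTime local = OffsetDateTime.parse("2007-12-04T09:53:13+01:00");
        return skew(mtime, local).getSeconds();
    }

    public static void main(String[] args) {
        OffsetDateTime mtime = OffsetDateTime.parse("2007-12-04T10:00:54.780153+01:00");
        OffsetDateTime local = OffsetDateTime.parse("2007-12-04T09:53:13+01:00");
        System.out.println("skew in seconds: " + demoSkewSeconds());
        System.out.println("flagged as external change: " + looksExternallyModified(mtime, local));
    }
}
```

Under such a check, a file the IDE has just written on the server would appear newer than the IDE's own save time, which fits the "modified externally" prompt in the reproduction steps.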

Comment 17 Samaresh Panda 2007-12-07 01:38:53 UTC

As per Petr Nejedly's suggestion, I'm attaching a patch. This patch seems to have worked for Sedek. Do note that this
is a temporary solution and a real fix requires a thorough look at the threading model in xam/retriever. xam/retriever
are core to a lot of domain models in xml, hence it must be done with caution.

Comment 18 Samaresh Panda 2007-12-07 01:40:02 UTC

Created attachment 53970 [details]
xam diff

Comment 19 Samaresh Panda 2007-12-07 01:41:02 UTC

Created attachment 53971 [details]
xam patch

Comment 20 Marek Fukala 2007-12-07 11:08:24 UTC

Please note that the fix doesn't fix the lock from threaddump attached to related issue #112951:

http://www.netbeans.org/nonav/issues/showattachment.cgi/47255/dump1.txt

The fix seems to cover just a subset of the potentially deadlock-prone situations. There are more synchronized
AbstractModel methods, like start/endTransaction, which may be called without the readlock.

Another problem is the case when the catalog model hasn't been created yet, so the document content is being replaced by
the model data. In such a case a writeLock is required. This may possibly cause a deadlock if called from a readLock. See
the thread dump above.

I am not saying that Petr's fix is incorrect, and I do not have any better solution; I just want to warn you that there
might be more P1s not explored yet...

Comment 21 Jindrich Sedek 2007-12-10 17:14:49 UTC

Created attachment 54102 [details]
second patch made by Petr Nejedly

Comment 22 priserka 2007-12-11 17:20:04 UTC

I've used today's build with the second patch and it works well. Thanks for fixing it.

Comment 23 Jindrich Sedek 2008-01-02 12:37:57 UTC

samaresh, could you please merge the fix into trunk?

Comment 24 Nam Nguyen 2008-01-10 08:39:27 UTC

Sam: 
I don't see any potential usage issues with the patch; please go ahead with the integration.

Peter:
Regarding the other deadlock from http://www.netbeans.org/nonav/issues/showattachment.cgi/47255/dump1.txt, we could use
the same strategy to obtain a document write lock before calling super.endTransaction().  Could you help provide a patch
for it?  Thanks.

Comment 25 Samaresh Panda 2008-01-10 16:58:25 UTC

Fix integrated to trunk:
/cvs/xml/xam/src/org/netbeans/modules/xml/xam/dom/AbstractDocumentModel.java,v  <--  AbstractDocumentModel.java
new revision: 1.12; previous revision: 1.11

Comment 26 Shivanand Kini 2008-01-18 20:23:13 UTC

see 125528 for regression caused by this.

Comment 27 _ hong_lin 2008-01-25 19:34:58 UTC

Hi priserka, 

Since Sam and Tony couldn't reproduce it before the fix, they can't verify the fix. I notice you verified it with a
patch. Would you please help to verify the fix again with the latest NB 6.1 Milestone1 build:
http://bits.netbeans.org/netbeans/6.1/m1/ ? 

Thanks in advance,
Hong

Comment 28 Samaresh Panda 2008-01-25 20:43:08 UTC

And whoever creates the patch, please do look at issue 125528 for additional integration.

Comment 29 tonybeckham 2008-01-29 19:17:01 UTC

Verified through developer conversations:

snippet:
"The change I committed for 125528 takes care of deadlocks from both issues.  It includes the part that reverts the
patch for 122943 (it was this patch that caused the deadlock in issue 125528).  On top of that I removed
over-synchronization to make sure we don't hit the deadlock in issue 122943."

Comment 30 tonybeckham 2008-01-29 19:20:02 UTC

See comments from samaresh on Fri Jan 25 for commit log.

Comment 31 Petr Blaha 2008-02-01 12:40:48 UTC

Fixing the bug in Patch1 is risky.