This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 36855 - Deadlock during restart of IDE
Status: VERIFIED FIXED
Alias: None
Product: javaee
Classification: Unclassified
Component: Code
Version: 3.x
Hardware: PC Windows ME/2000
Importance: P2 blocker
Assignee: Ana.von Klopp
Keywords: RANDOM, THREAD
Duplicates: 37282
Depends on: 157872
Reported: 2003-10-27 14:33 UTC by Marian Mirilovic
Modified: 2009-02-04 00:46 UTC

Issue Type: DEFECT

Attachments
ide.log file with full thread-dump (46.55 KB, text/plain)
2003-10-27 14:33 UTC, Marian Mirilovic
patched JavaNode$JavaSourceChildren (3.23 KB, patch)
2003-12-01 16:34 UTC, Jan Pokorsky

Description Marian Mirilovic 2003-10-27 14:33:14 UTC
[nb winsys](031024), [jdk1.4.2_02]

Steps to reproduce:
- run IDE
- open some java file
- maximize Documents Window
- restart IDE
-> IDE hangs (see attached ide.log file with full
thread-dump)

I cannot reproduce this deadlock on Solaris, but
on Win2K it is 100% reproducible.
Comment 1 Marian Mirilovic 2003-10-27 14:33:58 UTC
Created attachment 11986
ide.log file with full thread-dump
Comment 2 Peter Zavadsky 2003-10-27 15:12:58 UTC
[Note: this was found in new winsys build]

I think this is not a problem of winsys itself.

I can't judge from the dump who is making the mistake; it all seems
weird to me. The stack traces deal with nodes and folder instances
from various threads (there doesn't seem to be a clean separation).


Passing to nodes first.
Comment 3 _ ttran 2003-11-03 08:50:22 UTC
pnejedly: please take care of this issue.  Thanks
Comment 4 Marian Mirilovic 2003-11-13 17:30:10 UTC
It works if you remove monitor.jar from modules directory :)
Comment 5 Petr Nejedly 2003-11-18 13:54:38 UTC
Not a nodes problem.

First of all, I'm not sure why the DataObject performs node creation
from inside a Children.MUTEX.readAccess, but this piece of code has
quite interesting history (DCL, writeAccess, more locks, readAccess
with more locks).

Then a JavaNode waits for FolderInstance to finish from inside node
creation. Not nice, especially while holding a system-wide lock
(Children.MUTEX).

And finally MonitorAction constructor, called from a FolderInstance
processor thread, creates a lot of interesting stuff, but it also
creates some Nodes and sets them up, so they need to acquire writeAccess.
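The three observations above can be sketched in plain Java. This is only an illustration under stated assumptions: a ReentrantReadWriteLock stands in for Children.MUTEX, a single-thread executor stands in for the FolderInstance processor, and the class and method names are hypothetical, not NetBeans code. A timeout replaces the real unbounded wait so the sketch terminates instead of hanging.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadLockWaitSketch {
    // Hypothetical stand-in for the system-wide Children.MUTEX.
    static final ReentrantReadWriteLock MUTEX = new ReentrantReadWriteLock();

    /** Returns true if the waiting thread timed out, i.e. an unbounded wait would hang. */
    public static boolean wouldHang() {
        // Hypothetical stand-in for the FolderInstance processor thread.
        ExecutorService processor = Executors.newSingleThreadExecutor();
        MUTEX.readLock().lock();                  // "node creation" runs under read access
        try {
            Future<?> task = processor.submit(() -> {
                MUTEX.writeLock().lock();         // "MonitorAction" sets up nodes: write access
                MUTEX.writeLock().unlock();
            });
            task.get(500, TimeUnit.MILLISECONDS); // "JavaNode" waits for the task to finish
            return false;
        } catch (TimeoutException e) {
            return true;                          // the writer is blocked by our own read lock
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            MUTEX.readLock().unlock();            // releasing the read lock lets the writer finish
            processor.shutdown();
        }
    }
}
```

Calling ReadLockWaitSketch.wouldHang() times out because the thread that holds read access is waiting for a task that cannot proceed until that read access is released.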

I'd blame the JavaNode, but it is hard to decide between
it and the MonitorAction. In an ideal world, both should take corrective steps.

Note: The problem probably wouldn't arise under the new Nodes threading,
as the MonitorAction would initialize all its Nodes lock-free and add
them to the UI (if really needed) from invokeLater.
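A minimal sketch of that idea, assuming a simplified model: the structure is built privately with no shared lock held, and only the finished result is handed to the UI thread. The class, the field, and the node names are hypothetical; the UI thread is passed in as an Executor so that in a Swing application one could pass SwingUtilities::invokeLater.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;

public class DeferredPublishSketch {
    // Hypothetical stand-in for the node structure the UI displays.
    static final List<String> visibleNodes = new CopyOnWriteArrayList<>();

    public static void buildThenPublish(Executor uiThread) {
        // Step 1: build the structure privately; no shared lock is held here.
        List<String> built = new ArrayList<>();
        built.add("root");    // illustrative node names only
        built.add("child");
        // Step 2: publish the finished structure on the UI thread.
        uiThread.execute(() -> visibleNodes.addAll(built));
    }
}
```

Because the constructor-like work in step 1 never touches a shared lock, it cannot take part in a lock cycle; only the short publish step interacts with UI state.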

Note2: The problem may be provoked by the WS change because of some
startup optimizations (Monitor's node structure is rooted in the Runtime
tab, right?)
Comment 6 Petr Nejedly 2003-11-18 14:01:37 UTC
Oops, UI is really not the best subcomponent...
Comment 7 Marian Mirilovic 2003-11-24 17:12:59 UTC
*** Issue 37282 has been marked as a duplicate of this issue. ***
Comment 8 Jan Pokorsky 2003-12-01 16:30:16 UTC
I am not able to reproduce it with the same configuration, so I
changed the priority to P2.

Even though I am not convinced the java module is the culprit, I have
prepared a patch postponing the FolderInstance task to
JavaNode$JavaSourceChildren.addNotify in order to prevent the
starvation. Since it is more of a hack than a solution, I have just attached
it here.

The right solution seems to me to not perform a node creation
inside a Children.MUTEX.readAccess in DataObject as Petr N. mentioned
above. At least I do not see any reason for the mutex there.

Reassigned back to openide for further investigation.
Comment 9 Jan Pokorsky 2003-12-01 16:34:44 UTC
Created attachment 12374
patched JavaNode$JavaSourceChildren
Comment 10 Petr Nejedly 2004-01-28 14:59:00 UTC
OK, it seems I can legally call node construction without the read
lock (with proper locking only). It would solve *this particular*
deadlock (and maybe some others), but your code may (and frequently
will) still get called under the Children.MUTEX.readLock,
because that way it is usually created for all lazy Children (e.g.
Children.Keys -> FolderChildren).
Comment 11 Petr Nejedly 2004-01-29 11:54:44 UTC
No readlock for node creation anymore in
openide/loaders/src/org/openide/loaders/DataObject.java, v1.13

Fixes this particular deadlock, but there are still potential
deadlocks between JavaNode and web module.
Comment 12 Jesse Glick 2004-01-29 14:18:43 UTC
Looks like a bug in MonitorAction to me, though I don't see anything
bad about your patch either.
Comment 13 Petr Nejedly 2004-01-30 17:38:44 UTC
OK, Yarda has finally spoken and explained the presence of the read lock
to me:
Usually, when your node is about to be displayed in explorer,
FolderChildren (as any other Children.Keys) calls the node creation
under the read lock:
readLock->getNodeDelegate()->priv.lock->createNodeDelegate()

But the node may be asked by direct query:
getNodeDelegate()->priv.lock->createNodeDelegate()

This means that now (after my patch), the node creation code must not
try to acquire Children.MUTEX.
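The conflict between the two call paths can be sketched as a lock-order inversion. This is an illustration under stated assumptions, not the DataObject code: a ReentrantReadWriteLock stands in for Children.MUTEX, a ReentrantLock stands in for priv.lock, and the direct-query path is assumed to request write access during node creation. Timed tryLock calls replace the real blocking acquisitions so the sketch terminates.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderSketch {
    // Hypothetical stand-ins for Children.MUTEX and priv.lock.
    static final ReentrantReadWriteLock MUTEX = new ReentrantReadWriteLock();
    static final ReentrantLock PRIV_LOCK = new ReentrantLock();

    /** Holds 'first', then tries 'second' with a timeout and records the outcome. */
    static Thread path(Lock first, Lock second, CyclicBarrier barrier, AtomicBoolean gotSecond) {
        Thread t = new Thread(() -> {
            first.lock();
            try {
                barrier.await();                  // both threads now hold their first lock
                boolean ok = second.tryLock(300, TimeUnit.MILLISECONDS);
                gotSecond.set(ok);
                if (ok) second.unlock();
                barrier.await();                  // keep 'first' held until both attempts end
            } catch (Exception e) {
                throw new IllegalStateException(e);
            } finally {
                first.unlock();
            }
        });
        t.start();
        return t;
    }

    /** Returns true if neither path could obtain its second lock. */
    public static boolean ordersConflict() {
        CyclicBarrier barrier = new CyclicBarrier(2);
        AtomicBoolean explorerGotPriv = new AtomicBoolean(true);
        AtomicBoolean directGotMutex = new AtomicBoolean(true);
        // Explorer path: readLock -> priv.lock
        Thread explorer = path(MUTEX.readLock(), PRIV_LOCK, barrier, explorerGotPriv);
        // Direct query whose node creation asks for the mutex: priv.lock -> MUTEX
        Thread direct = path(PRIV_LOCK, MUTEX.writeLock(), barrier, directGotMutex);
        try {
            explorer.join();
            direct.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return !explorerGotPriv.get() && !directGotMutex.get();
    }
}
```

With untimed acquisitions the two threads would wait on each other forever, which is why node creation reached through the direct path has to stay away from the mutex.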

In the light of this, I'm considering rolling my change back.
Comment 14 Jaroslav Tulach 2004-01-30 20:20:27 UTC
Sorry for not speaking up sooner; I first had to understand the whole story.
Now I do, and I support the rollback.

I think that there is little value in solving deadlocks just by
changing a random piece of code to lock in a different order or delay some
actions. As this example shows, once upon a time I decided to solve the
deadlock in issue 11132 by changing the order of locks in
getNodeDelegate. That was fine for release 3.2 and the problem was fixed, but now
Petr decided to revert the order again and we can reopen the issue for
3.6.

Deadlocks are so easy: After a while everyone learns how to read
thread dumps and change the lock order by modifying a few lines of code, but in
spite of how tempting this solution is and how it immediately helps,
in the long run it is completely useless. The only
valuable solution is a JUnit test that reproduces the
deadlock and warns anyone who mangles the few necessary lines
of code that fixed it. Fighting deadlocks is so hard.

I support the rollback and I'd like to ask for a JUnit test next
time. And I admit I did not write one for issue 11132 (but we were all
20000 issues younger). If you want, reopen that issue and assign it to me, and I
can fix my mistake. Better late than never.
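As a rough illustration of what such a regression test could rest on, here is a minimal sketch of a helper that runs a scenario on a separate thread and reports whether it finished in time. The class and method names are hypothetical, and the scenario that reproduces this particular deadlock is not shown; the helper uses only the standard library so it stays independent of any JUnit version.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class HangRegressionSketch {
    /** Runs the scenario on a daemon thread; returns false if it is still running at the timeout. */
    public static boolean completesWithin(Runnable scenario, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                scenario.run();
            } finally {
                done.countDown();
            }
        }, "scenario");
        worker.setDaemon(true);       // a stuck scenario must not keep the JVM alive
        worker.start();
        try {
            boolean finished = done.await(timeoutMillis, TimeUnit.MILLISECONDS);
            if (!finished) {
                worker.interrupt();   // best-effort cleanup of the stuck thread
            }
            return finished;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In a JUnit test the return value would feed an assertion such as assertTrue, so that anyone who changes the lock order back gets a failing test instead of a hung IDE.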
Comment 15 Petr Nejedly 2004-02-02 10:21:38 UTC
I've reverted the change.
Now it's up to the web folks to fix the monitor action.
Comment 16 Ana.von Klopp 2004-02-03 00:47:54 UTC
This bug was filed on October 27, and the code that supposedly causes the
problem ("createNodeStructure") is no longer invoked from the
MonitorAction at startup.

The monitor caused deadlocks (see issue 36749, which appears to be a
duplicate) after Jesse Glick modified the use of Nodes (to preempt
deadlocks) on September 6. This was rolled back on November 14, after
which those deadlocks disappeared (this issue was filed between those
dates). In the process I also ensured that the monitor will not create
any components until the user starts the UI.

The startup issue is definitely gone as a result of this, and since we 
have not had any reports of deadlocks during running of the monitor it 
is safer not to attempt to modify that code for now. It's my intention 
to switch to Looks as soon as it becomes available. 
Comment 17 Marian Mirilovic 2004-02-19 09:03:05 UTC
verified in [nb_dev](200402181900)