This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 36582 - deadlock when customizing Sources node
Summary: deadlock when customizing Sources node
Status: VERIFIED WONTFIX
Alias: None
Product: platform
Classification: Unclassified
Component: -- Other --
Version: 3.x
Hardware: Sun SunOS
Importance: P3 blocker
Assignee: Petr Nejedly
URL:
Keywords: THREAD
Depends on: 35833
Blocks:
Reported: 2003-10-14 16:00 UTC by Marian Petras
Modified: 2008-12-22 20:53 UTC
CC List: 5 users

See Also:
Issue Type: DEFECT
Exception Reporter:


Attachments
full thread dump (23.98 KB, text/plain)
2003-10-14 16:14 UTC, Marian Petras
Additional test in MutexTest (1.95 KB, patch)
2003-10-15 13:56 UTC, Petr Hrebejk

Description Marian Petras 2003-10-14 16:00:23 UTC
A deadlock occurred after I clicked the check box

   "Hide Sources That Are Not Under Organizers"

in the customizer for the Sources node.

[NB-dev, trunk, updated & built on 14 Oct 2003]
[JDK 1.4.2_01, Solaris 8, Sun Sparc]
Comment 1 Marian Petras 2003-10-14 16:14:30 UTC
Created attachment 11856
full thread dump
Comment 2 Jan Pokorsky 2003-10-14 16:25:28 UTC
This timing issue relates to openide/looks.
Comment 3 Marian Petras 2003-10-14 19:05:42 UTC
Correction:

In the description, I should have written

    [NB-dev, projects, ...]

instead of

    [NB-dev, trunk, ...]
Comment 4 Petr Hrebejk 2003-10-15 13:55:26 UTC
The problem is that a reader holds the registry's Mutex, and then a
writer and another reader are waiting. The readers are in synchronized
sections and will never finish. (This happens when the registry fires
events under readAccess.)
I will attach a diff for MutexTest which adds a test for this situation.
Probably either the registry will stop using the mutex or the mutex has
to be fixed.
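The reader/writer/reader situation described above can be reproduced outside NetBeans. A minimal sketch, assuming `java.util.concurrent.locks.ReentrantReadWriteLock` in fair mode as a stand-in for the NetBeans `Mutex` (they are different classes; fair mode is documented to make a new, non-reentrant reader block while a writer is waiting). `WriterPreferenceDemo` and its method are hypothetical names, and a timed `tryLock` replaces the blocking call so the program terminates instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical demo: a queued writer keeps a second reader out. */
final class WriterPreferenceDemo {

    /** Returns true if a new reader got read access while a writer was queued. */
    static boolean secondReaderAdmittedWhileWriterWaits() throws InterruptedException {
        // Fair mode: a new reader blocks if a writer is already waiting.
        ReentrantReadWriteLock mutex = new ReentrantReadWriteLock(true);

        mutex.readLock().lock();                        // first reader (like EQ) holds read access
        try {
            Thread writer = new Thread(() -> {          // writer (like RP) asks for write access
                mutex.writeLock().lock();
                mutex.writeLock().unlock();
            });
            writer.start();
            while (!mutex.hasQueuedThread(writer)) {
                Thread.sleep(1);                        // wait until the writer is really queued
            }

            boolean[] admitted = new boolean[1];
            Thread secondReader = new Thread(() -> {    // second reader (like ARQ)
                try {
                    admitted[0] = mutex.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                    if (admitted[0]) {
                        mutex.readLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            secondReader.start();
            secondReader.join();                        // join() makes admitted[0] visible here
            return admitted[0];
        } finally {
            mutex.readLock().unlock();                  // lets the queued writer proceed
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second reader admitted: " + secondReaderAdmittedWhileWriterWaits());
    }
}
```

`secondReaderAdmittedWhileWriterWaits()` is expected to return `false`. On its own that is only a delay; it becomes a deadlock when the first reader in turn waits for something the second reader holds.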
Comment 5 Petr Hrebejk 2003-10-15 13:56:11 UTC
Created attachment 11871
Additional test in MutexTest
Comment 6 David Konecny 2003-10-15 14:27:25 UTC
I prefer to fix the Mutex. Jesse, what do you think about this?

I did not understand it at all at first, but after an explanation from
pnejedly and rereading Hrebejk's comment once again, I got it, and it
really looks like a problem in Mutex itself.

The registry mutex is held in read access, as can be seen in the stack
of thread "AWT-EventQueue-0", yet "Active Reference Queue Daemon" is
waiting for read access, which it actually should get. But it will not
get it, because "OpenIDE-request-processor-0" has meanwhile asked for
write access.

The "Active Reference Queue Daemon" should be allowed to enter and
continue.
Comment 7 Jesse Glick 2003-10-15 16:03:33 UTC
There is nothing wrong with Mutex IMHO. It is permitted to have a
policy of refusing further readers while a writer is waiting; this is
a common strategy to avoid writer starvation. It may have contributed
to this deadlock but is not the root cause. The deadlock arises
because of a lock order conflict:

ARQ:
- locked SourcesLook$ChildLook@0xf08bc668
- waiting for read access to Registry's mutex
--> blocks on RP since that is a writer which is preferred

RP:
- waiting for write access to Registry's mutex
--> blocks on EQ since that is a reader

EQ:
- read access to Registry's mutex
- waiting to lock SourcesLook$ChildLook@0xf08bc668
--> blocks on ARQ which holds the monitor

The RP is not very important here because if either ARQ or EQ had
taken write access on Registry's mutex (quite plausible) then there
would still be a two-thread deadlock. Even if the Mutex behavior were
changed to prefer readers there could be a deadlock if one of ARQ or
EQ was requesting, or held, write access.
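The two-thread variant described above is a plain lock-order inversion. A minimal sketch, assuming a `ReentrantLock` stands in for the `SourcesLook$ChildLook` monitor (so that a timed `tryLock` can be used) and a `ReentrantReadWriteLock` for the Registry's mutex; the thread roles mirror the dump, but `LockOrderDemo` and its method are hypothetical. Each thread keeps its first lock until both have given up on the second, so the result does not depend on timing.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical demo of the ARQ/EQ lock-order conflict, with timeouts. */
final class LockOrderDemo {

    /** Returns true if neither thread could get its second lock (a would-be deadlock). */
    static boolean bothThreadsStuck() throws InterruptedException {
        ReentrantLock lookMonitor = new ReentrantLock();              // stand-in for the Look monitor
        ReentrantReadWriteLock registryMutex = new ReentrantReadWriteLock(true);
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        CountDownLatch doneTrying = new CountDownLatch(2);
        boolean[] gotSecond = new boolean[2];

        Thread arq = new Thread(() -> {
            lookMonitor.lock();                                       // ARQ: Look monitor first...
            try {
                bothHoldFirstLock.countDown();
                bothHoldFirstLock.await();
                // ...then read access to the registry (EQ holds it for writing).
                gotSecond[0] = registryMutex.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                if (gotSecond[0]) registryMutex.readLock().unlock();
                doneTrying.countDown();
                doneTrying.await();                                   // hold on until EQ has tried too
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lookMonitor.unlock();
            }
        }, "ARQ");

        Thread eq = new Thread(() -> {
            registryMutex.writeLock().lock();                         // EQ: registry (write) first...
            try {
                bothHoldFirstLock.countDown();
                bothHoldFirstLock.await();
                // ...then the Look monitor (ARQ holds it).
                gotSecond[1] = lookMonitor.tryLock(200, TimeUnit.MILLISECONDS);
                if (gotSecond[1]) lookMonitor.unlock();
                doneTrying.countDown();
                doneTrying.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                registryMutex.writeLock().unlock();
            }
        }, "EQ");

        arq.start();
        eq.start();
        arq.join();
        eq.join();
        return !gotSecond[0] && !gotSecond[1];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("both threads stuck: " + bothThreadsStuck());
    }
}
```

`bothThreadsStuck()` returns `true`. With blocking `lock()` calls instead of timed `tryLock`, the same program would hang forever whether the mutex prefers readers or writers, because EQ holds write access here.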

So who is at fault? Looks, IMHO, for being internally synchronized
while running external code (e.g. getLooksForKey in this case,
detachFrom likely in other cases). In fact in the branch for issue
#35833, I believe this deadlock could not occur, since everything
happening with the Look would run in EQ, which would acquire
Registry's mutex in read access; RP would either get there first or
later. One hot fix might be to merge a change I already made in that
branch which forces the ARQ ref cleanup for a Look into EQ - but only
effective if the other locker of the Look (here, very indirect, from a
Look event, from a Registry change event, from another Look, from a
JToggleButton) is also in EQ, so it should fix this particular
deadlock, but perhaps not others like it. A more complete fix would
also merge a change to make sure the Look's event firing was in EQ,
but that might screw up Look unit tests if more work is not done
(still in progress in the branch).
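The hot fix of forcing the ARQ cleanup into EQ is a form of thread confinement: if every piece of code that locks the Look runs on one thread, the Look's monitor can no longer take part in a cycle. A minimal sketch, assuming a single-thread `ExecutorService` as a stand-in for the AWT `EventQueue` (in real AWT code that role would be played by `EventQueue.invokeLater`); `ConfinedLookSketch`, `onReferenceCleared` and `onRegistryChange` are hypothetical names, not the code in the #35833 branch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: all work that locks the Look is confined to one "EQ" thread. */
final class ConfinedLookSketch {
    // Stand-in for the AWT EventQueue: one thread, tasks run in submission order.
    private final ExecutorService eventQueue =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "EQ-standin"));
    private final Object lookMonitor = new Object();   // like the SourcesLook$ChildLook monitor
    private final List<String> log = Collections.synchronizedList(new ArrayList<>());

    /** Called on the reference-queue daemon: post the cleanup instead of locking here. */
    Future<?> onReferenceCleared(String key) {
        return eventQueue.submit(() -> touchLook("cleanup " + key));
    }

    /** Called when a Registry change is refired as a Look event. */
    Future<?> onRegistryChange(String key) {
        return eventQueue.submit(() -> touchLook("refire " + key));
    }

    private void touchLook(String what) {
        synchronized (lookMonitor) {                    // only ever taken on EQ-standin
            log.add(what + " on " + Thread.currentThread().getName());
        }
    }

    static List<String> runDemo() throws Exception {
        ConfinedLookSketch look = new ConfinedLookSketch();
        Thread arq = new Thread(() -> look.onReferenceCleared("src"), "ARQ");
        arq.start();
        arq.join();                                     // the cleanup task is queued first
        look.onRegistryChange("src").get();             // FIFO: both tasks are done after this
        look.eventQueue.shutdown();
        look.eventQueue.awaitTermination(1, TimeUnit.SECONDS);
        return new ArrayList<>(look.log);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runDemo());
    }
}
```

As the comment says, this only helps if the other lockers of the Look also run in EQ; any code that locks the Look from another thread reintroduces the problem.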

How critical is this deadlock, i.e. how common? Can it wait for #35833
to be merged, or do you need a fix soon?
Comment 8 Marian Petras 2003-10-15 17:08:30 UTC
Why I assigned it priority P2:
- because every deadlock prevents saving files (possible loss
  of data)
- because I got two deadlocks in five minutes, which led me to
  think that deadlocks are not as rare as they should be
  (though in fact these deadlocks were different and are
  tracked as two different issues in IssueZilla).
Comment 9 Jesse Glick 2003-10-15 17:14:26 UTC
Agreed that inability to save modifications is the primary concrete
harm caused by deadlocks. For that, see issue #7067, for which I am
willing to provide a simple implementation, since I consider it
much more critical than its current P4 rating indicates.

If you think the other deadlock you got might be caused by the same
problem, and it is marked as a duplicate of this one, then that would
make this P1.
Comment 10 Marian Petras 2003-10-15 17:39:58 UTC
To answer Jesse's last questions:

I can live with the bug until task #35833 is implemented (and
integrated). I encountered it only once, during an operation I do not
perform often. I didn't get the deadlock on the second try.
Comment 11 Jaroslav Tulach 2003-10-15 18:12:03 UTC
Re. calling into foreign code while holding locks: are we sure we want
to completely disallow that? Does that mean it is a problem to call
into the java.lang.* package? Probably not; there have to be some
limits. And I thought that Registry is supposed to be "the base" for
everything. It is of course bad to call unknown listeners from any
synchronized code, which means it is bad to modify the registry from
synchronized code called by someone else, but read-only access to the
registry should be as basic an operation as reading java.util.prefs,
which is obviously allowed.
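A common way to avoid calling unknown listeners from synchronized code is to change state under the lock and fire events only after releasing it. A minimal sketch, assuming a hypothetical `SimpleRegistry` class (not the NetBeans Registry API, which per Comment 4 fires events under read access); the key name used in `main` is likewise made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

/** Hypothetical registry that never calls listeners while holding its own lock. */
final class SimpleRegistry {
    private final Object lock = new Object();
    private final Map<String, String> values = new HashMap<>();
    // Copy-on-write list: iteration needs no lock and tolerates concurrent add/remove.
    private final List<BiConsumer<String, String>> listeners = new CopyOnWriteArrayList<>();

    void addListener(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    void put(String key, String value) {
        synchronized (lock) {
            values.put(key, value);          // mutate state under the lock...
        }
        for (BiConsumer<String, String> l : listeners) {
            l.accept(key, value);            // ...but notify with the lock released
        }
    }

    String get(String key) {
        synchronized (lock) {
            return values.get(key);
        }
    }

    /** For demonstration only: lets a listener check that no registry lock is held. */
    boolean lockHeldByCurrentThread() {
        return Thread.holdsLock(lock);
    }

    public static void main(String[] args) {
        SimpleRegistry registry = new SimpleRegistry();
        registry.addListener((k, v) -> System.out.println(
                k + "=" + v + ", registry lock held: " + registry.lockHeldByCurrentThread()));
        registry.put("hideUnorganizedSources", "true");
    }
}
```

The trade-off is that listeners may see events from concurrent `put` calls in a different order than the changes were applied, so a listener should re-read current state rather than rely on the event payload alone.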

Comment 12 Jesse Glick 2003-10-15 18:53:46 UTC
"Re. calling into foreign code while holding locks. Are we sure we
want to completely disallow that?" - no, it is necessary sometimes,
but you need to be careful I guess. (By "foreign code" BTW I mean
"module code whose behavior is not statically known", so not
java.lang.) A basic rule is that event listeners on e.g. Registry need
to be aware that they might be called while holding potentially
unlimited foreign locks, and so should think twice before attempting
to acquire locks themselves.

In this example,
o.n.m.p.ide.ui.looks.DefaultContainerLook$Idx.propertyChange is
receiving a Registry change and then refiring a corresponding Look
change. This, I believe, *should* be considered safe, since Look is in
the GUI layer (depends on Registry but not vice-versa); and if it were
not for ARQ attempting to access it from outside EQ with a special
monitor, there would be no problem. I am not sure I can prove this
mathematically (yet!) but I think the DefaultContainerLook code is OK.
It is certainly the natural way to write it, so it would be
undesirable if it were unsafe - lock ordering principles ought to be
intuitive and easy to remember.

Again, I believe that under #35833 there would be no deadlock here;
Registry is not patched, and Looks (etc.) can freely listen to
changes in e.g. Registry and refire events as a result.
Comment 13 Jesse Glick 2003-11-10 17:03:20 UTC
P3 if it only occurs in prj40_prototype.

Petr can deal with this stuff now, perhaps? Maybe just wait for #35833.
Comment 14 Marian Mirilovic 2003-12-22 13:22:42 UTC
As described in
http://www.netbeans.org/servlets/ReadMsg?msgId=619519&listName=nbdiscuss
the current work on the projects prototype has been stopped.

WONTFIX
Comment 15 Marian Mirilovic 2005-07-12 09:47:19 UTC
closed