A deadlock occurred after I clicked the check-box "Hide Sources That Are Not Under Organizers" in the customizer for the Sources node. [NB-dev, trunk, updated & built on 14 Oct 2003] [JDK 1.4.2_01, Solaris 8, Sun Sparc]
Created attachment 11856 [details] full thread dump
This timing issue relates to openide/looks.
Correction: In the description, I should have written [NB-dev, projects, ...] instead of [NB-dev, trunk, ...]
The problem is that there is a reader in the Mutex of the registry, and then a writer and another reader are waiting. The readers are in a synchronized section and will never finish. (This happens when the registry is firing events under readAccess.) I will attach a diff for MutexTest which adds a test for this situation. Probably either the registry will stop using the mutex or the mutex has to be fixed.
Created attachment 11871 [details] Additional test in MutexTest
I prefer to fix the Mutex. Jesse, what do you think about this? I did not understand it at all at first, but after an explanation from pnejedly and rereading Hrebejk's comment once again I got it, and it really looks like a problem in Mutex itself. The registry mutex is in read access, as can be seen in the stack of thread "AWT-EventQueue-0", yet "Active Reference Queue Daemon" is waiting for read access, which it actually should get. But it will not get it, because "OpenIDE-request-processor-0" asked in the meantime for write access. The "Active Reference Queue Daemon" should be allowed to enter and continue.
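For illustration only (this is not NetBeans code): a minimal sketch of the situation described above, where a second reader is refused while a writer waits behind an active reader. It assumes java.util.concurrent's ReentrantReadWriteLock in fair mode as a stand-in for org.openide.util.Mutex, which may differ in detail. That API is newer than the JDK 1.4.2 used in this report, and the class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class WriterPreferenceDemo {

    /**
     * Returns true if a second reader obtained read access while a first
     * reader held the lock and a writer was already queued.
     */
    static boolean secondReaderAdmitted() {
        // Fair mode: a thread does not acquire the lock ahead of threads already waiting.
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
        final boolean[] admitted = new boolean[1];

        lock.readLock().lock(); // first reader (the role EQ plays in the report)
        try {
            Thread writer = new Thread(() -> {
                lock.writeLock().lock(); // queues behind the first reader
                lock.writeLock().unlock();
            }, "writer");
            writer.start();
            while (!lock.hasQueuedThread(writer)) {
                Thread.yield(); // wait until the writer is really queued
            }

            Thread reader2 = new Thread(() -> {
                try {
                    // Timed attempt so the demo reports the refusal instead of hanging.
                    admitted[0] = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                    if (admitted[0]) {
                        lock.readLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "reader2");
            reader2.start();
            join(reader2);

            lock.readLock().unlock(); // let the writer proceed
            join(writer);
            lock.readLock().lock();   // re-acquire so the finally block stays balanced
        } finally {
            lock.readLock().unlock();
        }
        return admitted[0];
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this stand-in the second reader is turned away for as long as the writer is queued. If the first reader never finishes, neither the writer nor the second reader can proceed.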
There is nothing wrong with Mutex IMHO. It is permitted to have a policy of refusing further readers while a writer is waiting; this is a common strategy to avoid writer starvation. It may have contributed to this deadlock but is not the root cause. The deadlock arises because of a lock order conflict:
ARQ:
- locked SourcesLook$ChildLook@0xf08bc668
- waiting for read access to Registry's mutex
  --> blocks on RP since that is a writer which is preferred
RP:
- waiting for write access to Registry's mutex
  --> blocks on EQ since that is a reader
EQ:
- read access to Registry's mutex
- waiting to lock SourcesLook$ChildLook@0xf08bc668
  --> blocks on ARQ which holds the monitor
The RP is not very important here because if either ARQ or EQ had taken write access on Registry's mutex (quite plausible) then there would still be a two-thread deadlock. Even if the Mutex behavior were changed to prefer readers there could be a deadlock if one of ARQ or EQ was requesting, or held, write access.
So who is at fault? Looks, IMHO, for being internally synchronized while running external code (e.g. getLooksForKey in this case, detachFrom likely in other cases). In fact in the branch for issue #35833, I believe this deadlock could not occur, since everything happening with the Look would run in EQ, which would acquire Registry's mutex in read access; RP would either get there first or later.
One hot fix might be to merge a change I already made in that branch which forces the ARQ ref cleanup for a Look into EQ - but only effective if the other locker of the Look (here, very indirect, from a Look event, from a Registry change event, from another Look, from a JToggleButton) is also in EQ, so it should fix this particular deadlock, but perhaps not others like it. A more complete fix would also merge a change to make sure the Look's event firing was in EQ, but that might screw up Look unit tests if more work is not done (still in progress in the branch).
How critical is this deadlock, i.e. how common? Can it wait for #35833 to be merged, or do you need a fix soon?
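For illustration only (not code from the IDE): a minimal sketch of the three-thread cycle analysed in the comment above. It assumes a fair ReentrantReadWriteLock as a stand-in for Registry's mutex and a ReentrantLock as a stand-in for the Look's monitor. Timed lock attempts replace the blocking ones so the demo reports the cycle instead of hanging. All class, method and variable names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderCycleDemo {

    /** Returns {eqGotLook, arqGotRead}; both false means the cycle formed. */
    static boolean[] run() {
        final ReentrantReadWriteLock registry = new ReentrantReadWriteLock(true);
        final ReentrantLock look = new ReentrantLock(); // stand-in for the Look monitor
        final CountDownLatch eqHasRead = new CountDownLatch(1);
        final CountDownLatch arqHasLook = new CountDownLatch(1);
        final CountDownLatch attemptsDone = new CountDownLatch(2);
        final boolean[] result = new boolean[2];

        final Thread rp = new Thread(() -> {
            await(eqHasRead);
            registry.writeLock().lock(); // waits for EQ's read access to end
            registry.writeLock().unlock();
        }, "RP");

        final Thread eq = new Thread(() -> {
            registry.readLock().lock();   // EQ: read access to the registry
            eqHasRead.countDown();
            await(arqHasLook);
            awaitQueued(registry, rp);
            boolean got = tryBriefly(look); // EQ: wants the Look, held by ARQ
            result[0] = got;
            attemptsDone.countDown();
            await(attemptsDone);
            if (got) look.unlock();
            registry.readLock().unlock();
        }, "EQ");

        final Thread arq = new Thread(() -> {
            look.lock();                  // ARQ: holds the Look
            arqHasLook.countDown();
            await(eqHasRead);
            awaitQueued(registry, rp);
            Lock read = registry.readLock();
            boolean got = tryBriefly(read); // ARQ: wants read access, queued behind RP
            result[1] = got;
            attemptsDone.countDown();
            await(attemptsDone);
            if (got) read.unlock();
            look.unlock();
        }, "ARQ");

        rp.start(); eq.start(); arq.start();
        try {
            eq.join(); arq.join(); rp.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    private static boolean tryBriefly(Lock lock) {
        try {
            return lock.tryLock(200, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private static void awaitQueued(ReentrantReadWriteLock lock, Thread t) {
        while (!lock.hasQueuedThread(t)) Thread.yield();
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both timed attempts fail in this sketch: EQ cannot get the Look while ARQ holds it, and ARQ cannot get read access while RP is queued for write access. With untimed acquisition, as in the thread dump, none of the three threads would ever proceed.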
Why I assigned it priority P2:
- because every deadlock prevents saving files (possible loss of data)
- because I got two deadlocks in five minutes, which led me to think that deadlocks are not as rare as they should be (although these two deadlocks were different and are tracked as two separate issues in IssueZilla).
Agreed that inability to save modifications is the primary concrete harm caused by deadlocks. For that, see issue #7067, which I am willing to provide a simple implementation of, since I consider it much more critical than its current P4 rating indicates. If you think the other deadlock you got might be caused by the same problem, and is marked duplicate of this one, then that would make this P1.
To answer Jesse's last questions: I can live with the bug until task #35833 is implemented (and integrated). I encountered it only once, during an operation I do not perform often. I didn't get the deadlock on the second try.
Re. calling into foreign code while holding locks. Are we sure we want to completely disallow that? Does that mean it is a problem to call into the java.lang.* packages? Probably not; there have to be some limits. And I thought that Registry is supposed to be "the base" for everything. It is of course bad to call unknown listeners from any synchronized code, which means it is bad to modify the registry under synchronized code called by someone else, but read-only access to the registry should be as basic an operation as reading java.util.prefs, which is obviously allowed.
"Re. calling into foreign code while holding locks. Are we sure we want to completely disallow that?" - no, it is necessary sometimes, but you need to be careful I guess. (By "foreign code" BTW I mean "module code whose behavior is not statically known", so not java.lang.) A basic rule is that event listeners on e.g. Registry need to be aware that they might be called while holding potentially unlimited foreign locks, and so should think twice before attempting to acquire locks themselves. In this example, o.n.m.p.ide.ui.looks.DefaultContainerLook$Idx.propertyChange is receiving a Registry change and then refiring a corresponding Look change. This, I believe, *should* be considered safe, since Look is in the GUI layer (depends on Registry but not vice-versa); and if it were not for ARQ attempting to access it from outside EQ with a special monitor, there would be no problem. I am not sure I can prove this mathematically (yet!) but I think the DefaultContainerLook code is OK. It is certainly the natural way to write it, so it would be undesirable if it were unsafe - lock ordering principles ought to be intuitive and easy to remember. Again, I believe that under #35833 there would be no deadlock here; Registry is not patched, and Look's (etc.) can freely listen to changes in e.g. Registry and refire events as a result.
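For illustration only: a minimal sketch of the idea mentioned above of keeping all access to a Look on one thread, so that a cleanup thread such as ARQ hands its work over instead of taking a separate monitor. A single-threaded executor is assumed here as a stand-in for EQ (in a GUI, java.awt.EventQueue.invokeLater would play that role). This is not the actual change from the #35833 branch, and all names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConfineToOneThreadDemo {

    /** Returns the name of the thread that actually ran the cleanup. */
    static String run() {
        // Stand-in for the event queue: exactly one thread ever touches the Look.
        ExecutorService eventThread =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "EQ-stand-in"));
        try {
            CompletableFuture<String> ranOn = new CompletableFuture<>();

            // The cleanup thread does not lock the Look; it only posts a task.
            Thread arq = new Thread(() ->
                    eventThread.execute(() ->
                            // Hypothetical cleanup; here it just records its thread.
                            ranOn.complete(Thread.currentThread().getName())),
                    "ARQ-stand-in");
            arq.start();

            return ranOn.join(); // wait for the posted task to run
        } finally {
            eventThread.shutdown();
        }
    }
}
```

Because the cleanup thread holds nothing while posting and never waits on the Look, it cannot take part in a lock cycle involving the Look. As the comment above notes, this only helps when the other users of the Look also run on that single thread.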
P3 if only occurs in prj40_prototype. Petr can deal with this stuff now, perhaps? Maybe just wait for #35833.
As described in http://www.netbeans.org/servlets/ReadMsg?msgId=619519&listName=nbdiscuss the current work on projects prototype has been stopped. WONTFIX
closed