This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 33398 - IDE froze during project switch in Mutex
Status: VERIFIED FIXED
Alias: None
Product: platform
Classification: Unclassified
Component: -- Other --
Version: 3.x
Hardware: All All
Importance: P2 blocker
Assignee: Petr Nejedly
URL:
Keywords: RANDOM, THREAD
Duplicates: 35105 36267 41776
Depends on:
Blocks:
 
Reported: 2003-05-05 17:14 UTC by ehucka
Modified: 2008-12-23 10:49 UTC (History)
CC List: 6 users

See Also:
Issue Type: DEFECT
Exception Reporter:


Attachments
Thread dump. (14.94 KB, text/plain)
2003-05-05 17:14 UTC, ehucka
Possible fix (2.88 KB, patch)
2004-04-07 16:44 UTC, Petr Nejedly

Description ehucka 2003-05-05 17:14:07 UTC
Nevada build 030502.
Linux RH 7.2, GNOME
JDK 1.4.1_02

I don't know if it is in core.

I only tried to open another project with the Project
Manager. The IDE froze, but its processes were using
about 80% of the CPU. I waited for a few minutes,
but nothing happened.
While the new project was opening, I switched to another
GNOME workspace and back.
Comment 1 ehucka 2003-05-05 17:14:47 UTC
Created attachment 10220
Thread dump.
Comment 2 pzajac 2003-05-05 17:47:04 UTC
Petr, please try to evaluate.
Comment 3 _ ttran 2003-05-06 07:52:15 UTC
ppl, please mark deadlock bugs w/ THREAD keyword.  Thx
Comment 4 Petr Nejedly 2003-05-06 08:29:23 UTC
Very strange.
The first thread in the listing is waiting for 0x44c224d8, which is
nowhere marked as locked (in fact, it was just released by the 5th thread).
Threads 2, 3, and 4 are waiting to enter Children.MUTEX, as is the 10th (AWT).
The 5th is also trying to enter, as a writer. Mutex questions:
  *) Trying to upgrade??
  *) Does it first free its S status??
  *) readresNo -= 2;??

The 5th is marked as runnable although it's in Object.wait()
I'm a bit confused.
Comment 5 _ ttran 2003-05-06 08:40:10 UTC
a question from  a different angle: how frequently does this bug
happen?  In other words, how important is it for 3.5/S1S5 release?  (I
assume we don't have a reliable testcase to reproduce it :-(
Comment 6 Marian Mirilovic 2003-05-07 08:01:22 UTC
This issue is not reproducible; it has been seen only once, so I am
decreasing the priority to P3.
Comment 7 _ ttran 2003-05-07 16:23:50 UTC
bcs it's not reproducible and the thread dump is not helpful enough,
we can't fix it now for 3.5
Comment 8 Marian Mirilovic 2003-05-23 09:15:13 UTC
Somebody else has reported the same problem as a part of issue 33840.
Comment 9 Marian Mirilovic 2003-07-23 15:05:58 UTC
*** Issue 35105 has been marked as a duplicate of this issue. ***
Comment 10 Marian Mirilovic 2003-07-23 15:07:58 UTC
reproduced again (see issue 35105) -> P2
Comment 11 Petr Nejedly 2003-07-24 08:21:11 UTC
Jesse, don't you see anything suspicious?
It seems this problem happens when two readers leave their readAccess
"at once" and there is some write request chained.
In this thread dump, I see no lock that could cause it,
only the strange coincidence in the 5th thread (it had just left the lock
the 1st needs and then went to sleep; moreover, it looks like it either
didn't release the lock when going to sleep or (more probably) was just
woken up and reacquired the lock, as it is marked runnable. Why isn't it
running then?)

If anybody reproduces this problem again - already reported thrice, so
it should be possible :-( - please make more thread dumps; it may be
that the thread got woken up and went back to sleep immediately
somehow....
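
For anyone collecting the extra dumps requested above, here is a minimal sketch that takes several dumps a short interval apart, so a thread that wakes and immediately sleeps again has a chance of being caught in both states. It assumes a modern JVM with java.lang.management (dumpAllThreads needs Java 6 or later, so it would not run on the JDK 1.4.1 from the original report); the class and method names are hypothetical and not part of NetBeans.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of NetBeans): takes several thread dumps in a row. */
class RepeatedThreadDumps {

    /** Returns up to count textual dumps, pausing pauseMillis between them. */
    static List<String> capture(int count, long pauseMillis) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Only ask for lock details that the running JVM says it can provide.
        boolean monitors = bean.isObjectMonitorUsageSupported();
        boolean synchronizers = bean.isSynchronizerUsageSupported();
        List<String> dumps = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            StringBuilder sb = new StringBuilder();
            for (ThreadInfo info : bean.dumpAllThreads(monitors, synchronizers)) {
                sb.append('"').append(info.getThreadName()).append("\" ")
                  .append(info.getThreadState());
                if (info.getLockName() != null) {
                    // The monitor or synchronizer this thread is blocked or waiting on.
                    sb.append(" on ").append(info.getLockName());
                }
                sb.append('\n');
                for (StackTraceElement frame : info.getStackTrace()) {
                    sb.append("        at ").append(frame).append('\n');
                }
            }
            dumps.add(sb.toString());
            if (i < count - 1) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Restore the flag and return what has been collected so far.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return dumps;
    }

    public static void main(String[] args) {
        for (String dump : capture(3, 500)) {
            System.out.println(dump);
            System.out.println("----");
        }
    }
}
```

Comparing the state and lock name of the same thread across consecutive dumps shows whether it stays parked or keeps cycling between waiting and runnable.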

Now, in the latest report (issue 35105), the dump is more interesting:
Again, two threads are leaving; one correctly waits on M$QC(02D418D8)
and the other already has it locked and is again in Object.wait(),
but is *not* marked runnable and didn't release the lock while
waiting:
"Default RequestProcessor" daemon prio=2 tid=0x12DF3808 nid=0x204 in Object.wait() [e6af000..e6afd8c]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:426)
        at org.openide.util.Mutex$QueueCell.sleep(Mutex.java:1172)
        - locked <02D418D8> (a org.openide.util.Mutex$QueueCell)
        at org.openide.util.Mutex.privilegedEnter(Mutex.java:561)

Comment 12 Jesse Glick 2003-07-24 14:49:22 UTC
Please don't ask me anything about the current Mutex impl. :-) I don't
understand it at all. That is why I have it completely rewritten in a
branch...
Comment 13 Petr Nejedly 2004-01-09 10:25:57 UTC
Is it still happening? No report for half a year ... 
Comment 14 ehucka 2004-01-09 10:39:25 UTC
I have seen it only once.
Comment 15 Petr Nejedly 2004-01-09 15:25:32 UTC
*** Issue 36267 has been marked as a duplicate of this issue. ***
Comment 16 Petr Nejedly 2004-01-09 15:28:58 UTC
OK, the bad news is that it is still happening.
The worse news is that it probably is inside of org.openide.util.Mutex
and nobody understands why it is happening.
Comment 17 Petr Nejedly 2004-02-23 11:37:18 UTC
We are not able to reliably reproduce the problem and we don't know why
and when it is happening. For this reason I can't safely provide a fix
for this problem for NB3.6.

I suggest waiving this issue for NB3.6 and trying to add some
debugging hooks to the dev codebase after the 3.6 release so we can better
understand the problem.
Comment 18 Patrick Keegan 2004-04-01 18:29:36 UTC
should this be relnoted for 3.6?
Comment 19 Petr Nejedly 2004-04-07 14:52:56 UTC
*** Issue 41776 has been marked as a duplicate of this issue. ***
Comment 20 Petr Nejedly 2004-04-07 14:54:07 UTC
Patrick:
Probably doesn't need to be relnoted (if it did, we'd better fix it
for 3.6 anyway)
Comment 21 Petr Nejedly 2004-04-07 15:47:50 UTC
Details:
No deadlock; it is a livelock: two (or more) threads are pinging each
other through the waiters queue.
*) In privilegedEnter, a thread (A) tries to chain itself with quite
high priority, but if there are two waiters with the same priority,
it is chained after the other one.

*) Then it wakes the other waiter (B), which removes B from the queue.

*) B re-chains itself, but because it is not the first one (it has the
same priority as A, so it is chained after A), it wakes A again and goes
to sleep.

... and again and again ...
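
The cycle described above can be replayed in a small single-threaded model. This is only a sketch under assumptions: Waiter, chain, simulate, and the numeric priorities are hypothetical names invented for illustration, not the real org.openide.util.Mutex internals, and the second run (A with strictly higher priority) is just a contrast case, not the integrated patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Single-threaded model of the ping-pong; an illustration, not the real Mutex code. */
class WaiterQueueLivelock {

    /** Hypothetical stand-in for a queued waiter. */
    static final class Waiter {
        final String name;
        final int priority;

        Waiter(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    /** Inserts w behind every queued waiter of equal or higher priority; returns its index. */
    static int chain(List<Waiter> queue, Waiter w) {
        int i = 0;
        while (i < queue.size() && queue.get(i).priority >= w.priority) {
            i++;
        }
        queue.add(i, w);
        return i;
    }

    /** Replays at most maxRounds wake-ups and returns a log of what happened. */
    static List<String> simulate(int priorityA, int priorityB, int maxRounds) {
        List<Waiter> queue = new ArrayList<>();
        queue.add(new Waiter("B", priorityB));        // B is already waiting
        Waiter arriving = new Waiter("A", priorityA); // A is about to chain itself
        List<String> log = new ArrayList<>();
        for (int round = 0; round < maxRounds; round++) {
            if (chain(queue, arriving) == 0) {
                // Being first in the queue is the only way out of the loop.
                log.add(arriving.name + " proceeds");
                break;
            }
            // Not first: wake the head (which leaves the queue) and go to sleep.
            Waiter woken = queue.remove(0);
            log.add(arriving.name + " wakes " + woken.name);
            arriving = woken; // the woken thread re-chains itself next
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(simulate(10, 10, 6)); // equal priorities
        System.out.println(simulate(11, 10, 6)); // A strictly higher than B
    }
}
```

With equal priorities the log alternates between "A wakes B" and "B wakes A" until the round limit is hit, and nobody ever proceeds; in the real IDE there is no round limit, which matches the high CPU usage with no progress from the original report.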
Comment 22 Petr Nejedly 2004-04-07 16:44:14 UTC
Created attachment 14320
Possible fix
Comment 23 Petr Nejedly 2004-04-08 10:51:55 UTC
I've integrated the patch.
Maybe the code can be simplified more, but it works this way...
Comment 24 Marian Mirilovic 2004-04-08 10:54:41 UTC
Petr,
do not forget to set the appropriate TM when you resolve an issue!

Thanks in advance ;)
Comment 25 ehucka 2004-07-16 11:07:07 UTC
verified