148817 – infinite wait on remote connection

Status:	RESOLVED FIXED

Alias:	None

Product:	cnd
Classification:	Unclassified
Component:	Remote
Version:	6.x
Hardware:	Sun All

Importance:	P2 blocker
Assignee:	_ gordonp

URL:
Keywords:	PERFORMANCE

Depends on:
Blocks:

Reported:	2008-10-01 10:32 UTC by Alexander Simon
Modified:	2008-10-09 05:57 UTC
CC List:	1 user

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
thread dump (20.62 KB, text/plain) 2008-10-01 10:33 UTC, Alexander Simon
Full stack dump (20.09 KB, text/plain) 2008-10-06 16:18 UTC, Vladimir Kvashin
Just two threads of interest from the previous full thread dump (5.14 KB, text/plain) 2008-10-06 16:20 UTC, Vladimir Kvashin

Description Alexander Simon 2008-10-01 10:32:55 UTC

I got an infinite wait. The UI was frozen.
See the thread dump.

Comment 1 Alexander Simon 2008-10-01 10:33:42 UTC

Created attachment 70957
thread dump

Comment 2 _ gordonp 2008-10-01 23:57:17 UTC

Is this reproducible? The stack trace is interesting. AWT is blocked waiting for a map
to be freed in a RequestProcessor thread. But the RP thread is reading the response
from a *local* "uname -a" command so it doesn't look like the RP itself is hung, which
would make me wonder if another thread dump would show a completely different snapshot.

Is there any chance your PATH picks up a uname over an NFS connection? The RP is reading
input, so I would think the exec and open both worked. If the uname was picked up from an
NFS-mounted filesystem then I could see it blocking mid-read. If it was local, I don't see
how that could happen...

Comment 3 Alexander Simon 2008-10-02 08:13:17 UTC

I have seen this wait once in about 10 connections.
My local host is OS, the remote is OS on elif, and the NB project was created in home.

Comment 4 _ gordonp 2008-10-02 16:17:26 UTC

Please attach some more thread dumps or just let me know if AWT is blocked on the same
object which the RP locks on line 121 in RemotePathMap.init(). Also, could you copy your
PATH to the issue (or email it to me)?

Comment 5 Alexander Simon 2008-10-02 16:57:23 UTC

Unfortunately it was yesterday and I took only one thread dump.
#echo $PATH
/export/home/as204739/ant/bin:/opt/csw/bin:/usr/bin:/usr/sbin:/usr/openwin/bin:/etc:/usr/ccs/bin:/usr/sfw/bin:/usr/local/bin:/opt/csw/bin:/opt/ant/bin:/opt/SUNWspro/bin:/opt/onbld/bin:/opt/cvs/bin:/usr/ucb

Comment 6 _ gordonp 2008-10-02 17:21:02 UTC

Have you seen this hang more than once? I couldn't tell if you saw it once during 10 connections
or were seeing it (on average) once every 10 connections.

One possible scenario which comes to mind is that one of the non-local directories in your path was
on an offline host and path searches were timing out on that host before completing. That would explain
why the stack trace looked like a short-term block rather than a deadlock.

If you've only seen this once then I'd like to either downgrade to P3 or close as WORKSFORME. If
it's repeatable then I'll continue (but I'll still need some way of repeating it myself).

Comment 7 Alexander Simon 2008-10-02 19:07:55 UTC

I saw it once.
Some information about the project: it was CLucene (a small project).
vkvashin & sergius saw the hang from the beginning until the process was killed.
Maybe they can add more information.
vkvashin, could you comment on the thread dump?
IMHO if you do not have any ideas, the bug can be closed.

Comment 8 Leonid Lenyashin 2008-10-03 09:05:16 UTC

Please do not close this bug. Feel free to downgrade it and postpone it to the next release if it does not repeat. However,
I'd like this possible lock in the AWT thread to be investigated and ideally removed.

Comment 9 Sergey Grinev 2008-10-03 11:05:03 UTC

I believe the initial issue (with the hung uname) is a rare one and isn't worth fixing now.
The bigger one is our threading model, which allows potentially long calls from the AWT thread (like "IZ146696 expensive use
of EDT for reading XML"). But it can't be resolved in this time frame.

So I suggest we downgrade/postpone this bug and address both the minor and major issues in the next release.

Comment 10 _ gordonp 2008-10-03 19:16:04 UTC

Downgrading to P3. I'm removing the PERFORMANCE keyword because it's not justified unless
the problem proves to be repeatable.

Comment 11 Vladimir Kvashin 2008-10-06 16:16:37 UTC

I faced the same (I believe) problem today. Unfortunately I can't reproduce this easily.
The use case was: 
- add a host (it was elif server with Solaris 10 x86)
- create a CLucene project
- switch it to the newly added remote host
- press "Clean and Build" toolbar button

The UI was frozen for several minutes.
After that I killed the application.

See attached thread dumps

Comment 12 Vladimir Kvashin 2008-10-06 16:18:15 UTC

Created attachment 71218
Full stack dump

Comment 13 Vladimir Kvashin 2008-10-06 16:20:20 UTC

Created attachment 71219
Just two threads of interest from the previous full thread dump

Comment 14 Vladimir Kvashin 2008-10-06 16:29:36 UTC

I strongly believe that methods that wait on "slow" semaphores should *never* be called from the event thread.

I mean that there are some semaphores that are locked for a very short period of time: for example, one that guards
a non-synchronized collection. These are "fast" semaphores. It's acceptable to call methods that wait on such locks in the AWT
event processing thread (unfortunately, we can't do without that).

But methods that wait on locks that can, under some circumstances, be held for a long time (invocation of other processes
is an example) should never be called from the event dispatching thread.
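
The rule above could be enforced at the point where a slow lock is taken. The following is only a minimal sketch under assumptions: `SlowLockGuard` and `withSlowLock` are hypothetical names, not NetBeans classes, and the event-thread check is injectable so the behavior can be exercised without a running UI.

```java
import java.awt.EventQueue;
import java.util.concurrent.Callable;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of NetBeans): refuses to wait on a slow lock from the EDT. */
public class SlowLockGuard {
    private final Object slowLock = new Object();
    private final BooleanSupplier onEventThread;

    /** Default: ask AWT whether the caller is the event dispatch thread. */
    public SlowLockGuard() {
        this(EventQueue::isDispatchThread);
    }

    /** Allows the event-thread check to be substituted, e.g. in tests. */
    public SlowLockGuard(BooleanSupplier onEventThread) {
        this.onEventThread = onEventThread;
    }

    /** Runs work under the slow lock; fails fast instead of blocking the event thread. */
    public <T> T withSlowLock(Callable<T> work) throws Exception {
        if (onEventThread.getAsBoolean()) {
            throw new IllegalStateException("slow lock requested on the event dispatch thread");
        }
        synchronized (slowLock) {
            return work.call();
        }
    }
}
```

Failing fast turns a silent UI freeze into an immediate, diagnosable error during development; whether an exception or just a logged warning is appropriate is a design choice this sketch does not settle.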

Comment 15 Egor Ushakov 2008-10-07 15:45:20 UTC

Just an idea:
Runtime.exec may block because the process streams are read "incorrectly", see
http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-traps.html
Maybe we need to read stderr just in case?
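
One common way to rule out the unread-stream case is to merge stderr into stdout and drain the single stream before waiting for the process. This is a sketch only: `DrainedExec` is a hypothetical name, and nothing here is taken from the actual cnd remote code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: runs a command and drains all of its output. */
public class DrainedExec {
    public static List<String> run(String... command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);    // stderr is merged into the stream read below
        Process p = pb.start();
        p.getOutputStream().close();     // the child sees EOF on its stdin
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);         // keep reading so the pipe buffer never fills
            }
        }
        p.waitFor();                     // output is fully drained, so this cannot stall on a full pipe
        return lines;
    }
}
```

For example, `DrainedExec.run("uname", "-a")` returns the command's output lines. This only addresses a child blocked on a full pipe; a command that itself stalls (for instance on an unreachable filesystem) would still block the reading thread, so the call should not run on the EDT or while holding a lock the EDT needs.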

Comment 16 _ gordonp 2008-10-07 23:17:10 UTC

changeset:   104640:f19eed127a99
user:        Gordon Prieur <gordonp@netbeans.org>
date:        Tue Oct 07 11:24:58 2008 -0700
summary:     Fixed IZ #148817  infinite wait on remote connection

Waiting for Sergey to review and QA to test.

Comment 17 Sergey Grinev 2008-10-08 16:12:29 UTC

reviewed, no objections
was integrated to trunk as http://hg.netbeans.org/main?cmd=changeset;node=d6b27640d009

Comment 18 Quality Engineering 2008-10-09 05:57:43 UTC

Integrated into 'main-golden', will be available in build *200810090201* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main/rev/d6b27640d009
User: Gordon Prieur <gordonp@netbeans.org>
Log: Fixed IZ #148817  infinite wait on remote connection
Initialization of RemotePathMap was done from RemoteServerRecord.init from a separate RP. Switch
to the same thread RSR.init is in (it's *not* the EDT) because a progress bar is up and the EDT isn't blocked
this way.
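
The shape of the change described in the log can be illustrated roughly as follows. This is a sketch under assumptions, not the committed code: `HostRecordSketch`, `initAsync`, and the placeholder map entry are all hypothetical, and the real RemoteServerRecord and RemotePathMap classes are not reproduced here.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

/** Hypothetical stand-in for a remote host record; not the NetBeans class. */
public class HostRecordSketch {
    private final Map<String, String> pathMap = new ConcurrentHashMap<>();
    private volatile boolean ready;

    /** Placeholder for the slow remote probing the real code would do. */
    private void initPathMap() {
        pathMap.put("local-placeholder", "remote-placeholder");
    }

    /**
     * Runs the whole initialization as one task on the given worker.
     * The path map is filled in the same thread as the rest of init,
     * so no second task holds a lock that the caller might wait on.
     */
    public CompletableFuture<HostRecordSketch> initAsync(Executor worker) {
        return CompletableFuture.supplyAsync(() -> {
            initPathMap();
            ready = true;
            return this;
        }, worker);
    }

    public boolean isReady() {
        return ready;
    }

    public String remotePath(String local) {
        return pathMap.get(local);
    }
}
```

UI code would attach a completion callback to the returned future (for example, one that updates the progress bar via `EventQueue.invokeLater`) rather than calling `get()` on the EDT.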