This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 184421 - random failures in JSch
Summary: random failures in JSch
Status: RESOLVED FIXED
Alias: None
Product: cnd
Classification: Unclassified
Component: execution (show other bugs)
Version: 8.2
Hardware: PC Windows 7
: P3 normal (vote)
Assignee: ilia
URL:
Keywords: RANDOM, TEST
Depends on:
Blocks:
 
Reported: 2010-04-19 08:50 UTC by Alexey Vladykin
Modified: 2017-03-08 17:19 UTC (History)
2 users (show)

See Also:
Issue Type: DEFECT
Exception Reporter:


Attachments
Possible patch to overcome the 'received data for non-existent channel' problem (10.05 KB, patch)
2010-08-19 10:15 UTC, Andrew Krasny
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexey Vladykin 2010-04-19 08:50:38 UTC
Some nativeexecution tests are randomly failing with the following symptoms:

FINE: New connection established: dev-tester1@cska:22 - Linux-SuSE
FINEST: JSCH: Caught an exception, leaving main loop due to SSH_MSG_DISCONNECT:
2 Received ieof for nonexistent channel 0. 
FINEST: JSCH: Disconnecting from cska port 22
Apr 16, 2010 3:24:12 PM
org.netbeans.modules.nativeexecution.support.hostinfo.FetchHostInfoTask compute
INFO: Exception while receiving hostinfo for dev-tester1@cska:22
java.io.IOException: Exception while receiving HostInfo for
dev-tester1@cska:22: com.jcraft.jsch.JSchException: session is down
    at org.netbeans.modules.nativeexecution.support.hostinfo.impl.UnixHostInfoProvider.getRemoteHostInfo(UnixHostInfoProvider.java:178)
    at org.netbeans.modules.nativeexecution.support.hostinfo.impl.UnixHostInfoProvider.getHostInfo(UnixHostInfoProvider.java:91)
    at
org.netbeans.modules.nativeexecution.support.hostinfo.FetchHostInfoTask.compute(FetchHostInfoTask.java:60)
    at
org.netbeans.modules.nativeexecution.api.util.HostInfoUtils.getHostInfo(HostInfoUtils.java:202)
    at org.netbeans.modules.nativeexecution.api.NativeProcessBuilderTest.testSetExecutable(NativeProcessBuilderTest.java:260)
    at org.netbeans.modules.nativeexecution.api.NativeProcessBuilderTest.testSetExecutable(NativeProcessBuilderTest.java:127)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at junit.framework.TestCase.runTest(TestCase.java:168)
    at org.netbeans.junit.NbTestCase.access$200(NbTestCase.java:89)
    at org.netbeans.junit.NbTestCase$2.doSomething(NbTestCase.java:345)
    at org.netbeans.junit.NbTestCase$1Guard.run(NbTestCase.java:274)
    at java.lang.Thread.run(Thread.java:619)

According to http://www.mail-archive.com/jsch-users@lists.sourceforge.net/msg00673.html this is a race condition in JSch, which should be fixed in the next release. Current release 0.1.42 does not include the fix.

The error usually occurs when running tests, but there is nothing that prevents it from happening during regular IDE usage.
Comment 1 Alexey Vladykin 2010-04-19 16:07:33 UTC
After adding synchronization in UnixHostInfoProvider the bug typically occurs in SftpSupport:

FINE: New connection established: dev-tester1@spb-nero:22 - SunOS
FINEST: JSCH: Caught an exception, leaving main loop due to SSH_MSG_DISCONNECT: 2 Received ieof for nonexistent channel 0. 
FINEST: JSCH: Disconnecting from spb-nero port 22
Apr 19, 2010 6:08:00 PM org.netbeans.modules.nativeexecution.api.util.SftpSupport$Worker logException
INFO: Error Uploading /var/tmp/hudson3/test-upload-13951965097925722603.tmp to dev-tester1@spb-nero:22:/tmp/test-upload-13951965097925722603.tmp
com.jcraft.jsch.JSchException: session is down
	at com.jcraft.jsch.Channel.connect(Channel.java:185)
	at com.jcraft.jsch.Channel.connect(Channel.java:144)
	at org.netbeans.modules.nativeexecution.api.util.SftpSupport.getChannel(SftpSupport.java:159)
	at org.netbeans.modules.nativeexecution.api.util.SftpSupport.access$200(SftpSupport.java:75)
	at org.netbeans.modules.nativeexecution.api.util.SftpSupport$Uploader.work(SftpSupport.java:274)
	at org.netbeans.modules.nativeexecution.api.util.SftpSupport$Worker.call(SftpSupport.java:187)
	at org.netbeans.modules.nativeexecution.api.util.SftpSupport$Worker.call(SftpSupport.java:165)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:1413)
	at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:1952)
Comment 2 Alexey Vladykin 2010-04-20 10:50:45 UTC
Vladimir V reports that he saw occasional remote build failures when he was preparing for Tech Days presentation. Clicking "Build" again helped. However nobody knows if it was this problem or a different one.

I'm currently checking if the fix proposed at http://www.mail-archive.com/jsch-users@lists.sourceforge.net/msg00673.html help our the tests.
Comment 3 Alexey Vladykin 2010-04-20 13:23:59 UTC
Suggested change does not fix the nativeexecution tests.
Comment 4 Alexey Vladykin 2010-04-20 21:15:13 UTC
Synchronizing only parts of Channel.eof() and Channel.close() is not enough. It does not prevent sending eof after channel is closed. It is necessary to make the whole Channel.eof() body synchronized to fix the problem.

I've tried the patched JSch jar in our cnd-test-unstable job, and the "Received ieof for nonexistent channel 0" problem has gone.

There is one more problem, which sometimes occurred before the fix as well, but was hidden among lots of "ieof for nonexistent channel". It does not have any visible JSch error message at all. Here is a piece of test log:

FINEST: JSCH: Next authentication method: publickey
FINEST: JSCH: Authentications that can continue: keyboard-interactive,password
FINEST: JSCH: Next authentication method: keyboard-interactive
FINEST: JSCH: Authentications that can continue: password
FINEST: JSCH: Next authentication method: password
FINEST: JSCH: Disconnecting from endif port 22
Apr 20, 2010 8:58:03 PM org.openide.util.Exceptions printStackTrace
SEVERE: null
java.util.concurrent.CancellationException: Connection cancelled for rdtest@endif:22
	at org.netbeans.modules.nativeexecution.api.util.ConnectionManager.doConnect(ConnectionManager.java:260)
	at org.netbeans.modules.nativeexecution.api.util.ConnectionManager.connectTo(ConnectionManager.java:218)
	at org.netbeans.modules.nativeexecution.api.util.ConnectionManagerTest.testConnectDisconnect(ConnectionManagerTest.java:108)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at junit.framework.TestCase.runTest(TestCase.java:168)
	at org.netbeans.junit.NbTestCase.access$200(NbTestCase.java:89)
	at org.netbeans.junit.NbTestCase$2.doSomething(NbTestCase.java:345)
	at org.netbeans.junit.NbTestCase$1Guard.run(NbTestCase.java:274)
	at java.lang.Thread.run(Thread.java:619)
Comment 5 Alexey Vladykin 2010-04-22 13:21:12 UTC
The latter unexpected disconnect might be caused by connection timeout. Sometimes I get very long delay between typing "ssh host" and getting "Password:" prompt for our test machines, e.g. spb-nero and endif. The delay is longer than our 10 second connection timeout set in ConnectionManager.

I've added the following options when running tests in cnd-test-unstable:
-Djsch.connection.timeout=30000 -Dsocket.connection.timeout=30000 

Time will show if it helps.
Comment 6 Alexey Vladykin 2010-04-22 13:25:05 UTC
A new problem:

FINEST: JSCH: Authentication succeeded (keyboard-interactive).
FINE: New connection established: rdtest@endif:22 - SunOS
FINEST: rdtest@endif:22 sh "-c" "ps -ef | grep /var/tmp/dlight_rdtest/fa060dd5/tools/SunOS-x86_64/rfs_controller": State changed: INITIAL -> STARTING
Apr 22, 2010 5:19:29 PM org.netbeans.modules.nativeexecution.JschSupport execCommand
INFO: JSch exception
com.jcraft.jsch.JSchException: channel is not opened.
	at com.jcraft.jsch.Channel.connect(Channel.java:188)
	at org.netbeans.modules.nativeexecution.JschSupport.execCommand(JschSupport.java:75)
	at org.netbeans.modules.nativeexecution.RemoteNativeProcess.create(RemoteNativeProcess.java:54)
	at org.netbeans.modules.nativeexecution.AbstractNativeProcess.createAndStart(AbstractNativeProcess.java:143)
	at org.netbeans.modules.nativeexecution.api.NativeProcessBuilder.call(NativeProcessBuilder.java:230)
	at org.netbeans.modules.nativeexecution.api.util.ProcessUtils.execute(ProcessUtils.java:326)
	at org.netbeans.modules.nativeexecution.api.util.ProcessUtils.execute(ProcessUtils.java:272)
	at org.netbeans.modules.cnd.remote.sync.RfsRemoteControllerDeathTestCase.isAlive(RfsRemoteControllerDeathTestCase.java:139)
	at org.netbeans.modules.cnd.remote.sync.RfsRemoteControllerDeathTestCase.waitDeath(RfsRemoteControllerDeathTestCase.java:160)
	at org.netbeans.modules.cnd.remote.sync.RfsRemoteControllerDeathTestCase.testRfsRemoteControllerDeath(RfsRemoteControllerDeathTestCase.java:104)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at junit.framework.TestCase.runTest(TestCase.java:168)
	at org.netbeans.junit.NbTestCase.access$200(NbTestCase.java:89)
	at org.netbeans.junit.NbTestCase$2.doSomething(NbTestCase.java:345)
	at org.netbeans.junit.NbTestCase$1Guard.run(NbTestCase.java:274)
	at java.lang.Thread.run(Thread.java:619)
Comment 7 Leonid Lenyashin 2010-04-23 14:38:43 UTC
I do not see enough evidence it is a P2. Downgrading.
Comment 8 Alexey Vladykin 2010-04-29 11:19:53 UTC
Fixed "Connection cancelled" problem: http://hg.netbeans.org/cnd-main/rev/5717d914b053
Comment 9 Alexey Vladykin 2010-05-04 11:19:19 UTC
Fixed recovery after "channel is not opened" error:
http://hg.netbeans.org/cnd-main/rev/796a74f7f051
http://hg.netbeans.org/cnd-main/rev/5e5a2315043f
Comment 10 Alexey Vladykin 2010-05-04 11:20:56 UTC
I've fixed several problems with JSch session handling on CND side. Things are much better now.

However the race condition in JSch itself still needs to be addressed. I've tested JSch 0.1.43 RC1 provided by JSch developers and verified that it fixes the "Received ieof for nonexistent channel 0" problem. We should update JSch in NetBeans once 0.1.43 is released. Unlikely to happen in NetBeans 6.9 timeframe.
Comment 11 Andrew Krasny 2010-08-19 10:12:02 UTC
Currently release 0.1.43 is used (switched to it with http://hg.netbeans.org/cnd-main/rev/a17194648c9a)

It is better, but still contains many problems. One of them is sometimes shows up in our tests 'Received data for nonexistent channel'. It was described in a bug filed against JSch (https://sourceforge.net/tracker/?func=detail&aid=3047673&group_id=64920&atid=509122)

There are several ways to try to work-around this problem in our code. 

1. (used now as a hot fix) 
   changesets: cnd-main/175724:2053b11cde02   
               cnd-main/175826:8a9764127cc8

Before write anything to OutputStream of remote Channel, check that it is still alive.

This is not good solution as 
- it is still not guarantees that channel will 'die' after the check and before the write operation
- this is done in an only one place in the code. But there are other places, where write is performed about a check. Moreover, we cannot force users of our API to perform these checks on every write

2. There is a lack of 'code discipline' in JSch library which allows us to do some 'hacks'. We can override JSch, Session and ChannelExec objects and 'fix' this problem by overriding OutputStream object returned by a ChannelExec.

This solution is not very good as well, because it relies on extensibility of jsch's classes, and we must put our wrappers to libs.jsch module (to com.jcraft.jsch package) because of package-visible Session constructor...

3. We can do other hacks (like bytecode hacking ;) ) but I don't want to 

The idea for now is to wait for JSch fix (we have some time before the release) and for now just leave existent hot fix (No 1)

If there is no any fix from jsch team in time, we can use No. 2... 
Possible fix in this case is attached..
Comment 12 Andrew Krasny 2010-08-19 10:15:27 UTC
Created attachment 101521 [details]
Possible patch to overcome the 'received data for non-existent channel' problem

After applying, use new HotFixJSch() instead of new JSch() in client code
Comment 13 Leonid Lenyashin 2012-10-30 21:09:41 UTC
There was no activity on this issue for quite a long time. We apologize that the issue was not addressed so far due to lack of development resources. We might not have time in near future to fix this problem, so it is closed as WONTFIX.
If the issue is still critical for you please do not hesitate to REOPEN it.
Thank you for using our product and reporting bugs. We are really sorry that we were not able to fix this issue timely.

Regards,
CND team.
Comment 14 vkmaurya 2015-01-03 04:16:21 UTC
Hi, 
help me on following issue 

we are getting following  error 

com.jcraft.jsch.JSchException: session is down

on linux too many process id is appearing.  

550      30722 30719  0  2014 ?        00:00:00 sshd: testser@notty
550      30729 30726  0  2014 ?        00:00:00 sshd: testUser@notty
550      30754 30751  0  2014 ?        00:00:00 sshd: testUser@notty
550      30788 30785  0  2014 ?        00:00:00 sshd: testUser@notty
550      30930 30927  0  2014 ?        00:00:00 sshd: testUser@notty
550      30940 30937  0  2014 ?        00:00:00 sshd: testUser@notty
550      30960 30957  0  2014 ?        00:00:00 sshd: testUser@notty
550      30968 30965  0  2014 ?        00:00:00 sshd: testUser@notty
550      31001 30998  0  2014 ?        00:00:00 sshd: testUser@notty
550      31011 31008  0  2014 ?        00:00:00 sshd: testUser@notty
550      31143 31140  0  2014 ?        00:00:00 sshd: testUser@notty
550      31150 31147  0  2014 ?        00:00:00 sshd: testUser@notty
550      31184 31181  0  2014 ?        00:00:00 sshd: testUser@notty
550      31213 31210  0  2014 ?        00:00:00 sshd: testUser@notty
550      31229 31226  0  2014 ?        00:00:00 sshd: testUser@notty
550      31404 31401  0  2014 ?        00:00:00 sshd: testUser@notty
550      31435 31432  0  2014 ?        00:00:00 sshd: testUser@notty
550      31469     1  0  2014 ?        00:00:00 sshd: testUser@notty
550      31622 31619  0  2014 ?        00:00:00 sshd: testUser@notty
550      31652     1  0  2014 ?        00:00:00 sshd: testUser@notty
550      31883 31880  0  2014 ?        00:00:00 sshd: testUser@notty


 after killing pid  I am able to connect
Comment 15 ilia 2015-01-19 08:41:23 UTC
vkmaurya,

please create a new bug and attach a stacktrace of 
'com.jcraft.jsch.JSchException: session is down' exception,
the IDE log (http://wiki.netbeans.org/FaqLogMessagesFile)
and steps to reproduce your issue. Thanks!
Comment 16 davelhs03 2017-03-08 17:15:51 UTC
I'm also getting the channel is not opened when trying to SFTP a file.  It authenticates with the server but sending the file over is not happening...  Not sure how to get this issue resolved.  Please let me know if there's anything i can do to help or a link to the actual source and line that is causing the issue below found in var/log/uigestures


 <logger>org.netbeans.modules.php.project.copysupport.CopySupport</logger>
  <level>800</level>
  <thread>69</thread>
  <message>REMOTE copying fail: </message>
  <exception>
    <message>org.netbeans.modules.php.project.connections.RemoteException: Cannot connect to server 


<message>com.jcraft.jsch.JSchException: channel is not opened.</message>
    <frame>
      <class>com.jcraft.jsch.Channel</class>
      <method>sendChannelOpen</method>
      <line>765</line>
      <file>bundleresource://175.fwk1404521754/com/jcraft/jsch/Channel.class</file>
    </frame>
    <frame>
      <class>com.jcraft.jsch.Channel</class>
      <method>connect</method>
      <line>151</line>
      <file>bundleresource://175.fwk1404521754/com/jcraft/jsch/Channel.class</file>
    </frame>
    <frame>
      <class>com.jcraft.jsch.Channel</class>
      <method>connect</method>
      <line>145</line>
      <file>bundleresource://175.fwk1404521754/com/jcraft/jsch/Channel.class</file>
    </frame>
    <frame>
      <class>org.netbeans.modules.php.project.connections.sftp.SftpClient</class>
      <method>connectSftpClient</method>
      <line>281</line>
      <file>${netBeansDir}modules/org-netbeans-modules-php-project.jar</file>
    </frame>
    <frame>
      <class>org.netbeans.modules.php.project.connections.sftp.SftpClient</class>
      <method>init</method>
      <line>159</line>
      <file>${netBeansDir}modules/org-netbeans-modules-php-project.jar</file>
    </frame>
   <more>15</more>
  </exception>
Comment 17 davelhs03 2017-03-08 17:17:24 UTC
Also getting

<message>com.jcraft.jsch.JSchException: Session.connect: java.net.SocketTimeoutException: Read timed out</message>
Comment 18 davelhs03 2017-03-08 17:18:16 UTC
<message>REMOTE copying for project myproject failed but not disabled by user =&gt; resetting</message>
Comment 19 davelhs03 2017-03-08 17:19:31 UTC
<message>REMOTE copying for project docprocessing failed but not disabled by user =&gt; resetting</message>
<exception>
    <message>org.netbeans.modules.php.project.connections.RemoteException: Cannot connect to server myserver.</message>