70939 – Long delay after an application built on top of NB is started

Summary: Long delay after an application built on top of NB is started

Status:	RESOLVED FIXED

Alias:	None

Product:	platform
Classification:	Unclassified
Component:	-- Other --
Version:	5.x
Hardware:	All All

Importance:	P3 blocker
Assignee:	Jaroslav Tulach

URL:
Keywords:	API_REVIEW_FAST, PERFORMANCE

Depends on:	44083
Blocks:

Reported:	2006-01-03 11:57 UTC by Martin Krauskopf
Modified:	2008-12-22 22:08 UTC
CC List:	3 users

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
threaddump during idling (3.21 KB, text/plain) 2006-01-03 11:58 UTC, Martin Krauskopf
property API Based Fix (2.91 KB, patch) 2006-01-05 14:37 UTC, Jaroslav Tulach
Patch that I wish to apply tomorrow (8.82 KB, patch) 2006-01-11 16:14 UTC, Jaroslav Tulach

Description Martin Krauskopf 2006-01-03 11:57:35 UTC

It happens to me quite often that after I start an application built on top of
NetBeans there is a long delay before the application actually begins to start. As I
was told by Jarda, it is because of "secure random" generation. But it is quite
annoying since it happens almost every time. If I don't touch anything after
running my application, nothing happens for about 90s (then I give up and move
the mouse).
Note that I use the mouse really rarely (the keyboard doesn't help much). I don't know
the magic behind the scenes.
Attaching threaddump....

Comment 1 Martin Krauskopf 2006-01-03 11:58:29 UTC

Created attachment 28142 [details]
threaddump during idling

Comment 2 Jaroslav Tulach 2006-01-04 09:36:31 UTC

According to the bug priority guidelines, I would prefer the following
category: "Part of a product feature is affected, a viable workaround exists"
- as such there is no reason why this should be P2.

Comment 3 Jaroslav Tulach 2006-01-04 09:42:40 UTC

Another interesting question is whether anyone else has seen such behaviour,
plus whether the behaviour happens only on Linux or on any operating system.
As the generation of secure random is potentially a system-dependent call, it
may turn out that the problem occurs only on Linux boxes.

Comment 4 Martin Krauskopf 2006-01-04 10:08:16 UTC

I would prefer P2 until we are sure that it is really a rare case, since it could
affect all applications built on top of NetBeans, and maybe NetBeans itself. I
also think that "pressing Enter" on an application and waiting for it to start is
not such a rare case among developers - I believe there are a lot of them who
tend to use the mouse as little as possible. When you start an application with
a mouse you will presumably not encounter this problem, since you click and move
the mouse at least a little, which is probably enough to satisfy the "random generator".
So far it seems to me that the random generator is heavily mouse-dependent, which is
strange according to what you told me personally (e.g. it should also be
"network-dependent").
Probably the performance team should decide, or HIE? I'm not sure. Maybe I'm too
biased since I encounter this very often, using shortcuts in the IDE and
keyboard only outside of it.

Comment 5 Antonin Nebuzelsky 2006-01-04 11:01:38 UTC

> According to bug priority guidelines I would most of all prefer following
> category: "Part of a product feature is affected, a viable workaround exists"
> - as such there is no reason why this should be P2.

Performance bug priority criteria apply to performance issues:
http://performance.netbeans.org/processes/bug_priority_guidelines.html

As such, this might even be P1. ;)

> I would prefer P2 until we are sure that it is really rare case.

We discussed this issue at the performance meeting and agreed that this should stay
P2 and that the waiver process will help us make sure that this is really a rare case. If
no one in Netcat encountered this issue, it may be OK to waive this.

Comment 6 Jaroslav Tulach 2006-01-04 12:42:56 UTC

I am not going to accept this as a performance problem until a way to
measure the problem is found. Sorry, that judgement cannot be made just on
your feelings. If the problem is a performance one, provide a reliable way
to measure it and show that it appears on all supported platforms.

Until then we cannot apply the performance dashboard, just the regular functional one,
and that is clear - "viable workaround exists" - that means it is P3. Please
do your homework and provide reliable and repeatable performance measurements
before trying to raise the priority again.

Comment 7 Martin Krauskopf 2006-01-04 14:23:04 UTC

I just see this as a real problem from the user's (my) point of view. If you,
as a developer of that code, feel that this is P3 since the user can simply
deduce that they should move the mouse or play with their keyboard after
they start an application, leave it P3.

Anyway, I at least put a simple measurement into CLIHandler (luckily the
user had the NB sources checked out :)) at the place where the code "stops", and it gives
me the following results for more or less consecutive runs. I didn't
touch anything after I started the application and before the splash screen
appeared.

The problem probably shows up when the application is started again soon after
the previous run - so this is the reason why P3 would be eligible - but that is
just a guess so far. Probably the random generator needs some time to relax.

SecureRandom.getInstance("SHA1PRNG").nextBytes(arr) call in CLIHandler takes:

6-beta-b59c
 120msec (first start)
 19126msec
 28690msec
 19954msec
 15474msec
 12648msec
 22952msec
 20462msec

1.4.2_10
 26658msec
 26714msec
 20488msec

1.5.0_06
 25688msec
 25066msec
 13131msec
 17895msec
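
For reference, a timing like the one above can be reproduced with a small standalone harness. This is a minimal sketch, not the instrumentation that was actually placed in CLIHandler: the class name `SecureRandomTiming`, the 10-byte array length and the use of `System.nanoTime()` are assumptions for illustration.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Hypothetical harness; the real measurement lived inside CLIHandler.
class SecureRandomTiming {
    /** Fills arr from a fresh SHA1PRNG instance and returns the elapsed milliseconds. */
    static long timeNextBytes(byte[] arr) {
        long start = System.nanoTime();
        try {
            // Same call as quoted above; the first nextBytes() forces the generator to seed itself.
            SecureRandom.getInstance("SHA1PRNG").nextBytes(arr);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA1PRNG is not available", e);
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        byte[] arr = new byte[10]; // array length is an assumption
        System.out.println(timeNextBytes(arr) + "msec");
    }
}
```

Running it twice in quick succession, without touching mouse or keyboard, would be the closest equivalent of the consecutive runs listed above.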

Comment 8 Jesse Glick 2006-01-04 19:10:52 UTC

I have also noticed longish delays (on Linux) on occasion, but not routinely.

Comment 9 Jaroslav Tulach 2006-01-05 14:37:01 UTC

Created attachment 28205 [details]
property API Based Fix

Comment 10 Jaroslav Tulach 2006-01-05 14:37:48 UTC

Martin, does it work? Others, do you like such an API?

Comment 11 Jesse Glick 2006-01-05 15:59:44 UTC

Well here's the threat model. Person A is running NB and developing modules
(against the full IDE, not just the NB Platform). Person B is on the same LAN as
A (or can otherwise connect to A's computer directly) and also A and B share a
common filespace (e.g. network share drive). A's system clock is accurate to +-
15 seconds (using NTP, say).

B constructs a module containing some evil ModuleInstall.restored(). B then
waits for A to launch his app from NB (say, by watching through a window). When
A launches it, B starts a script which runs through all 30000 or so possible
values of the seed for new Random() and tries to connect to A's app using keys
based on those seeds. Assuming a connection attempt can be made in 50msec if A's
computer is fast, B can succeed within about 25 minutes (assuming A does not
shut down the app within that time). If B succeeds he can control A's computer
by running --reload .../evil.jar (though A may notice something weird going on).
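
The numbers in this threat model can be checked with a few lines of arithmetic. The sketch below assumes, as the paragraph above implies, one candidate seed per millisecond of clock uncertainty and a sequential search; the class and method names are hypothetical.

```java
// Back-of-the-envelope check of the figures above; all names are illustrative.
class SeedSearchEstimate {
    /** A clock accurate to +- N seconds leaves a 2N-second window, one seed per millisecond. */
    static long candidateSeeds(long clockErrorSeconds) {
        return 2 * clockErrorSeconds * 1000L;
    }

    /** Worst-case time, in whole minutes, to try every seed one after another. */
    static long worstCaseMinutes(long seeds, long millisPerAttempt) {
        return seeds * millisPerAttempt / 60_000L;
    }

    public static void main(String[] args) {
        long seeds = candidateSeeds(15);
        System.out.println(seeds + " seeds, " + worstCaseMinutes(seeds, 50) + " minutes");
    }
}
```

With the values from the comment (15 seconds, 50msec per attempt) this gives 30000 seeds and 25 minutes, matching the estimate above.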

Suggested countermeasure:

Index: core/bootstrap/src/org/netbeans/CLIHandler.java
*** CLIHandler.java Base (1.31)
--- CLIHandler.java Locally Modified (Based On 1.31)
***************
*** 810,815 ****
--- 810,816 ----
          private Socket work;
          private static volatile int counter;
          private final boolean failOnUnknownOptions;
+         private static long doPrdele = 1000L;
          
          public Server(byte[] key, Integer block, Collection handlers, boolean failOnUnknownOptions) throws IOException {
              super("CLI Requests Server"); // NOI18N
***************
*** 984,990 ****
--- 985,999 ----
              } else {
                  enterState(103, block);
                  os.write(REPLY_FAIL);
+                 try {
+                     Thread.sleep(doPrdele);
+                 } catch (InterruptedException e) {
+                     stopServer();
                  }
+                 if (doPrdele * 2 > 0L) {
+                     doPrdele *= 2;
+                 }
+             }
              
              
              enterState(120, block);
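
Taken out of the diff, the doubling delay can be sketched as a standalone class. This is only an illustration of the idea in the patch above, not the code that was committed: the class name `FailureBackoff` and the `nextDelay()` method are hypothetical, while the 1000 ms starting value and the overflow guard mirror `doPrdele`.

```java
// Illustrative stand-in for the doPrdele logic in the patch above.
class FailureBackoff {
    private long delayMillis = 1000L; // same starting value as doPrdele

    /** Returns the delay for this failed attempt, then doubles it for the next one. */
    synchronized long nextDelay() {
        long current = delayMillis;
        // Doubling a long eventually overflows to a negative value; keep the last valid delay instead.
        if (delayMillis * 2 > 0L) {
            delayMillis *= 2;
        }
        return current;
    }
}
```

A server would call `Thread.sleep(backoff.nextDelay())` after writing the failure reply, so each wrong key makes the next guess twice as slow.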

Comment 12 Martin Krauskopf 2006-01-05 16:10:24 UTC

As I understand it, this will help only when I run a suite (or an app on top
of NB) from within the IDE, or with the help of Ant; in other words, during the
development phase. On the other hand, it will not help a final application at all
(unless this magic property is documented somewhere).
As I said, one probably notices this mainly during the development phase, so it solves
the (my) main problem.
But until anybody understands what really happens behind the scenes, one may only
guess that the problem doesn't appear in final applications (assuming that
repeated starts within a short time are an edge case). So it is just my guess that this
patch is OK under "normal" circumstances for others :)
Anyway the patch is enough *for me*.

Comment 13 Jaroslav Tulach 2006-01-11 16:14:33 UTC

Created attachment 28352 [details]
Patch that I wish to apply tomorrow

Comment 14 Jaroslav Tulach 2006-01-11 16:22:16 UTC

I'd like to apply the new patch (with docs and test) as it fixes Martin's
problem and does not change the behaviour of the production version of NetBeans.

Re Jesse: If I understand correctly, you suggest delaying the response if there is
an unauthenticated access to our port. Possibly also useful: every time
someone connects to our port and does not know the key, we double the response
time for such requests. That is a nice enhancement. However, I would prefer the
property patch because it does not change the production code.

Comment 15 Jesse Glick 2006-01-11 22:43:59 UTC

My point is that by applying this patch as is, you introduce a (theoretical)
security vulnerability for module developers which did not exist before. That is
why I recommend that it not be committed without a corresponding fix for the
security problem, such as an exponential delay to discourage cracking attempts.
Whether the delay is used even for "production" startups (with SecureRandom) is
less important for now.

Comment 16 _ rkubacki 2006-01-13 10:43:08 UTC

It happens to me as well on my new Ultra 20 / Linux FC4 x64. Start the IDE and
do nothing (no mouse movements, no keyboard input), quit it when it is up with
Alt-F X and start it again. Now it takes up to 10 seconds to generate the next
random number.

Comment 17 Jaroslav Tulach 2006-01-17 15:46:54 UTC

The security concerns were addressed by: 
 
Checking in src/org/netbeans/CLIHandler.java; 
/cvs/core/bootstrap/src/org/netbeans/CLIHandler.java,v  <--  CLIHandler.java 
new revision: 1.34; previous revision: 1.33 
done 
RCS 
file: /cvs/core/bootstrap/test/unit/src/org/netbeans/CLIHowHardIsToGuessKeyTest.java,v 
done 
Checking in test/unit/src/org/netbeans/CLIHowHardIsToGuessKeyTest.java; 
/cvs/core/bootstrap/test/unit/src/org/netbeans/CLIHowHardIsToGuessKeyTest.java,v  
<--  CLIHowHardIsToGuessKeyTest.java 
initial revision: 1.1 
 
I'll go ahead and commit the fix.

Comment 18 Jaroslav Tulach 2006-01-17 16:05:15 UTC

/cvs/core/bootstrap/src/org/netbeans/CLIHandler.java,v  <--  CLIHandler.java 
new revision: 1.35; previous revision: 1.34 
done 
Checking in apisupport/harness/release/run.xml; 
/cvs/apisupport/harness/release/run.xml,v  <--  run.xml 
new revision: 1.15; previous revision: 1.14 
done 
RCS 
file: /cvs/core/bootstrap/test/unit/src/org/netbeans/CLIDoesNotQuerySecureRandomTest.java,v 
done 
Checking in 
core/bootstrap/test/unit/src/org/netbeans/CLIDoesNotQuerySecureRandomTest.java; 
/cvs/core/bootstrap/test/unit/src/org/netbeans/CLIDoesNotQuerySecureRandomTest.java,v  
<--  CLIDoesNotQuerySecureRandomTest.java 
initial revision: 1.1 
done 
Checking in apisupport/harness/arch.xml; 
/cvs/apisupport/harness/arch.xml,v  <--  arch.xml 
new revision: 1.3; previous revision: 1.2

Comment 19 Petr Nejedly 2006-01-25 15:30:41 UTC

Jesse,
Your security threat model has one moot point:
> Person B is on the same LAN as A (or can otherwise connect to A's computer
> directly) 

This is not enough. The IDE's socket is opened on INADDR_LOOPBACK, not
INADDR_ANY, so the attacker would have to be on the same _machine_ as the
victim* and that is hardly a threat we need to address.

Based on the fact above, I'd even suggest getting rid of the cryptorandom
altogether. It would speed up startup and slightly reduce the number of loaded classes.
What does the perf team think about it?

*) OK, there is a theoretical possibility of spoofing packets with target
address=127.0.0.1 through the victim's external network interface (eth0).
But this case is guarded by about every firewall by default.

Note: I realized that while we try to find 127.0.0.1 among the machine's interfaces,
we may actually fail to do so and end up listening on a public address of the
host. So there are two options here: disable CLIHandler completely in that (I
strongly believe rare) case, or fall back to cryptorandom then.
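
To illustrate the loopback-only binding discussed here, the following minimal sketch opens a server socket that accepts connections from the local machine only. It is not the CLIHandler code: the class name is hypothetical, and it uses `InetAddress.getLoopbackAddress()` (Java 7 and later) instead of searching the machine's interfaces for 127.0.0.1 as described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical example; CLIHandler's own address lookup is different.
class LoopbackOnlyServer {
    /** Opens a server socket on an ephemeral port, bound to the loopback address only. */
    static ServerSocket open() {
        try {
            return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Opens the socket, reports whether it is really bound to loopback, and closes it. */
    static boolean boundToLoopback() {
        try (ServerSocket server = open()) {
            return server.getInetAddress().isLoopbackAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bound to loopback: " + boundToLoopback());
    }
}
```

A check like `boundToLoopback()` is one way to detect the fallback case mentioned in the note, where the listener would otherwise end up on a public address.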

Comment 20 Jesse Glick 2006-01-25 18:36:54 UTC

"the attacker would have to be on the same _machine_ as the victim and that is
hardly a threat we need to address" - this is a common enough scenario on
multiuser Unix machines. E.g. SunRays. Surely these are a small minority of our
users, but I would feel uncomfortable intentionally weakening security even for
such a minority.

Comment 21 Jaroslav Tulach 2006-08-10 05:23:35 UTC

The friend API was removed and the SecureRandom access was made asynchronous as part
of the fix for issue 44083.
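
The general idea of an asynchronous SecureRandom access can be sketched as follows. This is an assumption-laden illustration, not the change made for issue 44083 (which is not shown in this report): the class name `AsyncKey`, the use of `CompletableFuture` and the default `new SecureRandom()` are all choices made for the example.

```java
import java.security.SecureRandom;
import java.util.concurrent.CompletableFuture;

// Illustrative only; the actual fix in issue 44083 may be structured differently.
class AsyncKey {
    /** Starts key generation on a background thread so startup does not wait for entropy. */
    static CompletableFuture<byte[]> generate(int length) {
        return CompletableFuture.supplyAsync(() -> {
            byte[] key = new byte[length];
            new SecureRandom().nextBytes(key); // may block while the generator seeds itself
            return key;
        });
    }

    public static void main(String[] args) {
        CompletableFuture<byte[]> pending = generate(10); // key length is an assumption
        // ... the rest of startup would continue here ...
        byte[] key = pending.join(); // wait only at the point where the key is first needed
        System.out.println("key bytes: " + key.length);
    }
}
```

The delay is not eliminated this way; it is moved off the startup path and only becomes visible if the key is requested before generation has finished.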