Bug 233777 - Multiple GlassFish Status Task threads
Multiple GlassFish Status Task threads
Status: NEW
Product: serverplugins
Classification: Unclassified
Component: GlassFish
7.4
PC Windows 7
: P4 (vote)
: 8.0
Assigned To: TomasKraus
issues@serverplugins
: PERFORMANCE
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-08-01 12:24 UTC by Jiri Skrivanek
Modified: 2013-08-26 09:09 UTC (History)
4 users (show)

See Also:
Issue Type: DEFECT
:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jiri Skrivanek 2013-08-01 12:24:58 UTC
I see 3 GlassFish Status Task threads when IDE is running with registered GlassFish.

Product Version: NetBeans IDE Dev (Build 201307312300)
Java: 1.7.0_25; Java HotSpot(TM) 64-Bit Server VM 23.25-b01
Runtime: Java(TM) SE Runtime Environment 1.7.0_25-b15
System: Windows 7 version 6.1 running on amd64; Cp1250; en_US (nb)
Comment 1 TomasKraus 2013-08-01 12:34:10 UTC
This is OK. There are 3 checks:

 * admin port availability
 * asadmin version command
 * asadmin locations command

and all of them are asynchronous tasks executed from scheduled thread pool executor. So theoretically there can be N*3 threads where N is number of registered servers and those threads get blocked waiting for some network event.

This is there since 7.3.1 where checks were started in parallel on every refresh request.
In 7.4 it's done in background. Those tasks are usually very short - HTTP request + response. But in some cases (server startup, network issue) tasks can be seen quite long.
Comment 2 TomasKraus 2013-08-01 13:16:40 UTC
I would like to close this. I don't think it's a bug.
Comment 3 Petr Hejl 2013-08-01 13:27:36 UTC
Isn't this a scalability problem? I though there is at most one thread. Now you wrote there might be N*3. Would be good to know why such periodical check is needed and can't be done on demand.
Comment 4 TomasKraus 2013-08-01 14:24:04 UTC
It makes things to work much better - we know server status in real time.
 * Server status icons in Servers node can be up-to-date all the time, not just when user hits refresh.
 * Also this will resolve a lot of slowness reports - on demand check must wait on all those checks to finish - and with network issue it may take 30 seconds.
 * On demand check was doing all those checks too - in original 7.2 code they were serialized and network issue caused delay in minutes. I made it parallel in 7.3.1 but there were still up to 30 seconds delays.

Now tasks scheduling is following server state:
 * In OFFLINE/UNKNOWN states only admin port availabilyty check is being executed.
 * In ONLINE state all 3 checks are being executed with 2 seconds delay between them. And because ONLINE means everything works fine, they won't be alive for more than 50-200 ms depending on network delays.
 * The only phase where all 3 checks are being executed quite often and in parallel is STARTUP and SHUTDOWN state which is limited with timeouts. In this case checks are started in parallel in 2 seconds intervals. So here you can see them running all. And in case of network problem they may be blocked waiting for network timeout. But this state is triggered without user action.

So ...yes, theoretical N*3 may not sound good, but it's like quick sort which is O(n^2). But everyone is using it.
Comment 5 TomasKraus 2013-08-01 14:30:16 UTC
But this state is triggered without user action. -> But this state is never triggered without user action.

Also this kind of intensive checking was also in 7.3/7.3.1. But it was in Start/Stop/Restart Task in a cycle until timeout or server got up. Now it was just moved somewhere else.
Comment 6 Tomas Hurka 2013-08-01 15:14:13 UTC
Creating new thread for such purpose is a waste of OS resources and java heap. If cannot do the work on-demand, use RequestProcessor. It will use its own thread pool and it will crate new thread only if necessary.
Comment 7 Petr Hejl 2013-08-01 17:04:58 UTC
(In reply to comment #4)
> It makes things to work much better - we know server status in real time.
>  * Server status icons in Servers node can be up-to-date all the time, not just
> when user hits refresh.
Quite minor - nobody reported that as an issue.

>  * Also this will resolve a lot of slowness reports - on demand check must wait
> on all those checks to finish - and with network issue it may take 30 seconds.
Slowness reports are not caused by the fact something takes long time but by the code being executed in the awt blocking it. Unless you changed this root cause the slowness may happen again.

>  * On demand check was doing all those checks too - in original 7.2 code they
> were serialized and network issue caused delay in minutes. I made it parallel
> in 7.3.1 but there were still up to 30 seconds delays.

I still don't have answer why this _periodical_ check is needed. Only the first reason makes sense and it is quite minor.

The current solution has 2 major disadvantages:
- UI is not consistent
- It is wasting resources
Comment 8 TomasKraus 2013-08-01 19:29:46 UTC
(In reply to comment #7)
> (In reply to comment #4)
> >  * Also this will resolve a lot of slowness reports - on demand check must wait
> > on all those checks to finish - and with network issue it may take 30 seconds.
> Slowness reports are not caused by the fact something takes long time but by
> the code being executed in the awt blocking it. Unless you changed this root
> cause the slowness may happen again.

Yes, blocking refresh calls were called in awt on many places. Now refresh is not blocking, it just returns what is stored so it does fix root cause.

> 
> >  * On demand check was doing all those checks too - in original 7.2 code they
> > were serialized and network issue caused delay in minutes. I made it parallel
> > in 7.3.1 but there were still up to 30 seconds delays.
> 
> I still don't have answer why this _periodical_ check is needed. Only the first
> reason makes sense and it is quite minor.
>
It speeds up many things - refresh is being used on many places like deploy or UI components (server instance popup fields visibility, icons, subnodes content) and now it does not wait for requests over network.

> The current solution has 2 major disadvantages:
> - UI is not consistent
> - It is wasting resources

Major wasting of resources is the same - intensive asadmin version/locations asynchronous queries during startup and shutdown were there before. Now they are at least handled by java.util.concurrent.ScheduledThreadPoolExecutor which can handle periodical tasks much better.

> Creating new thread for such purpose is a waste of OS resources and java heap.
> If cannot do the work on-demand, use RequestProcessor. It will use its own
> thread pool and it will crate new thread only if necessary.

But I can use any ScheduledExecutorService implementation if there is better one in NetBeans.
Comment 9 Petr Hejl 2013-08-01 19:57:20 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #4)
> It speeds up many things - refresh is being used on many places like deploy or
> UI components (server instance popup fields visibility, icons, subnodes
> content) and now it does not wait for requests over network.
What happens if on the slow connection it will take 2 minutes and meanwhile user will invoke such UI. Either it will be blocked or the information won't be accurate. This does not seem to be fixing anything.
Comment 10 TomasKraus 2013-08-01 20:09:22 UTC
OK, and what do you suggest to do now? Rollback everything?
Comment 11 Petr Hejl 2013-08-02 08:19:55 UTC
(In reply to comment #10)
> OK, and what do you suggest to do now? Rollback everything?
Now:
- fix the slowness reports at appropriate places anyway
- reevaluate periodical check is needed
- use RequestProcessor
- would be nice to not use set of threads per instance

In future:
- carefully evaluate the problem so there are a strong reasons for chosen solution
- carefully check the solution is actually solving the problem
- solution should not make performance worse
- UI should be consistent - either it should be changed at all places or it should be preserved as is
Comment 12 TomasKraus 2013-08-02 10:24:50 UTC
Solution is not making performance worse. I still don't understand where you got this feeling.
Old code was executing tasks in threads the same way and it was even worse, there was no thread pool big enough to handle all of them so every time new thread was created. So there was set of threads per instance long time before I took this plugin.
Again, EVERY asadmin request WAS ALWAYS forking a new thread since this plugin was used for the first time. So why did not you rise your hand at the beginning.

I can't use single thread for everything. There are network requests using HttpURLConnection and this is blocking. Give me HTTP client based on non-blocking sockets with selector which is part of Java SE 7 before asking me to do it this way.

RequestProcessor will require some changes but I'll do that. Is is better than ScheduledThreadPoolExecutor from JDK?
Comment 13 Petr Hejl 2013-08-02 10:54:44 UTC
(In reply to comment #12)
> Solution is not making performance worse. I still don't understand where you
> got this feeling.
Because there is code being executed every 2 seconds, possibly in multiple threads.

> Old code was executing tasks in threads the same way and it was even worse,
> there was no thread pool big enough to handle all of them so every time new
> thread was created. So there was set of threads per instance long time before I
> took this plugin.
> Again, EVERY asadmin request WAS ALWAYS forking a new thread since this plugin
> was used for the first time. So why did not you rise your hand at the
> beginning.
Sorry, but I do not maintain GF plugin. I do care about the possible UI and performance regression. So if I feel something is wrong I comment on that. Does not apply to GF only.

> I can't use single thread for everything. There are network requests using
> HttpURLConnection and this is blocking. Give me HTTP client based on
> non-blocking sockets with selector which is part of Java SE 7 before asking me
> to do it this way.
I wrote that "it would be nice".

> 
> RequestProcessor will require some changes but I'll do that. Is is better than
> ScheduledThreadPoolExecutor from JDK?

I just wanted to know why is there a need to do this periodical check. As it looked like there is no one you asked for suggestions and you get them. If some of them can't be used (3 and 4 from now section) ok then. Others still apply I guess. I'm not going to spend more time on this.
Comment 14 Petr Jiricka 2013-08-02 11:20:36 UTC
One other reason why periodical code execution is bad is that this prevents the OS from swapping the code out from memory in case other processes need it. Not executing code periodically is good practice that NetBeans has strived for since the early days.
Comment 15 TomasKraus 2013-08-03 20:21:30 UTC
I'm targeting this to next release. I would like to get some feedback from users from 7.4 to see if they like this 'background checks' behaviour or not. 

Depending on this I can change monitoring framework strategy to stop server queries when there is no user activity.
But this will require status check before responding to upper layer when there was no recent monitoring activity (let's say 15 seconds) in online/offline state. So we will not get rid of delays.

But then there will be slowness issues again - for server node pop up menu all actions like start/start (debug)/stop/restart/... are dependent on current server state. And retrieving current server state may cost up to 30 seconds again! Unfortunately this is all in AWT queue.

I'm thinking about giving user a choice in next release - for few servers (up to 3) and strong enough HW he can turn this feature on. For many registered servers or weak HW this can be turned off.
Comment 16 TomasKraus 2013-08-04 19:37:07 UTC
In 7.4 I made one minor change:
-------------------------------
For local server check there is no need to run version command so there will be just admin port check and asadmin locations command.

Remote server check is already not executing locations command which makes no sense so it's just validating server version to match majon and minor version numbers.

I'm also considering to remove port check in OFFLINE state. It's a shortcut to allow direct ONLINE -> OFFLINE transition but it's wasting resources a bit.

Without port check there will be longer delay because of ONLINE -> OFFLINE_PORT -> OFFLINE transition:

 * ONLINE state: Server is marked as ONLINE for API in this state. Only location/version asadmin commands are executed depending on local/remote server. Asadmin command failure triggers ONLINE -> OFFLINE_PORT transition.

 * OFFLINE_PORT: Server is marked as OFFLINE for API in this state. Both asadmin command and admin port checks are needed in this state because there are 2 possible transitions:
   ** OFFLINE_PORT -> OFFLINE: when admin port is not listening to mark server as ready to start
   ** OFFLINE_PORT -> ONLINE: when asadmin command responded with 'our' server data to mar server as online

Another change being considered:
--------------------------------
OFFLINE_PORT will cause more intensive checks.
 * This usually means that something is happening with server - it may be starting up.
 * But it can also mean that this is not 'our server' occupying admin port.
   ** asadmin command is responding with data not matching registered server
   ** asadmin interface authentication failure

Bug# 233607 will handle asadmin interface authentication failure case. I'm working on change that allows to turn server monitoring off when authentication fails for server to stop those annoying java.net.Authenticator popups.
With reducing asadmin command checks to only on e for each server there will be always just one popup window.
Monitoring framework was already modified to allow monitoring restart on users request and to register listener to notify about leaving UNKNOWN state (need it to implement 100% reliable waiting loop on NB side).

After fixing Bug# 233607 I'll also add code to stop monitoring server when  it's not 'our one' but admin port is active.

Those changes won't remove background scans on 100% but they will be minimized to monitor active servers and notify about servers coming up when there was no issue.

Maximum possible amount of threads will be 2 - less than in old 7.3 code during startup! And only when there is something going on. Stable states (OFFLINE/ONLINE) won't run more than 1 thread every 5-10 seconds (this is one constant that I can tune to whatever we'll agree on).

Petr, this won't fully remove background checks in 7.4 but it will at least try to reduce them as much as possible without doing very large changes.
Comment 17 Quality Engineering 2013-08-07 02:25:48 UTC
Integrated into 'main-silver', will be available in build *201308062300* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)

Changeset: http://hg.netbeans.org/main-silver/rev/bd08db9052c5
User: Tomas Kraus <TomasKraus@netbeans.org>
Log: #233777 - Did some cleanup in GlassFishState#getStatus() method
Comment 18 TomasKraus 2013-08-26 09:09:03 UTC
I'm lowering this to P4 after some discussion with Petr Hejl. Count of background threads was lowered to have just one in stable states (ONLINE/OFFLINE) with longer period (6 s). Node icons are not updated automatically.

For the future I'll try to implement another monitoring strategy to allow stop periodic tasks in stable states and also to reduce or stop tasks even in unstable states when port is occupied by process that is not registered server.

Targeting for next release.


By use of this website, you agree to the NetBeans Policies and Terms of Use. © 2012, Oracle Corporation and/or its affiliates. Sponsored by Oracle logo