252073 – Can expand no explorer nodes if one directory is very slow

This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 252073 - Can expand no explorer nodes if one directory is very slow

Summary: Can expand no explorer nodes if one directory is very slow

Status:	RESOLVED FIXED

Alias:	None

Product:	platform
Classification:	Unclassified
Component:	Data Systems (show other bugs)
Version:	8.1
Hardware:	All All

Importance:	P1 normal (vote)
Assignee:	Jaroslav Havlin

URL:
Keywords:

Depends on:
Blocks:

Reported:	2015-04-24 12:15 UTC by Vladimir Kvashin
Modified:	2015-09-16 06:53 UTC (History)
CC List:	6 users (show)

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
IDE log and full thread dump (53.27 KB, text/plain) 2015-04-24 12:15 UTC, Vladimir Kvashin	Details
Proposed fix (1010 bytes, patch) 2015-04-24 12:16 UTC, Vladimir Kvashin	Details \| Diff
Sketch of possible fix (request processor per mount point) (4.87 KB, patch) 2015-09-08 15:00 UTC, Jaroslav Havlin	Details \| Diff
Possible fix (request processor per FileSystem instance) (5.48 KB, patch) 2015-09-09 14:43 UTC, Jaroslav Havlin	Details \| Diff
Proposed Patch (3.14 KB, patch) 2015-09-10 06:49 UTC, Jaroslav Havlin	Details \| Diff
Proposed Patch v2 (10.55 KB, patch) 2015-09-11 10:44 UTC, Jaroslav Havlin	Details \| Diff
Extended patch (6.39 KB, patch) 2015-09-11 15:02 UTC, Jaroslav Havlin	Details \| Diff
Show Obsolete (4) Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Vladimir Kvashin 2015-04-24 12:15:10 UTC

Created attachment 153367 [details]
IDE log and full thread dump

Consider a situation when a certain directory is very slow (for example is a broken link). Once I tried to expand such directory, no other directory can be expanded. In my case it hanged for 5 minutes.

Comment 1 Vladimir Kvashin 2015-04-24 12:16:49 UTC

Created attachment 153368 [details]
Proposed fix

Comment 2 Vladimir Kvashin 2015-04-24 12:20:00 UTC

The attached fix solves the problem. The 1-threaded request processor was introduced when fixing issue 200752, previously such requests were posted into default request processor.

Comment 3 Vladimir Kvashin 2015-05-13 07:37:24 UTC

Jarda, could you please have a look on this? 
The solution might be as simple as attached patch is (although I'm not 100% sure)

Comment 4 Jaroslav Havlin 2015-05-14 08:16:06 UTC

The patch seems fine to me. Feel free to commit this change.
Thank you.

Comment 5 Vladimir Kvashin 2015-05-14 11:09:30 UTC

Pushed into cnd-main:
http://hg.netbeans.org/cnd-main/rev/5196ecaae73b

Comment 6 Quality Engineering 2015-05-15 02:40:41 UTC

Integrated into 'main-silver', will be available in build *201505150001* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)

Changeset: http://hg.netbeans.org/main-silver/rev/5196ecaae73b
User: Vladimir Kvashin <vkvashin@netbeans.org>
Log: fixing #252073 - Can expand no explorer nodes if one directory is very slow

Comment 7 Jaroslav Havlin 2015-09-07 13:35:28 UTC

I'm sorry, I have to revert the fix.
It turned out that it causes a regression (bug 254845) and that FolderChildren were not designed to run in multiple threads.

To fix the problem, we would need to ensure that FolderChildren tasks for a folder are always processed by the same thread (+ apply fix suggested in bug 254845).

The change could be quite risky now, so I'm also adding 8.1_WAIVER_REQUEST keyword.

Comment 8 Jaroslav Havlin 2015-09-07 13:52:26 UTC

http://hg.netbeans.org/core-main/rev/cc9d38e2cdbe
(Commit reverting the fix.)

Comment 9 Quality Engineering 2015-09-08 01:22:34 UTC

Integrated into 'main-silver', will be available in build *201509080002* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)

Changeset: http://hg.netbeans.org/main-silver/rev/cc9d38e2cdbe
User: Jaroslav Havlin <jhavlin@netbeans.org>
Log: #252073: Revert the fix

Comment 10 Jaroslav Havlin 2015-09-08 15:00:06 UTC

Created attachment 155971 [details]
Sketch of possible fix (request processor per mount point)

Comment 11 Jaroslav Havlin 2015-09-09 11:17:41 UTC

The fix is getting quite complex. We need to ensure that we are fixing the actual problem.

The basic idea of the proposed fix is to have a dedicated request processor for each mounted drive. So if access to some network drive (or broken USB flash disk) is slow, directories in other drives can still be expanded.
The advantage is that FolderChildren tasks do not compete for I/O access, and that one slow drive does not block tasks related to other drives.

The problems are:
 - Mount points need to be parsed from
   java.nio.file.FileStore.toString() - unreliable
 - It is not possible to listen to changes in available
   file stores (so we need some heuristics to guess when
   update of mount point list is required)
 - If a file is moved, tasks for old and new paths will be
   executed by different processors, we need to solve this somehow

Maybe there is some simpler and safer fix.

> Consider a situation when a certain directory is very slow (for example is
> a broken link).
Can you please describe the case that you encountered? Was the slow directory on a remote drive? Or a link on local drive referenced a folder on remote drive?
What are reproducible steps to reproduce?

Thank you.

Comment 12 Vladimir Voskresensky 2015-09-09 11:38:20 UTC

Our case is a Development against remote host. 
So, the simpler way to distinguish mount points is just check FileSystem owner of FileObject.
Dedicated Processor for each FS impl instance would solve the issue.
Then getting children for remote folder of Host1 (managed by instance_1 of RemoteFS) would not block MasterFS, ZipFS and other instances of connected RemoteFS

Comment 13 Jaroslav Havlin 2015-09-09 14:43:25 UTC

Created attachment 156019 [details]
Possible fix (request processor per FileSystem instance)

Comment 14 Jaroslav Havlin 2015-09-09 15:04:04 UTC

> So, the simpler way to distinguish mount points is just check FileSystem owner
> of FileObject.
Great, it makes the implementation much simpler.

Please review the attached patch.

Probably the most controversial change is calling setKeys in synchronized method (FolderChildren.applyKeys). If the folder is moved from one filesystem to another, two FolderChildren tasks can be posted to different request processors. We need to ensure that only the last created task's result is used, otherwise outdated keys might be applied.

Existing tests pass. I'll also try to prepare a test for reported bug and thoroughly check if the change can cause some deadlocks.

Thank you.

Comment 15 Jaroslav Havlin 2015-09-09 15:59:41 UTC

I'm sorry, the synchronization causes a deadlock. The last patch cannot be used.

Comment 16 Jaroslav Havlin 2015-09-10 06:49:27 UTC

Created attachment 156032 [details]
Proposed Patch

Updated patch.

In the original code, serial execution of refresh tasks for a FolderChildren was ensured by posting all tasks to the same request processor (shared by all FolderChildren instances).

In the latest patch, serial execution of refresh tasks is ensured by waiting for previously posted tasks (regardless of to which request processor they have been posted). There is only one new synchronized block, which contains assignment and posting of the task to request processor, so there is no new risk of deadlocks.

Some slow data system can block a fast one if a folder was moved from the slow one the fast one. Tasks in "fast" request processor need to wait for finishing of task posted previously to the "slow" request processor. This seems acceptable and natural.

Data System Tests and commit-validation pass if the patch is applied.

Please review the update patch.

Comment 17 Martin Entlicher 2015-09-10 15:20:07 UTC

I've checked usages of DataNodeUtils.reqProcessor(), which should not be used any more for tasks that can touch files on the underlying filesystem.
The usages are:

1) NewTemplateAction.RootChildren.updateKeys()
   - probably O.K., it access the system FS.
2) DataNode.ExtensionProperty.setValue()
   DataNodeUtils.reqProcessor().post... should be replaced with:
   DataNodeUtils.reqProcessor(obj.getPrimaryFile()).post...
3) DataNode.setShowFileExtensions()
   - this is a bit problematic, it refreshes the display name of all files
     in one RP. If some file access blocks, it will block the generic
     DataNodeUtils.reqProcessor().
   The solution might be to collect all active DOs and find their corresponding
   RPs via DataNodeUtils.reqProcessor(obj.getPrimaryFile()). Then post
   the display name update in the collected RPs for their DOs.
   I'm not sure about a necessity of locking in case DOs primary file happens
   to change in the mean time.
4) DataNode.PropL.annotationChanged()
   It posts new NamesUpdater() into the generic DataNodeUtils.reqProcessor(),
   I'm not sure if this can block somewhere on a file access or not.
   Due to fireChangeAccess() call, I'm afraid we can never be sure... :-(
   Ideally, a logic similar to (3) would have to be applied.
5) FolderChildren.waitRefresh() - this method should be removed,
   it has no usages, I hope that nobody calls it by reflection...
   It's probable that it was used in logging in the past only
   (according to hg annotations).
6) FolderChildren.refreshChildren()
   This is addressed by the attached patch
7) DelayedNode.scheduleRefresh()
   It should be verified, that DelayedNode.run() does not touch file state.
   If in doubt, replace:
   task = DataNodeUtils.reqProcessor().post(this);
   with:
   task = DataNodeUtils.reqProcessor(pair.primaryFile).post(this);

There is one more issue inside FolderChildren.R.run():
if (op == RefreshMode.DEEP) then MUTEX.postWriteRequest(this) is called.
Since the JavaDoc of postWriteRequest() warns that this method blocks (despite it's name), it needs to be investigated if it can really block in such a way that simultaneous refreshes of other filesystems will end up waiting here. If this is the case, then tasks R.run() would run in different RPs, but wait for each other anyway on MUTEX.postWriteRequest().

There is also a danger, that R.run(), which is running under MUTEX write access and blocks on some file access, can block anyone who is using MUTEX for operations on other filesystems. That can completely cancel the effect of this fix.

Comment 18 Jaroslav Havlin 2015-09-11 10:44:09 UTC

Created attachment 156089 [details]
Proposed Patch v2

Thank you very much, Martin, for thorough review.

I have tested whether the fix solves the reported problem in the real world at all (by making some folders slow - temporary modification of "get-children" code), and it does. During expanding of directory tree, slow folders do not block fast folders.

(I've also added a unit test for this case.)

Vladimir, can you please verify that the patch also solves your use-case with remote hosts?

I'm going to prepare extended (full) fix that addresses other issues mentioned by Martin.

Question is whether we should apply just this basic fix and wait with full fix until next release, or apply full fix as soon as possible, or postpone any modifications completely. (Basic fix should not break anything, but it solves the problem only partially. Full fix might be more risky, as it covers more cases.)

Comment 19 Jaroslav Havlin 2015-09-11 15:02:34 UTC

Created attachment 156107 [details]
Extended patch

Extension to proposed patch.

> 2) DataNode.ExtensionProperty.setValue()
Updated to use dedicated request processor.

> 3) DataNode.setShowFileExtensions()
DataObjects collected and processed in dedicated request processors.

> 4) DataNode.PropL.annotationChanged()
The same solution as for 3).

> 5) FolderChildren.waitRefresh() - this method should be removed
Removed.

> 7) DelayedNode.scheduleRefresh()
Updated to use dedicated request processor.

> if (op == RefreshMode.DEEP) then MUTEX.postWriteRequest(this) is called.
Slow operation can be run under MUTEX's write lock. I'm not sure how this could be fixed. This case doesn't seem to be very frequent.

Comment 20 Jaroslav Havlin 2015-09-14 07:25:53 UTC

(In reply to Jaroslav Havlin from comment #18) 
> Question is whether we should apply just this basic fix and wait with full
> fix until next release, or apply full fix as soon as possible, or postpone
> any modifications completely.
I would prefer to apply only the basic fix now and to include the extended fix in the next release. Do you agree?

Comment 21 Vladimir Kvashin 2015-09-14 12:26:03 UTC

(In reply to Jaroslav Havlin from comment #20)
...
> I would prefer to apply only the basic fix now and to include the extended
> fix in the next release. Do you agree?

I tried (Proposed Patch v2 + Extended patch), it seems to work fine. (I don't mean I made a comprehensive testing, just opened a very slow folder in favorites and checked that there are no hangups on other file systems).

By basic fix you mean this combination I tested, right?

Comment 22 Jaroslav Havlin 2015-09-14 12:41:32 UTC

(In reply to Vladimir Kvashin from comment #21) 
> I tried (Proposed Patch v2 + Extended patch), it seems to work fine. (I
> don't mean I made a comprehensive testing, just opened a very slow folder in
> favorites and checked that there are no hangups on other file systems).
Thank you very much.

> By basic fix you mean this combination I tested, right?
I'm sorry, no, by basic fix I mean only Proposed Patch v2.

Comment 23 Vladimir Kvashin 2015-09-14 14:52:42 UTC

(In reply to Jaroslav Havlin from comment #22)
> I'm sorry, no, by basic fix I mean only Proposed Patch v2.
The "Proposed Patch v2", without the "Extended patch" seems to work OK either.

Comment 24 Jaroslav Havlin 2015-09-15 07:18:33 UTC

> The "Proposed Patch v2", without the "Extended patch" seems to work OK either.
Thank you, Vladimir.

Proposed Patch v2 has been integrated:
http://hg.netbeans.org/core-main/rev/72a205c91510

I've created a separate issue for the Extended patch, see bug 255285.

Comment 25 Vladimir Kvashin 2015-09-15 07:40:19 UTC

Jarda, thank you very much!

Comment 26 Jiri Kovalsky 2015-09-15 15:50:10 UTC

Is this waiver request still valid?

Comment 27 Quality Engineering 2015-09-16 01:27:18 UTC

Integrated into 'main-silver', will be available in build *201509160002* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)

Changeset: http://hg.netbeans.org/main-silver/rev/72a205c91510
User: Jaroslav Havlin <jhavlin@netbeans.org>
Log: #252073 - Can expand no explorer nodes if one directory is very slow

Comment 28 Jaroslav Havlin 2015-09-16 06:53:28 UTC

(In reply to Jiri Kovalsky from comment #26)
> Is this waiver request still valid?
No, waiving is not needed any more.
I've just removed the keyword.
Thank you, Jirka.