35847 – Studio hangs during unmount of filesystem

This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Summary: Studio hangs during unmount of filesystem

Status:	VERIFIED FIXED

Alias:	None

Product:	platform
Classification:	Unclassified
Component:	Lookup
Version:	3.x
Hardware:	PC Windows ME/2000

Importance:	P1 blocker
Assignee:	Jaroslav Tulach

URL:
Keywords:	THREAD

Depends on:
Blocks:

Reported:	2003-08-30 03:13 UTC by Todd Fast
Modified:	2008-12-23 11:42 UTC
CC List:	14 users

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
Thread dump after hang (20.99 KB, text/plain) 2003-08-30 03:15 UTC, Todd Fast
Prevents calling to Nodes from Folder Recognizer Thread (1.10 KB, patch) 2003-09-04 15:17 UTC, Jaroslav Tulach
Several other thread dumps from hangs that we suspect are related. Some contain S1AF module classes and some do not. (30.04 KB, application/octet-stream) 2003-09-09 11:54 UTC, Todd Fast
sample war file (996.82 KB, application/octet-stream) 2003-09-23 22:46 UTC, _ hlu
stack trace (20.30 KB, text/plain) 2003-09-23 22:48 UTC, _ hlu
Hang apparently in JavaDataObject.dispose() (32.40 KB, text/plain) 2003-09-24 11:18 UTC, Todd Fast
Two full thread dumps of the deadlock. Build 090322. (50.72 KB, text/plain) 2003-09-30 16:57 UTC, Jan Lahoda
Thread dump showing unmount hang with no JATO stack frames (21.45 KB, text/plain) 2003-10-02 00:25 UTC, Todd Fast

Description Todd Fast 2003-08-30 03:13:43 UTC

(I am filing this bug in the OpenIDE category even though 
this could also be related to the Java module, the Web 
module, and/or other modules in the IDE.  It is difficult 
to know which has the primary responsibility in this case.)

We are seeing frequent hangs when unmounting a filesystem 
mounted through our module (a derivative of the Web 
module).  The hangs are complete deadlocks and seem to 
occur between the Swing thread and one or more threads 
trying to look up object instances.  Note that I do not mean 
to imply that this is only a problem when working with our 
module, it's just that testing our module relies on 
mounting and unmounting a large number of filesystems--this 
could be a problem unmounting any filesystem but we do not 
test that case on a regular basis.

In the attached stack trace, you can see that the 
RunToCursorAction.enable() method on the Swing thread is 
doing a lookup of a debugger instance at the same time the 
Java source parsing thread is looking up a compiler type, 
while several other threads are accessing the FolderList 
and other assorted objects with Mutexes.
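To make the shape of such a "complete deadlock" concrete, here is a minimal, hypothetical Java sketch (not NetBeans code). Two threads, named after the AWT and Folder Recognizer threads purely for illustration, take two plain monitors in opposite order. The JDK's ThreadMXBean.findMonitorDeadlockedThreads() then reports the cycle, which is the same information one otherwise has to dig out of a thread dump by hand. The two lock objects are stand-ins, not the actual IDE resources.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

/** Hypothetical demo of a lock-order inversion; not NetBeans code. */
public class LockInversionDemo {

    /** Starts two daemon threads that take two locks in opposite order and
     *  returns true once the JVM reports both of them as monitor-deadlocked. */
    public static boolean provokeAndDetect() throws InterruptedException {
        // Stand-ins for two IDE resources, e.g. a lookup lock and a children mutex.
        final Object lookupLock = new Object();
        final Object childrenLock = new Object();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread awt = new Thread(() -> lockInOrder(lookupLock, childrenLock, bothHoldFirstLock),
                "simulated AWT-EventQueue");
        Thread recognizer = new Thread(() -> lockInOrder(childrenLock, lookupLock, bothHoldFirstLock),
                "simulated Folder Recognizer");
        // Daemon threads, so the permanently blocked pair does not keep the JVM alive.
        awt.setDaemon(true);
        recognizer.setDaemon(true);
        awt.start();
        recognizer.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {                     // poll for up to about 5 seconds
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock exists
            if (ids != null
                    && Arrays.stream(ids).anyMatch(id -> id == awt.getId())
                    && Arrays.stream(ids).anyMatch(id -> id == recognizer.getId())) {
                return true;
            }
            Thread.sleep(50);
        }
        return false;
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: each thread now waits forever for the other's first lock.
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlock detected: " + provokeAndDetect());
    }
}
```

Because the two demo threads are daemons, the JVM can still exit after the demo; in a real IDE hang the blocked threads include the event queue, which is why the whole UI freezes.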

We see similar hangs often, as well as a number of other 
Studio hangs under other circumstances.  In all cases, they 
seem to have only a cursory relationship to our module (only 
one method, dispose(), is being invoked on one of our 
classes in our module in the attached trace).  The 
probability of a hang seems to be exacerbated by having one 
or more Java and/or JSP files from the filesystem open (or 
recently opened) when the filesystem is unmounted.  In the 
case of having JSP files opened, we frequently see in the 
thread dump a lookup for a JSP syntax coloring/parsing 
service.

We leave open the possibility that this is somehow caused 
by our module, but the attached thread dump seems 
to negate that since none of our objects are in the traces 
except for one of our DataObjects (not a Java or JSP file) 
being discarded.

Thread dumps from many of these hangs have led us to 
suspect that either the Open API's Lookup 
class/infrastructure is not properly threadsafe, or somehow 
use of this system from various modules (directly or 
indirectly) is not properly taking into account thread 
safety issues.  Another possibility is that the Mutex class 
is broken and/or being used improperly in various places.  
To be honest, these are just suppositions--it is baffling 
trying to analyze this problem.

One open question is why the Java parser is being invoked 
during an unmount of a filesystem.  We have seen this as a 
source of trouble in other cases, though I cannot 
articulate them at this time.  We have also come to believe 
that there may be a significant problem with zombie 
FileObjects and DataObjects in the IDE during filesystem 
unmounts.  Could it be that this hang phenomenon as shown 
in the trace is evidence of that?

Our impression of the stability of the Studio has fallen 
since Studio 4.1, largely due to hangs like this one.  We 
are concerned that users of our module are going to 
encounter this problem, since they are prone to mount and 
unmount filesystems more frequently on average.  We are 
also concerned because similar hangs seem to occur upon 
switching projects, which is certainly something all Studio 
users do.

Comment 1 Todd Fast 2003-08-30 03:15:16 UTC

Created attachment 11473 [details]
Thread dump after hang

Comment 2 Todd Fast 2003-08-30 03:36:50 UTC

My colleague has also mentioned that the same thing occurs 
with the Find... feature, perhaps even more frequently.  I 
am able to reproduce the problem by doing the following:

1) Open a Java file in the editor by double clicking its 
node
2) Use the Find... feature to locate the same node in the 
search window
3) Double click the node in the search window to open it in 
the editor

In a majority of cases, a second editor tab will appear for 
the file.  I have been able to reproduce this reliably.

One note: when the second editor appears, it may be linked 
to the first editor such that edits in one appear in the 
other.  However, it is definitely possible to get into a 
pathological state where the editors are not related and 
changes in one are tracked in the other, resulting in the 
changes not being saved under certain conditions.

Comment 3 Todd Fast 2003-08-30 03:43:56 UTC

Sorry, ignore last post--wrong issue.

Comment 4 David Konecny 2003-09-02 10:30:00 UTC

Jarda, could you please look at this issue and suggest some solution
if there is any? You are the only living expert on Datasystems. :-)

Todd, yes, threading is a well-known problem which we want to address in
future version(s). We are very well aware of this, but unfortunately
in the current state it is hard to fix any issue related to deadlocks.
Usually any fix in the openide causes a bunch of problems somewhere
else. We will try to prepare some workaround, but do not expect
anything magic that will solve all these problems. Definitely not
before the redesign of threading and Datasystems is complete.

Comment 5 Jaroslav Tulach 2003-09-02 15:15:30 UTC

The root of this deadlock is the code inside the "Folder Recognizer"
thread. Nearly every other thread is waiting for it to finish, but the
recognizer is blocked. The fix should be based on following a
priority of resources: Nodes are allowed to make calls into
DataSystems while holding a resource, but DataSystems cannot call Nodes
while holding a resource. The "Folder Recognizer" thread is a
resource that is held by DataSystems while calling into Nodes. This
should not happen.

Possible ways to avoid it:
1.
org.openide.loaders.DataNode$PropL.propertyChange(DataNode.java:551)
should reschedule to a different thread before calling the Node superclass methods.

2. A simple workaround in
com.sun.jato.tools.sunone.context.ClassesDataFolder.dispose(ClassesDataFolder.java:54)
to reschedule super.dispose() onto a thread other than the Folder Recognizer.

3. The wider problem is why the ClassesDataFolder is being disposed. I
am pretty sure that this is related to unmounting of the FileSystem. Such
an action triggers a recheck of ClassesDataFolder, and this data object is
no longer able to recognize itself (checkConsistency); that is why it
invokes dispose. If there were some way for the ClassesDataFolder
to survive, we would not get into this trouble.

Generally I think solution 1 is the most appropriate, but it requires
changes in the platform, which are not in release35. Moreover, I understand
that it is based on an assumption about the "resource hierarchy" described
above that is not written down anywhere and that will be violated in tons of
other places anyway.
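The "priority of resources" rule above is what is usually called a lock hierarchy: every resource gets a level, and a thread may only acquire a resource at a lower level than the ones it already holds. The following Java sketch enforces that rule at runtime. RankedLock and its two levels are hypothetical illustrations; openide's Mutex has no such ranking, and the sketch simply assumes, as described above, that Nodes rank above DataSystems.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical lock that enforces a resource hierarchy at runtime; not an openide API. */
public class RankedLock {
    // Illustrative levels: Nodes rank above DataSystems, so Nodes code may call down into DataSystems.
    public static final int NODES = 2;
    public static final int DATASYSTEMS = 1;

    // Per-thread stack of the levels currently held; levels strictly decrease toward the top.
    private static final ThreadLocal<Deque<Integer>> HELD = ThreadLocal.withInitial(ArrayDeque::new);

    private final int level;
    private final ReentrantLock lock = new ReentrantLock();

    public RankedLock(int level) {
        this.level = level;
    }

    /** Acquires the lock, or throws if that would break the hierarchy. */
    public void lock() {
        Deque<Integer> held = HELD.get();
        Integer innermost = held.peek(); // lowest level held, because the stack is strictly decreasing
        if (innermost != null && level >= innermost) {
            throw new IllegalStateException(
                    "hierarchy violation: acquiring level " + level + " while holding level " + innermost);
        }
        lock.lock();
        held.push(level);
    }

    /** Releases the lock; assumes locks are released in reverse order of acquisition. */
    public void unlock() {
        HELD.get().pop();
        lock.unlock();
    }
}
```

In this sketch the forbidden DataSystems-to-Nodes direction fails immediately with an exception instead of deadlocking later under an unlucky interleaving. It also rejects re-entering the same lock, which a real implementation might want to allow.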

Comment 6 Matthew Stevens 2003-09-03 04:31:09 UTC

some more facts which may help the analysis of the issue...

1) It was said in the opening issue statement: "we frequently see in
the thread dump a lookup for a JSP syntax coloring/parsing service." 
To be more clear, we are specifically referring to the JspParser
(Jasper integration) material and not the syntax analyzer
(org.netbeans.modules.web.core.syntax.Jsp11Syntax) which most of the
JSP syntax coloring relies on.  This is probably unimportant but I
just wanted to be clear.

2) It is important to understand how our module currently reacts to
unmounting; specifically with respect to the reaction of our Loaders
to conditions of invalid filesystems.  Because we have a lot of
tracing in our module we use for debugging specific functionality we
noticed a lot of unexpected behavior executing upon unmounts of the
web app filesystem.  For example, we noticed that our primary design
artifact DataObjects (Models, Views, and Command and JSPs) were being
recreated during unmount.  We would end up having these orphan or what
we called "Zombie" DataObjects around.  Our investigation showed that
our loader's findPrimaryFile() was being called for FileObjects which
were in fact parented by invalid filesystems.  Why FileObject activity
would proceed on filesystems which were invalid was a mystery to us.
Our decision at that time was to return null from all our
findPrimaryFile() calls in our loaders.  This seemed to make the
Zombie S1AF DataObjects disappear.

Question: is there ever a proper situation in which a DataObject
should be created for a FileObject of an invalid filesystem?  If not,
why doesn't the platform assert that a FileObject is in a valid
FileSystem before attempting to load its DataObject?

The hang occurrences became pronounced two weeks ago (just about every
time) as we approached a feature complete build of S1AF for Studio5.
The hang thread dumps show that we were hung deep within some of our
loaders' handleFindDataObject() methods.  To be consistent with half of our
loaders which already had the pattern, we added checks for invalid
filesystems to all remaining loaders.  The majority of the hangs went
away.  The hangs that remain are accurately described by Todd in the
original issue posting.

Hopefully this additional information will help frame the issue.

Regarding the comments from jtulach earlier today:

Todd will have to evaluate your suggestions #1 and #2

Regarding your comment #3:

You assert that ClassesDataFolder.dispose() is being called in
reaction to unmount.

Question: Should we not predict that DataObject.dispose() will be
called on all the live DataObjects for our mounted web app after the
app is unmounted?   Were you suggesting that dispose() was abnormal
during unmount?  You state that unmount action triggers "recheck" of
ClassDataFolder but this fails so dispose() is called instead.  You
state "If there would be some way for the ClassesDataFolder to survive
we would not get into this trouble."  In light of our explanation of
how we currently return null from handleFindDataObject() during cases
of invalid filesystems...does this help provide insight to our issue?

Comment 7 Marian Mirilovic 2003-09-03 16:00:48 UTC

Hi,

we would like to reproduce the hang, so can you provide us with something
like a test case, or a reproducible scenario with step-by-step instructions?

Is the hang reproducible only on Windows, or is it reproducible
on other operating systems?

Thanks in advance...

Comment 8 Matthew Stevens 2003-09-03 16:24:19 UTC

I will try to provide ASAP (and attach) a small S1AF web app you can
mount and unmount to reproduce the hang.  I will do this on Solaris
since I heard that is what you folks probably use.  In the meantime
please set up the following test environment to run the S1AF module.

The S1AF module is available at

http://clue.sfbay/kits/jato/trunk/Build030902/

please install the NBM

Optionally if you would like to see some of our debug traffic in
ide.log add the following switches to your ide.cfg

-J-Dsunone.jato.debug=true
-J-Dsunone.jato.debug.usesystemerr=true
-J-Dsunone.jato.debug.file=/tmp/Debug.properties

and then ensure the contents of your /tmp/Debug.properties
are the following (these lines enable the indicated classes and packages
to output debug information)

----------------------------------
jsp.JatoJspLoader
app.JatoAppLoader
command.CommandDefinitionLoader
mode.ModelDefinitionLoader
view.ViewBeanDefinitionLoader
view.ContainerViewDefinitionLoader
mount
context
zombie
----------------------------------

Comment 9 Matthew Stevens 2003-09-03 16:38:14 UTC

correction to Debug.properties contents (model misspelled):

----------------------------------
jsp.JatoJspLoader
app.JatoAppLoader
command.CommandDefinitionLoader
model.ModelDefinitionLoader
view.ViewBeanDefinitionLoader
view.ContainerViewDefinitionLoader
mount
context
zombie
----------------------------------

Comment 10 Todd Fast 2003-09-04 05:53:57 UTC

Jaroslav--

I do not think you will be able to reproduce this in any reliable 
way, as it is a race condition and subject to a multitude of 
factors.  We see it frequently enough, but only because we mount and 
unmount hundreds of apps.  Each time it happens, the thread dump is 
different and involves different threads and different objects.

I believe this thread dump was from an unmount attempt (that's 
usually the case for these hangs).  I don't think there is any reason 
why we want to allow ClassesDataFolder to survive an unmount--that 
will only lead to zombies and other issues, no?  The dispose() call 
is perfectly reasonable and expected here as far as I know.

Also keep in mind that the ClassesDataFolder.dispose() call is just 
one of many variations.  This is only one instance of the hang and 
every hang is different.  The common feature in all thread dumps 
during these types of hangs is many threads simultaneously accessing 
FolderRecognizer.

I'm worried about your workaround suggestion for a couple of 
reasons.  First, the fact that ClassesDataFolder appears in this 
thread dump is just coincidence.  We have a large number of 
DataObjects, any of which (or none) could be involved in a hang.  In 
this instance, are you perhaps focusing on ClassesDataFolder as a 
problem when in fact it may be several of the other threads that are 
causing the deadlock?  If this is the case, then doing something in 
ClassesDataFolder.dispose() won't have any effect.

Second, wouldn't we need to put your suggested workaround in all of 
our DataObjects, since any one of them could be involved in such a 
hang?  Are you sure that this hang is *caused* by dispose() in 
ClassesDataFolder, and so you recommend we make similar changes 
everywhere?  What about other modules' DataObjects, like 
JavaDataObject?

Third, I assume you know what you are talking about, but to me your 
workaround seems radical.  Under normal conditions I would never 
assume that doing something like this would be a "safe" or 
recommended operation, as my assumption is that a DataObject's 
lifecycle is complex and something I don't want to mess with.  Is 
there any danger that this workaround will cause more problems (i.e. 
hangs) than it fixes, or cause a problem in future releases of 
NetBeans?  Do you have confidence that this will be a workaround we 
can rely on?  If there is any doubt, I might prefer to take our 
chances with the occasional hang than make our module incompatible 
with future releases.

Is there something else we can do to avoid getting the 
FolderRecognizer involved?  For example, we have enabled filesystem 
refresh on these Web app filesystems, and we are using an instance of 
FolderLookup on the mounted app.  Could these be factors?

Comment 11 David Konecny 2003-09-04 10:05:26 UTC

Todd, please attach some other samples of deadlocks.

Comment 12 Jaroslav Tulach 2003-09-04 15:15:45 UTC

Is there a DataLoader that would recognize a FileObject and, after an
unmount of a FileSystem, would not recognize it? From the
description above I think there is, probably as a workaround for some other
issue. Can this be a problem? Yes, it can. How would one recognize the
problem? Probably by a stack trace that involves
MultiFileLoader.checkCollision and then DataObject.setValid (false),
a sign that an existing DataObject is no longer recognized by its own
loader. Is that faulty behaviour? Yes, it causes a deadlock. Is the
problem in the data systems? Yes, they are not ready to survive this
situation. Or in the loader that is doing that? Actually, is it really
necessary to not recognize what has already been recognized? How to
fix it? Either make DataSystems more robust (a possible source of other
bugs) or improve the recognition (if possible).

I will attach a patch that might fix this on the data system side. If it
helps, we might start considering whether to apply it or find a less
dangerous solution.
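The sequence described above can be modeled with a toy recognizer. This is a hypothetical Java illustration, not the logic of MultiFileLoader.checkCollision: ToyRecognizer, ToyObject and the predicate standing in for a loader are all invented for the example. It only shows that a loader which keeps accepting a file lets the existing object be reused, while a loader that refuses a file it accepted earlier forces the old object to be invalidated, with the resulting event delivered on whatever thread is running recognition (in the IDE, the Folder Recognizer).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of re-recognition; all names are hypothetical, not the org.openide.loaders API. */
public class ToyRecognizer {

    /** Stand-in for a DataObject: remembers its file and whether it is still valid. */
    public static final class ToyObject {
        public final String file;
        public boolean valid = true;

        ToyObject(String file) {
            this.file = file;
        }
    }

    private final Map<String, ToyObject> pool = new HashMap<>();

    /** Records each invalidation together with the thread that delivered it. */
    public final List<String> invalidationEvents = new ArrayList<>();

    /** Re-runs recognition for one file; loaderAccepts models the loader's verdict. */
    public ToyObject rerecognize(String file, Predicate<String> loaderAccepts) {
        if (loaderAccepts.test(file)) {
            // Loader still claims the file: reuse the existing object, or create one.
            return pool.computeIfAbsent(file, ToyObject::new);
        }
        ToyObject existing = pool.remove(file);
        if (existing != null) {
            // The loader refuses a file it recognized before, so the old object is
            // invalidated; the event is delivered on whichever thread runs recognition.
            existing.valid = false;
            invalidationEvents.add(Thread.currentThread().getName() + ": " + file);
        }
        return null;
    }
}
```

In the IDE, that invalidation event is what ends up refired to Nodes, which is where the recognizer thread can block.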

Comment 13 Jaroslav Tulach 2003-09-04 15:17:33 UTC

Created attachment 11525 [details]
Prevents calling to Nodes from Folder Recognizer Thread

Comment 14 David Konecny 2003-09-08 14:33:02 UTC

Todd, could you please try Yarda's patch and let us know what the
results are? The patch itself looks too dangerous to me to be put into
an update release. But I would like to know whether it solves the
problem or not.

If yes, then I would propose to include the patch in the main trunk
sources and keep it there for some time to prove that there are no
serious regressions.

As for your current release, I am afraid that there is no easy solution.
If Yarda's patch solves the problem or at least improves it, I would
propose to implement that directly in your ClassesDataFolder and other
affected DataObjects.

As a side note, I would like to assure you that we are aware of these
problems and we are working on them. The threading model is being clarified
and simplified as much as possible. The Datasystems are also being
completely redesigned. But that's the future.

Comment 15 Todd Fast 2003-09-09 11:54:31 UTC

Created attachment 11550 [details]
Several other thread dumps from hangs that we suspect are related.  Some contain S1AF module classes and some do not.

Comment 16 Todd Fast 2003-09-09 12:14:01 UTC

>Is there a DataLoader that would recognize a FileObject and, after an
>unmount of a FileSystem, would not recognize it? From the
>description above I think there is, probably as a workaround for some other
>issue. Can this be a problem? Yes, it can. How would one recognize the
>problem? Probably by a stack trace that involves
>MultiFileLoader.checkCollision and then DataObject.setValid (false),
>a sign that an existing DataObject is no longer recognized by its own
>loader. Is that faulty behaviour? Yes, it causes a deadlock. Is the
>problem in the data systems? Yes, they are not ready to survive this
>situation. Or in the loader that is doing that? Actually, is it really
>necessary to not recognize what has already been recognized? How to
>fix it? Either make DataSystems more robust (a possible source of other
>bugs) or improve the recognition (if possible).

Yes, we have implemented what we call a "zombie check" in several of 
our loaders because we saw DataObjects being created during unmount 
and other invalid situations.  The creation of these objects often 
caused the unmount to hang, and these DataObjects remained live in 
the IDE and seemingly caused many other problems.  The zombie checks 
ensure that DataObjects are not created for FileObjects or 
FileSystems which are invalid.  It usually looks something like this:

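-----
// "Zombie check": refuse to recognize files whose filesystem is no
// longer valid (for example, while it is being unmounted).
protected DataObject handleFindDataObject(
    final FileObject fo, 
    RecognizedFiles rf)
    throws IOException
{
    try 
    {
        // getFileSystem() throws FileStateInvalidException when the
        // FileObject itself is already invalid.
        if (!fo.getFileSystem().isValid())
            return null;
    }
    catch (FileStateInvalidException e)
    {
        // Treat an invalid FileObject the same as an invalid filesystem.
        return null;
    }

    ...
-----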

Once we implemented checks for zombies, we believe our module's 
reliability went up considerably.  We assumed this was a problem with 
the Open API in that it was incorrectly causing re-recognition of 
DataObjects whose FileObjects or FileSystems were invalid.

However, are you saying that our zombie check logic might be causing 
problems such as hangs by not recognizing previously recognized 
DataObjects?  Is there some other way we could protect from zombie 
DataObjects being created during unmount situations?

>I will attach a patch that might fix this on the data system side. If it
>helps, we might start considering whether to apply it or find a less
>dangerous solution.

I would try to apply the patch, but it is very difficult to reproduce 
the problem reliably.  I'm not sure I would be able to say anything 
definitive about what I saw.  Also, I have not seen the problem 
lately after we added more zombie checks to our loaders.  Can you 
please advise on the wisdom of using our zombie checks?

Comment 17 David Konecny 2003-09-10 09:12:06 UTC

"Also, I have not seen the problem lately after we added more zombie
checks to our loaders." - I'm really glad to hear that you worked around
it and that it works.

"Can you please advise on the wisdom of using our zombie checks?" -
Yarda, could you comment on this, please?

Comment 18 Jaroslav Tulach 2003-09-11 12:29:28 UTC

As I wrote in my second comment, the root cause of the deadlock in
this issue is the inability of some DataLoader to re-recognize
something previously recognized, namely the zombie check. If the
DataLoader did not behave in such a way, there would be no need for this issue.

Comment 19 David Konecny 2003-09-11 15:50:49 UTC

The problem has been worked around. Closing as WONTFIX.

Comment 20 Matthew Stevens 2003-09-11 16:33:15 UTC

Can you confirm that the disposition of this issue from the core team
is that our module is causing the hang because we have faulty code in
our loaders' findPrimaryFile and handleFindDataObject methods?  That
is, that our code is currently refusing to create new DataObjects for
FileObjects from invalid FileSystems.

If you confirm above...what is your recommendation on how we should
implement our loaders to deal with this unmount activity which
proceeds on the fileobjects for the invalid filesystem?

Can you confirm that it is as designed in the core that DataObjects
will be re-recognized in the case of unmount?

We would greatly appreciate some comment and expert perspective on
this scenario.  Again, here it is: We designed DataObjects and Loaders
for our Module.  Everything works nicely for mounting a web
application.  We engage the UNMOUNT action and our DataObjects are
disposed and the loaders recreate the DataObjects all over again but
in this case functionality fails and we hang all over the place. 
Functionality fails because the new DataObjects find themselves
running in a filesystem which is trying to disappear.  We hang in the
same code paths and mutex locks that we have presented in this case; it's
just that we can follow the stack traces to the creation of new
DataObjects.  Hence, we eliminated the creation of the duplicate
DataObjects; those code paths were eliminated and the hangs, almost all
of them, are gone.  I don't remember reading anywhere that
there are two conditions in which DataObjects are created: 1) regular
cases and 2) pathological cases when the filesystem is invalid.  If we
are not supposed to balk at creating new DataObjects during unmount (as
you said in your last comment), what are we supposed to do?  Would you
agree that if we proceeded to create the duplicate DataObjects, we
would have to do something different for the unmount condition?
What condition do we look for, and how should our zombie DataObjects
behave?  When you comment here, please consider that our module is the
most stable it's been, now that we check for invalid filesystems across
the board and deny the creation of duplicate DataObjects.

Comment 21 Matthew Stevens 2003-09-11 17:22:36 UTC

I reopened the issue so that we get closure on our questions.  If you
would like to close the issue, it's fine by me; I just would like the
questions answered and associated with the issue.

Comment 22 Todd Fast 2003-09-12 03:15:53 UTC

Yarda (or others), can you please answer this short list of open 
questions?  We are still confused until we have clear answers to 
these:

1. Is it normal for our loaders to be asked to recreate DataObjects 
when FileObjects and/or Filesystems are invalid?

2. Is it normal for a filesystem unmount to cause rerecognition of 
DataObjects for the unmounted filesystem?

3. We see that turning off zombie checks results in NEW DataObjects 
being created.  Is that expected?  Or, would you instead expect a 
DataObjectAlreadyExists exception to be thrown?

4. Is returning NULL from our zombie checks the best behavior?  Would 
it be better to throw DataObjectAlreadyExists or some other exception?

5. Do you have any other recommendations for us to avoid the problems 
caused by DataObjects being created for invalid FileObjects and 
Filesystems?

Thank you.

Comment 23 Jaroslav Tulach 2003-09-12 08:11:33 UTC

> 1. Is it normal for our loaders to be asked to recreate DataObjects 
> when FileObjects and/or Filesystems are invalid?

A DataObject can work on any filesystem, not only those mounted in the
repository, and because only filesystems in the repository can be valid,
it is OK for the data system to work over invalid filesystems.

> 2. Is it normal for a filesystem unmount to cause rerecognition of 
> DataObjects for the unmounted filesystem?

Seems so.

> 3. We see that turning off zombie checks results in NEW DataObjects 
> being created.  Is that expected?  Or, would you instead expect a 
> DataObjectAlreadyExists exception to be thrown?

It is not possible to throw DOAExists. It can be thrown only when the
constructor of DataObject fails. Trying to create new objects is fine
if somebody is interested in them.

> 4. Is returning NULL from our zombie checks the best behavior?  Would 
> it be better to throw DataObjectAlreadyExists or some other exception?

You can either return null or try to create a new data object (which may
result in a DOAExists exception, if it really exists). I am not in a
position to know the best behaviour.

> 5. Do you have any other recommendations for us to avoid the problems 
> caused by DataObjects being created for invalid FileObjects and 
> Filesystems?

My recommendation is to not block the "Folder Recognizer" thread by
waiting on Children.MUTEX, i.e. to reschedule all possible calls from
that thread to another one.
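The recommendation above amounts to "never let the recognizer thread wait for a GUI-level lock; hand that work to another thread". A minimal Java sketch of that pattern follows, assuming plain java.util.concurrent executors as stand-ins: RECOGNIZER plays the Folder Recognizer, and NOTIFIER and runOffRecognizer are hypothetical names. Inside the IDE, an org.openide.util.RequestProcessor would likely fill the NOTIFIER role, but that substitution is an assumption here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper; not an openide API. */
public class RecognizerSafe {
    // The thread currently backing the simulated recognizer executor.
    private static volatile Thread recognizerThread;

    /** Stand-in for the IDE's single "Folder Recognizer" thread. */
    public static final ExecutorService RECOGNIZER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "Folder Recognizer");
        t.setDaemon(true);
        recognizerThread = t;
        return t;
    });

    /** Separate worker that may safely block on GUI-level locks such as a children mutex. */
    public static final ExecutorService NOTIFIER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "Node Notifier");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task inline unless the caller is the recognizer thread; in that case
     *  the task is posted to NOTIFIER so the recognizer never waits on it. */
    public static Future<?> runOffRecognizer(Runnable task) {
        if (Thread.currentThread() == recognizerThread) {
            return NOTIFIER.submit(task);
        }
        task.run();
        return CompletableFuture.completedFuture(null);
    }
}
```

The trade-off is that code running on the recognizer thread can no longer assume the node-level work has finished when it moves on, so anything that relied on that ordering has to tolerate the delay.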

Comment 24 David Konecny 2003-09-12 09:51:44 UTC

Yarda, yes the best solution is to not block Folder Recognizer and
that's what we have to do in the long term. But how this should be
solved in the short term? Is the current workaround really
unacceptable or dangerous? Could you please answer question 4 from the
short term point of view?

Regarding answer 1:
yes, it is possible to have filesystem which is not in repository and
which is then "invalid" (kind of strange naming) and Datasystems
should work on it. But is this common? The API allows that but is
there anybody really doing something like that? I do not think so.

The threading model of DS is known to be messy and so any threading
problem is hard to solve. So IMHO if current workaround works and
tests prove that it is not causing regressions I would accept it, live
with it and properly document it in the source code.

Comment 25 David Konecny 2003-09-12 09:54:17 UTC

Oops.. in "Could you please answer question 4 from the
short term point of view?" I of course meant question 5.

Comment 26 Petr Jiricka 2003-09-12 10:02:48 UTC

> Yarda, yes the best solution is to not block Folder Recognizer and
> that's what we have to do in the long term.

Really? I thought the long term solution was to stop using the
datasystems API.

Comment 27 Todd Fast 2003-09-12 10:15:39 UTC

Not really an option for us...<grin>

As for the zombie check workaround we have in place, I think we have 
decided based on our empirical observations of behavior that even if 
it can potentially cause a hang in the Folder Recognizer, it usually 
doesn't, and the module is far more stable overall.  Therefore, we 
will continue with it in place.

I think the root of our issue is that we, like the Web module, are 
trying to provide context for DataObjects rather than simply create 
them on a per-file basis.  This fact leads to the unfortunate problem 
of needing DataObjects that cannot be spuriously recreated if their 
context is invalid.  This seems to be incompatible with current 
NetBeans assumptions about DataObject lifecycle, so we are basically 
on our own trying to make this work flawlessly.

Comment 28 David Konecny 2003-09-12 13:45:00 UTC

PetrJ, sure it is. But current planning and schedules are so unclear
that I would rather count on one more release with the current
DS. For that release we could apply Yarda's patch.

Todd, could you close the issue then? :-)

Comment 29 Todd Fast 2003-09-12 13:59:14 UTC

Workaround in place. Closing as WONTFIX.

Comment 30 Jesse Glick 2003-09-12 16:58:44 UTC

I don't know much about the semantics of Datasystems in this case (not
sure anyone does, actually), so that might be the "primary cause".
However re. the threading here:

Agreed that one contributing evil factor is that DataNode.fireChange
is receiving an event from the folder recognizer thread - called with
an implied lock, i.e. the recognition task - and then refiring an
event (here, nodeDestroyed) which will surely need to acquire
Children.MUTEX in a write lock, which is IMHO illegal. (Nodes/Children
are close to the GUI and may block on low-level structures like
Datasystems, assuming the blockage is expected to be short-lived. But
not vice-versa.)

No need to worry about the trunk - this deadlock should be made
impossible as a result of issue #35833. (Not to say that some other
problem might not arise, but at least you would not have this EQ <->
FolRec deadlock.)

Comment 31 Todd Fast 2003-09-15 10:30:57 UTC

Sorry to drag this open again, but I noted this interesting comment 
in the NetBeans FolderLookup class:

postCreationTask()
protected final Task postCreationTask(Runnable run)
Starts the creation of the object in the Folder recognizer thread.
Doing all the lookup stuff in one thread should prevent deadlocks, but
because we call unknown data loaders, they obviously must be
implemented in correct way.

Note that this seems to fit the profile of what we are seeing on 
unmount--we use FolderLookup in our module, we direct it at folders 
that have our DataObjects in them, the hang in response to the 
DataObject rerecognition is a problem with the Folder Recognizer 
thread, and the hang happens on unmount, which is when we commonly 
see the FolderLookup become active and "fight" the unmounting 
filesystem by trying to rerecognize invalid objects.

Is it possible that our use of FolderLookup is the source of (at 
least some of) the zombies we are seeing, and could FolderLookup's 
insistence on running on the Folder Recognizer thread be the problem 
here?

If this is the case, or could be the case, we can override 
postCreationTask() to run in a different thread--does anyone have any 
recommendations for a better thread?

Comment 32 Todd Fast 2003-09-15 10:41:10 UTC

Argh.  I *could* change FolderLookup to run postCreationTask in a 
different thread, to at least test my theory.  That is, if it weren't 
marked final.

This isn't the first time we've been stymied by the liberal and 
constraining use of final methods in the Open API.  Very 
frustrating.  I hope final methods are not part of the plan for the 
datasystems rewrite...

Comment 33 _ hlu 2003-09-23 22:45:19 UTC

This IDE hang seems to happen often on Solaris.
Stripes build 030922 on Solaris 9:
To reproduce (not 100% reliable, but repeating the steps a few times
should trigger it):
1. Switch to the Sun ONE Application Framework tab.
2. Mount the sample application, which you can get by unpacking the
attached WAR file.
3. Expand the Jato Sample node, the Settings & Configuration node, and
the Application Classes|jatosample|module1 node.
4. Double-click the AddValuesViewBean, ConceptIndexTiledView,
ConceptIndexViewBean, CustomersModel, and E0120Command nodes.
5. Double-click the ConceptIndex JSP node under
ConceptIndexViewBean|JSP Pages to open it.
6. Unmount the application with Jato Sample|Unmount Application.
I sometimes saw the following NPE without an IDE hang:
java.lang.NullPointerException
	at org.netbeans.modules.java.ParserAnnotation.attachToLineSet(ParserAnnotation.java:134)
	at org.netbeans.modules.java.JavaEditor.processAnnotations(JavaEditor.java:449)
	at org.netbeans.modules.java.JavaEditor.access$300(JavaEditor.java:77)
[catch] at org.netbeans.modules.java.JavaEditor$2.run(JavaEditor.java:297)
	at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:178)
	at java.awt.EventQueue.dispatchEvent(EventQueue.java:448)
	at java.awt.EventDispatchThread.pumpOneEventForHierarchy(EventDispatchThread.java:197)
	at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:150)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:144)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:136)
	at java.awt.EventDispatchThread.run(EventDispatchThread.java:99)

Comment 34 _ hlu 2003-09-23 22:46:54 UTC

Created attachment 11696
sample war file

Comment 35 _ hlu 2003-09-23 22:48:02 UTC

Created attachment 11697
stack trace

Comment 36 Todd Fast 2003-09-23 22:56:59 UTC

This hang is identical to the other hangs attached here, which seem 
to be caused by dispose() being called on the Folder recognizer 
thread.  The NPE is just a side effect and is not relevant.

I never heard any response from Yarda regarding my previous comments 
on use of FolderLookup--a response would help us know where to go 
with this issue.

I am attempting a workaround in the JATO module code.  Can someone on 
the Netbeans team please confirm (or deny) whether this workaround 
may help (or hurt):

if (Thread.currentThread().getName().equals(
    "Folder recognizer")) // NOI18N
{
    // Do not call dispose in the folder recognizer thread.
    RequestProcessor.getDefault().post(
        new Runnable()
        {
            public void run()
            {
                dispose();
            }
        });
    return;
}
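For readers without the NetBeans APIs at hand, the workaround pattern above can be reproduced in plain Java. This sketch rests on stated assumptions. A daemon single-thread `ExecutorService` named "background" stands in for `RequestProcessor.getDefault()`. The `DeferredDispose` class, its `disposedOn` field, and `disposeFromRecognizer()` are hypothetical helpers that only record which thread ends up running the cleanup. As in the original snippet, `dispose()` re-posts itself when it detects the Folder recognizer thread. It then re-enters on the background thread, where the name check no longer matches.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DeferredDispose {
    // Hypothetical stand-in for RequestProcessor.getDefault().
    private static final ExecutorService BACKGROUND = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "background");
        t.setDaemon(true);
        return t;
    });

    // Completed with the name of the thread that actually performed the cleanup.
    private final CompletableFuture<String> disposedOn = new CompletableFuture<>();

    public void dispose() {
        if ("Folder recognizer".equals(Thread.currentThread().getName())) {
            // Do not dispose on the recognizer thread: re-post and return at once.
            BACKGROUND.execute(this::dispose);
            return;
        }
        // Real cleanup would go here; this sketch only records the thread.
        disposedOn.complete(Thread.currentThread().getName());
    }

    // Calls dispose() from a thread named like the recognizer and reports
    // which thread ended up doing the work.
    static String disposeFromRecognizer() throws Exception {
        DeferredDispose obj = new DeferredDispose();
        Thread recognizer = new Thread(obj::dispose, "Folder recognizer");
        recognizer.start();
        recognizer.join();
        return obj.disposedOn.get(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(disposeFromRecognizer()); // prints "background"
    }
}
```

Matching on the thread's name ties the workaround to an internal naming detail of the IDE. It is kept here only because the snippet above does the same.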

Comment 37 Jesse Glick 2003-09-23 23:02:53 UTC

Re. the NPE in ParserAnnotation.attachToLineSet - please file
separately, for Java module.


Re. use of final methods - to the contrary, any future API will have
*more* things final and not subclassable or overridable. Permitting
promiscuous subclassing in a public API leads (in our experience) to
horrible backwards compatibility and API evolution problems, as well
as a confusing API. If you want to test your theory about some code,
get the source, patch for testing, and run with a patch turned on:

http://nbbuild.netbeans.org/patching.html

No need to support this kind of thing in the production application.

Anyway this is off-topic for Issuezilla; bring it up on nbdev if you
want. General principles listed here:

http://openide.netbeans.org/tutorial/api-design.html#design.less.final

Comment 38 David Konecny 2003-09-24 08:45:11 UTC

"Re. the NPE in ParserAnnotation.attachToLineSet" - already filed as
36032.

"Can someone on the Netbeans team please confirm (or deny) whether
this workaround may help (or hurt)" - I think it is OK. It is exactly
what Yarda suggested in his first reply.

Comment 39 Todd Fast 2003-09-24 08:49:28 UTC

Thanks David.  I just wanted to ask because the workaround code is in 
our DataObject's dispose() method; a slight variation of Yarda's 
suggestion.

The good news is that the workaround does seem to prevent the hangs 
our QA team was seeing, and so far doesn't appear to have any 
problematic side effects (it does cause a little odd behavior during 
unmount, but nothing problematic).

Comment 40 Todd Fast 2003-09-24 11:17:57 UTC

Unfortunately, we are still seeing occasional hangs.  Please see the 
latest thread dump, 1.txt.  The interesting thing about it is that 
the Folder recognizer thread is calling dispose() on the JavaDataObject 
and seemingly doing the same thing Yarda said was a problem in our 
DataObjects.  Our module does not do anything to specialize 
JavaDataObjects or loaders--these are the standard objects that ship 
with Studio.

Again, I have to ask: is this an unusual situation?  Is something 
different about our module that causes the Folder recognizer thread 
to call dispose() on DataObjects?  Is it our use of FolderLookup that 
is causing this?

Comment 41 Todd Fast 2003-09-24 11:18:43 UTC

Created attachment 11701
Hang apparently in JavaDataObject.dispose()

Comment 42 David Konecny 2003-09-25 12:21:21 UTC

QA, are you able to reproduce it in our labs?

Comment 43 Marian Mirilovic 2003-09-25 17:02:01 UTC

Ok, QA/I will look at it tomorrow....

Comment 44 Jan Lahoda 2003-09-26 15:03:06 UTC

Hi,
    I tried to reproduce the problem but was not successful.
I was using Nevada with JATO installed from an NBM.
I tested builds 030910, 030912, 030922, 030924, and 030925.
I used two machines, a single-processor machine with Solaris 8 and a
dual-processor machine with Solaris 9, with JDK 1.4.2. (Not all
combinations of JATO version and machine were tested, but all 03092*
builds were tested on the dual-processor machine and 030925 was tested
on the single-processor machine.)

I used the example attached on 2003-09-23 and ran into the following
difficulties:
1. The example does not contain WEB-INF/jatoapp.xml, so I added one.
2. I was unable to find the node from step "5. Double-click to open the
ConceptIndex JSP node under ConceptIndexViewBean|JSP Pages", so I used
"Documents/jatosample/module1/ConceptIndex.jsp" instead.

I am probably doing something wrong. If this is still a problem, could
you please provide more precise steps to reproduce? Or is it a
completely random problem?

Comment 45 _ hlu 2003-09-27 20:36:12 UTC

Here are the steps to reproduce the problem:
1. Save the attached WAR file, or get one from the JATO installation by
clicking Help|Sun ONE Application Framework(JATO) Technical
Documentation, which opens a browser. Select the Sample Application
link; it shows a page where you can save the sample application WAR
file. You may want to use the second approach, because a WAR file from
an early build may contain a different JATO library file from the one
in the build you are using.
2. Mount the directory containing the WAR file.
3. Unpack the WAR file using the WAR file node's popup menu action.
4. The sample application should now be mounted.
5. Switch to the Sun ONE Application Framework tab.
6. Continue with the other steps (see the previous comments).

Comment 46 Marek Grummich 2003-09-30 16:00:06 UTC

I tried to reproduce the described behaviour, but was not successful. I
used a single-processor machine with 1 GB of memory, Solaris 8,
j2sdk1.4.2, and S1S 5 build 030904 (030922).
I encountered only the java.lang.NullPointerException mentioned earlier.

Comment 47 Jan Lahoda 2003-09-30 16:55:45 UTC

Hi,
   I was able to reproduce the deadlock in build 030922. The thread
dumps are attached to the issue for reference. I used a dual-processor
machine with Solaris 9 and JDK 1.4.2. I will try 030929. (How the
archive is unpacked was the most important piece of the puzzle.)

Comment 48 Jan Lahoda 2003-09-30 16:57:05 UTC

Created attachment 11747
Two full thread dumps of the deadlock. Build 090322.

Comment 49 Todd Fast 2003-10-02 00:25:02 UTC

Created attachment 11766
Thread dump showing unmount hang with no JATO stack frames

Comment 50 Todd Fast 2003-10-02 00:29:55 UTC

I've added a thread dump attachment that shows an unmount hang that 
occurred without any involvement from the S1AF/JATO module code.  The 
problem appears to be simultaneous access to the lookup and/or 
FolderList.getChildrenList() method.

Comment 51 _ ttran 2003-10-05 19:33:46 UTC

Yarda, please take over this issue.  David K is out of ideas.  This
seems pretty hairy.  Thanks

Comment 52 Jaroslav Tulach 2003-10-05 20:10:26 UTC

OK, I have been assigned this issue, but I have not followed it for
the last three weeks. Before I do anything, I'd like to know whether
anybody tried to reproduce the issue with the patch I provided here on
2003-09-04.

If anybody has reproduced the deadlock with my patch applied, please
say so here. Otherwise I am going to apply that patch and mark the
issue as fixed. Thanks.

Comment 53 Todd Fast 2003-10-06 01:49:58 UTC

We have included a workaround based on your patch in the JATO module 
for JATO DataObjects only.  The latest hang was taken from a build 
that included that workaround, but the thread dump does not include 
any JATO stack frames.  We have seen thread dumps from similar hangs 
that did include JATO stack frames, but otherwise they look very 
similar to this hang.  This makes me think that our workaround based 
on your patch has fixed any hangs caused by JATO DataObjects.

Comment 54 Jaroslav Tulach 2003-10-06 16:05:03 UTC

The last deadlock has been reported as separate issue 36449, as it is
different. All the others (including the original report) have been fixed:

/cvs/openide/test/unit/src/org/openide/loaders/Deadlock35847Test.java,v  
initial revision: 1.1

/cvs/openide/loaders/src/org/openide/loaders/DataNode.java,v  <--   
revision: 1.6;

Comment 55 Antonin Nebuzelsky 2003-11-04 15:03:47 UTC

Fixed also in Nevada Patch 1 and in Arrow.

Comment 56 Lukas Hasik 2004-02-25 13:46:38 UTC

 verified -> todd.fast 2003-10-05

Comment 57 Lukas Hasik 2004-02-25 13:46:57 UTC