This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
See thread dump. Appears that the problem is compounded from several things: 1. apisupport uses JarContent.readContent to recognize module JARs, because it is looking for the manifest header OpenIDE-Module. I guess I will have to drop this feature (don't know yet what to change it to). 2. jarpackager in readContent calls other datasystems methods. It is looking in a totally different place, but that still blocks on the folder recognizer lock. 3. The filesystem refresher calls loader recognition methods synchronously, even though the folder recognizer is also calling them at the same time. Jointly this produces a deadlock pretty often. #1 will probably just need to be dropped - cannot currently be made safe. #2 is probably fine in principle. #3 is disturbing, but seems to be integral to the current DS impl - let's hope the new API does not suffer from this problem.
Created attachment 6493 [details] Thread dump (dev jun 28)
*** Issue 25344 has been marked as a duplicate of this issue. ***
Should now be fixed in the trunk. Candidate for 3.4 merge. Analysis to follow. committed Up-To-Date 1.23 jarpackager/src/org/netbeans/modules/jarpackager/JarCreator.java
Created attachment 6629 [details] Applied patch
Created attachment 6630 [details] Binary patch: save as modules/patches/org-netbeans-modules-jarpackager/patch25293.jar
[MILOS K. - a review of this patch is requested.] Changed my mind - patching according to my earlier #2, i.e. readContent() should not be locking this way. JarCreator keeps a static map of extended properties, available as factories. This is read whenever a JarContent is read. Without the patch, getExtendedProperties synchronizes on the class. The apisupport data loader for .jarContent files including OpenIDE-Module used JarContent to find and read the manifest. The deadlock arose as follows: 1. Thread T1 tries to recognize a module JAR, from a Datasystems API method. This enters ModuleDataLoader.findPrimaryFolder, which eventually calls getExtendedProperties, locking JarContent.class. 2. getExtendedProperties uses something like Lookup to find the ext prop factories. Specifically this involves calling into Datasystems API. So far, OK. 3. Meanwhile, thread T2 which is refreshing data folders also tries to recognize a module JAR - maybe the same one, Datasystems is pretty sloppy. Anyway, it also gets as far as getExtendedProperties, but then blocks on JarCreator.class. 4. Now T1 can wait for folders to be refreshed, waiting on T2 and deadlocking. Looking at the code for JarCreator.getExtendedProperties, it seems oversynchronized. All of the work with the extended properties folder does not need to be locked, since Datasystems API is supposed to be threadsafe and it would be OK for another thread to browse the same folder at the same time. The only real need for the lock is to protect Map extendedProperties, a list of defaults to which folder-supplied factories are added. The patch just reduces the synchronization to only lock this map, not the folder searching. In step 3 above, T2 would not block, but continue and finish recognizing the .jarContent file, as would T1. The affected methods include addExtendedProperty and removeExtendedProperty. The changed synchronization should hardly affect them. Anyway they are deprecated. Note that ExtendedPropertyFactory.getName() could now be called from two different threads at once, whereas it could not before. I think this should pose no problem, since it should be a trivial getter. In fact I was unable to find any actual impl of this interface in all of nb_all and f4j_all sources, except in one unit test for jarpackager. It is not clear whether or not JarContent.readContent is supposed to be threadsafe. Neither Javadoc nor code seems to mention anything about threads. Note that this class is not part of the public API for jarpackager, so the module does not have a responsibility to behave in any particular way with it. The only known affected code is apisupport's ModuleDataLoader. Bug seen by both myself and jchalupa, on Linux and Windows respectively. apisupport/lite's LiteInstaller might also be affected. In principle, any other code which tried (directly or indirectly) to read .jarContent files from within a Datasystems SPI method might produce the same deadlock without this patch. I was able to reproduce the deadlock without much work in [dev jul 10] by creating a module JAR (empty module template using full apisupport), reinstalling it a couple of times, and then File | New (or generally just doing random things with data objects in the GUI when module JARs were visible). Running in the same build with the binary patch, I was unable to reproduce the deadlock, after creating and reloading several test modules, both from the empty module template and from the New Module Wizard, copying JARs in the Explorer, deleting them, doing other things with folders, etc. etc. So I believe it solved the bug. Alternatives: I can only think of changing apisupport to not call readContent synchronously. Either do not call it all - which would mean changing the way module JARs are handled by the API Support, probably dropping the special support for browsing manifest sections etc. - or call it asynch - which might mean initially not recognizing a .jarContent, then calling readContent in a request processor, then if found to be a module, setting a file attribute to mark it as a module and calling DataObject.setValid(false) to force it to change types. Either alternative would be messy, I think.
I've added the lookup things in getExtendedProperties lately when fixing a bug. (Moving of the extended props factories to layer) I didn't think much about the consequences of the fact that getExtendedProperties() in in synch block. Your patch is good IMO.
Merged: committed Up-To-Date 1.22.12.1 jarpackager/src/org/netbeans/modules/jarpackager/JarCreator.java
Have not seen this issue recur since.
Resolved for 3.4 or earlier, no new info since then -> closing.