This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
Summary: | AWT thread blocked during startup | ||
---|---|---|---|
Product: | java | Reporter: | Antonin Nebuzelsky <anebuzelsky> |
Component: | Unsupported | Assignee: | issues@java <issues> |
Status: | RESOLVED FIXED | ||
Severity: | blocker | CC: | issues, jtulach, tboudreau |
Priority: | P2 | Keywords: | PERFORMANCE, REGRESSION, THREAD |
Version: | 4.x | ||
Hardware: | PC | ||
OS: | Windows ME/2000 | ||
Issue Type: | DEFECT | Exception Reporter: | |
Bug Depends on: | |||
Bug Blocks: | 45449 | ||
Attachments: | Several thread dumps during the time IDE is frozen while J2SE is scanned |
Description
Antonin Nebuzelsky
2004-06-01 14:20:30 UTC
I would like to stress that this occurrence of AWT blocking is a regression compared with the previous refactoring builds. This shows how evil this is. It really can show up somewhere else again at any time.
Created attachment 15397
Several thread dumps during the time IDE is frozen while J2SE is scanned
I would not be that fatalistic. It can be called a Swing/AWT design flaw as well as an MDR design flaw. This problem was known from the beginning and discussed with Jesse and other folks when working on build system integration. The cure to this is not the mutex you suggest (that would require a lot of code/work to implement and months to stabilize), but making sure that modules make calls to MDR in event thread only when they know what they are doing.

Moreover, the ExclusiveMutex is now able to recognize that an AWT thread is waiting for it. It gives that thread a higher priority when granting access to the mutex and also enables long running tasks such as scanning to pause their transaction for a moment (if the nature of the task allows it), let AWT/swing do its job and reacquire the lock again. We are using this support when scanning, but there seems to be a regression in this mechanism, which we will look at and try to fix for the next build.

"The cure to this is not the mutex you suggest (that would require a lot of code/work to implement and months to stabilize), but making sure that modules make calls to MDR in event thread only when they know what they are doing."

<rant>
I truly do not get why, when designing things in NetBeans, we *start* from the proposition that potentially long running operations must be synchronous - it leads to massively deep stack traces (with each additional stack frame an additional opportunity for someone to cause a deadlock). Operating systems have been using message-based queues for years to decouple such things with quite a bit of success, yet we treat the natural case as being synchronous operation for everything, no matter how complex, and invokeLater() as a sort of dirty hack you do when trapped in a corner, rather than using event/message queues as designed. This gets us into trouble like this.

The same goes for our typical approach to threading, which is: whatever thread asks for a thing first wins (consider the Java parser).
That's more like a threading circus than a threading model.

My point is that perhaps MDR queries simply *should not be synchronous, period.* That is, when you want some information, you queue a request, and MDR gets back to you when that information is ready. If somebody *really* needs the information synchronously, they can do some equivalent of wait(myQuery), and it's their responsibility not to do that when asking for something that will likely take a long time, and their responsibility to design their code not to expect that all queries take 0 time. I think no synchronous solution can ever work 100% unless you can prove a maximum duration for queries.

"enables long running tasks such as scanning to pause their transaction for a moment (if the nature of the task allows it), let AWT/swing do its job"

This is *almost* what I'm talking about, except that this should be the standard case, not the exceptional case, and should not involve any kind of black magic or hacks. Queueing work in logical serial units should be the norm, not the exception.
</rant>

To calm down a bit, what I would expect from anything that can take a potentially infinite amount of time is:
- There is a message queue to which queries are posted
- There is/are dedicated thread(s) which will do the work
- There is a notification callback when a query's result is available, with failure, cancellation and timeout semantics
- No code should be designed to assume a query is instantaneous. Code that must block until a result is available is thus *forced* to post some UI to indicate that it is waiting for a result, but the AWT event queue does not need to be blocked.

After all that, *if* MDR can determine *for sure* that a query can complete in <threshold> time *before* it runs, then as an optimization, it may perform the query synchronously, to avoid context switching costs, but that's an optimization you don't do until everything else is rock solid.
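The expectations listed above (a queue to which queries are posted, a dedicated worker thread, and a completion callback with failure, cancellation and timeout semantics) could be sketched roughly as follows. This is only an illustration in plain java.util.concurrent on a modern JDK (9 or later, which did not exist when this thread was written). `AsyncQueryQueue`, `post`, `shutdown` and the worker thread name are hypothetical and are not part of MDR, JMI or any NetBeans API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names exist in MDR, JMI or NetBeans.
final class AsyncQueryQueue {
    // One dedicated daemon thread does the work; the executor's internal
    // unbounded queue plays the role of the message queue.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "metadata-query-worker");
        t.setDaemon(true);
        return t;
    });

    // Queues the query and returns immediately. The returned future completes
    // with the result, completes exceptionally if the query throws, or
    // completes with a TimeoutException once the timeout elapses.
    // Callers may also cancel() it.
    <T> CompletableFuture<T> post(Supplier<T> query, long timeout, TimeUnit unit) {
        return CompletableFuture.supplyAsync(query, worker).orTimeout(timeout, unit);
    }

    // Stops accepting new queries; queries that are already queued still run.
    void shutdown() {
        worker.shutdown();
    }
}
```

A caller would attach its callback with `whenComplete((result, error) -> ...)`. A Swing client could instead use `whenCompleteAsync(callback, EventQueue::invokeLater)` so that the callback runs on the event thread without that thread ever blocking. In this sketch a timeout or `cancel()` only releases the caller; it does not interrupt a query that is already running.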
What scares me is that all this seems pretty obvious and basic, and it sounds like we're not terribly close to such a design. Martin, I hope you can correct me.

For the particular stack trace in this issue, it is trying to populate the combo box in the editor toolbar. There is no particular reason that either:
- a. the combo's contents must be exactly accurate before it ever appears on the screen - populating it could be invoke-latered
- b. it needs to know its full contents - it's a combo box. Unless its popup is open, it does not need to know the full contents of its list; it only needs to know the selected item it should display, and whether there is at least one additional item so the popup button should be enabled.

So possibly an optimization to request from Explorer - NodeListModel should resolve its contents lazily. What it's doing now is silly.

    at org.openide.explorer.view.NodeListModel$1.run(NodeListModel.java:86)
    at org.openide.util.Mutex.doEvent(Mutex.java:903)
    at org.openide.util.Mutex.readAccess(Mutex.java:227)
    at org.openide.explorer.view.NodeListModel.setNode(NodeListModel.java:70)
    at org.openide.explorer.view.ChoiceView.updateChoice(ChoiceView.java:151)
    at org.openide.explorer.view.ChoiceView.access$200(ChoiceView.java:29)
    at org.openide.explorer.view.ChoiceView$PropertyIL.propertyChange(ChoiceView.java:195)

*** Issue 44089 has been marked as a duplicate of this issue. ***

Blocking of AWT thread during scanning is now fixed:

Checking in src/org/netbeans/modules/javacore/ExclusiveMutex.java;
/cvs/java/javacore/src/org/netbeans/modules/javacore/Attic/ExclusiveMutex.java,v <-- ExclusiveMutex.java
new revision: 1.1.2.28.2.18; previous revision: 1.1.2.28.2.17
done
Checking in src/org/netbeans/modules/javacore/FileScanner.java;
/cvs/java/javacore/src/org/netbeans/modules/javacore/Attic/FileScanner.java,v <-- FileScanner.java
new revision: 1.1.2.16.2.12; previous revision: 1.1.2.16.2.11
done

Thanks for your summary Tim.
Moving from synchronous to asynchronous is not possible for javacore since it is based on JMI (all the APIs are generated from a model) that does not support asynchronous calls. So what we are doing currently is transforming the clients of the JMI API to call it asynchronously (or at least to make sure it is not called in the AWT thread). We may provide utility classes/methods to make this easier for clients in the future, so that a single request processor could be used to schedule the calls, etc. But purely moving to this approach without moving the calls to the source hierarchy out of AWT would not help anyway, for backward compatibility reasons (since the old src API is synchronous). So for now we are trying to fix exactly the problem you found with populating the editor drop-down, by moving the data-collecting code to a different thread.

"Moving from synchronous to asynchronous is not possible for javacore since it is based on JMI (all the APIs are generated from a model) that does not support asynchronous calls."

This suggests to me that either we are stretching JMI beyond its design limitations, or it is simply not well enough designed to actually solve the problem it's supposed to solve.

The utility/helper approach sounds like a very, very good idea - no reason that couldn't be wrapped on top of JMI (preferably along with deprecating any other avenues of access to metadata). Let java/srcmodel be blocking, and deprecated - deadlocks and hangs are good encouragement to stop using deprecated calls.

Verified fixed in trunk.

Just did a cvs update and a clean build, and got a long delay after the first paint of the main window, when opening with a userdir which had openide open as a project, and a few files open in the editor.
Stack dump looks like exactly the same thing going on as before:

"AWT-EventQueue-1" prio=5 tid=0x00587920 nid=0x1ebe200 in Object.wait() [f1140000..f1142b20]
    at java.lang.Object.wait(Native Method)
    at org.netbeans.modules.javacore.ExclusiveMutex.enter(ExclusiveMutex.java:83)
    - locked <0x6283ab58> (a org.netbeans.modules.javacore.ExclusiveMutex)
    at org.netbeans.mdr.NBMDRepositoryImpl.beginTrans(NBMDRepositoryImpl.java:218)
    at org.netbeans.modules.java.bridge.MemberElementImpl.getName(MemberElementImpl.java:132)
    at org.openide.src.MemberElement.getName(MemberElement.java:81)
    at org.netbeans.modules.beans.PatternChildren$Listener.<init>(PatternChildren.java:234)
    at org.netbeans.modules.beans.PatternChildren$Listener.<init>(PatternChildren.java:228)
    at org.netbeans.modules.beans.PatternChildren.<init>(PatternChildren.java:39)
    at org.netbeans.modules.beans.PatternChildren.<init>(PatternChildren.java:86)
    at org.netbeans.modules.beans.PatternsBrowserFactory.createClassNode(PatternsBrowserFactory.java:75)
    at org.netbeans.modules.java.ui.nodes.ExFilterFactory.createClassNode(ExFilterFactory.java:64)
    at org.openide.src.nodes.FilterFactory.createClassNode(FilterFactory.java:75)
    at org.netbeans.modules.javadoc.comments.JavaDocPropertySupportFactory.createClassNode(JavaDocPropertySupportFactory.java:57)
    at org.netbeans.modules.java.ui.nodes.ExFilterFactory.createClassNode(ExFilterFactory.java:64)
    at org.openide.src.nodes.FilterFactory.createClassNode(FilterFactory.java:75)
    at org.netbeans.modules.refactoring.ui.RefactoringFilterFactory.createClassNode(RefactoringFilterFactory.java:42)
    at org.netbeans.modules.java.ui.nodes.ExFilterFactory.createClassNode(ExFilterFactory.java:64)
    at org.openide.src.nodes.SourceChildren.createNodes(SourceChildren.java:169)
    at org.openide.nodes.Children$Keys$KE.nodes(Children.java:1986)
    at org.openide.nodes.ChildrenArray.nodesFor(ChildrenArray.java:112)
    at org.openide.nodes.Children$Info.nodes(Children.java:1082)
    at org.openide.nodes.Children.justComputeNodes(Children.java:588)
    at org.openide.nodes.ChildrenArray.nodes(ChildrenArray.java:54)
    at org.openide.nodes.Children.getNodes(Children.java:324)
    at org.openide.nodes.FilterNode$ChildrenAdapter.run(FilterNode.java:1297)
    at org.openide.nodes.FilterNode$Children.updateKeys(FilterNode.java:1253)
    at org.openide.nodes.FilterNode$Children.addNotifyImpl(FilterNode.java:1150)
    at org.openide.nodes.FilterNode$Children.addNotify(FilterNode.java:1142)
    at org.openide.nodes.Children.callAddNotify(Children.java:419)
    at org.openide.nodes.Children.getArray(Children.java:462)
    at org.openide.nodes.Children.getNodes(Children.java:315)
    at org.openide.explorer.view.VisualizerNode.getChildren(VisualizerNode.java:179)
    at org.openide.explorer.view.NodeListModel.findSize(NodeListModel.java:198)
    at org.openide.explorer.view.NodeListModel.getSize(NodeListModel.java:133)
    at org.openide.explorer.view.NodeListModel.addAll(NodeListModel.java:274)
    at org.openide.explorer.view.NodeListModel$1.run(NodeListModel.java:86)
    at org.openide.util.Mutex.doEvent(Mutex.java:903)
    at org.openide.util.Mutex.readAccess(Mutex.java:227)
    at org.openide.explorer.view.NodeListModel.setNode(NodeListModel.java:70)
    at org.openide.explorer.view.ChoiceView.updateChoice(ChoiceView.java:151)
    at org.openide.explorer.view.ChoiceView.access$200(ChoiceView.java:29)
    at org.openide.explorer.view.ChoiceView$PropertyIL.propertyChange(ChoiceView.java:195)
    at java.beans.PropertyChangeSupport.firePropertyChange(PropertyChangeSupport.java:252)
    at org.openide.explorer.ExplorerManager.setExploredContext(ExplorerManager.java:284)
    at org.openide.explorer.ExplorerManager.setRootContext(ExplorerManager.java:411)
    at org.netbeans.modules.java.ui.NavigationView.changeRootContext(NavigationView.java:289)
    at org.netbeans.modules.java.ui.NavigationView.activatedNodeChanged(NavigationView.java:249)
    at org.netbeans.modules.java.ui.NavigationView.addNotify(NavigationView.java:166)
    at java.awt.Container.addNotify(Container.java:2049)
    - locked <0x6227d858> (a java.awt.Component$AWTTreeLock)
    at javax.swing.JComponent.addNotify(JComponent.java:4288)
    at org.netbeans.modules.java.ui.actions.NavigateAction$ToolbarPresenter.addNotify(NavigateAction.java:103)
    at java.awt.Container.addImpl(Container.java:658)
    - locked <0x6227d858> (a java.awt.Component$AWTTreeLock)
    at javax.swing.JToolBar.addImpl(JToolBar.java:583)
    at java.awt.Container.add(Container.java:307)
    at org.netbeans.modules.editor.NbEditorToolBar.addPresenters(NbEditorToolBar.java:373)
    at org.netbeans.modules.editor.NbEditorToolBar.checkPresentersAdded(NbEditorToolBar.java:230)
    at org.netbeans.modules.editor.NbEditorToolBar.access$200(NbEditorToolBar.java:79)
    at org.netbeans.modules.editor.NbEditorToolBar$5.run(NbEditorToolBar.java:218)
    at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:178)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:477)
    at java.awt.EventDispatchThread.pumpOneEventForHierarchy(EventDispatchThread.java:234)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:184)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:178)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:170)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:100)

P1->P2

*** Issue 44556 has been marked as a duplicate of this issue. ***

Moved to new subcomponent java/javacore.

Should not happen anymore - should be fixed by the fix for issue 45077 (we changed the way the files are scanned). There can still be a delay if you delete the storage files (or the storage files are corrupted) and you start the IDE with some files open in the editor. But this should be a very rare case and it should not take too long.

Reorganization of java component
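For readers of this archive, the locking behaviour described earlier in the thread (a waiting AWT thread gets priority, and a long-running scanner pauses between units of work so the AWT thread can get in) might look roughly like the sketch below. It is not the real `org.netbeans.modules.javacore.ExclusiveMutex`, whose source is not shown in this report. `YieldingMutex` and all of its method names are hypothetical, and the sketch is deliberately non-reentrant to stay short.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch only; NOT the actual NetBeans ExclusiveMutex.
final class YieldingMutex {
    private final ReentrantLock guard = new ReentrantLock(); // protects the fields below
    private final Condition changed = guard.newCondition();
    private Thread owner;         // thread holding the mutex, or null if free
    private int priorityWaiters;  // priority (UI) threads currently waiting

    // Blocks until the mutex is free. Not reentrant: an owner that calls
    // enter() again would wait forever.
    void enter(boolean priority) {
        guard.lock();
        try {
            if (priority) priorityWaiters++;
            // Ordinary callers also wait while any priority caller is queued.
            while (owner != null || (!priority && priorityWaiters > 0)) {
                changed.awaitUninterruptibly();
            }
            if (priority) priorityWaiters--;
            owner = Thread.currentThread();
        } finally {
            guard.unlock();
        }
    }

    // Releases the mutex and wakes all waiters so they can re-check.
    void leave() {
        guard.lock();
        try {
            if (owner != Thread.currentThread()) {
                throw new IllegalStateException("current thread does not own the mutex");
            }
            owner = null;
            changed.signalAll();
        } finally {
            guard.unlock();
        }
    }

    // True while at least one priority caller is blocked in enter(true).
    boolean priorityWaiting() {
        guard.lock();
        try {
            return priorityWaiters > 0;
        } finally {
            guard.unlock();
        }
    }

    // Called by the owner between units of work. If a priority caller is
    // waiting, hand the mutex over and take it back once it has been served.
    void yieldToPriority() {
        if (priorityWaiting()) {
            leave();
            enter(false);
        }
    }
}
```

A scanner written against such a mutex would call `yieldToPriority()` after each file, at a point where its transaction can safely be paused. The event thread would call `enter(true)`, and it would then wait for at most one unit of scanning work rather than for the whole scan.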