This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 161646 - Filesystems unstable after 2a8c95365005
Summary: Filesystems unstable after 2a8c95365005
Status: RESOLVED FIXED
Alias: None
Product: platform
Classification: Unclassified
Component: Filesystems
Version: 6.x
Hardware: Other All
Importance: P1 blocker
Assignee: Jiri Skrivanek
URL:
Keywords: RANDOM
Duplicates: 161648
Depends on: 161648
Blocks:
Reported: 2009-04-01 07:55 UTC by Jaroslav Tulach
Modified: 2009-04-03 07:52 UTC
CC List: 8 users

See Also:
Issue Type: DEFECT
Exception Reporter:


Attachments

Description Jaroslav Tulach 2009-04-01 07:55:18 UTC
We have a problem. Commit validation started failing with OutOfMemoryError recently. Marek saw the failure in core-main:
http://deadlock.netbeans.org/hudson/job/NB-Core-Build/2378/
I have seen it in ergonomics:
http://deadlock.netbeans.org/hudson/job/ergonomics/472/
It is not clear who is responsible for the increased memory usage, but as the problem appears in "testEditor", let's
choose Víťa as the primary suspect.

Anyway, I guess we need a heap dump. I'll see if I can get one locally. It might also be useful to modify common.xml to
start the VM with the heap-dump-on-OOME parameter (-XX:+HeapDumpOnOutOfMemoryError) to simplify future identification of the problem.
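A minimal Java sketch of the heap-dump idea, assuming a HotSpot-based JVM. The suggestion above is to pass the flag when the test VM starts (on the command line: -XX:+HeapDumpOnOutOfMemoryError, optionally with -XX:HeapDumpPath=<dir>); this sketch instead flips the same two manageable flags at runtime through HotSpotDiagnosticMXBean. The class name HeapDumpOnOome is hypothetical and not part of the NetBeans build.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/**
 * Hypothetical sketch: enable "heap dump on OutOfMemoryError" from inside a
 * running HotSpot JVM, as an alternative to -XX:+HeapDumpOnOutOfMemoryError.
 */
public class HeapDumpOnOome {

    /** Turns the dump on and points it at dumpDir (HotSpot-specific). */
    public static void enable(String dumpDir) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Both options are "manageable" HotSpot flags, so they may be changed after startup.
        hotspot.setVMOption("HeapDumpOnOutOfMemoryError", "true");
        hotspot.setVMOption("HeapDumpPath", dumpDir);
    }

    /** Reads back the current value of the flag as a string. */
    public static String current() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class)
                .getVMOption("HeapDumpOnOutOfMemoryError")
                .getValue();
    }

    public static void main(String[] args) {
        enable(System.getProperty("java.io.tmpdir"));
        System.out.println("HeapDumpOnOutOfMemoryError=" + current());
    }
}
```

On a JVM without this HotSpot-specific MXBean the sketch will not work, so the command-line flag passed from the build script remains the more portable choice for the test VM.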
Comment 1 Vitezslav Stejskal 2009-04-01 10:50:43 UTC
This is strange: commit-validation in http://bertram.netbeans.org/hudson/job/jet-main/ passes, and it also passed locally
on my laptop. Maybe jet-main does not yet have the changes causing this problem. I'll keep investigating with Jarda.
Comment 2 Jaroslav Tulach 2009-04-01 11:26:51 UTC
My current understanding of the problem:

There is bug 161648, which causes never-ending deletes of a Lucene index file. Each delete is a "write access"
to the userdir, and CountingSecurityManager records it.

On my computer the "never-ending" loop times out; on other computers, however, it may exhaust memory
due to CSM's internal logging (visible in all the OOME stack traces).

I've just integrated ergonomics#f13b8533de19, which disables CSM after startup. This should help with the OOME, but
commit validation is still unlikely to finish successfully; testEditor is likely to time out.
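To illustrate how recording every write access can exhaust the heap, here is a hypothetical WriteAccessRecorder. It is NOT the real CountingSecurityManager, whose code is not shown in this report. It captures a stack trace per write, which allocates many StackTraceElement objects (consistent with the heap-dump observation in Comment 5), but keeps at most maxTraces of them and can be switched off after startup, loosely mirroring the ergonomics#f13b8533de19 change. The cap is an assumption added for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, NOT the real CountingSecurityManager: records a stack
 * trace for each write access, keeps at most maxTraces of them, and can be
 * switched off once startup is over.
 */
public class WriteAccessRecorder {
    private final int maxTraces;
    private final List<StackTraceElement[]> traces = new ArrayList<>();
    private long writeCount;
    private volatile boolean enabled = true;

    public WriteAccessRecorder(int maxTraces) {
        this.maxTraces = maxTraces;
    }

    /** Called for every write access, e.g. each attempt to delete a file in the userdir. */
    public synchronized void recordWrite() {
        if (!enabled) {
            return; // disabled after startup: nothing is counted or retained
        }
        writeCount++;
        if (traces.size() < maxTraces) {
            // getStackTrace() allocates a fresh StackTraceElement[]; doing this for every
            // write of a never-ending delete loop, with no cap, grows the heap without bound.
            traces.add(new Throwable().getStackTrace());
        }
    }

    /** Stops recording, loosely analogous to disabling CSM after startup. */
    public void disable() {
        enabled = false;
    }

    public synchronized long writeCount() {
        return writeCount;
    }

    public synchronized int retainedTraces() {
        return traces.size();
    }

    public static void main(String[] args) {
        WriteAccessRecorder recorder = new WriteAccessRecorder(100);
        for (int i = 0; i < 1_000_000; i++) {
            recorder.recordWrite(); // simulate a runaway delete loop
        }
        recorder.disable();
        recorder.recordWrite(); // ignored once disabled
        System.out.println(recorder.writeCount() + " writes, " + recorder.retainedTraces() + " traces kept");
    }
}
```

Running main prints "1000000 writes, 100 traces kept": the count still reveals the runaway loop, while memory stays bounded by the cap.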
Comment 3 Vitezslav Stejskal 2009-04-01 12:37:22 UTC
Maybe. It certainly looks random, because http://deadlock.netbeans.org/hudson/job/NB-Core-Build/ is passing now without
any attempt to fix this problem (it definitely does not have your f13b8533de19).
Comment 4 Jaroslav Tulach 2009-04-01 14:49:28 UTC
Well, the failure has to be random; otherwise it would not have gotten through your builder (assuming it is your team's
fault).

Anyway: after my fix the ergo build failed, not on an OOME but on a timeout in testEditor, exactly as I expected:
http://deadlock.netbeans.org/hudson/job/ergonomics/474/testReport/org.netbeans.test.ide/IDECommitValidationTest/testEditor/

org.netbeans.jemmy.TimeoutExpiredException: No event under 11111110111111111111 event mask during 1000 milliseconds
        at org.netbeans.jemmy.Waiter.waitAction(Waiter.java:169)
        at org.netbeans.jemmy.EventTool.waitNoEvent(EventTool.java:319)
        at org.netbeans.jemmy.EventTool.waitNoEvent(EventTool.java:338)
        at org.netbeans.jellytools.JellyTestCase.runBare(JellyTestCase.java:144)
        at org.netbeans.junit.NbTestCase.run(NbTestCase.java:213)
        at org.netbeans.junit.NbModuleSuite$S.runInRuntimeContainer(NbModuleSuite.java:695)
        at org.netbeans.junit.NbModuleSuite$S.run(NbModuleSuite.java:568)
Comment 5 tboudreau 2009-04-01 15:58:07 UTC
I'm curious whether this is the same OOME I've run into a few times. I looked at the heap dump, and the culprit seemed to be
800K+ instances of StackTraceElement.
Comment 6 Jesse Glick 2009-04-01 17:04:28 UTC
I am trying core-main #de53962d1317 to cure the symptom. The root problem of excessive deletes (which I cannot reproduce
locally) would remain.
Comment 7 Jaroslav Tulach 2009-04-02 06:05:27 UTC
Trunk, Core-Main and ergonomics (all running on deadlock) are failing due to this problem:
http://deadlock.netbeans.org/hudson/job/trunk/5429/
http://deadlock.netbeans.org/hudson/job/NB-Core-Build/2387/
http://deadlock.netbeans.org/hudson/job/ergonomics/477/

Is there any logging we should enable to give you more data?
Comment 8 Quality Engineering 2009-04-02 07:35:38 UTC
Integrated into 'main-golden', will be available in build *200904020200* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main-golden/rev/de53962d1317
User: Jesse Glick <jglick@netbeans.org>
Log: Attempted hotfix for #161646: test-induced OOME during C/V.
Comment 9 Vitezslav Stejskal 2009-04-02 09:13:51 UTC
The http://bertram.netbeans.org/hudson/job/jet-main/ build is now failing with the same error. We had a run of ~10
successful builds, which ended with #348. #349 shows the OOME, which was remedied by Jarda's and/or Jesse's fixes in
#350 and later builds. I hope I'll be able to reproduce the problem locally with a full build after hg fetch.
Comment 10 Vitezslav Stejskal 2009-04-02 12:39:20 UTC
The problem is caused by http://hg.netbeans.org/main-silver/rev/2a8c95365005. With this changeset reverted, the commit
validation tests pass again on my laptop. With the changeset in, I am almost always able to reproduce the problem.

Here are the steps for reproducing the problem, copied from issue #161648, which is a duplicate of this one:

1. make sure that you have 2a8c95365005 in your local clone
2. make a full build
3. go to the java.kit module
4. run the following command:

ant test-single -Dtest.includes=**/IDECommitVali* -Dtest.type=qa-functional \
    -Dtest-qa-functional-sys-prop.org.netbeans.modules.parsing.impl.indexing.RepositoryUpdater.level=FINE

The RepositoryUpdater logging was not essential for reproducing the problem, but it increased the probability of failure on
my laptop.
Comment 11 Vitezslav Stejskal 2009-04-02 12:39:52 UTC
*** Issue 161648 has been marked as a duplicate of this issue. ***
Comment 12 Jiri Skrivanek 2009-04-02 16:36:27 UTC
I am sorry for the inconvenience, and thanks to Vita for the evaluation. I had to revert 2a8c95365005. We cannot cache
File.exists() in FileInfo because the file can be deleted while a FileObject is being issued.

http://hg.netbeans.org/core-main/rev/394548265a59
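The reason for the revert can be shown with a small Java sketch. CachingFileInfo and LiveFileInfo are hypothetical stand-ins, not the real FileInfo class from the Filesystems module: the first snapshots File.exists() once at construction, the second asks the filesystem on every call. If the file is deleted between creating the info object and using it (for example while a FileObject is being issued), the cached answer is stale.

```java
import java.io.File;

/** Hypothetical sketch contrasting a cached and a live File.exists() check. */
public class FileInfoCacheDemo {

    /** Snapshots File.exists() once; the answer can go stale. */
    static final class CachingFileInfo {
        private final boolean exists;

        CachingFileInfo(File file) {
            this.exists = file.exists(); // evaluated only at construction time
        }

        boolean exists() {
            return exists; // may no longer match what is on disk
        }
    }

    /** Re-checks the filesystem on every call. */
    static final class LiveFileInfo {
        private final File file;

        LiveFileInfo(File file) {
            this.file = file;
        }

        boolean exists() {
            return file.exists(); // reflects the current state of the disk
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("fileinfo", ".tmp");
        CachingFileInfo cached = new CachingFileInfo(f);
        LiveFileInfo live = new LiveFileInfo(f);
        f.delete(); // another thread or process could do this at any time
        System.out.println("cached=" + cached.exists() + " live=" + live.exists());
    }
}
```

Running main prints "cached=true live=false": after the delete, only the non-caching variant reports the file's real state.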
Comment 13 Quality Engineering 2009-04-03 07:52:07 UTC
Integrated into 'main-golden', will be available in build *200904030200* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main-golden/rev/394548265a59
User: Jiri Skrivanek <jskrivanek@netbeans.org>
Log: #161646 - Revert 2a8c95365005 of issue 66690 because it caused a regression. We cannot cache File.exists() in FileInfo because file can be deleted during issuing a FileObject.