This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 181684

Summary: NB platform consumes a lot of memory on big projects.
Product: platform Reporter: Alexander Simon <alexvsimon>
Component: Filesystems Assignee: Jaroslav Tulach <jtulach>
Status: VERIFIED FIXED    
Severity: normal CC: apireviews, hmichel, issues, issues, jglick, jtulach, misterm, ovrabec, pjiricka, tmysik, tstupka, vstejskal
Priority: P2 Keywords: API_REVIEW_FAST, PERFORMANCE
Version: 6.x   
Hardware: PC   
OS: Solaris   
Issue Type: DEFECT Exception Reporter:
Bug Depends on: 183162, 183225    
Bug Blocks: 171672, 189988    
Attachments: memory snapshot at scanning time
patch for reducing memory
memory snapshot without .svn folders
Reduce memory on 20Mb patch
Stacktrace showing the ProjectOpenedHook activity
Enhancement in masterfs to allow versioning systems to skip .svn folders & etc.
New patch with test
versioning spi changes
versioning spi test

Description Alexander Simon 2010-03-09 06:49:17 UTC
Steps to reproduce:
Follow instruction on chromium site: http://dev.chromium.org/Home
1. Download sources
2. Build chromium, provide CFLAGS="-g3 -gdwarf-2" and CXXFLAGS="-g3 -gdwarf-2".
3. Start IDE with 600Mb of heap and create C/C++ project from existing 
code. Do not select build, select manual configuring code assistance.
4. Switch off Code assistance "Project popup menu->Code 
Assistance->C/C++ code Assistance"
5. Exit from IDE.
6. Clear user dir.
7. Start IDE and open created project.
At this point the project does not have C/C++ support. Only 20000 C/C++ data 
objects (11Mb) are in memory.
The rest of the memory is consumed by the IDE platform.
Memory grows constantly (up to 440Mb) while scanning is 
in progress (284Mb of heap that cannot be GCed).
At the end of scanning the IDE holds 234Mb of heap that cannot be GCed.
The biggest objects:
88Mb org.netbeans.modules.masterfs.filebasedfs.fileobjects.FolderObj
78Mb org.netbeans.modules.masterfs.filebasedfs.children.ChildrenSupport
67Mb org.netbeans.modules.masterfs.filebasedfs.naming.FileName
64Mb org.netbeans.modules.masterfs.filebasedfs.fileobjects.FileObjectKeeper

Size of strings 84Mb.
I see a lot of duplicated strings.
For example:
682 "14.3-b01 (Sun Microsystems Inc.)"
690 "false"
580 "Java > 1.6"
778 "1.0"
70 "1.7"
13 "org.netbeans.modules.cnd.debugger.common.resources.Bundle"
13 "SeparatorAfterFormat.instance"
3 
"/net/elif/export1/sside/av202691/chromium-trunk/src/tools/traceline/svgui" 
(1 instance located in the pool)
2 
"org-netbeans-modules-html-editor-coloring-EmbeddingHighlightsLayerFactory.instance"
+ path names of project folders/files have 3-5 duplicated string instances
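
One common way to collapse duplicated strings like those listed above is a canonicalizing pool. The following is only a minimal sketch; the class name StringPool is hypothetical and it is not code from the NetBeans platform:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical canonicalizing pool; not part of the NetBeans code base. */
final class StringPool {
    private final Map<String, String> pool = new HashMap<>();

    /** Returns one shared instance for all strings that are equal to s. */
    String canonical(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct strings currently held by the pool. */
    int size() {
        return pool.size();
    }
}
```

Note that this pool holds strong references, so a long-lived pool would itself keep every pooled string alive; a real implementation would probably need weak references or a bounded lifetime.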

To fix BZ#171672 (Cannot create project for Chrome sources with -Xmx512m) we need help from the IDE => could you please think about how to 
reduce platform memory consumption by 100Mb?
Comment 1 Jaroslav Tulach 2010-03-09 07:17:37 UTC
Can you generate a heap dump and send me a link where I can download it? That will speed up my evaluation.

Otherwise this is related to addRecursiveListener. It needs to keep all objects representing folders under source roots in memory. 88MB of FolderObj is definitely a lot. That shall be made smaller. Btw. can you count the # of folders in the project?

I cannot help you with your string memory issue, however, unless you provide a GC root path for some of the strings. Report it as a separate issue (I do not think such strings represent file names) if you believe it is worth the effort.
Comment 2 Alexander Simon 2010-03-09 08:56:28 UTC
Answers for all questions are in memory snapshots:
/net/elif.russia.sun.com/export1/sside/av202691/ChromiumMemoryProfiling/
Folder contains 4 intermediate (while scanning) snapshots and last after finishing scanning:
Main-2010-03-09.snapshot
Main-2010-03-09(1).snapshot
Main-2010-03-09(2).snapshot
Main-2010-03-09(3).snapshot
Main-2010-03-09(4).snapshot
Snapshots also have information about object allocations (each 10-th).
Snapshots were taken by http://www.yourkit.com/ version 8.0.23.
YourKit license server is endif.russia.sun.com.
Also you can use built sources and project Chromium in the folder /net/elif.russia.sun.com/export1/sside/av202691:
- chromium-trunk
- chromas
But I am not sure that CND can correctly understand the full server name /net/elif.russia.sun.com/... because the root /net/elif/... was used for building and project creation.
Are resources available from your net?
Comment 3 Jaroslav Tulach 2010-03-10 03:45:23 UTC
The chromium sources seem to have a lot of directories, but many of them are SVN ones:

av202691/chromium-trunk$ find . -type d | wc -l
   32822
av202691/chromium-trunk$ find . -type d | grep -v .svn | wc -l
    4990

Five thousand directories is still quite a lot, but significantly less than 33 thousand. 

As the .svn ones are hidden anyway, the question is whether masterfs's recursive listener shall observe their changes or not. CCing Ondřej so he knows that I am considering not listening on .svn directories and their content.
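
The idea of not descending into .svn directories can be illustrated with a plain java.nio walk. This is a sketch only: the class WatchedFolders and its method are hypothetical and this is not how masterfs is implemented:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; illustrates skipping .svn subtrees, not masterfs itself. */
final class WatchedFolders {
    /** Collects root and all its subdirectories except .svn folders and their content. */
    static List<Path> collect(Path root) throws IOException {
        List<Path> result = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                Path name = dir.getFileName();
                if (name != null && ".svn".equals(name.toString())) {
                    // Do not visit this directory or anything below it.
                    return FileVisitResult.SKIP_SUBTREE;
                }
                result.add(dir);
                return FileVisitResult.CONTINUE;
            }
        });
        return result;
    }
}
```

Only the directories in the returned list would then need an object kept in memory for listening.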
Comment 4 Jaroslav Tulach 2010-03-10 04:28:37 UTC
I shall also point out for Víťa that (as soon as bug 180523 is implemented) there can be a hard limit on the size of subdirectories for a source root, and the parsing API can disable the addRecursiveListener completely, asking the user to do a manual refresh via Sources/Scan for External Changes
Comment 5 Quality Engineering 2010-03-12 04:32:54 UTC
Integrated into 'main-golden', will be available in build *201003120200* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main/rev/414a5a698cea
User: Jaroslav Tulach <jtulach@netbeans.org>
Log: Doing something with #181684: One Object and one field less per each FolderObj
Comment 6 Ondrej Vrabec 2010-03-15 05:46:57 UTC
We don't handle FS events on svn metadata in any special way now, so it'll be all right if the recursive listener is not added to .svn folders or any of its children.
Comment 7 Jaroslav Tulach 2010-03-16 09:11:34 UTC
I did another change in core-main#bb1368f4507a

I failed to download the snapshots (2h not enough to download a single one of them). Please remeasure on your side. Provide the list of biggest objects as you did, please include also the # of their instances. Thanks.
Comment 8 Alexander Simon 2010-03-16 13:06:49 UTC
Created attachment 95231 [details]
memory snapshot at scanning time
Comment 9 Alexander Simon 2010-03-16 13:12:35 UTC
Size of not GCed memory is 398Mb.
After scanning finished:
- memory is reduced to 216Mb
- the biggest objects (linked lists) go away
- the number of FolderObj and FileName objects is the same.
Comment 10 Alexander Simon 2010-03-16 13:40:57 UTC
By the way, does IDE really need to store time stamps as strings (see org.netbeans.modules.parsing.impl.indexing.TimeStamps)?
Each string consumes 64 bytes.
Long consumes only 16 bytes.
The best optimization is:
- use a specialized hash map that keeps the long in the entry.
That reduces each value to 8 bytes.
For example, see org.netbeans.modules.cnd.repository.util.LongHashMap.
It would reduce heap memory by 4Mb. (From the CND point of view it is low-hanging fruit.)
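
To illustrate the idea of keeping the long directly in the map instead of a String or a boxed Long, here is a minimal open-addressing sketch. The class LongValueMap is hypothetical and is not the CND LongHashMap; it does not support null keys or removal:

```java
/** Hypothetical map from object keys to primitive long values (no boxing). */
final class LongValueMap<K> {
    private Object[] keys = new Object[16];   // length is always a power of two
    private long[] values = new long[16];     // values stored as primitives
    private int size;

    /** Associates value with key, replacing any previous value. */
    void put(K key, long value) {
        if ((size + 1) * 2 > keys.length) {
            resize();                          // keep the table at most half full
        }
        int i = indexOf(key, keys);
        if (keys[i] == null) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    /** Returns the stored value, or defaultValue when the key is absent. */
    long get(K key, long defaultValue) {
        int i = indexOf(key, keys);
        return keys[i] == null ? defaultValue : values[i];
    }

    int size() {
        return size;
    }

    /** Linear probing: returns the slot holding key, or the first free slot. */
    private static int indexOf(Object key, Object[] table) {
        int mask = table.length - 1;
        int h = key.hashCode();
        int i = (h ^ (h >>> 16)) & mask;
        while (table[i] != null && !table[i].equals(key)) {
            i = (i + 1) & mask;
        }
        return i;
    }

    private void resize() {
        Object[] oldKeys = keys;
        long[] oldValues = values;
        keys = new Object[oldKeys.length * 2];
        values = new long[oldKeys.length * 2];
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != null) {
                int i = indexOf(oldKeys[j], keys);
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }
}
```

With such a structure a time stamp costs one slot in a long[] rather than a separate String or Long object per entry.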
Comment 11 Alexander Simon 2010-03-16 14:13:19 UTC
It seems constructor allocates a lot of empty ArrayLists with size 10:
FileObject.ED(FCLSupport.Op op, Enumeration<FileChangeListener> en, FileEvent fe) {
..
            fsList = (fsll != null) ? new ArrayList<FileChangeListener>(fsll.getAllListeners()) :
                new ArrayList<FileChangeListener>();
            repList = (repll != null) ? new ArrayList<FileChangeListener>(repll.getAllListeners()) :
                new ArrayList<FileChangeListener>();
}

I do not see any modifications of fsList & repList
So it can be rewritten:
            fsList = (fsll != null) ? new ArrayList<FileChangeListener>(fsll.getAllListeners()) :
               Collections.<FileChangeListener>emptyList();
            repList = (repll != null) ? new ArrayList<FileChangeListener>(repll.getAllListeners()) :
               Collections.<FileChangeListener>emptyList();
IMHO it saves more than 10Mb
Comment 12 Jaroslav Tulach 2010-03-22 13:54:32 UTC
FileObject.ED are short time living objects. Are you saying they were present in significant amount in your memory snapshots?
Comment 13 Alexander Simon 2010-03-22 14:32:33 UTC
(In reply to comment #12)
> FileObject.ED are short time living objects. Are you saying they were present
> in significant amount in your memory snapshots?
According to memory snapshot size of FileObject.ED is 103Mb non-GCed memory.
Comment 14 Jaroslav Tulach 2010-03-23 09:52:38 UTC
(In reply to comment #13)
> According to memory snapshot size of FileObject.ED is 103Mb non-GCed memory.

Oops. That is quite a lot. I can imagine ED exists when some kind of refresh is in progress, but then it shall be discarded. If you give me a path from some ED to a GC root, I'll try to break that chain somehow.
Comment 15 Vladimir Voskresensky 2010-03-23 11:03:50 UTC
Created attachment 95596 [details]
patch for reducing memory
Comment 16 Vladimir Voskresensky 2010-03-23 12:00:56 UTC
The proposed patch reduces memory without changing semantics.
+ I've made the class static => 
reduced by 4 bytes as well, but the object is still 32 bytes because of alignment.
There is a possibility to reduce the object to 24 bytes by using one collection for listeners, if that is possible
Comment 17 Jaroslav Tulach 2010-03-23 13:03:46 UTC
Integrated as core-main#1a92fa20fe36 (hopefully correct and without indentation changes). I would still like to understand why these objects are held in memory, however (if they are there for a long term).
Comment 18 Vladimir Voskresensky 2010-03-23 13:42:31 UTC
I think Alexander provided snapshot... But you can run it yourself.

Btw, can you confirm for :
                FileSystem fs = fe.getFile().getFileSystem();
                Repository rep = fs.getRepository();
is the following always true? 
(fs == null) == (rep == null)

If yes => use one listeners list and it would save 8 bytes per object
Comment 19 Vladimir Voskresensky 2010-03-23 13:43:39 UTC
Oops, fs can not be null, otherwise NPE
Comment 20 Vladimir Voskresensky 2010-03-23 14:26:57 UTC
Jarda, can you enhance patch to have logic which checks not only null, but as in patch hasListeners()
+            if (fsll != null && fsll.hasListeners()) {
+                fsList = new ArrayList<FileChangeListener>(fsll.getAllListeners());
+            } else {
+                fsList = Collections.<FileChangeListener>emptyList();
+            }
because
    ListenerList() {
        listenerList = new ArrayList<T>();
    }
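
The pattern discussed in the comments above (share one immutable empty list whenever there is nothing to copy) can be sketched in isolation as follows. ListenerSnapshots is a hypothetical helper and uses a plain Collection instead of the ListenerList class from the patch:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper showing the "empty list instead of new ArrayList" pattern. */
final class ListenerSnapshots {
    /** Copies the listeners, or returns the shared empty list when there are none. */
    static <T> List<T> snapshot(Collection<T> listeners) {
        if (listeners == null || listeners.isEmpty()) {
            // No per-call allocation; the result is immutable.
            return Collections.emptyList();
        }
        return new ArrayList<>(listeners);
    }
}
```

This is only safe when the resulting list is never modified afterwards, which matches the observation above that fsList and repList are not modified.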
Comment 21 Jaroslav Tulach 2010-03-24 18:11:54 UTC
core-main#7d09172f8700, can we now close this bug as fixed?
Comment 22 Alexander Simon 2010-03-24 19:48:15 UTC
IMHO it is the first fix in a long chain of fixes.
The mentioned fix improves only peak memory consumption.
See all the problems in the attached memory snapshot.
Comment 23 Vladimir Voskresensky 2010-03-25 07:09:18 UTC
Alexander, the subject of the issue is very abstract. Maybe it is worth using this as an umbrella task and filing each problem as a separate issue?

Yarda, what do you think?
Comment 24 Jaroslav Tulach 2010-03-25 08:43:38 UTC
I will continue then with ignoring .svn subfolders. That will reduce the number of FolderObj to 20%. If there are other things to do, feel free to report them separately.
Comment 25 Alexander Simon 2010-03-25 10:18:58 UTC
I would suggest the following criterion for "bug is fixed":
- With C/C++ code assistance switched off, the IDE can open the project, finish scanning and open one file in the editor without an out of memory exception in a 200Mb heap.
Do you agree?
Comment 26 Jaroslav Tulach 2010-03-25 13:00:08 UTC
Before we get to goals, can you "find . -name .svn | xargs rm -rf" and remeasure the current memory requirements without subversion being on?
Comment 27 Alexander Simon 2010-03-25 18:15:53 UTC
Created attachment 95855 [details]
memory snapshot without .svn folders
Comment 28 Alexander Simon 2010-03-25 18:19:44 UTC
Size of not GCed memory is 220Mb.
After scanning finished memory is reduced to 113Mb.
Comment 29 Alexander Simon 2010-03-26 08:00:07 UTC
Vladimir, could you suggest a patch for removing strings from
org.netbeans.modules.parsing.impl.indexing.TimeStamps?
In fact the client sets longs that are stored as strings.
The class stores strings because that allows it to use the standard Properties load and store methods.
IMHO it is a weak reason to keep strings in memory.
Could you also suggest moving the CND LongHashMap class into the NB utilities?
Comment 30 Alexander Simon 2010-03-26 10:44:49 UTC
Created attachment 95923 [details]
Reduce memory on 20Mb patch

Patch that reduces FileName size from 27Mb to 7Mb.
If you agree with the patch, what do you think about moving org.netbeans.modules.cnd.utils.cache.CharSequenceKey into the NB utilities API?
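
The general technique behind such a compact character sequence can be sketched as below: names whose characters all fit in one byte are stored in a byte[] instead of a char[]. CompactName is a hypothetical class written for illustration, not the CND CharSequenceKey, and on Java 9 and later String itself already uses a similar compact representation:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical byte-backed CharSequence for names made of chars 0..255. */
final class CompactName implements CharSequence {
    private final byte[] bytes;

    private CompactName(byte[] bytes) {
        this.bytes = bytes;
    }

    /** Returns a compact sequence when possible, otherwise the string itself. */
    static CharSequence of(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return s;                      // cannot be stored one byte per char
            }
        }
        return new CompactName(s.getBytes(StandardCharsets.ISO_8859_1));
    }

    @Override
    public int length() {
        return bytes.length;
    }

    @Override
    public char charAt(int index) {
        return (char) (bytes[index] & 0xFF);   // undo sign extension of byte
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return toString().subSequence(start, end);
    }

    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CompactName && Arrays.equals(bytes, ((CompactName) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```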
Comment 31 Jaroslav Tulach 2010-03-29 07:44:13 UTC
Interesting patch, having efficient compacted string storage could improve many places where we store long time existing strings (module system and apisupport come to my mind). If you want to donate this, I suggest to put the API into org.openide.util.CharSequences (just few static methods, right?). Please start the API review for that in separate issue. When finished, I'll "just" use it in masterfs.
Comment 32 Vitezslav Stejskal 2010-03-30 11:17:13 UTC
(In reply to comment #0)
> 3. Start IDE with 600Mb of heap and create C/C++ project from existing 
> code. Do not select build, select manual configuring code assistance.

How exactly do I do this? The New Project wizard wants either makefile or a 'configure' script capable of generating makefile. I have neither of those...
Comment 33 Vitezslav Stejskal 2010-03-30 11:44:26 UTC
(In reply to comment #32)
> (In reply to comment #0)
> > 3. Start IDE with 600Mb of heap and create C/C++ project from existing 
> > code. Do not select build, select manual configuring code assistance.
> 
> How exactly do I do this? The New Project wizard wants either makefile or a
> 'configure' script capable of generating makefile. I have neither of those...

Ok, I created a fake makefile (an empty one) and managed to create C/C++ project somehow. I selected chromium/src as the sources folder. Clicked OK in the New Project wizard and have been waiting since then for the project to open... No scanning, just opening the project, which seems to be stuck in MakeConfigurationDescriptor and collecting files. I'll attach the stacktrace.
Comment 34 Vitezslav Stejskal 2010-03-30 11:45:46 UTC
Created attachment 96321 [details]
Stacktrace showing the ProjectOpenedHook activity
Comment 35 Alexander Simon 2010-03-30 12:02:03 UTC
(In reply to comment #34)
> Created an attachment (id=96321) [details]
> Stacktrace showing the ProjectOpenedHook activity
It is ordinary C/C++ project creation. It consumes a lot of time because it performs a recursive ls.
Comment 36 Vitezslav Stejskal 2010-03-30 14:40:19 UTC
I changed TimeStamps to use LongHashMap. Thanks
http://hg.netbeans.org/jet-main/rev/a3f97829146c

On the other hand, it seems to me that there are much bigger problems in the C/C++ infrastructure (or platform) itself that prevent using such large projects like chromium. When I tested it following the steps in the first post here the project did not even open.

<crying>
I remember that in 6.7 we struggled to get the IDE open the ACE project, which contained ~60k files. The main problem at that time was files crawling and mime types recognition. This time we are asked to open a project with ~300k files in ~35k folders. The physics may be the limit this time...
</crying>

On the constructive note I'd suggest to temporarily turn off registering source path from the C/C++ project's open-hook-impl. This should effectively turn off indexing (there will be no source roots to scan). If the IDE can start, open and work with Chromium project within a reasonable memory heap and be reasonably responsive we can then look at how much harm is done by indexing and either improve it or avoid using it for C/C++ projects.
Comment 37 Alexander Simon 2010-03-30 15:59:56 UTC
It seems we are close to the target.
Two improvements allow the project to be parsed and scanned in 512Mb:
- See Comment #27
- See Comment #30 

This project has a performance problem in scanning:
- some of the files under the root have a "parser error" from the point of view of the html, css, js, ... parsers. Such parsers have bad error recovery algorithms and consume a lot of time on throwing, catching and logging exceptions.
Comment 38 Vladimir Voskresensky 2010-03-30 20:01:36 UTC
(In reply to comment #36)
> I changed TimeStamps to use LongHashMap. Thanks
> http://hg.netbeans.org/jet-main/rev/a3f97829146c
I think, we need to move it into org.openide.util close to WeakSet.
Jarda, what do you think?
Don't like duplication of such huge code.

> 
> On the constructive note I'd suggest to temporarily turn off registering source
> path from the C/C++ project's open-hook-impl. This should effectively turn off
> indexing (there will be no source roots to scan). If the IDE can start, open
> and work with Chromium project within a reasonable memory heap and be
> reasonably responsive we can then look at how much harm is done by indexing and
> either improve it or avoid using it for C/C++ projects.
We are thinking about removing usage of the indexing API for C++ for 6.9 (issue #182884),
+ we are open to providing all our optimized structures to the NB Platform 
+ coming back to use of the Indexing API in the next release.
Comment 39 Jaroslav Tulach 2010-03-31 07:32:28 UTC
> > http://hg.netbeans.org/jet-main/rev/a3f97829146c
> I think, we need to move it into org.openide.util close to WeakSet.
> Jarda, what do you think?
> Don't like duplication of such huge code.

I do not like such a huge amount of code in openide.util either. But if you can simplify the API to something like:

public static <K> Map<Long,K> createLongMap(capacity, factor);

then creating org.openide.util.Maps is probably appropriate.
Comment 40 Vladimir Voskresensky 2010-03-31 08:11:38 UTC
(In reply to comment #39)
> > > http://hg.netbeans.org/jet-main/rev/a3f97829146c
> > I think, we need to move it into org.openide.util close to WeakSet.
> > Jarda, what do you think?
> > Don't like duplication of such huge code.
> 
> I do not like such a huge amount of code in openide.util either. But if you can
> simplify the API to something like:
> 
> public static <K> Map<Long,K> createLongMap(capacity, factor);
it is not Map<Long, K>, it is "Map<K, long>", which is not possible to declare in Java because a primitive type cannot be a generic type parameter. The purpose of this class is to prevent boxing/unboxing and to provide a memory-efficient Entry impl.
Comment 41 Quality Engineering 2010-04-02 05:04:06 UTC
Integrated into 'main-golden', will be available in build *201004020200* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main/rev/a3f97829146c
User: Vita Stejskal <vstejskal@netbeans.org>
Log: #181684: using LongHashMap for timestamps
Comment 42 Vladimir Voskresensky 2010-04-06 14:40:22 UTC
use optimized char sequence impl (27->7 Mb)
http://hg.netbeans.org/cnd-main/rev/2f368f1909c3
Comment 43 Vladimir Voskresensky 2010-04-06 14:41:38 UTC
what's the progress with remaining SVN issue? Do we need separate blocker bug?
Comment 44 Quality Engineering 2010-04-07 04:47:16 UTC
Integrated into 'main-golden', will be available in build *201004070201* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main/rev/2f368f1909c3
User: Vladimir Voskresensky <vv159170@netbeans.org>
Log: fixing #181684 -  NB platform consumes a lot of memory on big projects. (use optimized CharSequences)
Comment 45 Jaroslav Tulach 2010-04-07 12:44:31 UTC
Created attachment 96857 [details]
Enhancement in masterfs to allow versioning systems to skip .svn folders & etc.
Comment 46 Jaroslav Tulach 2010-04-07 12:47:31 UTC
Let's review this small addition to ProvidedExtensions class. First and foremost Ondra needs to modify versioning and check that this API change allows SVN to skip .svn folders (and detect change in entries file), Mercurial to watch time stamp of file inside .hg that changes when one does checkin and in general remove any need for FileChangeListener in the versioning implementations.
Comment 47 Ondrej Vrabec 2010-04-08 14:54:54 UTC
Jarda, your patch does not seem to work: refreshRecursively is not delegated to the ProvidedExtensions implementation in versioning. I tried the patch and overrode refreshRecursively in versioning.FilesystemInterceptor, yet the default implementation in masterfs is still executed. 
Unless I am mistaken, you probably also need to override refreshRecursively in org.netbeans.modules.masterfs.ProvidedExtensionsProxy, which seems to be the class that finally delegates all ProvidedExtensions methods to versioning.
See also org.netbeans.modules.masterfs.filebasedfs.FileBasedFileSystem.StatusImpl.getExtensions(): it returns masterfs.ProvidedExtensionsProxy instead of versioning.FilesystemInterceptor
Comment 48 Jaroslav Tulach 2010-04-11 13:20:54 UTC
Created attachment 97027 [details]
New patch with test

This is what one gets when thinking "I'll write a test later...". Nothing obviously works then.
Comment 49 Ondrej Vrabec 2010-04-12 16:20:19 UTC
Extended versioning SPI - delegating refreshRecursively to particular versioning systems
Comment 50 Ondrej Vrabec 2010-04-12 16:20:26 UTC
Created attachment 97111 [details]
versioning spi changes
Comment 51 Tomas Stupka 2010-04-12 17:13:24 UTC
Created attachment 97114 [details]
versioning spi test
Comment 52 Ondrej Vrabec 2010-04-13 13:01:48 UTC
[OV01] IMO masterfs.ProvidedExtensions.refreshRecursively() should return "-1" (instead of "0") as default. masterfs.ProvidedExtensionsProxy.refreshRecursively() iterates through all registered implementations of ProvidedExtensions until the first one is found that returns a value other than "-1" (so e.g. "0"). Currently versioning is probably the only implementor of ProvidedExtensions so everything works just fine, however what if there was another simple implementation which did not override the refreshRecursively method and thus always returning 0? When this happens and the new implementation precedes versioning in the iterator, it will effectively suppress the versioning implementation even without wanting to handle recursive listening in a specific way.
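
The delegation rule proposed in OV01 can be sketched as follows. The names RefreshHandler and RefreshProxy and the simplified method signature are hypothetical and do not reproduce the real ProvidedExtensions API; the sketch only shows why a "not handled" default of -1 lets a passive implementation stay out of the way:

```java
import java.util.List;

/** Hypothetical stand-in for an extension point with a "not handled" default. */
interface RefreshHandler {
    long NOT_HANDLED = -1;

    /** Default: this implementation has no opinion about recursive refresh. */
    default long refreshRecursively(String folder, long lastTimeStamp) {
        return NOT_HANDLED;
    }
}

/** Hypothetical proxy that asks each handler until one actually handles the call. */
final class RefreshProxy {
    private final List<RefreshHandler> delegates;

    RefreshProxy(List<RefreshHandler> delegates) {
        this.delegates = delegates;
    }

    long refreshRecursively(String folder, long lastTimeStamp) {
        for (RefreshHandler handler : delegates) {
            long result = handler.refreshRecursively(folder, lastTimeStamp);
            if (result != RefreshHandler.NOT_HANDLED) {
                return result;                 // first real answer wins
            }
        }
        return RefreshHandler.NOT_HANDLED;     // caller falls back to default behavior
    }
}
```

If the default were 0 instead, a handler that never overrides the method and happens to come first in the iteration would end the loop and hide the versioning implementation.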
Comment 53 Jaroslav Tulach 2010-04-15 14:47:53 UTC
Thanks for review. I'll implement OV01 and integrate my part tomorrow. Then I assign back to Ondřej.
Comment 54 Jaroslav Tulach 2010-04-16 10:39:52 UTC
My part done in core-main#3da55e68c8b5, passing to Ondřej.
Comment 55 Quality Engineering 2010-04-17 08:21:24 UTC
Integrated into 'main-golden', will be available in build *201004170515* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main/rev/3da55e68c8b5
User: Jaroslav Tulach <jtulach@netbeans.org>
Log: #181684: Giving ProvidedExtensions chance to optimize behavior of addRecursiveListener
Comment 56 Ondrej Vrabec 2010-04-19 09:45:39 UTC
versioning.spi changes: http://hg.netbeans.org/cdev/rev/d88f0ca0d0d6
Comment 57 Ondrej Vrabec 2010-04-19 09:45:51 UTC
changes in mercurial, subversion and cvs: http://hg.netbeans.org/cdev/rev/18eaa28d766c
Comment 58 Ondrej Vrabec 2010-04-19 09:47:56 UTC
fixed in versioning, reassigning back to jarda, so he can make any additional changes, test all changes and finally close the issue
Comment 59 Jaroslav Tulach 2010-04-19 10:19:15 UTC
Your change looks meaningful, however all I can say is that testing will happen as part of verification and is up to Alex.
Comment 60 Quality Engineering 2010-04-20 05:18:09 UTC
Integrated into 'main-golden', will be available in build *201004200200* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main/rev/d88f0ca0d0d6
User: Ondrej Vrabec <ovrabec@netbeans.org>
Log: Issue #181684 - NB platform consumes a lot of memory on big projects.
Comment 61 Alexander Simon 2010-04-20 11:38:34 UTC
Verified. The Chromium project can be opened in 512 Mb.
Profiling results, consumption of non GCed memory:
-CND parsing -Indexing Memory  80Mb
+CND parsing -Indexing Memory 207Mb
-CND parsing +Indexing Memory 119Mb
+CND parsing +Indexing Memory 218Mb