CND uses the ACE+TAO project as a benchmark for improving performance and memory consumption, and the Scanning Projects
task slows down parsing.
Here are some numbers:
I) Cold open of the project gives the following numbers:
-- CND parser stopwatch: 794082 ms
-- Indexer stops: INFO [org.netbeans.modules.parsing.impl.indexing.RepositoryUpdater]: Complete indexing of 2 source
roots took: 1852277 ms
II) Close the IDE, remove the CND cache, leave the indexer cache, start the IDE:
-- CND parser stopwatch: 717407 ms
-- Indexer stops: INFO [org.netbeans.modules.parsing.impl.indexing.RepositoryUpdater]: Complete indexing of 3 source
roots took: 291207 ms
I see several issues here:
- CND is able to parse ACE+TAO faster than the indexer enumerates files... not good, I would say.
- The indexer touches files and consumes I/O resources very intensively, which affects our lexing phase, which also needs I/O.
Some more details about the project used:
- the project dir has about 65 000 files; it's normal for a C/C++ project to mix object files and source files in the same
folder during build
- only 14 500 of them are source files
- only the crawler task is doing its work, and it spends most of its time detecting mime types
What I can propose:
- provide IndexerVisibilityQueries for source roots, so that the C++ source roots provider can supply its own filter for which
files to skip; there is no sense in detecting mime types of all binaries in the directory, such as object files and intermediate files
-- this is different from VisibilityQuery, because such files still need to be shown in the Files view
- allow the crawler task to be postponed until CND finishes its own parse
and of course, please, speed it up :-)
1852277 ms (about 31 minutes) is unbelievable;
we need only 10 minutes to read the full DWARF info from all those files and construct a CND project
(3x faster than reading a few bytes at the beginning of each file...)
(this is a CND requirement for 6.8 planning)
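To make the IndexerVisibilityQueries proposal above more concrete, here is a minimal sketch of what such a filter could look like. Everything in it is an assumption for illustration: the interface name IndexerVisibilityQuery, the class CppBuildArtifactFilter, and the extension list are hypothetical and are not taken from the NetBeans or CND sources. The point is only that the decision is made from the file name, before any mime-type detection touches the disk.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: none of these types exist in the NetBeans APIs.
class IndexerVisibilitySketch {

    /** Hypothetical per-root filter the crawler would consult before detecting mime types. */
    interface IndexerVisibilityQuery {
        boolean isIndexable(String relativePath);
    }

    /** Assumed C/C++ implementation: rejects build products by file extension only. */
    static final class CppBuildArtifactFilter implements IndexerVisibilityQuery {
        // Illustrative extension list, not taken from the CND sources.
        private static final Set<String> SKIPPED =
                new HashSet<>(Arrays.asList("o", "obj", "a", "so", "lib", "dll", "exe"));

        @Override
        public boolean isIndexable(String relativePath) {
            String name = relativePath.substring(relativePath.lastIndexOf('/') + 1);
            int dot = name.lastIndexOf('.');
            if (dot <= 0) {
                return true; // no extension (or a dotfile): leave the decision to the indexer
            }
            String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
            return !SKIPPED.contains(ext);
        }
    }

    public static void main(String[] args) {
        IndexerVisibilityQuery query = new CppBuildArtifactFilter();
        for (String path : new String[] {"ace/Task.cpp", "ace/Task.o", "lib/libACE.so", "Makefile"}) {
            System.out.println(path + " -> " + (query.isIndexable(path) ? "index" : "skip"));
        }
    }
}
```

Such a filter affects only what the indexer crawls; the skipped files would remain visible in the Files view, which is the difference from VisibilityQuery noted above. An extension check like this would miss versioned libraries (e.g. names ending in a version number), so a real filter would probably need more than one rule.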
I have profiled scanning of ACE+TAO and I see two problems:
- a probably unnecessary flush of the Lucene index after every 2000 addDocument calls
- unnecessary creation of snapshots when there is no parser for the mime type
Please review and apply the proposed patches. Scanning time dropped from 97 sec to 38 sec after applying these fixes.
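As a rough illustration of the first problem, the sketch below counts how often a writer would flush with and without a document-count threshold. CountingWriter is a hypothetical stand-in that only counts flushes; it is not the Lucene IndexWriter API and does not reproduce the attached patches. The 14 500 input is simply the source-file count quoted earlier in this issue, used as an example size.

```java
// Hypothetical sketch: CountingWriter only models flush counting, it is not Lucene.
class FlushPolicySketch {

    /** Stand-in for an index writer that flushes when a buffered-document threshold is reached. */
    static final class CountingWriter {
        private final int maxBufferedDocs; // <= 0 means "never flush before close"
        private int buffered;
        private int flushes;

        CountingWriter(int maxBufferedDocs) {
            this.maxBufferedDocs = maxBufferedDocs;
        }

        void addDocument() {
            buffered++;
            if (maxBufferedDocs > 0 && buffered >= maxBufferedDocs) {
                flush();
            }
        }

        /** Writes out whatever is still buffered. */
        void close() {
            if (buffered > 0) {
                flush();
            }
        }

        private void flush() {
            flushes++;
            buffered = 0;
        }

        int flushes() {
            return flushes;
        }
    }

    /** Adds the given number of documents, closes the writer, and returns the flush count. */
    static int flushesFor(int docs, int maxBufferedDocs) {
        CountingWriter writer = new CountingWriter(maxBufferedDocs);
        for (int i = 0; i < docs; i++) {
            writer.addDocument();
        }
        writer.close();
        return writer.flushes();
    }

    public static void main(String[] args) {
        System.out.println(flushesFor(14500, 2000)); // prints 8
        System.out.println(flushesFor(14500, 0));    // prints 1
    }
}
```

The trade-off of removing the threshold is memory: everything stays buffered until the final flush, so this only makes sense if the writer bounds its buffer in some other way.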
Created attachment 84662
Created attachment 84663
no extra store
From jlahoda by private email:
I am sending a quick and dirty (esp. the ClassPathProvider part) patch to test the idea of excluding files from indexing that I
talked about at today's meeting. You will need to adjust the "includes" method in "PathResourceImpl" to suit your needs.
Could you please test the patch to see how big the improvement is, if any? The excluding seems to work (I did not see a
.o file in Go to File when I used the patch).
Created attachment 84793
jlahoda's patch for C++ ClassPathProvider implementation
I have applied this already. But it's not enough :-)
I pushed changes that I believe fix this problem. The main change is the new crawling algorithm, which does not resolve
mime types until they are really needed. Even then it is done in a way that should minimize the number of disk
reads from the files (i.e. using FileUtil.getMIMEType(FileObject f, String... mimeTypes), preferring indexers for
mime types that are recognized by file extension, etc.). I tested this change with ACE+TAO, and the up-to-date check for
~50k files in the project takes around 20 seconds, which I think is acceptable. The cold start is still slow; the
indexing and C++ parsing run in parallel, and C++ parsing is still faster than the indexing. I'll try to investigate
what exactly is done there. If this is important for the C++ folks, please file a separate issue. The second start is much
faster, with the indexing doing only its up-to-date check (~20 sec).
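The idea of deferring mime-type resolution can be sketched roughly as follows. This is not the code from the changeset: CrawledFile, MimeResolver, the extension table, and the mime-type strings are hypothetical names chosen for illustration, and the content sniffer is a plain function standing in for an expensive read of the file header. The sketch shows two things: a file's type is resolved only on first request and then cached, and the cheap extension lookup is tried before the expensive path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of lazy mime-type resolution; not the actual crawler code.
class LazyMimeSketch {

    /** A crawled file whose mime type is resolved on first request and then cached. */
    static final class CrawledFile {
        final String path;
        private String mimeType; // null until first requested

        CrawledFile(String path) {
            this.path = path;
        }

        String mimeType(MimeResolver resolver) {
            if (mimeType == null) {
                mimeType = resolver.resolve(path);
            }
            return mimeType;
        }
    }

    /** Assumed resolver: extension table first, content sniffing only as a fallback. */
    static final class MimeResolver {
        private final Map<String, String> byExtension = new HashMap<>();
        private final Function<String, String> contentSniffer; // stands in for reading the file
        int contentReads; // how often the expensive path was taken

        MimeResolver(Function<String, String> contentSniffer) {
            this.contentSniffer = contentSniffer;
            // Illustrative mappings only.
            byExtension.put("cpp", "text/x-c++");
            byExtension.put("java", "text/x-java");
        }

        String resolve(String path) {
            int dot = path.lastIndexOf('.');
            int slash = path.lastIndexOf('/');
            String known = dot > slash ? byExtension.get(path.substring(dot + 1)) : null;
            if (known != null) {
                return known; // cheap: no disk access needed
            }
            contentReads++;
            return contentSniffer.apply(path);
        }
    }

    public static void main(String[] args) {
        MimeResolver resolver = new MimeResolver(path -> "content/unknown");
        CrawledFile source = new CrawledFile("ace/Task.cpp");
        CrawledFile object = new CrawledFile("ace/Task.o"); // crawled, but its type is never requested
        CrawledFile makefile = new CrawledFile("Makefile");

        System.out.println(source.mimeType(resolver) + ", reads=" + resolver.contentReads);
        System.out.println(makefile.mimeType(resolver) + ", reads=" + resolver.contentReads);
        System.out.println(makefile.mimeType(resolver) + ", reads=" + resolver.contentReads);
        System.out.println("object file resolved: " + (object.mimeType != null));
    }
}
```

In this sketch the object file costs nothing beyond the directory listing, because no indexer ever asks for its type.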
The other two changes are Vladimir's patches attached here earlier. I did not do any extensive measurements, so I can't
say whether they improved the situation or not. Let's see what the performance guys have to say about that.
http://hg.netbeans.org/jet-main/rev/f22eac907102 - new crawler
http://hg.netbeans.org/jet-main/rev/9353b095ddda - no Snapshot patch
http://hg.netbeans.org/jet-main/rev/394302be6599 - no MAX_DOCS based lucene documents flush
Integrated into 'main-golden', will be available in build *200907200201* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
User: Vita Stejskal <email@example.com>
Log: #166340: more efficient files crawling
initial scanning is very slow again. I have filed
I had to back out f2780b119b6b. http://hg.netbeans.org/jet-main/rev/64a6bf12c424
btw, I didn't realize what it was about. Did it have a good impact? How good was it?
Another attempt on filtering roots - http://hg.netbeans.org/jet-main/rev/332df3a78ea8
Vladimir, as a result of the file crawling fix, custom indexers (e.g. JavaCustomIndexer) are now asked to index even roots that
never contain files they are interested in (e.g. text/x-java). They usually ignore such roots, but, for example,
JavaCustomIndexer prints warnings to the log file. These recent fixes attempt to improve the situation. Unfortunately
the first attempt was rather bad, my apologies. I am recording the changesets here because they are related to the original fix.
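The root filtering described above boils down to a set intersection check, sketched below. The method name shouldScan and the way a root's mime types are obtained are assumptions for illustration and are not taken from the changeset; the sketch only shows the rule that an indexer is skipped for a root when its registered mime types and the root's mime types have nothing in common.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the root filtering rule; not the code from the changeset.
class RootFilterSketch {

    /** Returns true when the indexer's mime types overlap with the mime types of the root. */
    static boolean shouldScan(Set<String> indexerMimeTypes, Set<String> rootMimeTypes) {
        return !Collections.disjoint(indexerMimeTypes, rootMimeTypes);
    }

    public static void main(String[] args) {
        // Illustrative mime-type sets; how a root's mime types are determined is assumed here.
        Set<String> javaIndexer = new HashSet<>(Arrays.asList("text/x-java"));
        Set<String> cppRoot = new HashSet<>(Arrays.asList("text/x-c++", "text/x-c"));
        Set<String> javaRoot = new HashSet<>(Arrays.asList("text/x-java"));

        System.out.println(shouldScan(javaIndexer, cppRoot));  // prints false
        System.out.println(shouldScan(javaIndexer, javaRoot)); // prints true
    }
}
```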
Integrated into 'main-golden', will be available in build *200907291401* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
User: Vita Stejskal <firstname.lastname@example.org>
Log: another attempt: #166340 (follow up): do not scan roots by CustomIndexers, which are registered for mime types different than the root's mimetypes
*** Issue 170204 has been marked as a duplicate of this issue. ***