Steps to reproduce:
1. open all 841 NetBeans source code projects
2. select Find in Projects
3. select scope Open Projects
it takes 4 minutes real time and 7 minutes CPU time
5. repeat steps 2-4
it takes 3 minutes real time and 6 minutes CPU time
Product Version: NetBeans IDE Dev (Build 20120615-ed0556fcb49e)
Java: 1.7.0_05; Java HotSpot(TM) 32-bit Client VM 23.1-b03
system: 3.2 GHz Quad Core, 4 GB RAM, 64-bit Linux
I will try to fix it.
There are several prepared optimizations for searching in file contents, but they were not tested well enough to be included in 7.2, so currently only the safest algorithm is used.
See package org.netbeans.modules.search.matcher in module api.search.
Patches are welcome. Thank you.
P.S. You can also use an alternative search provider, e.g. Grep search from C/C++ bundle.
I have some code done, but I'm blocked by bug 214532
Could you please check bug 214532 and see whether you can find a solution?
(In reply to comment #4)
> Could you please check bug 214532 and see whether you can find a solution?
I'm sorry, I am still unable to solve it.
Module editor depends on api.search, and module parsing.api depends on editor.lib (and some other editor modules). Maybe there is some "hidden cyclic dependency".
I see you are going to use an index during searching. This would make searching faster, but it can be quite complicated (special algorithms for simple and regexp patterns), and indexing can be slower.
Could you consider creating a custom search provider for your algorithm, and adding it to the default search once it is tested, stabilized and measured? There should also be no problems with the dependency on parsing.api. I can help you implement it.
Thanks for your help. And apologies for the delay.
*DON'T DO THIS* if you don't want to completely screw up IDE performance.
My indexer runs in its own thread and doesn't slow down the IDE.
>My indexer runs in its own thread and doesn't slow down the IDE.
1st) An indexer cannot run in a thread other than RepositoryUpdater.indexingThread, because it has to have full ordering of scanStarted, index, scanFinished.
2nd) Yes, it does. It does IO which is not needed as it was already done.
3rd) Even running in a background thread slows down other threads, so a priority mechanism and hand out scheduling similar to the NB 7.2 transactional scanning (background scan) are needed.
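As a rough illustration of the ordering constraint in the first point, here is a minimal plain-Java sketch. It does not use RepositoryUpdater or any NetBeans API; the class OrderedIndexing and the method runScan are hypothetical names. It only shows that submitting all scan callbacks to one single-threaded executor keeps scanStarted, index and scanFinished in a fixed order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the NetBeans RepositoryUpdater.
class OrderedIndexing {

    // Runs one scan on a dedicated thread and returns the order of events.
    static List<String> runScan(List<String> files) {
        final List<String> log =
                Collections.synchronizedList(new ArrayList<String>());
        // A single worker thread executes tasks in submission order.
        ExecutorService indexingThread = Executors.newSingleThreadExecutor();
        indexingThread.execute(() -> log.add("scanStarted"));
        for (String file : files) {
            indexingThread.execute(() -> log.add("index " + file));
        }
        indexingThread.execute(() -> log.add("scanFinished"));
        indexingThread.shutdown();
        try {
            if (!indexingThread.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("indexing did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        files.add("A.java");
        files.add("B.java");
        System.out.println(runScan(files));
    }
}
```

An indexer that calls index from its own thread is outside this queue, so nothing guarantees that its work falls between scanStarted and scanFinished.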
How do you propose to do full-text indexing?
My indexer enables Find in Projects to take 4 seconds instead of 4 minutes.
I suppose most users prefer a faster IDE (with slow searching) to super-fast searching.
I suggest creating a new module with your indexer and a custom search provider, and possibly uploading it to the plugin portal.
And the scan takes several minutes longer. It will never pass the IDE performance criteria.
The feature should be written the other way round. There should not be a specialized indexer that reads the file content again and again and tries to analyze it with *WhitespaceAnalyzer*; the indexers themselves have to provide identifiers to the identifier index. The identifier index can then be used by the SearchProvider.
The identifier index still has an unstable API, but it should be finalized for NB 7.3.
Such an approach has several benefits:
1st) It's very cheap (no additional IO on sources)
2nd) Instead of the dummy *WhitespaceAnalyzer*, the language lexer is used. The tokens are correct.
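To illustrate the second point, here is a minimal plain-Java sketch comparing whitespace splitting with identifier extraction. The regular expression is only a stand-in for a real language lexer, and TokenizerComparison and its methods are hypothetical names, not part of any NetBeans or Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not NetBeans or Lucene code.
class TokenizerComparison {

    // Stand-in for a language lexer: matches Java-style identifiers.
    // A real lexer would also tell keywords such as "int" apart.
    private static final Pattern IDENTIFIER =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    // Whitespace tokenization: punctuation stays glued to the words.
    static List<String> whitespaceTokens(String source) {
        List<String> tokens = new ArrayList<>();
        for (String token : source.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Identifier extraction: each name becomes its own token.
    static List<String> identifierTokens(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = IDENTIFIER.matcher(source);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "int count=list.size();";
        System.out.println(whitespaceTokens(line));
        System.out.println(identifierTokens(line));
    }
}
```

For the sample line, whitespace splitting produces the single token "count=list.size();" for everything after "int", so an index built that way could not answer a query for "size" directly.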
What about files that are not indexed by language lexers? (I guess not all files are indexed now.) Wouldn't we need a take-the-rest indexer? Can you estimate how much the performance would be affected by such indexer? Thanks.
In fact, there are not many such non-binary files.
The take-the-rest is probably not needed.
It can be done in 2 phases:
1st) Fast path - covered by index
2nd) Slow path - covered by search in non indexed files
The split between 1st and 2nd should be based on the file list index; no crawling of folders is needed. The file list index holds all file names + relative paths + mime types.
It can be even smarter. If the root contains more unindexed files than LIMIT, we can index them by WSA. These files were never touched by any index -> no additional IO is done.
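The two phases and the LIMIT check could look roughly like the following plain-Java sketch. Everything in it is an assumption made for illustration: TwoPhaseSearch, FileEntry, the in-memory maps and the value of LIMIT are hypothetical and do not correspond to the real file list index or identifier index APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these types are NetBeans APIs.
class TwoPhaseSearch {

    // Assumed threshold; the comment above names LIMIT without a value.
    static final int LIMIT = 2;

    // One file list index record: relative path, MIME type, and whether
    // a language indexer has already covered the file.
    static final class FileEntry {
        final String path;
        final String mimeType;
        final boolean indexed;

        FileEntry(String path, String mimeType, boolean indexed) {
            this.path = path;
            this.mimeType = mimeType;
            this.indexed = indexed;
        }
    }

    private final List<FileEntry> fileList = new ArrayList<>();
    private final Map<String, Set<String>> identifierIndex = new HashMap<>();
    // Stand-in for reading unindexed files from disk.
    private final Map<String, String> contents = new HashMap<>();

    void addIndexedFile(String path, String mimeType, String... identifiers) {
        fileList.add(new FileEntry(path, mimeType, true));
        for (String id : identifiers) {
            identifierIndex.computeIfAbsent(id, k -> new TreeSet<>()).add(path);
        }
    }

    void addUnindexedFile(String path, String mimeType, String text) {
        fileList.add(new FileEntry(path, mimeType, false));
        contents.put(path, text);
    }

    Set<String> search(String identifier) {
        Set<String> hits = new TreeSet<>();
        // 1st) Fast path: a lookup in the identifier index, no file IO.
        Set<String> fromIndex = identifierIndex.get(identifier);
        if (fromIndex != null) {
            hits.addAll(fromIndex);
        }
        // 2nd) Slow path: scan only files the file list marks as unindexed.
        for (FileEntry entry : fileList) {
            if (!entry.indexed && contents.get(entry.path).contains(identifier)) {
                hits.add(entry.path);
            }
        }
        return hits;
    }

    // True when there are enough unindexed files to justify indexing them.
    boolean shouldIndexRest() {
        int unindexed = 0;
        for (FileEntry entry : fileList) {
            if (!entry.indexed) {
                unindexed++;
            }
        }
        return unindexed > LIMIT;
    }

    public static void main(String[] args) {
        TwoPhaseSearch search = new TwoPhaseSearch();
        search.addIndexedFile("src/A.java", "text/x-java", "parse", "run");
        search.addUnindexedFile("README.txt", "text/plain", "see docs for run options");
        System.out.println(search.search("run"));
        System.out.println(search.shouldIndexRest());
    }
}
```

Note that the fast path in this sketch only finds whole identifiers, while the slow path does a substring match, so a real provider would still have to decide how to handle regexp and partial-word patterns.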
There are several changes needed for this:
1st) The identifier index (as described in comment #11) needs to go through API review; currently it's on a branch and needs to be integrated into dev.
2nd) The file list index needs a public API (or the search type created by wbrana needs to be integrated into the jumpto module, which can access it).
Unfortunately, I cannot help now because I am on vacation; I can help when I am back.