214279 – Use identifier index in Search in Projects

This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Summary: Use identifier index in Search in Projects

Status:	STARTED

Alias:	None

Product:	utilities
Classification:	Unclassified
Component:	Search
Version:	7.2
Hardware:	PC Linux

Importance:	P2 normal
Assignee:	Jaroslav Havlin

URL:
Keywords:

Depends on:	214532
Blocks:

Reported:	2012-06-15 12:28 UTC by wbrana
Modified:	2013-08-21 15:43 UTC
CC List:	5 users

See Also:
Issue Type:	ENHANCEMENT
Exception Reporter:

Description wbrana 2012-06-15 12:28:26 UTC

Steps to reproduce:
1. Open all 841 NetBeans source code projects.
2. Select Find in Projects.
3. Select scope Open Projects.
4. Find.
   It takes 4 minutes of real time and 7 minutes of CPU time.
5. Repeat steps 2-4.
   It takes 3 minutes of real time and 6 minutes of CPU time.

Product Version: NetBeans IDE Dev (Build 20120615-ed0556fcb49e)
Java: 1.7.0_05; Java HotSpot(TM) 32-bit Client VM 23.1-b03
System: 3.2 GHz Quad Core, 4 GB RAM, 64-bit Linux

Comment 1 wbrana 2012-06-16 22:20:07 UTC

I will try to fix it.

Comment 2 Jaroslav Havlin 2012-06-18 07:40:17 UTC

There are several prepared optimizations for searching in file contents, but they were not tested well enough to be included in 7.2, so currently only the safest algorithm is used.
See package org.netbeans.modules.search.matcher in module api.search.
Patches are welcome. Thank you.

P.S. You can also use an alternative search provider, e.g. the Grep search from the C/C++ bundle.

Comment 3 wbrana 2012-07-03 10:45:51 UTC

I have some code done, but I'm blocked by bug 214532.

Comment 4 wbrana 2012-07-09 17:24:06 UTC

Could you please check bug 214532 to see whether you can find a solution?
Thanks.

Comment 5 Jaroslav Havlin 2012-07-27 11:45:11 UTC

(In reply to comment #4)
> Could you please check bug 214532 to see whether you can find a solution?
I'm sorry, I am still unable to solve it.
The editor module depends on api.search, and the parsing.api module depends on editor.lib (and some other editor modules). Maybe there is a "hidden cyclic dependency".

I see you are going to use an index during searching. This would make searching faster, but it can be quite complicated (special algorithms are needed for simple and regexp patterns), and indexing can become slower.
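To illustrate why simple and regexp patterns need different handling, here is a minimal Java sketch under stated assumptions. The identifier index is modeled as a hypothetical in-memory Map from identifiers to file paths, and file contents as a Map from path to text; PatternDispatch and search() are illustrative names, not part of the api.search module. A whole-word query that is a single plain identifier can be answered with one index lookup, while any other query (a regexp or a plain substring search) falls back to scanning file contents.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical dispatcher: index lookup for plain identifiers, file scan otherwise. */
public class PatternDispatch {

    // "Simple" here means a single identifier-like word with no regexp syntax.
    private static final Pattern PLAIN_IDENTIFIER =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    // Characters treated as part of a word for whole-word matching on the slow path.
    private static final String WORD_CHAR = "[A-Za-z0-9_$]";

    static Set<String> search(String query, boolean isRegexp, boolean wholeWords,
                              Map<String, Set<String>> index,
                              Map<String, String> contents) {
        if (!isRegexp && wholeWords && PLAIN_IDENTIFIER.matcher(query).matches()) {
            // Fast path: an identifier index only answers whole-identifier queries,
            // but it answers them with one lookup and no file I/O.
            return new TreeSet<>(index.getOrDefault(query, Set.of()));
        }
        // Slow path: build a regexp (quoting literal text) and scan every file.
        String regex = isRegexp ? query : Pattern.quote(query);
        if (wholeWords && !isRegexp) {
            regex = "(?<!" + WORD_CHAR + ")" + regex + "(?!" + WORD_CHAR + ")";
        }
        Pattern p = Pattern.compile(regex);
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, String> e : contents.entrySet()) {
            if (p.matcher(e.getValue()).find()) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> contents = Map.of(
                "A.java", "class A { int count; }",
                "B.java", "class B { void count() {} }");
        Map<String, Set<String>> index = Map.of(
                "A", Set.of("A.java"),
                "B", Set.of("B.java"),
                "count", Set.of("A.java", "B.java"));
        System.out.println(search("count", false, true, index, contents));          // [A.java, B.java]
        System.out.println(search("coun", false, false, index, contents));          // [A.java, B.java]
        System.out.println(search("void\\s+count", true, false, index, contents)); // [B.java]
    }
}
```

The sketch also shows one catch: an identifier index naturally answers whole-identifier queries, so a plain substring search such as "coun" still needs the scan.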

Could you consider creating a custom search provider for your algorithm, and adding it to the default search once it is tested, stabilized and measured? There should also be no problems with the dependency on parsing.api. I can help you implement it.

Thanks for your help. And apologies for the delay.

Comment 6 Tomas Zezula 2012-07-27 12:53:12 UTC

*DON'T DO THIS* if you don't want to completely screw up IDE performance.

Comment 7 wbrana 2012-07-27 12:57:33 UTC

My indexer runs in a background thread and doesn't slow down the IDE.

Comment 8 Tomas Zezula 2012-07-27 13:02:04 UTC

>My indexer runs in a background thread and doesn't slow down the IDE.

1st) The indexer cannot run in a thread other than RepositoryUpdater.indexingThread; it has to have a full ordering of scanStarted, index, scanFinished.

2nd) Yes, it does. It does I/O that is not needed, as that I/O was already done.

3rd) Even when running in a background thread, it slows down other threads, so a priority mechanism and hand-out scheduling similar to NB 7.2 transactional scanning (background scan) are needed.
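Point 1 is an ordering constraint. The minimal Java sketch below assumes a hypothetical Indexer interface whose callback names mirror scanStarted, index and scanFinished; the signatures are invented for illustration and are not the real NetBeans parsing SPI. Submitting every callback to one single-threaded executor (standing in for the one dedicated indexing thread) makes them run strictly in submission order, which an indexer running on its own extra thread could not guarantee relative to the other indexers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OrderedIndexing {

    /** Hypothetical indexer callbacks; names mirror the comment, signatures are invented. */
    interface Indexer {
        void scanStarted(String root);
        void index(String file);
        void scanFinished(String root);
    }

    /** Queues all callbacks for one root on the single indexing thread, in order. */
    static void scan(ExecutorService indexingThread, Indexer indexer,
                     String root, List<String> files) {
        indexingThread.submit(() -> indexer.scanStarted(root));
        for (String f : files) {
            indexingThread.submit(() -> indexer.index(f));
        }
        indexingThread.submit(() -> indexer.scanFinished(root));
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Indexer recorder = new Indexer() {
            public void scanStarted(String root)  { log.add("start " + root); }
            public void index(String file)        { log.add("index " + file); }
            public void scanFinished(String root) { log.add("finish " + root); }
        };
        // A single-thread executor runs tasks sequentially, in submission order.
        ExecutorService indexingThread = Executors.newSingleThreadExecutor();
        scan(indexingThread, recorder, "src", List.of("A.java", "B.java"));
        indexingThread.shutdown();
        indexingThread.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(log); // [start src, index A.java, index B.java, finish src]
    }
}
```

Points 2 and 3 (redundant I/O and CPU contention) are not addressed by this ordering alone.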

Comment 9 wbrana 2012-07-27 13:15:39 UTC

What do you propose for doing full-text indexing?
My indexer makes Find in Projects take 4 seconds instead of 4 minutes.
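The speed-up comes from replacing a read of every file with a lookup. A full-text index is commonly an inverted index from terms to the files that contain them. The minimal in-memory Java sketch below assumes whitespace-only tokenization, in the spirit of the whitespace analyzer discussed later in this thread; WhitespaceIndex is a hypothetical class, not the indexer from this report.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory inverted index: term -> files containing it. */
public class WhitespaceIndex {

    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Splits on whitespace only and records which file each term came from. */
    void add(String file, String text) {
        for (String term : text.split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(file);
            }
        }
    }

    /** One map lookup instead of reading every file again. */
    Set<String> lookup(String term) {
        return postings.getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        WhitespaceIndex idx = new WhitespaceIndex();
        idx.add("A.java", "int count = 0;");
        idx.add("B.java", "return count;");
        System.out.println(idx.lookup("count"));  // [A.java]
        System.out.println(idx.lookup("count;")); // [B.java]
    }
}
```

Note that B.java is found only under the term "count;": whitespace splitting leaves punctuation attached to names.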

Comment 10 Jaroslav Havlin 2012-07-27 13:23:12 UTC

I suppose most users prefer a faster IDE (with slow searching) to super-fast searching.
I suggest creating a new module with your indexer and a custom search provider, and possibly uploading it to the plugin portal.

Comment 11 Tomas Zezula 2012-07-27 13:27:57 UTC

And the scan takes several minutes longer. It will never pass the IDE performance criteria.

The feature should be written the other way round. There should not be a specialized indexer reading the file content again and again and trying to analyze it with *WhitespaceAnalyzer*; instead, the indexers have to provide identifiers themselves to the identifier index. The identifier index can then be used by the SearchProvider.
The identifier index still has an unstable API, but it should be finalized for NB 7.3.
Such an approach has several benefits:
1st) It's very cheap (no additional I/O on sources).
2nd) Instead of the dummy *WhitespaceAnalyzer*, the language lexer is used, so the tokens are correct.
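Benefit 2 can be seen on a one-line example. The Java sketch below is a deliberate simplification: it uses Character.isJavaIdentifierStart and Character.isJavaIdentifierPart as a stand-in for a real language lexer (it ignores comments, string literals and keywords), and the Tokenizers class is hypothetical. Whitespace splitting keeps punctuation glued to names, while identifier extraction yields the names a user actually searches for.

```java
import java.util.ArrayList;
import java.util.List;

public class Tokenizers {

    /** Whitespace-only tokenization: punctuation stays attached to names. */
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Simplified stand-in for a lexer: collects maximal Java identifier runs. */
    static List<String> identifiers(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (Character.isJavaIdentifierStart(text.charAt(i))) {
                int start = i++;
                while (i < text.length() && Character.isJavaIdentifierPart(text.charAt(i))) {
                    i++;
                }
                out.add(text.substring(start, i));
            } else {
                i++; // skip punctuation, whitespace and digits that cannot start a name
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "list.add(item);";
        System.out.println(whitespaceTokens(line)); // [list.add(item);]
        System.out.println(identifiers(line));      // [list, add, item]
    }
}
```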

Comment 12 Jaroslav Havlin 2012-07-27 13:42:36 UTC

What about files that are not indexed by language lexers? (I guess not all files are indexed now.) Wouldn't we need a take-the-rest indexer? Can you estimate how much the performance would be affected by such an indexer? Thanks.

Comment 13 Tomas Zezula 2012-07-27 14:16:39 UTC

In fact, there are not many such non-binary files.
The take-the-rest indexer is probably not needed.
It can be done in 2 phases:

1st) Fast path - covered by the index
2nd) Slow path - covered by searching in non-indexed files
The split between 1st and 2nd should be based on the file list index; no crawling of folders is needed. The file list index holds all file names, relative paths, and MIME types.

It can be even smarter. If the root contains more unindexed files than LIMIT, we can index them with the WSA (WhitespaceAnalyzer). These files were never touched by any indexer, so no redundant I/O is done.
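The two-phase plan and the LIMIT rule can be sketched as a planning step over the file list. All names here are hypothetical: FileEntry stands in for one record of the file list index (relative path and MIME type), whether a file is already covered by an indexer is approximated by its MIME type, and limit stands for LIMIT, whose value the comment leaves open. The sketch uses Java records (Java 16 or later).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SearchPlanner {

    /** Hypothetical file list index record: relative path plus MIME type. */
    record FileEntry(String path, String mimeType) {}

    /** Which files go to the fast path, the slow path, or a one-time WSA indexing pass. */
    record Plan(List<FileEntry> fastPath, List<FileEntry> slowPath, List<FileEntry> toIndex) {}

    static Plan plan(List<FileEntry> fileList, Set<String> indexedMimeTypes, int limit) {
        List<FileEntry> indexed = new ArrayList<>();
        List<FileEntry> unindexed = new ArrayList<>();
        // Decided from the file list alone: no folder crawling, no file I/O.
        for (FileEntry e : fileList) {
            if (indexedMimeTypes.contains(e.mimeType())) {
                indexed.add(e);
            } else {
                unindexed.add(e);
            }
        }
        if (unindexed.size() > limit) {
            // Many never-indexed files: index them once instead of scanning each search.
            return new Plan(indexed, List.of(), unindexed);
        }
        // Few leftovers: scan them directly during the search.
        return new Plan(indexed, unindexed, List.of());
    }

    public static void main(String[] args) {
        List<FileEntry> files = List.of(
                new FileEntry("src/A.java", "text/x-java"),
                new FileEntry("README.txt", "text/plain"),
                new FileEntry("notes.txt", "text/plain"));
        Plan p = plan(files, Set.of("text/x-java"), 1);
        System.out.println(p.fastPath().size() + " " + p.slowPath().size() + " " + p.toIndex().size()); // 1 0 2
    }
}
```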

There are several changes needed for this:
1st) The identifier index (as described in comment #11) needs to go through API review; currently it's on a branch and needs to be integrated into dev.

2nd) The file list index needs a public API (or the search type created by wbrana needs to be integrated into the jumpto module, which can access it).

Unfortunately, I cannot help now as I am on vacation; I can help when I'm back.