197724 – sometimes scanning takes 17896 ms instead of 149 ms

This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 197724 - sometimes scanning takes 17896 ms instead of 149 ms

Summary: sometimes scanning takes 17896 ms instead of 149 ms

Status:	RESOLVED FIXED

Alias:	None

Product:	editor
Classification:	Unclassified
Component:	Parsing & Indexing (show other bugs)
Version:	7.0.1
Hardware:	PC Linux

Importance:	P2 normal (vote)
Assignee:	Marek Fukala

URL:
Keywords:

Depends on:
Blocks:

Reported:	2011-04-13 21:56 UTC by David Konecny
Modified:	2011-04-19 09:21 UTC (History)
CC List:	2 users (show)

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
profiler snapshot (212.70 KB, application/octet-stream) 2011-04-13 21:56 UTC, David Konecny	Details
zipped test project (1.07 MB, application/x-gzip) 2011-04-13 21:57 UTC, David Konecny	Details
two ide startup logs - one OK and one slow (106.06 KB, text/plain) 2011-04-13 22:00 UTC, David Konecny	Details
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description David Konecny 2011-04-13 21:56:15 UTC

Created attachment 107738 [details]
profiler snapshot

While testing an issue I noticed that about once in five IDE starts initial project scanning is extremely slow. My setup is following:

Product Version: NetBeans IDE Dev (Build 20110408-32ac8b5ddb30)
Java: 1.6.0_24; Java HotSpot(TM) 64-Bit Server VM 19.1-b02
System: Linux version 2.6.35-28-generic running on amd64; UTF-8; en_NZ (nb)
Userdir: /home/david/.netbeans/_UD4

and I have single trivial Web Project opened in the IDE which I will attach. The project is almost empty apart from three copies of the same very large HTML document.

The way I'm reproducing this problem is following: I start IDE and have a look at how long initial scanning took; quit IDE; and keep repeating these two steps (exception from JspEditorWarmUpTask is issue 197723 but it should not matter for this issue). Often I see what you would expect:

Indexing of: file:/home/dev/projs/WebApplication124/web/ took: 51 ms
(New or modified files: 0, Deleted files: 0) [Adding listeners took: 2 ms]
Complete indexing of 3 source roots took: 149 ms
(New or modified files: 0, Deleted files: 0) [Adding listeners took: 11 ms]

but once in a while I can see:

Indexing of: file:/home/dev/projs/WebApplication124/web/ took: 17742 ms
(New or modified files: 0, Deleted files: 0) [Adding listeners took: 2 ms]
Complete indexing of 3 source roots took: 17896 ms
(New or modified files: 0, Deleted files: 0) [Adding listeners took: 12 ms]

It is quite reproducible. Some questions for Marek and Tomas:

* why HTML documents are sometimes processed during startup and sometimes not?

* why they are processed at all during start up? I understand that in Java case this is needed in order to provide features like Goto Declaration or Open Type but what's needed from HTML?

Comment 1 David Konecny 2011-04-13 21:57:13 UTC

Created attachment 107739 [details]
zipped test project

Comment 2 David Konecny 2011-04-13 22:00:32 UTC

Created attachment 107740 [details]
two ide startup logs - one OK and one slow

Comment 3 Tomas Zezula 2011-04-14 07:58:07 UTC

Does not seems to have something in common with parsing api. But I will evaluate.
David, next time you report a performance problem attach the self sampler results or I will close the issue as an incomplete.

>Why HTML is parsed?
HTML Validation is done. HTML itself can contain embedded languages (javascript) which contains types, functions and have to be indexed to present in navigation, code completion, etc. 

BTW Isn't the difference opened task list?

Comment 4 Marek Fukala 2011-04-14 08:14:16 UTC

> >Why HTML is parsed?
> HTML Validation is done. HTML itself can contain embedded languages
> (javascript) which contains types, functions and have to be indexed to present
> in navigation, code completion, etc. 
Not just these, even the html itself is indexed, at least its file references.

Validation is not necessary during the indexing so I hope the validator doesn't run (that would be a bug I think), but it does run when the file errors are requested by the tasklist indexer.

I'm on vacation by the end of the week, I'll evaluate the issue on monday.

Comment 5 David Konecny 2011-04-14 10:46:55 UTC

(In reply to comment #3)
> David, next time you report a performance problem attach the self sampler
> results or I will close the issue as an incomplete.

Well... or you could just say "thank you David for taking time to file this report". It took me a while to track this down, I attached a detailed steps how to reproduce it, a test project, versions of my environment, IDE logs and profiler snapshot of whole scanning period and you are still complaining that it is incomplete?

> >Why HTML is parsed?
> HTML Validation is done.

I will wait for Marek to get back from his holiday to evaluate this. 6MB of HTML should not take 17 seconds to parse.

> BTW Isn't the difference opened task list?

My task list is closed if that's what you are referring to.

Comment 6 Tomas Zezula 2011-04-14 13:21:12 UTC

In reply to comment #5.
Thanks for report :-)
Anyway I don't understand why HTML issues should be evaluated on editor/parsing.api.

Comment 7 Tomas Zezula 2011-04-14 13:38:51 UTC

Some numbers:
The scan took (20s) according to snapshot

From that time:

1st) embedded indexers 16.6s which is HTMLIndexer and TaskList indexer which does HTML, JS, CSS parsing and validation. 

2nd) ~ 2s in storing indexes (cumulative time both source + binary)

3rd) 2s Index binary - the slowest indexer for binaries is JsfBinaryIndexer

Typical parsing.api issue.
Oops I've forgot about java indexing JavaCustomIndexer 0.019s and JavaBinaryIndexer 0.019s

Thanks for letting me to evaluate it.

Comment 8 David Konecny 2011-04-14 21:43:09 UTC

(In reply to comment #7)
> Thanks for letting me to evaluate it.

You are welcome. The issue was assigned to Marek and you were on CC to be informed. 

Generally speaking it would help to make it easier for reporters to identify where the problem is when "scanning took 20 seconds". That way some problems might completely skip your attention. Currently you log how much time it took RepositoryUpdater to index a file/folder. That's useful sometimes but in my experience the number is either very small which is OK or too big which does not help to identify where is the problem. You once sent me a breakdown of how much time was spend in individual stages of scanning. That information + how much time was spent in individual indexers (in total) would very clearly point to the owner of the problem.

Comment 9 Tomas Zezula 2011-04-15 10:29:38 UTC

In reply to comment 8:

What I complained about is that the issue is put onto parsing.api and assign to Marek. It should be either assigned to parsing.api or html (even I doubt that any of these modules is causing the problem but finding the root case is non trivial task).

I've complained that there is not snapshot which you attached I've overlooked it and I am sorry. Both snapshots (good and bad) would be even better (they can be compared) but I've generated them myself.
Also the issue title and description are misleading "sometimes scanning takes 17896 ms instead of 149 ms". It should be s/scanning/up to date check/ this took me some time to realize it.

As I've mention in comment #3 parsing.api just collects file(s), passes them to indexers and provides persistence for them. The persistence took 2sec + parsing.api overhead < 1s. The most time is spent in html which parses the html. But it's OK the html indexer should scan HTML as I explained in comment #3. The question is why they are scanned when they are up to date. And this is the core problem. They are scanned because indexer forced complete reindex. Now I am trying to find out why.

Comment 10 Tomas Zezula 2011-04-15 11:16:15 UTC

I've forgot to attach the indexer's votes, here they are:

Vote: org.netbeans.modules.csl.core.TLIndexerFactory@18adbb30---->true
Vote: org.netbeans.modules.css.indexing.CssIndexer$Factory@35dc58f8---->true
Vote: org.netbeans.modules.csl.core.EmbeddingIndexerFactoryImpl[org.netbeans.modules.javascript.editing.JsIndexer$Factory@b4914ab]---->true
Vote: org.netbeans.modules.csl.core.EmbeddingIndexerFactoryImpl[org.netbeans.modules.web.el.ELIndexer$Factory@77bb6668]---->false
Vote: org.netbeans.modules.csl.core.TLIndexerFactory@5b2eaba4---->true
Vote: org.netbeans.modules.web.jsf.editor.index.JsfIndexer$Factory@3650710f---->false
Vote: org.netbeans.modules.html.editor.indexing.HtmlIndexer$Factory@5506cac0---->true

So the indexers causing the reindex are:

org.netbeans.modules.web.jsf.editor.index.JsfIndexer
org.netbeans.modules.web.el.ELIndexer

BTW aren't those problematic indexers yours? ;-)

Comment 11 Tomas Zezula 2011-04-15 11:20:05 UTC

But thanks for the issue. I will become even more paranoid than I am now :-) and I will add Logging.INFO in dev builds when the indexer will vote for reindex I will log it.

Comment 12 Tomas Zezula 2011-04-15 11:20:46 UTC

Now I am fixing the problematic indexers.

Comment 13 David Konecny 2011-04-17 20:22:31 UTC

(In reply to comment #11)
> I will add Logging.INFO in dev builds when the indexer will vote for
> reindex I will log it.

I think it's a good idea. In case of this issue it would log:

org.netbeans.modules.web.jsf.editor.index.JsfIndexer
org.netbeans.modules.web.el.ELIndexer

and that would clearly identify where to file this issue for evaluation.

> Now I am fixing the problematic indexers.

Thanks. Marek was planning to do that so feel free to leave that for him.

Comment 14 Tomas Zezula 2011-04-18 07:48:41 UTC

>Marek was planning to do that so feel free to leave that for him
There are lots of projects which are doing wrongly the index check, it would be probably safer to change the LuceneIndex.isValid to handle it + change the javadoc.
I am already having it but I have to verify if it does not affect some client.

Comment 15 Marek Fukala 2011-04-18 13:50:06 UTC

(In reply to comment #14)
> >Marek was planning to do that so feel free to leave that for him
> There are lots of projects which are doing wrongly the index check, it would be
> probably safer to change the LuceneIndex.isValid to handle it + change the
> javadoc.
> I am already having it but I have to verify if it does not affect some client.

Do you mean that the JSFIndexer and ELIndexer doesn't implement the filesDeleted/Dirty methods and call the appropriate IndexingSupport methods? (That's the only difference I found for example between css/html indexer and the mentioned two). If that's true then I'm sorry I caused you to spend too much time on this issue. OTOH there's no notion about what must/should be done in the javadoc :-|

Comment 16 Tomas Zezula 2011-04-19 09:11:31 UTC

The javadoc of markDirty and filesDeleted says when it's called and what is passed into it. There are not exact calls which you should do it depends on you how you implemented i. If there will be you will not need to implement it. The SourceFactory is both abstract factory and strategy. The strategy with exact implementation is useless. The implementation of these method is very different for java (custom storage), scala (combined storage) and html (support).
What I can do is add add a mention about the support into it.

The problem is not in these methods it's in scanStarted in some really obscure cases. The same problem is in CSL base. In a bit simplified way the problem is when an indexer accepts files does not store anything and then checks validity.

I've changed the pa support to make it easier to use so you don't have to care about such rare cases.

Comment 17 Tomas Zezula 2011-04-19 09:19:18 UTC

Fixed jet-main
http://hg.netbeans.org/jet-main/rev/f06bf15edd49
http://hg.netbeans.org/jet-main/rev/f5033ec57caa

Comment 18 Tomas Zezula 2011-04-19 09:21:56 UTC

Thanks Davide for the report.