Bug 206025 - HTML Validator slows down indexing
Status: RESOLVED WONTFIX
Product: web
Classification: Unclassified
Component: HTML Editor
Version: 7.2
Hardware: All All
Importance: P3
Target Milestone: 7.2
Assigned To: Marek Fukala
QA Contact: issues@javaee
Keywords: PERFORMANCE, PLAN
Duplicates: 221201
Depends on:
Blocks:
Reported: 2011-12-06 17:49 UTC by Tomas Zezula
Modified: 2013-07-12 12:13 UTC (History)

See Also:
Issue Type: DEFECT


Attachments
.npss file attached from NetBeans (1.09 MB, application/x-npss)
2013-01-30 11:17 UTC, Petr Cyhelsky
.npss file attached from NetBeans (1.41 MB, application/x-npss)
2013-01-30 16:44 UTC, Petr Cyhelsky

Description Tomas Zezula 2011-12-06 17:49:19 UTC
The HTML Validator slows down the Java project indexing by 10-20%.
It should be fixed or disabled.
Comment 1 Marek Fukala 2011-12-16 13:23:54 UTC
Any link to measurements? Any snapshots?
Comment 2 Tomas Zezula 2011-12-16 14:29:18 UTC
Here is the link with measurements:
http://wiki.netbeans.org/IndexingMeasurement71

Unfortunately the wiki does not support attachments, no snapshots there :-(
Comment 3 Tomas Zezula 2013-01-03 18:24:49 UTC
*** Bug 221201 has been marked as a duplicate of this bug. ***
Comment 4 Petr Jiricka 2013-01-29 17:47:39 UTC
Changing to (P3) DEFECT so this is not forgotten. Do we have any recent performance data for this? If not, Petr C, could I ask you to measure what's the impact of this?

Also, since apparently a large part of this performance hit is the first-time initialization of the HTML navigator, we need to separate this one time cost from any slowdown that happens on each scan.

As for a potential solution, Marek and I discussed offline that one solution could be to turn off HTML validation errors in the task list by default, and only show them in the current file. 
It would be useful if it were still possible to turn this on, so that users who want to see HTML validation errors in the task list can. Which would mean that after you turn this on, the IDE would need to rescan your project and store extra information in the index, correct? Does the scanning infrastructure allow this?
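
To illustrate separating the one-time cost from the per-scan cost mentioned above, here is a minimal sketch. It does not use any NetBeans API: `WarmupTiming`, `Holder`, and `validate` are hypothetical stand-ins, and the lookup table only imitates an expensive static initializer. The idea is simply to time the first call (which pays for class initialization) separately from a later call.

```java
/** Hypothetical stand-in for a component with an expensive one-time setup. */
final class WarmupTiming {

    // Initialization-on-demand holder: the JVM runs this static initializer
    // once, on first access, which mimics a "once per JVM session" cost.
    private static final class Holder {
        static final int[] TABLE = buildTable();
    }

    private static int[] buildTable() {
        int[] table = new int[1_000_000];
        for (int i = 0; i < table.length; i++) {
            table[i] = i * 31;
        }
        return table;
    }

    /** Cheap per-call work that depends on the one-time table. */
    static int validate(int input) {
        return Holder.TABLE[Math.floorMod(input, Holder.TABLE.length)];
    }

    /** Wall-clock time of a single run, in nanoseconds. */
    static long timeNanos(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long first = timeNanos(() -> validate(1));  // includes Holder initialization
        long later = timeNanos(() -> validate(2));  // steady-state cost only
        System.out.println("first call: " + first + " ns");
        System.out.println("later call: " + later + " ns");
    }
}
```

Reporting the two numbers separately makes it visible how much of a measured scan is warm-up that will not recur within the same JVM session.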
Comment 5 Petr Cyhelsky 2013-01-30 11:17:18 UTC
Created attachment 130826 [details]
.npss file attached from NetBeans

.npss file
Comment 6 Petr Cyhelsky 2013-01-30 11:26:53 UTC
In the attached snapshot you can see that the whole indexing of a big web project took 32,986 ms, of which org.netbeans.modules.html.editor.HtmlErrorFilter.filter() took 7,246 ms, which is roughly 22%.
Comment 7 Marek Fukala 2013-01-30 14:28:49 UTC
Petre, thank you for the snapshot.

Here are some results of my evaluation:

from the 7.396s spent in HtmlErrorFilter.filter(), there's:

1) 4.804s spent in HtmlValidatorImpl.validate() ... this is the html code validation entry point. Following code runs from within this method:

a) 2.799s (58%) spent in html.validator.ValidationTransaction.initialize(), which is a static initializer run just once per IDE JVM session.

b) 0.885s (18%) spent in html.validator.ValidationTransaction.loadDocAndSetupParser(), which is run just once per JVM session per html content type (which is html5 in 90% of cases).

c) 0.49s (10%) in MessageEmitterAdapter's static initializer calls Html5AttributeDatatypeBuilder.parseSyntaxDescriptions(...)

Just the three items above represent 86% of the validation time, and quite a lot of the remaining time is spent in class loading.

So the result for this part is: the validation itself is very fast; the first-time initialization here definitely takes more than 90% of the time.

2) 2.249s in HtmlErrorFilter.isErrorCheckingEnabledForThisMimetype()

a) 1.779s (80%) in JsfPageMetadataProvider.getMetadataMap(), which triggers a one-time initialization job in FaceletsLibrarySupport.findLibraries()

b) 0.32s (14%) in FaceletsLibrarySupport.checkLibraryDescriptorsUpToDate, which is something that can be improved, as this time is taken each time a file is validated. An issue for this problem is already filed against the web.jsf.editor module.

So the situation here is quite similar to the first item - again the first time initialization is the biggest problem.
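
The shares quoted in this evaluation and in comment 6 can be re-derived with a few lines of arithmetic. The sketch below only uses the timings stated in the comments; the class and method names (`TimeShares`, `percent`) are made up for illustration. Because it rounds each ratio independently, a result may differ by a point from the percentages quoted above.

```java
/** Recomputes the percentage shares from the timings quoted in the comments. */
final class TimeShares {

    /** Rounded percentage of {@code part} within {@code whole}. */
    static long percent(double part, double whole) {
        return Math.round(100.0 * part / whole);
    }

    public static void main(String[] args) {
        double validate = 4.804;                  // HtmlValidatorImpl.validate(), seconds
        double[] oneTime = {2.799, 0.885, 0.49};  // items 1a), 1b), 1c), seconds
        double sum = 0;
        for (double t : oneTime) {
            System.out.println(t + "s = " + percent(t, validate) + "% of validate()");
            sum += t;
        }
        System.out.println("one-time items together = " + percent(sum, validate) + "%");

        // Comment 6: HtmlErrorFilter.filter() versus the whole indexing run, in ms.
        System.out.println("filter() share of indexing = " + percent(7246, 32986) + "%");
    }
}
```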

As for #2, I think I could skip that code entirely if the file is .html and not .xhtml (I *think* the code currently also runs on .html files).
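
A guard of the kind described for #2 could look roughly like the sketch below. This is not the actual html.editor or web.jsf.editor code: `FaceletsGuard` and `needsFaceletsSupport` are hypothetical names, and the check is a plain file-extension test, which is an assumption about how .xhtml files would be told apart.

```java
import java.util.Locale;

/** Hypothetical guard: only .xhtml files should trigger Facelets initialization. */
final class FaceletsGuard {

    /** Returns true only for names ending in ".xhtml" (case-insensitive). */
    static boolean needsFaceletsSupport(String fileName) {
        return fileName != null
                && fileName.toLowerCase(Locale.ROOT).endsWith(".xhtml");
    }

    public static void main(String[] args) {
        for (String name : new String[] {"index.html", "page.xhtml", "style.css"}) {
            System.out.println(name + " -> " + needsFaceletsSupport(name));
        }
    }
}
```

With such a check placed before the expensive lookup, plain .html files would never pay for the Facelets library scan.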
Comment 8 Petr Jiricka 2013-01-30 14:35:50 UTC
Marek, thanks a lot for the evaluation - this is good news, this means that 2nd time scanning will be much faster. 

I am also cc'ing Petr Pisl, who told me today that he encountered the "hanoi tower problem" while testing the scanning of a large JavaScript/HTML project - Petr can you please share the details? Is there something we could improve in that scenario?
Comment 9 Marek Fukala 2013-01-30 15:02:59 UTC
"hanoi tower problem" is a different issue very likely caused by some internals in relaxng-jing. If there's a reproducible case I can take a look at the problem deeper.
Comment 10 Petr Cyhelsky 2013-01-30 15:29:25 UTC
Hanoi-tower is definitely a completely different problem and is easily reproducible on multiple projects
Comment 11 Marek Fukala 2013-01-30 15:50:33 UTC
Petre, please add a reference to one of them. Thanks.
Comment 12 Petr Pisl 2013-01-30 16:09:32 UTC
I have spent some time measuring scanning performance. I found that the "hanoi tower problem" influences the resulting time slightly, but I don't have exact numbers. I need to find one file with this problem and measure the practical impact.
Comment 13 Marek Fukala 2013-01-30 16:26:39 UTC
Petre, just for curiosity, can you please re-test with this change?: 

web-main#cdb690b8b6df

summary:     do not initialize facelets support for non-xhtml files.
Comment 14 Marek Fukala 2013-01-30 16:28:17 UTC
(In reply to comment #13)
> Petre, just for curiosity, can you please re-test with this change?: 
Just to be sure, I meant Petr C.
Comment 15 Petr Cyhelsky 2013-01-30 16:44:43 UTC
Created attachment 130849 [details]
.npss file attached from NetBeans

.npss file
Comment 16 Petr Cyhelsky 2013-01-30 16:48:31 UTC
One such case is in attached snapshot - it is reproducible when opening the Bigwebproject from http://hg.netbeans.org/ergonomics/file/aa11910f438d/performance/test/unit/src/org/netbeans/performance/scanning/ScanProjectPerfTest.java
Comment 17 Petr Cyhelsky 2013-01-30 16:56:58 UTC
But I agree that the Hanoi towers are not a big deal from the scanning-time point of view: they take ~1s in most cases I have seen. The real problem is that the HTML Validator is run at all for projects where it does nothing useful and only adds several seconds to the opening time.

Imho it would be best to either:
- turn the validator off for opening of the project - and do it when the file is opened, thus it won't be a bother for users who just have some html file somewhere in the project and don't do anything with it and the users who use html are going to suffer the "penalty" anyway, so it just won't be on first opening of project, but on first opening of html file...

- or turn the html validator off for some kinds of projects (j2se comes to mind immediately) with similar result but more limited scope...
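
The two options above can be compared as simple decision rules. The sketch below is purely illustrative: `ValidationPolicy`, `Trigger`, and `ProjectKind` are hypothetical names and do not correspond to any NetBeans API, and the set of excluded project kinds is an assumption based on the j2se example.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical decision rules for when the HTML validator should run. */
final class ValidationPolicy {

    enum Trigger { PROJECT_SCAN, FILE_OPENED }

    enum ProjectKind { J2SE, WEB, OTHER }

    // Assumption: project kinds for which validation during scanning is skipped.
    private static final Set<ProjectKind> SCAN_EXCLUDED = EnumSet.of(ProjectKind.J2SE);

    /** Option 1: never validate during project scan, only when a file is opened. */
    static boolean validateOnOpenOnly(Trigger trigger) {
        return trigger == Trigger.FILE_OPENED;
    }

    /** Option 2: validate during scan unless the project kind is excluded. */
    static boolean validateUnlessExcluded(Trigger trigger, ProjectKind kind) {
        return trigger == Trigger.FILE_OPENED || !SCAN_EXCLUDED.contains(kind);
    }

    public static void main(String[] args) {
        System.out.println(validateOnOpenOnly(Trigger.PROJECT_SCAN));
        System.out.println(validateUnlessExcluded(Trigger.PROJECT_SCAN, ProjectKind.WEB));
    }
}
```

Option 1 moves the initialization cost to the first opened HTML file for every project; option 2 keeps the current behavior for web projects and only spares the excluded kinds.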
Comment 18 Marek Fukala 2013-01-31 10:36:05 UTC
> Imho it would be best to either:
> - turn the validator off for opening of the project - and do it when the file
> is opened, thus it won't be a bother for users who just have some html file
> somewhere in the project and don't do anything with it and the users who use
> html are going to suffer the "penalty" anyway, so it just won't be on first
> opening of project, but on first opening of html file...
I have no problem with that. The impact will be that the action items won't contain the items for html files until one opens an html file. IMO this needs to be resolved somewhere in the infrastructure, not on the html.editor side.

> - or turn the html validator off for some kinds of projects (j2se comes to mind
> immediately) with similar result but more limited scope...
I'd prefer the first option as this means the html validation won't be typically turned on for java projects.

Anyway, as described in my evaluation above, the html validation of files is not as slow as it may look from the results. Its first-time initialization takes about 5-6 seconds; after that the validation is relatively fast. So the performance bottleneck won't be that big for bigger projects, where the scanning takes longer.

Some more time can be saved by web-main#cdb690b8b6df, as described in comment #13. I'd still appreciate it if you measured the project again on a build with this change.
Comment 19 Marek Fukala 2013-07-12 12:13:06 UTC
1) AFAIK the "normal" html indexing should not trigger the validator
2) the html validator initialization happens just once per JVM, so if there are multiple projects, only the first project's scanning will be slowed down.
3) the "penalty" for users with non-html projects is one-off only. No validator will run during the next scanning as the files remain untouched.
4) Svata already changed the tasklist scanner not to run until the AI (Action Items) window is opened.
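
Point 3 relies on untouched files being skipped on later scans. A minimal sketch of such an up-to-date check follows, assuming a last-modified timestamp is what decides whether a file counts as touched. `UpToDateCache` and `needsValidation` are hypothetical names; the real indexing infrastructure may track this differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache that skips files whose timestamp has not changed. */
final class UpToDateCache {

    // Last-modified timestamp seen for each path at its most recent check.
    private final Map<String, Long> lastSeen = new HashMap<>();

    /** Returns true if the file is new or changed, and records its timestamp. */
    boolean needsValidation(String path, long lastModified) {
        Long previous = lastSeen.put(path, lastModified);
        return previous == null || previous != lastModified;
    }

    public static void main(String[] args) {
        UpToDateCache cache = new UpToDateCache();
        System.out.println(cache.needsValidation("index.html", 100L)); // first scan
        System.out.println(cache.needsValidation("index.html", 100L)); // untouched
        System.out.println(cache.needsValidation("index.html", 200L)); // modified
    }
}
```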

=> I do not see reasons severe enough to justify an extra, inconsistent solution as proposed by PetrC. Closing as WONTFIX. Please reopen if you disagree. Thanks for understanding.

