The HTML Validator slows down Java project indexing by 10-20%.
It should be fixed or disabled.
Any link to measurements? Any snapshots?
Here is the link with measurements:
Unfortunately the wiki does not support attachments, no snapshots there :-(
*** Bug 221201 has been marked as a duplicate of this bug. ***
Changing to (P3) DEFECT so this is not forgotten. Do we have any recent performance data for this? If not, Petr C, could I ask you to measure the impact of this?
Also, since apparently a large part of this performance hit is the first-time initialization of the HTML navigator, we need to separate this one time cost from any slowdown that happens on each scan.
As for a potential solution, Marek and I discussed offline that one solution could be to turn off HTML validation errors in the task list by default, and only show them in the current file.
It would be useful if it was still possible to turn this on, so that users who want to see HTML validation errors in the task list are able to. That would mean that after you turn this on, the IDE would need to rescan your project and store extra information in the index, correct? Does the scanning infrastructure allow this?
Created attachment 130826 [details]
.npss file attached from NetBeans
In the attached snapshot you can see that, out of the whole indexing of a big web project, which took 32,986 ms, org.netbeans.modules.html.editor.HtmlErrorFilter.filter() took 7,246 ms, which is roughly 22%.
Petre, thank you for the snapshot.
Here are some results of my evaluation:
from the 7.396s spent in HtmlErrorFilter.filter(), there's:
1) 4.804s spent in HtmlValidatorImpl.validate() ... this is the html code validation entry point. Following code runs from within this method:
a) 2.799s (58%) spent in html.validator.ValidationTransaction.initialize(), which is a static initializer run just once per the IDE's JVM session.
b) 0.885s (18%) spent in html.validator.ValidationTransaction.loadDocAndSetupParser() which is run just once per JVM session per html content type (which is in 90% html5).
c) 0.49s (10%) in MessageEmitterAdapter's static initializer calls Html5AttributeDatatypeBuilder.parseSyntaxDescriptions(...)
Just the three items above represent 86% of the validation time, and quite a lot of the time in the remaining code is spent in classloading.
So the result for this part is: the validation itself is very fast, and the first-time initialization definitely takes more than 90% of the time.
2) 2.249s in HtmlErrorFilter.isErrorCheckingEnabledForThisMimetype()
a) 1.779s (80%) in JsfPageMetadataProvider.getMatadataMap(), which triggers a one-time initialization job in FaceletsLibrarySupport.findLibraries()
b) 0.32s (14%) in FaceletsLibrarySupport.checkLibraryDescriptorsUpToDate, which is something that can be improved, as this time is taken each time a file is validated. An issue for this problem is already filed against the web.jsf.editor module.
So the situation here is quite similar to the first item - again the first time initialization is the biggest problem.
I think as for #2 I could completely strip the whole code if the file is .html and not .xhtml (I *think* the code now runs also on .html files)
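To tell the one-time initialization apart from the per-file cost in further measurements, something like the following could be used. This is only a minimal sketch: SketchValidator and ScanTiming are hypothetical stand-ins, not the real html.validator classes, and the busy loop merely simulates the expensive first-time setup.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a validator with an expensive one-time setup. */
final class SketchValidator {
    private static boolean initialized;
    static int initCount; // how many times the one-time setup actually ran

    private static synchronized void ensureInitialized() {
        if (!initialized) {
            initCount++;
            // Simulated cost of schema loading and classloading (assumption).
            long sink = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sink += i % 7;
            }
            if (sink < 0) {
                throw new IllegalStateException("unreachable");
            }
            initialized = true;
        }
    }

    /** Placeholder check; the real validation logic is not modelled here. */
    static boolean validate(String content) {
        ensureInitialized();
        return content.contains("<html");
    }
}

final class ScanTiming {
    /** Returns {nanos of the first call, summed nanos of all later calls}. */
    static long[] measure(List<String> files) {
        long first = 0;
        long rest = 0;
        for (int i = 0; i < files.size(); i++) {
            long start = System.nanoTime();
            SketchValidator.validate(files.get(i));
            long elapsed = System.nanoTime() - start;
            if (i == 0) {
                first = elapsed;
            } else {
                rest += elapsed;
            }
        }
        return new long[] {first, rest};
    }

    public static void main(String[] args) {
        long[] t = measure(Arrays.asList("<html></html>", "<p>x</p>", "<html>"));
        System.out.println("first call ns: " + t[0] + ", remaining calls ns: " + t[1]);
    }
}
```

Reporting the first call separately from the rest is what would show whether a slowdown is paid once per JVM session or on every scanned file.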
Marek, thanks a lot for the evaluation - this is good news, this means that 2nd time scanning will be much faster.
"hanoi tower problem" is a different issue very likely caused by some internals in relaxng-jing. If there's a reproducible case I can take a look at the problem deeper.
Hanoi-tower is definitely a completely different problem and is easily reproducible on multiple projects.
Petre, please add a reference to one of them. Thanks.
I have spent some time measuring scanning performance. I have found that the "hanoi tower problem" influences the resulting time slightly, but I don't have exact numbers. I need to find one file with this problem and measure the practical impact.
Petre, just for curiosity, can you please re-test with this change?:
summary: do not initialize facelets support for non-xhtml files.
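The idea behind that change, roughly sketched below. This is not the actual changeset: FaceletsGuardSketch and its methods are hypothetical names, and checking the file extension is an assumption about how non-xhtml files could be recognized.

```java
import java.util.Locale;

/** Hypothetical sketch: skip the expensive Facelets setup for non-xhtml files. */
final class FaceletsGuardSketch {
    private static boolean initialized;
    static int initCount; // how many times the expensive setup ran

    /** Assumption: xhtml files are recognized by their extension. */
    static boolean isXhtml(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".xhtml");
    }

    private static synchronized void initFaceletsSupport() {
        if (!initialized) {
            initCount++; // placeholder for the one-time library lookup
            initialized = true;
        }
    }

    /** Prepares error checking for a file, touching Facelets only when needed. */
    static void prepare(String fileName) {
        if (!isXhtml(fileName)) {
            return; // plain .html: no Facelets metadata required
        }
        initFaceletsSupport();
    }

    public static void main(String[] args) {
        prepare("index.html");
        prepare("page.xhtml");
        System.out.println("Facelets initializations: " + initCount);
    }
}
```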
(In reply to comment #13)
> Petre, just for curiosity, can you please re-test with this change?:
Just to be sure, I meant Petr C.
Created attachment 130849 [details]
.npss file attached from NetBeans
One such case is in attached snapshot - it is reproducible when opening the Bigwebproject from http://hg.netbeans.org/ergonomics/file/aa11910f438d/performance/test/unit/src/org/netbeans/performance/scanning/ScanProjectPerfTest.java
But I agree that the Hanoi towers are not a big deal from the scanning time point of view - they take ~1s in most cases I have seen. The real problem is that the HTML Validator is run at all for projects where it does nothing useful and only adds several seconds to the opening time.
Imho it would be best to either:
- turn the validator off when the project is opened and run it when a file is opened. It then won't bother users who just have some html file somewhere in the project and don't do anything with it. The users who use html are going to suffer the "penalty" anyway, so it just won't be on first opening of the project, but on first opening of an html file...
- or turn the html validator off for some kinds of projects (j2se comes to mind immediately) with similar result but more limited scope...
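The first option could look roughly like the sketch below. All names are hypothetical and the "missing doctype" check only stands in for the real validator; the point is just that scanning records the files and validation runs on open.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: index on project open, validate only on file open. */
final class DeferredValidationSketch {
    private final Map<String, String> indexedFiles = new HashMap<>();
    private int validationRuns;

    /** Project scan: records file contents but runs no validation. */
    void scanProject(Map<String, String> files) {
        indexedFiles.putAll(files);
    }

    /** Opening a file is the first point where validation is paid for. */
    List<String> openFile(String path) {
        List<String> errors = new ArrayList<>();
        String content = indexedFiles.get(path);
        if (content == null) {
            return errors; // file is not part of the scanned project
        }
        validationRuns++;
        // Placeholder rule standing in for the real validator (assumption).
        if (!content.toLowerCase(Locale.ROOT).startsWith("<!doctype")) {
            errors.add(path + ": missing doctype");
        }
        return errors;
    }

    int validationRuns() {
        return validationRuns;
    }

    public static void main(String[] args) {
        DeferredValidationSketch ide = new DeferredValidationSketch();
        Map<String, String> files = new HashMap<>();
        files.put("a.html", "<html></html>");
        ide.scanProject(files);
        System.out.println("runs after scan: " + ide.validationRuns());
        System.out.println("errors on open: " + ide.openFile("a.html"));
    }
}
```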
> Imho it would be best to either:
> - turn the validator off for opening of the project - and do it when the file
> is opened, thus it won't be a bother for users who just have some html file
> somewhere in the project and don't do anything with it and the users who use
> html are going to suffer the "penalty" anyway, so it just won't be on first
> opening of project, but on first opening of html file...
I have no problem with that. The impact will be that the action items won't contain the items for html files until one opens an html file. IMO this needs to be resolved somewhere in the infrastructure, not on the html.editor side.
> - or turn the html validator off for some kinds of projects (j2se comes to mind
> immediately) with similar result but more limited scope...
I'd prefer the first option as this means the html validation won't be typically turned on for java projects.
Anyway, as described in my evaluation above, the html validation of files is not as slow as it may look from the results. Its first-time initialization takes about 5-6 seconds; after that the validation is relatively fast. So the relative performance hit won't be that big for bigger projects where the scanning takes longer.
Some more time can be saved by web-main#cdb690b8b6df, as described in comment #13. I'd still appreciate it if you measured the project again on a build with this change.
1) AFAIK the "normal" html indexing should not trigger the validator
2) the initial html validator initialization happens just once per JVM, so if there are multiple projects, only the first project's scanning will be slowed down.
3) the "penalty" for users with projects that are not html-based is one-off only. No validator will run during the next scanning as long as the files remain untouched.
4) Svata already changed the tasklist scanner not to run until the AI is opened.
=> I do not see reasons severe enough which would justify some extra inconsistent solution as proposed by PetrC. Closing as WONTFIX. Please reopen if you disagree. Thanks for understanding.
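To illustrate point 3, here is a rough sketch of why a rescan costs nothing for untouched files. It assumes the scanner compares last-modified timestamps, which is an assumption made for illustration and not a description of the actual indexing infrastructure; the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: validate only files changed since the previous scan. */
final class IncrementalScanSketch {
    private final Map<String, Long> seenTimestamps = new HashMap<>();
    private int validatedFiles;

    /** Takes path -> last-modified pairs and validates new or changed files. */
    void scan(Map<String, Long> filesWithLastModified) {
        for (Map.Entry<String, Long> e : filesWithLastModified.entrySet()) {
            Long previous = seenTimestamps.put(e.getKey(), e.getValue());
            if (previous == null || !previous.equals(e.getValue())) {
                validatedFiles++; // placeholder for the real validation work
            }
        }
    }

    int validatedFiles() {
        return validatedFiles;
    }

    public static void main(String[] args) {
        IncrementalScanSketch scanner = new IncrementalScanSketch();
        Map<String, Long> files = new HashMap<>();
        files.put("readme.html", 100L);
        scanner.scan(files);
        scanner.scan(files); // untouched file: nothing is validated again
        System.out.println("validated files: " + scanner.validatedFiles());
    }
}
```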