Find and design solutions that let the system scale correctly when a large
number of modules are installed. This task should minimize the negative effect
of the number of modules on startup time and memory usage, and ensure that the
user pays only for what they actually use.
I've measured starting with 1000 dumb modules (empty layer,
no ModuleInstall, no manifest sections).
The additional time spent in startup (~11s) was caused by:
- 8s ModuleList.readInitial
  - ~4s spent in parsing separate Modules/modname.xml files
  - ~4s spent in opening the jars needed for reading manifests
- 3s ModuleManager.enable/module preparation
  - probably mostly spent in creating the classloaders/reopening
    the module jars
Parsing the Modules/*.xml files could probably be made a lot faster by
optimistically assuming that they are in a simple fixed format and
looking for particular substrings. If the assumption fails, revert to
an actual XML parse.
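A minimal sketch of the optimistic parse described above, in Java. It assumes each status file holds only <param name="...">value</param> elements with plain-text values; that layout, the class name StatusFileReader and its method names are assumptions for illustration, not the actual module system code. Anything unexpected (comments, CDATA, entities, extra attributes, nested markup) makes the fast path give up so that a real DOM parse is used instead.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: the file layout it expects is an assumption, see above.
final class StatusFileReader {
    private static final String OPEN = "<param name=\"";
    private static final String CLOSE = "</param>";

    /** Fast path. Returns null when the text does not match the simple format. */
    static Map<String, String> readFast(String text) {
        // Constructs the substring scan cannot handle safely: give up early.
        if (text.contains("<!--") || text.contains("<![CDATA[") || text.contains("&")) {
            return null;
        }
        Map<String, String> params = new LinkedHashMap<>();
        int pos = 0;
        while (true) {
            int start = text.indexOf(OPEN, pos);
            if (start < 0) {
                break;
            }
            int nameStart = start + OPEN.length();
            int nameEnd = text.indexOf("\">", nameStart);
            if (nameEnd < 0) {
                return null;
            }
            String name = text.substring(nameStart, nameEnd);
            if (name.indexOf('"') >= 0 || name.indexOf('<') >= 0 || name.indexOf('>') >= 0) {
                return null; // extra attributes or broken markup
            }
            int valueStart = nameEnd + 2;
            int valueEnd = text.indexOf(CLOSE, valueStart);
            if (valueEnd < 0) {
                return null;
            }
            String value = text.substring(valueStart, valueEnd);
            if (value.indexOf('<') >= 0) {
                return null; // nested markup inside the value
            }
            params.put(name, value.trim());
            pos = valueEnd + CLOSE.length();
        }
        return params.isEmpty() ? null : params;
    }

    /** Slow path: a real XML parse that extracts the same information. */
    static Map<String, String> readSlow(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Resolve any external DTD to an empty stream so no network access happens.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(text)));
        Map<String, String> params = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("param");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            params.put(e.getAttribute("name"), e.getTextContent().trim());
        }
        return params;
    }

    /** Try the substring scan first; revert to the XML parse if it gives up. */
    static Map<String, String> read(String text) throws Exception {
        Map<String, String> fast = readFast(text);
        return fast != null ? fast : readSlow(text);
    }
}
```

The fast path is deliberately strict: it is cheaper to fall back too often than to misread a file that was edited by hand.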
We may be able to create a cache of module manifests so JARs do not
need to be opened until the module is actually turned on. This could
save another chunk of time, perhaps. The basic timestamp
infrastructure already used for layer caching could help.
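A rough sketch of what such a manifest cache could look like, in Java. The class ManifestCache, its binary format and the choice of (last-modified time, file length) as the validity stamp are all assumptions made for illustration; the real layer cache infrastructure may stamp files differently. A JAR is opened only when there is no entry for it or its stamp has changed.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical cache of main manifest attributes, keyed by absolute JAR path.
final class ManifestCache {
    /** One cached JAR: the stamp it was read at plus its main attributes. */
    private static final class Entry {
        final long lastModified;
        final long length;
        final Map<String, String> attrs;

        Entry(long lastModified, long length, Map<String, String> attrs) {
            this.lastModified = lastModified;
            this.length = length;
            this.attrs = attrs;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    int jarOpens; // number of real JAR opens, kept only for measurement

    /** Returns the main manifest attributes, opening the JAR only on a cache miss. */
    Map<String, String> mainAttributes(File jar) throws IOException {
        String key = jar.getAbsolutePath();
        Entry cached = entries.get(key);
        if (cached != null && cached.lastModified == jar.lastModified()
                && cached.length == jar.length()) {
            return cached.attrs; // stamp unchanged: no need to touch the JAR
        }
        Map<String, String> attrs = new HashMap<>();
        try (JarFile jf = new JarFile(jar)) {
            jarOpens++;
            Manifest mf = jf.getManifest();
            if (mf != null) {
                for (Map.Entry<Object, Object> a : mf.getMainAttributes().entrySet()) {
                    attrs.put(a.getKey().toString(), a.getValue().toString());
                }
            }
        }
        entries.put(key, new Entry(jar.lastModified(), jar.length(), attrs));
        return attrs;
    }

    /** Writes all entries to one binary stream (writeUTF limits strings to 64KB). */
    void save(OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(entries.size());
        for (Map.Entry<String, Entry> e : entries.entrySet()) {
            data.writeUTF(e.getKey());
            data.writeLong(e.getValue().lastModified);
            data.writeLong(e.getValue().length);
            data.writeInt(e.getValue().attrs.size());
            for (Map.Entry<String, String> a : e.getValue().attrs.entrySet()) {
                data.writeUTF(a.getKey());
                data.writeUTF(a.getValue());
            }
        }
        data.flush();
    }

    /** Reads back a cache written by save(). */
    static ManifestCache load(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        ManifestCache cache = new ManifestCache();
        int count = data.readInt();
        for (int i = 0; i < count; i++) {
            String key = data.readUTF();
            long lastModified = data.readLong();
            long length = data.readLong();
            int n = data.readInt();
            Map<String, String> attrs = new HashMap<>();
            for (int j = 0; j < n; j++) {
                String name = data.readUTF();
                attrs.put(name, data.readUTF());
            }
            cache.entries.put(key, new Entry(lastModified, length, attrs));
        }
        return cache;
    }
}
```

One known weakness of a stamp like this: a JAR rewritten within the file system's timestamp granularity and with the same length would not be noticed.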
My guess is that the problem with Modules/* is not the XML parsing
but the number of files. The cached layer (360KB) is parsed in under a second
on my machine, while 1000 x 560B files take 4s, even though the parsing itself
could be around 1.5s.
It may be possible to copy the manifests out into the module stati
and/or merge the stati into a single file (maybe even binary),
but this collides with module set changes on project switch.
(On the other hand, changing the module set on project switch
slows the switch down considerably, so it is questionable whether
it is really that useful.)
BTW: Computing the fingerprint for an XMLFS of 1000 modules
takes about 600ms by itself.
So you're saying just accessing all the files in Modules/*.xml is
itself a problem (LocalFileSystem + MultiFileSystem overhead on top of
raw OS speed for finding files)? Or that opening the JARs to get their
manifests is the problem? I'm confused.
Merging stati to a single file would indeed change the semantics of
the module system (not just because of project switching) so I would
like to avoid it unless it is really shown to be necessary. But
caching more manifest information in the status XML files would be
fine, I think - probably just impl version and dependencies incl.
provides-requires would need to be added. This could permit the JARs
of enabled modules to be opened only once under normal conditions, and
those of disabled modules not at all.
I'm saying that 4s is spent in opening + parsing the Modules/* files.
I don't have an exact breakdown between opening and parsing,
but I extrapolated the parsing time to be about 1.5s, and from
that I inferred that opening takes 2.5s.
The first opening of the jars takes 4s; the second opening seems
to be faster, so skipping the first opening may not be a win in itself.
I still have to run more experiments.
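One way to get a measured split instead of an extrapolated one is to read every file into memory first and parse from memory afterwards. A sketch in Java follows; the class name OpenVsParse is made up, and it uses plain java.nio, so it measures only the raw OS floor and deliberately bypasses any LocalFileSystem/MultiFileSystem overhead.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical micro-benchmark: separates file access time from XML parse time.
final class OpenVsParse {
    /** Returns {file count, nanos spent opening+reading, nanos spent parsing}. */
    static long[] measure(Path dir) throws Exception {
        List<byte[]> contents = new ArrayList<>();

        // Phase 1: open and read every *.xml file, no parsing yet.
        long t0 = System.nanoTime();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.xml")) {
            for (Path f : files) {
                contents.add(Files.readAllBytes(f));
            }
        }
        long t1 = System.nanoTime();

        // Phase 2: parse from memory only, so no file I/O is included.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Resolve external DTDs to an empty stream to keep the network out of the timing.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        for (byte[] c : contents) {
            builder.parse(new ByteArrayInputStream(c));
        }
        long t2 = System.nanoTime();

        return new long[] { contents.size(), t1 - t0, t2 - t1 };
    }
}
```

Run it twice on the same directory: the difference between the first and second run of phase 1 gives a feel for how much of the opening cost is cold OS cache rather than per-file overhead.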
Created attachment 6355
Attaching some thoughts I had a few months ago on caching stuff during
startup, which never got finished. Note that the layer cache is
already implemented. Treat it as a collection of ideas rather than a
concrete proposal, because I know there are problems with it as written.
Set target milestone to TBD
Seems more or less work for the performance team.