117319 – MIMEResolution doesn't scale

This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 117319 - MIMEResolution doesn't scale

Summary: MIMEResolution doesn't scale

Status:	RESOLVED FIXED

Alias:	None

Product:	platform
Classification:	Unclassified
Component:	Filesystems (show other bugs)
Version:	6.x
Hardware:	All All

Importance:	P2 blocker (vote)
Assignee:	Petr Nejedly

URL:
Keywords:	PERFORMANCE

Duplicates (1):	64461 (view as bug list)
Depends on:
Blocks:	114195
	Show dependency tree

Reported:	2007-10-01 14:04 UTC by Petr Nejedly
Modified:	2008-12-22 11:34 UTC (History)
CC List:	6 users (show)

See Also:
Issue Type:	ENHANCEMENT
Exception Reporter:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Petr Nejedly 2007-10-01 14:04:50 UTC

There are about 40 MIMEResolvers registered currently.
They are used in a for-loop one-by-one, while most of them represent just a direct mapping between file extension and a
fixed MIME type, few of them bound to XML and discriminated by a few XML aspects, and only about 3 instances with more
compilcated processing.
Moreover, most of the implementations are based on a XML description, where we are free to implement them in any way, as
long as we keep the contract.

The (reasonably relaxed) contract boils down to:
*) If I register a mime resolver (either as direct implementation or as a xml description), there will be a MIMEResolver
instance in the default lookup that will match its semantics (provide the right mime type for the file matching the
original MIMEResolver).
*) If there is a conflicting MIMEResolver registered (e.g. the same extension, more specific resolver, ...), the
behavior of using the MIMEResolver sequence from the default lookup will match the ordering of the original resolvers.

Such a contract allows reordering and merging the resolvers in many cases, using the following rules and knowledge of
the internal behavior of the resolver instances):
*) Split multi-rule resolver into several consecutive single-rule instances
*) Switch the order of two consecutive resolvers if they both match primarily on a different file extension.
*) Discard a resolver that matches primarity on the same file extension as another, earlier declared resolver that
matches only at that file extension.
*) Convert a consecutive sequence of resolvers matching only at a (different) file extension into a single resolver
(using e.g. a map(extension->mimeType).
*) Convert a consecutive sequence of resolvers matching primarily on the same file extension (e.g. xml) into a single
resolver matching on the file extension and then using the rest of the in the original sequence.

This way, the current list of NetBeans MIMEResolvers could be processed into a much faster mime resolution scheme, where
most of the files would match on the very first rule or after few tree-like steps, making the mime resolution much
faster and much more scallable than it currently is.

Moreover, this can be implemented in a compatible way, even allowing further optimization, it there is a need.

I'll post a framework suggestion that would make this possible in a separate message.

Comment 1 Petr Nejedly 2007-10-01 14:23:05 UTC

There are currently two registration points for mime resolvers, meta-inf and services lookup.
All in meta-inf is currently before everything in services lookup. 
Most of the entries in the services lookup are xml files that are processed by a private implementation into a resolver
instances (private impl as well).
If we change the xml processing to generate other instances that MIMEResolvers (e.g. MIMEResolverData) and register a
new, singleton, MIMEResolver  as a last slot in the meta-inf, we can make this resolver gather all the registered
instances of MIMEResolverData (private class) and let it do post-processing of the instances (turning them into an
internal sequence of MIMEResolvers), it would keep the semantics but allow much finer control of the resolution process.
The only pitfall is a potential presence of explicitely coded MIMEResolver instance in the services folder, in the
middle of the declaration xmls. (There _is_ one currenly in the big NetBeans).

To solve this compatibly, we could find the original position of such an offender and split the optimized resolver list
into two halves, having the offender between them.
Of course, in the real product and once having tests for this case, we would move such a resolver somewhere else (end of
meta-inf), as log as would the real (human-from-source-code-inferred) ordering be equivalent.

I'll try to post an initial sketch of the framework later.
I have also implemented an ugly proof-of-concept implementation (which didn't even obey the outlined rules) and it loked
like it works reliably so far (no extensive testing, though).

Note that we don't really need to implement optimal, bullet-proof classification framework, it can happily stay as a
heuristic that works for us.

Comment 2 Jesse Glick 2007-10-01 16:26:38 UTC

Is calling a single method on a collection of ~40 resolvers, most of which will return null, really so expensive?

Comment 3 Petr Nejedly 2007-10-01 22:24:47 UTC

Yes, if
*) you do it 120 times in a row (e.g. 120 files in a folder).
*) many resolver instances contains more sub-instances (more <file> tags in the declaration).
*) each instance call typically call FO.getExt, a nontrivial method to begin with.
*) the order of resolvers is suboptimal.
*) each magic resolver copies its load of bytes again through InputStream API.

And we're talking about scalability here, the iteration never scales much.
The individual issues are fixable, but the scallability limit will remain.
And I'm not sucking my thumb to get the arguments, I did the profiling, several times.

If we solve all the subissues of 114195, we can speed up folder expansion (which regressed recently) by more than 60%

Comment 4 Petr Nejedly 2007-10-02 14:57:27 UTC

Because keeping mutual order of real MIMEResolver instances and xml-based instances of some private helper class seems
complicated (probably would require copying part of the FolderLookup implementation), we can do it in following
compatible way:
Let the "private implementation" actually implement the MIMEResolver interface (abstract class in reality), being mostly
no-op (return null, should be fast enough).
Let the convertor infrastructure look up MIMEResolver, not the private interface. It can then filter the instances to
private and unknown, and optimize the private.

For example, if there is a sequence:
METAINF-Unknown1
METAINF-Unknown2
Services-xml1
Services-xml2
...
Services-xml15
Services-Unknown3
Services-xml16
Services-xml17
...

then Services-xml1 instance would cover the functionality of xml1-xml15,
xml-15 would cover xml16-... and the other xml instances would do nothing(return null).

Comment 5 Jesse Glick 2007-10-02 15:39:01 UTC

Idea: rather than hack this case, when we will likely want to do something similar for e.g. DataLoader in the future:

/** Metaservice which can coalesce adjacent services into an optimized form. */
interface LookupCoalescer<T> {
  Class<T> type();
  T/*|null*/ coalesce(T former, T latter);
}
class Lookups {
  /** Coalesces an ordered list of services using all coalescers known in default lookup. */
  public static <T> Lookup.Result coalesce(Lookup.Result<T> r, Class<T> type);
}

Usage:

package core;
class XMLMimeResolver extends MIMEResolver {...}
public/* for M-I/services */ class XMLMRCoalescer implements LookupCoalescer<MIMEResolver> {
  public class<MIMEResolver> type() {return MIMEResolver.class;}
  public MIMEResolver coalesce(MIMEResolver m1, MIMEResolver m2) {
    if (m1 iof XMLMIMEResolver && m2 iof XMLMIMEResolver) {
      return new XMLMIMEResolver(m1.data.append(m2.data));
    } else return null;
  }
}

package fs;
static Lookup.Result<MIMEResolver> resolvers =
  Lookups.coalesce(Lookup.getDefault().lookupResult(MIMEResolver.class), MIMEResolver.class);
public static String getMIMEType(FileObject f) {
  for (MIMEResolver m : resolvers.allInstances()/* or cached MIMEResolver[] + change listener */) {
    String t = m.getMIMEType(f);
    if (t != null) return t;
  }
  return null;
}

Comment 6 Petr Nejedly 2007-10-02 18:16:56 UTC

I have had something similar already coded as a proof of concept (not generic, tight to MIMEResolvers), but abadoned
that approach, as it was API change. And it won't work with DataLoaders, they are not in Lookup currently.
I'd like to try implementing the mime resolver coalescing without a visible change first, to prove it is sound and helpfull.
We can switch over to your idea later, no doors closed by my hack.

Comment 7 Jesse Glick 2007-10-02 19:06:26 UTC

DataLoader's are not in Lookup currently but they probably could and should be.

Comment 8 Petr Nejedly 2007-10-04 14:08:19 UTC

Changing to enhancement after filing issue 117781.
Slightly changing the order of MIME resolvers can save us from the worst problems in common cases.
While this change is desirable, it might not end up in 6.0 and thanks to the other fixes under umbrella issue 114195
is not strictly necessary to keep reasonable performance.

Comment 9 Antonin Nebuzelsky 2007-10-31 16:34:46 UTC

*** Issue 64461 has been marked as a duplicate of this issue. ***

Comment 10 Jaroslav Tulach 2008-10-24 09:18:54 UTC

Jesse rewritten the mime resolvers during 6.5 and move their implementation to openide.filesystems.

Comment 11 Jesse Glick 2008-10-24 19:03:02 UTC

Moving the impl of resolvers to openide.filesystems had nothing (intentional) to do with performance, so I'm not sure
why this is being closed now. Perhaps we have done what we needed to do in 6.5, but that would be orthogonal to the move.

I like Petr's idea of having a singleton resolver which reads all the declarative resolvers and placing this in M-I/s
lookup in a specific location. This would permit individual procedural resolvers to decide for themselves whether to run
before or after the block of declarative resolvers, whereas in the current state this order is hardcoded (declarative
followed by procedural).

Comment 12 Jiri Skrivanek 2008-10-27 11:22:07 UTC

It would be enough to get the position attribute from MetaInfServicesLookup.Item. Then procedural resolvers can be
merged with declarative resolvers.
BTW, I will try to enhance declarative resolvers to be capable of detecting files which only procedural resolvers can.
Then we can replace all existing procedural resolvers in IDE.