Bug 239704 - Upgrade to lucene 4.1, use compressed index for maven
Status: NEW
Product: projects
Classification: Unclassified
Component: Maven
Version: 8.0
Hardware: PC Linux
Importance: P3 with 8 votes
Target Milestone: TBD
Assigned To: Tomas Stupka
QA Contact: issues@projects
Depends on:
Blocks:
Reported: 2013-12-20 18:34 UTC by everflux
Modified: 2017-08-12 08:32 UTC

See Also:
Issue Type: ENHANCEMENT


Attachments
Reflect lucene API changes (3.17 KB, patch)
2014-02-13 20:40 UTC, everflux

Description everflux 2013-12-20 18:34:51 UTC
If I see this correctly:
http://bits.netbeans.org/trunk/maven-snapshot/org/netbeans/api/org-netbeans-libs-lucene/SNAPSHOT/org-netbeans-libs-lucene-20131219.051203-1.pom

Netbeans is still using the quite old Lucene 3.5.
Lucene 4.1 supports a compressed index, which would save a considerable amount of space for the maven index.

See, for example:
https://netbeans.org/bugzilla/show_bug.cgi?id=235732
https://netbeans.org/bugzilla/show_bug.cgi?id=232687

At the same time the amount of I/O might be reduced, so even though compression adds CPU overhead, the saved I/O could more than amortize the cost.

See e.g. http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene
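To make the space argument concrete, here is a minimal sketch that compresses repetitive, Maven-coordinate-like records and round-trips them. It is only an illustration under assumptions: the record layout and the class name `StoredFieldCompressionSketch` are made up, and `java.util.zip.Deflater` is a stand-in for whatever codec Lucene applies to stored fields, not Lucene's own implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class; not part of NetBeans, maven-indexer or Lucene.
class StoredFieldCompressionSketch {

    // Builds repetitive records that loosely resemble Maven coordinates (made-up data).
    static byte[] sampleRecords(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("org.example.group|artifact-").append(i % 50)
              .append("|1.").append(i % 10).append(".0|jar\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Compresses with the fastest Deflater setting to keep CPU overhead low.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the original bytes; the caller supplies the known original length.
    static byte[] decompress(byte[] packed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        byte[] out = new byte[originalLength];
        int off = 0;
        try {
            while (off < originalLength && !inflater.finished()) {
                int n = inflater.inflate(out, off, originalLength - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported input");
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt input", e);
        } finally {
            inflater.end();
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = sampleRecords(10_000);
        byte[] packed = compress(raw);
        System.out.printf("raw=%d bytes, compressed=%d bytes%n", raw.length, packed.length);
    }
}
```

The ratio on real index data will differ from this toy input; the point is only that highly repetitive field values shrink a lot, so fewer bytes have to be read from disk per lookup.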

I would assume that newer lucene versions are better optimized as well.

(I am not sure whether filing it for the maven project is correct, nor am I sure that Netbeans is really using the stone-aged lucene 3.5 / 3.6)
Comment 1 Milos Kleint 2013-12-20 19:00:03 UTC
We're using 3.6.2 actually (shipping a separate lucene with the indexer module). The reason is that we use the maven-indexer component/jar, which in turn uses this version of lucene. Without the maven-indexer guys at Apache upgrading, we cannot upgrade either, I think.
Comment 2 everflux 2013-12-20 22:36:36 UTC
I see. From my understanding the nexus index format is based on Lucene (the old one). I am afraid a change of the lucene version in the indexer library itself is therefore not possible, at least not if the new format should be used.

The best alternative from my point of view is to use the new lucene format and maintain our own index inside of netbeans.
Comment 3 everflux 2014-01-20 16:06:20 UTC
I did more background checking. It seems the new index format is in fact designed to be independent of the Lucene format, as described here:
https://docs.sonatype.com/display/SPRTNXOSS/Nexus+Index+Format

So the options are to wait for a new maven-indexer release with a newer lucene bundled, or Netbeans could use a newer lucene with maven-indexer, provided that the API did not break.
Comment 4 Milos Kleint 2014-01-20 17:28:46 UTC
Btw, as part of another issue I've removed an OSGi-related processor that populated the lucene documents with OSGi-related manifest entries. That effectively halved the central index size (as a local lucene index; the download size is the same, but that's already compressed).
Comment 5 ehsavoie 2014-01-22 10:06:40 UTC
I added an issue on the maven indexer:
http://jira.codehaus.org/browse/MINDEXER-77

@everflux: nexus has its own format but also produces a lucene index, which is the de facto 'standard'. By the way, I think Lucene can update indexes from previous versions.
Comment 6 everflux 2014-01-22 10:18:43 UTC
AFAIK the ".zip" index is a plain lucene (legacy version) index file and the ".gz" is a simple compressed text file format which is lucene agnostic. But when you use the ".gz" you have to build your lucene index (or whatever you use for searching/indexing) yourself.

IMHO this is happening in Netbeans, so I see multiple options to solve this
- get a new maven indexer release and use that (out of control of netbeans group)
- fork the maven indexer and upgrade it for newer Lucene + pull-request and use that in Netbeans (could be done by a NetCAT participant as well, e.g.)
- Create our own Lucene index outside of the maven indexer, using the latest Lucene version (this could lead to two indices, one from the maven indexer and one from Netbeans, which is obviously not what we want if the goal is to save space; alternatively we would need to delete the maven indexer index afterwards, but that would prevent incremental updates)

If someone would help me with the Netbeans integration part, I would volunteer to have a look into the maven-indexer and see if I can get it to work with newer Lucene smoothly.
Comment 7 Milos Kleint 2014-01-22 10:34:39 UTC
(In reply to everflux from comment #6)
> AFAIK the ".zip" index is plain lucene (legacy version) index file and the
> ".gz" is a simple compressed text file format which is lucene agnostic. But
> when you use the ".gz" you have to build your lucene (or whatever you use
> for searching/indexing) index yourself.

right, but the zip (legacy) content is not really present at many remote locations anymore.

> 
> IMHO this is happening in Netbeans, so I see multiple options to solve this
> - get a new maven indexer release and use that (out of control of netbeans
> group)

Preferable option. Please note that any external binary changes need to be approved by Oracle legal, thus it's not an option for 8.0 anymore. (Yes, it takes a while, unfortunately.)


> - fork the maven indexer and upgrade it for newer Lucene + pull-request and
> use that in Netbeans (could be done by a NetCAT participant as well, e.g.)

-1, forks have a maintenance price tag attached.

> - Create our own Lucene index outside of the maven indexer, using the latest
> Lucene version (this could lead to two indices, one from the maven indexer
> and one from Netbeans, which is obviously not what we want if the goal is
> to save space; alternatively we would need to delete the maven indexer
> index afterwards, but that would prevent incremental updates)

-1 for the same reasons. 



> 
> If someone would help me with the Netbeans integration part, I would
> volunteer to have a look into the maven-indexer and see if I can get it to
> work with newer Lucene smoothly.

Sure, feel free to ask in this issue or me directly (mkleint@netbeans.org)
Comment 8 everflux 2014-02-13 20:40:45 UTC
Created attachment 145169 [details]
Reflect lucene API changes

This patch is required due to lucene API changes: "indexExists" has moved to DirectoryReader.
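For readers without the attachment at hand: in Lucene 3.x the static check is `IndexReader.indexExists(Directory)`, while in 4.x it lives on `DirectoryReader`. The sketch below is not the attached patch. It is a hypothetical reflection-based probe (the class `IndexExistsLocator` and its helper `findFirst` are invented for illustration) that reports which class offers the static method on the current classpath, and it compiles and runs even when no Lucene jar is present.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Optional;

// Hypothetical helper; not part of NetBeans, maven-indexer or Lucene.
class IndexExistsLocator {

    // Returns the first public static one-argument method with the given name
    // found on any of the candidate host classes, in the order given.
    static Optional<Method> findFirst(ClassLoader loader, String methodName,
                                      String paramType, String... hosts) {
        for (String host : hosts) {
            try {
                Class<?> param = Class.forName(paramType, false, loader);
                Class<?> hostClass = Class.forName(host, false, loader);
                Method m = hostClass.getMethod(methodName, param);
                if (Modifier.isStatic(m.getModifiers())) {
                    return Optional.of(m);
                }
            } catch (ClassNotFoundException | NoSuchMethodException e) {
                // This candidate is not available; try the next one.
            }
        }
        return Optional.empty();
    }

    // Probes the Lucene 4.x location first, then the 3.x one.
    static Optional<Method> findIndexExists(ClassLoader loader) {
        return findFirst(loader, "indexExists", "org.apache.lucene.store.Directory",
                "org.apache.lucene.index.DirectoryReader",
                "org.apache.lucene.index.IndexReader");
    }

    public static void main(String[] args) {
        Optional<Method> m = findIndexExists(IndexExistsLocator.class.getClassLoader());
        System.out.println(m
                .map(x -> "indexExists declared on " + x.getDeclaringClass().getName())
                .orElse("no Lucene found on the classpath"));
    }
}
```

Code that targets a single Lucene version would simply call the method directly; reflection is used here only so the example stays self-contained.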
Comment 9 everflux 2014-06-28 14:02:13 UTC
Upstream has merged my patch; it will be released with maven indexer 6.0.

Not sure about the process: should this issue stay open to track the dependency upgrade, or should it be closed, with a separate issue on your side for updating the Netbeans dependencies in the next release?
Comment 10 _ gtzabari 2017-08-12 01:05:25 UTC
What happened to this issue? Was anything ever integrated into a release?
Comment 11 everflux 2017-08-12 08:32:19 UTC
My changes were never merged into NB, unfortunately. Upstream (maven indexer) has not released a version with my changes (yet).

