Bug 239915 - Unpacking index is extremely slow
Status: REOPENED
Product: projects
Classification: Unclassified
Component: Maven
Version: 8.0.1
Hardware: PC All
Importance: P3 with 10 votes
: 8.0
Assigned To: Milos Kleint
issues@projects
Duplicates: 250158 252851
Depends on:
Blocks:
Reported: 2014-01-02 09:33 UTC by kimsp
Modified: 2017-07-29 04:14 UTC
CC List: 9 users

See Also:
Issue Type: DEFECT


Attachments
Thread dump for the latest scenario (35.76 KB, text/plain)
2016-09-25 22:50 UTC, bht

Description kimsp 2014-01-02 09:33:00 UTC
[ BUILD # : 201401020002 ]
[ JDK VERSION : 1.7.0_45 ]

STEPS:
   * Sometimes NetBeans indexes a Maven repository.

ACTUAL:
   The progress bar stays at 100 % for a long time saying "Unpacking index for
[your repo]".

EXPECTED:
   Unpacking should not take that long; it does not take that long in IDEA.
Comment 1 kimsp 2014-01-02 09:36:43 UTC
Forget the 100 % part; that was the preceding progress bar. The bug is still valid: unpacking is extremely slow.
Comment 2 Milos Kleint 2014-01-02 09:40:16 UTC
(In reply to kimsp from comment #0)
> [ BUILD # : 201401020002 ]
> [ JDK VERSION : 1.7.0_45 ]
> 
> 
> EXPECTED:
>    Unpacking should not take that long, it does not do that in IDEA.


I'm not really sure what we can improve here. We basically call one method from the maven-indexer library that both downloads the index (or part of it) and populates the Lucene index.

We do a bit of additional work for the local repository (about 30-50% overhead, if I recall correctly from when I measured it).

Please note that the slowness can have multiple causes including not having enough memory available to the IDE's JVM, resulting in excessive GC.
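A quick way to check whether a JVM is short on heap is to ask the runtime itself. The sketch below is a minimal, hedged check using the standard Runtime methods; the 512 MiB threshold is a hypothetical rule of thumb chosen for illustration, not a NetBeans or maven-indexer default, and the class name is illustrative.

```java
/**
 * Reports how much heap the running JVM can still use. The 512 MiB threshold is a
 * hypothetical rule of thumb for this sketch, not a NetBeans or maven-indexer default.
 */
final class HeapHeadroom {
    static final long MIB = 1024L * 1024;
    static final long SUGGESTED_MIN_MAX_HEAP = 512 * MIB; // hypothetical threshold

    /** Heap the JVM could still hand out: the maximum heap minus what is currently in use. */
    static long availableHeap(Runtime rt) {
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    static boolean maxHeapLooksSmall(long maxHeapBytes) {
        return maxHeapBytes < SUGGESTED_MIN_MAX_HEAP;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.printf("max heap: %d MiB, still available: %d MiB%n",
                rt.maxMemory() / MIB, availableHeap(rt) / MIB);
        if (maxHeapLooksSmall(rt.maxMemory())) {
            System.out.println("Small max heap: a large index rebuild may spend much of its time in GC.");
        }
    }
}
```

In NetBeans, JVM options are passed with the -J prefix (as in the netbeans_default_options line quoted in comment 28), so a larger heap would be requested with something like -J-Xmx2g; the right value depends on the machine and the size of the index.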
Comment 3 kimsp 2014-01-02 09:55:08 UTC
I would like to fix this problem myself (I have a master's degree in CS and code Java professionally for a living).

Please send me a mail and we can talk.

Thanks and regards
Kim
Comment 4 Milos Kleint 2014-01-02 15:45:29 UTC
issue 225678 is related
Comment 5 Milos Kleint 2014-01-14 12:57:31 UTC
http://hg.netbeans.org/core-main/rev/5cc878ddb9e7
see issue 240150 for details. Could at least partially help with the slowness, in the sense that less GC will be done and the Lucene index will likely be smaller.
Comment 6 Quality Engineering 2014-01-16 02:44:15 UTC
Integrated into 'main-silver', will be available in build *201401160001* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)

Changeset: http://hg.netbeans.org/main-silver/rev/5cc878ddb9e7
User: Milos Kleint <mkleint@netbeans.org>
Log: #240150, #239915 avoid slow search for groupids, it's a memory bottleneck. Also avoid storing and retriving osgi related document fields, we do not query them and they can occupy a lot of memory
Comment 7 Milos Kleint 2014-01-17 09:43:32 UTC
the size difference between indexes with and without OSGi fields in the documents is as follows:

Miloss-iMac:mavenindex mkleint$ du my-local-nexus/
1036216	my-local-nexus/
Miloss-iMac:mavenindex mkleint$ rm -rf my-local-nexus/
Miloss-iMac:mavenindex mkleint$ du my-local-nexus/
1885560	my-local-nexus/

that's on my mac with my local nexus proxy of central

we're down close to half the size, which is great, especially given that we're not using these fields at all. It was not that severe when the feature was introduced (I think I measured it back then). Apparently more projects now include OSGi headers in their artifacts.
Comment 8 Milos Kleint 2014-01-17 09:51:16 UTC
in human readable sizing it's 921M vs. 506M
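The raw du figures in comment 7 become 921M and 506M because BSD/macOS du reports 512-byte blocks by default (unless BLOCKSIZE is set or -k is passed). The small sketch below redoes that arithmetic; treating the larger figure as the index with the OSGi fields is an inference from "we're down close to half the size", and the class name is illustrative.

```java
/** Converts BSD/macOS du block counts (512-byte blocks by default) to MiB. */
final class DuBlocks {
    static final long BLOCK_SIZE = 512; // default du block size on BSD/macOS

    static double blocksToMiB(long blocks) {
        return blocks * BLOCK_SIZE / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        double withOsgi = blocksToMiB(1885560);    // larger index, assumed to include OSGi fields
        double withoutOsgi = blocksToMiB(1036216); // index after changeset 5cc878ddb9e7
        System.out.printf("%.0f MiB vs %.0f MiB (%.0f%% of the original)%n",
                withOsgi, withoutOsgi, 100 * withoutOsgi / withOsgi);
    }
}
```

Running it prints "921 MiB vs 506 MiB (55% of the original)", matching the human-readable sizes above.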
Comment 9 Milos Kleint 2014-01-17 11:12:33 UTC
not sure what more can be done without additional information. constructing the index is an expensive operation with high requirements on cpu and io.

considered fixed.
Comment 10 dejlek 2014-08-19 16:00:22 UTC
What puzzles me is why are other processes affected? My machine is literally not usable while this "Unpacking index for Sonatype Repository" is running.

I checked the system processes. I have an HP Z600 workstation here, with 12 GB RAM and a 6-core i7. Whenever NetBeans starts indexing, I have to take a coffee break...

Could this be done in an external process (separate JVM maybe) perhaps with extremely low priority?
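A sketch of the separate-JVM idea, under stated assumptions: it only builds the command line for a child JVM started under nice -n 19 (Unix only), and com.example.IndexerMain and indexer.jar are placeholders, not NetBeans classes or files. Note that nice lowers CPU scheduling priority only; since later comments point at disk contention, on Linux ionice -c 3 (idle I/O class) would be the more relevant prefix.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Builds and starts a low-priority child JVM for indexing work. Unix-only sketch. */
final class LowPriorityIndexer {
    static List<String> command(String mainClass, String classPath, String maxHeap) {
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add("nice");          // lowest CPU priority for the child process
        cmd.add("-n");
        cmd.add("19");
        cmd.add(javaBin);         // same Java installation as the current JVM
        cmd.add("-Xmx" + maxHeap);
        cmd.add("-cp");
        cmd.add(classPath);
        cmd.add(mainClass);
        return cmd;
    }

    static Process start(List<String> cmd) throws IOException {
        return new ProcessBuilder(cmd).inheritIO().start();
    }

    public static void main(String[] args) {
        // Placeholders only: there is no such indexer class or jar in NetBeans.
        List<String> cmd = command("com.example.IndexerMain", "indexer.jar", "1g");
        System.out.println(String.join(" ", cmd));
    }
}
```

The main method only prints the command; calling start(...) would launch it once a real indexer entry point exists.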
Comment 11 Saucistophe 2014-08-27 18:58:11 UTC
I second what dejlek said, I understand not much can be done, but this is the one and only application bringing my PC to its knees.

The problem is that it directly hits the hard drive, and even though the CPU 

Could this indexing be split into small batches, or deprioritized, or even better, could you add a limit on the disk transfer rate?
For instance, one third of the disk's maximum transfer rate (if that information can be retrieved in Java) or a manual setting.
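The rate-limit suggestion can be sketched as an OutputStream wrapper that sleeps whenever the bytes written so far get ahead of a byte-per-second budget. As far as I know the Java standard library does not report a disk's maximum throughput, so this sketch takes the rate as a manual setting; where such a wrapper would sit in the real indexing pipeline (the index library writes through its own I/O layer) is an assumption, and the class name is illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Keeps the average write rate at or below bytesPerSecond by sleeping when ahead of schedule. */
final class ThrottledOutputStream extends FilterOutputStream {
    private final long bytesPerSecond;
    private final long startNanos = System.nanoTime();
    private long written;

    ThrottledOutputStream(OutputStream out, long bytesPerSecond) {
        super(out);
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        pace(1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len); // write the whole chunk instead of FilterOutputStream's byte-by-byte default
        pace(len);
    }

    private void pace(int len) throws IOException {
        written += len;
        // Earliest time since start at which `written` bytes fit the budget; whole and
        // fractional seconds are computed separately to avoid long overflow on multi-GB writes.
        long allowedNanos = (written / bytesPerSecond) * 1_000_000_000L
                + (written % bytesPerSecond) * 1_000_000_000L / bytesPerSecond;
        long aheadNanos = allowedNanos - (System.nanoTime() - startNanos);
        if (aheadNanos > 0) {
            try {
                Thread.sleep(aheadNanos / 1_000_000, (int) (aheadNanos % 1_000_000));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while throttling", e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long t0 = System.nanoTime();
        try (OutputStream out = new ThrottledOutputStream(sink, 50_000)) { // 50 kB/s, an example manual setting
            out.write(new byte[100_000]);
        }
        System.out.printf("wrote %d bytes in %d ms%n", sink.size(), (System.nanoTime() - t0) / 1_000_000);
    }
}
```

The limit is an average over the stream's lifetime, so short bursts are allowed as long as the total stays within budget.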
Comment 12 jenjen 2014-09-02 00:43:15 UTC
> the one and only application bringing my PC to its knees.

Same here with the latest release version. This is not resolved. I'm using a brand new i7 with 32 GB of RAM and an SSD. It's been this way for over 90 minutes and counting. Admittedly, I just added a huge dependency (Spring Framework) to my pom.xml file and I'm running a crappy OS (Windows), but the system is nearly unusable. I'm even having trouble typing in this simple text box.
Comment 13 mdanjou 2014-10-13 16:15:02 UTC
Hi Milos,

I hope you don't mind, I have decided to reopen this issue as it is reproducible with the latest GA build (Netbeans 8.0.1-javaee-windows.exe) and renders the computer inoperable.
I have also changed the Hardware field to "Windows 7 x64" (was Windows XP)

Environment:
- Windows 7 
- 6GB RAM
- OS is up to date.
- Antivirus is up to date.

Steps to reproduce:
- Download Netbeans javaee 8.0.1
- Import a Maven project. 
- Use Netbeans normally for a few hours. 
  (used maven to compile a project deploy it on a web server for instance without any problems/delays)

Expected behavior:
- The computer should remain responsive.

Actual behavior:
- After a few hours of usage the following message shows in the bottom corner of the Netbeans window: "Unpacking index for Nexus Public Mirror"
- The whole computer becomes nearly unresponsive.
- Netbeans itself is excruciatingly slow.
- System has been like this for well over an hour.


Notes:
- Unzipping takes a lot longer on my Windows machine than on a Linux box with similar horsepower. I think the underlying unzipping algorithm could be the key.
- Interestingly the overall CPU usage as per the "Windows task manager" is only between 5% and 9%.
  - NetBeans memory usage went up to 750 MB and then back down to 422 MB (memory usage is climbing again).
  - Netbeans CPU usage is only between 1% and 4%
  - It seems to me that Netbeans is waiting for something. Maybe some decompression library at the OS level that is not listed by the "Windows Task Manager"?
  

Question:
- I will let the machine run overnight, but I don't want to lose hours of work whenever the Nexus re-indexing kicks in. Can this feature be turned off? If so, how?

Best regards
Michel
Comment 14 mdanjou 2014-10-14 09:32:56 UTC
Yesterday, shortly after posting the above comment, a message appeared in NetBeans indicating that there was no space left on the C:\ drive (only about 2 GB left).

This was strange because I had checked my C:\ drive for free space that morning and I had over 25 GB free...

I found a series of huge files that were created by NetBeans under C:\Users\<myUserName>\AppData\Local\Temp.
Specifically, there was a folder called nexus-maven-index.gz879764343.... that was 19.8 GB.
There were other nexus-maven-index folders on the system that were created yesterday but they were smaller in size.

I have deleted these files/folders as well as the .m2 folder and restarted NetBeans. I will keep an eye on it today.

Regards
Michel
Comment 15 mdanjou 2014-10-15 09:07:14 UTC
Well, the indexing issue is back. 

My computer was unusable for the whole afternoon and I let it run overnight. 
When I came back this morning, I had only 19 GB free on my C:\ drive (I had 37 GB free before the Maven index unpacking started).

Maybe I am missing something here, but I can't see why the index would take so much space (18 GB).

Since the unpacking is finished the computer is responsive again but Netbeans feels very sluggish. 

To me this issue is a P2, not a P3... 

Please take action.
Regards,
Michel
Comment 16 oyviste 2014-11-03 06:41:30 UTC
I have to chime in here, because this problem is getting very annoying and ought to be fixed: Approx. once a week, Netbeans starts thrashing my SSD drive with huge amounts of written data (many many gigabytes written over and over), while doing "Unpacking index for ..<some Maven repo>". It hits my laptop so hard it makes it unusable for other tasks.

It looks like something is constantly optimizing (merging the segments of) the underlying Lucene index at each step of the "Unpacking index" process. That's what it looks like in the Lucene index directory under ~/.cache/netbeans/8.0.1/mavenindex/<repo>/. Huge segment files of many gigabytes are written as fast as possible and then deleted (merged). This goes on for a good 10 minutes.

Lucene usually memory-maps these segment files, and it causes crippling I/O load, even on a modern laptop with a fast SSD and 16GB RAM. (And it sucks the life out of the SSD as well!)

Something is broken with the Maven [Lucene] index integration.
Comment 17 oyviste 2014-11-03 06:42:33 UTC
I'm on Linux 64bit and using Oracle JDK8 to run Netbeans btw. (same issues also with JDK7).
Comment 18 oyviste 2014-11-03 07:11:44 UTC
Somewhere, in some code, one should make sure that Maven Lucene indices aren't optimized (force merged) all the time (after every modification ?). Optimizing multi-GB Lucene indexes has a high I/O cost. Though I am not familiar with the details of usage in Netbeans/Maven, so it might be caused by some strange Lucene merge policy or similar.

Is it possible to make sure that Maven Lucene indices are only merged down once at the end of the "Unpacking index .." processes ? Or even not at all ?
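To see why repeated force merges of a multi-GB index would produce "many gigabytes written over and over", here is a worst-case cost model, assuming each force merge rewrites the entire index into one segment. With n batches, merging after every batch rewrites 1 + 2 + ... + n batch sizes, i.e. n(n+1)/2, on top of the initial writes, while a single merge at the end rewrites the index once. Lucene's regular merge policies merge incrementally and are far cheaper than this; the model only illustrates the pattern suspected above, and the batch split is hypothetical.

```java
/** Worst-case bytes written when force-merging after every batch versus once at the end. */
final class MergeCost {
    /** Each batch is written once, then the whole index (i batches so far) is rewritten. */
    static long mergeAfterEveryBatch(int batches, long batchSize) {
        long total = 0;
        for (int i = 1; i <= batches; i++) {
            total += batchSize;     // flushing the new batch
            total += i * batchSize; // force merge rewrites all i batches into one segment
        }
        return total;
    }

    /** Each batch is written once, and one final force merge rewrites the whole index. */
    static long mergeOnceAtEnd(int batches, long batchSize) {
        long indexSize = batches * batchSize;
        return indexSize + indexSize;
    }

    public static void main(String[] args) {
        // Hypothetical split: a 7 GiB index (the size reported in comment 19) built from 64 batches of 112 MiB.
        int batches = 64;
        long batchMiB = 112;
        System.out.printf("merge after every batch: %d MiB written%n", mergeAfterEveryBatch(batches, batchMiB));
        System.out.printf("merge once at the end:   %d MiB written%n", mergeOnceAtEnd(batches, batchMiB));
    }
}
```

For that hypothetical split the model gives 240128 MiB (about 234 GiB) versus 14336 MiB (14 GiB): the write volume grows quadratically with the number of full merges, which is why merging only once at the end, or not at all, matters.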
Comment 19 oyviste 2014-11-03 07:30:22 UTC
Since it may be a hint/trigger of this problem, I should mention that it only seems to happen when "Unpacking" our internal Sonatype Nexus Maven repository, which for some reason has a largeish index of about 7GiB on disk in Netbeans cache (with ~4.4 million Lucene docs in it).
Comment 20 atomixnmc 2014-11-04 08:22:44 UTC
I've hit the same problem when adding a new dependency to my pom.xml. I have a MacBook Pro with an i7 and 8 GB RAM, and it nearly got stuck while NetBeans was unpacking the index from the Sonatype repo.

I'd like to know if this bug will get fixed.

Product Version: NetBeans IDE 8.0 (Build 201408251540)
Updates: Updates available to version NetBeans 8.0.1 Patch 1.1
Java: 1.7.0_60; Java HotSpot(TM) 64-Bit Server VM 24.60-b09
Runtime: Java(TM) SE Runtime Environment 1.7.0_60-b19
System: Mac OS X version 10.8.5 running on x86_64; UTF-8; en_US (nb)
Comment 21 hmbrand 2014-11-13 14:37:04 UTC
Is there an option to disable this indexing altogether?

I build from the command prompt, so netbeans doesn't need these indexes at all (in my case)

[ build # :201408251540 ]
[ jdk version: 1.8.0_25 ]
[ Linux 3.11.10  x86_64 ]
[ 16 GB RAM    12 CPUs ]
Comment 22 Tomas Stupka 2014-11-21 15:16:55 UTC
> Is there an option to disable this indexing altogether?
see options > java > maven > index
Comment 23 Tomas Stupka 2014-12-05 13:12:42 UTC
in case somebody wants to avoid index handling for a particular repository - added a switch to suppress automatic repository indexing for a specified list of repos
-J-Dmaven.indexing.doNotAutoIndex=[semicolon-separated list of repo IDs]

core-main #c9284e7d40df
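The switch takes a semicolon-separated list of repository IDs. The sketch below shows one way such a property value could be read on the Java side; only the property name comes from the comment above, and this is not the code from changeset c9284e7d40df. The class name and the repository IDs used in examples are placeholders.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

/** Reads a semicolon-separated list of repository IDs (illustrative, not the NetBeans implementation). */
final class DoNotAutoIndex {
    static final String PROPERTY = "maven.indexing.doNotAutoIndex";

    static Set<String> parse(String value) {
        if (value == null) {
            return Collections.emptySet();
        }
        return Arrays.stream(value.split(";"))
                .map(String::trim)
                .filter(id -> !id.isEmpty()) // tolerate "a;;b" and a trailing ';'
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    static boolean shouldAutoIndex(String repoId) {
        return !parse(System.getProperty(PROPERTY)).contains(repoId);
    }

    public static void main(String[] args) {
        System.out.println("auto-index 'central'? " + shouldAutoIndex("central"));
    }
}
```

Because -J options go to the IDE's JVM, the switch can be added to netbeans_default_options in etc/netbeans.conf. On a shell command line the argument needs quoting, since ';' is a command separator, e.g. netbeans '-J-Dmaven.indexing.doNotAutoIndex=central;my-nexus' (placeholder IDs).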
Comment 24 Tomas Stupka 2014-12-05 13:47:44 UTC
core-main #c62c1bc19f68 
several fixes to reduce indexing time - down to 35-50%

currently the downloaded gz file from Maven Central is ~150 MB and results in a Lucene index of ~750 MB after approx. 2 minutes of indexing on moderate hardware (MacBook, 2.4 GHz Intel Core i5, 8 GB RAM, SSD; download time not included), so closing for now.

it is always possible to bring the IDE down at some point - processing an index file of many gigabytes, which results in a Lucene index 5 times bigger, just can't be done in a few seconds. An index of 7 or 18 GB (as reported) just seems too huge.

in case you still want to follow up it would be good to know if 
- this was caused by a proportionally big index file (e.g. 4GB???) downloaded from a repository or if things went wrong on the client (and we still have a chance to do something about it)
- if possible please also attach a profiler snapshot
Comment 25 Tomas Stupka 2014-12-08 12:47:06 UTC
one more thing - in case anybody has observed a too-big (1 GB+) Lucene cache (see {userdir}/var/cache/mavenindex/{repo_name}) for a public repository, please let us know.
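To check for an oversized cache, each repository folder under mavenindex can be summed. Comments in this thread show the folder in different places ({userdir}/var/cache/mavenindex, ~/.cache/netbeans/8.0.1/mavenindex, AppData\Local\NetBeans\Cache\8.1\mavenindex), so this sketch takes the directory as an argument and flags anything over 1 GiB; the class name is illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

/** Sums the on-disk size of each repository folder in a mavenindex cache directory. */
final class MavenIndexCacheSizes {
    static final long ONE_GIB = 1024L * 1024 * 1024;

    /** Total size in bytes of all regular files below dir. */
    static long sizeOf(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sum();
        }
    }

    /** Repository folder name mapped to its size in bytes, sorted by name. */
    static Map<String, Long> perRepository(Path mavenIndexDir) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (DirectoryStream<Path> repos = Files.newDirectoryStream(mavenIndexDir, Files::isDirectory)) {
            for (Path repo : repos) {
                sizes.put(repo.getFileName().toString(), sizeOf(repo));
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // Pass the mavenindex folder of your NetBeans cache directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        perRepository(dir).forEach((repo, bytes) -> System.out.printf("%-40s %10.1f MiB%s%n",
                repo, bytes / (1024.0 * 1024.0), bytes > ONE_GIB ? "  <-- over 1 GiB" : ""));
    }
}
```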
Comment 26 Quality Engineering 2014-12-09 04:20:26 UTC
Integrated into 'main-silver', will be available in build *201412090001* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)

Changeset: http://hg.netbeans.org/main-silver/rev/c62c1bc19f68
User: Tomas Stupka <tstupka@netbeans.org>
Log: issue #239915 - Unpacking index is extremely slow
Comment 27 everflux 2014-12-10 14:32:58 UTC
See #239704 regarding compressed indices with newer Lucene as well. (Decreases space requirements, increases indexing speed)
Comment 28 alied 2014-12-11 04:42:12 UTC
Tested with this repository:
https://maven.atlassian.com/content/groups/public/

Could not upload the snapshot; link below.

these are the netbeans start parameters:
netbeans_default_options="-J-server -J-Xss2m -J-Xms32m -J-Djava.io.tmpdir=/home/alied/.tmp/ -J-Dnetbeans.logger.console=true -J-ea -J-Dapple.laf.useScreenMenuBar=true -J-Dapple.awt.graphics.UseQuartz=true -J-Dsun.java2d.noddraw=true -J-Dsun.java2d.dpiaware=true -J-Dsun.zip.disableMemoryMapping=true -J-Dplugin.manager.check.updates=false -J-Dnetbeans.extbrowser.manual_chrome_plugin_install=yes"

index downloaded:
alied@development:~/.tmp$ ls -lh
total 636M
-rw-r--r-- 1 alied alied  184 Dec 11 00:59 fallback1809026237901536194.netbeans.pom
-rw-r--r-- 1 alied alied  182 Dec 11 00:59 fallback2910971787650671001.netbeans.pom
drwxr-xr-x 2 alied alied 4.0K Dec 11 00:59 jarfscachealied
drwxr-xr-x 2 alied alied 4.0K Dec 11 00:52 jna-92903101
-rw-r--r-- 1 alied alied 3.2K Dec 11 01:03 loading3418414702126207214.html
drwxr-xr-x 2 alied alied 4.0K Dec 11 01:14 nexus-maven-repository-index.gz2310359650455394803.dir
-rw-r--r-- 1 alied alied 635M Dec 11 01:14 nexus-maven-repository-index.gz8630440163568623504
-rw-r--r-- 1 alied alied 132K Dec 11 01:05 output1418270720970
-rw-r--r-- 1 alied alied 595K Dec 11 01:03 selfsampler3879241942145247205.npss
-rw-r--r-- 1 alied alied 1.3K Dec 11 01:02 uigesture4331769078722541539.html

HDD:
ATA device, with non-removable media
	Model Number:       WDC WD2500HHTZ-04N21V0                  
	Firmware Revision:  04.06A00
	Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0

Processor:
AMD FX(tm)-8320 Eight-Core Processor

RAM: 8GB

Max. uncompressed size:
5.9GB

snapshot:
https://drive.google.com/open?id=0B81H8YEUuTY6V1VvNEhpOVZ2S3c&authuser=0
Comment 29 alied 2014-12-11 05:04:26 UTC
Same for atlassian public repository:

https://m2proxy.atlassian.com/repository/public/

root@development:/home/alied/.tmp# ls -lh
total 637M
-rw-r--r-- 1 alied alied  184 Dec 11 00:59 fallback1809026237901536194.netbeans.pom
-rw-r--r-- 1 alied alied  182 Dec 11 00:59 fallback2910971787650671001.netbeans.pom
drwxr-xr-x 2 alied alied 4.0K Dec 11 00:59 jarfscachealied
drwxr-xr-x 2 alied alied 4.0K Dec 11 00:52 jna-92903101
-rw-r--r-- 1 alied alied 3.2K Dec 11 01:03 loading3418414702126207214.html
-rw-r--r-- 1 alied alied 635M Dec 11 01:47 nexus-maven-repository-index.gz237556840577647767
drwxr-xr-x 2 alied alied  12K Dec 11 01:49 nexus-maven-repository-index.gz8483188642343373115.dir
-rw-r--r-- 1 alied alied 136K Dec 11 01:30 output1418270720970
-rw-r--r-- 1 alied alied 1.2M Dec 11 01:28 selfsampler2253156228267891884.npss
-rw-r--r-- 1 alied alied 595K Dec 11 01:03 selfsampler3879241942145247205.npss
-rw-r--r-- 1 alied alied 1.3K Dec 11 01:02 uigesture4331769078722541539.html

Max. uncompressed size:
6.2GB


Product Version: NetBeans IDE Dev (Build 201412100001)
Java: 1.8.0_25; Java HotSpot(TM) 64-Bit Server VM 25.25-b02
Runtime: Java(TM) SE Runtime Environment 1.8.0_25-b17
System: Linux version 3.17.6 running on amd64; UTF-8; en_US (nb)
User directory: /home/alied/.netbeans/dev
Cache directory: /home/alied/.cache/netbeans/dev

snapshot:
https://drive.google.com/open?id=0B81H8YEUuTY6V1VvNEhpOVZ2S3c&authuser=0

P.S. I could run these tests for #239704 as well.
Comment 30 alied 2014-12-11 05:10:57 UTC
ERRATA:
actual link for previous snapshot is: https://drive.google.com/open?id=0B81H8YEUuTY6bGctVWpja0piVUE&authuser=0

Anyway, you should be able to access both in https://drive.google.com/folderview?id=0B81H8YEUuTY6RThFeWVpLVNBUm8&usp=sharing
Comment 31 Tomas Stupka 2015-05-27 14:55:08 UTC
*** Bug 250158 has been marked as a duplicate of this bug. ***
Comment 32 Tomas Stupka 2015-06-08 09:08:40 UTC
*** Bug 252851 has been marked as a duplicate of this bug. ***
Comment 33 cbourque 2015-08-27 16:27:11 UTC
Why can't NetBeans simply use Nexus incremental indexes (like Eclipse)?

I understand that the first time it has to download the full index, but in subsequent weeks it could use the incremental chunks!
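The decision logic behind incremental updates can be sketched as follows, assuming the repository publishes, next to the full nexus-maven-repository-index.gz, a properties file recording a chain identifier and the number of the latest incremental chunk, with chunks numbered consecutively. The chunk file naming and the oldestAvailable parameter are assumptions to verify against a real repository's nexus-maven-repository-index.properties; this is not how NetBeans or maven-indexer implement it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Decides which incremental chunks to fetch. Naming and chain-id semantics are assumptions. */
final class IncrementalPlan {
    /**
     * @return Optional.empty() when a full download is required, otherwise the chunk
     *         files to fetch in order (an empty list means the local index is current)
     */
    static Optional<List<String>> chunksToFetch(String localChainId, int localLast,
                                                String remoteChainId, int remoteLast,
                                                int oldestAvailable) {
        if (localChainId == null || !localChainId.equals(remoteChainId) || localLast > remoteLast) {
            return Optional.empty(); // remote index was rebuilt, or local state is unknown/inconsistent
        }
        if (localLast < remoteLast && localLast + 1 < oldestAvailable) {
            return Optional.empty(); // the chunks we need are no longer published
        }
        List<String> files = new ArrayList<>();
        for (int n = localLast + 1; n <= remoteLast; n++) {
            files.add("nexus-maven-repository-index." + n + ".gz"); // hypothetical chunk naming
        }
        return Optional.of(files);
    }

    public static void main(String[] args) {
        System.out.println(chunksToFetch("chain-1", 5, "chain-1", 7, 1));
    }
}
```

The point of the design is that a weekly update only downloads and applies the new chunks, and falls back to the full index only when the chain changed or the needed chunks are gone.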
Comment 34 mtbadi39 2015-11-17 09:04:59 UTC
Issue is back. My machine is literally unusable for more than 3 hours while "Unpacking index for Sonatype Repository" is running.

Product Version: NetBeans IDE 8.1 (Build 201510222201)
Java           : 1.7.0_75; Java HotSpot(TM) Client VM 24.75-b04
Runtime        : Java(TM) SE Runtime Environment 1.7.0_75-b13
System         : Windows 7 Professionnel 64-bit (6.1, Build 7601) Service Pack 1
Memory         : 4GB
Processor      : Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz (4 CPUs), ~3.2GHz

size of "C:\Users\badi.mohammedtahar\AppData\Local\NetBeans\Cache\8.1\mavenindex" = 11.2 GB
Comment 35 tobijdc 2015-12-14 08:45:21 UTC
Extremely high memory usage (>5GB) and system unusable for hours if not canceled.

Windows 7 64-bit
Product Version: NetBeans IDE 8.1 (Build 201510222201)
Java: 1.8.0_45; Java HotSpot(TM) 64-Bit Server VM 25.45-b02
Runtime: Java(TM) SE Runtime Environment 1.8.0_45-b15
System: Windows 7 version 6.1 running on amd64; Cp1252; en_US (nb)
Maven 3.3.9

Memory: 8GB
Processor: Intel Core i7
Comment 36 _ gtzabari 2016-03-15 13:26:59 UTC
Out of curiosity, is it possible to index across multiple cores? It's pretty frustrating waiting over 30 minutes for "unpacking index" with CPU usage under 12%.
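A generic sketch of spreading per-record work over several cores with a fixed thread pool. Lucene's IndexWriter is documented as thread-safe, so in principle document building and adding could run on several threads, but whether maven-indexer's unpacking path can be parallelized this way is not confirmed in this thread. If the machine has eight hardware threads, 12% CPU is roughly one busy core; on the other hand comments 13 and 43 describe low CPU with the disk saturated, in which case more threads would not help. toFields and the class name are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Splits the records into slices and processes them on a fixed thread pool. */
final class ParallelIndexing {
    /** Hypothetical stand-in for per-record work such as parsing a record into document fields. */
    static String[] toFields(String record) {
        return record.split(":");
    }

    static int indexAll(List<String> records, int threads)
            throws InterruptedException, ExecutionException {
        // Stands in for a thread-safe index writer shared by all workers.
        ConcurrentLinkedQueue<String[]> sink = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int sliceSize = Math.max(1, (records.size() + threads - 1) / threads);
            List<Future<?>> futures = new ArrayList<>();
            for (int start = 0; start < records.size(); start += sliceSize) {
                List<String> slice = records.subList(start, Math.min(start + sliceSize, records.size()));
                futures.add(pool.submit(() -> slice.forEach(r -> sink.add(toFields(r)))));
            }
            for (Future<?> f : futures) {
                f.get(); // waits, and rethrows any worker failure as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
        return sink.size();
    }

    public static void main(String[] args) throws Exception {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            records.add("org.example" + i + ":artifact:1.0");
        }
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("processed " + indexAll(records, cores) + " records on " + cores + " threads");
    }
}
```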
Comment 37 bht 2016-09-25 22:32:58 UTC
When I disable the Maven index download via the Index Update Frequency setting, there is an issue with the Test Connection button in the proxy settings: the test never finishes, not even with a failure, no matter how long I wait. The Cancel button on the dialog just hides the dialog, and when I open the Options dialog later, it is still in its previous state, on the connection test with the progress bar animated.
Comment 38 bht 2016-09-25 22:44:28 UTC
In the above case, the IDE is again frozen to the degree that I cannot close it, I cannot interact with the menu, anything. JVisualVM waits indefinitely trying to connect to NetBeans which is now shown in it.

Could you please try to make an improvement to the NetBeans networking code in general. It does not matter what. Just get started with it. I am confident that anything will help. I hope that I have demonstrated that the code is lacking robustness.

As I have suggested elsewhere in this system, it would help to apply a strategy of post-mortem analysis. The user needs feedback regarding the source of the problem. A contemporary computer program should be able to sort out these types of issues without requiring support action.
Comment 39 bht 2016-09-25 22:50:51 UTC
Created attachment 162209 [details]
Thread dump for the latest scenario

JVisualVM finally allowed me to take a thread dump, but its progress bar is still showing "Opening NetBeans ...", being animated.
Comment 40 bht 2016-09-25 22:52:05 UTC
Comment on attachment 162209 [details]
Thread dump for the latest scenario

Sorry, wrong bug
Comment 41 bht 2016-09-25 22:53:39 UTC
Sorry, my comments are for the wrong bug. This is because bugzilla moves to another bug after an update - annoying.
Comment 42 CyRaid 2017-05-16 20:45:25 UTC
Any update on this? :)
Comment 43 amd 2017-06-28 13:32:46 UTC
This is still a major problem. I am using a 64-bit Windows 10 Pro machine, i7 quad core at 4.2 GHz, 64 GB of RAM, and an SSD. I'm running NetBeans 8.2 (build 201609300101) with NetBeans 8.2 Patch 2. I have Java 1.8_131.

When NetBeans starts to index a maven repository, it kills my computer. Windows Task Manager shows that I have plenty of CPU and Memory available. But, my Disk I/O is 100% used. I increased my minimum Page file size to 64 GB without any improvement.

I sorted the Resource Monitor by Read (B/sec), and it shows that netbeans64.exe has one PID with 12 files open in the AppData\Local\NetBeans\Cache\8.2\mavenindex\sonatype-public-repository directory. It is reading from these files at a combined rate of 20,000,000 B/sec.

Thank you for offering a free development environment. But, I cannot use NetBeans because of this problem.
Comment 44 farouka 2017-07-28 20:12:16 UTC
I cannot use NetBeans anymore these days, and I've been a user for 10 years! This is killing my disk. It's always at 100%.
Comment 45 clucgdc 2017-07-29 04:14:28 UTC
For troubleshooting purpose, could you take screenshots or capture a video of following:

- free disk space
- slow index unpacking
- performance stats / graphs

and info of:

- O/S version
- JDK version


© 2014, Oracle Corporation and/or its affiliates.