This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 201802

Summary: Cache the cache to RAM
Product: ide
Component: Performance
Version: 7.1
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P2
Keywords: PERFORMANCE
Issue Type: ENHANCEMENT
Reporter: ulfzibis <ulfzibis>
Assignee: Tomas Hurka <thurka>
CC: anebuzelsky, jglick, jlahoda, matthies, mrpc, pjiricka, tzezula, vv159170

Description ulfzibis 2011-09-08 10:55:18 UTC
[ JDK VERSION : 1.6.27 ]

As André Sabosch discovered, redirecting the IDE's scan results cache to a RAM
disk dramatically accelerates the NetBeans IDE workflow, so I suggest:
1. The scan results should be cached to RAM instead of directly to the slow
persistent file system.
2. A background thread with fairly low priority should persist the cache to the
file system in times of low activity.

P2, because I believe such an enhancement would be a BIG step in increasing
NetBeans IDE's overall PERFORMANCE.
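The two-part proposal above can be sketched as a write-behind cache: entries live in RAM and a low-priority background thread flushes them to disk. The class, its names, and its flush policy are illustrative assumptions, not NetBeans code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical write-behind cache: scan results are kept in RAM and
// periodically persisted by a low-priority daemon thread.
public class WriteBehindCache {
    private final Map<String, byte[]> ram = new ConcurrentHashMap<>();
    private final ScheduledExecutorService flusher =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "cache-flusher");
                t.setPriority(Thread.MIN_PRIORITY); // the "low priority" part
                t.setDaemon(true);                  // never blocks IDE shutdown
                return t;
            });

    public WriteBehindCache() {
        // Periodically persist whatever is in RAM; a real IDE would flush
        // only dirty entries, and only during low activity.
        flusher.scheduleWithFixedDelay(this::persistAll, 10, 10, TimeUnit.SECONDS);
    }

    public void put(String key, byte[] scanResult) { ram.put(key, scanResult); }

    public byte[] get(String key) { return ram.get(key); }

    private void persistAll() {
        ram.forEach((key, data) -> {
            // stub: write 'data' to the on-disk cache under 'key'
        });
    }
}
```

Handling a crash or power loss before a flush completes is the hard part such a sketch leaves out, as the later comments discuss.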
Comment 1 matthies 2011-09-08 11:28:21 UTC
But make this a switchable option. Developers low on RAM and/or using an SSD will want to have the cache on the file system and not in RAM.
Comment 2 Selpi 2011-09-08 12:02:08 UTC
For my 3 open PHP projects with about 100,000 files, the cache size is 244 MB, and for more projects it will be much bigger. 

So, if this feature is added to the IDE, it should be configurable:
- on|off
- cache size limit
Comment 3 mrpc 2011-09-08 12:47:26 UTC
Same here... I have very large projects, so this could exhaust the RAM if it weren't configurable. I agree with you.

(In reply to comment #2)
> For my 3 open php projects with about 100000 files the size of a cache is 244Mb
> and for more projects it will be much bigger. 
> 
> So, if this feature will be added to IDE, it should be configurable:
> - on|off
> - cache size limit
Comment 4 Jesse Glick 2011-09-09 14:56:37 UTC
I guess you can always prototype this yourself; for example, in Linux:

rm -rf /ram/$USER-nbcache && \
mkdir -p ~/.cache/netbeans && \
cp -ar ~/.cache/netbeans /ram/$USER-nbcache && \
.../bin/netbeans --cachedir /ram/$USER-nbcache && \
rm -rf ~/.cache/netbeans && \
mv /ram/$USER-nbcache ~/.cache/netbeans

would use the RAM disk for the cache dir but also persist it to a regular disk so you do not recreate all caches just because you rebooted.
Comment 5 ulfzibis 2011-09-10 19:59:27 UTC
(In reply to comment #4)
> I guess you can always prototype this yourself; for example, in Linux:
> ...
Interesting script.

I can imagine that caching the scan data in an internal Java object tree, rather than on an OS RAM disk, would increase access speed further.
Use UTF-8 encoded byte arrays instead of char arrays or Strings to minimize the memory footprint. Maybe the data could be held in SoftReference containers that automatically persist to disk when collected.
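A minimal sketch of the SoftReference + UTF-8 byte-array idea might look like the following. Note that a cleared SoftReference cannot persist its referent (the next comment addresses this), so the sketch instead falls back to reloading from disk on a miss; `SoftCache` and `loadFromDisk` are hypothetical names, not NetBeans APIs:

```java
import java.lang.ref.SoftReference;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache: scan results stored as UTF-8 byte[] (roughly half the
// footprint of char[] for ASCII-heavy data) behind SoftReferences the GC may
// clear under memory pressure.
public class SoftCache {
    private final Map<String, SoftReference<byte[]>> entries = new ConcurrentHashMap<>();

    public void put(String key, String scanResult) {
        entries.put(key, new SoftReference<>(scanResult.getBytes(StandardCharsets.UTF_8)));
    }

    public String get(String key) {
        SoftReference<byte[]> ref = entries.get(key);
        byte[] data = (ref == null) ? null : ref.get();
        if (data == null) {
            data = loadFromDisk(key); // miss: never cached, or cleared by the GC
            if (data != null) entries.put(key, new SoftReference<>(data));
        }
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }

    // Stub standing in for the persistent on-disk cache.
    private byte[] loadFromDisk(String key) { return null; }

    public static void main(String[] args) {
        SoftCache cache = new SoftCache();
        cache.put("a/Foo.java", "class Foo {}");
        System.out.println(cache.get("a/Foo.java")); // prints "class Foo {}"
    }
}
```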
Comment 6 Jesse Glick 2011-10-12 21:37:01 UTC
(In reply to comment #5)
> I can imagine, caching the scan data in an internal java object tree rather
> than an OS ram disk would again increase access speed.

This would make heap usage intolerably high. And SoftReference does not work as consistently as you suppose. Better to use a regular random-access file handle; operating systems cache file segments in RAM when there is plenty available.
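In outline, the regular-file approach this comment recommends could look like the following sketch; the append-only record layout and helper names are invented for illustration, and it is the OS page cache, not the JVM heap, that keeps hot regions in RAM:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative round trip through an ordinary random-access cache file.
// Repeated reads of the same region are typically served from the OS page
// cache rather than the physical disk.
public class FileCacheDemo {
    static String roundTrip(String text) throws IOException {
        Path cacheFile = Files.createTempFile("nbcache", ".bin");
        byte[] record = text.getBytes(StandardCharsets.UTF_8);
        try (RandomAccessFile raf = new RandomAccessFile(cacheFile.toFile(), "rw")) {
            long offset = raf.length();   // append the record at the end
            raf.seek(offset);
            raf.write(record);

            raf.seek(offset);             // random-access read-back
            byte[] back = new byte[record.length];
            raf.readFully(back);
            return new String(back, StandardCharsets.UTF_8);
        } finally {
            Files.delete(cacheFile);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("scan result for Foo.java"));
    }
}
```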
Comment 7 ulfzibis 2011-10-13 13:22:05 UTC
(In reply to comment #6)
> This would make heap usage intolerably high.
Hm, it should not matter whether one spends RAM on a RAM disk or on a big heap. The feature should only be enabled by option.

And again, if the heap reaches its configured limit, less used portions of the cache could be silently written to disk. I doubt any RAM disk could do this.

If I see correctly, the cache index is realized with fully spelled-out textual pointers. Inside the JVM we would only need 4-byte pointers, which additionally would not have to be parsed on each access.

> And SoftReference does not work as consistently as you suppose.
Ok, you have deeper insight; it was just a suggestion.
OT: Maybe db4o [1] could be a fast and footprint-saving option for persisting the index tree to an adjustable depth.

> Better to use a regular random-access file handle;
> operating systems cache file segments in RAM when there is plenty available.
Hm, the threshold for "plenty available" seems quite low; otherwise, why is the RAM disk so much faster?

[1] http://en.wikipedia.org/wiki/Db4o
Comment 8 Jesse Glick 2011-10-29 01:46:40 UTC
(In reply to comment #7)
> it should not matter, if one wastes his RAM with RAM disk, or big heap.

It can matter, since the heap is managed by a garbage collector.

> less used portions of the
> cache could be silently written to the disk. I [doubt?] if any RAM disk could
> do this.

No, it would not. However opening a regular disk file _can_ work this way; an operating system will allocate some RAM to caching disk pages, dumping the caches when applications request more address space.

> why is RAM disk so much faster?

Because its contents are not expected to ever be saved to disk. You can set your cachedir to a RAM disk location if you want but if you have to reboot the computer everything will have to be recreated from scratch.
Comment 9 ulfzibis 2012-04-24 11:49:43 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > it should not matter, if one wastes his RAM with RAM disk, or big heap.
> 
> It can matter, since the heap is managed by a garbage collector.

I do not really understand what you mean by that. If you mean the GC's additional processing time ...
I believe a RAM disk must have a kind of garbage collector too, to reuse deleted blocks and to dynamically allocate and free memory.
I also believe the GC's processing time per heap size depends highly on the sizes of the objects it manages. In my current NB cache, the average file size is 20,000 bytes. So if NB's scan task saved its data in plain Java byte arrays, for a 200 MB cache the GC would have to manage 10,000 additional objects. Is that MUCH ?

> > why is RAM disk so much faster?
> 
> Because its contents are not expected to ever be saved to disk.
But saving 200 MB to disk should not take more than 5 seconds.
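The figures in this comment can be checked with back-of-envelope arithmetic; the 100 MB/s write throughput below is an assumed disk speed for illustration, not a measured one:

```java
// Back-of-envelope check of the comment's figures: object count for a
// 200 MB cache of average 20,000-byte entries, and the time to flush it
// at an assumed sequential write speed.
public class CacheObjectCount {
    static long objectCount(long cacheBytes, long avgEntryBytes) {
        return cacheBytes / avgEntryBytes;
    }

    static double secondsToWrite(long bytes, long bytesPerSecond) {
        return (double) bytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        System.out.println(objectCount(200_000_000L, 20_000L));        // prints 10000
        System.out.println(secondsToWrite(200_000_000L, 100_000_000L)); // prints 2.0
    }
}
```

So 10,000 extra byte[] objects for the GC, and roughly 2 seconds to flush 200 MB at an assumed ~100 MB/s, consistent with the "not more than 5 seconds" estimate.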
Comment 10 Jan Lahoda 2012-04-24 15:59:30 UTC
Let me split this into several parts:
-the delayed write: I am not sure that is feasible for a production environment. The IDE would need to handle situations like power loss before the caches have been fully synchronized to disk, etc. Also, the effect may not be as good as it would seem: the Java indexing is able to use free heap to significantly speed up the compilation (only for big enough projects/source roots, of course). Anything that interferes with this typically slows down the (Java) indexing significantly. But of course a production-ready patch, tested and with long-term maintenance, proving me wrong is very welcome.
-caching the whole cache content (even per source root) does not make much sense to me: pretty big parts of the cache are not typically used and keeping them in memory would simply be wasteful.
-the "Lucene index" parts of the cache are already cached in RAM since 7.1 or 7.0 on their first use (until a write is performed into the given index). SoftReferences and preset memory limit are used to manage the size of the cache. This made some operations in the IDE much faster.
-another part of the cache for which a memory cache might help are the classfiles that are required to attribute the file(s) opened in editor. How much that does or does not help requires an evaluation, of course. The remark on compilation of big source roots applies to refactoring on big source roots as well and this could also interfere with the cache reserved for the Lucene indexes, so some care is required. My wild guess (based on some previous experiments) is that on Linux the effect is going to be negligible, either in "good" or "bad" direction, at least in most cases. As this would interfere with standard Linux memory management, the effect may actually be pretty bad when the system is running out of the physical memory, but that is not that critical anyway. But might prove to be (possibly much) better on Windows.
Comment 11 ulfzibis 2012-04-24 19:15:10 UTC
(In reply to comment #6)
> Better to use a regular random-access file handle;
> operating systems cache file segments in RAM when there is plenty available.
(In reply to comment #10)
> My wild guess (based on some previous
> experiments) is that on Linux the effect is going to be negligible, either in
> "good" or "bad" direction, at least in most cases. As this would interfere with
> standard Linux memory management, the effect may actually be pretty bad when
> the system is running out of the physical memory, but that is not that critical
> anyway. But might prove to be (possibly much) better on Windows.

So the performance increase from using a RAM disk is only perceivable on Windows?

Did you try java.nio.MappedByteBuffer against a normal random-access file?
IIRC there is a JVM setting to define the maximum size of the mapped memory, so the overall result could be similar to a RAM disk, but automatic persisting to the physical disk would be included for free.
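A rough sketch of the MappedByteBuffer approach suggested here: the cache file is mapped into memory, so reads hit RAM-resident pages while the OS writes dirty pages back to disk on its own. The mapping size and record layout are illustrative:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative round trip through a memory-mapped cache file.
public class MappedCacheDemo {
    static String roundTrip(String text) throws IOException {
        Path cacheFile = Files.createTempFile("nbcache", ".bin");
        cacheFile.toFile().deleteOnExit();
        byte[] record = text.getBytes(StandardCharsets.UTF_8);

        try (FileChannel ch = FileChannel.open(cacheFile,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping beyond the current length extends the file to 4 KiB.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            map.put(record);   // write lands in the mapped page in RAM
            map.force();       // explicit flush; the OS would also write
                               // dirty pages back lazily on its own
            map.position(0);
            byte[] back = new byte[record.length];
            map.get(back);
            return new String(back, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("index entry"));
    }
}
```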
Comment 12 Vladimir Voskresensky 2012-05-02 07:18:08 UTC
(In reply to comment #11)
> (In reply to comment #6)
> > Better to use a regular random-access file handle;
> > operating systems cache file segments in RAM when there is plenty available.
> (In reply to comment #10)
> > My wild guess (based on some previous
> > experiments) is that on Linux the effect is going to be negligible, either in
> > "good" or "bad" direction, at least in most cases. As this would interfere with
> > standard Linux memory management, the effect may actually be pretty bad when
> > the system is running out of the physical memory, but that is not that critical
> > anyway. But might prove to be (possibly much) better on Windows.
> 
> So the performance increase by using a RAM disk is only perceivable on Windows?
> 
Based on my experience, I see a significant difference on Linux. The main precondition is to have plenty of RAM (I have 8 GB).
I use 
sudo mkdir -p /mount/ramdisk
sudo mount -t tmpfs  /dev/ram0 /mount/ramdisk
netbeans --cachedir /mount/ramdisk/nb-cache

I do not reboot the laptop, so the cache is not recreated each time.
What I see:
Go To Type/Symbol/File is several times faster (especially on the first uses after an IDE restart) compared to a Linux workstation with 8 GB RAM but the default placement of cachedir
Comment 13 Jan Lahoda 2012-05-02 08:31:32 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #6)
> > > Better to use a regular random-access file handle;
> > > operating systems cache file segments in RAM when there is plenty available.
> > (In reply to comment #10)
> > > My wild guess (based on some previous
> > > experiments) is that on Linux the effect is going to be negligible, either in
> > > "good" or "bad" direction, at least in most cases. As this would interfere with
> > > standard Linux memory management, the effect may actually be pretty bad when
> > > the system is running out of the physical memory, but that is not that critical
> > > anyway. But might prove to be (possibly much) better on Windows.
> > 
> > So the performance increase by using a RAM disk is only perceivable on Windows?
> > 
> Based on my experience I see significant difference on Linux. The main
> precondition is to have big RAM (I have 8Gb).
> I use 
> sudo mkdir -p /mount/ramdisk
> sudo mount -t tmpfs  /dev/ram0 /mount/ramdisk
> netbeans --cachedir /mount/ramdisk/nb-cache
> 
> I do not reboot laptop => cache is not recreated each time.
> What I see:
> Go To Type/Symbol/File is faster in times (especially the first usages after
> IDE restart) comparing to Linux workstation with 8Gb RAM but default placement
> of cachedir

The question is whether you also see significant improvements in cases other than the index-heavy features (like Go to Type/Symbol/File), because the Lucene indexes *are* already cached in memory. This is probably why you are seeing the improvement especially on first use. Two things are important to note, IMO:
-the Lucene in-memory cache itself uses a limited percentage of the heap, plus SoftReferences for the rest. I don't think this can be improved much, because if there is not enough heap space to keep all the indexes and other IDE data, we cannot crash with an OutOfMemoryError (or slow down other features significantly by not giving them a reasonable amount of memory).
-the indexes are loaded into memory on first use. While it would be conceptually possible to load them ahead of time, e.g. during scanning, I am not sure that is a reasonable thing to do in general: if the user's workflow does not include heavy work with the Lucene indexes, this would mean slowing down the scanning and wasting memory & GC cycles on something the user does not need. E.g., I rarely use Go to File; why should the IDE waste CPU & I/O cycles preloading it?