268094 – Excessive IO coming from TrieDictionary (causes 7-10 secs delay in initial project load)



Status:	NEW

Alias:	None

Product:	editor
Classification:	Unclassified
Component:	Spellchecker
Version:	Dev
Hardware:	PC Windows 7

Importance:	P3 normal
Assignee:	Milutin Kristofic


Reported:	2016-09-19 10:45 UTC by NukemBy
Modified:	2016-09-22 20:19 UTC
CC List:	0 users

Issue Type:	DEFECT

Attachments
nb-spellshecker-io.jpg (173.12 KB, image/jpeg) 2016-09-19 10:45 UTC, NukemBy

Description NukemBy 2016-09-19 10:45:31 UTC

Created attachment 162108
nb-spellshecker-io.jpg

During initial startup, while NetBeans loads a project, there is a stage when the message "Building dictionaries" appears in the status bar. In my environment it lasts around 7-10 seconds, blocks the subsequent background scan, and delays the point at which NetBeans becomes usable.

The self-profiler shows that 90% of that time goes into file IO performed via RandomAccessFile (see attached screenshot). In the micro-benchmark below, MappedByteBuffer is roughly 350 times faster for the same set of writes, so I recommend switching to it. Details are below:

    randomAccessFile: 7535ms
    mappedByteBuffer: 21ms

Both methods generate binary-identical files, but in significantly different time.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

import org.junit.Test; // assumption: JUnit 4 is on the classpath

public class FileRandomIOTest {

    public static int N_REPEAT = 1000000;
    public static int N_FILESIZE = N_REPEAT;
    public static int test_pos[] = new int[N_REPEAT];
    public static int test_data[] = new int[N_REPEAT];
    
    // Pre-compute random offsets and values so both tests perform the same writes.
    static {
        for( int i = 0; i < N_REPEAT; i++ ) {
            test_pos[i] = (int)(Math.random() * (N_REPEAT - 8));
            test_data[i] = (int)(Math.random() * Integer.MAX_VALUE);
        }
    }
    
    // One seek() plus one writeInt() per value: every write goes through the OS.
    @Test
    public void randomAccessFile() throws Exception {
        long started = System.currentTimeMillis();
        File tmpFile = File.createTempFile("randomAccessFile", ".tmp");
        
        try( RandomAccessFile out = new RandomAccessFile(tmpFile, "rw") ) {
            out.setLength(N_FILESIZE);
            for( int i = 0; i < N_REPEAT; i++ ) {
                out.seek(test_pos[i]);
                out.writeInt(test_data[i]);
            }
        }
        
        long finished = System.currentTimeMillis();
        
        System.out.println("randomAccessFile: " + (finished - started) + "ms");
    }
    
    // Same writes through a memory-mapped buffer: absolute puts go to mapped memory.
    @Test
    public void mappedByteBuffer() throws Exception {
        long started = System.currentTimeMillis();
        File tmpFile = File.createTempFile("mappedByteBuffer", ".tmp");
        
        // Close the file and channel when done; the original snippet leaked them.
        try( RandomAccessFile file = new RandomAccessFile(tmpFile, "rw");
             FileChannel channel = file.getChannel() ) {
            MappedByteBuffer out = channel.map(FileChannel.MapMode.READ_WRITE, 0, N_FILESIZE);
            
            for( int i = 0; i < N_REPEAT; i++ ) {
                out.putInt(test_pos[i], test_data[i]);
            }
            
            out.force(); // flush the mapped pages to disk
        }
        
        long finished = System.currentTimeMillis();
        
        System.out.println("mappedByteBuffer: " + (finished - started) + "ms");
    }
}
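
The claim that both approaches produce binary-identical files rests on RandomAccessFile.writeInt and a default ByteBuffer both using big-endian byte order. A minimal sketch that checks this on a tiny input is shown here; the class name ByteOrderCheck and its helper methods are made up for illustration and are not part of NetBeans.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.util.Arrays;

// Hypothetical helper, not NetBeans code: compares the bytes produced by
// positional RandomAccessFile writes with absolute ByteBuffer puts.
class ByteOrderCheck {

    /** Writes data[i] at offset pos[i] via RandomAccessFile and returns the file content. */
    static byte[] viaRandomAccessFile(int size, int[] pos, int[] data) throws IOException {
        File tmp = File.createTempFile("byteOrderCheck", ".tmp");
        try {
            try (RandomAccessFile out = new RandomAccessFile(tmp, "rw")) {
                out.setLength(size);
                for (int i = 0; i < pos.length; i++) {
                    out.seek(pos[i]);
                    out.writeInt(data[i]); // writeInt is always big-endian
                }
            }
            return Files.readAllBytes(tmp.toPath());
        } finally {
            tmp.delete();
        }
    }

    /** Performs the same writes in the same order on a heap ByteBuffer. */
    static byte[] viaByteBuffer(int size, int[] pos, int[] data) {
        ByteBuffer buf = ByteBuffer.allocate(size); // big-endian by default
        for (int i = 0; i < pos.length; i++) {
            buf.putInt(pos[i], data[i]);
        }
        return buf.array();
    }

    public static void main(String[] args) throws IOException {
        // Overlapping offsets that together cover all 16 bytes of the output.
        int[] pos = {0, 6, 2, 10, 12};
        int[] data = {0x01020304, -1, 0x7FFFFFFF, 42, 0x0A0B0C0D};
        byte[] a = viaRandomAccessFile(16, pos, data);
        byte[] b = viaByteBuffer(16, pos, data);
        System.out.println("identical: " + Arrays.equals(a, b));
    }
}
```

Running main should print "identical: true". If the buffer's byte order were switched to little-endian, the outputs would differ, so any replacement for the RandomAccessFile code has to keep the default order.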



Current implementation of file IO in org.netbeans.modules.spellchecker.TrieDictionary
(can easily be adapted to use MappedByteBuffer):
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

    private static class ByteArray {

        private final RandomAccessFile out;

        public ByteArray(File out) throws FileNotFoundException {
            this.out = new RandomAccessFile(out, "rw");
        }

        public void put(int pos, char what) throws IOException {
            out.seek(pos);
            out.writeChar(what);
        }

        public void put(int pos, byte what) throws IOException {
            out.seek(pos);
            out.writeByte(what);
        }

        public void put(int pos, int what) throws IOException {
            out.seek(pos);
            out.writeInt(what);
        }

        public void close() throws IOException {
            out.close();
        }
    }
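
As a rough illustration of the adaptation suggested above, the same three put methods could be backed by a MappedByteBuffer. This is only a sketch under assumptions: the class name MappedByteArray and the capacity parameter are hypothetical, and the real TrieDictionary code would need a way to know an upper bound for the file size before writing.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical variant of ByteArray backed by a memory-mapped file.
class MappedByteArray {

    private final MappedByteBuffer buf;

    /** capacity is an assumed upper bound; mapping extends the file to that size. */
    MappedByteArray(File out, int capacity) throws IOException {
        this.buf = map(out, capacity);
    }

    private static MappedByteBuffer map(File out, int capacity) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(out, "rw");
             FileChannel channel = raf.getChannel()) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
        }
    }

    // Absolute puts replace the seek() + write() pairs of the original class.
    void put(int pos, char what) {
        buf.putChar(pos, what);
    }

    void put(int pos, byte what) {
        buf.put(pos, what);
    }

    void put(int pos, int what) {
        buf.putInt(pos, what);
    }

    /** Flushes the mapped pages to disk; the mapping itself cannot be released explicitly. */
    void close() {
        buf.force();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("mappedByteArray", ".tmp");
        MappedByteArray array = new MappedByteArray(tmp, 8);
        array.put(0, 0x01020304);
        array.put(4, 'A');
        array.put(6, (byte) 7);
        array.close();
        System.out.println("file length: " + tmp.length());
    }
}
```

The file always ends up exactly capacity bytes long, because a mapping cannot be shrunk afterwards; a caller that needs the exact size would have to truncate the file in a separate step once the buffer is no longer mapped.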

Comment 1 NukemBy 2016-09-19 19:40:48 UTC

The MappedByteBuffer mentioned above (a memory-mapped file) has side effects in Java: there is no way to explicitly unmap the buffer, so the target file on disk cannot be reliably closed or resized while the mapping is alive.

Since the generated file is rather small (only 4 MB on my machine), it is sufficient to build it in memory and then dump it to disk. Below is the adapted implementation, which works for me:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

    private static class ByteArray {

        private final File outFile;
        private ByteBuffer buf = null;
        private int size = 0;

        public ByteArray(File out) throws FileNotFoundException, IOException {
            this.outFile = out; 
            this.buf = ByteBuffer.allocate(65 * 1024 * 1024);
        }

        private int markUsed(int pos, int itemSize) {
            size = Math.max(size, pos + itemSize);
            return size;
        }
        
        public void put(int pos, char what) throws IOException {
            buf.putChar(pos, what);
            markUsed(pos, Character.SIZE / 8);
        }

        public void put(int pos, byte what) throws IOException {
            buf.put(pos, what);
            markUsed(pos, Byte.SIZE / 8);
        }

        public void put(int pos, int what) throws IOException {
            buf.putInt(pos, what);
            markUsed(pos, Integer.SIZE / 8);
        }

        public void close() throws IOException {
            buf.limit(size);
            try( FileOutputStream fos = new FileOutputStream(outFile) ) {
                try( FileChannel fc = fos.getChannel() ) {
                    fc.write(buf);
                }
            }
        }
    }

Comment 2 Milutin Kristofic 2016-09-21 12:45:56 UTC

I will look at this for the next version, but it is not a top priority right now, since it is not a regression.

Comment 3 Jan Lahoda 2016-09-21 12:55:55 UTC

Please note that the writing was originally done in memory, but this was causing an OOME in some cases:
https://netbeans.org/bugzilla/show_bug.cgi?id=191287

Comment 4 NukemBy 2016-09-22 20:19:35 UTC

I'm not very certain of my conclusions, but it seems to me that the above-mentioned issue was a wrong fix for a non-existent problem. "Wrong" because the performance degradation after the change is far more significant than the memory saved. "Non-existent" because I suspect that the main reason for the OOM was simply too small an Xmx. The memory dump attached there is 55 MB, roughly the allocated heap size for that application, which suggests a rather low Xmx (plus, the dump does not contain anything pointing to TrieDictionary). Most probably the failure occurred during serialization of TrieDictionary only because it happened to be the last thing doing work at that moment. It would have happened to anything else that tried to allocate another, let's say, 500K of memory.

The dictionary currently used in NetBeans is roughly 40 times larger than the file attached to the mentioned issue, and in serialized form it takes 3.7 MB. That is nothing on modern computers.

So the quickest and most cost-efficient solution "for today" would be to revert the changes made in #191287. My proposals to use various flavors of ByteBuffer actually require pre-allocating more memory than the original implementation did, so they are less efficient in that respect.

The "ideal" fix for this issue, in terms of optimal performance, is to rework the serialization algorithm to write through a sequential buffered DataOutputStream. I am not sure it is possible and, most probably, the gain would not justify the effort spent on the rework. I would not go that way unless the OOM reoccurs with a reasonable amount of memory.
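
For reference, the sequential approach mentioned in the last paragraph could look roughly like the sketch below. It does not reflect the actual TrieDictionary file layout, which is not shown in this report: the class name SequentialWriter, the entry-count header, and the flat int array are all assumptions made purely for illustration. The approach only works if every value is known by the time its position in the file is reached, which is exactly the open question raised above.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical sketch: strictly sequential, buffered output with no seeks.
class SequentialWriter {

    /** Writes an assumed layout: an entry count followed by the entries in order. */
    static void write(File out, int[] values) throws IOException {
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(out)))) {
            dos.writeInt(values.length); // hypothetical header
            for (int value : values) {
                dos.writeInt(value);     // big-endian, same as RandomAccessFile.writeInt
            }
        } // closing the stream flushes the buffer to disk
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("sequentialWriter", ".tmp");
        write(tmp, new int[] {10, 20, 30});
        System.out.println("bytes written: " + tmp.length());
        tmp.delete();
    }
}
```

Memory use is bounded by the stream's internal buffer rather than by the file size, which is why this variant would avoid both the per-write seek cost and the pre-allocation of a large buffer.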