Summary: | FileEncodingQuery.ProxyCharset.decode() returns an empty buffer for input of size 4 kB or less | ||
---|---|---|---|
Product: | projects | Reporter: | Marian Petras <mpetras> |
Component: | Generic Infrastructure | Assignee: | Tomas Zezula <tzezula> |
Status: | VERIFIED WONTFIX | ||
Severity: | blocker | CC: | mmirilovic |
Priority: | P3 | Keywords: | JDK_SPECIFIC |
Version: | 6.x | ||
Hardware: | All | ||
OS: | All | ||
Issue Type: | DEFECT | Exception Reporter: |
Description
Marian Petras
2007-05-03 05:38:31 UTC
This bug is the cause of P1 bug #103067 ("Find/Replace in projects removed content of all not saved classes"), which is why I set the priority to P1.

For the immediate cause, look at the source code of method java.nio.charset.CharsetDecoder.decode(ByteBuffer), at the for(;;) loop.

In JDK 1.5.0_11, the critical part is:

    for (;;) {
        CoderResult cr;
        if (in.hasRemaining())
            cr = decode(in, out, true);
        else
            cr = flush(out);
        if (cr.isUnderflow())
            break;
        ...
    }

This means that as soon as decodeLoop(...) returns CoderResult.UNDERFLOW, the subsequent check cr.isUnderflow() succeeds, the loop is left (break;) and the flush(...) method is never called.

In JDK 1.6.0_01, the critical part is different:

    for (;;) {
        CoderResult cr = in.hasRemaining()
                ? decode(in, out, true)
                : CoderResult.UNDERFLOW;
        if (cr.isUnderflow())
            cr = flush(out);
        if (cr.isUnderflow())
            break;
        ...
    }

If decodeLoop(...) returns CoderResult.UNDERFLOW, the condition cr.isUnderflow() is met and the flush(...) method is called. Only then is the loop left (break;).

A workaround for the unexpected behaviour in JDK 1.5.x is possible: in method decodeLoop(...), before returning CoderResult.UNDERFLOW, decode all buffered bytes and send the resulting chars to the output buffer.

I did not find an exactly matching JDK bug, but bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6221056 seems to be close to it.

The client of Charset[Encoder|Decoder] has to flush the encoder|decoder at the end of encoding|decoding by calling flush. You can easily verify this by running the FileEncodingQueryTest from project/queries on JDK 1.5; it also tests blocks smaller than 4 KB. For even more details see sun.nio.cs.[StreamDecoder|StreamEncoder].
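To make the role of flush(...) concrete, here is a minimal sketch of a decoder that, like the one discussed in this report, only buffers input in decodeLoop(...) and produces output from implFlush(...). It is an illustration under stated assumptions, not the FileEncodingQuery code: the names FlushDemo, BufferingDecoder, decodeWhole and decodeManually are hypothetical, and the decoder simply delegates to a fixed charset on flush.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class FlushDemo {

    /** Hypothetical decoder: decodeLoop() only buffers, implFlush() does the real work. */
    static final class BufferingDecoder extends CharsetDecoder {
        private final Charset delegate;
        private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
        private CharBuffer decoded; // chars produced on flush but not yet written out

        BufferingDecoder(Charset delegate) {
            super(delegate, 1.0f, 1.0f);
            this.delegate = delegate;
        }

        @Override
        protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
            // Consume all input without emitting anything yet.
            while (in.hasRemaining()) {
                pending.write(in.get());
            }
            return CoderResult.UNDERFLOW;
        }

        @Override
        protected CoderResult implFlush(CharBuffer out) {
            if (decoded == null) {
                decoded = delegate.decode(ByteBuffer.wrap(pending.toByteArray()));
            }
            while (decoded.hasRemaining()) {
                if (!out.hasRemaining()) {
                    return CoderResult.OVERFLOW; // caller retries with more room
                }
                out.put(decoded.get());
            }
            return CoderResult.UNDERFLOW;
        }

        @Override
        protected void implReset() {
            pending.reset();
            decoded = null;
        }
    }

    /** One-argument decode: the JDK 6+ loop quoted above calls flush() internally. */
    static String decodeWhole(byte[] bytes) {
        try {
            return new BufferingDecoder(StandardCharsets.UTF_8)
                    .decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Three-argument decode: output appears only if the client calls flush(). */
    static String decodeManually(byte[] bytes, boolean callFlush) {
        CharsetDecoder dec = new BufferingDecoder(StandardCharsets.UTF_8);
        CharBuffer out = CharBuffer.allocate(bytes.length + 1);
        dec.decode(ByteBuffer.wrap(bytes), out, true);
        if (callFlush) {
            dec.flush(out);
        }
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println("[" + decodeManually(bytes, false) + "]"); // nothing emitted without flush
        System.out.println("[" + decodeManually(bytes, true) + "]");
        System.out.println("[" + decodeWhole(bytes) + "]");
    }
}
```

Skipping flush in decodeManually mimics what the JDK 1.5 loop effectively did for such a decoder: all input is consumed, but nothing reaches the output buffer.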
What I use in my code is method Charset.decode(ByteBuffer), which is a shortcut for

    charset.newDecoder()
           .onMalformedInput(CodingErrorAction.REPLACE)
           .onUnmappableCharacter(CodingErrorAction.REPLACE)
           .decode(bb);

(see http://java.sun.com/j2se/1.5.0/docs/api/java/nio/charset/Charset.html#decode%28java.nio.ByteBuffer%29)

The Javadoc for method CharsetDecoder.decode(ByteBuffer) states that "This method implements an entire decoding operation; that is, it resets this decoder, then it decodes the bytes in the given byte buffer, and finally it flushes this decoder." (see http://java.sun.com/j2se/1.5.0/docs/api/java/nio/charset/CharsetDecoder.html#decode%28java.nio.ByteBuffer%29)

I have not studied the source code of sun.nio.cs.StreamDecoder, but I have studied this bug report against it: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4744247 ("StreamDecoder.CharsetSD.read does not invoke CharsetDecoder.flush")

The issue against StreamDecoder does not matter. It seems rather to be a bug in the 1.5 StreamEncoder, but there is a simple workaround: use the three-parameter version of encode (decode). Here is an algorithm:

    Encoder enc;
    while (haveSomethingToEncode) {
        enc.encode(in, out, false);
    }
    enc.encode(emptyIn, out, true);
    enc.flush(out);

I am not even sure whether this problem should be worked around in our implementation of Charset.

The workaround mentioned by Marian does not work. The CharsetEncoder needs to maintain internal state, since it does not know in advance which CharsetEncoder it should delegate to; that is decided either when flush is called on it or when the input exceeds 4 KB. There is no way to work around the JDK 1.5 issue on the FEQ side, since it cannot find out whether more data will come. Anyway, I don't understand why you report this against NetBeans rather than the JDK. The client of CharsetDecoder can use the three-parameter version of decode, as I explained above, to work around this problem.

The workaround I described was the one I use in my custom decoder in the Properties module.
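The three-parameter algorithm suggested earlier in this thread can be sketched for the decoding side as runnable Java. This is only one possible shape of the client-side workaround, not code from NetBeans: the class name ThreeArgDecode, the decode and step helpers and the chunkSize parameter are hypothetical, and the chunking is simulated by widening the limit of a single ByteBuffer.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

final class ThreeArgDecode {

    /** Decodes data in chunks, then signals end of input and flushes explicitly. */
    static String decode(Charset cs, byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        CharsetDecoder dec = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        StringBuilder sb = new StringBuilder();
        CharBuffer out = CharBuffer.allocate(1024);
        ByteBuffer in = ByteBuffer.wrap(data);
        in.limit(0);

        // while (haveSomethingToDecode) dec.decode(in, out, false);
        while (in.limit() < data.length) {
            // Expose the next chunk; undecoded bytes of a split sequence stay in the buffer.
            in.limit(Math.min(data.length, in.limit() + chunkSize));
            step(dec, in, out, sb, false);
        }
        // dec.decode(emptyIn, out, true);
        step(dec, in, out, sb, true);
        // dec.flush(out); repeated in case the output buffer fills up
        for (;;) {
            CoderResult cr = dec.flush(out);
            drain(out, sb);
            if (cr.isUnderflow()) {
                break;
            }
        }
        return sb.toString();
    }

    /** Calls decode until the decoder reports UNDERFLOW, emptying the output on OVERFLOW. */
    private static void step(CharsetDecoder dec, ByteBuffer in, CharBuffer out,
                             StringBuilder sb, boolean endOfInput) {
        for (;;) {
            CoderResult cr = dec.decode(in, out, endOfInput);
            drain(out, sb);
            if (cr.isUnderflow()) {
                return;
            }
            if (!cr.isOverflow()) {
                // Not expected with REPLACE actions; fail loudly rather than loop forever.
                throw new IllegalStateException(cr.toString());
            }
        }
    }

    /** Moves everything written to out into sb and makes out writable again. */
    private static void drain(CharBuffer out, StringBuilder sb) {
        out.flip();
        sb.append(out);
        out.clear();
    }
}
```

Because flush is called explicitly by the client, this pattern does not depend on whether the one-argument decode(ByteBuffer) of the running JDK flushes the decoder.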
I did not know the details of your implementation, so I did not know that it could not be used. I know it is caused by a bug in the JDK, but I thought it would be better if you made the workaround on the FEQ side than if your clients each had to make their own workarounds. Now that I understand that a workaround on your side is not possible, I will use a workaround on my side, i.e. I will use the three-argument decode method as you have suggested. Thanks.

Integrated into 'main-golden', will be available in build *201204250400* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main-golden/rev/358f7f0a41c5
User: Jesse Glick <jglick@netbeans.org>
Log: decodeByteBuffer should no longer be needed for #103067/#103193 fixes since JDK 6.