The XML lexer uses a bitfield Integer as its state. Because some fields are shifted far to the left (e.g. by 24 bits), the resulting values fall outside the range that Integer.valueOf() caches, so in large documents a new Integer is allocated for each token.
If the bits were packed more compactly (e.g. by merging subState and prevState?), a table of all possible states could be pre-filled (e.g. 2500 Integers at most?).
See defect #223953: in a 15M document, the number of Integers is 1.9M, and their allocated size alone is about 23 MB.
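To illustrate the proposal, here is a minimal sketch of a pre-filled state table. It is not the actual lexer code: the class name, the field counts (50 x 50, chosen only to match the 2500 estimate) and the 24-bit shift in packWide() are assumptions made for the example.

```java
// Hypothetical sketch; names and bit layout are illustrative only,
// not taken from the NetBeans XML lexer sources.
final class LexerStateTable {
    // Assumed number of distinct values per field (50 * 50 = 2500 states).
    static final int STATE_COUNT = 50;
    static final int SUB_STATE_COUNT = 50;

    // One shared Integer per possible state, allocated once at class load.
    private static final Integer[] STATES =
            new Integer[STATE_COUNT * SUB_STATE_COUNT];

    static {
        for (int i = 0; i < STATES.length; i++) {
            STATES[i] = Integer.valueOf(i);
        }
    }

    // Wide layout similar to the one described above: subState shifted by
    // 24 bits, which puts most values outside the default Integer cache.
    static int packWide(int state, int subState) {
        return (subState << 24) | state;
    }

    // Compact layout: a dense index into the pre-filled table.
    static int packCompact(int state, int subState) {
        return state * SUB_STATE_COUNT + subState;
    }

    // Returns the shared instance; no allocation per token.
    static Integer stateObject(int state, int subState) {
        return STATES[packCompact(state, subState)];
    }
}
```

With this layout, repeated calls such as LexerStateTable.stateObject(3, 1) return the same Integer instance, whereas Integer.valueOf(LexerStateTable.packWide(3, 1)) is only guaranteed to be cached for values between -128 and 127.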
Implemented in http://hg.netbeans.org/jet-main/rev/4871fcd8e4d9
Integrated into 'main-golden', will be available in build *201305021042* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
User: Svata Dedic <email@example.com>
Log: #225628: Lexer states reduced so they fit into byte