
Bug 87164 - Provide unit testing framework for lexer
Summary: Provide unit testing framework for lexer
Status: NEW
Alias: None
Product: editor
Classification: Unclassified
Component: Lexer
Version: 6.x
Hardware: All
OS: All
Importance: P2 blocker
Assignee: Miloslav Metelka
URL:
Keywords: TEST
Depends on:
Blocks:
 
Reported: 2006-10-14 12:46 UTC by Jesse Glick
Modified: 2006-12-15 14:45 UTC
CC List: 3 users

See Also:
Issue Type: ENHANCEMENT
Exception Reporter:


Attachments

Description Jesse Glick 2006-10-14 12:46:45 UTC
A lexer is quite a self-contained piece of code, and it ought to be simple to
unit test. It is probably just as easy, or even easier, to write a proper test
as it is to verify the behavior manually once in a running IDE. To make this as
easy as possible, there should be a test framework in the Lexer API which
particular language lexers can reuse. For example,

assertLex(HTMLLanguage.description(), /*some config maybe*/...,
"[BLOCK_COMMENT]<!-- ... -->[WS]\n" +
"[TAG_OPEN]<[TAG_OPEN_SYMBOL]hello[TAG_OPEN]>" +
"[TEXT]kitty[TAG_CLOSE]</[TAG_CLOSE_SYMBOL]hello[TAG_CLOSE]>");

The assertLex method would strip out any recognized token IDs inside square
brackets, parse the remaining text, and verify that its tokenization follows the
specified sequence. (Of course you could extend this to check information about
embedded languages, etc.) The idea is to make the unit tests as short, readable,
and intuitive as possible.
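
A minimal sketch of what such an assertLex helper could look like, assuming
JUnit plus the public Lexer API classes TokenHierarchy, TokenSequence and
Token; the LexAssert class and its method shape are purely illustrative, not
part of any existing framework:

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.netbeans.api.lexer.Language;
import org.netbeans.api.lexer.Token;
import org.netbeans.api.lexer.TokenHierarchy;
import org.netbeans.api.lexer.TokenSequence;

final class LexAssert {

    // Matches the "[TOKEN_ID]" markers in the marked-up input.
    private static final Pattern MARKER = Pattern.compile("\\[([A-Z_]+)\\]");

    // Strips the [ID] markers, lexes the remaining text with the given
    // language and checks that the produced tokens have the expected ids
    // and texts, in order.
    static void assertLex(Language<?> language, String markedUp) {
        List<String> expectedIds = new ArrayList<String>();
        List<String> expectedTexts = new ArrayList<String>();
        StringBuilder plain = new StringBuilder();
        Matcher m = MARKER.matcher(markedUp);
        int lastEnd = -1;
        while (m.find()) {
            if (lastEnd >= 0) {
                String text = markedUp.substring(lastEnd, m.start());
                expectedTexts.add(text);
                plain.append(text);
            }
            expectedIds.add(m.group(1));
            lastEnd = m.end();
        }
        if (lastEnd >= 0) {
            String text = markedUp.substring(lastEnd);
            expectedTexts.add(text);
            plain.append(text);
        }

        TokenSequence<?> ts = TokenHierarchy.create(plain, language).tokenSequence();
        for (int i = 0; i < expectedIds.size(); i++) {
            assertTrue("Fewer tokens than expected", ts.moveNext());
            Token<?> token = ts.token();
            assertEquals(expectedIds.get(i), token.id().name());
            assertEquals(expectedTexts.get(i), token.text().toString());
        }
        assertFalse("More tokens than expected", ts.moveNext());
    }
}
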
Comment 1 Miloslav Metelka 2006-10-16 16:33:20 UTC
Currently there is only support for randomized testing in
org.netbeans.lib.lexer.test.TestRandomModify, which allows specifying the
probability for particular chars and strings, but it only compares the
incrementally updated token list with a batch-lexed one. It does not check
the proper identity of the individual tokens.
It's true that with the declarative approach, tests like JavaLexerBatchTest
could become more terse and could be written more quickly, so I like this idea.
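
For illustration, the batch-versus-incremental consistency check described
above, extended with the per-token id and text comparison that is noted as
missing, might look roughly like this (a sketch only, not the actual
TestRandomModify code; it assumes the document already carries its Language,
e.g. via the Language.class document property):

import static org.junit.Assert.assertEquals;

import javax.swing.text.Document;

import org.netbeans.api.lexer.Language;
import org.netbeans.api.lexer.TokenHierarchy;
import org.netbeans.api.lexer.TokenSequence;

final class IncrementalCheck {

    // Compares the incrementally maintained token list of the (randomly
    // modified) document with a freshly batch-lexed token list over the
    // same text, token by token. Callers are assumed to hold the document
    // read lock.
    static void assertConsistent(Document doc, Language<?> language) throws Exception {
        TokenSequence<?> incremental = TokenHierarchy.get(doc).tokenSequence();
        String text = doc.getText(0, doc.getLength());
        TokenSequence<?> batch = TokenHierarchy.create(text, language).tokenSequence();

        while (true) {
            boolean batchHasNext = batch.moveNext();
            boolean incHasNext = incremental.moveNext();
            assertEquals("Token counts differ", batchHasNext, incHasNext);
            if (!batchHasNext) {
                break;
            }
            assertEquals(batch.token().id(), incremental.token().id());
            assertEquals(batch.token().text().toString(),
                    incremental.token().text().toString());
        }
    }
}
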
Comment 2 Miloslav Metelka 2006-12-15 14:45:20 UTC
Some time ago I wrote LexerTestUtilities.checkTokenDump(), which takes an input
file as a parameter and produces a text output describing the tokens created
from the input (please see the javadoc of the method). If no such file exists
yet, the output is written to a file in the same directory with "tokens.txt"
appended, and the test fails, notifying the user that the file was created (the
user should check whether the produced tokens match expectations). If the
output file already exists, its content is compared against the produced output
and the test fails if the contents are not exactly the same.
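
The golden-file flow described above could be sketched roughly as follows (a
simplified illustration only; the real LexerTestUtilities.checkTokenDump() has
more features, and the helper name and signature below are made up):

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;

import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class TokenDumpCheck {

    // Compares the produced token dump against the golden file next to the
    // input; on the first run the golden file is created and the test fails
    // so that its content gets reviewed.
    static void checkAgainstGoldenFile(File inputFile, String producedDump) throws Exception {
        File golden = new File(inputFile.getParentFile(),
                inputFile.getName() + ".tokens.txt");
        if (!golden.exists()) {
            Files.write(golden.toPath(), producedDump.getBytes(StandardCharsets.UTF_8));
            fail("Created " + golden + "; please check that the produced tokens"
                    + " match expectations and re-run the test.");
        }
        String expected = new String(Files.readAllBytes(golden.toPath()),
                StandardCharsets.UTF_8);
        assertEquals(expected, producedDump);
    }
}
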
While coding it I added some features such as multiple inputs (virtual EOFs),
test naming and special chars. I use extra lines for the control sequences and
interleave the directives with dots to distinguish them from regular input. The
input specification could be rewritten in XML if desirable.
Please see, in lexer: TokenDumpTest and TokenDumpTestFile.txt (for the control
directives), and in java/lexer: JavaTokenDumpTest, testInput.java.txt and
testInput.java.txt.tokens.txt, for an example of using it to test java lexer
correctness.
If this suffices I will close this issue.