
Bug 103024 - Support splitting tokens to parser
Summary: Support splitting tokens to parser
Status: NEW
Alias: None
Product: obsolete
Classification: Unclassified
Component: languages
Version: 5.x
Hardware: All
OS: All
Importance: P3 blocker
Assignee: issues@obsolete
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-04-30 20:17 UTC by _ dcaoyuan
Modified: 2009-03-25 15:38 UTC
CC List: 0 users

See Also:
Issue Type: ENHANCEMENT
Exception Reporter:


Attachments

Description _ dcaoyuan 2007-04-30 20:17:05 UTC
Currently the engine reads a file as a String, scans it into tokens, and then
parses all of the tokens as a whole. For a big source file, the parsing
procedure therefore eats too much memory and may run out of heap space.

To support parsing big source files, the tokens could be split into segments
corresponding to the direct children of the top node 'S', and the parser could
then work on one token segment at a time instead of on the tokens as a whole.
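
A minimal sketch of the splitting step, assuming a designated delimiter token
type marks each top-level boundary; Token, splitIntoSegments, and the
delimiter convention are illustrative names here, not the existing engine API:

import java.util.ArrayList;
import java.util.List;

// Cut the full token list into top-level segments (the would-be direct
// children of 'S') so the parser can consume one segment at a time.
public class SegmentedTokens {

    record Token(String type, String text) {}

    static List<List<Token>> splitIntoSegments(List<Token> tokens,
                                               String delimiterType) {
        List<List<Token>> segments = new ArrayList<>();
        List<Token> current = new ArrayList<>();
        for (Token t : tokens) {
            current.add(t);
            if (t.type().equals(delimiterType)) {  // segment boundary
                segments.add(current);
                current = new ArrayList<>();       // old list becomes collectable
            }
        }
        if (!current.isEmpty()) {
            segments.add(current);                 // trailing partial segment
        }
        return segments;
    }
}

Each inner list can then be parsed and discarded before the next one is
touched, so peak memory is bounded by the largest segment rather than by the
whole file.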
Comment 1 Jan Jancura 2007-05-03 10:04:40 UTC
Reading the file as one string is a hack, and it should be fixed - you are
right. We should read the file token by token, roughly as sketched below.

But I am not sure if that is the only problem you would like to address, or if
you see more issues there.
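
A minimal sketch of token-by-token reading, assuming the scanner can pull
characters from a java.io.Reader on demand; the whitespace-based nextToken()
here is only a placeholder for the real lexer rules:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

// Pull characters lazily from the Reader, so memory use is proportional
// to one token, not to the whole file.
public class StreamingLexer {

    private final Reader in;
    private int lookahead;

    public StreamingLexer(Reader in) throws IOException {
        this.in = in;
        this.lookahead = in.read();
    }

    /** Returns the next whitespace-delimited chunk, or null at end of input. */
    public String nextToken() throws IOException {
        while (lookahead != -1 && Character.isWhitespace(lookahead)) {
            lookahead = in.read();
        }
        if (lookahead == -1) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        while (lookahead != -1 && !Character.isWhitespace(lookahead)) {
            sb.append((char) lookahead);
            lookahead = in.read();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = new BufferedReader(new FileReader(args[0]))) {
            StreamingLexer lexer = new StreamingLexer(r);
            for (String t; (t = lexer.nextToken()) != null; ) {
                System.out.println(t);  // hand each token to the parser here
            }
        }
    }
}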
Comment 2 _ dcaoyuan 2007-05-09 07:59:49 UTC
In the case of Erlang, the lexical scanning does not seem to be the memory
eater; all 'OutOfMemoryError' exceptions occurred during syntax parsing. The
final AST tree also does not seem to be the memory eater; the real memory
eaters are the Map/Array structures in LLSyntaxAnalyser.java.

I'm not sure if we can parse the syntax in one of the following ways (a sketch
follows this list):

1. Parse some tokens; whenever a top-level segment that is a direct child of
'S' has been completed, put it into the AST tree and release all intermediate
resources, then repeat this procedure.

2. Or, if the tokens can be split into several top-level segments that are
direct children of 'S' - for instance, in Erlang I can split them by searching
for the special '.' character (for JavaScript, I'm not sure whether something
similar could be done by matching "{" "}" pairs during lexical scanning) -
then I can split the tokens into a number of segments, run the syntax parsing
on one segment at a time, and unite the results afterwards.
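
A hedged sketch of option 2, under the Erlang assumption that every top-level
form ends with a '.' token; AstNode and parseOneSegment() are hypothetical
stand-ins, and in reality the per-segment work would run through
LLSyntaxAnalyser, whose working maps/arrays could then be freed after every
segment:

import java.util.ArrayList;
import java.util.List;

// Parse one top-level segment at a time and attach each subtree under a
// shared root, instead of feeding the whole token list to the parser.
public class PerSegmentParser {

    static class AstNode {
        final String name;
        final List<AstNode> children = new ArrayList<>();
        AstNode(String name) { this.name = name; }
    }

    static AstNode parseFile(List<String> tokens) {
        AstNode root = new AstNode("S");
        List<String> segment = new ArrayList<>();
        for (String token : tokens) {
            segment.add(token);
            if (token.equals(".")) {                      // end of one Erlang form
                root.children.add(parseOneSegment(segment));
                segment = new ArrayList<>();              // drop per-segment state
            }
        }
        if (!segment.isEmpty()) {
            root.children.add(parseOneSegment(segment));  // trailing tokens
        }
        return root;
    }

    // Placeholder: the real implementation would run the LL parser here.
    static AstNode parseOneSegment(List<String> tokens) {
        AstNode form = new AstNode("form");
        for (String t : tokens) {
            form.children.add(new AstNode(t));
        }
        return form;
    }
}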

Also, how about supporting not only a file InputStream but also a piece of
String as the lexer/parser input?
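
Both inputs could converge on a java.io.Reader internally; a small sketch of
such entry points (the Parser type itself is left hypothetical):

import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.Charset;

// Thin adapters so the lexer/parser only ever sees a Reader, whether the
// source is a file stream or an in-memory string (e.g. an editor buffer).
public final class ParserInput {

    private ParserInput() {}

    public static Reader from(String source) {
        return new StringReader(source);
    }

    public static Reader from(InputStream stream, Charset charset) {
        return new InputStreamReader(stream, charset);
    }
}

Usage would then look like parser.parse(ParserInput.from(fileStream,
StandardCharsets.UTF_8)) for files, or parser.parse(ParserInput.from(editorText))
for an in-memory string.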