86473 – Create language embedding through Lexer API

This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Status:	RESOLVED FIXED

Alias:	None

Product:	editor
Classification:	Unclassified
Component:	Lexer
Version:	6.x
Hardware:	All All

Importance:	P2 blocker
Assignee:	apireviews

URL:
Keywords:	API_REVIEW_FAST

Depends on:
Blocks:	89324

Reported:	2006-10-04 15:16 UTC by Miloslav Metelka
Modified:	2006-12-04 19:39 UTC
CC List:	2 users

See Also:
Issue Type:	ENHANCEMENT
Exception Reporter:

Attachments
Diff of the change (459.15 KB, patch) 2006-11-28 14:42 UTC, Miloslav Metelka
List of committed files (20.58 KB, text/plain) 2006-12-04 15:51 UTC, Miloslav Metelka

Description Miloslav Metelka 2006-10-04 15:16:19 UTC

There are use cases that require API clients to create language embeddings
dynamically:
1) A Java string literal may contain, e.g., the text of a SQL statement. Because
there is already a default embedding that recognizes escaped characters such as
"\n", a single token must be able to carry more than one embedding (the default
embedding should remain available, as there may be clients relying on its presence).

2) A <script> HTML tag does not need to specify the type of the scripting
language, and the default language may be overridden by

<META http-equiv="Content-Script-Type" content="type">

Although the lexer could in theory recognize such a declaration and store the
content type in the state object of each token that follows the declaration, this
is impractical because recognizing the declaration above is a task that should be
reserved for an HTML parser.

3) A script that follows the <script> tag may be written as a comment surrounded by
 <!--
// -->
and there may be no default embedding for the comment, so the parser must
explicitly request creation of an embedding for the comment token.
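
Use case 2 leaves the choice of the script language to the HTML parser rather
than the lexer. A minimal sketch of that decision, under stated assumptions:
ScriptLanguageResolver, resolve() and the "text/javascript" fallback are
hypothetical names and values chosen for illustration, not part of the Lexer API.

```java
import java.util.Locale;

/** Hypothetical parser-side helper; illustrative only, not part of the Lexer API. */
final class ScriptLanguageResolver {

    /** Assumed fallback when neither the tag nor a META declaration names a type. */
    static final String FALLBACK_TYPE = "text/javascript";

    private ScriptLanguageResolver() {
    }

    /**
     * Picks the content type of a script block.
     *
     * @param scriptTypeAttr value of the type attribute on the script tag, or null
     * @param metaContentScriptType value declared by the Content-Script-Type META
     *        element, or null if the document has none
     */
    static String resolve(String scriptTypeAttr, String metaContentScriptType) {
        if (isPresent(scriptTypeAttr)) {
            return normalize(scriptTypeAttr); // the tag itself wins
        }
        if (isPresent(metaContentScriptType)) {
            return normalize(metaContentScriptType); // document-wide override
        }
        return FALLBACK_TYPE;
    }

    private static boolean isPresent(String value) {
        return value != null && !value.trim().isEmpty();
    }

    private static String normalize(String value) {
        return value.trim().toLowerCase(Locale.ROOT);
    }
}
```

The parser would then map the resolved content type to a language and request
the custom embedding for the script token explicitly.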


Requirements:
1) An API method must be added to TokenSequence for custom embedding creation.

2) The notification model must be extended so that clients (e.g. syntax coloring)
can notice the creation of a new embedding.

3) Clients must also listen for the case when a new token eligible for custom
embedding gets created. Also, if a token with a custom embedding is damaged by
the user's typing, the custom embedding is lost, so the clients must
recreate it.

4) If more than one embedding exists for a single token, one of the embeddings
must be chosen for syntax coloring purposes. As there is not yet a use case with
more than one custom embedding, the solution can be that the syntax coloring
uses the custom embedding if one exists and the default embedding otherwise.
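
Requirements 3 and 4 can be illustrated with a small model. This is only a
sketch under assumptions: TokenEmbeddings and all of its methods are hypothetical
and languages are represented as plain strings; the real Lexer API types are not
used here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of the embeddings attached to one token; not the Lexer API. */
final class TokenEmbeddings {

    private final String defaultLanguage; // null if the token has no default embedding
    private final List<String> customLanguages = new ArrayList<>();

    TokenEmbeddings(String defaultLanguage) {
        this.defaultLanguage = defaultLanguage;
    }

    /** A client (e.g. a parser) adds a custom embedding on top of the default one. */
    void addCustom(String language) {
        customLanguages.add(language);
    }

    /** Requirement 3: a token recreated after an edit keeps only its default embedding. */
    TokenEmbeddings relexed() {
        return new TokenEmbeddings(defaultLanguage);
    }

    /** Requirement 4: coloring prefers the custom embedding, else the default one. */
    String languageForColoring() {
        return customLanguages.isEmpty() ? defaultLanguage : customLanguages.get(0);
    }

    /** All embeddings stay available to clients that rely on the default one. */
    List<String> all() {
        List<String> result = new ArrayList<>();
        if (defaultLanguage != null) {
            result.add(defaultLanguage);
        }
        result.addAll(customLanguages);
        return Collections.unmodifiableList(result);
    }
}
```

In this model a client that wants the custom embedding back after the token was
damaged has to call addCustom() again on the recreated token, which mirrors the
obligation described in requirement 3.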

Comment 1 Miloslav Metelka 2006-11-28 14:41:47 UTC

The attached diff contains the implementation of this request. It makes the
following changes:
1. Extracted the TokenHierarchyEvent.Type inner enum into a top-level
TokenHierarchyEventType enum for better readability.

2. Added the TokenSequence.createEmbedding() method for creation of a
custom embedding. A new TokenHierarchyEventType.EMBEDDING value is fired after
the custom embedding creation.

3. The affected offset area information, affectedStartOffset() and
affectedEndOffset(), was moved from TokenChange to TokenHierarchyEvent because
that is more useful and clearer for the clients of these methods; e.g. the syntax
coloring will just query these offsets without digging into the (possibly
embedded) token change(s).

4. Removed the tokenComplete parameter from LanguageHierarchy.embedding() because
it is currently unused and token incompleteness will be handled in a different
way in the future (see also issue 87014).

5. Swapped the order of the token and languagePath parameters in
LanguageProvider to be in sync with LanguageHierarchy.embedding().

6. LanguageEmbedding is now a final class (instead of an abstract class) with a
private constructor and a static create() method. That allows better control over
the evolution of the class and also allows the created embeddings to be cached to
save memory.

7. LanguageEmbedding is now generic, LanguageEmbedding<T extends TokenId>,
parameterized in the same way as the language it contains.

8. The TokenHierarchy.languagePaths() set contains all language paths used in the
particular token hierarchy. TokenHierarchyEventType.LANGUAGE_PATHS is
fired after a change of the language paths set.
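
The design in item 6 (final class, private constructor, static create() with
caching) can be sketched as follows. This is an illustration under assumptions,
not the committed code: EmbeddingSketch is a hypothetical stand-in for
LanguageEmbedding, the language is a plain string, and the startSkipLength and
endSkipLength parameters are assumed for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a cached, immutable embedding description. */
final class EmbeddingSketch {

    // Cache keyed by the full argument list, so equal requests share one instance.
    private static final Map<List<Object>, EmbeddingSketch> CACHE = new HashMap<>();

    private final String language;
    private final int startSkipLength; // assumed parameter: chars skipped at the token start
    private final int endSkipLength;   // assumed parameter: chars skipped at the token end

    // Private constructor: instances can only be obtained through create().
    private EmbeddingSketch(String language, int startSkipLength, int endSkipLength) {
        this.language = language;
        this.startSkipLength = startSkipLength;
        this.endSkipLength = endSkipLength;
    }

    /** Returns a shared instance for the given arguments, creating it on first use. */
    static synchronized EmbeddingSketch create(String language, int startSkipLength,
            int endSkipLength) {
        List<Object> key = Arrays.<Object>asList(language, startSkipLength, endSkipLength);
        EmbeddingSketch cached = CACHE.get(key);
        if (cached == null) {
            cached = new EmbeddingSketch(language, startSkipLength, endSkipLength);
            CACHE.put(key, cached);
        }
        return cached;
    }

    String language() {
        return language;
    }

    int startSkipLength() {
        return startSkipLength;
    }

    int endSkipLength() {
        return endSkipLength;
    }
}
```

Because the instances are immutable, sharing them between tokens is safe, and
the factory can change its caching strategy later without affecting callers.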

Comment 2 Miloslav Metelka 2006-11-28 14:42:36 UTC

Created attachment 36320 [details]
Diff of the change

Comment 3 Miloslav Metelka 2006-11-28 15:21:08 UTC

Marking for fasttrack review.

Comment 4 Jesse Glick 2006-11-28 22:34:18 UTC

BTW "diff -u" is generally more readable than "diff -c", especially in an
enormous patch like this one. Easiest to append "diff -u" to your ~/.cvsrc file.

Comment 5 Miloslav Metelka 2006-12-04 15:51:19 UTC

Created attachment 36454 [details]
List of committed files

Comment 6 Miloslav Metelka 2006-12-04 15:52:09 UTC

Committed into trunk.

Comment 7 Jesse Glick 2006-12-04 17:05:25 UTC

Uh, did you mean M6?

Comment 8 Miloslav Metelka 2006-12-04 19:39:53 UTC

Sorry, I meant M6. Thanks, Jesse.