There are use cases that require dynamic creation of language embeddings:
1) A Java string literal may contain, e.g., the text of an SQL statement. Since
there is already a default embedding that recognizes escape sequences such as
"\n", more than one embedding must be able to exist for a single token (the
default embedding must remain available, as there may be clients relying on it).
2) A <script> HTML tag does not need to specify the type of the scripting
language, and the default language may be overridden by
<META http-equiv="Content-Script-Type" content="type">
Although the lexer could in theory recognize such a declaration and store the
content type in the state object of every token that follows it, this is
impractical, because recognizing the declaration is a task that should be
reserved for an HTML parser.
3) The script that follows the <script> tag may be enclosed in an HTML comment,
and there may be no default embedding for the comment, so the parser must
explicitly request creation of an embedding for the comment token.
These use cases lead to the following requirements:
1) An API method must be added to TokenSequence for creating a custom embedding.
2) The notification model must be extended so that clients (e.g. syntax coloring)
are notified when a new embedding is created.
3) Clients must also listen for the creation of new tokens eligible for a custom
embedding. Also, if a token with a custom embedding is damaged by the user's
typing, the custom embedding is lost, so clients must re-create it.
4) If more than one embedding exists for a single token, one of them must be
chosen for syntax coloring. As there is not yet a use case requiring more than
one custom embedding, syntax coloring can simply use the custom embedding if one
exists and the default embedding otherwise.
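Requirements 1, 2 and 4 can be illustrated with a minimal sketch. It is written against hypothetical stand-in classes, not the real lexer API: CustomEmbeddingSketch, EmbeddableToken, ChangeEvent, ChangeType and the language strings are all assumptions. A token keeps its default embedding, createEmbedding() attaches a single custom embedding and notifies listeners with the affected offsets, and syntax coloring prefers the custom embedding when one exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical model (not the NetBeans lexer API) of a token that keeps its
 * default embedding and may get one custom embedding on request.
 */
public class CustomEmbeddingSketch {

    /** Kinds of change a listener may observe; names are illustrative only. */
    public enum ChangeType { TEXT_MODIFY, EMBEDDING } // TEXT_MODIFY is never fired here

    /** Change event exposing the affected offset area directly to clients. */
    public static final class ChangeEvent {
        public final ChangeType type;
        public final int affectedStartOffset;
        public final int affectedEndOffset;

        ChangeEvent(ChangeType type, int affectedStartOffset, int affectedEndOffset) {
            this.type = type;
            this.affectedStartOffset = affectedStartOffset;
            this.affectedEndOffset = affectedEndOffset;
        }
    }

    /** A token covering [startOffset, endOffset) of the document text. */
    public static final class EmbeddableToken {
        public final int startOffset;
        public final int endOffset;
        public final String defaultEmbedding; // e.g. escape sequences inside a string literal
        private String customEmbedding;       // null until a client requests one

        public EmbeddableToken(int startOffset, int endOffset, String defaultEmbedding) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.defaultEmbedding = defaultEmbedding;
        }

        public String customEmbedding() { return customEmbedding; }
    }

    private final List<Consumer<ChangeEvent>> listeners = new ArrayList<>();

    public void addListener(Consumer<ChangeEvent> listener) { listeners.add(listener); }

    /**
     * Requirement 1: attach a custom embedding while keeping the default one.
     * Returns false if the token already has a custom embedding; this sketch
     * supports only one, as there is no use case for more yet.
     */
    public boolean createEmbedding(EmbeddableToken token, String language) {
        if (token.customEmbedding != null) {
            return false;
        }
        token.customEmbedding = language;
        // Requirement 2: tell clients which offset area has to be repainted.
        ChangeEvent event = new ChangeEvent(ChangeType.EMBEDDING, token.startOffset, token.endOffset);
        for (Consumer<ChangeEvent> listener : listeners) {
            listener.accept(event);
        }
        return true;
    }

    /** Requirement 4: prefer the custom embedding, fall back to the default one. */
    public static String embeddingForColoring(EmbeddableToken token) {
        return token.customEmbedding != null ? token.customEmbedding : token.defaultEmbedding;
    }
}
```

A token re-created after the user types into it starts with no custom embedding, so a client that wants the SQL coloring back has to call createEmbedding() again for the new token, which is what requirement 3 asks for.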
The attached diff contains the implementation of this request. It makes the
following changes:
1. Extracted the TokenHierarchyEvent.Type inner enum into the top-level
TokenHierarchyEventType enum for better readability.
2. Added the TokenSequence.createEmbedding() method for creating a custom
embedding. A new TokenHierarchyEventType.EMBEDDING value is fired after a
custom embedding is created.
3. The affected offset area methods, affectedStartOffset() and
affectedEndOffset(), were moved from TokenChange to TokenHierarchyEvent because
they are more useful and clearer there for their clients; e.g. syntax coloring
will just query these offsets without digging into the (possibly embedded)
token change(s).
4. Removed the tokenComplete parameter from LanguageHierarchy.embedding() because
it is currently unused and token incompleteness will be handled differently in
the future (see also issue 87014).
5. Swapped the order of the token and languagePath parameters in
LanguageProvider to be consistent with LanguageHierarchy.embedding().
6. LanguageEmbedding is now a final class (instead of an abstract class) with a
private constructor and a static create() method. This allows better control over
the evolution of the class, and it also allows the created embeddings to be cached.
7. LanguageEmbedding is now generified as LanguageEmbedding<T extends TokenId>,
where T is the token id type of the language that the embedding contains.
8. TokenHierarchy.languagePaths() returns a set containing all language paths
used in the particular token hierarchy. TokenHierarchyEventType.LANGUAGE_PATHS is
fired after the set of language paths changes.
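Changes 6 and 7 can be pictured with the following sketch. It is not the code from the attached diff: EmbeddingApiSketch, the stand-in TokenId and Language types, and the startSkipLength/endSkipLength parameters are assumptions made for illustration. It shows a final, generified LanguageEmbedding<T extends TokenId> whose private constructor forces clients through a static create() method, which can then hand back a cached instance for equal arguments.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-ins illustrating changes 6 and 7; not the real lexer classes. */
public final class EmbeddingApiSketch {

    private EmbeddingApiSketch() { }

    /** Stand-in for the lexer's token id abstraction. */
    public interface TokenId {
        String name();
    }

    /** Stand-in for a language whose tokens are identified by ids of type T. */
    public static final class Language<T extends TokenId> {
        private final String mimeType;

        public Language(String mimeType) {
            this.mimeType = Objects.requireNonNull(mimeType, "mimeType");
        }

        public String mimeType() { return mimeType; }
    }

    /**
     * Final class with a private constructor: clients can neither subclass it
     * nor instantiate it directly, so create() fully controls instance creation.
     */
    public static final class LanguageEmbedding<T extends TokenId> {

        // Already created embeddings, keyed by (language instance, startSkip, endSkip).
        private static final Map<List<Object>, LanguageEmbedding<?>> CACHE = new ConcurrentHashMap<>();

        private final Language<T> language;
        private final int startSkipLength; // assumed: chars skipped at token start, e.g. an opening quote
        private final int endSkipLength;   // assumed: chars skipped at token end, e.g. a closing quote

        private LanguageEmbedding(Language<T> language, int startSkipLength, int endSkipLength) {
            this.language = language;
            this.startSkipLength = startSkipLength;
            this.endSkipLength = endSkipLength;
        }

        @SuppressWarnings("unchecked") // safe: the key contains the Language<T> instance itself
        public static <T extends TokenId> LanguageEmbedding<T> create(
                Language<T> language, int startSkipLength, int endSkipLength) {
            Objects.requireNonNull(language, "language");
            if (startSkipLength < 0 || endSkipLength < 0) {
                throw new IllegalArgumentException("skip lengths must be >= 0");
            }
            List<Object> key = List.of(language, startSkipLength, endSkipLength);
            return (LanguageEmbedding<T>) CACHE.computeIfAbsent(
                    key, k -> new LanguageEmbedding<T>(language, startSkipLength, endSkipLength));
        }

        public Language<T> language() { return language; }
        public int startSkipLength() { return startSkipLength; }
        public int endSkipLength() { return endSkipLength; }
    }
}
```

Because instances can only be obtained through create(), repeated requests for the same embedding return the same object, and the class can later change its internals or caching policy without breaking subclasses, since there can be none.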
Created attachment 36320
Diff of the change
Marking for fasttrack review.
BTW "diff -u" is generally more readable than "diff -c", especially in an
enormous patch like this one. The easiest way is to add "diff -u" to your ~/.cvsrc file.
Created attachment 36454
List of committed files
Committed into trunk.
Uh, did you mean M6?
Sorry, I meant M6. Thanks, Jesse.