How to prioritize rules in VS Code language extensions

I am trying to create a language extension for VS Code. Comments in this language are single-line comments that start with a semicolon, like this:
command ;comment
For this, I put the following into the repository section of my tmLanguage.json:
"comments": {
"name": "comment.lang",
"begin": ";",
"end": "\n"
}
and included it in the patterns section:
{
"include": "#comments"
}
This works so far. On top of that, the language also features special blocks, which start with ";!" and ";;" respectively. I want those to be treated differently:
"magicString": {
"name": "magicString.lang",
"begin": ";!",
"end": "\n"
},
"commentHeader": {
"name": "commentHeader.lang",
"begin": ";;",
"end": "\n"
},
Again, I include them in the patterns section:
{
"include": "#magicString"
},
{
"include": "#commentHeader"
}
Now the obvious problem is that those two start exactly like a comment. As a consequence, they seem to be recognized and treated as comments. The scope inspector confirms that the tokens are indeed handled as "comment.lang".
How can I get around this? Is there a way to prioritize one rule above another? I looked up the topic in the TextMate documentation, but I don't get it. I tried specifying the number of semicolon repetitions in the begin regex. I thought this should work, but it does not:
"magicString": {
"name": "magicString.lang",
"begin": ";!",
"end": "\n"
},
"commentHeader": {
"name": "commentHeader.lang",
"begin": ";{2}",
"end": "\n"
},
"comments": {
"name": "comment.lang",
"begin": ";{1}",
"end": "\n"
}

A simple solution in your case would be to make sure the three begin regular expressions are mutually exclusive. For example, you could change your repository patterns as follows:
"magicString": {
"name": "comment.magic.lang",
"begin": ";!",
"end": "\n"
},
"commentHeader": {
"name": "comment.header.lang",
"begin": ";;",
"end": "\n"
},
"comments": {
"name": "comment.lang",
"begin": ";[^;!]",
"end": "\n"
}
Note the regex for comments: ;[^;!] means "a ; character followed by one character that is neither ; nor !". That following character is consumed as part of the begin match.
(The change in scope names in the above snippet is not directly related to your question. It is just my impression of what would be considered better practice, although I caution that I am a complete beginner on TextMate.)
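As an alternative sketch that is not taken from the answer above: when several rules can match at the same position, the rule listed first in the enclosing patterns array wins, so listing the two specific rules before the generic comment rule should also resolve the clash. The fragment below assumes the includes originally had #comments first. It also swaps the character class for a negative lookahead, so that the begin match does not consume the first character of the comment text. The "comment" keys are annotations only.

```json
{
  "patterns": [
    { "include": "#magicString" },
    { "include": "#commentHeader" },
    { "include": "#comments" }
  ],
  "repository": {
    "magicString": {
      "comment": "Listed before #comments so it is tried first",
      "name": "comment.magic.lang",
      "begin": ";!",
      "end": "\n"
    },
    "commentHeader": {
      "comment": "Listed before #comments so it is tried first",
      "name": "comment.header.lang",
      "begin": ";;",
      "end": "\n"
    },
    "comments": {
      "comment": "Sketch: the lookahead rejects ;; and ;! without consuming any text",
      "name": "comment.lang",
      "begin": ";(?![;!])",
      "end": "\n"
    }
  }
}
```

Either change on its own should be enough; using both simply makes the grammar independent of the include order.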

Related

converting vim syntax highlighting into vscode syntax highlighting

I've looked around and could not find a way to do this automatically, so:
I have some syntax highlighting that I built in Vim and want to transfer to VS Code, and I'm getting stuck on at least two parts.
Here's where I'm at so far: I've built a VS Code language extension, set up some basic syntax rules, and copied it over to the VS Code config folder.
The parts I'm having trouble with: first, I could use some clarity on what some fields mean and on naming conventions;
second, nested parsing of syntax, that is, things that only appear inside other elements.
Below is the bit I added on top of the vim-markdown syntax.
syntax region spokenWord start=/\v"/ skip=/\v\\./ end=/\v"/ contained
syntax region thoughtWord start=/\v'/ skip=/\v\\./ end=/\v'/ contained
syntax region codeWord start=/\v`/ skip=/\v\\./ end=/\v`/ contained contains=objkw,spokenWord,thoughtWord,action,description,executekw
syntax region action start=/\v*/ skip=/\v\\./ end=/\v*/ contained
syntax region description start=/\~/ skip=/\v\\./ end=/\~/ contained
syntax match executekw "[e][x][e]\s" contained
syntax match objkw "[.][\w]+" contained
syntax region BLOCK start=/{/ skip=/\s+/ end=/}/ contains=spokenWord,thoughtWord,codeWord,action,description
highlight link spokenWord String
hi thoughtWord ctermfg=red
hi codeWord ctermfg=gray
highlight link action function
highlight link description Statement
hi BLOCK guibg=#FF00FF ctermfg=magenta cterm=bold guifg=#00FF00
hi exepm ctermfg=green
hi objP ctermfg=red
hi fntk ctermfg=blue
hi fnnm ctermfg=130
hi executekw ctermfg=130
hi objkw ctermfg=130
which results in this look:
what I have so far for the vs code syntax is as follows:
{
"$schema": "https://raw.githubusercontent.com/martinring/tmlanguage/master/tmlanguage.json",
"name": "MDX",
"patterns": [
{
"include": "#keywords"
},
{
"include": "#strings"
},
{
"include":"#thought"
},
{
"include":"#action"
},
{
"include":"#description"
},
{
"include":"#code"
},
{
"include":"#block"
},
{
"include":"#object"
}
],
"repository": {
"keywords": {
"patterns": [{
"name": "keyword.control.markdownextened",
"match": "\\b(EXE|IF|WHILE|FOR|RETURN)\\b"
}]
},
"strings": {
"name": "string.quoted.double.markdownextened",
"begin": "\"",
"end": "\"",
"patterns": [
{
"name": "constant.character.escape.markdownextened",
"match": "\\\\."
}
]
},
"thought":{
"name": "thought.quoted.single.markdownextened",
"begin": "'",
"end": "'",
"patterns": [
{
"name": "constant.character.escape.markdownextened",
"match": "\\\\."
}
]
},
"action":{
"name": "action.asterisk.markdownextened",
"begin": "\\*",
"end": "\\*",
"patterns": [
{
"name": "constant.character.escape.markdownextened",
"match": "\\\\."
}
]
},
"description":{
"name": "action.tilde.markdownextened",
"begin": "~",
"end": "~",
"patterns": [
{
"name": "constant.character.escape.markdownextened",
"match": "\\\\."
}
]
},
"code":{
"name": "action.grave.markdownextened",
"begin": "`",
"end": "`",
"patterns": [
{
"name": "constant.character.escape.markdownextened",
"match": "\\\\."
}
]
},
"block":{
"name": "action.braces.markdownextened",
"begin": "{",
"end": "}",
"patterns": [
{
"name": "constant.character.escape.markdownextened",
"match": "\\\\."
}
]
},
"object":{
"name": "action.object.markdownextened",
"patterns": [
{
"name": "action.object.markdownextened",
"match": "[.][\\w]+"
}
]
}
},
"scopeName": "source.markdown"
}
I could not find a guide anywhere on converting Vim syntax highlighting into VS Code syntax highlighting. I'll be reading through the documentation until I figure this out, but would love some help!
Do you want a one-off conversion, or a tool that continuously converts a large number of highlighters?
I doubt there are many, if any, large public Vim-to-VS Code converter tools.
I am going to assume your VS Code example above is working and that you have read pages like:
Your First Extension (Visual Studio Code Extension)
Syntax Highlight Guide (Visual Studio Code Extension)
Writing a TextMate Grammar: Some Lessons Learned
oniguruma/RE at master (kkos/oniguruma)
I would highly suggest looking at other people's highlighters (see "Where are extensions installed?") and downloading a TextMate syntax highlighter, such as TextMate Languages or (my own) Text Mate Language Syntax Highlighter.
Colours are defined by your current theme.
Your theme assigns colours based on the scope name that you give to each token in your highlighter.
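On the nested-parsing part of the question, here is a minimal sketch of how Vim's contained / contains= could map onto a TextMate grammar: a rule that should only apply inside another element is left out of the top-level patterns and included from the parent rule's own patterns instead. The scope name source.mdx-sketch, the rule keys, and the choice of entity.name.function for the Vim "function" highlight link are assumptions for illustration, not part of the original grammar.

```json
{
  "scopeName": "source.mdx-sketch",
  "patterns": [
    { "include": "#block" }
  ],
  "repository": {
    "block": {
      "comment": "Plays the role of Vim's BLOCK region; children are listed like contains=",
      "name": "meta.block.mdx-sketch",
      "begin": "\\{",
      "end": "\\}",
      "patterns": [
        { "include": "#spoken" },
        { "include": "#action" }
      ]
    },
    "spoken": {
      "comment": "Not in the top-level patterns, so it only matches inside block (like contained)",
      "name": "string.quoted.double.mdx-sketch",
      "begin": "\"",
      "end": "\"",
      "patterns": [
        { "name": "constant.character.escape.mdx-sketch", "match": "\\\\." }
      ]
    },
    "action": {
      "comment": "The asterisk is escaped because a bare * is a quantifier in a regex",
      "name": "entity.name.function.mdx-sketch",
      "begin": "\\*",
      "end": "\\*",
      "patterns": [
        { "name": "constant.character.escape.mdx-sketch", "match": "\\\\." }
      ]
    }
  }
}
```

The escape pattern inside each region stands in for Vim's skip=/\v\\./. Scope names that begin with a conventional root such as string, comment, keyword or entity are the ones most themes already colour, which is why the sketch uses them instead of custom roots like action.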

matching begin and pattern without end with tmLanguage

I'm trying to define a language using tmLanguage for syntax highlighting in vscode. I have the following rule.
"sexp": {
"name": "entity.sexp",
"patterns": [
{"include": "#list_of_sexp"},
{"include": "#atom"}
]
}
Is it possible to have a comment rule that matches sexp prefixed with a ";"? I'm not sure what to put in "end".
"comment": {
"name": "comment.sexp",
"begin": ";",
"end": ??,
"patterns": [{ "include": "#sexp" }]
}
I ended up solving this with a positive lookahead regex in "end".
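The exact regex used is not given, so the following is only a sketch of what a lookahead-based "end" could look like. It assumes that an sexp is either a parenthesised list or a run of characters without whitespace, parentheses or semicolons; the list_of_sexp and atom definitions are hypothetical stand-ins for the real ones. Because the lookahead is zero-width, the comment rule closes right after the first sexp without consuming the text that follows it.

```json
{
  "repository": {
    "comment": {
      "comment": "Sketch: closes at the first whitespace, ')' or end of line seen after the commented-out sexp",
      "name": "comment.sexp",
      "begin": ";",
      "end": "(?=\\s|\\)|$)",
      "patterns": [{ "include": "#sexp" }]
    },
    "sexp": {
      "name": "entity.sexp",
      "patterns": [
        { "include": "#list_of_sexp" },
        { "include": "#atom" }
      ]
    },
    "list_of_sexp": {
      "comment": "Hypothetical definition, the original is not shown",
      "begin": "\\(",
      "end": "\\)",
      "patterns": [{ "include": "#sexp" }]
    },
    "atom": {
      "comment": "Hypothetical definition, the original is not shown",
      "match": "[^\\s();]+"
    }
  }
}
```

While the tokenizer is inside list_of_sexp, only that rule's own "end" is tested, so whitespace inside the list does not close the comment. One limitation of this sketch is that a space directly after the semicolon closes the comment immediately.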

How do you isolate the scope of trailing text after a scope-match within vscode syntax grammars

I'm attempting to write a syntax grammar but have run into an issue when trying to properly scope text after a block comment:
/*
block comment
*/ troublesome text
With the following 'pattern', troublesome text is scoped as invalid.illegal.mircscript, comment.block.mircscript when instead it should be scoped as just invalid.illegal.mircscript
{
"name": "comment.block.mircscript",
"begin": "^\\x20*/\\*",
"end": "^\\x20*\\*/(\\x20*\\S.*$)?",
"endCaptures": {
"1": { "name": "invalid.illegal.mircscript" }
}
}
But if I split the pattern, troublesome text doesn't get matched/scoped at all:
{
"patterns": [
{
"name": "comment.block.mircscript",
"begin": "^\\x20*/\\*",
"end": "^\\x20*\\*/"
},
{
"name": "invalid.illegal.mircscript",
"match": "\\G(?<=\\*/)\\x20*\\S.*$"
}
]
}
How do I go about excluding the trailing text from the comment.block.mircscript while still matching directly after a block comment to scope it for invalid.illegal.mircscript?
This happens because the rule as a whole has "name": "comment.block.mircscript", so everything it matches (including the begin and end matches) gets that scope.
To avoid this, you can omit the top level name and instead use contentName while explicitly setting scopes on the begin and end captures:
{
"contentName": "comment.block.mircscript",
"begin": "^\\x20*/\\*",
"beginCaptures": {
"0": { "name": "comment.block.mircscript.begin" }
},
"end": "(^\\x20*\\*/)(\\x20*\\S.*$)?",
"endCaptures": {
"1": { "name": "comment.block.mircscript.end" },
"2": { "name": "invalid.illegal.mircscript" }
}
}
contentName only sets the scope of the content matched inside a begin/end rule, excluding the begin and end matches themselves.

Visual Studio Code Syntax HighLighting tmLanguage.json

I'm working on my first compiler as a bit of training project. I'd also like to create a small syntax highlighting project.
Looking at the default tmLanguage file, it's unclear to me what triggers a color. For example, I see the string type does in fact trigger string coloring when I debug, but what causes this? The 'strings' name of the repo? How does that connect to the coloring theme? Where can I see a list of names for default themes, etc.?
Looking at the examples, they seem to jump over a lot of info, so I'm not sure where to start on some things.
{
"$schema": "https://raw.githubusercontent.com/martinring/tmlanguage/master/tmlanguage.json",
"name": "N",
"patterns": [
{
"include": "#keywords"
},
{
"include": "#strings"
}
],
"repository": {
"keywords":
{
"patterns":
[{
"name": "keyword.control.n",
"match": "\\b(if|while|for|return)\\b"
}]
},
"strings":
{
"name": "string.quoted.double.n",
"begin": "\"",
"end": "\"",
"patterns": [
{
"name": "constant.character.escape.n",
"match": "\\\\."
}
]
}
},
"scopeName": "source.N"
}
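As a rough illustration of how the pieces connect: the repository key ("strings") is only a label used by "include", while the "name" value (string.quoted.double.n) is the scope attached to the matched text. A theme colours a token when one of its rules has a scope selector that is a prefix of that scope, so a theme rule for string also applies to string.quoted.double.n. The sketch below uses the editor.tokenColorCustomizations setting in settings.json to colour the two scopes from the grammar above; the colour values are arbitrary assumptions.

```json
{
  "editor.tokenColorCustomizations": {
    "textMateRules": [
      {
        "scope": "string.quoted.double.n",
        "settings": { "foreground": "#CE9178" }
      },
      {
        "scope": "keyword.control.n",
        "settings": { "foreground": "#C586C0", "fontStyle": "bold" }
      }
    ]
  }
}
```

The command "Developer: Inspect Editor Tokens and Scopes" shows which scopes a token carries and which theme rule coloured it, which is a quick way to check scope names against the active theme.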

TextMate Grammar -- precedence of rules

I'm trying to modify the syntax highlighting for C# so that I get SQL syntax highlighting inside C# strings. TextMate has support for embedded languages, so this seems possible.
I am building on csharp.tmLanguage.json, and I would like to be able to enable embedded SQL with a special comment before the string, like this:
string query = /*SQL*/ $#"SELECT something FROM ..."
Thanks to TextMate's Language Grammars and Introduction to scopes, I came up with this JSON rule:
"repository": {
"embeded-sql": {
"contentName": "source.sql",
"begin": "/\\*\\s*SQL\\s*\\*/\\s*\\$?#?\"",
"end": "\";",
"patterns": [
{
"include": "source.sql"
}
]
},
...
}
And thanks to VSCode's Themes, Snippets and Colorizers and Running and Debugging Your Extension, I was able to verify that this rule works.
But I have one problem, which I'm unable to solve.
My grammar rule works only if a significant portion of the csharp rules is disabled. If I disable all of #declarations and #script-top-level, the embedded SQL works:
Otherwise, my rule is overridden by csharp rules like
punctuation.definition.comment.cs
string.quoted.double.cs
comment.block
etc.
The problem is that my rule spans several language elements, and the csharp definition wins when targeting these elements.
On what basis are the elements tagged? How can I write my rule so that it wins and tags that syntax before the other language rules do? Is there an algorithm for calculating the weight of rules?
Solution
If you cannot hijack the comment syntax in csharp, let us work with a comment in SQL instead. I made a rule that is enabled by a -- SQL comment and applied it to a verbatim string. Now it works, but the styles are sometimes mixed with the string styles. It needs some additional improvements, but looks promising.
The rule that proved to work goes like this:
"embeded-sql": {
"contentName": "source.sql",
"begin": "--\\s*SQL",
"end": "(?:\\b|^)(?=\"\\s*;)",
"patterns": [
{
"include": "source.sql"
}
]
},
Now I would like to enable IntelliSense and error checking in such an embedded language.
The rules in the patterns list are matched in order.
Your rule looks like a specialisation of comment, so you can put it just before comment.block.cs:
"comment": {
"patterns": [
{
"contentName": "source.sql",
"begin": "(/\\*\\s*SQL\\s*\\*/)\\s*\\$?#?\"",
"beginCaptures": {
"1": {
"patterns": [
{
"include": "#comment"
}
]
}
},
"end": "\";",
"patterns": [
{
"include": "source.sql"
}
]
},
{
"name": "comment.block.cs",
"begin": "/\\*",
"beginCaptures": {
"0": {
"name": "punctuation.definition.comment.cs"
}
},
"end": "\\*/",
"endCaptures": {
"0": {
"name": "punctuation.definition.comment.cs"
}
}
},
...
The screenshot was taken with a language named "my", which is just a copy of the C# JSON grammar plus your SQL embedding.