Context dependent syntax highlighting in vs code, sublime or atom - visual-studio-code

I'm developing a language to write down mathematical proofs for proof checking. (To be more accurate: it is a language to write down derivations in Post canonical systems)
Is it possible to create syntax highlighting for VS Code, Sublime Text or Atom which will do something like this:
So I'm asking for context dependent syntax highlighting. Only the "declared" variables should be highlighted (until the next variable declaration).
Of course, with a custom lexer in Pygments it is possible. But the TextMate-like grammars seem to be too limited. If this is the case: Is there a code editor that can use something like Pygments to highlight code?
Remark:
It isn't important what happens if one variable is an initial segment of another.
The semantical content of the example is nonsense, of course.

Related

How to write diff code with syntax highlight in Github

Github supports syntax highlight as follows:
```javascript
let message = 'hello world!'
```
And it supports diff as follows: (but WITHOUT syntax highlight)
```diff
-let message = 'hello world!'
+let message = 'hello stackoverflow!'
```
How can I get both 'syntax hightlight' AND 'diff' ?
No, this is not a supported feature at this time.
GitHub documents their processing of lightweight markup languages (including Markdown, among others) in github/markup. Note step 3:
Syntax highlighting is performed on code blocks. See github/linguist for more information about syntax highlighting.
If we follow that link, we find a list of grammars that Linguist uses to provide syntax highlighting on GitHub. Linguist can only apply one of the grammars in that list to a block of code at a time. Of course, one of the grammars is Diff. However, that grammar knows nothing about the language of code being diffed, so you don't get syntax highlighting of that.
Of course, there are other languages which are often combined. For example, HTML is often included in a templating language. Therefore, in addition to the HTML grammar, we also find grammars for HTML+Django, HTML+ECR HTML+EEX, HTML+ERB, and HTML+PHP. In each case, the single grammar is aware of two languages. Both the specific templating language and the HTML which is interspersed within the template.
To accomplish the same thing with a diff, you would need a separate "diff" grammar for every single language listed. In other words, the number of grammars would double. Of course, a way to avoid this might be to treat diff differently. When diff is specified, they could run the block through the syntax highlighter twice, once for diff and once for the source language. However, at least when processing code blocks in lightweight markup languages, they have not implemented such a feature.
And if they ever were to implement such a feature in the future, it would likely be more complicated that simply running the code block through twice. After all, every line of the diff has diff specific content which would confuse the other language grammar. Therefore, every grammar would need to be diff aware, or each line would need to be fed to the grammar separately with the diff parts removed. The problem with the later is that the grammar would not have the context of each line and is more likely to get things wrong. Whether such a solution is possible is outside this cope of this answer, but the point is that it is reasonable to expect that such a feature would be much lower priority to support due to the complexity involved.
So why does GitHub do syntax highlighting in other places on its website? Because, in those cases, it has access to the two source files being diffed and it generates the diff itself. Each source is first highlighted (avoiding the complexity mentioned above), then the diff is created from the two highlighted source files. However, a diff included in a Markdown code block is already a diff when GitHub first sees it. There is no way for them to highlight the pre-diff code first. In other words, the process they currently use would not be transferable to supporting the requested feature.
You would need to post-process the output of the git diff in order to add syntax highlighting for the right language of the file being diff'ed.
But since you are asking for GitHub, that post-processing is not in your control, and is not provided by GitHub at the moment in its GFM (GitHub Flavored Markdown Spec).
It is supported for source files, in a regular diff like this one or in a PR: GitHub does the syntax highlighting of the two versions of the file, and then computes the diff.
It is not supported in a regular markdown fenced code block, where the +/- of a diff would throw off the syntax highlighting engine, considering there is no "diff" operation done here (just the writer trying to add diff +/- symbols)

How to highlight my own syntax in Emacs?

I am developing my own Domain Specific Language (DSL) and the filename extension is .xyz.
Emacs doesn't know how to highlight syntax in .xyz files so I uausally turn on typescript-mode or json-mode. But the available syntax highlight mode is not good enough for me, so I am considering writing my own syntax highligher for Emacs editor. Any tips on this task? Any toolkit recommendation?
Alternatively, I would be happy with any available mode that highlights common keywords such as class, string, list, variables before =sign and after # sign, braces {}, brackets [], question mark ? and exclamation mark !. Any existing languages have similar syntax?
I am not color-blind and not picky on colors. Any syntax highligher that highlights above syntax can solve my problem.
If you are satisfied with simple syntax highlighting for keywords and comments only, there is a helper for this called define-generic-mode, which is documented in the elisp manual.
Some examples of using it can be found in generic-x.el distributed with Emacs.
But highlighting of variable names is not covered by this. For that, you need to be able to parse the DSL using semantic/bovine, as whether a particular string is interpreted as a variable name depends on context, and not just simple regexp matching.

Is it possible/easy to build a VS Code extension that does syntax highlighting with a lexer?

I am building an experimental lexer generator and I think it would be cool to output simple syntax highlighters for VS Code. The input grammar goes through the classic regular language -> NFA -> DFA transformation, then generates state machine code (it also has some unconventional features to support nested languages). Converting all this back into tmlanguage definitions is a complicated problem, and I'm starting to wonder if a VS Code extension is a better option. The question is:
Are VS Code syntax highlighting internals completely tied to the tmlanguage regex scanner, or would it be possible to write an extension that provides tokens / highlight ranges programmatically?
Is there an API that would make this reasonably straightforward, or would this project be a tour de force?
As of VSCode 1.15, you have to use textmate grammars for syntax highlighting. There's an feature request open that tracks what you are after: https://github.com/Microsoft/vscode/issues/1967

Notepad++ and autocompletion

I'm using mainly Notepad++ for my C++ developing and recently i'm in need for some kind of basic autocompletion, nothing fuzzy, just want to type some letters and get my function declaration instead of having a manual opened all of the time..
The integrated autocompletion feature of my Notepad++ version (6.9.2) gives the declaration of basic C functionality like say fopen and parses my current file user defined functions, but without declaration.
I guess it's normal for a text editor to not give easily such information since it has nothing to parse i.e. other files where your declarations are (as it's not an IDE), but i don't want either to mess again with MSVC just for the sake of autocomplete.
Is there an easy, not so-hackish way to add some basic C++ and/or user defined autocomplete?
UPDATE
Adding declarations the "hard way" in some file cpp.xml is a no-no for me as i have a pretty big base of ever changing declarations. Is there a way to just input say some list of h/cpp files and get declarations? or this falls into custom plugin area ?
Edit the cpp.xml file and add all the keywords and function descriptions you'd like. Just make sure you add them in alphabetical order or they will not show up.
Another option is to select Function and word completion in the Auto-Completion area of the Settings-->Preferences dialog. NPP will suggest every "word" in the current file that starts with the first N letters you type (you choose a value for N in the Auto-Completion controls).

Emacs Mode for a c-like language

I'm trying to write a new emacs mode for a new template c-like language, which I have to use for some academic research.
I want the code to be colored and indented like in c-mode, with the following exceptions:
The '%' is not used as an operator, but as the first character in some specific keywords (like: "%p", "%action", etc.)
The code lines do not end with a semicolon.
Is it possible to create a derived-mode (from c-mode) and set it to ignore the original purposes of '%' and ';'? Is it possible to make the feature of "automatic-indentation after pressing RET" work without ';'?
Are there similar modes for similar languages (with '{}' brackets, but without semicolons) that I could try to patch?
Should I try to write a major mode from scratch?
I thought about patching the R-mode from http://ess.r-project.org/, but this mode does not support comments of the form "/* comment */".
The most important feature that I'm looking for is the brackets-indentation, i.e. indenting code inside a '{}' block after pressing RET (and without the extra-indentation after lines that do not end with ';'). Partial solutions will help too.
More generally, CC-mode has been extended and generalized over time to accomodate ever more languages, and the latest CC-mode is supposed to be reasonably good at isolating the generic code from the language-specific code. So take a look at some of the major modes that use CC-mode (e.g. awk-mode), and get in touch with CC-mode's maintainer who will be able to help you figure out hwo to do what you want.
If you don't mind something really simple, you can look at Gosu mode. Gosu is a language that has curly braces and no semi-colons, so you should be all set for your minimum. It also uses the same comment syntax as C.
The implementation of the mode for it is really simple and based on generic mode, so modifying it to work the way you want should be easy. It is not based on C-mode.
This is what I used to make a mode for the language I was working on for my compilers class, and it was really easy even with limited elisp experience. On the other hand, the indentation is fairly simple--it works for most code, but is not as complete as C-mode's.
Check out arduino-mode: https://github.com/bookest/arduino-mode/blob/master/arduino-mode.el
It is a C based mode that uses the cc-mode features to quickly create something very useful and unique to arduino programming. Using this as a simple template should help a lot.