End of file <EOF> not defined in Scala grammar file - scala

I am using ANTLR v4 and grammar files to parse entire source files. The parser, lexer, and base listener are generated from the grammar file using the ANTLR jar, and I then call the appropriate context (the start rule) to scan the entire file.
For example, for Java the context is compilationUnit; for C++ it is translationunit; for JavaScript it is program, and so on. These contexts are defined in each language's grammar file, in the rule where EOF (end of file) is declared.
For example:
compilationUnit :packageDeclaration? importDeclaration* typeDeclaration* EOF;
This is a line from the Java8.g4 file, where EOF is declared under compilationUnit. So compilationUnit is the context that should be used to scan an entire Java file. The situation is similar for other languages.
But I cannot find the context this way for Scala, because no EOF is defined in the Scala.g4 file. I am referring to the Scala grammar file from the following link.
https://github.com/antlr/grammars-v4/tree/master/scala
All the grammar files for different languages that I use are from the same github page.
This anomaly in the Scala.g4 file leads to the question: is the Scala.g4 file incomplete or erroneous, or am I missing something here?
The basic question is: which context should be used to scan an entire Scala file with ANTLR?

Actually, these grammars are developed and supported by an open-source community, not by official language developers.
Of course, some grammars can be incomplete or written in different styles. If you want, you can add the EOF token to the Scala grammar yourself and make a pull request.
It's also possible to add the EOF token programmatically to any rule.

How to write diff code with syntax highlight in Github

Github supports syntax highlight as follows:
```javascript
let message = 'hello world!'
```
And it supports diff as follows: (but WITHOUT syntax highlight)
```diff
-let message = 'hello world!'
+let message = 'hello stackoverflow!'
```
How can I get both 'syntax highlight' AND 'diff'?
No, this is not a supported feature at this time.
GitHub documents their processing of lightweight markup languages (including Markdown, among others) in github/markup. Note step 3:
Syntax highlighting is performed on code blocks. See github/linguist for more information about syntax highlighting.
If we follow that link, we find a list of grammars that Linguist uses to provide syntax highlighting on GitHub. Linguist can only apply one of the grammars in that list to a block of code at a time. Of course, one of the grammars is Diff. However, that grammar knows nothing about the language of code being diffed, so you don't get syntax highlighting of that.
Of course, there are other languages which are often combined. For example, HTML is often included in a templating language. Therefore, in addition to the HTML grammar, we also find grammars for HTML+Django, HTML+ECR, HTML+EEX, HTML+ERB, and HTML+PHP. In each case, the single grammar is aware of two languages: both the specific templating language and the HTML which is interspersed within the template.
To accomplish the same thing with a diff, you would need a separate "diff" grammar for every single language listed. In other words, the number of grammars would double. Of course, a way to avoid this might be to treat diff differently. When diff is specified, they could run the block through the syntax highlighter twice, once for diff and once for the source language. However, at least when processing code blocks in lightweight markup languages, they have not implemented such a feature.
And if they ever were to implement such a feature in the future, it would likely be more complicated than simply running the code block through twice. After all, every line of the diff has diff-specific content which would confuse the other language grammar. Therefore, every grammar would need to be diff aware, or each line would need to be fed to the grammar separately with the diff parts removed. The problem with the latter is that the grammar would not have the context of each line and is more likely to get things wrong. Whether such a solution is possible is outside the scope of this answer, but the point is that it is reasonable to expect that such a feature would be much lower priority to support due to the complexity involved.
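The "feed each line separately with the diff parts removed" idea can be sketched roughly as follows. This is only an illustration of the splitting step, not anything GitHub or Linguist implements; the function names are hypothetical, and diff headers such as `@@` hunk lines are deliberately not handled.

```python
def split_diff_line(line):
    """Split one diff line into (marker, code).

    marker is '+', '-' or ' ' (context). Lines without a recognised
    marker are treated as context so that no text is lost.
    """
    if line[:1] in ("+", "-", " "):
        return line[0], line[1:]
    return " ", line


def strip_diff_markers(diff_text):
    """Return one (marker, code) pair per line of the diff body."""
    return [split_diff_line(line) for line in diff_text.splitlines()]


diff_block = "-let message = 'hello world!'\n+let message = 'hello stackoverflow!'"
pairs = strip_diff_markers(diff_block)
```

Each `code` part could then be handed to a language highlighter on its own, which is exactly where the context problem described above appears: a line taken from the middle of a multi-line string or comment would be highlighted without that surrounding state.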
So why does GitHub do syntax highlighting in other places on its website? Because, in those cases, it has access to the two source files being diffed and it generates the diff itself. Each source is first highlighted (avoiding the complexity mentioned above), then the diff is created from the two highlighted source files. However, a diff included in a Markdown code block is already a diff when GitHub first sees it. There is no way for them to highlight the pre-diff code first. In other words, the process they currently use would not be transferable to supporting the requested feature.
You would need to post-process the output of the git diff in order to add syntax highlighting for the right language of the file being diff'ed.
But since you are asking for GitHub, that post-processing is not in your control, and is not provided by GitHub at the moment in its GFM (GitHub Flavored Markdown Spec).
It is supported for source files, in a regular diff like this one or in a PR: GitHub does the syntax highlighting of the two versions of the file, and then computes the diff.
It is not supported in a regular markdown fenced code block, where the +/- of a diff would throw off the syntax highlighting engine, since no "diff" operation is performed there (it is just the writer adding diff +/- symbols).

Go to implementation instead of TypeScript declaration

When I click an imported variable while holding Cmd on MacOS in VSCode (or Ctrl on other platforms), I often end up looking at the TypeScript declaration of that variable.
Is there any way to have VSCode take me to the definition of it instead?
I don't use TypeScript myself, so the feature isn't helpful to me right now.
Try Go to source definition
This command will try to jump to the original JavaScript implementation of a function/symbol, even for code under node_modules.
JavaScript is a very dynamic language though, so we can't figure out the source location in every case. If you aren't getting results for a common library, please file an issue against TypeScript so we can investigate adding support.
For faster and more accurate results, libraries can bundle declaration maps that map from .d.ts files back to source .ts (or .js) files. However, many libraries currently do not include these.
I found a simple solution for this after a lot of searching.
You just need to add "typescript.disableAutomaticTypeAcquisition": true to your project's settings.json (or vscode's global settings).
This will disable the automatic generation of TypeScript definitions and restore the original "Jump to" behaviour of going to the implementation.
Source:
https://ianwalter.dev/jump-to-source-definition-instead-of-typescript-definition-in-vs-code (archive.org link)
The author provided the wrong instructions though (false when it should have been true), so be careful when you read the post. Re-installing node modules was also not needed.
VSCode was updated to include a new option, Go to Source Definition. If the TS source is available, and TypeScript is upgraded to 4.7 or later and VSCode to 1.67 or later, it should work.
Unfortunately, many library authors do not include the TS source code. The package often consists only of the compiled *.js files and the *.d.ts definition files, which makes this new VSCode feature useless for these packages.
This is the original issue:
https://github.com/microsoft/TypeScript/issues/6209
And this is an issue for feedback on the new feature.
https://github.com/microsoft/TypeScript/issues/49003
The implementation is bundled and transpiled to JavaScript, so VSCode is not able to take you there; instead, it will take you to the interface. You can search for references in the JavaScript file, or you can clone or fork the repo to see the implementation in TypeScript.
As other answers have already stated,
Regardless of your tsconfig, and whether or not the package you are requiring/importing things from provides type declaration files or you installed a Definitely Typed package for it, you can use the TypeScript: Go to Source Definition command to go to the symbol definition in the JS file. This functionality is provided by TypeScript and the vscode.typescript-language-features extension (which is built into / ships out-of-box with VS Code).
I thought I'd try to give more interesting information that other answers haven't covered yet, for fun and curiosity's sake (and also explain why this "often" happens to you, but not always):
You can bind that command to a keybinding. Its keybinding command ID is typescript.goToSourceDefinition.
If the package you require or import packages its own type declaration files or you installed a community-maintained type declaration file from the Definitely Typed project, then ctrl+clicking / cmd+clicking into the require/import argument or putting the caret on it and invoking whatever the editor.action.revealDefinition command or editor.action.goToTypeDefinition are bound to (F12 by default for editor.action.revealDefinition) will take you to the type declaration by default.
If the package you require or import doesn't package its own type declarations and you didn't install a types package from the Definitely Typed project, and you modify your tsconfig or jsconfig to set allowJs: true and maxNodeModuleJsDepth: <N>, then ctrl+clicking / cmd+clicking into the require/import argument or putting the caret on it and invoking whatever the editor.action.revealDefinition command or editor.action.goToTypeDefinition command are bound to (F12 by default for editor.action.revealDefinition) will take you to the symbol's definition in the JS file by default. The exception is if you already performed this action at a point when a type declaration file declaring types for the symbol was available and have not since reloaded/restarted VS Code or edited your tsconfig/jsconfig file, because it will cache that association in memory (smells like a minor bug, but ¯\_( ツ )_/¯).
The editor.action.revealDeclaration keybinding seems to do nothing here (at the time of this writing). I guess that keybinding is more for languages like C and C++.
Some loosely related release notes sections and user docs (non-exhaustive list (I don't get paid to do this)):
https://code.visualstudio.com/docs/editor/editingevolved#_go-to-definition
https://code.visualstudio.com/updates/v1_13#_go-to-implementation-and-go-to-type-definition-added-to-the-go-menu
https://code.visualstudio.com/updates/v1_35#_go-to-definition-improvements
https://code.visualstudio.com/updates/v1_67#_typescript-47-support
In TypeScript's GitHub repo: Go To Source Definition feedback thread #49003
https://code.visualstudio.com/updates/v1_68#_go-to-source-definition
Quoting from that last one:
One of VS Code's longest standing and most upvoted feature requests is to make VS Code navigate to the JavaScript implementation of functions and symbols from external libraries. Currently, Go to Definition jumps to the type definition file (the .d.ts file) that defines the types for the target function or symbol. This is useful if you need to inspect the types or the documentation for these symbols but hides the actual implementation of the code. The current behavior also confuses many JavaScript users who may not understand the TypeScript type from the .d.ts.
While changing Go to Definition to navigate to the JavaScript implementation of a symbol may sound simple, there's a reason why this feature request has been open for so long. JavaScript (and especially the compiled JavaScript shipped by many libraries) is much more difficult to analyze than a .d.ts. Trying to analyze all the JavaScript code under node_modules would be both slow and would also dramatically increase memory usage. There are also many JavaScript patterns that the VS Code IntelliSense engine is not able to understand.
That's where the new Go to Source Definition command comes in. When you run this command from either the editor context menu or from the Command Palette, TypeScript will attempt to track down the JavaScript implementation of the symbol and navigate to it. This may take a few seconds and we may not always get the correct result, but it should be useful in many cases.
See also: https://www.typescriptlang.org/docs/handbook/release-notes/typescript-4-7.html#go-to-source-definition.

how to link multiple lex generated c file?

I wrote 3 .lex files for parsing 3 different file formats. After generating the scanner C code, I need to build these 3 files into a single executable, but it fails with errors like "multiple definition of 'yy_switch_to_buffer(...)'", "multiple definition of 'yytext'", ...
How to solve this?
(Answered by the OP in a comment. Converted to a community wiki answer to suit the Q&A style of Stackoverflow.)
The OP wrote:
Use %option prefix="zz" to replace the default functions with zz-prefixed macros. Also, ylwrap in autotools seems to only handle a scanner C file named lex.yy.c, so %option outfile="lex.yy.c" is needed too.

Generate syntactically correct sentences from an Antlr grammar

I have an Xtext/Antlr grammar that parses a subset of coffeescript. I have some test cases, but I thought of doing another sort of test:
Generate random, syntactically correct snippets from my Antlr grammar
Feed these snippets to the original coffeescript parser (calling coffee -ne "the sentence")
Check if each sentence is parsed by coffeescript
Thus I could ensure that my parser accepts a proper subset, and it's not too permissive in some cases. Now, I am stuck with the first step. How can I generate sentences from my Antlr grammar (which also makes heavy use of syntactic predicates)? So I'm interested in the opposite of parsing a sentence.
I found some related attempts, but the answers are not using Antlr at all, but a custom grammar in python, or in clojure, or in ruby. I'd prefer a working solution rather than a hint about how it could be implemented.
No, you can't do this. If you look at the code that ANTLR compiles into, you can see that it's only a recognizer, not a generator.
The links you provided are your best bet -- take your ANTLR grammar, strip out all the rules to make it into a formal grammar, and then try to run it through one of those programs.
Or if your coffeescript subset is very small, you could take the approach of generating strings of random tokens and throwing away all the strings that don't parse.
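As a rough illustration of the "formal grammar plus generator" route, here is a minimal Python sketch. It does not read ANTLR grammars; the `GRAMMAR` table is a hypothetical toy grammar written by hand, and Python's own parser stands in for the reference parser (the role `coffee -ne` plays in the question).

```python
import ast
import random

# Hypothetical toy grammar, not the CoffeeScript subset: each nonterminal
# maps to a list of alternatives, each alternative is a list of symbols.
GRAMMAR = {
    "expr": [["term"], ["term", "+", "expr"]],
    "term": [["NUMBER"], ["(", "expr", ")"]],
    "NUMBER": [["1"], ["2"], ["42"]],
}


def generate(symbol, rng, depth=0, max_depth=6):
    """Expand `symbol` into a list of terminal tokens by random derivation."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol, emit as-is
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, take only the first alternative, which in
        # this grammar is the non-recursive one, so expansion terminates.
        alternatives = alternatives[:1]
    tokens = []
    for part in rng.choice(alternatives):
        tokens.extend(generate(part, rng, depth + 1, max_depth))
    return tokens


def accepted_by_reference(sentence):
    """Stand-in for the reference parser: does Python parse the sentence?"""
    try:
        ast.parse(sentence, mode="eval")
        return True
    except SyntaxError:
        return False


rng = random.Random(0)
sentences = [" ".join(generate("expr", rng)) for _ in range(20)]
rejected = [s for s in sentences if not accepted_by_reference(s)]
```

Any sentence that ends up in `rejected` would point to a place where the generated grammar is more permissive than the reference parser. Grammars that rely on predicates, as in the question, would need that logic reproduced by hand in the generator.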

Xtext: grammar for language with significant/semantic whitespace

How can I use Xtext to parse languages with semantic whitespace? I'm trying to write a grammar for CoffeeScript and I can't find any good documentation on this.
Here's an example whitespace sensitive language in XText
AFAIK, you can't.
When parsing Python-like languages, you'd need the lexer to emit INDENT and DEDENT tokens. For that to happen, you'd need semantic predicates to be supported inside lexer rules (Xtext's terminal rules) that would first check if the current-position-in-line of the next character in the input equals 0 (the beginning of the line) and is a ' ' or '\t'.
But browsing through the documentation, I don't see this is supported by Xtext at the moment. Since Xtext 2.0, support has been added for semantic predicates in production rules (see: 6.2.8. Syntactic Predicates), but not in terminal rules.
The only way to do this with Xtext would be to let the lexer produce terminal spaces and line-breaks, but this would make an utter mess of your production rules.
If you want to parse such a language using Java (and a Java oriented parser generator) I'd recommend ANTLR, in which you can emit such INDENT and DEDENT tokens quite easily. But if you're keen on Eclipse integration, then I don't see how you'd be able to do this using Xtext, sorry.
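To make the INDENT/DEDENT idea concrete, here is a minimal Python sketch of the usual indentation-stack technique. It is independent of both Xtext and ANTLR; the function name and the token tuples are made up for illustration, and only leading spaces are counted (tabs are not handled).

```python
def indent_tokens(source):
    """Turn leading whitespace into INDENT/DEDENT tokens around LINE tokens."""
    stack = [0]  # indentation widths of the currently open blocks
    tokens = []
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines do not change the indentation level
        text = raw.lstrip(" ")
        width = len(raw) - len(text)
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            tokens.append(("INDENT", ""))
        while width < stack[-1]:
            # Shallower: close blocks until the widths match.
            stack.pop()
            tokens.append(("DEDENT", ""))
        if width != stack[-1]:
            raise ValueError("inconsistent dedent: " + repr(raw))
        tokens.append(("LINE", text))
    # Close any blocks still open at the end of the input.
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", ""))
    return tokens


sample = "if x\n  a\n  if y\n    b\nc"
kinds = [kind for kind, _ in indent_tokens(sample)]
```

The production rules can then treat INDENT and DEDENT like opening and closing braces. The stack is lexer state that persists across lines, which is why this is awkward to express in a purely declarative terminal rule.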
Version 2.8 of Xtext comes with support for Whitespace-Aware Languages. This version ships with the "Home Automation Example" that you can use as a template.
For people interested in CoffeeScript, Adam Schmideg has an Eclipse plugin that uses XText.
For people interested in parsing Python-like DSL's in XText, Ralf Ebert's code for Todotext mentioned above is no longer available from Github but you can find it in the Eclipse test repository. See the original thread about this work and the Eclipse issue that was raised about it.
I have been playing with this code today and my conclusion is that it no longer works in the current version of XText. When XText is used in Eclipse, I think it does "partial parsing". This is not compatible with the stateful lexer you need to process indentation-sensitive languages. So I suspect that even if you patch the lexer, the Eclipse editor does not work. In the issue, it looks like Ralf proposed patches to address these issues, but looking into the XText source, these changes seem long gone. If I am wrong and someone can get it to work, I would be very interested.
There is a different implementation here but I cannot get that to work with the current version of XText either.
Instead, I have switched to parboiled, which does support indentation-based grammars out of the box.