The project I'm working on has a custom file format with a predefined structure. The structure is really simple and generic (and I cannot change it): it is composed of (nested) commands and typed properties.
Using this structure, several dialects have been created. The dialects are an "instantiation" of the generic grammar, and specify the name and the meaning of commands and the expected properties.
I created a model with EMF for one of these dialects, and I would like to use Xtext to easily create a professional text editor and to read and write my model in the correct format.
Now I have a choice. On one side, I can directly target the dialect, and mix in the same grammar the concepts from the custom file structure and those from the dialect. On the other side, I can create a grammar describing the file structure, and on top of this I can describe my dialect.
Which way should I follow? I think that the latter is the best one, but how can I create a grammar describing those two layers?
Xtext allows extending existing languages: in the header of the grammar you can specify a parent grammar, whose rules are inherited.
For an example, see the Domain model example from Xtext 2.0, which extends the Xbase language:
grammar org.eclipse.xtext.example.domainmodel.Domainmodel with org.eclipse.xtext.xbase.Xbase
Every grammar element can be replaced by new syntax; new validation can be added, etc. See the following blog post for further ideas: http://koehnlein.blogspot.com/2011/07/extending-xbase.html
You can use the same approach: create a base language, then extend it for your various dialects.
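To make the two layers concrete, here is a minimal sketch in Python rather than Xtext. It assumes the generic layer only knows about nested commands and typed properties, and that a dialect is a table of allowed command names and property types. The names Command, Property, DIALECT_A and validate are hypothetical and only serve the illustration; in Xtext the generic layer would be the parent grammar and the dialect checks would belong to the extending language.

```python
from dataclasses import dataclass, field


@dataclass
class Property:
    name: str
    value: object  # typed value, e.g. str or int


@dataclass
class Command:
    name: str
    properties: list = field(default_factory=list)
    children: list = field(default_factory=list)


# Hypothetical dialect: command name -> {property name: expected type}
DIALECT_A = {
    "window": {"title": str, "width": int},
    "button": {"label": str},
}


def validate(command, dialect, path=""):
    """Return error messages for commands/properties the dialect does not allow."""
    here = f"{path}/{command.name}"
    errors = []
    spec = dialect.get(command.name)
    if spec is None:
        errors.append(f"{here}: unknown command")
    else:
        for prop in command.properties:
            expected = spec.get(prop.name)
            if expected is None:
                errors.append(f"{here}: unknown property '{prop.name}'")
            elif not isinstance(prop.value, expected):
                errors.append(f"{here}.{prop.name}: expected {expected.__name__}")
    # The generic layer handles nesting; the dialect only names what is legal.
    for child in command.children:
        errors.extend(validate(child, dialect, here))
    return errors
```

The generic structure is written once, and each further dialect is just another table, which is the main argument for keeping the two layers separate.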
I want to ask what advantages MPS and Xtext have over each other, and what their main features are when writing a language. I know that when working with MPS you are directly editing the AST, whereas Xtext uses a parser. I have read that an advantage of editing an AST is that it allows multiple languages to be extended for the language you are making. I don't really understand what this means; could this be explained further, and why would someone want to extend multiple languages?
I have also read that the AST cuts out ambiguous code; how does it do this?
I know that both MPS and Xtext have features like underlining and highlighting code; are there any other features relating to code validation?
Any other main differences and general features of them are welcome.
I have no practical experience with Xtext, so I will talk mostly about MPS.
LWB
Both Xtext and MPS are language workbenches, so they have their own schema used to metamodel the abstract syntax (structure of concepts), some way to define the concrete syntax (notations) and some way to define generators (M2M or M2T transformations) or, less commonly, interpreters. They then provide the IDE itself with highlighting, smart actions like refactoring and contextual error fixes, advanced search and navigation (go to declaration etc.), and checking for errors (type errors, static code analysis, checking of defined constraints & rules, checking cardinality, dataflow analysis), ... So yes, lots of options for validation. The things I have mentioned are in MPS; I am not sure whether Xtext provides everything. In MPS, all of these features are organised in so-called aspects, which you can check out in a summary table that briefly describes each aspect.
Projectional editor
As you have mentioned, MPS uses a projectional editor: you directly manipulate the AST. Parser-based post-IntelliJ smart IDEs are able to provide you with intelligent actions like refactoring and go to declaration only because they parse the language in memory and construct an AST behind the scenes anyway. Projectional editors skip the parsing step.
Dodging ambiguity
MPS uses no parser at all, so all of the downsides of having a parser are gone. First of all, the language developer does not need to be an expert in syntax analysis, so you don't need to hire one specifically. But the biggest win is infinite language composability. This is achieved, as you have mentioned, by totally avoiding the ambiguities which could appear in grammars (MPS does not use a grammar, but a model). Let's say you use language A and language B. For demonstration, let's say both languages extend BaseLanguage (abbr. BL, the MPS-equivalent of Java) and both have defined a statement to log. Concept a logs to stderr and b logs to a file. However, both a and b have an identical concrete syntax (i.e. editor definition in MPS) which just says log. Now if you had a parser and it encountered the token log, it could not decide which language the concept comes from, so it's ambiguous - not even a look-ahead parser can do it. In a projectional editor this cannot happen, because only the projection is identical, and under the hood the AST has an instance of either a or b (you can think of it as always using the whole FQN of a class in Java, just with the package hidden in the IDE, so you can use identically named classes from different packages). The "ambiguity" is resolved at the time of writing by the user: when they type log, a dropdown menu appears clearly showing that one of them is a and the other is b (maybe even with a description which would say "Log to file" / "Log to stderr").
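The log example can be sketched in a few lines of Python. This is not MPS code; the concept names langA.LogToStderr and langB.LogToFile are hypothetical. It only shows why a token lookup is ambiguous, while a node that stores its concept is not:

```python
from dataclasses import dataclass

# Hypothetical concepts from two languages sharing the same notation "log".
CONCEPTS = {
    "langA.LogToStderr": "log",
    "langB.LogToFile": "log",
}


def candidates(token):
    """What a parser sees: every concept whose notation matches the token."""
    return sorted(fqn for fqn, notation in CONCEPTS.items() if notation == token)


@dataclass
class Node:
    concept: str  # fully qualified concept name, stored in the AST

    def render(self):
        """Projection: only the notation is shown; the concept stays in the node."""
        return CONCEPTS[self.concept]
```

Here candidates("log") returns both concepts, which is the choice the editor's dropdown hands to the user, whereas two Node instances can render identically and still differ in their concept field.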
Modularity
Consequently, MPS has very good modularity, composability and extensibility of languages. You have mentioned
allows for multiple languages to be extended for the language you are making [...]
why would someone want to extend multiple language
You need to differentiate between using a language and extending it (if you are interested in more, Völter talks about 4 kinds of composition techniques regarding languages: referencing, extension, reuse and embedding). Using a language is just the ability to write programs in it. If you extend a language, it's kind of like inheritance: you add new concepts to it, e.g. create a new type of Java (BL) statement. This has been done in the standard languages shipped with MPS too. You have for example the checkedDots language, which extends BL with an operation .? which is null-safe (similar to the null-conditional operator ?. in C#). So why extend a language? Because you can use new constructs, add new functionality or syntactic sugar. Another ready-to-use language extending BL is the tuples language, which has both indexed and named tuples. Then there is the collections language, which kind of replaces the Java Stream API. All of these little languages are extensions which you can start using with a simple Ctrl+L. You could also embed another language in your language - use a regex inside an SQL statement inside your Java code.
Generation
Another kind of language dependency in MPS is to have a "generation target" language. Generators in MPS work in a way that you transform your language sentence (i.e. model) into another MPS language. You can invent your own little language, or implement LOLcode and set up the generator to transform it into valid Java code. However, the target language must already exist in MPS, so you cannot generate Python if there is no Python implementation in MPS. The other alternative is to generate text (M2T); this way you could theoretically generate Python source code, or just print the LOLcode as-is.
Multiple notations
The second great difference between projectional and parser-based editors is that the latter inherently support only textual notation. Maybe there are some external tools you can use. On the other hand, MPS provides textual, tabular, symbolic (math symbols) and graphical (diagrams) notations. You can swap your view from one notation to another, per concept or for the whole "file" (program).
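As a rough illustration of switching notations, the following Python sketch renders the same model once as text and once as a table. The model layout and the functions as_text and as_table are assumptions made for this example; they are not how MPS editors are defined.

```python
# Hypothetical model: one entity with named, typed fields.
model = {"entity": "Person", "fields": [("name", "string"), ("age", "int")]}


def as_text(m):
    """Textual notation of the model."""
    body = "\n".join(f"  {name}: {typ}" for name, typ in m["fields"])
    return f"entity {m['entity']} {{\n{body}\n}}"


def as_table(m):
    """Tabular notation of the same model; nothing in the model changes."""
    rows = [("field", "type")] + m["fields"]
    width = max(len(row[0]) for row in rows)
    return "\n".join(f"{name.ljust(width)} | {typ}" for name, typ in rows)
```

Both functions only read the model, so switching between them is a change of view and never a change of the program.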
Drawbacks
It's not all roses though. Projectional editors have some limitations, or challenges to solve. There is an analysis of challenges in projectional editors which points out mainly usability and infrastructure integration. They are mostly solved in MPS, e.g. regarding infrastructure you have a good VCS diff/merge tool. For automatic/command-line builds there is a language that generates Ant. Gradle and Maven do not work with MPS directly, only through Ant. Regarding usability, "MPS takes a while to get used to, but then its usability is comparable to ParEs." [3] You should use a language called GrammarCells (available through MPS-extensions or mbeddr.platform), which makes it easy to build good editors (mainly for arithmetic expressions); otherwise, by default you must enter concepts in prefix order (+ first, not the number). Comments in MPS cannot be placed willy-nilly. You cannot establish references to non-existing nodes... (see Table 1 in [3])
MPS currently does not have a web-based version. There are some planned, though: JetBrains is working on WebMPS, and then there is modelix.
Portability
Generally, you are tied to working in MPS. By default it is not really portable, unless you explicitly define generators which produce portable output. If you want to input a program, you can code a paste handler where you could put your parser, or you can change the format in which the AST is stored (from XML to maybe directly your language, but this would again require a parser to read). I am currently working on a solution that makes it possible to import an MPS language from a YAJCo model (a model-based parser generator, where the input is not a grammar, but Java classes representing the semantic model). Then you can import a sentence (file), which creates and populates a model (AST). From the program in MPS you can generate Java source code which fills the original Java classes if you need it.
BTW, the mbeddr project has implemented importing from Ecore.
Dictionary
M2M = model to model
M2T = model to text
Racket and Xtext are both considered as language workbenches, but they are based on different concepts and workflows.
As an experienced Xtext user, I find it difficult to adapt my thought process to Racket.
In Xtext, the grammar of a language is converted into, or mapped to, a set of classes (also called a metamodel).
Xtext also generates a parser that converts a source file into a set of instances of those classes. A scoping API allows named references to be resolved, so that the result is an object graph (also called a model) rather than an abstract syntax tree (AST).
Such a model can be queried, transformed, or fed into a template engine to generate code.
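The step from a tree with names to an object graph can be sketched as follows. This is plain Python, not the Xtext scoping API; Entity, Attribute and link are hypothetical names, and the scope is assumed to be a single global namespace of entity names.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class Attribute:
    name: str
    type_name: str       # the name as written in the source
    type: object = None  # resolved target, filled in by link()


def link(entities):
    """Resolve each attribute's type name against the scope of entity names."""
    scope = {e.name: e for e in entities}
    unresolved = []
    for e in entities:
        for a in e.attributes:
            a.type = scope.get(a.type_name)
            if a.type is None:
                unresolved.append(f"{e.name}.{a.name} -> {a.type_name}")
    return unresolved
```

After link() has run, following attribute.type leads directly to the referenced entity, which is what distinguishes a model from a bare syntax tree.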
In Racket, the reader produces an AST in the form of a syntax object. However, most examples that I have found seem to make an ad-hoc use of this syntax object. Either they are toy languages that do not need a complete object graph, or they are too complex and it is difficult to infer a general methodology.
For my current language project, after struggling with syntax objects, I have created the equivalent of a metamodel using Racket structs. Then it was fairly easy to convert a syntax object into an object graph that I could manipulate as if it were a model in EMF. However, I feel like I am not using syntax objects the way they are intended to be used.
Here are my questions:
What tools or APIs are available to work on syntax objects and achieve a similar ease-of-use as a model-driven framework?
Are there documents that describe a general language development methodology in Racket, that could be applied to non-trivial languages?
Are there documents that explain the Racket way, compared to Xtext or any other model-driven language framework?
EDIT:
Based on the documentation for Metaprogramming helpers, syntax classes can be used to specify and compose syntax patterns, and attach attributes to their elements. They can achieve a similar purpose as the classes of a metamodel.
However, as far as I can see, syntax classes are not classes, and syntax objects are not linked to syntax classes in a class-instance relationship.
This has the following consequences:
Syntax classes do not support inheritance directly, but we can achieve a similar effect with ~or* and attribute declarations for subclasses.
Syntax classes do not come with accessors for their attributes: you have to call syntax-parse every time you want to read an attribute.
At this point, there are still two missing features that are not addressed in the documentation that I have found:
Traversing a syntax tree from child to parent: how can I get a reference to the syntax object enclosing a given syntax object?
Scoping: how can I define specific scoping rules for my language?
I am developing a general-purpose CRUD code generator application. The idea is that the code/files (model, controller, view) for common insert, update, list, delete etc. operations will be automatically generated from a model definition (like the definition used in Grails). But the generated code can be for any framework, e.g. Play (Scala or Java version) or Django or Grails or whatever framework the user wants to use it for, even AngularJS. That is, the same model definition can be used for generating code for any framework.
My question is, what can I use for this task - Scala or Groovy or some DSL specialized tools like Xtext?
This seems like a good case for a DSL. A DSL can be summed up as the following 3 elements:
Abstract Syntax: the concepts of your DSL. Here you want to specify CRUD applications.
Concrete Syntax(es): a way to materialize your Abstract Syntax. As a programmer, the first thought is often text-based syntaxes, but you could also use a graphical or tree-like syntax, or even simply a GUI with text fields and checkboxes.
Semantics: the meanings of your DSL. Here you want to generate code.
I'll now suggest some solutions which are based on Java and come from the Eclipse Modeling ecosystem.
Eclipse EMF implements standards for the definition of so-called "metamodels" (basically abstract syntaxes). In the Eclipse world, EMF is the base for a lot of tooling.
Assuming you have an EMF metamodel, textual syntaxes can be specified using Eclipse Xtext, and graphical syntaxes using Eclipse Sirius. Note that you can also develop your own GUI in Java and create your model using the EMF Java APIs. Also note that Xtext can create your metamodel for you based on the grammar you want for your text-based syntax. This is nice if you don't want to dive too deep into EMF itself (thus steps 1 and 2 are one and the same).
Eclipse Acceleo provides a template language specifically designed to generate text, including code. Once again, you can also write your code generator using plain Java, or any JVM-based language thanks to the EMF Java APIs. If you use Xtext, there is also a facility for including an Xtend-based code generator alongside your syntax.
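The idea of one model definition feeding several template-based generators can be sketched in Python with the standard library's string.Template. This is not Acceleo or Xtend; the model layout, the two targets and the generated snippets are assumptions made for illustration and are not real framework output.

```python
from string import Template

# Hypothetical model definition: one entity with typed fields.
MODEL = {"name": "Book", "fields": [("title", "String"), ("pages", "Integer")]}

# One template set per target; both are illustrative only.
TARGETS = {
    "java": {
        "file": Template("public class $name {\n$fields\n}\n"),
        "field": Template("    private $type $field;"),
    },
    "python": {
        "file": Template("class $name:\n$fields\n"),
        "field": Template("    $field: '$type'"),
    },
}


def generate(model, target):
    """Render the same model with the templates of the chosen target."""
    templates = TARGETS[target]
    fields = "\n".join(
        templates["field"].substitute(field=name, type=typ)
        for name, typ in model["fields"]
    )
    return templates["file"].substitute(name=model["name"], fields=fields)
```

Supporting another framework then means adding another entry to TARGETS, while the model definition stays untouched.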
I have a question about Xtext. I know that Xtext creates an Ecore model for the DSL that is defined in the .xtext file. Am I right that Xtext only creates EClass, EAttribute, EEnum and EReference elements in the Ecore model? Is there no way for an attribute of a rule to produce an EOperation?
Xtext allows you to import an existing EPackage or infer a new one from a grammar definition. Since EOperations are not relevant to the concrete syntax, there is nothing that could be inferred for them. If you want to use EOperations, I suggest you switch to a manually maintained, imported package.
Adding to Sebastian's answer: if you still want to use an inferred model, you can use a model post-processor to adjust the model. This is easier if you want to adjust only one or two things in the model, such as adding operations.
I'd like to define a language whose elements are contained in different kinds of files that are nevertheless linked (similarly to C++ with .cpp and .h files).
Are grammar mixins the right way to do that? If so, how should I proceed?
Different elements in different file kinds sounds like a use case for Grammar Mixins. The base grammar should define the language concepts common for both languages, and the sub-languages would inherit from the base grammar.
Ideally, create a manually written Ecore metamodel and map the concepts to it (i.e. don't use 'generate').
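As an analogy for a base grammar shared by two file kinds, the following Python sketch uses class inheritance in place of grammar inheritance. It is not Xtext syntax; BaseLanguage, HeaderLanguage, ImplLanguage and the tiny signature syntax are all hypothetical.

```python
import re


class BaseLanguage:
    """Shared concept for both file kinds: a signature such as 'add(a, b)'."""

    SIGNATURE = re.compile(r"(\w+)\(([^)]*)\)")

    def parse_signature(self, text):
        text = text.strip()
        match = self.SIGNATURE.match(text)
        if not match:
            raise ValueError(f"not a signature: {text!r}")
        params = [p.strip() for p in match.group(2).split(",") if p.strip()]
        return match.group(1), params, text[match.end():].strip()


class HeaderLanguage(BaseLanguage):
    """Header files hold declarations only: 'add(a, b);'"""

    def parse(self, text):
        name, params, rest = self.parse_signature(text)
        if rest != ";":
            raise ValueError("a declaration must end with ';'")
        return {"kind": "declaration", "name": name, "params": params}


class ImplLanguage(BaseLanguage):
    """Implementation files hold definitions: 'add(a, b) { ... }'"""

    def parse(self, text):
        name, params, rest = self.parse_signature(text)
        if not (rest.startswith("{") and rest.endswith("}")):
            raise ValueError("a definition needs a '{...}' body")
        return {"kind": "definition", "name": name, "params": params,
                "body": rest[1:-1].strip()}
```

Because both sub-languages produce the same name and params fields, a declaration in one file can be matched to its definition in the other, which mirrors the advice to map both grammars onto one shared metamodel.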
Since 2.10, Xtext supports parser rule fragments. This means you can define certain reusable parts of rules with the 'fragment' keyword. See https://github.com/eclipse/xtext-core/blob/761ffeac7e62525be5a5473988d7f1d577298b67/org.eclipse.xtext.tests/src/org/eclipse/xtext/parser/fragments/FragmentTestLanguage.xtext.