"error: unclosed comment" in multiline comments - scala

When converting a template from Java to Scala, I noticed a quirk with multiline comments that can be reduced to the following snippet:
/**
* /*
*/
class Blah {}
The above fails to compile with "error: unclosed comment", while being valid in Java.
This proves problematic, since it makes it harder to document e.g. acceptance of glob-type strings (e.g. "requires a path like something/*.myformat").
Is this a bug or a feature?

It is, in fact, a feature. To quote Section 1.4 of the Scala Language Specification:
A multi-line comment is a sequence of characters between /* and */.
Multi-line comments may be nested, but are required to be properly
nested. Therefore, a comment like /* /* */ will be rejected as having
an unterminated comment.
(emph. mine)
Fortunately, this is relatively easy to work around when you need it (as in the glob example from the question): escape the / or * literal as an HTML entity, netting something like:
/**
* &#47;*
*/
which displays correctly in the generated Scaladoc.
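To see why the snippet from the question is rejected, here is a minimal Python sketch of nesting-aware comment scanning. It only illustrates the rule quoted from the specification and is not how the Scala compiler is implemented; the function name `unclosed_comment_depth` is made up for this example, and the escaped variant assumes the slash is written as the HTML entity `&#47;`.

```python
def unclosed_comment_depth(source: str) -> int:
    """Return how many /* ... */ comments are still open at the end of source.

    Every "/*" opens a new nesting level and every "*/" closes one,
    mirroring the nesting rule quoted above (illustrative only).
    """
    depth = 0
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            i += 1
    return depth


# The snippet from the question: the inner "/*" opens a second level.
question_snippet = "/**\n* /*\n*/\nclass Blah {}"
# Assumed workaround: the slash is replaced by an HTML entity.
escaped_snippet = "/**\n* &#47;*\n*/\nclass Blah {}"

print(unclosed_comment_depth(question_snippet))
print(unclosed_comment_depth(escaped_snippet))
```

The first call reports one comment still open, which corresponds to the "unclosed comment" error; the second reports none, because the entity no longer forms a `/*` pair.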

Use "\return" as "\brief"

For very simple functions I would like to only have a \return section, but still show it as brief. (How) Can this be done?
For example:
/**
\return The distance.
*/
template <typename R = int64_t>
R distance(const pcg32 &other);
This generates no \brief in the docs, whereas this, for example, does:
/**
Multi-step advance function (jump-ahead, jump-back).
\param distance the distance.
*/
template <typename T>
void advance(T distance);
See this screenshot:
I believe that the brief section is meant to hold single paragraphs, while the \return command actually generates a header along with a paragraph.
To avoid changing any global setting such as REPEAT_BRIEF, I'd suggest a simple alias for this case.
Add the following to the Doxygen ALIASES configuration:
"briefreturn=**Returns** "
Use in your description comment like so:
/** \briefreturn the value that it returns */
int foo();
This would give you a one-line brief that is similar to the multi-line output from \return.
Modifying the macro, you can achieve other behaviors as desired, e.g., forcing generation of a details section, even when no other details supplied but the return:
"briefreturn{1}=**Returns** \1 \details. "
This can still be followed by details in the normal manner. Note that the details section will then start with a leading ".", though. (Perhaps some other non-whitespace character that is less obtrusive at the start of a sentence can be used instead.)

.tmlanguage escape sequences and rule priorities

I'm implementing a syntax highlighter in Apple's Swift language by parsing .tmlanguage files and applying styles to an NSMutableAttributedString.
I'm testing with JavaScript code, a javascript.tmlanguage file, and the monokai.tmtheme theme (the latter two both included in Sublime Text 3) to check that the syntax gets highlighted correctly. By applying each rule (pattern) in the .tmlanguage file in the order in which it appears, the syntax is almost perfectly highlighted.
The problem I'm having right now is that I don't know how to tell that a quote (") should be treated as escaped when it has a backslash before it (\"). Am I missing something in the .tmlanguage file that specifies that? The other problem is that I have no idea how to tell that some rules should be ignored when inside others, for example:
I'm getting double slashes taken as comments when inside strings: "http://stackoverflow.com/" a url is recognised as comment after //
Also double or single quotes are taken as strings when inside comments: // press "Enter" to continue, the word "Enter" gets highlighted as string when should be same color as comments
So, I don't know if there is some priority for some rules over others in the convention, or if there is something in the files that I haven't noticed.
Help please!
Update:
Here is a better example of what I meant by escape quotes:
I'm getting this [screenshot not shown], while all the letters should be yellow except for the escaped sequence (/"), which should be blue.
The question is: how do I know that /" should be escaped? The rule for that piece of code is:
Maybe I am late to answer this, but you can try the following approaches.
(Ugly) In your end regex, use ([^/])(") and set your endCaptures to
1 = string.quote.double.js
2 = punctuation.definition.string.end.js
If the string must fit on a single line, you can use match=(")(.*)(") with captures=
1 = punctuation.definition.string.begin.js
2 = string.quote.double.js
3 = punctuation.definition.string.end.js
and use your patterns.
You can also try applyEndPatternLast and see if it is allowed; setting applyEndPatternLast=1 will do.
The priority is that earlier rules in the file are prioritized over later rules. As an example, in my Python Improved language definition, I have a scope that contains a series of all-caps constants used in Django, a popular Python web framework. I also have a generic constant.other.allcaps.python scope that recognizes (just about) anything in all caps. Since the Django constants rule is before the allcaps rule in the .tmLanguage file, I can color it with a theme using one color, while the later-occurring "highlight everything in all caps" only grabs identifiers that are NOT part of the first list.
Because of this, you should put your "comments" scope(s) as early in the file as possible, then write your parser in such a way that it obeys the rule I described above. However, it's slightly more complicated than that, as I believe items in the repository are prioritized based on where their include line is, not where the repository rule is defined in the file. You may want to do some testing to verify that, though.
Unfortunately I'm not sure what you mean about the escaped quotes - could you expand on that, and maybe add an example or two?
Hope this helps.
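As a rough illustration of how rule order can interact with nested constructs, here is a small Python sketch. It assumes a model in which the rule whose match starts earliest wins and file order only breaks ties; that model, the two scope names, and the function name `highlight` are assumptions for this example and should be checked against the TextMate grammar documentation rather than taken as its exact algorithm.

```python
import re

# Hypothetical rule list: (scope name, pattern), in priority order.
RULES = [
    ("comment.line", re.compile(r"//.*")),
    # A double-quoted string; "\\." consumes any backslash escape first.
    ("string.quoted.double", re.compile(r'"(?:\\.|[^"\\])*"')),
]


def highlight(line: str):
    """Return (scope, text) pairs for one line.

    At each step the rule whose match starts earliest wins; when two rules
    start at the same position, the one listed first wins. Text consumed by
    a match is never rescanned, so a double slash inside a string stays
    part of the string, and quotes inside a comment stay part of the comment.
    """
    tokens = []
    pos = 0
    while pos < len(line):
        best = None
        for scope, pattern in RULES:
            m = pattern.search(line, pos)
            # Strict "<" keeps the earlier rule when start positions tie.
            if m and (best is None or m.start() < best[1].start()):
                best = (scope, m)
        if best is None:
            break
        scope, m = best
        tokens.append((scope, m.group()))
        pos = m.end()
    return tokens


demo = 'open("http://stackoverflow.com/") // press "Enter" to continue'
for scope, text in highlight(demo):
    print(scope, text)
```

With this model the URL is reported as one string token and the trailing text as one comment token, which are the two cases described in the question.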
Assuming that / is the correct character for escaping a double quote mark, the following should work:
"str_double_quote": {
    "begin": "\"",
    "end": "\"",
    "name": "string.quoted.double.swift",
    "patterns": [
        {
            "name": "constant.character.escape.swift",
            "match": "/[\"/]"
        }
    ]
}
You can match an escaped double quote mark (/") and a literal forward slash (//) in the patterns to consume them before the end marker is used to handle them.
If the character for escaping is actually a backslash, then the tricky bit is that there are two levels of escaping, for the JSON encoding as well as the regular expression syntax. To match \", the regular expression requires you to escape the backslash (\\"). JSON requires you to escape backslashes and double quotes, resulting in \\\\\" in a TextMate JSON grammar file. The match expression would thus be \\\\[\"\\\\].
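The two levels of escaping can be checked with a short Python sketch. It uses Python's `json` and `re` modules as stand-ins for the grammar loader and the regex engine, which is an assumption: TextMate-style grammars use a different regex engine, although it should agree with Python for a pattern this simple.

```python
import json
import re

# Exactly what would be typed in the grammar file, surrounding quotes included.
json_text = r'"\\\\[\"\\\\]"'

# Level 1: JSON decoding turns each \\ into \ and \" into ".
regex_source = json.loads(json_text)
print(regex_source)

# Level 2: the regex engine reads \\ as one literal backslash,
# followed by a character class containing a quote or a backslash.
escape = re.compile(regex_source)

print(bool(escape.fullmatch(r'\"')))    # backslash + double quote
print(bool(escape.fullmatch('\\\\')))   # backslash + backslash
print(bool(escape.fullmatch('"')))      # a bare quote is not an escape
```

Decoding the JSON text yields the seven-character pattern `\\["\\]`, which matches an escaped quote or an escaped backslash but not a bare quote.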

Change the multi-line comment characters in Xtext

I want a Xtext grammar that allows me to write MIME media types this way:
mediaType application/atom+xml
specURL "http://www.rfc-editor.org/rfc/rfc4287.txt",
This is not a problem, but the following is:
mediaType application/*
specURL "http://www.iana.org/assignments/media-types/application",
You can guess the trouble ahead with the /* characters, which usually start a multi-line comment. The terminal for it is defined in the default Terminals provided by Xtext, more specifically in the ML_COMMENT terminal:
terminal ML_COMMENT : '/*' -> '*/';
I customized it by copying the default terminals to a new one of my own, where the ML_COMMENT terminal is defined this way instead:
terminal ML_COMMENT : '"""' -> '"""';
This gives a more Pythonic way to write multi-line comments, and it works fine in the generated DSL. But the /* characters still pose a problem when I try to define the media type for application/*, as shown above. I get the error message mismatched input '/*' expecting '}' (the } character would mark the end of the media types listing).
Even more troubling, the content assist of the Xtext editor still auto-inserts a closing */ when I type the (supposedly obsolete) /* combination. Since I overrode the multi-line comment terminal, I am wondering why auto-completion still follows the older ML_COMMENT definition. Do I need to override something else?
Here are some fragments for the media type grammar:
MediaType returns restapi::MediaType:
    {restapi::MediaType}
    'mediaType' name=MediaTypeQualifier ('specURL' specURL=EString)?;
MediaTypeQualifier:
    MediaTypeFragment ('/' MediaTypeFragment)? (';' MediaTypeFragment '=' MediaTypeFragment)*;
MediaTypeFragment:
    (ID ( ('-'|'+'|'.') ID )* ) | '*';
I am using Xtext version 2.3.1 within Eclipse 4.2.2. Does anyone have experience with overriding the multi-line comment terminal? Is there something that I missed?
It's hard to tell from the grammar snippet that you provided, but it appears to me that you still have a keyword /* somewhere in your grammar.

"Unrecognized rule" error in lex program

I am writing a lex program. The objective is to enter a string (letters and other characters) and have the program return the length of that string.
Here is the code:
letter ([a-z]|[A-Z])
carac (•|¤|¶|§|à|î|ì|Ä|Å|É|æ|Æ|ô|ö|ò|û|ù|ÿ|Ö|Ü|ø|£|Ø|×|ƒ|á|í|ó|ú|ñ|Ñ|ª|º|¿|®|¬|½|¼|¡|:|;|.|,|/|?|=|-|!|*|£|µ|^|¨|%)
String {letter}({letter}|{carac})*
%%
{String} printf("[%d] : The number of your String \n",yyleng);
.* printf("You have a problem somewhere !");
%%
int yywrap(){return 1;}
main ()
{
yylex ();
}
And the output:
(The answers are contained in the comments, which I am including here. See Question with no answers, but issue solved in the comments (or extended in chat) ).
@Thomas Padron-McCarthy and @David Gorsline are correct:
It is likely that Flex doesn't understand the character encoding of your input file. As far as I know, Flex still only understands single-byte characters.
To amplify Thomas's comment: try a simpler version of the program, where you define carac as carac (:|;|.|,|/|?|=|-|!|^|%).
You may need to quote special characters: carac (\:|\;|\.|\,|\/|\?|\=|\-|\!|\^|\%) Or use the character class notation: carac [-:;.,/?=!^%]
To confirm this I applied these edits and ran it through flex. The following does not give flex errors:
carac (\•|\¤|\¶|\§|\à|\î|\ì|\Ä|\Å|\É|\æ|\Æ|\ô|\ö|\ò|\û|\ù|\ÿ|\Ö|\Ü|\ø|\£|\Ø|\×|\ƒ|\á|\í|\ó|\ú|\ñ|\Ñ|\ª|\º|\¿|\®|\¬|\½|\¼|\¡|\:|\;|\.|\,|\/|\?|\=|\-|\!|\*|\£|\µ|\^|\¨|\%)
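The point about single-byte characters can be illustrated with a short Python sketch. It assumes the lex source file is saved as UTF-8, which the question does not state; the helper name `utf8_width` is made up for this example.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# ":" is plain ASCII; "\u00e0" is a-grave and "\u2022" is the bullet
# character, both taken from the carac definition in the question.
for ch in [":", "\u00e0", "\u2022"]:
    print(repr(ch), utf8_width(ch), ch.encode("utf-8"))
```

Under that assumption, a scanner that works byte by byte sees each non-ASCII character in carac as a sequence of two or three separate bytes rather than as one character, which is consistent with the encoding explanation given in the comments.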

Force CL-Lex to read whole word

I'm using CL-Lex to implement a lexer (as input for CL-YACC) and my language has several keywords such as "let" and "in". However, while the lexer recognizes such keywords, it does too much. When it finds words such as "init", it returns the first token as IN, while it should return a "CONST" token for the "init" word.
This is a simple version of the lexer:
(define-string-lexer lexer
  (...)
  ("in" (return (values :in $@)))
  ("[a-z]([a-z]|[A-Z]|\_)" (return (values :const $@))))
How do I force the lexer to fully read the whole word until some whitespace appears?
This is both a correction of Kaz's errors, and a vote of confidence for the OP.
In his original response, Kaz states the order of Unix lex precedence exactly backward. From the lex documentation:
Lex can handle ambiguous specifications. When more than one expression can
match the current input, Lex chooses as follows:
The longest match is preferred.
Among rules which matched the same number of characters, the rule given
first is preferred.
In addition, Kaz is wrong to criticize the OP's solution of using Perl-regex word-boundary matching. As it happens, you are allowed (free of tormenting guilt) to match words in any way that your lexer generator supports. CL-LEX uses Perl regexes, which offer \b as a convenient syntax for the more cumbersome lex approximation:
%{
#include <stdio.h>
%}
WC [A-Za-z']
NW [^A-Za-z']
%start INW NIW
%%
{WC} { BEGIN INW; REJECT; }
{NW} { BEGIN NIW; REJECT; }
<INW>a { printf("'a' in word\n"); }
<NIW>a { printf("'a' not in word\n"); }
All things being equal, finding a way to unambiguously match his words is probably better than the alternative.
Despite Kaz wanting to slap him, the OP has answered his own question correctly, coming up with a solution that takes advantage of the flexibility of his chosen lexer generator.
Your example lexer above has two rules, both of which match a sequence of exactly two characters. Moreover, they have common matches (the language matched by the second is a strict superset of the first).
In the classic Unix lex, if two rules both match the same length of input, precedence is given to the rule which occurs first in the specification. Otherwise, the longest possible match dominates.
(Although without RTFM, I can't say that that is what happens in CL-LEX, it does make a plausible hypothesis of what is happening in this case.)
It looks like you're missing a regex Kleene operator to match a longer token in the second rule.
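The classic lex rule described above (longest match first, earliest rule on ties) can be sketched in Python as follows. This models Unix lex only; whether CL-LEX follows the same rule is not confirmed here. The second pattern adds the Kleene star suggested above, and the function name `tokenize` is made up for this example.

```python
import re

# Rules in specification order: the keyword comes before the general pattern.
RULES = [
    ("IN", re.compile(r"in")),
    # Kleene star added so the rule can match a whole word.
    ("CONST", re.compile(r"[a-z][a-zA-Z_]*")),
]


def tokenize(text: str):
    """Split text into (token name, lexeme) pairs using longest-match-wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict ">" means a later rule only wins with a longer match,
            # so the earlier rule is kept when the lengths are equal.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("in init"))
```

Here "in" is tokenized as IN because both rules match two characters and IN comes first, while "init" becomes a single CONST token because the second rule matches more characters.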