.tmlanguage escape sequences and rule priorities - swift

I'm implementing a syntax highlighter in Apple's Swift language by parsing .tmlanguage files and applying styles to a NSMutableAttributtedString.
I'm testing with javascript code, a javascript.tmlanguage file, and the monokai.tmtheme theme (both last included in sublime text 3) to check that the syntax get highlighted correctly. By applying each rule (patterns) in the .tmlanguage file in the same order they come, the syntax is almost perfectly highlighted.
The problem I'm having right now is that I don't know how to know that a quote (") should be escaped when it has a backslash before it (\"). Am I missing something in the .tmlanguage file that specifies that?. Other problem is that I have no idea how to know that other rules should be ignored when inside others, for example:
I'm getting double slashes taken as comments when inside strings: "http://stackoverflow.com/" a url is recognised as comment after //
Also double or single quotes are taken as strings when inside comments: // press "Enter" to continue, the word "Enter" gets highlighted as string when should be same color as comments
So, I don't know if there is some priority for some rules over others in the convention, or if there is something in the files that I haven't noticed.
Help please!
Update:
Here is a better example of what I meant by escape quotes:
I'm getting this: while all the letters should be yellow except for the escaped sequence (/") which should be blue.
The question is. How do I know that /" should be escaped? The rule for that piece of code is:

Maybe I am late to answer this. You can apply the following method.
(Ugly) In your end regex, use ([^/])(") and in your endCaptures, it would be
1 = string.quote.double.js
2 = punctuation.definition.string.end.js
If the string must be single line, you can use match=(")(.*)("), captures=
1 = punctuation.definition.string.begin.js
2 = string.quote.double.js
3 = punctuation.definition.string.end.js
and use your patterns
You can try applyEndPatternLast and see if it is allowed. Set applyEndPatternLast=1 will do.

The priority is that earlier rules in the file are prioritized over later rules. As an example, in my Python Improved language definition, I have a scope that contains a series of all-caps constants used in Django, a popular Python web framework. I also have a generic constant.other.allcaps.python scope that recognizes (just about) anything in all caps. Since the Django constants rule is before the allcaps rule in the .tmLanguage file, I can color it with a theme using one color, while the later-occurring "highlight everything in all caps" only grabs identifiers that are NOT part of the first list.
Because of this, you should put your "comments" scope(s) as early in the file as possible, then write your parser in such a way that it obeys the rule I described above. However, it's slightly more complicated than that, as I believe items in the repository are prioritized based on where their include line is, not where the repository rule is defined in the file. You may want to do some testing to verify that, though.
Unfortunately I'm not sure what you mean about the escaped quotes - could you expand on that, and maybe add an example or two?
Hope this helps.

Assuming that / is the correct character for escaping a double quote mark, the following should work:
"str_double_quote": {
"begin": "\"",
"end": "\"",
"name": "string.quoted.double.swift",
"patterns": [
{
"name": "constant.character.escape.swift",
"match": "/[\"/]"
}
]
}
You can match an escaped double quote mark (/") and a literal forward slash (//) in the patterns to consume them before the end marker is used to handle them.
If the character for escaping is actually a backslash, then the tricky bit is that there are two levels of escaping, for the JSON encoding as well as the regular expression syntax. To match \", the regular expression requires you to escape the backslash (\\"). JSON requires you to escape backslashes and double quotes, resulting in \\\\\" in a TextMate JSON grammar file. The match expression would thus be \\\\[\"\\\\].

Related

What characters are allowed in the name of a rule in Drools?

I haven't been able to find in Drools documentation, which characters (beyond alphabet letters) are allowed/disallowed in a rule name in Drools - does anyone know or have a reference?
The only relevant section of Drools doc I've found so far does not specify:
Each rule must have a unique name within the rule package. If you use the same rule name more than once in any DRL file in the package, the rules fail to compile. Always enclose rule names with double quotation marks (rule "rule name") to prevent possible compilation errors, especially if you use spaces in rule names.
I think I have discovered, anecdotally, that some "grouping" characters do not work in rule names (seems rules named with can't be found or aren't included) - or at least, in extension rules (the extended rule seems to work with grouping chars, but not its extension; example below): The grouping chars include parentheses "()", square brackets "[]", and "curly braces" "{}". Although less than & greater than "<>" work, so I'm so far replacing the former with the latter.
Or are there escape chars for the problematic grouping chars?
Example:
rule "(grouping chars, and commas, work here)"
when
// conditions LHS
then
end
// removing parentheses, or replacing with < >,
// from below line works
rule "(grouping chars DON'T work here)"
extends "(grouping chars, and commas, work here)"
when
then
// consequences RHS
I haven't discovered either way yet with all other characters (for example, other punctuation; except I have discovered commas "," work). But it would be nice to know ahead of time what characters are allowed.
Theoretically every identifier inside a string should work, but you might have empirically found some combination that is breaking the grammar somehow.
Thanks for the investigation, I've filled a Jira, please take a look at it

Algolia tag not searchable when ending with special characters

I'm coming across a strange situation where I cannot search on string tags that end with a special character. So far I've tried ) and ].
For example, given a Fruit index with a record with a tag apple (red), if you query (using the JS library) with tagFilters: "apple (red)", no results will be returned even if there are records with this tag.
However, if you change the tag to apple (red (not ending with a special character), results will be returned.
Is this a known issue? Is there a way to get around this?
EDIT
I saw this FAQ on special characters. However, it seems as though even if I set () as separator characters to index that only effects the direct attriubtes that are searchable, not the tag. is this correct? can I change the separator characters to index on tags?
You should try using the array syntax for your tags:
tagFilters: ["apple (red)"]
The reason it is currently failing is because of the syntax of tagFilters. When you pass a string, it tries to parse it using a special syntax, documented here, where commas mean "AND" and parentheses delimit an "OR" group.
By the way, tagFilters is now deprecated for a much clearer syntax available with the filters parameter. For your specific example, you'd use it this way:
filters: '_tags:"apple (red)"'

Any way to extend org-mode tags to include characters like "-"?

BRIEF: can the characters permitted in org-mode tags be extenced in any way?
E.g. to include -, dashes?
DETAIL:
I see
http://orgmode.org/org.html#Tags
Tags are normal words containing letters, numbers, ‘_’, and ‘#’. Tags
must be preceded and followed by a single colon, e.g., ‘:work:’.
I am somewhat surprised that this is not extensible. Is it, and I have missed it?
TODO keywords can include dashes. Occasionally I would like to treat TODOs as intermiscible wth tags - e.g. move a TODO to a tag, and vice versa - but this syntax difference gets in the way.
Before I start coding, does anyone know why dashes are not allowed? I conjecture confusion with timestamps.
Not certain, but it is likely because certain agenda search strings use "-" as an operator, and there is not yet a way to escape those using "-".
I have used hyphens for properties and spaces for todo kw with no problems for my purposes, but I have not tried them in searches.
The new exporter will be included in 8.0 and with it a detailed description of syntax. If you want to file a feature request to allow hyphens, perhaps by allowing escaping in search strings, now is the time to do it.

Regex for strings in Bibtex

I've trying to write a Bibtex parser with flex/bison. Here are the rules for strings in bibtex:
Strings can be enclosed in double quotes "..." or in braces {...}
In a string, braces can be nested
Inside a string, the braces should be balanced (invalid string: {this is a { test})
Inside an "internet" {}, you can have any characters. So this string is valid: {This is a string {test"} and it is valid}
Any ideas on how to do this?
Now you're going into the field of a text parser. Surprisingly, nobody has made a bibtex library for Actionscript that I could find, so it's an interesting problem. If you do make one, do the community a favor and open source it :)
It won't be easy to do since you essentially have to go character by character and check for the chars that you need and do logic around that. However, I recommend you look at as3corelib's implementation of the JSON parser which is somewhat similar to what you're trying to accomplish. You'll at least get an idea of how to do it using a tokenizer and it's a very good start on your project.
Good luck.

Comment token inside string

In pig etc. /* begins a block comment. If I put this in a regex string 'blah/blah/*', emacs thinks this is a block comment and syntax highlighting goes to hell. I am not familiar with elisp but I am certain that is a problem with script that is providing annotations for pig.
How can I fix it?
phils pointed out a better designed major mode in the question comments, but since you are still curious: The pig mode version you are using doesn't have the syntax table set up right. The most reliable way for emacs to recognize comments and strings is to use the syntax table to map characters to start/end of comments and strings. The version you are using is trying to do it with font-lock.
You have to escape the \'es and the *. All the characters that are used by the regexp engine, have to be escaped.
If you want to match "\", you might have to write "\\" when using replace-regexp interactively and "\\\\" if you use it as a lisp function.
(I even have to escape my escapes in this comment, so there are 8 escapes in the last escape sequence above)