Comment token inside string - emacs

In Pig (and similar languages), /* begins a block comment. If I put it inside a regex string such as 'blah/blah/*', Emacs treats it as the start of a block comment and syntax highlighting breaks from there on. I am not familiar with elisp, but I am certain the problem lies in the script that provides the highlighting for Pig.
How can I fix it?

phils pointed out a better designed major mode in the question comments, but since you are still curious: The pig mode version you are using doesn't have the syntax table set up right. The most reliable way for emacs to recognize comments and strings is to use the syntax table to map characters to start/end of comments and strings. The version you are using is trying to do it with font-lock.

You have to escape the backslashes and the *. Every character that has a special meaning to the regexp engine has to be escaped.
If you want to match "\", you might have to write "\\" when using replace-regexp interactively and "\\\\" if you use it as a lisp function.
(I even have to escape my escapes in this comment, so there are 8 escapes in the last escape sequence above)

How to escape '|' in emacs's org-mode? [duplicate]

I've got a table in Emacs org-mode whose contents are regular expressions. I can't figure out how to escape a literal pipe character (|) that is part of a regex, so it gets interpreted as a table-cell separator. Could someone point me to some help? Thanks.
Update: I'm also looking for escapes for a slash (/), so that it doesn't trigger the start of an italic/emphasis sequence. I experimented with \/ and \// - for example, suppose I want the literal text /foo/ in a table cell. Here are 3 ways of attempting it:
| /foo/ | \/foo/ | \//foo/ |
In LaTeX export, that becomes:
\emph{foo} & \/foo/ & \//foo/
So none of them is the plain /foo/ I'm hoping for.

Use \vert for the pipe.
Forward slashes seem to work fine for me unescaped when exporting both to HTML and PDF.

Use a broken-bar character, “¦”, Unicode 00A6 BROKEN BAR. This may or may not work for your specific needs, but it’s a good visual approximation.

You could also format the relevant text as verbatim or code:
Text in the code and verbatim string is not processed for Org mode
specific syntax; it is exported verbatim.
So you might try something like =foo | bar= (verbatim) or foo ~|~ bar (code). It does change the output format, though.

.tmlanguage escape sequences and rule priorities

I'm implementing a syntax highlighter in Apple's Swift language by parsing .tmlanguage files and applying styles to an NSMutableAttributedString.
I'm testing with JavaScript code, a javascript.tmlanguage file, and the monokai.tmtheme theme (the last two ship with Sublime Text 3) to check that the syntax gets highlighted correctly. By applying each rule (pattern) in the .tmlanguage file in the order it appears, the syntax is almost perfectly highlighted.
The problem I'm having right now is that I don't know how to tell that a quote (") is escaped when it has a backslash before it (\"). Am I missing something in the .tmlanguage file that specifies that? The other problem is that I have no idea how to tell that some rules should be ignored inside others, for example:
I'm getting double slashes taken as comments when inside strings: "http://stackoverflow.com/" a url is recognised as comment after //
Also double or single quotes are taken as strings when inside comments: // press "Enter" to continue, the word "Enter" gets highlighted as string when should be same color as comments
So, I don't know if there is some priority for some rules over others in the convention, or if there is something in the files that I haven't noticed.
Help please!
Update:
Here is a better example of what I meant by escape quotes:
I'm getting this: while all the letters should be yellow except for the escaped sequence (/") which should be blue.
The question is. How do I know that /" should be escaped? The rule for that piece of code is:

Maybe I am late to answer this. You can apply the following method.
(Ugly) In your end regex, use ([^/])(") and in your endCaptures, it would be
1 = string.quote.double.js
2 = punctuation.definition.string.end.js
If the string must be single line, you can use match=(")(.*)("), captures=
1 = punctuation.definition.string.begin.js
2 = string.quote.double.js
3 = punctuation.definition.string.end.js
and use your patterns
You can try applyEndPatternLast and see if it is allowed. Setting applyEndPatternLast=1 should do.

The priority is that earlier rules in the file are prioritized over later rules. As an example, in my Python Improved language definition, I have a scope that contains a series of all-caps constants used in Django, a popular Python web framework. I also have a generic constant.other.allcaps.python scope that recognizes (just about) anything in all caps. Since the Django constants rule is before the allcaps rule in the .tmLanguage file, I can color it with a theme using one color, while the later-occurring "highlight everything in all caps" only grabs identifiers that are NOT part of the first list.
Because of this, you should put your "comments" scope(s) as early in the file as possible, then write your parser in such a way that it obeys the rule I described above. However, it's slightly more complicated than that, as I believe items in the repository are prioritized based on where their include line is, not where the repository rule is defined in the file. You may want to do some testing to verify that, though.
Unfortunately I'm not sure what you mean about the escaped quotes - could you expand on that, and maybe add an example or two?
Hope this helps.
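As a rough illustration of why rule order and match position matter, here is a minimal Python sketch of a rule-driven scanner. It is a simplification and not TextMate's actual engine: the RULES list and the tokenize function are hypothetical names, and the assumption is that at each position the rule whose match starts earliest wins, with the order in the list breaking ties. Because each rule consumes its whole token, a // inside a string or a " inside a comment is never seen by the other rule.

```python
import re

# Hypothetical rule list: (scope name, pattern). List order only breaks ties.
RULES = [
    ("comment.line", re.compile(r"//[^\n]*")),
    # A double-quoted string; "\\." consumes any escaped character first,
    # so an escaped quote cannot terminate the string.
    ("string.quoted.double", re.compile(r'"(?:\\.|[^"\\\n])*"')),
]

def tokenize(text):
    """Return (scope, lexeme) pairs, always taking the leftmost match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.search(text, pos)
            # Strict "<" keeps the earlier rule when two matches start together.
            if m and (best is None or m.start() < best[1].start()):
                best = (name, m)
        if best is None:
            break  # no rule matches the rest of the text
        name, m = best
        tokens.append((name, m.group(0)))
        pos = m.end()  # resume scanning after the consumed token
    return tokens

sample = 'var u = "http://stackoverflow.com/"; // press "Enter" to continue'
for token in tokenize(sample):
    print(token)
```

The URL stays inside the string token and "Enter" stays inside the comment token, which is the behaviour the question asks for.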

Assuming that / is the correct character for escaping a double quote mark, the following should work:
"str_double_quote": {
"begin": "\"",
"end": "\"",
"name": "string.quoted.double.swift",
"patterns": [
{
"name": "constant.character.escape.swift",
"match": "/[\"/]"
}
]
}
Matching an escaped double quote mark (/") and a literal forward slash (//) in the patterns consumes them before the end pattern gets a chance to match the quote.
If the character for escaping is actually a backslash, then the tricky bit is that there are two levels of escaping, for the JSON encoding as well as the regular expression syntax. To match \", the regular expression requires you to escape the backslash (\\"). JSON requires you to escape backslashes and double quotes, resulting in \\\\\" in a TextMate JSON grammar file. The match expression would thus be \\\\[\"\\\\].
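The two escaping layers can be checked mechanically. The Python sketch below is only an illustration: it assumes that Python's json and re modules behave like a grammar loader and its regex engine for this simple pattern. It decodes the JSON string first and then uses the result as a regular expression.

```python
import json
import re

# The text exactly as it would appear in the JSON grammar file, quotes included.
json_text = r'"\\\\[\"\\\\]"'

# Layer 1: JSON decoding turns each \\ into \ and \" into ".
regex_source = json.loads(json_text)
print(regex_source)  # \\["\\]

# Layer 2: the regex engine reads \\ as one literal backslash, followed by
# a character class containing a double quote and a backslash.
escape = re.compile(regex_source)

sample = r'say \"hi\" and \\ done'
print(escape.findall(sample))
```

The pattern finds both escaped quotes and the escaped backslash in the sample, and nothing else.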

How to escape double quote?

In org mode, I want to format text as monospace verbatim, i.e. ~...~. If the text is inside quotes, as in ~"..."~, it is not formatted (it is left as is).
Also, are quotes a reserved symbol, if so, what do they mean? (they don't seem to affect the generated HTML / inside Emacs display).

The culprit in this case is the regular expression in org-emph-re / org-verbatim-re, which determines whether a sequence of characters in the document is set verbatim or not.
org-verbatim-re is a variable defined in `org.el'.
Its value is
"\([ ('\"{]\|^\)\(\([=~]\)\([^
\n,\"']\|[^
\n,\"'].?\(?:\n.?\)\{0,1\}[^
\n,\"']\)\3\)\([- .,:!?;'\")}\]\|$\)"
Single and double quotes are explicitly forbidden directly inside the verbatim markers = and ~ by
[^
\n,\"']\|[^
\n,\"']
I found discussions dating back 3 years that came to the conclusion that you have to tinker with this regular expression and set the variable org-emph-re/org-verbatim-re to something that matches your wishes in your emacs setup (maybe a file local variable works as well). You can experiment by removing double quotes from the excluding character classes and from the outside matches, as in
"\([ ('{]\|^\)\(\([*/_=~+]\)\([^
\n,']\|[^
\n,'].?\(?:\n.?\)\{0,1\}[^
\n,']\)\3\)\([- .,:!?;')}\]\|$\)"
but looking at that regex, heaven knows what happens to complex documents -- you have to try...
Edit: as it happens, if I evaluate the following as a region, quotes inside = are exported correctly, but nothing else is :-). I'll investigate further when I have more time.
(setq org-emph-re "\([ ('{]\|^\)\(\([*/_=~+]\)\([^
\n,']\|[^
\n,'].?\(?:\n.?\)\{0,1\}[^
\n,']\)\3\)\([- .,:!?;')}]\|$\)")
Edit 2: Got it to work by changing org.el directly:
Change the line following (defvar org-emphasis-regexp-components from '(" \t('\"{" "- \t.,:!?;'\")}\\" " \t\r\n,\"'" "." 1) to '(" \t('{" "- \t.,:!?;')}\\" " \t\r\n,'" "." 1) and recompile org then restart emacs.
This was a defcustom prior to the 8.0 release, it isn't anymore, so you have to live with this manual modification.
regards,
Tom

Finally, I found a solution from http://comments.gmane.org/gmane.emacs.orgmode/82571
According to that thread, the regexp for verbatim is built from the variable org-emphasis-regexp-components, which defines the characters that are legal before, after, at the border of, or in the body of emphasis; verbatim is one of the emphasis environments in org mode.
A workable setting given by that thread:
(setcar (nthcdr 2 org-emphasis-regexp-components) " \t\n,")
(custom-set-variables `(org-emphasis-alist ',org-emphasis-alist))
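To see why changing the border component helps, here is a simplified Python translation of how such a regex can be assembled from the components. This is a sketch, not Org's code: build_emphasis_re and char_class are hypothetical names, the markers are limited to = and ~, and the component strings are the ones quoted in this thread.

```python
import re

def char_class(chars, negate=False):
    """Build a [...] class in which every character is taken literally."""
    inner = "".join(re.escape(c) for c in chars)
    return "[" + ("^" if negate else "") + inner + "]"

def build_emphasis_re(pre, post, border, body=".", newlines=1, markers="=~"):
    """Assemble a verbatim regex from Org-style components (simplified)."""
    nb = char_class(border, negate=True)  # any character allowed at the border
    pattern = (
        f"(?:^|{char_class(pre)})"           # what may precede the marker
        f"(({char_class(markers)})"          # group 2: the marker itself
        f"(?:{nb}|{nb}{body}*?(?:\n{body}*?){{0,{newlines}}}{nb})"
        r"\2)"                               # the same marker closes the span
        f"(?:{char_class(post)}|$)"          # what may follow the marker
    )
    return re.compile(pattern, re.MULTILINE)

PRE = " \t('\"{"
POST = "- \t.,:!?;'\")}\\"

default = build_emphasis_re(PRE, POST, border=" \t\r\n,\"'")
relaxed = build_emphasis_re(PRE, POST, border=" \t\n,")

text = 'say ~"foo"~ now'
print(default.search(text))           # None: a quote is not allowed at the border
print(relaxed.search(text).group(1))  # ~"foo"~
```

With the default border the quoted text is rejected, while the relaxed border from the thread accepts it; unquoted text such as ~foo~ matches in both cases.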

For a small number of characters that have an unwanted effect in Emacs org-mode (because they are metacharacters), it may help to have a look at the special symbols in org-mode (org-entities.el).
So for example " can be encoded by \quot{} (where the pair of braces at the end is not mandatory, but needed if no whitespace follows).
So instead of ="..."= you would write =\quot{}...\quot{}=.
That is more typing and looks pretty ugly. But for the latter org-mode has a solution: with C-c C-x \ you can toggle a display magic for those symbols. If the magic is active, a " is displayed directly after you type \quot{}.
Besides, this symbols list can easily be extended, for example:
(add-to-list 'org-entities
'("backslash" "\\textbackslash" nil "\\" "\\" "\\" "\\"))
Nevertheless, I sorely miss easier escaping in org-mode, beyond the above solution and beyond escaping a whole line by a : at its beginning.
I'd be happy if =verbatim= left the text between the ='s unchanged in all cases. Not =this*bold*text=, but =this *bold* text=, as we know it from every well-designed markup or markdown language.
But, of course, this is better placed at the org-mode development pages. Ideally with a fitting patch... :-)

I ran into a similar problem, and thanks to @chaiko for a basic solution. However, @chaiko's solution only works for org-mode's fontification; it doesn't affect org-export. To get a correctly exported document, you additionally need to patch org-mode's parser by calling (org-element--set-regexps).
So the full code snippet should be something like:
(setcar (nthcdr 2 org-emphasis-regexp-components) " \t\n\r")
(custom-set-variables `(org-emphasis-alist ',org-emphasis-alist))
(org-element--set-regexps)
I've integrated this to my oh-my-emacs project: https://github.com/xiaohanyu/oh-my-emacs/blob/e82fce10d47f7256df6d39e32ca288d0ec97a764/core/ome-org.org#code-block-fontification .

How to use '^@' in Vim scripts?

I'm trying to work around a problem with using ^@ (i.e., <ctrl-@>) characters in Vim scripts. I can insert them into a script, but when the script runs it seems the line is truncated at the point where a ^@ was located.
My kludgy solution so far is to have a ^@ stored in a variable, then reference the variable in the script whenever I would have quoted a literal ^@. Can someone tell me what's going on here? Is there a better way around this problem?

That is one reason why I never use raw special character values in scripts. While ^@ does not work, the string <C-@> in mappings works as expected, so you may use one of
nnoremap <C-@> {rhs}
nnoremap <Nul> {rhs}
It is strange, but you cannot use <Char-0x0> here. Some notes about the null byte in strings:
Inserting a null byte into a string truncates it: vim uses old C-style strings that end with a null byte, thus it cannot appear in strings. These strings are very inefficient, so if you want to generate a very large text, try accumulating it into a list of lines (using setline is very fast as the buffer is represented as a list of lines).
Most functions that return a list of strings (like readfile, getline(start, end)) or take a list of strings (like writefile, setline, append) treat \n (NL) as Null. It is also the internal representation of buffer lines, see :h NL-used-for-Nul.
If you try to insert a \n character into the command-line, you will get Null shown (but this is really a newline). If you want to edit a file that has \n in its filename (it is possible on *nix), you will need to prepend the newline with a backslash.

The byte ctrl-@ is also known as '\0'. Many languages, programs, etc. use it as an "end of string" marker, so it's not surprising that vim gets confused there. If you must use this byte in the middle of a script string, it sounds like your workaround is a decent one.
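The truncation is easy to reproduce outside Vim. The following Python sketch uses ctypes to look at the same bytes once as a raw buffer and once as a C-style string; it illustrates NUL-terminated strings in general and makes no claim about Vim's internals.

```python
import ctypes

data = b"before\x00after"

# create_string_buffer copies the bytes and appends a terminating NUL.
buf = ctypes.create_string_buffer(data)

# .raw exposes the whole buffer, embedded NUL included.
print(buf.raw)

# .value reads it the way C string functions do: up to the first NUL.
print(buf.value)  # b'before'
```

Everything after the first null byte is still in memory, but code that treats the buffer as a C string never sees it.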

ack-grep: chars escaping

My goal is to find all "<?=" occurrences with ack. How can I do that?
ack "<?="
This doesn't work. How can I fix the escaping here?

Since ack uses Perl regular expressions, your problem stems from the fact that in Perl regex syntax, ? is a special character meaning "the preceding item is optional". So what you are actually grepping for is = preceded by an optional <.
So you need to escape the ? if that's just meant to be a regular character.
To escape, there are two approaches - either <\?= or <[?]=; some people find the second form of escaping (putting a special character into a character class) more readable than backslash-escape.
UPDATE As Josh Kelley graciously added in the comment, a third form of escaping is to use the \Q operator which escapes all the following special characters till \E is encountered, as follows: \Q<?=\E
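The difference between the escaped and unescaped forms can be tried out in any regex engine. Here is a small sketch using Python's re module as a stand-in for Perl, which is an assumption, although the two agree on these constructs. Python has no \Q...\E, so re.escape plays that role here.

```python
import re

text = 'x = 1; <?= $name ?>'

# Unescaped: "?" makes "<" optional, so only the "=" signs are matched.
print(re.findall(r"<?=", text))

# Backslash escape and character class both match the literal "<?=".
print(re.findall(r"<\?=", text))
print(re.findall(r"<[?]=", text))

# re.escape quotes every special character for you.
literal = re.compile(re.escape("<?="))
print(literal.findall(text))
```

The first pattern never matches "<?=" as a whole, because after consuming < it expects = and finds ? instead.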

Rather than trying to remember which characters have to be escaped, you can use -Q to quote everything that needs to be quoted.

ack -Q "<?="
This is the best solution if you want to search for plain text (that is, if you don't need a regular expression).

ack "<\?="
? is a regex operator, so it needs escaping
