Problem with RegExKitLite and ampersands - iphone

So I'm trying to rip URLs from an NSString using RegExKitLite and I came across an odd problem.
NSLog(@"Array: %@", [message componentsMatchedByRegex:@"^(http://)[-a-zA-Z0-9+&@#/%?=~_()|!:,.;]*"]);
NSString *message is just some text with a URL in it. The strange thing is that the regex doesn't work with the ampersand in it. If I take the ampersand out it works fine, but for obvious reasons I want to keep it in. I'm also a regex newb, so don't bash my search expression too much :)
Anyone experience this before with RegExKitLite or RegEx in general in Objective-C?

In ICU regular expression character classes, & means intersection, for example @"[[:letter:] & [a-z]]". So it needs to be quoted with a backslash, as Peter suggests, i.e. \& in the regular expression. However, \ has a special meaning in C strings, including Objective-C strings, so the \ itself has to be quoted as \\. So you need \\& in your pattern, i.e. [-a-zA-Z0-9+\\&@#/%?=~_()|!:,.;]
Also, I'm not sure what your intention is with the ^ at the start of the URL. If you want the regex to match anywhere in the string, you should use \b (word break) instead. If you want it to match URLs that are only at the start of the message, then you would only ever get a single match as written. If you want it to match URLs that are at the start of a line, then add (?m) at the start of the regex to turn on multiline matching for ^ (and consider adding $ to the end of the regex).
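To make the three anchoring options concrete, here is a minimal sketch using Python's re module rather than ICU. That is an assumption made purely for illustration: in Python a single & inside a character class is not special, so no extra escaping is needed there, and the sample message and hosts are made up.

```python
import re

# Hypothetical message: one URL mid-line, one at the start of the second line.
message = "see http://a.example/x?p=1&q=2\nhttp://b.example/ is second"

# Same character class as in the question (with the @ restored).
url_chars = r"[-a-zA-Z0-9+&@#/%?=~_()|!:,.;]*"

# ^ without multiline mode only matches at the very start of the string.
at_string_start = re.findall(r"^http://" + url_chars, message)

# \b (word boundary) lets the URL match anywhere in the text.
anywhere = re.findall(r"\bhttp://" + url_chars, message)

# (?m) makes ^ match at the start of every line.
at_line_start = re.findall(r"(?m)^http://" + url_chars, message)

print(at_string_start)
print(anywhere)
print(at_line_start)
```

With this input the first search finds nothing, the second finds both URLs, and the third finds only the URL that starts a line.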

I've no experience with RegExKitLite, and never encountered & as special inside a character class, but try putting a \ before it to see if that works?
NSLog(@"Array: %@", [message componentsMatchedByRegex:@"^(http://)[-a-zA-Z0-9+\&@#/%?=~_()|!:,.;]*"]);

Related

Why do Atom snippets need four backslashes at once in their body in order to print a single backslash

I just realized this while setting up a snippet.
'.source':
  'shrug':
    'prefix': 'shrug'
    'body': '¯\\\\_(ツ)_/¯'
In order to print the typical ¯\_(ツ)_/¯ shrug, you need 4 backslashes. Using 2 backslashes doesn't cause any errors, but the backslash won't be printed. I would understand if you needed 2, but why 4?
The four backslashes in Atom snippets are due to snippets using the generic CSON notation (CoffeeScript-style JSON).
It's well described in this comment on an issue from the atom-snippets repo:
I think that four backslashes makes sense, however notationally
inconvenient.
It has to do with the levels of interpretation a snippet goes through
before it ends up in your text buffer:
The snippet is declared in a CSON file, the parsing of string elements
in this format is "backslash sensitive" i.e. \n represents the newline
character and a \ is represented as \\.
The snippet then has to be parsed by the snippet body parser. The
parser uses one \ to escape the following character, e.g. \\ becomes \.
So the process goes as follows:
\\\\ --CSON--> \\ --BodyParser--> \
The reason two backslashes used to work, was because the snippet body
parser never really handled escaped characters (the escape cases were
handled explicitly rather than in a generic way) this was why we had
bug #60.
The process could be made more notationally friendly if the snippets
were stored in a custom format. Then we would have more control over
how it is parsed, such as not interpreting backslashes before they are
being fed to the body parser.
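The two interpretation levels described in the quoted comment can be modelled with a small Python sketch. The unescape_once helper is hypothetical and deliberately simplified: it only drops one level of backslash escaping and does not reproduce the real CSON parser or the Atom snippet body parser.

```python
def unescape_once(text: str) -> str:
    """Remove one level of escaping: a backslash followed by any
    character is replaced by that character alone (simplified model)."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # keep the escaped character only
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Raw strings, so Python adds no escaping layer of its own.
four = r"¯\\\\_(ツ)_/¯"  # body as typed in the CSON file
two = r"¯\\_(ツ)_/¯"     # the version that prints no backslash

# First pass models the CSON string parser, second pass the body parser.
result_four = unescape_once(unescape_once(four))
result_two = unescape_once(unescape_once(two))

print(result_four)
print(result_two)
```

In this model four backslashes survive both passes as a single backslash, while two backslashes are reduced to one by the first pass and then consumed as an escape of the underscore by the second.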

.tmlanguage escape sequences and rule priorities

I'm implementing a syntax highlighter in Apple's Swift language by parsing .tmlanguage files and applying styles to an NSMutableAttributedString.
I'm testing with JavaScript code, a javascript.tmlanguage file, and the monokai.tmtheme theme (the latter two are included in Sublime Text 3) to check that the syntax gets highlighted correctly. By applying each rule (pattern) in the .tmlanguage file in the order it appears, the syntax is almost perfectly highlighted.
The problem I'm having right now is that I don't know how to tell that a quote (") should be treated as escaped when it has a backslash before it (\"). Am I missing something in the .tmlanguage file that specifies that? The other problem is that I have no idea how to know that some rules should be ignored when inside others, for example:
I'm getting double slashes taken as comments when inside strings: "http://stackoverflow.com/" a url is recognised as comment after //
Also double or single quotes are taken as strings when inside comments: // press "Enter" to continue, the word "Enter" gets highlighted as string when should be same color as comments
So, I don't know if there is some priority for some rules over others in the convention, or if there is something in the files that I haven't noticed.
Help please!
Update:
Here is a better example of what I meant by escape quotes:
I'm getting this: while all the letters should be yellow except for the escaped sequence (/") which should be blue.
The question is: how do I know that /" should be escaped? The rule for that piece of code is:
Maybe I am late to answer this. You can apply the following method.
(Ugly) In your end regex, use ([^/])(") and in your endCaptures, it would be
1 = string.quote.double.js
2 = punctuation.definition.string.end.js
If the string must be single line, you can use match=(")(.*)("), captures=
1 = punctuation.definition.string.begin.js
2 = string.quote.double.js
3 = punctuation.definition.string.end.js
and use your patterns
You can try applyEndPatternLast and see if it is allowed. Setting applyEndPatternLast=1 will do.
The priority is that earlier rules in the file are prioritized over later rules. As an example, in my Python Improved language definition, I have a scope that contains a series of all-caps constants used in Django, a popular Python web framework. I also have a generic constant.other.allcaps.python scope that recognizes (just about) anything in all caps. Since the Django constants rule is before the allcaps rule in the .tmLanguage file, I can color it with a theme using one color, while the later-occurring "highlight everything in all caps" only grabs identifiers that are NOT part of the first list.
Because of this, you should put your "comments" scope(s) as early in the file as possible, then write your parser in such a way that it obeys the rule I described above. However, it's slightly more complicated than that, as I believe items in the repository are prioritized based on where their include line is, not where the repository rule is defined in the file. You may want to do some testing to verify that, though.
Unfortunately I'm not sure what you mean about the escaped quotes - could you expand on that, and maybe add an example or two?
Hope this helps.
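As a rough illustration of why rule order alone is not enough, here is a simplified Python model of a scanner that always takes the leftmost match and uses rule order only to break ties. The rule list, scope names and tokenize function are hypothetical, and this is a sketch of the general idea rather than TextMate's actual algorithm. Because each match is consumed before scanning continues, a // inside a string is never seen by the comment rule, and quotes inside a comment are never seen by the string rule.

```python
import re

# Hypothetical rules in file order: (scope name, compiled regex).
RULES = [
    ("comment.line", re.compile(r"//.*")),
    # A double-quoted string; a backslash escapes the next character.
    ("string.quoted.double", re.compile(r'"(?:\\.|[^"\\])*"')),
]


def tokenize(line):
    """Return (scope, text) pairs for one line of source code."""
    tokens = []
    pos = 0
    while pos < len(line):
        best = None
        for scope, regex in RULES:
            m = regex.search(line, pos)
            # Keep the leftmost match. On a tie the earlier rule stays,
            # because a later rule must start strictly earlier to win.
            if m and (best is None or m.start() < best[1].start()):
                best = (scope, m)
        if best is None:
            break  # nothing else on this line matches any rule
        scope, m = best
        tokens.append((scope, m.group(0)))
        pos = m.end()  # consume the match so no rule can re-enter it
    return tokens


line = 'var u = "http://stackoverflow.com/"; // press "Enter" to continue'
for scope, text in tokenize(line):
    print(scope, text)
```

The string rule also skips over backslash-escaped quotes, so an input such as "a \" b" stays a single string token.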
Assuming that / is the correct character for escaping a double quote mark, the following should work:
"str_double_quote": {
    "begin": "\"",
    "end": "\"",
    "name": "string.quoted.double.swift",
    "patterns": [
        {
            "name": "constant.character.escape.swift",
            "match": "/[\"/]"
        }
    ]
}
You can match an escaped double quote mark (/") and a literal forward slash (//) in the patterns to consume them before the end marker is used to handle them.
If the character for escaping is actually a backslash, then the tricky bit is that there are two levels of escaping, for the JSON encoding as well as the regular expression syntax. To match \", the regular expression requires you to escape the backslash (\\"). JSON requires you to escape backslashes and double quotes, resulting in \\\\\" in a TextMate JSON grammar file. The match expression would thus be \\\\[\"\\\\].
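The two escaping levels can be checked with a short Python sketch. It assumes Python's json and re modules as stand-ins for the grammar loader and the regex engine; TextMate grammars are matched with a different engine, but this particular pattern uses only syntax common to both. The sample text is made up.

```python
import json
import re

# The JSON text as it would sit in the grammar file. A raw string is used
# so that Python does not add a third escaping layer.
json_text = r'{"match": "\\\\[\"\\\\]"}'

# JSON decoding removes one level: the value becomes the regex  \\["\\]
pattern = json.loads(json_text)["match"]
print(pattern)

# The regex engine removes the second level: a literal backslash
# followed by either a double quote or another backslash.
regex = re.compile(pattern)
sample = r'say \"hi\" and \\ done'
found = regex.findall(sample)
print(found)
```

Counting the backslashes at each stage (four in the file, two in the decoded pattern, one in the matched text) is usually the quickest way to debug this kind of rule.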

How to parse special characters in XML for iPad?

I am having a problem parsing XML files that contain special characters such as single and double quotes (', "). I am using NSXMLParser's parser:foundCharacters: method to collect characters in my code.
<synctext type = "word" >They raced to the park Arthur pointed to a sign "Whats that say" he asked Zoo said DW Easy as pie</synctext>
When I parse and save the text from the above tag of my XML file, the resulting string appears in GDB as
"\n\t\tThey raced to the park Arthur pointed to a sign \"Whats that say\" he asked Zoo said DW Easy as pie";
Observe there are 2 issues:
1) Unwanted characters at the beginning of the string.
2) The double quotes around Whats that say.
Can anyone please help me get rid of these unwanted characters and read special characters properly?
NSString *trimmed = [string stringByTrimmingCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@" \n\t"]];
The parser is apparently returning exactly what's in the string. That is, the XML was coded with the starting tag on one line, a newline, two tabs, and the start of the string. And quotes in the string are obviously there in the original (and it's not clear in at least this example why you'd want to delete them).
But if you want these characters gone then you need to post-process the string. You can use Rams' statement to eliminate the newline and tabs, and stringByReplacingOccurrencesOfString:withString: to zap the quotes.
(Note that some XML parsers can be instructed to return strings like this with the leading/trailing stuff stripped, but I'm not sure about this one. The quotes will always be there, though.)
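The same behaviour can be reproduced outside of Cocoa. The following sketch uses Python's xml.etree.ElementTree instead of NSXMLParser, which is an assumption made only so the example is easy to run, and a shortened version of the sample text. It shows that the parser hands back the element text verbatim, and that the cleanup is a separate post-processing step. The \" seen in GDB is only how the debugger prints a quote inside a string; there is no backslash in the data.

```python
import xml.etree.ElementTree as ET

# Shortened sample: the text starts on a new line, indented with two tabs.
xml_source = (
    '<synctext type="word">\n\t\tThey raced to the park '
    '"Whats that say" he asked</synctext>'
)

element = ET.fromstring(xml_source)
raw_text = element.text                     # exactly what the document contains
trimmed = raw_text.strip(" \n\t")           # drop leading/trailing whitespace
without_quotes = trimmed.replace('"', "")   # only if the quotes are unwanted

print(repr(raw_text))
print(repr(trimmed))
print(repr(without_quotes))
```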

how to write special character in objective-C NSString

When I try to write this JSON:
{"author":"mehdi","email":"email@hotmail.fr","message":"Hello"}
like this in Objective-C:
NSString *myJson=@"{"author":"mehdi","email":"email@hotmail.fr","message":"Hello"}";
it doesn't work. Can someone help me?
You need to escape quote characters with a backslash:
NSString *myJson = @"{\"author\":\"mehdi\",\"email\":\"email@hotmail.fr\",\"message\":\"Hello\"}";
Otherwise the compiler will think that your string literal ends right after the first {.
The backslashes will not be present as characters in the resulting NSString. They are merely there as hints for the compiler and are removed from the actual string during compilation.
Newbie note: JSON strings that you read directly from a file via Objective C of course do not need any escaping! (JSON itself may need such, but that's about it. No need for additional escaping on the ObjC-side of it.)
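The point that the backslashes exist only in the source code, not in the resulting string, holds for most languages with C-style string literals. Here is a minimal sketch in Python rather than Objective-C (an assumption made so the check is easy to run):

```python
import json

# Escaped quotes are needed only because the literal itself is delimited
# by double quotes.
my_json = "{\"author\":\"mehdi\",\"email\":\"email@hotmail.fr\",\"message\":\"Hello\"}"

# The string in memory contains plain quotes and no backslashes at all.
print(my_json)
print("\\" in my_json)

# So it parses as ordinary JSON.
data = json.loads(my_json)
print(data["author"])
```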

ack-grep: chars escaping

My goal is to find all "<?=" occurrences with ack. How can I do that?
ack "<?="
Doesn't work. Please tell me how can I fix escaping here?
Since ack uses Perl regular expressions, your problem stems from the fact that in Perl regex syntax, ? is a special character meaning "the preceding item is optional". So what you are grepping for is = preceded by an optional <.
So you need to escape the ? if that's just meant to be a regular character.
To escape, there are two approaches - either <\?= or <[?]=; some people find the second form of escaping (putting a special character into a character class) more readable than backslash-escape.
UPDATE As Josh Kelley graciously added in the comment, a third form of escaping is to use the \Q operator which escapes all the following special characters till \E is encountered, as follows: \Q<?=\E
Rather than trying to remember which characters have to be escaped, you can use -Q to quote everything that needs to be quoted.
ack -Q "<?="
This is the best solution if you want to search for plain text (that is, if you don't need a regular expression).
ack "<\?="
? is a regex operator, so it needs escaping