NSRegularExpression and capture groups on iPhone

I need a little kickstart with regex on the iPhone.
Basically I have a list of dates in a private MediaWiki in the form of
*185 BC: SOME EVENT HERE
*2001: SOME OTHER EVENT MUCH LATER
I now want to parse that into an object that has an NSDate property and, say, an NSString property.
I have this so far (rawContentString contains the MediaWiki syntax of the page):
NSString* regexString = @"\\*( *[0-9]{1,}.*): (.*)";
NSRegularExpressionOptions options = NSRegularExpressionCaseInsensitive;
NSError* error = NULL;
NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:regexString options:options error:&error];
if (error) {
    NSLog(@"%@", [error description]);
}
NSArray* results = [regex matchesInString:rawContentString options:0 range:NSMakeRange(0, [rawContentString length])];
for (NSTextCheckingResult* result in results) {
    NSString* resultString = [rawContentString substringWithRange:result.range];
    NSLog(@"%@", resultString);
}
Unfortunately, I think the regex is not working the way I hoped, and I don't know how to capture the matched date and text.
Any help would be great.
BTW: is there by any chance a compilation of regex patterns for MediaWiki syntax out there somewhere?
Thanks in advance
Heiko

My issue was that I was using matchesInString: and I needed to use firstMatchInString:, which returns a single NSTextCheckingResult holding multiple ranges; rangeAtIndex: gives the range of each capture group.
This is counterintuitive, but it worked.
I got the answer from http://snipplr.com/view/63340/
My code (to parse credit card track data):
NSError *error = nil;
NSRegularExpression *track1Pattern = [NSRegularExpression regularExpressionWithPattern:@"%.(.+?)\\^(.+?)\\^([0-9]{2})([0-9]{2}).+?\\?." options:NSRegularExpressionCaseInsensitive error:&error];
NSTextCheckingResult *result = [track1Pattern firstMatchInString:trackString options:NSMatchingReportCompletion range:NSMakeRange(0, trackString.length)];
if (result != nil) {
    // Index 0 is the whole match; indexes 1..4 are the capture groups.
    self.cardNumber = [trackString substringWithRange:[result rangeAtIndex:1]];
    self.cardHolderName = [trackString substringWithRange:[result rangeAtIndex:2]];
    self.expirationMonth = [trackString substringWithRange:[result rangeAtIndex:3]];
    self.expirationYear = [trackString substringWithRange:[result rangeAtIndex:4]];
}

As for the regex, I think something along these lines:
\*([ 0-9]{1,}.*):(.*)
should work better for what you need. You're not escaping the first *, and why is there a * in the first group?
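
To tie the answers above back to the original MediaWiki list, here is a minimal sketch rather than a definitive implementation. It assumes every entry sits on its own line in the form *DATE: EVENT; the helper name ParseTimeline and the exact pattern are my own assumptions, not something from the question. Each NSTextCheckingResult returned by matchesInString: exposes its capture groups through rangeAtIndex:, just like the single result from firstMatchInString:.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns an array of @[dateText, eventText] pairs parsed
// from MediaWiki list lines such as "*185 BC: SOME EVENT HERE".
static NSArray *ParseTimeline(NSString *wikiText) {
    NSError *error = nil;
    // Group 1: the date text up to the first colon. Group 2: the rest of the line.
    // AnchorsMatchLines lets ^ and $ match at every line, not only at the string ends.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^\\*[ \\t]*([0-9]+[^:\\n]*):[ \\t]*(.*)$"
                                                  options:NSRegularExpressionAnchorsMatchLines
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray *entries = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, wikiText.length);
    for (NSTextCheckingResult *match in [regex matchesInString:wikiText options:0 range:whole]) {
        // rangeAtIndex:0 would be the whole line; 1 and 2 are the capture groups.
        NSString *dateText = [wikiText substringWithRange:[match rangeAtIndex:1]];
        NSString *eventText = [wikiText substringWithRange:[match rangeAtIndex:2]];
        [entries addObject:@[dateText, eventText]];
    }
    return entries;
}
```

Turning the date text (for example "185 BC") into an NSDate is left out here, since that depends on how eras and partial dates should be handled.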

Related

Matching HTML with NSRegularExpression

Basically I'm looking for a good example of matching HTML (also newlines and whitespace) using NSRegularExpression.
I have this PHP code I wrote a while back:
preg_match_all("/<dt>(.+?)<\/dt>\W+<dd>(.+?)<\/dd>/si", $data, $m['deets']);
Now I know this works in PHP but for the life of me I can't translate it to Objective-C. Here was my attempt.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<dt>(.+?)<\/dt>\W+<dd>(.+?)<\/dd>" options:(NSRegularExpressionCaseInsensitive) error:&error];
return [regex matchesInString:target options:NSCaseInsensitiveSearch range:NSMakeRange(0, [target length])];
My target in this case is a bunch of HTML.
I have never used NSRegularExpression; I use NSPredicate instead:
NSError *error = NULL;
NSString* pattern = @"/<dt>(.+?)<\/dt>\W+<dd>(.+?)<\/dd>/si";
NSPredicate* predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
if ([predicate evaluateWithObject:myTargetString] == YES) {
    // Okay
} else {
    // Not found
}
Hope this helps.
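EDIT:
NSPredicate is handy, but it doesn't work if you want to get the matching range in your target string.
Your code is basically right; the problem comes from the pattern. In an Objective-C string literal you must escape the \ characters (write \\W), and you should not escape the / characters:
@"<dt>(.+?)</dt>\\W+<dd>(.+?)</dd>"
Also note that matchesInString: takes NSMatchingOptions, so pass 0 rather than NSCaseInsensitiveSearch, and add NSRegularExpressionDotMatchesLineSeparators to mirror the /s flag of the PHP version. So:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<dt>(.+?)</dt>\\W+<dd>(.+?)</dd>" options:(NSRegularExpressionCaseInsensitive | NSRegularExpressionDotMatchesLineSeparators) error:&error];
return [regex matchesInString:target options:0 range:NSMakeRange(0, [target length])];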
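
If it helps, this is roughly how the captured <dt>/<dd> pairs could be pulled out of the matches. It is only a sketch: the helper name DefinitionPairs is made up for illustration, and it assumes the same pattern as the PHP version, with NSRegularExpressionDotMatchesLineSeparators standing in for PHP's /s flag.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects each <dt>/<dd> pair as @[term, definition].
static NSArray *DefinitionPairs(NSString *html) {
    NSError *error = nil;
    // CaseInsensitive mirrors PHP's /i; DotMatchesLineSeparators mirrors /s.
    NSRegularExpressionOptions options = NSRegularExpressionCaseInsensitive |
                                         NSRegularExpressionDotMatchesLineSeparators;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<dt>(.+?)</dt>\\W+<dd>(.+?)</dd>"
                                                  options:options
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray *pairs = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:html options:0 range:NSMakeRange(0, html.length)];
    for (NSTextCheckingResult *match in matches) {
        NSString *term = [html substringWithRange:[match rangeAtIndex:1]];
        NSString *definition = [html substringWithRange:[match rangeAtIndex:2]];
        [pairs addObject:@[term, definition]];
    }
    return pairs;
}
```

Like the PHP original, the \W+ between the tags requires at least one non-word character (whitespace or a newline) between </dt> and <dd>.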

NSSortDescriptor sorting by alpha numeric

I'm trying to sort results from a CoreData "table" of "Tracks" in a similar manner to iTunes. The problem is, "ASC" sort uses the first characters to sort so I end up with:
(I Can't Get No) Satisfaction
A Hard Days Night
I'd like The Stones to show up in the results under "I", basically ignoring any character that matches [^A-Za-z0-9]. I've tried a custom selector and a comparator block, but Core Data just ignores them, so I'm stuck.
From my experience you're better off having a sortName attribute that you generate on object creation. You can then use that key to sort your CoreData results in a much simpler and faster fashion.
Another solution would be to sort manually after fetching the results:
// Compile the regex once, outside the comparator, so it is not rebuilt for every comparison.
NSError *error = nil;
NSString *pattern = @"[^A-Za-z0-9]";
NSRegularExpression *expr = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                      options:NSRegularExpressionCaseInsensitive
                                                                        error:&error];
[tracks sortUsingComparator:^NSComparisonResult(id obj1, id obj2)
{
    NSString *title1 = [(Track *)obj1 title];
    NSString *title2 = [(Track *)obj2 title];
    // Strip everything that is not a letter or digit before comparing.
    NSString *title1Match = [expr stringByReplacingMatchesInString:title1
                                                           options:0
                                                             range:NSMakeRange(0, [title1 length])
                                                      withTemplate:@""];
    NSString *title2Match = [expr stringByReplacingMatchesInString:title2
                                                           options:0
                                                             range:NSMakeRange(0, [title2 length])
                                                      withTemplate:@""];
    return [title1Match compare:title2Match options:NSCaseInsensitiveSearch];
}];
I tried [\W] as the pattern as well, but it seemed to cause a huge performance hit.
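
For the sortName suggestion above, the key could be generated with something like the following sketch. The helper name SortNameForTitle is hypothetical, and it assumes you add a string attribute (called sortName here) to the Track entity and fill it in whenever the title is set.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a sort key by dropping every character that is
// not an ASCII letter or digit and lowercasing what is left.
static NSString *SortNameForTitle(NSString *title) {
    static NSRegularExpression *nonAlphanumeric = nil;
    static dispatch_once_t onceToken;
    // Compile the pattern a single time and reuse it for every title.
    dispatch_once(&onceToken, ^{
        nonAlphanumeric = [NSRegularExpression regularExpressionWithPattern:@"[^A-Za-z0-9]"
                                                                    options:0
                                                                      error:NULL];
    });
    NSString *stripped = [nonAlphanumeric stringByReplacingMatchesInString:title
                                                                   options:0
                                                                     range:NSMakeRange(0, title.length)
                                                              withTemplate:@""];
    return [stripped lowercaseString];
}
```

With the key stored on the object, the fetch request can sort on it directly, for example with [NSSortDescriptor sortDescriptorWithKey:@"sortName" ascending:YES], and no regex runs at fetch time.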

How to find if the first character of last word in a NSString value is Ampersand using NSRegularExpression?

I would like to find out whether the last word in an NSString value starts with an ampersand, using NSRegularExpression.
I used the following expression, but it reports a match even if the ampersand is anywhere in the last word.
Please advise me on how I can achieve this.
Thank you.
BOOL flagSymbolFound = NO;
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"[&]\\b\\w*$" options:NSRegularExpressionCaseInsensitive error:&error];
if(!error) {
    NSUInteger numberOfMatches = [regex numberOfMatchesInString:stringValue options:0 range:NSMakeRange(0, [stringValue length])];
    if(numberOfMatches > 0)
        flagSymbolFound = YES;
    else
        flagSymbolFound = NO;
}
Try the "\\s[&]\\w+$" pattern. It should match space-separated words, e.g. foo &bar
Try adding the "^" anchor to the beginning of your regex (note that this only matches when the whole string is a single word starting with the ampersand):
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^[&]\\b\\w*$" options:NSRegularExpressionCaseInsensitive error:&error];
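
A way to cover both cases (a single-word string and a longer string) is to require that the ampersand sits either at the start of the string or right after whitespace. The following is a sketch under that assumption; the function name LastWordStartsWithAmpersand and the pattern are mine, and "word" here means any run of non-whitespace characters.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when the last whitespace-separated word begins with '&'.
static BOOL LastWordStartsWithAmpersand(NSString *stringValue) {
    NSError *error = nil;
    // (?:^|\s)  the ampersand must start the string or follow whitespace
    // &\S*$     the word beginning with '&' must run to the end of the string
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"(?:^|\\s)&\\S*$"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return NO;
    }
    NSTextCheckingResult *match = [regex firstMatchInString:stringValue
                                                    options:0
                                                      range:NSMakeRange(0, stringValue.length)];
    return match != nil;
}
```

Trailing whitespace after the last word is not handled here; trimming the string first would be one way to deal with that.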

capturing parentheses using regex on iphone

Trying to get the URL out of some HTML I'm parsing (on iPhone) using 'capturing parentheses' to just group the bit I'm interested in.
I now have this:
NSString *imageHtml; //a string with some HTML in it
NSRegularExpression* innerRegex = [[NSRegularExpression alloc] initWithPattern:@"href=\"(.*?)\"" options:NSRegularExpressionCaseInsensitive|NSRegularExpressionDotMatchesLineSeparators error:nil];
NSTextCheckingResult* firstMatch = [innerRegex firstMatchInString:imageHtml options:0 range:NSMakeRange(0, [imageHtml length])];
[innerRegex release];
if(firstMatch != nil)
{
    // newImage.detailsURL = ... (this is where I want just the URL)
    NSLog(@"found url: %@", [imageHtml substringWithRange:firstMatch.range]);
}
The only thing it lists is the full match (so: href="http://tralalala.com" instead of http://tralalala.com).
How can I force it to only return my first capturing parentheses match?
Regex groups work by capturing the whole match in group 0; the groups in the regex then start at index 1. NSTextCheckingResult stores these groups as ranges. Since your regex contains one capture group, the following will work.
NSString *imageHtml = @"href=\"http://tralalala.com\""; //a string with some HTML in it
NSRegularExpression* innerRegex = [[NSRegularExpression alloc] initWithPattern:@"href=\"(.*?)\"" options:NSRegularExpressionCaseInsensitive|NSRegularExpressionDotMatchesLineSeparators error:nil];
NSTextCheckingResult* firstMatch = [innerRegex firstMatchInString:imageHtml options:0 range:NSMakeRange(0, [imageHtml length])];
[innerRegex release]; // manual reference counting; omit under ARC
if(firstMatch != nil)
{
    //The ranges of firstMatch will provide groups,
    //rangeAtIndex 1 = first grouping
    NSLog(@"found url: %@", [imageHtml substringWithRange:[firstMatch rangeAtIndex:1]]);
}
Alternatively, you need a pattern something like this, which uses lookbehind and lookahead so that the full match is only the URL:
(?<=href=\")(.*?)(?=\")

Text extraction with NSRegularExpression

Given an NSString *test = @"...href="/functions?q=KEYWORD\x26amp...";
How can I extract the word KEYWORD from the string using NSRegularExpression?
I have tried the following NSRegularExpression on iOS SDK 4.2, but it is not able to find the text. Does the following code look okay?
NSRegularExpression *testRegex = [NSRegularExpression regularExpressionWithPattern:@"(?<=href=\"\\/functions\\?q=).+?(?=\\x26amp])" options:0 error:nil];
NSRange result = [testRegex rangeOfFirstMatchInString:test options:0 range:NSMakeRange(0, [test length])];
You have a stray "]" in your regex, right before the end, which is probably causing a problem. You also need to use four backslashes to match a backslash in the input string. (Double it to escape it in the C string, and then double it again to escape it in the regex.) I'd suggest two things. First, pass something in the error parameter and take a look at it in the debugger. Second, I'm not a big fan of lookahead/lookbehind expressions. I think this style is more readable:
NSString *regexStr = @"href=\"\\/functions\\?q=(.+?)\\\\x26amp";
NSError *error = nil;
NSRegularExpression *testRegex = [NSRegularExpression regularExpressionWithPattern:regexStr options:0 error:&error];
if( testRegex == nil ) NSLog( @"Error making regex: %@", error );
NSTextCheckingResult *result = [testRegex firstMatchInString:test options:0 range:NSMakeRange(0, [test length])];
// Group 1 is the text between "?q=" and the literal "\x26amp".
NSRange range = [result rangeAtIndex:1];
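
Putting that together, a small sketch of the whole extraction might look like this. The function name ExtractKeyword is hypothetical, and it assumes the input really contains a literal backslash followed by x26amp, as in the question. It returns nil when nothing matches, which avoids calling substringWithRange: with a meaningless range.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between "?q=" and the literal "\x26amp",
// or nil when the input does not contain such a link.
static NSString *ExtractKeyword(NSString *input) {
    NSError *error = nil;
    // Four backslashes in the literal become two in the regex, which match
    // one literal backslash in the input.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"href=\"/functions\\?q=(.+?)\\\\x26amp"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Error making regex: %@", error);
        return nil;
    }
    NSTextCheckingResult *match = [regex firstMatchInString:input
                                                    options:0
                                                      range:NSMakeRange(0, input.length)];
    if (match == nil) {
        return nil;
    }
    return [input substringWithRange:[match rangeAtIndex:1]];
}
```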