Matching HTML with NSRegularExpression - iphone

Basically I'm looking for a good example of matching HTML (also newlines and whitespace) using NSRegularExpression.
I have this PHP code I wrote a while back:
preg_match_all("/<dt>(.+?)<\/dt>\W+<dd>(.+?)<\/dd>/si", $data, $m['deets']);
Now I know this works in PHP but for the life of me I can't translate it to Objective-C. Here was my attempt.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<dt>(.+?)<\/dt>\W+<dd>(.+?)<\/dd>" options:(NSRegularExpressionCaseInsensitive) error:&error];
return [regex matchesInString:target options:NSCaseInsensitiveSearch range:NSMakeRange(0, [target length])];
My target in this case is a bunch of HTML.

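I have never used NSRegularExpression, but I have used NSPredicate instead:
NSError *error = NULL;
// MATCHES is anchored to the whole string, so the pattern is wrapped in .* on both sides.
// PHP's /.../si delimiters and flags become the inline flags (?si).
NSString* pattern = @"(?si).*<dt>(.+?)</dt>\\W+<dd>(.+?)</dd>.*";
NSPredicate* predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];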
if ([predicate evaluateWithObject:myTargetString] == YES) {
// Okay
} else {
// Not found
}
Hope this helps.
EDIT:
NSPredicate is handy, but it doesn't work if you need the matching ranges within your target string.
Your code is nearly right; the problem is in the pattern. In an Objective-C string literal you must escape the \ characters (write \\W), and you should not escape the / characters. Also, NSCaseInsensitiveSearch is an NSString comparison option, not an NSMatchingOptions value, so pass 0 to matchesInString:.
@"<dt>(.+?)</dt>\\W+<dd>(.+?)</dd>"
So:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<dt>(.+?)</dt>\\W+<dd>(.+?)</dd>" options:(NSRegularExpressionCaseInsensitive) error:&error];
return [regex matchesInString:target options:0 range:NSMakeRange(0, [target length])];
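To read the captured <dt> and <dd> text out of the results, something like the following sketch may help. The helper name DefinitionPairs is hypothetical, and it assumes that NSRegularExpressionDotMatchesLineSeparators is the option you want as the counterpart of PHP's /s flag, so that the groups can span newlines.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns an array of @[term, definition] pairs
// found in an HTML fragment.
static NSArray *DefinitionPairs(NSString *html) {
    NSError *error = nil;
    // CaseInsensitive mirrors PHP's /i; DotMatchesLineSeparators mirrors /s.
    NSRegularExpressionOptions options =
        NSRegularExpressionCaseInsensitive | NSRegularExpressionDotMatchesLineSeparators;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<dt>(.+?)</dt>\\W+<dd>(.+?)</dd>"
                                                  options:options
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray *pairs = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:html
                                      options:0
                                        range:NSMakeRange(0, html.length)];
    for (NSTextCheckingResult *match in matches) {
        // Range 0 is the whole match; ranges 1 and 2 are the capture groups.
        NSString *term = [html substringWithRange:[match rangeAtIndex:1]];
        NSString *definition = [html substringWithRange:[match rangeAtIndex:2]];
        [pairs addObject:@[term, definition]];
    }
    return pairs;
}
```

Each element of the returned array holds the two captured strings for one <dt>/<dd> pair, in document order.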

Related

how to write NSRegularExpression to xx:xx:xx?

I am trying to check whether an NSString is in a specific format: dd:dd:dd. I was thinking of using NSRegularExpression, with something like
/^(\d)\d:\d\d:\d\d)$/ ?
Have you tried something like:
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^\\d{2}:\\d{2}:\\d{2}$"
options:0
error:&error];
NSUInteger numberOfMatches = [regex numberOfMatchesInString:string
options:0
range:NSMakeRange(0, [string length])];
(I haven't tested it, because I cannot right now, but it should be working)
I suggest using RegexKitLite.
With this, and assuming that the 'd' in dd:dd:dd stands for a digit from 0-9, it should be fairly easy to implement what you need, given the additional comment from Grijesh.
Here's an example copied from the RegexKitLite page:
// finds phone number in format nnn-nnn-nnnn
NSString *regEx = @"[0-9]{3}-[0-9]{3}-[0-9]{4}";
NSString *match = [textView.text stringByMatching:regEx];
if ([match isEqual:@""] == NO) {
NSLog(@"Phone number is %@", match);
} else {
NSLog(@"Not found.");
}
UPDATE:
NSString *idRegex = @"[0-9][0-9]:[0-9][0-9]:[0-9][0-9]";
NSPredicate *idTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", idRegex];
for (NSString * str in newArrAfterPars) {
if ([idTest evaluateWithObject:str]) {
// str is in dd:dd:dd format
}
}

How to find if the first character of last word in a NSString value is Ampersand using NSRegularExpression?

I would like to find out whether the last word in an NSString value starts with an ampersand, using NSRegularExpression.
I used the following expression, but it reports a match even if the ampersand is anywhere in the last word.
Please advise me on how I can achieve this.
Thank you.
BOOL flagSymbolFound = NO;
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"[&]\\b\\w*$" options:NSRegularExpressionCaseInsensitive error:&error];
if(!error) {
NSUInteger numberOfMatches = [regex numberOfMatchesInString:stringValue options:0 range:NSMakeRange(0, [stringValue length])];
if(numberOfMatches > 0)
flagSymbolFound = YES;
else
flagSymbolFound = NO;
}
Try the pattern "\\s[&]\\w+$". It matches when the last word is preceded by whitespace, e.g. foo &bar
Try adding the "^" anchor to the beginning of your regex:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^[&]\\b\\w*$" options:NSRegularExpressionCaseInsensitive error:&error];
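The two suggestions above can be combined so that the ampersand must sit either at the very start of the string or directly after whitespace. The following is only a sketch under that assumption; the helper name LastWordStartsWithAmpersand is made up for illustration, and it treats a string with trailing spaces as having no match.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when the last word of `text` begins with "&".
static BOOL LastWordStartsWithAmpersand(NSString *text) {
    NSError *error = nil;
    // (?:^|\s) requires the "&" to be at the start of the string or right
    // after whitespace, so an "&" in the middle of the last word is rejected.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"(?:^|\\s)&\\w*$"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return NO;
    }
    NSRange whole = NSMakeRange(0, text.length);
    return [regex numberOfMatchesInString:text options:0 range:whole] > 0;
}
```

With this pattern, "foo &bar" and "&bar" are accepted, while "foo b&r" and "&foo bar" are not.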

How can I escape this regex properly in Objective-C?

I have the following regex I would like to escape in Objective-C
/\B\$((?:[0-9]+(?=[a-z])|(?![0-9\.\:\_\-]))(?:[a-z0-9]|[\_\.\-\:](?![\.\_\.\-\:]))*[a-z0-9]+)/ig;
Not exactly sure how to escape it so it works in Objective-C
Update:
NSString* pattern = @"/\\B\\$((?:[0-9]+(?=[a-z])|(?![0-9\\.\\:\\_\\-]))(?:[a-z0-9]|[\\_\\.\\-\\:](?![\\.\\_\\.\\-\\:]))*[a-z0-9]+)/ig;";
NSRegularExpression *usernameRegex = [[[NSRegularExpression alloc] initWithPattern:pattern
options:NSRegularExpressionCaseInsensitive
error:nil];
Gives me an error about Parse Issue - Unexpected Identifier
Backslashes are used as escape characters in C strings. To make a regexp that contains backslashes as regex escapes, you need to double them.
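As a small illustration of the doubling rule, here is a sketch with a much simpler pattern than the one in the question. The helper name IsDollarAmount and the pattern itself are assumptions chosen for the example. Note also that the JavaScript-style /.../ig delimiters and flags are not part of the pattern string; flags are passed through the options: argument instead.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the string is "$" followed by one or more digits.
static BOOL IsDollarAmount(NSString *candidate) {
    // Regex as the engine sees it:   ^\$\d+$
    // The same regex as a literal:   @"^\\$\\d+$"
    // Each regex backslash is written twice in the Objective-C source.
    NSString *pattern = @"^\\$\\d+$";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return NO;
    }
    NSRange whole = NSMakeRange(0, candidate.length);
    return [regex numberOfMatchesInString:candidate options:0 range:whole] == 1;
}
```

Passing &error rather than nil also makes a malformed pattern much easier to diagnose.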
Following on from the correct solution by millimoose, here is an NSString category method I use to escape backslashes for regex patterns in Objective-C.
+ (NSString *)escapeBackslashes:(NSString *)regexString
{
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\\\" options:NSRegularExpressionCaseInsensitive | NSRegularExpressionDotMatchesLineSeparators | NSRegularExpressionAnchorsMatchLines | NSRegularExpressionAllowCommentsAndWhitespace error:&error];
if (error == NULL)
{
// In a replacement template a backslash escapes itself, so emitting two backslashes takes four in the template (eight in the literal).
return [regex stringByReplacingMatchesInString:regexString options:0 range:NSMakeRange(0, [regexString length]) withTemplate:@"\\\\\\\\"];
}
else
{
return regexString;
}
}
Usage example:
NSString* escapedPattern = [NSString escapeBackslashes:pattern];

Text extraction with NSRegularExpression

Given an NSString *test = @"...href="/functions?q=KEYWORD\x26amp...";
How can I extract the word KEYWORD from the string using NSRegularExpression?
I have tried the following NSRegularExpression on iOS SDK 4.2, but it is not able to find the text. Does the following code look okay?
NSRegularExpression *testRegex = [NSRegularExpression regularExpressionWithPattern:@"(?<=href=\"\\/functions\\?q=).+?(?=\\x26amp])" options:0 error:nil];
NSRange result = [testRegex rangeOfFirstMatchInString:test options:0 range:NSMakeRange(0, [test length])];
You have a stray "]" in your regex, right before the end, which is probably causing a problem. You also need four backslashes to match one backslash in the input string. (Double it to escape it in the C string, and then double again to escape it in the regex.) I'd suggest two things. First, pass something in the error parameter and take a look at it in the debugger. Second, I'm not a big fan of lookahead/lookbehind expressions; I think this style is more readable:
NSString *regexStr = @"href=\"\\/functions\\?q=(.+?)\\\\x26amp";
NSError *error = nil;
NSRegularExpression *testRegex = [NSRegularExpression regularExpressionWithPattern:regexStr options:0 error:&error];
if( testRegex == nil ) NSLog( @"Error making regex: %@", error );
NSTextCheckingResult *result = [testRegex firstMatchInString:test options:0 range:NSMakeRange(0, [test length])];
NSRange range = [result rangeAtIndex:1];

NSRegularExpression and capture groups on iphone

I need a little kickstart on regex on the iphone.
Basically I have a list of dates in a private MediaWiki in the form of
*185 BC: SOME EVENT HERE
*2001: SOME OTHER EVENT MUCH LATER
I now want to parse that into an Object that has a NSDate property and a -say- NSString property.
I have this so far: (rawContentString contains the mediawiki syntax of the page)
NSString* regexString =@"\\*( *[0-9]{1,}.*): (.*)";
NSRegularExpressionOptions options = NSRegularExpressionCaseInsensitive;
NSError* error = NULL;
NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:regexString options:options error:&error];
if (error) {
NSLog(@"%@", [error description]);
}
NSArray* results = [regex matchesInString:rawContentString options:0 range:NSMakeRange(0, [rawContentString length])];
for (NSTextCheckingResult* result in results) {
NSString* resultString = [rawContentString substringWithRange:result.range];
NSLog(@"%@",resultString);
}
Unfortunately, I think the regex is not working the way I hoped, and I don't know how to capture the matched date and text.
Any help would be great.
BTW: is there by any chance a collection of regex patterns for MediaWiki syntax out there somewhere?
Thanks in advance
Heiko
My issue was that I was using matchesInString and I needed to use firstMatchInString because it returns multiple ranges in a single NSTextCheckingResult.
This is counterintuitive, but it worked.
I got the answer from http://snipplr.com/view/63340/
My Code (to parse credit card track data):
NSRegularExpression *track1Pattern = [NSRegularExpression regularExpressionWithPattern:@"%.(.+?)\\^(.+?)\\^([0-9]{2})([0-9]{2}).+?\\?." options:NSRegularExpressionCaseInsensitive error:&error];
NSTextCheckingResult *result = [track1Pattern firstMatchInString:trackString options:NSMatchingReportCompletion range:NSMakeRange(0, trackString.length)];
self.cardNumber = [trackString substringWithRange: [result rangeAtIndex:1]];
self.cardHolderName = [trackString substringWithRange: [result rangeAtIndex:2]];
self.expirationMonth = [trackString substringWithRange: [result rangeAtIndex:3]];
self.expirationYear = [trackString substringWithRange: [result rangeAtIndex:4]];
As for the regex, I think something along these lines:
\*([ 0-9]{1,}.*):(.*)
should work better for what you need. You're not escaping the first *, and why is there a * in the first group statement?
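For the original list of dates, a per-line pattern with two capture groups might look like the sketch below. The helper name TimelineEntries and the exact pattern are assumptions: it expects each line to have the shape "*<digits>[ BC]: <event>" as in the two sample lines, and it leaves converting the year text into an NSDate to the caller.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns an array of @[yearText, eventText] pairs,
// one per line that looks like "*185 BC: SOME EVENT HERE".
static NSArray *TimelineEntries(NSString *wikiText) {
    NSError *error = nil;
    // AnchorsMatchLines makes ^ and $ match at every line, not only at the
    // ends of the whole string. Group 1 is the year, group 2 is the event.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^\\* *([0-9]+(?: BC)?): *(.*)$"
                                                  options:NSRegularExpressionAnchorsMatchLines
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray *entries = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:wikiText
                                      options:0
                                        range:NSMakeRange(0, wikiText.length)];
    for (NSTextCheckingResult *match in matches) {
        // rangeAtIndex:0 would be the whole line; 1 and 2 are the groups.
        NSString *year = [wikiText substringWithRange:[match rangeAtIndex:1]];
        NSString *event = [wikiText substringWithRange:[match rangeAtIndex:2]];
        [entries addObject:@[year, event]];
    }
    return entries;
}
```

Here matchesInString: returns one NSTextCheckingResult per line, and each result carries the ranges of both capture groups.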