Text extraction with NSRegularExpression - iphone

Given an NSString *test = @"...href="/functions?q=KEYWORD\x26amp...";
How can I extract the word KEYWORD from the string using NSRegularExpression?
I have tried the following NSRegularExpression on iOS SDK 4.2, but it does not find the text. Does the following code look okay?
NSRegularExpression *testRegex = [NSRegularExpression regularExpressionWithPattern:@"(?<=href=\"\\/functions\\?q=).+?(?=\\x26amp])" options:0 error:nil];
NSRange result = [testRegex rangeOfFirstMatchInString:test options:0 range:NSMakeRange(0, [test length])];

You have a stray "]" in your regex, right before the end, which is probably causing a problem. You also need four backslashes to match a backslash in the input string: double it to escape it in the C string, then double it again to escape it in the regex. I'd suggest two things. First, pass something in the error parameter and take a look at it in the debugger. Second, I'm not a big fan of lookahead/lookbehind expressions. I think this style is more readable:
NSString *regexStr = @"href=\"\\/functions\\?q=(.+?)\\\\x26amp";
NSError *error = nil;
NSRegularExpression *testRegex = [NSRegularExpression regularExpressionWithPattern:regexStr options:0 error:&error];
if( testRegex == nil ) NSLog( @"Error making regex: %@", error );
NSTextCheckingResult *result = [testRegex firstMatchInString:test options:0 range:NSMakeRange(0, [test length])];
NSRange range = [result rangeAtIndex:1];
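To turn that range into the actual keyword, a minimal sketch could look like the following. It assumes the input really contains a literal backslash followed by x26amp, as the question suggests; the helper name ExtractKeyword is made up for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between href="/functions?q= and a
// literal \x26amp (backslash, x, 2, 6, a, m, p), or nil if nothing matches.
static NSString *ExtractKeyword(NSString *input) {
    NSError *error = nil;
    // In the string literal, "\\\\" produces two backslash characters; the
    // regex engine then reads those two as one escaped, literal backslash.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"href=\"/functions\\?q=(.+?)\\\\x26amp"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Error making regex: %@", error);
        return nil;
    }
    NSTextCheckingResult *match = [regex firstMatchInString:input
                                                    options:0
                                                      range:NSMakeRange(0, [input length])];
    if (match == nil) {
        return nil; // no match, so there is no capture group to read
    }
    // Capture group 1 is the part inside the parentheses.
    return [input substringWithRange:[match rangeAtIndex:1]];
}
```

Checking the match for nil before calling rangeAtIndex: avoids building a substring from an NSNotFound range when the pattern does not match.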

Related

Parsing a string starting with @ and # in Objective-C

So I am trying to parse a string that has the following format:
baz#marroon#red#blue #big#cat#dog
or, it can also be separated by spaces:
baz #marroon #red #blue #big #cat #dog
and here's how I am doing it now:
- (void) parseTagsInComment:(NSString *) comment
{
    if ([comment length] > 0){
        NSArray * stringArray = [comment componentsSeparatedByString:@" "];
        for (NSString * word in stringArray){
        }
    }
}
I've got the components separated by spaces working, but what if there are no spaces? How do I iterate through these words? I was thinking of using a regex, but I have no idea how to write such a regex in Objective-C. Any ideas for a regex that would cover both of these cases?
Here's my first attempt:
NSError * error;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(@|#)\\S+" options:NSRegularExpressionCaseInsensitive error:&error];
NSArray* wordArray = [regex matchesInString:comment
                                    options:0 range:NSMakeRange(0, [comment length])];
for (NSString * word in wordArray){
}
That doesn't work. I think my regex is wrong.
Here is a way to do it using NSScanner that puts the separated strings and a string representation of their ranges into an array (this assumes that your original string started with a # -- if it doesn't and you need it to, then just prepend the hash to the string at the start).
NSMutableArray *array = [NSMutableArray array];
NSString *str = @"#baz#marroon#red#blue #big#cat#dog";
NSScanner *scanner = [NSScanner scannerWithString:str];
NSCharacterSet *searchSet = [NSCharacterSet characterSetWithCharactersInString:@"@#"];
NSString *outputString;
while (![scanner isAtEnd]) {
    // Skip anything before the next marker, then read the marker itself.
    [scanner scanUpToCharactersFromSet:searchSet intoString:nil];
    [scanner scanCharactersFromSet:searchSet intoString:&outputString];
    NSString *symbol = [outputString copy];
    // Read the word that follows the marker.
    [scanner scanUpToCharactersFromSet:searchSet intoString:&outputString];
    NSString *wholePiece = [[symbol stringByAppendingString:outputString] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
    NSString *rangeString = NSStringFromRange([str rangeOfString:wholePiece]);
    [array addObject:wholePiece];
    [array addObject:rangeString];
}
NSLog(@"%@",array);
I think the regular expression you really want is [@#]?\\w+. It will find groups of letters optionally preceded by an @ or #. Your expression wouldn't work because it looks for any non-space character, which includes @ and #. (Depending on what can be in the "words," you might want something more or less specific than \w, but it isn't clear from the question.)
If you need the ranges, then NSRegularExpression probably works well:
NSString *comment = @"#baz#marroon#red#blue #big#cat#dog";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"[@#]\\w+" options:0 error:nil];
NSArray* wordArray = [regex matchesInString:comment
                                    options:0
                                      range:NSMakeRange(0, [comment length])];
for (NSTextCheckingResult *result in wordArray)
    NSLog(@"%@", [comment substringWithRange:result.range]);
Or, [@#][a-zA-Z]+ works if you're OK with ASCII alphabetic words only.

Matching HTML with NSRegularExpression

Basically I'm looking for a good example of matching HTML (also newlines and whitespace) using NSRegularExpression.
I have this PHP code I wrote a while back:
preg_match_all("/<dt>(.+?)<\/dt>\W+<dd>(.+?)<\/dd>/si", $data, $m['deets']);
Now I know this works in PHP but for the life of me I can't translate it to Objective-C. Here was my attempt.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<dt>(.+?)<\/dt>\W+<dd>(.+?)<\/dd>" options:(NSRegularExpressionCaseInsensitive) error:&error];
return [regex matchesInString:target options:NSCaseInsensitiveSearch range:NSMakeRange(0, [target length])];
My target in this case is a bunch of HTML.
I have never used NSRegularExpression; I have used NSPredicate instead:
// MATCHES uses ICU regex syntax and must match the whole string, so the PHP
// delimiters and /si flags become an inline (?si) flag plus a leading and trailing .*
NSString* pattern = @"(?si).*<dt>(.+?)</dt>\\W+<dd>(.+?)</dd>.*";
NSPredicate* predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
if ([predicate evaluateWithObject:myTargetString]) {
    // Okay
} else {
    // Not found
}
Hope this helps.
EDIT:
NSPredicate is convenient, but it doesn't work if you need the matching range in your target string.
Your code is right, but the problem comes from the regular expression: inside an Objective-C string literal you must escape the \ characters, and you should not escape the / characters.
@"<dt>(.+?)</dt>\\W+<dd>(.+?)</dd>"
So (note that the matching options should be 0 rather than NSCaseInsensitiveSearch, which is an NSString comparison option):
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<dt>(.+?)</dt>\\W+<dd>(.+?)</dd>" options:(NSRegularExpressionCaseInsensitive) error:&error];
return [regex matchesInString:target options:0 range:NSMakeRange(0, [target length])];
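The PHP pattern in the question also carries the s flag, which lets . match newlines. A rough NSRegularExpression equivalent is the NSRegularExpressionDotMatchesLineSeparators option. The sketch below combines it with the corrected pattern and reads both capture groups; the helper name DefinitionPairs and the choice to return two-element arrays are assumptions made for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns an array of two-element arrays (term, definition)
// for every <dt>...</dt> / <dd>...</dd> pair found in html.
static NSArray *DefinitionPairs(NSString *html) {
    NSError *error = nil;
    // CaseInsensitive mirrors PHP's /i; DotMatchesLineSeparators mirrors /s.
    NSRegularExpressionOptions options =
        NSRegularExpressionCaseInsensitive | NSRegularExpressionDotMatchesLineSeparators;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<dt>(.+?)</dt>\\W+<dd>(.+?)</dd>"
                                                  options:options
                                                    error:&error];
    NSMutableArray *pairs = [NSMutableArray array];
    if (regex == nil) {
        NSLog(@"Error making regex: %@", error);
        return pairs;
    }
    NSArray *matches = [regex matchesInString:html
                                      options:0
                                        range:NSMakeRange(0, [html length])];
    for (NSTextCheckingResult *match in matches) {
        // Group 1 is the <dt> content, group 2 is the <dd> content.
        NSString *term = [html substringWithRange:[match rangeAtIndex:1]];
        NSString *definition = [html substringWithRange:[match rangeAtIndex:2]];
        [pairs addObject:[NSArray arrayWithObjects:term, definition, nil]];
    }
    return pairs;
}
```

For example, DefinitionPairs(@"<dl><dt>Name</dt>\n<dd>Foo\nBar</dd></dl>") yields one pair whose definition still contains the embedded newline.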

Take part of string in-between symbols?

I would like to take the digits that sit immediately before the ` symbol, back to the nearest non-numeric character (or the start of the string), and convert them into an integer.
Ex.
Original String: 2*3*(123`)
Result: 123
Original String: 4`12
Result: 4
Thanks,
Regards.
You can use regular expressions. You can find all the occurrences like this:
NSString *mystring = @"123(12`)456+1093`";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"([0-9]+)`" options:0 error:nil];
NSArray *matches = [regex matchesInString:mystring options:0 range:NSMakeRange(0, mystring.length)];
for (NSTextCheckingResult *match in matches) {
    NSLog(@"%@", [mystring substringWithRange:[match rangeAtIndex:1]]);
}
// 12 and 1093
If you only need one occurrence, then replace the for loop with the following:
if (matches.count>0) {
    NSTextCheckingResult *match = [matches objectAtIndex:0];
    NSLog(@"%@", [mystring substringWithRange:[match rangeAtIndex:1]]);
}
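There may be a better way to do this, but here is what I could quickly come up with:
NSString *mystring = @"123(12`)";
NSString *neededString = nil;
NSScanner *scanner = [NSScanner scannerWithString:mystring];
[scanner scanUpToString:@"`" intoString:&neededString];
// Reverse so the digits next to the backtick come first, then let intValue read them.
// Caveat: trailing zeros in the original number are dropped (120 comes out as 12).
neededString = [self reverseString:neededString];
NSLog(@"%@",[self reverseString:[NSString stringWithFormat:@"%d",[neededString intValue]]]);
Here reverseString: is a helper method (not shown) that returns the characters of a string in reverse order.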

How to find if the first character of last word in a NSString value is Ampersand using NSRegularExpression?

I would like to find out whether the last word in an NSString value starts with an ampersand, using NSRegularExpression.
I used the following expression, but it reports a match even if the ampersand is anywhere in the last word.
Please advise me how I can achieve this.
Thank you.
BOOL flagSymbolFound = NO;
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"[&]\\b\\w*$" options:NSRegularExpressionCaseInsensitive error:&error];
if(!error) {
    NSUInteger numberOfMatches = [regex numberOfMatchesInString:stringValue options:0 range:NSMakeRange(0, [stringValue length])];
    if(numberOfMatches > 0)
        flagSymbolFound = YES;
    else
        flagSymbolFound = NO;
}
Try "\\s[&]\\w+$" pattern. It should match space-separated words, e.g. foo &bar
Try adding the "^" anchor to the beginning of your regex:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^[&]\\b\\w*$" options:NSRegularExpressionCaseInsensitive error:&error];
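The original pattern matches an ampersand in the middle of a word because nothing constrains what comes before the &. One way to combine the two suggestions above is to require either the start of the string or whitespace in front of the ampersand. This is only a sketch under the assumption that words are separated by whitespace and that the string has no trailing whitespace; the function name LastWordStartsWithAmpersand is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when the last whitespace-separated word begins with '&'.
static BOOL LastWordStartsWithAmpersand(NSString *text) {
    NSError *error = nil;
    // (?:^|\s) : the ampersand must be at the start of the string or follow whitespace
    // &\w*$    : an ampersand, then only word characters up to the end of the string
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"(?:^|\\s)&\\w*$"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Error making regex: %@", error);
        return NO;
    }
    NSUInteger count = [regex numberOfMatchesInString:text
                                              options:0
                                                range:NSMakeRange(0, [text length])];
    return count > 0;
}
```

With this pattern, "foo &bar" and "&bar" are accepted, while "foo b&ar" and "foo &bar baz" are rejected.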

NSRegularExpression and capture groups on iphone

I need a little kickstart on regex on the iphone.
Basically I have a list of dates in a private MediaWiki in the form of
*185 BC: SOME EVENT HERE
*2001: SOME OTHER EVENT MUCH LATER
I now want to parse that into an object that has an NSDate property and, say, an NSString property.
I have this so far: (rawContentString contains the mediawiki syntax of the page)
NSString* regexString = @"\\*( *[0-9]{1,}.*): (.*)";
NSRegularExpressionOptions options = NSRegularExpressionCaseInsensitive;
NSError* error = NULL;
NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:regexString options:options error:&error];
if (error) {
    NSLog(@"%@", [error description]);
}
NSArray* results = [regex matchesInString:rawContentString options:0 range:NSMakeRange(0, [rawContentString length])];
for (NSTextCheckingResult* result in results) {
    NSString* resultString = [rawContentString substringWithRange:result.range];
    NSLog(@"%@",resultString);
}
Unfortunately, I think the regex is not working the way I hoped, and I don't know how to capture the matched date and text.
Any help would be great.
BTW: there is not by any chance a regex Pattern compilation for MediaWiki Syntax out there somewhere ?
Thanks in advance
Heiko
My issue was that I was using matchesInString and I needed to use firstMatchInString because it returns multiple ranges in a single NSTextCheckingResult.
This is counter intuitive, but it worked.
I got the answer from http://snipplr.com/view/63340/
My Code (to parse credit card track data):
NSRegularExpression *track1Pattern = [NSRegularExpression regularExpressionWithPattern:@"%.(.+?)\\^(.+?)\\^([0-9]{2})([0-9]{2}).+?\\?." options:NSRegularExpressionCaseInsensitive error:&error];
NSTextCheckingResult *result = [track1Pattern firstMatchInString:trackString options:NSMatchingReportCompletion range:NSMakeRange(0, trackString.length)];
self.cardNumber = [trackString substringWithRange: [result rangeAtIndex:1]];
self.cardHolderName = [trackString substringWithRange: [result rangeAtIndex:2]];
self.expirationMonth = [trackString substringWithRange: [result rangeAtIndex:3]];
self.expirationYear = [trackString substringWithRange: [result rangeAtIndex:4]];
As for the regex, I think something along these lines:
\*([ 0-9]{1,}.*):(.*)
should work better to what you need. You're not escaping the first *, and why is there a * in the first group statement?
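For the MediaWiki list in the question, the date and the event text can be read from each match with rangeAtIndex:, and matchesInString: works as long as every result is queried per group. The following is a minimal sketch, not a general MediaWiki parser: it assumes one entry per line, a year made of digits optionally followed by "BC", and it returns the date as a string rather than an NSDate. The name ParseTimeline is made up for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns an array of two-element arrays (date, event),
// one for each line of the form "*185 BC: SOME EVENT HERE".
static NSArray *ParseTimeline(NSString *rawContent) {
    NSError *error = nil;
    // ^ and $ match at line boundaries because of AnchorsMatchLines.
    // Group 1: the year, optionally followed by "BC". Group 2: the rest of the line.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^\\*\\s*([0-9]+(?:\\s*BC)?):\\s*(.*)$"
                                                  options:NSRegularExpressionAnchorsMatchLines
                                                    error:&error];
    NSMutableArray *entries = [NSMutableArray array];
    if (regex == nil) {
        NSLog(@"Error making regex: %@", error);
        return entries;
    }
    NSArray *matches = [regex matchesInString:rawContent
                                      options:0
                                        range:NSMakeRange(0, [rawContent length])];
    for (NSTextCheckingResult *match in matches) {
        // match.range covers the whole line; the groups are read by index.
        NSString *date = [rawContent substringWithRange:[match rangeAtIndex:1]];
        NSString *event = [rawContent substringWithRange:[match rangeAtIndex:2]];
        [entries addObject:[NSArray arrayWithObjects:date, event, nil]];
    }
    return entries;
}
```

Given the two sample lines from the question, this produces the pairs ("185 BC", "SOME EVENT HERE") and ("2001", "SOME OTHER EVENT MUCH LATER"). Converting the date string into an NSDate is left out here because the question does not say which calendar handling is wanted.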