Stripping URLs from image data using NSRegularExpression - iphone

I want to extract image URLs from lots of differently formatted HTML.
I already have this:
NSRegularExpression *regex = [[NSRegularExpression alloc]
    initWithPattern:@"(?<=img src=\").*?(?=\")"
    options:NSRegularExpressionCaseInsensitive error:nil];
This works fine if the HTML is formed like <img src="someurl.jpg" alt="" .../>, but that isn't always the case: sometimes there are other attributes before src, which it doesn't pick up.

It's a difficult thing to do with regular expressions. You are generally better off using an XML parser and XPath. However, if the HTML isn't valid (even after running it through TidyHTML), you may find that XPath just doesn't work very well.
If you must look for images using regular expressions, I would suggest something like:
<\\s*?img\\s+[^>]*?\\s*src\\s*=\\s*([\"\'])((\\\\?+.)*?)\\1[^>]*?>
So, assuming the HTML is in an NSString named rawHTML, use:
NSRegularExpression* regex = [[NSRegularExpression alloc] initWithPattern:@"<\\s*?img\\s+[^>]*?\\s*src\\s*=\\s*([\"\'])((\\\\?+.)*?)\\1[^>]*?>" options:NSRegularExpressionCaseInsensitive error:nil];
NSArray *imagesHTML = [regex matchesInString:rawHTML options:0 range:NSMakeRange(0, [rawHTML length])];
[regex release];
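Because the pattern above captures the quote character in group 1 and the attribute value in group 2, you can often read the src value straight out of each match with rangeAtIndex: instead of running a second regex. A minimal sketch, not tested against real-world HTML; the function name ImageSourcesInHTML is hypothetical, and it only uses autoreleased factory methods so it should work under both ARC and manual retain/release:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the raw src attribute values of all <img> tags.
// Group 1 of the pattern is the opening quote, group 2 is the attribute value.
NSArray *ImageSourcesInHTML(NSString *rawHTML) {
    NSString *pattern = @"<\\s*?img\\s+[^>]*?\\s*src\\s*=\\s*([\"\'])((\\\\?+.)*?)\\1[^>]*?>";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    NSMutableArray *sources = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:rawHTML
                                      options:0
                                        range:NSMakeRange(0, [rawHTML length])];
    for (NSTextCheckingResult *match in matches) {
        NSRange srcRange = [match rangeAtIndex:2];
        if (srcRange.location != NSNotFound) {
            [sources addObject:[rawHTML substringWithRange:srcRange]];
        }
    }
    return sources;
}
```

The values come back exactly as written in the markup, so relative paths such as "a.jpg" still need to be resolved against the page URL (for example with +[NSURL URLWithString:relativeToURL:]).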
To extract the actual image URL from each matched tag, I'd run something like this over the output of the previous regex:
(?i)\\b((?:[a-z][\\w-]+:(?:/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’]))
Yeah, I know, crazy! But you did ask :-)
Credit: That final regex is from John Gruber/Daring Fireball.
This is some code I've written in the past that returns an array of NSURL objects for the images it finds. I use it as a last resort when trying to get image URLs from very broken HTML:
- (NSArray *)extractSuitableImagesFromRawHTMLEntry:(NSString *)rawHTML {
    NSMutableArray *images = [[NSMutableArray alloc] init];
    if (rawHTML != nil && [rawHTML length] != 0) {
        // Pass 1: find every <img ...> tag that has a quoted src attribute
        NSRegularExpression *regex = [[NSRegularExpression alloc] initWithPattern:@"<\\s*?img\\s+[^>]*?\\s*src\\s*=\\s*([\"\'])((\\\\?+.)*?)\\1[^>]*?>" options:NSRegularExpressionCaseInsensitive error:nil];
        NSArray *imagesHTML = [regex matchesInString:rawHTML options:0 range:NSMakeRange(0, [rawHTML length])];
        [regex release];
        // Pass 2: John Gruber's URL regex, compiled once instead of once per tag
        NSRegularExpression *regex2 = [[NSRegularExpression alloc] initWithPattern:@"(?i)\\b((?:[a-z][\\w-]+:(?:/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’]))" options:NSRegularExpressionCaseInsensitive error:nil];
        NSArray *extensions = [NSArray arrayWithObjects:@".jpg", @".jpeg", @".png", nil];
        for (NSTextCheckingResult *image in imagesHTML) {
            NSString *imageHTML = [rawHTML substringWithRange:image.range];
            NSArray *imageSource = [regex2 matchesInString:imageHTML options:0 range:NSMakeRange(0, [imageHTML length])];
            NSString *imageSourceURLString = nil;
            for (NSTextCheckingResult *result in imageSource) {
                NSString *str = [imageHTML substringWithRange:result.range];
                //NSLog(@"url is %@", str);
                if (![str hasPrefix:@"http"]) {
                    continue;
                }
                // Strip off anything after the file extension (tries jpg, then jpeg, then png).
                // The search options are bit flags, so combine them with |, not &&.
                for (NSString *ext in extensions) {
                    NSRange r = [str rangeOfString:ext options:NSBackwardsSearch | NSCaseInsensitiveSearch];
                    if (r.location != NSNotFound) {
                        imageSourceURLString = [str substringToIndex:NSMaxRange(r)];
                        break;
                    }
                }
                if (imageSourceURLString != nil) {
                    break; // use the first URL in this tag that has an image extension
                }
            }
            if (imageSourceURLString == nil) {
                //NSLog(@"No image found.");
            } else {
                NSLog(@"*** image found: %@", imageSourceURLString);
                NSURL *imageURL = [NSURL URLWithString:imageSourceURLString];
                if (imageURL != nil) {
                    [images addObject:imageURL];
                }
            }
        }
        [regex2 release];
    }
    return [images autorelease];
}
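The extension-trimming step only behaves as intended when the NSString search options are combined with a bitwise OR. NSBackwardsSearch&&NSCaseInsensitiveSearch is a logical AND that evaluates to 1, which happens to equal NSCaseInsensitiveSearch, so the search silently runs forwards. Here is that step isolated as a sketch; the function name TrimAfterImageExtension and the extension list are illustrative assumptions:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: cuts a URL string off right after the last
// ".jpg", ".jpeg" or ".png" (tried in that order), or returns nil.
NSString *TrimAfterImageExtension(NSString *urlString) {
    NSArray *extensions = [NSArray arrayWithObjects:@".jpg", @".jpeg", @".png", nil];
    for (NSString *ext in extensions) {
        // The options are bit flags, so they must be OR-ed together.
        NSRange r = [urlString rangeOfString:ext
                                     options:NSBackwardsSearch | NSCaseInsensitiveSearch];
        if (r.location != NSNotFound) {
            return [urlString substringToIndex:NSMaxRange(r)];
        }
    }
    return nil;
}
```

For "http://x.com/a.jpg/b.jpg?s" the backwards search keeps "http://x.com/a.jpg/b.jpg", whereas a forward search would stop at the first ".jpg".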

Related

regex to find hashtags in tweet not working correctly

I am trying to build a function that finds hashtags in tweets and surrounds them with an HTML <a> tag, so that I can link to them. Here is what I do:
NSError* error = nil;
NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:@"(?:\\s|\\A)[##]+([A-Za-z0-9-_]+)" options:0 error:&error];
NSArray* matches = [regex matchesInString:tweetText options:0 range:NSMakeRange(0, [tweetText length])];
for ( NSTextCheckingResult* match in matches )
{
    NSString* matchText = [tweetText substringWithRange:[match range]];
    NSString *matchText2 = [matchText stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
    NSString *search = [matchText2 stringByReplacingOccurrencesOfString:@"#"
                                                             withString:@""];
    NSString *searchHTML = [NSString stringWithFormat:@"<a href='https://twitter.com/search?q=%%23%@'>%@</a>", search, matchText];
    tweetText = [tweetText stringByReplacingOccurrencesOfString:matchText
                                                     withString:searchHTML];
    NSLog(@"match: %@", tweetText);
}
Before I execute this function, tweetText is passed through another function that finds URLs, so the tweet can contain something like <a href='http://google.be' target='_blank'>http://google.be</a>.
Now it sometimes places another tag around other links, not only around the hashtags.
Can somebody help me with this?
TIP
I am trying to translate the following Java code into Objective-C:
String patternStr = "(?:\\s|\\A)[##]+([A-Za-z0-9-_]+)";
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(tweetText);
String result = "";
// Search for Hashtags
while (matcher.find()) {
    result = matcher.group();
    result = result.replace(" ", "");
    String search = result.replace("#", "");
    String searchHTML = "<a href='http://search.twitter.com/search?q=" + search + "'>" + result + "</a>";
    tweetText = tweetText.replace(result, searchHTML);
}
EDIT
Gers, we kijken er al naar uit! “#GersPardoel: We zitten in België straks naar Genk!!<a href='<a href<a href='https://twitter.com/search?q=%23='http'>='http</a>s://twitter.com/search?q=%23https:/'>https:/</a>/twitter.com/search?q=%23engaan'> #engaan</a>” #GOS12 #genk #fb
The problem is that you're modifying your tweetText variable (tweetText = ...) while you're looping through the matches. Think about what happens the next time the code enters the loop: the ranges in matches were computed on the original string, so substringWithRange: no longer returns the text you expect. Try to fix the problem yourself, and if you can't, check the solution here: http://pastebin.com/DyQqtRzA
EDIT: Adding the solution here:
NSError* error = nil;
NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:@"(?:\\s|\\A)[##]+([A-Za-z0-9-_]+)" options:0 error:&error];
NSArray* matches = [regex matchesInString:tweetText options:0 range:NSMakeRange(0, [tweetText length])];
NSString* processedString = [[tweetText copy] autorelease];
for ( NSTextCheckingResult* match in matches )
{
    NSString* matchText = [tweetText substringWithRange:[match range]];
    NSString *matchText2 = [matchText stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
    NSString *search = [matchText2 stringByReplacingOccurrencesOfString:@"#"
                                                             withString:@""];
    NSString *searchHTML = [NSString stringWithFormat:@"<a href='https://twitter.com/search?q=%%23%@'>%@</a>", search, matchText];
    processedString = [processedString stringByReplacingOccurrencesOfString:matchText
                                                                 withString:searchHTML];
    NSLog(@"match: %@", processedString);
}
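Another way to avoid both stale ranges and stringByReplacingOccurrencesOfString: rewriting text elsewhere in the string is to replace each match by its own range in an NSMutableString, walking the matches from last to first so that the earlier ranges stay valid. A minimal sketch; LinkHashtags is a hypothetical name, and the pattern is simplified (an assumption) to a single # character:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: wraps each "#tag" in a Twitter search link.
// Matches are applied last-to-first, so ranges computed on the original
// string are still correct when each replacement is made.
NSString *LinkHashtags(NSString *tweetText) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"(?:\\s|\\A)#([A-Za-z0-9_-]+)"
                                                  options:0
                                                    error:NULL];
    NSArray *matches = [regex matchesInString:tweetText
                                      options:0
                                        range:NSMakeRange(0, [tweetText length])];
    NSMutableString *result = [NSMutableString stringWithString:tweetText];
    for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
        // Group 1 is the tag without '#'. Replace only "#tag", leaving any
        // whitespace consumed by (?:\s|\A) in place.
        NSRange tagRange = [match rangeAtIndex:1];
        NSRange hashAndTag = NSMakeRange(tagRange.location - 1, tagRange.length + 1);
        NSString *tag = [tweetText substringWithRange:tagRange];
        NSString *link = [NSString stringWithFormat:@"<a href='https://twitter.com/search?q=%%23%@'>#%@</a>", tag, tag];
        [result replaceCharactersInRange:hashAndTag withString:link];
    }
    return result;
}
```

Because the pattern requires whitespace or the start of the string before the #, text such as "%23" inside an href that was already inserted is never matched again.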

how to write NSRegularExpression to xx:xx:xx?

I am trying to check whether an NSString is in a specific format: dd:dd:dd. I was thinking of NSRegularExpression, something like
/^(\d\d:\d\d:\d\d)$/ ?
Have you tried something like:
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^\\d{2}:\\d{2}:\\d{2}$"
                                                                       options:0
                                                                         error:&error];
NSUInteger numberOfMatches = [regex numberOfMatchesInString:string
                                                    options:0
                                                      range:NSMakeRange(0, [string length])];
BOOL isValid = (numberOfMatches == 1);
(I haven't tested it, because I can't right now, but it should work. Note that the backslashes have to be doubled inside an Objective-C string literal.)
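Wrapped up as a reusable check, a sketch could look like this. It uses [0-9] instead of \d because ICU's \d matches any Unicode decimal digit, not only ASCII 0-9. The function name IsColonSeparatedDigitPairs is hypothetical, and the check only verifies the shape, so 99:99:99 passes:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the whole string is two ASCII digits, a colon,
// two digits, a colon and two digits (e.g. "12:34:56").
BOOL IsColonSeparatedDigitPairs(NSString *string) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^[0-9]{2}:[0-9]{2}:[0-9]{2}$"
                                                  options:0
                                                    error:NULL];
    NSUInteger numberOfMatches = [regex numberOfMatchesInString:string
                                                        options:0
                                                          range:NSMakeRange(0, [string length])];
    return numberOfMatches == 1;
}
```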
I suggest using RegexKitLite.
With it, and assuming that the 'd' in dd:dd:dd stands for a digit from 0-9, it should be fairly easy to implement what you need, given the additional comment from Grijesh.
Here's an example adapted from the RegexKitLite page:
// finds phone number in format nnn-nnn-nnnn
NSString *regEx = @"[0-9]{3}-[0-9]{3}-[0-9]{4}";
NSString *match = [textView.text stringByMatching:regEx];
if ([match length] > 0) { // covers both nil and an empty string
    NSLog(@"Phone number is %@", match);
} else {
    NSLog(@"Not found.");
}
UPDATE:
NSString *idRegex = @"[0-9][0-9]:[0-9][0-9]:[0-9][0-9]";
NSPredicate *idTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", idRegex];
for (NSString *str in newArrAfterPars) {
    if ([idTest evaluateWithObject:str]) {
        // str is in dd:dd:dd format
    }
}

NSRegularExpression ISSUE

I'm using NSRegularExpression to read a text and find hashtags.
This is the NSString I pass to regularExpressionWithPattern:
- (NSString *)hashtagRegex
{
    return @"#((?:[A-Za-z0-9-_]*))";
    //return @"#{1}([A-Za-z0-9-_]{2,})";
}
And this is my method:
// Handle Twitter Hashtags
detector = [NSRegularExpression regularExpressionWithPattern:[self hashtagRegex] options:0 error:&error];
links = [detector matchesInString:theText options:0 range:NSMakeRange(0, theText.length)];
current = [NSMutableArray arrayWithArray:links];
NSString *hashtagURL = @"http://twitter.com/search?q=%23";
//hashtagURL = [hashtagURL stringByAddingPercentEscapesUsingEncoding:NSASCIIStringEncoding];
for ( int i = 0; i < [links count]; i++ ) {
    NSTextCheckingResult *cr = [current objectAtIndex:i];
    NSString *url = [theText substringWithRange:cr.range];
    NSString *nohashURL = [url stringByReplacingOccurrencesOfString:@"#" withString:@""];
    nohashURL = [nohashURL stringByReplacingOccurrencesOfString:@" " withString:@""];
    [theText replaceOccurrencesOfString:url
                             withString:[NSString stringWithFormat:@"<a href=\"%@%@\">%@</a>", hashtagURL, nohashURL, url]
                                options:NSLiteralSearch
                                  range:NSMakeRange(0, theText.length)];
    current = [NSMutableArray arrayWithArray:[detector matchesInString:theText options:0 range:NSMakeRange(0, theText.length)]];
}
[theText replaceOccurrencesOfString:@"\n" withString:@"<br />" options:NSLiteralSearch range:NSMakeRange(0, theText.length)];
[_aWebView loadHTMLString:[self embedHTMLWithFontName:[self fontName]
                                                 size:[self fontSize]
                                                 text:theText]
                  baseURL:nil];
Everything works, but I ran into a small issue when I use a string like this:
NSString * theText = @"#twitter #twitterapp #twittertag";
My code highlights only the "#twitter" part of each word, not the rest of it (#twitter #twitter(app) #twitter(tag)).
I hope someone can help me!
Thank you :)
The statement
[theText replaceOccurrencesOfString:url
                         withString:[NSString stringWithFormat:@"<a href=\"%@%@\">%@</a>", hashtagURL, nohashURL, url]
                            options:NSLiteralSearch
                              range:NSMakeRange(0, theText.length)];
is replacing all instances of the string url with the replacement string. In the example you give, the first time through the loop url is @"#twitter", and all three occurrences of that string within theText are replaced in one go. This is what theText looks like then:
<a href="http://twitter.com/search?q=%23twitter">#twitter</a> <a href="http://twitter.com/search?q=%23twitter">#twitter</a>app <a href="http://twitter.com/search?q=%23twitter">#twitter</a>tag
So, of course, the next two times round the loop, the results are not quite what you expect... !
I think the fix is to limit the range of the replacement to the current match:
[theText replaceOccurrencesOfString:url
                         withString:[NSString stringWithFormat:@"<a href=\"%@%@\">%@</a>", hashtagURL, nohashURL, url]
                            options:NSLiteralSearch
                              range:cr.range];
This works because your loop re-runs the detector after every replacement, so cr.range always refers to the current contents of theText.
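If you don't need to build each replacement by hand, NSRegularExpression can also do the whole job in one pass with a replacement template, where $1 stands for capture group 1. A sketch under a few assumptions: the function name LinkHashtagsWithTemplate is hypothetical, and the pattern uses + instead of * so that a lone # is left alone:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: links every hashtag with a single template replacement.
// $1 is capture group 1 (the tag without '#'); '%' and '#' are literal in templates.
NSString *LinkHashtagsWithTemplate(NSString *text) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"#([A-Za-z0-9_-]+)"
                                                  options:0
                                                    error:NULL];
    return [regex stringByReplacingMatchesInString:text
                                           options:0
                                             range:NSMakeRange(0, [text length])
                                      withTemplate:@"<a href=\"http://twitter.com/search?q=%23$1\">#$1</a>"];
}
```

Each match is replaced exactly once, so "#twitterapp" becomes a single link instead of a "#twitter" link followed by "app". Like the original code, it would also link a # that appears inside existing HTML (for example "page.html#top"), so run it before inserting other markup.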

Take part of string in-between symbols?

I would like to take the number that sits immediately before the ` symbol (back to the nearest non-numeric character) and convert it into an integer.
Examples:
Original string: 2*3*(123`)
Result: 123
Original string: 4`12
Result: 4
Thanks,
Regards.
You can use regular expressions. You can find all the occurrences like this:
NSString *mystring = @"123(12`)456+1093`";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"([0-9]+)`" options:0 error:nil];
NSArray *matches = [regex matchesInString:mystring options:0 range:NSMakeRange(0, mystring.length)];
for (NSTextCheckingResult *match in matches) {
    NSLog(@"%@", [mystring substringWithRange:[match rangeAtIndex:1]]);
}
// 12 and 1093
If you only need one occurrence, replace the for loop with the following:
if (matches.count > 0) {
    NSTextCheckingResult *match = [matches objectAtIndex:0];
    NSLog(@"%@", [mystring substringWithRange:[match rangeAtIndex:1]]);
}
There may be a better way to do this, but this is what I could quickly come up with:
NSString *mystring = @"123(12`)";
NSString *neededString = nil;
NSScanner *scanner = [NSScanner scannerWithString:mystring];
[scanner scanUpToString:@"`" intoString:&neededString];
neededString = [self reverseString:neededString];
NSLog(@"%@", [self reverseString:[NSString stringWithFormat:@"%d", [neededString intValue]]]);
reverseString: is a helper method you have to write yourself; to reverse a string, you can see this.
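If you'd rather avoid the reverse-twice trick (which, as written, also drops trailing zeros: "120" reversed is "021", intValue gives 21, and reversing "21" gives "12"), you can search backwards from the backtick for the nearest non-digit. A sketch that only looks at the first backtick; the function name NumberBeforeBacktick is hypothetical:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the number written immediately before the first
// backtick (e.g. 123 for "2*3*(123`)"), or nil if there are no digits there.
NSNumber *NumberBeforeBacktick(NSString *s) {
    NSRange tick = [s rangeOfString:@"`"];
    if (tick.location == NSNotFound) return nil;
    // Find the last non-digit character before the backtick.
    NSCharacterSet *nonDigits =
        [[NSCharacterSet characterSetWithCharactersInString:@"0123456789"] invertedSet];
    NSRange nonDigit = [s rangeOfCharacterFromSet:nonDigits
                                          options:NSBackwardsSearch
                                            range:NSMakeRange(0, tick.location)];
    NSUInteger start = (nonDigit.location == NSNotFound) ? 0 : NSMaxRange(nonDigit);
    if (start == tick.location) return nil; // nothing numeric before the backtick
    NSString *digits = [s substringWithRange:NSMakeRange(start, tick.location - start)];
    return [NSNumber numberWithInteger:[digits integerValue]];
}
```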

Match NSArray of characters Objective-C

I have to count the number of occurrences of n special characters in a string.
I thought of creating an array with all these characters (there are 20+) and writing a function that matches each of them.
I only need the total number of special characters in the string, so that I can do some math with it.
So in the example:
NSString *myString = @"My string #full# of speci#l ch#rs & symbols";
NSArray *myArray = [NSArray arrayWithObjects:@"#",@"#",@"&",nil];
the function should return 5.
Would it be easier to remove the characters that are in the array, then output the difference between the length of the original string and the length of the one without special characters?
Is this the best solution?
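The length-difference idea from the question can be sketched like this; CountSpecialCharacters is a hypothetical name, and it assumes every array entry is a single character. Note that NSString lengths are counted in UTF-16 code units, so characters outside the Basic Multilingual Plane (such as emoji) would count as two:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes every character that appears in `specials`
// and returns how many UTF-16 code units were removed.
NSUInteger CountSpecialCharacters(NSString *str, NSArray *specials) {
    NSCharacterSet *set =
        [NSCharacterSet characterSetWithCharactersInString:[specials componentsJoinedByString:@""]];
    NSString *stripped =
        [[str componentsSeparatedByCharactersInSet:set] componentsJoinedByString:@""];
    return [str length] - [stripped length];
}
```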
NSString *myString = @"My string #full# of speci#l ch#rs & symbols";
// This also works when special characters are consecutive: for the string below it returns 8
//NSString *myString = @"#&#My string #full# of speci#l ch#rs & symbols";
NSArray *arr = [myString componentsSeparatedByCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@"##&"]];
NSLog(@"resulted string : %@ \n\n", arr);
NSLog(@"count of special characters : %lu \n\n", (unsigned long)([arr count] - 1));
OUTPUT:
resulted string : (
"My string ",
full,
" of speci",
"l ch",
"rs ",
" symbols"
)
count of special characters : 5
You should use an NSRegularExpression; it's perfect for your scenario. You can create one like this:
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(#|&)" options:NSRegularExpressionCaseInsensitive error:&error];
NSUInteger numberOfMatches = [regex numberOfMatchesInString:string options:0 range:NSMakeRange(0, [string length])];
Caveat: I took the code from the Apple Developer site, and I'm no regex guru, so you will have to tweak the pattern. But you get the gist.
You should also look at NSRegularExpression:
- (NSUInteger)numberOfCharacters:(NSArray *)arr inString:(NSString *)str {
    // Build an alternation such as "(#|&)", escaping each entry in case it is a regex metacharacter
    NSMutableString *mutStr = [NSMutableString stringWithString:@"("];
    for (NSUInteger i = 0; i < [arr count]; i++) {
        [mutStr appendString:[NSRegularExpression escapedPatternForString:[arr objectAtIndex:i]]];
        if (i + 1 < [arr count]) [mutStr appendString:@"|"];
    }
    [mutStr appendString:@")"];
    NSRegularExpression *regEx = [NSRegularExpression regularExpressionWithPattern:mutStr options:0 error:nil];
    NSUInteger occur = [regEx numberOfMatchesInString:str options:0 range:NSMakeRange(0, [str length])];
    return occur;
}
Usage example:
NSString *myString = @"My string #full# of speci#l ch#rs & symbols";
NSArray *myArray = [NSArray arrayWithObjects:@"#",@"#",@"&",nil];
NSLog(@"%lu", (unsigned long)[self numberOfCharacters:myArray inString:myString]); // will print 5