I use this regular expression to find the img src in a string on one of my sites.
Now I want to use this expression to do the same thing in Objective-C. How can I do that using RegexKitLite?
This is my expression:
/<img.+src=[\'"]([^\'"]+)[\'"].*>/i
@Tim Pietzcker
Your code works great but for example if I try to search img in this string
<p> <img src="http://www.nationalgeographic.it/images/2011/07/29/115624013-20034abf-4d91-40fe-98ab-782f06a9854d.jpg" width="140" align="left" hspace="10">Scoperta in America del Sud la sepoltura pre-incaica di un uomo circondato da coltelli cerimoniali che secondo gli archeologi eseguiva sacrifici umani</p>
I have this result in my array:
matchArray: (
"<img src=\"http://www.nationalgeographic.it/images/2011/07/29/115624013-20034abf-4d91-40fe-98ab-782f06a9854d.jpg\" width=\"140\" align=\"left\" hspace=\"10\">"
)
How can I modify your regex to get only the content of the src attribute? Thank you so much.
The / delimiters are throwing you off. Also, you should at least use lazy quantifiers. Try this:
NSString *regexString = @"(?i)<img.+?src=['\"]([^'\"]+)['\"].*?>";
This breaks when filenames contain quotes, by the way. Could that be a problem for you?
A regex that's a bit safer (and that handles quotes well) would be
NSString *regexString = @"(?i)<img[^<>]+?src=(['\"])((?:(?!\\1).)+)\\1[^<>]*>";
However, the matched filename will now be in capture group 2, not 1, so you need to modify any code that uses the filename after the match.
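To see how the capture groups line up, here is a minimal sketch of the safer pattern in Python rather than Objective-C (Python's re module accepts the same syntax for this pattern). The sample HTML and the names IMG_SRC, html and src are made up for illustration.

```python
import re

# Same pattern as above, written as a Python raw string.
# Group 1 is the opening quote character, group 2 is the src value.
IMG_SRC = re.compile(r"""(?i)<img[^<>]+?src=(['"])((?:(?!\1).)+)\1[^<>]*>""")

# Hypothetical sample input.
html = '<p> <img src="http://example.com/a.jpg" width="140" align="left">Caption</p>'

match = IMG_SRC.search(html)
src = match.group(2) if match else None  # only the attribute value, not the whole tag
print(src)
```

The whole match is still the full img tag; only the capture group holds the URL. In RegexKitLite, asking for the capture group instead of the whole match (I believe componentsMatchedByRegex:capture: does this) should give the same result.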
Related
I am trying to extract a substring out of some html code in wxWidgets but I can't get my method working properly.
content of to_parse:
[HTML CODE]
<html><head></head><body><font face="Segue UI" size=2 .....<font face="Segoe UI"size="2" color="#000FFF"><font face="#DFKai-SB" ... <b><u> the text </u></b></font></font></font></body></html>
[/HTML CODE] (sorry about the format)
wxString to_parse = SOStream.GetString();
size_t spos = to_parse.find_last_of("<font face=",wxString::npos);
size_t epos = to_parse.find_first_of("</font>",wxString::npos);
wxString retstring(to_parse.Mid(spos,epos));
wxMessageBox(retstring); // Output: always ---> tml>
As there are several font face tags in the HTML in the to_parse variable, I would like to find the position of the last "<font face=" and the position of the first "</font>" closing tag.
For some reason, I always get the same unexpected output: tml>
Can anyone spot the reason why?
The methods find_{last,first}_of() don't do what you seem to think they do. They behave like the std::basic_string<> methods of the same name: they find the first (or last) occurrence of any single character from the string you pass to them, not of the string as a whole; see the documentation.
If you want to search for a substring, use find().
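The difference explains the "tml>" output: the last character of the HTML that belongs to the set of characters in "<font face=" is the 't' of "</html>". A rough Python sketch of the two behaviours follows; find_last_of here is a hypothetical helper that emulates the C++ semantics, and the HTML is a shortened stand-in for the real input.

```python
def find_last_of(text: str, chars: str) -> int:
    """Index of the last character in text that appears anywhere in chars, or -1."""
    return max((text.rfind(c) for c in set(chars)), default=-1)


# Shortened, made-up stand-in for the HTML in the question.
html = '<font face="Segoe UI"><b>the text</b></font></body></html>'

by_char_set = find_last_of(html, "<font face=")  # lands on the 't' of "</html>"
by_substring = html.rfind("<font face=")         # start of the last "<font face="

print(html[by_char_set:])
print(by_substring)
```

Searching for the substring is what find() (or rfind() for the last occurrence) does.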
Thank you for the answer. Yes you were right, I must have somehow been under the impression that Substring() / substr() / Mid() takes two wxStrings as parameters, which isn't the case.
wxString to_parse = SOStream.GetString();
to_parse = to_parse.Mid(to_parse.find("<p ")); // discards everything before "<p "
to_parse = to_parse.Remove(to_parse.find("</p>")); // removes everything from "</p>" onward
wxMessageBox(to_parse); // so we are left with everything from "<p " up to "</p>"
I am trying to add '\' before all special characters in a string in MATLAB, could anyone please help me out. Here is the example:
tStr = 'Hi, I'm a Big (Not So Big) MATLAB addict; Since my school days!';
I want this string to be changed to:
'Hi\, I\'m a Big \(Not so Big \) MATLAB addict\; Since my school days\!'
In MATLAB, the escape character inside a string is the single quote ('), not the backslash (\) as in C, so a literal quote is written by doubling it. Thus, your string must look like this:
tStr = 'Hi\, I\''m a Big (Not so Big ) MATLAB addict\; Since my school days!'
I took the list of special characters defined on the MathWorks webpage to do this:
special = '[]{}()=''.().....,;:%%{%}!#';
tStr = 'Hi, I''m a Big (Not So Big) MATLAB addict; Since my school days!';
outStr = '';
for l = tStr
    if (length(find(special == l)) > 0)
        outStr = [outStr, '\', l];
    else
        outStr = [outStr, l];
    end
end
which will automatically add those backslashes. You do need to use two single quotes ('') in place of the apostrophe in your input string. If tStr is obtained with the function input(), or something similar, this procedure will still work.
Edited:
Or using regular expressions:
regexprep(tStr,'([[\]{}()=''.(),;:%%{%}!#])','\\$1')
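For comparison, the same character-class replacement can be sketched in Python. The SPECIAL list below is an assumption chosen for this example, not the official MATLAB list, and escape_special is a hypothetical helper name.

```python
import re

# Assumed set of characters to escape; adjust to the list you actually need.
SPECIAL = "[]{}()='.,;:%!#"

# re.escape makes every listed character safe to place inside a character class.
PATTERN = re.compile("[" + re.escape(SPECIAL) + "]")


def escape_special(text: str) -> str:
    """Prefix each special character in text with a backslash."""
    return PATTERN.sub(lambda m: "\\" + m.group(0), text)


result = escape_special("Hi, I'm a Big (Not So Big) MATLAB addict; Since my school days!")
print(result)
```

Using a function as the replacement avoids having to think about how backslashes are interpreted in the replacement string.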
I want to clean my text of HTML tags, HTML special characters and characters like < > [ ] / \ * ,
I used $str = preg_replace("/&#?[a-zA-Z0-9]+;/i", "", $str);
It works well for HTML special characters, but some characters are not removed, such as:
( /*/*]]>*/ )
How can I remove these characters?
If you are really using php as it looks like, you can just use:
$str = htmlspecialchars($str);
All HTML chars will be escaped (which could be better than just stripping them). If you really want to filter out just these characters, you need to list them, escaped, inside a character class:
$str = preg_replace("/[\&#\?\]\[\/\\\<\>\*\:\(\);]*/i","",$str);
Notice there is just one character class, "/[]*/i". I removed a-zA-Z0-9 because you presumably want to keep those characters. Alternatively, you can whitelist only the characters allowed in your string. This gives you trouble with accented characters like á é ü if you use them, because you have to list every accepted character:
$str = preg_replace("/[^a-zA-Z0-9áÁéÉíÍãÃüÜõÕñÑ\.\+\-\_\%\$\#\!\=;]*/","",$str);
Notice also that escaping a punctuation character unnecessarily does no harm, except inside ranges: a-z works as a range, whereas a\-z matches a, or -, or z.
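The trade-off between escaping and stripping can be seen in a small Python sketch. This is an illustration under assumptions, not the PHP code itself: html.escape stands in for htmlspecialchars, the character class mirrors the blacklist above, and the raw string is a made-up sample.

```python
import html
import re

# Made-up sample containing markup characters and a CDATA-style remnant.
raw = "a < b & c /*]]>*/"

# Escaping keeps every character but makes it harmless in HTML.
escaped = html.escape(raw)

# Stripping deletes every character listed in the class.
stripped = re.sub(r"[&#?\]\[/\\<>*:();]", "", raw)

print(escaped)
print(repr(stripped))
```

Escaping is reversible and loses nothing; stripping is simpler to read afterwards but silently drops characters such as the < in a comparison.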
I hope it helps. :)
A regular expression for HTML tags is:
/<[^>]*>/
so use something like this:
// The regular expression to remove HTML tags
// ([^>]* stops at the first '>', so each tag is matched on its own)
$htmltagsregex = '/<[^>]*>/';
// the replacement text
$nothing = '';
// the string I want to apply it to
$string = 'this is a string with <b>HTML tags</b> that I want to <strong>remove</strong>';
// DO IT
$result = preg_replace($htmltagsregex, $nothing, $string);
and it will return
this is a string with HTML tags that I want to remove
That's all
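A note on quantifiers when stripping tags: a greedy .* between the angle brackets runs from the first < to the last > in the string, so everything between two tags disappears as well. A short Python sketch of the difference, using the same sample sentence (the variable names are only illustrative):

```python
import re

text = "this is a string with <b>HTML tags</b> that I want to <strong>remove</strong>"

# Greedy: a single match spanning from the first '<' to the last '>'.
greedy = re.sub(r"<.*>", "", text)

# Bounded: [^>]* cannot cross a '>', so each tag is matched separately.
bounded = re.sub(r"<[^>]*>", "", text)

print(repr(greedy))
print(repr(bounded))
```

Even the bounded form is only a rough filter; a '>' inside an attribute value would end the match early.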
I have a chat view, where users can send urls to one another.
In case of a url, I want to let the user press on the link and open a web view.
I'm using IFTweetLabel which uses RegexKitLite.
Currently the only support available is if the url starts with http/https.
I want to support links without the http, for example www.nytimes.com, and even without the "www", e.g. nytimes.com (and a bunch of other extensions).
This is the http/s prefix regular expression:
@"([hH][tT][tT][pP][sS]?:\\/\\/[^ ,'\">\\]\\)]*[^\\. ,'\">\\]\\)])"
Can someone tell me the other regular expressions I need to cover my other requirements?
I tried using This one, but adding it to objective c code generates a lot of issues.
Thanks
The following is John Gruber's URL-matching regex:
(?i)\b(?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’])
The following is a regex I came up with by blending a few other regexes I had around and a good chunk of Gruber's regex:
(?i)\b(?:(?:[a-z][\w\-]+://(?:\S+?(?::\S+?)?\@)?)|(?:(?:[a-z0-9\-]+\.)+[a-z]{2,4}))(?:[^\s()<>]+|\((?:[^\s()<>]+|(?:\([^\s()<>]*\)))*\))*(?<![\s`!()\[\]{};:'".,<>?«»“”‘’])
The following is a sample program that demonstrates, via RegexKitLite, what each regex matches against the sample text of:
Did you see
http://www.stackoverflow.com? Or
http://www.stackoverflow.com/?
And then there is
www.stackoverflow.com/, along with
www.stackoverflow.com/index.
Maybe something like stackoverflow.com
with extra stackoverflow.com? Or
"stackoverflow.com"?
Perhaps jobs.stackoverflow.com, or
'http://twitter.com/#!/CHOCKENBERRY',
the CHOCKLOCK!!
File
#file:///Users/johne/rkl/rkl.html#RegexKitLiteCookbook?
Maybe
http://www.yahoo.com/index///i.html!
http://www.yahoo.com/////xyz.html?!
The code:
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"
int main(int argc, char *argv[]) {
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
  NSString *urlRegex = @"(?i)\\b(?:(?:[a-z][\\w\\-]+://(?:\\S+?(?::\\S+?)?\\@)?)|(?:(?:[a-z0-9\\-]+\\.)+[a-z]{2,4}))(?:[^\\s()<>]+|\\((?:[^\\s()<>]+|(?:\\([^\\s()<>]*\\)))*\\))*(?<![\\s`!()\\[\\]{};:'\".,<>?«»“”‘’])";
  // John Gruber's URL matching regex from http://daringfireball.net/2010/07/improved_regex_for_matching_urls
  NSString *gruberURLRegex = @"(?i)\\b(?:[a-z][\\w-]+:(?:/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’])";
  NSString *urlString = @"Did you see http://www.stackoverflow.com? Or http://www.stackoverflow.com/?\n\nAnd then there is www.stackoverflow.com/, along with www.stackoverflow.com/index.\n\nMaybe something like stackoverflow.com with extra stackoverflow.com? Or \"stackoverflow.com\"?\n\nPerhaps jobs.stackoverflow.com, or 'http://twitter.com/#!/CHOCKENBERRY', the CHOCKLOCK!!\n\nFile #file:///Users/johne/rkl/rkl.html#RegexKitLiteCookbook?\n\nMaybe http://www.yahoo.com/index///i.html! http://www.yahoo.com/////xyz.html?!";
  NSLog(@"String :\n\n%@\n\n", urlString);
  NSLog(@"Matches: %@\n", [urlString componentsMatchedByRegex:urlRegex]);
  NSLog(@"Gruber URL Regex Matches: %@\n", [urlString componentsMatchedByRegex:gruberURLRegex]);
  [pool release]; pool = NULL;
  return(0);
}
Compile with:
shell% gcc -o url url.m RegexKitLite.m -framework Foundation -licucore
When run:
shell% ./url
2011-05-27 20:32:58.204 url[25520:903] String :
Did you see http://www.stackoverflow.com? Or http://www.stackoverflow.com/?
And then there is www.stackoverflow.com/, along with www.stackoverflow.com/index.
Maybe something like stackoverflow.com with extra stackoverflow.com? Or "stackoverflow.com"?
Perhaps jobs.stackoverflow.com, or 'http://twitter.com/#!/CHOCKENBERRY', the CHOCKLOCK!!
File #file:///Users/johne/rkl/rkl.html#RegexKitLiteCookbook?
Maybe http://www.yahoo.com/index///i.html! http://www.yahoo.com/////xyz.html?!
2011-05-27 20:32:58.211 url[25520:903] Matches: (
"http://www.stackoverflow.com",
"http://www.stackoverflow.com/",
"www.stackoverflow.com/",
"www.stackoverflow.com/index",
"stackoverflow.com",
"stackoverflow.com",
"stackoverflow.com",
"jobs.stackoverflow.com",
"http://twitter.com/#!/CHOCKENBERRY",
"file:///Users/johne/rkl/rkl.html#RegexKitLiteCookbook",
"http://www.yahoo.com/index///i.html",
"http://www.yahoo.com/////xyz.html"
)
2011-05-27 20:32:58.213 url[25520:903] Gruber URL Regex Matches: (
"http://www.stackoverflow.com",
"http://www.stackoverflow.com/",
"www.stackoverflow.com/",
"www.stackoverflow.com/index",
"http://twitter.com/#!/CHOCKENBERRY",
"file:///Users/johne/rkl/rkl.html#RegexKitLiteCookbook",
"http://www.yahoo.com/index///i.html",
"http://www.yahoo.com/////xyz.html"
)
EDIT 2011/05/27: Made a minor change to the regex to fix a problem where it wasn't matching ( ) parenthesis correctly.
EDIT 2011/05/27: Found some additional corner cases that the regex above didn't handle well. Updated regex:
(?i)\b(?:[a-z][\w\-]+://(?:\S+?(?::\S+?)?\@)?)?(?:(?:(?<!:/|\.)(?:(?:[a-z0-9\-]+\.)+[a-z]{2,4}(?![a-z]))|(?<=://)/))(?:(?:[^\s()<>]+|\((?:[^\s()<>]+|(?:\([^\s()<>]*\)))*\))*)(?<![\s`!()\[\]{};:'".,<>?«»“”‘’])
... as an Obj-C string:
@"(?i)\\b(?:[a-z][\\w\\-]+://(?:\\S+?(?::\\S+?)?\\@)?)?(?:(?:(?<!:/|\\.)(?:(?:[a-z0-9\\-]+\\.)+[a-z]{2,4}(?![a-z]))|(?<=://)/))(?:(?:[^\\s()<>]+|\\((?:[^\\s()<>]+|(?:\\([^\\s()<>]*\\)))*\\))*)(?<![\\s`!()\\[\\]{};:'\".,<>?«»“”‘’])";
The OP also asked how to make sure the trailing TLD was "valid". Here's the same regex, in Obj-C string form, with all the currently valid TLDs (as of 2011/05/27):
@"(?i)\\b(?:[a-z][\\w\\-]+://(?:\\S+?(?::\\S+?)?\\@)?)?(?:(?:(?<!:/|\\.)(?:(?:[a-z0-9\\-]+\\.)+(?:(ac|ad|ae|aero|af|ag|ai|al|am|an|ao|aq|ar|arpa|as|asia|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|biz|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cat|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|com|coop|cr|cu|cv|cx|cy|cz|de|dj|dk|dm|do|dz|ec|edu|ee|eg|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gov|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|info|int|io|iq|ir|is|it|je|jm|jo|jobs|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mil|mk|ml|mm|mn|mo|mobi|mp|mq|mr|ms|mt|mu|museum|mv|mw|mx|my|mz|na|name|nc|ne|net|nf|ng|ni|nl|no|np|nr|nu|nz|om|org|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|pro|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|tc|td|tel|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|travel|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|xn--0zwm56d|xn--11b5bs3a9aj6g|xn--3e0b707e|xn--45brj9c|xn--80akhbyknj4f|xn--90a3ac|xn--9t4b11yi5a|xn--clchc0ea0b2g2a9gcd|xn--deba0ad|xn--fiqs8s|xn--fiqz9s|xn--fpcrj9c3d|xn--fzc2c9e2c|xn--g6w251d|xn--gecrj9c|xn--h2brj9c|xn--hgbk6aj7f53bba|xn--hlcj6aya9esc7a|xn--j6w193g|xn--jxalpdlp|xn--kgbechtv|xn--kprw13d|xn--kpry57d|xn--lgbbat1ad8j|xn--mgbaam7a8h|xn--mgbayh7gpa|xn--mgbbh1a71e|xn--mgbc0a9azcg|xn--mgberp4a5d4ar|xn--o3cw4h|xn--ogbpf8fl|xn--p1ai|xn--pgbs0dh|xn--s9brj9c|xn--wgbh1c|xn--wgbl6a|xn--xkc2al3hye2a|xn--xkc2dl3a5ee0h|xn--yfro4i67o|xn--ygbi2ammx|xn--zckzah|xxx|ye|yt|za|zm|zw))(?![a-z]))|(?<=://)/))(?:(?:[^\\s()<>]+|\\((?:[^\\s()<>]+|(?:\\([^\\s()<>]*\\)))*\\))*)(?<![\\s`!()\\[\\]{};:'\".,<>?«»“”‘’])";
This will match both http://example.org and www.example.org.
@"(([hH][tT][tT][pP][sS]?:\\/\\/|www\\.)[^ ,'\">\\]\\)]*\\.[^\\. ,'\">\\]\\)]{2,6})"
Note that I added a capture group, so check the match result returned by the regex to make sure the right groups are re-inserted in the right place.
If you could post the entire code snippet, it would be easier.
RegExp explanation:
(
(
[hH][tT][tT][pP][sS]?:\/\/ # Match HTTP/http (and hTtP :)
| # OR
www\. # www<literal DOT>
)
">
[^ ,'\">\]\)]* # Match zero or more characters that are not any of space, comma, apostrophe, quotation mark, "more than", "right square bracket", "right parenthesis"
\. # Match <literal DOT>
">
[^\. ,'\">\]\)]{2,6} # Match 2-6 characters that are not any of dot, space, comma, apostrophe, quotation mark, "more than", "right square bracket", "right parenthesis"
)
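As a quick check of what this pattern accepts, here is a minimal Python sketch of the same expression. The sample sentence is made up, and the inner group is written as non-capturing so that each match is the whole URL.

```python
import re

# Same structure as the pattern explained above, as a Python raw string.
URL = re.compile(r"""(?:[hH][tT][tT][pP][sS]?://|www\.)[^ ,'">\])]*\.[^. ,'">\])]{2,6}""")

# Hypothetical chat message.
text = "Visit http://example.org or www.example.org today"

found = [m.group(0) for m in URL.finditer(text)]
print(found)
```

Because the last part allows at most six characters after the final dot, a URL with a long path after that dot may be cut short, and a bare domain such as example.org without "www." is not matched at all.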
You don't want to use a regular expression for this.
You want an NSDataDetector, and it'll find them all for you.
I have:
NSString *promise = @"thereAreOtherWorldsThanThese";
which I'm trying to transform into the string:
@"There are other worlds than these"
I'm guessing this is a regex job, but I'm very new to Objective C, and have so far had no luck. I would greatly appreciate any help!
I'd use GTMRegex (http://code.google.com/p/google-toolbox-for-mac/), for example:
NSString *promise = @"thereAreOtherWorldsThanThese";
GTMRegex *regex = [GTMRegex regexWithPattern:@"([A-Z])"];
NSLog(@"%@", [[regex stringByReplacingMatchesInString:promise
                                     withReplacement:@" \\1"] lowercaseString]);
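The same idea (put a space in front of every capital, then fix the case) can be sketched in Python. camel_to_sentence is a hypothetical helper, and capitalising the first letter is an extra step added here to match the desired output in the question.

```python
import re


def camel_to_sentence(name: str) -> str:
    """Turn 'camelCaseWords' into 'Camel case words'."""
    # Insert a space before each uppercase letter.
    spaced = re.sub(r"([A-Z])", r" \1", name)
    # capitalize() upper-cases the first letter and lower-cases the rest.
    return spaced.strip().capitalize()


sentence = camel_to_sentence("thereAreOtherWorldsThanThese")
print(sentence)
```

Runs of capitals such as "parseURL" would be split letter by letter, so acronyms need separate handling.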
As for removing the uppercase letters you can simply use lowercaseString on NSString.
But as for inserting spaces just before an uppercase letter, I would agree that it would be a job for a regex, and sadly, my regex fu is rubbish :)
Without using any libraries, you can use this NSString category I posted. Just perform lowercaseString on the string array.
How do I convert an NSString from CamelCase to TitleCase, 'playerName' into 'Player Name'?