Regex for an email address doesn't work - iphone

I'm trying to check whether an email address is valid with the following code:
NSPredicate *regexMail = [NSPredicate predicateWithFormat:@"SELF MATCHES '.*@.*\..*'"];
if([regexMail evaluateWithObject:someMail])
...
But the "\." doesn't seem to work, since the address "smith@company" is accepted. Besides, I get the following warning: "Unknown escape sequence"
Edit :
I'm programming in Objective-C for iPhone.
Thanks

This is the regular expression used by the iOS Mail application to validate an email address:
^[[:alnum:]!#$%&'*+/=?^_`{|}~-]+((\.?)[[:alnum:]!#$%&'*+/=?^_`{|}~-]+)*@[[:alnum:]-]+(\.[[:alnum:]-]+)*(\.[[:alpha:]]+)+$
And here is a copy/paste ready function using this regular expression to validate an email address in Objective-C:
BOOL IsValidEmail(NSString *email)
{
// Regexp from -[NSString(NSEmailAddressString) mf_isLegalEmailAddress] in /System/Library/PrivateFrameworks/MIME.framework
NSString *emailRegex = @"^[[:alnum:]!#$%&'*+/=?^_`{|}~-]+((\\.?)[[:alnum:]!#$%&'*+/=?^_`{|}~-]+)*@[[:alnum:]-]+(\\.[[:alnum:]-]+)*(\\.[[:alpha:]]+)+$";
NSPredicate *emailPredicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", emailRegex];
return [emailPredicate evaluateWithObject:email];
}

You cannot correctly validate an email address with regular expressions alone. A simple search will show you many articles discussing this. The problem lies with the nature of DNS: there are too many possible domain names (including non-English and Unicode domains) to validate them correctly with a regex. Don't try.
The only correct way to determine if an email address is valid is to send a message to the address with a unique URL that identifies the account associated with the email, for the user to click. Anything else will annoy your end-user.

Here's a working example, with a slightly more appropriate pattern (although it's not perfect, as others have mentioned):
NSString* pattern = @"[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}";
NSPredicate* predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
if ([predicate evaluateWithObject:@"johndoe@example.com"] == YES) {
// Match
} else {
// No Match
}

I guess it should be \\., since \ itself should be escaped as well.
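To make the escaping concrete, here is a minimal sketch using the pattern from the question. The helper name LooksLikeEmail is hypothetical. With a single backslash, the compiler sees "\." as an unknown escape and typically drops the backslash, so the regex engine receives a plain "." (any character); doubling the backslash lets the two characters "\." reach the regex engine. Passing the pattern through %@ should also sidestep a further layer of escaping inside the predicate format string.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: the question's pattern with the backslash doubled.
static BOOL LooksLikeEmail(NSString *candidate)
{
    // In source code, "\\." yields the two characters \. in the string,
    // which the regex engine reads as "a literal dot".
    NSString *pattern = @".*@.*\\..*";
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
    return [predicate evaluateWithObject:candidate];
}
```

With this version, LooksLikeEmail(@"smith@company") returns NO because no dot follows the @, while LooksLikeEmail(@"smith@company.com") returns YES. The pattern is still far too loose to count as real validation.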

This page has a good explanation of using regular expressions to validate email, as well as some regexes:
http://www.regular-expressions.info/email.html
Their expression:
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
Seems to be the best tradeoff between thoroughness and correctness.

Try putting a double backslash instead of a single one. That is probably what the "Unknown escape sequence" warning means.
Here is a website that can help you understand how to use regex: http://www.wellho.net/regex/java.html
Or just find an appropriate regex for email addresses here:
http://regexlib.com/DisplayPatterns.aspx?cattabindex=0&categoryId=1

I think you're looking for this. It's quite a comprehensive listing of different regexps and a list of mail addresses for each, stating whether the regexp was successful or not.

Copy paste solution (I added capital letters for the first example):
We get a more practical implementation of RFC 5322 if we omit the
syntax using double quotes and square brackets. It will still match
99.99% of all email addresses in actual use today. Source
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
NSString * derivedStandard = @"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\\.)+[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", derivedStandard];
BOOL isValid = [predicate evaluateWithObject:emailAddress];
//doesn't fail on ;)
@"asdkfjaskdjfakljsdfasdkfjaskdjfakljsdfasdkfjaskdjfakljsdfasdkfjaskdjfakljsdf"
For those who want to implement full RFC5322
The official standard is known as RFC 5322. It describes the syntax
that valid email addresses must adhere to. You can (but you
shouldn't--read on) implement it with this regular expression:
(?:[a-z0-9!#$%\&'*+/=?\^_`{|}~-]+(?:\.[a-z0-9!#$%\&'*+/=?\^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
NSString *rfc5322 = @"(?:[a-z0-9!#$%\\&'*+/=?\\^_`{|}~-]+(?:\\.[a-z0-9!#$%\\&'*+/=?\\^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])";
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", rfc5322];
BOOL isValid = [predicate evaluateWithObject:emailAddress];
You can test the regexps above using an online regular expression tester.

Related

iOS: Valid zip code check

Question: How would one write a function that checks whether a string (NSString) contains a valid zip code for any country worldwide?
Additional info: I am aware of RegEx in iOS. However, I am not very fluent in it. Please keep in mind this should accept anything valid in any country as true.
Examples
US - "10200"
US - "33701-4313"
Canada - "K8N 5W6"
UK - "3252-322"
etc.
Edit: Those who voted down or to close the question, please do mention why. Thank you.
^[ABCEGHJKLMNPRSTVXY]\d[A-Z][- ]*\d[A-Z]\d$
Matches Canadian PostalCode formats with or without spaces (e.g., "T2X 1V4" or "T2X1V4")
^\d{5}(-\d{4})?$
Matches all US format ZIP code formats (e.g., "94105-0011" or "94105")
(^\d{5}(-\d{4})?$)|(^[ABCEGHJKLMNPRSTVXY]\d[A-Z][- ]*\d[A-Z]\d$)
Matches US or Canadian codes in above formats.
UK codes are more complicated than you think: http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom
I suggest you don't do this. I've seen many websites that try to enforce zipcodes, but I've never seen one get it right. Even the name zipcode is specific to the US.
In other words:
- (BOOL)isValidZipCode: (NSString *)zip {
return YES;
}
I was originally going to write [zip length] > 0, but of course even that isn't guaranteed.
Each country that uses postcodes/zip codes usually has their own format. You are going to be hard-pressed to find a regular expression that matches any worldwide code!
You're better off adding a country picker that determines the regular expression (if any) to be used to validate the zip code.
As an aside, the postcode you have given as a UK example is not correct. A decent UK regex is:
^(^gir\\s0aa$)|(^[a-pr-uwyz]((\\d{1,2})|([a-hk-y]\\d{1,2})|(\\d[a-hjks-uw])|([a-hk-y]\\d[abehmnprv-y]))\\s\\d[abd-hjlnp-uw-z]{2}$)$
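The country-picker suggestion above could look roughly like the following sketch. The helper name IsPlausiblePostalCode and the idea of keying the table by a two-letter country code are assumptions for illustration; the two patterns are the US and Canadian ones quoted earlier in this thread. Countries without a known pattern are accepted rather than rejected, in the spirit of the "return YES" answer.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: countryCode is assumed to come from a country picker.
static BOOL IsPlausiblePostalCode(NSString *code, NSString *countryCode)
{
    static NSDictionary *patterns = nil;
    if (patterns == nil) {
        // Values first, keys second, as initWithObjectsAndKeys: expects.
        patterns = [[NSDictionary alloc] initWithObjectsAndKeys:
            @"^\\d{5}(-\\d{4})?$", @"US",
            @"^[ABCEGHJKLMNPRSTVXY]\\d[A-Z][- ]*\\d[A-Z]\\d$", @"CA",
            nil];
    }
    NSString *pattern = [patterns objectForKey:countryCode];
    if (pattern == nil) {
        return YES; // no rule known for this country: accept rather than reject
    }
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
    return [predicate evaluateWithObject:code];
}
```

For example, IsPlausiblePostalCode(@"33701-4313", @"US") and IsPlausiblePostalCode(@"K8N 5W6", @"CA") both return YES, while IsPlausiblePostalCode(@"3370", @"US") returns NO. A format check like this only says the code is well formed, not that it exists.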

supporting internationalization for NSString's

I have a bunch of the following line of code:
[NSString stringWithFormat:@"%@ and %@", subject.title, secondsubject.title];
[NSString stringWithFormat:@"%@ and %d others", subject.title, [newsfeeditem count] - 1];
and a lot more in the app. Basically I am building a news feed style like facebook where it has string constants. blah liked blah. Where/how should I do these string constants so it's easy to do for internationalization? Should I have a file just for storing string constants?
See the String Resources section of the Resource Programming Guide. The key section for this particular problem is "Formatting String Resources."
You'd have something like:
[NSString stringWithFormat:NSLocalizedString(@"%1$@ and %2$@", @"two nouns combined by 'and'"),
subject.title, secondsubject.title];
The %1$@ is the location of the first substitution. This lets you rearrange the text. Then you would have string resource files like:
English:
/* two nouns combined by 'and' */
"%1$@ and %2$@" = "%1$@ and %2$@";
Spanish:
/* two nouns combined by 'and' */
"%1$@ and %2$@" = "%1$@ y %2$@";
You need to be very thoughtful about these kinds of combinations. First, you can never build up a sentence out of parts of sentences in a translatable way. You almost always need to translate the entire message in one go. What I mean is that you can't have one string that says @"I'm going to delete" and another string that says @"%@ and %@" and glue them together. The word order is too variable between languages.
Similarly, complex lists of things can cause all kinds of headaches due to various agreement rules. Some languages have special plural rules, gender agreements, and similar issues. As much as possible, keep your messages simple, short, and static.
But the above tools are very useful for solving the problem you're discussing. See the docs for more details.
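As a rough sketch of how the question's second string ("%@ and %d others") might be handled the same way: the helper name TitleAndOthers and the comment text are hypothetical, and the code assumes totalCount is at least 1. It keeps the whole message as one translatable string with positional specifiers. It does not deal with plural forms ("1 others"), which would need separate messages.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper for the "title and N others" message.
// Assumes totalCount >= 1; the subtraction would wrap around for 0.
static NSString *TitleAndOthers(NSString *title, NSUInteger totalCount)
{
    // One complete message; positional specifiers let a translation
    // reorder the title and the number if its grammar requires it.
    NSString *format = NSLocalizedString(@"%1$@ and %2$lu others",
                                         @"a title followed by the number of remaining items");
    return [NSString stringWithFormat:format, title, (unsigned long)(totalCount - 1)];
}
```

When no translation is found for the key, NSLocalizedString returns the key itself, so TitleAndOthers(@"Alice", 4) falls back to the English text "Alice and 3 others".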

Sqlite for iOS - Accent (tilde) insensitive match in an fts table

I enabled fts in sqlite for iphone and tried this and works, although very slow:
SELECT field FROM table_fts WHERE replace(replace(replace(replace(replace(lower(field), 'á','a'),'é','e'),'í','i'),'ó','o'),'ú','u') LIKE replace(replace(replace(replace(replace(lower('%string%'), 'á','a'),'é','e'),'í','i'),'ó','o'),'ú','u')
But it does not work when I want to use MATCH, it does not bring me results and there is no error
SELECT field FROM table_fts WHERE replace(replace(replace(replace(replace(lower(field), 'á','a'),'é','e'),'í','i'),'ó','o'),'ú','u') MATCH replace(replace(replace(replace(replace(lower('string'), 'á','a'),'é','e'),'í','i'),'ó','o'),'ú','u')
Is there any error or is there any other approach where I can make a tilde insensitive search?. I looked answers in the web with no success.
Two approaches:
First, you can denormalize (violate normal form) and add columns to your table containing an ASCII-only representation of your searchable fields. Before searching against this secondary column, also remove international characters from the string being searched for, so that you are looking for an ASCII-only string in a field holding the ASCII-only representation.
By the way, if you want a more general purpose conversion of international characters with ASCII, you can try something like:
- (NSString *)replaceInternationalCharactersIn:(NSString *)text
{
NSData *stringData = [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
return [[[NSString alloc] initWithData:stringData encoding:NSASCIIStringEncoding] autorelease];
}
Second, you could presumably use sqlite3_create_function() to write your own function (that presumably invokes a permutation of the above) that you can use right in your SQL statements themselves. See the SQLite documentation.
Update:
By the way, given that you're doing FTS, the sqlite3_create_function() approach is probably not possible, but it strikes me that you could either do FTS on the field containing the ASCII-only string, or write your own tokenizer that does something along those lines.
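For building the value of such a secondary search column (and for normalizing the search term the same way), here is a minimal sketch that swaps in a different Foundation technique from the lossy ASCII conversion shown above: NSString's stringByFoldingWithOptions:locale:. The helper name SearchKeyForText is hypothetical. Note that folding removes diacritics and case differences but does not guarantee pure ASCII output for every character.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: produces the text to store in the shadow column
// and to use as the normalized search term.
static NSString *SearchKeyForText(NSString *text)
{
    // Fold away case and diacritics, e.g. "Canción" becomes "cancion".
    return [text stringByFoldingWithOptions:(NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch)
                                     locale:nil];
}
```

The same function would be applied both when inserting rows and when preparing the string passed to MATCH, so that both sides of the comparison are folded identically.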

NSPredicate multiple field search

I'm trying to search multiple fields. Something like this:
[NSPredicate predicateWithFormat:@"(name title contains[cd] %@) AND (title contains[cd] %@", self.searchBar.text];
The search should look at both the name field and the title field.
Also, if anyone knows what a wildcard search would look like I'd appreciate that too.
I tried:
[NSPredicate predicateWithFormat:@"* contains[cd] %@", self.searchBar.text];
my error code:
Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: 'Unable to parse the format string "(name contains[cd] %@) OR (title contains[cd] %@"'
First off, you must specify all the properties to examine, since there is no fixed or universal list of properties, and in Objective-C there is no real semantic distinction between a property and any other method.
Second off, to examine multiple properties to see if they contain a search string, you should use OR, not AND, since your search is satisfied if any of the properties match, not all.
Otherwise, the syntax you have appears correct (I'm assuming name title in the first subpredicate is meant to be just name.)
Not sure if anyone still reads this post, but I found your problem. You are missing a closing ")" in the second predicate. Here is the corrected code, with the search text passed once per format specifier:
[NSPredicate predicateWithFormat:@"(name contains[cd] %@) OR (title contains[cd] %@)", self.searchBar.text, self.searchBar.text];
Edit: While testing I noticed that two format specifiers appeared to work with only ONE format argument, but when I added a third predicate and format specifier (still with only one format argument) it crashed. Reading a missing argument is undefined behavior, so you should always have the same number of format specifiers and format arguments.
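One way to avoid counting specifiers and parentheses by hand is to build each field test separately and combine them. The sketch below uses NSCompoundPredicate for that; the helper name NameOrTitlePredicate is hypothetical, and the name and title keys are the ones from the question.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: matches objects whose name OR title contains term,
// ignoring case and diacritics ([cd]).
static NSPredicate *NameOrTitlePredicate(NSString *term)
{
    NSPredicate *byName  = [NSPredicate predicateWithFormat:@"name CONTAINS[cd] %@", term];
    NSPredicate *byTitle = [NSPredicate predicateWithFormat:@"title CONTAINS[cd] %@", term];
    return [NSCompoundPredicate orPredicateWithSubpredicates:
               [NSArray arrayWithObjects:byName, byTitle, nil]];
}
```

A call such as [items filteredArrayUsingPredicate:NameOrTitlePredicate(self.searchBar.text)] would then keep every item where either field matches. Each subpredicate has exactly one specifier and one argument, so adding a third field is just one more entry in the array.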

Regex with @-sign not working

I am using RegexKitLite in an iPhone project and want to use regex to find words that start with the @-sign. For instance, "@home @chores", when searched, would return both words.
The regex string I am using is "(?m-s:@.*\\s*)". When I use this, though, I get a crash. When I use the same thing, but with a # instead of @, it works just fine: "(?m-s:#.*\\s*)". WTF?
I would much appreciate it if someone with a better understanding of regular expressions could help me on this. The tutorials I have seen so far have been near incomprehensible to me.
I did a modification of Manu's idea, just switching the location of the @ in the regex.
/(@\b\w+)/
I tested it on a string with '@foo @bar @baz @lol' and it seemed to do what you're looking for in matching on the words and capturing them with the parens.
Have you tried something like that:
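NSString *search = @"This is my @home string with @some tokens to be @found";
// No \b before the @: a word boundary exists there only when a word character precedes it.
NSString *regex = @"@(\\w+)";
NSArray *matches = nil;
// capture:1 returns the first capture group, i.e. the word without the @
matches = [search componentsMatchedByRegex:regex capture:1L];
// now matches should have { @"home", @"some", @"found" } values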
I haven't tested that but should work.
This may sound too simple, but have you tried changing @ to \@ or \\@
Why not simply use /\b@\w+/?
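For comparison, here is a minimal sketch of the same extraction using Foundation's NSRegularExpression instead of RegexKitLite (a swapped-in technique, available from iOS 4). The helper name AtWordsInString is hypothetical. The pattern deliberately has no \b in front of the @, because a word boundary exists at that position only when a word character comes directly before the @, which is not the case after a space or at the start of the string.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the words that follow an @ sign, without the @.
static NSArray *AtWordsInString(NSString *text)
{
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"@(\\w+)" options:0 error:&error];
    if (regex == nil) {
        return [NSArray array]; // pattern failed to compile; error holds the reason
    }
    NSMutableArray *words = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:text
                                      options:0
                                        range:NSMakeRange(0, [text length])];
    for (NSTextCheckingResult *match in matches) {
        // Range 1 is the first capture group: the word after the @.
        [words addObject:[text substringWithRange:[match rangeAtIndex:1]]];
    }
    return words;
}
```

Called with @"@home @chores", this returns an array containing @"home" and @"chores".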