Regex application to string? - swift

Currently, I am looping through phone numbers to find matches in a database. However, I need to remove dashes and any area codes so that the database search can be exact. I am trying to use this regex on the phone numbers:
(?:\+\d{2}\s*(?:\(\d{2}\))|(?:\(\d{2}\)))?\s*(\d{4,5}\-?\d{4})
and I am trying to apply it as such
if let longNumber = (contact.phoneNumbers.first?.value as? CNPhoneNumber)?.stringValue {
let phoneNumber = longNumber.replacingOccurrences(of: "(?:\+\d{2}\s*(?:\(\d{2}\))|(?:\(\d{2}\)))?\s*(\d{4,5}\-?\d{4})", with: "$1", options: .regularExpression)
However, I receive the errors "Invalid escape sequence in literal" and "Missing argument for parameter 'for' in call".
How can I properly get only the phone digits from the string? For example, if it is +1 300-300-3000,
I need it to return 3003003000.
Examples:
+1 390 - 456 - 8823 --> 3904568823
+92 084 765 4902 --> 0847654902
+922 (064) 864 0921 --> 0648640921
842.231.9072 --> 8422319072
+1 (972) - 864 - 0921 --> 9728640921
+92 33 - 783 - 9382 --> 337839282

From the examples you have shown, I assume the following rules:
Phone numbers are formatted in 3 or 4 parts:
  Part 1 (optional)
    +
    1 to 3 digits
    followed by one or more whitespace characters
  Part 2
    2 or 3 digits
    may be enclosed in ( and )
    followed by a hyphen, a period, or a whitespace, with optional extra whitespace on both sides
  Part 3
    3 digits
    followed by a hyphen, a period, or a whitespace, with optional extra whitespace on both sides
  Part 4
    4 digits
(Please remember, this sort of phone number notation rule is local to a specific region. When you want to internationalize your app, you may need many more rules. The pattern you have now may be written for some other region.)
The partial pattern for each part would be something like the following:
let part1 = "(?:\\+\\d{1,3}\\s+)?"
let part2 = "(?:\\((\\d{2,3})\\)|(\\d{2,3}))\\s*[\\s\\.-]?\\s*"
let part3 = "(\\d{3})\\s*[\\s\\.-]?\\s*"
let part4 = "(\\d{4})"
(Note that all backslashes are escaped, as Swift string literals require.)
Testing code:
import Foundation
let numbers: [(long: String, expected: String)] = [
    ("+1 300-300-3000", "3003003000"),
    ("+1 390 - 456 - 8823", "3904568823"),
    ("+92 084 765 4902", "0847654902"),
    ("+922 (064) 864 0921", "0648640921"),
    ("842.231.9072", "8422319072"),
    ("+1 (972) - 864 - 0921", "9728640921"),
    ("+92 33 - 783 - 9382", "337839382"), // I assume your example is wrong here
]
for number in numbers {
    let longNumber = number.long
    let part1 = "(?:\\+\\d{1,3}\\s+)?"
    let part2 = "(?:\\((\\d{2,3})\\)|(\\d{2,3}))\\s*[\\s\\.-]?\\s*"
    let part3 = "(\\d{3})\\s*[\\s\\.-]?\\s*"
    let part4 = "(\\d{4})"
    let pattern = "^\(part1)\(part2)\(part3)\(part4)$"
    let phoneNumber = longNumber.replacingOccurrences(of: pattern, with: "$1$2$3$4", options: .regularExpression)
    print("\(longNumber) --> \(phoneNumber)", phoneNumber == number.expected ? "Success" : "Fail (expected \(number.expected))")
}
Output:
+1 300-300-3000 --> 3003003000 Success
+1 390 - 456 - 8823 --> 3904568823 Success
+92 084 765 4902 --> 0847654902 Success
+922 (064) 864 0921 --> 0648640921 Success
842.231.9072 --> 8422319072 Success
+1 (972) - 864 - 0921 --> 9728640921 Success
+92 33 - 783 - 9382 --> 337839382 Success
The code above may not work as expected for other inputs; adjust the pattern yourself to fit whatever formats your data contains.
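For readers who want to experiment with the pattern outside Xcode, here is a minimal sketch of the same four-part regex in Python. The names PHONE_RE and strip_phone are hypothetical, and the sketch assumes that returning None for an unrecognized format is acceptable (the Swift version above returns the input unchanged when nothing matches).

```python
import re

# The same four parts as the Swift answer; raw strings avoid doubled backslashes.
PART1 = r"(?:\+\d{1,3}\s+)?"                          # optional "+" and country code
PART2 = r"(?:\((\d{2,3})\)|(\d{2,3}))\s*[\s.-]?\s*"   # 2-3 digits, optionally in ( )
PART3 = r"(\d{3})\s*[\s.-]?\s*"                       # 3 digits
PART4 = r"(\d{4})"                                    # 4 digits
PHONE_RE = re.compile(PART1 + PART2 + PART3 + PART4)


def strip_phone(long_number):
    """Return the digits without the country code, or None if the format differs."""
    match = PHONE_RE.fullmatch(long_number)
    if match is None:
        return None
    # Only one of groups 1 and 2 can match; the unmatched one is None.
    return "".join(group for group in match.groups() if group is not None)


for sample in ["+1 300-300-3000", "+922 (064) 864 0921", "842.231.9072"]:
    print(sample, "-->", strip_phone(sample))
```

Returning None makes unmatched inputs easy to spot, which helps when extending the pattern to other formats.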

Related

Substring function to extract part of the string

import pandas as pd
data = {'desc': ['ADRIAN PETER - ANN 80020355787C - 11 Baillon Pass.pdf', 'AILEEN MARCUS - ANC 800E15432922 - 5 Mandarin Way.pdf',
'AJITH SINGH - ANN 80020837750 - 11 Berkeley Loop.pdf', 'ALEX MARTIN-CURTIS - ANC 80021710355 - 26 Dovedale St.pdf',
'Alice.Smith\\Jodee - Karen - ANE 80020428377 - 58 Harrisdale Dr.pdf']}
df = pd.DataFrame(data, columns = ['desc'])
df
From the data frame, I want to create a new column called ID, and in that ID, I want to have only those values starting after ANN, ANC or ANE. So I am expecting a result as below.
ID
80020355787C
800E15432922
80020837750
80021710355
80020428377
I tried running the code below, but it did not get the desired result. Appreciate your help on this.
df['id'] = df['desc'].str.extract(r'\-([^|]+)\-')
You can use - AN[NCE] (800[0-9A-Z]+) -, where:
AN[NCE] matches literally AN followed by N or C or E;
800[0-9A-Z]+ matches literally 800 followed by one or more characters between 0 and 9 or between A and Z.
>>> df['desc'].str.extract(r'- AN[NCE] (800[0-9A-Z]+) -')
0
0 80020355787C
1 800E15432922
2 80020837750
3 80021710355
4 80020428377
If not all your ids start with "800", you can just remove it from the pattern.
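As a rough illustration of that last remark, the sketch below drops the "800" prefix from the pattern and checks it with the standard re module, so it runs without pandas. ID_RE and extract_id are hypothetical names; the same pattern string can be passed to df['desc'].str.extract.

```python
import re

# Variant of the answer's pattern with the literal "800" prefix removed.
ID_RE = re.compile(r"- AN[NCE] ([0-9A-Z]+) -")

descs = [
    "ADRIAN PETER - ANN 80020355787C - 11 Baillon Pass.pdf",
    "ALEX MARTIN-CURTIS - ANC 80021710355 - 26 Dovedale St.pdf",
    r"Alice.Smith\Jodee - Karen - ANE 80020428377 - 58 Harrisdale Dr.pdf",
]


def extract_id(desc):
    """Return the ID that follows ANN, ANC or ANE, or None if there is none."""
    match = ID_RE.search(desc)
    return match.group(1) if match else None


ids = [extract_id(d) for d in descs]
print(ids)
```

The surrounding "- " and " -" keep the hyphen in MARTIN-CURTIS and the "- Karen -" segment from being mistaken for the ID field.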

delete rows with character in cell array

I need some basic help. I have a cell array:
TITLE 13122423
NAME Bob
PROVIDER James
and many more rows with text...
234 456 234 345
324 346 234 345
344 454 462 435
and many MANY (>4000) more with only numbers
text
text
and more text and mixed entries
Now what I want is to delete all the rows where the first column contains a character, and end up with only those rows containing numbers (rows 44 - 46 in this example).
I tried to use
rawdataTruncated(strncmp(rawdataTruncated(:, 1), 'A', 1), :) = [];
but then I would need to go through the whole alphabet, right?
Given data of the form:
C = {'FIRSTX' '350.0000' '' '' ; ...
'350.0000' '0.226885' '254.409' '0.755055'; ...
'349.9500' '0.214335' '254.41' '0.755073'; ...
'250.0000' 'LASTX' '' '' };
You can remove any row that has character strings containing letters using isstrprop, cellfun, and any like so:
index = ~any(cellfun(@any, isstrprop(C, 'alpha')), 2);
C = C(index, :)
C =
2×4 cell array
'350.0000' '0.226885' '254.409' '0.755055'
'349.9500' '0.214335' '254.41' '0.755073'
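The same filtering idea, sketched in Python for comparison: keep a row only if none of its cells contains an alphabetic character. The function name drop_rows_with_letters is hypothetical, and the sketch assumes every cell is a string, as in the cell array above. Like the MATLAB answer, it checks every column, not only the first.

```python
def drop_rows_with_letters(rows):
    """Keep only the rows in which no cell contains an alphabetic character."""
    return [
        row for row in rows
        if not any(ch.isalpha() for cell in row for ch in cell)
    ]


rows = [
    ["FIRSTX", "350.0000", "", ""],
    ["350.0000", "0.226885", "254.409", "0.755055"],
    ["349.9500", "0.214335", "254.41", "0.755073"],
    ["250.0000", "LASTX", "", ""],
]
kept = drop_rows_with_letters(rows)
print(kept)
```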

How to comment on a specific line number on a PR on github

I am trying to write a small script that can comment on github PRs using eslint output.
The problem is eslint gives me the absolute line numbers for each error.
But github API wants the line number relative to the diff.
From the github API docs: https://developer.github.com/v3/pulls/comments/#create-a-comment
To comment on a specific line in a file, you will need to first
determine the position in the diff. GitHub offers a
application/vnd.github.v3.diff media type which you can use in a
preceding request to view the pull request's diff. The diff needs to
be interpreted to translate from the line in the file to a position in
the diff. The position value is the number of lines down from the
first "@@" hunk header in the file you would like to comment on.
The line just below the "@@" line is position 1, the next line is
position 2, and so on. The position in the file's diff continues to
increase through lines of whitespace and additional hunks until a new
file is reached.
So if I want to add a comment on new line number 5 in the above image, then I would need to pass 12 to the API.
My question is: how can I easily map between the new line numbers that eslint gives in its error messages and the relative line numbers required by the GitHub API?
What I have tried so far
I am using parse-diff to convert the diff provided by the GitHub API into a JSON object:
[{
  "chunks": [{
    "content": "@@ -OLD_STARTING_LINE_NUMBER,OLD_TOTAL_LINES +NEW_STARTING_LINE_NUMBER,NEW_TOTAL_LINES @@",
    "changes": [
      {
        "type": STRING("normal"|"add"|"del"),
        "normal": BOOLEAN,
        "add": BOOLEAN,
        "del": BOOLEAN,
        "ln1": OLD_LINE_NUMBER,
        "ln2": NEW_LINE_NUMBER,
        "content": STRING,
        "oldStart": NUMBER,
        "oldLines": NUMBER,
        "newStart": NUMBER,
        "newLines": NUMBER
      }
    ]
  }]
}]
I am thinking of the following algorithm:
make an array of new line numbers from NEW_STARTING_LINE_NUMBER to NEW_STARTING_LINE_NUMBER+NEW_TOTAL_LINES for each file
subtract newStart from each number to get another array, relativeLineNumbers
traverse the array and, for each deleted line (type==='del'), increment the corresponding remaining relativeLineNumbers
for each additional hunk (a line containing @@), decrement the corresponding remaining relativeLineNumbers
I have found a solution. I didn't put it here because it involves simple looping and nothing special. But anyway answering now to help others.
I have opened a pull request that recreates a situation similar to the one shown in the question:
https://github.com/harryi3t/5134/pull/7/files
Using the Github API one can get the diff data.
diff --git a/test.js b/test.js
index 2aa9a08..066fc99 100644
--- a/test.js
+++ b/test.js
@@ -2,14 +2,7 @@
var hello = require('./hello.js');
-var names = [
- 'harry',
- 'barry',
- 'garry',
- 'harry',
- 'barry',
- 'marry',
-];
+var names = ['harry', 'barry', 'garry', 'harry', 'barry', 'marry'];
var names2 = [
'harry',
@@ -23,9 +16,7 @@ var names2 = [
// after this line new chunk will be created
var names3 = [
'harry',
- 'barry',
- 'garry',
'harry',
'barry',
- 'marry',
+ 'marry', 'garry',
];
Now just pass this data to the parse-diff module and do the computation.
var parseDiff = require('parse-diff');
var parsedFiles = parseDiff(data); // data is the diff output
parsedFiles.forEach(
  function (file) {
    var relativeLine = 0;
    file.chunks.forEach(
      function (chunk, index) {
        if (index !== 0)   // relative line number should increment for each chunk
          relativeLine++;  // except the first one (see rel-line 16 in the image)
        chunk.changes.forEach(
          function (change) {
            relativeLine++;
            console.log(
              change.type,
              change.ln1 ? change.ln1 : '-',
              change.ln2 ? change.ln2 : '-',
              change.ln ? change.ln : '-',
              relativeLine
            );
          }
        );
      }
    );
  }
);
This would print
type    (ln1) old line  (ln2) new line  (ln) added/deleted line  relative line
normal  2               2               -                        1
normal  3               3               -                        2
normal  4               4               -                        3
del     -               -               5                        4
del     -               -               6                        5
del     -               -               7                        6
del     -               -               8                        7
del     -               -               9                        8
del     -               -               10                       9
del     -               -               11                       10
del     -               -               12                       11
add     -               -               5                        12
normal  13              6               -                        13
normal  14              7               -                        14
normal  15              8               -                        15
normal  23              16              -                        17
normal  24              17              -                        18
normal  25              18              -                        19
del     -               -               26                       20
del     -               -               27                       21
normal  28              19              -                        22
normal  29              20              -                        23
del     -               -               30                       24
add     -               -               21                       25
normal  31              22              -                        26
Now you can use the relative line number to post a comment using github api.
For my purpose I only needed the relative line numbers for the newly added lines, but using the table above one can get it for deleted lines also.
Here's the link for the linting project in which I used this. https://github.com/harryi3t/lint-github-pr
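The counting rule can also be applied to the raw diff text without a parsing library. The following is a minimal Python sketch, not the code from the linked project: new_line_positions, HUNK_RE and the sample diff are all hypothetical, the function expects the diff of a single file, and counting a "\ No newline at end of file" marker as a position is an unverified assumption.

```python
import re

# Hunk header, e.g. "@@ -2,14 +2,7 @@"; group 1 is the new-file starting line.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def new_line_positions(file_diff):
    """Map new-file line numbers to diff positions for ONE file's diff."""
    positions = {}
    position = 0     # position 1 is the line just below the first hunk header
    new_line = None  # stays None until the first hunk header is seen
    for line in file_diff.splitlines():
        header = HUNK_RE.match(line)
        if header:
            if new_line is not None:
                position += 1  # later hunk headers take up a position themselves
            new_line = int(header.group(1))
            continue
        if new_line is None:
            continue  # "diff --git", "index", "---" and "+++" lines
        position += 1
        if line.startswith("-"):
            continue  # deleted line: has a position but no new line number
        if line.startswith("\\"):
            continue  # assumption: the no-newline marker counts as a position only
        positions[new_line] = position  # added ("+") or context line
        new_line += 1
    return positions


sample = "\n".join([
    "diff --git a/test.js b/test.js",
    "--- a/test.js",
    "+++ b/test.js",
    "@@ -2,5 +2,4 @@",
    " var a = 1;",
    "-var b = 2;",
    "-var c = 3;",
    "+var b = 2, c = 3;",
    " var d = 4;",
    " var e = 5;",
    "@@ -20,3 +19,4 @@ function f() {",
    "   return 1;",
    "+  // added",
    " }",
    " f();",
])
positions = new_line_positions(sample)
print(positions)
```

A line number reported by the linter that is missing from the returned dictionary is not part of the diff, so it cannot receive a position-based comment.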

iPhone Dividing string into multi line and display as Label

I've a string as follows:
#define BEEF_LABLE @"Recommended Internal Temp 145 - Medium Rare 160 - Medium 170 - Well done"
I want to display it in a 4-line label: "Recommended Internal Temp" on the first line, "145 - Medium Rare" on the 2nd line, "160 - Medium" on the 3rd line and "170 - Well done" on the 4th line.
How can I split the text accordingly?
yourLabel.lineBreakMode = UILineBreakModeWordWrap;
yourLabel.numberOfLines = 0;
and add a newline ("\n") to the string wherever a line should break, like this:
#define BEEF_LABLE @"Recommended Internal Temp\n145 - Medium Rare\n160 - Medium\n170 - Well done"

NSRegularExpression to validate URL

I found this regular expression on a website. It is said to be the best URL validation expression out there and I agree. Diego Perini created it.
The problem I am facing is when trying to use it with Objective-C to detect URLs in strings. I have tried using options like NSRegularExpressionAnchorsMatchLines, NSRegularExpressionIgnoreMetacharacters and others, but still no luck.
Is the expression not well formatted for Objective-C? Am I missing something? Any ideas?
I have tried John Gruber's regex, also, but it fails with some invalid URLs.
Regular Expression Explanation of expression
^ match at the beginning
//Protocol identifier
(?:
(?:https?|ftp http, https or ftp
):\\/\\/ ://
)? optional
// User:Pass authentication
(?:
\\S+ non white spaces, 1 or more times
(?:
:\\S* : non white spaces, 0 or more times, optionally
)?@ @
)? optional
//Private IP Addresses ?! Means DO NOT MATCH ahead. So do not match any of the following
(?:
(?!10 10 10.0.0.0 - 10.999.999.999
(?:
\\.\\d{1,3} . 1 to 3 digits, three times
){3}
)
(?!127 127 127.0.0.0 - 127.999.999.999
(?:
\\.\\d{1,3} . 1 to 3 digits, three times
){3}
)
(?!169\\.254 169.254 169.254.0.0 - 169.254.999.999
(?:
\\.\\d{1,3} . 1 to 3 digits, two times
){2}
)
(?!192\\.168 192.168 192.168.0.0 - 192.168.999.999
(?:
\\.\\d{1,3} . 1 to 3 digits, two times
){2}
)
(?!172\\. 172. 172.16.0.0 - 172.31.999.999
(?:
1[6-9] 1 followed by any number between 6 and 9
| or
2\\d 2 and any digit
| or
3[0-1] 3 followed by a 0 or 1
)
(?:
\\.\\d{1,3} . 1 to 3 digits, two times
){2}
)
//First Octet IPv4 // match these. Any non network or broadcast IPv4 address
(?:
[1-9]\\d? any number from 1 to 9 followed by an optional digit 1 - 99
| or
1\\d\\d 1 followed by any two digits 100 - 199
| or
2[01]\\d 2 followed by any 0 or 1, followed by a digit 200 - 219
| or
22[0-3] 22 followed by any number between 0 and 3 220 - 223
)
//Second and Third Octet IPv4
(?:
\\. .
(?:
1?\\d{1,2} optional 1 followed by any 1 or two digits 0 - 199
| or
2[0-4]\\d 2 followed by any number between 0 and 4, and any digit 200 - 249
| or
25[0-5] 25 followed by any numbers between 0 and 5 250 - 255
)
){2} two times
//Fourth Octet IPv4
(?:
\\. .
(?:
[1-9]\\d? any number between 1 and 9 followed by an optional digit 1 - 99
| or
1\\d\\d 1 followed by any two digits 100 - 199
| or
2[0-4]\\d 2 followed by any number between 0 and 4, and any digit 200 - 249
| or
25[0-4] 25 followed by any number between 0 and 4 250 - 254
)
)
//Host name
| or
(?:
(?:
[a-z\u00a1-\uffff0-9]+-? any letter, digit or character one or more times with optional -
)* zero or more times
[a-z\u00a1-\uffff0-9]+ any letter, digit or character one or more times
)
//Domain name
(?:
\\. .
(?:
[a-z\u00a1-\uffff0-9]+-? any letter, digit or character one or more times with optional -
)* zero or more times
[a-z\u00a1-\uffff0-9]+ any letter, digit or character one or more times
)* zero or more times
//TLD identifier
(?:
\\. .
(?:
[a-z\u00a1-\uffff]{2,} any letter, digit or character more than two times
)
)
)
//Port number
(?:
:\\d{2,5} : followed by any digit, two to five times, optionally
)?
//Resource path
(?:
\\/[^\\s]* / followed by an optional non space character, zero or more times
)? optional
$ match at the end
EDIT
I think I forgot to say that I am using the expression in the following code: (partial code)
NSError *error = NULL;
NSRegularExpression *detector = [NSRegularExpression regularExpressionWithPattern:[self theRegularExpression] options:0 error:&error];
NSArray *links = [detector matchesInString:theText options:0 range:NSMakeRange(0, theText.length)];
^(?i)(?:(?:https?|ftp):\\/\\/)?(?:\\S+(?::\\S*)?@)?(?:(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,})))(?::\\d{2,5})?(?:\\/[^\\s]*)?$
This is the best URL validation regular expression I found, and it is explained in my question. It is already formatted to work in Objective-C. However, using it with NSRegularExpression gave me all sorts of problems, including my app crashing. RegexKitLite had no problems handling it. I do not know if it is a size limitation or some flag not being set.
My final code looked like:
//First I take the string and put every word in an array, then I match every word with the regular expression
NSArray *splitIntoWordsArray = [textToMatch componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSMutableString *htmlString = [NSMutableString stringWithString:textToMatch];
for (NSString *theText in splitIntoWordsArray){
NSEnumerator *matchEnumerator = [theText matchEnumeratorWithRegex:theRegularExpressionString];
for (NSString *temp in matchEnumerator){
[htmlString replaceOccurrencesOfString:temp withString:[NSString stringWithFormat:@"<a href=\"%@\">%@</a>", temp, temp] options:NSLiteralSearch range:NSMakeRange(0, [htmlString length])];
}
}
[htmlString replaceOccurrencesOfString:@"\n" withString:@"<br />" options:NSLiteralSearch range:NSMakeRange(0, htmlString.length)];
//embed the text on a webView as HTML
[webView loadHTMLString:[NSString stringWithFormat:embedHTML, [mainFont fontName], [mainFont pointSize], htmlString] baseURL:nil];
The result: a UIWebView with some embedded HTML, where URLs and emails are clickable. Do not forget to set dataDetectorTypes = UIDataDetectorTypeNone
You can also try
NSError *error = NULL;
NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:@"(?i)(?:(?:https?):\\/\\/)?(?:\\S+(?::\\S*)?@)?(?:(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,})))(?::\\d{2,5})?(?:\\/[^\\s]*)?" options:NSRegularExpressionCaseInsensitive error:&error];
if (error)
    NSLog(@"error");
NSString *someString = @"This is a sample of a sentence with a URL http://. http://.. http://../ http://? http://?? http://??/ http://# http://-error-.invalid/ http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com within it.";
NSRange range = [expression rangeOfFirstMatchInString:someString options:NSMatchingCompleted range:NSMakeRange(0, [someString length])];
if (!NSEqualRanges(range, NSMakeRange(NSNotFound, 0))){
    NSString *match = [someString substringWithRange:range];
    NSLog(@"%@", match);
}
else {
    NSLog(@"no match");
}
Hope it helps somebody in the future
The regular expression will sometimes cause the application to hang, so I decided to use Gruber's regular expression, modified to recognize URLs without the protocol or the www part:
(?i)\\b((?:[a-z][\\w-]+:(?:/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/?)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))*(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’])*)
Am I missing something?
You're missing the built-in stuff to do this for you. There's a handy object called NSDataDetector. You create it to look for certain data "types" (like, say, NSTextCheckingTypeLink), then ask it for its -matchesInString:options:range:.
Here's an earlier answer of mine showing how to use it.