Extracting range of unpadded string - swift

I'd like to extract the Range<String.Index> of a sentence within its whitespace padding. For example,
let padded = " El águila (🦅). "
let sentenceRangeInPadded = ???
assert(padded[sentenceRangeInPadded] == "El águila (🦅).") // The test!
Here's a regex I started with, but it looks like variable-length lookbehinds aren't supported.
let sentenceRangeInPadded = padded.range(of: #"(?<=^\s*).*?(?=\s*$)"#, options: .regularExpression)!
I'm not looking to extract the sentence itself (I could just use trimmingCharacters(in:) for that), just the Range.
Thanks for reading!

You may use
#"(?s)\S(?:.*\S)?"#
Details
(?s) - a DOTALL modifier making . match any char, including line break chars
\S - the first non-whitespace char
(?:.*\S)? - an optional non-capturing group matching
.* - any 0+ chars as many as possible
\S - up to the last non-whitespace char.
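Here is a minimal sketch of the pattern passing the question's own test. It assumes Foundation's `range(of:options:)` with `.regularExpression`, which is ICU-backed. `unpaddedRange(in:)` is a hypothetical helper name, not an API. Since `.*` is greedy, the optional group backtracks only as far as the last non-whitespace char. A string with no non-whitespace chars gives `nil`.

```swift
import Foundation

// Hypothetical helper: returns the range between the leading and trailing
// whitespace of `text`, or nil if `text` has no non-whitespace chars.
func unpaddedRange(in text: String) -> Range<String.Index>? {
    // (?s) lets . match line breaks; \S(?:.*\S)? runs from the first
    // non-whitespace char to the last one.
    return text.range(of: #"(?s)\S(?:.*\S)?"#, options: .regularExpression)
}

let padded = " El águila (🦅). "
if let sentenceRangeInPadded = unpaddedRange(in: padded) {
    // The question's test passes.
    assert(padded[sentenceRangeInPadded] == "El águila (🦅).")
    print(padded[sentenceRangeInPadded])
}
```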

Related

Check string matches exact format swift 5

I want to check whether a string exactly matches a regex pattern.
Currently, even though the string being compared is not an exact match, my function is returning true.
Pattern String: "([0-9],[0-9])"
For example,
(1,1) is valid
(5,4) is valid
Only strings entered in this format are valid, i.e. bracket, number, comma, number, bracket (without spaces).
E.g.:
[5,5] is not valid
{5,5] is not valid.
5,5 is not valid
Code I am using to check:
let stringToCheck = "[5,5]"
return stringToCheck.range(of: "([0-9],[0-9])", options: .regularExpression, range: nil, locale: nil) != nil
Can anyone help me with how to adjust this to check for exact matches in line with my pattern?
Thanks!
You need two things:
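Escape the parentheses: unescaped, ( and ) only form a capturing group, so they never match literal parentheses
Add anchors, because without them the regex can match part of a string (here, 5,5 inside [5,5]).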
You can thus use
stringToCheck.range(of: #"^\([0-9],[0-9]\)\z"#, options: .regularExpression, range: nil, locale: nil) != nil
Note the # chars on both ends: they make this a raw string literal, so the regex escapes need only single backslashes.
Details:
^ - start of string
\( - a ( char
[0-9] - a single ASCII digit (add + after ] to match one or more digits)
, - a comma
[0-9] - a single ASCII digit (add + after ] to match one or more digits)
\) - a ) char
\z - the very end of the string (if line breaks cannot be present in the string, $ is enough).
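The following short sketch wraps the anchored pattern in a hypothetical `matchesPairFormat(_:)` helper, which is not part of the original answer, and runs it on the question's examples. It assumes Foundation's `range(of:options:)`. The last line shows why the question's code returned true. The unescaped, unanchored pattern finds `5,5` inside `[5,5]`.

```swift
import Foundation

// Hypothetical helper: true only when the *whole* string is "(digit,digit)".
func matchesPairFormat(_ s: String) -> Bool {
    // ^ and \z anchor the match to the entire string;
    // \( and \) match literal parentheses instead of opening a group.
    return s.range(of: #"^\([0-9],[0-9]\)\z"#, options: .regularExpression) != nil
}

for candidate in ["(1,1)", "(5,4)", "[5,5]", "{5,5]", "5,5"] {
    print(candidate, matchesPairFormat(candidate))
}

// The question's original pattern: the parens form a group and there are no
// anchors, so it finds "5,5" inside "[5,5]".
print("[5,5]".range(of: "([0-9],[0-9])", options: .regularExpression) != nil) // true
```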

Avoiding duplicate items in a comma-separated list of two-letter words

I need to write a regex that allows each group of two characters only once. This is my current regex:
^([A-Z]{2},)*([A-Z]{2}){1}$
This allows me to validate strings like these:
AL,RA,IS,GD
AL
AL,RA
The problem is that it also validates AL,AL and AL,RA,AL.
EDIT
Here are more details.
What is allowed:
AL,RA,GD
AL
AL,RA
AL,IS,GD
What shouldn't be allowed:
AL,RA,AL
AL,AL
AL,RA,RA
AL,IS,AL
IS,IS,AL
IS,GD,GD
IS,GD,IS
I need that every group of two characters appears only once in the sequence.
Try something like this expression:
/^(?:,?(\b\w{2}\b)(?!.*\1))+$/gm
I have no knowledge of Swift, so take this with a grain of salt. The idea is to match a whole line only while making sure that no matched group occurs again later in the line.
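This answer's pattern is written in JavaScript-style notation, so here is a hedged Swift translation. The `/.../` delimiters and the `gm` flags are dropped, on the assumption that each string holds one list and multiline mode is not needed. `hasNoRepeatedPair(_:)` is a hypothetical name. Note that `\w` also accepts lowercase letters, digits and `_`. The leading `,?` also lets a string such as `,AL` through. So this checks for repeats rather than validating the exact format.

```swift
import Foundation

// The answer's pattern without the JavaScript-style /.../gm wrapper.
// Assumes one list per string, so the m (multiline) flag is not needed.
let noRepeatPattern = #"^(?:,?(\b\w{2}\b)(?!.*\1))+$"#

// Hypothetical helper name.
func hasNoRepeatedPair(_ s: String) -> Bool {
    return s.range(of: noRepeatPattern, options: .regularExpression) != nil
}

for s in ["AL,RA,GD", "AL,AL", "AL,RA,AL", "IS,GD,GD"] {
    print(s, hasNoRepeatedPair(s))
}
```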
First of all, let's shorten your pattern. This is easy because the length of each comma-separated item is fixed and the list items consist only of uppercase ASCII letters. So, your pattern can be written as ^(?:[A-Z]{2}(?:,\b)?)+$.
Now, you need to add a negative lookahead that checks the string for any repeated two-letter sequence, at any distance from the start of the string and at any distance between the occurrences. Use
^(?!.*\b([A-Z]{2})\b.*\b\1\b)(?:[A-Z]{2}(?:,\b)?)+$
Possible implementation in Swift:
import Foundation

func isValidInput(_ input: String) -> Bool {
    return input.range(of: #"^(?!.*\b([A-Z]{2})\b.*\b\1\b)(?:[A-Z]{2}(?:,\b)?)+$"#, options: .regularExpression) != nil
}
print(isValidInput("AL,RA,GD")) // true
print(isValidInput("AL,RA,AL")) // false
Details
^ - start of string
(?!.*\b([A-Z]{2})\b.*\b\1\b) - a negative lookahead that fails the match if, immediately to the right of the current location, there is:
.* - any 0+ chars other than line break chars, as many as possible
\b([A-Z]{2})\b - a two-letter word as a whole word
.* - any 0+ chars other than line break chars, as many as possible
\b\1\b - the same whole word as in Group 1. NOTE: the word boundaries here are not strictly necessary, since the word length is fixed at two; but if you do not know the word length and use [A-Z]+, you will need them (or other boundaries, depending on the situation)
(?:[A-Z]{2}(?:,\b)?)+ - 1 or more sequences of:
[A-Z]{2} - two uppercase ASCII letters
(?:,\b)? - an optional sequence: , only if followed by a word char (letter, digit or _). This guarantees that , won't be allowed at the end of the string
$ - end of string.
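The following sketch checks the pattern against every example listed in the question. `isValidList(_:)` is a hypothetical helper that uses the same pattern as the answer. `AL,` and `al,ra` are extra inputs added here. They exercise the trailing-comma guard and the uppercase-only class. Matching is case-sensitive unless `.caseInsensitive` is passed.

```swift
import Foundation

// Same pattern as the answer; `isValidList` is a hypothetical helper name.
func isValidList(_ input: String) -> Bool {
    let pattern = #"^(?!.*\b([A-Z]{2})\b.*\b\1\b)(?:[A-Z]{2}(?:,\b)?)+$"#
    return input.range(of: pattern, options: .regularExpression) != nil
}

let allowed = ["AL,RA,GD", "AL", "AL,RA", "AL,IS,GD"]
let rejected = ["AL,RA,AL", "AL,AL", "AL,RA,RA", "AL,IS,AL",
                "IS,IS,AL", "IS,GD,GD", "IS,GD,IS", "AL,", "al,ra"]

for s in allowed + rejected {
    print(s, isValidList(s))
}
```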
You can use a negative lookahead with a back-reference:
^(?!.*([A-Z]{2}).*\1).*
if, as in all the examples in the question, it is known that the string contains only comma-separated pairs of capital letters. I will relax that assumption later in my answer.
The regex performs the following operations:
^ # match beginning of line
(?! # begin negative lookahead
.* # match 0+ characters (1+ OK)
([A-Z]{2}) # match 2 uppercase letters in capture group 1
.* # match 0+ characters (1+ OK)
\1 # match the contents of capture group 1
) # end negative lookahead
.* # match 0+ characters (the entire string)
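In Swift, the pattern above might be used like this. This is a sketch, and `hasNoRepeats(_:)` is a hypothetical name. Without multiline mode, `^` anchors only at the start of the input. So if the lookahead fails there, `range(of:options:)` returns `nil`. The pattern does not check the list format, so it relies on the assumption stated above.

```swift
import Foundation

// The second answer's pattern. It only checks for repeats, so it assumes the
// input is already known to be comma-separated pairs of capital letters.
let repeatCheck = #"^(?!.*([A-Z]{2}).*\1).*"#

// Hypothetical helper name.
func hasNoRepeats(_ s: String) -> Bool {
    return s.range(of: repeatCheck, options: .regularExpression) != nil
}

print(hasNoRepeats("AL,IS,GD")) // true
print(hasNoRepeats("IS,GD,IS")) // false
```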
Suppose now that one or more capital letters may appear between each pair of commas, before the first comma, or after the last comma, but that only strings of two letters cannot be repeated. Moreover, I assume the regex must also confirm that the string has the desired form. Then the following regex could be used:
^(?=[A-Z]+(?:,[A-Z]+)*$)(?!.*(?:^|,)([A-Z]{2}),(?:.*,)?\1(?:,|$)).*
The regex performs the following operations:
^ # match beginning of line
(?= # begin pos lookahead
[A-Z]+ # match 1+ uc letters
(?:,[A-Z]+) # match ',' followed by 1+ uc letters in a non-cap grp
* # execute the non-cap grp 0+ times
$ # match the end of the line
) # end pos lookahead
(?! # begin neg lookahead
.* # match 0+ chars
(?:^|,) # match beginning of line or ','
([A-Z]{2}) # match 2 uc letters in cap grp 1
, # match ','
(?:.*,) # match 0+ chars, then ',' in non-cap group
? # optionally match non-cap grp
\1 # match the contents of cap grp 1
(?:,|$) # match ',' or end of line
) # end neg lookahead
.* # match 0+ chars (entire string)
If there is no need to check that the string contains only comma-separated strings of one or more uppercase letters, the positive lookahead at the beginning can be removed.
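Here is a hedged Swift sketch of the extended pattern, with a hypothetical `isValidItems(_:)` helper. The sample inputs are made up to show the intended behavior. Longer items such as `ABC` may repeat, but two-letter items may not. An empty item, as in `AB,,CD`, fails the positive lookahead.

```swift
import Foundation

// The answer's extended pattern: items are 1+ capital letters, and only
// two-letter items may not repeat. `isValidItems` is a hypothetical name.
let itemsPattern = #"^(?=[A-Z]+(?:,[A-Z]+)*$)(?!.*(?:^|,)([A-Z]{2}),(?:.*,)?\1(?:,|$)).*"#

func isValidItems(_ s: String) -> Bool {
    return s.range(of: itemsPattern, options: .regularExpression) != nil
}

for s in ["ABC,AB,D", "ABC,ABC", "AB,ABC,AB", "AB,CD,AB", "AB,,CD"] {
    print(s, isValidItems(s))
}
```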

Match all substrings in string except specific pattern

In this string: "TESTING \(hello) 123 \(HOW ARE YOU)"
I would like to match:
TESTING
123
Please help, thanks!
I am only able to use (\\\(.*?\)) to match \(hello) and \(HOW ARE YOU). How can I match the rest of the string?
There is no way in ICU regex (the regex library used in Swift) to match a text chunk that is not equal to some multicharacter string. You could do it if you wanted to match one or more chars other than some specific single character, but you can't if you are "negating" a whole sequence of chars.
You may use
import Foundation

let str = "TESTING \\(hello) 123 \\(HOW ARE YOU)"
let pattern = "\\s*\\\\\\([^()]*\\)\\s*"
// Replace every unwanted \(...) chunk (plus surrounding spaces) with a NUL char, then split on it.
let result = str.replacingOccurrences(of: pattern, with: "\0", options: .regularExpression)
print(result.components(separatedBy: "\0").filter({ $0 != ""}))
Output: ["TESTING", "123"]
The idea is to match what you do not need, replace it with a null char, and then split the string on that null char.
Pattern details
\s* - 0+ whitespaces
\\\( - a \( substring
[^()]* - 0+ chars other than ( and )
\) - a ) char
\s* - 0+ whitespaces.
The results are likely to contain empty strings, hence .filter({ $0 != ""}) is used to filter them out.
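The replace-then-split idea can be packaged as a small reusable function. This is a sketch. `chunks(of:outside:)` is a hypothetical name, and it assumes the input contains no NUL chars of its own, since those would also act as split points. Raw string literals keep `\(` literal instead of starting string interpolation.

```swift
import Foundation

// Hypothetical helper: returns the parts of `text` NOT matched by `pattern`,
// by replacing every match with a NUL char and splitting on it.
// Assumes `text` itself contains no NUL chars.
func chunks(of text: String, outside pattern: String) -> [String] {
    let marked = text.replacingOccurrences(of: pattern, with: "\0", options: .regularExpression)
    return marked.components(separatedBy: "\0").filter { !$0.isEmpty }
}

let sample = #"TESTING \(hello) 123 \(HOW ARE YOU)"#
// \s*\\\([^()]*\)\s* : optional spaces, a literal \( ... ) chunk, optional spaces.
print(chunks(of: sample, outside: #"\s*\\\([^()]*\)\s*"#))
```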

Swift Regex Search String Except \r\n and \t

I am trying to match phone numbers of 6 digits or more with the following regex in Swift. Phone numbers can also contain parentheses and + for country codes.
"[0-9\\s\\-\\+\\(\\)]{6,}".
However, the above implementation matches \r\n and \t as well. How can I write the regex so that it does not match any \r\n or \t?
I tried the following, but they didn't work:
"[0-9\\s\\-\\+\\(\\)(^\\r\\n\\t)]{6,}"
"[0-9\\s\\-\\+\\(\\)(?: (\\r|\\n|\\r\\n|\\t)]{6,}"
Thanks.
I suggest using
let regex = "^(?:[ +()-]*[0-9]){6,}[ +()-]*$"
Or
let regex = "^(?:[ +()-]*[0-9]){6,}[ +()-]*\\z"
Details
^ - start of string
(?:[ +()-]*[0-9]){6,} - six or more repetitions of
[ +()-]* - zero or more spaces, +, (, ) or - chars
[0-9] - a digit
[ +()-]* - zero or more spaces, +, (, ) or - chars
$ - end of string (\z is the very end of string).
If the pattern is used inside NSPredicate with MATCHES you may omit the ^ and $/\z anchors.
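The following sketch shows the `\z` variant in use, with a hypothetical `looksLikePhoneNumber(_:)` helper and made-up sample numbers. The character class contains only a literal space, not `\s`, which is why tabs and line breaks are rejected. With `\z`, even a trailing `\r\n` makes the match fail.

```swift
import Foundation

// Hypothetical helper wrapping the answer's \z variant. The character class
// lists only a literal space, so \t, \r and \n are never accepted.
func looksLikePhoneNumber(_ s: String) -> Bool {
    let pattern = #"^(?:[ +()-]*[0-9]){6,}[ +()-]*\z"#
    return s.range(of: pattern, options: .regularExpression) != nil
}

for s in ["+1 (555) 123-4567", "123456", "12345", "123\t456", "123456\r\n"] {
    print(s.debugDescription, looksLikePhoneNumber(s))
}
```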

How to get hashtag from string that contains # at the beginning and end without space at the end?

This is my string
"I made this wonderful pic last #chRistmas... #instagram #nofilter #snow #fun"
and I would like to get the hashtag that has # at the beginning and no space at the end. My expected result is:
#fun
This is what I have so far for regex search:
#[a-z0-9]+
but it gives me all the hashtags, not the one that I want. Thank you for your help!
Use #[a-zA-Z0-9]*$ instead of your current regex.
It seems you need to match a hashtag at the end of the string, or the last hashtag in the string. There are several ways to solve the issue.
Matching the last hashtag in the string
import Foundation

let str = "I made this wonderful pic last #chRistmas... #instagram #nofilter #snow #fun"
let regex = "#[[:alnum:]]++(?!.*#[[:alnum:]])"
if let range = str.range(of: regex, options: .regularExpression) {
    let text: String = String(str[range])
    print(text) // #fun
}
Details
# - a hash symbol
[[:alnum:]]++ - 1 or more alphanumeric chars
(?!.*#[[:alnum:]]) - a negative lookahead that fails the match if, after any 0+ chars other than line break chars to the right of the current location, there is a # followed by an alphanumeric char.
Matching a hashtag at the end of the string
The same code, but with one of the following regexes:
let regex = "#[[:alnum:]]+$"
or
let regex = "#[[:alnum:]]+\\z"
Note that \z matches the very end of the string: if there is a newline char between the hashtag and the end of the string, there won't be any match (with $, there will be a match).
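The two approaches can be compared with a hypothetical `firstMatch(of:in:)` helper built on `range(of:options:)`, shown in the sketch below. Appending a space or a newline to the sample string shows where they differ. The lookahead version still finds the last hashtag. `$` tolerates a final newline, while `\z` tolerates nothing after the tag.

```swift
import Foundation

// Hypothetical helper: returns the first match of `pattern` in `text`, if any.
func firstMatch(of pattern: String, in text: String) -> String? {
    guard let range = text.range(of: pattern, options: .regularExpression) else { return nil }
    return String(text[range])
}

let lastHashtag = "#[[:alnum:]]++(?!.*#[[:alnum:]])" // last hashtag anywhere
let endHashtag = "#[[:alnum:]]+\\z"                  // hashtag at the very end

let tweet = "I made this wonderful pic last #chRistmas... #instagram #nofilter #snow #fun"
print(firstMatch(of: lastHashtag, in: tweet) ?? "no match")      // #fun
print(firstMatch(of: endHashtag, in: tweet + " ") ?? "no match") // no match
```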
Note on the regex
If a hashtag should only start with a letter, it is a better idea to use
#[[:alpha:]][[:alnum:]]*
where [[:alpha:]] matches any letter and [[:alnum:]]* matches 0+ letters and/or digits.
Note that in ICU regex patterns, you may write [[:alnum:]] as [:alnum:].
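Here is a small example of the letter-first variant, using a made-up input string. `#2020` is skipped because a digit follows the `#`, so the first match is `#fun2`.

```swift
import Foundation

let text = "Best of #2020 and #fun2"
// # followed by a letter, then 0+ letters/digits, so "#2020" is skipped.
let letterFirst = "#[[:alpha:]][[:alnum:]]*"
if let range = text.range(of: letterFirst, options: .regularExpression) {
    print(text[range]) // #fun2
}
```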
You can use:
(^#[a-z0-9]+|#[a-z0-9]+$)
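In Swift, this alternation could be used as in the sketch below, with the question's string as the sample. `[a-z0-9]` is lowercase-only, and `range(of:options:)` is case-sensitive by default. So a tag like `#Fun` at the end would not be found.

```swift
import Foundation

// The answer's alternation: a hashtag at the very start OR at the end of the string.
// [a-z0-9] is lowercase-only, like the question's original character class.
let edgeHashtag = "(^#[a-z0-9]+|#[a-z0-9]+$)"

let post = "I made this wonderful pic last #chRistmas... #instagram #nofilter #snow #fun"
if let range = post.range(of: edgeHashtag, options: .regularExpression) {
    print(post[range]) // #fun
}
```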