How to negate or subtract one regex from another regex's result in a single regex - Swift

I am trying to write a regex that finds all cases of force unwrapping in Swift. It searches the entire code base for words followed by an exclamation point. However, the regex I have also matches implicitly unwrapped variable declarations, which I am trying to exclude.
This is the regex that I already have.
(:\s)?\w+(?<!as)\)*!
And it works fine. It matches "variableName!", "(variableName)!", and "hello.hello!". The exclusion of force casting also works: it avoids cases like "hello as! UIView". But I am also trying to exclude other cases such as "var hello: UIView!", which contains an exclamation point. That's the problem I am having. I tried negative lookahead and negative lookbehind, and neither solved this case.
This is the sample regex I am working on
(:\s)?\w+(?<!as)\)*!
And this is the result
testing.(**test)))!**
Details lists capture **groups!**
hello as! hello
**Hello!**
**testing!**
testing**.test!**
Hello != World
var noNetworkBanner**: StatusBarNotificationBanner!** <-- need to exclude
"var noNetworkBanner**: StatusBarNotificationBanner!**" <-- need to exclude

You may use
(?<!:\s)\b\w+(?<!\bas)\b\)*!
I added \b word boundaries to match whole words only, and changed the (:\s)? optional group to a negative lookbehind, (?<!:\s), that disallows a : + space before the word you need to match.
See the regex demo and the regex graph:
Details
(?<!:\s) - a negative lookbehind that fails the match if, immediately to the left of the current location, there is a : and a whitespace
\b - word boundary
\w+ - 1+ word chars
(?<!\bas) - a negative lookbehind that fails the match if, immediately to the left of the current location, there is a whole word as
\b - word boundary
\)* - 0 or more ) chars
! - a ! char.
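As a quick cross-check, here is a minimal sketch that runs the pattern above against the sample lines from the question. It uses Python's re module rather than Swift's NSRegularExpression (an assumption made for easy testing); the pattern is accepted unchanged because both lookbehinds have a fixed width. The names FORCE_UNWRAP, samples and results are illustrative only.

```python
import re

# Pattern from the answer above, unchanged.
FORCE_UNWRAP = re.compile(r"(?<!:\s)\b\w+(?<!\bas)\b\)*!")

# Sample lines from the question (highlighting markers removed).
samples = [
    "testing.(test)))!",
    "Details lists capture groups!",
    "hello as! hello",
    "Hello!",
    "testing.test!",
    "Hello != World",
    "var noNetworkBanner: StatusBarNotificationBanner!",
]

# Map each sample line to the list of force-unwrap candidates found in it.
results = {line: FORCE_UNWRAP.findall(line) for line in samples}

for line, found in results.items():
    print(f"{line!r} -> {found}")
```

The "as!" cast, the "!=" comparison and the implicitly unwrapped declaration produce empty lists, while the remaining lines each yield one match.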

Related

Regular expression with condition on inner expressions

I would like to build a regular expression for replacing a sentence with "per" when it should be (a readable version of a sentence with quantities).
That is:
"3/unit" must match
"unit/3" must match
"feet/second" must match
"05/07" must not match
I know how to create something like "\D+/\D+".
But how can I build a regex saying "not both right and left expressions match \D+" ?
You can use
^(?![0-9]+/[0-9]+$)[^/]+/[^/]+$
See the regex demo. Details:
^ - start of string
(?![0-9]+/[0-9]+$) - a negative lookahead that fails the match if there are one or more digits, /, one or more digits and end of string position immediately to the right of the current location
[^/]+/[^/]+ - one or more chars other than /, a / char, and then one or more chars other than /
$ - end of string.
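A small sketch of how the pattern above behaves on the four inputs from the question, written in Python for convenience (the question does not name a language, so this choice and the helper name is_per_expression are assumptions).

```python
import re

# Pattern from the answer above: reject "digits/digits", accept any other "x/y".
PER_PATTERN = re.compile(r"^(?![0-9]+/[0-9]+$)[^/]+/[^/]+$")


def is_per_expression(text: str) -> bool:
    """Return True when text looks like 'left/right' and is not purely numeric on both sides."""
    return PER_PATTERN.search(text) is not None


for candidate in ["3/unit", "unit/3", "feet/second", "05/07"]:
    print(candidate, is_per_expression(candidate))
```

Only "05/07" is rejected, because the negative lookahead fails when both sides consist solely of digits.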

How to capture a word boundary in Swift NSRegularExpression?

I would like to capture all words within a string which start with some prefix. For example all words which start with a t
if let regex = try? NSRegularExpression(pattern: #"t[^ ]+"#, options: NSRegularExpression.Options.caseInsensitive) {
    let input = "this is the best test"
    let matches = regex.matches(in: input, options: [], range: NSRange(location: 0, length: input.count))
    for match in matches {
        print((input as NSString).substring(with: match.range))
    }
}
In the code above I am using a simple space as delimiter (#"t[^ ]+"#) and the output is as expected:
this
the
test
However, not only spaces but all word boundaries should be considered. So I replace the space with \b to match all boundaries (#"t[^\b]+"#). However, this does not work:
this is the
t test
It seems that this code does not look for word boundaries but simply for b... Why is this?
I thought using # before and after the regex would create a raw string and thus deliver the \ correctly to the regex system. So #"t[^\b]+"# should be the same as "t[^\\b]+" and be translated to t[^\b]+, shouldn't it?
Or is the word boundary operator \b not available in Swift regex?
EDIT:
According to the ICU Documentation \b matches a word boundary, thus [^\b] (anything but a word boundary) should not be the same as [^b] (anything but a b), should it?
However, it seems that \b cannot be used in sets, can it? But \B should do the same (anything but a word boundary).
So I tried using #"t\B+"# instead. However this does not find any match at all.
The question remains: How to match a word boundary in Swift NSRegularExpression?
The #"t[^\b]+"# string literal results in a t[^\b]+ regex and it simply matches t and then one or more chars other than a b char (the [^\b] is equal to [^b] in ICU regex flavor).
To match a t and then one or more word chars (that is, up to the next leftmost word boundary), you can use
pattern: #"t\w+"#
where \w+ will match one or more word chars.
A [...] is a character set/class. And a character class is meant to match characters. \b is a word boundary only outside a character class, because a word boundary is not a character, it is a zero-width assertion that matches a certain position in a string. All zero-width assertions lose their special, "zero-width" meaning in a character class. [.$] doesn't mean a . or end of string, it matches either a . or $ char. [.\z] does not match . or the very end of string, it matches . or z as \ is omitted since \z is not a valid escape sequence.
Also, t\B+ makes very little sense as \B, also a zero-width assertion, matches a location in the string that is not a word boundary position. Note that zero-width assertions do not consume text, i.e. no text is added to the overall match memory buffer, and the regex index remains where it was before trying the zero-width assertion pattern. By adding + after \B, you just tell the regex engine to match a location after t that is not a word boundary, so the regex engine matches t\B+ the same way as if it were a t\B, i.e. it only matches a t that is followed with a word char (letter, digit, connector punctuation).
\w matches (and consumes) word chars, so if you need to match (and really get as a result) any chars after t till the first word boundary, you just need to use this \w pattern, t\w* or t\w+ (if there must be at least one word char after t).
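The difference between consuming word chars with \w and merely asserting a position with \B can be illustrated with a short sketch. It uses Python's re module, not the ICU engine behind NSRegularExpression, so it only demonstrates the general behaviour of t\w+ and t\B; the variable names are illustrative.

```python
import re

text = "this is the best test"

# t\w+ consumes the word chars after "t", so the whole word is returned.
words = re.findall(r"t\w+", text, flags=re.IGNORECASE)

# t\B consumes only "t"; \B just asserts that a word char follows.
t_positions = [m.start() for m in re.finditer(r"t\B", text)]
t_matches = re.findall(r"t\B", text)

print(words)        # ['this', 'the', 'test']
print(t_positions)  # [0, 8, 17]
print(t_matches)    # ['t', 't', 't']
```

The final t of "best" and of "test" is skipped in both cases, since it is followed by a word boundary rather than a word char.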

Regex expression for detecting 2 consecutive words when first word starts with #

I want to know the regex that detects names starting with #. For example, in the sentence "Hi #Steve Rogers, how are you?", I want to extract #Steve Rogers using regex. I tried using Pattern.compile("#\\s*(\\w+)").matcher(text), but only "#Steve" gets detected. What else should I use?
Thanks
Try (#[\w\s]+)
It will only capture word characters and whitespace after the #
See example at https://regex101.com/r/4Pv9bu/1
If you don't want to match an # sign followed by a space only like # and if there can be more than a single word after it:
(?<!\S)#\w+(?:\h+\w+)?
Explanation
(?<!\S) Assert a whitespace boundary to the left
# Match literally
\w+ Match 1+ word characters
(?:\h+\w+)? Optionally match 1+ horizontal whitespace chars and 1+ word chars
Regex demo
In Java
String regex = "(?<!\\S)#\\w+(?:\\h+\\w+)?";
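For comparison, a minimal Python sketch of the same idea. Python's re module does not support \h, so the horizontal-whitespace class is approximated here with [ \t] (an assumption; \h in Java covers more characters). The names MENTION and find_mentions are illustrative.

```python
import re

# (?<!\S) requires start of string or whitespace before "#";
# [ \t] stands in for Java's \h, which Python's re does not provide.
MENTION = re.compile(r"(?<!\S)#\w+(?:[ \t]+\w+)?")


def find_mentions(text: str) -> list[str]:
    """Return every '#word' or '#word word' occurrence in text."""
    return MENTION.findall(text)


print(find_mentions("Hi #Steve Rogers, how are you?"))  # ['#Steve Rogers']
```

A lone "# " and a "#" glued to a preceding word (as in "a#b") are not matched.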

Conditional replacement of a character

I would like to replace a character in a long string only if a special sequence is present in the input.
Example:
This string is a sample! I wrote it to describe my problem! I hope somebody can help me with this! I have the ID: 12345! That's all!
My desired output is:
This string is a sample. I wrote it to describe my problem. I hope somebody can help me with this. I have the ID: 12345. That's all.
Only when '12345' present in the input string.
I tried (positive|negative) look(ahead|behind)
(?<!=12345)(!+(.*))+
It does not work, and neither do ?=, ?!, ...
Is this possible with PCRE replacement in one step?
In general, this is possible with any regex flavor supporting \G "string start/end of the previous match" operator. You may replace with $1 + desired text when searching with the following patterns:
(?:\G(?!^)|^(?=.*CHECKME))(.*?)REPLACEME <-- Replace REPLACEME if CHECKME is present
(?:\G(?!^)|^(?!.*CHECKME))(.*?)REPLACEME <-- Replace REPLACEME if CHECKME is absent
With Perl/PCRE/Onigmo that support \K, you may replace with your required text when searching with
(?:\G(?!^)|^(?=.*CHECKME)).*?\KREPLACEME <-- Replace REPLACEME if CHECKME is present
(?:\G(?!^)|^(?!.*CHECKME)).*?\KREPLACEME <-- Replace REPLACEME if CHECKME is absent
In your case, since the text searched for is a single character, you may use a more efficient regex with just one .*:
(?:\G(?!^)|^(?=.*12345))[^!]*\K!
and replace with . (or with $1. if you use (?:\G(?!^)|^(?=.*12345))([^!]*)!). See the regex demo.
If there can be line breaks in the string use (?s)(?:\G(?!^)|^(?=.*12345))[^!]*\K!.
Details
(?:\G(?!^)|^(?=.*12345)) - either the end of the previous match (\G(?!^)) or (|) the start of a string position followed with any 0+ chars as many as possible up to the last occurrence of 12345 (^(?=.*12345))
[^!]* - 0 or more chars other than !
\K - match reset operator that discards all text matched so far in the match memory buffer
! - a ! char.

How to get a perfect match for a regexp pattern in Perl?

I've to match a regular-expression, stored in a variable:
#!/bin/env perl
use warnings;
use strict;
my $expr = qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/sx;
my $str = "abcd[3] xyzg[4:0]";
if ($str =~ m/$expr/) {
    print "\n%%%%%%%%% $`-----$&-----$'\n";
}
else {
    print "\n********* NOT MATCHED\n";
}
But I'm getting the output in $& as
%%%%%%%%% -----abcd[3] xyzg-----[4:0]
But I expected that it would not enter the if clause.
What is intended is:
if $str = "abcd xyzg" => %%%%%%%%% -----abcd xyzg----- (CORRECT)
if $str = "abcd[2] xyzg" => %%%%%%%%% -----abcd[2] xyzg----- (CORRECT)
if $str = "abcd[2] xyzg[3]" => %%%%%%%%% -----abcd[2] xyzg[3]----- (CORRECT)
if $str = "abcd[2:0] xyzg[3]" => ********* NOT MATCHED (CORRECT)
if $str = "abcd[2:0] xyzg[3:0]" => ********* NOT MATCHED (CORRECT)
if $str = "abcd[2] xyzg[3:0]" => ********* NOT MATCHED (CORRECT/INTENDED)
but output is %%%%%%%%% -----abcd[2] xyzg-----[3:0] (WRONG)
Or, better to say, this is not intended.
In this case, I expected it to go to the else block.
I also don't understand why $& holds only a portion of the string (abcd[2] xyzg) while $' holds [3:0].
How does that happen?
It should match the full string, not just a part of it as above. If it can't, it shouldn't enter the if clause.
Can anyone please help me to change my $expr pattern, so that I can have what is intended?
By default, Perl regexes only look for a matching substring of the given string. In order to force comparison against the entire string, you need to indicate that the regex begins at the beginning of the string and ends at the end by using ^ and $:
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)$/;
(Also, there's no reason to have the /x modifier, as your regex doesn't include any literal whitespace or # characters, and there's no reason for the /s modifier, as you're not using the . metacharacter.)
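The substring-versus-whole-string distinction can be sketched outside Perl as well. The following uses Python's re module as a stand-in (an assumption made for easy testing): search looks for a matching substring, as an unanchored Perl match does, while fullmatch requires the entire string to match, much like wrapping the pattern in ^ and $. The helper name describe is illustrative.

```python
import re

# The asker's original, unanchored pattern.
EXPR = re.compile(r"\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)")


def describe(text: str):
    """Return (substring match, whole-string match); None where there is no match."""
    partial = EXPR.search(text)
    whole = EXPR.fullmatch(text)
    return (
        partial.group(0) if partial else None,
        whole.group(0) if whole else None,
    )


print(describe("abcd[3] xyzg[4:0]"))  # ('abcd[3] xyzg', None)
print(describe("abcd[2] xyzg[3]"))    # ('abcd[2] xyzg[3]', 'abcd[2] xyzg[3]')
```

The first call reproduces the asker's surprise: the unanchored search succeeds on the prefix, and only the anchored variant rejects the string.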
EDIT: If you don't want the regex to match against the entire string, but you want it to reject anything in which the matching portion is followed by something like "[0:0]", the simplest way would be to use lookahead:
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\]|(?=[^[\w])|$ ))/x;
This will match anything that takes the following form:
beginning of the string (which your example in the comments seems to imply you want)
zero or more whitespace characters
one or more word characters
optional: [, one or more digits, ]
one or more whitespace characters
one or more word characters
one of the following, in descending order of preference:
[, one or more digits, ]
an empty string followed by (but not including!) a character that is neither [ nor a word character (The exclusion of word characters is to keep the regex engine from succeeding on "a[0] bc[1:2]" by only matching "a[0] b".)
end of string (A space is needed after the $ to keep it from merging with the following ) to form the name of a special variable, and this entails the reintroduction of the /x option.)
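The alternation described in the list above can be tried out with a short sketch. It is a Python transcription of the Perl pattern (an assumption for illustration): the [ inside the character class is escaped to avoid a warning, and the space after $ is dropped because Python has no special variables to collide with. The helper name matched_prefix is hypothetical.

```python
import re

# Second word must be followed by [digits], by a char that is neither "[" nor
# a word char (checked by lookahead, not consumed), or by the end of the string.
EXPR = re.compile(r"^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\]|(?=[^\[\w])|$))")


def matched_prefix(text: str):
    """Return the matched portion of text, or None when the pattern rejects it."""
    m = EXPR.search(text)
    return m.group(0) if m else None


for sample in ["abcd xyzg", "abcd[2] xyzg[3], rest", "abcd xyzg; rest",
               "abcd[2] xyzg[3:0]", "a[0] bc[1:2]"]:
    print(repr(sample), "->", matched_prefix(sample))
```

The last two samples are rejected: the lookahead refuses both the "[" that starts a range and any attempt to stop in the middle of the second word.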
Do you have any more unstated requirements that need to be satisfied?
The short answer is your regexp is wrong.
We can't fix it for you without you explaining what you need exactly, and the community is not going to write a regexp exactly for your purpose because that's just too localized a question that only helps you this one time.
You need to ask something more general about regexps that we can explain to you, that will help you fix your regexp, and help others fix theirs.
Here's my general answer when you're having trouble testing your regexp: use a regexp tool, such as RegexBuddy.
So I'm going to give a specific answer about what you're overlooking here:
Let's make this example smaller:
Your pattern is a(bc+d)?. It will match abcd, abccd, etc. It will not match bcd or bzd. In the case of abzd, it will match only a, because the whole group bc+d is optional. Similarly, in abcbcd it will match only a, dropping the whole optional group that couldn't be matched (it fails at the second b).
Regexps will match as much of the string as they can and return a true match when they can match something and have satisfied the entire pattern. If you make something optional, they will leave it out when they have to, including it only when it's present and matches.
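The behaviour of the reduced pattern can be checked with a few lines of Python (used here only as a convenient stand-in for Perl; the helper name matched is illustrative).

```python
import re

# The reduced example: "a" followed by an optional group "bc+d".
OPTIONAL = re.compile(r"a(bc+d)?")


def matched(text: str):
    """Return the text matched by a(bc+d)? or None when nothing matches."""
    m = OPTIONAL.search(text)
    return m.group(0) if m else None


for sample in ["abcd", "abccd", "abzd", "abcbcd", "bcd", "bzd"]:
    print(sample, "->", matched(sample))
```

For abzd and abcbcd the result is just a: the optional group fails partway through, the engine backtracks out of it, and the pattern still succeeds without it.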
Here's what you tried:
qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/sx
First, s and x aren't needed modifiers here.
Second, this regex can match:
Any or no whitespace followed by
a word of at least one word character (letter, digit, or underscore) followed by
optionally a grouped square bracketed number with at least one digit (eg [0] or [9999]) followed by
at least one whitespace character followed by
a word of at least one word character followed by
optionally a square bracketed number with at least one digit.
Clearly when you ask it to match abcd[0] xyzg[0:4] the colon ends the \d+ pattern but doesn't satisfy the \] so it backtracks the whole group, and then happily finds the group was optional. So by not matching the last optional group, your pattern has matched successfully.