Regular expression with condition on inner expressions

I would like to build a regular expression for replacing "/" with "per" in a sentence where that is appropriate (to produce a readable version of a sentence with quantities).
That is:
"3/unit" must match
"unit/3" must match
"feet/second" must match
"05/07" must not match
I know how to create something like "\D+/\D+".
But how can I build a regex saying "the left and right expressions do not both match \d+"?

You can use
^(?![0-9]+/[0-9]+$)[^/]+/[^/]+$
Details:
^ - start of string
(?![0-9]+/[0-9]+$) - a negative lookahead that fails the match if there are one or more digits, /, one or more digits and end of string position immediately to the right of the current location
[^/]+/[^/]+ - one or more chars other than /, a / char, and then one or more chars other than /
$ - end of string.
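
The lookahead pattern above can be checked against the four sample inputs with a short sketch. This uses Python's re module rather than Swift, purely for illustration, and the to_per helper name is hypothetical (it is not from the question):

```python
import re

# The pattern from the answer: reject "digits/digits", accept any other "x/y".
PATTERN = re.compile(r'^(?![0-9]+/[0-9]+$)[^/]+/[^/]+$')

def to_per(text: str) -> str:
    """Hypothetical helper: replace the slash with ' per ' when the pattern matches."""
    if PATTERN.match(text):
        return text.replace('/', ' per ', 1)
    return text  # e.g. dates such as 05/07 are left untouched

for sample in ['3/unit', 'unit/3', 'feet/second', '05/07']:
    print(sample, '->', to_per(sample))
```

Only the input where both sides are purely numeric is left unchanged.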

Related

perl regex - pattern matching

Can anyone explain what is being done below?
$name=~m,common/([^/]+)/run.*/([^/]+)/([^/]+)$,;

common, run and / match themselves.
() captures.
[^/]+ matches 1 or more characters that aren't /.
.* matches 0 or more characters that aren't Line Feeds.[1]
$ is equivalent to (?=\n?\z).[2]
\n optionally matches a Line Feed.
\z matches the end of the string.
I think it's trying to match a path of one or both of the following forms:
.../common/XXX/runYYY/XXX/XXX
common/XXX/runYYY/XXX/XXX
Where
XXX is a sequence of at least one character that doesn't contain /.
YYY is a sequence of any number of characters (incl zero) that doesn't contain /.
It matches more than that, however.
It matches uncommon/XXX/runYYY/XXX/XXX
It matches common/XXX/runYYY/XXX/XXX/XXX/XXX/XXX/XXX
The first XXX and the last two XXX are captured (available to the caller).
[1] When the s flag isn't used.
[2] When the m flag isn't used.
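
To see which parts end up captured, the same pattern can be tried in Python's re module. This is a sketch, not Perl, on the assumption that these particular constructs behave the same way in both engines. The captures helper and the sample paths are made up for illustration:

```python
import re

# Same pattern as in the Perl m,...,; match, written as a Python raw string.
PATH_RE = re.compile(r'common/([^/]+)/run.*/([^/]+)/([^/]+)$')

def captures(path):
    """Return the three captured groups, or None when the path does not match."""
    m = PATH_RE.search(path)
    return m.groups() if m else None

print(captures('common/a/run1/b/c'))          # the intended form
print(captures('uncommon/a/run1/b/c'))        # also matches: no anchor at the start
print(captures('common/a/run1/b/c/d/e/f/g'))  # greedy .* swallows the middle segments
```

In the last call the greedy .* consumes everything up to the final two segments, so only the first segment and the last two are captured.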

How to negate or subtract a regex from another regex result in just one line of regex

I am trying to write a regex to find all cases of force unwrapping in Swift. It searches for every word followed by an exclamation point across the entire code base. However, the regex I have so far also matches implicitly unwrapped variable declarations, which I am trying to exclude.
This is the regex that I already have.
(:\s)?\w+(?<!as)\)*!
And it works fine. It finds "variableName!", "(variableName)!" and "hello.hello!". The exclusion of force casting also works: it skips cases like "hello as! UIView". But I am also trying to exclude other cases such as "var hello: UIView!", which contains an exclamation point. That's the problem I am having. I tried negative lookahead and negative lookbehind, and neither solved this kind of case.
And this is the result
testing.(**test)))!**
Details lists capture **groups!**
hello as! hello
**Hello!**
**testing!**
testing**.test!**
Hello != World
var noNetworkBanner**: StatusBarNotificationBanner!** <-- need to exclude
"var noNetworkBanner**: StatusBarNotificationBanner!**" <-- need to exclude

You may use
(?<!:\s)\b\w+(?<!\bas)\b\)*!
I added \b word boundaries to match whole words only, and changed the (:\s)? optional group to a negative lookbehind, (?<!:\s), that disallows a : + space before the word you need to match.
Details
(?<!:\s) - a negative lookbehind that fails the match if, immediately to the left of the current location, there is a : and a whitespace
\b - word boundary
\w+ - 1+ word chars
(?<!\bas) - a negative lookbehind that fails the match if, immediately to the left of the current location, there is a whole word as
\b - word boundary
\)* - 0 or more ) chars
! - a ! char.
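
The pattern can be checked against the sample lines from the question. The sketch below uses Python's re module instead of Swift's regex engine, on the assumption that the fixed-width lookbehinds and \b behave the same way in both. The find_force_unwraps name is hypothetical:

```python
import re

# Pattern from the answer: no ": " before the word, and the word is not "as".
FORCE_UNWRAP = re.compile(r'(?<!:\s)\b\w+(?<!\bas)\b\)*!')

def find_force_unwraps(line: str):
    """Return every force-unwrap candidate found in one line of source text."""
    return FORCE_UNWRAP.findall(line)

samples = [
    'testing.(test)))!',
    'hello as! hello',
    'Hello!',
    'Hello != World',
    'testing.test!',
    'var noNetworkBanner: StatusBarNotificationBanner!',
]
for line in samples:
    print(repr(line), '->', find_force_unwraps(line))
```

The force cast, the != comparison and the implicitly unwrapped declaration all produce an empty list.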

PCRE: Difference between .* and .*? in regular expressions

I was wondering why .* and .*? are not the same in PCRE regular expressions (for example in PHP's preg_match()). The dot . is the symbol for any possible character and * is the symbol for 0 to infinity repetitions. Why is there a ? symbol, which means 0 to 1 repetitions? They are obviously not the same, because .*? is not interchangeable with .*, but I can't see the logical difference, and I always have to try what works and what does not in each case. I would have supposed that .* matches anything from nothing upwards and that the ? is redundant, because it specifies that .* can occur 0 or 1 times; but zero times is the empty string, and the empty string should be matched by .* too.
Can anyone explain to me what the exact difference is and show me a short example?
Thanks

i love wantons because they are tasty snacks
In the above string, let's say you try to match it with i.*s. The result would be the entire string, because this is called a greedy match. It matches from the first instance of i until the last instance of s.
If you were to use the non-greedy modifier ?, as in i.*?s, then the result would be the following:
i love wantons
This is because the non-greedy ? modifier only matches until the first instance of s.

* is a greedy match - in other words, match zero to many times, as many times as possible. *? is a minimal match - in other words, match zero to many times, as few times as possible for the rest of the pattern to make sense. Similarly, +? is a minimally-matching version of +.
Consider the string this is "quoted" and this is "also quoted". The regular expression ".*" would match one result, "quoted" and this is "also quoted"; ".*?" would match twice, "quoted" and "also quoted".
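
Both examples can be reproduced in a few lines. This sketch uses Python's re module rather than PCRE, on the assumption that greedy and lazy quantifiers behave the same way in both engines for these patterns. The variable names are illustrative only:

```python
import re

sentence = 'i love wantons because they are tasty snacks'
greedy = re.search(r'i.*s', sentence).group()   # runs to the last "s"
lazy = re.search(r'i.*?s', sentence).group()    # stops at the first "s"
print(repr(greedy))
print(repr(lazy))

quoted = 'this is "quoted" and this is "also quoted"'
greedy_quotes = re.findall(r'".*"', quoted)     # one long match
lazy_quotes = re.findall(r'".*?"', quoted)      # one match per quoted phrase
print(greedy_quotes)
print(lazy_quotes)
```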

sed - remove specific subscript from string

Please provide a sed one-liner that produces this output:
sdc3 sdc2
for the input:
sdc3[1] sdc2[0]
I mean, remove all subscript values from the string.

sed 's/\[[^]]*\]//g'
reads: substitute any string with literal "[" followed by zero or more characters that aren't a "]", and then the closing "]", with an empty string.
You need the [^]] bit to prevent greedy matching treating "[1] sdc2[0]" as a single match in your sample string.
As for your comment:
sed 's#\([^[ ]*\)\[[^]]*\]#/dev/\1#g'
I switch the separator from the usual '/' to '#', just to avoid escaping the /dev/ bit you asked for (I won't say "for clarity")
the \(...\) bit matches a subgroup, here sdc2 or whatever, so we can refer to it in the replacement
the subgroup uses a similar character class to the one we used discarding the index: [^[ ] means any character except an "[" (again, to avoid greedily matching the index) or a space (assuming your values are space-delimited as per your post)
the replacement is now the literal "/dev/" followed by the first (and only) subgroup match
the g flag at the end tells it to perform multiple matches per line, instead of stopping at the first one
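
For readers without sed at hand, the same two substitutions can be sketched with Python's re.sub. This is an illustration only, not the sed commands themselves: the brackets inside the character classes are escaped here to keep Python's parser unambiguous, and re.sub replaces every occurrence by default, which corresponds to sed's g flag.

```python
import re

line = 'sdc3[1] sdc2[0]'

# Equivalent of: sed 's/\[[^]]*\]//g'
stripped = re.sub(r'\[[^\]]*\]', '', line)

# Equivalent of: sed 's#\([^[ ]*\)\[[^]]*\]#/dev/\1#g'
with_dev = re.sub(r'([^\[ ]*)\[[^\]]*\]', r'/dev/\1', line)

# What goes wrong with a greedy .* instead of the negated class
too_greedy = re.sub(r'\[.*\]', '', line)

print(stripped)
print(with_dev)
print(too_greedy)
```

The last substitution shows the pitfall the answer warns about: the greedy version removes everything from the first "[" to the last "]".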

How to get a perfect match for a regexp pattern in Perl?

I've to match a regular-expression, stored in a variable:
#!/bin/env perl
use warnings;
use strict;
my $expr = qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/sx;
my $str = "abcd[3] xyzg[4:0]";
if ($str =~ m/$expr/) {
    print "\n%%%%%%%%% $`-----$&-----$'\n";
}
else {
    print "\n********* NOT MATCHED\n";
}
But I'm getting this output, with the matched part in $&:
%%%%%%%%% -----abcd[3] xyzg-----[4:0]
I expected it not to enter the if clause at all.
What is intended is:
if $str = "abcd xyzg" => %%%%%%%%% -----abcd xyzg----- (CORRECT)
if $str = "abcd[2] xyzg" => %%%%%%%%% -----abcd[2] xyzg----- (CORRECT)
if $str = "abcd[2] xyzg[3]" => %%%%%%%%% -----abcd[2] xyzg[3]----- (CORRECT)
if $str = "abcd[2:0] xyzg[3]" => ********* NOT MATCHED (CORRECT)
if $str = "abcd[2:0] xyzg[3:0]" => ********* NOT MATCHED (CORRECT)
if $str = "abcd[2] xyzg[3:0]" => ********* NOT MATCHED (CORRECT/INTENDED)
but output is %%%%%%%%% -----abcd[2] xyzg-----[3:0] (WRONG)
Or, better to say, this is not intended.
In this case I expect it to go to the else block.
I also don't understand why $& holds only a portion of the string (abcd[2] xyzg) while $' holds [3:0].
It should match the full string, not just a part of it as above. If it can't, it shouldn't enter the if clause.
Can anyone please help me to change my $expr pattern, so that I can have what is intended?

By default, Perl regexes only look for a matching substring of the given string. In order to force comparison against the entire string, you need to indicate that the regex begins at the beginning of the string and ends at the end by using ^ and $:
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)$/;
(Also, there's no reason to have the /x modifier, as your regex doesn't include any literal whitespace or # characters, and there's no reason for the /s modifier, as your regex doesn't use the . metacharacter.)
EDIT: If you don't want the regex to match against the entire string, but you want it to reject anything in which the matching portion is followed by something like "[0:0]", the simplest way would be to use lookahead:
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\]|(?=[^[\w])|$ ))/x;
This will match anything that takes the following form:
beginning of the string (which your example in the comments seems to imply you want)
zero or more whitespace characters
one or more word characters
optional: [, one or more digits, ]
one or more whitespace characters
one or more word characters
one of the following, in descending order of preference:
[, one or more digits, ]
an empty string followed by (but not including!) a character that is neither [ nor a word character (The exclusion of word characters is to keep the regex engine from succeeding on "a[0] bc[1:2]" by only matching "a[0] b".)
end of string (A space is needed after the $ to keep it from merging with the following ) to form the name of a special variable, and this entails the reintroduction of the /x option.)
Do you have any more unstated requirements that need to be satisfied?
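
The effect of anchoring can be checked against the six inputs from the question. The sketch below uses Python's re module in place of Perl, on the assumption that ^, $, \w and \d behave the same way for these inputs. The UNANCHORED and ANCHORED names are illustrative only:

```python
import re

# The question's original pattern and the anchored version from the answer.
UNANCHORED = re.compile(r'\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)')
ANCHORED = re.compile(r'^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)$')

cases = [
    'abcd xyzg',
    'abcd[2] xyzg',
    'abcd[2] xyzg[3]',
    'abcd[2:0] xyzg[3]',
    'abcd[2:0] xyzg[3:0]',
    'abcd[2] xyzg[3:0]',
]
for text in cases:
    m = ANCHORED.search(text)
    print(text, '=>', m.group() if m else 'NOT MATCHED')

# Without anchors, the last case matches only a prefix of the string.
print(UNANCHORED.search('abcd[2] xyzg[3:0]').group())
```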

The short answer is your regexp is wrong.
We can't fix it for you without you explaining what you need exactly, and the community is not going to write a regexp exactly for your purpose because that's just too localized a question that only helps you this one time.
You need to ask something more general about regexps that we can explain to you, that will help you fix your regexp, and help others fix theirs.
Here's my general answer for when you're having trouble testing your regexp: use a regexp tool, such as RegexBuddy.
So I'm going to give a specific answer about what you're overlooking here:
Let's make this example smaller:
Your pattern is a(bc+d)?. It will match abcd, abccd, etc. It will not match bcd or bzd. In the case of abzd it will match, but only the a, because the whole group bc+d is optional. Similarly, it will match abcbcd as just a, dropping the whole optional group that couldn't be matched (it fails at the second b).
Regexps match as much of the string as they can and report a successful match once they have matched something that satisfies the entire pattern. If you make something optional, they will leave it out when they have to, including it only when it's present and matches.
Here's what you tried:
qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/sx
First, s and x aren't needed modifiers here.
Second, this regex can match:
Any or no whitespace, followed by
a word of at least one word character (letter, digit or underscore), followed by
optionally a grouped square-bracketed number with at least one digit (e.g. [0] or [9999]), followed by
at least one whitespace character, followed by
a word of at least one word character, followed by
optionally a square-bracketed number with at least one digit.
Clearly, when you ask it to match abcd[0] xyzg[0:4], the colon ends the \d+ pattern but doesn't satisfy the \], so it backtracks out of the whole group and then happily finds that the group was optional. So by not matching the last optional group, your pattern has matched successfully.
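
The smaller a(bc+d)? example can be tried directly. This is a sketch in Python's re module rather than Perl, and the matched helper is a hypothetical name. It returns the matched substring, or None when nothing matches:

```python
import re

OPTIONAL = re.compile(r'a(bc+d)?')

def matched(text):
    """Return the substring the pattern matched, or None if there is no match."""
    m = OPTIONAL.search(text)
    return m.group() if m else None

for text in ['abcd', 'abccd', 'bcd', 'bzd', 'abzd', 'abcbcd']:
    print(text, '->', matched(text))
```

For abzd and abcbcd the match succeeds but covers only the leading a, which is the same effect that lets the original pattern succeed on abcd[0] xyzg[0:4].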