How do I replace all occurrences of certain characters with their predecessors? - perl

$s = "bla..bla";
$s =~ s/([^%])\./$1/g;
I expect it to replace every . that is not preceded by %, together with the character in front of it, with just that character (in other words, delete the dot).
But $s ends up as bla.bla, when it should be
blabla. Where is the problem? I know I can use quantifiers, but I need to do it this way.

When a global regular expression is searching a string it will not find overlapping matches.
The first match in your string is a. (the a followed by the first dot), which is replaced with a. When the regex engine resumes searching, it starts at the second ., so it sees .bla as the rest of the string. Your regex needs a character in front of the dot it removes, and the a in front of the second dot has already been consumed by the first match, so it cannot match again.
Instead, use a negative lookbehind to perform the assertion that the previous character is not %:
$s =~ s/(?<!%)\.//g;
Note that if you use a positive lookbehind like (?<=[^%]), you will not replace the . if it is the first character in the string.
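A small sketch to see the difference. The extra input 50%.off. is made up for illustration, to show that a dot after % survives the lookbehind version:

```perl
use strict;
use warnings;

my $input = "bla..bla";

# Original attempt: the capture consumes the character before the dot,
# so after "a." is replaced the search resumes at the second dot and
# there is nothing usable in front of it.
(my $original = $input) =~ s/([^%])\./$1/g;

# Negative lookbehind: zero-width, so nothing is consumed and every
# dot not preceded by % is removed.
(my $fixed = $input) =~ s/(?<!%)\.//g;

# A dot directly after % is kept; the one after "f" is removed.
(my $kept = "50%.off.") =~ s/(?<!%)\.//g;

print "$original\n";   # bla.bla
print "$fixed\n";      # blabla
print "$kept\n";       # 50%.off
```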

The problem is that even with the /g flag, each substitution starts looking where the previous one left off. You're trying to replace a. with a and then a. with a, but the second replacement doesn't happen because the a has already been "swallowed" by the previous replacement.
One fix is to use a zero-width lookbehind assertion:
$s =~ s/(?<=[^%])\.//g;
which will remove any . that is not the first character in the string, and that is not preceded by %.
But you might actually want this:
$s =~ s/(?<!%)\.//g;
which will remove any . that is not preceded by %, even if it is the first character in the string.
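To see how the two lookbehinds differ, here is a minimal sketch on a hypothetical input that starts with a dot:

```perl
use strict;
use warnings;

my $input = ".bla..bla";

# Positive lookbehind: needs an actual non-% character before the dot,
# so a dot at the very start of the string is never removed.
(my $positive = $input) =~ s/(?<=[^%])\.//g;

# Negative lookbehind: only forbids a % before the dot, so it also
# succeeds at the start of the string, where there is no character.
(my $negative = $input) =~ s/(?<!%)\.//g;

print "$positive\n";   # .blabla
print "$negative\n";   # blabla
```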

Much simpler than look-behinds is to use:
$s =~ s/([^%])\.+/$1/g;
This deletes any run of one or more dots that follows a character other than % (that character is put back via $1).
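A quick check of the quantifier version on a few illustrative inputs. Like the positive lookbehind, it leaves a leading dot alone, because there is no character in front of it to capture:

```perl
use strict;
use warnings;

# \.+ swallows the whole run of dots in one match, so the
# "overlapping match" problem never arises.
my @cases = ("bla..bla", "a...b", ".bla");
my @results;
for my $s (@cases) {
    (my $out = $s) =~ s/([^%])\.+/$1/g;
    push @results, $out;
}
print join(", ", @results), "\n";   # blabla, ab, .bla
```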

Related

Need Regular expression - perl

I am looking for a regex for the expression below:
my $text = "1170 KB/s (244475 bytes in 2.204s)"; # I want to retrieve last ‘2.204’ from this String.
$text =~ m/\d+[^\d*](\d+)/; #Regexp
my $num = $1;
print " $num ";
Output:
204
But I need 2.204 as the output; please correct me.
Can anyone help me out?
The regex is doing exactly what you asked it to: it matches digits \d+, followed by one character that is neither a digit nor a star [^\d*], followed by digits \d+. The only place in your string where that fits is 2.204, and because the parentheses surround only the last \d+, $1 holds just 204.
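A minimal sketch that prints both the whole match ($&) and the capture ($1) for the original pattern makes this visible:

```perl
use strict;
use warnings;

my $text = "1170 KB/s (244475 bytes in 2.204s)";

my ($whole, $captured);
if ($text =~ m/\d+[^\d*](\d+)/) {
    # $& is the entire match; $1 is only the part inside the parentheses.
    ($whole, $captured) = ($&, $1);
}

print "whole match: $whole\n";    # 2.204
print "captured:    $captured\n"; # 204
```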
If you want a quick fix, you can just move the parentheses:
m/(\d+[^\d*]\d+)/
This would (with the above input) match what you want. A more exact way to put it would be:
m/(\d+\.\d+)/
Of course, this will match any floating-point number, so if your string can contain more than one, that's not a good idea. You can shore it up with an anchor, like so:
m/(\d+\.\d+)s\)/
Here s\) forces the match to occur only at that place. Stricter still:
m/\(\d+\D+(\d+\.\d+)s\)/
You might also want to account for the possibility of your target number not being a float:
m/\(\d+\D+(\d+\.?\d*)s\)/
By using ? and * we allow for those parts not to match at all. This is not recommended to do unless you are using anchors. You can also replace everything in the capture group with [\d.]+.
If you are not fond of matching the parentheses, you can match the text:
m/bytes in ([\d.]+)s/
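The sketch below runs each pattern from this answer against the sample string; every one should capture 2.204 (list-context assignment is used to pull out $1):

```perl
use strict;
use warnings;

my $text = "1170 KB/s (244475 bytes in 2.204s)";

my @patterns = (
    qr/(\d+[^\d*]\d+)/,          # parentheses moved
    qr/(\d+\.\d+)/,              # explicit literal dot
    qr/(\d+\.\d+)s\)/,           # anchored on "s)"
    qr/\(\d+\D+(\d+\.\d+)s\)/,   # stricter: "(digits ... N.Ns)"
    qr/\(\d+\D+(\d+\.?\d*)s\)/,  # also allows a whole number
    qr/bytes in ([\d.]+)s/,      # anchored on the surrounding text
);

my @captures;
for my $re (@patterns) {
    # In list context a successful match returns the captured groups.
    my ($num) = $text =~ $re;
    push @captures, $num;
}
print "$_\n" for @captures;
```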
I'd use the seconds unit (the trailing s) as the indicator of where you are in the string:
my ($num) = ($text =~ /(\d+\.\d+)s/);
with explanations:
/( # start of matching group
\d+ # first digits
\. # a literal '.', take \D if you want non-numbers
\d+ # second digits
) # close the matching group
s # a literal 's' (the seconds unit)
/x # close the regex
You had the matching groups wrong. Also, [^\d] is more than you need: you can generally negate the backslashed character classes (\d, \h, \s and \w) by using the corresponding uppercase letter (\D, \H, \S, \W).
Try this regex:
$text =~ m/\d+[^\d]*(\d+\.?\d*)s/;
That should match 1+ digits, a decimal point if there is one, and 0 or more decimal places, and make sure the number is followed by an "s".
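A minimal check of this regex, assuming a second, hypothetical input line whose time is a whole number of seconds:

```perl
use strict;
use warnings;

# \.? and \d* let the capture accept both "2.204" and a whole number
# such as "3"; the trailing s keeps it on the seconds value.
my @lines = (
    "1170 KB/s (244475 bytes in 2.204s)",
    "512 KB/s (1536 bytes in 3s)",   # hypothetical second input line
);

my @seconds;
for my $text (@lines) {
    if ($text =~ m/\d+[^\d]*(\d+\.?\d*)s/) {
        push @seconds, $1;
    }
}
print "@seconds\n";   # 2.204 3
```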

meaning of the following regular expressions written in perl

Here is a piece of code
while($l=~/(\\\s*)$/) {
statements;
}
$l contains a line of text taken from a file; in effect, this code goes through the lines of the file.
Questions:
I don't clearly understand what the condition in the while is doing. I think it is trying to match a \ followed by some number of whitespace characters at the end of the line, and that the loop should stop whenever a line ends with a \ and maybe some whitespace. I am not sure of it.
I came across the statement $a =~ s/^(.*$)/$1/ . I understand that ^ forces the match to start at the beginning of the string, but (.*$) would mean match all the characters at the end of the string. Does it mean that the statement is trying to find out whether some group of characters at the end is the same as a group of characters at the beginning of the text?
It is interesting to note that this statement:
while ( $l =~ /(\\\s*)$/ ) {
Is an infinite loop unless $l is altered inside the loop so that the regex no longer matches. As has already been mentioned by others, this is what it matches:
( ... ) a capture group, captures string to $1 (that's the number one, not lower case L)
\\ matches a literal backslash
\s* matches 0 or more whitespace characters.
$ matches at the end of the string, or just before a newline at the end of the string.
Since you do not have the /g modifier, this regex will not iterate through matches, it will simply check if there is a match, resetting the regex each iteration, thereby causing an endless loop.
The statement
$a =~ s/^(.*$)/$1/
Looks rather pointless. It captures a string of characters up until end of string, then replaces it with itself. The captured text is stored in $1 and is simply replaced. The only marginally useful thing about this regex is that:
It matches up until newline \n, and nothing further, which may be of some use to a parser. A period . matches any character except newline, unless the /s modifier is present on the regex.
It captures the line in $1 for future use. However, a simple /^(.*$)/ would do the same.
1. the while
Usually while (regex) is used with the /g modifier, otherwise, if it matches, you get an infinite loop (unless you exit the loop, like using last).
statements would be executed continuously in an infinite loop.
In your case, adding the g
while($l=~/(\\\s*)$/g)
makes the while loop run only once. Because the pattern is anchored with $, it can match at most once per string (nothing can follow the end of the string), so the next /g attempt, which starts after that match, fails and ends the loop.
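A small sketch of the difference, using a made-up line that ends in a backslash and two spaces. The counter is an assumption added only so the non-/g loop can be stopped; it stands in for statements:

```perl
use strict;
use warnings;

my $l = "some text \\  ";   # hypothetical line: backslash plus two spaces

# Without /g the match starts from the beginning of $l every time,
# so it succeeds forever; stop it artificially after 5 passes.
my $without_g = 0;
while ($l =~ /(\\\s*)$/) {
    last if ++$without_g >= 5;
}

# With /g, pos($l) moves past the match; the pattern is anchored
# with $, so the next attempt fails and the loop ends.
my $with_g = 0;
while ($l =~ /(\\\s*)$/g) {
    $with_g++;
}

print "without /g: $without_g (capped), with /g: $with_g\n";
```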
2. $a =~ s/^(.*$)/$1/
This is a substitution. If ^.*$ matches (and it almost always will; the exception is a string with an embedded newline, because . does not match \n), the match is replaced with $1, i.e. what's inside the (). That is the match itself, since the match runs from the first character to the end of the string.
^ means beginning of string
(.*) means any run of characters except newline, captured into $1
$ end of string
so that will replace $a with itself - probably not what you want.
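A minimal sketch showing that the substitution is a no-op on an ordinary line and fails to match a string with an embedded newline (both inputs are illustrative):

```perl
use strict;
use warnings;

my $one_line  = "hello world\n";
my $two_lines = "hello\nworld";

# Single line: ^(.*$) grabs everything before the trailing newline and
# the replacement puts the same text back, so the string is unchanged.
my $copy1 = $one_line;
my $ok1   = ($copy1 =~ s/^(.*$)/$1/);

# Embedded newline: . cannot cross "\n" and, without /m, $ matches only
# at the very end (or before a final newline), so there is no match.
my $copy2 = $two_lines;
my $ok2   = ($copy2 =~ s/^(.*$)/$1/);

printf "one line: matched=%s unchanged=%s\n",
    ($ok1 ? "yes" : "no"), ($copy1 eq $one_line ? "yes" : "no");
printf "two lines: matched=%s\n", ($ok2 ? "yes" : "no");
```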
It matches a literal backslash followed by zero or more whitespace characters, followed by the end of the line.
It executes statements for all the lines in that text file that end with a \ followed by zero or more whitespace characters (\s*).
It matches lines that end with a backslash character, ignoring any trailing whitespace characters.
Ending a line with a backslash is used in some languages and data files to indicate that the line is being continued on the next line. So I suspect this is part of a parser that merges these continuation lines.
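If that guess is right, a merger could look roughly like this. It is only a sketch, not the code the question came from: the input lines are hypothetical, and it uses s/// to strip the trailing backslash rather than looping on the match:

```perl
use strict;
use warnings;

# Hypothetical input: a line ending in "\" (plus optional trailing
# whitespace) continues on the next line.
my @input = (
    "first part \\  \n",
    "second part\n",
    "standalone line\n",
);

my @merged;
my $buffer = "";
for my $line (@input) {
    chomp(my $l = $line);
    if ($l =~ s/\\\s*$//) {
        # Continuation marker removed; keep accumulating.
        $buffer .= $l;
        next;
    }
    push @merged, $buffer . $l;
    $buffer = "";
}
push @merged, $buffer if length $buffer;   # input ended mid-continuation

print "$_\n" for @merged;
```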
If you enter a regular expression at RegExr and hover your mouse over the pieces, it displays the meaning of each piece in a tooltip.
(\\\s*)$ means a \ followed by zero or more whitespace characters, followed by the end of the line. Because the pattern is wrapped in (...), you can retrieve what matched from $1 if you need it.
http://rubular.com/r/dtHtEPh5DX
EDIT -- based on your update
$a =~ s/^(.$)/$1/ --- this is a search and replace. Your regex matches a line that contains exactly one character other than a newline (since you use ., see http://www.regular-expressions.info/dot.html). Because you use (...), the matched character is captured in $1 and substituted back into $a, so $a is left unchanged.
EDIT -- you changed your regex so here is the updated answer
$a =~ s/^(.*$)/$1/ -- same as above, except that it now matches zero or more characters (except newline)

searching a word with a particular character in it in perl

I am trying to match a word that starts with a capital letter and ends with zero, in Perl.
For example
ABC0
XYZ0
EIU0
QW0
What I have tried -
$abc =~ /^[A-Z].+0$/
But I am not getting proper output for this. Can anybody help me please?
The ^ anchors at the start of a string, the $ at the end. .+ matches as many non-newline characters as possible. Therefore
"ABC0 XYZ0 EIU0 QW0" =~ /^[A-Z].+0$/
matches the whole string.
The \b assertion matches at word edges: everywhere a word character and a non-word character are adjacent. The \w character class holds only word characters, and \S holds all non-space characters. Either of these is better than . here, because neither can run across the spaces between words.
So you may want to use /\b[A-Z]\w*0\b/.
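A sketch contrasting the two approaches on the question's words, plus two illustrative non-matches (foo0 and ABC1). The word pattern uses \w* (word characters), and m//g in list context returns every whole-word match:

```perl
use strict;
use warnings;

my $abc = "ABC0 XYZ0 foo0 ABC1 EIU0 QW0";

# Anchored with ^ and $, the greedy .+ spans the whole string.
my ($whole) = $abc =~ /^([A-Z].+0)$/;

# With \b and \w*, each match stays inside a single word.
my @words = $abc =~ /\b[A-Z]\w*0\b/g;

print "anchored: $whole\n";                    # ABC0 XYZ0 foo0 ABC1 EIU0 QW0
print "words:    ", join(", ", @words), "\n";  # ABC0, XYZ0, EIU0, QW0
```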
This might work:
$abc =~ /\b[A-Z].*0\b/
\b matches word boundaries.

How to replace a character with null

I have a string
"8.53" and I want the resulting string to be "853".
I have tried
the following code:
tr|.||;
but it is not replacing anything; it still gives 8.53.
I have also tried
tr|.|NULL|;
but that gives 8N53. Can anyone please suggest how to use tr to replace a character with NULL (that is, delete it)?
Thanks
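You need to specify the d modifier to delete characters that have no corresponding character in the replacement list. Without it, an empty replacement list means the search list is reused, so tr|.|| only counts the dots and leaves the string unchanged: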
tr/.//d;
Or you could use the (slower but more familiar) substitution operator:
s/\.//g;
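A minimal comparison of the variants on the question's string (inside tr the . is a literal character, not a regex wildcard):

```perl
use strict;
use warnings;

my $orig = "8.53";

# No /d and an empty replacement list: tr reuses the search list,
# so it only counts the dots and the string is unchanged.
my $counted = $orig;
my $dots = ($counted =~ tr/.//);

# With /d, characters with no counterpart in the replacement list
# are deleted.
my $with_tr = $orig;
$with_tr =~ tr/.//d;

# The substitution operator does the same; /g removes every dot.
my $with_s = $orig;
$with_s =~ s/\.//g;

print "$counted $dots $with_tr $with_s\n";   # 8.53 1 853 853
```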
You don't want tr, because it replaces each character in the first list with the corresponding character in the second list (N in your example, since that was the first character of NULL). You'll want the substitution operator.
my $var = "8.53";
$var =~ s/\.//;
print $var;
Add the g flag if there are multiple instances you want to replace (s/\.//g).

how to delete single quotes but not apostrophes in perl

I would like to know how to delete single quotes but not apostrophes in perl.
For example:
'It's raining again!'
should print
It's raining again!
Thanks so much
If you assume that a single-quote is always preceded or followed by whitespace, the following pair of regular expressions should work:
$line =~ s/\s'/ /g; #preceded by whitespace
$line =~ s/'\s/ /g; #followed by whitespace
You also need to account for a string that starts or ends with a single quote:
$line =~ s/^'//; #at the start of a string
$line =~ s/'$//; #at the end of a string
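A sketch that wraps the four substitutions in a helper (strip_quotes is a hypothetical name) and tries them on the question's sentence and on an illustrative sentence with an inner quotation:

```perl
use strict;
use warnings;

sub strip_quotes {
    my ($line) = @_;
    $line =~ s/\s'/ /g;   # quote preceded by whitespace
    $line =~ s/'\s/ /g;   # quote followed by whitespace
    $line =~ s/^'//;      # quote at the start of the string
    $line =~ s/'$//;      # quote at the end of the string
    return $line;
}

my @out = map { strip_quotes($_) }
    ("'It's raining again!'", "He said 'It's raining' today");

print "$_\n" for @out;
# It's raining again!
# He said It's raining today
```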
while (<DATA>) {
s/(?:^\s*'|'$)//g;
print;
}
__DATA__
'It's raining again!'
OUTPUT
It's raining again!
EXPLANATIONS
there's more than one way to do it
(?:...) prevents an unneeded capture
Tricky one. Some single quotes come after or before letters, but you want to keep the ones that act as apostrophes (between letters) and remove the rest. Perhaps something like this, using negative lookarounds:
s/(?<![\pL\s])'|'(?![\pL\s])//g;
This removes any single quote that has neither a letter nor whitespace immediately before it, or neither a letter nor whitespace immediately after it. Lots of negations to keep track of there. The expanded version:
s/
(?<![\pL\s])' # no letters or whitespace before single quote
| # or
'(?![\pL\s]) # no letters or whitespace after single quote
//gx;
This also covers words such as boys' toys and that's (as Eli Algranti pointed out in a comment), but language is always tricky to predict. For example, it will be next to impossible to handle something like:
'She looked at him and said, 'That's impossible!''
Of course, if you expect your single quotes to appear only at the beginning or end of the string, you don't need to be this fancy: you can simply strip the quote at each end, by whatever means you like. For example, as sputnik suggested:
s/^'|'$//g;
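The sketch below runs both substitutions from this answer over three inputs taken from the discussion. The nested quotation shows the limitation described above: neither output is quite right, since the lookaround version keeps the inner opening quote but drops its closing one, and the ends-only version leaves the inner quotation untouched.

```perl
use strict;
use warnings;

my @inputs = (
    "'It's raining again!'",
    "the boys' toys",
    "'She looked at him and said, 'That's impossible!''",
);

my (@lookaround, @ends_only);
for my $s (@inputs) {
    # Remove a quote with neither a letter nor whitespace before it,
    # or neither a letter nor whitespace after it.
    (my $r1 = $s) =~ s/(?<![\pL\s])'|'(?![\pL\s])//g;
    # Remove a quote only at the very start or end of the string.
    (my $r2 = $s) =~ s/^'|'$//g;
    push @lookaround, $r1;
    push @ends_only,  $r2;
}
print "$_\n" for @lookaround, @ends_only;
```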