Perl - can't remove trailing characters at the end of string - perl

I have some trailing characters at the end of a string peregrinevwap^_^_
print "JH 4 - app: $application \n";
app: peregrinevwap^_^_
Do you know why they are there and how I can remove them? I tried chomp, but it didn't work.

Check out the tr//cd operator to get rid of unwanted characters.
It's documented in "perldoc perlop"
$application =~ tr/a-zA-Z//cd;
Will remove everything except letters from the string and
$application =~ tr/^_//d;
Will remove all "^" and "_" characters.
If you only want to remove certain characters when they are at the end of the string, use the s/// search/replace operator with a regular expression and the $ anchor to match the end of the string.
Here's an example:
$application =~ s/[\^_]+$//;
Let's hope the underscores do not occur at the end of your strings, otherwise you can't automatically separate them from these unwanted characters.
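To make the difference between these three approaches concrete, here is a small sketch. The sample value my_app2^_^_ is made up for illustration (it is not from the question), and the sketch assumes the trailing characters really are literal "^" and "_" characters.

```perl
use strict;
use warnings;

# Hypothetical sample value; assumes literal "^" and "_" characters.
my $application = 'my_app2^_^_';

# Keep letters only: /c complements the set, /d deletes what matches.
(my $letters_only = $application) =~ tr/a-zA-Z//cd;

# Delete every "^" and "_" anywhere in the string.
(my $no_marks = $application) =~ tr/^_//d;

# Delete only the run of "^" and "_" at the end of the string.
(my $trailing_only = $application) =~ s/[\^_]+$//;

print "$letters_only\n";    # myapp
print "$no_marks\n";        # myapp2
print "$trailing_only\n";   # my_app2
```

Only the anchored substitution keeps the underscore inside the name, which is why it is usually the safer choice when the junk is known to sit at the end.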

Are you sure these characters are actually ^ and _ characters?
^_ could also indicate Ctrl-Underscore, ASCII character 0x1F (Unit Separator). (Not a character I've ever seen used, but you never know.)
If this is in fact the case, you can remove them with something like:
$application =~ s/\x1F//g;
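If you are not sure which characters are really in the string, dumping it as hex usually settles the question. The following is only a sketch: the input value is an assumption (the name from the question followed by two 0x1F characters), not the asker's real data.

```perl
use strict;
use warnings;

# Hypothetical input: assumes the "^_" shown on screen is really the
# single control character 0x1F, repeated twice.
my $application = "peregrinevwap\x1F\x1F";

# Dump each character as two hex digits to see what is really there.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $application;
print "$hex\n";

# Remove the run of 0x1F characters at the very end of the string.
(my $clean = $application) =~ s/\x1F+\z//;
print "[$clean]\n";   # [peregrinevwap]
```

If the dump ends in 5E 5F pairs instead of 1F, the characters are literal "^" and "_" and the earlier answer applies.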

Related

what does this $tok =~ s{\\(.)|([\$\#]|\\$)}{'\\'.($2 || $1)}sge; (perl code) mean?

What does this mean?
$tok =~ s{\\(.)|([\$\#]|\\$)}{'\\'.($2 || $1)}sge;
This comes from a CVE study blog whose code is written in Perl. I know this is a regular expression and that the content in the second {} should replace what the first matches, but I do NOT get what '\\'.($2 || $1) means.
$tok =~ s{\\(.)|([\$\#]|\\$)}{'\\'.($2 || $1)}sge;
It is a substitution operator s/// applied to the string $tok, with the modifiers sge. The delimiters of the operator have been changed from / to {}. Let's break that regex down:
s{
\\(.) # (1) match a backslash followed by 1 character, capture
| # (2) or
( # (3) start capture parens
[\$\#] # (4) either a literal $ or #
| # (5) or
\\$ # (6) backslash at the end of line (including newline)
) # end capture parens
}{ # replace with
'\\'.($2 || $1)} # (7) backslash concatenated with either capture 2 or 1
sge; # (8) s = . matches newline, g = match multiple times, e = eval
Judging (at a glance) from the rest of that blog code, this code is not written by someone skilled at Perl. So I will take their comments at face value:
# must protect unescaped "$" and "#" symbols, and "\" at end of string
The eval (8) is apparently to concatenate a backslash with either capture group 2 (2) or 1 (1), depending on which is "true". Or rather, which one matched the string.
Looking closer at the code, (1) and (6) are very similar. The latter one will trigger only at the end of a line that does not have a newline, whereas the first one will handle all other cases, including end of line with a newline (because of /s modifier).
(1) will match any escaped character, so \1, or \$ or \\ anything with a backslash followed by a character. If we look at the replacement part (7), we see that this capture group is the fallback, which will only trigger if the second capture group fails. The second capture group also only matches if the first fails. Confusing? Maybe a little.
(2) triggers if the matching character is not a backslash followed by a character. Now we are looking for a literal $ or #. Or failing that, a backslash at the end of line. But wait a minute, we already checked for backslash? Yes, but this is an edge case.
In the case of (1) matching, $2 will be undefined, and $1, the first capture group, a single character, will be put back into the text. The backslash that was before it will be removed in (1), and then put back in (7). This will not really do anything, just make the regex not destroy already escaped characters.
In the case of (2) matching, it will either be an end of line backslash that is consumed (6) and put back (7), or it will be a $ or # which is consumed (4) and put back (7), with a backslash in front.
So basically what the OP says in the comment is happening.
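A quick way to check this reading is to run the substitution on a few small inputs. The wrapper name protect and the sample strings below are made up for illustration; only the s{}{} line comes from the question.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution from the question.
sub protect {
    my ($tok) = @_;
    $tok =~ s{\\(.)|([\$\#]|\\$)}{'\\'.($2 || $1)}sge;
    return $tok;
}

# In single-quoted literals, \\ is one backslash; \$ stays backslash-dollar.
print protect('a$b'),  "\n";   # a\$b   unescaped $ gains a backslash
print protect('a\$b'), "\n";   # a\$b   already escaped, left unchanged
print protect('a#b'),  "\n";   # a\#b   unescaped # gains a backslash
print protect('a\\'),  "\n";   # a\\    trailing backslash is doubled
```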

Java like trim function for perl

Is there a Java-like trim function for Perl?
I am looking for a function in Perl that removes all leading and trailing characters at or below 0x20, like Java's trim.
After calling the function on the following string.
my $string = "\N{U+0020}\N{U+001f}\N{U+001e}\N{U+001d}\N{U+001c}\N{U+001b}\N{U+001a}\N{U+0019}\N{U+0018}\N{U+0017}\N{U+0016}\N{U+0015}\N{U+0014}\N{U+0013}\N{U+0012}\N{U+0011}Hello Moto\N{U+0010}\N{U+000f}\N{U+000e}\N{U+000d}\N{U+000c}\N{U+000b}\N{U+000a}\N{U+0009}\N{U+0008}\N{U+0007}\N{U+0006}\N{U+0005}\N{U+0004}\N{U+0003}\N{U+0002}\N{U+0001}\N{U+0000}";
Only "Hello Moto" should be left.
The trim from String::Util only removes the first whitespace (\N{U+0020}).
The traditional ASCII way was to use
$string =~ s/^\s+|\s+$//g;
(i.e. remove whitespace (\s) from the beginning (^) and end ($) of the string).
U+001f is not whitespace, it's a Control. You can use Unicode properties in regular expressions with \p:
my $drop = qr/[\p{Space}\p{Cc}]+/;
$whitespace =~ s/^$drop|$drop$//g;
Or, more verbose:
$drop = qr/[\p{White_Space}\p{Cntrl}]+/;
You should probably change the name of the variable.
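If the goal is to mirror Java's trim exactly, an explicit code point range may be closer than the Unicode properties, because \p{Cc} also covers U+007F through U+009F. The sketch below swaps in that range; the helper name java_trim is hypothetical and the test string is a shortened version of the one in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading/trailing code points U+0000..U+0020.
sub java_trim {
    my ($s) = @_;
    $s =~ s/\A[\x00-\x20]+|[\x00-\x20]+\z//g;
    return $s;
}

# Shortened version of the question's test string.
my $string  = "\x20\x1F\x1E\x11Hello Moto\x10\x0D\x0A\x09\x00";
my $trimmed = java_trim($string);
print "[$trimmed]\n";   # [Hello Moto]
```

The space inside "Hello Moto" survives because both alternatives are anchored to the start or the end of the string.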

Perl Search and Replace — issues is caused by "\"

I am parsing a text doc and replacing some text. Lines of text without the "\" seem to be found and replaced no issues.
By the way this is to be done in Perl
I have a string like below:
Path=S:\2014 March\Test Scenarios\load\2014 March
The "\" it contains is the issue. I am using a simple search and replace line of code:
$nExit =~ s/$sMatchPattern/$sFullReplacementString/;
How should I do it?
I suspect that you're trying to match a literal string, and therefore need to escape regex special characters.
You can use quotemeta or the escape codes \Q ... \E to do that:
$nExit =~ s/\Q$sMatchPattern\E/$sFullReplacementString/;
The above variable $sMatchPattern will be interpolated, but then any special characters will be escaped before the regex is compiled. Therefore the value of $sMatchPattern will be treated like a literal string.
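Here is a minimal sketch of that idea using the path from the question. The replacement value is invented for the example, and the variable names simply follow the question.

```perl
use strict;
use warnings;

my $nExit         = 'Path=S:\2014 March\Test Scenarios\load\2014 March';
my $sMatchPattern = 'S:\2014 March\Test Scenarios\load\2014 March';
# Hypothetical replacement value, not from the question.
my $sFullReplacementString = 'S:\2015 April\archive';

# \Q...\E escapes regex metacharacters in the interpolated value.
(my $with_q = $nExit) =~ s/\Q$sMatchPattern\E/$sFullReplacementString/;

# quotemeta does the same escaping ahead of time.
my $quoted = quotemeta $sMatchPattern;
(my $with_quotemeta = $nExit) =~ s/$quoted/$sFullReplacementString/;

print "$with_q\n";           # Path=S:\2015 April\archive
print "$with_quotemeta\n";   # Path=S:\2015 April\archive
```

The backslashes in the replacement variable need no escaping, since an interpolated value on the replacement side is inserted as-is.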
Is this string input, or is it embedded in your program? If it is embedded, you could write it like this to get rid of the backslash characters:
my $path = "S:/2014 March/Test Scenarios/load/2014 March";
By the way, it's best not to have spaces in file and path names. They can be a bit problematic in certain situations. If you can't eliminate them, it's understandable.
Two things you should look at:
Use quotemeta which can help quote special characters in strings and allow you to use them in substitutions. Even if you had backslashes in your strings, quotemeta will handle them.
You don't have to use / as separators in match and substitutions. Instead, you can substitute various other characters.
These are all the same:
$string =~ s/$regex/$replace/;
$string =~ s#$regex#$replace#;
$string =~ s|$regex|$replace|;
You can also use parentheses, square braces, or curly brackets:
$string =~ s($regex)($replace);
$string =~ s[$regex][$replace]; # Not really recommended because [...] is a common regex construct (a character class)
$string =~ s{$regex}{$replace};
The advantage of these as regular expression quote-like characters is that they must be balanced, so if I had this:
my $string = "I have (parentheses) in my string";
my $regex = quotemeta "(parentheses)";
my $replace = "{curly braces}";
$string =~ s($regex)($replace);
print "$string\n"; # Still works. This will be "I have {curly braces} in my string"
Delimiter characters that appear inside the pattern are fine as long as they are balanced (or escaped), and the contents of interpolated variables never interfere with the delimiters.
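A short sketch of what "balanced" means for a pattern written literally between bracketing delimiters. Both sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $string = 'caab';

# The quantifier {2} sits inside the s{...}{...} delimiters; Perl pairs
# up the nested braces, so no escaping is needed.
$string =~ s{a{2}}{X};
print "$string\n";   # cXb

# An unbalanced delimiter character inside the pattern must be escaped.
my $text = 'close } here';
$text =~ s{\}}{)};
print "$text\n";     # close ) here
```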
For yours:
my $sMatchPattern = quotemeta 'S:\2014 March\Test Scenarios\load\2014 March'; # Quotes all metacharacters...
$nExit =~ s($sMatchPattern)($sFullReplacementString);
That should work for you.
If you want a literal \ in a double-quoted replacement or match string, don't forget to put another backslash in front of it, because the backslash is the escape character:
$sFullReplacementString = "\\";
That sets the string to a single \

meaning of the following regular expressions written in perl

Here is a piece of code
while($l=~/(\\\s*)$/) {
statements;
}
$l contains a line of text taken from a file; in effect this code goes through the lines in the file.
Questions:
I don't clearly understand what the condition in the while is doing. I think it is trying to match a \ followed by some number of whitespace characters at the end of the line, and that the loop should stop whenever a line ends with \ and maybe some whitespace. I am not sure of it.
I came across the statement $a =~ s/^(.*$)/$1/. I understand that ^ forces matching at the beginning of the string, but (.*$) would mean match all the characters up to the end of the string. Does it mean that the statement is trying to find whether a group of characters at the end is the same as the group of characters at the beginning of the text?
It is interesting to note that this statement:
while ( $l =~ /(\\\s*)$/ ) {
Is an infinite loop unless $l is altered inside the loop so that the regex no longer matches. As has already been mentioned by others, this is what it matches:
( ... ) a capture group, captures string to $1 (that's the number one, not lower case L)
\\ matches a literal backslash
\s* matches 0 or more whitespace characters.
$ matches at the end of the string, or just before a trailing newline.
Since you do not have the /g modifier, this regex will not iterate through matches, it will simply check if there is a match, resetting the regex each iteration, thereby causing an endless loop.
The statement
$a =~ s/^(.*$)/$1/
Looks rather pointless. It captures a string of characters up until end of string, then replaces it with itself. The captured text is stored in $1 and is simply replaced. The only marginally useful thing about this regex is that:
It matches up until newline \n, and nothing further, which may be of some use to a parser. A period . matches any character except newline, unless the /s modifier is present on the regex.
It captures the line in $1 for future use. However, a simple /^(.*$)/ would do the same.
1. the while
Usually while (regex) is used with the /g modifier, otherwise, if it matches, you get an infinite loop (unless you exit the loop, like using last).
statements would be executed continuously in an infinite loop.
In your case, adding the g
while($l=~/(\\\s*)$/g)
will have the while make only one loop, due to the $ - making a match unique (whatever matches up to the end of string is unique, as $ marks the end, and there is nothing after...).
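This is easy to confirm by counting iterations. In scalar context a /g match remembers where it stopped, so the second attempt starts at the end of the string and fails. The sample line is an assumption made up for the test.

```perl
use strict;
use warnings;

# Hypothetical line ending in a backslash plus trailing whitespace.
my $l = "some text \\  \n";

my $iterations = 0;
while ($l =~ /(\\\s*)$/g) {
    $iterations++;   # body runs once; the next match attempt fails
}
print "$iterations\n";   # 1
```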
2. $a =~ s/^(.*$)/$1/
This is a substitution. If the pattern ^.*$ matches (and it will, since ^.*$ matches (almost, see comment) anything), the match is replaced with $1, i.e. what's inside the (), which is the match itself, since the match runs from the first character to the end of the string.
^ means beginning of string
(.*) means all chars
$ end of string
so that will replace $a with itself - probably not what you want.
It matches a literal backslash followed by 0 or more whitespace characters followed by the end of the line.
It executes statements for all the lines in that text file that contain a \, followed by zero or more whitespace characters ( \s* ), at the end of the line ($).
It matches lines that end with a backslash character, ignoring any trailing whitespace characters.
Ending a line with a backslash is used in some languages and data files to indicate that the line is being continued on the next line. So I suspect this is part of a parser that merges these continuation lines.
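If that guess is right, the loop body probably strips the backslash and appends the next line, which also changes $l so the loop can end. The following is only a sketch of such a merger under that assumption; the @lines data and the loop body are invented, since the question does not show the real statements.

```perl
use strict;
use warnings;

# Hypothetical input lines; a trailing backslash marks a continuation.
my @lines = ("first \\  \n", "second \\\n", "third\n", "alone\n");

my @merged;
while (defined(my $l = shift @lines)) {
    # Keep joining while the current line ends with a backslash.
    while ($l =~ /(\\\s*)$/) {
        my $next = shift @lines;
        last unless defined $next;   # dangling backslash on the last line
        $l =~ s/\\\s*$//;            # drop the backslash and trailing whitespace
        $l .= $next;                 # append the continuation line
    }
    push @merged, $l;
}

print @merged;   # prints "first second third" and "alone" on two lines
```

Because $l changes on every pass, the inner while terminates without needing the /g modifier.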
If you enter a regular expression at RegExr and hover your mouse over the pieces, it displays the meaning of each piece in a tooltip.
(\\\s*)$ this regex means --- a \ followed by zero or more number of white space characters which is followed by end of the line. Since you have your regex in (...), you can extract what you matched using $1, if you need.
http://rubular.com/r/dtHtEPh5DX
EDIT -- based on your update
$a =~ s/^(.$)/$1/ --- this is search and replace. Your regex matches a line that contains exactly one character (the . matches any character except a newline, see http://www.regular-expressions.info/dot.html). Since you use (...), the matched character is captured in $1 and then substituted back, so $a is left unchanged.
EDIT -- you changed your regex so here is the updated answer
$a =~ s/^(.*$)/$1/ -- same as above except now it matches zero or more characters (except new-line)

How do I replace all occurrences of certain characters with their predecessors?

$s = "bla..bla";
$s =~ s/([^%])\./$1/g;
I think it should replace every . that does not come after % with the character that precedes the dot.
But $s is then bla.bla, when
it should be blabla. Where is the problem? I know I can use quantifiers, but I need to do it this way.
When a global regular expression is searching a string it will not find overlapping matches.
The first match in your string will be a., which is replaced with a. When the regex engine resumes searching it starts at the next . so it sees .bla as the rest of the string, and your regex requires a character to match before the . so it cannot match again.
Instead, use a negative lookbehind to perform the assertion that the previous character is not %:
$s =~ s/(?<!%)\.//g;
Note that if you use a positive lookbehind like (?<=[^%]), you will not replace the . if it is the first character in the string.
The problem is that even with the /g flag, each substitution starts looking where the previous one left off. You're trying to replace a. with a and then a. with a, but the second replacement doesn't happen because the a has already been "swallowed" by the previous replacement.
One fix is to use a zero-width lookbehind assertion:
$s =~ s/(?<=[^%])\.//g;
which will remove any . that is not the first character in the string, and that is not preceded by %.
But you might actually want this:
$s =~ s/(?<!%)\.//g;
which will remove any . that is not preceded by %, even if it is the first character in the string.
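Here is a side-by-side sketch of the three patterns. The second test string, with a leading dot and a %. pair, is made up to show where the two lookbehinds differ.

```perl
use strict;
use warnings;

# Original attempt: the captured character is consumed by the first
# match, so it cannot serve as the "previous character" for the next dot.
(my $original = 'bla..bla') =~ s/([^%])\./$1/g;

# Negative lookbehind: remove every dot not preceded by "%".
(my $negative = '.bla..bla%.x') =~ s/(?<!%)\.//g;

# Positive lookbehind: additionally requires some preceding character.
(my $positive = '.bla..bla%.x') =~ s/(?<=[^%])\.//g;

print "$original\n";   # bla.bla
print "$negative\n";   # blabla%.x
print "$positive\n";   # .blabla%.x
```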
Much simpler than look-behinds is to use:
$s =~ s/([^%])\.+/$1/g;
This replaces any string of one or more dots after a character other than % by nothing.