Java like trim function for perl - perl

Is there an Java like trim function for Perl.
I am looking for function in Perl that removes all the leading and trailing characters below 0x20, like in Java.
After calling the function on the following string.
my $string = "\N{U+0020}\N{U+001f}\N{U+001e}\N{U+001d}\N{U+001c}\N{U+001b}\N{U+001a}\N{U+0019}\N{U+0018}\N{U+0017}\N{U+0016}\N{U+0015}\N{U+0014}\N{U+0013}\N{U+0012}\N{U+0011}Hello Moto\N{U+0010}\N{U+000f}\N{U+000e}\N{U+000d}\N{U+000c}\N{U+000b}\N{U+000a}\N{U+0009}\N{U+0008}\N{U+0007}\N{U+0006}\N{U+0005}\N{U+0004}\N{U+0003}\N{U+0002}\N{U+0001}\N{U+0000}";
Only "Hello Moto" should be left.
The trim from String::Util only removes the first whitespace (\N{U+0020}).

The traditional ASCII way was to use
$string =~ s/^\s+|\s+$//g;
(i.e. remove whitespace (\s) from the beginning (^) and end ($) of the string.
U+001f is not whitespace, it's a Control. You can use Unicode properties in regular expressions with \p:
my $drop = qr/[\p{Space}\p{Cc}]+/;
$whitespace =~ s/^$drop|$drop$//g;
Or, more verbose:
$drop = qr/[\p{White_Space}\p{Cntrl}]+/;
You should probably change the name of the variable.

Related

Perl - can't remove trailing characters at the end of string

I have some trailing characters at the end of a string peregrinevwap^_^_
print "JH 4 - app: $application \n";
app: peregrinevwap^_^_
Do you know why they are there and how I can remove them. I tried the chomp command but this hasn't worked.
Check out the tr//cd operator to get rid of unwanted characters.
It's documented in "perldoc perlop"
$application =~ tr/a-zA-Z//cd;
Will remove everything except letters from the string and
$application =~ tr/^_//d;
Will remove all "^" and "_" characters.
If you only want to remove certain characters when they at the end of the string, use the s// search/replace operator with regular expressions and the $ anchor to match the end of the string.
Here's an example:
s/[\^_]*$//;
Let's hope the underscores do not occur at the end of your strings, otherwise you can't automatically separate them from these unwanted characters.
Are you sure these characters are actually ^ and _ characters?
^_ could also indicate Ctrl-Underscore, ASCII character 0x1F (Unit Separator). (Not a character I've ever seen used, but you never know.)
If this is in fact the case, you can remove them with something like:
$application =~ s/\x1F//g;

Perl Search and Replace — issues is caused by "\"

I am parsing a text doc and replacing some text. Lines of text without the "\" seem to be found and replaced no issues.
By the way this is to be done in Perl
I have a string like below:
Path=S:\2014 March\Test Scenarios\load\2014 March
that contains "\" that slash is an issue. I am using a simple search and replace line of code
$nExit =~ s/$sMatchPattern/$sFullReplacementString/;
How should I do it?
I suspect that you're trying to match a literal string, and therefore need to escape regex special characters.
You can use quotemeta or the escape codes \Q ... \E to do that:
$nExit = s/\Q$sMatchPattern/$sFullReplacementString/;
The above variable $sMatchPattern will be interpolated, but then any special characters will be escaped before the regex is compiled. Therefore the value of $sMatchPattern will be treated like a literal string.
Is this string inputed, or is it embedded in your program. You could do this to get rid of the backslash character:
my $path = "S:/2014 March/Test Scenarios/load/2014 March";
By the way, it's best not to have spaces in file and path names. They can be a bit problematic in certain situations. If you can't eliminate them, it's understandable.
Two things you should look at:
Use quotemeta which can help quote special characters in strings and allow you to use them in substitutions. Even if you had backslashes in your strings, quotemeta will handle them.
You don't have to use / as separators in match and substitutions. Instead, you can substitute various other characters.
These are all the same:
$string =~ s/$regex/$replace/;
$string =~ s#$regex#$replace#;
$string =~ s|$regex|$replace|;
You can also use parentheses, square braces, or curly brackets:
$string =~ s($regex)($replace);
$string =~ s[$regex][$replace]; # Not really recommended because `[...]` is a common regex
$string =~ s{$regex}{$replace};
The advantage of these as regular expression quote-like characters is that they must be balanced, so if I had this:
my $string = "I have (parentheses) in my string";
my $regex = "(parentheses}";
my $replace = "{curly braces}";
$string = s($regex)($replace);
print "$string\n"; # Still works. This will be "I have {curly braces} in my string"
Even if my string contains these types of characters, as long as they're balanced, everything will still work.
For yours:
my $Path = 'S:\2014 March\Test Scenarios\load\2014 March';
$nExit = quotemeta $string; #Quotes all meta characters...
$nExit =~ s($sMatchPattern)($sFullReplacementString);
That should work for you.
if you want to have a \ in your replacement string or match string dont forget to put another backslash in front of the backslash you want, as its an operator...
$sFullReplacementString = "\\";
That would turn the string into a single \

split string with "."

I am trying to split a string with "." but getting nothing in the array. File name is "Head-First-Java-2nd-edition.pdf" After splitting I want to extract extension, but don't know why it is giving blank array.
my #fileInfo = split(/./, $filename);
&logMsg("Array is: #fileInfo");
The split is giving an empty list because you are splitting on a wildcard .. Period is a meta character, and if you want to split on a literal period, you need to escape it
my #fileInfo = split(/\./, $filename);
Also, the syntax for calling a subroutine is NAME(LIST). Using the & prefix has a certain hidden feature, in that it circumvents prototypes. Read more in perldoc perlsub.
. in a regular expression means any character except \n. To split on a literal ., you need to escape it:
split /\./, $filename;

How to search for a string that contains no whitespace in perl

my $string3 = "anima ls";
my $t3 = $string3 =~ /[^\s]+/;
print "$t3\n";
I wanted to write a regex that searches for a string containing no whitespace. The above code works even if i give space.
The regex [^\s]+ searches for at least one character that is not whitespace. It is better written as \S+, though. A regex that matches any string that does not contain a whitespace character is rather
/^\S+$/

How do I escape special characters for a substitution in a Perl one-liner?

Is there some way to replace a string such as #or * or ? or & without needing to put a "\" before it?
Example:
perl -pe 'next if /^#/; s/\#d\&/new_value/ if /param5/' test
In this example I need to replace a #d& with new_value but the old value might contain any character, how do I escape only the characters that need to be escaped?
You have several problems:
You are using \b incorrectly
You are replacing code with shell variables
You need to quote metacharacters
From perldoc perlre
A word boundary ("\b") is a spot between two characters that has a "\w" on one side of it
Neither of the characters # or & are \w characters. So your match is guaranteed to fail. You may want to use something like s/(^|\s)\#d\&(\s|$)/${1}new text$2/
(^|\s) says to match either the start of the string (^)or a whitespace character (\s).
(\s|$) says to match either the end of the string ($) or a whitespace character (\s).
To solve the second problem, you should use %ENV.
To solve the third problem, you should use the \Q and \E escape sequences to escape the value in $ENV{a}.
Putting it all together we get:
#!/bin/bash
export a='#d&'
export b='new text'
echo 'param5 #d&' |
perl -pe 'next if /^#/; s/(^|\s)\Q$ENV{a}\E(\s|$)/$1$ENV{b}$2/ if /param5/'
Which prints
param5 new text
As discussed at perldoc perlre:
...Today it is more common to use the quotemeta() function or the "\Q" metaquoting
escape sequence to disable all metacharacters' special meanings like this:
/$unquoted\Q$quoted\E$unquoted/
Beware that if you put literal backslashes (those not inside interpolated variables) between "\Q" and "\E", double-quotish backslash interpolation may
lead to confusing results. If you need to use literal backslashes within "\Q...\E", consult "Gory details of parsing quoted constructs" in perlop.
You can also use a ' as the delimiter in the s/// operation to make everything be parsed literally:
my $text = '#';
$text =~ s'#'1';
print $text;
In your example, you can do (note the single quotes):
perl -pe 's/\b\Q#f&\E\b/new_value/g if m/param5/ and not /^ *#/'
The other answers have covered the question, now here's your meta-problem: Leaning Toothpick Syndrome. Its when the delimiter and escapes start to blur together:
s/\/foo\/bar\\/\/bar\/baz/
The solution is to use a different delimiter. You can use just about anything, but balanced braces work best. Most editors can parse them and you generally don't have to worry about escaping.
s{/foo/bar\\}{/bar/baz}
Here's your regex with braced delimiters.
s{\#d\&}{new_value}
Much easier on the eyeholes.
If you really want to avoid typing the \s, put your search string into a variable and then use that in your regex instead. You don't need quotemeta or \Q ... \E in that case. For example:
my $s = '#d&';
s/$s/new_value/g;
If you must use this in a one-liner, bear in mind that you will have to escape the $s if you use "s to contain your perl code, or escape the 's if you use 's to contain your perl code.
If you have a string like
my $var1 = abc$123
and you want to replace it with abcd then you have to use \Q \E. If you don't then no matter what perl doesn't replace the string.
This is the only thing that worked for me.
my $var2 = s/\Q$var1\E/abcd/g;