difference between string representation and value in perl - perl

I am puzzled at the following difference.
$str="\xd6\xd0";
decode("GBK",$str);
vs.
$str="d6d0";
#list=map "\\x".$_,unpack("(a2)*", $str);
$str=join "", #list;
decode("GBK",$str);
Why in the first case, it worked to print out the character, while in the second case, it is not working? How can I make it to work in the latter case?
Many thanks.

If you're trying to turn "d6d0" into "\xd6\xd0", you want pack 'H*':
my $str = "d6d0";
$str = pack('H*', $str);
decode("GBK",$str);
join does not interpret escape sequences, it just concatenates strings.

In the first case, the parser interprets the escape sequences and builds a string that is two bytes long. In the second case, you are creating a string that is eight characters:
\xd6\xd0. You probably want to unpack like you are doing, but without prepending the \x, and then use pack with template (H2)* instead of join to put it all together.

Related

Perl: quoting correctly all special characters [duplicate]

This question already has answers here:
How can I prevent Perl from interpreting double-backslash as single-backslash character?
(3 answers)
Closed 4 years ago.
I have this sample string, containing 2 backslashes. Please don't ask me for the source of the string, it is just a sample string.
my $string = "use Ppppp\\Ppppp;";
print $string;
Both, double quotes or quotes will print
use Ppppp\Ppppp;
Using
my $string = "\Quse Ppppp\\Ppppp;\E";
print $string;
will print
use\ Ppppp\\Ppppp\;
adding those extra backslashes to the output.
Is there a simple solution in perl to display the string "literally", without modifying the string like adding extra backslashes to escape?
I have this sample string, containing 2 backslashes. ...
my $string = "use Ppppp\\Ppppp;";
Sorry, but you're mistaken - that string only contains one backslash*, as \\ is a escape sequence in double-quoted (and single-quoted) strings that produces a single backslash. See also "Quote and Quote-like Operators" in perlop. If your string really does contain two backslashes, then you need to write "use Ppppp\\\\Ppppp;", or use a heredoc, as in:
chomp( my $string = <<'ENDSTR' );
use Ppppp\\Ppppp;
ENDSTR
If you want the string output as valid Perl source code (using its escaping), then you can use one of several options:
my $string = "use Ppppp\\Ppppp;";
# option 1
use Data::Dumper;
$Data::Dumper::Useqq=1;
$Data::Dumper::Terse=1;
print Dumper($string);
# option 2
use Data::Dump;
dd $string;
# option 3
use B;
print B::perlstring($string);
Each one of these will print "use Ppppp\\Ppppp;". (There are of course other modules available too. Personally I like Data::Dump. Data::Dumper is a core module.)
Using one of these modules is also the best way to verify what your $string variable really contains.
If that still doesn't fit your needs: A previous edit of your question said "How can I escape correctly all special characters including backslash?" - you'd have to specify a full list of which characters you consider special. You could do something like this, for example:
use 5.014; # for s///r
my $string = "use Ppppp\\Ppppp;";
print $string=~s/(?=[\\])/\\/gr;
That'll print $string with backslashes doubled, without modifying $string. You can also add more characters to the regex character class to add backslashes in front of those characters as well.
* Update: So I don't sound too pedantic here: of course the Perl source code contains two backslashes. But there is a difference between the literal source code and what the Perl string ends up containing, the same way that the string "Foo\nBar" contains a newline character instead of the two literal characters \ and n.
For the sake of completeness, as already discussed in the comments: \Q\E (aka quotemeta) is primarily meant for escaping any special characters that may be special to regular expressions (all ASCII characters not matching /[A-Za-z_0-9]/), which is why it is also escaping the spaces and semicolon.
Since you mention external files: If you are reading a line such as use Ppppp\\Ppppp; from an external file, then the Perl string will contain two backslashes, and if you print it, it will also show two backslashes. But if you wanted to represent that string as Perl source code, you have to write "use Ppppp\\\\Ppppp;" (or use one of the other methods from the question you linked to).

How to trim the last part of string using Perl?

I need to trim the last part of the string here. The string i have is : abc|bcd|cde|.
I need to get rid of the last |... The trim() command is not helping for some reason may be i am using it wrong, please help... thank you.
This can be done with a simple substitution.
my $string = "abc|bcd|cde|";
$string =~ s/\|$//;
Use chop:
my $string = "abc|bcd|cde|";
chop($string);
say $string;
output:
abc|bcd|cde
From the doc:
Chops off the last character of a string and returns the character
chopped. It is much more efficient than s/.$//s because it neither
scans nor copies the string. If VARIABLE is omitted, chops $_ . If
VARIABLE is a hash, it chops the hash's values, but not its keys.

Why does split not return anything?

I am trying to get that Perl split working for more than 2 hours. I don't see an error. Maybe some other eyes can look at it and see the issue. I am sure its a silly one:
#versionsplit=split('.',"15.0.3");
print $versionsplit[0];
print $versionsplit[1];
print $versionsplit[2];
I just get an empty array. Any idea why?
You need:
#versionsplit=split(/\./,"15.0.3");
The first argument to split is a regular expression, not a string. And . is the regex symbol which means ‘match any character’. So all the characters in your input string were being treated as separators, and split wasn't finding anything between them to return.
the "." represents any character.You need to escape it for split function to recognise as a field separator.
change your line to
#versionsplit=split('\.',"15.0.3");

Split by dot using Perl

I use the split function by two ways. First way (string argument to split):
my $string = "chr1.txt";
my #array1 = split(".", $string);
print $array1[0];
I get this error:
Use of uninitialized value in print
When I do split by the second way (regular expression argument to split), I don't get any errors.
my #array1 = split(/\./, $string); print $array1[0];
My first way of splitting is not working only for dot.
What is the reason behind this?
"\." is just ., careful with escape sequences.
If you want a backslash and a dot in a double-quoted string, you need "\\.". Or use single quotes: '\.'
If you just want to parse files and get their suffixes, better use the fileparse() method from File::Basename.
Additional details to the information provided by Mat:
In split "\.", ... the first parameter to split is first interpreted as a double-quoted string before being passed to the regex engine. As Mat said, inside a double-quoted string, a \ is the escape character, meaning "take the next character literally", e.g. for things like putting double quotes inside a double-quoted string: "\""
So your split gets passed "." as the pattern. A single dot means "split on any character". As you know, the split pattern itself is not part of the results. So you have several empty strings as the result.
But why is the first element undefined instead of empty? The answer lies in the documentation for split: if you don't impose a limit on the number of elements returned by split (its third argument) then it will silently remove empty results from the end of the list. As all items are empty the list is empty, hence the first element doesn't exist and is undefined.
You can see the difference with this particular snippet:
my #p1 = split "\.", "thing";
my #p2 = split "\.", "thing", -1;
print scalar(#p1), ' ', scalar(#p2), "\n";
It outputs 0 6.
The "proper" way to deal with this, however, is what #soulSurfer2010 said in his post.

How can I prevent Perl from interpreting \ as an escape character?

How can I print a address string without making Perl take the slashes as escape characters? I don't want to alter the string by adding more escape characters also.
What you're asking about is called interpolation. See the documentation for "Quote-Like Operators" at perldoc perlop, but more specifically the way to do it is with the syntax called the "here-document" combined with single quotes:
Single quotes indicate the text is to be treated literally with no interpolation of its content. This is similar to single quoted strings except that backslashes have no special meaning, with \ being treated as two backslashes and not one as they would in every other quoting construct.
This is the only form of quoting in perl where there is no need to worry about escaping content, something that code generators can and do make good use of.
For example:
my $address = <<'EOF';
blah#blah.blah.com\with\backslashes\all\over\theplace
EOF
You may want to read up on the various other quoting operators such as qw and qq (at the same document as I referenced above), as they are very commonly used and make good shorthand for other more long-winded ways of escaping content.
Use single quotes. For example
print 'lots\of\backslashes', "\n";
gives
lots\of\backslashes
If you want to interpolate variables, use the . operator, as in
$var = "pesky";
print 'lots\of\\' . $var . '\backslashes', "\n";
Notice that you have to escape the backslash at the end of the string.
As an alternative, you could use join:
print join("\\" => "lots", "of", $var, "backslashes"), "\n";
We could give much more helpful answers if you'd give us sample code.
It depends what you're escaping, but the Quote-like operators may help.
See the perlop man page.
Use the backslah two times,
print "This is a backslah character \\";