How can I compare md5 checksums in Perl? - perl

I'm trying to compare the checksum value of a file.
One variable $a has the checksum (output of md5sum command, only the hexadecimal part)
and the same value is in variable $b.
If I do ($a == $b), I get an error, but if I do ($a eq $b), it says they are not equal.
Thanks for your answers. String comparison worked after trimming the whitespace, though chomp alone didn't work.

You are comparing strings, not numbers, so do use eq.
Also use lc(), and chomp() or $a=~s/^\s+//;$a=~s/\s+$//;.
You do have the pretty decent option of converting the input to numbers with hex() and using ==.
Try:
if (hex($a) == hex($b)){}
This all depends on how well you're handling the output of your md5sum command. Mine looks like this:
dlamblin$ md5 .bash_history
MD5 (.bash_history) = 61a4c02cbd94ad8604874dda16bdd0d6
So I process it with this:
dlamblin$ perl -e '$a=`md5 .bash_history`;$a=~s/^.*= |\s+$//g;print $a,"\n";'
61a4c02cbd94ad8604874dda16bdd0d6
Now I do notice that hex() hits an integer overflow on this value, so you'll want to use bigint:
dlamblin$ perl -e '
$a=`md5 .bash_history`;$a=~s/^.*= |\s+$//g;print hex($a),"\n";'
Integer overflow in hexadecimal number at -e line 1.
1.29790550043292e+38
dlamblin$ perl -Mbigint -e '
$a=`md5 .bash_history`;$a=~s/^.*= |\s+$//g;print hex($a),"\n";'
129790550043292010470229278762995667158
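If you do want a numeric comparison without loading bigint for the whole program, something like the following might work. It is only a sketch: hex_equal is a made-up helper name, and it assumes both inputs have already been reduced to the bare hexadecimal digest.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: compare two hex digests as arbitrary-precision
# integers, so a 128-bit MD5 value does not overflow.
sub hex_equal {
    my ($x, $y) = @_;
    my $bx = Math::BigInt->from_hex("0x$x");
    my $by = Math::BigInt->from_hex("0x$y");
    # Invalid input yields NaN; treat that as "not equal".
    return 0 if $bx->is_nan || $by->is_nan;
    return $bx->bcmp($by) == 0 ? 1 : 0;
}

print hex_equal('61a4c02cbd94ad8604874dda16bdd0d6',
                '61A4C02CBD94AD8604874DDA16BDD0D6') ? "equal\n" : "different\n";
```

For plain equality the string comparison with eq is simpler. The numeric route mainly saves you from having to normalise upper and lower case.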

Make sure that your strings don't have new-lines or other characters at the end. If in doubt, chomp() both then compare. Also (just to cover off the exceedingly obvious), they are both using the same case to encode the hex chars?
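A minimal sketch of that advice, assuming the variables hold only the digest plus stray whitespace (normalize_md5 and same_md5 are made-up names). It also suggests why chomp alone may not have helped: chomp removes only a trailing newline, not a carriage return or leading blanks.

```perl
use strict;
use warnings;

# Hypothetical helper: strip surrounding whitespace and fold case so that
# two hex digests can be compared as plain strings.
sub normalize_md5 {
    my ($digest) = @_;
    $digest =~ s/^\s+|\s+$//g;   # removes "\n", "\r", spaces and tabs at both ends
    return lc $digest;
}

# Hypothetical helper: true if both digests match after normalisation.
sub same_md5 {
    my ($x, $y) = @_;
    return normalize_md5($x) eq normalize_md5($y) ? 1 : 0;
}

print same_md5("61A4C02CBD94AD8604874DDA16BDD0D6\r\n",
               " 61a4c02cbd94ad8604874dda16bdd0d6") ? "equal\n" : "different\n";
```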

If ($a eq $b) is false, then they are indeed not equal. If you've ruled out obvious differences like "filename: " on one of them, you need to look for whitespace or nonprintable character differences. The easy way to do that is:
use Data::Dumper;
$Data::Dumper::Useqq=1;
print Dumper($a);
print Dumper($b);

Related

perl substitute numbers to alphabet,used with for

Perl substitute all Numbers to Alphabet
abc4xyz5u
to
abcdxyzeu
I tried this, but it does not work:
echo 'abc4xyz5u' | perl -pe'@n=1..9;@a=a..j;@h{@n}=@a;s#$n[$_]#$h{$&}#g for 0..$#n'
I know y/[1-9]/[a-j]/, but I want to use a substitute.
Your issue is within
s#$n[$_]#$h{$&}#g for 0..$#n
You expect $_ to be your input (so that s### is applied on it), but also $n[$_] to use the $_ from the for loop (0 to $#n). If you were to add a print, you'd notice that $_'s value within this loop is 0 to $#n, rather than your input.
What you could do instead to fix it is something like:
$r=$_; $r=~s#$n[$_]#$h{$&}#g for 0..$#n; $_=$r
But that's much more complicated than it has to be. I would instead do:
s#([1-9])#$h{$1}#g
Or, without using %h (since, let's face it, a hash with 0 => a, 1 => b etc. should be an array):
perl -pe '@a="a".."j"; s#([1-9])#$a[$1-1]#g'
Or, without requiring an array at all (I'll let you decide if you find it easier or harder to read; personally I'm fine with it),
perl -pe 's/([1-9])/chr(ord("a")+$1-1)/ge'
I would suggest writing it properly as a Perl script.
The one-liner you mentioned is a little hard to understand.
use strict;
use warnings;
my @alphabets = ("a".."z");
my $input = $ARGV[0];
# Match 1-9 only: a 0 would index $alphabets[-1], which is "z".
$input =~ s/([1-9])/$alphabets[$1 - 1]/g;
print "$input\n";
Run -
perl substitute.pl abc4xyz5u
Output -
abcdxyzeu
I am searching for each digit in the string and replacing it with the letter at the same position in the 'alphabets' array (remember that arrays start at index 0, hence the 'position - 1').
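For comparison, here is a sketch of the hash-based variant the question was aiming at, written as a small script. The names %letter_for and digits_to_letters are made up for illustration. The lookup table is filled with a hash slice, and the substitution handles every digit in a single pass, so no for loop is needed.

```perl
use strict;
use warnings;

# Lookup table built with a hash slice: 1 => a, 2 => b, ... 9 => i.
my %letter_for;
@letter_for{1 .. 9} = ('a' .. 'i');

# Hypothetical helper: replace each digit 1-9 with its letter in one pass.
sub digits_to_letters {
    my ($text) = @_;
    $text =~ s/([1-9])/$letter_for{$1}/g;
    return $text;
}

print digits_to_letters('abc4xyz5u'), "\n";   # prints abcdxyzeu
```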

Unexpected character when running one-liner on Windows

I want to generate an output file that shows the frequency of each word inside an input file. After some search, I found that Perl is the ideal language for this problem, but I don't know this language.
After some more search, I found the following code here at stackoverflow, supposedly it provides the solution I want at great efficiency:
perl -lane '$h{$_}++ for @F; END{for $w (sort {$h{$b}<=>$h{$a} || $a cmp $b} keys %h) {print "$h{$w}\t$w"}}' file > freq
I tried running this command line using the form below:
perl -lane 'code' input.txt > output.txt
The execution halts due to an unexpected '>' (the one at '<=>'). I did some research but can't understand what is wrong.
Could someone enlighten me? Thanks!
Here is the topic from where I got the code:
Elegant ways to count the frequency of words in a file
If it's relevant, my words use letters and numbers and are separated by a single white space.
You are probably using Windows, where cmd.exe does not treat single quotes as quoting characters, so the < and > in <=> are taken as redirections. You therefore need to use double quotes " instead of single quotes ' around your code:
perl -lane "$h{$_}++ for @F; END{for $w (sort {$h{$b}<=>$h{$a} || $a cmp $b} keys %h) {print qq($h{$w}\t$w)}}" file > freq
Also, note how I used qq() instead of "..." within the code, as suggested by @mob. Another option is to escape the quotes with \".
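Another way to sidestep shell quoting altogether is to put the code in a script file. The sketch below assumes the same rules as the one-liner (whitespace-separated words, most frequent first, ties in alphabetical order). word_freq is a made-up name, and the sample lines stand in for real input.

```perl
use strict;
use warnings;

# Hypothetical helper: count whitespace-separated words and return
# "count<TAB>word" strings, most frequent first, ties alphabetical.
sub word_freq {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        $count{$_}++ for split ' ', $line;
    }
    return map { "$count{$_}\t$_" }
           sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
}

# Sample input; with real data you would pass the lines read from the file.
print "$_\n" for word_freq("b a b\n", "c a b\n");
```

Saved as, say, freq.pl, this can be run as perl freq.pl on any platform without worrying about which quote characters the shell understands.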

Preserving backslashes in Perl strings

Is there a way in Perl to preserve and print all backslashes in a string variable?
For example:
$str = 'a\\b';
The output is
a\b
but I need
a\\b
The problem is that I can't process the string in any way to escape the backslashes because
I have to read complex regular expressions from a database and don't know in which combination and number they appear and have to print them exactly as they are on a web page.
I tried with template toolkit and html and html_entity filters. The only way it works so far is to use a single quoted here document:
print <<'XYZ';
a\\b
XYZ
But then I can't interpolate variables which makes this solution useless.
I tried to write a string to a web page, into file and on the shell, but no luck, always one backslash disappears. Maybe I am totally on the wrong track, but what is the correct way to print complex regular expressions including backslashes in all combinations and numbers without any changes?
In other words:
I have a database containing hundreds of regular expressions as string data. I want to read them with Perl and print them on a web page exactly as they are in the database.
There are all the time changes to these regular expressions by many administrators so I don't know in advance how and what to escape.
A typical example would look like this:
'C:\\test\\file \S+'
but it could change the next day to
'\S+ C:\\test\\file'
Maybe the correct conclusion would be to escape every backslash exactly once, no matter in which combination and how many times it appears? That would mean doubling them up works, and the problem isn't as big as I feared. I tested it in bash and it works with two and even three backslashes in a row (4 backslashes print as 2, and 6 print as 3).
The backslash only has significance to Perl when it occurs in Perl source code, e.g.: your assignment of a literal string to a variable:
my $str = 'a\\b';
However, if you read data from a file (or a database or socket etc) any backslashes in the data you read will be preserved without you needing to take any special steps.
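A small sketch of that point, using an in-memory filehandle as a stand-in for the database or file (the variable names are made up). The single-quoted heredoc only serves to get the sample text into the program unchanged. The part that matters is that reading through the handle does not touch the backslashes.

```perl
use strict;
use warnings;

# Stand-in for data stored elsewhere: a single-quoted heredoc takes the
# text below exactly as written, double backslashes included.
my $stored = <<'DATA';
C:\\test\\file \S+
DATA

# Read it back through a filehandle, as you would from a real file.
open my $fh, '<', \$stored or die "cannot open in-memory file: $!";
chomp(my $regex_text = <$fh>);
close $fh;

# The string still contains all five backslashes.
print "$regex_text\n";
```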
my $str = 'a\\b';
print $str;
This prints a\b.
Use
my $str = 'a\\\\b';
instead
It's a PITA, but you will just have to double up the backslashes, e.g.
a\\\\b
Otherwise, you could store the backslash in another variable, and interpolate that.
The minimum to get two slashes is (unfortunately) three slashes:
use 5.016;
my $a = 'a\\\b';
say $a;
The problem I tried to solve does not exist. I confused initializing a string directly in the code with using the HTML forms. Using a string inside the code while preserving all backslashes is only possible either with a here document or by reading a text file containing the string. But if I just use the HTML form on a web page to insert a string and use escapeHTML() from the CGI module, it takes care of everything and you can insert the weirdest combinations of special characters. They all get displayed and preserved exactly as inserted. So I should have started directly with HTML and database operations instead of trying to examine things first by using strings directly in the code. Anyway, thanks for your help.
You can use the following regular expression to form your string correctly:
my $str = 'a\\b';
$str =~ s/\\/\\\\/g;
print "$str\n";
This prints a\\b.
EDIT:
You can use non-interpolating here-document instead:
my $str = <<'EOF';
a\\b
EOF
print "$str\n";
This still prints a\\b.
Grant's answer provided the hint I needed. Some of the other answers did not match Perl's operation on my system so ...
#!/usr/bin/perl
use warnings;
use strict;
my $var = 'content';
print "\'\"\N{U+0050}\\\\\\$var\n";
print <<END;
\'\"\N{U+0050}\\\\\\$var\n
END
print '\'\"\N{U+0050}\\\\\\$var\n'.$/;
my $str = '\'\"\N{U+0050}\\\\\\$var\n';
print $str.$/;
print @ARGV;
print $/;
Called from bash ... using the bash means of escaping in quotes which changes \' to '\''.
jamie@debian:~$ ./ft.pl '\'\''\"\N{U+0050}\\\\\\$var\n'
'"P\\\content
'"P\\\content
'\"\N{U+0050}\\\$var\n
'\"\N{U+0050}\\\$var\n
\'\"\N{U+0050}\\\\\\$var\n
The final line, with six backslashes in the middle, was what I had expected. Reality differed.
So:
"in here \" is interpolated
in HEREDOC \ is interpolated
'in single quotes only \' is interpolated and only for \ and ' (are there more?)
my $str = 'same limited \ interpolation';
perl.pl 'escape using bash rules' with @ARGV is not interpolated

Perl ord and chr working with unicode

To my horror I've just found out that chr doesn't work with Unicode, although it does something. The man page is anything but clear:
Returns the character represented by that NUMBER in the character set. For example, chr(65) is "A" in either ASCII or Unicode, and chr(0x263a) is a Unicode smiley face.
Indeed I can print a smiley using
perl -e 'print chr(0x263a)'
but things like chr(0x00C0) do not work. I see that my perl v5.10.1 is a bit ancient, but when I paste various strange letters in the source code, everything's fine.
I've tried funny things like use utf8 and use encoding 'utf8', I haven't tried funny things like use v5.12 and use feature 'unicode_strings' as they don't work with my version, I was fooling around with Encode::decode to find out that I need no decoding as I have no byte array to decode. I've read much more documentation than ever before, and found quite a few interesting things but nothing helpful. It looks like a sort of the Unicode Bug but there's no usable solution given. Moreover I don't care about the whole string semantics, all I need is a trivial function.
So how can I convert a number into a string consisting of the single character corresponding with it, so that for example real_chr(0xC0) eq 'À' holds?
The first answer I've got explains quite everything about IO, but I still don't understand why
#!/usr/bin/perl -w
use strict;
use utf8;
use encoding 'utf8';
print chr(0x00C0) eq 'À' ? 'eq1' : 'ne1', " - ", chr(0x263a) eq '☺' ? 'eq1' : 'ne1', "\n";
print 'À' =~ /\w/ ? "match1" : "no_match1", " - ", chr(0x00C0) =~ /\w/ ? "match2" : "no_match2", "\n";
prints
ne1 - eq1
match1 - no_match2
It means that the manually entered 'À' differs from chr(0x00C0). Moreover, the former is a word constituent character (correct!) while the latter is not (but should be!).
First,
perl -le'print chr(0x263A);'
is buggy. Perl even tells you as much:
Wide character in print at -e line 1.
That doesn't qualify as "working". So while they differ in how they fail, neither of the following gives you what you want:
perl -le'print chr(0x263A);'
perl -le'print chr(0x00C0);'
To properly output the UTF-8 encoding of those Unicode code points, you need to tell Perl to encode the code points with UTF-8.
$ perl -le'use open ":std", ":encoding(UTF-8)"; print chr(0x263A);'
☺
$ perl -le'use open ":std", ":encoding(UTF-8)"; print chr(0x00C0);'
À
Now on to the "why".
File handles can only transmit bytes, so unless you tell it otherwise, Perl file handles expect bytes. That means the string you provide to print cannot contain anything but bytes, or in other words, it cannot contain characters over 255. The output is exactly what you provide:
$ perl -e'print map chr, 0x00, 0x65, 0xC0, 0xF0' | od -t x1
0000000 00 65 c0 f0
0000004
This is useful. This is different than what you want, but that doesn't make it wrong. If you want something different, you just need to tell Perl what you want.
By adding an :encoding layer, the handle now expects a string of Unicode characters, or as I call it, "text". The layer tells Perl how to convert the text into bytes.
$ perl -e'
use open ":std", ":encoding(UTF-8)";
print map chr, 0x00, 0x65, 0xC0, 0xF0, 0x263a;
' | od -t x1
0000000 00 65 c3 80 c3 b0 e2 98 ba
0000011
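The same conversion can be done explicitly with the Encode module instead of an I/O layer. This is only a sketch to make the byte sequences visible, and utf8_hex is a made-up helper name.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: the UTF-8 bytes of a single code point, as hex.
sub utf8_hex {
    my ($code_point) = @_;
    my $bytes = encode('UTF-8', chr($code_point));   # text in, bytes out
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

print utf8_hex(0xC0), "\n";     # c3 80
print utf8_hex(0x263A), "\n";   # e2 98 ba
```

These are the same byte pairs and triples that show up in the od output above.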
You're right that chr doesn't know or care about Unicode. Like length, substr, ord and reverse, chr implements a basic string function, not a Unicode function. That doesn't mean it can't be used to work with text strings. As you've seen, the problem wasn't with chr but with what you did with the string after you built it.
A character is an element of a string, and a character is a number. That means a string is just a sequence of numbers. Whether you treat those numbers as Unicode code points (text), packed IP addresses or temperature measurements is entirely up to you and the functions to which you pass the strings.
Here are a few examples of operators that do assign meaning to the strings they receive as operands:
m// expects a string of Unicode code points.
connect expects a sequence of bytes that represent a sockaddr_in structure.
print with a handle without :encoding expects a sequence of bytes.
print with a handle with :encoding expects a sequence of Unicode code points.
etc
So how can I convert a number into a string consisting of the single character corresponding with it, so that for example real_chr(0xC0) eq 'À' holds?
chr(0xC0) eq 'À' does hold. Did you remember to tell Perl you encoded your source code using UTF-8 by using use utf8;? If you didn't tell Perl, Perl actually sees a two-character string on the RHS.
Regarding the question you've added:
There are problems with the encoding pragma. I recommend against using it. Instead, use
use open ':std', ':encoding(UTF-8)';
That'll fix one of the problems. The other problem you are encountering is with
chr(0x00C0) =~ /\w/
It's a known bug that's intentionally left broken for backwards compatibility reasons. That is, unless you request a more recent version of the language as follows:
use 5.014; # use 5.012; *might* suffice.
A workaround that works as far back as 5.8:
my $x = chr(0x00C0);
utf8::upgrade($x);
$x =~ /\w/
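Wrapped into a function, the workaround might look like this. It is a sketch only, and is_word_char is a made-up name. utf8::upgrade changes how the string is stored internally, not which characters it contains, and that is enough for the regex engine to apply Unicode rules.

```perl
use strict;
use warnings;

# Hypothetical helper: does the character with this code point match \w
# under Unicode rules? Works on old Perls because of the explicit upgrade.
sub is_word_char {
    my ($code_point) = @_;
    my $char = chr($code_point);
    utf8::upgrade($char);          # same characters, UTF-8 internal storage
    return $char =~ /\w/ ? 1 : 0;
}

printf "U+00C0: %d\n", is_word_char(0x00C0);   # 1, a letter
printf "U+263A: %d\n", is_word_char(0x263A);   # 0, a symbol
```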

How can I interpolate literal \t and \n in Perl strings? [duplicate]

Say I have an environment variable myvar:
myvar=\tapple\n
When the following command prints this variable
perl -e 'print "$ENV{myvar}"'
I literally get \tapple\n. However, I want those control characters to be evaluated, not printed in escaped form. How would I achieve that?
In the real code, $ENV{myvar} is used inside a substitution, but I hope the answer will cover that too.
Use eval:
perl -e 'print eval qq{"$ENV{myvar}"}'
UPD: You can also use substitution with the ee switch, which is safer:
perl -e '(my $s = $ENV{myvar}) =~ s/(\\n|\\t)/"qq{$1}"/gee; print $s'
You should probably be using String::Escape.
use String::Escape qw(unbackslash);
my $var = unbackslash($ENV{'myvar'});
unbackslash unescapes any string escape sequences it finds, turning them into the characters they represent. If you want to explicitly only translate \n and \t, you'll probably have to do it yourself with a substitution as in this answer.
There's nothing particularly special about a sequence of characters that includes a \. If you want to substitute one sequence of characters for another, it's very simple to do in Perl:
my %sequences = (
'\\t' => "\t",
'\\n' => "\n",
'foo' => 'bar',
);
my $string = '\\tstring fool string\\tfoo\\n';
print "Before: [$string]\n";
$string =~ s/\Q$_/$sequences{$_}/g for ( keys %sequences );
print "After: [$string]\n";
The only trick with \ is to keep track of the times when Perl thinks it's an escape character.
Before: [\tstring fool string\tfoo\n]
After: [ string barl string bar
]
However, as darch notes, you might just be able to use String::Escape.
Note that you have to be extremely careful when you're taking values from environment variables. I'd be reluctant to use String::Escape since it might process quite a bit more than you are willing to translate. The safe way is to only expand the particular values you explicitly want to allow. See my "Secure Programming Techniques" chapter in Mastering Perl where I talk about this, along with the taint checking you might want to use in this case.
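A minimal sketch of that whitelist approach, assuming only \t and \n should be expanded (%expansion and expand_escapes are made-up names). Unlike eval, nothing from the environment is ever executed as code, and any escape not in the table is left exactly as it was.

```perl
use strict;
use warnings;

# Only these escapes are expanded; everything else is left untouched.
my %expansion = (t => "\t", n => "\n");

# Hypothetical helper: expand literal \t and \n in a single pass.
sub expand_escapes {
    my ($text) = @_;
    $text =~ s/\\([tn])/$expansion{$1}/g;
    return $text;
}

# Sample value; in the question this would come from $ENV{myvar}.
print expand_escapes('\tapple\n');
```

One limitation of this sketch is that it does not treat a doubled backslash specially, so an input like \\t still has its t expanded. Add '\\\\' handling to the pattern if that matters for your data.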