How to output binary files in Perl?

I want to be able to output 0x41, and have it show up as A.
This is what I have tried so far:
my $out;
open $out, ">file.txt" or die $!;
binmode $out;
print $out 0x41;
close $out;
It outputs 65 instead of A in the resulting file. This is not what I want.
I have also read this similar question, but I couldn't carry the answer over: packing a short results in 2 bytes instead of 1 byte.

You can use chr(0x41).
For larger structures, you can use pack:
pack('c3', 0x41, 0x42, 0x43) # gives "ABC"
Regarding your suspicion of pack, do go read its page - it is extremely versatile. 'c' packs a single byte, 's' (as seen in that question) will pack a two-byte word.
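Here is a small sketch (core Perl only) that shows the one-byte vs. two-byte difference directly. The 'v' line is an addition not mentioned above. It shows one way to fix the byte order of a 16-bit value.

```perl
use strict;
use warnings;

# 'c3' packs three 8-bit values, one byte each.
my $abc = pack('c3', 0x41, 0x42, 0x43);
print "$abc\n";                                   # ABC

# 's' packs a 16-bit integer in native byte order, so it is always two bytes.
my $short = pack('s', 0x41);
printf "c: %d byte, s: %d bytes\n", length(pack('c', 0x41)), length($short);

# The extra byte is a zero whose position depends on the machine's endianness.
# 'v' forces little-endian order, so this result is the same everywhere.
print unpack('H*', pack('v', 0x41)), "\n";        # 4100
```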

Use the chr function:
print $out chr 0x41
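Here is a complete round trip, as a sketch: write the bytes, then read them back to confirm that exactly one byte per value reached the file. The temporary directory and the out.bin name are illustrative. Opening with '>:raw' has the same effect as the binmode call in the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Illustrative output path inside a throwaway directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/out.bin";

# ':raw' (equivalent to binmode) prevents any newline translation.
open my $out, '>:raw', $file or die "Cannot open $file: $!";
print {$out} chr(0x41);                  # chr turns the number 0x41 into the string "A"
print {$out} pack('C*', 0x42, 0x43);     # 'C*': one unsigned byte per argument -> "BC"
close $out or die "Cannot close $file: $!";

# Read the file back in raw mode and show its bytes as hex.
open my $in, '<:raw', $file or die "Cannot open $file: $!";
my $bytes = do { local $/; <$in> };
close $in;
printf "%d bytes: %s\n", length($bytes), unpack('H*', $bytes);   # 3 bytes: 414243
```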

pack needs at least two arguments. The first argument (the template) describes how, and how many, values are to be packed:
perl -e 'printf "|%s|\n",pack("c",0x41,0x42,0x44);'
|A|
perl -e 'printf "|%s|\n",pack("c3",0x41,0x42,0x44);'
|ABD|
perl -e 'my @bytes=(0x41,0x42,0x43,0x48..0x54);
printf "|%s|\n",pack("c".(1+$#bytes),@bytes);'
|ABCHIJKLMNOPQRST|
You can even mix formats in the template:
perl -e 'printf "|%s|\n",pack("c3B8",0x41,0x42,0x44,"01000001");'
|ABDA|
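You don't need to build the repeat count by hand as "c".(1+$#bytes): a '*' count applies the format to all remaining arguments. This sketch also switches to 'C' (unsigned char). That choice is not from the answer above, but it avoids "Character in 'c' format wrapped" warnings for byte values above 127.

```perl
use strict;
use warnings;

my @bytes = (0x41, 0x42, 0x43, 0x48 .. 0x54);

# '*' repeats the 'C' format for every remaining argument.
my $packed = pack('C*', @bytes);
print "|$packed|\n";                                    # |ABCHIJKLMNOPQRST|

# unpack with the same template reverses the operation.
my @back = unpack('C*', $packed);
printf "%d values, first is 0x%X\n", scalar(@back), $back[0];   # 16 values, first is 0x41
```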

Related

Awk's output in Perl doesn't seem to be working properly

I'm writing a simple Perl script which is meant to output the second column of an external text file (columns one and two are separated by a comma).
I'm using AWK because I'm familiar with it.
This is my script:
use v5.10;
use File::Copy;
use POSIX;
$s = `awk -F ',' '\$1==500 {print \$2}' STD`;
say $s;
The contents of the local file "STD" is:
CIR,BS
60,90
70,100
80,120
90,130
100,175
150,120
200,260
300,500
400,600
500,850
600,900
My output is strange: it prints the desired "850", but it also prints a trailing newline and then another one!
ka#man01:$ ./test.pl
850
ka#man01:$
The problem isn't just the printing. I need to use the variable produced by awk (i.e. the $s variable), but the variable also seems to contain extra characters and a newline!
Could you guys help?
Thank you.
I'd suggest that you're going down a dirty road by trying to inline awk into perl in the first place. Why not instead:
open ( my $input, '<', 'STD' ) or die $!;
while ( <$input> ) {
s/\s+\z//;
my @fields = split /,/;
print $fields[1], "\n" if $fields[0] == 500;
}
But the likely problem is that you're not handling linefeeds, and say is adding an extra one. Try using print instead, or call chomp on the resulting string.
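If you need to look up more than one CIR value, a hash is a natural extension of the loop above. This is a minimal sketch that assumes the file fits in memory. The in-memory sample data and the %bs_for name are illustrative stand-ins for the real STD file.

```perl
use strict;
use warnings;

# A few lines of the STD file, read through an in-memory filehandle so the
# sketch runs without the real file.
my $std = "CIR,BS\n60,90\n500,850\n600,900\n";
open my $input, '<', \$std or die $!;

my $header = <$input>;                 # skip the "CIR,BS" header line
my %bs_for;                            # maps a CIR value to its BS value
while (my $line = <$input>) {
    chomp $line;                       # drop the newline before splitting
    my ($cir, $bs) = split /,/, $line;
    $bs_for{$cir} = $bs;
}
close $input;

my $s = $bs_for{500};
print "[$s]\n";                        # [850]: no stray newline inside the value
```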
perl can do many of the things that awk can do. Here's something similar that replaces your entire Perl program:
$ perl -naF, -le 'chomp; print $F[1] if $F[0]==500' STD
850
The -n creates a while loop around your argument to -e.
The -a splits each line into @F, and -F lets you specify the separator. Since you want to split the fields on a comma, you use -F,.
The -l adds a newline each time you call print.
The -e argument is the program to run (inside the while loop added by -n). When combined with -n, -l also chomps each input line before -a splits it, so the explicit chomp is redundant here. Without that automatic chomp, the last field on the line (the one you want) would still carry the newline. Because that newline is stripped, it is the newline that -l adds on print that puts each result on its own line, whether the field comes from the middle or the end of the line.
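To make the switches less magical, here is roughly what that one-liner expands to, written out as a script. It is a hand-written approximation of what perlrun describes (perl -MO=Deparse shows the exact code). It reads from an in-memory string instead of the STD file, and collects matches in @found only so they can be inspected.

```perl
use strict;
use warnings;

# Sample input standing in for the STD file.
my $std = "CIR,BS\n70,100\n500,850\n600,900\n";
open my $fh, '<', \$std or die $!;

my @found;                               # collected results, for inspection
{
    local $\ = "\n";                     # -l: print appends "\n"
    while (defined(my $line = <$fh>)) {  # -n: loop over every input line
        chomp $line;                     # -l: chomp each input line
        my @F = split /,/, $line;        # -a with -F, : split the line into @F
        no warnings 'numeric';           # the header "CIR" is not a number
        if ($F[0] == 500) {              # the -e code: print $F[1] if $F[0]==500
            print $F[1];
            push @found, $F[1];
        }
    }
}
close $fh;
```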
The reason you get 2 newlines:
the backtick operator does not remove the trailing newline from the awk output. $s contains "850\n"
the say function appends a newline to the string. You have say "850\n" which is the same as print "850\n\n"
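The fix in the original script is a single chomp on the backtick result. This sketch hard-codes "850\n" as a stand-in for what `awk ...` returns, so the real command is not run. chomp removes one trailing newline in place and returns the number of characters it removed.

```perl
use strict;
use warnings;
use feature 'say';

# Stand-in for the backtick result: awk's output keeps its trailing newline.
my $s = "850\n";

# Without this, say would print "850\n\n".
my $removed = chomp $s;
say $s;                                  # 850, followed by exactly one newline
```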

How to print out the hexadecimal content of one text using Perl binmode?

Can someone tell me how binmode in Perl can be used to achieve the same result as the following one-liner, i.e. to print the hex data of a piece of text?
$ perl -nle 'print map {sprintf "%02X",ord} split //'
For example, if I input "abcABC", the output will be "616263414243".
Please give a similar one-liner using binmode.
Thanks.
binmode would be of no use for what you ask. There are currently no PerlIO layers that convert every output byte except 0A to its ASCII hex representation.
Are you asking how to do that from within a program? The default input mode is the same as binmode on Linux, and on Windows the only difference is that every CR LF is converted to LF.
So you can safely write this.
while (<>) {
chomp;
printf '%02X', ord for split //;
print "\n";
}
with the proviso that the CR characters will disappear if you are reading a Windows file using a Windows version of perl
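Here is a sketch comparing the sprintf/ord approach with unpack 'H*'. unpack converts the whole string at once and yields lowercase hex, hence the uc. The helper names hex_sprintf and hex_unpack are hypothetical, and an in-memory handle stands in for STDIN. binmode ':raw' on the input only ensures bytes arrive untranslated; it does not produce the hex itself.

```perl
use strict;
use warnings;

# Hypothetical helper: the question's sprintf/ord approach.
sub hex_sprintf { my ($str) = @_; return join '', map { sprintf '%02X', ord } split //, $str }

# Hypothetical helper: unpack 'H*' gives lowercase hex for the whole string.
sub hex_unpack  { my ($str) = @_; return uc(unpack('H*', $str)) }

my $data = "abcABC\n";
open my $in, '<', \$data or die $!;
binmode $in, ':raw';                     # bytes in, no CRLF translation
my @results;
while (my $line = <$in>) {
    chomp $line;
    push @results, hex_sprintf($line), hex_unpack($line);
}
close $in;
print "$_\n" for @results;               # 616263414243 (twice)
```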
I don't understand your insistence on using binmode, but perhaps this is what you mean?
perl -le 'print map { sprintf q{%02X}, ord } split //, shift' abcABC
output
616263414243

How can I slurp STDIN in Perl?

I'm piping the output of several scripts. One of these scripts outputs an entire HTML page that gets processed by my Perl script. I want to be able to pull the whole 58K of text into the Perl script (which will contain newlines, of course).
I thought this might work:
open(my $TTY, '<', '/dev/tty');
my $html_string= do { local( @ARGV, $/ ) = $TTY ; <> } ;
But it just isn't doing what I need. Any suggestions?
my @lines = <STDIN>;
or
my $str = do { local $/; <STDIN> };
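The second form works because local $/ sets the input record separator to undef until the enclosing block ends. A single readline then returns everything that is left. Here is a sketch wrapping it in a helper (slurp_handle is a hypothetical name), with an in-memory handle standing in for STDIN. With a real pipe you would pass \*STDIN.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything remaining on an open handle.
sub slurp_handle {
    my ($fh) = @_;
    local $/;                            # undef $/ for this sub only: slurp mode
    return scalar <$fh>;
}

# In-memory stand-in for the piped HTML page.
my $html = "<html>\n<body>hi</body>\n</html>\n";
open my $fake_stdin, '<', \$html or die $!;
my $whole = slurp_handle($fake_stdin);
close $fake_stdin;
printf "read %d characters, %d lines, in one call\n", length($whole), ($whole =~ tr/\n//);
```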
I can't let this opportunity pass without saying how much I love IO::All:
♥ ♥ __ "I really like IO::All ... a lot" __ ♥ ♥
Variation on the POD SYNOPSIS:
use IO::All;
my $contents < io('-') ;
print "\n printing your IO: \n $contents \n with IO::All goodness ..." ;
Warning: IO::All may begin replacing everything else you know about IO in perl with its own insidious goodness.
tl;dr: see at the bottom of the post. Explanation first.
practical example
I’ve just wondered about the same, but I wanted something suitable for a shell one-liner. Turns out this is (Korn shell, whole example, dissected below):
print -nr -- "$x" | perl -C7 -0777 -Mutf8 -MEncode -e "print encode('MIME-Q', 'Subject: ' . <>);"; print
Dissecting:
print -nr -- "$x" echoes the whole of $x without a trailing newline (-n) or backslash-escape processing (-r); POSIX equivalent: printf '%s' "$x"
-C7 sets stdin, stdout, and stderr into UTF-8 mode (you may or may not need it)
-0777 sets $/ so that Perl will slurp the entire file; reference: man perlrun(1)
-Mutf8 -MEncode loads two modules
the remainder is the Perl command itself: print encode('MIME-Q', 'Subject: ' . <>);, let’s look at it from inner to outer, right to left:
<> takes the entire stdin content
which is concatenated with the string "Subject: "
and passed to Encode::encode asking it to convert that to MIME Quoted-Printable
the result of which is printed on stdout (without any trailing newline)
this is followed by ; print, again in Korn shell, which is the same as ; echo in POSIX shell – just echoïng a newline.
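The Perl part of that pipeline can also be written as a script. This is a sketch with an illustrative non-ASCII value standing in for $x. The exact encoded text depends on your Encode version, so only its general shape (an ASCII-only =?UTF-8?Q?...?= encoded word) is described here.

```perl
use strict;
use warnings;
use utf8;                          # the literal below contains non-ASCII characters
use Encode qw(encode);

# Same idea as the one-liner: prefix "Subject: " and MIME-Q encode the result.
my $text   = 'Grüße';              # illustrative value standing in for $x
my $header = encode('MIME-Q', 'Subject: ' . $text);
print "$header\n";                 # an ASCII-only "=?UTF-8?Q?...?=" encoded word
```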
tl;dr
Call perl with the -0777 option. Then, inside the script, <> will return the entire stdin as one string.
complete self-contained example
#!/usr/bin/perl -0777
my $x = <>;
print "Look ma, I got this: '$x'\n";
To get it into a single string you want:
#!/usr/bin/perl -w
use strict;
my $html_string;
while(<>){
$html_string .= $_;
}
print $html_string;
I've always used a bare block.
my $x;
{
local $/;  # Set slurp mode (a plain undef $/ would stay in effect after the block)
$x = <>; # Read in everything up to EOF
}
# $x should now contain all of STDIN
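A bare block only limits slurp mode if $/ is localized inside it. This sketch reads from an in-memory string instead of STDIN. It checks that $/ is back to "\n" once the block ends, so later line-by-line reads behave normally.

```perl
use strict;
use warnings;

my $text = "line one\nline two\n";
open my $fh, '<', \$text or die $!;

my $x;
{
    local $/;          # slurp mode, only until the end of this block
    $x = <$fh>;        # read everything up to EOF
}
close $fh;
printf "slurped %d lines; \$/ is %s\n",
    ($x =~ tr/\n//), (defined $/ ? 'restored' : 'still undef');
```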

CRLF translation with Unicode in Perl

I'm trying to write to a Unicode (UCS-2 Little Endian) file in Perl on Windows, like this.
open my $f, ">$fName" or die "can't write $fName\n";
binmode $f, ':raw:encoding(UCS-2LE)';
print $f "ohai\ni can haz unicodez?\nkthxbye\n";
close $f;
It basically works except I no longer get the automatic LF -> CR/LF translation on output that I get on regular text files. (The output files just have LF) If I leave out :raw or add :crlf in the "binmode" call, then the output file is garbled. I've tried re-ordering the "directives" (i.e. :encoding before :raw) and can't get it to work. The same problem exists for reading.
This works for me on windows:
open my $f, ">:encoding(UCS-2LE):crlf", "test.txt";
print $f "ohai\ni can haz unicodez?\nkthxbye\n";
close $f;
Yielding UCS-2LE output in test.txt of
ohai
i can haz unicodez?
kthxbye
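You can check that layer stack at the byte level with a sketch like this one. It assumes UTF-16LE as a stand-in for UCS-2LE (the two are identical for this ASCII text) and uses a throwaway temporary file. The leading :raw follows the other answers here: it keeps a platform :crlf from sitting below the encoder.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/test.txt";                 # illustrative file name

# :crlf on top turns "\n" into "\r\n" while the text is still characters,
# and only then does the encoding layer turn each character into two bytes.
open my $out, '>:raw:encoding(UTF-16LE):crlf', $file or die "Cannot open $file: $!";
print {$out} "ohai\n";
close $out or die "Cannot close $file: $!";

# Inspect the raw bytes: CR and LF are each two bytes, like every other character.
open my $raw, '<:raw', $file or die $!;
my $bytes = do { local $/; <$raw> };
close $raw;
print unpack('H*', $bytes), "\n";           # 6f006800610069000d000a00

# Reading through the same layers turns "\r\n" back into "\n".
open my $in, '<:raw:encoding(UTF-16LE):crlf', $file or die $!;
my $line = <$in>;
close $in;
```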
Here is what I have found to work, at least with perl 5.10.1:
Input:
open(my $f_in, '<:raw:perlio:via(File::BOM):crlf', $file);
Output:
open(my $f_out, '>:raw:perlio:encoding(UTF-16LE):crlf:via(File::BOM)', $file);
These handle BOM, CRLF translation, and UTF-16LE encoding/decoding transparently.
Note that according to the perlmonks post below, if trying to specify with binmode() instead of open(), an extra ":pop" is required:
binmode $f_out, ':raw:pop:perlio:encoding(UTF-16LE):crlf';
which my experience corroborates. I was not able to get this to work with the ":via(File::BOM)" layer, however.
References:
http://www.perlmonks.org/?node_id=608532
http://metacpan.org/pod/File::BOM
The :crlf layer does a simple byte mapping of 0x0A -> 0x0D 0x0A (\n --> \r\n) in the output stream, but for the most part this isn't valid in any wide character encoding.
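The difference between the two layer orders can be reproduced with plain string operations. This sketch uses Encode directly rather than I/O layers. It simulates translating before encoding (correct) and applying the 0x0A -> 0x0D 0x0A byte mapping after encoding (what a :crlf layer below the encoder does).

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "ohai\n";

# Right order: translate "\n" -> "\r\n" on characters, then encode.
my $good = encode('UTF-16LE', $text =~ s/\n/\r\n/gr);

# Wrong order: encode first, then map the 0x0A byte to 0x0D 0x0A.
(my $bad = encode('UTF-16LE', $text)) =~ s/\x0A/\x0D\x0A/g;

printf "good: %s\nbad:  %s\n", unpack('H*', $good), unpack('H*', $bad);
# good: 6f006800610069000d000a00
# bad:  6f006800610069000d0a00   (odd length: no longer valid UTF-16)
```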
How about using a raw mode but explicitly print the CR?
print $f "ohai\r\ni can haz unicodez?\r\nkthxbye\r\n";
Or if portability is a concern, discover and explicitly use the correct line ending:
## never mind - $/ doesn't work
# print $f "ohai$/i can haz unicodez?$/kthxbye$/";
open DUMMY, '>', 'dummy'; print DUMMY "\n"; close DUMMY;
open DUMMY, '<:raw', 'dummy'; $EOL = <DUMMY>; close DUMMY;
unlink 'dummy';
...
print $f "ohai${EOL}i can haz unicodez?${EOL}kthxbye${EOL}";
Unrelated to the question, but Ωmega asked in a comment about the difference between :raw and :bytes. As documented in perldoc perlio, you can think of :raw as removing every layer that would alter the bytes (including :crlf), and :bytes as only turning off the :utf8 flag. Compare the output of these two commands:
$ perl -E 'binmode *STDOUT,":crlf:raw"; say' | od -c
0000000 \n
0000001
$ perl -E 'binmode *STDOUT,":crlf:bytes";say' | od -c
0000000 \r \n
0000002

Why does my Perl script remove characters from the file?

I have an issue with a Perl script. It modifies the content of a file, then reopens the file to write it back, and in the process some characters are lost: all words starting with '%' are deleted from the file. That's pretty annoying, because the % expressions are variable placeholders for dialog boxes.
Do you have any idea why? The source file is XML with the default encoding.
Here is the code:
undef $/;
open F, $file or die "cannot open file $file\n";
my $content = <F>;
close F;
$content =~s{status=["'][\w ]*["']\s*}{}gi;
printf $content;
open F, ">$file" or die "cannot reopen $file\n";
printf F $content;
close F or die "cannot close file $file\n";
You're using printf there and it thinks its first argument is a format string. See the printf documentation for details. When I run into this sort of problem, I always ensure that I'm using the functions correctly. :)
You probably want just print:
print F $content;
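You can see the effect without touching a file. sprintf parses its first argument exactly as printf does, so this sketch uses it to capture the result. The XML line is illustrative: its %s placeholder is consumed as a format directive and, with no matching argument, becomes an empty string.

```perl
use strict;
use warnings;

# A line like the XML in the question, containing a %s placeholder.
my $content = qq{<dialog text="Saving %s..."/>\n};

my $mangled;
{
    no warnings;                        # silence "Missing argument in sprintf"
    $mangled = sprintf $content;        # $content used as a FORMAT, like printf $content
}
my $safe = sprintf '%s', $content;      # $content passed as data, like printf "%s", $content

print $mangled;                         # <dialog text="Saving ..."/>
print $safe;                            # unchanged, same as print $content
```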
In your example, you don't need to read in the entire file since your substitution does not cross lines. Instead of trying to read and write to the same filename all at once, use a temporary file:
open my($in), "<", $file or die "cannot open file $file\n";
open my($out), ">", "$file.bak" or die "cannot open file $file.bak\n";
while( <$in> )
{
s{status=["'][\w ]*["']\s*}{}gi;
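print $out $_;   # a lone "print $out;" would print the handle itself to STDOUT
}
close $in;
close $out or die "cannot close $file.bak\n";   # flush everything before renaming
rename "$file.bak", $file or die "Could not rename file\n";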
This also reduces to this command-line program:
% perl -pi.bak -e 's{status=["\x27][\w ]*["\x27]\s*}{}gi' file
Er. You're using printf.
printf interprets "%" as something special.
Use "print" instead.
If you have to use printf, use
printf "%s", $content;
Important Note:
printf stands for "print formatted", just as it does in C.
fprintf is the C equivalent for file I/O.
Perl is not C.
And even in C, passing your content as parameter 1 gets you shot for security reasons.
Or even
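perl -i.bak -pe 's{status=["\x27][\w ]*["\x27]\s*}{}gi;' yourfiles
-e says "there's code following for you to run"
-i.bak says "edit the files in place, saving each original as whatever.bak" (the extension must follow -i with no space)
-p adds a read-print loop around the -e code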
-p adds a read-print loop around the -e code
Perl one-liners are a powerful tool and can save you a lot of drudgery.
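If you'd rather keep the edit inside a script, setting $^I turns on the same in-place editing that -i.bak does. This is a sketch; the temporary file and its XML content are illustrative. Note that print, not printf, is used, so the %s placeholder survives.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a throwaway XML file to edit (illustrative content).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/dialog.xml";
open my $fh, '>', $file or die $!;
print {$fh} qq{<dialog status="draft" text="Saving %s..."/>\n};
close $fh or die $!;

# In-script equivalent of: perl -pi.bak -e '...' file
{
    local $^I   = '.bak';          # in-place edit, keep a .bak backup
    local @ARGV = ($file);         # files for <> to process
    while (<>) {                   # the loop that -p would add
        s{status=["'][\w ]*["']\s*}{}gi;
        print;                     # goes to the new copy of the current file
    }
}

open $fh, '<', $file or die $!;
my $edited = do { local $/; <$fh> };
close $fh;
print $edited;                     # <dialog text="Saving %s..."/>
```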
If you want a solution that is aware of the XML nature of the docs (i.e., only delete status attributes, and not matching text contents) you could also use XML::PYX:
$ pyx doc.xml | perl -ne'print unless /^Astatus/' | pyxw
That's because you used printf instead of print: printf treats "%" as the start of a format directive (such as %s or %f), so it won't print a literal "%" unless you write it as "%%". :-)