Consider this value:
my $value = hex('0x12345678');
I would like my hexdump to look like this (bytes in the same order):
0000000 1234 5678
I used this method but it mixes up my value:
open(my $out, '>:raw', 'foo') or die "Unable to open: $!";
print $out pack('l', $value); # Native byte order (little-endian on this machine)
print $out pack('l>', $value); # Big-endian
Here's what I get:
0000000 5678 1234 3412 7856
How can I get the bits in order?
EDIT
So the problem might come from my hexdump, because I get the same output with the suggested answer.
$ perl -e 'print pack $_, 0x12345678 for qw( l> N )' | hexdump
0000000 3412 7856 3412 7856
I got the correct result with hexdump -C:
$ perl -e 'print pack $_, 0x12345678 for qw( l> N )' | hexdump -C
00000000 12 34 56 78 12 34 56 78 |.4Vx.4Vx|
And I found the explanation here:
hexdump confusion
The 'l>' option works for me (note there's no call to hex, though). Also, N as the template works:
perl -e 'print pack $_, 0x12345678 for qw( l> N )' | xxd
0000000: 1234 5678 1234 5678
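To see why plain hexdump appeared to scramble output that was already correct, here is a small sketch that packs the value with several templates and prints the resulting bytes in file order. The helper names hex_bytes and as_le_words are made up for illustration. as_le_words only imitates hexdump's default display, on the assumption that it groups the input into 16-bit words read in little-endian order, as on the machine in the question.

```perl
use strict;
use warnings;

# Bytes produced by pack(), as a hex string in file order.
sub hex_bytes {
    my ($template, $value) = @_;
    return unpack 'H*', pack($template, $value);
}

# Imitate hexdump's default display on a little-endian host (assumption):
# read the data as 16-bit little-endian words ('v') and print each as 4 hex digits.
sub as_le_words {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%04x', $_ } unpack 'v*', $bytes;
}

my $value = 0x12345678;
for my $template (qw( l> N l< V )) {
    printf "%-2s  bytes: %s  as 16-bit LE words: %s\n",
        $template,
        hex_bytes($template, $value),
        as_le_words(pack $template, $value);
}
```

The 'l>' and 'N' rows show the bytes 12 34 56 78 in the file, yet the word view reads 3412 7856, which is the "mixed up" output from the question. hexdump -C and xxd print byte by byte, so they show the file order directly.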
How can Perl do input from stdin, one char like readline -N1 does?
You can do that with the base perl distribution, no need to install extra packages:
use strict;
sub IO::Handle::icanon {
    my ($fh, $on) = @_;
    use POSIX;
    my $ts = POSIX::Termios->new;
    $ts->getattr(fileno $fh) or die "tcgetattr: $!";
    my $f = $ts->getlflag;
    # set or clear ICANON, the tty's line-editing (canonical) mode
    $ts->setlflag($on ? $f | ICANON : $f & ~ICANON);
    $ts->setattr(fileno $fh) or die "tcsetattr: $!";
}
# usage example
# a key like `Left` or `á` may generate multiple bytes
STDIN->icanon(0);
sysread STDIN, my $c, 256;
STDIN->icanon(1);
# the read key is in $c
Reading just one byte may not be a good idea because it will just leave garbage to be read later when pressing a key like Left or F1. But you can replace the 256 with 1 if you want just that, no matter what.
<STDIN> will read stdin one byte at a time if the record separator is set to a reference to the number 1. (A byte here is a C char, which is not the same as a character: these days characters are typically made of several bytes, except for those in the US-ASCII charset.)
$ echo perl | perl -le '$/ = \1; $a = <STDIN>; print "<$a>"'
<p>
Note that underneath, it may read (consume) more than one byte from the input. Above, the next <STDIN> within perl would return <e>, but possibly from some large buffer that was read beforehand.
$ echo perl | (perl -le '$/ = \1; $a = <STDIN>; print "<$a>"'; wc -c)
<p>
0
Above, you'll notice that wc didn't receive any input as it had all already been consumed by perl.
$ echo perl | (PERLIO=raw perl -le '$/ = \1; $a = <STDIN>; print "<$a>"'; wc -c)
<p>
4
This time, wc got 4 bytes (e, r, l, \n) as we told perl to use raw I/O so the <STDIN> translates to a read(0, buf, 1).
Instead of <STDIN>, you can use perl's read with the same caveat:
$ echo perl | (perl -le 'read STDIN, $a, 1; print "<$a>"'; wc -c)
<p>
0
$ echo perl | (PERLIO=raw perl -le 'read STDIN, $a, 1; print "<$a>"'; wc -c)
<p>
4
Or use sysread which is the true wrapper for the raw read():
$ echo perl | (perl -le 'sysread STDIN, $a, 1; print "<$a>"'; wc -c)
<p>
4
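The same difference can be observed from inside perl without wc. The following is a sketch, not part of the answer above: it reads one byte from a temporary 5-byte file with either read or sysread, then asks the kernel where the file descriptor's offset is. The function name offset_after_one_byte is hypothetical, and the result for read assumes perl's default buffered I/O layers (no PERLIO override in the environment).

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);
use File::Temp qw(tempfile);

# Read one byte from a fresh 5-byte file using either read (buffered)
# or sysread (unbuffered), then report how far the kernel-level file
# offset has moved.
sub offset_after_one_byte {
    my ($how) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    print {$tmp} "perl\n";
    close $tmp or die "close: $!";

    open my $fh, '<', $path or die "open: $!";
    my $byte;
    if ($how eq 'read') {
        read $fh, $byte, 1;       # PerlIO may fetch a whole buffer
    }
    else {
        sysread $fh, $byte, 1;    # asks the OS for exactly one byte
    }
    # sysseek ignores PerlIO's buffer, so this is the descriptor's real offset.
    my $offset = sysseek $fh, 0, SEEK_CUR;
    close $fh;
    return $offset + 0;           # turn "0 but true" into a plain number
}

printf "%-7s consumed %d byte(s) from the file descriptor\n",
    $_, offset_after_one_byte($_) for qw( read sysread );
```

With default layers, read is expected to report the whole file as consumed while sysread reports a single byte, mirroring the 0 and 4 that wc printed above.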
To read one character at a time, you need to read one byte at a time until the end of the character.
You can do it for UTF-8 encoded input (in locales using that encoding) in perl with <STDIN> or read (not sysread) with the -C option, including with raw PERLIO:
$ echo été | (PERLIO=raw perl -C -le '$/ = \1; $a = <STDIN>; print "<$a>"'; wc -c)
<é>
4
$ echo été | (PERLIO=raw perl -C -le 'read STDIN, $a, 1; print "<$a>"'; wc -c)
<é>
4
With strace, you'd see perl does two read(0, buf, 1) system calls underneath to read that 2-byte é character.
Like with ksh93 / bash's read -N (or zsh's read -k), you can get surprises if the input is not properly encoded in UTF-8:
$ printf '\375 12345678' | (PERLIO=raw perl -C -le 'read STDIN, $a, 1; print "<$a>"'; wc -c)
<� 1234>
4
\375 (\xFD) would normally be the first byte of the encoding of a 6 byte character in UTF-8¹, so perl reads all 6 bytes here even though the second to sixth can't possibly be part of that character as they don't have the 8th bit set.
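As an illustration of why \375 makes the decoder wait for six bytes, the sketch below derives the expected sequence length from the lead byte alone. It follows the original 1 to 6 byte UTF-8 design described in the footnote, not perl's extended scheme, and the function name utf8_expected_length is made up for this example.

```perl
use strict;
use warnings;

# Number of bytes a decoder expects for a sequence starting with $byte,
# per the original (up to 6 byte) UTF-8 design. Returns 0 for bytes that
# cannot start a sequence in that design.
sub utf8_expected_length {
    my ($byte) = @_;
    return 1 if $byte < 0x80;    # 0xxxxxxx: single byte (US-ASCII)
    return 0 if $byte < 0xC0;    # 10xxxxxx: continuation byte, not a lead byte
    return 2 if $byte < 0xE0;    # 110xxxxx
    return 3 if $byte < 0xF0;    # 1110xxxx
    return 4 if $byte < 0xF8;    # 11110xxx
    return 5 if $byte < 0xFC;    # 111110xx
    return 6 if $byte < 0xFE;    # 1111110x
    return 0;                    # 0xFE and 0xFF are outside the original design
}

printf "0x%02X -> %d byte(s)\n", $_, utf8_expected_length($_)
    for 0x70, 0xC3, 0xFD;        # 'p', lead byte of 'é', and \375
```

A stricter reader would also check that each following byte has the form 10xxxxxx and stop at the first one that does not, which is exactly the check that the output above shows being skipped.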
Note that when stdin is a tty device, read() will not return until the terminal at the other end sends a LF (eol), CR (which is by default converted to LF), or eof (usually ^D) or eol2 (usually not defined) character as configured in the tty line discipline (like with the stty command) as the tty driver implements its own internal line editor allowing you to edit what you type before pressing enter.
If you want to read the byte(s) that is(are) sent for each key pressed by the user there, you'd need to disable that line editor (which bash/ksh93's read -N or zsh's read -k do when stdin is a tty), see @guest's answer for details on how to do that.
¹ While now Unicode restricts codepoints to up to 0x10FFFF which means UTF-8 encodings have at most 4 bytes, UTF-8 was originally designed to encode code points up to 0x7fffffff (up to 6 byte encoding) and perl extends it to up to 0x7FFFFFFFFFFFFFFF (13 byte encoding)
First, this is not a duplicate of, e.g., How can I replace each newline (\n) with a space using sed?
What I want is to exactly replace every newline (\n) in a string, like so:
printf '%s' $'' | sed '...; s/\n/\\&/g'
should result in the empty string
printf '%s' $'a' | sed '...; s/\n/\\&/g'
should result in a (not followed by a newline)
printf '%s' $'a\n' | sed '...; s/\n/\\&/g'
should result in
a\
(the trailing \n of the final line should be replaced, too)
A solution like :a;N;$!ba; s/\n/\\&/g from the other question doesn't do that properly:
printf '%s' $'' | sed ':a;N;$!ba; s/\n/\\&/g' | hd
works;
printf '%s' $'a' | sed ':a;N;$!ba;s/\n/\\&/g' | hd
00000000 61 |a|
00000001
works;
printf '%s' $'a\nb' | sed ':a;N;$!ba;s/\n/\\&/g' | hd
00000000 61 5c 0a 62 |a\.b|
00000004
works;
but when there's a trailing \n on the last line
printf '%s' $'a\nb\n' | sed ':a;N;$!ba;s/\n/\\&/g' | hd
00000000 61 5c 0a 62 0a |a\.b.|
00000005
it doesn't get quoted.
Easier to use perl than sed, since it has (by default, at least) a more straightforward treatment of the newlines in its input:
printf '%s' '' | perl -pe 's/\n/\\\n/' # Empty string
printf '%s' a | perl -pe 's/\n/\\\n/' # a
printf '%s\n' a | perl -pe 's/\n/\\\n/' # a\<newline>
printf '%s\n' a b | perl -pe 's/\n/\\\n/' # a\<newline>b\<newline>
# etc
If your inputs aren't huge, you could use
perl -0777 -pe 's/\n/\\\n/g'
to read the entire input at once rather than line by line, which can be more efficient.
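The same substitution can be checked against the four cases from the question inside a script. This is only a sketch; the function name escape_newlines is hypothetical, and the output is shown as hex so that trailing newlines stay visible.

```perl
use strict;
use warnings;

# Put a backslash in front of every newline, including a trailing one.
sub escape_newlines {
    my ($text) = @_;
    (my $escaped = $text) =~ s/\n/\\\n/g;
    return $escaped;
}

# 5c is the backslash and 0a the newline in the hex output.
for my $input ('', 'a', "a\n", "a\nb\n") {
    printf "input %-8s -> output %s\n",
        unpack('H*', $input) || '(empty)',
        unpack('H*', escape_newlines($input)) || '(empty)';
}
```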
how to replace newline characters with a string in sed
It's not possible. From a sed script's point of view, whether the trailing newline is missing or not makes no difference and is undetectable.
Aaaanyway, use GNU sed with sed -z:
sed -z 's/\n/\\\n/g'
GNU awk can use the RT variable to detect a missing record terminator:
$ printf 'a\nb\n' | gawk '{ORS=(RT != "" ? "\\" : "") RT} 1'
a\
b\
$ printf 'a\nb' | gawk '{ORS=(RT != "" ? "\\" : "") RT} 1'
a\
b$
This adds a "\" before each non-empty record terminator.
Using any awk:
$ printf 'a\nb\n\n' | awk '{printf "%s%s", sep, $0; sep="\\\n"}'
a\
b\
$ printf 'a\nb\n' | awk '{printf "%s%s", sep, $0; sep="\\\n"}'
a\
b$
Or { cat file; echo; } | awk ... – always add a newline to the input.
I use sed -e '$s/.$//' to trim the last character of a stream. Is it the correct way to do so? Are there other better ways to do so with other command line tools?
$ builtin printf 'a\nb\0' | sed -e '$s/.$//' | od -c -t x1 -Ax
000000 a \n b
61 0a 62
000003
EDIT: It seems that this command is not robust. The expected output is a\nb for the following example. Better methods (but not too verbose) are needed.
$ builtin printf 'a\nb\n' | sed -e '$s/.$//' | od -c -t x1 -Ax
000000 a \n \n
61 0a 0a
000003
You may use head -c -1:
printf 'a\nb\0' | head -c -1 | od -c -t x1 -Ax
000000 a \n b
61 0a 62
000003
printf 'a\nb\n' | head -c -1 | od -c -t x1 -Ax
000000 a \n b
61 0a 62
000003
It seems you can't rely on any line-oriented tools (like sed) that automatically remove and re-add newlines.
Perl can slurp the whole stream into a string and can remove the last char:
$ printf 'a\nb\0' | perl -0777 -pe chop | od -c -t x1 -Ax
000000 a \n b
61 0a 62
000003
$ printf 'a\nb\n' | perl -0777 -pe chop | od -c -t x1 -Ax
000000 a \n b
61 0a 62
000003
The tradeoff is that you need to hold the entire stream in memory.
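If holding the whole stream in memory is a concern, one alternative is to copy the data in chunks while always holding back the most recent byte, and to drop that byte at end of input. The following is a minimal sketch of that idea rather than a tested replacement for head -c -1. The name drop_last_byte and the 64 KiB chunk size are arbitrary choices, and it removes the last byte, not the last (possibly multi-byte) character.

```perl
use strict;
use warnings;

# Copy $in to $out, omitting the final byte, while keeping only one
# chunk plus one held-back byte in memory.
sub drop_last_byte {
    my ($in, $out) = @_;
    binmode $in;
    binmode $out;
    my $held = '';                          # last byte read, not yet written
    while (read $in, my $chunk, 65536) {
        print {$out} $held, substr($chunk, 0, -1);
        $held = substr($chunk, -1);         # might be the final byte, so wait
    }
    return;                                 # $held is discarded at end of input
}

# Demonstration with in-memory filehandles; pass \*STDIN and \*STDOUT to use it as a filter.
for my $sample ("a\nb\0", "a\nb\n") {
    my $input  = $sample;
    my $output = '';
    open my $in,  '<', \$input  or die "open: $!";
    open my $out, '>', \$output or die "open: $!";
    drop_last_byte($in, $out);
    close $out or die "close: $!";
    print unpack('H*', $output), "\n";      # 61 0a 62 expected in both cases
}
```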
When I run this without the extra escaping of the "\n", hexdump doesn't print the 0a for the embedded newline.
Why does the "\n" need extra treatment here?
(While searching for an answer I found String::ShellQuote which does the escaping.)
#!/usr/bin/env perl
use warnings;
use 5.012;
use utf8;
binmode STDOUT, ':utf8';
use charnames qw(:full);
use IPC::System::Simple qw(system);
for my $i ( 0x08 .. 0x0d ) {
printf "0x%02x - %s\n", $i, '\N{' . charnames::viacode( $i ) . '}';
my $string = "It" . chr( $i ) . "s";
$string =~ s/\n/\\n/g;
system( "echo -e \Q$string\E | hexdump -C" );
say "";
}
When you don't convert the newline to the two characters \n, you're executing the command
echo -e \
| hexdump -C
To sh, that's equivalent to
echo -e | hexdump -C
When you convert the newline to the two characters \n, you're executing the command
echo -e \\n | hexdump -C
That passes the two characters \n to echo, for which it outputs a newline under -e.
You don't need to use -e and to create escapes for -e. You could create a proper shell command. That command would be:
echo '
' | hexdump -C
You can do that a number of ways. You could roll your own solution.
(my $sh_literal = $string) =~ s/'/'\\''/g;
$sh_literal = "'$sh_literal'";
system( "echo $sh_literal | hexdump -C" );
There is String::ShellQuote.
use String::ShellQuote qw( shell_quote );
my $sh_literal = shell_quote($string);
system( "echo $sh_literal | hexdump -C" );
Finally, you could avoid the shell entirely.
open(my $fh, "|-", "hexdump", "-vC")
or die("Could not start hexdump: $!\n");
print($fh $string);
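For the hand-rolled variant, the substitution can be wrapped in a small helper so the quoting rule is visible in one place. This is a sketch that assumes a POSIX-style sh; the name sh_quote is made up here and is not the String::ShellQuote API.

```perl
use strict;
use warnings;

# Wrap a string in single quotes for sh. Each embedded single quote
# becomes '\'' (close the quote, add an escaped quote, reopen the quote).
# Newlines need no special handling inside single quotes.
sub sh_quote {
    my ($string) = @_;
    (my $quoted = $string) =~ s/'/'\\''/g;
    return "'$quoted'";
}

my $string = "It\ns a 'test'";
print sh_quote($string), "\n";
```

Because the newline stays inside the single quotes, sh passes it to the command as data instead of treating backslash-newline as a line continuation, which was the cause of the missing 0a in the question.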
As @mugenkenichi commented, echo is interpreting your strings too, so you have to escape special characters twice: once for perl and once for echo.
Instead this approach might be more convenient:
#!/usr/bin/env perl
use warnings;
use 5.012;
use utf8;
binmode STDOUT, ':utf8';
use charnames qw(:full);
use IPC::System::Simple qw(system);
for my $i ( 0x08 .. 0x0d ) {
printf "0x%02x - %s\n", $i, '\N{' . charnames::viacode($i) . '}';
my $string = "It" . chr($i) . "s";
open( my $fh, "| hexdump -vC" )
or die "could not talk to hexdump";
print $fh $string;
say "";
}
I’d like to write a Perl one-liner to decode a line of ASCII characters encoded as hexadecimal numbers (for example the line 48 54 54 50 should be decoded as HTTP). I came up with this:
perl -nE 'say map(chr, map { qq/0x$_/ } split)'
It prints an empty line. What am I doing wrong and how would you write it?
It's your qq/0x$_/ trick that doesn't work. chr expects a number as its argument, but gets the string "0x48", which perl does not treat as hexadecimal when it converts a string to a number (it becomes 0). Use the hex function to convert 48 to a number, like datageist does in his answer.
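A short sketch makes the difference visible: the string "0x48" has the numeric value 0, while hex parses the digits as hexadecimal. The function name decode_hex_line is made up for this example.

```perl
use strict;
use warnings;

# Decode a line of space-separated hex byte values into a string.
sub decode_hex_line {
    my ($line) = @_;
    return join '', map { chr hex } split ' ', $line;
}

{
    no warnings 'numeric';    # silence the "isn't numeric" warning for the demonstration
    printf "numeric value of the string '0x48': %d\n", "0x48";
}
printf "hex('48'): %d\n", hex '48';
print decode_hex_line('48 54 54 50'), "\n";
```

The empty-looking line in the question was therefore four NUL characters, chr(0) repeated, rather than nothing at all.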
This works for me:
echo '48 54 54 50' | perl -nE 'say map(chr, map { hex } split)'
This works:
echo '48 54 54 50' | perl -nE 'say map{chr(hex)} split'
I’m assuming you want to feed the data from STDIN.
As always with Perl TIMTOWTDI.
I thought I would submit several options, and show what they would look like if they were written normally. If you want to know more about the command line options perldoc perlrun is a useful resource.
These all output the same thing. With the exception that some of them don't print a newline on the end.
echo '48 54 54 50' | perl -0x20 -pe'$_=chr hex$_'
echo '48 54 54 50' | perl -0x20 -ne'print chr hex$_'
echo '48 54 54 50' | perl -0777 -anE'say map chr,map hex,@F'
echo '48 54 54 50' | perl -0777 -anE'say map{chr hex$_}@F'
echo '48 54 54 50' | perl -0apple'$_=chr hex$_' -0x20
echo '48 54 54 50' | perl -apple'$_=join"",map{chr hex}@F'
echo '48 54 54 50' | perl -lanE'say map{chr hex}@F'
The following is what some of the examples would look like if they were written normally. If you want to figure out what the rest of them do, definitely look at perldoc perlrun.
perl -0x20 -pe'$_=chr hex$_'
This one is fairly straightforward. It is perhaps the best example here, and is also the shortest one. It pretends that spaces are used to separate lines, so that there is only one letter to deal with inside of the loop.
# perl -0x20 -pe'$_=chr hex$_'
$/ = " "; # -0 ( input separator )
while( <> ){
$_ = chr hex $_;
} continue {
print $_;
}
perl -0apple'$_=chr hex$_' -0x20
This one has a few command line options that don't do anything useful.
The first -0 option has no digits, so it sets the input separator to the null character, and -l then copies that value into the output separator.
As a result print appends a NUL byte after each letter, which a terminal normally doesn't show.
There are two -p options where one would have sufficed.
The -a option sets up the @F array, but we don't actually use it.
Basically I used -a -l and a second -p so that the options would spell apple. Otherwise this one is the same as the last example.
echo '48 54 54 50' | perl -0x20 -pe'$_=chr hex$_'
# perl -0apple'$_=chr hex$_' -0x20
$/ = "\0"; # -0 ( input separator: no digits means the null character )
$\ = $/; # -l ( output separator )
$/ = " "; # -0x20 ( input separator )
while( <> ){
@F = split " ", $_; # -a ( unused )
$_ = chr hex $_;
} continue {
print $_;
}
perl -lanE'say map{chr hex}@F'
I figured I already spelled apple, I might as well spell lanE.
-l isn't really useful, because we already are using say.
Used -E instead of -e so that we could use say.
# perl -lanE'say map{chr hex}@F'
$\ = $/; # -l ( output separator set to "\n" )
while( <> ){
@F = split " ", $_; # -a
say map { chr hex $_ } @F;
}
Play perlgolf?
-ple y/0-9A-Fa-f//cd;$_=pack"H*",$_
-ple $_=pack"H*",$_,join"",split
-nE say map chr hex,split
-naE say map chr hex,@F
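Written out longhand, the first golf entry deletes everything that is not a hex digit and hands the remaining digits to pack. A sketch, with the hypothetical function name unhex:

```perl
use strict;
use warnings;

# Strip every character that is not a hex digit, then let pack turn
# each pair of digits into one byte.
sub unhex {
    my ($line) = @_;
    (my $digits = $line) =~ tr/0-9A-Fa-f//cd;   # c = complement, d = delete
    return pack 'H*', $digits;
}

print unhex("48 54 54 50\n"), "\n";
```

Unlike the chr hex variants, this one does not care how the digits are separated, so it is worth checking that the input really contains only byte values and separators.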