Convert number to hex - perl

I use sprintf for conversion to hex - example >>
$hex = sprintf("0x%x",$d)
But I was wondering, if there is some alternative way how to do it without sprintf.
My goal is convert a number to 4-byte hex code (e.g. 013f571f)
Additionally (and optionally), how can I do such conversion, if number is in 4 * %0xxxxxxx format, using just 7 bits per byte?

sprintf() is probably the most appropriate way. According to http://perldoc.perl.org/functions/hex.html:
To present something as hex, look into printf, sprintf, and unpack.
I'm not really sure about your second question, it sounds like unpack() would be useful there.

My goal is convert a number to 4-byte hex code (e.g. 013f571f)
Hex is a textual representation of a number. sprintf '%X' returns hex (the eight characters 013f571f). sprintf is specifically designed to format numbers into text, so it's a very elegant solution for that.
...But it's not what you want. You're not looking for hex, you're looking for the 4-byte internal storage of an integer. That has nothing to do with hex.
pack 'N', 0x013f571f; # "\x01\x3f\x57\x1f" Big-endian byte order
pack 'V', 0x013f571f; # "\x1f\x57\x3f\x01" Little-endian byte order

sprintf() is my usual way of performing this conversion. You can do it with unpack, but it will probably be more effort on your side.
For only working with 4 byte values, the following will work though (maybe not as elegant as expected!):
print unpack("H8", pack("N1", $d));
Be aware that this will result in 0xFFFFFFFF for numbers bigger than that as well.
For working pack/unpack with arbitrary bit length, check out http://www.perlmonks.org/?node_id=383881
The perlpacktut will be a handy read as well.

For 4 * %0xxxxxxx format, my non-sprintf solution is:
print unpack("H8", pack("N1",
(((($d>>21)&0x7f)<<24) + ((($d>>14)&0x7f)<<16) + ((($d>>7)&0x7f)<<8) + ($d&0x7f))));
Any comments and improvements are very welcome.

Related

How to truncate a 2's complement output

I have data written into short data type. The data written is of 2's complement form.
Now when I try to print the data using %04x, the data with MSB=0 is printed fine for eg if data=740, print I get is 0740
But when the MSB=1, I am unable to get a proper print. For eg if data=842, print I get is fffff842
I want the data truncated to 4 bytes so expected output is f842
Either declare your data as a type which is 16 bits long, or make sure the printing function uses the right format for 16 bits value. Or use your current type, but do a bitwise AND with 0xffff. What you can do depends on the language you're doing it in really.
But whichever way you go, check your assumptions again. There seems to be a few issues in your question:
2s-complement applies to signed numbers only. There are no negative numbers in your question.
Assuming you mean C's short - it doesn't have to be 16 bits long.
"I get is fffff842 I want the data truncated to 4 bytes" - fffff842 is 4 bytes long. f842 is 2 bytes long.
2-bytes long value 842 does not have the MSB set.
I'm assuming C (or possibly C++) as the language here.
Because of the default argument promotions involved when calling a variable argument function (such as printf), your use of a short will result in an integer promotion, which states that "If an int can represent all values of the original type (as restricted by the width, for a
bit-field), the value is converted to an int".
A short is converted to an int by means of sign-extension, and 0xf842 sign-extended to 32 bits is 0xfffff842.
You can use a bitwise AND to mask off the most significant word:
printf("%04x", data & 0xffff);
You could also add the h length specifier to state that you only want to print an (unsigned) short worth of bits from an int:
printf("%04hx", data);

How can I work with raw bytes in perl

Documentation all directs me to unicode support, yet I don't think my request has anything to do with Unicode. I want to work with raw bytes within the context of a single scalar; I need to be able to figure out its length (in bytes), take substrings of it (in bytes), write the bytes to disc, and over the network. Is there an easy way to do this, without treating the bytes as any sort of encoding in perl?
EDIT
More explicitly,
my $data = "Perl String, unsure of encoding and don't need to know";
my #data_chunked_into_1024_bytes_each = #???
Perl strings are, conceptually, strings of characters, which are positive 32-bit integers that (normally) represent Unicode code points. A byte string, in Perl, is just a string in which all the characters have values less than 256.
(That's the conceptual view. The internal representation is somewhat more complicated, as the perl interpreter tries to store byte strings — in the above sense — as actual byte strings, while using a generalized UTF-8 encoding for strings that contain character values of 256 or higher. But this is all supposed to be transparent to the user, and in fact mostly is, except for some ugly historical corner cases like the bitwise not (~) operator.)
As for how to turn a general string into a byte string, that really depends on what the string you have contains and what the byte string is supposed to contain:
If your string already is a string of bytes — e.g. if you read it from a file in binary mode — then you don't need to do anything. The string shouldn't contain any characters above 255 to being with, and if it does, that's an error and will probably be reported as such by the encryption code.
Similarly, if your string is supposed to encode text in the ASCII or ISO-8859-1 encodings (which encode the 7- and 8-bit subsets of Unicode respectively), then you don't need to do anything: any characters up to 255 are already correctly encoded, and any higher values are invalid for those encodings.
If your input string contains (Unicode) text that you want to encode in some other encoding, then you'll need to convert the string to that encoding. The usual way to do that is by using the Encode module, like this:
use Encode;
my $byte_string = encode( "name of encoding", $text_string );
Obviously, you can convert the byte string back to the corresponding character string with:
use Encode;
my $text_string = decode( "name of encoding", $byte_string );
For the special case of the UTF-8 encoding, it's also possible to use the built-in utf8::encode() function instead of Encode::encode():
utf8::encode( $string );
which does essentially the same thing as:
use Encode;
$string = encode( "utf8", $string );
Note that, unlike Encode::encode(), the utf8::encode() function modifies the input string directly. Also note that the "utf8" above refers to Perl's extended UTF-8 encoding, which allows values outside the official Unicode range; for strictly standards-compliant UTF-8 encoding, use "utf-8" with a hyphen (see Encode documentation for the gory details). And, yes, there's also a utf8::decode() function that does pretty much what you'd expect.
If I understood your question correctly, what you want is the pack/unpack functions: http://perldoc.perl.org/functions/pack.html
As long as your string doesn't contain characters above codepoint 255, it will mostly work as plain byte string, with length and substr operating on bytes. Additionally, most output functions like print expect octets/bytes by default and will actually complain if you try to stuff anything else to them.
You may need to explicitly encode/decode your output if it is known to be in some encoding, but more details can only be added if you ask another specific question for each problematic part of your program.

hexadecimal encoding/decoding functions in mysql

are there any hexadecimal encoding/decoding functions that mysql will recognize to avoid writing a whole algorithm out. base16_encode/decode does not work. Thanks
MySQL understands hexidecimal literals
MySQL supports hexadecimal values, written using X'val', x'val', or 0xval format, where val contains hexadecimal digits (0..9, A..F). Lettercase of the digits does not matter. For values written using X'val' or x'val' format, val must contain an even number of digits. For values written using 0xval syntax, values that contain an odd number of digits are treated as having an extra leading 0. For example, 0x0a and 0xaaa are interpreted as 0x0a and 0x0aaa.
http://dev.mysql.com/doc/refman/5.0/en/hexadecimal-literals.html

decode hex in PostgreSQL - got error "odd number of digits"

I have a problem using this query:
select decode(to_hex(ascii('ل')::int),'hex')
When I execute it, I get:
ERROR: invalid hexadecimal data: odd number of digits
decode(..., 'hex') doesn't mean convert this hexadecimal number to something. Hex encoding is a particular encoding format for bytes, and it requires two hexadecimal digits per octet. On the other hand, to_hex converts an integer to a hexadecimal representation, and that could have an even or odd number of digits.
So the answer is, you can't do that (without some manual fixups). And it's not clear why you would want to, either. It looks like you could just do 'ل'::bytea, but that might not be what you wanted either.
May be it's simpler to use something like this:
select encode('ل','escape');

Can you explain the bits I'm getting from unpack?

I'm relatively inexperienced with Perl, but my question concerns the unpack function when getting the bits for a numeric value. For example:
my $bits = unpack("b*", 1);
print $bits;
This results in 10001100 being printed, which is 140 in decimal. In the reverse order it's 49 in decimal. Any other values I've tried seem to give the incorrect bits.
However, when I run $bits through pack, it produces 1 again. Is there something I'm missing here?
It seems that I jumped to conclusions when I thought my problem was solved. Maybe I should briefly explain what it is I'm trying do.
I need to convert an integer value that could be as big as 24 bits long (the point being that it could be bigger than one byte) into a bit string. This much can be accomplished using unpack and pack as suggested by #ikegami, but I also need to find a way to convert that bit string back into it's original integer (not a string representation of it).
As I mentioned, I'm relatively inexperienced with Perl, and I've been trying with no success.
I found what seems to be an optimal solution:
my $bits = sprintf("%032b", $num);
print "$bits\n";
my $orig = unpack("N", pack("B32", substr("0" x 32 . $bits, -32)));
print "$orig\n";
This might be obvious, but the other answers haven't pointed it out explicitly: The second argument in unpack("b*", 1) is being typecast to the string "1", which has an ASCII value of 31 in hex (with the most significant nibble first).
The corresponding binary would be 00110001, which is reversed to 10001100 in your output because you used "b*" instead of "B*". These correspond to the opposite "endian" forms of the binary representation. "Endian-ness" is just whether the most-significant bits go at the start or the end of the binary representation.
Yes, you're missing that different machines support different "endianness". And Perl is treating 1 like '1' so ( 0x31 ). So, you're seeing 1 -> 1000 (in ascending order) and 3 -> 1100.
"Wrong" depends on perspective and whether or not you gave Perl enough information to know what encoding and endianness you wanted.
From pack:
b A bit string (ascending bit order inside each byte, like vec()).
B A bit string (descending bit order inside each byte).
I think this is what you want:
unpack( 'B*', chr(1))
You're trying to convert an integer to binary and then back. While you can do that with pack and then unpack, the better way is to use sprintf or printf with the %b format:
my $int = 5;
my $bits = sprintf "%024b\n", $int;
print "$bits\n";
To go the other way (converting a string of 0s & 1s to an integer), the best way is to use the oct function with a 0b prefix:
my $orig = oct("0b$bits");
print "$orig\n";
As the others explained, unpack expects a string to unpack, so if you have an integer, you first have to pack it into a string. The %b format expects an integer to begin with.
If you need to do a lot of this on bytes, and speed is crucial, you could build a lookup table:
my #binary = map { sprintf '%08b', $_ } 0 .. 255;
print $binary[$int]; # Assuming $int is between 0 and 255
The ord(1) is 49. You must want something like sprintf("%064b", 1), although that does seem like overkill.
You didn't specify what you expect. I'm guessing you're expecting 00000001.
That's the correct bits for the byte you provided, at least on non-EBCDIC systems. Remember, the input of unpack is a string (mostly strings of bytes). Perhaps you wanted
unpack('b*', pack('C', 1))
Update: As others have pointed out, the above gives 10000000. For 00000001, you'd use
unpack('B*', pack('C', 1)) # 00000001
You want "B" instead of "b".
$ perl -E'say unpack "b*", "1"'
10001100
$ perl -E'say unpack "B*", "1"'
00110001
pack