Convert Binary File To String In Perl - perl

OK. I have spent the last 14 hours trying to figure this out. I have a binary file with the following contents (there is much more; this is a truncated version). I wish to convert it to a readable string format.
^#^P<9A>^#^#^A^#^#И^#^#^A^#^#Κ^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^F<9A>^#^#^#^#^#^#^C^#FQ]U:^#^M^#
^B^#^E^#^#^#`ESC^B^#d^#^#^#^T^R^B^#^E^#^#^#^#^#^#^#^T^R^B^#^#^#^#^#^#^#^#^#^K^B^#^#^#^#^#^C^#HQ]U:^#^S^#^#^#(^#^#^#V^#^#2^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^C^#HQ]U:^#^V^#<8C>I^B^#^E^#^#^#O^B^#
^#^#^#O^B^#^E^#^#^#^#^#^#^#O^B^#^#^#^#^#^#^#^#^#^RK^B^#^#^#^#^#^C^#HQ]U:^#^Y^#0^A^#d^#^#^#1^A^#<96>^#^#^#L0^A^#d^#^#^#^#^#^#
^#71^A^#^#^#^#^#^#^#^#^#0^A^#^#^#^#^#^C^#=Q]U:^#"^#<92>T^#^#2^#^#^#CN^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^AT^#^#
^#^#^#^#^C^#FQ]U:^#(^#$^M^A^# ^#^#^#^G^A^#2^#^#^#^O^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#R^L^A^#^#^#^#^#^C^#=Q]U:^#.^#<85>^B
^#^#^G^#^#g^B^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#<85>^B^#^#^#^#^#^#^C^#HQ]U:^#4^#^CH^#^#^Y^#^#^#G^#^#d^#^#^#
H^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^CH^#^#^#^#^#^#^C^#HQ]U:^#O^#^M^#^#<89>^#^#^#^G^M^#^#^A^#^#^P^N^#^#^#^#^#^#^#^#^#^#
^#^#^#^#^#^#^#^#^#^#^#^#^M^#^#^#^#^#^#^C^#HQ]U:^#R^#^B^#^#^A^#^#^B^#^#<8C>0^B^#^B^#^#^A^#^#^#^#^#^#^B^#^#^#^#^#^#^#^#^#^#^B^#^#^#^#^#^#^C^#HQ]U:^#d^#F^A^#
^#^#^#^TJ^A^#
^#^#^#<98>M^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#<8A>G^A^#^#^#^#^#^C^#HQ]U:^#y^#j;^#^#^A^#^#^#=;^#^#d^#^#^#(<^#^#^C^#^#^#^#^#^#P<^#^#^#^#^#^#^#^#^#^#=;^#^#^#^#^#^#^C^#FQ]U:^#<88>^#&^#^#^A^#^#^#&^#^#d^#^#^#'^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#&^#^#
^#^#^#^#^C^#FQ]U:^#<94>^#^H^#^#^#^#^#^H^#^#d^#^#^#
^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^H^#^#^#^#^#^#^C^#HQ]U:^#<9A>^#w^#^#^A^#^#^#\^#^#^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#Z^#^#^#^#^#^#^C^#HQ]U:^#<9D>^#^A^B^#
^#^#^#^A^B^#^A^#^#^#^A^#
^#^#^#^#^#^#^#"^A^#^#^#^#^#^#^#^#^#^A^#^#^#^#^#^C^#HQ]U:^#^#4I^#^#^A^#^#^#DH^#^#^A^#^#^#^]B^#^#<9E>^#^#^#^#^#^#^#I^#^#^#^#^#^#^#^#^#^#MI^#^#^#^#^#^#^C^#FQ]U:^#^#y^#^#^A^#^#^#^Xy^#^#^A^#^#^#]a^#^#^C^#^#^#^#^#^#^#Px^#^#^#^#^#^#^#^#^#^#wy^#^#^#^#^#^#^C^#HQ]U:
^#^#V^^#^#^T^#^#^#^^#^#^A^#^#^#ZU^#^#e^#^#^#^#^#^#^#$^^#^#^#^#^#^#^#^#^#^#^^#^#^#^#^#^#^C^#DQ]U:^#^#DESC^#^#^A^#^#XESC^#
^#^A^#^#^#<84>^\^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#<80>ESC^#^#^#^#^#^#^C^#HQ]U:^#^#ESC^#^#2^#^#^#ESC^#^#d^#^#^#ESC^#^#^#^#^#^#^#^#^#ESC^#^#^#^#^#^#^#^#^#^#ESC^#^#^#^#^#^#^C^#HQ]U:^#^#<8B>-^A^#^#^#^##-^A^#<^#^#^#,^A^##^#^#^#^#^#^#^##-^A^#^#^#^#^#^#^#^#^##-^A^#^#^#^#^#^C^#HQ]U:^#^#<86>^A^#^#U^#^#<86>^A^#^##<9C>^#^#<90>^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#
<86>^A^#^#^#^#^#^#^C^#FQ]U:^#^G^A^T^A^#
^#^#^#Y^A^#Q^#^#^#^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^P^A^#^#^#^#^#^C^#HQ]U:^#^S^A^^^B^#^A^#^#^#<80>2^B^#
^#^#^#^O^B^#n^#^#^#^#^#^#^#^^B^#^#^#^#^#^#^#^#^#^_^B^#^#^#^#^#^C^#DQ]U:^#^V^A4^A^#^#^#^#^P!^A^#K^#^#^#8D^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#*^A^#^#^#^#^#^C^#?Q]U:^#.^Aw^F^#^#^A^#^#^#h^F^#^#^O^#^#b^G^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#w^F^#^#^#^#^#^#^C^#HQ]U:^#1^A^A^#^A^#^#^#^A^#^\^B^#^#X^O^B^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^A^#^#^#^#^#^C^#HQ]U:^#4^A^X^F^#^#^G^#^#x^E^#^#^Z^D^#^##^F^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^X^F^#^#^#^#^#^#^C^#FQ]U:^#=^A^L^F^#^A^#^#^#\^F^#^G^#^#^#X^F^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#S^F^#^#^#^#^#^C^#=Q]U:^#O^A^P!^A^#^#^#^#^#^#^#^#^#^A^#^#^#^H^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#
"^A^#^#^#^#^#^C^#BQ]U:^#R^AX^#^#^Y^#^#^#^#^#^E^#^#^#x^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^C^#HQ]U:^#U^A^R^Q^#^#^A^#^#^#^P^#^#2^#^#^#^P^#^#^A^#^#^#^#^#^#^#^P^#^#^#^#^#^#^#^#^#^#^H^Q^#^#^#^#^#^#^C^##Q]U:^#^^An^A^#^A^#^#^#pM
^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#hn^A^#^#^#^#^#^C^#HQ]U:^#p^A<9D>^A^#^B^#^#^#d<90>^A^#^E^#^#^#<90>^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^R<9C>^A^#^#^#^#^#^C^#HQ]U:^#s^A^A^#^Y^#^#^#ȩ^A^#^T^#^#^#а^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#
^#^#^#y^A^#^#^#^#^#^C^#HQ]U:^#|^A<8E>^#^#^A^#^#M<9E>^#^#d^#^#^#<^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#
I have the template definition for this file as follows -
HEADER
Transcode Short 2 Bytes,
Timestamp Long 4 Bytes,
Message Length Short 2 Bytes,
(Total 8 Bytes)
DATA
Security Token Short 2 Bytes,
Last Traded Price Long 4 Bytes,
Best Buy Quantity Long 4 Bytes,
Best Buy Price Long 4 Bytes,
Best Sell Quantity Long 4 Bytes,
Best Sell Price Long 4 Bytes,
Total Traded Quantity Long 4 Bytes,
Average Traded Price Long 4 Bytes,
Open Price Long 4 Bytes,
High Price Long 4 Bytes,
Low Price Long 4 Bytes,
Close Price Long 4 Bytes,
Filler Long 4 Bytes (Blank),
(Total 50 Bytes)
I tried Perl's pack, unpack and ord, reading byte by byte, and getting rid of those "^#" sequences and trying to make sense of what remains, which seems to be hex code, but I am not able to make this readable as ASCII strings via Perl. I also tried raw mode, encoding and decoding, and even searched Stack Overflow thoroughly. There were a few similar questions, but none of the askers shared the template needed to decode the data. I have the template but still can't figure it out.
There is something basic that I am missing but can't really point out. Would really appreciate if someone can show me step by step with code how this conversion is supposed to be done.
Have never done this before ...
$ xxd 1.bin
0000000: 0300 3b51 5d55 3a00 0700 f87f 0000 0100 ..;Q]U:.........
0000010: 0000 587f 0000 0100 0000 6b67 0000 0100 ..X.......kg....
0000020: 0000 0000 0000 587f 0000 0000 0000 0000 ......X.........
0000030: 0000 e880 0000 0000 0000 0300 4851 5d55 ............HQ]U
0000040: 3a00 0a00 109a 0000 f401 0000 d098 0000 :...............
0000050: f401 0000 ce9a 0000 0000 0000 0000 0000 ................
0000060: 0000 0000 0000 0000 0000 0000 069a 0000 ................
0000070: 0000 0000 0300 4651 5d55 3a00 0d00 a80a ......FQ]U:.....
0000080: 0200 0500 0000 601b 0200 6400 0000 1412 ......`...d.....
0000090: 0200 0500 0000 0000 0000 1412 0200 0000 ................
00000a0: 0000 0000 0000 ac0b 0200 0000 0000 0300 ................
00000b0: 4851 5d55 3a00 1300 f8f2 0000 2800 0000 HQ]U:.......(...
00000c0: 56c2 0000 3200 0000 fbf9 0000 0000 0000 V...2...........
00000d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000e0: a3f2 0000 0000 0000 0300 4851 5d55 3a00 ..........HQ]U:.
00000f0: 1600 8c49 0200 0500 0000 cc4f 0200 0a00 ...I.......O....
0000100: 0000 cc4f 0200 0500 0000 0000 0000 cc4f ...O...........O
0000110: 0200 0000 0000 0000 0000 124b 0200 0000 ...........K....
Still doesn't make much sense.

The biggest issue I see is that there's more to unpacking binary data than just knowing "short" or "long".
For numeric values, you need to specify whether or not the byte data is in little or big endian order. You also need to know whether or not you're dealing with signed or unsigned values.
For this example, I'm just going to assume everything is in little endian and unsigned: which is probably wrong, but it's up to you to tweak the pack templates once you find it out. In case you need a link, try http://perldoc.perl.org/functions/pack.html
I haven't tested this on my machine, so pardon if there are any errors, but this is roughly how I would go about what you're trying to do.
#!/usr/bin/perl
use strict;
use warnings;
# Untested sketch: assumes every field is little endian and unsigned.
open(my $in, '<', "path/to/file.ext") or die "Cannot open file: $!"; # open the file for reading
binmode($in); # read raw bytes, with no newline or encoding translation
read($in, my $raw_header, 8); # read 8 bytes off of the file into $raw_header
my @header = unpack("vVv", $raw_header); # unpack header into array
read($in, my $raw_data, 50); # similar
my @data = unpack("vV12", $raw_data); # same as "vVVVVVVVVVVVV", assuming everything is little endian and unsigned
print join("\n", @header, @data), "\n"; # print all the values in order on their own lines
close($in);
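For comparison, here is a minimal Python sketch of the same unpacking idea, applied to the first 58 bytes of the xxd dump in the question. It keeps the same assumption as above (little-endian, unsigned fields), and the dictionary keys are hypothetical labels derived from the template in the question, not names from any official specification.

```python
import struct

# First record copied from the xxd dump in the question:
# 8-byte header followed by the 50-byte data block.
RECORD = bytes.fromhex(
    "03003b515d553a00"                  # transcode, timestamp, message length
    "0700"                              # security token
    "f87f0000" "01000000" "587f0000"    # last traded price, best buy qty, best buy price
    "01000000" "6b670000" "01000000"    # best sell qty, best sell price, total traded qty
    "00000000" "587f0000" "00000000"    # average traded price, open, high
    "00000000" "e8800000" "00000000"    # low, close, filler
)

# "<" means little-endian with no padding; H = unsigned 16-bit, I = unsigned 32-bit.
HEADER = struct.Struct("<HIH")   # 8 bytes
BODY = struct.Struct("<H12I")    # 50 bytes

# Hypothetical field labels, in template order.
BODY_FIELDS = (
    "security_token", "last_traded_price", "best_buy_quantity",
    "best_buy_price", "best_sell_quantity", "best_sell_price",
    "total_traded_quantity", "average_traded_price", "open_price",
    "high_price", "low_price", "close_price", "filler",
)


def decode_record(buf, offset=0):
    """Decode one header + data block starting at `offset`."""
    transcode, timestamp, length = HEADER.unpack_from(buf, offset)
    values = BODY.unpack_from(buf, offset + HEADER.size)
    record = {
        "transcode": transcode,
        "timestamp": timestamp,
        "message_length": length,
    }
    record.update(zip(BODY_FIELDS, values))
    return record


if __name__ == "__main__":
    for name, value in decode_record(RECORD).items():
        print(f"{name}: {value}")
```

Under these assumptions the message length field of this record decodes to 58, which equals 8 + 50. That suggests, but does not prove, that the length covers header plus data and that little endian is the right byte order for this file.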

Related

Perl: Reproducing a UDP packet

I'm trying to emulate the UDP output of a particular piece of software (a very simple remote control for an audio player). Capturing its output with tcpdump -X was no problem, and resulted in packets like the ones below:
Turn the device off...
18:30:03.623499 IP x.x.x.1.27490 > x.x.x.2.23500: UDP, length 32
0x0000: 4500 003c 42e6 0000 8011 92ee 0a02 2810 E..<B.........(.
0x0010: 0a02 28c9 6b62 5bcc 0028 b213 0500 007f ..(.kb[..(......
0x0020: 0100 0000 1800 0000 0000 0000 0000 0000 ................
0x0030: 0000 0000 0100 0000 0200 0000 ............
Turn the device on...
18:30:06.222808 IP x.x.x.1.27490 > x.x.x.2.23500: UDP, length 32
0x0000: 4500 003c 42e7 0000 8011 92ed 0a02 2810 E..<B.........(.
0x0010: 0a02 28c9 6b62 5bcc 0028 b213 0500 007f ..(.kb[..(......
0x0020: 0100 0000 1800 0000 0000 0000 0000 0000 ................
0x0030: 0000 0000 0200 0000 0100 0000 ............
Now I'm trying to reproduce these packets in Perl, so I can send them via IO::Socket::INET and have them arrive looking the same as the above.
I've been fooling around with the Perl pack function, but I'm afraid my understanding of what I'm looking at is insufficient to craft a working template.
Am I even on the right track here?
To turn the device off, send the following string to 10.2.40.201:23500 using UDP:
"\x05\x00\x00\x7f"."\x01\x00\x00\x00"."\x18\x00\x00\x00"."\x00\x00\x00\x00".
"\x00\x00\x00\x00"."\x00\x00\x00\x00"."\x01\x00\x00\x00"."\x02\x00\x00\x00"
To turn it on,
"\x05\x00\x00\x7f"."\x01\x00\x00\x00"."\x18\x00\x00\x00"."\x00\x00\x00\x00".
"\x00\x00\x00\x00"."\x00\x00\x00\x00"."\x02\x00\x00\x00"."\x01\x00\x00\x00"
IP packet structure
UDP packet structure
The first byte gives us important information.
45
Version: 4 (IPv4)
IHL: 5 (IP header is 20 bytes long, including this byte.)
Now we know how long the header is.
00 00 3c 42 e6 00 00 80 11 92 ee 0a 02 28 10
0a 02 28 c9
DSCP: 0
ECN: 0
Total length: 0x003c (60)
Identification: 0x42e6
Flags: 0
Fragment offset: 0
TTL: 0x80 (128)
Protocol: 0x11 (UDP)
Header checksum: 0x92ee
Source IP: 0x0a022810 (10.2.40.16)
Destination IP: 0x0a0228c9 (10.2.40.201)
It's an unfragmented packet, so it's the full packet.
The next 60 - 20 = 40 bytes is the IP packet payload.
It's a UDP packet, so the next 8 bytes form the UDP packet header.
6b 62 5b cc 00 28 b2 13
Source port: 0x6b62 (27490)
Destination port: 0x5bcc (23500)
Length (IP header excluded): 0x0028 (40)
Checksum: 0xb213
The next 40 - 8 = 32 bytes is the UDP packet payload.
05 00 00 7f 01 00 00 00 18 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00
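The header walk-through above can be sketched in Python as follows. This is only an illustration: it hard-codes the first 28 bytes of the "turn the device off" capture and uses network byte order ("!") for every header field. The function names and dictionary keys are hypothetical.

```python
import socket
import struct

# IPv4 header (20 bytes) and UDP header (8 bytes) from the first capture.
PACKET = bytes.fromhex(
    "4500003c42e60000801192ee0a0228100a0228c9"  # IPv4 header
    "6b625bcc0028b213"                          # UDP header
)


def parse_ipv4_header(buf):
    """Split the fixed 20-byte part of an IPv4 header into named fields."""
    (ver_ihl, dscp_ecn, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack_from("!BBHHHBBH4s4s", buf, 0)
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,   # IHL counts 32-bit words
        "total_length": total_length,
        "identification": ident,
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


def parse_udp_header(buf, offset):
    """Read the 8-byte UDP header that starts at `offset`."""
    src_port, dst_port, length, checksum = struct.unpack_from("!HHHH", buf, offset)
    return {"source_port": src_port, "destination_port": dst_port,
            "length": length, "checksum": checksum}


if __name__ == "__main__":
    ip = parse_ipv4_header(PACKET)
    udp = parse_udp_header(PACKET, ip["header_length"])
    print(ip)
    print(udp)
    print("payload bytes:", udp["length"] - 8)
```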
To send this message, you'd use UDP to send the following string to 10.2.40.201:23500:
"\x05\x00\x00\x7f"."\x01\x00\x00\x00"."\x18\x00\x00\x00"."\x00\x00\x00\x00".
"\x00\x00\x00\x00"."\x00\x00\x00\x00"."\x01\x00\x00\x00"."\x02\x00\x00\x00"
Here, I divided into four-byte strings for readability. How you build the string is irrelevant. You could use pack in a million different ways to generate this string, but there's no way to know which one is significant without knowing the protocol.
Now, you asked to replicate the packets, but that's surely unnecessary. You only need to send the same message. There are going to be differences in the headers. You might be able to control some of them (e.g. by binding the socket to 10.2.40.16:27490), but others might be more difficult. The TTL field decreases as the packet moves through the network, the packet can become fragmented (divided into smaller packets called fragments), and so on. But none of that should be relevant.
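As a rough illustration of "send the same message", the following Python sketch builds the two 32-byte payloads from the captures and sends one over UDP. The function name send_command is hypothetical, and the default address and port are simply the ones seen in the capture. The operating system fills in the IP and UDP headers.

```python
import socket

# The 32-byte UDP payloads from the two captures, split into 4-byte groups.
PAYLOAD_OFF = bytes.fromhex(
    "0500007f" "01000000" "18000000" "00000000"
    "00000000" "00000000" "01000000" "02000000"
)
PAYLOAD_ON = bytes.fromhex(
    "0500007f" "01000000" "18000000" "00000000"
    "00000000" "00000000" "02000000" "01000000"
)


def send_command(payload, host="10.2.40.201", port=23500):
    """Send one datagram; the kernel builds the IP and UDP headers."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    send_command(PAYLOAD_OFF)
```

If the device turns out to care about the source port, calling sock.bind() before sendto() would be the place to set it. Whether it does care is not known from the captures alone.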

Convert from QBasic "Binary - Fast load and save" Format?

I have several hundred games I wrote as a kid some 20 years ago saved in QBasic 7's .bas "binary" output format (not to be confused with executable "binaries").
I have been slowly converting these by hand with QBasic in DosBox to ASCII for posterity.
I am curious whether anyone knows anything about the encoding used by the format, such that one could write a script to decode these en masse.
I have been poking at the data a bit, but I believe it is beyond me.
For instance, the hex of "ABCD" saved in this format is
fc02 0100 0d00 a801 a801 0700 0102 0304
0605 0810 10ff ff24 00ff ff64 0100 0056
0000 005b 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0061
0052 0000 0000 0161 0000 0002 6162 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
000c 0000 0038 0000 005b 0009 0008 00ff
ffff ffff ffff ff01 0000 0000 0003 01

Peer Wire Protocol - Message of length 255 and id 255

I am working on an implementation of the BitTorrent peer-wire-protocol. I am blocked by a problem with my implementation. It parses messages correctly until stopped by a message of length 255 and id 255. Below is a dump of the raw data received from the peer. I believe the erroneous message is at offset 0x3ff, with it first parsing len = 0x00FF and id=0xFF.
0000000: 1342 6974 546f 7272 656e 7420 7072 6f74 .BitTorrent prot
0000010: 6f63 6f6c 0000 0000 0010 0000 3d76 88ff ocol........=v..
0000020: 87d6 251a ad81 f5e4 fb90 468b a1a4 5ec0 ..%.......F...^.
0000030: 2d6c 7430 4436 302d 1361 5c7a b4cc 83e1 -lt0D60-.a\z....
0000040: 2d6a a503 0000 0011 0500 0000 0000 0000 -j..............
0000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000100: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000110: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000120: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000130: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000140: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000150: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000160: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000170: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000180: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000190: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000200: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000210: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000220: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000230: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000240: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000250: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000260: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000270: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000280: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000290: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00002a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00002b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00002c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00002d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00002e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00002f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000300: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000310: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000320: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000330: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000340: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000350: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000360: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000370: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000380: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000390: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000400: ffff ffff ffff ffff ffff ffff ffff fff0 ................
0000410: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000420: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000430: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000440: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000450: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000460: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000470: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000480: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000490: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00004a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00004b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00004c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00004d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00004e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00004f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000500: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000510: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000520: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000530: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000540: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000550: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000560: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000570: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000580: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000590: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00005a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00005b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00005c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00005d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00005e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00005f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000600: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000610: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000620: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000630: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000640: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000650: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000660: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000670: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000680: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000690: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00006a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00006b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00006c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00006d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00006e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00006f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000700: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000710: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000720: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000730: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000740: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000750: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000760: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000770: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000780: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000790: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00007a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00007b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00007c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00007d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00007e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00007f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000800: 0a
My implementation only handles messages with a length prefix and id, as specified by the BitTorrent protocol, and processes socket data accordingly.
I see nothing in the BitTorrent spec about handling any other kind of message or data when parsing data from a socket. But data of this kind has been received from every peer I try my client against: it fails to parse a message with a ridiculous length like 0x00FF or 0xFFFF, there are many more keep-alives than I would expect, and the data always ends with byte 0x0a. What else does my peer-wire-protocol implementation need?
0000 0011 0500
This indicates a bitfield message with a length of 17 bytes (16 bytes of bitfield plus the one-byte message type). The large block of zeroes and FFs after that doesn't make sense if that's indeed the real bitfield length. So either the length is wrong or the block after that is rubbish.
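To make the framing concrete, here is a small Python sketch of handshake and message parsing, fed with the first bytes of the dump in the question. The function names are hypothetical and this is not a full client. It only shows how a 4-byte big-endian length prefix and a 1-byte id are read.

```python
import struct

# Start of the dump: 68-byte handshake, then "00000011 05", then zero bytes.
SAMPLE = bytes.fromhex(
    "13426974546f7272656e742070726f74"
    "6f636f6c00000000001000003d7688ff"
    "87d6251aad81f5e4fb90468ba1a45ec0"
    "2d6c74304436302d13615c7ab4cc83e1"
    "2d6aa503"
    "0000001105"
) + bytes(16 + 8)  # 16 payload bytes, then 8 more zero bytes from the dump


def parse_handshake(buf):
    """Return (pstr, reserved, info_hash, peer_id, offset_after_handshake)."""
    pstrlen = buf[0]
    pos = 1 + pstrlen
    pstr = buf[1:pos]
    reserved = buf[pos:pos + 8]
    info_hash = buf[pos + 8:pos + 28]
    peer_id = buf[pos + 28:pos + 48]
    return pstr, reserved, info_hash, peer_id, pos + 48


def iter_messages(buf, offset):
    """Yield (message_id, payload); a keep-alive is reported as (None, b"")."""
    while offset + 4 <= len(buf):
        (length,) = struct.unpack_from(">I", buf, offset)
        if offset + 4 + length > len(buf):
            return  # incomplete message: a real client would wait for more data
        offset += 4
        if length == 0:
            yield None, b""
            continue
        yield buf[offset], buf[offset + 1:offset + length]
        offset += length


if __name__ == "__main__":
    pstr, reserved, info_hash, peer_id, offset = parse_handshake(SAMPLE)
    print(pstr, peer_id[:8], offset)
    for message_id, payload in iter_messages(SAMPLE, offset):
        print(message_id, len(payload))
```

With this framing, every group of four zero bytes that follows the 17-byte message is read as a keep-alive, which matches the unexpectedly large number of keep-alives described in the question. The sketch does not settle whether the length prefix or the trailing data is at fault.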

Can wprintf output be properly redirected to UTF-16 on Windows?

In a C program I'm using wprintf to print Unicode (UTF-16) text in a Windows console. This works fine, but when the output of the program is redirected to a log file, the log file has a corrupted UTF-16 encoding.
When redirection is done in a Windows Command Prompt, all line breaks are encoded as a narrow ASCII line break (0d0a). When redirection is done in PowerShell, null characters are inserted.
Is it possible to redirect the output to a proper UTF-16 log file?
Example program:
#include <stdio.h>
#include <windows.h>
#include <fcntl.h>
#include <io.h>
int main () {
int prevmode;
prevmode = _setmode(_fileno(stdout), _O_U16TEXT);
fwprintf(stdout,L"one\n");
fwprintf(stdout,L"two\n");
fwprintf(stdout,L"three\n");
_setmode(_fileno(stdout), prevmode);
return 0;
}
Redirecting the output in Command Prompt. See the 0d0a which should be 0d00 0a00:
c:\test>.\testu16.exe > o.txt
c:\test>xxd o.txt
0000000: 6f00 6e00 6500 0d0a 0074 0077 006f 000d o.n.e....t.w.o..
0000010: 0a00 7400 6800 7200 6500 6500 0d0a 00 ..t.h.r.e.e....
Redirecting the output in PowerShell. See all the 0000 inserted.
PS C:\test> .\testu16.exe > p.txt
PS C:\test> xxd p.txt
0000000: fffe 6f00 0000 6e00 0000 6500 0000 0d00 ..o...n...e.....
0000010: 0a00 0000 7400 0000 7700 0000 6f00 0000 ....t...w...o...
0000020: 0d00 0a00 0000 7400 0000 6800 0000 7200 ......t...h...r.
0000030: 0000 6500 0000 6500 0000 0d00 0a00 0000 ..e...e.........
0000040: 0d00 0a00 ....
I got this answer from Hans Passant.
Thanks Hans.
The wrong line breaks are an effect of the buffering of stdout. We need to flush the stream before we set the mode back to the original mode.
prevmode = _setmode(_fileno(stdout), _O_U16TEXT);
fwprintf(stdout,L"one\n");
fwprintf(stdout,L"two\n");
fwprintf(stdout,L"three\n");
fflush(stdout); /* flush stream */
_setmode(_fileno(stdout), prevmode);
Redirecting the output in Command Prompt (cmd.exe) creates a correct UTF-16 file, without BOM.
c:\test>.\testu16 > o.txt
c:\test>xxd o.txt
0000000: 6f00 6e00 6500 0d00 0a00 7400 7700 6f00 o.n.e.....t.w.o.
0000010: 0d00 0a00 7400 6800 7200 6500 6500 0d00 ....t.h.r.e.e...
0000020: 0a00 ..
In PowerShell the output is still wrong.
PS C:\test> .\testu16 > p.txt
PS C:\test> xxd p.txt
0000000: fffe 6f00 0000 6e00 0000 6500 0000 0d00 ..o...n...e.....
0000010: 0a00 0000 0d00 0a00 0000 7400 0000 7700 ..........t...w.
0000020: 0000 6f00 0000 0d00 0a00 0000 0d00 0a00 ..o.............
0000030: 0000 7400 0000 6800 0000 7200 0000 6500 ..t...h...r...e.
0000040: 0000 6500 0000 0d00 0a00 0000 0d00 0a00 ..e.............
0000050: 0000 0d00 0a00 ......
This is because PowerShell doesn't keep the stream untouched. It tries to interpret it and convert it to UTF-16. It guessed that the input stream encoding was ANSI. PowerShell added a UTF-16 BOM, and the rest is double-encoded UTF-16. This explains the extra zeros.
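The double encoding can be reproduced outside PowerShell. The Python sketch below is only a model of the effect described above, not of PowerShell's actual implementation: it treats every byte of a UTF-16LE stream as a separate single-byte character (latin-1 stands in for the ANSI code page here, which is an assumption) and encodes the result as UTF-16LE again with a BOM in front.

```python
def double_encode(text):
    """Model a consumer that misreads UTF-16LE bytes as single-byte text."""
    raw = text.encode("utf-16-le")       # what the C program writes
    misread = raw.decode("latin-1")      # each byte becomes one character
    # Re-encode as UTF-16LE with a BOM in front, as seen in p.txt.
    return b"\xff\xfe" + misread.encode("utf-16-le")


if __name__ == "__main__":
    print(double_encode("one").hex())
```

For the text "one" this yields the bytes ff fe 6f 00 00 00 6e 00 00 00 65 00 00 00, which is how the p.txt dumps begin.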
Even using out-file and specifying the encoding doesn't help.
PS C:\test> .\testu16.exe | out-file p.txt -encoding unicode
PS C:\test> xxd p.txt
0000000: fffe 6f00 0000 6e00 0000 6500 0000 0d00 ..o...n...e.....
0000010: 0a00 0000 0d00 0a00 0000 7400 0000 7700 ..........t...w.
0000020: 0000 6f00 0000 0d00 0a00 0000 0d00 0a00 ..o.............
0000030: 0000 7400 0000 6800 0000 7200 0000 6500 ..t...h...r...e.
0000040: 0000 6500 0000 0d00 0a00 0000 0d00 0a00 ..e.............
0000050: 0000 0d00 0a00 ......
PowerShell needs to be informed about the encoding, which is done by first printing a UTF-16 BOM:
prevmode = _setmode(_fileno(stdout), _O_U16TEXT);
fwprintf(stdout, L"\xfeff"); /* UTF-16LE BOM */
fwprintf(stdout,L"one\n");
fwprintf(stdout,L"two\n");
fwprintf(stdout,L"three\n");
fflush(stdout); /* flush stream */
_setmode(_fileno(stdout), prevmode);
Now we get a correct UTF-16 file.
PS C:\test> .\testu16 > p.txt
PS C:\test> xxd p.txt
0000000: fffe 6f00 6e00 6500 0d00 0a00 7400 7700 ..o.n.e.....t.w.
0000010: 6f00 0d00 0a00 7400 6800 7200 6500 6500 o.....t.h.r.e.e.
0000020: 0d00 0a00
">" will always redirect your console UTF-16 as printable "ASCII", even if you put a BOM on your output or use prevmode = _setmode(_fileno(stdout), _O_BINARY);. I have the same problem on Windows 7; there is no way to do this with fwprintf.

RIFF WAV header format 2014 update?

I'm trying to decode and play WAV files in Perl for further operations. I have found some references about the format, and some interesting Q&As:
What does a audio frame contain?
error in reading a wav file with C++
Writing musical notes to a wav file
I found the "Canonical WAVE file format".
But in the end I'm testing 2 different WAV files that don't follow the "standard". Mplayer has no problem at all reading the data, and I figured out a workaround in my Perl code:
sysread WAV, $riff, 12;
sysread WAV, $fmt, 24;
do{
sysread WAV, $wtf, 2;
}while( unpack("A4",$wtf) ne "da" );
sysread WAV, $wtf, 2;
#94
sysread WAV, $data, 4;
Still, it troubles me how it really works, and what the variable data between the "bits per sample" field and the "data" field is.
Thank you guys!
(I'm getting addicted to these forums)
test2.wav
v--------- riff --------------v---------
0000000 4952 4646 685e 0931 4157 4556 6d66 2074
-------------- fmt --------------------
0000010 0028 0000 fffe 0006 bb80 0000 ca00 0008
---------v-----------------------------
0000020 000c 0010 0016 0010 060f 0000 0001 0000
---------------------------------------
0000030 0000 0010 0080 aa00 3800 719b 494c 5453
---------------------------------------
0000040 001a 0000 4e49 4f46 5349 5446 000e 0000
----------------------------------v----
0000050 614c 6676 3535 312e 2e39 3031 0034 6164
----v---------v
0000060 6174 6800 0931 0000 0000 0000 0000 0000
0000070 0000 0000 0000 0000 0000 0000 0000 0000
test.wav
v--------- riff --------------v---------
0000000 4952 4646 7048 095b 4157 4556 6d66 2074
-------------- fmt --------------------
0000010 0012 0000 0001 0002 ac44 0000 b110 0002
---------v-----------------------------
0000020 0004 0010 0000 494c 5453 001a 0000 4e49
---------------------------------------
0000030 4f46 5349 5446 000e 0000 614c 6676 3535
-------------------v---------v---------v
0000040 312e 2e39 3031 0034 6164 6174 7000 095b
0000050 0000 0000 0000 0000 0000 0000 0000 0000
The AudioFormat field in test2.wav is 0xfffe, which indicates that the header is WAVEFORMATEXTENSIBLE. When this happens, you need to interpret the rest of the header differently. The fields and their sizes in bytes are:
AudioFormat : 2
NumChannels : 2
SampleRate : 4
ByteRate : 4
BlockAlign : 2
BitsPerSample : 2
cbSize : 2 - size of the rest of the chunk
ValidBitsPerSample : 2
ChannelMask : 4
SubFormat : 16 - GUID
For more information look at some docs on WAVEFORMATEX and WAVEFORMATEXTENSIBLE
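Rather than scanning for the letters "da", a reader can walk the file chunk by chunk, using the 4-byte id and 4-byte little-endian size that precede every chunk. The Python sketch below illustrates that approach. The function names are hypothetical, and the in-memory example file is synthetic, built to resemble the layout of test.wav (an 18-byte fmt chunk, a 26-byte LIST chunk, then data).

```python
import io
import struct


def iter_chunks(stream):
    """Yield (chunk_id, payload) for each chunk after the 12-byte RIFF header."""
    riff, _size, wave = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        head = stream.read(8)
        if len(head) < 8:
            return
        chunk_id, size = struct.unpack("<4sI", head)
        payload = stream.read(size)  # fine for a sketch; large files should seek
        if size % 2:
            stream.read(1)           # chunks are padded to an even length
        yield chunk_id, payload


def parse_fmt(payload):
    """Decode the fmt chunk, including the WAVEFORMATEXTENSIBLE tail if present."""
    names = ("audio_format", "num_channels", "sample_rate",
             "byte_rate", "block_align", "bits_per_sample")
    info = dict(zip(names, struct.unpack_from("<HHIIHH", payload, 0)))
    if info["audio_format"] == 0xFFFE and len(payload) >= 40:
        cb_size, valid_bits, mask = struct.unpack_from("<HHI", payload, 16)
        info.update(cb_size=cb_size, valid_bits_per_sample=valid_bits,
                    channel_mask=mask, sub_format=payload[24:40])
    return info


def build_example():
    """Build a tiny synthetic WAV laid out like test.wav (values are illustrative)."""
    fmt = struct.pack("<HHIIHHH", 1, 2, 44100, 176400, 4, 16, 0)
    info = b"INFOISFT" + struct.pack("<I", 14) + b"Lavf55.19.104\x00"
    data = bytes(4)
    body = b"WAVE"
    for chunk_id, payload in ((b"fmt ", fmt), (b"LIST", info), (b"data", data)):
        body += chunk_id + struct.pack("<I", len(payload)) + payload
    return b"RIFF" + struct.pack("<I", len(body)) + body


if __name__ == "__main__":
    for chunk_id, payload in iter_chunks(io.BytesIO(build_example())):
        print(chunk_id, len(payload))
        if chunk_id == b"fmt ":
            print(parse_fmt(payload))
```

Both dumps in the question contain the bytes "LIST" and "INFO" between the fmt chunk and "data". A chunk walker like this skips such a chunk by its declared size, regardless of how long it is.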