Peer Wire Protocol - Message of length 255 and id 255 - sockets

I am working on an implementation of the BitTorrent peer-wire-protocol. I am blocked by a problem with my implementation. It parses messages correctly until stopped by a message of length 255 and id 255. Below is a dump of the raw data received from the peer. I believe the erroneous message is at offset 0x3ff, with it first parsing len = 0x00FF and id=0xFF.
0000000: 1342 6974 546f 7272 656e 7420 7072 6f74 .BitTorrent prot
0000010: 6f63 6f6c 0000 0000 0010 0000 3d76 88ff ocol........=v..
0000020: 87d6 251a ad81 f5e4 fb90 468b a1a4 5ec0 ..%.......F...^.
0000030: 2d6c 7430 4436 302d 1361 5c7a b4cc 83e1 -lt0D60-.a\z....
0000040: 2d6a a503 0000 0011 0500 0000 0000 0000 -j..............
0000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000100: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000110: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000120: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000130: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000140: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000150: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000160: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000170: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000180: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000190: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000200: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000210: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000220: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000230: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000240: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000250: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000260: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000270: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000280: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000290: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00002a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00002b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00002c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00002d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00002e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00002f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000300: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000310: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000320: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000330: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000340: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000350: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000360: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000370: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000380: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000390: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000400: ffff ffff ffff ffff ffff ffff ffff fff0 ................
0000410: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000420: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000430: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000440: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000450: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000460: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000470: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000480: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000490: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00004a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00004b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00004c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00004d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00004e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00004f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000500: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000510: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000520: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000530: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000540: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000550: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000560: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000570: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000580: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000590: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00005a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00005b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00005c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00005d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00005e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00005f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000600: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000610: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000620: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000630: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000640: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000650: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000660: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000670: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000680: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000690: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00006a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00006b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00006c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00006d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00006e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00006f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000700: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000710: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000720: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000730: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000740: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000750: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000760: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000770: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000780: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000790: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00007a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00007b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00007c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00007d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00007e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00007f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000800: 0a
My implementation only handles messages with a length prefix and id, as specified by the BitTorrent protocol, and processes socket data accordingly.
I see nothing in the BitTorrent spec about handling any other kind of message or data when parsing data from a socket. But I receive data of this kind from every peer I try my client with: parsing fails on a message with a ridiculous length like 0x00FF or 0xFFFF, there are many more keep-alives than I would expect, and the data always ends with byte 0x0a. What else does my peer-wire-protocol implementation need?

0000 0011 0500
This indicates a bitfield message (id 5) with a length prefix of 17, i.e. 16 bytes of bitfield after the one-byte message type. The large block of zeroes and FFs after that doesn't make sense if that is indeed the real bitfield length. So either the length is wrong or the block after it is rubbish.
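For reference, here is a minimal Python sketch of the framing the spec describes: a handshake (68 bytes for the standard protocol string), then messages that each start with a 4-byte big-endian length prefix, where a length of 0 is a keep-alive. The function name `parse_stream` and the shape of its return value are my own choices for illustration, not part of any library.

```python
import struct


def parse_stream(buf: bytes):
    """Split received bytes into the handshake and length-prefixed messages.

    Returns (handshake, messages, leftover). Each message is (msg_id, payload);
    a keep-alive (length prefix 0) is reported as (None, b"").
    """
    # Handshake layout: pstrlen (1), pstr, reserved (8), info_hash (20), peer_id (20)
    if not buf or len(buf) < 49 + buf[0]:
        return None, [], buf  # handshake not complete yet
    pstrlen = buf[0]
    handshake = {
        "pstr": buf[1:1 + pstrlen],
        "reserved": buf[1 + pstrlen:9 + pstrlen],
        "info_hash": buf[9 + pstrlen:29 + pstrlen],
        "peer_id": buf[29 + pstrlen:49 + pstrlen],
    }
    pos = 49 + pstrlen
    messages = []
    while len(buf) - pos >= 4:
        (length,) = struct.unpack_from(">I", buf, pos)  # 4-byte big-endian prefix
        if length == 0:
            messages.append((None, b""))  # keep-alive has no id and no payload
            pos += 4
            continue
        if len(buf) - pos - 4 < length:
            break  # message only partly received; wait for more socket data
        msg_id = buf[pos + 4]
        payload = buf[pos + 5:pos + 4 + length]  # length counts the id byte too
        messages.append((msg_id, payload))
        pos += 4 + length
    return handshake, messages, buf[pos:]
```

Fed the opening bytes of the dump in the question, this reports one bitfield message (id 5) with a 16-byte payload and then treats each following run of four zero bytes as a keep-alive, which matches the symptoms described there.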

Related

Convert from QBasic "Binary - Fast load and save" Format?

I have several hundred games I wrote as a kid some 20 years ago, saved in QBasic 7's .bas "binary" output format (not to be confused with executable "binaries").
I have been slowly converting these to ASCII by hand with QBasic in DosBox for posterity.
I am curious whether anyone knows anything about the encoding used by the format, such that one could write a script to decode these en masse.
I have been poking at the data a bit, but I believe it is beyond me.
For instance the HEX of "ABCD" saved in this format is
fc02 0100 0d00 a801 a801 0700 0102 0304
0605 0810 10ff ff24 00ff ff64 0100 0056
0000 005b 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0061
0052 0000 0000 0161 0000 0002 6162 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
000c 0000 0038 0000 005b 0009 0008 00ff
ffff ffff ffff ff01 0000 0000 0003 01

tcpdump of tcpreplay output does not match input

I am having an issue in which on certain machines the number of bytes that tcpdump reports tcpreplay has output does not match tcpreplay's input.
Specifically, tcpdump always reports 14 bytes more than the pcap given to tcpreplay.
To replicate, I've created a simple packet in scapy using the command:
packet = Ether()/IP(dst='1.2.3.4')/TCP()/Raw(load='S:' + ('-' * 64) + ':E')
wrpcap("tcp.pcap", packet)
I set up virtual interfaces with:
ip link add front1 type veth peer name back1
ifconfig back1 up
ifconfig front1 up
Monitor the input to the interface with:
sudo tcpdump -XX -Q out -i front1
Then send the generated packet with:
sudo tcpreplay -i front1 tcp.pcap
The tcpdump monitor produces:
0x0000: d4ae 52c1 2005 2c59 e547 2ca4 0800 4500 ..R...,Y.G,...E.
0x0010: 006c 0001 0000 4006 d31c 9e82 04e7 0102 .l....@.........
0x0020: 0304 0014 0050 0000 0000 0000 0000 5002 .....P........P.
0x0030: 2000 b4a6 0000 533a 2d2d 2d2d 2d2d 2d2d ......S:--------
0x0040: 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d ----------------
0x0050: 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d ----------------
0x0060: 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d ----------------
0x0070: 2d2d 2d2d 2d2d 2d2d 3a45 0000 0000 5400 --------:E....T.
0x0080: 0000 0000 0000 0000 ........
Whereas a tcpdump of the original file produces:
0x0000: d4ae 52c1 2005 2c59 e547 2ca4 0800 4500 ..R...,Y.G,...E.
0x0010: 006c 0001 0000 4006 d31c 9e82 04e7 0102 .l....@.........
0x0020: 0304 0014 0050 0000 0000 0000 0000 5002 .....P........P.
0x0030: 2000 b4a6 0000 533a 2d2d 2d2d 2d2d 2d2d ......S:--------
0x0040: 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d ----------------
0x0050: 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d ----------------
0x0060: 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d ----------------
0x0070: 2d2d 2d2d 2d2d 2d2d 3a45 --------:E
That is, the monitor produces content identical to the file with 14 extra bytes appended.
This appears to be happening regardless of the size of the input.
I've verified that this problem does not occur on other machines, but cannot identify the settings that cause it to occur.
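As a cross-check on the 14-byte figure, the IP header itself says how long the packet should be. In both dumps the IPv4 total length field is 0x006c (108), so the frame should be 14 + 108 = 122 bytes, while the monitor output runs to 0x88 (136) bytes. A small Python sketch of that arithmetic, assuming an untagged Ethernet II frame carrying IPv4 (the helper name `trailing_bytes` is made up for this example):

```python
import struct

ETH_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def trailing_bytes(frame: bytes) -> int:
    """Return how many bytes follow the end of the IPv4 packet in a frame."""
    (ethertype,) = struct.unpack_from(">H", frame, 12)
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")
    # IPv4 total length: big-endian 16-bit field at offset 2 of the IP header
    (ip_total_len,) = struct.unpack_from(">H", frame, ETH_HEADER_LEN + 2)
    return len(frame) - (ETH_HEADER_LEN + ip_total_len)
```

Running this over each captured frame gives 0 for the original pcap and 14 for the frame seen on the interface, so the extra bytes sit after the end of the IP packet rather than inside it.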
Some version information:
$ tcpreplay -V
tcpreplay version: 3.4.4 (build 2450) (debug)
Copyright 2000-2010 by Aaron Turner <aturner at synfin dot net>
Cache file supported: 04
Not compiled with libdnet.
Compiled against libpcap: 1.7.4
64 bit packet counters: enabled
Verbose printing via tcpdump: enabled
Packet editing: disabled
Fragroute engine: disabled
Injection method: PF_PACKET send()
$ tcpdump --version
tcpdump version 4.9.2
libpcap version 1.7.4
OpenSSL 1.0.2g 1 Mar 2016
Running on Ubuntu 16.04.5
This turned out to be due to a bug in Linux kernel 4.15.0, as reported in the following bug report:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1782544

Can wprintf output be properly redirected to UTF-16 on Windows?

In a C program I'm using wprintf to print Unicode (UTF-16) text in a Windows console. This works fine, but when the output of the program is redirected to a log file, the log file has a corrupted UTF-16 encoding.
When redirection is done in a Windows Command Prompt, all line breaks are encoded as a narrow ASCII line break (0d0a). When redirection is done in PowerShell, null characters are inserted.
Is it possible to redirect the output to a proper UTF-16 log file?
Example program:
#include <stdio.h>
#include <windows.h>
#include <fcntl.h>
#include <io.h>
int main () {
    int prevmode;
    prevmode = _setmode(_fileno(stdout), _O_U16TEXT);
    fwprintf(stdout,L"one\n");
    fwprintf(stdout,L"two\n");
    fwprintf(stdout,L"three\n");
    _setmode(_fileno(stdout), prevmode);
    return 0;
}
Redirecting the output in Command Prompt. See the 0d0a which should be 0d00 0a00:
c:\test>.\testu16.exe > o.txt
c:\test>xxd o.txt
0000000: 6f00 6e00 6500 0d0a 0074 0077 006f 000d o.n.e....t.w.o..
0000010: 0a00 7400 6800 7200 6500 6500 0d0a 00 ..t.h.r.e.e....
Redirecting the output in PowerShell. See all the 0000 inserted.
PS C:\test> .\testu16.exe > p.txt
PS C:\test> xxd p.txt
0000000: fffe 6f00 0000 6e00 0000 6500 0000 0d00 ..o...n...e.....
0000010: 0a00 0000 7400 0000 7700 0000 6f00 0000 ....t...w...o...
0000020: 0d00 0a00 0000 7400 0000 6800 0000 7200 ......t...h...r.
0000030: 0000 6500 0000 6500 0000 0d00 0a00 0000 ..e...e.........
0000040: 0d00 0a00 ....
I got this answer from Hans Passant.
Thanks Hans.
The wrong line breaks are an effect of the buffering of stdout. We need to flush the stream before we set the mode back to the original mode.
prevmode = _setmode(_fileno(stdout), _O_U16TEXT);
fwprintf(stdout,L"one\n");
fwprintf(stdout,L"two\n");
fwprintf(stdout,L"three\n");
fflush(stdout); /* flush stream */
_setmode(_fileno(stdout), prevmode);
Redirecting the output in Command Prompt (cmd.exe) creates a correct UTF-16 file, without BOM.
c:\test>.\testu16 > o.txt
c:\test>xxd o.txt
0000000: 6f00 6e00 6500 0d00 0a00 7400 7700 6f00 o.n.e.....t.w.o.
0000010: 0d00 0a00 7400 6800 7200 6500 6500 0d00 ....t.h.r.e.e...
0000020: 0a00 ..
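The broken dump is consistent with the newline translation being applied byte by byte to data that was already UTF-16: every 0a byte gets a 0d in front of it, instead of every wide newline becoming a wide CR LF pair. The Python sketch below only reproduces the two byte patterns for comparison. It does not call the C runtime, and the function names are hypothetical.

```python
TEXT = "one\ntwo\nthree\n"


def narrow_translation(text: str) -> bytes:
    """Encode to UTF-16LE first, then expand newlines at the byte level."""
    return text.encode("utf-16-le").replace(b"\n", b"\r\n")


def wide_translation(text: str) -> bytes:
    """Expand newlines at the character level, then encode to UTF-16LE."""
    return text.replace("\n", "\r\n").encode("utf-16-le")


if __name__ == "__main__":
    # Group the hex output in 2-byte units from the left, like xxd does
    print(narrow_translation(TEXT).hex(" ", -2))
    print(wide_translation(TEXT).hex(" ", -2))
```

The first line of output has the same bytes as the broken o.txt (0d0a followed by a lone 00), and the second has the same bytes as the corrected file.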
In PowerShell the output is still wrong.
PS C:\test> .\testu16 > p.txt
PS C:\test> xxd p.txt
0000000: fffe 6f00 0000 6e00 0000 6500 0000 0d00 ..o...n...e.....
0000010: 0a00 0000 0d00 0a00 0000 7400 0000 7700 ..........t...w.
0000020: 0000 6f00 0000 0d00 0a00 0000 0d00 0a00 ..o.............
0000030: 0000 7400 0000 6800 0000 7200 0000 6500 ..t...h...r...e.
0000040: 0000 6500 0000 0d00 0a00 0000 0d00 0a00 ..e.............
0000050: 0000 0d00 0a00 ......
This is because PowerShell doesn't leave the stream untouched. It tries to interpret it and convert it to UTF-16, and it guessed that the input stream encoding was ANSI. PowerShell added a UTF-16 BOM, and the rest is double-encoded UTF-16, which explains the extra zeros.
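The double encoding is easy to reproduce outside PowerShell. This Python sketch treats each byte of the UTF-16 output as one character and re-encodes the result as UTF-16LE with a BOM. Latin-1 is used here as a stand-in for the ANSI code page (an assumption; the two agree for the bytes in this example), and the sketch does not model PowerShell's line splitting, which is where the additional 0d00 0a00 pairs in the dump come from.

```python
def double_encode(utf16le: bytes) -> bytes:
    """Model a reader that takes each byte as one character, then writes UTF-16LE."""
    as_single_byte_text = utf16le.decode("latin-1")  # assumption: stand-in for ANSI
    return b"\xff\xfe" + as_single_byte_text.encode("utf-16-le")


if __name__ == "__main__":
    original = "one".encode("utf-16-le")  # 6f00 6e00 6500
    print(double_encode(original).hex(" ", -2))
```

The result starts with fffe 6f00 0000 6e00 0000 6500 0000, the same pattern as the beginning of p.txt.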
Even using out-file and specifying the encoding doesn't help.
PS C:\test> .\testu16.exe | out-file p.txt -encoding unicode
PS C:\test> xxd p.txt
0000000: fffe 6f00 0000 6e00 0000 6500 0000 0d00 ..o...n...e.....
0000010: 0a00 0000 0d00 0a00 0000 7400 0000 7700 ..........t...w.
0000020: 0000 6f00 0000 0d00 0a00 0000 0d00 0a00 ..o.............
0000030: 0000 7400 0000 6800 0000 7200 0000 6500 ..t...h...r...e.
0000040: 0000 6500 0000 0d00 0a00 0000 0d00 0a00 ..e.............
0000050: 0000 0d00 0a00 ......
PowerShell needs to be informed about the encoding, which is done by first printing a UTF-16 BOM:
prevmode = _setmode(_fileno(stdout), _O_U16TEXT);
fwprintf(stdout, L"\xfeff"); /* UTF-16LE BOM */
fwprintf(stdout,L"one\n");
fwprintf(stdout,L"two\n");
fwprintf(stdout,L"three\n");
fflush(stdout); /* flush stream */
_setmode(_fileno(stdout), prevmode);
Now we get a correct UTF-16 file.
PS C:\test> .\testu16 > p.txt
PS C:\test> xxd p.txt
0000000: fffe 6f00 6e00 6500 0d00 0a00 7400 7700 ..o.n.e.....t.w.
0000010: 6f00 0d00 0a00 7400 6800 7200 6500 6500 o.....t.h.r.e.e.
0000020: 0d00 0a00 ....
">" will always redirect your console UTF-16 output as printable "ASCII", even if you put a BOM on your output or use prevmode = _setmode(_fileno(stdout), _O_BINARY);. I have the same problem on Windows 7; there is no way to do this with fwprintf.

Convert Binary File To String In Perl

Ok. I have spent the last 14 hours trying to figure this out. I have a binary file with the following contents (there is much more; this is a truncated version). I wish to convert this to a readable string format.
^#^P<9A>^#^#^A^#^#И^#^#^A^#^#Κ^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^F<9A>^#^#^#^#^#^#^C^#FQ]U:^#^M^#
^B^#^E^#^#^#`ESC^B^#d^#^#^#^T^R^B^#^E^#^#^#^#^#^#^#^T^R^B^#^#^#^#^#^#^#^#^#^K^B^#^#^#^#^#^C^#HQ]U:^#^S^#^#^#(^#^#^#V^#^#2^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^C^#HQ]U:^#^V^#<8C>I^B^#^E^#^#^#O^B^#
^#^#^#O^B^#^E^#^#^#^#^#^#^#O^B^#^#^#^#^#^#^#^#^#^RK^B^#^#^#^#^#^C^#HQ]U:^#^Y^#0^A^#d^#^#^#1^A^#<96>^#^#^#L0^A^#d^#^#^#^#^#^#
^#71^A^#^#^#^#^#^#^#^#^#0^A^#^#^#^#^#^C^#=Q]U:^#"^#<92>T^#^#2^#^#^#CN^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^AT^#^#
^#^#^#^#^C^#FQ]U:^#(^#$^M^A^# ^#^#^#^G^A^#2^#^#^#^O^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#R^L^A^#^#^#^#^#^C^#=Q]U:^#.^#<85>^B
^#^#^G^#^#g^B^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#<85>^B^#^#^#^#^#^#^C^#HQ]U:^#4^#^CH^#^#^Y^#^#^#G^#^#d^#^#^#
H^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^CH^#^#^#^#^#^#^C^#HQ]U:^#O^#^M^#^#<89>^#^#^#^G^M^#^#^A^#^#^P^N^#^#^#^#^#^#^#^#^#^#
^#^#^#^#^#^#^#^#^#^#^#^#^M^#^#^#^#^#^#^C^#HQ]U:^#R^#^B^#^#^A^#^#^B^#^#<8C>0^B^#^B^#^#^A^#^#^#^#^#^#^B^#^#^#^#^#^#^#^#^#^#^B^#^#^#^#^#^#^C^#HQ]U:^#d^#F^A^#
^#^#^#^TJ^A^#
^#^#^#<98>M^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#<8A>G^A^#^#^#^#^#^C^#HQ]U:^#y^#j;^#^#^A^#^#^#=;^#^#d^#^#^#(<^#^#^C^#^#^#^#^#^#P<^#^#^#^#^#^#^#^#^#^#=;^#^#^#^#^#^#^C^#FQ]U:^#<88>^#&^#^#^A^#^#^#&^#^#d^#^#^#'^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#&^#^#
^#^#^#^#^C^#FQ]U:^#<94>^#^H^#^#^#^#^#^H^#^#d^#^#^#
^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^H^#^#^#^#^#^#^C^#HQ]U:^#<9A>^#w^#^#^A^#^#^#\^#^#^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#Z^#^#^#^#^#^#^C^#HQ]U:^#<9D>^#^A^B^#
^#^#^#^A^B^#^A^#^#^#^A^#
^#^#^#^#^#^#^#"^A^#^#^#^#^#^#^#^#^#^A^#^#^#^#^#^C^#HQ]U:^#^#4I^#^#^A^#^#^#DH^#^#^A^#^#^#^]B^#^#<9E>^#^#^#^#^#^#^#I^#^#^#^#^#^#^#^#^#^#MI^#^#^#^#^#^#^C^#FQ]U:^#^#y^#^#^A^#^#^#^Xy^#^#^A^#^#^#]a^#^#^C^#^#^#^#^#^#^#Px^#^#^#^#^#^#^#^#^#^#wy^#^#^#^#^#^#^C^#HQ]U:
^#^#V^^#^#^T^#^#^#^^#^#^A^#^#^#ZU^#^#e^#^#^#^#^#^#^#$^^#^#^#^#^#^#^#^#^#^#^^#^#^#^#^#^#^C^#DQ]U:^#^#DESC^#^#^A^#^#XESC^#
^#^A^#^#^#<84>^\^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#<80>ESC^#^#^#^#^#^#^C^#HQ]U:^#^#ESC^#^#2^#^#^#ESC^#^#d^#^#^#ESC^#^#^#^#^#^#^#^#^#ESC^#^#^#^#^#^#^#^#^#^#ESC^#^#^#^#^#^#^C^#HQ]U:^#^#<8B>-^A^#^#^#^##-^A^#<^#^#^#,^A^##^#^#^#^#^#^#^##-^A^#^#^#^#^#^#^#^#^##-^A^#^#^#^#^#^C^#HQ]U:^#^#<86>^A^#^#U^#^#<86>^A^#^##<9C>^#^#<90>^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#
<86>^A^#^#^#^#^#^#^C^#FQ]U:^#^G^A^T^A^#
^#^#^#Y^A^#Q^#^#^#^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^P^A^#^#^#^#^#^C^#HQ]U:^#^S^A^^^B^#^A^#^#^#<80>2^B^#
^#^#^#^O^B^#n^#^#^#^#^#^#^#^^B^#^#^#^#^#^#^#^#^#^_^B^#^#^#^#^#^C^#DQ]U:^#^V^A4^A^#^#^#^#^P!^A^#K^#^#^#8D^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#*^A^#^#^#^#^#^C^#?Q]U:^#.^Aw^F^#^#^A^#^#^#h^F^#^#^O^#^#b^G^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#w^F^#^#^#^#^#^#^C^#HQ]U:^#1^A^A^#^A^#^#^#^A^#^\^B^#^#X^O^B^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^A^#^#^#^#^#^C^#HQ]U:^#4^A^X^F^#^#^G^#^#x^E^#^#^Z^D^#^##^F^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^X^F^#^#^#^#^#^#^C^#FQ]U:^#=^A^L^F^#^A^#^#^#\^F^#^G^#^#^#X^F^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#S^F^#^#^#^#^#^C^#=Q]U:^#O^A^P!^A^#^#^#^#^#^#^#^#^#^A^#^#^#^H^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#
"^A^#^#^#^#^#^C^#BQ]U:^#R^AX^#^#^Y^#^#^#^#^#^E^#^#^#x^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^C^#HQ]U:^#U^A^R^Q^#^#^A^#^#^#^P^#^#2^#^#^#^P^#^#^A^#^#^#^#^#^#^#^P^#^#^#^#^#^#^#^#^#^#^H^Q^#^#^#^#^#^#^C^##Q]U:^#^^An^A^#^A^#^#^#pM
^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#hn^A^#^#^#^#^#^C^#HQ]U:^#p^A<9D>^A^#^B^#^#^#d<90>^A^#^E^#^#^#<90>^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^R<9C>^A^#^#^#^#^#^C^#HQ]U:^#s^A^A^#^Y^#^#^#ȩ^A^#^T^#^#^#а^A^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#
^#^#^#y^A^#^#^#^#^#^C^#HQ]U:^#|^A<8E>^#^#^A^#^#M<9E>^#^#d^#^#^#<^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#
I have the template definition for this file as follows -
HEADER
Transcode Short 2 Bytes,
Timestamp Long 4 Bytes,
Message Length Short 2 Bytes,
(Total 8 Bytes)
DATA
Security Token Short 2 Bytes,
Last Traded Price Long 4 Bytes,
Best Buy Quantity Long 4 Bytes,
Best Buy Price Long 4 Bytes,
Best Sell Quantity Long 4 Bytes,
Best Sell Price Long 4 Bytes,
Total Traded Quantity Long 4 Bytes,
Average Traded Price Long 4 Bytes,
Open Price Long 4 Bytes,
High Price Long 4 Bytes,
Low Price Long 4 Bytes,
Close Price Long 4 Bytes,
Filler Long 4 Bytes (Blank),
(Total 50 Bytes)
I tried Perl's pack, unpack, and ord, reading byte by byte, and getting rid of those "^#" and trying to make sense of what remains, which seems to be hex code, but I am not able to make this readable as ASCII strings via Perl. I also tried raw, encoding, and decoding, and even searched Stack Overflow thoroughly. There were a few similar problems, but none of those posters shared the template needed to decode the data. I have the template but still can't figure it out.
There is something basic that I am missing but can't really pin down. I would really appreciate it if someone could show me, step by step with code, how this conversion is supposed to be done.
Have never done this before ...
$ xxd 1.bin
0000000: 0300 3b51 5d55 3a00 0700 f87f 0000 0100 ..;Q]U:.........
0000010: 0000 587f 0000 0100 0000 6b67 0000 0100 ..X.......kg....
0000020: 0000 0000 0000 587f 0000 0000 0000 0000 ......X.........
0000030: 0000 e880 0000 0000 0000 0300 4851 5d55 ............HQ]U
0000040: 3a00 0a00 109a 0000 f401 0000 d098 0000 :...............
0000050: f401 0000 ce9a 0000 0000 0000 0000 0000 ................
0000060: 0000 0000 0000 0000 0000 0000 069a 0000 ................
0000070: 0000 0000 0300 4651 5d55 3a00 0d00 a80a ......FQ]U:.....
0000080: 0200 0500 0000 601b 0200 6400 0000 1412 ......`...d.....
0000090: 0200 0500 0000 0000 0000 1412 0200 0000 ................
00000a0: 0000 0000 0000 ac0b 0200 0000 0000 0300 ................
00000b0: 4851 5d55 3a00 1300 f8f2 0000 2800 0000 HQ]U:.......(...
00000c0: 56c2 0000 3200 0000 fbf9 0000 0000 0000 V...2...........
00000d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000e0: a3f2 0000 0000 0000 0300 4851 5d55 3a00 ..........HQ]U:.
00000f0: 1600 8c49 0200 0500 0000 cc4f 0200 0a00 ...I.......O....
0000100: 0000 cc4f 0200 0500 0000 0000 0000 cc4f ...O...........O
0000110: 0200 0000 0000 0000 0000 124b 0200 0000 ...........K....
Still doesn't make much sense.
The biggest issue I see is that there's more to unpacking binary data than just knowing "short" or "long".
For numeric values, you need to specify whether or not the byte data is in little or big endian order. You also need to know whether or not you're dealing with signed or unsigned values.
For this example, I'm just going to assume everything is in little endian and unsigned: which is probably wrong, but it's up to you to tweak the pack templates once you find it out. In case you need a link, try http://perldoc.perl.org/functions/pack.html
I haven't tested this on my machine, so pardon if there are any errors, but this is roughly how I would go about what you're trying to do.
#!/usr/bin/perl
use strict;
use warnings;

# Open in raw mode so no newline or encoding translation touches the bytes
open my $in, '<:raw', 'path/to/file.ext' or die "Cannot open file: $!";
read($in, my $raw_header, 8) == 8 or die "Short read on header"; # read the 8-byte header
my @header = unpack("vVv", $raw_header); # transcode, timestamp, message length
read($in, my $raw_data, 50) == 50 or die "Short read on data"; # read the 50-byte data block
my @data = unpack("vV12", $raw_data); # same as "vVVVVVVVVVVVV"; assumes everything is little endian and unsigned
print join("\n", @header, @data), "\n"; # print all the values in order on their own lines
close $in;
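For comparison, here is the same idea as a Python sketch that walks every record in a buffer. In the xxd output the third header field reads 3a 00, which is 58 as a little-endian short and equals 8 + 50, so little endian looks plausible for this file. Signedness is still an assumption, and the field names below are just the template's labels turned into identifiers.

```python
import struct

HEADER = struct.Struct("<HIH")   # transcode, timestamp, message length (8 bytes)
DATA = struct.Struct("<H12I")    # security token, then twelve 4-byte values (50 bytes)
RECORD_LEN = HEADER.size + DATA.size  # 58

DATA_FIELDS = (
    "security_token", "last_traded_price", "best_buy_quantity", "best_buy_price",
    "best_sell_quantity", "best_sell_price", "total_traded_quantity",
    "average_traded_price", "open_price", "high_price", "low_price",
    "close_price", "filler",
)


def parse_records(buf: bytes) -> list:
    """Decode consecutive 58-byte records; a trailing partial record is ignored."""
    records = []
    for off in range(0, len(buf) - RECORD_LEN + 1, RECORD_LEN):
        transcode, timestamp, msg_len = HEADER.unpack_from(buf, off)
        record = {
            "transcode": transcode,
            "timestamp": timestamp,
            "message_length": msg_len,
        }
        # Pair each template field name with the value unpacked at that position
        record.update(zip(DATA_FIELDS, DATA.unpack_from(buf, off + HEADER.size)))
        records.append(record)
    return records
```

Typical use would be `parse_records(open("1.bin", "rb").read())`, then printing each dictionary. If the prices come out wrong, switching `I` to `i` (signed) in the format strings is the first thing to try.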

RIFF WAV header format 2014 update?

I'm trying to decode and play WAV files in Perl for further operations. I have found some references about the format, and some interesting Q&A:
What does a audio frame contain?
error in reading a wav file with C++
Writing musical notes to a wav file
I found the "Canonical WAVE file format" description.
But in the end I'm testing 2 different WAV files that don't follow the "standard". MPlayer has no problems at all reading the data, and I figured out a workaround in my Perl code:
sysread WAV, $riff, 12;
sysread WAV, $fmt, 24;
do{
sysread WAV, $wtf, 2;
}while( unpack("A4",$wtf) ne "da" );
sysread WAV, $wtf, 2;
#94
sysread WAV, $data, 4;
Still, it troubles me that I don't know how it really works, and what the variable-length data between the "bits per sample" field and the "data" field is.
Thank you guys!
(I'm getting addicted to this forums)
test2.wav
v--------- riff --------------v---------
0000000 4952 4646 685e 0931 4157 4556 6d66 2074
-------------- fmt --------------------
0000010 0028 0000 fffe 0006 bb80 0000 ca00 0008
---------v-----------------------------
0000020 000c 0010 0016 0010 060f 0000 0001 0000
---------------------------------------
0000030 0000 0010 0080 aa00 3800 719b 494c 5453
---------------------------------------
0000040 001a 0000 4e49 4f46 5349 5446 000e 0000
----------------------------------v----
0000050 614c 6676 3535 312e 2e39 3031 0034 6164
----v---------v
0000060 6174 6800 0931 0000 0000 0000 0000 0000
0000070 0000 0000 0000 0000 0000 0000 0000 0000
test.wav
v--------- riff --------------v---------
0000000 4952 4646 7048 095b 4157 4556 6d66 2074
-------------- fmt --------------------
0000010 0012 0000 0001 0002 ac44 0000 b110 0002
---------v-----------------------------
0000020 0004 0010 0000 494c 5453 001a 0000 4e49
---------------------------------------
0000030 4f46 5349 5446 000e 0000 614c 6676 3535
-------------------v---------v---------v
0000040 312e 2e39 3031 0034 6164 6174 7000 095b
0000050 0000 0000 0000 0000 0000 0000 0000 0000
The AudioFormat field in test2.wav is 0xfffe, which indicates that the header is a WAVEFORMATEXTENSIBLE. When this happens, you need to interpret the rest of the header differently (field sizes in bytes):
AudioFormat : 2
NumChannels : 2
SampleRate : 4
ByteRate : 4
BlockAlign : 2
BitsPerSample : 2
cbSize : 2 - size of the rest of the chunk
ValidBitsPerSample : 2
ChannelMask : 4
SubFormat : 16 - GUID
For more information, look at the documentation for WAVEFORMATEX and WAVEFORMATEXTENSIBLE.
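As for the variable data before "data": both dumps show a "fmt " chunk whose size is given right after the tag (0x28 in test2.wav, 0x12 in test.wav), followed by a "LIST" chunk of 0x1a bytes, and only then the "data" chunk. Rather than scanning for "da", a reader can walk the chunks using their size fields. Here is a minimal Python sketch of that, with hypothetical function names. The 2-byte field read after cbSize is wValidBitsPerSample in Microsoft's WAVEFORMATEXTENSIBLE structure.

```python
import struct


def iter_chunks(buf: bytes):
    """Yield (chunk_id, payload) for each chunk inside a RIFF/WAVE byte string."""
    riff, _size, wave = struct.unpack_from("<4sI4s", buf, 0)
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(buf):
        chunk_id, size = struct.unpack_from("<4sI", buf, pos)
        yield chunk_id, buf[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # chunks are padded to an even length


def parse_fmt(payload: bytes) -> dict:
    """Decode the common 16 bytes of a "fmt " chunk, plus the extensible part."""
    names = ("audio_format", "num_channels", "sample_rate",
             "byte_rate", "block_align", "bits_per_sample")
    fmt = dict(zip(names, struct.unpack_from("<HHIIHH", payload, 0)))
    if fmt["audio_format"] == 0xFFFE and len(payload) >= 40:
        cb_size, valid_bits, channel_mask = struct.unpack_from("<HHI", payload, 16)
        fmt.update(cb_size=cb_size, valid_bits_per_sample=valid_bits,
                   channel_mask=channel_mask, sub_format=payload[24:40])
    return fmt
```

With this, unknown chunks such as "LIST" are simply skipped, and the sample data is whatever payload comes back with the id `b"data"`.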