What exactly is the info_Hash in a torrent file - hash

I have been reading a lot lately about hashes in torrents, magnet links, etc., but there is one thing I don't understand.
I have:
the hash of a file
and the infohash of a torrent
Is the infohash the same as the hash of the file?
If yes, what if the torrent describes 6 files to download?
If no, what does it stand for?

So I finally figured it out.
The “infohash” is the SHA1 Hash over the part of a torrent file that includes:
ITEM: length(size) and path (path with filename)
Name: The name to search for
Piece length: The length(size) of a single piece
Pieces: SHA1 Hash of EVERY piece of this torrent
Private: flag for restricted access
To show this more clearly, I took a random torrent file and opened it in the “BEncode Editor” from Ultima.
In the editor, the info part of the torrent file is the section that gets hashed (it was marked with a red box in the original screenshot).
The torrent file does not contain a hash of each item, but the hash of every piece.
For item1 with: 1069496548
and item2 with: 223
It is together: 1069496771
With a piece size of: 524288
There are 2040 pieces (1069496771 / 524288 ≈ 2039.9032, rounded up because the last piece is shorter).
The pieces section contains 40800 bytes of data, which are shown as 81600 + 2 characters;
the +2 is the 0x prefix that marks the value as hexadecimal.
A SHA1 hash is 40 hex characters, or 20 bytes of data, so 40800 bytes are 2040 SHA1 hashes.
I am sorry that this information comes from a torrent that leads to an illegal movie, but I wanted to use a torrent that really exists.
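The arithmetic above can be checked with a short Python sketch. The sizes are the ones quoted in this answer; the variable names are only illustrative.

```python
# Sizes quoted above, in bytes.
item_sizes = [1069496548, 223]
piece_length = 524288
SHA1_DIGEST_BYTES = 20

total_size = sum(item_sizes)
# Integer ceiling division: the last piece is usually shorter than the rest.
piece_count = (total_size + piece_length - 1) // piece_length
# The "pieces" value is the concatenation of one 20-byte SHA1 digest per piece.
pieces_field_bytes = piece_count * SHA1_DIGEST_BYTES

print(total_size)          # 1069496771
print(piece_count)         # 2040
print(pieces_field_bytes)  # 40800
```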

I wanted to add another example, slightly more concrete.
We start with one of the smallest .torrent files I have:
64 34 3A 69 6E 66 6F 64 35 3A 66 69 6C 65 73 6C 64 36 3A 6C 65 6E 67 74
68 69 36 31 35 65 34 3A 70 61 74 68 6C 32 36 3A 66 72 65 65 20 61 75 64
69 6F 62 6F 6F 6B 20 76 65 72 73 69 6F 6E 2E 74 78 74 65 65 64 36 3A 6C
65 6E 67 74 68 69 33 39 33 34 31 37 65 34 3A 70 61 74 68 6C 36 31 3A 57
61 72 63 72 61 66 74 5F 20 4F 66 66 69 63 69 61 6C 20 4D 6F 76 69 65 20
4E 6F 76 65 6C 69 7A 61 74 69 6F 6E 20 62 79 20 43 68 72 69 73 74 69 65
20 47 6F 6C 64 65 6E 2E 65 70 75 62 65 65 65 34 3A 6E 61 6D 65 36 31 3A
57 61 72 63 72 61 66 74 5F 20 4F 66 66 69 63 69 61 6C 20 4D 6F 76 69 65
20 4E 6F 76 65 6C 69 7A 61 74 69 6F 6E 20 62 79 20 43 68 72 69 73 74 69
65 20 47 6F 6C 64 65 6E 20 45 50 55 42 31 32 3A 70 69 65 63 65 20 6C 65
6E 67 74 68 69 31 30 34 38 35 37 36 65 36 3A 70 69 65 63 65 73 32 30 3A
43 92 4C 22 BB 42 9E EA BD FF 66 C6 79 4C 29 E4 F9 D0 F3 B9 65 65
If we decode the BEncoding:
64 ; DICTIONARY (d)
| 34 3A 69 6E 66 6F ; - 4:info
| 64 ; - DICTIONARY (d)
| | 35 3A 66 69 6C 65 73 ; - 5:files
| | 6C ; - LIST (l)
| | | 64 ; - DICTIONARY (d)
| | | | 36 3A 6C 65 6E 67 74 68 ; - 6:length
| | | | 69 36 31 35 65 ; - i615e
| | | | 34 3A 70 61 74 68 ; - 4:path
| | | | 6C ; - LIST (l)
| | | | | 32 36 3A 66 72 65 65 20 61 75 ; - 26:free audiobook version.txt
| | | | | 64 69 6F 62 6F 6F 6B 20 76 65 ;
| | | | | 72 73 69 6F 6E 2E 74 78 74 ;
| | | | 65 ; - END (e)
| | | 65 ; - END (e)
| | | 64 ; - DICTIONARY (d)
| | | | 36 3A 6C 65 6E 67 74 68 ; - 6:length
| | | | 69 33 39 33 34 31 37 65 ; - i393417e
| | | | 34 3A 70 61 74 68 ; - 4:path
| | | | 6C ; - LIST (l)
| | | | | 36 31 3A 57 61 72 63 72 61 66 ; - 61:Warcraft_ Official Movie Novelization by Christie Golden.epub
| | | | | 74 5F 20 4F 66 66 69 63 69 61 ;
| | | | | 6C 20 4D 6F 76 69 65 20 4E 6F ;
| | | | | 76 65 6C 69 7A 61 74 69 6F 6E ;
| | | | | 20 62 79 20 43 68 72 69 73 74 ;
| | | | | 69 65 20 47 6F 6C 64 65 6E 2E ;
| | | | | 65 70 75 62 ;
| | | | 65 ; - END (e)
| | | 65 ; - END (e)
| | 65 ; - END (e)
| | 34 3A 6E 61 6D 65 ; - 4:name
| | 36 31 3A 57 61 72 63 72 61 66 ; - 61:Warcraft_ Official Movie Novelization by Christie Golden EPUB
| | 74 5F 20 4F 66 66 69 63 69 61 ;
| | 6C 20 4D 6F 76 69 65 20 4E 6F ;
| | 76 65 6C 69 7A 61 74 69 6F 6E ;
| | 20 62 79 20 43 68 72 69 73 74 ;
| | 69 65 20 47 6F 6C 64 65 6E 20 ;
| | 45 50 55 42 ;
| | 31 32 3A 70 69 65 63 65 20 6C ; - 12:piece length
| | 65 6E 67 74 68 ;
| | 69 31 30 34 38 35 37 36 65 ; - i1048576e
| | 36 3A 70 69 65 63 65 73 ; - 6:pieces
| | 32 30 3A 43 92 4C 22 BB 42 9E ; - 20:43 92 4C 22 BB 42 9E EA BD FF 66 C6 79 4C 29 E4 F9 D0 F3 B9
| | EA BD FF 66 C6 79 4C 29 E4 F9 ;
| | D0 F3 B9 ;
| 65 ; - END (e)
65 ; - END (e)
Or, in pseudo-json:
{
info: {
files: [
{ length: 615, path: ["free audiobook version.txt"] },
{ length: 393417, path: ["Warcraft_ Official Movie Novelization by Christie Golden.epub"] }
],
name: "Warcraft_ Official Movie Novelization by Christie Golden EPUB",
"piece length": 1048576,
pieces: 43 92 4C 22 BB 42 9E EA BD FF 66 C6 79 4C 29 E4 F9 D0 F3 B9
}
}
InfoHash is hash of the info
The InfoHash is the SHA-1 hash of the info dictionary contents.
We want the SHA-1 hash of the value of the info dictionary key:
64 ; - DICTIONARY (d)
| 35 3A 66 69 6C 65 73 ; - 5:files
| 6C ; - LIST (l)
| | 64 ; - DICTIONARY (d)
| | | 36 3A 6C 65 6E 67 74 68 ; - 6:length
| | | 69 36 31 35 65 ; - i615e
| | | 34 3A 70 61 74 68 ; - 4:path
| | | 6C ; - LIST (l)
| | | | 32 36 3A 66 72 65 65 20 61 75 ; - 26:free audiobook version.txt
| | | | 64 69 6F 62 6F 6F 6B 20 76 65 ;
| | | | 72 73 69 6F 6E 2E 74 78 74 ;
| | | 65 ; - END (e)
| | 65 ; - END (e)
| | 64 ; - DICTIONARY (d)
| | | 36 3A 6C 65 6E 67 74 68 ; - 6:length
| | | 69 33 39 33 34 31 37 65 ; - i393417e
| | | 34 3A 70 61 74 68 ; - 4:path
| | | 6C ; - LIST (l)
| | | | 36 31 3A 57 61 72 63 72 61 66 ; - 61:Warcraft_ Official Movie Novelization by Christie Golden.epub
| | | | 74 5F 20 4F 66 66 69 63 69 61 ;
| | | | 6C 20 4D 6F 76 69 65 20 4E 6F ;
| | | | 76 65 6C 69 7A 61 74 69 6F 6E ;
| | | | 20 62 79 20 43 68 72 69 73 74 ;
| | | | 69 65 20 47 6F 6C 64 65 6E 2E ;
| | | | 65 70 75 62 ;
| | | 65 ; - END (e)
| | 65 ; - END (e)
| 65 ; - END (e)
| 34 3A 6E 61 6D 65 ; - 4:name
| 36 31 3A 57 61 72 63 72 61 66 ; - 61:Warcraft_ Official Movie Novelization by Christie Golden EPUB
| 74 5F 20 4F 66 66 69 63 69 61 ;
| 6C 20 4D 6F 76 69 65 20 4E 6F ;
| 76 65 6C 69 7A 61 74 69 6F 6E ;
| 20 62 79 20 43 68 72 69 73 74 ;
| 69 65 20 47 6F 6C 64 65 6E 20 ;
| 45 50 55 42 ;
| 31 32 3A 70 69 65 63 65 20 6C ; - 12:piece length
| 65 6E 67 74 68 ;
| 69 31 30 34 38 35 37 36 65 ; - i1048576e
| 36 3A 70 69 65 63 65 73 ; - 6:pieces
| 32 30 3A 43 92 4C 22 BB 42 9E ; - 20:43 92 4C 22 BB 42 9E EA BD FF 66 C6 79 4C 29 E4 F9 D0 F3 B9
| EA BD FF 66 C6 79 4C 29 E4 F9 ;
| D0 F3 B9 ;
65 ; - END (e)
We run all these bytes together:
64 35 3A 66 69 6C 65 73 6C 64 36 3A 6C 65 6E 67 74 68 69 36 31 35 65 34
3A 70 61 74 68 6C 32 36 3A 66 72 65 65 20 61 75 64 69 6F 62 6F 6F 6B 20
76 65 72 73 69 6F 6E 2E 74 78 74 65 65 64 36 3A 6C 65 6E 67 74 68 69 33
39 33 34 31 37 65 34 3A 70 61 74 68 6C 36 31 3A 57 61 72 63 72 61 66 74
5F 20 4F 66 66 69 63 69 61 6C 20 4D 6F 76 69 65 20 4E 6F 76 65 6C 69 7A
61 74 69 6F 6E 20 62 79 20 43 68 72 69 73 74 69 65 20 47 6F 6C 64 65 6E
2E 65 70 75 62 65 65 65 34 3A 6E 61 6D 65 36 31 3A 57 61 72 63 72 61 66
74 5F 20 4F 66 66 69 63 69 61 6C 20 4D 6F 76 69 65 20 4E 6F 76 65 6C 69
7A 61 74 69 6F 6E 20 62 79 20 43 68 72 69 73 74 69 65 20 47 6F 6C 64 65
6E 20 45 50 55 42 31 32 3A 70 69 65 63 65 20 6C 65 6E 67 74 68 69 31 30
34 38 35 37 36 65 36 3A 70 69 65 63 65 73 32 30 3A 43 92 4C 22 BB 42 9E
EA BD FF 66 C6 79 4C 29 E4 F9 D0 F3 B9 65
And then take the SHA-1 hash to generate a 160-bit (20-byte) digest:
7EDA978ED7628595BB91C48B947F025BAE78CB77
Which is the right answer.
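As a rough Python sketch of the same step, the info dictionary can be rebuilt from its decoded parts and hashed with hashlib. The variable names are illustrative. If the bytes were transcribed faithfully, the printed digest should be the one quoted above; that has not been verified independently here.

```python
import hashlib

# The 20 raw bytes of the "pieces" value, written as hex.
pieces = bytes.fromhex("43924C22BB429EEABDFF66C6794C29E4F9D0F3B9")

# The value of the "info" key, from its opening 'd' to its matching 'e'.
info_bytes = (
    b"d"
    b"5:files"
    b"l"
    b"d6:lengthi615e4:pathl26:free audiobook version.txtee"
    b"d6:lengthi393417e4:pathl61:Warcraft_ Official Movie Novelization by Christie Golden.epubee"
    b"e"
    b"4:name61:Warcraft_ Official Movie Novelization by Christie Golden EPUB"
    b"12:piece lengthi1048576e"
    b"6:pieces20:" + pieces + b"e"
)

# SHA-1 over exactly these bytes gives the 20-byte infohash.
info_hash = hashlib.sha1(info_bytes).hexdigest().upper()
print(len(info_bytes))
print(info_hash)
```

Note that the outer dictionary's `d`, the `4:info` key and the final `e` of the file are not part of the hashed bytes.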

Here's how to pull the pertinent segment of a *.torrent datum for a bittorrent “info hash”.
I made this for an example.
0000000: 6438 3A61 6E6E 6F75 6E63 6530 3A31 303A d8:announce0:10:
0000010: 6372 6561 7465 6420 6279 3133 3A6D 6B74 created by13:mkt
0000020: 6F72 7265 6E74 2031 2E30 3133 3A63 7265 orrent 1.013:cre
0000030: 6174 696F 6E20 6461 7465 6931 3537 3037 ation datei15707
0000040: 3530 3238 3565 343A 696E 666F 6436 3A6C 50285e4:infod6:l
0000050: 656E 6774 6869 3230 6534 3A6E 616D 6534 engthi20e4:name4
0000060: 3A70 7269 7631 323A 7069 6563 6520 6C65 :priv12:piece le
0000070: 6E67 7468 6932 3632 3134 3465 363A 7069 ngthi262144e6:pi
0000080: 6563 6573 3230 3AF1 D7EE 4236 3434 D06F eces20:...B644.o
0000090: 27C4 BBAD 87F0 F089 7A22 2B37 3A70 7269 '.......z"+7:pri
00000a0: 7661 7465 6931 6565 65 vatei1eee
The content of the “info” key is between (inclusive) offsets 0x4C and 0xA7. So…
#!/crit/shell/bsh
bbe \
-e '
d 0x0 0x4C ;
d 0xA8 * ;
' \
${example} \
|
shasum -a 1 -b
You should see this:
1799a58b9f8ff2b9b9bcecd0d438c5f37f19a31c *-
Here is the xxd output, in lieu of shasum, for more elucidation:
0000000: 6436 3A6C 656E 6774 6869 3230 6534 3A6E d6:lengthi20e4:n
0000010: 616D 6534 3A70 7269 7631 323A 7069 6563 ame4:priv12:piec
0000020: 6520 6C65 6E67 7468 6932 3632 3134 3465 e lengthi262144e
0000030: 363A 7069 6563 6573 3230 3AF1 D7EE 4236 6:pieces20:...B6
0000040: 3434 D06F 27C4 BBAD 87F0 F089 7A22 2B37 44.o'.......z"+7
0000050: 3A70 7269 7661 7465 6931 6565 :privatei1ee
You can refer to The BitTorrent Protocol Specification for an explanation, albeit a terse and rather grammatically inelegant one, as to their nomenclature and why the final 0x65 needs be excluded.
Concisely: the entire datum is encased in a pair of US-ASCII ‘d’ and ‘e’; the content of the “info” key, or field, is similarly so encased. You want everything between the first 0x64 — ‘d’, — which succeeds the US-ASCII string 4:info, and the terminal 0x65 — ‘e’, — which is paired with the aforementioned 0x64.
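Instead of reading the offsets off a hex dump, the span of the “info” value can be found by walking the bencoded structure. The following Python sketch does this for the example file above; `find_info_span` and `_skip_value` are hypothetical helper names, and the parser does no validation beyond what it needs.

```python
import hashlib

def _skip_value(data, i):
    """Return the index just past the bencoded value that starts at data[i]."""
    c = data[i:i + 1]
    if c == b"i":                      # integer: i<digits>e
        return data.index(b"e", i) + 1
    if c in (b"l", b"d"):              # list or dictionary: skip items until 'e'
        i += 1
        while data[i:i + 1] != b"e":
            i = _skip_value(data, i)
        return i + 1
    colon = data.index(b":", i)        # byte string: <length>:<bytes>
    return colon + 1 + int(data[i:colon])

def find_info_span(data):
    """Return (start, end) so that data[start:end] is the value of the 'info' key."""
    if data[:1] != b"d":
        raise ValueError("not a bencoded dictionary")
    i = 1
    while data[i:i + 1] != b"e":
        colon = data.index(b":", i)
        key_end = colon + 1 + int(data[i:colon])
        key = data[colon + 1:key_end]
        value_end = _skip_value(data, key_end)
        if key == b"info":
            return key_end, value_end
        i = value_end
    raise ValueError("no info key found")

# The example file from the hex dump above.
example = (
    b"d8:announce0:10:created by13:mktorrent 1.013:creation datei1570750285e"
    b"4:infod6:lengthi20e4:name4:priv12:piece lengthi262144e6:pieces20:"
    + bytes.fromhex("F1D7EE42363434D06F27C4BBAD87F0F0897A222B")
    + b"7:privatei1eee"
)

start, end = find_info_span(example)
print(hex(start), hex(end - 1))   # first and last offset of the info value
print(hashlib.sha1(example[start:end]).hexdigest())
```

Walking the structure avoids searching for the text `4:info`, which could also occur inside some other string value.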

Why can't I find lines with two characters with Select-String

To find every line containing "-" in the output of wsl --help, these lines work:
wsl --help | Select-String -Pattern "-"
wsl --help | Select-String "-"
Now I try a more complicated pattern: "--"
wsl --help | Select-String -Pattern "--"
wsl --help | Select-String "--"
Nothing is returned, although there are lines with this pattern. Why?
Update:
wsl --help | Select-String "--" -SimpleMatch
doesn't work either.
Yep, wsl outputs UTF-16LE ("Unicode"), so every second byte is null.
wsl --help | select -first 1 | format-hex
Label: String (System.String) <09F5DDB6>
Offset Bytes Ascii
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
------ ----------------------------------------------- -----
0000000000000000 43 00 6F 00 70 00 79 00 72 00 69 00 67 00 68 00 C o p y r i g h
0000000000000010 74 00 20 00 28 00 63 00 29 00 20 00 4D 00 69 00 t ( c ) M i
0000000000000020 63 00 72 00 6F 00 73 00 6F 00 66 00 74 00 20 00 c r o s o f t
0000000000000030 43 00 6F 00 72 00 70 00 6F 00 72 00 61 00 74 00 C o r p o r a t
0000000000000040 69 00 6F 00 6E 00 2E 00 20 00 41 00 6C 00 6C 00 i o n . A l l
0000000000000050 20 00 72 00 69 00 67 00 68 00 74 00 73 00 20 00 r i g h t s
0000000000000060 72 00 65 00 73 00 65 00 72 00 76 00 65 00 64 00 r e s e r v e d
0000000000000070 2E 00 .
"`0" means null. In powershell 7, the matches are highlighted.
wsl --help | Select-String -Pattern "-`0-" | select -first 1
--exec, -e <CommandLine>
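The same effect can be reproduced outside PowerShell. This Python sketch takes the sample line from the output above, encodes it as UTF-16LE, and shows why a search for two adjacent hyphens fails on the undecoded data.

```python
# One line of the help text, encoded the way wsl emits it (UTF-16LE).
raw = "    --exec, -e <CommandLine>\r\n".encode("utf-16-le")

# Each character is followed by a null byte, so "--" is never adjacent.
print(b"--" in raw)           # False
print(b"-\x00-\x00" in raw)   # True

# Decoding with the right encoding makes the plain pattern match again.
text = raw.decode("utf-16-le")
print("--" in text)           # True
```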

How to correct encoding that went wrong

I have a VBscript code that processes utf-8 files.
It works perfectly, except that sometimes the input file is Unicode-LE (UTF-16LE), even though the script is clearly labeled as being for UTF-8 files.
This of course produces corrupt output. I am adding a check for the BOM to ensure that no Unicode-LE files are opened incorrectly, but I already have files that were corrupted this way.
Is there a way to cleanly revert the damage, that is, to read the file back "incorrectly" and save it correctly?
Here is the code:
Private Sub UnicodeToUTF8(ByVal InFName, ByVal OutFName)
    Dim strText
    With CreateObject("ADODB.Stream")
        .Open
        .Type = adTypeBinary
        .LoadFromFile InFName
        .Type = adTypeText
        .Charset = "utf-8"
        'Read Unicode source file
        strText = .ReadText(adReadAll)
        'Process file
        strText = OffsetTCs(strText)
        'Output UTF-8 file
        .Position = 0
        .SetEOS
        .Charset = "utf-8"
        .WriteText strText, adWriteChar
        .SaveToFile OutFName, adSaveCreateOverWrite
        .Close
    End With
End Sub
Edit:
I tried this script to save the day, but it reports an error on the file.Write data line. It does show the ASCII content properly in the message box, but not the Chinese characters:
Dim fso, file, data
Set fso = CreateObject("Scripting.FileSystemObject")
Set file = fso.OpenTextFile("damaged_Test.sub", 1, False, -1)
data = ""
data = file.ReadAll
MsgBox(data)
Set file = fso.OpenTextFile("output.txt", 2, True)
file.Write data
Here is the hex dump of the damaged file:
EF BB BF EF BF BD EF BF BD 5B 00 53 00 63 00 72 00 69 00 70 00 74 00 20 00 49 00 6E 00 66 00 6F 00 5D 00 0D 00 0A 00 3B 00 0D 00 0A 00 54 00 69 00 74 00 6C 00 65 00 3A 00 20 00 20 00 28 00 29 00 0D 00 0A 00 4F 00 72 00 69 00 67 00 69 00 6E 00 61 00 6C 00 20 00 53 00 63 00 72 00 69 00 70 00 74 00 3A 00 20 00 0D 00 0A 00 4F 00 72 00 69 00 67 00 69 00 6E 00 61 00 6C 00 20 00 54 00 69 00 6D 00 69 00 6E 00 67 00 3A 00 20 00 0D 00 0A 00 53 00 63 00 72 00 69 00 70 00 74 00 54 00 79 00 70 00 65 00 3A 00 20 00 76 00 34 00 2E 00 30 00 38 00 0D 00 0A 00 43 00 6F 00 6C 00 6C 00 69 00 73 00 69 00 6F 00 6E 00 73 00 3A 00 20 00 4E 00 6F 00 72 00 6D 00 61 00 6C 00 0D 00 0A 00 50 00 6C 00 61 00 79 00 52 00 65 00 73 00 58 00 3A 00 20 00 31 00 32 00 38 00 30 00 0D 00 0A 00 50 00 6C 00 61 00 79 00 52 00 65 00 73 00 59 00 3A 00 20 00 37 00 32 00 30 00 0D 00 0A 00 50 00 6C 00 61 00 79 00 44 00 65 00 70 00 74 00 68 00 3A 00 20 00 30 00 0D 00 0A 00 54 00 69 00 6D 00 65 00 72 00 3A 00 20 00 31 00 30 00 30 00 2E 00 30 00 30 00 30 00 30 00 0D 00 0A 00 0D 00 0A 00 5B 00 56 00 34 00 20 00 53 00 74 00 79 00 6C 00 65 00 73 00 5D 00 0D 00 0A 00 46 00 6F 00 72 00 6D 00 61 00 74 00 3A 00 20 00 4E 00 61 00 6D 00 65 00 2C 00 20 00 46 00 6F 00 6E 00 74 00 6E 00 61 00 6D 00 65 00 2C 00 20 00 46 00 6F 00 6E 00 74 00 73 00 69 00 7A 00 65 00 2C 00 20 00 50 00 72 00 69 00 6D 00 61 00 72 00 79 00 43 00 6F 00 6C 00 6F 00 75 00 72 00 2C 00 20 00 53 00 65 00 63 00 6F 00 6E 00 64 00 61 00 72 00 79 00 43 00 6F 00 6C 00 6F 00 75 00 72 00 2C 00 20 00 54 00 65 00 72 00 74 00 69 00 61 00 72 00 79 00 43 00 6F 00 6C 00 6F 00 75 00 72 00 2C 00 20 00 42 00 61 00 63 00 6B 00 43 00 6F 00 6C 00 6F 00 75 00 72 00 2C 00 20 00 42 00 6F 00 6C 00 64 00 2C 00 20 00 49 00 74 00 61 00 6C 00 69 00 63 00 2C 00 20 00 42 00 6F 00 72 00 64 00 65 00 72 00 53 00 74 00 79 00 6C 00 65 00 2C 00 20 00 4F 00 75 00 74 00 6C 00 69 00 6E 00 65 00 2C 00 20 00 53 00 68 00 61 00 64 00 6F 00 77 00 2C 00 20 00 41 00 6C 00 69 00 67 00 6E 00 6D 00 65 
00 6E 00 74 00 2C 00 20 00 4D 00 61 00 72 00 67 00 69 00 6E 00 4C 00 2C 00 20 00 4D 00 61 00 72 00 67 00 69 00 6E 00 52 00 2C 00 20 00 4D 00 61 00 72 00 67 00 69 00 6E 00 56 00 2C 00 20 00 41 00 6C 00 70 00 68 00 61 00 4C 00 65 00 76 00 65 00 6C 00 2C 00 20 00 45 00 6E 00 63 00 6F 00 64 00 69 00 6E 00 67 00 0D 00 0A 00 53 00 74 00 79 00 6C 00 65 00 3A 00 20 00 50 00 75 00 62 00 6C 00 69 00 63 00 2C 00 44 00 46 00 4B 00 61 00 69 00 53 00 68 00 75 00 20 00 53 00 74 00 64 00 20 00 57 00 35 00 2C 00 35 00 32 00 2C 00 31 00 32 00 35 00 37 00 31 00 38 00 37 00 32 00 2C 00 31 00 32 00 35 00 37 00 31 00 38 00 37 00 32 00 2C 00 31 00 32 00 35 00 37 00 31 00 38 00 37 00 32 00 2C 00 2D 00 32 00 31 00 34 00 37 00 34 00 38 00 33 00 36 00 34 00 30 00 2C 00 2D 00 31 00 2C 00 30 00 2C 00 31 00 2C 00 33 00 2C 00 33 00 2C 00 32 00 2C 00 31 00 32 00 38 00 2C 00 31 00 32 00 38 00 2C 00 37 00 32 00 2C 00 30 00 2C 00 31 00 33 00 36 00 0D 00 0A 00 0D 00 0A 00 5B 00 45 00 76 00 65 00 6E 00 74 00 73 00 5D 00 0D 00 0A 00 46 00 6F 00 72 00 6D 00 61 00 74 00 3A 00 20 00 4D 00 61 00 72 00 6B 00 65 00 64 00 2C 00 20 00 53 00 74 00 61 00 72 00 74 00 2C 00 20 00 45 00 6E 00 64 00 2C 00 20 00 53 00 74 00 79 00 6C 00 65 00 2C 00 20 00 4E 00 61 00 6D 00 65 00 2C 00 20 00 4D 00 61 00 72 00 67 00 69 00 6E 00 4C 00 2C 00 20 00 4D 00 61 00 72 00 67 00 69 00 6E 00 52 00 2C 00 20 00 4D 00 61 00 72 00 67 00 69 00 6E 00 56 00 2C 00 20 00 45 00 66 00 66 00 65 00 63 00 74 00 2C 00 20 00 54 00 65 00 78 00 74 00 0D 00 0A 00 44 00 69 00 61 00 6C 00 6F 00 67 00 75 00 65 00 3A 00 20 00 4D 00 61 00 72 00 6B 00 65 00 64 00 3D 00 30 00 2C 00 30 00 3A 00 30 00 30 00 3A 00 30 00 38 00 2E 00 36 00 30 00 2C 00 30 00 3A 00 30 00 30 00 3A 00 31 00 31 00 2E 00 31 00 30 00 2C 00 50 00 75 00 62 00 6C 00 69 00 63 00 2C 00 30 00 31 00 2C 00 30 00 30 00 30 00 30 00 2C 00 30 00 30 00 30 00 30 00 2C 00 30 00 30 00 30 00 30 00 2C 00 2C 00 EF BF BD 65 74 5E EF BF BD 5F 02 6A 0D 00 0A 00 44 00 69 00 61 00 6C 00 6F 00 67 00 75 00 65 
00 3A 00 20 00 4D 00 61 00 72 00 6B 00 65 00 64 00 3D 00 30 00 2C 00 30 00 3A 00 30 00 30 00 3A 00 31 00 31 00 2E 00 32 00 30 00 2C 00 30 00 3A 00 30 00 30 00 3A 00 31 00 35 00 2E 00 31 00 33 00 2C 00 50 00 75 00 62 00 6C 00 69 00 63 00 2C 00 30 00 31 00 2C 00 30 00 30 00 30 00 30 00 2C 00 30 00 30 00 30 00 30 00 2C 00 30 00 30 00 30 00 30 00 2C 00 2C 00 EF BF BD 65 74 5E EF BF BD 5F 02 6A 0D 00 0A
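To see what kind of damage this is, the mishap can be imitated in Python. This is only a sketch: Python's UTF-8 decoder stands in for ADODB.Stream, on the assumption (suggested by the EF BF BD sequences in the dump) that invalid bytes are replaced with U+FFFD in a similar way. The sample strings are hypothetical.

```python
def damage(text):
    """Imitate the mishap: UTF-16LE bytes read as UTF-8, then saved as UTF-8 with BOM."""
    utf16_bytes = b"\xff\xfe" + text.encode("utf-16-le")      # BOM + UTF-16LE data
    misread = utf16_bytes.decode("utf-8", errors="replace")  # invalid bytes become U+FFFD
    return misread.encode("utf-8-sig")                        # written back with a UTF-8 BOM

# ASCII characters survive as "byte, 00", matching the start of the dump.
print(damage("[S").hex(" ").upper())

# Bytes >= 0x80 are replaced by EF BF BD, so different characters
# can end up as the same damaged bytes.
print(damage("\u65b0") == damage("\u65af"))
```

In this sketch the ASCII range stays recoverable (drop the null bytes), while any UTF-16 byte of 0x80 or above is replaced and its original value is no longer present in the damaged file.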

Output binary data on PowerShell pipeline

I need to pipe some data to a program's stdin:
First 4 bytes are a 32-bit unsigned int representing the length of the data. These 4 bytes are exactly the same as C would store an unsigned int in memory. I refer to this as binary data.
Remaining bytes are the data.
In C, this is trivial:
WriteFile(h, &cb, 4); // cb is a 4 byte integer
WriteFile(h, pData, cb);
or
fwrite(&cb, sizeof(cb), 1, pFile);
fwrite(pData, cb, 1, pFile);
or in C# you would use a BinaryWriter (I think this code is right, I don't have C# lying around right now...)
Bw.Write((int)Data.Length);
Bw.Write(Data, 0, Data.Length);
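For comparison, here is a minimal Python sketch of the same framing. It assumes the length is stored little-endian, as C would store an unsigned int on x86; `frame` is an illustrative name.

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte little-endian unsigned int."""
    return struct.pack("<I", len(payload)) + payload

# "test data" plus CRLF is 11 bytes, so the prefix is 0B 00 00 00.
framed = frame(b"test data\r\n")
print(list(framed[:4]))   # [11, 0, 0, 0]
print(len(framed))        # 15
```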
In PowerShell I'm sure it's possible, but this is as close as I could get. This is obviously printing out the 4 bytes of the size as 4 human readable numbers:
$file = "c:\test.txt"
Set-content $file "test data" -encoding ascii
[int]$size = (Get-ChildItem $file).Length
$bytes = [System.BitConverter]::GetBytes($size)
$data = Get-content $file
$bytes
$data
11
0
0
0
test data
I need the binary data sent out on the pipe to look like this (\xA is the escaped representation of a non printable character, I don't want '\' in my output, I want the BYTE that '\xA' represents in the output) :
\xA\x0\x0\0test data
I don't know how to write a byte array out the pipeline in binary format. I also don't know how to get rid of the carriage returns.
EDIT:
I have found that I can do this:
$file = "c:\test.txt"
Set-content $file "test data" -encoding ascii
"File: ""{0}""" -f (Get-content $file)
[int]$size = (Get-ChildItem $file).Length
"Size: " + $size
$bytes = [System.BitConverter]::GetBytes($size)
"Bytes: " + $bytes
$data = Get-content $file
$file1 = "c:\test1.txt"
Set-content $file1 $bytes -encoding byte
Add-Content $file1 $data -encoding ASCII
"File: ""{0}""" -f (Get-content $file1)
"Size: " + (Get-ChildItem $file1).Length
File: "test data"
Size: 11
Bytes: 11 0 0 0
File: " test data"
Size: 15
But this requires me to build a temporary file. There must be a better way!
EDIT:
The solution above corrupts any character code > 127. There is no "binary" encoding mode for the pipe.
EDIT:
I finally discovered a roundabout way to get a BinaryWriter wired up to an application's stdin. See my answer.
Bill_Stewart is correct that you can't pipe binary data. When you use the | operator, PowerShell uses the encoding dictated by $OutputEncoding. I could not find an encoding that would not corrupt data.
I found something that does work though, BinaryWriter.
Here is my test code, starting with C:\foo.exe that simply outputs the data it receives:
#include <windows.h>
#include <stdio.h>
int main(int argc, char* argv[])
{
    HANDLE hInput = GetStdHandle(STD_INPUT_HANDLE);
    BYTE aBuf[0x100];
    int nRet;
    DWORD cbRead;
    if (!(nRet = ReadFile(hInput, aBuf, 256, &cbRead, NULL)))
        return printf("err: %u %d %d", cbRead, nRet, GetLastError());
    for (int i=0 ; i<256 ; ++i)
        printf("%d ", aBuf[i]);
    return 0;
}
This PowerShell script demonstrates the "corruption":
$data = [Byte[]] (0..255)
$prefix = ($data | ForEach-Object {
$_ -as [Char]
}) -join ""
"{0}" -f $prefix
$OutputEncoding = [System.Text.Encoding]::GetEncoding("us-ascii")
$prefix | c:\foo.exe
Here is the output. First you see that $prefix does have the complete charset. Second, you see the data that got to foo.exe has been converted.
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
 ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 5
0 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 9
7 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 63 63 63 63 63 63 63
63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63
63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63
63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63
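The run of 63s is what an ASCII encoder produces for characters it cannot represent: 63 is the code of "?". A small Python sketch of the same conversion, with illustrative names:

```python
data = bytes(range(256))

# One character per byte value, similar to casting each byte to [Char].
as_text = data.decode("latin-1")

# Encoding as ASCII replaces every character above 127 with "?" (code 63).
received = as_text.encode("ascii", errors="replace")

print(list(received[125:131]))   # [125, 126, 127, 63, 63, 63]
```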
Using BinaryWriter works:
$data = [Byte[]] (0..255)
$ProcessInfo = New-Object System.Diagnostics.ProcessStartInfo
$ProcessInfo.FileName = "C:\foo.exe"
$ProcessInfo.RedirectStandardInput = $true
$ProcessInfo.RedirectStandardOutput = $true
$ProcessInfo.UseShellExecute = $false
$Proc = New-Object System.Diagnostics.Process
$Proc.StartInfo = $ProcessInfo
$Proc.Start() | Out-Null
$Writer = New-Object System.IO.BinaryWriter($proc.StandardInput.BaseStream);
$Writer.Write($data, 0, $data.length)
$Writer.Flush()
$Writer.Close()
$Proc.WaitForExit()
$Proc.StandardOutput.ReadToEnd()
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 5
0 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 9
7 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 1
33 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 16
8 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203
204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238
239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255
So, my final script which writes the length in binary before writing the data file, would look something like this:
$data = [Byte[]] (0..255)
$ProcessInfo = New-Object System.Diagnostics.ProcessStartInfo
$ProcessInfo.FileName = "C:\foo.exe"
$ProcessInfo.RedirectStandardInput = $true
$ProcessInfo.RedirectStandardOutput = $true
$ProcessInfo.UseShellExecute = $false
$Proc = New-Object System.Diagnostics.Process
$Proc.StartInfo = $ProcessInfo
$Proc.Start() | Out-Null
$Writer = New-Object System.IO.BinaryWriter($proc.StandardInput.BaseStream);
$Writer.Write([Int32]$data.length)
$Writer.Write($data, 0, $data.length)
$Writer.Flush()
$Writer.Close()
$Proc.WaitForExit()
$Proc.StandardOutput.ReadToEnd()
You can see the first 4 bytes 0 1 0 0 are the raw binary representation of an [Int32] that is equal to 256:
0 1 0 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 1
31 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 16
6 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201
202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236
237 238 239 240 241 242 243 244 245 246 247 248 249 250 251
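The same idea, writing raw bytes directly to the child's stdin rather than through a text pipeline, looks roughly like this in Python. This is a sketch, not a replacement for the script above: the child process is a stand-in for foo.exe that prints every byte it receives, and `send_framed` is a hypothetical name.

```python
import struct
import subprocess
import sys

# Stand-in for foo.exe: print each received byte as a decimal number.
CHILD = (
    "import sys; data = sys.stdin.buffer.read(); "
    "print(' '.join(str(b) for b in data))"
)

def send_framed(payload: bytes) -> str:
    """Send a 4-byte little-endian length followed by the payload to the child's stdin."""
    framed = struct.pack("<i", len(payload)) + payload
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=framed,               # bytes in, so no text encoding is applied
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()

output = send_framed(bytes(range(256)))
print(" ".join(output.split()[:8]))   # 0 1 0 0 0 1 2 3
```

Unlike foo.exe, which reads only 256 bytes, the stand-in reads everything, so it reports all 260 bytes.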
I need to pipe some data to a program's stdin.
You can indeed cause many problems when using different encodings. Here is a different approach, without any encoding applied by Get-/Set-Content.
You can actually pipe binary data to an external program by using the Start-Process cmdlet:
Start-Process my.exe -RedirectStandardInput my.bin
Works at least since PowerShell 2.0.
Would this work for you?
$fileName = "C:\test.txt"
$data = [IO.File]::ReadAllText($fileName)
$prefix = ([BitConverter]::GetBytes($data.Length) | foreach-object {
"\x{0:X2}" -f $_
}) -join ""
"{0}{1}" -f $prefix,$data
You can replace "\x{0:X2}" -f $_ with $_ -as [Char] if you want $prefix to contain the raw data representations of the bytes.
[System.Console]::OpenStandardOutput().Write($bytes, 0, $bytes.Length)
Shorter sample using a binaryWriter:
$file = 'c:\temp\test.txt'
$test = [byte[]](0..255)
$mode = [System.IO.FileMode]::Create
$stream = [System.IO.File]::Open($file, $mode)
$bw = [System.IO.BinaryWriter]::new($stream)
$bw.Write($test)
$bw.Flush()
$bw.Dispose()
$stream.Dispose()

How to use the output of the 'script' command in a pipe with sed

Following this answer here I'm trying to use the 'script' command to unbuffer output for use with a pipe. But it's not working as I'd expect.
I have the following file:
$ cat test.txt
first line
second line
third line
Now, when I run the following two commands, I expect their outputs to be the same, but they are not:
$ cat test.txt | sed -n '{s/^\(.*\)$/\^\1\$/;p;}'
^first line$
^second line$
^third line$
$ script -c "cat test.txt" -q /dev/null | sed -n '{s/^\(.*\)$/\^\1\$/;p;}'
$first line
$^second line
$^third line
The output of the first command is the expected output. How can the output of the second command be explained?
As script is emulating a terminal, it converts linefeed characters (\n) to carriage return/linefeed sequences (\r\n). On the other hand, sed treats the carriage return as part of the line and inserts '$' after it. When this is then output to a terminal, the terminal interprets each carriage return by moving the cursor to the start of the line and continuing output there.
You can see this by piping output to hexdump -C. First compare cat and script output:
$ cat test.txt | hexdump -C
00000000 66 69 72 73 74 20 6c 69 6e 65 0a 73 65 63 6f 6e |first line.secon|
00000010 64 20 6c 69 6e 65 0a 74 68 69 72 64 20 6c 69 6e |d line.third lin|
00000020 65 0a |e.|
00000022
$ script -c "cat test.txt" -q /dev/null | hexdump -C | cat
00000000 66 69 72 73 74 20 6c 69 6e 65 0d 0a 73 65 63 6f |first line..seco|
00000010 6e 64 20 6c 69 6e 65 0d 0a 74 68 69 72 64 20 6c |nd line..third l|
00000020 69 6e 65 0d 0a |ine..|
00000025
Then compare output piped through sed:
$ cat test.txt | sed -n 's/^\(.*\)$/\^\1\$/;p;' | hexdump -C
00000000 5e 66 69 72 73 74 20 6c 69 6e 65 24 0a 5e 73 65 |^first line$.^se|
00000010 63 6f 6e 64 20 6c 69 6e 65 24 0a 5e 74 68 69 72 |cond line$.^thir|
00000020 64 20 6c 69 6e 65 24 0a |d line$.|
00000028
$ script -c "cat test.txt" -q /dev/null | sed -n 's/^\(.*\)$/\^\1\$/;p;' | hexdump -C
00000000 5e 66 69 72 73 74 20 6c 69 6e 65 0d 24 0a 5e 73 |^first line.$.^s|
00000010 65 63 6f 6e 64 20 6c 69 6e 65 0d 24 0a 5e 74 68 |econd line.$.^th|
00000020 69 72 64 20 6c 69 6e 65 0d 24 0a |ird line.$.|
0000002b
So, when script | sed outputs this to a terminal:
$first line
$^second line
$^third line
This is what happens:
"^first line" is output, cursor is at the end of the line
"\r" is output, cursor moves to the start of the line (column 0)
"$" is output, overwriting "^" and moving cursor to column 1
"\n" is output, moving cursor to the next line, but leaving it in column 1
"^second line" is output starting from column 1 (no character at column 0 at that moment), cursor is at the end of the line
"\r" is output, moving cursor to the start of the line (column 0)
"$" is output at the column 0, moving cursor to column 1
"\n" is output, moving cursor to the next line, but leaving it in column 1
and so on
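The steps above can be replayed with a toy terminal model in Python. It is a simplification that only knows the two rules described here: "\r" moves the cursor to column 0, and "\n" moves it down one line without changing the column. `render` is an illustrative name.

```python
def render(stream):
    """Replay a character stream on a toy terminal and return the resulting lines."""
    rows = [[]]
    row = col = 0
    for ch in stream:
        if ch == "\r":
            col = 0                      # carriage return: back to column 0
        elif ch == "\n":
            row += 1                     # linefeed: next line, same column
            if row == len(rows):
                rows.append([])
        else:
            line = rows[row]
            while len(line) <= col:      # pad with spaces up to the cursor
                line.append(" ")
            line[col] = ch               # overwrite whatever was there
            col += 1
    return ["".join(r) for r in rows]

# The bytes that script | sed produces, as shown by hexdump above.
sed_output = "^first line\r$\n^second line\r$\n^third line\r$\n"
for line in render(sed_output):
    print(line)
```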
If you still want to use script, remove \r characters. Like this:
script -c "cat test.txt" -q /dev/null | sed -n 's/\r//; s/^\(.*\)$/\^\1\$/;p;'
Note that you will still see "staircase" output on the terminal even though the sed output is correct. I'm not sure why that happens; script is probably modifying terminal settings. The "staircase" effect disappears if you pipe the output through "cat", for example.
This might work for you:
script -c"cat test.txt |sed 's/.*/^&$/'" -q /dev/null
or better still:
script -c"sed 's/.*/^&$/' test.txt" -q /dev/null
N.B. the entire script is passed to script.

Match over multiple lines perl regular expression

I have a file like this:
01 00 01 14 c0 00 01 10 01 00 00 16 00 00 00 64
00 00 00 65 00 00 01 07 40 00 00 22 68 61 6c 2e
6f 70 65 6e 65 74 2e 63 6f 6d 3b 30 30 30 30 30
30 30 30 32 3b 30 00 00 00 00 01 08 40 00 00 1e
68 61 6c 2e 6f 70 65 6e 65 74 2d 74 65 6c 65 63
6f 6d 2e 6c 61 6e 00 00 00 00 01 28 40 00 00 21
72 65 61 6c 6d 31 2e 6f 70 65 6e 65 74 2d 74 65
6c 65 63 6f 6d 2e 6c 61 6e 00 00 00 00 00 01 25
40 00 00 1e 68 61 6c 2e 6f 70 65 6e 65 74 2d 74
65 6c 65 63 6f 6d 2e 6c 61 6e 00 00 00 00 01 1b
40 00 00 20 72 65 61 6c 6d 2e 6f 70 65 6e 65 74
2d 74 65 6c 65 63 6f 6d 2e 6c 61 6e 00 00 01 02
40 00 00 0c 01 00 00 16 00 00 01 a0 40 00 00 0c
00 00 00 01 00 00 01 9f 40 00 00 0c 00 00 00 00
00 00 01 16 40 00 00 0c 00 00 00 00 00 00 01 bb
40 00 00 28 00 00 01 c2 40 00 00 0c 00 00 00 00
00 00 01 bc 40 00 00 13 31 39 37 37 31 31 31 32
32 33 31 00
I am reading the file and then finding certain octets and replacing them with tags:
while(<FH>){
    my $line = $_;
    $line =~ s/(00 00 00 64)/<incr4> /g;
    $line =~ s/(00 00 00 65)/<incr4> /g;
    $line =~ s/(30 30 30 30 30 32)/<incr6ascii:999999:0>/g;
    $line =~ s/(31 31 32 32 33 31)/<incr6ascii:999999:0>/g;
    print OUTPUT $line;
}
So for example, 00 00 00 64 would be replaced by the <incr4> tag. This was working fine, but it doesn't seem to be able to match over multiple lines any more. For example, the pattern 31 31 32 32 33 31 runs over multiple lines, and the regular expression doesn't seem to catch it. I tried using the /m and /s pattern modifiers to ignore newlines, but they didn't match it either. The only way around it I can come up with is to read the whole file into a string using:
undef $/;
my $whole_file = <FH>;
my $line = $whole_file;
$line =~ s/(00 00 00 64)/<incr4> /g;
$line =~ s/(00 00 00 65)/<incr4> /g;
$line =~ s/(30 30 30 30 30 32)/<incr6ascii:999999:0>/g;
$line =~ s/(31 31 32 32 33 31)/<incr6ascii:999999:0>/g;
print OUTPUT $line;
This works, the tags get inserted correctly, but the structure of the file is radically altered. It is all dumped out on a single line. I would like to retain the structure of the file as it appears here. Any ideas as to how I might do this?
/john
The trick here is to match the class of all space like characters \s:
my $file = do {local (@ARGV, $/) = 'filename.txt'; <>}; # slurp file
my %tr = ( # setup a translation table
    '00 00 00 64' => '<incr4>',
    '00 00 00 65' => '<incr4>',
    '30 30 30 30 30 32' => '<incr6ascii:999999:0>',
    '31 31 32 32 33 31' => '<incr6ascii:999999:0>',
);
for (keys %tr) {
    my $re = join '\s+' => split; # construct new regex
    $file =~ s{($re)}{
        $1 =~ /\n/ ? "\n$tr{$_}" : $tr{$_} # if octets contained \n, add \n
    }ge # match multiple times, execute the replacement block as perl code
}
print $file;
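For readers who want to try the idea outside Perl, here is a rough Python sketch of the same technique: join the octets with \s+ so a pattern may span a line break, and put a newline back when the match contained one. `tag_octets` is a hypothetical name, and the sample text is the tail of the file from the question.

```python
import re

def tag_octets(text, table):
    """Replace each octet sequence with its tag, letting matches span line breaks."""
    for octets, tag in table.items():
        # "31 31 32" becomes the regex 31\s+31\s+32
        pattern = re.compile(r"\s+".join(re.escape(o) for o in octets.split()))

        def repl(match, tag=tag):
            # Keep the line structure if the matched octets contained a newline.
            return "\n" + tag if "\n" in match.group(0) else tag

        text = pattern.sub(repl, text)
    return text

sample = "31 39 37 37 31 31 31 32\n32 33 31 00\n"
table = {"31 31 32 32 33 31": "<incr6ascii:999999:0>"}
print(tag_octets(sample, table))
```

As in the Perl version, the replaced octets no longer count towards the line length, so lines around a replacement can end up shorter than the rest.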