change the character NUL by enter in batch - powershell

I am working on a bat file. I have a .txt file like this -
1: 04000025000 00000008,37 NULNULNULNUL
2: 04000455000 0465346000008,37 NULNULNULNUL
I want to delete all the NUL characters to get -
1: 04000025000 00000008,37
2: 04000455000 0465346000008,37
Could you please help me?
The character for NUL is \x00
I need a bat code, which looks for the character x00 in the txt file and replaces it with the character to go to the next line.
Like this:
powershell -Command "(MMAI_CONTRAT_20180519.txt) -replace %\x00% , '\n' | Out-File test.txt"

There is a Get-Content missing in your command.
Provided the text file looks like this before
> (Get-Content MMAI_CONTRAT_20180519.txt)|Format-Hex
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 31 3A 20 20 30 34 30 30 30 30 32 35 30 30 30 20 1: 04000025000
00000010 20 20 20 20 20 20 20 20 20 30 30 30 30 30 30 30 0000000
00000020 38 2C 33 37 20 00 00 00 00 32 3A 20 20 30 34 30 8,37 ....2: 040
00000030 30 30 34 35 35 30 30 30 20 20 20 20 20 20 20 20 00455000
00000040 20 20 30 34 36 35 33 34 36 30 30 30 30 30 38 2C 0465346000008,
00000050 33 37 20 00 00 00 00 37 ....
This should do the job:
> (Get-Content MMAI_CONTRAT_20180519.txt) -replace '\x00+', "`r`n"|Format-Hex
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 31 3A 20 20 30 34 30 30 30 30 32 35 30 30 30 20 1: 04000025000
00000010 20 20 20 20 20 20 20 20 20 30 30 30 30 30 30 30 0000000
00000020 38 2C 33 37 20 0D 0A 32 3A 20 20 30 34 30 30 30 8,37 ..2: 04000
00000030 34 35 35 30 30 30 20 20 20 20 20 20 20 20 20 20 455000
00000040 30 34 36 35 33 34 36 30 30 30 30 30 38 2C 33 37 0465346000008,37
00000050 20 0D 0A ..
The command escaped and wrapped in batch:
powershell -nop -c "(Get-Content MMAI_CONTRAT_20180519.txt) -replace '\x00+', \"`r`n\"| Out-File test.txt"
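For comparison, the same substitution can be sketched outside PowerShell. This is a minimal Python sketch, not part of the original answer; the function name nul_runs_to_newlines and the sample bytes are illustrative assumptions based on the hex dump above. It works on raw bytes, so unlike Get-Content it does not split the input into lines first.

```python
import re

def nul_runs_to_newlines(data: bytes) -> bytes:
    """Replace every run of NUL bytes (0x00) with a single CR LF pair."""
    return re.sub(rb"\x00+", b"\r\n", data)

# Sample modeled on the hex dump above (hypothetical, shortened spacing).
sample = (b"1: 04000025000 00000008,37 \x00\x00\x00\x00"
          b"2: 04000455000 0465346000008,37 \x00\x00\x00\x00")

cleaned = nul_runs_to_newlines(sample)
print(cleaned.decode("ascii"))
```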


Why can't I find line with two character with select-line [duplicate]

To find every line containing "-" in the output of wsl --help, these lines work:
wsl --help | Select-String -Pattern "-"
wsl --help | Select-String "-"
Now I try with more complicated pattern: "--"
wsl --help | Select-String -Pattern "--"
wsl --help | Select-String "--"
Nothing is returned although there are lines with this pattern. Why?
updated:
wsl --help | Select-String "--" -SimpleMatch
doesn't work either
Yep, wsl outputs UTF-16LE ("Unicode" in PowerShell terms). Every second byte is null.
wsl --help | select -first 1 | format-hex
Label: String (System.String) <09F5DDB6>
Offset Bytes Ascii
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
------ ----------------------------------------------- -----
0000000000000000 43 00 6F 00 70 00 79 00 72 00 69 00 67 00 68 00 C o p y r i g h
0000000000000010 74 00 20 00 28 00 63 00 29 00 20 00 4D 00 69 00 t ( c ) M i
0000000000000020 63 00 72 00 6F 00 73 00 6F 00 66 00 74 00 20 00 c r o s o f t
0000000000000030 43 00 6F 00 72 00 70 00 6F 00 72 00 61 00 74 00 C o r p o r a t
0000000000000040 69 00 6F 00 6E 00 2E 00 20 00 41 00 6C 00 6C 00 i o n . A l l
0000000000000050 20 00 72 00 69 00 67 00 68 00 74 00 73 00 20 00 r i g h t s
0000000000000060 72 00 65 00 73 00 65 00 72 00 76 00 65 00 64 00 r e s e r v e d
0000000000000070 2E 00 .
"`0" means null. In powershell 7, the matches are highlighted.
wsl --help | Select-String -Pattern "-`0-" | select -first 1
--exec, -e <CommandLine>
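The effect can be reproduced outside PowerShell. The following Python sketch is only an illustration: it assumes the console decodes the UTF-16LE bytes one byte per character, and latin-1 stands in for whatever single-byte code page is actually in use.

```python
line = "    --exec, -e <CommandLine>"

# UTF-16LE bytes read back one byte per character: a NUL follows each ASCII letter.
misdecoded = line.encode("utf-16-le").decode("latin-1")

print("--" in misdecoded)        # the two dashes are no longer adjacent
print("-\x00-" in misdecoded)    # same idea as the "-`0-" pattern above

# Decoding with the right encoding restores the text.
fixed = misdecoded.encode("latin-1").decode("utf-16-le")
print(fixed)
```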

File size is double while replacing value in variable and output with a different name

When replacing a value in a base text and writing the result to a new file, the file size doubles, from 4 KB to 8 KB.
$t=3
$F=30
Do{
$t = $t+1
$F=$F+10
$y = (Get-Content -Path D:\test.php).Replace("YU9","$F")
$y | Out-File D:\Test\delivery$t.php -Force
}
until($t -eq 50)
Right, in PowerShell 5.1 Out-File defaults to UTF-16 ("Unicode"), so the file is twice as big as ASCII or UTF-8. UTF-16 stores a null byte after each ASCII letter. The first two bytes are the BOM. "0D 0A" is carriage return and linefeed.
'abcde' | set-content file
'abcde' | out-file file2
format-hex file
Path: C:\Users\js\foo\file
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 61 62 63 64 65 0D 0A abcde..
format-hex file2
Path: C:\Users\js\foo\file2
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 FF FE 61 00 62 00 63 00 64 00 65 00 0D 00 0A 00 .þa.b.c.d.e.....
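The byte counts in the two dumps can be checked with a short Python sketch. It is only an illustration of the encodings involved, not of PowerShell itself; the BOM is written out explicitly so the result does not depend on the platform.

```python
text = "abcde\r\n"

single_byte = text.encode("ascii")                 # what the first dump shows
utf16 = b"\xff\xfe" + text.encode("utf-16-le")     # BOM + two bytes per character

print(len(single_byte), single_byte.hex(" ").upper())
print(len(utf16), utf16.hex(" ").upper())
```

Apart from the two BOM bytes, the UTF-16 form is exactly twice as long, which matches the 4 KB to 8 KB jump in the question.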

How to correct encoding that went wrong

I have a VBscript code that processes utf-8 files.
It works perfectly, except for a problem with the source files: sometimes the input is Unicode-LE, even though the script is clearly labeled for use with UTF-8 files.
This then creates a corrupt output of course. I am putting in a check for the BOM to ensure no Unicode-LE files are opened incorrectly. But I already have files that got corrupted this way.
Is there a way to seamlessly revert the damage? Meaning to read it back "incorrectly" and saving it correctly?
Here is the code:
Private Sub UnicodeToUTF8(ByVal InFName, ByVal OutFName)
Dim strText
With CreateObject("ADODB.Stream")
.Open
.Type = adTypeBinary
.LoadFromFile InFName
.Type = adTypeText
.Charset = "utf-8"
'Read Unicode source file
strText = .ReadText(adReadAll)
'Process file
strText = OffsetTCs(strText)
'Output UTF-8 file
.Position = 0
.SetEOS
.Charset = "utf-8"
.WriteText strText, adWriteChar
.SaveToFile OutFName, adSaveCreateOverWrite
.Close
End With
End Sub
Edit:
I tried this script to save the day, but it reports an error on the file.Write data line. It does show the ASCII content properly in the message box, but not the Chinese characters:
Dim fso, file, data
Set fso = CreateObject("Scripting.FileSystemObject")
Set file = fso.OpenTextFile("damaged_Test.sub", 1, False, -1)
data = ""
data = file.ReadAll
MsgBox(data)
Set file = fso.OpenTextFile("output.txt", 2, True)
file.Write data
Here is the hex dump of the damaged file:
EF BB BF EF BF BD EF BF BD 5B 00 53 00 63 00 72 00 69 00 70 00 74 00 20 00 49 00 6E 00 66 00 6F 00 5D 00 0D 00 0A 00 3B 00 0D 00 0A 00 54 00 69 00 74 00 6C 00 65 00 3A 00 20 00 20 00 28 00 29 00 0D 00 0A 00 4F 00 72 00 69 00 67 00 69 00 6E 00 61 00 6C 00 20 00 53 00 63 00 72 00 69 00 70 00 74 00 3A 00 20 00 0D 00 0A 00 4F 00 72 00 69 00 67 00 69 00 6E 00 61 00 6C 00 20 00 54 00 69 00 6D 00 69 00 6E 00 67 00 3A 00 20 00 0D 00 0A 00 53 00 63 00 72 00 69 00 70 00 74 00 54 00 79 00 70 00 65 00 3A 00 20 00 76 00 34 00 2E 00 30 00 38 00 0D 00 0A 00 43 00 6F 00 6C 00 6C 00 69 00 73 00 69 00 6F 00 6E 00 73 00 3A 00 20 00 4E 00 6F 00 72 00 6D 00 61 00 6C 00 0D 00 0A 00 50 00 6C 00 61 00 79 00 52 00 65 00 73 00 58 00 3A 00 20 00 31 00 32 00 38 00 30 00 0D 00 0A 00 50 00 6C 00 61 00 79 00 52 00 65 00 73 00 59 00 3A 00 20 00 37 00 32 00 30 00 0D 00 0A 00 50 00 6C 00 61 00 79 00 44 00 65 00 70 00 74 00 68 00 3A 00 20 00 30 00 0D 00 0A 00 54 00 69 00 6D 00 65 00 72 00 3A 00 20 00 31 00 30 00 30 00 2E 00 30 00 30 00 30 00 30 00 0D 00 0A 00 0D 00 0A 00 5B 00 56 00 34 00 20 00 53 00 74 00 79 00 6C 00 65 00 73 00 5D 00 0D 00 0A 00 46 00 6F 00 72 00 6D 00 61 00 74 00 3A 00 20 00 4E 00 61 00 6D 00 65 00 2C 00 20 00 46 00 6F 00 6E 00 74 00 6E 00 61 00 6D 00 65 00 2C 00 20 00 46 00 6F 00 6E 00 74 00 73 00 69 00 7A 00 65 00 2C 00 20 00 50 00 72 00 69 00 6D 00 61 00 72 00 79 00 43 00 6F 00 6C 00 6F 00 75 00 72 00 2C 00 20 00 53 00 65 00 63 00 6F 00 6E 00 64 00 61 00 72 00 79 00 43 00 6F 00 6C 00 6F 00 75 00 72 00 2C 00 20 00 54 00 65 00 72 00 74 00 69 00 61 00 72 00 79 00 43 00 6F 00 6C 00 6F 00 75 00 72 00 2C 00 20 00 42 00 61 00 63 00 6B 00 43 00 6F 00 6C 00 6F 00 75 00 72 00 2C 00 20 00 42 00 6F 00 6C 00 64 00 2C 00 20 00 49 00 74 00 61 00 6C 00 69 00 63 00 2C 00 20 00 42 00 6F 00 72 00 64 00 65 00 72 00 53 00 74 00 79 00 6C 00 65 00 2C 00 20 00 4F 00 75 00 74 00 6C 00 69 00 6E 00 65 00 2C 00 20 00 53 00 68 00 61 00 64 00 6F 00 77 00 2C 00 20 00 41 00 6C 00 69 00 67 00 6E 00 6D 00 65 
00 6E 00 74 00 2C 00 20 00 4D 00 61 00 72 00 67 00 69 00 6E 00 4C 00 2C 00 20 00 4D 00 61 00 72 00 67 00 69 00 6E 00 52 00 2C 00 20 00 4D 00 61 00 72 00 67 00 69 00 6E 00 56 00 2C 00 20 00 41 00 6C 00 70 00 68 00 61 00 4C 00 65 00 76 00 65 00 6C 00 2C 00 20 00 45 00 6E 00 63 00 6F 00 64 00 69 00 6E 00 67 00 0D 00 0A 00 53 00 74 00 79 00 6C 00 65 00 3A 00 20 00 50 00 75 00 62 00 6C 00 69 00 63 00 2C 00 44 00 46 00 4B 00 61 00 69 00 53 00 68 00 75 00 20 00 53 00 74 00 64 00 20 00 57 00 35 00 2C 00 35 00 32 00 2C 00 31 00 32 00 35 00 37 00 31 00 38 00 37 00 32 00 2C 00 31 00 32 00 35 00 37 00 31 00 38 00 37 00 32 00 2C 00 31 00 32 00 35 00 37 00 31 00 38 00 37 00 32 00 2C 00 2D 00 32 00 31 00 34 00 37 00 34 00 38 00 33 00 36 00 34 00 30 00 2C 00 2D 00 31 00 2C 00 30 00 2C 00 31 00 2C 00 33 00 2C 00 33 00 2C 00 32 00 2C 00 31 00 32 00 38 00 2C 00 31 00 32 00 38 00 2C 00 37 00 32 00 2C 00 30 00 2C 00 31 00 33 00 36 00 0D 00 0A 00 0D 00 0A 00 5B 00 45 00 76 00 65 00 6E 00 74 00 73 00 5D 00 0D 00 0A 00 46 00 6F 00 72 00 6D 00 61 00 74 00 3A 00 20 00 4D 00 61 00 72 00 6B 00 65 00 64 00 2C 00 20 00 53 00 74 00 61 00 72 00 74 00 2C 00 20 00 45 00 6E 00 64 00 2C 00 20 00 53 00 74 00 79 00 6C 00 65 00 2C 00 20 00 4E 00 61 00 6D 00 65 00 2C 00 20 00 4D 00 61 00 72 00 67 00 69 00 6E 00 4C 00 2C 00 20 00 4D 00 61 00 72 00 67 00 69 00 6E 00 52 00 2C 00 20 00 4D 00 61 00 72 00 67 00 69 00 6E 00 56 00 2C 00 20 00 45 00 66 00 66 00 65 00 63 00 74 00 2C 00 20 00 54 00 65 00 78 00 74 00 0D 00 0A 00 44 00 69 00 61 00 6C 00 6F 00 67 00 75 00 65 00 3A 00 20 00 4D 00 61 00 72 00 6B 00 65 00 64 00 3D 00 30 00 2C 00 30 00 3A 00 30 00 30 00 3A 00 30 00 38 00 2E 00 36 00 30 00 2C 00 30 00 3A 00 30 00 30 00 3A 00 31 00 31 00 2E 00 31 00 30 00 2C 00 50 00 75 00 62 00 6C 00 69 00 63 00 2C 00 30 00 31 00 2C 00 30 00 30 00 30 00 30 00 2C 00 30 00 30 00 30 00 30 00 2C 00 30 00 30 00 30 00 30 00 2C 00 2C 00 EF BF BD 65 74 5E EF BF BD 5F 02 6A 0D 00 0A 00 44 00 69 00 61 00 6C 00 6F 00 67 00 75 00 65 
00 3A 00 20 00 4D 00 61 00 72 00 6B 00 65 00 64 00 3D 00 30 00 2C 00 30 00 3A 00 30 00 30 00 3A 00 31 00 31 00 2E 00 32 00 30 00 2C 00 30 00 3A 00 30 00 30 00 3A 00 31 00 35 00 2E 00 31 00 33 00 2C 00 50 00 75 00 62 00 6C 00 69 00 63 00 2C 00 30 00 31 00 2C 00 30 00 30 00 30 00 30 00 2C 00 30 00 30 00 30 00 30 00 2C 00 30 00 30 00 30 00 30 00 2C 00 2C 00 EF BF BD 65 74 5E EF BF BD 5F 02 6A 0D 00 0A
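The dump suggests how the damage arises, and a small Python sketch can reproduce it. The original text used here is a guess (chosen only because it yields the same byte pattern as the dump), and Python's decoder stands in for ADODB.Stream. Bytes that are not valid UTF-8 become U+FFFD (EF BF BD), so different inputs collapse to the same output and those characters cannot be restored from the damaged file alone.

```python
def damage(text: str) -> bytes:
    """Read UTF-16LE bytes as if they were UTF-8, then save as UTF-8."""
    return text.encode("utf-16-le").decode("utf-8", errors="replace").encode("utf-8")

# Assumed original: U+65B0 U+5E74 U+5FEB U+6A02 (a guess, not taken from the source).
assumed_original = "\u65b0\u5e74\u5feb\u6a02"
damaged = damage(assumed_original)
print(damaged.hex(" ").upper())

# Two different characters whose low bytes are both invalid UTF-8 end up identical.
print(damage("\u65b0") == damage("\u65b1"))
```

ASCII text survives as letter/NUL pairs, so that part can be read back; only the bytes replaced by EF BF BD are gone.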

What exactly is the info_Hash in a torrent file

Lately I have been reading a lot about torrent hashes, magnet links, etc. But there is one thing I don't understand.
I have:
hash of a file
and the infohash of a torrent
Is the infohash = hash of the file ?
If yes what if the torrent describes 6 Files to download?
If no what does it stand for?
So I finally figured it out.
The “infohash” is the SHA1 Hash over the part of a torrent file that includes:
ITEM: length(size) and path (path with filename)
Name: The name to search for
Piece length: The length(size) of a single piece
Pieces: SHA1 Hash of EVERY piece of this torrent
Private: flag for restricted access
To show this in more detail, I took a random torrent file and opened it in the “BEncode Editor” from Ultima to make it clearer to me.
As you can see, the red box marks the information part of the torrent file.
The torrent file does not contain a hash of each item, only the hash of every piece.
For item1 with: 1069496548
and item2 with: 223
It is together: 1069496771
With a piece size of: 524288
There are 2040 pieces. (1069496771/524288=2039.9032 approximately)
The pieces section holds 40800 bytes of data, which is 81600 + 2 characters in the file;
the +2 is the 0x prefix that marks the value as hexadecimal.
A SHA-1 hash is 40 hexadecimal characters, or 20 bytes of data, so 40800 bytes are 2040 SHA-1 hashes.
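The arithmetic above can be checked with a few lines of Python. This is just a sketch using the numbers quoted in this answer; the variable names are illustrative.

```python
import math

item_sizes = [1069496548, 223]   # item lengths quoted above, in bytes
piece_length = 524288            # piece size quoted above, in bytes
sha1_bytes = 20                  # one SHA-1 digest is 20 bytes

total_size = sum(item_sizes)
piece_count = math.ceil(total_size / piece_length)   # the last piece is partial
pieces_field_bytes = piece_count * sha1_bytes
hex_chars = pieces_field_bytes * 2                   # two hex digits per byte

print(total_size, piece_count, pieces_field_bytes, hex_chars)
```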
I am sorry that this information is about a torrent that leads to an illegal movie, but I wanted to use a torrent that really exists.
I wanted to add another example, slightly more concrete.
We start with one of the smallest .torrent files I have:
64 34 3A 69 6E 66 6F 64 35 3A 66 69 6C 65 73 6C 64 36 3A 6C 65 6E 67 74
68 69 36 31 35 65 34 3A 70 61 74 68 6C 32 36 3A 66 72 65 65 20 61 75 64
69 6F 62 6F 6F 6B 20 76 65 72 73 69 6F 6E 2E 74 78 74 65 65 64 36 3A 6C
65 6E 67 74 68 69 33 39 33 34 31 37 65 34 3A 70 61 74 68 6C 36 31 3A 57
61 72 63 72 61 66 74 5F 20 4F 66 66 69 63 69 61 6C 20 4D 6F 76 69 65 20
4E 6F 76 65 6C 69 7A 61 74 69 6F 6E 20 62 79 20 43 68 72 69 73 74 69 65
20 47 6F 6C 64 65 6E 2E 65 70 75 62 65 65 65 34 3A 6E 61 6D 65 36 31 3A
57 61 72 63 72 61 66 74 5F 20 4F 66 66 69 63 69 61 6C 20 4D 6F 76 69 65
20 4E 6F 76 65 6C 69 7A 61 74 69 6F 6E 20 62 79 20 43 68 72 69 73 74 69
65 20 47 6F 6C 64 65 6E 20 45 50 55 42 31 32 3A 70 69 65 63 65 20 6C 65
6E 67 74 68 69 31 30 34 38 35 37 36 65 36 3A 70 69 65 63 65 73 32 30 3A
43 92 4C 22 BB 42 9E EA BD FF 66 C6 79 4C 29 E4 F9 D0 F3 B9 65 65
If we decode the BEncoding:
64 ; DICTIONARY (d)
| 34 3A 69 6E 66 6F ; - 4:info
| 64 ; - DICTIONARY (d)
| | 35 3A 66 69 6C 65 73 ; - 5:files
| | 6C ; - LIST (l)
| | | 64 ; - DICTIONARY (d)
| | | | 36 3A 6C 65 6E 67 74 68 ; - 6:length
| | | | 69 36 31 35 65 ; - i615e
| | | | 34 3A 70 61 74 68 ; - 4:path
| | | | 6C ; - LIST (l)
| | | | | 32 36 3A 66 72 65 65 20 61 75 ; - 26:free audiobook version.txt
| | | | | 64 69 6F 62 6F 6F 6B 20 76 65 ;
| | | | | 72 73 69 6F 6E 2E 74 78 74 ;
| | | | 65 ; - END (e)
| | | 65 ; - END (e)
| | | 64 ; - DICTIONARY (d)
| | | | 36 3A 6C 65 6E 67 74 68 ; - 6:length
| | | | 69 33 39 33 34 31 37 65 ; - i393417e
| | | | 34 3A 70 61 74 68 ; - 4:path
| | | | 6C ; - LIST (l)
| | | | | 36 31 3A 57 61 72 63 72 61 66 ; - 61:Warcraft_ Official Movie Novelization by Christie Golden.epub
| | | | | 74 5F 20 4F 66 66 69 63 69 61 ;
| | | | | 6C 20 4D 6F 76 69 65 20 4E 6F ;
| | | | | 76 65 6C 69 7A 61 74 69 6F 6E ;
| | | | | 20 62 79 20 43 68 72 69 73 74 ;
| | | | | 69 65 20 47 6F 6C 64 65 6E 2E ;
| | | | | 65 70 75 62 ;
| | | | 65 ; - END (e)
| | | 65 ; - END (e)
| | 65 ; - END (e)
| | 34 3A 6E 61 6D 65 ; - 4:name
| | 36 31 3A 57 61 72 63 72 61 66 ; - 61:Warcraft_ Official Movie Novelization by Christie Golden EPUB
| | 74 5F 20 4F 66 66 69 63 69 61 ;
| | 6C 20 4D 6F 76 69 65 20 4E 6F ;
| | 76 65 6C 69 7A 61 74 69 6F 6E ;
| | 20 62 79 20 43 68 72 69 73 74 ;
| | 69 65 20 47 6F 6C 64 65 6E 20 ;
| | 45 50 55 42 ;
| | 31 32 3A 70 69 65 63 65 20 6C ; - 12:piece length
| | 65 6E 67 74 68 ;
| | 69 31 30 34 38 35 37 36 65 ; - i1048576e
| | 36 3A 70 69 65 63 65 73 ; - 6:pieces
| | 32 30 3A 43 92 4C 22 BB 42 9E ; - 20:43 92 4C 22 BB 42 9E EA BD FF 66 C6 79 4C 29 E4 F9 D0 F3 B9
| | EA BD FF 66 C6 79 4C 29 E4 F9 ;
| | D0 F3 B9 ;
| 65 ; - END (e)
65 ; - END (e)
Or, in pseudo-json:
{
info: {
files: [
{ length: 615, path: ["free audiobook version.txt"] },
{ length: 393417, path: ["Warcraft_ Official Movie Novelization by Christie Golden.epub"] }
],
name: "Warcraft_ Official Movie Novelization by Christie Golden EPUB",
"piece length": 1048576,
pieces: 43 92 4C 22 BB 42 9E EA BD FF 66 C6 79 4C 29 E4 F9 D0 F3 B9
}
}
InfoHash is hash of the info
The InfoHash is the SHA-1 hash of the info dictionary contents.
We want to take the SHA-1 hash of value of the info dictionary key:
64 ; - DICTIONARY (d)
| 35 3A 66 69 6C 65 73 ; - 5:files
| 6C ; - LIST (l)
| | 64 ; - DICTIONARY (d)
| | | 36 3A 6C 65 6E 67 74 68 ; - 6:length
| | | 69 36 31 35 65 ; - i615e
| | | 34 3A 70 61 74 68 ; - 4:path
| | | 6C ; - LIST (l)
| | | | 32 36 3A 66 72 65 65 20 61 75 ; - 26:free audiobook version.txt
| | | | 64 69 6F 62 6F 6F 6B 20 76 65 ;
| | | | 72 73 69 6F 6E 2E 74 78 74 ;
| | | 65 ; - END (e)
| | 65 ; - END (e)
| | 64 ; - DICTIONARY (d)
| | | 36 3A 6C 65 6E 67 74 68 ; - 6:length
| | | 69 33 39 33 34 31 37 65 ; - i393417e
| | | 34 3A 70 61 74 68 ; - 4:path
| | | 6C ; - LIST (l)
| | | | 36 31 3A 57 61 72 63 72 61 66 ; - 61:Warcraft_ Official Movie Novelization by Christie Golden.epub
| | | | 74 5F 20 4F 66 66 69 63 69 61 ;
| | | | 6C 20 4D 6F 76 69 65 20 4E 6F ;
| | | | 76 65 6C 69 7A 61 74 69 6F 6E ;
| | | | 20 62 79 20 43 68 72 69 73 74 ;
| | | | 69 65 20 47 6F 6C 64 65 6E 2E ;
| | | | 65 70 75 62 ;
| | | 65 ; - END (e)
| | 65 ; - END (e)
| 65 ; - END (e)
| 34 3A 6E 61 6D 65 ; - 4:name
| 36 31 3A 57 61 72 63 72 61 66 ; - 61:Warcraft_ Official Movie Novelization by Christie Golden EPUB
| 74 5F 20 4F 66 66 69 63 69 61 ;
| 6C 20 4D 6F 76 69 65 20 4E 6F ;
| 76 65 6C 69 7A 61 74 69 6F 6E ;
| 20 62 79 20 43 68 72 69 73 74 ;
| 69 65 20 47 6F 6C 64 65 6E 20 ;
| 45 50 55 42 ;
| 31 32 3A 70 69 65 63 65 20 6C ; - 12:piece length
| 65 6E 67 74 68 ;
| 69 31 30 34 38 35 37 36 65 ; - i1048576e
| 36 3A 70 69 65 63 65 73 ; - 6:pieces
| 32 30 3A 43 92 4C 22 BB 42 9E ; - 20:43 92 4C 22 BB 42 9E EA BD FF 66 C6 79 4C 29 E4 F9 D0 F3 B9
| EA BD FF 66 C6 79 4C 29 E4 F9 ;
| D0 F3 B9 ;
65 ; - END (e)
We run all these bytes together:
64 35 3A 66 69 6C 65 73 6C 64 36 3A 6C 65 6E 67 74 68 69 36 31 35 65 34
3A 70 61 74 68 6C 32 36 3A 66 72 65 65 20 61 75 64 69 6F 62 6F 6F 6B 20
76 65 72 73 69 6F 6E 2E 74 78 74 65 65 64 36 3A 6C 65 6E 67 74 68 69 33
39 33 34 31 37 65 34 3A 70 61 74 68 6C 36 31 3A 57 61 72 63 72 61 66 74
5F 20 4F 66 66 69 63 69 61 6C 20 4D 6F 76 69 65 20 4E 6F 76 65 6C 69 7A
61 74 69 6F 6E 20 62 79 20 43 68 72 69 73 74 69 65 20 47 6F 6C 64 65 6E
2E 65 70 75 62 65 65 65 34 3A 6E 61 6D 65 36 31 3A 57 61 72 63 72 61 66
74 5F 20 4F 66 66 69 63 69 61 6C 20 4D 6F 76 69 65 20 4E 6F 76 65 6C 69
7A 61 74 69 6F 6E 20 62 79 20 43 68 72 69 73 74 69 65 20 47 6F 6C 64 65
6E 20 45 50 55 42 31 32 3A 70 69 65 63 65 20 6C 65 6E 67 74 68 69 31 30
34 38 35 37 36 65 36 3A 70 69 65 63 65 73 32 30 3A 43 92 4C 22 BB 42 9E
EA BD FF 66 C6 79 4C 29 E4 F9 D0 F3 B9 65
And then take the SHA-1 hash to generate a 160-bit (20-byte) digest:
7EDA978ED7628595BB91C48B947F025BAE78CB77
Which is the right answer:
Here's how to pull the pertinent segment of a *.torrent datum for a bittorrent “info hash”.
I made this for an example.
0000000: 6438 3A61 6E6E 6F75 6E63 6530 3A31 303A d8:announce0:10:
0000010: 6372 6561 7465 6420 6279 3133 3A6D 6B74 created by13:mkt
0000020: 6F72 7265 6E74 2031 2E30 3133 3A63 7265 orrent 1.013:cre
0000030: 6174 696F 6E20 6461 7465 6931 3537 3037 ation datei15707
0000040: 3530 3238 3565 343A 696E 666F 6436 3A6C 50285e4:infod6:l
0000050: 656E 6774 6869 3230 6534 3A6E 616D 6534 engthi20e4:name4
0000060: 3A70 7269 7631 323A 7069 6563 6520 6C65 :priv12:piece le
0000070: 6E67 7468 6932 3632 3134 3465 363A 7069 ngthi262144e6:pi
0000080: 6563 6573 3230 3AF1 D7EE 4236 3434 D06F eces20:...B644.o
0000090: 27C4 BBAD 87F0 F089 7A22 2B37 3A70 7269 '.......z"+7:pri
00000a0: 7661 7465 6931 6565 65 vatei1eee
The content of the “info” key lies between offsets 0x4C and 0xA7 (inclusive, counting from zero). So…
#!/crit/shell/bsh
bbe \
-e '
d 0x0 0x4C ;
d 0xA8 * ;
' \
${example} \
|
shasum -a 1 -b
You should see this:
1799a58b9f8ff2b9b9bcecd0d438c5f37f19a31c *-
Here is the xxd output, in lieu of shasum, for more elucidation:
0000000: 6436 3A6C 656E 6774 6869 3230 6534 3A6E d6:lengthi20e4:n
0000010: 616D 6534 3A70 7269 7631 323A 7069 6563 ame4:priv12:piec
0000020: 6520 6C65 6E67 7468 6932 3632 3134 3465 e lengthi262144e
0000030: 363A 7069 6563 6573 3230 3AF1 D7EE 4236 6:pieces20:...B6
0000040: 3434 D06F 27C4 BBAD 87F0 F089 7A22 2B37 44.o'.......z"+7
0000050: 3A70 7269 7661 7465 6931 6565 :privatei1ee
You can refer to The BitTorrent Protocol Specification for an explanation, albeit a terse and rather grammatically inelegant one, as to their nomenclature and why the final 0x65 needs be excluded.
Concisely: the entire datum is encased in a pair of US-ASCII ‘d’ and ‘e’; the content of the “info” key, or field, is similarly so encased. You want everything between the first 0x64 — ‘d’, — which succeeds the US-ASCII string 4:info, and the terminal 0x65 — ‘e’, — which is paired with the aforementioned 0x64.
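The same extraction can be sketched without hard-coded offsets by walking the bencoded structure. This is a minimal Python sketch under stated assumptions: skip_value and info_span are hypothetical helper names, the input is assumed to be well-formed, and the example bytes are rebuilt from the hex dump above.

```python
import hashlib

def skip_value(data, pos):
    """Return the offset just past the bencoded value that starts at pos."""
    lead = data[pos:pos + 1]
    if lead == b"i":                       # integer: i<digits>e
        return data.index(b"e", pos) + 1
    if lead in (b"l", b"d"):               # list or dictionary: items until 'e'
        pos += 1
        while data[pos:pos + 1] != b"e":
            pos = skip_value(data, pos)
        return pos + 1
    colon = data.index(b":", pos)          # byte string: <length>:<bytes>
    return colon + 1 + int(data[pos:colon])

def info_span(torrent):
    """Return (start, end) of the raw "info" value; end is exclusive."""
    pos = 1                                # skip the outer 'd'
    while torrent[pos:pos + 1] != b"e":
        key_end = skip_value(torrent, pos)
        key = torrent[torrent.index(b":", pos) + 1:key_end]
        value_end = skip_value(torrent, key_end)
        if key == b"info":
            return key_end, value_end
        pos = value_end
    raise ValueError("no info key found")

# Example torrent rebuilt from the hex dump above.
pieces = bytes.fromhex("F1D7EE42363434D06F27C4BBAD87F0F0897A222B")
torrent = (b"d8:announce0:10:created by13:mktorrent 1.0"
           b"13:creation datei1570750285e"
           b"4:infod6:lengthi20e4:name4:priv12:piece lengthi262144e"
           b"6:pieces20:" + pieces + b"7:privatei1eee")

start, end = info_span(torrent)
info_bytes = torrent[start:end]
print(hex(start), hex(end - 1), hashlib.sha1(info_bytes).hexdigest())
```

Hashing the raw slice, rather than decoding and re-encoding the dictionary, keeps the bytes exactly as they appear in the file.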

Match over multiple lines perl regular expression

I have a file like this:
01 00 01 14 c0 00 01 10 01 00 00 16 00 00 00 64
00 00 00 65 00 00 01 07 40 00 00 22 68 61 6c 2e
6f 70 65 6e 65 74 2e 63 6f 6d 3b 30 30 30 30 30
30 30 30 32 3b 30 00 00 00 00 01 08 40 00 00 1e
68 61 6c 2e 6f 70 65 6e 65 74 2d 74 65 6c 65 63
6f 6d 2e 6c 61 6e 00 00 00 00 01 28 40 00 00 21
72 65 61 6c 6d 31 2e 6f 70 65 6e 65 74 2d 74 65
6c 65 63 6f 6d 2e 6c 61 6e 00 00 00 00 00 01 25
40 00 00 1e 68 61 6c 2e 6f 70 65 6e 65 74 2d 74
65 6c 65 63 6f 6d 2e 6c 61 6e 00 00 00 00 01 1b
40 00 00 20 72 65 61 6c 6d 2e 6f 70 65 6e 65 74
2d 74 65 6c 65 63 6f 6d 2e 6c 61 6e 00 00 01 02
40 00 00 0c 01 00 00 16 00 00 01 a0 40 00 00 0c
00 00 00 01 00 00 01 9f 40 00 00 0c 00 00 00 00
00 00 01 16 40 00 00 0c 00 00 00 00 00 00 01 bb
40 00 00 28 00 00 01 c2 40 00 00 0c 00 00 00 00
00 00 01 bc 40 00 00 13 31 39 37 37 31 31 31 32
32 33 31 00
I am reading the file and then finding certain octets and replacing them with tags:
while (my $line = <FH>) {
$line =~ s/(00 00 00 64)/<incr4> /g;
$line =~ s/(00 00 00 65)/<incr4> /g;
$line =~ s/(30 30 30 30 30 32)/<incr6ascii:999999:0>/g;
$line =~ s/(31 31 32 32 33 31)/<incr6ascii:999999:0>/g;
print OUTPUT $line;
}
So for example, 00 00 00 64 would be replaced by the <incr4> tag. This was working fine, but it doesn't seem to be able to match over multiple lines any more. For example, the pattern 31 31 32 32 33 31 runs over two lines, and the regular expression doesn't catch it. I tried the /m and /s pattern modifiers to ignore newlines, but they didn't match it either. The only workaround I can come up with is to read the whole file into a string using:
undef $/;
my $whole_file = <FH>;
my $line = $whole_file;
$line =~ s/(00 00 00 64)/<incr4> /g;
$line =~ s/(00 00 00 65)/<incr4> /g;
$line =~ s/(30 30 30 30 30 32)/<incr6ascii:999999:0>/g;
$line =~ s/(31 31 32 32 33 31)/<incr6ascii:999999:0>/g;
print OUTPUT $line;
This works, the tags get inserted correctly, but the structure of the file is radically altered. It is all dumped out on a single line. I would like to retain the structure of the file as it appears here. Any ideas as to how I might do this?
/john
The trick here is to match the class of all space like characters \s:
my $file = do {local (@ARGV, $/) = 'filename.txt'; <>}; # slurp file
my %tr = ( # setup a translation table
'00 00 00 64' => '<incr4>',
'00 00 00 65' => '<incr4>',
'30 30 30 30 30 32' => '<incr6ascii:999999:0>',
'31 31 32 32 33 31' => '<incr6ascii:999999:0>',
);
for (keys %tr) {
my $re = join '\s+' => split; # construct new regex
$file =~ s{($re)}{
$1 =~ /\n/ ? "\n$tr{$_}" : $tr{$_} # if octets contained \n, add \n
}ge # match multiple times, execute the replacement block as perl code
}
print $file;
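The same idea, joining the octets with \s+ so a pattern can span a line break and putting the newline back when one was consumed, can be sketched in Python. This is an illustrative sketch rather than a translation of the exact Perl; the function name tag_octets and the sample string are assumptions.

```python
import re

# Translation table taken from the question.
translations = {
    "00 00 00 64": "<incr4>",
    "00 00 00 65": "<incr4>",
    "30 30 30 30 30 32": "<incr6ascii:999999:0>",
    "31 31 32 32 33 31": "<incr6ascii:999999:0>",
}

def tag_octets(text):
    """Replace each octet sequence with its tag, even across line breaks."""
    for octets, tag in translations.items():
        # Allow any whitespace, including a newline, between octets.
        pattern = re.compile(r"\s+".join(octets.split()))
        # If the match swallowed a newline, emit one before the tag.
        text = pattern.sub(
            lambda m, tag=tag: "\n" + tag if "\n" in m.group(0) else tag,
            text,
        )
    return text

sample = "bb 31 39 37 37 31 31 31 32\n32 33 31 00\n"
print(tag_octets(sample))
```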