How to get the size of DER structure? - aes

I need to learn the length of Der structure.
It's used as a header for the cipher text file. My cipher writes the DER encoded data and the ciphertext back to back to the (cipher text) file.
I need to learn the size of the DER structure so I can pass it and only get the ciphertext from the cipher text file for decoding it. I know, I need to parse the length byte (or bytes) of header's outer asn1 sequence to get that info, but I don't know how to do it since I am not sure how many bytes it takes to store that length data.
I put the DER Header down below to give a basic idea. I would appreciate if you can take a look on it.
Header = \
asn1_sequence(
asn1_sequence(
asn1_octetstring(salt)+
asn1_integer(iter)+
asn1_integer(len(key))
) +
asn1_sequence(
asn1_objectidentifier([2,16,840,1,101,3,4,1,2])+
asn1_octetstring(iv_current)
)+
asn1_sequence(
asn1_sequence(
asn1_objectidentifier([2,16,840,1,101,3,4,2,1])+
asn1_null()
)+
asn1_octetstring(digestinfo)
)
)

Your data is encoded with DER, this means it uses TLV (Tag Length Value) form.
To know the length of the Value, you will have to read the Tag and the Length.
Reading Tag and Length is not trivial and it is explained in Recommendation X.690
8.1.2 Identifier octets
8.1.3 Length octets
Depending on which language you want to use, you should be able to find some code to use or hack.
Example in Java
Read a tag
Read a length
This lib would be used like this
BERReader reader = new BERReader(input);
reader.readTag();
reader.readLength();
reader.getLengthValue(); // gives you how many bytes you have to read

Related

Perl Digest Bcrypt, generating a proper hash

I have written a test program that generates a Bcrypt hash. This hash later needs to be verified by a PHP backend.
This is my perl code:
use Digest;
#use Data::Entropy::Algorithms qw(rand_bits);
#my $bcrypt = Digest->new('Bcrypt', cost=>10, salt=>rand_bits(16*8));
my $bcrypt = Digest->new('Bcrypt', cost=>10, salt=>'1111111111111111');
my $settings = $bcrypt->settings(); # save for later checks.
my $pass_hash = $bcrypt->add('bob')->b64digest;
print $settings.$pass_hash."\n";
This prints
$2a$10$KRCvKRCvKRCvKRCvKRCvKOoFxCE1d/OZTKQqhet3bKOq6ZVIACXBU
This does not validate as a proper hash if I use an online bcrypt tool such as https://bcrypt-generator.com
Can someone point out the error? Thanks.
Figured out the problem. I have to use bcrypt_b64digest instead of b64digest. I wish the perl documentation was clearer in which one needs to be used so that other bcrypt implementations can "get it".
my $pass_hash = $bcrypt->add('bob')->bcrypt_b64digest;
From https://metacpan.org/pod/Digest::Bcrypt#bcrypt_b64digest
Same as "digest", but will return the digest base64 encoded using the
alphabet that is commonly used with bcrypt. The length of the returned
string will be 31 and will only contain characters from the ranges
'0'..'9', 'A'..'Z', 'a'..'z', '+', and '.'
The base64 encoded string returned is not padded to be a multiple of 4
bytes long. Note: This is bcrypt's own non-standard base64 alphabet,
It is not compatible with the standard MIME base64 encoding.

How to decode a Base64 string?

I have a normal string in Powershell that is from a text file containing Base64 text; it is stored in $x. I am trying to decode it as such:
$z = [System.Text.Encoding]::Unicode.GetString([System.Convert]::FromBase64String($x));
This works if $x was a Base64 string created in Powershell (but it's not). And this does not work on the $x Base64 string that came from a file, $z simply ends up as something like 䐲券.
What am I missing? For example, $x could be YmxhaGJsYWg= which is Base64 for blahblah.
In a nutshell, YmxhaGJsYWg= is in a text file then put into a string in this Powershell code and I try to decode it but end up with 䐲券 etc.
Isn't encoding taking the text TO base64 and decoding taking base64 BACK to text? You seem be mixing them up here. When I decode using this online decoder I get:
BASE64: blahblah
UTF8: nVnV
not the other way around. I can't reproduce it completely in PS though. See sample below:
PS > [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String("blahblah"))
nV�nV�
PS > [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes("nVnV"))
blZuVg==
EDIT I believe you're using the wrong encoder for your text. The encoded base64 string is encoded from UTF8(or ASCII) string.
PS > [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String("YmxhaGJsYWg="))
blahblah
PS > [System.Text.Encoding]::Unicode.GetString([System.Convert]::FromBase64String("YmxhaGJsYWg="))
汢桡汢桡
PS > [System.Text.Encoding]::ASCII.GetString([System.Convert]::FromBase64String("YmxhaGJsYWg="))
blahblah
There are no PowerShell-native commands for Base64 conversion - yet (as of PowerShell [Core] 7.1), but adding dedicated cmdlets has been suggested in GitHub issue #8620.
For now, direct use of .NET is needed.
Important:
Base64 encoding is an encoding of binary data using bytes whose values are constrained to a well-defined 64-character subrange of the ASCII character set representing printable characters, devised at a time when sending arbitrary bytes was problematic, especially with the high bit set (byte values > 0x7f).
Therefore, you must always specify explicitly what character encoding the Base64 bytes do / should represent.
Ergo:
on converting TO Base64, you must first obtain a byte representation of the string you're trying to encode using the character encoding the consumer of the Base64 string expects.
on converting FROM Base64, you must interpret the resultant array of bytes as a string using the same encoding that was used to create the Base64 representation.
Examples:
Note:
The following examples convert to and from UTF-8 encoded strings:
To convert to and from UTF-16LE ("Unicode") instead, substitute [Text.Encoding]::Unicode for [Text.Encoding]::UTF8
Convert TO Base64:
PS> [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes('Motörhead'))
TW90w7ZyaGVhZA==
Convert FROM Base64:
PS> [Text.Encoding]::Utf8.GetString([Convert]::FromBase64String('TW90w7ZyaGVhZA=='))
Motörhead
This page shows up when you google how to convert to base64, so for completeness:
$b = [System.Text.Encoding]::UTF8.GetBytes("blahblah")
[System.Convert]::ToBase64String($b)
Base64 encoding converts three 8-bit bytes (0-255) into four 6-bit bytes (0-63 aka base64). Each of the four bytes indexes an ASCII string which represents the final output as four 8-bit ASCII characters. The indexed string is typically 'A-Za-z0-9+/' with '=' used as padding. This is why encoded data is 4/3 longer.
Base64 decoding is the inverse process. And as one would expect, the decoded data is 3/4 as long.
While base64 encoding can encode plain text, its real benefit is encoding non-printable characters which may be interpreted by transmitting systems as control characters.
I suggest the original poster render $z as bytes with each bit having meaning to the application. Rendering non-printable characters as text typically invokes Unicode which produces glyphs based on your system's localization.
Base64decode("the answer to life the universe and everything") = 00101010
If anyone would like to do it with a pipe in Powershell (like a filter) (e.g. read file contents and decode it), it can be achieved with a one-liner like that:
Get-Content base64.txt | %{[Text.Encoding]::UTF8.GetString([Convert]::FromBase64String($_))}
I had issues with spaces showing in between my output and there was no answer online at all to fix this issue. I literally spend many hours trying to find a solution and found one from playing around with the code to the point that I almost did not even know what I typed in at the time that I got it to work. Here is my fix for the issue: [System.Text.Encoding]::UTF8.GetString(([System.Convert]::FromBase64String($base64string)|?{$_}))
Still not a "built-in", but published to gallery, authored by MS:
https://github.com/powershell/textutility
TextUtility
ConvertFrom-Base64
Return a string decoded from base64.
ConvertTo-Base64
Return a base64 encoded representation of a string.

The torrent info_hash parameter

How does one calculate the info_hash parameter? Aka the hash corresponding to the info dictionar??
From official specs:
info_hash
The 20 byte sha1 hash of the bencoded form of the info value from the metainfo file. Note that this is a substring of the metainfo file.
This value will almost certainly have to be escaped.
Does this mean simply get the substring from the meta-info file and do a sha-1 hash on the reprezentative bytes??
.... because this is how i tried 12 times but without succes meaning I have compared the resulting hash with the one i should end up with..and they differ ..that + tracker response is FAILURE, unknown torrent ...or something
So how do you calculate the info_hash?
The metafile is already bencoded so I don't understand why you encode it again?
I finally got this working in Java code, here is my code:
byte metaData[]; //the raw .torrent file
int infoIdx = ?; //index of 'd' right after the "4:info" string
info_hash = SHAsum(Arrays.copyOfRange(metaData, infoIdx, metaData.length-1));
This assumes the 'info' block is the last block in the torrent file (wrong?)
Don't sort or anything like that, just use a substring of the raw torrent file.
Works for me.
bdecode the metafile. Then it's simply sha1(bencode(metadata['info']))
(i.e. bencode only the info dict again, then hash that).

How can I convert the tiger hash values from the official implementations into the form used by Direct Connect?

I am trying to implement a Direct Connect Client, and I am currently stuck at a point where I need to hash the files in order to be able to upload them to other clients.
As the all other clients require a TTHL (Tiger Tree Hashing Leaves) support for verification of the downloaded data. I have searched for implementations of the algorithm, and found tiger-hash-python.
I have implemented a routine that uses the hash function from before, and is able to hash large files, according to the logic specified in Tree Hash EXchange format (THEX) (basically, the tree diagram is the important part on that page).
However, the value produced by it is similar to those shown on Wikipedia, a hex digest, but is different from those shown in the DC clients I'm using for reference.
I have been unable to find out how the hex digest form is converted to this other one (39 characters, A-Z, 0-9). Could someone please explain how that is done?
Well ... I tried what Paulo Ebermann said, using the following functions:
def strdivide(list,length):
result = []
# Calculate how many blocks there are, using the condition: i*length = len(list).
# The additional maths operations are to deal with the last block which might have a smaller size
for i in range(0,int(math.ceil(float(len(list))/length))):
result.append(list[i*length:(i+1)*length])
return result
def dchash(data):
result = tiger.hash(data) # From the aformentioned tiger-hash-python script, 48-char hex digest
result = "".join([ "".join(strdivide(result[i:i+16],2)[::-1]) for i in range(0,48,16) ]) # Representation Transform
bits = "".join([chr(int(c,16)) for c in strdivide(result,2)]) # Converting every 2 hex characters into 1 normal
result = base64.b32encode(bits) # Result will be 40 characters
return result[:-1] # Leaving behind the trailing '='
The TTH for an empty file was found to be 8B630E030AD09E5D0E90FB246A3A75DBB6256C3EE7B8635A, which after the transformation specified here, becomes 5D9ED00A030E638BDB753A6A24FB900E5A63B8E73E6C25B6. Base-32 encoding this result yielded LWPNACQDBZRYXW3VHJVCJ64QBZNGHOHHHZWCLNQ, which was found to be what DC++ generates.
The only mention of the format of the hash in the Direct Connect protocol I found is on the $SR page on the NMDC Protocol wiki:
For files containing TTH, the <hub_name> parameter is replaced with TTH:<base32_encoded_tth_hash> (ref: TTH_Hash).
So, it is Base32-encoding. This is defined in RFC 4648 (and some earlier ones), section 6.
Basically, you are using the capital letters A-Z and the decimal digits 2 to 7, and one base32 digit represents 5 bits, while one base16 (hexadecimal) digit represents only 4 ones.
This means, each 5 hex digits map to 4 base32-digits, and for a Tiger hash (192 bits) you will need 40 base32-digits (in the official encoding, the last one would be a = padding, which seems to be omitted if you say that there are always 39 characters).
I'm not sure of an implementation of a conversion from hex (or bytes) to base32, but it shouldn't be too complicated with a lookup table and some bit-shifting.

Why does a base64 encoded string have an = sign at the end

I know what base64 encoding is and how to calculate base64 encoding in C#, however I have seen several times that when I convert a string into base64, there is an = at the end.
A few questions came up:
Does a base64 string always end with =?
Why does an = get appended at the end?
Q Does a base64 string always end with =?
A: No. (the word usb is base64 encoded into dXNi)
Q Why does an = get appended at the end?
A: As a short answer:
The last character (= sign) is added only as a complement (padding) in the final process of encoding a message with a special number of characters.
You will not have an = sign if your string has a multiple of 3 characters, because Base64 encoding takes each three bytes (a character=1 byte) and represents them as four printable characters in the ASCII standard.
Example:
(a) If you want to encode
ABCDEFG <=> [ABC] [DEF] [G]
Base64 deals with the first block (producing 4 characters) and the second (as they are complete). But for the third, it will add a double == in the output in order to complete the 4 needed characters. Thus, the result will be QUJD REVG Rw== (without spaces).
[ABC] => QUJD
[DEF] => REVG
[G] => Rw==
(b) If you want to encode ABCDEFGH <=> [ABC] [DEF] [GH]
similarly, it will add one = at the end of the output to get 4 characters.
The result will be QUJD REVG R0g= (without spaces).
[ABC] => QUJD
[DEF] => REVG
[GH] => R0g=
It serves as padding.
A more complete answer is that a base64 encoded string doesn't always end with a =, it will only end with one or two = if they are required to pad the string out to the proper length.
From Wikipedia:
The final '==' sequence indicates that the last group contained only one byte, and '=' indicates that it contained two bytes.
Thus, this is some sort of padding.
Its defined in RFC 2045 as a special padding character if fewer than 24 bits are available at the end of the encoded data.
No.
To pad the Base64-encoded string to a multiple of 4 characters in length, so that it can be decoded correctly.
The equals sign (=) is used as padding in certain forms of base64 encoding. The Wikipedia article on base64 has all the details.
It's padding. From http://en.wikipedia.org/wiki/Base64:
In theory, the padding character is not needed for decoding, since the
number of missing bytes can be calculated from the number of Base64
digits. In some implementations, the padding character is mandatory,
while for others it is not used. One case in which padding characters
are required is concatenating multiple Base64 encoded files.
http://www.hcidata.info/base64.htm
Encoding "Mary had" to Base 64
In this example we are using a simple text string ("Mary had") but the principle holds no matter what the data is (e.g. graphics file). To convert each 24 bits of input data to 32 bits of output, Base 64 encoding splits the 24 bits into 4 chunks of 6 bits. The first problem we notice is that "Mary had" is not a multiple of 3 bytes - it is 8 bytes long. Because of this, the last group of bits is only 4 bits long. To remedy this we add two extra bits of '0' and remember this fact by putting a '=' at the end. If the text string to be converted to Base 64 was 7 bytes long, the last group would have had 2 bits. In this case we would have added four extra bits of '0' and remember this fact by putting '==' at the end.
= is a padding character. If the input stream has length that is not a multiple of 3, the padding character will be added. This is required by decoder: if no padding present, the last byte would have an incorrect number of zero bits.
Better and deeper explanation here: https://base64tool.com/detect-whether-provided-string-is-base64-or-not/
The equals or double equals serves as padding. It's a stupid concept defined in RFC2045 and it is actually superfluous. Any decend parser can encode and decode a base64 string without knowing about padding by just counting up the number of characters and filling in the rest if size isn't dividable by 3 or 4 respectively. This actually leads to difficulties every now and then, because some parsers expect padding while others blatantly ignore it. My MPU base64 decoder for example needs padding, but it receives a non-padded base64 string over the network. This leads to erronous parsing and I had to account for it myself.