I was wondering if someone could explain .decode and .encode in hashlib to me?

I understand that you have a hex string and perform SHA256 on it twice and then byte-swap the final hex string. The goal of this code is to find a Merkle Root by concatenating two transactions. I would like to understand what's going on in the background a bit more. What exactly are you decoding and encoding?
>>> import hashlib
>>> transaction_hex = "93a05cac6ae03dd55172534c53be0738a50257bb3be69fff2c7595d677ad53666e344634584d07b8d8bc017680f342bc6aad523da31bc2b19e1ec0921078e872"
>>> transaction_bin = transaction_hex.decode('hex')
>>> hash = hashlib.sha256(hashlib.sha256(transaction_bin).digest()).digest()
>>> hash.encode('hex_codec')
'38805219c8ac7e9a96416d706dc1d8f638b12f46b94dfd1362b5d16cf62e68ff'
>>> hash[::-1].encode('hex_codec')
'ff682ef66cd1b56213fd4db9462fb138f6d8c16d706d41969a7eacc819528038'

transaction_hex is a regular string of lowercase ASCII characters, and the decode() method with the 'hex' argument turns it into a (binary) string (or a bytes object in Python 3) containing the bytes 0x93, 0xa0, etc. In C it would be an array of unsigned char, of length 64 in this case.
This 64-byte string is then hashed with SHA-256, and the result (another binary string, of length 32) is hashed again. So hash is a string of length 32 (a bytes object of that length in Python 3). encode('hex_codec') is a synonym for encode('hex') in Python 2. It outputs an ASCII string again, replacing each raw byte (which is just a small integer) with the two-character lowercase hexadecimal representation of its value. So the final line reverses the byte order of the double hash and outputs it as hexadecimal, in a form I usually call "lowercase hex ASCII".
Note that the code as written only runs on Python 2: in Python 3, str has no decode() method and bytes has no encode() method, so you would use bytes.fromhex() to decode and bytes.hex() or codecs.encode(..., 'hex_codec') to encode.
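For reference, here is a minimal Python 3 sketch of the same steps, assuming Python 3.5+ (for bytes.hex()). It uses the transaction hex from the question; the variable name double_hash is my own choice, picked so the built-in hash() is not shadowed. No specific digest value is claimed here; the point is which calls turn hex text into raw bytes and back.

```python
import codecs
import hashlib

# The 64-byte input from the question, written as lowercase hex ASCII text
transaction_hex = (
    "93a05cac6ae03dd55172534c53be0738a50257bb3be69fff2c7595d677ad5366"
    "6e344634584d07b8d8bc017680f342bc6aad523da31bc2b19e1ec0921078e872"
)

# Python 3 counterpart of transaction_hex.decode('hex'): hex text -> raw bytes
transaction_bin = bytes.fromhex(transaction_hex)
print(len(transaction_bin))  # 64

# Double SHA-256; .digest() returns 32 raw bytes, not hex text
double_hash = hashlib.sha256(hashlib.sha256(transaction_bin).digest()).digest()
print(len(double_hash))  # 32

# Python 3 counterparts of hash.encode('hex_codec'): raw bytes -> hex text
hash_hex = double_hash.hex()
same_hex = codecs.encode(double_hash, "hex_codec").decode("ascii")
print(hash_hex == same_hex)  # True

# Reverse the byte order first (hash[::-1] in the question), then hex-encode
reversed_hex = double_hash[::-1].hex()
print(reversed_hex)
```

Reversing happens on the raw bytes, so each two-character hex pair moves as a unit; reversing the hex text itself would also swap the two nibbles inside every byte.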


Get UTF-16 code unit at a given index in ABAP

I want to get the UTF-16 code unit at a given index in ABAP.
The same can be done in JavaScript with charCodeAt().
For example "d".charCodeAt(); will give back 100.
Is there a similar functionality in ABAP?
This can be done with class CL_ABAP_CONV_OUT_CE
DATA(lo_converter) = cl_abap_conv_out_ce=>create( encoding = '4103' ). "Little endian
TRY.
CALL METHOD lo_converter->convert
EXPORTING
data = 'a'
n = 1
IMPORTING
buffer = DATA(lv_buffer). "lv_buffer will contain 6100 (bytes 61 00, little endian)
CATCH ...
ENDTRY.
Codepage 4102 is for UTF-16 Big endian.
It is possible to encode not just a single character, but a string as well:
EXPORTING
data = 'abc'
n = 3
"n" is the number of characters you want to encode; it can be less than the actual length of the string.
When you say you "want to get the UTF-16 code unit",
either you mean the Unicode code point, e.g. the character d is always U+0064 (official "name" of Unicode character, the two bytes 0x0064 being the hexadecimal representation of decimal 100),
or you mean you want to encode d to UTF-16 little endian (SAP code page 4103) or big endian (SAP code page 4102), which gives respectively the 2 bytes 0x6400 or the 2 bytes 0x0064.
For the second case, see József's answer.
For the first case, you may get it using the method UCCP (UniCode Code Point) or UCCPI (UniCode Code Point Integer) of class CL_ABAP_CONV_OUT_CE:
DATA: l_unicode_point_hex TYPE x LENGTH 2,
l_unicode_point_int TYPE i.
l_unicode_point_hex = cl_abap_conv_out_ce=>UCCP( 'd' ).
ASSERT l_unicode_point_hex = '0064'.
l_unicode_point_int = cl_abap_conv_out_ce=>UCCPI( 'd' ).
ASSERT l_unicode_point_int = 100.
EDIT: Note that the two methods always return the same values, whatever the SAP system code page is (4102, 4103 or any other).
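To make the code point vs. code unit distinction concrete, here is a small Python sketch (not ABAP). The helper char_code_at is hypothetical, written to mimic JavaScript's charCodeAt(); unlike JavaScript, which returns NaN for an out-of-range index, it raises IndexError. The byte outputs correspond to the SAP code pages named above.

```python
def char_code_at(s: str, index: int = 0) -> int:
    """Hypothetical helper: UTF-16 code unit at `index`, like JavaScript's charCodeAt().

    Python indexes str by code point, so the string is encoded to UTF-16 first;
    a character outside the BMP then takes two code units (a surrogate pair).
    """
    data = s.encode("utf-16-le")  # little endian, no byte order mark
    if not 0 <= index < len(data) // 2:
        raise IndexError("UTF-16 index out of range")
    return int.from_bytes(data[2 * index:2 * index + 2], "little")


print(char_code_at("d"))                   # 100
print(ord("d"), hex(ord("d")))             # 100 0x64: the code point U+0064
print("d".encode("utf-16-le").hex())       # 6400: little endian, like SAP code page 4103
print("d".encode("utf-16-be").hex())       # 0064: big endian, like SAP code page 4102
print("abc".encode("utf-16-le").hex())     # 610062006300
print(hex(char_code_at("\U0001F600", 0)))  # 0xd83d: high surrogate of U+1F600
```

For characters in the Basic Multilingual Plane the code point and the single UTF-16 code unit have the same value, which is why both approaches give 100 for d; they only diverge for characters that need a surrogate pair.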

How to convert string in UTF-8 to ASCII ignoring errors and removing non ASCII characters

I am new to Scala.
Please advise how to convert strings in UTF-8 to ASCII ignoring errors and removing non ASCII characters in output.
For example, how to remove non ASCII character \uc382 from result string: "hello���", so that "hello" is printed in output.
scala.io.Source.fromBytes("hello\uc382".getBytes ("UTF-8"), "US-ASCII").mkString
val str = "hello\uc382"
str.filter(_ <= 0x7f) // keep only valid ASCII characters
If you had text as UTF-8 bytes and it is now in a String, it has already been decoded.
If you have text in a String and want it as ASCII bytes, you can encode it at the end.
It seems that you just want to filter for only the UTF-16 code units for the C0 Controls and Basic Latin codepoints. Fortunately, such codepoints take only one code unit so we can filter them directly without converting them to codepoints.
import java.nio.charset.StandardCharsets

"hello\uC382"
  .filter(Character.UnicodeBlock.of(_) == Character.UnicodeBlock.BASIC_LATIN)
  .getBytes(StandardCharsets.US_ASCII)
  .foreach(println)
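For comparison, a Python sketch of the same idea (assuming Python 3). The first part reproduces, roughly, the "hello���" symptom from the question: U+C382 becomes three non-ASCII UTF-8 bytes, and decoding them as ASCII with replacement yields three U+FFFD characters. Whether the Scala codec behaves exactly the same way is an assumption based on the output shown in the question.

```python
s = "hello\uc382"

# Encode to UTF-8, then decode as ASCII with replacement: each of the three
# non-ASCII bytes of U+C382 becomes U+FFFD
garbled = s.encode("utf-8").decode("ascii", errors="replace")
print(ascii(garbled))  # 'hello\ufffd\ufffd\ufffd'

# Filtering approach: keep only code points 0x00-0x7F
# (the C0 Controls and Basic Latin block), then encode
ascii_only = "".join(ch for ch in s if ord(ch) <= 0x7F)
print(ascii_only)                        # hello
print(list(ascii_only.encode("ascii")))  # [104, 101, 108, 108, 111]
```

The filter runs on the decoded text, before any bytes are produced, which is why nothing needs to be ignored or replaced during the final encode.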
With the question generalized to an arbitrary, known character encoding, filtering doesn't do the job. Instead, the feature of the encoder to ignore characters that are not present in the target Charset can be used. An Encoder requires a bit more wrapping and unwrapping. (The API design is based on streaming and reusing the buffer within the same stream and even other streams.) So, with ISO_8859_1 as an example:
import java.nio.CharBuffer
import java.nio.charset.{CodingErrorAction, StandardCharsets}

val encoder = StandardCharsets.ISO_8859_1
  .newEncoder()
  .onMalformedInput(CodingErrorAction.IGNORE)
  .onUnmappableCharacter(CodingErrorAction.IGNORE)
val string = "ñhello\uc382"
println(string)
val chars = CharBuffer.allocate(string.length())
  .put(string)
chars.rewind()
val buffer = encoder.encode(chars)
val bytes = Array.ofDim[Byte](buffer.remaining())
buffer.get(bytes)
// Print each encoded byte (as a signed value); printing the array itself only shows [B@...
bytes.foreach(println)
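The same "drop what the target charset can't represent" behaviour can be sketched in Python with the errors="ignore" handler of str.encode; this is Python's own mechanism, offered only as an analogue of CodingErrorAction.IGNORE. Note that Python prints bytes as unsigned values, so "ñ" shows up as 241 where the Scala code prints -15.

```python
s = "\u00f1hello\uc382"  # "ñhello" followed by U+C382, as in the Scala example

# errors="ignore" drops every character the target encoding cannot represent
latin1 = s.encode("iso-8859-1", errors="ignore")
print(list(latin1))  # [241, 104, 101, 108, 108, 111]

# With ASCII as the target, the "ñ" is dropped as well
print(s.encode("ascii", errors="ignore"))  # b'hello'
```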

Get the UTF-8 Encoding of a Character in Bytes

On a String, I can use utf8 and count to get the number of bytes required to encode the String with UTF-8 encoding:
"a".utf8.count // 1
"チャオ".utf8.count // 9
"チ".utf8.count // 3
However, I don't see an equivalent method on a single Character value. To get the number of bytes required to encode a character in the string to UTF-8, I could iterate through the string by character, convert the Character to a String, and get the utf8.count of that String:
"チャオ".forEach { print(String($0).utf8.count) } // 3, 3, 3
This seems unnecessarily verbose. Is there a way to get the UTF-8 encoding of a Character in Swift?
Character has no direct (public) accessor to its UTF-8 representation.
There are some internal methods in Character.swift dealing with the UTF-8 bytes, but the public API is implemented in String.UTF8View in StringUTF8.swift.
Therefore String(myChar).utf8.count is the correct way to obtain the length of the character's UTF-8 representation.
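As a cross-check of the byte counts in the question, here is a Python sketch; utf8_len is a hypothetical helper name. One caveat: iterating a Python str yields code points, whereas a Swift Character is an extended grapheme cluster, so the per-character counts only agree when each Character is a single code point, as in "チャオ".

```python
def utf8_len(s: str) -> int:
    """Number of bytes needed to encode s as UTF-8 (compare Swift's s.utf8.count)."""
    return len(s.encode("utf-8"))


word = "\u30c1\u30e3\u30aa"  # the katakana string "チャオ" from the question

print(utf8_len("a"))                  # 1
print(utf8_len(word))                 # 9
print([utf8_len(ch) for ch in word])  # [3, 3, 3]

# "e" + U+0301 (combining acute accent) is one Character in Swift,
# but two code points here, so Python splits it into two counts
print([utf8_len(ch) for ch in "e\u0301"])  # [1, 2]
print(utf8_len("e\u0301"))                 # 3
```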

digits in long to base64 characters

I am working on a small task which requires some base64 encoding. I am trying to do it in my head but getting lost.
I have a 13 digit number in java long format say: 1294705313608 , 1294705313594 , 1294705313573
I do some processing with it: basically I take this number, append some stuff to it, put it in a byte array and then convert it to base64 using:
String b64String = new sun.misc.BASE64Encoder().encodeBuffer(bArray);
Now, I know that for my original number, the first 3 digits would never change. So 129 is constant in the above numbers. I want to find out how many chars corresponding to those digits would not change in the resulting base64 string.
Code to serialize long to the byte array. I ignore the first 2 bytes since they are always 0:
bArray[0] = (byte) (time >>> 40);
bArray[1] = (byte) (time >>> 32);
bArray[2] = (byte) (time >>> 24);
bArray[3] = (byte) (time >>> 16);
bArray[4] = (byte) (time >>> 8);
bArray[5] = (byte) (time >>> 0);
Thanks.
Notes:
I know that base64 takes 6 bits and makes one character out of them. So if the first 3 digits of the long do not change, how many chars would not change in base64?
This is NOT a HW assignment, but I am not very familiar with encoding.
1290000000000 is 10010110001011001111111011110010000000000 in binary.
1299999999999 is 10010111010101110000010011100011111111111 in binary.
Both are 41 bits long, and they differ after the first 7 bits. Your shift places bits 41-48 in the first byte, which will always be 00000001. The following byte will always be 00101100, 00101101, or 00101110. So you've got the leading 14 bits in common across all your possible array values, which (at 6 bits per encoded base64 char) means 2 characters in common in the encoded string.
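The bit analysis above can be checked mechanically. This Python sketch packs the low 48 bits big-endian, matching the six Java shifts, and encodes them with standard base64. It assumes nothing is placed in front of these 6 bytes and ignores the line terminator that sun.misc.BASE64Encoder.encodeBuffer appends; the helper names are hypothetical.

```python
import base64
import os


def long_to_6_bytes(time: int) -> bytes:
    """Low 48 bits, big-endian: the same bytes as the Java shifts >>> 40 ... >>> 0."""
    return (time & 0xFFFF_FFFF_FFFF).to_bytes(6, "big")


def b64(time: int) -> str:
    """Standard base64 (no line breaks) of the 6-byte array."""
    return base64.b64encode(long_to_6_bytes(time)).decode("ascii")


lo, hi = 1290000000000, 1299999999999
print(long_to_6_bytes(lo).hex(), b64(lo))        # 012c59fde400 ASxZ/eQA
print(long_to_6_bytes(hi).hex(), b64(hi))        # 012eae09c7ff AS6uCcf/
print(os.path.commonprefix([b64(lo), b64(hi)]))  # AS

for t in (1294705313608, 1294705313594, 1294705313573):
    print(t, b64(t))
```

The second bytes 0x2c and 0x2e are the 00101100 and 00101110 from the answer, and the third base64 character already differs (x vs 6) because it spans bits 13 to 18, only two of which are shared.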
It appears you're on the right track. I think what you want to do is convert a long to a byte array, then convert the byte array to Base64.
The question "How do I convert Long to byte[] and back in java" shows how to convert it to bytes.

Objective-C Decimal to Base 16 Hex conversion

Does anyone have a code snippet or a class that will take a long long and turn it into a 16 byte Hex string?
I'm looking to turn data like this
long long decimalRepresentation = 1719886131591410351;
and turn it into this
//Base 16 Hex Output: 17DE435307A07300
The %x operator doesn't want to work for me
NSLog(@"Hex: %x", decimalRepresentation);
//console : "Hex: 7a072af"
As you can see that's not even close. Any help is truly appreciated!
%x prints an unsigned int in hexadecimal, and sizeof(long long) != sizeof(unsigned int), so only part of the value is printed. See e.g. "Data Type Size and Alignment" in the 64-bit transitioning guide.
Use the ll length modifier (that's two lowercase L's) to get the desired output, and %llX if you want uppercase hex digits:
NSLog(@"%llx", decimalRepresentation);
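A quick Python check of the numbers involved (Python rather than Objective-C, purely to verify the arithmetic). The full 64-bit value is 17DE435307A072AF, and its low 32 bits are exactly the 7a072af that %x printed. The expected 17DE435307A07300 in the question differs in the last three digits; the last line suggests, as an assumption not confirmed by the question, that it was produced by a tool that rounded the number through a double.

```python
value = 1719886131591410351  # decimalRepresentation from the question

print(format(value, "X"))               # 17DE435307A072AF: all 64 bits, like %llX
print(format(value & 0xFFFFFFFF, "x"))  # 7a072af: only the low 32 bits, like the %x output
print(format(int(float(value)), "X"))   # 17DE435307A07300: after rounding through a double
```

A double has a 53-bit significand, so values of this size are spaced 256 apart and the low byte 0xAF gets rounded up to 0x100.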