SHA256 of a binary string? - hash

I have a JSON object and I need to calculate the SHA256 hash of it. Now, I can either serialize it into a binary string or a human readable string.
My question is: does calculating the SHA256 hash of a binary string make any sense, or should I go the human-readable string route?

The algorithm does not care about that. It simply takes the bytes you provide and calculates the hash over them.
If you want to compare hashes, you just need to make sure that you serialize the object the same way every time.
When it comes to speed, I would expect the binary string to be faster. Human-readable representations often contain indentation and other "overhead", which means more bytes to process.
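As a minimal sketch of "doing it the same way", the Python below hashes one deterministic serialization of an object. The function name sha256_of_json and the sample data are made up for illustration, and sorted keys with compact separators are just one possible convention, not a standard canonical form.

```python
import hashlib
import json


def sha256_of_json(obj):
    """Return the SHA-256 hex digest of a deterministic JSON serialization."""
    # sort_keys and fixed separators make the byte output independent of
    # dict insertion order and of pretty-printing choices.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"name": "Alice", "id": 1}
b = {"id": 1, "name": "Alice"}  # same data, different key order

same_digest = sha256_of_json(a) == sha256_of_json(b)

# Hashing a differently formatted serialization of the same data
# produces a different digest, because the bytes differ.
pretty = json.dumps(a, indent=4).encode("utf-8")
pretty_digest = hashlib.sha256(pretty).hexdigest()

print(same_digest)
print(pretty_digest == sha256_of_json(a))
```

Whichever serialization is chosen, both sides of a comparison have to use exactly the same one.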

Related

Why doesn't md5 (and other hash algorithms) output in base32?

It seems like most hashes (usually in base16/hex) could be easily represented in base32 in a lossless way, resulting in much shorter (and more easily readable) hash strings.
I understand that naive implementations might mix "O"s, "0"s, "1"s, and "I"s, but one could easily choose alphabetic characters without such problems. There are also enough characters to keep hashes case-insensitive. I know that shorter hash algorithms exist (like crc32), but this idea could be applied to those too for even shorter hashes.
Why, then, do most (if not all) hash algorithm implementations not output in base32, or at least provide an option to do so?
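For a sense of the sizes involved, here is a small Python sketch that renders the same MD5 digest in hex and in Base32 using the standard library. The input b"hello" is arbitrary, and stripping the "=" padding is an assumption about how such a string would be displayed.

```python
import base64
import hashlib

digest = hashlib.md5(b"hello").digest()  # 16 raw bytes

hex_form = digest.hex()
# Standard Base32 pads the output with "=" to a multiple of 8 characters;
# the padding carries no information, so it is stripped here for display.
b32_form = base64.b32encode(digest).decode("ascii").rstrip("=")

# Restore the padding before decoding to show the conversion is lossless.
padded = b32_form + "=" * (-len(b32_form) % 8)
round_trip = base64.b32decode(padded)

print(len(hex_form), hex_form)
print(len(b32_form), b32_form)
```

For a 128-bit digest this gives 26 characters instead of 32.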

Is base64 encoding always one to one

Is it possible to obtain two identical encoded values for two different inputs to the Base64 encoding algorithm?
Let's use another algorithm for example, a function that replaces underscores with the letter X.
Foo_Bar = FooXBar
FooXBar = FooXBar
Can this sort of thing ever happen with Base64 encoding?
No, it cannot happen. Base64 is a lossless conversion (which is why the output is about 33% larger than the input). In math terms, the Base64 encoding function is injective (one-to-one): distinct inputs always give distinct outputs.
Note how HTTP basic access authentication uses this encoding for the username and the password. Anyone can recover the original strings from the encoded one, which is why this authentication scheme should only be used over HTTPS.
You can find more details on Base64 on Wikipedia.
No, Base64 is just a way of encoding binary data as printable characters.
Conceptually, it's just a number system, like binary (base 2), decimal (base 10), or hexadecimal (base 16). Just as you can convert losslessly between those, you can with base 64. Mathematically, the base is only a notation for human use; the value is the same no matter which base you write it in.
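The points made in the answers above can be checked with Python's standard base64 module. This is only an illustrative sketch; the credentials "alice:secret" are invented for the example.

```python
import base64

# The two inputs from the question encode to different strings.
first = base64.b64encode(b"Foo_Bar")
second = base64.b64encode(b"FooXBar")

# Decoding recovers the original input exactly.
decoded = base64.b64decode(first)

# HTTP Basic authentication sends "username:password" Base64-encoded,
# so anyone who sees the header value can recover both.
credentials = base64.b64encode(b"alice:secret")
recovered = base64.b64decode(credentials)

print(first, second)
print(decoded)
print(recovered)
```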

Base58 Encoder function in PostgreSQL for TEXT

Can anyone help me implement a Base58 encoding stored procedure in PostgreSQL?
I've found an answer for numbers, but I'm looking for a similar stored procedure that can accept a TEXT or VARCHAR value.
On this very rare occasion I'm going to suggest you don't do this. It will be computationally possible but highly inadvisable.
https://en.wikipedia.org/wiki/Base58
In contrast to Base64, the digits of the encoding don't line up well
with byte boundaries of the original data. For this reason, the method
is well-suited to encode large integers, but not designed to encode
longer portions of binary data.
To put this another way, Base58 is not designed to encode strings / text. Your main alternatives are:
Base64, which is safe to copy and paste, although a human copying it by hand may make mistakes
Hexadecimal, which is easily copied by humans but significantly longer than Base64
If you feel you really need Base58 and not Base64, then it may be worth editing your requirements into your question. This may help someone give an answer more specific to your requirements:
What are these strings you need to convert (examples are preferable)?
Why do they need to be Base58 and not Base64 (what other system are you passing these to)?
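To illustrate why Base58 is integer-oriented, here is a rough Python sketch (not a PostgreSQL procedure) that encodes bytes by treating the whole input as one large integer. It assumes the Bitcoin alphabet and the Bitcoin convention of writing leading zero bytes as '1'; the function names are made up for this example.

```python
# Bitcoin-style alphabet: no 0, O, I or l (assumed here; other variants exist).
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    # Interpret the whole input as one big-endian integer.
    number = int.from_bytes(data, "big")
    digits = []
    while number > 0:
        number, remainder = divmod(number, 58)
        digits.append(ALPHABET[remainder])
    # Leading zero bytes vanish in the integer, so they are written
    # explicitly as leading '1' characters.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))


def base58_decode(text: str) -> bytes:
    number = 0
    for char in text:
        number = number * 58 + ALPHABET.index(char)
    leading_ones = len(text) - len(text.lstrip("1"))
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return b"\x00" * leading_ones + body


encoded = base58_encode("Hello World!".encode("utf-8"))
print(encoded)
print(base58_decode(encoded))
```

Because every output digit depends on the entire input, the repeated division gets slower as the text grows, unlike Base64, which works on independent 3-byte groups.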

String that cannot be generated by SHA1

How do I generate or find a string that cannot possibly be produced by SHA1 hashing of any input string?
The reason I ask this is because I need a global password placeholder in user table.
Thanks
It depends on the representation you use to store the SHA1 hash. But just a *, as is sometimes used in /etc/passwd, should work. An empty string would work too, but I would use something more explicit, like '*invalid'.
If you are using the standard hex representation (e.g. '68ac906495480a3404beee4874ed853a037a7a8f'), you can use anything that is not a 40-digit hex number. Use some ASCII character that is not in [0-9a-f], or better yet not in [0-9a-zA-Z].
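A quick way to convince yourself that a placeholder is safe is to test it against the shape of a hex SHA1 digest. The sketch below assumes hashes are stored as 40-character hex strings; the names SHA1_HEX, PLACEHOLDER and could_be_sha1_hex are hypothetical.

```python
import hashlib
import re

# A hex-encoded SHA1 digest is exactly 40 hexadecimal characters.
SHA1_HEX = re.compile(r"[0-9a-fA-F]{40}")
PLACEHOLDER = "*invalid"  # hypothetical placeholder value


def could_be_sha1_hex(value: str) -> bool:
    """Return True if value has the shape of a hex SHA1 digest."""
    return SHA1_HEX.fullmatch(value) is not None


real_digest = hashlib.sha1(b"some password").hexdigest()

print(could_be_sha1_hex(real_digest))
print(could_be_sha1_hex(PLACEHOLDER))
```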

Convert 32-char md5 string to integer

What's the most efficient way to convert an md5 hash to a unique integer to perform a modulus operation?
Since the solution language was not specified, Python is used for this example.
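import hashlib
import os

MODULUS = 1000  # example value; replace with the modulus you need

data = os.urandom(1 << 20)  # 1 MiB of random bytes as sample input
digest = hashlib.md5(data).hexdigest()  # 32-character hex string
number = int(digest, 16)  # parse the hex digest as a 128-bit integer
print(number % MODULUS)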
You haven't said what platform you're running on, or what the format of this hash is. Presumably it's hex, so you've got 16 bytes of information.
In order to convert that to a unique integer, you basically need a 16-byte (128-bit) integer type. Many platforms don't have such a type available natively, but you could use two long values in C# or Java, or a BigInteger in Java or .NET 4.0.
Conceptually you need to parse the hex string to bytes, and then convert the bytes into an integer (or two). The most efficient way of doing that will entirely depend on which platform you're using.
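In Python, where integers are arbitrary precision, the two steps described above (hex string to bytes, bytes to integer) might look like the sketch below. It also shows the "two 64-bit values" layout for platforms without a 128-bit type; the input b"example input" is arbitrary.

```python
import hashlib
import struct

digest_hex = hashlib.md5(b"example input").hexdigest()

# Step 1: parse the 32 hex characters into 16 raw bytes.
raw = bytes.fromhex(digest_hex)

# Step 2a: interpret the bytes as one 128-bit big-endian integer.
as_int = int.from_bytes(raw, "big")

# Step 2b: alternatively, split into two unsigned 64-bit halves.
high, low = struct.unpack(">QQ", raw)
recombined = (high << 64) | low

print(as_int)
print(high, low)
```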
There is more data in an MD5 than will fit in even a 64-bit integer, so there's no way (without knowing what platform you are using) to get a unique integer. You can get a somewhat unique one by converting the hex version to several integers' worth of data and then combining them (by addition or multiplication). How exactly you would go about that depends on what language you are using, though.
A lot of languages implement either an unpack or an sscanf function, which are good places to start looking.
If all you need is the modulus, you don't actually need to convert it to a 128-bit integer. You can go digit by digit or byte by byte, like this:
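/* md5 is the 32-character hex string; divider is the modulus. */
unsigned long long mod = 0;
for (int i = 0; i < 32; i++)
{
    char c = md5[i];
    /* Convert one hex character to its value 0..15. */
    int digit = (c >= '0' && c <= '9') ? c - '0' : (c | 0x20) - 'a' + 10;
    /* Horner's rule: only the running remainder is kept. */
    mod = (mod * 16 + digit) % divider;
}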
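The same digit-by-digit idea can be written in Python and checked against the big-integer result. The helper name hex_mod and the sample input are made up for this sketch.

```python
import hashlib


def hex_mod(hex_string: str, divider: int) -> int:
    """Compute int(hex_string, 16) % divider one hex digit at a time."""
    remainder = 0
    for char in hex_string:
        # Shift the running remainder by one hex digit, add the new
        # digit, and reduce again so the value never grows large.
        remainder = (remainder * 16 + int(char, 16)) % divider
    return remainder


digest = hashlib.md5(b"example input").hexdigest()

print(hex_mod(digest, 97))
print(int(digest, 16) % 97)  # same value, computed via a big integer
```

This works because (a * 16 + d) mod m depends only on a mod m, so reducing at every step does not change the final remainder.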
You'll need to define your own hash function that converts an MD5 string into an integer of the desired width. If you want to interpret the MD5 hash as a plain string, you can try the FNV algorithm. It's pretty quick and fairly evenly distributed.
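As a rough sketch of that suggestion, the Python below implements the 32-bit FNV-1a variant and applies it to an MD5 hex string. The answer does not say which FNV variant or width it has in mind, so FNV-1a at 32 bits is an assumption, and bucket_for is a hypothetical helper name.

```python
# Published constants for the 32-bit FNV hash.
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime."""
    value = FNV32_OFFSET_BASIS
    for byte in data:
        value ^= byte
        value = (value * FNV32_PRIME) & 0xFFFFFFFF  # keep the low 32 bits
    return value


def bucket_for(md5_hex: str, buckets: int) -> int:
    """Map an MD5 hex string to a bucket index (hypothetical helper)."""
    return fnv1a_32(md5_hex.encode("ascii")) % buckets


print(bucket_for("d41d8cd98f00b204e9800998ecf8427e", 10))
```

The result is not unique: 128 bits are folded into 32, so different MD5 strings can land on the same integer.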