Will hashing a UUID with CRC32 always be unique? - hash

For example, I have a UUID 3f107c44-336c-409b-b6d8-889d552a5339. If I hash it with CRC32, can I be sure that no two UUIDs will ever produce the same hash?
=========
The reason I ask is that I am not sure my understanding is correct.
What I really wanted to do is to generate a unique id based on an existing unique id.

No. A UUID represents a 128-bit value. A CRC-32 is only 32 bits. So at best, each CRC-32 value is shared by 2^96 (79,228,162,514,264,337,593,543,950,336) different UUID values.
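The counting argument above can be checked with a minimal Python sketch. It assumes the CRC-32 is taken over the 16 raw bytes of the UUID (the question does not say whether the bytes or the text form would be hashed), and the function names are made up for illustration.

```python
import uuid
import zlib


def crc32_of_uuid(text: str) -> int:
    """Return the CRC-32 of a UUID's 16 raw bytes (an assumed input encoding)."""
    return zlib.crc32(uuid.UUID(text).bytes)


def uuids_per_crc_value() -> int:
    """Pigeonhole count: 2^128 possible UUID values spread over 2^32 CRC values."""
    return 2 ** 128 // 2 ** 32


checksum = crc32_of_uuid("3f107c44-336c-409b-b6d8-889d552a5339")
print(f"{checksum:08x}")        # always fits in 8 hex digits (32 bits)
print(uuids_per_crc_value())    # 2^96
```

Whatever input encoding is used, the output has only 2^32 possible values, so distinct UUIDs must share checksums.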

Related

What's a best practice for saving a unique, random, short string to db?

I have a table with a varchar column named key, which is supposed to hold a unique, 8-char random string that users will use as a unique identifier. This field should be generated and saved when objects are created. My question is about how to create it:
Most recommendations point to a UUID field, but that is not applicable for me because it's too long, and if I just take a subset of it then there's no guarantee of uniqueness.
Currently I've just implemented a loop in my backend (not in the DB), which generates a random string, tries to insert it into the DB, and retries if the string turns out not to be unique. But I feel that this is really bad practice.
What's the best way to do this?
I'm using Postgresql 9.6
UPDATE:
My main concern is to remove the loop that retries to find a random, short string (or number, doesn't matter) that is unique in that table. AFAIK the solution should be a way to generate the string in the DB itself. The only things I can find for PostgreSQL that do something like this are uuid and uuid-ossp, but a uuid is way too long for my application, and I don't know of any way to get a shorter representation of a uuid without compromising its uniqueness (and I don't think it's theoretically possible).
So, how can I remove the loop and its back-and-forth to the DB?
Encryption is guaranteed to produce unique outputs for unique inputs; it has to, otherwise decryption would not work. Provided you encrypt unique inputs, such as 0, 1, 2, 3, ..., you are guaranteed unique outputs.
You want 8 characters. You have 62 characters to play with: A-Z, a-z, 0-9, so convert the binary output from the encryption to a base 62 number.
You may need to use the cycle walking technique from format-preserving encryption to handle the few outputs that fall outside the 8-character range.

Conversion of hashbytes output to integer allowed?

I have some code with the following lines:
CAST(HASHBYTES('MD5', 'some long string with up to 256 characters') AS int)
CAST(HASHBYTES('SHA2_256', 'some very very long string...') AS int)
This has been done to generate a unique int value, and later the int value is used as a lookup key (or foreign key in a join). So my best guess as to why it's done this way is to make the join quicker and to be able to create an index (and not run into the 900-byte limit there).
But I'm unsure whether converting the output to int will create many more collisions.
My understanding is that it is not possible to represent an MD5 hash or even a SHA-256 hash as an int value...
The code originally was developed on SQL-Server 2008. I currently use SQL-Server 2014.
Of course it will create many more collisions. An int is only 4 bytes (32 bits), whereas MD5 generates 16 bytes (128 bits) and SHA2_256 32 bytes (256 bits). Fewer bits means fewer unique values, so more collisions.
Technically it is allowed...
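The size mismatch can be illustrated with a small Python sketch. It assumes the cast keeps four bytes of the digest and interprets them as a signed 32-bit integer; the choice of the last four bytes here is an assumption made for illustration and does not affect the counting. The function name is hypothetical.

```python
import hashlib


def last_four_bytes_as_int32(digest: bytes) -> int:
    """Keep only 4 bytes of a digest as a signed 32-bit int (assumed cast behavior)."""
    return int.from_bytes(digest[-4:], "big", signed=True)


text = b"some long string with up to 256 characters"
md5_digest = hashlib.md5(text).digest()
sha_digest = hashlib.sha256(text).digest()

print(len(md5_digest) * 8, len(sha_digest) * 8)   # digest sizes in bits
print(last_four_bytes_as_int32(md5_digest))       # only 32 of those bits survive
```

Every digest that agrees on the kept four bytes ends up as the same int, regardless of which hash function produced it.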

Swift: Unique Int id from String

I am using Parse, which has a preloaded User table in the database. I want a unique userId (Int) for each user. Parse's objectId is unique but not an Int, and username is a String. The username is unique for each user, so can I somehow convert each username into a number?
I tried .toInt() , Int() but I got nothing.
WHY :
I have an existing table with users' ratings (movies) and I want to extend this table with more ratings. The userId field is a Number value, so I must keep it this way.
Swift String has a hash property. It also conforms to the Hashable protocol. Maybe you can use that.
However, hashValue has the following comment:
Axiom: x == y implies x.hashValue == y.hashValue.
Note: The hash value
is not guaranteed to be stable across different invocations of the
same program. Do not persist the hash value across program runs.
so, use carefully...
Note: as stated in the comments, the hashValue is not guaranteed to be unique, but collisions should be rare, so it may be a solution anyway.
A unique mapping from arbitrary Strings to Ints is not possible. You have to put some constraints on the allowed characters and the string length. Even if you use case-insensitive alphanumeric user names with some smart variable-length bit encoding, you are looking at roughly 5 bits per character on average. A 64-bit integer can accommodate up to about 12 characters this way. For anything longer than that, you will inevitably have collisions.
I think you approach the problem from the wrong end. Instead of having a function for String -> Int mapping, what stops you from having a separate table with Int <-> String mapping? Just have some functionality that will check whether a userID exists in that table, and if it does not, then insert a new record for such userID and assign a new unique number to it. This way it will take quite some time and service popularity to deplete 64-bit integer capacity.
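The separate mapping table suggested above could look roughly like the following Python sketch. It uses an in-memory SQLite database purely for illustration; the table name user_ids and the helper names are hypothetical, and a real setup would use whatever database already holds the ratings.

```python
import sqlite3


def open_mapping(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a table that assigns one integer per username."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_ids ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " username TEXT NOT NULL UNIQUE)"
    )
    return conn


def get_or_create_id(conn: sqlite3.Connection, username: str) -> int:
    """Return the integer for a username, inserting a new row on first sight."""
    with conn:  # commit the insert (or roll back on error)
        conn.execute(
            "INSERT OR IGNORE INTO user_ids (username) VALUES (?)", (username,)
        )
    row = conn.execute(
        "SELECT id FROM user_ids WHERE username = ?", (username,)
    ).fetchone()
    return row[0]


conn = open_mapping()
print(get_or_create_id(conn, "alice"), get_or_create_id(conn, "bob"))
```

Because the database hands out the numbers, uniqueness comes from the UNIQUE constraint and the primary key rather than from any property of a hash.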

Data hashing in Pentaho

Can anyone suggest the best options in Pentaho for my requirement? We need to convert the first_name & last_name attributes into hashes and load the hash values for these columns into the user table to support the business reports. The reports do not need the actual values of these columns; the reporting code only checks for NULL values in the first_name & last_name columns and validates the length of these fields.
I tried converting the fields to hash using Add checksum transformation but wasn't sure about which type of checksum to use (CRC 32, ADLER 32, MD5, SHA-1). Any suggestions?
The source & target DB is PostgreSQL, in case that matters.
Thanks in advance.
Hashing and encryption are not the same thing.
It seems you want a one-way hash. What hash you choose depends mainly on how much you care about collisions. If you don't care that multiple names could generate the same hash, a short fast hash like CRC32 is fine. If you do care about collisions then I'd use at least MD5.

How to encrypt a string with standard PostgreSQL?

I'm working with PostgreSQL.
I need to transform "http://www.xyz.com/some_uri/index1.html" into something like "scdfdsffd" (some unique key, based on the URL that is a unique key in the table).
In other words... the URL is a unique key in the table, but I need to generate a small unique key based on the URL.
What can I do with standard PostgreSQL 8.4?
Best Regards,
Several methods:
a) Why not use an auto-incrementing column or sequence generator to generate unique integers per insert? If you have fewer than 100 million URLs, your identifiers are short and easy to remember. However, if that's not an option (e.g. because you don't want people guessing IDs and attacking the database that way):
b) The built-in MD5() function may help:
INSERT INTO table (pkey, url) VALUES (MD5('http://...'), 'http://...');
MD5() is a hash function and will most likely give you a unique identifier per URL. I say "most likely" because you get a 128-bit hash from MD5, and the likelihood that any two given URLs collide is on the order of 2^-128 (about 3 x 10^-39).
If you need smaller identifiers you can chop the result from MD5 down to a smaller number of characters, but you could potentially significantly increase the chance of a hash collision depending on which characters you take.
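How quickly truncation raises the collision risk can be estimated with the birthday approximation. The Python sketch below assumes the MD5 output behaves like uniformly random bits (so each hex character kept contributes 4 bits); the function names are made up for this example.

```python
import hashlib
import math


def short_key(url: str, hex_chars: int) -> str:
    """First hex_chars characters of the MD5 hex digest of a URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()[:hex_chars]


def collision_probability(n_items: int, hex_chars: int) -> float:
    """Birthday approximation: chance that any two of n_items keys collide."""
    bits = 4 * hex_chars
    return -math.expm1(-n_items * (n_items - 1) / 2 ** (bits + 1))


print(short_key("http://www.xyz.com/some_uri/index1.html", 8))
for chars in (8, 16, 32):
    print(chars, collision_probability(1_000_000, chars))
```

Under that assumption, 8 hex characters (32 bits) make a collision more likely than not at around 80,000 rows, while the full 32 characters keep the probability negligible even for millions of rows.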
[Note: timestamp answer redacted since it in no way solves the original problem. -BobG]