Please refer to this JSFiddle, where I have the data separated into the appropriate columns:
http://jsfiddle.net/hsZvq/
Good Demo (For those who don't want to click the link):
Unique ID   Generated Code                      Part 1              Part 2
---------   --------------------------------    ----------------    ----------------
877023281   9F044F5BCF2D97B2                    9F044F5BCF2D97B2
790200492   3B9BD10FBDB90D7F613313A492ACC67B    3B9BD10FBDB90D7F    613313A492ACC67B
The Generated Code is somehow generated/derived from the Unique ID. At first I thought it was a single hash because the codes I looked at were all the same length (32 hex characters, i.e. 128 bits), but some IDs only produce 16 hex characters (64 bits), which leads me to believe it's a combination of two hashes.
If you split each code into its two 64-bit (16-hex-character) parts, you will notice that the second part repeats a lot. It seems to be based on something that obviously repeats.
Note: the Unique ID may refer to the numerical value given, or possibly to the numerical value with an R in front. For example, the first Generated Code above may be based on 877023281 or R877023281.
Do you have access to the function?
It would be helpful to generate a bunch of input/output pairs where the inputs are more closely related to each other, for example inputs that differ by only one bit: 0, 1, 2, 4, 8, 16, ... and 1, 3, 5, 9, 17, ...
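A rough Python sketch of that experiment, for illustration only: it builds both probe families and reports how many output bits change relative to the first input of each family. `mystery_code` is a hypothetical placeholder (an arbitrarily chosen truncated MD5) for the real, unknown function; swap in the actual one if you have it.

```python
import hashlib

def mystery_code(n: int) -> str:
    # HYPOTHETICAL placeholder for the unknown function (first 16 hex chars of MD5)
    return hashlib.md5(str(n).encode("ascii")).hexdigest()[:16].upper()

def probe_inputs(bits: int = 8) -> tuple[list[int], list[int]]:
    # 0, 1, 2, 4, 8, ...: each input differs from 0 in exactly one bit
    from_zero = [0] + [1 << k for k in range(bits)]
    # 1, 3, 5, 9, 17, ...: each input differs from 1 in exactly one bit
    from_one = [1] + [1 | (1 << k) for k in range(1, bits)]
    return from_zero, from_one

def bit_distance(hex_a: str, hex_b: str) -> int:
    # Number of bit positions where two equal-length hex codes differ
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

if __name__ == "__main__":
    for family in probe_inputs():
        base_code = mystery_code(family[0])
        for n in family[1:]:
            code = mystery_code(n)
            print(f"{family[0]} -> {n}: {code} ({bit_distance(base_code, code)} bits differ)")
```

If the real function is something simple, you might see small, structured differences between neighbouring probes; a strong hash would instead flip roughly half the bits every time.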
Even if the function is close to trivial, the small number of samples you have presented doesn't offer much fodder for analysis.
Of course, if you have the code, you could try reverse-engineering the code instead of its numerical output.
Related
I'd like to be able to create an algorithm that generates a 6-character confirmation code (e.g. A1JU2Z) that will be unique for a given (user, code) pair. The reason is, I'd like to keep the code at 6 characters, but using a trimmed set of alphanumerics (to avoid confusion between 1 and I, etc.) only allows for ~300 million codes before collisions occur. Sure, I may never need 300 million codes, but if I do, it will be a huge pain to go back and fix this.
So is there a way to use the user (say, their username) to generate unique codes such that if the same user generates another code, it's guaranteed to be unique for them? (This of course assumes a single user doesn't generate over 300 million codes.)
Thanks!
If the ID only has to be unique for the current user, you can just generate each character of the ID randomly. As long as the user is not expected to generate a large number of such IDs, you have a reasonable chance of never generating the same ID twice (you need to do some math to get exact numbers for the chance of a collision as the number of generated IDs grows).
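The math in question is the birthday problem. A minimal Python sketch, assuming every character is drawn independently and uniformly from an alphabet of `alphabet_size` symbols (the 31-symbol alphabet in the example call is an assumption):

```python
import math

def collision_probability(alphabet_size: int, length: int, n_codes: int) -> float:
    """Probability that at least two of n_codes random codes are equal."""
    space = alphabet_size ** length  # number of distinct possible codes
    if n_codes > space:
        return 1.0  # pigeonhole principle: a collision is certain
    # P(all distinct) = prod_{i=0}^{n-1} (1 - i/space), summed in log space for stability
    log_p_unique = sum(math.log1p(-i / space) for i in range(n_codes))
    return 1.0 - math.exp(log_p_unique)

if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 100_000):
        print(n, round(collision_probability(31, 6, n), 6))
```

The probability climbs faster than intuition suggests: it passes 50% at roughly 1.18 times the square root of the code space.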
If you must avoid collisions at all costs, you need to either keep all previously generated IDs and compare each new one against them, or keep a count of the generated IDs (this requires a scheme where ID generation is deterministic based on the count, but also unique; a very simple case would be {ID=count; ++count;}).
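One possible way to get codes that are deterministic in the count but still unique, sketched in Python under assumptions: keep one counter per user, push it through a bijection on [0, N) (multiply by a constant coprime to N, add an offset, reduce mod N), and write the result as a fixed-width base-31 string. ALPHABET, MULTIPLIER and OFFSET are illustrative choices, not part of the answer. This only scrambles the order; anyone who knows the constants can predict the next code.

```python
import math

# HYPOTHETICAL 31-symbol alphabet: digits and uppercase letters minus 0, 1, I, L, O
ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ"
LENGTH = 6
SPACE = len(ALPHABET) ** LENGTH  # number of distinct 6-character codes
MULTIPLIER = 387_420_489         # illustrative constant; must be coprime to SPACE
OFFSET = 12_345                  # illustrative constant
assert math.gcd(MULTIPLIER, SPACE) == 1

def code_for_count(count: int) -> str:
    """Map a per-user counter to a unique 6-character code."""
    if not 0 <= count < SPACE:
        raise ValueError("counter has exhausted the code space")
    n = (count * MULTIPLIER + OFFSET) % SPACE  # a permutation of [0, SPACE)
    chars = []
    for _ in range(LENGTH):  # fixed-width base-31 encoding, least significant first
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))
```

Because both steps are one-to-one, two different counter values can never produce the same code.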
I think you can use a simple password generator like this one: http://www.webtoolkit.info/php-random-password-generator.html
in combination with a check to make sure the code is not already used.
// generate_password() is the generator from the linked page;
// keep generating until find_password() reports the code is unused.
do {
    $pass = generate_password();
} while (find_password($pass));
save_password($user, $code, $pass);
generate_password() is the function referred to in the link.
find_password() is a function you have to write to check for already generated codes in a database.
save_password() is a function you have to write to store the generated code in a database.
The code is PHP, but the logic carries over to other languages.
The password generator in the link is easy to understand; you can get 6-character codes with the character rules you want.
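For comparison, a rough Python version of the same generate-and-check loop. `secrets.choice` picks each character from an assumed trimmed alphabet, and the in-memory `used` set is a stand-in for what find_password() and save_password() would do against the database.

```python
import secrets

ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ"  # assumed trimmed alphabet (no 0, 1, I, L, O)

def generate_code(length: int = 6) -> str:
    # Each character comes from a cryptographically strong RNG.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def new_unique_code(used: set[str], length: int = 6) -> str:
    # Retry until the code is not already in use (same idea as the PHP loop).
    code = generate_code(length)
    while code in used:
        code = generate_code(length)
    used.add(code)  # stands in for save_password()
    return code
```

With a real database, a UNIQUE constraint on the code column is still worth having, since two requests could otherwise both pass the check before either one saves.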
I have streaming strings (text containing words and numbers).
Processing the stream one line at a time, I would like to assign a value (a score/hash) to each string.
Example strings with their scores/hashes:
User1 logged in Comp1 port8087 1109
User2 logged in comp2 1135
user3 logged in port8080 1098
user1 logged in comp2 port8080 1178
These strings should be in the same cluster. What I have thought of is mapping them (a bad kind of hashing) so that a small change in the string won't affect the score much.
One simple way of doing that might be to take UliCp8, Ulic, ... (i.e., the first letter of each word) and find some way of scoring that. Then similarly scored strings are kept in the same bucket and later sub-grouped.
An improved method would be: rather than taking the first letter of each word, find some way to take a representative value of each word, so that the string's representation is suitable for mapping to a score/hash as I mentioned.
Levenshtein distance, the Jaccard index, and other similarity/distance metrics all require the strings to be compared against each other. Isn't there a method to hash/score a string as described without doing comparisons? (POS tagging and pairwise comparison look inefficient for my purpose, as the data are streaming, huge in volume, and unstructured.)
I hope you understand what I want to achieve; please help me out. Forget about the comments below and let's restart.
"at least two similar word (not considering length) should have similar hash value"
This is against the most basic requirements for a (cryptographic) hash function: even minimal changes to the input should produce drastic changes in the output, and therefore in the bucket the hash falls into.
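A quick way to see this, sketched with Python's standard hashlib: take the first example line from the question and a copy with only its first letter lowercased, hash both with MD5, and count how many of the 128 output bits differ. MD5 is used here purely as a familiar example of a cryptographic hash.

```python
import hashlib

def md5_bits(text: str) -> int:
    # MD5 digest of the UTF-8 text as a 128-bit integer
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")

def differing_bits(a: str, b: str) -> int:
    # Hamming distance between the two 128-bit digests
    return bin(md5_bits(a) ^ md5_bits(b)).count("1")

if __name__ == "__main__":
    a = "User1 logged in Comp1 port8087"
    b = "user1 logged in Comp1 port8087"  # only the first letter's case differs
    print(hashlib.md5(a.encode("utf-8")).hexdigest())
    print(hashlib.md5(b.encode("utf-8")).hexdigest())
    print(differing_bits(a, b), "of 128 bits differ")
```

For a well-behaved cryptographic hash you'd expect around half of the bits (about 64) to differ, which is exactly why such hashes can't be used to bucket similar strings.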
You are looking for an algorithm that calculates the similarity or distance between two inputs.
As stated, you are not looking for a hash function but rather something like the Levenshtein distance, an algorithm for computing a metric that represents the degree of difference between two sequences (such as strings). It is commonly used to find out how similar or dissimilar two strings are. Hashes/message digests are good for creating identifiers for unique, distinct values, but they will produce entirely different results for "similar" values.
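For reference, the textbook dynamic-programming form of the Levenshtein distance in Python (a generic sketch, not tied to any library). As the question points out, it still needs the two strings side by side, so it doesn't remove the pairwise comparison.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as the shorter string
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]
```

For example, `levenshtein("User2 logged in comp2", "user1 logged in comp2")` returns 2 (one case change, one digit change).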
You are interested in the similarity of strings. Here is a nice post that names a few resources that are used for measuring string similarity. Maybe Lucene could help you in your situation.
A friend of mine asked this, and I thought I'd ask here too:
"What kind of data are this, how are they encrypted, or decrypted?"
My friend told me he got this from Facebook.
d9ca6435295fcd89e85bd56c2fd51ccc
It looks like it could be an md5 hash.
Basically, a hash is a one-way function. The idea is that you run some input data through the algorithm to produce a value (such as the string above) with a low probability of collisions (i.e., two different input values hashing to the same string).
You cannot decrypt a hash, because the resulting string does not contain enough information to go back. However, someone may be able to figure out your input values if you use a 'weak' hashing algorithm and skip proper techniques such as salting the hash.
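Salting means mixing a random per-password value into the hash, so identical inputs don't produce identical outputs and precomputed lookup tables stop working. A minimal sketch with Python's standard `hashlib.pbkdf2_hmac`, a deliberately slow, salted key-derivation function that suits passwords better than a single MD5 pass; the iteration count is an illustrative assumption.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; more iterations slow down brute-force guessing

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) to store."""
    salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, key: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, key)
```

Hashing the same password twice gives different stored keys, because each call picks a new salt.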
I don't know how Facebook uses hashes, but a common use for a hash might be to uniquely identify a page. For example, if you had a private image on a page, you might ask to generate a link to the image that you can email to friends. That link might use a hash as part of the URL, since the value can be computed quickly, is reasonably unique, and has a low probability of a third party figuring it out.
This is actually a large topic that I am by no means doing justice to. I suggest googling around for hash, MD5, etc. to learn more, if you are so inclined.
It is a sequence of 128 bits, encoded as a lower-case hex string.
If you are talking about a Facebook API key, there is no deeper meaning to decode from the bits. The keys are created at random by Facebook and assigned to a particular application to identify it. Each application gets a different set of random bits for its API key.
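To make "128 bits, encoded as lower-case hex" concrete in Python: 32 hex digits at 4 bits each is 128 bits, and `bytes.fromhex` turns the string back into its 16 raw bytes. This checks only the format; it can't tell a hash, a random key, or an encrypted block apart.

```python
import re

def is_lowercase_hex128(s: str) -> bool:
    # 32 lowercase hex digits; each digit carries 4 bits, so 32 * 4 = 128 bits
    return re.fullmatch(r"[0-9a-f]{32}", s) is not None

value = "d9ca6435295fcd89e85bd56c2fd51ccc"
raw = bytes.fromhex(value)  # the underlying 16 raw bytes
print(is_lowercase_hex128(value), len(raw), len(raw) * 8)  # True 16 128
```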
This appears to be the hexadecimal representation of...
- a 16-byte encryption block, or
- some 128-bit hash code, or even
- just some plain random/identifying number.
(Why hexadecimal? Note that it contains only the digits 0 through 9 and the letters a through f.)
While the MD5 Hash guess suggested by others is quite plausible, it could be just about anything...
If it is a hash or an identifying/randomly assigned number, its meaning is external to the code itself.
For example it could be a key to be used to locate records in a database, or a value to be compared with the result of the hash function applied to the user supplied password etc.
If it is an encrypted value, its meaning (decrypted value) is directly found within the code, but it could be just about anything. Also, assuming it is produced with modern encryption algorithm, it could take a phenomenal amount of effort to crack the code (if at all possible).
We would like to give each of users an alias so that we can refer to them in discussions while protecting their identity. These aliases should be unique.
The easy way would be to simply use a SERIAL column, but ints aren't memorable. We would like to use real people names so that we can remember the aliases.
The other easy way would be to find a list of first names somewhere, number them, and use a SERIAL to fetch names from the list. When the list runs out, add more names.
But is there some clever way to map ints to names?
We currently have about 2,000 users and are growing, but I doubt we'll ever become Google.
It may sound crazy, but there is an algorithm used in game programming to create meaningless but phonetically unique names like Alveolar, Bilabial, Glottal, Palatal, Velar.
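One way to turn that idea into a deterministic int-to-name mapping, sketched in Python under assumptions: treat the SERIAL as digits in base len(SYLLABLES) and emit one syllable per digit. Because every syllable is exactly two letters and the name has a fixed number of syllables, this is plain positional notation, so every number in range gets a different name. The syllable list is made up for illustration; it is not the game-programming algorithm the answer alludes to.

```python
# HYPOTHETICAL syllable inventory: consonant + vowel pairs
CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]  # 14 * 5 = 70 syllables

def int_to_name(n: int, syllables: int = 3) -> str:
    """Map 0 <= n < 70**syllables to a unique pronounceable name."""
    base = len(SYLLABLES)
    if not 0 <= n < base ** syllables:
        raise ValueError("n does not fit in the requested number of syllables")
    parts = []
    for _ in range(syllables):  # least significant syllable first
        n, r = divmod(n, base)
        parts.append(SYLLABLES[r])
    return "".join(reversed(parts)).capitalize()
```

Three syllables give 70³ = 343,000 names, plenty for a few thousand users. Consecutive serials produce similar names (`int_to_name(0)` is "Bababa", `int_to_name(1)` is "Bababe"), so you may want to scramble the serial first.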
Pick a random name from the Census Bureau's names file.
Have you tried any hash functions? I am not sure whether they are available in Postgres. One way to do it is to let the internal hash function take care of it; it will output IDs that are very likely, but not guaranteed, to be unique, so you would still need to check for collisions.
Back in "the day" CompuServe (or was it AOL?) used to give out temporary initial passwords by having two lists of words, taking one word from each list and putting them together, so you would get something like EasyTomato or whatever. Perhaps something like that would work for your user base. If each word list has 256 words, that's 65,536 unique combinations (and notice how easily you can pick the combination by just incrementing a 16-bit integer).
EDIT: Well don't do a straight increment of the integer after all, or the first 256 people will all get the same first word, but the basic idea is still sound. Pick a random, not-yet-used 16-bit number. High 8 bits are your index into the first word list, low 8 bits are your index into the second word list.
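A small Python sketch of the edited idea, under assumptions: shuffling all index numbers once gives a sequence of random, never-repeated numbers, and `divmod` splits each one into a first-word index and a second-word index (for two 256-word lists this is exactly the high 8 bits and the low 8 bits). The word lists are placeholders, and a real system would store the shuffled order or the set of used numbers in the database rather than in memory.

```python
import random

def make_alias_assigner(first_words: list[str], second_words: list[str], seed=None):
    """Yield every first+second word combination once, in random order."""
    total = len(first_words) * len(second_words)
    order = list(range(total))
    random.Random(seed).shuffle(order)  # random permutation = random, not-yet-used numbers
    for n in order:
        hi, lo = divmod(n, len(second_words))  # high part / low part of the number
        yield first_words[hi] + second_words[lo]
```

For example, `next(make_alias_assigner(["Easy", "Quick"], ["Tomato", "Lemon", "Pear"]))` returns one of the six possible aliases, such as "QuickLemon".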
A few months back I was tasked with implementing a unique and random code for our web application. The code would have to be user friendly and as small as possible, but still be essentially random (so users couldn't easily predict the next code in the sequence).
It ended up generating values that looked something like this:
Af3nT5Xf2
Unfortunately, I was never satisfied with the implementation. GUIDs were out of the question; they were simply too big and difficult for users to type in. I was hoping for something more along the lines of 4 or 5 characters/digits, but our particular implementation would generate noticeably patterned sequences if we encoded to fewer than 9 characters.
Here's what we ended up doing:
We pulled a unique sequential 32-bit ID from the database. We then inserted it into the center bits of a 64-bit RANDOM integer. We created a lookup table of easily typed and recognized characters (A-Z, a-z, 2-9, skipping easily confused characters such as L, l, 1, O, 0, etc.). Finally, we used that lookup table to base-54 encode the 64-bit integer. The high bits were random, the low bits were random, but the center bits were sequential.
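A Python sketch of the scheme as described, for illustration (it reconstructs the idea, not the original code): the 32-bit sequential ID overwrites bits 16 through 47 of a random 64-bit integer, which is then written in base 54. The exact set of excluded characters is my assumption; the set below happens to have 54 symbols. A full 64-bit value needs 12 base-54 characters. `id_from_code` shows the weakness the question admits: anyone who knows the layout can read the sequential ID straight back out.

```python
import secrets

# Assumed lookup table: A-Z and a-z without I, L, O / i, l, o, plus 2-9
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ" "abcdefghjkmnpqrstuvwxyz" "23456789"
BASE = len(ALPHABET)  # 54
WIDTH = 12            # 54**12 > 2**64, so 12 symbols hold any 64-bit value
CENTER = 0xFFFFFFFF << 16  # mask for the 32 "center" bits (bits 16..47)

def make_code(seq_id: int) -> str:
    if not 0 <= seq_id < 1 << 32:
        raise ValueError("seq_id must fit in 32 bits")
    value = secrets.randbits(64)                 # random high and low bits
    value = (value & ~CENTER) | (seq_id << 16)   # sequential ID in the center
    chars = []
    for _ in range(WIDTH):  # fixed-width base-54 encoding
        value, r = divmod(value, BASE)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

def id_from_code(code: str) -> int:
    value = 0
    for ch in code:
        value = value * BASE + ALPHABET.index(ch)
    return (value & CENTER) >> 16
```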
The final result was a code that was much smaller than a GUID and looked random, even though it absolutely wasn't.
I was never satisfied with this particular implementation. What would you guys have done?
Here's how I would do it.
I'd obtain a list of common English words with usage frequency and some grammatical information (like is it a noun or a verb?). I think you can look around the intertubes for some copy. Firefox is open-source and it has a spellchecker... so it must be obtainable somehow.
Then I'd run a filter on it so that obscure words and words that are too long are removed.
Then my generation algorithm would pick 2 words from the list, concatenate them, and append a random 3-digit number.
I could also randomize the word-selection pattern between verbs and nouns, like
eatCake778
pickBasket524
rideFlyer113
etc..
The casing needn't be camelCase; you can randomize that as well. You can also randomize the placement of the number and the order of the verb and noun.
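A minimal Python sketch of the basic pattern (verb + noun + 3-digit number), assuming tiny placeholder word lists; a real list would come from the filtered dictionary described above. `secrets` is used so the codes aren't predictable.

```python
import secrets

# HYPOTHETICAL placeholder word lists
VERBS = ["eat", "pick", "ride", "bake", "fold"]
NOUNS = ["Cake", "Basket", "Flyer", "Lamp", "River"]

def friendly_code() -> str:
    verb = secrets.choice(VERBS)
    noun = secrets.choice(NOUNS)
    number = secrets.randbelow(1000)    # 0..999
    return f"{verb}{noun}{number:03d}"  # zero-padded to 3 digits, e.g. "eatCake778"
```

With these lists there are only 5 × 5 × 1000 = 25,000 possible codes; the size of the word lists is what really determines the collision rate.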
And since that's a lot of randomizing, Jeff's The Danger of Naïveté is a must-read. Also make sure to study dictionary attacks well in advance.
And after I'd implemented it, I'd run a test to check how often my algorithm produces collisions. If the collision rate was high, I'd play with the parameters (number of nouns used, number of verbs used, length of the random number, total number of words, different kinds of casing, etc.)
In .NET you can use the RNGCryptoServiceProvider method GetBytes() which will "fill an array of bytes with a cryptographically strong sequence of random values" (from ms documentation).
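using System.Security.Cryptography;
byte[] randomBytes = new byte[4];
using (var rng = new RNGCryptoServiceProvider()) // IDisposable: released at end of block
{
    rng.GetBytes(randomBytes); // fill with cryptographically strong random bytes
}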
You can increase the length of the byte array and pick out the character values you want to allow.
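Picking characters out of random bytes has one trap: if the alphabet size doesn't divide 256, a plain `byte % len(alphabet)` makes some characters more likely than others. A Python sketch of the idea using `os.urandom` (the counterpart of GetBytes here) with rejection sampling to avoid that bias; the alphabet is an assumption. In Python, `secrets.choice` already does this for you.

```python
import os

ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ"  # assumed set of allowed characters

def random_code(length: int) -> str:
    # Largest multiple of len(ALPHABET) that fits in a byte; bytes at or above it
    # are discarded so every allowed character is equally likely (no modulo bias).
    limit = 256 - (256 % len(ALPHABET))
    out = []
    while len(out) < length:
        for b in os.urandom(length):  # cryptographically strong random bytes
            if b < limit and len(out) < length:
                out.append(ALPHABET[b % len(ALPHABET)])
    return "".join(out)
```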
In C#, I have used the System.IO.Path.GetRandomFileName() method (it returns a string), but I was generating salt for debug file names. This method returns values that look like your first example, except with a random '.xyz' file extension too.
If you're in .NET and just want a simpler (but not 'nicer' looking) solution, I would say this is it... you could remove the random file extension if you like.
At the time of this writing, this question's title is:
How can I generate a unique, small, random, and user-friendly key?
To that, I should note that it's not possible in general to create a random value that's also unique, at least if each random value is generated independently of any other. In addition, there are many things you should ask yourself if you want to generate unique identifiers (which come from my section on unique random identifiers):
Can the application easily check identifiers for uniqueness within the desired scope and range (e.g., check whether a file or database record with that identifier already exists)?
Can the application tolerate the risk of generating the same identifier for different resources?
Do identifiers have to be hard to guess, be simply "random-looking", or be neither?
Do identifiers have to be typed in or otherwise relayed by end users?
Is the resource that an identifier identifies available to anyone who knows that identifier (even without being logged in or authorized in some way)?
Do identifiers have to be memorable?
In your case, you have several conflicting goals: You want identifiers that are—
unique,
easy to type by end users (including small), and
hard to guess (including random).
Important points you don't mention in the question include:
How will the key be used?
Are other users allowed to access the resource identified by the key, whenever they know the key? If not, then additional access control or a longer key length will be necessary.
Can your application tolerate the risk of duplicate keys? If so, then the keys can be completely randomly generated (such as by a cryptographic RNG). If not, then your goal will be harder to achieve, especially for keys intended for security purposes.
Note that I don't go into the issue of formatting a unique value into a "user-friendly key". There are many ways to do so, and they all come down to mapping unique values one-to-one with "user-friendly keys" — if the input value was unique, the "user-friendly key" will likewise be unique.
If by user-friendly you mean that a user could type the key in, then I think you would want to look in a different direction. I've seen and done implementations of initial random passwords that pick random words and numbers to get an easier, less error-prone string.
If, though, you're looking for a way to encode a random code in a URL string, which is an issue I've dealt with for a while, what I have done is use Base64-encoded GUIDs.
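A Python sketch of the idea, assuming a random (version 4) UUID: its 16 bytes Base64-encode to 24 characters, and with the URL-safe alphabet and the trailing `==` padding dropped you get a 22-character token that can go straight into a URL.

```python
import base64
import uuid

def short_guid() -> str:
    raw = uuid.uuid4().bytes                   # 16 bytes, 122 of them random bits
    token = base64.urlsafe_b64encode(raw)      # 24 chars using A-Z a-z 0-9 - _, ends "=="
    return token.rstrip(b"=").decode("ascii")  # 22 URL-safe characters
```

To get the GUID back, add the `==` padding again and decode with `base64.urlsafe_b64decode`.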
You could load your list of words as chakrit suggested into a data table or xml file with a unique sequential key. When getting your random word, use a random number generator to determine what words to fetch by their key. If you concatenate 2 of them, I don't think you need to include the numbers in the string unless "true randomness" is part of the goal.