I'm implementing an ASN.1 parser to decode X.509 certificates. There are SEQUENCE and SET tags that you can use, and I don't see a reason why those two shouldn't be the same thing.
What is the difference, and when is it appropriate to use each?
A sequence is ordered, a set is not.
When would a set be appropriate? One example might be when encoding huge chunks of data, where bringing them in the right order at the sender would cause more work than mandating that they be accepted in any order.
As an addendum to @ChristophSommer's answer:
A sequence is ordered, a set is not.
... and this makes a difference, e.g. when encoding some object: DER (Distinguished Encoding Rules) requires a canonical order of SET elements, while SEQUENCE elements keep their original order.
when is it appropriate to use each
As Christoph's answer indicates, you use a sequence for entries for which the order of appearance in the sequence is relevant, and a set for entries for which the order is irrelevant.
Furthermore, the entries of a set are usually assumed to be unique (no two identical entries in a set), while a sequence may contain many identical entries.
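For a concrete illustration, here is a small sketch using .NET's System.Formats.Asn1 package (assuming it is available to you; any DER encoder will show the same effect). The three integers are written in the same order both times, yet DER sorts the SET OF element encodings while the SEQUENCE keeps them as written.

using System;
using System.Formats.Asn1;

class DerOrderingDemo
{
    static void Main()
    {
        // SEQUENCE OF INTEGER: elements keep the order they were written in.
        var seq = new AsnWriter(AsnEncodingRules.DER);
        seq.PushSequence();
        seq.WriteInteger(3);
        seq.WriteInteger(1);
        seq.WriteInteger(2);
        seq.PopSequence();

        // SET OF INTEGER: DER sorts the element encodings, regardless of write order.
        var set = new AsnWriter(AsnEncodingRules.DER);
        set.PushSetOf();
        set.WriteInteger(3);
        set.WriteInteger(1);
        set.WriteInteger(2);
        set.PopSetOf();

        Console.WriteLine(Convert.ToHexString(seq.Encode())); // 3009020103020101020102 (3, 1, 2 as written)
        Console.WriteLine(Convert.ToHexString(set.Encode())); // 3109020101020102020103 (sorted: 1, 2, 3)
    }
}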
Related
In our web application we display a list of pulses, but for linking and such we make every pulse uniquely available. In our CouchDB we give every pulse a unique id by md5'ing its unique attributes, e.g.: www.foo.com/bar/
These md5 sums are extremely long, though, and make for ugly URLs. Is there another way to hash the attributes that requires fewer characters but still guarantees uniqueness?
Thanks a lot
Instead of creating an ugly md5, you could use a method like this to create a random string of a given length containing certain characters, store it alongside the md5, and use that 'pretty URL' string to retrieve the data from the database. One thing to think about would be to take out the vowels from the possible characters, since with them you could end up with bad words :) Also, make sure the string does not already exist in the database, of course, and if it does, just create another one... that won't happen very often, though.
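A minimal C# sketch of that idea (GenerateSlug and the vowel-free alphabet are just illustrative choices, and the uniqueness check against your database is left as a placeholder):

using System;
using System.Security.Cryptography;
using System.Text;

static class SlugGenerator
{
    // Consonants and digits only, so random slugs can't spell obvious bad words.
    private const string Alphabet = "bcdfghjkmnpqrstvwxz23456789";

    public static string GenerateSlug(int length)
    {
        var sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
            sb.Append(Alphabet[RandomNumberGenerator.GetInt32(Alphabet.Length)]);
        return sb.ToString();
    }
}

// Usage: keep generating until the slug isn't already taken, then store it
// next to the md5 and look pulses up by it. SlugExists stands for your own DB check.
// string slug;
// do { slug = SlugGenerator.GenerateSlug(8); } while (SlugExists(slug));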
Please refer to this JsFiddle, where I have the data separated into the appropriate columns:
http://jsfiddle.net/hsZvq/
Good Demo (For those who don't want to click the link):
Unique ID    Generated Code                    Part 1            Part 2
--------------------------------------------------------------------------------
877023281    9F044F5BCF2D97B2                  9F044F5BCF2D97B2
790200492    3B9BD10FBDB90D7F613313A492ACC67B  3B9BD10FBDB90D7F  613313A492ACC67B
The Generated Code is somehow generated/derived from the Unique ID. At first I thought it was a 128-bit hash because all the codes were a set length, but some of the IDs' codes are actually only 64 bits, so that leads me to believe it's a combination hash.
If you split each 128-bit code into its two 64-bit parts, you will notice that the second part repeats itself a lot. It seems to be based on something that is obviously repeating.
Note:
Unique ID may refer to the numerical value given, or possibly the numerical value with an R in front. For example, the above Generated Code may be based on 877023281 or R877023281.
Do you have access to the function?
It would be helpful to generate a bunch of input/output pairs where the inputs are more related to each other. For example, inputs that differ by only one bit: 0, 1, 2, 4, 8, 16, ... and 1, 3, 5, 9, 17, ...
Even if the function is close to trivial, the small number of samples you have presented doesn't offer much fodder for analysis.
Of course, if you have the code, you could try reverse-engineering the code instead of its numerical output.
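If you do have access to it, here is a rough C# sketch of collecting such pairs; GenerateCode below is only a stand-in for whatever routine actually produces the codes:

using System;

class CodeProbe
{
    // Stand-in for the real generator; replace with the actual routine.
    static string GenerateCode(long id) => id.ToString("X16");

    static void Main()
    {
        long baseId = 877023281;
        Console.WriteLine($"{baseId} -> {GenerateCode(baseId)}");

        // Inputs that differ from the base by exactly one bit.
        for (int bit = 0; bit < 31; bit++)
        {
            long flipped = baseId ^ (1L << bit);
            Console.WriteLine($"{flipped} -> {GenerateCode(flipped)}");
        }
    }
}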
I want to store and index all of my historical e-mail and news as individual message files, using some computed hash code based on the message body+headers. Then I'll index on other things as well -- for searching.
For the primary index key, my thought is to use SHA-1 for the hash algorithm and assume that there will never be any collisions (although I know that there theoretically could be).
Besides the body, what headers should I index? Or more generally, what transformations should I apply to an in-memory copy of the message prior to hashing?
Should I ignore "ReSent-*:" headers? Should I join line-broken headers into single-line headers and remove extraneous whitespace?
(The reason I want to index the messages based on some hash instead of on the Message-ID header is that Message-ID headers aren't uniformly formatted.)
You should hash precisely that which constitutes uniqueness of the message. If two messages may differ by the presence of "ReSent-*:" headers but still must be considered to be the "same" message, then those headers must not be part of what is hashed. Similarly, if equal messages may differ in header syntax then you should normalize header syntax. Hash functions such as SHA-1 return the same output only if the input is exactly the same, every single bit of it.
Now if Message-IDs are good enough for you, save for the formatting issue, then there is a simple way: just hash the Message-IDs. A hashed Message-ID will have a regular, fixed-size, randomized format on which you can index.
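A minimal sketch of that suggestion in C#; the normalization shown (trimming whitespace and angle brackets, lower-casing) is only an assumption, so substitute whatever rules make two Message-IDs "the same" for you:

using System;
using System.Security.Cryptography;
using System.Text;

static class MessageKey
{
    // Hash a normalized Message-ID into a fixed-size, uniformly formatted key.
    public static string FromMessageId(string messageId)
    {
        // Assumed normalization: strip surrounding whitespace/angle brackets, lower-case.
        string normalized = messageId.Trim().Trim('<', '>').ToLowerInvariant();
        using SHA1 sha1 = SHA1.Create();
        byte[] digest = sha1.ComputeHash(Encoding.UTF8.GetBytes(normalized));
        return Convert.ToHexString(digest); // 40 hex characters, always the same shape
    }
}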
We would like to give each of our users an alias so that we can refer to them in discussions while protecting their identity. These aliases should be unique.
The easy way would be to simply use a SERIAL column, but ints aren't memorable. We would like to use real people names so that we can remember the aliases.
The other easy way would be to find a list of first names somewhere, number them, and use a SERIAL to fetch names from the list. When the list runs out, add more names.
But is there some clever way to map ints to names?
We currently have about 2,000 users and are growing, but I doubt we'll ever become Google.
It may sound crazy, but there is an algorithm used in game programming to create meaningless but phonetically unique names like Alveolar, Bilabial, Glottal, Palatal, Velar.
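One common variant of that idea simply alternates consonants and vowels to build pronounceable syllables. A rough sketch in C# (the letter pools and length are arbitrary choices, not any particular game's algorithm); you would still need to check the result against aliases already handed out:

using System;
using System.Text;

static class NameGenerator
{
    private const string Consonants = "bdfgklmnprstvz";
    private const string Vowels = "aeiou";
    private static readonly Random Rng = new Random();

    // Build a pronounceable name from alternating consonant-vowel pairs.
    public static string Next(int syllables = 3)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < syllables; i++)
        {
            sb.Append(Consonants[Rng.Next(Consonants.Length)]);
            sb.Append(Vowels[Rng.Next(Vowels.Length)]);
        }
        // Capitalize the first letter: "Zarobu", "Kilepa", ...
        return char.ToUpper(sb[0]) + sb.ToString(1, sb.Length - 1);
    }
}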
Pick a random name from the Census Bureau's names file.
Have you tried any hash functions? I am not sure whether they are available in Postgres, but one way to do it is to let the built-in hash function take care of it. The output will be effectively unique IDs.
Back in "the day" Compuserve (or was it AOL?) used to give out temporary, initial passwords by having two lists of words and taking one word from each list and putting it together, so you would get something like EasyTomato or whatever. Perhaps something like that would work for your user base. If each word list has 256 characters, that's 65535 unique combinations (and notice how easily you can pick the combination by just incrementing a 16-bit integer).
EDIT: Well don't do a straight increment of the integer after all, or the first 256 people will all get the same first word, but the basic idea is still sound. Pick a random, not-yet-used 16-bit number. High 8 bits are your index into the first word list, low 8 bits are your index into the second word list.
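A sketch of that scheme in C#, assuming you have two word lists of 256 entries each (the tiny arrays below are placeholders):

using System;
using System.Security.Cryptography;

static class AliasGenerator
{
    // Placeholder lists; in practice each would hold 256 entries,
    // making the modulo below unnecessary.
    private static readonly string[] FirstWords = { "Easy", "Brave", "Quiet", "Sunny" };
    private static readonly string[] SecondWords = { "Tomato", "River", "Falcon", "Maple" };

    public static string FromCode(ushort code)
    {
        // High byte indexes the first list, low byte the second.
        int first = (code >> 8) % FirstWords.Length;
        int second = (code & 0xFF) % SecondWords.Length;
        return FirstWords[first] + SecondWords[second];
    }

    // Pick a random 16-bit code; remember which codes are already used per user.
    public static ushort RandomCode() =>
        (ushort)RandomNumberGenerator.GetInt32(ushort.MaxValue + 1);
}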
A few months back I was tasked with implementing a unique and random code for our web application. The code would have to be user friendly and as small as possible, but still be essentially random (so users couldn't easily predict the next code in the sequence).
It ended up generating values that looked something like this:
Af3nT5Xf2
Unfortunately, I was never satisfied with the implementation. GUIDs were out of the question; they were simply too big and difficult for users to type in. I was hoping for something more along the lines of 4 or 5 characters/digits, but our particular implementation would generate noticeably patterned sequences if we encoded to fewer than 9 characters.
Here's what we ended up doing:
We pulled a unique sequential 32-bit id from the database. We then inserted it into the center bits of a 64-bit random integer. We created a lookup table of easily typed and recognized characters (A-Z, a-z, 2-9, skipping easily confused characters such as L, l, 1, O, 0, etc.). Finally, we used that lookup table to base-54 encode the 64-bit integer. The high bits were random, the low bits were random, but the center bits were sequential.
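Roughly, that scheme reads like the sketch below. The exact bit layout and lookup table aren't spelled out in the question, so both are guesses: 16 random high bits, the 32-bit sequential id in the middle, 16 random low bits, encoded with a 54-character confusion-free alphabet.

using System;
using System.Security.Cryptography;
using System.Text;

static class CodeGenerator
{
    // 54 characters: A-Z, a-z, 2-9 minus I, L, O, i, l, o (a guess at the lookup table).
    private const string Alphabet = "ABCDEFGHJKMNPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz23456789";

    public static string Encode(uint sequentialId)
    {
        // Assumed layout: random high 16 bits | sequential 32 bits | random low 16 bits.
        ulong high = (ulong)RandomNumberGenerator.GetInt32(1 << 16);
        ulong low = (ulong)RandomNumberGenerator.GetInt32(1 << 16);
        ulong value = (high << 48) | ((ulong)sequentialId << 16) | low;

        // Base-N encode the 64-bit value with the restricted alphabet.
        var sb = new StringBuilder();
        do
        {
            sb.Insert(0, Alphabet[(int)(value % (ulong)Alphabet.Length)]);
            value /= (ulong)Alphabet.Length;
        } while (value != 0);
        return sb.ToString();
    }
}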
The final result was a code that was much smaller than a guid and looked random, even though it absolutely wasn't.
I was never satisfied with this particular implementation. What would you guys have done?
Here's how I would do it.
I'd obtain a list of common English words with usage frequency and some grammatical information (like is it a noun or a verb?). I think you can look around the intertubes for some copy. Firefox is open-source and it has a spellchecker... so it must be obtainable somehow.
Then I'd run a filter on it so that obscure words are removed and words that are too long are excluded.
Then my generation algorithm would pick two words from the list, concatenate them, and add a random 3-digit number.
I can also randomize the word selection pattern between verbs/nouns, like:
eatCake778
pickBasket524
rideFlyer113
etc..
The case needn't be camel case; you can randomize that as well. You can also randomize the placement of the number and the verb/noun.
And since that's a lot of randomizing, Jeff's The Danger of Naïveté is a must-read. Also make sure to study dictionary attacks well in advance.
And after I'd implemented it, I'd run a test to make sure that the generated codes rarely (ideally never) collide. If the collision rate were high, I'd play with the parameters (number of nouns used, number of verbs used, length of the random number, total number of words, different kinds of casing, etc.).
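A quick sketch of that recipe in C#; the tiny word lists stand in for the filtered dictionary described above, and the casing/placement randomization is omitted for brevity:

using System;

static class FriendlyCodeGenerator
{
    // Stand-in lists; in practice these would come from the filtered dictionary.
    private static readonly string[] Verbs = { "eat", "pick", "ride", "chase" };
    private static readonly string[] Nouns = { "cake", "basket", "flyer", "lantern" };
    private static readonly Random Rng = new Random();

    public static string Next()
    {
        string verb = Verbs[Rng.Next(Verbs.Length)];
        string noun = Nouns[Rng.Next(Nouns.Length)];
        int number = Rng.Next(100, 1000); // random 3-digit number

        // Camel-case the noun: eatCake778, pickBasket524, rideFlyer113, ...
        return verb + char.ToUpper(noun[0]) + noun.Substring(1) + number;
    }
}

Even so, duplicates remain possible, so generated codes still need to be checked against ones already issued.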
In .NET you can use the RNGCryptoServiceProvider method GetBytes(), which will "fill an array of bytes with a cryptographically strong sequence of random values" (from the MS documentation).
using System.Security.Cryptography;

// Fill a 4-byte buffer with cryptographically strong random values.
byte[] randomBytes = new byte[4];
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetBytes(randomBytes);
You can increase the length of the byte array and pluck out the character values you want to allow.
In C#, I have used the 'System.IO.Path.GetRandomFileName() : String' method... but I was generating salt for debug file names. This method returns stuff that looks like your first example, except with a random '.xyz' file extension too.
If you're in .NET and just want a simpler (but not 'nicer' looking) solution, I would say this is it... you could remove the random file extension if you like.
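For what it's worth, stripping that random extension is a one-liner (a small sketch):

using System.IO;

// e.g. "xtvhkw2p.o3f" -> "xtvhkw2po3f" (11 lowercase letters/digits)
string key = Path.GetRandomFileName().Replace(".", "");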
At the time of this writing, this question's title is:
How can I generate a unique, small, random, and user-friendly key?
To that, I should note that it's not possible in general to create a random value that's also unique, at least if each random value is generated independently of any other. In addition, there are many things you should ask yourself if you want to generate unique identifiers (which come from my section on unique random identifiers):
Can the application easily check identifiers for uniqueness within the desired scope and range (e.g., check whether a file or database record with that identifier already exists)?
Can the application tolerate the risk of generating the same identifier for different resources?
Do identifiers have to be hard to guess, be simply "random-looking", or be neither?
Do identifiers have to be typed in or otherwise relayed by end users?
Is the resource an identifier identifies available to anyone who knows that identifier (even without being logged in or authorized in some way)?
Do identifiers have to be memorable?
In your case, you have several conflicting goals: You want identifiers that are—
unique,
easy to type by end users (including small), and
hard to guess (including random).
Important points you don't mention in the question include:
How will the key be used?
Are other users allowed to access the resource identified by the key, whenever they know the key? If not, then additional access control or a longer key length will be necessary.
Can your application tolerate the risk of duplicate keys? If so, then the keys can be completely randomly generated (such as by a cryptographic RNG). If not, then your goal will be harder to achieve, especially for keys intended for security purposes.
Note that I don't go into the issue of formatting a unique value into a "user-friendly key". There are many ways to do so, and they all come down to mapping unique values one-to-one with "user-friendly keys" — if the input value was unique, the "user-friendly key" will likewise be unique.
If by user-friendly you mean that a user could type the answer in, then I think you would want to look in a different direction. I've seen and done implementations for initial random passwords that pick random words and numbers to form an easier and less error-prone string.
If, though, you're looking for a way to encode a random code in the URL string, which is an issue I've dealt with for a while, then what I have done is use 64-bit encoded GUIDs.
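If "64-bit encoded" here means the common Base64-GUID trick (an assumption on my part), it looks like this in C#:

using System;

// 22 URL-safe characters instead of the 36-character GUID string.
Guid id = Guid.NewGuid();
string key = Convert.ToBase64String(id.ToByteArray())
    .TrimEnd('=')          // drop the "==" padding
    .Replace('+', '-')     // make it URL-safe
    .Replace('/', '_');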
You could load your list of words, as chakrit suggested, into a data table or XML file with a unique sequential key. When getting your random word, use a random number generator to determine which words to fetch by their key. If you concatenate two of them, I don't think you need to include the numbers in the string unless "true randomness" is part of the goal.