Is it SAFE to use partial MongoDB Id Key - mongodb

I'm developing a system in which badges with a QR code are created for each user. I need to read that QR code and show specific information to the user on a public screen.
Reading QR codes is a little tricky. When I did something like this before, I was using MySQL with sequential numeric IDs (1, 100, 2304, 9990...), which are only about 5 characters at most.
However, MongoDB keys (the database I'm using now) are long, such as 52d35bf26bda8a5c8f8a22a8, which has many characters.
The problem with that: the QR code becomes larger (the more data, the bigger the code) and harder to read quickly with a webcam (even in HD).
So, here is my idea: use part of the ID, so that 52d35bf26bda8a5c8f8a22a8 becomes perhaps 52d35bf26bd.
The question is really simple: can I safely use a partial ID key without getting duplicates? I will have on the order of 1000 elements at most.
The question has nothing to do with QR codes as such, but they explain why I'm doing this.

ObjectId is a 12-byte BSON type, constructed using:
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
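Given that layout, whether a partial ID is safe depends on which part of the ObjectId you keep and how much time passes between insertions. The first 11 characters, for example, cover the whole timestamp but only half of the machine identifier, so two documents inserted by the same machine within the same second would get the same prefix.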
Regards
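To make the layout above concrete, here is a minimal Python sketch that splits the ID from the question into the four fields listed in this answer. The function name split_object_id is made up for illustration, and the field boundaries follow the layout as described here; newer MongoDB servers are documented as using a 5-byte random value in place of the machine identifier and process id, so treat the middle fields as an assumption.

```python
from datetime import datetime, timezone

def split_object_id(hex_id: str) -> dict:
    """Split a 24-character ObjectId hex string into the fields listed above."""
    if len(hex_id) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    raw = bytes.fromhex(hex_id)
    return {
        "timestamp": int.from_bytes(raw[0:4], "big"),  # seconds since the Unix epoch
        "machine": raw[4:7].hex(),                      # 3-byte machine identifier
        "pid": int.from_bytes(raw[7:9], "big"),         # 2-byte process id
        "counter": int.from_bytes(raw[9:12], "big"),    # 3-byte counter
    }

full_id = "52d35bf26bda8a5c8f8a22a8"
parts = split_object_id(full_id)
created = datetime.fromtimestamp(parts["timestamp"], tz=timezone.utc)
print(parts, created.isoformat())

# The 11-character prefix from the question keeps the 8 timestamp digits
# and only 3 of the 6 machine digits; the counter is dropped entirely.
prefix = full_id[:11]
# Keeping the trailing counter instead only separates IDs that come from
# the same process, because every process starts its counter at a random value.
suffix = full_id[-6:]
print(prefix, suffix)
```

Whichever part is kept, a unique index on the shortened field would catch any duplicate at insert time instead of at scan time.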

Regardless of whether it's safe or not, the size difference between the QR codes isn't that great.
Using the full string will get you an image like:
Using half the characters produces a code like:
I would suggest that even the cheapest smartphone would be able to scan the larger of the two images - it's not very complex at all.

Yes, you can safely compress a long hexadecimal string into a higher base to get fewer characters while retaining the same value.
Example:
Hex: 52d35bf26bda8a5c8f8a22a8
Base64: UtNb8mvailyPiiKo
The same idea can be taken further by using binary or even Chinese pictograph characters instead of base64.
This concept can be tested using a Numeral Base Converter Tool.
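The conversion above can also be checked in a few lines of Python. This is only a sketch: the helper names are made up, and it uses the URL-safe Base64 alphabet from the standard library, which for this particular ID gives the same result as the example.

```python
import base64

def hex_to_base64(hex_id: str) -> str:
    """Re-encode a hex string as URL-safe Base64 (12 bytes become 16 characters)."""
    return base64.urlsafe_b64encode(bytes.fromhex(hex_id)).decode("ascii")

def base64_to_hex(short_id: str) -> str:
    """Recover the original hex string, e.g. after scanning the QR code."""
    return base64.urlsafe_b64decode(short_id).hex()

short = hex_to_base64("52d35bf26bda8a5c8f8a22a8")
print(short)                 # UtNb8mvailyPiiKo
print(base64_to_hex(short))  # 52d35bf26bda8a5c8f8a22a8
```

Because the mapping is one-to-one, the shorter string is exactly as unique as the full ObjectId; nothing is thrown away.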

Related

Encode binary or ASCII (at least 7000 bytes) into image/barcode?

I'm wondering if there's some way to encode data (either binary or ASCII) into a printable image or data pattern that can easily be rescanned again and interpreted back into a file. The problem with QR codes is that they won't handle file sizes of 7-10KB. Any suggestions?
EDIT: One catch: Can't store said data on the server. Security reasons. The data must not exist anywhere except on a printed piece of paper.
7 kilobytes is 57344 bits, hence the graphical code needs a lot of bars or squares (in case of QR) in order to represent the data, and that's without thinking of data error correction, format information, positioning, alignment etc...
I think a sound solution would be to put the data on a server, map it to an index, and create a service that retrieves the data by index.
The QR code or barcode then only needs to carry the index, and the scanner gets the data from the service.

How Does The Google URL Shortener Generate A 5 Digit Hash Without Collisions

How can the Google URL shortener generate a unique hash with five characters without collisions? It seems like there are bound to be collisions, where different URLs generate the same hash.
stackoverflow.com => http://goo.gl/LQysz
What's also interesting is that the same URL generates a completely different hash each time:
stackoverflow.com => http://goo.gl/Dl7sz
So, doing some math: using lower-case characters, upper-case characters, and digits, the total number of combinations is 62^5 = 916,132,832, so clearly collisions are bound to happen.
How does Google do this?
They have a database which tracks all previously generated URLs and the longer URL that each of those maps to. Easy to make sure that newly generated URLs don't already exist in that table. A little tricky to scale out (they surely have multiple servers so each one needs to be assigned a bucket of values from which it can give out to users). If they ever reach the point of having generated 916,132,832 URLs, they'll just add another character.
They have a table that maps each hash to its URL.
Count the number of rows in that table, encrypt that count with a stream cipher, then encode the result with base62.
Using a stream cipher instead of a hash gives you a short pseudo-random output that doesn't collide with any previous output, so you don't need to check the table.
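The property this answer relies on is that the scrambling step is reversible, so two different row counts can never produce the same code. The Python sketch below illustrates only that property: it swaps the cipher for a plain modular multiplication, which is a stand-in and not encryption, and the constant MULTIPLIER and all function names are made up for the example. Nothing here is claimed to be what Google does.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols
LENGTH = 5
SPACE = len(ALPHABET) ** LENGTH  # 62**5 = 916,132,832 possible codes
MULTIPLIER = 387_420_489         # hypothetical constant (3**18), shares no factor with SPACE

def scramble(counter: int) -> int:
    """Map a counter to another value in the same range, one-to-one.

    Stand-in for the cipher: multiplying by a number coprime to SPACE
    permutes 0..SPACE-1, but it is trivially reversible and not secure.
    """
    if not 0 <= counter < SPACE:
        raise ValueError("counter outside the 5-character code space")
    return (counter * MULTIPLIER) % SPACE

def to_base62(value: int) -> str:
    """Encode a value below SPACE as exactly LENGTH base62 characters."""
    chars = []
    for _ in range(LENGTH):
        value, digit = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

def short_code(row_count: int) -> str:
    return to_base62(scramble(row_count))

codes = [short_code(n) for n in range(10_000)]
print(codes[:5], len(set(codes)))
```

Once the counter reaches SPACE the five-character space is exhausted, which is the point where another character would have to be added.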
It keeps track of previously used long URLs. This means that, when someone goes to create a short URL, if the place they are pointing to already has a short URL, it will just give them the pre-existing short URL.
Actually, it would be inefficient to have a system dedicated to creating 'hashes' based on a given set of data. Rather, the short URL is simply a random set of characters which has already been identified as ten digits, plus 26 lowercase letters, plus 26 uppercase letters = 916132832 permutations (not combinations). Random short URLs is the most efficient way to make it work, and that is why they are always different (though I suppose there could be some other component in the algorithm like the time of day, but I don't think it's worth it....there's no point in making it that complex; spending all of that processing power just to make a silly 5 character string which any monkey could do by pressing a button the right way on a permutation calculator).

What kind of encrypted data is this?

A friend of mine asked me this, and I thought I'd ask it here too.
"What kind of data is this, and how is it encrypted or decrypted?"
My friend told me he got this from Facebook.
d9ca6435295fcd89e85bd56c2fd51ccc
It looks like it could be an md5 hash.
Basically, a hash is a one-way function. The idea is that you take some input data and run it through the algorithm to create a value (such as the string above) that has a low probability of collisions (i.e., two input values hashing to the same string).
You cannot decrypt a hash because there is not enough information in the resultant string to go back. However, it may be possible for someone to figure out your input values if you use a 'weak' hashing algorithm and do not do proper techniques such as salting a hash, etc.
I don't know how Facebook uses hashes, but a common use for a hash might be to uniquely identify a page. For example, if you had a private image on a page, you might ask to generate a link to the image that you can email to friends. That link might use a hash as part of the URL since the value can be computed quickly, is reasonably unique, and has a low probability of a third party figuring it out.
This is actually a large topic that I am by no means doing justice to. I suggest googling around for hash, md5, etc. to learn more, if you are so inclined.
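As a small illustration of why the string looks like an MD5 hash, the following Python sketch hashes an arbitrary input and compares the shape of the result with the string from the question. The helper looks_like_md5 is made up for the example, and a matching shape only shows that the value is 32 hex digits, not that it really is an MD5 digest.

```python
import hashlib
import string

sample = "d9ca6435295fcd89e85bd56c2fd51ccc"  # the string from the question

def looks_like_md5(text: str) -> bool:
    """True if text has the shape of an MD5 digest: 32 hex digits (128 bits)."""
    return len(text) == 32 and all(c in string.hexdigits for c in text)

digest = hashlib.md5(b"hello").hexdigest()
other = hashlib.md5(b"hello!").hexdigest()

print(digest)                  # 5d41402abc4b2a76b9719d911017c592
print(digest != other)         # a one-character change gives an unrelated digest
print(looks_like_md5(sample))  # same shape as the question's string
```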
It is a sequence of 128 bits, encoded as a lower-case hex string.
If you are talking about a Facebook API key, there is no deeper meaning to decode from the bits. The keys are created at random by Facebook and assigned to a particular application to identify it. Each application gets a different set of random bits for its API key.
This appears to be the hexadecimal representation of:
- a 16-byte encryption block, or
- some 128-bit hash code, or even
- just a plain random or identifying number.
(Hexadecimal: note how there are only the digits 0 through 9 and the letters a through f.)
While the MD5 Hash guess suggested by others is quite plausible, it could be just about anything...
If it is a hash or an identifying / randomly assigned number, its meaning is external to the code itself.
For example it could be a key to be used to locate records in a database, or a value to be compared with the result of the hash function applied to the user supplied password etc.
If it is an encrypted value, its meaning (decrypted value) is directly found within the code, but it could be just about anything. Also, assuming it is produced with modern encryption algorithm, it could take a phenomenal amount of effort to crack the code (if at all possible).

Pretty URLs with hashes (md5)

In our web application we display a list of pulses, but for linking and such we make every pulse uniquely available. In our CouchDB database we give every pulse a unique id by md5'ing its unique attributes, i.e.: www.foo.com/bar/
However, these md5 sums are extremely long and make for ugly URLs. Is there another way to hash the attributes that requires fewer characters but still guarantees uniqueness?
Thanks a lot
Instead of exposing an ugly md5, you could use a method like this to create a random string of a given length from a chosen set of characters, and store it next to the md5 so that the 'pretty URL' string can be used to retrieve the data from the database. One thing to think about is leaving the vowels out of the possible characters, since with them you could end up with bad words :) Also, make sure the string does not already exist in the database, of course, and if it does, just create another one. That won't happen very often, though.
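A minimal Python sketch of that approach might look like the following. The set named taken stands in for the "does it already exist in the database" lookup, and the alphabet, the slug length of 6, and the function name new_slug are all assumptions made for the example.

```python
import secrets

# Consonants and digits only (no a, e, i, o, u, y), so no words can form.
ALPHABET_NO_VOWELS = "bcdfghjklmnpqrstvwxz0123456789"

def new_slug(taken: set, length: int = 6) -> str:
    """Return a random slug that is not in `taken`, and record it there."""
    while True:
        slug = "".join(secrets.choice(ALPHABET_NO_VOWELS) for _ in range(length))
        if slug not in taken:  # in a real application this is a database lookup
            taken.add(slug)
            return slug

taken = set()
slugs = [new_slug(taken) for _ in range(1000)]
print(slugs[:3])
```

With 30 symbols and 6 characters there are 729,000,000 possible slugs, so the retry loop should almost never run more than once for a list of this size.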

How can I generate a unique, small, random, and user-friendly key?

A few months back I was tasked with implementing a unique and random code for our web application. The code would have to be user friendly and as small as possible, but still be essentially random (so users couldn't easily predict the next code in the sequence).
It ended up generating values that looked something like this:
Af3nT5Xf2
Unfortunately, I was never satisfied with the implementation. GUIDs were out of the question: they were simply too big and too difficult for users to type in. I was hoping for something more along the lines of 4 or 5 characters/digits, but our particular implementation would generate noticeably patterned sequences if we encoded to fewer than 9 characters.
Here's what we ended up doing:
We pulled a unique sequential 32bit id from the database. We then inserted it into the center bits of a 64bit RANDOM integer. We created a lookup table of easily typed and recognized characters (A-Z, a-z, 2-9 skipping easily confused characters such as L,l,1,O,0, etc.). Finally, we used that lookup table to base-54 encode the 64-bit integer. The high bits were random, the low bits were random, but the center bits were sequential.
The final result was a code that was much smaller than a guid and looked random, even though it absolutely wasn't.
I was never satisfied with this particular implementation. What would you guys have done?
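For readers who want to experiment with the scheme described above, here is a rough Python sketch of it. Several details are assumptions: the 16/32/16 bit split, the exact 54-character alphabet (the question only says easily confused characters "such as L,l,1,O,0, etc." were skipped), and all function names. It is not the original implementation.

```python
import secrets

# Assumed alphabet: A-Z and a-z without I, L, O / i, l, o, plus the digits 2-9.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz23456789"
BASE = len(ALPHABET)  # 54

def encode(value: int) -> str:
    """Base-54 encode a non-negative integer using the lookup table."""
    if value == 0:
        return ALPHABET[0]
    chars = []
    while value:
        value, digit = divmod(value, BASE)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

def make_key(sequential_id: int) -> str:
    """Place a 32-bit sequential id in the centre bits of a random 64-bit integer."""
    if not 0 <= sequential_id < 2**32:
        raise ValueError("sequential_id must fit in 32 bits")
    high = secrets.randbits(16)  # random high bits
    low = secrets.randbits(16)   # random low bits
    return encode((high << 48) | (sequential_id << 16) | low)

def decode_id(key: str) -> int:
    """Recover the sequential id from a key produced by make_key."""
    value = 0
    for ch in key:
        value = value * BASE + ALPHABET.index(ch)
    return (value >> 16) & 0xFFFFFFFF

key = make_key(12345)
print(key, decode_id(key))
```

Uniqueness comes entirely from the sequential id in the middle; the random bits on either side only disguise it, which matches the remark that the code looked random even though it wasn't.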
Here's how I would do it.
I'd obtain a list of common English words with usage frequency and some grammatical information (like is it a noun or a verb?). I think you can look around the intertubes for some copy. Firefox is open-source and it has a spellchecker... so it must be obtainable somehow.
Then I'd run a filter on it so that obscure words and words that are too long are removed.
Then my generation algorithm would pick 2 words from the list, concatenate them, and add a random 3-digit number.
I can also randomize word selection pattern between verb/nouns like
eatCake778
pickBasket524
rideFlyer113
etc..
The case needn't be camel case; you can randomize that as well. You can also randomize the placement of the number and the verb/noun.
And since that's a lot of randomizing, Jeff's The Danger of Naïveté is a must-read. Also make sure to study dictionary attacks well in advance.
And after I'd implemented it, I'd run a test to check how often my algorithm produces collisions. If the collision rate was high, then I'd play with the parameters (number of nouns used, number of verbs used, length of the random number, total number of words, different kinds of casing, etc.)
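The generation and the collision test described in this answer could be sketched in Python roughly as follows. The tiny word lists are placeholders for a real filtered dictionary, and the function names are made up for the example.

```python
import secrets

# Placeholder word lists; a real list would come from a filtered dictionary.
VERBS = ["eat", "pick", "ride", "paint", "carry"]
NOUNS = ["Cake", "Basket", "Flyer", "Window", "Garden"]

def word_key() -> str:
    """Verb + noun + random 3-digit number, e.g. something like eatCake778."""
    number = secrets.randbelow(1000)
    return f"{secrets.choice(VERBS)}{secrets.choice(NOUNS)}{number:03d}"

def collision_rate(samples: int) -> float:
    """Fraction of generated keys that repeat an earlier key."""
    keys = [word_key() for _ in range(samples)]
    return 1 - len(set(keys)) / samples

print(word_key())
print(collision_rate(1000))
```

With only 5 verbs and 5 nouns there are just 25,000 possible keys, so the measured rate here is noticeable; that is the kind of result that would prompt adjusting the parameters.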
In .NET you can use the RNGCryptoServiceProvider method GetBytes(), which will "fill an array of bytes with a cryptographically strong sequence of random values" (from the Microsoft documentation).
// requires: using System.Security.Cryptography;
byte[] randomBytes = new byte[4];
using (var rng = new RNGCryptoServiceProvider()) // disposable, so wrap it in using
{
    rng.GetBytes(randomBytes); // fill the array with cryptographically strong random bytes
}
You can increase the length of the byte array and pluck out the character values you want to allow.
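The "pluck out the character values you want to allow" step needs a little care so that some characters don't come up more often than others. Here is one way to sketch it, in Python rather than C#, using os.urandom as the cryptographic byte source; the allowed character set and the function name are assumptions for the example.

```python
import os

ALLOWED = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"  # hypothetical allowed set, 31 characters

def random_code(length: int) -> str:
    """Build a code from random bytes, discarding bytes that would bias the result."""
    # Only bytes below the largest multiple of len(ALLOWED) are used, so each
    # allowed character is equally likely (for 31 characters the cutoff is 248).
    limit = 256 - (256 % len(ALLOWED))
    out = []
    while len(out) < length:
        for byte in os.urandom(length):
            if byte < limit and len(out) < length:
                out.append(ALLOWED[byte % len(ALLOWED)])
    return "".join(out)

print(random_code(8))
```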
In C#, I have used the 'System.IO.Path.GetRandomFileName() : String' method... but I was generating salt for debug file names. This method returns stuff that looks like your first example, except with a random '.xyz' file extension too.
If you're in .NET and just want a simpler (but not 'nicer' looking) solution, I would say this is it... you could remove the random file extension if you like.
At the time of this writing, this question's title is:
How can I generate a unique, small, random, and user-friendly key?
To that, I should note that it's not possible in general to create a random value that's also unique, at least if each random value is generated independently of any other. In addition, there are many things you should ask yourself if you want to generate unique identifiers (which come from my section on unique random identifiers):
Can the application easily check identifiers for uniqueness within the desired scope and range (e.g., check whether a file or database record with that identifier already exists)?
Can the application tolerate the risk of generating the same identifier for different resources?
Do identifiers have to be hard to guess, be simply "random-looking", or be neither?
Do identifiers have to be typed in or otherwise relayed by end users?
Is the resource an identifier identifies available to anyone who knows that identifier (even without being logged in or authorized in some way)?
Do identifiers have to be memorable?
In your case, you have several conflicting goals: You want identifiers that are—
unique,
easy to type by end users (including small), and
hard to guess (including random).
Important points you don't mention in the question include:
How will the key be used?
Are other users allowed to access the resource identified by the key, whenever they know the key? If not, then additional access control or a longer key length will be necessary.
Can your application tolerate the risk of duplicate keys? If so, then the keys can be completely randomly generated (such as by a cryptographic RNG). If not, then your goal will be harder to achieve, especially for keys intended for security purposes.
Note that I don't go into the issue of formatting a unique value into a "user-friendly key". There are many ways to do so, and they all come down to mapping unique values one-to-one with "user-friendly keys" — if the input value was unique, the "user-friendly key" will likewise be unique.
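To judge whether the risk of duplicate keys is tolerable, the usual birthday approximation can be used: for n independently generated random keys drawn uniformly from N possible values, the chance of at least one duplicate is roughly 1 - exp(-n(n-1)/(2N)). The Python sketch below is only an estimate under that uniformity assumption, and the function name is made up.

```python
import math

def collision_probability(num_keys: int, keyspace: int) -> float:
    """Approximate chance that num_keys uniform random keys contain a duplicate."""
    exponent = -num_keys * (num_keys - 1) / (2 * keyspace)
    return -math.expm1(exponent)  # 1 - exp(exponent), accurate for tiny values

# Classic sanity check: 23 people, 365 birthdays, roughly a 50% chance.
print(collision_probability(23, 365))
# 1000 random 5-character base62 keys.
print(collision_probability(1000, 62**5))
# One million random 64-bit keys.
print(collision_probability(1_000_000, 2**64))
```

If the estimated probability is too high for the application, the options are the ones listed above: lengthen the key, or check each new key for uniqueness before accepting it.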
If by user friendly, you mean that a user could type the answer in then I think you would want to look in a different direction. I've seen and done implementations for initial random passwords that pick random words and numbers as an easier and less error prone string.
If, though, you're looking for a way to encode a random code in a URL string, which is an issue I've dealt with for a while, then what I have done is use base64-encoded GUIDs.
You could load your list of words as chakrit suggested into a data table or xml file with a unique sequential key. When getting your random word, use a random number generator to determine what words to fetch by their key. If you concatenate 2 of them, I don't think you need to include the numbers in the string unless "true randomness" is part of the goal.