How to make a 20-char Id with a 24-char ObjectId - mongodb

So here is the problem: I'm using MongoDB in my project, so there are 24-character ObjectIds using only the hexadecimal alphabet. I make an HTTP request from my project to a provider, and in this request I need to put a unique Id for callback purposes, but the provider allows only 20 characters for this id, and I don't know why.
So, my question is: with a 16-character alphabet (hex), there are 16^24 possible Mongo Ids, right?
Supposing I use an Id in the HTTP request based on 64 different characters ([0-9][a-z][A-Z]-_),
correct me if I'm wrong, but I think there are 64^20 possible Ids.
So technically, it is possible to encode every possible MongoDB ObjectId with a corresponding Id, isn't it?
It seems like a classic Base64 encoding, but mysteriously this does not work as I expected; I think I didn't understand how Base64 encoding works, because the generated strings are longer than the original strings...
Do you think all of this is even possible, or did I totally miss something?
Thanks in advance!
EDIT:
One of my colleagues tried something which seems to work.
Here is the Java code:
import org.apache.commons.codec.binary.Base64;
import org.apache.commons.codec.binary.Hex;
// decode the 24 hex chars to their 12 raw bytes, then Base64-encode those bytes
byte[] decodedHex = Hex.decodeHex("53884594e4b0695f366f8128".toCharArray());
byte[] encodedHexB64 = Base64.encodeBase64(decodedHex);
System.out.println(new String(encodedHexB64)); // --> U4hFlOSwaV82b4Eo
For some reason that I don't understand, doing this is not the same:
String anotherB64 = Base64.encodeBase64String("53884594e4b0695f366f8128".getBytes());
System.out.println(anotherB64);
And it prints : NTM4ODQ1OTRlNGIwNjk1ZjM2NmY4MTI4

MongoDB uses ObjectId as the default primary key for documents because it is fast to generate and very likely to be unique.
But you are not forced to use it as a primary key. You can use any BSON data type in the _id field as long as it is not an array. That being said, you can use your 20-char Id in the _id field.
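For example, a minimal sketch (collection and field names are illustrative) of storing the shorter id directly as the primary key:
db.callbacks.insertOne({ _id: "U4hFlOSwaV82b4Eo", createdAt: new Date() })
db.callbacks.findOne({ _id: "U4hFlOSwaV82b4Eo" })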
EDIT:
From your original question I didn't know that you're using an existing DB. The _id field is immutable and it cannot be changed in an existing document.
If you only wanted to convert the existing ObjectId to something else that's 20 chars long the method you posted will work.
The second method produces a longer string because you're Base64-encoding the 24 ASCII bytes of the hex string itself (giving 32 characters), rather than the 12 raw bytes that the hex string represents.
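The numbers work out: an ObjectId is 12 bytes (96 bits), and Base64 packs 6 bits per character, so ceil(96 / 6) = 16 characters, comfortably under the 20-character limit. For comparison, here is a minimal Node.js sketch of the same round trip using a URL-safe variant of Base64 (function names are illustrative):
// 24 hex chars -> 12 raw bytes -> 16 URL-safe Base64 chars (no padding needed)
function objectIdToShortId(hex) {
  return Buffer.from(hex, 'hex')
    .toString('base64')
    .replace(/\+/g, '-')
    .replace(/\//g, '_');
}
// 16 URL-safe Base64 chars -> 12 raw bytes -> 24 hex chars
function shortIdToObjectId(shortId) {
  const b64 = shortId.replace(/-/g, '+').replace(/_/g, '/');
  return Buffer.from(b64, 'base64').toString('hex');
}
console.log(objectIdToShortId('53884594e4b0695f366f8128')); // U4hFlOSwaV82b4Eo
console.log(shortIdToObjectId('U4hFlOSwaV82b4Eo'));         // 53884594e4b0695f366f8128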

Related

What's a best practice for saving a unique, random, short string to db?

I have a table with a varchar column named key, which is supposed to hold a unique, 8-char random string that is going to be used as a unique identifier by users. This field should be generated and saved on creation of objects. I have a question about how to create it:
Most recommendations point to a UUID field, but that's not applicable for me because it's too long, and if I just take a subset of it, there's no guarantee of uniqueness.
Currently I've just implemented a loop in my backend (not the DB), which generates a random string and tries to insert it into the DB, retrying if the string turns out not to be unique. But I feel that this is just really bad practice.
What's the best way to do this?
I'm using Postgresql 9.6
UPDATE:
My main concern is to remove the loop that retries until it finds a random, short string (or number, doesn't matter) that is unique in that table. AFAIK the solution should be a way to generate the string in the DB itself. The only thing that I can find for Postgresql is uuid and uuid-ossp, which do something like this, but a uuid is way too long for my application, and I don't know of any way to get a shorter representation of a uuid without compromising its uniqueness (and I don't think it's possible theoretically).
So, how can I remove the loop and its back-and-forth to the DB?
Encryption is guaranteed unique; it has to be, otherwise decryption would not work. Provided you encrypt unique inputs, such as 0, 1, 2, 3, ..., you are guaranteed unique outputs.
You want 8 characters. You have 62 characters to play with: A-Z, a-z, 0-9, so convert the binary output from the encryption to a base-62 number.
You may need to use the cycle-walking technique from format-preserving encryption to handle the few cases where the encrypted value falls outside the 62^8 range.
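As a rough sketch (assuming the encryption step already exists and produces a value below 62^8), the base-62 conversion part could look like this; the alphabet and function name are illustrative:
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
// convert an already-encrypted integer (< 62^8) to a fixed-width 8-char base-62 string
function toBase62(n, width = 8) {
  let out = '';
  do {
    out = ALPHABET[n % 62] + out;
    n = Math.floor(n / 62);
  } while (n > 0);
  return out.padStart(width, ALPHABET[0]);
}
console.log(toBase62(123456789012)); // 8-character string, unique as long as the input is unique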

MongoDB generate UUID with dashes

I need to add a random UUID with dashes to my column GUID.
How should I write my SQL script? I was using GUID = MD5(UUID()); but it generates a UUID without dashes. What would be the solution?
If you use MD5(UUID()), it is no longer a UUID, and the "guaranteed unique" property is also lost. Don't do that.
That said, a UUID is a 128-bit number. It doesn't have dashes. Some representations do, but that's a display issue.
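To illustrate that the dashes are only part of the textual representation, here is a small Node.js sketch (assuming Node 14.17+ for crypto.randomUUID); stripping the dashes and re-inserting them gives back the same string:
const { randomUUID } = require('crypto');
const u = randomUUID();               // e.g. '3b241101-e2bb-4255-8caf-4136c566a962' (canonical dashed form)
const hex = u.replace(/-/g, '');      // same 128-bit value, just without dashes
const dashed = hex.replace(/^(.{8})(.{4})(.{4})(.{4})(.{12})$/, '$1-$2-$3-$4-$5');
console.log(u === dashed);            // true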

How to shorten the value of Url query parameter?

There is a similar-looking question asked at:
How can I shorten the URL query paramters?
But here there is a single query parameter and its value is a comma-separated list of long Ids.
Eg. http://example.com/page?q='111100000123,111100000234,11134423213,238418249,823481293,841298472384,89234798124,981248923,24982134983'
Encode this value to something like
http://example.com/page?q='cdw,erw,ere...'
or
http://example.com/page?q=asfjeoren
and then decode it back on the server side to the original value.
Assuming you want the encoded versions of your ids to use alphanumeric characters, and if your ids are all 12 digits (or less), then you would need up to 7 encoded characters per id (62^6 is about 5.7e10, which is too small, but 62^7 is about 3.5e12, which covers every 12-digit value) - you won't get it down to 3 characters as in your example; there simply aren't enough combinations of alphanumeric characters to uniquely represent each id.
I would question whether it is actually worth doing what you propose - how many ids are going to be in your urls? Have you proved that the long urls actually cause a problem, or are you pre-optimising?
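If shortening does turn out to matter, one option (a different approach from per-id encoding, sketched here with Node's built-in zlib; names are illustrative) is to pack the whole comma-separated list into a single opaque, URL-safe token and unpack it on the server:
const zlib = require('zlib');
// deflate the joined id list and make the result URL-safe
function packIds(ids) {
  return zlib.deflateRawSync(ids.join(','))
    .toString('base64')
    .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}
// reverse the URL-safe substitutions, inflate, and split back into ids
function unpackIds(token) {
  const b64 = token.replace(/-/g, '+').replace(/_/g, '/');
  return zlib.inflateRawSync(Buffer.from(b64, 'base64')).toString().split(',');
}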

Will I get clash issues if I compare mongodb Ids case-insensitively?

I have an email token collection in my mongodb database for a meteor app, and I stick these email tokens in the reply address of my emails (e.g. #example.com) so that when I parse the address I know what it relates to.
The problem I have is that the email token uses the default _id algorithm to generate a unique id, and that algorithm generates a string that is a mixture of upper-case and lower-case characters.
However, I've discovered that some email clients lowercase the entire reply address, which means that I can only identify the addresses case-insensitively.
I guess now I have two options.
1) The easiest option would be to match the email tokens with the reply address case insensitively. What would be the chance of clashes in that respect?
2) Make the email token some sort of guid and generate this guid independent of the mongodb ID creation.
Yes, you would get issues. Meteor uses both upper-case and lower-case values in its 17-character id values. You can have a look at the code in the Random package: https://github.com/meteor/meteor/tree/devel/packages/random.
So it would be possible to get two distinct values whose only differences are in casing. This could cause mix-ups if your clients' email applications convert the address to lowercase characters.
In your case it is best not to use Random.id(), but rather to make up your own random character generator. Something like this might work:
var lowerCaseId = function () {
  // 33 "unmistakable" characters, lowercase only (no 0, 1, or l)
  var digits = [];
  for (var i = 0; i < 17; i++) {
    digits[i] = Random.choice("23456789abcdefghijkmnopqrstuvwxyz");
  }
  return digits.join("");
};
Also of note: the Meteor _id value is built from 'unmistakable characters' - there are no characters that can cause confusion, such as 0 vs O, 1 vs I, etc.
If you don't use it in your _id field, you will have to generate a value with this and check that it does not exist in your database before inserting it, or use a unique index for it.
Additionally, be aware that there will be a significant decrease in entropy, since the number of possible combinations drops with the loss of the upper-case characters. If this is of significance to you, you could increase the number of digits from 17 in the code above.
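As a rough sense of the numbers (a sketch; the exact size of Meteor's original alphabet is not checked here): 17 characters drawn from the 33-symbol lowercase alphabet above give about 17 * log2(33) bits, and each extra character adds roughly 5 bits:
const bits = (len, alphabetSize) => len * Math.log2(alphabetSize);
console.log(bits(17, 33).toFixed(1)); // ~85.7 bits for 17 lowercase-only characters
console.log(bits(21, 33).toFixed(1)); // ~105.9 bits with four extra characters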
Meteor generates its own Ids, which are different from the MongoDB ObjectIds. As noted, these would be subject to clashes when converting case or checking case-insensitively. This is kind of interesting, and I'm not sure of the project's reasons for it.
Under the hood, however, sits the mongodb node native driver, so the ObjectId creation functions should be available if you want to use them.
https://github.com/mongodb/js-bson/blob/master/lib/bson/objectid.js#L68-L74
The important part is in these calls:
value.toString(16)
So the radix here is set to 16, i.e. hex, using the characters 0-9a-f.
You can also note that drivers will regex-check an ObjectId string like this:
^[0-9a-fA-F]{24}$
So it would seem that, as far as the driver is concerned, case sensitivity is not an issue.
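A sketch of one way around the lowercasing problem (assuming a recent mongodb node driver), without changing the id scheme at all, is to normalise the received token before looking it up, since the hex carries the same value in either case:
const { ObjectId } = require('mongodb');
const raw = '53884594E4B0695F366F8128';      // token as it came back, possibly re-cased by a mail client
const id = new ObjectId(raw.toLowerCase());  // hex digits are case-insensitive, so normalising is safe
console.log(id.toHexString());               // '53884594e4b0695f366f8128'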
Still if you want to use something alternate there is a section in the documentation that might serve as a useful guide.
http://docs.mongodb.org/manual/core/document/#the-id-field

MongoDB empty string value vs null value

In a MongoDB production setup, if the value of a key is empty or not provided (it's optional), should I use an empty string for the value or should I use null?
1) Are there any pros and cons between using an empty string vs null?
2) Are there any pros and cons between setting the value to undefined, to remove the property from the existing doc, vs leaving the property's value as either an empty string or null?
Thanks
I think the best way is undefined, i.e. not including the key at all. Mongo doesn't work like SQL, where you have to have at least null in every column. If you don't have a value, simply don't include the key. Then, if you query for all documents where this key doesn't exist, it will work correctly; otherwise it won't. Also, if you don't use the key, you save a little bit of disk space. So this is the correct way in Mongo.
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

// setter: map null (or undefined) to undefined so the path is simply not stored
function deleteEmpty(v) {
  if (v == null) {
    return undefined;
  }
  return v;
}

var UserSchema = new Schema({
  email: { type: String, set: deleteEmpty }
});
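For reference, here is a sketch (mongo shell syntax, MongoDB 3.2+ for the $type string alias; the collection name is illustrative) of how the three states - key absent, key set to null, and key set to an empty string - can be told apart in queries:
db.users.find({ email: { $exists: false } })  // documents where the key is absent entirely
db.users.find({ email: { $type: "null" } })   // key present and explicitly set to null
db.users.find({ email: "" })                  // key present with an empty string
db.users.find({ email: null })                // matches both missing keys and explicit nulls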
I would say that null indicates absence of the value, and an empty string indicates that the value is there but is empty.
While reading the data you can distinguish between blank values and non-existing values.
Still, it depends upon your use-case.
This question has been answered at least 4 times by me, and a Google search will get you a lot of information.
You must take into consideration what removing the key means. If your document will eventually use that schema in most of its defined state within the application, then you could see a lot of movement of the document, which negates the benefit of not having these keys: space. The couple of bytes you save will be rendered useless, and you will get a swiss-cheese effect.
However, if you do not use these fields at all, then those few extra bytes, multiplied across millions of documents in your working set, could cause real problems that need not be there (if you for some reason want to shove that many documents into your working set). As for the space issue, MongoDB fundamentally has a space issue of its own, and I have not really known omitting a couple of keys to do anything to help that.