Get unique user ID for a webapp from MongoDB's ObjectID

I need a unique ID that identifies each user of a voice recording web app. To do this with minimal additional requirements, I am trying to use MongoDB's ObjectID. According to the official documentation, ObjectId is a 12-byte BSON type, constructed using:
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
According to this answer, the 3-byte machine field of the ObjectID is the first three bytes of the (md5) hash of the machine host name. So shouldn't this be unique for a given machine?
However, the machine identifier does not seem to be what I thought it was.
--From Laptop--
56316c85 b47e28 f61a 62b931
56316c89 b47e28 f61a 62b934
--From a phone--
56316dc9 d75b48 ce1c 49f2b3
56316dcb d75b48 ce1c 49f2b4
--From the same Laptop--
56316f47 d75b48 ce1c 49f2bd
56316f7e d75b48 ce1c 49f2be
Can someone help me understand this? It is the same MongoDB instance, hosted locally on the laptop, accessed once from a phone's browser (on the same network) and once from the laptop itself.
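The field layout quoted in the question can be checked directly against the sample IDs. The following is a minimal Python sketch, assuming the legacy 4/3/2/3-byte layout described above; the function name split_legacy_objectid is made up for illustration and is not part of any MongoDB driver.

```python
from datetime import datetime, timezone


def split_legacy_objectid(hex_id: str) -> dict:
    """Split a 24-character hex ObjectID into the four legacy fields."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectID is exactly 12 bytes (24 hex characters)")
    # Bytes 0-3: big-endian seconds since the Unix epoch.
    seconds = int.from_bytes(raw[0:4], "big")
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine": raw[4:7].hex(),                   # bytes 4-6
        "pid": int.from_bytes(raw[7:9], "big"),      # bytes 7-8
        "counter": int.from_bytes(raw[9:12], "big"), # bytes 9-11
    }


if __name__ == "__main__":
    # First laptop sample and first phone sample from the listing above.
    for sample in ("56316c85b47e28f61a62b931", "56316dc9d75b48ce1c49f2b3"):
        print(split_legacy_objectid(sample))
```

For these two samples the machine field comes out as b47e28 and d75b48 respectively, matching the values visible in the listing.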

Related

How to implement MongoDB ObjectId validation from scratch?

I'm developing a front-end app where I want to support searching data by the id, so I'm going to have an "object id" field. I want to validate the object id to make sure it's a valid MongoDB ObjectId before sending it to the API.
So I searched for how to do it and found this thread, where all the answers suggest using an implementation provided by a MongoDB driver or ORM such as mongodb or mongoose. However, I don't want to go that way, because I don't want to install an entire database driver/ORM in my front-end app just for id validation. I'd rather implement the validation myself.
Unfortunately, I couldn't find an existing implementation. Then I tried checking the ObjectId spec and implementing the validation myself, but that didn't work out either.
The specification says...
The ObjectID BSON type is a 12-byte value consisting of three different portions (fields):
a 4-byte value representing the seconds since the Unix epoch in the highest order bytes,
a 5-byte random number unique to a machine and process,
a 3-byte counter, starting with a random value.
Which doesn't make much sense to me. When it says the ObjectId has 12 bytes, it makes me think that the string representation is going to have 12 characters (1 byte = 1 char), but it doesn't. Most object ids have 24 characters.
Finally, I searched mongodb's and mongoose's source code, but I didn't have much luck with that either. The best I could do was find this line of code, but I don't know where to go from there.
TL;DR: What is the actual algorithm to check if a given string is a valid MongoDB Object Id?

Your find is correct, you just stopped too early. The isValid method comes from the underlying bson library: https://github.com/mongodb/js-bson/blob/a2a81bc1bc63fa5bf3f918fbcaafef25aca2df9d/src/objectid.ts#L297
And yes, you got it right: there is not much to validate. Any 12 bytes can be an ObjectId. The reason you see 24 characters is that not all 256 byte values map to printable characters, so the ObjectId is usually presented in hex format, 2 characters per byte. The regexp to validate the 24-character hex representation would be /^[0-9a-f]{24}$/i (anchored, so that a longer string that merely contains 24 hex characters is rejected).
TL;DR: check the constructor of ObjectId in the bson library for the official validation algorithm.
Hint: you don't need most of it, as you are limited to string input on the frontend.
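The hex-string check described above can be sketched in a few lines. This is a Python illustration of the same pattern rather than the bson library's own code; it covers only the 24-character hex form, and the name looks_like_object_id is hypothetical.

```python
import re

# 24 hex characters, upper or lower case; fullmatch() anchors both ends.
_OBJECT_ID_HEX = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value) -> bool:
    """Return True if value is a 24-character hex string."""
    return isinstance(value, str) and _OBJECT_ID_HEX.fullmatch(value) is not None


if __name__ == "__main__":
    print(looks_like_object_id("4cf8ce8a8aad6957ff00005b"))
    print(looks_like_object_id("not-an-object-id"))
```

The same pattern carries over to a front-end regular expression unchanged, as long as it is anchored at both ends.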

Outlook EntryID syntax

I am writing a tool to back up my mails. To tell whether I have already backed up a mail, I use the EntryID.
The EntryID is, however, very long, so I have problems serializing my data structure to JSON when I use the EntryID as a key in a hash.
Furthermore, I noticed that the first part of the EntryID is identical across all my mails. I therefore suspect that the first part identifies the Outlook server and the last part identifies the individual e-mail, so there should be no need to use the whole EntryID to identify a single mail in my account.
Does anybody know the syntax of this EntryID? I did not find anything on the Microsoft site; maybe I used the wrong query.
Thanks a lot
Example of EntryID:
00000000AC032ADC2BFB3545BD2CEE24F67EAFF507000C7E507D761D09469E2B3AC3FA5E65770034EA28BA320000FD962E1BCA05E74595C077ACB6D7D7D30001C72579700000
Quite long, isn't it?

All entry ids must be treated as black boxes. The first 4 bytes (8 hex characters) are the flags (0s for the long term entry id). Next 16 bytes (32 hex characters) are the provider UID registered with the M
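Going only by the two fields named so far (4 bytes of flags, then a 16-byte provider UID), the example EntryID from the question can be split as in the Python sketch below. This is for inspection only, since the answer advises treating entry ids as black boxes; the function name is hypothetical and everything after the first 20 bytes is left uninterpreted.

```python
def split_entry_id_prefix(entry_id_hex: str) -> dict:
    """Split off the flags and provider UID from a hex-encoded entry id."""
    if len(entry_id_hex) < 40:
        raise ValueError("need at least 20 bytes (40 hex characters)")
    return {
        "flags": entry_id_hex[:8],           # first 4 bytes
        "provider_uid": entry_id_hex[8:40],  # next 16 bytes
        "rest": entry_id_hex[40:],           # not interpreted here
    }


if __name__ == "__main__":
    example = (
        "00000000AC032ADC2BFB3545BD2CEE24F67EAFF507000C7E507D761D09469E2B"
        "3AC3FA5E65770034EA28BA320000FD962E1BCA05E74595C077ACB6D7D7D30001"
        "C72579700000"
    )
    print(split_entry_id_prefix(example))
```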

Does using the database id as the RESTful service id expose a threat?

I have a RESTful service for documents, which are stored in MongoDB. The API for a document is /document/:id, where :id is currently the MongoDB ObjectID. I wonder whether this approach reveals the database id and exposes a potential threat, and whether I should replace it with a pseudonymous id.
If it does need to be replaced with a pseudonymous id, is there an algorithmic method to transform between the ObjectID and the pseudonymous id, back and forth, without much computation?

First, there is no "database id" contained in the ObjectID.
I'm assuming your concern comes from the fact that the spec lists a 3 byte machine identifier as part of the ObjectID. A couple of things to note on that:
Most of the time, the ObjectID is actually generated on the client side, not the server (though it can be). Hence this is usually the machine identifier for the application server, not your database.
The 3 byte Machine ID is the first three bytes of the (md5) hash of the machine host name, or of the MAC/network address, or the virtual machine id (depending on the particular implementation), so it can't be reversed back into anything particularly meaningful.
With the above in mind, you can see that worrying about exposing information is not really a concern.
However, with even a small sample, it is relatively easy to guess valid ObjectIDs, so if you want to avoid that type of traffic hitting your application, then you may want to use something else (a hash of the ObjectID might be a good idea for example), but that will be dependent on your requirements.
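One way to read the "hash of the ObjectID" suggestion is sketched below in Python. Note the assumption: a plain, unkeyed hash of a guessable ObjectID would be just as guessable, so this sketch swaps in a keyed hash (HMAC-SHA256) with a server-side secret. The names public_id and SECRET_KEY are hypothetical. Because the mapping is one-way, the token would have to be stored with the document to look it up again, so it does not give the back-and-forth transformation the question asks about.

```python
import hashlib
import hmac

# Assumption: a secret kept on the application server, never sent to clients.
SECRET_KEY = b"replace-with-a-long-random-secret"


def public_id(object_id_hex: str, key: bytes = SECRET_KEY) -> str:
    """Derive a public token from an ObjectID using HMAC-SHA256."""
    raw = bytes.fromhex(object_id_hex)
    return hmac.new(key, raw, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    print(public_id("4cf8ce8a8aad6957ff00005b"))
```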

am I exposing sensitive data if I put a bson ID in a url?

Say I have a Products array in my MongoDB. I'd like users to be able to see each product on its own page: http://www.mysite.com/product/12345/Widget-Wodget. Since each Product doesn't have an incremental integer ID (12345) but instead has a BSON ID (5063a36bdeb13f7505000630), I'd need to either add the integer ID or use the BSON ID.
Since BSON IDs include the PID:
4-byte timestamp,
3-byte machine identifier,
2-byte process id,
3-byte counter.
Am I exposing secure information to the outside world if I use the BSON ID in my url?

I can't think of any use to gain privileges on your machines, however using ObjectIds everywhere discloses a lot of information nonetheless.
By crawling your website, one could:
find out about some hidden objects: for instance, if the counter part goes from 0x....b1 to 0x....b9 between times t1 and t2, one can guess ObjectIds within these intervals. However, guessing ids is most likely useless if you enforce access permissions
know the signup date of each user (not very sensitive info but better than nothing)
deduce actual (as opposed to publicly available) business hours from the timestamps of objects created by the staff
deduce in which timezones your audience lives from the timestamps of user-generated objects: if your website is one which people use mostly at lunchtime, then one could measure peaks of ObjectIds and deduce that a peak at 8 PM UTC means the audience was on the US West coast
and more generally, by crawling most of your website, one can build a timeline of the success of your service, having for any given time knowledge of: your user count, levels of user engagement, how many servers you've got, how often your servers are restarted. PID changes occurring on weekends are more likely crashes, whereas those on business days are more likely crashes + software revisions
and probably find other info specific to your business processes and domain
To be fair, even with random ids one can infer a lot. The main issue is that you need to prevent anyone from scraping a statistically significant part of your site. But if someone is determined, they'll succeed eventually, which is why providing them with all of this extra, timestamped info seems wrong.
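The guessing described in the first bullet above can be sketched as follows. This is a rough Python illustration, assuming the legacy timestamp/machine/pid/counter layout, two observed IDs from the same process, and no counter wrap-around in between; candidate_ids is a hypothetical name.

```python
from itertools import product


def candidate_ids(earlier_hex: str, later_hex: str) -> list:
    """List ObjectID guesses lying between two observed IDs of one process."""
    if earlier_hex[8:18] != later_hex[8:18]:
        raise ValueError("machine/pid fields differ: not the same process")
    middle = earlier_hex[8:18]                                   # machine + pid
    t1, t2 = int(earlier_hex[:8], 16), int(later_hex[:8], 16)    # timestamps
    c1, c2 = int(earlier_hex[18:], 16), int(later_hex[18:], 16)  # counters
    # Every second in [t1, t2] combined with every counter strictly between.
    return [
        f"{t:08x}{middle}{c:06x}"
        for t, c in product(range(t1, t2 + 1), range(c1 + 1, c2))
    ]


if __name__ == "__main__":
    guesses = candidate_ids("56316dcbd75b48ce1c49f2b4", "56316f47d75b48ce1c49f2bd")
    print(len(guesses), guesses[0])
```

For the two IDs used here (380 seconds apart, with 8 counter values strictly in between), the search space is 381 × 8 = 3048 candidates, which is small enough to enumerate. That is why the bullet adds that guessing only matters when access permissions are not enforced.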

Sharing the information in the ObjectID will not compromise your security. Someone could infer minor details such as when the ObjectID was created (timestamp), but none of the ObjectID components should be tied to authentication or authorization.
If you are building an e-commerce site, SEO is typically a strong consideration for public URLs. In this case you normally want to use a friendlier URL with shorter and more semantic path components than an ObjectID.
Note that you do not have to use the default ObjectID for your _id field, so you could always generate something more relevant for your application. The default ObjectID does provide a reasonable guarantee of uniqueness, so if you implement your own _id allocation you will have to take this into consideration.
See also:
Create an Auto-Incrementing Sequence Field

As @Stennie said, not really.
Let's start with the pid. Most hackers wouldn't bother looking for a pid; on Linux, say, they would instead just run:
ps aux | grep mongod
or something similar. Of course this requires the hacker to have actually hacked your server; I know of no public hack available based on the pid alone. Considering the pid will change when you restart the machine or mongod, this information is utterly useless to anyone trying to spy.
The machine id is another bit of data that is quite useless publicly and, to be honest, they would get a better understanding of your network using ping or dig than they would through the machine id alone.
So to answer the question: No, there is no real security threat and the information you are displaying is of no use to anyone except MongoDB really.
I also agree with @Stennie on using SEO-friendly URLs. An example which I commonly use for e-commerce is /product/product_title_ with a smaller random id (maybe the base64-encoded _id) or an auto-incrementing id, with .html on the end.
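The base64 idea mentioned above might look like this in Python. It is only a sketch with hypothetical function names: a 12-byte ObjectID encodes to 16 URL-safe characters instead of 24 hex characters, and the encoding is trivially reversible, so it shortens the id but hides nothing.

```python
import base64


def objectid_to_slug(object_id_hex: str) -> str:
    """Encode a 24-character hex ObjectID as URL-safe base64."""
    return base64.urlsafe_b64encode(bytes.fromhex(object_id_hex)).decode("ascii")


def slug_to_objectid(slug: str) -> str:
    """Decode a slug produced by objectid_to_slug back to hex."""
    return base64.urlsafe_b64decode(slug.encode("ascii")).hex()


if __name__ == "__main__":
    slug = objectid_to_slug("5063a36bdeb13f7505000630")
    print(slug, slug_to_objectid(slug))
```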

Is it ok to turn the mongo ObjectId into a string and use it for URLs?

document/show?id=4cf8ce8a8aad6957ff00005b

Generally I think you should be cautious about exposing internals (such as DB ids) to the client. The URL can easily be manipulated, and the user may then gain access to objects you don't want them to have.
For MongoDB in particular, the object ID might even reveal some additional internals (see here), i.e. the ids aren't completely random. That might be an issue too.
Besides that, I think there's no reason not to use the id.

I generally agree with @MartinStettner's reply. I wanted to add a few points, mostly elaborating on what he said. Yes, a small amount of information is decodable from the ObjectId. This is trivially accessible if someone recognizes the value as a MongoDB ObjectID. The two downsides are:
It might allow someone to guess a different valid ObjectId, and request that object.
It might reveal info about the record (such as its creation date) or the server that you didn't want someone to have.
The "right" fix for the first item is to implement some sort of real access control: 1) a user has to login with a username and password, 2) the object is associated with that username, 3) the app only serves objects to a user that are associated with that username.
MongoDB doesn't do that itself; you'll have to rely on other means. Perhaps your web-app framework, and/or some ad-hoc access control list (which itself could be in MongoDB).
But here is a "quick fix" that mostly solves both problems: create some other "id" for the record, based on a large, high-quality random number.
How large does "large" need to be? A 128-bit random number has 3.4 * 10^38 possible values. So if you have 10,000,000 objects in your database, someone guessing a valid value is a vanishingly small probability: 1 in 3.4 * 10^31. Not good enough? Use a 256-bit random number... or higher!
How to represent this number in the document? You could use a string (encoding the number as hex or base64), or MongoDB's binary type. (Consult your driver's API docs to figure out how to create a binary object as part of a document.)
While you could add a new field to your document to hold this, you'd probably also want an index on it. So the document size is bigger, and you spend more memory on that index. Here's what you might not have thought of: simply USE that "truly random id" as your document's "_id" field. Thus the per-document size is only a little higher, and you use the index that you [probably] had there anyway.
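The random-id approach and the arithmetic above can be sketched with Python's standard secrets module. The helper names are hypothetical, the id is stored as a hex string for simplicity, and the document is a plain dict rather than a real driver call.

```python
import secrets


def new_random_id(num_bytes: int = 16) -> str:
    """Return a cryptographically strong random id (16 bytes = 128 bits) as hex."""
    return secrets.token_hex(num_bytes)


def one_in(num_bits: int, object_count: int) -> float:
    """Odds that a single guess hits any existing id: 1 in the returned value."""
    return 2 ** num_bits / object_count


if __name__ == "__main__":
    # Use the random value directly as the document's _id.
    document = {"_id": new_random_id(), "title": "example"}
    print(document)
    print(f"1 in {one_in(128, 10_000_000):.1e}")
```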

I can set both the 128-character session string and the object ids of other collection documents as cookies, and when the user visits, do an asynchronous fetch of the session, user and account all at once, instead of fetching the session first and the user and account afterwards. If the session document is valid, I'll share the user and account documents.
If I do this, every request for a user or account document must also include the 128-character session cookie, which makes exposing the user and account object ids safer: anyone guessing a user ID or account ID also has to guess the 128-character string to get any answer from the system.
Another security measure is to wrap the id in some salt (padding) whose positions only you know, such as
XXX4cf8ce8XXXXa8aad6957fXXXXXXXf00005bXXXX
Now you know exactly how to slice that up to get the ID.
