MongoDB ObjectId vs string in find() - mongodb

I'm starting to play with mongodb, and I learned that when inserting a document, you can either provide an ID, or let mongodb generate it for you.
I thought this is nice, because I want to let my users optionally choose an id, and if not generate it for them.
But the problem is, the generated one is of type ObjectId while the user provided one is a string, and the find method only returns the correct answer if you pass it with the correct type. So when a user requests GET /widget/123, I have no idea if the original ID was stored as an ObjectId or a string do I?
So how am I supposed to use this feature?

First off, I'd recommend against letting users provide _ids: if 2 users want to use the same _id, the second user will be unable to, which will be frustrating. If users want that functionality, I'd recommend storing the user created id on a separate field & querying by user (or company or whatever) and the user-created id.
That said, mongo ObjectIds are 24 hex characters, so you can safely identify when an id is not a MongoId by checking whether it doesn't match /^[a-f0-9]{24}$/ (or by seeing whether a call to ObjectId("maybeAnObjectId") throws). In the case where it's unclear (where a user might have provided 24 hex characters as their id), you'll need to use $in (or $or) to query for both cases:
const query = /^[a-f0-9]{24}$/.test(id) ? { _id: {$in: [ObjectId(id), id]}} : {_id: id}
(an annoying user could re-use an autogenerated ObjectId as their string id, and then queries to that route would return two values and there'd be no way of differentiating them).

Related

NoSQL/MongoDB: When do I need _id in nested objects?

I don't understand the purpose of the _id field when it refers to a simple nested object, eg it is not used as a root entity in a dedicated collection.
Let's say, we have a simple value-object like Money({currency: String, amount: Number }). Obviously, you are never going to store such a value object by itself in a dedicated collection. It will rather be attached to a Transaction or a Product, which will have their own collection.
But you still might query for specific currencies or amounts that are higher/lower than a given value. And you might programmatically (not on DB level) set the value of a transaction to the same as of a purchased product.
Would you need an _id in this case? What are the cases, where a nested object needs an _id?
Assuming the _id you are asking about is an ObjectId, the only purpose of such field is to uniquely identify an object. Either a document in a collection or a subdocument in an array.
From the description of your money case it doesn't seem necessary to have such _id field for your value object. A valid use case might be an array of otherwise non-unique/non-identifiable subdocuments e.g. log entries, encrypted data, etc.
ODMs might add such field by default to have a unified way to address subdocuments regardless of application needs.

Is there a way to convert Mongoose document ObjectId to string?

I've to filter the document keys for different users based on the user roles/permissions.
For example, 1 user can get secretCode while others not.
Converting the document to an object does convert the doc to a plain object but the ObjectIds are still objects (bson type).
I can try _id.toString() but I've multiple fields with the objectId. So it will be kind of hard-coding in all the fields. Plus, I'll also have to validate if the value is not null, else .toString() is not a function will pop up.
Another way is to use JSON.parse(JSON.stringify(obj)) but this synchronous and unnecessary computation.
Any suggestions?

mgo - bson.ObjectId vs string id

Using mgo, it seems that best practice is to set object ids to be bson.ObjectId.
This is not very convenient, as the result is that instead of a plain string id the id is stored as binary in the DB. Googling this seems to yield tons of questions like "how do I get a string out of the bson id?", and indeed in golang there is the Hex() method of the ObjectId to allow you to get the string.
The bson becomes even more annoying to work with when exporting data from mongo to another DB platform (this is the case when dealing with big data that is collected and you want to merge it with some properties from the back office mongo DB), this means a lot of pain (you need to transform the binary ObjectId to a string in order to join with the id in different platforms that do not use bson representation).
My question is: what are the benefits of using bson.ObjectId vs string id? Will I lose anything significant if I store my mongo entities with a plain string id?
As was already mentioned in the comments, storing the ObjectId as a hex string would double the space needed for it and in case you want to extract one of its values, you'd first need to construct an ObjectId from that string.
But you have a misconception. There is absolutely no need to use an ObjectId for the mandatory _id field. Quite often, I advice against that. Here is why.
Take the simple example of a book, relations and some other considerations set aside for simplicty:
{
_id: ObjectId("56b0d36c23da2af0363abe37"),
isbn: "978-3453056657",
title: "Neuromancer",
author: "William Gibson",
language: "German"
}
Now, what use would have the ObjectId here? Actually none. It would be an index with hardly any use, since you would never search your book databases by an artificial key like that. It holds no semantic value. It would be a unique ID for an object which already has a globally unique ID – the ISBN.
So we simplify our book document like this:
{
_id: "978-3453056657",
title: "Neuromancer",
author: "William Gibson",
language: "German"
}
We have reduced the size of the document, make use of a preexisting globally unique ID and do not have a basically unused index.
Back to your basic question wether you loose something by not using ObjectIds: Quite often, not using the ObjectId is the better choice. But if you use it, use the binary form.

Mongoid: retrieving documents whose _id exists in another collection

I am trying to fetch the documents from a collection based on the existence of a reference to these documents in another collection.
Let's say I have two collections Users and Courses and the models look like this:
User: {_id, name}
Course: {_id, name, user_id}
Note: this just a hypothetical example and not actual use case. So let's assume that duplicates are fine in the name field of Course. Let's thin Course as CourseRegistrations.
Here, I am maintaining a reference to User in the Course with the user_id holding the _Id of User. And note that its stored as a string.
Now I want to retrieve all users who are registered to a particular set of courses.
I know that it can be done with two queries. That is first run a query and get the users_id field from the Course collection for the set of courses. Then query the User collection by using $in and the user ids retrieved in the previous query. But this may not be good if the number of documents are in tens of thousands or more.
Is there a better way to do this in just one query?
What you are saying is a typical sql join. But thats not possible in mongodb. As you suggested already you can do that in 2 different queries.
There is one more way to handle it. Its not exactly a solution, but the valid workaround in NonSql databases. That is to store most frequently accessed fields inside the same collection.
You can store the some of the user collection fields, inside the course collection as embedded field.
Course : {
_id : 'xx',
name: 'yy'
user:{
fname : 'r',
lname :'v',
pic: 's'
}
}
This is a good approach if the subset of fields you intend to retrieve from user collection is less. You might be wondering the redundant user data stored in course collection, but that's exactly what makes mongodb powerful. Its a one time insert but your queries will be lot faster.

Confusion regarding Mongo db Schema. How to make it better?

I am using mongoose with node.js for this.
My current Schema is this:
var linkSchema = new Schema({
text: String,
tags: array,
body: String,
user: String
})
My use-case is this: There are a list of users and each user has a list of links associated with it. Users and links are different Schemas of course. Thus, how does one get that sort of one to one relationship done using mongo-db.
Should I make a User Schema and embed linkSchema in it? Or the other way around?
Another doubt regarding that. Tags would always be an array of strings which I can use to browse through links later. Should it be an array data type or is there a better way to represent it?
If it's 1:1 then nest one document inside the other. Which way around depends on the queries, but you could easily do both if you need to.
For tags, you can index an array field and use that for searching/filtering documents and from the information you've given that sounds reasonable IMHO.
If you had a fixed set of tags it would make sense to represent those as a nested object with named fields perhaps, depending on queries. Don't forget you not only can create nested documents in Mongo but you can also search on sub-fields and even use entire nested documents as searchable/indexable fields. For instance, you could have a username like this;
email: "joe#somewhere.com"
as a string, and you could also do;
email: {
user: "joe",
domain: "somewhere.com"
}
you could index email in both cases and use either for matching. In the latter case though you could also search on domain or user only without resorting to RegEx style queries. You could also store both variants, so there's lots of flexibile options in Mongo.
Going back to tags, I think your array of strings is a fine model given what you've described, but if you were doing more complex bulk aggregation, it wouldn't be crazy to store a document for every tag with the same document contents, since that's essentially what you'd have to do for every query during aggregation.