The MongoDB docs show an example of a one-to-many relationship...
(Abbreviated...)
Model One-to-Many Relationships with Document References
// Publisher.
{
_id: "oreilly",
name: "O'Reilly Media",
}
// Book.
{
_id: 123456789,
title: "MongoDB: The Definitive Guide",
publisher_id: "oreilly"
}
// Book.
{
_id: 234567890,
title: "50 Tips and Tricks for MongoDB Developer",
publisher_id: "oreilly"
}
Why is the publisher _id a logical, human-readable name while the book _ids appear to be generated surrogate keys?
Wouldn't all _ids be generated values?
Is it conventional in MongoDB to sometimes use the data itself as a unique key and sometimes not?
If so, when do we use ordinary names ("mary", "joe", "exxon"), and when do we prefer generated values?
Wouldn't all _ids be generated values?
MongoDB auto-generates _id only when it is not provided by the user.
Is it conventional in MongoDB to sometimes use the data itself as a
unique key and sometimes not?
Yes, the data can be used as the key. The _id value has to be unique within a collection so that it correctly identifies a document. Any field in the document can be used as the _id if it satisfies that criterion. When an _id is not unique in a collection, a duplicate key error is thrown.
If so, when do we use ordinary names ("mary", "joe", "exxon"), and
when do we prefer generated values?
We prefer a generated _id when there is no field (or group of fields) that uniquely identifies the document. For example, using a person's name will not work, because you may later need to add another person with the same name. A book's ISBN, however, uniquely identifies a book. Such fields can be used as the _id.
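To make the uniqueness rule concrete, here is a minimal sketch in plain JavaScript. ToyCollection is a hypothetical in-memory stand-in, not the MongoDB driver API, and the ISBN value and sample documents are made up. It only mimics the behaviour described above: a natural key such as an ISBN inserts cleanly, while a second person named "mary" is rejected.

```javascript
// Toy in-memory stand-in for a collection with a unique _id index.
// ToyCollection is hypothetical; it is not part of any MongoDB driver.
class ToyCollection {
  constructor() {
    this.docs = new Map(); // maps _id -> document
  }

  insertOne(doc) {
    if (this.docs.has(doc._id)) {
      // MongoDB reports this situation as an E11000 duplicate key error.
      const err = new Error("E11000 duplicate key error (simulated)");
      err.code = 11000;
      throw err;
    }
    this.docs.set(doc._id, doc);
    return { insertedId: doc._id };
  }
}

const isbnBooks = new ToyCollection();

// An ISBN identifies exactly one book, so it works as an _id (made-up value).
isbnBooks.insertOne({ _id: "978-0-00-000000-2", title: "First book" });

// A person's name does not: the second "mary" collides with the first.
const people = new ToyCollection();
people.insertOne({ _id: "mary", city: "Springfield" });

let duplicateCode = null;
try {
  people.insertOne({ _id: "mary", city: "Shelbyville" });
} catch (err) {
  duplicateCode = err.code;
}
console.log(duplicateCode); // 11000
```

With a real collection the same collision would surface as an error from the insert call, so a name-like _id only makes sense when the domain guarantees it never repeats.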
Additional Notes:
The auto-generated _id has an embedded timestamp value (which may be of use).
No need to worry about duplicate values with an auto-generated _id.
However, a user-specified _id value may be more application/domain specific.
Reading the mongo docs for modeling many-to-many relationships, I see they are using simple strings for the _id
{
_id: "oreilly",
name: "O'Reilly Media",
founded: 1980,
location: "CA"
}
{
_id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher_id: "oreilly"
}
where I imagined it was beneficial to use actual ObjectId values as per this question: Difference between storing an ObjectId and its string form, in MongoDB
I’m finding it complex to have to cast between the object and string as this data passes back and forth between the Frontend (Vue) and backend (NestJS/Node) as JSON and am wondering if there is any real necessity to concern myself with utilizing ObjectId as it adds a fair bit of complexity.
Does storing the reference id as a string make any difference when performing an aggregation/$graphLookup? Am I meant to actually store this reference itself as an ObjectId or is that wholly unnecessary?
The only requirement for the values of _id is that they be unique as the _id is always indexed automatically by MongoDB and that index is unique.
The purpose of ObjectIds is to allow the client to generate an ID that is guaranteed to be unique across a broad range of clients that are writing to the same collection. If you have a better unique ID you are encouraged to use it, as it saves you an index. You do not need to cast that value into an ObjectId. It can be used in the clear, as can other types (e.g. integers, decimals, etc.).
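As a rough illustration of why a plain string reference is fine, here is a sketch in plain JavaScript (no driver involved). The helper attachPublisher is hypothetical and only loosely mirrors what an equality-based join such as $lookup does: the reference resolves as long as it is stored with the same type and value as the _id it points to. The last two documents are made up to show a type mismatch.

```javascript
// Sample data modeled on the documents in the question.
const publisherDocs = [{ _id: "oreilly", name: "O'Reilly Media" }];
const bookDocs = [
  { _id: 123456789, title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" },
  { _id: 234567890, title: "50 Tips and Tricks for MongoDB Developer", publisher_id: "oreilly" },
];

// Hypothetical helper: attach the referenced publisher to each book.
function attachPublisher(books, publishers) {
  const byId = new Map(publishers.map((p) => [p._id, p]));
  return books.map((b) => ({ ...b, publisher: byId.get(b.publisher_id) || null }));
}

const joined = attachPublisher(bookDocs, publisherDocs);
console.log(joined.map((b) => b.publisher.name));

// A reference stored with a different type (number vs string) does not match.
const mismatched = attachPublisher(
  [{ _id: 1, title: "Untitled", publisher_id: 42 }],
  [{ _id: "42", name: "String-keyed publisher" }]
);
console.log(mismatched[0].publisher); // null
```

So the thing to keep consistent is the type on both sides of the reference, whichever type you pick.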
Is it somehow possible to define one compound key consisting of two MongoDB ObjectIds or numeric types, so as to make one key out of them?
This is necessary because I have lots of participants creating documents which they save into one big collection together, so I cannot be sure that the MongoDB ObjectId for each document is distinct. So I wanted to add some additional key, maybe a user ID number or email or something similar...
maybe 2 ObjectIds
An ObjectId in MongoDB is a 12-byte value, usually written as 24 hexadecimal characters.
ObjectId() Returns a new ObjectId value. The 12-byte
ObjectId value consists of:
4-byte value representing the seconds since the Unix epoch,
3-byte machine identifier,
2-byte process id, and
3-byte counter, starting with a random value.
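https://docs.mongodb.com/manual/reference/method/ObjectId/
(Newer versions of that page describe the five bytes after the timestamp as a single random value generated once per process, rather than a machine identifier and a process id.)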
Hence, the ObjectId will be uniquely auto-generated when you insert a document.
However, you can also supply your own combination of 24 hexadecimal characters when you insert a document.
For example,
1DCD6500 -- this can be custom hex identifier
A98AC7 -- another custom hex identifier
2B67 -- another custom hex identifier
A981CE -- Incremental custom hex identifier
Now if you insert a document with the _id 1DCD6500A98AC72B67A981CE, the document will be saved.
e.g. { "_id" : ObjectId("1DCD6500A98AC72B67A981CE"), "name" : "sample", "personid" : 39 }
So, based on the definition of the ObjectId, you can make a custom ObjectId.
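A small sketch of that assembly step in plain JavaScript, using the four example parts above. The helper name buildCustomIdHex is hypothetical; it only joins the parts and checks that the result is 24 hexadecimal characters, which is the length an ObjectId string must have. Passing the result to ObjectId(...) is left out so that the snippet runs without a driver.

```javascript
// Hypothetical helper: join hex parts and verify the result is 24 hex characters.
function buildCustomIdHex(parts) {
  const hex = parts.join("");
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected exactly 24 hexadecimal characters, got: " + hex);
  }
  return hex;
}

// The four custom identifiers from the example: 8 + 6 + 4 + 6 = 24 characters.
const customIdHex = buildCustomIdHex(["1DCD6500", "A98AC7", "2B67", "A981CE"]);
console.log(customIdHex); // 1DCD6500A98AC72B67A981CE

// A part with the wrong length is rejected before it ever reaches the database.
let rejected = false;
try {
  buildCustomIdHex(["1DCD6500", "A98AC7", "2B67", "A981"]);
} catch (err) {
  rejected = true;
}
console.log(rejected); // true
```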
But in that case you are responsible for making sure the ObjectId is unique; otherwise MongoDB will throw an error:
"E11000 duplicate key error collection:
You can use anything for your _id field. So this is possible:
db.collection.insertOne({
_id: {
"first": new ObjectId(),
"second": new ObjectId(),
}
})
The default unique index on the _id field also guarantees uniqueness on this kind of field.
However, I would doubt that this is a good solution to your problem as it would probably just defer the underlying problem (which really doesn't exist - kindly see this answer, too: How to generate unique object id in mongodb). Instead, I would suggest you have your clients create documents without specifying an _id explicitly and let MongoDB create the _id (on the server side or on the client side depending on your driver and your settings where client-side generation should be preferred). This will guarantee uniqueness (even when you do sharding).
There always is a unique index on your _id field anyway so to be on the super safe side with respect to run-time behaviour you could put a retrying exception handler in place on the client side for the (pretty much impossible) case that you end up with two identical _ids and hence an exception.
Also see this answer: Mongodb - must _id be globally unique when sharding
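The retrying exception handler mentioned above could look roughly like this. It is a sketch under assumptions: insertWithRetry, makeDoc and fakeInsert are hypothetical names, and it assumes the error object exposes the server's duplicate key code as err.code === 11000 (the Node.js driver does this). fakeInsert stands in for a real insert call so that the snippet runs on its own.

```javascript
// Hypothetical helper: call insertFn with a fresh document and retry when the
// error carries code 11000, the code MongoDB uses for duplicate key errors.
async function insertWithRetry(insertFn, makeDoc, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await insertFn(makeDoc());
    } catch (err) {
      const isDuplicate = err && err.code === 11000;
      if (!isDuplicate || attempt === maxAttempts) throw err;
      // Otherwise loop again; makeDoc() builds a new document for the retry.
    }
  }
}

// Fake insert used only for this demonstration: the first call fails with a
// simulated duplicate key error, later calls succeed.
let calls = 0;
async function fakeInsert(doc) {
  calls += 1;
  if (calls === 1) {
    const err = new Error("E11000 duplicate key error (simulated)");
    err.code = 11000;
    throw err;
  }
  return { acknowledged: true, doc };
}

const pending = insertWithRetry(fakeInsert, () => ({ name: "sample" }));
pending.then((result) => console.log(calls, result.acknowledged)); // 2 true
```

makeDoc is a function rather than a single object on purpose: a driver may add the generated _id to the object you pass in, so retrying with the same object could resend the same _id.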
I'm developing an application that creates permalinks. I'm not sure how to save the documents in MongoDB. Two strategies:
ObjectId autogeneration
MongoDB autogenerates the _id. I need to create an index on the permalink field because I retrieve the information by the permalink. I can also get the creation time from the ObjectId using the getTimestamp() method, so the datetime field seems to be redundant, but if I delete this field I need two calls to MongoDB: one to fetch the information and another to get the timestamp.
{
"_id": ObjectId("5210a64f846cb004b5000001"),
"permalink": "ca8W7mc0ZUx43bxTuSGN",
"data": "a lot of stuff",
"datetime": ISODate("2013-08-18T11:47:43.460+-100")
}
Generate _id
I generate the _id with the permalink.
{
"_id": "ca8W7mc0ZUx43bxTuSGN",
"data": "a lot of stuff",
"datetime": ISODate("2013-08-18T11:47:43.460+-100")
}
I don't see any advantage in using ObjectIds. Am I missing something?
ObjectIds are there for situations where you don't have a unique key for every document in a collection. They're unique, so you don't have to worry about conflicts, and they shard reasonably well in large deployments without too much worry (they have their pros and cons, read more here).
The ObjectId also contains the timestamp of the client where the ObjectId was generated (unless the DB server is configured to generate all keys). With that, as you noticed, you can use the time stamp to perform some date operations. However, if you plan on using the Aggregation Framework, you'll find that you can't use an ObjectId in any date operations currently (issue). If you want to use the AF, you'll need a second field that contains the date, unfortunately doubly storing it with the ObjectId's internal value.
If you can be assured that the _id you're generating is unique, then there's not much reason to use an ObjectId in your data structure.
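For what it's worth, the timestamp can also be recovered on the client from the hex string alone, without a second call. A minimal sketch in plain JavaScript follows; objectIdHexToDate is a hypothetical helper, and it relies only on the documented layout in which the first 4 bytes (8 hex characters) hold the seconds since the Unix epoch.

```javascript
// Hypothetical helper: read the creation time out of an ObjectId's hex string.
function objectIdHexToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  // The first 8 hex characters are the seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// The ObjectId from the question above.
const created = objectIdHexToDate("5210a64f846cb004b5000001");
console.log(created.toISOString()); // 2013-08-18T10:47:43.000Z
```

The resolution is whole seconds, so a separate datetime field is still needed if you want milliseconds or want to use the value in server-side date operations.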
I wish to add an _id as property for objects in a mongo array.
Is this good practice ?
Are there any problems with indexing ?
I wish to add an _id as property for objects in a mongo array.
I assume:
{
g: [
{ _id: ObjectId(), property: '' },
// next
]
}
is the type of structure you are asking about.
Is this good practice ?
Not normally. _ids are unique identifiers for entities. As such if you are looking to add _id within a sub-document object then you might not have normalised your data very well and it could be a sign of a fundamental flaw within your schema design.
Sub-documents are designed to contain repeating data for that document, e.g. the addresses of a user or something.
That being said _id is not always a bad thing to add. Take the example I just stated with addresses. Imagine you were to have a shopping cart system and (for some reason) you didn't replicate the address to the order document then you would use an _id or some other identifier to get that sub-document out.
Also you have to take into consideration linking documents. If that _id describes another document and the properties are custom attributes for that document in relation to that linked document then that's okay too.
Are there any problems with indexing ?
An ObjectId is still quite sizeable so that is something to take into consideration over a smaller, less unique id or not using an _id at all for sub-documents.
For indexes it doesn't really work any differently from the standard _id field on the document itself, and a unique index across the field should work across the collection (scenario dependent, test your queries).
NB: MongoDB will not add an _id to sub-documents for you.
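The address case can be sketched in plain JavaScript like this. All names and values are made up for illustration: the order stores only an address_id, and that id is used to pick the matching sub-document out of the user's addresses array. Since MongoDB does not add the _id for you, the application assigns it (small integers here).

```javascript
// Made-up user with application-assigned _id values on each address.
const userDoc = {
  _id: "mary",
  addresses: [
    { _id: 1, city: "Springfield" },
    { _id: 2, city: "Shelbyville" },
  ],
};

// Made-up order that references an address instead of copying it.
const orderDoc = { _id: 1001, user_id: "mary", address_id: 2 };

// Hypothetical helper: resolve the referenced sub-document, or null if absent.
function findAddress(user, addressId) {
  return user.addresses.find((a) => a._id === addressId) || null;
}

const shippingAddress = findAddress(userDoc, orderDoc.address_id);
console.log(shippingAddress.city); // Shelbyville
```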
I am reading MongoDB in Action and when talking about querying many-to-many relationships in a Document, I'm having difficulty understanding how he wrote his example query (using the Ruby driver).
The query finds all products in a specific category, where there is a products collection and a category collection. The author says "To query for all products in the Gardening Tool category, the code is simple":
db.products.find({category_ids => category['id']})
A PRODUCT doc is like this:
doc =
{ _id: new ObjectId("4c4b1476238d3b4dd5003981"),
slug: "wheel-barrow-9092",
sku: "9092",
name: "Extra Large Wheel Barrow",
description: "Heavy duty wheel barrow...",
details: {
weight: 47,
weight_units: "lbs",
model_num: 4039283402,
manufacturer: "Acme",
color: "Green"
},
category_ids: [new ObjectId("6a5b1476238d3b4dd5000048"),
new ObjectId("6a5b1476238d3b4dd5000049")],
main_cat_id: new ObjectId("6a5b1476238d3b4dd5000048"),
tags: ["tools", "gardening", "soil"],
}
And a CATEGORY doc is like this:
doc =
{ _id: new ObjectId("6a5b1476238d3b4dd5000048"),
slug: "gardening-tools",
ancestors: [{ name: "Home",
_id: new ObjectId("8b87fb1476238d3b4dd500003"),
slug: "home"
},
{ name: "Outdoors",
_id: new ObjectId("9a9fb1476238d3b4dd5000001"),
slug: "outdoors"
}
],
parent_id: new ObjectId("9a9fb1476238d3b4dd5000001"),
name: "Gardening Tools",
description: "Gardening gadgets galore!",
}
Can someone please explain it a little more to me? I still can't understand how he wrote that query :(
Thanks all.
The query is searching the products collection for all products with a value of category['id'] in the field category_ids
When you search a field that contains an array for a specific value, MongoDB automatically enumerates each value in that array searching for matches.
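In plain JavaScript terms, that matching rule behaves roughly like the sketch below. It is only an analogy, not how the server is implemented: fieldMatches is a hypothetical helper, plain strings stand in for ObjectId values, and the second product and its category id are made up.

```javascript
// Plain strings stand in for ObjectId values to keep the sketch self-contained.
const GARDENING_TOOLS = "6a5b1476238d3b4dd5000048";

const productDocs = [
  {
    name: "Extra Large Wheel Barrow",
    category_ids: ["6a5b1476238d3b4dd5000048", "6a5b1476238d3b4dd5000049"],
  },
  // Made-up product in a made-up category, so there is something to filter out.
  { name: "Hypothetical Desk Lamp", category_ids: ["6a5b1476238d3b4dd5000050"] },
];

// Hypothetical helper: an array field matches if any element equals the value;
// a non-array field matches if it equals the value itself.
function fieldMatches(fieldValue, wanted) {
  return Array.isArray(fieldValue) ? fieldValue.includes(wanted) : fieldValue === wanted;
}

const inCategory = productDocs.filter((p) => fieldMatches(p.category_ids, GARDENING_TOOLS));
console.log(inCategory.map((p) => p.name)); // [ 'Extra Large Wheel Barrow' ]
```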
To construct the query, you must first notice that the category collection defines your category hierarchy, and that each category has a unique ID (stored, as is usual in MongoDB, in the _id field)
You must also notice that the product collection has a field that stores a list of category ids, category_ids, that reference the unique ids of the category collection.
Therefore, to find all products in a particular category, you search the category_ids field of the product collection for the unique ID of the category you're interested in, which you get from the category collection.
If I were to write a query for the MongoDB JavaScript-based shell interpreter, mongo, that finds products in the Gardening Tools category, I would do the following:
Look up the ID of the Gardening Tools category (which, as noted before, is stored in the _id field of the category collection)
In this case, the value in your example is ObjectId("6a5b1476238d3b4dd5000048")
Insert the value into a query that searches through the category_ids field of the product collection
This is the query that you give in your question, which for the mongo shell I would write as: db.products.find({category_ids : new ObjectId("6a5b1476238d3b4dd5000048")})
I hope that's clearer than the original explanation!
(As an aside: I'm not quite sure what language your query is written in, is it perhaps PHP? In any case, javascript seems to be the language of choice for examples in the MongoDB docs because the MongoDB server installs the mongo command line interpreter alongside the server itself, so everyone has access to it)