MongoDB schema design: reference by ID vs. reference by name?

Take this simple example (the ObjectIds are shortened to make it easier to read).
Tag documents:
{
_id: ObjectId('0001'),
name: 'JavaScript',
// other data
},
{
_id: ObjectId('0002'),
name: 'MongoDB',
// other data
},
...
Assume that we need a separate tag collection, e.g. because we need to store some information about each tag.
If reference by ID:
// a book document
{
_id: ObjectId('9876'),
title: 'MEAN Web Development',
tags: [ObjectId('0001'), ObjectId('0002'), ...]
}
If reference by name:
{
_id: ObjectId('9876'),
title: 'MEAN Web Development',
tags: ['JavaScript', 'MongoDB', ...]
}
I know that "reference by ID" works.
I'm thinking that with "reference by name", a query for a book's info only needs to read the book collection: we get the tags' names without a join ($lookup) operation, which should be faster.
If the app validates the tags before creating or modifying a book, this should also be feasible, and faster.
I'm still not quite sure:
Is there any hidden drawback to "reference by name"?
Will "reference by name" be slower for "finding all books with a given tag"? Maybe ObjectId is somehow special?
Thanks.

I would say it depends on what your use case is for tags. As you say, it will be more expensive to do a $lookup to retrieve tag names if you reference by id. On the other hand, if you expect that tag names may change frequently, all documents in the book collection containing that tag will need to be updated on every change.
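To make the rename trade-off concrete, here is a minimal in-memory sketch in plain JavaScript. Arrays stand in for the two collections, and the helper names (renameTagByName, renameTagById) and the second book are made up for illustration. It only counts how many documents each design has to touch when a tag is renamed; it does not talk to a real database.

```javascript
// Plain arrays stand in for the tag and book collections (illustration only).
const tags = [
  { _id: 1, name: 'JavaScript' },
  { _id: 2, name: 'MongoDB' },
];
const booksByName = [
  { title: 'MEAN Web Development', tags: ['JavaScript', 'MongoDB'] },
  { title: 'MongoDB Basics', tags: ['MongoDB'] }, // hypothetical second book
];

// Reference by name: every book carrying the tag must be rewritten.
function renameTagByName(books, oldName, newName) {
  let touched = 0;
  for (const book of books) {
    if (book.tags.includes(oldName)) {
      book.tags = book.tags.map(t => (t === oldName ? newName : t));
      touched += 1;
    }
  }
  return touched;
}

// Reference by ID: only the tag document itself changes.
function renameTagById(tagDocs, id, newName) {
  const tag = tagDocs.find(t => t._id === id);
  if (!tag) return 0;
  tag.name = newName;
  return 1;
}

const touchedByName = renameTagByName(booksByName, 'MongoDB', 'Mongo');
const touchedById = renameTagById(tags, 2, 'Mongo');
console.log(touchedByName, touchedById);
```

With thousands of books per tag, the first number grows with the number of tagged books while the second stays at one.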
The ObjectID is simply a 12-byte value, which is autogenerated by the driver if no _id is present in an inserted document. See the MongoDB docs for more info. The only "special behavior" would be the fact that _id has an index by default. An index will speed up lookups in general, but indexes can be created on any field, not just _id.
In fact, the _id does not need to be an ObjectID. It is perfectly legal to have documents with integer _id values for instance:
{
_id: 1,
name: 'Javascript'
},
{
_id: 2,
name: 'MongoDB'
},
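On the "finding all books with a given tag" part: an index on the tags array helps in the same way whether the array holds names or ids. As a rough mental model only (MongoDB's real indexes are B-trees, not JavaScript Maps, and the function name below is made up), an index maps each tag value to the books that contain it:

```javascript
// Rough mental model of an index on `tags`; not MongoDB's actual implementation.
function buildTagIndex(books) {
  const index = new Map();
  for (const book of books) {
    for (const tag of book.tags) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(book._id);
    }
  }
  return index;
}

// Hypothetical sample data; the tag values could just as well be ids.
const sampleBooks = [
  { _id: 'b1', tags: ['JavaScript', 'MongoDB'] },
  { _id: 'b2', tags: ['MongoDB'] },
];
const tagIndex = buildTagIndex(sampleBooks);
console.log(tagIndex.get('MongoDB'));
```

In the shell, the real index would be created with something like db.books.createIndex({ tags: 1 }), assuming the collection is called books.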

Related

Storing enum to MongoDb (for managing tag names)

If we have a collection of books, we can assign tags of authors into an array as follows:
Books collection
{
...,
"authors" : ["John Michaels", "Bill Williams"]
}
This can cause problems if an author's name changes.
Instead, I was thinking of assigning an integer value to each author and creating a 'tags' collection:
Tags collection
{
"tags" : [
{"John Michaels" : 0},
{"Jane Collins" : 1},
{"Bill Williams" : 2}
]
}
Here is my books collection, where we specify that 'John Michaels' and 'Bill Williams' are the authors:
{
...,
"authors" : [0, 2]
}
If I ever needed to change the author’s name ‘Bill Williams’ to ‘Bill H. Williams’, there would be no problem because the value stored in the books collection remains unchanged.
My question is whether MongoDB has something like enums that automatically increment the integer value, or whether there is something else built into MongoDB to help with this type of situation.
Thank you
This is a typical use case for referencing other collections. So, you should have 2 collections:
Authors collection:
{
_id: ObjectId,
name: String,
... // Other fields
}
Books collection:
{
_id: ObjectId,
authors: [ ObjectId ], // References to documents from Author collection
... // Other fields
}
So, in the authors property of the Books collection, you store the _id values of all the authors. Then, when you fetch a book document, you can easily fetch up-to-date author data from the Authors collection.
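As a minimal sketch of that second step, here is the id-to-author resolution done in memory with plain JavaScript. The arrays stand in for the collections, the ids are plain numbers rather than ObjectIds, and the book title and the resolveAuthors helper are made up for illustration.

```javascript
// In-memory stand-ins for the two collections; ids are plain numbers here.
const authors = [
  { _id: 0, name: 'John Michaels' },
  { _id: 1, name: 'Jane Collins' },
  { _id: 2, name: 'Bill Williams' },
];
const book = { _id: 100, title: 'Some Book', authors: [0, 2] };

// Resolve the stored ids to the current author documents.
function resolveAuthors(bookDoc, authorDocs) {
  const byId = new Map(authorDocs.map(a => [a._id, a]));
  return bookDoc.authors.map(id => byId.get(id)).filter(Boolean);
}

// Renaming touches one author document; the book is left alone.
authors[2].name = 'Bill H. Williams';
const names = resolveAuthors(book, authors).map(a => a.name);
console.log(names);
```

Against a real database, the lookup would be a second query along the lines of db.authors.find({ _id: { $in: book.authors } }) (collection name assumed), or a $lookup stage in an aggregation.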

Create unique indexes for document's objects stored in an array

How does one create unique indexes for a document's objects stored in an array?
{
_id: 'documentId',
books: [
{
unique_id: 1,
title: 'Asd',
},
{
unique_id: 2,
title: 'Wsad',
}
...
]
}
One thing I can think of is autoincrementing. Or is there any mongo way to do so?
If you remove the _id field from your doc, mongo will automatically add one for you (an ObjectId), which:
is guaranteed to be unique within the collection (_id is uniquely indexed)
contains the timestamp of its creation
has lots of other features.
See here: https://docs.mongodb.com/v3.2/reference/method/ObjectId/
Looking at the example object again, are you referring to the ids in the books array?
If so, you can assign them with ObjectIds as well, just like in the document root's _id field:
doc.books.forEach(x => { x.unique_id = new ObjectId() } );
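One caveat: as far as I know, a unique index on books.unique_id constrains values across different documents, not within a single document's array, so uniqueness inside one array has to be checked by the application. A minimal sketch of such a check (the findDuplicateIds helper and the third array entry are made up for illustration):

```javascript
// Returns the unique_id values that occur more than once in doc.books.
function findDuplicateIds(doc) {
  const seen = new Set();
  const dupes = new Set();
  for (const b of doc.books) {
    if (seen.has(b.unique_id)) dupes.add(b.unique_id);
    seen.add(b.unique_id);
  }
  return [...dupes];
}

const doc = {
  _id: 'documentId',
  books: [
    { unique_id: 1, title: 'Asd' },
    { unique_id: 2, title: 'Wsad' },
    { unique_id: 1, title: 'Copy' }, // deliberate duplicate for the demo
  ],
};
console.log(findDuplicateIds(doc));
```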

Is it better to save id of a document in another document as ObjectId or String

Let's take a simple "bad" example: let's assume I have 2 collections, 'person' and 'address', and that in 'address' I want to store the '_id' of the person the address is associated with. Is there any benefit to storing this "referential key" as an ObjectId vs. a string in the 'address' collection?
I feel like storing them as strings should not hurt, but I have not worked with mongo for very long and do not know if it will hurt down the road if I follow this pattern.
I read the post here: Store _Id as object or string in MongoDB?
It says that ObjectId is faster, and I assume that's true if you are fetching/updating using the ObjectId in the parent collection (e.g. fetching/updating the 'person' collection using person._id as an ObjectId), but I couldn't find anything that suggests the same is true when searching by the string representation of the id in another collection (in our example, searching the address collection by person._id as a string).
Your feedback is much appreciated.
Regardless of performance, you should store the "referential key" in the same format as the _id field that you are referring to. That means that if your referred document is:
{ _id: ObjectID("68746287..."), value: 'foo' }
then you'd refer to it as:
{ _id: ObjectID(…parent document id…), subDoc: ObjectID("68746287...") }
If the document that you're pointing to has a string as an ID, then it'd look like:
{ _id: "derick-address-1", value: 'foo' }
then you'd refer to it as:
{ _id: ObjectID(…parent document id…), subDoc: "derick-address-1" }
Besides that, because you're talking about persons and addresses, it might make more sense to not have them in two documents altogether, but instead embed the document:
{ _id: ObjectID(…parent document id…),
'name' : 'Derick',
'addresses' : [
{ 'type' : 'Home', 'street' : 'Victoria Road' },
{ 'type' : 'Work', 'street' : 'King William Street' },
]
}
As for using a string as the id of a document: in a Meteor collection, you can generate the document id either with Random.id() as a string or with Meteor.Collection.ObjectID() as an ObjectId.
In this discussion thread, "Mongodb string id vs ObjectId", there is a good summary:
ObjectId Pros
it has an embedded timestamp in it.
it's the default Mongo _id type; ubiquitous
interoperability with other apps and drivers
ObjectId Cons
it's an object, and a little more difficult to manipulate in practice.
there will be times when you forget to wrap your string in new ObjectId()
it requires server side object creation to maintain _id uniqueness
- which makes generating them client-side by minimongo problematic
String Pros
developers can create domain specific _id topologies
String Cons
developer has to ensure uniqueness of _ids
findAndModify() and getNextSequence() queries may be invalidated
All of the information above is based on the Meteor framework. For MongoDB itself, it is better to use ObjectId; the reasons are in the question linked in your question.
Storing it as an ObjectId is beneficial. It is faster, as an ObjectId is 12 bytes, compared to its string form, which is 24 hex characters and so takes at least 24 bytes.
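The 12-versus-24 figure, and the embedded timestamp mentioned in the pros list above, can both be read straight off the hex form. A small sketch in plain JavaScript; the hex value is just an example id picked for illustration, not one from this question:

```javascript
// Example 24-character hex string of an ObjectId (value chosen for illustration).
const hex = '507f1f77bcf86cd799439011';

// Two hex characters encode one byte, so 24 characters are 12 bytes.
const byteLength = hex.length / 2;

// The first 4 bytes (8 hex characters) hold the creation time
// in seconds since the Unix epoch.
const seconds = parseInt(hex.slice(0, 8), 16);
const createdAt = new Date(seconds * 1000);

console.log(byteLength, seconds, createdAt.toISOString());
```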
Also, you should try to denormalize your collections so that you don't need two collections (the opposite of the RDBMS approach).
Something like this might be better in general:
{ _id : "1",
person : {
Name : "abc",
age: 20
},
address : {
street : "1st main",
city: "Bangalore",
country: "India"
}
}
But again, it depends on your use case. This might not be suitable sometimes.
Hope that helps! :)

How to store related records in mongodb?

I have a number of associated records such as below.
Parent records
{
_id:234,
title: "title1",
name: "name1",
association:"assoc1"
},
Child record
{
_id:21,
title: "child title",
name: "child name1",
},
I want to store such records into MongoDb. Can anyone help?
Regards.
Although MongoDB doesn't support joins the way a relational database does, you can organize data in several different ways:
1) First of all, you can inline (or embed) related documents. This is useful if you have some hierarchy of documents, e.g. a post and its comments. In this case you can do it like so:
{
_id: <post_id>,
title: 'asdf',
text: 'asdf asdf',
comments: [
{<comment #1>},
{<comment #2>},
...
]
}
In this case, all related data will be in the same document. You can fetch it with one query, but pushing new comments to a post can cause the document to be moved on disk, and frequent updates will increase disk load and space usage.
2) Referencing is another technique you can use: in each document, you can put a special field that contains the _id of the parent/related object:
{
_id: 1,
type: 'post',
title: 'asdf',
text: 'asdf asdf'
},
{
_id: 2,
type: 'comment',
text: 'yep!',
parent_id: 1
}
In this case you store posts and comments in the same collection, therefore you have to store an additional field, type. MongoDB doesn't support constraints or any other way to check data consistency. This means that if you delete the post with _id=1, the comment with _id=2 keeps a broken link in parent_id.
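Since nothing enforces the link, the application has to look for broken references itself. A minimal in-memory sketch of that check, reusing the two documents above (the array stands in for the collection, and findOrphanedComments is a made-up helper):

```javascript
// One array stands in for the shared posts-and-comments collection.
const docs = [
  { _id: 1, type: 'post', title: 'asdf', text: 'asdf asdf' },
  { _id: 2, type: 'comment', text: 'yep!', parent_id: 1 },
];

// Comments whose parent_id no longer matches any post.
function findOrphanedComments(all) {
  const postIds = new Set(all.filter(d => d.type === 'post').map(d => d._id));
  return all.filter(d => d.type === 'comment' && !postIds.has(d.parent_id));
}

const before = findOrphanedComments(docs);
// Simulate deleting the post without cleaning up its comments.
const afterDelete = docs.filter(d => d._id !== 1);
const orphans = findOrphanedComments(afterDelete);
console.log(before.length, orphans.map(d => d._id));
```

In practice you would either delete the comments together with the post or run a check like this periodically.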
You can separate posts from comments into different collections or even databases by using database references; see your driver documentation for more details.
Both solutions can store tree-structured data, but in different ways.

MongoDB: Is a range query possible using multikeys?

var jd = {
type: "Person",
attributes: {
name: "John Doe",
age: 30
}
};
var pd = {
type: "Person",
attributes: {
name: "Penelope Doe",
age: 26
}
};
var ss = {
type: "Book",
attributes: {
name: "The Sword Of Shannara",
author: "Terry Brooks"
}
};
db.things.save(jd);
db.things.save(pd);
db.things.save(ss);
db.things.ensureIndex({attributes: 1})
db.things.find({"attributes.age": 30}) // => John Doe
db.things.find({"attributes.age": 30}).explain() // => BasicCursor... (don't want a scan)
db.things.find({"attributes.age": {$gte: 18}}) // John Doe, Penelope Doe (via a scan)
The goal is that all attributes be indexed and searchable via range queries and that the index actually be used (as opposed to a collection scan). There's no telling what attributes a document will have. I have read about multikeys but they seem only to work (by index) with exact-match queries.
Multikeys prefer this format for a document:
var pd = {
type: "Person",
attributes: [
{name: "Penelope Doe"},
{age: 26}
]
};
Is there a pattern whereby, with one index, I can find items by attribute using a range?
EDIT:
In a schemaless DB it makes sense to have potentially a limitless array of types, yet a collection name practically implies some sort of type. But if we go to the extreme, we want to allow for any number of types within a collection (so that we don't have to define a collection for every conceivable custom type a user might imagine). Searching, therefore, by attributes (of any sort) with just a single deep index (that supports ranged queries) makes this sort of thing far more feasible. Seems to me a natural fit for a schemaless DB.
Opened a ticket if you wanna vote it up:
http://jira.mongodb.org/browse/SERVER-2675
Yes, range queries work with multikeys. However, multikeys are for arrays rather than embedded objects.
In the example above, try:
db.things.ensureIndex({"attributes.age": 1})
Range queries are possible using multikeys; however, expressing the query can be tricky.
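One commonly used way to express it is to store each attribute as a key/value pair in the array, index the two sub-fields together, and query with $elemMatch. The field names k and v are a convention I am assuming here, not something from the question, and the helper name is made up. The shell commands appear only as comments; the JavaScript below just mimics the matching semantics in memory.

```javascript
// Assumed document shape: attributes: [{ k: <name>, v: <value> }, ...]
// In the shell (not run here), the index and query would look roughly like:
//   db.things.createIndex({ "attributes.k": 1, "attributes.v": 1 })
//   db.things.find({ attributes: { $elemMatch: { k: "age", v: { $gte: 18 } } } })

const things = [
  { type: 'Person', attributes: [{ k: 'name', v: 'John Doe' }, { k: 'age', v: 30 }] },
  { type: 'Person', attributes: [{ k: 'name', v: 'Penelope Doe' }, { k: 'age', v: 26 }] },
  { type: 'Book', attributes: [{ k: 'name', v: 'The Sword Of Shannara' }, { k: 'author', v: 'Terry Brooks' }] },
];

// In-memory equivalent of the $elemMatch condition: a single array element
// must satisfy both the key test and the range test.
function findByAttributeAtLeast(docs, key, min) {
  return docs.filter(d =>
    d.attributes.some(a => a.k === key && typeof a.v === 'number' && a.v >= min)
  );
}

const adults = findByAttributeAtLeast(things, 'age', 18);
console.log(adults.map(d => d.attributes[0].v));
```

The point of $elemMatch is that both conditions must hold for the same array element; without it, the key condition and the range condition could be satisfied by two different elements.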