How to store related records in mongodb? - mongodb

I have a number of associated records such as below.
Parent records
{
_id:234,
title: "title1",
name: "name1",
association:"assoc1"
},
Child record
{
_id:21,
title: "child title",
name: "child name1",
},
I want to store such records into MongoDb. Can anyone help?
Regards.

Even MongoDB doesn't support joins, you can organize data in several different ways:
1) First of all, you can inline(or embed) related documents. This case is useful, if you have some hierarchy of document, e.g. post and comments. In this case you can like so:
{
_id: <post_id>,
title: 'asdf',
text: 'asdf asdf',
comments: [
{<comment #1>},
{<comment #2>},
...
]
}
In this case, all related data will be in the save document. You can fetch it by one query, but pushing new comments to post cause moving this document on disk, frequent updates will increase disk load and space usage.
2) referencing is other technique you can use: in each document, you can put special field that contains _id of parent/related object:
{
_id: 1,
type: 'post',
title: 'asdf',
text: 'asdf asdf'
},
{
_id:2
type: 'comment',
text: 'yep!',
parent_id: 1
}
In this case you store posts and comments in same collection, therefor you have to store additional field type. MongoDB doesn't support constraints or any other way to check data constancy. This means that if you delete post with _id=1, comments with _id=2 store broken link in parent_id.
You can separate posts from comments in different collections or even databases by using database references, see your driver documentation for more details.
Both solutions can store tree-structured date, but in different way.

Related

MongoDB schema design: reference by ID vs. reference by name?

With this simple example
(use short ObjectId to make it read easier)
Tag documents:
{
_id: ObjectId('0001'),
name: 'JavaScript',
// other data
},
{
_id: ObjectId('0002'),
name: 'MongoDB',
// other data
},
...
Assume that we need a individual tag collection, e.g. we need to store some information on each tag.
If reference by ID:
// a book document
{
_id: ObjectId('9876'),
title: 'MEAN Web Development',
tags: [ObjectId('0001'), ObjectId('0002'), ...]
}
If reference by name:
{
_id: ObjectId('9876'),
title: 'MEAN Web Development',
tags: ['JavaScript', 'MongoDB', ...]
}
It's known that "reference by ID" is feasible.
I'm thinking if use "reference by name", a query for book's info only need to find within the book collection, we could know the tags' name without a join ($lookup) operation, which should be faster.
If the app performs a tag checking before book creating and modifying, this should also be feasible, and faster.
I'm still not very sure:
Is there any hider on "reference by name" ?
Will "reference by name" slower on "finding all books with a given tag" ? Maybe ObjectId is somehow special ?
Thanks.
I would say it depends on what your use case is for tags. As you say, it will be more expensive to do a $lookup to retrieve tag names if you reference by id. On the other hand, if you expect that tag names may change frequently, all documents in the book collection containing that tag will need to be updated every change.
The ObjectID is simply a 12 byte value, which is autogenerated by a driver if no _id is present in inserted documents. See the MongoDB docs for more info. The only "special behavior" would be the fact that _id has an index by default. An index will speedup lookups in general, but indexes can be created on any field, not just _id.
In fact, the _id does not need to be an ObjectID. It is perfectly legal to have documents with integer _id values for instance:
{
_id: 1,
name: 'Javascript'
},
{
_id: 2,
name: 'MongoDB'
},

Mongodb array of documents vs array of objects

Can you tell me when array of documents should be used and when array of objects should be used ?
By array of objects I'm assuming you mean ObjectId's AKA references to other collections, since a document is just a JSON object anyway.
The basic paradigm of data modeling is to embed whenever possible. If your collection references a finite number such a user's list of phone numbers like this - you definitely want to embed.
{
phone_numbers: [
{
type: "mobile",
number: "(123)456-7890"
},
{
type: "home",
number: "(456)789-0123"
}
]
}
If you are referencing a 1 <--> Many or 1 <--> Very Many collection, that is when you want to use references such as messages sent/received to a user.
{
from: ObjectId, // Reference to ObjectId of the sender
to: [], // Array of ObjectId references
message: String,
date: Date
}
I highly advise reading here:
https://docs.mongodb.com/v3.2/core/data-model-design/

MongoDB, positional argument, two levels deep - still not possible?

I have a collection with the following structure:
event_id: "id",
votedUp: 0,
votedDown: 0,
questions: [
{
title: "title"
desc: "desc"
type: "TEXT"
answers: []
},
{
title: "title"
desc: "desc"
type: "MULTIPLE"
answers: [
{
value: "ans1",
answered: 0
},
{
value: "ans2",
answered: 2
},
]
}
]
Important thing here: question doesn't make any sense without an Event, so it can't have it's own ID - they are identified by their properties within given Event.
This is some kind of questionnaire, mixed a little bit with survey.
I have two use cases:
User completes questionnaire.
For TEXT questions, I need to push submitted answers.
For multi/single choice I need to increment 'answered' field.
How can I achieve this WITHOUT changing the model using mongodb? I've found this issue and I'm quite terrified. It seems like there's no way to do this with one query.
Is it still the case? Any workarounds?
The only idea I have is to retrieve document from database, edit it and save again, but since mongo has nothing to do with transactions, it could be dangerous...

Mongodb querying document with linked id

I have a document that has an id of another document from a different collection embedded in it.
My desired result is to return (I'm using python and pymongo) the all the fields of the first collection, and all of the friends from the document that was embedded.
I understand mongo doesn't do joins and I understand I'll need to make two queries. I also don't want to duplicate my data.
My question is how to piece the two queries together in python/pymongo so I have one results with all the fields from both documents in it.
Here is what my data looks like:
db.employees
{_id: ObjectId("4d85c7039ab0fd70a117d733"), name: 'Joe Smith', title: 'junior',
manager: ObjectId("4d85c7039ab0fd70a117d730") }
db.managers
{_id: ObjectId("ObjectId("4d85c7039ab0fd70a117d730"), name: 'Jane Doe', title: 'senior manager'}
desired result
x = {_id: ObjectId("4d85c7039ab0fd70a117d733"), name: 'Joe Smith', title: 'junior',
manager: 'Jane Doe' }
Your basically doing something that Mongo does not support out of the box and would actually be more painful than using the two records separately.
Basically (in pseudo/JS code since I am not a Python programmer):
var emp = db.employees.find({ name: 'Joe Smith' });
var mang = db.managers.find({ _id: emp._id });
And there you have it you have the two records separately. You cannot chain as #slownage shows and get a merged result, or any result infact since MongoDB, from one qauery, will actually return a dictionary not the value of the field even when you specify only one field to return.
So this is really the only solution, to get the the two separately and then work on them.
Edit
You could use a DBRef here but it is pretty much the same as doing it this way, only difference is that it is a helper to put it into your own model instead of doing it youself.
If it works it should be something like:
db.managers.find({
'_id' => db->employees->find({ ('_id' : 1),
('_id': ObjectId("4d85c7039ab0fd70a117d733") }))
})
updated

Ways to implement data versioning in MongoDB

Can you share your thoughts how would you implement data versioning in MongoDB. (I've asked similar question regarding Cassandra. If you have any thoughts which db is better for that please share)
Suppose that I need to version records in an simple address book. (Address book records are stored as flat json objects). I expect that the history:
will be used infrequently
will be used all at once to present it in a "time machine" fashion
there won't be more versions than few hundred to a single record.
history won't expire.
I'm considering the following approaches:
Create a new object collection to store history of records or changes to the records. It would store one object per version with a reference to the address book entry. Such records would looks as follows:
{
'_id': 'new id',
'user': user_id,
'timestamp': timestamp,
'address_book_id': 'id of the address book record'
'old_record': {'first_name': 'Jon', 'last_name':'Doe' ...}
}
This approach can be modified to store an array of versions per document. But this seems to be slower approach without any advantages.
Store versions as serialized (JSON) object attached to address book entries. I'm not sure how to attach such objects to MongoDB documents. Perhaps as an array of strings.
(Modelled after Simple Document Versioning with CouchDB)
The first big question when diving in to this is "how do you want to store changesets"?
Diffs?
Whole record copies?
My personal approach would be to store diffs. Because the display of these diffs is really a special action, I would put the diffs in a different "history" collection.
I would use the different collection to save memory space. You generally don't want a full history for a simple query. So by keeping the history out of the object you can also keep it out of the commonly accessed memory when that data is queried.
To make my life easy, I would make a history document contain a dictionary of time-stamped diffs. Something like this:
{
_id : "id of address book record",
changes : {
1234567 : { "city" : "Omaha", "state" : "Nebraska" },
1234568 : { "city" : "Kansas City", "state" : "Missouri" }
}
}
To make my life really easy, I would make this part of my DataObjects (EntityWrapper, whatever) that I use to access my data. Generally these objects have some form of history, so that you can easily override the save() method to make this change at the same time.
UPDATE: 2015-10
It looks like there is now a spec for handling JSON diffs. This seems like a more robust way to store the diffs / changes.
There is a versioning scheme called "Vermongo" which addresses some aspects which haven't been dealt with in the other replies.
One of these issues is concurrent updates, another one is deleting documents.
Vermongo stores complete document copies in a shadow collection. For some use cases this might cause too much overhead, but I think it also simplifies many things.
https://github.com/thiloplanz/v7files/wiki/Vermongo
Here's another solution using a single document for the current version and all old versions:
{
_id: ObjectId("..."),
data: [
{ vid: 1, content: "foo" },
{ vid: 2, content: "bar" }
]
}
data contains all versions. The data array is ordered, new versions will only get $pushed to the end of the array. data.vid is the version id, which is an incrementing number.
Get the most recent version:
find(
{ "_id":ObjectId("...") },
{ "data":{ $slice:-1 } }
)
Get a specific version by vid:
find(
{ "_id":ObjectId("...") },
{ "data":{ $elemMatch:{ "vid":1 } } }
)
Return only specified fields:
find(
{ "_id":ObjectId("...") },
{ "data":{ $elemMatch:{ "vid":1 } }, "data.content":1 }
)
Insert new version: (and prevent concurrent insert/update)
update(
{
"_id":ObjectId("..."),
$and:[
{ "data.vid":{ $not:{ $gt:2 } } },
{ "data.vid":2 }
]
},
{ $push:{ "data":{ "vid":3, "content":"baz" } } }
)
2 is the vid of the current most recent version and 3 is the new version getting inserted. Because you need the most recent version's vid, it's easy to do get the next version's vid: nextVID = oldVID + 1.
The $and condition will ensure, that 2 is the latest vid.
This way there's no need for a unique index, but the application logic has to take care of incrementing the vid on insert.
Remove a specific version:
update(
{ "_id":ObjectId("...") },
{ $pull:{ "data":{ "vid":2 } } }
)
That's it!
(remember the 16MB per document limit)
If you're looking for a ready-to-roll solution -
Mongoid has built in simple versioning
http://mongoid.org/en/mongoid/docs/extras.html#versioning
mongoid-history is a Ruby plugin that provides a significantly more complicated solution with auditing, undo and redo
https://github.com/aq1018/mongoid-history
I worked through this solution that accommodates a published, draft and historical versions of the data:
{
published: {},
draft: {},
history: {
"1" : {
metadata: <value>,
document: {}
},
...
}
}
I explain the model further here: http://software.danielwatrous.com/representing-revision-data-in-mongodb/
For those that may implement something like this in Java, here's an example:
http://software.danielwatrous.com/using-java-to-work-with-versioned-data/
Including all the code that you can fork, if you like
https://github.com/dwatrous/mongodb-revision-objects
If you are using mongoose, I have found the following plugin to be a useful implementation of the JSON Patch format
mongoose-patch-history
Another option is to use mongoose-history plugin.
let mongoose = require('mongoose');
let mongooseHistory = require('mongoose-history');
let Schema = mongoose.Schema;
let MySchema = Post = new Schema({
title: String,
status: Boolean
});
MySchema.plugin(mongooseHistory);
// The plugin will automatically create a new collection with the schema name + "_history".
// In this case, collection with name "my_schema_history" will be created.
I have used the below package for a meteor/MongoDB project, and it works well, the main advantage is that it stores history/revisions within an array in the same document, hence no need for an additional publications or middleware to access change-history. It can support a limited number of previous versions (ex. last ten versions), it also supports change-concatenation (so all changes happened within a specific period will be covered by one revision).
nicklozon/meteor-collection-revisions
Another sound option is to use Meteor Vermongo (here)