Model tree structure in mongo - mongodb

I'm looking to build a tree structure for an app where I'll have to store multiple trees for each user.
Each tree will be composed of one document per node:
{
_id:1,
user_id:12345,
category_id:6789,
data:[]
}
So when I need to, I can access this data with a query on user_id and category_id.
I was looking at the MongoDB docs and found this:
https://docs.mongodb.com/manual/applications/data-models-tree-structures/
It's pretty interesting, but I have a few questions.
Considering that I'll only ever fetch full trees, which solution is better?
Are child references better, or would another structure do the job better?
And if I use child references, what is the best way to get the whole tree?
Can it be done with a single request, or do I have to recursively query for each child?
I know I could get all the docs in one query and build the tree from there. Is that a good idea?
EDIT:
So I tried with that:
[{
_id:1,
user_id:12345,
category_id:6789,
data:{name:"root"},
parent:null,
childs:[2,3]
},
{
_id:2,
user_id:12345,
category_id:6789,
data:{name:"child1"},
parent:1,
childs:[]
},
{
_id:3,
user_id:12345,
category_id:6789,
data:{name:"child2"},
parent:1,
childs:[4]
},
{
_id:4,
user_id:12345,
category_id:6789,
data:{name:"child2_1"},
parent:3,
childs:[]
}]
With both parent and children, I can easily find the leaves and the root when building the tree back. (I chose to build it in the client app and to query the full tree at once.)
I don't really use parent for now, so keeping a reference to the parent looks like overkill, but the query is fast enough; it just takes some extra space. Maybe a simple "root" boolean would be better? I'd really appreciate some advice on that.
I'm still open to improvements. I'd like this to be really fast, because each user will have 0 to n trees with 0 to n nodes each, and I don't want to get the data structure wrong.

This article shows a similar kind of solution: How to query tree structure recursively with MongoDB?
Otherwise, taking only the parent field into account, you can query all the nodes and convert the flat list into a tree like this:
function list_to_tree(list) {
  const indexById = new Map();
  const roots = [];
  for (let i = 0; i < list.length; i += 1) {
    indexById.set(list[i]._id.toString(), i); // map each _id to its position in the list
    list[i].childs = []; // replace the id references with an array of child nodes
  }
  for (let i = 0; i < list.length; i += 1) {
    const node = list[i];
    // parent is null for a root; != null also treats a missing field as a root
    if (node.parent != null) {
      list[indexById.get(node.parent.toString())].childs.push(node);
    } else {
      roots.push(node);
    }
  }
  // roots already holds the same node objects that were linked above
  return roots;
}
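If the parent field is dropped and only the childs arrays are kept, the tree can still be rebuilt from the flat result of a single query on user_id and category_id. The following is a minimal sketch under that assumption; the function name buildTreeFromChilds and the nested children field are illustrative and not part of the original schema.

```javascript
// Rebuild nested trees from a flat list of nodes that store child ids in `childs`.
// Assumes every id listed in `childs` refers to a node in the same list.
function buildTreeFromChilds(list) {
  const byId = new Map();
  const referenced = new Set(); // ids that appear as someone's child

  for (const node of list) {
    // Copy each node so the input documents are left untouched.
    byId.set(String(node._id), { ...node, children: [] });
  }
  for (const node of list) {
    const copy = byId.get(String(node._id));
    for (const childId of node.childs || []) {
      const child = byId.get(String(childId));
      if (child) {
        copy.children.push(child);
        referenced.add(String(childId));
      }
    }
  }
  // A root is any node that no other node lists as a child.
  return [...byId.values()].filter((n) => !referenced.has(String(n._id)));
}

const docs = [
  { _id: 1, data: { name: "root" }, childs: [2, 3] },
  { _id: 2, data: { name: "child1" }, childs: [] },
  { _id: 3, data: { name: "child2" }, childs: [4] },
  { _id: 4, data: { name: "child2_1" }, childs: [] },
];
const trees = buildTreeFromChilds(docs);
console.log(trees.length, trees[0].children.map((c) => c.data.name));
```

With this shape the root needs no extra flag: it is simply the node that never appears in any childs array.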

Related

Mongo - Replace references with embedded documents

I have a Collection with a nested attribute that is an array of ObjectId References. These refer to documents in another Collection.
I'd like to replace these references with the documents themselves, i.e. embed those documents where the references are now. I've tried with and without the .snapshot() option, and I get a call stack error either way. It may be caused by updating a document while looping over that same document, at a level where .snapshot() isn't available.
My mongo-fu is low and I'm stuck. How can I do this?
Example code:
db.CollWithReferences.find({}).snapshot().forEach( function(document) {
var doc_id = document._id;
document.GroupsOfStuff.forEach( function(Group) {
var docsToEmbed= db.CollOfThingsToEmbed.find({ _id: { $in: Group.ArrayOfReferenceObjectIds }});
db.CollWithReferences.update({"_id": ObjectId(doc_id) },
{$set: {"Group.ArrayOfReferenceObjectIds ":docsToEmbed}} )
});
});
Gives this error:
{
"message" : "Maximum call stack size exceeded",
"stack" : "RangeError: Maximum call stack size exceeded" +
....}
I figure this is happening for one of two reasons: either you are running out of memory by executing two queries inside a loop, or the update operation is being executed before the find operation has finished.
Either way, it is not a good idea to execute many queries inside a loop, as it can lead to this type of error.
I can't be sure this will fix your problem, as I don't know how many documents are in your collections, but it may work if you first fetch all documents from the CollWithReferences collection, then everything you need from the CollOfThingsToEmbed collection. Next, build a map from each _id in CollOfThingsToEmbed to the document it belongs to. You can then loop through each document from CollWithReferences and, for every group in its GroupsOfStuff array, replace each ObjectId in ArrayOfReferenceObjectIds with the value stored in the map, which is the whole document. Finally, update that document by setting GroupsOfStuff to its mutated value.
The following JavaScript code will do this (it could be organised better, e.g. with no logic in the global scope):
// toArray() turns the cursor into a plain array so it can be indexed and looped over.
var references = db.CollWithReferences.find({}).toArray();
function getReferenceIds(references) {
  var referenceIds = [];
  for (var i = 0; i < references.length; i++) {
    var groups = references[i].GroupsOfStuff;
    for (var j = 0; j < groups.length; j++) {
      var ids = groups[j].ArrayOfReferenceObjectIds;
      for (var k = 0; k < ids.length; k++) {
        referenceIds.push(ids[k]);
      }
    }
  }
  return referenceIds;
}
function buildIdMap(docs) {
  var map = {};
  for (var i = 0; i < docs.length; i++) {
    map[docs[i]._id.toString()] = docs[i];
  }
  return map;
}
var referenceIds = getReferenceIds(references);
var docsToEmbed = db.CollOfThingsToEmbed.find({_id: {$in: referenceIds}}).toArray();
var idMap = buildIdMap(docsToEmbed);
for (var i = 0; i < references.length; i++) {
  var groups = references[i].GroupsOfStuff;
  for (var j = 0; j < groups.length; j++) {
    // map() builds a new array; assigning to the callback argument would change nothing
    groups[j].ArrayOfReferenceObjectIds = groups[j].ArrayOfReferenceObjectIds.map(function(ref) {
      return idMap[ref.toString()];
    });
  }
  db.CollWithReferences.update({
    _id: references[i]._id
  }, {
    $set: {GroupsOfStuff: groups}
  });
}
It would be better to do this with a single update statement, but each document needs a different value, so one multi-document update cannot express it.
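The in-memory part of this approach can be tried without a database. The sketch below works on plain arrays that stand in for the two query results; embedReferences and the sample ids are hypothetical, and ids are plain strings here rather than ObjectIds.

```javascript
// Replace each id in ArrayOfReferenceObjectIds with the full document it points to.
// `references` and `docsToEmbed` stand in for the results of the two find() calls.
function embedReferences(references, docsToEmbed) {
  const idMap = new Map(docsToEmbed.map((d) => [String(d._id), d]));
  return references.map((doc) => ({
    ...doc,
    GroupsOfStuff: doc.GroupsOfStuff.map((group) => ({
      ...group,
      // Ids with no matching document are kept as-is rather than dropped.
      ArrayOfReferenceObjectIds: group.ArrayOfReferenceObjectIds.map((id) =>
        idMap.has(String(id)) ? idMap.get(String(id)) : id
      ),
    })),
  }));
}

const references = [
  { _id: "r1", GroupsOfStuff: [{ ArrayOfReferenceObjectIds: ["a", "b"] }] },
  { _id: "r2", GroupsOfStuff: [{ ArrayOfReferenceObjectIds: ["b", "missing"] }] },
];
const docsToEmbed = [
  { _id: "a", label: "first" },
  { _id: "b", label: "second" },
];
const embedded = embedReferences(references, docsToEmbed);
console.log(JSON.stringify(embedded[0].GroupsOfStuff[0].ArrayOfReferenceObjectIds));
```

Returning new objects instead of mutating the inputs makes it easy to compare the before and after shapes before writing anything back.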

mapreduce between consecutive documents

Setup:
I have a large collection with the following fields:
Name - String
Begin - time stamp
End - time stamp
Problem:
I want to get the gaps between consecutive documents, using the map-reduce paradigm.
Approach:
I'm trying to build a new collection of pairs, called mid. After that I can compute the differences from it using $unwind and Pair[1].Begin - Pair[0].End
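function map() {
  emit(0, this);
}
// reduce receives the key and the array of values emitted for that key
function reduce(key, values) {
  var i = 0;
  var pairs = [];
  while (i < values.length - 1) {
    pairs.push([values[i], values[i + 1]]);
    i = i + 1;
  }
  return {"pairs": pairs};
}
// options go in a single object; the sort field matches the Begin field above
db.collection.mapReduce(map, reduce, {sort: {Begin: 1}, out: {replace: "mid"}})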
This works only with a limited number of documents because of the 16MB document cap. I'm not sure whether I need to load the collection into memory and do it there. How else can I approach this problem?
MongoDB's mapReduce can handle this, but not in the way you are attempting. The key is to keep the previous document around so that it can be compared with the next one.
The mechanism that supports this is the "scope" option, which provides a kind of global variable to the map and reduce functions. With that in place, the task needs no reduction at all, because nothing is grouped; the map function simply emits data for each pair of documents:
db.collection.mapReduce(
function() {
if ( last == null ) {
last = this;
} else {
emit(
{
"start_id": last._id,
"end_id": this._id
},
this.Begin - last.End
);
last = this;
}
},
function() {}, // no reduction required
{
"out": { "inline": 1 },
"scope": { "last": null }
}
)
Switch "out" from inline to a collection if the result is too large for an inline response.
Either way, using a "global" to hold the last document keeps the code both simple and efficient.
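The same "remember the previous document" idea can be checked on a small array outside MongoDB. This is only an illustration: gapsBetween and the numeric timestamps are assumptions, and the input is sorted by Begin first, because the pairs are only consecutive in time if the documents arrive in that order.

```javascript
// Compute the gap between each document's End and the next document's Begin.
// A local variable plays the role of the "last" value held in scope.
function gapsBetween(docs) {
  const sorted = [...docs].sort((a, b) => a.Begin - b.Begin);
  const gaps = [];
  let last = null;
  for (const doc of sorted) {
    if (last !== null) {
      gaps.push({ start_id: last._id, end_id: doc._id, gap: doc.Begin - last.End });
    }
    last = doc;
  }
  return gaps;
}

const sample = [
  { _id: "b", Name: "second", Begin: 20, End: 30 },
  { _id: "a", Name: "first", Begin: 0, End: 15 },
  { _id: "c", Name: "third", Begin: 45, End: 50 },
];
console.log(gapsBetween(sample));
```

For n documents this yields n - 1 entries, one per consecutive pair, and no entry ever holds more than two ids and a number.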

Relationship Mapping in Mongodb

I am building a game engine on Meteor JS and trying to create a way to link together a number of collections. The current 'schema' looks like this:
GameCollection = { <meta> } //This is a Collection (a Meteor MongoDB document)
Scene = {gameId: _id, <other resource ids and meta>} //This is a Collection
The issue is I need to create a map from one scene to another. These paths need to fork and merge easily. I am getting the feeling that I should be using a graph/triple database to represent this, but I want to stay within "Meteor's magic", and that means normal MongoDB Collections. If someone has a simple-to-use alternative I would still like to hear it, but I would prefer a Meteor-esque pattern. Pushes in the right direction would also be great!
I have three specific needs:
If I am at this scene, what scene or scenes do I lead to?
If I am at this scene, give me the ids of all scenes x steps into the future, where 'x' is a variable (so I can send the lot of them down to the client).
Count and give me all possible paths, so I can give a visual representation of the game.
What I am specifically asking is: is a graph database what I am looking for, and if not, what schema pattern should I use with MongoDB?
UPDATE:
I have confirmed that neo4j will do what I need from a logical standpoint. But I would lose the benefit of working with Meteor Collections. This means losing reactivity which in turn breaks my live collaborative model. I really need a MongoDB alternative.
UPDATE 2:
I ended up trying to stick the relationship inside of the GameCollection. It seems to be working but I would like a cleaner way if possible.
map: [ { //an array of objects (relations)
  id: _id, //key to a Scene
  toKey: _id //leads to scene; toKey is either 'next' or some num [0..n] for multi paths
}]
So I ended up going the denormalization route. I put an array of scenes into the GameCollection.
scenes: [{ id: random_id, next: 'next_id' || [next_ids], <other resource ids and meta> }]
Then I built this monster:
getScene = function (scenes, id) {
  return _.find(scenes, function (scene) {
    return (scene.id == id)
  })
}
getNext = function (scene) {
  if (!scene) { return null }
  if (scene.type == 'dialogue') {
    return scene.next
  }
  if (scene.type == 'choice') {
    return _.pluck(scene.next, 'id')
  }
}
scenesDive = function (list, next, container, limit, depth) {
  if (!depth) {
    depth = 0
  }
  //var keeps the depth local to this call; without it all branches share one global.
  var myDepth = depth + 1
  var scene = getScene(list, next)
  if (!scene) { return } //Dead end: no scene with this id.
  if (container.indexOf(scene) != -1) { return } //This path has already been added. Go back up.
  container.push(scene)
  if (myDepth == limit) { return } //Don't dive deeper than this depth.
  var nextAoS = getNext(scene) //DIVE! (array or string)
  if (_.isArray(nextAoS)) {
    nextAoS.forEach(function (n) {
      scenesDive(list, n, container, limit, myDepth)
    });
  } else {
    scenesDive(list, nextAoS, container, limit, myDepth)
  }
}
I am sure there is a better way but this is what I am going with for now.
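One possible alternative for the "x steps into the future" need is a breadth-first walk, which visits scenes level by level, so a scene is always reached by its shortest path before the step limit cuts the search off. This is a sketch only: scenesWithin is a hypothetical name, and it assumes the simpler shape where next is a single id or an array of ids.

```javascript
// Collect the ids of all scenes reachable from `startId` within `limit` steps.
// Each scene's `next` may be a single id, an array of ids, or missing.
function scenesWithin(scenes, startId, limit) {
  const byId = new Map(scenes.map((s) => [s.id, s]));
  const seen = new Set([startId]);
  let frontier = [startId];
  for (let depth = 0; depth < limit && frontier.length > 0; depth++) {
    const nextFrontier = [];
    for (const id of frontier) {
      const scene = byId.get(id);
      if (!scene || scene.next == null) continue;
      const targets = Array.isArray(scene.next) ? scene.next : [scene.next];
      for (const target of targets) {
        if (!seen.has(target)) {
          seen.add(target); // merged or looping paths are only followed once
          nextFrontier.push(target);
        }
      }
    }
    frontier = nextFrontier;
  }
  seen.delete(startId); // report only the scenes ahead of the start
  return [...seen];
}

const scenes = [
  { id: "s1", next: ["s2", "s3"] }, // fork
  { id: "s2", next: "s4" },
  { id: "s3", next: "s4" },         // merge
  { id: "s4", next: "s5" },
  { id: "s5", next: "s1" },         // loop back to the start
];
console.log(scenesWithin(scenes, "s1", 2));
```

Because the lookup map is built once, each call costs time proportional to the scenes and links actually visited, not a full array scan per step.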

Are DBRefs supported in Meteor yet? [duplicate]

I'm using meteor 0.3.7 in Win7(32) and trying to create a simple logging system using 2 MongoDB collections to store data that are linked by DBRef.
The current pseudo schema is :
Users {
username : String,
password : String,
created : Timestamp,
}
Logs {
user_id : DBRef {$id, $ref}
message : String
}
I use server methods to insert the logs so I can do some upserts on the clients collection.
Now I want to do an old "left join" and display a list of the last n logs with the embedded User name.
I don't want to embed the Logs in Users, because the most used operation is getting the last n logs. In my opinion, embedding would have a big impact on performance.
What is the best approach to achieve this?
It would also be great if editing a user's name updated the name on all of that user's items.
Regards
Playing around with Cursor.observe answered my question. It may not be the most efficient way of doing this, but it solves my future problems of dereferencing DBRef "links".
So on the server we need to publish a special collection: one that enumerates the cursor and, for each document, looks up the corresponding DBRef.
Bear in mind that this implementation is hardcoded; it should really be done as a package, something like UnRefCollection.
Server Side
CC.Logs = new Meteor.Collection("logs");
CC.Users = new Meteor.Collection("users");
Meteor.publish('logsAndUsers', function (page, size) {
var self = this;
var startup = true;
var startupList = [], uniqArr = [];
page = page || 1;
size = size || 100;
var skip = (page - 1) * size;
var cursor = CC.Logs.find({}, {limit : size, skip : skip});
var handle = cursor.observe({
added : function(doc, idx){
var clone = _.clone(doc);
var refId = clone.user_id.oid; // should search DBRefs
if (startup){
startupList.push(clone);
if (!_.contains(uniqArr, refId))
uniqArr.push(refId);
} else {
// Clients added logs
var deref = CC.Users.findOne({_id : refId});
clone.user = deref;
self.set('logsAndUsers', clone._id, clone);
self.flush();
}
},
removed : function(doc, idx){
self.unset('logsAndUsers', doc._id, _.keys(doc));
self.flush();
},
changed : function(new_document, idx, old_document){
var set = {};
_.each(new_document, function (v, k) {
if (!_.isEqual(v, old_document[k]))
set[k] = v;
});
self.set('logsAndUsers', new_document._id, set);
var dead_keys = _.difference(_.keys(old_document), _.keys(new_document));
self.unset('logsAndUsers', new_document._id, dead_keys);
self.flush();
},
moved : function(document, old_index, new_index){
// Not used
}
});
self.onStop(function(){
handle.stop();
});
// Deref on first Run
var derefs = CC.Users.find({_id : {$in : uniqArr} }).fetch();
_.forEach(startupList, function (item){
_.forEach(derefs, function(ditems){
if (item["user_id"].oid === ditems._id){
item.user = ditems;
return false;
}
});
self.set('logsAndUsers', item._id, item);
});
derefs = null; // Not needed anymore (delete has no effect on a var)
startup = false;
self.complete();
self.flush();
});
For each added logs document, it searches the users collection and adds the missing user information to the published log.
The added function is called for every document in the logs collection on the first run, so I collect those documents in startupList along with an array of unique user ids; that way the first run queries the users collection only once. It's a good idea to add a paging mechanism to speed things up.
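The first-run matching step can be sketched on its own with plain arrays. The version below uses a lookup map, so each log needs one lookup instead of a scan over every fetched user; attachUsers and the sample data are hypothetical, and ids are plain strings.

```javascript
// Attach the referenced user to each log entry.
// `logs` and `users` stand in for startupList and the fetched derefs.
function attachUsers(logs, users) {
  const usersById = new Map(users.map((u) => [String(u._id), u]));
  return logs.map((log) => ({
    ...log,
    // null marks a reference that points at no known user
    user: usersById.get(String(log.user_id.oid)) || null,
  }));
}

const users = [
  { _id: "u1", username: "alice" },
  { _id: "u2", username: "bob" },
];
const logs = [
  { _id: "l1", user_id: { oid: "u2" }, message: "logged in" },
  { _id: "l2", user_id: { oid: "u1" }, message: "logged out" },
  { _id: "l3", user_id: { oid: "gone" }, message: "orphan" },
];
console.log(attachUsers(logs, users).map((l) => l.user && l.user.username));
```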
Client Side
On the client, subscribe to the logsAndUsers collection. If you want to make changes, make them directly on the Logs collection.
LogsAndUsers = new Meteor.Collection('logsAndUsers');
Logs = new Meteor.Collection('logs'); // Changes here are observed in the LogsAndUsers collection
Meteor.autosubscribe(function () {
var page = Session.get('page') || 1;
Meteor.subscribe('logsAndUsers', page);
});
Why not just store the username in the logs collection as well?
Then you can query on it directly, without needing any kind of "join".
If for some reason you need to handle a username change, fetch the user object by name, then query Logs with { user_id : user._id } and update the matching entries.
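As a rough illustration of this denormalized approach, the sketch below keeps a copy of the username on each log entry and propagates a rename by user id. It runs on plain arrays rather than collections; makeLog and renameUser are hypothetical helpers, and user_id is a plain id here instead of a DBRef.

```javascript
// Build a log entry that carries a copy of the username next to the user id.
function makeLog(user, message) {
  return { user_id: user._id, username: user.username, message: message };
}

// Rename a user and every log entry that belongs to that user.
// Returns the number of log entries that were updated.
function renameUser(users, logs, oldName, newName) {
  const user = users.find((u) => u.username === oldName);
  if (!user) return 0;
  user.username = newName;
  let touched = 0;
  for (const log of logs) {
    if (log.user_id === user._id) {
      log.username = newName;
      touched++;
    }
  }
  return touched;
}

const allUsers = [
  { _id: "u1", username: "alice" },
  { _id: "u2", username: "bob" },
];
const allLogs = [
  makeLog(allUsers[0], "logged in"),
  makeLog(allUsers[1], "logged in"),
  makeLog(allUsers[0], "logged out"),
];
console.log(renameUser(allUsers, allLogs, "alice", "alicia"), allLogs.map((l) => l.username));
```

The trade-off is that reads of the last n logs need no lookup at all, while a rename has to touch every log entry of that user.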
