MongoDB, positional argument, two levels deep - still not possible? - mongodb

I have a collection with the following structure:
event_id: "id",
votedUp: 0,
votedDown: 0,
questions: [
{
title: "title"
desc: "desc"
type: "TEXT"
answers: []
},
{
title: "title"
desc: "desc"
type: "MULTIPLE"
answers: [
{
value: "ans1",
answered: 0
},
{
value: "ans2",
answered: 2
},
]
}
]
Important thing here: question doesn't make any sense without an Event, so it can't have it's own ID - they are identified by their properties within given Event.
This is some kind of questionnaire, mixed a little bit with survey.
I have two use cases:
User completes questionnaire.
For TEXT questions, I need to push submitted answers.
For multi/single choice I need to increment 'answered' field.
How can I achieve this WITHOUT changing the model using mongodb? I've found this issue and I'm quite terrified. It seems like there's no way to do this with one query.
Is it still the case? Any workarounds?
The only idea I have is to retrieve document from database, edit it and save again, but since mongo has nothing to do with transactions, it could be dangerous...

Related

Mongo error 16996 during aggregation - too large document produced

I am parsing Wikipedia dumps in order to play with the link-oriented metadata. One of the collections is named articles and it is in the following form:
{
_id : "Tree",
id: "18955875",
linksFrom: " [
{
name: "Forest",
count: 6
},
[...]
],
categories: [
"Trees",
"Forest_ecology"
[...]
]
}
The linksFrom field stores all articles this article points to, and how many times that happens. Next, I want to create another field linksTo with all the articles that point to this article. In the beginning, I went through the whole collection and updated every article, but since there's lots of them it takes too much time. I switched to aggregation for performance purposes and tried it on a smaller set - works like a charm and is super fast in comparison with the older method. The aggregation pipeline is as follows:
db.runCommand(
{
aggregate: "articles",
pipeline : [
{
$unwind: "$linksFrom"
},
{
$sort: { "linksFrom.count": -1 }
},
{
$project:
{
name: "$_id",
linksFrom: "$linksFrom"
}
},
{
$group:
{
_id: "$linksFrom.name",
linksTo: { $push: { name: "$name", count: { $sum : "$linksFrom.count" } } },
}
},
{
$out: "TEMPORARY"
}
] ,
allowDiskUse: true
}
)
However, on a large dataset being the english Wikipedia I get the following error after a few minutes:
{
"ok" : 0,
"errmsg" : "insert for $out failed: { connectionId: 24, err: \"BSONObj size: 24535193 (0x1766099) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: \"United_States\"\", code: 10334, n: 0, ok: 1.0 }",
"code" : 16996
}
I understand that there are too many articles, which link to United_States article and the corresponding document's size grows above 16MB, currently almost 24MB. Unfortunately, I cannot even check if that's the case (error messages sometimes tend to lie)... Because of that, I'm trying to change the model so that the relationship between articles is stored with IDs rather than long names but I'm afraid that might not be enough - especially because my plan is to merge the two collections for every article later...
The question is: does anyone have a better idea? I don't want to try to increase the limit, I'm rather thinking about a different approach of storing this data in the database.
UPDATE after comment by Markus
Markus is correct, I am using a SAX parser and, as a matter of fact, I'm already storing all the links in a similar way. Apart from articles I have three more collections - one with links and two others, labels and stemmed-labels. The first one stores all links that occur in the dump in the following way:
{
_id : "tree",
stemmedName: "tree",
targetArticle: "Christmas_tree"
}
_id stores the text that is used to represent a given link, stemmedName represents stemmed _id and targetArticle marks what article this text pointed to. I'm in the middle of adding sourceArticle to this one, because it's obviously a good idea.
The second collection labels contains documents as follows:
{
_id : "tree",
targetArticles: [
{
name: "Christmas_tree",
count: 1
},
{
name: "Tree",
count: 166
}
[...]
]
}
The third stemmed-labels is analogous to the labels with its _id being a stemmed version of the root label.
So far, the first collection links serves as a baseline for the two other collections. I group the labels together by their name so that I only do one lookup for every phrase and then I can immiedately get all target articles with one query. Then I use the articles and labels collections in order to:
Look for label with a given name.
Get all articles it might
point to.
Compare the incoming and outcoming links for these
articles.
This is where the main question comes. I thought that it's better if I store all possible articles for a given phrase in one document rather than leave them scattered in the links collection. Only now did it occur to me, that - as long as the lookups are indexed - the overall performance might be the same for one big document or many smaller ones! Is this a correct assumption?
I think your data model is wrong. It may well be (albeit a bit theoretical) that individual articles (let's stick with the wikipedia example) are linked more often than you could store in a document. Embedding only works with One-To(-Very)-Few™ relationships.
So basically, I think you should change your model. I will show you how I would do it.
I will use the mongoshell and JavaScript in this example, since it is the lingua franca. You might need to translate accordingly.
The questions
Lets begin with the questions you want to have answered:
For a given article, which other articles link to that article?
For a given article, to which other articles does that article link to?
For a given article, how many articles link to it?
Optional: For a given article, to how many articles does it link to?
The crawling
What I would do basically is to implement a SAX parser on the articles, creating a new document for each article link you encounter. The document itself should be rather simple:
{
"_id": new ObjectId(),
// optional, for recrawling or pointing out a given state
"date": new ISODate(),
"article": wikiUrl,
"linksTo": otherWikiUrl
}
Note that you should not do an insert, but an upsert. The reason for this is that we do not want to document the number of links, but the articles linked to. If we did an insert, the same combination of article and linksTocould occur multiple times.
So our statement when encountering a link would look like this for example:
db.links.update(
{ "article":"HMS_Warrior_(1860)", "linksTo":"Royal_Navy" },
{ "date": new ISODate(), "article":"HMS_Warrior_(1860)", "linksTo":"Royal_Navy" },
{ upsert:true }
)
Answering the questions
As you might already guess, answering the questions becomes pretty straightforward now. I have use the following statements for creating a few documents:
db.links.update(
{ "article":"HMS_Warrior_(1860)", "linksTo":"Royal_Navy" },
{ "date": new ISODate(), "article":"HMS_Warrior_(1860)", "linksTo":"Royal_Navy" },
{ upsert:true }
)
db.links.update(
{ "article":"Royal_Navy", "linksTo":"Mutiny_on_the_Bounty" },
{ "date":new ISODate(), "article":"Royal_Navy", "linksTo":"Mutiny_on_the_Bounty" },
{ upsert:true }
)
db.links.update(
{ "article":"Mutiny_on_the_Bounty", "linksTo":"Royal_Navy"},
{ "date":new ISODate(), "article":"Mutiny_on_the_Bounty", "linksTo":"Royal_Navy" },
{ upsert:true }
)
For a given article, which other articles link to that article?
We found out that we should not use an aggregation, since that might exceed the size limit. But we don't have to. We simply use a cursor and gather the results:
var toLinks =[]
var cursor = db.links.find({"linksTo":"Royal_Navy"},{"_id":0,"article":1})
cursor.forEach(
function(doc){
toLinks.push(doc.article);
}
)
printjson(toLinks)
// Output: [ "HMS_Warrior_(1860)", "Mutiny_on_the_Bounty" ]
For a given article, to which other articles does that article link to?
This works pretty much like the first question – we basically only change the query:
var fromLinks = []
var cursor = db.links.find({"article":"Royal_Navy"},{"_id":0,"linksTo":1})
cursor.forEach(
function(doc){
fromLinks.push(doc.linksTo)
}
)
printjson(fromLinks)
// Output: [ "Mutiny_on_the_Bounty" ]
For a given article, how many articles link to it?
It should be obvious that in case you already have answered question 1, you could simply check toLinks.length. But let's assume you haven't. There are two other ways of doing this
Using .count()
You can use this method on replica sets. On sharded clusters, this doesn't work well. But it is easy:
db.links.find({ "linksTo":"Royal_Navy" }).count()
// Output: 2
Using an aggregation
This works on any environment and isn't much more complicated:
db.links.aggregate([
{ "$match":{ "linksTo":"Royal_Navy" }},
{ "$group":{ "_id":"$linksTo", "isLinkedFrom":{ "$sum":1 }}}
])
// Output: { "_id" : "Royal_Navy", "isLinkedFrom" : 2 }
Optional: For a given article, to how many articles does it link to?
Again, you can answer this question by reading the length of the array from question 2 of use the .count()method. The aggregation again is simple
db.links.aggregate([
{ "$match":{ "article":"Royal_Navy" }},
{ "$group":{ "_id":"$article", "linksTo":{ "$sum":1 }}}
])
// Output: { "_id" : "Royal_Navy", "linksTo" : 1 }
Indices
As for the indices, I haven't really checked them, but individual indices on the fields is probably what you want:
db.links.createIndex({"article":1})
db.links.createIndex({"linksTo":1})
A compound index will not help much, since order matters and we do no always ask for the first field. So this is probably as optimized as it can get.
Conclusion
We are using an extremely simple, scalable model and rather simple queries and aggregations to get the questions answered you have to the data.

Meteor Collection: find element in array

I have no experience with NoSQL. So, I think, if I just try to ask about the code, my question can be incorrect. Instead, let me explain my problem.
Suppose I have e-store. I have catalogs
Catalogs = new Mongo.Collection('catalogs);
and products in that catalogs
Products = new Mongo.Collection('products');
Then, people add there orders to temporary collection
Order = new Mongo.Collection();
Then, people submit their comments, phone, etc and order. I save it to collection Operations:
Operations.insert({
phone: "phone",
comment: "comment",
etc: "etc"
savedOrder: Order //<- Array, right? Or Object will be better?
});
Nice, but when i want to get stats by every product, in what Operations product have used. How can I search thru my Operations and find every operation with that product?
Or this way is bad? How real pro's made this in real world?
If I understand it well, here is a sample document as stored in your Operation collection:
{
clientRef: "john-001",
phone: "12345678",
other: "etc.",
savedOrder: {
"someMetadataAboutOrder": "...",
"lines" : [
{ qty: 1, itemRef: "XYZ001", unitPriceInCts: 1050, desc: "USB Pen Drive 8G" },
{ qty: 1, itemRef: "ABC002", unitPriceInCts: 19995, desc: "Entry level motherboard" },
]
}
},
{
clientRef: "paul-002",
phone: null,
other: "etc.",
savedOrder: {
"someMetadataAboutOrder": "...",
"lines" : [
{ qty: 3, itemRef: "XYZ001", unitPriceInCts: 950, desc: "USB Pen Drive 8G" },
]
}
},
Given that, to find all operations having item reference XYZ001 you simply have to query:
> db.operations.find({"savedOrder.lines.itemRef":"XYZ001"})
This will return the whole document. If instead you are only interested in the client reference (and operation _id), you will use a projection as an extra argument to find:
> db.operations.find({"savedOrder.lines.itemRef":"XYZ001"}, {"clientRef": 1})
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c179"), "clientRef" : "john-001" }
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c17a"), "clientRef" : "paul-002" }
If you need to perform multi-documents (incl. multi-embedded documents) operations, you should take a look at the aggregation framework:
For example, to calculate the total of an order:
> db.operations.aggregate([
{$match: { "_id" : ObjectId("556f07b5d5f2fb3f94b8c179") }},
{$unwind: "$savedOrder.lines" },
{$group: { _id: "$_id",
total: {$sum: {$multiply: ["$savedOrder.lines.qty",
"$savedOrder.lines.unitPriceInCts"]}}
}}
])
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c179"), "total" : 21045 }
I'm an eternal newbie, but since no answer is posted, I'll give it a try.
First, start by installing robomongo or a similar software, it will allow you to have a look at your collections directly in mongoDB (btw, the default port is 3001)
The way I deal with your kind of problem is by using the _id field. It is a field automatically generated by mongoDB, and you can safely use it as an ID for any item in your collections.
Your catalog collection should have a string array field called product where you find all your products collection items _id. Same thing for the operations: if an order is an array of products _id, you can do the same and store this array of products _id in your savedOrder field. Feel free to add more fields in savedOrder if necessary, e.g. you make an array of objects products with additional fields such as discount.
Concerning your queries code, I assume you will find all you need on the web as soon as you figure out what your structure is.
For example, if you have a product array in your savedorder array, you can pull it out like that:
Operations.find({_id: "your operation ID"},{"savedOrder.products":1)
Basically, you ask for all the products _id in a specific operation. If you have several savedOrders in only one operation, you can specify too the savedOrder _id, if you used the one you had in your local collection.
Operations.find({_id: "your_operation_ID", "savedOrder._id": "your_savedOrder_ID"},{"savedOrder.products":1)
ps: to bad-ass coders here, if I'm doing it wrong, please tell me.
I find an answer :) Of course, this is not a reveal for real professionals, but is a big step for me. Maybe my experience someone find useful. All magic in using correct mongo operators. Let solve this problem in pseudocode.
We have a structure like this:
Operations:
1. Operation: {
_id: <- Mongo create this unique for us
phone: "phone1",
comment: "comment1",
savedOrder: [
{
_id: <- and again
productId: <- whe should save our product ID from 'products'
name: "Banana",
quantity: 100
},
{
_id:,
productId: <- Another ID, that we should save if order
name: "apple",
quantity: 50
}
]
And if we want to know, in what Operation user take "banana", we should use mongoDB operator"elemMatch" in Mongo docs
db.getCollection('operations').find({}, {savedOrder: {$elemMatch:{productId: "f5mhs8c2pLnNNiC5v"}}});
In simple, we get documents our saved order have products with id that we want to find. I don't know is it the best way, but it works for me :) Thank you!

Inserting multiple nested objects with MongoDB

I'm very new to MongoDB so forgive me if this question isn't worded correctly. I know how to insert into the database, and I also know that I can have a nested object and know how to install that. I have:
Questions.insert({ Order:1, Question: "What type of property is it?",
Answers: { Order: 1, Answer: "House" }});
I hope from the above statement you can see that I'm aiming to try and insert multiple answers for this question (this may be where I'm going wrong, is this the right approach?). So looking at the above statement, I thought that I could insert multiple answers as such:
Questions.insert({ Order:1, Question: "What type of property is it?",
Answers: [{ Order: 1, Answer: "House" },
{ Order: 2, Answer: "Flat" },
{ Order: 3, Answer: "Bungalow" },
{ Order: 4, Answer: "Maisonette }]
});
SyntaxError: Unexpected token ILLEGAL
You are missing a " at the end of Maisonette which is where the error is coming from.
{ Order: 4, Answer: "Maisonette }]
Otherwise your query is on the right track for inserting embedded documents.
Your answers sub-document is kind of acting like an array. There are two possibilities you could use to store multiple answers in each question:
1) Just use an array:
Questions.insert({order : 1,
question : "What type of property is it?",
answers : [ "House", "Flat", "Bungalow", "Maisonette" ]
});
2) The way MongoDB will sometimes internally store arrays is to simply use an ordinal as the key to each sub-document, like so:
Questions.insert({order : 1,
question : "What type of property is it?",
answers : {"1" : "House",
"2" : "Flat",
"3" : "Bungalow",
"4" : "Maisonette"}
});

How to store related records in mongodb?

I have a number of associated records such as below.
Parent records
{
_id:234,
title: "title1",
name: "name1",
association:"assoc1"
},
Child record
{
_id:21,
title: "child title",
name: "child name1",
},
I want to store such records into MongoDb. Can anyone help?
Regards.
Even MongoDB doesn't support joins, you can organize data in several different ways:
1) First of all, you can inline(or embed) related documents. This case is useful, if you have some hierarchy of document, e.g. post and comments. In this case you can like so:
{
_id: <post_id>,
title: 'asdf',
text: 'asdf asdf',
comments: [
{<comment #1>},
{<comment #2>},
...
]
}
In this case, all related data will be in the save document. You can fetch it by one query, but pushing new comments to post cause moving this document on disk, frequent updates will increase disk load and space usage.
2) referencing is other technique you can use: in each document, you can put special field that contains _id of parent/related object:
{
_id: 1,
type: 'post',
title: 'asdf',
text: 'asdf asdf'
},
{
_id:2
type: 'comment',
text: 'yep!',
parent_id: 1
}
In this case you store posts and comments in same collection, therefor you have to store additional field type. MongoDB doesn't support constraints or any other way to check data constancy. This means that if you delete post with _id=1, comments with _id=2 store broken link in parent_id.
You can separate posts from comments in different collections or even databases by using database references, see your driver documentation for more details.
Both solutions can store tree-structured date, but in different way.

Mongo data modeling/updates for voting (up and down)

There is an example on voting data model/update queries in Mongo:
http://www.mongodb.org/display/DOCS/MongoDB+Data+Modeling+and+Rails#MongoDBDataModelingandRails-AtomicUpdates
However I need both up and down votes (basically, one person can either cast up vote or down vote). Also, I want for voter to be able to change his mind and change upvote to downvote or vice-versa (so the list of voters and total number does not fit).
What would be the best data model and corresponding update call?
I see two possibilities, either do a
'votes': [{ 'user_id' : ... , 'vote': ±1 }]
or
'upvoters': [...], 'downvoters': [...]
But I can't make an update query for the first one yet, and second one looks a bit weird (though it may be just me).
Seems much simpler to use the second schema.
Document: { name: "name",
upvoters: [name1, name2, etc],
downvoters: [name1, name2, etc],
}
To get total votes you can get the doc and use
doc.upvoters.length-doc.downvoters.length
(start each document with upvoters and downvoters arrays being [ ])
To record any upvote by User "x" on item "c" just do:
db.votes.update({name:"c"},{$addToSet:{upvotes:"x"},$pull:{downvotes:"x"}})
This is atomic and it has advantage of doing the same thing even if you run it 10 times.
It also spares you from having to check if "x" already voted for "c" and which way.
To record downvote just reverse it:
db.votes.update({name:"c"},{$addToSet:{downvotes:"x"},$pull:{upvotes:"x"}})
First schema looks like good. Second schema is hard because when user click upvote and than downvote you need add userId to 'upvoters' that to 'downvoters' and remove from 'upvoters' and vice versa.
I suppose votes it nestead collection of some document(suppose it questions).
db.questions.update({votes.userId: .. },{ $set : { votes.$.vote : 1 } });//upvote
db.questions.update({votes.userId: .. },{ $set : { votes.$.vote : -1 } });//down
And seems you need create extra field inside of questions collection to calculate sum of up/down votes:
db.questions.update({_id: .. },{ $inc : { votesCount : 1 } }); //up vote
db.questions.update({_id: .. },{ $inc : { votesCount : -1 } }); // down vote
If you need add new user to array of votes use
Possitional operator.