how to populate field of one collection with count query results of another collection? - mongodb

Kind of a complex one here and I'm pretty new to Mongo, so hopefully someone can help. I have a db of users. Each user has a state/province listed. I'm trying to create another collection holding the total number of users in each state/province. Because users sign up pretty regularly, this is a growing total that I'm trying to generate and display on a map.
I'm able to query the database to find the total number of users in a specific state, but I want to do this for all users and come out with a list of totals for all states/provinces. I'd like a separate collection in the DB with all states/provinces listed and a TOTAL field that is dynamically populated with the count query against the other collection. But I'm not sure how to make the result of a query the value of a field in another collection.
I used this to get the user totals:
db.users.aggregate([
{"$group" : {_id:"$state", count:{$sum:1}}}
])
My main question is how to make the results of a query the value of a field in each corresponding record in another collection. Or if that's even possible.
Thanks for any help or guidance.

It looks like On-Demand Materialized Views (added in MongoDB 4.2) should solve your problem!
You can create an On-Demand Materialized View using the $merge stage.
A possible definition of the materialized view could be:
updateUsersLocationTotal = function() {
  db.users.aggregate( [
    // Optional: add a { $match: { ... } } stage here to restrict which users are counted.
    { $group: { _id: "$state", users_quantity: { $sum: 1 } } },
    { $merge: { into: "users_total", whenMatched: "replace" } }
  ] );
};
And then you perform updates just by calling updateUsersLocationTotal()
After that you can query the view just like a normal collection, using db.users_total.find() or db.users_total.aggregate().
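To make the semantics of the $group and $merge stages above concrete, here is a minimal in-memory sketch in plain JavaScript. It does not talk to MongoDB: countByState and mergeReplace are hypothetical helper names, the Map stands in for the users_total collection, and it assumes $merge's default of inserting documents that have no match.

```javascript
// Illustration only: plain JavaScript stand-ins for the $group and $merge stages.
// countByState and mergeReplace are hypothetical helper names, not MongoDB APIs.

// Rough equivalent of { $group: { _id: "$state", users_quantity: { $sum: 1 } } }
function countByState(users) {
  const totals = new Map();
  for (const user of users) {
    totals.set(user.state, (totals.get(user.state) || 0) + 1);
  }
  return [...totals].map(([state, n]) => ({ _id: state, users_quantity: n }));
}

// Rough equivalent of { $merge: { into: ..., whenMatched: "replace" } }:
// documents are matched on _id; matches are replaced, the rest are inserted.
function mergeReplace(target, docs) {
  for (const doc of docs) {
    target.set(doc._id, doc);
  }
  return target;
}

const usersTotal = new Map(); // stands in for the users_total collection
mergeReplace(usersTotal, countByState([
  { name: "a", state: "NY" },
  { name: "b", state: "NY" },
  { name: "c", state: "ON" },
]));
```

Note that a state which no longer appears in the grouped output keeps its old document in the target, in this sketch as with $merge, so stale totals may need separate cleanup.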

Related

MongoDB , how to choose the right design?

I have to create a collection of documents and I have a doubt about the right design.
Each document is an "identity"; each identity has a list of "partner data" entries; each entry is defined by an ID and a set of data.
One approach can be (1):
{
_id: ...
partners: [
{
id: partner1,
data: {
}
},
{
id: partner2,
data: {
}
},
]
}
Another approach can be (2)
{
_id: ...
partners: {
partner1: {
data: {
}
},
partner2: {
data: {
}
},
}
}
I prefer the first one, but considering that I could have millions of these identities, which schema would perform best?
A typical query can be: "how many identities have a partner with ID N".
With the second example, a query can be:
db.identities.find({"partners.partnerName": {$exists: true}})
With the first approach, how can I get this count?
The second solution is easier to handle server side: each document has an object where each KEY is a partner ID, so instead of scanning the whole document, I can simply get the partner data by key...
What do you think about these solutions? I prefer the first one, but I think the second is more "usable"...
Thanks
I prefer the first one, but considering that I could have millions of these identities, which schema would perform best?
If you are going to have millions of identities, then neither approach is really scalable.
Each document in MongoDB has a size limit (16 MB), so an embedded list of partner data cannot grow without bound.
If you are going to have a really large number of identities, the scalable approach would be to create a separate collection just for the relations and partnership data.
Now, I also want you to consider how you treat a "partnership": if I'm a user and I have you on my partners list, will you see me as a partner on your list?
If we both see each other as partners, then MongoDB may not be the best solution. Graph databases are more appropriate for dealing with relations of this type.
All solutions within Mongo for two-way relations will be built on double updates (your id on my partners list, my id on your partners list).
In SQL you could add an extra join condition, so you would not need to store each partnership twice (me and you, you and me), just once (you and me). In Mongo you can't.
Do you see where this is going?
If the relation only needs to go one way, then just create a second collection, "partnerships":
{
_id: ...,             // must be unique
user_id: 'your_id',
partner_id: 'his_id',
data: {}              // or just flatten the fields into the root object
}
Please note that you create one document for each partnership!
Then you could use $lookup to query for a user together with all of his partners.
Something like:
db.getCollection('partners').aggregate([
{
$lookup: {
from: 'partnerships',
localField: '_id',
foreignField: 'user_id',
as: 'partners'
}
},
{
$project: {
name: 1,
partners: 1,
num_partners: { $size: "$partners" }
}
}
])
Read more about the aggregation stages in the MongoDB documentation.
If you are not going to have a lot of partnerships, then please continue with your first approach, which is good.
The second approach will make most queries against this collection pretty awkward, and you will always have to write code in order to query it.
They won't be straightforward Mongo queries.
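To make the difference between the two schemas concrete, here is a small sketch of the filter each one would need for the "how many identities have partner N" query from the question. The helper names are hypothetical, and the collection name identities is taken from the question's own example.

```javascript
// Hypothetical helpers that build the filter for "identities that have partner N".

// Schema (1): partners is an array of { id, data } documents.
function filterForArraySchema(partnerId) {
  // Dot notation into an array matches when any element has that id.
  return { "partners.id": partnerId };
}

// Schema (2): partners is an object keyed by partner ID.
function filterForKeyedSchema(partnerId) {
  // The field name itself depends on the value being searched for.
  return { ["partners." + partnerId]: { $exists: true } };
}

// Usage in the shell (assuming a collection named identities):
//   db.identities.countDocuments(filterForArraySchema("partner1"))
//   db.identities.countDocuments(filterForKeyedSchema("partner1"))
```

With schema (1) the field path is fixed, so a single multikey index such as db.identities.createIndex({ "partners.id": 1 }) can serve the query for every partner ID.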

aggregating and sorting based on a Mongodb Relationship

I'm trying to figure out if what I want to do is even possible in Mongodb. I'm open to all sorts of suggestions regarding more appropriate ways to achieve what I need.
Currently, I have 2 collections:
vehicles (Contains vehicle data such as make and model. This data can be highly unstructured, which is why I turned to Mongodb for this)
views (Simply contains an IP, a date/time that the vehicle was viewed and the vehicle_id. There could be thousands of views)
I need to return a list of vehicles that have views between 2 dates. The list should include the number of views. I need to be able to sort by the number of views in addition to any of the usual vehicle fields. So, to be clear, if a vehicle has had 1000 views, but only 500 of those between the given dates, the count should return 500.
I'm pretty sure I could perform this query without any issues in MySQL - however, trying to store the vehicle data in MySQL has been a real headache in the past and it has been great moving to Mongo where I can add new data fields with ease and not worry about the structure of my database.
What do you all think?? TIA!
As it turns out, it's totally possible. It took me a long while to get my head around this, so I'm posting it up for future google searches...
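db.statistics.aggregate([
  // Filter the stats records first.
  { $match: { branch_id: { $in: [14] } } },
  // Attach the matching vehicle; $lookup always produces an array.
  { $lookup: {
      from: 'vehicles', localField: 'vehicle_id', foreignField: '_id', as: 'vehicle'
  } },
  // One result per vehicle, with the number of stats records in the group.
  { $group: {
      _id: "$vehicle_id",
      count: { $sum: 1 },
      vehicleObject: { $first: "$vehicle" }
  } },
  // Turn the one-element array into a plain embedded document.
  { $unwind: "$vehicleObject" },
  { $project: {
      // $subtract on two dates yields milliseconds, so convert to days.
      daysInStock: { $divide: [
        { $subtract: [ new Date(), "$vehicleObject.date_assigned" ] },
        1000 * 60 * 60 * 24
      ] },
      vehicleObject: 1,
      count: 1
  } },
  { $sort: { count: -1 } },
  { $limit: 10 }
]);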
To explain the above:
The MongoDB aggregation framework is the way forward for complex queries like this. First, I run a $match to filter the records. Then I use $lookup to grab the vehicle record. It's worth mentioning that this is a many-to-one relationship (lots of stats, each having a single vehicle). I can then group on the vehicle_id field, which returns one record per vehicle with a count of the number of stats in the group. Because it is a group, we technically have lots of copies of the same vehicle document in each group, so I keep just the first one in the vehicleObject field using $first. That value is still an array with a single entry, because $lookup always writes its matches into an array, so I added the $unwind stage to pull the actual vehicle out. I then added a $project stage to calculate an additional field, sorted by the count descending and limited the results to 10.
And take a breath :)
I hope that helps someone. If you know of a better way to do what I did, then I'm open to suggestions to improve.
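The question asked for views between two dates, which the pipeline above does not filter on. A possible variant, as a sketch only: the helper name buildViewCountPipeline is hypothetical, the views collection and vehicle_id field come from the question, and the field name date on the view documents is an assumption. It also groups before joining, so $lookup only runs for the vehicles that survive the $limit.

```javascript
// Hypothetical helper: builds a pipeline that counts views per vehicle
// between two dates. The field name "date" on the views documents is an assumption.
function buildViewCountPipeline(from, to, limit) {
  return [
    // Keep only views inside the requested window, before doing anything else.
    { $match: { date: { $gte: from, $lt: to } } },
    // One output document per vehicle, with the number of views in the window.
    { $group: { _id: "$vehicle_id", count: { $sum: 1 } } },
    { $sort: { count: -1 } },
    { $limit: limit },
    // Join only the remaining vehicles; $lookup always yields an array.
    { $lookup: { from: "vehicles", localField: "_id", foreignField: "_id", as: "vehicle" } },
    // Flatten the one-element array into an embedded vehicle document.
    { $unwind: "$vehicle" },
  ];
}

const viewPipeline = buildViewCountPipeline(new Date("2016-01-01"), new Date("2016-02-01"), 10);
// In the shell: db.views.aggregate(viewPipeline)
```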

MongoDB - Aggregation on referenced field

I've got a question on the design of documents in order to be able to efficiently perform aggregation. I will take a dummy example of document :
{
product: "Name of the product",
description: "A new product",
comments: [ObjectId(xxxxx), ObjectId(yyyy),....]
}
As you can see, I have a simple document which describes a product and references some comments on it. Imagine this product is very popular, so that it has millions of comments. A comment is a simple document with a date, a text and possibly some other fields. The problem is that such a product can easily become larger than 16 MB, so I must not embed the comments in the product but keep them in a separate collection.
What I would like to do now is to perform aggregation on the product collection. A first step could be, for example, to select various products and sort the comments by date. It is quite an easy operation with embedded documents, but how could I do it with such a design? I only have the ObjectIds of the comments and not their content. Of course, I'd like to perform this aggregation in a single operation, i.e. I don't want to have to perform the first part of the aggregation, then query the results and perform another aggregation.
I don't know if that's clear enough? ^^
I would go about it this way: create a temp collection that is an exact copy of the product collection, except that each entry in the comments array is a comment object instead of an ObjectId. The comment object only has the _id and date fields. This can be done in one step:
db.product.find().forEach( function (doc) {
  var comments = []; // reset per product so comments don't accumulate across documents
  doc.comments.forEach( function (x) {
    var obj = { "_id": x };
    var comment = db.comment.findOne(obj); // fetch the referenced comment
    obj["date"] = comment.date;
    comments.push(obj);
  });
  doc.comments = comments;
  db.temp.insert(doc);
});
You can then run your aggregation query against the temp collection:
db.temp.aggregate([
{
$match: {
// your match query
}
},
{
$unwind: "$comments"
},
{
$sort: { "comments.date": 1 } // sort the pipeline by comments date
}
]);
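If copying data into a temp collection is not desirable, a different technique is to resolve the references inside the pipeline with $lookup. The following is only a sketch of that alternative, not the approach described above: the helper name is hypothetical, and the collection names product and comment are taken from the code above.

```javascript
// Hypothetical helper: builds a pipeline that resolves the comment references
// with $lookup instead of copying data into a temp collection.
function buildCommentsByDatePipeline(match) {
  return [
    { $match: match },
    // One document per referenced comment id.
    { $unwind: "$comments" },
    // Fetch the referenced comment; the result field is an array.
    { $lookup: { from: "comment", localField: "comments", foreignField: "_id", as: "comment" } },
    // Flatten the one-element array into a single embedded document.
    { $unwind: "$comment" },
    // Sort the pipeline by comment date, oldest first.
    { $sort: { "comment.date": 1 } },
  ];
}

const commentPipeline = buildCommentsByDatePipeline({ product: "Name of the product" });
// In the shell: db.product.aggregate(commentPipeline)
```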

Mongodb alternative to Dot notation to edit nested fields with $inc

Conceptually, what I am trying to figure out is whether there is an alternative to dot notation for accessing nested docs in Mongo.
What I am trying to accomplish:
I have a user collection, and each user has a nested songVotes object where the keys are the songIds and the value is the user's vote: -1, 0, or 1.
I have a "room" collection where many users go and their collective votes for each song influence the room. A single room also has a nested songVotes object with songIds as keys, but the value is the accumulated total of the votes of all the users in the room. For the purposes of Meteor.js, it's more efficient to add a user's votes to this nested cumulative vote object as the user enters the room.
Again, because reactive joins in Meteor.js aren't supported in any kind of efficient way, it also doesn't make sense to break out these nested objects to solve my problem.
So what I am having trouble with is the update operation when a user first enters the room, where I take a single user's nested songVotes object and use the Mongo $inc operator to apply it to the nested cumulative songVotes object of the entire room.
The problem is that if you want to use the $inc operator with nested fields, you must use dot notation to access them. So what I am asking, in a broad sense, is whether there is a nice way to apply updates like this to a nested object. Or perhaps a way to specify a global dot notation prefix for $inc, something like:
var userVotes = db.collection.users.findOne('user_id').songVotes
// userVotes --> { 'song1': 1, 'song2': -1 ... }
db.rooms.update({ _id: 'blah' }, { $set: { roomSongVotes: { $inc: userVotes } } })
You do need to use dot notation, but you can still do that in your case by programmatically building up the update object:
var userVotes = {'song1': 1, 'song2': -1};
var update = {$inc: {}};
for (var songId in userVotes) {
update.$inc['roomSongVotes.' + songId] = userVotes[songId];
}
db.rooms.update({ _id: 'blah' }, update);
This way, update gets built up as:
{ '$inc': { 'roomSongVotes.song1': 1, 'roomSongVotes.song2': -1 } }
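The same idea can be wrapped in a small reusable function. This is a sketch under a few assumptions: buildIncUpdate is a hypothetical name, zero votes are skipped because they would not change the total, and song ids containing "." or starting with "$" are rejected because they would be misread as paths or operators.

```javascript
// Hypothetical helper: turns { song1: 1, song2: -1 } into a $inc update
// whose keys are dot-notation paths under the given prefix.
function buildIncUpdate(prefix, votes) {
  const inc = {};
  for (const [songId, vote] of Object.entries(votes)) {
    // Such ids would be interpreted as nested paths or operators, so refuse them.
    if (songId.includes(".") || songId.startsWith("$")) {
      throw new Error("unsupported song id: " + songId);
    }
    // A zero vote changes nothing, so leave it out of the update.
    if (vote !== 0) inc[prefix + "." + songId] = vote;
  }
  return { $inc: inc };
}

const roomUpdate = buildIncUpdate("roomSongVotes", { song1: 1, song2: -1, song3: 0 });
// In the shell: db.rooms.update({ _id: 'blah' }, roomUpdate)
```

If every vote is zero the $inc document ends up empty, which the server may reject, so the caller may want to skip the update in that case.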

how do I do 'not-in' operation in mongodb?

I have two collections: shoppers (everyone in a shop on a given day) and beach-goers (everyone on the beach on a given day). There are entries for each day, and a person can be on the beach, or shopping, or doing both, or doing neither on any day. I now want to run a query for all shoppers in the last 7 days who did not go to the beach.
I am new to Mongo, so it might be that my schema design is not appropriate for NoSQL DBs. I saw similar questions about joins, and in most cases the suggestion was to denormalize. So one solution I could think of is to create a collection, activity, indexed on date, with the user's actions embedded. So something like
{
user_id
date
actions {
[action_type, ..]
}
}
Insertion now becomes costly, as now I will have to query before insert.
A few suggestions.
Figure out all the queries you'll be running, and all the types of data you will need to store. For example, do you expect to add activities in the future or will beach and shop be all?
Consider how many writes vs. reads you will have and which has to be faster.
Determine how your documents will grow over time to make sure your schema is scalable in the long term.
Here is one possible approach, if you will only have these two activities ever. One record per user per day.
{ user: "user1",
date: "2012-12-01",
shopped: 0,
beached: 1
}
Now your query becomes even simpler, whether you have two or ten activities.
When new activity comes in you always have to update the correct record based on it.
If you were thinking you could just append a record to your collection indicating user, date, activity then your inserts are much faster but your queries now have to do a LOT of work querying for both users, dates and activities.
With proposed schema, here is the insert/update statement:
db.coll.update({"user": "username", "date": "somedate"}, {$inc: {"shopped": 1}}, true)
What that's saying is: "for username on somedate, increment their shopped attribute by 1, and create the record if it doesn't exist" (an "upsert"; that's the last 'true' argument).
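As a sketch of how the upsert described above could be generated for any activity, the hypothetical helper below builds the three arguments. The field names user and date follow the proposed schema; the activity name is used as the counter field.

```javascript
// Hypothetical helper: builds the arguments for the per-user, per-day upsert.
function activityUpsert(user, date, activity) {
  return {
    filter: { user: user, date: date },
    // $inc creates the field with the given value when it does not exist yet.
    update: { $inc: { [activity]: 1 } },
    // Insert the document if no record exists for this user and date.
    options: { upsert: true },
  };
}

const op = activityUpsert("user1", "2012-12-01", "shopped");
// In the shell: db.coll.updateOne(op.filter, op.update, op.options)
```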
Here is the query for all users on a particular day who did activity1 more than once but didn't do any of activity2.
db.coll.find({"date":"somedate","shopped":0,"danced":{$gt:1}})
Be wary of picking a schema where a single document can have continuous and unbounded growth.
For example, storing everything in a users collection where the array of dates and activities keeps growing will run into this problem. Keep in mind that large documents will keep getting pulled into your working set, and if they are huge and hold a lot of useless (old) data, that will hurt the performance of your application, as will fragmentation of data on disk.
Remember, you don't have to put all the data into a single collection. It may be best to have a users collection with a fixed set of attributes per user, where you track how many friends they have or other semi-stable information about them, and also a user_activity collection where you add a record per user per day listing the activities they did. The amount of normalizing or denormalizing of your data is very tightly coupled to the types of queries you will be running on it, which is why figuring out what those are is the first suggestion I made.
Insertion now becomes costly, as now I will have to query before insert.
Keep in mind that even with RDBMS, insertion can be (relatively) costly when there are indices in place on the table (ie, usually). I don't think using embedded documents in Mongo is much different in this respect.
For the query, as Asya Kamsky suggests, you can use the $nin operator to find everyone who didn't go to the beach. E.g.:
db.people.find({
actions: { $nin: ["beach"] }
});
Using embedded documents probably isn't the best approach in this case though. I think the best would be to have a "flat" activities collection with documents like this:
{
user_id
date
action
}
Then you could run a query like this:
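var start = new Date(2012, 5, 27); // June 27, 2012 (JavaScript months are zero-based)
var end = new Date(2012, 6, 3);    // July 3, 2012; start must be earlier than end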
db.activities.find({
date: {$gte: start, $lt: end },
action: { $in: ["beach", "shopping" ] }
});
The last step would be on your client driver, to find user ids where records exist for "shopping", but not for "beach" activities.
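That client-side step could look roughly like the sketch below. The function name is hypothetical, and it assumes user_id values are strings or numbers; ObjectId values would need to be converted to strings first, since a Set compares objects by identity.

```javascript
// Hypothetical client-side step: given the documents returned by the query above,
// return the ids of users who have a "shopping" record but no "beach" record.
function shoppersNotOnBeach(activities) {
  const shoppers = new Set();
  const beachGoers = new Set();
  for (const a of activities) {
    if (a.action === "shopping") shoppers.add(a.user_id);
    else if (a.action === "beach") beachGoers.add(a.user_id);
  }
  // Set difference: shoppers minus beach-goers.
  return [...shoppers].filter((id) => !beachGoers.has(id));
}

const onlyShoppers = shoppersNotOnBeach([
  { user_id: 1, date: new Date(2012, 5, 28), action: "shopping" },
  { user_id: 1, date: new Date(2012, 5, 29), action: "beach" },
  { user_id: 2, date: new Date(2012, 5, 30), action: "shopping" },
]);
// onlyShoppers contains only user 2
```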
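One possible structure is to use an embedded array of documents (a users collection). Note that the dates need to be stored as Date values rather than strings for date range comparisons to work:
{
  user_id: 1234,
  actions: [
    { action_type: "beach", date: new Date(2012, 5, 1) },    // June 1, 2012
    { action_type: "shopping", date: new Date(2012, 5, 2) }  // June 2, 2012
  ]
},
{ another user }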
Then you can do a query like this, using $elemMatch to find users matching certain criteria (in this case, people who went shopping in the last three days):
var start = new Date(2012, 6, 1);
db.people.find( {
actions : {
$elemMatch : {
action_type : { $in: ["shopping"] },
date : { $gt : start }
}
}
});
Expanding on this, you can use the $and operator to find all people who went shopping but did not go to the beach in the past three days:
var start = new Date(2012, 6, 1);
db.people.find( {
  // Each $and element must be its own document.
  $and: [
    { actions : {
        $elemMatch : {
          action_type : { $in: ["shopping"] },
          date : { $gt : start }
        }
    } },
    { actions : {
        $not: {
          $elemMatch : {
            action_type : { $in: ["beach"] },
            date : { $gt : start }
          }
        }
    } }
  ]
});