Aggregate query with no $match - mongodb

I have a collection in which unique documents from a different collection can appear over and over again (item in the example below), depending on how often users share them. I want to create an aggregate query that finds the most shared documents. No $match is necessary because I'm not matching on any criteria; I'm just querying for the most shared. Right now I have:
db.stories.aggregate(
{
$group: {
_id:'item.id',
'item': {
$first: '$item'
},
'total': {
$sum: 1
}
}
}
);
However this only returns 1 result. It occurs to me I might just need to do a simple find query, but I want the results aggregated, so that each result has the item and total is how many times it's appeared in the collection.
Example of a document in the stories collection:
{
_id: ObjectId('...'),
user: {
id: ObjectId('...'),
propertyA: ...,
propertyB: ...,
etc
},
item: {
id: ObjectId('...'),
propertyA: ...,
propertyB: ...,
etc
}
}
users and items each have their own collections as well.

Change the line
_id:'item.id'
to
_id:'$item.id'
Currently you group by the constant 'item.id' and therefore you only get one document as result.
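A minimal sketch of the corrected pipeline, written as a plain array so it can be passed to the shell. The $sort and $limit stages are additions that go beyond the one-line fix, and the cutoff of 10 is an arbitrary assumption:

```javascript
// Sketch only: $sort and $limit are additions, and 10 is an arbitrary cutoff.
const mostSharedPipeline = [
  {
    $group: {
      _id: '$item.id',            // field path (note the $): one group per distinct item id
      item: { $first: '$item' },  // keep one copy of the embedded item
      total: { $sum: 1 }          // count how often the item appears
    }
  },
  { $sort: { total: -1 } },       // most shared first
  { $limit: 10 }
];

// In the mongo shell:
// db.stories.aggregate(mostSharedPipeline);
```

Without the leading $, the string 'item.id' is treated as a literal, so every document falls into the same group.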

Related

Speed up aggregation on large collection

I currently have a database with about 270 000 000 documents. They look like this:
[{
'location': 'Berlin',
'product': 4531,
'createdAt': ISODate(...),
'value': 3523,
'minOffer': 3215,
'quantity': 7812
},{
'location': 'London',
'product': 1231,
'createdAt': ISODate(...),
'value': 53523,
'minOffer': 44215,
'quantity': 2812
}]
The database currently holds a bit over one month of data and has ~170 locations (in EU and US) with ~8000 products. These documents represent timesteps, so there are about ~12-16 entries per day, per product per location (at most 1 per hour though).
My goal is to retrieve all timesteps of a product in a given location for the last 7 days. For a single location this query works reasonably fast (150ms) with the index { product: 1, location: 1, createdAt: -1 }.
However, I also need these timesteps not just for a single location, but an entire region (so about 85 locations). I'm currently doing that with this aggregation, which groups all the entries per hour and averages the desired values:
this.db.collection('...').aggregate([
{ $match: { location: { $in: [array of ~85 locations] }, product: productId, createdAt: { $gte: new Date(Date.now() - sevenDaysAgo) } } }, {
$group: {
_id: {
$toDate: {
$concat: [
{ $toString: { $year: '$createdAt' } },
'-',
{ $toString: { $month: '$createdAt' } },
'-',
{ $toString: { $dayOfMonth: '$createdAt' } },
' ',
{ $toString: { $hour: '$createdAt' } },
':00'
]
}
},
value: { $avg: '$value' },
minOffer: { $avg: '$minOffer' },
quantity: { $avg: '$quantity' }
}
}
]).sort({ _id: 1 }).toArray()
However, this is really really slow, even with the index { product: 1, createdAt: -1, location: 1 } (~40 secs). Is there any way to speed up this aggregation so it goes down to a few seconds at most? Is this even possible, or should I think about using something else?
I've thought about saving these aggregations in another database and just retrieving that and aggregating the rest, but this is really awkward for the first users on the site, who would have to sit through a 40-second wait.
These are some ideas which can benefit the querying and performance. Whether all of them will work together is a matter of some trials and testing. Also, note that changing the way data is stored and adding new indexes means that there will be changes to the application (i.e., how data is captured), and the other queries on the same data need to be carefully verified so that they are not affected in a wrong way.
(A) Storing a Day's Details in a Document:
Store (embed) a day's data within the same document as an array of sub-documents. Each sub-document represents an hour's entry.
From:
{
'location': 'London',
'product': 1231,
'createdAt': ISODate(...),
'value': 53523,
'minOffer': 44215,
'quantity': 2812
}
to:
{
location: 'London',
product: 1231,
createdAt: ISODate(...),
details: [ { value: 53523, minOffer: 44215, quantity: 2812 }, ... ]
}
This means about ten entries per document. Adding data for an entry will be pushing data into the details array, instead of adding a document as in present application. In case the hour's info (time) is required it can also be stored as part of the details sub-document; it will entirely depend upon your application needs.
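As a rough sketch of how an hourly entry could be pushed into the day's document under this design: the helper names below are hypothetical, and it is an assumption that createdAt is truncated to the start of the UTC day and that each sub-document records its hour.

```javascript
// Hypothetical helpers; the day-bucket convention (createdAt = start of UTC day) is an assumption.
function startOfUtcDay(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

function buildHourlyUpsert(entry) {
  return {
    // One document per location, product and day
    filter: {
      location: entry.location,
      product: entry.product,
      createdAt: startOfUtcDay(entry.createdAt)
    },
    // Append this hour's values to the details array
    update: {
      $push: {
        details: {
          hour: entry.createdAt.getUTCHours(),
          value: entry.value,
          minOffer: entry.minOffer,
          quantity: entry.quantity
        }
      }
    },
    // Create the day's document if it does not exist yet
    options: { upsert: true }
  };
}

const hourlyOp = buildHourlyUpsert({
  location: 'London',
  product: 1231,
  createdAt: new Date('2020-01-15T13:05:00Z'),
  value: 53523,
  minOffer: 44215,
  quantity: 2812
});

// In the mongo shell:
// db.collection.updateOne(hourlyOp.filter, hourlyOp.update, hourlyOp.options);
```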
The benefits of this design:
The number of documents to maintain and query will reduce (per product per day about ten documents).
In the query, the group stage will go away. This will be just a project stage. Note that $project supports the accumulators $avg and $sum.
The following stage will compute the averages for the day (that is, for one document).
{
$project: { value: { $avg: '$details.value' }, minOffer: { $avg: '$details.minOffer' }, quantity: { $avg: '$details.quantity' } }
}
Note the increase in size of the document is not much, with the amount of details being stored per day.
(B) Querying by Region:
At present, multiple locations (a region) are matched with this query filter: { location: { $in: [array of ~85 locations] } }. This filter says: location: location-1, -or- location: location-3, -or- ..., location: location-50. Adding a new field, region, lets the filter match on a single value instead.
The query by region will change to:
{
$match: {
region: regionId,
product: productId,
createdAt: { $gte: new Date(Date.now() - sevenDaysAgo) }
}
}
The regionId variable is to be supplied to match with the region field.
Note that both queries, "by location" and "by region", will benefit from the above two considerations, A and B.
(C) Indexing Considerations:
The present index: { product: 1, location: 1, createdAt: -1 }.
Taking into consideration, the new field region, newer indexing will be needed. The query with region cannot benefit without an index on the region field. A second index will be needed; a compound index to suit the query. Creating an index with the region field means additional overhead on write operations. Also, there will be memory and storage considerations.
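One possible shape for the second index, as a sketch only. The field order (equality fields first, the range field last) is an assumption that would need to be confirmed with explain, and regionId, productId and since are placeholders:

```javascript
// Hypothetical compound index for the "by region" query; the field order is an assumption.
const regionIndexKeys = { region: 1, product: 1, createdAt: -1 };

// In the mongo shell:
// db.collection.createIndex(regionIndexKeys);
//
// Then check which index the query actually uses:
// db.collection.find({
//   region: regionId,
//   product: productId,
//   createdAt: { $gte: since }
// }).explain('executionStats');
```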
NOTES:
After adding the index, both the queries ("by location" and "by region") need to be verified using explain if they are using their respective indexes. This will require some testing; a trial-and-error process.
Again, adding new data, storing data in a different format, and adding new indexes require considering the following:
Careful testing and verifying that the other existing queries perform as usual.
The change in data capture needs.
Testing the new queries and verifying if the new design performs as expected.
Honestly your aggregation is pretty much as optimized as it can get, especially if you have { product: 1, createdAt: -1, location: 1 } as an index like you stated.
I'm not exactly sure how your entire product is built, however the best solution in my opinion is to have another collection containing just the "relevant" documents from the past week.
Then you could query that collection with ease. This is quite easy to do in Mongo as well, using a TTL index.
If this is not an option, you could add a temporary field to the "relevant" documents and query on that, making it somewhat faster to retrieve them. However, maintaining this field will require a process that runs every X amount of time, which could make your results not 100% accurate, depending on when you decide to run it.
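The TTL-index suggestion could be sketched as follows. The collection name recentTimesteps is hypothetical, and the sketch assumes the application writes each new document to both collections:

```javascript
// Sketch: a separate collection that only keeps the last seven days of documents.
const SEVEN_DAYS_IN_SECONDS = 7 * 24 * 60 * 60;

// A TTL index must be a single-field index on a date field.
const ttlIndexKeys = { createdAt: 1 };
const ttlIndexOptions = { expireAfterSeconds: SEVEN_DAYS_IN_SECONDS };

// In the mongo shell (recentTimesteps is a hypothetical collection name):
// db.recentTimesteps.createIndex(ttlIndexKeys, ttlIndexOptions);
// db.recentTimesteps.createIndex({ product: 1, createdAt: -1, location: 1 });
```

Expired documents are removed by a background task rather than at the exact expiry instant, so the query should still keep its createdAt filter.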

Updating multiple subdocument arrays in MongoDB

I have a collection full of products each of which has a subdocument array of up to 100 variants (SKUs) of that product:
e.g.
{
'_id': 12345678,
'handle': 'my-product-handle',
'updated': false,
'variants': [
{
'_id': 123412341234,
'sku': 'abc123',
'inventory': 1
},
{
'_id': 123412341235,
'sku': 'abc124',
'inventory': 2
},
...
]
}
My goal is to be able to update the inventory quantity of all instances of a SKU number. It is important to note that in the system I'm working with, SKUs are not unique. Therefore, if a SKU shows up multiple times in a single product or across multiple products, they all need to be updated to the new inventory quantity.
Furthermore, I need the "updated" field to be changed to "true" *only if the inventory quantity for that SKU has changed*.
As an example, if I want to update all instances of SKU "abc123" to have 25 inventory, the example above would become this:
{
'_id': 12345678,
'handle': 'my-product-handle',
'updated': true,
'variants': [
{
'_id': 123412341234,
'sku': 'abc123',
'inventory': 25
},
{
'_id': 123412341235,
'sku': 'abc124',
'inventory': 2
},
...
]
}
Thoughts?
MongoDB 3.6 has introduced the filtered positional operator $[<identifier>] which can be used to update multiple elements of an array which match an array filter condition. You can read more about this operator here: https://docs.mongodb.com/manual/reference/operator/update/positional-filtered/
For example, to update all elements of the variants array where sku is "abc123" across every document in the collection:
db.collection.update({}, { $set: { "variants.$[el].inventory": 25 }}, { multi: true, arrayFilters: [{ "el.sku": "abc123"}] })
Unfortunately I'm not aware of any way in a single query to update a document's field based on whether another field in the document was updated. This is something you would have to implement with some client-side logic and a second query.
EDIT (thanks to Asya's comment):
You can do this in a single query by only matching documents which will be modified. So if nMatched and nModified are necessarily equal, you can just set updated to true. For example, I think this would solve the problem in a single query:
db.collection.update({ variants: { $elemMatch: { inventory: { $ne: 25 }, sku: "abc123" } } }, { $set: { "variants.$[el].inventory": 25, updated: true }}, { multi: true, arrayFilters: [{ "el.sku": "abc123"}] })
First you match documents where the variants array contains documents where the sku is "abc123" and the inventory does not equal the number you are setting it to. Then you go ahead and set the inventory on all matching subdocuments and set updated to true.
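The same single-query idea can be sketched with updateMany, which takes arrayFilters without needing multi: true. The helper name buildInventoryUpdate is hypothetical:

```javascript
// Hypothetical helper that builds the filter, update and options for one SKU.
function buildInventoryUpdate(sku, newInventory) {
  return {
    // Only match products where at least one variant with this SKU would actually change
    filter: {
      variants: { $elemMatch: { sku: sku, inventory: { $ne: newInventory } } }
    },
    // Set the inventory on every variant with this SKU and flag the product as updated
    update: {
      $set: { 'variants.$[el].inventory': newInventory, updated: true }
    },
    options: { arrayFilters: [{ 'el.sku': sku }] }
  };
}

const inventoryUpdate = buildInventoryUpdate('abc123', 25);

// In the mongo shell (MongoDB 3.6+):
// db.collection.updateMany(inventoryUpdate.filter, inventoryUpdate.update, inventoryUpdate.options);
```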

How to use $nin to exclude array of objects in Meteor, MongoDB, React

I want to exclude an array of objects from my query when fetching an object.
mixins: [ReactMeteorData],
getMeteorData() {
// Standard stuff
var selector = {};
var handle = Meteor.subscribe('videos', selector);
var data = {};
// Return data = OK!
data.video = Videos.findOne({ _id: this.props._id });
// Fetch objects with $lt _id to exclude, Return _id field array = OK!
data.excnext = Videos.find({ votes: data.video.votes, _id: {$lt: data.video._id}}, {fields: {_id: 1}, sort: {_id: 1}, limit: 50}).fetch();
// Fetch objects by Votes, Exclude array of objects with $nin = NOT OK!
data.next = Videos.findOne({ _id: { $ne: this.props._id, $nin:data.excnext }, votes: { $gte: data.video.votes}},{ sort: { votes: 1, _id: 1 }});
return data;
},
Why is $nin not working as expected?
I am unsure whether I am doing something wrong when fetching my array or when passing it to $nin.
Logged example = data.excnext
[ { _id: 'A57WgS6n3Luu23A4N' },
{ _id: 'JDarJMxPAnmeTwgK4' },
{ _id: 'DqaeqTfi8RyvPPTiD' },
{ _id: 'BN5qShBJzd6N7cRzh' },
{ _id: 'BSw2FAthNLjav5T4w' },
{ _id: 'Mic849spXA25EAWiP' } ]
Grinding on my first app using this stack. My core setup is Meteor, Flow-router-ssr, React-layout, MongoDB, React. What I am trying to do is fetch the next object by votes. The problem is that sometimes several objects have the same number of votes, so I then need to sort by id and exclude unwanted objects.
First of all, I am looking for how to use $nin correctly in my example above.
Second, suggestions / examples of how to do this better are welcome. There could be a much better and simpler way to do this. Information is not easy to find, and without any previous experience there is a chance that I am complicating this more than needed.
// ❤ peace
In this case $nin needs an array of id strings, not an array of objects which have an _id field. Give this a try:
data.excnext = _.pluck(Videos.find(...).fetch(), '_id');
That uses pluck to extract an array of ids which you can then use in your subsequent call to findOne.
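If underscore is not at hand, the same extraction can be sketched with Array.prototype.map. The sample data is taken from the logged output above; currentVideoId and currentVotes are placeholders standing in for this.props._id and data.video.votes:

```javascript
// Sample of what Videos.find(...).fetch() returns: objects, not id strings
const excnext = [
  { _id: 'A57WgS6n3Luu23A4N' },
  { _id: 'JDarJMxPAnmeTwgK4' },
  { _id: 'DqaeqTfi8RyvPPTiD' }
];

// Extract the plain id strings that $nin expects
const excludedIds = excnext.map(function (doc) { return doc._id; });

// Placeholders for this.props._id and data.video.votes
const currentVideoId = 'currentVideoId';
const currentVotes = 0;

const nextSelector = {
  _id: { $ne: currentVideoId, $nin: excludedIds },
  votes: { $gte: currentVotes }
};

// In Meteor:
// data.next = Videos.findOne(nextSelector, { sort: { votes: 1, _id: 1 } });
```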

Returning whole object in MongoDB aggregation

I have an Item schema that holds item details along with the respective restaurant. I have to find all items of a particular restaurant and group them by 'type' and 'category' (both are fields in the Item schema). I am able to group the items as I want, but I am not able to get the complete item object.
My query:
db.items.aggregate([{
'$match': {
'restaurant': ObjectId("551111450712235c81620a57")
}
}, {
'$group': {
id: {
'$push': '$_id'
}
, _id: {
type: '$type'
, category: '$category'
}
}
}, {
$project: {
id: '$id'
}
}])
I have seen one method that adds each field value to the group and then projects it. As I have many fields in my Item schema, I don't feel this would be a good solution for me. Can I get the complete object instead of only the ids?
Well you can always use $$ROOT providing that your server is MongoDB 2.6 or greater:
db.items.aggregate([
{ '$match': {'restaurant': ObjectId("551111450712235c81620a57")}},
{ '$group':{
_id : {
type : '$type',
category : '$category'
},
id: { '$push': '$$ROOT' },
}}
])
Which is going to place every whole object into the members of the array.
You need to be careful when doing this, as with larger results you are likely to exceed the 16MB BSON document size limit.
I would suggest that you are trying to construct some kind of "search results", with "facet counts" or similar. For that you are better off running a separate query for the "aggregation" part and one for the actual document results.
That is a much safer and more flexible approach than trying to group everything together.
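The two-query approach might look roughly like this. The helper name, the count field and the sort order are assumptions for illustration:

```javascript
// Hypothetical helper: counts per (type, category) without embedding whole documents.
function buildFacetCountPipeline(restaurantId) {
  return [
    { $match: { restaurant: restaurantId } },
    {
      $group: {
        _id: { type: '$type', category: '$category' },
        count: { $sum: 1 }
      }
    }
  ];
}

// Sort order for the separate document query, so items of one group arrive together
const itemsSort = { type: 1, category: 1 };

// In the shell the id would be wrapped in ObjectId(...); a plain string is used here
const facetCountPipeline = buildFacetCountPipeline('551111450712235c81620a57');

// In the mongo shell:
// db.items.aggregate(buildFacetCountPipeline(ObjectId("551111450712235c81620a57")));
// db.items.find({ restaurant: ObjectId("551111450712235c81620a57") }).sort(itemsSort);
```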

MongoDB sorting documents by nested data

Considering the following design for posts:
{
title: string,
body: string,
comments: [
{name: string, comment: string, ...},
{name: string, comment: string, ...},
...
]
}
...
1) I would like to select all posts in my collection and have them sorted so that the posts with the most comments come first. I'm assuming that, since the .length property is always set via JavaScript, it is possible to sort by it, but I don't know how, or whether it's actually more efficient to store the comment count in a field in the post document.
1.1) Or does it make more sense to store the comment count in a separate document and continuously update that?
2) When selecting posts, is it possible to limit the result to only return back the last 3 comments of a post document as opposed to the whole array?
You need to use the aggregate command.
This should give you a list of post _id values with the number of comments, sorted by the count in descending order.
You can use the $limit operator to return the top x rows, e.g. { $limit : 5 }
db.posts.aggregate([
{ $unwind : "$comments" },
{ $group : { _id : "$_id" , number : { $sum : 1 } } },
{ $sort : { number : -1 } }
]);
Take a look
http://docs.mongodb.org/manual/tutorial/aggregation-examples/
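As an alternative sketch (not part of the answer above), the $size aggregation operator (MongoDB 2.6+) can count comments without unwinding, and a $slice projection can return only the last three comments, as asked in 2). The field name commentCount is an assumption:

```javascript
// Count comments per post without $unwind; $ifNull guards posts that lack a comments array.
const postsByCommentCount = [
  {
    $project: {
      title: 1,
      commentCount: { $size: { $ifNull: ['$comments', []] } }
    }
  },
  { $sort: { commentCount: -1 } }
];

// Projection that returns only the last three elements of the comments array
const lastThreeComments = { comments: { $slice: -3 } };

// In the mongo shell:
// db.posts.aggregate(postsByCommentCount);
// db.posts.find({}, lastThreeComments);
```

Unlike the $unwind version, this keeps posts that have no comments, with a count of 0.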