Mongo match based on sum of array components - mongodb

I have the following schema and I am having trouble returning the data that I want.
var Book = new Schema({
ISBN: String,
title: String,
author: String,
image: String,
availability: [{zipcode: String,
total: Number,
loaned: Number
}]
});
I would like to return a random sample of items (maybe 25) for which copies are available. Here, availability means that total is greater than loaned in at least one of the entries under "availability" (or, alternatively, that the sum of total exceeds the sum of loaned across all entries). Every time I start, I seem to run into a wall. Does anyone have any idea if this is possible or how to do it?

I did finally get it working by doing this:
db.books.aggregate(
[
{
$addFields: {
available: {$gt: [{$sum: "$availability.total"}, {$sum: "$availability.loaned"}]}
}
},
{
$match: {available: true}
},
{
$sample: {size: 25}
}
]
)
I'm not sure if there are any downsides to this approach, but with limited testing it does seem to work for me.
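For the other definition mentioned in the question (at least one availability entry with total greater than loaned), a minimal sketch could look like the following. It assumes MongoDB 3.6 or later for $expr, and the helper name availableBooksPipeline is hypothetical. As long as loaned never exceeds total within a single entry, both definitions select the same books.

```javascript
// Hypothetical helper (not from the original post): builds a pipeline that keeps
// books with at least one availability entry where total > loaned, then samples.
function availableBooksPipeline(size) {
  return [
    {
      $match: {
        $expr: {
          $anyElementTrue: [
            {
              $map: {
                // Treat a missing availability array as empty.
                input: { $ifNull: ["$availability", []] },
                as: "entry",
                in: { $gt: ["$$entry.total", "$$entry.loaned"] }
              }
            }
          ]
        }
      }
    },
    { $sample: { size: size } }
  ];
}

// Usage in the mongo shell, assuming the collection is named "books":
// db.books.aggregate(availableBooksPipeline(25))
```

Neither variant can use an index for the comparison, so both scan the collection before sampling.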

Related

What is the best practice to find mongo documents count?

I want to know the performance difference between countDocuments and a find query.
I have to find the count of documents matching a certain filter. Which approach is better and takes less time?
db.collection.countDocuments({ userId: 12 })
or
db.collection.find({ userId: 12 }) and then using the length of the resulting array.
You should definitely use db.collection.countDocuments() if you don't need the data. This method uses an aggregation pipeline with the filters you pass on and only returns the count so you don't waste processing and time waiting for an array with all results.
This:
db.collection.countDocuments({ userId: 12 })
Is equivalent to:
db.collection.aggregate([
{ $match: { userId: 12 } },
{ $group: { _id: null, n: { $sum: 1 } } }
])

Speed up aggregation on large collection

I currently have a database with about 270 000 000 documents. They look like this:
[{
'location': 'Berlin',
'product': 4531,
'createdAt': ISODate(...),
'value': 3523,
'minOffer': 3215,
'quantity': 7812
},{
'location': 'London',
'product': 1231,
'createdAt': ISODate(...),
'value': 53523,
'minOffer': 44215,
'quantity': 2812
}]
The database currently holds a bit over one month of data and has ~170 locations (in EU and US) with ~8000 products. These documents represent timesteps, so there are about ~12-16 entries per day, per product per location (at most 1 per hour though).
My goal is to retrieve all timesteps of a product in a given location for the last 7 days. For a single location this query works reasonably fast (150ms) with the index { product: 1, location: 1, createdAt: -1 }.
However, I also need these timesteps not just for a single location, but an entire region (so about 85 locations). I'm currently doing that with this aggregation, which groups all the entries per hour and averages the desired values:
this.db.collection('...').aggregate([
{ $match: { location: { $in: [array of ~85 locations] }, product: productId, createdAt: { $gte: new Date(Date.now() - sevenDaysAgo) } } }, {
$group: {
_id: {
$toDate: {
$concat: [
{ $toString: { $year: '$createdAt' } },
'-',
{ $toString: { $month: '$createdAt' } },
'-',
{ $toString: { $dayOfMonth: '$createdAt' } },
' ',
{ $toString: { $hour: '$createdAt' } },
':00'
]
}
},
value: { $avg: '$value' },
minOffer: { $avg: '$minOffer' },
quantity: { $avg: '$quantity' }
}
}
]).sort({ _id: 1 }).toArray()
However, this is really really slow, even with the index { product: 1, createdAt: -1, location: 1 } (~40 secs). Is there any way to speed up this aggregation so it goes down to a few seconds at most? Is this even possible, or should I think about using something else?
I've thought about saving these aggregations in another database and just retrieving that and aggregating the rest. However, this is really awkward for the first users on the site, who would have to sit through a 40-second wait.
These are some ideas that can benefit querying and performance. Whether they all work together is a matter of trial and testing. Also, note that changing the way data is stored and adding new indexes means there will be changes to the application (i.e., to how data is captured), and the other queries on the same data need to be carefully verified to make sure they are not adversely affected.
(A) Storing a Day's Details in a Document:
Store (embed) a day's data within the same document as an array of sub-documents. Each sub-document represents an hour's entry.
From:
{
'location': 'London',
'product': 1231,
'createdAt': ISODate(...),
'value': 53523,
'minOffer': 44215,
'quantity': 2812
}
to:
{
location: 'London',
product: 1231,
createdAt: ISODate(...),
details: [ { value: 53523, minOffer: 44215, quantity: 2812 }, ... ]
}
This means about ten entries per document. Adding data for an entry will be pushing data into the details array, instead of adding a document as in present application. In case the hour's info (time) is required it can also be stored as part of the details sub-document; it will entirely depend upon your application needs.
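Pushing an entry into the per-day document could be sketched as follows. This is only an illustration under assumptions: the helper name bucketUpdate, the hour field, and the choice of midnight UTC as the day key are not from the original post.

```javascript
// Hypothetical helper: builds the filter and update for appending one hourly
// entry to the per-day document (one document per product, location and day).
function bucketUpdate(entry) {
  // Truncate the timestamp to midnight UTC so all entries of a day share one key.
  const day = new Date(Date.UTC(
    entry.createdAt.getUTCFullYear(),
    entry.createdAt.getUTCMonth(),
    entry.createdAt.getUTCDate()
  ));
  return {
    filter: { product: entry.product, location: entry.location, createdAt: day },
    update: {
      $push: {
        details: {
          hour: entry.createdAt.getUTCHours(), // assumed extra field
          value: entry.value,
          minOffer: entry.minOffer,
          quantity: entry.quantity
        }
      }
    },
    // upsert creates the day's document when the first entry arrives.
    options: { upsert: true }
  };
}

// Usage with the Node.js driver, assuming `collection` is a connected Collection:
// const { filter, update, options } = bucketUpdate(entry);
// await collection.updateOne(filter, update, options);
```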
The benefits of this design:
The number of documents to maintain and query will reduce (per product per day about ten documents).
In the query, the group stage will go away. This will be just a project stage. Note that the $project supports accumulators $avg and $sum.
The following stage will create the sums and averages for the day (or a document).
{
$project: { value: { $avg: '$details.value' }, minOffer: { $avg: '$details.minOffer' }, quantity: { $avg: '$details.quantity' } }
}
Note the increase in size of the document is not much, with the amount of details being stored per day.
(B) Querying by Region:
At present, multiple locations (a region) are matched with this query filter: { location: { $in: [array of ~85 locations] } }. This filter says: location: location-1, -or- location: location-3, -or- ..., location: location-50, and so on for every location in the array. Adding a new field, region, lets the query filter by matching a single value.
The query by region will change to:
{
$match: {
region: regionId,
product: productId,
createdAt: { $gte: new Date(Date.now() - sevenDaysAgo) }
}
}
The regionId variable is to be supplied to match with the region field.
Note that both queries, "by location" and "by region", will benefit from the above two considerations, A and B.
(C) Indexing Considerations:
The present index: { product: 1, location: 1, createdAt: -1 }.
Taking the new field region into consideration, new indexing will be needed. The query by region cannot benefit without an index on the region field. A second index will be needed: a compound index to suit the query. Creating an index with the region field means additional overhead on write operations. There will also be memory and storage considerations.
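A possible shape for that second index, as a sketch only: the field order below (equality fields first, then the range field) is an assumption to be checked with explain, and the collection name timesteps, the region value "EU", and the helper regionMatch are placeholders.

```javascript
// Assumed compound index for the region query: equality fields first,
// then the range field.
const regionIndex = { region: 1, product: 1, createdAt: -1 };

// Hypothetical helper: builds the $match stage from section (B) for the last `days` days.
function regionMatch(regionId, productId, days) {
  const since = new Date(Date.now() - days * 24 * 60 * 60 * 1000);
  return {
    $match: { region: regionId, product: productId, createdAt: { $gte: since } }
  };
}

// Usage in the mongo shell, with a placeholder collection name:
// db.timesteps.createIndex(regionIndex)
// db.timesteps.explain("executionStats").aggregate([regionMatch("EU", 1231, 7)])
```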
NOTES:
After adding the index, both the queries ("by location" and "by region") need to be verified using explain if they are using their respective indexes. This will require some testing; a trial-and-error process.
Again, adding new data, storing data in a different format, and adding new indexes require considering the following:
Careful testing and verifying that the other existing queries perform as usual.
The change in data capture needs.
Testing the new queries and verifying if the new design performs as expected.
Honestly your aggregation is pretty much as optimized as it can get, especially if you have { product: 1, createdAt: -1, location: 1 } as an index like you stated.
I'm not exactly sure how your entire product is built, however the best solution in my opinion is to have another collection containing just the "relevant" documents from the past week.
Then you could query that collection with ease. This is quite easy to do in Mongo using a TTL index.
If this is not an option, you could add a temporary field to the "relevant" documents and query on that, making it somewhat faster to retrieve them. However, maintaining this field requires a process that runs at some interval, which could make your results not 100% accurate, depending on when it runs.
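The TTL index for such a "last week" collection could be sketched like this. The collection name recentTimesteps is a placeholder, and the sketch assumes the application writes each new document to this collection as well as the main one.

```javascript
// One week in seconds; documents whose createdAt is older than this
// become eligible for automatic deletion.
const ONE_WEEK_SECONDS = 7 * 24 * 60 * 60;

// Arguments for createIndex on the (hypothetical) recent-data collection.
// A TTL index has to be a single-field index on a date field.
const ttlIndex = {
  keys: { createdAt: 1 },
  options: { expireAfterSeconds: ONE_WEEK_SECONDS }
};

// Usage in the mongo shell, with a placeholder collection name:
// db.recentTimesteps.createIndex(ttlIndex.keys, ttlIndex.options)
```

Expired documents are removed by a background task, so a document can linger briefly past the seven-day mark. The query should therefore keep its createdAt filter.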

Nested JSON output from mongodb via Get

I have some status data from a bunch of devices stored in MongoDB, and I would like to aggregate some simple statistics via mongoose, depending on input variables such as a timespan (start to end).
The plain dataset in MongoDB looks something like this:
{
"_id":"5d65b4a9cef78a5c987b2224",
"Date":"2019-08-01T00:00:00.000Z",
"Id":9,
"StandingNoMovement":21.9,
"DrivingHours":0.4,
"StandingWithEngineOn":1.6
},
{
"_id":"5d65b4a9cef78a5c987b2225",
"Date":"2019-08-02T00:00:00.000Z",
"Id":9,
"StandingNoMovement":19.2,
"DrivingHours":2.3,
"StandingWithEngineOn":2.3
}
and I need to create a GET for an API with the structure of
[{"Id":9,
"DrivingHours":
{"Total":276.9,"Day":0.0,"ThisWeek":0.0,"ThisMonth":0.0},
"StandingNoMovement":
{"Total":678.4,"Day":0.0,"ThisWeek":0.0,"ThisMonth":0.0},
"StandingWithEngineOn":
{"Total":521.4,"Day":0.0,"ThisWeek":0.0,"ThisMonth":0.0}}]
So far, I have only managed to aggregate some non-nested statistics in mongoose, like:
const aggReport = mongoose.model('aggReport', aggReportingSchema);
aggReport.aggregate([
{ $match:
{ Id : req.params.Id ,'Date': { $gte: start, $lt: end}
}
},
{ $group:
{ _id: null,
'DrivingHours': {$sum: '$DrivingHours'},
'StandingWithEngineOn': {$sum: '$StandingWithEngineOn'},
'StandingNoMovement': {$sum: '$StandingNoMovement'}
}
}])
on the Schema
export const aggReportingSchema = new Schema({
Id:{ type: Number},
Date:{ type: Date},
StandingNoMovement:{ type: Number},
StandingWithEngineOn:{ type: Number},
},{ collection : 'status_daily' });
How can I get the nested statistics? They look simple enough to me.
I have now switched to a totally different approach, as I got the impression that there are no proper and usable solutions (perhaps with good reason) for creating such an API dynamically. So now I store all the data precalculated in MongoDB and use express and node just to fetch the necessary dataset.
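For reference, the nested shape can also be produced in a single aggregation, by computing conditional sums in $group and nesting them in $project. The following is a rough sketch under assumptions: the helper nestedStatsPipeline and the bounds object (dayStart, weekStart, monthStart as Date values chosen by the caller) are hypothetical, and Total here means the sum over the requested start/end range.

```javascript
const METRICS = ["DrivingHours", "StandingNoMovement", "StandingWithEngineOn"];

// Hypothetical helper: builds a pipeline returning one document per device with
// nested Total / Day / ThisWeek / ThisMonth sums for each metric.
function nestedStatsPipeline(id, start, end, bounds) {
  // Sums a field only for documents dated on or after `since`.
  const sumSince = (field, since) => ({
    $sum: { $cond: [{ $gte: ["$Date", since] }, "$" + field, 0] }
  });
  const group = { _id: "$Id" };
  const project = { _id: 0, Id: "$_id" };
  for (const m of METRICS) {
    group[m + "_Total"] = { $sum: "$" + m };
    group[m + "_Day"] = sumSince(m, bounds.dayStart);
    group[m + "_Week"] = sumSince(m, bounds.weekStart);
    group[m + "_Month"] = sumSince(m, bounds.monthStart);
    // Nest the flat group results under the metric name.
    project[m] = {
      Total: "$" + m + "_Total",
      Day: "$" + m + "_Day",
      ThisWeek: "$" + m + "_Week",
      ThisMonth: "$" + m + "_Month"
    };
  }
  return [
    { $match: { Id: id, Date: { $gte: start, $lt: end } } },
    { $group: group },
    { $project: project }
  ];
}

// Usage with the model from the question. Mongoose does not cast pipeline
// values, so the route parameter is converted to a number explicitly:
// aggReport.aggregate(nestedStatsPipeline(Number(req.params.Id), start, end, bounds))
```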

Aggregate query with no $match

I have a collection in which unique documents from a different collection can appear over and over again (in example below item), depending on how much a user shares them. I want to create an aggregate query which finds the most shared documents. There is no $match necessary because I'm not matching a certain criteria, I'm just querying the most shared. Right now I have:
db.stories.aggregate(
{
$group: {
_id:'item.id',
'item': {
$first: '$item'
},
'total': {
$sum: 1
}
}
}
);
However this only returns 1 result. It occurs to me I might just need to do a simple find query, but I want the results aggregated, so that each result has the item and total is how many times it's appeared in the collection.
Example of a document in the stories collection:
{
_id: ObjectId('...'),
user: {
id: ObjectId('...'),
propertyA: ...,
propertyB: ...,
etc
},
item: {
id: ObjectId('...'),
propertyA: ...,
propertyB: ...,
etc
}
}
users and items each have their own collections as well.
Change the line
_id:'item.id'
to
_id:'$item.id'
Currently you group by the constant 'item.id' and therefore you only get one document as result.
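Since the goal is the most shared items, the corrected group stage can be followed by a sort and a limit. A minimal sketch, where the helper name mostSharedPipeline and the limit of 10 are illustrative assumptions:

```javascript
// Hypothetical helper: groups stories by the shared item's id, counts them,
// and returns the `limit` most shared items first.
function mostSharedPipeline(limit) {
  return [
    {
      $group: {
        // The "$" prefix makes this a field path rather than a string constant.
        _id: "$item.id",
        item: { $first: "$item" },
        total: { $sum: 1 }
      }
    },
    { $sort: { total: -1 } },
    { $limit: limit }
  ];
}

// Usage in the mongo shell:
// db.stories.aggregate(mostSharedPipeline(10))
```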

How can I retrieve all the fields when using $elemMatch?

Consider the following posts collection:
{
_id: 1,
title: "Title1",
category: "Category1",
comments: [
{
title: "CommentTitle1",
likes: 3
},
{
title: "CommentTitle2",
likes: 4
}
]
}
{
_id: 2,
title: "Title2",
category: "Category2",
comments: [
{
title: "CommentTitle3",
likes: 1
},
{
title: "CommentTitle4",
likes: 4
}
]
}
{
_id: 3,
title: "Title3",
category: "Category2",
comments: [
{
title: "CommentTitle5",
likes: 1
},
{
title: "CommentTitle6",
likes: 3
}
]
}
I want to retrieve all the posts, and if one post has a comment with 4 likes I want to retrieve this comment only under the "comments" array. If I do this:
db.posts.find({}, {comments: { $elemMatch: {likes: 4}}})
...I get this (which is exactly what I want):
{
_id: 1,
comments: [
{
title: "CommentTitle2",
likes: 4
}
]
}
{
_id: 2,
comments: [
{
title: "CommentTitle4",
likes: 4
}
]
}
{
_id: 3
}
But how can I retrieve the remaining fields of the documents without having to declare each of them like below? That way, if I added more fields to the post document, I wouldn't have to change the find query.
db.posts.find({}, {title: 1, category: 1, comments: { $elemMatch: {likes: 4}}})
Thanks
--EDIT--
Sorry for the misread of your question. I think you'll find my response to this question here to be what you are looking for. As people have commented, you cannot project this way in a find, but you can use aggregation to do so:
https://stackoverflow.com/a/21687032/2313887
The rest of the answer stands as useful. So I think I'll leave it here
You must specify all of the fields you want or nothing at all when using projection.
Essentially, you are asking whether, once you choose to alter the output of the document and limit how one field is displayed, you can avoid specifying the behavior for the rest. The bottom line is to think of the projection part of a find query as being like SQL SELECT. It behaves so that * (all fields) is the default, and anything else is a list of fields, possibly with some manipulation of the field format. The only difference is _id, which is always there by default unless you exclude it, i.e. { _id: 0 }.
Alternatively, if you want to filter the collection, you need to place your $elemMatch in the query itself. Its usage in a projection is to explicitly limit the returned document to contain only the first matching element of the array.
Alter your query:
db.posts.find(
{ comments: { $elemMatch: {likes: 4}}},
{ title: 1, category: 1, "comments.$": 1 }
)
And to get what you want we use the positional $ operator in the projection portion of the find.
See the documentation for the difference between the two usages:
http://docs.mongodb.org/manual/reference/operator/query/elemMatch/
http://docs.mongodb.org/manual/reference/operator/projection/elemMatch/
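The aggregation route mentioned in the edit note above can be sketched with $addFields and $filter, which keeps every other field without listing it. This assumes MongoDB 3.4 or later and is not necessarily the approach used in the linked answer. The helper name postsWithMatchingComments is hypothetical.

```javascript
// Hypothetical helper: keeps all fields of each post and replaces `comments`
// with only those comments that have the given number of likes.
function postsWithMatchingComments(likes) {
  return [
    {
      $addFields: {
        comments: {
          $filter: {
            // Treat a missing comments array as empty.
            input: { $ifNull: ["$comments", []] },
            as: "comment",
            cond: { $eq: ["$$comment.likes", likes] }
          }
        }
      }
    }
  ];
}

// Usage in the mongo shell:
// db.posts.aggregate(postsWithMatchingComments(4))
```

Two differences from the $elemMatch projection are worth noting: $filter keeps all matching comments rather than only the first, and a post without a match gets an empty comments array instead of having the field omitted.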
This question is pretty old, but I just faced the same issue and I didn't want to use the aggregation pipeline as it was simple query and I only needed to get all fields applying an $elemMatch to one field.
I'm using Mongoose (which was not the original question but it's very frequent these days), and to get exactly what the question said (How can I retrieve all the fields when using $elemMatch?) I made this:
const projection = {};
Object.keys(Model.schema.paths).forEach(key => {
projection[key] = 1;
});
projection.subfield = { $elemMatch: { _id: subfieldId } };
Model.find({}, projection).then((result) => console.log({ result }));