Writing down $subtract in NoSQL MongoDB in a sophisticated way - mongodb

I'm doing MongoDB Academy, and for this question:
What is the difference between the number of people born in 1998 and the number of people born after 1998 in the sample_training.trips collection?
The simplest way to do this (the way they expect you to answer) is:
db.trips.find({"birth year": 1998}).count()
and:
db.trips.find({"birth year": {$gt: 1998}}).count()
then calculate the difference manually.
I'm not familiar with programming and code syntax, but I'm wondering about a more sophisticated method; can something like the code below be improved?
db.trips.aggregate({$subtract: [db.trips.find({"birth year": 1998}).count(), db.trips.find({"birth year": {$gt: 1998}}).count()]})
Note: the Atlas free tier doesn't allow $subtract in queries, so I couldn't even test whether it works.

A much simpler way is:
(query 1) - (query 2)
db.trips.find({"birth year":{"$gt":1998}}).count() - db.trips.find({"birth year":1998}).count()
You'll get the result as an integer
6

You can get the number of people born in 1998 and after 1998 with a single aggregate query, using the following approach.
Aggregate pipeline stages:
Select only those records where the birth year is equal to or greater than 1998
Group the number of people by birth year
Add a new field born_on to every record, using the $cond operator in the $project stage
Group the records again by the newly added born_on field, which gives you just two records, with counts for on_1998 and after_1998
The query is as follows:
db.trips.aggregate([
    {
        "$match": {
            "birth year": { "$gte": 1998 }
        }
    },
    {
        "$group": {
            "_id": "$birth year",
            "count": { "$sum": 1 }
        }
    },
    {
        "$project": {
            "count": "$count",
            "born_on": {
                "$cond": [
                    { "$eq": [ "$_id", 1998 ] },
                    "on_1998",
                    "after_1998"
                ]
            }
        }
    },
    {
        "$group": {
            "_id": "$born_on",
            "total": { "$sum": "$count" }
        }
    }
])
There may be a more optimized way of achieving the same result, but this works too.
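If you want the pipeline itself to return the difference (the $subtract idea from the question), one possible variant is to fold both counts into a single document and subtract them. This is only a sketch and has not been run against the course data set:
db.trips.aggregate([
    { "$match": { "birth year": { "$gte": 1998 } } },
    { "$group": {
        "_id": null,
        "on_1998": { "$sum": { "$cond": [ { "$eq": [ "$birth year", 1998 ] }, 1, 0 ] } },
        "after_1998": { "$sum": { "$cond": [ { "$gt": [ "$birth year", 1998 ] }, 1, 0 ] } }
    } },
    { "$project": { "_id": 0, "difference": { "$subtract": [ "$after_1998", "$on_1998" ] } } }
])
This returns a single document whose difference field is the same number you would get from the two separate find().count() calls.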

You can use the following queries to get this value:
db.trips.find( { "birth year": { "$gt": 1998 } } ).count()
db.trips.find( { "birth year": 1998 } ).count()
Use $gt instead of $gte to exclude all 1998 births, use implicit equality to see how many people were born in 1998, and then subtract the two values to get the difference.

Related

Could someone tell me the difference between these two MongoDB queries

I was given the following problem in MongoDB:
How many companies in the sample_training.companies dataset were
(Either founded in 2004 && ( either have the social category_code [or] web category_code))
|| (founded in the month of October && ( either have the social category_code [or] web category_code))
The actual query for the above question is given below and it returned 149 documents.
Correct query:
db.companies.find({
    "$or": [
        {
            "founded_year": 2004,
            "$or": [
                { "category_code": "social" },
                { "category_code": "web" }
            ]
        },
        {
            "founded_month": 10,
            "$or": [
                { "category_code": "social" },
                { "category_code": "web" }
            ]
        }
    ]
}).count()
But I tried to formulate another query for the same problem; unfortunately, it returned an incorrect value of 668 documents.
Incorrect query:
db.companies.find({
    "$or": [
        { "category_code": "social" },
        { "category_code": "web" }
    ],
    "$or": [
        { "founded_year": 2004 },
        { "founded_month": 10 }
    ]
}).count()
Could someone help me understand the difference between these queries?
The difference is not really Boolean operator precedence; it comes from how the query document is built. A JavaScript object cannot contain the same key twice, so when "$or" appears twice at the same level, the second value overwrites the first. Your second example:
db.companies.find({
    "$or": [
        { "category_code": "social" },
        { "category_code": "web" }
    ],
    "$or": [
        { "founded_year": 2004 },
        { "founded_month": 10 }
    ]
}).count()
is therefore equivalent to sending only the last $or:
db.companies.find({
    "$or": [
        { "founded_year": 2004 },
        { "founded_month": 10 }
    ]
}).count()
The category_code condition is silently dropped, which is why you get 668 documents instead of 149. The first example works because it has a single top-level $or, and within each branch the year/month condition is combined with the category condition by the implicit AND.
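If you want to keep the two conditions separate rather than nesting them, one option (not from the original answers; a sketch against the same sample_training.companies collection) is an explicit $and wrapping both $or clauses:
db.companies.find({
    "$and": [
        { "$or": [ { "category_code": "social" }, { "category_code": "web" } ] },
        { "$or": [ { "founded_year": 2004 }, { "founded_month": 10 } ] }
    ]
}).count()
Because there is no duplicate key, both $or conditions reach the server, and by distributivity this expresses the same logic as the nested query above.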

Mongo 4.2 query based on date

I need to query Mongo using the find function; I can't use the aggregate function.
My documents are like this:
{
    "name": "Tom",
    "priDate": ISODate("2010-04-11T00:00:00.000Z")
}
The query I would like to make is:
Find all documents where ("priDate" + 1 year) is lte today.
Is it possible to do this without using an aggregation query? I can't use the field's own value inside find.
The query I need would, I think, be something like this one I made:
db.system.profile.find({
"priDate" :
{
$gte: new Date(ITSELF + 1 year??) ,
$lt : new Date()
}
})
Can you help me?
Many thanks, I'm going crazy :)
See if this works:
db.collection.find(
{
$expr: {
$lte: ['$priDate', { $subtract: ['$$NOW', 31536000000] }]
}
}
)
https://mongoplayground.net/p/QJ3BbHTQlgh
Adding "1 year" can be be difficult because of leap years or daylight-saving-times.
I suggest moment.js, then solution would be
db.system.profile.find(
{
priDate: {
$gte: moment().add(1, "year").toDate(),
$lt: moment().toDate()
}
}
)
However, priDate >= "today + 1 year" AND priDate < "today" can never both be true, so this query matches nothing as written. Change the condition according to your needs.
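For the condition the question actually asks for, priDate plus one year on or before today (i.e. priDate is at least one year old), a sketch of the moment.js version would be the following, assuming moment is available where the query is built:
db.system.profile.find({
    priDate: { $lte: moment().subtract(1, "year").toDate() }
})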
MongoDB stores dates as milliseconds since epoch, so you can advance a date one year by adding the number of milliseconds in a year using $add inside $expr, then test with $lte:
db.system.profile.find({
    $expr: {
        $lte: [ { $add: [ "$priDate", 31536000000 ] }, new ISODate() ]
    }
})
Note that this will be off by a day around leap year unless you adjust the number of milliseconds for that.
You expressed the constraint that you cannot use aggregate, but with $expr (available from MongoDB 3.6 onwards) you can use aggregation expression operators inside a find query as well.
https://docs.mongodb.com/manual/reference/operator/query/expr/#definition

MongoDB aggregation performance with indexes

I have a collection with a structure similar to this.
{
    "_id": ObjectId("59d7cd63dc2c91e740afcdb"),
    "dateJoined": ISODate("2014-12-28T16:37:17.984Z"),
    "dateActivatedMonth": 15,
    "enrollments": [
        { "month": -10, "enrolled": '00' },
        { "month": -9, "enrolled": '00' },
        { "month": -8, "enrolled": '01' },
        // other months
        { "month": 8, "enrolled": '11' },
        { "month": 9, "enrolled": '11' },
        { "month": 10, "enrolled": '00' }
    ]
}
dateActivatedMonth is the number of months from dateJoined.
month in the enrollments sub-document is a month relative to dateJoined.
I am using the MongoDB aggregation framework to process queries like "all enrollments with enrolled as '01' and the enrollment month between 10 months before activation and 25 months after activation".
In my aggregation, I first apply all possible filters in the $match stage and then apply the condition on "month" in the $project stage.
db.getCollection("enrollments").aggregate(
{ $match:{ //initial conditions }},
{ $project: {
//Here i will apply month filter by considering the number of month before and after.
}
}
//other pipeline operations
)
All the filters that I am applying in my condition are optional, so in some cases $match will not filter anything. The only condition that is guaranteed to be there is the one on "month" in the "enrollments" sub-document.
This query is slow: it takes about 6-7 seconds (the data set is also large). I am looking for ways to improve this, and the first thing I am looking at is creating indexes.
Now my questions are:
Can $project use indexes? I tried creating an index on "month", but I don't see anything about index usage in the queryPlanner output of explain().
I would like to move the "month" condition to $match so that it can use the index. How can I use a field value in $match? Something like this:
db.getCollection('collection').aggregate([
{
$match:{
dateActivatedMonth: {$exists: true}
//,'enrollment.month': dateActivatedMonth -- not working
//,'enrollment.month': '$dateActivatedMonth' -- not working
}
}])
Thank you for your patience.
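For reference (this is not from the original thread): comparing one field with another inside $match requires an aggregation expression via $expr, available from MongoDB 3.6. A sketch under that assumption, keeping only documents where at least one enrollment with enrolled '01' falls within 10 months before and 25 months after activation (the window values are taken only from the example wording above); note that a field-to-field $expr comparison like this cannot use an index:
db.getCollection("enrollments").aggregate([
    { $match: {
        dateActivatedMonth: { $exists: true },
        enrollments: { $type: "array" },   // guard so $filter always receives an array
        $expr: {
            $gt: [
                { $size: { $filter: {
                    input: "$enrollments",
                    as: "e",
                    cond: { $and: [
                        { $eq: [ "$$e.enrolled", "01" ] },
                        { $gte: [ "$$e.month", { $subtract: [ "$dateActivatedMonth", 10 ] } ] },
                        { $lte: [ "$$e.month", { $add: [ "$dateActivatedMonth", 25 ] } ] }
                    ] }
                } } },
                0
            ]
        }
    }}
])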

Getting the latest xx records with Mongoose: how to order them?

I'm trying to get the last 20 records of the user collection with Mongoose:
User.find({'owner': req.params.id})
    .sort({ date: -1 })
    .limit(20)
    .exec(.....)
This works well and shows the last 20 items.
But the items inside the array are sorted from the most recent to the oldest. Is there any way to reverse this with Mongoose?
Thanks
You can certainly do this with an aggregation, such as this:
db.user.aggregate([
    { $match: { "owner": req.params.id } },
    { $sort: { "date": -1 } },
    { $limit: 20 },
    { $sort: { "date": 1 } }
])
Notes on this aggregation:
The first three stages do the same job as the find() in your question
The fourth stage applies a further sort, which re-orders the returned 20 records from oldest to most recent
I have written it in native MongoDB aggregation syntax; you will need to adjust the code to generate the same aggregation from Mongoose.
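A sketch of roughly what that could look like in Mongoose, assuming the same User model; note that aggregate() does not apply schema casting, so if owner is stored as an ObjectId you may need to wrap req.params.id with mongoose.Types.ObjectId:
User.aggregate([
    { $match: { owner: req.params.id } },   // cast to ObjectId here if the schema stores one
    { $sort: { date: -1 } },
    { $limit: 20 },
    { $sort: { date: 1 } }
]).exec(function (err, docs) {
    // docs holds the latest 20 documents, ordered oldest first
});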
Update: I think this is not possible with a find() with cursor methods, because you would need two different sort() operations. But, MongoDB does not treat them as a sequence of independent operations; the docs give an example of methods written in one order — sort().limit() — being equivalent to the opposite order — limit().sort(), showing that the order cannot be relied upon as meaningful.
Find the total and select only the latest 20. This may not be the most efficient way, but it will solve your problem.
User.count({'owner': req.params.id}, function (err, count) {
    if (count) {
        var skipItem = count - 20;
        User.find({'owner': req.params.id})
            .skip(skipItem)
            .limit(20)
            .sort({ date: 1 })
            .exec(.....)
    }
});
db.users.aggregate([
{ $match: {
'owner': req.params.id
}},
{ $unwind: '[arrayFieldName]' },
{ $sort: {
'[arrayFieldName]': -1/1,
'date':-1
}}
])
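For completeness (not part of the original answers): since exec() ultimately hands you a plain array, you can also keep the original find()/sort()/limit() query and simply reverse the result in application code:
User.find({ 'owner': req.params.id })
    .sort({ date: -1 })
    .limit(20)
    .exec(function (err, docs) {
        if (err) { /* handle error */ return; }
        var oldestFirst = docs.slice().reverse();   // latest 20, re-ordered oldest to newest
        // ... use oldestFirst
    });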

Mongo: Ensuring latest nested attribute has a value between given arguments

I have a mongo collection 'books'. Here's a typical book:
BOOK
name: 'Test Book'
author: 'Joe Bloggs'
print_runs: [
{publisher: 'OUP', year: 1981},
{publisher: 'Penguin', year: 1987},
{publisher: 'Harper-Collins', year: 1992}
]
I'd like to be able to filter books to return only books whose last print run was after a given date, and/or before a given date...and I've been struggling to find a feasible query. Any suggestions appreciated.
There are a few options, as getting at the "last" element in the array and filtering on only that element is difficult/impossible with the normal find options in MongoDB queries. (Unfortunately, $slice is only a projection operator, so you can't filter on it with find.)
Store the most recent print run both in the print_runs array and in a denormalized copy directly on the book object, e.g. Book.last_published_by and Book.last_published_date. Queries would be simple and super fast.
MapReduce. This would be simple enough: emit the last element in the array and then "reduce" it down to just that. You'd need to do incremental updates on the MapReduce to keep it accurate.
Write a relatively complex aggregation framework expression
The aggregation might look like:
db.so.aggregate([
    { $project: { _id: 1, "print_run_year": "$print_runs.year" } },
    { $unwind: "$print_run_year" },
    { $group: { _id: "$_id", "newest": { $max: "$print_run_year" } } },
    { $match: { "newest": { $gt: 1991, $lt: 2000 } } }
])
As it may require a bit of explanation:
It projects and unwinds the year of the print runs for each book.
Then, it groups on the _id of the book and creates a new computed field called newest, which contains the highest print run year (from the projection).
Then, it filters on newest using $gt and $lt
I'd suggest option #1 above would be the best from an efficiency perspective, followed by the MapReduce, and then a distant third, option #3.
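For illustration (not from the original answer), a sketch of what option #1 might look like, using the hypothetical last_published_by / last_published_date fields suggested above. The denormalized copy is updated whenever a new print run is pushed, and the filter then becomes a plain, indexable query:
// keep the denormalized copy in sync when adding a print run
db.books.updateOne(
    { name: 'Test Book' },
    {
        $push: { print_runs: { publisher: 'Penguin', year: 1999 } },
        $set: { last_published_by: 'Penguin', last_published_date: 1999 }
    }
)
// books whose last print run falls between the given years
db.books.find({ last_published_date: { $gt: 1991, $lt: 2000 } })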