Why doesn't this Mongo aggregation work?

I feel like I must be missing something obvious. Here is the aggregation, as one would post it in the shell:
db.documents.aggregate(
    { $project: {
        title: 1,
        "date.year": 1,
        decade: { $subtract: ['$date.year', { $mod: ['$date.year', 10] }] }
    }}
)
This is supposed to take a list of documents, each with a date.year field, and add a decade field indicating which decade the document is in (1900, 1910, etc.). I'm planning on further transforming the data after I get that added.
The problem is that when I run the aggregation, I get:
{
    "errmsg" : "exception: $subtract resulted in a non-numeric type",
    "code" : 16413,
    "ok" : 0
}
If I change $subtract to $add, it works fine (but doesn't give me the right result, of course.) So what's going on with the subtraction? Why am I getting a non-numeric type when I subtract but a number when I add?
Thanks in advance!

This looks like a bug in the aggregation framework - it's not handling subtraction correctly when the fields you are operating on are not set in the documents going through the pipeline.
It's been fixed in 2.3.2 (I can't reproduce it there - it projects null when date is not set), but one way you can work around this limitation in the meantime is to add a $match condition to your pipeline, i.e. prefix the {$project} stage with:
{$match: {"date.year":{$exists:true}}}

Related

MongoDB - Find documents which contain an element in an array which has a property of null

I'm struggling with a seemingly simple query in mongodb.
I have a job collection that has objects like:
{
    "_id" : ObjectId("5995c1fc3c2a353a782ee51b"),
    "stages" : [
        {
            "start" : ISODate("2017-02-02T22:06:26Z"),
            "end" : ISODate("2017-02-03T22:06:26Z"),
            "name" : "stage_one"
        },
        {
            "start" : ISODate("2017-02-03T22:06:26Z"),
            "end" : ISODate("2017-02-07T20:34:01Z"),
            "name" : "stage_two"
        }
    ]
}
I want to get a job whose second stage does not have an end time, i.e. end is null or not defined.
According to the mongo docs on querying for null and querying an array of embedded documents, it would seem the correct query should be:
db.job.findOne({'stages.1.end': null})
However, running that query returns the job above, which does have a non-null end date. In fact, if I run the query with count instead of findOne, I see that all jobs are returned - no filtering is done at all.
For completeness, I reproduced this on a fresh mongo instance containing only a single document like the one above. In that example, I would expect db.job.findOne({'stages.1.end': null}) to return nothing, since the only document's second stage has a non-null end date.
This feels like the sort of issue where it's just me being an idiot and if so, I apologise profusely.
Thanks in advance for your help and let me know if you need any more details!
EDIT:
After some more experimentation, I think I can achieve what I want with the following:
db.job.find({ $or: [
    { 'stages.1.end': { $type: 10 } },       // BSON type 10 is null
    { 'stages.1.end': { $exists: false } }
]})
While this gets the job done, it doesn't feel like the simplest way and I still don't understand why the original query doesn't work. If anyone could shed some light on this it'd be much appreciated.
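A possibly cleaner alternative, assuming MongoDB 3.6 or newer (which may be newer than the version in use here), is an $expr query that addresses the second stage explicitly. This is only a sketch:
db.job.find({ $expr: {
    $let: {
        vars: { second: { $arrayElemAt: ['$stages', 1] } },
        // $ifNull turns a missing end field (or a missing second stage) into null
        in: { $eq: [ { $ifNull: ['$$second.end', null] }, null ] }
    }
}})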

Inner query using mongo template

I am new to MongoDB and Spring's MongoTemplate. I would like to build a query using MongoTemplate whose equivalent in Postgres would be:
select * from feedback
where feedback.outletId in (
select outletId from feedback
where feedback.createdOn >= '2013-05-03'::date
)
Is this even possible in MongoDB?
Well, there is no concept of inner queries in MongoDB, so it basically comes down to two queries - but you probably already know that and want a 'better' solution. Since you asked whether it is possible, it can also be achieved with aggregation, although that can be tricky.
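For reference, a minimal two-query sketch in the shell (assuming createdOn is stored as an ISODate rather than a string) could be:
var outletIds = db.feedback.distinct('outletId', { createdOn: { $gte: ISODate('2013-05-03') } });
db.feedback.find({ outletId: { $in: outletIds } });
The single-pipeline aggregation version looks like this: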
db.feedback.aggregate([
    {$project : {
        'outletId' : 1,
        'feedback._id' : '$_id',
        'feedback.createdOn' : '$createdOn',
        'feedback.a' : '$a'
    }},
    {$group : {
        _id : '$outletId',
        feedbacks : {$addToSet : '$feedback'}
    }},
    {$match : {
        'feedbacks.createdOn' : {$gte : ISODate('2013-05-03')}
    }},
    {$unwind : '$feedbacks'}
]);
You can add one more $project stage at the end to turn the child object back into top-level fields, as in the original document. I know it doesn't look pretty, so let me explain it stage by stage:
First we project a document, putting all the needed fields inside a child field called feedback.
In the second stage we group by outletId and put all the child feedback documents into an array named feedbacks (so for each outletId we get all of its feedbacks).
In the third stage we use $match to filter out the groups that do not contain even a single feedback whose createdOn field is on or after the given date.
Once those outletIds are filtered, we $unwind to get each element of the feedbacks array back as a separate document.
Now, as for MongoTemplate: yes, it accepts all of these stages for aggregation, including the nesting of the child fields under feedback in the first stage. Just look at some examples of TypedAggregation.
Also note that if you are saving the createdOn field as a string instead of a timestamp or ISODate, even normal Mongo queries won't be able to do the range filtering the way it works in your Postgres example.
Hope it helps.

How can I get all the doc ids in MongoDB?

How can I get an array of all the doc ids in MongoDB? I only need a set of ids but not the doc contents.
You can do this in the Mongo shell by calling map on the cursor like this:
var a = db.c.find({}, {_id:1}).map(function(item){ return item._id; })
The result is that a is an array of just the _id values.
The way it works in Node is similar.
(This is MongoDB Node driver v2.2, and Node v6.7.0)
db.collection('...')
.find(...)
.project( {_id: 1} )
.map(x => x._id)
.toArray();
Remember to put map before toArray, as this map is NOT the JavaScript Array map function; it is the one provided by the MongoDB driver's cursor and is applied to each document as it is read from the cursor, before toArray collects the results.
One way is to simply use the runCommand API; in this example the collection happens to be named distinct:
db.runCommand ( { distinct: "distinct", key: "_id" } )
which gives you something like this:
{
    "values" : [
        ObjectId("54cfcf93e2b8994c25077924"),
        ObjectId("54d672d819f899c704b21ef4"),
        ObjectId("54d6732319f899c704b21ef5"),
        ObjectId("54d6732319f899c704b21ef6"),
        ObjectId("54d6732319f899c704b21ef7"),
        ObjectId("54d6732319f899c704b21ef8"),
        ObjectId("54d6732319f899c704b21ef9")
    ],
    "stats" : {
        "n" : 7,
        "nscanned" : 7,
        "nscannedObjects" : 0,
        "timems" : 2,
        "cursor" : "DistinctCursor"
    },
    "ok" : 1
}
However, there's an even nicer way using the actual distinct helper (again, on that collection named distinct):
var ids = db.distinct.distinct('_id', {}, {});
which just gives you an array of ids:
[
ObjectId("54cfcf93e2b8994c25077924"),
ObjectId("54d672d819f899c704b21ef4"),
ObjectId("54d6732319f899c704b21ef5"),
ObjectId("54d6732319f899c704b21ef6"),
ObjectId("54d6732319f899c704b21ef7"),
ObjectId("54d6732319f899c704b21ef8"),
ObjectId("54d6732319f899c704b21ef9")
]
Not sure about the first version, but the latter is definitely supported in the Node.js driver (which I saw you mention you wanted to use). That would look something like this:
db.collection('c').distinct('_id', {}, {}, function (err, result) {
// result is your array of ids
})
I was also wondering how to do this with the MongoDB Node.js driver, like #user2793120. Someone else said he should iterate through the results with .each, which seemed highly inefficient to me. I used MongoDB's aggregation instead:
myCollection.aggregate([
{$match: {ANY SEARCHING CRITERIA FOLLOWING $match'S RULES} },
{$sort: {ANY SORTING CRITERIA, FOLLOWING $sort'S RULES}},
{$group: {_id:null, ids: {$addToSet: "$_id"}}}
]).exec()
The sort stage is optional, and so is the match stage if you want all of the collection's _ids. If you console.log the result, you'd see something like:
[ { _id: null, ids: [ '56e05a832f3caaf218b57a90', '56e05a832f3caaf218b57a91', '56e05a832f3caaf218b57a92' ] } ]
Then just use the contents of result[0].ids somewhere else.
The key part here is the $group stage. You must give _id a value of null (otherwise the aggregation will fail), and create a new array field holding all the _ids. If you don't mind having duplicated ids (which could only happen if, given your $match criteria, you group on some field other than a unique _id), you can use $push instead of $addToSet.
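As a concrete sketch (the collection name c is just a placeholder), collecting every _id in a collection with $push would look like:
db.c.aggregate([
    { $group: { _id: null, ids: { $push: "$_id" } } }
])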
Another way to do this on mongo console could be:
var arr=[]
db.c.find({},{_id:1}).forEach(function(doc){arr.push(doc._id)})
printjson(arr)
Hope that helps!!!
Thanks!!!
I struggled with this for a long time, and I'm answering this because I've got an important hint. It seemed obvious that:
db.c.find({},{_id:1});
would be the answer.
It worked, sort of. It would find the first 101 documents and then the application would pause. I didn't let it keep going. This was both in Java using MongoOperations and also on the Mongo command line.
I looked at the mongo logs and saw it was doing a collection scan (COLLSCAN) on a big collection of big documents. I thought: crazy, I'm projecting only the _id, which is always indexed, so why would it attempt a collection scan?
I have no idea why it would do that, but the solution is simple:
db.c.find({},{_id:1}).hint({_id:1});
or in Java:
query.withHint("{_id:1}");
Then it was able to proceed along as normal, using stream style:
createStreamFromIterator(mongoOperations.stream(query, MortgageDocument.class))
    .map(MortgageDocument::getId)
    .forEach(transformer);
Mongo can do some good things and it can also get stuck in really confusing ways. At least that's my experience so far.
Try an aggregation pipeline, like this:
db.collection.aggregate([
    { $match: { deletedAt: null } },
    { $group: { _id: "$_id" } }
])
This is going to return an array of documents, each with this structure:
{ _id: ObjectId("5fc98977fda32e3458c97edd") }
I had a similar requirement: getting the ids for a collection with 50+ million documents. I tried many approaches; the fastest way to get just the ids turned out to be mongoexport with only the _id field.
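A rough sketch of that command (the database and collection names here are placeholders):
mongoexport --db mydb --collection mycollection --fields _id --type=csv --out ids.csv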
One of the above examples worked for me, with a minor tweak: since I was using my Mongoose model, I dropped the extra arguments and simply awaited the query.
const idArray = await Model.distinct('_id', {});
// idArray is your array of ids

MongoDB - Aggregate Sum

I am attempting to calculate the total amount of money spent that is being tracked inside of our database. Each order document contains a field "total_price".
I am attempting to use the following code:
db.orders.aggregate({
    $group: {
        _id: null,
        total: {$sum: "$total_price"}
    }
})
Unfortunately, the only output I get is: { "result" : [ { "_id" : null, "total" : 0 } ], "ok" : 1 }
But to verify there is actually numerical data stored and it is just not being totaled: db.orders.find()[0].total_price returns 8.99.
Any help would be greatly appreciated. I have very little experience using MongoDB. I've only covered the basics at this point.
Thank you in advance.
$sum only works with numeric types (ints, longs and doubles). At the time of writing there is no operator to parse a string into a number, although that would be very useful (newer MongoDB versions, 4.0 and up, do add $convert and $toDouble for this). You can do the conversion yourself as described in Mongo convert all numeric fields that are stored as string, but that would be slow.
I would suggest you make sure that your application stores numbers as int/long/double, and that you write a script that iterates over all your documents and updates the values. I would also suggest that you add a feature request at https://jira.mongodb.org/browse/SERVER to add an operator that converts a string to a number.
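A minimal sketch of such a one-off conversion script in the shell, assuming every stored total_price string parses cleanly as a number:
// BSON type 2 is string
db.orders.find({ total_price: { $type: 2 } }).forEach(function (doc) {
    db.orders.update({ _id: doc._id }, { $set: { total_price: parseFloat(doc.total_price) } });
});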

Mongo: Ensuring latest nested attribute has a value between given arguments

I have a mongo collection 'books'. Here's a typical book:
BOOK
name: 'Test Book'
author: 'Joe Bloggs'
print_runs: [
{publisher: 'OUP', year: 1981},
{publisher: 'Penguin', year: 1987},
{publisher: 'Harper-Collins', year: 1992}
]
I'd like to be able to filter books to return only books whose last print run was after a given date, and/or before a given date...and I've been struggling to find a feasible query. Any suggestions appreciated.
There are a few options, as getting access to the "last" element in the array and filtering only on that is difficult/impossible with the normal find options in MongoDB queries. (Unfortunately, $slice is only available as a projection in find, not as part of the query itself.)
1. Store the most recent publisher and year both in the print_runs array and in a special denormalized copy of that data directly on the book object - Book.last_published_by and Book.last_published_date, for example. Queries would be simple and super fast.
2. MapReduce. It would be simple enough to emit the last element in the array and then "reduce" it to just that. You'd need to do incremental updates on the MapReduce to keep it accurate.
3. Write a relatively complex aggregation framework expression.
The aggregation might look like:
db.so.aggregate({ $project :
{ _id: 1, "print_run_year" : "$print_runs.year" }},
{ $unwind: "$print_run_year" },
{ $group : { _id : "$_id", "newest" : { $max : "$print_run_year" }}},
{ $match : { "newest" : { $gt : 1991, $lt: 2000 } }
})
As it may require a bit of explanation:
It projects and unwinds the year of the print runs for each book.
Then it groups on the book's _id and creates a new computed field called newest, which contains the highest print run year (from the projection).
Finally, it filters on newest using $gt and $lt.
I'd suggest option #1 above would be the best from an efficiency perspective, followed by the MapReduce, and then a distant third, option #3.
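As a footnote for readers on newer MongoDB versions (3.4+, newer than this answer assumes), the literal last element of the array can be addressed directly with $arrayElemAt, which avoids the unwind/group. A rough sketch using the same example years:
db.books.aggregate([
    // take the year of the last element of print_runs
    { $addFields: { last_run_year: { $arrayElemAt: ["$print_runs.year", -1] } } },
    { $match: { last_run_year: { $gt: 1991, $lt: 2000 } } }
])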