Find maximum date from multiple embedded documents - mongodb

One of the many documents in my collection looks like this:
{ "_id" :123,
"a" :[
{ "_id" : 1,
"dt" : ISODate("2013-06-10T19:38:42Z")
},
{ "_id" : 2,
"dt" : ISODate("2013-02-10T19:38:42Z")
}
],
"b" :[
{ "_id" : 1,
"dt" : ISODate("2013-02-10T19:38:42Z")
},
{ "_id" : 2,
"dt" : ISODate("2013-23-10T19:38:42Z")
}
],
"c" :[
{ "_id" : 1,
"dt" : ISODate("2013-03-10T19:38:42Z")
},
{ "_id" : 2,
"dt" : ISODate("2013-13-10T19:38:42Z")
}
]
}
I want to find the maximum date for the whole document (a, b, c).
My current solution loops over every root _id and, for each root document, runs a $match in the aggregation framework on each of a, b and c. This seems very inefficient; any better ideas?

Your question is very similar to this question; check it out.
As with that solution, your issue can be handled with MongoDB's aggregation framework, using the $project and $cond operators to repeatedly flatten your documents while preserving a max value at each step.
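On MongoDB 3.2 or later, a single $project stage may be enough instead of repeated flattening: $concatArrays merges the dt values of a, b and c into one array, and $max inside $project returns the largest element of that array. The sketch below assumes the field names from the question; maxDatePipeline and the client-side helper maxDateOf are hypothetical names, and the helper only mirrors the pipeline's logic so it can be checked without a server. The sample leaves out "2013-23-10" and "2013-13-10", which are not valid ISO dates (there is no month 23 or 13).

```javascript
// Pipeline as plain data, to be passed to db.collection.aggregate().
// Assumes MongoDB 3.2+ ($concatArrays; $max over an array inside $project).
// "$a.dt" resolves to the array of dt values of the elements of a; $ifNull
// substitutes an empty array when a field is missing, so $concatArrays does
// not return null for documents lacking a, b or c.
const maxDatePipeline = [
  {
    $project: {
      maxDt: {
        $max: {
          $concatArrays: [
            { $ifNull: ["$a.dt", []] },
            { $ifNull: ["$b.dt", []] },
            { $ifNull: ["$c.dt", []] }
          ]
        }
      }
    }
  }
];

// Hypothetical client-side equivalent for one in-memory document:
// scan every dt in the listed array fields and keep the latest Date.
function maxDateOf(doc, fields = ["a", "b", "c"]) {
  let max = null;
  for (const field of fields) {
    for (const item of doc[field] || []) {
      if (item.dt instanceof Date && (max === null || item.dt > max)) {
        max = item.dt;
      }
    }
  }
  return max;
}

// Sample with the valid dates from the question (new Date replaces ISODate).
const sample = {
  _id: 123,
  a: [
    { _id: 1, dt: new Date("2013-06-10T19:38:42Z") },
    { _id: 2, dt: new Date("2013-02-10T19:38:42Z") }
  ],
  b: [{ _id: 1, dt: new Date("2013-02-10T19:38:42Z") }],
  c: [{ _id: 1, dt: new Date("2013-03-10T19:38:42Z") }]
};
```

In the shell this would be run as db.collection.aggregate(maxDatePipeline). Appending { $group: { _id: null, maxDt: { $max: "$maxDt" } } } would reduce the per-document results to one date for the whole collection.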

Related

Indexing MongoDB for sort consistency

The MongoDB documentation says that MongoDB doesn't store documents in a collection in a particular order. So if you have this collection:
db.restaurants.insertMany( [
{ "_id" : 1, "name" : "Central Park Cafe", "borough" : "Manhattan"},
{ "_id" : 2, "name" : "Rock A Feller Bar and Grill", "borough" : "Queens"},
{ "_id" : 3, "name" : "Empire State Pub", "borough" : "Brooklyn"},
{ "_id" : 4, "name" : "Stan's Pizzaria", "borough" : "Manhattan"},
{ "_id" : 5, "name" : "Jane's Deli", "borough" : "Brooklyn"},
] );
and sorting like this:
db.restaurants.aggregate(
[
{ $sort : { borough : 1 } }
]
)
Then the sort order can be inconsistent since:
the borough field contains duplicate values for both Manhattan and Brooklyn. Documents are returned in alphabetical order by borough, but the order of those documents with duplicate values for borough might not be the same across multiple executions of the same sort.
To return a consistent result it's recommended to modify the query to:
db.restaurants.aggregate(
[
{ $sort : { borough : 1, _id: 1 } }
]
)
My question relates to the efficiency of such a query. Let's say you have millions of documents: should you create a compound index, something like { borough: 1, _id: -1 }, to make it efficient? Or is it enough to index { borough: 1 }, given the potentially special nature of the _id field?
I'm using MongoDB 4.4.
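If you need a stable sort, you will have to sort on both fields, and for a performant query you will need a compound index on both fields whose key order and directions match the sort (or are all reversed):
{ borough: 1, _id: 1 }
Note that { borough: 1, _id: -1 } would not support the sort { borough: 1, _id: 1 }: a compound index can serve a multi-field sort only when every sort direction matches the index or every direction is inverted. An index on { borough: 1 } alone does not cover the _id part of the sort, so MongoDB would still have to sort in memory.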
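The direction rule behind that choice can be sketched as a small function. This is a simplified model, not MongoDB's query planner, and indexSupportsSort is a hypothetical name; in the shell the index itself would be created with db.restaurants.createIndex({ borough: 1, _id: 1 }).

```javascript
// Simplified model of when an index can satisfy a sort: the sort keys must be
// a prefix of the index keys, in the same order, and either every direction
// matches the index or every direction is inverted (the index is then walked
// backwards). It ignores the case where equality conditions on leading index
// fields let a non-prefix sort use the index.
function indexSupportsSort(indexKeys, sortKeys) {
  const idx = Object.entries(indexKeys);
  const sort = Object.entries(sortKeys);
  if (sort.length === 0 || sort.length > idx.length) return false;
  let sameDirection = true;
  let reversed = true;
  for (let i = 0; i < sort.length; i++) {
    const [idxField, idxDir] = idx[i];
    const [sortField, sortDir] = sort[i];
    if (idxField !== sortField) return false; // fields must line up in order
    if (idxDir !== sortDir) sameDirection = false;
    if (idxDir !== -sortDir) reversed = false;
  }
  return sameDirection || reversed;
}
```

For example, indexSupportsSort({ borough: 1, _id: -1 }, { borough: 1, _id: 1 }) is false, while { borough: -1, _id: -1 } works because it is the exact reverse of the sort.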

Mongodb accessing documents

I have the following collection:
{ "_id" : 1, "results" : [ { "product" : "abc", "score" : 10 }, { "product" : "xyz", "score" : 5 } ] }
{ "_id" : 2, "results" : [ { "product" : "abc", "score" : 8 }, { "product" : "xyz", "score" : 7 } ] }
{ "_id" : 3, "results" : [ { "product" : "abc", "score" : 7 }, { "product" : "xyz", "score" : 8 } ] }
I want to show the first score of each _id. I tried the following:
db.students.find({},{"results.$":1})
But it doesn't seem to work, any advice?
You can take advantage of the aggregation pipeline to solve this.
Use $project in conjunction with $arrayElemAt to pick the appropriate index in the array.
So, to extract the first element of results, I have written the query below:
db.students.aggregate([ {$project: { scoredoc:{$arrayElemAt:["$results",0]}} } ]);
If you only want the score, without the product, use "$results.score" as shown below:
db.students.aggregate([ {$project: { scoredoc:{$arrayElemAt:["$results.score",0]}} } ]);
In the first query, scoredoc holds the whole first element of results; in the second, it holds only the first score.
Hope this helps!
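To check what the two $project stages return without a server, here is a plain JavaScript sketch that applies the same element selection to the sample documents. firstResult and firstScore are hypothetical helper names, not MongoDB functions.

```javascript
// Sample documents from the question.
const students = [
  { _id: 1, results: [{ product: "abc", score: 10 }, { product: "xyz", score: 5 }] },
  { _id: 2, results: [{ product: "abc", score: 8 }, { product: "xyz", score: 7 }] },
  { _id: 3, results: [{ product: "abc", score: 7 }, { product: "xyz", score: 8 }] }
];

// Mirrors { scoredoc: { $arrayElemAt: ["$results", 0] } }:
// keep _id and the whole first element of results.
function firstResult(doc) {
  return { _id: doc._id, scoredoc: doc.results[0] };
}

// Mirrors { scoredoc: { $arrayElemAt: ["$results.score", 0] } }:
// "$results.score" is the array of all scores; index 0 picks the first.
function firstScore(doc) {
  return { _id: doc._id, scoredoc: doc.results.map(r => r.score)[0] };
}
```

students.map(firstScore) yields the scores 10, 8 and 7, one per document.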
Based on the description above, please try executing the following query in the MongoDB shell:
db.students.find(
{results:
{$elemMatch:{score:{$exists:true}}}}, {'results.$.score':1}
)
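According to the MongoDB documentation:
The positional $ operator limits the contents of an <array> from the query results to contain only the first element matching the query document.
Hence, in the query above, the positional $ operator is used in the projection to retrieve the first score of each document. Note that starting with MongoDB 4.4, the $ projection operator may only appear at the end of the field path, so on newer versions the projection has to be { 'results.$': 1 } rather than { 'results.$.score': 1 }.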
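The key words in that quote are "first element matching": with a filter such as { score: { $exists: true } } every element matches, so the first one comes back, but a stricter filter can return a later element. A plain JavaScript sketch of that behaviour, with positionalProject as a hypothetical helper and the array filter modelled as a predicate function:

```javascript
const students = [
  { _id: 1, results: [{ product: "abc", score: 10 }, { product: "xyz", score: 5 }] },
  { _id: 2, results: [{ product: "abc", score: 8 }, { product: "xyz", score: 7 }] },
  { _id: 3, results: [{ product: "abc", score: 7 }, { product: "xyz", score: 8 }] }
];

// Hypothetical emulation of find(filterOnResults, { "results.$": 1 }):
// each matching document comes back with _id and a results array holding
// only the FIRST element that satisfied the filter; documents without such
// an element are not returned at all.
function positionalProject(docs, elementMatches) {
  const out = [];
  for (const doc of docs) {
    const first = doc.results.find(elementMatches);
    if (first !== undefined) {
      out.push({ _id: doc._id, results: [first] });
    }
  }
  return out;
}

// Like { results: { $elemMatch: { score: { $exists: true } } } }.
const firstAny = positionalProject(students, r => r.score !== undefined);
// Like { results: { $elemMatch: { score: { $gte: 8 } } } }.
const firstHigh = positionalProject(students, r => r.score >= 8);
```

For document 3, firstAny keeps the "abc" element but firstHigh keeps "xyz", because that is the first element with a score of at least 8.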

Find field inside an array using $elemMatch

I have an invoice collection in which I want to find the document with a specific book's _id.
db.invoice.find({"sold": {$elemMatch: {"book":{$elemMatch:{"_id":"574e68e5ac9fbac82489b689"}}}}})
I tried this, but it didn't work. Here is a sample document:
{
"_id" : ObjectId("575e9bf5576533313ac9d993"),
"sold" : [
{
"book" : {
"_id" : "574e68e5ac9fbac82489b689",
"price" : 100,
},
"quantity" : 10,
"total_price" : 1000
}
],
"date" : "13-06-2016"
}
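You do not need the $elemMatch query operator here because you specify only a single query condition (and since book is an embedded document rather than an array, the inner $elemMatch in your query can never match).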
db.invoice.find( { 'sold.book._id': '574e68e5ac9fbac82489b689' } )
This is mentioned in the documentation:
If you specify only a single condition in the $elemMatch expression, you do not need to use $elemMatch
https://docs.mongodb.com/manual/reference/operator/query/elemMatch/#op._S_elemMatch
The $elemMatch operator matches documents that contain an array field with at least one element that matches all the specified query criteria.
mongo> db.invoice.find({"sold": {$elemMatch: {"book._id":"574e68e5ac9fbac82489b689"}}}).pretty()
{
"_id" : ObjectId("575e9bf5576533313ac9d993"),
"sold" : [
{
"book" : {
"_id" : "574e68e5ac9fbac82489b689",
"price" : 100
},
"quantity" : 10,
"total_price" : 1000
}
],
"date" : "13-06-2016"
}
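Why a single condition does not need $elemMatch, while two conditions on the same array can behave differently, can be sketched in plain JavaScript. The second line item and its book id are hypothetical, and the two functions model the matching rules rather than call MongoDB.

```javascript
// Hypothetical invoice: two line items, only the first is the target book.
const invoice = {
  _id: "inv-1",
  sold: [
    { book: { _id: "574e68e5ac9fbac82489b689", price: 100 }, quantity: 10 },
    { book: { _id: "some-other-book", price: 50 }, quantity: 5 }
  ]
};

// Models { "sold.book._id": bookId, "sold.quantity": quantity }:
// without $elemMatch, each condition may be met by a different element.
function matchesDotNotation(doc, bookId, quantity) {
  return doc.sold.some(s => s.book._id === bookId) &&
    doc.sold.some(s => s.quantity === quantity);
}

// Models { sold: { $elemMatch: { "book._id": bookId, quantity: quantity } } }:
// a single element must meet both conditions.
function matchesElemMatch(doc, bookId, quantity) {
  return doc.sold.some(s => s.book._id === bookId && s.quantity === quantity);
}
```

Asking for the target book with quantity 5 matches under dot notation (the quantity comes from the other line item) but not under $elemMatch. With only one condition the two forms agree, which is why the first answer can drop $elemMatch.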

mongodb queries find total number of cities in the database

Hi everyone, I have a large dataset containing documents like the ones below:
{ "_id" : "01011", "city" : "CHESTER", "loc" : [ -72.988761, 42.279421 ], "pop" : 1688, "state" : "MA" }
{ "_id" : "01012", "city" : "CHESTERFIELD", "loc" : [ -72.833309, 42.38167 ], "pop" : 177, "state" : "MA" }
{ "_id" : "01013", "city" : "CHICOPEE", "loc" : [ -72.607962, 42.162046 ], "pop" : 23396, "state" : "MA" }
{ "_id" : "01020", "city" : "CHICOPEE", "loc" : [ -72.576142, 42.176443 ], "pop" : 31495, "state" : "MA" }
I want to find the number of cities in this database using a MongoDB command. However, the database may have more than one record with the same city, as in the example above.
I tried:
>db.zipcodes.distinct("city").count();
2015-04-25T15:57:45.446-0400 E QUERY warning: log line attempted (159k) over max size (10k), printing beginning and end ... TypeError: Object AGAWAM,BELCHERTOWN ***data*** has no method 'count'
but it didn't work for me. I also tried something like this:
>db.zipcodes.find({city:.*}).count();
2015-04-25T16:00:01.043-0400 E QUERY SyntaxError: Unexpected token .
But that didn't work either, and even if it did, it would count duplicate cities. Any ideas?
Instead of doing
db.zipcodes.distinct("city").count();
do this:
db.zipcodes.distinct("city").length;
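distinct() returns a plain JavaScript array, which has a length property but no count() method.
There is also the aggregate function, which may help you.
I have also found an example using aggregate (related to your query).
If you want to add a condition, you could refer to $gte (query and aggregation versions) and/or $lte (query and aggregation versions).
See if that helps.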
You can also use the aggregation framework for this. The pipeline has two $group stages: the first groups the documents by city, and the second counts the documents coming out of the first, i.e. the number of distinct cities:
db.collection.aggregate([
{
"$group": {
"_id": "$city"
}
},
{
"$group": {
"_id": 0,
"count": { "$sum": 1 }
}
}
]);
Output:
/* 1 */
{
"result" : [
{
"_id" : 0,
"count" : 3
}
],
"ok" : 1
}
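Both answers boil down to counting distinct key values. A plain JavaScript sketch of that idea, with countDistinct as a hypothetical helper: a Set keeps one entry per key, just as the first $group keeps one document per city. Grouping by city alone merges same-named cities in different states; whether that is wanted is a design choice, so the sketch also counts (city, state) pairs, which corresponds to a first $group with _id: { city: "$city", state: "$state" }.

```javascript
// Sample zip-code documents from the question (loc omitted).
const zipcodes = [
  { _id: "01011", city: "CHESTER", pop: 1688, state: "MA" },
  { _id: "01012", city: "CHESTERFIELD", pop: 177, state: "MA" },
  { _id: "01013", city: "CHICOPEE", pop: 23396, state: "MA" },
  { _id: "01020", city: "CHICOPEE", pop: 31495, state: "MA" }
];

// Count distinct keys, like db.zipcodes.distinct("city").length or the
// two-stage $group pipeline.
function countDistinct(docs, keyOf) {
  return new Set(docs.map(keyOf)).size;
}

// Distinct city names.
const byCity = countDistinct(zipcodes, d => d.city);
// Distinct (city, state) pairs.
const byCityAndState = countDistinct(zipcodes, d => `${d.city}|${d.state}`);
```

On the four sample documents both counts are 3, matching the count shown in the output above.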

Saving the result of a MongoDB query

When doing research in the mongo shell, I often write fairly complex queries and want the results stored in another collection. I know how to do it with .forEach():
db.documents.find(query).forEach(function(d){db.results.insert(d)})
But it's kind of tedious to write that stuff each time. Is there a cleaner way? I'd like the syntax to be something like db.documents.find(query).dumpTo('collectionName').
Here's a solution I'll use: db.results.insert(db.docs.find(...).toArray())
There is still too much noise, though.
UPD: Another option is to rewrite the find as an aggregation pipeline, which lets you use the $out stage.
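That rewrite is mechanical enough to wrap in a function. A minimal sketch, where findToOutPipeline is a hypothetical helper that only builds the pipeline array; in the shell it would be used as db.documents.aggregate(findToOutPipeline(query, "results")).

```javascript
// Hypothetical helper: rewrite find(filter, projection) as an aggregation
// pipeline whose final $out stage writes the matches to targetCollection.
// $out must be the last stage, and it replaces targetCollection if it exists.
function findToOutPipeline(filter, targetCollection, projection) {
  const pipeline = [{ $match: filter }];
  if (projection !== undefined) {
    pipeline.push({ $project: projection });
  }
  pipeline.push({ $out: targetCollection });
  return pipeline;
}
```

Not every find filter carries over unchanged; for example, $where is not allowed inside $match.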
It looks like you are running your queries from the mongo shell, which lets you write JavaScript. You can assign the result of a query to a variable:
result = db.mycollection.findOne(my_query)
And save the result to another collection:
db.result.save(result)
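If you want to append the result to the result collection rather than overwrite an existing document with the same _id, you may have to remove its _id first (with insert instead of save, keeping it would raise a duplicate key error).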
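Removing _id from a batch of documents can be done in one pass. A minimal sketch, where stripIds is a hypothetical helper; in the shell its result could be passed to something like db.result.insertMany(...), letting the server generate fresh _id values.

```javascript
// Return copies of the documents without their _id fields; the input
// documents themselves are left untouched.
function stripIds(docs) {
  return docs.map(({ _id, ...rest }) => rest);
}
```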
Edit:
db.mycollection.findOne({'_id':db.mycollection.findOne()['_id']})
db.foo.save(db.bar.findOne(...))
If you want to save an array, you can write a javascript function. Something like the following should work (I haven't tested it):
function save_array(arr) {
for(var i = 0; i < arr.length; i++) {
db.result.save(arr[i])
}
}
...
result = db.mycollection.find(...).toArray()  // toArray() turns the cursor into an array
save_array(result)
If you want the function to be available every time you start mongo shell, you can include it in your .mongorc.js file
As far as I know, there isn't built-in functionality to do this in MongoDB.
Other options would be to use mongoexport/mongoimport or mongodump/mongorestore functionalities.
In both mongoexport and mongodump you can filter the results by adding query options using --query <JSON> or -q <JSON>.
If your query uses the aggregation framework, the solution is as simple as adding a $out stage.
I created a sample collection named "tester" containing the following documents:
{ "_id" : ObjectId("4fb36bfd3d1c88bfa15103b1"), "name" : "bob", "value" : 5, "state" : "b"}
{ "_id" : ObjectId("4fb36c033d1c88bfa15103b2"), "name" : "bob", "value" : 3, "state" : "a"}
{ "_id" : ObjectId("4fb36c063d1c88bfa15103b3"), "name" : "bob", "value" : 7, "state" : "a"}
{ "_id" : ObjectId("4fb36c0c3d1c88bfa1a03b4"), "name" : "john", "value" : 2, "state" : "a"}
{ "_id" : ObjectId("4fb36c103d1c88bfa5103b5"), "name" : "john", "value" : 4, "state" : "b"}
{ "_id" : ObjectId("4fb36c143d1c88bfa15103b"), "name" : "john", "value" : 8, "state" : "b"}
{ "_id" : ObjectId("4fb36c163d1c88bfa15103a"), "name" : "john", "value" : 6, "state" : "a"}
Now, using aggregate, I group the documents and then save the result into a new collection with this magical $out stage:
db.tester.aggregate([{$group:{
_id:{name:"$name",state:"$state"},
min:{$min:"$value"},
max:{$max:"$value"},
} },
{$out:"tester_max_min"}
])
In short, the query groups by name and state, finds the minimum and maximum value for each group, and saves the result into a new collection named "tester_max_min":
db.tester_max_min.find();
The new collection contains the following documents:
{ "_id" : { "name" : "john", "state" : "b" }, "min" : 4, "max" : 8 }
{ "_id" : { "name" : "john", "state" : "a" }, "min" : 2, "max" : 6 }
{ "_id" : { "name" : "bob", "state" : "a" }, "min" : 3, "max" : 7 }
{ "_id" : { "name" : "bob", "state" : "b" }, "min" : 5, "max" : 5 }
I still need to explore how far $out can be taken, but it works like a charm as the final stage of any aggregation pipeline ($out must be the last stage).
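To see what the $group stage computes before $out writes it, here is a plain JavaScript sketch over the same sample values (the ObjectIds are omitted). groupMinMax is a hypothetical helper; like $group, it builds one output document per distinct { name, state } key and tracks the minimum and maximum value.

```javascript
// Sample documents from the "tester" collection, without _id.
const tester = [
  { name: "bob", value: 5, state: "b" },
  { name: "bob", value: 3, state: "a" },
  { name: "bob", value: 7, state: "a" },
  { name: "john", value: 2, state: "a" },
  { name: "john", value: 4, state: "b" },
  { name: "john", value: 8, state: "b" },
  { name: "john", value: 6, state: "a" }
];

// Mirrors { $group: { _id: { name: "$name", state: "$state" },
//                     min: { $min: "$value" }, max: { $max: "$value" } } }.
function groupMinMax(docs) {
  const groups = new Map();
  for (const { name, state, value } of docs) {
    const key = `${name}|${state}`;
    const g = groups.get(key);
    if (g === undefined) {
      groups.set(key, { _id: { name, state }, min: value, max: value });
    } else {
      g.min = Math.min(g.min, value);
      g.max = Math.max(g.max, value);
    }
  }
  return [...groups.values()];
}
```

The four resulting groups match the documents listed above. As with $group, the output order is not something to rely on, so look results up by key.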