MongoDB Max Query

If I have the following data in MongoDb:
{ "_id" : ObjectId("1"), "name" : "call", "hour": 10, "number" : 14 }
{ "_id" : ObjectId("2"), "name" : "call", "hour": 11, "number" : 100 }
{ "_id" : ObjectId("3"), "name" : "call", "hour": 12, "number" : 200 }
I want to query MongoDB to find which hour has the highest number of calls, so the result would look like this (I want it to return the whole document):
{ "_id" : ObjectId("3"), "name" : "call", "hour": 12, "number" : 200 }
I found that MongoDB has its own max() function, but it does not work like this. Could anybody tell me how to do this simple query?
Thanks to kij's idea. If I query like this:
db.xxx.find({name:"call"}).sort({number:-1}).limit(1)
it gives me the correct result, but it feels clumsy. Is there a more direct way?

this should do the job:
db.collection.find().limit(1).sort( { number: -1 } )

Instead you could sort descending and take the first result. I think it's a reasonable alternative if max does not work as expected.
I have not practiced in a long time, but something like this should work (note there is no .first() in the shell; limit(1) plays that role, and find() must come before sort()):
db.collection.find().sort({ number: -1 }).limit(1)

kij pointed to the correct answer, I believe. Just add an index on your "number" field and the query will be optimized.
The order of sort, limit and find is effectively the same in MongoDB find queries, no matter how the query is written (see the paragraph at the bottom of the page):
http://docs.mongodb.org/manual/reference/method/db.collection.find/
The aggregation framework in MongoDB gives more control over order, sort and limit, I think.
Peter
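For illustration, the sort-then-limit approach the answers describe amounts to the following (a plain-JavaScript sketch over the sample documents from the question, not a live MongoDB query):

```javascript
// Plain-JavaScript sketch of db.xxx.find({name:"call"}).sort({number:-1}).limit(1),
// using the sample documents from the question.
const calls = [
  { name: "call", hour: 10, number: 14 },
  { name: "call", hour: 11, number: 100 },
  { name: "call", hour: 12, number: 200 },
];

// Filter (find), sort descending by number, take the first result (limit 1).
const busiestHour = calls
  .filter((d) => d.name === "call")
  .sort((a, b) => b.number - a.number)[0];

console.log(busiestHour); // { name: 'call', hour: 12, number: 200 }
```

With an index on `number`, the real MongoDB query does the same work without a full in-memory sort.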

Related

Query on all descendants of node

Given a collection of MongoDb documents with a property "myContacts" like this:
{
  "_id": 123,
  "myContacts": {
    "contacts": {
      "10": {
        "_id": NumberLong(10),
        "name": "c1",
        "prop": true
      },
      "20": {
        "_id": NumberLong(20),
        "name": "c2"
      }
    }
  }
}
I want to select all documents, where at least one contact lacks the "prop" field.
I figured out a general query:
db.getCollection('xyz').find({ 'myContacts.contacts.???.prop': { $exists: false } })
The problem is that the IDs of the contacts are part of the path and I cannot know them ahead of time. I want something like 'myContacts.contacts.$anyChild.prop', but cannot find anything similar in the Mongo docs.
Does that mean there is no way to do it?
PS: I cannot change the document structure; a live app uses it. I've spent some time with Google and my bet is it's not possible. I would, however, like an opinion from people who have experience with Mongo.
Thank you guys for helpful comments, this got me going! I could get the results I wanted with:
db.getCollection('xyz').aggregate([
  { $project: { _id: 1, contacts: { $objectToArray: "$myContacts.contacts" } } },
  { $match: { "contacts.v.prop": null } }
])
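In plain JavaScript terms, what that pipeline does is turn the keyed "contacts" object into an array of {k, v} pairs and keep documents where any contact lacks "prop". A sketch (the second sample document is mine, added to show a non-matching case; this is not a live MongoDB query):

```javascript
// Simulates $objectToArray followed by $match {"contacts.v.prop": null}.
const docs = [
  {
    _id: 123,
    myContacts: {
      contacts: {
        "10": { _id: 10, name: "c1", prop: true },
        "20": { _id: 20, name: "c2" }, // no "prop" field
      },
    },
  },
  {
    _id: 456, // hypothetical document where every contact has "prop"
    myContacts: {
      contacts: {
        "30": { _id: 30, name: "c3", prop: false },
      },
    },
  },
];

const matches = docs.filter((doc) => {
  // $objectToArray: { "10": {...}, "20": {...} } -> [{ k: "10", v: {...} }, ...]
  const pairs = Object.entries(doc.myContacts.contacts).map(([k, v]) => ({ k, v }));
  // "contacts.v.prop": null matches when any array element lacks the field
  return pairs.some((p) => !("prop" in p.v));
});

console.log(matches.map((d) => d._id)); // [ 123 ]
```

Note that `prop: false` still counts as present, which is why document 456 is not matched.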

MongoDB - Get aggregated difference between two date fields

I have one collection called lists with following fields:
{ "_id" : ObjectId("5a7c9f60c05d7370232a1b73"), "created_date" : ISODate("2018-11-10T04:40:11Z"), "processed_date" : ISODate("2018-11-10T04:40:10Z") }
{ "_id" : ObjectId("5a7c9f85c05d7370232a1b74"), "created_date" : ISODate("2018-11-10T04:40:11Z"), "processed_date" : ISODate("2018-11-10T04:41:10Z") }
{ "_id" : ObjectId("5a7c9f89c05d7370232a1b75"), "created_date" : ISODate("2018-11-10T04:40:11Z"), "processed_date" : ISODate("2018-11-10T04:42:10Z") }
{ "_id" : ObjectId("5a7c9f8cc05d7370232a1b76"), "created_date" : ISODate("2018-11-10T04:40:11Z"), "processed_date" : ISODate("2018-11-10T04:42:20Z") }
I need the aggregated result in the following format (based on the difference between processed_date and created_date):
[{
  "30Sec": count_for_difference_1,
  "<=60Sec": count_for_difference_2,
  "<=90Sec": count_for_difference_3
}]
One more thing: I'd like to find out how many items took 30 sec, 60 sec and so on, and make sure that items counted in <=60Sec do not also appear in <=90Sec.
Any help will be appreciated.
You can try the aggregation query below in the 3.6 version.
$match with $expr to limit the documents to those where the time difference is 90 seconds or less.
$group with $sum to count the occurrences in each time slice.
db.collection.aggregate([
  {"$match":{"$expr":{"$lte":[{"$subtract":["$processed_date","$created_date"]},90000]}}},
  {"$group":{
    "_id":null,
    // ranges are kept disjoint so a document is counted in exactly one slice
    "30Sec":{"$sum":{"$cond":{"if":{"$lte":[{"$subtract":["$processed_date","$created_date"]},30000]},"then":1,"else":0}}},
    "<=60Sec":{"$sum":{"$cond":{"if":{"$and":[{"$gt":[{"$subtract":["$processed_date","$created_date"]},30000]},{"$lte":[{"$subtract":["$processed_date","$created_date"]},60000]}]},"then":1,"else":0}}},
    "<=90Sec":{"$sum":{"$cond":{"if":{"$gt":[{"$subtract":["$processed_date","$created_date"]},60000]},"then":1,"else":0}}}
  }}
])
Note that if the created date can be greater than the processed date, you may want to add a condition so you only count values where the difference is between 0 and your requested time slice.
Something like:
{"$and":[{"$gte":[{"$subtract":["$processed_date","$created_date"]},0]},{"$lte":[{"$subtract":["$processed_date","$created_date"]},60000]}]}
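The bucketing logic can be checked against the sample documents in plain JavaScript (a sketch, not a live MongoDB query; note the first document has a negative difference and is excluded by the 0-or-greater guard):

```javascript
// Disjoint time-slice bucketing over the question's sample documents.
const lists = [
  { created_date: new Date("2018-11-10T04:40:11Z"), processed_date: new Date("2018-11-10T04:40:10Z") },
  { created_date: new Date("2018-11-10T04:40:11Z"), processed_date: new Date("2018-11-10T04:41:10Z") },
  { created_date: new Date("2018-11-10T04:40:11Z"), processed_date: new Date("2018-11-10T04:42:10Z") },
  { created_date: new Date("2018-11-10T04:40:11Z"), processed_date: new Date("2018-11-10T04:42:20Z") },
];

const counts = { "30Sec": 0, "<=60Sec": 0, "<=90Sec": 0 };
for (const doc of lists) {
  const diffMs = doc.processed_date - doc.created_date; // the $subtract equivalent
  if (diffMs < 0 || diffMs > 90000) continue; // the $match plus the >= 0 guard
  if (diffMs <= 30000) counts["30Sec"]++;
  else if (diffMs <= 60000) counts["<=60Sec"]++;
  else counts["<=90Sec"]++; // 60-90 seconds
}

console.log(counts); // { '30Sec': 0, '<=60Sec': 1, '<=90Sec': 0 }
```

Only the second document (a 59-second difference) lands in a bucket; the others are negative or over 90 seconds.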

Can sorting before grouping improve query performance in Mongo using the aggregate framework?

I'm trying to aggregate data for 100 accounts for a 14-15 month period, grouping by year and month.
However, the query performance is horrible as it takes 22-27 seconds. There are currently over 15 million records in the collection and I've got an index on the match criteria and can see using explain() that the optimizer uses it.
I tried adding another index on the sort criteria in the query below and after adding the index, the query now takes over 50 seconds! This happens even after I remove the sort from the query.
I'm extremely confused. I thought because grouping can't utilize an index, that if the collection was sorted beforehand, then the grouping could be much faster. Is this assumption correct? If not, what other options do I have? I can bear the query performance to be as much as 5 seconds but nothing more than that.
//Document Structure
{
  Acc: 1,
  UIC: true,
  date: ISODate("2015-12-01T05:00:00Z"),
  y: 2015,
  mm: 12,
  value: 22.3
}
//Query
db.MyCollection.aggregate([
{ "$match" : { "UIC" : true, "Acc" : { "$in" : [1, 2, 3, ..., 99, 100] }, "date" : { "$gte" : ISODate("2015-12-01T05:00:00Z"), "$lt" : ISODate("2017-02-01T05:00:00Z") } } },
//{ "$sort" : { "UIC" : 1, "Acc" : 1, "y" : -1, "mm" : 1 } },
{ "$group" : { "_id" : { "Num" : "$Num", "Year" : "$y", "Month" : "$mm" }, "Sum" : { "$sum" : "$value" } } }
])
What I would suggest is to write a script (it can be in Node.js) that aggregates the data into a different collection. When you have long-running queries like this, it's advisable to maintain a separate collection containing the aggregated data and query from that.
My second piece of advice would be to create a composite key in this aggregated collection and search it by regular expression. In your case I would build a key of the form accountId:period. For example, for account 1 and February of 2016, the key would be 1:201602.
Then you would be able to perform queries by account and time period using regular expressions. For example, if you wanted the records for 2016 for account 1, you could do something like:
db.aggregatedCollection.find({ _id: /^1:2016/ })
Hope my answer was helpful
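The composite-key idea above can be sketched like this (the helper name is mine; the "accountId:YYYYMM" format follows the answer's convention):

```javascript
// Builds the accountId:period key suggested by the answer, e.g. "1:201602".
function buildAggregationKey(accountId, date) {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0"); // zero-padded
  return `${accountId}:${year}${month}`;
}

const key = buildAggregationKey(1, new Date("2016-02-15T00:00:00Z"));
console.log(key); // "1:201602"

// A prefix regex then selects all of account 1's 2016 buckets:
console.log(/^1:2016/.test(key)); // true
```

A prefix-anchored regex like `/^1:2016/` can use an index on the key, which is what makes this pattern fast.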

How to store MapReduce results hierarchically in Mongo

I want to perform a map-reduce operation on some metric and store the result both aggregated and as a time series.
Storing the aggregated result seems very simple, but how can I store the result in a time-series fashion, i.e. whenever the map-reduce function runs, the value at that interval should also be recorded in the result collection?
Let's say I have the following result from my map-reduce aggregation:
> db.result.find()
{ "_id" : { "eventId" : 1}, "value" : { "sum" : 21 } }
{ "_id" : { "eventId" : 2}, "value" : { "sum" : 31 } }
I am able to achieve the above very easily with mapReduce.
I want the result to be stored as a time series as well, like below:
> db.result.find()
{ "_id" : { "eventId" : 1}, "value" : { "sum" : 21, "ts": {1: 15, 2: 4, 3: 2 } } }
{ "_id" : { "eventId" : 2}, "value" : { "sum" : 31, "ts": {1: 12, 2: 12, 3: 7 } } }
Now whenever the map-reduce function runs, it should update the result collection.
I tried numerous ways to do this, but was unable to succeed. Any idea how I can achieve it?
Also, it would be great if this could be done within the same map-reduce call.
The general recommendation for such time-series data is to use pre-aggregated reports.
If that is not possible, first consider using the aggregation pipeline instead of map-reduce. It's faster and easier if your use case allows it.
With both the aggregation pipeline and map-reduce, you can use the results to create the desired document. $setOnInsert may be helpful.

Using Spring Data Mongodb, is it possible to get the max value of a field without pulling and iterating over an entire collection?

Using mongoTemplate.find(), I specify a Query with which I can call .limit() or .sort():
.limit() returns a Query object
.sort() returns a Sort object
Given this, I can say Query().limit(int).sort(), but this does not perform the desired operation; it merely sorts a limited result set.
I cannot call Query().sort().limit(int) either, since .sort() returns a Sort.
So using Spring Data, how do I perform the following, as shown in the mongoDB shell? Maybe there's a way to pass a raw query that I haven't found yet?
I would be OK with extending the Paging interface if need be... it just doesn't seem to help any. Thanks!
> j = { order: 1 }
{ "order" : 1 }
> k = { order: 2 }
{ "order" : 2 }
> l = { order: 3 }
{ "order" : 3 }
> db.test.save(j)
> db.test.save(k)
> db.test.save(l)
> db.test.find()
{ "_id" : ObjectId("4f74d35b6f54e1f1c5850f19"), "order" : 1 }
{ "_id" : ObjectId("4f74d3606f54e1f1c5850f1a"), "order" : 2 }
{ "_id" : ObjectId("4f74d3666f54e1f1c5850f1b"), "order" : 3 }
> db.test.find().sort({ order : -1 }).limit(1)
{ "_id" : ObjectId("4f74d3666f54e1f1c5850f1b"), "order" : 3 }
You can do this in spring-data-mongodb. Mongo will optimize sort/limit combinations IF the sort field is indexed (or is the @Id field). This produces very fast O(log N) or better results. Otherwise it is still O(N), as opposed to O(N log N), because it will use a top-k algorithm and avoid the global sort (see the MongoDB sort documentation). This is based on Mkyong's example, but I do the sort first and, second, set the limit to one.
Query query = new Query();
query.with(new Sort(Sort.Direction.DESC, "idField"));
query.limit(1);
MyObject maxObject = mongoTemplate.findOne(query, MyObject.class);
Normally, things that are done with aggregate SQL queries can be approached in (at least) three ways in NoSQL stores:
with Map/Reduce. This effectively goes through all the records, but in a more optimized way (it works with multiple threads, and in clusters). Here's the map/reduce tutorial for MongoDB.
pre-calculate the max value on each insert, and store it separately. So, whenever you insert a record, you compare it to the previous max value, and if it's greater, you update the max value in the db.
fetch everything into memory and do the calculation in the code. That's the most trivial solution. It would probably work well for small data sets.
Choosing one over the other depends on how you use this max value. If it is needed rarely, for example for some corner-case reporting, you can go with map/reduce. If it is used often, then store the current max.
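The second option above, maintaining the max on every insert, can be sketched like this (the class and field names are mine, for illustration; in MongoDB the running max would live in its own document):

```javascript
// Keeps a running max updated on each insert, so reads are O(1)
// instead of scanning or sorting at query time.
class MaxTracker {
  constructor() {
    this.max = null; // stands in for a separately stored max document
  }
  insert(record) {
    // ...store the record itself, then maintain the running max:
    if (this.max === null || record.order > this.max) {
      this.max = record.order;
    }
  }
}

const tracker = new MaxTracker();
[{ order: 1 }, { order: 3 }, { order: 2 }].forEach((r) => tracker.insert(r));
console.log(tracker.max); // 3
```

The trade-off is a small write-time cost for every insert in exchange for constant-time reads of the max.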
As far as I am aware Mongo totally supports sort then limit: see http://www.mongodb.org/display/DOCS/Sorting+and+Natural+Order
Getting the max/min via map-reduce is going to be very slow and should be avoided at all costs.
I don't know anything about Spring Data, but I can recommend Morphia to help with queries. Otherwise a basic way with the Java driver would be:
DBCollection coll = db.getCollection("...");
DBCursor cur = coll.find(new BasicDBObject()).sort(new BasicDBObject("order", -1))
        .limit(1);
if (cur.hasNext())
    System.out.println(cur.next());
Use the aggregation operator $max.
As $max is an accumulator operator available only in the $group stage, you need a small trick:
in the $group stage, use any constant as the _id.
Let's take the example given on the MongoDB site.
Consider a sales collection with the following documents:
{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-01-01T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-02-03T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 5, "date" : ISODate("2014-02-03T09:05:00Z") }
{ "_id" : 4, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-02-15T08:00:00Z") }
{ "_id" : 5, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-02-15T09:05:00Z") }
Say you want to find the max price among all the items:
db.sales.aggregate(
[
{
$group:
{
_id: "1", //** This is the trick
maxPrice: { $max: "$price" }
}
}
]
)
Note the value of "_id": it is the constant "1", but you could put any constant there.
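In plain JavaScript terms, grouping on a constant _id just means every document lands in one bucket, and $max takes the maximum over that bucket. A sketch using the sample data above (not a live MongoDB query):

```javascript
// Simulates $group with a constant _id and a $max accumulator over "price".
const sales = [
  { item: "abc", price: 10 },
  { item: "jkl", price: 20 },
  { item: "xyz", price: 5 },
  { item: "abc", price: 10 },
  { item: "xyz", price: 5 },
];

// _id: "1" puts every document in the same group, so $max sees all prices.
const result = {
  _id: "1",
  maxPrice: sales.reduce((max, d) => Math.max(max, d.price), -Infinity),
};

console.log(result); // { _id: '1', maxPrice: 20 }
```

This is why the constant works: grouping collapses the whole collection into a single group, and the accumulator runs over everything.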
Since the first answer is correct but the code is obsolete, I'm replying with a similar solution that worked for me:
Query query = new Query();
query.with(Sort.by(Sort.Direction.DESC, "field"));
query.limit(1);
Entity maxEntity = mongoTemplate.findOne(query, Entity.class);