MongoDB query is slow and not using Indexes - mongodb

I have a collection with structure like this:
{
    "_id" : ObjectId("59d7cd63dc2c91e740afcdb"),
    "enrollment" : [
        { "month": -10, "enroll": '00' },
        { "month": -9,  "enroll": '00' },
        { "month": -8,  "enroll": '01' },
        // other months
        { "month": 8,  "enroll": '11' },
        { "month": 9,  "enroll": '11' },
        { "month": 10, "enroll": '00' }
    ]
}
I am trying to run the following query:
db.getCollection('collection').find({
    "enrollment": {
        "$not": {
            "$elemMatch": { "month": { "$gte": -2, "$lte": 9 }, "enroll": "00" }
        }
    }
}).count()
This query takes 1.6 to 1.9 seconds. I need to get this down as low as possible, ideally to milliseconds.
I tried creating a multikey index on the month and enroll fields. I tried various combinations, but the query is not using any of the indexes.
I tried all these combinations:
1. { 'enrollment.month':1 }
2. { 'enrollment.month':1 }, { 'enrollment.enroll':1 } -- two separate indexes
3. { 'enrollment.month':1, 'enrollment.enroll':1}
4. { 'enrollment.enroll':1, 'enrollment.month':1}
Parsed query and query plan: (explain() output omitted here.)
Any suggestions to improve the performance are highly appreciated.
I am fairly confident that hardware is not the issue, but I am open to any suggestions.
My data size is not huge. It's just under 1 GB: about 41K documents, with roughly 13 million subdocuments in total.
Note: I have posted a couple of questions on this in the last few days, but with this one I am trying to narrow down the problem area. Please do not treat this as a duplicate of my earlier questions.

Try inverting the query:
db.getCollection('collection').find({
    "enrollment": {
        "$elemMatch": {
            "month": { "$lt": -2, "$gt": 9 },
            "enroll": { "$ne": "00" }
        }
    }
}).count()
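Either way, it is worth confirming what the query planner actually does with the indexes you created. A minimal check (assuming, say, the compound index from combination 3 exists) is to look at the winning plan with explain():

db.getCollection('collection').find({
    "enrollment": {
        "$not": {
            "$elemMatch": { "month": { "$gte": -2, "$lte": 9 }, "enroll": "00" }
        }
    }
}).explain("executionStats")
// Look for IXSCAN vs COLLSCAN under winningPlan, and compare
// totalDocsExamined with the number of documents actually returned.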

Related

How to write a single query to count elements above a certain value in MongoDB

I have the following sample collection of movies:
[
    {
        "title": "Boots and Saddles",
        "year": 1909,
        "cast": [],
        "genres": []
    },
    {
        "title": "The Wooden Leg",
        "year": 1909,
        "cast": [],
        "genres": []
    },
    {
        "title": "The Sanitarium",
        "year": 1910,
        "cast": ["Fatty Arbuckle"],
        "genres": ["Comedy"]
    },
    {
        "title": "Snow White",
        "year": 1916,
        "cast": ["Marguerite Clark"],
        "genres": ["Fantasy"]
    },
    {
        "title": "Haunted Spooks",
        "year": 1920,
        "cast": ["Harold Lloyd"],
        "genres": ["Comedy"]
    },
    {
        "title": "Very Truly Yours",
        "year": 1922,
        "cast": ["Shirley Mason", "lan Forrest"],
        "genres": ["Romance"]
    }
]
I want to count the number of movies that appeared in the last 20 years (counting back from the most recent movie recorded in this collection).
I have the following query to find the year of the most recent movie (the result shows 2018):
db.movies.find({},{"_id":0, "year":1}).sort({year:-1}).limit(1)
So to find how many movies appeared in the last 20 years I wrote this:
db.movies.aggregate([{$match:{year:{$gte:1999}}},{$count:"title"}])
However, this is not very robust, because if the database is modified or updated I will have to change the hard-coded year in that query every time.
Is there a more elegant way to find the result?
Thank you in advance!
You can use MongoDB's aggregate method and compute the cutoff inside the pipeline, so it always follows the latest movie in the collection.
db.movies.aggregate([
    {
        $group: {
            _id: null,
            latestMovieYear: { $max: "$year" },
            years: { $push: "$year" }
        }
    },
    {
        $project: {
            _id: 0,
            movies: {
                $size: {
                    $filter: {
                        input: "$years",
                        cond: { $gte: [ "$$this", { $subtract: [ "$latestMovieYear", 20 ] } ] }
                    }
                }
            }
        }
    }
]);
The $group stage gathers all the documents together, taking the year of the most recent movie with $max and collecting every movie's year into an array with $push.
The $project stage then uses $filter to keep only the years that are greater than or equal to latestMovieYear minus 20, and $size to count how many remain.
The result is the number of movies released within 20 years of the latest movie in the collection, and the cutoff moves automatically whenever newer movies are added.
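If you are on MongoDB 5.0 or newer, a variant of the same idea (a sketch, not something from the original answer) avoids collecting every year into an array by letting $setWindowFields compute the collection-wide maximum year:

db.movies.aggregate([
    // latestMovieYear is added to every document as the max year over the whole collection
    { $setWindowFields: { output: { latestMovieYear: { $max: "$year" } } } },
    { $match: { $expr: { $gte: [ "$year", { $subtract: [ "$latestMovieYear", 20 ] } ] } } },
    { $count: "movies" }
]);

With the sample data above the latest movie is from 1922, so either version counts the movies released in 1902 or later (all six of them).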

MongoDB: return the min value for a field if it falls within the last year, otherwise return the min value for the collection

I have a collection that consists of documents in this format:
{"_id":{"date_played":{"$date":"1998-03-28T00:00:00.000Z"},"course_played":4,"player_id":11},"score":[5,6,4,4,5,9,6,6,5,7,6,6,5,7,5,3,9,4],"handicap":30,"cash_won":0,"sort_order":6,"gross_score":102,"gross_sfpts":34,"skins_group":1,"score_differential":28,"pcc_adjustment":0,"player_name":"Dave"}
For a given _id.player_id I am trying to return the min value of "score_differential" for records within the last year (by _id.date_played); if there are no records in the last year, then I want the min "score_differential" for that player across the whole collection.
I have tried lots of combinations, but this is the closest I have got:

[
    {
        '$match': {
            '_id.player_id': 11
        }
    },
    {
        '$group': {
            '_id': {
                '$min': [
                    {
                        '$gt': [
                            '$_id.date_played',
                            {
                                '$dateFromParts': {
                                    'year': { '$subtract': [ { '$year': new Date() }, 1 ] },
                                    'month': { '$month': new Date() },
                                    'day': { '$dayOfMonth': new Date() }
                                }
                            }
                        ]
                    }
                ]
            },
            'minWHI': {
                '$min': '$score_differential'
            }
        }
    }
]

This returns the correct values, but the problem is that if a date is found within the year I get two documents back: one with _id: false, which has the lowest value in the collection, and one with _id: true, which has the lowest for the year. I only want one document back, not two. Any help is much appreciated, as I have spent days on this; I am relatively new to MongoDB, coming from MySQL.
Thanks, but I no longer need a solution to this. I actually use the above code in a $lookup stage in another query, and as that returns an array I end up with just one document, in which I can test the matched field for the values I need. I was only confused because I was testing each part of the main query separately and, in doing so, getting two documents back.

MongoDB Aggregate multiple count and latest date

I'm trying to get a Mongo 3.0 query that is beyond my depth, and I was hoping for a bit of help. Basically, my database has transcription records where each record has a user_name, project_id, expedition_id and finished_date. Those are the fields I'm interested in. A project will have multiple expeditions, and each expedition multiple transcriptions.
I would like to display information for a given user on a stats page for a given project. The display would be: User Name; Total Project Transcriptions, the number of transcriptions that user submitted across the whole project; Total Participated Expeditions, the number of expeditions the user participated in across the project; and the last date on which the user actually performed a transcription.
So far, it's easy enough to get the Total Project Transcriptions by grouping on user_name and matching on the project id:
db.transcriptions.aggregate([
    { "$match": { "projectId": 13 } },
    {
        "$group": {
            "_id": "$user_name",
            "transcriptionCount": { "$sum": 1 }
        }
    }
])
Each transcription document has an expedition_id field (4, 7, 9, 10, etc.) and the finished_date. So if a user performed 100 transcriptions but only participated in expeditions 7 and 10, the Total Participated Expeditions would be 2.
The last_date would be the most recent finished_date, i.e. the last time the user performed a transcription. Example of a returned record:
user_name: john smith
transcriptionCount: 100
expeditionCount: 2
last_date: 2017-08-15
Hope I explained that well enough. Would appreciate any help.
You can try the below aggregation.
db.transcriptions.aggregate([
    {
        "$match": {
            "projectId": 13
        }
    },
    {
        "$sort": {
            "finished_date": -1
        }
    },
    {
        "$group": {
            "_id": "$user_name",
            "transcriptionCount": { "$sum": 1 },
            "expedition": { "$addToSet": "$expedition_id" },
            "last_date": { "$first": "$finished_date" }
        }
    },
    {
        "$project": {
            "_id": 0,
            "user_name": "$_id",
            "transcriptionCount": 1,
            "expeditionCount": { "$size": "$expedition" },
            "last_date": 1
        }
    }
])
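A small variation (a sketch, not part of the original answer) drops the $sort stage and takes the most recent date directly with the $max accumulator, which yields the same last_date:

db.transcriptions.aggregate([
    { "$match": { "projectId": 13 } },
    {
        "$group": {
            "_id": "$user_name",
            "transcriptionCount": { "$sum": 1 },
            "expedition": { "$addToSet": "$expedition_id" },
            "last_date": { "$max": "$finished_date" }       // most recent finished_date per user
        }
    },
    {
        "$project": {
            "_id": 0,
            "user_name": "$_id",
            "transcriptionCount": 1,
            "expeditionCount": { "$size": "$expedition" },
            "last_date": 1
        }
    }
])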

MongoDB - Querying between a time range of hours

I have a MongoDB datastore set up with location data stored like this:
{
    "_id" : ObjectId("51d3e161ce87bb000792dc8d"),
    "datetime_recorded" : ISODate("2013-07-03T05:35:13Z"),
    "loc" : {
        "coordinates" : [
            0.297716,
            18.050614
        ],
        "type" : "Point"
    },
    "vid" : "11111-22222-33333-44444"
}
I'd like to be able to perform a query similar to the date range example, but on a time range instead, i.e. retrieve all points recorded between 12PM and 4PM (1200 and 1600 in 24-hour time).
e.g.
With points:
"datetime_recorded" : ISODate("2013-05-01T12:35:13Z"),
"datetime_recorded" : ISODate("2013-06-20T05:35:13Z"),
"datetime_recorded" : ISODate("2013-01-17T07:35:13Z"),
"datetime_recorded" : ISODate("2013-04-03T15:35:13Z"),
a query
db.points.find({'datetime_recorded': {
$gte: Date(1200 hours),
$lt: Date(1600 hours)}
});
would yield only the first and last point.
Is this possible? Or would I have to do it for every day?
Well, the best way to solve this is to store the minutes separately as well. But you can get around this with the aggregation framework, although that is not going to be very fast:
db.so.aggregate( [
    { $project: {
        loc: 1,
        vid: 1,
        datetime_recorded: 1,
        minutes: { $add: [
            { $multiply: [ { $hour: '$datetime_recorded' }, 60 ] },
            { $minute: '$datetime_recorded' }
        ] }
    } },
    { $match: { 'minutes' : { $gte : 12 * 60, $lt : 16 * 60 } } }
] );
In the first step, $project, we calculate the minutes of the day as hour * 60 + minute, which we then match against in the second step, $match.
Adding an answer since I disagree with the other answers in that even though there are great things you can do with the aggregation framework, this really is not an optimal way to perform this type of query.
If your identified application usage pattern is that you rely on querying for "hours" or other times of the day without wanting to look at the "date" part, then you are far better off storing that as a numeric value in the document. Something like "milliseconds from start of day" would be granular enough for as many purposes as a BSON Date, but of course gives better performance without the need to compute for every document.
Set Up
This does require some set-up in that you need to add the new fields to your existing documents and make sure you add these on all new documents within your code. A simple conversion process might be:
MongoDB 4.2 and upwards
This can actually be done in a single request due to aggregation operations being allowed in "update" statements now.
db.collection.updateMany(
    {},
    [{ "$set": {
        "timeOfDay": {
            "$mod": [
                { "$toLong": "$datetime_recorded" },
                1000 * 60 * 60 * 24
            ]
        }
    }}]
)
Older MongoDB
var batch = [];
db.collection.find({ "timeOfDay": { "$exists": false } }).forEach(doc => {
    batch.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": {
                "$set": {
                    "timeOfDay": doc.datetime_recorded.valueOf() % (60 * 60 * 24 * 1000)
                }
            }
        }
    });

    // write once only per reasonable batch size
    if ( batch.length >= 1000 ) {
        db.collection.bulkWrite(batch);
        batch = [];
    }
})

if ( batch.length > 0 ) {
    db.collection.bulkWrite(batch);
    batch = [];
}
If you can afford to write to a new collection, then looping and rewriting would not be required:
db.collection.aggregate([
    { "$addFields": {
        "timeOfDay": {
            "$mod": [
                { "$subtract": [ "$datetime_recorded", new Date(0) ] },
                1000 * 60 * 60 * 24
            ]
        }
    }},
    { "$out": "newcollection" }
])
Or with MongoDB 4.0 and upwards:
db.collection.aggregate([
    { "$addFields": {
        "timeOfDay": {
            "$mod": [
                { "$toLong": "$datetime_recorded" },
                1000 * 60 * 60 * 24
            ]
        }
    }},
    { "$out": "newcollection" }
])
All using the same basic conversion of:
1000 milliseconds in a second
60 seconds in a minute
60 minutes in an hour
24 hours a day
Taking the modulo of the numeric milliseconds since epoch (which is the value actually stored internally in a BSON Date) is a simple way to extract the milliseconds elapsed in the current day.
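As a quick sanity check of that arithmetic, using the sample document from the question:

// 2013-07-03T05:35:13Z -> milliseconds into that day:
//   5 hours    * 3600000 = 18000000
//   35 minutes *   60000 =  2100000
//   13 seconds *    1000 =    13000
//   total                = 20113000
new Date("2013-07-03T05:35:13Z").valueOf() % (1000 * 60 * 60 * 24)   // 20113000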
Query
Querying is then really simple, and as per the question example:
db.collection.find({
    "timeOfDay": {
        "$gte": 12 * 60 * 60 * 1000, "$lt": 16 * 60 * 60 * 1000
    }
})
This of course uses the same conversion from hours into milliseconds to match the stored format, but just as before you can use whatever scale you actually need.
Most importantly, as real document properties which don't rely on computation at run-time, you can place an index on this:
db.collection.createIndex({ "timeOfDay": 1 })
So not only does this remove the run-time overhead of calculating the value, but with an index you can also avoid collection scans, as outlined in the MongoDB documentation on indexing.
For optimal performance you never want to calculate such things at query time: at any real-world scale it simply takes an order of magnitude longer to process every document in the collection just to work out which ones you want than it does to reference an index and fetch only those documents.
The aggregation framework may just be able to help you rewrite the documents here, but it really should not be used as a production system method of returning such data. Store the times separately.

Query For Total User Count Day By Day In MongoDB

I have "users" collection and i want day by day total user count eg:
01.01.2012 -> 5
02.01.2012 -> 9
03.01.2012 -> 18
04.01.2012 -> 24
05.01.2012 -> 38
06.01.2012 -> 48
I have a createdAt attribute for each user. Can you help me with the query?
{
    "_id" : ObjectId( "5076d3e70546c971539d9f8a" ),
    "createdAt" : Date( 1339964775466 ),
    "points" : 200,
    "profile" : null,
    "userId" : "10002"
}
Here is what works for me for day-by-day count data.
Output I got:
30/3/2016 4
26/3/2016 4
21/3/2016 4
12/3/2016 12
14/3/2016 18
10/3/2016 10
9/3/2016 11
8/3/2016 19
7/3/2016 21
script:
model.aggregate({
    $match: {
        createdAt: {
            $gte: new Date("2016-01-01")
        }
    }
}, {
    $group: {
        _id: {
            "year": { "$year": "$createdAt" },
            "month": { "$month": "$createdAt" },
            "day": { "$dayOfMonth": "$createdAt" }
        },
        count: { $sum: 1 }
    }
}).exec(function (err, data) {
    if (err) {
        console.log('Error Fetching model');
        console.log(err);
    } else {
        console.log(data);
    }
});
You have a couple of options, in order of performance:
Maintain the count in separate pre-aggregated documents. Every time you add a user you update the counter for that day (so each day has its own counter document in, say, a users.daycounters collection); a sketch of this pattern is shown after this list. This is easily the fastest approach and scales best.
In 2.2 or higher you can use the aggregation framework. Examples close to your use case are documented here. Look for the $group operator : http://docs.mongodb.org/manual/applications/aggregation/
You can use the map/reduce framework : http://www.mongodb.org/display/DOCS/MapReduce. This is sharding compatible but relatively slow due to the JavaScript context use. Also it's not very straightforward for something as simple as this.
You can use the group() operator documented here : http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Group. Since this does not work in a sharded environment and is generally slow due to the use of the single-threaded JavaScript context this is not recommended.
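For the first option, a minimal sketch of that pre-aggregated counter pattern (the users.daycounters collection name is just an illustration, and updateOne assumes a reasonably modern shell or driver):

// Whenever a new user is created, bump that day's counter; upsert creates the
// counter document the first time a given day is seen.
var day = new Date().toISOString().slice(0, 10);   // e.g. "2012-01-05"
db.getCollection("users.daycounters").updateOne(
    { _id: day },
    { $inc: { count: 1 } },
    { upsert: true }
);

// Reading the day-by-day counts back is then a simple ordered scan on _id:
db.getCollection("users.daycounters").find().sort({ _id: 1 });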