Optimizing query with $elemMatch inside array - mongodb

First of all, I'd like to apologize for my English.
I have a serious performance issue with one of my queries. Unfortunately, I'm pretty new to MongoDB. I have a collection test that looks similar to this:
{
    "_id" : ObjectId("1"),
    [...]
    "statusHistories" : [
        {
            "created" : ISODate("2016-03-15T14:59:11.597Z"),
            "status" : "STAT1"
        },
        {
            "created" : ISODate("2016-03-15T14:59:20.465Z"),
            "status" : "STAT2"
        },
        {
            "created" : ISODate("2016-03-15T14:51:11.000Z"),
            "status" : "STAT3"
        }
    ]
}
statusHistories is an array.
More than 3,000 records are inserted into this collection daily.
What I want to achieve is to find all tests that have a given status and fall between two dates, so I have prepared this query:
db.getCollection('test').find({
    'statusHistories' : {
        $elemMatch : {
            created : {
                "$gte" : ISODate("2016-07-11T00:00:00.052Z"),
                "$lte" : ISODate("2016-07-11T23:59:00.052Z")
            },
            'status' : 'STAT1'
        }
    }
})
It gives the expected result. Unfortunately, it takes around 120 seconds to complete, which is way too long. Surprisingly, if I split it into two separate queries, each takes far less time:
db.getCollection('test').find({
    'statusHistories' : {
        $elemMatch : {
            created : {
                "$gte" : ISODate("2016-07-11T00:00:00.052Z"),
                "$lte" : ISODate("2016-07-11T23:59:00.052Z")
            }
        }
    }
})
db.getCollection('test').find({
    'statusHistories' : {
        $elemMatch : {
            'status' : 'STAT1'
        }
    }
})
Both of them complete in less than a second.
So what am I doing wrong in my original query? I need to fetch those records in one query, but when I combine the two conditions into a single $elemMatch it takes ages. I tried ensureIndex on statusHistories, but it didn't help. Any suggestion would be really helpful.
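One suggestion (my own, not from the thread, so treat it as a sketch): an index on the whole statusHistories field cannot be used here; $elemMatch needs a compound multikey index on the embedded fields it matches, so the planner can combine the bounds of both predicates against the same array element:

// Sketch: compound multikey index on the embedded fields (names taken
// from the sample document). Equality on status comes first, the range
// on created second, so both $elemMatch predicates can use index bounds.
db.test.createIndex({ "statusHistories.status" : 1, "statusHistories.created" : 1 })

After creating it, explain() on the original query should show an IXSCAN instead of a COLLSCAN.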

Related

Query MongoDB collection

I have a MongoDB collection like this:
{
    "_id" : {
        "processUuid" : "653d0937-2afc-4915-ad42-d2b69f344402",
        "partnerId" : "p377"
    },
    "isComplete" : true,
    "tasks" : {
        "dbb361a7-4b73-4691-bde5-2b160346464f" : {
            "sku" : "4060079000048",
            "status" : "FAILED",
            "errorList" : [....]
        },
        "790dbc6f-563d-4eb7-931c-3cc604563dc1" : {
            "sku" : "4060079000130",
            "status" : "SUCCESSFUL",
            "errorList" : [....]
        },
        ... more tasks
    }
    ... more processes
}
I want to query for a certain sku and couldn't find a way. I know I could write an aggregation pipeline to project the inner part of a task, but then I would lose the task identifier, which I need in order to add some data to a task.
I know the document structure is odd; I would have designed it differently, but it's a given.
I tried what I found about querying nested documents and so on, but unfortunately didn't get it to work. Any hints are appreciated.
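One possible approach (my sketch, assuming MongoDB 3.4+ and a collection named processes): $objectToArray turns the dynamic task keys into key/value pairs, so the task identifier survives the filtering:

db.processes.aggregate([
    // Convert { "<taskId>" : {...} } into [ { k : "<taskId>", v : {...} } ]
    { $project : { tasks : { $objectToArray : "$tasks" } } },
    { $unwind : "$tasks" },
    // Filter on the sku inside the task body
    { $match : { "tasks.v.sku" : "4060079000048" } },
    // Keep the identifier alongside the task itself
    { $project : { taskId : "$tasks.k", task : "$tasks.v" } }
])

Each result then carries taskId, so you can still address the task when writing your changes back.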

Query date based using milliseconds time on MongoDB

I need to search all records that match a date condition. In MongoDB I have a bunch of data like this:
{
    "_id" : "9ed3b937-0f43-4613-bd58-cb739a8c5bf6",
    "userModels" : {
        "5080" : {
            "generated_date_timestamp" : NumberLong(1413382499442),
            "model_id" : 5080
        }
    },
    "values" : {}
}
I'm not able to convert that NumberLong into something that can be used by this query:
db.anonProfile.find({
    "userModels.5080.generated_date_timestamp" : { "$gte" : ISODate("2013-10-01T00:00:00.000Z") }
});
"_id" has been saved as String so I cannot use for a ObjectID search. [btw, is it possible to do?]
Any clue?
Tnx, Andrea
You can query with:
db.anonProfile.find({
    "userModels.5080.generated_date_timestamp" : { "$gte" : ISODate("2013-10-01T00:00:00.000Z").getTime() }
});
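getTime() converts the ISODate into its epoch-millisecond value, which is exactly what the NumberLong in generated_date_timestamp stores, so the $gte comparison happens between plain numbers.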

Get specific object in array of array in MongoDB

I need to get a specific object from an array of arrays in MongoDB.
I need to get only the task object with _id = ObjectId("543429a2cb38b1d83c3ff2c2").
My document (projects):
{
    "_id" : ObjectId("543428c2cb38b1d83c3ff2bd"),
    "name" : "new project",
    "author" : ObjectId("5424ac37eb0ea85d4c921f8b"),
    "members" : [
        ObjectId("5424ac37eb0ea85d4c921f8b")
    ],
    "US" : [
        {
            "_id" : ObjectId("5434297fcb38b1d83c3ff2c0"),
            "name" : "Test Story",
            "author" : ObjectId("5424ac37eb0ea85d4c921f8b"),
            "tasks" : [
                {
                    "_id" : ObjectId("54342987cb38b1d83c3ff2c1"),
                    "name" : "teste3",
                    "author" : ObjectId("5424ac37eb0ea85d4c921f8b")
                },
                {
                    "_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
                    "name" : "jklasdfa_XXX",
                    "author" : ObjectId("5424ac37eb0ea85d4c921f8b")
                }
            ]
        }
    ]
}
Result expected:
{
    "_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
    "name" : "jklasdfa_XXX",
    "author" : ObjectId("5424ac37eb0ea85d4c921f8b")
}
But I'm not getting it.
I'm still testing with no success:
db.projects.find({
    "US.tasks._id" : ObjectId("543429a2cb38b1d83c3ff2c2")
}, { "US.tasks.$" : 1 })
I tried with $elemMatch too, but it returns nothing:
db.projects.find({
    "US" : {
        "tasks" : {
            $elemMatch : {
                "_id" : ObjectId("543429a2cb38b1d83c3ff2c2")
            }
        }
    }
})
Can I get ONLY my expected result using find()? If not, what should I use, and how?
Thanks!
You will need an aggregation for that:
db.projects.aggregate([
    { $unwind : "$US" },
    { $unwind : "$US.tasks" },
    { $match : { "US.tasks._id" : ObjectId("543429a2cb38b1d83c3ff2c2") } },
    { $project : { _id : 0, "task" : "$US.tasks" } }
])
should return
{
    "task" : {
        "_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
        "name" : "jklasdfa_XXX",
        "author" : ObjectId("5424ac37eb0ea85d4c921f8b")
    }
}
Explanation:
$unwind creates a new (virtual) document for each array element
$match is the query part of your find
$project is similar to the projection part of find, i.e. it specifies the fields you want in the results
You might want to add a second $match before the first $unwind if you know which documents you are searching (look at the performance metrics); a sketch of that variant follows below.
Edit: added a second $unwind since US is an array.
I don't know what you are doing (so I really can't tell and am just suggesting), but you might want to examine whether your schema (and MongoDB) is ideal for your task, because the document looks like denormalized relational data; a relational database would probably serve you better.
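Here is a sketch of the early-$match variant mentioned above (the arrangement is mine): filtering on the parent document first keeps the $unwind stages from fanning out over documents that cannot match:

db.projects.aggregate([
    // Served by an index on US.tasks._id, if one exists
    { $match : { "US.tasks._id" : ObjectId("543429a2cb38b1d83c3ff2c2") } },
    { $unwind : "$US" },
    { $unwind : "$US.tasks" },
    // Still needed to drop the sibling tasks produced by the $unwind stages
    { $match : { "US.tasks._id" : ObjectId("543429a2cb38b1d83c3ff2c2") } },
    { $project : { _id : 0, "task" : "$US.tasks" } }
])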

Queries on arrays with timestamps

I have documents that look like this:
{
    "_id" : ObjectId("5191651568f1f6000282b81f"),
    "updated_at" : "2013-05-16T09:46:16.199660",
    "activities" : [
        {
            "worker_name" : "image",
            "completed_at" : "2013-05-13T21:34:59.293711"
        },
        {
            "worker_name" : "image",
            "completed_at" : "2013-05-16T07:33:22.550405"
        },
        {
            "worker_name" : "image",
            "completed_at" : "2013-05-16T07:41:47.845966"
        }
    ]
}
I would like to find only those documents where the updated_at time is greater than the last activities.completed_at time (the array is in time order).
I currently have this, but it matches against any activities[].completed_at:
{
    "activities.completed_at" : { "$gte" : "updated_at" }
}
thanks!
Update
Well, I have different workers, and each has its own completed_at. I'll have to invert activities as follows:
activities : {
    image : {
        last : {
            completed_at : t3
        },
        items : [
            { completed_at : t0 },
            { completed_at : t1 },
            { completed_at : t2 },
            { completed_at : t3 }
        ]
    }
}
and use this query:
{
    "activities.image.last.completed_at" : { "$gte" : "updated_at" }
}
Assuming that you don't know how many activities you have (it would be easy if you always had, say, 3 activities, since you could then address the last one with the dot-notation path activities.2.completed_at), and since there is no $last positional operator, the short answer is that you cannot do this efficiently.
When the activities are inserted, I would update the record's updated_at value (or another field). Then it becomes a trivial problem.
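A sketch of that write-time approach (the collection and field names are mine; updateOne assumes a modern shell): record the latest completion time at the top level whenever an activity is pushed:

// Push the activity and maintain a hypothetical last_completed_at field
// in the same update, so the "last element" is always available at the top.
var now = new Date().toISOString();
db.coll.updateOne(
    { "_id" : ObjectId("5191651568f1f6000282b81f") },
    {
        $push : { "activities" : { worker_name : "image", completed_at : now } },
        $set : { "last_completed_at" : now }
    }
);

// On MongoDB 3.6+ (long after this thread), $expr can then compare the two
// top-level fields directly; ISO-8601 strings compare correctly as strings.
db.coll.find({ $expr : { $gt : [ "$updated_at", "$last_completed_at" ] } });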

Use aggregation framework to get peaks from a pre-aggregated dataset

I have a few collections of metrics that are stored pre-aggregated into hour and minute collections like this:
"_id" : "12345CHA-2RU020130104",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"processor_id" : NumberLong(0),
"date" : ISODate("2013-01-04T00:00:00Z"),
"processor_type" : "CHP",
"array_serial" : NumberLong(12345)
},
"hour" : {
"11" : 4.6665907,
"21" : 5.9431519999999995,
"7" : 0.6405864,
"17" : 4.712744,
---etc---
},
"minute" : {
"11" : {
"33" : 4.689972,
"32" : 4.7190895,
---etc---
},
"3" : {
"45" : 5.6883,
"59" : 4.792,
---etc---
}
The minute sub-document has an entry for each hour, each of which has an entry for each minute with the value of the metric at that minute.
My question is about the aggregation framework: how should I process this collection if I want to find all minutes where the metric was above a certain high-water mark? Investigating the aggregation framework shows an $unwind operator, but that seems to work only on arrays.
Would the map/reduce functionality be better suited for this? With that I could simply emit any entry above the high-water mark and count them.
You could build an array of "keys" using a reduce function that iterates through the object's attributes:
reduce : function(obj, prev) {
    for (var key in obj.minute) {
        prev.results.push({ hour : key, minutes : obj.minute[key] });
    }
}
This will give you something like:
{
    "results" : [
        {
            "hour" : "11",
            "minutes" : {
                "33" : 4.689972,
                "32" : 4.7190895
            }
        },
        {
            "hour" : "3",
            "minutes" : {
                "45" : 5.6883,
                "59" : 4.792
            }
        }
    ]
}
I've just done a quick test using group(); you'll need something more complex to iterate through the sub-sub-documents (the minutes), but hopefully this points you in the right direction.
db.yourcoll.group({
    initial : { results : [] },
    reduce : function(obj, prev) {
        for (var key in obj.minute) {
            prev.results.push({ hour : key, minutes : obj.minute[key] });
        }
    }
});
In the finalizer you could reshape the data again. It's not going to be pretty; it might be easier to hold the minute and hour data as arrays rather than as keys of sub-documents.
Hope it helps a bit.
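As a present-day aside (my addition, assuming MongoDB 3.4+ and a collection named metrics): $objectToArray lifts the "only works on arrays" limitation of $unwind noted in the question, so the high-water-mark scan can stay in the aggregation framework:

db.metrics.aggregate([
    // Turn the hour keys of "minute" into an array of { k, v } pairs
    { $project : { metadata : 1, minute : { $objectToArray : "$minute" } } },
    { $unwind : "$minute" },
    // Do the same for the minute keys inside each hour
    { $project : { metadata : 1, hour : "$minute.k", minutes : { $objectToArray : "$minute.v" } } },
    { $unwind : "$minutes" },
    // 5.0 is a placeholder high-water mark
    { $match : { "minutes.v" : { $gt : 5.0 } } },
    { $project : { metadata : 1, hour : 1, minute : "$minutes.k", value : "$minutes.v" } }
])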