Queries on arrays with timestamps - mongodb

I have documents that look like this:
{
    "_id" : ObjectId("5191651568f1f6000282b81f"),
    "updated_at" : "2013-05-16T09:46:16.199660",
    "activities" : [
        {
            "worker_name" : "image",
            "completed_at" : "2013-05-13T21:34:59.293711"
        },
        {
            "worker_name" : "image",
            "completed_at" : "2013-05-16T07:33:22.550405"
        },
        {
            "worker_name" : "image",
            "completed_at" : "2013-05-16T07:41:47.845966"
        }
    ]
}
and I would like to find only those documents where the updated_at time is greater than the last activities.completed_at time (the array is in time order).
I currently have this, but it matches any activities[].completed_at:
{
"activities.completed_at" : {"$gte" : "updated_at"}
}
Thanks!
Update
Well, I have different workers, and each has its own "completed_at".
I'll have to invert activities as follows:
activities: {
    image: {
        last: {
            completed_at: t3
        },
        items: [
            { completed_at: t0 },
            { completed_at: t1 },
            { completed_at: t2 },
            { completed_at: t3 }
        ]
    }
}
and use this query:
{
"activities.image.last.completed_at" : {"$gte" : "updated_at"}
}

Assuming that you don't know how many activities you have (it would be easy if you always had 3 activities, for example, with an activities.3.completed_at positional path) and since there's no $last positional operator, the short answer is that you cannot do this efficiently.
When the activities are inserted, I would update the record's updated_at value (or another field). Then it becomes a trivial problem.
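For example, a minimal sketch of that approach (the field name last_completed_at is made up here, and the timestamps stay as the ISO strings used above, which compare correctly as strings):

// Append the activity and record its completion time on the document in the same update.
db.collection.update(
    { "_id" : ObjectId("5191651568f1f6000282b81f") },
    {
        "$push" : { "activities" : { "worker_name" : "image", "completed_at" : "2013-05-16T07:41:47.845966" } },
        "$set" : { "last_completed_at" : "2013-05-16T07:41:47.845966" }
    }
)

A plain find() still cannot compare two fields of the same document against each other, so the final check either happens in application code or with something like { "$where" : "this.updated_at > this.last_completed_at" }, but at least the array no longer has to be inspected.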

Related

Optimizing query with $elemMatch inside array

First of all I'd like to apologize for my English.
I have a serious issue with the performance of my query. Unfortunately I'm pretty new to MongoDB. So I have a collection test which looks similar to this:
{
    "_id" : ObjectId("1"),
    [...]
    "statusHistories" : [
        {
            "created" : ISODate("2016-03-15T14:59:11.597Z"),
            "status" : "STAT1"
        },
        {
            "created" : ISODate("2016-03-15T14:59:20.465Z"),
            "status" : "STAT2"
        },
        {
            "created" : ISODate("2016-03-15T14:51:11.000Z"),
            "status" : "STAT3"
        }
    ]
}
statusHistories is an array.
More than 3000 records are inserted into that collection daily.
What I want to achieve is to find all tests that have a given status and fall between two dates. So I have prepared a query like this:
db.getCollection('test').find({
    'statusHistories' : {
        $elemMatch : {
            created : {
                "$gte" : ISODate("2016-07-11 00:00:00.052Z"),
                "$lte" : ISODate("2016-07-11 23:59:00.052Z")
            },
            'status' : 'STAT1'
        }
    }
})
It gives the expected result. Unfortunately it takes around 120 seconds to complete, which is way too long. Surprisingly, if I split this query into two separate ones, it takes far less time:
db.getCollection('test').find({
    'statusHistories' : {
        $elemMatch : {
            created : {
                "$gte" : ISODate("2016-07-11 00:00:00.052Z"),
                "$lte" : ISODate("2016-07-11 23:59:00.052Z")
            }
        }
    }
})
db.getCollection('test').find({
    'statusHistories' : {
        $elemMatch : {
            'status' : 'STAT1'
        }
    }
})
Both of them need less than a second to complete.
So what am I doing wrong with my original query? I need to fetch those records in one query, but when I combine the two $elemMatch conditions into one it takes ages. I tried to ensureIndex on statusHistories but it didn't work out. Any suggestion would be really helpful.
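A compound multikey index covering both fields used inside the $elemMatch is the usual shape for this kind of query (shown here only as a sketch; whether it actually helps depends on the selectivity of the data):

db.getCollection('test').ensureIndex({
    "statusHistories.status" : 1,
    "statusHistories.created" : 1
})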

Resolving MongoDB DBRef array using Mongo Native Query and working on the resolved documents

My MongoDB database is made up of 2 main collections:
1) Maps
{
    "_id" : ObjectId("542489232436657966204394"),
    "fileName" : "importFile1.json",
    "territories" : [
        {
            "$ref" : "territories",
            "$id" : ObjectId("5424892224366579662042e9")
        },
        {
            "$ref" : "territories",
            "$id" : ObjectId("5424892224366579662042ea")
        }
    ]
},
{
    "_id" : ObjectId("542489262436657966204398"),
    "fileName" : "importFile2.json",
    "territories" : [
        {
            "$ref" : "territories",
            "$id" : ObjectId("542489232436657966204395")
        }
    ],
    "uploadDate" : ISODate("2012-08-22T09:06:40.000Z")
}
2) Territories, which are referenced in "Map" objects:
{
    "_id" : ObjectId("5424892224366579662042e9"),
    "name" : "Afghanistan",
    "area" : 653958
},
{
    "_id" : ObjectId("5424892224366579662042ea"),
    "name" : "Angola",
    "area" : 1252651
},
{
    "_id" : ObjectId("542489232436657966204395"),
    "name" : "Unknown",
    "area" : 0
}
My objective is to list every map with its cumulative area and number of territories. I am trying the following query:
db.maps.aggregate(
    { '$unwind' : '$territories' },
    { '$group' : {
        '_id' : '$fileName',
        'numberOf' : { '$sum' : '$territories.name' },
        'locatedArea' : { '$sum' : '$territories.area' }
    } }
)
However the results show 0 for each of these values:
{
    "result" : [
        {
            "_id" : "importFile2.json",
            "numberOf" : 0,
            "locatedArea" : 0
        },
        {
            "_id" : "importFile1.json",
            "numberOf" : 0,
            "locatedArea" : 0
        }
    ],
    "ok" : 1
}
I probably did something wrong when trying to access the member variables of Territory (name and area), but I couldn't find an example of such a case in the Mongo docs. area is stored as an integer, and name as a string.
Yes indeed, the field "territories" holds an array of database references, not the actual documents. DBRefs are objects that contain the information needed to locate the actual documents.
You can see this clearly in the example above; run the mongo query below:
db.maps.find({ "_id" : ObjectId("542489232436657966204394") }).forEach(function(doc) {
    print(doc.territories[0]);
})
It will print the DBRef object rather than the document itself:
o/p: DBRef("territories", ObjectId("5424892224366579662042e9"))
So '$sum': '$territories.name' and '$sum': '$territories.area' show you 0, since there are no fields named name or area at those paths.
You need to resolve the reference to the actual document before you can do something like $territories.name.
To achieve what you want, you can make use of the map() function, since neither aggregation nor map-reduce supports sub-queries, and you already have a self-contained map document with references to its territories.
Steps to achieve:
a) get each map
b) resolve the `DBRef`.
c) calculate the total area, and the number of territories.
d) make and return the desired structure.
Mongo shell script:
db.maps.find().map(function(doc) {
    // Collect the referenced _ids and remember which collection they point to.
    var refName;
    var territory_refs = doc.territories.map(function(terr_ref) {
        refName = terr_ref.$ref;
        return terr_ref.$id;
    });
    // Resolve the references and sum up the area.
    var areaSum = 0;
    db.getCollection(refName).find({
        "_id" : {
            $in : territory_refs
        }
    }).forEach(function(i) {
        areaSum += i.area;
    });
    return {
        "id" : doc.fileName,
        "noOfTerritories" : territory_refs.length,
        "areaSum" : areaSum
    };
})
o/p:
[
    {
        "id" : "importFile1.json",
        "noOfTerritories" : 2,
        "areaSum" : 1906609
    },
    {
        "id" : "importFile2.json",
        "noOfTerritories" : 1,
        "areaSum" : 0
    }
]
Map-reduce functions should not be, and cannot be, used to resolve DBRefs on the server side.
See what the documentation has to say:
The map function should not access the database for any reason.
The map function should be pure, or have no impact outside of the
function (i.e. side effects.)
The reduce function should not access the database, even to perform
read operations. The reduce function should not affect the outside
system.
Moreover, a reduce function, even if used (which cannot work anyway), would never be called for your problem, since a group keyed on "fileName" or "ObjectId" always contains only one document in your dataset.
MongoDB will not call the reduce function for a key that has only a
single value

Ensure Unique indexes in embedded doc in mongodb

Is there a way to make a subdocument within a list have a unique field in mongodb?
document structure:
{
    "_id" : "2013-08-13",
    "hours" : [
        {
            "hour" : "23",
            "file" : [
                {
                    "date_added" : ISODate("2014-04-03T18:54:36.400Z"),
                    "name" : "1376434800_file_output_2014-03-10-09-27_44.csv"
                },
                {
                    "date_added" : ISODate("2014-04-03T18:54:36.410Z"),
                    "name" : "1376434800_file_output_2014-03-10-09-27_44.csv"
                },
                {
                    "date_added" : ISODate("2014-04-03T18:54:36.402Z"),
                    "name" : "1376434800_file_output_2014-03-10-09-27_44.csv"
                },
                {
                    "date_added" : ISODate("2014-04-03T18:54:36.671Z"),
                    "name" : "1376434800_file_output_2014-03-10-09-27_44.csv"
                }
            ]
        }
    ]
}
I want to make sure that the document's hours.hour value has unique items when inserted. The issue is that hours is a list. Can you ensureIndex in this way?
Indexes are not the tool for ensuring uniqueness within an embedded array; rather, they are used across documents to ensure that certain fields do not repeat.
As long as a "duplicate" for you means an element whose content is identical in every field, you can use the $addToSet operator with update:
db.collection.update(
    { "_id" : "2013-08-13", "hours.hour" : "23" },
    { "$addToSet" : {
        "hours.$.file" : {
            "date_added" : ISODate("2014-04-03T18:54:36.671Z"),
            "name" : "1376434800_file_output_2014-03-10-09-27_44.csv"
        }
    }}
)
So that document would not be added, as there is already an element matching those exact values within the target array. If the content were different (and that means any part of the content), then a new item would be added.
For anything else you would need to maintain this manually, by loading the document and inspecting the elements of the array yourself: say, for a different file name with exactly the same timestamp.
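A rough sketch of that manual check (the second file name here is invented for the example; the real logic depends on what you count as a duplicate):

// Load the document, look for a clashing timestamp by hand, and only then push.
var entry = {
    "date_added" : ISODate("2014-04-03T18:54:36.671Z"),
    "name" : "another_file_output.csv"          // hypothetical different name, same timestamp
};
var doc = db.collection.findOne({ "_id" : "2013-08-13", "hours.hour" : "23" });
var clash = doc && doc.hours.some(function(h) {
    return h.hour === "23" && h.file.some(function(f) {
        return f.date_added.getTime() === entry.date_added.getTime();
    });
});
if (!clash) {
    db.collection.update(
        { "_id" : "2013-08-13", "hours.hour" : "23" },
        { "$push" : { "hours.$.file" : entry } }
    );
}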
Problems with your Schema
Now that the question is answered, I want to point out the problems with your schema design.
Dates as strings are "horrible". You may think you need them but you do not. See the aggregation framework date operators for more on this.
You have nested arrays, which generally should be avoided. The general problems are shown in the documentation for the positional $ operator. That says you only get one match on position, and that is always the "top" level array. So updating beyond adding things as shown above is going to be difficult.
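As a concrete illustration (a hypothetical update, not something from the question), trying to reach into both arrays with two positional operators is rejected; only one $ is allowed in the path:

// Rejected by the server: the positional $ operator may appear only once in an update path,
// so an element of the inner "file" array cannot be addressed this way.
db.collection.update(
    { "_id" : "2013-08-13", "hours.hour" : "23" },
    { "$set" : { "hours.$.file.$.name" : "renamed_output.csv" } }
)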
A better schema pattern for you is to simply do this:
{
"date_added" : ISODate("2014-04-03T18:54:36.400Z"),
"name" : "1376434800_file_output_2014-03-10-09-27_44.csv"
},
{
"date_added" : ISODate("2014-04-03T18:54:36.410Z"),
"name" : "1376434800_file_output_2014-03-10-09-27_44.csv"
},
{
"date_added" : ISODate("2014-04-03T18:54:36.402Z"),
"name" : "1376434800_file_output_2014-03-10-09-27_44.csv"
},
{
"date_added" : ISODate("2014-04-03T18:54:36.671Z"),
"name" : "1376434800_file_output_2014-03-10-09-27_44.csv"
}
If that is in its own collection then you can always actually use indexes to ensure uniqueness. The aggregation framework can break down the date parts and hours where needed.
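A sketch only (the collection name files is made up, and the fields follow the entries above): a unique compound index handles the duplicate check, and the date operators recover day and hour at query time.

// Hypothetical separate collection "files", one document per entry shown above.
db.files.ensureIndex(
    { "name" : 1, "date_added" : 1 },
    { "unique" : true }
)

// Day/hour breakdowns then come from the aggregation date operators:
db.files.aggregate([
    { "$group" : {
        "_id" : {
            "day" : { "$dayOfMonth" : "$date_added" },
            "hour" : { "$hour" : "$date_added" }
        },
        "count" : { "$sum" : 1 }
    }}
])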
Where you must have that as part of another document, then at least try to avoid the nested arrays. This would be acceptable, but not as flexible as separating the entries:
{
    "_id" : "2013-08-13",
    "hours" : {
        "23" : [
            {
                "date_added" : ISODate("2014-04-03T18:54:36.400Z"),
                "name" : "1376434800_file_output_2014-03-10-09-27_44.csv"
            },
            {
                "date_added" : ISODate("2014-04-03T18:54:36.410Z"),
                "name" : "1376434800_file_output_2014-03-10-09-27_44.csv"
            },
            {
                "date_added" : ISODate("2014-04-03T18:54:36.402Z"),
                "name" : "1376434800_file_output_2014-03-10-09-27_44.csv"
            },
            {
                "date_added" : ISODate("2014-04-03T18:54:36.671Z"),
                "name" : "1376434800_file_output_2014-03-10-09-27_44.csv"
            }
        ]
    }
}
It depends on your intended usage: the last form would not allow you to do any kind of aggregation comparison across hours within a day, not in any simple way. The former does this easily, and you can still break down selections by day and hour with ease.
Then again, if you are only ever appending information, then your existing schema should be fine. But be aware of the possible issues and alternatives.

sort by date with aggregate request in mongodb

I would like to retrieve a list of values that comes from the oldest document currently stored, but I failed to select a document based on the date. Thanks.
Here is the JSON:
"ad" : "noc3",
"createdDate" : ISODate(),
"list" : [
{
"id" : "p45",
"value" : 21,
},
{
"id" : "p6",
"value" : 20,
},
{
"id" : "4578",
"value" : 319
}
]
And here is my aggregate request:
db.friends.aggregate({$match:{advertiser:"noc3", {$sort:{timestamps:-1},{$limit:1} }},{$unwind:"$list"},{$project:{_id: "$list.id", value:{$add:[0]}}});
Your aggregate query is incorrect. You add the sort and limit to the match, but that's not how you do that. You use separate pipeline operators:
db.friends.aggregate( [
{ $match: { advertiser: "noc3" } },
{ $sort: { createdDate: -1 } },
{ $limit: 1 },
Your other pipeline operators are a bit strange too, and your document and query don't match up (timestamps vs. createdDate). If you add the expected output, I can update the answer to include the last bits of the query too.
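For what it's worth, a sketch of how the rest of the pipeline might look, assuming the goal is one output document per list entry of that single selected document (the sort direction follows the snippet above; use 1 instead of -1 if "oldest" means ascending createdDate):

db.friends.aggregate([
    { $match: { advertiser: "noc3" } },
    { $sort: { createdDate: -1 } },
    { $limit: 1 },
    { $unwind: "$list" },
    { $project: { _id: "$list.id", value: "$list.value" } }
])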

Use aggregation framework to get peaks from a pre-aggregated dataset

I have a few collections of metrics that are stored pre-aggregated into hour and minute collections like this:
"_id" : "12345CHA-2RU020130104",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"processor_id" : NumberLong(0),
"date" : ISODate("2013-01-04T00:00:00Z"),
"processor_type" : "CHP",
"array_serial" : NumberLong(12345)
},
"hour" : {
"11" : 4.6665907,
"21" : 5.9431519999999995,
"7" : 0.6405864,
"17" : 4.712744,
---etc---
},
"minute" : {
"11" : {
"33" : 4.689972,
"32" : 4.7190895,
---etc---
},
"3" : {
"45" : 5.6883,
"59" : 4.792,
---etc---
}
The minute collection has a sub-document for each hour which has an entry for each minute with the value of the metric at that minute.
My question is about the aggregation framework: how should I process this collection if I wanted to find all minutes where the metric was above a certain high-water mark? Investigating the aggregation framework turns up the $unwind operator, but that seems to only work on arrays.
Would the map/reduce functionality be better suited for this? With that I could simply emit any entry above the high-water mark and count them.
You could build an array of "keys" using a reduce function that iterates through the object's attributes.
reduce: function(obj, prev)
{
    for (var key in obj.minute) {
        prev.results.push({ hour: key, minutes: obj.minute[key] });
    }
}
will give you something like:
{
    "results" : [
        {
            "hour" : "11",
            "minutes" : {
                "33" : 4.689972,
                "32" : 4.7190895
            }
        },
        {
            "hour" : "3",
            "minutes" : {
                "45" : 5.6883,
                "59" : 4.792
            }
        }
    ]
}
I've just done a quick test using group(); you'll need something more complex to iterate through the sub-sub-documents (minutes), but hopefully this points you in the right direction.
db.yourcoll.group(
    {
        initial: { results: [] },
        reduce: function(obj, prev)
        {
            for (var key in obj.minute) {
                prev.results.push({ hour: key, minutes: obj.minute[key] });
            }
        }
    }
);
In the finalizer you could reshape the data again. It's not going to be pretty; it might be easier to hold the minute and hour data as arrays rather than as elements of the document.
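As a rough sketch of that reshaping (the 5.0 threshold and the peaks field are made up for the example, and an explicit empty key is added so everything lands in one bucket), a finalize function can keep only the minutes above the high-water mark:

db.yourcoll.group({
    key: {},                        // one bucket for everything
    initial: { results: [] },
    reduce: function(obj, prev) {
        // flatten each hour's minute sub-document into the running results array
        for (var key in obj.minute) {
            prev.results.push({ hour: key, minutes: obj.minute[key] });
        }
    },
    finalize: function(prev) {
        var highwater = 5.0;        // example threshold, not taken from the question
        var peaks = [];
        prev.results.forEach(function(entry) {
            for (var m in entry.minutes) {
                if (entry.minutes[m] > highwater) {
                    peaks.push({ hour: entry.hour, minute: m, value: entry.minutes[m] });
                }
            }
        });
        return { peaks: peaks };
    }
});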
hope it helps a bit