MongoDB aggregation - ignore key names

I have a query:
db.events.aggregate([
    { $match: { "camera._id": "1NJE48", "start_timestamp": { $lte: 1407803834.07 } } },
    { $sort: { "start_timestamp": -1 } },
    { $limit: 2 },
    { $project: { "_id": 0, "snapshots": 1 } }
])
It returns data like so:
/* 0 */
{
    "result" : [
        {
            "snapshots" : {
                "1401330834010" : {
                    "uploaded_timestamp" : 1401330895,
                    "filename_timestamp" : 1401330834.01,
                    "timestamp" : 1401330834.01
                },
                "1401330835010" : {
                    "uploaded_timestamp" : 1401330896,
                    "filename_timestamp" : 1401330835.01,
                    "timestamp" : 1401330835.01
                },
                "1401330837010" : {
                    "uploaded_timestamp" : 1401330899,
                    "filename_timestamp" : 1401330837.01,
                    "timestamp" : 1401330837.01
                }
            }
        },
        {
            "snapshots" : {
                "1401319837010" : {
                    "uploaded_timestamp" : 1401319848,
                    "filename_timestamp" : 1401319837.01,
                    "timestamp" : 1401319837.01
                },
                "1401319838010" : {
                    "uploaded_timestamp" : 1401319849,
                    "filename_timestamp" : 1401319838.01,
                    "timestamp" : 1401319838.01
                },
                "1401319839010" : {
                    "uploaded_timestamp" : 1401319850,
                    "filename_timestamp" : 1401319839.01,
                    "timestamp" : 1401319839.01
                }
            }
        }
    ],
    "ok" : 1
}
I would like an array of snapshots:
/* 0 */
{
    "result" : [
        {
            "uploaded_timestamp" : 1401330895,
            "filename_timestamp" : 1401330834.01,
            "timestamp" : 1401330834.01
        },
        {
            "uploaded_timestamp" : 1401330896,
            "filename_timestamp" : 1401330835.01,
            "timestamp" : 1401330835.01
        },
        {
            "uploaded_timestamp" : 1401330899,
            "filename_timestamp" : 1401330837.01,
            "timestamp" : 1401330837.01
        },
        {
            "uploaded_timestamp" : 1401319848,
            "filename_timestamp" : 1401319837.01,
            "timestamp" : 1401319837.01
        },
        {
            "uploaded_timestamp" : 1401319849,
            "filename_timestamp" : 1401319838.01,
            "timestamp" : 1401319838.01
        },
        {
            "uploaded_timestamp" : 1401319850,
            "filename_timestamp" : 1401319839.01,
            "timestamp" : 1401319839.01
        }
    ],
    "ok" : 1
}
I.e. no key names. I'm struggling to understand how to deal with the aggregation framework when the key names are unique like they are here.

The problem is that the only way you know the key names is by looking at the document itself. MongoDB does not handle this type of situation well, in general. You are expected to know the structure of your own documents, i.e. to know what the keys are and what their types should be.
I don't know your use case and there's no sample document so I can't evaluate your data model, but having keys-as-values is generally a bad idea as you will run into a host of limitations whenever you can't say what the keys on a document should be a priori. Consider using an array instead of an embedded object for snapshots, or using an array of key-value pairs pattern like
{
    ...
    "result" : [
        {
            "snapshots" : [
                {
                    "key" : "1401330834010",
                    "value" : {
                        "uploaded_timestamp" : 1401330895,
                        "filename_timestamp" : 1401330834.01,
                        "timestamp" : 1401330834.01
                    }
                }
            ]
        },
        ...
    ]
}
If you provide a sample document and some detail about what you're trying to accomplish I'd be happy to provide more complete advice.
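As an aside: on MongoDB 3.4.4 and later (newer than this thread), $objectToArray makes the dynamic-key case tractable without changing the schema. A sketch based on the original pipeline:
db.events.aggregate([
    { $match: { "camera._id": "1NJE48", "start_timestamp": { $lte: 1407803834.07 } } },
    { $sort: { "start_timestamp": -1 } },
    { $limit: 2 },
    // Turn the snapshots object into [{ k: <key>, v: <snapshot> }, ...]
    { $project: { _id: 0, snapshots: { $objectToArray: "$snapshots" } } },
    { $unwind: "$snapshots" },
    // Promote each snapshot value to the document root
    { $replaceRoot: { newRoot: "$snapshots.v" } }
])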

Came up with a stop-gap solution. We will store the snapshot keys in an array on each event; it essentially acts as an index. We can then perform two queries: one to fetch the keys and do a filter, and another to fetch exactly the single snapshot we need.
It's not pretty, nor backwards compatible, but it will hopefully speed things up.
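A sketch of that stop-gap (the snapshot_keys field name is an assumption):
// First query: fetch only the key index for the event.
db.events.find(
    { "camera._id": "1NJE48" },
    { "snapshot_keys": 1 }
)
// After filtering snapshot_keys in the application, fetch exactly the
// one snapshot needed via a dotted projection on its key.
db.events.find(
    { "camera._id": "1NJE48" },
    { "snapshots.1401330834010": 1 }
)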

Mongo incorrectly sorting on array field?

I'm trying to sort on an array field, which unfortunately doesn't seem to be working correctly. Everything I've read suggests that this should work, and the sort is doing something, just not what I expected, and I can't explain it.
What I'm trying to achieve is sorting on an array sub-field. I've managed to achieve most of this using the positional operator, but I can't work out what the sort is doing.
db.getCollection('boards')
.find({ "lastVisited.user": "AAA" }, { name: 1, "lastVisited.$" : 1 })
.sort({ "lastVisited.0.timestamp": 1 });
This results in the following output
/* 1 */
{
    "_id" : ObjectId("5b642d2cac2f544b1d48d09a"),
    "lastVisited" : [
        {
            "user" : "AAA",
            "timestamp" : ISODate("2018-08-18T00:00:00.000Z")
        }
    ]
}
/* 2 */
{
    "_id" : ObjectId("5b6845245e102f3844d2181b"),
    "lastVisited" : [
        {
            "user" : "AAA",
            "timestamp" : ISODate("2018-08-16T00:00:00.000Z")
        }
    ]
}
/* 3 */
{
    "_id" : ObjectId("5b6842095e102f3844d2181a"),
    "lastVisited" : [
        {
            "user" : "AAA",
            "timestamp" : ISODate("2018-08-19T00:00:00.000Z")
        }
    ]
}
The thing to note here is that the dates are ordered 18th, then 16th, then 19th, which makes no sense! Can anyone explain this?
These are the documents that I've used:
/* 1 */
{
    "_id" : ObjectId("5b642d2cac2f544b1d48d09a"),
    "lastVisited" : [
        {
            "user" : "BBB",
            "timestamp" : ISODate("2018-08-04T00:00:00.000Z")
        },
        {
            "user" : "AAA",
            "timestamp" : ISODate("2018-08-18T00:00:00.000Z")
        }
    ]
}
/* 2 */
{
    "_id" : ObjectId("5b6842095e102f3844d2181a"),
    "lastVisited" : [
        {
            "user" : "AAA",
            "timestamp" : ISODate("2018-08-19T00:00:00.000Z")
        }
    ]
}
/* 3 */
{
    "_id" : ObjectId("5b6845245e102f3844d2181b"),
    "lastVisited" : [
        {
            "user" : "AAA",
            "timestamp" : ISODate("2018-08-16T00:00:00.000Z")
        }
    ]
}
Unfortunately you can't do this currently in Mongo, as it still uses the full document (not just the projected part) to sort on, so you'll need to use the aggregation framework instead. See the open issue https://jira.mongodb.org/browse/SERVER-4451.
Here's an example with aggregation, since you want the sort to happen on the matched elements:
db.getCollection('stack1').aggregate([
    // Initial document match (uses an index, if a suitable one is available)
    { $match: { "lastVisited.user": "AAA" } },
    // Flatten the array so each element can be matched and sorted individually
    { $unwind: "$lastVisited" },
    // Keep only the elements belonging to the matched user
    { $match: { "lastVisited.user": "AAA" } },
    { $sort: { "lastVisited.timestamp": 1 } }
])
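If unwinding isn't desirable, a non-destructive alternative (a sketch, assuming MongoDB 3.4+ for $addFields; the matchedVisit field name is mine) is to compute the matched element into a temporary field and sort on that:
db.getCollection('boards').aggregate([
    { $match: { "lastVisited.user": "AAA" } },
    // Pull the array element for user "AAA" out into its own field
    { $addFields: {
        matchedVisit: {
            $arrayElemAt: [
                { $filter: {
                    input: "$lastVisited",
                    as: "v",
                    cond: { $eq: ["$$v.user", "AAA"] }
                } },
                0
            ]
        }
    } },
    // Sort on the matched element only, leaving documents intact
    { $sort: { "matchedVisit.timestamp": 1 } },
    { $project: { name: 1, matchedVisit: 1 } }
])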

MongoDB $avg aggregation calculation out by a few decimals

We have a collection in MongoDB which saves a value linked to a timestamp.
Our document looks as follows (I have pasted an actual one here):
{
    "_id" : ObjectId("5a99596b0155fe271cfcf41d"),
    "Timestamp" : ISODate("2018-03-02T16:00:00.000Z"),
    "TagID" : ObjectId("59f8609eefbb4102f4c249e3"),
    "Value" : 71.3,
    "FileReferenceID" : ObjectId("000000000000000000000000"),
    "WasValueInterpolated" : 0
}
What we then do is calculate the average between two intervals for a given period; in more basic terms, we work out an aggregated profile.
The aggregation code we use is:
[
    { "$match" : { "TagID" : ObjectId("59f8609eefbb4102f4c249e3") } },
    { "$match" : { "Timestamp" : { "$gte" : ISODate("2018-03-12T00:00:00.001Z") } } },
    { "$match" : { "Timestamp" : { "$lte" : ISODate("2018-03-13T00:00:00.001Z") } } },
    { "$group" : {
        "_id" : {
            "GroupedMillisecond" : {
                "$let" : {
                    "vars" : {
                        "newMillisecondField" : {
                            "$subtract" : ["$Timestamp", ISODate("2018-03-12T00:00:00.001Z")]
                        }
                    },
                    "in" : { "$subtract" : ["$$newMillisecondField", { "$mod" : ["$$newMillisecondField", NumberLong(1800000)] }] }
                }
            }
        },
        "AverageValue" : { "$avg" : "$Value" }
    } },
    { "$sort" : { "_id.GroupedMillisecond" : 1 } }
]
The problem is this: the value it should give back is 71.3, but we get back 71.299999999999997.
In the case I posted above we are calculating the average value, aggregated half-hourly, for a day. There is only one value logged per half hour (I checked this in the database). The value is also logged as a constant; as far back as I manually checked (a few months), it is 71.3.
So my question is why does the value differ?
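For context, the discrepancy is consistent with ordinary IEEE 754 double-precision behaviour rather than anything in the pipeline: 71.3 has no exact binary representation, and $avg computes and returns a double. A quick shell check (my sketch, not from the thread):
// The nearest double to 71.3 is what gets stored and averaged.
(71.3).toPrecision(20)    // "71.299999999999997158"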

MongoDB Aggregate Slow Performance When Using Sort

I have a collection (TV show episodes) with more than 1,200,000 documents; here is my schema:
var episodeSchema = new Schema({
    imdbId: { type: String },
    showId: { type: String },
    episodeId: { type: String },
    episodeIdNumber: { type: Number },
    episodeTitle: { type: String },
    showTitle: { type: String },
    seasonNumber: { type: Number },
    episodeNumber: { type: Number },
    airDate: { type: String },
    summary: { type: String }
});
I created indexes for episodeTitle, episodeIdNumber, seasonNumber, episodeNumber, episodeId and showId.
Now I used a MongoDB aggregate $group to get every TV show's episodes.
Here is the aggregate query I used:
episode.aggregate([
    { $match: { showId: "scorpion" } },
    { $sort: { episodeNumber: -1 } },
    { $group: {
        _id: "$seasonNumber",
        count: { $sum: 1 },
        episodes: { $push: {
            episodeId: "$episodeId",
            episodeTitle: "$episodeTitle",
            episodeNumber: "$episodeNumber",
            seasonNumber: "$seasonNumber",
            airDate: "$airDate"
        } }
    } },
    { $sort: { _id: -1 } }
])
When I run this query it takes more than 2605.907 ms. After some digging I found out why it's slow: it was because of the {$sort:{"episodeNumber":-1}} stage. Without {$sort:{"episodeNumber":-1}} it takes around 19.178 ms to run.
As I mentioned above, I created an index for the episodeNumber field, and based on MongoDB Aggregation Pipeline Optimization I placed the sort after the match, so basically everything should have been fine; I didn't do anything wrong.
After this I thought something was wrong with my indexes, so I removed the episodeNumber index and reindexed, but I got the same time; nothing changed.
In the end I tried running the aggregate group query without episodeNumber indexed, and surprisingly it was faster! It takes around 20.118 ms.
I want to know why this happened; aren't indexes supposed to make queries faster?
Update
Query explain output:
{
    "waitedMS" : NumberLong(0),
    "stages" : [
        {
            "$cursor" : {
                "query" : {
                    "showId" : "scorpion"
                },
                "sort" : {
                    "episodeNumber" : -1
                },
                "fields" : {
                    "airDate" : 1,
                    "episodeId" : 1,
                    "episodeNumber" : 1,
                    "episodeTitle" : 1,
                    "seasonNumber" : 1,
                    "_id" : 0
                },
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "test.episodes",
                    "indexFilterSet" : false,
                    "parsedQuery" : {
                        "showId" : {
                            "$eq" : "scorpion"
                        }
                    },
                    "winningPlan" : {
                        "stage" : "EOF"
                    },
                    "rejectedPlans" : [ ]
                }
            }
        },
        {
            "$group" : {
                "_id" : "$seasonNumber",
                "count" : {
                    "$sum" : {
                        "$const" : 1
                    }
                },
                "episodes" : {
                    "$push" : {
                        "episodeId" : "$episodeId",
                        "episodeTitle" : "$episodeTitle",
                        "episodeNumber" : "$episodeNumber",
                        "seasonNumber" : "$seasonNumber",
                        "airDate" : "$airDate"
                    }
                }
            }
        },
        {
            "$sort" : {
                "sortKey" : {
                    "_id" : -1
                }
            }
        }
    ],
    "ok" : 1
}
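For context (an observation beyond the thread; the winningPlan of EOF suggests this particular explain ran against an empty namespace): a single-field index on episodeNumber can supply the sort but not the match on showId, so using it can mean walking the whole index while filtering. A compound index is the usual fix, sketched here:
// Sketch (an assumption, not from the thread): one compound index
// satisfies both the $match on showId and the $sort on episodeNumber,
// avoiding a full index scan or an in-memory sort.
db.episodes.createIndex({ showId: 1, episodeNumber: -1 })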

Update Array Fields

I have a document structure as shown below.
Let's say for a tree structure like ORDERS.MAIN_ORDERS.P1, this basically creates 3 documents, one per node, each listing its parents:
{
    "_id" : ObjectId("5362a7fd300400ffbabc3754"),
    "categoryid" : "ORDERS",
    "ancestors" : [],
    "availableLocales" : [
        {
            "locale" : "en_US",
            "count" : 1
        }
    ],
    "rolledUpCount" : 1
}
{
    "_id" : ObjectId("5362a7fd300400ffbabc3755"),
    "categoryid" : "MAIN_ORDERS",
    "ancestors" : [
        "ORDERS"
    ],
    "availableLocales" : [
        {
            "locale" : "en_US",
            "count" : 1
        }
    ],
    "rolledUpCount" : 1
}
{
    "_id" : ObjectId("5362a7fd300400ffbabc3756"),
    "categoryid" : "P1",
    "ancestors" : [
        "ORDERS",
        "MAIN_ORDERS"
    ],
    "availableLocales" : [
        {
            "locale" : "en_US",
            "count" : 1
        }
    ],
    "rolledUpCount" : 1
}
When a new tree comes in (e.g. ORDERS.MAIN_ORDERS.P2) I need to increment the count of all the parents of that categoryid (P2) for that particular locale. How do I do that with a Mongo query?
You can try to update using the following:
db.mycollection.update(
    { "availableLocales.locale" : { "$in": ["en_US"] } },
    { "$inc" : { "availableLocales.$.count" : 1 } }
)
This update uses the $in and $inc operators, together with the positional $ operator to address the matched array element.
Find documents where availableLocales holds an array containing an element whose locale is "en_US":
{ "availableLocales.locale" : { "$in": ["en_US"] } }
Increment the value of that element's count field by 1. Note that you can pass multi:true to update all matching documents, not just one:
{ "$inc" : { "availableLocales.$.count" : 1 } }
Please see the docs for more info on how these work.
--- more info:
Update ORDERS:
db.mycollection.update(
    {
        "categoryid" : "ORDERS",
        "availableLocales.locale" : { "$in": ["en_US"] }
    },
    { "$inc" : { "availableLocales.$.count" : 1 } }
)
Update MAIN_ORDERS:
db.mycollection.update(
    { "categoryid" : "MAIN_ORDERS", "availableLocales.locale" : { "$in": ["en_US"] } },
    { "$inc" : { "availableLocales.$.count" : 1 } }
)
Hope that this clarifies things further. It should point you in the right direction!
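Going further (a sketch; the ancestor values are taken from the question's example): since the new node P2 carries its ancestors array, every parent can be incremented in a single call by matching categoryid with $in and passing multi:true:
// Increment the en_US count on every ancestor of the new node P2;
// multi:true applies the update to all matched documents.
db.mycollection.update(
    {
        "categoryid" : { "$in": ["ORDERS", "MAIN_ORDERS"] },
        "availableLocales.locale" : "en_US"
    },
    { "$inc" : { "availableLocales.$.count" : 1 } },
    { "multi" : true }
)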

Mongo map-reduce output, how to read results back?

I have a map-reduce query that "works" and does what I want; however, I have so far spectacularly failed to make use of my output data because I cannot work out how to read it back... let me explain... here is my emit:
emit( { jobid: this.job_id, type: this.type}, { count: 1 })
and the reduce function:
reduce: function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i].count;
    }
    // Return the same shape as the emitted values; the key
    // (jobid, type) is already carried in _id.
    return { count: total };
},
It functions and the output I get in the results collection looks like this:
{ "_id" : { "jobid" : "5051ef142a120", "type" : 3 }, "value" : { "count" : 1 } }
{ "_id" : { "jobid" : "5051ef142a120", "type" : 5 }, "value" : { "count" : 43 } }
{ "_id" : { "jobid" : "5051f1a9d5442", "type" : 2 }, "value" : { "count" : 1 } }
{ "_id" : { "jobid" : "5051f1a9d5442", "type" : 3 }, "value" : { "count" : 1 } }
{ "_id" : { "jobid" : "5051f299340b1", "type" : 2 }, "value" : { "count" : 1 } }
{ "_id" : { "jobid" : "5051f299340b1", "type" : 3 }, "value" : { "count" : 1 } }
BUT HOW the hell do I issue a query that says: find me all entries by "jobid" whilst ignoring the type? I tried this initially, expecting two rows of output, but got none!
db.mrtest.find( { "_id": { "jobid" : "5051f299340b1" }} );
and whilst:
db.mrtest.find( { "_id" : { "jobid" : "5051f299340b1", "type" : 2 }} )
does produce one row of output as hoped for, changing it to this again fails to produce anything:
db.mrtest.find( { "_id" : { "jobid" : "5051f299340b1", "type" : { $in: [2] }}} )
I get the impression that you can't do such things with the _id field, or can you? I am thinking I need to re-organise my map-reduce output instead, but that feels like failing somehow?!?!
Help!
PS: If anybody can explain why the count is contained in a field called "value", that would also be welcome!
Have you tried:
db.mrtest.find( { "_id.jobid": "506ea3a85e126" })
That works for me!
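Dot notation reaches into any part of the compound key, so your earlier $in attempt can be rewritten the same way:
// Dot into the compound _id to filter on both parts:
db.mrtest.find({ "_id.jobid": "5051f299340b1", "_id.type": { $in: [2] } })
As for the PS: mapReduce always writes its results as { _id, value } documents, where _id is the emitted key and value is the reduced value, which is why the count lives under "value".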