MongoDB find comparing array elements

I have a collection with about 200K documents like this:
db.place.find()[0]
{
"_id" : ObjectId("5290de1111afb260363aa4a1"),
"name" : "place X",
"center" : [x, y]
}
Now I'm trying to query for the places where the center Y value is greater than the X value, and I'm hitting the following problem:
> db.place.find({'center.0':{'$gt':center.1}}).count()
Sat Nov 23 14:42:01.556 JavaScript execution failed: SyntaxError: Unexpected number
Any hints?
Thanks in advance

Because your field happens to have the exact same format every time (center is a two-element array), you can transform it in the aggregation framework into two fields, compare them in a projection, and then match to get back just the documents satisfying your requirement of the second array element being greater than the first.
db.place.aggregate( [
    { $unwind : "$center" },
    { $group : {
        _id : "$_id",
        centerX : { $first : "$center" },
        centerY : { $last : "$center" }
    } },
    { $project : { YgtX : { $gt : [ "$centerY", "$centerX" ] } } },
    { $match : { YgtX : true } }
] );
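The same comparison, simulated in plain JavaScript over a couple of sample documents, just to illustrate what the pipeline's final stages keep (a sketch; no MongoDB required):

```javascript
var places = [
  { _id: 1, name: "place X", center: [2, 3] },
  { _id: 2, name: "place Y", center: [3, 2] }
];

// Keep documents whose second center element (Y) is greater than the
// first (X) -- the same test the $project/$match stages perform.
var matched = places.filter(function (p) {
  return p.center[1] > p.center[0];
});
// matched contains only place X ([2, 3], since 3 > 2)
```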
Now, if your array was an arbitrary pair of numerical values, then you can use the above.
You said in the comments that your pair represents coordinates (lat, long). Keep in mind that in MongoDB coordinate pairs are always stored as long, lat. If your actual x, y values are coordinates on a flat (as opposed to spherical) plane, you can find all the documents whose Y coordinate is greater than their X coordinate with a single geospatial query:
db.place.find( { center : { $geoWithin : { $geometry : {
    type : "Polygon",
    coordinates : [ [ [50,50], [-50,50], [-50,-50], [50,50] ] ]
} } } } );
The above query assumes that your coordinate system goes from -50 to 50 along X and Y and it finds all points in the triangle that represents all coordinates having Y >= X.
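As a plain-JavaScript sanity check (no MongoDB required), that triangle is exactly the region where Y >= X inside the -50..50 square:

```javascript
// Hypothetical check: the triangular region used by the $geoWithin query
// is the set of points with y >= x inside the -50..50 bounding square.
function inRegion(x, y) {
  return x >= -50 && x <= 50 && y >= -50 && y <= 50 && y >= x;
}

inRegion(0, 10);   // a point with Y > X lies inside
inRegion(10, 0);   // a point with Y < X lies outside
```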

It seems that you need to use the $where operator instead.
db.place.find({$where: function() {return this.center[0] > this.center[1]}})
For example, suppose there are 3 documents in the collection:
{ "_id" : ObjectId("52910457c7d99f10949e5a85"), "name" : "place X", "center" : [ 2, 3 ] }
{ "_id" : ObjectId("52910463c7d99f10949e5a86"), "name" : "place Y", "center" : [ 3, 2 ] }
{ "_id" : ObjectId("5291046ac7d99f10949e5a87"), "name" : "place Y", "center" : [ 8, 9 ] }
The result of the $where query will be:
{ "_id" : ObjectId("52910463c7d99f10949e5a86"), "name" : "place Y", "center" : [ 3, 2 ] }

You cannot do the query you want in a simple way in mongo, because mongo does not support comparing one field of a document against another. So even with a document as simple as {a : 1, b : 1}, finding the documents where a = b is impossible without a $where clause.
The solution suggested by idbentley, db.place.find({'center.0':{'$gt':'center.1'}}), will not work either (though it will not raise an error), because it compares center.0 against the literal string 'center.1'. The correct solution is therefore Victoria Malaya's (though she left off the .count() at the end).
One thing I would like to suggest: anything with $where is very, very slow. So if you plan to run this query more than once, think about creating an additional field that stores this precomputed result (you can do it in a similar fashion to this answer).
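A minimal sketch of that precomputation (the field name yGtX is hypothetical, not from the original answer); the stored flag is just a per-document comparison, applied once in the shell and then queried with a plain, indexable match:

```javascript
// Hypothetical precomputed flag: does the second center element (Y)
// exceed the first (X)?
function yGtX(doc) {
  return doc.center[1] > doc.center[0];
}

// In the mongo shell this would be applied once per document, e.g.:
//   db.place.find().forEach(function (doc) {
//     db.place.update({ _id: doc._id }, { $set: { yGtX: yGtX(doc) } });
//   });
// after which  db.place.find({ yGtX: true }).count()  replaces $where.

var sample = { name: "place X", center: [2, 3] };
yGtX(sample); // -> true
```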

You can also use the $expr operator (MongoDB 3.6+): https://docs.mongodb.com/manual/reference/operator/query/expr/
Note that aggregation expressions cannot address array elements with a dotted index like '$center.0', so use $arrayElemAt:
db.place.find({$expr: {$gt: [{$arrayElemAt: ['$center', 0]}, {$arrayElemAt: ['$center', 1]}]}}).count()
However, similar to $where, $expr is very slow; read here for details: Mongodb $expr query is very slow
Example here: https://medium.com/@deepakjoseph08/mongodb-query-expressions-that-compare-fields-from-the-same-document-ccb45a1f064b

Related

MongoDB Aggregation - Buckets Boundaries to Referenced Array

To whom this may concern:
I would like to know if there is some workaround in MongoDB to set the "boundaries" field of a "$bucket" aggregation pipeline stage to an array that's already in the previous aggregation stage. (Or some other aggregation pipeline that will get me the same result). I am using this data to create a histogram of a bunch of values. Rather than retrieve 1 million-or-so values, I can receive 20 buckets with their respective counts.
The previous stages of the pipeline yield the following result:
{
"_id" : ObjectId("5cfa6fad883d3a9b8c6ad50a"),
"boundaries" : [ 73.0, 87.25, 101.5, 115.75, 130.0 ],
"value" : 83.58970621935025
},
{
"_id" : ObjectId("5cfa6fe0883d3a9b8c6ad5a8"),
"boundaries" : [ 73.0, 87.25, 101.5, 115.75, 130.0 ],
"value" : 97.3261380262403
},
...
The "boundaries" field of every document is the result of a $facet/$unwind/$addFields sequence with some statistical math involving the "value" fields in the pipeline. Therefore, every "boundaries" value is an array of evenly spaced numbers in ascending order, identical across documents.
The following stage of the aggregation I am trying to perform is:
$bucket: {
    groupBy: "$value",
    boundaries: "$boundaries",
    default: "no_group",
    output: { count: { $sum: 1 } }
}
I get the following error from the explain when I try to run this aggregation:
{
"ok" : 0.0,
"errmsg" : "The $bucket 'boundaries' field must be an array, but found type: string.",
"code" : NumberInt(40200),
"codeName" : "Location40200"
}
The result I would like to get is something like this, which is the result of a basic "$bucket" pipeline operator:
{
"_id" : 73.0, // range of [73.0,87.25)
"count" : 2 // number of documents with "value" in this range.
}, {
"_id" : 87.25, // range of [87.25,101.5)
"count" : 7 // number of documents with "value" in this range.
}, {
"_id" : 101.5,
"count" : 3
}, ...
What I know:
The relevant JIRA ticket says
'boundaries' must be constant values (can't use "$x", but can use {$add: [4, 5]}), and must be sorted.
What I've tried:
$bucketAuto does not have a linear "granularity" setting. By default, it tries to evenly distribute the values amongst the buckets, and the bucket ranges are therefore spaced differently.
Building the constant array by running the first part of the pipeline, reading the boundaries back, and then splicing the constant array into a second pipeline. This works but is inefficient and not atomic, since it requires two passes over the data. I can live with this solution if need be.
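For reference, $bucket's grouping rule (half-open [lower, upper) ranges, keyed by the lower bound, with a default for out-of-range values) can be simulated in plain JavaScript; this is also what a client-side second pass would have to reproduce. A sketch, not MongoDB's implementation:

```javascript
// Simulate $bucket: return the lower bound of the half-open range
// [boundaries[i], boundaries[i+1]) containing value, or a default.
function bucketFor(value, boundaries, dflt) {
  for (var i = 0; i < boundaries.length - 1; i++) {
    if (value >= boundaries[i] && value < boundaries[i + 1]) {
      return boundaries[i];
    }
  }
  return dflt;
}

var boundaries = [73.0, 87.25, 101.5, 115.75, 130.0];
bucketFor(83.58970621935025, boundaries, "no_group"); // -> 73
bucketFor(97.3261380262403, boundaries, "no_group");  // -> 87.25
```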
There HAS to be a solution to this. Any workaround or alternative solutions are greatly appreciated.
Thank you for your time!

MongoDB Calculate Values from Two Arrays, Sort and Limit

I have a MongoDB database storing float arrays. Assume a collection of documents in the following format:
{
"id" : 0,
"vals" : [ 0.8, 0.2, 0.5 ]
}
Having a query array, e.g., with values [ 0.1, 0.3, 0.4 ], I would like to compute for all elements in the collection a distance (e.g., sum of differences; for the given document and query it would be computed by abs(0.8 - 0.1) + abs(0.2 - 0.3) + abs(0.5 - 0.4) = 0.9).
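Spelled out in plain JavaScript (outside MongoDB), the distance for that document works out as:

```javascript
// Sum of absolute element-wise differences between a stored array and a
// query array (arrays assumed to have the same length).
function distance(vals, query) {
  return vals.reduce(function (sum, v, i) {
    return sum + Math.abs(v - query[i]);
  }, 0);
}

distance([0.8, 0.2, 0.5], [0.1, 0.3, 0.4]); // ≈ 0.9 (up to floating point)
```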
I tried to use the aggregation function of MongoDB to achieve this, but I can't work out how to iterate over the array. (I am not using the built-in geo operations of MongoDB, as the arrays can be rather long)
I also need to sort the results and limit to the top 100, so calculation after reading the data is not desired.
Current Processing is mapReduce
If you need to execute this on the server and sort the top results and just keep the top 100, then you could use mapReduce for this like so:
db.test.mapReduce(
    function() {
        var input = [0.1,0.3,0.4];
        var value = Array.sum(this.vals.map(function(el,idx) {
            return Math.abs( el - input[idx] );
        }));
        emit(null,{ "output": [{ "_id": this._id, "value": value }] });
    },
    function(key,values) {
        var output = [];
        values.forEach(function(value) {
            value.output.forEach(function(item) {
                output.push(item);
            });
        });
        // Numeric comparator, descending; a comparator returning a boolean
        // does not sort reliably.
        output.sort(function(a,b) {
            return b.value - a.value;
        });
        return { "output": output.slice(0,100) };
    },
    { "out": { "inline": 1 } }
)
So the mapper function does the calculation and outputs everything under the same key, so all results are sent to the reducer. The end output is going to be contained in an array in a single output document, so it is important both that all results are emitted with the same key value and that the output of each emit is itself an array, so mapReduce can work properly.
The sorting and reduction is done in the reducer itself: as each emitted document is inspected, its elements are put into a single temporary array, sorted, and the top results are returned.
That is important, and is exactly why the emitter produces its value as an array, even with only a single element at first. mapReduce works by processing results in "chunks", so even if all emitted documents have the same key, they are not all processed at once. Rather, the reducer puts its results back into the queue of emitted results to be reduced until there is only a single document left for that particular key.
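That re-reduce property can be checked standalone in plain JavaScript (hypothetical data, no MongoDB needed): feeding the reducer's own output back in as one of its inputs must give the same answer as reducing everything at once:

```javascript
// The reducer, extracted: merge {output: [...]} values, sort descending
// by value, keep the top slice.
function reduceFn(key, values) {
  var output = [];
  values.forEach(function (value) {
    value.output.forEach(function (item) { output.push(item); });
  });
  output.sort(function (a, b) { return b.value - a.value; });
  return { output: output.slice(0, 100) };
}

var emitted = [
  { output: [{ _id: 1, value: 0.9 }] },
  { output: [{ _id: 2, value: 0.3 }] },
  { output: [{ _id: 3, value: 0.6 }] }
];

// Reducing everything at once vs. in two chunks gives the same result,
// because the reducer's output has the same shape as its inputs.
var whole = reduceFn(null, emitted);
var chunked = reduceFn(null, [reduceFn(null, emitted.slice(0, 2)), emitted[2]]);
```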
I'm restricting the "slice" output here to 10 for brevity of listing, and including the stats to make a point, as the 100 reduce cycles called on this 10000 sample can be seen:
{
"results" : [
{
"_id" : null,
"value" : {
"output" : [
{
"_id" : ObjectId("56558d93138303848b496cd4"),
"value" : 2.2
},
{
"_id" : ObjectId("56558d96138303848b49906e"),
"value" : 2.2
},
{
"_id" : ObjectId("56558d93138303848b496d9a"),
"value" : 2.1
},
{
"_id" : ObjectId("56558d93138303848b496ef2"),
"value" : 2.1
},
{
"_id" : ObjectId("56558d94138303848b497861"),
"value" : 2.1
},
{
"_id" : ObjectId("56558d94138303848b497b58"),
"value" : 2.1
},
{
"_id" : ObjectId("56558d94138303848b497ba5"),
"value" : 2.1
},
{
"_id" : ObjectId("56558d94138303848b497c43"),
"value" : 2.1
},
{
"_id" : ObjectId("56558d95138303848b49842b"),
"value" : 2.1
},
{
"_id" : ObjectId("56558d96138303848b498db4"),
"value" : 2.1
}
]
}
}
],
"timeMillis" : 1758,
"counts" : {
"input" : 10000,
"emit" : 10000,
"reduce" : 100,
"output" : 1
},
"ok" : 1
}
So this is a single document output, in the specific mapReduce format, where the "value" contains an element which is an array of the sorted and limited result.
Future Processing is Aggregate
At the time of writing, the latest stable release of MongoDB is 3.0, which lacks the functionality to make your operation possible. But the upcoming 3.2 release introduces new operators that make this possible:
db.test.aggregate([
{ "$unwind": { "path": "$vals", "includeArrayIndex": "index" }},
{ "$group": {
"_id": "$_id",
"result": {
"$sum": {
"$abs": {
"$subtract": [
"$vals",
{ "$arrayElemAt": [ { "$literal": [0.1,0.3,0.4] }, "$index" ] }
]
}
}
}
}},
{ "$sort": { "result": -1 } },
{ "$limit": 100 }
])
Also limiting to the same 10 results for brevity, you get output like this:
{ "_id" : ObjectId("56558d96138303848b49906e"), "result" : 2.2 }
{ "_id" : ObjectId("56558d93138303848b496cd4"), "result" : 2.2 }
{ "_id" : ObjectId("56558d96138303848b498e31"), "result" : 2.1 }
{ "_id" : ObjectId("56558d94138303848b497c43"), "result" : 2.1 }
{ "_id" : ObjectId("56558d94138303848b497861"), "result" : 2.1 }
{ "_id" : ObjectId("56558d96138303848b499037"), "result" : 2.1 }
{ "_id" : ObjectId("56558d96138303848b498db4"), "result" : 2.1 }
{ "_id" : ObjectId("56558d93138303848b496ef2"), "result" : 2.1 }
{ "_id" : ObjectId("56558d93138303848b496d9a"), "result" : 2.1 }
{ "_id" : ObjectId("56558d96138303848b499182"), "result" : 2.1 }
This is made possible largely due to $unwind being modified to project a field in results that contains the array index, and also due to $arrayElemAt which is a new operator that can extract an array element as a singular value from a provided index.
This allows the "look-up" of values by index position from your input array in order to apply the math to each element. The input array is wrapped in the existing $literal operator so that $arrayElemAt does not complain and recognizes it as an array (this seems to be a small bug at present, as other array functions do not have the problem with direct input), and the appropriate matching value is fetched using the "index" field produced by $unwind.
The math is done by $subtract and of course another new operator in $abs to meet your functionality. Also since it was necessary to unwind the array in the first place, all of this is done inside a $group stage accumulating all array members per document and applying the addition of entries via the $sum accumulator.
Finally all result documents are processed with $sort and then the $limit is applied to just return the top results.
Summary
Even with the new functionality about to be available to the aggregation framework for MongoDB, it is debatable which approach is actually more efficient for results. This is largely due to there still being a need to $unwind the array content, which effectively produces a copy of each document per array member in the pipeline to be processed, and that generally causes an overhead.
So whilst mapReduce is the only present way to do this until a new release, it may actually outperform the aggregation statement depending on the amount of data to be processed, and despite the fact that the aggregation framework works on native coded operators rather than translated JavaScript operations.
As with all things, testing is always recommended to see which case suits your purposes better and which gives the best performance for your expected processing.
Sample
Of course the expected result for the sample document provided in the question is 0.9 by the math applied. But just for my testing purposes, here is a short listing used to generate some sample data that I wanted to at least verify the mapReduce code was working as it should:
var bulk = db.test.initializeUnorderedBulkOp();
var x = 10000;
while ( x-- ) {
    // Random single-decimal values between 0.0 and 1.0
    // (Math.round takes a single argument)
    var vals = [0,0,0].map(function(val) {
        return Math.round( Math.random()*10 ) / 10;
    });
    bulk.insert({ "vals": vals });
    if ( x % 1000 == 0 ) {
        bulk.execute();
        bulk = db.test.initializeUnorderedBulkOp();
    }
}
The arrays are totally random single decimal point values, so there is not a lot of distribution in the listed results I gave as sample output.

Understanding overlapping mongodb projections

I'm struggling to understand how overlapping projections work in mongodb.
Here's a quick example illustrating my conundrum (results from the in-browser mongodb console).
First, I created and inserted a simple document:
var doc = {
id:"universe",
systems: {
1: {
name:"milky_way",
coords:{x:1, y:1}
}
}
}
db.test.insert(doc);
// doc successfully inserted
Next, I try a somewhat odd projection:
db.test.find({}, {"systems.1.coords":1, "systems.1":1});
//result
{
"_id" : ObjectId("537fd3541cdcaf1ba735becb"),
"systems" : {
"1" : {
"coords" : {
"y" : 1,
"x" : 1
}
}
}
}
I expected to see the entirety of "systems.1", including the name field. But it appears the deeper path to "systems.1.coords" overrode the shallower path to just "systems.1"?
I decide to test this "deeper path overrides shallower path" theory:
db.test.find({}, {"systems.1.coords.x":1, "systems.1.coords":1});
//result
{
"_id" : ObjectId("537fd3541cdcaf1ba735becb"),
"systems" : {
"1" : {
"coords" : {
"y" : 1, // how'd this get here, but for the shallower projection?
"x" : 1
}
}
}
}
Here, my deeper projection didn't override the shallower one.
What gives? How is mongodb dealing with overlapping projections? I can't find the logic to it.
EDIT:
My confusion was stemming from what counted as a "top level" path.
This worked like I expected: .findOne({}, {"systems.1":1, "systems":1}) (i.e., a full set of systems is returned, notwithstanding that I started with what appeared to be a "narrower" projection).
However, this did not work like I expected: .findOne({}, {"systems.1.name":1, "systems.1":1}) (i.e., only the name field of system.1 is returned).
In short, going more than "one dot" deep leads to the overwriting discussed in the accepted answer.
You cannot do this sort of projection using .find(), as the general projections allowed are basic field selection. What you are talking about is document re-shaping, and for that you can use the $project operator with the .aggregate() method.
So by your initial example:
db.test.aggregate([
{ "$project": {
"coords": "$systems.1.coords",
"systems": 1
}}
])
That will give you output like this:
{
"_id" : ObjectId("537fe2127cb762d14e2a1007"),
"systems" : {
"1" : {
"name" : "milky_way",
"coords" : {
"x" : 1,
"y" : 1
}
}
},
"coords" : {
"x" : 1,
"y" : 1
}
}
Note the different field naming there as well: if for no other reason, the version coming from .find() would result in overlapping paths ("systems" is the same) for the levels of fields you were trying to select, and therefore could not be projected as two fields the way it can be here.
In much the same way, consider a statement like the following:
db.test.aggregate([
{ "$project": {
"systems": {
"1": {
"coords": "$systems.1.coords"
}
},
"systems": 1
}}
])
So that is not telling you it is invalid; it is just that one of the results in the projection is overwriting the other, as at the top level they are both called "systems".
This is basically what you end up with when trying to do something like this with the projection method available to .find(). So the essential point is that you need a different field name, and this is what the aggregation framework (though not aggregating here) allows you to do.

Only retrieve back select sub properties and limit how many items they contain

I have a very simple document:
{
"_id" : ObjectId("5347ff73e4b0e4fcbbb7886b"),
"userName" : "ztolley",
"firstName" : "Zac",
"lastName" : "Tolley",
"data" : {
"temperature" : [
{
"celsius" : 22,
"timestamp" : 1212140000
}
]
}
}
I want to find a way to write a query that searches for userName = 'ztolley' and only returns back the last 10 temperature readings. I've been able to say just return the data field but I couldn't find a way to say just return data.temperature (there are many different data properties).
When I tried
db.user.find({userName:'ztolley'},{data: {temperature: {$slice: -10}}})
I got unsupported projection.
I'd try using the Aggregation framework http://docs.mongodb.org/manual/aggregation/ .
Using your schema this should work:
db.user.aggregate([
    { $match: { "userName": "ztolley" } },
    { $unwind: "$data.temperature" },
    { $sort: { "data.temperature.timestamp": -1 } },
    { $limit: 10 },
    { $project: { "data.temperature": 1, _id: 0 } }
])
It returns the temperature readings for that user, in reverse sorted order by timestamp, limited to 10.
It looks like you wrote the projection wrong; try it with dot ('.') notation:
db.user.find( { userName:'ztolley' },{ 'data.temperature': { $slice: -10 } } );
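The $slice: -10 projection keeps the last ten array elements, analogous to a negative index with JavaScript's own Array.prototype.slice; a plain-JS illustration with made-up readings:

```javascript
// $slice: -10 in a projection behaves like slice(-10) on the array itself:
// keep (up to) the last ten readings, in their original order.
var readings = [];
for (var t = 1; t <= 12; t++) {
  readings.push({ celsius: 20 + t, timestamp: 1212140000 + t });
}

var lastTen = readings.slice(-10);
lastTen.length;     // -> 10
lastTen[0].celsius; // -> 23 (readings 3 through 12 survive)
```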

Average a Sub Document Field Across Documents in Mongo

For a given record id, how do I get the average of a sub document field if I have the following in MongoDB:
/* 0 */
{
"item" : "1",
"samples" : [
{
"key" : "test-key",
"value" : "1"
},
{
"key" : "test-key2",
"value" : "2"
}
]
}
/* 1 */
{
"item" : "1",
"samples" : [
{
"key" : "test-key",
"value" : "3"
},
{
"key" : "test-key2",
"value" : "4"
}
]
}
I want to get the average of the values where key = "test-key" for a given item id (in this case 1). So the average should be (1 + 3) / 2 = 2.
Thanks
You'll need to use the aggregation framework. The aggregation will end up looking something like this:
db.stack.aggregate([
    { $match: { "samples.key" : "test-key" } },
    { $unwind : "$samples" },
    { $match : { "samples.key" : "test-key" } },
    { $project : { "new_key" : "$samples.key", "new_value" : "$samples.value" } },
    { $group : { _id : "$new_key", answer : { $avg : "$new_value" } } }
])
The best way to think of the aggregation framework is like an assembly line. The query itself is an array of JSON documents, where each sub-document represents a different step in the assembly.
Step 1: $match
The first step is a basic filter, like a WHERE clause in SQL. We place this step first to filter out all documents that do not contain an array element containing test-key. Placing this at the beginning of the pipeline allows the aggregation to use indexes.
Step 2: $unwind
The second step, $unwind, is used for separating each of the elements in the "samples" array so we can perform operations across all of them. If you run the query with just that step, you'll see what I mean.
Long story short:
{ "name" : "bob",
  "children" : [ { "name" : "mary" }, { "name" : "sue" } ]
}
becomes two documents, each with the array replaced by a single element:
{ "name" : "bob", "children" : { "name" : "mary" } }
{ "name" : "bob", "children" : { "name" : "sue" } }
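A standalone JavaScript sketch of what $unwind does (illustration only; no MongoDB required):

```javascript
// Simulate $unwind on one field: emit one output document per array
// element, with the array field replaced by that element.
function unwind(doc, field) {
  return doc[field].map(function (el) {
    var copy = {};
    for (var k in doc) { copy[k] = doc[k]; }
    copy[field] = el;
    return copy;
  });
}

var docs = unwind(
  { name: "bob", children: [{ name: "mary" }, { name: "sue" }] },
  "children"
);
// docs[0] -> { name: "bob", children: { name: "mary" } }
```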
Step 3: $match
The third step, $match, is an exact duplicate of the first $match stage, but has a different purpose. Since it follows $unwind, this stage filters out previous array elements, now documents, that don't match the filter criteria. In this case, we keep only documents where samples.key = "test-key"
Step 4: $project (Optional)
The fourth step, $project, restructures the document. In this case, I pulled the items out of the array so I could reference them directly. Using the example above:
{ "name" : "bob", "children" : { "name" : "mary" } }
becomes
{ "new_name" : "bob", "new_child_name" : "mary" }
Note that this step is entirely optional; later stages could be completed even without this $project after a few minor changes. In most cases $project is entirely cosmetic; aggregations have numerous optimizations under the hood such that manually including or excluding fields in a $project should not be necessary.
Step 5: $group
Finally, $group is where the magic happens. The _id value is what you will be "grouping by" in the SQL world. The second field says to average over the value defined in the $project step. You can easily substitute $sum to perform a sum, but a count operation is typically done the following way: my_count : { $sum : 1 }.
The most important thing to note here is that the majority of the work being done is to format the data to a point where performing the operation is simple.
Final Note
Lastly, I wanted to note that this would not work on the example data provided since samples.value is defined as text, which can't be used in arithmetic operations. If you're interested, changing the type of a field is described here: MongoDB How to change the type of a field
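As a plain-JavaScript illustration of why the string values break $avg, and what the numeric answer should be once the field type is fixed:

```javascript
// $avg ignores non-numeric values, so averaging the string-typed samples
// yields nothing useful; after converting to numbers the answer is 2.
var values = ["1", "3"];          // samples.value as stored (strings)
var numbers = values.map(Number); // [1, 3] after a type fix

var avg = numbers.reduce(function (a, b) { return a + b; }, 0) / numbers.length;
// avg -> 2
```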