Mongo group with geospatial condition

I have an issue when I try to aggregate results with a geospatial condition:
db.users.group({
    reduce: function(obj, prev) {
        prev.sums = {
            y_levels: prev.sums.y_levels + obj.current_y_level,
            x_levels: prev.sums.x_levels + obj.current_x_level,
            count: prev.sums.count + 1
        };
    },
    cond: { current_position: { $near: [56.553823, 8.565619, 10] } },
    initial: { sums: { y_levels: 0, x_levels: 0, count: 0 } }
});
produces:
uncaught exception: group command failed: {
    "errmsg" : "exception: manual matcher config not allowed",
    "code" : 13285,
    "ok" : 0
}
I have no issue with a "regular" condition.
Any idea?

I believe that you are seeing this bug here:
http://jira.mongodb.org/browse/SERVER-1742
The bug referenced uses the Map-Reduce command, but "group" is really just a prepared subset of Map-Reduce.
If you need this functionality, please vote it up so that it can be appropriately prioritized.
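As a possible workaround on newer servers (this is not part of the referenced ticket), the aggregation pipeline's $geoNear stage accepts a geospatial query as its first stage and can feed a $group that computes the same sums. A minimal sketch, assuming a geospatial index on current_position and reusing the coordinates and distance from the question:
db.users.aggregate([
    { $geoNear: {
        near: [56.553823, 8.565619],   // legacy coordinate pair, as in the original query
        distanceField: "dist",         // required by $geoNear
        maxDistance: 10                // taken from the question; units depend on the index type
    }},
    { $group: {
        _id: null,
        y_levels: { $sum: "$current_y_level" },
        x_levels: { $sum: "$current_x_level" },
        count: { $sum: 1 }
    }}
]);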

Related

Getting BSONObj size error even with allowDiskUse true option

I have a collection with 300 million documents; each doc has a user_id field like the following:
{
"user_id": "1234567",
// and other fields
}
I want to get a list of unique user_ids in the collection, but the following mongo shell command results in an error.
db.collection.aggregate([
    { $group: { _id: null, user_ids: { $addToSet: "$user_id" } } }
], { allowDiskUse: true });
2021-11-23T14:50:28.163+0900 E QUERY [js] uncaught exception: Error: command failed: {
"ok" : 0,
"errmsg" : "Error on remote shard <host>:<port> :: caused by :: BSONObj size: 46032166 (0x2BE6526) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: null",
"code" : 10334,
"codeName" : "BSONObjectTooLarge",
"operationTime" : Timestamp(1637646628, 64),
...
} : aggregate failed :
Why does the error occur even with allowDiskUse: true option?
The db version is 4.2.16.
You are trying to collect all unique user_ids into a single document, but apparently the size of this document becomes greater than 16 MB, causing the issue.
distinct may be more useful:
db.collection.distinct( "user_id" )
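Note that distinct also returns its result as a single document, so with very many unique values it can hit the same 16 MB limit. In that case, a hedged alternative on 4.2+ is to group on user_id itself (one small document per unique value) and write the result to another collection with $merge; the collection name unique_user_ids below is hypothetical:
db.collection.aggregate([
    { $group: { _id: "$user_id" } },        // one output document per unique user_id
    { $merge: { into: "unique_user_ids" } } // write the ids out instead of returning one huge document
], { allowDiskUse: true });
// read or count the unique ids from the output collection
db.unique_user_ids.countDocuments();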

mongodb $slice aggregate operator issue when third argument is 0 { $slice: [ <array>, <position>, <n> ] }

Consider the below Mongodb Query.
db.groups.aggregate([
    {
        $match: {
            _id: { $in: [220] },
            active: true
        }
    },
    {
        $project: {
            myArr: { $slice: ["$arr", -1, 0] },
            name: 1
        }
    }
]).pretty()
When I execute this query I encounter an error:
2020-12-05T19:16:43.002-0500 E QUERY [js] uncaught exception: Error: command failed: {
"operationTime" : Timestamp(1607213793, 1),
"ok" : 0,
"errmsg" : "Third argument to $slice must be positive: 0",
"code" : 28729,
"codeName" : "Location28729",
"$clusterTime" : {
"clusterTime" : Timestamp(1607213793, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
} : aggregate failed :
As per the $slice documentation (https://docs.mongodb.com/manual/reference/operator/aggregation/slice/), the third argument ('n') to the $slice aggregate operator cannot be negative if the second argument 'position' is specified. I calculate the third argument value to be 0, which means I expect 0 array elements as the result. However, I encounter "errmsg" : "Third argument to $slice must be positive: 0".
Please let me know if my understanding is incorrect, as the query doesn't execute as I expect.
As you need to use a calculated value, you can use $cond:
{
    $project: {
        "arrayName": {
            $cond: [{ $eq: ["$calculatedField", 0] }, [], $sliceOperationGoesHere]
        }
    }
}
0 is not a valid value for slice when position is specified because it's not a positive number. When you calculate the number of elements to return in $slice, if it's 0 you can instead project an empty array like so:
{
    $project: {
        myArr: [],
        name: 1
    }
}
And if it's greater than 0 you can pass the $slice operator to the $project stage as you are currently doing.
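Combining the two suggestions, a minimal sketch of the $cond wrapper around $slice; the field $n holding the calculated count is hypothetical:
{
    $project: {
        name: 1,
        myArr: {
            $cond: [
                { $eq: ["$n", 0] },            // calculated count is 0: nothing to slice
                [],                            // so project an empty array instead
                { $slice: ["$arr", -1, "$n"] } // otherwise slice as in the original query
            ]
        }
    }
}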

Solving "BSONObj size: 17582686 (0x10C4A5E) is invalid" when doing aggregation in MongoDB?

I'm trying to remove duplicate documents in MongoDB in a large collection according to the approach described here:
db.events.aggregate([
    { "$group": {
        "_id": { "firstId": "$firstId", "secondId": "$secondId" },
        "dups": { "$push": "$_id" },
        "count": { "$sum": 1 }
    }},
    { "$match": { "count": { "$gt": 1 } } }
], { allowDiskUse: true, cursor: { batchSize: 100 } }).forEach(function(doc) {
    doc.dups.shift();
    db.events.remove({ "_id": { "$in": doc.dups } });
});
I.e. I want to remove events that have the same "firstId - secondId" combination. However after a while MongoDB responds with this error:
2016-11-30T14:13:57.403+0000 E QUERY [thread1] Error: getMore command failed: {
"ok" : 0,
"errmsg" : "BSONObj size: 17582686 (0x10C4A5E) is invalid. Size must be between 0 and 16793600(16MB)",
"code" : 10334
}
Is there anyway to get around this? I'm using MongoDB 3.2.6.
The error message indicates that some part of the process is attempting to create a document that is larger than the 16 MB document size limit in MongoDB.
Without knowing your data set, I would guess that the size of the collection is sufficiently large that the number of unique firstId / secondId combinations is growing the result set past the document size limit.
If the size of the collection prevents finding all duplicates values in one operation, you may want to try breaking it up and iterating through the collection and querying to find duplicate values:
db.events.find({}, { "_id": 0, "firstId": 1, "secondId": 1 }).forEach(function(doc) {
    var cnt = db.events.find(
        { "firstId": doc.firstId, "secondId": doc.secondId },
        { "_id": 0, "firstId": 1, "secondId": 1 } // explicitly select only the key fields so the index can cover the query
    ).count();
    if (cnt > 1)
        print('Dupe Keys: firstId: ' + doc.firstId + ', secondId: ' + doc.secondId);
})
It's probably not the most efficient implementation, but you get the idea.
Note that this approach relies heavily on the existence of the compound index { 'firstId' : 1, 'secondId' : 1 }.
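For completeness, a one-liner to create that compound index (field names are taken from the question):
db.events.createIndex({ "firstId": 1, "secondId": 1 })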

MongoDB aggregation query

I am using MongoDB 2.6.4 and still getting an error:
uncaught exception: aggregate failed: {
"errmsg" : "exception: aggregation result exceeds maximum document size (16MB)",
"code" : 16389,
"ok" : 0,
"$gleStats" : {
"lastOpTime" : Timestamp(1422033698000, 105),
"electionId" : ObjectId("542c2900de1d817b13c8d339")
}
}
Reading different pieces of advice, I came across the idea of saving the result in another collection using $out. My query looks like this now:
db.audit.aggregate([
    { $match: { "date": { $gte: ISODate("2015-01-22T00:00:00.000Z"),
                          $lt: ISODate("2015-01-23T00:00:00.000Z") } } },
    { $unwind: "$data.items" },
    { $out: "tmp" }
])
But I am getting a different error:
uncaught exception: aggregate failed:
{"errmsg" : "exception: insert for $out failed: { lastOp: Timestamp 1422034172000|25, connectionId: 625789, err: \"insertDocument :: caused by :: 11000 E11000 duplicate key error index: duties_and_taxes.tmp.agg_out.5.$_id_ dup key: { : ObjectId('54c12d784c1b2a767b...\", code: 11000, n: 0, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1422034172000|25, electionId: ObjectId('542c2900de1d817b13c8d339') } }",
"code" : 16996,
"ok" : 0,
"$gleStats" : {
"lastOpTime" : Timestamp(1422034172000, 26),
"electionId" : ObjectId("542c2900de1d817b13c8d339")
}
}
Does anyone have a solution?
The error is due to the $unwind step in your pipeline.
When you unwind a field that has n elements, n copies of the same document are produced, all with the same _id, each copy holding one of the elements from the array that was unwound. See the demonstration below of the records after an unwind operation.
Sample demo:
> db.t.insert({"a":[1,2,3,4]})
WriteResult({ "nInserted" : 1 })
> db.t.aggregate([{$unwind:"$a"}])
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 1 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 2 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 3 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 4 }
>
Since all these documents have the same _id, you get a duplicate key exception (due to the same value in the _id field for all the unwound documents) on insert into the new collection named tmp.
The pipeline will fail to complete if the documents produced by the
pipeline would violate any unique indexes, including the index on the
_id field of the original output collection.
To solve your original problem, you could set the allowDiskUse option to true. It allows the aggregation to use disk space whenever it needs to.
Optional. Enables writing to temporary files. When set to true,
aggregation operations can write data to the _tmp subdirectory in the
dbPath directory. See Perform Large Sort Operation with External Sort
for an example.
as in:
db.audit.aggregate([
    { $match: { "date": { $gte: ISODate("2015-01-22T00:00:00.000Z"),
                          $lt: ISODate("2015-01-23T00:00:00.000Z") } } },
    { $unwind: "$data.items" } // note, the pipeline ends here
], {
    allowDiskUse: true
});
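If you still need the unwound documents written out to the tmp collection, a hedged variant of the original pipeline is to drop the original _id with a $project stage before $out, so each inserted document gets a freshly generated _id and the unique index on _id is no longer violated (the list of fields to keep is an assumption; only the _id exclusion matters):
db.audit.aggregate([
    { $match: { "date": { $gte: ISODate("2015-01-22T00:00:00.000Z"),
                          $lt: ISODate("2015-01-23T00:00:00.000Z") } } },
    { $unwind: "$data.items" },
    { $project: { "_id": 0, "date": 1, "data": 1 } }, // assumed field list; the key part is excluding _id
    { $out: "tmp" }
], { allowDiskUse: true });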

Mongodb find near maxdistance

I am firing the following query in MongoDB:
db.acollection.find({
    "field.location": {
        "$near": [19.0723058, 73.00067739999997]
    },
    $maxDistance: 100000
}).count()
and getting the following error -
uncaught exception: count failed: {
"shards" : {
},
"cause" : {
"errmsg" : "exception: unknown top level operator: $maxDistance",
"code" : 2,
"ok" : 0
},
"code" : 2,
"ok" : 0,
"errmsg" : "failed on : Shard ShardA"
}
You did it wrong. The $maxDistance argument is a "child" of the $near operator:
db.acollection.find({
    "field.location": {
        "$near": [19.0723058, 73.00067739999997],
        "$maxDistance": 100000
    }
}).count()
It has to be within the same expression.
Also look at GeoJSON when you are making a new application; it is the way you should be storing location data in the future.
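For reference, a minimal sketch of the GeoJSON form, assuming a 2dsphere index is created on field.location and that the original pair is [latitude, longitude] (GeoJSON expects [longitude, latitude], and $maxDistance is then in meters):
db.acollection.createIndex({ "field.location": "2dsphere" })
db.acollection.find({
    "field.location": {
        "$near": {
            "$geometry": { type: "Point", coordinates: [73.00067739999997, 19.0723058] },
            "$maxDistance": 100000   // meters
        }
    }
}).count()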