Java with mongodb

I am getting the following exception when I try to fetch data from a MongoDB collection. This collection holds a very large amount of data.
The exception is:
com.mongodb.MongoQueryException: Query failed with error code 10334 and error message 'BSONObj size: 24020168 (0x16E84C8) is invalid. Size must be between 0 and 16793600(16MB)' on server 10.15.0.227:27017
The following is the query I used to get the data from MongoDB:
db.getCollection('triggered_policies').aggregate([
    { "$match" : { "policy_name" : "EIQSOC-1040-ec" } },
    { "$project" : {
        "cust_created_at" : { "$add" : [ "$created_at", 19800000 ] },
        "event_ids" : "$event_ids",
        "trigger_time" : "$trigger_time",
        "created_at" : "$created_at",
        "triggered_rules" : "$triggered_rules"
    } },
    { "$sort" : { "created_at" : -1 } },
    { "$group" : {
        "_id" : { "$hour" : "$cust_created_at" },
        "triggered_policies" : { "$addToSet" : {
            "trigger_time" : "$trigger_time",
            "created_at" : "$created_at",
            "event_ids" : "$event_ids",
            "triggered_rules" : "$triggered_rules"
        } }
    } },
    { "$sort" : { "_id" : 1 } }
])
The following is the exact exception we are getting:
Error: getMore command failed: {
"ok" : 0,
"errmsg" : "BSONObj size: 25994482 (0x18CA4F2) is invalid. Size must be between 0 and 16793600(16MB)",
"code" : 10334
}
Please help us solve the issue.

It looks like a document produced during the aggregation exceeds the 16 MB BSON size restriction in MongoDB. You will have to change your aggregation query so that it does not accumulate too much data into a single document; here the $group stage's $addToSet collects full sub-documents per hour, which can easily grow past the limit.
Below is the relevant quote from the MongoDB documentation:
BSON Document Size
The maximum BSON document size is 16 megabytes.
The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API. See mongofiles and the documentation for your driver for more information about GridFS.
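One way to stay under that limit here (a sketch only, not a drop-in replacement) is to accumulate just the document _ids per hour instead of full sub-documents, and fetch the details in a second query afterwards:

// Sketch: group only the _ids per hour so each result document stays small.
db.getCollection('triggered_policies').aggregate([
    { "$match" : { "policy_name" : "EIQSOC-1040-ec" } },
    { "$project" : { "cust_created_at" : { "$add" : [ "$created_at", 19800000 ] } } },
    { "$group" : {
        "_id" : { "$hour" : "$cust_created_at" },
        "policy_ids" : { "$addToSet" : "$_id" }   // ids only, not whole documents
    } },
    { "$sort" : { "_id" : 1 } }
])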

Related

Poor performance on bulk deleting a large collection

I have a single standalone mongo installation on a Linux machine.
The database contains a collection with 181 million documents. This collection is by far the largest collection in the database (approx. 90% of the data).
The size of the collection is currently 3.5 TB.
I'm running Mongo version 4.0.10 (WiredTiger).
The collection has two indexes:
One on _id.
One on two fields, used when deleting documents (see the snippet below).
When benchmarking bulk deletion on this collection, we used the following snippet:
db.getCollection('Image').deleteMany({
    $and: [
        { "CameraId" : 1 },
        { "SequenceNumber" : { $lt: 153000000 } }
    ]
})
To see the state of the deletion operation, I ran a simple test of deleting 1000 documents while watching the operation using currentOp(). It shows the following:
"command" : {
"q" : {
"$and" : [
{
"CameraId" : 1.0
},
{
"SequenceNumber" : {
"$lt" : 153040000.0
}
}
]
},
"limit" : 0
},
"planSummary" : "IXSCAN { CameraId: 1, SequenceNumber: 1 }",
"numYields" : 876,
"locks" : {
"Global" : "w",
"Database" : "w",
"Collection" : "w"
},
"waitingForLock" : false,
"lockStats" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(877),
"w" : NumberLong(877)
}
},
"Database" : {
"acquireCount" : {
"w" : NumberLong(877)
}
},
"Collection" : {
"acquireCount" : {
"w" : NumberLong(877)
}
}
}
It seems to be using the correct index, but the number and type of locks worry me. As I interpret this, it acquires one global lock for each deleted document from a single collection.
Using this approach, it has taken over a week to delete 40 million documents. This cannot be the expected performance.
I realise other designs exist, such as bundling documents into larger chunks and storing them using GridFS, but the current design is what it is, and I want to make sure that what I see is expected before changing my design, restructuring the data, or even considering clustering, etc.
Any suggestions on how to increase the performance of bulk deletions, or is this expected?
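(Not part of the original post, but for illustration: a pattern that is often suggested for large deletions is to remove documents in smaller batches, so that each operation is shorter and yields more often. A rough sketch, reusing the field names from the question; the batch size of 50,000 is an arbitrary assumption.)

// Hypothetical sketch: walk the matching documents in batches and delete by _id.
// Field names come from the question; adjust the batch size to your workload.
var batchSize = 50000;
while (true) {
    var ids = db.getCollection('Image')
        .find({ "CameraId" : 1, "SequenceNumber" : { $lt : 153000000 } }, { _id : 1 })
        .limit(batchSize)
        .toArray()
        .map(function (doc) { return doc._id; });
    if (ids.length === 0) break;                       // nothing left to delete
    db.getCollection('Image').deleteMany({ _id : { $in : ids } });
}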

MongoDB aggregation query

I am using MongoDB 2.6.4 and am still getting an error:
uncaught exception: aggregate failed: {
"errmsg" : "exception: aggregation result exceeds maximum document size (16MB)",
"code" : 16389,
"ok" : 0,
"$gleStats" : {
"lastOpTime" : Timestamp(1422033698000, 105),
"electionId" : ObjectId("542c2900de1d817b13c8d339")
}
}
Reading various pieces of advice, I came across the idea of saving the result into another collection using $out. My query now looks like this:
db.audit.aggregate([
    { $match : { "date" : { $gte : ISODate("2015-01-22T00:00:00.000Z"),
                            $lt  : ISODate("2015-01-23T00:00:00.000Z") } } },
    { $unwind : "$data.items" },
    { $out : "tmp" }
])
But I am getting a different error:
uncaught exception: aggregate failed:
{"errmsg" : "exception: insert for $out failed: { lastOp: Timestamp 1422034172000|25, connectionId: 625789, err: \"insertDocument :: caused by :: 11000 E11000 duplicate key error index: duties_and_taxes.tmp.agg_out.5.$_id_ dup key: { : ObjectId('54c12d784c1b2a767b...\", code: 11000, n: 0, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1422034172000|25, electionId: ObjectId('542c2900de1d817b13c8d339') } }",
"code" : 16996,
"ok" : 0,
"$gleStats" : {
"lastOpTime" : Timestamp(1422034172000, 26),
"electionId" : ObjectId("542c2900de1d817b13c8d339")
}
}
Does anyone have a solution?
The error is due to the $unwind step in your pipeline.
When you unwind an array field with n elements, n copies of the same document are produced, all with the same _id; each copy holds one of the elements from the array that was unwound. See the demonstration below of the records after an unwind operation.
Sample demo:
> db.t.insert({"a":[1,2,3,4]})
WriteResult({ "nInserted" : 1 })
> db.t.aggregate([{$unwind:"$a"}])
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 1 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 2 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 3 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 4 }
>
Since all these documents have the same _id, you get a duplicate key exception (due to the identical value in the _id field of all the unwound documents) when they are inserted into the new collection named tmp.
The pipeline will fail to complete if the documents produced by the
pipeline would violate any unique indexes, including the index on the
_id field of the original output collection.
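If you want to keep the $out stage, one possible workaround (a sketch, not from the original answer; behaviour worth verifying on your MongoDB version) is to drop the _id field with a $project stage, so that fresh ObjectIds are generated when the documents are inserted into tmp. The included fields below are placeholders; list whichever fields you actually need:

db.audit.aggregate([
    { $match : { "date" : { $gte : ISODate("2015-01-22T00:00:00.000Z"),
                            $lt  : ISODate("2015-01-23T00:00:00.000Z") } } },
    { $unwind : "$data.items" },
    { $project : { _id : 0, "date" : 1, "data" : 1 } },   // drop _id; keep the fields you need (placeholders)
    { $out : "tmp" }
])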
Alternatively, to solve your original problem, you could set the allowDiskUse option to true. It allows the aggregation to use disk space whenever it needs to.
Optional. Enables writing to temporary files. When set to true,
aggregation operations can write data to the _tmp subdirectory in the
dbPath directory. See Perform Large Sort Operation with External Sort
for an example.
as in:
db.audit.aggregate([
    { $match : { "date" : { $gte : ISODate("2015-01-22T00:00:00.000Z"),
                            $lt  : ISODate("2015-01-23T00:00:00.000Z") } } },
    { $unwind : "$data.items" }   // note, the pipeline ends here
], {
    allowDiskUse : true
});

assertion exception in mongo mapreduce

I have a collection that stores search query logs. Its two main attributes are user_id and search_query. user_id is null for a logged-out user. I am trying to run a mapReduce job to find out the count and the terms per user.
var map = function() {
    if (this.user_id !== null) {
        emit(this.user_id, this.search_query);
    }
}
var reduce = function(id, queries) {
    return Array.sum(queries + ",");
}
db.searchhistories.mapReduce(map,
    reduce,
    {
        query : { "time" : {
            $gte : ISODate("2013-10-26T14:40:00.000Z"),
            $lt : ISODate("2013-10-26T14:45:00.000Z")
        } },
        out : "mr2"
    }
)
throws the following exception:
Wed Nov 27 06:00:07 uncaught exception: map reduce failed:{
"errmsg" : "exception: assertion src/mongo/db/commands/mr.cpp:760",
"code" : 0,
"ok" : 0
}
I looked at mr.cpp L#760 but could not gather any vital information. What could be causing this?
My collection has values like:
> db.searchhistories.find()
{ "_id" : ObjectId("5247a9e03815ef4a2a005d8b"), "results" : 82883, "response_time" : 0.86, "time" : ISODate("2013-09-29T04:17:36.768Z"), "type" : 0, "user_id" : null, "search_query" : "awareness campaign" }
{ "_id" : ObjectId("5247a9e0606c791838005cba"), "results" : 39545, "response_time" : 0.369, "time" : ISODate("2013-09-29T04:17:36.794Z"), "type" : 0, "user_id" : 34225174, "search_query" : "eficaz eficiencia efectividad" }
Looking at the docs, I could see that this is not possible on a slave (secondary); it will work perfectly fine on the master (primary). If you still want to use the slave, then you have to use the following syntax:
db.searchhistories.mapReduce(map,
    reduce,
    {
        query : { "time" : {
            $gte : ISODate("2013-10-26T14:40:00.000Z"),
            $lt : ISODate("2013-10-26T14:45:00.000Z")
        } },
        out : { inline : 1 }
    }
)
Note: ensure that the output document size does not exceed the 16 MB limit when using inline output.
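(As an aside, and not from the original answer: you can check from the shell whether you are connected to the master/primary, since a collection output such as out : "mr2" only works there.)

// Returns true when connected to the master (primary), false on a slave (secondary).
db.isMaster().ismaster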

$where gives error

I have a collection containing data:
{
"_id" : ObjectId("51dfb7abe4b02f15ee93a7c7"),
"date_created" : "2013-7-12 13:25:5",
"referrer_id" : 13,
"role_name" : "Physician",
"status_id" : "1",
}
I am sending the query:
cmd {
"mapreduce" : "doctor" ,
"map" : "function map(){emit(this._id,this);}" ,
"reduce" : "function reduce(key,values){return values;}" ,
"verbose" : true ,
"out" : { "merge" : "map_reduce"} ,
"query" : { "$where" : "this.demographics.first_name=='makdoctest'"}
}
I am getting this error:
"errmsg" : "exception: count failed in DBDirectClient: 10071 error on invocation of $where function:\nJS Error: TypeError: this.demographics has no properties nofile_a:0"
As Sammaye says in a comment:
It means that somewhere in one of your documents demographics is null or does not exist; you need to do a null check first. But more importantly, why are you doing this in a $where?
I would go even further than that: I wouldn't use the map/reduce mechanism here at all. It is slow, cannot use indexes, and cannot run in parallel with other operations.
You would be much better off using the Aggregation Framework, where you can do something like:
db.doctor.aggregate( [
{ $match: { "demographics.first_name" : 'makdoctest' } },
{ $group: …
You didn't specify the final goal here, but once you do I can update the answer.
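For illustration (assuming the demographics sub-document shown in the error really is the field you are matching on), the null check Sammaye mentions would look like the second query below; the first, a plain query on the embedded field, is usually the better option because it avoids JavaScript evaluation entirely:

// Plain query on the embedded field -- no $where, and it can use an index on demographics.first_name.
db.doctor.find({ "demographics.first_name" : "makdoctest" })

// If $where really must be used, guard against documents where demographics is missing or null.
db.doctor.find({ "$where" : "this.demographics && this.demographics.first_name == 'makdoctest'" })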

Mongo DB sorting exception - too much data for sort() with no index

Using MongoDB version 2.4.4, I have a profile collection containing profile documents.
I have the following query:
Query: { "loc" : { "$near" : [ 32.08290052711715 , 34.80888522811172] , "$maxDistance" : 0.0089992800575954}}
Fields: { "friendsCount" : 1 , "tappsCount" : 1 , "imageUrl" : 1 , "likesCount" : 1 , "lastActiveTime" : 1 , "smallImageUrl" : 1 , "loc" : 1 , "pid" : 1 , "firstName" : 1}
Sort: { "lastActiveTime" : -1}
Limited to 100 documents.
loc - an embedded document containing the keys (lat, lon)
I am getting the exception:
org.springframework.data.mongodb.UncategorizedMongoDbException: too much data for sort() with no index. add an index or specify a smaller limit;
As suggested by the exception, when I reduce the limit to 50 it works, but that is not an option for me.
I have the following 2 relevant indexes on the profile collection:
{'loc':'2d'}
{'lastActiveTime':-1}
I have also tried a compound index, as below, but without success.
{'loc':'2d', 'lastActiveTime':-1}
This is an example document (with the relevant keys):
{
"_id" : "5d5085601208aa918bea3c1ede31374d",
"gender" : "female",
"isCreated" : true,
"lastActiveTime" : ISODate("2013-04-08T11:30:56.615Z"),
"loc" : {
"lat" : 32.082230499955806,
"lon" : 34.813542940344945,
"locTime" : NumberLong(0)
}
}
There are other fields in the profile documents; the average profile document size is about 0.5 MB. Correct me if I am wrong, but since I am requesting only the relevant response fields (as I do), that should not be the cause of the problem.
I don't know if it helps, but when I reduce the limit to 50 and the query succeeds, I get the following explain information (via the MongoVUE client):
cursor : GeoSearchCursor
isMultyKey : False
n : 50
nscannedObjects : 50
nscanned : 50
nscannedObjectsAllPlans : 50
nscannedAllPlans : 50
scanAndOrder : True
indexOnly : False
nYields : 0
nChunkSkips : 0
millis : 10
indexBounds :
This is a blocker for me and I would appreciate your help. What am I doing wrong? How can I make the query run with the required limit?
Try creating a compound index instead of two indexes.
db.collection.ensureIndex( { 'loc':'2d','lastActiveTime':-1 } )
You can also tell the query which index to use by supplying a hint:
db.collection.find(...).hint('myIndexName')
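Putting it together, a sketch of the full query with the compound index hinted explicitly (the collection name profile is an assumption based on the question, and hint() also accepts an index name instead of the key pattern):

// Assumed collection name; the filter, projection, sort and limit come from the question.
db.profile.find(
    { "loc" : { "$near" : [ 32.08290052711715, 34.80888522811172 ], "$maxDistance" : 0.0089992800575954 } },
    { "friendsCount" : 1, "tappsCount" : 1, "imageUrl" : 1, "likesCount" : 1, "lastActiveTime" : 1,
      "smallImageUrl" : 1, "loc" : 1, "pid" : 1, "firstName" : 1 }
).sort({ "lastActiveTime" : -1 })
 .limit(100)
 .hint({ "loc" : "2d", "lastActiveTime" : -1 })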