I have a collection of 9 million records, and I found an index for which, if I try to fetch all the documents through it, MongoDB throws the error below.
Error: error: {
"ok" : 0,
"errmsg" : "invalid bson type in element with field name '_contract_end_date' in object with unknown _id",
"code" : 22,
"codeName" : "InvalidBSON",
"operationTime" : Timestamp(1585753324, 14),
"$clusterTime" : {
"clusterTime" : Timestamp(1585753324, 14),
"signature" : {
"hash" : BinData(0,"2fEF+tGQoHsjvCCWph9YhkVajCs="),
"keyId" : NumberLong("6756221167083716618")
}
}
}
So I tried to rename the field to contract_end_date using the $rename operator. When I tried updateMany, it threw the same error, but updateOne worked. That isn't much help, though: I just see the success message, and the 100-odd docs behind that index are still not actually updated. I would like to see the corrupted docs so I can identify the other fields, which would help me track down the application that is corrupting them.
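Roughly what I ran (the collection name and the filter here are just placeholders; the field names are the ones from my data):

// updateMany throws the same InvalidBSON error (collection name is a placeholder)
db.mycollection.updateMany(
    { "_contract_end_date": { $exists: true } },
    { $rename: { "_contract_end_date": "contract_end_date" } }
)

// updateOne reports success, but the ~100 affected docs still are not renamed
db.mycollection.updateOne(
    { "_contract_end_date": { $exists: true } },
    { $rename: { "_contract_end_date": "contract_end_date" } }
)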
Sample doc: it's a pretty simple flat structure, around 50 fields per document, no nested docs.
{
_id:
sys_contract_end_date:
customer_name:
location:
owner:
retailer:
seller:
}
Related
I'm using Mongo v4.2, and I'm trying to limit the number of documents scanned using maxScan. I've tried "limit", but I believe that pulls all matching documents and then slices the array; I actually want to stop Mongo from scanning past the first 5 docs.
Here is the error I get:
db.movies.find({title: 'Godfather'}).maxScan(5)
Error: error: {
"operationTime" : Timestamp(1598049657, 1),
"ok" : 0,
"errmsg" : "Failed to parse: { find: \"content_movies\", filter: { title: \"Godfather\" }, maxScan: 5.0, lsid: { id: UUID(\"de0fad49-6cd1-425f-896a-77aa7229e4f0\") }, $clusterTime: { clusterTime: Timestamp(1598049547, 1), signature: { hash: BinData(0, 98F9B39F0F6B9E8088947EF37506EA8B17F8AFAA), keyId: 6838904097895088131 } }, $db: \"PRODUCTION\" }. Unrecognized field 'maxScan'.",
"code" : 9,
"codeName" : "FailedToParse",
"$clusterTime" : {
"clusterTime" : Timestamp(1598049657, 1),
"signature" : {
"hash" : BinData(0,"/oI+65SAR7fEGyp9yilR+PFG3KQ="),
"keyId" : NumberLong("6838904097895088131")
}
}
}
Appreciate your help.
maxScan is removed in 4.2.
MongoDB removes the deprecated option maxScan for the find command and the mongo shell helper cursor.maxScan(). Use either the maxTimeMS option for the find command or the helper cursor.maxTimeMS() instead.
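If the goal is to cap how long the server spends on the query, the documented replacement looks like this (a sketch reusing the collection from the question; the 100 ms budget is arbitrary):

// Abort the query server-side if it runs longer than 100 ms
db.movies.find({ title: 'Godfather' }).maxTimeMS(100)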
I actually want to stop mongo from scanning past the first 5 docs
Use limit for this.
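limit() is applied by the server, so the query stops once it has returned 5 matching documents rather than pulling every match and slicing afterwards. For example:

// Return at most 5 matching documents
db.movies.find({ title: 'Godfather' }).limit(5)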
Consider:
I have collections called "Feeds", "Groups" and "Users", and there are millions of documents in them.
Say "Feeds" schema is as follows:
{
fromUserRef: <Reference Id from Users collection>,
toUsersRef: [<Array of Reference Id(s) from Users collection>],
toGroupsRef: [<Array of Reference Id(s) from Groups collection>],
text: <String>,
image: <String>,
...
}
Sample Documents:
{
fromUserRef: ObjectId("5dd4e8b52355555592249596"),
toUsersRef: [],
toGroupsRef: [ObjectId("5dd4e8b523594c5592249392")],
...
},
{
fromUserRef: ObjectId("5dd4e8b52355555592249583"),
toUsersRef: [ObjectId("5dd4e8b52355555592249596"), ObjectId("5dd4e8b52355555592249291")],
toGroupsRef: [],
...
},
{
fromUserRef: ObjectId("5dd4e8b52355555592249583"),
toUsersRef: [],
toGroupsRef: [],
...
}
Now say I choose the shard key on "toUsersRef" & "toGroupsRef". According to the official docs, there must be an index that supports the shard key.
In my case, I have created an index as follows:
db.feeds.createIndex({
"toUserRef" : 1,
"toGroupsRef" : 1
},
{"name":"toUsersRef_1_toGroupsRef_1"})
Shard Collection:
sh.shardCollection("my_db.feeds", {
"toUsersRef" : 1, "toGroupsRef" : 1
});
Error Output:
{
"message" : "couldn't find valid index for shard key",
"ok" : 0,
"code" : 96,
"codeName" : "OperationFailed",
"operationTime" : "Timestamp(1578373028, 5)",
"$clusterTime" : {
"clusterTime" : "Timestamp(1578373028, 5)",
"signature" : {
"hash" : "AAAAAAAAAAAAAAAAAAAAAAAAAAA=",
"keyId" : 0
}
},
"name" : "MongoError"
}
Not sure what's wrong here. Could anyone shed some light on this?
P.S.: To make sure the index strategy matched the shard strategy, I tried the same shard key on an empty collection, copied the index that was created there to support the shard key, and re-attempted to shard the collection where data exists, but no luck; it throws the same error.
More info:
Using MongoDB version 4.2.2
Config server replica set, 2 shard replica sets
I was connected to mongos while creating the index and sharding the collection
Finally found the root cause.
Sharding was working on a new collection but not on the existing one because the createIndex call had a typo in it [ the s was missing in "toUserRef" :| ], so the shard key had no supporting index. Once I corrected the typo, I get a different error: cannot index parallel arrays [toUsersRef] [toGroupsRef].
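For reference, the corrected call looks like this. It still fails because MongoDB cannot build a compound index where more than one of the indexed fields is an array in the same document, and array-valued fields cannot be used in a shard key in any case:

// Corrected spelling ("toUsersRef"), but this still fails with
// "cannot index parallel arrays" whenever a document has arrays
// in both toUsersRef and toGroupsRef.
db.feeds.createIndex(
    { "toUsersRef": 1, "toGroupsRef": 1 },
    { "name": "toUsersRef_1_toGroupsRef_1" }
)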
I'm currently trying out aggregations with MongoDB using the JSON found here: http://media.mongodb.org/zips.json
So I imported it thousands of times and then tried this command:
db.CO_villes.aggregate({$group:{_id:"$state",population:{$sum:"$pop"}}})
And I got this error:
2019-04-24T13:49:19.579+0000 E QUERY [js] Error: command failed: {
"ok" : 0,
"errmsg" : "unrecognized field 'mergeByPBRT'",
"code" : 9,
"codeName" : "FailedToParse",
"operationTime" : Timestamp(1556113758, 2),
"$clusterTime" : {
"clusterTime" : Timestamp(1556113758, 2),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
} : aggregate failed :
I have a sharded cluster with 3 MongoDB instances.
I also hit this issue when I try to get the indexes with Compass.
I tried to export the data, remove the id field using sed (because my ids were not all ObjectIds), and re-import it, but I still face this issue.
I solved my issue by creating a 3.6 cluster instead of a 4.0.6 one, so I think this is a bug related to the newer versions of MongoDB.
I am using MongoDB 2.6.4 and am still getting an error:
uncaught exception: aggregate failed: {
"errmsg" : "exception: aggregation result exceeds maximum document size (16MB)",
"code" : 16389,
"ok" : 0,
"$gleStats" : {
"lastOpTime" : Timestamp(1422033698000, 105),
"electionId" : ObjectId("542c2900de1d817b13c8d339")
}
}
Reading various pieces of advice, I came across the idea of saving the result in another collection using $out. My query now looks like this:
db.audit.aggregate([
    { $match: { "date": { $gte: ISODate("2015-01-22T00:00:00.000Z"),
                          $lt: ISODate("2015-01-23T00:00:00.000Z") } } },
    { $unwind: "$data.items" },
    { $out: "tmp" }
])
But I am getting a different error:
uncaught exception: aggregate failed:
{"errmsg" : "exception: insert for $out failed: { lastOp: Timestamp 1422034172000|25, connectionId: 625789, err: \"insertDocument :: caused by :: 11000 E11000 duplicate key error index: duties_and_taxes.tmp.agg_out.5.$_id_ dup key: { : ObjectId('54c12d784c1b2a767b...\", code: 11000, n: 0, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1422034172000|25, electionId: ObjectId('542c2900de1d817b13c8d339') } }",
"code" : 16996,
"ok" : 0,
"$gleStats" : {
"lastOpTime" : Timestamp(1422034172000, 26),
"electionId" : ObjectId("542c2900de1d817b13c8d339")
}
}
Does anyone have a solution?
The error is due to the $unwind step in your pipeline.
When you unwind a field holding n elements, n copies of the same document are produced, all with the same _id; each copy carries one of the elements from the array that was unwound. See the demonstration below of the records after an unwind operation.
Sample demo:
> db.t.insert({"a":[1,2,3,4]})
WriteResult({ "nInserted" : 1 })
> db.t.aggregate([{$unwind:"$a"}])
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 1 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 2 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 3 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 4 }
>
Since all these documents have the same _id, you get a duplicate key exception (caused by the same value in the _id field of every unwound document) on insert into the new collection named tmp.
The pipeline will fail to complete if the documents produced by the
pipeline would violate any unique indexes, including the index on the
_id field of the original output collection.
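As an aside, if you want to keep the $out approach, one way to sidestep the duplicate _id values is to project only the fields you need while excluding _id, so each document written to tmp should get a fresh ObjectId on insert. A sketch (the included field list below is just illustrative; keep whatever fields you actually need):

db.audit.aggregate([
    { $match: { "date": { $gte: ISODate("2015-01-22T00:00:00.000Z"),
                          $lt: ISODate("2015-01-23T00:00:00.000Z") } } },
    { $unwind: "$data.items" },
    // Exclude _id so the documents inserted by $out get new ObjectIds;
    // the included fields are placeholders for the ones you need.
    { $project: { _id: 0, "date": 1, "data.items": 1 } },
    { $out: "tmp" }
])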
To solve your original problem, you could set the allowDiskUse option to true. It allows the aggregation to use disk space whenever it needs to.
Optional. Enables writing to temporary files. When set to true,
aggregation operations can write data to the _tmp subdirectory in the
dbPath directory. See Perform Large Sort Operation with External Sort
for an example.
as in:
db.audit.aggregate([
    { $match: { "date": { $gte: ISODate("2015-01-22T00:00:00.000Z"),
                          $lt: ISODate("2015-01-23T00:00:00.000Z") } } },
    { $unwind: "$data.items" }
],                              // note, the pipeline ends here
{
    allowDiskUse: true
});
I have a document in a MongoDB collection like this:
{
sessions : [
{
issues : [
{
id : "6e184c73-2926-46e9-a6fd-357b55986a28",
text : "some text"
},
{
id : "588f4547-3169-4c39-ab94-8c77a02a1774",
text : "other text"
}
]
}
]
}
And I want to update the issue with the id 588f4547-3169-4c39-ab94-8c77a02a1774 in the first session.
The problem is that I only know that it's the first session and the issue id (NOT the index of the issue !)
So I tried something like this:
db.mycollection.update({ "sessions.0.issues.id" : "588f4547-3169-4c39-ab94-8c77a02a1774"},
{ $set: { "sessions.0.issues.$.text" : "a new text" }})
But I got the following result:
WriteResult({
"nMatched" : 0,
"nUpserted" : 0,
"nModified" : 0,
"writeError" : {
"code" : 16837,
"errmsg" : "The positional operator did not find the match needed from the query. Unexpanded update: sessions.0.issues.$.text"
}
})
How can I do this?
Thanks for your help.
You have to use this (apparently equivalent) query:
db.mycollection.update({"sessions.0.issues": {$elemMatch: {id: <yourValue>}}}, {$set: {"sessions.0.issues.$.text": "newText"}})
Notice that your update expression was correct.
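With the id from the question filled in, the call looks like this (same query as above, just with concrete values):

db.mycollection.update(
    { "sessions.0.issues": { $elemMatch: { id: "588f4547-3169-4c39-ab94-8c77a02a1774" } } },
    { $set: { "sessions.0.issues.$.text": "a new text" } }
)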
More information about $elemMatch.
By the way, the MongoDB reference explicitly states that the $ operator does not work "with queries that traverse nested arrays".
Important: $elemMatch only works with version 4 or later.