How can I debug MongoDB slow chunk migration?

I'm trying to move a chunk inside the cluster:
mongos>db.adminCommand({ moveChunk: "db.col", find: {_id: ObjectId("58171b29b9b4ebfb3e8b4e42")}, to: "shard_v2"});
{ "millis" : 428681, "ok" : 1 }
In log I see following record:
2016-11-08T20:27:05.972+0300 I SHARDING [conn27] moveChunk migrate
commit accepted by TO-shard: { active: false, ns: "db.col", from:
"host:27017", min: { _id: ObjectId('58171b29b9b4ebfb3e8b4e42') }, max:
{ _id: ObjectId('58171f29b9b4eb31408b4b4c') }, shardKeyPattern: { _id:
1.0 }, state: "done", cc, ok: 1.0 }
So 23 MB of data were migrated in roughly 430 seconds. That is really slow.
I've uploaded a sample file to "host" and it transferred extremely fast (7-8 MB per second), so I do not think it is a disk or network issue (the cluster also has no load and no active queries). What else can I check to improve chunk migration performance?

The performance is almost certainly not limited by your setup. It is more likely MongoDB's migration policy, which tries not to affect normal database tasks.
There is a great answer on this issue on DBA Stack Exchange: https://dba.stackexchange.com/questions/81545/mongodb-shard-chunk-migration-500gb-takes-13-days-is-this-slow-or-normal
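If it is the migration policy that is slowing things down, one setting worth checking is the balancer's _secondaryThrottle. This is only a sketch, assuming a 2016-era cluster like the one in the question (the default varies by version and storage engine, so inspect it first): with the throttle on, the donor shard waits for secondaries to acknowledge the documents it transfers, which can make a migration crawl even on an otherwise idle cluster.
// Run via mongos. Check whether the balancer is configured with a secondary throttle:
use config
db.settings.find({ _id: "balancer" })

// If it is, and you accept the extra replication lag, relax it for the maintenance window:
db.settings.update(
    { _id: "balancer" },
    { $set: { _secondaryThrottle: false } },
    { upsert: true }
)

// The same option can be passed to a manual moveChunk:
db.adminCommand({
    moveChunk: "db.col",
    find: { _id: ObjectId("58171b29b9b4ebfb3e8b4e42") },
    to: "shard_v2",
    _secondaryThrottle: false
})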

Related

Ongoing replication from MongoDB to RDS PostgreSQL

I created an AWS DMS pipeline:
Source endpoint - MongoDB
Target endpoint - RDS PostgreSQL
I completed all the security configuration, and both endpoints returned success when tested.
For the MongoDB source, I am using one of the three replica set members, with a username and password that are not the admin credentials.
I also added the "changeStream" privilege to the replica set user.
But when starting the DMS migration task, I get this error in CloudWatch:
Encountered an error while initializing change stream: 'not authorized on admin to execute command
{ aggregate: 1, pipeline: [ { $changeStream: { fullDocument: "updateLookup", startAtOperationTime: Timestamp(1656005815, 0),
allChangesForCluster: true } }, "ok" : { "$numberDouble" : "0.0" },
"errmsg" : "not authorized on admin to execute command { aggregate: 1, pipeline: [ { $changeStream: { fullDocument:
\"updateLookup\", startAtOperationTime: Timestamp(1656005815, 0), allChangesForCluster: true } },
74f1-4aab-9ca1-f964ab655777\ (change_streams_capture.c:356)
I assume this is due to some missing privileges for the MongoDB replica set user.
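For what it's worth, a change stream opened with allChangesForCluster: true needs the find and changeStream actions on all non-system collections across all databases, not just on one database. A minimal sketch of a role that grants that (the role and user names below are made up; substitute the actual DMS source user):
use admin
db.createRole({
    role: "dmsChangeStream",    // hypothetical role name
    privileges: [
        // empty db/collection strings mean: all non-system collections in all databases
        { resource: { db: "", collection: "" }, actions: [ "find", "changeStream" ] }
    ],
    roles: []
})
db.grantRolesToUser("dmsUser", [ { role: "dmsChangeStream", db: "admin" } ])    // hypothetical user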

can't accept new chunks because there are still 1 deletes from previous migration

I have a MongoDB production cluster running 2.6.11 with 20 replica sets. I am getting a disk space issue because the majority of the chunks are stored on one replica set. When I check the log, I can see that the chunk move failed because of "deletes from previous migration":
2015-12-28T17:13:32.164+0000 [conn6504] about to log metadata event: { _id: "db1-2015-12-28T17:13:32-56816dbc6b0464b0a5801db8", server: "db1", clientAddr: "xx.xx.xx.11:50077", time: new Date(1451322812164), what: "moveChunk.start", ns: "emailing_nQafExtB.reports", details: { min: { email: "xxxxxxx" }, max: { email: "xxxxxxx" }, from: "shard16", to: "shard22" } }
2015-12-28T17:13:32.675+0000 [conn6504] about to log metadata event: { _id: "db1-2015-12-28T17:13:32-56816dbc6b0464b0a5801db9", server: "db1", clientAddr: "xx.xx.xx.11:50077", time: new Date(1451322812675), what: "moveChunk.from", ns: "emailing_nQafExtB.reports", details: { min: { email: "xxxxxxx" }, max: { email: "xxxxxxx" }, step 1 of 6: 3, step 2 of 6: 314, note: "aborted", errmsg: "moveChunk failed to engage TO-shard in the data transfer: can't accept new chunks because there are still 1 deletes from previous migration" } }
I followed the answer from this question, but it doesn't work for me. I ran the stepDown command on one primary and then on every primary in my cluster. I did the same with the cleanUpOrphaned command.
Has somebody run into this problem?
Thanks in advance for any insights.
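In case it helps to double-check how cleanupOrphaned was invoked: it has to be run against the primary mongod of the shard that still reports pending deletes (shard22 in the log above), not through mongos, and looped until stoppedAtKey comes back null. A sketch using the namespace from the question:
// Connect directly to the primary mongod of the affected shard
var nextKey = {};
var result;
while (nextKey != null) {
    result = db.adminCommand({
        cleanupOrphaned: "emailing_nQafExtB.reports",
        startingFromKey: nextKey
    });
    if (result.ok != 1) { printjson(result); break; }    // failure or timeout; inspect and retry
    nextKey = result.stoppedAtKey;
}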

Mongorestore not restoring data

I have an existing mongodump of a single collection that I am trying to restore. After running mongorestore, no errors show up, but the data is not in the collection. Are there any known reasons why this could happen? I would expect that if the data weren't inserted for some reason, an error would be written to the log.
To create and attempt to restore the dump, I followed the answer provided for this question:
How to use mongodump for 1 collection
I've created a new database on a different server, and it has an empty collection. I've checked the mongo log file and there are no errors; it shows the connection opening and authenticating, then disconnecting on the next line.
mongorestore -vvvvv -u user -p 'password' --db=MyDatabase --collection=MyCollection dump1/MyCollection.bson
2015-03-04T18:20:31.331+0000 creating new connection to:127.0.0.1:27017
2015-03-04T18:20:31.332+0000 [ConnectBG] BackgroundJob starting: ConnectBG
2015-03-04T18:20:31.332+0000 connected to server 127.0.0.1:27017 (127.0.0.1)
2015-03-04T18:20:31.332+0000 connected connection!
connected to: 127.0.0.1
2015-03-04T18:20:31.333+0000 drillDown: dump1/MyCollection.bson
2015-03-04T18:20:31.333+0000 dump1/MyCollection.bson
2015-03-04T18:20:31.333+0000 going into namespace [MyDatabase.MyCollection]
Restoring to MyDatabase.MyCollection without dropping. Restored data will be inserted without raising errors; check your server log
file size: 94876
130 objects found
2015-03-04T18:20:31.336+0000 Creating index: { key: { _id: 1 }, name: "_id_", ns: "MyDatabase.MyCollection" }
2015-03-04T18:20:31.340+0000 Creating index: { key: { geometry: "2dsphere" }, name: "geometry_2dsphere", ns: "MyDatabase.MyCollection", 2dsphereIndexVersion: 2 }
Log file:
2015-03-04T18:20:31.333+0000 [conn874] authenticate db: MyDatabase { authenticate: 1, nonce: "xxx", user: "user", key: "xxx" }
2015-03-04T18:20:31.342+0000 [conn874] end connection 127.0.0.1:59420 (25 connections now open)
The query I am using on the origin and destination is:
db.MyCollection.find()
On the origin server, the collection has 130 elements, which is what is also shown in the mongorestore output "130 objects found".
Edit:
I added the --drop option to the mongorestore command. The log file output clearly shows that it is creating the index on an empty collection.
2015-03-20T15:03:57.565+0000 [conn61965] authenticate db: MyDatabase { authenticate: 1, nonce: "xxx", user: "user", key: "xxx" }
2015-03-20T15:03:57.566+0000 [conn61965] CMD: drop MyDatabase.MyCollection
2015-03-20T15:03:57.631+0000 [conn61965] build index on: MyDatabase.MyCollection properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "MyDatabase.MyCollection" }
2015-03-20T15:03:57.631+0000 [conn61965] added index to empty collection
2015-03-20T15:03:57.652+0000 [conn61965] build index on: MyDatabase.MyCollection properties: { v: 1, key: { geometry: "2dsphere" }, name: "geometry_2dsphere", ns: "MyDatabase.MyCollection", 2dsphereIndexVersion: 2 }
2015-03-20T15:03:57.652+0000 [conn61965] added index to empty collection
2015-03-20T15:03:57.654+0000 [conn61965] end connection 127.0.0.1:59456 (21 connections now open)
So the issue ended up being that the user I was trying to do the restore with only had the read and dbAdmin roles. I had made a separate user so that the regular user used by the application did not have administrative rights. After changing my user's role from read to readWrite, it worked as expected.
To be honest, if the user didn't have the correct permissions, I really would have expected the log to show some sort of error when the restore ran without them.
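For anyone who lands here with the same symptom, the role change described above can be done from the mongo shell roughly like this (using the placeholder user and database names from the question):
use MyDatabase
db.grantRolesToUser("user", [ { role: "readWrite", db: "MyDatabase" } ])
db.revokeRolesFromUser("user", [ { role: "read", db: "MyDatabase" } ])    // optional: drop the now-redundant read role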

User Assertion: 1: Update query failed -- RUNNER_DEAD

We are using MongoDB (v2.6.4) to process some data, and everything works great except that, once in a while, we get a weird RUNNER_DEAD exception...
MongoDB.Driver.WriteConcernException: WriteConcern detected an error ' Update query failed -- RUNNER_DEAD'. (Response was { "lastOp" : { "$timestamp" : NumberLong("6073471510486450182") }, "connectionId" : 49, "err" : " Update query failed -- RUNNER_DEAD", "code" : 1, "n" : 0, "ok" : 1.0 }).
This is the method that causes the exception:
private void UpdateEntityClassName(EntityClassName myEntity) {
    var dateTimeNow = DateTime.UtcNow;
    var update = Update<EntityClassName>.Set(p => p.Data, myEntity.Data)
        // ...some more Sets...
        .Set(p => p.MetaData.LastModifiedDateTime, dateTimeNow);
    var result = _myCollection.Update(Query.EQ("_id", myEntity.Identifier), update, UpdateFlags.Upsert);
}
Exception in MongoDB log:
2014-10-23T13:51:29.989-0500 [conn45] update Database.Table query: { _id: "SameID" } update: { $set: { Data: BinData(0, SomeData...), ...more fields... MetaData.LastModifiedDateTime: new Date(1414090294910) } } nmoved:1 nMatched:1 nModified:1 keyUpdates:0 numYields:0 locks(micros) w:2344 2ms
2014-10-23T13:51:29.989-0500 [conn49] User Assertion: 1: Update query failed -- RUNNER_DEAD
2014-10-23T13:51:29.989-0500 [conn46] update Database.Table query: { _id: "SameID" } update: { $set: { Data: BinData(0, SomeData...), ...more fields... MetaData.LastModifiedDateTime: new Date(1414090294926) } } nMatched:1 nModified:1 fastmod:1 keyUpdates:0 numYields:0 locks(micros) w:249 0ms
2014-10-23T13:51:29.989-0500 [conn49] update Database.Table query: { _id: "SameID" } update: { $set: { Data: BinData(0, SomeData...), ...more fields... MetaData.LastModifiedDateTime: new Date(1414090294864) } } nModified:0 keyUpdates:0 exception: Update query failed -- RUNNER_DEAD code:1 numYields:1 locks(micros) w:285 8ms
I found very little documentation about this exception, so any help is appreciated.
We are running this in a three-machine replica set, if that changes anything.
We've been running this code for a while and we didn't have that issue before (in our original tests) so we went back to MongoDB 2.4.9 (the one we first tested on) and we don't get this exception anymore. Any ideas as to what might have changed that causes this exception?
(Why couldn't you use a regular "update" with capped arrays to limit the size of the array of queries, rather than using some custom logic?)
If you have multiple threads that are doing the same thing, your code
doesn't appear thread-safe - let's say that two threads try to update
the same object with _id XYZ but with different changes. Both fetch
the object, both add a new attribute/value to the array and now both
call save - the first one saves, but the second one's save overwrites
the first one.
But that's not likely to be related to your RUNNER_DEAD error - that's more likely a case where either something is killing the operation or dropping the collection you're writing to (or the index being used).
Source: Asya Kamsky's post.
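To illustrate the capped-array suggestion from the quoted comment: a single atomic update using $push with $each and $slice (supported on the 2.6 server from the question) both bounds the array and avoids the fetch-modify-save race between threads. This is only a sketch; the queries field and the document contents are invented stand-ins, while the collection and _id come from the log above.
// Hypothetical field and value names, for illustration only
var newQueryDoc = { text: "example query", at: new Date() };

db.Table.update(
    { _id: "SameID" },
    {
        $push: { queries: { $each: [ newQueryDoc ], $slice: -100 } },    // cap the array at the last 100 entries
        $set: { "MetaData.LastModifiedDateTime": new Date() }
    },
    { upsert: true }
)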

mongodb status of index creation job

I'm using MongoDB and have a collection with roughly 75 million records.
I have added a compound index on two "fields" by using the following command:
db.my_collection.ensureIndex({"data.items.text":1, "created_at":1},{background:true}).
Two days later, I'm trying to see the status of the index creation. Running db.currentOp() returns {}; however, when I try to create another index I get this error message:
cannot add index with a background operation in progress.
Is there a way to check the status/progress of the index creation job?
One thing to add - I am using mongodb version 2.0.6. Thanks!
In the mongo shell, type the command below to see the current progress:
rs0:PRIMARY> db.currentOp(true).inprog.forEach(function(op){ if(op.msg!==undefined) print(op.msg) })
Index Build (background) Index Build (background): 1431577/55212209 2%
To get a real-time running status log:
> while (true) { db.currentOp(true).inprog.forEach(function(op){ if(op.msg!==undefined) print(op.msg) }); sleep(1000); }
Index Build: scanning collection Index Build: scanning collection: 43687948/47760207 91%
Index Build: scanning collection Index Build: scanning collection: 43861991/47760228 91%
Index Build: scanning collection Index Build: scanning collection: 44993874/47760246 94%
Index Build: scanning collection Index Build: scanning collection: 45968152/47760259 96%
You could use currentOp with a true argument, which returns more verbose output, including idle connections and system operations.
db.currentOp(true)
... and then you could use db.killOp() to kill the desired operation.
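Putting those two together, a sketch that lists the in-progress index builds with their opids and then kills one of them (replace the opid with whatever the first command prints):
// List index builds currently in progress, with opid, message, and progress counters
db.currentOp(true).inprog
    .filter(function(op) { return op.msg && /Index Build/.test(op.msg); })
    .forEach(function(op) { printjson({ opid: op.opid, msg: op.msg, progress: op.progress }); });

db.killOp(12345)    // replace 12345 with the opid printed above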
The following should print out index progress:
db
.currentOp({"command.createIndexes": { $exists : true } })
.inprog
.forEach(function(op){ print(op.msg) })
outputs:
Index Build (background) Index Build (background): 5311727/27231147 19%
Unfortunately, DR9885's answer didn't work for me: it has spaces in the code (a syntax error), and even with the spaces removed, it returns nothing.
This works as of Mongo Shell v3.6.0
db.currentOp().inprog.forEach(function(op){ if(op.msg) print(op.msg) })
I didn't read Bajal's answer until after I posted mine, but it's almost exactly the same, except that the code is slightly shorter and it also works.
I like:
db.currentOp({
'msg' :{ $exists: true },
'command': { $exists: true },
$or: [
{ 'command.createIndexes': { $exists: true } },
{ 'command.reIndex': { $exists: true } }
]
}).inprog.forEach(function(op) {
print(op.msg);
});
Output example:
Index Build Index Build: 84826/335739 25%
The documentation suggests:
db.adminCommand(
{
currentOp: true,
$or: [
{ op: "command", "command.createIndexes": { $exists: true } },
{ op: "none", "msg" : /^Index Build/ }
]
}
)
Active Indexing Operations example.
A simple one to check the progress of a single index build that is in progress:
db.currentOp({"msg":/Index/}).inprog[0].progress;
outputs:
{ "done" : 86007212, "total" : 96868386 }
To find the progress of index jobs, a nice one-liner:
> db.currentOp().inprog.map(a => a.msg)
[
undefined,
undefined,
undefined,
undefined,
undefined,
undefined,
"Index Build: scanning collection Index Build: scanning collection: 16448156/54469342 30%",
undefined,
undefined
]