can't accept new chunks because there are still 1 deletes from previous migration - mongodb

I have a MongoDB production cluster running 2.6.11 with 20 replica sets. I am running out of disk space because the majority of the chunks are stored in one replica set. When I check the log, I can see that moveChunk fails because of "deletes from previous migration":
2015-12-28T17:13:32.164+0000 [conn6504] about to log metadata event: { _id: "db1-2015-12-28T17:13:32-56816dbc6b0464b0a5801db8", server: "db1", clientAddr: "xx.xx.xx.11:50077", time: new Date(1451322812164), what: "moveChunk.start", ns: "emailing_nQafExtB.reports", details: { min: { email: "xxxxxxx" }, max: { email: "xxxxxxx" }, from: "shard16", to: "shard22" } }
2015-12-28T17:13:32.675+0000 [conn6504] about to log metadata event: { _id: "db1-2015-12-28T17:13:32-56816dbc6b0464b0a5801db9", server: "db1", clientAddr: "xx.xx.xx.11:50077", time: new Date(1451322812675), what: "moveChunk.from", ns: "emailing_nQafExtB.reports", details: { min: { email: "xxxxxxx" }, max: { email: "xxxxxxx" }, step 1 of 6: 3, step 2 of 6: 314, note: "aborted", errmsg: "moveChunk failed to engage TO-shard in the data transfer: can't accept new chunks because there are still 1 deletes from previous migration" } }
I followed the answer from this question, but it doesn't work for me. I ran the stepDown command on one primary and then on all the primaries in my cluster. I did the same with the cleanupOrphaned command.
Has anybody run into this problem?
Thanks in advance for any insights.

MongoDB replication doesn't start

We are trying to move from mongo 2.4.9 to 3.4. We have a lot of data, so we tried to set up replication, wait until the data is synced, and then swap the primary.
The configuration is done, but when replication is initiated the new server can't complete its initial sync:
2017-07-07T12:07:22.492+0000 I REPL [replication-1] Starting initial sync (attempt 10 of 10)
2017-07-07T12:07:22.501+0000 I REPL [replication-1] sync source candidate: mongo-2.blabla.com:27017
2017-07-07T12:07:22.501+0000 I STORAGE [replication-1] dropAllDatabasesExceptLocal 1
2017-07-07T12:07:22.501+0000 I REPL [replication-1] ******
2017-07-07T12:07:22.501+0000 I REPL [replication-1] creating replication oplog of size: 6548MB...
2017-07-07T12:07:22.504+0000 I STORAGE [replication-1] WiredTigerRecordStoreThread local.oplog.rs already started
2017-07-07T12:07:22.505+0000 I STORAGE [replication-1] The size storer reports that the oplog contains 0 records totaling to 0 bytes
2017-07-07T12:07:22.505+0000 I STORAGE [replication-1] Scanning the oplog to determine where to place markers for truncation
2017-07-07T12:07:22.519+0000 I REPL [replication-1] ******
2017-07-07T12:07:22.521+0000 I REPL [replication-1] Initial sync attempt finishing up.
2017-07-07T12:07:22.521+0000 I REPL [replication-1] Initial Sync Attempt Statistics: { failedInitialSyncAttempts: 9, maxFailedInitialSyncAttempts: 10, initialSyncStart: new Date(1499429233163), initialSyncAttempts: [ { durationMillis: 0, status: "CommandNotFound: error while getting last oplog entry for begin timestamp: no such cmd: find", syncSource: "mongo-2.blabla.com:27017" }, { durationMillis: 0, status: "CommandNotFound: error while getting last oplog entry for begin timestamp: no such cmd: find", syncSource: "mongo-2.blabla.com:27017" }, { durationMillis: 0, status: "CommandNotFound: error while getting last oplog entry for begin timestamp: no such cmd: find", syncSource: "mongo-2.blabla.com:27017" }, { durationMillis: 0, status: "CommandNotFound: error while getting last oplog entry for begin timestamp: no such cmd: find", syncSource: "mongo-2.blabla.com:27017" }, { durationMillis: 0, status: "CommandNotFound: error while getting last oplog entry for begin timestamp: no such cmd: find", syncSource: "mongo-2.blabla.com:27017" }, { durationMillis: 0, status: "CommandNotFound: error while getting last oplog entry for begin timestamp: no such cmd: find", syncSource: "mongo-2.blabla.com:27017" }, { durationMillis: 0, status: "CommandNotFound: error while getting last oplog entry for begin timestamp: no such cmd: find", syncSource: "mongo-2.blabla.com:27017" }, { durationMillis: 0, status: "CommandNotFound: error while getting last oplog entry for begin timestamp: no such cmd: find", syncSource: "mongo-2.blabla.com:27017" }, { durationMillis: 0, status: "CommandNotFound: error while getting last oplog entry for begin timestamp: no such cmd: find", syncSource: "mongo-2.blabla.com:27017" } ] }
2017-07-07T12:07:22.521+0000 E REPL [replication-1] Initial sync attempt failed -- attempts left: 0 cause: CommandNotFound: error while getting last oplog entry for begin timestamp: no such cmd: find
2017-07-07T12:07:22.521+0000 F REPL [replication-1] The maximum number of retries have been exhausted for initial sync.
2017-07-07T12:07:22.522+0000 E REPL [replication-0] Initial sync failed, shutting down now. Restart the server to attempt a new initial sync.
2017-07-07T12:07:22.522+0000 I - [replication-0] Fatal assertion 40088 CommandNotFound: error while getting last oplog entry for begin timestamp: no such cmd: find at src/mongo/db/repl/replication_coordinator_impl.cpp 632
Please assist: we have more than 100 GB of data, so a dump and restore would mean a lot of downtime.
Configurations:
3.4.5 new machine:
storage:
  dbPath: /mnt/dbpath
  journal:
    enabled: true
  engine: wiredTiger
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
net:
  port: 27017
replication:
  replSetName: prodTest
2.4.9 old machine with data:
dbpath=/var/lib/mongodb
logpath=/var/log/mongodb/mongodb.log
logappend=true
port = 27017
The task was solved this way:
- create a replica set with the master on v2.4 and 3 slaves on v2.6
- stop the app, step down the master
- stop the new master and upgrade it to v3.0, start it again, then upgrade the slaves sequentially to 3.2 (slave db files removed, new version started on the wiredTiger engine)
- step down the master, upgrade all slaves to 3.4
This process turned out to be very fast, because recovering a replica slave with a 40 GB db takes around 30 minutes.
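The accepted approach walks through the release series one at a time instead of jumping straight from 2.4 to 3.4. That fits the "no such cmd: find" error in the question: the find command only exists from MongoDB 3.2 on, so a 2.4 sync source cannot answer it. A minimal sketch of planning such a path follows; the RELEASES list and the upgradePath helper are illustrative assumptions, not part of any MongoDB tool.

```javascript
// Release series in order, covering the versions mentioned in this thread.
// Hypothetical helper data for this sketch, not taken from a MongoDB tool.
const RELEASES = ['2.4', '2.6', '3.0', '3.2', '3.4'];

// Return every release series to pass through, in order, when upgrading
// one step at a time from `from` to `to` (each like "2.4" or "2.4.9").
function upgradePath(from, to) {
  const series = (v) => v.split('.').slice(0, 2).join('.');
  const start = RELEASES.indexOf(series(from));
  const end = RELEASES.indexOf(series(to));
  if (start === -1 || end === -1) {
    throw new Error('unknown release: ' + (start === -1 ? from : to));
  }
  if (start > end) {
    throw new Error('downgrades are not handled by this sketch');
  }
  return RELEASES.slice(start, end + 1);
}

// prints "2.4 -> 2.6 -> 3.0 -> 3.2 -> 3.4"
console.log(upgradePath('2.4.9', '3.4.5').join(' -> '));
```

Each hop in the printed path corresponds to one round of stepping down the master and upgrading the other members, as in the steps above.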

How can I debug slow MongoDB chunk migration?

I'm trying to move a chunk inside the cluster:
mongos>db.adminCommand({ moveChunk: "db.col", find: {_id: ObjectId("58171b29b9b4ebfb3e8b4e42")}, to: "shard_v2"});
{ "millis" : 428681, "ok" : 1 }
In the log I see the following record:
2016-11-08T20:27:05.972+0300 I SHARDING [conn27] moveChunk migrate commit accepted by TO-shard: { active: false, ns: "db.col", from: "host:27017", min: { _id: ObjectId('58171b29b9b4ebfb3e8b4e42') }, max: { _id: ObjectId('58171f29b9b4eb31408b4b4c') }, shardKeyPattern: { _id: 1.0 }, state: "done", cc, ok: 1.0 }
So 23 MB of data was migrated in 430 seconds. That is really slow.
I uploaded a sample file to "host" and it transferred extremely fast (7-8 MB per second), so I do not think it is a disk or network issue. The cluster also has no load (no active queries). What else can I check to improve chunk migration performance?
The performance is most certainly not limited by your setup. It may be MongoDB's migration policy, which tries not to affect normal database tasks.
There is a great answer on this issue on DBA stack exchange: https://dba.stackexchange.com/questions/81545/mongodb-shard-chunk-migration-500gb-takes-13-days-is-this-slow-or-normal
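To put a number on "really slow", the figures from the question (about 23 MB, and the millis value returned by moveChunk) can be turned into a transfer rate and compared with the 7-8 MB per second seen for the plain file upload. This is only a small arithmetic sketch; the function name is made up for illustration.

```javascript
// Average transfer rate in MB per second for `megabytes` moved in `millis` ms.
function migrationRateMBps(megabytes, millis) {
  if (millis <= 0) {
    throw new Error('millis must be positive');
  }
  return megabytes / (millis / 1000);
}

// Figures from the question: ~23 MB, moveChunk reported millis: 428681.
const rate = migrationRateMBps(23, 428681);
console.log(rate.toFixed(3) + ' MB/s');

// The file upload in the question ran at roughly 7 MB/s.
console.log('upload was about ' + Math.round(7 / rate) + 'x faster');
```

That works out to roughly 0.05 MB per second, about two orders of magnitude below the upload. A gap that large is consistent with the migration being throttled rather than limited by disk or network.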

mongorestore not working. collection is empty

I am trying to dump a mongodb collection to a file, and then use that to restore to another mongodb instance.
dumping -
mongodump --host 127.0.0.1 --port 27017 --username vespauser --password <passwd> --collection vespastats --db vespa --out /archive/vespa-archive/vespa-db-backup_001
connected to: 127.0.0.1:27017
2015-04-21T16:24:07.070-0400 DATABASE: vespa to /archive/vespa-archive/vespa-db-backup_testing01/vespa
2015-04-21T16:24:07.141-0400 vespa.system.indexes to /archive/vespa-archive/vespa-db-backup_testing01/vespa/system.indexes.bson
2015-04-21T16:24:07.148-0400 4 documents
2015-04-21T16:24:07.149-0400 vespa.vespastats to /archive/vespa-archive/vespa-db-backup_testing01/vespa/vespastats.bson
2015-04-21T16:24:07.316-0400 59724 documents
2015-04-21T16:24:08.118-0400 Metadata for vespa.vespastats to /archive/vespa-archive/vespa-db-backup_testing01/vespa/vespastats.metadata.json
restoring -
mongorestore -v --drop --host 127.0.0.1 --port 27017 --username admin --password <passwd> /archive/vespa-archive/vespa-db-backup_001
2015-04-21T16:31:11.962-0400 creating new connection to:127.0.0.1:27017
2015-04-21T16:31:11.963-0400 [ConnectBG] BackgroundJob starting: ConnectBG
2015-04-21T16:31:11.963-0400 connected to server 127.0.0.1:27017 (127.0.0.1)
2015-04-21T16:31:11.963-0400 connected connection!
connected to: 127.0.0.1:27017
2015-04-21T16:31:11.966-0400 /home/amurty/vespa-db/vespa-db-backup_testing01/vespa/vespastats.bson
2015-04-21T16:31:11.966-0400 going into namespace [vespa.vespastats]
2015-04-21T16:31:11.966-0400 dropping
file size: 88808161
59724 objects found
2015-04-21T16:31:13.730-0400 Creating index: { key: { _id: 1 }, name: "_id_", ns: "vespa.vespastats" }
2015-04-21T16:31:13.848-0400 Creating index: { key: { url: 1 }, name: "url_1", ns: "vespa.vespastats", background: true }
2015-04-21T16:31:13.858-0400 Creating index: { key: { r_tstpm: 1 }, name: "r_tstpm_1", ns: "vespa.vespastats", background: true }
2015-04-21T16:31:13.859-0400 Creating index: { key: { url: 1, r_tstpm: 1 }, name: "url_1_r_tstpm_1", ns: "vespa.vespastats", background: true }
from /var/log/mongodb/mongod.log -
2015-04-21T16:31:11.963-0400 [initandlisten] connection accepted from 127.0.0.1:58444 #23 (1 connection now open)
2015-04-21T16:31:11.964-0400 [conn23] authenticate db: admin { authenticate: 1, nonce: "xxx", user: "admin", key: "xxx" }
2015-04-21T16:31:11.968-0400 [conn23] CMD: drop vespa.vespastats
2015-04-21T16:31:13.757-0400 [conn23] allocating new ns file /var/lib/mongo/vespa.ns, filling with zeroes...
2015-04-21T16:31:13.838-0400 [FileAllocator] allocating new datafile /var/lib/mongo/vespa.0, filling with zeroes...
2015-04-21T16:31:13.846-0400 [FileAllocator] done allocating datafile /var/lib/mongo/vespa.0, size: 64MB, took 0.007 secs
2015-04-21T16:31:13.847-0400 [conn23] build index on: vespa.vespastats properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "vespa.vespastats" }
2015-04-21T16:31:13.848-0400 [conn23] added index to empty collection
2015-04-21T16:31:13.857-0400 [conn23] build index on: vespa.vespastats properties: { v: 1, key: { url: 1 }, name: "url_1", ns: "vespa.vespastats", background: true }
2015-04-21T16:31:13.857-0400 [conn23] added index to empty collection
2015-04-21T16:31:13.858-0400 [conn23] build index on: vespa.vespastats properties: { v: 1, key: { r_tstpm: 1 }, name: "r_tstpm_1", ns: "vespa.vespastats", background: true }
2015-04-21T16:31:13.859-0400 [conn23] added index to empty collection
2015-04-21T16:31:13.860-0400 [conn23] build index on: vespa.vespastats properties: { v: 1, key: { url: 1, r_tstpm: 1 }, name: "url_1_r_tstpm_1", ns: "vespa.vespastats", background: true }
2015-04-21T16:31:13.860-0400 [conn23] added index to empty collection
2015-04-21T16:31:13.862-0400 [conn23] end connection 127.0.0.1:58444 (0 connections now open)
Now when I log in to my new mongodb instance and check the collection size, I get a big 0 -
# mongo
MongoDB shell version: 2.6.9
connecting to: test
> use vespa
switched to db vespa
> db.auth('vespauser', '<paswd>')
1
> db.vespastats.find()
> db.vespastats.count()
0
>
The collection may or may not exist in the database in use, but the query does not return an error, just 0:
db.vespastats.find().count()
The likely cause is that the collection was restored into the database test. (The docs say the target database should be picked up automatically, but I was able to reproduce this behaviour.)
Therefore
use test
db.vespastats.find().count()
would have returned the actual document count for the collection vespastats.
The issue is caused by not specifying the db name when running mongorestore. According to the mongorestore docs, mongorestore --nsInclude=vespa.vespastats is the up-to-date form (even if -d still works).
To find out where the collection lands, I would run the restore twice and check show dbs in the mongo shell three times (before and after each run): the db size changes, though not immediately, as it may show 8 kB right after the restore.

Mongorestore not restoring data

I have an existing mongodump of a single collection that I am trying to restore. After running mongorestore, no errors show up and the data is not in the collection. Are there any known reasons why this could happen? I would expect that if the data weren't inserted for some reason, an error would be provided in the log.
To create and attempt to restore the dump, I followed the answer provided for this question:
How to use mongodump for 1 collection
I've created a new database on a different server and it has an empty collection. I've checked the mongo log file and there are no errors; it shows the connection opening and authenticating, then disconnecting on the next line.
mongorestore -vvvvv -u user -p 'password' --db=MyDatabase --collection=MyCollection dump1/MyCollection.bson
2015-03-04T18:20:31.331+0000 creating new connection to:127.0.0.1:27017
2015-03-04T18:20:31.332+0000 [ConnectBG] BackgroundJob starting: ConnectBG
2015-03-04T18:20:31.332+0000 connected to server 127.0.0.1:27017 (127.0.0.1)
2015-03-04T18:20:31.332+0000 connected connection!
connected to: 127.0.0.1
2015-03-04T18:20:31.333+0000 drillDown: dump1/MyCollection.bson
2015-03-04T18:20:31.333+0000 dump1/MyCollection.bson
2015-03-04T18:20:31.333+0000 going into namespace [MyDatabase.MyCollection]
Restoring to MyDatabase.MyCollection without dropping. Restored data will be inserted without raising errors; check your server log
file size: 94876
130 objects found
2015-03-04T18:20:31.336+0000 Creating index: { key: { _id: 1 }, name: "_id_", ns: "MyDatabase.MyCollection" }
2015-03-04T18:20:31.340+0000 Creating index: { key: { geometry: "2dsphere" }, name: "geometry_2dsphere", ns: "MyDatabase.MyCollection", 2dsphereIndexVersion: 2 }
Log file:
2015-03-04T18:20:31.333+0000 [conn874] authenticate db: MyDatabase { authenticate: 1, nonce: "xxx", user: "user", key: "xxx" }
2015-03-04T18:20:31.342+0000 [conn874] end connection 127.0.0.1:59420 (25 connections now open)
The query I am using on the origin and destination is:
db.MyCollection.find()
On the origin server, the collection has 130 elements, which is what is also shown in the mongorestore output "130 objects found".
Edit:
I added the --drop option to the mongorestore command. The log file output clearly shows that it is creating the index on an empty collection.
2015-03-20T15:03:57.565+0000 [conn61965] authenticate db: MyDatabase { authenticate: 1, nonce: "xxx", user: "user", key: "xxx" }
2015-03-20T15:03:57.566+0000 [conn61965] CMD: drop MyDatabase.MyCollection
2015-03-20T15:03:57.631+0000 [conn61965] build index on: MyDatabase.MyCollection properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "MyDatabase.MyCollection" }
2015-03-20T15:03:57.631+0000 [conn61965] added index to empty collection
2015-03-20T15:03:57.652+0000 [conn61965] build index on: MyDatabase.MyCollection properties: { v: 1, key: { geometry: "2dsphere" }, name: "geometry_2dsphere", ns: "MyDatabase.MyCollection", 2dsphereIndexVersion: 2 }
2015-03-20T15:03:57.652+0000 [conn61965] added index to empty collection
2015-03-20T15:03:57.654+0000 [conn61965] end connection 127.0.0.1:59456 (21 connections now open)
So the issue ended up being that the user I was trying to do the restore with only had the read and dbAdmin roles. I had made a separate user so that the regular user used by the application did not have administrative rights. After changing my user's role from read to readWrite, it worked as expected.
To be honest, if the user didn't have the correct permissions, I really would have expected the log to show an error of some sort when it tries to run the restore without the correct permission.
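For anyone hitting the same silent restore, it may help to compare the restoring user's roles with the ones that allow inserts. The sketch below works on a user document shaped like the output of db.getUser() in the mongo shell. The canInsert helper and the role lists it checks are assumptions made for illustration, and custom roles are not considered.

```javascript
// Built-in roles assumed (for this sketch) to allow inserting documents.
// Roles granted on the target database itself:
const DB_WRITE_ROLES = ['readWrite', 'dbOwner'];
// Roles granted on the admin database that apply across databases:
const ADMIN_WRITE_ROLES = ['readWriteAnyDatabase', 'restore', 'root'];

// userDoc is expected to look like { user: '...', roles: [ { role, db }, ... ] }.
function canInsert(userDoc, dbName) {
  return (userDoc.roles || []).some((r) =>
    (r.db === dbName && DB_WRITE_ROLES.includes(r.role)) ||
    (r.db === 'admin' && ADMIN_WRITE_ROLES.includes(r.role)));
}

// The user from this question before the fix: read + dbAdmin only.
const userBefore = {
  user: 'user',
  roles: [{ role: 'read', db: 'MyDatabase' }, { role: 'dbAdmin', db: 'MyDatabase' }]
};
// The same user after changing read to readWrite.
const userAfter = {
  user: 'user',
  roles: [{ role: 'readWrite', db: 'MyDatabase' }, { role: 'dbAdmin', db: 'MyDatabase' }]
};

console.log(canInsert(userBefore, 'MyDatabase')); // false
console.log(canInsert(userAfter, 'MyDatabase'));  // true
```

In the mongo shell, the missing role could be added with db.grantRolesToUser('user', [{ role: 'readWrite', db: 'MyDatabase' }]), run against the database the user was created in.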

grunt-sftp-deploy unable to connect to server

I am a noob to grunt and would like to start using it.
Here is my gruntfile:
module.exports = function(grunt) {
  // Project configuration.
  grunt.initConfig({
    pkg: grunt.file.readJSON('package.json'),
    devDir: 'dev/dir',
    prodDir: 'prod/dir',
    'sftp-deploy': {
      prod: {
        auth: {
          host: 'server.com',
          port: 22,
          authKey: {
            "username": "username1",
            "password": "password2"
          }
        },
        src: '<%=devDir%>',
        dest: '/test/env/',
        concurrency: 4,
        progress: true
      }
    }
  });
  // load modules
  grunt.loadNpmTasks('grunt-sftp-deploy');
  // Default task(s).
  grunt.registerTask('default', ['sftp-deploy']);
};
I am getting this error when I run 'grunt' in PowerShell:
Running "sftp-deploy:prod" (sftp-deploy) task
Logging in with username username1
Concurrency : 4
Fatal error: Connection :: error
What am I doing wrong?
thanks!
Ok, a few things to try... (sorry - a month late!)
run:
grunt sftp-deploy --verbose
This will give you a little more info regarding your error.
I solved my error after realising I couldn't create folders on my server, only upload files. So it might be worth testing that you can accomplish manually what you're asking grunt to do.
Lastly, try moving your username / password into a .ftppass file, as described here: https://www.npmjs.com/package/grunt-sftp-deploy