We are experiencing very slow balancing in our cluster. In our logs, the chunk migration barely makes any progress:
2016-01-25T22:21:15.907-0600 I SHARDING [conn142] moveChunk data transfer progress: { active: true, ns: "music.fav_artist_score", from: "rs1/MONGODB01-SRV:27017,MONGODB05-SRV:27017", min: { _id.u: -9159729253516193447 }, max: { _id.u: -9157438072680830290 }, shardKeyPattern: { _id.u: "hashed" }, state: "clone", counts: { cloned: 128, clonedBytes: 12419, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
2016-01-25T22:21:16.932-0600 I SHARDING [conn142] moveChunk data transfer progress: { active: true, ns: "music.fav_artist_score", from: "rs1/MONGODB01-SRV:27017,MONGODB05-SRV:27017", min: { _id.u: -9159729253516193447 }, max: { _id.u: -9157438072680830290 }, shardKeyPattern: { _id.u: "hashed" }, state: "clone", counts: { cloned: 128, clonedBytes: 12419, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
2016-01-25T22:21:17.957-0600 I SHARDING [conn142] moveChunk data transfer progress: { active: true, ns: "music.fav_artist_score", from: "rs1/MONGODB01-SRV:27017,MONGODB05-SRV:27017", min: { _id.u: -9159729253516193447 }, max: { _id.u: -9157438072680830290 }, shardKeyPattern: { _id.u: "hashed" }, state: "clone", counts: { cloned: 128, clonedBytes: 12419, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
Also, when we shard a new collection, it initially starts with only 8 chunks, all on the same primary replica set, and no chunks are migrated to the other shards.
Our configuration is 4 replica sets (each with a primary, a secondary, and an arbiter) and 3 config servers in a replica set. Both sh.getBalancerState() and sh.isBalancerRunning() return true.
In MongoDB, sharding performance depends on the key chosen for sharding the collection. Since your chunks always end up on a single node, it is highly probable that the shard key you have chosen is monotonically increasing. To avoid this issue, hash the key so that chunks can be balanced properly across all the shards. Use the following command for hashed sharding:
sh.shardCollection( "<your-db>.<your-collection>", { <shard-key>: "hashed" } )
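For example, a minimal sequence (a sketch with hypothetical database, collection, and key names, not your actual namespaces) looks like this in the mongo shell:

// Hypothetical names for illustration only.
// Sharding must first be enabled on the database:
sh.enableSharding("mydb")
// Then shard the collection on the hashed key; note that shardCollection
// takes the full "<db>.<collection>" namespace, not just the database name:
sh.shardCollection("mydb.mycollection", { userId: "hashed" })
// sh.status() afterwards shows how the chunks are spread across the shards:
sh.status()

For an empty collection, hashed sharding pre-creates chunks across the shards; for an existing collection, the hashed index on the key has to exist before sharding it.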
Related
We currently have a TTL index on a timestamp field in MongoDB.
Previously the TTL was expireAfterSeconds: 604800 (1 week of documents).
A few weeks ago we changed this to a TTL of 31536000 seconds (365 days).
But MongoDB still removes documents after 7 days instead of 365.
MongoDB version: 4.2.18
MongoDB is hosted on Atlas / AWS
Indexes in database:
[
{
v: 2,
key: { _id: 1 },
name: '_id_',
ns: '<DB>.<COLLECTION>'
},
{
v: 2,
key: { id: 1, ts: 1 },
name: 'id_1_ts_1',
ns: '<DB>.<COLLECTION>'
},
{
v: 2,
key: { ts: 1 },
name: 'ts_1',
ns: '<DB>.<COLLECTION>',
expireAfterSeconds: 31536000
}
]
We have an environment variable that sets the TTL index on each start of the application; the code to set the TTL index looks like this:
try {
  await collection.createIndex(
    { ts: 1 },
    { expireAfterSeconds: ENV.<Number> }
  );
} catch (e) {
  // If an index on { ts: 1 } already exists with different options,
  // drop it and recreate it with the new TTL value.
  if ((e as any).codeName == "IndexOptionsConflict") {
    await collection.dropIndex("ts_1");
    await collection.createIndex(
      { ts: 1 },
      { expireAfterSeconds: ENV.<Number> }
    );
  }
}
As the index list shows, the TTL index should only remove documents after one year. Why does it still behave like this? Any insights?
I would look into the collMod command, which can change expireAfterSeconds on an existing TTL index without dropping and recreating it.
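A minimal sketch in the mongo shell, reusing the <DB> and <COLLECTION> placeholders from the index listing above:

// Change the TTL of the existing ts_1 index in place, without dropping it:
db.getSiblingDB("<DB>").runCommand({
  collMod: "<COLLECTION>",
  index: {
    keyPattern: { ts: 1 },
    expireAfterSeconds: 31536000
  }
})

The new value takes effect on the next pass of the TTL monitor, which runs roughly every 60 seconds.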
Using a MongoDB default setup with a max chunk size of 64 MB and a max document size of 16 MB (https://docs.mongodb.org/manual/reference/limits/).
I had the issue of a chunk too big that needed to be split, but the split failed.
I don't understand why. Let's assume we have big documents: a chunk is created with one 16 MB document, then another arrives (32 MB in the chunk), and so on, until the chunk reaches 64 MB and the split process kicks in. Why can it fail? Why not split the chunk into two halves of two 16 MB documents each?
Or in the case where the documents are like:
10 MB + 16 MB + 16 MB + 16 MB + 16 MB = 74 MB (needs to split)
It can do:
10 MB + 16 MB + 16 MB = 42 MB
16 MB + 16 MB = 32 MB
Why is this failing?
MongoDB log:
2016-04-06T12:10:53.424-0700 I SHARDING [Balancer] ns: reports.storePoints going to move { id: "reports.storePoints-storeId-9150629136201875191", ns: "reports.storePoints", min: { storeId: -9150629136201875191 }, max: { storeId: -9148794391694793543 }, version: Timestamp 666000|1, versionEpoch: ObjectId('54ac908475d02f0bdb362171'), lastmod: Timestamp 666000|1, lastmodEpoch: ObjectId('54ac908475d02f0bdb362171'), shard: "rs1" } from: rs1 to: rs2 tag []
2016-04-06T12:10:53.477-0700 I SHARDING [Balancer] moving chunk ns: reports.storePoints moving ( ns: reports.storePoints, shard: rs1:rs1/host1:27017,host2:27017, lastmod: 666|1||000000000000000000000000, min: { storeId: -9150629136201875191 }, max: { storeId: -9148794391694793543 }) rs1:rs1/host1:27017,host1:27017 -> rs2:rs2/host2:27017,host2:27017
2016-04-06T12:10:54.052-0700 I SHARDING [Balancer] moveChunk result: { chunkTooBig: true, estimatedChunkSize: 111379744, ok: 0.0, errmsg: "chunk too big to move", $gleStats: { lastOpTime: Timestamp 1455673163000|2, electionId: ObjectId('55b838ecf80a354067abdbbf') } }
2016-04-06T12:10:54.053-0700 I SHARDING [Balancer] balancer move failed: { chunkTooBig: true, estimatedChunkSize: 111379744, ok: 0.0, errmsg: "chunk too big to move", $gleStats: { lastOpTime: Timestamp 1455673163000|2, electionId: ObjectId('55b838ecf80a354067abdbbf') } } from: rs1 to: rs2 chunk: min: { storeId: -9150629136201875191 } max: { storeId: -9148794391694793543 }
2016-04-06T12:10:54.053-0700 I SHARDING [Balancer] performing a split because migrate failed for size reasons
2016-04-06T12:10:54.063-0700 I SHARDING [Balancer] split results: CannotSplit chunk not full enough to trigger auto-split
2016-04-06T12:10:54.063-0700 I SHARDING [Balancer] marking chunk as jumbo: ns: reports.storePoints, shard: rs1:rs1/host1:27017,host2:27017, lastmod: 666|1||000000000000000000000000, min: { storeId: -9150629136201875191 }, max: { storeId: -9148794391694793543 }
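A possible manual intervention, assuming a mongos connection and that the chunk actually spans more than one storeId value, is to split it explicitly instead of waiting for the auto-split; a sketch:

// Sketch only: manually split the jumbo chunk from the log at an explicit
// shard-key value. The value below is a made-up midpoint inside the chunk's
// [min, max) range, not something taken from the real data.
sh.splitAt("reports.storePoints", { storeId: NumberLong("-9149712000000000000") })

If the split succeeds, the balancer can then move the resulting smaller chunks (on some versions the jumbo flag may also need to be cleared manually).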
How do I enable sharding in a test environment? Here is what I have done so far. I have one config server:
Config server 1: Host-a:27019
One mongos instance on the same machine on port 27017,
and two mongod shard instances:
Host-a:27020
host-b:27021
When I enable sharding on the collection, it gives me this error:
2016-01-12T10:31:07.522Z I SHARDING [Balancer] ns: feedproductsdata.merchantproducts going to move { _id: "feedproductsdata.merchantproducts-product_id_MinKey", ns: "feedproductsdata.merchantproducts", min: { product_id: MinKey }, max: { product_id: 0 }, version: Timestamp 1000|0, versionEpoch: ObjectId('5694d57ebe78315b68519c38'), lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('5694d57ebe78315b68519c38'), shard: "shard0001" } from: shard0001 to: shard0000 tag []
2016-01-12T10:31:07.523Z I SHARDING [Balancer] moving chunk ns: feedproductsdata.merchantproducts moving ( ns: feedproductsdata.merchantproducts, shard: shard0001:192.168.1.12:27021, lastmod: 1|0||000000000000000000000000, min: { product_id: MinKey }, max: { product_id: 0 }) shard0001:192.168.1.12:27021 -> shard0000:192.168.1.8:27020
2016-01-12T10:31:08.530Z I SHARDING [Balancer] moveChunk result: { errmsg: "exception: socket exception [CONNECT_ERROR] for cfg1.server.com:27019", code: 11002, ok: 0.0 }
2016-01-12T10:31:08.531Z I SHARDING [Balancer] balancer move failed: { errmsg: "exception: socket exception [CONNECT_ERROR] for cfg1.server.com:27019", code: 11002, ok: 0.0 } from: shard0001 to: shard0000 chunk: min: { product_id: MinKey } max: { product_id: 0 }
2016-01-12T10:31:08.604Z I SHARDING [Balancer] distributed lock 'balancer/Knowledgeops-PC:27017:1452594322:41' unlocked.
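For comparison, the wiring for a test cluster of this shape usually looks roughly like the sketch below (hostnames taken from the description above, the shard key guessed from the chunk ranges in the log, and the --configdb form shown is the older mirrored-config-server syntax). Note that the moveChunk error above is the donor shard failing to connect to cfg1.server.com:27019, so that hostname must resolve and be reachable from every shard host, not only from the mongos machine.

# Sketch only (OS shell): start mongos pointed at the config server.
mongos --configdb cfg1.server.com:27019 --port 27017

// Sketch only (mongo shell connected to the mongos): register the shards,
// enable sharding on the database, and shard the collection.
sh.addShard("Host-a:27020")
sh.addShard("host-b:27021")
sh.enableSharding("feedproductsdata")
sh.shardCollection("feedproductsdata.merchantproducts", { product_id: 1 })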
I'm using a replica set with 3 members; this is the code that fires the mapReduce:
var db = new mongodb.Db('sns', replSet, { "readPreference": "secondaryPreferred", "safe": true });
....
collection.mapReduce(Account_Map, Account_Reduce, { out: { replace: 'log_account' }, query: queryObj }, function (err, collection) {});
Then my primary died and restarted, but came back as a secondary after the election, and a collection sns.tmp.mr.account_0 remains instead of log_account. I'm very new to MongoDB and really want to figure out what the problem is.
2015-02-06T14:04:34.443+0800 [conn87299] build index on: sns.tmp.mr.account_0_inc properties: { v: 1, key: { 0: 1 }, name: "_temp_0", ns: "sns.tmp.mr.account_0_inc" }
2015-02-06T14:04:34.443+0800 [conn87299] added index to empty collection
2015-02-06T14:04:34.457+0800 [conn87299] build index on: sns.tmp.mr.account_0 properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "sns.tmp.mr.account_0" }
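The sns.tmp.mr.* collections in the log are the scratch collections mapReduce fills before renaming the result to the out collection; when a job is interrupted (for example by the primary stepping down), they can be left behind. A possible cleanup sketch, assuming the job is simply re-run afterwards against the new primary:

// Sketch: drop the leftover map-reduce scratch collections (names taken from
// the log above) before re-running the job.
db.getSiblingDB("sns").getCollection("tmp.mr.account_0").drop();
db.getSiblingDB("sns").getCollection("tmp.mr.account_0_inc").drop();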
After carefully following this guide on how to migrate a 3-member replica set to a sharded cluster of 2 x 3-member replica sets, nothing happened when I sharded one of my collections. Looking into the logs, I got this:
2014-06-22T16:49:55.467+0000 [Balancer] distributed lock 'balancer/vt-mongo-6:27018:1403429707:1804289383' unlocked.
2014-06-22T16:50:01.830+0000 [Balancer] distributed lock 'balancer/vt-mongo-6:27018:1403429707:1804289383' acquired, ts : 53a70939025c137381ef2126
2014-06-22T16:50:01.945+0000 [Balancer] ns: database.emailevents going to move { _id: "database.emailevents-_id_MinKey", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('53a6d810089f342ad32992d4'), ns: "database.emailevents", min: { _id: MinKey }, max: { _id: -9199836772564449863 }, shard: "replicaset2" } from: replicaset2 to: replicaset4 tag []
2014-06-22T16:50:01.945+0000 [Balancer] moving chunk ns: database.emailevents moving ( ns: database.emailevents, shard: replicaset2:replicaset2/vt-mongo-4:27017,vt-mongo-5:27017,vt-mongo-6:27017, lastmod: 1|0||000000000000000000000000, min: { _id: MinKey }, max: { _id: -9199836772564449863 }) replicaset2:database_replicaset2/vt-mongo-4:27017,vt-mongo-5:27017,vt-mongo-6:27017 -> replicaset4:replicaset4/vt-mongo-1:27017,vt-mongo-2:27017,vt-mongo-3:27017
2014-06-22T16:50:01.954+0000 [Balancer] moveChunk result: { errmsg: "exception: no primary shard configured for db: config", code: 8041, ok: 0.0 }
2014-06-22T16:50:01.955+0000 [Balancer] balancer move failed: { errmsg: "exception: no primary shard configured for db: config", code: 8041, ok: 0.0 } from: replicaset2 to: replicaset4 chunk: min: { _id: MinKey } max: { _id: -9199836772564449863 }
2014-06-22T16:50:02.166+0000 [Balancer] distributed lock 'balancer/vt-mongo-6:27018:1403429707:1804289383' unlocked.
Generally, this error:
result: { errmsg: "exception: no primary shard configured for db: config", code: 8041, ok: 0.0 }
is the result of the machines not being able to talk to each other, but that isn't the case here. I have manually checked and all the machines can talk to each other.
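One diagnostic sketch, assuming a mongo shell connected to the mongos: the primary-shard assignments the cluster believes in are stored in the config database, so they can be inspected directly.

// List every database the cluster knows about and its primary shard:
db.getSiblingDB("config").databases.find().pretty()

// sh.status() prints the same information plus the chunk distribution:
sh.status()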