mongoimport loading only 1000 rows on sharding - mongodb

I have a MongoDB sharded cluster configured as follows:
6 config servers
3 shard servers (each with a replica)
6 routers
For example:
**s1->s2 (first shard, replica set with primary s1, secondary s2)
s3->s4 (second shard, replica set with primary s3, secondary s4)
s5->s6 (third shard, replica set with primary s5, secondary s6)
A config server and a router run on every server, i.e. s1 to s6**
I am not able to import data into one of the empty sharded collections. The data is in CSV format.
I am running mongoimport in the background, and nohup.out shows this:
**2017-01-10T17:13:18.444+0530 [........................] dbname.collectionname 364.0 KB/46.1 MB (0.8%)**
mongoimport is stuck. How can I fix this?
I first tried running mongoimport on s2, which did not succeed, and then on s1, also without success.
The following errors appear in the router log and the config server log:
**HostnameCanonicalizationWorker
[rsBackgroundSync] we are too stale to use **** as a sync source
REPL [ReplicationExecutor] could not find member to sync from
REPL [ReplicationExecutor] The liveness timeout does not match callback handle, so not resetting it.
REPL [rsBackgroundSync] too stale to catch up -- entering maintenance mode**

Related

MongoDB Resync Failure

We have a sharded cluster with 4 shards in a PSA (Primary-Secondary-Arbiter) architecture. The overall DB size is around 5 TB. The secondary of one shard failed, and we started a resync from the primary.
We are facing an issue when trying to resync data from the primary to the secondary.
MongoDB Version 4.0.18
DataSize for that Shard: 571 GB
Oplog Size: Default
Error Message:
2020-10-06T08:57:57.165+0530 I REPL [replication-339] We are too stale to use host:port as a sync source. Blacklisting this sync source because our last fetched timestamp: Timestamp(1601947649, 446) is before their earliest timestamp: Timestamp(1601951946, 330) for 1min until: 2020-10-06T08:58:57.165+0530
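The secondary's last fetched oplog entry is older than the earliest entry the sync source still holds, so it cannot catch up from the oplog alone. You need to perform an initial sync on the dead node. See https://docs.mongodb.com/manual/core/replica-set-sync/#initial-sync.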
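To see how far behind the failed secondary is, the two timestamps in the error message can be compared directly. The sketch below is only an illustration: it assumes the message format shown in the question, and the helper name `staleness_gap_seconds` is made up for this example.

```python
import re

# Log line copied from the question (trailing "for 1min until ..." part omitted).
LOG_LINE = (
    "We are too stale to use host:port as a sync source. Blacklisting this sync "
    "source because our last fetched timestamp: Timestamp(1601947649, 446) is "
    "before their earliest timestamp: Timestamp(1601951946, 330)"
)


def staleness_gap_seconds(line: str) -> int:
    """Seconds between our last fetched entry and the source's oldest oplog entry."""
    stamps = re.findall(r"Timestamp\((\d+), \d+\)", line)
    if len(stamps) != 2:
        raise ValueError("expected exactly two Timestamp(...) values")
    last_fetched, earliest_available = (int(s) for s in stamps)
    return earliest_available - last_fetched


gap = staleness_gap_seconds(LOG_LINE)
print(f"secondary is {gap} s behind the oldest entry kept by the sync source")
```

A positive gap means the entries the secondary still needs have already been rotated out of the source's oplog, which is why only an initial sync can recover the node.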

MongoDB: mongoimport loses connection when importing big files

I have some trouble importing a JSON file to a local MongoDB instance. The JSON was generated using mongoexport and looks like this. No arrays, no hardcore nesting:
{"_created":{"$date":"2015-10-20T12:46:25.000Z"},"_etag":"7fab35685eea8d8097656092961d3a9cfe46ffbc","_id":{"$oid":"562637a14e0c9836e0821a5e"},"_updated":{"$date":"2015-10-20T12:46:25.000Z"},"body":"base64 encoded string","sender":"mail#mail.com","type":"answer"}
{"_created":{"$date":"2015-10-20T12:46:25.000Z"},"_etag":"7fab35685eea8d8097656092961d3a9cfe46ffbc","_id":{"$oid":"562637a14e0c9836e0821a5e"},"_updated":{"$date":"2015-10-20T12:46:25.000Z"},"body":"base64 encoded string","sender":"mail#mail.com","type":"answer"}
If I import a 9MB file with ~300 rows, there is no problem:
[stekhn latest]$ mongoimport -d mietscraping -c mails mails-small.json
2015-11-02T10:03:11.353+0100 connected to: localhost
2015-11-02T10:03:11.372+0100 imported 240 documents
But if I try to import a 32MB file with ~1300 rows, the import fails:
[stekhn latest]$ mongoimport -d mietscraping -c mails mails.json
2015-11-02T10:05:25.228+0100 connected to: localhost
2015-11-02T10:05:25.735+0100 error inserting documents: lost connection to server
2015-11-02T10:05:25.735+0100 Failed: lost connection to server
2015-11-02T10:05:25.735+0100 imported 0 documents
Here is the log:
2015-11-02T11:53:04.146+0100 I NETWORK [initandlisten] connection accepted from 127.0.0.1:45237 #21 (6 connections now open)
2015-11-02T11:53:04.532+0100 I - [conn21] Assertion: 10334:BSONObj size: 23592351 (0x167FD9F) is invalid. Size must be between 0 and 16793600(16MB) First element: insert: "mails"
2015-11-02T11:53:04.536+0100 I NETWORK [conn21] AssertionException handling request, closing client connection: 10334 BSONObj size: 23592351 (0x167FD9F) is invalid. Size must be between 0 and 16793600(16MB) First element: insert: "mails"
I've heard about the 16MB limit for BSON documents before, but since no row in my JSON file is bigger than 16MB, this shouldn't be a problem, right? When I do the exact same (32MB) import on my local computer, everything works fine.
Any ideas what could cause this weird behaviour?
I guess the problem is performance-related. Either way, you can work around it with the mongoimport option -j.
Start with 4 and, if that does not work, increase it (4, 8, 16), depending on the number of cores your CPU has.
mongoimport --help
-j, --numInsertionWorkers=<number> number of insert operations to run concurrently (defaults to 1)
mongoimport -d mietscraping -c mails -j 4 < mails.json
Or you can split the file and import all the parts.
I hope this helps.
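Splitting the file could look roughly like the sketch below. It assumes the export has one JSON document per line, as in the question; the function names and the `.partNNN` naming scheme are invented for this example.

```python
from pathlib import Path
from typing import List


def _write_chunk(source: Path, index: int, lines: List[str]) -> Path:
    """Write one chunk next to the source file, e.g. mails.part000.json."""
    target = source.with_name(f"{source.stem}.part{index:03d}{source.suffix}")
    target.write_text("".join(lines), encoding="utf-8")
    return target


def split_jsonl(source: Path, docs_per_chunk: int) -> List[Path]:
    """Split a one-document-per-line file into numbered chunk files."""
    if docs_per_chunk < 1:
        raise ValueError("docs_per_chunk must be at least 1")
    chunks: List[Path] = []
    buffer: List[str] = []
    with source.open("r", encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            buffer.append(line if line.endswith("\n") else line + "\n")
            if len(buffer) == docs_per_chunk:
                chunks.append(_write_chunk(source, len(chunks), buffer))
                buffer = []
    if buffer:  # leftover documents that did not fill a whole chunk
        chunks.append(_write_chunk(source, len(chunks), buffer))
    return chunks
```

Each resulting part can then be imported on its own with the same command as before, for example `mongoimport -d mietscraping -c mails mails.part000.json`.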
Looking into it a little more, this is a bug in some versions:
https://jira.mongodb.org/browse/TOOLS-939
Here is another solution: change the batchSize. The default is 10000; reduce the value and test:
mongoimport -d mietscraping -c mails < mails.json --batchSize 1
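Rather than dropping straight to a batch size of 1, a rough upper bound can be estimated from the largest line in the file. This is a back-of-the-envelope sketch only: it assumes the size of a JSON line approximates the BSON size of the document, and it uses a plain 16 MiB ceiling, slightly below the limit printed in the server log. Both helper names are hypothetical.

```python
MAX_BATCH_BYTES = 16 * 1024 * 1024  # assumed ceiling for one insert command


def largest_line_bytes(path: str) -> int:
    """Size in bytes of the longest line (one JSON document per line)."""
    with open(path, "rb") as handle:
        return max((len(line) for line in handle), default=0)


def safe_batch_size(largest_doc_bytes: int, limit: int = MAX_BATCH_BYTES) -> int:
    """How many documents of the given size fit under the limit, at least 1."""
    if largest_doc_bytes <= 0:
        raise ValueError("largest_doc_bytes must be positive")
    return max(1, limit // largest_doc_bytes)
```

For example, if the largest document is about 25 KB, `safe_batch_size(25_000)` suggests keeping batches under roughly 670 documents, which is well below the 10000 mentioned above.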
Quite old, but I struggled with the same issue.
If you want to import big files, especially remotely with Compass or from a program, just add
&wtimeoutMS=0
to your connection string. This removes the timeout on write operations.
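When the connection string is built in a program, the option has to be joined with `?` if it is the first one and with `&` otherwise. A minimal sketch, with the helper name `add_uri_option` and the example URIs made up for illustration:

```python
def add_uri_option(uri: str, key: str, value: str) -> str:
    """Append key=value to a mongodb:// URI, choosing '?' or '&' as needed."""
    _scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError("not a URI: missing '://'")
    if "?" in rest:
        return f"{uri}&{key}={value}"  # options already present
    if "/" not in rest:
        uri += "/"  # separate the host list from the options
    return f"{uri}?{key}={value}"


print(add_uri_option("mongodb://localhost:27017/mietscraping", "wtimeoutMS", "0"))
print(add_uri_option("mongodb://localhost:27017/?replicaSet=rs0", "wtimeoutMS", "0"))
```

The helper does no percent-encoding, so it is only suitable for simple values such as the `0` used here.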

mongorestore failing because of DocTooLargeForCapped error

I'm trying to restore a collection like so:
$ mongorestore --verbose --db MY_DB --collection MY_COLLECTION /path/to/MY_COLLECTION.bson --port 1234 --noOptionsRestore
Here's the error output (timestamps removed):
using write concern: w='majority', j=false, fsync=false, wtimeout=0
checking for collection data in /path/to/MY_COLLECTION.bson
found metadata for collection at /path/to/MY_COLLECTION.metadata.json
reading metadata file from /path/to/MY_COLLECTION.metadata.json
skipping options restoration
restoring MY_DB.MY_COLLECTION from file /path/to/MY_COLLECTION.bson
file /path/to/MY_COLLECTION.bson is 241330 bytes
error: write to oplog failed: DocTooLargeForCapped document doesn't fit in capped collection. size: 116 storageSize:1206976512 # 28575
error: write to oplog failed: DocTooLargeForCapped document doesn't fit in capped collection. size: 116 storageSize:1206976512 # 28575
restoring indexes for collection MY_DB.MY_COLLECTION from metadata
Failed: restore error: MY_DB.MY_COLLECTION: error creating indexes for MY_DB.MY_COLLECTION: createIndex error: exception: write to oplog failed: DocTooLargeForCapped document doesn't fit in capped collection. size: 116 storageSize:1206976512 # 28575
The result of the restore is a database and collection with correct names but no documents.
OS: Ubuntu 14.04 running on Azure VM.
I just solved my own problem. See answer below.
The problem seemed to be that I was using mongod on the replica set PRIMARY member.
Once I commented out the following line in /etc/mongod.conf, it worked without problems:
replSet=REPL_SET_NAME --> #replSet=REPL_SET_NAME
I assume passing the correct replica set name to the mongorestore command (like in this question) could also work, but haven't tried that yet.

SocketException in Mongo

I just set up a replica set in Mongo (prod environment). I'm now getting a lot of exceptions like below (clipped).
I went into mongo and ran a serverStatus command on my primary mongo node and only have about 300 connections going, so it's hardly working.
Below are my connection option settings in my server code:
auto_connect_retry = false
connections_per_host = 10
threads_multiplier = 10
max_wait_time = 120000
connect_timeout = 10000
socket_timeout = 0
Do I have something mis-configured?
Sep 9, 2013 8:31:26 PM com.mongodb.DBPortPool gotError
WARNING: emptying DBPortPool to /10.0.8.10:27017 b/c of error
java.net.SocketException: Connection timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at org.bson.io.Bits.readFully(Bits.java:46)
at org.bson.io.Bits.readFully(Bits.java:33)
at org.bson.io.Bits.readFully(Bits.java:28)
at com.mongodb.Response.<init>(Response.java:40)
at com.mongodb.DBPort.go(DBPort.java:142)
at com.mongodb.DBPort.call(DBPort.java:92)
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:244)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:216)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:288)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:273)
at com.mongodb.DBCollection.findOne(DBCollection.java:347)
at com.mongodb.DBCollection.findOne(DBCollection.java:332)
at com.mongodb.casbah.MongoCollectionBase$class.findOneByID(MongoCollection.scala:232)
at com.mongodb.casbah.MongoCollection.findOneByID(MongoCollection.scala:866)
at com.novus.salat.dao.SalatDAO.findOneById(SalatDAO.scala:353)
at com.novus.salat.dao.ModelCompanion$class.findOneById(ModelCompanion.scala:173)
Generally, a connection timeout in a replica set is caused by one of the following:
1) Not all members are able to communicate with each other
2) A client is connecting to the replica set for an update, and the update cannot reach the primary because the primary is overloaded or because of cause 1
3) The replicas are not in sync and one is lagging too far behind
4) A leader election is in progress but has not completed for some reason
Please check that your replica set is consistent and all nodes are working by issuing rs.status() on the primary node. Also, as suggested earlier, check the primary's logs for more information.
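Causes 3 and 4 can be read off the rs.status() output. The sketch below works on a plain dictionary shaped like that output, using only the `members`, `name`, `stateStr` and `optimeDate` fields. The sample document is invented for illustration (only 10.0.8.10 appears in the question), and `replication_lag` is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from typing import Dict


def replication_lag(status: dict) -> Dict[str, float]:
    """Seconds each SECONDARY trails the PRIMARY, based on optimeDate."""
    members = status["members"]
    primaries = [m for m in members if m["stateStr"] == "PRIMARY"]
    if not primaries:
        raise RuntimeError("no primary found: an election may be in progress")
    primary_optime = primaries[0]["optimeDate"]
    return {
        m["name"]: (primary_optime - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Invented sample shaped like rs.status(); the secondaries are hypothetical hosts.
NOW = datetime(2013, 9, 9, 20, 31, 26)
SAMPLE_STATUS = {
    "members": [
        {"name": "10.0.8.10:27017", "stateStr": "PRIMARY", "optimeDate": NOW},
        {"name": "10.0.8.11:27017", "stateStr": "SECONDARY",
         "optimeDate": NOW - timedelta(seconds=5)},
        {"name": "10.0.8.12:27017", "stateStr": "SECONDARY",
         "optimeDate": NOW - timedelta(seconds=4000)},
    ]
}

for host, lag in replication_lag(SAMPLE_STATUS).items():
    print(f"{host}: {lag:.0f} s behind the primary")
```

A member that is thousands of seconds behind, or the absence of any PRIMARY, would point at causes 3 and 4 respectively.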

Strange thing about mongodb-erlang driver when using replica set

My code is like this:
Replset = {<<"rs1">>, [{localhost, 27017}, {localhost, 27018}, {localhost, 27019}]},
Conn_Pool = resource_pool:new (mongo:rs_connect_factory(Replset), 10),
...
Conn = resource_pool:get(Conn_Pool)
case mongo:do(safe, master, Conn, ?DATABASE,
fun() ->
mongo:insert(mytable, {'_id', 26, d, 11})
end end)
...
27017 is the primary node, so of course I can insert the data successfully.
But when I put only one secondary node in the code instead of all of the replica set instances, Replset = {<<"rs1">>, [{localhost, 27019}]}, I can also insert the data.
I expected it to throw an exception or error, but it wrote the data successfully.
Why did that happen?
When you connect to a replica set, you specify the name of the replSet and some of the node names as seeds. The driver connects to the seed nodes in turn and discovers the real replica set membership/config/status via 'db.isMaster()' command.
Since it discovers which node is the primary that way, it is able to then route all your write requests accordingly. The same technique is what enables it to automatically failover to the newly elected primary when the original primary fails and a new one is elected.
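The discovery step can be pictured with a toy simulation. This is not the mongodb-erlang driver's code: it models the isMaster replies as plain dictionaries, using the `ismaster`, `secondary`, `primary` and `hosts` fields of a real reply, and `discover_primary` is a hypothetical function written for this example.

```python
from typing import Dict, List

HOSTS = ["localhost:27017", "localhost:27018", "localhost:27019"]

# Simulated isMaster replies for the replica set in the question (27017 is primary).
REPLIES: Dict[str, dict] = {
    "localhost:27017": {"ismaster": True, "secondary": False,
                        "primary": "localhost:27017", "hosts": HOSTS},
    "localhost:27018": {"ismaster": False, "secondary": True,
                        "primary": "localhost:27017", "hosts": HOSTS},
    "localhost:27019": {"ismaster": False, "secondary": True,
                        "primary": "localhost:27017", "hosts": HOSTS},
}


def discover_primary(seeds: List[str], replies: Dict[str, dict]) -> str:
    """Ask each seed in turn and return the primary it reports."""
    for seed in seeds:
        reply = replies.get(seed)
        if reply is None:
            continue  # seed unreachable, try the next one
        if reply.get("ismaster"):
            return seed
        if reply.get("primary"):
            return reply["primary"]  # a secondary tells us who the primary is
    raise RuntimeError("no primary found from the given seeds")


# Seeding with only the secondary on 27019 still leads to the primary on 27017.
print(discover_primary(["localhost:27019"], REPLIES))
```

This mirrors the behaviour in the question: the seed list only needs one reachable member, because any member can report which node is currently primary.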