Is there a limit for copyDatabase in Mongo?

I'm copying 100 million records (about 97 GB of data) to another server using copyDatabase in Mongo. Both servers have more than 500 GB of disk space. However, I notice that although the process is still running, no new data files are being added; they stop at xxxxx.11. Any idea?

copyDatabase will move the collection data over, then build indexes for each collection. Building indexes on a 100 GB data set can take a lot of time (especially with small amounts of RAM), so it's likely that you're in the middle of a large index build.
You can check the progress by watching the logs and by running db.currentOp() in the shell on the destination DB.
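As a rough sketch, a long-running index build will usually show up in db.currentOp() with a progress message; the exact wording of that message varies by MongoDB version, so treat the fields below as typical rather than guaranteed:

db.currentOp(true).inprog.forEach(function (op) {
    // long-running index builds usually carry a progress message in op.msg
    if (op.msg) print(op.opid + ": " + op.msg);
});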

Related

Is there a reason why MongoDB continues write operations after I stop the service that writes?

I have several applications that read data from MongoDB and only one that writes to it heavily (real-time data). These applications and MongoDB had been working well for more than 3 months.
Today I saw that some applications were consuming a lot of memory and performing badly, and I noticed a big delay in the data being inserted into MongoDB.
So I report some facts:
1. I stopped the service that writes to MongoDB.
2. I noticed that even with the service turned off, MongoDB kept writing data for more than 1 minute.
2.1. I saw new data being written to the collection (we have a replica set).
2.2. In the MongoDB Compass Performance tab, in the hottest-collections list, write operations still accounted for about 45% on the collection that the service writes to.
3. I executed some commands about concurrency/locks, and everything looked fine; there were no clients writing (examples of these commands are sketched after this question).
4. I ran a command to check MongoDB's memory usage and didn't see anything wrong or near the limits.
5. MongoDB is set up on an EC2 machine with large resources (I don't know exactly which).
Any ideas for troubleshooting this?
Thanks in advance
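For reference, the concurrency/lock and memory checks mentioned in points 3 and 4 are usually something along these lines in the mongo shell; the exact field layout of the output differs between MongoDB versions, so take this as a sketch rather than a definitive diagnostic:

// operations currently in progress, including any writers still active
db.currentOp()
// server-wide lock and memory counters
db.serverStatus().locks
db.serverStatus().mem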

MongoRestore Create Index Phase Uses 100% resources and locks up database

I'm using MongoDB. I have a table with 7M records and a weighted text search index.
When I do a mongorestore, the create-index phase of the restore uses 100% of my database's resources, and MongoDB is unresponsive to anything until it is done. My db is locked to any incoming connections. In fact, it stops reporting any progress of the index creation to my output at that point, and my MongoDB client starts getting request-timeout errors. I can still tail the server-side MongoDB logs to check the progress of the index creation.
I need the database to be responsive while this process is happening. It works just fine for all my other tables, which are a bit smaller. The next largest table, which works great and also uses a weighted text search index, is around 3M records.
What do I do?! Thanks.
I haven't tried this, but it seems that indexes created with { background: true } are dumped with this property by mongodump, and this property will then be passed along by mongorestore during the index creation phase.
Maybe you could recreate some strategic indexes with the background option, and then dump the database. The restore process should then put less strain on the server and finish faster. Read and write operations should be allowed while MongoDB rebuilds the backgrounded indexes.
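A minimal sketch of recreating such an index with the background option before dumping; the collection name, index name, fields, and weights here are made up for illustration:

// drop the existing text index and recreate it as a background build
db.articles.dropIndex("title_text_body_text")
db.articles.createIndex(
    { title: "text", body: "text" },
    { weights: { title: 10, body: 1 }, background: true }
)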
Note that background index builds take longer to complete and result in a larger index. Also, this will not work on secondary replica set members, since background index creation operations become foreground builds on them.
http://docs.mongodb.org/manual/tutorial/build-indexes-in-the-background/
http://docs.mongodb.org/manual/tutorial/build-indexes-on-replica-sets/
HTH.
I ran into similar issues:
1. The restore took up so many resources that other database operations would simply time out or take on the order of a minute to complete (the restore is, in essence, a denial-of-service attack on the DB).
2. The restore's index phase completely blocked the database.
I found that limiting the bandwidth available for the restore solved issue 1. I used the Linux tc command-line tool to achieve this, tweaking the rate and burst from very low values until other database operations started to be affected, and then scaling it back a bit. The command looked as follows:
sudo tc qdisc change dev enp3s0 root tbf rate 30000kbit burst 40000kbit latency 5ms
To solve issue 2, I found this link, which suggests you either:
update the *.metadata.json files in the dump directory to add background: true where not present, or
use mongorestore's --noIndexRestore option to avoid accidentally building any indexes in the foreground, and then create the indexes with background: true after mongorestore finishes restoring the data.
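A rough sketch of the second option; the database name, dump path, collection, and index fields below are placeholders:

mongorestore --noIndexRestore --db mydb /path/to/dump/mydb

and then, once the data is restored, from the mongo shell:

db.mycollection.createIndex({ title: "text", body: "text" }, { background: true })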
Of course, all of this is only an issue because MongoDB best practices are not being followed, namely to have the operational database always run as part of some form of replica set. If replication is present, you have many more options available, such as (oversimplified) taking one member out of the replica set, restoring to it, and then moving it back into the replica set.

MongoDB normal exit before applying a write lock

I am using Python, Scrapy, and MongoDB for my web-scraping project. I scrape about 40 GB of data daily. Is there a way, or a setting in mongodb.conf, so that MongoDB will exit normally before applying a write lock on the db due to a disk-full error?
Because every time I face this disk-full error in MongoDB, I have to manually re-install MongoDB to remove the write lock from the db. I can't run the repair and compact commands on the database either, because running them also requires free space.
MongoDB doesn't handle disk-full errors very well in certain cases, but you do not have to uninstall and then re-install MongoDB to remove the lock file. Instead, you can just remove the mongod.lock file. As long as you have journaling enabled, your data should be fine. Of course, at that moment, you still can't add more data to the MongoDB databases.
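On Linux, removing the lock file and restarting looks roughly like this; the service name and dbpath vary by distribution and configuration, so adjust both to your installation:

sudo service mongod stop
sudo rm /var/lib/mongodb/mongod.lock
sudo service mongod start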
You probably don't need repair, and compact only helps if you have actually deleted data from MongoDB. compact does not compress data, so it is only useful if you have indeed deleted data.
Constant adding and then later deleting of data can cause fragmentation and lots of disk space to sit unused. You can mostly prevent that by using the usePowerOf2Sizes option that you can set on collections. compact mitigates this by rewriting the database files as well, but as you said, you need free disk space for that. I would also advise you to add some monitoring that warns you when your data size reaches 50% of your full disk space; at that point there is still plenty of time to use compact to reclaim unused space.
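Enabling usePowerOf2Sizes on an existing collection is done with collMod from the mongo shell; the collection name below is a placeholder, and on MongoDB 2.6+ this allocation strategy is the default anyway:

db.runCommand({ collMod: "scrapedData", usePowerOf2Sizes: true })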

mongod: clean up memory used in RAM

I have a huge amount of data in my MongoDB instance. It's filled with tweets (50 GB) and my RAM is 8 GB. When querying, it retrieves all tweets and MongoDB starts filling the RAM; when it reaches 8 GB it starts moving files to disk, and this is the part where it gets really slow. So I changed the query from skipping to using indexes. Now I have indexes and I query only 8 GB into my program, save the id of the last tweet used in a file, and the program stops. Then I restart the program and it reads the id of that tweet from the file. But the mongod server still occupies the RAM with the first 8 GB, which will no longer be used, because I have an index pointing to the last tweet. How can I clean the memory of the MongoDB server without restarting it?
(running on Windows)
I am a bit confused by your logic here.
So I changed the query from skipping to using indexes. Now I have indexes and I query only 8 GB into my program, save the id of the last tweet used in a file, and the program stops.
Using ranged queries will not reduce the amount of data you have to page in (in fact, it might make it worse because of the index); it merely makes the query faster server-side by using an index instead of huge skips (like a 42K+ row skip). If you are doing the same thing as that skip() but via an index (without a covered index), you are still paging in exactly the same data.
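For reference, a ranged "resume" query of the kind described usually looks like this (the collection name, batch size, and lastSeenId variable are assumptions); it avoids a costly skip(), but the documents it touches still have to be paged into RAM:

// fetch the next batch after the last _id seen, in _id order
db.tweets.find({ _id: { $gt: lastSeenId } }).sort({ _id: 1 }).limit(1000)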
It is slow due to memory mapping and your working set. You have more data than RAM, and not only that, you are using more of that data than you have RAM, so you are probably page-faulting all the time.
Restarting the program will not solve this, nor will clearing its data on the OS side (with a restart or a specific command), because of your queries. You probably need to either:
think about your queries so that your working set is more in line with your memory,
or shard your data across many servers so that you don't have to load everything onto your primary server,
or get a bigger primary server (more RAM!!!!!).
Edit
The LRU mechanism of your OS should already be swapping out old data, since MongoDB is using its fully allocated lot; if that 8 GB isn't being swapped out, it is because your working set is actually taking up that full 8 GB (most likely with some swap on the end).

Compact command not freeing up space in MongoDB 2.0

I just installed MongoDB 2.0 and tried to run the compact command instead of the repair command used in earlier versions. My database is empty at the moment, meaning there is only one collection with 0 entries and the two system collections (indices, users). Currently the db takes about 4 GB of space on the hard disk. The db is used as a temp queue, with all items being removed after they have been processed.
I tried to run the following in the mongo shell.
use mydb
db.theOnlyCollection.runCommand("compact")
It returns with
ok: 1
But the same space is still taken up on the hard disk. I tried to compact the system collections as well, but this did not work.
When I run the normal repair command
db.repairDatabase()
the database is compacted and only takes 400 MB.
Does anyone have an idea why the compact command is not working?
Thanks a lot for your help.
Best
Alex
Collection compaction is not supposed to decrease the size of the data files. Its main purpose is to defragment collection and index data, combining unused space gaps into continuous space so that new data can be stored there. Moreover, it may actually increase the size of the data files:
Compaction may increase the total size of your data files by up to 2GB. Even in this case, total collection storage space will decrease.
http://www.mongodb.org/display/DOCS/compact+Command#compactCommand-Effectsofacompaction
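If you want to verify what compact actually did, comparing the standard stats output before and after is a reasonable sketch:

// compact may shrink the collection's storageSize, but the database's fileSize
// on disk stays the same; repairDatabase() rewrites the files and is what
// actually shrinks fileSize
db.stats()
db.theOnlyCollection.stats()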