MongoDB takes a long time for indexing - mongodb

I have the following setup:
Mac Pro with 2 GB of RAM (yes, not that much)
MongoDB 1.1.3 64-bit
8 million entries in a single collection
index for one field (integer) wanted
Calling .ensureIndex(...) takes more than an hour; in fact, I killed the process after that long. My impression is that this is far too long. Also, even though I terminated the process, the index can still be seen with .getIndexes() afterwards.
Does anybody know what is going wrong here?

Adding an index on an existing data set is expected to take a while, as the entire BTree needs to be constructed. If you think this is taking an unreasonable amount of time, or you've seen a regression in performance, the best bet is to ask about it on the mailing list.

I would just like to point out the command:
db.currentOp()
which prints the current operations running on the server, and also shows the indexing process.
Foreground indexing is done in 3 steps, and background indexing in 2 steps (if I remember correctly), but the background build is a lot slower. The foreground build, on the other hand, locks the collection while indexing it (i.e. not very useful on a running application server).
As said before, google BTree if you are interested in how they work.
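As a minimal shell sketch of a background build plus a progress check (the collection name "mycoll" and field name "myfield" are placeholders, not from the question):
// Background build: slower, but the collection stays usable.
db.mycoll.ensureIndex({myfield: 1}, {background: true})
// From a second shell connection, check on the build's progress:
db.currentOp()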
Does anybody know what is going wrong here?
Are you running via ssh or connecting remotely in some way? Sounds a bit like a broken-pipe issue. Did you create the index with {background: true} or not?

Related

MongoDB - Intermittent slowdowns involving sorting

A couple of buddies and I have been having a serious issue with MongoDB. Before I get into this, let's just get some stats out of the way.
Running on a dedicated VPS with SoftLayer:
-Ubuntu 13.04 inside an ESX VM
-4 virtual cores running at 2.0 GHz each
-48 GB of RAM
-All on an SSD
-MongoDB version 2.6.5 (and the issue has happened on the last 6 releases)
We're doing polygon geo queries using a 2dsphere index, along with a few other int and boolean fields. There are about 7 keys in total that we query at a time; we then sort the results depending on what the user requests to sort by (price, closest point to location, etc.).
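For reference, a hypothetical sketch of the query shape being described (the collection name, field names, and polygon coordinates are all invented for illustration):
db.listings.ensureIndex({geo: "2dsphere"})
db.listings.find({
    geo: {$geoWithin: {$geometry: {
        type: "Polygon",
        coordinates: [[[-74.05, 40.68], [-74.05, 40.88],
                       [-73.85, 40.88], [-73.85, 40.68],
                       [-74.05, 40.68]]]
    }}},
    bedrooms: 2,      // example int field
    active: true      // example boolean field
}).sort({price: 1})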
Now here's where things get stupidly complicated. At random times during the day, specific locations will stop returning queries in a reasonable time: instead of the average 300 to 2000 milliseconds, they return in something like 30 to 150 seconds, and this happens ONLY for specific locations. Searching for New York City will return quickly, but searching for London, England will take ages.
The second we change the key we're sorting by (we've even tried sorting by _id), everything goes back to normal. Then later on, that sort key will break for random locations, and flipping it again repairs things. Another fix we found is completely reindexing the key we're sorting by (deleting and recreating the index), but then our search is down for the time it takes to rebuild the index.
This method works, sure, but changing the sorting method defeats the purpose, and reindexing is just more and more downtime for something that shouldn't be happening.
This issue could be a bunch of things, so let me quickly go through the ones we've diagnosed and ruled out.
-Does not relate to the number of users accessing the database (could be 0, could be 1000).
-Lock percentage averages 2.5% or below.
-Btree stats are normal, between 0 and 200.
-Happens even when nothing is writing to the database (and nothing has in hours).
-CPU usage is minimal.
-RAM usage averages only 60%.
-Data in the listings is valid.
-All keys within the indexes are the proper types. No strings in int indexes, no datetimes in float indexes, etc.
-Has nothing to do with how many results are returned. The issue happens in locations with 10,000 results, or just 5.
-No errors in the logs, no errors in MMS, no errors anywhere... just slow.
I'm out of ideas here; I simply can't figure out what it may be. Has anyone ever come across something like this, or have any ideas about what it could be? Anything is appreciated. If I've forgotten to mention something or need to redo how I've explained it, please just ask; I have no issue rewriting. Thanks again.

Using find() with arguments that match nothing results in the RAM never getting "dropped"

I'm currently setting up MongoDB, and so far I have a DB with 25 million objects and some indexes. When I use find() with 4 arguments and I know that one of the arguments is "wrong", i.e. the query will find nothing, what happens is that MongoDB takes all my RAM (8 GB), and the time to get the "nothing found" result varies a lot.
BUT the biggest problem is that once it's done, it never lets go of the RAM, and I need to restart MongoDB to get the RAM back.
I've tried many other tests, such as adding huge numbers of objects (1 million+) and using find() for something that I know exists, and those operations work fine.
Also worth noting: this works fine if I have a smaller DB of around 1 million objects.
I will keep running tests on this, but if anyone knows right away something that could help, that would be great.
EDIT: Tested some more with the 1M DB. The first failed find took about 4 seconds and used some RAM; the following failures took only 0.6 seconds and didn't seem to affect the RAM. Does MongoDB pull the DB into RAM while trying to find a document, and with a very large DB it simply can't fit it all in the cache?
Btw, on the 25M DB it first took about 60 seconds to not find anything, but the 2nd try took 174 seconds, and after that find it had taken all 8 GB of RAM.
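One thing worth checking (a suggestion, not something from the original post): run the failing query through explain() and see whether it reports a full collection scan. A query whose predicates can't use an index has to touch every document, which drags the whole data set through RAM.
// Collection and field names are placeholders:
db.objects.find({a: 1, b: 2, c: 3, d: "no-such-value"}).explain()
// "cursor": "BasicCursor" (or a COLLSCAN stage in newer versions)
// means no index was used and the whole collection was scanned.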

Mongo database extremely slow until I restart

I just inherited an application from another developer, and I've been asked to fix some latency issues that users have been experiencing. The problem is that any page that makes db calls to mongo takes several minutes to load in the browser.
When I restart mongo, however, everything speeds up again, and the application functions normally. I see several cron jobs that run throughout the day, and I believe one of these may be causing mongo to slow down.
Unfortunately, I have no experience with mongo (only mysql), and I really don't have any idea of what I'm looking for in terms of things that could be making mongo run so slowly.
Anyway, I was hoping someone could suggest some potential causes of the latency so I can approach this problem better. I have looked in the mongo logs, and the only thing I see that could be of concern is a message that says:
warning: can't find plugin [asc]
I know this may point to an indexing problem, but are there any other obvious things I should be investigating?
From what I read at https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/pqPvMq7cSBw it looks like one of your queries declared
db.a2.find().sort({a:"asc"})
rather than
db.a2.find().sort({a:1})
In MongoDB you need to declare your sort order with either 1 or -1; there are no asc or desc constants for sorting. So I would recommend that you check whether any of your queries are written incorrectly. You can see which queries are running through the log files (with the correct profiling settings): http://docs.mongodb.org/manual/tutorial/manage-the-database-profiler/. You may also use mongotop (http://docs.mongodb.org/manual/reference/mongotop/) to see where the most time is spent reading/writing data for each collection.
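A minimal sketch of the profiling suggestion (the level and the 100 ms threshold are example values):
// Capture operations slower than 100 ms into db.system.profile:
db.setProfilingLevel(1, 100)
// Later, inspect the slowest recent operations:
db.system.profile.find().sort({ts: -1}).limit(5).pretty()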

Why does MongoDB *client* use more memory than the server in this case?

I'm evaluating MongoDB. I have a small 20GB subset of documents. Each is essentially a request log for a social game along with some captured state of the game the user was playing at that moment.
I thought I'd try finding game cheaters. So I wrote a function that runs server-side. It calls find() on an indexed collection and sorts according to the existing index. Using a cursor, it goes through all documents in indexed order. The index is {user_id, time}. So I'm going through each user's history, checking if certain values (money/health/etc.) increase faster than is possible in the game. The script returns the first violation found. It does not collect violations.
The ONLY thing this script does on the client is define the function and call mymongodb.eval(myscript) on a mongod instance on another box.
The box that mongod is running on does fine. The one that the script is launched from starts losing memory and swap. Hours later, 8 GB of RAM and 6 GB of swap are being used on the client machine, which did nothing more than launch a script on another box and wait for a return value.
Is the mongo client really that flaky? Have I done something wrong or made an incorrect assumption about mongo/mongod?
If you just want to open up a client connection to a remote database, you should use the mongo command, not mongod. mongod starts up a server on your local machine. I'm not sure what specifying a URL will do.
Try
mongo remotehost:27017
From the documentation:
Use map/reduce instead of db.eval() for long running jobs. db.eval blocks other operations!
eval is a function that blocks the entire server if you don't use a special flag. Again, from the docs:
If you don't use the "nolock" flag, db.eval() blocks the entire mongod process while running [...]
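For completeness, a minimal sketch of the nolock variant (myscript is the function from the question; note that any writes inside the script still take the lock):
db.runCommand({eval: myscript, nolock: true})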
You are kind of abusing MongoDB here. Your current routine is strange, because it returns the first violation found, but it will have to re-check everything when run the next time (unless your user ids are ordered and you store the last evaluated user id).
Map/Reduce generally is the better option for a long-running task, but aggregating your data does not seem trivial. However, a map/reduce based solution would also solve the re-evaluation problem.
I'd probably return something like this from map/reduce:
user id -> suspicious actions, e.g.
------
2525454 -> [{logId: 235345435, t: ISODate("...")}]
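A rough map/reduce sketch along those lines (the collection name "requests", the fields "user_id", "time", and "money", and the MAX_GAIN_PER_MS threshold are all assumptions for illustration): map emits each user's samples, reduce merges them, and finalize sorts by time and flags impossible money gains.
var mapFn = function () {
    emit(this.user_id, {events: [{logId: this._id, t: this.time, money: this.money}]});
};
var reduceFn = function (key, values) {
    // Merge the per-chunk event arrays; ordering is restored in finalize.
    var merged = [];
    values.forEach(function (v) { merged = merged.concat(v.events); });
    return {events: merged};
};
var finalizeFn = function (key, value) {
    value.events.sort(function (a, b) { return a.t - b.t; });
    var MAX_GAIN_PER_MS = 0.01;  // assumed game rule, placeholder value
    var suspicious = [];
    for (var i = 1; i < value.events.length; i++) {
        var dt = value.events[i].t - value.events[i - 1].t;
        var gain = value.events[i].money - value.events[i - 1].money;
        if (dt > 0 && gain / dt > MAX_GAIN_PER_MS) {
            suspicious.push({logId: value.events[i].logId, t: value.events[i].t});
        }
    }
    return suspicious;
};
db.requests.mapReduce(mapFn, reduceFn, {finalize: finalizeFn, out: "cheat_candidates"})
Unlike the eval() script, the output collection keeps all suspicious actions per user instead of only the first violation, which also takes care of the re-evaluation problem.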

MongoDB - anonymizing 600k records

I am trying to anonymize a large data set of about 600k records (removing sensitive information like email, etc.) so that it can be used for some performance tests.
I am using Scala (Casbah) with Mongo. The actual script is pretty simple and straightforward. When I run the script, the entire process starts off pretty fast (parsing 1000 records every 2-3 seconds), but then it slows down tremendously and ends up crawling.
I know this is pretty vague without many details, but any idea why this is happening, and any hints on how I could speed this up?
It turned out to be an issue with the driver and not with Mongo. When I tried the same inserts using the mongo shell, it went through without breaking a sweat.
UPDATE
So, I tried both approaches: updating the existing collection in place, and dumping the results into a new collection. The first approach was faster for me. Of course, one should never assume this to be always true and must benchmark before choosing the first approach over the second. In both cases, Mongo was very, very fast (meaning it was not taking hours to get this done). There was a problem with the Java interface I was using to connect to Mongo, which was more of a stupid mistake on my part.
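For comparison, a rough mongo shell version of the two approaches (the collection and field names are made up; the real script was Scala/Casbah):
// In place: scrub the sensitive fields and save each document back.
db.users.find().forEach(function (doc) {
    doc.email = "user" + doc._id + "@example.com";
    db.users.save(doc);
});
// Copy out: write anonymized copies to a fresh collection instead.
db.users.find().forEach(function (doc) {
    doc.email = "user" + doc._id + "@example.com";
    db.users_anon.insert(doc);
});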