MongoLab (MongoDB) is slow when querying old data - mongodb

We are using MongoLab, and when I query my old data it never returns results and my web app simply times out. My guess is that the old data is on disk and searching through disk is slow. But does MongoDB really keep all the recent data in RAM? Queries against most of the recent data work fine.
What is the real cause of this? Is there any solution to even it out? I have 32,401,864 documents in my database, I have already created enough indexes based on my queries, and I have a TTL set to 100 days.
Is the number of documents I have too high for MongoDB?

If you haven't attempted this already, try turning on the profiler: in the Tools tab, go to the Commands subtab and select the 'profile (log slow)' command. Then examine the system.profile collection, since the profiler deposits all profile data into a collection called system.profile. After running your application for a bit, go into the system.profile collection and take a look. Just a side note: you don't want to leave the profiler on if you're not using it, as there is a bit of overhead while it is running. Select the 'profile (turn off)' command from the menu and run it when you're done.
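For reference, a minimal sketch of doing the same thing from the mongo shell rather than the MongoLab UI (the 100 ms threshold below is just an example value):
db.setProfilingLevel(1, 100)                                // log operations slower than 100 ms
// ... run your application for a while ...
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty() // inspect the newest slow operations
db.setProfilingLevel(0)                                     // turn the profiler off again when done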

Related

MongoDB closes connection on read operation

I run MongoDB 4.0 on WiredTiger under Ubuntu Server 16.04 to store complex documents. There is an issue with one of the collections: its documents contain many images stored as base64 strings. I understand this is bad practice, but I need some time to fix it.
Because of this, some find operations fail, but only those that have a non-empty filter or a skip. For example, db.collection('collection').find({}) runs OK, while db.collection('collection').find({category: 1}) just closes the connection after a timeout. It doesn't matter how many documents should be returned: if there's a filter, the error pops up every time (even if the query should return 0 docs), while an empty query always executes well until the skip is too big.
UPD: some skip values make queries fail. db.collection('collection').find({}).skip(5000).limit(1) runs well, db.collection('collection').find({}).skip(9000).limit(1) takes much longer but still executes, while db.collection('collection').find({}).skip(10000).limit(1) fails every time. It looks like there's some kind of buffer where the DB stores query-related data, and at around 10000 docs it runs out of resources. The collection itself has ~10500 docs. Also, searching by _id runs OK. Unfortunately, I have no opportunity to create new indexes, because that operation fails just like the reads.
What temporary solution can I use before removing the base64 images from the collection?
This happens because such a problematic data scheme causes huge RAM usage. The more entities there are in the collection, the more RAM is needed not only to perform well but even to run find.
Increasing MongoDB's default cache size with the storage.wiredTiger.engineConfig.cacheSizeGB config option allowed all the operations to run fine.
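For illustration, a mongod.conf excerpt setting this option might look like the following; the 4 GB figure is only an example, not a recommendation:
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4
The same limit can also be set on the command line with mongod --wiredTigerCacheSizeGB 4.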

All documents in the collection magically disappeared. Can I find out what happened?

I cloned 2 of my collections from localhost to a remote location on the MongoLab platform yesterday. While debugging my (MEAN stack) application (with the WebStorm IDE), I realized one of those collections has no data in it. Well, there were 7800 documents in it this morning...
I am pretty much the only one who works on the database, and especially with this collection. I didn't run any query to remove all of the documents from this collection. On MongoLab's website there is a button that says 'delete all documents from collection'. I am pretty sure I didn't hit that button. I asked my teammates; no one even opened that web page today.
Assuming that my team is telling the truth and that I didn't remove everything and have a blackout...
Is there a way to find out what happened?
And is there a way to keep a query history (like the Unix command-line history) for a mongo database that runs on a remote server? If yes, how?
So, I am just curious about what happened. Also note that I don't have any DBA responsibilities or experience in that field.
MongoDB replica sets have a special collection called the oplog. This collection stores all write operations for every database in that replica set.
Here are instructions on how to access oplog in Mongolab:
Accessing the MongoDB oplog
Here is a query that will find all delete operations:
use local
db.oplog.rs.find({"op": "d", "ns" : "db_name.collection_name"})
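If the oplog is busy, you can narrow the search to recent deletes on your collection; the Timestamp value below is just a placeholder cutoff:
db.oplog.rs.find({
    "op": "d",
    "ns": "db_name.collection_name",
    "ts": { $gte: Timestamp(1500000000, 0) }   // example cutoff; substitute your own
}).sort({ $natural: -1 }).limit(10)
Keep in mind that the oplog is a capped collection, so very old operations may already have been overwritten.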

Is it possible to run queries on 200GB of data in MongoDB with 16GB of RAM?

I am trying to run a simple query to find the number of records with a particular value, using:
db.ColName.find({id_c:1201}).count()
I have 200GB of data. When I run this query, mongodb takes up all the RAM and my system starts lagging. After an hour of futile waiting, I give up without getting any results.
What can be the issue here and how can I solve it?
I believe the right approach in the NoSQL world isn't to perform a full query like that, but to accumulate stats over time.
For example, you could have a stats collection of small documents, each with a kind or id property that takes a value like "totalUserCount". Whenever you add a user, you also update this count.
This way you'll get instant results: it's just reading a property value from a small stats collection.
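As a rough sketch of that counter pattern in the mongo shell (the stats collection and the totalUserCount/value names are only illustrative):
db.stats.update(
    { _id: "totalUserCount" },
    { $inc: { value: 1 } },
    { upsert: true }
)                                           // bump the counter whenever a user is added
db.stats.findOne({ _id: "totalUserCount" }) // reading the count is a single _id lookup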
BTW, this slowness is most likely caused by querying on a non-indexed property in your collection. Try indexing id_c and you'll probably get faster results.
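For instance, reusing the collection and field names from the question, building the index and re-running the count would look roughly like this:
db.ColName.createIndex({ id_c: 1 })         // ensureIndex() on older shells
db.ColName.find({ id_c: 1201 }).count()     // now answered from the index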
That amount of data can easily be managed by MySQL, MSSQL or Oracle with the given hardware specification. You don't need a NoSQL database for that; NoSQL databases are made for much larger storage needs, which actually require lots of hardware (RAM, hard disks) to be efficient.
You just need to define an index on that id and use a normal SQL database.

Count the number of queries from an application to mongodb

I'm interested in counting how many roundtrips to the database from my web application I'm doing during the life of a query. Not counting connections, since they are pooled and reused, but actual queries (find, insert, update, ...)
Before I start adding profiling probes in my code, is there anything, driver side or server side, that could give this sort of information?
Yes, you should take a look at the system profiler in MongoDB. You can set it to log all database operations to a special collection within MongoDB:
http://docs.mongodb.org/manual/tutorial/manage-the-database-profiler/
Analyze Performance of Database Operations
The database profiler collects fine-grained data about MongoDB write operations, cursors, and database commands on a running mongod instance. You can enable profiling on a per-database or per-instance basis. The profiling level is also configurable when enabling profiling.
The database profiler writes all the data it collects to the system.profile collection, which is a capped collection. See Database Profiler Output for an overview of the data in the system.profile documents created by the profiler.
It does have some impact on performance, so I'd be careful anytime you are turning it on, but it's very useful when you are trying to determine exactly what's going on underneath the covers in your application. Particularly helpful when using a framework or ORM that may take a simple function call and produce large numbers of individual queries/updates/deletes.
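As a rough illustration of using the profiler to count operations (rather than just spotting slow ones), something like the following in the mongo shell would do; level 2 logs every operation, so only leave it on briefly:
db.setProfilingLevel(2)                     // log every operation, not just slow ones
// ... exercise the application ...
db.system.profile.aggregate([
    { $group: { _id: "$op", count: { $sum: 1 } } }   // tally recorded operations by type
])
db.setProfilingLevel(0)                     // turn profiling back off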

Session table in MongoDB

I'm working on the schema design of a scalable session table (for a customized authentication scheme) in MongoDB. I know that MongoDB's scalability comes from the schema design and also has its requirements. My use case is simple:
When a user logs in, a random token is generated and granted to the user, then a record is inserted into the session table using the token as the primary key, which is shard-able. The old token record is deleted if it exists.
The user then accesses the service using the token.
My question is: if the system keeps deleting expired session keys, the session collection (considering the sharded situation, where I need to partition on the token field) may grow very big and contain a lot of 'gaps' left by expired sessions. How do I gracefully handle this problem (or is there a better design)?
Thanks in advance.
Edit: My question is about the storage level. How does MongoDB manage disk space when records are frequently removed and inserted? There should be some kind of (auto-)shrink mechanism there, hopefully one that won't block reads on the collection.
TTL is good and all, however repair is not. --repair is not designed to be run regularly on a database, in fact maybe once every 3 months or so. It does a lot of internal work that, if run often, will seriously damage your server's performance.
Now, about reuse of disk space in such an environment: when you delete a record, it will free that "block". If another document fits into that "block" it will reuse that space; otherwise it will actually create a new extent, meaning a new "block", a.k.a. more space.
So if you want to save disk space here, you will need to make sure that documents do not exceed each other in size. Fortunately you have a relatively static schema here of maybe:
{
    _id: {},
    token: {},
    user_id: {},
    device: {},
    user_agent: ""
}
which should mean that documents, hopefully, will reuse their space.
Now you come to a tricky part if they do not. MongoDB will not automatically give back free space per collection (though it does per database, since that is the same as deleting the files), so you have to run --repair on the database or compact on the collection to actually get your space back.
That being said, I believe your documents will be of similar size to each other, so I am unsure whether you will see a problem here, but you could also try usePowerOf2Sizes (http://www.mongodb.org/display/DOCS/Padding+Factor#PaddingFactor-usePowerOf2Sizes) for a collection that will frequently have inserts and deletes; it should help performance on that front.
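For illustration, enabling that allocation strategy and then reclaiming space from the shell might look like the sketch below; the "sessions" collection name is just a placeholder, and compact blocks operations while it runs:
db.runCommand({ collMod: "sessions", usePowerOf2Sizes: true })   // power-of-2 record allocation
db.runCommand({ compact: "sessions" })                           // reclaim space; blocks while it runs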
I agree with @Steven Farley. While creating an index you can set a TTL; with the PyMongo driver in Python it can be done like this:
http://api.mongodb.org/python/1.3/api/pymongo/collection.html#pymongo.collection.Collection.create_index
I would have to suggest you use TTL. You can read more about it at http://docs.mongodb.org/manual/tutorial/expire-data/; it would be a perfect fit for what you're doing. This is only available since version 2.2.
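A minimal sketch of such a TTL index in the mongo shell, assuming a sessions collection with a createdAt date field (both names are placeholders, not from the question):
db.sessions.ensureIndex(
    { createdAt: 1 },
    { expireAfterSeconds: 3600 }    // documents expire about an hour after createdAt
)
The background TTL monitor runs roughly once a minute, so documents are removed shortly after they expire rather than instantly.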
How mongo stores data: http://www.mongodb.org/display/DOCS/Excessive+Disk+Space
Way to clean up removed records:
Command Line: mongod --repair
See: http://docs.mongodb.org/manual/reference/mongod/#cmdoption-mongod--repair
Mongo Shell: db.repairDatabase()
See: http://docs.mongodb.org/manual/reference/method/db.repairDatabase/
So you could have an automated clean-up script that executes the repair; keep in mind this will block MongoDB for a while.
There are a few ways to achieve sessions:
Capped collections, as shown in this use case.
Expire data with a TTL index by adding expireAfterSeconds to ensureIndex.
Cleaning sessions on the program side using a TTL field and remove.
Faced with the same problem, I used solution 3 for the flexibility it provides.
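As a rough sketch of that program-side cleanup (option 3), assuming a sessions collection with an expiresAt date field (placeholder names), a periodic job could simply run:
db.sessions.remove({ expiresAt: { $lt: new Date() } })   // delete every session past its expiry time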
You can find a good overview of remove and disk optimization in this answer.