MongoDB High Avg. Flush Time - Write Heavy - mongodb

I'm using MongoDB with approximately 4 million documents and around 5-6GB database size. The machine has 10GB of RAM, and free only reports around 3.7GB in use. The database is used for a video game related ladder (rankings) website, separated by region.
It's a fairly write heavy operation, but still gets a significant number of reads as well. We use an updater which queries an outside source every hour or two. This updater then processes the records and updates documents on the database. The updater only processes one region at a time (see previous paragraph), so approximately 33% of the database is updated.
When the updater runs, and for the duration that it runs, the average flush time spikes up to around 35-40 seconds, and we experience general slowdowns with other queries. The updater is RAN on a SEPARATE MACHINE and only queries MongoDB at the end, when all the data has been retrieved and processed from the third party.
Some people have suggested slowing down the number of updates, or only updating players who have changed, but the problem comes down to rankings. Since we support ties between players, we need to pre-calculate the ranks - so if only a few users have actually changed ranks, we still need to update the rest of the users ranks accordingly. At least, that was the case with MySQL - I'm not sure if there is a good solution with MongoDB for ranking ~800K->1.2 million documents while supporting ties.
My question is: how can we improve the flush and slowdown we're experiencing? Why is it spiking so high? Would disabling journaling (to take some load off the i/o) help, as data loss isn't something I'm worried about as the database is updated frequently regardless?
Server status: http://pastebin.com/w1ETfPWs

You are using the wrong tool for the job. MongoDB isn't designed for ranking large ladders in real time, at least not quickly.
Use something like Redis, Redis have something called a "Sorted List" designed just for this job, with it you can have 100 millions entries and still fetch the 5000000th to 5001000th at sub millisecond speed.
From the official site (Redis - Sorted sets):
Sorted sets
With sorted sets you can add, remove, or update elements in a very fast way (in a time proportional to the logarithm of the number of elements). Since elements are taken in order and not ordered afterwards, you can also get ranges by score or by rank (position) in a very fast way. Accessing the middle of a sorted set is also very fast, so you can use Sorted Sets as a smart list of non repeating elements where you can quickly access everything you need: elements in order, fast existence test, fast access to elements in the middle!
In short with sorted sets you can do a lot of tasks with great performance that are really hard to model in other kind of databases.
With Sorted Sets you can:
Take a leader board in a massive online game, where every time a new score is submitted you update it using ZADD. You can easily take the top users using ZRANGE, you can also, given an user name, return its rank in the listing using ZRANK. Using ZRANK and ZRANGE together you can show users with a score similar to a given user. All very quickly.
Sorted Sets are often used in order to index data that is stored inside Redis. For instance if you have many hashes representing users, you can use a sorted set with elements having the age of the user as the score and the ID of the user as the value. So using ZRANGEBYSCORE it will be trivial and fast to retrieve all the users with a given interval of ages.
Sorted Sets are probably the most advanced Redis data types, so take some time to check the full list of Sorted Set commands to discover what you can do with Redis!

Without seeing any disk statistics, I am of the opinion that you are saturating your disks.
This can be checked with iostat -xmt 2, and checking the %util column.
Please don't disable journalling - you will only cause more issues later down the line when your machine crashes.
Separating collections will have no effect. Separating databases may, but if you're IO bound, this will do nothing to help you.
Options
If I am correct, and your disks are saturated, adding more disks in a RAID 10 configuration will vastly help performance and durability - more so if you separate the journal off to an SSD.
Assuming that this machine is a single server, you can setup a replicaset and send your read queries there. This should help you a fair bit, but not as much as the disks.

Related

storing huge amounts of data in mongo

I am working on a front end system for a radius server.
The radius server will pass updates to the system every 180 seconds. Which means if I have about 15,000 clients that would be around 7,200,000 entries per day...Which is a lot.
I am trying to understand what the best possible way to store and retrieve this data will be. Obviously as time goes on, this will become substantial. Will MongoDB handle this? Typical document is not much, something this
{
id: 1
radiusId: uniqueId
start: 2017-01-01 14:23:23
upload: 102323
download: 1231556
}
However, there will be MANY of these records. I guess this is something similar to the way that SNMP NMS servers handle data which as far as I know they use RRD to do this.
Currently in my testing I just push every document into a single collection. So I am asking,
A) Is Mongo the right tool for the job and
B) Is there a better/more preferred/more optimal way to store the data
EDIT:
OK, so just incase someone comes across this and needs some help.
I ran it for a while in mongo, I was really not satisfied with performance. We can chalk this up to the hardware I was running on, perhaps my level of knowledge or the framework I was using. However I found a solution that works very well for me. InfluxDB pretty much handles all of this right out of the box, its a time series database which is effectively the data I am trying to store (https://github.com/influxdata/influxdb). Performance for me has been like night & day. Again, could all be my fault, just updating this.
EDIT 2:
So after a while I think I figured out why I never got the performance I was after with Mongo. I am using sailsjs as framework and it was searching by id using regex, which obviously has a huge performance hit. I will eventually try migrate back to Mongo instead of influx and see if its better.
15,000 clients updating every 180 seconds = ~83 insertions / sec. That's not a huge load even for a moderately sized DB server, especially given the very small size of the records you're inserting.
I think MongoDB will do fine with that load (also, to be honest, almost any modern SQL DB would probably be able to keep up as well). IMHO, the key points to consider are these:
Hardware: make sure you have enough RAM. This will primarily depend on how many indexes you define, and how many queries you're doing. If this is primarily a log that will rarely be read, then you won't need much RAM for your working set (although you'll need enough for your indexes). But if you're also running queries then you'll need much more resources
If you are running extensive queries, consider setting up a replica set. That way, your master server can be reserved for writing data, ensuring reliability, while your slaves can be configured to serve your queries without affecting the write reliability.
Regarding the data structure, I think that's fine, but it'll really depend on what type of queries you wish to run against it. For example, if most queries use the radiusId to reference another table and pull in a bunch of data for each record, then you might want to consider denormalizing some of that data. But again, that really depends on the queries you run.
If you're really concerned about managing the write load reliably, consider using the Mongo front-end only to manage the writes, and then dumping the data to a data warehouse backend to run queries on. You can partially do this by running a replica set like I mentioned above, but the disadvantage of a replica set is that you can't restructure the data. The data in each member of the replica set is exactly the same (hence the name, replica set :-) Oftentimes, the best structure for writing data (normalized, small records) isn't the best structure for reading data (denormalized, large records with all the info and joins you need already done). If you're running a bunch of complex queries referencing a bunch of other tables, using a true data warehouse for the querying part might be better.
As your write load increases, you may consider sharding. I'm assuming the RadiusId points to each specific server among a pool of Radius servers. You could potentially shard on that key, which would split the writes based on which server is sending the data. Thus, as you increase your radius servers, you can increase your mongo servers proportionally to maintain write reliability. However, I don't think you need to do this right away as I bet one reasonably provisioned server should be able to manage the load you've specified.
Anyway, those are my preliminary suggestions.

When's the time to create dedicated collections in MongoDB to avoid difficult queries?

I am asking a question that I assume does not have a simple black and white question but the principal of which I'm asking is clear.
Sample situation:
Lets say I have a collection of 1 million books, and I consistently want to always pull the top 100 rated.
Let's assume that I need to perform an aggregate function every time I perform this query which makes it a little expensive.
It is reasonable, that instead of running the query for every request (100-1000 a second), I would create a dedicated collection that only stores the top 100 books that gets updated every minute or so, thus instead of running a difficult query a 100 times every second, I only run it once a minute, and instead pull from a small collection of books that only holds the 100 books and that requires no query (just get everything).
That is the principal I am questioning.
Should I create a dedicated collection for EVERY query that is often
used?
Should I do it only for complicated ones?
How do I gauge which is complicated enough and which is simple enough
to leave as is?
Is there any guidelines for best practice in those types of
situations?
Is there a point where if a query runs so often and the data doesn't
change very often that I should keep the data in the server's memory
for direct access? Even if it's a lot of data? How much is too much?
Lastly,
Is there a way in MongoDB to cache results?
If so, how can I tell it to fetch the cached result, and when to regenerate the cache?
Thank you all.
Before getting to collection specifics, one does have to differentiate between "real-time data" vis-a-vis data which does not require immediate and real-time presenting of information. The rules for "real-time" systems are obviously much different.
Now to your example starting from the end. The cache of query results. The answer is not only for MongoDB. Data architects often use Redis, or memcached (or other cache systems) to hold all types of information. This though, obviously, is a function of how much memory is available to your system and the DB. You do not want to cripple the DB by giving your cache too much of available memory, and you do not want your cache to be useless by giving it too little.
In the book case, of 100 top ones, since it is certainly not a real time endeavor, it would make sense to cache the query and feed that cache out to requests. You could update the cache based upon a cron job or based upon an update flag (which you create to inform your program that the 100 have been updated) and then the system will run an $aggregate in the background.
Now to the first few points:
Should I create a dedicated collection for EVERY query that is often used?
Yes and no. It depends on the amount of data which has to be searched to $aggregate your response. And again, it also depends upon your memory limitations and btw let me add the whole server setup in terms of speed, cores and memory. MHO - cache is much better, as it avoids reading from the data all the time.
Should I do it only for complicated ones?
How do I gauge which is complicated enough and which is simple enough to leave as is?
I dont think anyone can really black and white answer to that question for your system. Is a complicated query just an $aggregate? Or is it $unwind and then a whole slew of $group etc. options following? this is really up to the dataset and how much information must actually be read and sifted and manipulated. It will effect your IO and, yes, again, the memory.
Is there a point where if a query runs so often and the data doesn't change very often that I should keep the data in the server's memory for direct access? Even if it's a lot of data? How much is too much?
See answers above this is directly connected to your other questions.
Finally:
Is there any guidelines for best practice in those types of situations?
The best you can do here is to time the procedures in your code, monitor memory usage and limits, look at the IO, study actual reads and writes on the collections.
Hope this helps.
Use a cache to store objects. For example in Redis use Redis Lists
Redis Lists are simply lists of strings, sorted by insertion order
Then set expiry to either a timeout or a specific time
Now whenever you have a miss in Redis, run the query in MongoDB and re-populate your cache. Also since cache resids in memory therefore your fetches will be extremely fast as compared to dedicated collections in MongoDB.
In addition to that, you don't have to keep have a dedicated machine, just deploy it within your application machine.

Sphinx: Real-Time Search w/Expiration?

I am designing a search that will be fed around 50 to 200 GB of text data per day (similar to logs) and it only needs to retain that data for week or two. This data will be piped at a constant rate (5,000/per second for example), non-stop, 24 hours a day. After a week or two, the document should drop out of the index never to be heard from again.
The index should be searchable with free-form text across only 1 field (pretty small in size, around 512 characters max). At most, the schema could have 2 attributes that could be categorized.
The system needs to be indexed in near real-time as data is fed to it. A delay of 15 to 30 seconds is acceptable.
We prefer to stream data into the indexer/service with a constant stream of pipe data.
Lastly, a single stand-alone solution is prefer over any type of distribution setup (this will be part of a package to deploy and setup on local machines for testers).
I'm looking closely at Sphinx search engine with RT updates via the API as it checks off most of these. But, I am not seeing an easy way to expire documents after a certain length of time.
I am aware that I could track the IDs and a timestamp and issue a batch DELETE through the Sphinx API. But, that creates an issue of tracking large amounts of IDs in a separate datastore that will need the same kind of 5,000/per second inserts and deleting them when done.
I also have a concern around Sphinx Fragmentation of mass-inserting, and mass-deleting in the middle of inserting.
We would really prefer the search engine/indexer to handle the expiration itself.
I think I can perform a WHERE timestamp < UNIXTIMESTAMP-OF-TWO-WEEEKS-AGO as the where clause in the Sphinx API in order to gather the Document IDs to delete. The problem with that is if the system does not stay ontop of the deletes, the total number of documents/search results will be in the 10s of millions, maybe even billions in count after a two week timeframe if it has to gather a few days worth of document ids to delete. That's not a feasible query.
You can actually run
DELETE FROM rt WHERE timestamp < UNIXTIMESTAMP-OF-TWO-WEEEKS-AGO
As a query to delete the old documents, which is much simpler :)
You will also need to call OPTIMIZE INDEX from time to time.
Both these will have to be called on some sort of 'cron' schedule, as they wont be run automatically.
You might be better not using Sphinxes DELETE function at all. When writing RT indexes, as soon as the RAM chunk is full its writen out as a disk chunk. So you end up with a number of disk chunks on the disk. The oldest documents will be in the oldest chunk, sequentially.
So to clear out the oldest documents, you could just dispose of the oldest chunks. (on a rolling basis)
The problem is sphinx does not include a function to delete individual chunks.
Will need to shutdown searchd, delete the chunk(s), manipulate the header files and then restart Sphinx. Not an easy process.
But in the more general sense, not sure if sphinx will be able to keep up with a continuous stream of 5,000/documents per second (even ignoreing delete for a moment) - Sphinx is generally designed for write-infrequently, read-frequently. It builds a (for the most part) monolithic inverted index. This is great for querying, but is very hard to keep updated. Its not great for incremental updates.

How does MonogoDB stack up for very large data sets where only some of the data is volatile

I'm working on a project where we periodically collect large quantities of e-mail via IMAP or POP, perform analysis on it (such as clustering into conversations, extracting important sentences etc.), and then present views via the web to the end user.
The main view will be a facebook-like profile page for each contact of the the most recent (20 or so) conversations that each of them have had from the e-mail we capture.
For us, it's important to be able to retrieve the profile page and recent 20 items frequently and quickly. We may also be frequently inserting recent e-mails into this feed. For this, document storage and MongoDB's low-cost atomic writes seem pretty attractive.
However we'll also have a LARGE volume of old e-mail conversations that won't be frequently accessed (since they won't appear in the most recent 20 items, folks will only see them if they search for them, which will be relatively rare). Furthermore, the size of this data will grow more quickly than the contact store over time.
From what I've read, MongoDB seems to more or less require the entire data set to remain in RAM, and the only way to work around this is to use virtual memory, which can carry a significant overhead. Particularly if Mongo isn't able to differentiate between the volatile data (profiles/feeds) and non-volatile data (old emails), this could end up being quite nasty (and since it seems to devolve the virtual memory allocation to the OS, I don't see how the this would be possible for Mongo to do).
It would seem that the only choices are to either (a) buy enough RAM to store everything, which is fine for the volatile data, but hardly cost efficient for capturing TB of e-mails, or (b) use virtual memory and see reads/writes on our volatile data slow to a crawl.
Is this correct, or am I missing something? Would MongoDB be a good fit for this particular problem? If so, what would the configuration look like?
MongoDB does not "require the entire data set to remain in RAM". See http://www.mongodb.org/display/DOCS/Caching for an explanation as to why/how it uses virtual memory the way it does.
It would be fine for this application. If your sorting and filtering were more complex you might, for example, want to use a Map-Reduce operation to create a collection that's "display ready" but for a simple date ordered set the existing indexes will work just fine.
MongoDB uses mmap to map documents into virtual memory (not physical RAM). Mongo does not require the entire dataset to be in RAM but you will want your 'working set' in memory (working set should be a subset of your entire dataset).
If you want to avoid mapping large amounts of email into virtual memory you could have your profile document include an array of ObjectIds that refer to the emails stored in a separate collection.
#Andrew J
Typical you need enough RAM to hold your working set, this is true for MongoDB as it is for an RDBMS. So if you want to hold the last 20 emails for all users without going to disk, then you need that much memory. If this exceed the memory on a single system, then you can use MongoDB's sharding feature to spread data across multiple machines, therefore aggregating the Memory, CPU and IO bandwidth of the machines in the cluster.
#mP
MongoDB allows you as the application developer to specify the durability of your writes, from a single node in memory to multiple nodes on disk. The choice is your depending on what your needs are and how critical the data is; not all data is created equally. In addition in MongoDB 1.8, you can specify --dur, this writes a journal file for all the writes. This further improves the durability of writes and speeds up recovery if there is a crash.
And what happens if your computer crashes to all the stuff Mongo had in memory. Im guessing that it has no logs so the answer is probably bad luck.

Cassandra random read speed

We're still evaluating Cassandra for our data store. As a very simple test, I inserted a value for 4 columns into the Keyspace1/Standard1 column family on my local machine amounting to about 100 bytes of data. Then I read it back as fast as I could by row key. I can read it back at 160,000/second. Great.
Then I put in a million similar records all with keys in the form of X.Y where X in (1..10) and Y in (1..100,000) and I queried for a random record. Performance fell to 26,000 queries per second. This is still well above the number of queries we need to support (about 1,500/sec)
Finally I put ten million records in from 1.1 up through 10.1000000 and randomly queried for one of the 10 million records. Performance is abysmal at 60 queries per second and my disk is thrashing around like crazy.
I also verified that if I ask for a subset of the data, say the 1,000 records between 3,000,000 and 3,001,000, it returns slowly at first and then as they cache, it speeds right up to 20,000 queries per second and my disk stops going crazy.
I've read all over that people are storing billions of records in Cassandra and fetching them at 5-6k per second, but I can't get anywhere near that with only 10mil records. Any idea what I'm doing wrong? Is there some setting I need to change from the defaults? I'm on an overclocked Core i7 box with 6gigs of ram so I don't think it's the machine.
Here's my code to fetch records which I'm spawning into 8 threads to ask for one value from one column via row key:
ColumnPath cp = new ColumnPath();
cp.Column_family = "Standard1";
cp.Column = utf8Encoding.GetBytes("site");
string key = (1+sRand.Next(9)) + "." + (1+sRand.Next(1000000));
ColumnOrSuperColumn logline = client.get("Keyspace1", key, cp, ConsistencyLevel.ONE);
Thanks for any insights
purely random reads is about worst-case behavior for the caching that your OS (and Cassandra if you set up key or row cache) tries to do.
if you look at contrib/py_stress in the Cassandra source distribution, it has a configurable stdev to perform random reads but with some keys hotter than others. this will be more representative of most real-world workloads.
Add more Cassandra nodes and give them lots of memory (-Xms / -Xmx). The more Cassandra instances you have, the data will be partitioned across the nodes and much more likely to be in memory or more easily accessed from disk. You'll be very limited with trying to scale a single workstation class CPU. Also, check the default -Xms/-Xmx setting. I think the default is 1GB.
It looks like you haven't got enough RAM to store all the records in memory.
If you swap to disk then you are in trouble, and performance is expected to drop significantly, especially if you are random reading.
You could also try benchmarking some other popular alternatives, like Redis or VoltDB.
VoltDB can certainly handle this level of read performance as well as writes and operates using a cluster of servers. As an in-memory solution you need to build a large enough cluster to hold all of your data in RAM.