Huge Mongo datasets. How much RAM do I need and how to not get ruined by paying for hosting? - mongodb

So, I have a what I call huge mongo database which is about 30Gb (about 30 millions documents). I tried to run mongod on the server shared with another application and it was completely slowed down. So I have to look for a dedicated server but have no idea how much RAM do I need.
I understand that I probably need to have amount of RAM enough to put all indexes there. But, if I'm correct, it would be about 13Gb of RAM which makes the price for the server very-very expensive (my app isn't making any money yet).
I tried to look into mongoHQ, but their cheapest dedicated plan is $600/month.
Any ideas? Is it really that expensive to host heavy mongo databases like that?

Build your own server and colocate it instead of renting someone's server. You have full control over the hardware, higher startup costs, but lower long-term costs. You are also liable for hardware malfunctions, so watch out for that.

Related

filemaker 15 Pro server preformance

I creating a database in Filemaker, the database is about 1GB and includes around 500 photos.
Filemaker maker server is having performance issues, its crashes and takes it’s time when searching though the database. My IT department recommended to raise the cache memory.
I raised the memory 252MB but it's still struggling to give a consistent performance. The database shows now peaks in the CPU.
What can cause this problem?
Verify at FileMaker.com that your server meets the minimum requirements for your version.
For starters:
Increase the cache to 50% of the total memory available to FileMaker server.
Verify that the hard disk is unfragmented and has plenty of free space.
FM Server should be extremely stable.
FMS only does two things:
reads data from the disk and sends it to the network
takes data from the network and writes it to the disk
Performance bottlenecks are always disk and network. FMS is relatively easy on CPU and RAM unless Web Direct is being used.
Things to check:
Are users connecting through ethernet or wifi? (Wifi is slow and unreliable.)
Is FMS running in a virtual machine?
Is the machine running a supported operating system?
Is the database using Web Direct? (Use a 2-machine deployment for web direct.)
Is there anything else running on the machine? (Disable virus and indexing.)
Make sure users are accessing the live databases through FMP client and not through file sharing.
How are the database being backed up? NEVER let anything other than FMS see the live files. Only let OS-level backup processes see backup copies, never the live files.
Make sure all the energy saving options on the server are DISABLED. You do NOT want the CPU or disks sleeping or powering down.
Put the server onto an uninterruptible power supply (UPS). Bad power could be causing problems.

Optimized environment for mongo

I have my RHEL linux server(VM) running a 4core processor and 8GB ram running the below applications
- an Apache Karaf container
- an Apache tomcat server
- an ActiveMQ server
- and the mongod server(either primary of secondary).
Often I see that mongo consumes nearly 80% of cpu. Now I see that my cpu and memory is overshooting most of the time and this has caused me to doubt whether my hardware config is too low for running these many components.
Please let me know if it is ok to run mongo like this on a shared server..
The question is to broad and the answer depends on too many variables, but I'll try to give you overall sense of it.
Can you use all these services together on the same machine at a minimum load? - for sure. It's not clear where other shards reside though, but it will work either way. You didn't provide your HDD specs which is quite important for a DB server, but again it will work at a minimum load.
Can you use this setup under heavy load - not the best idea. Perhaps it's better to have separate servers handling these services.
Monitor overall server load like: CPU, memory, IO. Check mongo logs for slow queries. If your queries supposed to run fast and they don't, you'll need more hardware.
Nobody would be really able to tell you how much load a specific server configuration can handle. You need at least 512Mb RAM and 1 CPU to get going these days but very soon you hit the limits. It all depends on how many users you have, what kinds of queries they run and how much data they cover.
Can you run MongoDB along other applications on a single server? Well it would appear that if you are having memory issues or CPU issues in your current configuration then you will likely need to address something. But "Can You?", well if it is not going to affect you then of course you can.
Should you, do this? Most people would firmly agree that you should not, and that would also stand for most of the other applications you are running on the one machine.
There are various reasons, process isolation, resource allocation, security, and far too many for a short topic response to go into why you should not have this kind of configuration. And certainly where it becomes a problem you should be addressing the issue by seeking a new configuration.
For Mongo alone, most people would not think twice about running their SQL database on dedicated hardware. The choice for Mongo should likely be no different.
Have also suggested this be moved to ServerFault, as it is not a programming question suited to stack overflow.

The benefits of deploying multiple instances for serving/data/cache

although I've much experience writing code. I don't really have much experience deploying things. I am writing a project that uses mongodb for persistence, redis for meta-caching, and play for serving pages. I am deciding whether to buy a dedicated server vs buying multiple small/medium instance from amazon/linode (one for each, mongo, redis, play). I have thought of the trade-offs as below, I wonder if anyone can add to the list or provide further insights. I am leaning toward (b) buying two sets of instances from linode and amazon, so if one of them have an outage it will fail over to the other provider. Also if anyone has any tips for deploying scala/maven cluster or tools to do so, much appreciated.
A. put everything in one instance
Pros:
faster speed between database and page servlet (same host).
cheaper.
less end points to secure.
Cons:
harder to manage. (in my opinion)
harder to upgrade a single module. if there are installation issues, it might bring down the whole system.
B. put each module (mongo,redis,play) in different instances
Pros:
sharding is easier.
easier to create cluster for a single purpose. (i.e. cluster of redis)
easier to allocate resources between module.
less likely everything will fail at once.
Cons:
bandwidth between modules -> $
secure each connection and end point.
I can only comment about the technical aspects (not cost, serviceability, etc ...)
It is not mentioned whether the dedicated instance is a physical box, or simply a large VM. If the application generates a lot of roundtrips to MongoDB or Redis, then the difference will be quite significant.
With a VM, the cost of I/Os, OS scheduling and system calls is higher. These elements tend to represent an important part in the performance cost of efficient remote data stores like MongoDB or Redis, and the virtualization toll is higher for them.
From a system point of view, I would not put MongoDB and Redis/Play on the same box if the MongoDB database is expected to be larger than the available memory. MongoDB maps data files in memory, and relies on the OS to perform memory swapping. It is designed for this. The other processes are not. Swapping induced by MongoDB will have catastrophic consequences on Redis and Play response time if they are all on the same box. So I would at least separate MongoDB from Redis/Play.
If you plan to use Redis for caching, it makes sense to keep it on the same box than the Play server. Redis will use memory, but low CPU. Play will use CPU, but not much memory. So it seems a good fit. Also, I'm not sure it is possible from Play, but if you use a unix domain socket to connect to Redis instead of the TCP loopback, you can achieve about 50% more throughput for free.

Perl, which is the best module to cache data for a web application?

I need some method to cache data fetched (or pre-fetched) from the internet in a web service written in Perl.
Data collected is parsed to XML format.
My goal is to decrease response time of my service to user requests.
I see there's a bunch of modules available: which is the most performing in terms of usability and efficiency (in that order of importance) ?
UPDATE:
I'm evaluating CHI, Cache, and Cache::Cache (which I suppose is quite outmoded...)
I'm in a mod_perl2 Linux environment: in this case, does a memory cache make any sense?
CHI is a good base, and can do most caching you are likely to need, with the back-end chosen according to your need. We used to use Cache::Cache, and on occasion even Memoize, but CHI appeared best in the end, and has certainly superseded Cache::Cache.
Now for the caching back-end system. This will depend a lot on your web setup, and even on your operating system and hardware. If you have a lot of memory, and/or are running a small number of server processes, a memory-based cache is solid and easy. If you are running many processes and still have decent memory, you probably want a shared memory cache (and this will work fine on Linux and UNIX systems, but might not be supportable on Windows). If you have less memory, a file-based cache will work, but it'll be slower.
However, if you are only prefetching, you might just as well use a database. And even if the caching is mainly there to avoid network requests, an SQL database will still provide the performance improvement you need. There's also a CHI driver for that at: CHI::Driver::DBI. Most databases do very decent caching anyway.
UPDATE:
Cache and Cache::Cache are both pretty old. CHI has been designed to replace Cache::Cache. Of these, CHI is definitely the best.
Yes, even in mod_perl, a memory cache can be useful, but you can't afford it to become too large. mod_perl embeds Perl in the server processes, so each server process will have its own cache. This reduces effectiveness, but if you have a small number of things to cache, and they are used often, it is still worth using.
Assuming you're on a UNIX-type system, a shared memory cache or a database cache will likely be your best bet, depending on how many things you are likely to need to cache, and their size. If your cache will become huge, and if network latency is the main part of the problem, a database cache backend will be fine. If the cache will always be relatively small, memory or shared memory should suit.

Separated servers or virtualization, which is more efficient?

My company is hosting a few separate, but related, moderately hit, web sites. Accordingly, a production database server, staging database server, production web server, staging web server, etc are needed. My question is, should we invest in physically separate servers for each of our needs, or should we put that money together and invest in a much higher end server and virtualize all of the aforementioned servers? Which route would you guys decide on, and why?
That depends on a lot of things, here are the main considerations.
If you have a lot of servers with low to moderate usage, virtualization should generally save you money on hardware, power, and floorspace. There is a tipping point, however, based on the overhead of the VM layer itself. Honestly, you will have to experiment to find the right cost/performance balance on this. I am sure the VM vendors would be happy to help you with the math.
The downside is that virtualization creates a single point of failure. If that box fails, you have downtime for all of your servers. Having them separate makes it far less likely for everything to take a dive at once.
You certainly want physical separation between the development and the production servers. You shouldn't ever have to worry that something you do in dev could bring down the machine on which production is hosted. And, there are some problems in development that really require either a hard reset of the physical machine or a ludicrous work-around to avoid a hard reset.
As for production web server and production database, you're not really introducing any new points of failure by virtualizing them on the same machine, particularly if you can colocate a static version of the site on another server. For any modern web site of even moderate complexity, database failure is web site failure anyway.
From my experience, for low or moderate usage a VM is the way to go - if you get just one very powerful server instead on several moderately powerful servers you save money, power and space and make the application faster at the same time because it's running on faster hardware.
A VM also have same another nice advantages, if the server hardware fails you can load the same VM on different physical hardware and continue running like nothing happened (you do have full backups, don't you?) and you can take a copy of the actual production server and run it in isolation on a development machine to debug those annoying bug that only appear in production.
But I would put the development (and maybe testing) servers on a different physical machine than production, you need to make sure no matter what stupid mistake you made in development it wouldn't take down the production server.