Docker instead of multiple VMs - virtualization

So we have around 8 VMs running on a server with 32 GB RAM and 8 physical cores. Six of them each run a mail server (Zimbra); two of them run multiple web applications. The load on the server is very high, primarily because of the heavy load on each VM.
We recently came across Docker. It seems like a cool idea to create containers for applications. Do you think it's viable to run the applications of each of these VMs inside 8 Docker containers? Currently the server is heavily utilized because multiple VMs have serious I/O issues.
Or is Docker only suited to cases where we are running web applications, and not email or other infrastructure apps? Please advise...

Docker will certainly alleviate your server's CPU load, removing the hypervisor's overhead in that respect.
Regarding I/O, my tests revealed that Docker has its own I/O overhead, due to how AUFS (or, lately, device mapper) works. On that front you will still gain some benefit over the hypervisor's I/O overhead, but not bare-metal I/O performance. My observations, for my own workloads, were that Docker was not "bare-metal like" when dealing with I/O-intensive services.
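If you do try it, one mitigation worth knowing: writes that go to a volume (or bind mount) bypass the AUFS/device-mapper copy-on-write layer entirely. A minimal sketch, assuming a hypothetical Zimbra image and host path:

    # Keep the write-heavy data directory on the host, off the
    # copy-on-write layer (image name and paths are hypothetical):
    docker run -d --name zimbra1 \
        -v /srv/zimbra1:/opt/zimbra \
        my-zimbra-image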

Have you thought about adding more RAM, say 64 GB or more? For a large Zimbra deployment, 4 GB per VM may not be enough. Zimbra, like all messaging and collaboration systems, is an I/O-bound application.
Having zmdiaglog (/opt/zimbra/libexec/zmdiaglog) data to see if you are allocating memory correctly would help, as per here:
http://wiki.zimbra.com/wiki/Performance_Tuning_Guidelines_for_Large_Deployments#Memory_Allocation
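Collecting that data is just a matter of running the bundled script on each mailbox VM; the available flags vary by Zimbra version, so check its help output on your install first:

    # Run on each Zimbra VM (as root); check the script's help for options
    /opt/zimbra/libexec/zmdiaglog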

Related

Tomcat in k8s pod and db in cloud - slow connection

I have Tomcat, ZooKeeper, and Kafka deployed in a local k8s (kind) cluster. The database is remote, i.e. in the cloud. The pages load very slowly.
But when I moved Tomcat outside of the pod and started it manually, with ZooKeeper and Kafka still in the local k8s cluster and the DB still in the remote cloud, the pages load fine.
Why is Tomcat very slow when inside a Kubernetes pod?
In theory, a program running in a container can run as fast as a program running on the host machine.
In practice, there are many things that can affect the performance.
When running on Windows or macOS (for instance with Docker Desktop), containers don't run directly on the machine, but in a small Linux virtual machine. This VM will add a bit of overhead, and it might not have as much CPU and RAM as the host environment. One way to have a look at the resource usage of containers is to use docker stats; or docker run -ti --pid host alpine and then use classic UNIX tools like free, top, vmstat, ... to see the resource usage in the VM.
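Concretely, that inspection might look like this (vmstat isn't in Alpine's default busybox, so it may need an extra package):

    # One-shot resource usage of all running containers:
    docker stats --no-stream

    # Jump into the Docker Desktop VM's PID namespace and look around:
    docker run -ti --pid host alpine
    / # free -m                       # memory actually available to the VM
    / # top                           # per-process CPU inside the VM
    / # apk add procps && vmstat 1    # CPU/memory/IO pressure over time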
In most environments (at least with Docker, and with Kubernetes clusters in their most common default configurations), containers run without resource constraints and limits. However, it is fairly common (and, in fact, highly recommended!) to set resource requests and limits when running containers on Kubernetes. You can check resource limits of a pod with kubectl describe. If metrics-server is installed (which is recommended, even on dev/staging environments), you can check resource usage with kubectl top. Tools like k9s will show you resource requests, limits, and usage in a comprehensive way (as long as the data is available; i.e. you still need to install metrics-server to obtain pod metrics, for instance).
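As a sketch of those checks (the deployment and pod names here are hypothetical):

    # Inspect requests/limits and live usage (top needs metrics-server):
    kubectl describe pod tomcat-0
    kubectl top pod tomcat-0

    # Add requests/limits to an existing deployment:
    kubectl set resources deployment tomcat \
        --requests=cpu=500m,memory=512Mi \
        --limits=cpu=2,memory=1Gi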
In addition to the VM overhead described above, if the container does a lot of I/O (whether it's disk or network), there might be a bit of overhead in comparison to a native process. This can become noticeable if the container writes on the container copy-on-write filesystem (instead of a volume), especially when using the device-mapper storage driver.
Applications that use "live reload" techniques (that automatically rebuild or restart when source code is edited) are particularly prone to this I/O issue, because there are unfortunately no efficient methods to watch file modifications across a virtual machine boundary. This means that many web frameworks exhibit extreme performance degradations when running in containers on Mac or Windows when the source code is mounted to the container.
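A common workaround, sketched here for a hypothetical Node project with a dev script, is to mount the source but shadow the hottest directory with a container-local volume, so its I/O never crosses the VM boundary:

    # Source comes from the host; node_modules stays inside the VM:
    docker run -d \
        -v "$(pwd)":/app \
        -v /app/node_modules \
        -w /app node:18 npm run dev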
In addition to these factors, there can be other subtle differences that might affect the overall performance of a containerized application. When observing performance issues, it is very helpful to use a profiler (or some kind of APM solution) to see which parts of the code take longer to execute. If no profiler or APM is available, try to execute individual portions of the code independently to compare their performance. For instance, have a small piece of code that executes a single query to the database; or executes a single task from a job queue, etc.
Good luck!

Running MongoDB and Redis on two different containers in the same host machine

I have read somewhere that MongoDB and a Redis server shouldn't run on the same host, because the way Redis manages memory hurts MongoDB. That was before Docker.io. But now things seem pretty different, or are they? Is it sensible to run a Redis server and MongoDB in two different containers on the same host machine?
Docker does not change your hardware; also, it is the OS that deals with resources, and the OS is not virtualized, so the same rules as on normal hardware should apply here.
RAM
MongoDB and Redis don't share any memory. The problem with using the same host is that you can run out of RAM with these two processes. You can set a max memory size for Redis, and you can do the same for MongoDB's cache; doing so is mandatory (see the sketch at the end of this section).
If your sizing is good (MongoDB RAM + Redis RAM < hardware RAM), you won't get any swapping to disk for Redis (which is absolutely what you want to prevent), but maybe MongoDB's cache won't be as effective (not enough room for optimization). Less memory for Redis is always a challenge if your data grows: beware of out-of-memory errors if the data size is unpredictable!
If you use backups with Redis, it uses more RAM than its dataset size to produce the dump, so beware of that. It also implies extra I/O.
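A minimal sketch of capping both, assuming a recent MongoDB with the WiredTiger engine (the sizes and eviction policy are illustrative, not recommendations):

    # Cap Redis at 1 GB and evict least-recently-used keys at the limit:
    redis-server --maxmemory 1gb --maxmemory-policy allkeys-lru

    # Cap MongoDB's WiredTiger cache at 2 GB:
    mongod --wiredTigerCacheSizeGB 2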
IO
In this case (less RAM) Mongo will do a lot more I/O to access data. Redis, depending on your backup policy, may or may not use I/O (your choice). Worst case: if you use AOF on Redis, that is a lot of I/O, so I/O may become the bottleneck in this architecture. If you don't use backups with Redis, you won't have problems. Also, an SSD is a good choice for Mongo.
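That persistence trade-off is just configuration; roughly (settings are illustrative):

    # No persistence at all: no backup I/O, but data is lost on restart
    redis-server --save "" --appendonly no

    # AOF with a once-per-second fsync, the usual middle ground
    redis-server --appendonly yes --appendfsync everysec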
CPU
I don't know if MongoDB uses a lot of CPU, but Redis most of the time does not, except during backups. If you use backups with Redis, try to have two CPU cores available for it (one for Redis, one for the backup task).
Network
It depends on your number of clients. But you should check the throughput/input load of your machine to see if you are not saturating it (using monit, for instance, with alerts). Sometimes the network is the bottleneck: not enough throughput into one machine!
Many of today's services, in particular databases, are very aggressive in consuming resources and are designed assuming they will (or should) run on a dedicated machine. MongoDB and Redis try to keep a lot of data in memory and will take as much memory as they can for themselves. To prevent these services from taking all the memory of your host machine, you can limit the maximum memory used by a container using -m="<number><optional unit>" in docker run. E.g.: docker run -d -m="2g" -p 27017:27017 --name mongodb dockerfile/mongodb
So you can easily control the resource limits of your services and run them on the same host with fine-grained control of resources. Still, it's important to remember that these services are designed on the assumption that the host machine's resources will be fully available to them. For example, other databases such as Cassandra consume a lot of memory and, furthermore, are designed around sequential writes to disk. In those cases Docker will let you run them with limited resources, but if you run multiple such services on the same host, their performance will degrade severely.
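Putting the two together might look like this (official library images assumed; the sizes must fit inside your host's RAM budget):

    docker run -d -m 2g -p 27017:27017 --name mongodb mongo
    docker run -d -m 1g -p 6379:6379 --name redis redis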

Are there major downsides to running MongoDB locally on VPS vs on MongoLab?

I have an account with MongoLab for MongoDB, and the constant calls to this remote server from my app slow it down quite a lot. When I run the app locally on my computer with a local mongod instance, it's far, far faster, as would be expected.
When I deploy my app (running on Node/Express) it will be run from a VPS on CentOS. I have plenty of storage space available on my VPS, are there any major downsides to running MongoDB locally rather than remotely on Mongolab?
Specs of the VPS:
1024MB RAM
1024MB VSwap
4 CPU Cores @ 3.3GHz+
60GB SSD space
1Gbps Port
3000GB Bandwidth
Nothing apart from the obvious:
You will have to worry about storage yourself. MongoDB does tend to take a lot of disk space, and upgrading storage will probably be harder to manage than letting MongoLab take care of it.
You will have to worry about making sure the Mongo server doesn't crash and that it's running fine.
You will have scaling issues in the future once the load on your application and your database increases.
Using a "database-as-a-service" like MongoLab frees you from worrying about a lot of hardware/OS/system-level requirements and configuration. Memory optimization? Which file system? Connection limits? Virtualization and IO ops issues? (thanks to Nikolay for pointing that one out)
If your VPS provider doesn't account for local traffic in your bandwidth, then you can set up another VPS for MongoDB. That way, the server will be closer so the requests will be faster, and also, it will have the benefits of being an independent server. It still won't be fully managed like MongoLab though.
[ Edit: As Chris pointed out, MongoLab also helps you with your database schema design and bundles MongoDB support with their plans, so that's also nice. ]
Also, this is a good question, but probably not appropriate for StackOverflow. dba.stackexchange.com and serverfault.com are good places for this question.

Put memcached on db or web server instance?

For my Drupal-based site, I have an architecture with 3 instances running nginx, postgresql, & solr, respectively. I'd like to install Memcached. Should I put it on the nginx or postgresql server? What are the performance implications?
Memcached is very light on CPU usage, so it is a great candidate to gobble up spare web server RAM. Also, you will scale out your web tier much more than your other tiers, and Memcached clustering can pool that RAM together into one logical cache.
If you have any spare RAM on the DB, it is almost always best for performance to let the DB gobble it up.
TL;DR Let DB have all of the RAM, colocate memcached on web tier.
Source: http://code.google.com/p/memcached/wiki/NewHardware
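Memcached's footprint is fixed at startup, which is part of what makes colocating it safe; a minimal sketch (the 1 GB figure is an example, size it to your spare web-tier RAM):

    # Daemonize with a 1 GB cache on the standard port:
    memcached -d -m 1024 -u memcache -p 11211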
The best option is to have a separate server (if you can do that).
Otherwise, it depends on your servers' CPU and memory utilization and on your availability requirements. In general I would avoid running anything extra on a DB server machine, since the DB is the foundation of the system and has to be available and performing well.
If your Solr server does not have high traffic and doesn't use much memory, I'd put it there. Memcached servers are known to be light on CPU. Also, you should estimate how much memory the memcached instance will need, to make sure there's enough on the server.

Running JIRA on a VM

Anyone have any success or failure running Jira on a VM?
I am setting up a new source control and defect tracking server. My server room is nearly full and my services group suggested a VM. I saw that a bunch of people are running SVN on VMs (including NCSA). The VM would also free me from hardware problems and give me high availability. Finally, it frees me from some red tape and can be implemented faster.
So, does anyone know of any reason why I shouldn't put Jira on a VM?
Thanks
We just did the research on this; this is what we found:
If you are planning to have a small number of projects (10-20) with 1,000 to 5,000 issues in total and about 100-200 users, a recent server (2.8+GHz CPU) with 256-512MB of available RAM should cater for your needs.
If you are planning for a greater number of issues and users, adding more memory will help. We have reports that allocating 1GB of RAM to JIRA is sufficient for 100,000 issues.
For reference, Atlassian's JIRA site (http://jira.atlassian.com/) has over 33,000 issues and over 30,000 user accounts. The system runs on a 64-bit quad processor. The server has 4 GB of memory, with 1 GB dedicated to JIRA.
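For a standalone JIRA install, that dedicated heap is typically set in bin/setenv.sh; a sketch, assuming the variable names used by recent distributions:

    # bin/setenv.sh (variable names may differ between JIRA versions)
    JVM_MINIMUM_MEMORY="1024m"
    JVM_MAXIMUM_MEMORY="1024m"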
For our installation (<10,000 issues, <20 concurrent sessions at a time) we use very little in the way of server resources (<1 GB RAM; on a quad-core processor we typically see <5% CPU with <30% peaks), and the VM didn't impact performance by any measurable amount.
I don't see why you shouldn't run JIRA off a VM, but JIRA needs a good amount of resources, and if your VM resides on a heavily loaded machine, it may exhibit poor performance. Why not log a support request (support.atlassian.com) and ask?
We run Jira on a virtual machine - VMWare running Windows Server 2003 SE and storing data on our SQL Server 2000 server. No problems, works well.
My company moved our JIRA instance from a hosted physical server to an Amazon EC2 instance recently, and everything is holding up pretty well. We're using an m1.large instance (64-bit o/s with 4 virtual cores and 8GB RAM), but that's way more than we need just for JIRA; we're also hosting Confluence and our corporate Web site on the same EC2 instance.
Note that we are a relatively small outfit; our JIRA instance has 25 users (with maybe 15 of them active) and about 1000 JIRA issues so far.
We run our JIRA (and other Atlassian apps) instances on Linux-based VMs. Everything runs very nicely.
Disk access speed with JIRA on VM...
http://confluence.atlassian.com/display/JIRA/Testing+Disk+Access+Speed
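If you don't want to set up Atlassian's full test, a crude stand-in is to time synchronous writes into the JIRA data directory with dd (the path is an example):

    # oflag=dsync forces each 512-byte block to be flushed to disk:
    dd if=/dev/zero of=/var/atlassian/jira/ddtest bs=512 count=2048 oflag=dsync
    rm /var/atlassian/jira/ddtest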
I'm wondering if the person who is running JIRA on a VM (Chris Latta) is running ESX underneath; that may be faster than a Windows host.
I have managed to run JIRA, Bamboo, and FishEye on a set of virtual machines all hosted on the same server, although I would not recommend this setup for production in most shops. JIRA has fairly low requirements by today's standards. Just be sure you can allocate enough resources from your host machine and things should run fine.
If, by VM, you mean a virtual instance of an OS, such as an instance of Linux running on Xen, VMware, or even Amazon EC2, then JIRA will run just fine. The only time you need to worry about virtual systems is if you're doing something that depends on hardware, such as running graphical 3D apps, or say something that uses a fax modem or a Digium telephony card with Asterisk.