Does concatenation improves I/O performance? - webserver

HTTP/2 was a very nice improvement, now we don't have the network overhead related to creating multiple connections to download multiple files, and is recommended to stop concatenating JS and CSS files to improve caching, but I'm not sure if from the perspective of I/O, having to read multiple files from hard drive will impact performance.
I don't know if Web servers like Apache or nginx have some RAM caching of frequent files and this avoids I/O overhead and I can stop concatenating and stop worrying about this.

No, having to read multiple files from the hard disk won't worsen performance, specially if your are using the current crop of SSD hard-drives. They have amazing performance and thanks to them network bandwidth, SSL and everything else are the bottlenecks.
Also, operating systems automatically cache in RAM recently accessed files, so the feature you would like doesn't need to be built into the web server, because it is already built into the operating system. And this works even if you are using magnetic drives.
There is one thing though. We (at shimmercat.com) have noticed that sometimes compression suffers by having multiple small files instead of a concatenated one. In that sense, it would have been nice if HTTP/2 had included a compression format like SDCH ... But this should matter more for particularly small files than for big files. All in all, I think that the cache benefits for small files are worth it.

Related

planning a virtualised environment

I'm in the early stages of planning out a virtualised environment for our production system (Moodle). The layers are relatively simple:
web - Apache 2.2
Database - MySQL 5
PHP 5.2
My question is this, what is the generally accepted approach for distributing the above layers amongst phycsical servers? In this case, we are planning to have 2 physical servers. Should I aim to keep my web server cluster on a single physical server and database cluster on another? Or, replicate a full stack on both servers, in case one fails? Any insights into this would be a great help to me.
thanks,
Cathal.
We use separate (virtual) servers, but do maintain separate stacks on each simply because the overheads are small and it allows for flexibility if we want to scale up/down. This is not for fallback however, because if one server is so broken that it's not web accessible, you probably won't be able to get data off it and onto the second server in order for it to be a useful replacement. Use proper backups for fallback and practice restoring from them regularly.
Moodle generally blocks on the PHP side rather than the DB side and we see roughly 3.5:1 PHP:MySQL CPU loads when they are on separate machines. With that in mind, you need to consider what the maximum capacity of one server is: you will get best performance if you have no network overhead between the machines at all, so bigger is better. If you can't do it with one, then making 2 VMS, one larger for PHP and one smaller for MySQL is the best option, but do benchmark the differences under load for your particular setup (use Apache JMeter for this).
Our largest installs involve 70,000 users or so and we have two 4-CPU/8GB VMs, one for PHP and one for MySQL (although the DB one rarely goes above 30% CPU). This allows for about 400 concurrent connections via Apache. However, we are using a large farm of VMs and can scale up and down between 2 and 16 CPUs easily, so you may wish to consider one monster machine if you want flexibility.
For more information on Moodle performance, look here, particularly under 'scalability'.

The benefits of deploying multiple instances for serving/data/cache

although I've much experience writing code. I don't really have much experience deploying things. I am writing a project that uses mongodb for persistence, redis for meta-caching, and play for serving pages. I am deciding whether to buy a dedicated server vs buying multiple small/medium instance from amazon/linode (one for each, mongo, redis, play). I have thought of the trade-offs as below, I wonder if anyone can add to the list or provide further insights. I am leaning toward (b) buying two sets of instances from linode and amazon, so if one of them have an outage it will fail over to the other provider. Also if anyone has any tips for deploying scala/maven cluster or tools to do so, much appreciated.
A. put everything in one instance
Pros:
faster speed between database and page servlet (same host).
cheaper.
less end points to secure.
Cons:
harder to manage. (in my opinion)
harder to upgrade a single module. if there are installation issues, it might bring down the whole system.
B. put each module (mongo,redis,play) in different instances
Pros:
sharding is easier.
easier to create cluster for a single purpose. (i.e. cluster of redis)
easier to allocate resources between module.
less likely everything will fail at once.
Cons:
bandwidth between modules -> $
secure each connection and end point.
I can only comment about the technical aspects (not cost, serviceability, etc ...)
It is not mentioned whether the dedicated instance is a physical box, or simply a large VM. If the application generates a lot of roundtrips to MongoDB or Redis, then the difference will be quite significant.
With a VM, the cost of I/Os, OS scheduling and system calls is higher. These elements tend to represent an important part in the performance cost of efficient remote data stores like MongoDB or Redis, and the virtualization toll is higher for them.
From a system point of view, I would not put MongoDB and Redis/Play on the same box if the MongoDB database is expected to be larger than the available memory. MongoDB maps data files in memory, and relies on the OS to perform memory swapping. It is designed for this. The other processes are not. Swapping induced by MongoDB will have catastrophic consequences on Redis and Play response time if they are all on the same box. So I would at least separate MongoDB from Redis/Play.
If you plan to use Redis for caching, it makes sense to keep it on the same box than the Play server. Redis will use memory, but low CPU. Play will use CPU, but not much memory. So it seems a good fit. Also, I'm not sure it is possible from Play, but if you use a unix domain socket to connect to Redis instead of the TCP loopback, you can achieve about 50% more throughput for free.

Download multiple files vs single one big file&unzip via socket

I need my client to download 30Mb worth of files.
Following is the setup.
They are comprised of 3000 small files.
They are downloaded through tcp bsd socket.
They are stored in client's DB as they get downloaded.
Server can store all necessary files in memory.(no file access on server side)
I've not seen many cases where client downloads such large number of files which I suspect due to server side's file access.
I'm also worried if multiplexer(select/epoll) will be overwhelmed by excessive network request handling.(Do I need to worry about this?)
With the above suspicions, I zipped up 3000 files to 30 files.
(overall size doesn't change much because the files are already compressed files(png))
Test shows,
3000 files downloading is 25% faster than 30files downloading & unzipping.
I suspect it's because client device's is unable to download while unzipping & inserting into DB, I'm testing on handheld devices.. iPhone..
(I've threaded unzipping+DB operation separate from networking code, but DB operation seems to take over the whole system. I profiled a bit, and unzipping doesn't take long, DB insertion does. On server-side, files are zipped and placed in memory beforehand.)
I'm contemplating on switching back to 3000 files downloading because it's faster for clients.
I wonder what other experienced network people will say over the two strategies,
1. many small data
2. small number of big data & unzipping.
EDIT
For experienced iphone developers, I'm threading out the DB operation using NSOperationQueue.
Does NSOperationQueue actually threads out well?
I'm very suspicious on its performance.
-- I tried posix thread, no significant difference..
I'm answering my own question.
It turned out that inserting many images into sqlite DB at once in a client takes long time, as a result, network packet in transit is not delivered to client fast enough.
http://www.sqlite.org/faq.html#q19
After I adopted the suggestion in the faq to speed up "many insert", it actually outperforms the "many files download individually strategy".

Perl, which is the best module to cache data for a web application?

I need some method to cache data fetched (or pre-fetched) from the internet in a web service written in Perl.
Data collected is parsed to XML format.
My goal is to decrease response time of my service to user requests.
I see there's a bunch of modules available: which is the most performing in terms of usability and efficiency (in that order of importance) ?
UPDATE:
I'm evaluating CHI, Cache, and Cache::Cache (which I suppose is quite outmoded...)
I'm in a mod_perl2 Linux environment: in this case, does a memory cache make any sense?
CHI is a good base, and can do most caching you are likely to need, with the back-end chosen according to your need. We used to use Cache::Cache, and on occasion even Memoize, but CHI appeared best in the end, and has certainly superseded Cache::Cache.
Now for the caching back-end system. This will depend a lot on your web setup, and even on your operating system and hardware. If you have a lot of memory, and/or are running a small number of server processes, a memory-based cache is solid and easy. If you are running many processes and still have decent memory, you probably want a shared memory cache (and this will work fine on Linux and UNIX systems, but might not be supportable on Windows). If you have less memory, a file-based cache will work, but it'll be slower.
However, if you are only prefetching, you might just as well use a database. And even if the caching is mainly there to avoid network requests, an SQL database will still provide the performance improvement you need. There's also a CHI driver for that at: CHI::Driver::DBI. Most databases do very decent caching anyway.
UPDATE:
Cache and Cache::Cache are both pretty old. CHI has been designed to replace Cache::Cache. Of these, CHI is definitely the best.
Yes, even in mod_perl, a memory cache can be useful, but you can't afford it to become too large. mod_perl embeds Perl in the server processes, so each server process will have its own cache. This reduces effectiveness, but if you have a small number of things to cache, and they are used often, it is still worth using.
Assuming you're on a UNIX-type system, a shared memory cache or a database cache will likely be your best bet, depending on how many things you are likely to need to cache, and their size. If your cache will become huge, and if network latency is the main part of the problem, a database cache backend will be fine. If the cache will always be relatively small, memory or shared memory should suit.

Why can't we build virtualization that runs on several machines? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
You're probably familiar with virtualization which takes a single host and is able to "emulate" many instances by sharing the resources among them all. You probably heard about XEN.
Is it completely insane to imagine the "opposite" of XEN : a layer that would abstract several hosts in a single running instance? I believe this would allow building apps which wouldn't need to really care much about a "clustering" layer themselves.
I wonder what are the technical limits to this, because I'm pretty sure some people are already working on it somewhere :)
The goal is NOT to achieve any kind of failure recovery. I believe this can (and should?) be handled at a higher level. For example, if someone is able to run a MySQL server on a gigantic instance (made of say 50 hosts), then one can easily use MySQL's replication features to replicate the database over a similar virtual instance.
Good question. Microsoft Azure is attempting to address this by allowing you to put applications "in the cloud" and not have to be as concerned with scalability up/down, redundancy, data storage, etc. But this is not accomplished at the hypervisor level.
http://www.microsoft.com/windowsazure/
Hardware-wise, there are some downsides to having everything be one big VM rather than many smaller ones. For one thing, software doesn't always understand how to handle all the resources. For example, some applications still can't handle multiple processor cores. I've seen informal benchmarks showing that IIS performs better spreading the same resources over multiple instances rather than one giant instance.
From a management perspective, it is probably better to have multiple VMs in certain cases. Imagine that a bad deployment corrupts a node. If that were your one and only (albeit giant) node, now your whole application is down.
You're probably talking about the concept Single System Image.
There used to be a Linux implementation, openMosix that since closed down. I don't know of any replacements. openMosix made it very easy to create and use SSI on a standard Linux kernel; too bad it got overtaken by events.
I do not know enough about Xen to know if it is possible but with VMware you can create pools of resources which come from many physical hosts. Then you can assign the resources to your VMs. That could be many VMs or just one VM.
Aggregation: Transform Isolated Resources into Shared Pools
Simulating a single core over multiple physical cores is very inefficient. You can do it, but it'll be slower than a cluster. Two physical cores can talk to each other in near-real-time, if they're on separate machines then you're doing something like say clocking down your motherboard speed by factors of 10 or more if these two physical cores (and RAM) are communicating even over a fibre optic network.
Dual cores can communicate faster than two distinct CPUs on the same motherboard, if they are on separate machines, thats slower again, if there are multiple machines, slower even again.
Basically you can, but there is net performance loss compared to the net performance gain you would be hoping to achieve.
Real life example, I had a bunch of VMs on a dual quad core server (~2.5Ghz/core) performing way, way below what they should have been. On closer inspection, it turned out that the hypervisor was emulating a single 3.5-4Ghz core when the load on an individual VM was more than 2.5Ghz -- after limiting each VM to 2.5Ghz performance went back to what was expected.
I agree with saidimu, you are talking about the Single System Image concept. In addition to the OpenMosix project, there have been several commercial implementations of the same idea (one contemporary example is ScaleMP). It's not a new idea.
I just wanted to elaborate on some of the technical points of SSI.
Basically, the reason it's not done is because the performance is generally absolutely unpredictable or terrible. There is a concept in computer systems known as [NUMA][3], which basically means that the cost of accessing different pieces of memory is not uniform. This can apply to huge systems where CPUs may have some memory accesses routed around to different chips, or in cases where memory is accessed remotely over a network (such as in SSI). Typically, the operating system will attempt to compensate for this by laying out programs and data in memory in such a way that a program can run as quickly as possible. I.e., the code and data will all be placed in the same NUMA "region", and be scheduled on the closest possible CPU.
However, in cases where you are running big applications (attempting to use all the memory in your SSI), there is little the operating system can do to reduce the impact of remote memory fetches. MySQL is not aware that accessing page 0x1f3c will cost 8 nanoseconds, while accessing page 0x7f46 will stall it for hundreds of microseconds, possibly milliseconds while the memory is fetched over the network. This means that non-NUMA aware applications will run like crap (seriously, very bad) in this kind of environment. As far as I know, most comtemporary SSI products rely on the fastest possible interconnects (such as Infiniband) between machines to achieve even a passable performance.
This is also why frameworks that expose the true cost of accessing data to the programmer (such as MPI: message passing interface) have achieved more traction than SSI or DSM (distributed shared memory) approaches. In fact, there is basically no way for a programmer to optimize an application to run in an SSI environment, which just sucks.