What is the best way to deploy a Pylons app? - deployment

There are many ways to deploy Pylons apps.
- Proxying through apache or nginx to paste
- Embedding the app with mod_wsgi
- using some edgy nginx+uwsgi combo
- and probably more...
I've read a lot about the various approaches but failed to really decide which one to choose.
Proxying to paste through nginx seems to be the easiest method to set up, but is it efficient? Wouldn't paste be slower than mod_wsgi or uwsgi? If so, is the performance increase worth the hassle?
Need some experts to help me choose the best compromise...
I want simplicity, but I need decent (if not cutting edge) performance, and you, Obiwan Kenobi, are my only hope ;)

If performance is most important, look at some tests:
http://wiki.pylonshq.com/display/pylonscookbook/Some+performance+test+results

What I meant to say is that if the application depends more on the framework than on static content, the limiting factor is the webserver -> framework path. I've found negligible differences in performance between nginx -> uwsgi -> pylons and apache2/mpm-worker -> mod_wsgi -> pylons, because the limiting factor is Pylons itself. This isn't to say that Pylons is slow.
No matter which deployment method I used with repoze.who/what, I found it difficult to scale past 280 requests per second per CPU core.
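One way to see how much of a ceiling comes from the application rather than the web server is to time the WSGI callable in-process, with no server or network in between. The following is only a minimal sketch: `hello_app` is a hypothetical stand-in for a real Pylons application, `requests_per_second` is an illustrative helper, and the number it prints says nothing about the 280 requests per second figure above.

```python
import time
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Trivial WSGI app; a hypothetical stand-in for a framework-heavy app."""
    body = b"hello"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def requests_per_second(app, n=1000):
    """Call the WSGI callable n times in-process and return calls per second.

    No web server or network is involved, so this measures only the
    application side of the stack.
    """
    statuses = []

    def start_response(status, headers, exc_info=None):
        statuses.append(status)

    start = time.perf_counter()
    for _ in range(n):
        environ = {}
        setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ
        result = app(environ, start_response)
        for _chunk in result:  # consume the body as a server would
            pass
        if hasattr(result, "close"):
            result.close()
    elapsed = time.perf_counter() - start
    assert all(status.startswith("200") for status in statuses)
    return n / elapsed


if __name__ == "__main__":
    print("%.0f requests/second in-process" % requests_per_second(hello_app))
```

If the in-process figure for a real application is already close to what the full stack delivers, swapping mod_wsgi for uwsgi (or the reverse) is unlikely to change much.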
@mkucharz, as for those performance results: they are three years old and don't come close to the configurations that exist today. Pylons 1.0 is about 10% faster than 0.9, flup is much more mature, and the test doesn't cover uwsgi or mod_wsgi. It also uses Myghty rather than Mako, which also points to the test's age.
The other hidden variables include the version of Python. In some distributions, I've found Python 2.5 to be a little faster than Python 2.6 depending on what the application does.
Disclaimers:
Pylons is not slow.
mod_wsgi and uwsgi performance differences are negligible in production settings.
Nginx's static file performance is better than Apache's.
Apache/mpm-worker is much faster than mpm-prefork if mod_php isn't needed.
Almost any deployment that you understand is probably enough for 99% of the webapps out there.
99% of the published benchmarks don't properly test an environment. Hitting a page 10000 times is not indicative of real world performance.
Trying to be helpful when posting late at night never works. I knew when I saw this come up on tweetdeck I should have just said nothing.

The best answer is, it depends.
From a pure simplicity standpoint, apache2/mod_wsgi is probably the easiest to manage since you have a much larger pool of people that understand apache.
From a performance standpoint, it depends.
If your application is very framework heavy and not very static content (css, images) intensive, the gateway between the webserver and pylons is more likely your bottleneck and almost any deployment can handle that.
Paste is fairly quick. I found nginx/uwsgi's interface to be slightly quicker than apache2/mod_wsgi. nginx's static file performance and memory requirements favor nginx as well.
There are a few sites I've come across that talk about both:
tonylandis.com/python/deployment-howt-pylons-nginx-and-uwsgi/
cd34.com/blog/programming/python/pylons-and-facebook-application-layout/
code.google.com/p/modwsgi/wiki/IntegrationWithPylons
The comparisons I've done are with apache2/mpm-worker rather than mpm-prefork as I didn't need mod_php5 in my setup.

Related

Standalone WSGI servers for production

I've been learning about WSGI for REST APIs in python. I've got a working setup with Lighttpd+FastCGI.
However this path will be dedicated to serving the API - Static content will be delivered via a Content Delivery network and any web sites can be set up as REST clients to the API.
There are far too many Python WSGI servers. It seems that, besides the one built into Python, every WSGI module, framework, and my dog includes one, and these almost universally come with a "Use it for development, but you may want to use a proper production quality WSGI stack" warning.
Python Paste looks promising, but is it really stable, and does it duplicate too much of my existing web.py+army-of-modules framework?
My primary criteria are:
Stability. I want something I can pretty much configure and not worry about.
Security. Don't introduce security holes.
Performance. It should perform well enough. I certainly don't want it to be the bottleneck in my implementation, but I see benchmarks showing that WSGI servers handle many hundreds of requests per second, so as long as the WSGI server is not abnormally slow I don't expect this to be an issue.
What other aspects of the WSGI server do I need to be concerned about in a high-volume environment?
I've seen Gunicorn used in pretty important production environments, so that would probably be your best choice. I can also do a shameless plug here for netius, which is a Python network library that can be used for the rapid creation of asynchronous non-blocking servers and clients. It has no dependencies, it's cross-platform, and it ships with some sample netius-powered servers out of the box, including a production-ready WSGI server. I can't claim that the project has been used by a lot of people, although we use it for a mission-critical SaaS service of ours with significant load. Its main advantage for you in particular is that the codebase is small, strictly structured, and extensively commented, so you can easily audit it for security yourself.
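Whichever server is chosen, the application side stays the same, because every WSGI server hosts the same kind of callable. A minimal sketch, assuming a hypothetical JSON endpoint (the names `application` and `call_once` and the payload are illustrative, not from any of the projects mentioned above):

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A tiny JSON endpoint; the function name and payload are illustrative."""
    body = b'{"status": "ok"}'
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def call_once(app):
    """Invoke the app in-process, without a socket, and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # minimal valid WSGI environ
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    # Development only: this is the server bundled with Python.
    with make_server("127.0.0.1", 8000, application) as httpd:
        httpd.serve_forever()
```

Assuming the file is saved as `myapi.py` (a hypothetical name), the same callable could be served by Gunicorn with `gunicorn myapi:application`, without touching the application code.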

Experiences with db4o in production environment

We are planning to migrate from Prevayler (http://prevayler.org/) to db4o (http://www.db4o.com/), so we would like to hear about experiences, pros and cons, and best practices before moving forward. What do you think about it? Is it a good solution? Or would it be better to move forward with a standard NoSQL solution (such as MongoDB or CouchDB)? Thanks!
We use db4o as the main db in our production environment (both embedded and client/server), so I am going to share some of my experiences.
Pros:
- very easy for development (you just implement data classes)
- supports both embedded and client/server modes under the same interface, which makes it easy to unit test
- decent performance for small projects
Cons:
- db4o is no longer developed, so it's quite a dead project and you won't get much support for it
- [client/server] every time you change the model you need to redeploy the server (not to mention that you need to host the server app yourself)
- [client/server] performance degrades as more clients connect, so it's not possible to scale
Summary: db4o is very good as an embedded db (mobile app, desktop local db), but when it comes to server applications you get into trouble.
Given that I did not receive much feedback, we gave it a try. At first, it seemed to be a good option for an embedded database, which makes deployment much easier. So we rewrote the whole persistence layer, with its unit tests, and it seemed to work fine.
Then we tried it with real data and started to get some weird null pointers, and we did not know why. We started reading and found this issue: http://www.gamlor.info/wordpress/2009/09/db4o-activation-update-depth/.
We tried to solve it for a few hours, but then decided not to spend more time on it and to find another way. CouchDB, OrientDB and MongoDB are still on our list.

JRuby/TorqueBox for high performance / mission critical application

We are evaluating a few options for developing a telecommunication related application platform (and migrating/consolidating some of the standalone apps into the new platform). One of our main concerns is the ability to handle a high volume of requests during peak hours.
We feel that TorqueBox seems to be an interesting solution worthy of consideration because:
Speed (Next to pure Java performance)
Faster development time over Java
Maintainability
Support for threads/concurrency even though it's Ruby
Faster/Easier front end development with Rails
...
RedHat supported and runs on JBoss (scalability, future development and ability to call Java if necessary)
Has anyone developed/deployed similar application(s) with JRuby/TorqueBox?
Any serious performance bottlenecks ahead? (or why we shouldn't use JRuby and should stick with Java?)
The answer is YES but be aware of memory leaks (gems, threadsafety issues, etc). You have to be familiar with tools like VisualVM, Eclipse MAT and/or NewRelic.
We're successfully using TorqueBox in production for some clients on Amazon EC2, handling 60k-80k visits per day (the new c3 instances are great for Java).
Deployment is also an issue. We're unable to set up any kind of rolling restart because of memory consumption, so every time we deploy using Capistrano a full JBoss restart is needed (no big issue for us).
Bests,
Antonio
Yes, any mature Java web server with JRuby is a viable option. The details of handling high loads at peak hours will really depend on what kind of app you'll be running and how much "hardware" you can afford to use. In general it's achievable, but be aware that there might still be some "gotchas", e.g. Ruby libraries (gems) that do not handle thread-safety well. You simply need to understand how to proceed then, which it seems you do, since you want to use 'Celluloid.IO' :)

Caching solutions for multi-webserver configuration?

I am looking into caching solutions, for a multi webserver configuration. Thought of memcached as being cheap (free) and proven over the years. Microsoft is also developing a caching solution for webfarms, called Velocity, but this is still in CTP2.
There is a distributed caching model used in the configuration service that is part of the .NET Stocktrader sample application. This is a framework that allows you to run multiple nodes with centralised configuration management, load balancing and distributed caching. You can implement the configuration service as is or look through the code and grab what suits you. Worth a look.
When I listened to Scott Hanselman's podcast interview with the StackOverflow team, I was left with the impression that a. they did use some kind of caching and b. they knew almost nothing about what they were doing in this respect and had fiddled with a few options and then written a blog post or two.
They currently seem to use client-side caching rather half-heartedly (short expiry times on images, for example), and I think they use a lot of ASP.NET user-mode caching, and I can't tell if they use IIS kernel-mode caching. (They didn't seem to be able to tell Scott that, either.)
However, the podcast was a while back, and I was driving at the time, so my memory might be wrong and/or out of date.
You should think HARD before bringing in something like memcached.
Caching can hide performance issues from you ("got a slow running query? just cache it and don't worry about fixing it!")
Invalidating stale data is a nightmare.
You may spend days chasing bugs that get cleared up when you clear the cache, and it pollutes your code base.
I'm not saying don't do it, but think HARD before you do.
If you can get enough performance by adding a couple* of extra machines (which I think stackoverflow can) then do that and don't worry about caching. It'll be much cheaper in the long run.
*note I don't say 100 machines.
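To make the invalidation problem concrete, here is a minimal sketch of the cache-aside pattern. It assumes a hypothetical in-process `TTLCache` standing in for memcached and a plain dict standing in for the database; in a real multi-webserver setup the cache would be a shared service, but the failure mode is the same.

```python
import time


class TTLCache:
    """A tiny in-process cache with per-entry expiry; a stand-in for memcached."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:  # entry expired: drop it
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def delete(self, key):
        self._store.pop(key, None)


database = {"user:1": "alice"}  # hypothetical backing store
cache = TTLCache(ttl_seconds=60)


def read_user(key):
    """Cache-aside read: try the cache, fall back to the database."""
    value = cache.get(key)
    if value is None:
        value = database[key]
        cache.set(key, value)
    return value


def update_user(key, value, invalidate=True):
    """Write to the database; skipping invalidation leaves a stale cache entry."""
    database[key] = value
    if invalidate:
        cache.delete(key)
```

Calling `update_user("user:1", "bob", invalidate=False)` after a read leaves `read_user("user:1")` returning the old value until the entry expires. Every write path in the code base has to remember to invalidate, which is where the hard-to-chase bugs tend to come from.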

mod_perl vs mod_fastcgi

I'm developing a web app in Perl with some C as necessary for some heavy duty number crunching. The main problem I'm having so far is trying to decide if I should use mod_perl, mod_fastcgi or both to run my scripts, because I'm having a difficult time analyzing the pros and cons of each module.
Can anyone post a summary or give a link where I can find some comparison information and perhaps some recommendations with examples?
They are quite different beasts.
mod_fastcgi (by the way, mod_fcgid is recommended) just supports the FCGI protocol to execute CGIs faster, with some knobs to control how many processes it will run simultaneously, and not much more.
mod_perl, on the other hand is a platform for development of applications that exposes most Apache internals to you so you can tweak every webserver knob from your code, accelerates CGIs, and much more.
If all you wish is to run your CGIs quickly, and want to support as many hosts as possible, you should stick with supporting those two ways to run your code and probably standard CGI as well.
If you care about maximum efficiency at the cost of flexibility, you could aim for a single platform, probably mod_perl.
But probably the sanest option is to run everywhere and use a framework to build the application that'll take care of using the advantages of a particular way of executing if present, like Catalyst.
I would advise you to use a framework such as Catalyst that takes care of such details. For most applications, it doesn't matter how the program connects to the webserver, as long as it is done in an efficient way. The choice between mod_perl and FastCGI should be made by the sysadmin who deploys it, not the developer.
Here is a site with some actual performance comparisons of mod_perl, mod_fastcgi, cgi (Perl) and a Java servlet - for a very basic script: https://sites.google.com/site/arjunwebworld/Home/programming/apache-jmeter
In summary:
cgi - 1200+ requests per minute
mod_perl - 6000+ requests per minute (ModPerl::PerlRun only)
fast_cgi - 6000+ requests per minute
mod_perl - 6000+ requests per minute (ModPerl::Registry)
servlets - 2438 requests per minute.
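For comparison with figures quoted in requests per second, the numbers above can be converted and expressed relative to plain CGI. This is only arithmetic on the quoted values; the "+" figures are treated as lower bounds, and the `summarize` helper is illustrative.

```python
# Requests per minute as quoted above ("1200+" and "6000+" taken as lower bounds).
results_per_minute = {
    "cgi": 1200,
    "mod_perl (ModPerl::PerlRun)": 6000,
    "fast_cgi": 6000,
    "mod_perl (ModPerl::Registry)": 6000,
    "servlets": 2438,
}


def summarize(results, baseline="cgi"):
    """Return {name: (requests_per_second, speedup_over_baseline)}."""
    base = results[baseline]
    return {name: (rpm / 60.0, rpm / base) for name, rpm in results.items()}


if __name__ == "__main__":
    for name, (rps, speedup) in summarize(results_per_minute).items():
        print("%-30s %6.1f req/s  %.2fx cgi" % (name, rps, speedup))
```

On these numbers, plain CGI works out to at least 20 requests per second, and mod_perl and FastCGI to at least 100, roughly five times the CGI figure.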
There is an old thread on PerlMonks comparing mod_perl and fastcgi here: http://www.perlmonks.org/?node_id=108008