What is a good way to measure performance and usage on my webserver - webserver

We have an internal web server (intranet) and starting next week we will be under a heavy load. I need to monitor things like users online, page hits, wait time. I can't use outside sites to measure this. I need something that can be loaded internally.

have a look at siege
you can collect some urls using proxy tool or import your google webmasters sitemap extraciton.
then you can run some simluations, specify clients, delys etc.
you can also, if you have external proxies, simulate different users.

Related

What are the limitations of the flask built-in web server

I'm a newbie in web server administration. I've read multiple times that flask built-in web server is not designed for "production", and must be used only for tests and debug...
But what if my app touchs only a thousand users who occasionnaly send data to the server ?
If it works, when will I have to bother with the configuration of a more sophisticated web server ? (I am looking for approximative metrics).
In a nutshell, I would love to find what the builtin web server can do (with approx thresholds) and what it cannot.
Thanks a lot !
There isn't one right answer to this question, but here are some things to keep in mind:
With the right amount of horizontal scaling, it is quite possible you could keep scaling out use of the debug server forever. When exactly you would need to start scaling (or switch to using a "real" web server) would also depend on the environment you are hosting in, the expectations of the users, etc.
The main issue you would probably run into is that the server is single-threaded. This means that it will handle each request one at a time, serially. This means that if you are trying to serve more than one request (including favicons, static items like images, CSS and Javascript files, etc.) the requests will take longer. If any given requests happens to take a long time (say, 20 seconds) then your entire application is unresponsive for that time (20 seconds). This is only the default, of course: you could bump the thread counts (or have requests be handled in other processes), which might alleviate some issues. But once again, it can still be slow under a "high" load. What is considered a "high" load will be dependent on your application and the expectations of a maximum acceptable response time.
Another issue is security: if you are concerned at ALL about security (and not just the security of the data in the application itself, but the security of the box that will be running it as well) then you should not use the development server. It is not ready to withstand any sort of attack.
Finally, the development server could just fail outright. It is not designed to be used as a long-running process (days, weeks, months), and so it has not been well tested to work in this capacity.
So, yes, it has limitations. Yes, you could still conceivably use it in production. And yes, I would still recommend using a "real" web server. If you don't like the idea of needing to install something like Apache or Nginx, you can still go with a solution that is still as easy as "run a python script" by using some of the WSGI Standalone servers, which can run a server that is designed to be in production with something just as simple as running python run_app.py in the command line. You typically just need to create a 4-5 line python script to import and create the server object, point it to your Flask app, and run it.
gunicorn could be run with only the following on the command line, no extra script needed:
gunicorn myproject:app
...where "myproject" is the Python package that contains the app Flask object. Keep in mind that one of developers of gunicorn would probably recommend against this approach. See https://serverfault.com/questions/331256/why-do-i-need-nginx-and-something-like-gunicorn.
The OP has long-since moved on, but for those who encounter this question in the future I would just add that setting up an Apache server, even on a laptop, is free and pretty easy. It can be readily configured for as few or as many features as you want just by uncomment in or commenting out lines in the config file. There might be an even easier GUI method for doing that nowdays, but just editing the configs is simple.

Can I send multiple requests at one time using Fiddler?

Using Fiddler, I want to send multiple requests in one hit, to check the response time from the server, if too many requests are sent at one time. Basically, I want to perform a, kind of, load testing on my service. Is there any way to perform this action? I want to repeat the process of hitting the server, again and again.
In Fiddler, you can repeat a request as many times as you like by hitting SHIFT+R on the selected Web Session. You'll be prompted for a repeat count and then Fiddler will issue the specified number of requests.
Caveat: Having said that, generally speaking, you'd want to use a tool like Telerik Test Studio's Load Test tool for a task like this. Alternatively, you could use Fiddler's Export architecture to generate a script for VS WebTest or Microsoft's free WCAT tool and use those tools to generate the load. You can then run these scripts on multiple machines from multiple networks and generate a more-realistic set of load than you could by simply running on a single client.
I've been load testing with StresStimulus today. Overall, I'm quite impressed.
It's now a standalone application (it used to be a fiddler extension). There's a 7 day free trial which allows up to 50 virtual users. Also, the setup wizard is great for beginners.
For basic load testing the trial should be fine. Consider upgrading for extensive/professional use.

How can I track API usage in Symfony2?

I'm building a RESTful service in Symfony2, using the FOSRestBundle. I can track page usage in web clients using Google Analytics. However, this is is obviously not going to work for requests by non-HTML clients.
Before I embark on installing Redis, writing services, event dispatchers, etc, has this problem already been solved? Is there a solution that doesn't have a serious impact on performance?
Based on stats of the project I'm replacing, I expect around 1,000 hits per hour with 90% of traffic coming from browsers. I won't be in control of the non-HTML clients, so adding tracking there is not an option.
I need the data for the same reason that anybody needs analytics data - to make pretty graphs, and give quantitative evidence about where to focus development resources.
There are a few options, and it depends on exactly how much information you need. If you just want to know basic activity (how many calls to each web service, when, source IP, user agent), then the apache logs already have all that information.
Use CustomLog to add any additional fields you need. E.g. you mentioned the Accept header, which can be added like this:
CustomLog logs/access_log "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" \"%{Accept}i\""
If it is GET, POST, PUT, etc. is contained in the %r part.
If you want to know what is actually in the POST or PUT data, then it is more difficult. The extreme solution is to use the mod_dumpio module. That logs all of the client input (all headers, all cookies, all POST data). If people use your REST API to upload images then, for good or bad, you get the full image in your log. That could get very big.
The solution I favour is to log from PHP: a custom log either at the top of your PHP script, or when you process the requests. You then have full control over what to log, the easiest format to analyze, and you can also put it in context (e.g. log text data, but not logging image bytes). In development and for small sites I do this in parallel with apache logging. If Apache is taking too much CPU, then disable apache logging, or bypass Apache completely. (I'm currently evaluating the built-in webserver in php 5.4 - it has support for routing, so could be very suitable for web services.)
BTW, analyzing server logs is good to do in parallel with Google Analytics: it helps you appreciate the accuracy of each.

How Do I Optimize Zend Framework

I have a application built on Zend Framework I am trying to optimize.
I did some Xdebug profiling and although i cant say i understand every nitty gritty of the results i got, some things were quite obvious from the result.
For instance, the file Bootstrap.php seems to be the one gulping most of the time taking 4,553MS seconds which accounts for 92.49% of the total time.
And if i dig further, I could see that Zend_Application_Bootstrap_Boostrap->run takes the bulk of the time. Checking this out again, I found out that Zend_Controller_Front->Dispatch might actually be the function inside the Boostrap.php that takes time to execute.
Question is, from these indices that i have, how best can I go about Optimizing the application? If it caching, how do i go about applying Caching to this situation?
Thanks
From the look of the callgrinds, on the login page the app is spending most of it's time in curl_exec, which is to be expected if you're doing a remote login. But it is doing 10 separate curl_execs which seems excessive. I'm not familiar with the LinkedIn login auth, but is it possible your app is running the remote login code multiple times?
On the standard page request the app is spending most of its time connecting to MySQL, and it seems to be doing this twice. Are you using a remote DB server, and do you need two separate DB connections?
Assuming you are using a remote DB server and it is on the same network as your web server, there seems to be some networking issue there. I'd check the latency to that server if you can, and try connecting to the IP address instead of a hostname to see if that makes any difference (if doing this is much faster this would suggest an issue with the DNS setup on your web server).

What technical considerations must a system/network administrator worry about when a site gets onto social bookmarking/sharing sites?

The reason I ask is that Stack Overflow has been Slashdotted, and Redditted.
First, what kinds of effect does this have on the servers that power a website? Second, what can be done by system administrators to ensure that their sites remain up and running as best as possible?
Unfortunately, if you haven't planned for this before it happens, it's probably too late and your users will have a poor experience.
Scalability is your first immediate concern. You may start getting more hits per second than you were getting per month. Your first line of defense is good programming and design. Make sure you're not doing anything stupid like reloading data from a database multiple times per request instead of caching it. Before the spike happens, you need to do some fairly realistic load tests to see where the bottlenecks are.
For absurdly high traffic, consider the ability to switch some dynamic pages over to static pages.
Having a server architecture that can scale also helps. Shared hosts generally don't scale. A single dedicated machine generally doesn't scale. Using something like Amazon's EC2 to host can help, especially if you plan for a cluster of servers from the beginning (even if your cluster is a single computer).
You're next major concern is security. You're suddenly a much bigger target for the bad guys. Make sure you have a good security plan in place. This is something you should always have, but it become more important with high usage.
Firstly, ask if you really want to spend weeks and thousands of $ on planning for something that might not even happen, and if it does happen, lasts about 5 hours.
Easiest solution is to have a good way to switch to a page simply allowing a signup. People will sign up and you can email them when the storm has passed.
More elaborate solutions rely on being able to scale quickly. That's firstly a software issue (can you connect to a db on another server, can you do load balancing). Secondly, your hosting solution needs to support fast expansion. Amazon EC2 comes to mind, or maybe slicehost. With both services you can easily start new instances ("Let's move the database to a different server") and expand your instances ("Let's upgrade the db server to 4GB RAM").
If you keep all data in the db (including sessions), you can easily have multiple front-end servers. For the database I'd usually try a single server with the highest resources available, but only because I haven't worked with db replication and it used to be quite hard to do, at least with mysql. Things might have improved.
The app designer needs to think about scaling up (larger machines with more cores and higher performance) and/or scaling out (distributing workload across multiple systems). The IT guy needs to work out how to best support that. The network is what you look at first, because obviously everything rides on top of it. Starting at the border, that usually means network load balancers and redundant routers being served by multiple providers. You can also look at geographic caching services and apps such as cachefly.
You want to reduce your bottlenecks as much as possible. You also want to design the environment such that it can be scaled out as needed without much work. Do the design work up front and it'll mean less headaches when you do get dugg.
Some ideas (of what I used in the past and current projects):
For boosting performance (if needed) you can put a reverse-proxying, caching squid in front of your server. Of course that only works if you don't have session keys and if the pages are somewhat static (means: they change only once an hour or so) and not personalised.
With the squid you can boost a bloated and slow CMS like typo3, thus having the performance of static websites with the comfort of a CMS.
You can outsource large files to external services like Amazon S3, saving your server's bandwidth.
And if you are able to spend some (three-figures per month) bucks, you can as well use a Content Delivery Network. Whith that in place you automatically have scaling, high-availability and low latencys for your users. Of course, your pages must be cachable, so session keys and personalised pages are a no-no. If designed carefully and with CDNs in mind, you can at least cache SOME content, like pics and videos and static stuff.
The load goes up, as other answers have mentioned.
You'll also get an influx of new users/blog comments/votes from bored folks who are only really interested in vandalism. This is mostly a problem for blogs which allow completely anonymous commenting, where some dreadful stuff will be entered. The blog platform might have spam filters sufficient to block it, but manual intervention is frequently required to clean up remaining drivel.
Even a little barrier to entry, like requiring a user name or email address even if no verification is done, will dramatically reduce the volume of the vandalism.