Caching solutions for multi-webserver configuration? - memcached

I am looking into caching solutions, for a multi webserver configuration. Thought of memcached as being cheap (free) and proven over the years. Microsoft is also developing a caching solution for webfarms, called Velocity, but this is still in CTP2.

There is a distributed caching model used in the configuration service that is part of the .NET Stocktrader sample application. This is a framework that allows you to run multiple nodes with centralised configuration management, load balancing and distributed caching. You can implement the configuration service as is or look through the code and grab what suits you. Worth a look.

When I listened to Scott Hanselman's podcast interview with the StackOverflow team, I was left with the impressions that a. they did use some kind of caching and b. they knew almost nothing about what they were doing in this respect and had fiddled with a few options and then written a blog post or two.
They currently seem to use client-side caching rather half-heartedly (short expiry times on images, for example), and I think they use a lot of ASP.NET user-mode caching, and I can't tell if they use IIS kernel-mode caching. (They didn't seem to be able to tell Scott that, either.)
However, the podcast was a while back, and I was driving at the time, so my memory might be wrong and/or out of date.

You should think HARD before bringing in something like memcached.
Caching can hide performance issues from you ("got a slow running query? just cache it and dont worry about fixing it!")
Invalidating stale data out is a nightmare.
You may spend days chasing bugs that get cleared up when you clear the cache, and it pollutes your code base.
I'm not saying don't do it, but think HARD before you do.
If you can get enough performance by adding a couple* of extra machines (which I think stackoverflow can) then do that and don't worry about caching. It'll be much cheaper in the long run.
*note I don't say 100 machines.

Related

Is statebase framework such as Lift Scalable?

I've watch Harry Heymann, from Foursquare, gave a presentation on Lift to BASE usergroup. He mention something about how Lift being statebase isn't going to scale well in that video.
Is that true? If so why is that? Note: I know very little about state base.
I can't seem to find the google, I'll look for it later. Thank you in advance.
When this questions comes up on the Lift Mailing list, what the author of the framework usually replies with is a blog post he wrote some time ago, which explains why Lift is Stateful, but at the same time how you can use Lift as a stateless framework.
This is the link
David Pollack has a good answer to this at this this Quora thread, in a comment to Jackson Davis's answer:
In practice, scaling a Lift site is much much easier than scaling a LAMP site. Why? Well, state exists someplace. If it exists in the JVM, you get a lot of performance benefits and stability as well as, in Lift's case, lots of security. Contrast that with sessions in memcached. "Whoop, memcached went down, there go a pile of sessions." "Whoops, we've got a new memcached hashing algorithm, there go all the session." "Whoops, Google just crawled us creating 200,000 new sessions pushing all the but the active sessions out of cache." "Whoops, the Ruby runtime just went wild, ate all the VM on one of our boxes, memcached went down..." So, you try storing sessions in some wacky shared version of MySQL. This solution requires tons of hardware and a team of make sure that the sharing code is correct, etc. Contrast that to using Nginx, Jetty and session affinity. It's about 4 hours of setup time and it just works. See http://blog.harryh.org/post/7550... So, talk to a Facebook engineer about the challenges they go through to manage state between the front end, memcached, MySQL, etc. Compare that to Twitter with the famous fail whale. Compare that to Apple's store and the iTunes store which are written on WebObject (which is highly stateful.) Lift apps running at scale typically require 7% of the front end resources of LAMP app. The Lift apps that are running at scale (Foursquare and Novell pulse are two) do not have the kind of scaling issues associated with LAMP sites that have similar traffic patterns. Scaling with Lift is neither tricky, nor risky. It's simple. It's known. It's proven. Scaling with LAMP is playing whack-a-mole with state and that only becomes a problem at scale. -David Pollak • Jul 20, 2010
I think it's pretty clear that Lift apps scale very well. Foursquare and the UK Guardian are both using Lift. Both sites are very highly trafficked and neither has had a material Lift-related outage. Please also see the link that Diego posted. It provides an in-depth discussion of scaling Lift-powered sites.

SmartGWT, ZK and GenericFrame - Online Homework

Good day,
Our school, a small high school in semi-rural New Zealand, is currently looking into online homework solutions. Being one of the IT guys, I have been asked to look into some of the options. We have checked around and there are no robust solutions that cover what we are looking for. So, we are considering development of our own system, either on our own or in collaboration with some other schools.
Before I put significant time into any one option, I would thought I should ask for some expert advice.
Please keep in mind that one of our major obstacles is that around 20% of our students are on dial-up because broadband is not available in their area.
We are also not limited to the technologies listed, they just are the ones that we have been looking into up to this point.
With that in mind, here goes.
1. Is there a way to pre-determine the bandwidth needed for these technologies?
2. If bandwidth continued to be too limiting, could the final solution stand alone so we could distribute it to students on CD or USB stick?
3. What are some pros/cons of each for use with databases, specifically mysql or postgresql? (After all we do need to keep track of lots of data)
4. What are some pros/cons of each for of these RIA development?
I appreciate everyone for sharing their time and expertise on the matter.
Cheers,
Ben
1) If you write full-AJAX application, such as in GWT, the bandwitch will be:
a) the size of application java script, images, etc., you may consider that everything is loaded when user logs in (cache for images may seems to be big, but it's easily overloaded)
b) the size of communication - in GWT it depends only from you! no magic full-frame reloading, sending is only what YOU are wanting to send
2) I do not catch your point, stand alone applications can be distributed such way, applications that use databases generally can't
3) postgresql has high compatibility with Oracle - same transaction+select for update behaviour, pgPLSQL is highly inspired by PL/SQL (easy to rewrite stored procedures).
I personally suggest MySQL for a school project for its simplicity. PostgreSQL is powerful but a bit complicate to configure and the visual tool for optimizing queries not good.
Without considering the bandwidth, I definitely suggest ZK since, again, it is much easier to learn, to develop and to maintain (also much more powerful). The bandwidth consumption and latency of GWT really depends how much effort you want to invest, and how skillful your people are familiar with distributed computing, while the network bandwidth is basically the states of UI (not data), which is reasonably small. In short, you could have the best network bandwidth and latency if you optimize it at the best with GWT, while ZK is less to worry but, if you want to improve, you have to use jQuery (i.e, in JavaScript).
Thanks lechlukasz, I appreciate your comments and insight.
I will clarify my point about stand alone applications. We have a number of students, as high as 20%, who do not have access to broadband due to their geographic location. We are considering, as part of the design, how we may be able to distribute a stand alone version.
For instance, if we were to abstract all the database calls using a separate class in GWT, we could recompile a stand alone version that didn't make the database calls. The database would likely only be for tracking results and reporting.
In reality, we would likely implement the front end product first with references to empty methods for storing the results in a database and implement those methods at a later time.
For the record, we have started to code up some test cases using GWT/SmartGWT and are pleased with the results. Although we cannot comment on the other technologies considered because we didn't try them to the same extent, we are pleased with the results to this point of the project.
Cheers,
Ben

When and how to go about performing caching in asp.net mvc?

UPDATE : I also found ncache which seems useful and also came to know that stackoverflow uses redis for caching. I also have come across memcached and seems one of the better alternatives.
I have found this but I needed to know what are the ways in which I can cache some of my LINQ queries and use them efficiently. I found Output cache in asp.net mvc are there other ways to do caching?
I am kind of a newbie and never done caching before so I would appreciate if anyone can point me in the right direction here? Mainly I want answer to when would be caching necessary and how to go about doing caching in asp.net mvc ?
In my experience application level caching is rarely the correct approach to fixing performance issues and it nearly always causes more problems than it solves.
Before you embark on any caching you should first:
(i) profile your application and the queries it makes to see if you can address them directly (e.g. query patterns that are too wide (fetching columns that aren't displayed) or too deep (e.g. fetching more rows than you display), too frequent (lazy loading might be causing more round trips than you need), too expensive (poor table design might mean more joins than you need), or the tables themselves might not be indexed correctly;
(ii) take a holistic look at your web site and the user experience to see how you can improve the perceived performance (e.g. set proper browser-level cache cache headers on static content). Using AJAX might and a paged grid view like jQGrid might eliminate many database accesses while a user is paging through records because the rest of the page content is not changing.
After you've exhausted fixing the real problem you may then then be ready to consider caching.
But before you do, make a simple calculation: how much does an upgraded server cost vs the development and testing time you will spend implementing caching and tracking down odd stale-cache issues? Sometimes it's cheaper just to upgrade...

Risk evaluation for framework selection

I'm planning on starting a new project, and am evaluating various web frameworks. There is one that I'm seriously considering, but I worry about its lasting power.
When choosing a web framework, what should I look for when deciding what to go with?
Here's what I have noticed with the framework I'm looking at:
Small community. There are only a few messages on the users list each day
No news on the "news" page since the previous release, over 6 months ago
No svn commits in the last 30 days
Good documentation, but wiki not updated since previous release
Most recent release still not in a maven repository
It is not the officially sanctioned Java EE framework, but I've seen several people mention it as a good solution in answers to various questions on Stack Overflow.
I'm not going to say which framework I'm looking at, because I don't want this to get into a framework war. I want to know what other aspects of the project I should look at in my evaluation of risk. This should apply to other areas besides just Java EE web, like ORM, etc.
I'll say that so-called "dead" projects are not that great a danger as long as the project itself is solid and you like it. The thing is that if the library or framework already does everything you can think you want, then it's not such a big deal. If you get a stable project up and running then you should be done thinking about the framework (done!) and focus only on your webapp. You shouldn't be required to update the framework itself with the latest release every month.
Personally, I think the most important point is that you find one that is intuitive to your project. What makes the most sense? MVC? Should each element in the URL be a separate object? How would interactivity (AJAX) work? It makes no sense to pick something just because it's an "industry standard" or because it's used by a lot of big-name sites. Maybe they chose it for needs entirely different from yours. Read the tutorials for each framework and be critical. If it doesn't gel with your way of thinking, or you have seen it done more elegantly, then move on. What you are considering here is the design and good design is tantamount for staying flexible and scalable. There's hundreds of web frameworks out there, old and new, in every language. You're bound to find half a dozen that works just the way you want to think in your project.
Points I consider mandatory:
Extensible through plug-ins: check if there's already plug-ins for various middleware tasks such as memcache, gzip, OpenID, AJAX goodness, etc.
Simplicity and modularity: the more complex, the steeper the learning curve and the less you can trust its stability; the more "locked" to specific technologies, the higher the chances that you'll end up with a chain around your ankle.
Database agnostic: can you use sqlite3 for development and then switch to your production DB by changing a single line of code or configuration?
Platform agnostic: can you run it on Apache, lighttpd, etc.? Could you port it to run in a cloud?
Template agnostic: can you switch out the template system? Let's say you hire dedicated designers and they really want to go with something else.
Documentation: I am not that strict if it's open-source, but there would need to be enough official documentation to enable me to fully understand how to write my own plug-ins, for example. Also look to see if there's source code of working sites using the same framework.
License and source code: do you have access to the source code and are you allowed to modify it? Consider if you can use it commercially! (Even if you have no current plans to do that currently.)
All in all: flexibility. If I am satisfied with all four points, I'm pretty much done. Notice how I didn't have anything about "deadness" in there? If the core design is good and there's easily installable plug-ins for doing every web-dev 3.0-beta buzzword thing you want to do, then I don't care if the last SVN commit was in 2006.
Here are the things I look for in a framework before I decide to use it for a production environment project:
Plenty of well laid out and written documentation. Bad documentation just means I'm wasting time trying to find how everything works. This is OK if I am playing around with some cool new micro framework or something else, but not when it's for a client.
A decently sized community so that you can ask questions, etc. A fun and active IRC channel is a big plus.
Constant iteration of the product. Are bugs being closed or opened on a daily/weekly basis? Probably a good sign.
I can go through the code of the framework and understand what's going on. Good framework code means that the projects longterm life has a better chance of success.
I enjoy working with it. If I play with it for a few hours and it's the worst time of my life, I sure as hell won't be using it for a client.
I can go on, but those are some primary ones off the top of my head.
Besides looking at the framework, you also need to consider a lot of things about yourself (and any other team members) when evaluating the risks:
If the framework is a new, immature, "bleeding-edge" framework, are you going to be willing and able to debug it and fix or work around whatever problems you encounter?
If there is a small community, you'll have to do a lot of this debugging and diagnosis yourself. Will you have time to do that and still meet whatever deadlines you may have?
Have you looked at the framework yourself to determine how good it is, or are you willing to rely on what others say about it? Why do you trust their judgment?
Why do you want to use this rather than the "officially sanctioned Java EE framework"? Is it a pragmatic reason, or just a desire to try something new?
If problems with the framework cause you to miss deadlines or deliver a poor product, how will you talk about it with your boss or customer?
All the signs you've cited could be bad news for your framework choice.
Another thing that I look for are books available at Amazon and such. If there's good documentation available, it means that authors believe it has traction and you'll be able to find users that know it.
The only saving grace I can think of is relative maturity. If the framework or open source component is mature, there's a chance that it does the job as written and doesn't require further extension.
There should still be a bug tracker with some evidence of activity, because no software is without bugs (except for mine). But it need not be a gusher of requests in that case.

How do I plan an enterprise level web application?

I'm at a point in my freelance career where I've developed several web applications for small to medium sized businesses that support things such as project management, booking/reservations, and email management.
I like the work but find that eventually my applications get to a point where the overhear for maintenance is very high. I look back at code I wrote 6 months ago and find I have to spend a while just relearning how I originally coded it before I can make a fix or feature additions. I do try to practice using frameworks (I've used Zend Framework before, and am considering Django for my next project)
What techniques or strategies do you use to plan out an application that is capable of handling a lot of users without breaking and still keeping the code clean enough to maintain easily?
If anyone has any books or articles they could recommend, that would be greatly appreciated as well.
Although there are certainly good articles on that topic, none of them is a substitute of real-world experience.
Maintainability is nothing you can plan straight ahead, except on very small projects. It is something you need to take care of during the whole project. In fact, creating loads of classes and infrastructure code in advance can produce code which is even harder to understand than naive spaghetti code.
So my advise is to clean up your existing projects, by continuously refactoring them. Look at the parts which were a pain to change, and strive for simpler solutions that are easier to understand and to adjust. If the code is even too bad for that, consider rewriting it from scratch.
Don't start new projects and expect them to succeed, just because your read some more articles or used a new framework. Instead, identify the failures of your existing projects and fix their specific problems. Whenever you need to change your code, ask yourself how to restructure it to support similar changes in the future. This is what you need to do anyway, because there will be similar changes in the future.
By doing those refactorings you'll stumble across various specific questions you can ask and read articles about. That way you'll learn more than by just asking general questions and reading general articles about maintenance and frameworks.
Start cleaning up your code today. Don't defer it to your future projects.
(The same is true for documentation. Everyone's first docs were very bad. After several months they turn out to be too verbose and filled with unimportant stuff. So complement the documentation with solutions to the problems you really had, because chances are good that next year you'll be confronted with a similar problem. Those experiences will improve your writing style more than any "how to write good" style guide.)
I'd honestly recommend looking at Martin Fowlers Patterns of Enterprise Application Architecture. It discusses a lot of ways to make your application more organized and maintainable. In addition, I would recommend using unit testing to give you better comprehension of your code. Kent Beck's book on Test Driven Development is a great resource for learning how to address change to your code through unit tests.
To improve the maintainability you could:
If you are the sole developer then adopt a coding style and stick to it. That will give you confidence later when navigating through your own code about things you could have possibly done and the things that you absolutely wouldn't. Being confident where to look and what to look for and what not to look for will save you a lot of time.
Always take time to bring documentation up to date. Include the task into development plan; include that time into the plan as part any of change or new feature.
Keep documentation balanced: some high level diagrams, meaningful comments. Best comments tell that cannot be read from the code itself. Like business reasons or "whys" behind certain chunks of code.
Include into the plan the effort to keep code structure, folder names, namespaces, object, variable and routine names up to date and reflective of what they actually do. This will go a long way in improving maintainability. Always call a spade "spade". Avoid large chunks of code, structure it by means available within your language of choice, give chunks meaningful names.
Low coupling and high coherency. Make sure you up to date with techniques of achieving these: design by contract, dependency injection, aspects, design patterns etc.
From task management point of view you should estimate more time and charge higher rate for non-continuous pieces of work. Do not hesitate to make customer aware that you need extra time to do small non-continuous changes spread over time as opposed to bigger continuous projects and ongoing maintenance since the administration and analysis overhead is greater (you need to manage and analyse each change including impact on the existing system separately). One benefit your customer is going to get is greater life expectancy of the system. The other is accurate documentation that will preserve their option to seek someone else's help should they decide to do so. Both protect customer investment and are strong selling points.
Use source control if you don't do that already
Keep a detailed log of everything done for the customer plus any important communication (a simple computer or paper based CMS). Refresh your memory before each assignment.
Keep a log of issues left open, ideas, suggestions per customer; again refresh your memory before beginning an assignment.
Plan ahead how the post-implementation support is going to be conducted, discuss with the customer. Make your systems are easy to maintain. Plan for parameterisation, monitoring tools, in-build sanity checks. Sell post-implementation support to customer as part of the initial contract.
Expand by hiring, even if you need someone just to provide that post-implementation support, do the admin bits.
Recommended reading:
"Code Complete" by Steve Mcconnell
Anything on design patterns are included into the list of recommended reading.
The most important advice I can give having helped grow an old web application into an extremely high available, high demand web application is to encapsulate everything. - in particular
Use good MVC principles and frameworks to separate your view layer from your business logic and data model.
Use a robust persistance layer to not couple your business logic to your data model
Plan for statelessness and asynchronous behaviour.
Here is an excellent article on how eBay tackles these problems
http://www.infoq.com/articles/ebay-scalability-best-practices
Use a framework / MVC system. The more organised and centralized your code is the better.
Try using Memcache. PHP has a built in extension for it, it takes about ten minutes to set up and another twenty to put in your application. You can cache whatever you want to it - I cache all my database records in it - for every application. It does wanders.
I would recommend using a source control system such as Subversion if you aren't already.
You should consider maybe using SharePoint. It's an environment that is already designed to do all you have mentioned, and has many other features you maybe haven't thought about (but maybe you will need in the future :-) )
Here's some information from the official site.
There are 2 different SharePoint environments you can use: Windows Sharepoint Services (WSS) or Microsoft Office Sharepoint Server (MOSS). WSS is free and ships with Windows Server 2003, while MOSS isn't free, but has much more features and covers almost all you enterprise's needs.