InProc vs. AppFabric session state with single web server - session-state

I have an ASP.Net MVC application which makes significant use of session to save state (including large data collections). At present, it is hosted on a single web server. The session is set to the default of InProc.
An issue arises whereby the application freezes for some users when many users are on line. I guess that this is because InProc session does not scale too well, and that there is only so much memory available to the process. (What happens if memory demand exceeds the available memory - does it swap out to disk?)
I have a couple of solutions in mind that would help with scalability. (a) Sql server session state; (b) Configure session state to use AppFabric caching. The first option looks like a good solution, except that it will affect performance and require stored items to be serializable.
What about configuring session state to use AppFabric caching (aka Velocity) in an environment where the single web server is also used as the cache host? How does this differ from InProc in this single-server environment? Will this provide more scalability and available memory than InProc, or will it essentially amount to the same constraints?

You would be better off implementing AppFabric Cache for your scenario. As your system grows, you can increase the number of cache servers with each new web node - something you cannot easily do with SQL Server without additional cost. SQL Server licensing also costs much more than AppFabric - which is bundled with a Windows Server license.
The only benefit SQL Server will provide is recoverability, but for what you need, it's probably overkill.
See related SO post discussing AppFabric Cache vs. SQL Server for session.
As for AppFabric Cache vs. InProc...
You could put your AppFabric Cache on another server if you are running into memory limitations. You can't do this with InProc.
Here are some other miscellaneous benefits of the AppFabric Cache:
Supports Local Cache to speedup retrieval costs involved in serializing/deserializing.
Provides finer grained controls with respect to cache eviction and expiration policies.
Supports compression of session contents to reduce network bandwidth.
Blob mode versus single item retrieval to improve data retrieval for large objects.
Same session state store can be used across multiple applications (via sharedId).

The most important thing is that the session will survive an app-pool recycle and even a redeploy of the application.
Also AppFabric can serialize IXmlSerializable objects as well as [Serializable]. If you try to use the out-of-proc ASP.NET session service you ironically cannot serialize IXmlSerializable objects such as XElement. You can also do completely customized serialization if you want. With AppFabric your app is much more 'azure' ready if you ever move that way.
Then of course you can use it for caching other data if you have a need for that.

Related

Caching in a Service oriented architecture

In a distributed systems environment, we have a RESTful service that needs to provide high read throughput at low-latency. Due to limitations in the database technology and given its a read-heavy system, we decided to use MemCached. Now, in a SOA, there are atleast 2 choices for the location of the cache, basically client looks up in Cache before calling server vs client always calls server which looks up in cache. In both cases, caching itself is done in a distributed MemCached server.
Option 1: Client -> RESTful Service -> MemCached -> Database
OR
Option 2: Client -> MemCached -> RESTful Service -> Database
I have an opinion but i'd love to hear arguments for and against either option from SOA experts in the community. Please assume either option is feasible, its a architecture question. Appreciate sharing your experience.
I have seen the
Option 1: Client -> RESTful Service -> Cache Server -> Database
working very well. Pros IMHO are that you are able to operate wtih and use this layer in a way allowing you to "free" part of the load on the DB. Assuming that your end-users can have a lot of similar requests and after all the Client can decide what storage to spare for caching. Also how often to clear it.
I prefer Option 1 and I am currently using it. In this way it is easier to control the load on the DB (just as #ekostatinov mentioned). I have lots of data that are required for every user in the system, but the data is never changed (such as some system rules, types of items, etc). It really reduces the DB load. In this way you can also control the behavior of the cache (such as when to clear the items).
Option 1 is the prefered option as it makes memcache an implementation detail of the service. the other option means that if the business changes and things can't be kept in the cache (or other can etc.) the clients would have to change. Option 1 hides all that behind the service interface.
Additionally option 1 lets you evolve the service as you wish. e.g. maybe later you think you need a new technology, maybe you'd solve the performance problem with the DB etc. Again, option 1 lets you make all these changes without dragging the clients into the mess
Is the REST ful API exposed to external consumers. In that case it is up to the consumer to decide if they want to use a cache and how much stale data can they use.
As for as the REST ful service goes, the service is the container of business logic and it is the authority of data, so it decides how much to cache, cache expiry, when to flush etc. A client consuming the REST service always assumes that the service is providing it with the latest data. And hence option 1 is preferred.
Who is the client in this case?
Is it a wrapper for your REST API. Are you providing both client and the service.
I can share my experience with Enduro/X middleware implementation. For local XATMI service calls any client process connects to shared memory (LMDB) and checks the result there. If there is response saved it returns data directly from shm. If data is not there, client process goes the longest path and performs the IPC. In case of REST access, network clients still performs the HTTP invocation, but HTTP server as XATMI client returns the data from shared mem. From real life, this technique was greatly boosting the web frontend web application which used middleware via REST calls.

The benefits of deploying multiple instances for serving/data/cache

although I've much experience writing code. I don't really have much experience deploying things. I am writing a project that uses mongodb for persistence, redis for meta-caching, and play for serving pages. I am deciding whether to buy a dedicated server vs buying multiple small/medium instance from amazon/linode (one for each, mongo, redis, play). I have thought of the trade-offs as below, I wonder if anyone can add to the list or provide further insights. I am leaning toward (b) buying two sets of instances from linode and amazon, so if one of them have an outage it will fail over to the other provider. Also if anyone has any tips for deploying scala/maven cluster or tools to do so, much appreciated.
A. put everything in one instance
Pros:
faster speed between database and page servlet (same host).
cheaper.
less end points to secure.
Cons:
harder to manage. (in my opinion)
harder to upgrade a single module. if there are installation issues, it might bring down the whole system.
B. put each module (mongo,redis,play) in different instances
Pros:
sharding is easier.
easier to create cluster for a single purpose. (i.e. cluster of redis)
easier to allocate resources between module.
less likely everything will fail at once.
Cons:
bandwidth between modules -> $
secure each connection and end point.
I can only comment about the technical aspects (not cost, serviceability, etc ...)
It is not mentioned whether the dedicated instance is a physical box, or simply a large VM. If the application generates a lot of roundtrips to MongoDB or Redis, then the difference will be quite significant.
With a VM, the cost of I/Os, OS scheduling and system calls is higher. These elements tend to represent an important part in the performance cost of efficient remote data stores like MongoDB or Redis, and the virtualization toll is higher for them.
From a system point of view, I would not put MongoDB and Redis/Play on the same box if the MongoDB database is expected to be larger than the available memory. MongoDB maps data files in memory, and relies on the OS to perform memory swapping. It is designed for this. The other processes are not. Swapping induced by MongoDB will have catastrophic consequences on Redis and Play response time if they are all on the same box. So I would at least separate MongoDB from Redis/Play.
If you plan to use Redis for caching, it makes sense to keep it on the same box than the Play server. Redis will use memory, but low CPU. Play will use CPU, but not much memory. So it seems a good fit. Also, I'm not sure it is possible from Play, but if you use a unix domain socket to connect to Redis instead of the TCP loopback, you can achieve about 50% more throughput for free.

Memcached/other key-value engine isolation

I have a bunch of web servers(frontends) behind balancer. Each apache process runs with it's own user for every virtualhost. Code that apache runs is PHP and it's not trusted code.
I need to have shared (between web servers) session storage and limit user(vhost) to only access it's session storage. So I want to avoid one tenant to be able to purge or corrupt memcached stored data.
So I basically looking for solution to authenticate users + create private buckets.
I know there is always MySQL way avaliable but I want to avoid performance penalty introduced by SQL layer.
Any solution in your mind so far?
I found product called CouchBase which fully comply with my requirements. It has buckets along with memcache caching layer and access protocol. It has SASL authentication and a bonus of load balancing and fail tolerance.

Is a HTTP REST request the only way to access Azure Storage?

I've started reading about Azure Storage and it seems that the only way to access it is via an HTTP REST request.
I've seen that there are a few wrappers around these requests, for example, StorageClient (by Microsoft) and cloud storage api (http://cloudstorageapi.codeplex.com/), but they all still use REST in the background (to the best of my understanding).
It seems unreasonable to me that this is actually true. If I have a machine in Azure, and I want to access data stored in Azure Storage, it would seem every inefficient to
Yes, all storage calls are normalized to the REST API. Its actually very efficient when you consider the problem. You are thinking of a machine in Azure and data in azure as stored on two servers sitting in a rack. Remember in Azure, your data, your "servers", etc may be stored in different racks, different zones, and even different datacenters. With the REST API, your apps don't have to care about any of this. They just get the data with the URL.
So while a tiny HTTP overhead may appear inefficient if these were two boxes next to each other, its actually a very elegant solution when they are on different continents. Factor in concepts such as CDN, and it becomes an even better fit.
Layered onto this base concept is the Azure load balancer and other pieces of the internal infrastructure which can further optimize every request because they are all the same (HTTP). I also wouldn't be surprised (not sure at all, I dont work for MSFT) if the LB was doing traffic management optimizations when a request is made intra-datacenter.
Throughput on the storage subsystem in Windows Azure is pretty high. I'd be very surprised if the system cannot deliver to your needs.
There are also many design patterns to increase scalability of your app, like asynch processing, batching requests, delayed processing, etc.

REST service with load balancing

I've been considering the advantages of REST services, the whole statelessness and session affinity "stuff". What strikes me is that if you have multiple deployed versions of your service on a number of machines in your infrastructure, and they all act on a given resource, where is the state of that resource stored?
Would it make sense to have a single host in the infrastructre that utilises a distributed cache, and any state that is change inside a service, it simply fetches/puts to the cache? This would allow any number of deployed services for loading balancing reasons to all see the same state views of resources.
If you're designing a system for high load (which usually implies high reliability), having a single point of failure is never a good idea. If the service providing the consistent view goes down, at best your performance decreases drastically as the database is queried for everything and at worst, your whole application stops working.
In your question, you seem to be worried about consistency. If there's something to be learned about eBay's architecture, it's that there is a trade-off to be made between availability/redundancy/performance vs consistency. You may find 100% consistency is not required and you can get away with a little "chaos".
A distributed cache (like memcache) can be used as a backing for a distributed hashtable which have been used extensively to create scalable infrastructures. If implemented correctly, caches can be redundant and caches can join and leave the ring dynamically.
REST is also inherently cacheable as the HTTP layer can be cached with the appropriate use of headers (ETags) and software (e.g. Squid proxy as a Reverse proxy). The one drawback of specifying caching through headers is that it relies on the client interpreting and respecting them.
However, to paraphrase Phil Karlton, caching is hard. You really have to be selective about the data that you cache, when you cache it and how you invalidate that cache. Invalidating can be done in the following ways:
Through a timer based means (cache for 2 mins, then reload)
When an update comes in, invalidating all caches containing the relevant data.
I'm partial to the timer based approach as its simpler to implement and you can say with relative certainty how long stale data will live in the system (e.g. Company details will be updated in 2 hours, Stock prices will be updated in 10 seconds).
Finally, high load also depends on your use case and depending on the amount of transactions none of this may apply. A methodology (if you will) may be the following:
Make sure the system is functional without caching (Does it work)
Does it meet performance criteria (e.g. requests/sec, uptime goals)
Optimize the bottlenecks
Implement caching where required
After all, you may not have a performance problem in the first place and you may able to get away with a single database and a good back up strategy.
I think the more traditional view of load balancing web applications is that you would have your REST service on multiple application servers and they would retrieve resource data from single database server.
However, with the use of hypermedia, REST services can easily vertically partition the application so that some resources come from one service and some from another service on a different server. This would allow you to scale to some extent, depending on your domain, without have a single data store. Obviously with REST you would not be able to do transactional updates across these services, but there are definitely scenarios where this partitioning is valuable.
If you are looking at architectures that need to really scale then I would suggest looking at Greg Young's stuff on CQS Architecture (video) before attempting to tackle the problems of a distributed cache.