Where to host a queue in a rackspace cloud deployment - deployment

I am working on an application that consists out of a asp.net mvc front-end which call a bunch of webservices and those call onto sql server. The actions on the front-end could lead to a very large amount of jobs that need execution which I want to queue somewhere.
Due to the expected load profile of the application it makes sense to use a scalable infrastructure like the rackspace cloud. Now I am wondering where it would be best to queue the jobs. Queueing them on the front-end server means that the number of front-end servers can only be scalled back down once the queues are processed which is a waste of resources if the peak load on the front-end is over we want to scale that down and scale up on machines that process the queue items.
If we queue them on the database server we are adding the load onto a single machine which in the current setup is already the most likely botleneck. How would you design this?

You should read up on event sourcing and CQRS - in particular Greg Youngs 6hr presentation (http://www.viddler.com/v/dc528842) - it's aim is to alleviate the burden of these sorts of issues in a tried and tested way.
hth

Related

Will WebFlux have any bottlenecks in such architecture?

We're currently about to migrate from monolithic design to the microservice architecture, trying to choose the best way to replace JAX-WS with RESTful and considering to use Spring WebFlux.
We currently have an JAX-WS endpoint deployed at Tomcat EE serving requests from third-party clients. Webservice endpoint makes a long running blocking call to the database and then sends a SOAP-response to the client with a data retrieved from DB (Oracle).
Oracle DB will be replaced with one of NoSQL databases soon (possibly it will be MongoDB). Since MongoDB supports asynchronous calls we're considering to substitute current implementation with a microservice exposing REST endpoint based on WebFlux.
We have about 2500 req/sec at peaks, so current endpoint often gets down with a OutOfMemoryError. It was a root cause that pushed us towards migration.
My thoughts are to create a non-blocking endpoint which will call MongoDB in asynchronous manner and send a REST-response to the client. So I have a few questions considering basic features that WebFlux provides:
As far as I concerned there is a built-in backpressure control at
the business-level (not TCP flow control) in WebFlux and it works
generally via Reactive Streams. Since our clients are not
reactive, does it means that such way of a backpressure control is
not implementable here?
Suppose that calls to a new database remains long-running in a new
architecture. Since Netty uses EventLoop to serve incoming
requests, is there possible a situation when the microservice has
accepted all incoming HTTP connections, invoke an async call to the
db and subscribed a resulted Mono to the scheduler, but, since
the request quantity keeps growing explosively, application keep
creating new workers at scheduler pools that leads to a
crashing? Is this a realistic scenario?
Suppose that calls to the database remained synchronous. Is there a
way to handle them using WebFlux in a such way that microservice
will remain reachable under load?
Which bottlenecks can be found in such design? Does this solution
looks adequate?
Does Netty (or Reactor-Netty, or whatever) has a tool to limit a
quantity of requests processing simultaneously? Say I would to limit
the endpoint to serve not more than 100 parallel requests and skip
all requests above that point, is it possible?
Suppose I will create a huge amount of threads serving async (or
maybe sync) calls to the DB. Where is a breaking point when the
application will crash or stop responding to the incoming
HTTP-requests? What will happened there - we will ran out of memory
or..?
Finally, there were no any major issues concerning perfomance during our pilot project. But unfortunately we didn't take in account some specific Linux (and also OpenShift) TCP tuning props.
They may significanly affect the overall perfomance, in our case we've gained about 10 times more requests after tuning.
So pay attention to the net.core.somaxconn and other related parameters.
I've summarized our expertise in the article.

Sample REST Observable service and a remote subscriber client in Java 9/RxJava 2

Here is the background:
We have a cluster (of 3) different services deployed on various containers (like Tomcat, TomEE, JBoss) etc. Each of the services does one thing. Like one service manages a common DB and provides REST services to CRUD the db. One service puts some data into a JMS Queue, Another service reads from the Queue and updates the DB. There is a client app that makes a REST service call to one of the service that sets off creating a row in the db, pushing that row into a queue etc.
Question: We need to implement the client app so that we know at any given point in time where the processing is. How do I implement this in RcJava 2/Java 9?
First, you need to determine what functionality in RxJava 2 will benefit you.
Coordination between asynchronous sources. Since you have a) event-driven requests from one side, and b) network queries on the other sides, this is a good fit so far.
Managing a stream of data, transforming and combining from one or more sources. You have given no indication that this is required.
Second, you need to determine what RxJava 2 does not provide:
Network connections. This is provided by your existing libraries.
Database management. Again, this is provided in your existing solutions.
Now, you have to decide whether the firstlies add up to something you can benefit from, given the up-front costs of learning a new library.

Server Architecture for hosting Java PLAY application in the cloud

This is rather a set of questions than one very specific question. In the last couple weeks/days I puzzled together information regarding how to properly host a JAVA PLAY application "in the cloud", as lots of this information is scattered over different services, I felt like gathering up all these small pieces to one, because lots of things are important to be seen in full context. However, I moved my considerations to the bottom of the question, as they are mainly my opinions and subjective findings, which I don't want to be held responsible for. If I got something wrong, please don't hesitate to point that out.
Hosting Java PLAY + MySQL on AWS for world wide accessibility
Our Scenario: we have a quite straight forward application written within the Java PLAY framework (https://www.playframework.com/), working on iOS and Android as well as with a backend-system (for administration, content management and API), storing data in a MySQL DB. While most of the users' interactions with the server is quick and easy (login, sync some data) there are also some more data-intensive tasks (download some <100mb data zips to the mobile phone, upload a couple of mb to the server). Therefore we were looking for a solution to properly provide users far away from our servers with reasonable response times. The obvious next step was hosting in the cloud.
Hosting setup within AWS:
Horizontal scaling: for the start, only 1 EC2 instance with our app will be running in eu-1a. We will need to evaluate how much resources one instance actually requires, if more instances are needed and if more instances would actually benefit to quicker response times.
Horizontal scaling across regions: once the app generates heavy user load from another region, the whole EC2 instance should be duplicated and put to another region, running a db read replica (see Setting up a globally available web app on amazon web services and https://aws.amazon.com/de/blogs/aws/cross-region-read-replicas-for-amazon-rds-for-mysql/ ).
Vertical scaling of EC2 instances: in recent tests of the old hosting setup, the database proved to be the bottleneck rather than the play app and its server's hardware specifications. Therefore it is not yet fully clear how much vertical scaling would affect response times. If a t2.micro instance serves as good as a m3.xlarge instance, of course we would rather climb our way up from the bottom here.
Vertical scaling of RDS: we will need to estimate how much traffic hits the DB server and what CPU/RAM/etc will be required. Probably we will work our way up here aswell.
Global Redirection: done using Amazon Route 53 (?). A user from Tokio should be redirected to the EC2 instance running in Asia; a user from Rome to the EC2 instance in Europe. This does not only affect API calls within the app, but also content delivery (in both directions).
Open Questions regarding the setup
Is this setup conclusive? Am I missing crucial components?
Regarding global redirection: is Amazon Route 53 the right tool? How does it differ from CloudFront (which strikes me to be purely for content / media distribution?).
How do I define correct data/api endpoints for my app? Of course I don't want to define the database endpoint of a db read replica during app deployment. Will this also happen during the AR53 (question 2) setup? Same goes for API calls, of course the app should direct it's calls to https://myurl.com/api and from there it should be redirected. Is this realistic?
I would highly appreciate all kinds of thoughts (!), also regarding the background info written below. If you can point me to further reading to solve my questions on my own, I am also very thankful - there is simply a huge load of information regarding this, but this makes it hard to narrow the answers down. I do have knowledge in hosting/servers, but I am pretty sure there are true experts out there waiting to slap me with knowledge. :)
Background-Information
Current Hosting Setup: a load balancer distributes the traffic on 2 root linux servers, both of them running the PLAY app, one of them also holding the MySQL installation.
The current hosting setup has 3 big flaws:
No vertical scalability: the hosting company would take money for each scaling step. Currently the servers are running idle, but if the app booms, we could run short on capacity quickly. Running idle is still paid as if permanently under full load. This is expensive!
No deployment support: currently, we connect through SSH, manually deploy the correct folders to the file system, recompile on the server, set privileges, apply database evolutions; do the same for the second server (with different db connection parameters). What could possibly go wrong. ;)
No worldwide availability: to set up another server in another region of the world would mean a huge effort. To have a synchronized replica of our DB can be done, but once again deploying would mean downtime, room for errors and therefore time and money.
Hosting Options for Java PLAY:
There are lot of different blog posts about this. In short:
AWS: Amazon Web Services is one of the first places you start looking. Here you get everything that's possible, at a flexible price. You set yourself up an EC2 instance, a MySQL RDS and you're good to go - all of this in the free tier, so you can experiment, play around, test your stuff.
Microsoft Azure: similar to AWS regarding pricing and possibilities. However, I did not dive into setting up and deploying our application for test purposes.
Heroku: super easy deployment from within PLAY, scalable servers. However (on the first glance?) lacks possibility to supply remote regions with high speed content.
Jelastic: even easier deployment from within PLAY / IntelliJ IDEA. You push your app image to jelastic, jelastic distributes it further to their infrastructure providers.
RedHat OpenShift (https://www.openshift.com/): sounds promising, yet not as complete as AWS.
Lots of choices and possible setups/prices. Especially after finding out about deployment using boxfuse (https://cloudcaptain.sh/) I made my choice for AWS, as it offers absolutely all we need from 1 source. Boxfuse has low monthly costs but is perfectly integrated into AWS. Scaling is supported as well as the 3 common environments (dev/test/prod). Support is outstanding.
The setup looks good. I would however make one change: your large up- & downloads. As mobile speeds may not be ideal, have your app serve long-running requests is something you should avoid as this will needlessly tie up server threads. Instead consider having users upload and download straight from S3 using presigned URLs. You can then later add CloudFront to the mix when it makes financial sense to do so.
R53 will work just fine for picking the best server(s) for each end user.
For EC2 consider having an ELB + Auto-Scaling Group setup. Even just for a single instance you get the benefit of permanent health monitoring and auto-respawns. If you expect more load you can then auto-scale based on your expected bottleneck (cpu, network i/o). This will give you a more autonomous and robust setup than manually having to scale up and down based on your own monitoring analysis (even though the scaling part is very easy if you stick with immutable infrastructure & blue/green deployments like what Boxfuse offers).
Your focus on vertical server scaling might not serve you well on AWS. I would start thinking about horizontal scaling of app servers behind an Elastic Load Balancer, and possibly look into Elastic Beanstalk.
I'm not sure you can setup a read replica in another region via RDS, you might have to set that up via MySQL servers running on standard EC2 instances. And even if you can, that's going to be some expensive and high-latency data transfer.
If file uploads and downloads are all you are worried about, you just need to put CloudFront (Amazon's CDN service) in front of your application, and allow it to handle file uploads and downloads via its global edge servers. You could even do this without moving your entire application into AWS. I would recommend reading this blog post as a start.

Is it a good idea to write a gateway for PostgreSQL in Erlang?

I'm writing a web application in Erlang, and want to store my data to PostgreSQL.
There're two kinds of resources in my application. One kind is very important while the other one is not that important.
For the important one, no data loss is allowed.
For the less important one, data loss due to system failure is ok.
I want to gain maximum efficiency and came up with such an idea: write a gateway for PostgreSQL. The gateway is a gen_server, and business logic (BL) parts can talk to the gateway for storing resources.
For storing important resources, BL parts send the resources to be stored to the gateway, and block to receive a message (success or failure), and finally respond to the user with a web page.
For storing less important resources, BL parts only send the resources to the gateway without blocking. After sending the resources, BL parts respond with a web page directly.
What I'm expecting from this idea is less seconds per request, since most of the resources are less important ones. But I wonder if this is a good idea, or in other words, can I really get what I'm expecting?
Please answer according to your experience or some reliable "web search results". Thanks. :-)
I can see two problems with your proposal:
All messages sent to the gen_server gateway will be serialized (blocking or non-blocking)
If the gen_server gateway process crashes, you will lose at least the messages in the process mailbox.
What I would do is to create a helper module (not a process) that will be responsible for the database interactions. This process would use a postgresql library that supports connection pool (so the calls to DB can have some parallelism).
If you wish to do non-blocking DB operations for the less important resources, just spawn a process to do the DB interaction and move on.
Some links to postgresql erlang libraries (I haven't used any):
Postgresql connection pooling in Erlang
http://zotonic.com/page/519/epgsql-postgresql-driver

Interprocess messaging - MSMQ, Service Broker,?

I'm in the planning stages of a .NET service which continually processes incoming messages, which involves various transformations, database inserts and updates, etc. As a whole, the service is huge and complicated, but the individual tasks it performs are small, simple, and well-defined.
For this reason, and in order to allow for easy expansion in future, I want to split the service into several smaller services which basically perform part of the processing before passing it onto the next service in the chain.
In order to achieve this, I need some kind of intermediary messaging system that will pass messages from one service to another. I want this to happen in such a way that if a link in the chain crashing or is taken offline briefly, the messages will begin to queue up and get processed once the destination comes back online.
I've always used message queuing for this type of thing, but have recently been made aware of SQL Service Broker which appears to do something similar. Is SQLSB a viable alternative for this scenario and, if so, would I see any performance benefits by using that instead of standard Message Queuing?
Thanks
It sounds to me like you may be after a service bus architecture. This would provide you with the coordination and fault tolerance you are looking for. I'm most familiar and partial to NServiceBus, but there are others including Mass Transit and Rhino Service Bus.
If most of these steps initiate from a database state and end up in a database update, then merging your message storage with your data storage makes a lot of sense:
a single product to backup/restore
consistent state backups
a single high-availability/disaster recoverability solution (DB mirroring, clustering, log shipping etc)
database scale storage (IO capabilities, size and capacity limitations etc as per the database product characteristics, not the limits of message store products).
a single product to tune, troubleshoot, administer
In addition there are also serious performance considerations, as having your message store be the same as the data store means you are not required to do two-phase commit on every message interaction. Using a separate message store requires you to enroll the message store and the data store in a distributed transaction (even if is on the same machine) which requires two-phase commit and is much slower than the single-phase commit of database alone transactions.
In addition using a message store in the database as opposed to an external one has advantages like queryability (run SELECT over the message queues).
Now if we translate the abstract terms 'message store in the database as being Service Broker and 'non-database message store' as being MSMQ, you can see my point why SSB will run circles any time around MSMQ.
My recent experiences with both approaches (starting with Sql Server Service Broker) led me to the situation in which I cry for getting my messages out of SQL server. The problem is quasi-political but you might want to consider it: SQL server in my organisation is managed by a specialized DBA while application servers (i.e. messaging like NServiceBus) by developers and network team. Any change to database servers requires painful performance analysis from DBA and is immersed in fear that we might get standard SQL responsibilities down by our queuing engine living in the same space.
SSSB is pretty difficult to manage (not unlike messaging middleware) but the difference is that I am more allowed to screw something up in the messaging world (the worst that may happen is some pile of messages building up somewhere and logs filling up) and I can't afford for any mistakes in SQL world, where customer transactional data live and is vital for business (including data from legacy systems). I really don't want to get those 'unexpected database growth' or 'wait time alert' or 'why is my temp db growing without end' emails anymore.
I've learned that application servers are cheap. Just add message handlers, add machines... easy. Virtually no license costs. With SQL server it is exactly opposite. It now appears to me that using Service Broker for messaging is like using an expensive car to plow potato field. It is much better for other things.