SaaS Architecture Question from Newbie - saas

I have developed a number of departmental client-server applications, and am now ready to begin working on moving one of these applications to a SaaS model. I have done some basic web development, but I'm a newbie when it comes to SaaS architectures.
One of the first questions that comes to mind as I try to design the architecture is the question of single vs. multi tenancy. The pros and cons of each vary significantly depending on the type of application and scale required, so I'd like to describe my application and scale needs below, and hope others can comment on how I should get started with the architecture.
The client-server application currently consists of a Firebird database and a Windows application. The database contains about 20 tables containing a few thousand records in 4 primary tables, and a few hundred records in various lookup and related tables. Although the number of records is small, the size can get large, as the database can contain large BLOBS. Each customer sets up their own database and has a handful of users within the organization connected to it. When I update the db schema, a new windows application is released, and it checks the db schema and then applies the updates as needed.
For the SaaS application, I am designing for 100's (not 1000's or millions) of new customers per year. My first thought was to go with a multi tenancy model to make updates easy (shut down apply the updates to one database, and then start up). On the other hand, a single tenancy model would provide a means to roll updates out to a group of customers at a time, and spread the risk of data corruption - i.e. if something goes wrong with a database, it will impact one customer instead of all customers. With this idea, I was thinking of having a single web front-end which would connect to a single customer database upon login. Thus, when a new customer creates an account, a new database would be created (each customer would have their own db with multiple users as needed for the customer).
In this model, a db update would require either a process to go through each db to apply schema changes, or a trigger upon logging in to initiate a schema update similar to the client-server model currently in use.
Can anyone point me to information for similar applications which have been ported from client-server to SaaS? Or provide any pointers to consider? Basically I'm looking for architecture examples of taking a departmental application and making it available as a self service website for multiple customers. Thanks for any suggestions, resources, etc.

Good questions.
One thing that comes to mind is that if you have multiple databases which you roll out in a staged manner to reduce the likelihood of breaking all of your customers, you will have to address the issue of what to do if the db structure changes. You will either have to be very rigorous with respect to maintaining backward compatibility, or else deploy separate versions of your code base and somehow manage which tenants are associated with which databases.
We are providing our application using a SaaS model as well.
It was, initially a Windows app which worked similar to your multiple database proposal. Upon login, the win app would authenticate against a single "licensee" database which would then respond with connection information for a database specific to that licensee. The nice thing about this was that it provided 1) physical separation of licensee data, which our customers liked and 2) enabled us to physically locate the database on a server geographically closer to the users which both improves performance and avoids some potentially tricky legal and regulatory issues with respect to providing data across country boundaries.
Of course, since the app was a thick client app, we could get away with making database changes and pushing them out to one licensee at a time. When we were ready to upgrade, we could push out an updated thick client in conjunction with the new database - thereby ensuring that the codebase was a match with the database. As long as the common "licensee" authentication database stayed consistent, this worked fairly well.
On the other hand, though, this solution brought with it all of the problems of maintaining and managing a thick client approach which finally lead us down the thing client, browser-based approach.
In our new model, everything is in a single database. When we have updates, we push both the code and the db out at the same time. This solves the problem of keeping the code base consistent with the database structure. However, we are now confronted with the issues mentioned in #s 1 and 2, above, which we have yet to resolve.
I hope this provides some food for thought for you.
I, too, am interested in this question.
Thanks for the post.
-S

Related

How secure is this security method in postgresql?

For example I have 2 databases. One of them is called ecommerce which contains real customer information. Another is called ec1 which basically contains only views from tables of ecommerce.
We use our ec1 database to connect to our website or apps. How secure is this method in terms of back end security?
Only exposing ec1 is better than exposing ecommerce because you can reset ec1 using your "safe" values in case of corruption and you can keep some secret data only stored in ecommerce if it doesn't need to be used by your website or your app.
However, this is only a small portion of backend security. Having two different databases with real data and data views doesn't matter a lot if someone can access your server OR can corrupt your data.
I mean, if someone found a way to get some data he should be not authorized to read, it is bad even if it comes from ec1 and not from ecommerce
So yeah, exposing only views is a BETTER solution, but nothing can be said on the overall security because it mainly doesn't depend on that
EDIT: A detailed explaination of backend security is way beyond the possibility of a simple stackoverflow answer (and probably i am not the best teacher) but for basic server security you must take care of:
- Firewall to stop every request but your webapps ones.
- Updated software
- good database passwords
- The user you use for your application queries must only be able to perform operations on ecl1 database, while the views should be generated with a cron and using a different user
These are the main security enhancement tips that comes to my mind

Meteor -MongoDB - Single Database or Multiple Databases for SaaS Offering

This is from another question bust i think it should be answered by the meteor team because i can't find a straight answer so far.
"..We have decided to use MongoDB for a SaaS offering we are creating. Each company that signs up gets their own url (mycompany.domain.com) and their own private set of users, projects, etc... Since we are using a NoSQL solution, and wouldn't have to manage pushing out schema updates to every database like we would with MySQL, I am wondering if it would be better to have one huge database containing all the data, or to have one database per client..."
So, can i have with meteor aproach (with one meteor project/server):
1) Different Url for each company
2) Different database (in the same monodb server) for each company and for that specific company users.
If you look at meteor's own hosting they use a mongodb server from MongoHQ. You could use multiple meteor servers with the single mongodb server and multiple databases.
I would think it depends more on your apps design, Meteor can use either design.
1) You could use the publish functions to provide each client with only his/her own records from one huge DB, use a way to get the subdomain http host into the publish function so it only gives out data for that set.
2) Use seperate meteor instances connecting to their own mongodb database on one server, and use some kind of proxy to server them to the subdomains. You could push each one with whatever data you would like, even perhaps separate app sets.
It would really depend on what you're building. If you want to only have to update one set of data so it updates for everyone you could go with 1), so if your use case requires this it might be a better option to go with.
The benefit of using seperate meteor instances is primarily customization. Its really hard to get the gist of what you want with the details you've given, so ill cut it short: If you want the ability of each client to be very different use 2), otherwise use 1)
If you look at Meteor.com's hosting I think each deployment is given its own database, the main reason: customization, everyones deployment is likely to be completely different.
UPDATE:
As of March 2014, there is a third party atmosphere package meteor-dbproxy that allows you to use multiple mongodb servers (as well as separate oplog integration endpoints) in your backend, thus allowing you db-level sandboxed multi-tenancy.
From a MongoDB point of view, you can do a database per client. The current stable MongoDB version, 2.2, has database level locking opposed to the large global lock of previous versions.
This way, if one of your clients is hammering the system, they don't affect your other clients with a global lock.

neo4j - graph database along with a relational database?

this is a question on best practice, i understand that there are a lot of different options for doing this, but i would like your opinions as to how you would approach solving this problem. Please take it as though performance is critical in this system, in other words scalable.
I have recently found the wonders of graph database, so i came up with a theoretical situation where a company wants to manage it's customers relationships, and in order to do so they are going to use neo4j which is great, and allows for really great management of the customers, different staff members and their relationships, which is all great, however the company now wants to create a web based interface which will need authentication, and anyone in the neo4j database should be able to login to the system in order to see how they are related to other people in the company's database, so each user must have a password/email/id associated with their name.
So my question is, in this case scenario, is it best to store the password_hash/password_salt/id/email in a mysql database and then based on the node look it up on the mysql database. Or is it better to store the password_hash/password_salt/id/email in the hash tables inside the nodes.
Also each store has 1000s of products, and they can be stored in the graph database or i can store the products in the mysql database and then look up the product there, and do the changes there, because the products are not related to each other, so no point in storing them in the graph database, so should they be not stored there to improve performance?
So my question boils down to this: is it best for large projects to use a graph database along with the more common rdms database such as mysql? if not, then what is the point at which you start to use these two database systems?
apologies in advance for my lack of knowledge regarding database terminology.
Graph DB is mainly used for maintaining relations. If app has a graph DB that does not mean that app needs to store everything in Graph DB.
Every node request on Graph is in memory and thus if you have unnecessary properties in your node it will be bloated and may make things slower and take more memory.I usually decide what needs to go in graph and what needs to go in DB by very simple rule.
High level property (that defines the relation and other important properties that defines the node) goes in graph whereas additional information goes in RDMS.
For example in FB may be FBID, Name goes in Graph as it defines the relationship of one node with another. But when user clicks on someones facebook ID, he/she gets to see other users DOB, Age , College .All these can go in RDBMS.
PS: RDMS has another advantage, it can be used for quick analytics. I know with graph also you can do that but i am not sure if its as scalable and easy as RDBMS.
Downside to this approach is : You need to maintain two DBS.
Unless you have a proven case for a two-DB solution, I'd say fewer moving parts would keep you more agile, more able to change things quickly. If later you find a use case that is difficult, then weigh up the cost/ benefit of introducing a second storage. A two-DB architecture is not unheard of, but comes with an overhead.
Specific to security, there is no reason why Neo4j or any other reasonable NOSQL solution couldn't do that: http://spring.neo4j.org/docs#tutorial_security
You should use both in case there is data where it does not make much sense to store it in a graph DB such as neo4j/orientDB (and some data would be better off in a graph DB as opposed to a relational DB). Forcing data on one platform may cause issues with performance/scalability down the line.

SOA: Joining data across multiple services

Imagine we have 2 services: Product and Order. Based on my understanding of SOA, I know that each service can have its own data store (a separate database, or a group of tables in the same database). But no Service is allowed to touch the data store of another Service directly.
Now, imagine we have stored the product and order data independently inside Product and Order Services. In the Order Service, we can identify products by their ID.
My question is: With this architecture, how can I display the list of orders and product details on the "same" page?
My understanding is that I should get the list of OrderItems from OrderService. Each OrderItem has a ProductID. Now, if I make a separate call to ProductService to retrieve the details about each Product, that would be very inefficient.
How would you approach this problem?
Cheers,
Mosh
I did some research and found 2 different solutions for this.
1- Services can cache data of other Services locally. But this requires a pub/sub mechanism, so any changes in the source of the data should be published so the subscribing Services can update their local cache. This is costly to implement, but is the fastest solution because the Service has the required data locally. It also increases the availability of a Service by preventing it from being dependent to the data of other Services. In other words, if the other Service is not available, it can still do its job by its cache data.
2- Alternatively, a Service can query a "list" of objects from another Service by supplying a list of identifiers. This prevents a separate call to be made to the target service to get the details of a given object. This is easier to implement but performance-wise, is not as fast as solution 1. Also, in case the target Service is not available, the source Service cannot do its job.
Hope this helps others who have come across this issue.
Mosh
DB integration (which is really what you are talking about when two services share a table in a DB) is wrong at so many levels!
It completely breaks some of the major principles of software engineering
loose coupling,
encapsulation
separation of concerns
A service should be (to earn that name) completely independent, namely:
it must not rely on others to ensure the consistency and coherence of its data
it must not rely on others to guaranty the security of its data
it must not depend on external implementations (only interfaces)
Two services that share data at the DB level are unable to guarantee any of the former.
The fact that you "control" both services is completely irrelevant. Today you control... tomorrow you might want to outsource or replace one of the services. That should be as simple as ensuring the proper interfaces are in place.
Imagine both services that share a table with some field (varchar) in it. Now one service needs to change that field to numeric... bang the other service stops functioning - loose coupling goes down the drain.
Most of the time the trick lies in properly defining the service scope and clearly stipulating what a service do and what it doesn't do. You should also avoid turning everything into a service. Set your service granularity to high and services will start popping everywhere and integration headaches will escalate.
That being said, there are some situations where data integration between services poses some challenges. The main premiss do, should always be - data can belong only to one service. Data is intrinsically tied to business logic that affects data consistency and coherence and as such there should never be more than one service controlling any given data.
Another approach would be to have some sort of data source that lives outside of the SOA services. This data source could be considered your cache of the data, your operational data source or even a data warehouse. Extraction packages can export the data from the services (and/or some sort of real time mechanism). You can query this data source how you want.
The advantage of this approach is that the SOA black box is maintained and you can swap out a service knowing how you have coupled it.
Disadvantage is the added complexity and maintenance overhead.
SOA is just a buzz-phrase for deploying components behind web services. How many data stores you have is entirely up to you. In some cases it makes sense to have partitioned data behind individual components, and in other cases all the data lives behind one service, and in other cases many components that expose service interfaces connect to the same database via the database's connection protocol. Approach the problem by approaching the problem, not be imposing artificial constraints.
I don't think there is any principle in SOA that services should have separate data store. In general it is actually impractical. Yes you can have product and order service and the client can do the join using web service call as you said and this may be acceptable in some scenario. But that doesn't mean that you cannot have a specific service for a client if you already know the client's behaviour and performance requirements.
What I mean is that you should have a search service that returns orders and products with the join done in database. This is practical and would solve your business problem.
It is unfortunate to see this whole discussion being deteriorating in a "can I use a shared database or not in SOA" statement quest, which is totally irrelevant and does not help answering the original question at all.
More then often in a real world situation the data is already stored in different systems to start with. Customer data for example is coming from CRM, product data from SAP, contract data from yet another different source.
It is not a quest for getting this data technically together, rather than an understanding there is only one source of the data. To differently put it, there is only one owner of the data within your enterprise, who is solely responsible for maintaining it and ensuring the correct data quality.
Storing data locally for performance reasons means replicating data, which is more than often a dangerous venture, unless you have a solid caching strategy in place. I think Mosh has given some sensible answers when faced with an existing application landscape.

Strategies for "Always-Connected" Windows Client Data Architecture

Let me start by saying: this is my 1st post here, this is a bit lenghty, and I havent done Windows Forms development in years....with that in mind please excuse me if this isn't directly a programming question and please bear with me as I really need the help!!
I have been asked to develop a Windows Forms app for our company that talks to a central (local area network) Linux Server hosting a PostgreSQL database. The app is to allow users to authenticate themselves into the system and thereafter conduct the usual transactions with the PG database. Ordinarily, I would propose writing a webforms app against Mono, but the clients need to utilise local resources such as USB peripheral devices, so that is out of the question. While it might not seem clear, my questions are italised below:
Dilemma #1:
The application is meant to be always connected. How should I structure my DAL/BLL - Should this reside on the server or with the client?
Dilemma #2:
I have been reading up on Client Application Services (CAS), and it seems like a great fit for authentication, as everything is exposed via URIs. I know that a .NET Data Provider exists for PostgreSQL, but not too sure if CAS will all work on a Linux (Debian) server? Believe me, I would get my hands dirty and try myself, but I need to come up with a logical design first before resources are allocated to me for "trial purposes"!
Dilemma #3:
If the DAL/BLL is to reside on the server, is there any way I can create data services, and expose only these services to authenticated clients. There is a (security) requirement whereby a connection string with username and password to the database cannot be present on any client machines...even if security on the database side is quite rigid. I'm guessing that the only way for this to work would be to create the various CRUD data service methods that are exposed by an ASP.NET app, and have the WindowsForms make a request for data or persist data to the ASP.NET app (thru a URI) and have that return a resultset or value. Would I be correct in assuming this? Should I be looking into WCF Data Services? and will WCF work with a non-SQL Server database?
Thank you for taking the time out to read this, but know that I am desperately seeking any advice on this! THANKS A MILLION!!!!
EDIT:
I am considering also using NHibernate as my ORM
Some parts of your questions are complicated and beyond my expertise. However, in general you can do almost anything you put effort into, CAP theorem and the like aside.
DAL/BLL stuff in general can reside in any of the tiers. I put a lot of this in my database and some in the middle tier, however this is to allow re-use in different environments which may or may not be a goal for you. The thing is I would think through carefully the separation of concerns issues here and what sorts of centralization of logic you want to place. The further back, the more re-usable this becomes but this is not always a free tradeoff.
I am not entirely familiar with CAS but it looked like AJAX kinds of stuff from what I saw on the MSDN web site. That could be wrong, but if it is right, then you have an issue in that such requests may be stateless and this could be an issue if you need a constant connection.
On the whole based on what you are saying it sounds cleanest to do a two tier rather than a three tier app, and have the DAL/BLL sit on the client, possibly supported by stored procedures in the server. You can then set PostgreSQL up to authenticate against whatever you use on your network (KRB5 if AD is what I would recommend). This simplifies your data access, and it allows you to control permissions based on the authentication against the database. Since you can authenticate users based on AD, you can then set permissions accordingly.
One important consideration is going to be number of connections. PostgreSQL does have some places where every current connection must be checked and iterated through, and connection startup and tear-down overhead in some cases can be significant. So one important decision will involve connection pooling. Whether or not you use connection pooling to boost performance will depend on what you are doing but I have seen cases where PostgreSQL has handled 600 connections without serious problems.