Google CloudSQL - Instance per DB or single instance for all DBs? - google-cloud-sql

Trying to figure out what would be better:
Multiple instances, one per DB
or
Single large instance which will hold multiple DBs inside
The scenario is similar to Jira Cloud where each customer has his own Jira Cloud server, with its own DB.
Now the question is, will it be better to manage all of the users' DBs in 1 large instance, or to have a DB instance for each customer?
What would be the cons and pros for the chosen alternative?
The first thing that came to or minds is backup management - Will we be able to recover a specific customer's DB if it resides on the same large instance as all other DBs?
Similar question, but in a different scenario and other requirements - 1-big-google-cloud-sql-instance-2-small-google-cloud-sql-instances-or-1-medium

This answer is based on a personal opinion. It is up to you to decide how you want to build your database. However, it is better to go with multiple smaller Cloud SQL instances as it is also stated in Cloud SQL > Best practices documentation.
PROS of multiple databases
It is easier to manage smaller instances rather than big instances. (In documentation provided above)
You can choose the region and zone for each database, so if your customers are located in different geographical locations, you can always choose the closest for them zone for the Cloud SQL instance and this way you will reduce the latency.
Also if you are planning to have a lot of databases, with a lot of tables in each database and a lot of records in each table, this means that the instance will be huge. Therefore the backup, creating read replicas or fail-over replicas and maintaining them, will take some time after the databases will begin to expand.
Although, I would suggest, if you have multiple databases per user, have them inside one Cloud SQL instance that so you manage one Cloud SQL instance per user. e.g. You have 100 users and User1 has 4 databases, User2 has 6 databases etc. Create 100 Cloud SQL instances instead of having one Cloud SQL instance per databases, otherwise you will end up with a lot of them and it will be hard to manage multiple instances per user.

Related

Can I have multi schema DB on a single AWS Aurora PostgreSQL instance?

We're working on a multi-tenant SaaS solution that has 4 different subscription Plans, so we want to have their data isolated in a single cluster DB (everybody sharing a single logical instance) but with a different schema per tenant (plan), which seems to be better for us from a cost point of view. How should we create it?
Any clue will be welcome!

Postgres architecture for one machine with several apps

I have one machine on which several applications are hosted. Applications work on separated data and don't interact - each application only needs access to its own data. I want to use PostgreSQL as RDBMS. Which one of the following is best and why?
One global Postgres sever, one global database, one schema per application.
One global Postgres server, one database per application.
One Postgres server per application.
Feel free to suggest additional architectures if you think they would be better than the ones above.
The questions you need to ask yourself: does any application ever need to access data from another application (in the same SQL statement). If you can can answer that with a clear NO, then you should at least go for separate databases. Cross-database queries aren't that straight-forward in Postgres, so if the different applications do need a lot of data from other applications, then solution 1 might be deployment layout to think about. If this would only concern very few tables, then using foreign data wrappers with different databases might still be a better solution.
Solution 2 and 3 are more or less the same from the perspective of each application. One thing to keep in mind when deciding between 2 and 3 is availability. Some configuration changes to Postgres require a restart of the service. Is an outage of all applications acceptable in that case, even though the change was only necessary for one?
But you can always start with option 2 and then move database to different servers later.
Another question to ask is if all applications always use the same (major) Postgres version With solution 2 you must make sure that all applications are compatible with a new Postgres version if one of them wants to upgrade e.g. because of new features that the application wants to use.
Solution 1 is stupid : a SQL schema is not a database. Use SQL schema for one application that have multiple "parts" like "Poduction", "sales", "marketing", "finances"...
While the final volume of the data won't be too heavy and the number of user won't be too much, use only one PG cluster to facilitate administration tasks
If the volume of data or the number of user increases, it will be time to separates your different databases on new distinct PG clusters....

Does AWS Redshift support a replica?

We are working with AWS Redshift DB and would like to create an online replicate (such that is fully updated with changes as well)?
The reason is that we wish to have a separate environment for one of our departments to run their own queries, and as they might "go crazy" and do some super-complex queries (they need free style, and I can't control what will they do), I don't want it to overload the main Redshift and take up all the resources for my main users. A replica will solve it as it will create an environment of their own.
No, you can not have replication between two redshift clusters like we do in mysql.
You can can use WLM in redshift and create query queues.
create user groups and query groups assign more cpu/more memory and more concurrency to production users and assign less concurrency and less memory to your department users so that your production users will not be affected.

Postgres Multi-tenant administration/maintenance

We have a SaaS application where each tenant has its own database in Postgres. How would I apply a patch to all the databses? For example if I want to add a table or add a column to a table, I have to either write a program that loops through all databases and execute a SQL against them or using pgadmin, go through them one by one.
Is there smarter and/or faster way?
Any help is greatly appreciated.
Yes, there's a smarter way.
Don't create a new database for each tenant. If everything is in one database then you only need to alter one database.
Pick one database, alter each table to have the column TENANT and add this to the primary key. Then insert into this database every record for all tenants and drop the other databases (obviously considerably more work than this as your application will need to be changed).
The differences with your approach are extensively discussed elsewhere:
What problems will I get creating a database per customer?
What are the advantages of using a single database for EACH client?
Multiple schemas versus enormous tables
Practicality of multiple databases per client vs one database
Multi-tenancy - single database vs multiple database
If you don't put everything in one database then I'm afraid you have to alter them all individually, and doing it programatically would be simplest.
At a higher level, all multi-tenant applications follow one of three approaches:
One tenant's data lives in one database,
One tenant's data lives in one schema, or
Add a tenant_id / account_id column to your tables (shared schema).
I usually find that developers use the following criteria when they evaluate these different approaches.
Isolation: Since you can put each tenant into its own database in one hand, and have tenants share the same table on the other, this becomes the most apparent dimension. If you provide your users raw SQL access or you're in a regulated industry such as healthcare, you may need strict guarantees from your database. That said, PostgreSQL 9.5 comes with row level security policies that makes this less of a concern for most applications.
Extensibility: If your tenants are sharing the same schema (approach #3), and your tenants have fields that varies between them, then you need to think about how to merge these fields.
This article on multi-tenant databases has a great summary of different approaches. For example, you can add a dozen columns, call them C1, C2, and so forth, and have your application infer the actual data in this column based on the tenant_id. PostgresQL 9.4 comes with JSONB support and natively allows you to use semi-structured fields to express variations between different tenants' data.
Scaling: Another criteria is how easily your database would scale-out. If you create a tenant per database or schema (#1 or #2 above), your application can make use of existing Ruby Gems or [Django packages][1] to simplify app integration. That said, you'll need to manually manage your tenants' data and the machines they live on. Similarly, you'll need to build your own sharding logic to propagate foreign key constraints and ALTER TABLE commands.
With approach #3, you can use existing open source scaling solutions, such as Citus. For example, this blog post describes how to easily shard a multi-tenant app with Postgres.
it's time for me to give back to the community :) So after 4 years, our multi-tenant platform is in production and I would like to share the following observations/experiences with all of you.
We used a database per each tenant. This has given us extreme flexibility as the size of the databases in the backups are not huge and hence we can easily import them into our staging environment for customers issues.
We use Liquibase for database development and upgrades. This has been a tremendous help to us, allowing us to package the entire build into a simple war file. All changes are easily versioned and managed very efficiently. There is a bit of learning curve here an there but nothing substantial. 2-5 days can significantly save you time.
Given that we use Spring/JPA/Hibernate, we use a technique called Dynamic Data Source Routing. So when a user logs-in, we find the related datasource with a lookup and connect them to the session to the right database. That's also when the Liquibase scripts get applied for updates.
This is, for now, I will come back with more later on.
Well, there are problems with one database for all tenants in our case for sure.
The backup file gets huge and becomes almost not practical hard to manage
For troubleshooting, we need to restore customer's data in our dev env, we just use that customer's backup file and usually the file is not as big as if we were to use one database for all customers.
Again, Liquibase has been key in allowing to manage updates across all the tenants seamlessly and without any issues. Without Liquibase, I can see lots of complications with this approach. So Liquibase, Liquibase and more Liquibase.
I also suspect that we would need a more powerful hardware to manage a huge database with large joins across millions of records vs much lighter database with much smaller queries.
In case of problems, the service doesn't go down for everyone and there will be limited to one or few tenants.
In general, for our purposes, this has been a great architectural decision and we are benefiting from it every day. One time we had one customer that didn't have their archiving active and their database size grew to over 3 GB. With offshore teams and slower internet as well as storage/bandwidth prices, one can see how things may become complicated very quickly.
Hope this helps someone.
--Rex

MongoDB - Single Database or Multiple Databases for SaaS Offering

We have decided to use MongoDB for a SaaS offering we are creating. Each company that signs up gets their own url (mycompany.domain.com) and their own private set of users, projects, etc... Since we are using a NoSQL solution, and wouldn't have to manage pushing out schema updates to every database like we would with MySQL, I am wondering if it would be better to have one huge database containing all the data, or to have one database per client.
Since MongoDB can shard the database across multiple servers, I'm thinking there wouldn't be a huge performance hit if we had a giant database, but I also think backups and exporting data would be much easier if there was one database per client. Any thoughts?
Go with one but make sure to take advantage of some sort of replication for backup purposes!
Look into sharding or look into replica-sets.