We are working with AWS Redshift DB and would like to create an online replicate (such that is fully updated with changes as well)?
The reason is that we wish to have a separate environment for one of our departments to run their own queries, and as they might "go crazy" and do some super-complex queries (they need free style, and I can't control what will they do), I don't want it to overload the main Redshift and take up all the resources for my main users. A replica will solve it as it will create an environment of their own.
No, you can not have replication between two redshift clusters like we do in mysql.
You can can use WLM in redshift and create query queues.
create user groups and query groups assign more cpu/more memory and more concurrency to production users and assign less concurrency and less memory to your department users so that your production users will not be affected.
Related
We're working on a multi-tenant SaaS solution that has 4 different subscription Plans, so we want to have their data isolated in a single cluster DB (everybody sharing a single logical instance) but with a different schema per tenant (plan), which seems to be better for us from a cost point of view. How should we create it?
Any clue will be welcome!
I am architecting a database where I expected to have 1,000s of tenants where some data will be shared between tenants. I am currently planning on using Postgres with row level security for tenant isolation. I am also using knex and Objection.js to model the database in node.js.
Most of the tutorials I have seen look like this where you create a separate knex connection per tenant. However, I've run into a problem on my development machine where after I create ~100 connections, I received this error: "remaining connection slots are reserved for non-replication superuser connections".
I'm investigating a few possible solutions/work-arounds, but I was wondering if anyone has been able to make this setup work the way I'm intending. Thanks!
Perhaps one solution might be to cache a limited number of connections, and destroy the oldest cached connection when the limit is reached. See this code as an example.
That code should probably be improved, however, to use a Map as the knexCache instead of an object, since a Map remembers the insertion order.
Trying to figure out what would be better:
Multiple instances, one per DB
or
Single large instance which will hold multiple DBs inside
The scenario is similar to Jira Cloud where each customer has his own Jira Cloud server, with its own DB.
Now the question is, will it be better to manage all of the users' DBs in 1 large instance, or to have a DB instance for each customer?
What would be the cons and pros for the chosen alternative?
The first thing that came to or minds is backup management - Will we be able to recover a specific customer's DB if it resides on the same large instance as all other DBs?
Similar question, but in a different scenario and other requirements - 1-big-google-cloud-sql-instance-2-small-google-cloud-sql-instances-or-1-medium
This answer is based on a personal opinion. It is up to you to decide how you want to build your database. However, it is better to go with multiple smaller Cloud SQL instances as it is also stated in Cloud SQL > Best practices documentation.
PROS of multiple databases
It is easier to manage smaller instances rather than big instances. (In documentation provided above)
You can choose the region and zone for each database, so if your customers are located in different geographical locations, you can always choose the closest for them zone for the Cloud SQL instance and this way you will reduce the latency.
Also if you are planning to have a lot of databases, with a lot of tables in each database and a lot of records in each table, this means that the instance will be huge. Therefore the backup, creating read replicas or fail-over replicas and maintaining them, will take some time after the databases will begin to expand.
Although, I would suggest, if you have multiple databases per user, have them inside one Cloud SQL instance that so you manage one Cloud SQL instance per user. e.g. You have 100 users and User1 has 4 databases, User2 has 6 databases etc. Create 100 Cloud SQL instances instead of having one Cloud SQL instance per databases, otherwise you will end up with a lot of them and it will be hard to manage multiple instances per user.
I'd like to preface this by saying I'm not a DBA, so sorry for any gaps in technical knowledge.
I am working within a microservices architecture, where we have about a dozen or applications, each supported by its Postgres database instance (which is in RDS, if that helps). Each of the microservices' databases contains a few tables. It's safe to assume that there's no naming conflicts across any of the schemas/tables, and that there's no sharding of any data across the databases.
One of the issues we keep running into is wanting to analyze/join data across the databases. Right now, we're relying on a 3rd Party tool that caches our data and makes it possible to query across multiple database sources (via the shared cache).
Is it possible to create read-replicas of the schemas/tables from all of our production databases and have them available to query in a single database?
Are there any other ways to configure Postgres or RDS to make joining across our databases possible?
Is it possible to create read-replicas of the schemas/tables from all of our production databases and have them available to query in a single database?
Yes, that's possible and it's actually quite easy.
Setup one Postgres server that acts as the master.
For each remote server, create a foreign server then you then use to create a foreign table that makes the data accessible from the master server.
If you have multiple tables in multiple server that should be viewed as a single table in the master, you can setup inheritance to make all those tables appear like one. If you can define a "sharding" key that identifies a distinct attribute between those server, you can even make Postgres request the data only from the specific server.
All foreign tables can be joined as if they were local tables. Depending on the kind of query, some (or a lot) of the filter and join criteria can even be pushed down to the remote server to distribute the work.
As the Postgres Foreign Data Wrapper is writeable, you can even update the remote tables from the master server.
If the remote access and joins is too slow, you can create materialized views based on the remote tables to create a local copy of the data. This however means that it's not a real time copy and you have to manage the regular refresh of the tables.
Other (more complicated) options are the BDR project or pglogical. It seems that logical replication will be built into the next Postgres version (to be released a the end of this year).
Or you could use a distributed, shared-nothing system like Postgres-XL (which probably is the most complicated system to setup and maintain)
We have a SaaS application where each tenant has its own database in Postgres. How would I apply a patch to all the databses? For example if I want to add a table or add a column to a table, I have to either write a program that loops through all databases and execute a SQL against them or using pgadmin, go through them one by one.
Is there smarter and/or faster way?
Any help is greatly appreciated.
Yes, there's a smarter way.
Don't create a new database for each tenant. If everything is in one database then you only need to alter one database.
Pick one database, alter each table to have the column TENANT and add this to the primary key. Then insert into this database every record for all tenants and drop the other databases (obviously considerably more work than this as your application will need to be changed).
The differences with your approach are extensively discussed elsewhere:
What problems will I get creating a database per customer?
What are the advantages of using a single database for EACH client?
Multiple schemas versus enormous tables
Practicality of multiple databases per client vs one database
Multi-tenancy - single database vs multiple database
If you don't put everything in one database then I'm afraid you have to alter them all individually, and doing it programatically would be simplest.
At a higher level, all multi-tenant applications follow one of three approaches:
One tenant's data lives in one database,
One tenant's data lives in one schema, or
Add a tenant_id / account_id column to your tables (shared schema).
I usually find that developers use the following criteria when they evaluate these different approaches.
Isolation: Since you can put each tenant into its own database in one hand, and have tenants share the same table on the other, this becomes the most apparent dimension. If you provide your users raw SQL access or you're in a regulated industry such as healthcare, you may need strict guarantees from your database. That said, PostgreSQL 9.5 comes with row level security policies that makes this less of a concern for most applications.
Extensibility: If your tenants are sharing the same schema (approach #3), and your tenants have fields that varies between them, then you need to think about how to merge these fields.
This article on multi-tenant databases has a great summary of different approaches. For example, you can add a dozen columns, call them C1, C2, and so forth, and have your application infer the actual data in this column based on the tenant_id. PostgresQL 9.4 comes with JSONB support and natively allows you to use semi-structured fields to express variations between different tenants' data.
Scaling: Another criteria is how easily your database would scale-out. If you create a tenant per database or schema (#1 or #2 above), your application can make use of existing Ruby Gems or [Django packages][1] to simplify app integration. That said, you'll need to manually manage your tenants' data and the machines they live on. Similarly, you'll need to build your own sharding logic to propagate foreign key constraints and ALTER TABLE commands.
With approach #3, you can use existing open source scaling solutions, such as Citus. For example, this blog post describes how to easily shard a multi-tenant app with Postgres.
it's time for me to give back to the community :) So after 4 years, our multi-tenant platform is in production and I would like to share the following observations/experiences with all of you.
We used a database per each tenant. This has given us extreme flexibility as the size of the databases in the backups are not huge and hence we can easily import them into our staging environment for customers issues.
We use Liquibase for database development and upgrades. This has been a tremendous help to us, allowing us to package the entire build into a simple war file. All changes are easily versioned and managed very efficiently. There is a bit of learning curve here an there but nothing substantial. 2-5 days can significantly save you time.
Given that we use Spring/JPA/Hibernate, we use a technique called Dynamic Data Source Routing. So when a user logs-in, we find the related datasource with a lookup and connect them to the session to the right database. That's also when the Liquibase scripts get applied for updates.
This is, for now, I will come back with more later on.
Well, there are problems with one database for all tenants in our case for sure.
The backup file gets huge and becomes almost not practical hard to manage
For troubleshooting, we need to restore customer's data in our dev env, we just use that customer's backup file and usually the file is not as big as if we were to use one database for all customers.
Again, Liquibase has been key in allowing to manage updates across all the tenants seamlessly and without any issues. Without Liquibase, I can see lots of complications with this approach. So Liquibase, Liquibase and more Liquibase.
I also suspect that we would need a more powerful hardware to manage a huge database with large joins across millions of records vs much lighter database with much smaller queries.
In case of problems, the service doesn't go down for everyone and there will be limited to one or few tenants.
In general, for our purposes, this has been a great architectural decision and we are benefiting from it every day. One time we had one customer that didn't have their archiving active and their database size grew to over 3 GB. With offshore teams and slower internet as well as storage/bandwidth prices, one can see how things may become complicated very quickly.
Hope this helps someone.
--Rex