TypeORM: Dynamically set database schema for EntityManager (or repositories) at runtime? - postgresql

Situation:
For our SaaS API we use schema-based multitenancy: every customer (tenant) has its own separate schema within the same (PostgreSQL) database, isolated from other customers. Every schema contains the same underlying entity model.
Every time a new customer registers, a new isolated schema is automatically created in the database. This means the schema is created at runtime and is not known in advance. The customer's schema is named after the customer's domain.
For every request that arrives at our API, we extract the user's tenant affiliation from the JWT and determine which db-schema to use for the requested db-operations for this tenant.
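Mapping a tenant's domain to a schema name has to produce a valid PostgreSQL identifier. Here is a minimal sketch. It assumes the domain comes from an already verified JWT claim. The function name and the naming rules (lower-casing, replacing other characters with underscores, a `t_` prefix) are hypothetical choices, not anything TypeORM prescribes.

```typescript
// Hypothetical helper: map a tenant's domain (e.g. from a verified JWT claim)
// to a PostgreSQL schema name. The naming rules are assumptions, not TypeORM behavior.
const MAX_IDENTIFIER_LENGTH = 63; // PostgreSQL's default limit (NAMEDATALEN - 1)

function schemaForTenantDomain(domain: string): string {
  const trimmed = domain.trim().toLowerCase();
  if (trimmed.length === 0) {
    throw new Error("empty tenant domain");
  }
  // Keep only characters that are valid in an unquoted identifier
  let name = trimmed.replace(/[^a-z0-9_]/g, "_");
  // Unquoted identifiers must start with a letter or an underscore
  if (!/^[a-z_]/.test(name)) {
    name = `t_${name}`;
  }
  // Schema names starting with pg_ are reserved for system schemas
  if (name.startsWith("pg_")) {
    throw new Error(`reserved schema name: ${name}`);
  }
  // PostgreSQL would truncate longer names (with only a NOTICE), so two tenants could collide
  if (name.length > MAX_IDENTIFIER_LENGTH) {
    throw new Error(`schema name too long: ${name}`);
  }
  return name;
}
```

This mapping is not one-to-one: `acme-shop.com` and `acme_shop.com` both become `acme_shop_com`. The registration step would therefore need to check for an existing schema before creating one.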
Problem
After establishing a connection to a (PostgreSQL) database via TypeORM (e.g. using createConnection), our only option for setting the schema of a db-operation is to resort to the createQueryBuilder:
const orders = await this.entityManager
  .createQueryBuilder()
  .select()
  .from(`${tenantId}.orders`, 'order') // <--- setting schema-prefix here
  .where("order.priority = 4")
  .getMany();
This means, we are forced to use the QueryBuilder as it does not seem to be possible to set the schema when working with the EntityManager API (or the Repository API).
However, we want/need to use these APIs, because they are much simpler to write, require less code and are also less error-prone, since they do not rely on writing queries "manually" employing a string-based syntax.
Question
With TypeORM, is it possible to somehow set the db-schema when working with the EntityManager or repositories?
Something like this?
// set schema when instantiating manager
const manager = connection.createEntityManager({ schema: tenantDomain });
// should find all matching "order" entities within schema
const orders = await manager.find(Order, { priority: 4 });
// should find a matching "item" entity within schema using same manager
const item = await manager.findOne(Item, { id: 321 });
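The requested API can be pictured as a thin wrapper that qualifies every table with the schema it was created for. This is a sketch of that idea only. `forSchema()` and `SchemaScopedManager` are hypothetical, since TypeORM has no such API. Table names stand in for entity classes because there is no entity metadata here.

```typescript
// Hypothetical API shape only: forSchema() and SchemaScopedManager do not exist in TypeORM.
// They show what a schema-bound manager would have to do: qualify every table with the schema.
type Where = Record<string, string | number | boolean | null>;

interface SqlQuery {
  sql: string;
  params: unknown[];
}

// Double-quote an identifier, doubling any embedded double quote (PostgreSQL rules)
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

class SchemaScopedManager {
  private readonly schema: string;

  constructor(schema: string) {
    this.schema = schema;
  }

  // Counterpart of the wished-for manager.find(Order, { priority: 4 })
  find(table: string, where: Where = {}): SqlQuery {
    return this.select(table, where);
  }

  // Counterpart of manager.findOne(Item, { id: 321 }): same filter, at most one row
  findOne(table: string, where: Where): SqlQuery {
    const query = this.select(table, where);
    return { sql: `${query.sql} LIMIT 1`, params: query.params };
  }

  private select(table: string, where: Where): SqlQuery {
    const params: unknown[] = [];
    const conditions = Object.entries(where).map(([column, value]) => {
      if (value === null) {
        return `${quoteIdent(column)} IS NULL`;
      }
      params.push(value);
      // Positional placeholders ($1, $2, ...) keep values out of the SQL text
      return `${quoteIdent(column)} = $${params.length}`;
    });
    const whereClause = conditions.length > 0 ? ` WHERE ${conditions.join(" AND ")}` : "";
    return {
      sql: `SELECT * FROM ${quoteIdent(this.schema)}.${quoteIdent(table)}${whereClause}`,
      params,
    };
  }
}

function forSchema(schema: string): SchemaScopedManager {
  return new SchemaScopedManager(schema);
}
```

With TypeORM, such a string could be executed through `manager.query(sql, params)`. The result, however, is raw rows rather than entities, which is exactly the trade-off the question wants to avoid.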
Notes:
The db-schema needs to be set in a request-scoped way to avoid setting the schema for other requests, which may belong to other customers. Setting the schema for the whole connection is not an option.
We are aware that one could create a whole new connection and set the schema for this connection, but we want to reuse the existing connection. So simply creating a new connection to set the schema is not an option.
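At the PostgreSQL level, a request-scoped schema setting corresponds to SET LOCAL, while a connection-wide setting corresponds to plain SET. The sketch below only builds the statement sequence one request would send. The function name is hypothetical and it is not a TypeORM API. It also assumes the statements use unqualified table names, so that search_path decides which schema they resolve to.

```typescript
// Sketch of the PostgreSQL mechanism only; the schema and statements are placeholders.
// SET LOCAL lasts until the surrounding transaction ends. A plain SET would stay on the
// (pooled) connection and leak into the next request that reuses it.
function tenantScopedTransaction(schema: string, statements: string[]): string[] {
  const quotedSchema = `"${schema.replace(/"/g, '""')}"`;
  return [
    "BEGIN",
    `SET LOCAL search_path TO ${quotedSchema}`,
    ...statements,
    "COMMIT",
  ];
}
```

Outside a transaction block, SET LOCAL only emits a warning and has no effect, so the BEGIN/COMMIT pair is essential.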

To answer my own question:
At the moment there is no way to instantiate TypeORM repositories with different schemas at runtime without creating new connections.
So the only two options that a developer is left with for schema-based multi tenancy are:
Setting up new connections to connect to different schemas within the same db at runtime, e.g. see NestJS Request Scoped Multitenancy for Multiple Databases. However, one should definitely strive to reuse connections and be aware of connection limits.
Abandoning the idea of working with the Repository API and reverting to createQueryBuilder (or executing SQL queries via query()).
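For the first option, connection reuse and limits can be handled by a small bounded cache keyed by tenant. This is a minimal sketch under two assumptions. The connection type (anything with an async `close()`) is a hypothetical placeholder, and so is the caller-supplied factory that opens a connection for a given schema. Once the limit is exceeded, the least recently used tenant's connection is closed.

```typescript
// Hypothetical minimal interface: anything with an async close() (e.g. a per-tenant connection).
interface Closable {
  close(): Promise<void>;
}

// Sketch: reuse one connection per tenant schema and keep at most maxConnections open,
// closing the least recently used one when a new tenant pushes the cache over the limit.
class TenantConnectionCache<C extends Closable> {
  private readonly connections = new Map<string, Promise<C>>();
  private readonly maxConnections: number;
  private readonly create: (schema: string) => Promise<C>;

  constructor(maxConnections: number, create: (schema: string) => Promise<C>) {
    if (maxConnections < 1) {
      throw new Error("maxConnections must be at least 1");
    }
    this.maxConnections = maxConnections;
    this.create = create;
  }

  async get(schema: string): Promise<C> {
    const existing = this.connections.get(schema);
    if (existing) {
      // A Map iterates in insertion order, so re-inserting marks this entry as most recently used
      this.connections.delete(schema);
      this.connections.set(schema, existing);
      return existing;
    }

    // Cache the promise itself so concurrent requests for the same tenant share one connection
    const created = this.create(schema);
    this.connections.set(schema, created);
    // Forget failed attempts so that the next request can retry
    created.catch(() => {
      if (this.connections.get(schema) === created) {
        this.connections.delete(schema);
      }
    });

    if (this.connections.size > this.maxConnections) {
      const oldest = this.connections.keys().next().value as string;
      const evicted = this.connections.get(oldest) as Promise<C>;
      this.connections.delete(oldest);
      // Caveat: a request still using the evicted connection would fail; real code needs ref counting
      await evicted.then((connection) => connection.close(), () => undefined);
    }
    return created;
  }
}
```

Each cached connection usually maintains its own pool. The number of server connections is therefore roughly the cache size times the pool size, and that total has to stay below PostgreSQL's `max_connections`.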
For further research, here are some TypeORM GitHub issues that track the idea of changing the schema for existing connections or repositories at runtime (similar to what is requested in the OP):
Multi-tenant architecture using schema. #4786 proposes something like this.photoRepository.useSchema('customer1').find()
Handling of database schemas #3067 proposes something like getConnection().changeDefaultSchema('myschema')
Run-time change of schema #4473
Add an ability to set postgresql schema per call #2439
P.S. If TypeORM decides to support the idea discussed in the OP, I will try to update this answer.

Here is a global overview of the issues with schema-based multitenancy, along with a complete walkthrough and a GitHub repo for it.
Most of the time, you may want to use PostgreSQL row-level security (RLS) policies instead. They give most of the benefits of schema-based multitenancy (especially in developer experience), without the issues caused by multiplying connections.
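To illustrate the row-level security alternative, here is a sketch of the DDL for one shared table. It assumes (hypothetically) two things: every tenant-owned table has a text column `tenant_id`, and the application sets a custom setting `app.current_tenant` per transaction. The policy name is also an illustrative choice.

```typescript
// Sketch only. Assumptions (hypothetical): every tenant-owned table has a text tenant_id column,
// and the application sets a custom setting named app.current_tenant for each transaction.
function rowLevelSecurityDdl(table: string): string[] {
  const quoted = `"${table.replace(/"/g, '""')}"`;
  return [
    `ALTER TABLE ${quoted} ENABLE ROW LEVEL SECURITY`,
    // Without FORCE, the table's owner is not subject to its policies
    `ALTER TABLE ${quoted} FORCE ROW LEVEL SECURITY`,
    // With no WITH CHECK clause, the USING expression also checks inserted and updated rows
    `CREATE POLICY tenant_isolation ON ${quoted} USING (tenant_id = current_setting('app.current_tenant'))`,
  ];
}
```

Superusers and roles with the BYPASSRLS attribute are never subject to policies, so the application should connect as an ordinary role.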

Since commenting does not work for me, here is a hint from the NestJS documentation:
https://docs.nestjs.com/techniques/database#async-configuration
I am not using NestJS, but I am reading the docs at the moment to decide whether it's a good fit for us. We have an app where only some modules are multi-tenant with a schema per tenant, so using TypeOrmModule.forRootAsync(dynamicCreatedDbConfig) might be an option for me too.
This may help you if you have an interceptor or middleware, which prepares the dynamicCreatedDbConfig data before...

Related

multi-tenancy with sequelize and nest.js

I want to implement a multi-tenant solution with one web server and one database shared across all tenants. According to this blog post from AWS, this is the "pooled" multi-tenancy model.
I'm using nest.js and sequelize. If sequelize is not a good fit for this I also could switch to another library like typeORM if necessary.
How can this be implemented? I have no idea how to use a different connection (a different database user) for each HTTP request, and I also don't know a good way to set a runtime context variable for the connection.
Currently, every HTTP request contains a tenant-id header. This should be used for all queries.
There is also the concept of scopes in sequelize. But this is something that is implemented on the client side and not on the database directly. Also, this is something that is specific to sequelize. I would prefer a solution that is independent from sequelize and maybe more specific to PostgreSQL.
Is there any way to implement this with sequelize? A hint or a basic approach would be sufficient.
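Here is one PostgreSQL-level approach that does not depend on Sequelize scopes. Connect with a single ordinary database role. Protect the shared tables with a row-level security policy that compares a `tenant_id` column with `current_setting('app.current_tenant')`. Then set that value at the start of each request's transaction. The sketch below covers only the per-request part. It assumes tenant ids are UUIDs and that headers arrive as a Node-style object with lower-cased names. Both helper names are hypothetical.

```typescript
// Hypothetical request shape: header names are lower-cased, as Node's http module delivers them.
type RequestHeaders = Record<string, string | string[] | undefined>;

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Read and validate the tenant-id header before it gets anywhere near SQL.
function tenantIdFromHeaders(headers: RequestHeaders): string {
  const raw = headers["tenant-id"];
  const value = Array.isArray(raw) ? raw[0] : raw;
  // Assumption: tenant ids are UUIDs; reject anything else
  if (!value || !UUID_PATTERN.test(value)) {
    throw new Error("missing or malformed tenant-id header");
  }
  return value.toLowerCase();
}

// First statement of every request transaction. The third argument (true) makes the setting
// transaction-local, like SET LOCAL, so it is gone after COMMIT or ROLLBACK.
function tenantContextQuery(tenantId: string): { sql: string; params: string[] } {
  return { sql: "SELECT set_config('app.current_tenant', $1, true)", params: [tenantId] };
}
```

Run the returned query inside the request's transaction with whatever query API you use (Sequelize and TypeORM both accept raw SQL with bind parameters). Because the value is transaction-local, it cannot leak to the next request that gets the same pooled connection.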
This approach seems similar: https://learn.microsoft.com/en-us/microsoft-365/education/deploy/design-multi-tenant-architecture.
I'm studying how to create a similar architecture, but I will use the "silo" model ("physical database"). I think you first need to create an internal database called "catalog" that contains the user's information (does this user already have a login? If so, select this information), including previously stored credentials such as the tenant-id. As for Sequelize, I guess it is necessary to use raw queries to run CREATE ROLE, GRANT, CREATE DATABASE, etc., and migrations to create the same DB for each new client.
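For the silo model, the provisioning step described above could send statements like the following for each new client. This is a sketch only. The naming convention `tenant_<id>` / `tenant_<id>_owner` is hypothetical, and the tenant identifier is validated against a strict pattern rather than quoted.

```typescript
// Sketch for the "silo" model: one database per tenant, owned by a per-tenant role.
// The naming scheme (tenant_<id>, tenant_<id>_owner) is a hypothetical convention.
function provisionTenantDatabase(tenant: string): string[] {
  // Validate instead of binding: utility statements like CREATE DATABASE cannot take bind parameters.
  // The length cap keeps tenant_<id>_owner within PostgreSQL's 63-character identifier limit.
  if (!/^[a-z][a-z0-9_]{0,48}$/.test(tenant)) {
    throw new Error(`invalid tenant identifier: ${tenant}`);
  }
  const database = `tenant_${tenant}`;
  const owner = `${database}_owner`;
  return [
    `CREATE ROLE ${owner} NOLOGIN`,
    // CREATE DATABASE cannot run inside a transaction block, so send these one by one
    `CREATE DATABASE ${database} OWNER ${owner}`,
    // New databases grant CONNECT to PUBLIC by default; take that away
    `REVOKE CONNECT ON DATABASE ${database} FROM PUBLIC`,
  ];
}
```

The catalog database would then record which database belongs to which tenant-id, and the migrations would run against the new database.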

JPA: how to map some entities to a different schema of another database instance?

JPA: is there a way to map some entities to a schema of another database instance? e.g.,
@Entity
public class Foo {
}

@Entity
@Table(schema="schema1")
public class Bar {
}
The Bar entity is mapped to the schema1 of the same database instance. Is there a way in JPA to map it to a schema in a remote database instance? It is useful for sharing entities among multiple applications.
Can the "catalog" be used for this purpose?
What do you mean by 'remote database'?
If you use @Table(schema = "myschema", name = "bar"), Hibernate will qualify all queries with the schema name (e.g. SELECT e FROM Bar will ultimately translate to SELECT * FROM myschema.bar). If the database user you're using to connect to the DB has access to myschema.bar (whatever such a DB object is), then the query will work; if not, then the query will fail.
If you mean 'a remote DB that is a separate server', then, of course, you can only connect to the DB using one JDBC connection per persistence context. If that's your scenario, perhaps you should consult the docs of the RDBMS for ways to connect two DB instances (in Oracle, for example, you could use database links and synonyms).
Make sure that you understand the implications, though, as such a solution introduces its own class of problems (including the fact that you suddenly have implicit distributed transactions in your system).
As a side note, I'm not sure how such an approach is 'useful for sharing entities among multiple applications' or why one would even think 'sharing entities among multiple applications' is somehow useful, but I'd seriously think through the idea of integrating multiple applications via shared/linked DBs. It usually introduces more problems than it solves.
If I understand correctly what you mean, you should use two (or more) different persistence contexts.

Create readonly mongoose/mongodb connection or protect schema from modifying/deleting object?

I have an application that connects to an existing database and retrieves some data from it. This app will use the database in read-only mode. Although it is our own code, I would like to add 'fool-proof' protection against other developers (or myself) accidentally modifying or deleting documents in the future. I tried pre hooks, but it looks like there are different remove hooks (query, model, document, etc.), and I couldn't get consistent behavior across all of those types of remove operations.
Is there any appropriate solution to this task?
Create a read-only user and connect through that user:
https://sysadmins.co.za/create-read-only-users-in-mongodb/
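For reference, creating such a user comes down to one createUser command with the built-in read role. The sketch below only builds the command document and a matching connection string. The user, password, host and database names are placeholders, and it assumes authentication is enabled on the server.

```typescript
// Sketch: the createUser command document for a read-only user, plus a matching connection URI.
// User, password, host and database names are placeholders.
interface CreateUserCommand {
  createUser: string;
  pwd: string;
  roles: { role: string; db: string }[];
}

function readOnlyUserCommand(user: string, password: string, database: string): CreateUserCommand {
  // The built-in "read" role allows queries but no inserts, updates or deletes on that database
  return { createUser: user, pwd: password, roles: [{ role: "read", db: database }] };
}

function readOnlyConnectionUri(
  user: string,
  password: string,
  host: string,
  database: string,
  authDatabase: string,
): string {
  // Credentials must be percent-encoded inside a MongoDB connection string
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `mongodb://${credentials}@${host}/${database}?authSource=${encodeURIComponent(authDatabase)}`;
}
```

The command can be run from mongosh on the database that should hold the user, e.g. `db.getSiblingDB("admin").runCommand(...)`, and the resulting URI passed to `mongoose.connect()`. Any write through that connection then fails with an authorization error, whichever Mongoose hook path it takes.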

Trouble with Multi-Tenant Schema Generator Example

We are attempting to use CFE to generate one schema for each tenant as outlined in the CodeFluent blog post (http://blog.codefluententities.com/2014/12/04/multi-tenant-using-multiple-schema/). In this scenario, we expect each generated schema to be identical, and we are using the ICodeFluentPersistence Hook system to identify the company for a user and then set the schema to be used. All of that works fine, but when we run the code to generate the multiple schemas (https://github.com/SoftFluent/CodeFluent-Entities/tree/master/Extensions/SoftFluent.MultiTenantGenerator), it removes the constraints. I then checked whether there was an issue with my configuration, but running the sample program from GitHub produces the same results. After running the sample program, the primary key was not present in the contoso schema, even though it was properly defined in the dbo schema (and in the model).
Has anyone used the CFE Multi-Schema generator or have any insight into what the issue may be?
Thanks for your response, but I am not sure that I agree. The whole reason (at least for me) to use the Multi-Tenant generator is to create as many database schemas as needed (one per client) from a single CFE model. The idea that you would lose the constraints in all but one of them didn't feel right, so I did a bit more investigation and found the following in "Microsoft SQL Server 2012 Internals" by Kalen Delaney and Craig Freeman (through Google Books):
In fact, I was able to do a quick test to prove this by creating two identical tables with identical PK names:
So it would appear to me that CFE should be able to create the two identical databases from the same model, and this seems to point to a deficiency in the SQL Server diff engine.
The multi-schema generator loads the model and changes it dynamically to modify the schema of the entities. Then it calls the standard code production process with only the database producers (SQL Server, Oracle, etc.).
So if you want to generate two different schemas (dbo and contoso) against an empty database, the process is the following:
Generate the database for the dbo schema from a blank database
Generate the database for the contoso schema from the previously generated database
Before creating a constraint, the SQL Server diff engine drops any constraint with the same name. In fact, SQL Server does not allow two constraints to have the same name (I can't find a page on MSDN with more details about that). So in your case, the existing PK is dropped when you generate the contoso schema, because the PK has the same name as the one that exists in the dbo schema. Maybe this can be improved, but the diff engine tries to generate code that works on every version from SQL Server 2000 to SQL Server 2016.
Workarounds
You can generate each schema in a different database, so the diffs engine will generate the code you expect. Then you can run the generated scripts on the production database. Not the easiest way but it should work.
You can use the patch producer to replace the name of the schema in the file. For SQL files, you should use the SqlServerPatchProducer, as explained in the KnowledgeBase:
namespace Sample
{
    public class SqlServerPatchProducer : SqlServerProducer
    {
        public SqlServerPatchProducer()
        {
        }

        protected override void RunProceduresScript()
        {
            string path = GetPath(Project.DefaultNamespace + "_procedures.sql");
            ProduceFrom(path, "before");
            SearchAndReplaceProducer.ProducePatches(Project, null, this, null, ProductionFlags, Element);
            Utilities.RunFileScript(path, Database, OutputEncoding);
            ProduceFrom(path, "after");
        }
    }
}
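The core of the search-and-replace workaround is rewriting the schema qualifier in the generated script. Below is a sketch of that idea in TypeScript. It is not CodeFluent's implementation, and it assumes the producer emits bracketed qualifiers such as `[dbo].[Customer]`.

```typescript
// Sketch of the search-and-replace idea only (not CodeFluent's implementation):
// point every bracketed schema qualifier in a generated T-SQL script at another schema.
function retargetSchema(script: string, fromSchema: string, toSchema: string): string {
  // Escape regex metacharacters in the schema name
  const escaped = fromSchema.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Case-insensitive, since typical SQL Server collations compare identifiers that way
  const pattern = new RegExp(`\\[${escaped}\\]\\.`, "gi");
  // A replacer function avoids special "$" handling in the replacement string
  return script.replace(pattern, () => `[${toSchema}].`);
}
```

Unbracketed references such as `dbo.Customer`, or schema names inside dynamic SQL strings, are not touched by this pattern.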

Change Schema of Entity Framework

I'm using Entity Framework 5 on ASP MVC 4 web site I'm developing.
Because I am using shared hosting that charges by the number of databases I use, I would like to run a test site alongside my production site.
I have two problems:
1) I use Code First and Database Migration. The migration classes seem to embed the schema dbo inside the name of the tables.
How can I change the schema according to the test/production flag?
2) How can I change the schema from which EF select data?
Thank you,
Ido.
Both migration and EF take schema from mapping so if you want to change the schema you must update your mapping to use:
modelBuilder.Entity<MyEntity>().ToTable("MyTable", "MySchema");
and control the value of MySchema from configuration, but this is a really bad idea. One day you will forget to change the value and break your production. Use a local database for development and testing.
As already said: use identical databases (structurally) for development, test and production.
The goal of schemas is to group database objects, like we do with namespaces in e.g. C#, or to simplify permissions for groups of database objects. Not to identify database stages. By using them for the latter you also make it much harder, if not impossible, to use schema appropriately. See for instance this MSDN white paper.
It is much easier to use some database name conventions to indicate their purpose.