Multiple databases in EF6 - entity-framework

We are in the middle of a fairly new development effort in which we are rebuilding our current web shop platform.
The current platform uses neither EF6 nor any other ORM; it accesses the database through stored procedures. The new platform will use EF6.
We have a question about the database design of the new platform. The current platform uses several different databases, split by the content they hold.
For example, we have dedicated databases for product catalogs and another dedicated database for handling orders.
Currently all data access goes through stored procedures, so links between different databases are not a problem.
The problem appeared when we started using EF6. Each database is associated with its own context, and one context cannot see data from another
unless we implement those relationships directly in source code, using several contexts. That looks like losing much of the power of EF6.
Our questions are:
Is it bad design to keep several databases for the same application when using EF6?
If it is poor design and we choose a single database, will performance be acceptable with hundreds of tables (almost 1000) and several terabytes of data?
On the other hand, if we opt for the design with several databases (which would suit us much better), what is the best way to handle them in EF6?
Thank you very much for your help!

First of all, EF is not designed to work across databases. You can't write cross-database (cross-context) queries, lazy loading does not work across contexts, and so on.
This is a big limitation in your case.
EF can work with several schemas (I don't use that feature and I don't like it, but that is just my opinion).
You can use your stored procedures with EF, but as I understand it you are planning to stop using them.
In my experience, I have written several applications that used more than one database, but the use of the additional databases was very limited. In those cases I used cross-database views (e.g. one database per company, plus some common tables exposed through views in the company databases that select from the common tables). In your case, if the related tables are spread across all the databases, I don't think this is an option.
So, in my opinion, you could change the approach.
If you have backup problems, you could shard the huge tables (I am thinking of fact tables and tables with pictures) and create cross-database views. BTW, cross-database referential integrity is not supported in SQL Server, so you need to write triggers to enforce it.
If you need to separate different application functions (e.g. WMS, CRM and so on), you can use namespaces without worrying about how the tables are stored in the DB.
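To make the cross-context limitation concrete, here is a minimal sketch of what "implementing the relationship in source code" looks like. It is written in Python with the standard sqlite3 module rather than EF6, and two in-memory databases stand in for the catalog and orders databases. The table names, column names, and the order_lines_with_names helper are hypothetical, chosen only for illustration.

```python
import sqlite3

# Two independent in-memory databases stand in for the separate
# catalog and orders databases (hypothetical schema).
catalog = sqlite3.connect(":memory:")
orders = sqlite3.connect(":memory:")

catalog.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
catalog.executemany(
    "INSERT INTO product VALUES (?, ?)", [(1, "Keyboard"), (2, "Mouse")]
)

orders.execute(
    "CREATE TABLE order_line (order_id INTEGER, product_id INTEGER, qty INTEGER)"
)
orders.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)",
    [(100, 1, 2), (100, 2, 1), (101, 2, 5)],
)


def order_lines_with_names(order_id):
    """Resolve the cross-database relationship in application code."""
    # Step 1: read the order lines from the orders database.
    lines = orders.execute(
        "SELECT product_id, qty FROM order_line "
        "WHERE order_id = ? ORDER BY product_id",
        (order_id,),
    ).fetchall()
    if not lines:
        return []
    # Step 2: fetch the referenced products from the catalog database.
    ids = [product_id for product_id, _ in lines]
    placeholders = ",".join("?" * len(ids))
    names = dict(
        catalog.execute(
            f"SELECT id, name FROM product WHERE id IN ({placeholders})", ids
        ).fetchall()
    )
    # Step 3: join the two result sets in memory.
    return [(names.get(product_id), qty) for product_id, qty in lines]
```

Every relationship that crosses a database boundary needs this kind of two-step lookup, and nothing in either database enforces that product_id actually exists in the catalog.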

Related

Asp Net Boilerplate - Setup Schema-Per-Tenant Multitenancy (EntityFrameworkCore & PostgreSQL)

We are looking into using Asp Net Boilerplate. Looks very promising. We love the framework, but we would like to be able to use a per-schema Multitenancy configuration. Instead of sharing the data in the same db & tables, each tenant would "have" a schema, in which the whole database structure would be replicated.
One of our data tables will be quite big (sometimes more than 1 million entries per tenant), and we were advised that, for performance reasons, it's better to keep the number of entries as low as possible. This particular table will also be queried and inserted into very often. It would be unrealistic for this table to hold data for 40+ tenants. For that reason, and others, we would prefer to have a distinct schema per tenant.
Our DB is a single PostgreSQL server (might scale up to more in the future). We use EntityFramework & Npgsql. We already noticed that it is possible to set up a different ConnectionString for specific tenants that would have bigger data requirements.
http://www.summa.com/blog/2013/09/17/approaches-to-multi-tenancy See separate schema per tenant
Any idea on how to achieve schema-per-tenant multitenancy? There are a lot of moving parts in this, and I'm not sure where to start.

Postgres Multi-tenant administration/maintenance

We have a SaaS application where each tenant has its own database in Postgres. How would I apply a patch to all the databases? For example, if I want to add a table or add a column to a table, I have to either write a program that loops through all the databases and executes SQL against each of them, or go through them one by one in pgAdmin.
Is there a smarter and/or faster way?
Any help is greatly appreciated.
Yes, there's a smarter way.
Don't create a new database for each tenant. If everything is in one database then you only need to alter one database.
Pick one database, alter each table to add a TENANT column, and add that column to the primary key. Then insert every record for all tenants into this database and drop the other databases (obviously this is considerably more work than it sounds, as your application will need to be changed too).
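As a rough sketch of that layout, the example below uses Python's sqlite3 module with an in-memory database in place of Postgres. The invoice table, its columns, and the invoices_for helper are hypothetical. The point is only that the tenant column is part of the primary key, so the same id can exist once per tenant, and that every query filters on the tenant.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the single shared database

# The tenant column is part of the primary key (hypothetical table).
db.execute(
    """
    CREATE TABLE invoice (
        tenant TEXT    NOT NULL,
        id     INTEGER NOT NULL,
        amount REAL    NOT NULL,
        PRIMARY KEY (tenant, id)
    )
    """
)
db.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?)",
    [("acme", 1, 10.0), ("acme", 2, 20.0), ("globex", 1, 99.0)],
)


def invoices_for(tenant):
    """Every query must filter on the tenant column."""
    return db.execute(
        "SELECT id, amount FROM invoice WHERE tenant = ? ORDER BY id",
        (tenant,),
    ).fetchall()
```

Both "acme" and "globex" own an invoice with id 1, while a second ("acme", 1) row is rejected by the composite key.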
The differences with your approach are extensively discussed elsewhere:
What problems will I get creating a database per customer?
What are the advantages of using a single database for EACH client?
Multiple schemas versus enormous tables
Practicality of multiple databases per client vs one database
Multi-tenancy - single database vs multiple database
If you don't put everything in one database, then I'm afraid you have to alter them all individually, and doing it programmatically would be simplest.
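A minimal sketch of the programmatic route, assuming a known list of tenant databases. In-memory SQLite databases stand in for the per-tenant Postgres databases here; the tenant names, the customer table, and the apply_patch helper are hypothetical. With Postgres you would open one connection per database through your driver of choice and run the same loop.

```python
import sqlite3

# Hypothetical tenant registry: one (in-memory) database per tenant.
tenants = {
    name: sqlite3.connect(":memory:") for name in ("acme", "globex", "initech")
}
for conn in tenants.values():
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

PATCH = "ALTER TABLE customer ADD COLUMN email TEXT"


def apply_patch(connections, sql):
    """Run one statement against every tenant database and collect failures."""
    failed = {}
    for name, conn in connections.items():
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(sql)
        except sqlite3.Error as exc:
            # Keep going so one broken tenant does not block the others.
            failed[name] = str(exc)
    return failed


failures = apply_patch(tenants, PATCH)
```

Collecting failures per tenant instead of aborting on the first error makes it easier to see which databases still need attention after a run.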
At a higher level, all multi-tenant applications follow one of three approaches:
1. One tenant's data lives in one database,
2. One tenant's data lives in one schema, or
3. Add a tenant_id / account_id column to your tables (shared schema).
I usually find that developers use the following criteria when they evaluate these different approaches.
Isolation: Since you can put each tenant into its own database on the one hand, and have tenants share the same table on the other, this becomes the most apparent dimension. If you provide your users raw SQL access or you're in a regulated industry such as healthcare, you may need strict guarantees from your database. That said, PostgreSQL 9.5 comes with row level security policies that make this less of a concern for most applications.
Extensibility: If your tenants are sharing the same schema (approach #3), and your tenants have fields that vary between them, then you need to think about how to merge these fields.
This article on multi-tenant databases has a great summary of different approaches. For example, you can add a dozen columns, call them C1, C2, and so forth, and have your application infer the actual data in these columns based on the tenant_id. PostgreSQL 9.4 comes with JSONB support and natively allows you to use semi-structured fields to express variations between different tenants' data.
Scaling: Another criterion is how easily your database would scale out. If you create a database or schema per tenant (#1 or #2 above), your application can make use of existing Ruby gems or Django packages to simplify app integration. That said, you'll need to manually manage your tenants' data and the machines they live on. Similarly, you'll need to build your own sharding logic to propagate foreign key constraints and ALTER TABLE commands.
With approach #3, you can use existing open source scaling solutions, such as Citus. For example, this blog post describes how to easily shard a multi-tenant app with Postgres.
It's time for me to give back to the community :) After 4 years, our multi-tenant platform is in production and I would like to share the following observations/experiences with all of you.
We used a database per tenant. This has given us extreme flexibility, as the databases in the backups are not huge and hence we can easily import them into our staging environment to investigate customer issues.
We use Liquibase for database development and upgrades. This has been a tremendous help to us, allowing us to package the entire build into a simple war file. All changes are easily versioned and managed very efficiently. There is a bit of a learning curve here and there, but nothing substantial. Investing 2-5 days can save you significant time.
Given that we use Spring/JPA/Hibernate, we use a technique called Dynamic Data Source Routing. When a user logs in, we find the related datasource with a lookup and connect the session to the right database. That's also when the Liquibase scripts get applied for updates.
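The routing idea can be sketched independently of Spring. The example below is not the Spring/Hibernate or Liquibase API: it is a minimal Python illustration in which a hypothetical TenantRouter looks up the tenant for a user, opens that tenant's database on first use, and applies any pending changes from a hypothetical MIGRATIONS list before handing the connection back. SQLite's user_version pragma stands in for a changelog table.

```python
import sqlite3

# Hypothetical ordered changelog: (version, SQL). A real setup would use
# a migration tool instead of this list.
MIGRATIONS = [
    (1, "CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE item ADD COLUMN price REAL"),
]


class TenantRouter:
    """Maps a logged-in user to the connection of that user's tenant."""

    def __init__(self, user_to_tenant):
        self._user_to_tenant = user_to_tenant
        self._connections = {}

    def connection_for(self, user):
        tenant = self._user_to_tenant[user]
        if tenant not in self._connections:
            # Stand-in for opening the tenant's own database.
            conn = sqlite3.connect(":memory:")
            self._migrate(conn)
            self._connections[tenant] = conn
        return self._connections[tenant]

    @staticmethod
    def _migrate(conn):
        # Apply only the changes newer than the recorded schema version.
        current = conn.execute("PRAGMA user_version").fetchone()[0]
        for version, sql in MIGRATIONS:
            if version > current:
                conn.execute(sql)
                conn.execute(f"PRAGMA user_version = {version}")
        conn.commit()
```

Usage: `TenantRouter({"alice": "acme", "carol": "globex"}).connection_for("alice")` returns a connection whose schema has already been brought up to date, so the rest of the request never needs to know which database it is talking to.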
That is all for now; I will come back with more later on.
Well, in our case there are definitely problems with one database for all tenants.
The backup file gets huge and becomes impractical and hard to manage.
For troubleshooting, when we need to restore a customer's data in our dev environment, we just use that customer's backup file, which is usually much smaller than a backup of one database for all customers would be.
Again, Liquibase has been key in allowing us to manage updates across all the tenants seamlessly and without any issues. Without Liquibase, I can see lots of complications with this approach. So Liquibase, Liquibase and more Liquibase.
I also suspect that we would need more powerful hardware to manage a huge database with large joins across millions of records, versus much lighter databases with much smaller queries.
In case of problems, the service doesn't go down for everyone; the impact is limited to one or a few tenants.
In general, for our purposes, this has been a great architectural decision and we are benefiting from it every day. One time we had one customer that didn't have their archiving active and their database size grew to over 3 GB. With offshore teams and slower internet as well as storage/bandwidth prices, one can see how things may become complicated very quickly.
Hope this helps someone.
--Rex

MongoDB: throw everything into the same database?

Currently, all my collections are maintained in a single database.
I'm a little confused about when I should separate my collections into multiple databases, as some of the collections aren't necessarily related.
multiple databases:
  can refine security permissions
  separation of concerns
single database:
  easy
There is a set of tables I access all the time, and a set of tables I access about once a month. It makes some sense to open a persistent connection to a database containing my always-used tables, and open a connection to a database containing the sparsely-used tables when needed.
But is there any performance difference to having all my data in the same database? Is there any general rule of thumb for when to use multiple databases (other than production, development, etc.)?
Check here for a similar question with some useful, more in-depth answers: Is it better to use multiple databases when you are managing independent sets of things in MongoDB?

Entity Framework for existing enterprise database

I am working on creating a service layer for a large SQL Server database (2008 R2) that is currently the backend for a WinForms POS application with strongly typed datasets.
I think WCF is the way to go, and at first glance it seemed EF 4 was a good choice but now I'm having my doubts. Here is what I have found:
The stored procedure mapping isn't that great. I have hundreds of stored procs that I want to reuse. Most of them wouldn't return an 'entity', so the stored procs would have to be mapped to a complex type. Many of the procs use dynamic SQL or temp tables, so EF can't figure out what complex type to create. Many of the procs return multiple result sets. I've read that EF extensions have a way to map stored procs with multiple result sets, but only for entities, so that doesn't help me.
Large models are a problem. There doesn't seem to be a good way to handle large entity models. The workaround of creating smaller models isn't that desirable, and splitting the model loses design support. Am I missing something?
EF mappings only go so far. The stored procs that I want to reuse return projections or information from many tables in a single result set. There doesn't seem to be a way to map these results into entities. Am I wrong? I've read about combining results from two tables into one entity, but that only works if the tables have the same primary key.
Are people using EF in large scale existing databases? If not what would you recommend?
I've used EF on large scale databases, but as you say, the support for SPs like yours is not great. That's not specifically a failing of EF per se: ORMs in general work on the same principle and have the same "limitation".
If you have lots of SPs mapped to datasets, you will have a lot of work to do even leaving the SPs aside: the system has to stop referencing datasets and reference your domain model types instead, so you would need some way to map your SPs to your domain model and back anyway.

EF Inheritance and DBA Concerns

For a new project, our application developers want to use Entity Framework's table-per-type inheritance model.
We recently showed this functionality and the resulting table schema to our DBA, and he's expressed concerns, and I'm wondering how to address them. Inheritance is an important part of OO, and from a development side, it would be great to have the DB and ORM support this concept natively. This functionality is part of EF, so it's not like we're pulling the design out of left field.
His main concerns are:
We're not using stored procs
The added complexity will make reporting and data updates harder
We've pretty much addressed his stored proc concerns (and we've been using another ORM for 3 years now).
As for the complexity, I do see his point, but for me these counterpoints address it:
Reporting should not be performed from transactional tables (we currently do this); views or a transformed reporting DB should be used.
Data updates on a flatter structure can still mess up data -- it's the responsibility of the person updating the data to understand the structure. The schema used by EF's table-per-type inheritance model isn't that complex, but it must be adhered to when doing manual updates.
I know we're not the first to run into DBA concerns over DB-backed model inheritance. How have others convinced their DBA that this is a good model?
His main concerns miss the real problems with TPT.
You can use stored procedures with TPT if you want.
Data updates are not harder. EF will deal with them and ensure the correct order of data modification.
The main problem with TPT is inefficient queries (check the comments as well). TPT in EF has real performance problems because it generates a lot of left joins and unions even when it doesn't need data from the derived tables. Building any reporting on this data structure and accessing report data through EF is a really bad decision.
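To show why the joins pile up, here is a small sketch of a table-per-type layout, using Python's sqlite3 module and a hypothetical person/student/teacher hierarchy. The query is hand-written to illustrate the general shape only; it is not the SQL that EF actually generates.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- Table-per-type: one table for the base type, one per derived type,
    -- all sharing the same primary key (hypothetical hierarchy).
    CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE student (id INTEGER PRIMARY KEY REFERENCES person(id), school TEXT);
    CREATE TABLE teacher (id INTEGER PRIMARY KEY REFERENCES person(id), subject TEXT);

    INSERT INTO person  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO student VALUES (1, 'MIT');
    INSERT INTO teacher VALUES (2, 'Math');
    """
)

# Loading "all persons" polymorphically has to probe every derived table
# to find out which concrete type each row is.
POLYMORPHIC_QUERY = """
    SELECT p.id, p.name,
           CASE WHEN s.id IS NOT NULL THEN 'student'
                WHEN t.id IS NOT NULL THEN 'teacher'
                ELSE 'person' END AS kind
    FROM person p
    LEFT JOIN student s ON s.id = p.id
    LEFT JOIN teacher t ON t.id = p.id
    ORDER BY p.id
"""

rows = db.execute(POLYMORPHIC_QUERY).fetchall()
```

Each additional derived type adds another join to this query, which is where the cost comes from as the hierarchy grows.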
Edit:
If his concerns are related to other tools working with your database, then they are fully legitimate, but at the same time this is only a matter of correctly documenting your database structure.