How is database normalization related to ER modeling?
What comes first?
Or should both be done at the same time?
I feel modeling should come first in a highly normalized database design.
Creating the model lets you think through how the tables will relate to one another and envision which tables you'll need when writing your join queries.
A tool such as MySQL Workbench or Toad Data Modeler (depending on your target database vendor) can even generate the SQL commands to build the tables, constraints, and indexes directly from the model. This is useful because it ensures the tables are created exactly as you designed them.
Also, when making changes to the model, some tools like those mentioned above will even allow you to "update" your schema by issuing the necessary statements required to do so.
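To make that concrete, here is a rough, hand-written sketch of the kind of script such a tool forward-engineers from a model (the customer/order tables, names, and the added column are invented for illustration, not taken from any particular model):

    -- Initial forward-engineering of the model into DDL (invented example schema).
    CREATE TABLE customer (
        customer_id INT NOT NULL AUTO_INCREMENT,
        name        VARCHAR(200) NOT NULL,
        PRIMARY KEY (customer_id)
    );

    CREATE TABLE `order` (
        order_id    INT NOT NULL AUTO_INCREMENT,
        customer_id INT NOT NULL,
        ordered_at  DATETIME NOT NULL,
        PRIMARY KEY (order_id),
        INDEX idx_order_customer (customer_id),
        CONSTRAINT fk_order_customer
            FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
    );

    -- A later model change (a new column) becomes an ALTER statement issued by the tool.
    ALTER TABLE `order` ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'new';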
So in short, for a project with more than one table, I'd always model it first. It also makes it easier for developers to understand how the tables function and relate at a glance rather than having to read through DDL to understand it.
Modeling can even be fun!
[Screenshot: a model created with MySQL Workbench]
Hope this helps!
We are starting a new project in which we are rebuilding our current web shop platform.
In the current platform we do not use EF6 or any other ORM; all data access goes through stored procedures. In the new build, EF6 is what we plan to use.
We have a question about the database design of the new platform. In the current platform we use several different databases, split by the kind of content they hold.
For example, we have dedicated databases for the product catalogs and another dedicated database for handling orders.
Currently all data access is done through stored procedures, so we have no problem with the links between the different databases.
The problem appears now that we have started to use EF6. Each database is associated with its own context, and data from one context is not visible from another
unless we implement those relationships directly in the source code across multiple contexts. It seems this means we will lose much of the power of EF6.
The questions we have are:
Is it bad design to maintain different databases for the same application when using EF6?
If this is poor design and we choose a single database instead, will performance still be acceptable with hundreds of tables (almost 1,000) and several terabytes of data?
On the other hand, if we opt for the design with several databases (which would suit our case much better), what is the best way to handle them in EF6?
Thank you very much for your help!
First of all, EF is not written to be cross-database. You can't write cross-database (cross-context) queries, lazy loading does not work across contexts, and so on.
This is a big limitation in your case.
EF can work with several schemas (personally I don't use that and don't like it, but that's just my opinion).
You can use your stored procedures with EF, but as I understand it you are planning to stop using them.
In my experience I have written several applications with more than one database, but the use of the separate databases was very limited. In those cases I used cross-database views (e.g. one database per company, plus some common tables exposed through views in each company database that select from the common tables). In your case, if related tables are scattered across databases, I don't think this is an approach you can choose.
So, in my opinion you could change the approach.
If you have backup problems, you could move the huge tables (I'd guess the fact tables and the tables holding pictures) into a separate database and create cross-database views over them. Note also that cross-database referential integrity is not supported in SQL Server, so you would need to write triggers to enforce it.
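As a rough sketch of what that looks like (the database, table, and column names here are made up; adjust to your own schema): a huge Pictures table moved out to its own database is exposed back to the main database through a view, and a trigger stands in for the unsupported cross-database foreign key.

    -- Run in the main database (ShopDb): expose the table that was moved to ArchiveDb.
    CREATE VIEW dbo.Pictures
    AS
    SELECT PictureId, ProductId, ImageData
    FROM ArchiveDb.dbo.Pictures;
    GO

    -- Run in ArchiveDb: cross-database FKs aren't supported, so a trigger checks the link.
    CREATE TRIGGER dbo.trg_Pictures_CheckProduct
    ON dbo.Pictures
    AFTER INSERT, UPDATE
    AS
    BEGIN
        IF EXISTS (SELECT 1
                   FROM inserted i
                   WHERE NOT EXISTS (SELECT 1
                                     FROM ShopDb.dbo.Products p
                                     WHERE p.ProductId = i.ProductId))
        BEGIN
            RAISERROR('Picture references a missing product.', 16, 1);
            ROLLBACK TRANSACTION;
        END
    END;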
If you need to separate different application functions (e.g. WMS, CRM and so on), you can use namespaces without worrying about how the tables are physically stored in the DB.
We have a SaaS application where each tenant has its own database in Postgres. How would I apply a patch to all the databases? For example, if I want to add a table or add a column to a table, I have to either write a program that loops through all the databases and executes the SQL against each one, or go through them one by one in pgAdmin.
Is there smarter and/or faster way?
Any help is greatly appreciated.
Yes, there's a smarter way.
Don't create a new database for each tenant. If everything is in one database then you only need to alter one database.
Pick one database, alter each table to have a TENANT column and add it to the primary key. Then insert every tenant's records into this database and drop the other databases (obviously this is considerably more work than it sounds, as your application will need to be changed too).
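A rough sketch of that consolidation against a hypothetical orders table (the table name is invented, and the constraint name assumes PostgreSQL's default orders_pkey convention):

    -- In the surviving database: add the tenant column and fold it into the primary key.
    ALTER TABLE orders ADD COLUMN tenant_id int NOT NULL DEFAULT 1;
    ALTER TABLE orders DROP CONSTRAINT orders_pkey;
    ALTER TABLE orders ADD PRIMARY KEY (tenant_id, order_id);

    -- Then, for each remaining tenant database, copy its rows in with its tenant_id,
    -- e.g. via dump/restore or dblink (placeholder comment only):
    -- INSERT INTO orders (tenant_id, order_id, ...) SELECT 2, order_id, ... FROM staging_orders_tenant2;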
The differences with your approach are extensively discussed elsewhere:
What problems will I get creating a database per customer?
What are the advantages of using a single database for EACH client?
Multiple schemas versus enormous tables
Practicality of multiple databases per client vs one database
Multi-tenancy - single database vs multiple database
If you don't put everything in one database then I'm afraid you have to alter them all individually, and doing it programmatically would be simplest.
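One simple programmatic route, assuming the tenant databases are all the non-template databases on the cluster and your patch lives in a file such as add_invoice_column.sql (both assumptions; adjust to your setup): let the catalog generate one psql command per database and feed the output to a shell.

    -- Run in any database on the cluster; emits one psql invocation per tenant database.
    SELECT format('psql -d %I -f add_invoice_column.sql', datname)
    FROM pg_database
    WHERE datistemplate = false
      AND datname <> 'postgres';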
At a higher level, all multi-tenant applications follow one of three approaches:
One tenant's data lives in one database,
One tenant's data lives in one schema, or
All tenants share the same tables, with a tenant_id / account_id column on each table (shared schema).
I usually find that developers use the following criteria when they evaluate these different approaches.
Isolation: Since you can put each tenant into its own database on one hand, or have all tenants share the same tables on the other, this is the most apparent dimension. If you provide your users raw SQL access or you're in a regulated industry such as healthcare, you may need strict guarantees from your database. That said, PostgreSQL 9.5 comes with row-level security policies that make this less of a concern for most applications.
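For example, with the shared-schema approach a policy on a hypothetical orders table (names invented here) restricts every query to the current tenant's rows, provided each session sets an app.current_tenant setting:

    -- Assumes a shared orders table with a tenant_id column (approach #3).
    ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

    -- Each application session runs: SET app.current_tenant = '<tenant id>';
    -- Note: the table owner bypasses RLS unless FORCE ROW LEVEL SECURITY is also set.
    CREATE POLICY tenant_isolation ON orders
        USING (tenant_id = current_setting('app.current_tenant')::int);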
Extensibility: If your tenants are sharing the same schema (approach #3), and your tenants have fields that vary between them, then you need to think about how to merge these fields.
This article on multi-tenant databases has a great summary of the different approaches. For example, you can add a dozen columns, call them C1, C2, and so forth, and have your application infer the actual meaning of these columns based on the tenant_id. PostgreSQL 9.4 comes with JSONB support and natively allows you to use semi-structured fields to express variations between different tenants' data.
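A small sketch of the JSONB route, with an invented products table: the columns all tenants share stay relational, while tenant-specific fields go into a JSONB column.

    CREATE TABLE products (
        id         bigserial PRIMARY KEY,
        tenant_id  int   NOT NULL,
        name       text  NOT NULL,
        attributes jsonb NOT NULL DEFAULT '{}'   -- tenant-specific fields live here
    );

    -- One tenant might store warranty_months, another color, with no schema change:
    SELECT name, attributes->>'warranty_months' AS warranty_months
    FROM products
    WHERE tenant_id = 42
      AND attributes ? 'warranty_months';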
Scaling: Another criterion is how easily your database can scale out. If you create a tenant per database or schema (#1 or #2 above), your application can make use of existing Ruby gems or Django packages to simplify app integration. That said, you'll need to manually manage your tenants' data and the machines they live on. Similarly, you'll need to build your own sharding logic to propagate foreign key constraints and ALTER TABLE commands.
With approach #3, you can use existing open source scaling solutions, such as Citus. For example, this blog post describes how to easily shard a multi-tenant app with Postgres.
It's time for me to give back to the community :) After 4 years, our multi-tenant platform is in production and I would like to share the following observations/experiences with all of you.
We used a database per tenant. This has given us extreme flexibility: the individual database backups are not huge, so we can easily import them into our staging environment when investigating customer issues.
We use Liquibase for database development and upgrades. This has been a tremendous help, allowing us to package the entire build into a simple war file. All changes are easily versioned and managed very efficiently. There is a bit of a learning curve here and there, but nothing substantial; investing 2-5 days in it can save you significant time.
Given that we use Spring/JPA/Hibernate, we use a technique called dynamic data source routing. When a user logs in, we look up the related data source and connect their session to the right database. That's also when the Liquibase scripts get applied for updates.
That's it for now; I will come back with more later.
Well, in our case there would certainly be problems with one database for all tenants:
The backup file gets huge and becomes impractical and hard to manage.
For troubleshooting, when we need to restore a customer's data in our dev environment, we just use that customer's backup file, which is far smaller than it would be if we used one database for all customers.
Again, Liquibase has been key in allowing us to manage updates across all the tenants seamlessly and without any issues. Without Liquibase, I can see lots of complications with this approach. So Liquibase, Liquibase and more Liquibase.
I also suspect that we would need more powerful hardware to manage one huge database with large joins across millions of records, versus many lighter databases with much smaller queries.
In case of problems, the service doesn't go down for everyone; the impact is limited to one or a few tenants.
In general, for our purposes, this has been a great architectural decision and we are benefiting from it every day. We once had a customer whose archiving wasn't active, and their database grew to over 3 GB. With offshore teams, slower internet, and storage/bandwidth prices, one can see how things could become complicated very quickly.
Hope this helps someone.
--Rex
Our database has about 500 tables we'd like to use in our EF model. Of those, I'd be happy to start with 50 or fewer just to get our feet wet after working in plain ADO.NET for years.
The problem is that our SQL Server database contains many thousands of other tables that have been created over the years, many of them dynamically generated. Believe it or not:
select count(*) from INFORMATION_SCHEMA.TABLES
73261
So that's a lot of tables. I have found that pretty much every tool I've tried to design, build or template EF models or entities either hangs or does not return a list of tables. Even SQL Server Object Explorer in VS2012 won't list the tables and instead shows the Tables folder with a little "x" over the icon. So I can't even select a subset of tables.
What options do I have for using EF? Is there a template where I can explicitly define the tables that I want to use entities for? Even with 50 tables, I don't want to hand code each one in an empty EDMX.
Using a Database / Code First approach and avoiding connecting Visual Studio to the database at all (i.e. don't create an edmx or connect with Server Explorer) would allow you to do this easily. It does not give you any of the Model First advantages, but it sounds like your project would be better served by a Database / Code First approach anyway, as:
You have an existing Model, and are not looking to push changes from your EDMX to the DB
You are looking to implement this on a subset of your database
This link has a good summary (Code-first vs Model/Database-first), with the caveat that in your case a Database/Code First approach does not have you pushing changes from code to the database, so the last two bullets under code first apply less; yours is a Database/Code First hybrid.
With 70k tables I think any GUI is going to be tricky. By "Database / Code First" I mean that you are not using the code to create, define, or update your database. Someone may be able to put this more succinctly or accurately.
I know this is an old question, but for those who land here from a Google search: the only tool I have found that actually works with thousands of tables is The Sharp Factory.
It is an ORM and pretty simple to use. So if you are looking for an ORM that can work with a large number of tables and does not require you to write POCOs, mappings, or SQL, then this is the tool.
You can find it here: The Sharp Factory
I am a DBA. I want to know what advantages my business object developers will get from using EF against a SQL Server database that is fully managed with foreign keys and primary keys defined wherever required. This is a new project and we have to use EF with SQL Server 2008 R2. We plan to use the Database First approach. Can anyone tell me what difference my business object developers will experience if I define all foreign key relationships in the DB?
Assuming it's setup correctly, when your developers actually create their objects from the database structure, they'll be able to access any related tables rather easily.
It should also make creation of new objects (rows in the tables) easy, as it then shouldn't be possible to create new items that would break the foreign key relationship.
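As a minimal sketch (the Customer/Order tables here are invented): with the foreign key declared, the Database First import gives the developers navigation properties between the two entities, and the database itself rejects an order that points at a customer that doesn't exist.

    CREATE TABLE dbo.Customer (
        CustomerId int IDENTITY PRIMARY KEY,
        Name       nvarchar(200) NOT NULL
    );

    CREATE TABLE dbo.[Order] (
        OrderId    int IDENTITY PRIMARY KEY,
        CustomerId int NOT NULL,
        CONSTRAINT FK_Order_Customer
            FOREIGN KEY (CustomerId) REFERENCES dbo.Customer (CustomerId)
    );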
It's also just plain good practice to correctly set up any foreign keys in the database; I can't think of any benefit to leaving them out.
As a developer that's had to work with data sources that haven't been setup correctly, I can tell you a correctly setup database structure is an amazing experience for a developer.
(As an aside, as a DBA, you may want to take a look at EF. Also take a look at LINQ, one of the items that they'll be using. In particular, Why LINQ beats SQL may help you get a basic understanding, even if you don't agree with the article title :) )
For a new project, our application developers are wanting to use Entity Framework's table-per-type inheritance model.
We recently showed this functionality and the resulting table schema to our DBA, and he's expressed concerns, and I'm wondering how to address them. Inheritance is an important part of OO, and from a development side, it would be great to have the DB and ORM support this concept natively. This functionality is part of EF, so it's not like we're pulling the design out of left field.
His main concerns are:
We're not using stored procs
The added complexity will make reporting and data updates harder
We've pretty much addressed his stored proc concerns (and we've been using another ORM for 3 years now).
As for the complexity, I do see his point, but these counterpoints address it (for me):
Reporting should not be performed against transactional tables (we currently do this); views or a transformed reporting DB should be used instead.
Data updates on a flatter structure can still mess up data -- it's the responsibility of the person updating the data to understand the structure. The schema used by EF's table-per-type inheritance model isn't that complex, but it must be adhered to when doing manual updates.
I know we're not the first to run into DBA concerns over DB-backed model inheritance. How have others convinced their DBA that this is a good model?
His main concerns are not the real problems with TPT.
You can use stored procedures with TPT if you want.
Data updates are not harder. EF will deal with them and ensure the correct order of data modifications.
The main problem with TPT is inefficient queries (see the comments as well). TPT in EF has real performance problems because it generates a lot of left joins and unions even when it doesn't need data from the derived tables. Building any reporting on this data structure and accessing report data through EF is a really bad decision.
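To see where those joins come from, here is a hypothetical TPT layout (table names invented): one base table plus one table per derived type sharing the base primary key, and the rough shape of the SQL EF ends up issuing when it loads the base set.

    CREATE TABLE dbo.People (
        PersonId int PRIMARY KEY,
        Name     nvarchar(200) NOT NULL
    );

    CREATE TABLE dbo.Employees (
        PersonId int PRIMARY KEY REFERENCES dbo.People (PersonId),
        HireDate date NOT NULL
    );

    CREATE TABLE dbo.Customers (
        PersonId      int PRIMARY KEY REFERENCES dbo.People (PersonId),
        LoyaltyPoints int NOT NULL
    );

    -- Querying the base type makes EF touch every derived table to discover each
    -- row's concrete type, producing queries roughly of this shape:
    SELECT p.PersonId, p.Name, e.HireDate, c.LoyaltyPoints
    FROM dbo.People p
    LEFT JOIN dbo.Employees e ON e.PersonId = p.PersonId
    LEFT JOIN dbo.Customers c ON c.PersonId = p.PersonId;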
Edit:
If his concerns are about other tools working with your database, then they are fully legitimate, but at the same time it is only a matter of correctly documenting your database structure.