NOTE: I have never done this before:
What are some steps or documentation to help normalize tables/views in a database? Currently, several tables and views in the database do not use primary/foreign keys and repeat the same information in multiple tables.
I'd like to clean this up and also set up a process that keeps relationships up to date. For example, if a person's zipcode changes or a record is removed, the related rows in other tables should be updated automatically.
NOTE: My question is about normalizing existing database tables. The tables are live, so how do I approach normalization? Do I create a brand new database with the table structure I want and then move the data to that database? Once the data is moved, do I plug in stored procedures and imports?
This question is somewhat broad, so I will only explain the concept.
Views are generally used for reporting/data presentation purposes and therefore I would not try to normalise them. Your case may be different.
You also need to be clear about primary / foreign key concept:
Lack of actual constraints (e.g. PRIMARY KEY, FOREIGN KEY) defined on the table does not mean that the tables do not have logical relationships on columns.
Data maintenance can be implemented in Triggers.
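As a rough illustration of trigger-based maintenance, the sketch below keeps a duplicated zipcode column in sync when the source row changes. It uses Python's built-in sqlite3 module only so that it is runnable as-is; the table names (person, shipment) and the trigger name are made up for the example, and trigger syntax differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, zipcode TEXT);

-- Hypothetical table holding a denormalised copy of the zipcode,
-- with no declared foreign key (only a logical relationship).
CREATE TABLE shipment (shipment_id INTEGER PRIMARY KEY, person_id INTEGER, zipcode TEXT);

-- Propagate zipcode changes from person to the duplicated column.
CREATE TRIGGER person_zip_sync AFTER UPDATE OF zipcode ON person
BEGIN
    UPDATE shipment SET zipcode = NEW.zipcode WHERE person_id = NEW.person_id;
END;

INSERT INTO person VALUES (1, 'Ann', '10001'), (2, 'Bob', '20002');
INSERT INTO shipment VALUES (10, 1, '10001'), (11, 1, '10001'), (12, 2, '20002');
""")

# Changing the zipcode in one place updates every copy of it.
conn.execute("UPDATE person SET zipcode = '30003' WHERE person_id = 1")
shipments = conn.execute(
    "SELECT shipment_id, person_id, zipcode FROM shipment ORDER BY shipment_id"
).fetchall()
print(shipments)
```

Once the data is normalised the zipcode would live in one table only and a trigger like this would no longer be needed.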
If you really have a situation where a lot of highly de-normalised data exists in tables for no apparent reason and you want to normalise it then this problem can be approached in two ways:
1. Full re-write - I would recommend this for small / new Apps.
2. "Gradual" re-factoring - for large / mature applications, where underlying data relationships are complex and / or may not be fully understood.
Within "Gradual" re-factoring there are a few ways as well:
2.a. You take 1 old table and replace it with a new table and at the same time change all code that uses the old table to use the new table. For large systems this can be problematic as you simply may not be aware of all places that reference this table. On the other hand, it may be useful for situations where the table structure change is not significant and/or when the number of dependencies is small.
2.b. Another way is to create new table(s) (in the same database) in the shape / form you desire. The current tables should be replaced with Views that return identical data (to the old tables) but sourced from the "new" tables. This approach removes / minimises the need to modify all dependencies immediately. The drawback is that the View that replaces the old table can become rather complex, especially if INSTEAD OF Triggers need to be implemented on the View to keep writes working.
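Approach 2.b could look roughly like the sketch below. It runs against an in-memory SQLite database purely so that it is self-contained; all table, column, and trigger names are invented for the example, and it assumes (unrealistically) that a customer is identified by name. The old table is replaced by a view of the same name and shape, and an INSTEAD OF trigger lets legacy INSERT statements keep working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Legacy denormalised table: customer details repeated on every order.
CREATE TABLE customer_orders (
    order_id INTEGER PRIMARY KEY, customer_name TEXT, customer_zip TEXT, item TEXT);
INSERT INTO customer_orders VALUES
    (1, 'Ann', '10001', 'book'), (2, 'Ann', '10001', 'pen'), (3, 'Bob', '20002', 'lamp');

-- New normalised tables (assumption: the name identifies a customer).
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE, zip TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    item TEXT);

-- Move the data across.
INSERT INTO customer (name, zip)
    SELECT DISTINCT customer_name, customer_zip FROM customer_orders;
INSERT INTO orders (order_id, customer_id, item)
    SELECT o.order_id, c.customer_id, o.item
    FROM customer_orders o JOIN customer c ON c.name = o.customer_name;

-- Replace the old table with a view of the same name and shape.
DROP TABLE customer_orders;
CREATE VIEW customer_orders AS
    SELECT o.order_id, c.name AS customer_name, c.zip AS customer_zip, o.item
    FROM orders o JOIN customer c ON c.customer_id = o.customer_id;

-- Legacy INSERTs against the old name are redirected to the new tables.
CREATE TRIGGER customer_orders_insert INSTEAD OF INSERT ON customer_orders
BEGIN
    INSERT OR IGNORE INTO customer (name, zip)
        VALUES (NEW.customer_name, NEW.customer_zip);
    INSERT INTO orders (order_id, customer_id, item)
        VALUES (NEW.order_id,
                (SELECT customer_id FROM customer WHERE name = NEW.customer_name),
                NEW.item);
END;
""")

# Unchanged legacy code can still write to and read from the old name.
conn.execute(
    "INSERT INTO customer_orders (order_id, customer_name, customer_zip, item) "
    "VALUES (4, 'Bob', '20002', 'desk')"
)
rows = conn.execute("SELECT * FROM customer_orders ORDER BY order_id").fetchall()
customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

A real migration would also need INSTEAD OF triggers for UPDATE and DELETE, which is where the view can become complex.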
I have a multi-tenant application which will use the silo model to store data (each tenant gets its own database).
Because tenant names may not be unique, my databases are named with GUIDs: MyApp_[GUID].
Now I want to store simple but necessary information for each database, such as the tenant name and 3 to 5 more pieces of information.
Is there a simple way to write and read this data?
The only way I can think of is to create a special table for this with only 1 row, but that seems a bit wasteful.
If you're looking for a simpler solution than a table per database (and having to deal with the awkward constraint that it must have exactly one row), you could
use a custom configuration parameter. You can change them with ALTER DATABASE. The downside is that you can only store strings, and that the settings might be overridden per session.
use a COMMENT on the database. The downside is that you can only store a single string per database; the advantage is that it is automatically shown in many lists of databases, such as psql's \l+ command.
add your own columns to the pg_database system table. You should not mess with that, so it's a spectacularly bad idea even if you know what you are doing, but in a relational model it's the closest to what you were asking for, so I mention it for completeness.
I don't really advocate any of these solutions. Although they do what you were asking for, there's probably a better solution to your actual problem. It might be as simple as a table of databases, possibly with a foreign key to pg_database, in an extra database shared by all tenants.
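The "table of databases" idea might be sketched as follows. SQLite stands in for the shared database only to keep the example runnable, and the table name tenant_database, its columns, and the two helper functions are assumptions made for illustration.

```python
import sqlite3
import uuid

# Stand-in for the extra database shared by all tenants.
registry = sqlite3.connect(":memory:")
registry.execute("""
CREATE TABLE tenant_database (
    database_name TEXT PRIMARY KEY,   -- e.g. MyApp_<GUID>
    tenant_name   TEXT NOT NULL,      -- not unique: tenant names may repeat
    created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)""")

def register_tenant(tenant_name):
    """Record a new tenant and return the generated database name."""
    database_name = f"MyApp_{uuid.uuid4()}"
    with registry:
        registry.execute(
            "INSERT INTO tenant_database (database_name, tenant_name) VALUES (?, ?)",
            (database_name, tenant_name),
        )
    return database_name

def tenant_info(database_name):
    """Return (tenant_name,) for a database, or None if it is not registered."""
    return registry.execute(
        "SELECT tenant_name FROM tenant_database WHERE database_name = ?",
        (database_name,),
    ).fetchone()

first = register_tenant("Acme")
second = register_tenant("Acme")  # same tenant name, different database
```

Further per-tenant fields would simply become additional columns of this one table.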
We have a large table in our Postgres production database which we want to start "sharding" using foreign tables and inheritance.
The desired architecture will be to have 1 (empty) table that defines the schema and several foreign tables inheriting from the empty "parent" table. (possible with Postgres 9.5)
I found this well written article https://www.depesz.com/2015/04/02/waiting-for-9-5-allow-foreign-tables-to-participate-in-inheritance/ that explains everything on how to do it from scratch.
My question is how to reduce the needed migration of data to a minimum.
We have this 100+ GB table now, which should become our first "shard". In the future we will regularly add new "shards". At some point, the older shards will be moved to another tablespace (on cheaper hardware, since they become less important).
My question now:
Is there a way to "ALTER" an existing table to be a foreign table instead?
There is no way to use ALTER TABLE to do this.
You really have to basically do it manually. This is no different (really) than doing table partitioning. You create your partitions, you load the data. You direct reads and writes to the partitions.
Now in your case, in terms of doing sharding there are a number of tools I would look at to make this less painful. First, if you make sure your tables are split the way you like them first, you can use a logical replication solution like Bucardo to replicate the writes while you are moving everything over.
There are some other approaches (parallelized readers and writers) that may save you some time at the expense of db load, but those are niche tools.
There is no native solution for shard management in standard PostgreSQL (and I don't know enough about Postgres-XL in this regard to know how well it can manage changing shard criteria). However, pretty much anything is possible with a little work and knowledge.
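The manual "create the partition, load the data" step can be done in small batches so that no single transaction has to move the whole table. The sketch below shows only that batching pattern, using SQLite as a runnable stand-in; the table names big_table and shard_1 and the batch size are invented, and it ignores writes that arrive during the copy (which is what a replication tool such as Bucardo would cover).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT);
CREATE TABLE shard_1 (id INTEGER PRIMARY KEY, payload TEXT);
""")
conn.executemany(
    "INSERT INTO big_table VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 1001)],
)
conn.commit()

def copy_in_batches(conn, batch_size=250):
    """Copy big_table into shard_1 in key order, one transaction per batch."""
    last_id = 0
    copied = 0
    while True:
        with conn:  # commits each batch separately
            cur = conn.execute(
                "INSERT INTO shard_1 "
                "SELECT id, payload FROM big_table WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size),
            )
            if cur.rowcount == 0:
                break  # nothing left to copy
            copied += cur.rowcount
            last_id = conn.execute("SELECT MAX(id) FROM shard_1").fetchone()[0]
    return copied

copied = copy_in_batches(conn)
shard_rows = conn.execute("SELECT COUNT(*) FROM shard_1").fetchone()[0]
```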
In order to cut down on "stupid" tables (the ones which are identical for several related parent entities) we made a few generic tables.
Here is an example:
tbl_settings
id
owner_type (e.g. "account", "user" etc.)
owner_id (actual ID of a foreign table record)
setting_name
setting_value
Now the problem is the deletes: it is quite easy to forget to delete e.g. the user's settings when the user is deleted.
What is the right way to handle deletes for this kind of a table in PostgreSQL?
Do it in the application (e.g. when deleting a user, do a manual delete of the related settings)?
Do it in a database trigger on the tbl_user (and in all other parent tables)?
Something else?
If a table's relationships have meaning only in the application and upwards - i.e. they have no bearing on the referential integrity of the data in the database - you can do this in the application layer.
If "orphaned" records violate data (as opposed to business logic) relationships, then do this in the database: the safest way is probably via a trigger, though that has its disadvantages too (e.g. the likelihood of obfuscating DML errors is higher if there is a trigger action involved).
My impression from your question is that these tables are mainly there because of some business logic, in which case I would handle the deletes outside the database, in an ORM layer, for example.
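If the database route is chosen after all, a delete trigger on each parent table could look roughly like this. The sketch uses SQLite so that it runs standalone; PostgreSQL would need a trigger function plus CREATE TRIGGER instead. tbl_settings and tbl_user come from the question, while tbl_account and the trigger name are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tbl_account (id INTEGER PRIMARY KEY, name TEXT);  -- hypothetical second parent

CREATE TABLE tbl_settings (
    id            INTEGER PRIMARY KEY,
    owner_type    TEXT NOT NULL,      -- 'account', 'user', ...
    owner_id      INTEGER NOT NULL,   -- id in the table named by owner_type
    setting_name  TEXT NOT NULL,
    setting_value TEXT
);

-- One such trigger is needed per parent table.
CREATE TRIGGER tbl_user_delete_settings AFTER DELETE ON tbl_user
BEGIN
    DELETE FROM tbl_settings WHERE owner_type = 'user' AND owner_id = OLD.id;
END;

INSERT INTO tbl_user VALUES (1, 'ann');
INSERT INTO tbl_account VALUES (1, 'acme');
INSERT INTO tbl_settings (owner_type, owner_id, setting_name, setting_value) VALUES
    ('user', 1, 'theme', 'dark'),
    ('account', 1, 'plan', 'pro');
""")

# Deleting the user removes only that user's settings.
conn.execute("DELETE FROM tbl_user WHERE id = 1")
remaining = conn.execute(
    "SELECT owner_type, owner_id, setting_name FROM tbl_settings"
).fetchall()
```

The owner_type filter matters because user 1 and account 1 share the same owner_id value.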
Here is my setup. I have two schemas: my_app and static_data. The latter is imported from a static dump. For the needs of my application logic, I made views that use the tables of static_data, and I stored them in the my_app schema.
It all works great. But I need to update the static_data schema with a new dump, and have my views use the new data. The problem is, whatever I do, my views will always reference the old schema!
I tried importing the new dump in a new schema, static_data_new, then trying to delete static_data and rename static_data_new to static_data. It doesn't work because my views depend on tables in static_data, therefore PostgreSQL won't let me delete it.
Then I tried setting search_path to static_data_new. But when I do that, the views still reference the old tables!
Is it possible to have views that reference tables using the search_path? Thanks.
Views are bound to the underlying objects. Renaming the object does not affect this link.
I see basically 3 different ways to deal with your problem:
DROP your views and re-CREATE them after you have your new table(s) in place. Simple and fast, as soon as you have your complete create script together. Don't forget to reset privileges, too. The recreate script may be tedious to compile, though.
Use table-functions (functions declared with RETURNS SETOF or RETURNS TABLE) instead of views. Thereby you get "late binding": the object names will be looked up in the system catalogs at execution time, not at creation time. It will be your responsibility that those objects can, in fact, be found.
The search_path can be pre-set per function or the search_path of the executing role will be effective for objects that are not explicitly schema-qualified. Detailed instructions and links in this related answer on SO.
Functions are basically like prepared statements and behave subtly different from views. Details in this related answer on dba.SE.
Take the TRUNCATE and INSERT route for the new data instead of DROP and CREATE. Then all references stay intact. Find a more detailed answer about that here.
If foreign keys reference your table, you have to use DELETE FROM table instead of TRUNCATE, or drop and recreate the foreign key constraints. It will be your responsibility that the referential integrity can be restored, or the recreation of the foreign key will fail.
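The "empty and refill in place" route can be sketched as below. SQLite is used only to make the example runnable: it has no TRUNCATE, so the sketch uses the DELETE FROM variant, and SQLite resolves view references differently from PostgreSQL, so this illustrates the reload pattern rather than PostgreSQL's binding behaviour. The names static_country, app_country, and reload_static are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for a table in the static_data schema.
CREATE TABLE static_country (code TEXT PRIMARY KEY, name TEXT);
INSERT INTO static_country VALUES ('XA', 'Old name');

-- Stand-in for an application view built on top of it.
CREATE VIEW app_country AS SELECT code, name FROM static_country;
""")

def reload_static(conn, rows):
    """Replace the table contents in one transaction; the table itself is kept."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM static_country")
        conn.executemany("INSERT INTO static_country VALUES (?, ?)", rows)

reload_static(conn, [("XA", "New name"), ("XB", "Another")])

# The view was never dropped and now returns the new data.
view_rows = conn.execute(
    "SELECT code, name FROM app_country ORDER BY code"
).fetchall()
```

Doing both steps in one transaction means readers never see the table empty.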
I have some tables that are added to my database every year, and their names contain the year (like sell2005). I've written some EF queries on these tables, but a query can only target a single entity (like sell2005). What should I do when sell2006 or sell2007 is added? How can I handle them with the single query I've already written?
thank you.
There is no easy way. EF is simply not the tool for this scenario. For EF you must have a "single table", so you must either use partitioning with one real database table partitioned by year, or you must build a view on top of these tables.
The problem is that in EF you have a strict relation between classes and tables. You cannot have a single class mapped to multiple tables even if they are exactly the same (except through inheritance, which is not a solution for you). So the workaround would require multiple SSDL/MSL mappings, one for each table, and constructing the correct context instance with the correct mapping for every query. As far as I know, dynamic changes of mapping are not possible (except by modifying the SSDL/MSL files before using them).
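The "view on top of these tables" option could be maintained with a small script that rebuilds the view whenever a new yearly table appears. The sketch below uses SQLite and Python only to stay self-contained; the view name sell_all, the columns id and amount, and the helper rebuild_sell_view are assumptions, and on another database the catalog query would differ. The EF entity would then be mapped to the view rather than to any single yearly table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sell2005 (id INTEGER, amount REAL);
CREATE TABLE sell2006 (id INTEGER, amount REAL);
INSERT INTO sell2005 VALUES (1, 10.0);
INSERT INTO sell2006 VALUES (1, 20.0), (2, 5.0);
""")

def rebuild_sell_view(conn):
    """Recreate sell_all as a UNION ALL over every table named sellYYYY."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name GLOB 'sell[0-9][0-9][0-9][0-9]' "
            "ORDER BY name"
        )
    ]
    if not names:
        raise ValueError("no yearly sell tables found")
    # Names are restricted to 'sell' plus four digits, so interpolation is safe here.
    selects = [f"SELECT {int(n[4:])} AS year, id, amount FROM {n}" for n in names]
    conn.execute("DROP VIEW IF EXISTS sell_all")
    conn.execute("CREATE VIEW sell_all AS " + " UNION ALL ".join(selects))
    return names

tables = rebuild_sell_view(conn)
totals = conn.execute(
    "SELECT year, SUM(amount) FROM sell_all GROUP BY year ORDER BY year"
).fetchall()
```

When sell2007 is created, calling rebuild_sell_view again picks it up without changing any query that reads from sell_all.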