Maintaining two table structures the same on PostgreSQL

We have two tables A and B in PostgreSQL 9.4. We want to ensure that A's columns are always a subset of the columns of B (that is, preventing an ALTER on A from adding/dropping/modifying columns that would make it differ from B). How can this be achieved?
My guess is with a kind of trigger on ALTER (though triggers happen only on CRUD) or a special constraint on A (though a foreign key on every column seems like overkill).
(the use case: B is like a shadow of A, it will periodically receive a dump of A's records, so we want to be sure that if the structure of A changes we don't forget to change B accordingly)

As you already know, ordinary triggers fire only on CRUD operations. You could create a trigger that compares both tables on each CRUD operation and raises an alarm, but that would only alert you after the change was made and someone then touched tableA.
You could create an FK from each tableB column to the matching tableA column, but that would only tell you when someone tries to delete or rename a column. If someone changes a column's type or adds a new column, you won't get the alarm.
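PostgreSQL 9.3 and later also have event triggers, which fire on DDL statements rather than on row changes, so the check can run right after an ALTER TABLE. The following is only a sketch, assuming both tables live in the public schema under the unquoted names a and b; the function and trigger names are made up for illustration, and creating an event trigger requires superuser rights.

```sql
-- Hypothetical names throughout; adjust schema and table names to your setup.
CREATE OR REPLACE FUNCTION check_a_subset_of_b()
RETURNS event_trigger AS
$func$
DECLARE
    bad_col text;
BEGIN
    -- Find a column of a that has no column of the same name and type in b.
    SELECT a.attname INTO bad_col
    FROM pg_attribute a
    WHERE a.attrelid = 'public.a'::regclass
      AND a.attnum > 0
      AND NOT a.attisdropped
      AND NOT EXISTS (
          SELECT 1
          FROM pg_attribute b
          WHERE b.attrelid = 'public.b'::regclass
            AND b.attnum > 0
            AND NOT b.attisdropped
            AND b.attname = a.attname
            AND b.atttypid = a.atttypid
            AND b.atttypmod = a.atttypmod)
    LIMIT 1;

    IF bad_col IS NOT NULL THEN
        -- Raising here aborts the transaction, which rolls back the ALTER.
        RAISE EXCEPTION 'column "%" of table a is missing from table b or has a different type', bad_col;
    END IF;
END
$func$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER a_subset_of_b
    ON ddl_command_end
    WHEN TAG IN ('ALTER TABLE')
    EXECUTE PROCEDURE check_a_subset_of_b();
```

With this in place, new columns have to be added to b first and then to a. Changing a column's type in both tables will trip the check on the first statement, so for that case the trigger would have to be switched off temporarily with ALTER EVENT TRIGGER a_subset_of_b DISABLE. The function also assumes both tables exist whenever any ALTER TABLE runs in the database.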

@Abelisto is right in his comment: a good way to implement this in PostgreSQL is to use inheritance.
Another way is to have a master table and an "updatable VIEW" (or a view with RULEs...)
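A minimal sketch of the inheritance approach, with made-up column names: b is declared as a child of a, so it always has every column of a and may add its own.

```sql
-- Column names are illustrative only.
CREATE TABLE a (id integer, name text);

-- b inherits all columns of a and can carry extra ones of its own.
CREATE TABLE b (loaded_at timestamptz) INHERITS (a);

-- Structural changes on a are propagated to b automatically.
ALTER TABLE a ADD COLUMN created_at timestamptz;
ALTER TABLE a ALTER COLUMN name TYPE varchar(100);

-- A plain SELECT on a also returns the rows stored in b;
-- use ONLY to read just the rows that live in a itself.
SELECT id, name FROM ONLY a;
```

The last point matters for the shadow-table use case: since b receives copies of a's rows, queries against a that should not see those copies need FROM ONLY a.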

Related

Migrating `int` to `bigint` in PostgreSQL without any downtime?

I have a database that is going to experience the integer exhaustion problem that Basecamp famously faced back in November. I have several months to figure out what to do.
Is there a no-downtime-required, proactive solution to migrating this column type? If so what is it? If not, is it just a matter of eating the downtime and migrating the column when I can?
Is this article sufficient, assuming I have several days/weeks to perform the migration now before I'm forced to do it when I run out of ids?
Use logical replication.
With logical replication you can have different data types at primary and standby.
Copy the schema with pg_dump -s, change the data types on the copy and then start logical replication.
Once all data is copied over, switch the application to use the standby.
For zero down time, the application has to be able to reconnect and retry, but that's always a requirement in such a case.
You need PostgreSQL v10 or later for that, and your database should not modify the schema during the migration (DDL is not replicated) and should not use sequences (SERIAL or IDENTITY), as the last used value would not be replicated.
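As a rough sketch of the logical replication setup, assuming a table called big_table with an id column and a reachable host old-db.example (all of these names, and the connection string, are placeholders), and assuming wal_level = logical on the source:

```sql
-- On the existing (source) database:
CREATE PUBLICATION bigint_migration FOR TABLE big_table;

-- On the copy, after restoring the schema produced by pg_dump -s.
-- The table is still empty here, so changing the type is cheap.
ALTER TABLE big_table ALTER COLUMN id TYPE bigint;

-- Start copying the existing rows and streaming subsequent changes.
CREATE SUBSCRIPTION bigint_migration_sub
    CONNECTION 'host=old-db.example dbname=app user=replicator'
    PUBLICATION bigint_migration;
```

Once the subscriber has caught up, the application can be pointed at the copy.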
Another solution for pre-v10 databases where all transactions are short:
1. Add a bigint column to the table.
2. Create a BEFORE trigger that sets the new column whenever a row is added or updated.
3. Run a series of updates that set the new column from the old one where it IS NULL. Keep those batches short so you don't lock long and don't deadlock much. Make sure these transactions run with session_replication_role = replica so they don't fire triggers.
4. Once all rows are updated, create a unique index CONCURRENTLY on the new column.
5. Add a unique constraint USING the index you just created. That will be fast.
6. Perform the switch:
BEGIN;
ALTER TABLE ... DROP oldcol;
ALTER TABLE ... RENAME newcol TO oldcol;
COMMIT;
That will be fast.
Your new column has no NOT NULL set. This cannot be done without a long invasive lock. But you can add a check constraint IS NOT NULL and create it NOT VALID. That is good enough, and you can later validate it without disruptions.
If there are foreign key constraints, things get a little more complicated. You have to drop these and create NOT VALID foreign keys to the new column.
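The preparation steps of this approach could look roughly like the following. This is a sketch under assumptions, not the answerer's exact script: big_table, id, new_id, the batch size and all object names are invented for illustration. Setting session_replication_role requires superuser rights, and CREATE INDEX CONCURRENTLY cannot run inside a transaction block.

```sql
-- Add the new, wider column (nullable, so this is quick).
ALTER TABLE big_table ADD COLUMN new_id bigint;

-- Keep new_id in sync for rows that are inserted or updated from now on.
CREATE FUNCTION big_table_sync_new_id() RETURNS trigger AS
$func$
BEGIN
    NEW.new_id := NEW.id;
    RETURN NEW;
END
$func$ LANGUAGE plpgsql;

CREATE TRIGGER big_table_sync_new_id
    BEFORE INSERT OR UPDATE ON big_table
    FOR EACH ROW EXECUTE PROCEDURE big_table_sync_new_id();

-- Backfill in short batches; repeat this UPDATE until it touches 0 rows.
SET session_replication_role = replica;  -- ordinary triggers do not fire
UPDATE big_table
SET    new_id = id
WHERE  id IN (SELECT id FROM big_table WHERE new_id IS NULL LIMIT 10000);
RESET session_replication_role;

-- Build the index without blocking writes, then attach it as a constraint.
CREATE UNIQUE INDEX CONCURRENTLY big_table_new_id_idx ON big_table (new_id);
ALTER TABLE big_table
    ADD CONSTRAINT big_table_new_id_key UNIQUE USING INDEX big_table_new_id_idx;

-- Stand-in for NOT NULL: add the check without scanning, validate later.
ALTER TABLE big_table
    ADD CONSTRAINT big_table_new_id_not_null CHECK (new_id IS NOT NULL) NOT VALID;
ALTER TABLE big_table VALIDATE CONSTRAINT big_table_new_id_not_null;
```

After the switch, the sync trigger and its function are no longer needed and can be dropped.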
Create a copy of the old table, but with the modified ID field. Next, create a trigger on the old table that inserts new data into both tables. Finally, copy the data from the old table to the new one (it is a good idea to distinguish pre-trigger data from post-trigger data, for example by id if it is sequential). Once you are done, switch the tables and delete the old one.
This obviously requires twice as much space (and time for copy) but will work without any downtime.

Alter column ignoring dependent views

I have a column named column of type character varying(20), and I want to increase it to 50:
ALTER TABLE table ALTER COLUMN column TYPE character varying(50);
I get an error that view view_name depends on column "column". How can I alter the column without dropping and recreating about 10 dependent views?
The answer to your question can be found on this blog
PostgreSQL is very restrictive when it comes to modifying existing
objects. Very often when you try to ALTER TABLE or REPLACE VIEW it
tells you that you cannot do it, because there's another object
(typically a view or materialized view), which depends on the one you
want to modify. It seems that the only solution is to DROP dependent
objects, make desired changes to the target object and then recreate
dropped objects.
It is tedious and cumbersome, because those dependent objects can have
further dependencies, which also may have other dependencies and so
on. I created utility functions which can help in such situations.
The usage is very simple - you just have to call:
select deps_save_and_drop_dependencies(p_schema_name, p_object_name);
You have to pass two arguments: the name of the schema and the name of the
object in that schema. This object can be a table, a view or a
materialized view. The function will drop all views and materialized
views dependent on p_schema_name.p_object_name and save DDL which
restores them in a helper table.
When you want to restore those dropped objects (for example when you
are done modifying p_schema_name.p_object_name), you just need to make
another simple call:
select deps_restore_dependencies(p_schema_name,p_object_name);
and the dropped objects will be recreated.
These functions take care of:
* the dependency hierarchy
* the proper order of dropping and creating views/materialized views across the hierarchy
* restoring comments and grants on views/materialized views
Click here
for a working sqlfiddle example or check this gist for a complete
source code
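Put together, and assuming the two helper functions from the gist have already been installed in the database, the whole change might be run like this. The schema name public and the names my_table and my_column are placeholders for your own objects.

```sql
BEGIN;

-- Save the DDL of all dependent views, then drop them.
SELECT deps_save_and_drop_dependencies('public', 'my_table');

-- Nothing depends on the column any more, so the change goes through.
ALTER TABLE my_table ALTER COLUMN my_column TYPE character varying(50);

-- Recreate the dropped views from the saved DDL.
SELECT deps_restore_dependencies('public', 'my_table');

COMMIT;
```

Because DDL is transactional in PostgreSQL, wrapping the three steps in one transaction means a failure anywhere leaves the views as they were.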
It's not possible yet, but it is on the TODO list. For now, you should create a script that handles this as a series of simple view (re)creations.

On INSERT to a table INSERT data in connected tables

I have two tables that have a column named id_user in common. These two tables are created in my Drupal webpage at some point (I don't know when, because I didn't create the Netbeans project).
I checked on the internet and found that probably by adding REFERENCES 1sttable (id_user) to the second table, it should copy the value of the 1sttable (that is always created when a new user arrives) to the id_user value of the 2ndtable (that I don't know at which point is created). Is it correct?
If it's not correct I would like to know a way in pgAdmin that could make me synchronize those tables, or at least create both of them in the same moment.
The problem I have is that the new user has a new row on 1sttable automatically as soon as he registers, while to get a new row on 2ndtable it needs some kind of "activation" like inserting all of the data. What I'm looking for is a way that as soon as there is a new row in the 1sttable, it automatically creates the new row on the other table too. I don't know how to make it more clear (English is not my native language).
The solution you gave me seems clear for the question as asked, but the problem is a little bigger: the tables hold different kinds of variables. One table is in MySQL, with the user data (the Drupal default for users); then I have two in PostgreSQL, both with the same primary key (id_user):
the first has 118 columns, most of them real integer;
the second has 50 columns, with mixed types.
The web application I'm using needs both of these tables to have all their values NOT EMPTY (otherwise I get a NullPointerException) to work, so what I'm looking for is (I think):
when the user registers in Drupal by entering his email, the two filled-in rows are created automatically, so that the web application works as soon as the email is stored in MySQL. Is it possible? Is it well explained?
My environment is:
windows server 2008 enterprise edition
glassfish 2.1
netbeans 6.7.1
drupal 6.17
postgresql 8.4
mysql 5.1.48
pgAdmin is just the GUI. You mean PostgreSQL, the RDBMS.
A foreign key constraint, like the one you have, only enforces that no value can be used that isn't present in the referenced column. You can use ON UPDATE CASCADE or ON DELETE CASCADE to propagate changes from the referenced column, but you cannot create new rows with it like you describe. You got the wrong tool.
What you describe could be achieved with a trigger. Another, more complex way would be a RULE. Go with a trigger here.
In PostgreSQL you need a trigger function, mostly using plpgsql, and a trigger on a table that makes use of it.
Something like:
CREATE OR REPLACE FUNCTION trg_insert_row_in_tbl2()
  RETURNS trigger AS
$func$
BEGIN
   INSERT INTO tbl2 (my_id, col1)
   VALUES (NEW.my_id, NEW.col1);  -- more columns?
   RETURN NEW;  -- doesn't matter much for AFTER trigger
END
$func$ LANGUAGE plpgsql;
And a trigger AFTER INSERT on tbl1:
CREATE TRIGGER insaft
AFTER INSERT ON tbl1
FOR EACH ROW EXECUTE PROCEDURE trg_insert_row_in_tbl2();
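To try this out, a pair of throwaway tables is enough. The definitions below are assumptions made only for the demonstration (the real tables have many more columns): create the tables first, then the function and trigger above, then run the INSERT.

```sql
-- Hypothetical minimal tables matching the column names used above.
CREATE TABLE tbl1 (my_id integer PRIMARY KEY, col1 text);
CREATE TABLE tbl2 (my_id integer PRIMARY KEY REFERENCES tbl1 (my_id), col1 text);

-- (create trg_insert_row_in_tbl2() and the insaft trigger here)

-- Inserting into tbl1 now also creates the matching row in tbl2.
INSERT INTO tbl1 (my_id, col1) VALUES (1, 'foo');
SELECT my_id, col1 FROM tbl2;
```

If the second table has NOT NULL columns without defaults, the INSERT inside the trigger function has to supply values for them as well.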
You might want to read about using Drupal hooks to add extra code to be run when a user is registered. Once you know how to use hooks, you can write code (in a module) to insert a corresponding record in the 2nd table. A good candidate hook to use here would be hook_user for Drupal 6 or hook_user_insert for Drupal 7.
The REFERENCES you read about is part of an SQL command to define a foreign key constraint from the second table to the first. This is not strictly necessary to solve your problem, but it can help in keeping your database consistent. I suggest you read up on database structures and constraints if you want to learn more on this topic.

PostgreSQL: dynamic row values (?)

Oh helloes!
I have two tables, first one (let's call it NameTable) is preset with a set of values (id, name) and the second one (ListTable) is empty but with same columns.
The question is: How can I insert into ListTable a value that comes from NameTable? So that if I change one name in the NameTable then automagically the values in ListTable are updated as well.
Is there an INSERT for this, or do the tables have to be created in some special manner?
Tried browsing the manual but without success :(
The suggestion for using INSERT...SELECT is the best method for moving between tables in the same database.
However, there's another way to deal with the auto-update requirement.
It sounds like these are your criteria:
Table A is defined with columns (x,y)
(x,y) is unique
Table B is also defined with columns (x,y)
Table A is a superset of Table B
Table B is to be loaded with data from Table A and needs to remain in sync with UPDATEs on Table A.
This is a job for a FOREIGN KEY with the option ON UPDATE CASCADE:
ALTER TABLE B ADD FOREIGN KEY (x,y) REFERENCES A (x,y) ON UPDATE CASCADE;
Now, not only will it auto-update Table B when Table A is updated, Table B is also protected against containing (x,y) pairs that do not exist in Table A. If you want records to auto-delete from Table B when they are deleted from Table A, add "ON DELETE CASCADE".
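A small end-to-end illustration of the cascading behaviour, using made-up column types and sample values. In PostgreSQL the clause that propagates deletes is ON DELETE CASCADE, and it is included here alongside ON UPDATE CASCADE.

```sql
-- Column types and data are illustrative.
CREATE TABLE a (x integer, y text, UNIQUE (x, y));
CREATE TABLE b (
    x integer,
    y text,
    FOREIGN KEY (x, y) REFERENCES a (x, y)
        ON UPDATE CASCADE
        ON DELETE CASCADE
);

INSERT INTO a VALUES (1, 'old name');
INSERT INTO b SELECT x, y FROM a;   -- load b from a

UPDATE a SET y = 'new name' WHERE x = 1;
SELECT x, y FROM b;                 -- the row in b now reads (1, 'new name')

DELETE FROM a WHERE x = 1;          -- the matching row in b is removed too
```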
Hmmm... I'm a bit confused about exactly what you want to do or why, but here are a couple of pointers towards things you might want to take a look at: table inheritance, triggers and rules.
Table inheritance in PostgreSQL allows a table to share the data of another table. So, if you add a row to the base table, it won't show up in the inherited table, but if you add a row to the inherited table, it will show up when you query either table, and updates in either place will be reflected in both.
Triggers allow you to setup code that will be run when insert, update or delete operations happen on a table. This would allow you to add the behavior you describe manually.
Rules allow you to setup a rule that will replace a matching query with an alternative query when a specific condition is met.
If you describe your problem further as in why you want this behavior, it might be easier to suggest the right way to go about things :-)

How do I INSERT and SELECT data with partitioned tables?

I set up a set of partitioned tables per the docs at http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html
CREATE TABLE t (year, a);
CREATE TABLE t_1980 ( CHECK (year = 1980) ) INHERITS (t);
CREATE TABLE t_1981 ( CHECK (year = 1981) ) INHERITS (t);
CREATE RULE t_ins_1980 AS ON INSERT TO t WHERE (year = 1980)
DO INSTEAD INSERT INTO t_1980 VALUES (NEW.year, NEW.a);
CREATE RULE t_ins_1981 AS ON INSERT TO t WHERE (year = 1981)
DO INSTEAD INSERT INTO t_1981 VALUES (NEW.year, NEW.a);
From my understanding, if I INSERT INTO t (year, a) VALUES (1980, 5), it will go to t_1980, and if I INSERT INTO t (year, a) VALUES (1981, 3), it will go to t_1981. But, my understanding seems to be incorrect. First, I can't understand the following from the docs
"There is currently no simple way to specify that rows must not be inserted into the master table. A CHECK (false) constraint on the master table would be inherited by all child tables, so that cannot be used for this purpose. One possibility is to set up an ON INSERT trigger on the master table that always raises an error. (Alternatively, such a trigger could be used to redirect the data into the proper child table, instead of using a set of rules as suggested above.)"
Does the above mean that in spite of setting up the CHECK constraints and the RULEs, I also have to create TRIGGERs on the master table so that the INSERTs go to the correct tables? If that were the case, what would be the point of the db supporting partitioning? I could just set up the separate tables myself? I inserted a bunch of values into the master table, and those rows are still in the master table, not in the inherited tables.
Second question. When retrieving the rows, do I select from the master table, or do I have to select from the individual tables as needed? How would the following work?
SELECT year, a FROM t WHERE year IN (1980, 1981);
Update: Seems like I have found the answer to my own question
"Be aware that the COPY command ignores rules. If you are using COPY to insert data, you must copy the data into the correct child table rather than into the parent. COPY does fire triggers, so you can use it normally if you create partitioned tables using the trigger approach."
I was indeed using COPY FROM to load data, so RULEs were being ignored. Will try with TRIGGERs.
Definitely try triggers.
If you think you want to implement a rule, don't (the only exception that comes to mind is updatable views). See this great article by depesz for more explanation there.
In reality, Postgres only supports partitioning on the reading side of things. You're going to have to set up the method of insertion into partitions yourself, in most cases with a TRIGGER. Depending on the needs and applications, it can sometimes be faster to teach your application to insert directly into the partitions.
When selecting from partitioned tables, you can indeed just SELECT ... WHERE ... on the master table, so long as your CHECK constraints are properly set up (they are in your example) and the constraint_exclusion parameter is set correctly.
For 8.4:
SET constraint_exclusion = partition;
For < 8.4:
SET constraint_exclusion = on;
All this being said, I actually really like the way Postgres does it and use it myself often.
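For the table t from the question, a routing trigger along the lines of the one in the PostgreSQL partitioning documentation might look like this. The function and trigger names are invented, and the two partitions are the ones from the question.

```sql
CREATE OR REPLACE FUNCTION t_insert_router()
RETURNS trigger AS
$func$
BEGIN
    -- Send the row to the child table whose CHECK constraint it satisfies.
    IF NEW.year = 1980 THEN
        INSERT INTO t_1980 VALUES (NEW.*);
    ELSIF NEW.year = 1981 THEN
        INSERT INTO t_1981 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for year %', NEW.year;
    END IF;
    RETURN NULL;  -- suppress the insert into the parent table itself
END
$func$ LANGUAGE plpgsql;

CREATE TRIGGER t_insert_router
    BEFORE INSERT ON t
    FOR EACH ROW EXECUTE PROCEDURE t_insert_router();

-- Reads go through the parent; with constraint exclusion enabled the
-- planner can skip partitions whose CHECK constraint rules them out.
EXPLAIN SELECT year, a FROM t WHERE year = 1980;
```

Unlike rules, this trigger also fires for COPY into t. One side effect of RETURN NULL is that INSERT ... RETURNING on the parent yields no row, since nothing is actually inserted into t.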
"Does the above mean that in spite of setting up the CHECK constraints and the RULEs, I also have to create TRIGGERs on the master table so that the INSERTs go to the correct tables?"
Yes. Read point 5 (section 5.9.2)
"If that were the case, what would be the point of the db supporting partitioning? I could just set up the separate tables myself?"
Basically: the INSERTS in the child tables must be done explicitly (either creating TRIGGERS, or by specifying the correct child table in the query). But the partitioning is transparent for SELECTS, and (given the storage and indexing advantages of this schema) that's the point.
(Besides, because the partitioned tables are inherited, the schema is inherited from the parent, hence consistency is enforced).
Triggers are definitely better than rules.
Today I played with partitioning a materialized view table and ran into a problem with the trigger solution.
Why?
I'm using RETURNING, and the current solution returns NULL :)
But here's a solution which works for me - correct me if I'm wrong.
1. I have 3 tables into which some data is inserted, and there's a view (let's call it viewfoo) which contains the data that needs to be materialized.
2. The last table has an insert trigger which inserts into the materialized view table via INSERT INTO matviewtable SELECT * FROM viewfoo WHERE recno=NEW.recno;
That works fine, and I'm using RETURNING recno; (recno is of SERIAL type - a sequence).
The materialized view (table) needs to be partitioned because it's huge, and according to my tests SELECTs are at least 10x faster in this case.
Problems with partitioning:
* The current trigger solution does RETURN NULL, so I cannot use RETURNING recno. (Current trigger solution = the trigger explained on the depesz page.)
Solution:
I changed the trigger on my 3rd table so that it does NOT insert into the materialized view table (that table is the parent of the partitioned tables), and created a new trigger which inserts into the partitioned table directly FROM the 3rd table; that trigger does RETURN NEW.
The materialized view table is automagically updated and RETURNING recno works fine.
I'll be glad if this helps anybody.