How to check number of rows consistency in PostgreSQL?

I have a database for saving various forms. I have a table of forms:
CREATE SEQUENCE seq_formtype;
CREATE TABLE formtype (
id_ft integer NOT NULL DEFAULT nextval('seq_formtype'),
name text
);
I have a table of different input fields in the form:
CREATE SEQUENCE seq_formstruct;
CREATE TABLE formstruct (
id_fs integer NOT NULL DEFAULT nextval('seq_formstruct'),
id_ft integer NOT NULL,
name text,
id_fstype text NOT NULL
);
And finally, I have a table in which I store the results from the form for each trial.
CREATE TABLE results (
id_trial integer NOT NULL,
id_fs integer NOT NULL,
res_value text
);
When I add the results, I want to check whether all inputs from formstruct were inserted - that is, that for each entry in formstruct where formtype = typ_trialu (pseudocode) there is an entry in results.
Now I am not even sure how to check this or where to start. My idea was to create a trigger that would check the consistency after insertion into results (i.e. after insertion of all input field results).

It could be done with trigger(s) after insert statements.
CREATE TRIGGER check_form_types_trigger
AFTER INSERT ON results
FOR EACH STATEMENT
EXECUTE PROCEDURE check_form_types_function();
And, in check_form_types_function (which should be plpgsql) you can raise an exception if your data (as a whole) are not consistent.
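A minimal sketch of such a function (assumptions: each trial uses a single form type, and there is at most one results row per field; the exact rule you need to enforce may differ):
CREATE OR REPLACE FUNCTION check_form_types_function() RETURNS trigger AS $$
BEGIN
    -- For every (trial, form type) pair present in results, compare the
    -- number of answered fields with the number of fields the form defines.
    IF EXISTS (
        SELECT 1
        FROM results r
        JOIN formstruct fs ON fs.id_fs = r.id_fs
        GROUP BY r.id_trial, fs.id_ft
        HAVING count(DISTINCT r.id_fs) <> (SELECT count(*) FROM formstruct
                                           WHERE id_ft = fs.id_ft)
    ) THEN
        RAISE EXCEPTION 'incomplete results: some formstruct fields have no entry';
    END IF;
    RETURN NULL;  -- the return value of an AFTER trigger is ignored
END;
$$ LANGUAGE plpgsql;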
But, on the other hand, if you do this, you literally won't be able to insert partial data into results; you will only be able to insert complete data, with a single insert statement. And if you really care about consistency, you should run this check after each update and delete statement too.
Notes:
names like fs, ft and fstype are terrible; consider renaming your columns (a reworked schema is sketched below)
consider using SERIALs (instead of manually set-up sequences)
consider using foreign keys
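For example, a reworked schema along those lines might look like this (a sketch; the new column names are only suggestions):
CREATE TABLE formtype (
    formtype_id serial PRIMARY KEY,
    name text
);
CREATE TABLE formstruct (
    formstruct_id serial PRIMARY KEY,
    formtype_id integer NOT NULL REFERENCES formtype,
    name text,
    fieldtype text NOT NULL
);
CREATE TABLE results (
    trial_id integer NOT NULL,
    formstruct_id integer NOT NULL REFERENCES formstruct,
    res_value text
);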

How can a relational database with foreign key constraints ingest data that may be in the wrong order?

The database is ingesting data from a stream, and all the rows needed to satisfy a foreign key constraint may be late or never arrive.
This can likely be accomplished by using another datastore, one without foreign key constraints, and then, when all the needed data is available, reading it into the database that has the fk constraints. However, this adds complexity and I'd like to avoid it.
We're working on a solution that creates "placeholder" rows to point the foreign key to. When the real data comes in, the placeholder is replaced with real values. Again, this adds complexity, but it's the best solution we've found so far.
How do people typically solve this problem?
Edit: Some sample data which might help explain the problem:
Let's say we have these tables:
-- "order" must be quoted because ORDER is a reserved word
CREATE TABLE "order" (
id INTEGER NOT NULL,
order_number INTEGER,
PRIMARY KEY (id),
UNIQUE (order_number)
);
CREATE TABLE line_item (
id INTEGER NOT NULL,
order_number INTEGER REFERENCES "order" (order_number),
PRIMARY KEY (id)
);
If I insert an order first, not a problem! But let's say I try:
INSERT INTO line_item (order_number) values (123) before order 123 was inserted. This will fail the fk constraint of course. But this might be the order I get the data, since it's reading from a stream that is collecting this data from multiple sources.
Also, to address #philpxy's question, I didn't really find much on this. One thing that was mentioned was deferred constraints. This is a mechanism that waits to do the fk constraints at the end of a transaction. I don't think it's possible to do that in my case however, since these insert statements will be run at random times whenever the data is received.
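For reference, this is what a deferrable version of the constraint would look like (a sketch; as noted above, it only helps when the out-of-order inserts happen inside a single transaction):
CREATE TABLE line_item (
id INTEGER NOT NULL,
order_number INTEGER REFERENCES "order" (order_number)
    DEFERRABLE INITIALLY DEFERRED,
PRIMARY KEY (id)
);

BEGIN;
INSERT INTO line_item (id, order_number) VALUES (1, 123);  -- not checked yet
INSERT INTO "order" (id, order_number) VALUES (1, 123);
COMMIT;  -- the foreign key is checked here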
You have a business workflow problem, because line items of individual orders are coming in before the orders themselves have come in. One workaround, perhaps not ideal, would be to create a before insert trigger which checks, for every incoming insert to the line_item table, whether that order already exists in the order table. If not, then it will first insert the order record before trying the insert on line_item.
CREATE OR REPLACE FUNCTION "public"."fn_insert_order" () RETURNS trigger AS $$
BEGIN
INSERT INTO "order" (order_number)
SELECT NEW.order_number
WHERE NOT EXISTS (SELECT 1 FROM "order" WHERE order_number = NEW.order_number);
RETURN NEW;
END
$$
LANGUAGE plpgsql;

-- trigger
CREATE TRIGGER "trigger_insert_order"
BEFORE INSERT ON line_item FOR EACH ROW
EXECUTE PROCEDURE fn_insert_order();
Note: I am assuming that the id column of the order table is in fact auto-increment, in which case Postgres will automatically assign a value to it when inserting as above. Most likely this is what you want, as having two id columns which both need to be manually assigned does not make much sense.
You could accomplish that with a BEFORE INSERT trigger on line_item.
In that trigger, you query order to check whether a matching order exists, and if not, you insert a dummy row.
That will allow the INSERT to succeed, at the cost of some performance.
To insert rows into order, use
INSERT INTO "order" ...
ON CONFLICT (order_number) DO UPDATE SET
id = EXCLUDED.id;
Updating a primary key is problematic and may lead to conflicts. One way you could get around that is if you use negative ids for artificially generated orders (assuming that the real ids are positive). If you have any references to that primary key, you'd have to define the constraint with ON UPDATE CASCADE.
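A sketch of that negative-id idea (the sequence name is illustrative):
-- A descending sequence defaults to MAXVALUE -1, so it yields -1, -2, -3, ...
CREATE SEQUENCE placeholder_order_id_seq INCREMENT -1;

-- Placeholder ids can then never collide with real (positive) ids.
INSERT INTO "order" (id, order_number)
VALUES (nextval('placeholder_order_id_seq'), 123);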

SQL function to read null fields and update with a string value

I have an insert that records data from a webform into my table. I'd like to run an update immediately after the insert that reads the previously inserted record, finds all of its null fields, and updates those fields with the string --.
The data type for all my fields is varchar.
I have 20+ forms, each with 100+ fields, so I'm looking for a function that is smart enough to read/update the fields that have null values without specifically enumerating/writing out each field in the update statement. That would just take way too long.
Does anyone know of a way to simply read which fields have null values and update any that are null to a string, in my case --?
If you can't alter your existing code, I would go with an insert trigger... so after every insert, you can check for null values and update them like below:
create trigger triggername
on tablename
after insert
as
begin
update t
set t.col1 = isnull(i.col1, '--'),
t.col2 = isnull(i.col2, '--')
-- ... repeat for the rest of the columns
from tablename t
join inserted i
on i.matchingcol = t.matchingcol
end
The issue with the above approach is that you have to check all inserted rows. I would still go with this approach, since filtering many columns with many OR clauses is not good for performance.
If it is just for display purposes, I would go with a view.
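A sketch of such a view (table and column names are placeholders):
CREATE VIEW form_display AS
SELECT COALESCE(col1, '--') AS col1,
       COALESCE(col2, '--') AS col2
       -- ... repeat for the remaining columns
FROM formdata;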
Instead of an update after insert, you may try changing the table structure.
Set the default value of the columns to --. If no value is provided on insert, -- will be inserted automatically.
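For example (a sketch, using the same placeholder names; Postgres syntax):
ALTER TABLE formdata ALTER COLUMN col1 SET DEFAULT '--';
ALTER TABLE formdata ALTER COLUMN col2 SET DEFAULT '--';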

Getting error for auto increment fields when inserting records without specifying columns

We're in the process of converting over from SQL Server to Postgres. I have a scenario that I am trying to accommodate. It involves inserting records from one table into another, WITHOUT listing out all of the columns. I realize this is not recommended practice, but let's set that aside for now.
drop table if exists pk_test_table;
create table public.pk_test_table
(
recordid SERIAL PRIMARY KEY NOT NULL,
name text
);
--example 1: works and will insert a record with an id of 1
insert into pk_test_table values(default,'puppies');
--example 2: fails
insert into pk_test_table
select first_name from person_test;
Error I receive in the second example:
column "recordid" is of type integer but expression is of type
character varying Hint: You will need to rewrite or cast the
expression.
The default keyword will tell the database to grab the next value.
Is there any way to utilize this keyword in the second example? Or some way to tell the database to ignore auto-incremented columns and just let them be populated as normal?
I would prefer to not use a subquery to grab the next "id".
This functionality works in SQL Server and hence the question.
Thanks in advance for your help!
If you can't list column names, you should instead use the DEFAULT keyword, as you've done in the simple insert example. This won't work with an insert into ... select ..., though.
For that, you need to invoke nextval. A subquery is not required, just:
insert into pk_test_table
select nextval('pk_test_table_recordid_seq'), first_name from person_test;
You do need to know the sequence name. You could get that from information_schema based on the table name and inferring its primary key, using a function that takes just the table name as an argument. It'd be ugly, but it'd work. I don't think there's any way around needing to know the table name.
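Alternatively, Postgres has the built-in pg_get_serial_sequence function, which looks up the sequence from the table and column names, so you only need to know the column:
insert into pk_test_table
select nextval(pg_get_serial_sequence('pk_test_table', 'recordid')), first_name
from person_test;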
You're inserting your value into the first column (recordid), but you want it to go into the second (name).
Therefore you can use INSERT INTO table(field) VALUES(value) syntax.
Since you need to fetch values from another table, you have to remove VALUES and put the subquery there.
insert into pk_test_table(name)
select first_name from person_test;
I hope it helps
I do it this way via a separate function, though I think I'm really getting around the issue by having DEFAULT settings at the table level on a per-field basis.
create sequence pk_test_table_recordid_seq;
create table public.pk_test_table
(
recordid integer NOT NULL DEFAULT nextval('pk_test_table_recordid_seq'),
name text,
field3 integer NOT NULL DEFAULT 64,
null_field_if_not_set integer,
CONSTRAINT pk_test_table_pkey PRIMARY KEY ("recordid")
);
With function:
CREATE OR REPLACE FUNCTION func_pk_test_table() RETURNS void AS
$BODY$
INSERT INTO pk_test_table (name)
SELECT first_name FROM person_test;
$BODY$
LANGUAGE sql VOLATILE;
Then just execute the function via SELECT func_pk_test_table();
Notice it doesn't have to specify all the fields, as long as the constraints allow it.

Odd postgres sequence behavior

I have a Postgres 9.0.4 database with a table in it called Versions:
CREATE TABLE tracking."Versions"
(
"ObjectId" UUID NOT NULL,
"From" BIGINT NOT NULL,
"To" BIGINT,
"DataTypeId" INTEGER NOT NULL REFERENCES tracking."DataTypes" ( "DataTypeId" ),
CONSTRAINT "Versions_pkey" PRIMARY KEY ("ObjectId", "DataTypeId")
);
There is also a sequence defined in the database that is used by the From & To columns:
CREATE SEQUENCE tracking."dbVersion"
INCREMENT 1
MINVALUE 1
MAXVALUE 9223372036854775807
START 1
CACHE 1;
The Versions table is actually keeping track of changes made to other tables. Without going into the details:
When a row is created in one of these other tables, a row is added to the Versions table and the From column is supposed to be set to the next value of the sequence.
If an existing row in one of those tables is updated, the From value of the corresponding row in the Versions table has to be set to the next value of the sequence.
When a row in one of these other tables is deleted, the To column has to be set to the next value of the sequence.
Rather than setting the default value of the From column to nextval('tracking."dbVersion"'), I implemented a stored function that returns the result of calling nextval:
CREATE OR REPLACE FUNCTION tracking."NextVersion"() RETURNS BIGINT
AS $$
SELECT nextval('tracking."dbVersion"'::regclass);
$$ LANGUAGE Sql;
All my code for inserting rows into the tables is implemented in C# using Entity Framework 4, and all of it is working fine. The weird thing is that when I look at the data in the Versions table, the values in the From column are all even. When I look at the sequence's properties in pgAdmin, its current value is odd. But the next time a row is inserted, the value stored is even.
What am I doing wrong? How does Postgres manage to use all of the sequence's values when you put that nextval call in the default of a column?
Well, time for me to feel sheepish.
I looked over my C# code for inserting rows into the Versions table and found that I was actually calling the NextVersion stored function twice. That explains why the value written to the From column was always even. I've removed the second call and the problem is solved.
Tony

Using the serial datatype as a foreign key

Let's say that I have two tables.
The first is: table lists, with list_id SERIAL, list_name TEXT
The second table is, trivially, a table which says if the list is public: list_id INT, is_public INT
Obviously a bit of a contrived case, but I am planning out some tables and this seems to be an issue. If I insert a new list_name into table lists, it'll give me a new serial number... but now I need to use that serial number in the second table. Obviously in this case you could simply add is_public to the first table, but in the case of a linking table where you have a compound key, you'll need to know the serial value that was returned.
How do people usually handle this? Do they get the return type from the insert using whatever system they're interacting with the database with?
One approach to this sort of thing is:
INSERT...
SELECT lastval()
INSERT...
INSERT into the first table, use lastval() to get the "value most recently obtained with nextval for any sequence" (in the current session), and then use that value to build your next INSERT.
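With the tables from the question it might look like this (a sketch; list_flags is a hypothetical name for the second table):
INSERT INTO lists (list_name) VALUES ('groceries');
-- lastval() returns the value nextval generated for lists.list_id above
INSERT INTO list_flags (list_id, is_public)
VALUES (lastval(), 1);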
There's also INSERT ... RETURNING:
The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted. This is primarily useful for obtaining values that were supplied by defaults, such as a serial sequence number.
Using INSERT ... RETURNING id basically combines the first two steps above into one, so you'd do:
INSERT ... RETURNING id
INSERT ...
where the second INSERT would use the id returned from the first INSERT.
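Concretely (again with the hypothetical list_flags table), the application reads the returned id and uses it in the second statement:
INSERT INTO lists (list_name) VALUES ('groceries') RETURNING list_id;
-- suppose the INSERT returned list_id = 42
INSERT INTO list_flags (list_id, is_public) VALUES (42, 1);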