Using the serial datatype as a foreign key - postgresql

Let's say that I have two tables.
The first is: table lists, with list_id SERIAL, list_name TEXT
The second table is, trivially, a table which says if the list is public: list_id INT, is_public INT
Obviously this is a bit of a contrived case, but I am planning out some tables and this seems to be an issue. If I insert a new list_name into table lists, it'll give me a new serial number... but now I need to use that serial number in the second table. In this case you could simply add is_public to the first table, but with a linking table that has a compound key, you'll need to know the serial value that was returned.
How do people usually handle this? Do they get the returned value from the insert through whatever client they use to talk to the database?

One approach to this sort of thing is:
INSERT...
SELECT lastval()
INSERT...
INSERT into the first table, use lastval() to get the "value most recently obtained with nextval for any sequence" (in the current session), and then use that value to build your next INSERT.
There's also INSERT ... RETURNING:
The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted. This is primarily useful for obtaining values that were supplied by defaults, such as a serial sequence number.
Using INSERT ... RETURNING id basically combines the first two steps above into one, so you'd do:
INSERT ... RETURNING id
INSERT ...
where the second INSERT would use the id returned from the first INSERT.
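A minimal runnable sketch of the "get the generated key, then use it in the next INSERT" pattern, using Python's standard-library sqlite3 module as a stand-in for a PostgreSQL client. This is an assumption: SQLite has no lastval() and older versions lack RETURNING, so cursor.lastrowid plays that role here. The second table's name, list_visibility, is hypothetical, since the question doesn't name it.

```python
import sqlite3


def create_list(conn: sqlite3.Connection, name: str, is_public: bool) -> int:
    """Insert a list, then reuse its generated key in the dependent table."""
    with conn:  # one transaction: both rows are committed together or not at all
        cur = conn.execute("INSERT INTO lists (list_name) VALUES (?)", (name,))
        list_id = cur.lastrowid  # stands in for PostgreSQL's RETURNING list_id
        conn.execute(
            "INSERT INTO list_visibility (list_id, is_public) VALUES (?, ?)",
            (list_id, int(is_public)),
        )
    return list_id


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE lists (list_id INTEGER PRIMARY KEY, list_name TEXT);
    -- hypothetical name for the question's second table
    CREATE TABLE list_visibility (
        list_id INTEGER NOT NULL REFERENCES lists (list_id),
        is_public INTEGER NOT NULL
    );
    """
)

first = create_list(conn, "groceries", True)
second = create_list(conn, "chores", False)
print(first, second)  # 1 2
```

In PostgreSQL itself, both steps can also be folded into a single statement by putting the first INSERT ... RETURNING list_id in a WITH clause and selecting from it in the second INSERT.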

Related

insert function with side-effects: how to take insert parameter?

I'm interested in writing a Postgres function that inserts a new row into, e.g., an invoice table, and performs some side effects based on the result of the insertion. My invoice table has some columns with default values (e.g. auto-generated primary key column id), and some that are optional.
I'm wondering if there's a way to take a parameter that represents a row of the invoice table, possibly without default and optional fields, and insert that value directly as a row.
CREATE OR REPLACE FUNCTION public.insert_invoice(new_invoice invoice)
RETURNS uuid
LANGUAGE sql
AS $$
WITH invoice_insert_result AS (
-- This fails: new_invoice has type invoice, but expected type uuid (because it thinks we want to put `new_invoice` in the "id" column)
INSERT INTO invoice VALUES (new_invoice)
RETURNING id
)
-- Use the result to perform side-effects
SELECT invoice_insert_result.id FROM invoice_insert_result
$$;
I know this is possible to do by replicating the schema of the invoice table in the list of parameters of the function, however I'd prefer not to do that since it would mean additional boilerplate and maintenance burden.
The id column's uuid is generated automatically by its default, so it should not be part of the row you insert.
The target column names can be listed in any order. If no list of
column names is given at all, the default is all the columns of the
table in their declared order; or the first N column names, if there
are only N columns supplied by the VALUES clause or query. The values
supplied by the VALUES clause or query are associated with the
explicit or implicit column list left-to-right.
https://www.postgresql.org/docs/current/sql-insert.html
The quoted passage means that an INSERT either lists its target columns explicitly, or, if it lists none, the supplied values are matched to the table's columns left to right, starting with the first one.
To achieve your intended result, you must specify the column list in the INSERT. Otherwise you would have to supply the uuid value yourself, even though it is meant to be auto-generated. The same goes for a table with a bigserial column: you leave that column out of the list and let its default generate the value.
For the other, non-automatic columns, you can group them into a custom composite type.
Demo:
create type inv_insert_template as (receiver text, base_amount numeric,tax_rate numeric);
Full function:
CREATE OR REPLACE FUNCTION public.insert_invoice(new_invoice inv_insert_template)
RETURNS bigint
LANGUAGE sql
AS $$
WITH invoice_insert_result AS (
INSERT INTO invoices(receiver,base_amount, tax_rate)
VALUES (new_invoice.receiver,
new_invoice.base_amount,
new_invoice.tax_rate) RETURNING inv_no
)
SELECT invoice_insert_result.inv_no from invoice_insert_result;
$$;
call it: select * from public.insert_invoice(row('person_c', 1000, 0.1));
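The same "template type" idea, sketched in Python with the standard-library sqlite3 module standing in for PostgreSQL (an assumption; the SQL function above is the actual answer). A NamedTuple plays the role of inv_insert_template, and the column list is derived from its fields, so only the template has to change when the table grows. The table layout mirrors the demo above.

```python
import sqlite3
from typing import NamedTuple


class InvInsertTemplate(NamedTuple):
    """Only the caller-supplied columns; inv_no is left to the database."""
    receiver: str
    base_amount: float
    tax_rate: float


def insert_invoice(conn: sqlite3.Connection, new_invoice: InvInsertTemplate) -> int:
    # Build the column list from the template's field names (trusted, from code),
    # so the function body never repeats the table's schema by hand.
    cols = ", ".join(new_invoice._fields)
    placeholders = ", ".join("?" for _ in new_invoice._fields)
    with conn:
        cur = conn.execute(
            f"INSERT INTO invoices ({cols}) VALUES ({placeholders})", new_invoice
        )
    return cur.lastrowid  # the auto-generated inv_no


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (inv_no INTEGER PRIMARY KEY, receiver TEXT,"
    " base_amount NUMERIC, tax_rate NUMERIC)"
)
print(insert_invoice(conn, InvInsertTemplate("person_c", 1000, 0.1)))  # 1
```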

PostgreSQL: Return auto-generated ids from COPY FROM insertion

I have a non-empty PostgreSQL table with a GENERATED ALWAYS AS IDENTITY column id. I do a bulk insert with the C++ binding pqxx::stream_to, which I'm assuming uses COPY FROM. My problem is that I want to know the ids of the newly created rows, but COPY FROM has no RETURNING clause. I see several possible solutions, but I'm not sure if any of them is good, or which one is the least bad:
1. Provide the ids manually through COPY FROM, taking care to give the values which the identity sequence would have provided, then afterwards synchronize the sequence with setval(...).
2. First stream the data to a temp table with a custom index column for ordering. Then do something like
INSERT INTO foo (col1, col2)
SELECT ttFoo.col1, ttFoo.col2 FROM ttFoo
ORDER BY ttFoo.idx RETURNING foo.id
and depend on the fact that the identity sequence produces ascending numbers to correlate them with ttFoo.idx (I cannot do RETURNING ttFoo.idx too, because only the inserted row is available there, and it doesn't contain idx).
3. Query the current value of the identity sequence prior to insertion, then check afterwards which rows are new.
I would assume that this is a common situation, yet I don't see an obviously correct solution. What do you recommend?
You can find out which rows have been affected by your current transaction using the system columns. The xmin column contains the ID of the inserting transaction, so to return the id values you just copied, you could:
BEGIN;
COPY foo(col1,col2) FROM STDIN;
SELECT id FROM foo
WHERE xmin::text = (txid_current() % (2^32)::bigint)::text
ORDER BY id;
COMMIT;
The WHERE clause comes from this answer, which explains the reasoning behind it.
I don't think there's any way to optimise this with an index, so it might be too slow on a large table. If so, I think your second option would be the way to go, i.e. stream into a temp table and INSERT ... RETURNING.
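A sketch of that staging approach (option 2 from the question, reading the new ids back afterwards), using Python's sqlite3 module as a stand-in for PostgreSQL. Assumptions: SQLite allows only one writer at a time, so "ids above the previous maximum" is a safe test here; in PostgreSQL, concurrent sessions can take sequence values in between, which is why the answer above uses xmin or INSERT ... RETURNING. The correlation also relies on ids being handed out in ORDER BY order, which is the same assumption the question makes. The staging table name tt_foo follows the question's ttFoo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (id INTEGER PRIMARY KEY AUTOINCREMENT, col1 TEXT, col2 INTEGER)"
)
conn.execute("INSERT INTO foo (col1, col2) VALUES ('existing', 0)")  # non-empty table
conn.commit()

batch = [("a", 10), ("b", 20), ("c", 30)]

with conn:
    # 1. Bulk-load into a staging table that carries an explicit ordering column.
    conn.execute("CREATE TEMP TABLE tt_foo (idx INTEGER, col1 TEXT, col2 INTEGER)")
    conn.executemany(
        "INSERT INTO tt_foo (idx, col1, col2) VALUES (?, ?, ?)",
        [(i, col1, col2) for i, (col1, col2) in enumerate(batch)],
    )
    # 2. Note the highest id before copying into the real table.
    (before,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM foo").fetchone()
    # 3. Copy in idx order, so ids are handed out in the same order.
    conn.execute(
        "INSERT INTO foo (col1, col2) SELECT col1, col2 FROM tt_foo ORDER BY idx"
    )
    # 4. Read the new ids back in ascending order and pair them with idx.
    new_ids = [
        row[0]
        for row in conn.execute("SELECT id FROM foo WHERE id > ? ORDER BY id", (before,))
    ]

id_by_idx = dict(enumerate(new_ids))
print(id_by_idx)  # {0: 2, 1: 3, 2: 4}
```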
You could make id a uuid column instead.
Generate the ids randomly on the client first and then bulk insert them; that way you don't need the database to return the ids at all.
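A minimal sketch of that suggestion, using Python's uuid module to assign keys on the client and sqlite3 as a stand-in for the database. Assumptions: in PostgreSQL the column would have type uuid and the values would travel in the COPY stream, whereas SQLite simply stores them as text here. Because the keys exist before the load, nothing has to be read back.

```python
import sqlite3
import uuid

rows = [("a", 10), ("b", 20), ("c", 30)]

# Assign every key before the bulk load; the client already knows all the ids.
keyed = [(str(uuid.uuid4()), col1, col2) for col1, col2 in rows]
known_ids = [key for key, _, _ in keyed]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id TEXT PRIMARY KEY, col1 TEXT, col2 INTEGER)")
conn.executemany("INSERT INTO foo (id, col1, col2) VALUES (?, ?, ?)", keyed)
conn.commit()

# Any row can now be addressed by the key generated on the client.
print(conn.execute("SELECT col1 FROM foo WHERE id = ?", (known_ids[1],)).fetchone())
```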

How can a relational database with foreign key constraints ingest data that may be in the wrong order?

The database is ingesting data from a stream, and all the rows needed to satisfy a foreign key constraint may be late or never arrive.
This can likely be accomplished by using another datastore, one without foreign key constraints, and then when all the needed data is available, read into the database which has fk constraints. However, this adds complexity and I'd like to avoid it.
We're working on a solution that creates "placeholder" rows to point the foreign key to. When the real data comes in, the placeholder is replaced with real values. Again, this adds complexity, but it's the best solution we've found so far.
How do people typically solve this problem?
Edit: Some sample data which might help explain the problem:
Let's say we have these tables:
-- "order" is a reserved word in PostgreSQL, so it has to be double-quoted
CREATE TABLE "order" (
id INTEGER NOT NULL,
order_number INTEGER,
PRIMARY KEY (id),
UNIQUE (order_number)
);
CREATE TABLE line_item (
id INTEGER NOT NULL,
order_number INTEGER REFERENCES "order"(order_number),
PRIMARY KEY (id)
);
If I insert an order first, not a problem! But let's say I try:
INSERT INTO line_item (order_number) values (123) before order 123 was inserted. This will fail the fk constraint of course. But this might be the order I get the data, since it's reading from a stream that is collecting this data from multiple sources.
Also, to address @philpxy's question, I didn't really find much on this. One thing that was mentioned was deferred constraints, a mechanism that postpones the foreign key checks until the end of a transaction. I don't think that's possible in my case, however, since these insert statements run at random times, whenever the data is received.
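For reference, this is what a deferred foreign key does, sketched with Python's sqlite3 module standing in for PostgreSQL (an assumption; the DEFERRABLE INITIALLY DEFERRED clause is spelled the same way in both). The table is named orders rather than order to avoid the reserved word. The sketch shows the limitation described above: the check is postponed only until the end of the current transaction, so a line item whose order arrives in a later transaction still fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, order_number INTEGER UNIQUE);
    CREATE TABLE line_item (
        id INTEGER PRIMARY KEY,
        order_number INTEGER REFERENCES orders (order_number)
            DEFERRABLE INITIALLY DEFERRED
    );
    """
)

# Child first, parent second, in ONE transaction: the check waits until COMMIT.
with conn:
    conn.execute("INSERT INTO line_item (order_number) VALUES (123)")
    conn.execute("INSERT INTO orders (order_number) VALUES (123)")

# Child alone in its own transaction: COMMIT fails because order 456 never arrived.
try:
    with conn:
        conn.execute("INSERT INTO line_item (order_number) VALUES (456)")
    orphan_committed = True
except sqlite3.IntegrityError:
    conn.rollback()  # make sure the failed transaction is discarded
    orphan_committed = False

print(orphan_committed)  # False
```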
You have a business workflow problem, because line items of individual orders are coming in before the orders themselves have come in. One workaround, perhaps not ideal, would be to create a before insert trigger which checks, for every incoming insert to the line_item table, whether that order already exists in the order table. If not, then it will first insert the order record before trying the insert on line_item.
CREATE OR REPLACE FUNCTION "public"."fn_insert_order" () RETURNS trigger AS $$
BEGIN
-- create the parent order first if it does not exist yet
INSERT INTO "order" (order_number)
SELECT NEW.order_number
WHERE NOT EXISTS (SELECT 1 FROM "order" WHERE order_number = NEW.order_number);
RETURN NEW;
END;
$$
LANGUAGE plpgsql;
-- trigger
CREATE TRIGGER "trigger_insert_order"
BEFORE INSERT ON line_item FOR EACH ROW
EXECUTE PROCEDURE fn_insert_order();
Note: I am assuming that the id column of the order table in fact is auto increment, in which case Postgres would automatically assign a value to it when inserting as above. Most likely, this is what you want, as having two id columns which both need to be manually assigned does not make much sense.
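The same placeholder trigger, sketched with Python's sqlite3 module as a stand-in for PostgreSQL. This is an assumption: SQLite triggers take their body inline rather than calling a function, and INSERT OR IGNORE replaces the NOT EXISTS check (the closest PostgreSQL spelling would be INSERT ... ON CONFLICT (order_number) DO NOTHING). As in the answer, orders.id is assumed to be auto-assigned, and the table is called orders to sidestep the reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,               -- auto-assigned, as the answer assumes
        order_number INTEGER NOT NULL UNIQUE
    );
    CREATE TABLE line_item (
        id INTEGER PRIMARY KEY,
        order_number INTEGER REFERENCES orders (order_number)
    );
    -- Before each line item is stored, make sure its order row exists.
    CREATE TRIGGER trigger_insert_order BEFORE INSERT ON line_item
    BEGIN
        INSERT OR IGNORE INTO orders (order_number) VALUES (NEW.order_number);
    END;
    """
)

with conn:
    conn.execute("INSERT INTO line_item (order_number) VALUES (123)")  # order 123 unknown
    conn.execute("INSERT INTO line_item (order_number) VALUES (123)")  # reuses the placeholder

print(conn.execute("SELECT id, order_number FROM orders").fetchall())  # [(1, 123)]
```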
You could accomplish that with a BEFORE INSERT trigger on line_item.
In that trigger you query order if a matching item exists, and if not, you insert a dummy row.
That will allow the INSERT to succeed, at the cost of some performance.
To insert rows into "order", use
INSERT INTO "order" ...
ON CONFLICT (order_number) DO UPDATE SET
id = EXCLUDED.id;
Updating a primary key is problematic and may lead to conflicts. One way you could get around that is if you use negative ids for artificially generated orders (assuming that the real ids are positive). If you have any references to that primary key, you'd have to define the constraint with ON UPDATE CASCADE.
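A variation on that upsert which sidesteps the key-update problem: the placeholder keeps its id, and the real order only fills in the remaining columns. It is sketched with Python's sqlite3 module as a stand-in for PostgreSQL (an assumption; SQLite 3.24+ accepts the same ON CONFLICT (...) DO UPDATE SET ... = excluded.... form). The customer and is_placeholder columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        order_number INTEGER UNIQUE NOT NULL,
        customer TEXT,                             -- hypothetical payload column
        is_placeholder INTEGER NOT NULL DEFAULT 0  -- hypothetical marker column
    )
    """
)

# Real orders either create a row or fill in an existing placeholder; id never changes.
UPSERT = """
    INSERT INTO orders (order_number, customer, is_placeholder)
    VALUES (?, ?, 0)
    ON CONFLICT (order_number) DO UPDATE SET
        customer = excluded.customer,
        is_placeholder = 0
"""

with conn:
    # A line item arrived first, so a placeholder was created for order 123.
    conn.execute("INSERT INTO orders (order_number, is_placeholder) VALUES (123, 1)")
    conn.execute(UPSERT, (123, "ACME"))    # the real order 123 arrives later
    conn.execute(UPSERT, (456, "Globex"))  # no placeholder: a plain insert

rows = conn.execute(
    "SELECT id, order_number, customer, is_placeholder FROM orders ORDER BY id"
).fetchall()
print(rows)  # [(1, 123, 'ACME', 0), (2, 456, 'Globex', 0)]
```

Because the key stays put, foreign keys that already point at the placeholder need no ON UPDATE CASCADE.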

Getting error for auto increment fields when inserting records without specifying columns

We're in process of converting over from SQL Server to Postgres. I have a scenario that I am trying to accommodate. It involves inserting records from one table into another, WITHOUT listing out all of the columns. I realize this is not recommended practice, but let's set that aside for now.
drop table if exists pk_test_table;
create table public.pk_test_table
(
recordid SERIAL PRIMARY KEY NOT NULL,
name text
);
--example 1: works and will insert a record with an id of 1
insert into pk_test_table values(default,'puppies');
--example 2: fails
insert into pk_test_table
select first_name from person_test;
Error I receive in the second example:
column "recordid" is of type integer but expression is of type
character varying Hint: You will need to rewrite or cast the
expression.
The default keyword will tell the database to grab the next value.
Is there any way to utilize this keyword in the second example? Or some way to tell the database to ignore auto-incremented columns and just let them be populated like normal?
I would prefer to not use a subquery to grab the next "id".
This functionality works in SQL Server and hence the question.
Thanks in advance for your help!
If you can't list column names, you should instead use the DEFAULT keyword, as you've done in the simple insert example. That won't work with an insert into ... select ..., though.
For that, you need to invoke nextval. A subquery is not required, just:
insert into pk_test_table
-- by default, a serial column's sequence is named <table>_<column>_seq
select nextval('pk_test_table_recordid_seq'), first_name from person_test;
You do need to know the sequence name. You could look it up with pg_get_serial_sequence('pk_test_table', 'recordid'), or get it from information_schema based on the table name and inferring its primary key, using a function that takes just the table name as an argument. It'd be ugly, but it'd work. I don't think there's any way around needing to know the table name.
You're inserting a value into the first column, but it belongs in the second one.
Therefore, use the INSERT INTO table(field) VALUES(value) syntax, which names the target column.
Since you need to fetch the values from another table, replace VALUES(...) with the query:
insert into pk_test_table(name)
select first_name from person_test;
I hope it helps
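If the real goal is not having to type the column list, it can be read from the catalog, skipping the auto-increment key. This sketch uses Python's sqlite3 module as a stand-in for PostgreSQL (an assumption: SQLite's PRAGMA table_info takes the place of a query on information_schema.columns, and the helper name insert_select_skipping_pk is hypothetical). The source query's columns still have to line up, in order, with the remaining target columns, and the table name is interpolated into the SQL, so this is only for trusted, hard-coded names.

```python
import sqlite3


def insert_select_skipping_pk(conn: sqlite3.Connection, target: str, source_sql: str) -> str:
    """Run INSERT INTO target (<non-key columns>) <source_sql>, reading the
    column list from the catalog instead of typing it by hand."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({target})") if row[5] == 0]
    sql = f"INSERT INTO {target} ({', '.join(cols)}) {source_sql}"
    with conn:
        conn.execute(sql)
    return sql


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pk_test_table (recordid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_test (first_name TEXT);
    INSERT INTO person_test (first_name) VALUES ('puppies'), ('kittens');
    """
)
print(insert_select_skipping_pk(conn, "pk_test_table", "SELECT first_name FROM person_test"))
# INSERT INTO pk_test_table (name) SELECT first_name FROM person_test
```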
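I do it this way, via a separate function, though I think what actually gets around the issue is that the table defines DEFAULT settings on a per-field basis.
-- the DEFAULT below refers to this sequence, so it has to exist first
create sequence if not exists pk_test_table_id_seq;
create table public.pk_test_table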
(
recordid integer NOT NULL DEFAULT nextval('pk_test_table_id_seq'),
name text,
field3 integer NOT NULL DEFAULT 64,
null_field_if_not_set integer,
CONSTRAINT pk_test_table_pkey PRIMARY KEY ("recordid")
);
With function:
CREATE OR REPLACE FUNCTION func_pk_test_table() RETURNS void AS
$BODY$
INSERT INTO pk_test_table (name)
SELECT first_name FROM person_test;
$BODY$
LANGUAGE sql VOLATILE;
Then just execute the function with SELECT func_pk_test_table();
Notice that it doesn't have to specify every field, as long as the constraints allow it.

auto-increment column in PostgreSQL on the fly?

I was wondering if it is possible to add an auto-increment integer field on the fly, i.e. without defining it in a CREATE TABLE statement?
For example, I have a statement:
SELECT 1 AS id, t.type FROM t;
and can I change this to
SELECT some_nextval_magic AS id, t.type FROM t;
I need to create the auto-increment field on the fly in the some_nextval_magic part because the result relation is a temporary one during the construction of a bigger SQL statement. And the value of id field is not really important as long as it is unique.
I searched around here, and the answers to related questions (e.g. PostgreSQL Autoincrement) mostly involve specifying SERIAL or using nextval in CREATE TABLE. But I don't necessarily want to use CREATE TABLE or VIEW (unless I have to). There are also some discussions of generate_series(), but I am not sure whether it applies here.
-- Update --
My motivation is illustrated in this GIS.SE answer regarding the PostGIS extension. The original query was:
CREATE VIEW buffer40units AS
SELECT
g.path[1] as gid,
g.geom::geometry(Polygon, 31492) as geom
FROM
(SELECT
(ST_Dump(ST_UNION(ST_Buffer(geom, 40)))).*
FROM point
) as g;
where g.path[1] as gid is an id field "required for visualization in QGIS". I believe the only requirement is that it is integer and unique across the table. I encountered some errors when running the above query when the g.path[] array is empty.
While trying to fix the array in the above query, this thought came to me:
Since the gid value does not matter anyways, is there an auto-increment function that can be used here instead?
If you wish to have an id field that assigns a unique integer to each row in the output, then use the row_number() window function:
select
row_number() over () as id,
t.type from t;
The generated id will only be unique within each execution of the query. Multiple executions will not generate new unique values for id.
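A quick check of that behaviour, using Python's sqlite3 module as a stand-in for PostgreSQL (an assumption: SQLite 3.25+ supports the same row_number() OVER () window function). The table t and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t (type TEXT);
    INSERT INTO t (type) VALUES ('road'), ('river'), ('road');
    """
)

query = "SELECT row_number() OVER () AS id, t.type FROM t"
first_run = conn.execute(query).fetchall()
second_run = conn.execute(query).fetchall()

# Unique within one execution, but every execution starts again at 1.
print([row[0] for row in first_run])   # [1, 2, 3]
print([row[0] for row in second_run])  # [1, 2, 3]
```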