Postgres auto insert with previous values if some data is missing - postgresql

I have data covering a date range, but sometimes no data arrives for a few days; for the missed window I just want to insert the previous data.
Is there a way to take care of this during the insert itself?
For example
create table foo (ID VARCHAR(10), foo_count int, actual_date date);
insert into foo values ('234534', 100, '2017-01-01'), ('234534', 200, '2017-01-02');
insert into foo values ('234534', 300, '2017-01-03');
insert into foo values ('234534', 300, '2017-01-08');
After the last insert I want the data for the skipped days to be generated as well.
So the table should look something like this:
   ID    | foo_count | actual_date
---------+-----------+-------------
 234534  |       100 | 2017-01-01
 234534  |       200 | 2017-01-02
 234534  |       300 | 2017-01-03
 234534  |       300 | 2017-01-04
 234534  |       300 | 2017-01-05
 234534  |       300 | 2017-01-06
 234534  |       300 | 2017-01-07
 234534  |       300 | 2017-01-08
I am using JPA to insert the data; currently I query the table for the latest date present and populate the missing rows myself.

I would think about a better INSERT statement. Inserting from a SELECT statement would make things easier. The SELECT statement could be used to generate the requested date series.
INSERT INTO foo
SELECT
--<advanced query>
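Fleshed out, such a statement could look like this sketch (the literals 300 and 2017-01-08 stand in for the incoming values, which you would bind as parameters; generate_series() produces the gap dates from the latest stored row):
INSERT INTO foo (id, foo_count, actual_date)
SELECT f.id, f.foo_count, gs::date
FROM (
    SELECT id, foo_count, actual_date
    FROM foo
    ORDER BY actual_date DESC
    LIMIT 1
) AS f,
generate_series(f.actual_date + 1, date '2017-01-08' - 1, interval '1 day') AS gs  -- gap rows
UNION ALL
SELECT '234534', 300, date '2017-01-08';                                           -- the new row itself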
However, I guess that's not simply possible, since you are not using JDBC directly or may not want to use native queries for inserting your data.
In that case, you could install a trigger to your database which could do the magic:
demo:db<>fiddle
Trigger function:
CREATE FUNCTION insert_missing()
RETURNS TRIGGER AS
$$
DECLARE
    max_record record;
BEGIN
    SELECT                          -- 1
        id,
        foo_count,
        actual_date
    INTO max_record
    FROM foo
    ORDER BY actual_date DESC
    LIMIT 1;
    IF (NEW.actual_date - 1 > max_record.actual_date) THEN    -- 2
        INSERT INTO foo
        SELECT
            max_record.id,
            max_record.foo_count,
            generate_series(max_record.actual_date + 1,
                            NEW.actual_date - 1,
                            interval '1 day');                -- 3
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
1. Query the record with the current maximum date.
2. If the maximum date is more than one day before the new date...
3. ...insert a date series (from the day after the current max date until the day before the new one). This can be generated with generate_series().
Afterwards, create the BEFORE INSERT trigger:
CREATE TRIGGER insert_missing
BEFORE INSERT
ON foo
FOR EACH ROW
EXECUTE PROCEDURE insert_missing();
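A quick smoke test (a sketch, assuming the foo table from the question and the trigger above):
INSERT INTO foo VALUES ('234534', 100, '2017-01-01');
INSERT INTO foo VALUES ('234534', 300, '2017-01-03'); -- trigger adds 2017-01-02 with count 100
INSERT INTO foo VALUES ('234534', 300, '2017-01-08'); -- trigger adds 2017-01-04 .. 2017-01-07 with count 300
SELECT * FROM foo ORDER BY actual_date;               -- 8 rows, no gaps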

Related

Postgres Date checking - is_date fn shortcircuit

I have a set of records in a table, some of which have an invalid date. I want to ignore the invalid records and run the check against the rest. I framed a query like the one below, but it doesn't work.
select * from tbl_name i
where is_date(i.dob) and i.dob::date > CURRENT_DATE;
I got to know that SQL doesn't short-circuit, so it also considers the invalid records and ends up with a date/time out of range error. Please help me alter this query so that it eliminates invalid dates and does the date comparison on valid dates only.
There is no guarantee of short-circuiting in Postgres. Neither in a "plain" WHERE clause, nor when using a derived table (from (select ...) where ...). One way to force the evaluation in two steps is a materialized common table expression:
with data as materialized (
    select *
    from tbl_name i
    where is_date(i.dob)
)
select *
from data
where dob::date > CURRENT_DATE;
The materialized keyword prevents the optimizer from pushing the condition of the outer query into the CTE.
Obviously this assumes that is_date() will never return false positives.
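For contrast, a formulation like the following is not guaranteed to be safe, because the optimizer may flatten the derived table and evaluate the cast on invalid rows first:
-- NOT guaranteed to short-circuit: the planner may push the outer
-- predicate into the subquery and reorder the two conditions.
select *
from (select * from tbl_name i where is_date(i.dob)) d
where d.dob::date > CURRENT_DATE;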
Use CASE in the WHERE clause to differentiate between a valid date and an invalid one: run the > comparison for valid dates, and return FALSE otherwise.
create or replace function is_date(s varchar) returns boolean as $$
begin
    if s is null then
        return false;
    end if;
    perform s::date;   -- try the cast, discard the result
    return true;
exception when others then
    return false;      -- any cast error means it is not a valid date
end;
$$ language plpgsql;
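A quick sanity check of the helper (a sketch; the expected results follow from the definition above):
select is_date('2022-12-03') as iso_date,    -- t
       is_date('xyz')        as not_a_date,  -- f
       is_date(null)         as null_input;  -- f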
create table date_str (id integer, dt_str varchar);
insert into date_str values (1, '2022-11-02'), (2, '1234'), (3, '2022-12-03');
insert into date_str values (4, 'xyz'), (5, '2022-01-01'), (6, '2023-02-02');
select * from date_str;
id | dt_str
----+------------
1 | 2022-11-02
2 | 1234
3 | 2022-12-03
4 | xyz
5 | 2022-01-01
6 | 2023-02-02
select current_date;
current_date
--------------
11/02/2022
SELECT *
FROM date_str
WHERE
    CASE WHEN is_date(dt_str) THEN
        dt_str::date > CURRENT_DATE
    ELSE
        FALSE
    END;
id | dt_str
----+------------
3 | 2022-12-03
6 | 2023-02-02

Why subqueried function does not insert new rows?

I need a function to insert rows because one column's (serialno) default value should be the same as the PK id.
I have defined table:
CREATE SEQUENCE some_table_id_seq
INCREMENT 1
START 1
MINVALUE 1
MAXVALUE 9223372036854775807
CACHE 1;
CREATE TABLE some_table
(
id bigint NOT NULL DEFAULT nextval('some_table_id_seq'::regclass),
itemid integer NOT NULL,
serialno bigint,
CONSTRAINT stockitem_pkey PRIMARY KEY (id),
CONSTRAINT stockitem_serialno_key UNIQUE (serialno)
);
and function to insert count of rows:
CREATE OR REPLACE FUNCTION insert_item(itemid int, count int DEFAULT 1) RETURNS SETOF bigint AS
$func$
DECLARE
ids bigint[] DEFAULT '{}';
id bigint;
BEGIN
FOR counter IN 1..count LOOP
id := NEXTVAL( 'some_table_id_seq' );
INSERT INTO some_table (id, itemid, serialno) VALUES (id, itemid, id);
ids := array_append(ids, id);
END LOOP;
RETURN QUERY SELECT unnest(ids);
END
$func$
LANGUAGE plpgsql;
And inserting with it works fine:
$ select insert_item(123, 10);
insert_item
-------------
1
2
3
4
5
6
7
8
9
10
(10 rows)
$ select * from some_table;
id | itemid | serialno
----+--------+----------
1 | 123 | 1
2 | 123 | 2
3 | 123 | 3
4 | 123 | 4
5 | 123 | 5
6 | 123 | 6
7 | 123 | 7
8 | 123 | 8
9 | 123 | 9
10 | 123 | 10
(10 rows)
But if I want to use the insert_item function in a subquery, it seems not to work anymore:
$ select id, itemid from some_table where id in (select insert_item(123, 10));
id | itemid
----+--------
(0 rows)
I created a dummy function insert_dumb to test in a subquery:
CREATE OR REPLACE FUNCTION insert_dumb(itemid int, count int DEFAULT 1) RETURNS SETOF bigint AS
$func$
DECLARE
ids bigint[] DEFAULT '{}';
BEGIN
FOR counter IN 1..count LOOP
ids := array_append(ids, counter::bigint);
END LOOP;
RETURN QUERY SELECT unnest(ids);
END
$func$
LANGUAGE plpgsql;
and this works in a subquery as expected:
$ select id, itemid from some_table where id in (select insert_dumb(123, 10));
id | itemid
----+--------
1 | 123
2 | 123
3 | 123
4 | 123
5 | 123
6 | 123
7 | 123
8 | 123
9 | 123
10 | 123
(10 rows)
Why does the insert_item function not insert new rows when called in a subquery? I tried adding a raise notice to the loop, and it runs as expected, reporting each new id (and incrementing the sequence), but no new rows are appended to the table.
I made the whole setup available as a fiddle.
I am using Postgres 11 on Ubuntu.
EDIT
Of course, I left out my real motivation, and it came back to bite me...
I need the insert_item function to return the ids so that I can use it in an UPDATE statement, like:
update some_table set some_text = 'x' where id in (select insert_item(123, 10));
And in addition to the why-question: it is understandable that I get no ids back (because the queries share the same snapshot), but the function runs all the needed INSERTs without affecting the table. Shouldn't those rows at least be available to the next query?
The problem is that the subquery and the surrounding query share the same snapshot, that is, they see the same state of the database. Hence the outer query cannot see the rows inserted by the inner query.
See the documentation (which explains that in the context of WITH, although it also applies here):
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables.
In addition, there is a second problem with your approach: if you run EXPLAIN (ANALYZE) on your statement, you will find that the subquery is not executed at all! Since the table is empty, there is no id, and running the subquery is not necessary to calculate the (empty) result.
You will have to run that in two different statements. Or, better, do it in a different fashion: updating a row that you just inserted is unnecessarily wasteful.
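For example, one way to split it into two statements (a sketch; the SET clause stands in for the asker's hypothetical some_text update):
-- Statement 1: perform the inserts and capture the returned ids.
CREATE TEMP TABLE tmp_ids AS
SELECT insert_item(123, 10) AS id;
-- Statement 2: runs with a fresh snapshot, so it sees the new rows.
UPDATE some_table
SET itemid = 456   -- stand-in for "some_text = 'x'"
WHERE id IN (SELECT id FROM tmp_ids);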
Laurenz explained the visibility problem, but you don't need the subquery at all if you rewrite your function to return the actual table rows rather than just the IDs:
CREATE OR REPLACE FUNCTION insert_item(itemid int, count int DEFAULT 1)
RETURNS setof some_table
AS
$func$
INSERT INTO some_table (id, itemid, serialno)
select NEXTVAL( 'some_table_id_seq' ), itemid, currval('some_table_id_seq')
from generate_series(1,count)
returning *;
$func$
LANGUAGE sql;
Then you can use it like this:
select id, itemid
from insert_item(123, 10);
And you get the complete inserted rows.
Online example

Generating incremental numbers based on a different column

I have a composite primary key in a table in PostgreSQL (I am using pgAdmin 4).
Let's call the two key columns productno and version.
version represents the version of productno.
So if I create a new dataset, then it needs to be checked if a dataset with this productno already exists.
If productno doesn't exist yet, then version should be (version) 1
If productno exists once, then version should be 2
If productno exists twice, then version should be 3
... and so on
So that we get something like:
 productno | version
-----------+---------
         1 |       1
         1 |       2
         1 |       3
         2 |       1
         2 |       2
I found a quite similar problem: auto increment on composite primary key
But I can't use that solution because the PostgreSQL syntax is obviously a bit different. I tried a lot with functions and triggers but couldn't figure out the right way to do it.
You can keep the version numbers in a separate table (one row for each "base PK" value). That is way more efficient than doing a max() + 1 on every insert and has the additional benefit of being safe for concurrent transactions.
So first we need a table that keeps track of the version numbers:
create table version_counter
(
product_no integer primary key,
version_nr integer not null
);
Then we create a function that increments the version for a given product_no and returns that new version number:
create function next_version(p_product_no int)
returns integer
as
$$
insert into version_counter (product_no, version_nr)
values (p_product_no, 1)
on conflict (product_no)
do update
set version_nr = version_counter.version_nr + 1
returning version_nr;
$$
language sql
volatile;
The trick here is the INSERT ... ON CONFLICT, which increments an existing value or inserts a new row if the passed product_no does not yet exist.
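Called on its own, the function behaves like this (a sketch):
select next_version(42);  -- returns 1: no counter row existed, so one was inserted
select next_version(42);  -- returns 2: the existing row was incremented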
For the product table:
create table product
(
product_no integer not null,
version_nr integer not null,
created_at timestamp default clock_timestamp(),
primary key (product_no, version_nr)
);
then create a trigger:
create function increment_version()
returns trigger
as
$$
begin
new.version_nr := next_version(new.product_no);
return new;
end;
$$
language plpgsql;
create trigger base_table_insert_trigger
before insert on product
for each row
execute procedure increment_version();
This is safe for concurrent transactions because the row in version_counter will be locked for that product_no until the transaction inserting the row into the product table is committed - which will commit the change to the version_counter table as well (and free the lock on that row).
If two concurrent transactions insert the same value for product_no, one of them will wait until the other finishes.
If two concurrent transactions insert different values for product_no, they can work without having to wait for the other.
If we then insert these rows:
insert into product (product_no) values (1);
insert into product (product_no) values (2);
insert into product (product_no) values (3);
insert into product (product_no) values (1);
insert into product (product_no) values (3);
insert into product (product_no) values (2);
The product table looks like this:
select *
from product
order by product_no, version_nr;
product_no | version_nr | created_at
-----------+------------+------------------------
1 | 1 | 2019-08-23 10:50:57.880
1 | 2 | 2019-08-23 10:50:57.947
2 | 1 | 2019-08-23 10:50:57.899
2 | 2 | 2019-08-23 10:50:57.989
3 | 1 | 2019-08-23 10:50:57.926
3 | 2 | 2019-08-23 10:50:57.966
Online example: https://rextester.com/CULK95702
You can do it like this:
-- Check if the pk already exists (my_table stands in for your table name)
SELECT pk INTO temp_pk FROM my_table a WHERE a.pk = v_pk1;
-- If it exists, insert the new version
IF temp_pk IS NOT NULL THEN
    INSERT INTO my_table (pk, versionpk) VALUES (v_pk1, temp_pk);
END IF;
So - I got it to work now.
If you want a column to update depending on another column in PostgreSQL, have a look at this:
This is the function I use:
CREATE FUNCTION public.testfunction()
RETURNS trigger
LANGUAGE 'plpgsql'
COST 100
VOLATILE NOT LEAKPROOF
AS $BODY$
DECLARE v_productno INTEGER := NEW.productno;
BEGIN
IF NOT EXISTS (SELECT *
FROM testtable
WHERE productno = v_productno)
THEN
NEW.version := 1;
ELSE
NEW.version := (SELECT MAX(testtable.version)+1
FROM testtable
WHERE testtable.productno = v_productno);
END IF;
RETURN NEW;
END;
$BODY$;
And this is the trigger that runs the function:
CREATE TRIGGER testtrigger
BEFORE INSERT
ON public.testtable
FOR EACH ROW
EXECUTE PROCEDURE public.testfunction();
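To see it in action, a minimal sketch (assuming testtable has just the two columns; note that this MAX()+1 approach is not safe under concurrent inserts, unlike the version_counter solution above):
create table testtable (productno integer, version integer);
-- create testtrigger as defined above, then insert one row per statement:
insert into testtable (productno) values (1);
insert into testtable (productno) values (1);
insert into testtable (productno) values (2);
select * from testtable order by productno, version;
 productno | version
-----------+---------
         1 |       1
         1 |       2
         2 |       1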
Thank you @ChechoCZ, you definitely helped me get in the right direction.

Duplicate single database record

Hello, what is the easiest way to duplicate a DB record in the same table?
My problem is that the table in question has a lot of columns, like 100+, and I don't like how the solution looks. Here is what I do (this is inside a plpgsql function):
...
-- 1. duplicate the record
INSERT INTO history
(SELECT NEXTVAL('history_id_seq'), col_1, col_2, ... , col_100
 FROM history
 WHERE history_id = 1234
 ORDER BY datetime DESC
 LIMIT 1)
RETURNING history_id INTO new_history_id;
-- 2. update some columns
UPDATE history
SET
col_5 = 'test_5',
col_23 = 'test_23',
datetime = CURRENT_TIMESTAMP
WHERE history_id = new_history_id;
Here are the problems I am attempting to solve:
1. Listing all these 100+ columns looks lame.
2. When a new column is eventually added, the function has to be updated too.
3. On separate DB instances the column order might differ, which would cause the function to fail.
I am not sure if I can list the columns once more (solving issue 3) like insert into <table> (<columns_list>) values (<query>), but then the query looks even uglier.
I would like to achieve something like 'insert into ', but this seems impossible: the unique primary key constraint will raise a duplication error.
Any suggestions?
Thanks in advance for your time.
This isn't pretty or particularly optimized, but there are a couple of ways to go about this. Ideally you might want to do this all in an UPDATE trigger, though you could implement a duplication function something like this:
-- create source table
CREATE TABLE history (history_id serial not null primary key, col_2 int, col_3 int, col_4 int, datetime timestamptz default now());
-- add some data
INSERT INTO history (col_2, col_3, col_4)
SELECT g, g * 10, g * 100 FROM generate_series(1, 100) AS g;
-- function to duplicate record
CREATE OR REPLACE FUNCTION fn_history_duplicate(p_history_id integer) RETURNS SETOF history AS
$BODY$
DECLARE
cols text;
insert_statement text;
BEGIN
-- build list of columns
SELECT array_to_string(array_agg(column_name::name), ',') INTO cols
FROM information_schema.columns
WHERE (table_schema, table_name) = ('public', 'history')
AND column_name <> 'history_id';
-- build insert statement
insert_statement := 'INSERT INTO history (' || cols || ') SELECT ' || cols || ' FROM history WHERE history_id = $1 RETURNING *';
-- execute statement
RETURN QUERY EXECUTE insert_statement USING p_history_id;
RETURN;
END;
$BODY$
LANGUAGE 'plpgsql';
-- test
SELECT * FROM fn_history_duplicate(1);
history_id | col_2 | col_3 | col_4 | datetime
------------+-------+-------+-------+-------------------------------
101 | 1 | 10 | 100 | 2013-04-15 14:56:11.131507+00
(1 row)
As I noted in my original comment, you might also take a look at the colnames extension as an alternative to querying the information schema.
You don't need the UPDATE anyway; you can supply the constant values directly in the SELECT statement:
INSERT INTO history
SELECT NEXTVAL('history_id_seq'),
col_1,
col_2,
col_3,
col_4,
'test_5',
...
'test_23',
...,
col_100
FROM history
WHERE history_id = 1234
ORDER BY datetime DESC
LIMIT 1
RETURNING history_id INTO new_history_id;

Calling now() in a function

One of our Postgres tables, called rep_event, has a timestamp column that indicates when each row was inserted. But all of the rows have a timestamp value of 2000-01-01 00:00:00, so something isn't set up right.
There is a function that inserts rows into the table, and it is the only code that inserts into it. (There also isn't any code that updates rows in that table.) Here is the definition of the function:
CREATE FUNCTION handle_event() RETURNS "trigger"
AS $$
BEGIN
IF (TG_OP = 'DELETE') THEN
INSERT INTO rep_event SELECT 'D', TG_RELNAME, OLD.object_id, now();
RETURN OLD;
ELSIF (TG_OP = 'UPDATE') THEN
INSERT INTO rep_event SELECT 'U', TG_RELNAME, NEW.object_id, now();
RETURN NEW;
ELSIF (TG_OP = 'INSERT') THEN
INSERT INTO rep_event SELECT 'I', TG_RELNAME, NEW.object_id, now();
RETURN NEW;
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
Here is the table definition:
CREATE TABLE rep_event
(
operation character(1) NOT NULL,
table_name text NOT NULL,
object_id bigint NOT NULL,
time_stamp timestamp without time zone NOT NULL
)
As you can see, the now() function is called to get the current time. Doing a "select now()" on the database returns the correct time, so is there an issue with calling now() from within a function?
A simpler solution is to just modify your table definition to have NOW() be the default value:
CREATE TABLE rep_event (
operation character(1) NOT NULL,
table_name text NOT NULL,
object_id bigint NOT NULL,
time_stamp timestamp without time zone NOT NULL DEFAULT NOW()
);
Then you can get rid of the now() calls in your trigger.
Also, as a side note, I strongly suggest including the column list in your function... IOW:
INSERT INTO rep_event (operation,table_name,object_id,time_stamp) SELECT ...
This way, if you ever add a new column or make other table changes that alter the internal column order, your function won't suddenly break.
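Putting both suggestions together, each branch of the trigger could shrink to something like this sketch (time_stamp is omitted so the column DEFAULT fills it in):
IF (TG_OP = 'DELETE') THEN
    INSERT INTO rep_event (operation, table_name, object_id)
    VALUES ('D', TG_RELNAME, OLD.object_id);
    RETURN OLD;
END IF;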
Your problem has to be elsewhere, as your function works fine. Create a test database, paste the code you cited, and run:
create table events (object_id bigserial, data text);
create trigger rep_event
before insert or update or delete on events
for each row execute procedure handle_event();
insert into events (data) values ('v1'),('v2'),('v3');
delete from events where data='v2';
update events set data='v4' where data='v3';
select * from events;
object_id | data
-----------+------
1 | v1
3 | v4
select * from rep_event;
operation | table_name | object_id | time_stamp
-----------+------------+-----------+----------------------------
I | events | 1 | 2011-07-08 10:31:50.489947
I | events | 2 | 2011-07-08 10:31:50.489947
I | events | 3 | 2011-07-08 10:31:50.489947
D | events | 2 | 2011-07-08 10:32:12.65699
U | events | 3 | 2011-07-08 10:32:33.662936
(5 rows)
Check other triggers, the trigger creation command, etc. And change that timestamp without time zone to timestamp with time zone.
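For that last point, the column type can be changed in place (a sketch; the existing values are then interpreted in the session's time zone):
ALTER TABLE rep_event
    ALTER COLUMN time_stamp TYPE timestamp with time zone;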