I have a set of records in a table, with some records having an invalid date. I want to ignore those invalid records and run a check against the rest. I framed a query like the one below, but it doesn't work.
select * from tbl_name i
where is_date(i.dob) and i.dob::date > CURRENT_DATE;
I got to know that SQL doesn't short-circuit, so it also considers the invalid records and ends up with a "date/time field value out of range" error. Please help me alter this query so that it eliminates invalid dates and does the date comparison on valid dates only.
There is no guarantee for short-circuiting in Postgres. Neither in a "plain" WHERE clause, nor when using a derived table (from (select ...) where ...). One way to force the evaluation in two steps is a materialized common table expression:
with data as materialized (
select *
from tbl_name i
where is_date(i.dob)
)
select *
from data
where dob::date > CURRENT_DATE;
The materialized keyword prevents the optimizer from pushing the condition of the outer query into the CTE.
Obviously this assumes that is_date() will never return false positives.
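If in doubt, you can look at the execution plan to verify that the cast is only applied to rows that passed the is_date() filter; a quick sketch reusing the query above:
explain
with data as materialized (
    select *
    from tbl_name i
    where is_date(i.dob)
)
select *
from data
where dob::date > CURRENT_DATE;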
Use a CASE expression in the WHERE clause to differentiate between a valid date and an invalid one: run the > comparison for valid dates, otherwise return FALSE.
create or replace function is_date(s varchar) returns boolean as $$
begin
    -- NULL is not considered a valid date
    if s is null then
        return false;
    end if;
    -- try the cast; an invalid value raises an exception
    perform s::date;
    return true;
exception when others then
    return false;
end;
$$ language plpgsql;
create table date_str (id integer, dt_str varchar);
insert into date_str values (1, '2022-11-02'), (2, '1234'), (3, '2022-12-03');
insert into date_str values (4, 'xyz'), (5, '2022-01-01'), (6, '2023-02-02');
select * from date_str;
id | dt_str
----+------------
1 | 2022-11-02
2 | 1234
3 | 2022-12-03
4 | xyz
5 | 2022-01-01
6 | 2023-02-02
select current_date;
current_date
--------------
11/02/2022
SELECT *
FROM date_str
WHERE CASE WHEN is_date(dt_str) = 't' THEN
          dt_str::date > CURRENT_DATE
      ELSE
          FALSE
      END;
id | dt_str
----+------------
3 | 2022-12-03
6 | 2023-02-02
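Since is_date() already returns a boolean, the comparison to 't' can also be dropped; an equivalent sketch:
SELECT *
FROM date_str
WHERE CASE WHEN is_date(dt_str)
           THEN dt_str::date > CURRENT_DATE
           ELSE FALSE
      END;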
I have a Postgres table with a column named "ids".
+----+--------------+
| id | ids |
+----+--------------+
| 1 | {1, 2, 3} |
| 2 | {2, 7, 10} |
| 3 | {14, 11, 1} |
| 4 | {12, 13} |
| 5 | {15, 16, 12} |
+----+--------------+
I want to merge rows with at least one common array element and create a new row from that (or merge into one existing row). So finally the table would look like the following:
+----+--------------------------+
| id | ids |
+----+--------------------------+
| 6 | {1, 2, 3, 7, 10, 14, 11} |
| 7 | {12, 13, 15, 16} |
+----+--------------------------+
Order of array elements in the resulting table does not really matter but they must be unique.
The rows are added independently from another system. For example we could add a new row where ids are {16, 18, 1}.
Right now to make sure we combine all the rows with at least one common array element, I am doing the calculations in my server (Node.js).
So before I create a new row, I pull all the existing rows in the database that have at least one item in common using:
await t.any('SELECT * FROM arraytable WHERE $1 && ids', [16, 18, 1])
This gives me all the rows that have at least 16, 18 or 1. Then I merge the rows with [16, 18, 1] and remove duplicates.
With the availability of this new array, I delete all existing rows fetched above and insert this new row to the database. As you can see, most of the work is being done in Node.js.
Instead of this I am trying to create a trigger, which will do all these steps for me as soon as I add the new row. How do I go about doing this with a trigger? Also, are there better ways?
Can a procedure suffice?
CREATE OR REPLACE PROCEDURE add_ids(new_ids INT[])
AS $$
DECLARE
    sum_array INT[];
BEGIN
    -- collect all elements from rows that overlap the new array
    SELECT ARRAY (SELECT UNNEST(ids) FROM table1 WHERE table1.ids && new_ids) INTO sum_array;
    sum_array := sum_array || new_ids;
    -- remove duplicates
    SELECT ARRAY(SELECT DISTINCT UNNEST(sum_array)) INTO sum_array;
    -- replace the overlapping rows with the single merged row
    DELETE FROM table1 WHERE table1.ids && sum_array;
    INSERT INTO table1(ids) SELECT sum_array;
END;
$$
LANGUAGE plpgsql;
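A call would then look something like this (a sketch, assuming a table1 with an integer array column ids already exists):
CALL add_ids(ARRAY[16, 18, 1]);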
Unfortunately, inserting a row inside a trigger calls another trigger, causing an infinite loop. I do not know a workaround for that.
PS. Sorry if creating another answer is bad practice. I want to leave it for now for reference. I will delete it when the problem is resolved.
Edit by pewpewlasers:
To prevent the loop another table is probably needed. I have created a new temporary table2. New arrays can be added to this table. This table has a trigger which does the calculations and saves the result to table1. It also deletes the temporarily created row.
CREATE OR REPLACE FUNCTION on_insert_temp() RETURNS TRIGGER AS $f$
DECLARE
    sum_array BIGINT[];
BEGIN
    -- collect all elements from rows in table1 that overlap the new array
    SELECT ARRAY (SELECT UNNEST(ids) FROM table1 WHERE table1.ids && NEW.ids) INTO sum_array;
    sum_array := sum_array || NEW.ids;
    -- remove duplicates
    SELECT ARRAY(SELECT DISTINCT UNNEST(sum_array)) INTO sum_array;
    -- replace the overlapping rows with the merged row
    DELETE FROM table1 WHERE table1.ids && sum_array;
    INSERT INTO table1(ids) SELECT sum_array;
    -- clean up the staging row in table2
    DELETE FROM table2 WHERE id = NEW.id;
    RETURN NEW; -- the return value of an AFTER trigger is ignored anyway
END;
$f$ LANGUAGE plpgsql;
CREATE TRIGGER on_insert_temp AFTER INSERT ON table2 FOR EACH ROW EXECUTE PROCEDURE on_insert_temp();
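So instead of writing to table1 directly, a new array is staged through table2, for example (a sketch, assuming table2 has the same id/ids columns as table1):
INSERT INTO table2 (ids) VALUES (ARRAY[16, 18, 1]);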
Given tables
CREATE TABLE table1(id serial, ids INT[]);
CREATE TABLE table2(id serial, ids INT[]);
the trigger can look like this:
CREATE OR REPLACE FUNCTION sum_tables_trigger() RETURNS TRIGGER AS $table1$
BEGIN
    INSERT INTO table2(ids)
    SELECT ARRAY(SELECT DISTINCT UNNEST(table1.ids || new.ids) ORDER BY 1)
    FROM table1
    WHERE table1.ids && new.ids;
    RETURN NEW;
END;
$table1$ LANGUAGE plpgsql;

CREATE TRIGGER sum_tables_trigger_ BEFORE INSERT ON table1
FOR EACH ROW EXECUTE PROCEDURE sum_tables_trigger();
tableA.ids && tableB.ids returns true if the arrays have a common element.
tableA.ids || tableB.ids concatenates the arrays.
ARRAY(SELECT DISTINCT UNNEST(table1.ids || new.ids) ORDER BY 1) removes duplicates.
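With the trigger in place, a plain insert is enough; for example (a sketch), the merged arrays then show up in table2:
INSERT INTO table1 (ids) VALUES (ARRAY[16, 18, 1]);
SELECT * FROM table2;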
I have data arriving with a date; some dates won't come for a few days, and during the missed window I just want to insert the previous data.
Is there a way to take care of this during the insert of the data itself?
For example
create table foo (ID VARCHAR(10), foo_count int, actual_date date);
insert into foo ( values('234534', 100, '2017-01-01'),('234534', 200, '2017-01-02'));
insert into foo ( values('234534', 300, '2017-01-03') );
insert into foo ( values('234534', 300, '2017-01-08') );
After the last insert I want to make sure the previous data gets generated, so it should look something like this:
ID     | foo_count | actual_date
-------+-----------+-------------
234534 |       100 | 2017-01-01
234534 |       200 | 2017-01-02
234534 |       300 | 2017-01-03
234534 |       300 | 2017-01-04
234534 |       300 | 2017-01-05
234534 |       300 | 2017-01-06
234534 |       300 | 2017-01-07
234534 |       300 | 2017-01-08
I am using JPA to insert the data; currently I query the table, look at the latest date present, and populate the missing data myself.
I would think about a better INSERT statement. Inserting from a SELECT statement would make things easier. The SELECT statement could be used to generate the requested date series.
INSERT INTO foo
SELECT
--<advanced query>
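For illustration, such a statement might look roughly like this sketch, which fills the gap with the last known row before adding the new one (the literal dates are taken from the example above):
-- fill the missing days with the last known row (sketch)
INSERT INTO foo (id, foo_count, actual_date)
SELECT prev.id, prev.foo_count, gs::date
FROM (
    SELECT id, foo_count, actual_date
    FROM foo
    ORDER BY actual_date DESC
    LIMIT 1
) AS prev,
generate_series(prev.actual_date + 1, date '2017-01-07', interval '1 day') AS gs;

-- then insert the row that actually arrived
INSERT INTO foo VALUES ('234534', 300, '2017-01-08');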
However, I guess that's not simply possible, since you are not using JDBC directly or don't want to use native queries for inserting your data.
In that case, you could install a trigger to your database which could do the magic:
demo:db<>fiddle
Trigger function:
CREATE FUNCTION insert_missing()
RETURNS TRIGGER AS
$$
DECLARE
    max_record record;
BEGIN
    SELECT                                                    -- 1
        id,
        foo_count,
        actual_date
    FROM foo
    ORDER BY actual_date DESC
    LIMIT 1
    INTO max_record;

    IF (NEW.actual_date - 1 > max_record.actual_date) THEN    -- 2
        INSERT INTO foo
        SELECT
            max_record.id,
            max_record.foo_count,
            generate_series(max_record.actual_date + 1, NEW.actual_date - 1, interval '1 day');  -- 3
    END IF;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
1. Query the record with the current maximum date.
2. If the maximum date is more than one day before the new date...
3. ...insert a date series (from the day after the current max date until the day before the new one). This can be generated with generate_series().
Afterwards create the BEFORE INSERT trigger:
CREATE TRIGGER insert_missing
BEFORE INSERT
ON foo
FOR EACH ROW
EXECUTE PROCEDURE insert_missing();
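With the trigger in place, the last insert from the question should then fill the gap by itself, e.g.:
-- also generates rows for 2017-01-04 .. 2017-01-07 with the previous foo_count
INSERT INTO foo VALUES ('234534', 300, '2017-01-08');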
I have a table that records which users have joined which event (as in an IRL event).
I have set up a server query that lets a user join an event.
It goes like this:
INSERT INTO participations
VALUES(:usr,:event_id)
I want that statement to also return the number of people who have joined the same event as the user. How do I proceed? If possible, in one SQL statement.
Thanks
You can use a common table expression like this to execute it as one query.
with insert_tbl_statement as (
    insert into tbl values (4, 1) returning event_id
)
select (count(*) + 1) as event_count
from tbl
where event_id = (select event_id from insert_tbl_statement);
see demo http://rextester.com/BUF16406
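Adapted to the participations table from the question it might look like this (a sketch; the column names usr and event_id are assumptions):
with ins as (
    insert into participations (usr, event_id)
    values (:usr, :event_id)
    returning event_id
)
select count(*) + 1 as event_count
from participations
where event_id = (select event_id from ins);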
You can use a function. I've set up the next example, but keep in mind you must add 1 to the final count because the transaction hasn't been committed yet.
create table tbl(id int, event_id int);
insert into tbl values (1, 2),(2, 2),(3, 3);
create function new_tbl(id int, event_id int)
returns bigint as $$
insert into tbl values ($1, $2);
select count(*) + 1 from tbl where event_id = $2;
$$ language sql;
select new_tbl(4, 2);
| new_tbl |
| ------: |
| 4 |
db<>fiddle here
Hello, what is the easiest way to duplicate a DB record in the same table?
My problem is that the table where I am doing this has many columns, like 100+, and I don't like how the solution looks. Here is what I do (this is inside a plpgsql function):
...
1. duplicate record
INSERT INTO history
    (SELECT NEXTVAL('history_id_seq'), col_1, col_2, ... , col_100
     FROM history
     WHERE history_id = 1234
     ORDER BY datetime DESC
     LIMIT 1)
RETURNING
    history_id INTO new_history_id;
2. update some columns
UPDATE history
SET
col_5 = 'test_5',
col_23 = 'test_23',
datetime = CURRENT_TIMESTAMP
WHERE history_id = new_history_id;
Here are the problems I am attempting to solve:
1. Listing all these 100+ columns looks lame.
2. When a new column is added eventually, the function should be updated too.
3. On separate DB instances the column order might differ, which would cause the function to fail.
I am not sure if I can list them once more (solving issue 3) like insert into <table> (<columns_list>) values (<query>), but then the query looks even uglier.
I would like to achieve something like a plain 'insert into ... ', but this seems impossible: the unique primary key constraint will raise a duplication error.
Any suggestions?
Thanks in advance for your time.
This isn't pretty or particularly optimized but there are a couple of ways to go about this. Ideally, you might want to do this all in an UPDATE trigger though you could implement a duplication function something like this:
-- create source table
CREATE TABLE history (history_id serial not null primary key, col_2 int, col_3 int, col_4 int, datetime timestamptz default now());
-- add some data
INSERT INTO history (col_2, col_3, col_4)
SELECT g, g * 10, g * 100 FROM generate_series(1, 100) AS g;
-- function to duplicate record
CREATE OR REPLACE FUNCTION fn_history_duplicate(p_history_id integer) RETURNS SETOF history AS
$BODY$
DECLARE
cols text;
insert_statement text;
BEGIN
-- build list of columns
SELECT array_to_string(array_agg(column_name::name), ',') INTO cols
FROM information_schema.columns
WHERE (table_schema, table_name) = ('public', 'history')
AND column_name <> 'history_id';
-- build insert statement
insert_statement := 'INSERT INTO history (' || cols || ') SELECT ' || cols || ' FROM history WHERE history_id = $1 RETURNING *';
-- execute statement
RETURN QUERY EXECUTE insert_statement USING p_history_id;
RETURN;
END;
$BODY$
LANGUAGE 'plpgsql';
-- test
SELECT * FROM fn_history_duplicate(1);
history_id | col_2 | col_3 | col_4 | datetime
------------+-------+-------+-------+-------------------------------
101 | 1 | 10 | 100 | 2013-04-15 14:56:11.131507+00
(1 row)
As I noted in my original comment, you might also take a look at the colnames extension as an alternative to querying the information schema.
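If you also want the second step from the question (overriding a few columns on the copy), one way is a sketch like this, reusing the example columns above:
DO $$
DECLARE
    new_row history;
BEGIN
    -- duplicate row 1; the function returns the freshly inserted row
    SELECT * INTO new_row FROM fn_history_duplicate(1);

    -- then override selected columns on the copy
    UPDATE history
    SET col_2 = 42,
        datetime = CURRENT_TIMESTAMP
    WHERE history_id = new_row.history_id;
END;
$$;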
You don't need the update anyway; you can supply the constant values directly in the SELECT statement:
INSERT INTO history
SELECT NEXTVAL('history_id_seq'),
col_1,
col_2,
col_3,
col_4,
'test_5',
...
'test_23',
...,
col_100
FROM history
WHERE history_id = 1234
ORDER BY datetime DESC
LIMIT 1
RETURNING history_id INTO new_history_id;
One of our Postgres tables, called rep_event, has a timestamp column that indicates when each row was inserted. But all of the rows have a timestamp value of 2000-01-01 00:00:00, so something isn't set up right.
There is a function that inserts rows into the table, and it is the only code that inserts rows into that table - no other code inserts into that table. (There also isn't any code that updates the rows in that table.) Here is the definition of the function:
CREATE FUNCTION handle_event() RETURNS "trigger"
AS $$
BEGIN
IF (TG_OP = 'DELETE') THEN
INSERT INTO rep_event SELECT 'D', TG_RELNAME, OLD.object_id, now();
RETURN OLD;
ELSIF (TG_OP = 'UPDATE') THEN
INSERT INTO rep_event SELECT 'U', TG_RELNAME, NEW.object_id, now();
RETURN NEW;
ELSIF (TG_OP = 'INSERT') THEN
INSERT INTO rep_event SELECT 'I', TG_RELNAME, NEW.object_id, now();
RETURN NEW;
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
Here is the table definition:
CREATE TABLE rep_event
(
operation character(1) NOT NULL,
table_name text NOT NULL,
object_id bigint NOT NULL,
time_stamp timestamp without time zone NOT NULL
)
As you can see, the now() function is called to get the current time. Doing a "select now()" on the database returns the correct time, so is there an issue with calling now() from within a function?
A simpler solution is to just modify your table definition to have NOW() be the default value:
CREATE TABLE rep_event (
operation character(1) NOT NULL,
table_name text NOT NULL,
object_id bigint NOT NULL,
time_stamp timestamp without time zone NOT NULL DEFAULT NOW()
);
Then you can get rid of the now() calls in your trigger.
Also, as a side note, I strongly suggest including the column ordering in your function... IOW:
INSERT INTO rep_event (operation,table_name,object_id,time_stamp) SELECT ...
This way, if you ever add a new column or make other table changes that alter the internal column ordering, your function won't suddenly break.
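Putting both suggestions together, the trigger function could then look roughly like this (a sketch; time_stamp is filled in by the column default):
CREATE OR REPLACE FUNCTION handle_event() RETURNS trigger
AS $$
BEGIN
    IF (TG_OP = 'DELETE') THEN
        INSERT INTO rep_event (operation, table_name, object_id)
        VALUES ('D', TG_RELNAME, OLD.object_id);
        RETURN OLD;
    ELSIF (TG_OP = 'UPDATE') THEN
        INSERT INTO rep_event (operation, table_name, object_id)
        VALUES ('U', TG_RELNAME, NEW.object_id);
        RETURN NEW;
    ELSIF (TG_OP = 'INSERT') THEN
        INSERT INTO rep_event (operation, table_name, object_id)
        VALUES ('I', TG_RELNAME, NEW.object_id);
        RETURN NEW;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;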
Your problem has to be elsewhere, as your function works well. Create a test database, paste the code you cited, and run:
create table events (object_id bigserial, data text);
create trigger rep_event
before insert or update or delete on events
for each row execute procedure handle_event();
insert into events (data) values ('v1'),('v2'),('v3');
delete from events where data='v2';
update events set data='v4' where data='v3';
select * from events;
object_id | data
-----------+------
1 | v1
3 | v4
select * from rep_event;
operation | table_name | object_id | time_stamp
-----------+------------+-----------+----------------------------
I | events | 1 | 2011-07-08 10:31:50.489947
I | events | 2 | 2011-07-08 10:31:50.489947
I | events | 3 | 2011-07-08 10:31:50.489947
D | events | 2 | 2011-07-08 10:32:12.65699
U | events | 3 | 2011-07-08 10:32:33.662936
(5 rows)
Check other triggers, the trigger creation command, etc. Also, change this timestamp without time zone to timestamp with time zone.