Duplicate single database record - postgresql

Hello, what is the easiest way to duplicate a DB record in the same table?
My problem is that the table where I am doing this has many columns, 100+, and I don't like how the solution looks. Here is what I do (this is inside a plpgsql function):
...
1. duplicate record
INSERT INTO history
SELECT NEXTVAL('history_id_seq'), col_1, col_2, ... , col_100
FROM history
WHERE history_id = 1234
ORDER BY datetime DESC
LIMIT 1
RETURNING history_id INTO new_history_id;
2. update some columns
UPDATE history
SET
col_5 = 'test_5',
col_23 = 'test_23',
datetime = CURRENT_TIMESTAMP
WHERE history_id = new_history_id;
Here are the problems I am attempting to solve:
1. Listing all these 100+ columns looks lame.
2. When a new column is eventually added, the function has to be updated too.
3. On separate DB instances the column order might differ, which would cause the function to fail.
I am not sure if I can list them once more (solving issue 3), as in insert into <table> (<columns_list>) <query>, but then the query looks even uglier.
I would like to achieve something like a plain insert into history select * from history ..., but this seems impossible: the unique primary key constraint will raise a duplication error.
Any suggestions?
Thanks in advance for your time.

This isn't pretty or particularly optimized, but there are a couple of ways to go about this. Ideally, you might want to do this all in an UPDATE trigger, though you could implement a duplication function like this:
-- create source table
CREATE TABLE history (history_id serial not null primary key, col_2 int, col_3 int, col_4 int, datetime timestamptz default now());
-- add some data
INSERT INTO history (col_2, col_3, col_4)
SELECT g, g * 10, g * 100 FROM generate_series(1, 100) AS g;
-- function to duplicate record
CREATE OR REPLACE FUNCTION fn_history_duplicate(p_history_id integer) RETURNS SETOF history AS
$BODY$
DECLARE
cols text;
insert_statement text;
BEGIN
-- build list of columns
SELECT array_to_string(array_agg(column_name::name), ',') INTO cols
FROM information_schema.columns
WHERE (table_schema, table_name) = ('public', 'history')
AND column_name <> 'history_id';
-- build insert statement
insert_statement := 'INSERT INTO history (' || cols || ') SELECT ' || cols || ' FROM history WHERE history_id = $1 RETURNING *';
-- execute statement
RETURN QUERY EXECUTE insert_statement USING p_history_id;
RETURN;
END;
$BODY$
LANGUAGE plpgsql;
-- test
SELECT * FROM fn_history_duplicate(1);
history_id | col_2 | col_3 | col_4 | datetime
------------+-------+-------+-------+-------------------------------
101 | 1 | 10 | 100 | 2013-04-15 14:56:11.131507+00
(1 row)
As I noted in my original comment, you might also take a look at the colnames extension as an alternative to querying the information schema.
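For reference, a minimal sketch of building the same column list from the system catalogs instead of the information schema (assuming the public.history table from above):
-- pg_attribute holds one row per table column
SELECT string_agg(quote_ident(attname), ',')
FROM pg_attribute
WHERE attrelid = 'public.history'::regclass
AND attnum > 0            -- skip system columns
AND NOT attisdropped      -- skip dropped columns
AND attname <> 'history_id';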

You don't need the UPDATE anyway; you can supply the constant values directly in the SELECT statement:
INSERT INTO history
SELECT NEXTVAL('history_id_seq'),
col_1,
col_2,
col_3,
col_4,
'test_5',
...
'test_23',
...,
col_100
FROM history
WHERE history_id = 1234
ORDER BY datetime DESC
LIMIT 1
RETURNING history_id INTO new_history_id;

Related

Calling an insert *function* from a CTE in a SELECT query in Postgres 13.4

I'm writing up utility code to run through pg_cron, and sometimes want the routines to insert some results into a custom table at dba.event_log. I've got a basic log table as a starting point:
DROP TABLE IF EXISTS dba.event_log;
CREATE TABLE IF NOT EXISTS dba.event_log (
dts timestamp NOT NULL DEFAULT now(),
name citext NOT NULL DEFAULT '',
details citext NOT NULL DEFAULT '');
The toy example below performs a select operation, and then uses that value as the result of the outer query, and as a values element of an insert into the event_log:
WITH
values_cte AS (
select clock_timestamp() as ct
),
log as(
insert into event_log (
name,
details)
values (
'CTE INSERT check',
'clock = ' || (select ct::text from values_cte)
)
)
select * from values_cte;
select * from event_log;
Every time I run this, I get a new log entry, with the clock_timestamp() to make it easy to see that something is happening:
+----------------------------+------------------+---------------------------------------+
| dts | name | details |
+----------------------------+------------------+---------------------------------------+
| 2021-11-10 11:58:43.919151 | CTE INSERT check | clock = 2021-11-10 11:58:43.919821+11 |
| 2021-11-10 11:58:56.769512 | CTE INSERT check | clock = 2021-11-10 11:58:56.769903+11 |
| 2021-11-10 11:58:59.632619 | CTE INSERT check | clock = 2021-11-10 11:58:59.632822+11 |
| 2021-11-10 12:00:50.442282 | CTE INSERT check | clock = 2021-11-10 12:00:50.442646+11 |
+----------------------------+------------------+---------------------------------------+
I'll likely enrich the table later, and I'd like to make the log inserts a simple call now. Below is a simple insert function:
DROP FUNCTION IF EXISTS dba.event_log_add(citext,citext);
CREATE FUNCTION dba.event_log_add(
name_in citext,
description_in citext)
RETURNS int4
LANGUAGE sql AS
$BODY$
insert into event_log (name, details)
values (name_in, description_in)
returning 1;
$BODY$;
It seems like I should be able to rewrite the original query to call the function, like this:
WITH
values_cte AS (
select clock_timestamp() as ct
),
log as (
select * from dba.event_log_add(
'CTE event_log_add check',
'clock = ' || (select ct::text from values_cte)
)
)
select * from values_cte;
The only difference here is that the VALUES are now passed as parameters to dba.event_log_add, rather than used in an INSERT directly in the query. I get this error:
ERROR: function dba.event_log_add(unknown, text) does not exist
LINE 8: select * from dba.event_log_add(
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
I've tried
Explicit casts
Rewriting the function as a stored procedure and using CALL
Rewriting the function in PL/PgSQL, returning VOID, and running PERFORM.
Nothing seemed to work. I've checked the search_path, used qualified names, checked permissions, etc. Some approaches throw errors that don't seem to apply, like the one above; others throw no error and insert no data. Run directly, the function works fine; it only blows up within the CTE.
I think I'm missing something about using a function instead of a direct INSERT. Is there a good way to do this? After looking at the docs and hunting around here for more information, I'm a bit clearer on the rules, but not entirely. If I'm reading it right, a data-modifying CTE is governed by the outer query. There are definitely subtleties that I'm not grasping. Am I, by moving the INSERT into a function, changing the context in some way that alters how the code in the query and CTE is interpreted?
https://www.postgresql.org/docs/13/queries-with.html#QUERIES-WITH-MODIFYING
Your function expects parameters of type citext but you are passing values of type text. You need to cast the parameters:
WITH values_cte AS (
select clock_timestamp() as ct
),log as (
select event_log_add('CTE event_log_add check'::citext,
('clock = ' || (select ct::text from values_cte))::citext)
)
select *
from log;
Note that the final query now selects from log rather than values_cte. A plain SELECT inside a CTE is only evaluated as far as the outer query actually reads from it, so if log were never referenced, the function call (and its INSERT) would never run at all; only data-modifying statements (INSERT/UPDATE/DELETE) in a CTE are guaranteed to execute regardless of being referenced, which is why your original direct-INSERT version logged rows even though its log CTE was unused.
It's probably easier to define the parameters as text; during the INSERT the casting will then be done automatically:
CREATE FUNCTION event_log_add(
name_in text,
description_in text)
RETURNS int4
LANGUAGE sql AS
$BODY$
insert into event_log (name, details)
values (name_in, description_in)
returning 1;
$BODY$;
WITH values_cte AS (
select clock_timestamp() as ct
),log as (
select event_log_add('CTE event_log_add check',
'clock = ' || (select ct::text from values_cte))
)
select *
from log;
If you want, you can add an explicit cast inside the function.
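For illustration, a minimal sketch of that variant, keeping the citext columns in dba.event_log but casting inside the function body (the casts are redundant on assignment, but make the intent explicit):
CREATE FUNCTION dba.event_log_add(
name_in text,
description_in text)
RETURNS int4
LANGUAGE sql AS
$BODY$
insert into dba.event_log (name, details)
values (name_in::citext, description_in::citext) -- explicit casts to the citext columns
returning 1;
$BODY$;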

Return the number of rows with a where condition after an INSERT INTO? postgresql

I have a table that regroups some users and which event (as in IRL event) they've joined.
I have set up a server query that lets a user join an event.
It goes like this:
INSERT INTO participations
VALUES(:usr,:event_id)
I want that statement to also return the number of people who have joined the same event as the user. How do I proceed? If possible in one SQL statement.
Thanks
You can use a common table expression like this to execute it as one query.
with insert_tbl_statement as (
insert into tbl values (4, 1) returning event_id
)
select (count(*) + 1) as event_count from tbl where event_id = (select event_id from insert_tbl_statement);
see demo http://rextester.com/BUF16406
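Applied to the participations table from the question, this would look something like the sketch below; the + 1 compensates for the fact that the outer SELECT cannot see the row inserted inside the CTE, since both run against the same snapshot:
with ins as (
insert into participations values (:usr, :event_id) returning event_id
)
select count(*) + 1 as event_count
from participations
where event_id = (select event_id from ins);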
You can use a function; I've set up the next example, but keep in mind you must add 1 to the final count because the transaction hasn't been committed yet.
create table tbl(id int, event_id int);
insert into tbl values (1, 2),(2, 2),(3, 3);
create function new_tbl(id int, event_id int)
returns bigint as $$
insert into tbl values ($1, $2);
select count(*) + 1 from tbl where event_id = $2;
$$ language sql;
select new_tbl(4, 2);
| new_tbl |
| ------: |
| 4 |
db<>fiddle here

How to pivot or crosstab in postgresql without writing a function?

I have a dataset that looks something like this:
I'd like to aggregate all co values on one row, so the final result looks something like:
Seems pretty easy, right? Just write a query using crosstab, as suggested in this answer. The problem is that this requires CREATE EXTENSION tablefunc; and I don't have write access to my DB.
Can anyone recommend an alternative?
Conditional aggregation:
SELECT co,
MIN(CASE WHEN ontology_type = 'industry' THEN tags END) AS industry,
MIN(CASE WHEN ontology_type = 'customer_type' THEN tags END) AS customer_type,
-- ...
FROM tab_name
GROUP BY co
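Since PostgreSQL 9.4 the same conditional aggregation can be written with the standard FILTER clause, which some find more readable:
SELECT co,
MIN(tags) FILTER (WHERE ontology_type = 'industry') AS industry,
MIN(tags) FILTER (WHERE ontology_type = 'customer_type') AS customer_type,
-- ...
FROM tab_name
GROUP BY co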
You can use DO to generate and PREPARE your own SQL with crosstab columns, then EXECUTE it.
-- replace tab_name to yours table name
DO $$
DECLARE
_query text;
_name text;
BEGIN
_name := 'prepared_query';
_query := '
SELECT co
'||(SELECT ', '||string_agg(DISTINCT
' string_agg(DISTINCT
CASE ontology_type WHEN '||quote_literal(ontology_type)||' THEN tags
ELSE NULL
END, '',''
) AS '||quote_ident(ontology_type),',')
FROM tab_name)||'
FROM tab_name
GROUP BY co
';
BEGIN
EXECUTE 'DEALLOCATE '||_name;
EXCEPTION
WHEN invalid_sql_statement_name THEN
NULL; -- nothing to deallocate on the first run
END;
EXECUTE 'PREPARE '||_name||' AS '||_query;
END
$$;
EXECUTE prepared_query;
Since Ver. 9.4 there's json_object_agg(), which lets us do part of the necessary magic dynamically.
However, to be totally dynamic, a temp type (a temp table) has to first be built by running a dynamic SQL EXECUTE inside an anonymous block.
DB FIDDLE (UK):
https://dbfiddle.uk/Sn7iO4zL
DISCLAIMER: Typically the ability to create TEMP TABLES is granted to end-users, but YMMV. Another concern is whether anonymous blocks can be executed as in-line code by regular users.
-- begin test data
DROP TABLE IF EXISTS tmpSales ;
CREATE TEMP TABLE tmpSales AS
SELECT
sale_id
,TRUNC(RANDOM()*12)+1 AS book_id
,TRUNC(RANDOM()*100)+1 AS customer_id
,(date '2010-01-01' + random() * (timestamp '2016-12-31' - timestamp '2010-01-01')) AS sale_date
FROM generate_series(1,10000) AS sale_id;
DROP TABLE IF EXISTS tmp_month_total ;
CREATE TEMP TABLE tmp_month_total AS
SELECT
date_part( 'year' , sale_date ) AS year
,date_part( 'month', sale_date ) AS mn
,to_char(sale_date, 'mon') AS month
,COUNT(*) AS total
FROM tmpSales
GROUP BY date_part('year', sale_date), to_char(sale_date, 'mon') ,date_part( 'month', sale_date )
;
DATA:
+----+--+-----+-----+
|year|mn|month|total|
+----+--+-----+-----+
|2010|1 |jan |127 |
|2010|2 |feb |117 |
|2010|3 |mar |121 |
|2010|4 |apr |131 |
|2010|5 |may |106 |
|2010|6 |jun |121 |
|2010|7 |jul |129 |
|2010|8 |aug |114 |
|2010|9 |sep |115 |
|2010|10|oct |110 |
|2010|11|nov |133 |
|2010|12|dec |108 |
+----+--+-----+-----+
-- END test data
-- dyn. build a temporary row-type based on existing data, not hard-coded
DROP TABLE IF EXISTS tmpTblTyp CASCADE ;
DO LANGUAGE plpgsql $$ DECLARE v_sqlstring VARCHAR = ''; BEGIN
v_sqlstring := CONCAT( 'CREATE TEMP TABLE tmpTblTyp AS SELECT '
,(SELECT STRING_AGG( CONCAT('NULL::int AS ' , month )::TEXT , ' ,'
ORDER BY mn
)::TEXT
FROM
(SELECT DISTINCT month, mn FROM tmp_month_total )a )
,' LIMIT 0 '
) ; -- RAISE NOTICE '%', v_sqlstring ;
EXECUTE( v_sqlstring ) ; END $$;
DROP TABLE IF EXISTS tmpMoToJson ;
CREATE TEMP TABLE tmpMoToJson AS
SELECT
year AS year
,(json_build_array( months )) AS js_months_arr
,json_populate_recordset ( NULL::tmpTblTyp /** use temp table as a record type!! **/
, json_build_array( months )
) jprs /** builds row-type column that can be expanded with (jprs).*
**/
FROM ( SELECT year
-- accum data into JSON array
,json_object_agg(month,total) AS months
FROM tmp_month_total
GROUP BY year
ORDER BY year
) a
;
SELECT
year
,(ROW((jprs).*)::tmpTblTyp).* -- explode the composite type row
FROM tmpMoToJson ;
+----+---+---+---+---+---+---+---+---+---+---+---+---+
|year|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec|
+----+---+---+---+---+---+---+---+---+---+---+---+---+
|2010|127|117|121|131|106|121|129|114|115|110|133|108|
|2011|117|112|117|115|139|116|119|152|117|112|115|103|
|2012|129|111|98 |140|109|131|114|110|112|115|100|121|
|2013|128|112|141|127|141|102|113|109|111|110|123|116|
|2014|129|114|117|118|111|123|106|111|127|121|124|145|
|2015|118|113|131|122|120|121|140|114|118|108|114|131|
|2016|117|110|139|100|110|116|112|109|131|117|122|132|
+----+---+---+---+---+---+---+---+---+---+---+---+---+
The required output can also be achieved with the PIVOT clause, though note that PostgreSQL does not support PIVOT natively; this syntax works in databases such as SQL Server or Oracle:
SELECT co
,industry
,customer_type
,product_type
,sales_model
,stage
FROM dataSet
PIVOT(max(tags) FOR ontologyType IN (
industry
,customer_type
,product_type
,sales_model
,stage
)) AS PVT

Make duplicate row in Postgresql

I am writing a migration script to migrate a database. I have to duplicate a row by incrementing its primary key, considering that different databases can have any number of different columns in the table. I can't write each and every column in the query. If I simply copy the row, I get a duplicate key error.
Query: INSERT INTO table_name SELECT * FROM table_name WHERE id=255;
ERROR: duplicate key value violates unique constraint "table_name_pkey"
DETAIL: Key (id)=(255) already exist
Here, it's good that I don't have to mention all the column names; I can select all columns by giving *. But at the same time I get the duplicate key error.
What's the solution of this problem? Any help would be appreciated. Thanks in advance.
If you are willing to type all column names, you may write
INSERT INTO table_name (
pri_key
,col2
,col3
)
SELECT (
SELECT MAX(pri_key) + 1
FROM table_name
)
,col2
,col3
FROM table_name
WHERE id = 255;
Another option (without typing all columns, as long as you know the primary key) is to CREATE a temp table, update it and re-insert it within a transaction.
BEGIN;
CREATE TEMP TABLE temp_tab ON COMMIT DROP AS SELECT * FROM table_name WHERE id=255;
UPDATE temp_tab SET pri_key_col = ( select MAX(pri_key_col) + 1 FROM table_name );
INSERT INTO table_name select * FROM temp_tab;
COMMIT;
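One caveat with the MAX(pri_key_col) + 1 trick: two concurrent sessions can compute the same value and collide. If the primary key is backed by a sequence, a safer sketch of the same transaction draws the new key from the sequence instead:
BEGIN;
CREATE TEMP TABLE temp_tab ON COMMIT DROP AS SELECT * FROM table_name WHERE id=255;
-- pg_get_serial_sequence finds the sequence owned by the column
UPDATE temp_tab SET pri_key_col = nextval(pg_get_serial_sequence('table_name', 'pri_key_col'));
INSERT INTO table_name SELECT * FROM temp_tab;
COMMIT;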
This is just a DO block but you could create a function that takes things like the table name etc as parameters.
Setup:
CREATE TABLE public.t1 (a TEXT, b TEXT, c TEXT, id SERIAL PRIMARY KEY, e TEXT, f TEXT);
INSERT INTO public.t1 (e) VALUES ('x'), ('y'), ('z');
Code to duplicate values without the primary key column:
DO $$
DECLARE
_table_schema TEXT := 'public';
_table_name TEXT := 't1';
_pk_column_name TEXT := 'id';
_columns TEXT;
BEGIN
SELECT STRING_AGG(column_name, ',')
INTO _columns
FROM information_schema.columns
WHERE table_name = _table_name
AND table_schema = _table_schema
AND column_name <> _pk_column_name;
EXECUTE FORMAT('INSERT INTO %1$s.%2$s (%3$s) SELECT %3$s FROM %1$s.%2$s', _table_schema, _table_name, _columns);
END $$;
The query it creates and runs is: INSERT INTO public.t1 (a,b,c,e,f) SELECT a,b,c,e,f FROM public.t1. It selects all the columns apart from the PK one. You could put this code in a function and use it for any table you wanted, or just use it like this and edit it for whatever table.
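As a sketch of that generalization (the function name duplicate_row and the extra primary-key-value parameter are additions of mine, so that only one chosen row is copied):
CREATE OR REPLACE FUNCTION duplicate_row(
_table_schema TEXT,
_table_name TEXT,
_pk_column_name TEXT,
_pk_value INT)
RETURNS void
LANGUAGE plpgsql AS
$$
DECLARE
_columns TEXT;
BEGIN
-- same column list as the DO block, minus the primary key
SELECT STRING_AGG(column_name, ',')
INTO _columns
FROM information_schema.columns
WHERE table_name = _table_name
AND table_schema = _table_schema
AND column_name <> _pk_column_name;
EXECUTE FORMAT('INSERT INTO %1$s.%2$s (%3$s) SELECT %3$s FROM %1$s.%2$s WHERE %4$s = $1',
_table_schema, _table_name, _columns, _pk_column_name)
USING _pk_value;
END $$;
-- usage, e.g. duplicate the row with id 2:
-- SELECT duplicate_row('public', 't1', 'id', 2);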

Get columns that differ between 2 rows

I have a table company with 60 columns. The goal is to create a tool to find, compare and eliminate duplicates in this table.
Example: I find 2 companies that potentially are the same, but I need to know which values (columns) differ between these 2 rows in order to continue.
I think it is possible to compare column by column, 60 times, but I'm looking for a simpler and more generic solution.
Something like:
SELECT * FROM company where co_id=22
SHOW DIFFERENCE
SELECT * FROM company where co_id=33
The result should be the column names that differ.
For this you may use an intermediate key/value representation of the rows, with JSON functions or alternatively with the hstore extension (now only of historical interest). JSON comes built-in with every reasonably recent version of PostgreSQL, whereas hstore must be installed in the database with CREATE EXTENSION.
Demo:
CREATE TABLE table1 (id int primary key, t1 text, t2 text, t3 text);
Let's insert two rows that differ by the primary key and one other column (t3).
INSERT INTO table1 VALUES
(1,'foo','bar','baz'),
(2,'foo','bar','biz');
Solution with json
First we get a key/value representation of the rows along with their original row number, then we pair the rows based on that row number and
filter out the keys whose values are identical:
WITH rowcols AS (
select rn, key, value
from (select row_number() over () as rn,
row_to_json(table1.*) as r from table1) AS s
cross join lateral json_each_text(s.r)
)
select r1.key from rowcols r1 join rowcols r2
on (r1.rn=r2.rn-1 and r1.key = r2.key)
where r1.value <> r2.value;
Sample result:
key
-----
id
t3
Solution with hstore
SELECT skeys(h1-h2) from
(select hstore(t.*) as h1 from table1 t where id=1) h1
CROSS JOIN
(select hstore(t.*) as h2 from table1 t where id=2) h2;
h1-h2 computes the difference key by key and skeys() outputs the result as a set.
Result:
skeys
-------
id
t3
The select-list might be refined with skeys((h1-h2)-'id'::text) to always remove id which, as the primary key, will obviously always differ between rows.
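For completeness, that refinement written out:
SELECT skeys((h1 - h2) - 'id'::text) from
(select hstore(t.*) as h1 from table1 t where id=1) h1
CROSS JOIN
(select hstore(t.*) as h2 from table1 t where id=2) h2;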
Here's a stored procedure that should get you most of the way...
While this should work "as is", it has no error checking, which you should add.
It gets all the columns in the table, and loops over them. A difference is when the count of the distinct items is more than one.
Also, the output is:
The count of the number of differences
Messages for each column where there is a difference
It might be more useful to return a rowset of the columns with the differences. Anyway, good luck!
Usage:
SELECT showdifference('public','company','co_id',22,33)
CREATE OR REPLACE FUNCTION showdifference(p_schema text, p_tablename text,p_idcolumn text,p_firstid integer, p_secondid integer)
RETURNS INTEGER AS
$BODY$
DECLARE
l_diffcount INTEGER;
l_column text;
l_dupcount integer;
column_cursor CURSOR FOR select column_name from information_schema.columns where table_name = p_tablename and table_schema = p_schema and column_name <> p_idcolumn;
BEGIN
-- need error checking here, to ensure the table and schema exist and the columns exist
-- Should also check that the records ids exist.
-- Should also check that the column type of the id field is integer
-- Set the number of differences to zero.
l_diffcount := 0;
-- use a cursor to iterate over the columns found in information_schema.columns
-- open the cursor
OPEN column_cursor;
LOOP
FETCH column_cursor INTO l_column;
EXIT WHEN NOT FOUND;
-- build a query to see if there is a difference between the columns. If there is raise a notice
EXECUTE 'select count(distinct ' || quote_ident(l_column) || ' ) from ' || quote_ident(p_schema) || '.' || quote_ident(p_tablename) || ' where ' || quote_ident(p_idcolumn) || ' in ('|| p_firstid || ',' || p_secondid ||')'
INTO l_dupcount;
IF l_dupcount > 1 THEN
-- increment the counter
l_diffcount := l_diffcount +1;
RAISE NOTICE '% has % differences', l_column, l_dupcount ; -- for "real" you might want to return a rowset and could do something here
END IF;
END LOOP;
-- close the cursor
CLOSE column_cursor;
RETURN l_diffcount;
END;
$BODY$
LANGUAGE plpgsql VOLATILE STRICT
COST 100;
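Picking up the suggestion above about returning a rowset of differing columns instead of notices, a minimal sketch of such a variant (the name showdifference_cols is hypothetical, and the same caveats about missing error checking apply):
CREATE OR REPLACE FUNCTION showdifference_cols(p_schema text, p_tablename text, p_idcolumn text, p_firstid integer, p_secondid integer)
RETURNS SETOF text AS
$BODY$
DECLARE
l_column text;
l_dupcount integer;
BEGIN
-- iterate over the non-key columns, as in the function above
FOR l_column IN
SELECT column_name FROM information_schema.columns
WHERE table_name = p_tablename AND table_schema = p_schema AND column_name <> p_idcolumn
LOOP
EXECUTE 'select count(distinct ' || quote_ident(l_column) || ' ) from ' || quote_ident(p_schema) || '.' || quote_ident(p_tablename) || ' where ' || quote_ident(p_idcolumn) || ' in (' || p_firstid || ',' || p_secondid || ')'
INTO l_dupcount;
IF l_dupcount > 1 THEN
RETURN NEXT l_column; -- this column differs between the two rows
END IF;
END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE STRICT;
-- usage: SELECT * FROM showdifference_cols('public','company','co_id',22,33);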