PostgreSQL - how to group rows and convert some rows to columns [duplicate]

I have an interesting conundrum which I believe can be solved in purely SQL. I have tables similar to the following:
responses:
user_id | question_id | body
--------+-------------+------
      1 |           1 | Yes
      2 |           1 | Yes
      1 |           2 | Yes
      2 |           2 | No
      1 |           3 | No
      2 |           3 | No
questions:
id | body
---+----------------------
 1 | Do you like apples?
 2 | Do you like oranges?
 3 | Do you like carrots?
and I would like to get the following output
user_id | Do you like apples? | Do you like oranges? | Do you like carrots?
--------+---------------------+----------------------+----------------------
      1 | Yes                 | Yes                  | No
      2 | Yes                 | No                   | No
I don't know how many questions there will be, and they will be dynamic, so I can't hard-code a column for every question. I am using PostgreSQL and I believe this is called transposition, but I can't seem to find anything describing the standard way of doing this in SQL. I remember doing this in my database class back in college, but it was in MySQL and I honestly don't remember how we did it.
I'm assuming it will be a combination of joins and a GROUP BY statement, but I can't even figure out how to start.
Anybody know how to do this? Thanks very much!
Edit 1: I found some information about using a crosstab which seems to be what I want, but I'm having trouble making sense of it. Links to better articles would be greatly appreciated!

Use:
SELECT r.user_id,
       MAX(CASE WHEN r.question_id = 1 THEN r.body ELSE NULL END) AS "Do you like apples?",
       MAX(CASE WHEN r.question_id = 2 THEN r.body ELSE NULL END) AS "Do you like oranges?",
       MAX(CASE WHEN r.question_id = 3 THEN r.body ELSE NULL END) AS "Do you like carrots?"
  FROM RESPONSES r
  JOIN QUESTIONS q ON q.id = r.question_id
 GROUP BY r.user_id
This is a standard pivot query: you are "pivoting" the data from rows into columns.
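On PostgreSQL 9.4 and later the same pivot can also be written with the aggregate FILTER clause, which many find easier to read. A sketch, using the same tables and the same hard-coded question ids (the join to QUESTIONS can be dropped here because the ids are fixed in the query):
SELECT r.user_id,
       MAX(r.body) FILTER (WHERE r.question_id = 1) AS "Do you like apples?",
       MAX(r.body) FILTER (WHERE r.question_id = 2) AS "Do you like oranges?",
       MAX(r.body) FILTER (WHERE r.question_id = 3) AS "Do you like carrots?"
FROM responses r
GROUP BY r.user_id;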

I implemented a truly dynamic function to handle this problem without having to hard code any specific class of answers or use external modules/extensions. It also gives full control over column ordering and supports multiple key and class/attribute columns.
You can find it here: https://github.com/jumpstarter-io/colpivot
Example that solves this particular problem:
begin;
create temporary table responses (
    user_id integer,
    question_id integer,
    body text
) on commit drop;
create temporary table questions (
    id integer,
    body text
) on commit drop;
insert into responses values (1,1,'Yes'), (2,1,'Yes'), (1,2,'Yes'), (2,2,'No'), (1,3,'No'), (2,3,'No');
insert into questions values (1, 'Do you like apples?'), (2, 'Do you like oranges?'), (3, 'Do you like carrots?');
select colpivot('_output', $$
    select r.user_id, q.body q, r.body a from responses r
    join questions q on q.id = r.question_id
$$, array['user_id'], array['q'], '#.a', null);
select * from _output;
rollback;
This outputs:
 user_id | 'Do you like apples?' | 'Do you like carrots?' | 'Do you like oranges?'
---------+-----------------------+------------------------+------------------------
       1 | Yes                   | No                     | Yes
       2 | Yes                   | No                     | No

You can solve this example with the crosstab function like this:
drop table if exists responses;
create table responses (
    user_id integer,
    question_id integer,
    body text
);
drop table if exists questions;
create table questions (
    id integer,
    body text
);
insert into responses values (1,1,'Yes'), (2,1,'Yes'), (1,2,'Yes'), (2,2,'No'), (1,3,'No'), (2,3,'No');
insert into questions values (1, 'Do you like apples?'), (2, 'Do you like oranges?'), (3, 'Do you like carrots?');
select *
from crosstab('select responses.user_id, questions.body, responses.body
               from responses, questions
               where questions.id = responses.question_id
               order by user_id')
    as ct(userid integer, "Do you like apples?" text, "Do you like oranges?" text, "Do you like carrots?" text);
First, you must install the tablefunc extension. Since version 9.1 you can do that with CREATE EXTENSION:
CREATE EXTENSION tablefunc;
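Note that the one-argument form of crosstab assigns values to the output columns in the order the rows come back, so it pays to order by question id within each user as well. A sketch of the two-argument form, which fixes the column order with a separate category query (same tables as above assumed):
select *
from crosstab(
    $$select r.user_id, q.body, r.body
      from responses r
      join questions q on q.id = r.question_id
      order by r.user_id, q.id$$,
    $$select body from questions order by id$$
) as ct(user_id integer, "Do you like apples?" text, "Do you like oranges?" text, "Do you like carrots?" text);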

I wrote a function to generate the dynamic query.
It generates the SQL for the crosstab and creates a view (dropping it first if it already exists).
You can then select from the view to get your results.
Here is the function:
CREATE OR REPLACE FUNCTION public.c_crosstab (
    eavsql_inarg varchar,
    resview varchar,
    rowid varchar,
    colid varchar,
    val varchar,
    agr varchar
)
RETURNS void AS
$body$
DECLARE
    casesql varchar;
    dynsql varchar;
    r record;
BEGIN
    dynsql := '';
    -- drop the result view if it already exists
    FOR r IN
        SELECT * FROM pg_views WHERE lower(viewname) = lower(resview)
    LOOP
        EXECUTE 'DROP VIEW ' || resview;
    END LOOP;
    -- build one aggregated CASE expression per distinct value of the attribute column
    casesql := 'SELECT DISTINCT ' || colid || ' AS v from (' || eavsql_inarg || ') eav ORDER BY ' || colid;
    FOR r IN EXECUTE casesql LOOP
        dynsql := dynsql || ', ' || agr || '(CASE WHEN ' || colid || '=''' || r.v || ''' THEN ' || val || ' ELSE NULL END) AS ' || agr || '_' || r.v;
    END LOOP;
    -- create the result view
    dynsql := 'CREATE VIEW ' || resview || ' AS SELECT ' || rowid || dynsql || ' from (' || eavsql_inarg || ') eav GROUP BY ' || rowid;
    RAISE NOTICE 'dynsql: %', dynsql;
    EXECUTE dynsql;
END
$body$
LANGUAGE plpgsql
VOLATILE
CALLED ON NULL INPUT
SECURITY INVOKER
COST 100;
And here is how I use it:
SELECT c_crosstab('query_txt', 'view_name', 'entity_column_name', 'attribute_column_name', 'value_column_name', 'first');
Example:
First you run:
SELECT c_crosstab('Select * from table', 'ct_view', 'usr_id', 'question_id', 'response_value', 'first');
Then:
Select * from ct_view;
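For that call, the view the function builds would look roughly like this (a sketch only: it assumes question_id takes the values 1, 2 and 3, and note that first is not a built-in PostgreSQL aggregate, so the call assumes you have defined one):
CREATE VIEW ct_view AS
SELECT usr_id
     , first(CASE WHEN question_id='1' THEN response_value ELSE NULL END) AS first_1
     , first(CASE WHEN question_id='2' THEN response_value ELSE NULL END) AS first_2
     , first(CASE WHEN question_id='3' THEN response_value ELSE NULL END) AS first_3
FROM (Select * from table) eav   -- "Select * from table" is the query text passed in
GROUP BY usr_id;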

There is an example of this in contrib/tablefunc/.

Related

Postgresql dynamic dataset aggregation

I'm trying to aggregate multiple datasets where I have to be able to create dynamic rules for how to aggregate the data on a per-id basis. Below is a quick approach I wrote up that seems to do what I intended. Is there a more performant (and preferably safer) way of doing this? tbl1...n will be larger than agg_rule. I'm running Postgres 13.0; if there are new features coming in 14 that would help, those could also be of interest. I am only interested in coalescing and summing like below, if that simplifies the problem.
CREATE TABLE tbl1 AS
SELECT 1 id
     , 1 seq
     , 1 val;
CREATE TABLE tbl2 AS
SELECT 1 id
     , 1 seq
     , 1 val;
CREATE TABLE tbl3 AS
SELECT 1 id
     , 1 seq
     , 1 val;
CREATE TABLE agg_rule AS
SELECT 1 id
     , 'coalesce(tbl1.val, tbl2.val + tbl3.val)' expr;
CREATE OR REPLACE FUNCTION eval(_id INTEGER, _seq INTEGER)
RETURNS INTEGER
LANGUAGE plpgsql
AS $$
DECLARE
    _res INTEGER;
BEGIN
    EXECUTE 'SELECT ' || (
        SELECT expr
        FROM agg_rule
        WHERE id = _id
    ) || '
    FROM tbl1
    FULL JOIN tbl2 USING (id, seq)
    FULL JOIN tbl3 USING (id, seq)
    WHERE id = ' || _id || '
    AND seq = ' || _seq || ';' INTO _res;
    RETURN _res;
END
$$;
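One small hardening of the dynamic query above (a sketch, using the same tables): since _id and _seq are plain integers, they can be passed as EXECUTE parameters with USING, so that only the stored expression itself is spliced into the SQL text:
CREATE OR REPLACE FUNCTION eval(_id INTEGER, _seq INTEGER)
RETURNS INTEGER
LANGUAGE plpgsql
AS $$
DECLARE
    _res INTEGER;
BEGIN
    -- only the rule expression is concatenated; the filter values go in as parameters
    EXECUTE 'SELECT ' || (SELECT expr FROM agg_rule WHERE id = _id) || '
             FROM tbl1
             FULL JOIN tbl2 USING (id, seq)
             FULL JOIN tbl3 USING (id, seq)
             WHERE id = $1 AND seq = $2'
    INTO _res
    USING _id, _seq;
    RETURN _res;
END
$$;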

Cannot get all column values separately, but instead get all values in one column (PostgreSQL 9.5)

CREATE OR REPLACE FUNCTION public.get_locations(
    location_word varchar(50)
)
RETURNS TABLE
(
    country varchar(50),
    city varchar(50)
)
AS $$
DECLARE
    location_word_ varchar(50);
BEGIN
    location_word_ := concat(location_word, '%');
    RETURN QUERY EXECUTE format('(SELECT c.country, ''''::varchar(50) as city
                                  FROM webuser.country c
                                  WHERE lower(c.country) LIKE %L LIMIT 1)
                                 UNION
                                 (SELECT c.country, ci.city
                                  FROM webuser.country c
                                  JOIN webuser.city ci ON c.country_id = ci.country_id
                                  WHERE lower(ci.city) LIKE %L LIMIT 4)',
                                 location_word_,
                                 location_word_);
END
$$ LANGUAGE plpgsql STABLE;
SELECT public.get_locations('a'::varchar(50));
I get this:
 get_locations
    record
------------------
 (Andorra,"")
 (Germany,Aach)
 (Germany,Aalen)
 (Germany,Achim)
 (Germany,Adenau)
How can I get the values column by column like below? Otherwise I cannot match the values correctly; I should get them as separate country and city columns.
|country | city |
-------------------------------
| Andorra | "" |
| Germany | Aach |
| Germany | Aalen |
| Germany | Achim |
| Germany | Adenau |
Your function is declared as returns table so you have to use it like a table:
SELECT *
FROM public.get_locations('a'::varchar(50));
Unrelated, but:
Your function is way too complicated, you don't need dynamic SQL, nor do you need a PL/pgSQL function.
You can simplify that to:
CREATE OR REPLACE FUNCTION public.get_locations(p_location_word varchar(50))
RETURNS TABLE(country varchar(50), city varchar(50))
AS $$
    (SELECT c.country, ''::varchar(50) as city
     FROM webuser.country c
     WHERE lower(c.country) LIKE concat(p_location_word, '%')
     LIMIT 1)
    UNION ALL
    (SELECT c.country, ci.city
     FROM webuser.country c
     JOIN webuser.city ci ON c.country_id = ci.country_id
     WHERE lower(ci.city) LIKE concat(p_location_word, '%')
     LIMIT 4)
$$
LANGUAGE sql;

Get complex output type of function that returns record in postgresql

I want to write a nice and detailed report on the functions in my PostgreSQL database.
I built the following query:
SELECT routine_name, data_type, proargnames
FROM information_schema.routines
join pg_catalog.pg_proc on pg_catalog.pg_proc.proname = information_schema.routines.routine_name
WHERE specific_schema = 'public'
ORDER BY routine_name;
It works as it should (it basically returns what I want: function name, output data type and input data types), except for one thing:
I have relatively complicated functions, and many of them return record.
The thing is, data_type returns record for such functions as well, while I want a detailed list of the function's output types.
For instance, I have something like this in one of my functions:
RETURNS TABLE("Res" integer, "Output" character varying) AS
How can I make query above (or, perhaps, a new query, if it will solve the problem) return something like
integer, character varying instead of record for such functions?
I am using postgresql 9.2
Thanks in advance!
A RECORD return value is only resolved at runtime, so there is no way to retrieve that information this way.
BUT, if RETURNS TABLE("Res" integer, "Output" character varying) AS is used, there is a solution.
The test functions I used:
-- first function, uses RETURNS TABLE
CREATE FUNCTION test_ret(a TEXT, b TEXT)
RETURNS TABLE("Res" integer, "Output" character varying) AS $$
DECLARE
ret RECORD;
BEGIN
-- test
END;$$ LANGUAGE plpgsql;
-- second function, test some edge cases
-- same name as above, returns simple integer
CREATE FUNCTION test_ret(a TEXT)
RETURNS INTEGER AS $$
DECLARE
ret RECORD;
BEGIN
-- test
END;$$ LANGUAGE plpgsql;
Retrieving this function's return data types is easy, as they are stored in pg_catalog.pg_proc.proallargtypes; the problem is that this is an array of OIDs. We must unnest it and join it to pg_catalog.pg_type.oid.
-- edit: add support for function not returning tables, thx Tommaso Di Bucchianico
WITH pg_proc_with_unnested_proallargtypes AS (
    SELECT
        pg_catalog.pg_proc.oid,
        pg_catalog.pg_proc.proname,
        CASE WHEN proallargtypes IS NOT NULL THEN unnest(proallargtypes) ELSE null END AS proallargtype
    FROM pg_catalog.pg_proc
    JOIN pg_catalog.pg_namespace ON pg_catalog.pg_proc.pronamespace = pg_catalog.pg_namespace.oid
    WHERE pg_catalog.pg_namespace.nspname = 'public'
),
pg_proc_with_proallargtypes_names AS (
    SELECT
        pg_proc_with_unnested_proallargtypes.oid,
        pg_proc_with_unnested_proallargtypes.proname,
        array_agg(pg_catalog.pg_type.typname) AS proallargtypes
    FROM pg_proc_with_unnested_proallargtypes
    LEFT JOIN pg_catalog.pg_type ON pg_catalog.pg_type.oid = proallargtype
    GROUP BY
        pg_proc_with_unnested_proallargtypes.oid,
        pg_proc_with_unnested_proallargtypes.proname
)
SELECT
    information_schema.routines.specific_name,
    information_schema.routines.routine_name,
    information_schema.routines.routine_schema,
    information_schema.routines.data_type,
    pg_proc_with_proallargtypes_names.proallargtypes
FROM information_schema.routines
-- we can declare many function with the same name and schema as long as arg types are different
-- This is the only right way to join pg_catalog.pg_proc and information_schema.routines, sadly
JOIN pg_proc_with_proallargtypes_names
    ON pg_proc_with_proallargtypes_names.proname || '_' || pg_proc_with_proallargtypes_names.oid = information_schema.routines.specific_name
;
Any refactoring is welcome :)
Here is the result:
specific_name | routine_name | routine_schema | data_type | proallargtypes
----------------+--------------+----------------+-----------+--------------------------
test_ret_16633 | test_ret | public | record | {text,text,int4,varchar}
test_ret_16635 | test_ret | public | integer | {NULL}
(2 rows)
EDIT
Identification of input and output arguments is not trivial; here is my solution for pg 9.2:
-- https://gist.github.com/subssn21/e9e121f6fd5ff50f688d
-- Allow us to use array_remove in pg < 9.3
CREATE OR REPLACE FUNCTION array_remove(a ANYARRAY, e ANYELEMENT)
RETURNS ANYARRAY AS $$
BEGIN
RETURN array(SELECT x FROM unnest(a) x WHERE x <> e);
END;
$$ LANGUAGE plpgsql;
-- edit: add support for function not returning tables, thx Tommaso Di Bucchianico
WITH pg_proc_with_unnested_proallargtypes AS (
    SELECT
        pg_catalog.pg_proc.oid,
        pg_catalog.pg_proc.proname,
        pg_catalog.pg_proc.proargmodes,
        CASE WHEN proallargtypes IS NOT NULL THEN unnest(proallargtypes) ELSE null END AS proallargtype
    FROM pg_catalog.pg_proc
    JOIN pg_catalog.pg_namespace ON pg_catalog.pg_proc.pronamespace = pg_catalog.pg_namespace.oid
    WHERE pg_catalog.pg_namespace.nspname = 'public'
),
pg_proc_with_unnested_proallargtypes_names_and_mode AS (
    SELECT
        pg_proc_with_unnested_proallargtypes.oid,
        pg_proc_with_unnested_proallargtypes.proname,
        pg_catalog.pg_type.typname,
        -- we can't unnest multiple array of same length the way we expect in pg 9.2
        -- just retrieve each mode manually using type row_number
        pg_proc_with_unnested_proallargtypes.proargmodes[row_number() OVER w] AS proargmode
    FROM pg_proc_with_unnested_proallargtypes
    LEFT JOIN pg_catalog.pg_type ON pg_catalog.pg_type.oid = proallargtype
    WINDOW w AS (PARTITION BY pg_proc_with_unnested_proallargtypes.proname)
),
pg_proc_with_input_and_output_type_names AS (
    SELECT
        pg_proc_with_unnested_proallargtypes_names_and_mode.oid,
        pg_proc_with_unnested_proallargtypes_names_and_mode.proname,
        array_agg(pg_proc_with_unnested_proallargtypes_names_and_mode.typname) AS proallargtypes,
        -- we should use FILTER, but that's not available in pg 9.2 :(
        array_remove(array_agg(
            -- see documentation for proargmodes here: http://www.postgresql.org/docs/9.2/static/catalog-pg-proc.html
            CASE WHEN pg_proc_with_unnested_proallargtypes_names_and_mode.proargmode = ANY(ARRAY['i', 'b', 'v'])
                 THEN pg_proc_with_unnested_proallargtypes_names_and_mode.typname
                 ELSE NULL END
        ), NULL) AS proinputargtypes,
        array_remove(array_agg(
            -- see documentation for proargmodes here: http://www.postgresql.org/docs/9.2/static/catalog-pg-proc.html
            CASE WHEN pg_proc_with_unnested_proallargtypes_names_and_mode.proargmode = ANY(ARRAY['o', 'b', 't'])
                 THEN pg_proc_with_unnested_proallargtypes_names_and_mode.typname
                 ELSE NULL END
        ), NULL) AS prooutputargtypes
    FROM pg_proc_with_unnested_proallargtypes_names_and_mode
    GROUP BY
        pg_proc_with_unnested_proallargtypes_names_and_mode.oid,
        pg_proc_with_unnested_proallargtypes_names_and_mode.proname
)
SELECT *
FROM pg_proc_with_input_and_output_type_names
;
And here is my sample output:
oid | proname | proallargtypes | proinputargtypes | prooutputargtypes
-------+--------------+--------------------------+------------------+-------------------
16633 | test_ret | {text,text,int4,varchar} | {text,text} | {int4,varchar}
16634 | array_remove | {NULL} | {} | {}
16635 | test_ret | {NULL} | {} | {}
(3 rows)
Hope that helps :)
The answer provided by Clément Prévost is detailed and educational, and therefore I marked it as best. However, after I executed the suggested script I ended up with empty (filled only by {}) proinputargtypes and prooutputargtypes columns on my machine. So I did a little research of my own, using hints I learned from the answer above, and wrote the following query:
WITH pg_proc_with_unhandled_proallargtypes AS (
    SELECT
        pg_catalog.pg_proc.oid,
        pg_catalog.pg_proc.proname,
        pg_catalog.pg_proc.proargmodes,
        CASE WHEN proallargtypes IS NOT NULL THEN cast(proallargtypes AS text) ELSE NULL END AS proallargtype,
        CASE WHEN array_agg(proargtypes) IS NOT NULL THEN replace(string_agg(proargtypes::text, ','), ' ', ',') ELSE NULL END AS proargtype
    FROM pg_catalog.pg_proc
    JOIN pg_catalog.pg_namespace ON pg_catalog.pg_proc.pronamespace = pg_catalog.pg_namespace.oid
    WHERE pg_catalog.pg_namespace.nspname = 'public'
    GROUP BY pg_catalog.pg_proc.oid, pg_catalog.pg_proc.proname, pg_catalog.pg_proc.proargmodes, proallargtype
),
pg_proc_with_unnested_proallargtypes AS (
    SELECT proname,
        CASE WHEN char_length(proargtype) = 0 THEN NULL
             ELSE ('{' || proargtype || '}')::oid[] END AS inp,
        CASE WHEN proallargtype IS NULL THEN NULL
             ELSE replace(proallargtype, proargtype || ',', '')::oid[] END AS output
    FROM pg_proc_with_unhandled_proallargtypes
),
smth_input AS (
    SELECT proname, unnest(inp) AS inp FROM pg_proc_with_unnested_proallargtypes
),
smth_output AS (
    SELECT proname, unnest(output) AS output FROM pg_proc_with_unnested_proallargtypes
),
input_unnested AS (
    SELECT proname, string_agg(pg_catalog.pg_type.typname::text, ',') AS fin_input
    FROM smth_input
    JOIN pg_catalog.pg_type ON pg_catalog.pg_type.oid = inp
    GROUP BY proname
),
output_unnested AS (
    SELECT proname, string_agg(pg_catalog.pg_type.typname::text, ',') AS fin_output
    FROM smth_output
    JOIN pg_catalog.pg_type ON pg_catalog.pg_type.oid = output
    GROUP BY proname
)
SELECT input_unnested.proname, fin_input,
       CASE WHEN fin_output IS NOT NULL THEN fin_output ELSE information_schema.routines.data_type END AS fin_output
FROM input_unnested
LEFT JOIN output_unnested ON input_unnested.proname = output_unnested.proname
JOIN information_schema.routines ON information_schema.routines.routine_name = input_unnested.proname
It might be a bit inefficient and I probably used too much explicit type casting, but it worked.

Postgresql, select a "fake" row

In Postgres 8.4 or higher, what is the most efficient way to get a row of data populated by defaults without actually creating the row? E.g., as a transaction (pseudocode):
create table "mytable"
(
    id serial PRIMARY KEY NOT NULL,
    parent_id integer NOT NULL DEFAULT 1,
    random_id integer NOT NULL DEFAULT random()
);
begin transaction
fake_row = insert into mytable (id) values (0) returning *;
delete from mytable where id=0;
return fake_row;
end transaction
Basically I'd expect a query with a single row where parent_id is 1 and random_id is a random number (or other function return value) but I don't want this record to persist in the table or impact on the primary key sequence serial_id_seq.
My options seem to be using a transaction like above or creating views which are copies of the table with the fake row added but I don't know all the pros and cons of each or whether a better way exists.
I'm looking for an answer that assumes no prior knowledge of the datatypes or default values of any column except id or the number or ordering of the columns. Only the table name will be known and that a record with id 0 should not exist in the table.
In the past I created the fake record 0 as a permanent record but I've come to consider this record a type of pollution (since I typically have to filter it out of future queries).
You can copy the table definition and defaults to the temp table with:
CREATE TEMP TABLE table_name_rt (LIKE table_name INCLUDING DEFAULTS);
And use this temp table to generate dummy rows. Such a table will be dropped at the end of the session (or transaction) and will only be visible to the current session.
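For the mytable definition in the question that could look like this (a sketch, following the *_rt naming above; inserting the id explicitly means the table's own sequence is never advanced):
BEGIN;
CREATE TEMP TABLE mytable_rt (LIKE mytable INCLUDING DEFAULTS) ON COMMIT DROP;
-- parent_id and random_id are filled in from the copied defaults
INSERT INTO mytable_rt (id) VALUES (0);
SELECT * FROM mytable_rt;
COMMIT;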
You can query the catalog and build a dynamic query
Say we have this table:
create table test10(
    id serial primary key,
    first_name varchar( 100 ),
    last_name varchar( 100 ) default 'Tom',
    age int not null default 38,
    salary float default 100.22
);
When you run the following query:
SELECT string_agg( txt, ' ' order by id )
FROM (
    select 1 id, 'SELECT ' txt
    union all
    select 2, -9999 || ' as id '
    union all
    select 3, ', '
             || coalesce( column_default, 'null'||'::'||c.data_type )
             || ' as ' || c.column_name
    from information_schema.columns c
    where table_schema = 'public'
    and table_name = 'test10'
    and ordinal_position > 1
) xx
;
you will get this string as a result:
"SELECT -9999 as id , null::character varying as first_name ,
'Tom'::character varying as last_name , 38 as age , 100.22 as salary"
then execute this query and you will get the "phantom row".
We can build a function that builds and executes the query and returns our row as a result:
CREATE OR REPLACE FUNCTION get_phantom_rec (p_i test10.id%type)
returns test10 as $$
DECLARE
    v_sql text;
    myrow test10%rowtype;
BEGIN
    SELECT string_agg( txt, ' ' order by id )
    INTO v_sql
    FROM (
        select 1 id, 'SELECT ' txt
        union all
        select 2, p_i || ' as id '
        union all
        select 3, ', '
                 || coalesce( column_default, 'null'||'::'||c.data_type )
                 || ' as ' || c.column_name
        from information_schema.columns c
        where table_schema = 'public'
        and table_name = 'test10'
        and ordinal_position > 1
    ) xx
    ;
    EXECUTE v_sql INTO myrow;
    RETURN myrow;
END$$ LANGUAGE plpgsql;
and then this simple query gives you what you want:
select * from get_phantom_rec ( -9999 );
  id   | first_name | last_name | age | salary
-------+------------+-----------+-----+--------
 -9999 |            | Tom       |  38 | 100.22
I would just select the fake values as literals:
select 1 id, 1 parent_id, 1 user_id
The returned row will be (virtually) indistinguishable from a real row.
To get the values from the catalog:
select
0 as id, -- special case for serial type, just return 0
(select column_default::int -- Cast to int, because we know the column is int
from INFORMATION_SCHEMA.COLUMNS
where table_name = 'mytable'
and column_name = 'parent_id') as parent_id,
(select column_default::int -- Cast to int, because we know the column is int
from INFORMATION_SCHEMA.COLUMNS
where table_name = 'mytable'
and column_name = 'user_id') as user_id;
Note that you must know what the columns are and their type, but this is reasonable. If you change the table schema (except default value), you would need to tweak the query.
See the above as a SQLFiddle.

Duplicate single database record

Hello, what is the easiest way to duplicate a DB record in the same table?
My problem is that the table where I am doing this has many columns, like 100+, and I don't like how the solution looks. Here is what I do (this is inside a plpgsql function):
...
-- 1. duplicate record
INSERT INTO history
    (SELECT NEXTVAL('history_id_seq'), col_1, col_2, ... , col_100
     FROM history
     WHERE history_id = 1234
     ORDER BY datetime DESC
     LIMIT 1)
RETURNING
    history_id INTO new_history_id;
-- 2. update some columns
UPDATE history
SET
    col_5 = 'test_5',
    col_23 = 'test_23',
    datetime = CURRENT_TIMESTAMP
WHERE history_id = new_history_id;
Here are the problems I am attempting to solve
Listing all these 100+ columns looks lame.
When a new column is added, the function eventually has to be updated too.
On separate DB instances the column order might differ, which would cause the function to fail.
I am not sure if I can list them once more (solving issue 3), like insert into <table> (<columns_list>) values (<query>), but then the query looks even uglier.
I would like to achieve something like 'insert into ', but this seems impossible: the unique primary key constraint will raise a duplication error.
Any suggestions?
Thanks in advance for your time.
This isn't pretty or particularly optimized but there are a couple of ways to go about this. Ideally, you might want to do this all in an UPDATE trigger though you could implement a duplication function something like this:
-- create source table
CREATE TABLE history (
    history_id serial not null primary key,
    col_2 int,
    col_3 int,
    col_4 int,
    datetime timestamptz default now()
);
-- add some data
INSERT INTO history (col_2, col_3, col_4)
SELECT g, g * 10, g * 100 FROM generate_series(1, 100) AS g;
-- function to duplicate record
CREATE OR REPLACE FUNCTION fn_history_duplicate(p_history_id integer) RETURNS SETOF history AS
$BODY$
DECLARE
    cols text;
    insert_statement text;
BEGIN
    -- build list of columns
    SELECT array_to_string(array_agg(column_name::name), ',') INTO cols
    FROM information_schema.columns
    WHERE (table_schema, table_name) = ('public', 'history')
    AND column_name <> 'history_id';
    -- build insert statement
    insert_statement := 'INSERT INTO history (' || cols || ') SELECT ' || cols || ' FROM history WHERE history_id = $1 RETURNING *';
    -- execute statement
    RETURN QUERY EXECUTE insert_statement USING p_history_id;
    RETURN;
END;
$BODY$
LANGUAGE 'plpgsql';
-- test
SELECT * FROM fn_history_duplicate(1);
history_id | col_2 | col_3 | col_4 | datetime
------------+-------+-------+-------+-------------------------------
101 | 1 | 10 | 100 | 2013-04-15 14:56:11.131507+00
(1 row)
As I noted in my original comment, you might also take a look at the colnames extension as an alternative to querying the information schema.
You don't need the update anyway; you can supply the constant values directly in the SELECT statement:
INSERT INTO history
SELECT NEXTVAL('history_id_seq'),
       col_1,
       col_2,
       col_3,
       col_4,
       'test_5',
       ...
       'test_23',
       ...,
       col_100
FROM history
WHERE history_id = 1234
ORDER BY datetime DESC
LIMIT 1
RETURNING history_id INTO new_history_id;