Generate dynamic columns for CROSSTAB in postgresql?

I have this table in postgres
CREATE TABLE ct(id SERIAL, rowid TEXT, attribute TEXT, value TEXT);
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att1','val1');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att2','val2');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att3','val3');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att4','val4');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att1','val5');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att2','val6');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att3','val7');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att4','val8');
I want to generate a dynamic crosstab query using this table.
So far I have created a static query by following the example in the official Postgres documentation:
select * from crosstab
('select rowid, attribute, value from ct order by 1,2')
as final_result(rowid text, att1 text, att2 text, att3 text, att4 text)
Now I want this part to be dynamic
as final_result(rowid text, att1 text, att2 text, att3 text, att4 text)
I tried a few things, such as writing a query that generates the column names with their types and passing that query to final_result(...), but that doesn't work:
SELECT 'rowid text, '
|| string_agg(Distinct attribute, ' text, ') as name
FROM ct;
select * from crosstab
('select rowid, attribute, value from ct order by 1,2')
as final_result(SELECT 'rowid text, '
|| string_agg(Distinct attribute, ' text, ') as name
FROM ct;)
OR
select * from crosstab
('select rowid, attribute, value from ct order by 1,2',
SELECT 'rowid text, '
|| string_agg(Distinct attribute, ' text, ')) as name
FROM ct;)
Neither of these queries works.
I searched Stack Overflow and found the question below, but it doesn't have an acceptable answer either:
Dynamically generate columns for crosstab in PostgreSQL
Any idea how this can be done?
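A column definition list cannot be supplied by a subquery, because Postgres needs the result columns when it parses the statement. One possible approach, sketched here under assumptions, is to generate the whole crosstab statement as text first and then run it as a second step. The sketch assumes the tablefunc extension is installed, uses the two-parameter form of crosstab() with a category query, and treats every attribute as text; the alias crosstab_sql is only illustrative.

```sql
-- Step 1: build the statement text. %L quotes the two query strings as
-- literals; %I quotes each attribute as an identifier.
SELECT format(
    'SELECT * FROM crosstab(%L, %L) AS final_result(rowid text, %s)',
    'SELECT rowid, attribute, value FROM ct ORDER BY 1, 2',
    'SELECT DISTINCT attribute FROM ct ORDER BY 1',
    string_agg(format('%I text', attribute), ', ' ORDER BY attribute)
) AS crosstab_sql
FROM (SELECT DISTINCT attribute FROM ct) AS attrs;

-- Step 2: run the generated text, for example by ending the statement
-- above with \gexec instead of ; in psql, or with EXECUTE inside a
-- PL/pgSQL function.
```

The category query and the generated column list are both ordered by attribute, so the values line up with the declared columns.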

Related

ERROR: column "int4" specified more than once

Steps for Execution:
Table Creation
CREATE TABLE xyz.table_a(
id bigint NOT NULL,
scores jsonb,
CONSTRAINT table_a_pkey PRIMARY KEY (id)
);
Add some dummy data:
INSERT INTO xyz.table_a(
id, scores)
VALUES (1, '{"a":20,"b":20}');
Function Creation
CREATE OR REPLACE FUNCTION xyz.example(
table_name text,
regular_columns text,
json_column text,
view_name text
) RETURNS text
LANGUAGE 'plpgsql'
COST 100
VOLATILE
AS $BODY$
DECLARE
cols TEXT;
cols_sum TEXT;
BEGIN
EXECUTE
format(
$ex$SELECT string_agg(
format(
'CAST(%2$s->>%%1$L AS INTEGER)',
key),
', '
)
FROM (SELECT DISTINCT key
FROM %1$s, jsonb_each(%2$s)
ORDER BY 1
) s;$ex$,
table_name, json_column
)
INTO cols;
EXECUTE
format(
$ex$SELECT string_agg(
format(
'CAST(%2$s->>%%1$L AS INTEGER)',
key
),
'+'
)
FROM (SELECT DISTINCT key
FROM %1$s, jsonb_each(%2$s)
ORDER BY 1) s;$ex$,
table_name, json_column
)
INTO cols_sum;
EXECUTE
format(
$ex$DROP VIEW IF EXISTS %2$s;
CREATE VIEW %2$s AS
SELECT %3$s, %4$s, SUM(%5$s) AS total
FROM %1$s
GROUP BY %3$s$ex$,
table_name, view_name, regular_columns, cols, cols_sum
);
RETURN cols;
END
$BODY$;
Call Function
SELECT xyz.example(
'xyz.table_a',
' id',
'scores',
'xyz.view_table_a'
);
When I run these steps, I get this error:
ERROR: column "int4" specified more than once
CONTEXT: SQL statement "
DROP VIEW IF EXISTS xyz.view_table_a;
CREATE VIEW xyz.view_table_a AS
SELECT id, CAST(scores->>'a' AS INTEGER), CAST(scores->>'b' AS INTEGER), SUM(CAST(scores->>'a' AS INTEGER)+CAST(scores->>'b' AS INTEGER)) AS total FROM xyz.table_a GROUP BY id

Look at the error message closely:
...
SELECT id, CAST(scores->>'a' AS INTEGER), CAST(scores->>'b' AS INTEGER),
...
There are multiple expressions without column alias. A named column like "id" defaults to the given name. But other expressions default to the internal type name, which is "int4" for integer. One might assume that the JSON key name is used, but that's not so. CAST(scores->>'a' AS INTEGER) is just another expression returning an unnamed integer value.
This still works for a plain SELECT. Postgres tolerates duplicate column names in the (outer) SELECT list. But a VIEW cannot be created that way, because duplicate names would make its columns ambiguous.
Either add column aliases to expressions in the SELECT list:
SELECT id, CAST(scores->>'a' AS INTEGER) AS a, CAST(scores->>'b' AS INTEGER) AS b, ...
Or add a list of column names to CREATE VIEW:
CREATE VIEW xyz.view_table_a(id, a, b, ...) AS ...
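The naming behaviour and both fixes can be reproduced without the original table. This is a minimal sketch; the view name int4_demo is hypothetical and the constants stand in for the JSON lookups.

```sql
-- Works as a plain query: both output columns are named "int4".
SELECT CAST('20' AS INTEGER), CAST('30' AS INTEGER);

-- Fails with: column "int4" specified more than once
-- CREATE VIEW int4_demo AS
-- SELECT CAST('20' AS INTEGER), CAST('30' AS INTEGER);

-- Fix 1: alias every expression in the SELECT list.
CREATE VIEW int4_demo AS
SELECT CAST('20' AS INTEGER) AS a, CAST('30' AS INTEGER) AS b;

DROP VIEW int4_demo;

-- Fix 2: name the columns in the CREATE VIEW header instead.
CREATE VIEW int4_demo(a, b) AS
SELECT CAST('20' AS INTEGER), CAST('30' AS INTEGER);
```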
Something like this should fix your function (preserving the literal spelling of JSON key names):
...
format(
'CAST(%2$s->>%%1$L AS INTEGER) AS %%1$I',
key),
...
See the working demo here:
db<>fiddle here
Aside, your nested format() calls make the code pretty hard to read and maintain.

dynamically select the most occurring value from a column which occurs in multiple tables

The Customer, Musician, and Staff tables in my database each include a column called FirstName. The query below returns the most frequently occurring FirstName across those three tables, and returns several FirstNames if more than one occurs the same number of times.
WITH AllFirstNames AS (
SELECT FirstName
FROM Customer
UNION ALL
SELECT FirstName
FROM Musician
UNION ALL
SELECT FirstName
FROM Staff
), FirstNameOccurrences AS (
SELECT FirstName,
COUNT(*) AS Occurrences
FROM AllFirstNames
GROUP BY FirstName
)
SELECT FirstName AS MostOccurringFirstNames
FROM AllFirstNames
WHERE FirstName IN (
SELECT FirstName
FROM FirstNameOccurrences
WHERE Occurrences IN (
SELECT MAX(Occurrences)
FROM FirstNameOccurrences
)
)
GROUP BY MostOccurringFirstNames;
This only works if the tables that include the FirstName column are listed in the query that builds the temporary AllFirstNames table. If a new table with a FirstName column is added to the database, the query has to be updated manually. What do I need to change in the query that builds AllFirstNames so that it dynamically UNION ALLs the FirstName columns from all tables that have one? I understand that this will only work if the same naming convention is used throughout the database's lifetime.
The query below lists all the tables that include a FirstName column, but I don't know where to go from there.
SELECT table_name
FROM information_schema.columns
WHERE column_name = 'FirstName';

This does sound like a strange database design, but you can do that by creating a function that iterates over all tables.
The following function counts the distinct values per table.
create or replace function count_names()
returns table(tablename text, firstname text, occurrences bigint)
as
$$
declare
l_row record;
begin
for l_row in select distinct table_schema, table_name, column_name
from information_schema.columns
where table_schema = 'public'
and column_name = 'firstname'
loop
return query execute
format('select %L as tablename, cast(%I as text), count(*) occurrences from %I.%I group by %I',
l_row.table_name, l_row.column_name, l_row.table_schema, l_row.table_name, l_row.column_name);
end loop;
end;
$$
language plpgsql;
The above runs a count()/group by for every table that has a column named firstname in the schema public. The result can then be summed. I included the source table name in the result for debugging purposes, but it's not really needed.
With that function you can do something like this:
select firstname, sum(occurrences) num_names
from count_names()
group by firstname
order by num_names desc
limit 10;
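The question also asked for every name that ties for first place, which a fixed LIMIT does not provide. One way to get that from count_names() is to rank the summed counts with a window function; this is a sketch, and the names total, rnk, and ranked are only illustrative.

```sql
-- Sum the per-table counts, rank the totals, keep every name ranked first.
SELECT firstname, total
FROM (
    SELECT firstname,
           sum(occurrences) AS total,
           rank() OVER (ORDER BY sum(occurrences) DESC) AS rnk
    FROM count_names()
    GROUP BY firstname
) AS ranked
WHERE rnk = 1;
```

rank() assigns the same rank to equal totals, so all tied names are returned.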
Dynamic SQL is best created using the format() function to properly deal with identifiers. The column and table names used in your question suggest you created them using the dreaded double quotes ("FirstName" is something different than FirstName). You should really rethink that and avoid double-quoted identifiers in SQL.
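The difference between quoted and unquoted identifiers is easy to check with two throwaway tables. This is a small sketch; the table names quoted_demo and unquoted_demo are hypothetical.

```sql
CREATE TEMP TABLE quoted_demo ("FirstName" text);   -- case is preserved
CREATE TEMP TABLE unquoted_demo (FirstName text);   -- folded to firstname

-- The catalog stores FirstName for the first table and firstname for the
-- second, so a filter on column_name has to match the stored spelling.
SELECT table_name, column_name
FROM information_schema.columns
WHERE table_name IN ('quoted_demo', 'unquoted_demo');

SELECT firstname FROM unquoted_demo;    -- works
SELECT "FirstName" FROM quoted_demo;    -- works
-- SELECT FirstName FROM quoted_demo;   -- fails: column "firstname" does not exist
```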

Do you need surrounding parentheses in Postgres SELECT statement?

I noticed that there is different output for this
SELECT id,name,description FROM table_name;
as opposed to this
SELECT (id,name,description) FROM table_name;
Is there any big difference between the two?
What is the purpose of this?

create table table_name(id int, name text, description text);
insert into table_name
values (1, 'John', 'big one');
select (id, name, description), id, name, description
from table_name;
        row         | id | name | description
--------------------+----+------+-------------
 (1,John,"big one") |  1 | John | big one
(1 row)
The difference is important. Columns enclosed in parentheses form a row constructor, also known as a composite value, which is returned in a single column. Usually, separate columns are preferred as a query result. Row constructors are necessary when a row as a whole is needed (e.g. in the VALUES of the above INSERT command). They are also used as values of composite types.

The following query actually is selecting a ROW type value:
SELECT (id, name, description) FROM table_name;
This syntax by itself would not be very useful, and more typically you would use this if you were doing an INSERT INTO ... SELECT into a table which had a row type in its definition. Here is an example of how you might use this.
CREATE TYPE your_type AS (
id INTEGER,
name VARCHAR,
description VARCHAR
);
CREATE TABLE your_table (
id INTEGER,
t your_type
);
INSERT INTO your_table (id, t)
SELECT 1, (id, name, description)
FROM table_name;
From the Postgres documentation on composite types:
Whenever you create a table, a composite type is also automatically created, with the same name as the table, to represent the table's row type.
So you have already been working with row types, whether or not you knew it.
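A few queries against the tables defined above show how row values behave in practice. This is a sketch that reuses table_name and your_table from the examples; the alias tn is only illustrative.

```sql
-- A table alias used as a value yields the whole row as one composite
-- column, typed with the table's automatically created row type.
SELECT tn FROM table_name AS tn;

-- The ROW keyword is the explicit spelling of the parenthesized form.
SELECT ROW(id, name, description) FROM table_name;

-- Reading one field of a composite column needs parentheses around the
-- column name, otherwise t would be parsed as a table name.
SELECT (t).name FROM your_table;

-- Expand a composite column back into separate columns.
SELECT (t).* FROM your_table;
```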

Make duplicate row in Postgresql

I am writing a migration script to migrate a database. I have to duplicate a row while incrementing the primary key, keeping in mind that different databases can have any number of different columns in the table, so I can't list each and every column in the query. If I simply copy the row, I get a duplicate key error.
Query: INSERT INTO table_name SELECT * FROM table_name WHERE id=255;
ERROR: duplicate key value violates unique constraint "table_name_pkey"
DETAIL: Key (id)=(255) already exists.
Here it's good that I don't have to mention all the column names; I can select all columns with *. But at the same time I get the duplicate key error.
What's the solution of this problem? Any help would be appreciated. Thanks in advance.

If you are willing to type all column names, you may write
INSERT INTO table_name (
pri_key
,col2
,col3
)
SELECT (
SELECT MAX(pri_key) + 1
FROM table_name
)
,col2
,col3
FROM table_name
WHERE pri_key = 255;
Another option (without typing all the columns, though you do need to know the primary key) is to create a temp table, update it, and re-insert within a transaction.
BEGIN;
CREATE TEMP TABLE temp_tab ON COMMIT DROP AS SELECT * FROM table_name WHERE pri_key_col = 255;
UPDATE temp_tab SET pri_key_col = ( select MAX(pri_key_col) + 1 FROM table_name );
INSERT INTO table_name select * FROM temp_tab;
COMMIT;
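If the key is backed by a sequence, the new value can be taken from that sequence instead of MAX() + 1, which avoids collisions between concurrent sessions. This is a sketch under the assumption that the key column is named id, as in the question, and was declared as serial or as an identity column; otherwise pg_get_serial_sequence() returns NULL and this variant does not apply.

```sql
BEGIN;
-- Copy the source row into a temp table that vanishes at commit.
CREATE TEMP TABLE temp_tab ON COMMIT DROP AS
SELECT * FROM table_name WHERE id = 255;

-- Assumption: id owns a sequence (serial or identity column).
UPDATE temp_tab
SET id = nextval(pg_get_serial_sequence('table_name', 'id'));

INSERT INTO table_name SELECT * FROM temp_tab;
COMMIT;
```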

This is just a DO block but you could create a function that takes things like the table name etc as parameters.
Setup:
CREATE TABLE public.t1 (a TEXT, b TEXT, c TEXT, id SERIAL PRIMARY KEY, e TEXT, f TEXT);
INSERT INTO public.t1 (e) VALUES ('x'), ('y'), ('z');
Code to duplicate values without the primary key column:
DO $$
DECLARE
_table_schema TEXT := 'public';
_table_name TEXT := 't1';
_pk_column_name TEXT := 'id';
_columns TEXT;
BEGIN
SELECT STRING_AGG(column_name, ',')
INTO _columns
FROM information_schema.columns
WHERE table_name = _table_name
AND table_schema = _table_schema
AND column_name <> _pk_column_name;
EXECUTE FORMAT('INSERT INTO %1$s.%2$s (%3$s) SELECT %3$s FROM %1$s.%2$s', _table_schema, _table_name, _columns);
END $$;
The query it creates and runs is: INSERT INTO public.t1 (a,b,c,e,f) SELECT a,b,c,e,f FROM public.t1. It selects all the columns except the primary key. You could put this code in a function and use it for any table you want, or use it as is and edit it for the table at hand.
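Wrapped in a function, the same idea might look like the sketch below. The function name duplicate_rows and its parameter names are hypothetical. Unlike the DO block it quotes every identifier, so mixed-case or otherwise unusual names survive, and it keeps the columns in table order.

```sql
CREATE OR REPLACE FUNCTION duplicate_rows(_schema text, _table text, _pk text)
RETURNS void
LANGUAGE plpgsql
AS $fn$
DECLARE
    _columns text;
BEGIN
    -- Collect every column except the primary key, quoted as identifiers.
    SELECT string_agg(quote_ident(column_name::text), ', '
                      ORDER BY ordinal_position)
    INTO _columns
    FROM information_schema.columns
    WHERE table_schema = _schema
      AND table_name = _table
      AND column_name <> _pk;

    -- %I quotes schema and table; the column list is already quoted.
    EXECUTE format(
        'INSERT INTO %1$I.%2$I (%3$s) SELECT %3$s FROM %1$I.%2$I',
        _schema, _table, _columns
    );
END
$fn$;

-- Usage with the setup above:
SELECT duplicate_rows('public', 't1', 'id');
```

Like the DO block, this copies every row and relies on the primary key having a default such as a sequence.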

Quotation mark incorrect when using crosstab() in PostgreSQL

I have a table t1 as below:
create table t1 (
person_id int,
item_name varchar(30),
item_value varchar(100)
);
There are five records in this table:
person_id | item_name | item_value
----------+-----------+-------------------------
1         | 'NAME'    | 'john'
1         | 'GENDER'  | 'M'
1         | 'DOB'     | '1970/02/01'
1         | 'M_PHONE' | '1234567890'
1         | 'ADDRESS' | 'Some Addresses unknown'
Now I want to use the crosstab function to extract the NAME and GENDER data, so I wrote this SQL:
select * from crosstab(
'select person_id, item_name, item_value from t1
where person_id=1 and item_name in ('NAME', 'GENDER') ')
as virtual_table (person_id int, NAME varchar, GENDER varchar)
My problem is that the SQL passed to crosstab() contains a condition on item_name, and its single quotes end the surrounding string literal early.
How do I solve the problem?

To avoid any confusion about how to escape single quotes and generally simplify the syntax, use dollar-quoting for the query string:
SELECT *
FROM crosstab(
$$
SELECT person_id, item_name, item_value
FROM t1
WHERE person_id = 1
AND item_name IN ('NAME', 'GENDER')
$$
) AS virtual_table (person_id int, name varchar, gender varchar);
See:
Insert text with single quotes in PostgreSQL
And you should add ORDER BY to your query string. I quote the manual for the tablefunc module:
In practice the SQL query should always specify ORDER BY 1,2 to ensure
that the input rows are properly ordered, that is, values with the
same row_name are brought together and correctly ordered within the
row. Notice that crosstab itself does not pay any attention to the
second column of the query result; it's just there to be ordered by,
to control the order in which the third-column values appear across the page.
See:
PostgreSQL Crosstab Query
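One consequence of ORDER BY 1,2 is worth spelling out: the one-parameter form fills the output columns in row order, and GENDER sorts before NAME, so the declared columns would have to be listed as gender, then name. A sketch that avoids depending on sort order uses the two-parameter form of crosstab(), where a second query lists the categories in the desired column order. It assumes the tablefunc extension is installed.

```sql
SELECT *
FROM crosstab(
    $$
    SELECT person_id, item_name, item_value
    FROM t1
    WHERE person_id = 1
      AND item_name IN ('NAME', 'GENDER')
    ORDER BY 1
    $$,
    -- Category list: fixes which item_name feeds which output column.
    $$ VALUES ('NAME'), ('GENDER') $$
) AS virtual_table (person_id int, name varchar, gender varchar);
```

With the category list in place, each value is matched to its column by item_name rather than by position.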

Double your single quotes to escape them:
select * from crosstab(
'select person_id, item_name, item_value from t1
where person_id=1 and item_name in (''NAME'', ''GENDER'') ')
as virtual_table (person_id int, NAME varchar, GENDER varchar)