Create pivot table with dynamic column names - postgresql

I am creating a pivot table which represents crash values per year. Currently, I am hard-coding the column names to create the pivot table. Is there any way to make the column names dynamic? The years are stored inside an array:
{2018,2017,2016 ..... 2008}
with crash as (
--- pivot table generated for total fatality ---
SELECT *
FROM crosstab('SELECT b.id, b.state_code, a.year, count(case when a.type = ''Fatal'' then a.type end) as fatality
FROM '||state_code_input||'_all as a, (select * from source_grid_repository where state_code = '''||upper(state_code_input)||''') as b
where st_contains(b.geom,a.geom)
group by b.id, b.state_code, a.year
order by b.id, a.year',$$VALUES ('2018'),('2017'),('2016'),('2015'),('2014'),('2013'),('2012'),('2011'),('2010'),('2009'),('2008') $$)
AS pivot_table(id integer, state_code varchar, fat_2018 bigint, fat_2017 bigint, fat_2016 bigint, fat_2015 bigint, fat_2014 bigint, fat_2013 bigint, fat_2012 bigint, fat_2011 bigint, fat_2010 bigint, fat_2009 bigint, fat_2008 bigint)
)
In the above code, fat_2018, fat_2017, fat_2016 etc. are hard-coded. I need the years after fat_ to be dynamic.

This question has been asked many times, and there are decent (even dynamic) solutions. While crosstab() is available in recent versions of Postgres, it comes from the tablefunc extension, and not everyone has sufficient privileges to install prerequisite extensions.
One such solution involves a temp type (temp table) created by an anonymous function and JSON expansion of the resultant type.
See also: DB FIDDLE (UK): https://dbfiddle.uk/Sn7iO4zL
How to pivot or crosstab in postgresql without writing a function?
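While the linked fiddle shows the full approach, a minimal flavour of the JSON idea is sketched below (table and data are invented for illustration): the per-year counts are collapsed into one jsonb value per row, so no fixed column list is needed at parse time.
CREATE TABLE crashes (grid_id int, year int, fatality bigint);
INSERT INTO crashes VALUES (1, 2018, 4), (1, 2017, 2), (2, 2018, 7);
-- One jsonb "pivot" per grid cell; the keys are built from the data.
SELECT grid_id,
       jsonb_object_agg('fat_' || year, fatality) AS fatalities
FROM   crashes
GROUP  BY grid_id;
-- grid_id |           fatalities
-- --------+--------------------------------
--       1 | {"fat_2017": 2, "fat_2018": 4}
--       2 | {"fat_2018": 7}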

It is not possible as a single static query. PostgreSQL has a strict type system: the result of a query is a table (relation), and the format of that table (number of columns, column names, column types) must be defined before query execution, at planning time. So you cannot write a query for Postgres that returns a dynamic number of columns.
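A common workaround is to do it in two steps: generate the crosstab statement, including the column definition list, as text from the years array, and then execute that text in a second round trip. A minimal sketch under that assumption (build_pivot_sql and its parameters are illustrative, not built-ins; the grouped source query from the question is passed in as a string):
CREATE OR REPLACE FUNCTION build_pivot_sql(source_sql text, years int[])
RETURNS text
LANGUAGE plpgsql AS
$func$
DECLARE
    col_defs text;  -- e.g. 'fat_2018 bigint, fat_2017 bigint, ...'
    cats     text;  -- e.g. ('2018'),('2017'),...
BEGIN
    SELECT string_agg(format('fat_%s bigint', y), ', ' ORDER BY y DESC),
           string_agg(format('(%L)', y::text), ',' ORDER BY y DESC)
    INTO   col_defs, cats
    FROM   unnest(years) AS y;

    RETURN format('SELECT * FROM crosstab(%L, %L) AS pivot_table (id integer, state_code varchar, %s)',
                  source_sql, 'VALUES ' || cats, col_defs);
END
$func$;
The caller then runs the returned statement (via EXECUTE in PL/pgSQL, or as a second query from the application); the column list is still fixed at execution time, it is just generated instead of typed by hand.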

Related

Is there a way to add the same row multiple times with different ids into a table with postgresql?

I am trying to add the same data for a row into my table x number of times in PostgreSQL. Is there a way of doing that without manually entering the same values x times? I am looking for the equivalent of GO [count] in SQL Server for Postgres, if that exists.
Use the function generate_series(), e.g.:
insert into my_table
select id, 'alfa', 'beta'
from generate_series(1,4) as id;
Test it in db<>fiddle.
Idea
Produce a resultset of a given size and cross join it with the record that you want to insert x times. What would still be missing is the generation of proper PK values. A specific suggestion would require more details on the data model.
Query
The sample query below presupposes that your PK values are autogenerated.
CREATE TABLE test ( id SERIAL, a VARCHAR(10), b VARCHAR(10) );
INSERT INTO test (a, b)
WITH RECURSIVE Numbers(i) AS (
SELECT 1
UNION ALL
SELECT i + 1
FROM Numbers
WHERE i < 5 -- This is the value `x`
)
SELECT adhoc.*
FROM Numbers n
CROSS JOIN ( -- This is the single record to be inserted multiple times
SELECT 'value_a' a
, 'value_b' b
) adhoc
;
See it in action in this db fiddle.
Note / Reference
The solution is adapted from here with minor modifications (there are a host of other solutions for generating x consecutive numbers with SQL hierarchical / recursive queries, so the choice of reference is somewhat arbitrary).

Athena - Union tables with incompatible data types

We have two tables with a column differing in its data type. The column in the first table is of type int, while the same column in the second table is of type float/real. If it were a naked column I could have CAST it to a common type, but the problem here is that these columns are deep inside a struct.
The error I'm getting is:
SYNTAX_ERROR: line 23:1: column 4 in row(priceconfiguration row(maximumvalue integer, minimumvalue integer, type varchar, value integer)) query has incompatible types: Union, row(priceconfiguration row(maximumvalue integer, minimumvalue integer, type varchar, value real))
The query (simplified) is:
WITH t1 AS (
SELECT
"so"."createdon"
, "so"."modifiedon"
, "so"."deletedon"
, "so"."createdby"
, "so"."priceconfiguration"
, "so"."year"
, "so"."month"
, "so"."day"
FROM
my_db.raw_price so
UNION ALL
SELECT
"ao"."createdon"
, "ao"."modifiedon"
, "ao"."deletedon"
, "ao"."createdby"
, "ao"."priceconfiguration"
, "ao"."year"
, "ao"."month"
, "ao"."day"
FROM
my_db.src_price ao
)
SELECT t1.* FROM t1 ORDER BY "modifiedon" DESC
In fact, the real table is more complex than this, and the column priceconfiguration is nested deep inside the tables. So CASTing the column in question is not directly possible unless all the structs are un-nested first.
Is there a way to UNION these two tables without unnesting and casting?
The solution was to upgrade the Athena engine version to v2.
The v2 engine has more support for schema evolution. As per the AWS doc:
Schema evolution support has been added for data in Parquet format.
Added support for reading array, map, or row type columns from
partitions where the partition schema is different from the table
schema. This can occur when the table schema was updated after the
partition was created. The changed column types must be compatible.
For row types, trailing fields may be added or dropped, but the
corresponding fields (by ordinal) must have the same name.
Ref:
https://docs.aws.amazon.com/athena/latest/ug/engine-versions-reference.html
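For reference, the unnest-and-rebuild workaround that the question hoped to avoid would look roughly like this on the old engine (a hedged sketch: field names are guessed from the error message, only one nesting level is shown, and every additional level would have to be reconstructed the same way):
SELECT CAST(
         ROW(
           so.priceconfiguration.maximumvalue,
           so.priceconfiguration.minimumvalue,
           so.priceconfiguration.type,
           CAST(so.priceconfiguration.value AS real)  -- align integer with real
         )
         AS ROW(maximumvalue integer, minimumvalue integer, type varchar, value real)
       ) AS priceconfiguration
FROM my_db.raw_price so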

Will I get the benefits of a hypertable if I have a query in which I join a hypertable with a normal (non-hyper) table in TimescaleDB

I have to fetch records from two tables; one is a hypertable and the other is a normal table.
The hypertable's primary key (a UUID, not a timestamptz column) is used as a foreign key in the second, normal table.
The hypertable has a one-to-many relationship with the normal table.
Will I get all the benefits of a hypertable here if I select records after joining these tables?
I am using a PostgreSQL database for TimescaleDB.
Below are the CREATE TABLE statements. demography_person is the hypertable and emotions_person is the normal table:
CREATE TABLE public.demography_person
(
start_timestamp timestamp with time zone NOT NULL,
end_timestamp timestamp with time zone,
demography_person_id character varying NOT NULL,
device_id bigint,
age_actual numeric,
age_band integer,
gender integer,
dwell_time_in_millis bigint,
customer_id bigint NOT NULL
);
SELECT create_hypertable('demography_person', 'start_timestamp');
CREATE TABLE public.emotions_person
(
emotion_start_timestamp timestamp with time zone NOT NULL,
demography_person_id character varying NOT NULL,
count integer,
emotion integer,
emotion_percentage numeric
);
The SELECT query looks like this:
SELECT * FROM crosstab
(
$$
SELECT * FROM ( select to_char(dur,'HH24') as duration , dur as time_for_sorting from
generate_series(
timestamp '2019-04-01 00:00:00',
timestamp '2020-03-09 23:59:59' ,
interval '1 hour'
) as dur ) d
LEFT JOIN (
select to_char(
start_timestamp ,
'HH24'
)
as duration,
emotion,count(*) as count from demography_person dp INNER JOIN (
select distinct ON (demography_person_id) demography_person_id, emotion_start_timestamp,count,emotion,emotion_percentage,
(CASE emotion when 4 THEN 1 when 6 THEN 2 when 1 THEN 3 WHEN 3 THEN 4 WHEN 2 THEN 5 when 7 THEN 6 when 5 THEN 7 ELSE 8 END )
as emotion_key_for_sorting from emotions_person where demography_person_id in (select demography_person_id from demography_person where start_timestamp >= '2019-04-01 00:00:00'
AND start_timestamp <= '2020-03-09 23:59:59' AND device_id IN ( 2052,2692,1797,2695,1928,2697,2698,1931,2574,2575,2706,1942,1944,2713,1821,2719,2720,2721,2722,2723,2596,2725,2217,2603,1852,2750,1726,1727,2754,2757,1990,2759,2760,2376,2761,2762,2257,2777,2394,2651,2652,1761,2658,1762,2659,2788,2022,2791,2666,1770,2026,2028,2797,2675,1780,2549 ))
order by demography_person_id asc,emotion_percentage desc, emotion_key_for_sorting asc
) ep ON
ep.demography_person_id = dp.demography_person_id
WHERE start_timestamp >= '2019-04-01 00:00:00'
AND start_timestamp <= '2020-03-09 23:59:59' AND device_id IN ( 2052,2692,1797,2695,1928,2697,2698,1931,2574,2575,2706,1942,1944,2713,1821,2719,2720,2721,2722,2723,2596,2725,2217,2603,1852,2750,1726,1727,2754,2757,1990,2759,2760,2376,2761,2762,2257,2777,2394,2651,2652,1761,2658,1762,2659,2788,2022,2791,2666,1770,2026,2028,2797,2675,1780,2549 ) AND gender IN ( 1,2 )
group by 1,2 ORDER BY 1,2 ASC
) t USING (duration) GROUP BY 1,2,3,4 ORDER BY time_for_sorting;
$$ ,
$$
select emotion from (
values ('1'), ('2'), ('3'),('4'), ('5'), ('6'),('7'), ('8')
) t(emotion)
$$
) AS ct
(
duration text,
time_for_sorting timestamp,
ANGER bigInt,
DISGUSTING bigInt,
FEAR bigInt,
HAPPY bigInt,
NEUTRAL bigInt,
SAD bigInt,
SURPRISE bigInt,
NO_DETECTION bigInt
);
Will I get the benefits of a hypertable if I have a query in which I join a hypertable with a normal (non-hyper) table in TimescaleDB?
I don't fully understand the question and see two interpretations for it:
Will I benefit from using TimescaleDB and hypertable just for improving this query?
Can I join a hypertable and a normal table and how to make the above query to perform better?
If you just need to execute a complex query over a large dataset, PostgreSQL can do a good job if you provide indexes. TimescaleDB provides benefits for timeseries workflows, especially when a workflow includes in-order data ingestion, time-related queries, timeseries operators and/or TimescaleDB-specific functionality such as continuous aggregates and compression, i.e., not just a single query. TimescaleDB is designed for large volumes of timeseries data. I hope that clarifies the first question.
In TimescaleDB it is very common to join a hypertable, which stores timeseries data, and a normal table, which contains metadata on the timeseries data. TimescaleDB implements constraint exclusion to improve query performance. However, it might not be applied in some cases due to uncommon query expressions or too-complex queries.
The query in the question is very complex, so I suggest running EXPLAIN ANALYZE on it to see if the query planner misses some optimisations.
I see that the query generates data, and I doubt much can be done to produce a good query plan around that; this is my biggest concern for getting good performance. It would be great if you could explain the motivation for generating data inside the query.
Another issue I see is the nested subquery demography_person_id in (select demography_person_id from demography_person ... in the WHERE condition, while the outer query takes part in an inner join with the same table as in the nested subquery. I expect it can be rewritten without the nested subquery by utilising the inner join.
I doubt that TimescaleDB or PostgreSQL can do much to execute this query efficiently as written; it requires manual rewriting.
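For illustration, a rough sketch of that join-based rewrite (heavily simplified: it drops the per-person DISTINCT ON ranking and the crosstab wrapper, and abbreviates the device list, so it is a starting point rather than a drop-in replacement):
WITH dp AS (
    SELECT demography_person_id, start_timestamp
    FROM   demography_person
    WHERE  start_timestamp >= '2019-04-01 00:00:00'
    AND    start_timestamp <= '2020-03-09 23:59:59'
    AND    device_id IN (2052, 2692)  -- abbreviated device list
    AND    gender IN (1, 2)
)
SELECT to_char(dp.start_timestamp, 'HH24') AS duration,
       ep.emotion,
       count(*) AS count
FROM   dp
JOIN   emotions_person ep USING (demography_person_id)
GROUP  BY 1, 2
ORDER  BY 1, 2;
This filters the hypertable once (so TimescaleDB can exclude chunks by the start_timestamp bounds) and joins the result directly instead of repeating the same scan in an IN (...) subquery.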

PostgreSQL: Copy data from a random row of another table

I have two tables, stuff and nonsense.
create table stuff(
id serial primary key,
details varchar,
data varchar,
more varchar
);
create table nonsense (
id serial primary key,
data varchar,
more varchar
);
insert into stuff(details) values
('one'),('two'),('three'),('four'),('five'),('six');
insert into nonsense(data,more) values
('apple','accordion'),('banana','banjo'),('cherry','cor anglais');
See http://sqlfiddle.com/#!17/313fb/1
I would like to copy random values from nonsense to stuff. I can do this for a single value using the answer to my previous question: SQL Server Copy Random data from one table to another:
update stuff
set data=(select data from nonsense where stuff.id=stuff.id
order by random() limit 1);
However, I would like to copy more than one value (data and more) from the same row, and the subquery won't let me do that, of course.
In Microsoft SQL Server, I can use the following:
update stuff
set data=sq.town,more=sq.state
from stuff s outer apply
(select top 1 * from nonsense where s.id=s.id order by newid()) sq
I have read that PostgreSQL uses something like LEFT JOIN LATERAL instead of OUTER APPLY, but simply substituting doesn't work for me.
How can I update with multiple values from a random row of another table?
As of Postgres 9.5, you can assign multiple columns from a subquery:
update stuff
set (data, more) = (
select data, more
from nonsense
where stuff.id=stuff.id
order by random()
limit 1
);
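Worth noting: the seemingly redundant where stuff.id=stuff.id is what makes the subquery correlated, forcing it to be re-evaluated for every row of stuff. Without it, the planner may compute the subquery once and write the same "random" pair into every row. Compare:
-- Correlated: the outer reference forces per-row evaluation.
update stuff
set (data, more) = (select data, more from nonsense
                    where stuff.id = stuff.id
                    order by random() limit 1);
-- Uncorrelated: the subquery may run only once,
-- giving every row the same values.
update stuff
set (data, more) = (select data, more from nonsense
                    order by random() limit 1);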

Compact or renumber IDs for all tables, and reset sequences to max(id)?

After running for a long time, I get more and more holes in the id field. Some tables' id columns are int32, and the id sequence is reaching its maximum value. Some of the Java sources are read-only, so I cannot simply change the id column type from int32 to long, which would break the API.
I'd like to renumber them all. This may not be good practice, but good or bad is not the concern of this question. I want to renumber especially those very long IDs like "61789238" and "548273826529524324". I don't know why they are so long, but shorter IDs are also easier to handle manually.
But it's not easy to compact IDs by hand because of references and constraints.
Does PostgreSQL itself support of ID renumbering? Or is there any plugin or maintaining utility for this job?
Maybe I can write some stored procedures? That would be very nice so I can schedule it once a year.
The question is old, but we got a new question from a desperate user on dba.SE after trying to apply what is suggested here. Find an answer with more details and explanation over there:
Compacting a sequence in PostgreSQL
The currently accepted answer will fail for most cases.
Typically, you have a PRIMARY KEY or UNIQUE constraint on an id column, which is NOT DEFERRABLE by default. (The OP mentions references and constraints.) Such constraints are checked after each row, so you would most likely get unique violation errors when trying this. Details:
Constraint defined DEFERRABLE INITIALLY IMMEDIATE is still DEFERRED?
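As an aside, if you can alter the constraint instead of dropping it, making it DEFERRABLE moves the uniqueness check from per-row to end-of-statement, which is enough for a single renumbering UPDATE. A hedged sketch, using the same names as the example below:
ALTER TABLE tbl DROP CONSTRAINT tbl_pkey;
ALTER TABLE tbl ADD CONSTRAINT tbl_pkey
      PRIMARY KEY (id) DEFERRABLE INITIALLY IMMEDIATE;
-- the renumbering UPDATE now tolerates transient duplicates within the statement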
Typically, one wants to retain the original order of rows while closing gaps. But the order in which rows are updated is arbitrary, leading to arbitrary numbers. The demonstrated example only seems to retain the original sequence because the physical storage still coincides with the desired order (the rows were inserted in the desired order just a moment earlier), which is almost never the case in real-world applications and is completely unreliable.
The matter is more complicated than it might seem at first. One solution (among others) if you can afford to remove the PK / UNIQUE constraint (and related FK constraints) temporarily:
BEGIN;
LOCK tbl;
-- remove all FK constraints to the column
ALTER TABLE tbl DROP CONSTRAINT tbl_pkey; -- remove PK
-- for the simple case without FK references - or see below:
UPDATE tbl t -- intermediate unique violations are ignored now
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE t.id = t1.id;
-- Update referencing value in FK columns at the same time (if any)
SELECT setval('tbl_id_seq', max(id)) FROM tbl; -- reset sequence
ALTER TABLE tbl ADD CONSTRAINT tbl_pkey PRIMARY KEY(id); -- add PK back
-- add all FK constraints to the column back
COMMIT;
This is also much faster for big tables, because checking PK (and FK) constraints for every row costs a lot more than removing the constraints and adding them back.
If there are FK columns in other tables referencing tbl.id, use data-modifying CTEs to update all of them.
Example for a table fk_tbl and a FK column fk_id:
WITH u1 AS (
UPDATE tbl t
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE t.id = t1.id
RETURNING t.id, t1.new_id -- return old and new ID
)
UPDATE fk_tbl f
SET fk_id = u1.new_id -- set to new ID
FROM u1
WHERE f.fk_id = u1.id; -- match on old ID
More in the referenced answer on dba.SE.
Assuming your ids are generated from a bigint sequence, just RESTART the sequence and update the table with idcolumn = DEFAULT.
CAVEAT: If this id column is used as a foreign key by other tables, make sure you have the on update cascade modifier turned on.
For example:
Create the table, put some data in, and remove a middle value:
db=# create sequence xseq;
CREATE SEQUENCE
db=# create table foo ( id bigint default nextval('xseq') not null, data text );
CREATE TABLE
db=# insert into foo (data) values ('hello'), ('world'), ('how'), ('are'), ('you');
INSERT 0 5
db=# delete from foo where data = 'how';
DELETE 1
db=# select * from foo;
id | data
----+-------
1 | hello
2 | world
4 | are
5 | you
(4 rows)
Reset your sequence:
db=# ALTER SEQUENCE xseq RESTART;
ALTER SEQUENCE
Update your data:
db=# update foo set id = DEFAULT;
UPDATE 4
db=# select * from foo;
id | data
----+-------
1 | hello
2 | world
3 | are
4 | you
(4 rows)
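If foo were referenced by another table, the UPDATE above would fail without cascading updates; per the caveat, the referencing column needs ON UPDATE CASCADE, e.g. (illustrative table):
create table bar (
    id     bigint primary key,
    foo_id bigint references foo (id) on update cascade
);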
Add a new id column and new foreign key(s) while the old ones are still in use. With some (quick) renaming, applications do not have to be aware. (But applications should be inactive during the final renaming step.)
\i tmp.sql
-- the test tables
CREATE TABLE one (
id serial NOT NULL PRIMARY KEY
, payload text
);
CREATE TABLE two (
id serial NOT NULL PRIMARY KEY
, the_fk INTEGER REFERENCES one(id)
ON UPDATE CASCADE ON DELETE CASCADE
);
-- And the supporting index for the FK ...
CREATE INDEX ON two(the_fk);
-- populate
INSERT INTO one(payload)
SELECT x::text FROM generate_series(1,1000) x;
INSERT INTO two(the_fk)
SELECT id FROM one WHERE random() < 0.3;
-- make some gaps
DELETE FROM one WHERE id % 13 > 0;
-- SELECT * FROM two;
-- Add new keycolumns to one and two
ALTER TABLE one
ADD COLUMN new_id SERIAL NOT NULL UNIQUE
;
-- UPDATE:
-- This could need DEFERRABLE
-- Note since the update is only a permutation of the
-- existing values, we don't need to reset the sequence.
UPDATE one SET new_id = self.new_id
FROM ( SELECT id, row_number() OVER(ORDER BY id) AS new_id FROM one ) self
WHERE one.id = self.id;
ALTER TABLE two
ADD COLUMN new_fk INTEGER REFERENCES one(new_id)
;
-- update the new FK
UPDATE two t
SET new_fk = o.new_id
FROM one o
WHERE t.the_fk = o.id
;
SELECT * FROM two;
-- The crucial part: the final renaming
-- (at this point it would be better not to allow other sessions
-- messing with the {one,two} tables ...
-- --------------------------------------------------------------
ALTER TABLE one DROP COLUMN id CASCADE;
ALTER TABLE one rename COLUMN new_id TO id;
ALTER TABLE one ADD PRIMARY KEY(id);
ALTER TABLE two DROP COLUMN the_fk CASCADE;
ALTER TABLE two rename COLUMN new_fk TO the_fk;
CREATE INDEX ON two(the_fk);
-- Some checks.
-- (the automatically generated names for the indexes
-- and the sequence still contain the "new" names.)
SELECT * FROM two;
\d one
\d two
UPDATE: added the permutation of new_id (after creating it as a serial)
Funny thing is: it doesn't seem to need 'DEFERRABLE'.
This script works for PostgreSQL. It is a generic solution that works for all cases.
The first query finds the description of the fields of all tables in the database:
WITH description_bd AS (
    SELECT colum.schemaname,
           coalesce(table_name, relname) AS table_name,
           column_name, ordinal_position, column_default, data_type,
           is_nullable, character_maximum_length, is_updatable, description
    FROM (
        SELECT columns.table_schema AS schemaname, columns.table_name,
               columns.column_name, columns.ordinal_position, columns.column_default,
               columns.data_type, columns.is_nullable, columns.character_maximum_length,
               columns.character_octet_length, columns.is_updatable, columns.udt_name
        FROM information_schema.columns
    ) colum
    FULL JOIN (
        SELECT schemaname, relid, relname, objoid, objsubid, description
        FROM pg_statio_all_tables, pg_description
        WHERE pg_statio_all_tables.relid = pg_description.objoid
    ) descre
      ON  descre.relname    = colum.table_name
      AND descre.objsubid   = colum.ordinal_position
      AND descre.schemaname = colum.schemaname
)
This second query, appended to the WITH clause above, proposes a solution to fix the sequences of all database tables (it generates, in the req field, a statement that fixes the sequence of each table).
It finds the maximum value of the id column and increments it by one:
SELECT table_name, column_name, ordinal_position, column_default,
       data_type, is_nullable, character_maximum_length, is_updatable,
       description,
       'SELECT setval(''' || schemaname || '.'
           || replace(replace(column_default, '''::regclass)', ''), 'nextval(''', '')
           || ''', (select max( ' || column_name || ')+1 from ' || table_name || ' ), true);' AS req
FROM description_bd
WHERE column_default LIKE '%nextva%'
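For a table foo whose id column defaults to nextval('foo_id_seq'::regclass) in schema public, the generated req value would look like this (illustrative):
SELECT setval('public.foo_id_seq', (select max( id)+1 from foo ), true);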
Since I didn't like the answers, I wrote a function in PL/pgSQL to do the job.
It is called like this:
=> SELECT resequence('port','id','port_id_seq');
resequence
--------------
5090 -> 3919
It takes 3 parameters:
name of table
name of column that is SERIAL
name of sequence that the SERIAL uses
The function returns a short report of what it has done, with the previous value of the sequence and the new value.
The function LOOPs over the table, ORDERed by the named column, and makes an UPDATE for each row; then it sets the new value for the sequence. That's it.
The order of the values is preserved.
No ADDing and DROPping of temporary columns or tables is involved.
No DROPping and ADDing of constraints and foreign keys is needed.
Of course, you had better have ON UPDATE CASCADE on those foreign keys.
The code :
CREATE OR REPLACE FUNCTION resequence(_tbl TEXT, _clm TEXT, _seq TEXT) RETURNS TEXT AS $FUNC$
DECLARE
    _old BIGINT;
    _new BIGINT := 0;
BEGIN
    -- Walk the existing values in ascending order and renumber them from 1 upwards.
    FOR _old IN EXECUTE 'SELECT '||_clm||' FROM '||_tbl||' ORDER BY '||_clm LOOP
        _new := _new + 1;
        EXECUTE 'UPDATE '||_tbl||' SET '||_clm||'='||_new||' WHERE '||_clm||'='||_old;
    END LOOP;
    -- Report "old sequence value -> new value" and sync the sequence to the new maximum.
    RETURN (nextval(_seq::regclass)-1)||' -> '||setval(_seq::regclass,_new);
END $FUNC$ LANGUAGE plpgsql;