Dynamically transpose a table using PostgreSQL - postgresql

I have a table, with this schema:
summary | time  | v1  | v2  | .... | v28
--------+-------+-----+-----+------+-----
count   | 12345 | 3.7 | 9.8 | .... | 6.8
stddev  | 23564 | 5.7 | 8.8 | .... | 6.6
min     | 12345 | 6.7 | 3.8 | .... | 6.5
25%     | 12545 | 7.7 | 1.8 | .... | 6.4
50%     | 13455 | 8.7 | 2.8 | .... | 6.3
75%     | 32345 | 9.7 | 4.8 | .... | 6.2
max     | 89045 | 1.7 | 4.9 | .... | 6.1
I want the result to be:
column_Name | count | stddev | min | 25% | 50% | 75% | max
------------+-------+--------+-----+-----+-----+-----+-----
time        | ..... | ...... | ... | ... | ... | ... | ...
v1          | ..... | ...... | ... | ... | ... | ... | ...
v2          | ..... | ...... | ... | ... | ... | ... | ...
.
.
.
v28         | ..... | ...... | ... | ... | ... | ... | ...
I'm using PostgreSQL 9.4 and DBeaver to transpose/pivot it, using several queries I found on Stack Overflow, but I'm getting errors and I don't know why:
Query 1:
SELECT (each(hstore(dataset.stats.*))).* FROM dataset.stats
Getting this error:
SQL Error [1]: Query failed (#20190927_165859_00936_zga4f): line 1:77: mismatched input '*'. Expecting: <identifier>
This is query 2:
SELECT 'SELECT *
FROM crosstab(
''SELECT u.attnum, t.rn, u.val
FROM (SELECT row_number() OVER () AS rn, * FROM ' || attrelid::regclass || ') t
, unnest(ARRAY[' || string_agg(quote_ident(attname) || '::text', ',') || '])
WITH ORDINALITY u(val, attnum)
ORDER BY 1, 2''
) t (attnum int, '
|| (SELECT string_agg('r'|| rn ||' text', ',')
FROM (SELECT row_number() OVER () AS rn FROM tbl) t)
|| ')' AS sql
FROM pg_attribute
WHERE attrelid = 'dataset.stats'::regclass
AND attnum > 0
AND NOT attisdropped
GROUP BY attrelid;
Getting this error:
SQL Error [1]: Query failed (#20190927_165928_00937_zga4f): line 4:63: identifiers must not contain ':'
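For reference, a hand-written (non-dynamic) version of the transpose can be sketched with crosstab from the tablefunc extension. This is a sketch under assumptions: it presumes CREATE EXTENSION tablefunc; has been run against the actual PostgreSQL server, and only three of the 29 value columns are spelled out; the full query would list all of them.

```sql
-- Unpivot each row into (column_name, summary, value) pairs,
-- then let crosstab pivot the summary labels into columns.
-- The mixed-type values are cast to text so they fit one column.
SELECT *
FROM crosstab(
  $$SELECT u.col, s.summary, u.val
    FROM dataset.stats s
       , LATERAL (VALUES ('time', s.time::text)
                       , ('v1',   s.v1::text)
                       , ('v2',   s.v2::text)) AS u(col, val)
    ORDER BY 1, 2$$
) AS t (column_name text,
        "25%" text, "50%" text, "75%" text,
        count text, max text, min text, stddev text);
```

Note the output column list must match the alphabetical order produced by ORDER BY 1, 2 ('25%' sorts before 'count'), since the one-argument form of crosstab assigns values to columns positionally.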

Related

Interpretation of rows in Phoenix SYSTEM.CATALOG

When I create a Phoenix table there are two extra rows in SYSTEM.CATALOG. These are the first and the second rows in the output of SELECT * FROM SYSTEM.CATALOG ...... below. Can someone please help me understand what these two rows signify?
The third and fourth rows in the output of SELECT * FROM SYSTEM.CATALOG ...... below are easily relatable to the CREATE TABLE statement. Therefore, they look fine.
0: jdbc:phoenix:t40aw2.gaq> CREATE TABLE C5 (company_id INTEGER PRIMARY KEY, name VARCHAR(225));
No rows affected (4.618 seconds)
0: jdbc:phoenix:t40aw2.gaq> select * from C5;
+-------------+-------+
| COMPANY_ID | NAME |
+-------------+-------+
+-------------+-------+
No rows selected (0.085 seconds)
0: jdbc:phoenix:t40aw2.gaq> SELECT * FROM SYSTEM.CATALOG WHERE TABLE_NAME='C5';
+------------+--------------+-------------+--------------+----------------+----------------+-------------+----------+---------------+---------------+------------------+--------------+----+
| TENANT_ID | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | COLUMN_FAMILY | TABLE_SEQ_NUM | TABLE_TYPE | PK_NAME | COLUMN_COUNT | SALT_BUCKETS | DATA_TABLE_NAME | INDEX_STATE | IM |
+------------+--------------+-------------+--------------+----------------+----------------+-------------+----------+---------------+---------------+------------------+--------------+----+
| | | C5 | | | 0 | u | | 2 | null | | | fa |
| | | C5 | | 0 | null | | | null | null | | | |
| | | C5 | COMPANY_ID | | null | | | null | null | | | |
| | | C5 | NAME | 0 | null | | | null | null | | | |
+------------+--------------+-------------+--------------+----------------+----------------+-------------+----------+---------------+---------------+------------------+--------------+----+
4 rows selected (0.557 seconds)
0: jdbc:phoenix:t40aw2.gaq>
The Phoenix version I am using is: 4.1.8.29
Kindly note that no other operations were done on the table other than the 3 listed above, namely: create table, select * from the table, and select * from system.catalog where TABLE_NAME = the concerned table name.

Select common values when using group by [Postgres]

I have three main tables meetings, persons, hobbies with two relational tables.
Table meetings
+----+----------+
| id | subject  |
+----+----------+
| 1  | Kickoff  |
| 2  | Relaunch |
| 3  | Party    |
+----+----------+
Table persons
+----+-------+
| id | name  |
+----+-------+
| 1  | John  |
| 2  | Anna  |
| 3  | Linda |
+----+-------+
Table hobbies
+----+----------+
| id | name     |
+----+----------+
| 1  | Soccer   |
| 2  | Tennis   |
| 3  | Swimming |
+----+----------+
Relation Table meeting_person
+----+------------+-----------+
| id | meeting_id | person_id |
+----+------------+-----------+
| 1  | 1          | 1         |
| 2  | 1          | 2         |
| 3  | 1          | 3         |
| 4  | 2          | 1         |
| 5  | 2          | 2         |
| 6  | 3          | 1         |
+----+------------+-----------+
Relation Table person_hobby
+----+-----------+----------+
| id | person_id | hobby_id |
+----+-----------+----------+
| 1  | 1         | 1        |
| 2  | 1         | 2        |
| 3  | 1         | 3        |
| 4  | 2         | 1        |
| 5  | 2         | 2        |
| 6  | 3         | 1        |
+----+-----------+----------+
Now I want to find the common hobbies of all persons attending each meeting.
So the desired result would be:
+------------+-----------------+------------------------+
| meeting_id | persons         | common_hobbies         |
|            | (Aggregated)    | (Aggregated)           |
+------------+-----------------+------------------------+
| 1          | John,Anna,Linda | Soccer                 |
| 2          | John,Anna       | Soccer,Tennis          |
| 3          | John            | Soccer,Tennis,Swimming |
+------------+-----------------+------------------------+
My current work in progress is:
select
    m.id as "meeting_id",
    (
        select string_agg(distinct p.name, ',')
        from meeting_person mp
        inner join persons p on mp.person_id = p.id
        where m.id = mp.meeting_id
    ) as "persons",
    string_agg(distinct h2.name, ',') as "common_hobbies"
from meetings m
inner join meeting_person mp2 on m.id = mp2.meeting_id
inner join persons p2 on mp2.person_id = p2.id
inner join person_hobby ph2 on p2.id = ph2.person_id
inner join hobbies h2 on ph2.hobby_id = h2.id
group by m.id
But this query does not list the common hobbies; it lists all hobbies that are mentioned at least once.
+------------+-----------------+------------------------+
| meeting_id | persons         | common_hobbies         |
+------------+-----------------+------------------------+
| 1          | John,Anna,Linda | Soccer,Tennis,Swimming |
| 2          | John,Anna       | Soccer,Tennis,Swimming |
| 3          | John            | Soccer,Tennis,Swimming |
+------------+-----------------+------------------------+
Does anyone have any hints for me, on how I could solve this problem?
Cheers
This problem can be solved by implementing a custom aggregate function (found here):
create or replace function array_intersect(anyarray, anyarray)
returns anyarray language sql
as $$
    select
        case
            when $1 is null then $2
            when $2 is null then $1
            else
                array(
                    select unnest($1)
                    intersect
                    select unnest($2))
        end;
$$;

create aggregate array_intersect_agg (anyarray)
(
    sfunc = array_intersect,
    stype = anyarray
);
So the solution can be:
select
    meeting_id,
    array_agg(ph.name) persons,
    array_intersect_agg(hobby) common_hobbies
from meeting_person mp
join (
    select p.id, p.name, array_agg(h.name) hobby
    from person_hobby ph
    join persons p on ph.person_id = p.id
    join hobbies h on h.id = ph.hobby_id
    group by p.id, p.name
) ph on ph.id = mp.person_id
group by meeting_id;
See the example fiddle.
Result:
meeting_id | persons | common_hobbies
-----------+-----------------------+--------------------------
1 | {John,Anna,Linda} | {Soccer}
3 | {John} | {Soccer,Tennis,Swimming}
2 | {John,Anna} | {Soccer,Tennis}
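As a quick sanity check of the helper function itself, you can call array_intersect directly (a sketch; run it after creating the function above):

```sql
-- Intersect two of the hobby arrays from the question's data.
SELECT array_intersect(ARRAY['Soccer','Tennis','Swimming'],
                       ARRAY['Soccer','Tennis']);
-- returns the common elements, e.g. {Soccer,Tennis}
-- (INTERSECT does not guarantee element order)
```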

Find rows in relation with at least n rows in a different table without joins

I have a table as such (tbl):
+----+------+-----+
| pk | attr | val |
+----+------+-----+
| 0  | ohif | 4   |
| 1  | foha | 56  |
| 2  | slns | 2   |
| 3  | faso | 11  |
+----+------+-----+
And another table in n-to-1 relationship with tbl (tbl2):
+----+-----+
| pk | rel |
+----+-----+
| 0  | 0   |
| 1  | 1   |
| 2  | 0   |
| 3  | 2   |
| 4  | 2   |
| 5  | 3   |
| 6  | 1   |
| 7  | 2   |
+----+-----+
(tbl2.rel -> tbl.pk.)
I would like to select only the rows from tbl which are in relationship with at least n rows from tbl2.
I.e., for n = 2, I want this table:
+----+------+-----+
| pk | attr | val |
+----+------+-----+
| 0  | ohif | 4   |
| 1  | foha | 56  |
| 2  | slns | 2   |
+----+------+-----+
This is the solution I came up with:
SELECT DISTINCT ON (tbl.pk) tbl.*
FROM (
SELECT tbl.pk
FROM tbl
RIGHT OUTER JOIN tbl2 ON tbl2.rel = tbl.pk
GROUP BY tbl.pk
HAVING COUNT(tbl2.*) >= 2 -- n
) AS tbl_candidates
LEFT OUTER JOIN tbl ON tbl_candidates.pk = tbl.pk
Can it be done without selecting the candidates with a subquery and re-joining the table with itself?
I'm on Postgres 10. A standard SQL solution would be better, but a Postgres solution is acceptable.
OK, just join once, as below:
select
    t1.pk,
    t1.attr,
    t1.val
from tbl t1
join tbl2 t2 on t1.pk = t2.rel
group by
    t1.pk,
    t1.attr,
    t1.val
having count(1) >= 2
order by t1.pk;
pk | attr | val
----+------+-----
0 | ohif | 4
1 | foha | 56
2 | slns | 2
(3 rows)
Or just join once and use a CTE (WITH clause), as below:
with tmp as (
    select rel from tbl2 group by rel having count(1) >= 2
)
select b.* from tmp t join tbl b on t.rel = b.pk order by b.pk;
pk | attr | val
----+------+-----
0 | ohif | 4
1 | foha | 56
2 | slns | 2
(3 rows)
Is the SQL clearer?
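Since the question specifically asks about avoiding joins, a correlated subquery in the WHERE clause also works. A sketch, standard SQL, no join, CTE, or re-join of tbl with itself:

```sql
-- Keep only the tbl rows that at least n (= 2) tbl2 rows point to.
SELECT t.*
FROM tbl t
WHERE (SELECT count(*)
       FROM tbl2
       WHERE tbl2.rel = t.pk) >= 2  -- n
ORDER BY t.pk;
```

The planner typically turns this into much the same plan as the GROUP BY/HAVING variants, so the choice is mostly about readability.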

How to copy into Postgres table from csv with added column?

I have a table in Postgres that I would like to copy into from a csv file. I usually do as so:
\copy my_table from '/workdir/some_file.txt' with null as 'NULL' delimiter E'|' csv header;
The problem is now however that my_table has one column extra that I would like to fill in manually on copy, with the same value 'b'. Here are my tables:
some_file.txt:
col1 | col2 | col3
0 | 0 | 1
0 | 1 | 3
my_table:
xtra_col | col1 | col2 | col3
a        | 5    | 2    | 5
a        | 6    | 2    | 5
a        | 7    | 2    | 5
Desired my_table after copy into:
xtra_col | col1 | col2 | col3
a        | 5    | 2    | 5
a        | 6    | 2    | 5
a        | 7    | 2    | 5
b        | 0    | 0    | 1
b        | 0    | 1    | 3
Is there a way to specify the constant 'b' value in the copy statement for the column 'xtra_col'? If not, how should I approach this problem?
You could set a (temporary) default value for the xtra_col:
ALTER TABLE my_table ALTER COLUMN xtra_col SET DEFAULT 'b';
COPY my_table (col1, col2, col3) FROM '/workdir/some_file.txt' WITH (FORMAT CSV, DELIMITER '|', NULL 'NULL', HEADER true);
ALTER TABLE my_table ALTER COLUMN xtra_col DROP DEFAULT;
Is there a way to not repeat the columns of my_table? The real my_table has 20 columns and I wouldn't want to list all of them.
If my_table has a lot of columns and you wish to avoid having to type out all the column names,
you could dynamically generate the COPY command like this:
SELECT format($$COPY my_table(%s) FROM '/workdir/some_file.txt' WITH (FORMAT CSV, DELIMITER '|', NULL 'NULL', HEADER true);$$
             , string_agg(quote_ident(attname), ','))
FROM pg_attribute
WHERE attrelid = 'my_table'::regclass
  AND attname != 'xtra_col'
  AND attnum > 0;
you could then copy-and-paste the SQL to run it.
Or, for totally hands-free operation, you could create a function to generate the SQL and execute it:
CREATE OR REPLACE FUNCTION test_func(filepath text, xcol text, fillval text)
  RETURNS void
  LANGUAGE plpgsql
AS $func$
DECLARE
    sql text;
BEGIN
    EXECUTE format($$ALTER TABLE my_table ALTER COLUMN %s SET DEFAULT '%s';$$, xcol, fillval);

    SELECT format($$COPY my_table(%s) FROM '%s' WITH (FORMAT CSV, DELIMITER '|', NULL 'NULL', HEADER true);$$
                 , string_agg(quote_ident(attname), ','), filepath)
    INTO sql
    FROM pg_attribute
    WHERE attrelid = 'my_table'::regclass
      AND attname != 'xtra_col'
      AND attnum > 0;

    EXECUTE sql;

    EXECUTE format($$ALTER TABLE my_table ALTER COLUMN %s DROP DEFAULT;$$, xcol);
END;
$func$;
SELECT test_func('/workdir/some_file.txt', 'xtra_col', 'b');
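One caveat: %s interpolates the column name and fill value into the SQL unescaped. If either can contain special characters (or come from untrusted input), format's %I (identifier) and %L (literal) specifiers are the safer choice. A sketch of the first EXECUTE rewritten that way:

```sql
-- %I quotes xcol as an identifier, %L quotes fillval as a literal,
-- so no manual single quotes are needed in the template.
EXECUTE format('ALTER TABLE my_table ALTER COLUMN %I SET DEFAULT %L',
               xcol, fillval);
```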
This is the sql I used to test the solution above:
DROP TABLE IF EXISTS test;
CREATE TABLE test (
xtra_col text
, col1 int
, col2 int
, col3 int
);
INSERT INTO test VALUES
('a', 5, 2, 5)
, ('a', 6, 2, 5)
, ('a', 7, 2, 5);
with the contents of /tmp/data being
col1 | col2 | col3
0 | 0 | 1
0 | 1 | 3
Then
SELECT test_func('/tmp/data', 'xtra_col', 'b');
SELECT * FROM test;
results in
+----------+------+------+------+
| xtra_col | col1 | col2 | col3 |
+----------+------+------+------+
| a | 5 | 2 | 5 |
| a | 6 | 2 | 5 |
| a | 7 | 2 | 5 |
| b | 0 | 0 | 1 |
| b | 0 | 1 | 3 |
+----------+------+------+------+
(5 rows)
Regarding the pg.dropped column:
The test_func call does not seem to produce the pg.dropped column, at least on the test table used above:
unutbu=# SELECT *
FROM pg_attribute
WHERE attrelid = 'test'::regclass;
+----------+----------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+-------------+--------------+------------+-------------+--------------+--------+------------+---------------+
| attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | atthasdef | attidentity | attisdropped | attislocal | attinhcount | attcollation | attacl | attoptions | attfdwoptions |
+----------+----------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+-------------+--------------+------------+-------------+--------------+--------+------------+---------------+
| 53393 | tableoid | 26 | 0 | 4 | -7 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | |
| 53393 | cmax | 29 | 0 | 4 | -6 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | |
| 53393 | xmax | 28 | 0 | 4 | -5 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | |
| 53393 | cmin | 29 | 0 | 4 | -4 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | |
| 53393 | xmin | 28 | 0 | 4 | -3 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | |
| 53393 | ctid | 27 | 0 | 6 | -1 | 0 | -1 | -1 | f | p | s | t | f | | f | t | 0 | 0 | | | |
| 53393 | xtra_col | 25 | -1 | -1 | 1 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 100 | | | |
| 53393 | col1 | 23 | -1 | 4 | 2 | 0 | -1 | -1 | t | p | i | f | f | | f | t | 0 | 0 | | | |
| 53393 | col2 | 23 | -1 | 4 | 3 | 0 | -1 | -1 | t | p | i | f | f | | f | t | 0 | 0 | | | |
| 53393 | col3 | 23 | -1 | 4 | 4 | 0 | -1 | -1 | t | p | i | f | f | | f | t | 0 | 0 | | | |
+----------+----------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+-------------+--------------+------------+-------------+--------------+--------+------------+---------------+
(10 rows)
As far as I know, the pg.dropped column is a natural result of how PostgreSQL works when a column is dropped. So no fix is necessary.
Rows whose attname contains pg.dropped also have a negative attnum.
This is why attnum > 0 was used in test_func -- to remove such rows from the generated list of column names.
My experience with Postgresql is limited, so I might be wrong. If you can produce an example which generates a pg.dropped "column" with positive attnum, I'd very much like to see it.
I usually load a file into a temporary table then insert (or update) from there. In this case,
CREATE TEMP TABLE input (LIKE my_table);
ALTER TABLE input DROP xtra_col;
\copy input from 'some_file.txt' ...
INSERT INTO my_table
SELECT 'b', * FROM input;
The INSERT statement looks tidy, but that can only really be achieved when the columns you want to exclude are on either end of my_table. In your (probably simplified) example, xtra_col is at the front so we can quickly append the remaining columns using *.
If the arrangement of the CSV file's columns differs from my_table any more than that, you'll need to start typing out column names.
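For completeness, when the extra column sits in the middle of my_table, the INSERT has to name the target columns explicitly (a sketch, using the column names from the question):

```sql
-- Explicit column lists let the SELECT reorder and fill columns
-- regardless of where xtra_col sits in my_table.
INSERT INTO my_table (xtra_col, col1, col2, col3)
SELECT 'b', col1, col2, col3
FROM input;
```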

Postgres 10 lateral unnest missing null values

I have a Postgres table where the content of a text column is delimited with '|'.
ID | ... | my_column
-----------------------
1 | ... | text|concatenated|as|such
2 | ... | NULL
3 | ... | NULL
I tried to unnest(string_to_array()) this column to separate rows which works fine, except that my NULL values (>90% of all entries) are excluded. I have tried several approaches:
SELECT * from "my_table", lateral unnest(CASE WHEN "this_column" is NULL
THEN NULL else string_to_array("this_column", '|') END);
or
as suggested here: PostgreSQL unnest with empty array
What I get:
ID | ... | my_column
-----------------------
1 | ... | text
1 | ... | concatenated
1 | ... | as
1 | ... | such
But this is what I need:
ID | ... | my_column
-----------------------
1 | ... | text
1 | ... | concatenated
1 | ... | as
1 | ... | such
2 | ... | NULL
3 | ... | NULL
Use a LEFT JOIN instead:
SELECT m.id, t.*
from my_table m
left join lateral unnest(string_to_array(my_column, '|')) as t(w) on true;
There is no need for the CASE statement to handle NULL values. string_to_array will handle them correctly.
Online example: http://rextester.com/XIGXP80374
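A self-contained sketch of that behavior (the table name demo is invented for the example; the data matches the question):

```sql
CREATE TABLE demo (id int, my_column text);
INSERT INTO demo VALUES
  (1, 'text|concatenated|as|such')
, (2, NULL)
, (3, NULL);

-- string_to_array(NULL, '|') yields NULL, unnest(NULL) yields no rows,
-- and the LEFT JOIN keeps rows 2 and 3 with a NULL in place of a word.
SELECT m.id, t.w AS my_column
FROM demo m
LEFT JOIN LATERAL unnest(string_to_array(m.my_column, '|')) AS t(w) ON true
ORDER BY m.id;
```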