Transpose/Pivot a table in Postgres - postgresql

I have been trying for hours to transpose one table into another this way:
My idea is to take an expression (which can be a simple SELECT * FROM X INNER JOIN Y ...) and transpose it into a MATERIALIZED VIEW.
The problem is that the original table can have an arbitrary number of rows (hence columns in the transposed table). So I was not able to find a working solution, not even with colpivot.
Can this ever be done?

Use conditional aggregation:
select "user",
max(value) filter (where property = 'Name') as name,
max(value) filter (where property = 'Age') as age,
max(value) filter (where property = 'Address') as address
from the_table
group by "user";
A fundamental restriction of SQL is that all columns of a query must be known to the database before it starts running that query.
There is no way you can have a "dynamic" number of columns (evaluated at runtime) in SQL.
Another alternative is to aggregate everything into a JSON value:
select "user",
jsonb_object_agg(property, value) as properties
from the_table
group by "user";
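Because the column list has to be fixed before the query runs, a common workaround is to do it in two steps: first read the distinct property names, then generate the conditional-aggregation query from them. Below is a minimal sketch of that idea, not a definitive implementation. It uses Python's built-in sqlite3 as a stand-in for Postgres, writes MAX(CASE ...) instead of FILTER for portability, and the helper name build_pivot_sql and the sample rows are made up for illustration.

```python
import sqlite3

def build_pivot_sql(conn, table, key_col, name_col, value_col):
    """Return a conditional-aggregation query with one column per distinct name."""
    # Step 1: discover the property names that exist right now.
    names = [row[0] for row in conn.execute(
        f'SELECT DISTINCT "{name_col}" FROM "{table}" ORDER BY 1')]
    # Step 2: emit one MAX(CASE ...) expression per property name.
    cols = []
    for n in names:
        literal = n.replace("'", "''")   # escape for a SQL string literal
        alias = n.replace('"', '""')     # escape for a quoted identifier
        cols.append(
            f"MAX(CASE WHEN \"{name_col}\" = '{literal}' "
            f"THEN \"{value_col}\" END) AS \"{alias}\"")
    return (f'SELECT "{key_col}", ' + ", ".join(cols) +
            f' FROM "{table}" GROUP BY "{key_col}" ORDER BY "{key_col}"')

# Hypothetical sample data, shaped like the_table in the answer above.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE the_table ("user" TEXT, property TEXT, value TEXT)')
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", [
    ("alice", "Name", "Alice"), ("alice", "Age", "30"),
    ("bob", "Name", "Bob"), ("bob", "Address", "Main St"),
])
pivot_sql = build_pivot_sql(conn, "the_table", "user", "property", "value")
pivot_rows = conn.execute(pivot_sql).fetchall()
```

The generated statement is still an ordinary query with a fixed column list, so a view built from it would have to be regenerated whenever a new property name appears.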

Related

Is distinct function deterministic? T-sql

I have a table like the one below. For each distinct combination of UserID and ProductID, will SQL select the product bought from store ID 1 or 2? Is it deterministic?
My code
SELECT (DISTINCT CONCAT(UserID, ProductID)), Date, StoreID FROM X
This isn't valid syntax. You can have
select [column_list] from X
or you can have
select distinct [column_list] from X
The difference is that the first will return one row for every row in the table while the second will return one row for every unique combination of the column values in your column list.
Adding "distinct" to a statement will reliably produce the same results every time unless the underlying data changes, so in this sense, "distinct" is deterministic. However, it is not a function so the term "deterministic" doesn't really apply.
You may actually want a "group by" clause like the following (in which case you have to actually specify how you want the engine to pick values for columns not in your group):
select
concat(UserId, ProductID)
, min(Date)
, max(StoreID)
from
x
group by
concat(UserId, ProductID)
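To make the difference between the two forms concrete, here is a small sketch using Python's built-in sqlite3 as a stand-in for SQL Server. The sample rows are invented, and it groups by the two columns directly rather than by concat(UserID, ProductID), on the assumption that this is what is wanted; concatenation can merge different pairs, for example (1, 23) and (12, 3) both become '123'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (UserID INT, ProductID INT, Date TEXT, StoreID INT)")
# Hypothetical data: user 1 bought product 10 twice, in two different stores.
conn.executemany("INSERT INTO X VALUES (?, ?, ?, ?)", [
    (1, 10, "2020-01-01", 1),
    (1, 10, "2020-01-05", 2),
    (2, 20, "2020-02-01", 1),
])

# DISTINCT applies to the whole select list, so both rows for (1, 10) survive.
distinct_rows = conn.execute(
    "SELECT DISTINCT UserID, ProductID, Date, StoreID FROM X "
    "ORDER BY UserID, ProductID, Date").fetchall()

# GROUP BY yields one row per pair; the aggregates state which values to keep.
grouped_rows = conn.execute(
    "SELECT UserID, ProductID, MIN(Date), MAX(StoreID) FROM X "
    "GROUP BY UserID, ProductID ORDER BY UserID, ProductID").fetchall()
```

Note that for the pair (1, 10) the grouped row combines the earliest date with the highest store ID, which come from two different source rows. Each aggregate is evaluated independently.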

GROUP BY and DISTINCT ON working differently for tables and views

I have a table Design and a view on that table called ArchivedDesign. The view is declared as:
CREATE OR REPLACE VIEW public."ArchivedDesign" ("RootId", "Id", "Created", "CreatedBy", "Modified", "ModifiedBy", "VersionStatusId", "OrganizationId")
AS
SELECT DISTINCT ON (des."RootId") "RootId", des."Id", des."Created", des."CreatedBy", des."Modified", des."ModifiedBy", des."VersionStatusId", des."OrganizationId"
FROM public."Design" AS des
JOIN public."VersionStatus" AS vt ON des."VersionStatusId" = vt."Id"
WHERE vt."Code" = 'Archived'
ORDER BY "RootId", des."Modified" DESC;
Then, I have a large query which gets a short summary of latest changes, thumbnails, etc. The whole query is not important, but it contains two almost identical subqueries: one for the main table and one for the view.
SELECT DISTINCT ON (1) x."Id",
TRIM(con."Name") AS "Contributor",
extract(epoch from x."Modified") * 1000 AS "Modified",
x."VersionStatusId",
x."OrganizationId"
FROM public."Design" AS x
JOIN "Contributor" AS con ON con."DesignId" = x."Id"
WHERE x."OrganizationId" = ANY (ARRAY[]::uuid[])
AND x."VersionStatusId" = ANY (ARRAY[]::uuid[])
GROUP BY x."Id", con."Name"
ORDER BY x."Id";
and
SELECT DISTINCT ON (1) x."Id",
TRIM(con."Name") AS "Contributor",
extract(epoch from x."Modified") * 1000 AS "Modified",
x."VersionStatusId",
x."OrganizationId"
FROM public."ArchivedDesign" AS x
JOIN "Contributor" AS con ON con."DesignId" = x."Id"
WHERE x."OrganizationId" = ANY (ARRAY[]::uuid[])
AND x."VersionStatusId" = ANY (ARRAY[]::uuid[])
GROUP BY x."Id", con."Name"
ORDER BY x."Id";
Link to SQL fiddle: http://sqlfiddle.com/#!17/d1d0f/1
The query is valid for the table, but fails for the view with the error: column x."Modified" must appear in the GROUP BY clause or be used in an aggregate function. I don't understand why the two queries behave differently. How do I fix the view query to work the same way as the table query?
My ultimate goal is to replace all table sub-queries with view sub-queries so we can easily separate draft, active and archived designs.
You get that error because when you query the table directly, Postgres is able to identify the primary key of the table and knows that grouping by it is enough.
Quote from the manual
When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column
(emphasis mine)
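When querying the view, Postgres isn't able to detect the functional dependency that makes it possible to have a "shortened" GROUP BY when querying the table directly. To make the view query work, add every selected column of x (x."Modified", x."VersionStatusId", x."OrganizationId") to the GROUP BY clause.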
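To illustrate what "functionally dependent" means here, the following sketch checks whether each group of rows has exactly one value for an ungrouped column. This is only a conceptual illustration in Python: Postgres does not inspect the data like this, it relies on the declared primary key. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict

def functionally_dependent(rows, key, column):
    """True if every group of rows sharing `key` has exactly one value of `column`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[column])
    return all(len(values) == 1 for values in seen.values())

# Hypothetical rows: "Id" is unique (like a primary key), "RootId" is not.
designs = [
    {"Id": 1, "RootId": 100, "Modified": "2020-01-01"},
    {"Id": 2, "RootId": 100, "Modified": "2020-02-01"},
    {"Id": 3, "RootId": 200, "Modified": "2020-03-01"},
]

# Grouping by Id leaves one Modified value per group, so no aggregate is needed.
by_id = functionally_dependent(designs, "Id", "Modified")
# Grouping by RootId leaves two candidates for RootId 100, so it is ambiguous.
by_root = functionally_dependent(designs, "RootId", "Modified")
```

A view has no primary key constraint of its own, so even if its rows happen to be unique per "Id", the planner has nothing to base that conclusion on.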

OrientDB: Efficient way to select records with a value equal to the max of all such values?

I'm not sure how to do this without using a JOIN (which ODB doesn't have, of course). In "generic" SQL, you might do something like this:
Select * FROM table
INNER JOIN
(SELECT max(field) AS max_of_field, key FROM table GROUP BY key) sub
ON table.field = sub.max_of_field AND table.key = sub.key
Is there an efficient way to do this in ODB, using SELECT and/or MATCH?

Hive: How to do a SELECT query to output a unique primary key using HiveQL?

I have the following dataset, which I want to transform into a table that can be exported to SQL. I am using Hive. The input is as follows:
call_id,stat1,stat2,stat3
1,a,b,c,
2,x,y,z,
3,d,e,f,
1,j,k,l,
The output table needs to have call_id as its primary key so it needs to be unique. The output schema should be
call_id,stat2,stat3,
1,b,c, or (1,k,l)
2,y,z,
3,e,f,
The problem is that when I use the keyword DISTINCT in the Hive query, the DISTINCT applies to all the columns combined. I want to apply the DISTINCT operation only to the call_id. Something along the lines of
SELECT DISTINCT(call_id), stat2,stat3 from intable;
However, this is not valid in Hive (I am not well-versed in SQL either).
The only legal query seems to be
SELECT DISTINCT call_id, stat2,stat3 from intable;
But this returns multiple rows with same call_id as the other columns are different and the row on the whole is distinct.
NOTE: There is no arithmetic relation between a,b,c,x,y,z, etc. So any trick of averaging or summing is not viable.
Any ideas how I can do this?
One quick idea, not the best one, but it will do the job:
hive>create table temp1(a int,b string);
hive>insert overwrite table temp1
select call_id,max(concat(stat1,'|',stat2,'|',stat3)) from intable group by call_id;
-- split() takes a regular expression, so the pipe delimiter must be escaped
hive>insert overwrite table intable
select a,split(b,'\\|')[0],split(b,'\\|')[1],split(b,'\\|')[2] from temp1;
"I want to apply the DISTINCT operation only to the call_id"
But how will Hive then know which row to eliminate?
Without knowing the amount of data or the size of the stat fields you have, the following query can do the job:
select distinct i1.call_id, i1.stat2, i1.stat3 from (
select call_id, MIN(concat(stat1, stat2, stat3)) as smin
from intable group by call_id
) i2 join intable i1 on i1.call_id = i2.call_id
AND concat(i1.stat1, i1.stat2, i1.stat3) = i2.smin;
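Here is a runnable sketch of the same "pick the row with the smallest concatenated value" idea, using Python's built-in sqlite3 as a stand-in for Hive and the sample rows from the question. SQLite concatenates with ||, and the '|' delimiter between the fields is my own addition (an assumption, not part of the query above) so that values such as ('ab', 'c') and ('a', 'bc') cannot collide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE intable (call_id INT, stat1 TEXT, stat2 TEXT, stat3 TEXT)")
conn.executemany("INSERT INTO intable VALUES (?, ?, ?, ?)", [
    (1, "a", "b", "c"), (2, "x", "y", "z"), (3, "d", "e", "f"), (1, "j", "k", "l"),
])

# For each call_id, find the smallest concatenated stat string, then join back
# to the table to recover the columns of the row that produced it.
one_per_call = conn.execute("""
    SELECT DISTINCT i1.call_id, i1.stat2, i1.stat3
    FROM (SELECT call_id, MIN(stat1 || '|' || stat2 || '|' || stat3) AS smin
          FROM intable GROUP BY call_id) AS i2
    JOIN intable AS i1
      ON i1.call_id = i2.call_id
     AND i1.stat1 || '|' || i1.stat2 || '|' || i1.stat3 = i2.smin
    ORDER BY i1.call_id
""").fetchall()
```

For call_id 1 this keeps the (b, c) row, because 'a|b|c' sorts before 'j|k|l'. Which row wins is arbitrary but repeatable, which matches the "1,b,c, or (1,k,l)" requirement in the question.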

Order By Column Names

Is it possible to use the names of the actual columns for the order by clause?
I am using a view to let a client use a report writer (Pentaho) and this would make things easier on them.
To clarify, I want to put the results in alphabetical order of the column names themselves. I want to sort the data using the columns, not the data IN the columns.
In PostgreSQL you can try:
SELECT column_name, data_type FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME =
'my_table' ORDER BY column_name;
If what you mean is to change the order of the columns themselves according to their names (that would make sense only if you are using SELECT *, I guess), I'm afraid that's not possible, at least not straightforwardly. And that sounds very unSQL, I'd say...
Sure, you can order by column name, column alias, or column position:
select a, b from table order by b;
select a as x, b as y from table order by x, y;
select a, b from table order by 1;
You can create the view with the columns in any order you like. Then SELECT * FROM your_view queries will return the columns in the order specified by the view.
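If the goal is a column list in alphabetical order without typing it by hand, one option is to read the column names from the catalog and generate the SELECT (or the view definition) from them. The sketch below shows the idea with Python's built-in sqlite3, where PRAGMA table_info plays the role that INFORMATION_SCHEMA.COLUMNS plays in PostgreSQL. The helper name and the sample table are hypothetical.

```python
import sqlite3

def select_with_sorted_columns(conn, table):
    """Build a SELECT whose column list is sorted alphabetically by name."""
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
    names = sorted(row[1] for row in conn.execute(f"PRAGMA table_info({table})"))
    quoted = ", ".join('"' + n.replace('"', '""') + '"' for n in names)
    return f'SELECT {quoted} FROM "{table}"'

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (zeta INT, alpha INT, mid INT)")
conn.execute("INSERT INTO my_table VALUES (3, 1, 2)")

sorted_sql = select_with_sorted_columns(conn, "my_table")
cursor = conn.execute(sorted_sql)
column_order = [d[0] for d in cursor.description]  # names in result order
first_row = cursor.fetchone()
```

The generated statement could then be used as the body of the view, so that SELECT * FROM your_view returns the columns in that order.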