Pivot function without manually typing values in `for in`? - amazon-redshift

The documentation provides an example of using the pivot() function:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN ('prop', 'rudder', 'wing')
);
I would like to use pivot() without having to manually specify each value of partname. I want all parts. I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname);
That gave an error. Then tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN (select distinct partname from part)
);
That also threw an error.
How can I tell Redshift to include all values of partname in the pivot?

I don't think this can be done in a single query. The query compiler would need to plan the statement without knowing how many output columns will be produced, and it can't do that.
You can do this in multiple queries: use one query to create the list of partnames, then use that list to "generate" a second query that populates the IN list. Something needs to issue the first query and generate the second; that can be code external to Redshift (lots of options) or a stored procedure in Redshift. This code, wherever it lives, should account for Redshift's limit of 1,600 columns per table.
The Redshift docs are fairly good on the topic of dynamic SQL in stored procedures. The EXECUTE statement is used to fire off the second query from within a stored procedure. See: https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-statements.html
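To make that concrete, here is a minimal sketch of the stored-procedure route (the procedure name and cursor handling are illustrative assumptions, not from the original answer): it aggregates the distinct partnames into a quoted IN list with LISTAGG, generates the PIVOT statement as a string, and opens a cursor on it with EXECUTE.
CREATE OR REPLACE PROCEDURE pivot_all_parts(INOUT rs refcursor)
AS $$
DECLARE
    in_list text;
BEGIN
    -- Build a quoted, comma-separated list of every distinct partname.
    SELECT INTO in_list
        LISTAGG(DISTINCT QUOTE_LITERAL(partname), ', ')
    FROM part;

    -- Generate the second query and open a cursor on its result set.
    OPEN rs FOR EXECUTE
        'SELECT * FROM (SELECT partname, price FROM part) PIVOT ('
        || 'AVG(price) FOR partname IN (' || in_list || '))';
END;
$$ LANGUAGE plpgsql;

-- Usage: call within a transaction so the cursor stays open.
BEGIN;
CALL pivot_all_parts('mycursor');
FETCH ALL FROM mycursor;
COMMIT;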

Related

Db2 convert rows to columns

I need the results below.
Table :
Order postcode qnty
123 2234 1
Expected result:
Order 123
Postcode 2234
Qnty 1
SQL Server:
select pvt.element_name, pvt.element_value
from (select [order], postcode from tablename) up
unpivot (element_value for element_name in ([order], postcode)) as pvt;
How do I achieve this in Db2?
Db2 for IBM i doesn't have a built-in unpivot function. AFAIK, it's not available on any Db2 platform, unless it's been added recently.
The straightforward method:
-- "ORDER" must be quoted (it's a reserved word); char() keeps the value
-- types compatible across the UNION branches
select 'ORDER' as key, char("ORDER") as value
from mytable
UNION ALL
select 'POSTCODE', char(postcode)
from mytable
UNION ALL
select 'QNTY', char(qnty)
from mytable;
A better-performing method is to do a cross join between the source table and a correlated VALUES clause with as many rows as there are columns to unpivot.
select
key, value
from mytable t,
lateral (values ('ORDER', varchar(t."ORDER"))
, ('POSTCODE', varchar(t.postcode))
, ('QNTY', varchar(t.qnty))
) as unpivot(key, value);
However, you'll need to know ahead of time which columns you're unpivoting.
If you don't know the values, there are some ways to unpivot with XMLTABLE (possibly JSON_TABLE) that might work. I've never used them, and I'm out of time to spend answering this question. You can find some examples via Google.
I have created a stored procedure for LUW that rotates a table:
https://github.com/angoca/db2tools/blob/master/pivot.sql
You just need to call the stored procedure, passing the table name as a parameter, and it will return a cursor with the column headers in the first column.

Postgres "first" aggregation function

I am aggregating a table on a file ID field. Each file has a name that matches exactly one file id (its own).
select file_key, min(fullfilepath)
from table
group by file_key
Because I know the structure of the table, I know that ANY fullfilepath will do. min and max are OK, but they take a lot of time.
I came across this aggregate function, which returns the first value. Unfortunately, it takes a long time because it scans the whole table. For example, this is very slow:
select first(file_id) from table;
What is the fastest way to do that, with or without an aggregate function?
There is no way to make your first query with the GROUP BY clause faster, because it has to scan the whole table to find all groups.
Your second query can be made faster:
SELECT (
SELECT file_id FROM "table"
WHERE file_id IS NOT NULL
LIMIT 1
);
There is no way to optimize the query as you wrote it, because the aggregate function is a black box to PostgreSQL.
I doubt that this will help performance, but it may be useful if anyone actually wants a first aggregate.
-- coalesce isn't a function, so make an equivalent function.
create function coalesce_("anyelement","anyelement") returns "anyelement"
language sql as $$ select coalesce( $1,$2 ) $$;
create aggregate first("anyelement") (sfunc=coalesce_, stype="anyelement");
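With that in place, the grouped query from the question can use the new aggregate (a usage sketch, not part of the original answer):
select file_key, first(fullfilepath)
from "table"
group by file_key;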
select
distinct on (file_key)
file_key, fullfilepath
from "table"
order by file_key
That will return one record for each file_key.
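If that is still slow, an index matching the DISTINCT ON ordering may let Postgres avoid a full sort (a hedged suggestion using the names above; it is not part of the original answer):
create index on "table" (file_key, fullfilepath);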

Db2 sql for partition by range select

I am trying to get my head around db2 partition stuff.
Select a.*, max(a.bloo)
over (
partition by range (a.bloo) (starting '2014-4-20' ending '2015-1-1')
)
as maxmax from (
select * from someTable
) a
I get SQL code -104 for this, and I cannot decipher the docs.
You are mixing up two different things: table partitioning, which is a physical characteristic of a table, and OLAP (window) functions, which provide logical grouping of records in a query.
I guess what you wanted was something like
Select
a.*,
max(a.bloo) over ( partition by a.bloo ) as maxmax
from someTable a
where
a.bloo between '2014-4-20' and '2015-1-1'
However, without knowing what you wanted to achieve in the first place it's impossible to give you a definitive answer. You may want to publish some sample data and the desired output.
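For instance, if the intent was the maximum bloo over the whole date range, note that PARTITION BY a.bloo makes maxmax equal to a.bloo itself; an empty OVER() computes the maximum across all qualifying rows instead. This variant is a guess at the intent, not something stated in the question:
Select
a.*,
max(a.bloo) over () as maxmax
from someTable a
where
a.bloo between '2014-4-20' and '2015-1-1'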

Dynamic FROM clause in Postgres

Using PostgreSQL 9.1.13, I've written the following query to calculate some data:
WITH windowed AS (
SELECT a.person_id, a.category_id,
CAST(dense_rank() OVER w AS float) / COUNT(*) OVER (ORDER BY category_id) * 100.0 AS percentile
FROM (
SELECT DISTINCT ON (person_id, category_id) *
FROM performances s
-- Want to insert a FROM clause here
INNER JOIN person p ON s.person_id = p.ident
ORDER BY person_id, category_id, created DESC
) a
WINDOW w AS (PARTITION BY category_id ORDER BY score)
)
SELECT category_id,percentile FROM windowed
WHERE person_id = 1;
I now want to turn this into a stored procedure but my issue is that in the middle there, where I showed the comment, I need to place a dynamic WHERE clause. For example, I'd like to add something like:
WHERE p.weight > 110 OR p.weight IS NULL
The calling application lets people pick filters, so I want to be able to pass the appropriate filters into the query. There could be 0 or many filters, depending on the caller, but I could pass them all in as a properly formatted WHERE clause in a string parameter, for example.
The calling application just sends values to a webservice, which then builds the string and calls the stored procedure, so SQL injection attacks won't really be an issue.
"The calling application just sends values to a webservice, which then builds the string and calls the stored procedure, so SQL injection attacks won't really be an issue."
Too many cooks spoil the broth.
Either let your webservice build the SQL statement or let Postgres do it. Don't use both on the same query. That leaves two possible weak spots for SQL injection attacks and makes debugging and maintenance a lot harder.
Here is a full code example for a plpgsql function that builds and executes an SQL statement dynamically while making SQL injection impossible (from just two days ago):
Robust approach for building SQL queries programmatically
Details heavily depend on exact requirements.
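Applied to this query, the pattern could look like the following minimal sketch (the function name and parameters are illustrative, and this is not the linked answer's code): the webservice passes values only, and the function assembles the optional WHERE clause itself with format(), so no caller-supplied SQL text is ever concatenated into the statement.
CREATE OR REPLACE FUNCTION person_percentiles(_person_id int,
                                              _min_weight numeric DEFAULT NULL)
  RETURNS TABLE (category_id int, percentile float)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _filter text := '';
BEGIN
   IF _min_weight IS NOT NULL THEN
      -- %L quotes the value as a literal, so nothing the caller
      -- sends can break out of the quoted context.
      _filter := format('WHERE (p.weight > %L OR p.weight IS NULL)', _min_weight);
   END IF;

   RETURN QUERY EXECUTE format(
   $q$
   WITH windowed AS (
      SELECT a.person_id, a.category_id,
             CAST(dense_rank() OVER w AS float)
             / COUNT(*) OVER (ORDER BY a.category_id) * 100.0 AS percentile
      FROM (
         SELECT DISTINCT ON (person_id, category_id) *
         FROM performances s
         INNER JOIN person p ON s.person_id = p.ident
         %s
         ORDER BY person_id, category_id, created DESC
      ) a
      WINDOW w AS (PARTITION BY a.category_id ORDER BY a.score)
   )
   SELECT category_id, percentile FROM windowed WHERE person_id = $1
   $q$, _filter)
   USING _person_id;
END
$func$;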

Hive: How to do a SELECT query to output a unique primary key using HiveQL?

I have the following schema dataset which I want to transform into a table that can be exported to SQL. I am using Hive. The input is as follows:
call_id,stat1,stat2,stat3
1,a,b,c,
2,x,y,z,
3,d,e,f,
1,j,k,l,
The output table needs to have call_id as its primary key so it needs to be unique. The output schema should be
call_id,stat2,stat3,
1,b,c, or (1,k,l)
2,y,z,
3,e,f,
The problem is that when I use the keyword DISTINCT in the Hive query, the DISTINCT applies to all the columns combined. I want to apply the DISTINCT operation only to the call_id. Something along the lines of:
SELECT DISTINCT(call_id), stat2,stat3 from intable;
However, this is not valid in Hive (I am not well-versed in SQL either).
The only legal query seems to be
SELECT DISTINCT call_id, stat2,stat3 from intable;
But this returns multiple rows with the same call_id, as the other columns differ and each row as a whole is distinct.
NOTE: There is no arithmetic relation between a,b,c,x,y,z, etc. So any trick of averaging or summing is not viable.
Any ideas how I can do this?
One quick idea, not the best one, but it will do the job:
hive> create table temp1 (a int, b string);
hive> insert overwrite table temp1
select call_id, max(concat(stat1,'|',stat2,'|',stat3)) from intable group by call_id;
-- split() takes a regex, so the '|' delimiter must be escaped as '\\|'
hive> insert overwrite table intable
select a, split(b,'\\|')[0], split(b,'\\|')[1], split(b,'\\|')[2] from temp1;
"I want to apply the DISTINCT operation only to the call_id"
But how will Hive then know which row to eliminate?
Without knowing the amount of data / the size of the stat fields you have, the following query can do the job:
select distinct i1.call_id, i1.stat2, i1.stat3 from (
select call_id, MIN(concat(stat1, stat2, stat3)) as smin
from intable group by call_id
) i2 join intable i1 on i1.call_id = i2.call_id
AND concat(i1.stat1, i1.stat2, i1.stat3) = i2.smin;
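On Hive 0.11 and later, which added windowing functions, a hedged alternative is to number the rows within each call_id and keep one per group; ordering by stat1 is an arbitrary tie-breaker here, not something the question prescribes:
select call_id, stat2, stat3
from (
    select call_id, stat2, stat3,
           row_number() over (partition by call_id order by stat1) as rn
    from intable
) t
where rn = 1;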