Oracle: How to group records by certain columns before fetching results - amazon-redshift

I have a table in Redshift that looks like this:
col1 | col2 | col3 | col4 | col5 | col6
-----+------+------+------+------+-----
123  | AB   | SSSS | TTTT | PQR  | XYZ
123  | AB   | SSTT | TSTS | PQR  | XYZ
123  | AB   | PQRS | WXYZ | PQR  | XYZ
123  | CD   | SSTT | TSTS | PQR  | XYZ
123  | CD   | PQRS | WXYZ | PQR  | XYZ
456  | AB   | GGGG | RRRR | OPQ  | RST
456  | AB   | SSTT | TSTS | PQR  | XYZ
456  | AB   | PQRS | WXYZ | PQR  | XYZ
I have another table that also has a similar structure and data.
From these tables, I need to select the values that don't have 'SSSS' in col3 and 'TTTT' in col4 in either of the tables. I'd also need to group my results by the values in col1 and col2.
Here, I'd like my query to return:
123,CD
456,AB
I don't want 123, AB to be in my results, since one of the rows corresponding to 123, AB has SSSS and TTTT in col3 and col4 respectively. That is, I want to omit items that have SSSS and TTTT in col3 and col4 in either of the two tables that I'm looking up.
I am very new to writing queries to extract information from a database, so please bear with my ignorance. I was told to explore GROUP BY and ORDER BY, but I am not sure I understand their usage well enough yet.
The query I have looks like:
SELECT * from table1 join table2 on
table1.col1 = table2.col1 AND
table1.col2 = table2.col2
WHERE
col3 NOT LIKE 'SSSS' AND
col4 NOT LIKE 'TTTT'
GROUP BY col1,col2
However, this query throws an error: col5 must appear in the GROUP BY clause or be used in an aggregate function;
I'm not sure how to proceed. I'd appreciate any help. Thank you!

It seems you also want DISTINCT results. In this case a solution with MINUS is probably as efficient as any other (and, remember, MINUS automatically also means DISTINCT):
select col1, col2 from table_name -- enter your column and table names here
minus
select col1, col2 from table_name where col3 = 'SSSS' and col4 = 'TTTT'
;
No need to group by anything!
With that said, here is a solution using GROUP BY. Note that the HAVING condition uses a non-trivial aggregate: a COUNT() over a CASE expression that flags the offending rows. Also note that an aggregate used in the HAVING clause does not have to appear in the SELECT list!
select col1, col2
from table_name
group by col1, col2
having count(case when col3 = 'SSSS' and col4 = 'TTTT' then 1 else null end) = 0
;
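Since your question actually involves two tables, an untested sketch of extending the MINUS idea across both (assuming the two tables share the same column names) could look like this:
-- Combine both tables, then subtract every (col1, col2) pair that has an
-- offending SSSS/TTTT row in either table. Sketch only, not verified on Redshift.
select col1, col2 from table1
union
select col1, col2 from table2
minus
(
  select col1, col2 from table1 where col3 = 'SSSS' and col4 = 'TTTT'
  union
  select col1, col2 from table2 where col3 = 'SSSS' and col4 = 'TTTT'
);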

You should use the EXCEPT operator.
EXCEPT and MINUS are two different versions of the same operator.
Here is the syntax of what your query should look like:
SELECT col1, col2 FROM table1
EXCEPT
SELECT col1, col2 FROM table1 WHERE col3 = 'SSSS' AND col4 = 'TTTT';
One important consideration is whether your desired answer calls for the AND or the OR operator. Do you want to see the records where col3 = 'SSSS' but col4 has a value other than 'TTTT'?
If the answer is no, you should use the version below:
SELECT col1, col2 FROM table1
EXCEPT
SELECT col1, col2 FROM table1 WHERE col3 = 'SSSS' OR col4 = 'TTTT';
You can learn more about the MINUS and EXCEPT operators in the Amazon Redshift documentation.

Related

Is there any way to load a comma-separated string into a single column in Hive?

When trying to insert a comma-separated string into a single column in Hive, I am getting the error 'TOK_STRINGLITERALSEQUENCE not supported in insert/values'. The statement I'm running is:
insert into table table_name values('llu'/t'ghf'/t'a,b,c,d'/t'gh,edf,ghu,kjhl'/t'1')
(here /t represents a tab delimiter)
Expected results:
col1  col2  col3     col4             col5
llu   ghf   a,b,c,d  gh,edf,ghu,kjhl  1
I'm not sure why you're using tab delimiters in the INSERT statement. This worked for me in Hive 1.2.1:
create table test (col1 STRING, col2 STRING, col3 STRING, col4 STRING, col5 STRING);
insert into table test values('llu','ghf','a,b,c,d','gh,edf,ghu,kjhl','1');
select * from test;
+------------+------------+------------+------------------+------------+
| test.col1  | test.col2  | test.col3  | test.col4        | test.col5  |
+------------+------------+------------+------------------+------------+
| llu        | ghf        | a,b,c,d    | gh,edf,ghu,kjhl  | 1          |
+------------+------------+------------+------------------+------------+

Joining two tables on historical date

I have two tables. The first one:
col1 | col2 | ColumnOfInterest | DateOfInterest
--------------------------------------------------------
abc | def | ghi | 2013-02-24 17:48:32.548
.
.
.
The second one:
ColumnOfInterest | DateChanged | col3 | col4
--------------------------------------------------------
ghi | 2012-08-13 06:28:11.092 | jkl | mno
ghi | 2012-10-16 23:54:07.613 | pqr | stu
ghi | 2013-01-29 14:13:18.502 | vwx | yz1
ghi | 2013-10-01 14:17:32.992 | 234 | 567
.
.
.
What I'm trying to do is make a 1:1 join between the two tables on ColumnOfInterest, so that each row of the first table picks up the second-table row whose DateChanged is the most recent one before its DateOfInterest.
That is, the line from the first table would be joined to the third line of the second table.
Do you have any ideas?
Thanks
select table1.ColumnOfInterest, max(table2.DateChanged)
from table1
join table2
  on table1.ColumnOfInterest = table2.ColumnOfInterest
 and table1.DateOfInterest >= table2.DateChanged
group by table1.ColumnOfInterest
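If you also need col3 and col4 from the matching history row, an untested sketch using a window function (table and column names taken from the question) could look like this:
-- For each row of table1, keep the latest table2 row whose DateChanged is on
-- or before DateOfInterest. Sketch only.
select *
from (
    select t1.col1, t1.col2, t1.ColumnOfInterest, t1.DateOfInterest,
           t2.DateChanged, t2.col3, t2.col4,
           row_number() over (
               partition by t1.ColumnOfInterest, t1.DateOfInterest
               order by t2.DateChanged desc
           ) as rn
    from table1 t1
    join table2 t2
      on t2.ColumnOfInterest = t1.ColumnOfInterest
     and t2.DateChanged <= t1.DateOfInterest
) x
where rn = 1;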
SELECT 'abc' col1,
       'def' col2,
       'ghi' ColumnOfInterest,
       CAST('2013-02-24 17:48:32.548' AS DATETIME) DateOfInterest
INTO #DateOfInterest;

CREATE TABLE #History
(
    ColumnOfInterest VARCHAR(5),
    DateChanged DATETIME,
    col3 VARCHAR(5),
    col4 VARCHAR(5)
);

INSERT INTO #History
VALUES ('ghi','2012-08-13 06:28:11.092','jkl','mno'),
       ('ghi','2012-10-16 23:54:07.613','pqr','stu'),
       ('ghi','2013-01-29 14:13:18.502','vwx','yz1'),
       ('ghi','2013-10-01 14:17:32.992','234','567');

;WITH CTE_Date_Ranges
AS
(
    SELECT ColumnOfInterest,
           DateChanged,
           LAG(DateChanged,1,GETDATE()) OVER (PARTITION BY ColumnOfInterest ORDER BY DateChanged) AS end_date,
           col3,
           col4
    FROM #History
)
SELECT B.*,
       A.*
FROM CTE_Date_Ranges A
INNER JOIN #DateOfInterest B
    ON B.DateOfInterest > A.DateChanged AND B.DateOfInterest < A.end_date
Results:
col1 col2 ColumnOfInterest DateOfInterest          ColumnOfInterest DateChanged             end_date                col3 col4
---- ---- ---------------- ----------------------- ---------------- ----------------------- ----------------------- ---- ----
abc  def  ghi              2013-02-24 17:48:32.547 ghi              2012-08-13 06:28:11.093 2015-04-21 18:46:46.967 jkl  mno

PostgreSQL XOR - How to check if only 1 column is filled in?

How can I simulate an XOR function in PostgreSQL? Or, at least, I think this is an XOR-kind-of situation.
Let's say the data is as follows:
id | col1 | col2 | col3
---+------+------+------
 1 |    1 |      |    4
 2 |      |    5 |    4
 3 |      |    8 |
 4 |   12 |    5 |    4
 5 |      |      |    4
 6 |    1 |      |
 7 |      |   12 |
And I want to return one column for those rows where only one of the columns is filled in (ignore col3 for now).
Let's start with this example of 2 columns:
SELECT
id, COALESCE(col1, col2) AS col
FROM
my_table
WHERE
COALESCE(col1, col2) IS NOT NULL -- at least 1 is filled in
AND
(col1 IS NULL OR col2 IS NULL) -- at least 1 is empty
;
This works nicely and should result in:
id | col
---+----
1 | 1
3 | 8
6 | 1
7 | 12
But now, I would like to include col3 in a similar way. Like this:
id | col
---+----
1 | 1
3 | 8
5 | 4
6 | 1
7 | 12
How can this be done in a more generic way? Does Postgres support such a method?
I'm not able to find anything like it.
Rows with exactly 1 column filled in:
select * from my_table where
 (col1 is not null)::integer
+(col2 is not null)::integer
+(col3 is not null)::integer
=1
Rows with 1 or 2 columns filled in:
select * from my_table where
 (col1 is not null)::integer
+(col2 is not null)::integer
+(col3 is not null)::integer
between 1 and 2
A CASE expression might be your friend here; the MIN() aggregate doesn't affect the result, it is only there because grouped queries require non-grouped expressions to be aggregated.
select id, min(coalesce(col1,col2,col3))
from my_table
group by 1
having sum(case when col1 is null then 0 else 1 end+
case when col2 is null then 0 else 1 end+
case when col3 is null then 0 else 1 end)=1
[Edit]
Well, I found a better answer without using aggregate functions. It's still based on CASE, but I think it is simpler:
select id, coalesce(col1,col2,col3)
from my_table
where (case when col1 is null then 0 else 1 end+
case when col2 is null then 0 else 1 end+
case when col3 is null then 0 else 1 end)=1
How about
select coalesce(col1, col2, col3)
from my_table
where array_length(array_remove(array[col1, col2, col3], null), 1) = 1
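For what it's worth, on PostgreSQL 9.6 and later the built-in num_nonnulls() function expresses the same check even more directly; here is a short sketch against the question's table:
-- num_nonnulls() counts how many of its arguments are not null (PostgreSQL 9.6+)
select id, coalesce(col1, col2, col3) as col
from my_table
where num_nonnulls(col1, col2, col3) = 1;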

Replacing a comma-separated value in a table with another in a select query (postgres)

I have two tables. Table A has an ID column whose values are comma separated; each of those ID values has a corresponding row in table B.
Table A
+------+-------+
| Name | ID    |
+------+-------+
| A1   | 1,2,3 |
| A2   | 2     |
| A3   | 3,2   |
+------+-------+
Table B
+----+--------+
| ID | Value  |
+----+--------+
| 1  | Apple  |
| 2  | Orange |
| 3  | Mango  |
+----+--------+
I was wondering if there is an efficient way to do a select where the result would be as below:
Name | Value
-----+---------------------
A1   | Apple, Orange, Mango
A2   | Orange
A3   | Mango, Orange
Any suggestions would be welcome. Thanks.
You first need to "normalize" table_a with a query like the following:
select name, regexp_split_to_table(id, ',') id
from table_a;
The result of this can be joined to table_b and the result of the join then needs to be grouped in order to get the comma separated list of the names:
select a.name, string_agg(b.value, ',')
from (
select name, regexp_split_to_table(id, ',') id
from table_a
) a
JOIN table_b b on b.id = a.id
group by a.name;
SQLFiddle: http://sqlfiddle.com/#!12/77fdf/1
There are two regex-related functions that can be useful:
http://www.postgresql.org/docs/current/static/functions-string.html
regexp_split_to_table()
regexp_split_to_array()
Code below is untested, but you'd use something like it to match A and B:
select name, value
from A
join B on B.id = ANY (regexp_split_to_array(A.id, E'\\s*,\\s*')::int[])
You can then use array_agg(value), grouping by name, and format using array_to_string().
Two notes, though:
It won't be as efficient as normalizing things.
The formatting itself ought to be done further down, in your views.
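Putting the pieces together, an untested sketch of the array_agg() step mentioned above (reusing the question's table and column names) might look like:
-- Split the CSV ids, join to B, then aggregate the matched values per name.
-- Note: array_agg() order is not guaranteed without an ORDER BY inside the aggregate.
select a.name,
       array_to_string(array_agg(b.value), ', ') as value
from A a
join B b on b.id = ANY (regexp_split_to_array(a.id, E'\\s*,\\s*')::int[])
group by a.name;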

Equivalent to unpivot() in PostgreSQL

Is there an unpivot-equivalent function in PostgreSQL?
Create an example table:
CREATE TEMP TABLE foo (id int, a text, b text, c text);
INSERT INTO foo VALUES (1, 'ant', 'cat', 'chimp'), (2, 'grape', 'mint', 'basil');
You can 'unpivot' or 'uncrosstab' using UNION ALL:
SELECT id,
       'a' AS colname,
       a AS thing
FROM foo
UNION ALL
SELECT id,
       'b' AS colname,
       b AS thing
FROM foo
UNION ALL
SELECT id,
       'c' AS colname,
       c AS thing
FROM foo
ORDER BY id;
This runs 3 different subqueries on foo, one for each column we want to unpivot, and returns, in one table, every record from each of the subqueries.
But that will scan the table N times, where N is the number of columns you want to unpivot. This is inefficient, and a big problem when, for example, you're working with a very large table that takes a long time to scan.
Instead, use:
SELECT id,
       unnest(array['a', 'b', 'c']) AS colname,
       unnest(array[a, b, c]) AS thing
FROM foo
ORDER BY id;
This is easier to write, and it will only scan the table once.
array[a, b, c] returns an array object, with the values of a, b, and c as its elements.
unnest(array[a, b, c]) breaks the results into one row for each of the array's elements.
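With the sample rows inserted above, either query should return something like:
 id | colname | thing
----+---------+-------
  1 | a       | ant
  1 | b       | cat
  1 | c       | chimp
  2 | a       | grape
  2 | b       | mint
  2 | c       | basil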
You could use VALUES() and JOIN LATERAL to unpivot the columns.
Sample data:
CREATE TABLE test(id int, a INT, b INT, c INT);
INSERT INTO test(id,a,b,c) VALUES (1,11,12,13),(2,21,22,23),(3,31,32,33);
Query:
SELECT t.id, s.col_name, s.col_value
FROM test t
JOIN LATERAL(VALUES('a',t.a),('b',t.b),('c',t.c)) s(col_name, col_value) ON TRUE;
DBFiddle Demo
Using this approach it is possible to unpivot multiple groups of columns at once.
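For example, here is an untested sketch of unpivoting two groups of columns in one pass (the paired columns a1/a2 and b1/b2 are made up for illustration):
-- Each VALUES row carries one label plus one value from each column group,
-- so both "wide" groups collapse into rows together.
CREATE TABLE test2(id int, a1 int, a2 int, b1 int, b2 int);
INSERT INTO test2 VALUES (1, 11, 12, 21, 22);

SELECT t.id, s.grp, s.first_val, s.second_val
FROM test2 t
CROSS JOIN LATERAL (
    VALUES ('a', t.a1, t.a2),
           ('b', t.b1, t.b2)
) s(grp, first_val, second_val);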
EDIT
Using Zack's suggestion:
SELECT t.id, col_name, col_value
FROM test t
CROSS JOIN LATERAL (VALUES('a', t.a),('b', t.b),('c',t.c)) s(col_name, col_value);
<=>
SELECT t.id, col_name, col_value
FROM test t
,LATERAL (VALUES('a', t.a),('b', t.b),('c',t.c)) s(col_name, col_value);
db<>fiddle demo
A great article by Thomas Kellerer covers this:
Unpivot with Postgres
Sometimes it’s necessary to normalize de-normalized tables - the opposite of a “crosstab” or “pivot” operation. Postgres does not support an UNPIVOT operator like Oracle or SQL Server, but simulating it is very simple.
Take the following table that stores aggregated values per quarter:
create table customer_turnover
(
  customer_id integer,
  q1 integer,
  q2 integer,
  q3 integer,
  q4 integer
);
And the following sample data:
customer_id | q1  | q2  | q3  | q4
------------+-----+-----+-----+-----
          1 | 100 | 210 | 203 | 304
          2 | 150 | 118 | 422 | 257
          3 | 220 | 311 | 271 | 269
But we want the quarters to be rows (as they should be in a normalized data model).
In Oracle or SQL Server this could be achieved with the UNPIVOT operator, but that is not available in Postgres. However Postgres’ ability to use the VALUES clause like a table makes this actually quite easy:
select c.customer_id, t.*
from customer_turnover c
cross join lateral (
values
(c.q1, 'Q1'),
(c.q2, 'Q2'),
(c.q3, 'Q3'),
(c.q4, 'Q4')
) as t(turnover, quarter)
order by customer_id, quarter;
will return the following result:
customer_id | turnover | quarter
------------+----------+--------
          1 |      100 | Q1
          1 |      210 | Q2
          1 |      203 | Q3
          1 |      304 | Q4
          2 |      150 | Q1
          2 |      118 | Q2
          2 |      422 | Q3
          2 |      257 | Q4
          3 |      220 | Q1
          3 |      311 | Q2
          3 |      271 | Q3
          3 |      269 | Q4
The equivalent query with the standard UNPIVOT operator would be:
select customer_id, turnover, quarter
from customer_turnover c
UNPIVOT (turnover for quarter in (q1 as 'Q1',
q2 as 'Q2',
q3 as 'Q3',
q4 as 'Q4'))
order by customer_id, quarter;
FYI for those of us looking for how to unpivot in Redshift:
The long-form solution given by Stew appears to be the only way to accomplish this.
For those who cannot see it at the original source, here is the text, pasted below:
We do not have built-in functions that will do pivot or unpivot. However,
you can always write SQL to do that.
create table sales (regionid integer, q1 integer, q2 integer, q3 integer, q4 integer);
insert into sales values (1,10,12,14,16), (2,20,22,24,26);
select * from sales order by regionid;
regionid | q1 | q2 | q3 | q4
----------+----+----+----+----
1 | 10 | 12 | 14 | 16
2 | 20 | 22 | 24 | 26
(2 rows)
unpivot query (columns to rows):
create table sales_pivoted (regionid, quarter, sales)
as
select regionid, 'Q1', q1 from sales
UNION ALL
select regionid, 'Q2', q2 from sales
UNION ALL
select regionid, 'Q3', q3 from sales
UNION ALL
select regionid, 'Q4', q4 from sales
;
select * from sales_pivoted order by regionid, quarter;
regionid | quarter | sales
----------+---------+-------
1 | Q1 | 10
1 | Q2 | 12
1 | Q3 | 14
1 | Q4 | 16
2 | Q1 | 20
2 | Q2 | 22
2 | Q3 | 24
2 | Q4 | 26
(8 rows)
pivot query (rows back to columns):
select regionid, sum(Q1) as Q1, sum(Q2) as Q2, sum(Q3) as Q3, sum(Q4) as Q4
from
(select regionid,
case quarter when 'Q1' then sales else 0 end as Q1,
case quarter when 'Q2' then sales else 0 end as Q2,
case quarter when 'Q3' then sales else 0 end as Q3,
case quarter when 'Q4' then sales else 0 end as Q4
from sales_pivoted)
group by regionid
order by regionid;
regionid | q1 | q2 | q3 | q4
----------+----+----+----+----
1 | 10 | 12 | 14 | 16
2 | 20 | 22 | 24 | 26
(2 rows)
Hope this helps, Neil
Pulling slightly modified content from the link in the comment from @a_horse_with_no_name into an answer, because it works:
Installing Hstore
If you don't have hstore installed and are running PostgreSQL 9.1+, you can use the handy
CREATE EXTENSION hstore;
For lower versions, look for the hstore.sql file in share/contrib and run it in your database.
Assuming that your source (e.g., wide data) table has one 'id' column, named id_field, and any number of 'value' columns, all of the same type, the following will create an unpivoted view of that table.
CREATE VIEW vw_unpivot AS
SELECT id_field, (h).key AS column_name, (h).value AS column_value
FROM (
SELECT id_field, each(hstore(foo) - 'id_field'::text) AS h
FROM zcta5 as foo
) AS unpiv ;
This works with any number of 'value' columns. All of the resulting values will be text, unless you cast, e.g., (h).value::numeric.
Just use JSON:
with data (id, name) as (
values (1, 'a'), (2, 'b')
)
select t.*
from data, lateral jsonb_each_text(to_jsonb(data)) with ordinality as t
order by data.id, t.ordinality;
This yields
|key |value|ordinality|
|----|-----|----------|
|id |1 |1 |
|name|a |2 |
|id |2 |1 |
|name|b |2 |
dbfiddle
I wrote a horrible unpivot function for PostgreSQL. It's rather slow but it at least returns results like you'd expect an unpivot operation to.
https://cgsrv1.arrc.csiro.au/blog/2010/05/14/unpivotuncrosstab-in-postgresql/
Hopefully you can find it useful.
Depending on what you want to do... something like this can be helpful.
with wide_table as (
select 1 a, 2 b, 3 c
union all
select 4 a, 5 b, 6 c
)
select unnest(array[a,b,c]) from wide_table
You can use FROM UNNEST() array handling to unpivot a dataset, in tandem with a correlated subquery (works with PG 9.4).
FROM UNNEST() is more powerful and flexible than the typical method of using FROM (VALUES .... ) to unpivot datasets, because UNNEST() is variadic (it accepts any number of arrays). By using a correlated subquery, the need for the WITH ORDINALITY clause is eliminated, and Postgres keeps the resulting parallel columnar sets in the proper ordinal sequence.
This is, BTW, FAST -- in practical use spawning 8 million rows in < 15 seconds on a 24-core system.
WITH _students AS ( /** CTE **/
SELECT * FROM
( SELECT 'jane'::TEXT ,'doe'::TEXT , 1::INT
UNION
SELECT 'john'::TEXT ,'doe'::TEXT , 2::INT
UNION
SELECT 'jerry'::TEXT ,'roe'::TEXT , 3::INT
UNION
SELECT 'jodi'::TEXT ,'roe'::TEXT , 4::INT
) s ( fn, ln, id )
) /** end WITH **/
SELECT s.id
, ax.fanm -- field labels, now expanded to two rows
, ax.anm -- field data, now expanded to two rows
, ax.someval -- manually incl. data
, ax.rankednum -- manually assigned ranks
,ax.genser -- auto-generate ranks
FROM _students s
,UNNEST /** MULTI-UNNEST() BLOCK **/
(
( SELECT ARRAY[ fn, ln ]::text[] AS anm -- expanded into two rows by outer UNNEST()
/** CORRELATED SUBQUERY **/
FROM _students s2 WHERE s2.id = s.id -- outer relation
)
,( /** ordinal relationship preserved in variadic UNNEST() **/
SELECT ARRAY[ 'first name', 'last name' ]::text[] -- exp. into 2 rows
AS fanm
)
,( SELECT ARRAY[ 'z','x','y'] -- only 3 rows gen'd, but ordinal rela. kept
AS someval
)
,( SELECT ARRAY[ 1,2,3,4,5 ] -- 5 rows gen'd, ordinal rela. kept.
AS rankednum
)
,( SELECT ARRAY( /** you may go wild ... **/
SELECT generate_series(1, 15, 3 )
AS genser
)
)
) ax ( anm, fanm, someval, rankednum , genser )
;
RESULT SET:
+--------+----------------+-----------+----------+---------+-------
| id | fanm | anm | someval |rankednum| [ etc. ]
+--------+----------------+-----------+----------+---------+-------
| 2 | first name | john | z | 1 | .
| 2 | last name | doe | y | 2 | .
| 2 | [null] | [null] | x | 3 | .
| 2 | [null] | [null] | [null] | 4 | .
| 2 | [null] | [null] | [null] | 5 | .
| 1 | first name | jane | z | 1 | .
| 1 | last name | doe | y | 2 | .
| 1 | | | x | 3 | .
| 1 | | | | 4 | .
| 1 | | | | 5 | .
| 4 | first name | jodi | z | 1 | .
| 4 | last name | roe | y | 2 | .
| 4 | | | x | 3 | .
| 4 | | | | 4 | .
| 4 | | | | 5 | .
| 3 | first name | jerry | z | 1 | .
| 3 | last name | roe | y | 2 | .
| 3 | | | x | 3 | .
| 3 | | | | 4 | .
| 3 | | | | 5 | .
+--------+----------------+-----------+----------+---------+ ----
Here's a way that combines the hstore and CROSS JOIN approaches from other answers.
It's a modified version of my answer to a similar question, which is itself based on the method at https://blog.sql-workbench.eu/post/dynamic-unpivot/ and another answer to that question.
-- Example wide data with a column for each year...
WITH example_wide_data("id", "2001", "2002", "2003", "2004") AS (
VALUES
(1, 4, 5, 6, 7),
(2, 8, 9, 10, 11)
)
-- that is tided to have "year" and "value" columns
SELECT
id,
r.key AS year,
r.value AS value
FROM
example_wide_data w
CROSS JOIN
each(hstore(w.*)) AS r(key, value)
WHERE
-- This chooses columns that look like years
-- In other cases you might need a different condition
r.key ~ '^[0-9]{4}$';
It has a few benefits over other solutions:
By using hstore and not jsonb, it hopefully minimises issues with type conversions (although hstore does convert everything to text)
The columns don't need to be hard coded or known in advance. Here, columns are chosen by a regex on the name, but you could use any SQL logic based on the name, or even the value.
It doesn't require PL/pgSQL - it's all SQL