Joining two tables on historical date - tsql

I have two tables. The first one:
col1 | col2 | ColumnOfInterest | DateOfInterest
--------------------------------------------------------
abc | def | ghi | 2013-02-24 17:48:32.548
.
.
.
The second one:
ColumnOfInterest | DateChanged | col3 | col4
--------------------------------------------------------
ghi | 2012-08-13 06:28:11.092 | jkl | mno
ghi | 2012-10-16 23:54:07.613 | pqr | stu
ghi | 2013-01-29 14:13:18.502 | vwx | yz1
ghi | 2013-10-01 14:17:32.992 | 234 | 567
.
.
.
What I'm trying to do is make a 1:1 join between the two tables on ColumnOfInterest, such that each DateOfInterest is matched to the history row that was in effect on that date.
That is, the line from the first table would be joined to the third line of the second table.
Do you have any ideas?
Thanks

select table1.ColumnOfInterest, max(table2.DateChanged)
from table1
join table2
on table2.ColumnOfInterest = table1.ColumnOfInterest
and table1.DateOfInterest >= table2.DateChanged
group by table1.ColumnOfInterest
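This returns only the matching DateChanged per key, though. A sketch that also pulls col3/col4 (assuming the same table and column names) correlates the max into the join:
select t1.col1, t1.col2, t1.ColumnOfInterest, t1.DateOfInterest, t2.col3, t2.col4
from table1 t1
join table2 t2
on t2.ColumnOfInterest = t1.ColumnOfInterest
-- the latest change at or before the date of interest
and t2.DateChanged = (select max(t2i.DateChanged)
                      from table2 t2i
                      where t2i.ColumnOfInterest = t1.ColumnOfInterest
                        and t2i.DateChanged <= t1.DateOfInterest)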

SELECT 'abc' col1,
'def' col2,
'ghi' ColumnOfInterest,
CAST('2013-02-24 17:48:32.548' AS DATETIME) DateOfInterest
INTO #DateOfInterest
CREATE TABLE #History
(
ColumnOfInterest VARCHAR(5),
DateChanged DATETIME,
col3 VARCHAR(5),
col4 VARCHAR(5)
)
INSERT INTO #History
VALUES ('ghi','2012-08-13 06:28:11.092','jkl','mno'),
('ghi','2012-10-16 23:54:07.613','pqr','stu'),
('ghi','2013-01-29 14:13:18.502','vwx','yz1'),
('ghi','2013-10-01 14:17:32.992','234','567');
;WITH CTE_Date_Ranges
AS
(
SELECT ColumnOfInterest,
DateChanged,
LEAD(DateChanged,1,GETDATE()) OVER (PARTITION BY ColumnOfInterest ORDER BY DateChanged) AS end_date,
col3,
col4
FROM #History
)
SELECT B.*,
A.*
FROM CTE_Date_Ranges A
INNER JOIN #DateOfInterest B
ON B.DateOfInterest > A.DateChanged AND B.DateOfInterest < A.end_date
Results:
col1 col2 ColumnOfInterest DateOfInterest ColumnOfInterest DateChanged end_date col3 col4
---- ---- ---------------- ----------------------- ---------------- ----------------------- ----------------------- ----- -----
abc def ghi 2013-02-24 17:48:32.547 ghi 2013-01-29 14:13:18.503 2013-10-01 14:17:32.993 vwx yz1
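An alternative sketch against the same temp tables uses CROSS APPLY with TOP (1), which picks the latest history row at or before each DateOfInterest without a window function:
SELECT d.*, h.*
FROM #DateOfInterest d
CROSS APPLY (
    -- latest #History row in effect on d.DateOfInterest
    SELECT TOP (1) *
    FROM #History h2
    WHERE h2.ColumnOfInterest = d.ColumnOfInterest
      AND h2.DateChanged <= d.DateOfInterest
    ORDER BY h2.DateChanged DESC
) h;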

How to construct a LATERAL JOIN in SQLAlchemy

I have a table tb like this:
db=# SELECT * FROM tb ORDER by id;
id | name | luckynumber
----+--------+-------------
1 | Alice | 100
2 | Bob | 300
3 | Chad | 13
4 | Dave | 123
5 | Eve | 130
6 | Faythe | 777
(6 rows)
The SQL below returns the modulo of each person's luckynumber with respect to Chad's lucky number:
db=# SELECT name, mod(a.luckynumber, b.luckynumber) as modulo FROM tb AS a,
LATERAL (SELECT id, luckynumber FROM tb WHERE name = 'Chad') AS b
WHERE a.id <> b.id ORDER by modulo;
name | modulo
--------+--------
Eve | 0
Bob | 1
Dave | 6
Alice | 9
Faythe | 10
(5 rows)
What is the SQLAlchemy equivalent of this SQL? Thank you!
To complement a_horse_with_no_name's answer... or with a simple join:
select a.name, mod(a.luckynumber, b.luckynumber) as modulo
FROM tb AS a
join tb as b ON a.id <> b.id
where b.name = 'Chad'
ORDER by modulo;

Selecting on a condition in window function postgresql

I am using PostgreSQL and applying a window function. Previously I had to find the first g_id with the same last name and address (street_address and city), so I simply put the last name in the PARTITION BY clause of the window function.
But now I have a requirement to find the first g_id whose last name is not the same, while the address is the same. How can I do it?
This is what I was doing previously:
SELECT g_id as g_id,
First_value(g_id)
OVER (PARTITION BY l_name, street_address, city
ORDER BY last_date DESC NULLS LAST) as c_id,
street_address as street_address FROM mytable;
Let's say this is my DB:
g_id | l_name | street_address | city | last_date
_________________________________________________
x1 | bar | abc road | khi | 11-6-19
x2 | bar | abc road | khi | 12-6-19
x3 | foo | abc road | khi | 19-6-19
x4 | harry | abc road | khi | 17-6-19
x5 | bar | xyz road | khi | 11-6-19
_________________________________________________
In the previous scenario:
if I run it for the first row, my c_id should return 'x2', as it considers these rows:
_________________________________________________
g_id | l_name | street_address | city | last_date
_________________________________________________
x1 | bar | abc road | khi | 11-6-19
x2 | bar | abc road | khi | 12-6-19
_________________________________________________
and returns the row with the latest last_date.
What I want now is to select these rows (rows with the same street_address and city but a different l_name):
g_id | l_name | street_address | city | last_date
_________________________________________________
x1 | bar | abc road | khi | 11-6-19
x3 | foo | abc road | khi | 19-6-19
x4 | harry | abc road | khi | 17-6-19
_________________________________________________
and the output will be x3.
Somehow I want to compare the l_name column against the current row's value (not equal) and then partition by the address fields; and if no rows satisfy the condition, c_id should be equal to the current g_id.
Looking at your expected output, it's not clear whether you want the earliest or the latest row for each group. You may change the ORDER BY for last_date accordingly in this query, which uses DISTINCT ON:
SELECT DISTINCT ON ( street_address, city, l_name) *
FROM mytable
ORDER BY street_address,
city,
l_name,
last_date --change this to last_date desc if you want latest
DEMO
After discussing the details in this chat:
demo:db<>fiddle
SELECT DISTINCT ON (t1.g_id)
t1.*,
COALESCE(t2.g_id, t1.g_id) AS g_id
FROM
mytable t1
LEFT JOIN mytable t2
ON t1.street_address = t2.street_address AND t1.l_name != t2.l_name
ORDER BY t1.g_id, t2.last_date DESC
Here is how I solved it using a subquery.
Creating the example table:
CREATE TABLE mytable
("g_id" varchar(2), "l_name" varchar(5), "street_address" varchar(8), "city" varchar(3), "last_date" date)
;
INSERT INTO mytable
("g_id", "l_name", "street_address", "city", "last_date")
VALUES
('x1', 'bar', 'abc road', 'khi', '11-6-19'),
('x2', 'bar', 'abc road', 'khi', '12-6-19'),
('x3', 'foo', 'abc road', 'khi', '19-6-19'),
('x4', 'harry', 'abc road', 'khi', '17-6-19'),
('x5', 'bar', 'xyz road', 'khi', '11-6-19')
;
Query to get the g_ids:
SELECT * ,
(select b.g_id from mytable b where (base.g_id = b.g_id) or (base.l_name <>
b.l_name and base.street_address = b.street_address and base.city = b.city )
order by b.last_date desc limit 1)
from mytable base
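A LEFT JOIN LATERAL sketch of the same idea (Postgres 9.3+, assuming the mytable created above) picks the single best-matching row per base row without DISTINCT ON:
SELECT base.*,
       COALESCE(m.g_id, base.g_id) AS c_id
FROM mytable base
LEFT JOIN LATERAL (
    SELECT b.g_id
    FROM mytable b
    WHERE b.street_address = base.street_address
      AND b.city = base.city
      AND b.l_name <> base.l_name
    ORDER BY b.last_date DESC  -- latest row among the differing last names
    LIMIT 1
) m ON TRUE;  -- LEFT ... ON TRUE keeps unmatched rows, so COALESCE falls back to base.g_id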

Oracle: How to group records by certain columns before fetching results

I have a table in Redshift that looks like this:
col1 | col2 | col3 | col4 | col5 | col6
=======================================
123 | AB | SSSS | TTTT | PQR | XYZ
---------------------------------------
123 | AB | SSTT | TSTS | PQR | XYZ
---------------------------------------
123 | AB | PQRS | WXYZ | PQR | XYZ
---------------------------------------
123 | CD | SSTT | TSTS | PQR | XYZ
---------------------------------------
123 | CD | PQRS | WXYZ | PQR | XYZ
---------------------------------------
456 | AB | GGGG | RRRR | OPQ | RST
---------------------------------------
456 | AB | SSTT | TSTS | PQR | XYZ
---------------------------------------
456 | AB | PQRS | WXYZ | PQR | XYZ
I have another table that also has a similar structure and data.
From these tables, I need to select values that don't have 'SSSS' in col3 and 'TTTT' in col4 in either of the tables. I'd also need to group my results by the value in col1 and col2.
Here, I'd like my query to return:
123,CD
456,AB
I don't want 123, AB to be in my results, since one of the rows corresponding to 123, AB has SSSS and TTTT in col3 and col4 respectively. I.e., I want to omit items that have SSSS and TTTT in col3 and col4 in either of the two tables that I'm looking up.
I am very new to writing queries to extract information from a database, so please bear with my ignorance. I was told to explore GROUP BY and ORDER BY, but I am not sure I understand their usage well enough yet.
The query I have looks like:
SELECT * from table1 join table2 on
table1.col1 = table2.col1 AND
table1.col2 = table2.col2
WHERE
col3 NOT LIKE 'SSSS' AND
col4 NOT LIKE 'TTTT'
GROUP BY col1,col2
However, this query throws an error: col5 must appear in the GROUP BY clause or be used in an aggregate function;
I'm not sure how to proceed. I'd appreciate any help. Thank you!
It seems you also want DISTINCT results. In this case a solution with MINUS is probably as efficient as any other (and, remember, MINUS automatically also means DISTINCT):
select col1, col2 from table_name -- enter your column and table names here
minus
select col1, col2 from table_name where col3 = 'SSSS' and col4 = 'TTTT'
;
No need to group by anything!
With that said, here is a solution using GROUP BY. Note that the HAVING condition uses a non-trivial aggregate function: it is a COUNT(), but what is counted is a CASE expression that encodes the requirement. Note also that the aggregate function in the HAVING clause is not required to appear in the SELECT list!
select col1, col2
from table_name
group by col1, col2
having count(case when col3 = 'SSSS' and col4 = 'TTTT' then 1 else null end) = 0
;
You should use the EXCEPT operator.
EXCEPT and MINUS are two different versions of the same operator.
Here is the syntax of what your query should look like
SELECT col1, col2 FROM table1
EXCEPT
SELECT col1, col2 FROM table1 WHERE col3 = 'SSSS' AND col4 = 'TTTT';
One important consideration is whether your desired answer requires the AND or the OR operator. Do you want to see the records where col3 = 'SSSS' but col4 has a value other than 'TTTT'?
If the answer is no, you should use the version below:
SELECT col1, col2 FROM table1
EXCEPT
SELECT col1, col2 FROM table1 WHERE col3 = 'SSSS' OR col4 = 'TTTT';
You can learn more about the MINUS and EXCEPT operators in the Amazon Redshift documentation.
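For completeness, a NOT EXISTS sketch (same assumed table and column names) expresses the requirement directly and is portable across Redshift, Oracle, and most other engines:
SELECT DISTINCT t.col1, t.col2
FROM table1 t
WHERE NOT EXISTS (
    SELECT 1
    FROM table1 x
    WHERE x.col1 = t.col1
      AND x.col2 = t.col2
      AND x.col3 = 'SSSS'  -- the offending combination
      AND x.col4 = 'TTTT'
);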

Showing only TOP 1 value result from join duplicates

I have 3 tables like below. You will see how they are joined.
Orders Table
+---------+------------+
| Orderid | LocationId |
+---------+------------+
| 36 | 14 |
| 38 | 13 |
+---------+------------+
OrdersDetails Table
+-----------+------------+
| Detailsid | OrderId |
+-----------+------------+
| 38 | 36 |
| 39 | 36 |
| 40 | 38 |
+-----------+------------+
OrderLocations
+------------+------------+
| Locationid | DistanceKM |
+------------+------------+
| 13 | 550 |
| 14 | 245 |
+------------+------------+
When doing an inner join of the 3 tables we get:
+---------+-----------+------------+
| Orderid | Detailsid | DistanceKM |
+---------+-----------+------------+
| 36      | 38        | 245        |
| 36      | 39        | 245        |
| 38      | 40        | 550        |
+---------+-----------+------------+
I don't want to have a duplicate DistanceKM, e.g. 245. I would like a 0 instead for line item 2, like this:
+---------+-----------+------------+
| Orderid | Detailsid | DistanceKM |
+---------+-----------+------------+
| 36      | 38        | 245        |
| 36      | 39        | 0          |
| 38      | 40        | 550        |
+---------+-----------+------------+
Here is my solution:
Creating tables:
CREATE TABLE #Orders
(
Orderid INT, LocationId INT
);
INSERT INTO #Orders
VALUES (36, 14),
(38, 13);
CREATE TABLE #OrdersDetails
(
Detailsid INT, OrderId INT
);
INSERT INTO #OrdersDetails
VALUES (38, 36),
(39, 36),
(40, 38);
CREATE TABLE #OrderLocations
(
Locationid INT, DistanceKM INT
);
INSERT INTO #OrderLocations
VALUES (13, 550),
(14, 245);
The actual query:
;WITH cte
AS
(SELECT o.Orderid, d.Detailsid, l.DistanceKM,
ROW_NUMBER() OVER (PARTITION BY l.DistanceKM ORDER BY o.Orderid) AS rn
FROM #Orders AS o
INNER JOIN #OrdersDetails AS d ON o.Orderid = d.OrderId
INNER JOIN #OrderLocations AS l ON o.LocationId = l.Locationid
)
SELECT cte.Orderid, cte.Detailsid,
CASE WHEN cte.rn > 1 THEN 0 ELSE cte.DistanceKM END AS DistanceKM
FROM cte;
And here are the results:
+---------+-----------+------------+
| Orderid | Detailsid | DistanceKM |
+---------+-----------+------------+
| 36      | 38        | 245        |
| 36      | 39        | 0          |
| 38      | 40        | 550        |
+---------+-----------+------------+
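As a side note, the window function can also live directly inside the CASE expression, so the CTE is optional; a minimal sketch against the same temp tables (with Detailsid added as a tiebreaker to make the numbering deterministic):
SELECT o.Orderid, d.Detailsid,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY l.DistanceKM
                                    ORDER BY o.Orderid, d.Detailsid) > 1
            THEN 0 ELSE l.DistanceKM
       END AS DistanceKM
FROM #Orders o
JOIN #OrdersDetails d ON d.OrderId = o.Orderid
JOIN #OrderLocations l ON l.Locationid = o.LocationId;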

Equivalent to unpivot() in PostgreSQL

Is there an unpivot equivalent function in PostgreSQL?
Create an example table:
CREATE TEMP TABLE foo (id int, a text, b text, c text);
INSERT INTO foo VALUES (1, 'ant', 'cat', 'chimp'), (2, 'grape', 'mint', 'basil');
You can 'unpivot' or 'uncrosstab' using UNION ALL:
SELECT id,
'a' AS colname,
a AS thing
FROM foo
UNION ALL
SELECT id,
'b' AS colname,
b AS thing
FROM foo
UNION ALL
SELECT id,
'c' AS colname,
c AS thing
FROM foo
ORDER BY id;
This runs 3 different subqueries on foo, one for each column we want to unpivot, and returns, in one table, every record from each of the subqueries.
But that will scan the table N times, where N is the number of columns you want to unpivot. This is inefficient, and a big problem when, for example, you're working with a very large table that takes a long time to scan.
Instead, use:
SELECT id,
unnest(array['a', 'b', 'c']) AS colname,
unnest(array[a, b, c]) AS thing
FROM foo
ORDER BY id;
This is easier to write, and it will only scan the table once.
array[a, b, c] returns an array object, with the values of a, b, and c as its elements.
unnest(array[a, b, c]) breaks the results into one row for each of the array's elements.
You could use VALUES() and JOIN LATERAL to unpivot the columns.
Sample data:
CREATE TABLE test(id int, a INT, b INT, c INT);
INSERT INTO test(id,a,b,c) VALUES (1,11,12,13),(2,21,22,23),(3,31,32,33);
Query:
SELECT t.id, s.col_name, s.col_value
FROM test t
JOIN LATERAL(VALUES('a',t.a),('b',t.b),('c',t.c)) s(col_name, col_value) ON TRUE;
DBFiddle Demo
Using this approach it is possible to unpivot multiple groups of columns at once.
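For example, a hypothetical sketch with two paired column groups (a1,b1) and (a2,b2) in a table named pairs; each row of the VALUES list emits one label plus one value from each group:
SELECT t.id, s.grp, s.a_val, s.b_val
FROM pairs t
CROSS JOIN LATERAL (
    VALUES ('g1', t.a1, t.b1),  -- first column group
           ('g2', t.a2, t.b2)   -- second column group
) s(grp, a_val, b_val);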
EDIT
Using Zack's suggestion:
SELECT t.id, col_name, col_value
FROM test t
CROSS JOIN LATERAL (VALUES('a', t.a),('b', t.b),('c',t.c)) s(col_name, col_value);
which is equivalent to:
SELECT t.id, col_name, col_value
FROM test t
,LATERAL (VALUES('a', t.a),('b', t.b),('c',t.c)) s(col_name, col_value);
db<>fiddle demo
Great article by Thomas Kellerer found here
Unpivot with Postgres
Sometimes it’s necessary to normalize de-normalized tables - the opposite of a “crosstab” or “pivot” operation. Postgres does not support an UNPIVOT operator like Oracle or SQL Server, but simulating it is very simple.
Take the following table that stores aggregated values per quarter:
create table customer_turnover
(
customer_id integer,
q1 integer,
q2 integer,
q3 integer,
q4 integer
);
And the following sample data:
customer_id | q1 | q2 | q3 | q4
------------+-----+-----+-----+----
1 | 100 | 210 | 203 | 304
2 | 150 | 118 | 422 | 257
3 | 220 | 311 | 271 | 269
But we want the quarters to be rows (as they should be in a normalized data model).
In Oracle or SQL Server this could be achieved with the UNPIVOT operator, but that is not available in Postgres. However Postgres’ ability to use the VALUES clause like a table makes this actually quite easy:
select c.customer_id, t.*
from customer_turnover c
cross join lateral (
values
(c.q1, 'Q1'),
(c.q2, 'Q2'),
(c.q3, 'Q3'),
(c.q4, 'Q4')
) as t(turnover, quarter)
order by customer_id, quarter;
will return the following result:
customer_id | turnover | quarter
------------+----------+--------
1 | 100 | Q1
1 | 210 | Q2
1 | 203 | Q3
1 | 304 | Q4
2 | 150 | Q1
2 | 118 | Q2
2 | 422 | Q3
2 | 257 | Q4
3 | 220 | Q1
3 | 311 | Q2
3 | 271 | Q3
3 | 269 | Q4
The equivalent query with the standard UNPIVOT operator would be:
select customer_id, turnover, quarter
from customer_turnover c
UNPIVOT (turnover for quarter in (q1 as 'Q1',
q2 as 'Q2',
q3 as 'Q3',
q4 as 'Q4'))
order by customer_id, quarter;
FYI for those of us looking for how to unpivot in RedShift.
The long form solution given by Stew appears to be the only way to accomplish this.
For those who cannot see it there, here is the text pasted below:
We do not have built-in functions that will do pivot or unpivot. However,
you can always write SQL to do that.
create table sales (regionid integer, q1 integer, q2 integer, q3 integer, q4 integer);
insert into sales values (1,10,12,14,16), (2,20,22,24,26);
select * from sales order by regionid;
regionid | q1 | q2 | q3 | q4
----------+----+----+----+----
1 | 10 | 12 | 14 | 16
2 | 20 | 22 | 24 | 26
(2 rows)
pivot query
create table sales_pivoted (regionid, quarter, sales)
as
select regionid, 'Q1', q1 from sales
UNION ALL
select regionid, 'Q2', q2 from sales
UNION ALL
select regionid, 'Q3', q3 from sales
UNION ALL
select regionid, 'Q4', q4 from sales
;
select * from sales_pivoted order by regionid, quarter;
regionid | quarter | sales
----------+---------+-------
1 | Q1 | 10
1 | Q2 | 12
1 | Q3 | 14
1 | Q4 | 16
2 | Q1 | 20
2 | Q2 | 22
2 | Q3 | 24
2 | Q4 | 26
(8 rows)
unpivot query
select regionid, sum(Q1) as Q1, sum(Q2) as Q2, sum(Q3) as Q3, sum(Q4) as Q4
from
(select regionid,
case quarter when 'Q1' then sales else 0 end as Q1,
case quarter when 'Q2' then sales else 0 end as Q2,
case quarter when 'Q3' then sales else 0 end as Q3,
case quarter when 'Q4' then sales else 0 end as Q4
from sales_pivoted)
group by regionid
order by regionid;
regionid | q1 | q2 | q3 | q4
----------+----+----+----+----
1 | 10 | 12 | 14 | 16
2 | 20 | 22 | 24 | 26
(2 rows)
Hope this helps, Neil
Pulling slightly modified content from the link in the comment from @a_horse_with_no_name into an answer because it works:
Installing Hstore
If you don't have hstore installed and are running PostgreSQL 9.1+, you can use the handy
CREATE EXTENSION hstore;
For lower versions, look for the hstore.sql file in share/contrib and run in your database.
Assuming that your source (e.g., wide data) table has one 'id' column, named id_field, and any number of 'value' columns, all of the same type, the following will create an unpivoted view of that table.
CREATE VIEW vw_unpivot AS
SELECT id_field, (h).key AS column_name, (h).value AS column_value
FROM (
SELECT id_field, each(hstore(foo) - 'id_field'::text) AS h
FROM zcta5 as foo
) AS unpiv ;
This works with any number of 'value' columns. All of the resulting values will be text, unless you cast, e.g., (h).value::numeric.
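Hypothetical usage of the view, casting the text values back to a concrete type (assuming the unpivoted columns hold numbers; the filter is just an example):
SELECT id_field, column_name, column_value::numeric
FROM vw_unpivot
WHERE column_name <> 'notes';  -- example: skip a non-numeric column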
Just use JSON:
with data (id, name) as (
values (1, 'a'), (2, 'b')
)
select t.*
from data, lateral jsonb_each_text(to_jsonb(data)) with ordinality as t
order by data.id, t.ordinality;
This yields
|key |value|ordinality|
|----|-----|----------|
|id |1 |1 |
|name|a |2 |
|id |2 |1 |
|name|b |2 |
dbfiddle
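Note that the key column itself (id above) also comes back as a row. A sketch that strips it first with the jsonb minus-text operator (PostgreSQL 9.5+):
with data (id, name) as (
    values (1, 'a'), (2, 'b')
)
select d.id, t.key, t.value
from data d,
     lateral jsonb_each_text(to_jsonb(d) - 'id') as t  -- drop the id key before unpivoting
order by d.id, t.key;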
I wrote a horrible unpivot function for PostgreSQL. It's rather slow but it at least returns results like you'd expect an unpivot operation to.
https://cgsrv1.arrc.csiro.au/blog/2010/05/14/unpivotuncrosstab-in-postgresql/
Hopefully you can find it useful.
Depending on what you want to do... something like this can be helpful.
with wide_table as (
select 1 a, 2 b, 3 c
union all
select 4 a, 5 b, 6 c
)
select unnest(array[a,b,c]) from wide_table
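A small extension of that sketch carries the column names along in a parallel array; equal-length arrays in the select list unnest in lockstep:
with wide_table as (
    select 1 a, 2 b, 3 c
    union all
    select 4 a, 5 b, 6 c
)
select unnest(array['a','b','c']) as colname,  -- labels
       unnest(array[a,b,c]) as value           -- matching values
from wide_table;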
You can use UNNEST() in the FROM clause to unpivot a dataset, in tandem with a correlated subquery (works with PG 9.4).
FROM UNNEST() is more powerful and flexible than the typical method of using FROM (VALUES ....) to unpivot datasets, because UNNEST() is variadic. By using a correlated subquery, the need for a LATERAL ... WITH ORDINALITY clause is eliminated, and Postgres keeps the resulting parallel columnar sets in the proper ordinal sequence.
This is, by the way, fast: in practical use it spawns 8 million rows in under 15 seconds on a 24-core system.
WITH _students AS ( /** CTE **/
SELECT * FROM
( SELECT 'jane'::TEXT ,'doe'::TEXT , 1::INT
UNION
SELECT 'john'::TEXT ,'doe'::TEXT , 2::INT
UNION
SELECT 'jerry'::TEXT ,'roe'::TEXT , 3::INT
UNION
SELECT 'jodi'::TEXT ,'roe'::TEXT , 4::INT
) s ( fn, ln, id )
) /** end WITH **/
SELECT s.id
, ax.fanm -- field labels, now expanded to two rows
, ax.anm -- field data, now expanded to two rows
, ax.someval -- manually incl. data
, ax.rankednum -- manually assigned ranks
,ax.genser -- auto-generate ranks
FROM _students s
,UNNEST /** MULTI-UNNEST() BLOCK **/
(
( SELECT ARRAY[ fn, ln ]::text[] AS anm -- expanded into two rows by outer UNNEST()
/** CORRELATED SUBQUERY **/
FROM _students s2 WHERE s2.id = s.id -- outer relation
)
,( /** ordinal relationship preserved in variadic UNNEST() **/
SELECT ARRAY[ 'first name', 'last name' ]::text[] -- exp. into 2 rows
AS fanm
)
,( SELECT ARRAY[ 'z','x','y'] -- only 3 rows gen'd, but ordinal rela. kept
AS someval
)
,( SELECT ARRAY[ 1,2,3,4,5 ] -- 5 rows gen'd, ordinal rela. kept.
AS rankednum
)
,( SELECT ARRAY( /** you may go wild ... **/
SELECT generate_series(1, 15, 3 )
AS genser
)
)
) ax ( anm, fanm, someval, rankednum , genser )
;
RESULT SET:
+--------+----------------+-----------+----------+---------+-------
| id | fanm | anm | someval |rankednum| [ etc. ]
+--------+----------------+-----------+----------+---------+-------
| 2 | first name | john | z | 1 | .
| 2 | last name | doe | y | 2 | .
| 2 | [null] | [null] | x | 3 | .
| 2 | [null] | [null] | [null] | 4 | .
| 2 | [null] | [null] | [null] | 5 | .
| 1 | first name | jane | z | 1 | .
| 1 | last name | doe | y | 2 | .
| 1 | | | x | 3 | .
| 1 | | | | 4 | .
| 1 | | | | 5 | .
| 4 | first name | jodi | z | 1 | .
| 4 | last name | roe | y | 2 | .
| 4 | | | x | 3 | .
| 4 | | | | 4 | .
| 4 | | | | 5 | .
| 3 | first name | jerry | z | 1 | .
| 3 | last name | roe | y | 2 | .
| 3 | | | x | 3 | .
| 3 | | | | 4 | .
| 3 | | | | 5 | .
+--------+----------------+-----------+----------+---------+ ----
Here's a way that combines the hstore and CROSS JOIN approaches from other answers.
It's a modified version of my answer to a similar question, which is itself based on the method at https://blog.sql-workbench.eu/post/dynamic-unpivot/ and another answer to that question.
-- Example wide data with a column for each year...
WITH example_wide_data("id", "2001", "2002", "2003", "2004") AS (
VALUES
(1, 4, 5, 6, 7),
(2, 8, 9, 10, 11)
)
-- that is tidied to have "year" and "value" columns
SELECT
id,
r.key AS year,
r.value AS value
FROM
example_wide_data w
CROSS JOIN
each(hstore(w.*)) AS r(key, value)
WHERE
-- This chooses columns that look like years
-- In other cases you might need a different condition
r.key ~ '^[0-9]{4}$';
It has a few benefits over other solutions:
By using hstore and not jsonb, it hopefully minimises issues with type conversions (although hstore does convert everything to text)
The columns don't need to be hard coded or known in advance. Here, columns are chosen by a regex on the name, but you could use any SQL logic based on the name, or even the value.
It doesn't require PL/pgSQL - it's all SQL