How to query two tables with same schema but different time ranges in a date column - postgresql

I have two tables:
main_products
old_products
They have the same info and schema with only one difference:
main_products has min(date) = 2022-01 and max(date) = 2022-05
and
old_products has min(date) = 2020-01 and max(date) = 2020-12
How can I query to get all records from old_products + all records from main_products to get products from 2020-01 to 2022-05 ?
The product on both tables has and product_id field.
I tried to join both tables on product_id but the output is a table with twice number of columns.
select t1.*, t2.* from t1
inner join t2
one t1.product_id = t2.product_id

I think you are looking for a UNION or UNION ALL:
SELECT *
FROM t1
WHERE ...
UNION ALL
SELECT *
FROM t2
WHERE ...
If the columns in t1 and t2 are the same (same number of columns and same types), this will pull the data from both of them. Use UNION if you want duplicates removed or UNION ALL to include duplicates. (In your case it won't make a functional difference since the tables don't overlap by date, but UNION ALL will be faster.)
In the above example, you can put your condition (to only get 2022-01 to 2022-05) in both WHERE conditions. If you don't like repeating the condition, you can use the UNION ALL query in a subquery with the condition outside:
SELECT *
FROM
(
SELECT *
FROM t1
UNION ALL
SELECT *
FROM t2
) sq
WHERE ...

Related

Merge Row values from different columns to one column on top of each other: MySQL

I have Table1 with columns:
Fname,Sname
Table 2 with columns:
Fname,Lname
Now,in a query I want to take all the values from these two tables (First names and last names [Sname in table 1 is Lname]) and return to one columns.
Basically I want to create column to get list of participants which include everyone from these two tables.
Is it possible?
Both the tables are joined indirectly via third table.
You can use UNION ALL will give all the rows from both tables.
SELECT
Fname,
Sname,
CONCAT(Fname,Sname) AS FSname
FROM table1
UNION ALL
SELECT
Fname,
Lname,
CONCAT(Fname,Lname)
FROM table2;
The column names are taken from the first SELECT.
If you use UNION and not UNION ALL rows in table2 which are duplicates of table1 will be omitted, but it can run slower as the values have to be compared.
You can use the third table and LEFT JOIN onto both tables and use COALESCE which returns the first argument which is not null.
SELECT
COALESCE(t1.Fname,t2.Fname),
COALESCE(t1.Sname,t2.Lname),
CONCAT(
COALESCE(t1.Fname,t2.Fname),
COALESCE(t1.Sname,t2.Lname)
) AS FSname
FROM third_table t3
LEFT JOIN table1 t1
ON t1.id = t3.id
LEFT JOIN table2 t2
ON t2.id = te.id;

Date in one table is before date in another table - Postgres

I have a table 1
and Table 2
I need to get the following table where the date from table 1 is the closest (i.e. before) to the date from table 2 by id.
I assume I need to join two table where table1.id=table2.id and table1.date<=table2.date and then, rank to get the 'last' record in that merged table? Is it correct? Is there a simpler way?
You can see structure and result in: dbfiddle
select
distinct on (t1.id)
t1.id,
last_value(t1.type) over (order by to_date(t1.date, 'mm/dd/yyyy') desc)
from
table1 t1 inner join table2 t2 on t1.id = t2.id
where
to_date(t1.date, 'mm/dd/yyyy') <= to_date(t2.date, 'mm/dd/yyyy');

Find the next oldest row in Redshift

I have a table called user_activity in Redshift that has department, user_id, activity_type, activity_id, activity_date.
I'd like to query a daily report of how many days since the last event (of any type). Using CROSS APPLY (SQL Server) or LATERAL JOIN (Postgres 9+), I'd do something like...
SELECT d.date, a.last_activity_date
FROM date_table d
CROSS JOIN (
SELECT DISTINCT user_id FROM activity_table
) u
CROSS APPLY (
SELECT TOP 1 activity_date as last_activity_date
FROM activity_table
WHERE user_id = u.user_id AND activity_date <= d.date
ORDER BY activity_date DESC
) a
For now, I write it similar to the below, but it is a bit slow and I am afraid it'll only get slower.
with user_activity as (
select distinct activity_date, user_id from activity_table
)
select
d.date, u.user_id,
max(u.activity_date) as last_activity_date
from date_table d
inner join user_activity u on u.activity_date <= d.date
where d.date between '2020-01-01' and current_date
group by 1, 2
Can someone suggest a good alternative for my needs or for CROSS APPLY / LATERAL JOIN.
As you are seeing cross joining and inequality joining will slow down as you data grows and are generally not the approach you want in Redshift. This is due to the data size increase that comes with this type of action when applied to large data tables that are typical in Redshift.
You want to use window functions to perform this type of analysis. But you will need to step back and rethink how you will structure the SQL. A MAX(activity_date) window function, partitioned by user_id and ordered by date and with a frame clause of all preceding rows, will find the most recent activity to any activity.
Now this will produce only rows for user_ids and dates that exist in the data table and it looks like you want 1 row for each date for each user_id, right? To do this you need to UNION in a frame of data that has 1 row for each date for each user_id ahead of the window function. You will need NULLs in for the other columns so that the data widths match. You will also want the dates in a separate column from activity_date. Now all dates for all user ids will be in the source and the window function will give you the result you want.
You also ask ‘how is this better than the joins?’ Well in the joins you are replicating all the data records by the number of dates which can get really big. In this approach you just have the original data records plus one row per user_id per date (which is the size of your output) and as the number of records per user_id grows this approach doesn’t.
——— Request to modify asker’s code per comments made to their approach ———
Your code is definitely on the right track as you have removed the massive inequality join of your original. I made 2 comments about it. The first is that I believe you need GROUP BY user_id, date to prevent multiple rows per user_id per date that would result if there are records for the same user_id on a single date with differing activity_types. This is a simple oversight.
The second is to state that I intended for you to use UNION ALL, not LEFT JOIN, in combining the actual data and the user_id/date framework. Your approach works fine but I have found that unioning with very large amounts of data is generally faster than joining but you do need to make sure the columns match up. Either way we end up with a data segment with 3 columns - 2 date columns, one with NULLs for framework rows, and 1 user_id. Your approach is fine and the difference in performance is likely very small unless you have huge tables.
Since you asked for a rewrite, here it is with both changes. (NOTE: my laptop is in the shop so I don’t have ready access to Redshift at the moment and this SQL is untested. If the intent is not clear from this and you need me to debug it will be delayed by a few days. I’m keeping your setup methods and SQL structure.)
with date_table as (
select '2000-01-01'::date as date
union all
select '2000-01-02'::date
union all
select '2000-01-03'::date
union all
select '2000-01-04'::date
union all
select '2000-01-05'::date
union all
select '2000-01-06'::date
),
users as (
select 1 as user_id
union all
select 2
union all
select 3
),
user_activity as (
select 1 as user_id, '2000-01-01'::date as activity_date
union all
select 1 as user_id, '2000-01-04'::date as activity_date
union all
select 3 as user_id, '2000-01-03'::date as activity_date
union all
select 1 as user_id, '2000-01-05'::date as activity_date
union all
select 1 as user_id, '2000-01-06'::date as activity_date
),
user_dates as (
select d.date, u.user_id
from date_table d
cross join users u
),
user_date_activity as (
select cal_date, user_id,
lag(max(activity_date), 1) ignore nulls over (partition by user_id order by date) as last_activity_date
from (
Select user_id, date as cal_date, NULL as activity_date from user_dates
Union all
Select user_id, activity_date as cal_date, activity_date from user_activity
)
Group by user_id, cal_date
)
select * from user_date_activity
order by user_id, cal_date```
This was my query based on Bill's answer.
with date_table as (
select '2000-01-01'::date as date
union all
select '2000-01-02'::date
union all
select '2000-01-03'::date
union all
select '2000-01-04'::date
union all
select '2000-01-05'::date
union all
select '2000-01-06'::date
),
users as (
select 1 as user_id
union all
select 2
union all
select 3
),
user_activity as (
select 1 as user_id, '2000-01-01'::date as activity_date
union all
select 1 as user_id, '2000-01-04'::date as activity_date
union all
select 3 as user_id, '2000-01-03'::date as activity_date
union all
select 1 as user_id, '2000-01-05'::date as activity_date
union all
select 1 as user_id, '2000-01-06'::date as activity_date
),
user_dates as (
select d.date, u.user_id
from date_table d
cross join users u
),
user_date_activity as (
select ud.date, ud.user_id,
lag(ua.activity_date, 1) ignore nulls over (partition by ud.user_id order by ud.date) as last_activity_date
from user_dates ud
left join user_activity ua on ud.date = ua.activity_date and ud.user_id = ua.user_id
)
select * from user_date_activity
order by user_id, date

SQL left join on maximum date

I have two tables: contracts and contract_descriptions.
On contract_descriptions there is a column named contract_id which is equal on contracts table records.
I am trying to join the latest record on contract_descriptions:
SELECT *
FROM contracts c
LEFT JOIN contract_descriptions d ON d.contract_id = c.contract_id
AND d.date_description =
(SELECT MAX(date_description)
FROM contract_descriptions t
WHERE t.contract_id = c.contract_id)
It works, but is it the performant way to do it? Is there a way to avoid the second SELECT?
You could also alternatively use DISTINCT ON:
SELECT * FROM contracts c LEFT JOIN (
SELECT DISTINCT ON (cd.contract_id) cd.* FROM contract_descriptions cd
ORDER BY cd.contract_id, cd.date_description DESC
) d ON d.contract_id = c.contract_id
DISTINCT ON selects only one row per contract_id while the sort clause cd.date_description DESC ensures that it is always the last description.
Performance depends on many values (for example, table size). In any case, you should compare both approaches with EXPLAIN.
Your query looks okay to me. One typical way to join only n rows by some order from the other table is a lateral join:
SELECT *
FROM contracts c
CROSS JOIN LATERAL
(
SELECT *
FROM contract_descriptions cd
WHERE cd.contract_id = c.contract_id
ORDER BY cd.date_description DESC
FETCH FIRST 1 ROW ONLY
) cdlast;

Filter union result

I'm making select with a union.
SELECT * FROM table_1
UNION
SELECT * FROM table_2
Is it possible to filter query results by column values?
Yes, you can enclose your entire union inside another select:
select * from (
select * from table_1 union select * from table_2) as t
where t.column = 'y'
You have to introduce the alias for the table ("as t"). Also, if the data from the tables is disjoint, you might want to consider switching to UNION ALL - UNION by itself works to eliminate duplicates in the result set. This is frequently not necessary.
A simple to read solution is to use a CTE (common table expression). This takes the form:
WITH foobar AS (
SELECT foo, bar FROM table_1
UNION
SELECT foo, bar FROM table_2
)
Then you can refer to the CTE in subsequent queries by name, as if it were a normal table:
SELECT foo,bar FROM foobar WHERE foo = 'value'
CTEs are quite powerful, I recommend further reading here
One tip that you will not find in that MS article is; if you require more than one CTE put a comma between the expression statements. eg:
WITH foo AS (
SELECT thing FROM place WHERE field = 'Value'
),
bar AS (
SELECT otherthing FROM otherplace WHERE otherfield = 'Other Value'
)
If you want to filter the query based on some criteria then you could do this -
Select * from table_1 where table_1.col1 = <some value>
UNION
Select * from table_2 where table_2.col1 = <some value>
But, I would say if you want to filter result to find the common values then you can use joins instead
Select * from table_1 inner join table_2 on table_1.col1 = table_2.col1