Redshift: replace FULL OUTER JOIN with a CROSS JOIN

I would like to perform a full outer join using multiple OR conditions, but I've read that PostgreSQL can only do a full outer join when the join condition is a plain equality with a distinct column on each side of the = sign.
In my scenario, I have 2 tables: ticket and production. One record in ticket can match a few values of production.code. Example:
TICKET | custom_field_1 | custom_field_2 | custom_field_3
1      | 10             | 9              |
2      |                | 8              |

PRODUCTION | CODE
1          | 10
5          | 8
12         | 9
In the example above, ticket ID 1 is related to production codes 9 and 10, and ticket ID 2 is related to production code 8.
I'm trying to write a query to return column Status from table Production:
SELECT
production.status
FROM ticket
FULL OUTER JOIN production ON ticket.custom_field_1 = production.code
OR ticket.custom_field_2 = production.code
OR ticket.custom_field_3 = production.code
GROUP BY 1
ORDER BY 1
LIMIT 1000
When I try to run this query, I get an error: Invalid operation: FULL JOIN is only supported with merge-joinable join conditions;
So I started to replace it with a CROSS JOIN. The query almost works, but I get a different number of rows:
SELECT count(production.id) FROM ticket
CROSS JOIN production
WHERE date(production.ts_real) >= '2019-03-01' AND
((ticket.custom_field_1 = sisweb_producao.proposta) OR
(ticket.custom_field_2 = sisweb_producao.proposta) OR
(ticket.custom_field_3 = sisweb_producao.proposta));
The query above should return 202 rows but gives only 181 because of my conditions. How can I make the CROSS JOIN work like a FULL OUTER JOIN?
I'm using a tool called Looker, which is why I'm building the query this way.

It's not quite clear what the schema of your tables is, as some of your example SQL contains columns that are not in the example schema. But it looks like you could use an alternative approach: pivot the ticket columns and join them to the production table using an inner join to achieve the same thing:
SELECT
t1.ticket
, production.id
, production.status
FROM
(
SELECT
ticket
, custom_field_1 AS code
FROM
ticket
WHERE
custom_field_1 IS NOT NULL
UNION
SELECT
ticket
, custom_field_2 AS code
FROM
ticket
WHERE
custom_field_2 IS NOT NULL
UNION
SELECT
ticket
, custom_field_3 AS code
FROM
ticket
WHERE
custom_field_3 IS NOT NULL
) t1
INNER JOIN
production ON t1.code = production.code
Based on the example data you provided, a ticket can be related to more than one production code, and hence more than one "status", so whichever way you do this, be aware that you will potentially get multiple result rows per ticket.
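If you also need the unmatched rows from both sides (the original FULL OUTER behavior), note that after the pivot the join condition is a single equality, which is merge-joinable, so Redshift should accept a FULL OUTER JOIN here directly. A minimal sketch, reusing the pivoted subquery:
-- Sketch: same pivot as above, but with FULL OUTER JOIN so that
-- tickets with no matching code, and production rows with no
-- matching ticket, both survive in the result.
SELECT
t1.ticket
, production.id
, production.status
FROM
(
SELECT ticket, custom_field_1 AS code FROM ticket WHERE custom_field_1 IS NOT NULL
UNION
SELECT ticket, custom_field_2 AS code FROM ticket WHERE custom_field_2 IS NOT NULL
UNION
SELECT ticket, custom_field_3 AS code FROM ticket WHERE custom_field_3 IS NOT NULL
) t1
FULL OUTER JOIN
production ON t1.code = production.code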

Related

PostgreSQL how do I COUNT with a condition?

Can someone please assist with a query I am working on for school, using a sample database from the PostgreSQL tutorial? Here is my PostgreSQL query, which gets me the raw data that I can export to Excel and then put in a pivot table to get the needed counts. The goal is to make a query that does the counting, so I don't have to do the manual extraction to Excel and the subsequent pivot table:
SELECT
i.film_id,
r.rental_id
FROM
rental as r
INNER JOIN inventory as i ON i.inventory_id = r.inventory_id
ORDER BY film_id, rental_id
;
From the database this gives me a list of films (by film_id) showing each time the film was rented (by rental_id). That query works fine if I'm just exporting to Excel. Since we don't want that manual process, I need to add to my query a count of how many times a given film (by film_id) was rented. The results should look something like this (just showing the first five here; the query need not limit itself to five):
film_id | COUNT of rental_id
1 | 23
2 | 7
3 | 12
4 | 23
5 | 12
Database setup instructions can be found here: LINK
I have tried using COUNTIF and CASE (following other posts here) and I can't get either to work. Please help.
Did you try this?
SELECT
i.film_id,
COUNT(1)
FROM
rental as r
INNER JOIN inventory as i ON i.inventory_id = r.inventory_id
GROUP BY i.film_id
ORDER BY film_id;
If the same rental_id can appear more than once in your data, you may want to use COUNT(DISTINCT r.rental_id) instead.
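If you also need the literal "count with a condition" from the title, PostgreSQL's FILTER clause (9.4+) does that without the Excel pivot step. A sketch, assuming for illustration that you only want to count rentals that were returned; return_date is a column in the rental table of the tutorial's sample database, but verify against your schema:
-- Sketch: conditional counting with FILTER (PostgreSQL 9.4+).
-- The return_date condition is only an illustrative assumption.
SELECT
i.film_id,
COUNT(*) AS all_rentals,
COUNT(*) FILTER (WHERE r.return_date IS NOT NULL) AS returned_rentals
FROM
rental AS r
INNER JOIN inventory AS i ON i.inventory_id = r.inventory_id
GROUP BY i.film_id
ORDER BY i.film_id;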

Postgres SQL query group by get most recent record instead of an aggregate

This is a current postgres query I have:
sql = """
SELECT
vms.campaign_id,
avg(vms.open_rate_uplift) as open_rate_average,
avg(vms.click_rate_uplift) as click_rate_average,
avg(vms.conversion_rate_uplift) as conversion_rate_average,
avg(cms.incremental_opens),
avg(cms.incremental_clicks),
avg(cms.incremental_conversions)
FROM
experiments.variant_metric_snapshot vms
INNER JOIN experiments.campaign_metric_snapshot cms ON vms.campaign_id = cms.campaign_id
WHERE
vms.campaign_id IN %(campaign_ids)s
GROUP BY
vms.campaign_id
"""
whereby I get the average incremental_opens, incremental_clicks, and incremental_conversions per campaign group from the cms table. However, instead of the averages, I want the most recent values for those 3 fields: the values from the record with the greatest (i.e. most recent) event_id in a given group, rather than an average over all of the group's records.
How can I do this? Thanks
It sounds like you want a lateral join.
FROM
experiments.variant_metric_snapshot vms
CROSS JOIN LATERAL (select * from experiments.campaign_metric_snapshot cms where vms.campaign_id = cms.campaign_id order by event_id desc LIMIT 1) cms
WHERE...
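Spelled out against your original column list, a sketch: because the lateral subquery returns exactly one row (the latest) per vms row, any aggregate over its columns, max below, simply passes that row's value through the GROUP BY.
-- Sketch: max() here just returns the single latest row's value,
-- since the lateral join yields one cms row per vms row.
SELECT
vms.campaign_id,
avg(vms.open_rate_uplift) as open_rate_average,
avg(vms.click_rate_uplift) as click_rate_average,
avg(vms.conversion_rate_uplift) as conversion_rate_average,
max(cms.incremental_opens) as incremental_opens,
max(cms.incremental_clicks) as incremental_clicks,
max(cms.incremental_conversions) as incremental_conversions
FROM
experiments.variant_metric_snapshot vms
CROSS JOIN LATERAL (
SELECT *
FROM experiments.campaign_metric_snapshot cms
WHERE vms.campaign_id = cms.campaign_id
ORDER BY event_id DESC
LIMIT 1
) cms
WHERE
vms.campaign_id IN %(campaign_ids)s
GROUP BY
vms.campaign_id;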
If you are after a quick and dirty solution, you can use the array_agg function with minimal change to your query: ordering each group's values by event_id descending and taking the first array element yields the most recent value.
SELECT
vms.campaign_id,
avg(vms.open_rate_uplift) as open_rate_average,
avg(vms.click_rate_uplift) as click_rate_average,
avg(vms.conversion_rate_uplift) as conversion_rate_average,
(array_agg(cms.incremental_opens ORDER BY cms.event_id DESC))[1] AS incremental_opens,
..
FROM
experiments.variant_metric_snapshot vms
INNER JOIN experiments.campaign_metric_snapshot cms ON vms.campaign_id = cms.campaign_id
WHERE
vms.campaign_id IN %(campaign_ids)s
GROUP BY
vms.campaign_id;

R2DBC adjacency list get all children

I have a table with id and parentId columns; I refer to this structure as an adjacency list.
Now I want to get all children of an arbitrary id. The classic solution to this problem uses recursion; for example, here is a Postgres stored procedure or CTE implementation.
I'm currently using Spring Webflux and Spring Data R2DBC + Postgres R2DBC driver (which doesn't support stored procedures yet).
How can I approach this problem in reactive style? Is it even possible, or is there something conceptually wrong in my approach?
UPD 1:
Let's imagine we have data like:
+-------------+---------+
|id |parent_id|
+-------------+---------+
|root |NULL |
|id1 |root |
|dir1 |root |
|dir1_id1 |dir1 |
|dir1_dir1 |dir1 |
|dir1_dir1_id1|dir1_dir1|
+-------------+---------+
Now I want to have a method inside a ReactiveCrudRepository that returns all children of a provided id.
For example, using the sample data: by providing id='dir1', I want to get the children with ids ['dir1_id1', 'dir1_dir1', 'dir1_dir1_id1'].
Using a proc or CTE has nothing to do with a full scan.
In your scenario you only have to use a recursive CTE, but adding an index on (parentid, id) will surely help:
create index idx_name on tablename (parentid, id);
Also, 10k rows is not that big; an index will definitely improve the CTE a lot.
I think the best SQL approach is a recursive CTE (Common Table Expression); did you try it? I have never tried it with many rows, though.
WITH recursive nodes AS (
SELECT id, parent_id
FROM t
WHERE parent_id = 'dir1'
UNION ALL
SELECT t.id, t.parent_id
FROM nodes n
INNER JOIN t ON t.parent_id = n.id
)
SELECT id
FROM nodes;
Output for parent_id = 'dir1'
id
dir1_id1
dir1_dir1
dir1_dir1_id1
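As for the reactive side: the Postgres R2DBC driver can execute this CTE as an ordinary parameterized statement, so no stored procedure is needed. A sketch, assuming the query is bound to a Spring Data R2DBC repository method via the @Query annotation, using a Postgres-style positional parameter ($1); table and column names are the ones from the sample data:
-- Sketch: the same recursive CTE with a bind parameter ($1),
-- usable as the @Query of a ReactiveCrudRepository method that
-- returns the child ids as a Flux.
WITH RECURSIVE nodes AS (
SELECT id, parent_id
FROM t
WHERE parent_id = $1
UNION ALL
SELECT t.id, t.parent_id
FROM nodes n
INNER JOIN t ON t.parent_id = n.id
)
SELECT id
FROM nodes;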

select distinct from 2 columns but only 1 is duplicate

select a.subscriber_msisdn, war.created_datetime from
(
select distinct subscriber_msisdn from wiz_application_response
where application_item_id in
(select id from wiz_application_item where application_id=155)
and created_datetime between '2012-10-07 00:00' and '2012-11-15 00:00:54'
) a
left outer join wiz_application_response war on (war.subscriber_msisdn=a.subscriber_msisdn)
the subselect returns 11 rows, but when joined it returns 18 (with duplicates). The objective of this query is only to add the date column to the 11 rows of the subselect.
Based on your description, it stands to reason that there are multiple created_datetime values for some of the subscriber_msisdn values, which is what prompted you to use DISTINCT in the subquery to begin with. By joining the subquery back to the original table you are defeating that. A cleaner way to write the query would be:
SELECT
war.subscriber_msisdn
, war.created_datetime
FROM
wiz_application_response war
LEFT JOIN wiz_application_item wai
ON war.application_item_id = wai.id
AND wai.application_id = 155
WHERE
war.created_datetime BETWEEN '2012-10-07 00:00' AND '2012-11-15 00:00:54'
This should return only the rows from the war table that satisfy the criteria based on the wai table. It should not be an outer join, unless you want to return all the rows from the war table that satisfy the created_datetime parameter regardless of the application_item_id parameter.
This is my best guess based on the limited information I have about your tables and what I’m assuming you’re trying to accomplish. If this doesn’t get you what you are after, I will continue to offer other ideas based on additional information you could provide. Hope this works.
This can most probably be simplified to:
SELECT DISTINCT ON (1)
r.subscriber_msisdn, r.created_datetime
FROM wiz_application_item i
JOIN wiz_application_response r ON r.application_item_id = i.id
WHERE i.application_id = 155
AND i.created_datetime BETWEEN '2012-10-07 00:00' AND '2012-11-15 00:00:54'
ORDER BY 1, 2 DESC -- to pick the latest created_datetime
Details depend on missing information.

query gives two of the same results

I have the following SQL query, but I have a problem:
when I execute it, I get two of the same serial numbers from the "sn" column in the "products" table.
SELECT specifications.productname,
products.sn, specifications.year,
lendings.lending_date
FROM products
INNER JOIN lendings ON products.id = lendings.product_id
INNER JOIN specifications ON products.sn LIKE CONCAT('%', specifications.sn, '%') OR products.type LIKE CONCAT('%', specifications.type, '%')
WHERE lendings.user_id = ?
EDIT:
lendings table:
user_id product_id
1 1
1 2
2 3
Specifications table:
productname year type sn
name1 2012 1 1234
name2 2011 2 4321
name3 2010 3 3241
products table:
id sn
1 AAAAAAAA1234
2 BBBBBBBB4321
3 CCCCCCCC3241
EDIT2:
SELECT products.id,
specifications.productname,
products.sn,
specifications.year,
lendings.lending_date
FROM products
INNER JOIN lendings ON products.id = lendings.product_id
INNER JOIN specifications ON products2.sn LIKE CONCAT(specifications.sn, '%') OR products.type = specifications.type
WHERE lendings.user_id = ?
One of your join-on conditions is too slack, then; for instance, two lendings records pointing to the same product.
Usually, that means you don't have all the necessary join columns present in one of your joins and you are getting a cartesian product. In database terms, this means you are joining to a table expecting to match a single row, but multiple rows match the criteria, so you are actually joining to more than one row. When this happens, you will get the same row multiple times (the products row in your example) in your result.
It would have been better if you had posted some test data so this scenario could be confirmed; since you didn't, I would recommend checking each of your joins to make sure you are not getting multiple rows back for a given products row.
One part of your query I find particularly suspect is this join:
INNER JOIN specifications ON products.sn LIKE CONCAT('%', specifications.sn, '%') OR products.type LIKE CONCAT('%', specifications.type, '%')
You're joining using a LIKE operator, which has a high chance of matching multiple rows.
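In the sample data, specifications.sn always appears as a suffix of products.sn, and the OR branch lets a product row also match on type alone. A sketch under that suffix assumption: anchor the pattern on the right and drop the OR so each product matches at most one specification.
-- Sketch, assuming specifications.sn is always a suffix of
-- products.sn (true in the sample data). Dropping the OR on
-- type removes the extra matches that produced duplicate rows.
SELECT specifications.productname,
products.sn, specifications.year,
lendings.lending_date
FROM products
INNER JOIN lendings ON products.id = lendings.product_id
INNER JOIN specifications ON products.sn LIKE CONCAT('%', specifications.sn)
WHERE lendings.user_id = ?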