How can you filter for only the max value from a queried table in PostgreSQL? - postgresql

I'm fairly new to Postgresql and my problem can be simplified to the following:
Suppose that I have 2 tables:
Table A:
 id | join_value | filter_data1 | filter_data2
----+------------+--------------+--------------
  1 |          1 | "Yes"        |            1
  2 |          1 | "Yes"        |            3
  3 |          2 | "No"         |            0
Table B:
 id | join_value | filter_data1 | filter_data2 |   date
----+------------+--------------+--------------+----------
  1 |          3 | "Yes"        |            0 | 1/3/2021
  2 |          1 | "Yes"        |           17 | 1/3/2021
  3 |          1 | "No"         |           -1 | 1/2/2021
  4 |          1 | "Yes"        |           32 | 1/2/2021
  5 |          1 | "Yes"        |           40 | 1/3/2021
I would like to filter these tables on the filter data and then join them on join_value. The catch is that I then only want the rows whose date equals MAX(date). Here is an example of a query I have attempted:
SELECT * FROM
  (SELECT * FROM A
   WHERE filter_data1 = 'Yes'
     AND filter_data2 > 2) AS a_tab
JOIN
  (SELECT * FROM B
   WHERE filter_data1 = 'Yes'
     AND filter_data2 > 16) AS b_tab
ON a_tab.join_value = b_tab.join_value;
This would give me the following table:
 id | join_value | filter_data1 | filter_data2 | id | filter_data1 | filter_data2 |   date
----+------------+--------------+--------------+----+--------------+--------------+----------
  2 |          1 | "Yes"        |            3 |  2 | "Yes"        |           17 | 1/3/2021
  2 |          1 | "Yes"        |            3 |  4 | "Yes"        |           32 | 1/2/2021
  2 |          1 | "Yes"        |            3 |  5 | "Yes"        |           40 | 1/3/2021
But the problem is that I would also like to apply something like WHERE date = MAX(date).
The resulting table would be this:
 id | join_value | filter_data1 | filter_data2 | id | filter_data1 | filter_data2 |   date
----+------------+--------------+--------------+----+--------------+--------------+----------
  2 |          1 | "Yes"        |            3 |  2 | "Yes"        |           17 | 1/3/2021
  2 |          1 | "Yes"        |            3 |  5 | "Yes"        |           40 | 1/3/2021
Does anyone have any ideas how to accomplish this?

First, a hint on how to write your existing SELECT query in a more readable way:
SELECT
  a.*, b.*
FROM a
INNER JOIN b ON b.join_value = a.join_value
WHERE a.filter_data1 = 'Yes' AND a.filter_data2 > 2
  AND b.filter_data1 = 'Yes' AND b.filter_data2 > 16
Now let's add another column to this query that holds the maximum value of the date column across the whole output. For that we can use a window function:
SELECT
  a.*, b.*, MAX(b.date) OVER ()
FROM a
INNER JOIN b ON b.join_value = a.join_value
WHERE a.filter_data1 = 'Yes' AND a.filter_data2 > 2
  AND b.filter_data1 = 'Yes' AND b.filter_data2 > 16
Because window functions are evaluated after the WHERE clause, we cannot filter on this value directly. So we use the query as a subquery and put the condition in the outer query:
SELECT
  *
FROM (
  SELECT
    a.*, b.*, MAX(b.date) OVER () AS max_date
  FROM a
  INNER JOIN b ON b.join_value = a.join_value
  WHERE a.filter_data1 = 'Yes' AND a.filter_data2 > 2
    AND b.filter_data1 = 'Yes' AND b.filter_data2 > 16
) t
WHERE t.date = t.max_date
This should give you the required results.
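If you prefer not to carry a separate max_date column, a RANK() variant gives the same rows; this is just a sketch against the tables above, relying on the fact that every row tied for the latest date gets rank 1, which is exactly the date = MAX(date) condition:
SELECT *
FROM (
  SELECT
    a.*, b.*,
    RANK() OVER (ORDER BY b.date DESC) AS date_rank
  FROM a
  INNER JOIN b ON b.join_value = a.join_value
  WHERE a.filter_data1 = 'Yes' AND a.filter_data2 > 2
    AND b.filter_data1 = 'Yes' AND b.filter_data2 > 16
) t
WHERE t.date_rank = 1;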

Related

Get dummy columns from different tables

I have three different tables that look like this:
Table 1
| id | city|
|----|-----|
| 1 | A |
| 1 | B |
| 2 | C |
Table 2
| id | city|
|----|-----|
| 2 | B |
| 1 | B |
| 3 | C |
Table 3
| id | city|
|----|-----|
| 1 | A |
| 1 | B |
| 2 | A |
I need to create one column per table, with a dummy value indicating whether the (id, city) pair is present in that table.
| id | city| is_tbl_1 | is_tbl_2 | is_tbl_3 |
|----|-----|-----------|-------------|------------|
| 1 | A | 1 | 0 | 1 |
| 1 | B | 1 | 1 | 1 |
| 2 | A | 0 | 0 | 1 |
| 2 | C | 1 | 0 | 0 |
| 2 | B | 0 | 1 | 0 |
| 3 | C | 0 | 1 | 0 |
I have tried adding the is_tbl# columns myself in three different selects, UNION ALL-ing the three tables and grouping, but it looks ugly. Is there a better way to do it?
You can outer-join the 3 tables on id and city, then group by id and city, and finally count the number of non-null values of each table's city column:
SELECT
  COALESCE(t1.id, t2.id, t3.id) AS id
, COALESCE(t1.city, t2.city, t3.city) AS city
, count(*) FILTER (WHERE t1.city IS NOT NULL) AS is_tbl_1
, count(*) FILTER (WHERE t2.city IS NOT NULL) AS is_tbl_2
, count(*) FILTER (WHERE t3.city IS NOT NULL) AS is_tbl_3
FROM t1
FULL OUTER JOIN t2 ON t1.id = t2.id AND t1.city = t2.city
FULL OUTER JOIN t3 ON COALESCE(t1.id, t2.id) = t3.id
                  AND COALESCE(t1.city, t2.city) = t3.city  -- COALESCE so rows present only in t2 still match t3
GROUP BY 1, 2
ORDER BY 1, 2
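For comparison, here is a minimal sketch of the UNION ALL approach mentioned in the question (assuming the tables are really named t1, t2, t3); each branch tags its source table and the GROUP BY collapses duplicates, with no need to coalesce join keys:
SELECT
  id
, city
, MAX(src1) AS is_tbl_1
, MAX(src2) AS is_tbl_2
, MAX(src3) AS is_tbl_3
FROM (
  SELECT id, city, 1 AS src1, 0 AS src2, 0 AS src3 FROM t1
  UNION ALL
  SELECT id, city, 0, 1, 0 FROM t2
  UNION ALL
  SELECT id, city, 0, 0, 1 FROM t3
) u
GROUP BY 1, 2
ORDER BY 1, 2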

PostgreSQL limit by group, only show first 2 store options

I need to select the first 2 rows per store where store_name differs from a given store and prod_name matches a given product.
id | store_name | prod_name
----+------------+------
1 | 1 | A
2 | 1 | B
3 | 1 | C
4 | 1 | A
5 | 2 | E
6 | 2 | A
7 | 3 | G
8 | 2 | A
9 | 1 | A
10 | 3 | A
(10 rows)
For store_name <> 3 AND prod_name = 'A' the result should be:
id | store_name | prod_name
----+------------+------
1 | 1 | A
4 | 1 | A
6 | 2 | A
8 | 2 | A
Use the row_number() window function to accomplish this.
Query #1
with first_two as (
select *,
row_number() over (partition by store_name
order by id) as rn
from store_product
where store_name <> 3
and prod_name = 'A'
)
select id, store_name, prod_name
from first_two
where rn <= 2;
| id | store_name | prod_name |
| --- | ---------- | --------- |
| 1 | 1 | A |
| 4 | 1 | A |
| 6 | 2 | A |
| 8 | 2 | A |
View on DB Fiddle
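If the table is large and each store has many rows, a LATERAL join is a common alternative because the LIMIT is applied inside each store's subquery; a sketch against the same store_product table:
SELECT sp.*
FROM (SELECT DISTINCT store_name
      FROM store_product
      WHERE store_name <> 3) s
CROSS JOIN LATERAL (
  SELECT id, store_name, prod_name
  FROM store_product
  WHERE store_name = s.store_name
    AND prod_name = 'A'
  ORDER BY id
  LIMIT 2            -- first 2 rows per store
) sp
ORDER BY sp.id;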

Find rows in relation with at least n rows in a different table without joins

I have a table as such (tbl):
+----+------+-----+
| pk | attr | val |
+----+------+-----+
| 0 | ohif | 4 |
| 1 | foha | 56 |
| 2 | slns | 2 |
| 3 | faso | 11 |
+----+------+-----+
And another table in n-to-1 relationship with tbl (tbl2):
+----+-----+
| pk | rel |
+----+-----+
| 0 | 0 |
| 1 | 1 |
| 2 | 0 |
| 3 | 2 |
| 4 | 2 |
| 5 | 3 |
| 6 | 1 |
| 7 | 2 |
+----+-----+
(tbl2.rel -> tbl.pk.)
I would like to select only the rows from tbl which are in relationship with at least n rows from tbl2.
I.e., for n = 2, I want this table:
+----+------+-----+
| pk | attr | val |
+----+------+-----+
| 0 | ohif | 4 |
| 1 | foha | 56 |
| 2 | slns | 2 |
+----+------+-----+
This is the solution I came up with:
SELECT DISTINCT ON (tbl.pk) tbl.*
FROM (
SELECT tbl.pk
FROM tbl
RIGHT OUTER JOIN tbl2 ON tbl2.rel = tbl.pk
GROUP BY tbl.pk
HAVING COUNT(tbl2.*) >= 2 -- n
) AS tbl_candidates
LEFT OUTER JOIN tbl ON tbl_candidates.pk = tbl.pk
Can it be done without selecting the candidates with a subquery and re-joining the table with itself?
I'm on Postgres 10. A standard SQL solution would be better, but a Postgres solution is acceptable.
OK, just join once, as below:
select t1.pk, t1.attr, t1.val
from tbl t1
join tbl2 t2 on t1.pk = t2.rel
group by t1.pk, t1.attr, t1.val
having count(1) >= 2
order by t1.pk;
pk | attr | val
----+------+-----
0 | ohif | 4
1 | foha | 56
2 | slns | 2
(3 rows)
Or join once and use a CTE (WITH clause), as below:
with tmp as (
  select rel from tbl2 group by rel having count(1) >= 2
)
select b.* from tmp t join tbl b on t.rel = b.pk order by b.pk;
pk | attr | val
----+------+-----
0 | ohif | 4
1 | foha | 56
2 | slns | 2
(3 rows)
Is the SQL clearer?
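Since the question asks whether the candidate subquery and the re-join can be avoided entirely, here is a sketch that uses no join at all, only a correlated scalar subquery (standard SQL, works on Postgres 10):
SELECT *
FROM tbl
WHERE (SELECT count(*)
       FROM tbl2
       WHERE tbl2.rel = tbl.pk) >= 2  -- n
ORDER BY pk;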

Populate zero values in column with next value greater than zero

I have the following Postgres query:
SELECT
a.assessmentid,
b.groupid
FROM wo_assessment a
LEFT JOIN wo_group_info b ON a.assessmentid = b.assessmentid
WHERE a.workorderid=2
ORDER BY a.assessmentid
Which returns the following results:
|-------------------|------------|
| assessmentid | groupid |
|-------------------|------------|
| 5 | 5 |
|-------------------|------------|
| 6 | 4 |
|-------------------|------------|
| 7 | 0 |
|-------------------|------------|
| 8 | 5 |
|-------------------|------------|
| 9 | 0 |
|-------------------|------------|
| 10 | 0 |
|-------------------|------------|
I would like to populate the 0 values in the groupid column with the nearest non-zero value above them in that column.
So for example, I want my table to look like this:
|-------------------|------------|
| assessmentid | groupid |
|-------------------|------------|
| 5 | 5 |
|-------------------|------------|
| 6 | 4 |
|-------------------|------------|
| 7 | 4 |
|-------------------|------------|
| 8 | 5 |
|-------------------|------------|
| 9 | 5 |
|-------------------|------------|
| 10 | 5 |
|-------------------|------------|
Here is what worked for me:
SELECT q.assessmentid,
       first_value(b.groupid) OVER (PARTITION BY value_partition
                                    ORDER BY q.assessmentid)
FROM (
  SELECT a.assessmentid,
         b.groupid,
         sum(CASE WHEN b.groupid IS NULL THEN 0 ELSE 1 END)
           OVER (ORDER BY a.assessmentid) AS value_partition
  FROM wo_assessment AS a
  LEFT JOIN wo_group_info b ON a.assessmentid = b.assessmentid
  ORDER BY a.assessmentid
) AS q
LEFT JOIN wo_group_info b ON q.assessmentid = b.assessmentid
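A slightly more compact way to express the same carry-forward idea, sketched under the assumption that the gaps really appear as 0 in groupid as shown above (test IS NULL instead of <> 0 if they actually come through as NULLs from the LEFT JOIN), and that real group ids are positive so MAX() picks the non-zero value in each partition:
SELECT assessmentid,
       MAX(groupid) OVER (PARTITION BY grp) AS groupid
FROM (
  SELECT a.assessmentid,
         COALESCE(b.groupid, 0) AS groupid,
         -- running count of non-zero rows: each non-zero value starts a new partition
         COUNT(*) FILTER (WHERE COALESCE(b.groupid, 0) <> 0)
           OVER (ORDER BY a.assessmentid) AS grp
  FROM wo_assessment a
  LEFT JOIN wo_group_info b ON a.assessmentid = b.assessmentid
  WHERE a.workorderid = 2
) s
ORDER BY assessmentid;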

1st and 7th row in grouping

I have this table named Samples. The Date column values are just symbolic date values.
+----+------------+-------+------+
| Id | Product_Id | Price | Date |
+----+------------+-------+------+
| 1 | 1 | 100 | 1 |
| 2 | 2 | 100 | 2 |
| 3 | 3 | 100 | 3 |
| 4 | 1 | 100 | 4 |
| 5 | 2 | 100 | 5 |
| 6 | 3 | 100 | 6 |
...
+----+------------+-------+------+
I want to group by Product_Id such that I get the 1st sample in descending Date order, with a new column added holding the Price of the 7th sample row in each product group. If the 7th row does not exist, the value should be NULL.
Example:
+----+------------+-------+------+----------+
| Id | Product_Id | Price | Date | 7thPrice |
+----+------------+-------+------+----------+
| 4 | 1 | 100 | 4 | 120 |
| 5 | 2 | 100 | 5 | 100 |
| 6 | 3 | 100 | 6 | NULL |
+----+------------+-------+------+----------+
I believe I can achieve the table without the 7thPrice column with the following:
SELECT * FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Product_Id ORDER BY date DESC) r, * FROM Samples
) T WHERE T.r = 1
Any suggestions?
You can try something like this. I used your query to create a CTE, then joined the r = 1 rows to the r = 7 rows.
with sampleCTE as (
  SELECT ROW_NUMBER() OVER (PARTITION BY Product_Id ORDER BY date DESC) r, *
  FROM Samples
)
select *
from (select * from sampleCTE where r = 1) a
left join (select * from sampleCTE where r = 7) b
  on a.product_id = b.product_id
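Another option, sketched on the assumption that the table is called samples with lower-case column names: nth_value() can pull the 7th price in the same window pass, so the CTE does not have to be scanned twice. The explicit frame clause matters, because the default frame stops at the current row and would typically return NULL for the first row:
SELECT id, product_id, price, date, seventh_price
FROM (
  SELECT s.*,
         ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY date DESC) AS r,
         NTH_VALUE(price, 7) OVER (PARTITION BY product_id ORDER BY date DESC
                                   ROWS BETWEEN UNBOUNDED PRECEDING
                                            AND UNBOUNDED FOLLOWING) AS seventh_price
  FROM samples s
) t
WHERE t.r = 1;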