PostgreSQL: get records with unique column combination

I want to select the records that have a unique column combination in PostgreSQL. DISTINCT doesn't work for this, because it only removes duplicates.
Example
ID A B
01 1 2
02 1 2
03 1 3
04 2 4
05 1 4
06 2 4
07 2 5
08 1 3
In this example, the rows with ID 05 and 07 have a unique combination of A and B. How can I get these records?
SELECT ...

With NOT EXISTS:
select t.* from tablename t
where not exists (
select 1 from tablename
where id <> t.id and a = t.a and b = t.b
)
Or with COUNT() window function:
select t.id, t.a, t.b
from (
select *, count(id) over (partition by a, b) counter
from tablename
) t
where t.counter = 1
Or with aggregation:
select max(id) id, a, b
from tablename
group by a, b
having count(id) = 1
Or with a self LEFT join that excludes the matching rows:
select t.*
from tablename t left join tablename tt
on tt.id <> t.id and tt.a = t.a and tt.b = t.b
where tt.id is null
See the demo.
Results:
| id | a | b |
| --- | --- | --- |
| 05 | 1 | 4 |
| 07 | 2 | 5 |
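To check the queries above against the sample data, here is a minimal sketch that runs the NOT EXISTS and the aggregation variants. It assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for PostgreSQL; the table name tablename and the text type of id are taken from the example, everything else is illustrative.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; both accept these two queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id TEXT PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [("01", 1, 2), ("02", 1, 2), ("03", 1, 3), ("04", 2, 4),
     ("05", 1, 4), ("06", 2, 4), ("07", 2, 5), ("08", 1, 3)],
)

# Keep a row only if no other row shares its (a, b) combination.
not_exists_sql = """
    SELECT t.id, t.a, t.b
    FROM tablename t
    WHERE NOT EXISTS (
        SELECT 1 FROM tablename
        WHERE id <> t.id AND a = t.a AND b = t.b
    )
    ORDER BY t.id
"""

# Group by (a, b) and keep the groups that contain exactly one row.
group_by_sql = """
    SELECT max(id) AS id, a, b
    FROM tablename
    GROUP BY a, b
    HAVING count(id) = 1
    ORDER BY 1
"""

rows_not_exists = conn.execute(not_exists_sql).fetchall()
rows_group_by = conn.execute(group_by_sql).fetchall()
print(rows_not_exists)
print(rows_group_by)
```

Both queries should return the rows with id 05 and 07 for this data.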

Related

How to force query to return only first row from window?

I have data:
id | price | date
1 | 25 | 2019-01-01
2 | 35 | 2019-01-01
1 | 27 | 2019-02-01
2 | 37 | 2019-02-01
Is it possible to write a query that returns only the first row from each window? Something like LIMIT 1, but for the window OVER (date)?
I expect the following result:
id | price | date
1 | 25 | 2019-01-01
1 | 27 | 2019-02-01
Or ignore the whole window if its first row has a NULL price:
id | price | date
1 | NULL | 2019-01-01
2 | 35 | 2019-01-01
1 | 27 | 2019-02-01
2 | 37 | 2019-02-01
result:
1 | 27 | 2019-02-01
Order the rows by date and id, and take only the first row per date.
Then remove those where the price is NULL.
SELECT *
FROM (SELECT DISTINCT ON (date)
id, price, date
FROM mytable
ORDER BY date, id
) AS q
WHERE price IS NOT NULL;
@Laurenz, let me provide a bit more explanation.
select distinct on (<fldlist>) * from <table> order by <fldlist+>;
is equivalent to this much more complex query (apart from the extra rn column):
select * from (
select row_number() over (partition by <fldlist> order by <fldlist+>) as rn, *
from <table>) as t
where rn = 1;
Here <fldlist> must be a leading prefix of (or equal to) <fldlist+>.
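The row_number() form can be tried on the question's second data set (the one with a NULL price). This is only a sketch: it assumes SQLite 3.25 or later (for window functions) as a stand-in for PostgreSQL, and it uses the table name mytable from the answer above.

```python
import sqlite3

# Assumption: SQLite (3.25+ for window functions) stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, price INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, None, "2019-01-01"), (2, 35, "2019-01-01"),
     (1, 27, "2019-02-01"), (2, 37, "2019-02-01")],
)

# Number the rows inside each date, lowest id first, then keep only the
# first row per date and drop it when its price is NULL.
first_row_sql = """
    SELECT id, price, date
    FROM (
        SELECT id, price, date,
               row_number() OVER (PARTITION BY date ORDER BY id) AS rn
        FROM mytable
    ) AS t
    WHERE rn = 1 AND price IS NOT NULL
    ORDER BY date
"""

first_rows = conn.execute(first_row_sql).fetchall()
print(first_rows)
```

For this data the 2019-01-01 window is dropped because its first row has a NULL price, which leaves only the 2019-02-01 row.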
As Myon on IRC said:
if you want to use a window function in WHERE, you need to put it into a subselect first
So the target query is:
select * from (
select
*,
agg_function( my_field ) OVER( PARTITION BY other_field ) as agg_field
from sometable
) x
WHERE agg_field <condition>
In my case I have the following query:
SELECT * FROM (
SELECT *,
FIRST_VALUE( p.price ) over( PARTITION BY crate.app_period ORDER BY st.DEPTH ) AS first_price,
ROW_NUMBER() over( PARTITION BY crate.app_period ORDER BY st.DEPTH ) AS row_number
FROM st
LEFT JOIN price p ON <COND>
LEFT JOIN currency_rate crate ON <COND>
) p
WHERE p.row_number = 1 AND p.first_price IS NOT null
Here I select only the first row from each group, and only where the price IS NOT NULL.

How do I join multiple select results into a single table?

I have a query which returns monthly averages from the same table, but for different pressure_level values:
SELECT some_id, avg(exposure_value) monthly_avg_1000
FROM mytable
WHERE pressure_level = 1000
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
I then have the same query, but for a different pressure_level:
SELECT some_id, avg(exposure_value) monthly_avg_925
FROM mytable
WHERE pressure_level = 925
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
Both queries return 12 rows (1 per month) with the ID and the average value for the month:
some_id | monthly_avg_1000
--------------------------
1 | 0.000023
1 | 0.000051
1 | 0.000009
some_id | monthly_avg_925
--------------------------
1 | 0.000014
1 | 0.000007
1 | 0.000131
I would like to combine the two queries so that the monthly_avg_* columns all appear in the final table:
some_id | monthly_avg_1000 | monthly_avg_925
--------------------------
1 | 0.000023 | 0.000014
1 | 0.000051 | 0.000007
1 | 0.000009 | 0.000131
How can I do this?
If both queries return the same id, you can try a join:
with a as (
SELECT some_id, avg(exposure_value) monthly_avg_1000,date_trunc('month', measurement_time) d
FROM mytable
WHERE pressure_level = 1000
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
)
, b as (
SELECT some_id, avg(exposure_value) monthly_avg_925, date_trunc('month', measurement_time) d
FROM mytable
WHERE pressure_level = 925
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
)
select distinct a.some_id, monthly_avg_1000,monthly_avg_925
from a
join b on a.some_id = b.some_id and a.d = b.d
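The join of the two CTEs can be sketched end to end as follows. Assumptions: SQLite stands in for PostgreSQL, so strftime('%Y-%m', ...) replaces date_trunc('month', ...); the six sample rows are made up purely for illustration; and the month column d is added to the output so the rows can be told apart.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the rows below are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (some_id INTEGER, pressure_level INTEGER, "
    "exposure_value REAL, measurement_time TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(7, 1000, 1.0, "2019-01-05"), (7, 1000, 3.0, "2019-01-20"),
     (7, 1000, 5.0, "2019-02-10"), (7, 925, 10.0, "2019-01-07"),
     (7, 925, 20.0, "2019-02-03"), (7, 925, 40.0, "2019-02-25")],
)

# One CTE per pressure level, joined on id and month.
combined_sql = """
    WITH a AS (
        SELECT some_id, strftime('%Y-%m', measurement_time) AS d,
               avg(exposure_value) AS monthly_avg_1000
        FROM mytable
        WHERE pressure_level = 1000 AND some_id = 7
        GROUP BY some_id, strftime('%Y-%m', measurement_time)
    ), b AS (
        SELECT some_id, strftime('%Y-%m', measurement_time) AS d,
               avg(exposure_value) AS monthly_avg_925
        FROM mytable
        WHERE pressure_level = 925 AND some_id = 7
        GROUP BY some_id, strftime('%Y-%m', measurement_time)
    )
    SELECT a.some_id, a.d, monthly_avg_1000, monthly_avg_925
    FROM a
    JOIN b ON a.some_id = b.some_id AND a.d = b.d
    ORDER BY a.d
"""

monthly = conn.execute(combined_sql).fetchall()
print(monthly)
```

Note that an inner join drops any month that is missing for one of the two pressure levels.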

Combining three very similar queries? (Postgres)

So I have three queries. I'm trying to combine them all into one query. Here they are with their outputs:
Query 1:
SELECT distinct on (name) name, count(distinct board_id)
FROM tablea
INNER JOIN table_b on tablea.id = table_b.id
GROUP BY name
ORDER BY name ASC
Output:
A | 15
B | 26
C | 24
D | 11
E | 31
F | 32
G | 16
Query 2:
SELECT distinct on (name) name, count(board_id) as total
FROM tablea
INNER JOIN table_b on tablea.id = table_b.id
GROUP BY 1, board_id
ORDER BY name, total DESC
Output:
A | 435
B | 246
C | 611
D | 121
E | 436
F | 723
G | 293
Finally, the last query:
SELECT distinct on (name) name, count(board_id) as total
FROM tablea
INNER JOIN table_b on tablea.id = table_b.id
GROUP BY 1
ORDER BY name, total DESC
Output:
A | 14667
B | 65123
C | 87426
D | 55198
E | 80612
F | 31485
G | 43392
Is it possible to format it to be like this:
A | 15 | 435 | 14667
B | 26 | 246 | 65123
C | 24 | 611 | 87426
D | 11 | 121 | 55198
E | 31 | 436 | 80612
F | 32 | 723 | 31485
G | 16 | 293 | 43392
EDIT:
With @Clodoaldo Neto's help, I combined the first and the third queries with this:
SELECT name, count(distinct board_id), count(board_id) as total
FROM tablea
INNER JOIN table_b on tablea.id = table_b.id
GROUP BY 1
ORDER BY name ASC
The only thing preventing me from combining the second query with this new one is the GROUP BY clause needing board_id to be in it. Any thoughts from here?
This is hard to get right without test data. But here is my try:
with s as (
select name, grouping(name, board_id) as grp,
count(distinct board_id) as dist_total,
count(*) as name_total,
count(*) as name_board_total
from
tablea
inner join
table_b on tablea.id = table_b.id
group by grouping sets ((name), (name, board_id))
)
select name, dist_total, name_total, name_board_total
from
(
select name, dist_total, name_total
from s
where grp = 1
) r
inner join
(
select name, max(name_board_total) as name_board_total
from s
where grp = 0
group by name
) q using (name)
order by name
https://www.postgresql.org/docs/current/static/queries-table-expressions.html#QUERIES-GROUPING-SETS

PL/SQL query recursive looping

I have only one table, "tbl_test",
which has the fields given below:
tbl_test table
trx_id | proj_num | parent_num|
1 | 14 | 0 |
2 | 14 | 1 |
3 | 14 | 2 |
4 | 14 | 0 |
5 | 14 | 3 |
6 | 15 | 0 |
The result I want, when the trx_id value 5 is fetched:
It's a parent-child relationship, so:
trx_id -> parent_num
5 -> 3
3 -> 2
2 -> 1
That means output value:
3
2
1
That is, getting the whole parent chain.
The query I used:
SELECT * FROM (
WITH RECURSIVE tree_data(project_num, task_num, parent_task_num) AS(
SELECT project_num, task_num, parent_task_num
FROM tb_task
WHERE project_num = 14 and task_num = 5
UNION ALL
SELECT child.project_num, child.task_num, child.parent_task_num
FROM tree_data parent Join tb_task child
ON parent.task_num = child.task_num AND parent.task_num = child.parent_task_num
)
SELECT project_num, task_num, parent_task_num
FROM tree_data
) AS tree_list ;
Can anybody help me?
There's no need to do this with pl/pgsql. You can do it straight in SQL. Consider:
WITH RECURSIVE my_tree AS (
SELECT trx_id as id, parent_num as parent, trx_id::text as path, 1 as level
FROM tbl_test
WHERE trx_id = 5 -- start value
UNION ALL
SELECT t.trx_id, t.parent_num, p.path || ',' || t.trx_id::text, p.level + 1
FROM my_tree p
JOIN tbl_test t ON t.trx_id = p.parent
)
select * from my_tree;
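The recursive walk up the parent chain can be tried on the question's sample table. This sketch assumes SQLite as a stand-in for PostgreSQL (so the ::text cast and the path column are left out) and the column names trx_id and parent_num from the tbl_test listing. Filtering on level > 1 is an addition here, so that only the ancestors are returned and not the start row itself.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; both support WITH RECURSIVE.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_test (trx_id INTEGER PRIMARY KEY, "
    "proj_num INTEGER, parent_num INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_test VALUES (?, ?, ?)",
    [(1, 14, 0), (2, 14, 1), (3, 14, 2), (4, 14, 0), (5, 14, 3), (6, 15, 0)],
)

# Start at the given row, then repeatedly join to the row whose trx_id
# equals the current row's parent. The walk stops at parent_num = 0,
# because no row has trx_id 0.
ancestors_sql = """
    WITH RECURSIVE my_tree(id, parent, level) AS (
        SELECT trx_id, parent_num, 1
        FROM tbl_test
        WHERE trx_id = ?
        UNION ALL
        SELECT t.trx_id, t.parent_num, p.level + 1
        FROM my_tree p
        JOIN tbl_test t ON t.trx_id = p.parent
    )
    SELECT id FROM my_tree WHERE level > 1 ORDER BY level
"""

ancestors = [row[0] for row in conn.execute(ancestors_sql, (5,))]
print(ancestors)
```

Starting from trx_id 5, this yields the chain 3, 2, 1 that the question asks for.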
If you are using PostgreSQL, try using a WITH clause:
WITH regional_sales AS (
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region
), top_regions AS (
SELECT region
FROM regional_sales
WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
)
SELECT region,
product,
SUM(quantity) AS product_units,
SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;

Joining many tables on same data and returning all rows

UPDATE:
My original attempt to use FULL OUTER JOIN did not work correctly. I have updated the question to reflect the true issue. Sorry for presenting a classic XY problem.
I'm trying to retrieve a dataset from multiple tables in a single query, grouped by the year and month of the data.
The final result should look like this:
| Year | Month | Col1 | Col2 | Col3 |
|------+-------+------+------+------|
| 2012 | 11 | 231 | - | - |
| 2012 | 12 | 534 | 12 | 13 |
| 2013 | 1 | - | 22 | 14 |
Coming from data that looks like this:
Table 1:
| Year | Month | Data |
|------+-------+------|
| 2012 | 11 | 231 |
| 2012 | 12 | 534 |
Table 2:
| Year | Month | Data |
|------+-------+------|
| 2012 | 12 | 12 |
| 2013 | 1 | 22 |
Table 3:
| Year | Month | Data |
|------+-------+------|
| 2012 | 12 | 13 |
| 2013 | 1 | 14 |
I tried using FULL OUTER JOIN, but this doesn't quite work: no matter which table I select 'Year' and 'Month' from in my SELECT clause, there are null values.
SELECT
COALESCE(t1.year,t2.year,t3.year)
,COALESCE(t1.month,t2.month,t3.month)
,t1.data as col1
,t2.data as col2
,t3.data as col3
From t1
FULL OUTER JOIN t2
on t1.year = t2.year and t1.month = t2.month
FULL OUTER JOIN t3
on t1.year = t3.year and t1.month = t3.month
The result is something like this (it is too confusing to reproduce exactly what I would get with this demo data):
| Year | Month | Col1 | Col2 | Col3 |
|------+-------+------+------+------|
| 2012 | 11 | 231 | - | - |
| 2012 | 12 | 534 | 12 | 13 |
| 2013 | 1 | - | 22 | |
| - | 1 | - | - | 14 |
If your data allows it (not 100 columns), this is usually a clean way of doing it:
select year, month, sum(col1) as col1, sum(col2) as col2, sum(col3) as col3
from (
SELECT t1.year, t1.month, t1.data as col1, 0 as col2, 0 as col3
From t1
union all
SELECT t2.year, t2.month, 0 as col1, t2.data as col2, 0 as col3
From t2
union all
SELECT t3.year, t3.month, 0 as col1, 0 as col2, t3.data as col3
From t3
) as data
group by year, month
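Here is a runnable sketch of the UNION ALL plus GROUP BY approach on the question's three sample tables. It assumes SQLite as a stand-in for the target database, and it deliberately varies the answer in one respect: the padding columns are NULL instead of 0, so that a missing value stays NULL in the result (matching the "-" cells in the desired output) rather than becoming 0.

```python
import sqlite3

# Assumption: SQLite stands in for the target database.
conn = sqlite3.connect(":memory:")
for name, rows in {
    "t1": [(2012, 11, 231), (2012, 12, 534)],
    "t2": [(2012, 12, 12), (2013, 1, 22)],
    "t3": [(2012, 12, 13), (2013, 1, 14)],
}.items():
    conn.execute(f"CREATE TABLE {name} (year INTEGER, month INTEGER, data INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)

# Stack the tables, each filling only its own column, then collapse the
# rows per (year, month). sum() ignores NULLs and yields NULL when every
# input in the group is NULL.
stacked_sql = """
    SELECT year, month, sum(col1) AS col1, sum(col2) AS col2, sum(col3) AS col3
    FROM (
        SELECT year, month, data AS col1, NULL AS col2, NULL AS col3 FROM t1
        UNION ALL
        SELECT year, month, NULL, data, NULL FROM t2
        UNION ALL
        SELECT year, month, NULL, NULL, data FROM t3
    ) AS stacked
    GROUP BY year, month
    ORDER BY year, month
"""

stacked = conn.execute(stacked_sql).fetchall()
print(stacked)
```

With 0 as the padding value, as in the answer, the gaps would come back as 0 instead of NULL.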
If you are using SQL Server 2005 or later version, you could also try this PIVOT solution:
SELECT
Year,
Month,
Col1,
Col2,
Col3
FROM (
SELECT Year, Month, 'Col1' AS Col, Data FROM t1
UNION ALL
SELECT Year, Month, 'Col2' AS Col, Data FROM t2
UNION ALL
SELECT Year, Month, 'Col3' AS Col, Data FROM t3
) f
PIVOT (
SUM(Data) FOR Col IN (Col1, Col2, Col3)
) p
;
This query can be tested and played with at SQL Fiddle.
Perhaps you are looking for the COALESCE function? It takes a list of arguments and returns the first one that is NOT NULL, or NULL if all arguments are null. In your example, you would do something like this.
SELECT COALESCE(t1.data, t2.data)
You would still need to join tables in this case. It would just cut down on the case statements.
You could derive the complete list of years and months from all the tables, then join every table to that list (using a left join):
SELECT
f.Year,
f.Month,
t1.data AS col1,
t2.data AS col2,
t3.data AS col3
FROM (
SELECT Year, Month FROM t1
UNION
SELECT Year, Month FROM t2
UNION
SELECT Year, Month FROM t3
) f
LEFT JOIN t1 ON f.year = t1.year and f.month = t1.month
LEFT JOIN t2 ON f.year = t2.year and f.month = t2.month
LEFT JOIN t3 ON f.year = t3.year and f.month = t3.month
;
You can see a live demonstration of this query at SQL Fiddle.
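Since the demonstration link may not be available, the same query can be reproduced locally. This sketch assumes SQLite as a stand-in database and loads the three sample tables from the question.

```python
import sqlite3

# Assumption: SQLite stands in for the target database.
conn = sqlite3.connect(":memory:")
for name, rows in {
    "t1": [(2012, 11, 231), (2012, 12, 534)],
    "t2": [(2012, 12, 12), (2013, 1, 22)],
    "t3": [(2012, 12, 13), (2013, 1, 14)],
}.items():
    conn.execute(f"CREATE TABLE {name} (year INTEGER, month INTEGER, data INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)

# Build the full list of (year, month) pairs with UNION (which removes
# duplicates), then left join each table to that list.
left_join_sql = """
    SELECT f.year, f.month, t1.data AS col1, t2.data AS col2, t3.data AS col3
    FROM (
        SELECT year, month FROM t1
        UNION
        SELECT year, month FROM t2
        UNION
        SELECT year, month FROM t3
    ) f
    LEFT JOIN t1 ON f.year = t1.year AND f.month = t1.month
    LEFT JOIN t2 ON f.year = t2.year AND f.month = t2.month
    LEFT JOIN t3 ON f.year = t3.year AND f.month = t3.month
    ORDER BY f.year, f.month
"""

joined = conn.execute(left_join_sql).fetchall()
print(joined)
```

This relies on each table having at most one row per (year, month); duplicates in any table would multiply the joined rows.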
If you are looking for the non-null values from either table, then you will have to add t1.data IS NOT NULL as well. I hope that I understand your question.
CREATE VIEW joined_SALES
AS SELECT t1.year, t1.month, t1.data AS data1, t2.data AS data2
FROM table1 t1, table2 t2
WHERE
t1.year = t2.year
and t1.month = t2.month
and t1.data IS NOT NULL;
This might be a better way, especially if you are going to do something with the data before returning it. Basically you are translating the table the data came from into a typeId.
declare @temp table
([year] int,
[month] int,
typeId int,
data decimal)
insert into @temp
SELECT t1.year, t1.month, 1, sum(t1.data)
From t1
group by t1.year, t1.month
insert into @temp
SELECT t2.year, t2.month, 2, sum(t2.data)
From t2
group by t2.year, t2.month
insert into @temp
SELECT t3.year, t3.month, 3, sum(t3.data)
From t3
group by t3.year, t3.month
select t.year, t.month,
sum(case when t.typeId = 1 then t.data end) as col1,
sum(case when t.typeId = 2 then t.data end) as col2,
sum(case when t.typeId = 3 then t.data end) as col3
from @temp t
group by t.year, t.month