PostgreSQL: summing duplicate elements

A table can contain 2 rows that hold the same information and differ only in a single column value. Basically, the data is duplicated because of this one column. Can I somehow sum the other columns in a way that takes this duplication into account?
To illustrate the idea of the problem
Example:
|id|type|val1|val2|
|1 | 2 | 1 | 1 |
|1 | 3 | 1 | 1 |
|1 | 2 | 2 | 2 |
|1 | 3 | 2 | 2 |
Expected result
|id|type|val1|val2|count|
|1 |2,3 | 3 | 3 | 2 |
Actual result
|id|type|val1|val2|count|
|1 |2,3 | 6 | 6 | 4 |
In the actual data, type and the val columns come from 2 different tables (x and y) that are connected through a third table (a), so the query looks like this:
SELECT a.id,
array_to_string(array_agg(DISTINCT x.type ORDER BY x.type), ','::text) AS type,
sum(y.val1) AS val1,
sum(y.val2) AS val2,
count(y.val1) AS count
FROM a
JOIN x ON x.a_id = a.id AND x.active = true
JOIN y ON y.a_id = a.id AND y.active = true
GROUP BY a.id
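SOLUTION (caveat: sum(DISTINCT ...) and count(DISTINCT ...) also collapse values that are genuinely equal, e.g. two y rows with val1 = 5 are counted once, so this only gives the right totals when each id's val1 and val2 values are all different)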
SELECT a.id,
array_to_string(array_agg(DISTINCT x.type ORDER BY x.type), ','::text) AS type,
sum(distinct y.val1) AS val1,
sum(distinct y.val2) AS val2,
count(distinct y.val1) AS count
FROM a
JOIN x ON x.a_id = a.id AND x.active = true
JOIN y ON y.a_id = a.id AND y.active = true
GROUP BY a.id
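A minimal sketch of where the doubling comes from and of one alternative, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The tables follow the question; the data is made up for illustration (id 1 is the question's example, id 2 is a hypothetical id whose two y rows carry equal values). Instead of DISTINCT, y is aggregated per a_id in a subquery before the join, so its rows are never multiplied by the x rows. The type list is left out because array_agg(DISTINCT ...) is already unaffected by the duplication.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (1 is used for "true").
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY);
CREATE TABLE x (a_id INTEGER, type INTEGER, active INTEGER);
CREATE TABLE y (a_id INTEGER, val1 INTEGER, val2 INTEGER, active INTEGER);
INSERT INTO a VALUES (1), (2);
-- id 1 is the question's example; id 2 (hypothetical) has two equal y rows
INSERT INTO x VALUES (1, 2, 1), (1, 3, 1), (2, 2, 1);
INSERT INTO y VALUES (1, 1, 1, 1), (1, 2, 2, 1), (2, 5, 5, 1), (2, 5, 5, 1);
""")

JOINS = """
FROM a
JOIN x ON x.a_id = a.id AND x.active = 1
JOIN y ON y.a_id = a.id AND y.active = 1
GROUP BY a.id
ORDER BY a.id
"""

# Original query: every y row is repeated once per matching x row.
naive = conn.execute(
    "SELECT a.id, sum(y.val1), sum(y.val2), count(y.val1)" + JOINS
).fetchall()

# DISTINCT workaround: also merges y rows whose values merely happen to be equal.
distinct = conn.execute(
    "SELECT a.id, sum(DISTINCT y.val1), sum(DISTINCT y.val2), count(DISTINCT y.val1)"
    + JOINS
).fetchall()

# Pre-aggregation: y is reduced to one row per a_id before joining, and the
# x subquery only keeps the requirement that an id has an active x row.
preagg = conn.execute("""
SELECT a.id, ys.val1, ys.val2, ys.cnt
FROM a
JOIN (SELECT DISTINCT a_id FROM x WHERE active = 1) xs ON xs.a_id = a.id
JOIN (SELECT a_id, sum(val1) AS val1, sum(val2) AS val2, count(val1) AS cnt
      FROM y WHERE active = 1
      GROUP BY a_id) ys ON ys.a_id = a.id
ORDER BY a.id
""").fetchall()

print(naive)     # [(1, 6, 6, 4), (2, 10, 10, 2)]
print(distinct)  # [(1, 3, 3, 2), (2, 5, 5, 1)]
print(preagg)    # [(1, 3, 3, 2), (2, 10, 10, 2)]
```

For id 2 the DISTINCT version reports 5 instead of 10, while the pre-aggregated version agrees with the expected totals for both ids. The same subquery shape should carry over to PostgreSQL with `active = true` in place of `active = 1`.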

Related

SQL: select the max from each group and give them different labels

For the following table:
-- People
id | category | count
----+----------+-------
1 | a | 2
1 | a | 3
1 | b | 2
2 | a | 2
2 | b | 3
3 | a | 1
3 | a | 2
I know that I can find the max count for each id in each category by doing:
SELECT id, category, max(count) from People group by category, id;
With result:
id | category | max
----+----------+-------
1 | a | 3
1 | b | 2
2 | a | 2
2 | b | 3
3 | a | 2
But what if now I want to label the max values differently, like:
id | max_b_count | max_a_count
----+-------------+------------
1 | 2 | 3
2 | 3 | 2
3 | Null | 2
Should I do something like the following?
WITH t AS (SELECT id, category, max(count) from People group by category, id)
SELECT t.id, t.count as max_a_count from t where t.category = 'a'
FULL OUTER JOIN t.id, t.count as max_b_count from t where t.category = 'b'
on t.id;
It looks weird to me.
This is exactly the use case the FILTER clause was added to aggregate expressions for.
With a FILTER clause you can limit which rows an aggregate sees:
aggregate_name ( * ) [ FILTER ( WHERE filter_clause ) ]
For your example:
SELECT id,
max(count) filter (where category = 'a') as max_a_count,
max(count) filter (where category = 'b') as max_b_count
from People
group by id
order by 1;
id|max_a_count|max_b_count|
--+-----------+-----------+
1| 3| 2|
2| 2| 3|
3| 2| |
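A small runnable check of the FILTER query, assuming SQLite 3.30 or newer (the version that added FILTER on aggregates) as a stand-in for PostgreSQL. The sample rows are the question's; the column `count` is double-quoted here only to be safe, since it is also the name of an aggregate function.

```python
import sqlite3

# SQLite stand-in for the question's People table.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE people (id INTEGER, category TEXT, "count" INTEGER)')
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "a", 2), (1, "a", 3), (1, "b", 2),
     (2, "a", 2), (2, "b", 3),
     (3, "a", 1), (3, "a", 2)],
)

# One pass over the table: each aggregate only sees the rows its FILTER admits,
# and an id with no matching rows gets NULL (None in Python).
rows = conn.execute("""
SELECT id,
       max("count") FILTER (WHERE category = 'a') AS max_a_count,
       max("count") FILTER (WHERE category = 'b') AS max_b_count
FROM people
GROUP BY id
ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Each printed tuple is (id, max_a_count, max_b_count), matching the table above, with None for id 3's missing category b.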
This is one way you can do it:
with T as (select id, category, max(count_ab) maks
from people
group by id, category
order by id)
select t3.id
, (select t1.maks from T t1 where category = 'b' and t1.id = t3.id) max_b_count
, (select t2.maks from T t2 where category = 'a' and t2.id = t3.id) max_a_count
from T t3
group by t3.id
order by t3.id
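The correlated-subquery version can be checked the same way. This sketch runs it on SQLite with the question's rows and the renamed column count_ab; the ORDER BY inside the CTE is dropped because it does not affect the result. Each scalar subquery looks up one category per id, whereas the FILTER version computes both maxima in a single grouped pass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, category TEXT, count_ab INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "a", 2), (1, "a", 3), (1, "b", 2),
     (2, "a", 2), (2, "b", 3),
     (3, "a", 1), (3, "a", 2)],
)

rows = conn.execute("""
WITH t AS (
    -- max per (id, category), as in the answer's CTE
    SELECT id, category, max(count_ab) AS maks
    FROM people
    GROUP BY id, category
)
SELECT t3.id,
       (SELECT t1.maks FROM t t1 WHERE t1.category = 'b' AND t1.id = t3.id) AS max_b_count,
       (SELECT t2.maks FROM t t2 WHERE t2.category = 'a' AND t2.id = t3.id) AS max_a_count
FROM t t3
GROUP BY t3.id
ORDER BY t3.id
""").fetchall()

for row in rows:
    print(row)
```

Note the column order here is (id, max_b_count, max_a_count), as in the question's desired output.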
Also, as you can see, I have renamed the column count to count_ab, because it is not good practice to use keywords as column names.

Left join two tables - don't include the joins where the second table has more than 1 row for a value from the first table; rejects

As the title says, I want to reject rows so that I don't create duplicates.
The first step is to not join on values that have more than one row in the second table.
Here is an example if needed:
Table a:
aa |bb |
---|----|
1 |111 |
2 |222 |
Table h:
hh |kk |
---|----|
1 |111 |
2 |111 |
3 |222 |
Using Normal Left join:
SELECT
*
FROM a
LEFT JOIN h
ON a.bb = h.kk
;
I get:
aa |bb |hh |kk |
---|----|---|----|
1 |111 |1 |111 |
1 |111 |2 |111 |
2 |222 |3 |222 |
I want to get rid of first two rows, where aa = 1.
...
The second step would be a separate query (probably with a CASE) where I filter table a down to only those rows that have more than one matching row in table h.
So I want to create table c, where I will have:
aa |bb |
---|----|
1 |111 |
Can someone help me please?
Thank you.
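To get only the 1:1 joins (every selected column of a must appear in the GROUP BY; min() picks out the single matching h row):
SELECT a.aa, a.bb, min(h.hh) AS hh, min(h.kk) AS kk
FROM a
LEFT JOIN h ON a.bb = h.kk
GROUP BY a.aa, a.bb
HAVING count(h.kk) = 1
To get only the 1:n joins (one row per row of a, which is what table c needs):
SELECT a.aa, a.bb
FROM a
LEFT JOIN h ON a.bb = h.kk
GROUP BY a.aa, a.bb
HAVING count(h.kk) > 1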
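A runnable sketch of the GROUP BY / HAVING idea on the question's tables, using SQLite as a stand-in for PostgreSQL. It groups by every selected column of a (so the query is also valid in PostgreSQL, which rejects ungrouped, unaggregated columns) and uses `CREATE TABLE ... AS` for the hypothetical step-2 table c.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (aa INTEGER, bb INTEGER);
CREATE TABLE h (hh INTEGER, kk INTEGER);
INSERT INTO a VALUES (1, 111), (2, 222);
INSERT INTO h VALUES (1, 111), (2, 111), (3, 222);
""")

# Step 1: keep only the rows of a that match exactly one row of h;
# with exactly one match, min() simply returns that row's values.
one_to_one = conn.execute("""
SELECT a.aa, a.bb, min(h.hh) AS hh, min(h.kk) AS kk
FROM a
LEFT JOIN h ON a.bb = h.kk
GROUP BY a.aa, a.bb
HAVING count(h.kk) = 1
""").fetchall()

# Step 2: store the rows of a that match more than one row of h as table c.
conn.execute("""
CREATE TABLE c AS
SELECT a.aa, a.bb
FROM a
LEFT JOIN h ON a.bb = h.kk
GROUP BY a.aa, a.bb
HAVING count(h.kk) > 1
""")
c_rows = conn.execute("SELECT aa, bb FROM c").fetchall()

print(one_to_one)  # [(2, 222, 3, 222)]
print(c_rows)      # [(1, 111)]
```

Rows of a with no match at all (count 0) fall out of both queries; changing `= 1` to `<= 1` would keep them in step 1.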

How to pivot postgresql (Amazon RDS) dataset around values to make a histogram?

I'm using Amazon RDS (Aurora), so I don't have access to the crosstab() function.
My dataset is a count of particular actions per user and looks like:
| uid | action1 | action2 |
| alice | 2 | 2 |
| bob | 1 | 2 |
| charlie | 5 | 0 |
How can I pivot this dataset to make a histogram of action counts? So it would look like:
# | Action1 | Action2
---------------------
0 | | 1
1 | 1 |
2 | 1 | 2
3 | |
4 | |
5 | 1 |
6 | |
Here's a SQL fiddle I've been using with the values already entered: http://sqlfiddle.com/#!17/2b966/1
I have a solution but it is very verbose:
WITH nums AS (
SELECT n
FROM (VALUES (0), (1), (2), (3), (4), (5)) nums(n)
),
action1_counts as (
select
action1,
count(*) as total
from test
group by 1
),
action2_counts as (
select
action2,
count(*) as total
from test
group by 1
)
select
nums.n,
coalesce(a1.total, 0) as Action1,
coalesce(a2.total, 0) as Action2
from nums
LEFT join action1_counts a1 on a1.action1 = nums.n
LEFT join action2_counts a2 on a2.action2 = nums.n
order by 1
Assuming the action counts are between 0 and 6:
select a1.action, a1.action1, nullif(count(t2.action2),0) as action2
from
( select t.action, nullif(count(t1.action1),0) as action1
from
(select action from generate_series(0,6) g(action)) t
left join
test t1
on t1.action1 = t.action
group by t.action
) a1
left join
test t2
on t2.action2 = a1.action
group by a1.action, a1.action1
order by a1.action
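A runnable check of this answer's query on the question's sample rows, with SQLite standing in for PostgreSQL. One substitution is made: SQLite as bundled with Python may not provide generate_series(), so a recursive CTE produces the buckets 0 to 6 instead. The rest of the query is unchanged. Counting action1 in the inner query before joining action2 is what stops the two joins from multiplying each other's rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (uid TEXT, action1 INTEGER, action2 INTEGER);
INSERT INTO test VALUES ('alice', 2, 2), ('bob', 1, 2), ('charlie', 5, 0);
""")

rows = conn.execute("""
WITH RECURSIVE t(action) AS (
    -- stand-in for generate_series(0, 6)
    SELECT 0
    UNION ALL
    SELECT action + 1 FROM t WHERE action < 6
)
SELECT a1.action, a1.action1, nullif(count(t2.action2), 0) AS action2
FROM (SELECT t.action, nullif(count(t1.action1), 0) AS action1
      FROM t
      LEFT JOIN test t1 ON t1.action1 = t.action
      GROUP BY t.action) a1
LEFT JOIN test t2 ON t2.action2 = a1.action
GROUP BY a1.action, a1.action1
ORDER BY a1.action
""").fetchall()

for row in rows:
    print(row)  # (bucket, action1 count or None, action2 count or None)
```

nullif(..., 0) turns empty buckets into NULL, which is what leaves the blank cells in the desired histogram.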

How to eliminate repeated field with GROUP BY clause?

I have 3 tables:
1.app_tenant pk:id, fk:pasar_id
---+--------+-----------+
id | nama | pasar_id |
----+--------+-----------+
1 | joe | 1 |
2 | adi | 2 |
3 | adam | 3 |
2.app_pasar pk:id
----+------------- +
id | nama |
----+------------- +
1 | kosambi |
2 | gede bage |
3 | pasar minggu |
3.app_kios pk:id, fk:tenant_id
----+---------------+----------
id | nama |tenant_id
----+-------------- +----------
1 | kios1 |1
2 | kios2 |2
3 | kios3 |3
4 | kios4 |1
5 | kios5 |1
6 | kios6 |2
7 | kios7 |2
8 | kios8 |3
9 | kios9 |3
Then, with a LEFT JOIN query grouped by the id of every table, I want to display the data like this:
----+---------------+------------+-----------
id | nama_tenant |nama_pasar |nama_kios
----+-------------- +------------------------
1 | joe |kosambi |kios 1
2 | adi |gede bage |kios 2
3 | adam |pasar minggu|kios 3
But when I execute this query, the data are not shown as expected: the nama_tenant field is repeated. How can I eliminate the repeated nama_tenant records?
This is my query:
select a.id,a.nama as nama_tenant,
b.nama as nama_pasar,
c.nama as nama_kios
from app_tenant a
left join app_pasar b on a.id=b.id
left join app_kios c on a.id= c.tenant_id
group by
a.id,
b.id,
c.id
Table definitions:
CREATE TABLE app_tenant (
id serial PRIMARY KEY,
nama character varying,
pasar_id integer);
CREATE TABLE app_kios (
id serial PRIMARY KEY,
nama character varying,
tenant_id integer REFERENCES app_tenant);
The problem is that tenants can have multiple kiosks. From your sample data it looks like you want to display the first kiosk of every tenant (although "first" is a vague concept on strings, here I use alphabetical sort order). Your query would be like this:
SELECT t.id, t.nama AS nama_tenant, p.nama AS nama_pasar, k.nama AS nama_kios
FROM app_tenant t
LEFT JOIN app_pasar p ON p.id = t.pasar_id
LEFT JOIN (
-- rnk is a window result, so it can only be filtered one query level up
SELECT tenant_id, nama
FROM (
SELECT tenant_id, nama, rank() OVER (PARTITION BY tenant_id ORDER BY nama) AS rnk
FROM app_kios) ranked
WHERE rnk = 1) k ON k.tenant_id = t.id
ORDER BY t.id
The sub-query on app_kios uses a window function to rank each tenant's kiosks by name, so that only the first kiosk name per tenant is kept.
I would also suggest using meaningful table aliases instead of just a, b, c.
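A runnable sketch of the first-kiosk-per-tenant lookup on the question's sample data, with SQLite (3.25 or newer, for window functions) standing in for PostgreSQL. The rank filter sits in an outer query because WHERE is evaluated before window functions, so it cannot refer to a window result computed in the same SELECT. The app_pasar columns are assumed from the sample data, since its definition is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_pasar (id INTEGER PRIMARY KEY, nama TEXT);
CREATE TABLE app_tenant (id INTEGER PRIMARY KEY, nama TEXT, pasar_id INTEGER);
CREATE TABLE app_kios (id INTEGER PRIMARY KEY, nama TEXT, tenant_id INTEGER);
INSERT INTO app_pasar VALUES (1, 'kosambi'), (2, 'gede bage'), (3, 'pasar minggu');
INSERT INTO app_tenant VALUES (1, 'joe', 1), (2, 'adi', 2), (3, 'adam', 3);
INSERT INTO app_kios VALUES
  (1, 'kios1', 1), (2, 'kios2', 2), (3, 'kios3', 3),
  (4, 'kios4', 1), (5, 'kios5', 1), (6, 'kios6', 2),
  (7, 'kios7', 2), (8, 'kios8', 3), (9, 'kios9', 3);
""")

rows = conn.execute("""
SELECT t.id, t.nama AS nama_tenant, p.nama AS nama_pasar, k.nama AS nama_kios
FROM app_tenant t
LEFT JOIN app_pasar p ON p.id = t.pasar_id
LEFT JOIN (
    SELECT tenant_id, nama
    FROM (SELECT tenant_id, nama,
                 rank() OVER (PARTITION BY tenant_id ORDER BY nama) AS rnk
          FROM app_kios) ranked
    WHERE rnk = 1          -- filter applied after the ranking is computed
) k ON k.tenant_id = t.id
ORDER BY t.id
""").fetchall()

for row in rows:
    print(row)
```

rank() gives tied names the same rank, so a tenant with two identically named first kiosks would still appear twice; row_number() would guarantee exactly one row per tenant.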

postgres counting one record twice if it meets certain criteria

I expected the query below to do what I describe, but apparently it doesn't...
My table looks like this:
id | name | g | partner | g2
1 | John | M | Sam | M
2 | Devon | M | Mike | M
3 | Kurt | M | Susan | F
4 | Stacy | F | Bob | M
5 | Rosa | F | Rita | F
I'm trying to get the id where either the g or the g2 value equals 'M'. But a record where both g and g2 are 'M' should return two lines, not one.
My query is:
$q = pg_query("SELECT id FROM mytable WHERE ( g = 'M' OR g2 = 'M' )");
With the sample data above, I'm trying to return:
1
1
2
2
3
4
But it always returns:
1
2
3
4
Your query doesn't work because each row is returned only once, whether it matches one or both of the conditions. To get what you want, run two queries and combine their results with UNION ALL:
SELECT id FROM mytable WHERE g = 'M'
UNION ALL
SELECT id FROM mytable WHERE g2 = 'M'
ORDER BY id
Result:
1
1
2
2
3
4
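A quick way to see the difference between UNION ALL and plain UNION on the question's rows, using SQLite as a stand-in for PostgreSQL (both follow the standard semantics here: UNION removes duplicate rows, UNION ALL keeps them).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT, g TEXT, partner TEXT, g2 TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", [
    (1, "John", "M", "Sam", "M"),
    (2, "Devon", "M", "Mike", "M"),
    (3, "Kurt", "M", "Susan", "F"),
    (4, "Stacy", "F", "Bob", "M"),
    (5, "Rosa", "F", "Rita", "F"),
])

# UNION ALL keeps every row from both branches, so an id that
# matches both conditions appears twice.
union_all = [r[0] for r in conn.execute("""
SELECT id FROM mytable WHERE g = 'M'
UNION ALL
SELECT id FROM mytable WHERE g2 = 'M'
ORDER BY id
""")]

# Plain UNION removes duplicate rows, which brings back the one-row-per-id result.
union = [r[0] for r in conn.execute("""
SELECT id FROM mytable WHERE g = 'M'
UNION
SELECT id FROM mytable WHERE g2 = 'M'
ORDER BY id
""")]

print(union_all)  # [1, 1, 2, 2, 3, 4]
print(union)      # [1, 2, 3, 4]
```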
You might try a UNION ALL along these lines (a plain UNION removes duplicate rows, so it would again return each id only once):
"SELECT id FROM mytable WHERE ( g = 'M') UNION ALL SELECT id FROM mytable WHERE ( g2 = 'M')"
Hope this helps, Martin