Count the unique and duplicate values in a column using PostgreSQL

Goal: Find the count of uniques and duplicates in the worker_ref_id column.
I found a solution here for MySQL, but its IF() function does not exist in PostgreSQL. So, how would I do that in PostgreSQL?
I have the following table:
|worker_ref_id|bonus_amount|
|-------------|------------|
| 1| 5000|
| 2| 3000|
| 3| 4000|
| 1| 4500|
| 2| 3500|
I would like the following output:
|Unique|Duplicates|
|------|----------|
|1 |2 |
I get the right numbers, but they appear as two rows rather than one row with two columns:
SELECT COUNT(*) AS "Duplicate"
FROM (SELECT worker_ref_id, COUNT(worker_ref_id) AS "Count"
      FROM bonus
      GROUP BY worker_ref_id
      HAVING COUNT(worker_ref_id) > 1) AS mySub
UNION
SELECT COUNT(*) AS "Unique"
FROM (SELECT worker_ref_id, COUNT(worker_ref_id) AS "Count"
      FROM bonus
      GROUP BY worker_ref_id
      HAVING COUNT(worker_ref_id) = 1) AS mySub2

We can do this in two steps, using a CTE and the aggregate FILTER clause:
WITH cte AS (
    SELECT worker_ref_id, COUNT(*) AS cnt
    FROM bonus
    GROUP BY worker_ref_id
)
SELECT
    COUNT(*) FILTER (WHERE cnt = 1) AS "Unique",
    COUNT(*) FILTER (WHERE cnt > 1) AS "Duplicates"
FROM cte;
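As a quick sanity check, the same query runs on SQLite (3.30+ supports the aggregate FILTER clause) via Python's sqlite3, with the sample data from the question:

```python
# Verifying the CTE + FILTER approach on SQLite (>= 3.30 for FILTER).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bonus (worker_ref_id INTEGER, bonus_amount INTEGER)")
conn.executemany(
    "INSERT INTO bonus VALUES (?, ?)",
    [(1, 5000), (2, 3000), (3, 4000), (1, 4500), (2, 3500)],
)

row = conn.execute("""
    WITH cte AS (
        SELECT worker_ref_id, COUNT(*) AS cnt
        FROM bonus
        GROUP BY worker_ref_id
    )
    SELECT
        COUNT(*) FILTER (WHERE cnt = 1) AS "Unique",
        COUNT(*) FILTER (WHERE cnt > 1) AS "Duplicates"
    FROM cte
""").fetchone()
print(row)  # (1, 2): worker 3 is unique, workers 1 and 2 are duplicated
```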

Related

How to select specific items in IN sql

I have a table "products" with a column called "store_id".
This table has a lot of products from many stores.
I need to select 4 random products from 4 specific stores (ids: 1, 34, 45, 100).
How can I do that?
I've tried this:
SELECT * FROM products WHERE store_id IN (1, 34, 45, 100)
But that query returns every matching product, so each store_id appears many times.
I need the following result:
|store_id|title |
|--------|-------|
|1 |title a|
|34 |title b|
|45 |title c|
|100 |title d|
To get a truly random pick of the products, use the row_number() window function with a random order.
This query shows all the data, with a random index (rn) for each store's products:
select products.*,
row_number() over (partition by store_id order by random()) rn
from products
where store_id in (1,34)
store_id|product_id|title|rn|
--------+----------+-----+--+
1| 1|a | 1|
1| 3|c | 2|
1| 2|b | 3|
34| 6|f | 1|
34| 7|g | 2|
34| 8|h | 3|
34| 5|e | 4|
34| 4|d | 5|
To get only one product per store, simply filter with rn = 1:
with prod as (
select products.*,
row_number() over (partition by store_id order by random()) rn
from products
where store_id in (1,34)
)
select store_id, title from prod
where rn = 1
;
store_id|title|
--------+-----+
1|a |
34|e |
Note this query will produce a different result on each run. If you need stable results, call setseed() before each execution, e.g.:
SELECT setseed(1);
Use the DISTINCT ON construct to get one record per store. Note that DISTINCT ON alone picks an arbitrary row; add random() to the ORDER BY to make the pick random:
SELECT DISTINCT ON (store_id) store_id, title FROM products WHERE store_id IN (1, 34, 45, 100) ORDER BY store_id, random();
Demo in sqldaddy.io
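The rn = 1 pattern can be sanity-checked outside Postgres; here is a small sketch using Python's sqlite3 (SQLite also has row_number() and random()), with hypothetical product rows mirroring the example:

```python
# Checking the "row_number() ... order by random()" idea on SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (store_id INTEGER, product_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 2, "b"), (1, 3, "c"),
     (34, 4, "d"), (34, 5, "e"), (34, 6, "f"), (34, 7, "g"), (34, 8, "h")],
)

rows = conn.execute("""
    WITH prod AS (
        SELECT store_id, title,
               ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY random()) AS rn
        FROM products
        WHERE store_id IN (1, 34)
    )
    SELECT store_id, title FROM prod WHERE rn = 1 ORDER BY store_id
""").fetchall()
print(rows)  # one randomly chosen (store_id, title) pair per store
```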

Concatenate a column value over several rows based on a condition

I have a table in the following format (id is the primary key):
id|timestamps |year|month|day|groups_ids|status |SCHEDULED |uid|
--|-------------------|----|-----|---|----------|-------|-------------------|---|
1|2021-02-04 17:18:24|2020| 8| 9| 1|OK |2020-08-09 00:00:00| 1|
2|2021-02-04 17:18:09|2020| 9| 9| 1|OK |2020-09-09 00:00:00| 1|
3|2021-02-04 17:19:51|2020| 10| 9| 1|HOLD |2020-10-09 00:00:00| 1|
4|2021-02-04 17:19:04|2020| 10| 10| 2|HOLD |2020-10-09 00:00:00| 1|
5|2021-02-04 17:18:30|2020| 10| 11| 2|HOLD |2020-10-09 00:00:00| 1|
6|2021-02-04 17:18:57|2020| 10| 12| 2|OK |2020-10-09 00:00:00| 1|
7|2021-02-04 17:18:24|2020| 8| 9| 1|HOLD |2020-08-09 00:00:00| 2|
8|2021-02-04 17:18:09|2020| 9| 9| 2|HOLD |2020-09-09 00:00:00| 2|
9|2021-02-04 17:19:51|2020| 10| 9| 2|HOLD |2020-10-09 00:00:00| 2|
10|2021-02-04 17:19:04|2020| 10| 10| 2|HOLD |2020-10-09 00:00:00| 2|
11|2021-02-04 17:18:30|2020| 10| 11| 2|HOLD |2020-10-09 00:00:00| 2|
12|2021-02-04 17:18:57|2020| 10| 12| 2|HOLD |2020-10-09 00:00:00| 2|
The job: for each uid I want to extract every groups_ids where the status is OK, ordered by SCHEDULED ascending; if no OK is found in the records for that uid, it should take the latest HOLD based on year, month and day. After that I want to compute a weighted score from the group ids:
group_ids > score
1 > 100
2 > 80
3 > 60
4 > 50
5 > 10
6 > 50
7 > 0
So [1,1,2] becomes (100+100+80) = 280.
The result should look like this:
ids|uid|pattern|score|
---|---|-------|-----|
1| 1|[1,1,2]| 280|
2| 2|[2] | 80|
It's pretty hard, since I can't find anything like Python's for loops and list appends in PostgreSQL.
step-by-step demo: db<>fiddle
SELECT
    s.uid, s."values",
    SUM(v.value) AS score
FROM (
    SELECT DISTINCT ON (uid)
        uid,
        CASE
            WHEN cardinality(ok_count) > 0 THEN ok_count
            ELSE ARRAY[last_value]
        END AS "values"
    FROM (
        SELECT
            *,
            ARRAY_AGG(groups_ids) FILTER (WHERE status = 'OK') OVER (PARTITION BY uid ORDER BY scheduled) AS ok_count,
            first_value(groups_ids) OVER (PARTITION BY uid ORDER BY year DESC, month DESC, day DESC) AS last_value
        FROM mytable
    ) s
    ORDER BY uid, scheduled DESC
) s,
unnest(s."values") AS u_group_id
JOIN (VALUES
    (1, 100), (2, 80), (3, 60), (4, 50), (5, 10), (6, 50), (7, 0)
) v(group_id, value) ON v.group_id = u_group_id
GROUP BY s.uid, s."values"
Phew... quite complex. Let's have a look at the steps:
a)
SELECT
    *,
    -- 1:
    ARRAY_AGG(groups_ids) FILTER (WHERE status = 'OK') OVER (PARTITION BY uid ORDER BY scheduled) AS ok_count,
    -- 2:
    first_value(groups_ids) OVER (PARTITION BY uid ORDER BY year DESC, month DESC, day DESC) AS last_value
FROM mytable
Using the array_agg() window function creates an array of the group ids without losing the other data, as we would with a plain GROUP BY. The FILTER clause puts only the status = 'OK' records into the array.
Find the last group id of a group (partition) using the first_value() window function: with descending order it returns the most recent value.
b)
SELECT DISTINCT ON (uid) -- 2
    uid,
    CASE -- 1
        WHEN cardinality(ok_count) > 0 THEN ok_count
        ELSE ARRAY[last_value]
    END AS "values"
FROM (
    ...
) s
ORDER BY uid, scheduled DESC -- 2
The CASE expression either takes the previously created array (from step a1) or, if there is none, takes the last value (from step a2) and wraps it into a one-element array.
The DISTINCT ON clause returns only the first element of an ordered group. The group is your uid and the order is given by the column scheduled. Since you want the last record within the group, not the first, you have to order DESC to make the most recent one the topmost record, which DISTINCT ON then picks.
c)
SELECT
    uid,
    group_id
FROM (
    ...
) s,
unnest(s."values") AS group_id -- 1
The arrays are expanded into one record per element, which makes it easy to join the weighted values later.
d)
SELECT
    s.uid, s."values",
    SUM(v.weighted_value) AS score -- 2
FROM (
    ...
) s,
unnest(s."values") AS u_group_id
JOIN (VALUES
    (1, 100), (2, 80), ...
) v(group_id, weighted_value) ON v.group_id = u_group_id -- 1
GROUP BY s.uid, s."values" -- 2
Join your weights on the array elements. Naturally, this can also be a table or a subquery.
Regroup by uid to calculate the SUM() of the weighted values.
Additional note:
You should avoid storing duplicate data. You don't need to store the date parts year, month and day if you also store the complete date; you can always calculate them from it.
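Since the question mentions Python's for loop and append, here is a plain-Python sketch of the same per-uid logic (in-memory tuples stand in for the table rows):

```python
# Rows are (uid, groups_ids, status, scheduled, (year, month, day)),
# mirroring the question's sample table.
rows = [
    (1, 1, "OK",   "2020-08-09", (2020,  8,  9)),
    (1, 1, "OK",   "2020-09-09", (2020,  9,  9)),
    (1, 1, "HOLD", "2020-10-09", (2020, 10,  9)),
    (1, 2, "HOLD", "2020-10-09", (2020, 10, 10)),
    (1, 2, "HOLD", "2020-10-09", (2020, 10, 11)),
    (1, 2, "OK",   "2020-10-09", (2020, 10, 12)),
    (2, 1, "HOLD", "2020-08-09", (2020,  8,  9)),
    (2, 2, "HOLD", "2020-09-09", (2020,  9,  9)),
    (2, 2, "HOLD", "2020-10-09", (2020, 10,  9)),
    (2, 2, "HOLD", "2020-10-09", (2020, 10, 10)),
    (2, 2, "HOLD", "2020-10-09", (2020, 10, 11)),
    (2, 2, "HOLD", "2020-10-09", (2020, 10, 12)),
]
weight = {1: 100, 2: 80, 3: 60, 4: 50, 5: 10, 6: 50, 7: 0}

scores = {}
for uid in sorted({r[0] for r in rows}):
    mine = [r for r in rows if r[0] == uid]
    oks = sorted((r for r in mine if r[2] == "OK"), key=lambda r: r[3])
    if oks:
        pattern = [r[1] for r in oks]           # every OK, by SCHEDULED ascending
    else:
        latest = max(mine, key=lambda r: r[4])  # latest HOLD by year/month/day
        pattern = [latest[1]]
    scores[uid] = (pattern, sum(weight[g] for g in pattern))

print(scores)  # {1: ([1, 1, 2], 280), 2: ([2], 80)}
```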

Counting items in PostgreSQL, outputting zero for those not found

I make the following request to count my items:
SELECT COUNT(cee."entryId"), cee."categoryId" FROM category_entries_entry cee
WHERE cee."categoryId" IN (1, 2, 3)
GROUP BY cee."categoryId";
If there are no entries for categories 1 and 2, I only see a result row for category 3. Nevertheless, I would like to get the following output:
categoryId|count|
----------|-----|
         1|    0|
         2|    0|
         3|    5|
How do I achieve it?
Meta:
PostgreSQL version: 12.3
Use a left join against a values clause:
SELECT COUNT(cee."entryId"),
t.id as category_id
FROM (
values (1),(2),(3)
) as t(id)
left join category_entries_entry cee on cee."categoryId" = t.id
GROUP BY t.id;
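A quick way to verify the zero-count behavior is with Python's sqlite3. SQLite lacks the `as t(id)` alias syntax, so a CTE supplies the column name; the table contents are made up to match the desired output:

```python
# LEFT JOIN against a fixed list of ids: unmatched ids count as 0,
# because COUNT(column) ignores the NULLs produced by the outer join.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category_entries_entry (entryId INTEGER, categoryId INTEGER)")
conn.executemany(
    "INSERT INTO category_entries_entry VALUES (?, ?)",
    [(i, 3) for i in range(1, 6)],  # five entries, all in category 3
)

rows = conn.execute("""
    WITH t(id) AS (VALUES (1), (2), (3))
    SELECT t.id, COUNT(cee.entryId)
    FROM t
    LEFT JOIN category_entries_entry cee ON cee.categoryId = t.id
    GROUP BY t.id
    ORDER BY t.id
""").fetchall()
print(rows)  # [(1, 0), (2, 0), (3, 5)]
```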

PostgreSQL select: show a fixed count of rows

Simple question: I have a table "tablename" with 3 rows. I need my SELECT to show 5 rows when the row count is < 5.
select * from tablename
+---------+--------+
|colname1 |colname2|
+---------+--------+
|1        |AAA     |
|2        |BBB     |
|3        |CCC     |
+---------+--------+
This query shows all the rows in the table.
But I need to show 5 rows, where 2 rows are empty.
For example (I need):
+---------+--------+
|colname1 |colname2|
+---------+--------+
|1        |AAA     |
|2        |BBB     |
|3        |CCC     |
|         |        |
|         |        |
+---------+--------+
The last 2 rows are empty. Is this possible?
Something like this:
with num_rows (rn) as (
select i
from generate_series(1,5) i -- adjust here the desired number of rows
), numbered_table as (
select colname1,
colname2,
row_number() over (order by colname1) as rn
from tablename
)
select t.colname1, t.colname2
from num_rows r
left outer join numbered_table t on r.rn = t.rn;
This assigns a number to each row in tablename and joins that to a fixed set of row numbers. If you know that the values in colname1 are always sequential and without gaps (which is highly unlikely), you can join on colname1 directly and drop the row_number() computation in the second CTE.
If you don't care which rows are returned, you can leave out the order by part, but then the rows that are matched will be arbitrary. Leaving out the order by will be a bit more efficient.
The above will always return exactly 5 rows, regardless of how many rows tablename contains. If you want at least 5 rows, then you need to flip the outer join:
....
select t.colname1, t.colname2
from numbered_table t
left outer join num_rows r on r.rn = t.rn;
SQLFiddle example: http://sqlfiddle.com/#!15/e5770/3
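As a sketch, the same padding join can be tried with Python's sqlite3, substituting a recursive CTE for Postgres's generate_series():

```python
# Left join a fixed set of row numbers against the numbered table;
# missing rows come back as NULL/None.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (colname1 INTEGER, colname2 TEXT)")
conn.executemany("INSERT INTO tablename VALUES (?, ?)",
                 [(1, "AAA"), (2, "BBB"), (3, "CCC")])

rows = conn.execute("""
    WITH RECURSIVE num_rows(rn) AS (
        SELECT 1 UNION ALL SELECT rn + 1 FROM num_rows WHERE rn < 5
    ), numbered_table AS (
        SELECT colname1, colname2,
               ROW_NUMBER() OVER (ORDER BY colname1) AS rn
        FROM tablename
    )
    SELECT t.colname1, t.colname2
    FROM num_rows r
    LEFT JOIN numbered_table t ON r.rn = t.rn
    ORDER BY r.rn
""").fetchall()
print(rows)  # [(1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (None, None), (None, None)]
```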

T-SQL: group by and filter where all rows match

I have a table:
id|Col_A|Col_B|
--|-----|-----|
1| A | NULL|
2| A | a |
3| B | a |
4| C | NULL|
5| C | NULL|
I want to select only those Col_A groups where every Col_B is NULL (i.e. only C should be returned in this example).
Tried
SELECT Col_A FROM MYTABLE WHERE Col_B IS NULL GROUP BY Col_A
which gives me A and C. Also tried
SELECT Col_A, Col_B FROM MYTABLE GROUP BY Col_A, Col_B HAVING Col_B IS NULL
which also gives me A and C.
How do you write a T-SQL query to select a group only when all match the criteria?
Thank you
This seems to work:
declare @t table (id int, Col_A char(1), Col_B char(1))
insert into @t (id, Col_A, Col_B) values
(1, 'A', NULL),
(2, 'A', 'a'),
(3, 'B', 'a'),
(4, 'C', NULL),
(5, 'C', NULL)
select Col_A from @t group by Col_A having MAX(Col_B) is null
Because MAX (or MIN) can only return NULL if all of the inputs were NULL.
Result:
Col_A
-----
C
Similarly, if you wanted to check that all of the values were some specific, non-NULL value, you could do that by first confirming that the MIN() and MAX() of that column are equal and then checking either one of them against the sought-for value:
select Col_A from @t group by Col_A
having MAX(Col_B) = MIN(Col_B) and
       MIN(Col_B) = 'a'
This would return just B, since that's the only group where all input values equal 'a'.