Select with group by and count two columns - postgresql

I have this table
+----+----------------------------+---------------+----------+
| ID | CREATED_AT                 | CATEGORY      | TYPE     |
+----+----------------------------+---------------+----------+
|  1 | 2017-11-23 23:00:40.221958 | SEM COBERTURA | callback |
|  2 | 2017-11-23 22:58:36.970052 | VENDA         | ativo    |
|  3 | 2017-11-23 22:47:03.956185 | SEM COBERTURA | ativo    |
|  4 | 2017-11-23 22:42:24.309915 | VENDA         | ativo    |
|  5 | 2017-11-23 22:32:48.780418 | SEM COBERTURA | callback |
|  6 | 2017-11-23 22:12:21.631433 | VENDA         | callback |
|  7 | 2017-11-23 22:09:38.52699  | SEM COBERTURA | ativo    |
|  8 | 2017-11-23 22:08:09.836343 | LIGACAO MUDA  | callback |
|  9 | 2017-11-23 22:08:07.058063 | SEM COBERTURA | callback |
| 10 | 2017-11-23 22:07:02.067439 | LIGACAO MUDA  | other    |
+----+----------------------------+---------------+----------+
With the table above, I want to group by TYPE, count the rows of each TYPE, and also count how many rows in each group have the CATEGORY 'VENDA'. This is what I want:
+----------+------------+----------------------+
| TYPE     | COUNT_TYPE | COUNT_CATEGORY_VENDA |
+----------+------------+----------------------+
| callback | 5          | 1                    |
| ativo    | 4          | 2                    |
| other    | 1          | 0                    |
+----------+------------+----------------------+
The type "callback" appear 5 times and has 1 category "VENDA", "ativo" appear 4 times and has 2 "VENDA"...
To get TYPE and COUNT_TYPE i'm using this query:
SELECT TYPE, count(TYPE) AS COUNT_TYPE
FROM table
WHERE created_at BETWEEN '2017-11-23 00:00:00' AND '2017-11-23 23:59:00'
GROUP BY TYPE
ORDER BY COUNT_TYPE DESC
Can anyone help me, please?

You can use CASE WHEN in PostgreSQL:
SELECT TYPE, count(TYPE) AS COUNT_TYPE,
       SUM(case CATEGORY when 'VENDA' then 1 else 0 end) AS COUNT_CATEGORY_VENDA
FROM table
WHERE created_at BETWEEN '2017-11-23 00:00:00' AND '2017-11-23 23:59:00'
GROUP BY TYPE
ORDER BY COUNT_TYPE DESC

There are several ways to do this. IMHO the simplest is to use CASE WHEN to filter what to count in the query:
SELECT
    TYPE,
    count(TYPE) AS COUNT_TYPE,
    SUM(
        case CATEGORY when 'VENDA' then 1 else 0 end
    ) AS COUNT_CATEGORY_VENDA
FROM table
WHERE
    created_at BETWEEN '2017-11-23 00:00:00' AND '2017-11-23 23:59:00'
GROUP BY TYPE
ORDER BY COUNT_TYPE DESC;
Besides CASE col WHEN d1 THEN v1 WHEN d2 THEN v2 ELSE v3 END, you can also try CASE WHEN col = d1 THEN v1 WHEN col = d2 THEN v2 ELSE v3 END.
Another way is to use sub-queries.
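For example, a sketch of the sub-query approach: a correlated count per group, reusing the placeholder table name and date range from the question.

SELECT t.TYPE,
       count(*) AS COUNT_TYPE,
       -- count only this group's rows whose category is VENDA
       (SELECT count(*)
        FROM table t2
        WHERE t2.TYPE = t.TYPE
          AND t2.CATEGORY = 'VENDA'
          AND t2.created_at BETWEEN '2017-11-23 00:00:00' AND '2017-11-23 23:59:00'
       ) AS COUNT_CATEGORY_VENDA
FROM table t
WHERE created_at BETWEEN '2017-11-23 00:00:00' AND '2017-11-23 23:59:00'
GROUP BY t.TYPE
ORDER BY COUNT_TYPE DESC;

For completeness, PostgreSQL 9.4 and later also support the aggregate FILTER clause, so the conditional count can be written directly as count(*) FILTER (WHERE CATEGORY = 'VENDA').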

Related

PostgreSQL - Check if column value exists in any previous row

I'm working on a problem where I need to check if an ID exists in any previous records within another ID set, and create a tag if it does.
Suppose I have the following table:
| client_id | order_date | supplier_id |
|-----------|------------|-------------|
| 1         | 2022-01-01 | 1           |
| 1         | 2022-02-01 | 2           |
| 1         | 2022-03-01 | 1           |
| 1         | 2022-04-01 | 3           |
| 2         | 2022-05-01 | 1           |
| 2         | 2022-06-01 | 1           |
| 2         | 2022-07-01 | 2           |
And I want to create a column with an "is new supplier" tag (for each client):
| client_id | order_date | supplier_id | is_new_supplier |
|-----------|------------|-------------|-----------------|
| 1         | 2022-01-01 | 1           | True            |
| 1         | 2022-02-01 | 2           | True            |
| 1         | 2022-03-01 | 1           | False           |
| 1         | 2022-04-01 | 3           | True            |
| 2         | 2022-05-01 | 1           | True            |
| 2         | 2022-06-01 | 1           | False           |
| 2         | 2022-07-01 | 2           | True            |
First I tried doing this by creating a dense_rank and filtering out repeated ranks, but it didn't work:
with aux as (
    SELECT client_id,
           order_date,
           supplier_id
    FROM table
)
SELECT *,
       dense_rank() over (
           partition by client_id
           order by supplier_id
       ) as _dense_rank
FROM aux
Another way I thought about doing this, is by creating an auxiliary id with client_id + supplier_id, ordering by date and checking if the aux id exists in any previous row, but I don't know how to do this in SQL.
You are on the right track. Instead of dense_rank, you can just use row_number and add supplier_id to your partition by. Don't forget to order by order_date.
with aux as (
    SELECT client_id,
           order_date,
           supplier_id,
           row_number() over (
               partition by client_id, supplier_id
               order by order_date
           ) as rank
    FROM table
)
SELECT client_id,
       order_date,
       supplier_id,
       rank,
       (rank = 1) as is_new_supplier
FROM aux
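A slightly shorter variant of the same idea (a sketch, keeping the question's placeholder table name): lag() within the (client_id, supplier_id) partition is NULL only on the first order of that pair, which is exactly the "new supplier" condition.

SELECT client_id,
       order_date,
       supplier_id,
       -- a NULL lag means no earlier order from this supplier for this client
       lag(supplier_id) over (
           partition by client_id, supplier_id
           order by order_date
       ) is null as is_new_supplier
FROM table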

Postgres distinct rows whilst also summing

I have a dataset that is similar to this. For each client I need the sum of all quantities, together with the metadata and execution time of the most recent row (greater execution time = more recent) that has a quantity > 0.
| Name  | Quantity | Metadata | Execution time |
|-------|----------|----------|----------------|
| Neil  | 1        | [1,3]    | 4              |
| James | 1        | [2,18]   | 5              |
| Neil  | 1        | [4, 1]   | 6              |
| Mike  | 1        | [5, 42]  | 7              |
| James | -1       | Null     | 8              |
| Neil  | -1       | Null     | 9              |
E.g. the query needs to return:
| Name  | Summed Quantity | Metadata | Execution time |
|-------|-----------------|----------|----------------|
| James | 0               | [2,18]   | 5              |
| Neil  | 1               | [4, 1]   | 6              |
| Mike  | 1               | [5, 42]  | 7              |
My query doesn't quite work as it's not returning the sum of the quantities correctly.
SELECT distinct on (name) name,
       (SELECT cast(sum(quantity) as int)) as summed_quantity,
       meta,
       execution_time
FROM table
where quantity > 0
group by name, meta, execution_time
order by name, execution_time desc;
This query gives a result of:
| Name  | Summed Quantity | Metadata | Execution time |
|-------|-----------------|----------|----------------|
| James | 1               | [2,18]   | 5              |
| Neil  | 1               | [4, 1]   | 6              |
| Mike  | 1               | [5, 42]  | 7              |
i.e. it's just taking the quantity > 0 rows from the WHERE clause and not adding up the quantities in the subquery (I assume because of the DISTINCT clause). I'm unsure how to fix my query to produce the desired output.
This can be achieved using window functions (hence with a single pass of the data):
select
    name,
    sum_qty,
    metadata,
    execution_time
from (
    select
        *,
        sum(Quantity) over (partition by name) as sum_qty,
        row_number() over (
            partition by name, case when quantity > 0 then 1 else 0 end
            order by Execution_time DESC
        ) as rn
    from mytable
) d
where rn = 1 and quantity > 0
order by name
Result:
+-------+---------+----------+----------------+
| name  | sum_qty | metadata | execution_time |
+-------+---------+----------+----------------+
| James | 0       | [2,18]   | 5              |
| Mike  | 1       | [5,42]   | 7              |
| Neil  | 1       | [4,1]    | 6              |
+-------+---------+----------+----------------+
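If you would rather keep the DISTINCT ON shape of the original query, the same trick works as long as the window sum is computed in a subquery before the quantity > 0 filter runs. A sketch, assuming the same mytable name as above:

select distinct on (name)
       name,
       sum_qty,
       metadata,
       execution_time
from (
    select name,
           metadata,
           execution_time,
           quantity,
           -- the sum must be taken over ALL of the client's rows, before filtering
           sum(quantity) over (partition by name) as sum_qty
    from mytable
) s
where quantity > 0
order by name, execution_time desc;

DISTINCT ON keeps the first row per name, which the ORDER BY makes the latest positive-quantity row.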

Condition lead results in postgres query

I have a table person_updates in PostgreSQL with rows like:
| id | status | person_id | modified_at      |
|----|--------|-----------|------------------|
| 1  | INFO   | 2         | 2019-11-01 10:00 |
| 1  | UPDATE | 2         | 2019-11-02 15:00 |
| 1  | DEBUG  | 2         | 2019-11-03 12:00 |
| 3  | INFO   | 4         | 2019-11-04 14:00 |
| 3  | UPDATE | 4         | 2019-11-05 16:00 |
| 5  | INFO   | 6         | 2019-11-06 08:00 |
| 5  | DEBUG  | 6         | 2019-11-07 07:00 |
I want to get the INFO rows that are followed by an UPDATE row:
| id | status | person_id | modified_at      |
|----|--------|-----------|------------------|
| 1  | INFO   | 2         | 2019-11-01 10:00 |
| 3  | INFO   | 4         | 2019-11-04 14:00 |
I've attempted this by doing a lead query:
select d2.id, d2.status, d2.modified_at, d2.person_id,
       lead(d2.status) over (partition by d2.id order by d2.modified_at) as next_status
from person_updates d2
where d2.status = 'INFO'
This returns more rows than I want, and adding an and d2.next_status = 'UPDATE' condition throws an error. How do I do this query?
Like this:
select t.id, t.status, t.modified_at, t.person_id
from (
    select *,
           lead(status) over (partition by id order by modified_at) as next_status
    from person_updates
) t
where t.status = 'INFO' and t.next_status = 'UPDATE'
Results:
| id | status | modified_at              | person_id |
|----|--------|--------------------------|-----------|
| 1  | INFO   | 2019-11-01T10:00:00.000Z | 2         |
| 3  | INFO   | 2019-11-04T14:00:00.000Z | 4         |
You can use the window function lead() to get the status of the next record. Since window functions are not allowed in the WHERE clause, you need to turn the query into a subquery and then filter in the outer query, like so:
select *
from (
    select
        t.*,
        lead(status) over (partition by id order by modified_at) as lead_status
    from person_updates t
) t
where status = 'INFO' and lead_status = 'UPDATE'
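If "followed by an UPDATE row" may mean "followed at any later time" rather than "in the immediately next row", a plain EXISTS works too; a sketch under that assumption:

select p.id, p.status, p.person_id, p.modified_at
from person_updates p
where p.status = 'INFO'
  and exists (
      -- any later UPDATE row for the same id qualifies
      select 1
      from person_updates u
      where u.id = p.id
        and u.status = 'UPDATE'
        and u.modified_at > p.modified_at
  );

On the sample data both readings return the same two rows; they differ only when another status is sandwiched between the INFO and the UPDATE.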

How to fill Null with the previous value in PostgreSQL?

I have a table which contains Null values. I need to replace them with a previous non-Null value.
This is an example of the data I have:
date       | category | start_period | period_number |
------------------------------------------------------
2018-01-01 | A        | 1            | 1             |
2018-01-02 | A        | 0            | Null          |
2018-01-03 | A        | 0            | Null          |
2018-01-04 | A        | 0            | Null          |
2018-01-05 | B        | 1            | 2             |
2018-01-06 | B        | 0            | Null          |
2018-01-07 | B        | 0            | Null          |
2018-01-08 | A        | 1            | 3             |
2018-01-09 | A        | 0            | Null          |
2018-01-10 | A        | 0            | Null          |
The result should look like this:
date       | category | start_period | period_number |
------------------------------------------------------
2018-01-01 | A        | 1            | 1             |
2018-01-02 | A        | 0            | 1             |
2018-01-03 | A        | 0            | 1             |
2018-01-04 | A        | 0            | 1             |
2018-01-05 | B        | 1            | 2             |
2018-01-06 | B        | 0            | 2             |
2018-01-07 | B        | 0            | 2             |
2018-01-08 | A        | 1            | 3             |
2018-01-09 | A        | 0            | 3             |
2018-01-10 | A        | 0            | 3             |
I tried the following query, but in this case, only the first Null value will be replaced.
select
    date,
    category,
    start_period,
    case
        when period_number isnull then lag(period_number) over ()
        else period_number
    end as period_number
from period_table;
Also, I tried to use first_value() window function, but I don't know how to set up the correct window.
Any help is highly appreciated.
You can join the table with itself and get the desired value. This assumes your date column is the primary key or unique.
update your_table upd
set period_number = tbl.period_number
from (
    -- for every date, find the latest date at or before it that has a
    -- non-null period_number (<= lets a non-null row keep its own value)
    select b.date, max(b2.date) as d2
    from your_table b
    inner join your_table b2
        on b2.date <= b.date and b2.period_number is not null
    group by b.date
) t
inner join your_table tbl on tbl.date = t.d2
where t.date = upd.date
If you don't need to update the table but only need a select statement, then:
select yt.date, yt.category, yt.start_period, tbl.period_number
from your_table yt
inner join (
    select b.date, max(b2.date) as d2
    from your_table b
    inner join your_table b2
        on b2.date <= b.date and b2.period_number is not null
    group by b.date
) t on yt.date = t.date
inner join your_table tbl on tbl.date = t.d2
If you replace your case statement with:
(
    select _.period_number
    from period_table as _
    where _.period_number is not null
      and _.category = period_table.category
      and _.date <= period_table.date
    order by _.date desc
    limit 1
) as period_number
Then it should have the intended effect. It's nowhere near as elegant as a window function, but I don't think window functions are quite flexible enough for your specific use case here (or at least, if they are, I don't know how to flex them that much).
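For clarity, a sketch of the complete statement with that correlated subquery spliced in where the CASE expression was:

select
    date,
    category,
    start_period,
    (
        -- latest non-null period_number at or before this row's date
        select _.period_number
        from period_table as _
        where _.period_number is not null
          and _.category = period_table.category
          and _.date <= period_table.date
        order by _.date desc
        limit 1
    ) as period_number
from period_table;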
Examples of window functions and frame clauses:
select
    date, category, score,
    FIRST_VALUE(score) OVER (
        PARTITION BY category
        ORDER BY date
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as first_score
from testing.rec_test
order by date, category
select
    date, category, score,
    LAST_VALUE(score) OVER (
        PARTITION BY category
        ORDER BY date
        RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) as last_score
from testing.rec_test
order by date, category
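For this particular dataset there is also a window shortcut worth noting: since start_period = 1 marks the first row of each period and period_number simply increments in date order (as in the sample data), a running sum of start_period reproduces the desired column. A minimal sketch under that assumption:

select
    date,
    category,
    start_period,
    -- the running count of period starts equals the period's number
    sum(start_period) over (order by date) as period_number
from period_table
order by date;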

Crosstab function and Dates PostgreSQL

I need to create a crosstab from a query in which dates become the column names. The number of order-date columns can increase or decrease depending on the dates passed to the query. The order date is stored in Unix (epoch) format, which is converted to a normal date.
The query is the following:
Select cd.cust_id,
       od.order_id,
       od.order_size,
       (TIMESTAMP 'epoch' + od.order_date * INTERVAL '1 second')::Date As order_date
From consumer_details cd,
     consumer_order od
Where cd.cust_id = od.cust_id
  And od.order_date Between 1469212200 And 1469212600
Order By od.order_id, od.order_date
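As a side note, the epoch conversion can also be written with the built-in to_timestamp() function, which accepts a Unix timestamp in seconds:

-- equivalent to TIMESTAMP 'epoch' + od.order_date * INTERVAL '1 second',
-- except that to_timestamp() returns timestamp with time zone, so the
-- ::Date cast honours the session time zone
to_timestamp(od.order_date)::Date As order_date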
The query returns the following table:
cust_id   | order_id | order_size | order_date
----------|----------|------------|-------------
210721008 | 0437756  | 4323       | 2016-07-22
210721008 | 0437756  | 4586       | 2016-09-24
210721019 | 10749881 | 0          | 2016-07-28
210721019 | 10749881 | 0          | 2016-07-28
210721033 | 13639    | 2286145    | 2016-09-06
210721033 | 13639    | 2300040    | 2016-10-03
The result should be:
cust_id   | order_id | 2016-07-22 | 2016-09-24 | 2016-07-28 | 2016-09-06 | 2016-10-03
----------|----------|------------|------------|------------|------------|------------
210721008 | 0437756  | 4323       | 4586       |            |            |
210721019 | 10749881 |            |            | 0          |            |
210721033 | 13639    |            |            |            | 2286145    | 2300040