First and second time appearing row id in PostgreSQL

Suppose we have a list of ids with dates, and we want to know when each id appeared for the first and for the second time. For the first appearance, I wrote this query:
SELECT year, mon, COUNT(id) AS sum_first_id
FROM (
SELECT DISTINCT
ON (id) DATE, id
FROM TABLE
GROUP BY 2, 1
) AS foo
GROUP BY 2, 1
ORDER BY 1, 2;
I think that this works. But how could I find when the ids appear for the second time?

Let's say you have the table table_x:
select *
from table_x
order by 1, 2
id | date
----+------------
1 | 2015-06-04
1 | 2015-06-05
1 | 2015-06-14
2 | 2015-06-05
2 | 2015-06-08
2 | 2015-06-10
2 | 2015-06-17
2 | 2015-06-22
(8 rows)
To select the first n rows in each group, use the row_number() window function:
select id, date
from (
select id, date, row_number() over (partition by id order by date) rn
from table_x
) sub
where rn <= 2
order by 1, 2
id | date
----+------------
1 | 2015-06-04
1 | 2015-06-05
2 | 2015-06-05
2 | 2015-06-08
(4 rows)
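To answer the "second time" part of the question directly, the same idea can be narrowed to rn = 2 and then counted per month. This is only a sketch against the sample table_x above; the output column names month and second_appearances are made up for illustration.

```sql
-- Sketch: count, per month, how many ids appear for the second time.
-- Assumes table_x(id, date) as in the sample above.
select date_trunc('month', date)::date as month,
       count(*) as second_appearances
from (
    select id, date,
           row_number() over (partition by id order by date) as rn
    from table_x
) sub
where rn = 2          -- keep only each id's second row by date
group by 1
order by 1;
```

Changing rn = 2 to rn = 1 gives the corresponding count of first appearances.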
Your query does not appear to be correct (see the inline comments):
SELECT year, mon, COUNT(id) AS sum_first_id -- what is year, mon?
FROM (
SELECT DISTINCT
ON (id) DATE, id
FROM TABLE
GROUP BY 2, 1 -- should be order by 2, 1
) AS foo
GROUP BY 2, 1
ORDER BY 1, 2;
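A corrected version of the first-appearance query might look like the sketch below. It assumes the table is table_x(id, date) as in the sample, and that "year, mon" was meant to be the month of the first appearance; first_date, month and first_appearances are illustrative names.

```sql
-- Sketch: count, per month, how many ids appear for the first time.
select date_trunc('month', first_date)::date as month,
       count(*) as first_appearances
from (
    -- DISTINCT ON (id) keeps one row per id; ORDER BY id, date makes it the earliest one
    select distinct on (id) id, date as first_date
    from table_x
    order by id, date
) as foo
group by 1
order by 1;
```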

Related

Select dates missing data in a range

I have a postgres table test_table that looks like this:
date | test_hour
------------+-----------
2000-01-01 | 1
2000-01-01 | 2
2000-01-01 | 3
2000-01-02 | 1
2000-01-02 | 2
2000-01-02 | 3
2000-01-02 | 4
2000-01-03 | 1
2000-01-03 | 2
I need to select all the dates which don't have test_hour = 1, 2, and 3, so it should return
date
------------
2000-01-03
Here is what I have tried:
SELECT date FROM test_table WHERE test_hour NOT IN (SELECT generate_series(1,3));
But that only returns dates that have extra hours beyond 1, 2, 3.
You can use aggregation and conditional HAVING clauses, like so:
SELECT mydate
FROM mytable
GROUP BY mydate
HAVING
-- ELSE 0 matters: without it MAX() is NULL for a missing hour, and NULL != 1 is not true
MAX(CASE WHEN test_hour = 1 THEN 1 ELSE 0 END) != 1
OR MAX(CASE WHEN test_hour = 2 THEN 1 ELSE 0 END) != 1
OR MAX(CASE WHEN test_hour = 3 THEN 1 ELSE 0 END) != 1
Another possibility would be to join against the series (or another subquery containing the hours) and do a [distinct] count of the hours, aggregated per date:
select date from tst
inner join (select generate_series(1,3) "hour") hours on hours.hour = tst.hour
group by tst.date
having count(distinct tst.hour) < 3;
or
select date from tst
where hour in (select generate_series(1,3))
group by date
having count(distinct tst.hour) < 3;
[You don't need the distinct if date/hour combinations in your table are unique.]
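One caveat with the two variants above: a date that has none of the hours 1 to 3 is removed by the join (or the IN filter) before grouping, so it never reaches HAVING. A possible way around that, sketched here against the question's test_table(date, test_hour), is to keep all rows and move the condition into an aggregate FILTER clause (available in PostgreSQL 9.4 and later):

```sql
-- Sketch: dates that are missing at least one of the hours 1, 2, 3,
-- including dates that have none of them.
select date
from test_table
group by date
having count(distinct test_hour) filter (where test_hour between 1 and 3) < 3
order by date;
```

With the sample data this returns only 2000-01-03, which has hours 1 and 2 but not 3.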
A solution using set difference, giving you exactly the rows that are missing:
(SELECT DISTINCT
date, all_hour
FROM test_table
CROSS JOIN generate_series(1,3) all_hour)
EXCEPT
(TABLE test_table)
And a solution using an array aggregate and the array contains operator:
SELECT date
FROM test_table
GROUP BY date
HAVING NOT array_agg(test_hour) @> ARRAY(SELECT generate_series(1,3))

Monthly Counting PostgreSQL giving months

I have a table for customers like this
cust_id | date_signed_up | location_id
-----------------------------------------
1 | 2019/01/01 | 1
2 | 2019/03/05 | 1
3 | 2019/06/17 | 1
What I need is a monthly count that includes every month, even when the count is 0. For example:
monthly_count | count
-------------------------
Jan | 1
Feb | 0
Mar | 1
Apr | 0
(months can be in numbers)
Right now I made this query:
SELECT date_trunc('MONTH', (date_signed_up::date)) AS monthly, count(customer_id) AS count FROM customer
WHERE group_id = 1
GROUP BY monthly
ORDER BY monthly asc
but it only returns the months that have data and skips the ones where the count is zero. How can I get all the months, whether or not they have data?
You need a list of months (see "How to generate Month list in PostgreSQL?"), which you then LEFT JOIN to your table:
SELECT a.month , count( y.cust_id )
FROM allMonths a
LEFT JOIN yourTable y
ON a.month = date_trunc('MONTH', (date_signed_up::date))
GROUP BY a.month
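One way to build that month list inline is generate_series. The sketch below assumes the table is called customer with the columns shown in the question (cust_id, date_signed_up, location_id), and hard-codes 2019 as the reporting range; adjust both to your schema.

```sql
-- Sketch: one row per month of 2019, with 0 for months without sign-ups.
select to_char(m.month, 'Mon') as monthly_count,
       count(c.cust_id)        as count      -- counts only matched rows, so empty months give 0
from generate_series(timestamp '2019-01-01',
                     timestamp '2019-12-01',
                     interval '1 month') as m(month)
left join customer c
       on date_trunc('month', c.date_signed_up::timestamp) = m.month
      and c.location_id = 1   -- filter in ON, not WHERE, so empty months are kept
group by m.month
order by m.month;
```

Putting the location filter in a WHERE clause instead would discard the unmatched months again, because their location_id is NULL after the LEFT JOIN.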

How to force query to return only first row from window?

I have data:
id | price | date
1 | 25 | 2019-01-01
2 | 35 | 2019-01-01
1 | 27 | 2019-02-01
2 | 37 | 2019-02-01
Is it possible to write a query that returns only the first row of each window? Something like LIMIT 1, but for the window OVER( date )?
I expect the following result:
id | price | date
1 | 25 | 2019-01-01
1 | 27 | 2019-02-01
Or ignore the whole window if its first row has a NULL price:
id | price | date
1 | NULL | 2019-01-01
2 | 35 | 2019-01-01
1 | 27 | 2019-02-01
2 | 37 | 2019-02-01
result:
1 | 27 | 2019-02-01
Order the rows by date and id, and take only the first row per date.
Then remove those where the price is NULL.
SELECT *
FROM (SELECT DISTINCT ON (date)
id, price, date
FROM mytable
ORDER BY date, id
) AS q
WHERE price IS NOT NULL;
@Laurenz, let me provide a bit more explanation.
select distinct on (<fldlist>) * from <table> order by <fldlist+>;
is roughly equivalent to this much more complex query (apart from the extra rn column):
select * from (
select row_number() over (partition by <fldlist> order by <fldlist+>) as rn,*
from <table>) as t
where rn = 1;
Here <fldlist> must be a leading prefix of (or equal to) <fldlist+>.
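As a concrete instance of that template, here is a sketch of the row_number() form applied to the question's data. It assumes the table is mytable(id, price, date), the same name used in the DISTINCT ON answer above.

```sql
-- Sketch: first row per date (lowest id), dropped when its price is NULL.
select id, price, date
from (
    select row_number() over (partition by date order by id) as rn, *
    from mytable
) as t
where rn = 1
  and price is not null
order by date;
```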
As Myon on IRC said:
if you want to use a window function in WHERE, you need to put it into a subselect first
So the target query is:
select * from (
select
*,
agg_function( my_field ) OVER( PARTITION BY other_field ) as agg_field
from sometable
) x
WHERE agg_field <condition>
In my case the query is:
SELECT * FROM (
SELECT *,
FIRST_VALUE( p.price ) over( PARTITION BY crate.app_period ORDER BY st.DEPTH ) AS first_price,
ROW_NUMBER() over( PARTITION BY crate.app_period ORDER BY st.DEPTH ) AS row_number
FROM st
LEFT JOIN price p ON <COND>
LEFT JOIN currency_rate crate ON <COND>
) p
WHERE p.row_number = 1 AND p.first_price IS NOT null
Here I select only the first row of each group, and only when its price IS NOT NULL.

Count the number of consecutive entries fulfilling a condition within a GROUP BY

I've got a list of users who are behind on their bills, and I want to generate an entry for each of them that says how many consecutive bills they've been behind on. So here's the table:
user | bill_date | outstanding_balance
---------------------------------------
a | 2017-03-01 | 90
a | 2016-12-01 | 60
a | 2016-09-01 | 30
b | 2017-03-01 | 50
b | 2016-12-01 | 0
b | 2016-09-01 | 40
c | 2017-03-01 | 0
c | 2016-12-01 | 0
c | 2016-09-01 | 1
And I want a query that would generate the following table:
user | consecutive_billing_periods_behind
-----------------------------------------
a | 3
b | 1
c | 0
In other words, if you've paid up at any point, I want to ignore all of the earlier entries, and only count how many billing periods you've been behind since you've been last paid up. How do I do this most simply?
If I understood the question correctly, you first need to find the dates on which a given customer's outstanding balance was 0; the latest of them is the last time they were paid up. This subquery returns those dates:
(SELECT
user1,
bill_date AS no_outstanding_bill_date
FROM table1
WHERE outstanding_balance = 0)
Then you need to get the last bill date and add a field to each row that flags whether the bill is outstanding. After that, keep only the rows between each customer's last clear day and last bill date with this WHERE clause:
WHERE bill_date >= last_clear_day AND bill_date <= last_bill_date
Putting the pieces together gives this query:
SELECT
DISTINCT
user1,
sum(is_outstanding_bill)
OVER (
PARTITION BY user1 ) AS consecutive_billing_periods_behind
FROM (
SELECT
user1,
last_value(bill_date)
OVER (
PARTITION BY user1
ORDER BY bill_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS last_bill_date,
CASE WHEN outstanding_balance > 0
THEN 1
ELSE 0 END AS is_outstanding_bill,
bill_date,
outstanding_balance,
nvl(max(t2.no_outstanding_bill_date)
OVER (
PARTITION BY user1 ), min(bill_date)
OVER (
PARTITION BY user1 )) AS last_clear_day
FROM table1 t1
LEFT JOIN (SELECT
user1,
bill_date AS no_outstanding_bill_date
FROM table1
WHERE outstanding_balance = 0) t2 USING (user1)
) table2
WHERE bill_date >= last_clear_day AND bill_date <= last_bill_date
Since we are using distinct you will not need the group by clause.
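select
-- "user" is quoted because user is a reserved word in PostgreSQL
"user",
count(case when min_balance > 0 then 1 end)
as consecutive_billing_periods_behind
from
(
select
"user",
-- running minimum from the newest bill backwards: it drops to 0 at the most recent paid-up bill
min(outstanding_balance)
over (partition by "user" order by bill_date desc) as min_balance
from tbl
) t
group by "user"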
Or:
select
"user",
count(*)
as consecutive_billing_periods_behind
from
(
select
"user",
bill_date,
max(case when outstanding_balance = 0 then bill_date end) over
(partition by "user")
as max_bill_date_with_zero_balance
from tbl
) t
where
-- If the user never had outstanding_balance = 0, the max is null: count all rows.
max_bill_date_with_zero_balance is null
-- Otherwise count only the rows after the last zero-balance bill.
or bill_date > max_bill_date_with_zero_balance
group by "user"
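Note that a WHERE filter like the one above removes every row of a user whose latest bill is paid, so that user (c in the sample) produces no output row instead of a 0. A possible variant that keeps such users is sketched below; it moves the condition into aggregate FILTER clauses (PostgreSQL 9.4 and later) and assumes the same table tbl("user", bill_date, outstanding_balance). The name last_zero_date is illustrative.

```sql
-- Sketch: count bills after the last zero-balance bill, keeping users with a count of 0.
select "user",
       count(*) filter (where last_zero_date is null
                           or bill_date > last_zero_date)
           as consecutive_billing_periods_behind
from (
    select "user", bill_date,
           -- latest bill date with a zero balance, NULL if the user never paid up
           max(bill_date) filter (where outstanding_balance = 0)
               over (partition by "user") as last_zero_date
    from tbl
) as t
group by "user"
order by "user";
```

On the sample data this gives 3 for a, 1 for b and 0 for c.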

Iterate through rows, compare them against each other and store results in another table

I have a table that contains the following rows:
product_id | order_date
A | 12/04/12
A | 01/11/13
A | 01/21/13
A | 03/05/13
B | 02/14/13
B | 03/09/13
What I now need is an overview for each month showing how many products were bought for the first time (= not bought the month before), how many are existing products (= bought the month before), and how many were not purchased within that month. Taking the sample above as input, the script should deliver the following result, regardless of the period of time covered by the data:
month | new | existing | nopurchase
12/2012 | 1 | 0 | 0
01/2013 | 0 | 1 | 0
02/2013 | 1 | 0 | 1
03/2013 | 1 | 1 | 0
Would be great to get a first hint how this could be solved so I'm able to continue.
Thanks!
with t as (
select product_id pid, date_trunc('month', order_date)::date od
from t
group by 1, 2
)
select od,
sum(is_new::integer) "new",
sum(is_existing::integer) existing,
sum(not_purchased::integer) nopurchase
from (
select od,
lag(t_pid) over(partition by s_pid order by od) is null and t_pid is not null is_new,
lag(t_pid) over(partition by s_pid order by od) is not null and t_pid is not null is_existing,
lag(t_pid) over(partition by s_pid order by od) is not null and t_pid is null not_purchased
from (
select t.pid t_pid, s.pid s_pid, s.od
from
t
right join
(
select pid, s.od
from
t
cross join
(
select date_trunc('month', d)::date od
from
generate_series(
(select min(od) from t),
(select max(od) from t),
'1 month'
) s(d)
) s
group by pid, s.od
) s on t.od = s.od and t.pid = s.pid
) s
) s
group by 1
order by 1