Postgres: Query to compare Data with previous Data


I have a main table that all my results are written to. Each checked object is identified by its item_id:
Checkdate  | item_id | Price Cat A | Price Cat B
-----------------------------------------------
2017-04-25 | 1       | 29.99       | 84.99
2017-04-24 | 1       | 39.99       | 89.99
2017-04-23 | 1       | 39.99       | 91.99
2017-04-25 | 2       | 42.99       | 88.99
2017-04-23 | 2       | 41.99       | 81.99
2017-04-22 | 2       | 50.99       | 81.99
2017-04-21 | 2       | 42.99       | 81.99
In my Postgres query I select all rows where checkdate = current_date to get the newest data:
Item | Price Cat A | Price Cat B
-------------------------------
1    | 29.99       | 84.99
2    | 42.99       | 88.99
So far that's no problem. But now I want to compare these results with the previous ones, something like this:
Item | Price Cat A | Price Cat A Before | Price Cat B | Price Cat B Before
-------------------------------------------------------------------------
1    | 29.99       | 39.99              | 84.99       | 89.99
2    | 42.99       | 41.99              | 88.99       | 81.99
But I have no idea how to do that. The items don't exist on every day (for example, item 2 has no row for 2017-04-24).
Can someone help me?

select
item_id,
min(price_cat_a) filter (where rn = 1) as a,
min(price_cat_a) filter (where rn = 2) as a_before,
min(price_cat_b) filter (where rn = 1) as b,
min(price_cat_b) filter (where rn = 2) as b_before
from (
select
item_id, price_cat_a, price_cat_b,
row_number() over (partition by item_id order by checkdate desc) as rn
from t
where checkdate <= current_date
) s
where rn <= 2
group by item_id
;
item_id | a | a_before | b | b_before
---------+-------+----------+-------+----------
1 | 29.99 | 39.99 | 84.99 | 89.99
2 | 42.99 | 41.99 | 88.99 | 81.99
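To try the row_number() approach without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module (it assumes SQLite 3.25 or newer for window functions). SQLite is only a stand-in: the FILTER clauses are written as CASE expressions inside min(), the table name t comes from the answer, and the as_of parameter is a hypothetical replacement for current_date, since the sample data is from 2017.

```python
import sqlite3

# Sample rows from the question: (checkdate, item_id, price_cat_a, price_cat_b)
ROWS = [
    ("2017-04-25", 1, 29.99, 84.99),
    ("2017-04-24", 1, 39.99, 89.99),
    ("2017-04-23", 1, 39.99, 91.99),
    ("2017-04-25", 2, 42.99, 88.99),
    ("2017-04-23", 2, 41.99, 81.99),
    ("2017-04-22", 2, 50.99, 81.99),
    ("2017-04-21", 2, 42.99, 81.99),
]

# rn = 1 is each item's newest row, rn = 2 the one before it;
# min(CASE ...) plays the role of PostgreSQL's min(...) FILTER (WHERE ...).
QUERY = """
SELECT item_id,
       min(CASE WHEN rn = 1 THEN price_cat_a END) AS a,
       min(CASE WHEN rn = 2 THEN price_cat_a END) AS a_before,
       min(CASE WHEN rn = 1 THEN price_cat_b END) AS b,
       min(CASE WHEN rn = 2 THEN price_cat_b END) AS b_before
FROM (
    SELECT item_id, price_cat_a, price_cat_b,
           row_number() OVER (PARTITION BY item_id ORDER BY checkdate DESC) AS rn
    FROM t
    WHERE checkdate <= ?
) s
WHERE rn <= 2
GROUP BY item_id
ORDER BY item_id
"""


def latest_two(as_of):
    """Return (item_id, a, a_before, b, b_before) for rows dated on or before as_of."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (checkdate TEXT, item_id INTEGER, "
                "price_cat_a REAL, price_cat_b REAL)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY, (as_of,)).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in latest_two("2017-04-25"):
        print(row)
```

For as_of = "2017-04-25" this yields one row per item; item 2 picks up 2017-04-23 as its previous date because it has no row for 2017-04-24.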

You can use a lateral join:
SELECT today.item_id,
today."Price Cat A",
before."Price Cat A" AS "Price Cat A Before",
today."Price Cat B",
before."Price Cat B" AS "Price Cat B Before"
FROM main today
CROSS JOIN LATERAL
(SELECT "Price Cat A",
"Price Cat B"
FROM main
WHERE item_id = today.item_id
AND "Checkdate" < today."Checkdate"
ORDER BY "Checkdate" DESC
LIMIT 1
) before
WHERE today."Checkdate" = current_date
ORDER BY today.item_id;
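SQLite does not support LATERAL, so this sketch (Python's sqlite3 module standing in for PostgreSQL) swaps in two correlated scalar subqueries that perform the same lookup: the newest row of the same item strictly before today's row. The table name prices and the as_of parameter are hypothetical stand-ins for main and current_date. One behavioral difference is worth knowing: a scalar subquery yields NULL when no earlier row exists, whereas CROSS JOIN LATERAL drops such items entirely (LEFT JOIN LATERAL ... ON true would keep them).

```python
import sqlite3

# Question data, using the quoted column names from the lateral-join answer
ROWS = [
    ("2017-04-25", 1, 29.99, 84.99),
    ("2017-04-24", 1, 39.99, 89.99),
    ("2017-04-23", 1, 39.99, 91.99),
    ("2017-04-25", 2, 42.99, 88.99),
    ("2017-04-23", 2, 41.99, 81.99),
    ("2017-04-22", 2, 50.99, 81.99),
    ("2017-04-21", 2, 42.99, 81.99),
]

# Each "Before" column: newest earlier row of the same item, or NULL if none
QUERY = """
SELECT today.item_id,
       today."Price Cat A",
       (SELECT p."Price Cat A" FROM prices p
         WHERE p.item_id = today.item_id AND p."Checkdate" < today."Checkdate"
         ORDER BY p."Checkdate" DESC LIMIT 1) AS "Price Cat A Before",
       today."Price Cat B",
       (SELECT p."Price Cat B" FROM prices p
         WHERE p.item_id = today.item_id AND p."Checkdate" < today."Checkdate"
         ORDER BY p."Checkdate" DESC LIMIT 1) AS "Price Cat B Before"
FROM prices today
WHERE today."Checkdate" = ?
ORDER BY today.item_id
"""


def previous_prices(as_of):
    """Return today's prices next to each item's previous prices."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE prices ("Checkdate" TEXT, item_id INTEGER, '
                '"Price Cat A" REAL, "Price Cat B" REAL)')
    con.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY, (as_of,)).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(previous_prices("2017-04-25"))
```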

Since your items don't exist on every day, your original query has a flaw too: it won't return all of your items (an item with no row for today is missing entirely).
If you are looking for the last (and second-to-last) checkdate per item, there is no need to use current_date at all, unless your table may contain future rows; in that case, just add where checkdate <= current_date to filter them out.
Finding the last row within each group (in your case, per item_id) is a typical greatest-n-per-group problem, and the second-to-last value is easy to get with the lag() window function:
select distinct on (item_id)
item_id,
price_cat_a,
price_cat_a_before,
price_cat_b,
price_cat_b_before
from (select *,
lag(price_cat_a) over w price_cat_a_before,
lag(price_cat_b) over w price_cat_b_before
from t
window w as (partition by item_id order by checkdate)) t
order by item_id, checkdate desc
http://rextester.com/AGZ99646
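The lag() idea can be sketched the same way. SQLite has no DISTINCT ON, so this sketch (Python's sqlite3 module, SQLite 3.25 or newer assumed) keeps each item's newest row with row_number() = 1 instead; the lag() columns are computed as in the answer, over a window ordered by checkdate. The function name last_and_previous is hypothetical.

```python
import sqlite3

# Sample rows from the question: (checkdate, item_id, price_cat_a, price_cat_b)
ROWS = [
    ("2017-04-25", 1, 29.99, 84.99),
    ("2017-04-24", 1, 39.99, 89.99),
    ("2017-04-23", 1, 39.99, 91.99),
    ("2017-04-25", 2, 42.99, 88.99),
    ("2017-04-23", 2, 41.99, 81.99),
    ("2017-04-22", 2, 50.99, 81.99),
    ("2017-04-21", 2, 42.99, 81.99),
]

QUERY = """
SELECT item_id, price_cat_a, price_cat_a_before, price_cat_b, price_cat_b_before
FROM (
    SELECT item_id, price_cat_a, price_cat_b,
           lag(price_cat_a) OVER w AS price_cat_a_before,
           lag(price_cat_b) OVER w AS price_cat_b_before,
           -- row_number() = 1 replaces DISTINCT ON (item_id) ... ORDER BY checkdate DESC
           row_number() OVER (PARTITION BY item_id ORDER BY checkdate DESC) AS rn
    FROM t
    WINDOW w AS (PARTITION BY item_id ORDER BY checkdate)
) s
WHERE rn = 1
ORDER BY item_id
"""


def last_and_previous(rows):
    """Return each item's newest prices next to the prices of its previous row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (checkdate TEXT, item_id INTEGER, "
                "price_cat_a REAL, price_cat_b REAL)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(last_and_previous(ROWS))
```

Because lag() looks at the previous row rather than at a fixed date, gaps such as item 2's missing 2017-04-24 need no special handling, and an item with a single row simply gets NULL (None) in the "before" columns.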

Related

PostgreSQL: Find percentages of total_films_rented

The query below gives me the following results:
Early: 7738
Late: 6586
On Time: 1720
How would I take this a step further and add a third column that finds the percentages?
Here is a link to the ERD and database set-up: https://www.postgresqltutorial.com/postgresql-sample-database/
WITH
t1
AS
(
SELECT *, DATE_PART('day', return_date - rental_date) AS days_rented
FROM rental
),
t2
AS
(
SELECT rental_duration, days_rented,
CASE WHEN rental_duration > days_rented THEN 'Early'
WHEN rental_duration = days_rented THEN 'On Time'
ELSE 'Late'
END AS rental_return_status
FROM film f, inventory i, t1
WHERE f.film_id = i.film_id AND t1.inventory_id = i.inventory_id
)
SELECT rental_return_status, COUNT(*) AS total_films_rented
FROM t2
GROUP BY 1
ORDER BY 2 DESC;

You can use a window function with a single CTE (instead of two):
WITH raw_status AS (
SELECT rental_duration - DATE_PART('day', return_date - rental_date) AS days_remaining
FROM rental r
JOIN inventory i ON r.inventory_id=i.inventory_id
JOIN film f on f.film_id=i.film_id
)
SELECT CASE WHEN days_remaining > 0 THEN 'Early'
WHEN days_remaining = 0 THEN 'On Time'
ELSE 'Late' END AS rental_status,
count(*),
(100*count(*))/sum(count(*)) OVER () AS percentage
FROM raw_status
GROUP BY 1;
rental_status | count | percentage
---------------+-------+---------------------
Early | 7738 | 48.2298678633757168
On Time | 1720 | 10.7205185739217153
Late | 6586 | 41.0496135627025679
(3 rows)
Disclosure: I work for EnterpriseDB (EDB)

Use a window function to get the sum of the count column (sum(count(*)) over ()), then just divide the count by that (count(*)/sum(count(*)) over ()). Multiply by 100 to make it a percentage.
psql (12.1 (Debian 12.1-1))
Type "help" for help.
testdb=# CREATE TABLE faket2 AS (
SELECT 'early' AS rental_return_status UNION ALL
SELECT 'early' UNION ALL
SELECT 'ontime' UNION ALL
SELECT 'late');
SELECT 4
testdb=# SELECT
rental_return_status,
COUNT(*) as total_films_rented,
(100*count(*))/sum(count(*)) over () AS percentage
FROM faket2
GROUP BY 1
ORDER BY 2 DESC;
rental_return_status | total_films_rented | percentage
----------------------+--------------------+---------------------
early | 2 | 50.0000000000000000
late | 1 | 25.0000000000000000
ontime | 1 | 25.0000000000000000
(3 rows)
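Here is a small sketch of the percentage-of-total idea, run against SQLite through Python's sqlite3 module (3.25 or newer assumed). The per-group counts are computed in a subquery first and the window sum runs over them; PostgreSQL does the same thing implicitly for sum(count(*)) OVER (), because window functions are evaluated after GROUP BY. Multiplying by 100.0 keeps SQLite from doing integer division (in PostgreSQL, sum() over a bigint count already returns numeric). The function name status_percentages is hypothetical.

```python
import sqlite3

QUERY = """
SELECT rental_return_status,
       n AS total_films_rented,
       100.0 * n / sum(n) OVER () AS percentage
FROM (
    SELECT rental_return_status, count(*) AS n
    FROM faket2
    GROUP BY rental_return_status
) counts
ORDER BY total_films_rented DESC, rental_return_status
"""


def status_percentages(statuses):
    """Return (status, count, percentage of all rows) tuples, largest count first."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE faket2 (rental_return_status TEXT)")
    con.executemany("INSERT INTO faket2 VALUES (?)", [(s,) for s in statuses])
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(status_percentages(["early", "early", "ontime", "late"]))
```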

Gaps and Islands - get a list of dates unemployed over a date range with PostgreSQL

I have a table called Position. Below is a simplified view of the employment dates; dates are inclusive (yyyy-mm-dd):
id, person_id, start_date, end_date , title
1 , 1 , 2001-12-01, 2002-01-31, 'admin'
2 , 1 , 2002-02-11, 2002-03-31, 'admin'
3 , 1 , 2002-02-15, 2002-05-31, 'sales'
4 , 1 , 2002-06-15, 2002-12-31, 'ops'
I'd like to calculate the gaps in employment, taking into account that some of the ranges overlap, to produce the following output for the person with id = 1:
person_id, start_date, end_date , last_position_id, gap_in_days
1 , 2002-02-01, 2002-02-10, 1 , 10
1 , 2002-06-01, 2002-06-14, 3 , 14
I have looked at numerous solutions: UNIONs, materialized views, tables with generated calendar date ranges, etc. I'm really not sure what the best way to do this is. Is there a single query that gets this done?

Step-by-step demo: db<>fiddle
You just need the lead() window function. It lets you pull a value from the following row (here, the next start_date) into the current row.
SELECT
person_id,
end_date + 1 AS start_date,
lead - 1 AS end_date,
id AS last_position_id,
lead - (end_date + 1) AS gap_in_days
FROM (
SELECT
*,
lead(start_date) OVER (PARTITION BY person_id ORDER BY start_date)
FROM
positions
) s
WHERE lead - (end_date + 1) > 0
Once you have the next start_date, you can compare it with the current end_date. If the next start_date is later than end_date + 1, there is a gap; the WHERE clause keeps only these positive differences.
(If two positions overlap, the difference is negative, so those rows are ignored.)
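The lead() query can be checked against the sample data with a small sketch using Python's sqlite3 module (SQLite 3.25 or newer assumed). SQLite has no date + integer arithmetic, so date(..., '+1 day') and julianday() differences stand in for PostgreSQL's end_date + 1 and date subtraction, and the output columns are renamed gap_start and gap_end to avoid clashing with the input columns. The function name employment_gaps_lead is hypothetical. Like the original query, it assumes no position lies entirely inside an earlier, longer one; such a nested range would be reported as a false gap.

```python
import sqlite3

# Sample rows from the question: (id, person_id, start_date, end_date, title)
POSITIONS = [
    (1, 1, "2001-12-01", "2002-01-31", "admin"),
    (2, 1, "2002-02-11", "2002-03-31", "admin"),
    (3, 1, "2002-02-15", "2002-05-31", "sales"),
    (4, 1, "2002-06-15", "2002-12-31", "ops"),
]

QUERY = """
SELECT person_id,
       date(end_date, '+1 day') AS gap_start,
       date(next_start, '-1 day') AS gap_end,
       id AS last_position_id,
       CAST(julianday(next_start) - julianday(end_date) - 1 AS INTEGER) AS gap_in_days
FROM (
    SELECT *,
           lead(start_date) OVER (PARTITION BY person_id ORDER BY start_date) AS next_start
    FROM positions
) s
-- adjacent or overlapping positions give zero or a negative value and are skipped
WHERE julianday(next_start) - julianday(end_date) - 1 > 0
ORDER BY person_id, gap_start
"""


def employment_gaps_lead(positions):
    """Return (person_id, gap_start, gap_end, last_position_id, gap_in_days) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE positions (id INTEGER, person_id INTEGER, "
                "start_date TEXT, end_date TEXT, title TEXT)")
    con.executemany("INSERT INTO positions VALUES (?, ?, ?, ?, ?)", positions)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for gap in employment_gaps_lead(POSITIONS):
        print(gap)
```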

1) First, find which date ranges overlap (see Determine Whether Two Date Ranges Overlap).
2) Then merge overlapping ranges into a single one and keep the last id.
3) Finally, calculate the number of days between one end_date and the next start_date - 1.
SQL DEMO
with find_overlap as (
SELECT t1."id" as t1_id, t1."person_id", t1."start_date", t1."end_date",
t2."id" as t2_id, t2."start_date" as t2_start_date, t2."end_date" as t2_end_date
FROM Table1 t1
LEFT JOIN Table1 t2
ON t1."person_id" = t2."person_id"
AND t1."start_date" <= t2."end_date"
AND t1."end_date" >= t2."start_date"
AND t1.id < t2.id
), merge_overlap as (
SELECT
person_id,
start_date,
COALESCE(t2_end_date, end_date) as end_date,
COALESCE(t2_id, t1_id) as last_position_id
FROM find_overlap
WHERE t1_id NOT IN (SELECT t2_id FROM find_overlap WHERE t2_ID IS NOT NULL)
), cte as (
SELECT *,
LEAD(start_date) OVER (partition by person_id order by start_date) next_start
FROM merge_overlap
)
SELECT *,
DATE_PART('day',
(next_start::timestamp - INTERVAL '1 DAY') - end_date::timestamp
) as days
FROM cte
WHERE next_start IS NOT NULL
OUTPUT
| person_id | start_date | end_date | last_position_id | next_start | days |
|-----------|------------|------------|------------------|------------|------|
| 1 | 2001-12-01 | 2002-01-31 | 1 | 2002-02-11 | 10 |
| 1 | 2002-02-11 | 2002-05-31 | 3 | 2002-06-15 | 14 |
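The merge-then-measure steps can also be sketched outside the database. The Python function below (a hypothetical helper, not part of the answer) sorts one person's positions by start date and sweeps through them, extending the merged range while positions overlap or touch, and reporting a gap only when the next position starts more than one day after the merged range ends. Because it tracks the furthest end date seen so far, it also copes with nested positions and chains of three or more overlapping positions, which a single pairwise self-join may not merge completely. It assumes inclusive dates and takes last_position_id to be the position that reaches furthest before the gap.

```python
from datetime import date, timedelta


def employment_gaps(positions):
    """Find unemployed stretches for one person.

    positions: iterable of (position_id, start_date, end_date), dates inclusive.
    Returns (gap_start, gap_end, last_position_id, gap_in_days) tuples.
    """
    one_day = timedelta(days=1)
    gaps = []
    covered_until = None  # last day of the current merged range
    covering_id = None    # position that reaches covered_until
    for pos_id, start, end in sorted(positions, key=lambda p: p[1]):
        if covered_until is not None and start > covered_until + one_day:
            # The merged range ended and this position starts later: record the gap
            gap_start, gap_end = covered_until + one_day, start - one_day
            gaps.append((gap_start, gap_end, covering_id, (gap_end - gap_start).days + 1))
        if covered_until is None or end > covered_until:
            # Extend the merged range (or start a new one after a gap)
            covered_until, covering_id = end, pos_id
    return gaps


if __name__ == "__main__":
    sample = [
        (1, date(2001, 12, 1), date(2002, 1, 31)),
        (2, date(2002, 2, 11), date(2002, 3, 31)),
        (3, date(2002, 2, 15), date(2002, 5, 31)),
        (4, date(2002, 6, 15), date(2002, 12, 31)),
    ]
    for gap in employment_gaps(sample):
        print(gap)
```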

Difference between the max date and the penultimate max for specific employee - postgresql

I'm a bit stuck on a problem: I'm trying to find the difference between two dates in PostgreSQL.
I have a table emp with many employees in it:
emp_id, date
1, 31-10-2017
1, 08-08-2017
1, 02-06-2017
I want it to look like this:
emp_id, max_date, penultimate_date, difference
1, 31-10-2017, 08-08-2017, 84 days
Obviously I can use max(date) and group by emp_id, but how do I retrieve the penultimate date? I have tried things like:
order by date desc limit 1 offset 1
I have also tried putting these in subqueries, but that hasn't worked because there are many employees and I need one row per employee.
Can anyone help?
Thanks,
pp84

As @Haleemur Ali kindly pointed out, order by date desc limit 1 offset 1 does not work with several emp_ids. Window functions compute the values per emp_id; the window needs an ORDER BY and a full frame, otherwise nth_value() returns whichever row happens to come second:
with d(emp_id, date) as (values (1, '2017-10-31'::date), (1, '2017-08-08'), (1, '2017-06-02'), (2, '2016-01-01'), (2, '2016-02-02'), (2, '2016-03-03'))
select distinct emp_id
, max(date) over w max_date
, nth_value(date, 2) over w penultimate_date
, max(date) over w - nth_value(date, 2) over w diff
from d
window w as (partition by emp_id order by date desc rows between unbounded preceding and unbounded following)
order by emp_id
;
emp_id | max_date | penultimate_date | diff
--------+------------+------------------+------
1 | 2017-10-31 | 2017-08-08 | 84
2 | 2016-03-03 | 2016-02-02 | 30
(2 rows)
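A minimal sketch of the nth_value() version, run in SQLite through Python's sqlite3 module (3.25 or newer assumed), shows why the window's ORDER BY and full frame matter: the rows are inserted in scrambled order and the result is still the max and penultimate date per employee. julianday() stands in for PostgreSQL's date subtraction; the table name emp, the column name d and the function name max_and_second are hypothetical.

```python
import sqlite3

# Penultimate date per employee via nth_value() over an ordered, full-frame window
QUERY = """
SELECT emp_id, max_date, penultimate_date,
       CAST(julianday(max_date) - julianday(penultimate_date) AS INTEGER) AS diff
FROM (
    SELECT DISTINCT emp_id,
           max(d) OVER w AS max_date,
           nth_value(d, 2) OVER w AS penultimate_date
    FROM emp
    -- ORDER BY makes "2nd value" mean "2nd newest"; the full frame lets every row see it
    WINDOW w AS (PARTITION BY emp_id ORDER BY d DESC
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
) s
ORDER BY emp_id
"""


def max_and_second(rows):
    """rows: (emp_id, ISO date string) pairs in any order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE emp (emp_id INTEGER, d TEXT)")
    con.executemany("INSERT INTO emp VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    scrambled = [(1, "2017-08-08"), (2, "2016-02-02"), (1, "2017-10-31"),
                 (2, "2016-01-01"), (1, "2017-06-02"), (2, "2016-03-03")]
    print(max_and_second(scrambled))
```

An employee with a single date gets NULL (None) for both the penultimate date and the difference.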

WITH emps (emp_id, date) AS (
VALUES (1, '2017-10-31'::DATE)
, (1, '2017-08-08'::DATE)
, (1, '2017-06-02'::DATE)
)
SELECT DISTINCT ON (emp_id)
emp_id
, "date" max_date
, LEAD("date") OVER w penultimate_date
, "date" - LEAD("date") OVER w difference
FROM emps
WINDOW w AS (PARTITION BY emp_id ORDER BY "date" DESC)
ORDER BY emp_id, "date" DESC
Because the window is ordered by date descending, LEAD("date") OVER w returns the date from the next row, i.e. the next-earlier date.
DISTINCT ON limits the result set to one row (the first row encountered) per emp_id.
With this ordering, that first row contains the greatest date, so LEAD(...) OVER w returns the penultimate date. This gives the following result:
emp_id | max_date | penultimate_date | difference
--------+------------+------------------+------------
1 | 2017-10-31 | 2017-08-08 | 84
(1 row)
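The DISTINCT ON + LEAD combination boils down to: sort each employee's dates newest first, keep the first one, and look one row ahead. Here is that logic as a plain Python sketch (a hypothetical helper, not a database feature), which also makes the edge case visible: an employee with a single date gets no penultimate date, just as LEAD returns NULL there.

```python
from datetime import date
from itertools import groupby


def max_and_penultimate(rows):
    """rows: (emp_id, date) pairs in any order.

    Returns one (emp_id, max_date, penultimate_date, days_between) tuple per
    employee; the last two are None when the employee has a single date.
    """
    # Mirrors ORDER BY emp_id, date DESC: newest date first within each employee
    ordered = sorted(rows, key=lambda r: (r[0], -r[1].toordinal()))
    result = []
    for emp_id, group in groupby(ordered, key=lambda r: r[0]):
        dates = [d for _, d in group]
        newest = dates[0]                                 # the row DISTINCT ON keeps
        following = dates[1] if len(dates) > 1 else None  # what LEAD(date) sees there
        days = (newest - following).days if following is not None else None
        result.append((emp_id, newest, following, days))
    return result


if __name__ == "__main__":
    sample = [(1, date(2017, 10, 31)), (1, date(2017, 8, 8)), (1, date(2017, 6, 2))]
    print(max_and_penultimate(sample))
```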

Count the number of consecutive entries fulfilling a condition within a GROUP BY

I've got a list of users who are behind on their bills, and I want to generate an entry for each of them that says how many consecutive bills they've been behind on. So here's the table:
user | bill_date | outstanding_balance
---------------------------------------
a | 2017-03-01 | 90
a | 2016-12-01 | 60
a | 2016-09-01 | 30
b | 2017-03-01 | 50
b | 2016-12-01 | 0
b | 2016-09-01 | 40
c | 2017-03-01 | 0
c | 2016-12-01 | 0
c | 2016-09-01 | 1
And I want a query that would generate the following table:
user | consecutive_billing_periods_behind
-----------------------------------------
a | 3
b | 1
c | 0
In other words, if a user has paid up at any point, I want to ignore all of the earlier entries and only count how many billing periods they have been behind since they were last paid up. How do I do this most simply?

If I understand the question correctly, you first need to find the last date each customer paid their bill, i.e. the last date their outstanding balance was 0. This subquery finds the zero-balance dates (the latest one is picked with max() later):
(SELECT
user1,
bill_date AS no_outstanding_bill_date
FROM table1
WHERE outstanding_balance = 0)
Then get the last bill date, and add a field to each row that flags whether it is an outstanding bill. Then keep only the rows between each customer's last clear day and last bill date with this WHERE clause:
WHERE bill_date >= last_clear_day AND bill_date <= last_bill_date
Putting the pieces together gives this query:
SELECT
DISTINCT
user1,
sum(is_outstanding_bill)
OVER (
PARTITION BY user1 ) AS consecutive_billing_periods_behind
FROM (
SELECT
user1,
last_value(bill_date)
OVER (
PARTITION BY user1
ORDER BY bill_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS last_bill_date,
CASE WHEN outstanding_balance > 0
THEN 1
ELSE 0 END AS is_outstanding_bill,
bill_date,
outstanding_balance,
-- PostgreSQL has no nvl(); COALESCE falls back to the first bill date for users who were never paid up
coalesce(max(t2.no_outstanding_bill_date)
OVER (
PARTITION BY user1 ), min(bill_date)
OVER (
PARTITION BY user1 )) AS last_clear_day
FROM table1 t1
-- one row per user, so users with several zero balances do not get duplicated bills
LEFT JOIN (SELECT
user1,
max(bill_date) AS no_outstanding_bill_date
FROM table1
WHERE outstanding_balance = 0
GROUP BY user1) t2 USING (user1)
) table2
WHERE bill_date >= last_clear_day AND bill_date <= last_bill_date
Since the query uses DISTINCT, you don't need a GROUP BY clause.
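The last-clear-day logic is easy to check outside SQL. This plain Python sketch (hypothetical function name; ISO date strings assumed, so they compare correctly as text) applies the same rule per user: take the last zero-balance bill date, or the first bill date if there is none, then count the outstanding bills from that day on.

```python
# Sample data from the question: (user, bill_date, outstanding_balance)
BILLS = [
    ("a", "2017-03-01", 90), ("a", "2016-12-01", 60), ("a", "2016-09-01", 30),
    ("b", "2017-03-01", 50), ("b", "2016-12-01", 0), ("b", "2016-09-01", 40),
    ("c", "2017-03-01", 0), ("c", "2016-12-01", 0), ("c", "2016-09-01", 1),
]


def periods_behind(bills):
    """Count each user's outstanding bills on or after their last clear day."""
    by_user = {}
    for user, bill_date, balance in bills:
        by_user.setdefault(user, []).append((bill_date, balance))
    result = {}
    for user, rows in by_user.items():
        clear_dates = [d for d, b in rows if b == 0]
        # Fallback used by the answer: the first bill date when the user was never paid up
        last_clear_day = max(clear_dates) if clear_dates else min(d for d, _ in rows)
        result[user] = sum(1 for d, b in rows if d >= last_clear_day and b > 0)
    return result


if __name__ == "__main__":
    print(periods_behind(BILLS))
```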

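-- user is a reserved word in PostgreSQL (it means current_user), so the column name must be quoted
select
"user",
count(case when min_balance > 0 then 1 end)
as consecutive_billing_periods_behind
from
(
select
"user",
-- running minimum from the newest bill backwards: it drops to 0 at the latest paid-up bill
min(outstanding_balance)
over (partition by "user" order by bill_date desc) as min_balance
from tbl
) s
group by "user"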
Or:
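select
"user",
count(*)
as consecutive_billing_periods_behind
from
(
select
"user",
bill_date,
max(case when outstanding_balance = 0 then bill_date end) over
(partition by "user")
as max_bill_date_with_zero_balance
from tbl
) s
where
-- If user has no outstanding_balance = 0, then
max_bill_date_with_zero_balance is null
-- Count all rows in this case.
-- Otherwise
or
-- count rows with
bill_date > max_bill_date_with_zero_balance
group by "user"
Note that this version returns no row at all for a user whose latest bill has a zero balance (such as c), because all of that user's rows are filtered out before grouping; the first query returns 0 for such users.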
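The running-minimum idea from the first query can be tried in SQLite via Python's sqlite3 module (3.25 or newer assumed). Walking each user's bills from newest to oldest, min(outstanding_balance) stays positive until the most recent paid-up bill and is 0 from there on, so counting the positive values gives the length of the current streak. The table name tbl comes from the answer; the function name streaks is hypothetical.

```python
import sqlite3

# Sample data from the question: (user, bill_date, outstanding_balance)
BILLS = [
    ("a", "2017-03-01", 90), ("a", "2016-12-01", 60), ("a", "2016-09-01", 30),
    ("b", "2017-03-01", 50), ("b", "2016-12-01", 0), ("b", "2016-09-01", 40),
    ("c", "2017-03-01", 0), ("c", "2016-12-01", 0), ("c", "2016-09-01", 1),
]

QUERY = """
SELECT "user",
       count(CASE WHEN min_balance > 0 THEN 1 END) AS consecutive_billing_periods_behind
FROM (
    SELECT "user",
           -- newest bill first: the running minimum becomes 0 at the latest paid-up bill
           min(outstanding_balance) OVER (PARTITION BY "user" ORDER BY bill_date DESC)
               AS min_balance
    FROM tbl
) s
GROUP BY "user"
ORDER BY "user"
"""


def streaks(bills):
    """Return (user, consecutive_billing_periods_behind) rows."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE tbl ("user" TEXT, bill_date TEXT, outstanding_balance INTEGER)')
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", bills)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(streaks(BILLS))
```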

Compare interval date by row

For each identifier, I am trying to group dates that fall within a one-year interval of each other, labeling the earliest and the latest date of each group. If no other date falls within one year of a given date, that date should be recorded as both the first and the last date. For example, the original data is:
id | date
____________
a | 1/1/2000
a | 1/2/2001
a | 1/6/2000
b | 1/3/2001
b | 1/3/2000
b | 1/3/1999
c | 1/1/2000
c | 1/1/2002
c | 1/1/2003
And the output I want is:
id | first_date | last_date
___________________________
a | 1/1/2000 | 1/2/2001
b | 1/3/1999 | 1/3/2001
c | 1/1/2000 | 1/1/2000
c | 1/1/2002 | 1/1/2003
I have been trying to figure this out all day without success. I can handle ids with only two rows, but not ids with more. Any help would be great.

SELECT id
, min(min_date) AS min_date
, max(max_date) AS max_date
, sum(row_ct) AS row_ct
FROM (
SELECT id, year, min_date, max_date, row_ct
, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
FROM (
SELECT id
, extract(year FROM the_date)::int AS year
, min(the_date) AS min_date
, max(the_date) AS max_date
, count(*) AS row_ct
FROM tbl
GROUP BY id, year
) sub1
) sub2
GROUP BY id, grp
ORDER BY id, grp;
1) Group all rows per (id, year), in subquery sub1. Record min and max of the date. I added a count of rows (row_ct) for demonstration.
2) Subtract the row_number() from the year in the second subquery sub2. Thus, all rows in succession end up in the same group (grp). A gap in the years starts a new group.
3) In the final SELECT, group a second time, this time by (id, grp), and record min, max and row count again. Voilà. This produces exactly the result you are looking for.
-> SQLfiddle demo.
Related answers:
Return array of years as year ranges
Group by repeating attribute
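The three steps can be reproduced in SQLite through Python's sqlite3 module (3.25 or newer assumed), with strftime('%Y', ...) in place of extract(year FROM ...). The question's dates are assumed to be M/D/YYYY and are rewritten as ISO strings; the function name year_islands is hypothetical. Like the answer, this groups consecutive calendar years rather than applying a rolling 365-day window.

```python
import sqlite3

# The question's sample data, with M/D/YYYY dates rewritten as ISO strings (an assumption)
ROWS = [
    ("a", "2000-01-01"), ("a", "2001-01-02"), ("a", "2000-01-06"),
    ("b", "2001-01-03"), ("b", "2000-01-03"), ("b", "1999-01-03"),
    ("c", "2000-01-01"), ("c", "2002-01-01"), ("c", "2003-01-01"),
]

QUERY = """
SELECT id, min(min_date) AS first_date, max(max_date) AS last_date
FROM (
    -- step 2: consecutive years share the same (year - row_number) value
    SELECT id, min_date, max_date,
           yr - row_number() OVER (PARTITION BY id ORDER BY yr) AS grp
    FROM (
        -- step 1: one row per (id, year)
        SELECT id,
               CAST(strftime('%Y', the_date) AS INTEGER) AS yr,
               min(the_date) AS min_date,
               max(the_date) AS max_date
        FROM tbl
        GROUP BY id, CAST(strftime('%Y', the_date) AS INTEGER)
    ) sub1
) sub2
-- step 3: collapse each island of consecutive years
GROUP BY id, grp
ORDER BY id, first_date
"""


def year_islands(rows):
    """Return (id, first_date, last_date) for each run of consecutive years."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id TEXT, the_date TEXT)")
    con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in year_islands(ROWS):
        print(row)
```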

select id, min("date") as first_date, max("date") as last_date
from <yourTbl> group by id

Use this (SQLFiddle Demo):
SELECT id,
min(date) AS first_date,
max(date) AS last_date
FROM mytable
GROUP BY 1
ORDER BY 1
