Subtracting values by id and a previous counter

Subtracting values by id and a previous counter - postgresql

Here is a Snippet of my data :-
customers order_id order_date order_counter
1 a 1/1/2018 1
1 b 1/4/2018 2
1 c 3/8/2018 3
1 d 4/9/2019 4
I'm trying to get the average number of days between the order time for each customer. So for the following Snippet the average number of days should be 32.66 days as there were 3,62,32 number of days between each order, sum it, and then divide by 3.
My data has Customers that may have more than 100+ orders .

You could use LAG function:
WITH cte AS (
SELECT customers,order_date-LAG(order_date) OVER(PARTITION BY customers ORDER BY order_counter) AS d
FROM t
)
SELECT customers, AVG(d)
FROM cte
WHERE d IS NOT NULL
GROUP BY customers;
db<>fiddle demo

With a self join, group by customer and get the average difference:
select
t.customers,
round(avg(tt.order_date - t.order_date), 2) averagedays
from tablename t inner join tablename tt
on tt.customers = t.customers and tt.order_counter = t.order_counter + 1
group by t.customers
See the demo.
Results:
| customers | averagedays |
| --------- | ----------- |
| 1 | 32.67 |

Please check below query.
I tried to insert data of two customers so that we can check that average for every customer is coming correct.
DB Fiddle Example: https://www.db-fiddle.com/
CREATE TABLE test (
customers INTEGER,
order_id VARCHAR(1),
order_date DATE,
order_counter INTEGER
);
INSERT INTO test
(customers, order_id, order_date, order_counter)
VALUES
('1', 'a', '2018-01-01', '1'),
('1', 'b', '2018-01-04', '2'),
('1', 'c', '2018-03-08', '3'),
('1', 'd', '2018-04-09', '4'),
('2', 'a', '2018-01-01', '1'),
('2', 'b', '2018-01-06', '2'),
('2', 'c', '2018-03-12', '3'),
('2', 'd', '2018-04-15', '4');
commit;
select customers , round(avg(next_order_diff),2) as average
from
(
select customers , order_date , next_order_date - order_date as next_order_diff
from
(
select customers ,
lead(order_date) over (partition by customers order by order_date) as next_order_date , order_date
from test
) a
where next_order_date is not null
) a
group by customers
order by customers
;

Another option. I would myself like the answer from #forpas except that it depends on the monotonically increasing value for order_counter (what happens when an order is deleted). The following accounts for that by actually counting the number of order pairs. It also accounts for customers have places only 1 order, returning NULL as the average.
select customers, round(sum(nd)::numeric/n, 2) avg_days_to_order
from (
select customers
, order_date - lag(order_date) over(partition by customers order by order_counter) nd
, count(*) over (partition by customers) - 1 n
from test
)d
group by customers, n
order by customers;

Related

Remove duplicates based on condition and keep oldest element

Currently, the table is ordered in ascending order by row_number. I need help removing duplicates based on 2 conditions.
If there is a stage, that is online then I want to keep that row, doesn't matter which one, there can be multiple.
If there isn't a row with online for that org_id, then I keep row_number = 1 which would be the oldest element.
sales_id
org_id
stage
row_number
ccc_123
ccc
off-line
1
ccc_123
ccc
off-line
2
ccc_123
ccc
online
3
abc_123
abc
off-line
1
abc_123
abc
power-off
2
zzz_123
zzz
power-off
1
so the table should look like this after:
sales_id
org_id
stage
ccc_123
ccc
online
abc_123
abc
off-line
zzz_123
zzz
power-off
Looks like this, stackoverflow not working well with second table for some reason

I would use a combination of a CASE statement to modify the rownumber of records with stage='online' and then use ROW_NUMBER to allow me to filter for the lowest value in a group.
http://sqlfiddle.com/#!17/1421b/5
create table sales_stage (
sales_id varchar,
org_id varchar,
stage varchar,
row_num int);
insert into sales_stage (sales_id, org_id, stage, row_num) values
('ccc_123', 'ccc', 'off-line', 1),
('ccc_123', 'ccc', 'off-line', 2),
('ccc_123', 'ccc', 'online', 3),
('abc_123', 'abc', 'off-line', 1),
('abc_123', 'abc', 'power-off', 2),
('zzz_123', 'zzz', 'power-off', 1);
SELECT
sales_id, org_id, stage
FROM
(
SELECT
sales_id, org_id, stage,
ROW_NUMBER() OVER(PARTITION BY sales_id, org_id ORDER BY row_num) as rn
FROM (
SELECT sales_id, org_id, stage,
CASE WHEN stage='online' THEN -999 ELSE row_num END as row_num
FROM sales_stage
) x
) y
WHERE rn = 1

Checking Slowly Changing Dimension 2

I have a table that looks like this:
A slowly changing dimension type 2, according to Kimball.
Key is just a surrogate key, a key to make rows unique.
As you can see there are three rows for product A.
Timelines for this product are ok. During time the description of the product changes.
From 1-1-2020 up until 4-1-2020 the description of this product was ProdA1.
From 5-1-2020 up until 12-2-2020 the description of this product was ProdA2 etc.
If you look at product B, you see there are gaps in the timeline.
We use DB2 V12 z/Os. How can I check if there are gaps in the timelines for each and every product?
Tried this, but doesn't work
with selectie (key, tel) as
(select product, count(*)
from PROD_TAB
group by product
having count(*) > 1)
Select * from
PROD_TAB A
inner join selectie B
on A.product = B.product
Where not exists
(SELECT 1 from PROD_TAB C
WHERE A.product = C.product
AND A.END_DATE + 1 DAY = C.START_DATE
)
Does anyone know the answer?

The following query returns all gaps for all products.
The idea is to enumerate (RN column) all periods inside each product by START_DATE and join each record with its next period record.
WITH
/*
MYTAB (PRODUCT, DESCRIPTION, START_DATE, END_DATE) AS
(
SELECT 'A', 'ProdA1', DATE('2020-01-01'), DATE('2020-01-04') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'A', 'ProdA2', DATE('2020-01-05'), DATE('2020-02-12') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'A', 'ProdA3', DATE('2020-02-13'), DATE('2020-12-31') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB1', DATE('2020-01-05'), DATE('2020-01-09') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB2', DATE('2020-01-12'), DATE('2020-03-14') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB3', DATE('2020-03-15'), DATE('2020-04-18') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB4', DATE('2020-04-16'), DATE('2020-05-03') FROM SYSIBM.SYSDUMMY1
)
,
*/
MYTAB_ENUM AS
(
SELECT
T.*
, ROWNUMBER() OVER (PARTITION BY PRODUCT ORDER BY START_DATE) RN
FROM MYTAB T
)
SELECT A.PRODUCT, A.END_DATE + 1 START_DT, B.START_DATE - 1 END_DT
FROM MYTAB_ENUM A
JOIN MYTAB_ENUM B ON B.PRODUCT = A.PRODUCT AND B.RN = A.RN + 1
WHERE A.END_DATE + 1 <> B.START_DATE
AND A.END_DATE < B.START_DATE;
The result is:
|PRODUCT|START_DT |END_DT |
|-------|----------|----------|
|B |2020-01-10|2020-01-11|
May be more efficient way:
WITH MYTAB2 AS
(
SELECT
T.*
, LAG(END_DATE) OVER (PARTITION BY PRODUCT ORDER BY START_DATE) END_DATE_PREV
FROM MYTAB T
)
SELECT PRODUCT, END_DATE_PREV + 1 START_DATE, START_DATE - 1 END_DATE
FROM MYTAB2
WHERE END_DATE_PREV + 1 <> START_DATE
AND END_DATE_PREV < START_DATE;

Thnx Mark, will try this one of these days.
Never heard of LAG in DB2 V12 for z/Os
Will read about it
Thnx

Unpivot Columns with Most Recent Record

Student Records are updated for subject and update date. Student can be enrolled in one or multiple subjects. I would like to get each student record with most subject update date and status.
CREATE TABLE Student
(
StudentID int,
FirstName varchar(100),
LastName varchar(100),
FullAddress varchar(100),
CityState varchar(100),
MathStatus varchar(100),
MUpdateDate datetime2,
ScienceStatus varchar(100),
SUpdateDate datetime2,
EnglishStatus varchar(100),
EUpdateDate datetime2
);
Desired query output, I am using CTE method but trying to find alternative and better way.
SELECT StudentID, FirstName, LastName, FullAddress, CityState, [SubjectStatus], UpdateDate
FROM Student
;WITH orginal AS
(SELECT * FROM Student)
,Math as
(
SELECT DISTINCT StudentID, FirstName, LastName, FullAddress, CityState,
ROW_NUMBER OVER (PARTITION BY StudentID, MathStatus ORDER BY MUpdateDate DESC) as rn
, _o.MathStatus as SubjectStatus, _o.MupdateDate as UpdateDate
FROM original as o
left join orignal as _o on o.StudentID = _o.StudentID
where _o.MathStatus is not null and _o.MUpdateDate is not null
)
,Science AS
(
...--Same as Math
)
,English AS
(
...--Same As Math
)
SELECT * FROM Math WHERE rn = 1
UNION
SELECT * FROM Science WHERE rn = 1
UNION
SELECT * FROM English WHERE rn = 1

First: storing data in a denormalized form is not recommended. Some data model redesign might be in order. There are multiple resources about data normalization available on the web, like this one.
Now then, I made some guesses about how your source table is populated based on the query you wrote. I generated some sample data that could show how the source data is created. Besides that I also reduced the number of columns to reduce my typing efforts. The general approach should still be valid.
Sample data
create table Student
(
StudentId int,
StudentName varchar(15),
MathStat varchar(5),
MathDate date,
ScienceStat varchar(5),
ScienceDate date
);
insert into Student (StudentID, StudentName, MathStat, MathDate, ScienceStat, ScienceDate) values
(1, 'John Smith', 'A', '2020-01-01', 'B', '2020-05-01'),
(1, 'John Smith', 'A', '2020-01-01', 'B+', '2020-06-01'), -- B for Science was updated to B+ month later
(2, 'Peter Parker', 'F', '2020-01-01', 'A', '2020-05-01'),
(2, 'Peter Parker', 'A+', '2020-03-01', 'A', '2020-05-01'), -- Spider-Man would never fail Math, fixed...
(3, 'Tom Holland', null, null, 'A', '2020-05-01'),
(3, 'Tom Holland', 'A-', '2020-07-01', 'A', '2020-05-01'); -- Tom was sick for Math, but got a second chance
Solution
Your question title already contains the word unpivot. That word actually exists in T-SQL as a keyword. You can learn about the unpivot keyword in the documentation. Your own solution already contains common table expression, these constructions should look familiar.
Steps:
cte_unpivot = unpivot all rows, create a Subject column and place the corresponding values (SubjectStat, Date) next to it with a case expression.
cte_recent = number the rows to find the most recent row per student and subject.
Select only those most recent rows.
This gives:
with cte_unpivot as
(
select up.StudentId,
up.StudentName,
case up.[Subject]
when 'MathStat' then 'Math'
when 'ScienceStat' then 'Science'
end as [Subject],
up.SubjectStat,
case up.[Subject]
when 'MathStat' then up.MathDate
when 'ScienceStat' then up.ScienceDate
end as [Date]
from Student s
unpivot ([SubjectStat] for [Subject] in ([MathStat], [ScienceStat])) up
),
cte_recent as
(
select cu.StudentId, cu.StudentName, cu.[Subject], cu.SubjectStat, cu.[Date],
row_number() over (partition by cu.StudentId, cu.[Subject] order by cu.[Date] desc) as [RowNum]
from cte_unpivot cu
)
select cr.StudentId, cr.StudentName, cr.[Subject], cr.SubjectStat, cr.[Date]
from cte_recent cr
where cr.RowNum = 1;
Result
StudentId StudentName Subject SubjectStat Date
----------- --------------- ------- ----------- ----------
1 John Smith Math A 2020-01-01
1 John Smith Science B+ 2020-06-01
2 Peter Parker Math A+ 2020-03-01
2 Peter Parker Science A 2020-05-01
3 Tom Holland Math A- 2020-07-01
3 Tom Holland Science A 2020-05-01

How can I combine the two select queries on the same table horizontally in Postgresql?

everyone. I am a beginner of Postgresql. Recently I met with one question.
I have one table named 'sales'.
create table sales
(
cust varchar(20),
prod varchar(20),
day integer,
month integer,
year integer,
state char(2),
quant integer
);
insert into sales values ('Bloom', 'Pepsi', 2, 12, 2001, 'NY', 4232);
insert into sales values ('Knuth', 'Bread', 23, 5, 2005, 'PA', 4167);
insert into sales values ('Emily', 'Pepsi', 22, 1, 2006, 'CT', 4404);
insert into sales values ('Emily', 'Fruits', 11, 1, 2000, 'NJ', 4369);
insert into sales values ('Helen', 'Milk', 7, 11, 2006, 'CT', 210);
insert into sales values ('Emily', 'Soap', 2, 4, 2002, 'CT', 2549);
insert into sales values ('Bloom', 'Eggs', 30, 11, 2000, 'NJ', 559);
....
There are 498 rows in total.
Here is the overview of this table:
Now I want to compute the maximum and minimum sales quantities for each product, along with their corresponding customer (who purchased the product), dates (i.e., dates of those maximum and minimum sales quantities) and the state in which the sale transaction took place.
And the average sales quantity for the corresponding products.
The combined one should be like this:
It should have 10 rows because there are 10 distinct products in total.
I have tried:
select prod,
max(quant),
cust as MAX_CUST
from sales
group by prod;
but it returned an error and said the cust should be in the group by. But I only want to classify by the type of product.
What's more, how can I horizontally combine the max_q and its customer, date, state with min_q and its customer, date, state and also the AVG_Q by their product name?
I feel really confused!

You can use analytic function ROW_NUMBER to rank records by increasing/decreasing sales for each product in a subquery, and then do conditional aggregation:
SELECT
prod product,
MAX(CASE WHEN rn2 = 1 THEN quant END) max_quant,
MAX(CASE WHEN rn2 = 1 THEN cust END) max_cust,
MAX(CASE WHEN rn2 = 1 THEN TO_DATE(year || '-' || month || '-' || day, 'YYYY-MM-DD') END) max_date,
MAX(CASE WHEN rn2 = 1 THEN state END) max_state,
MAX(CASE WHEN rn1 = 1 THEN quant END) min_quant,
MAX(CASE WHEN rn1 = 1 THEN cust END) min_cust,
MAX(CASE WHEN rn1 = 1 THEN TO_DATE(year || '-' || month || '-' || day, 'YYYY-MM-DD') END) min_date,
MAX(CASE WHEN rn1 = 1 THEN state END) min_state,
avg_quant
FROM (
SELECT
s.*,
ROW_NUMBER() OVER(PARTITION BY prod ORDER BY quant) rn1,
ROW_NUMBER() OVER(PARTITION BY prod ORDER BY quant DESC) rn2,
AVG(quant) OVER(PARTITION BY prod) avg_quant
FROM sales s
) x
WHERE rn1 = 1 OR rn2 = 1
GROUP BY prod, avg_quant

With two aggregate function (min, max) applied on a column and selecting respective row is not that straight forward. if u wanted only one aggregate function u could do something like example below with dense rank (window function).
SELECT prod, quant cust,
dense_rank() OVER (PARTITION BY prod ORDER BY quant DESC) AS c_rank
FROM sales WHERE c_rank < 2;
this will give you rows for a product with maximum quant. you can do same for minimum quant. it will more complicated to do both in same query, you can do it in simple way of creating on the fly tables for each case and joining them as show below.
with max_quant as (
SELECT prod, quant cust,
dense_rank() OVER (PARTITION BY prod ORDER BY quant DESC) AS c_rank
FROM sales WHERE c_rank < 2
),
min_quant as (
SELECT prod, quant cust,
dense_rank() OVER (PARTITION BY prod ORDER BY quant DESC) AS c_rank
FROM sales WHERE c_rank < 2
),
avg_quant as (
select prod, avg(quant) as avg_quant from sales group by prod
)
select mx.prod, mx.quant, mx.cust, mn.quant, mn.cust, ag.avg_quant
from max_quant mx
join min_quant mn on mn.prod = mx.prod
join avg_quant ag on ag.prod = mx.prod;
you cant use a group by to select min/max here as you want to get the complete row for the min/max value of quant which is not possible directly with group by.

Min date flag in select

I have a table with records for sales of products.
For the purpose of sales count a product should only be counted one time.
In this scenario a product is sold and reversed several times and we should only consider it in the month with minimum date and rest all the dates should be marked no.
Eample:
Product Month Sales flag
A Jan-01 Y
B Jan-01 Y
A Feb-01 N
C Feb-01 Y
How can I write a select from the table indicating as above. Any help would be appreciated.
Tried and failed.

The trick here is that ordering by "Jan-01", "Feb-01", etc... is tricky because you need to sort numeric values stored as text. This is one of the uses of a calendar table or data dimension. In my solution below I'm creating an on-the-fly date dimension table with "Month-number" you can sort by...
-- Sample data
DECLARE #table TABLE
(
Product CHAR(1) NOT NULL,
Mo CHAR(6) NOT NULL
)
INSERT #table VALUES
('A', 'Jan-01'),
('B', 'Jan-01'),
('A', 'Feb-01'),
('C', 'Feb-01');
-- Solution
SELECT f.Product, f.Mo, [Sales Flag] = CASE f.rnk WHEN 1 THEN 'Y' ELSE 'N' END
FROM
(
SELECT t.Product, i.Mo, rnk = ROW_NUMBER() OVER (PARTITION BY t.Product ORDER BY i.RN)
FROM #table AS t
JOIN
(
SELECT i.RN, Mo = LEFT(DATENAME(MONTH,DATEADD(MONTH, i.RN-1, '20010101')),3)+'-01'
FROM
(
SELECT RN = ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS x(x)
) AS i
) AS i ON t.Mo = i.Mo
) AS f;
Returns:
Product Mo Sales Flag
------- ------ ----------
A Jan-01 Y
A Feb-01 N
B Jan-01 Y
C Feb-01 Y

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Subtracting values by id and a previous counter - postgresql

You could use LAG function: WITH cte AS ( SELECT customers,order_date-LAG(order_date) OVER(PARTITION BY customers ORDER BY order_counter) AS d FROM t ) SELECT customers, AVG(d) FROM cte WHERE d IS NOT NULL GROUP BY customers; db<>fiddle demo

Related

Remove duplicates based on condition and keep oldest element

Checking Slowly Changing Dimension 2

Unpivot Columns with Most Recent Record

How can I combine the two select queries on the same table horizontally in Postgresql?

Min date flag in select

Categories

Resources