Count number of unique purchase dates - oracle10g

I have a log of purchases made by customers. Sometimes a customer purchases multiple items during a given purchase, other times they only purchase a single item. What I want to do, on a line-by-line basis, is identify which purchase event each row belongs to (i.e. not on an item-by-item basis, but on a checkout-by-checkout basis).
Each row of the source table contains the following fields:
cust_id, purchase_date, sku
So a customer who purchases three items during a given transaction would look like this
1, 01/01/01, dog1
1, 01/01/01, cat1
1, 01/01/01, mouse1
1, 01/02/01, wolf1
1, 01/03/01, lion1
What I want out is:
cust_id, purchase_date, sku, item_purchase_number_within_purchase, unique_purchase_date_across_dates
And that would look like
1, 01/01/01, dog1, 1, 1
1, 01/01/01, cat1, 2, 1
1, 01/01/01, mouse1, 3, 1
1, 01/02/01, wolf1, 1, 2
1, 01/03/01, lion1, 1, 3
In words: on the first date, three items were purchased, arbitrarily identified as item numbers 1, 2, and 3; on the second purchase date (Jan 2, 2001), only a single item was purchased, but this was the second purchasing event; and on the third purchase date (Jan 3, 2001) there was another single item purchased.
I'm trying to do this in Oracle 10g. I'm not sure how to describe what I'm trying to accomplish.
This is the SQL I have so far:
SELECT
cust_id, purchase_date, sku, ROW_NUMBER() OVER (PARTITION BY purchase_date ORDER BY sku)
FROM
[table]
Thanks

You seem to want dense_rank() rather than row_number() (or rank()) to avoid gaps. With your sample data in a CTE:
with t (cust_id, purchase_date, sku) as (
select 1, date '2001-01-01', 'dog1' from dual
union all select 1, date '2001-01-01', 'cat1' from dual
union all select 1, date '2001-01-01', 'mouse1' from dual
union all select 1, date '2001-01-02', 'wolf1' from dual
union all select 1, date '2001-01-03', 'lion1' from dual
)
select cust_id, purchase_date, sku,
dense_rank() over (partition by cust_id, purchase_date order by sku)
as item_within_purchase,
dense_rank() over (partition by cust_id order by purchase_date)
as purchase_event
from t;
   CUST_ID PURCHASE_D SKU    ITEM_WITHIN_PURCHASE PURCHASE_EVENT
---------- ---------- ------ -------------------- --------------
         1 2001-01-01 cat1                      1              1
         1 2001-01-01 dog1                      2              1
         1 2001-01-01 mouse1                    3              1
         1 2001-01-02 wolf1                     1              2
         1 2001-01-03 lion1                     1              3
The first extra column is partitioned by both customer and date and ordered by SKU, as you had; the second is partitioned only by customer and ordered by date.
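For contrast, here is a quick sketch (reusing the CTE above) of what row_number() partitioned only by customer would produce; it numbers every row, which is why it can't serve as the purchase-event column:
select cust_id, purchase_date, sku,
       row_number() over (partition by cust_id order by purchase_date, sku) as purchase_event
from t;
-- this numbers the rows 1..5 for the customer, so the three items bought on
-- 2001-01-01 would get three different "purchase event" numbers; dense_rank()
-- over the same partition gives 1, 1, 1, 2, 3 instead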

Related

tsql max and group by not working properly

I have the following table articles:
ID  category   price
1   category1  10
2   category1  55
3   category2  15
4   category3  20
5   category4  25
I would like to get the highest price of each category.
The result would be:
ID  category   price
2   category1  55
3   category2  15
4   category3  20
5   category4  25
select Max(price), ID, category from article
group by ID,category
returns:
ID  category   price
1   category1  10
2   category1  55
3   category2  15
4   category3  20
5   category4  25
Unfortunately I get both rows for category1, but I would only like to have the highest price in category1, which is 55.
Can someone help me?
Try this...
I've reproduced your sample data and then added a rnk column which ranks by price descending within each category, used this in a subquery, and just returned anything where the rank is 1.
DECLARE @articles TABLE (ID int, Category varchar(20), Price float)
INSERT INTO @articles VALUES
(1, 'category1', 10),
(2, 'category1', 55),
(3, 'category2', 15),
(4, 'category3', 20),
(5, 'category4', 25)
SELECT
ID, Category, Price
FROM (
SELECT
ID, Category, Price
, RANK() OVER(PARTITION BY Category ORDER BY Price DESC) as rnk
FROM @articles
) a
WHERE a.rnk = 1
Which gives these results:
ID          Category    Price
----------- ----------- ------
2           category1   55
3           category2   15
4           category3   20
5           category4   25
Note: If you have two articles for the same category with the same price, both will be returned.
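If you want exactly one row per category even when two articles tie on price, a minimal variation (just a sketch, using ID as an arbitrary tie-breaker) is to swap RANK() for ROW_NUMBER():
SELECT ID, Category, Price
FROM (
    SELECT ID, Category, Price,
           -- ROW_NUMBER never produces ties; ID decides which tied row wins
           ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Price DESC, ID) AS rn
    FROM @articles
) a
WHERE a.rn = 1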
--===== This is NOT a part of the solution.
-- We're just making "Readily Consumable Test Data" here.
-- This is how you should post sample data to help those
-- that would help you. You'll get more thumbs up on your
-- questions, as well
SELECT *
INTO #Articles
FROM (VALUES
(1, 'category1', 10)
,(2, 'category1', 55)
,(3, 'category2', 15)
,(4, 'category3', 20)
,(5, 'category4', 25)
)d(ID,category,price)
;
--===== One possible easy solution that will also display "ties".
WITH cteRankByCategory AS
(
SELECT *,DR = DENSE_RANK() OVER (PARTITION BY Category ORDER BY Category, Price DESC)
FROM #Articles
)
SELECT ID,Category,MaxPrice = Price
FROM cteRankByCategory
WHERE DR = 1
ORDER BY Category
;

Oracle SQL return value from child table with minimum row number with values in specific list

I need to select all rows from a table (main table) and join to another table (child table). In the result set, I want to include one column from the child table: the value from the first row (lowest line number) whose column value is in a specified list. If there is no match for the specified list, it should be (null).
Desired Result:
ORDER_NO  ORDER_DATE  ORDER_CUST  ORDER_VALUE  ITEM
1         02/14/2022  12345       $1,000.00    APPLES
2         02/13/2022  67890       $5,000.00    (null)
3         02/12/2022  45678       $100.00      PEARS
Example:
Main Table: Order Table
Columns: Order Number (Handle), Order Date, Order Customer, Order Value
ORDER_NO  ORDER_DATE  ORDER_CUST  ORDER_VALUE
1         02/14/2022  12345       $1,000.00
2         02/13/2022  67890       $5,000.00
3         02/12/2022  45678       $100.00
Child Table: Order Details Tbl
Columns: Order Number (Handle), Line Number (Order Line No), Ordered Item, Ordered Qty
ORDER_NO  LINE_NO  ITEM
1         10       APPLES
1         20       ORANGES
1         30       LETTUCE
2         10       BROCCOLI
2         20       CAULIFLOWER
2         30       LETTUCE
3         10       KALE
3         20       RADISHES
3         30       PEARS
In this example, the returned column is essentially the first line of the order that is a fruit, not a vegetable. And if the order includes no matching fruit, null is returned.
Here is my code thus far:
SELECT
MAIN.ORDER_NO,
MAIN.ORDER_DATE,
MAIN.ORDER_CUST,
MAIN.ORDER_VALUE,
B.ITEM
FROM
MAIN
LEFT JOIN
(
SELECT
CHILD.ORDER_NO,
CHILD.LINE_NO,
CHILD.ITEM
FROM
CHILD
WHERE
CHILD.ORDER_NO||'_'||LINE_NO IN
(
SELECT
CHILD.ORDER_NO||'_'||MIN(LINE_NO) AS ORDER_LINE_NO
FROM
CHILD
WHERE
CHILD.ITEM IN ('APPLES','ORANGES','PEACHES','PEARS','GRAPES')
GROUP BY
CHILD.ORDER_NO
)
) B ON MAIN.ORDER_NO = B.ORDER_NO
This code is of course not working as desired, as table 'B' is including all results from CHILD.
From Oracle 12, you can use:
SELECT o.*,
d.item
FROM orders o
LEFT OUTER JOIN LATERAL(
SELECT *
FROM order_details d
WHERE o.order_no = d.order_no
AND item IN ('APPLES','ORANGES','PEACHES','PEARS','GRAPES')
ORDER BY line_no ASC
FETCH FIRST ROW ONLY
) d
ON (1 = 1)
In earlier versions you can use:
SELECT o.*,
d.item
FROM orders o
LEFT OUTER JOIN(
SELECT d.*,
ROW_NUMBER() OVER (PARTITION BY order_no ORDER BY line_no ASC)
AS rn
FROM order_details d
WHERE item IN ('APPLES','ORANGES','PEACHES','PEARS','GRAPES')
) d
ON (o.order_no = d.order_no AND rn = 1)
Which, for the sample data:
CREATE TABLE orders (ORDER_NO, ORDER_DATE, ORDER_CUST, ORDER_VALUE) AS
SELECT 1, DATE '2022-02-14', 12345, 1000.00 FROM DUAL UNION ALL
SELECT 2, DATE '2022-02-13', 67890, 5000.00 FROM DUAL UNION ALL
SELECT 3, DATE '2022-02-12', 45678, 100.00 FROM DUAL;
CREATE TABLE Order_Details (ORDER_NO, LINE_NO, ITEM) AS
SELECT 1, 10, 'APPLES' FROM DUAL UNION ALL
SELECT 1, 20, 'ORANGES' FROM DUAL UNION ALL
SELECT 1, 30, 'LETTUCE' FROM DUAL UNION ALL
SELECT 2, 10, 'BROCCOLI' FROM DUAL UNION ALL
SELECT 2, 20, 'CAULIFLOWER' FROM DUAL UNION ALL
SELECT 2, 30, 'LETTUCE' FROM DUAL UNION ALL
SELECT 3, 10, 'KALE' FROM DUAL UNION ALL
SELECT 3, 20, 'RADISHES' FROM DUAL UNION ALL
SELECT 3, 30, 'PEARS' FROM DUAL;
Both output:
ORDER_NO  ORDER_DATE           ORDER_CUST  ORDER_VALUE  ITEM
1         2022-02-14 00:00:00  12345       1000         APPLES
2         2022-02-13 00:00:00  67890       5000         null
3         2022-02-12 00:00:00  45678       100          PEARS
db<>fiddle here
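As an extra sketch (not part of the answer above), the same "first matching line per order" lookup can also be written with Oracle's FIRST/LAST aggregate, which should work on older releases too:
SELECT o.*, d.item
FROM orders o
LEFT OUTER JOIN (
    SELECT order_no,
           -- keep only the item from the lowest-numbered matching line per order
           MIN(item) KEEP (DENSE_RANK FIRST ORDER BY line_no) AS item
    FROM order_details
    WHERE item IN ('APPLES','ORANGES','PEACHES','PEARS','GRAPES')
    GROUP BY order_no
) d
ON (o.order_no = d.order_no);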

Cohort Analysis with RedShift by Month

I am trying to build a cohort analysis for monthly retention, but I am having trouble getting the Month Number column right. The month number is supposed to indicate the month(s) in which a user transacted, i.e. 0 for the registration month, 1 for the first month after registration, 2 for the second month, and so on through the last month, but currently it returns negative month numbers in some cells.
It should be like this table:
cohort_month  total_users  month_number  percentage
------------  -----------  ------------  ----------
January       100          0             40
January       341          1             90
January       115          2             90
February      103          0             73
February      100          1             40
March         90           0             90
Here is the SQL:
with cohort_items as (
select
extract(month from insert_date) as cohort_month,
msisdn as user_id
from mfscore.t_um_user_detail where extract(year from insert_date)=2020
order by 1, 2
),
user_activities as (
select
A.sender_msisdn,
extract(month from A.insert_date)-C.cohort_month as month_number
from mfscore.t_wm_transaction_logs A
left join cohort_items C ON A.sender_msisdn = C.user_id
where extract(year from A.insert_date)=2020
group by 1, 2
),
cohort_size as (
select cohort_month, count(1) as num_users
from cohort_items
group by 1
order by 1
),
B as (
select
C.cohort_month,
A.month_number,
count(1) as num_users
from user_activities A
left join cohort_items C ON A.sender_msisdn = C.user_id
group by 1, 2
)
select
B.cohort_month,
S.num_users as total_users,
B.month_number,
B.num_users * 100 / S.num_users as percentage
from B
left join cohort_size S ON B.cohort_month = S.cohort_month
where B.cohort_month IS NOT NULL
order by 1, 3
I think the RANK window function is the right solution. The idea is to assign a rank to the months of user activity for each user, ordered by year and month.
Something like:
WITH activity_per_user AS (
SELECT
user_id,
event_date,
RANK() OVER (PARTITION BY user_id ORDER BY DATE_PART('year', event_date) , DATE_PART('month', event_date) ASC) AS month_number
FROM user_activities_table
)
The RANK number starts from 1, so you may want to subtract 1.
Then, you can group by user_id and month_number to get the number of interactions for each user per month from the subscription (adapt to your use case accordingly).
SELECT
user_id,
month_number,
COUNT(1) AS n_interactions
FROM activity_per_user
GROUP BY 1, 2
Here is the documentation:
https://docs.aws.amazon.com/redshift/latest/dg/r_WF_RANK.html
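For illustration only, here is how those two snippets might be combined, with the suggested subtraction so the first activity month comes out as 0 (the table and column names are the placeholders used above, not your real ones):
WITH activity_per_user AS (
    SELECT
        user_id,
        event_date,
        -- rank months of activity per user; subtract 1 so numbering starts at 0
        -- (DENSE_RANK may be preferable if a user can have several events in one month)
        RANK() OVER (
            PARTITION BY user_id
            ORDER BY DATE_PART('year', event_date), DATE_PART('month', event_date)
        ) - 1 AS month_number
    FROM user_activities_table
)
SELECT
    user_id,
    month_number,
    COUNT(1) AS n_interactions
FROM activity_per_user
GROUP BY 1, 2;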

How can I remove the null values and make it to 10 rows in Postgresql?

I am new to Postgresql. I have a table called 'sales'.
create table sales
(
cust varchar(20),
prod varchar(20),
day integer,
month integer,
year integer,
state char(2),
quant integer
)
insert into sales values ('Bloom', 'Pepsi', 2, 12, 2001, 'NY', 4232);
insert into sales values ('Knuth', 'Bread', 23, 5, 2005, 'PA', 4167);
insert into sales values ('Emily', 'Pepsi', 22, 1, 2006, 'CT', 4404);
insert into sales values ('Emily', 'Fruits', 11, 1, 2000, 'NJ', 4369);
insert into sales values ('Helen', 'Milk', 7, 11, 2006, 'CT', 210);
insert into sales values ('Emily', 'Soap', 2, 4, 2002, 'CT', 2549);
Now I want to find the “most favorable” month (when the largest quantity of the product was sold) and the “least favorable” month (when the smallest quantity of the product was sold) for each product.
The result should be one row per product, showing its most favorable and least favorable month.
I entered
SELECT
prod product,
MAX(CASE WHEN rn2 = 1 THEN month END) MOST_FAV_MO,
MAX(CASE WHEN rn1 = 1 THEN month END) LEAST_FAV_MO
FROM (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY prod ORDER BY quant ) rn1,
ROW_NUMBER() OVER(PARTITION BY prod ORDER BY quant DESC) rn2
FROM sales
) x
WHERE rn1 = 1 or rn2 = 1
GROUP BY prod,quant;
Then there are null values for each product and there are 20 rows in total.
So how can I remove the null values in these rows and reduce the total number of rows to 10? (There are 10 distinct products in total.)
I would say that the GROUP BY clause should be
GROUP BY prod
Otherwise you get one line per different quant, which is not what you want.
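Applied to the query from the question, the corrected statement would be (a sketch with nothing else changed):
SELECT
    prod AS product,
    MAX(CASE WHEN rn2 = 1 THEN month END) AS most_fav_mo,
    MAX(CASE WHEN rn1 = 1 THEN month END) AS least_fav_mo
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY prod ORDER BY quant) AS rn1,
        ROW_NUMBER() OVER (PARTITION BY prod ORDER BY quant DESC) AS rn2
    FROM sales
) x
WHERE rn1 = 1 OR rn2 = 1
GROUP BY prod;   -- one row per product; MAX() ignores the NULLs from the CASE expressions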

Group events by sequence, defining the minimum period between sequences t-SQL

I have a table of events, called tbl_events that looks something like this:
PersonID Date
1 30/03/2015
1 22/04/2015
1 30/06/2015
2 18/07/2016
2 09/12/2016
2 28/04/2017
3 01/10/2014
3 28/11/2016
3 28/11/2016
3 16/01/2017
4 13/04/2017
4 09/05/2017
I want to be able to group these events by the start date of each 'sequence', with a sequence being defined as a run of events from the first identified to the last identified for each PersonID. The last event in a sequence is the event after which there are no further events for that PersonID for a year.
The result I would expect is below:
PersonID FirstDate Sequence Events
1 30/03/2015 1 3
2 18/07/2016 1 3
3 01/10/2014 1 1
3 28/11/2016 2 3
4 13/04/2017 1 2
I am able to identify the sequences in Excel and pivot the data, but I need to be able to do this in SQL.
Here is the formula I have used in Excel to generate the sequence number (I am populating cell C3, with column A being PersonID and B being Date):
=+IF(A2<>A3,1,IF((B3-B2)<365,C2,C2+1))
I have joined the table back on itself using ROW_NUMBER to get the difference between the Date and the previous event date for that ID, but I'm not really sure where to go from there.
Any help is much appreciated.
My solution is based on the sample data you've provided along with your Excel formula.
-- easily consumable sample data
DECLARE #tbl_events TABLE (PersonId int, [date] date)
INSERT #tbl_events VALUES
(1,'20150330'),(1,'20150422'),(1,'20150630'),(2,'20160718'),(2,'20161209'),(2,'20170428'),
(3,'20141001'),(3,'20161128'),(3,'20161128'),(3,'20170116'),(4,'20170413'),(4,'20170509');
-- Solution
WITH groupings AS
(
SELECT
PersonId,
FirstDate = MIN([date]) OVER (PARTITION BY personId ORDER BY [date]),
NextDate = LAG([date],1,[date]) OVER (PARTITION BY personId ORDER BY [date]),
[date],
grouper =
DATEDIFF(DAY, MIN([date]) OVER (PARTITION BY personId ORDER BY [date]), [date]) / 365
FROM #tbl_events
),
Prep AS
(
SELECT
PersonId,
firstDate = IIF(grouper = 0, FirstDate, IIF(FirstDate = NextDate, [date],NextDate))
FROM groupings
)
SELECT
PersonId,
FirstDate,
[Sequence] = ROW_NUMBER() OVER (PARTITION BY personId ORDER BY FirstDate),
[Events] = COUNT(*)
FROM prep
GROUP BY personId, FirstDate;
Results
PersonId FirstDate Sequence Events
----------- ---------- -------------------- -----------
1 2015-03-30 1 3
2 2016-07-18 1 3
3 2014-10-01 1 1
3 2016-11-28 2 3
4 2017-04-13 1 2
First, note that not all years have 365 days; nonetheless, I'm using 365 to emulate your Excel logic, and this would need to be updated to account for leap years. Next, like your Excel formula, this will only be correct when there are at most two sequences;
it would not work when, say, a PersonId has a date of Jan 1 2015, then Jan 10 2016, then Feb 1 2017. Let us know if you need logic to accommodate the aforementioned scenarios.
Lastly, this solution uses LAG, which requires SQL Server 2012+; if you're working with an earlier version of SQL Server the query will have to be updated accordingly.
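Not part of the original answer, but if more than two sequences per person ever need to be handled, a common gaps-and-islands sketch (reusing the same @tbl_events sample data, and still approximating a year as 365 days) starts a new sequence whenever the gap to the previous event is a year or more:
WITH flagged AS
(
    SELECT
        PersonId,
        [date],
        -- 1 marks the start of a new sequence: the first event, or a gap of 365+ days
        IIF(LAG([date]) OVER (PARTITION BY PersonId ORDER BY [date]) IS NULL
            OR DATEDIFF(DAY,
                        LAG([date]) OVER (PARTITION BY PersonId ORDER BY [date]),
                        [date]) >= 365,
            1, 0) AS is_start
    FROM @tbl_events
),
numbered AS
(
    SELECT
        PersonId,
        [date],
        -- running total of the start flags turns them into a sequence number
        SUM(is_start) OVER (PARTITION BY PersonId ORDER BY [date]) AS Seq
    FROM flagged
)
SELECT
    PersonId,
    FirstDate  = MIN([date]),
    [Sequence] = Seq,
    [Events]   = COUNT(*)
FROM numbered
GROUP BY PersonId, Seq
ORDER BY PersonId, Seq;
Because the sequence number comes from a running sum of start flags, any number of sequences per person is handled, and on the sample data above this returns the same five rows as the expected output.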