select first order for each customer from two tables - tsql

Hi guys, I have two tables: dbo.Sales (customer_id, order_date, product_id) and dbo.Menu (product_id, product_name, price). The question is:
What was the first item from the menu purchased by each customer?
My solution is:
select A.customer_id,m.product_id, m.product_name
from dbo.menu m
cross apply
(select top 1 * from dbo.sales s
where s.product_id=m.product_id
group by s.customer_id,s.order_date, s.product_id
order by s.order_date) A
It returns:
customer_id  product_id  product_name
A            1           sushi
A            2           curry
C            3           ramen
Customer B is missing. Instead of B's first order, I get a second row for A.
I need one row per customer.
Murat

You could use a ROW_NUMBER() window function to get the earliest product_id per customer and then join to the Menu table to get your product details.
Edit: Updated ORDER to ASC.
;with cte
as (
select customer_id, product_id, row_number() over (partition by customer_id order by order_date asc) RN
from dbo.Sales)
select c.customer_id, c.product_id, m.product_name
from cte c
join dbo.menu m on c.product_id=m.product_id
where RN = 1
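
To check the ROW_NUMBER() approach without a SQL Server instance, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module. The sample rows are invented for illustration (only the table and column names come from the question), the dbo. schema prefix is dropped, and it assumes the bundled SQLite is 3.25 or newer so that window functions are available.

```python
import sqlite3

# Sample data is invented for illustration; only the table and column
# names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (customer_id TEXT, order_date TEXT, product_id INTEGER);
    CREATE TABLE menu  (product_id INTEGER, product_name TEXT, price INTEGER);
    INSERT INTO menu VALUES (1, 'sushi', 10), (2, 'curry', 15), (3, 'ramen', 12);
    INSERT INTO sales VALUES
        ('A', '2021-01-01', 1), ('A', '2021-01-07', 2),
        ('B', '2021-01-01', 2), ('B', '2021-01-04', 1),
        ('C', '2021-01-01', 3), ('C', '2021-01-07', 3);
""")

# Number each customer's purchases by date, then keep only the earliest one.
first_orders = conn.execute("""
    WITH cte AS (
        SELECT customer_id, product_id,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY order_date ASC) AS rn
        FROM sales
    )
    SELECT c.customer_id, c.product_id, m.product_name
    FROM cte c
    JOIN menu m ON c.product_id = m.product_id
    WHERE c.rn = 1
    ORDER BY c.customer_id
""").fetchall()

for row in first_orders:
    print(row)
```

With this data every customer, including B, gets exactly one row. If a customer bought two items on the same earliest date, ROW_NUMBER() keeps only one of them, chosen arbitrarily; RANK() would keep both.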

SELECT distinct s.customer_id,
FIRST_VALUE(m.product_name) OVER (partition by s.customer_id order by order_date )
as FirstItem_Customer
FROM [dbo].[sales] S
join [dbo].[menu] M on M.product_id=s.product_id

Related

SQL Debugging Help Needed

I am writing a query in Redshift to answer the question "Give the average lifetime spend of users who spent more on their first order than their second order." This is based off of an order_items table which has one row for every item ordered (so an order with 3 items would be represented in 3 rows). Here's a snapshot of the first 10 rows:
[image: first 10 rows of order_items]
Here is my solution:
with
cte1_lifetime as (
select oi.user_id, sum(oi.sale_price) as lifetime_spend
from order_items as oi
group by oi.user_id
),
cte2_order as (
select oi.user_id, oi.order_id, sum(oi.sale_price) as order_total, rank() over(partition by oi.user_id order by oi.created_at) as order_rank
from order_items as oi
group by oi.user_id, oi.order_id, oi.created_at
order by oi.user_id, oi.order_id
),
cte3_first_order as (
select user_id, order_id, order_total
from cte2_order
where order_rank=1
order by user_id, order_id
),
cte4_second_order as (
select user_id, order_id, order_total
from cte2_order
where order_rank=2
order by user_id, order_id
)
select avg(cte1.lifetime_spend) as average_lifetime_spend
from cte1_lifetime as cte1
where exists (
select *
from cte3_first_order as cte3, cte4_second_order as cte4
where cte3.user_id=cte4.user_id
and cte1.user_id=cte3.user_id
and cte3.order_total > cte4.order_total)
And here is the answer key:
WITH
table1 AS
(SELECT user_id, order_id,
SUM(sale_price) OVER (PARTITION BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as order_total,
RANK() OVER (PARTITION BY user_id ORDER BY created_at) AS "sequence"
FROM order_items)
,
table2 AS
(SELECT user_id, SUM(sale_price) AS lifetime_spend
FROM order_items
WHERE EXISTS
(SELECT t1.user_id
FROM table1 t1, table1 t2
WHERE t1.user_id = t2.user_id AND t1.sequence = 1 AND t2.sequence = 2 AND t1.order_total>t2.order_total
AND t1.user_id = order_items.user_id)
GROUP BY 1
ORDER BY 1)
SELECT AVG(lifetime_spend)
FROM table2
These answers yield slightly different results on the same data: an average lifetime spend of $215 vs. $220. I'd really like to understand why they differ, but so far I can't figure it out. Any ideas?
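
One possible source of the gap, which I can't confirm without the data: the answer key applies RANK() to item rows, while the first solution ranks after grouping to one row per order. RANK() gives tied rows the same rank and then skips ahead, so if a user's first order has two items sharing the same created_at, both rows get rank 1 and the next order gets rank 3. That user never has a "sequence" of 2 in the answer key and drops out of its EXISTS check. The sketch below uses SQLite (3.25 or newer assumed) and three invented rows; the grouped query uses MIN(created_at) as a simplification of cte2_order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (user_id INTEGER, order_id INTEGER,
                              sale_price REAL, created_at TEXT);
    -- Hypothetical user 1: a two-item first order, then a one-item second order.
    INSERT INTO order_items VALUES
        (1, 100, 30.0, '2020-01-01'),
        (1, 100, 20.0, '2020-01-01'),
        (1, 101, 10.0, '2020-02-01');
""")

# Ranking item rows directly, as the answer key's table1 does.
item_level = conn.execute("""
    SELECT order_id,
           RANK() OVER (PARTITION BY user_id ORDER BY created_at) AS seq
    FROM order_items
    ORDER BY order_id, seq
""").fetchall()

# Ranking after collapsing to one row per order, similar to cte2_order.
order_level = conn.execute("""
    SELECT order_id,
           RANK() OVER (PARTITION BY user_id ORDER BY created_at) AS seq
    FROM (SELECT user_id, order_id, MIN(created_at) AS created_at
          FROM order_items
          GROUP BY user_id, order_id)
    ORDER BY order_id
""").fetchall()

print(item_level)   # [(100, 1), (100, 1), (101, 3)]
print(order_level)  # [(100, 1), (101, 2)]
```

In this example the user spent 50 on the first order and 10 on the second, so they should count, but the item-level ranking has no row with rank 2 to compare against. Comparing the set of user_ids each query selects on the real data would show whether this is what is happening.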

PostgreSQL command : using the result obtained from first Query and using it In second Query : write as single query

SELECT partner_id
FROM trip_delivery_sales ts
WHERE ts.route_id='152'
GROUP BY ts.partner_id
This query gives us the partner IDs. For each of those partner IDs, we want to look in the trip_delivery_sales_lines table and find the sum of the product quantities from that customer's last two sales. For example, if the last two sales have product_qty 2 and 5, the result should be partner_id | count, e.g. Mn2333 - 7.
Here, for example, I take partner ID 34806, but I want to check every partner_id returned by the first query.
SELECT product_qty
FROM trip_delivery_sales_lines td
WHERE td.partner_id='34806'
AND td.route_id='152'
AND td.product_id='432'
ORDER BY td.order_date DESC
LIMIT 2
You can run this query
SELECT td.partner_id,sum(product_qty)
FROM trip_delivery_sales_lines td,
(SELECT partner_id FROM trip_delivery_sales ts WHERE ts.route_id='152') as ts
WHERE td.partner_id=ts.partner_id
AND td.product_id='432'
GROUP BY td.partner_id
ORDER BY td.order_date DESC
LIMIT 2
Or this one
with ts as (SELECT distinct partner_id FROM trip_delivery_sales WHERE route_id='152')
SELECT td.partner_id,sum(product_qty)
FROM trip_delivery_sales_lines td,ts
WHERE td.partner_id=ts.partner_id
AND td.product_id='432'
GROUP BY td.partner_id
ORDER BY td.order_date DESC
LIMIT 2
You might be looking for
SELECT DISTINCT ts.partner_id, ARRAY(
SELECT product_qty
FROM trip_delivery_sales_lines td
WHERE td.partner_id=ts.partner_id
AND td.product_id='432'
ORDER BY td.order_date DESC
LIMIT 2
) AS product_qty_arr
FROM trip_delivery_sales ts
WHERE ts.route_id='152'
or just
SELECT
partner_id,
array_agg(product_qty ORDER BY order_date DESC) as product_qty_arr
FROM (
SELECT
td.partner_id,
td.product_qty,
td.order_date,
row_number() OVER (PARTITION BY td.partner_id ORDER BY td.order_date DESC)
FROM trip_delivery_sales_lines td
JOIN trip_delivery_sales ts USING (partner_id)
WHERE ts.route_id='152'
AND td.product_id='432'
) AS enumerated
WHERE row_number <= 2
GROUP BY partner_id
See also "PostgreSQL: top n entries per item in same table" or "Optimize GROUP BY query to retrieve latest row per user".
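
Since the question asks for the sum of the last two quantities rather than an array, here is a minimal sketch of that variant. It swaps array_agg for SUM and filters partners with IN instead of a join, so a partner that appears several times in trip_delivery_sales does not duplicate the line rows. It runs on SQLite (3.25 or newer assumed) via Python's sqlite3 as a stand-in for PostgreSQL; the rows are invented and only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trip_delivery_sales (partner_id TEXT, route_id TEXT);
    CREATE TABLE trip_delivery_sales_lines (
        partner_id TEXT, route_id TEXT, product_id TEXT,
        product_qty INTEGER, order_date TEXT);
    -- Hypothetical rows; partner 34806 appears twice on route 152 on purpose.
    INSERT INTO trip_delivery_sales VALUES
        ('34806', '152'), ('34806', '152'), ('40001', '152'), ('50000', '999');
    INSERT INTO trip_delivery_sales_lines VALUES
        ('34806', '152', '432', 9, '2021-03-01'),
        ('34806', '152', '432', 2, '2021-03-02'),
        ('34806', '152', '432', 5, '2021-03-03'),
        ('40001', '152', '432', 4, '2021-03-01'),
        ('50000', '999', '432', 8, '2021-03-01');
""")

# Number each partner's sales from newest to oldest, keep the newest two,
# then add up their quantities.
last_two_totals = conn.execute("""
    SELECT partner_id, SUM(product_qty) AS qty_sum
    FROM (
        SELECT td.partner_id, td.product_qty,
               ROW_NUMBER() OVER (PARTITION BY td.partner_id
                                  ORDER BY td.order_date DESC) AS rn
        FROM trip_delivery_sales_lines td
        WHERE td.product_id = '432'
          AND td.partner_id IN (SELECT partner_id
                                FROM trip_delivery_sales
                                WHERE route_id = '152')
    ) AS enumerated
    WHERE rn <= 2
    GROUP BY partner_id
    ORDER BY partner_id
""").fetchall()

print(last_two_totals)  # [('34806', 7), ('40001', 4)]
```

Partner 34806 has quantities 9, 2 and 5 in date order, so the last two sales sum to 7, matching the 2 and 5 example in the question. A partner with only one sale simply returns that single quantity.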

Selecting the 1st and 10th Records Only

I have a table with 3 columns: ID, Signature, and Datetime. It is grouped by Signature, having count(*) > 9.
select * from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
order by o.Signature desc, o.DateTime
I now want to select the 1st and 10th records only, per Signature. What determines rank is the Datetime descending. Thus, I would expect every Signature to have 2 rows.
Thanks,
I would go with a couple of common table expressions.
The first selects all records from the table along with a count of records per signature. The second selects from the first where the record count is greater than 9 and adds a row_number partitioned by signature. The final query then selects the rows where the row_number is either 1 or 10:
With cte1 AS
(
SELECT ID, Signature, Datetime, COUNT(*) OVER(PARTITION BY Signature) As NumberOfRows
FROM #Sigs
), cte2 AS
(
SELECT ID, Signature, Datetime, ROW_NUMBER() OVER(PARTITION BY Signature ORDER BY DateTime DESC) As Rn
FROM cte1
WHERE NumberOfRows > 9
)
SELECT ID, Signature, Datetime
FROM cte2
WHERE Rn IN (1, 10)
ORDER BY Signature desc
Because I don't know what your data looks like, this might need some adjustment.
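
As a rough check of the two-CTE approach, the sketch below runs it on SQLite (3.25 or newer assumed) through Python's sqlite3. The temp table #Sigs is replaced by an ordinary table named sigs, and the data is invented: one signature with 12 rows and one with only 3, which the count filter should drop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sigs (ID INTEGER, Signature TEXT, Datetime TEXT)")

# Hypothetical data: 'abc' has 12 rows, 'xyz' has only 3.
rows = [(i, "abc", f"2021-01-{i:02d}") for i in range(1, 13)]
rows += [(100 + i, "xyz", f"2021-02-{i:02d}") for i in range(1, 4)]
conn.executemany("INSERT INTO sigs VALUES (?, ?, ?)", rows)

picked = conn.execute("""
    WITH cte1 AS (
        SELECT ID, Signature, Datetime,
               COUNT(*) OVER (PARTITION BY Signature) AS NumberOfRows
        FROM sigs
    ), cte2 AS (
        SELECT ID, Signature, Datetime,
               ROW_NUMBER() OVER (PARTITION BY Signature
                                  ORDER BY Datetime DESC) AS Rn
        FROM cte1
        WHERE NumberOfRows > 9
    )
    SELECT ID, Signature, Datetime, Rn
    FROM cte2
    WHERE Rn IN (1, 10)
    ORDER BY Signature DESC, Rn
""").fetchall()

print(picked)  # [(12, 'abc', '2021-01-12', 1), (3, 'abc', '2021-01-03', 10)]
```

Only 'abc' survives the count filter, and it returns exactly two rows: the newest one (Rn = 1) and the tenth newest (Rn = 10).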
The simplest way here, since you already know your sort order (DateTime DESC) and partitioning (Signature), is probably to assign row numbers and then select the rows you want.
SELECT *
FROM
(
select o.Signature
,o.DateTime
,ROW_NUMBER() OVER (PARTITION BY o.Signature ORDER BY o.DateTime DESC) [Row]
from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
) x
WHERE [Row] IN (1,10)
ORDER BY Signature desc, DateTime

postgres - get top category purchased by customer

I have a denormalized table with the columns:
buyer_id
order_id
item_id
item_price
item_category
I would like a query that returns one row per buyer_id:
buyer_id, sum(item_price), item_category
but only for the category with the highest total sales for that specific buyer_id.
I can't get row_number() or partition to work because I need to order by the sum of item_price relative to item_category relative to buyer. Am I overlooking anything obvious?
You need a few layers of fudging here:
SELECT buyer_id, item_sum, item_category
FROM (
SELECT buyer_id,
rank() OVER (PARTITION BY buyer_id ORDER BY item_sum DESC) AS rnk,
item_sum, item_category
FROM (
SELECT buyer_id, sum(item_price) AS item_sum, item_category
FROM my_table
GROUP BY 1, 3) AS sub2) AS sub
WHERE rnk = 1;
In sub2 you calculate the sum of 'item_price' for each 'item_category' for each 'buyer_id'. In sub you rank these with a window function by 'buyer_id', ordering by 'item_sum' in descending order (so the highest 'item_sum' comes first). In the main query you select those rows where rnk = 1.
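
The three layers can be tried out with a small sketch like the one below, run on SQLite (3.25 or newer assumed) via Python's sqlite3 as a stand-in for PostgreSQL. The rows are invented; the table name my_table and the column names follow the answer above, with GROUP BY spelled out by column name instead of position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (buyer_id INTEGER, order_id INTEGER, item_id INTEGER,
                           item_price REAL, item_category TEXT);
    -- Hypothetical purchases for two buyers.
    INSERT INTO my_table VALUES
        (1, 10, 100, 5.0,  'books'),
        (1, 10, 101, 7.0,  'books'),
        (1, 11, 102, 10.0, 'toys'),
        (2, 12, 103, 3.0,  'toys'),
        (2, 13, 104, 2.0,  'books');
""")

# Innermost query: total per buyer and category.
# Middle query: rank those totals within each buyer.
# Outer query: keep the top-ranked category.
top_categories = conn.execute("""
    SELECT buyer_id, item_sum, item_category
    FROM (
        SELECT buyer_id, item_category, item_sum,
               RANK() OVER (PARTITION BY buyer_id
                            ORDER BY item_sum DESC) AS rnk
        FROM (
            SELECT buyer_id, item_category, SUM(item_price) AS item_sum
            FROM my_table
            GROUP BY buyer_id, item_category
        ) AS sub2
    ) AS sub
    WHERE rnk = 1
    ORDER BY buyer_id
""").fetchall()

print(top_categories)  # [(1, 12.0, 'books'), (2, 3.0, 'toys')]
```

Because this uses rank(), a buyer whose top two categories have identical totals gets two rows. Switching to row_number() would force a single row per buyer, at the cost of picking one of the tied categories arbitrarily.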

selecting only two employees from every department

Can you let me know how to select only two employees from every department? The table has deptname, ssn, name . I am doing a sampling and I need only two ssns for every department name. Can someone help?
You can accomplish this with the "OLAP expression" row_number():
with e as
( select deptname, ssn, empname,
row_number() over (partition by deptname order by empname) as pick
from employees
)
select deptname, ssn, empname
from e
where pick < 3
order by deptname, ssn
This example gives you the two employees per department whose names sort first alphabetically, because that is the ordering specified in the row_number() (order by) expression.
Try this:
select *
from t t1
where (
select count(*)
from t t2
where
t2.deptname = t1.deptname
and
t2.ssn <= t1.ssn) <= 2
order by deptname, ssn,name;
The above gives the two smallest ssn values per department.
If you want the two largest, change the comparison to t2.ssn >= t1.ssn
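
To see both directions of the comparison side by side, here is a minimal sketch using SQLite through Python's sqlite3. The helper two_per_dept and the six rows are hypothetical; the table name t and the columns deptname, ssn and name follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (deptname TEXT, ssn TEXT, name TEXT);
    -- Hypothetical rows with made-up short ssn values.
    INSERT INTO t VALUES
        ('HR',    '111', 'Ann'),
        ('HR',    '333', 'Bob'),
        ('HR',    '222', 'Cy'),
        ('Sales', '555', 'Di'),
        ('Sales', '444', 'Ed'),
        ('Sales', '666', 'Flo');
""")

def two_per_dept(comparison):
    # comparison is either "<=" (two smallest ssn) or ">=" (two largest ssn).
    if comparison not in ("<=", ">="):
        raise ValueError("comparison must be '<=' or '>='")
    # For each row, count the rows in the same department whose ssn is at or
    # beyond it in the chosen direction; keep the row if that count is <= 2.
    return conn.execute(f"""
        SELECT deptname, ssn, name
        FROM t t1
        WHERE (SELECT COUNT(*)
               FROM t t2
               WHERE t2.deptname = t1.deptname
                 AND t2.ssn {comparison} t1.ssn) <= 2
        ORDER BY deptname, ssn, name
    """).fetchall()

print(two_per_dept("<="))
print(two_per_dept(">="))
```

This form needs no window functions, so it also works on engines that lack them, but the correlated subquery is evaluated per row and can be slow on large tables without an index on (deptname, ssn).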
select *
from
( select rank() over (partition by deptname order by empname) as rnk, e.*
from employees e
) ranked
where rnk <= 2
order by deptname, ssn, empname;