Calculating the balance of purchased lots - T-SQL

I have a list of purchases by date, e.g.:
ItemCode, Purchase Date, Purchase Qty
XXX, 01 Jan 2012, 10
XXX, 10 Jan 2012, 5
For the item, I have corresponding sales transactions:
Item, Sales Date, Sales Qty
XXX, 02 Jan 2012, -5
XXX, 09 Jan 2012, -3
XXX, 11 Jan 2012, -3
I am looking for a SQL query (without a cursor) that gives the balance remaining on each purchase quantity, i.e. the sales are applied to the purchases first in, first out, running each purchase down to 0 (for the purpose of aging inventory).
How can you join the purchases to the sales to get the balance remaining in each purchased inventory lot? Is this possible without a cursor?

Yes.
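Union the two tables together and compute a running total over the combined set. Use UNION ALL rather than UNION, otherwise two identical transactions would be collapsed into one row.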
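;with cte as
(
select itemcode, purchasedate as tdate, purchaseqty as qty from purchases
union all -- UNION would silently merge identical transactions
select itemcode, salesdate, salesqty from sales
)
select
t1.itemcode, t1.tdate, t1.qty,
SUM(t2.qty) as runningbalance -- all transactions up to and including this date
from cte t1
left join cte t2
on t1.tdate >= t2.tdate
and t1.itemcode = t2.itemcode
group by t1.itemcode, t1.tdate, t1.qty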
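On SQL Server 2012 or later, the self-join can also be written as a windowed running total, SUM(qty) OVER (PARTITION BY itemcode ORDER BY tdate ...). The sketch below runs that variant against the question's sample data in SQLite from Python so that it is self-contained. The table and column names follow the answer, storing dates as ISO strings is an assumption, and window functions need SQLite 3.25 or later. One difference to be aware of: with a ROWS frame, two transactions on the same date get separate running totals, whereas the self-join's `>=` condition counts every same-date row together.

```python
import sqlite3

# Sample data from the question; dates stored as ISO-8601 strings so they sort correctly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (itemcode TEXT, purchasedate TEXT, purchaseqty INTEGER);
CREATE TABLE sales (itemcode TEXT, salesdate TEXT, salesqty INTEGER);
INSERT INTO purchases VALUES ('XXX', '2012-01-01', 10), ('XXX', '2012-01-10', 5);
INSERT INTO sales VALUES ('XXX', '2012-01-02', -5), ('XXX', '2012-01-09', -3),
                         ('XXX', '2012-01-11', -3);
""")

QUERY = """
WITH cte AS (
    SELECT itemcode, purchasedate AS tdate, purchaseqty AS qty FROM purchases
    UNION ALL
    SELECT itemcode, salesdate, salesqty FROM sales
)
SELECT itemcode, tdate, qty,
       -- running total of all transactions up to and including this one
       SUM(qty) OVER (PARTITION BY itemcode ORDER BY tdate
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS balance
FROM cte
ORDER BY itemcode, tdate
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each balance is the stock on hand after that transaction; the final value, 4, is the total left once all five transactions are applied.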
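To get the stock remaining in each lot, the same principle applies: compare the running total of purchases up to and including each lot with the total quantity sold for the item.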
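select p1.itemcode, p1.purchasedate, p1.purchaseqty,
-- remaining in this lot = (purchases up to and including this lot - total sold), clamped to [0, lot qty]
case
when SUM(p2.purchaseqty) - s.soldqty <= 0 then 0
when SUM(p2.purchaseqty) - s.soldqty >= p1.purchaseqty then p1.purchaseqty
else SUM(p2.purchaseqty) - s.soldqty
end as stockremaining
from purchases p1
join purchases p2 on p1.itemcode = p2.itemcode
and p2.purchasedate <= p1.purchasedate
cross apply (select ISNULL(SUM(ABS(salesqty)), 0) as soldqty
from sales where sales.itemcode = p1.itemcode) s
group by p1.itemcode, p1.purchasedate, p1.purchaseqty, s.soldqty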
gives (item, purchase date, purchase quantity, stock remaining):
1 2012-01-01 10 0
1 2012-01-10 5 4
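The set-based formula can be checked against a procedural, cursor-style FIFO walk. The Python sketch below uses two hypothetical helpers, `fifo_loop` (the cursor-like version) and `fifo_set_based` (the clamp used in the query), and compares them on the sample data. Like the query, it uses only the total quantity sold per item and ignores sale dates, so it assumes no sale consumes stock before that stock was purchased.

```python
from datetime import date
from itertools import accumulate

# Sample data from the question: (date, quantity); sales are negative.
purchases = [(date(2012, 1, 1), 10), (date(2012, 1, 10), 5)]
sales = [(date(2012, 1, 2), -5), (date(2012, 1, 9), -3), (date(2012, 1, 11), -3)]


def fifo_loop(purchases, sales):
    """Cursor-style walk: consume the oldest lots first, one lot at a time."""
    remaining = [qty for _, qty in sorted(purchases)]
    to_consume = sum(-qty for _, qty in sales)  # total units sold
    for i, qty in enumerate(remaining):
        take = min(qty, to_consume)
        remaining[i] = qty - take
        to_consume -= take
    return remaining


def fifo_set_based(purchases, sales):
    """Each lot computed independently: clamp(cumulative - sold, 0, lot qty)."""
    lots = [qty for _, qty in sorted(purchases)]
    sold = sum(-qty for _, qty in sales)
    return [max(0, min(qty, cum - sold)) for qty, cum in zip(lots, accumulate(lots))]


print(fifo_loop(purchases, sales))       # [0, 4]
print(fifo_set_based(purchases, sales))  # [0, 4]
```

The upper clamp matters when a later lot has not been touched yet: with only 3 units sold, both functions return [7, 5]; without the clamp the second lot would show 12.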

Related

Cohort Analysis with Redshift by Month

I am trying to build a cohort analysis for monthly retention, but I am having trouble getting the Month Number column right. The month number is supposed to be the month(s) in which the user transacted, i.e. 0 for the registration month, 1 for the first month after registration, 2 for the second month, and so on until the last month. Currently, however, it returns negative month numbers in some cells.
It should look like this table:
cohort_month total_users month_number percentage
------------ ----------- ------------ ----------
January 100 0 40
January 341 1 90
January 115 2 90
February 103 0 73
February 100 1 40
March 90 0 90
Here is the SQL:
with cohort_items as (
select
extract(month from insert_date) as cohort_month,
msisdn as user_id
from mfscore.t_um_user_detail where extract(year from insert_date)=2020
order by 1, 2
),
user_activities as (
select
A.sender_msisdn,
extract(month from A.insert_date)-C.cohort_month as month_number
from mfscore.t_wm_transaction_logs A
left join cohort_items C ON A.sender_msisdn = C.user_id
where extract(year from A.insert_date)=2020
group by 1, 2
),
cohort_size as (
select cohort_month, count(1) as num_users
from cohort_items
group by 1
order by 1
),
B as (
select
C.cohort_month,
A.month_number,
count(1) as num_users
from user_activities A
left join cohort_items C ON A.sender_msisdn = C.user_id
group by 1, 2
)
select
B.cohort_month,
S.num_users as total_users,
B.month_number,
B.num_users * 100 / S.num_users as percentage
from B
left join cohort_size S ON B.cohort_month = S.cohort_month
where B.cohort_month IS NOT NULL
order by 1, 3
I think a ranking window function is the right solution. The idea is to assign a rank to the months of activity for each user, ordered by year and month. Use DENSE_RANK rather than RANK here: RANK leaves gaps after ties, so a user with three events in their first month would get rank 4, not 2, for their next active month.
Something like:
WITH activity_per_user AS (
SELECT
user_id,
event_date,
DENSE_RANK() OVER (PARTITION BY user_id ORDER BY DATE_PART('year', event_date), DATE_PART('month', event_date) ASC) AS month_number
FROM user_activities_table
)
DENSE_RANK starts at 1, so subtract 1 if the first active month should be month 0.
Then you can group by user_id and month_number to get the number of interactions for each user per month since subscription (adapt this to your use case):
SELECT
user_id,
month_number,
COUNT(1) AS n_interactions
FROM activity_per_user
GROUP BY 1, 2
Here is the documentation:
https://docs.aws.amazon.com/redshift/latest/dg/r_WF_RANK.html
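To see why the choice of ranking function matters, the sketch below runs RANK and DENSE_RANK side by side over hypothetical events for a single user, in SQLite from Python (window functions need SQLite 3.25 or later). SQLite has no DATE_PART, so strftime('%Y-%m', event_date) stands in for ordering by year and then month.

```python
import sqlite3

# Hypothetical activity for one user: three events in January,
# one in February, none in March, one in April.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_activities_table (user_id INTEGER, event_date TEXT)")
conn.executemany(
    "INSERT INTO user_activities_table VALUES (?, ?)",
    [(1, "2020-01-03"), (1, "2020-01-15"), (1, "2020-01-28"),
     (1, "2020-02-10"), (1, "2020-04-02")],
)

# Per-event ranks: RANK jumps after ties, DENSE_RANK does not.
RANKS = """
SELECT event_date,
       RANK()       OVER (PARTITION BY user_id ORDER BY strftime('%Y-%m', event_date)) AS rnk,
       DENSE_RANK() OVER (PARTITION BY user_id ORDER BY strftime('%Y-%m', event_date)) - 1 AS month_number
FROM user_activities_table
ORDER BY event_date
"""
per_event = conn.execute(RANKS).fetchall()

# Interactions per user per month_number, as in the answer's second query.
PER_MONTH = """
WITH activity_per_user AS (
    SELECT user_id,
           DENSE_RANK() OVER (PARTITION BY user_id
                              ORDER BY strftime('%Y-%m', event_date)) - 1 AS month_number
    FROM user_activities_table
)
SELECT user_id, month_number, COUNT(1) AS n_interactions
FROM activity_per_user
GROUP BY user_id, month_number
ORDER BY user_id, month_number
"""
per_month = conn.execute(PER_MONTH).fetchall()

for row in per_event:
    print(row)
print(per_month)
```

April gets month_number 2 because DENSE_RANK counts active months; March had no activity, so it gets no number. If you need calendar months elapsed since registration instead, compute year * 12 + month for both dates and subtract.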

Redshift - Find average sales by month

I have the query below, which finds the count of sales per month:
select to_char(sale_date,'Mon') as mon,
count (*) as "Sales"
from sales
where to_char(sale_date,'yyyy-mm-dd') between '2018-10-01' and '2018-12-01'
group by 1
I am trying to find the average number of sales per month. How could I modify the above query to get this? I am using Redshift.
You can wrap the query in a CTE and take the average of the monthly counts:
WITH cte AS (
select to_char(sale_date,'Mon') as mon, count (*) as "Sales"
from sales
where to_char(sale_date,'yyyy-mm-dd') between '2018-10-01' and '2018-12-01'
group by 1
)
SELECT AVG(Sales::float) AS avg_sales -- Redshift's AVG returns an integer for integer input, so cast first
FROM cte;
Note that ideally you should be grouping by year and month, because a given month can belong to more than one year. If you wanted to keep your current query, but include the average over all months, then you could try:
select
to_char(sale_date,'Mon') as mon,
extract(year from sale_date) as year,
count (*) as "Sales",
avg(count(*)::float) over () as "AvgSales"
from sales
where to_char(sale_date,'yyyy-mm-dd') between '2018-10-01' and '2018-12-01'
group by 1, 2;
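A runnable sketch of the window-function version, using SQLite from Python with hypothetical sales rows (window functions need SQLite 3.25 or later). It groups by year and month as recommended above, computes the monthly counts in a CTE and puts AVG(...) OVER () on top, which is equivalent to avg(count(*)) over (). The filter is a half-open date range, an assumption about the intended period: to_char(sale_date,'yyyy-mm-dd') between '2018-10-01' and '2018-12-01' compares strings, so from December only the 1st is included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?)",
    [("2018-10-02",), ("2018-10-15",), ("2018-10-30",),  # 3 sales in October
     ("2018-11-11",),                                     # 1 sale in November
     ("2018-12-01",), ("2018-12-20",)],                   # 2 sales in December
)

QUERY = """
WITH monthly AS (
    SELECT strftime('%Y', sale_date) AS yr,
           strftime('%m', sale_date) AS mon,
           COUNT(*) AS n_sales
    FROM sales
    -- half-open range: every day from 1 Oct to 31 Dec 2018
    WHERE sale_date >= '2018-10-01' AND sale_date < '2019-01-01'
    GROUP BY yr, mon
)
SELECT yr, mon, n_sales,
       AVG(n_sales) OVER () AS avg_sales  -- same average repeated on every row
FROM monthly
ORDER BY yr, mon
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```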

How to join many to many and keep the same total amount

I have two datasets. In the left dataset, QuoteID and PolicyNumber are the same, but Year, Month and PaidLosses can differ.
The second dataset has a different QuoteID, the same PolicyNumber, a different Year and Month, and can also have multiple ClassCode values.
I need to join the first dataset with the second one and keep the same PaidLosses. The main goal is to keep the same total PaidLosses for each month. I know it's probably not very proper from a business standpoint, but that's what my boss wants to see.
This is what I have tried so far:
select
cte1.PolicyNumber,
AccidentYear,
AccidentMonth,
cte2.ClassCode,
/*
Using ROW_NUMBER() to check if it's the first record in the join and returns
the PaidLosses value if so, otherwise it will display 0. The ORDER BY (SELECT 0)
is there just because I don't need the row number to be based on any explicit
order.
*/
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY cte1.QuoteID, cte1.PolicyNumber ORDER BY (SELECT 0))=1 THEN cte1.PaidLosses
ELSE 0
END as PaidLosses
from cte1 inner join cte2 on cte1.PolicyNumber=cte2.PolicyNumber AND cte1.QuoteID=cte2.QuoteID AND cte1.AccidentYear=cte2.LossYear
AND cte1.AccidentMonth=cte2.LossMonth
But for some reason it doesn't pick up some of the policies.
Ideally, I would like to see PaidLosses on the first row, and then 0 whenever another ClassCode row appears for the same PolicyNumber, QuoteID, Year and Month.
I think you should also partition by the accident year and month:
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY cte1.QuoteID, cte1.PolicyNumber, cte1.AccidentYear, cte1.AccidentMonth ORDER BY (SELECT 0))=1 THEN cte1.PaidLosses
ELSE 0
END as PaidLosses
Result would be:
QuoteId PolicyNumber AccidentYear AccidentMonth ClassCode PaidLosses
191289 PACA1001776-0 2015 4 50228 26657
191289 PACA1001776-0 2015 4 67228 0
191289 PACA1001776-0 2015 9 50228 16718
191289 PACA1001776-0 2015 9 67228 0
191289 PACA1001776-0 2016 1 50228 3445
191289 PACA1001776-0 2016 1 67228 0
Is that what you need?
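The partitioned ROW_NUMBER() approach can be checked in SQLite from Python (window functions need SQLite 3.25 or later). The tables are hypothetical stand-ins named after the question's cte1 and cte2, loaded with the values from the result table above. One deliberate change: the window is ordered by cte2.ClassCode instead of (SELECT 0), so the row that receives PaidLosses is deterministic; with (SELECT 0) the engine may pick either class code.

```python
import sqlite3

# Hypothetical tables named after the question's CTEs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cte1 (QuoteID INTEGER, PolicyNumber TEXT, AccidentYear INTEGER,
                   AccidentMonth INTEGER, PaidLosses INTEGER);
CREATE TABLE cte2 (QuoteID INTEGER, PolicyNumber TEXT, LossYear INTEGER,
                   LossMonth INTEGER, ClassCode INTEGER);
INSERT INTO cte1 VALUES
    (191289, 'PACA1001776-0', 2015, 4, 26657),
    (191289, 'PACA1001776-0', 2015, 9, 16718),
    (191289, 'PACA1001776-0', 2016, 1, 3445);
INSERT INTO cte2 VALUES
    (191289, 'PACA1001776-0', 2015, 4, 50228), (191289, 'PACA1001776-0', 2015, 4, 67228),
    (191289, 'PACA1001776-0', 2015, 9, 50228), (191289, 'PACA1001776-0', 2015, 9, 67228),
    (191289, 'PACA1001776-0', 2016, 1, 50228), (191289, 'PACA1001776-0', 2016, 1, 67228);
""")

QUERY = """
SELECT cte1.AccidentYear, cte1.AccidentMonth, cte2.ClassCode,
       CASE
           WHEN ROW_NUMBER() OVER (
                    PARTITION BY cte1.QuoteID, cte1.PolicyNumber,
                                 cte1.AccidentYear, cte1.AccidentMonth
                    ORDER BY cte2.ClassCode) = 1
           THEN cte1.PaidLosses  -- first row of each policy/month keeps the losses
           ELSE 0
       END AS PaidLosses
FROM cte1
JOIN cte2
  ON cte1.PolicyNumber = cte2.PolicyNumber AND cte1.QuoteID = cte2.QuoteID
 AND cte1.AccidentYear = cte2.LossYear AND cte1.AccidentMonth = cte2.LossMonth
ORDER BY 1, 2, 3
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)

# The join fans out rows, but the total PaidLosses must not change.
total_before = conn.execute("SELECT SUM(PaidLosses) FROM cte1").fetchone()[0]
total_after = sum(row[3] for row in rows)
print(total_before, total_after)
```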

T-SQL - Data Islands and Gaps - How do I summarise transactional data by month?

I'm trying to query some transactional data to establish the CurrentProductionHours value for each Report at the end of each month.
Provided there has been a transaction for each report in each month, that's pretty straightforward: I can use something along the lines of the code below to partition transactions by month and then pick out the rows where TransactionByMonth = 1 (effectively, the last transaction for each report in each month).
SELECT
ReportId,
TransactionId,
CurrentProductionHours,
ROW_NUMBER() OVER (PARTITION BY [ReportId], [CalendarYear], [MonthOfYear]
ORDER BY TransactionTimestamp desc
) AS TransactionByMonth
FROM
tblSource
The problem is that there will not necessarily be a transaction for every report every month. When that's the case, I need to carry forward the last known CurrentProductionHours value into the month with no transaction, as this indicates that there has been no change. The value may need to be carried forward over several consecutive months.
Source Data:
ReportId TransactionTimestamp CurrentProductionHours
1 2014-01-05 13:37:00 14.50
1 2014-01-20 09:15:00 15.00
1 2014-01-21 10:20:00 10.00
2 2014-01-22 09:43:00 22.00
1 2014-02-02 08:50:00 12.00
Target Results:
ReportId Month Year ProductionHours
1 1 2014 10.00
2 1 2014 22.00
1 2 2014 12.00
2 2 2014 22.00
I should also mention that I have a date table available, which can be referenced if required.
** UPDATE 05/03/2014 **
I now have a query that generates results like the example below, but I'm left with islands of data (months in which a transaction existed) with gaps in between. My question is still similar but in some ways more generic: what is the best way to fill the gaps between data islands, given the dataset below as a starting point?
ReportId Month Year ProductionHours
1 1 2014 10.00
1 2 2014 12.00
1 3 2014 NULL
2 1 2014 22.00
2 2 2014 NULL
2 3 2014 NULL
Any advice about how to tackle this would be greatly appreciated!
Try this:
;with a as
(
select dateadd(m, datediff(m, 0, min(TransactionTimestamp))+1,0) minTransactionTimestamp,
max(TransactionTimestamp) maxTransactionTimestamp from tblSource
), b as
(
select minTransactionTimestamp TT, maxTransactionTimestamp
from a
union all
select dateadd(m, 1, TT), maxTransactionTimestamp
from b
where tt < maxTransactionTimestamp
), c as
(
select distinct t.ReportId, b.TT from tblSource t
cross apply b
)
select c.ReportId,
month(dateadd(m, -1, c.TT)) Month,
year(dateadd(m, -1, c.TT)) Year,
x.CurrentProductionHours
from c
cross apply
(select top 1 CurrentProductionHours from tblSource
where TransactionTimestamp < c.TT
and ReportId = c.ReportId
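order by TransactionTimestamp desc) x
option (maxrecursion 0) -- b recurses once per month; the default limit of 100 would fail on longer date ranges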
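A minimal SQLite port of the idea above, run from Python so it can be checked against the target results in the question. SQLite has no DATEADD or CROSS APPLY, so this sketch (an adaptation, not the answerer's exact code) builds the month list with a recursive CTE over first-of-month dates and uses a correlated LIMIT 1 subquery in place of CROSS APPLY ... TOP 1. One behavioural difference: the subquery returns NULL for a report with no earlier transaction, whereas CROSS APPLY drops that row.

```python
import sqlite3

# Source data from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblSource (ReportId INTEGER, TransactionTimestamp TEXT, "
    "CurrentProductionHours REAL)"
)
conn.executemany(
    "INSERT INTO tblSource VALUES (?, ?, ?)",
    [(1, "2014-01-05 13:37:00", 14.50),
     (1, "2014-01-20 09:15:00", 15.00),
     (1, "2014-01-21 10:20:00", 10.00),
     (2, "2014-01-22 09:43:00", 22.00),
     (1, "2014-02-02 08:50:00", 12.00)],
)

QUERY = """
WITH RECURSIVE
bounds AS (  -- first and last month that contain a transaction
    SELECT date(MIN(TransactionTimestamp), 'start of month') AS first_month,
           date(MAX(TransactionTimestamp), 'start of month') AS last_month
    FROM tblSource
),
months(month_start) AS (  -- one row per month between the bounds
    SELECT first_month FROM bounds
    UNION ALL
    SELECT date(month_start, '+1 month') FROM months, bounds
    WHERE month_start < bounds.last_month
),
grid AS (  -- every report paired with every month
    SELECT r.ReportId, m.month_start
    FROM (SELECT DISTINCT ReportId FROM tblSource) AS r
    CROSS JOIN months AS m
)
SELECT g.ReportId,
       CAST(strftime('%m', g.month_start) AS INTEGER) AS Month,
       CAST(strftime('%Y', g.month_start) AS INTEGER) AS Year,
       -- latest value recorded before the next month starts (carried forward)
       (SELECT t.CurrentProductionHours
          FROM tblSource AS t
         WHERE t.ReportId = g.ReportId
           AND t.TransactionTimestamp < date(g.month_start, '+1 month')
         ORDER BY t.TransactionTimestamp DESC
         LIMIT 1) AS ProductionHours
FROM grid AS g
ORDER BY g.month_start, g.ReportId
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Report 2 has no February transaction, so its January value of 22.00 is carried into February, matching the target results.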
A similar approach, but using a Cartesian product in the first step to obtain all combinations of report IDs and months.
The second step adds to each of those rows the maximum timestamp from the source table whose month is less than or equal to the month in that row.
Finally, it joins the source table back by report ID and timestamp to obtain the latest source row for every report ID/month.
;
WITH allcombinations -- Cartesian (reportid X yearmonth)
AS ( SELECT reportid ,
yearmonth
FROM ( SELECT DISTINCT
reportid
FROM tblSource
) a
JOIN ( SELECT DISTINCT
DATEPART(yy, transactionTimestamp)
* 100 + DATEPART(MM,
transactionTimestamp) yearmonth
FROM tblSource
) b ON 1 = 1
),
maxdates --add correlated max timestamp where the month is less than or equal to the month in the current record
AS ( SELECT a.* ,
( SELECT MAX(transactionTimestamp)
FROM tblSource t
WHERE t.reportid = a.reportid
AND DATEPART(yy, t.transactionTimestamp)
* 100 + DATEPART(MM,
t.transactionTimestamp) <= a.yearmonth
) maxtstamp
FROM allcombinations a
)
-- join previous data to the source table by reportid and timestamp
SELECT distinct m.reportid ,
m.yearmonth ,
t.CurrentProductionHours
FROM maxdates m
JOIN tblSource t ON t.transactionTimestamp = m.maxtstamp and t.reportid=m.reportid
ORDER BY m.reportid ,
m.yearmonth
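For the updated form of the question (a month grid that already exists, with NULL gaps between islands), a common window-function alternative is to start a new group at every non-NULL row with a running COUNT(ProductionHours), which ignores NULLs, and then spread each group's single value with MAX(...) OVER the group. The sketch below runs this in SQLite from Python on the dataset from the update; the table name islands is hypothetical. It should translate to SQL Server 2012 or later, where aggregate OVER clauses accept ORDER BY, but treat that as an assumption. Rows before a report's first value stay NULL.

```python
import sqlite3

# The "islands and gaps" dataset from the question's update.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE islands (ReportId INTEGER, Month INTEGER, Year INTEGER, ProductionHours REAL)"
)
conn.executemany(
    "INSERT INTO islands VALUES (?, ?, ?, ?)",
    [(1, 1, 2014, 10.00), (1, 2, 2014, 12.00), (1, 3, 2014, None),
     (2, 1, 2014, 22.00), (2, 2, 2014, None), (2, 3, 2014, None)],
)

QUERY = """
WITH grouped AS (
    SELECT ReportId, Month, Year, ProductionHours,
           -- COUNT(col) skips NULLs, so the running count only increases on rows
           -- that have a value; each NULL row stays in the preceding group
           COUNT(ProductionHours) OVER (PARTITION BY ReportId ORDER BY Year, Month) AS grp
    FROM islands
)
SELECT ReportId, Month, Year,
       MAX(ProductionHours) OVER (PARTITION BY ReportId, grp) AS ProductionHours
FROM grouped
ORDER BY ReportId, Year, Month
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```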

T-SQL: OVER clause

Please help me understand how ORDER BY affects the OVER clause. I have read MSDN and a book, and I still don't understand it.
Let's say we have this query:
SELECT Count(OrderID) over(Partition By Year(OrderDate))
,*
FROM [Northwind].[dbo].[Orders]
ORDER BY OrderDate
The result is that each row has a column containing the number of orders in the table from the same year.
Screenshot of the results: http://img-fotki.yandex.ru/get/3912/svin80.2/0_3b871_3bb591da_XL
But what happens when I try this query?
SELECT ROW_NUMBER() over(Partition By Year(OrderDate)
order by OrderDate) as RowN
,*
FROM [Northwind].[dbo].[Orders]
ORDER BY RowN
Screenshot of the results: http://img-fotki.yandex.ru/get/3908/svin80.2/0_3b872_c9352fb1_XL
Now the only thing I can see is that each RowN value appears for 3 different years (1996, 1997, 1998). I expected RowN to have the same value for all dates in 1996. Please explain what happens and why.
In this case:
SELECT ROW_NUMBER() over(Partition By Year(OrderDate)
order by OrderDate) as RowN,*
FROM [Northwind].[dbo].[Orders]
order by RowN
What you're seeing is a row number partitioned by year, meaning that each year has its own ascending row number. Because the outer ORDER BY sorts only by RowN, row 1 of every year comes first, then row 2 of every year, and so on. To make this a bit clearer in the results:
SELECT ROW_NUMBER() over(Partition By Year(OrderDate)
order by OrderDate) as RowN,*
FROM [Northwind].[dbo].[Orders]
order by RowN, Year(OrderDate)
This means that each year, say 1997, has orders numbered 1 through n in date order within that year: the 1st order of 1997, the 2nd order of 1997, and so on.
The results will make far more sense if you do this:
SELECT
Year(OrderDate),
ROW_NUMBER() over(Partition By Year(OrderDate) order by OrderDate) as RowN,
*
FROM [Northwind].[dbo].[Orders]
ORDER BY Year(OrderDate), RowN
Now you can see that each year has increasing row numbers starting from 1, ordered by order date:
Year RowN OrderID OrderDate
1997 1 10400 1997-01-01 00:00:00
1997 2 10401 1997-01-01 00:00:00
1997 3 10402 1997-01-02 00:00:00
...
1998 1 10808 1998-01-01 00:00:00
1998 2 10809 1998-01-01 00:00:00
1998 3 10810 1998-01-01 00:00:00
...
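The effect of partitioning and of the outer ORDER BY can be reproduced on a small scale. The sketch below uses SQLite from Python with a hypothetical six-row Orders table (not the real Northwind data; window functions need SQLite 3.25 or later) and strftime('%Y', OrderDate) in place of Year(OrderDate). COUNT(...) OVER (PARTITION BY ...) repeats the per-year total on every row, while ROW_NUMBER() restarts at 1 in each year; ordering the output by RowN first is what interleaves the years.

```python
import sqlite3

# Hypothetical orders spread over three years.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, "1996-07-04"), (2, "1996-07-05"),
     (3, "1997-01-01"), (4, "1997-01-02"), (5, "1997-01-03"),
     (6, "1998-01-01")],
)

QUERY = """
SELECT strftime('%Y', OrderDate) AS OrderYear,
       -- numbering restarts at 1 for every year
       ROW_NUMBER() OVER (PARTITION BY strftime('%Y', OrderDate) ORDER BY OrderDate) AS RowN,
       -- no ORDER BY in OVER: the whole year's count on every row
       COUNT(OrderID) OVER (PARTITION BY strftime('%Y', OrderDate)) AS OrdersInYear,
       OrderID
FROM Orders
ORDER BY RowN, OrderYear
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The first three rows all have RowN = 1, one per year, which is the pattern described in the question; sorting by OrderYear, RowN instead groups each year's numbering together.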