t-sql logic to find the receipt amount - tsql

I have logic to calculate the sum of receipt amount and receipt date. I am not sure whether the below t-sql works fine for my logic.
I need the max of receipt date and will go for any receipt date ie., 01-01-2014 this should give me the advance amount paid for the person id. Also if the max receipt date value is negative then I need to remove that from the calculation.
WITH rec_cte (amt,recdate) AS
(
select -40, 01-04-2014 UNION ALL
select -40,01-04-2014 UNION ALL
select 40,20-04-2013 UNION ALL
select -20,10-04-2012 UNION ALL
select 50,20-04-2011 UNION ALL
select 40,20-04-2010 UNION ALL
)
SELECT SUM(amt),recdate from rec_cte

As far as I am able to interpret your question, you are looking for SUM(amt) based on ID.
WITH rec_cte AS
(
select 1 as id,-40 as amt, '01-04-2014' as recdate UNION ALL
select 1, -40,'01-04-2014' UNION ALL
select 2, 40,'20-04-2013' UNION ALL
select 3, -20,'10-04-2012' UNION ALL
select 2, 50,'20-04-2011' UNION ALL
select 1, 40,'20-04-2010'
)
SELECT id, SUM(amt) from rec_cte
--where recdate= '01-04-2014'
group by id
having SUM(amt)>=0

Related

Find the next oldest row in Redshift

I have a table called user_activity in Redshift that has department, user_id, activity_type, activity_id, activity_date.
I'd like to query a daily report of how many days since the last event (of any type). Using CROSS APPLY (SQL Server) or LATERAL JOIN (Postgres 9+), I'd do something like...
SELECT d.date, a.last_activity_date
FROM date_table d
CROSS JOIN (
SELECT DISTINCT user_id FROM activity_table
) u
CROSS APPLY (
SELECT TOP 1 activity_date as last_activity_date
FROM activity_table
WHERE user_id = u.user_id AND activity_date <= d.date
ORDER BY activity_date DESC
) a
For now, I write it similar to the below, but it is a bit slow and I am afraid it'll only get slower.
with user_activity as (
select distinct activity_date, user_id from activity_table
)
select
d.date, u.user_id,
max(u.activity_date) as last_activity_date
from date_table d
inner join user_activity u on u.activity_date <= d.date
where d.date between '2020-01-01' and current_date
group by 1, 2
Can someone suggest a good alternative for my needs or for CROSS APPLY / LATERAL JOIN.
As you are seeing cross joining and inequality joining will slow down as you data grows and are generally not the approach you want in Redshift. This is due to the data size increase that comes with this type of action when applied to large data tables that are typical in Redshift.
You want to use window functions to perform this type of analysis. But you will need to step back and rethink how you will structure the SQL. A MAX(activity_date) window function, partitioned by user_id and ordered by date and with a frame clause of all preceding rows, will find the most recent activity to any activity.
Now this will produce only rows for user_ids and dates that exist in the data table and it looks like you want 1 row for each date for each user_id, right? To do this you need to UNION in a frame of data that has 1 row for each date for each user_id ahead of the window function. You will need NULLs in for the other columns so that the data widths match. You will also want the dates in a separate column from activity_date. Now all dates for all user ids will be in the source and the window function will give you the result you want.
You also ask ‘how is this better than the joins?’ Well in the joins you are replicating all the data records by the number of dates which can get really big. In this approach you just have the original data records plus one row per user_id per date (which is the size of your output) and as the number of records per user_id grows this approach doesn’t.
——— Request to modify asker’s code per comments made to their approach ———
Your code is definitely on the right track as you have removed the massive inequality join of your original. I made 2 comments about it. The first is that I believe you need GROUP BY user_id, date to prevent multiple rows per user_id per date that would result if there are records for the same user_id on a single date with differing activity_types. This is a simple oversight.
The second is to state that I intended for you to use UNION ALL, not LEFT JOIN, in combining the actual data and the user_id/date framework. Your approach works fine but I have found that unioning with very large amounts of data is generally faster than joining but you do need to make sure the columns match up. Either way we end up with a data segment with 3 columns - 2 date columns, one with NULLs for framework rows, and 1 user_id. Your approach is fine and the difference in performance is likely very small unless you have huge tables.
Since you asked for a rewrite, here it is with both changes. (NOTE: my laptop is in the shop so I don’t have ready access to Redshift at the moment and this SQL is untested. If the intent is not clear from this and you need me to debug it will be delayed by a few days. I’m keeping your setup methods and SQL structure.)
with date_table as (
select '2000-01-01'::date as date
union all
select '2000-01-02'::date
union all
select '2000-01-03'::date
union all
select '2000-01-04'::date
union all
select '2000-01-05'::date
union all
select '2000-01-06'::date
),
users as (
select 1 as user_id
union all
select 2
union all
select 3
),
user_activity as (
select 1 as user_id, '2000-01-01'::date as activity_date
union all
select 1 as user_id, '2000-01-04'::date as activity_date
union all
select 3 as user_id, '2000-01-03'::date as activity_date
union all
select 1 as user_id, '2000-01-05'::date as activity_date
union all
select 1 as user_id, '2000-01-06'::date as activity_date
),
user_dates as (
select d.date, u.user_id
from date_table d
cross join users u
),
user_date_activity as (
select cal_date, user_id,
lag(max(activity_date), 1) ignore nulls over (partition by user_id order by date) as last_activity_date
from (
Select user_id, date as cal_date, NULL as activity_date from user_dates
Union all
Select user_id, activity_date as cal_date, activity_date from user_activity
)
Group by user_id, cal_date
)
select * from user_date_activity
order by user_id, cal_date```
This was my query based on Bill's answer.
with date_table as (
select '2000-01-01'::date as date
union all
select '2000-01-02'::date
union all
select '2000-01-03'::date
union all
select '2000-01-04'::date
union all
select '2000-01-05'::date
union all
select '2000-01-06'::date
),
users as (
select 1 as user_id
union all
select 2
union all
select 3
),
user_activity as (
select 1 as user_id, '2000-01-01'::date as activity_date
union all
select 1 as user_id, '2000-01-04'::date as activity_date
union all
select 3 as user_id, '2000-01-03'::date as activity_date
union all
select 1 as user_id, '2000-01-05'::date as activity_date
union all
select 1 as user_id, '2000-01-06'::date as activity_date
),
user_dates as (
select d.date, u.user_id
from date_table d
cross join users u
),
user_date_activity as (
select ud.date, ud.user_id,
lag(ua.activity_date, 1) ignore nulls over (partition by ud.user_id order by ud.date) as last_activity_date
from user_dates ud
left join user_activity ua on ud.date = ua.activity_date and ud.user_id = ua.user_id
)
select * from user_date_activity
order by user_id, date

Select Date and Count, Group By Date -- How to show Dates with NULL Counts?

SELECT
CAST(c.DT AS DATE) AS 'Date'
, COUNT(p.PatternID) AS 'Count'
FROM CalendarMain c
LEFT OUTER JOIN Pattern p
ON c.DT = p.PatternDate
INNER JOIN Result r
ON p.PatternID = r.PatternID
INNER JOIN Detail d
ON p.PatternID = d.PatternID
WHERE r.Type = 7
AND d.Panel = 501
AND CAST(c.DT AS DATE)
BETWEEN '20190101' AND '20190201'
GROUP BY CAST(c.DT AS DATE)
ORDER BY CAST(c.DT AS DATE)
The query above isn't working for me. It still skips days where the COUNT is NULL for it's c.DT.
c.DT and p.PatternDate are both time DateTime, although c.DT can't be NULL. It is actually the PK for the table. It is populated as DateTimes for every single day from 2015 to 2049, so the records for those days exist.
Another weird thing I noticed is that nothing returns at all when I join C.DT = p.PatternDate without a CAST or CONVERT to a Date style. Not sure why when they are both DateTimes.
There are a few things to talk about here. At this stage it's not clear what you're actually trying to count. If it's the number of "patterns" per day for the month of Jan 2019, then:
Your BETWEEN will also count any activity occurring on 1 Feb,
It looks like one pattern could have multiple results, potentially causing a miscount
It looks like one pattern could have multiple details, potentially causing a miscount
If one pattern has say 3 eligible results, and also 4 details, you'll get the cross product of them. Your count will be 12.
I'm going to assume:
you only want the distinct number of patterns, regardless of the number of details and results.
You only want January's activity
--Set up some dummy data
DROP TABLE IF EXISTS #CalendarMain
SELECT cast('20190101' as datetime) as DT
INTO #CalendarMain
UNION ALL SELECT '20190102' as DT
UNION ALL SELECT '20190103' as DT
UNION ALL SELECT '20190104' as DT
UNION ALL SELECT '20190105' as DT --etc etc
;
DROP TABLE IF EXISTS #Pattern
SELECT cast('1'as int) as PatternID
,cast('20190101 13:00' as datetime) as PatternDate
INTO #Pattern
UNION ALL SELECT 2,'20190101 14:00'
UNION ALL SELECT 3,'20190101 15:00'
UNION ALL SELECT 4,'20190104 11:00'
UNION ALL SELECT 5,'20190104 14:00'
;
DROP TABLE IF EXISTS #Result
SELECT cast(100 as int) as ResultID
,cast(1 as int) as PatternID
,cast(7 as int) as [Type]
INTO #Result
UNION ALL SELECT 101,1,7
UNION ALL SELECT 102,1,8
UNION ALL SELECT 103,1,9
UNION ALL SELECT 104,2,8
UNION ALL SELECT 105,2,7
UNION ALL SELECT 106,3,7
UNION ALL SELECT 107,3,8
UNION ALL SELECT 108,4,7
UNION ALL SELECT 109,5,7
UNION ALL SELECT 110,5,8
;
DROP TABLE IF EXISTS #Detail
SELECT cast(201 as int) as DetailID
,cast(1 as int) as PatternID
,cast(501 as int) as Panel
INTO #Detail
UNION ALL SELECT 202,1,502
UNION ALL SELECT 203,1,503
UNION ALL SELECT 204,1,502
UNION ALL SELECT 205,1,502
UNION ALL SELECT 206,1,502
UNION ALL SELECT 207,2,502
UNION ALL SELECT 208,2,503
UNION ALL SELECT 209,2,502
UNION ALL SELECT 210,4,502
UNION ALL SELECT 211,4,501
;
-- create some variables
DECLARE #start_date as date = '20190101';
DECLARE #end_date as date = '20190201'; --I assume this is an exclusive end date
SELECT cal.DT
,isnull(patterns.[count],0) as [Count]
FROM #CalendarMain cal
LEFT JOIN ( SELECT cast(p.PatternDate as date) as PatternDate
,COUNT(DISTINCT p.PatternID) as [Count]
FROM #Pattern p
JOIN #Result r ON p.PatternID = r.PatternID
JOIN #Detail d ON p.PatternID = d.PatternID
WHERE r.[Type] = 7
and d.Panel = 501
GROUP BY cast(p.PatternDate as date)
) patterns ON cal.DT = patterns.patternDate
WHERE cal.DT >= #start_date
and cal.DT < #end_date --Your code would have included 1 Feb, which I assume was unintentional.
ORDER BY cal.DT
;

T-SQL End of Month sum

I have a table with some transaction fields, primary id is a CUSTomer field and a TXN_DATE and for two of them, NOM_AMOUNT and GRS_AMOUNT I need an EndOfMonth SUM (no rolling, just EOM, can be 0 if no transaction in the month) for these two amount fields. How can I do it? I need also a 0 reported for months with no transactions..
Thank you!
If you group by the expresion month(txn_date) you can calculate the sum. If you use a temporary table with a join on month you can determine which months have no records and thus report a 0 (or null if you don't use the coalesce fiunction).
This will be your end result, I assume you are able to add the other column you need to sum and adapt for your schema.
select mnt as month
, sum(coalesce(NOM_AMOUNT ,0)) as NOM_AMOUNT_EOM
, sum(coalesce(GRS_AMOUNT ,0)) as GRS_AMOUNT_EOM
from (
select 1 as mnt
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9
union all select 10
union all select 11
union all select 12) as m
left outer join Table1 as t
on m.mnt = month(txn_date)
group by mnt
Here is the initial working sqlfiddle

Selecting distinct substring values

I have a field that is similar to a MAC address in that the first part is a group ID and the second part is a serial number. My field is alphanumeric and 5 digits in length, and the first 3 are the group ID.
I need a query that gives me all distinct group IDs and the first serial number lexicographically. Here is sample data:
ID
-----
X4MCC
X4MEE
X4MFF
V21DD
8Z6BB
8Z6FF
Desired Output:
ID
-----
X4MCC
V21DD
8Z6BB
I know I can do SELECT DISTINCT SUBSTRING(ID, 1, 3) but I don't know how to get the first one lexicographically.
Another way which seems to have the same cost as the query by gbn:
SELECT MIN(id)
FROM your_table
GROUP BY SUBSTRING(id, 1, 3);
SELECT
ID
FROM
(
SELECT
ID,
ROW_NUMBER() OVER (PARTITION BY SUBSTRING(ID, 1, 3) ORDER BY ID) AS rn
FROM MyTable
) oops
WHERE
rn = 1

t-sql return multiple rows depending on field value

i am trying to run an export on a system that only allows t-sql. i know enough of php to make a foreach loop, but i don't know enough of t-sql to generate multiple rows for a given quantity.
i need a result to make a list of items with "1 of 4" like data included in the result
given a table like
orderid, product, quantity
1000,ball,3
1001,bike,4
1002,hat,2
how do i get a select query result like:
orderid, item_num, total_items,
product
1000,1,3,ball
1000,2,3,ball
1000,3,3,ball
1001,1,4,bike
1001,2,4,bike
1001,3,4,bike
1001,4,4,bike
1002,1,2,hat
1002,2,2,hat
You can do this with the aid of an auxiliary numbers table.
;WITH T(orderid, product, quantity) AS
(
select 1000,'ball',3 union all
select 1001,'bike',4 union all
select 1002,'hat',2
)
SELECT orderid, number as item_num, quantity as total_items, product
FROM T
JOIN master..spt_values on number> 0 and number <= quantity
where type='P'
NB: The code above uses the master..spt_values table - this is just for demo purposes I suggest you create your own tally table using one of the techniques here.
If you are on SQL Server 2005 or later version, then you can try a recursive CTE instead of a tally table.
;WITH CTE AS
(
SELECT orderid, 1 item_num, product, quantity
FROM YourTable
UNION ALL
SELECT orderid, item_num+1, product, quantity
FROM CTE
WHERE item_num < quantity
)
SELECT *
FROM CTE
OPTION (MAXRECURSION 0)
I'm not on a computer with a database engine where I can test this, so let me know how it goes.
Well, IF you know the maximum value for the # of products for any product (and it's not too big, say 4), you can:
Create a helper table called Nums containing 1 integer column n, with rows containing 1,2,3,4
Run
SELECT * from Your_table, Nums
WHERE Nums.n <= Your_table.quantity