T-SQL: math across multiple dates in a table

I have a temp table, #variabletable, defined simply as EOMDate (datetime), DandA (float), Coupon (float), EarnedIncome (float), containing these rows:
04/30/2008, 20187.5, 17812.5, NULL
05/31/2008, 24640.63, 22265.63, NULL
06/30/2008, 2375, 26718.75, NULL
What I am trying to do is, after the table is populated, go back and calculate the EarnedIncome field to populate it.
The formula is: DandA for the current month, minus DandA for the previous month, plus the current month's Coupon.
Where I am having trouble is how to do the update. For 6/30 the value should be 4453.12, i.e. (2375 - 24640.63) + 26718.75.
I'll gladly take a clubbing over the head to get this resolved. Thanks. Also, I'm running under MS SQL 2005, so any CTE/ROW_NUMBER-type solution can be used if needed.
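As a quick sanity check of the expected value, here is the formula's arithmetic outside SQL (plain Python, using the sample numbers above):

```python
# EarnedIncome = current month's DandA - previous month's DandA + current Coupon
danda_june = 2375.0
danda_may = 24640.63
coupon_june = 26718.75

earned_income_june = danda_june - danda_may + coupon_june
print(round(earned_income_june, 2))  # 4453.12
```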

You would need to use a correlated subquery, something like this (note that T-SQL requires the alias form UPDATE alias ... FROM when you update through an alias; the original also had its DATEADD on the wrong side, comparing the wrong months):
UPDATE v1
SET EarnedIncome = v1.DandA
    - (SELECT v2.DandA
       FROM #variabletable v2
       WHERE dbo.GetMonthOnly(v2.EOMDate) = dbo.GetMonthOnly(DATEADD(mm, -1, v1.EOMDate)))
    + v1.Coupon
FROM #variabletable v1
And I was making use of this helper function, which truncates a date to the first of its month:
IF OBJECT_ID('dbo.GetMonthOnly') IS NOT NULL
    DROP FUNCTION dbo.GetMonthOnly
GO
CREATE FUNCTION dbo.GetMonthOnly
(
    @InputDate DATETIME
)
RETURNS DATETIME
AS
BEGIN
    RETURN CAST(CAST(YEAR(@InputDate) AS VARCHAR(4)) + '/' +
                CAST(MONTH(@InputDate) AS VARCHAR(2)) + '/01' AS DATETIME)
END
GO

There are definitely quite a few ways to do this. You'll find pros and cons depending on how large your data set is, and other factors.
Here's my recommendation...
Declare @table table
(
    EOMDate DateTime,
    DandA float,
    Coupon float,
    EarnedIncome float
)
Insert into @table Values('04/30/2008', 20187.5, 17812.5, NULL)
Insert into @table Values('05/31/2008', 24640.63, 22265.63, NULL)
Insert into @table Values('06/30/2008', 2375, 26718.75, NULL)
--If we know that EOMDate will only contain one entry per month, and there's *always* one entry a month...
Update T1 Set
    EarnedIncome = DandA -
        (Select top 1 DandA
         from @table t2
         where t2.EOMDate < T1.EOMDate
         order by t2.EOMDate Desc) + Coupon
From @table T1

Select * from @table
--If there's a chance that there could be more than one entry per month, or we only want the value from the previous calendar month (and do nothing if it doesn't exist)
Update T1 Set
    EarnedIncome = DandA - (
        Select top 1 DandA
        From @table T2
        Where DateDiff(month, T1.EOMDate, T2.EOMDate) = -1
        Order by T2.EOMDate Desc) + Coupon
From @table T1

Select * from @table
--Leave the NULL; it's good for the data (technically you cannot calculate it without a prior month).
I like the second method best because it will only calculate when a record for the preceding month actually exists.
(Add the following to the above script to see the difference.)
--Add one for August
Insert into @table Values('08/30/2008', 2242, 22138.62, NULL)
Update T1 Set
    EarnedIncome = DandA - (
        Select top 1 DandA
        From @table T2
        Where DateDiff(month, T1.EOMDate, T2.EOMDate) = -1
        Order by T2.EOMDate Desc) + Coupon
From @table T1
--August stays NULL because there's no July
Select * from @table
It's all a matter of exactly what you want: use the record directly preceding the current record (regardless of date), or ONLY use the record that is a month before the current record.
Sorry about the formatting... Stack Overflow's answer editor and I do not play nice together.
:D
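To make the difference between the two options concrete, here is the same pair of strategies sketched in plain Python (not T-SQL; the rows are the sample data plus the August row, and the helper names are made up for illustration):

```python
from datetime import date

# Sample rows (EOMDate, DandA, Coupon), including an August row with no July before it.
rows = [
    (date(2008, 4, 30), 20187.5, 17812.5),
    (date(2008, 5, 31), 24640.63, 22265.63),
    (date(2008, 6, 30), 2375.0, 26718.75),
    (date(2008, 8, 30), 2242.0, 22138.62),
]

def prev_row_any(i):
    """Option 1: the row directly preceding, regardless of date."""
    return rows[i - 1] if i > 0 else None

def prev_row_month(i):
    """Option 2: only a row dated exactly one calendar month earlier."""
    y, m = rows[i][0].year, rows[i][0].month
    want = (y, m - 1) if m > 1 else (y - 1, 12)
    matches = [r for r in rows if (r[0].year, r[0].month) == want]
    return matches[-1] if matches else None

results = {}
for picker in (prev_row_any, prev_row_month):
    results[picker.__name__] = [
        None if picker(i) is None else danda - picker(i)[1] + coupon
        for i, (eom, danda, coupon) in enumerate(rows)
    ]
print(results)
```

With option 1, August gets a value (computed from June's row); with option 2, August stays None because July is missing, mirroring the NULL in the second T-SQL update.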

You can use a subquery to perform the calculation; the only problem is what to do with the first month, because there is no previous DandA value. Here I've defaulted it to 0 using IsNull (the self-join alias and a missing parenthesis are fixed below). The query looks like:
Update MyTable
Set EarnedIncome = DandA + Coupon - IsNull((Select Top 1 DandA
                                            From MyTable MyTable2
                                            Where MyTable.EOMDate > MyTable2.EOMDate
                                            Order by MyTable2.EOMDate desc), 0)
This also assumes that you only have one record per month, and that there aren't any gaps between months.

Another alternative is to calculate the running total when you are inserting your data, and have a constraint guarantee that your running total is correct:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
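The idea behind that link, roughly, is to fill the running figure as each row arrives instead of patching the table afterwards. A minimal sketch of that shape, in Python rather than SQL (the `IncomeLedger` name and structure are made up for illustration):

```python
# Keep the last DandA seen and compute EarnedIncome at insert time,
# so the table never needs a retroactive UPDATE pass.
class IncomeLedger:
    def __init__(self):
        self.last_danda = None
        self.rows = []

    def insert(self, eom_date, danda, coupon):
        earned = None if self.last_danda is None else danda - self.last_danda + coupon
        self.rows.append((eom_date, danda, coupon, earned))
        self.last_danda = danda

ledger = IncomeLedger()
ledger.insert("2008-04-30", 20187.5, 17812.5)
ledger.insert("2008-05-31", 24640.63, 22265.63)
ledger.insert("2008-06-30", 2375.0, 26718.75)
```

In the linked article the same invariant is enforced with a constraint inside the database, which is the robust version of this sketch.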

There may be a way to do this in a single statement, but in cases like this I'd be inclined to set up a cursor to walk through each row, compute the new EarnedIncome value for that row, update the row, and then move to the next row.
Ex:
DECLARE @EOMDateVal DATETIME
DECLARE @EarnedIncomeVal FLOAT

DECLARE updCursor CURSOR FOR
    SELECT EOMDate FROM #variabletable
OPEN updCursor
FETCH NEXT FROM updCursor INTO @EOMDateVal
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Compute @EarnedIncomeVal for this row here.
    -- This also gives you a chance to catch data integrity problems
    -- that would cause you to fail the whole batch if you computed
    -- everything in a subquery.
    UPDATE #variabletable SET EarnedIncome = @EarnedIncomeVal
    WHERE EOMDate = @EOMDateVal

    FETCH NEXT FROM updCursor INTO @EOMDateVal
END
CLOSE updCursor
DEALLOCATE updCursor
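For reference, the row-by-row walk the cursor performs can be sketched outside SQL like this (Python; the loop body is the same previous-month formula from the question):

```python
# Emulate the cursor: visit rows in date order, compute EarnedIncome from
# the previous row's DandA, write it back -- one row at a time.
table = [
    {"EOMDate": "2008-04-30", "DandA": 20187.5, "Coupon": 17812.5, "EarnedIncome": None},
    {"EOMDate": "2008-05-31", "DandA": 24640.63, "Coupon": 22265.63, "EarnedIncome": None},
    {"EOMDate": "2008-06-30", "DandA": 2375.0, "Coupon": 26718.75, "EarnedIncome": None},
]

prev_danda = None
for row in sorted(table, key=lambda r: r["EOMDate"]):  # ISO dates sort correctly as strings
    if prev_danda is not None:
        row["EarnedIncome"] = row["DandA"] - prev_danda + row["Coupon"]
    prev_danda = row["DandA"]
```

The trade-off is the usual one: the cursor is easy to reason about and lets you handle bad rows individually, but it is much slower than a set-based UPDATE on large tables.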


How to reference output rows with window functions?

Suppose I have a table with a quantity column.
CREATE TABLE transfers (
user_id integer,
quantity integer,
created timestamp default now()
);
I'd like to iteratively go through a partition using window functions, but access the output rows, not the input table rows.
To access the input table rows I could do something like this:
SELECT LAG(quantity, 1, 0)
OVER (PARTITION BY user_id ORDER BY created)
FROM transfers;
I need to access the previous output row to calculate the next output row. How can I access the lagged row of the output? Something like:
CREATE VIEW balance AS
SELECT LAG(balance.total, 1, 0) + quantity AS total
OVER (PARTITION BY user_id ORDER BY created)
FROM transfers;
Edit
This is a minimal example to support the question of how to access the previous output row within a window partition. I don't actually want a sum.
It seems you are attempting to calculate a running sum. Luckily, that's just what the SUM() window function does:
WITH transfers AS(
SELECT i, random()-0.3 AS quantity FROM generate_series(1,100) as i
)
SELECT i, quantity, sum(quantity) OVER (ORDER BY i) from transfers;
Looking at the question, I guess all you need is a cumulative sum.
To calculate a cumulative sum, use this query:
SELECT *,
SUM( CASE WHEN quantity IS NULL THEN 0 ELSE quantity END)
OVER ( PARTITION BY user_id ORDER BY created
ROWS BETWEEN unbounded preceding AND current row
) As cumulative_sum
FROM transfers
ORDER BY user_id, created
;
But if you want more complex calculations, especially ones containing conditions (decisions) that depend on a result from the previous row, then you need a recursive approach.
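The distinction the answer draws can be sketched in a few lines of Python (not SQL): a running sum never needs the previous *output*, because addition lets the frame do the work; a clamped balance does, so the state must be carried explicitly (as a recursive CTE would):

```python
from itertools import accumulate

quantities = [5, -7, 4, -1]

# SUM(...) OVER (ORDER BY ...) is a plain running sum:
running = list(accumulate(quantities))
print(running)  # [5, -2, 2, 1]

# A decision depending on the previous OUTPUT row -- e.g. a balance that
# cannot go negative -- needs explicit state, not a window function:
balance, clamped = 0, []
for q in quantities:
    balance = max(0, balance + q)
    clamped.append(balance)
print(clamped)  # [5, 0, 4, 3]
```

Note how the two outputs diverge as soon as the clamp fires; that divergence is exactly why the general case needs recursion in SQL.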

PostgreSQL: find the preceding and following timestamps to an arbitrary timestamp

Given an arbitrary timestamp such as 2014-06-01 12:04:55-04, I can find in sometable the timestamps just before and just after, and then calculate the elapsed number of seconds between the two with the following query:
SELECT EXTRACT (EPOCH FROM (
(SELECT time AS t0
FROM sometable
WHERE time < '2014-06-01 12:04:55-04'
ORDER BY time DESC LIMIT 1) -
(SELECT time AS t1
FROM sometable
WHERE time > '2014-06-01 12:04:55-04'
ORDER BY time ASC LIMIT 1)
)) as elapsedNegative;
It works, but I was wondering whether there is a more elegant or astute way to achieve the same result. I am using 9.3. Here is a toy database.
CREATE TABLE sometable (
id serial,
time timestamp
);
INSERT INTO sometable (id, time) VALUES (1, '2014-06-01 11:59:37-04');
INSERT INTO sometable (id, time) VALUES (2, '2014-06-01 12:02:22-04');
INSERT INTO sometable (id, time) VALUES (3, '2014-06-01 12:04:49-04');
INSERT INTO sometable (id, time) VALUES (4, '2014-06-01 12:07:35-04');
INSERT INTO sometable (id, time) VALUES (5, '2014-06-01 12:09:53-04');
Thanks for any tips...
update Thanks to both @Joe Love and @Clément Prévost for interesting alternatives. Learned a lot along the way!
Your original query can hardly be more efficient, provided that the sometable.time column is indexed: your execution plan should show only 2 index scans, which is very efficient (index-only scans if you have PG 9.2 and above).
Here is a more readable way to write it
WITH previous_timestamp AS (
SELECT time AS time
FROM sometable
WHERE time < '2014-06-01 12:04:55-04'
ORDER BY time DESC LIMIT 1
),
next_timestamp AS (
SELECT time AS time
FROM sometable
WHERE time > '2014-06-01 12:04:55-04'
ORDER BY time ASC LIMIT 1
)
SELECT EXTRACT (EPOCH FROM (
(SELECT * FROM next_timestamp)
- (SELECT * FROM previous_timestamp)
))as elapsedNegative;
Using a CTE allows you to give meaning to a subquery by naming it. Explicit naming is a well-known and recognised coding best practice (use explicit names, don't abbreviate, and don't use overly generic names like "data" or "value").
Be warned that CTEs are optimisation "fences" and sometimes get in the way of planner optimisation.
Here is the SQLFiddle.
Edit: Moved the extract from the CTE to the final query so that PostgreSQL can use a index only scan.
This solution will likely perform better if the timestamp column does not have an index. When 9.4 comes out, we will be able to make it a little shorter by using aggregate filters.
This should be a bit faster since it runs 1 full table scan instead of 2; however, it may perform worse if your timestamp column is indexed and you have a large dataset.
Here's the example without the epoch conversion, to make it easier to read:
select
min(
case when start_timestamp > current_timestamp
then
start_timestamp
else 'infinity'::timestamp
end
),
max(
case when t1.start_timestamp < current_timestamp
then
start_timestamp
else '-infinity'::timestamp
end
)
from my_table as t1
And here's the example including the math and epoch extraction:
select
extract (EPOCH FROM (
min(
case when start_timestamp > current_timestamp
then
start_timestamp
else 'infinity'::timestamp
end
)-
max(
case when start_timestamp < current_timestamp
then
start_timestamp
else '-infinity'::timestamp
end
)))
from snap.offering_event
Please let me know if you need further details-- I'd recommend trying my code vs yours and seeing how it performs.
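For completeness, the neighbor lookup both answers perform is, at heart, a binary search over sorted timestamps. A Python sketch of that logic on the toy data (illustrative only, not a replacement for the SQL; it assumes the target falls strictly inside the range, so edge handling is omitted):

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

times = sorted(datetime.fromisoformat(t) for t in [
    "2014-06-01 11:59:37", "2014-06-01 12:02:22", "2014-06-01 12:04:49",
    "2014-06-01 12:07:35", "2014-06-01 12:09:53",
])
target = datetime.fromisoformat("2014-06-01 12:04:55")

before = times[bisect_left(times, target) - 1]   # strictly earlier neighbor
after = times[bisect_right(times, target)]       # strictly later neighbor
elapsed = (after - before).total_seconds()
print(elapsed)  # 166.0
```

This mirrors why the indexed SQL query is so cheap: each subquery is effectively one descent of the index, just like each `bisect` call here.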

TSQL - Control a number sequence

I'm new to T-SQL.
I have a table with a field called ODOMETER of a vehicle. I have to get the quantity of km in a period of time, from the 1st of the month to the end:
SELECT MAX(Odometer) - MIN(Odometer) as TotalKm FROM Table
This works in the ideal test scenario, but the odometer can be reset to 0 at any time.
Can someone help me solve this? Thank you.
I'm working with MS SQL 2012.
EXAMPLE of records:
Date Odometer value
datetime var, 37210
datetime var, 37340
datetime var, 0
datetime var, 220
Try something like this using LAG. There are other ways, but this should be easy.
EDIT: Changed the sample data to include records outside of the desired month range, and simplified the readings for easy hand calculation. Also shows a second option, as suggested by the OP.
DECLARE #tbl TABLE (stamp DATETIME, Reading INT)
INSERT INTO #tbl VALUES
('02/28/2014',0)
,('03/01/2014',10)
,('03/10/2014',20)
,('03/22/2014',0)
,('03/30/2014',10)
,('03/31/2014',20)
,('04/01/2014',30)
--Original solution with the WHERE on the "outer" SELECT.
--This gives a result of 40, as it includes the change of 10 between 2/28 and 3/01.
;WITH cte AS (
SELECT Reading
,LAG(Reading,1,Reading) OVER (ORDER BY stamp ASC) LastReading
,Reading - LAG(Reading,1,Reading) OVER (ORDER BY stamp ASC) ChangeSinceLastReading
,CONVERT(date, stamp) stamp
FROM #tbl
)
SELECT SUM(CASE WHEN Reading = 0 THEN 0 ELSE ChangeSinceLastReading END)
FROM cte
WHERE stamp BETWEEN '03/01/2014' AND '03/31/2014'
--Second option with the WHERE on the "inner" SELECT (within the CTE).
--This gives a result of 30, as the change of 10 between 2/28 and 3/01 is excluded by the filtered LAG.
;WITH cte AS (
SELECT Reading
,LAG(Reading,1,Reading) OVER (ORDER BY stamp ASC) LastReading
,Reading - LAG(Reading,1,Reading) OVER (ORDER BY stamp ASC) ChangeSinceLastReading
,CONVERT(date, stamp) stamp
FROM #tbl
WHERE stamp BETWEEN '03/01/2014' AND '03/31/2014'
)
SELECT SUM(CASE WHEN Reading = 0 THEN 0 ELSE ChangeSinceLastReading END)
FROM cte
I think Karl's solution using LAG is better than mine, but anyway (using his @tbl sample data; note the ORDER BY inside the subquery, without which TOP 1 is nondeterministic):
;WITH [Rows] AS
(
    SELECT o1.stamp, o1.Reading as CurrentValue,
        (SELECT TOP 1 o2.Reading
         FROM @tbl o2
         WHERE o1.stamp < o2.stamp
         ORDER BY o2.stamp ASC) as NextValue
    FROM @tbl o1
)
SELECT SUM(CASE WHEN [NextValue] IS NULL OR [NextValue] < [CurrentValue]
                THEN 0 ELSE [NextValue] - [CurrentValue] END)
FROM [Rows]
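The reset-aware delta sum both answers build can be checked by hand on the question's own sample readings. A Python sketch (treating any drop in the reading as a reset, which is a slight generalization of the "reading = 0" test in the SQL):

```python
# The question's sample odometer readings, with a reset to 0 mid-stream.
readings = [37210, 37340, 0, 220]

total_km = 0
for prev, cur in zip(readings, readings[1:]):
    if cur >= prev:        # normal tick: count the delta
        total_km += cur - prev
    else:                  # odometer was reset: count from zero
        total_km += cur
print(total_km)  # 350
```

That is 130 km before the reset plus 220 km after it, which is what the LAG-based SUM computes once the reset rows contribute zero.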

Difference between dates in different rows

Hi,
My problem is that I need the average time between a chargebegin and a chargeend row (timestampserver), grouped by stationname, connectornumber and day.
The main problem is that I cannot use a MAX or MIN function, because the same stationname/connectornumber combination occurs several times in the table.
So in fact I have to select the first chargebegin and find the next chargeend (the one with the same stationname/connectornumber combination and min(id) > chargebegin.id) to get the difference.
I have tried a lot, but in fact I have no idea how to do this.
The database is PostgreSQL 9.2.
Testdata:
create table datatable (
id int,
connectornumber int,
message varchar,
metercount int,
stationname varchar,
stationuser varchar,
timestampmessage varchar,
timestampserver timestamp,
authsource varchar
);
insert into datatable values (181,1,'chargebegin',4000,'100','FCSC','2012-10-10 16:39:10','2012-10-10 16:39:15.26');
insert into datatable values (182,1,'chargeend',4000,'100','FCSC','2012-10-10 16:39:17','2012-10-10 16:39:28.379');
insert into datatable values (184,1,'chargebegin',4000,'100','FCSC','2012-10-11 11:06:31','2012-10-11 11:06:44.981');
insert into datatable values (185,1,'chargeend',4000,'100','FCSC','2012-10-11 11:16:09','2012-10-11 11:16:10.669');
insert into datatable values (191,1,'chargebegin',4000,'100','MSISDN_100','2012-10-11 13:38:19','2012-10-11 13:38:26.583');
insert into datatable values (192,1,'chargeend',4000,'100','MSISDN_100','2012-10-11 13:38:53','2012-10-11 13:38:55.631');
insert into datatable values (219,1,'chargebegin',4000,'100','MSISDN_','2012-10-12 11:38:03','2012-10-12 11:38:29.029');
insert into datatable values (220,1,'chargeend',4000,'100','MSISDN_','2012-10-12 11:40:14','2012-10-12 11:40:18.635');
This might have some syntax errors, as I can't test it right now, but you should get the idea of how to solve it.
with
chargebegin as (
select
stationname,
connectornumber,
timestampserver,
row_number() over(partition by stationname, connectornumber order by timestampserver) as rn
from
datatable
where
message = 'chargebegin'
),
chargeend as (
select
stationname,
connectornumber,
timestampserver,
row_number() over(partition by stationname, connectornumber order by timestampserver) as rn
from
datatable
where
message = 'chargeend'
)
select
stationname,
connectornumber,
avg(b.timestampserver - a.timestampserver) as avg_diff
from
chargebegin a
join chargeend b using (stationname, connectornumber, rn)
group by
stationname,
connectornumber
This assumes that there is always an end event for each begin event and that these events cannot overlap (meaning that, for a given stationname and connectornumber, there can be only one connection at any time). Therefore you can use row_number() to get matching begin/end events and then do whatever calculation is needed.
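The row_number() pairing trick can be sketched in Python to verify the idea on a couple of the sample rows (illustrative only; it numbers begin and end events independently per station/connector and zips them together, exactly as the two CTEs do):

```python
from collections import defaultdict
from datetime import datetime

# (stationname, connectornumber, message, timestampserver) from the sample data
events = [
    ("100", 1, "chargebegin", "2012-10-10 16:39:15.260"),
    ("100", 1, "chargeend",   "2012-10-10 16:39:28.379"),
    ("100", 1, "chargebegin", "2012-10-11 11:06:44.981"),
    ("100", 1, "chargeend",   "2012-10-11 11:16:10.669"),
]

# Collect begin/end timestamps per (station, connector), in order.
streams = defaultdict(lambda: {"chargebegin": [], "chargeend": []})
for station, conn, msg, ts in events:
    streams[(station, conn)][msg].append(datetime.fromisoformat(ts))

# Pair the nth begin with the nth end (the row_number() join), then average.
for key, s in streams.items():
    diffs = [end - begin for begin, end in zip(s["chargebegin"], s["chargeend"])]
    avg = sum(diffs[1:], diffs[0]) / len(diffs)
    print(key, avg)
```

Here the two sessions last about 13.1 s and 565.7 s, so the average is roughly 289.4 s, matching what the SQL's avg(b.timestampserver - a.timestampserver) would produce for this station/connector.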

T-SQL: how to calculate a palmares and create a view based on one table and a view?

Hi,
I have these objects in a bridge card game:
Table:
Series
SerieID
Title
DateAdded
View:
BoardCounter
BoardID
SerieID
Counter
The counter is the number of times each board has been played.
Now I need to establish a ranking based on the counter divided by the number of days of presence.
I figured out easily how to calculate the days:
select serieid, date, datediff(day, DateAdded, GETDATE()) Days, Title from auboardseries
But not being much of a T-SQL geek (I'm mostly used to LINQ), I need help to create the new view.
Thanks
Here's my stab at this based on a whole bunch of assumptions I've made about your question.
Here's my bootstrapping code to get sample data:
Your series table
DECLARE @series table
(
    SeriesID int,
    Title varchar(10),
    DateAdded datetime
)
INSERT @series
SELECT 1,'series 1','10-10-2011'
UNION
SELECT 2,'series 2','8-01-2011'
UNION
SELECT 3,'series 3','11-30-2011'
Your BoardCounter View (it's a table here but that shouldn't matter and you said you've got that figured out already)
DECLARE @BoardCounter table
(
    BoardID INT,
    SeriesID INT,
    [Counter] INT
)
INSERT @BoardCounter
SELECT 1,1,1000
UNION
SELECT 1,2,800
UNION
SELECT 1,3,600
UNION
SELECT 2,1,2000
UNION
SELECT 2,2,1600
UNION
SELECT 2,3,1200
UNION
SELECT 3,1,500
UNION
SELECT 3,2,400
UNION
SELECT 3,3,300
Okay, the following should be the guts of the view you say you want. It's just a join of the table and view above. Again, this is fine because, once a view is created, in this situation it behaves just like a table. Nothing too fancy. I divided the counter by the days elapsed, as you indicated. You'll notice that I've converted both values to FLOAT first and rounded the result to two decimal places; that's just an assumption on my part that you'll want a fine-grained ranking. You can adjust that as you like.
select S.SeriesID, DateAdded as [Date],
       ROUND(CAST([Counter] AS FLOAT) / CAST(datediff(day, DateAdded, GETDATE()) AS FLOAT), 2) AS Ranking,
       S.Title
from @series S
INNER JOIN @BoardCounter bc
    ON S.SeriesID = bc.SeriesID
So, you should be good to just change a few table/view/column names in this and slap a CREATE VIEW AS on it.
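Just to spell out the ranking arithmetic with concrete numbers (a Python check with a fixed "today", since GETDATE() would make the result drift; the counts and dates are hypothetical):

```python
from datetime import date

# Hypothetical board: played 1000 times, added on 2011-10-10.
date_added = date(2011, 10, 10)
today = date(2012, 1, 1)          # fixed reference date so the example is stable
days = (today - date_added).days  # days of presence
ranking = round(1000 / days, 2)   # counter / days, rounded to 2 decimals
print(days, ranking)  # 83 12.05
```

One thing to watch for in the real view: a board added today gives datediff(...) = 0 and a division by zero, so you may want a NULLIF or a minimum of 1 day.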