Difference between dates in different rows - postgresql

Hi,
my problem is that I need the average time between a chargebegin and the following chargeend row (timestampserver), grouped by stationname, connectornumber and day.
The main problem is that I cannot simply use a max or min function, because the same stationname/connectornumber combination appears several times in the table.
So in fact I have to select the first chargebegin and find the next chargeend (the one with the same stationname/connectornumber combination and the min(id) > chargebegin.id) to get the difference.
I have tried a lot, but I have no idea how to do this.
The database is PostgreSQL 9.2.
Test data:
create table datatable (
id int,
connectornumber int,
message varchar,
metercount int,
stationname varchar,
stationuser varchar,
timestampmessage varchar,
timestampserver timestamp,
authsource varchar
);
insert into datatable values (181,1,'chargebegin',4000,'100','FCSC','2012-10-10 16:39:10','2012-10-10 16:39:15.26');
insert into datatable values (182,1,'chargeend',4000,'100','FCSC','2012-10-10 16:39:17','2012-10-10 16:39:28.379');
insert into datatable values (184,1,'chargebegin',4000,'100','FCSC','2012-10-11 11:06:31','2012-10-11 11:06:44.981');
insert into datatable values (185,1,'chargeend',4000,'100','FCSC','2012-10-11 11:16:09','2012-10-11 11:16:10.669');
insert into datatable values (191,1,'chargebegin',4000,'100','MSISDN_100','2012-10-11 13:38:19','2012-10-11 13:38:26.583');
insert into datatable values (192,1,'chargeend',4000,'100','MSISDN_100','2012-10-11 13:38:53','2012-10-11 13:38:55.631');
insert into datatable values (219,1,'chargebegin',4000,'100','MSISDN_','2012-10-12 11:38:03','2012-10-12 11:38:29.029');
insert into datatable values (220,1,'chargeend',4000,'100','MSISDN_','2012-10-12 11:40:14','2012-10-12 11:40:18.635');

This might have some syntax errors as I can't test it right now, but you should get an idea of how to solve it.
with chargebegin as (
    select
        stationname,
        connectornumber,
        timestampserver,
        row_number() over (partition by stationname, connectornumber order by timestampserver) as rn
    from datatable
    where message = 'chargebegin'
),
chargeend as (
    select
        stationname,
        connectornumber,
        timestampserver,
        row_number() over (partition by stationname, connectornumber order by timestampserver) as rn
    from datatable
    where message = 'chargeend'
)
select
    stationname,
    connectornumber,
    avg(b.timestampserver - a.timestampserver) as avg_diff
from
    chargebegin a
    join chargeend b using (stationname, connectornumber, rn)
group by
    stationname,
    connectornumber
This assumes that there is always an end event for each begin event and that these events cannot overlap (meaning that for a given stationname and connectornumber there can be only one connection at any time). Therefore you can use row_number() to get matching begin/end events and then do whatever calculation is needed.
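The question also asks for the average per day. Keeping the two CTEs above unchanged, one way to get that (assuming the day of the chargebegin timestamp is the day you want to group on) is to extend the final SELECT:
select
    stationname,
    connectornumber,
    a.timestampserver::date as day,
    avg(b.timestampserver - a.timestampserver) as avg_diff
from
    chargebegin a
    join chargeend b using (stationname, connectornumber, rn)
group by
    stationname,
    connectornumber,
    a.timestampserver::date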

Related

How to update duplicate rows in a table in postgresql

I have created synthetic data for a typical call center.
Below is the screenshot of the table I have created.
Table 1:
Problem statement: Since this is completely random data, I noticed that there are some customers who are being assigned to the same agents whenever they call again.
So using this query I was able to test such a case and count the number of times agents are being repeated for each customer.
select agentid, customerid, count(customerid) from aa_dev.calls group by agentid, customerid having count(customerid) > 1 ;
Table 2
I have a separate agents table called aa_dev.agents in which the agent IDs are stored.
Now I want to replace the agentid for such cases, so that if an agentid is repeated 6 times for a single customer, then 5 of those times the agentid should be updated with some other agentid from the table, but the call times shouldn't overlap. That means the replacement agent should not be busy at the time the call is going on.
I have assigned row numbers to each repeated ones.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY agentid, customerid ORDER BY random()) rn,
COUNT(*) OVER (PARTITION BY agentid, customerid) cnt
FROM aa_dev.calls
)
SELECT agentid, customerid, rn
FROM cte
WHERE cnt > 1;
This way I could visualize the repetition clearly.
So I don't want to update row 1 but the rest.
Is there any way I can achieve this? Can I use the row numbers and write a query that updates the rows with rownum 2 onwards, one by one, with each row getting a different agent?
If you don't want duplicates in your artificial data, it's probably better not to generate them in the first place.
But if you already have a table with duplicates and want to work on them, either updating or deleting, here is the easy way:
You need a unique ID for each row to be updated. If you don't have one, add it temporarily. Then you can use the pattern below to update all duplicates except the first one.
To add an artificial id column to a preexisting table, use:
ALTER TABLE calls ADD id serial;
In my case I generated a test table with 100 random rows:
CREATE TEMP TABLE calls (id serial, agentid int, customerid int);
INSERT INTO calls (agentid, customerid)
SELECT (random()*10)::int, (random()*10)::int
FROM generate_series(1, 100) n;
Define what constitutes a duplicate and find the duplicates in your data:
SELECT agentid, customerid, count(*), array_agg(id) id
FROM calls
GROUP BY 1,2 HAVING count(*)>1
ORDER BY 1,2;
Update all the duplicate rows except the first one (here setting agentid to whatever value is needed):
UPDATE calls SET agentid = whatever_needed
FROM (
SELECT array_agg(id) id, min(id) idmin FROM calls
GROUP BY agentid, customerid HAVING count(*)>1
) AS dup
WHERE calls.id = ANY(dup.id) AND calls.id <> dup.idmin;
Alternatively, remove all duplicates except the first one:
DELETE FROM calls
USING (
SELECT array_agg(id) id, min(id) idmin FROM calls
GROUP BY agentid, customerid HAVING count(*)>1
) AS dup
WHERE calls.id = ANY(dup.id) AND calls.id <> dup.idmin;
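If the replacement should come from the aa_dev.agents table mentioned in the question, a rough sketch of the update (assuming the id column added as above and an agentid column in aa_dev.agents, and leaving out the call-time overlap check, which would additionally need the call's start and end time columns) could look like this; it assigns a random other agent to every duplicate row except the first:
UPDATE aa_dev.calls c
SET agentid = (
    SELECT a.agentid
    FROM aa_dev.agents a
    WHERE a.agentid <> c.agentid   -- pick any agent other than the current one
    ORDER BY random()
    LIMIT 1
)
FROM (
    SELECT array_agg(id) id, min(id) idmin FROM aa_dev.calls
    GROUP BY agentid, customerid HAVING count(*) > 1
) AS dup
WHERE c.id = ANY(dup.id) AND c.id <> dup.idmin;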

Days since last purchase postgres (for each purchase)

Just have a standard orders table:
order_id
order_date
customer_id
order_total
I'm trying to write a query that generates a column that shows the days since the last purchase for each customer. If a customer had no prior orders, the value would be zero.
I have tried something like this:
WITH user_data AS (
SELECT customer_id, order_total, order_date::DATE,
ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY order_date::DATE DESC
)
AS order_count
FROM transactions
WHERE STATUS = 100 AND order_total > 0
)
SELECT * FROM user_data WHERE order_count < 3;
I could feed this into Tableau and then use some table calculations to wrangle the data, but I would really like to understand the SQL approach. My approach also only analyzes the two most recent transactions, which is a drawback.
Thanks
You should use the lag() function:
select *,
lag(order_date) over (partition by customer_id order by order_date)
as prior_order_date
from transactions
order by order_id
To get the number of days since the last order, just subtract the prior order date from the current order date:
select *,
order_date - lag(order_date) over (partition by customer_id order by order_date)
as days_since_last_order
from transactions
order by order_id
The query selects null if there is no prior order. You can use coalesce() to change it to zero.
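For example, with the dates cast so the difference is an integer number of days:
select *,
    coalesce(
        order_date::date - lag(order_date::date) over (partition by customer_id order by order_date),
        0
    ) as days_since_last_order
from transactions
order by order_id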
You indicated that you need to calculate the number of days since the last purchase.
..Trying to write a query that generates a column that shows the days
since the last purchase
So basically you need to get the difference between now and the last purchase date for each client. The query can be the following:
-- test DDL
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
order_date DATE,
customer_id INTEGER,
order_total INTEGER
);
INSERT INTO orders(order_date, customer_id, order_total) VALUES
('01-01-2015'::DATE,1,2),
('01-02-2015'::DATE,1,3),
('02-01-2015'::DATE,2,4),
('02-02-2015'::DATE,2,5),
('03-01-2015'::DATE,3,6),
('03-02-2015'::DATE,3,7);
WITH orderdata AS (
SELECT customer_id,order_total,order_date,
(now()::DATE - max(order_date) OVER (PARTITION BY customer_id)) as days_since_purchase
FROM orders
WHERE order_total > 0
)
SELECT DISTINCT customer_id, days_since_purchase FROM orderdata ORDER BY customer_id;

How to Calculate Gap Between two Dates in SQL Server 2005?

I have a data set as shown in the picture.
I am trying to get the date difference between eligenddate (first row) and eligstartdate (second row). I would really appreciate any suggestions.
Thank you
SQL2005:
One solution is to insert the rows, together with a row number (generated using [elig]startdate as the ORDER BY criterion; also see note #1), into a table variable (@DateWithRowNum - if the number of rows is small) or into a temp table (#DateWithRowNum - if the number of rows is high), and then use a self join, thus:
DECLARE @DateWithRowNum TABLE (
    memberid VARCHAR(50) NOT NULL,
    rownum INT NOT NULL,
    PRIMARY KEY(memberid, rownum),
    startdate DATETIME NOT NULL,
    enddate DATETIME NOT NULL
)
INSERT @DateWithRowNum (memberid, rownum, startdate, enddate)
SELECT memberid,
    ROW_NUMBER() OVER(PARTITION BY memberid ORDER BY startdate),
    startdate,
    enddate
FROM dbo.MyTable

SELECT crt.*, DATEDIFF(MONTH, prev.enddate, crt.startdate) AS gap
FROM @DateWithRowNum crt
LEFT JOIN @DateWithRowNum prev ON crt.memberid = prev.memberid AND crt.rownum - 1 = prev.rownum
ORDER BY crt.memberid, crt.rownum
Another solution is to use a common table expression instead of the table variable / temp table, thus:
;WITH DateWithRowNum AS (
    SELECT memberid,
        ROW_NUMBER() OVER(PARTITION BY memberid ORDER BY startdate) AS rownum,
        startdate,
        enddate
    FROM dbo.MyTable
)
SELECT crt.*, DATEDIFF(MONTH, prev.enddate, crt.startdate) AS gap
FROM DateWithRowNum crt
LEFT /*HASH*/ JOIN DateWithRowNum prev ON crt.memberid = prev.memberid AND crt.rownum - 1 = prev.rownum
ORDER BY crt.memberid, crt.rownum
Note #1: I assume that you need to calculate these values for every memberid.
Note #2: The HASH hint forces SQL Server to evaluate every data source (crt or prev) of the LEFT JOIN just once.
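For reference, if you want the hint actually applied rather than left as a comment, the join line would simply read:
LEFT HASH JOIN DateWithRowNum prev ON crt.memberid = prev.memberid AND crt.rownum - 1 = prev.rownum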

Speeding up TSQL

Hi all, I'm wondering if there's a more efficient way of executing this T-SQL script. It basically gets the very latest activity per account (ordering by account name) and then joins it to the accounts table, so you get the very latest activity for each account. The problem is that there are currently about 22,000 latest activities, so obviously it has to go through a lot of data; I'm just wondering if there's a more efficient way of doing what I'm doing?
DECLARE @pastAppointments TABLE (objectid NVARCHAR(100), account NVARCHAR(500), startdate DATETIME, tasktype NVARCHAR(100), ownerid UNIQUEIDENTIFIER, owneridname NVARCHAR(100), RN NVARCHAR(100))
INSERT INTO @pastAppointments (objectid, account, startdate, tasktype, ownerid, owneridname, RN)
SELECT * FROM (
SELECT fap.regardingobjectid, fap.regardingobjectidname, fap.actualend, fap.activitytypecodename, fap.ownerid, fap.owneridname,
ROW_NUMBER() OVER (PARTITION BY fap.regardingobjectidname ORDER BY fap.actualend DESC) AS RN
FROM FilteredActivityPointer fap
WHERE fap.actualend < getdate()
AND fap.activitytypecode NOT LIKE 4201
) tmp WHERE RN = 1
ORDER BY regardingobjectidname
SELECT fa.name, fa.owneridname, fa.new_technicalaccountmanagername, fa.new_customerid, fa.new_riskstatusname, fa.new_numberofopencases,
fa.new_numberofurgentopencases, app.startdate, app.tasktype, app.ownerid, app.owneridname
FROM FilteredAccount fa LEFT JOIN @pastAppointments app on fa.accountid = app.objectid and fa.ownerid = app.ownerid
WHERE fa.statecodename = 'Active'
AND fa.ownerid LIKE @owner_search
ORDER BY fa.name
You can remove ORDER BY regardingobjectidname from the first INSERT query - the only (narrow) purpose such a sort would have on an INSERT query is if there was an identity column on the table being inserted into. And there isn't in this case, so if the optimizer isn't smart enough, it'll perform a pointless sort.
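Concretely, the first statement would then end at the RN = 1 filter, with everything else unchanged:
INSERT INTO @pastAppointments (objectid, account, startdate, tasktype, ownerid, owneridname, RN)
SELECT * FROM (
    SELECT fap.regardingobjectid, fap.regardingobjectidname, fap.actualend, fap.activitytypecodename, fap.ownerid, fap.owneridname,
        ROW_NUMBER() OVER (PARTITION BY fap.regardingobjectidname ORDER BY fap.actualend DESC) AS RN
    FROM FilteredActivityPointer fap
    WHERE fap.actualend < getdate()
    AND fap.activitytypecode NOT LIKE 4201
) tmp WHERE RN = 1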

tsql math across multiple dates in a table

I have a #variabletable simply defined as EOMDate(datetime), DandA(float), Coupon(float), EarnedIncome(float)
04/30/2008, 20187.5,17812.5,NULL
05/31/2008, 24640.63, 22265.63, NULL
06/30/2008, 2375, 26718.75,NULL
What I am trying to do is, after the table is populated, go back and calculate the EarnedIncome field and populate it.
The formula is DandA for the current month minus DandA for the previous month, plus Coupon.
Where I am having trouble is: how can I do the update? For 6/30 the value should be 4453.12, i.e. (2375 - 24640.63) + 26718.75.
I'll gladly take a clubbing over the head to get this resolved. Thanks. Also, this is running under MS SQL 2005, so any CTE / ROW_NUMBER() OVER type solution can be used if possible.
You would need to use a subquery like this:
UPDATE v1
SET EarnedIncome = v1.DandA
                   - (SELECT v2.DandA
                      FROM #variabletable v2
                      WHERE dbo.GetMonthOnly(DATEADD(mm, 1, v2.EOMDate)) = dbo.GetMonthOnly(v1.EOMDate))
                   + v1.Coupon
FROM #variabletable v1
And I was making use of this helper function:
DROP FUNCTION GetMonthOnly
GO
CREATE FUNCTION GetMonthOnly
(
    @InputDate DATETIME
)
RETURNS DATETIME
AS
BEGIN
    RETURN CAST(CAST(YEAR(@InputDate) AS VARCHAR(4)) + '/' +
                CAST(MONTH(@InputDate) AS VARCHAR(2)) + '/01' AS DATETIME)
END
GO
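For illustration, the helper just truncates any date to the first of its month, e.g.:
-- both of these return 2008-06-01 00:00:00.000
SELECT dbo.GetMonthOnly('20080615')
SELECT dbo.GetMonthOnly('20080630 23:59')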
There's definitely quite a few ways to do this. You'll find pros and cons depending on how large your data set is, and other factors.
Here's my recommendation...
Declare @table as table
(
EOMDate DateTime,
DandA float,
Coupon Float,
EarnedIncome Float
)
Insert into @table Values('04/30/2008', 20187.5,17812.5,NULL)
Insert into @table Values('05/31/2008', 24640.63, 22265.63, NULL)
Insert into @table Values('06/30/2008', 2375, 26718.75,NULL)
--If we know that EOMDate will only contain one entry per month, and there's *always* one entry a month...
Update @Table Set
EarnedIncome=DandA-
(Select top 1 DandA
from @table t2
where t2.EOMDate<T1.EOMDate
order by EOMDate Desc)+Coupon
From @table T1
Select * from @table
--If there's a chance that there could be more per month, or we only want the values from the previous month (do nothing if it doesn't exist)
Update @Table Set
EarnedIncome=DAndA-(
Select top 1 DandA
From @table T2
Where DateDiff(month, T1.EOMDate, T2.EOMDate)=-1
Order by EOMDate Desc)+Coupon
From @Table T1
Select * from @table
--Leave the null, it's good for the data (since technically you cannot calculate it without a prior month).
I like the second method best because it will only calculate if there exists a record for the preceding month.
(add the following to the above script to see the difference)
--Add one for August
Insert into @table Values('08/30/2008', 2242, 22138.62,NULL)
Update @Table Set
EarnedIncome=DAndA-(
Select top 1 DandA
From @table T2
Where DateDiff(month, T1.EOMDate, T2.EOMDate)=-1
Order by EOMDate Desc
)+Coupon
From @Table T1
--August is Null because there's no July
Select * from @table
It's all a matter of exactly what you want:
use the record directly preceding the current record (regardless of date), or ONLY use the record that is exactly one month before the current record.
Sorry about the format... Stackoverflow.com's answer editor and I do not play nice together.
:D
You can use a subquery to perform the calculation; the only problem is what to do for the first month, because there is no previous DandA value. Here I've set it to 0 using isnull(). The query looks like:
Update MyTable
Set EarnedIncome = DandA + Coupon - IsNull((Select Top 1 DandA
From MyTable2
Where MyTable.EOMDate > MyTable2.EOMDate
Order by MyTable2.EOMDate desc), 0)
This also assumes that you only have one record per month in each table, and that there aren't any gaps between months.
Another alternative is to calculate the running total when you are inserting your data, and have a constraint guarantee that your running total is correct:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
There may be a way to do this in a single statement, but in cases like this, I'd be inclined to set up a cursor to walk through each row, computing the new EarnedIncome field for that row, update the row, and then move to the next row.
Ex:
DECLARE @EOMDateVal DATETIME
DECLARE @EarnedIncomeVal FLOAT
DECLARE updCursor CURSOR FOR
    SELECT EOMDate FROM #variabletable
OPEN updCursor
FETCH NEXT FROM updCursor INTO @EOMDateVal
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Compute @EarnedIncomeVal for this row here.
    -- This also gives you a chance to catch data integrity problems
    -- that would cause you to fail the whole batch if you compute
    -- everything in a subquery.
    UPDATE #variabletable SET EarnedIncome = @EarnedIncomeVal
    WHERE EOMDate = @EOMDateVal
    FETCH NEXT FROM updCursor INTO @EOMDateVal
END
CLOSE updCursor
DEALLOCATE updCursor
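For the placeholder above, one possible sketch of the per-row computation (reusing the dbo.GetMonthOnly helper from the first answer to find the prior month's row; if there is no prior month, @EarnedIncomeVal simply ends up NULL) would be:
-- current month's DandA + Coupon minus the previous month's DandA
SELECT @EarnedIncomeVal = cur.DandA + cur.Coupon - prev.DandA
FROM #variabletable cur
LEFT JOIN #variabletable prev
    ON dbo.GetMonthOnly(DATEADD(mm, 1, prev.EOMDate)) = dbo.GetMonthOnly(cur.EOMDate)
WHERE cur.EOMDate = @EOMDateVal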