iPhone SQLite (FMDB) query takes excessive time - iphone

I am trying to execute a query that has a GROUP BY clause in it.
The query itself executes in 0.012243 seconds.
However, the first call to [resultset next] takes more than 5 seconds.
Example
NSMutableArray *recordsArray = [[NSMutableArray alloc] init];
NSString * query = @"select count(id), SUBSTR(purchase_date, 0,11) as purchase_date from sales where purchase_date LIKE '2012-12%' group by purchase_date order by purchase_date asc";
NSDate * date1 = [NSDate date];
FMResultSet * resultset = [database executeQuery:query];
NSDate * date2 = [NSDate date];
NSTimeInterval timeTaken = [date2 timeIntervalSinceDate:date1];
NSLog(@"TimeTaken: %f", timeTaken); //outputs around 0.012
while ([resultset next])
{
[recordsArray addObject:[resultset resultDictionary]];
}
date2 = [NSDate date];
timeTaken = [date2 timeIntervalSinceDate:date1];
NSLog(@"TimeTaken 2: %f", timeTaken); //outputs around 5.5 seconds
I have also determined that all of this time is spent in the first call to [resultset next].
I have also tried removing the GROUP BY clause by generating a UNION'ed query like
NSString * query2 = @"select * from
(
select count(id), SUBSTR(purchase_date, 0,11) as purchase_date from sales where purchase_date = 2012-12-01
UNION
select count(id), SUBSTR(purchase_date, 0,11) as purchase_date from sales where purchase_date = 2012-12-02
UNION
select count(id), SUBSTR(purchase_date, 0,11) as purchase_date from sales where purchase_date = 2012-12-03
....
UNION
select count(id), SUBSTR(purchase_date, 0,11) as purchase_date from sales where purchase_date = 2012-12-31
) where purchase_date is not NULL order by purchase_date asc";
Executing this query takes 0.2 seconds, but on the first call to [resultset next] the time shoots up to 7+ seconds.
Other Info
The table currently has 8000+ rows but that number can go as high as 100K in case of some of my users.
I am using this data to plot a graph for the sales trends for the given month.
On the simulator, this query executes in less than 0.5 seconds, but on the device it takes much longer.
Question
Can you guide me how to bring down the time for this query?

I determined that the largest bottleneck was the SUBSTR and GROUP BY clauses; executing and processing a simple query like the following took only around 0.02 seconds:
Select purchase_date from sales where purchase_date LIKE '2012-12%' order by purchase_date asc;
So I introduced it as an inner query:
Select count(purchase_date) as count, SUBSTR(purchase_date, 0, 11) as purchase_date from
(
Select purchase_date from sales where purchase_date LIKE '2012-12%'
)
group by purchase_date order by purchase_date;
Although the data generated was the same as for the initial query, the time again skyrocketed to around 5.5 seconds.
So I finally decided to bite the bullet: my solution for now is to fetch all the purchase_date records for the given month and process them myself.
The code now looks like this:
while ([resultset next])
{
[recordsArray addObject:[resultset resultDictionary]];
}
[resultset close];
[self closeDB];
//release the db lock at this point
int array[31]; //maximum days in a month
bzero((void *)array, 31 * sizeof(int)); //initialize the month array
for(NSDictionary * d in recordsArray) //read the records received from the db and add daily sales count
{
NSRange r;
r.location = 8;
r.length = 2;
int dDate = [[[d objectForKey:@"purchase_date"] substringWithRange:r] intValue];
array[dDate-1]++;
}
[recordsArray removeAllObjects];
//refDate contains @"2012-12"
for(int i=0; i<31; i++) //now populate the final array again
{
if(array[i] > 0)
{
NSDictionary * d1 = [NSDictionary dictionaryWithObjectsAndKeys:[NSString stringWithFormat:@"%@-%02d", refDate, i+1], @"date", [NSNumber numberWithInt:array[i]], @"count", nil];
[recordsArray addObject:d1];
}
}
return recordsArray;
I hope this helps someone else stuck in a similar situation. Perhaps some db guru can suggest a better alternative to this ugly solution.
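As one possible alternative to try, here is a minimal sketch (in Python with the standard sqlite3 module, not FMDB) of the same daily count written so that the GROUP BY names the SUBSTR expression itself rather than an alias that shares its name with the underlying column, and so that the month filter is a plain range comparison. It assumes purchase_date is stored as text like 'YYYY-MM-DD HH:MM:SS'; the index name and the sample rows are made up for illustration, and whether this is any faster on a device has not been measured here.

```python
import sqlite3


def daily_sales(conn, month_start, next_month_start):
    """Return [(day, count), ...] for purchases in [month_start, next_month_start)."""
    sql = """
        SELECT substr(purchase_date, 1, 10) AS day, count(*) AS n
        FROM sales
        WHERE purchase_date >= ? AND purchase_date < ?
        GROUP BY substr(purchase_date, 1, 10)
        ORDER BY day
    """
    return conn.execute(sql, (month_start, next_month_start)).fetchall()


# Hypothetical in-memory table standing in for the real sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, purchase_date TEXT)")
# Assumed index; a range comparison on purchase_date is able to use it.
conn.execute("CREATE INDEX idx_sales_purchase_date ON sales (purchase_date)")
conn.executemany(
    "INSERT INTO sales (purchase_date) VALUES (?)",
    [
        ("2012-12-01 09:15:00",),
        ("2012-12-01 18:40:00",),
        ("2012-12-03 12:00:00",),
        ("2013-01-02 08:00:00",),
    ],
)

rows = daily_sales(conn, "2012-12-01", "2013-01-01")
print(rows)
```

The same SQL string could be passed to executeQuery: to compare timings against the client-side tally above.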

Related

How can I, in T-SQL, examine date intervals to remove overlapping intervals before adding totals together

I am running an analysis on medication prescribing practices. We want to identify whether someone has been on a class of medications for 60 days out of a 90 day quarter. We have a start and end date for each prescription, and the bounds of the quarter (e.g., 4/1/2022 – 6/30/2022). For each prescription I’ve calculated the number of days between the start and end date (only including days that fall within the bounds of the quarter). There are many instances in which multiple drugs within the same class are prescribed: someone might try one antidepressant but not like it, and so be given another in the same class.
My original strategy was just to total up number of days for each class of medication and see if it’s 60 or over. The days don’t have to be consecutive, but if they overlap, days during an overlap period shouldn’t count twice (which they would in a simple sum).
For instance in the data table below, patient 1 in row 1 should be included as they are over 60 days. Patient 2 should also get in (rows 2 and 3) because the non-overlapping total (57+8) within the same med class gets them to over 60 days. However, patient 3 should NOT get in, even though the total of 32 + 32 is over 60 because the intervals overlap. This means that they were really on the medication class for only 32 days – this is an instance where someone might be on two different antidepressants simultaneously.
It’s not sufficient to just sum the days in the interval, but I also have to include some way to examine whether the intervals are overlapping and only add days if an interval for a given medication class falls outside another interval for that same class.
Row num  Patid  Med class  Start date  End date    Interval
1        1      A          2022-04-28  2022-09-12  63
2        2      B          2022-05-03  2022-06-29  57
3        2      B          2022-04-21  2022-04-29  8
4        3      A          2022-01-19  2022-05-03  32
5        3      A          2022-01-19  2022-05-03  32
I’m having a hard time figuring out how to do this. Note, I'm limited to just using SQL for this.
Code that produced the above data. I would embed this in another query to generate a total interval but need to deal with the overlap issue.
DECLARE @startdt DATE;
DECLARE @enddt DATE;
SET @startdt='4/1/2022'
SET @enddt='6/30/2022'
--for q4 fy2022-23 (4/1/2022-6/30/2022)
SELECT DISTINCT
rx.patid, d.medication_category as medcat, start_date, end_date,
-- case statement to capture days within quarter only
CASE WHEN start_date<@startdt and end_date>@enddt then 90
WHEN start_date<@startdt and end_date>=@startdt then datediff(d,@startdt,end_date)
WHEN start_date>=@startdt and end_date>@enddt then datediff(d,start_date,@enddt)
ELSE datediff(d,start_date,end_date)
END as interval
FROM rx
INNER JOIN Drug_names_categories d
ON rx.drugname=d.drugname
WHERE start_date<'7/1/2022' and end_date>'3/30/2022'
AND rx.patid IS NOT NULL
AND d.medication_category IS NOT NULL
AND d.medication_category <>''
You can accomplish what you want by generating a calendar table (using a Common Table Expression) of individual days within the test range, joining those days with the prescriptions with overlapping days, and then counting distinct days for each patient and medication category combination.
Something like:
DECLARE @startdt DATE = '2022-04-01';
DECLARE @enddt DATE = '2022-06-30';
DECLARE @threshold INT = 60;
WITH Days AS (
SELECT @startdt AS Day
UNION ALL
SELECT DATEADD(day, 1, Day)
FROM Days
WHERE Day < @enddt
)
SELECT
rx.patid, d.medication_category as medcat,
COUNT(DISTINCT DD.Day) AS days_medicated,
MIN(DD.Day) AS start_date,
MAX(DD.Day) AS end_date
FROM rx
INNER JOIN Drug_names_categories d
ON rx.drugname = d.drugname
INNER JOIN Days DD
ON DD.Day BETWEEN rx.start_date AND rx.end_date
WHERE rx.start_date <= @enddt AND @startdt <= rx.end_date
GROUP BY rx.patid, d.medication_category
HAVING COUNT(DISTINCT DD.Day) >= @threshold
ORDER BY rx.patid, start_date;
If using SQL Server 2022 or later, the Days generator can be simplified by using the new GENERATE_SERIES() function:
WITH Days AS (
SELECT DATEADD(day, S.value, @startdt) AS Day
FROM GENERATE_SERIES(0, DATEDIFF(day, @startdt, @enddt)) S
)
See this db<>fiddle for an example with some sample data.
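To make the counting rule concrete, the idea of clipping each prescription to the quarter and counting distinct days can be traced in a few lines of Python. This is only an illustration of the logic, not a replacement for the SQL; the function name and the sample intervals are made up, and days are counted inclusive of both endpoints, as BETWEEN does in the query above (so the totals differ by one from the DATEDIFF-based intervals in the question).

```python
from datetime import date, timedelta


def days_covered(intervals, quarter_start, quarter_end):
    """Count distinct calendar days in the quarter covered by any interval."""
    covered = set()
    for start, end in intervals:
        # Clip the prescription to the quarter, then add each day once.
        day = max(start, quarter_start)
        last = min(end, quarter_end)
        while day <= last:
            covered.add(day)
            day += timedelta(days=1)
    return len(covered)


q_start, q_end = date(2022, 4, 1), date(2022, 6, 30)

# Two identical, fully overlapping prescriptions: the days are counted once.
overlapping = [(date(2022, 1, 19), date(2022, 5, 3))] * 2
# Two prescriptions that do not overlap: the days add up.
separate = [
    (date(2022, 5, 3), date(2022, 6, 29)),
    (date(2022, 4, 21), date(2022, 4, 29)),
]

print(days_covered(overlapping, q_start, q_end))
print(days_covered(separate, q_start, q_end))
```

Because the days are collected in a set, an overlap of any shape (partial, nested, or identical) is handled the same way, which is what COUNT(DISTINCT DD.Day) does in the query.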
I would do this using a date/calendar table, then it's pretty easy.
If you don't already have a date table, this link is one of many that describe how to create one easily ( https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/ )
Here's the script from this link (in case the link dies)
DECLARE @StartDate date = '20100101';
DECLARE @CutoffDate date = DATEADD(DAY, -1, DATEADD(YEAR, 30, @StartDate));
;WITH seq(n) AS
(
SELECT 0 UNION ALL SELECT n + 1 FROM seq
WHERE n < DATEDIFF(DAY, @StartDate, @CutoffDate)
),
d(d) AS
(
SELECT DATEADD(DAY, n, @StartDate) FROM seq
),
src AS
(
SELECT
TheDate = CONVERT(date, d),
TheDay = DATEPART(DAY, d),
TheDayName = DATENAME(WEEKDAY, d),
TheWeek = DATEPART(WEEK, d),
TheISOWeek = DATEPART(ISO_WEEK, d),
TheDayOfWeek = DATEPART(WEEKDAY, d),
TheMonth = DATEPART(MONTH, d),
TheMonthName = DATENAME(MONTH, d),
TheQuarter = DATEPART(Quarter, d),
TheYear = DATEPART(YEAR, d),
TheFirstOfMonth = DATEFROMPARTS(YEAR(d), MONTH(d), 1),
TheLastOfYear = DATEFROMPARTS(YEAR(d), 12, 31),
TheDayOfYear = DATEPART(DAYOFYEAR, d)
FROM d
)
SELECT *
INTO MyDateTable
FROM src
ORDER BY TheDate
OPTION (MAXRECURSION 0);
Now that you have your new date table, you can join to it to get the list of dates that fall between the start and end date, something like
SELECT COUNT(DISTINCT dt.TheDate)
FROM rx
INNER JOIN MyDateTable dt on dt.TheDate BETWEEN rx.start_date AND rx.end_date
INNER JOIN Drug_names_categories d ON rx.drugname=d.drugname
WHERE start_date<'7/1/2022' and end_date>'3/30/2022'
AND rx.patid IS NOT NULL
AND d.medication_category IS NOT NULL
AND d.medication_category <>''
Obviously this is a simple example, but you could easily extend it to include all the details you need. The point is that you now have a list of dates, or a distinct list of dates, which you can work with easily.
You could also simplify the date range filter by referencing the TheQuarter and TheYear columns. If this is a common task, consider extending the date table with a compound YearQuarter column (e.g. 2023Q1/202301 etc.).

How to get the correct LOD calculation?

My dataset includes a few customers who churned, along with their subscription plan information. Customers can change their subscription, so here I get the max plan:
{FIXED [Customer Id]:MAX(
(IF {FIXED [Customer Id]: MAX(
IF NOT ISNULL([Subscription Plan]) THEN [Date] END)
}=[Date] THEN [Subscription Plan] END)
)}
To find customer churn:
{ FIXED DATETRUNC('month', [Date]), [Max Plan]:
COUNTD(
IF NOT ISNULL([Churn Date]) AND
DATETRUNC('month', [Date]) = DATETRUNC('month', [Churn Date])
THEN [Customer Id] END
)
}
I want to calculate the revenue lost to churn: each churned customer on the advanced plan costs 9 dollars, and each one on the premium plan costs 19 dollars.
IF [Max Plan] = 'advanced' THEN [Churn]*9
ELSEIF [Max Plan] = 'premium' THEN [Churn]*19
END
However, it doesn't give me the correct result.
Here is the expected result:
Here is the workbook attached: https://community.tableau.com/s/contentdocument/0694T000004aUnLQAU

First year spending of all customers

I want to get every customer's spending for their first year, but they all have different joining dates. I have transaction data which consists of the columns [User ID], [Date joined], [Transaction Date], [Amount].
Can someone help? Thanks
An LOD calculation oughta do the trick.
{ FIXED [User ID] :
SUM(
IIF(
(
[Transaction Date] >= [Date Joined]
AND [Transaction Date] < DATEADD('year', 1, [Date Joined])
),
[Amount],
0
)
)
}
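For readers who want to check the numbers outside Tableau, the same rule (sum each user's amounts dated on or after the join date and strictly before one year later) can be sketched in Python. The row layout, the function names, and the handling of a 29 February join date (moved to 28 February the following year) are assumptions made for this illustration.

```python
from collections import defaultdict
from datetime import date


def add_one_year(d):
    """Same calendar day next year; 29 Feb falls back to 28 Feb (assumed)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def first_year_spending(rows):
    """rows: iterable of (user_id, date_joined, transaction_date, amount)."""
    totals = defaultdict(float)
    for user_id, joined, txn_date, amount in rows:
        in_first_year = joined <= txn_date < add_one_year(joined)
        # Out-of-range transactions contribute 0, like the IIF above.
        totals[user_id] += amount if in_first_year else 0
    return dict(totals)


sample = [
    (1, date(2020, 3, 15), date(2020, 3, 15), 10.0),
    (1, date(2020, 3, 15), date(2021, 3, 14), 5.0),
    (1, date(2020, 3, 15), date(2021, 3, 15), 100.0),  # one year later: excluded
    (2, date(2021, 1, 1), date(2020, 12, 31), 7.0),    # before joining: excluded
]
print(first_year_spending(sample))
```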

Updating a table to avoid overlapping dates

I am trying to write a query that reorders date ranges around particular spans. It should do something that looks like this
Member  Rank  Begin Date  End Date
2275    A     9/9/14      11/17/14
2275    B     9/26/14     3/24/15
2275    B     3/25/15     12/31/15
8983    A     9/16/13     3/10/15
8983    B     2/24/15     4/28/15
8983    A     4/28/15     12/31/15
and have it become
Member  Rank  Begin Date  End Date
2275    A     9/9/14      11/17/14
2275    B     11/18/14    3/24/15
2275    B     3/25/15     12/31/15
8983    A     9/16/13     3/10/15
8983    B     3/11/15     4/27/15
8983    A     4/28/15     12/31/15
To explain further, I am looking to update the dates. There isn't much to the ranking except A > B, and there are only ranks A and B. Date ranges with rank A should remain untouched. Overlapping B-ranked ranges are okay; I am concerned only with B-ranked ranges overlapping A-ranked ranges. The table is very large (~700 members), with several different member IDs. For example, the 2nd line (Rank B) of member 2275 changes its begin date to 11/18/14 so that it no longer overlaps the 1st line.
I am using Microsoft SQL Server 2008 R2
Thanks
LATEST EDIT: Here's what I did for pre-2012. I don't think it's the most elegant solution.
WITH a AS (
SELECT
1 AS lgoffset
, NULL AS lgdefval
, ROW_NUMBER() OVER(PARTITION BY [Member] ORDER BY [Begin Date]) AS seq
, [Member]
, [Rank]
, [Begin Date]
, [End Date]
FROM #table
)
SELECT
a.seq
, a.[Member]
, a.[Rank]
, a.[Begin Date]
, CASE
WHEN a.[Rank] = 'B' AND a.[Begin Date] <= ISNULL(aLag.[End Date], a.lgdefval)
THEN ISNULL(aLag.[End Date], a.lgdefval)
ELSE a.[Begin Date]
END AS bdate2
, a.[End Date]
INTO #b
FROM a
LEFT OUTER JOIN a aLag
ON a.seq = aLag.seq + a.lgoffset
AND a.[Member] = aLag.[Member]
ORDER BY [Member], [Begin Date];
UPDATE #table
SET #table.bdate = CASE
WHEN #table.rnk = 'B' AND #table.bdate <= (SELECT #b.bdate2 FROM #b WHERE #b.bdate2 > #b.bdate and #table.mbr = #b.mbr)
THEN dateadd(d, 1,(SELECT bdate2 FROM #b WHERE #b.bdate2 > #b.bdate and #table.mbr = #b.mbr ))
ELSE #table.bdate
END
EDIT PS: Below was my previous answer that only applies to 2012 and later.
You may want to try the following SELECT statement to see if you get the desired results and then convert to an UPDATE:
SELECT
[Member]
, [Rank]
, CASE
WHEN [Rank] = 'B' AND [Begin Date] <= LAG([End Date],1,'12/31/2030') OVER(PARTITION BY [Member] ORDER BY [Begin Date])
THEN DATEADD(d,1,LAG([End Date],1,'12/31/2030')OVER(PARTITION BY [Member] ORDER BY [Begin Date]))
ELSE [Begin Date]
END AS [Begin Date]
, [End Date]
FROM #Table
ORDER BY [Member], [Begin Date]
EDIT: So in order to update the begin date column:
UPDATE #Table
SET [Begin Date] = (SELECT
CASE
WHEN [Rank] = 'B' AND [Begin Date] <= LAG([End Date],1,'12/31/2030') OVER(PARTITION BY [Member] ORDER BY [Begin Date])
THEN DATEADD(d,1,LAG([End Date],1,'12/31/2030')OVER(PARTITION BY [Member] ORDER BY [Begin Date]))
ELSE [Begin Date]
END AS [Begin Date]
FROM #Table)
EDIT 2: Some of my code was incorrect due to not realizing the lag function needed an OVER statement, updated select statement and update statement
Sources: Alternate of lead lag function in sql server 2008
http://blog.sqlauthority.com/2011/11/24/sql-server-solution-to-puzzle-simulate-lead-and-lag-without-using-sql-server-2012-analytic-function/
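The LAG-based rule from the SELECT statement above can be traced on the question's sample rows with a short Python sketch: within each member, ordered by begin date, a rank B row that starts on or before the previous row's end date is moved to the day after it. This is an illustration of that rule only; the function name is made up, it adjusts begin dates and nothing else (so the 4/28/15 end date in the sample is not shortened), and unlike the '12/31/2030' LAG default it leaves the first row of each member unchanged.

```python
from datetime import date, timedelta
from itertools import groupby


def shift_b_begin_dates(rows):
    """rows: list of (member, rank, begin, end); returns adjusted copies."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    result = []
    for _, group in groupby(ordered, key=lambda r: r[0]):
        prev_end = None  # no previous row yet for this member
        for member, rank, begin, end in group:
            if rank == "B" and prev_end is not None and begin <= prev_end:
                begin = prev_end + timedelta(days=1)
            result.append((member, rank, begin, end))
            prev_end = end
    return result


sample = [
    (2275, "A", date(2014, 9, 9), date(2014, 11, 17)),
    (2275, "B", date(2014, 9, 26), date(2015, 3, 24)),
    (2275, "B", date(2015, 3, 25), date(2015, 12, 31)),
    (8983, "A", date(2013, 9, 16), date(2015, 3, 10)),
    (8983, "B", date(2015, 2, 24), date(2015, 4, 28)),
    (8983, "A", date(2015, 4, 28), date(2015, 12, 31)),
]
for row in shift_b_begin_dates(sample):
    print(row)
```

Note that, like the SQL, this compares against the previous row regardless of its rank; checking that the previous row is rank A would need an extra condition.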

Yearly total for items over multiple years

I am trying to find the total per year.
For example:
Item  Start date  End date  Total value
1     07/01/14    01/01/15  $10,000
2     08/01/13    12/01/14  $10,000
3     03/01/13    05/01/15  $10,000
As you can see, some items span multiple years. Is there a way to find out what the total value is per year?
The solution should be:
item 3
2013 - $3600
2014 - $4800
2015 - $1600
Then a summation would be done across all three items to give a yearly total.
What I have so far:
I have a rolling summation code which is shown below.
case when
(
[begin date] >= dateadd(mm,0,DATEADD(mm,DATEDIFF(mm,0,getdate()),0))
and [end date] >= dateadd(mm,0,DATEADD(mm,DATEDIFF(mm,0,getdate()),0))
)
OR
(
[Begin Date] < dateadd(mm,0,DATEADD(mm,DATEDIFF(mm,0,getdate()),0))
and [End Date] >= dateadd(mm,0,DATEADD(mm,DATEDIFF(mm,0,getdate()),0))
)
then [Totalvalue]/nullif(DATEDIFF(mm,[begin date],[end date]),0)
else 0
end [Current Month]
I don't know how you got those totals for item 3,
but I think for item 3 it should be
2013 = 3704
2014 = 4444
2015 = 1852
I don't know how efficient this code is, but give it a try:
CREATE TABLE #tblName
(
itemid INT,
startdate DATETIME,
endate DATETIME,
value int
)
INSERT INTO #tblName
VALUES (1,'2014/07/01','2015/01/01',10000),
(2,'2013/08/01','2014/12/01',10000),
(3,'2013/03/01','2015/05/01',10000)
DECLARE @mindate DATETIME,
@maxdate DATETIME
SELECT @mindate = Min(startdate),
@maxdate = Max(endate)
FROM #tblName
SELECT *
FROM #tblName;
WITH cte
AS (SELECT @mindate startdate
UNION ALL
SELECT Dateadd(mm, 1, startdate) startdate
FROM cte
WHERE startdate <= Dateadd(mm, -1, @maxdate))
SELECT a.value * ( ( convert(numeric(22,6),a.cnt) / convert(numeric(22,6),c.total) ) * 100 ) / 100,a.itemid,a.startdate
FROM (SELECT Avg(value) value,
Count(1) cnt,
itemid,
Year(a.startdate) startdate
FROM cte a
JOIN #tblName b
ON a.startdate BETWEEN b.startdate AND b.endate
GROUP BY itemid,
Year(a.startdate)) a
JOIN(SELECT Sum(cnt) total,
itemid
FROM (SELECT Avg(value) value,
Count(1) cnt,
itemid,
Year(a.startdate) startdate
FROM cte a
JOIN #tblName b
ON a.startdate BETWEEN b.startdate AND b.endate
GROUP BY itemid,
Year(a.startdate)) B
GROUP BY itemid) C
ON a.itemid = c.itemid
WHERE a.itemid = 3
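The arithmetic behind those figures can be checked with a short Python sketch. It assumes the same proration as the query above: count the calendar months from the start month through the end month inclusive, then give each year a share of the value proportional to its month count. The function name is made up for this illustration.

```python
from collections import Counter
from datetime import date


def yearly_share(start, end, value):
    """Split value across years in proportion to months covered (inclusive)."""
    months_per_year = Counter()
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months_per_year[year] += 1
        month += 1
        if month > 12:
            year, month = year + 1, 1
    total_months = sum(months_per_year.values())
    return {
        y: value * n / total_months
        for y, n in sorted(months_per_year.items())
    }


# Item 3 from the question: 10 + 12 + 5 = 27 months in total.
shares = yearly_share(date(2013, 3, 1), date(2015, 5, 1), 10000)
print({year: round(amount) for year, amount in shares.items()})
```

Summing the per-year dictionaries of all items would then give the yearly totals the question asks for.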