Very slow operation due to lack of GroupBy and Sum in Realm [iOS]


I am trying to build an accounting app with many accounts, where each account can have many entries. Each entry has a date and an amount.
Entries are linked to the Account class like so:
let entries = LinkingObjects(fromType: Entry.self, property: "account").sorted(byKeyPath: #keyPath(Entry.date))
I would like to group by account name and get the sum of the amounts for each month between any two given dates. The dates can vary for reporting purposes.
Because Realm doesn't have a group-by function, I cannot easily get a result whose columns are account name, total, average, sumOf(month1), sumOf(month2), and so on.
Therefore I need to do it in code, but it is very slow. Is there anything Realm-specific missing from my code that would dramatically improve the speed of the calculations?
The following code runs for each account and each period (for a yearly report that means 12 times per account) and is the cause of the slowness:
let total: Double = realm
.objects(Entry.self)
.filter("_account.fullPath == %@", account.fullPath)
.filter("date >= %@ AND date <= %@", period.startDate, period.endDate)
.filter("isConsolidation != TRUE")
.sum(ofProperty: "scaledAmount")
Below is the complete code. I have to loop over each account and run a query for each period:
private func getMonthlyValues(for accounts: [Account], in realm: Realm, maxLevel: Int) -> String {
guard accounts.count > 0 else { return "" }
guard accounts[0].displayLevel < maxLevel else { return "" }
var rows: String = ""
for account in accounts.sorted(by: { $0.fullPath < $1.fullPath }) {
if account.isFolder {
let row = [account.localizedName.htmlColumn].htmlRow
rows += row
rows += getMonthlyValues(for: Array(account.children), in: realm, maxLevel: maxLevel)
} else {
var row: [String] = []
var totals: [Double] = []
// period has a start date and an end date
// monthlyPeriods returns the period in months
// for a datePeriod starting February 10 and ending November 20 (for whatever reason)
// monthlyPeriods returns 10 periods, one per month, starting with [February 10 - February 28]
// and ending with [November 1 - November 20]
// below gets the sum for the given period for the current account
// for this example it runs 10 times for each account, getting the sum for each period
for period in datePeriod.monthlyPeriods {
let total: Double = realm
.objects(Entry.self)
.filter("_account.fullPath == %@", account.fullPath)
.filter("date >= %@ AND date <= %@", period.startDate, period.endDate)
.filter("isConsolidation != TRUE")
.sum(ofProperty: "scaledAmount")
totals.append(total)
}
let total = totals.reduce(0, +)
guard total > 0 else { continue }
let average = AccountingDouble(total / Double(totals.count)).value
row.append("\(account.localizedName.htmlColumn)")
row.append("\(total.htmlFormatted().htmlColumn)")
row.append("\(average.htmlFormatted().htmlColumn)")
row.append(contentsOf: totals.map { $0.htmlFormatted().htmlColumn })
rows += row.htmlRow
}
}
return rows
}
Any help is much appreciated.

Since you are not using the auto-updating behavior of Results, I would try not running a separate query for each month. Instead, get all entries with a single query, then do the filtering and summing with Swift's built-in filter and reduce functions; this way you reduce the overhead of Realm queries.
Just use let entries = realm.objects(Entry.self) to get all entries with a single query, then filter it with Array(entries).filter({ $0.account.fullPath == account.fullPath }), etc. for each account and date, so that Realm isn't queried each time you do the calculation. If your entries won't change between calculations, you can even convert entries to an Array right away with let entries = Array(realm.objects(Entry.self))
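A minimal sketch of this single-pass idea, written in plain Swift so it runs without Realm. EntryRecord and Period are hypothetical stand-ins for the question's Entry objects and monthly periods; in the app the array would be filled once from realm.objects(Entry.self). Each entry is added to its account's bucket for the matching period, so the work is one pass over the entries instead of one Realm query per account per month.

```swift
import Foundation

// Hypothetical plain-Swift stand-in for the Realm `Entry` object;
// in the app these values would be read once from realm.objects(Entry.self).
struct EntryRecord {
    let accountPath: String
    let date: Date
    let scaledAmount: Double
    let isConsolidation: Bool
}

// Hypothetical stand-in for one reporting period (e.g. one month), bounds inclusive.
struct Period {
    let startDate: Date
    let endDate: Date
}

/// Sums `scaledAmount` per account and per period in a single pass over the entries.
/// Returns account path -> array of sums, one slot per period, in period order.
func monthlyTotals(entries: [EntryRecord], periods: [Period]) -> [String: [Double]] {
    var totals: [String: [Double]] = [:]
    for entry in entries where !entry.isConsolidation {
        // Find the period containing this entry (periods are assumed not to overlap).
        guard let index = periods.firstIndex(where: {
            entry.date >= $0.startDate && entry.date <= $0.endDate
        }) else { continue }
        var row = totals[entry.accountPath] ?? Array(repeating: 0, count: periods.count)
        row[index] += entry.scaledAmount
        totals[entry.accountPath] = row
    }
    return totals
}
```

Each report row can then be built from an account's array (total via reduce(0, +), average as total divided by the number of periods). Because the periods are sorted, the linear firstIndex search could be swapped for a binary search if there are many periods.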

Related

Why doesn't adding an index to a Realm Object speed up my count query?

I have the following code I'm using to try to test indexing with Realm. I have an object class with a single Int value that is indexed, also an identical class, but unindexed.
After inserting 1 million rows of each class, with evenly distributed values, I run some queries to see the time it takes with and without indexes.
The time is basically the same (sometimes unindexed will be slightly faster, sometimes indexed will be slightly faster -- but within about 5% or something).
The results I get are:
realm init6 took: 79.721125 ms
1000000
baseCountIndexed took: 0.233 ms
1000000
baseCountUnindexed took: 0.075833 ms
500000
getCountIndexedObjsViaFilterString took: 18.982542 ms
500000
getCountIndexedObjsViaWhereClosure took: 16.156041 ms
500000
getCountUnindexedObjsViaFilterString took: 17.985084 ms
500000
getCountUnindexedObjsViaWhereClosure took: 16.031917 ms
I would expect the indexed version to be faster: it seems like it should be close to the base count time, since an index should make this O(log N). At the very least, the indexed version should be faster than the unindexed one.
What am I doing wrong?
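The expectation above can be illustrated with a plain-Swift sketch of what an ordered index makes possible in principle; it does not describe how Realm actually evaluates a >= query. With the values kept sorted, counting values >= minVal needs only a lower-bound binary search (O(log N)), whereas an unindexed count has to test every value (O(N)). lowerBound, countAtLeast and countAtLeastByScan are hypothetical helpers.

```swift
// Returns the first index in `sorted` whose element is >= `target` (a "lower bound").
// `sorted` must be in ascending order.
func lowerBound(_ sorted: [Int], _ target: Int) -> Int {
    var low = 0
    var high = sorted.count
    while low < high {
        let mid = low + (high - low) / 2
        if sorted[mid] < target {
            low = mid + 1
        } else {
            high = mid
        }
    }
    return low
}

// Count of values >= minVal using O(log N) comparisons on sorted data.
func countAtLeast(_ sorted: [Int], _ minVal: Int) -> Int {
    return sorted.count - lowerBound(sorted, minVal)
}

// Count of values >= minVal by testing every element: O(N).
func countAtLeastByScan(_ values: [Int], _ minVal: Int) -> Int {
    return values.filter { $0 >= minVal }.count
}
```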
The code I'm using follows: (long so I could make it complete / runnable)
The code basically inserts 1 million indexed objects and 1 million unindexed objects, then gets the count of both types (which is very fast), and then runs 2 different kinds of query, one using .filter(String) and one using a .where closure, to select half the objects and count them.
It is my understanding that the results coming from realm are lazy, and so the count should be done without loading the objects into memory.
The times for all 4 queries (indexed with .filter and indexed with .where, and unindexed for .filter & .where) all take about the same amount of time.
Edit:
After the answer from Jay below, I ran the same code, with results that were not massively dissimilar: I got about a 33% speed increase from indexing, rather than the 66% he got. Still, I would expect the results to be near-instant for an indexed field, so something isn't right. I'll update if and when I figure it out. For now I'm moving on, since even the unindexed speed is good enough for my current usage. It's just really weird to see it so slow with indexes.
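import Foundation
import RealmSwift
class TimeIt {
let val = DispatchTime.now()
// Elapsed time in nanoseconds since this TimeIt was created.
// DispatchTimeInterval has no `.nanoseconds` property, so read uptimeNanoseconds directly.
func elapsedNanoseconds() -> UInt64 {
return DispatchTime.now().uptimeNanoseconds - val.uptimeNanoseconds
}
static func time(_ desc: String, aclosure: () -> Void) {
let t = TimeIt()
aclosure()
print("\(desc) took: \(Double(t.elapsedNanoseconds()) / 1_000_000.0) ms")
}
}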
class RealmIndexed: Object {
@Persisted var id = UUID().uuidString
@Persisted(indexed: true) var val: Int = 0
}
class RealmUnindexed: Object {
@Persisted var id = UUID().uuidString
@Persisted var val: Int = 0
}
func generateObjs(_ hundredsOfThousands: Int = 1) {
let realm = try! Realm()
var objs: [RealmIndexed] = []
for _ in 0..<hundredsOfThousands {
for i in 0..<100_000 {
let obj = RealmIndexed()
obj.val = i
objs.append(obj)
}
}
try! realm.write {
realm.add(objs)
}
var objs2: [RealmUnindexed] = []
for _ in 0..<hundredsOfThousands {
for i in 0..<100_000 {
let obj = RealmUnindexed()
obj.val = i
objs2.append(obj)
}
}
try! realm.write {
realm.add(objs2)
}
}
func baseCountIndexed() -> Int {
return (try! Realm().objects(RealmIndexed.self)).count
}
func baseCountUnindexed() -> Int {
return (try! Realm().objects(RealmUnindexed.self)).count
}
func getCountIndexedObjsViaFilterString(_ minVal: Int = 50000) -> Int {
let count = (try! Realm().objects(RealmIndexed.self).filter("val >= %@", minVal)).count
return count
}
func getCountUnindexedObjsViaFilterString(_ minVal: Int = 50000) -> Int {
let count = (try! Realm().objects(RealmUnindexed.self).filter("val >= %@", minVal)).count
return count
}
func getCountIndexedObjsViaWhereClosure(_ minVal: Int = 50000) -> Int {
let count = (try! Realm().objects(RealmIndexed.self).where { $0.val >= minVal }).count
return count
}
func getCountUnindexedObjsViaWhereClosure(_ minVal: Int = 50000) -> Int {
let count = (try! Realm().objects(RealmUnindexed.self).where { $0.val >= minVal }).count
return count
}
func testRealmSpeed() {
TimeIt.time("realm init6") { _ = try! Realm() }
TimeIt.time("baseCountIndexed") { print(baseCountIndexed()) }
TimeIt.time("baseCountUnindexed") { print(baseCountUnindexed()) }
TimeIt.time("getCountIndexedObjsViaFilterString") { print(getCountIndexedObjsViaFilterString()) }
TimeIt.time("getCountIndexedObjsViaWhereClosure") { print(getCountIndexedObjsViaWhereClosure()) }
TimeIt.time("getCountUnindexedObjsViaFilterString") { print(getCountUnindexedObjsViaFilterString()) }
TimeIt.time("getCountUnindexedObjsViaWhereClosure") { print(getCountUnindexedObjsViaWhereClosure()) }
}
generateObjs(10)
testRealmSpeed()

This is not exactly an answer but perhaps additional info.
I don't think you're doing anything wrong, but perhaps a simpler test will be more revealing. I set up two similar objects, one with an indexed val property and one without, and tested for equality (val == 5). Here's the setup:
For brevity, I'm omitting the writing code, but it creates 1 million of each object containing the values 5 and 9 alternately, e.g. Realm will contain a million indexed objects 5, 9, 5, 9, etc. and a million non-indexed objects 5, 9, 5, 9, etc. (5 and 9 are just arbitrary numbers I picked.)
Then there is a function to test each object type. I queried for 5 so it would return half a million results:
func testNotIndexed() {
Task {
let realm = try Realm()
let startTime = Date()
let results = realm.objects(NotIndexedClass.self).where { $0.val == 5 }
let elapsed = Date().timeIntervalSince(startTime)
print("Not Indexed took: \(elapsed * 1000) ms")
}
}
func testIndexed() {
Task {
let realm = try Realm()
let startTime = Date()
let results = realm.objects(IndexedClass.self).where { $0.val == 5 }
let elapsed = Date().timeIntervalSince(startTime)
print("Indexed took: \(elapsed * 1000) ms")
}
}
and the repeatable results
Not Indexed took: 1.2680292129516602 ms
Indexed took: 0.44596195220947266 ms
So the indexed query took roughly 1/3 the time.
If the test is changed to objects containing the numbers 0 to 999,999 and the query is for all numbers > 50,000 (rather than 500k, to increase the size of the returned dataset), the results are similar.

Is there a way to use a computed property with a parameter in Core Data?

This might be a weird question because I don't fully understand how transient and the new derived properties work in Core Data.
So imagine I have a RegularClass entity, which stores any class that repeats over time. A class can repeat for example every third day or every one week. This is how it looks in the data model:
(a RegularClass belongs to a Schedule entity, which in turn belongs to a Course entity)
Now, if our class repeats every third day, we store the number 3 in the frequency property, and a string "days" in unit property, which is then converted to an enum in Swift. A Schedule, which every RegularClass belongs to, has a startDate property.
To check whether a class happens on a given date, the best I came up with is to calculate the difference, in the specified unit, between startDate and the given date, then take the remainder of that difference divided by frequency; if it's 0, the class can occur on that date.
var differenceComponent: Calendar.Component {
switch unitType {
case .weeks:
return .weekOfYear
case .days:
return .day
}
}
func getDifferenceFromDateComponents(_ dateComponents: DateComponents) -> Int? {
switch unitType {
case .weeks:
return dateComponents.weekOfYear
case .days:
return dateComponents.day
}
}
func dateIsInActiveState(_ date: Date) -> Bool {
if let startDate = schedule?.startDate {
let comps = Calendar.current.dateComponents([differenceComponent], from: startDate, to: date)
if let difference = getDifferenceFromDateComponents(comps) {
let remainder = Int64(difference) % frequency // that is the key!
return remainder == 0
}
}
return false
}
func containsDate(_ date: Date) -> Bool {
if dateIsInActiveState(date) {
if unitType == .days {
return true
}
let weekday = Calendar.current.component(.weekday, from: date)
return (weekdays?.allObjects as? [Weekday])?.contains(where: { $0.number == weekday }) ?? false
}
return false
}
Now, the thing is that this code works perfectly for courses that I've already got from a fetch request. But is there a way to pass a date parameter in a NSPredicate to calculate this while request happens? Or do I have to fetch all the courses and then filter them out manually?

To solve this issue you could store your data as scalar types and then do simple arithmetic in your predicate. Rather than dates, use integers with a days-from-day-zero figure (or whatever minimum unit of time is necessary for these calculations). Store your repeat cycle as number-of-days.
Then you can use the calculation ((searchDate - startDate) mod repeatCycle) == 0 in your predicate to find matching classes.
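A minimal plain-Swift sketch of that arithmetic, assuming dates are converted to whole-day indices counted from a fixed day zero in a chosen calendar. dayIndex(of:dayZero:calendar:) and occurs(on:startDay:repeatCycle:) are hypothetical helpers; in Core Data the integers would be stored attributes and the same modulo test would go in the predicate.

```swift
import Foundation

// Hypothetical helper: whole days from `dayZero` to `date`, counted in `calendar`.
func dayIndex(of date: Date, dayZero: Date, calendar: Calendar) -> Int {
    let zeroDay = calendar.startOfDay(for: dayZero)
    let targetDay = calendar.startOfDay(for: date)
    return calendar.dateComponents([.day], from: zeroDay, to: targetDay).day ?? 0
}

// A class starting on `startDay` and repeating every `repeatCycle` days occurs on
// `searchDay` when ((searchDay - startDay) mod repeatCycle) == 0.
// Days before the start are rejected explicitly: Swift's % keeps the sign of the
// dividend, so e.g. (-3) % 3 == 0 would otherwise count as a match.
func occurs(on searchDay: Int, startDay: Int, repeatCycle: Int) -> Bool {
    guard repeatCycle > 0, searchDay >= startDay else { return false }
    return (searchDay - startDay) % repeatCycle == 0
}
```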
As you have suggested, it might be sensible to denormalise your data for different search cases.

CoreData fetching issue: fetching entities with an array of specific days

I have saved entities with an attribute called "date". My goal is to fetch objects for a specific set of days. As Date (NSDate) objects are specific moments in time, I'm forced to create "day ranges" for every day I want to fetch.
How can I approach this effectively? Should I create N predicates for N days, or one big predicate with "AND" clauses? Are there clever ways to approach this common goal?

If you need your data (that is, all events) multiple times, you should not fetch it on every request; it's better to fetch once and then filter.
For example:
var allEvents = [Event]()
...
let fetchRequest = NSFetchRequest<Event>(entityName: "events")
allEvents = try context.fetch(fetchRequest)
And then, when you need them:
1. for a specific date
let calendar = Calendar.current
eventsForSelectedDay = allEvents.filter { calendar.isDate($0.yourDateProperty, inSameDayAs: self.currentSelectedDate) }
2. for a date range
eventsForSelectedDays = allEvents.filter { $0.yourDateProperty >= dateFrom && $0.yourDateProperty < dateTo }
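For the question's "specific set of days" case, a minimal sketch in the same fetch-once-then-filter spirit: normalize each wanted day to its start of day and keep the events whose start of day is in that set, instead of building one range per day. EventRecord and filterEvents(_:onAnyOf:calendar:) are hypothetical names; with Core Data the array would be the fetched Event objects.

```swift
import Foundation

// Hypothetical stand-in for the Core Data `Event` entity.
struct EventRecord {
    let title: String
    let date: Date
}

// Keeps the events whose date falls on any of `days` in `calendar`.
// Every day is normalized to its start of day, so one Set lookup per event
// replaces a separate range check for each requested day.
func filterEvents(_ all: [EventRecord], onAnyOf days: [Date], calendar: Calendar) -> [EventRecord] {
    let wantedDays = Set(days.map { calendar.startOfDay(for: $0) })
    return all.filter { wantedDays.contains(calendar.startOfDay(for: $0.date)) }
}
```

Passing the calendar explicitly keeps the notion of "day" (time zone included) under the caller's control; Calendar.current is the usual choice in an app.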

Performing functions with valueForKey

I am trying to get my app to perform calculations. I have two attributes per item (quantity and price) that I want to multiply together and then total for all the items selected with didSelectRow on the list. There are two sections in my tableView: items start in section 0 and are moved to section 1 by didSelectRow. (I only explain this because it comes into play further down.)
My code so far is:
`func cartTotalFunc() {
itemFetchRequest().returnsObjectsAsFaults = false
do {
let results = try moc.executeFetchRequest(itemFetchRequest())
print("===\(results)")
// Calculate the grand total.
var grandTotal = 0
for order in results {
let SLP = order.valueForKey("slprice") as! Int
let SLQ = order.valueForKey("slqty") as! Int
grandTotal += SLP * SLQ
}
print("\(grandTotal)")
cartTotal.text = "$\(grandTotal)" as String
} catch let error as NSError {
print(error)
}
}
`
slprice and slqty are strings in Core Data. I am trying to cast them as Int so I can do the arithmetic. I had this working, but it totaled every item instead of only the crossed-off ones (section 1). I gave it a rest for a while, and now that I've come back to it, Xcode gives me the error "Could not cast value of type 'NSTaggedPointerString' (0x104592ae8) to 'NSNumber' (0x1051642a0)."
Can anyone help with this, please?
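A sketch of one likely fix, assuming slprice and slqty hold whole numbers as text: read them as strings and convert with Int(_:) instead of force-casting with as! Int, which is what triggers the NSTaggedPointerString-to-NSNumber error. CartLine and grandTotal(of:) are hypothetical; in the loop above the values would come from order.valueForKey(...) as? String. Restricting the total to the section 1 items is not covered here.

```swift
// Hypothetical stand-in for one cart item; Core Data stores both values as String.
struct CartLine {
    let slprice: String
    let slqty: String
}

// Converts each string with Int(_:) rather than force-casting with `as! Int`.
// Lines whose text is not a valid whole number are skipped instead of crashing.
func grandTotal(of lines: [CartLine]) -> Int {
    var total = 0
    for line in lines {
        guard let price = Int(line.slprice), let qty = Int(line.slqty) else { continue }
        total += price * qty
    }
    return total
}
```

If prices can include cents, storing them as a numeric attribute (or parsing into Decimal) would avoid both the string conversion and rounding surprises.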

HealthKit Running Splits In Kilometres Code Inaccurate – Why?

Below is the code I've got so far; I can't figure out why I'm getting inaccurate data.
I'm not accounting for the pause events yet, but that should not affect the inaccuracies in the first two kilometres...
The output should be each 1 km split and the duration that kilometre took.
Any ideas for improvement? Please help.
func getHealthKitWorkouts(){
print("HealthKit Workout:")
/* Boris here: Looks like we need some sort of Health Kit manager */
let healthStore:HKHealthStore = HKHealthStore()
let durationFormatter = NSDateComponentsFormatter()
var workouts = [HKWorkout]()
// Predicate to read only running workouts
let predicate = HKQuery.predicateForWorkoutsWithWorkoutActivityType(HKWorkoutActivityType.Running)
// Order the workouts by date
let sortDescriptor = NSSortDescriptor(key:HKSampleSortIdentifierStartDate, ascending: false)
// Create the query
let sampleQuery = HKSampleQuery(sampleType: HKWorkoutType.workoutType(), predicate: predicate, limit: 0, sortDescriptors: [sortDescriptor])
{ (sampleQuery, results, error ) -> Void in
if let queryError = error {
print( "There was an error while reading the samples: \(queryError.localizedDescription)")
}
workouts = results as! [HKWorkout]
let target:Int = 0
print(workouts[target].workoutEvents)
print("Energy ", workouts[target].totalEnergyBurned)
print(durationFormatter.stringFromTimeInterval(workouts[target].duration))
print((workouts[target].totalDistance!.doubleValueForUnit(HKUnit.meterUnit())))
self.coolMan(workouts[target])
self.coolManStat(workouts[target])
}
// Execute the query
healthStore.executeQuery(sampleQuery)
}
func coolMan(let workout: HKWorkout){
let expectedOutput = [
NSTimeInterval(293),
NSTimeInterval(359),
NSTimeInterval(359),
NSTimeInterval(411),
NSTimeInterval(810)
]
let healthStore:HKHealthStore = HKHealthStore()
let distanceType = HKObjectType.quantityTypeForIdentifier(HKQuantityTypeIdentifierDistanceWalkingRunning)
let workoutPredicate = HKQuery.predicateForObjectsFromWorkout(workout)
let startDateSort = NSSortDescriptor(key: HKSampleSortIdentifierStartDate, ascending: true)
let query = HKSampleQuery(sampleType: distanceType!, predicate: workoutPredicate,
limit: 0, sortDescriptors: [startDateSort]) {
(sampleQuery, results, error) -> Void in
// Process the detailed samples...
if let distanceSamples = results as? [HKQuantitySample] {
var count = 0.00, countPace = 0.00, countDistance = 0.0, countPacePerMeterSum = 0.0
var countSplits = 0
var firstStart = distanceSamples[0].startDate
let durationFormatter = NSDateComponentsFormatter()
print("🕒 Time Splits: ")
for (index, element) in distanceSamples.enumerate() {
count += element.quantity.doubleValueForUnit(HKUnit.meterUnit())
/* Calculate Pace */
let duration = ((element.endDate.timeIntervalSinceDate(element.startDate)))
let distance = distanceSamples[index].quantity
let pacePerMeter = distance.doubleValueForUnit(HKUnit.meterUnit()) / duration
countPace += duration
countPacePerMeterSum += pacePerMeter
if count > 1000 {
/* Account for extra bits */
let percentageUnder = (1000 / count)
//countPace = countPace * percentageUnder
// 6.83299013038 * 2.5
print("👣 Reached Kilometer \(count) ")
// MARK: Testing
let testOutput = durationFormatter.stringFromTimeInterval(NSTimeInterval.init(floatLiteral: test)),
testOutputExpected = durationFormatter.stringFromTimeInterval(expectedOutput[countSplits])
print(" Output Accuracy (", round(test - expectedOutput[countSplits]) , "): expected \(testOutputExpected) versus \(testOutput)")
print(" ", firstStart, " until ", element.endDate)
/* Print The Split Time Taken */
firstStart = distanceSamples[index].endDate;
count = (count % 1000) //0.00
countPace = (count % 1000) * pacePerMeter
countSplits++
/* Noise
\(countSplits) – \(count) – Pace \(countPace) – Pace Per Meter \(pacePerMeter) – Summed Pace Per Meter \(countPacePerMeterSum) – \(countPacePerMeterSum / Double.init(index))"
*/
}
/* Account for the last entry */
if (distanceSamples.count - 1 ) == index {
print("We started a kilometer \(countSplits+1) – \(count)")
let pacePerKM = (count / countPace) * 1000
print(durationFormatter.stringFromTimeInterval(NSTimeInterval.init(floatLiteral: (pacePerKM ))))
}
}
}else {
// Perform proper error handling here...
print("*** An error occurred while adding a sample to " + "the workout: \(error!.localizedDescription)")
abort()
}
}
healthStore.executeQuery(query)
}
func coolManStat(let workout: HKWorkout){
let healthStore:HKHealthStore = HKHealthStore()
let stepsCount = HKQuantityType.quantityTypeForIdentifier(HKQuantityTypeIdentifierDistanceWalkingRunning)
let sumOption = HKStatisticsOptions.CumulativeSum
let statisticsSumQuery = HKStatisticsQuery(quantityType: stepsCount!, quantitySamplePredicate: HKQuery.predicateForObjectsFromWorkout(workout),
options: sumOption)
{ (query, result, error) in
if let sumQuantity = result?.sumQuantity() {
let numberOfSteps = Int(sumQuantity.doubleValueForUnit(HKUnit.meterUnit()))/1000
print("👣 Right -O: ",numberOfSteps)
}
}
healthStore.executeQuery(statisticsSumQuery)
}

I'm sure you're past this problem by now, more than two years later! But I'm sure someone else will come across this thread in the future, so I thought I'd share the answer.
I started off with a version of your code (many thanks!!) and encountered the same problems. I had to make a few changes. Not all of those changes are related to the issues you were seeing, but in any case, here are all of the considerations I've thought of so far:
Drift
You don't handle the 'drift', although this isn't what's causing the big inaccuracies in your output. What I mean is that your code says:
if count > 1000
But you don't do anything with the remainder over 1000, so your kilometre time isn't for 1000 m; it's for, say, 1001 m. So your time is both inaccurate for the current km and includes some of the running from the next km, so that split will be wrong too. Over a long run this could start to cause noticeable problems; over short runs I don't think the difference is significant, but it's definitely worth fixing. In my code I assume that the runner was moving at a constant pace during the current sample (obviously not perfect, but I don't think there's a better way). I then find the fraction of the current sample's distance that takes the split over 1000 m, remove that same fraction of the sample's duration from the current km's time, and add it (and the extra distance) to the next split.
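A minimal sketch of the drift handling described above, in plain Swift so it runs without HealthKit. It assumes a hypothetical DistanceSample type holding metres, start and end (in real code these would come from each HKQuantitySample's quantity in metres, startDate and endDate), assumes constant pace within a sample, and ignores GPS gaps and pauses.

```swift
import Foundation

// Hypothetical simplified distance sample: metres covered between start and end.
struct DistanceSample {
    let metres: Double
    let start: Date
    let end: Date
}

// Returns the duration (seconds) of each complete split of `splitLength` metres.
// When a sample crosses a split boundary, its duration is divided in proportion to
// the distance on each side (constant pace within the sample), and the leftover
// distance and time carry over into the next split. A trailing partial split is not returned.
func splitTimes(_ samples: [DistanceSample], splitLength: Double = 1000) -> [TimeInterval] {
    var splits: [TimeInterval] = []
    var splitDistance = 0.0
    var splitTime: TimeInterval = 0
    for sample in samples {
        var remainingDistance = sample.metres
        var remainingTime = sample.end.timeIntervalSince(sample.start)
        // One sample may cross more than one boundary.
        while remainingDistance > 0, splitDistance + remainingDistance >= splitLength {
            let needed = splitLength - splitDistance
            let timeForNeeded = remainingTime * (needed / remainingDistance)
            splits.append(splitTime + timeForNeeded)
            remainingDistance -= needed
            remainingTime -= timeForNeeded
            splitDistance = 0
            splitTime = 0
        }
        splitDistance += remainingDistance
        splitTime += remainingTime
    }
    return splits
}
```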
GPS drops
The real problem with your results is that you don't handle GPS drops. The way I'm currently handling this is to compare the startDate of the current sample with the endDate of the previous sample. If they're not the same then there was a GPS drop. You need to add the difference between the previous endDate and the current startDate to the current split. Edit: you also need to do this with the startDate of the activity and the startDate of the first sample. There will be a gap between these 2 dates while GPS was connecting.
Pauses
There's a slight complication to the above GPS dropping problem. If the user has paused the workout then there will also be a difference between the current sample's startDate and the previous sample's endDate. So you need to be able to detect that and not adjust the split in that case. However, if the user's GPS dropped and they also paused during that time then you'll need to subtract the pause time from the missing time before adding it to the split.
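A minimal sketch of the GPS-drop and pause adjustment described in the two sections above, in plain Swift. It assumes a hypothetical TimedSample type (start and end of each distance sample) and that pauses have already been turned into non-overlapping DateInterval values (for example from the workout's pause and resume events). For each sample it returns the unrecorded time in the gap before it, including the gap between the workout start and the first sample, minus any paused time in that gap; that is the amount to add to the split in progress.

```swift
import Foundation

// Hypothetical sample type: only the timestamps matter here.
struct TimedSample {
    let start: Date
    let end: Date
}

// Seconds shared by two intervals (0 when they do not overlap).
func overlap(_ a: DateInterval, _ b: DateInterval) -> TimeInterval {
    return a.intersection(with: b)?.duration ?? 0
}

// For each sample: the gap since the previous sample ended (or since the workout
// started, for the first sample), minus paused time inside that gap.
func missingTimeBeforeEachSample(workoutStart: Date,
                                 samples: [TimedSample],
                                 pauses: [DateInterval]) -> [TimeInterval] {
    var result: [TimeInterval] = []
    var previousEnd = workoutStart
    for sample in samples {
        var missing: TimeInterval = 0
        if sample.start > previousEnd {
            let gap = DateInterval(start: previousEnd, end: sample.start)
            let paused = pauses.reduce(0.0) { $0 + overlap(gap, $1) }
            missing = max(0, gap.duration - paused)
        }
        result.append(missing)
        previousEnd = max(previousEnd, sample.end)
    }
    return result
}
```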
Unfortunately, my splits are still not 100% in sync with the Apple Workouts app. But they've gone from being potentially minutes off to being mostly within 1 second. The worst I've seen is 3 seconds. I've only been working on this for a couple of hours, so I plan to continue trying to get 100% accuracy. I'll update this answer if I get that. But I believe I've covered the major problems here.
