iOS CloudKit is slow on querying heavy CKAsset (even with QoS) - swift

I am using CloudKit to download CoreML (machine learning) models. They are about 90MB each. I have the public database and default zone with one custom 'ML' record type.
I query this 'ML' record type by id, and it takes more than a minute to get a response in the completion block (it should be a matter of seconds). I've tried the production environment, setting the quality of service, and different ways of querying, all with the same result (very slow).
I wonder if I'm missing something or if there is any other way of downloading the ML models that is faster?
Here's my current code:
let arrayPredicate = NSPredicate(format: "id == %@", id) // note: %@ is the correct format specifier; %# will not match anything
let query = CKQuery(recordType: "ML", predicate: arrayPredicate)
let queryOperation = CKQueryOperation(query: query)
queryOperation.qualityOfService = .userInteractive
queryOperation.resultsLimit = 1
queryOperation.recordFetchedBlock = { record in
    // This gets called 60+ seconds later
}
queryOperation.queryCompletionBlock = { cursor, error in
    // Same here
}
publicDB.add(queryOperation)
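As an aside, if the CKRecord.ID is known (or derivable from id), a CKFetchRecordsOperation skips the query index entirely and also reports per-record download progress. A minimal sketch of that alternative, assuming for illustration that the record name equals id and the asset lives under a key named "model" (both assumptions, not from the original code):

// Sketch: fetch directly by record ID instead of querying.
let recordID = CKRecord.ID(recordName: id)
let fetchOperation = CKFetchRecordsOperation(recordIDs: [recordID])
fetchOperation.qualityOfService = .userInitiated
fetchOperation.perRecordProgressBlock = { recordID, progress in
    // progress runs 0.0...1.0 — handy for driving a progress bar
    print("Download progress: \(progress)")
}
fetchOperation.perRecordCompletionBlock = { record, recordID, error in
    guard let asset = record?["model"] as? CKAsset else { return }
    // asset.fileURL points at the downloaded model file on disk
    print("Model downloaded to: \(asset.fileURL)")
}
publicDB.add(fetchOperation)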

I switched to Firebase Storage to test, and the result was only slightly faster. rmdaddy and TommyBs were right in their line of thought: CloudKit might be a bit slower because you first need to query a record, but the download itself runs at a similar speed.
My final solution was to use Firebase Storage, as it makes it easy to track download progress and show it in the UI while the user waits.
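For reference, a sketch of what that Firebase Storage download with progress reporting might look like, using the Storage.storage() API naming (the storage path and file names are placeholders, not from the original post):

// Download the model file to a local URL and observe progress.
let modelRef = Storage.storage().reference().child("models/\(id).mlmodel")
let localURL = FileManager.default.temporaryDirectory.appendingPathComponent("\(id).mlmodel")
let downloadTask = modelRef.write(toFile: localURL)
downloadTask.observe(.progress) { snapshot in
    if let progress = snapshot.progress {
        // Drive a progress bar while the user waits.
        print("Downloaded \(progress.fractionCompleted * 100)%")
    }
}
downloadTask.observe(.success) { snapshot in
    // localURL now contains the model; compile/load it from here.
    print("Model saved to \(localURL)")
}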

Related

Splitting up Firebase Snapshots into smaller pieces?

For testing purposes, I created a specific node on my Firebase database. I copy a user over to that node and then can futz with it without worrying about corrupting data or ruining a user's info. It works really well for my purposes.
I've run into a problem, however. If a user has an extremely large set of data, the copy function won't work. It just stalls. I don't get any errors, though. I read that Firebase has copy limits of 1MB, and I'm guessing that's the problem. I'm running up against that wall, I think.
Here is my code:
func copyToTestingNode() {
    let start = Date()
    // 1. create copy of user and then modify the copy
    guard var copiedUser = user else { print("copied user error"); return }
    copiedUser.userID = MP.adminID
    copiedUser.householdInfo.subscriptionExpiryDate = 2500000000
    // 2. get a snapshot of the copied user's info
    ref.child(user.userID).observeSingleEvent(of: .value) { (userSnapshot) in
        print("Step 2 TRT:", Date().timeIntervalSince(start))
        // 3. remove any existing data at admin node, and then...
        self.ref.child(MP.adminID).removeValue { (error, dbRef) in
            print("Step 3 TRT:", Date().timeIntervalSince(start))
            // 4. ...copy the new user info to the admin node
            self.ref.child(MP.adminID).setValue(userSnapshot.value, withCompletionBlock: { (error, adminRef) in
                print("Step 4 TRT:", Date().timeIntervalSince(start))
                // 5. then send user alert and stop activity indicator
                self.activityIndicator.stopAnimating()
                self.showSimpleAlert(alertTitle: "Copy Complete", alertMessage: "Your copy of \(copiedUser.householdInfo.userName) is complete and can be found under the new node:\n\n\(copiedUser.householdInfo.userName) Family")
            })
        }
    }
}
Options:
Is there a simple way to check the size of the DataSnapshot to alert me that the dataset is too large to copy over?
Is there a simple way to split up the snapshot into smaller pieces and overcome the 1MB limit that way?
Should I use Cloud Functions instead of trying to trigger this on a device?
Is there a way to somehow "compress" the snapshot to be smaller so that I can copy it easier?
I'm open to suggestions.
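On option 1, one rough way to gauge a snapshot's size before copying is to serialize its value to JSON and measure the byte count. A sketch, with the caveat that this approximates the payload rather than the exact wire size:

// Rough size check: serialize the snapshot value and count the bytes.
func approximateSize(of snapshot: DataSnapshot) -> Int {
    guard let value = snapshot.value,
          JSONSerialization.isValidJSONObject(value),
          let data = try? JSONSerialization.data(withJSONObject: value) else {
        return 0
    }
    return data.count
}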
UPDATE #1
I read about the size limitation HERE. Judging from Frank's reaction, I'm guessing my understanding of that limitation is wrong.
I downloaded the node from the Firebase console and checked its size. It's 799 KB on my hard drive. It's a large JSON tree, and so I thought that its size must be the reason why it won't copy over. The smaller nodes copy over no problem. Just the large ones have trouble.
UPDATE #2
I'm not sure how to show the actual data, other than a screenshot, seeing how large the JSON tree is. So here is a screenshot:
As you can see, the data has multiple nodes, some of which are larger than others. I suppose I can cut down the 'Job Jar' node, but the rest really need to be that size for everything to work properly.
Granted, this is one of the largest datasets I have among all my users, but the structure doesn't change.
As for the speed of execution for each line of code, here are the simulator times for each numbered step:
Step 2 TRT: 0.5278879404067993
Step 3 TRT: 0.6249579191207886
Step 4 TRT: 1.8466829061508179
ALL DONE COPYING!!
This only works for the smaller datasets. For the larger ones, I never get to step 4. It just hangs. I let it run for several minutes, but no change.
Final version that seems to work:
func copyToTestingNode() {
    // 1. create copy of user and then modify the copy
    guard var copiedUser = user else { print("copied user error"); return }
    let adminRef = ref.child(MP.adminID)
    copiedUser.userID = MP.adminID
    copiedUser.householdInfo.subscriptionExpiryDate = 2500000000
    // 2. get a snapshot of the copied user's info
    ref.child(user.userID).observeSingleEvent(of: .value) { (userSnapshot) in
        // 3. remove any existing data at admin node, and then...
        adminRef.removeValue { (error, dbRef) in
            if error != nil { print("Yikes!") }
            // 4. ...copy the new user info to the admin node one child at a time (if user has a lot of data)
            var totalNodesCopied = 0
            for item in userSnapshot.children {
                guard let snap = item as? DataSnapshot else { print("snap error"); return }
                adminRef.child(snap.key).setValue(snap.value) { (error, adminRef) in
                    totalNodesCopied += 1
                    if totalNodesCopied == userSnapshot.childrenCount {
                        print("ALL DONE COPYING!!")
                    }
                }
            }
        }
    }
}
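If I had to guess why the chunked version works: each top-level child becomes its own setValue write, so no single payload comes close to whatever per-write size ceiling the one-shot copy was hitting.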

process Swift DispatchQueue without affecting resource

I have a Swift DispatchQueue that receives data at 60fps.
However, depending on the phone or the amount of data received, the computation becomes too expensive to run at 60fps. In actuality, it is okay to process only half of the data, or as much as the available computational resources allow.
let queue = DispatchQueue(label: "com.test.dataprocessing")

func processData(data: SomeData) {
    queue.async {
        // data processing
    }
}
Does DispatchQueue somehow allow me to drop some data if a resource is limited? Currently, it is affecting the main UI of SceneKit. Or, is there something better than DispatchQueue for this type of task?
There are a couple of possible approaches:
The simple solution is to keep track of a Bool indicating whether a task is in progress, and when more data comes in, only process it if one isn't already running:
private var inProgress = false
private var syncQueue = DispatchQueue(label: Bundle.main.bundleIdentifier! + ".sync.progress") // for reasons beyond the scope of this question, reader-writer with concurrent queue is not appropriate here
private let processQueue = DispatchQueue(label: Bundle.main.bundleIdentifier! + ".process") // the queue doing the actual work (declaration added for completeness; label assumed)

func processData(data: SomeData) {
    let isAlreadyRunning = syncQueue.sync { () -> Bool in
        if self.inProgress { return true }
        self.inProgress = true
        return false
    }
    if isAlreadyRunning { return }

    processQueue.async {
        defer {
            self.syncQueue.async { self.inProgress = false }
        }
        // process `data`
    }
}
All of that syncQueue stuff is to make sure that I have thread-safe access to the inProgress property. But don't get lost in those details; use whatever synchronization mechanism you want (e.g. a lock or whatever). All we want to make sure is that we have thread-safe access to the Bool status flag.
Focus on the basic idea, that we'll keep track of a Bool flag to know whether the processing queue is still tied up processing the prior set of SomeData. If it is busy, return immediately and don't process this new data. Otherwise, go ahead and process it.
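For illustration, here is what that same flag might look like guarded by an NSLock instead of a serial queue — a minimal sketch, not the answer's original code, reusing the processQueue from above:

private let lock = NSLock()
private var inProgress = false

func processData(data: SomeData) {
    // Atomically test-and-set the flag under the lock.
    lock.lock()
    let isAlreadyRunning = inProgress
    if !isAlreadyRunning { inProgress = true }
    lock.unlock()
    if isAlreadyRunning { return }

    processQueue.async {
        defer {
            // Clear the flag under the same lock when processing finishes.
            self.lock.lock()
            self.inProgress = false
            self.lock.unlock()
        }
        // process `data`
    }
}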
While the above approach is conceptually simple, it won't offer great performance. For example, if your processing of data always takes 0.02 seconds (50 times per second) and your input data is coming in at a rate of 60 times per second, you'll end up getting 30 of them processed per second.
A more sophisticated approach is to use a GCD user data source, something that says "run the following closure when the destination queue is free". And the beauty of these dispatch user data sources is that they coalesce pending events together. These data sources are useful for decoupling the speed of the inputs from the processing of them.
So, you first create a data source that simply indicates what should be done when data comes in:
private var dataToProcess: SomeData?
private lazy var source = DispatchSource.makeUserDataAddSource(queue: processQueue)

func configure() {
    source.setEventHandler { [unowned self] in
        guard let data = self.syncQueue.sync(execute: { self.dataToProcess }) else { return }
        // process `data`
    }
    source.resume()
}
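(Presumably configure() is called once, e.g. in init or viewDidLoad, before the first data arrives, so the event handler is installed and the source resumed.)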
So, when there's data to process, we update our synchronized dataToProcess property and then tell the data source that there is something to process:
func processData(data: SomeData) {
    syncQueue.async { self.dataToProcess = data }
    source.add(data: 1)
}
Again, just like the previous example, we're using syncQueue to synchronize our access to some property across multiple threads. But this time we're synchronizing dataToProcess rather than the inProgress state variable we used in the first example. The idea is the same, though: we must be careful to synchronize our interaction with a property across multiple threads.
Anyway, using this pattern with the above scenario (input coming in at 60 fps, whereas processing can only handle 50 per second), the resulting performance is much closer to the theoretical max of 50 fps (I got between 42 and 48 fps depending upon the queue priority), rather than 30 fps.
The latter approach can conceivably lead to more frames (or whatever you're processing) being processed per second, and it results in less idle time on the processing queue. The following image attempts to illustrate how the two alternatives compare: in the former approach, you'll lose every other frame of data, whereas the latter approach only loses a frame when two separate sets of input data arrive before the processing queue becomes free and are coalesced into a single call to the dispatch source.

Parallel.Foreach and BulkCopy

I have a C# library which connects to 59 servers with the same database structure and imports data into one table in my local db. At the moment I am retrieving data server by server in a foreach loop:
foreach (var systemDto in systems)
{
    var sourceConnectionString = _systemService.GetConnectionStringAsync(systemDto.Ip).Result;
    var dbConnectionFactory = new DbConnectionFactory(sourceConnectionString,
        "System.Data.SqlClient");
    var dbContext = new DbContext(dbConnectionFactory);
    var storageRepository = new StorageRepository(dbContext);
    var usedStorage = storageRepository.GetUsedStorageForCurrentMonth();

    var dtUsedStorage = new DataTable();
    dtUsedStorage.Load(usedStorage);

    var dcIp = new DataColumn("IP", typeof(string)) { DefaultValue = systemDto.Ip };
    var dcBatchDateTime = new DataColumn("BatchDateTime", typeof(string))
    {
        DefaultValue = batchDateTime
    };
    dtUsedStorage.Columns.Add(dcIp);
    dtUsedStorage.Columns.Add(dcBatchDateTime);

    using (var blkCopy = new SqlBulkCopy(destinationConnectionString))
    {
        blkCopy.DestinationTableName = "dbo.tbl";
        blkCopy.WriteToServer(dtUsedStorage);
    }
}
Because there are many systems to retrieve data from, I wonder if it is possible to use a Parallel.ForEach loop. Will SqlBulkCopy lock the table during WriteToServer, and will the next WriteToServer wait until the previous one completes?
-- EDIT 1
I've changed the foreach to Parallel.ForEach, but I'm facing one problem. Inside the loop I have an async method, _systemService.GetConnectionStringAsync(systemDto.Ip),
and this line returns an error:
System.NotSupportedException: A second operation started on this
context before a previous asynchronous operation completed. Use
'await' to ensure that any asynchronous operations have completed
before calling another method on this context. Any instance members
are not guaranteed to be thread safe.
Any ideas how I can handle this?
In general, it will get blocked and will wait until the previous operation completes.
There are some factors that affect whether SqlBulkCopy can run in parallel or not.
I remember that when I added a parallel feature to my .NET bulk operations library, I had a hard time making it work correctly in parallel, and it only worked well when the table had no index (which is practically never the case).
Even when it worked, the performance gain was not much.
Perhaps you will find more information here: MSDN - Importing Data in Parallel with Table Level Locking

Firebase query slow code execution: append to table

I was hoping I could get some help optimising my code. I'm new to development, so please be kind.
Currently it works, but it takes quite a long time (10-15 sec) to load the first table view I need in my app.
First I thought that I had not activated "persistence" properly, but I am starting to suspect that the way I am loading data is suboptimal.
The "large" (12k+ items) dataset I use doesn't change that frequently, so the ideal solution would be to load it once and then listen for changes. I thought that was what I was doing, but if so, I don't understand why it is so slow. So I now suspect it's the way I append the data every time, instead of just reading/loading from "somewhere local" and then listening for changes from the server.
Any help is appreciated
// read from Firebase, adjusted to whiskies
func startObservingDB() {
    dbRef.queryOrdered(byChild: "brand_name").observe(.value, with: { (snapshot: FIRDataSnapshot) in
        var newWhisky = [WhiskyItem]()
        // for loop to iterate through the snapshot
        for whiskyItem in snapshot.children {
            let whiskyObject = WhiskyItem(snapshot: whiskyItem as! FIRDataSnapshot)
            newWhisky.append(whiskyObject)
        }
        // update
        self.whiskies = newWhisky
        print("WhiskyItem")
        self.tableView.reloadData()
    }) { (error: Error) in
        print(error.localizedDescription)
    }
}
Firebase structure: /Results/Index/name: xxx, "other thing1": xxxx,..., "other thing32": xxxx
I'm not sure that it's a good idea to store all 12,000 items on your phone.
Maybe this would be a good solution for you:
You can use this lib to (for example):
1) load data for 100 rows
2) scroll to the end
3) load another 100 rows.
Hope it helps.
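Along those lines, here is a minimal pagination sketch using Firebase's own query operators rather than any particular library (the page size of 100 and the use of brand_name as the cursor are assumptions):

// Load one page of 100 whiskies, continuing after the last brand name seen.
// Pass nil for the first page.
func loadNextPage(after lastBrandName: String?) {
    var query: FIRDatabaseQuery = dbRef.queryOrdered(byChild: "brand_name")
    if let lastBrandName = lastBrandName {
        // queryStarting(atValue:) is inclusive, so in a real app you'd skip
        // the first, duplicated item of each subsequent page when appending.
        query = query.queryStarting(atValue: lastBrandName)
    }
    query.queryLimited(toFirst: 100).observeSingleEvent(of: .value) { snapshot in
        for child in snapshot.children {
            let whisky = WhiskyItem(snapshot: child as! FIRDataSnapshot)
            self.whiskies.append(whisky)
        }
        self.tableView.reloadData()
    }
}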

Identify user's high score ranking in Parse back-end

I have developed a simple Swift iOS game with high scores stored in Parse. Saving and retrieving data works fine. What I'd like to do now is implement a "user ranking" feature, which would show how the user's high score ranks against other players'. In practice this would mean that I'd need:
The total count of high scores in Parse
The ranking of the user's high score on that list
If Parse did not have any query limits, this would be relatively easy to implement, even for a newbie coder like myself. However, it does, and I just can't figure out how to implement this in a way that would still work (1) efficiently, and (2) even if there were, say, 100,000 high scores.
What would be a workable way of identifying the current user's ranking amongst all other high scores stored in Parse? I don't want to use countObjects as I believe it fails when the number of objects gets high.
Were you using countObjectsInBackground without a completion block? If so, I'd believe you could run into asynchronous issues; it's very easy, and very safe, to use the countObjectsInBackgroundWithBlock method instead.
func getUserPosition() {
    let totalQuery = PFQuery(className: "highScores")
    totalQuery.countObjectsInBackgroundWithBlock { (number, error) -> Void in
        if error == nil {
            // your total number is the `number` var passed here
            let positionQuery = PFQuery(className: "highScores")
            positionQuery.whereKey("score", greaterThanOrEqualTo: userHighScore)
            positionQuery.countObjectsInBackgroundWithBlock({ (position, error) -> Void in
                // your user's position is the `position` var passed here
            })
        }
    }
}
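As a small follow-up, once both counts are available they can be combined into a rank and a rough percentile — a sketch of what might go inside the inner block, assuming the `number` captured from the outer callback is still in scope:

// `number` is the total from the outer query; `position` counts scores
// >= the user's, so it doubles as a 1-based rank.
let rank = Int(position)
let total = Int(number)
let topPercent = 100.0 * Double(rank) / Double(total)
print("High score rank: #\(rank) of \(total) (top \(Int(topPercent))%)")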