Problems importing large file in Core Data with background thread - swift

I am running into some trouble importing a large .csv file in OS X using Core Data on a background thread.
My simplified data model is a Dataset that has a to-many relationship to Entries. Each Entry is a line in the .csv file (and has a bunch of attributes that I'll leave out for brevity). Following what I have read about efficiently importing lots of data, and about making a progress indicator work properly, I have created a separate managed object context for doing the import.
I am having trouble with two things:
First, I need to hang on to a reference to the new Dataset at the end of the import, because I need to select it in the popup. This will be done on the main thread, but for efficiency (and to make my NSProgressIndicator work) the new dataset is created on the background thread, with the background MOC.
Second, from what I read, batching the import, so that the background MOC saves and resets, is the best way to stop the import from eating up too much memory. That does not turn out to be the case so far: it looks like gigabytes of memory are being used even for files in the tens of megabytes. Also, once I reset my import MOC, it cannot find the data for inDataset, and so cannot create the relationship between subsequent Entries and the Dataset.
I've posted the simplified code below. I have tried refreshObject:mergeChanges:, without good results. Can anyone point me at what I am doing wrong?
    let inFile = op.URL
    dispatch_async(dispatch_get_global_queue(priority, 0)) {
        //create moc
        let inMOC = NSManagedObjectContext()
        inMOC.undoManager = nil
        inMOC.persistentStoreCoordinator = self.moc.persistentStoreCoordinator
        var inDataset : Dataset = Dataset(entity: NSEntityDescription.entityForName("Dataset", inManagedObjectContext: inMOC)!, insertIntoManagedObjectContext: inMOC)
        //set up NSProgressIndicator here, removed for clarity
        let datasetID = inDataset.objectID
        mocDataset = self.moc.objectWithID(datasetID) as! Dataset
        let fileContents : String = (try! NSString(contentsOfFile: inFile!.path!, encoding: NSUTF8StringEncoding)) as String
        let fileLines : [String] = fileContents.componentsSeparatedByString("\n")
        var batchCount : Int = 0
        for thisLine : String in fileLines {
            let newEntry : Entry = Entry(entity: NSEntityDescription.entityForName("Entry", inManagedObjectContext: inMOC)!, insertIntoManagedObjectContext: inMOC)
            //Read in attributes of this entry from this line, removed here for brevity
            newEntry.setValue("Entry", forKey: "type")
            newEntry.setValue(inDataset, forKey: "dataset")
            inDataset.addEntryObject(newEntry)
            dispatch_async(dispatch_get_main_queue()) {
                self.progInd.incrementBy(1)
            }
            batchCount++
            if(batchCount > 1000){
                do {
                    try inMOC.save()
                } catch let error as NSError {
                    print(error)
                } catch {
                    fatalError()
                }
                batchCount = 0
                inMOC.reset()
                inDataset = inMOC.objectWithID(datasetID) as! Dataset
                // fails here, does not seem to be able to find the data associated with inDataset
            }
        }// end of loop for reading lines
        //save whatever remains after last batch save
        do {
            try inMOC.save()
        } catch let error as NSError {
            print(error)
        } catch {
            fatalError()
        }
        inMOC.reset()
        dispatch_async(dispatch_get_main_queue()) {
            //This is done on main queue after background queue has read all lines
            // I thought the statement just below would refresh mocDataset, but no luck
            self.moc.refreshObject(mocDataset, mergeChanges: true)
            //new dataset selected from popup
            let datafetch = NSFetchRequest(entityName: "Dataset")
            let datasets : [Dataset] = try! self.moc.executeFetchRequest(datafetch) as! [Dataset]
            self.datasetController.addObjects(datasets)
            let mocDataset = self.moc.objectWithID(datasetID) as! Dataset
            //fails here too, mocDataset object has data as a fault
            let nDarray : [Dataset] = [mocDataset]
            self.datasetController.setSelectedObjects(nDarray)
        }
    }
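One likely cause of the failed lookup, offered as a hedged sketch rather than a confirmed diagnosis: the objectID captured right after insertion is a temporary ID, and it no longer identifies the object once the context has saved and been reset. Asking the context for a permanent ID before capturing it avoids that. The sketch below uses current Swift naming and plain NSManagedObject with key-value coding. The entity and key names ("Dataset", "Entry", "type", "dataset") come from the question; the importLines function, its parameters, and the assumption that the model defines an inverse relationship are hypothetical.

```swift
import CoreData

/// Hypothetical helper: imports `lines` as Entry objects attached to a new Dataset,
/// saving and resetting the context every `batchSize` entries.
/// Returns the permanent object ID of the new Dataset.
func importLines(_ lines: [String],
                 coordinator: NSPersistentStoreCoordinator,
                 batchSize: Int = 1000) throws -> NSManagedObjectID {
    // A private-queue context runs its blocks on its own queue.
    let context = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
    context.persistentStoreCoordinator = coordinator
    context.undoManager = nil

    var outcome: Result<NSManagedObjectID, Error>?
    context.performAndWait {
        do {
            var dataset = NSEntityDescription.insertNewObject(forEntityName: "Dataset", into: context)
            // Swap the temporary ID for a permanent one before capturing it.
            try context.obtainPermanentIDs(for: [dataset])
            let datasetID = dataset.objectID

            for index in lines.indices {
                let entry = NSEntityDescription.insertNewObject(forEntityName: "Entry", into: context)
                // Parsing lines[index] into attributes is omitted, as in the question.
                entry.setValue("Entry", forKey: "type")
                // Assumption: the model defines an inverse, so setting one side is enough.
                entry.setValue(dataset, forKey: "dataset")

                if (index + 1) % batchSize == 0 {
                    try context.save()
                    context.reset()
                    // Objects from before the reset are invalid; look the dataset up again.
                    dataset = context.object(with: datasetID)
                }
            }
            try context.save()
            outcome = .success(datasetID)
        } catch {
            outcome = .failure(error)
        }
    }
    guard let result = outcome else { fatalError("performAndWait did not run its block") }
    return try result.get()
}
```

Because performAndWait blocks the caller, this would be called from a background queue. The returned ID can then be handed to the main queue and resolved there with the main context's object(with:).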

Related

Swift 5: How to save results of NSFetchRequest to File

I am new to programming in general and have started with Swift. I have a feeling what I'm attempting to do is a bit outside of my scope, but I've come so far so here's the ask:
I am adding a tracker to a macOS program I've already created. The end user inputs a number and hits "Add to tracker", which then takes that number and the timestamp from the button click and writes them to the appropriate entity in Core Data. Everything works perfectly: my NSTableView displays the data and my batch delete works. But I cannot for the life of me work out the best way to take the results from the NSFetchRequest and print them to a text file.
Here is the code for my fetch request that occurs when the "print" button is hit:
    @IBAction func printTracker(_ sender: Any) {
        fetchRequest.propertiesToFetch = ["caseDate","caseNumber"]
        fetchRequest.returnsDistinctResults = true
        fetchRequest.resultType = NSFetchRequestResultType.dictionaryResultType
        do {
            let results = try context.fetch(fetchRequest)
            let resultsDict = results as! [[String:String]]
        } catch let err as NSError {
            print(err.debugDescription)
        }
    }
After the "resultsDict" declaration is where I just can't seem to come to a workable solution for getting it to string, then to txt file.
If I add a print command to the console as is, I can see that resultsDict pulls correctly with the following format:
[["caseNumber": "12345", "caseDate": "3/22/21, 5:48:18 PM"]]
Ideally I need it in plaintext more like
"3/22/21, 5:48:18 PM : 12345"
Any advice or help on the conversion would be greatly appreciated.
A simple way, if there is not a huge amount of data returned, is to create a string from the fetched data and then write that string to disk.
First create the string: take the values from each dictionary, put them in the right order, and join the resulting lines with a newline character.
    // resultsDict is the [[String: String]] array from the question
    let output = resultsDict
        .map { "\($0["caseDate", default: ""]) : \($0["caseNumber", default: ""])" }
        .joined(separator: "\n")
Then we can write the string to a file. Here I use the Documents directory as the folder to save the file in.
    let paths = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)
    let path = paths[0].appendingPathComponent("results.txt")
    do {
        try output.write(to: path, atomically: true, encoding: .utf8)
    } catch {
        print("Failed to write to file, error: \(error)")
    }

Multi-threaded core data sometimes returns nil properties

I am new to Core Data. I have an app that uses Core Data as its local store. Writing to and reading from Core Data is done by background threads. While this generally works, in rare cases the fetched data are wrong, i.e. properties of a fetched entity are nil.
To check the situation, I wrote a unit test that starts 2 async threads: One fetches continuously from core data, and the other one overwrites continuously these data by first deleting all data, and then storing new data.
This test pretty quickly provokes the error, but I have no idea why. Of course I guess this is a multi-threading problem, but I don’t see why, because fetches and deletion+writes are done in separate managed contexts of a single persistentContainer.
I am sorry that the code below is pretty long, although shortened, but I think without it one cannot identify the problem.
Any help is highly welcome!
Here is my function to fetch data:
    func fetchShoppingItems(completion: @escaping (Set<ShoppingItem>?, Error?) -> Void) {
        persistentContainer.performBackgroundTask { (managedContext) in
            let fetchRequest: NSFetchRequest<CDShoppingItem> = CDShoppingItem.fetchRequest()
            do {
                let cdShoppingItems: [CDShoppingItem] = try managedContext.fetch(fetchRequest)
                for nextCdShoppingItem in cdShoppingItems {
                    nextCdShoppingItem.managedObjectContext!.performAndWait {
                        let nextname = nextCdShoppingItem.name! // Here, sometimes name is nil
                    } // performAndWait
                } // for all cdShoppingItems
                completion(nil, nil)
                return
            } catch let error as NSError {
                // error handling
                completion(nil, error)
                return
            } // fetch error
        } // performBackgroundTask
    } // fetchShoppingItems
I have commented the line that sometimes crashes the test, since name is nil.
Here are my functions to store data:
    func overwriteCD(shoppingItems: Set<ShoppingItem>, completion: @escaping () -> Void) {
        persistentContainer.performBackgroundTask { (managedContext) in
            self.deleteAllCDRecords(managedContext: managedContext, in: "CDShoppingItem")
            let cdShoppingItemEntity = NSEntityDescription.entity(forEntityName: "CDShoppingItem", in: managedContext)!
            for nextShoppingItem in shoppingItems {
                let nextCdShoppingItem = CDShoppingItem(entity: cdShoppingItemEntity, insertInto: managedContext)
                nextCdShoppingItem.name = nextShoppingItem.name
            } // for all shopping items
            self.saveManagedContext(managedContext: managedContext)
            completion()
        } // performBackgroundTask
    } // overwriteCD
    func deleteAllCDRecords(managedContext: NSManagedObjectContext, in entity: String) {
        let deleteFetch = NSFetchRequest<NSFetchRequestResult>(entityName: entity)
        let deleteRequest = NSBatchDeleteRequest(fetchRequest: deleteFetch)
        deleteRequest.resultType = .resultTypeObjectIDs
        do {
            let result = try managedContext.execute(deleteRequest) as? NSBatchDeleteResult
            let objectIDArray = result?.result as? [NSManagedObjectID]
            let changes = [NSDeletedObjectsKey: objectIDArray]
            NSManagedObjectContext.mergeChanges(fromRemoteContextSave: changes as [AnyHashable: Any], into: [managedContext])
        } catch let error as NSError {
            // error handling
        }
    } // deleteAllCDRecords
    func saveManagedContext(managedContext: NSManagedObjectContext) {
        if !managedContext.hasChanges { return }
        do {
            try managedContext.save()
        } catch let error as NSError {
            // error handling
        }
    } // saveManagedContext
Are you sure that name is non-nil for all requested entities? Use guard-let instead of ! for optional values. Force-unwrapping with ! is not a safe way to unwrap an optional, especially if you can't be sure about the source of the data.
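As a minimal sketch of that suggestion, the loop below skips records whose name is missing instead of force-unwrapping. Item and presentNames are hypothetical stand-ins for the fetched CDShoppingItem objects and the loop in fetchShoppingItems.

```swift
// Hypothetical stand-in for a fetched record whose name may be missing.
struct Item {
    let name: String?
}

/// Returns the names that are present, skipping items whose name is nil.
func presentNames(in items: [Item]) -> [String] {
    var names: [String] = []
    for item in items {
        // guard-let skips this iteration instead of crashing on nil.
        guard let name = item.name else { continue }
        names.append(name)
    }
    return names
}

let names = presentNames(in: [Item(name: "Milk"), Item(name: nil), Item(name: "Bread")])
print(names) // prints ["Milk", "Bread"]
```

This only prevents the crash. It does not explain why a value that should be set comes back as nil.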
The problem with my code was apparently a race condition:
While the "fetch" thread fetched the Core Data records and tried to assign the attributes to the properties, the "store" thread deleted the records.
This apparently released the attribute objects, so that nil was stored as the property.
I thought that the persistentContainer would automatically prevent this, but it does not.
The solution is to execute both background threads of the persistentContainer in a concurrent queue, the "fetch" thread synchronously, and the "store" thread asynchronously with a barrier.
So, concurrent fetches can be executed, while a store waits until all current fetches are finished.
The concurrent queue is defined as
    let localStoreQueue = DispatchQueue(label: "com.xxx.yyy.LocalStore.localStoreQueue",
                                        attributes: .concurrent)
EDIT:
In the following fetch and store functions, I moved the Core Data function persistentContainer.performBackgroundTask inside the localStoreQueue. If it were outside, as in my original answer, the store code in localStoreQueue.async(flags: .barrier) would set up a new thread and thus use managedContext in a thread other than the one it was created in, which is a Core Data multi-threading error.
The "fetch" thread is modified as
    localStoreQueue.sync {
        self.persistentContainer.performBackgroundTask { (managedContext) in
            let fetchRequest: NSFetchRequest<CDShoppingItem> = CDShoppingItem.fetchRequest()
            //…
        } // performBackgroundTask
    } // localStoreQueue.sync
and the "store" thread as
    localStoreQueue.async(flags: .barrier) {
        self.persistentContainer.performBackgroundTask { (managedContext) in
            self.deleteAllCDRecords(managedContext: managedContext, in: "CDShoppingItem")
            //…
        } // performBackgroundTask
    } // localStoreQueue.async
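The reader/writer pattern described above can be sketched without Core Data. This is an illustration under stated assumptions, not the answer's actual code: GuardedStore, its queue label, and its methods are hypothetical, and a plain array stands in for the persistent store.

```swift
import Dispatch

/// Hypothetical in-memory store guarded by a concurrent queue.
final class GuardedStore {
    private let queue = DispatchQueue(label: "example.guardedStore", attributes: .concurrent)
    private var items: [String] = []

    /// Reads run synchronously and may overlap with other reads.
    func snapshot() -> [String] {
        return queue.sync { items }
    }

    /// Writes use a barrier: they wait for running reads and hold back later blocks.
    func overwrite(with newItems: [String], completion: @escaping () -> Void = {}) {
        queue.async(flags: .barrier) {
            self.items = newItems
            completion()
        }
    }
}

let store = GuardedStore()
store.overwrite(with: ["Milk", "Bread"])
// The read is queued behind the barrier, so it sees the new contents.
print(store.snapshot()) // prints ["Milk", "Bread"]
```

One caveat on the design: the barrier only orders the blocks that the queue itself runs. Work that a block hands off asynchronously, for example through performBackgroundTask, is outside the barrier's protection unless the block waits for it to finish.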

Inserting child records is slow in coredata

I have close to 7K items stored in a relation called Verse. I have another relation called Translation that needs to load 7K related items with a single call from a JSON file.
Here is my code:
    let container = getContainer()
    container.performBackgroundTask() { (context) in
        autoreleasepool {
            for row in translations{
                let t = Translation(context: context)
                t.text = (row["text"]! as? String)!
                t.lang = (row["lang"]! as? String)!
                t.contentType = "Verse"
                t.verse = VerseDao.findById(row["verse_id"] as! Int16, context: context)
                // this needs to make a call to the database to retrieve the appropriate Verse instance.
            }
        }
        do {
            try context.save()
        } catch {
            fatalError("Failure to save context: \(error)")
        }
        context.reset()
    }
Code for the findById method.
    static func findById(_ id: Int16, context: NSManagedObjectContext) -> Verse{
        let fetchRequest: NSFetchRequest<Verse>
        fetchRequest = Verse.fetchRequest()
        // %d because id is an integer; %@ expects an object argument
        fetchRequest.predicate = NSPredicate(format: "verseId == %d", id)
        fetchRequest.includesPropertyValues = false
        fetchRequest.fetchLimit = 1
        do {
            let results =
                try context.fetch(fetchRequest)
            return results[0]
        } catch let error as NSError {
            print("Could not fetch \(error), \(error.userInfo)")
            return Verse()
        }
    }
This works fine until I add the VerseDao.findById, which makes the whole process really slow because it has to make a request for each object to the Coredata database.
I did everything I could by limiting the number of fetched properties and using NSFetchedResultsController for data fetching but no luck.
I wonder if there's any way to insert child records in a more efficient way? Thanks.
Assuming your persistent store type is SQLite (NSSQLiteStoreType):
The first thing you should check is whether you have a Core Data fetch index on the Verse object's verseId property. See this Stack Overflow answer for some introductory links on fetch indexes.
Without that, the fetch in your VerseDao.findById function may be scanning the whole database table every time.
To see if your index is working properly you may inspect the SQL queries generated by adding -com.apple.CoreData.SQLDebug 1 to the launch arguments in your Xcode scheme.
Other improvements:
Use NSManagedObjectContext.fetch or NSFetchRequest.execute (equivalent) instead of NSFetchedResultsController. The NSFetchedResultsController is typically used to bind results to a UI. In this case using it just adds overhead.
Don't set fetchRequest.propertiesToFetch, instead set fetchRequest.includesPropertyValues = false. This will avoid fetching the Verse object property values which you don't need to establish the relation to the Translation object.
Don't specify a sortDescriptor on the fetch request; this just complicates the query.
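A complementary option, not part of the answer above and offered only as a sketch: fetch all the needed Verse objects with a single request and keep them in a dictionary, so the loop does a lookup instead of one fetch per row. The entity name "Verse" and the key "verseId" come from the question; fetchVerses(withIDs:in:) is a hypothetical helper, and it uses plain NSManagedObject with key-value coding so that it does not depend on generated classes.

```swift
import CoreData

/// Hypothetical helper: fetches every Verse whose verseId is in `ids` with a
/// single request and returns the objects keyed by verseId.
func fetchVerses(withIDs ids: Set<Int16>,
                 in context: NSManagedObjectContext) throws -> [Int16: NSManagedObject] {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Verse")
    // Bridge the IDs to an NSArray of NSNumber for the IN predicate.
    let idList = ids.map { NSNumber(value: $0) } as NSArray
    request.predicate = NSPredicate(format: "verseId IN %@", idList)

    var lookup: [Int16: NSManagedObject] = [:]
    for verse in try context.fetch(request) {
        // verseId is read back here, so property values are left enabled.
        if let id = verse.value(forKey: "verseId") as? Int16 {
            lookup[id] = verse
        }
    }
    return lookup
}
```

Inside the import loop, the per-row call to VerseDao.findById could then become a dictionary lookup such as lookup[verseId] as? Verse, on the same context that performed the fetch.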

How do I Create objects and save with core data in for loop instead of overwriting each time?

I am trying to use a for loop to create objects, which have attributes I will use later to populate a map. I am able to loop through and populate the attributes of the object, but when I save the object, it seems to be overwritten each time. The for loop pulls data out of my Firebase database.
    let holes = [hole1,hole2,hole3,hole4,hole5,hole6,hole7,hole8,hole9,hole10,hole11,hole12,hole13,hole14,hole15,hole16,hole17,hole18]
    for hole in holes {
        let holeX = hole?["HoleX"] as? Double // these are just grabbing the data for the appropriate hole from my firebase data.
        let holeY = hole?["HoleY"] as? Double
        let holeH = hole?["HoleH"] as? Double
        var newHole = NSEntityDescription.insertNewObject(forEntityName: "HoleCoordinateData", into: self.managedObjectContext)
        newHole.setValue(holeX, forKey: "HoleX")
        newHole.setValue(holeY, forKey: "HoleY")
        newHole.setValue(holeH, forKey: "HoleH")
        newHole.setValue(holeNumber, forKey: "holeNumber")
        newHole.setValue("\(holeNumber)", forKey: "holeNumberString")
        self.managedObjectContext.insert(newHole)
        holeNumber += 1
    }
    do {
        try self.managedObjectContext.save()
    } catch {
        print("Could not save Data: \(error.localizedDescription)")
    }
I have tried putting the do/try/catch inside the for loop. I originally had newHole initialized before the for loop, but I moved it inside on the advice of some other Stack Overflow questions and answers. I originally filled out newHole by just doing newHole.holeX = holeX. Obviously some of these changes were just different ways of doing the same thing, but I've been spinning my wheels on this for long enough to try anything I could.

How to improve performance for large datasets with Realm?

My database has 500,000 records. The tables don't have a primary key because Realm doesn't support compound primary keys. I fetch data on a background thread and then want to display it in the UI on the main thread. But since Realm objects cannot be shared across threads, I cannot use the records I fetched in the background. Do I instead need to refetch them on the main thread? Fetching a record out of the 500,000 records will block the main thread, and I don't know how to deal with that. I use Realm because it is said to be fast enough. If I need to refetch records many times, is it really faster than SQLite? I don't want to create another property that combines other columns into a primary key, because the Realm database is already bigger than a SQLite file.
    @objc class CKPhraseModel: CKBaseHMMModel{
        dynamic var pinyin :String!
        dynamic var phrase :String = ""
        class func fetchObjects(apinyin :String) -> Results<CKPhraseModel> {
            let realm = Realm.createDefaultRealm()
            let fetchString = generateQueryString(apinyin)
            let phrases = realm.objects(self).filter(fetchString).sorted("frequency", ascending: false)
            return phrases
        }
        func save(needTransition :Bool = true) {
            if let realm = realm {
                try! realm.write(needTransition) {[unowned self] in
                    self.frequency += 1
                }
            }
            else {
                let realm = Realm.createDefaultRealm()
                if let model = self.dynamicType.fetchObjects(pinyin).filter("phrase == %@", phrase).first {
                    try! realm.write(needTransition) {[unowned self] in
                        model.frequency += self.frequency
                    }
                }
                else {
                    try! realm.write(needTransition) {[unowned self] in
                        realm.add(self)
                    }
                }
            }
        }
    }
Then I store the fetched records in an array:
    let userInput = "input something"
    let phraseList = CKPhraseModel.fetchObjects(userInput)
    for (_,phraseModel) in phraseList.enumerate() {
        candidates.append(phraseModel)
    }
Then I want to display candidates information in UI when the user clicks one of these. I will call CKPhraseModel's save function to save changes. This step is on main thread.
Realm is fast if you use its lazy loading capability: you create a filter that returns your candidates directly from the Realm, and then only the elements you actually index in the results need to be retrieved.
In your case, you copy ALL elements out. That's kinda slow, which is why you end up freezing.