Core Data, CloudKit - Failed to find matching objectIDs for CKRecordID - swift

I am using CloudKit to sync my app across devices.
At first everything seems to work as expected, but after a while CloudKit seems to get caught in an endless loop and the debug console prints thousands of these messages in a row:
CoreData: debug: CoreData+CloudKit: -[PFCloudKitSerializer
applyUpdatedRecords:deletedRecordIDs:toStore:inManagedObjectContext:onlyUpdatingAttributes:andRelationships:madeChanges:error:]_block_invoke(1018):
Failed to find matching objectIDs for <CKRecordID: 0x60000330c000;
recordName=1E0972A7-D9DD-44A7-88F9-3AD13B32A330,
zoneID=com.apple.coredata.cloudkit.zone:defaultOwner> /
<CKRecordID: 0x60000330c020;
recordName=EE02B981-E54D-486B-95A1-AC0839671C27,
zoneID=com.apple.coredata.cloudkit.zone:defaultOwner> in pending
relationship: 0xe92e2f9c5a6d27e2
x-coredata://75AFDFFD-8E35-4B9F-AA61-C477073B435B/NSCKImportPendingRelationship/p8626
I guess the most important part is
Failed to find matching objectIDs for <CKRecordID: 0x60000330c000; ...
It's just the standard CloudKit implementation without any special custom code, so I have no idea where to start investigating.
Is this normal, expected behaviour?
I feel like this is slowing down my CloudKit sync quite a lot.

I just found the issue on my side. During further investigation I realized that these messages came together with other messages referring to one specific entity. I preload Core Data from a JSON file with some data that changes from time to time.
For development purposes, I update my preloaded data from the JSON file each time the app launches.
It turned out that during that update I created new objects to run a comparison and forgot to delete the unnecessary objects right afterwards.
Since that update process creates several hundred objects, that adds up quickly. They were floating around in Core Data without any purpose, and CloudKit had to sync them all.
However, in the end it is still a fairly small amount of data, and why it leads to those cryptic debug messages and hours of syncing is still a mystery to me.

Related

Filtering certain types of Requests logged by quarkus.http.access-log

I want to test a new REST client I am building, and I'd like to see the exact request that is being built, so I set the quarkus.http.access-log.enabled=true property. When starting Quarkus, however, I am bombarded with the logs of many scheduled requests happening simultaneously. The worst of those are several Elasticsearch scroll requests, which return a lot of data that is fed directly into the log.
My idea is that everything returning a response containing _index should be filtered out; I know well by now that my ES client is working properly, but it pumps out so much data that the request I actually want to log scrolls away almost instantly.
So my question: does somebody know a working (and convenient) method to effectively filter unwanted HTTP logs?
I tried setting
quarkus.http.access-log.exclude-pattern=(_index+)
in an attempt to filter out unwanted requests, but I'm not sure where to continue from there.

Data syncing with pouchdb-based systems client-side: is there a workaround to the 'deleted' flag?

I'm planning on using rxdb + hasura/postgresql in the backend. I'm reading this rxdb page for example, which off the bat requires sync-able entities to have a deleted flag.
Q1 (main question)
Is there ANY point at which I can finally hard-delete these entities? What conditions would have to be met - e.g. could I simply use "older than X months" and then force my app to only ever display data less than X months old?
Is such a hard-delete, if possible, best carried out directly in the central db, since it will be the source of truth? Would there be any repercussions client-side that I'm not foreseeing/understanding?
I foresee the number of deleted entities growing rapidly in my app and I don't want to have to store all this extra data forever.
Q2 (bonus / just curious)
What is the (algorithmic) basis for needing a 'deleted' flag? Is it that it's just faster to check a flag rather than to check for the omission of an object from, say, a very large list? I apologize if it's kind of a stupid question :(
Ultimately it comes down to a judgement call, informed by your particular business/product, about how long you want to keep deleted entities in your system. For some applications it's important to always keep a history of deleted things, or even individual revisions to records, stored as a kind of ledger or history.
I'd recommend that you also add a deleted_at column if you haven't already and then you could easily leverage something like Hasura's new Scheduled Triggers functionality to run a recurring job that fully deletes records older than whatever your threshold is.
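As a rough sketch of what such a recurring job could run (the table and column names, the endpoint URL and the cutoff below are assumptions for illustration only; Hasura auto-generates a delete_todos mutation for a table named todos):

// Webhook handler a Hasura cron/scheduled trigger could call to purge old soft-deleted rows.
// Assumes a runtime with a global fetch (Node 18+).
const PURGE_MUTATION = `
  mutation PurgeSoftDeleted($cutoff: timestamptz!) {
    delete_todos(where: { deleted: { _eq: true }, deleted_at: { _lt: $cutoff } }) {
      affected_rows
    }
  }`;

async function purgeSoftDeleted() {
  // Assumed threshold: anything soft-deleted more than ~3 months ago.
  const cutoff = new Date(Date.now() - 90 * 24 * 60 * 60 * 1000).toISOString();
  const res = await fetch('https://my-hasura.example.com/v1/graphql', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-hasura-admin-secret': process.env.HASURA_ADMIN_SECRET,
    },
    body: JSON.stringify({ query: PURGE_MUTATION, variables: { cutoff } }),
  });
  const { data, errors } = await res.json();
  if (errors) throw new Error(JSON.stringify(errors));
  return data.delete_todos.affected_rows; // how many rows were hard-deleted
}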
You could also leverage Hasura's permissions system to ensure that rows that have been deleted aren't returned to the client. There are documentation and examples for ways to work with soft deletes in Hasura.
For your second question it is definitely much faster to check for the deleted flag on records than to have to try and diff the entire dataset looking for things that are now missing.

Can watchman send why a file changed?

Is watchman capable of telling the configured command why it is sending a file to it?
For example:
a file that is new to a folder might come with a FILE_CREATE flag;
a file that is deleted would send the FILE_DELETE flag to the command;
a file that is modified would send a FILE_MOD flag, etc.
Perhaps even deleting a folder (and therefore the files under it) would send a FOLDER_DELETE parameter naming the folder, as well as a FILE_DELETE for each file and a FOLDER_DELETE for each folder underneath it.
Is there such a thing?
No, it can't do that. The reasons why are pretty fundamental to its design.
The TL;DR is that it is a lot more complicated than you might think for a client to correctly process those individual events and in almost all cases you don't really want them.
Most file watching systems are abstractions that simply translate from the system-specific notification information into some common form. They don't deal, either very well or at all, with the notification queue being overflowed, and they don't provide their clients with a way to reliably respond to that situation.
In addition to this, the filesystem can be subject to many and varied changes in a very short amount of time, and from multiple concurrent threads or processes. This makes this area extremely prone to TOCTOU issues that are difficult to manage. For example, creating and writing to a file typically results in a series of notifications about the file and its containing directory. If the file is removed immediately after this sequence (perhaps it was an intermediate file in a build step), by the time you see the notifications about the file creation there is a good chance that it has already been deleted.
Watchman takes the input stream of notifications and feeds it into its internal model of the filesystem: an ordered list of observed files. Each time a notification is received watchman treats it as a signal that it should go and look at the file that was reported as changed and then move the entry for that file to the most recent end of the ordered list.
When you ask Watchman for information about the filesystem it is possible or even likely that there may be pending notifications still due from the kernel. To minimize TOCTOU and ensure that its state is current, watchman generates a synchronization cookie and waits for that notification to be visible before it responds to your query.
The combination of the two things above means that watchman result data has two important properties:
You are guaranteed to have observed all notifications that happened before your query
You receive the most recent information for any given file only once in your query results (the change results are coalesced together)
Let's talk about the overflow case. If your system is unable to keep up with the rate at which files are changing (e.g. you have a big project, are very quickly creating and deleting files, and the system is heavily loaded), the OS can't fit all of the pending notifications in the buffer resources allocated to the watches. When that happens, it blows those buffers and sends an overflow signal. What that means is that the client of the watching API has missed some number of events and is no longer in synchronization with the state of the filesystem. If that client maintains state about the filesystem, that state is no longer valid.
Watchman addresses this situation by re-examining the watched tree and synthetically marking all of the files as being changed. This causes the next query from the client to see everything in the tree. We call this a fresh instance result set because it is the same view you'd get when you are querying for the first time. We set a flag in the result so that the client knows that this has happened and can take appropriate steps to repair its own state. You can configure this behavior through query parameters.
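As a minimal sketch of how a client could handle this, using the official fb-watchman Node.js client (the watched path and the stored clock below are placeholders):

// npm install fb-watchman
const watchman = require('fb-watchman');
const client = new watchman.Client();

client.command(['watch-project', '/path/to/project'], (err, watchResp) => {
  if (err) throw err;
  const query = {
    since: 'c:0:0', // replace with the clock you saved from the previous query
    fields: ['name', 'exists', 'new', 'mtime_ms'],
  };
  if (watchResp.relative_path) {
    query.relative_root = watchResp.relative_path;
  }
  client.command(['query', watchResp.watch, query], (err, resp) => {
    if (err) throw err;
    if (resp.is_fresh_instance) {
      // Everything in the tree is reported as changed:
      // discard any local state and rebuild it from resp.files.
    }
    // resp.clock is the cursor to store and pass as `since` next time;
    // resp.files holds the coalesced, most-recent-only results per file.
    client.end();
  });
});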
In these fresh instance result sets, we don't know whether any given file really changed or not (it's possible that it changed in such a way that we can't detect via lstat) and even if we can see that its metadata changed, we don't know the cause of that change.
There can be multiple events that contribute to why a given file appears in the results delivered by watchman. We don't record them individually because we can't track them with unbounded history; imagine a file that is incrementally written to once every second all day long. Do we keep 86400 change entries for it per day on hand and deliver those to our clients? What if there are hundreds of thousands of files like this? We'd have to truncate that data, and at that point the loss in the data reduces how well you can reason about it.
At the end of all of this, it is very rare for a client to do much more than try to read a file or look at its metadata, and generally speaking, they want to do that only when the file has stopped changing. For this use case, watchman-wait, watchman-make and trigger all have the concept of a settle period that causes the change notifications to be delayed in delivery until after the filesystem has stopped changing.

Exporting Importing Breeze Model and hasTempKey issues

When you create a new entity, Breeze sets id:-1, state:'Added', hasTempKey:true. After export and re-import, Breeze doesn't merge the imported -1 entity with the current -1 entity in memory; it adds a new one. This is explained in the docs (but how to overcome this problem is the question in my case). So I tried to set the created entity to setUnchanged(). Now the export/import cycle runs as expected, but the created entity has lost its hasTempKey:true property, so a newly created entity can conflict with a current one. Some advice on how to resolve these issues would really be appreciated, thanks.
I assume that this question relates to your approach to implementing a disconnected app as described in this SO question.
As I said there, I think you're spending too much time trying to trick Breeze into doing what it should not do. Here, for example, you want EntityManager.saveChanges to not actually save to the remote data store. But the whole point of "saveChanges" is that it persists permanently. "Saving locally" is not really saving. No one but you knows about these saved changes. You don't know if they would pass the business validation rules on your server or if they would collide with a different user's changes. If your laptop dies or is stolen, your locally saved data are gone.
I think Breeze can be a huge help in crafting occasionally connected applications. But I think it is critical to properly differentiate between stashing changes locally with the intent to save them later and actually saving them remotely.
Outline of a Disconnected App
I urge you to take a different tack.
Your app could easily initiate a sequence of distinct editing sessions. For example, one session could be a travel reservation for client 'A', another session for client 'B', and a third session is about something else entirely ... maybe the client 'C' profile.
When your app can't reach the server, it preserves each session as a WIP ("Work In Progress"). Each WIP session is its own serialized bundle, identified by a WIP key.
Aside: you'll see this pattern in John Papa's "Building Apps with Angular and Breeze Part 2" when that comes out later this year.
The Breeze EntityManager.exportEntities(list_of_entities) serializes everything about the changed entities of that session including their change-state, original values, and temp keys. Remember that the list_of_entities can be anything including an object graph. You can save that bundle to browser local storage under the WIP key and restore it later.
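In rough code, the stash-and-rehydrate idea looks something like this (a sketch only; manager is your app's EntityManager and the WIP key is just an example):

// Stash a WIP session in browser local storage.
var wipKey = 'wip:client-A:reservation';
var bundle = manager.exportEntities(manager.getChanges()); // serializes state, original values and temp keys
window.localStorage.setItem(wipKey, bundle);

// Later, possibly after an app restart, rehydrate that session:
var saved = window.localStorage.getItem(wipKey);
if (saved) {
  manager.importEntities(saved);
}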
I'd keep a directory of WIP sessions that included information about the state of the session as a whole (e.g., what kind of editing session it is and whether this session was ready to be persisted remotely). Your app presents WIP sessions to the user while offline. When it gets a connection, it goes through a "synchronization" phase during which it tries to persist the changes. With luck it succeeds. If not, you can rehydrate the session in the UI and help the user reconcile the conflicts.
These are broad strokes. The devil is in the details.
The critical thing in this context is that you do not mess with entity state or the temp keys. You don't care what the keys are or if they change. The serialized session will hold that state information for you. The serialized bundles will move in and out of local storage without complaint. You are using Breeze as intended, while offline or online.
The current behavior is deliberate. Generally we assume that temp keys are effectively not comparable. However, I do understand your use case. So, one approach would be to:
1) import your "exported entities" into a temporary EntityManager and check for temp key collisions between this entityManager and your "destination" EntityManager.
2) Then remove any "dups" from the destination EntityManager
3) Import your original "exported entities" into your destination EntityManager
You can actually skip step 1 if you know that all tempKeys are dups.
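In rough code, those steps could look like this (a sketch only; destManager is your existing EntityManager and exportedBundle is the string you exported earlier):

// 1) Import into a scratch manager to see which keys the bundle contains.
var tempManager = destManager.createEmptyCopy();
tempManager.importEntities(exportedBundle);

// 2) Detach any entity in the destination that already uses one of those keys.
tempManager.getEntities().forEach(function (entity) {
  var dup = destManager.getEntityByKey(entity.entityAspect.getKey());
  if (dup) {
    dup.entityAspect.setDetached();
  }
});

// 3) Now the original bundle can be imported without colliding.
destManager.importEntities(exportedBundle);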
Another approach would be to use Guids for your keys. This completely bypasses the temp key issue because Guids never need to be "temporary".
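For example (a minimal sketch; 'Todo' and manager are placeholders, and breeze.core.getUuid() is a Breeze helper for generating client-side Guids, so verify it against your Breeze version):

// With a client-generated Guid there is no temp key to fix up on import.
var newTodo = manager.createEntity('Todo', { id: breeze.core.getUuid() });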

Entity Framework Code First - Model change breaks Seed

We've been using Entity Framework Code First 5 for a little while now, without major issue.
I've recently discovered that ANY change I make to my model (such as adding or removing a field) means that the Seed method no longer runs, leaving my database in an invalid state.
If I reverse the change, the seed method runs fine.
I have tried making changes to varying parts of my model, so it's not the specific change which is relevant.
Does anyone know how I can (a) debug what the specific issue is, or (b) has anyone come across this themselves and knows how to fix it?
UPDATE: After the model change, no matter how many times I query the database, the Seed doesn't run. However, I have found that if I manually run IISRESET and then re-execute the web service that executes the query, it does then run the Seed! Does anyone know why this would be the case, and why I suddenly need to reset IIS between the database initialization and the Seed executing?
Many thanks Steve