Swift: Implementing custom merge policy

Swift: Implementing custom merge policy - swift

I'm building an OSX app using Swift, with Coredata as my data layer. As part of this, I have table that lists a large number of files, with metadata associated with each. Each record can include a URI that points to one of three services it can be hosted on.
1. title created_at size uuid source_local source_remote source_cloud
I generate all the records using information pulled from the local source. These records all have a source_local string.
Later I import a number of records from the remote source. These records are all added and have a source_remote string.
A number of these records are hosted on both services, and have matching UUIDs. There is a unique constraint on the UUID field, and I want Swift to merge these two records' fields in some way when it has a constraint error.
I've tried:
NSMergeByPropertyStoreTrumpMergePolicy
and
NSMergeByPropertyObjectTrumpMergePolicy
But these policies result in one record completely trumping the other.
Currently I have to work around this limitation by checking if a record already exists with the UUID and updating the existing record with any missing fields in the new file.
However this feels non-optimal – is there a way to create a custom merge policy, in order to have Swift automatically handle conflicts in this way? At this stage I am not concerned with whether the Store or Memory record trumps the other, as long as I can correctly the merge the source_* fields.
Thanks

First of all, thanks to #tom-harrington for his nod to extend NSMergePolicy. Huge oversight on my part that I hadn't even considered going down that route.
While exploring how NSMergeByPropertyStoreTrumpMergePolicy/NSMergeByPropertyObjectTrumpMergePolicy are implemented, however, I realised that this issue stems from a misunderstanding on my part. These policies already handle conflicts at a property level. Rather than discarding the entirety of one of the object states on conflict, they compare each property and only apply the policy to those properties which have both changed/exist.
NSOverwriteMergePolicy and NSRollbackMergePolicy are policies that will result in one of either object A or B being completely discarded on conflict.

Related

Detect and handle certain conflicts in Core Data with iCloud Sync

I'm trying to create a note-taking like app that uses NSPersistentCloudKitContainer and core data.
The store uses the NSMergeByPropertyObjectTrumpMergePolicy, which is fine for almost every property. For example, if the name of a file is changed on two different devices, then it's fine to use the latest value.
The problem is that the note text cannot be overridden by the latest value if it's changed on two devices at once. It needs to be detected as a conflict so the user can choose which version they want to keep.
I can replicate the behavior by turning off wifi on one device and writing content, then writing content on a different device at the same time. When I turn the wifi back on, whichever device saved the changes last completely overrides the other device's text.
What I'd like to accomplish is detect when there is a conflict of text, then create a duplicate file called "Conflicted Copy." Bonus points if someone can tell me how Apple Notes magically merges text without ever creating a conflict. I really only need a simple solution though that prevents data loss.
Any help in the right direction would be appreciated!

The conflict arises when CoreData & CloudKit sync an object to the persistent store while a managed context has an updated version of the object that has not yet been saved.
Any merge policy, including a custom merge policy, is used to create a single object that will be stored in the persistent store, but you want to have both conflicting objects so that the user can choose one of them.
Thus automatic conflict resolution, including with a custom merge policy, cannot be applied.
Instead, use the default merge policy of type error. The docs say
If a save fails because of conflicting objects, you can find the IDs
of those objects in error’s userInfo dictionary. Use the
NSInsertedObjectsKey and NSUpdatedObjectsKey keys to extract the
object IDs.
This means, you can keep the properties of the not-yet-stored conflicting object in the managed context, and re-fetch the properties of the conflicting object from the persistent store. This will overwrite the conflicting in-context version and "resolve" the conflict, but you still have now the conflicting properties and can present them to the user.
If the user selects the version in the persistent store, you are done. Otherwise update the objects properties with the kept values as required and save the context.
PS: Here is for your information how to implement a custom merge policy, although it cannot be applied here, just because such info is difficult to find.
EDIT:
I can imagine 2 ways to have one merge policy for your "normal" objects, and the error policy for the text objects:
A merge policy is set for a managed context. So you could have one context with NSMergeByPropertyObjectTrumpMergePolicy, and another one with NSErrorMergePolicy. So if it is feasible to fetch most objects into the 1st context, but the text storing objects into the 2nd one, you could apply both conflict resolution strategies.
During conflict resolution, i.e. in func resolve(optimisticLockingConflicts…, one has to call super.resolve(optimisticLockingConflicts: (see my example implementation cited above). So if you set NSMergeByPropertyObjectTrumpMergePolicy to your context, but call super only for your "normal" objects, this merge policy would not be applied to your text objects, and you could handle the conflict by your own. Warning: As far as I know this is non-standard, and I am not sure what happens if you don't call super.

TYPO3 backend workflow when avoiding the storage of data in intermediate table

I have a situation as described in the ExtbaseFluid book:
I would like to store information in the intermediate table which is not recommended at all.
Here is a cite from the warning box of the above linked book chapter:
Do not store data in the Intermediate Table that concern the Domain. Though TYPO3 supports this (especially in combination with Inline Relational Record Editing (IRRE) but this is always a sign that further improvements can be made to your Domain Model. Intermediate Tables are and should always be tools for storing relationships and nothing else.
Let’s say you want to store a CD with its containing music tracks: CD -- m:n (Intermediate Table) -- Song. The track number may be stored in a field of the Intermediate Table. However, the track should be stored as a separate domain object, and the connection be realized as CD -- 1:n -- Track -- n:1 -- Song.
So I want not to do what is not recommended. But thinking about the workflow for the editor that results of the recommended solution rises a few question for me.
To stay with this example I would need the following tables:
tx_extname_domain_model_cd
tx_extname_domain_model_cd_track_mm
tx_extname_domain_model_track (which holds the track number)
tx_extname_domain_model_track_song_mm
tx_extname_domain_model_song
From what I know this would end in the situation that the editor would need to create following records:
one record for the cd
one record for the song
now the editor can create one record for the track.
There the track number is added.
Furthermore the cd record needs to be assigned as well as the song.
So here are my questions:
I guess this workflow cannot be improved with some (to me unknown) TCA setup?
An editor cannot directly reach the song when the cd record is opened?
Instead first she / he has to open the track record and can from there navigate to the song?
Is it really that bad to store data in the intermediate table? The TYPO3 table sys_file_reference does the same!? But I wonder how those data could be shown (because IRRE is not possible because it shall only be used for 1:n relations (source).

The question you have to ask yourself is: Do I want to do coding by the book, or do I want to create a pragmatic approach to solve a customer's problem?
In this specific case the additional problem is, that the people who originally invented Extbase had a quite sophisticated and academic approach, but when it comes to a pragmatic use and performance, they were blocked by their own rules and stuck with coding by the book.
Especially this example and the warning message shows a way of thinking that was one of the reasons, why I never actually used Extbase but went for Core-API methods to create performant and pragmatic queries to get the desired result sets. Now that we've got Doctrine under the hood, this works like a charm even with quite exotic DB flavors.
Of course intermediate tables are a good idea and of course those intermediate tables can and should be enriched with additional data fields, that do not require a 3rd, 4th or nth table to store i.e. a simple set of dropdown options, since this can easily be handled with attributes configured in TCA, as it is shown here: https://docs.typo3.org/m/typo3/reference-tca/master/en-us/ColumnsConfig/Type/Inline/Examples.html
sys_file_reference is the most prominent example since it provides exactly that kind of additional information that should not be pumped into additional tables - and guess what, the TYPO3 core does not make use of a single line of Extbase code to deal with that data or almost any other data of the core tables.
To answer your last question: Take a look at the good old IRRE Tutorial to get a clue how to do m:n connections with intermediate inline tables.
https://docs.typo3.org/typo3cms/extensions/irre_tutorial/0.4.0/Manual/Index.html#intermediate-tables-for-m-n-relations

Depends on the issue, sometimes the intermediate table is an entity, sometimes not. In this example the intermediate table is the track, which would contain: [uid, cd, song, track_no, ... (whatever else needed to define the track)]
Be carefull when you define your data, that you do not make it too advanced.

Keeping duplicated DynamoDB records synchronized

I am currently trying to model the data for our application. The data consists of identities and groups. One group can have multiple identities and one identity can be in multiple groups. (a typical many-to-many relationship).
So I have used the Adjacency List Design Pattern to structure my data as recommended by AWS:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-adjacency-graphs.html
I keep all the info about identities duplicated inside the groups and reading the data works just fine - a normal query for the details and a query against the index to get the relations of my objects.
How can I ensure that all duplicated records have the same value?
Every time the group changes, I am updating all the duplicated group records in the database.
I am okay with updating multiple records at once as changes will happen rarely but I want to avoid inconsistent data.
All the tutorials and guides always just talk about reading and accessing data not about updating the data.
I know that there is a TransactWriteItem-Reuquest but it is limited to 25 items maximum. So is there another way/pattern to guarantee that all my identity records are updated when e.g. the name changes.

You have to decide for yourself how consistent is consistent enough in your application.
The CAP theorem is alive and well and it says that to get availability and partition tolerance we have to sacrifice consistency.
Since updates happen infrequently, how does your application fail if it sees inconsistent records? If you can't use the transactional API because of the 25 item limit, maybe you could roll your own "lock-out" using an attribute you would set on items that must all be updated together:
first, you identify all items that need to be updated and set the "lock_out" attribute on them (this can be a timestamp indicating when the lock_out expires)
in your application, you can add business logic to treat items with the "lock_out" in a way that makes sense (maybe show them as being updated, or not show them at all etc.)
update the items
after the update is complete, clear the "lock-out" attribute

Graphql Schema update rollback

We are moving some of our API's to graphql and would like to know to handle the rollback of the deployed package (Schema)and the best practice to the same.
To be more specific let's say we have a Schema S with 3 fields and then we added 4th field "A" . Now for some reason we cannot go forward with this package and field "A". So we have to perform roll back of the package so that now the Schema doesn't have field "A".
Now some consumer might ask for this field "A" and he might get an error. We could of course ask our clients to update but there is a time gap during which we might have failed request.
How do we handle this scenario,specifically an urgent rollback with in few hours or a day?

In general, you should avoid removing fields without warning to avoid the exact scenario you describe.
As your schema evolves, it's not uncommon to have some fields that are no longer needed. For example, rather than introducing a drastic change to a particular field (moving from a nullable to a non-nullable return type, adding required arguments, etc.), we may opt to add another field and encourage clients to transition to using that one instead. In such scenarios, we want to eventually remove the original field. The safest way to do so is to deprecate the field first. Using SDL, we can do so using a directive:
fieldA: String #deprecated(reason: "Use fieldB instead!")
After a certain amount of time, you can then remove the field entirely. How long you wait to remove the field depends on your team and the expectations you've communicated around handling deprecated fields. For example, you may find it helpful to set a deadline, by which point all clients are expected to have stopped using any deprecated fields. This works well as long as your client teams have the bandwidth to handle such technical debt.
A deprecated field's resolver can be changed to return a null value (if the field itself is nullable) or some minimal mock data. This prevents making unnecessary API or database calls, while still ensuring client requests don't result in an error.
In the context of your question, this means you should probably avoid rolling back to a previous release and instead follow the process outlined above for the fields you want to remove.
Alternatively, you could consider versioning. GraphQL generally shies away from the concept of versioning. As the official site explains:
Why do most APIs version? When there's limited control over the data that's returned from an API endpoint, any change can be considered a breaking change, and breaking changes require a new version. If adding new features to an API requires a new version, then a tradeoff emerges between releasing often and having many incremental versions versus the understandability and maintainability of the API.
In contrast, GraphQL only returns the data that's explicitly requested, so new capabilities can be added via new types and new fields on those types without creating a breaking change. This has led to a common practice of always avoiding breaking changes and serving a versionless API.
With that in mind, it's also feasible to still implement versioning with GraphQL by serving different schemas from different endpoints. While it's costly and usually unnecessary to go that route, it may be the right solution for you and your team, particularly if you expect to have to do similar rollbacks in the future.

You cannot do anything w.r.t GraphQL. Since you need to have the field present in the GraphQL type system. There may be libraries available, which will allow you to specify whether the field should be present in the query or not. But, there's no way of allowing non-existent field in the Query.
But what you can do is opt for a Blue-Green deployment strategy. In this strategy, you have both the versions running at the same time.
Let's say: Green is the version with Field A and Blue is the version without Field A. So when your clients are updated they start requesting the Blue version. And once all your clients are updated, shut-down the Green (with Field A).

Options for handling a frequently changing data form

What are some possible designs to deal with frequently changing data forms?
I have a basic CRUD web application where the main data entry form changes yearly. So each record should be tied to a specific version of the form. This requirement is kind of new, so the existing application was not built with this in mind.
I'm looking for different ways of handling this, hoping to avoid future technical debt. Here are some options I've come up with:
Create a new object, UI and set of tables for each version. This is obviously the most naive approach.
Keep adding all the fields to the same object and DB tables, but show/hide them based on the form version. This will become a mess after a few changes.
Build form definitions, then dynamically build the UI and store the data as some dictionary like format (e.g. JSON/XML or maybe an document oriented database) I think this is going to be too complex for the scope of this app, especially for the UI.
What other possibilities are there? Does anyone have experience doing this? I'm looking for some design patterns to help deal with the complexity.

First, I will speak to your solutions above and then I will give my answer.
Creating a new table for each
version is going to require new
programming every year since you will
not be able to dynamically join to
the new table and include the new
columns easily. That seems pretty obvious and really makes this a bad choice.
The issues you mentioned with adding
the columns to the same form are
correct. Also, whatever database you
are using has a max on how many
columns it can handle and how many
bytes it can have in a row. That could become another concern.
The third option I think is the
closest to what you want. I would
not store the new column data in a
JSON/XML unless it is for duplication
to increase speed. I think this is
your best option
The only option you didn't mention
was storing all of the data in 1
database field and using XML to
parse. This option would make it
tough to query and write reports
against.
If I had to do this:
The first table would have the
columns ID (seeded), Name,
InputType, CreateDate,
ExpirationDate, and CssClass. I
would call it tbInputs.
The second table would have the have
5 columns, ID, Input_ID (with FK to
tbInputs.ID), Entry_ID (with FK to
the main/original table) value, and
CreateDate. The FK to the
main/original table would allow you
to find what items were attached to
what form entry. I would call this
table tbInputValues.
If you don't
plan on having that base table then
I would use a simply table that tracks the creation date, creator ID,
and the form_id.
Once you have those you will just need to create a dynamic form that pulls back all of the inputs that are currently active and display them. I would put all of the dynamic controls inside of some kind of container like a <div> since it will allow you to loop through them without knowing the name of every element. Then insert into tbInputValues the ID of the input and its value.
Create a form to add or remove an
input. This would mean you would
not have much if any maintenance
work to do each year.
I think this solution may not seem like the most eloquent but if executed correctly I do think it is your most flexible solution that requires the least amount of technical debt.

I think the third approach (XML) is the most flexible. A simple XML structure is generated very fast and can be easily versioned and validated against an XSD.
You'd have a table holding the XML in one column and the year/version this xml applies to.
Generating UI code based on the schema is basically a bad idea. If you do not require extensive validation, you can opt for a simple editable table.
If you need a custom form every year, I'd look at it as kind of a job guarantee :-) It's important to make the versioning mechanism and extension transparent and explicit though.

For this particular app, we decided to deal with the problem as if there was one form that continuously grows. Due to the nature of the form this seemed more natural than more explicit separation. We will have a mapping of year->field for parts of the application that do need to know which data is for which year.
For the UI, we will be creating a new page for each year's form. Dynamic form creation is far too complex in this situation.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse