Best Way to Manage Derived Properties - swift

I have a couple of custom NSManagedObject subclasses with various relationships between them. Below is a very simplified example. In production there should be ~10 instances of A, >=10k instances of B, and <30 instances of C.
First, I'm trying to track the sum of B.value for specific categories in A. Second, I'm tracking the sum of B.value in C if B.date falls between C.startDate and C.endDate. C instances form a linked list representing sequential windows in time.
If B.value changes, manually walking to A and C and updating the cached value in each is fairly simple. Updating the date in B is a little tougher, as I'd have to search through the list to find the right window and update it.
With all of this in mind, I've been trying to determine the best way in Core Data to keep these cached values up to date. My current thought is a mediator pattern, NotificationCenter, or KVO. The mediator pattern is not super flexible but would work. NotificationCenter seems ideal, however I'm not sure how to ensure that all instances of C are always in memory and subscribed to the publisher (see the sketch after the diagram below). KVO seems solid, but it doesn't seem to report edits for objects already in to-many relationships. What is the best way to keep these objects in sync with each other in the most Core Data-esque way?
+---------+            +--------+            +---------+
|A        |            |B       |            |C        |
+---------+            +--------+            +---------+
|totalCatA| <------->> |category| <<-------> |total    |
|totalCatB|            |date    |            |startDate|
+---------+            |value   |            |endDate  |
                       +--------+            |prevC    |
                                             |nextC    |
                                             +---------+
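
A minimal sketch of the NotificationCenter route, assuming entities shaped like the diagram (the B class and its attribute names below are stand-ins): a single long-lived observer of NSManagedObjectContextObjectsDidChange sees every edit processed by the context, including edits to objects already inside to-many relationships, so no C instance has to stay resident just to stay subscribed.

import CoreData

// Assumed stand-in for the real B subclass from the diagram above.
final class B: NSManagedObject {
    @NSManaged var category: String
    @NSManaged var date: Date
    @NSManaged var value: Double
}

final class DerivedValueUpdater {
    private var token: NSObjectProtocol?

    init(context: NSManagedObjectContext) {
        // One observer owned by, say, the persistence controller; it fires
        // for every change the context processes.
        token = NotificationCenter.default.addObserver(
            forName: .NSManagedObjectContextObjectsDidChange,
            object: context,
            queue: nil
        ) { note in
            let updated  = note.userInfo?[NSUpdatedObjectsKey]  as? Set<NSManagedObject> ?? []
            let inserted = note.userInfo?[NSInsertedObjectsKey] as? Set<NSManagedObject> ?? []

            for case let b as B in updated.union(inserted) {
                let changed = b.changedValues()  // attribute name -> new value
                if changed["value"] != nil || changed["category"] != nil {
                    // Recompute A.totalCatA / A.totalCatB for b's A here.
                }
                if changed["date"] != nil {
                    // Fetch the C whose startDate...endDate now contains b.date
                    // and move b.value between the old and new cached totals.
                }
            }
        }
    }

    deinit {
        if let token = token { NotificationCenter.default.removeObserver(token) }
    }
}

Deleted objects arrive under NSDeletedObjectsKey the same way, and because the work is driven by the notification rather than by live C instances, the affected windows can be fetched on demand.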

Related

Scala: best way to update a delta table after filling missing values

I have the following delta table
+-+----+
|A|B   |
+-+----+
|1|10  |
|1|null|
|2|20  |
|2|null|
+-+----+
I want to fill the null values in column B based on the A column.
I came up with the following to do so:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.last

var df = spark.sql("select * from MyDeltaTable")
val w = Window.partitionBy("A")  // no ordering: the frame is the whole partition
df = df.withColumn("B", last("B", true).over(w))  // last non-null B per A
Which gives me the desired output:
+-+----+
|A|B   |
+-+----+
|1|10  |
|1|10  |
|2|20  |
|2|20  |
+-+----+
Now, my question is:
What is the best way to write the result back to my delta table correctly?
Should I MERGE? Re-write with the overwrite option?
My delta table is huge and will keep on increasing; I am looking for the best possible method to achieve this.
Thank you
It depends on the distribution of the rows that contain the null values you'd like to fill (i.e. are they all in one file or spread through many?).
MERGE will rewrite entire files, so you may end up rewriting enough of the table to justify simply overwriting it instead. You'll have to test this to determine what's best for your use case.
Also, to use MERGE, you need to filter the dataset down to only the changes (e.g. only the rows whose B was null before the fill). Your example "desired output" table has all the data, which would fail to MERGE in its current state because there are duplicate keys.
Check the Important! section in the docs for more.

Ensure that consistency is respected with Kafka, Springboot and MongoDB

Let's assume that I have data in a Kafka topic employee-topic, two Spring Boot instances spring1 and spring2 of the same application, which store/retrieve data in/from a MongoDB.
Let's assume that we have a table employee containing two fields: id and amount.
employee is populated as follows:
+---+---------+
| id|   amount|
+---+---------+
|  1|      200|
+---+---------+
In the Kafka topic, we have 2 messages, containing 2 amounts: "amount1": -200 and "amount2": -100.
Let's say that our application will use spring1 to consume data containing "amount1" and spring2 to consume data containing "amount2".
The objective is to update the value in the employee table.
While updating the value in MongoDB, I could have 2 possibilities:
spring1 updates before spring2, which means: spring1 reads 200, does the sum (-200 + 200) and the amount becomes 0; then spring2 does the same operation (-100 + 0) and the amount becomes -100.
+---+---------+
| id|   amount|
+---+---------+
|  1|     -100|
+---+---------+
spring1 updates after spring2, which means: spring2 reads 200 and does the sum (-100 + 200); then spring1 does the same operation (-200 + (-100)) and the amount becomes -300.
+---+---------+
| id|   amount|
+---+---------+
|  1|     -300|
+---+---------+
This behaviour is random. How can I enforce a rule, without impacting performance, to get the required behaviour?
Thanks in advance for your help.
It's essentially a concurrency issue, so there are a couple of options I can suggest:
Event Sourcing:
Instead of doing updates, find a way to append a stream of events.
Instead of:
Current Amount is 200
Update current amount to current amount -100
Update current amount to current amount -200
Get current amount
Do:
Insert new record with amount of -100
Insert new record with amount of -200
Starting from the beginning (or a snapshot), aggregate the total amount to get the current amount (see the sketch below)
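
A minimal, language-agnostic sketch of that idea (written in Swift only to match the rest of this page; the names are invented): both instances append deltas instead of racing to update one row, so the fold produces the same total in either consumption order.

struct AmountEvent {
    let employeeID: Int
    let delta: Int
}

// Append-only log: spring1 and spring2 each insert, neither overwrites.
let events = [
    AmountEvent(employeeID: 1, delta: 200),   // opening balance as an event
    AmountEvent(employeeID: 1, delta: -200),  // appended by spring1
    AmountEvent(employeeID: 1, delta: -100),  // appended by spring2
]

// Current amount = fold over the stream (or over a snapshot plus the tail).
let current = events
    .filter { $0.employeeID == 1 }
    .reduce(0) { $0 + $1.delta }
print(current)  // -100 regardless of which instance appended first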
Partition your Topic
I believe this would best serve your purposes.
You could pre-assign the database resources to a specific spring instance, based on the id.
Essentially your producer could put messages with odd ids on a topic or partition for spring1 to process, and messages with an even id on a separate topic or partition for spring2 to process. Each spring service would know that the database record to update will not be updated by another spring service, removing the concurrency issue.
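
A tiny sketch of that routing rule (partition(forEmployeeID:partitionCount:) is an invented helper; in practice you would set the Kafka message key to the id and let the producer's partitioner do this same modulo-style hashing):

func partition(forEmployeeID id: Int, partitionCount: Int) -> Int {
    // The same id always maps to the same partition, so exactly one
    // consumer instance ever touches a given employee record.
    return id % partitionCount
}

print(partition(forEmployeeID: 1, partitionCount: 2))  // 1 -> spring1's partition
print(partition(forEmployeeID: 2, partitionCount: 2))  // 0 -> spring2's partition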
Acquire a Lock
I can't speak at depth on this, but a last resort might be some kind of distributed locking mechanism: a lock file, a DB flag, a lock message, etc.

Using NSPredicate to refer to other NSPredicate rules

Let's say I have a Core Data database for NSPredicate rules.
enum PredicateType: Int {
    case beginsWith
    case endsWith
    case contains
}
My Database looks like below
+------+-----------+
| Type | Content   |
+------+-----------+
| 0    | Hello     |
| 1    | end       |
| 2    | somevalue |
| 0    | end       |
+------+-----------+
I have the content "This is end". How can I query Core Data to check if there is any rule that satisfies this content? It should find the second entry in the table
+------+-----------+
| Type | Content   |
+------+-----------+
| 1    | end       |
+------+-----------+
but shouldn't find
+------+-----------+
| Type | Content   |
+------+-----------+
| 0    | end       |
+------+-----------+
Because in this sentence "end" is not at the beginning.
Currently I am fetching all the rows, creating a predicate from each Content and Type, and querying the database again, which I believe is a big overhead.
The way you are doing it now is correct. You first need to build your predicates (which in your case is a complex operation that also requires fetching) and run each one to see which matches.
I wouldn't be so quick to assume that there is a huge overhead in this. If your data set is small (<300), I suspect there would be no problem at all. If you are experiencing problems then (and only then!) you should start optimizing.
If you see the app running too slowly, use Instruments to find where the issue is. There are two places I could see having performance issues: 1) the fetching of all the predicates from the database, and 2) the running of all of the predicates.
If you want to make the fetching faster, I would recommend an NSFetchedResultsController. While it is generally used to keep data in sync with a table view, it can be used for any data that you want to be correct at any time. With the controller you do a single fetch, and it then monitors Core Data and keeps itself up to date. When you need all of the predicates, instead of doing a fetch you simply access the controller's fetchedObjects property.
If you find that running all the predicates takes a long time, you can improve the beginsWith and endsWith cases with a clever use of binary search. Keep two arrays of custom predicate objects, one sorted alphabetically and the other with all the reversed strings sorted alphabetically. To find which string the content begins with, use indexOfObject:inSortedRange:options:usingComparator: to find the relevant objects. I don't know how you can improve contains. You could see if running string methods on the objects is faster than NSPredicate methods. You could also try running the predicates concurrently on a background thread.
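
If that optimization ever becomes necessary, here is a minimal sketch of the beginsWith half using a plain sorted Swift array rather than indexOfObject:inSortedRange:options:usingComparator:; running one exact-match binary search per prefix of the input keeps it at O(text length × log n):

// contents must be sorted ascending; returns every stored string that the
// text begins with.
func beginsWithMatches(for text: String, inSorted contents: [String]) -> [String] {
    var matches: [String] = []
    var prefix = ""
    for character in text {
        prefix.append(character)
        // Exact-match binary search for this prefix of the input.
        var low = 0, high = contents.count - 1
        while low <= high {
            let mid = (low + high) / 2
            if contents[mid] == prefix {
                matches.append(prefix)
                break
            } else if contents[mid] < prefix {
                low = mid + 1
            } else {
                high = mid - 1
            }
        }
    }
    return matches
}

print(beginsWithMatches(for: "This is end", inSorted: ["Hello", "This", "This is"]))
// ["This", "This is"]

The endsWith case is the same search run against the array of reversed strings, with the input reversed.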
Again, you shouldn't do any of this unless you find that you need to. If your dataset is small, then the way you are doing it now is fine.
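
For reference, a minimal sketch of that current approach, assuming a Rule entity whose type and content attributes mirror the table in the question: one fetch, then each row becomes an NSPredicate that is evaluated in memory against the incoming string.

import CoreData

// Assumed stand-in for the stored rule entity (the Type/Content columns above).
final class Rule: NSManagedObject {
    @NSManaged var type: Int16
    @NSManaged var content: String?
}

enum PredicateType: Int {
    case beginsWith, endsWith, contains
}

func matchingRules(for text: String,
                   in context: NSManagedObjectContext) throws -> [Rule] {
    // One fetch for all rules; an NSFetchedResultsController's fetchedObjects
    // can replace this to avoid refetching, as suggested above.
    let request = NSFetchRequest<Rule>(entityName: "Rule")
    let rules = try context.fetch(request)

    return rules.filter { rule in
        guard let kind = PredicateType(rawValue: Int(rule.type)),
              let content = rule.content else { return false }
        let predicate: NSPredicate
        switch kind {
        case .beginsWith: predicate = NSPredicate(format: "SELF BEGINSWITH %@", content)
        case .endsWith:   predicate = NSPredicate(format: "SELF ENDSWITH %@", content)
        case .contains:   predicate = NSPredicate(format: "SELF CONTAINS %@", content)
        }
        // SELF is the string under test, so "This is end" matches the
        // (endswith, "end") row but not the (beginswith, "end") row.
        return predicate.evaluate(with: text)
    }
}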

SQL Server 2014 set based way to create a unique integer for a string combination input

I'm using SQL Server 2014 Developer Edition Service Pack 2 on Windows 7 Enterprise machine.
The question
Is there a set-based way that I can create an integer field based on a string input, while ensuring that the Entity ID field is never duplicated for different inputs?
Hypothetical table structure
|ID|Entity ID|Entity Code|Field One|From Date|To Date |
|1 |1        |CodeOne    |ValueOne |20160731 |20160801|
|2 |1        |CodeOne    |ValueTwo |20160802 |NULL    |
|3 |2        |CodeTwo    |ValueSix |20160630 |NULL    |
Given the above table, I'm trying to find a way to create the Entity ID based on the Entity Code field (it is possible that we would use a combination of fields)
What I've tried so far
Using a sequence object (I don't like this because it is too easy for the sequence to be dropped, resetting the count)
Creating a table to track the entities, creating a new Entity ID each time a new entity is discovered (I don't like this because it is not a set-based operation)
Creating a HASHBYTES hash of the Entity Code field and converting it to a BIGINT (I have no proof that this won't work, but it doesn't feel like a robust solution)
Thanks in advance all.
Your concern over HashBytes collisions is understandable, but I think you can put your worries aside; see How many random elements before MD5 produces collisions?
I've used this technique when masking tens of thousands of customer account numbers, and I've yet to witness a collision.
SELECT CAST(HASHBYTES('MD5', 'V5H 3K3') AS int)
Returns
-381163718
(Note: as illustrated above, you may see negative values. We didn't mind.)

How to filter jbehave examples table rows based on scenario meta data

Is there a way we can filter JBehave examples table rows at runtime using the scenario meta data? For example:
Scenario: my scenario title
Meta:
#id 1
Examples:
|Meta:|col1|col2|
|id 1 |val1|val2|
|id 2 |val |val |
|id 1 |val |val |
When we run this scenario it should iterate only over the 1st and 3rd rows, based on the meta data set on the scenario.
What I am trying to do is externalize data across scenarios/stories and use only the filtered data rows applicable to a particular scenario.
I found some similar topics on meta filtering, but nothing specific to this.
Appreciate any help. Thanks.
The meta character # must be used in the examples table, like this:
Scenario: some scenario
Meta: #id
Given I pass value '1'
Examples:
|Meta:|col1|col2|
|#id 1|val1|val2|
|#id 2|val |val |
|#id 1|val |val |
Then you need to define the filter in the configuration, for example:
configuredEmbedder().useMetaFilters(Arrays.asList("+id 1"));
More on this topic can be found here:
http://jbehave.org/reference/stable/meta-filtering.html