Can't change partitioning scheme for actors - azure-service-fabric

While playing with Azure Service Fabric actors, I recently found something weird: I can't change the default settings for partitioning. If I try to, say, set Named partitioning or change the low/high key for UniformInt64, it gets overwritten each time I build my project in Visual Studio. There is no problem doing this for a stateful service; it only happens with actors. No errors, no records in the Event Log, no nothing... I've found just one single reference to the same problem on the Internet -
But I haven't seen any explanation for it, neither on MSDN nor in the official documentation. Any ideas? Would it really be 'by design'?
Executing just the PowerShell script to deploy the app does allow me to set the scheme the way I want. Still, it's frustrating not to be able to do this in VS. Probably there is a good reason for that... there should be, right? :)

Reliable Services can be created with different partition schemes and
partition key ranges. The Actor Service uses the Int64 partitioning
scheme with the full Int64 key range to map actors to partitions.
Every ActorId is hashed to an Int64, which is why the actor service
must use an Int64 partitioning scheme with the full Int64 key range.
However, custom ID values can be used for an ActorID, including GUIDs,
strings, and Int64s.
When using GUIDs and strings, the values are hashed to an Int64.
However, when explicitly providing an Int64 to an ActorId, the Int64
will map directly to a partition without further hashing. This can be
used to control which partition actors are placed in.
This ActorId => PartitionKey translation strategy doesn't work if your partitions are named.
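
To make the quoted explanation concrete, here is a minimal sketch of the ActorId => PartitionKey idea. It assumes equal-width ranges over the full Int64 key space, and it uses SHA-256 purely as a stand-in hash; the hash the Actor runtime really uses is not stated above, and the function names are made up for illustration.

```python
import hashlib

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def actor_id_to_key(actor_id):
    """Map an actor ID to an Int64 partition key."""
    if isinstance(actor_id, int):
        # An explicit Int64 ID is used as the key directly, with no hashing.
        if not INT64_MIN <= actor_id <= INT64_MAX:
            raise ValueError("actor_id does not fit in an Int64")
        return actor_id
    # Strings and GUIDs are hashed down to an Int64.
    # SHA-256 is a stand-in here, NOT the hash the runtime actually uses.
    digest = hashlib.sha256(str(actor_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def partition_index(key, partition_count):
    """Index of the equal-width Int64 range that contains the key."""
    return (key - INT64_MIN) * partition_count // 2 ** 64
```

With ten partitions, `partition_index(0, 10)` lands in partition 5, while a named partition has no key range for such a lookup to fall into, which is why the translation breaks down for named partitions.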


Kafka Message Keys with Composite Values

I am working on a system that will produce Kafka messages. These messages will be organized into topics that more or less represent database tables. Many of these tables have composite keys, and this aspect of the design is out of my control. The goal is to prepare these messages in a way that lets common sink connectors consume them easily, without a lot of manipulation.
I will be using the Schema Registry and the Avro format for all of the obvious advantages. Having the entire "row" expressed as a record in the message value is fine for upsert operations, but I also need to support deletes. From what I can tell, this means my message needs a key so I can have "tombstone" messages. Also keep in mind that I want to avoid any sort of transforms unless absolutely necessary.
In a perfect world, the message key would be a "record" that included strongly-typed key-column values, and the message value would have the other column values (both controlled by the Schema Registry). However, it seems like a lot of the tooling around Kafka expects message keys to be a single, primitive value. This makes me wonder if I need to compute a key value where I concatenate my multiple key columns into a single string value and keep the individual columns in my message value. Is this right, or am I missing something? What other options do I have?
I'm assuming that you know the relationship between the message key and partition assignment.
As per my understanding, there is nothing that stops you from using a complex type like STRUCT as a key, with or without a key schema. Please refer to the API here. If you are using an out-of-the-box connector that does not support a complex type as the key, then you may have to write your own Single Message Transformation (SMT) to move the key attributes into the value.
The approach that you mentioned (concatenating columns to create the key and keeping the values of the same columns in the value attribute) would work in many cases if you don't want to write code. The only downside I can see is that your messages would be larger than required. If you don't need a partition assignment strategy and have no ordering requirement, then the message can have no key or a random key.
I wanted to follow-up with an answer that solved my issue:
The strategy I mentioned of using a concatenated string, technically worked. However, it certainly wasn't very elegant.
My original issue in using a structured key was that I wasn't using the correct converter for deserializing the key, which led to other errors. Once I used the Avro converter, I was able to get my multi-part key and use it effectively.
Both approaches, when implemented appropriately, allowed me to produce valid tombstone messages that could represent deletes.
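
For anyone comparing the two approaches, here is a rough sketch of how the key/value split and the tombstone look as plain data. The table, column names and schema name are hypothetical, and no Kafka client is involved; the (key, value) pairs would still have to be serialized (e.g. as Avro) and sent by whatever producer you use.

```python
# Hypothetical table with a two-column composite key.
KEY_COLUMNS = ("order_id", "line_no")

# Avro record schema for the structured key (names are illustrative only).
KEY_SCHEMA = {
    "type": "record",
    "name": "OrderLineKey",
    "fields": [
        {"name": "order_id", "type": "long"},
        {"name": "line_no", "type": "int"},
    ],
}


def split_row(row):
    """Split a row into (key columns, remaining columns)."""
    key = {column: row[column] for column in KEY_COLUMNS}
    value = {c: v for c, v in row.items() if c not in KEY_COLUMNS}
    return key, value


def upsert_message(row):
    """Structured key plus the non-key columns as the value."""
    return split_row(row)


def tombstone_message(row):
    """Same key as the upsert, but a null value marks the delete."""
    key, _ = split_row(row)
    return key, None


def concatenated_key(row, sep="|"):
    """The alternative: one string built from the key columns."""
    return sep.join(str(row[column]) for column in KEY_COLUMNS)
```

The important property in both variants is that the tombstone carries exactly the same key as the earlier upserts for that row.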

Cosmos DB Change Feeds in a Kubernetes Cluster with arbitrary number of pods

I have a collection in my Cosmos database that I would like to observe for changes. I have many documents (official and unofficial) explaining how to do this. There is one thing though that I cannot get to work in a reliable way: how do I receive the same changes to multiple instances when I don't have any common reference for instance names?
What do I mean by this? Well, I'm running my workloads in a Kubernetes cluster (AKS). I have a variable number of instances within the cluster that should observe my collection. In order for change feeds to work properly, I have to have a unique instance name for each instance. The only candidate I have is the pod name. It's usually in the form <deployment-name>-<random string>, e.g. pod-5f597c9c56-lxw5b.
If I use the pod name as the instance name, the instances do not all receive the same changes (which is my requirement); only one instance receives each change. What I can do is use the pod name as the feed name instead; then all instances get the same changes. This is what I fear will bite me in the butt at some point: when I peek into the lease container, I can see a set of documents per feed name. As pod names come and go (the random string part of the name), I fear the container will grow over time, generating a heap of garbage. I know Cosmos can handle huge workloads, but you know, I like to keep things tidy.
How can I keep this thing clean and tidy? I really don't want to invent (or reuse for that matter!) some protocol between my instances to vote for which instance gets which name out of a finite set of names.
One "simple" solution would be to build my own instance names, if AKS or Kubernetes held some "index" of some sort for my pods. I know stateful sets give me that, but I don't want to use stateful sets, as the pods themselves aren't really stateful (except for this particular aspect!).
There is a new Change Feed pull model (which is in preview at this time).
The differences are:
In your case, it looks like you don't need parallelization (you want all instances to receive everything). The important part would be to design a state storing model that can maintain the continuation tokens (or not, maybe you don't care to continue if a pod goes down and then restarts).
I would suggest that you proceed to use the pod name as unique ID. If you are concerned about sprawl of the data, you could monitor the container and devise a clean-up mechanism for the metadata.
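
As a rough idea of what such a clean-up could look like, here is a small sketch. It assumes each lease document exposes the feed name it belongs to in a field called `feed_name`; that field name is hypothetical, so check what your lease documents actually contain. Listing the live pods and deleting the documents is left to your own tooling.

```python
def stale_lease_ids(lease_docs, live_pod_names):
    """Return the IDs of lease documents whose pod no longer exists.

    lease_docs: iterable of dicts with an "id" and a (hypothetical)
    "feed_name" field holding the pod name used as the feed name.
    live_pod_names: names of the pods currently running.
    """
    live = set(live_pod_names)
    return [doc["id"] for doc in lease_docs if doc["feed_name"] not in live]
```

Running something like this periodically would let you delete the leases left behind by pods that are gone.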
In order to have at-least-once delivery, metadata needs to be persisted somewhere to track the items ACK-ed, the position in a partition, etc. I suspect it could take a bit of work to get the change feed processor to give you at-least-once delivery once you consider pod interruption/re-scheduling during data flow.
As another option, Azure offers an implementation of checkpoint-based message sharing from partitioned event hubs via EventProcessorClient. With EventProcessorClient, a bit of metadata is also added to a storage account.

Kafka Streams WindowStore Fetch Record Ordering

The Kafka Streams 2.2.0 documentation for the WindowStore and ReadOnlyWindowStore method fetch(K key, Instant from, Instant to) states:
For each key, the iterator guarantees ordering of windows, starting
from the oldest/earliest available window to the newest/latest window.
None of the other fetch methods state this (except the deprecated fetch(K key, long from, long to)), but do they offer the same guarantee?
Additionally, is there any guarantee on ordering of records within a given window? Or is that up to the underlying hashing collection (I assume) implementation and handling of possible hash collisions?
I should also note that we built the WindowStore with retainDuplicates() set to true. So a single key would have multiple entries within a window. Unless we're using it wrong; which I guess would be a different question...
The other methods don't have ordering guarantees, because the order depends on the byte-order of the serialized keys. It's hard to reason about this ordering for Kafka Streams, because the serializers are provided by the user.
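
A small illustration of why byte order is hard to reason about. This is not Kafka Streams code; it just sorts keys by their serialized bytes, assuming an 8-byte big-endian encoding for longs and UTF-8 for strings.

```python
import struct


def serialize_long(value):
    """8-byte big-endian two's complement encoding of a long."""
    return struct.pack(">q", value)


def serialize_string(value):
    """UTF-8 encoding of a string."""
    return value.encode("utf-8")


# Negative longs start with 0xFF bytes, so they sort AFTER the positives.
longs_by_bytes = sorted([-2, -1, 0, 1, 2], key=serialize_long)
print(longs_by_bytes)  # [0, 1, 2, -2, -1]

# Numeric strings compare character by character, not by numeric value.
strings_by_bytes = sorted(["9", "10", "2"], key=serialize_string)
print(strings_by_bytes)  # ['10', '2', '9']
```

So the order you get back depends on the serializer you plugged in, not on the natural order of the key type.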
I should also note that we built the WindowStore with retainDuplicates() set to true. So a single key would have multiple entries within a window. Unless we're using it wrong; which I guess would be a different question...
You are using it wrong :) -- you can store different keys for the same window by default. If you enable retainDuplicates() you can store the same key multiple times for the same window.

Hazelcast 3.3 - EntryProcessor is accessing "non-local" keys

I'm using Hazelcast 3.3.
One member writes entries to an IMap and calls map.executeOnEntries(myEntryProcessor). The task of the EntryProcessor is just to print the entries on the console. However, the members (3 others plus the 1st one = 4 members) seem to print overlapping sets of entries.
My understanding was that EntryProcessors get only the entries corresponding to localKeySet(). However, it appears that's not the case.
Could someone please explain this behavior?
Your reasoning is correct. An EntryProcessor should only touch local keys.
What are you using as key? Hazelcast uses the serialized version of the key as the actual key; so perhaps you have 2 different key instances that lead to the same 'toString', but their binary content is different.
I have shot myself in the foot with e.g. a HashMap being part of the key; this can lead to different binary content even though the actual content is the same, and then you get strange behavior.
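
To show what I mean, here is a sketch in Python rather than Hazelcast code. JSON stands in for the key serialization and CRC32 for the partition hash; both are assumptions for illustration only. 271 is Hazelcast's default partition count.

```python
import json
import zlib


def serialize(key):
    """Serialize a map-like key, preserving its insertion order."""
    return json.dumps(key).encode("utf-8")


def stable_serialize(key):
    """Serialize with sorted fields so equal keys give equal bytes."""
    return json.dumps(key, sort_keys=True).encode("utf-8")


def partition_for(key_bytes, partition_count=271):
    """Pick a partition from the serialized key (CRC32 as a stand-in hash)."""
    return zlib.crc32(key_bytes) % partition_count


a = {"region": "eu", "id": 1}
b = {"id": 1, "region": "eu"}

print(a == b)                                      # True: same content
print(serialize(a) == serialize(b))                # False: different bytes
print(stable_serialize(a) == stable_serialize(b))  # True
```

Two keys that are equal as objects can therefore be treated as different entries, and possibly end up in different partitions, unless the binary form is deterministic.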
If you are using e.g. Long or String as key; then I can't explain the behavior you are seeing. How difficult is it to get this reproduced?
Found the issue. The problem was not with the EntryProcessors. Actually, the code which was writing data to the distributed IMap was running on more than the desired number of members.
So, in essence, a process (launched through IExecutorService) was running on multiple instances and publishing overlapping/duplicate sets of data. The EntryProcessor was working correctly.

Using an RDBMS as event sourcing storage

If I were using an RDBMS (e.g. SQL Server) to store event sourcing data, what might the schema look like?
I've seen a few variations talked about in an abstract sense, but nothing concrete.
For example, say one has a "Product" entity, and changes to that product could come in the form of: Price, Cost and Description. I'm confused about whether I'd:
1. Have a "ProductEvent" table that has all the fields for a product, where each change means a new record in that table, plus "who, what, where, why, when and how" (WWWWWH) as appropriate. When cost, price or description are changed, a whole new row is added to represent the Product.
2. Store product Cost, Price and Description in separate tables joined to the Product table with a foreign key relationship. When changes to those properties occur, write new rows with WWWWWH as appropriate.
3. Store WWWWWH, plus a serialised object representing the event, in a "ProductEvent" table, meaning the event itself must be loaded, de-serialised and re-played in my application code in order to re-build the application state for a given Product.
Particularly I worry about option 2 above. Taken to the extreme, the product table would be almost one-table-per-property, where to load the Application State for a given product would require loading all events for that product from each product event table. This table-explosion smells wrong to me.
I'm sure "it depends", and while there's no single "correct answer", I'm trying to get a feel for what is acceptable, and what is totally not acceptable. I'm also aware that NoSQL can help here, where events could be stored against an aggregate root, meaning only a single request to the database to get the events to rebuild the object from, but we're not using a NoSQL db at the moment so I'm feeling around for alternatives.
The event store should not need to know about the specific fields or properties of events. Otherwise every modification of your model would result in having to migrate your database (just as in good old-fashioned state-based persistence). Therefore I wouldn't recommend options 1 and 2 at all.
Below is the schema as used in Ncqrs. As you can see, the table "Events" stores the related data as a CLOB (i.e. JSON or XML). This corresponds to your option 3, except that there is no "ProductEvents" table, because you only need one generic "Events" table. In Ncqrs, the mapping to your Aggregate Roots happens through the "EventSources" table, where each EventSource corresponds to an actual Aggregate Root.
Table Events:
Id [uniqueidentifier] NOT NULL,
TimeStamp [datetime] NOT NULL,
Name [varchar](max) NOT NULL,
Version [varchar](max) NOT NULL,
EventSourceId [uniqueidentifier] NOT NULL,
Sequence [bigint],
Data [nvarchar](max) NOT NULL
Table EventSources:
Id [uniqueidentifier] NOT NULL,
Type [nvarchar](255) NOT NULL,
Version [int] NOT NULL
The SQL persistence mechanism of Jonathan Oliver's Event Store implementation consists basically of one table called "Commits" with a BLOB field "Payload". This is pretty much the same as in Ncqrs, only that it serializes the event's properties in binary format (which, for instance, adds encryption support).
Greg Young recommends a similar approach, as extensively documented on Greg's website.
The schema of his prototypical "Events" table reads:
Table Events
AggregateId [Guid],
Data [Blob],
SequenceNumber [Long],
Version [Int]
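
To make the generic single-table layout more tangible, here is a minimal sketch using SQLite from Python. The table and column names are simplified and the event names are invented for the "Product" example; it is not the Ncqrs or Greg Young schema, just the same idea of an opaque serialized payload that is replayed in application code.

```python
import json
import sqlite3
from functools import reduce

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        aggregate_id TEXT    NOT NULL,
        sequence     INTEGER NOT NULL,
        name         TEXT    NOT NULL,
        data         TEXT    NOT NULL,  -- serialized event payload (JSON)
        PRIMARY KEY (aggregate_id, sequence)
    )
    """
)


def append(aggregate_id, sequence, name, payload):
    """Store one event; the table knows nothing about the payload fields."""
    with conn:
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?, ?)",
            (aggregate_id, sequence, name, json.dumps(payload)),
        )


def apply(state, event):
    """Apply one event; here every event overwrites the fields it carries."""
    _name, payload = event
    return {**state, **payload}


def load(aggregate_id):
    """Rebuild the current state by replaying the events in order."""
    rows = conn.execute(
        "SELECT name, data FROM events WHERE aggregate_id = ? ORDER BY sequence",
        (aggregate_id,),
    )
    events = ((name, json.loads(data)) for name, data in rows)
    return reduce(apply, events, {})


append("product-1", 1, "ProductCreated",
       {"price": 10, "cost": 6, "description": "Widget"})
append("product-1", 2, "PriceChanged", {"price": 12})
product = load("product-1")
print(product)  # {'price': 12, 'cost': 6, 'description': 'Widget'}
```

Adding a new event type or a new field only changes the application code; the table stays as it is.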
The GitHub project CQRS.NET has a few concrete examples of how you could do EventStores in a few different technologies. At time of writing there is an implementation in SQL using Linq2SQL and a SQL schema to go with it, there's one for MongoDB, one for DocumentDB (CosmosDB if you're in Azure) and one using EventStore (as mentioned above). There's more in Azure like Table Storage and Blob storage which is very similar to flat file storage.
I guess the main point here is that they all conform to the same principle/contract. They all store information in a single place/container/table, they use metadata to distinguish one event from another, and they 'just' store the whole event as it was: in some cases serialised, and in supporting technologies, as it was. So depending on whether you pick a document database, a relational database or even a flat file, there are several different ways to reach the same intent of an event store (which is useful if you change your mind at any point and find you need to migrate or support more than one storage technology).
As a developer on the project I can share some insights on some of the choices we made.
Firstly, we found that, even with UUIDs/GUIDs instead of integers, IDs can end up sequential for strategic reasons, so the ID alone wasn't unique enough for a key. We therefore merged our main ID key column with the data/object type to create what should be a truly unique key (in the sense of your application). I know some people say you don't need to store it, but that will depend on whether you are greenfield or have to co-exist with existing systems.
We stuck with a single container/table/collection for maintainability reasons, but we did play around with a separate table per entity/object. We found in practice that this meant either the application needed "CREATE" permissions (which, generally speaking, is not a good idea, although there are always exceptions) or new storage containers/tables/collections needed to be made each time a new entity/object came into existence or was deployed. We found this was painfully slow for local development and problematic for production deployments. You may not, but that was our real-world experience.
Another thing to remember is that asking for action X to happen may result in many different events occurring, so knowing all the events generated by a command/event/whatever is useful. They may also be spread across different object types, e.g. pushing "buy" in a shopping cart may trigger account and warehousing events. A consuming application may want to know all of this, so we added a CorrelationId. This meant a consumer could ask for all events raised as a result of their request. You'll see that in the schema.
Specifically with SQL, we found that performance really became a bottleneck if indexes and partitions weren't adequately used. Remember that events will need to be streamed in reverse order if you are using snapshots. We tried a few different indexes and found that, in practice, some additional indexes were needed for debugging real-world applications in production. Again, you'll see that in the schema.
Other metadata was useful during investigations in production. Timestamps gave us insight into the order in which events were persisted vs raised. That helped us on a particularly heavily event-driven system that raised vast quantities of events, giving us information about the performance of things like networks and the system's distribution across the network.
Well you might wanna give a look at Datomic.
Datomic is a database of flexible, time-based facts, supporting queries and joins, with elastic scalability, and ACID transactions.
I wrote a detailed answer here
You can watch a talk from Stuart Halloway explaining the design of Datomic here
Since Datomic stores facts in time, you can use it for event sourcing use cases, and so much more.
I think solutions 1 and 2 can become a problem very quickly as your domain model evolves. New fields are created, some change meaning, and some are no longer used. Eventually your table will have dozens of nullable fields, and loading the events will be a mess.
Also, remember that the event store should be used only for writes; you only query it to load the events, not the properties of the aggregate. They are separate things (that is the essence of CQRS).
Solution 3 is what people usually do, and there are many ways to accomplish that.
As an example, EventFlow CQRS, when used with SQL Server, creates a table with this schema:
CREATE TABLE [dbo].[EventFlow](
    [GlobalSequenceNumber] [bigint] IDENTITY(1,1) NOT NULL,
    [BatchId] [uniqueidentifier] NOT NULL,
    [AggregateId] [nvarchar](255) NOT NULL,
    [AggregateName] [nvarchar](255) NOT NULL,
    [Data] [nvarchar](max) NOT NULL,
    [Metadata] [nvarchar](max) NOT NULL,
    [AggregateSequenceNumber] [int] NOT NULL,
    PRIMARY KEY CLUSTERED ([GlobalSequenceNumber] ASC)
)
GlobalSequenceNumber: Simple global identification; may be used for ordering or for identifying missing events when you create your projection (read model).
BatchId: An identification of the group of events that were inserted atomically (TBH, I have no idea why this would be useful)
AggregateId: Identification of the aggregate
Data: Serialized event
Metadata: Other useful information from the event (e.g. event type used for deserialization, timestamp, originator ID from the command, etc.)
AggregateSequenceNumber: Sequence number within the same aggregate (this is useful if you cannot have writes happening out of order, so you use this field for optimistic concurrency)
However, if you are creating from scratch, I would recommend following the YAGNI principle and creating the table with the minimal required fields for your use case.
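
Here is a minimal sketch of the optimistic concurrency idea behind AggregateSequenceNumber, using SQLite from Python. It assumes a unique constraint on (aggregate id, sequence number); the table and function names are invented for the example and this is not EventFlow's own code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        aggregate_id              TEXT    NOT NULL,
        aggregate_sequence_number INTEGER NOT NULL,
        data                      TEXT    NOT NULL,
        UNIQUE (aggregate_id, aggregate_sequence_number)
    )
    """
)


def try_append(aggregate_id, expected_version, data):
    """Append an event as version expected_version + 1.

    Returns False when another writer already stored that version,
    in which case the caller should reload the aggregate and retry.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events VALUES (?, ?, ?)",
                (aggregate_id, expected_version + 1, data),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_append("product-1", 0, "created"))        # True
print(try_append("product-1", 0, "created again"))  # False: version 1 is taken
print(try_append("product-1", 1, "price changed"))  # True
```

The database rejects the second writer, so no locks are needed while the aggregate is being processed.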
One possible hint: the design used for a "Slowly Changing Dimension" (type 2) should help you cover:
order of events occurring (via surrogate key)
durability of each state (valid from - valid to)
A left fold function should also be straightforward to implement, but you need to think about future query complexity.
I reckon this is a late answer, but I would like to point out that using an RDBMS as event sourcing storage is totally possible if your throughput requirement is not high. I will just show you examples of an event-sourcing ledger I built to illustrate.
The above is an event sourcing ledger web service.
In the above, I use an RDBMS to compute states, so you can enjoy all the advantages that come with an RDBMS, like transaction support.
I also have another consumer that processes in memory to handle bursts.
One could argue that the actual event store above still lives in Kafka, as an RDBMS is slow for inserts, especially when the inserts are always appends.
I hope the code helps give you an illustration, apart from the very good theoretical answers already provided for this question.