LMAX Disruptor - Maintain order of events - event-handling

I have an application that loads time series data from various files. The application opens one thread per file to load the data in parallel. The records in each file are ordered, but I need to deliver a single feed to the rest of the application that maintains the overall order of events.
Can this be implemented with the Disruptor as a multiple-producers, single-consumer design while maintaining the order of events?
I am currently using blocking collections and a sorted list to sort the head of each blocking collection, but this consumes a ton of memory, and I am interested to see whether someone else has implemented a similar design using a different architecture.
Thanks

If you redesign to something like object streams (focus on the stream), then loading from a file should only keep a minimum in memory (whatever buffer size you need). Each stream prefetches one head item.
Then you have to implement a k-way merge to pick the lowest of the N heads. You would place the streams in a binary tree; when the lowest value is popped, its stream is relocated in the tree (swaps and rotations). Popping a value is around O(log n), of course. When a stream runs dry, remove it from the tree.
It's a generalization of merging two sorted arrays: you have to re-sort the streams by their heads, which is quite different from sorting a random set, since you have a nearly ordered set with only one stream out of place. You could do a binary search on insertion, but the re-insert would be expensive in memory copies; tree rotations are simpler.
(and disruptor has nothing to do with this...lol)
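To make that concrete, here is a minimal sketch of the k-way merge (my own example; it uses Scala's PriorityQueue, i.e. a binary heap, instead of the rotating tree described above). Each source is an iterator over one already-sorted file, and only one head item per stream is held in memory.

```scala
import scala.collection.mutable

// A minimal k-way merge sketch: each source is an Iterator over already-sorted
// records, and a priority queue always holds the current head of each stream.
object KWayMerge {
  def merge[A](sources: Seq[Iterator[A]])(implicit ord: Ordering[A]): Iterator[A] = {
    // PriorityQueue is a max-heap, so reverse the ordering to pop the lowest head first.
    val heads = mutable.PriorityQueue.empty[(A, Iterator[A])](
      Ordering.by[(A, Iterator[A]), A](_._1).reverse)
    sources.foreach(s => if (s.hasNext) heads.enqueue((s.next(), s))) // prefetch one head per stream

    new Iterator[A] {
      def hasNext: Boolean = heads.nonEmpty
      def next(): A = {
        val (value, src) = heads.dequeue()                // O(log k) to pop the lowest head
        if (src.hasNext) heads.enqueue((src.next(), src)) // refill from the same stream,
        value                                             // or let a dry stream drop out
      }
    }
  }
}
```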


Aggregate design with EventSourcing and large number of events

I'd like to start an adventure with EventSourcing. As a playground I have a system that gathers data from a set of Sensors organized in Arrays. Each Sensor has a single value, like temperature. What I need from this system is
to get the current value of Sensor readings
the last month of Sensor value history
when a Sensor value changes, I have to calculate the Array "status" and store it (also for a month)
the Array "status" can be corrected manually by the user
The number of Arrays and Sensors is growing. For each Array I have many readings per second.
Now I wanted to have the Array as an Aggregate with Sensors as its entities. In this case each Sensor reading update would bump the Array Aggregate version, which gives > 10M changes per month. In this design I can't cut off old events, and I don't even want to think about the time required to restore ReadModels after a year of data.
I think I could store the current state in a CRUD table and remove the current Sensor data from the Array, keeping just the definitions. Then I can use a service that handles the Sensor data stream, checks the Array "status" and keeps the Array "status" as a separate Aggregate. The service would emit a "Sensor data update" event, which would trigger a ReadModel that keeps the historical data and handles the 1-month constraint. I would not pollute the event store with Sensor reading events, and in the case of the Array "status" I would be able to remove whole past "status" Aggregates from the event store. Arrays would keep only Sensor definitions, so the EventStore would stay relatively small.
But I lose the complete history: I can't restore my 1-month signal history ReadModel, and I would have to pay additional attention not to break it.
The goal is to learn how to scale an EventSourcing / CQRS system: how to handle a large EventStore and how to rebuild damaged ReadModels or populate new ones within hours, not days.
Does this idea fit into ES / CQRS?
(EDIT: is it OK to update an RM from an event stream that doesn't come from an Aggregate?)
How should I handle issues with a growing event store and fix broken ReadModels?
Thanks!
Does this idea fit into ES / CQRS?
One of the things that you need to be really careful about is understanding which information is under the control of your domain model, and which belongs to something outside of it.
If your sensors are physical devices in the real world, broadcasting readings, then your domain model is not the authority. That sensor data is probably going to be read, validated (ie: no corruption to the messages in transit) and stored. In other words, the sensor measurements are events (past), not commands (imperative). Throw them into a convenient data store.
With that in mind, you need to look carefully at whether your arrays are domain entities (reading in sensor data, and making interesting decisions) or projections (a reorganization of the streams of sensor measurements).
It may be useful to review When to avoid CQRS, by Udi Dahan. One of the things he talks about there is that, when done right, aggregates look like processes.
In short, make sure that you are applying the right tools to your problem.
That said, yes -- if you have enough events that folding them into a projection isn't easy, then it is hard. You have to look at how much budget you have to solve the problem, and start digging into more I/O-efficient representations of your events, more memory-efficient representations of your events, batching, etc., and trying to find different ways to partition the work among different cores.
LMAX did a pretty good job documenting the lessons they learned in processing high volume message streams; search for information about their architecture.
Aggregates with lots of events
Aggregate is a term from the write side (the C in CQRS). An Aggregate receives a command and, using its state, emits events into the event store. The Aggregate state is built from the events in the event store, so if there are a lot of events for a given aggregate, it takes time to build that state.
In order to speed up building the state for an aggregate, CQRS/ES frameworks use snapshots - a snapshot is a serialized aggregate state stored for a particular aggregate version, so you build the state not from the beginning of time but from the latest snapshot. You can store a snapshot for, say, every 100 events. And don't forget to rebuild the snapshots if your projection function changes.
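As a rough illustration of the snapshot idea (not tied to any particular framework; the event/state types and the two loader functions below are invented for this sketch):

```scala
// Hypothetical event, state and snapshot types for the sketch.
final case class SensorEvent(version: Long, sensorId: String, value: Double)
final case class ArrayState(readings: Map[String, Double]) {
  def applyEvent(e: SensorEvent): ArrayState =
    copy(readings = readings + (e.sensorId -> e.value))
}
final case class Snapshot(version: Long, state: ArrayState)

// The two loader functions stand in for whatever storage the framework uses.
final class AggregateLoader(loadSnapshot: String => Option[Snapshot],
                            loadEventsSince: (String, Long) => Seq[SensorEvent]) {
  def rebuild(aggregateId: String): ArrayState = {
    val snap = loadSnapshot(aggregateId)                     // latest snapshot, if any
    val from = snap.map(_.version + 1).getOrElse(0L)
    val base = snap.map(_.state).getOrElse(ArrayState(Map.empty))
    loadEventsSince(aggregateId, from)                       // only events after the snapshot
      .foldLeft(base)((state, e) => state.applyEvent(e))     // fold the remaining events
  }
}
```

With a snapshot stored every 100 events, as above, the fold never has to replay more than about 100 events.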
Frameworks such as reSolve are doing this for you transparently.
Your scenario
In your particular case it seems to me that your business logic is trivial, meaning you don't need an aggregate state to calculate anything or to make a decision - there is no business logic, you essentially just store events as they are generated by the sensors. So in your custom framework you can simply avoid building an aggregate state on the write side - just store the events as the sensor data comes in.
On the read side you would use the event stream as usual - upon receiving an event you can store it in the Read Model database with the necessary categorization or time slots.
If you don't need old data in the ReadModel, you can just skip old events during rebuilding - it should be very fast.
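A small sketch of that read side (my own illustration; the reading type and the one-month retention are assumptions, and the model is held in memory here for brevity where a real ReadModel would be a database table):

```scala
import java.time.{Duration, Instant}

// Keep only the last month of readings per sensor and skip anything older,
// which also makes rebuilds cheap.
final case class SensorReading(sensorId: String, at: Instant, value: Double)

final class LastMonthReadModel(retention: Duration = Duration.ofDays(30)) {
  private var history: Map[String, Vector[SensorReading]] = Map.empty

  def handle(event: SensorReading, now: Instant = Instant.now()): Unit = {
    val cutoff = now.minus(retention)
    if (!event.at.isBefore(cutoff)) {                 // skip old events during a rebuild
      val kept = (history.getOrElse(event.sensorId, Vector.empty) :+ event)
        .filter(r => !r.at.isBefore(cutoff))          // drop readings that have aged out
      history = history.updated(event.sensorId, kept)
    }
  }

  def lastMonth(sensorId: String): Vector[SensorReading] =
    history.getOrElse(sensorId, Vector.empty)
}
```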
If you don't want to store old events in the event store, you can delete them, but then it would no longer be real event sourcing.

Kafka : Generating unique IDs for strings across partitions

I'm trying to assess whether Kafka could be used to scale out our current solution.
I can identify partitions easily. Currently the requirement is for 1500 partitions, each having 1-2 events per second, but in the future it might go as high as 10000 partitions.
But there is one part of our solution that I don't know how to solve in Kafka.
The problem is that each message contains a string and I want to assign a unique ID to each string across the whole topic. So same strings have the same ID while different strings have different IDs. The IDs don't need to be sequential, nor do they need to be always-growing.
The IDs will then be used down-stream as unique keys to identify those strings. The strings can be hundreds of characters long, so I don't think they would make efficient keys.
A more advanced usage would be where messages contain different "kinds" of strings, so there would be multiple independent sequences of IDs, and a message would contain only some of those kinds depending on the type of the message.
Another advanced usage would be that the values are not strings but structures, and whether two structures are the same would follow some more elaborate rule, e.g. if PropA is equal then the structures are equal; if not, the structures are equal if PropB is equal.
To illustrate the problem: each partition is a computer in a network, and each event is an action on that computer. Events need to be ordered per computer so that events that change the state of the computer (e.g. a user logged in) can affect other types of events; ordering is critical for that. E.g. the user opened an application, a file is written, a flash drive is inserted, etc. And I need each application, file, flash drive, and many others to have unique identifiers across all computers. This is then used to calculate statistics downstream. And sometimes an event can have multiple of those, e.g. an operation on a specific file on a specific flash drive.
There is a very nice post about Kafka and blockchain. It is collective-mind work, and I think it could solve your ID scalability issue. For the solution, refer to the "Blockchain: reasons" part. All credit goes to the respective authors.
The idea is simple, yet efficient:
Data is hash-based, with a link to the previous block
The data may very well be those same hashes, i.e. links to the respective blocks of each type
A custom blockchain solution means you are in control of data encoding/decoding
Each hash chain is self-contained, and may essentially be your process (hdd/ram/cpu/word/app, etc.)
Each hash chain may be a message itself
Bonus: statistics and analytics may very well be stored in the blockchain, with good support for compression and replication. Consumers are pretty cheap in that context (scalability).
Pros:
The unique identifier issue is solved
All records are linked and, thanks to Kafka & blockchain, highly ordered
The data is extendable
Kafka's properties still apply
Cons:
Encryption/decryption is CPU-intensive
The level of hash-calculation complexity grows over time
Problem: without the problem context it's hard to approximate the limitations that need to be addressed further. However, assuming the computed solution is finite in nature, you should have no issues scaling it in the regular way.
Bottom line:
Without knowing the requirements in terms of speed/cost/quality, it's hard to give a better, well-backed answer with a working example. A CPU cloud extension may be comparably cheap; data storage depends on how long and how much data you want to store, on replayability, etc. It's a good chunk of work. A prototype? The concept is in the referenced article.
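To make the hash-based identifier idea concrete, here is a minimal sketch (my own example, not code from the referenced post). Deriving the ID from the content itself means the same string gets the same ID on every partition, with no coordination or global sequence needed:

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object ContentIds {
  // Deterministic, content-derived ID: equal strings always map to the same ID.
  def idFor(kind: String, value: String): String = {
    val sha = MessageDigest.getInstance("SHA-256")
    // Prefixing the "kind" keeps a separate ID space per kind of string.
    val digest = sha.digest((kind + '\u0000' + value).getBytes(StandardCharsets.UTF_8))
    digest.map("%02x".format(_)).mkString
  }
}
```

For the structure case with the more elaborate equality rule, you would hash a canonical form of whichever property actually defines equality (PropA, or PropB as the fallback).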

How to ensure external projections are in sync when using CQRS and EventSourcing?

I'm starting a new application and I want to use CQRS and Event Sourcing. I get the idea of replaying events to recreate aggregates, and of snapshotting to speed that up if needed, using in-memory models, caching, etc.
My question is regarding large read models I don't want to hold in memory. Suppose I have an application where I sell products, and I want to listen to a stream of events like "ProductRegistered" "ProductSold" and build a table in a relational database that will be used for reporting or integration with another system. Suppose there are lots of records and this table may take from a few seconds to minutes to truncate/rebuild, and the application exports dozens of these projections for multiple purposes.
How does one handle the consistency of the projections in this scenario?
With in-memory data, it's quite simple and fast to replay the events. But I feel that external projections kept on disk will be much slower to rebuild.
Should I always start my application with a TRUNCATE TABLE + rebuild for every external projection? This seems impractical to me over time, but it may be a problem I just haven't hit yet.
Since the table is itself like a snapshot, I could keep a "control table" recording the last event I handled for that projection, so I can replay only what's needed. But I'm worried about inconsistencies if the application or database crashes. It seems that checking the consistency of the table and rebuilding it would amount to the same thing, which points back to solution 1.
How would you handle that in a way that is maintainable over time? Are there better solutions?
Thank you very much.
One way to handle this is the concept of checkpointing. Essentially either your event stream or your whole system has a version number (checkpoint) that increments with each event.
For each projection, you store the last committed checkpoint that was applied. At startup, you pull events greater than the last checkpoint number that was applied to the projection, and continue building your projection from there. If you need to rebuild your projection, you delete the data AND the checkpoint and rerun the whole stream (or set of streams).
Caution: the last applied checkpoint and the projection's read models need to be persisted in a single transaction to ensure they do not get out of sync.
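As a sketch of that caution (the table names, event shape and plain-JDBC approach are my own assumptions), a projection handler might look like this:

```scala
import java.sql.Connection

// Apply one "ProductSold" event to the read model and advance the checkpoint in
// the same transaction; on failure neither change is committed.
object ProductReportProjector {
  def applyProductSold(conn: Connection, checkpoint: Long, productId: String): Unit = {
    conn.setAutoCommit(false)
    try {
      val updateRow = conn.prepareStatement(
        "UPDATE product_report SET units_sold = units_sold + 1 WHERE product_id = ?")
      updateRow.setString(1, productId)
      updateRow.executeUpdate()

      val saveCheckpoint = conn.prepareStatement(
        "UPDATE projection_checkpoint SET last_checkpoint = ? WHERE projection = 'product_report'")
      saveCheckpoint.setLong(1, checkpoint)
      saveCheckpoint.executeUpdate()

      conn.commit()              // read model and checkpoint move together
    } catch {
      case e: Exception =>
        conn.rollback()          // crash-safe: neither update is applied
        throw e
    }
  }
}
```

At startup you read last_checkpoint and subscribe to the stream from there; to rebuild, you clear both the report rows and the checkpoint row, exactly as described above.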

One big and wide table or many not so big for statistics data

I'm writing the simplest possible analytics system for my company. I have about 100 different event types that should be collected across tens of projects. We are not interested in cross-project analytic queries, but events have similar types across all projects. I use PostgreSQL as the primary storage for this system. Now I need to decide which architecture is preferable.
The first architecture is one very big table (in terms of row count) per project that contains data for all types of events. It would have about 20 or more columns, many of them nullable. Partitioning might be used to split this table by event type, but the table would still be just as wide.
The second architecture is a lot of tables (fairly big in terms of row count but not as wide), with one table per event type.
I am going to retrieve analytic data from these tables using various join queries (self-joins in the case of the first architecture). Which one is preferable, and what are the pitfalls of each?
UPD. All events have about 10 common attributes, and the remaining attributes vary from one event type to another.
In the past, I've had similar situations. With postgres you have a bunch of options.
Depending on how your data is input into the system (all at once/ a little at a time) and the volume of your data per project (hundreds of data points vs millions of data points) and the querying pattern (IE, querying after the data is all in, querying nightly, or reports running constantly throughout), there are many options. One other factor will be IF new project types (with new data point types) are likely to crop up.
First, in your "first architecture" the first question that comes up for me is: are all the "data points" the same data type (or at least very similar)? Are some text and others numeric? Are some integers and others floats? If so, you're likely to run into issues rolling up your data without building either a column or a table for every data type.
If all your data is the same datatype, then the first architecture you mentioned might work really well.
The second architecture you mentioned is OK especially if you don't predict having a bunch of new project types coming down the pike anytime soon, otherwise, you'll be constantly modifying the DB, which I prefer to avoid when unnecessary.
A third architecture that you didn't mention is to have a combination of 1 and 2. Basically have 1 table to hold the 10 common attributes and use either 1 or 2 to hold the additional attributes. This would have an advantage, especially if the additional data wasn't that frequently used, or was non-numeric.
Lastly, you could use one of PostgreSQL's "document store" type datatypes. You could store this information in arrays, hstore, or json. Now, this will be fairly inefficient if you're doing a ton of aggregate functions, as you might be left calculating the aggregates outside of PostgreSQL or, at a minimum, running an inefficient query. You could store the 10 common fields in normal columns and the additional ones as hstore or json.
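If it helps, here is a rough JDBC sketch of that combined layout (my own example; the connection string, table and column names, and the duration_ms field are made up): the common attributes live in regular columns and the event-specific ones in a jsonb payload.

```scala
import java.sql.DriverManager

object HybridEventSchema {
  def main(args: Array[String]): Unit = {
    // Connection details are placeholders.
    val conn = DriverManager.getConnection("jdbc:postgresql://localhost/analytics", "user", "secret")
    try {
      val st = conn.createStatement()
      st.execute(
        """CREATE TABLE IF NOT EXISTS events (
          |  id          bigserial   PRIMARY KEY,
          |  project_id  int         NOT NULL,    -- ...plus the other common attributes
          |  event_type  text        NOT NULL,
          |  occurred_at timestamptz NOT NULL,
          |  payload     jsonb       NOT NULL DEFAULT '{}'::jsonb  -- event-specific attributes
          |)""".stripMargin)

      // Aggregating over a payload field works, but is less efficient than a
      // dedicated column - the trade-off mentioned above.
      val rs = st.executeQuery(
        "SELECT event_type, avg((payload->>'duration_ms')::numeric) FROM events GROUP BY event_type")
      while (rs.next()) println(s"${rs.getString(1)} -> ${rs.getBigDecimal(2)}")
    } finally conn.close()
  }
}
```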
I didn't ask, but it'd be nice to know whether each event within a project has more than one data point (i.e. are you logging changes, or just updating data). If your overall table has fewer than 100,000 rows, it's likely best to focus on what's easier to maintain and program rather than on performance, as small amounts of data are pretty quick regardless of how they're stored.

scala/akka/stm design for large shared state?

I am new to Scala and Akka and am considering using them to solve a problem. Suppose I have a calculation engine (that searches for a solution). I'd like to parallelize that search both across CPUs and across nodes by giving each CPU on each node its own engine instance.
The engine inputs consist of a small number of scalar inputs and a very large hash table. Each engine instance would use its scalar inputs to make some small local change to the hash table, calculate a goodness, then discard its changes (they do not need to be committed/seen by any other engine instance). The goodness value would be returned to some coordinator that would choose among the results.
I have been reading a bit about the STM TransactionalMap as a vehicle for shared state. It seems ideal, but I don't really see any complete examples using it as shared state.
Questions:
Does the actor/stm model seem right for this problem?
Can you show a specific example of how to distribute the shared state? (Is it a Ref[TransactionalMap[,]] passed as a message?)
Is there anything different about distributing the shared state within a node as opposed to across different nodes?
Inquiring Minds Want to Know,
Allan
In terms of handling shared memory, it doesn't sound like STM would be the right fit here, because you don't want the changes made in engine instances to be committed to the shared copy of the hash table.
Instead, an immutable HashMap might be a better fit. The elements that do not change in the map are shared by the engine instances, with only the differences in each map taking additional memory.
The actor model would fit what you want to do very well. Set up one actor for each engine instance you want and pass it a message with the scalar values and the hashmap. Then have it return the results to the coordinator.
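A rough sketch of that setup (my own example; the message types and the "goodness" calculation are invented), using classic Akka actors and an immutable HashMap:

```scala
import akka.actor.{Actor, ActorSystem, Props}
import scala.collection.immutable.HashMap

// Messages: the coordinator sends the scalar inputs plus the (immutable) table,
// and gets a goodness score back. Names here are illustrative only.
final case class Evaluate(scalars: Vector[Double], table: HashMap[String, Double])
final case class Goodness(score: Double)

class EngineActor extends Actor {
  def receive: Receive = {
    case Evaluate(scalars, table) =>
      // A "local change" on an immutable map: structural sharing means only the
      // changed entries take extra memory, and the shared table is never mutated.
      val local = table.updated("candidate", scalars.sum)
      val score = local.values.sum      // stand-in for the real goodness function
      sender() ! Goodness(score)        // reply to the coordinator; `local` is then discarded
  }
}

object Search {
  def main(args: Array[String]): Unit = {
    val system  = ActorSystem("search")
    val engines = (1 to Runtime.getRuntime.availableProcessors())
      .map(i => system.actorOf(Props[EngineActor](), s"engine-$i"))
    // A coordinator (plain code or another actor) would send each engine an
    // Evaluate with its scalar inputs and the shared table, then pick the best reply.
  }
}
```

For the cross-node part, Akka remoting or clustering lets you run the same engine actors on other nodes; the messages just need to be serializable.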