Efficient way to keep state in reactive stream scan(), atomic, etc? - reactive-programming

Last time, I started implementing a bitbay.net subscription on orders.
The problem is that bitbay returns a delta of orders, but I always want to keep the whole price depth (so I have to keep the full price depth on my side and update it whenever a delta event occurs):
bid ask                                       bid ask
----------                                    -----------
A   D      --->delta-event(removed=D)--->     A   F
B   F                                         B   G
C   G                                         C
So I decided to use
Flux
    .from(eventsFromBitbay)
    .scan(FullPriceDepth.empty(), (pd, e) -> pd.update(e))
    .subscribe(...)
My question is: will Flux.scan(...) be a good choice for that (in terms of efficiency and thread safety)? I'm talking about millions of events in a high-speed system.
My alternative is to make some Atomic... and update it in Flux.create(...).map(e -> atomicHere). Or is there something better?
Is Flux.scan() more efficient than Atomic..., and why or why not?

"My question is Flux.scan(...) will be a good choice for that?"
Sure, why not? It's an obvious pattern, if you ask me. You have a class that holds information needed to process the flux. You should keep a couple things in mind though, mostly that the order of a flux is easy changed, for example by using Flux::flatMap instead of Flux::flatMapSequential, so you could easily get things in any order. Also, someone could put the flux on multiple threads so your FullPriceDepth properties might have to code for concurrency issues.
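For concreteness, here is a minimal sketch of both variants from the question, assuming FullPriceDepth is immutable and update(e) returns a new snapshot; eventsFromBitbay and publish(...) are placeholders, not real APIs:

import java.util.concurrent.atomic.AtomicReference;
import reactor.core.publisher.Flux;

// Variant 1 (scan): the state lives inside the operator; each delta produces
// a fresh snapshot, so no locking is needed if update() never mutates.
Flux.from(eventsFromBitbay)
    .scan(FullPriceDepth.empty(), (pd, e) -> pd.update(e))
    .subscribe(depth -> publish(depth));

// Variant 2 (Atomic): the state lives outside the pipeline; updateAndGet
// stays consistent even across threads, at the cost of CAS retries.
AtomicReference<FullPriceDepth> state = new AtomicReference<>(FullPriceDepth.empty());
Flux.from(eventsFromBitbay)
    .map(e -> state.updateAndGet(pd -> pd.update(e)))
    .subscribe(depth -> publish(depth));

On a single-threaded, ordered flux the two should perform similarly; the Atomic version only pays extra when several threads contend on the same reference.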


Is there a hard definition of how CQRS should be applied and CQRS questions

I have some trouble understanding what the CQRS pattern really is at its core, meaning: what are the red lines that, when crossed, mean we are no longer implementing the CQRS pattern?
I clearly understand the CQS pattern.
Suppose that we implement microservices with CQRS, without event sourcing.
1) First question: does the CQRS pattern only apply to the client I/O? Meaning, hoping I get this right, that for example the client updates using controllers that write to database A, but reads by querying controllers that read from database B (B is eventually updated, and may be aggregating information from multiple models, using events sent by the controller of A).
Or is this not about the client, but about anything, for example another microservice reading/writing? And if the latter, what is really the borderline that defines who is the reader/writer that causes the segregation?
Does this maybe have to do with the domains in DDD?
This is an important question in my mind, because without it, CQRS is just an interdependence of model B being updated by model A, but not the reverse. And why wouldn't this be propagated from model B to a model C, for example?
I have also read articles stating that some people implement CQRS by having one Command Service and one Query Service, which complicates this even more.
2) Similar to the first question: why do some references speak of events as if they are the Commands of CQRS? This complicates CQRS in my mind because, technically, with one request event we can ask a service "Hey, please give me the information X" and the service can respond with an event that contains the payload, effectively doing a query. Is this a hard rule, or just an example to state that we can update using events and query using REST?
3) What if, in most cases I write to model A, and I read from model B, but in some cases I read directly from model A?
Am I breaking CQRS?
What if my querying needs are very simple, should I duplicate model A in this situation?
4) What if, as stated in question 1), I read from model A to emit events, to produce model B, but then I want to read some information from model B, because it is valuable since it is aggregated, in order to produce model C?
Am I breaking CQRS?
What is the controller that populates model B doing in that case, e.g. if it also emits events to populate model C? Is it simply a Command anyway, because it is not the part that queries, so we still apply CQRS?
Additionally, what if we read from model A to emit events to produce model B, but while we produce model B we emit events to send client notifications? Is that still CQRS?
5) When is CQRS broken?
If I have a controller that reads from model B, but also emits a message that updates model A, is that it?
Finally, if that controller, e.g. a REST controller, reads from model B and simultaneously emits a message to update model A, but without including any information from what was read from model B (so the operation is two-in-one, but it does not use information from B to update A), is that still CQRS?
And what if a REST controller that updates model A also returns some information to the client that has been read from A; does that break CQRS? What if it is just an id? And what if the id is not read from A, but is just a randomly generated reference number? Is there a problem in that case, because the REST controller updates but also returns something to the user?
I will really appreciate your patience in replying, as you can see that I'm still quite confused and in the process of learning!
Is there a hard definition of how CQRS should be applied and CQRS questions
Yes, start with Greg Young.
CQRS is simply the creation of two objects where there was previously only one. The separation occurs based upon whether the methods are a command or a query (the same definition that is used by Meyer in Command and Query Separation, a command is any method that mutates state and a query is any method that returns a value). -- Greg Young 2010
It's "just a pattern", born of the fact that data representations which are effective for queries are frequently not the patterns that are effective for tracking change. For example, using an RDBMS for storing business data may be our best choice for maintaining data integrity, but for certain kinds of queries we might want to use a replicate of that data in a graph database.
why do some references speak of events as if they are the Commands of CQRS
HandleEvent is a command. CommandReceived is an event. It's very easy for readers (and authors!) to confuse the contexts that are being described. They are all "just" messages; the semantics of one or the other really depend on the direction the message is traveling relative to the authority for the information in the message.
For example, if I fill out a form on an e-commerce site and submit: is the corresponding message an OrderSubmitted event, or is it a PlaceOrder command? Either spelling could be the correct one, depending on how you choose to model the ordering process.
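For instance, the two spellings might be modeled as messages like these (illustrative Java records; the field lists are my assumption):

import java.time.Instant;

record PlaceOrder(String orderId) {}                  // command: "please do this", aimed at the authority
record OrderSubmitted(String orderId, Instant at) {}  // event: "this happened", a fact others may react to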
What if, in most cases I write to model A, and I read from model B, but in some cases I read directly from model A? Am I breaking CQRS?
The CQRS police are not going to come after you if you read from write models. In many architectures, the business logic is executed in a stateless component and will depend on reading the "current" state from some storage appliance -- in other words, supporting a write often requires a read.
Pessimizing a write model to support read use cases is the thing we are trying to avoid.
Also: horses for courses. It's entirely appropriate to restrict the use of CQRS to those cases where you can profit from it. When GET/PUT semantics of a single model work, you should prefer them to separate models for reads and writes.
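A minimal sketch of that "a write often requires a read" point, with all names hypothetical: the command handler performs a read on the write side before it can decide and mutate.

record ChangeAddress(String customerId, String newAddress) {}  // hypothetical command

class Customer {                       // hypothetical write-model aggregate
    private String address;
    void changeAddress(String a) { this.address = a; }
}

interface CustomerWriteStore {         // hypothetical write-side repository
    Customer load(String id);
    void save(Customer c);
}

class CustomerCommandHandler {
    private final CustomerWriteStore writeStore;

    CustomerCommandHandler(CustomerWriteStore writeStore) { this.writeStore = writeStore; }

    void handle(ChangeAddress cmd) {
        Customer current = writeStore.load(cmd.customerId()); // a read, in service of a write
        current.changeAddress(cmd.newAddress());              // business decision + mutation
        writeStore.save(current);                             // the actual write
    }
}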

What are the (dis)advantages of early binding?

I'm researching the pros and cons of early and late binding in CRM. I've got a good idea on the subject but there are some points I'm unclear about.
Some say that early binding is the fastest, others that late is. Is there any significant difference?
How does one handle early binding for custom entities?
How does one handle early binding for default entities with custom fields?
There are a lot of links, but the most useful ones I got my mouse on were these. Any other pointers?
Pro both
Pro early
Pro late
Some say that early binding is the fastest, others that late is. Is there any significant difference?
a. Since early bound is just a wrapper over the late bound entity class, and contains all the functionality thereof, it can't have a faster runtime than late bound. But this difference is extremely small, and I defer to Eric Lippert on "what's fastest" types of questions. The one difference in speed that isn't negligible, though, is the speed of development: early bound is much faster for development, and much less error prone IMHO.
How does one handle early binding for custom entities?
a. The CrmSvcUtil generates the early bound classes for custom entities exactly like the default ones (I created this tool to make generating the classes even easier. Update: it has since moved over to GitHub. Update 2: it is now in the XrmToolBox Plugin Store; search for "Early Bound Generator"). Each time a change is made to a CRM entity, the entity type definitions will need to be updated (only if you want to use a new property or entity, or you've removed a property or entity that you currently use; you can use early bound entity classes that are out of date, as long as you don't set the values of any properties that don't actually exist, which is the exact same requirement as late bound).
How does one handle early binding for default entities with custom fields?
a. See the answer to question 2.
One of the little gotchas when working with early bound entities is the need to enable early bound proxy types in your IOrganizationService. This is easy for the OrganizationServiceProxy, but may take a few more steps for plugins, and especially custom workflow activities.
Edit 1 - My Tests
Below is my code, testing against a pretty inactive local dev environment. Feel free to test for yourself.
using (var service = TestBase.GetOrganizationServiceProxy())
{
    var earlyWatch = new Stopwatch();
    var lateWatch = new Stopwatch();
    for (int i = 0; i < 100; i++)
    {
        // Early bound: typed Contact class generated by CrmSvcUtil
        earlyWatch.Start();
        var e = new Contact() { FirstName = "Early", LastName = "BoundTest" };
        e.Id = service.Create(e);
        earlyWatch.Stop();

        // Late bound: generic Entity addressed by logical names
        lateWatch.Start();
        var l = new Entity();
        l.LogicalName = "contact";
        l["firstname"] = "Late";
        l["lastname"] = "BoundTest";
        l.Id = service.Create(l);
        lateWatch.Stop();

        service.Delete(Contact.EntityLogicalName, e.Id);
        service.Delete(l.LogicalName, l.Id);
    }
    var earlyTime = earlyWatch.ElapsedMilliseconds;
    var lateTime = lateWatch.ElapsedMilliseconds;
    var percent = earlyWatch.ElapsedTicks / (double)lateWatch.ElapsedTicks;
}
My two test results (please note that running two tests is not enough to draw any sort of statistical conclusion, but I think they lend weight to the performance decrease not being big enough to outweigh the development gains) were run against a local dev environment with very little other activity to disrupt the tests.
Number Creates | Early (ms) | Late (ms) | % diff (from ticks)
            10 |       1242 |      1106 | 12.3%
           100 |       8035 |      7960 | 0.1%
Now let's plug in the numbers and see the difference. 12% seems like a lot, but 12% of what? The actual difference was 0.136 seconds per 10 creates. Let's say you create 10 contacts every minute: 0.136 s x 60 min/hour x 24 hours/day = 195.84 s/day, or about 3 minutes a day. Let's say you spend 3 developer hours attempting to figure out which is faster. For the program to save that much time, it would take about 55 days of 24/7 processing at 10 contacts a minute for the faster code to "pay back" its 3 hours of decision making.
So the rule is: always pick the method that is more readable/maintainable first, rather than the one that is faster. And if the performance isn't fast enough, then look at other possibilities. But 98 times out of 100, it really isn't going to affect performance in a way that is detectable by an end user.
Premature optimization is the root of all evil. -- Donald Knuth
Probably not. If you want to know for certain, I would suggest running some tests and profiling the results.
However, these MSDN articles suggest late binding is faster.
Best Practices for Developing with Microsoft Dynamics CRM
Use Early-Bound Types
Use the Entity class when your code must work on entities and attributes that are not known at the time the code is written. In addition, if your custom code works with thousands of entity records, use of the Entity class results in slightly better performance than the early-bound entity types. However, this flexibility has a disadvantage because you cannot verify entity and attribute names at compile time. If your entities are already defined at code time and slight performance degradation is acceptable, you should use the early-bound types that you can generate by using the CrmSvcUtil tool. For more information, see Use the Early Bound Entity Classes in Code.
Choose your Development Style for Managed Code for Microsoft Dynamics CRM
Entity Programming (Early Bound vs. Late Bound vs. Developer Extensions)
Early Bound ... Serialization costs increase as the entities are converted to late bound types during transmission over the network.
2 & 3. You don't have to take any special action with custom fields or entities. Svcutil will generate classes for both.
Use the Early Bound Entity Classes in Code
The class created by the code generation tool includes all the entity's attributes and relationships. By using the class in your code, you can access these attributes and be type safe. A class with attributes and relationships is created for all entities in your organization. There is no difference between the generated types for system and custom entities.
As a side note, I wouldn't get too hung up on it; they are both acceptable implementation approaches, and in the majority of situations I doubt the performance impact will be significant enough to worry about. Personally I prefer late binding, but that's mostly because I don't like having to generate the classes.
Edit.
I performed some quick profiling on this by creating accounts in CRM, in sets of 200 and 5000. It confirms the information provided by Microsoft: in both runs late binding was about 8.5 seconds quicker. Over very short runs late binding is significantly faster - 90%. However, early binding quickly catches up, and by the time 5000 records are created late binding is only 2% faster.
Full details blogged here.

What is the smallest unit of work that is sensible to parallelize with actors?

Scala actors are described as lightweight, and Akka actors even more so, but there is obviously some overhead to using them.
So my question is: what is the smallest unit of work that is worth parallelizing with actors (assuming it can be parallelized)? Is it only worth it if there is some potential latency, or there are a lot of heavy calculations?
I'm looking for a general rule of thumb that I can easily apply in my everyday work.
EDIT: The answers so far have made me realise that what I'm interested in is perhaps actually the inverse of the question that I originally asked. So:
Assuming that structuring my program with actors is a very good fit, and therefore incurs no extra development overhead (or even incurs less development overhead than a non-actor implementation would), but the units of work it performs are quite small - is there a point at which using actors would be damaging in terms of performance and should be avoided?
Whether to use actors is not primarily a question of the unit of work; their main benefit is to make concurrent programs easier to get right. In exchange for this, you need to model your solution according to a different paradigm.
So, you need to decide first whether to use concurrency at all (which may be due to performance or correctness) and then whether to use actors. The latter is very much a matter of taste, although with Akka 2.0 I would need good reasons not to, since you get distributability (up & out) essentially for free with very little overhead.
If you still want to decide the other way around, a rule of thumb from our performance tests might be that the target message processing rate should not be higher than a few million per second.
My rule of thumb -- for everyday work -- is that if it takes milliseconds then it's potentially worth parallelizing. Although the transaction rates are higher than that (usually no more than a few tens of microseconds of overhead), I like to stay well away from overhead-dominated cases. Of course, it may need to take much longer than a few milliseconds to actually be worth parallelizing. You always have to balance the time taken writing more code against the time saved running it.
If no side effects are expected in the work units, then it is better to make the work-splitting decision at run time:
protected T compute() {
    // solve directly when the slice is small or the queue is already saturated
    if (r - l <= T1 || getSurplusQueuedTaskCount() >= T2)
        return problem.solve(l, r);
    // otherwise decompose into subtasks, fork one and compute the other
}
Where:
T1 = N / (L * Runtime.getRuntime().availableProcessors())
N  - size of the work in units
L  - load factor, configured manually (8..16)
T2 - max length of the work queue after all stealings (1..3)
Here is a presentation with many more details and figures:
http://shipilev.net/pub/talks/jeeconf-May2012-forkjoin.pdf
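To make the snippet above concrete, here is a self-contained Java sketch of the same threshold rule applied to a plain array sum; the array-sum problem and all names are my choice, while T1 and T2 follow the definitions above:

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    static final int T2 = 3;                 // max surplus queue length after stealings
    final long[] a;
    final int l, r, t1;

    SumTask(long[] a, int l, int r, int t1) {
        this.a = a; this.l = l; this.r = r; this.t1 = t1;
    }

    @Override
    protected Long compute() {
        if (r - l <= t1 || getSurplusQueuedTaskCount() >= T2) {
            long sum = 0;                    // small slice: solve sequentially
            for (int i = l; i < r; i++) sum += a[i];
            return sum;
        }
        int mid = (l + r) >>> 1;             // large slice: decompose
        SumTask left = new SumTask(a, l, mid, t1);
        left.fork();                         // queue one half for stealing
        long right = new SumTask(a, mid, r, t1).compute();
        return right + left.join();
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        java.util.Arrays.fill(data, 1L);
        int t1 = Math.max(1, data.length / (8 * Runtime.getRuntime().availableProcessors()));
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length, t1));
        System.out.println(sum);             // prints 1000000
    }
}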

Recreate a graph that changes in time

I have an entity in my domain that represents a city electrical network. At the moment my model is an entity with a List that contains breakers, transformers, and lines.
The network changes every time a breaker is opened/closed, users can change connections, etc.
In all examples of CQRS the EventStore is queried with Version and aggregateId.
Do you think I have to implement events only for the "network" aggregate or also for every "Connectable" item?
In this case, when I have to replay all events to get the "actual" status (based on a date), I can have nearly 10,000-20,000 events to process.
Should an Event modify one property, or do I need an Event that modifies a whole object (containing all properties of the object)?
There's always an exception to the rule, but I think you need to have an event for every command handled in your domain. You can get around the problem of processing so many events by making use of snapshots.
http://thinkbeforecoding.com/post/2010/02/25/Event-Sourcing-and-CQRS-Snapshots
I assume you mean that currently your "connectable items" are part of the "network" aggregate, and you are asking if they should be their own aggregates? That really depends on the nature of your system and problem, and is more of a DDD issue than simply a CQRS one. However, if the nature of your changes is typically to operate on the items independently of one another, then they should probably be aggregate roots themselves. Regardless, in order to answer that question we would need to know much more about the system you are modeling.
As for the challenge of replaying thousands of events, you certainly do not have to replay all your events for each command. Sure, snapshotting is an option, but even better is caching the aggregate root objects in memory after they are first loaded, to ensure that you do not have to source from events with each command (unless the system crashes, in which case you can rely on snapshots for quicker recovery, though you may not need them with caching since you only pay the penalty of loading once).
Now if you are distributing this system across multiple hosts or threads there are some other issues to consider but I think that discussion is best left for another question or the forums.
Finally, you asked (I think) whether an event can modify more than one property of the state of an object? Yes, if that is what makes sense based on what that event represents. The idea of an event is simply that it represents a state change in the aggregate; however, these events should also represent concepts that make sense to the business.
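As a small illustration of that last point, with hypothetical names loosely based on the electrical network in the question -- one business fact may legitimately touch several properties at once:

import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

record BreakerOpened(String breakerId, Instant at) {}  // hypothetical event

class Network {
    private final Map<String, Boolean> breakerClosed = new HashMap<>();
    private Instant lastChanged;

    // Replaying events rebuilds current state; a snapshot is just a persisted
    // intermediate state, so a rebuild need not start from event zero.
    void apply(BreakerOpened e) {
        breakerClosed.put(e.breakerId(), false);  // one event, two properties changed
        lastChanged = e.at();
    }
}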
I hope that helps.

What are some of the advantages/disadvantages of using SqlDataReader?

SqlDataReader is a faster way to process the results of a stored procedure. What are some of the advantages/disadvantages of using SqlDataReader?
I assume you mean "instead of loading the results into a DataTable"?
Advantages: you're in control of how the data is loaded. You can ask for specific data types, and you don't end up loading the whole set of data into memory all at the same time unless you want to. Basically, if you want the data but don't need a data table (e.g. you're going to populate your own kind of collection) you don't get the overhead of the intermediate step.
Disadvantages: you're in control of how the data is loaded, which means it's easier to make a mistake and there's more work to do.
What's your use case here? Do you have a good reason to believe that the overhead of using a normal (or strongly typed) data table is significantly hurting performance? I'd only use SqlDataReader directly if I had a good reason to do so.
The key advantage is obviously speed - that's the main reason you'd choose a SqlDataReader.
One potential disadvantage not already mentioned is that SqlDataReader is forward-only, so you can only go through the records once, in sequence - that's one of the things that allows it to be so fast. In many cases that's fine, but if you need to iterate over the records more than once, or add/edit/delete data, you'll need to use one of the alternatives.
It also remains connected until you've worked through all the records and closed the reader (you can opt to close it earlier, of course, but then you can't access any of the remaining records). If you're going to perform any lengthy processing on the records as you iterate over them, you may find that you impact other connections to the database.
It depends on what you need to do. If you get back a page of results from the database (say 20 records), it would be better to use a data adapter to fill a DataSet, and bind that to something in the UI.
But if you need to process many records, 1 at a time, use SqlDataReader.
Advantages: Faster, less memory.
Disadvantages: Must remain connected, must remember to close the reader.