Where to draw the line with reactive programming [closed] - system.reactive

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I have been using RxJava in my project for about a year now.
With time, I grew to love it very much - now I'm thinking maybe too much...
Most methods I write now have some form of Rx in them, which is great! (until it's not).
I now notice that some methods require a lot of work to combine the different observable producing methods.
I get the feeling that although I understand what I write now, the next programmer will have a really hard time understanding my code.
Before I get to the bottom line let me give an example straight from my code in Kotlin (Don't dive too deep into it):
private fun <T : Entity> getCachedEntities(
        getManyFunc: () -> Observable<Timestamped<List<T>>>,
        getFromNetwork: () -> Observable<ListResult<T>>,
        getFunc: (String) -> Observable<Timestamped<T>>,
        insertFunc: (T) -> Unit,
        updateFunc: (T) -> Unit,
        deleteFunc: (String) -> Unit)
        = concat(
                getManyFunc().filter { isNew(it.timestampMillis) }
                        .map { ListResult(it.value, "") },
                getFromNetwork().doOnNext {
                    syncWithStorage(it.entities, getFunc, insertFunc, updateFunc, deleteFunc)
                }).first()
        .onErrorResumeNext { e -> // If a network error occurred, return the cached data and the error
            concat(getManyFunc().map { ListResult(it.value, "") }, error(e))
        }
Briefly what this does is:
Retrieve some timestamped data from storage
If data is not new, fetch data from network
Sync network data again with the storage (to update it)
If a network error occurred, again retrieve the older data and the error
And here comes my actual question:
Reactive programming offers some really powerful concepts. But as we know with great power comes great responsibility.
Where do we draw the line? Is it OK to fill our entire programs with awesome reactive oneliners or should we save it only for really mundane operations?
Obviously this is very subjective, but I hope someone with more experience can share his knowledge and pitfalls.
Let me phrase it better
How do I design my code to be reactive yet easy to read?

When you pick up Rx, it becomes this awesome shiny hammer and everything starts looking like a rusty nail just waiting for you to bang in.
Personally, I think the biggest clue is in the name, reactive framework. Given a requirement, you need to reflect upon whether a reactive solution truly makes sense.
In any Rx proposition, you are looking to introduce one or more event streams and carry out some action in response to an event.
I think there are two key questions to ask:
Are you in control of the event stream?
To what degree must you complete responses at the rate of the event stream?
If you do not have control of the event stream and you must respond at the rate of the event stream then Rx is a good candidate.
In any other circumstance, it is probably a poor choice.
I have seen many examples where people have jumped through hoops to create the illusion of a lack of control in order to justify Rx - which seems crazy to me. Why give up the control that you have?
Some examples:
You have to extract data from a fixed list of files and store it in a database. You decide to push each file name into a subject and create a reactive pipeline that opens each file and projects the data, then processes the data in some way and finally writes it to the database.
This fails the control test and the rate test. It would be far easier to iterate over the files and pull them in and process them as fast as you can. The phrase "decide to push" is the giveaway here.
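The pull-based alternative can be sketched in a few lines of Kotlin. This is only an illustration: `Row`, `extract`, `transform` and `importAll` are hypothetical names, and the "file contents" are faked so the sketch has no dependencies.

```kotlin
// Hypothetical stand-ins for the real steps; none of these names come from an actual API.
data class Row(val file: String, val value: Int)

// Assumption: pretend each file yields one number per character in its name.
fun extract(fileName: String): List<Int> = fileName.indices.toList()

// Assumption: the "processing" step simply doubles each value.
fun transform(fileName: String, values: List<Int>): List<Row> =
    values.map { Row(fileName, it * 2) }

// You own the list, so you own the pace: a plain loop pulls each file in turn.
fun importAll(fileNames: List<String>, store: (Row) -> Unit) {
    for (name in fileNames) {
        transform(name, extract(name)).forEach(store)
    }
}
```

Nothing here is pushed anywhere; the caller decides when the next file is read, which is exactly the control the subject-based pipeline gives away.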
You need to display stock prices from a stock exchange.
Clearly this is a good choice for Rx. If you can't keep up with the rate of prices in general, you are screwed. It might be the case that you conflate prices (perhaps to provide an update only once every second) - but this still qualifies as keeping up. The one thing you can't do is ask the stock exchange to slow down.
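Conflation, as mentioned above, might look roughly like this in Kotlin. It is a sketch under assumptions: `PriceConflator` is a made-up class, and the once-per-second timer that would call `drain` is left out.

```kotlin
// Keeps only the most recent price per symbol between UI refreshes.
class PriceConflator {
    private val latest = LinkedHashMap<String, Double>()

    // Called at whatever rate the exchange pushes ticks; older prices are overwritten.
    @Synchronized
    fun onTick(symbol: String, price: Double) {
        latest[symbol] = price
    }

    // Called by a timer (say once per second); returns the pending updates and clears them.
    @Synchronized
    fun drain(): Map<String, Double> {
        val snapshot = LinkedHashMap(latest)
        latest.clear()
        return snapshot
    }
}
```

The producer is never slowed down; the consumer simply sees fewer, fresher updates, which still counts as keeping up.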
These (real world) examples pretty much fall at opposite ends of the spectrum and don't have much grey area. But there is a lot of grey area out there where control isn't clear.
Sometimes you are wearing the client hat in a client/server system and it can be easy to fall into the trap of sacrificing control, or putting control in the wrong place - which can easily be fixed with correct design. Consider this:
A client application displays news updates from a server.
News updates are submitted to the server at any time and are created in high volume.
The client should be refreshed at an interval set by the client.
Refresh interval can be changed at any time and the user can always request an immediate refresh.
The client only shows updates tagged with particular keywords, as specified by the user.
The news updates are sometimes lengthy and the client should not store the full content of news updates, but rather display the headline and summary.
At user request, the full content of an article can be shown.
Here, the frequency of news updates is not in control of the client. But the desired refresh rate and the tags of interest are.
For the client to receive all the news updates as they arrive and filter them client side isn't going to work. But there are plenty of options:
Should the server send a data stream of updates taking into account the client refresh rate? What if the client goes offline?
What if there are thousands of clients? What if the client wants an immediate refresh?
There are lots of valid ways to tackle this problem that include more or less reactive elements. But any good solution should take account of the client's control of tags and desired refresh rate, and the lack of control of news update frequency (by client or server). You might want the server to react to changes in client interest by updating the events that it pushes to the client - which it pushes only as long as the client is listening (detected via a heartbeat). When the user wants a full article, then the client would pull the article down.
There is much debate in the Rx community about back-pressure. This is the idea that the client should inform the server when it is overloaded and the server respond by somehow reducing the event stream. I think this is a misguided approach that can lead to confusing designs.
To my mind, as soon as a client needs to give this feedback, it has failed the response rate test. At this point, you are not in a reactive situation, you are in an async enumerable situation. i.e. The client should be saying "I am ready" when it is ready for more and then waiting in a non-blocking fashion for server to respond.
This would be appropriate if the first scenario were modified to be files arriving in a drop-folder, of varying lengths and complexity to process. The client should make a non-blocking call for the next file, process it, and repeat. (Add parallelism as required) - and not be responding to a stream of file-arrived events.
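The "I am ready" loop for the drop-folder variant could be sketched as follows. `drainFolder` and its parameters are hypothetical, and `nextFile` is a plain function only to keep the sketch dependency-free; in real code it would presumably be a suspending or otherwise non-blocking call.

```kotlin
// The client asks for work only when it is ready, so it can never be flooded.
// nextFile is a hypothetical source that returns the next file name, or null when none is waiting.
fun drainFolder(nextFile: () -> String?, process: (String) -> Unit): Int {
    var handled = 0
    while (true) {
        val file = nextFile() ?: break // "I am ready": pull exactly one item
        process(file)                  // take as long as this file needs
        handled++
    }
    return handled
}
```

Parallelism can be added by running several such loops against the same source, without any need for back-pressure signalling.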
Wrap up
I've deliberately avoided other valid concerns such as maintainability of code, performance of Rx itself, etc., mostly because they are addressed elsewhere and, more importantly, because I think the ideas here are more divisive than those concerns.
So if you reflect on the elements of control and response rate in your scenario, you will probably stay on the right track.
The response rate issue can be subtle - and the degree aspect is important. Arrival rate can fluctuate, and there is going to be some acceptable degree of fluctuation in response rate - clearly, if you don't ultimately have a way to "catch up" then at some point the client will blow up.

I find that there are two things I keep in mind when writing Rx (or any mildly sophisticated/new technology)
Can I test it?
Can I easily hire someone who can maintain it? Not someone who will struggle to maintain it, but someone who will be fine left alone to maintain it.
To this end, I also find that just because you can doesn't always mean you should. As a guide, I try to avoid creating queries that are over, say, 7 lines of code. Queries bigger than this I try to separate into sub-queries that I compose.
If the code you have provided is at the core of the code base and is at the extreme end of its complexity, then it may be fine. However, if you find all of your Rx code carries that much complexity, you may be creating a code base that is difficult to work with.
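As an illustration of splitting a query into named sub-queries, here is the question's cache-then-network logic broken into three small steps. This is a synchronous stand-in, not RxJava: `Timestamped` and `Outcome` are simplified assumptions rather than the asker's classes, and every function name is hypothetical. The same decomposition would apply to helpers that return Observables.

```kotlin
// Simplified, synchronous stand-ins for the types in the question (assumptions, not the original classes).
data class Timestamped<T>(val value: T, val timestampMillis: Long)
data class Outcome<T>(val items: List<T>, val error: Throwable? = null)

// Step 1: cached data, but only if it is still fresh.
fun <T> freshFromCache(cache: () -> Timestamped<List<T>>?, isNew: (Long) -> Boolean): List<T>? =
    cache()?.takeIf { isNew(it.timestampMillis) }?.value

// Step 2: network data, written back to storage as a side effect.
fun <T> fromNetworkAndSync(network: () -> List<T>, sync: (List<T>) -> Unit): List<T> =
    network().also(sync)

// Step 3: on failure, fall back to whatever is cached and carry the error along.
fun <T> staleWithError(cache: () -> Timestamped<List<T>>?, error: Throwable): Outcome<T> =
    Outcome(cache()?.value.orEmpty(), error)

// The composition now reads like the bullet list that describes it.
fun <T> loadEntities(
    cache: () -> Timestamped<List<T>>?,
    isNew: (Long) -> Boolean,
    network: () -> List<T>,
    sync: (List<T>) -> Unit
): Outcome<T> {
    freshFromCache(cache, isNew)?.let { return Outcome(it) }
    return try {
        Outcome(fromNetworkAndSync(network, sync))
    } catch (e: Exception) {
        staleWithError(cache, e)
    }
}
```

Each piece can be tested on its own, and the top-level function stays well under the seven-line guideline.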

Related

Realm and Task causing extreme UI issues

I am writing a SwiftUI app that uses a single Task(priority: .low) to do lots of HTTP requests as a low priority whose results are displayed in a Table.
There's several hundreds of MBs of data being aggregated to display totals, averages etc.
Thing is, I am only creating one Task, but this is causing my UI to hang. Why is this? I think I must have a fundamental misunderstanding around how tasks are created and how they relate to processes and threads.
I am using Mongo's Realm to store the data having become frustrated with Core Data. Does anyone have any experience of agonisingly slow apps using Realm?
There is far too much code to post, but all the HTTP requests are being made using URLSession's async methods. These data are being transformed and saved in my Realm. In any case, I certainly wouldn't expect this to cause the UI to hang! What might be going on?
I know it's tough to triage without code, but as I say there is a lot of it. I’m just after general guidance really, rather than a fix for a very specific issue.

What use cases are there for Early Media with IVRs?

I would say that the following are valid use cases for Early Media when used for IVRs:
Filtering
Disconnect incoming calls based on certain criteria (such as callers from a specific area code), or play a message before hanging up on after-hours callers.
Rate limiting
For example to limit the number of simultaneous callers, where the excess calls are either disconnected, or placed in a queue. (I'm not sure if the queue example is possible though without answering the call.)
Are there more?
Some links that were helpful:
https://www.dialogic.com/webhelp/csp1010/8.4.1_ipn3/sip_software_chap_-_early_media.htm
https://wiki.asterisk.org/wiki/display/AST/Early+Media+and+the+Progress+Application
https://freeswitch.org/confluence/display/FREESWITCH/Early+Media
3) Legal: in some countries it is illegal for a call center using a paid line (where the caller pays per minute) to accept a call without sending it straight to an agent. So you can't accept the call and then give the user waiting music for 15 minutes (making them pay for the privilege of waiting as well).
Result: you don't accept the call. However, this creates a new problem: if the caller only hears the ring-back tone for those 15 minutes, he/she will assume no one will answer and hang up.
Using early media, you can give them the traditional "your call is very important to us, please hold on"-type of experience without accepting the call and without starting to charge money. Of course, this also depends somewhat on how much the provider is willing to tolerate, as this can also affect their income (depending on their own business model).
4) Comfort: you may not be aware of this, but the sound you hear as a caller when the other side is ringing (the ring-back tone) is not universally the same throughout the world. A company with a global number may wish to use early media to provide a ring-back tone more familiar to you, depending on where you are calling from. It was always a bit of a niche concept, but some target audiences are statistically more likely to hang up if they hear an unfamiliar sound. 15-20 years ago this might have been a concern to some, but in the era of smartphones and internet calls, I doubt anyone really worries about it anymore.

What triggers UI refresh in CQRS client app?

I am attempting to learn and apply the CQRS design approach (pattern and architecture) to a new project but seem to be missing a key piece.
My client application executes a query and retrieves a list of light-weight, read-only DTOs from the read model. The user selects an item and clicks a button to initiate some action. The action is performed by creating and sending the corresponding command object to the write model (where the command handler carries out the action, updates the data store, etc.) At some point, however, I need to update the UI to reflect changes to the state of the application resulting from the action.
How does the UI know when it is time to refresh the original list?
Additional Info
I have noticed that most articles/blogs discussing CQRS use MVC client apps in their examples. I am working on a Silverlight client right now and am beginning to wonder if the pattern simply doesn't work in that case.
Follow-Up Question
After thinking more about Bartlomiej's response and subsequent discussion, I am wondering about error handling in CQRS. Given that commands are basically fire-and-forget asynchronous operations, how do we report an error condition to the UI?
I see 'refreshing the UI' to take one of two forms:
The operation succeeds, data has changed and the UI should be updated to reflect these changes
The operation fails, data has not changed but the user should be notified of the failure and potential corrective actions.
Even with a Post-Redirect-Get pattern in an MVC, you can't really Redirect until you know the outcome of the operation. None of the examples I've seen thus far address these real-world concerns.
I've been struggling with similar issues for a WPF client. The re-query trigger for any data depends on the data you're updating. Commands tend to fall into categories:
The command is a true fire and forget method, it informs the back-end of a state change but this change does not need to be reflected in the UI, or the change simply isn't important to the UI.
The command will alter the result of a single query
The command will alter the result of multiple queries, usually (in my domain at least) in a cascading fashion, that is, changing the state of a single "high level" piece of data will likely affect many "low level" caches.
My first trigger is the page load, very few items are exempt from this as most pages must assume data has been updated since it was last visited. Though some systems may be able to escape with only updating financial and other critical data in this way.
For short commands I also update data when 'success' is returned from a command. Though this is mostly laziness as IMHO all CQRS commands should be fired asynchronously. It's still an option I couldn't live without but one you may have to if your implementation expects high latency between command and query.
One pattern I'm starting to make use of is the mediator (most MVVM frameworks come with one). When I fire a command, I also fire a message to the mediator specifying which command was launched. Each Cache (a view model property Retriever<T>) listens for commands which affect it and then updates appropriately. I try to minimise the number of messages while still minimising the number of caches that update unnecessarily from a single message, so I'll (hopefully) eventually end up with a shortlist of update reasons, with each 'reason' updating a list of caches.
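A bare-bones version of that mediator idea, sketched in Kotlin rather than a real MVVM framework. `Mediator` and this `Retriever<T>` are hypothetical minimal classes written for illustration; an actual framework's mediator or messenger will have its own API.

```kotlin
// Minimal in-process mediator: caches subscribe to the command names that affect them.
class Mediator {
    private val listeners = mutableMapOf<String, MutableList<() -> Unit>>()

    fun subscribe(commandName: String, onCommand: () -> Unit) {
        listeners.getOrPut(commandName) { mutableListOf() }.add(onCommand)
    }

    // Fired alongside the command itself; only interested caches react.
    fun publish(commandName: String) {
        listeners[commandName].orEmpty().forEach { it() }
    }
}

// Hypothetical cache: re-runs its query whenever it is told it may be stale.
class Retriever<T>(private val query: () -> T) {
    var value: T = query()
        private set

    fun refresh() {
        value = query()
    }
}
```

A cache registers once, for example `mediator.subscribe("PlaceOrder") { orders.refresh() }`, and commands it does not care about never cause a re-query.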
Another approach is simple honesty: I find that exposing graphically how the system updates itself makes users more willing to be patient with it. On firing a command, show some UI indicating you're waiting for the successful response; on error you could offer to retry or show the error; on success you start the update of the relevant fields. Bear in mind that a command could have been fired from another terminal (of which you have no knowledge), so data will also need to time out eventually to avoid missing state changes invoked by other machines.
Note the irony that the only efficient method of updating caches and values on a client is to un-separate the commands and queries again, be it through hardcoding or something like a hashmap.
My two cents.
I think MVVM actually fits into CQRS quite well. The ViewModel simply becomes an observable ReadModel.
1 - You initialize your ViewModel state via a query on the ReadModel.
2 - Changes on your ViewModel are automatically reflected on any Views that are bound to it.
3 - Certain changes on your ViewModel trigger a command to propagate to a message queue; an object responsible for sending those commands to the server takes those messages off the queue and sends them to the WriteModel.
4 - Clients should be well formed, meaning the ViewModel should have performed appropriate validation before it ever triggered the command. Once the command has been triggered, any event notifications can be published onto an event bus for the client to communicate changes to other ViewModels or components in the system interested in those changes. These events should carry the relevant information necessary. Typically, this means that other view models usually don't have to re-query the read model as a result of the change unless they are dependent on other data that needs to be retrieved.
5 - There is an object that connects to the message bus on the server for real-time push notifications when other clients make changes that this client is interested in knowing about, falling back to long-polling if necessary. It propagates those to the internal message bus that ties the components on the client together.
6 - The last part to handle is the fact that clients can be occasionally connected, which should be the only reason a command fails (they don't have internet access at the moment), which is when the client should be notified of problems.
In my ASP.NET MVC 3 I use 2 techniques depending on use case:
already well-known Post-Redirect-Get pattern which fits nicely with CQRS. Your MVC action that triggers the command returns a redirection to action that performs a query.
in some cases, like real-time updates of other clients, I rely on domain events/messages. I create an event handler that uses SignalR to push changes to all connected and interested clients.
There are two major approaches you can take, as far as I know:
1) Design your UI so that the user does not see the changes right away. For instance, show a message telling him his action succeeded and offer him different choices to continue his work. This should buy you enough time to update your read model.
2) More complex: keep the information you have sent to the server and show it in the interface.
Most important, I guess: educate your users if you can, so that they know why the data is not there... yet!
It only occurs to me now, but these are for sync command handling, not async. With async, things get really hard on the brain... the client interface becomes an event eater too.

Why should the event store be on the write side?

Event sourcing is pitched as a bonus for a number of things, e.g. event history / audit trail, complete and consistent view regeneration, etc. Sounds great. I am a fan. But those are read-side implementation details, and you could accomplish the same by moving the event store completely to the read side as another subscriber.. so why not?
Here's some thoughts:
The views/denormalizers themselves don't care about an event store. They just handle events from the domain.
Moving the event store to the read side still gives you event history / audit
You can still regenerate your views from the event store. Except now it need not be a write model leak. Give him read model citizenship!
There seems to be one technical argument for keeping it on the write side. This is from Greg Young at http://codebetter.com/gregyoung/2010/02/20/why-use-event-sourcing/:
There are however some issues that exist with using something that is storing a snapshot of current state. The largest issue revolves around the fact that you have introduced two models to your data. You have an event model and a model representing current state.
The thing I find interesting about this is the term "snapshot", which more recently has become a distinguished term in event sourcing as well. Introducing an event store on the write side adds some overhead to loading aggregates. You can debate just how much overhead, but it's apparently a perceived or anticipated problem, since there is now the concept of loading aggregates from a snapshot and all events since the snapshot. So now we have... two models again. And not only that, but the snapshotting suggestions I've seen are intended to be implemented as an infrastructure leak, with a background process going over your entire data store to keep things performant.
And after a snapshot is taken, events before the snapshot become 100% useless from the write perspective, except... to rebuild the read side! That seems wrong.
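For readers unfamiliar with the mechanics being criticised, loading an aggregate from a snapshot plus the events recorded since that snapshot can be sketched like this. The `Account` aggregate and its events are invented for illustration and do not come from any particular event store.

```kotlin
// Hypothetical events for a toy aggregate; version is the event's position in the stream.
sealed class AccountEvent {
    abstract val version: Int
}
data class Deposited(override val version: Int, val amount: Long) : AccountEvent()
data class Withdrawn(override val version: Int, val amount: Long) : AccountEvent()

// The aggregate state doubles as the snapshot format in this sketch.
data class Account(val balance: Long = 0, val version: Int = 0) {
    fun applyEvent(e: AccountEvent): Account = when (e) {
        is Deposited -> Account(balance + e.amount, e.version)
        is Withdrawn -> Account(balance - e.amount, e.version)
    }
}

// Start from the snapshot (or from empty) and replay only the events newer than it.
fun load(snapshot: Account?, events: List<AccountEvent>): Account {
    val start = snapshot ?: Account()
    return events
        .filter { it.version > start.version }
        .fold(start) { account, event -> account.applyEvent(event) }
}
```

Both paths must produce the same state, which is the "two models" concern raised above: the snapshot is a second representation that has to stay consistent with the events.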
Another performance related topic: file storage. Sometimes we need to attach large binary files to entities. Conceptually, sometimes these are associated with entities, but sometimes they ARE the entities. Putting these in the event store means you have to physically load that data each and every time you load the entity. That's bad enough, but imagine several or hundreds of these in a large aggregate. Every answer I have seen to this is to basically bite the bullet and pass a uri to the file. That is a cop-out, and undermines the distributed system.
Then there's maintenance. Rebuilding views requires a process involving the event store. So now every view maintenance task you ever write further binds your write model into using the event store.. forever.
Isn't the whole point of CQRS that the use cases around the read model and write model are fundamentally incompatible? So why should we put read model stuff on the write side, sacrificing flexibility and performance, and coupling them back up again. Why spend the time?
So all in all, I am confused. In all respects from where I sit, the event store makes more sense as a read model detail. You still achieve the many benefits of keeping an event store, but you don't over-abstract write side persistence, possibly reducing flexibility and performance. And you don't couple your read/write side back up by leaky abstractions and maintenance tasks.
So could someone please explain to me one or more compelling reasons to keep it on the write side? Or alternatively, why it should NOT go on the read side as a maintenance/reporting concern? Again, I'm not questioning the usefulness of the store. Just where it should go :)
This is a long-dead question that someone pointed me to. There are quite a few reasons why it's better to store events on the write side.
From my understanding the architecture you are talking about is a very common one that I see ... fail. We will store our domain model in a relational database then put out events. You add the twist of them saving the events on the read side in an event store. This will likely lead to a mess.
The first issue you will run into is in the publishing of your events. What happens when I save to the database and publish to, say, MSMQ, and I die in the middle? So DTC gets introduced between them. This is a huge thing to bring in; distributed transactions should be avoided like the plague. It is also quite inefficient, as I am probably making the data durable twice (once to the queue, once to the database). This will limit system throughput by a lot (DTC benchmarks of 200-300 messages/second are common, whereas with events only, 20-30k/second is common).
Some work around the need for DTC by putting a table in their database that holds the events and operates as a queue. This avoids the need for DTC; however, it still runs into the next issue.
What happens when you have a bug? I know you would never write buggy code, but one of the juniors or maintenance developers working on the project later might. As an example, what happens when the domain object change and the event raised do not match? Say you set State on your domain object to "LA" (hardcoded) but you properly set State on the event to cmd.State ("CT").
How will you detect such errors are occurring? The biggest problem with what is being discussed is that there are now two sources of "truth" there is the database on the write side and the event stream coming out. There is no way to prove that they are equivalent. This will cause all sorts of weird bugs down the line.
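The "LA" versus "CT" bug can be reproduced in a few lines. This is a toy sketch with invented names (`DualWriteCustomer`, `AddressChanged`, `replay`), contrasting an object that stores its own state and emits events with state that is derived purely from the events.

```kotlin
data class AddressChanged(val state: String)

// Dual-write style: the object mutates itself AND emits an event, so the two can disagree.
class DualWriteCustomer {
    var state: String = ""
        private set

    fun changeAddress(requestedState: String): AddressChanged {
        state = "LA"                          // the hardcoded bug from the example
        return AddressChanged(requestedState) // the event itself is correct
    }
}

// Event-sourced style: state is only ever derived from events, so there is one source of truth.
fun replay(events: List<AddressChanged>): String =
    events.fold("") { _, event -> event.state }
```

After `changeAddress("CT")` the stored object says "LA" while the event stream says "CT", and nothing in the system notices. When the events are the only thing persisted, the same bug cannot produce two conflicting answers.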
I think this is really an excellent question. Treating your aggregate as a sequence of events is useful in its own right on the write side, making command retries and the like easier. But I agree that it seems upsetting to work to create your events, then have to make yet another model of your object for persistence if you need this snapshotting performance improvement.
A system where your aggregates only stored snapshots, but sent events to the read-model for projection into read models would I think be called "CQRS", just not "Event Sourcing". If you kept the events around for re-projection, I guess you'd have a system that was very much both.
But then wouldn't you have three definitions? One for persisting your aggregates, one for communicating state changes, and any number more for answering queries?
In such a system it would be tempting to start answering queries by loading your aggregates and asking them questions directly. While this isn't forbidden by any means, it does tend to start causing those aggregates to accrete functionality they might not otherwise need, not to mention complicating threading and transactions.
One reason for having the event store on the write-side might be for resolving concurrency issues before events become "facts" and get distributed/dispatched, e.g. through optimistic locking on committing to event streams. That way, on the write side you can make sure that concurrent "commits" to the same event stream (aggregate) are resolved, one of them gets through, the other one has to resolve the conflicts in a smart way through comparing events or propagating the conflict to the client, thus rejecting the command.
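Optimistic locking on an event stream usually comes down to an expected-version check at append time. A minimal in-memory sketch, with hypothetical names (`InMemoryEventStream`, `ConcurrencyConflict`) and events reduced to strings:

```kotlin
class ConcurrencyConflict(message: String) : Exception(message)

// One stream per aggregate; the version is simply the number of events stored so far.
class InMemoryEventStream {
    private val events = mutableListOf<String>()

    @Synchronized
    fun version(): Int = events.size

    // The writer states which version it based its decision on; a mismatch means someone else got in first.
    @Synchronized
    fun append(expectedVersion: Int, newEvents: List<String>) {
        if (expectedVersion != events.size) {
            throw ConcurrencyConflict("expected version $expectedVersion but stream is at ${events.size}")
        }
        events.addAll(newEvents)
    }
}
```

The losing writer catches the conflict and either compares events to see whether its change still applies or rejects the command, all before anything is dispatched to the read side.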

Ideas for designing a Secure, "Low Cost" method for confirming client-side game results

This is more a system design question/challenge, than a coding question.
Basically, I'm thinking of throwing together a Bejeweled-esque game on Facebook using just HTML, CSS, and javascript. This is mostly out of a desire to learn all the little caveats of FBJS via a non-trivial project.
So here's the deal. When developing for Facebook, actual API calls are very expensive; not only is there an additional POST to the Facebook servers, there's also the api call limit and throttling to worry about. In a nutshell, the fewer calls to Facebook the better. Combine this with the timing concerns of even this simple puzzle game, and there's good reason to aggressively minimize the number of callbacks in general.
Not being a security expert, here's the design I've come up with:
Embed a random seed in the game page.
Use that seed to create the game board (As well as additional pieces as needed).
Tweak the seed (xor, concatenate and hash, something like that) after each player move, based on time since last move. Edit: I should probably also include the actual move taken in mutating the seed.
Upon game completion post back the following: game start time, each move taken and when, and the client side results.
On the server, re-run the game with the given data, sanity checking the start time and move times, and then confirm that the results match.
To mitigate denial of service, the game itself will be tweaked to have a win by turn X condition.
To discourage the server being used as an "oracle" of sorts, a user posting back an invalid game will be banned for some constant time X (X being on the order of minutes).
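The seed-tweaking and server-side replay steps above could be sketched as follows, in Kotlin rather than the FBJS the game would actually use. Everything here is an assumption made for illustration: the `Move` shape, SHA-256 as the "concatenate and hash" option, the toy scoring rule, and the 50 to 30,000 ms timing bounds.

```kotlin
// A move is reduced to a hypothetical board index plus the time since the previous move.
data class Move(val index: Int, val millisSincePrevious: Long)

// Mix the move and its timing into the seed, using SHA-256 as the "concatenate and hash" option.
fun nextSeed(seed: ByteArray, move: Move): ByteArray =
    java.security.MessageDigest.getInstance("SHA-256").digest(
        seed + "${move.index}:${move.millisSincePrevious}".toByteArray(Charsets.UTF_8)
    )

// Toy scoring rule (an assumption): add the first seed byte, as an unsigned value, after each move.
fun replayScore(initialSeed: ByteArray, moves: List<Move>): Int {
    var seed = initialSeed
    var score = 0
    for (move in moves) {
        seed = nextSeed(seed, move)
        score += seed[0].toInt() and 0xFF
    }
    return score
}

// Server side: sanity-check the move count and timings, then re-run the game and compare results.
fun verify(initialSeed: ByteArray, moves: List<Move>, claimedScore: Int, maxMoves: Int): Boolean =
    moves.size <= maxMoves &&
        moves.all { it.millisSincePrevious in 50L..30_000L } &&
        replayScore(initialSeed, moves) == claimedScore
```

The client and server must run the identical `nextSeed` and scoring logic, which is also why the client code cannot hide it from a determined player.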
This design requires three Facebook calls per game played: one to store the random seed before the game is played, one to fetch it after the game is finished, and one to update the player's score if the game is valid.
What I'm trying to proof the system against is straight up score spoofing (http://...?myscore=999999999, or similar). I'd also like to mitigate "look ahead" attacks, wherein the user can tell what pieces are coming to the board next. Denial of service attacks on the hosting server (intentional or otherwise) should also be prevented.
The actual question, can anyone see a flaw in this design? Equivalently, is there a simpler design that meets my criteria?
Note: I am aware how unnecessary this probably is, but it's an interesting question nonetheless.
I'm going to try and throw some numbers up here to further illustrate my reasoning; these are pretty rough but I hope helpful.
Assuming a 10x10 game board, there are ~200 potential moves (swapping two adjacent pieces), most of which are invalid. Let's say there are on average 5 valid moves per "turn". If we constrain player actions to the frame of 50 to 30,000 milliseconds, there are 149,750 potential new hashes (5 valid moves × 29,950 possible timings), provided the "tweaking" algorithm doesn't discard bits; I feel confident in saying there are at least 10,000 potential new hashes which must be calculated by an attacker, assuming a cryptographically secure hash is used. If you throw a min-max algorithm at this, your decision tree explodes very quickly. Throw a game session expiration at this, say 30 minutes, and I believe the attack becomes equivalent in complexity to writing a little bot program to play for you, which cannot reasonably be defended against.
If the client code calculates the next piece and you can't hide this algorithm very well, then some bored college student will figure this out. As a result, they will be able to generate a massive score and defeat your intentions.
I tend to say that it is impossible to do. Why? You cannot trust the client - I could just analyse and completely rewrite the client-side code and return whatever values I like. The only way to protect yourself from cheating and all kinds of attacks is to perform the logic on the server - the client will just collect user input and display the server output. But this is completely against your design goal of minimizing the number of server calls.