What to prefer in which situation: Single function handling events or a function for each event?
Here is a basic code example:
Option 1
enum Notification {
case A
case B
case C
}
protocol One {
func consumer(consumer: Consumer, didReceiveNotification notification: Notification)
}
or
Option 2
protocol Two {
func consumerDidReceiveA(consumer: Consumer)
func consumerDidReceiveB(consumer: Consumer)
func consumerDidReceiveC(consumer: Consumer)
}
Background
Apple use both options. E.g. for NSStreamDelegate we have the first option, while in CoreBluetooth (e.g. CBCentralManagerDelegate) we see option two.
One big difference I see is that Swift does not support optional protocol methods nicely (via extension or #obj keyword).
What would you prefer? What's the (dis)advantages?
In terms of achieving the loosest form of coupling and highest degree of cohesion, naturally the choice would sway towards individual events, not this kind of multi-event bundle of responsibilities.
Yet there are a lot of practical concerns that might move you towards favoring the opposite, coarser way of dealing with events instead of a separate function per granular event.
Here are some possible ones (not listed in any specific order).
Boilerplate
While it's not the biggest thing to worry about, writing a bunch of functions tends to take a bit more effort than writing a bunch of if/else statements or switch cases within one. More importantly than this, however, is the code needed to connect/disconnect event-handling slots to event-handling signals. Avoiding the need to write that subscription/unsubscription kind of code for every single teeny event handled can save considerably on the amount of code to maintain.
Performance
It might seem counter-intuitive that performance can favor the coarser multi-event handler. After all, the granular event-handler requires less branching (one dynamic dispatch to get to the precise event handler), while the coarser one requires twice as much (one dynamic dispatch to get to a coarse event-handling site, and another local series of branches to get to the precise event-handling code).
Yet the cost of dynamic dispatch leans heavily on branch prediction. If you're branching into coarser event handlers, then often you're branching more often into the same set of instructions, and that can be an optimization strategy. To have two sets of more predictable branches can often produce more optimal results than one less predictable branch.
Moreover, coarser event-handling typically implies fewer aggregates, fewer lists of functions to call on the side of those triggering events. And that can translate to reduced memory usage and improved locality of reference.
On the flip side, to branch into coarser event handlers often means branching more often. For example, some site might only be interested in push kind of input events, not resize events. If we lump all these together into a coarse event handler and without some filtering mechanism on top, then typically we would have to pay the cost of dynamic dispatch even for a resize event that isn't even handled for a particular site.
Yet I've found that this is actually often better than I thought it would be to branch needlessly into the same coarse functions (most likely due to the branch predictor succeeding) as opposed to branching into a wide variety of disparate functions and only as needed.
So there's a balancing act here and even performance doesn't clearly side with one strategy over the another. It still varies case-by-case.
Nevertheless, lacking measurements and very detailed data about the critical code paths, it's typically safer from a performance perspective to err on the side of these coarser multi-event handlers. After all, even if that proves to be the wrong decision from a performance standpoint, it's easier to optimize from coarse to fine (we can even do so very non-intrusively by keeping the coarse and using fine-grained event-handling in cases that benefit most from it) than vice versa.
Event Subscription/Unsubscription
This can likewise swing one way or the other, but in my experience (from team settings), most of the human errors associated with event handling do not occur within the event-handling code, but outside. The most common source of errors I see relate to failing to subscribing to events and, most commonly, failing to unsubscribe when the events are no longer of interest.
When events are handled at a coarser level, there's typically less of that error-prone subscription/unsubscription code involved (this relates to the boilerplate concerns above, but this is unusual kind of boilerplate in that it can be quite error-prone and not merely tedious to write).
This is also very case-by-case. In the systems I've often been involved in, there was a common need for entities to continue to exist that unsubscribed from certain events prematurely. Those premature cases often required the code to unsubscribe from events to be written manually, as they could not be tied to an entity's lifetime. That may have pointed more to design issues elsewhere, but in that scenario, the number of mistakes made team-wide went down with coarser event handling.
Type Safety
While not shown in the examples here, typically with coarser event-handling is a need to squeeze more disparate types of data through more generic parameters. That might translate in an extreme scenario like in C to squeezing more data through void pointers and more dangerous pointer type casts. With that, compile-time type safety is obliterated and we could start seeing a whole new source of human error.
In higher-level languages, this might translate to more down casts or things of that sort when we cannot model the signature of a delegate to perfectly fit the parameters passed in when an event is triggered.
I've found typically that this isn't the biggest source of confusions and bugs provided that there is at least some form of runtime type safety when casting or unboxing these parameters. But it is a con on the side of coarser event-handling.
Intellectual Overhead
This might vary per individual but I tend to look at systems from a very administrative/overview kind of standpoint and specifically with respect to control flow. It's because I tend to work in lower-level portions of the system, including things like proprietary UI toolkits.
In those cases, when a button is pushed, what functions are called? It turns into a mystery in a large-scale codebase composed of hundreds of thousands of little functions without tracing into the code actually invoked when a button is pushed and seeing each and every function that is called.
That's an inevitability with an event-driven paradigm and something I never became 100% comfortable about, but I find it alleviates some of that explosive complexity that I perceive in my personal mental model (something resembling a very complex graph) when there's less code decentralization. With coarser event handlers comes fewer, more centralized functions to branch into throughout a system on such a button push, and that helps me increase my familiarity when there are fewer but bigger functions involved in my mental graph.
There is a very simple practical benefit here where if you want to find out when a specific entity responds to a series of events, we can simply put a breakpoint on this one coarse event-handling site (while still being able to drill down a specific event for that specific entity by putting a breakpoint in a local branch of code).
Of course, I might be an exception there working in these low-level systems that everyone uses. It seems a lot of people are comfortable with the idea of just subscribing to a button push event in their code without worrying about all the other subscribers to the same event.
From my kind of holistic control flow view of the system, it helps me to absorb the complexity more easily when there are fewer but coarser event-handling sites in the codebase even though I normally otherwise find monolithic functions to be a burden. Especially in a debugging context where I face a concern like, "What caused this to happen?", that combined with the event-handling concern of "What functions are actually going to be called when this happens?" can really multiply the complexity. With fewer potential target sites where events are handled, the latter concern is mitigated.
Conclusion
So these are some factors that might sway you to choose one design strategy over another. I find myself somewhere in the middle. I generally don't choose design as coarse as say, wndproc on Windows which wants to associate a single, ultra-coarse event handler for every single window event imaginable. Yet I might favor designing at a coarser event-handling level than some just to alleviate this kind of mental complexity, reduce code decentralization, possibly improve performance (always with a profiler in hand).
And then there are times when I choose to design at a very granular level when the complexity isn't that great (typically when the package triggering events isn't that central), when performance isn't a concern or performance actually favors this route, and for the improved type safety. It's all case-by-case.
Related
Experts,
I am in the evaluating moving a nice designed DDD app to an event sourced architecture as a pet project.
As can be inferred, my Aggregate operations are relatively coarse grained. As part of my design process, I now find myself emitting a large number of events from a small subset of operations to satisfy what I know to be requirements of the read model. Is this acceptable practice ?
In addition to this, I have distilled a lot of the domain complexity via use of ValueObjects & entities. Can VO's/ E accept commands and emit events themselves, or should I expose state and add from the command handler lower down the stack ?
In terms of VO's note that I use mutable operations sparingly and it is a trade off between over complicating other areas of my domain.
I now find myself emitting a large number of events from a small subset of operations to satisfy what I know to be requirements of the read model. Is this acceptable practice ?
In most cases, you should have 1 event by command. Remember events describe the user intent so you have to stay close to the user action.
Can VO's/ E accept commands and emit events themselves
No only aggregates emit events and yes you can come up with very messy aggregates if you have a lot of events, but there is solutions for that like delegating the work to the commands and events themselves. I blogged about this technique here: https://dev.to/maximegel/cqrs-scalable-aggregates-731
In terms of VO's note that I use mutable operations sparingly
As long as you're aware of the consequences. That's fine. Trade-off are part of the job, but be sure you team is aware of that since it's written everywhere that value objects are immutable you expose yourselves to confusion and pointer issues.
I do not understand the reason for this metric/rule:
A function should not be called from more than 5 different functions.
All calls within the same function are counted as 1. The rule is
limited to translation unit scope.
It appears to me completely intuitive, because this contradicts code reuse and the approach of split code into often used functions instead of duplicated code.
Can someone explain the rationale?
The first thing to say is that Metric-based quality approaches are by their nature a little subjective and approximate. There are no absolutes in following a metric approach to delivering good quality code.
There are two factors to consider in software complexity. One is the internal complexity, expressed by decision complexity within each function (best exemplified by the Cyclomatic Complexity measure) and dependency complexity between functions within the container (Translation Unit or Class). The other is interface complexity, measuring the level of dependency, including cyclic ones, between collaborating and hierarchical components or classes. In the C/C++ world, this is across multiple TUs. In Structure101 terms, the internal form of complexity is called “Fat” and the external form called “Tangles”.
Back to your question, this Hersteller Initiative Software ‘CALLING’ metric is targeting internal complexity (Fat). Their argument appears to be that if you have more than 5 points of reference to a single function, there may be too much implementation logic in that C++ class or C implementation file, and therefore perhaps time to break into separate modules or components. It seems like a peculiarly stinted view of software design and structure, and the list of exceptions may be as long as the areas where such a judgement might apply.
I read Deprecating the Observer Pattern with Scala.React and found reactive programming very interesting.
But there is a point I can't figure out: the author described the signals as the nodes in a DAG(Directed acyclic graph). Then what if you have two signals(or event sources, or models, w/e) depending on each other? i.e. the 'two-way binding', like a model and a view in web front-end programming.
Sometimes it's just inevitable because the user can change view, and the back-end(asynchronous request, for example) can change model, and you hope the other side to reflect the change immediately.
The loop dependencies in a reactive programming language can be handled with a variety of semantics. The one that appears to have been chosen in scala.React is that of synchronous reactive languages and specifically that of Esterel. You can have a good explanation of this semantics and its alternatives in the paper "The synchronous languages 12 years later" by Benveniste, A. ; Caspi, P. ; Edwards, S.A. ; Halbwachs, N. ; Le Guernic, P. ; de Simone, R. and available at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1173191&tag=1 or http://virtualhost.cs.columbia.edu/~sedwards/papers/benveniste2003synchronous.pdf.
Replying #Matt Carkci here, because a comment wouldn't suffice
In the paper section 7.1 Change Propagation you have
Our change propagation implementation uses a push-based approach based on a topologically ordered dependency graph. When a propagation turn starts, the propagator puts all nodes that have been invalidated since the last turn into a priority queue which is sorted according to the topological order, briefly level, of the nodes. The propagator dequeues the node on the lowest level and validates it, potentially changing its state and putting its dependent nodes, which are on greater levels, on the queue. The propagator repeats this step until the queue is empty, always keeping track of the current level, which becomes important for level mismatches below. For correctly ordered graphs, this process monotonically proceeds to greater levels, thus ensuring data consistency, i.e., the absence of glitches.
and later at section 7.6 Level Mismatch
We therefore need to prepare for an opaque node n to access another node that is on a higher topological level. Every node that is read from during n’s evaluation, first checks whether the current propagation level which is maintained by the propagator is greater than the node’s level. If it is, it proceed as usual, otherwise it throws a level mismatch exception containing a reference to itself, which is caught only in the main propagation loop. The propagator then hoists n by first changing its level to a level above the node which threw the exception, reinserting n into the propagation queue (since it’s level has changed) for later evaluation in the same turn and then transitively hoisting all of n’s dependents.
While there's no mention about any topological constraint (cyclic vs acyclic), something is not clear. (at least to me)
First arises the question of how is the topological order defined.
And then the implementation suggests that mutually dependent nodes would loop forever in the evaluation through the exception mechanism explained above.
What do you think?
After scanning the paper, I can't find where they mention that it must be acyclic. There's nothing stopping you from creating cyclic graphs in dataflow/reactive programming. Acyclic graphs only allow you to create Pipeline Dataflow (e.g. Unix command line pipes).
Feedback and cycles are a very powerful mechanism in dataflow. Without them you are restricted to the types of programs you can create. Take a look at Flow-Based Programming - Loop-Type Networks.
Edit after second post by pagoda_5b
One statement in the paper made me take notice...
For correctly ordered graphs, this process
monotonically proceeds to greater levels, thus ensuring data
consistency, i.e., the absence of glitches.
To me that says that loops are not allowed within the Scala.React framework. A cycle between two nodes would seem to cause the system to continually try to raise the level of both nodes forever.
But that doesn't mean that you have to encode the loops within their framework. It could be possible to have have one path from the item you want to observe and then another, separate, path back to the GUI.
To me, it always seems that too much emphasis is placed on a programming system completing and giving one answer. Loops make it difficult to determine when to terminate. Libraries that use the term "reactive" tend to subscribe to this thought process. But that is just a result of the Von Neumann architecture of computers... a focus of solving an equation and returning the answer. Libraries that shy away from loops seem to be worried about program termination.
Dataflow doesn't require a program to have one right answer or ever terminate. The answer is the answer at this moment of time due to the inputs at this moment. Feedback and loops are expected if not required. A dataflow system is basically just a big loop that constantly passes data between nodes. To terminate it, you just stop it.
Dataflow doesn't have to be so complicated. It is just a very different way to think about programming. I suggest you look at J. Paul Morison's book "Flow Based Programming" for a field tested version of dataflow or my book (once it's done).
Check your MVC knowledge. The view doesn't update the model, so it won't send signals to it. The controller updates the model. For a C/F converter, you would have two controllers (one for the F control, on for the C control). Both controllers would send signals to a single model (which stores the only real temperature, Kelvin, in a lossless format). The model sends signals to two separate views (one for C view, one for F view). No cycles.
Based on the answer from #pagoda_5b, I'd say that you are likely allowed to have cycles (7.6 should handle it, at the cost of performance) but you must guarantee that there is no infinite regress. For example, you could have the controllers also receive signals from the model, as long as you guaranteed that receipt of said signal never caused a signal to be sent back to the model.
I think the above is a good description, but it uses the word "signal" in a non-FRP style. "Signals" in the above are really messages. If the description in 7.1 is correct and complete, loops in the signal graph would always cause infinite regress as processing the dependents of a node would cause the node to be processed and vice-versa, ad inf.
As #Matt Carkci said, there are FRP frameworks that allow loops, at least to a limited extent. They will either not be push-based, use non-strictness in interesting ways, enforce monotonicity, or introduce "artificial" delays so that when the signal graph is expanded on the temporal dimension (turning it into a value graph) the cycles disappear.
Can anyone explain if there are any significant advantages or disadvantages when choosing to implement features such as authentication or caching etc using hooks as opposed to using middleware?
For instance - I can implement a translation feature by obtaining the request object through custom middleware and setting an app language variable that can be used to load the correct translation file when the app executes. Or I can add a hook before the routing and read the request variable and then load the correct file during the app execution.
Is there any obvious reason I am missing that makes one choice better than the other?
Super TL/DR; (The very short answer)
Use middleware when first starting some aspect of your application, i.e. routers, the boot process, during login confirmation, and use hooks everywhere else, i.e. in components or in microservices.
TL/DR; (The short answer)
Middleware is used when the order of execution matters. Because of this, middleware is often added to the execution stack in various aspects of code (middleware is often added during boot, while adding a logger, auth, etc. In most implementations, each middleware function subsequently decides if execution is continued or not.
However, using middleware when order of execution does not matter tends to lead to bugs in which middleware that gets added does not continue execution by mistake, or the intended order is shuffled, or someone simply forgets where or why a middleware was added, because it can be added almost anywhere. These bugs can be difficult to track down.
Hooks are generally not aware of the execution order; each hooked function is simply executed, and that is all that is guaranteed (i.e. adding a hook after another hook does not guarantee the 2nd hook is always executed second, only that it will simply be executed). The choice to perform it's task is left up to the function itself (to call out to state to halt execution). Most people feel this is much simpler and has fewer moving parts, so statistically yields less bugs. However, to detect if it should run or not, it can be important to include additional state in hooks, so that the hook does not reach out into the app and couple itself with things it's not inherently concerned with (this can take discipline to reason well, but is usually simpler). Also, because of their simplicity, hooks tend to be added at certain named points of code, yielding fewer areas where hooks can exist (often a single place).
Generally, hooks are easier to reason with and store because their order is not guaranteed or thought about. Because hooks can negate themselves, hooks are also computationally equivalent, making middleware only a form of coding style or shorthand for common issues.
Deep dive
Middleware is generally thought of today by architects as a poor choice. Middleware can lead to nightmares and the added effort in debugging is rarely outweighed by any shorthand achieved.
Middleware and Hooks (along with Mixins, Layered-config, Policy, Aspects and more) are all part of the "strategy" type of design pattern.
Strategy patterns, because they are invoked whenever code branching is involved, are probably one of if not the most often used software design patterns.
Knowledge and use of strategy patterns are probably the easiest way to detect the skill level of a developer.
A strategy pattern is used whenever you need to apply "if...then" type of logic (optional execution/branching).
The more computational thought experiments that are made on a piece of software, the more branches can mentally be reduced, and subsequently refactored away. This is essentially "aspect algebra"; constructing the "bones" of the issue, or thinking through what is happening over and over, reducing the procedure to it's fundamental concepts/first principles. When refactoring, these thought experiments are where an architect spends the most time; finding common aspects and reducing unnecessary complexity.
At the destination of complexity reduction is emergence (in systems theory vernacular, and specifically with software, applying configuration in special layers instead of writing software in the first place) and monads.
Monads tend to abstract away what is being done to a level that can lead to increased code execution time if a developer is not careful.
Both Monads and Emergence tend to abstract the problem away so that the parts can be universally applied using fundamental building blocks. Using Monads (for the small) and Emergence (for the large), any piece of complex software can be theoretically constructed from the least amount of parts possible.
After all, in refactoring: "the easiest code to maintain is code that no longer exists."
Functors and mapping functions
A great way to continually reduce complexity is applying functors and mapping functions. Functors are also usually the fastest possible way to implement a branch and let the compiler see into the problem deeply so it can optimize things in the best way possible. They are also extremely easy to reason with and maintain, so there is rarely harm in leaving your work for the day and committing your changes with a partially refactored application.
Functors get their name from mathematics (specifically category theory, in which they are referred to a function that maps between two sets). However, in computation, functors are generally just objects that map problem-space in one way or another.
There is great debate over what is or is not a functor in computer science, but in keeping with the definition, you only need to be concerned with the act of mapping out your problem, and using the "functor" as a temporary thought scaffold that allows you to abstract the issue away until it becomes configuration or a factor of implementation instead of code.
As far as I can say that middleware is perfect for each routing work. And hooks is best for doing anything application-wide. For your case I think it should be better to use hooks than middleware.
This question relates to project design. The project takes an electrical system and defines it function programatically. Now that I'm knee-deep in defining the system, I'm incorporating a significant amount of interaction which causes the system to configure itself appropriately. Example: the system opens and closes electrical contactors when certain events occur. Because this system is on an airplane, it relies on air/ground logic and thus incorporates two different behaviors depending on where it is.
I give all of this explanation to demonstrate the level of complexity that this application contains. As I have continued in my design, I have employed the use of if/else constructs as a means of extrapolating the proper configurations in this electrical system. However, the deeper I get into the coding, the more if/else constructs are required. I feel that I have reached a point where I am inefficiently programing this system.
For those who tackled projects like this before, I ask: Am I treading a well-known path (when it comes to defining EVERY possible scenario that could occur) and I should continue to persevere... or can I employ some other strategies to accomplish the task of defining a real-world system's behavior.
At this point, I have little to no experience using delegates, but I wonder if I could utilize some observers or other "cocoa-ey" goodness for checking scenarios in lieu of endless if/else blocks.
Since you are trying to model a real world system, I would suggest creating a concrete object oriented design that well defines the is-a and a has-a relationships and apply good old fashioned object oriented design and apply it into breaking the real world system into a functional decomposition.
I would suggest you look into defining protocols that handle the generic case, and using them on specific cases.
For example, you can have many types of events adhering to an ElectricalEvent protocol and depending on the type you can better decide how an ElectricalContactor discriminates on a GeneralElectricEvent versus a SpecializedElectricEvent using the isKindOfClass selector.
If you can define all the states in advance, you're best of implementing this as a finite state machine. This allows you to define the state-dependent logic clearly in one central place.
There are some implementations that you could look into:
SCM allows you to generate state machine code for Objective-C
OFC implements them as DFSM
Of course you can also roll your own customized implementation if that suits you better.