HIS-Metric "calling"

I do not understand the reason for this metric/rule:
A function should not be called from more than 5 different functions.
All calls within the same function are counted as 1. The rule is
limited to translation unit scope.
It appears completely counter-intuitive to me, because it contradicts code reuse and the approach of splitting code into frequently used functions instead of duplicating it.
Can someone explain the rationale?

The first thing to say is that metric-based quality approaches are by their nature a little subjective and approximate. There are no absolutes in following a metric approach to delivering good quality code.
There are two factors to consider in software complexity. One is the internal complexity, expressed by decision complexity within each function (best exemplified by the Cyclomatic Complexity measure) and dependency complexity between functions within the container (Translation Unit or Class). The other is interface complexity, measuring the level of dependency, including cyclic ones, between collaborating and hierarchical components or classes. In the C/C++ world, this is across multiple TUs. In Structure101 terms, the internal form of complexity is called “Fat” and the external form called “Tangles”.
Back to your question: this Hersteller Initiative Software ‘CALLING’ metric targets internal complexity (Fat). Their argument appears to be that if a single function is called from more than 5 different functions, there may be too much implementation logic in that C++ class or C implementation file, and it may therefore be time to break it into separate modules or components. It seems a peculiarly narrow view of software design and structure, and the list of exceptions may be as long as the list of areas where such a judgement might apply.
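To make the counting rule quoted in the question concrete, here is a minimal C++ sketch. It assumes the call sites of one translation unit have already been extracted (by some analysis tool) as (caller, callee) pairs; the names CallEdge, callingMetric and violations are hypothetical and not part of any HIS tool. Repeated calls from the same caller collapse into one, as the rule states.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One call site inside a single translation unit: (caller, callee).
using CallEdge = std::pair<std::string, std::string>;

// Number of *distinct* callers per callee. Several calls from the same
// caller count as 1, matching the rule quoted in the question.
std::map<std::string, std::size_t> callingMetric(const std::vector<CallEdge>& calls) {
    std::map<std::string, std::set<std::string>> callers;
    for (const auto& [caller, callee] : calls) {
        callers[callee].insert(caller);
    }
    std::map<std::string, std::size_t> result;
    for (const auto& [callee, who] : callers) {
        result[callee] = who.size();
    }
    return result;
}

// Callees whose distinct-caller count exceeds the limit (5 in the HIS rule).
std::vector<std::string> violations(const std::vector<CallEdge>& calls,
                                    std::size_t limit = 5) {
    std::vector<std::string> out;
    for (const auto& [callee, count] : callingMetric(calls)) {
        if (count > limit) out.push_back(callee);
    }
    return out;
}
```

With input like the test below, a logging helper called from six functions is flagged while a helper called once is not, which is exactly the tension with code reuse that the question raises.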

Related

In multi-stage compilation, should we use a standard serialisation method to ship objects through stages?

This question is formulated in Scala 3/Dotty but should be generalised to any language NOT in MetaML family.
The Scala 3 macro tutorial:
https://docs.scala-lang.org/scala3/reference/metaprogramming/macros.html
Starts with the Phase Consistency Principle, which explicitly states that free variables defined in one compilation stage CANNOT be used by the next stage, because their bound objects cannot be persisted to a different compiler process:
... Hence, the result of the program will need to persist the program state itself as one of its parts. We don’t want to do this, hence this situation should be made illegal
This should be considered a solved problem, given that many distributed computing frameworks demand a similar capability to persist objects across multiple computers. The most common kind of solution (as observed in Apache Spark) uses standard serialisation/pickling to create snapshots of the bound objects (Java standard serialization, Kryo/Twitter Chill), which can be saved to disk/off-heap memory or sent over the network.
The tutorial itself also suggests the possibility twice:
One difference is that MetaML does not have an equivalent of the PCP - quoted code in MetaML can access variables in its immediately enclosing environment, with some restrictions and caveats since such accesses involve serialization. However, this does not constitute a fundamental gain in expressiveness.
In the end, ToExpr resembles very much a serialization framework
Instead, both Scala 2 and Scala 3 (and their respective ecosystems) largely ignore these out-of-the-box solutions and only provide default instances for primitive types (Liftable in Scala 2, ToExpr in Scala 3). In addition, existing libraries that use macros rely heavily on manually defined quasiquotes/quotes for this trivial task, making the source much longer and harder to maintain while not making anything faster (as JVM object serialisation is a highly optimised language component).
What's the cause of this status quo? How do we improve it?

Single function handling events or a function for each event?

What to prefer in which situation: Single function handling events or a function for each event?
Here is a basic code example:
Option 1
enum Notification {
    case A
    case B
    case C
}
protocol One {
    func consumer(consumer: Consumer, didReceiveNotification notification: Notification)
}
or
Option 2
protocol Two {
    func consumerDidReceiveA(consumer: Consumer)
    func consumerDidReceiveB(consumer: Consumer)
    func consumerDidReceiveC(consumer: Consumer)
}
Background
Apple uses both options. For example, NSStreamDelegate uses the first option, while CoreBluetooth (e.g. CBCentralManagerDelegate) uses option two.
One big difference I see is that Swift does not support optional protocol methods nicely (only via extensions or the @objc attribute).
What would you prefer? What's the (dis)advantages?
In terms of achieving the loosest form of coupling and highest degree of cohesion, naturally the choice would sway towards individual events, not this kind of multi-event bundle of responsibilities.
Yet there are a lot of practical concerns that might move you towards favoring the opposite, coarser way of dealing with events instead of a separate function per granular event.
Here are some possible ones (not listed in any specific order).
Boilerplate
While it's not the biggest thing to worry about, writing a bunch of functions tends to take a bit more effort than writing a bunch of if/else statements or switch cases within one. More important than this, however, is the code needed to connect/disconnect event-handling slots to event-handling signals. Avoiding the need to write that subscription/unsubscription code for every single teeny event handled can save considerably on the amount of code to maintain.
Performance
It might seem counter-intuitive that performance can favor the coarser multi-event handler. After all, the granular event-handler requires less branching (one dynamic dispatch to get to the precise event handler), while the coarser one requires twice as much (one dynamic dispatch to get to a coarse event-handling site, and another local series of branches to get to the precise event-handling code).
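The two dispatch paths described above can be sketched in C++ (rather than the question's Swift) as two hypothetical listener interfaces mirroring Option 1 and Option 2: the coarse one costs one virtual call plus a local switch, the fine-grained one a single virtual call straight to the handler. Empty default bodies stand in for optional protocol methods.

```cpp
#include <cassert>
#include <string>

enum class Notification { A, B, C };

// Option 1: one coarse entry point; a local switch selects the event code.
struct CoarseListener {
    virtual ~CoarseListener() = default;
    virtual void onNotification(Notification n) = 0;  // one dynamic dispatch
};

// Option 2: one entry point per event; dynamic dispatch lands on it directly.
struct FineListener {
    virtual ~FineListener() = default;
    virtual void onA() {}  // empty defaults act like "optional" methods
    virtual void onB() {}
    virtual void onC() {}
};

struct CoarseLogger : CoarseListener {
    std::string log;
    void onNotification(Notification n) override {
        switch (n) {  // the second, local branch inside the handler
            case Notification::A: log += 'A'; break;
            case Notification::B: log += 'B'; break;
            case Notification::C: log += 'C'; break;
        }
    }
};

struct FineLogger : FineListener {
    std::string log;
    void onA() override { log += 'A'; }
    void onC() override { log += 'C'; }  // B is simply not overridden
};
```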
Yet the cost of dynamic dispatch leans heavily on branch prediction. If you're branching into coarser event handlers, then often you're branching more often into the same set of instructions, and that can be an optimization strategy. To have two sets of more predictable branches can often produce more optimal results than one less predictable branch.
Moreover, coarser event-handling typically implies fewer aggregates, fewer lists of functions to call on the side of those triggering events. And that can translate to reduced memory usage and improved locality of reference.
On the flip side, to branch into coarser event handlers often means branching more often. For example, some site might only be interested in push kind of input events, not resize events. If we lump all these together into a coarse event handler and without some filtering mechanism on top, then typically we would have to pay the cost of dynamic dispatch even for a resize event that isn't even handled for a particular site.
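One possible filtering mechanism of the kind mentioned above, as a minimal C++ sketch under assumed names (EventKind, Dispatcher and maskOf are hypothetical): each coarse subscriber registers a bit mask of the event kinds it handles, and the dispatcher skips subscribers whose mask does not match, so a push-only handler is never called for a resize event.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical event kinds, one bit each so they can be combined into a mask.
enum class EventKind : std::uint32_t { Push = 1u << 0, Resize = 1u << 1, Key = 1u << 2 };

struct Event { EventKind kind; };

constexpr std::uint32_t maskOf(EventKind k) { return static_cast<std::uint32_t>(k); }

class Dispatcher {
public:
    using Handler = std::function<void(const Event&)>;

    // One coarse handler per subscriber, plus the kinds it wants to receive.
    void subscribe(std::uint32_t mask, Handler h) { subs_.push_back({mask, std::move(h)}); }

    // The mask test is a cheap local branch; subscribers that do not care
    // about this kind are skipped without a call through std::function.
    void emit(const Event& e) const {
        for (const auto& s : subs_) {
            if (s.mask & maskOf(e.kind)) s.handler(e);
        }
    }

private:
    struct Sub { std::uint32_t mask; Handler handler; };
    std::vector<Sub> subs_;
};
```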
Yet I've often found that needlessly branching into the same coarse functions performs better than I expected (most likely because the branch predictor succeeds), compared with branching only as needed into a wide variety of disparate functions.
So there's a balancing act here, and even performance doesn't clearly side with one strategy over the other. It still varies case by case.
Nevertheless, lacking measurements and very detailed data about the critical code paths, it's typically safer from a performance perspective to err on the side of these coarser multi-event handlers. After all, even if that proves to be the wrong decision from a performance standpoint, it's easier to optimize from coarse to fine (we can even do so very non-intrusively by keeping the coarse and using fine-grained event-handling in cases that benefit most from it) than vice versa.
Event Subscription/Unsubscription
This can likewise swing one way or the other, but in my experience (in team settings), most of the human errors associated with event handling do not occur within the event-handling code but outside it. The most common sources of errors I see are failing to subscribe to events and, most commonly, failing to unsubscribe when the events are no longer of interest.
When events are handled at a coarser level, there's typically less of that error-prone subscription/unsubscription code involved (this relates to the boilerplate concerns above, but it is an unusual kind of boilerplate in that it can be quite error-prone and not merely tedious to write).
This is also very case-by-case. In the systems I've often been involved in, entities commonly needed to stay alive while unsubscribing from certain events early. Those cases often required the unsubscription code to be written manually, since it could not be tied to the entity's lifetime. That may have pointed more to design issues elsewhere, but in that scenario, the number of mistakes made team-wide went down with coarser event handling.
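For the subscription errors described above, one common mitigation is an RAII connection token, sketched below in C++ with a hypothetical Signal class (not any particular library's API). The token unsubscribes automatically when destroyed, and reset() covers the early-unsubscribe case for entities that stay alive.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Minimal signal, a hypothetical stand-in for the event system actually in use.
class Signal {
    using Slot = std::function<void()>;
    using Slots = std::map<std::size_t, Slot>;

public:
    // Move-only token: unsubscribes in its destructor, or earlier via reset().
    class Connection {
    public:
        Connection() = default;
        Connection(std::weak_ptr<Slots> slots, std::size_t id)
            : slots_(std::move(slots)), id_(id) {}
        Connection(Connection&& other) noexcept
            : slots_(std::move(other.slots_)), id_(other.id_) {}
        Connection& operator=(Connection&& other) noexcept {
            if (this != &other) {
                reset();
                slots_ = std::move(other.slots_);
                id_ = other.id_;
            }
            return *this;
        }
        Connection(const Connection&) = delete;
        Connection& operator=(const Connection&) = delete;
        ~Connection() { reset(); }

        // Manual unsubscribe for an entity that outlives its interest.
        void reset() {
            if (auto slots = slots_.lock()) slots->erase(id_);
            slots_.reset();
        }

    private:
        std::weak_ptr<Slots> slots_;  // weak: safe even if the Signal dies first
        std::size_t id_ = 0;
    };

    Connection connect(Slot slot) {
        const std::size_t id = nextId_++;
        slots_->emplace(id, std::move(slot));
        return Connection(slots_, id);
    }

    void emit() const {
        for (const auto& entry : *slots_) entry.second();
    }

private:
    std::shared_ptr<Slots> slots_ = std::make_shared<Slots>();
    std::size_t nextId_ = 0;
};
```

Because the token holds only a weak_ptr to the slot table, destroying the Signal before the token is harmless. Boost.Signals2 offers a comparable scoped connection type; this sketch does not reproduce its interface.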
Type Safety
While not shown in the examples here, coarser event handling typically comes with a need to squeeze more disparate types of data through more generic parameters. In an extreme scenario, as in C, that might translate to squeezing more data through void pointers and more dangerous pointer type casts. With that, compile-time type safety is obliterated and we could start seeing a whole new source of human error.
In higher-level languages, this might translate to more down casts or things of that sort when we cannot model the signature of a delegate to perfectly fit the parameters passed in when an event is triggered.
I've typically found that this isn't the biggest source of confusion and bugs, provided that there is at least some form of runtime type safety when casting or unboxing these parameters. But it is a con on the side of coarser event handling.
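A minimal C++17 sketch of a coarse handler that keeps runtime type safety: the payload travels as a std::variant, and std::get_if checks the active alternative instead of blindly casting a void pointer. The payload types and Widget are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical payloads for three kinds of events.
struct Click  { int x, y; };
struct Resize { int width, height; };
struct Text   { std::string chars; };

// One coarse event type; the variant remembers which payload it holds.
using Event = std::variant<Click, Resize, Text>;

struct Widget {
    int clicks = 0;
    int lastX = 0;
    int area = 0;
    std::string typed;

    // A single coarse handler. std::get_if returns nullptr on a type mismatch
    // rather than reinterpreting the bytes, as a void* cast in C would.
    void onEvent(const Event& e) {
        if (const auto* c = std::get_if<Click>(&e)) {
            ++clicks;
            lastX = c->x;
        } else if (const auto* r = std::get_if<Resize>(&e)) {
            area = r->width * r->height;
        } else if (const auto* t = std::get_if<Text>(&e)) {
            typed += t->chars;
        }
    }
};
```

Using std::visit with one overload per payload type would additionally make the compiler insist that every alternative is handled.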
Intellectual Overhead
This might vary per individual but I tend to look at systems from a very administrative/overview kind of standpoint and specifically with respect to control flow. It's because I tend to work in lower-level portions of the system, including things like proprietary UI toolkits.
In those cases, when a button is pushed, what functions are called? In a large-scale codebase composed of hundreds of thousands of little functions, that becomes a mystery unless you trace into the code actually invoked by the button push and see each and every function that is called.
That's an inevitability with an event-driven paradigm and something I never became 100% comfortable about, but I find it alleviates some of that explosive complexity that I perceive in my personal mental model (something resembling a very complex graph) when there's less code decentralization. With coarser event handlers comes fewer, more centralized functions to branch into throughout a system on such a button push, and that helps me increase my familiarity when there are fewer but bigger functions involved in my mental graph.
There is a very simple practical benefit here: if we want to find out when a specific entity responds to a series of events, we can simply put a breakpoint on its one coarse event-handling site (while still being able to drill down to a specific event for that entity by putting a breakpoint in a local branch of code).
Of course, I might be an exception there working in these low-level systems that everyone uses. It seems a lot of people are comfortable with the idea of just subscribing to a button push event in their code without worrying about all the other subscribers to the same event.
From my kind of holistic control flow view of the system, it helps me to absorb the complexity more easily when there are fewer but coarser event-handling sites in the codebase even though I normally otherwise find monolithic functions to be a burden. Especially in a debugging context where I face a concern like, "What caused this to happen?", that combined with the event-handling concern of "What functions are actually going to be called when this happens?" can really multiply the complexity. With fewer potential target sites where events are handled, the latter concern is mitigated.
Conclusion
So these are some factors that might sway you to choose one design strategy over another. I find myself somewhere in the middle. I generally don't choose a design as coarse as, say, wndproc on Windows, which associates a single, ultra-coarse event handler with every window event imaginable. Yet I might favor designing at a coarser event-handling level than some would, just to alleviate this kind of mental complexity, reduce code decentralization, and possibly improve performance (always with a profiler in hand).
And then there are times when I choose to design at a very granular level when the complexity isn't that great (typically when the package triggering events isn't that central), when performance isn't a concern or performance actually favors this route, and for the improved type safety. It's all case-by-case.

Object oriented programming with C++-AMP

I need to update some code I used for the Aho-Corasick algorithm in order to implement the algorithm on the GPU. However, the code relies heavily on the object-oriented programming model. My question is: is it possible to pass objects to parallel_for_each? If not, is there any workaround that would spare me from rewriting the entire code? My apologies if this seems a naive question. C++ AMP is the first language I have used for GPU programming, so my experience in this field is quite limited.
The answer to your question is yes, in that you can pass classes or structs to a lambda marked restrict(amp). Note that parallel_for_each itself is not AMP-restricted; only its lambda is.
However you are limited to using the types that are supported by the GPU. This is more of a limitation of current GPU hardware, rather than C++ AMP.
A C++ AMP-compatible function or lambda can only use C++ AMP-compatible types, which include the following:
int
unsigned int
float
double
C-style arrays of int, unsigned int, float, or double
concurrency::array_view or references to concurrency::array
structs containing only C++ AMP-compatible types
This means that some data types are forbidden:
bool (can be used for local variables in the lambda)
char
short
long long
unsigned versions of the above
References and pointers (to a compatible type) may be used locally but cannot be captured by a lambda. Function pointers, pointer-to-pointer, and the like are not allowed; neither are static or global variables.
Classes must meet more rules if you wish to use instances of them. They must have no virtual functions or virtual inheritance. Constructors, destructors, and other nonvirtual functions are allowed. The member variables must all be of compatible types, which could of course include instances of other classes as long as those classes meet the same rules.
... From the C++ AMP book, Ch. 3.
So while you can do this it may not be the best solution for performance reasons. CPU and GPU caches are somewhat different. This makes arrays of structs a better choice of CPU implementations, whereas GPUs often perform better if structs of arrays are used.
GPU hardware is designed to provide the best performance when all threads within a warp are accessing consecutive memory and performing the same operations on that data. Consequently, it should come as no surprise that GPU memory is designed to be most efficient when accessed in this way. In fact, load and store operations to the same transfer line by different threads in a warp are coalesced into as little as a single transaction. The size of a transfer line is hardware-dependent, but in general, your code does not have to account for this if you focus on making memory accesses as contiguous as possible.
... Ch. 7.
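To illustrate the two layouts discussed above, here is a plain C++ sketch (no AMP code; the Body fields are hypothetical) converting an array of structs into a struct of arrays and back. In a C++ AMP port, each of the SoA vectors could then back its own concurrency::array_view, so that consecutive threads read consecutive floats.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structs (AoS): one record per body, convenient for CPU code.
struct Body { float x, y, z, mass; };

// Struct of arrays (SoA): each field is contiguous, so neighbouring GPU threads
// reading x[i] and x[i + 1] touch neighbouring addresses.
struct Bodies {
    std::vector<float> x, y, z, mass;
    std::size_t size() const { return x.size(); }
};

Bodies toSoA(const std::vector<Body>& aos) {
    Bodies soa;
    for (const Body& b : aos) {
        soa.x.push_back(b.x);
        soa.y.push_back(b.y);
        soa.z.push_back(b.z);
        soa.mass.push_back(b.mass);
    }
    return soa;
}

std::vector<Body> toAoS(const Bodies& soa) {
    // All four arrays must have the same length for the layouts to correspond.
    assert(soa.y.size() == soa.size() && soa.z.size() == soa.size() &&
           soa.mass.size() == soa.size());
    std::vector<Body> aos;
    aos.reserve(soa.size());
    for (std::size_t i = 0; i < soa.size(); ++i) {
        aos.push_back({soa.x[i], soa.y[i], soa.z[i], soa.mass[i]});
    }
    return aos;
}
```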
If you take a look at the CPU and GPU implementations in my n-body example, you'll see both approaches implemented for CPU and GPU.
The above does not mean that your algorithm will not run faster when you move the implementation to C++ AMP. It just means that you may be leaving some additional performance on the table. I would recommend doing the simplest port possible and then consider if you want to invest more time optimizing the code, possibly rewriting it to take better advantage of the GPU's architecture.
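One restriction a straightforward Aho-Corasick port is likely to hit is that char is on the forbidden list quoted earlier. One workaround, shown here as an assumption rather than something from the book, is to pack four characters into each unsigned int on the host and extract them with shifts in the kernel. The function names are hypothetical, and the sketch assumes a 32-bit unsigned int.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Host side: pack four 8-bit characters into each 32-bit unsigned int,
// first character in the lowest byte, padding the final word with zeros.
std::vector<unsigned int> packText(const std::string& text) {
    std::vector<unsigned int> words((text.size() + 3) / 4, 0u);
    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned int byte = static_cast<unsigned char>(text[i]);
        words[i / 4] |= byte << (8 * (i % 4));
    }
    return words;
}

// Kernel side: recover byte k (0..3) of a word using only unsigned int
// shifts and masks, the kind of arithmetic the quoted type rules allow.
inline unsigned int byteOf(unsigned int word, unsigned int k) {
    return (word >> (8u * k)) & 0xFFu;
}
```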

hooks versus middleware in slim 2.0

Can anyone explain if there are any significant advantages or disadvantages when choosing to implement features such as authentication or caching etc using hooks as opposed to using middleware?
For instance - I can implement a translation feature by obtaining the request object through custom middleware and setting an app language variable that can be used to load the correct translation file when the app executes. Or I can add a hook before the routing and read the request variable and then load the correct file during the app execution.
Is there any obvious reason I am missing that makes one choice better than the other?
Super TL/DR; (The very short answer)
Use middleware when first starting some aspect of your application, i.e. routers, the boot process, during login confirmation, and use hooks everywhere else, i.e. in components or in microservices.
TL/DR; (The short answer)
Middleware is used when the order of execution matters. Because of this, middleware is often added to the execution stack in various parts of the code (it is often added during boot, when adding a logger, auth, etc.). In most implementations, each middleware function then decides whether execution continues.
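A minimal C++ sketch of the middleware model described above (Request, run, setLanguage and requireAuth are hypothetical, not Slim's PHP API): each middleware receives a next callback and decides whether to continue, so registration order changes the outcome.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical request type for the sketch.
struct Request {
    std::string acceptLanguage;
    bool authenticated = false;
    std::string language;
    std::string body;
};

using Next = std::function<void()>;
using Middleware = std::function<void(Request&, const Next&)>;
using App = std::function<void(Request&)>;

// Runs the stack in order; each middleware decides whether to call next().
void run(const std::vector<Middleware>& stack, Request& req, const App& app) {
    std::function<void(std::size_t)> step = [&](std::size_t i) {
        if (i == stack.size()) {
            app(req);
            return;
        }
        stack[i](req, [&] { step(i + 1); });
    };
    step(0);
}

const Middleware setLanguage = [](Request& r, const Next& next) {
    r.language = r.acceptLanguage.empty() ? std::string("en") : r.acceptLanguage;
    next();
};

const Middleware requireAuth = [](Request& r, const Next& next) {
    if (!r.authenticated) {
        r.body = "401 Unauthorized";
        return;  // deliberately stops the chain; a forgotten next() elsewhere is a bug
    }
    next();
};
```

Running the same two middleware in the opposite order, {requireAuth, setLanguage}, on an unauthenticated request leaves language empty, because the chain stops before setLanguage runs.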
However, using middleware when order of execution does not matter tends to lead to bugs in which middleware that gets added does not continue execution by mistake, or the intended order is shuffled, or someone simply forgets where or why a middleware was added, because it can be added almost anywhere. These bugs can be difficult to track down.
Hooks are generally not aware of the execution order; each hooked function is simply executed, and that is all that is guaranteed (i.e. adding a hook after another hook does not guarantee the second hook is always executed second, only that it will be executed). The choice to perform its task is left up to the function itself (it calls out to state to halt execution). Most people feel this is much simpler and has fewer moving parts, so statistically it yields fewer bugs. However, to detect whether it should run, a hook may need additional state passed to it, so that it does not reach out into the app and couple itself to things it's not inherently concerned with (this can take discipline to reason about well, but is usually simpler). Also, because of their simplicity, hooks tend to be added at certain named points in the code, yielding fewer places where hooks can exist (often a single place).
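For contrast, a hook registry under the same kind of assumptions (Hooks, AppState, dispatch and the "before.router" point name are hypothetical): every hook at a named point runs, and a hook that wants to stop processing sets a flag in shared state that the application checks afterwards.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical application state shared with hooks.
struct AppState {
    std::string query;
    std::string language = "en";
    bool halted = false;  // a hook "calls out to state" to stop execution
};

using Hook = std::function<void(AppState&)>;

class Hooks {
public:
    void add(const std::string& point, Hook h) { hooks_[point].push_back(std::move(h)); }

    // Every hook at this named point runs; no hook can stop the others.
    // (This sketch happens to use registration order; callers should not rely on it.)
    void apply(const std::string& point, AppState& s) const {
        const auto it = hooks_.find(point);
        if (it == hooks_.end()) return;
        for (const auto& h : it->second) h(s);
    }

private:
    std::map<std::string, std::vector<Hook>> hooks_;
};

// The application checks state after the hook point, instead of each hook
// controlling a chain.
std::string dispatch(const Hooks& hooks, AppState& s) {
    hooks.apply("before.router", s);  // hypothetical hook point name
    if (s.halted) return "halted";
    return "page in " + s.language;
}
```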
Generally, hooks are easier to reason about and store because their order is not guaranteed or thought about. Because hooks can negate themselves, hooks are also computationally equivalent to middleware, making middleware only a form of coding style or shorthand for common issues.
Deep dive
Middleware is generally thought of today by architects as a poor choice. Middleware can lead to nightmares and the added effort in debugging is rarely outweighed by any shorthand achieved.
Middleware and Hooks (along with Mixins, Layered-config, Policy, Aspects and more) are all part of the "strategy" type of design pattern.
Strategy patterns, because they are invoked whenever code branching is involved, are probably one of the most often used software design patterns, if not the most.
Knowledge and use of strategy patterns are probably the easiest way to detect the skill level of a developer.
A strategy pattern is used whenever you need to apply "if...then" type of logic (optional execution/branching).
The more computational thought experiments that are made on a piece of software, the more branches can mentally be reduced, and subsequently refactored away. This is essentially "aspect algebra": constructing the "bones" of the issue, or thinking through what is happening over and over, reducing the procedure to its fundamental concepts/first principles. When refactoring, these thought experiments are where an architect spends the most time: finding common aspects and reducing unnecessary complexity.
The destination of complexity reduction is emergence (in systems theory vernacular, and specifically with software, applying configuration in special layers instead of writing software in the first place) and monads.
Monads tend to abstract away what is being done to a level that can lead to increased code execution time if a developer is not careful.
Both Monads and Emergence tend to abstract the problem away so that the parts can be universally applied using fundamental building blocks. Using Monads (for the small) and Emergence (for the large), any piece of complex software can theoretically be constructed from the fewest parts possible.
After all, in refactoring: "the easiest code to maintain is code that no longer exists."
Functors and mapping functions
A great way to continually reduce complexity is applying functors and mapping functions. Functors can also be one of the fastest ways to implement a branch, because they let the compiler see deeply into the problem so it can optimize it as well as possible. They are also extremely easy to reason about and maintain, so there is rarely harm in leaving your work for the day and committing your changes with a partially refactored application.
Functors get their name from mathematics (specifically category theory, in which a functor is a mapping between two categories). However, in computation, functors are generally just objects that map problem space in one way or another.
There is great debate over what is or is not a functor in computer science, but in keeping with the definition, you only need to be concerned with the act of mapping out your problem, and using the "functor" as a temporary thought scaffold that allows you to abstract the issue away until it becomes configuration or a factor of implementation instead of code.
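In the everyday C++ sense of "functor" (a function object), the refactoring described above often means replacing an if/else chain with a table that maps each case to a callable. A minimal sketch with hypothetical operation names; adding a case becomes a new table entry rather than another branch.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Instead of: if (op == "add") ... else if (op == "sub") ... else if ...
using BinaryOp = std::function<int(int, int)>;

// Each case is a function object; the table is effectively configuration.
const std::map<std::string, BinaryOp>& operations() {
    static const std::map<std::string, BinaryOp> table = {
        {"add", std::plus<int>{}},
        {"sub", std::minus<int>{}},
        {"mul", std::multiplies<int>{}},
        {"max", [](int a, int b) { return a > b ? a : b; }},
    };
    return table;
}

// Returns false for an unknown operation instead of falling off an else-chain.
bool applyOp(const std::string& op, int a, int b, int& out) {
    const auto& table = operations();
    const auto it = table.find(op);
    if (it == table.end()) return false;
    out = it->second(a, b);
    return true;
}
```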
As far as I can tell, middleware is perfect for per-route work, and hooks are best for anything application-wide. In your case, I think it would be better to use hooks than middleware.

Design strategy advice for defining machine system functionality

This question relates to project design. The project takes an electrical system and defines its function programmatically. Now that I'm knee-deep in defining the system, I'm incorporating a significant amount of interaction which causes the system to configure itself appropriately. Example: the system opens and closes electrical contactors when certain events occur. Because this system is on an airplane, it relies on air/ground logic and thus incorporates two different behaviors depending on where it is.
I give all of this explanation to demonstrate the level of complexity that this application contains. As I have continued with my design, I have used if/else constructs as a means of deriving the proper configurations in this electrical system. However, the deeper I get into the coding, the more if/else constructs are required. I feel that I have reached a point where I am programming this system inefficiently.
For those who tackled projects like this before, I ask: Am I treading a well-known path (when it comes to defining EVERY possible scenario that could occur) and I should continue to persevere... or can I employ some other strategies to accomplish the task of defining a real-world system's behavior.
At this point, I have little to no experience using delegates, but I wonder if I could utilize some observers or other "cocoa-ey" goodness for checking scenarios in lieu of endless if/else blocks.
Since you are trying to model a real-world system, I would suggest creating a concrete object-oriented design that clearly defines the is-a and has-a relationships, and applying good old-fashioned object-oriented design to break the real-world system down through functional decomposition.
I would suggest you look into defining protocols that handle the generic case, and using them on specific cases.
For example, you can have many types of events adhering to an ElectricalEvent protocol, and depending on the type you can better decide how an ElectricalContactor discriminates between a GeneralElectricEvent and a SpecializedElectricEvent using the isKindOfClass: method.
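The same idea sketched in C++ rather than Objective-C: an abstract base class plays the role of the ElectricalEvent protocol, and dynamic_cast plays the role of isKindOfClass:. The event fields, the trip limit and the handling logic are hypothetical.

```cpp
#include <cassert>

// C++ stand-in for an Objective-C ElectricalEvent protocol: an abstract base.
struct ElectricalEvent {
    virtual ~ElectricalEvent() = default;
};

// Generic case: for example, a bus voltage report (hypothetical).
struct GeneralElectricEvent : ElectricalEvent {
    double volts = 0.0;
};

// Specific case: overcurrent measured at a contactor (hypothetical).
struct SpecializedElectricEvent : ElectricalEvent {
    double amps = 0.0;
};

struct ElectricalContactor {
    bool closed = true;
    double tripLimitAmps = 100.0;
    double lastVolts = 0.0;

    // dynamic_cast does the job that isKindOfClass: does in Objective-C.
    void handle(const ElectricalEvent& e) {
        if (const auto* s = dynamic_cast<const SpecializedElectricEvent*>(&e)) {
            if (s->amps > tripLimitAmps) closed = false;  // trip on overcurrent
        } else if (const auto* g = dynamic_cast<const GeneralElectricEvent*>(&e)) {
            lastVolts = g->volts;  // generic handling: just record the value
        }
    }
};
```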
If you can define all the states in advance, you're best off implementing this as a finite state machine. This allows you to define the state-dependent logic clearly in one central place.
There are some implementations that you could look into:
SMC (State Machine Compiler) allows you to generate state machine code for Objective-C
OFC implements them as DFSM
Of course you can also roll your own customized implementation if that suits you better.
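A hand-rolled version of the finite-state-machine idea, sketched in C++ for one hypothetical external-power contactor with air/ground logic: all (state, input) transitions live in one table, and inputs with no entry, such as external power appearing in flight, are ignored instead of needing another if/else branch. States, inputs and transitions are illustrative assumptions, not real aircraft logic.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical states and inputs for one external-power contactor.
enum class State { GroundOpen, GroundClosed, Airborne };
enum class Input { ExternalPowerOn, ExternalPowerOff, WeightOffWheels, WeightOnWheels };

class ContactorFsm {
public:
    State state() const { return state_; }

    // Look the (state, input) pair up in one central table; pairs with no
    // entry leave the state unchanged.
    void on(Input in) {
        const auto& t = table();
        const auto it = t.find({state_, in});
        if (it != t.end()) state_ = it->second;
    }

private:
    using Key = std::pair<State, Input>;

    static const std::map<Key, State>& table() {
        static const std::map<Key, State> t = {
            {{State::GroundOpen, Input::ExternalPowerOn}, State::GroundClosed},
            {{State::GroundClosed, Input::ExternalPowerOff}, State::GroundOpen},
            // Leaving the ground opens the contactor regardless of external power.
            {{State::GroundOpen, Input::WeightOffWheels}, State::Airborne},
            {{State::GroundClosed, Input::WeightOffWheels}, State::Airborne},
            {{State::Airborne, Input::WeightOnWheels}, State::GroundOpen},
        };
        return t;
    }

    State state_ = State::GroundOpen;
};
```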