In multi-stage compilation, should we use a standard serialisation method to ship objects through stages?

This question is formulated in Scala 3/Dotty, but it should generalise to any language NOT in the MetaML family.
The Scala 3 macro tutorial:
https://docs.scala-lang.org/scala3/reference/metaprogramming/macros.html
starts with the Phase Consistency Principle, which explicitly states that free variables defined in one compilation stage CANNOT be used by the next stage, because their binding objects cannot be persisted to a different compiler process:
... Hence, the result of the program will need to persist the program state itself as one of its parts. We don’t want to do this, hence this situation should be made illegal
This should be considered a solved problem, given that many distributed computing frameworks demand a similar capability to persist objects across multiple computers. The most common kind of solution (as observed in Apache Spark) uses standard serialisation/pickling to create snapshots of the bound objects (Java standard serialization, Twitter's Kryo/Chill), which can be saved to disk/off-heap memory or sent over the network.
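To make the distributed-computing analogy concrete, here is a minimal sketch of the snapshot approach using plain Java serialization from Scala. The `Config` type and its values are hypothetical; the point is only that a bound object can be turned into a byte-array "snapshot" in one process and restored in another:

```scala
import java.io._

// Hypothetical example: snapshot a bound object with standard Java
// serialisation, the same mechanism Spark uses to ship closures.
case class Config(name: String, retries: Int) extends Serializable

object SnapshotDemo {
  // Turn an object graph into a persistable blob
  def serialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out   = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.toByteArray
  }

  // Restore the object graph in another process or "stage"
  def deserialize[A](data: Array[Byte]): A = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data))
    in.readObject().asInstanceOf[A]
  }
}
```

In principle, a compiler stage could emit such a blob as part of its output and the next stage could read it back, which is exactly the capability the PCP rules out.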
The tutorial itself also suggests this possibility twice:
One difference is that MetaML does not have an equivalent of the PCP - quoted code in MetaML can access variables in its immediately enclosing environment, with some restrictions and caveats since such accesses involve serialization. However, this does not constitute a fundamental gain in expressiveness.
In the end, ToExpr resembles very much a serialization framework
Instead, both Scala 2 and Scala 3 (and their respective ecosystems) largely ignore these out-of-the-box solutions, and only provide default methods for primitive types (Liftable in Scala 2, ToExpr in Scala 3). In addition, existing macro-based libraries rely heavily on manually defined quasiquotes/quotes for this trivial task, making source code much longer and harder to maintain, while not making anything faster (JVM object serialisation is a highly optimised language component).
What's the cause of this status quo? How do we improve it?

Related

HIS-Metric "calling"

I do not understand the reason for this metric/rule:
A function should not be called from more than 5 different functions.
All calls within the same function are counted as 1. The rule is
limited to translation unit scope.
It appears to me completely counter-intuitive, because it contradicts code reuse and the approach of splitting code into often-used functions instead of duplicating code.
Can someone explain the rationale?
The first thing to say is that Metric-based quality approaches are by their nature a little subjective and approximate. There are no absolutes in following a metric approach to delivering good quality code.
There are two factors to consider in software complexity. One is the internal complexity, expressed by decision complexity within each function (best exemplified by the Cyclomatic Complexity measure) and dependency complexity between functions within the container (Translation Unit or Class). The other is interface complexity, measuring the level of dependency, including cyclic ones, between collaborating and hierarchical components or classes. In the C/C++ world, this is across multiple TUs. In Structure101 terms, the internal form of complexity is called “Fat” and the external form called “Tangles”.
Back to your question, this Hersteller Initiative Software ‘CALLING’ metric is targeting internal complexity (Fat). Their argument appears to be that if you have more than 5 points of reference to a single function, there may be too much implementation logic in that C++ class or C implementation file, and it is therefore perhaps time to break it into separate modules or components. It seems like a peculiarly stunted view of software design and structure, and the list of exceptions may be as long as the list of areas where such a judgement might apply.

working around type erasure -- recommended way?

After Scala-2.10 the situation has changed, since there is now a dedicated reflection system.
What is the recommended, best-practice, standard way the community has settled down on in order to amend the deficiencies created by type erasure?
The situation
We all know that the underlying runtime system (JVM / bytecode) is lacking the ability to fully represent parametrised types in a persistent way. This means that the Scala type system can express elaborate type relationships, which lack an unambiguous representation in plain JVM byte code.
Thus, when creating a concrete data instance, the context of the creation contains specific knowledge about the fine points of the embedded data. As long as the creation context is connected to the usage context statically, i.e. as long as both are connected directly within a single compilation process, everything is fine, since we stay in the "Scala realm", where any specific type knowledge can be passed within the compiler.
But as soon as our data instance (object instance) passes a zone where only JVM bytecode properties are guaranteed, this link is lost. This might happen e.g. when
sending the data element as message to another Actor
passing it through subsystems written in other JVM languages (e.g. an OR-Mapper and RDBMS storage)
feeding the data through JVM Serialisation and any marshalling techniques built on top
and even just within Scala, passing through any function signature which discards specific type parameter information and retains only some type bound (e.g. Seq[AnyRef]) or an existential type (Seq[_])
Thus we need a way to marshal the additional specific type information.
An Example
Let's assume we use a type Document[Format]. Documents are sent and retrieved through a family of external service APIs, which mostly talk in JSON format (and are typically not confined to usage from Java alone). Obviously, for some specific kinds of Format, we can write type classes to hold the knowledge how to parse that JSON and convert it into explicit Scala types. But clearly there is no hope for one coherent type hierarchy to cover any kind of Document[Format] (beyond the mere fact that it is a formatted document and can be converted). It figures that we can handle all the generic concerns elegantly (distributing load, handling timeouts and availability of some API / service, keeping a persistent record of data). But for any actual "business" functionality, we need to switch over to specific types eventually.
The Quest
Since the JVM bytecode can not represent the type information we need, without any doubt we need to allocate some additional metadata field within our Document[Format] to represent the information "this document has Format XYZ". So, by looking at that piece of metadata, we can regain the fully typed context later on.
My question is about the preferred, most adequate, most elegant, most idiomatic way of solving this problem. That is, in current Scala (>= Scala-2.10).
how to represent the additional type information (i.e. the Format in the example above)? Storing Format.class in the document object? or using a type tag? or would you rather recommend a symbolic representation, e.g. 'Format or "my.package.Format"
or would you rather recommend to store the full type information, e.g. Document[Format], and which representation is recommended?
what is the most concise, most clear, most clean, most readable and self-explanatory solution in code to re-establish the full type context? Using some kind of pattern match? Using some implicit? Using a type class or view bound?
What have people found out to work well in this situation?
Scala documentation: http://docs.scala-lang.org/overviews/reflection/typetags-manifests.html
From the article:
Like scala.reflect.Manifest, TypeTags can be thought of as objects
which carry along all type information available at compile time, to
runtime. For example, TypeTag[T] encapsulates the runtime type
representation of some compile-time type T. Note however, that
TypeTags should be considered to be a richer replacement of the
pre-2.10 notion of a Manifest, that are additionally fully integrated
with Scala reflection.
You should also look at Manifests.
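One way to store such type evidence inside the document itself is a sketch like the following, using scala.reflect.ClassTag (a lighter-weight cousin of TypeTag that survives erasure and works in both Scala 2 and 3). The `Json`/`Xml` formats and the `describe` helper are hypothetical, introduced only to illustrate the pattern from the question:

```scala
import scala.reflect.ClassTag

// Hypothetical formats for illustration
sealed trait Format
final class Json extends Format
final class Xml  extends Format

// Store the erased Format evidence inside the document itself
case class Document[F <: Format](content: String)(implicit val tag: ClassTag[F])

object Docs {
  // Recover the typed context later by inspecting the stored ClassTag
  def describe(doc: Document[_]): String =
    doc.tag.runtimeClass.getSimpleName match {
      case "Json" => "a JSON document"
      case "Xml"  => "an XML document"
      case other  => s"an unknown format: $other"
    }
}
```

A TypeTag can be stored the same way when the full (possibly higher-kinded or nested) type is needed; ClassTag only preserves the runtime class.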

How is Scala suitable for Big Scalable Application

I am taking the Functional Programming Principles in Scala course on Coursera.
I fail to understand how, with immutability, so many functions, and so much dependence on recursion, Scala is really suitable for real-world applications.
I mean, coming from imperative languages, I see a risk of stack overflows or garbage collection kicking in, and with multiple copies of everything, a risk of running out of memory.
What am I missing here?
Stack overflow: it's possible to make your recursive function tail recursive. Add @tailrec from scala.annotation.tailrec to make sure your function is 100% tail recursive. This is basically a loop.
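A minimal sketch of the annotation in action (the factorial function here is just a stand-in example): the recursive call is in tail position, so the compiler rewrites it into a loop and no stack frames accumulate.

```scala
import scala.annotation.tailrec

object Factorial {
  // @tailrec makes the compiler FAIL the build if the call below
  // ever stops being a tail call, guaranteeing constant stack usage.
  @tailrec
  def loop(n: Long, acc: Long = 1L): Long =
    if (n <= 1) acc else loop(n - 1, acc * n)
}
```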
Most importantly, recursive solutions are only one of many patterns available. See "Effective Java" for why mutability is bad. Immutable data is much better suited to large applications: no need to synchronize access, clients can't mess with data internals, etc. Immutable structures are very efficient in many cases. If you add an element to the head of a list with elem :: list, all data is shared between the two lists - awesome! Only a new head cell is created and pointed at the old list. Imagine having to create a deep clone of the list every time a client asks for one.
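That sharing claim can be checked directly; a tiny sketch:

```scala
// Structural sharing: prepending does not copy the old list,
// the new list simply points at it.
object SharingDemo {
  val list   = List(2, 3)
  val bigger = 1 :: list           // only one new cons cell is allocated
  val shared = bigger.tail eq list // reference-equal: the tail IS the old list
}
```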
Expressions in Scala are more succinct and can be lazy - you create filter and map pipelines and they are applied as needed. You can do the same in Java, but the ceremony takes forever, so devs usually just create multiple temporary collections along the way.
Martin Odersky defines mutability as a dependence on time/history. That's very interesting because you can use var inside of a function as long as no other code can be affected in any way, i.e. results are always the same.
Look at Option[T] and compare it to null. Use them in for comprehensions. Exceptions become really exceptional, and Option, Try, Box, Either communicate failures in a very nice way.
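A small sketch of Option in a for comprehension (the `parse`/`sum` helpers are made up for illustration): a failure anywhere in the chain short-circuits without null checks or exceptions.

```scala
object OptionDemo {
  // toIntOption (Scala 2.13+) returns None instead of throwing
  def parse(s: String): Option[Int] = s.toIntOption

  // If either parse fails, the whole comprehension yields None
  def sum(a: String, b: String): Option[Int] =
    for {
      x <- parse(a)
      y <- parse(b)
    } yield x + y
}
```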
Scala allows to write more modular and generic code with less effort compared to Java.
Find a good piece of Scala code and try to see how you would do it in Java - it will be self evident.
Real-world applications are getting more event-driven, which involves passing data across different processes or systems and calls for immutable data structures.
In most cases we are either manipulating data or waiting on a resource.
In that case it's easy to hook in a callback with Actors.
Take a look at
http://pavelfatin.com/scala-for-project-euler/
which gives you some examples of using functions like map, filter, etc. Functions like these are used routinely in Ruby applications.
The combination of immutability and recursion avoids a lot of stack overflow problems. This comes in handy when dealing with event-driven applications.
akka.io is a classic example, and it is built very concisely in Scala.

hooks versus middleware in slim 2.0

Can anyone explain if there are any significant advantages or disadvantages when choosing to implement features such as authentication or caching etc using hooks as opposed to using middleware?
For instance - I can implement a translation feature by obtaining the request object through custom middleware and setting an app language variable that can be used to load the correct translation file when the app executes. Or I can add a hook before the routing and read the request variable and then load the correct file during the app execution.
Is there any obvious reason I am missing that makes one choice better than the other?
Super TL/DR; (The very short answer)
Use middleware when first starting some aspect of your application, i.e. routers, the boot process, during login confirmation, and use hooks everywhere else, i.e. in components or in microservices.
TL/DR; (The short answer)
Middleware is used when the order of execution matters. Because of this, middleware is often added to the execution stack in various aspects of code (middleware is often added during boot, while adding a logger, auth, etc.). In most implementations, each middleware function subsequently decides whether execution continues or not.
However, using middleware when order of execution does not matter tends to lead to bugs: a middleware that gets added fails to continue execution by mistake, or the intended order is shuffled, or someone simply forgets where or why a middleware was added, because it can be added almost anywhere. These bugs can be difficult to track down.
Hooks are generally not aware of the execution order; each hooked function is simply executed, and that is all that is guaranteed (i.e. adding a hook after another hook does not guarantee that the 2nd hook is always executed second, only that it will be executed). The choice to perform its task is left up to the function itself (calling out to state to halt execution). Most people feel this is much simpler and has fewer moving parts, so it statistically yields fewer bugs. However, to detect whether it should run or not, it can be important to include additional state in hooks, so that the hook does not reach out into the app and couple itself with things it is not inherently concerned with (this can take discipline to reason about well, but is usually simpler). Also, because of their simplicity, hooks tend to be added at certain named points of code, yielding fewer areas where hooks can exist (often a single place).
Generally, hooks are easier to reason about and to store, because their order is neither guaranteed nor thought about. Because hooks can halt themselves, the two are computationally equivalent, making middleware only a coding style or shorthand for common issues.
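The contrast can be sketched in a few lines. This is a hypothetical model (written in Scala for brevity, not Slim/PHP): middleware forms an ordered chain in which each function decides whether to call `next`, while hooks are all invoked and each decides internally whether to act.

```scala
object Pipeline {
  type Request    = Map[String, String]
  type Middleware = (Request, Request => String) => String

  // Middleware: order matters; each one decides whether to call `next`.
  def runMiddleware(stack: List[Middleware], req: Request): String =
    stack match {
      case Nil        => "handler" // end of chain: run the real handler
      case mw :: rest => mw(req, r => runMiddleware(rest, r))
    }

  // Hooks: all are invoked; each decides internally whether to act.
  def runHooks(hooks: List[Request => Unit], req: Request): Unit =
    hooks.foreach(h => h(req))
}

object Demo {
  // A hypothetical auth middleware that halts the chain on failure
  val auth: Pipeline.Middleware =
    (req, next) => if (req.contains("user")) next(req) else "401"
}
```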
Deep dive
Middleware is generally thought of today by architects as a poor choice. Middleware can lead to nightmares and the added effort in debugging is rarely outweighed by any shorthand achieved.
Middleware and Hooks (along with Mixins, Layered-config, Policy, Aspects and more) are all part of the "strategy" type of design pattern.
Strategy patterns, because they are invoked whenever code branching is involved, are probably one of if not the most often used software design patterns.
Knowledge and use of strategy patterns are probably the easiest way to detect the skill level of a developer.
A strategy pattern is used whenever you need to apply "if...then" type of logic (optional execution/branching).
The more computational thought experiments that are made on a piece of software, the more branches can mentally be reduced, and subsequently refactored away. This is essentially "aspect algebra"; constructing the "bones" of the issue, or thinking through what is happening over and over, reducing the procedure to its fundamental concepts/first principles. When refactoring, these thought experiments are where an architect spends the most time; finding common aspects and reducing unnecessary complexity.
At the destination of complexity reduction is emergence (in systems theory vernacular, and specifically with software, applying configuration in special layers instead of writing software in the first place) and monads.
Monads tend to abstract away what is being done to a level that can lead to increased code execution time if a developer is not careful.
Both Monads and Emergence tend to abstract the problem away so that the parts can be universally applied using fundamental building blocks. Using Monads (for the small) and Emergence (for the large), any piece of complex software can be theoretically constructed from the least amount of parts possible.
After all, in refactoring: "the easiest code to maintain is code that no longer exists."
Functors and mapping functions
A great way to continually reduce complexity is applying functors and mapping functions. Functors are also usually the fastest possible way to implement a branch and let the compiler see into the problem deeply so it can optimize things in the best way possible. They are also extremely easy to reason with and maintain, so there is rarely harm in leaving your work for the day and committing your changes with a partially refactored application.
Functors get their name from mathematics (specifically category theory, in which they refer to mappings between categories). However, in computation, functors are generally just objects that map the problem space in one way or another.
There is great debate over what is or is not a functor in computer science, but in keeping with the definition, you only need to be concerned with the act of mapping out your problem, and using the "functor" as a temporary thought scaffold that allows you to abstract the issue away until it becomes configuration or a factor of implementation instead of code.
As far as I can say, middleware is perfect for routing work, and hooks are best for doing anything application-wide. For your case, I think it would be better to use hooks than middleware.

Design strategy advice for defining machine system functionality

This question relates to project design. The project takes an electrical system and defines its function programmatically. Now that I'm knee-deep in defining the system, I'm incorporating a significant amount of interaction which causes the system to configure itself appropriately. Example: the system opens and closes electrical contactors when certain events occur. Because this system is on an airplane, it relies on air/ground logic and thus incorporates two different behaviors depending on where it is.
I give all of this explanation to demonstrate the level of complexity that this application contains. As I have continued in my design, I have employed if/else constructs as a means of extrapolating the proper configurations of this electrical system. However, the deeper I get into the coding, the more if/else constructs are required. I feel that I have reached a point where I am inefficiently programming this system.
For those who have tackled projects like this before, I ask: Am I treading a well-known path (when it comes to defining EVERY possible scenario that could occur) and should continue to persevere, or can I employ some other strategy to accomplish the task of defining a real-world system's behavior?
At this point, I have little to no experience using delegates, but I wonder if I could utilize some observers or other "cocoa-ey" goodness for checking scenarios in lieu of endless if/else blocks.
Since you are trying to model a real-world system, I would suggest creating a concrete object-oriented design that well defines the is-a and has-a relationships, and applying good old-fashioned object-oriented design to break the real-world system into a functional decomposition.
I would suggest you look into defining protocols that handle the generic case, and using them on specific cases.
For example, you can have many types of events adhering to an ElectricalEvent protocol and depending on the type you can better decide how an ElectricalContactor discriminates on a GeneralElectricEvent versus a SpecializedElectricEvent using the isKindOfClass selector.
If you can define all the states in advance, you're best off implementing this as a finite state machine. This allows you to define the state-dependent logic clearly in one central place.
There are some implementations that you could look into:
SCM allows you to generate state machine code for Objective-C
OFC implements them as DFSM
Of course you can also roll your own customized implementation if that suits you better.
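A rolled-your-own version can be very small. Here is a hypothetical sketch (written in Scala rather than Objective-C, with made-up states and events loosely modelled on the contactor example) showing how the entire state-dependent logic collapses into one central transition table instead of scattered if/else blocks:

```scala
object ContactorFsm {
  sealed trait State
  case object Open   extends State
  case object Closed extends State

  sealed trait Event
  case object OnGround extends Event
  case object InAir    extends Event

  // All state-dependent logic lives in one central transition table;
  // unhandled (state, event) pairs leave the state unchanged.
  def transition(state: State, event: Event): State =
    (state, event) match {
      case (Open, OnGround) => Closed
      case (Closed, InAir)  => Open
      case (s, _)           => s
    }
}
```

Adding a behavior then means adding one row to the table, not threading a new condition through nested branches.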