Scala pipelines - DSL for building a DAG workflow - scala

I'm curious about the current libraries for Scala and Akka that would let me elegantly build a workflow pipeline.
In my case a workflow is just a DAG of operations, so actors/Akka feel like a good fit.
My question is: what's the best approach? There are libraries like Reactive Streams that allow really elegant composition of a pipeline, but they seem very record-focused.
My use case is a flow of operations passing messages between them. Future composition is nice, but the syntax becomes unwieldy after a while. Maybe there is something better with scalaz and shapeless.
What are the approaches and tools for building a DSL for pipelines of computation steps using message passing?

While still in early development (pre-1.0 as of writing), akka-streams is worth a look: it is exactly that, a way to describe a computation graph and then run it asynchronously.

If your pipeline is very much like a chain of method calls, use a chain of method calls!
There's no point making the solution more complicated than it needs to be; if it's well-modelled by a chain of method calls, just use that. (Or functions, which you can compose.)
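To make the "functions, which you can compose" point concrete, here is a minimal sketch using only the standard library. The step names (parse, twice, render) are made up for illustration and are not from any particular library.

```scala
object SimplePipeline {
  // Hypothetical steps; each one is just an ordinary function value.
  val parse: String => Int = _.trim.toInt
  val twice: Int => Int = _ * 2
  val render: Int => String = n => s"result=$n"

  // andThen composes left to right: parse, then twice, then render.
  val pipeline: String => String = parse andThen twice andThen render
}
```

Calling SimplePipeline.pipeline(" 21 ") runs the three steps in order on the calling thread, with no actors or streams involved.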
If you need something slightly more complicated but you don't actually need any message-passing, you might want something like AsyncFP or Scala.Rx.
If you need a multi-core solution, but you have stretches that look like method calls, then put a chain of method calls inside one step. You could use Akka streams for that without having to worry so much about the ratio of overhead to useful computation.
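As a rough illustration of "a chain of method calls inside one step", here is a sketch of a small diamond-shaped DAG built from plain scala.concurrent.Future rather than Akka streams (so it is a stand-in, not the Akka streams API). The node functions load, sum, max and report are hypothetical.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object DiamondDag {
  // Each node is an ordinary function; inside it you can chain as many
  // plain method calls as you like without any message-passing overhead.
  def load(): Seq[Int] = Seq(1, 2, 3, 4)
  def sum(xs: Seq[Int]): Int = xs.map(_ * 1).sum
  def max(xs: Seq[Int]): Int = xs.max
  def report(total: Int, largest: Int): String = s"sum=$total max=$largest"

  def run(): Future[String] = {
    val source = Future(load())
    // Both branches are created before they are combined, so they can
    // run in parallel once the source has completed.
    val total = source.map(sum)
    val largest = source.map(max)
    for (t <- total; l <- largest) yield report(t, l)
  }
}
```

The edges of the DAG are the Future values; the work inside each node stays a normal chain of calls, which keeps the scheduling overhead proportional to the number of nodes rather than the number of operations.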

Related

How to call Rust functions in Flutter (Dart) via FFI, but with convenience and safety?

I know we can call Rust from Flutter/Dart via FFI. But Flutter only allows the C ABI when doing FFI, so I have to write boilerplate code by hand. In particular, I have to write unsafe Rust code, since I have to deal with lots of raw pointers :(
So, are there any approaches to do this in a safe way? We know Rust itself is very safe (thanks to its unique memory management approach), and Dart/Flutter itself is also very safe (thanks to GC). But I do not want the FFI call to be the Achilles heel that destroys the safety of my app!
There are several ways to do it.
a. JSON/Protobuf-based Approach
The first way, which I have used in production for a year, is to use JSON or Protobuf to pass all the data between Rust and Dart/Flutter. That way you do not need to write tons of boilerplate code to allocate/free a String, a list of bytes, a struct/class, etc. All you need is one single function that accepts a byte array payload and returns a byte array result. By "one" function I mean that you can put an action field in your JSON/Protobuf, so calls to what are really different Rust functions can all be multiplexed through this one thin interface.
Despite its convenience (only a bit of unsafe boilerplate), the drawback is also evident. Serialization and deserialization do not come for free: you pay for them in CPU time and memory, sometimes quite a lot. Moreover, you cannot easily pass around big objects. For example, if you have an image (at least megabytes in size), serializing it to Protobuf and then deserializing it again wastes both CPU and memory on useless copies! Even worse, since Flutter/Dart FFI does not offer a convenient way to do async FFI, you have to run it in a separate worker isolate, which means one more memory copy. You can see more here: https://github.com/dart-lang/language/issues/1862 (this is an issue that I opened).
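The shape of the single "bytes in, bytes out" entry point with an action field can be sketched independently of Rust and Dart. The following is only an illustration of the dispatch idea, written in Scala, with a toy "action:argument" text format standing in for JSON/Protobuf; the action names and the call function are invented for the example.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object SingleEntryPoint {
  // Hypothetical actions; in the real setup each would be a separate
  // function on the Rust side.
  private val handlers: Map[String, String => String] = Map(
    "upper" -> ((s: String) => s.toUpperCase),
    "reverse" -> ((s: String) => s.reverse)
  )

  // The one function exposed across the boundary: bytes in, bytes out.
  def call(payload: Array[Byte]): Array[Byte] = {
    val text = new String(payload, UTF_8)
    // Toy wire format "action:argument" (an assumption of this sketch).
    val (action, rest) = text.span(_ != ':')
    val arg = rest.drop(1)
    val result = handlers.get(action) match {
      case Some(handler) => handler(arg)
      case None => s"error: unknown action '$action'"
    }
    result.getBytes(UTF_8)
  }
}
```

Every call pays for one encode and one decode on each side, which is exactly the copying cost described above.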
b. Code generator
The second way, which I have used more recently, is to write a code generator. The code follows several common patterns, such as "allocate - fill data - call FFI - free", so it is not that hard to write a generator that does this automatically. The idea is to mimic what a human does when writing the boilerplate code by hand.
I had hoped that a code generator already existed that I could use directly, but it seemed that none did... So, go and write one yourself.
c. Use existing open-source code generator
After writing the code generator, I guessed other people might have the same problem as me, so I open-sourced it: https://github.com/fzyzcjy/flutter_rust_bridge
My code generator not only solves the problem above, but also has rich type support, allows zero-copy, and allows async programming and direct calls from the main isolate, etc. All of this can be implemented via a code generator, but would require lots of boilerplate code if done by hand.
Disclaimer: This is a Q&A-style answer to show my thoughts and what I have done about this problem, which is critical to my own app in production. I used the JSON approach from last year, and later refactored to the code generator approach. I hope it also helps other people who face the same situation!

How to make a code thread safe in scala?

I have some code in Scala in which, for various reasons, a few lines must not be executed by more than one thread at the same time.
How can I easily make it thread-safe? I know I could use the Actor model, but I find that overkill for a few lines of code.
I would use some kind of lock, but I cannot find any concrete examples on either Google or Stack Overflow.
I think that the simplest solution would be to use synchronized for critical sections (just like in Java). Here is the Scala syntax for it:
someObj.synchronized {
  // thread-safe part
}
It's easy to use, but it blocks and can easily cause deadlocks, so I encourage you to look at java.util.concurrent or Akka for solutions that are probably more complicated, but better/non-blocking.
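For a concrete (if artificial) use of synchronized, here is a small sketch: a counter whose read-modify-write is guarded by the object's monitor, plus a helper that hits it from several threads. The Counter class and the hammer helper are invented for illustration.

```scala
object SynchronizedCounter {
  final class Counter {
    private var count = 0
    // Only one thread at a time can be inside a block synchronized on `this`.
    def increment(): Unit = this.synchronized { count += 1 }
    def current: Int = this.synchronized { count }
  }

  // Starts `threads` threads that each call increment() `perThread` times,
  // waits for all of them, and returns the final count.
  def hammer(threads: Int, perThread: Int): Int = {
    val counter = new Counter
    val workers = (1 to threads).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = (1 to perThread).foreach(_ => counter.increment())
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    counter.current
  }
}
```

Without the synchronized blocks, concurrent `count += 1` updates could be lost and the final value would often come out lower than threads * perThread.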
You can use any Java concurrency construct, such as Semaphores, but I'd recommend against it, as semaphores are error prone and clunky to use. Actors are really the best way to do it here.
Creating actors is not necessarily hard. There is a short but useful tutorial on actors over at scala-lang.org: http://www.scala-lang.org/node/242
If it is really very simple you can use synchronized: http://www.ibm.com/developerworks/java/library/j-scala02049/index.html
Or you could use some of the classes from the concurrent package in the JDK: http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/package-summary.html
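As a sketch of what using the JDK concurrent classes from Scala can look like, here are two of them: an AtomicInteger for a lock-free counter and a ReentrantLock guarding a small critical section. The object and member names are hypothetical.

```scala
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.locks.ReentrantLock

object JucExamples {
  // Lock-free counter: incrementAndGet is a single atomic operation.
  val hits = new AtomicInteger(0)

  private val lock = new ReentrantLock()
  private var log = Vector.empty[String]

  // Release the lock in finally so an exception cannot leave it held.
  def append(line: String): Unit = {
    lock.lock()
    try { log = log :+ line }
    finally { lock.unlock() }
  }

  def snapshot: Vector[String] = {
    lock.lock()
    try { log }
    finally { lock.unlock() }
  }
}
```

For a single counter or flag the atomic classes are usually enough; an explicit lock is only needed when several fields have to change together.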
If you want to use actors, you should use Akka actors (they will replace Scala actors in the future), see here: http://doc.akka.io/docs/akka/2.0.1/. They also support things like FSM (Finite State Machine) and STM (Software Transactional Memory).
In general, try to use pure functions or methods with immutable data structures; that should help with thread safety.

Key-Value-Observation -- Looking for a more elegant solution to respond to value changes

I've run into a frustrating feature of KVO: all notifications are funneled through a single method (observeValueForKeyPath:....), requiring a bunch of IF statements if the object is observing numerous properties.
The ideal solution would be to pass a method as an argument to the method that establishes the observing in the first place, but it seems this isn't possible. Does a solution exist to this problem? I initially considered using the keyPath argument (addObserver:forKeyPath:options:context:) to call a method via NSSelectorFromString, but then I came across the post KVO Dispatcher pattern with Method as context and the article it linked to which offers a different solution in order to pass arguments along as well (although I haven't gotten that working yet).
I know a lot of people have come up against this issue. Has a standard way of handling it emerged?
OP asks:
Has a standard way of handling it emerged?
No, not really. There are a lot of different approaches out there. Here are some:
https://github.com/sleroux/KVO-Blocks
http://pandamonia.github.io/BlocksKit
http://www.mikeash.com/pyblog/friday-qa-2012-03-02-key-value-observing-done-right-take-2.html
https://github.com/ReactiveCocoa/ReactiveCocoa
http://blog.andymatuschak.org/post/156229939/kvo-blocks-block-callbacks-for-cocoa-observers
Seriously, there are a ton of these... Google "KVO blocks"
I can't say that any of the options I've seen seem prevalent enough to earn the title "standard way". I suspect most folks who feel motivated to conquer this issue just pick one and go with it, or write their own -- it's not as if adapting KVO to use block based callbacks is rocket science. The Method-based approach you link to doesn't seem like a step forward for simplicity. I get that you're trying to take the uncertainty of the string-based-key-path <-> method conversion out of the equation, but that kind of falls down because not all observable keys/keyPaths are methods. (If nothing else, you can observe arbitrary keys on NSMutableDictionaries and get notifications.)
It sure would be nice if Apple would release a new blocks-based KVO API, but I'm not holding my breath. But in the meantime, like I said, just pick one you like and use it or write your own and use that.

Testing with probabilistic failure of components in Akka (Scala)

I've started using Akka with Scala to develop a set of interacting components in a bus-oriented architecture. I need to test the fault-tolerance of the system, and for that I was wondering if there is any way to use a probabilistic model of failure (i.e., set some failure parameters for each Actor) within a Scala test framework. Any ideas? Any framework out there that already implements this?
I assume you know things like TestKit and have read the documentation at http://akka.io/docs/akka/1.3/scala/testing.html#akka-testkit (see also http://roestenburg.agilesquad.com/2011/02/unit-testing-akka-actors-with-testkit_12.html )
You don't need Akka in the test setup, if I understood your problem correctly. Assume that Akka itself is tested and works OK. Now you only have to test your own code. Since you didn't show any code it's hard to give advice, but I will try:
You can test your method calls in different sequences, and assert the results. I would hardcode the sequences, but you can also randomize them.
Show some code and I will clarify what I mean. I could also be wrong, if I misunderstood your question.
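If you do go the randomized route, one possible sketch (plain Scala, no Akka or test framework involved) is a wrapper that makes a call fail with a given probability, driven by a seeded random generator so that a failing run can be replayed. The names flaky and InjectedFailure are made up for this example.

```scala
import scala.util.{Failure, Random, Try}

object FlakyHarness {
  final class InjectedFailure(msg: String) extends RuntimeException(msg)

  // Wraps `op` so that each call fails with probability `p`.
  // Passing a Random built from a fixed seed makes the failure
  // sequence reproducible between test runs.
  def flaky[A, B](p: Double, rng: Random)(op: A => B): A => Try[B] =
    a =>
      if (rng.nextDouble() < p) Failure(new InjectedFailure(s"injected failure for input $a"))
      else Try(op(a))
}
```

For example, FlakyHarness.flaky[Int, Int](0.1, new Random(42))(_ + 1) gives a function that fails on roughly one call in ten; a test can then assert that the component under test recovers from those failures.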

Why use a post compiler?

I am battling to understand why a post compiler, like PostSharp, should ever be needed?
My understanding is that it just inserts code where attributed in the original code, so why doesn't the developer just do that code writing themselves?
I expect that someone will say it's easier to write since you can use attributes on methods and then not clutter them up with boilerplate code, but that can be done using DI or reflection and a touch of forethought without a post compiler. I know that since I have said reflection, the performance elephant will now enter - but I do not care about the relative performance here, when the absolute performance for most scenarios is trivial (sub-millisecond to millisecond).
Let's try to take an architectural point on the issue. Say you are an architect (everyone wants to be an architect ;)
You need to deliver the architecture to your team:
a selected set of libraries, architectural patterns, and design patterns. As a part of your design, you say: "we will implement caching using the following design pattern:"
string key = string.Format("[{0}].MyMethod({1},{2})", this, param1, param2 );
T value;
if ( !cache.TryGetValue( key, out value ) )
{
    using ( cache.Lock(key) )
    {
        if (!cache.TryGetValue( key, out value ) )
        {
            // Do the real job here and store the value into variable 'value'.
            cache.Add( key, value );
        }
    }
}
This is a correct way to do caching. Developers are going to implement this pattern thousands of times, so you write a nice Word document telling how you want the pattern to be implemented. Yeah, a Word document. Do you have a better solution? I'm afraid you don't. Classic code generators won't help. Functional programming (delegates)? It works fairly well for some aspects, but not here: you need to pass method parameters to the pattern. So what's left? Describe the pattern in natural language and trust that developers will implement it.
What will happen?
First, some junior developer will look at the code and say "Hm. Two cache lookups. Kinda useless. One is enough." (that's not a joke -- ask the DNN team about this issue). And your pattern ceases to be thread-safe.
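To see why the second lookup matters, here is a rough Scala rendering of the same double-checked pattern. It is a sketch only: it uses one global lock where the C# snippet locks per key, and the names DoubleCheckedCache and getOrCompute are invented.

```scala
import java.util.concurrent.ConcurrentHashMap

object DoubleCheckedCache {
  private val cache = new ConcurrentHashMap[String, String]()
  private val lock = new Object

  def getOrCompute(key: String)(compute: => String): String = {
    // Lookup 1: fast path, no lock taken when the value is already cached.
    val first = cache.get(key)
    if (first != null) first
    else lock.synchronized {
      // Lookup 2: another thread may have filled the entry while this
      // one was waiting for the lock. Dropping this check lets two
      // threads both run `compute` for the same key.
      val second = cache.get(key)
      if (second != null) second
      else {
        val value = compute
        cache.put(key, value)
        value
      }
    }
  }
}
```

Removing the first lookup only costs performance; removing the second one reintroduces the race, which is exactly the kind of "simplification" a code review has to catch.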
As an architect, how do you ensure that the pattern is properly applied? Unit testing? Fair enough, but you will hardly detect threading issues this way. Code review? That's maybe the solution.
Now, what if you decide to change the pattern? For instance, you detect a bug in the cache component and decide to use your own? Are you going to edit thousands of methods? It's not just refactoring: what if the new component has different semantics?
What if you decide that a method is not going to be cached any more? How difficult will it be to remove caching code?
The AOP solution (whatever the framework is) has the following advantages over plain code:
It reduces the number of lines of code.
It reduces the coupling between components, so you don't have to change many things when you decide to change the logging component (just update the aspect); this improves the capacity of your source code to cope with new requirements over time.
Because there is less code, the probability of bugs is lower for a given set of features, therefore AOP improves the quality of your code.
So if you put it all together:
Aspects reduce both development costs and maintenance costs of software.
I have a 90 min talk on this topic and you can watch it at http://vimeo.com/2116491.
Again, the architectural advantages of AOP are independent of the framework you choose. The differences between frameworks (also discussed in this video) influence principally the extent to which you can apply AOP to your code, which was not the point of this question.
Suppose you already have a class which is well-designed, well-tested etc. You want to easily add some timing on some of the methods. Yes, you could use dependency injection, create a decorator class which proxies to the original but with timing for each method - but even that class is going to be a mess of repetition...
... or you can add reflection to the mix and use a dynamic proxy of some description, which lets you write the timing code once, but requires you to get that reflection code just right - which isn't as easy as it might be, especially if generics are involved.
... or you can add an attribute to each method that you want timed, write the timing code once, and apply it as a post-compile step.
I know which seems more elegant to me - and more obvious when reading the code. It can be applied even in situations where DI isn't appropriate (and it really isn't appropriate for every single class in a system) and with no other changes elsewhere.
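For comparison, this is roughly what the dynamic-proxy option looks like on the JVM, using java.lang.reflect.Proxy from Scala (the question is about .NET, so treat this as an analogy rather than PostSharp's or .NET's API). The Greeter interface, PlainGreeter class and timed helper are hypothetical.

```scala
import java.lang.reflect.{InvocationHandler, InvocationTargetException, Method, Proxy}

object TimingProxy {
  // Hypothetical service interface, for illustration only.
  trait Greeter { def greet(name: String): String }

  final class PlainGreeter extends Greeter {
    def greet(name: String): String = s"Hello, $name"
  }

  // Wraps `target` in a proxy that reports the elapsed nanoseconds
  // of every call made through the Greeter interface.
  def timed(target: Greeter, report: (String, Long) => Unit): Greeter = {
    val handler = new InvocationHandler {
      override def invoke(proxy: AnyRef, method: Method, args: Array[AnyRef]): AnyRef = {
        val start = System.nanoTime()
        try {
          // args is null when the method takes no arguments.
          if (args == null) method.invoke(target) else method.invoke(target, args: _*)
        } catch {
          // Unwrap so callers see the original exception, not the reflective wrapper.
          case e: InvocationTargetException => throw e.getCause
        } finally {
          report(method.getName, System.nanoTime() - start)
        }
      }
    }
    Proxy
      .newProxyInstance(classOf[Greeter].getClassLoader, Array[Class[_]](classOf[Greeter]), handler)
      .asInstanceOf[Greeter]
  }
}
```

The timing code is written once, but note the details that have to be right (null argument arrays, exception unwrapping, the cast at the end) and that it only works through an interface; that is the "get that reflection code just right" cost mentioned above.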
AOP (PostSharp) is for attaching code to all sorts of points in your application, from one location, so you don't have to place it there.
You cannot achieve what PostSharp can do with Reflection.
I personally don't see a big use for it, in a production system, as most things can be done in other, better, ways (logging, etc).
You may like to review the other threads on this matter:
Anyone with Postsharp experience in production?
Other than logging, and transaction management what are some practical applications of AOP?
Aspect Oriented Programming: What do you use PostSharp for?
etc (search)
Aspects take away all the copy-and-paste code and make adding new features faster.
I hate nothing more than, for example, having to write the same piece of code over and over again. Gael has a very nice example regarding INotifyPropertyChanged on his website (www.postsharp.net).
This is exactly what AOP is for. Forget about the technical details, just implement what you are being asked for.
In the long run, I think we all should say goodbye to the way we are writing software now. It's tedious and plainly stupid to write boilerplate code and iterate manually.
The future belongs to declarative, functional style being held together by an object oriented framework - and the cross cutting concerns being handled by aspects.
I guess the only people who will not get it soon are the guys who are still paid for lines of code.