Scala streaming library differences (Reactive Streams/Iteratee/RxScala/Scalaz...)


I'm following the Functional Reactive Programming in Scala course on Coursera and we deal with RxScala Observables (based on RxJava).
As far as I know, Play's Iteratee library looks a bit like RxScala Observables: Observables are a bit like Enumerators, and Observers are a bit like Iteratees.
There's also the Scalaz Stream library, and maybe some others?
So I'd like to know the main differences between all these libraries.
In which case one could be better than another?
PS: I wonder why the Play Iteratees library was not chosen by Martin Odersky for his course, since Play is in the Typesafe stack. Does it mean Martin prefers RxScala over Play Iteratees?
Edit: the Reactive Streams initiative has just been announced, as an attempt to standardize a common ground for achieving statically typed, high-performance, low-latency, asynchronous streams of data with built-in non-blocking back pressure.

PS: I wonder why the Play Iteratees library was not chosen by Martin
Odersky for his course, since Play is in the Typesafe stack. Does it
mean Martin prefers RxScala over Play Iteratees?
I'll answer this. The decision of which streaming APIs to push and teach was not made by Martin alone, but by Typesafe as a whole. I don't know what Martin personally prefers (though I have heard him say iteratees are just too hard for newcomers), but we at Typesafe think that Iteratees have too steep a learning curve to teach to newcomers to asynchronous IO.
At the end of the day, the choice of streaming library really comes down to your use case. Play's iteratees library handles practically every streaming use case in existence, but at the cost of an API that is very difficult to learn (even seasoned Haskell developers often struggle with iteratees), and also some loss in performance. Other APIs handle fewer use cases: Rx, for example, doesn't (currently) handle back pressure, and very few of the other APIs are suitable for simple streamed parsing. But streamed parsing is actually a pretty rare use case for end users; in most cases it suffices to simply buffer and then parse. So Typesafe has chosen APIs that are easy to learn and meet the majority of the most common use cases.

Iteratees and Scalaz Stream aren't really that similar to RxJava. The crucial difference is that they are concerned with resource safety (that is, closing files, sockets, etc. once they aren't needed anymore). That requires feedback (Iteratees can tell Enumerators they are done, but Observers don't tell Observables anything) and makes them significantly more complex.
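To make the feedback point concrete, here is a minimal sketch in Scala. The names `Step`, `Done`, `Cont`, `take` and `enumerate` are made up for illustration and are far simpler than the real Play or Scalaz Stream types. The idea it tries to show: the consumer can answer "I'm done", so the producer stops pulling and releases its resource right away.

```scala
object IterateeSketch {
  // Hypothetical, simplified step type (not Play's or Scalaz Stream's API).
  sealed trait Step[E, A]
  // The consumer has its result and wants no more input.
  final case class Done[E, A](result: A) extends Step[E, A]
  // The consumer wants more input; `finish` yields a result if input runs out.
  final case class Cont[E, A](next: E => Step[E, A], finish: () => A) extends Step[E, A]

  // A consumer that collects the first n elements and then reports Done.
  def take[E](n: Int, acc: Vector[E] = Vector.empty[E]): Step[E, Vector[E]] =
    if (n <= 0) Done[E, Vector[E]](acc)
    else Cont[E, Vector[E]]((e: E) => take(n - 1, acc :+ e), () => acc)

  // A producer that feeds the consumer, stops as soon as it reports Done,
  // and always runs `release` (think: closing a file or socket).
  // Returns the result and the number of elements actually pulled.
  def enumerate[E, A](source: Iterator[E], consumer: Step[E, A])(release: () => Unit): (A, Int) = {
    var step = consumer
    var pulled = 0
    try {
      var running = true
      while (running) {
        step match {
          case Done(_) => running = false
          case Cont(next, _) =>
            if (source.hasNext) { step = next(source.next()); pulled += 1 }
            else running = false
        }
      }
      step match {
        case Done(result)    => (result, pulled)
        case Cont(_, finish) => (finish(), pulled)
      }
    } finally release()
  }
}
```

Feeding an infinite source such as `Iterator.from(1)` into `take(3)` terminates after three elements, because the producer sees `Done`. A plain Observer with only `onNext` has no way to say that to its Observable.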

Related

Is there a difference between Netflix's RxJava Observables and Guava's ListenableFutures?

I'm using plain old Java 1.6, and am interested in both these libraries.
Having read the documentation, I'm not sure of the differences, if any. Can anyone explain it, or point me at some relevant info? Thanks in advance!

RxJava does a lot more than ListenableFutures. I'm not familiar with ListenableFutures, but from the docs they seem to be simply Futures with callbacks and a few simple methods to compose them. RxJava (like the original Reactive Extensions for .NET, which are a huge inspiration for RxJava) also models sequences of values over time: data in motion, basically anything from a stream of mouse moves to a stream of network packets or database results. It also provides various scheduling strategies and many combinators to compose the streams. A good introduction to RxJava, including a comparison with futures, is on the wiki page of the project. You can also have a look at Intro to Rx for an introduction to the general concept.
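A rough sketch of that distinction, written in Scala rather than Java to match the rest of this page. The `Observer` and `Observable` types below are hand-rolled for illustration and are not RxJava's API (no schedulers, no error channel, no unsubscription). The future delivers exactly one value, while the observable pushes many values through the same combinators.

```scala
object FutureVsStream {
  import scala.concurrent.{Await, Future}
  import scala.concurrent.ExecutionContext.Implicits.global
  import scala.concurrent.duration._

  // A future: one value (or one failure), delivered once.
  def oneValue(): Int = Await.result(Future(21).map(_ * 2), 5.seconds)

  // Hypothetical minimal observer and observable (not RxJava's types).
  trait Observer[A] {
    def onNext(value: A): Unit
    def onCompleted(): Unit
  }

  final class Observable[A](onSubscribe: Observer[A] => Unit) {
    def subscribe(observer: Observer[A]): Unit = onSubscribe(observer)

    // Transform every value that flows past.
    def map[B](f: A => B): Observable[B] =
      new Observable[B](down => subscribe(new Observer[A] {
        def onNext(value: A): Unit = down.onNext(f(value))
        def onCompleted(): Unit = down.onCompleted()
      }))

    // Let only matching values through.
    def filter(p: A => Boolean): Observable[A] =
      new Observable[A](down => subscribe(new Observer[A] {
        def onNext(value: A): Unit = if (p(value)) down.onNext(value)
        def onCompleted(): Unit = down.onCompleted()
      }))
  }

  // Push each element of a sequence, then signal completion.
  def from[A](values: Seq[A]): Observable[A] =
    new Observable[A](observer => {
      values.foreach(v => observer.onNext(v))
      observer.onCompleted()
    })

  // Collect everything a (synchronous) observable emits.
  def toList[A](source: Observable[A]): List[A] = {
    val seen = List.newBuilder[A]
    source.subscribe(new Observer[A] {
      def onNext(value: A): Unit = seen += value
      def onCompleted(): Unit = ()
    })
    seen.result()
  }
}
```

For example, `toList(from(Seq(1, 2, 3, 4)).filter(_ % 2 == 0).map(_ * 10))` collects the even elements multiplied by ten, whereas `oneValue()` can only ever produce a single result.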

Why are Scala actors deprecated in 2.10?

I was just comparing different Scala actor implementations and now I'm wondering what the motivation could have been to deprecate the existing Scala actor implementation in 2.10 and replace the default actor with the Akka implementation. Neither the migration guide nor the first announcement gives any explanation.
According to the comparison the two solutions were different enough that keeping both would have been a benefit. Thus, I'm wondering whether there were any major problems with the existing implementation that caused this decision? In other words, was it a technical or a political decision?

I can only give you a guess as an answer:
Akka provides a stable and powerful library to work with Actors, along with lots of features that deal with high concurrency (futures, agents, transactional actors, STM, FSM, non-blocking I/O, ...).
It also implements actors in a safer way than Scala's, in that client code only has access to a generic ActorRef. This makes it impossible to interact with actors other than through message-passing.
[edited: As Roland pointed out, this also enables additional features like fault-tolerance through a supervision hierarchy and location transparency: the ability to deploy the actor locally or remotely with no change needed on the client code.
The overall design more closely resembles the original one in Erlang.]
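A small sketch of why handing out only a reference is safer. This is not Akka's API: `Ref`, `CounterMsg` and `spawnCounter` are hypothetical names, and a single-threaded executor stands in for the actor's mailbox. The point is that client code receives a handle it can only send messages to, so the counter's state is unreachable from outside.

```scala
object ActorRefSketch {
  import java.util.concurrent.{Executors, TimeUnit}

  // Hypothetical handle: the only thing client code ever gets to see.
  trait Ref[M] { def !(msg: M): Unit }

  sealed trait CounterMsg
  case object Increment extends CounterMsg
  final case class Report(reply: Int => Unit) extends CounterMsg

  // Creates a counter "actor" and returns its handle plus a stop function.
  def spawnCounter(): (Ref[CounterMsg], () => Unit) = {
    val mailbox = Executors.newSingleThreadExecutor()
    var count = 0 // private state, only ever touched by the mailbox thread

    val ref = new Ref[CounterMsg] {
      def !(msg: CounterMsg): Unit = mailbox.execute(new Runnable {
        def run(): Unit = msg match {
          case Increment     => count += 1
          case Report(reply) => reply(count)
        }
      })
    }

    // Stop accepting messages and wait for the queued ones to be processed.
    val stop = () => {
      mailbox.shutdown()
      mailbox.awaitTermination(5, TimeUnit.SECONDS)
      ()
    }
    (ref, stop)
  }
}
```

Because callers hold a `Ref[CounterMsg]` and nothing else, the same handle could in principle point at a local or a remote counter without the client code changing, which is the location transparency mentioned above.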
Many of the core features were duplicated between Scala and Akka actors, so a unification seems the most sensible choice (given that the development teams of both libraries are now part of the same company, too: Typesafe).
The main gain is avoiding duplication of the same core functionality, which would only create confusion and compatibility issues.
Given that a choice is due, it only remains to decide which would be the standard implementation.
It's evident to me that Akka has more to offer in this respect, being a full-blown framework with many enterprise-level features already included and more to come in the near future.
I can't think of a specific case where scala.actors can accomplish something Akka can't.
P.S. Similar reasoning led to the unification of the standard future/promise implementation in 2.10.
The whole Scala language and community stand to gain from a simplified interface to base language features, instead of a fragmented scene made of different frameworks, each having its own syntax and model to learn.
The same can't be said for other, more high-level aspects, like web-frameworks, where the developer gains from a richer panorama of available solutions.

Socket.io Scala client

I'm looking for a socket.io client for Scala. I'm well aware of this, but I cringe at the idea of using it in Scala as it wouldn't feel quite natural, nor would it allow for an idiomatic implementation. Do any of you, thus, have a suggestion as to where I could find a Scala client?
If so, just the lines for SBT and a link to the doc will suffice as an answer ;)

I'm afraid I don't know any already implemented libraries or apparent solutions for Scala. But I'll present two very simple approaches that should be very easy to use if you have the time to DIY :-)
But of course it really depends on what you want. As you can probably imagine, a plain WebSocket implementation on top of Java's standard library can be quite efficient if you need to process simple requests. I found one at scala-lang.org implementing a server calculating random numbers. If it is of interest, there's also something brewing in the nightly build which might reveal some handy tricks.
If you want to go for simplicity and for pure Scala in all its might, the Actors (in particular a RemoteActor) are immensely powerful. It naturally requires Scala on both ends, but it gives you a messaging system almost instantly. This is a pretty good start guide if you aren't already familiar with them.
Anyway. If no good library surfaces I hope this helped. Good luck.

Clojure futures in context of Scala's concurrency models

After being exposed to Scala's Actors and Clojure's Futures, I feel like both languages have excellent support for multi-core data processing.
However, I still have not been able to determine the real engineering differences between the concurrency features and the pros/cons of the two models. Are these languages complementary, or opposed in terms of their treatment of concurrent process abstractions?
Secondarily, regarding big data issues, it's not clear whether the Scala community continues to support Hadoop explicitly (whereas the Clojure community clearly does). How do Scala developers interface with the Hadoop ecosystem?

Some problems are solved well by agents/actors and some are not. This distinction is not really about languages so much as about how specific problems fit within general classes of solutions. This is a (very short) comparison of Actors/agents vs. References to try to clarify the point that the tool must fit the concurrency problem.
Actors excel in distributed situations where no data needs to be concurrently modified. If your problem can be expressed purely by passing messages then actors will do the trick. Actors work poorly where they need to modify several related data structures at the same time. The canonical example of this is moving money between bank accounts.
Clojure's refs are a great solution to the problem of many threads needing to modify the same thing at the same time. They excel on shared-memory multi-processor systems like today's PCs and servers. In addition to the bank account example, Rich Hickey (the author of Clojure) uses the example of a baseball game to explain why this is important. If you wanted to use actors to represent a baseball game then before you moved the ball, all the fans would have to send it a message asking it where it was... and if they wanted to watch a player catching the ball things get even more complex.
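The bank account case can be sketched on the JVM without any actor. This is not Clojure's STM: it is a deliberately simpler stand-in (closer in spirit to a single Clojure atom holding all accounts than to several refs in a transaction), and `Bank` and `transfer` are hypothetical names. Both balances live in one immutable map, and a compare-and-set retry loop swaps the whole map, so no thread can ever observe money that has left one account but not yet arrived in the other.

```scala
object TransferSketch {
  import java.util.concurrent.atomic.AtomicReference

  // Hypothetical bank: all balances live in one immutable snapshot.
  final class Bank(initial: Map[String, Long]) {
    private val state = new AtomicReference(initial)

    // Moves `amount` atomically; returns false if funds are insufficient.
    @annotation.tailrec
    def transfer(from: String, to: String, amount: Long): Boolean = {
      require(amount >= 0, "amount must not be negative")
      val before = state.get()
      val available = before.getOrElse(from, 0L)
      if (available < amount) false
      else {
        val debited = before.updated(from, available - amount)
        val after = debited.updated(to, debited.getOrElse(to, 0L) + amount)
        // Publish both changes at once, or retry if another thread won the race.
        if (state.compareAndSet(before, after)) true
        else transfer(from, to, amount)
      }
    }

    // A consistent view of every balance at one instant.
    def snapshot: Map[String, Long] = state.get()
  }
}
```

The trade-off is that every transfer contends on the same reference. Clojure's refs avoid that by coordinating only the refs a transaction actually touches.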
Clojure has Cascalog, which makes writing Hadoop jobs look a lot like writing Clojure.

Actors provide a way of handling the potential interleaving and synchronization control that inevitably comes with getting multiple threads to work together. Each actor has a queue of messages that it processes in order, one at a time, which avoids the need for explicit locks. In this context a Future provides a way of waiting for a response from an actor.
As far as Hadoop is concerned, Twitter just released a library specifically for Hadoop called Scalding, but as long as a library is written for the JVM, it should work with either language.
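A minimal sketch of those two ideas together, using only the standard library. `Accumulator`, `tell` and `ask` are hypothetical names rather than the scala.actors or Akka API. A single-threaded executor acts as the mailbox, so messages are handled strictly one at a time and in order, and `ask` hands back a Future that completes once the actor reaches that request.

```scala
object MailboxAskSketch {
  import java.util.concurrent.Executors
  import scala.concurrent.{Future, Promise}

  // Hypothetical mini-actor that records the messages it receives.
  final class Accumulator {
    // The mailbox: one thread, so messages are processed sequentially.
    private val mailbox = Executors.newSingleThreadExecutor()
    // No lock needed: only the mailbox thread ever touches this.
    private var log = Vector.empty[String]

    // Fire-and-forget message.
    def tell(entry: String): Unit = mailbox.execute(new Runnable {
      def run(): Unit = log = log :+ entry
    })

    // Request/response: the reply arrives later through a Future.
    def ask(): Future[Vector[String]] = {
      val reply = Promise[Vector[String]]()
      mailbox.execute(new Runnable {
        def run(): Unit = { reply.success(log); () }
      })
      reply.future
    }

    def stop(): Unit = mailbox.shutdown()
  }
}
```

Since the `ask` request is queued behind all earlier `tell` messages, the Future it returns reflects every message sent before it from the same thread.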

What are the differences between Scala middleware choices?

Note: Unfortunately this question was closed, but I'm trying to maintain it in case someone else comes along with the same question.
I've been looking for a good solution to developing a service in Scala that will sit between mobile devices and existing web services.
The current list of viable options is:
Finagle
Spray
BlueEyes
Akka
Play2 Mini
Unfiltered
Lift
Smoke
Scalatra
There are probably more options out there. How does one decide which one to use? What are the traits (excuse the pun ;-) of a good Scala middleware choice? On the one hand, I would like to go for Akka, because it is part of the Typesafe Scala stack, but on the other, something like Finagle has a rich set of libraries and makes plumbing so easy. Spray looks nice and simple to use.
Any advice, insights or experience would be greatly appreciated. I'm sure someone out there must have some experience with some of these that they won't mind sharing.
UPDATE:
I would love for this question to be reopened. A good answer to this question will help new Scalateers to avoid related pitfalls.
UPDATE 2:
These are my own experiences since asking this question:
Finagle - I used Finagle for a project and it's rock solid.
Spray - In my latest project I'm using Spray and I'm extremely happy. The latest releases are built on Akka 2 and you can run it directly with the Spray-can library which removes the need for a web server. Spray is a set of libraries, rather than a framework and is very modular. The Learn about Spray: REST on Akka video gives a great overview, and this blog at Cakesolutions shows a really nice development approach and architecture.
UPDATE 3:
Life moves pretty fast. If you don't stop and look around once in a while, you could miss it. - Ferris Bueller
These days the choice has become simpler. In my humble opinion Spray has won the battle. It is being integrated into Akka to become the next Akka HTTP. I have been using Spray now on multiple projects and can honestly say that it's fantastic and the best-supported software I have ever encountered.
This does not answer the initial question, but at least gives some indication of why Spray seems like the best choice in most cases. It is extremely flexible, non-blocking and very stable. It has both client-side and server-side libraries and a great testkit. Also, have a look at these stats to get an idea of performance: Web Framework Benchmarks

I personally started with spray a long time ago and tried everything else there was out there for Scala. While Scala, spray, akka, shapeless, and scalaz certainly have a bit of a learning curve, once you start digging in and really learning how you are supposed to use the technologies, they make sense and I immediately saw the benefits especially for the kind of work I'm doing right now.
Personally I think nothing really stands up to spray for building servers, REST APIs, HTTP clients, and whatever else you want. What I love about spray is that it was built with akka in mind. It may have been a really early project when I first started using it, but the architecture made sense. Those guys knew what they were doing in terms of exploiting the benefits of an actor model and not having any blocking operations.
While actors might take a bit of getting used to, I do like them. They have made my systems very scalable and cheap to run because I don't need hardware as beefy as in the past. Plus, spray has the spray-routing DSL, so making a REST API is relatively simple as long as you follow the rules ... don't block. That of course means don't go and pull in Apache Commons HttpClient to make client requests from the API or actors, because you will be going back to blocking models.
So far I am very happy with spray, typesafe, and akka. Their models just naturally lend themselves to building very resilient systems that come back up on their own if anything should happen and you take a fail-fast approach. The one beef that I have with spray (and it's not spray's fault) is the damn IDE support for the routing DSL. I absolutely despise Eclipse and have always been an IDEA user. When I started using the Scala plugin, everything seemed ok. Then my routing dsl naturally evolved into way bigger beasts. Something about the way IDEA parses that code makes it shit its pants anytime it encounters anything with spray-routing or shapeless. It's to the point where it's unusable (I type 2-3 letters and have to wait 5 minutes to regain control).
So, for any spray-routing or heavy shapeless code, I fire up emacs with ensime, ensime-sbt, and scala-mode2. Now if I could only get a Cassandra library with the quality of astyanax and built using a more non-blocking architecture.

Here you can find a great list of Scala resources with a brief description of all the alternatives you listed.
From my own experience, I use Scalatra and it is tiny, simple and effective for things like URI mapping and calling web services.
