Play Iteratee for this use case - scala

I am learning Iteratee and the related APIs for a requirement to stream live tweets, using Play 2.1 and Scala 2.10. Is the following the best way to use an Iteratee that also produces the result of saving each tweet to MongoDB?
val wsStream = new Enumerator[Array[Byte]] {
  def apply[A](iteratee: Iteratee[Array[Byte], A]) = {
    WS.url("https://stream.twitter.com/1.1/statuses/filter.json?track=" + term)
      .sign(OAuthCalculator(Twitter.KEY, tokens))
      .get(_ => iteratee)
  }
}

wsStream.apply(Iteratee.foreach(bytes => saveTweetToMongo(bytes)))

Note that you can apply multiple iteratees to the same enumerator. In other words, you can create a streamingTweetIteratee and a saveTweetToMongoIteratee and apply both to the enumerator that provides tweets.
I often create a simple loggingIteratee that just funnels everything to STDOUT when I'm prototyping in the REPL, and apply both it and the iteratee I'm writing to the same enumerator.
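For illustration, here is a minimal sketch of that idea against the Play 2.1 iteratee API, using a plain in-memory enumerator as a stand-in for the live stream (loggingIteratee and saveIteratee are made-up names; saveTweetToMongo is the method from the question):

import play.api.libs.iteratee.{ Enumerator, Iteratee }
import play.api.libs.concurrent.Execution.Implicits.defaultContext

// Stand-in for the live tweet stream
val tweets = Enumerator("tweet-1".getBytes("UTF-8"), "tweet-2".getBytes("UTF-8"))

val loggingIteratee = Iteratee.foreach[Array[Byte]](bytes => println(new String(bytes, "UTF-8")))
val saveIteratee    = Iteratee.foreach[Array[Byte]](bytes => saveTweetToMongo(bytes))

// Each apply attaches an independent consumer to the enumerator
tweets(loggingIteratee)
tweets(saveIteratee)

Bear in mind that with the wsStream above, each apply triggers its own WS call; to share a single Twitter connection between several iteratees you would fan out through something like Concurrent.broadcast.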
I'm assuming you want to use WebSockets in order to stream tweets to a client? If you look at the chat demo that comes with Play, you'll get an idea of how to go about that.
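As a hedged sketch of that direction (Play 2.1 API), pushing tweets out and ignoring anything the client sends; tweetSocket is a made-up action name and wsStream is the enumerator from above:

import play.api.libs.iteratee.Iteratee
import play.api.mvc.WebSocket

def tweetSocket = WebSocket.using[Array[Byte]] { request =>
  val in = Iteratee.ignore[Array[Byte]] // drop incoming frames
  val out = wsStream                    // stream raw tweet bytes to the client
  (in, out)
}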

Related

How to resolve - method using in object WebSocket is deprecated: Use accept with an Akka streams flow instead

I am trying to migrate my application from Play 2.4 to 2.5. While migrating, I am trying to remove the deprecated functions. One such warning that I am receiving is "method using in object WebSocket is deprecated: Use accept with an Akka streams flow instead".
Below is the function that I need to change. How do I refactor this code to use Akka Streams flows?
def test = WebSocket.using[String] { request =>
  val in = Iteratee.ignore[String]
  val out = Enumerator("<response>Test</response>").andThen(Enumerator.eof)
  Logger.warn("Some warning")
  (in, out)
}
I tried going through "How to refactor this code by using akka streams" but could not get much out of it. I am new to Scala and Play :(
Any pointers or examples would be of great help.
Cheers!
You can refer to the Play documentation to implement an Akka Streams flow; the change is quite straightforward.
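As a rough sketch (not a drop-in replacement, assuming Play 2.5 with its default Akka Streams setup), the Iteratee/Enumerator pair can be replaced by a Flow built from a Sink that ignores input and a Source that emits the single response:

import akka.stream.scaladsl.{ Flow, Sink, Source }
import play.api.Logger
import play.api.mvc.WebSocket

def test = WebSocket.accept[String, String] { request =>
  Logger.warn("Some warning")
  // Ignore whatever the client sends and emit one message, then complete
  Flow.fromSinkAndSource(Sink.ignore, Source.single("<response>Test</response>"))
}

Flow.fromSinkAndSource plays the role of the old (in, out) pair: the Sink replaces the Iteratee and the Source replaces the Enumerator.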

How to use Reactive Streams for NIO binary processing?

Are there some code examples of using org.reactivestreams libraries to process large data streams using Java NIO (for high performance)? I'm aiming at distributed processing, so examples using Akka would be best, but I can figure that out.
It still seems to be the case that most (I hope not all) examples of reading files in Scala resort to Source (non-binary) or direct Java NIO (and even things like Files.readAllBytes!).
Perhaps there is an Activator template I've missed? ("Akka Streams with Scala!" is close, addressing everything I need except the binary/NIO side.)
Do not use scala.collection.immutable.Stream to consume files like this. The reason is that it performs memoization - that is, while it is lazy, it will keep the entire stream buffered (memoized) in memory!
This is definitely not what you want when you think about "stream processing a file". The reason Scala's Stream works like this is that in a functional setting it makes complete sense - you can, for example, avoid calculating Fibonacci numbers again and again thanks to it; for more details see the ScalaDoc.
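A tiny illustration of both sides of that trade-off (just a sketch, unrelated to the question's files):

// Memoization is great for something like Fibonacci: forced values are reused, not recomputed
lazy val fibs: Stream[BigInt] =
  BigInt(0) #:: BigInt(1) #:: fibs.zip(fibs.tail).map { case (a, b) => a + b }
println(fibs(50)) // fast, because fibs(0)..fibs(49) are memoized

// The same property is a liability for file streaming: holding the Stream in a val
// keeps every forced chunk reachable, so memory grows with the amount of data read
val chunks = Stream.continually(Array.fill(1024)(0.toByte))
chunks.take(3).foreach(c => println(c.length)) // fine for 3 chunks, dangerous for a whole large file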
Akka Streams provides a Reactive Streams implementation and ships a FileIO utility that you can use here (it will properly back-pressure and pull data out of the file only when it is needed and the rest of the stream is ready to consume it):
import java.nio.file.Paths

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{ FileIO, Sink }

object ExampleApp extends App {
  implicit val sys = ActorSystem()
  implicit val mat = ActorMaterializer()

  FileIO.fromPath(Paths.get("/example/file.txt"))
    .map(chunk => { print(chunk.utf8String); chunk }) // each element is an akka.util.ByteString
    .runWith(Sink.onComplete { _ =>
      // FileIO closes the file itself; we only need to shut down the actor system
      sys.terminate()
    })
}
Here are more docs about working with IO in Akka Streams.
Note that this is for the current (as of writing) version of Akka, i.e. the 2.5.x series.
Hope this helps!
We actually use Akka Streams to process binary files. It was a little tricky to get things going, as there wasn't any documentation around this, but this is what we came up with:
val binFile = new File(filePath)
val inputStream = new BufferedInputStream(new FileInputStream(binFile))
val binStream = Stream.continually(inputStream.read).takeWhile(-1 != _).map(_.toByte)
val binSource = Source(binStream)
Once you have binSource, which is an Akka Source[Byte], you can go ahead and start applying whatever stream transformations (map, flatMap, transform, etc.) you want to it. This functionality leverages the Source companion object's apply that takes an Iterable, passing in a Scala Stream that should read in the data lazily and make it available to your transforms.
EDIT
As Konrad pointed out in the comments section, a Stream can be an issue with large files due to the fact that it performs memoization of the elements it encounters as it's lazily building out the stream. This can lead to out of memory situations if you are not careful. However, if you look at the docs for Stream there is a tip for avoiding memoization building up in memory:
One must be cautious of memoization; you can very quickly eat up large amounts of memory if you're not careful. The reason for this is that the memoization of the Stream creates a structure much like scala.collection.immutable.List. So long as something is holding on to the head, the head holds on to the tail, and so it continues recursively. If, on the other hand, there is nothing holding on to the head (e.g. we used def to define the Stream) then once it is no longer being used directly, it disappears.
So taking that into account, you could modify my original example as follows:
val binFile = new File(filePath)
val inputStream = new BufferedInputStream(new FileInputStream(binFile))

def binStream(in: BufferedInputStream) = Stream.continually(in.read).takeWhile(-1 != _).map(_.toByte)

val binSource = Source(() => binStream(inputStream).iterator)
So the idea here is to build the Stream via a def and not assign it to a val, and then immediately get the iterator from it and use that to initialize the Akka Source. Setting things up this way should avoid the issues with memoization. I ran the old code against a big file and was able to produce an OutOfMemory situation by doing a foreach on the Source. When I switched it over to the new code I was able to avoid this issue.

What is the |>> symbol in Scala

Ran into a piece of code that looks like the following:
chatRoom ? (Join("Robot")) map {
  case Connected(robotChannel) =>
    // Apply this Enumerator on the logger.
    robotChannel |>> loggerIteratee
}
This comes from a sample Play Framework app. No idea what |>> is in this case.
It's an alias for apply on enumerators. Essentially, it attaches a data source (the channel, an enumerator here) to a data sink (the logger, which is an iteratee).
Iteratees can be a little tricky to wrap your head around at first, but there's a good introduction on the Play site.
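A minimal illustration (Play iteratee API), with a throwaway enumerator and iteratee just for the example:

import play.api.libs.iteratee.{ Enumerator, Iteratee }
import play.api.libs.concurrent.Execution.Implicits.defaultContext

val source = Enumerator("hello", "world")
val logger = Iteratee.foreach[String](println)

// These three lines are equivalent; each returns a Future of the iteratee in its new state
source |>> logger
source.apply(logger)
source(logger)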

Scala iteratee to write to a file

I have a method save that takes an Iteratee and saves some data to it. Inside the method, the data is available as an enumerator producing byte-array chunks.
def save[E](consumer: Iteratee[Array[Byte], E]): Future[E] = {
  val producer: Enumerator[Array[Byte]] = // ...
  Iteratee.flatten(producer(consumer)).run
}
Wanted: Call save in order to have it write the data to a FileOutputStream.
I tried the following but am not sure whether this is the way to go:
def writeToStream(s: OutputStream) =
  Iteratee.foreach((e: Array[Byte]) => s.write(e)).mapDone(r => { s.close(); r })

save(writeToStream(new FileOutputStream(myFile)))
Question: Is this the way it's supposed to be done? I fear that this will not always close the stream (case of exceptions).
I am using the Play Framework Iteratee library from Play Framework 2.1 (which uses Scala futures).
The scaladocs for Iteratee say that resource management is the responsibility of the "producer", not the iteratee:
The Iteratee does not do any resource management (such as closing streams); the producer pushing stuff into the Iteratee has that responsibility.
You might be successful using the "onDoneEnumerating" method in Enumerator to clean up resources afterwards.
Scaladoc Iteratee
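A hedged sketch of that suggestion against the Play 2.1 iteratee API, with a made-up file path and a stand-in producer; the point is that the cleanup is attached to the enumerator, not the iteratee:

import java.io.FileOutputStream
import play.api.libs.iteratee.{ Enumerator, Iteratee }
import play.api.libs.concurrent.Execution.Implicits.defaultContext

val out = new FileOutputStream("/tmp/example.bin") // hypothetical target file
val writer = Iteratee.foreach[Array[Byte]](bytes => out.write(bytes))

// Stand-in for the producer inside save; the real one would come from your data source
val producer: Enumerator[Array[Byte]] = Enumerator(Array[Byte](1, 2, 3))

// The enumerator closes the stream once it has finished enumerating
val producerWithCleanup = producer.onDoneEnumerating(out.close())
producerWithCleanup.run(writer)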

Implementing long polling in scala and play 2.0 with akka

I'm implementing long polling in Play 2.0 in a potentially distributed environment. The way I understand it, when Play gets a request it should suspend it until a notification of an update arrives, then go to the db to fetch the new data, and repeat. I started looking at the chat example that Play 2.0 offers, but it uses WebSockets. Furthermore, it doesn't look like it's capable of being distributed. So I thought I would use Akka's event bus. I took the EventStream implementation and replicated my own with LookupClassification. However, I'm stumped as to how I'm going to get a message back (or, for that matter, what the subscriber should be instead of an ActorRef).
EventStream implementation:
https://github.com/akka/akka/blob/master/akka-actor/src/main/scala/akka/event/EventStream.scala
I am not sure this is what you are looking for, but there is quite a simple solution in the comet-clock sample that you can adapt to use Akka actors. It uses an infinite iframe instead of long polling. I have used an adapted version for a more complex application doing multiple DB calls and long computations in Akka actors, and it works fine.
def enum = Action {
  // get your actor
  val myActorRef = Akka.system.actorOf(Props[TestActor])

  // do some query to your DB here. Promise.timeout is used to simulate a blocking call
  def getDatabaseItem(id: Int): Promise[String] = { Promise.timeout("test", 10 milliseconds) }

  // test iterator, you will want something smarter here
  val items1 = (1 to 10).toIterator

  // a very simple enumerator that takes ints from an existing iterator
  // (for instance from the HTTP request parameters) and does some computations
  def myEnum(it: Iterator[Int]): Enumerator[String] = Enumerator.fromCallback[String] { () =>
    if (!it.hasNext)
      Promise.pure[Option[String]](None) // we are done with our computations
    else {
      // get the next int, query the database and compose the promise with a further query to the Akka actor
      getDatabaseItem(it.next).flatMap { dbValue =>
        implicit val timeout = new Timeout(10 milliseconds)
        val future = (myActorRef ? dbValue) mapTo manifest[String]
        // convert the Akka future to the Promise[Option] the enumerator expects
        future.map(v => Some(v)).asPromise
      }
    }
  }

  // finally we stream the result to the infinite iframe.
  // console.log is the javascript callback, you will want something more interesting.
  Ok.stream(myEnum(items1) &> Comet(callback = "console.log"))
}
Note that fromCallback doesn't allow you to combine enumerators with "andThen"; in the trunk version of Play 2 there is a generateM method that might be more appropriate if you want to use such combinations.
It's not long polling, but it works fine.
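For the generateM variant mentioned above, here is a rough sketch as it looks in later Play releases (the names are made up, and Future replaces the old Promise type):

import play.api.libs.iteratee.Enumerator
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import scala.concurrent.Future

val ints = (1 to 10).iterator

// Produce elements until the step returns None; the result composes with andThen
val numbers: Enumerator[String] = Enumerator.generateM {
  if (ints.hasNext) Future.successful(Some(ints.next().toString))
  else Future.successful(None)
}

val withTrailer = numbers.andThen(Enumerator("done"))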
I stumbled on your question while looking for the same thing.
I found the streaming solutions unsatisfying, as they caused a "spinner of death" in WebKit browsers (i.e. the page shows as loading all the time).
Anyhow, I didn't have any luck finding good examples, but I managed to create my own proof of concept using promises:
https://github.com/kallebertell/longpoll