Ran into a piece of code that looks like the following:
(chatRoom ? Join("Robot")) map {
  case Connected(robotChannel) =>
    // Apply this Enumerator on the logger.
    robotChannel |>> loggerIteratee
}
This comes from a sample Play Framework app. No idea what |>> is in this case.
It's an alias for apply on enumerators. Essentially what it's doing is attaching a data source (the channel, an enumerator here) to a data sink (the logger, which is an iteratee).
Iteratees can be a little tricky to wrap your head around at first, but there's a good introduction on the Play site.
All of our backend services are written in Scala. We mostly write pure functional Scala using Cats.
I am trying to figure out whether there is a design pattern in Cats, or in Scala in general, that I can use to design an EventLogger.
This EventLogger should collect "events" (simple key-value pairs) as the request flows through the logic. At the end of the request, I want to write the collected events to a data store. We already have a "context" implicit parameter that gets passed to all the methods. I could add this EventLogger to my Context class, and it would then be accessible from most parts of my code. Now I am trying to figure out how to design the EventLogger itself without using a mutable collection.
I have used Akka actors to collect state in the past to manage mutation. I would prefer not to introduce Akka into our classpath just for this.
As #AndreyTyukin suggests, Writer would work well here.
You could do something like:
import cats.data.Writer

object EventLogger {
  type Event = (String, String)

  def log(event: Event): Writer[Vector[Event], Unit] =
    Writer.tell(Vector(event))
}
And then you could use it like this:
// Example usage
for {
  something     <- Writer.value(myFunc(arg))
  _             <- EventLogger.log("function_finished" -> "myFunc")
  somethingElse <- Writer.value(myFunc2(arg2))
  _             <- EventLogger.log("function_finished" -> "myFunc2")
} yield combine(something, somethingElse)
At the end of this, you'll have some kind of Writer[Vector[Event], ?] value where the ? might be a value you're interested in and the Vector[Event] is all of your Event log data ready for you to do something with.
Also, Writer usually won't be the only container you want to use. You'll probably want to investigate monad transformers to stack up containers, or something like Eff.
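To see concretely what the accumulated log buys you, here is a minimal stdlib-only sketch of the same idea. MiniWriter is a hypothetical stand-in for cats.data.Writer (its real API differs), but it shows how flatMap threads the value while concatenating the event log:

```scala
// Hypothetical stand-in for cats.data.Writer: a value paired with
// an accumulated Vector of (key, value) events.
final case class MiniWriter[A](log: Vector[(String, String)], value: A) {
  def flatMap[B](f: A => MiniWriter[B]): MiniWriter[B] = {
    val next = f(value)
    MiniWriter(log ++ next.log, next.value) // logs combine; values chain
  }
  def map[B](f: A => B): MiniWriter[B] = MiniWriter(log, f(value))
}

object MiniWriter {
  def value[A](a: A): MiniWriter[A] = MiniWriter(Vector.empty, a)
  def tell(event: (String, String)): MiniWriter[Unit] = MiniWriter(Vector(event), ())
}

val program: MiniWriter[Int] =
  for {
    x <- MiniWriter.value(21)
    _ <- MiniWriter.tell("function_finished" -> "myFunc")
    y <- MiniWriter.value(x * 2)
    _ <- MiniWriter.tell("function_finished" -> "myFunc2")
  } yield y

// program.value holds the result; program.log holds both events in order
```

At the end, program.log is the Vector[Event] you would hand to your data store, without any mutable collection in sight.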
Are there some code examples of using org.reactivestreams libraries to process large data streams using Java NIO (for high performance)? I'm aiming at distributed processing, so examples using Akka would be best, but I can figure that out.
It still seems to be the case that most (I hope not all) examples of reading files in Scala resort to Source (non-binary) or to direct Java NIO (and even things like Files.readAllBytes!)
Perhaps there is an activator template I've missed? (Akka Streams with Scala! is close addressing everything I need except the binary/NIO side)
Do not use scala.collection.immutable.Stream to consume files like this. The reason is that it performs memoization: while it is indeed lazy, it keeps the entire stream buffered (memoized) in memory!
This is definitely not what you want when you think about "stream processing a file". The reason Scala's Stream works like this is that in a functional setting it makes complete sense: thanks to memoization you can, for example, avoid calculating Fibonacci numbers again and again. For more details see the ScalaDoc.
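The Fibonacci case the ScalaDoc has in mind can be sketched in one line; here memoization is exactly what you want, because each element is computed once and then cached by its Stream cell:

```scala
// Classic lazily-memoized Fibonacci Stream: each element is computed
// once on first access and then cached by the Stream cell.
lazy val fibs: Stream[BigInt] =
  BigInt(0) #:: BigInt(1) #:: fibs.zip(fibs.tail).map { case (a, b) => a + b }
```

Asking for fibs(10) a second time is free, which is great for Fibonacci and disastrous for a multi-gigabyte file.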
Akka Streams provides Reactive Streams implementations and provides a FileIO class that you could use here (it will properly back-pressure and pull the data out of the file only when needed and the rest of the stream is ready to consume it):
import java.nio.file.Paths

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{ FileIO, Sink }

object ExampleApp extends App {
  implicit val sys = ActorSystem()
  implicit val mat = ActorMaterializer()
  import sys.dispatcher

  FileIO.fromPath(Paths.get("/example/file.txt"))
    .map { chunk => print(chunk.utf8String); chunk }
    .runWith(Sink.ignore)
    .onComplete { _ => sys.terminate() }
}
Here are more docs about working with IO with Akka Streams
Note that this is for the current-as-of-writing version of Akka, the 2.5.x series.
Hope this helps!
We actually use Akka Streams to process binary files. It was a little tricky to get things going, as there wasn't any documentation around this, but this is what we came up with:
val binFile = new File(filePath)
val inputStream = new BufferedInputStream(new FileInputStream(binFile))
val binStream = Stream.continually(inputStream.read).takeWhile(-1 != _).map(_.toByte)
val binSource = Source(binStream)
Once you have binSource, which is an Akka Source[Byte], you can go ahead and start applying whatever stream transformations (map, flatMap, transform, etc.) you want to it. This functionality leverages the Source companion object's apply that takes an Iterable, passing in a Scala Stream that should read the data lazily and make it available to your transforms.
EDIT
As Konrad pointed out in the comments section, a Stream can be an issue with large files because it memoizes the elements it encounters as it lazily builds out the stream. This can lead to out-of-memory situations if you are not careful. However, if you look at the docs for Stream, there is a tip for avoiding memoization building up in memory:
One must be cautious of memoization; you can very quickly eat up large
amounts of memory if you're not careful. The reason for this is that
the memoization of the Stream creates a structure much like
scala.collection.immutable.List. So long as something is holding on to
the head, the head holds on to the tail, and so it continues
recursively. If, on the other hand, there is nothing holding on to the
head (e.g. we used def to define the Stream) then once it is no longer
being used directly, it disappears.
So taking that into account, you could modify my original example as follows:
val binFile = new File(filePath)
val inputStream = new BufferedInputStream(new FileInputStream(binFile))

def binStream(in: BufferedInputStream) = Stream.continually(in.read).takeWhile(-1 != _).map(_.toByte)

val binSource = Source(() => binStream(inputStream).iterator)
So the idea here is to build the Stream via a def, not assign it to a val, and then immediately get the iterator from it and use that to initialize the Akka Source. Setting things up this way should avoid the issues with memoization. I ran the old code against a big file and was able to produce an OutOfMemory situation by doing a foreach on the Source. When I switched over to the new code I was able to avoid this issue.
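The def-based pattern can be exercised without a real file. This sketch substitutes a ByteArrayInputStream for the FileInputStream, but the continually/takeWhile loop terminating on -1 is the same:

```scala
import java.io.{ BufferedInputStream, ByteArrayInputStream }

// Same pattern as above: read until InputStream.read returns -1 (EOF),
// fed here from an in-memory stream instead of a file.
def binStream(in: BufferedInputStream): Stream[Byte] =
  Stream.continually(in.read).takeWhile(-1 != _).map(_.toByte)

val in = new BufferedInputStream(new ByteArrayInputStream(Array[Byte](1, 2, 3)))
val bytes = binStream(in).toVector
```

Because binStream is a def, nothing retains the head of the Stream once the iterator has been consumed, which is the whole point of the fix above.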
I have an Enumerator in Play Framework 2.1. I would like to have some code executed whenever that Enumerator produces a value.
The documentation is very difficult to understand, but it seems like I need to construct an Iteratee in order to do that, but I can't find actual code that does that anywhere. How do I do this?
Got it!
myEnumerator.run(Iteratee.foreach[TypeToIterateOver] { msg =>
println(msg)
})
The 3 descriptions of the Iteratee pattern in Scala that I've seen all include 3 cases for input. For example, from James:
sealed trait Input[+E]
object Input {
case object EOF extends Input[Nothing]
case object Empty extends Input[Nothing]
case class El[+E](e: E) extends Input[E]
}
For more details, see the blogs by James, Runar, and Josh.
My question is simply: why precisely is the Empty input case needed?
The iteratee pattern defines a relationship between a producer and a consumer of a stream of values. Intuitively, it seems that if any input is empty, the producer that "runs" the iteratee should simply collapse that empty item away, and not call the iteratee until non-empty input is available.
I note the pull-based analog of iteratees, the much more familiar iterators, do not define an empty case, although it's possible that elements have been filtered away "inside" the iterator.
trait Iterator[E] {
  def next(): E        // like El
  def hasNext: Boolean // like EOF
}
All the above blogs mention the need for an Empty input in passing, but they don't discuss explicitly why it cannot be eliminated altogether. I notice the example iteratees shown treat Empty input as a no-op.
I'd really like an example, with code, of a plausible "real-world-ish" problem that requires the Empty input message to solve.
Let's say you connect an enumerator that feeds some elements to the peek iteratee, which looks at the first element and returns it but does not consume it, leaving it to be used by another iteratee that may be composed with peek. Then you would want to provide a mechanism for peek to put the element back. From what I can tell from both the Play and Scalaz iteratees, the done iteratee takes an argument just for this purpose, so you can do something like (in pseudo code): done(Some(result), El(result)). See this implementation of peek.
Now if you implement something like head which will actually consume the element, then it feels like one way to do it is to return done(Some(result), emptyInput) to indicate that the input was consumed.
See also this comment in the playframework source code, showing that the second argument of Done(_, _) is for unused input and is initialized to an empty default. So Empty is not something seldom used for which it's hard to find a real-world example; it is really key to the implementation of iteratees. In fact, it may be interesting to see which iteratee frameworks do not have Empty and how they managed to implement peek and head.
James Roper gave a useful response here, including this snippet I found interesting:
I guess another way it could be implemented is to have Option[Input]
as the left over input for Done. This would make implementing
iteratees simpler since they wouldn't need to handle empty.
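To make the peek/head distinction concrete, here is a minimal toy iteratee (a sketch, not the Play or Scalaz implementation) in which the leftover-input slot of Done is exactly where Empty earns its keep:

```scala
sealed trait Input[+E]
case object EOF extends Input[Nothing]
case object Empty extends Input[Nothing]
final case class El[+E](e: E) extends Input[E]

sealed trait Iteratee[E, +A]
final case class Done[E, A](result: A, remaining: Input[E]) extends Iteratee[E, A]
final case class Cont[E, A](k: Input[E] => Iteratee[E, A]) extends Iteratee[E, A]

// Feed one chunk of input to an iteratee
def feed[E, A](it: Iteratee[E, A], in: Input[E]): Iteratee[E, A] = it match {
  case Cont(k) => k(in)
  case done    => done
}

// peek reports the first element but hands it back as leftover input
def peek[E]: Iteratee[E, Option[E]] = Cont {
  case El(e) => Done(Some(e), El(e)) // element goes back for the next iteratee
  case Empty => peek                 // nothing to look at yet: a no-op
  case EOF   => Done(None, EOF)
}

// head consumes the first element, so the leftover is Empty
def head[E]: Iteratee[E, Option[E]] = Cont {
  case El(e) => Done(Some(e), Empty)
  case Empty => head
  case EOF   => Done(None, EOF)
}
```

With this in hand, the done(Some(result), El(result)) and done(Some(result), emptyInput) shapes from the answer above fall out directly: without Empty, head would have no honest way to say "I finished and consumed what you gave me."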
I am learning Iteratee and related APIs for one of my requirements: streaming live tweets. Using Play 2.1 and Scala 2.10. Is the following the best way to use Iteratee, which also produces the result of saving a tweet to MongoDB?
val wsStream = new Enumerator[Array[Byte]] {
  def apply[A](iteratee: Iteratee[Array[Byte], A]) = {
    WS.url("https://stream.twitter.com/1.1/statuses/filter.json?track=" + term)
      .sign(OAuthCalculator(Twitter.KEY, tokens))
      .get(_ => iteratee)
  }
}
wsStream.apply(Iteratee.foreach(bytes => saveTweetToMongo(bytes)))
Note that you can apply multiple iteratees to the same enumerator. In other words, you can create a streamingTweetIteratee and a saveTweetToMongoIteratee and apply both to the enumerator which provides tweets.
I often create a simple loggingIteratee which just funnels everything to STDOUT when I'm prototyping in the REPL. I apply both it and the iteratee I'm writing to the same enumerator.
I'm assuming you want to use WebSockets in order to stream tweets to a client? If you look at the chat demo that comes with Play! you'll get an idea of how to go about that.