How can I integrate MongoDB Scala Async driver with Akka Streams? - mongodb

I'm migrating my old Casbah Mongo drivers to the new async Scala drivers, and when I use them in an Akka stream, the stream gets stuck.
I have a GraphStage with createLogic() defined; the code is below. This worked fine with Casbah, and I'd hoped the async nature of the new Mongo drivers would be a great fit, but here's what happens...
If I stream 2 records through this code, the first record flows through and triggers the next step (see output below; 'HERE IN SEND' confirms it got through). The second record seems to go through the right steps in BlacklistFilter, but Akka never flows to the SEND step.
Any ideas why this is not working with the new drivers?
object BlacklistFilter {
  type FilterShape = FanOutShape2[QM[RenderedExpression], QM[RenderedExpression], QM[Unit]]
}
import BlacklistFilter._

case class BlacklistFilter(facilities: Facilities, helloConfig: HelloConfig)(implicit asys: ActorSystem) extends GraphStage[FilterShape] {

  val outPass: Outlet[QM[RenderedExpression]] = Outlet("Pass")
  val outFail: Outlet[QM[Unit]] = Outlet("Fail")
  val reIn: Inlet[QM[RenderedExpression]] = Inlet("Command")

  override val shape: FilterShape = new FanOutShape2(reIn, outPass, outFail)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new GraphStageLogic(shape) {
    override def preStart(): Unit = pull(reIn)

    setHandler(reIn, new InHandler {
      override def onPush(): Unit = {
        val cmd = grab(reIn)
        val re: RenderedExpression = cmd.body
        val check = re.recipient.contacts(re.media).toString

        // NEW NON-BLOCKING CODE
        //-------------------------------------
        facilities.withMongo(helloConfig.msgDB, helloConfig.blacklistColl) { coll =>
          var found: Option[Document] = None
          coll.find(Document("_id" -> check)).first().subscribe(
            (doc: Document) => {
              found = Some(doc)
              println("BLACKLIST FAIL! " + check)
              emit(outFail, cmd)
              // no pull() here as this happens on complete below
            },
            (e: Throwable) => {
              // Log something here!
              emit(outFail, cmd)
              pull(reIn)
            },
            () => {
              if (found.isEmpty) {
                println("BLACKLIST OK. " + check)
                emit(outPass, cmd)
              }
              pull(reIn)
              println("Pulled reIn...")
            }
          )
        }

        // OLD BLOCKING CASBAH CODE THAT WORKED
        //-------------------------------------
        // await(facilities.mongoAccess().mongo(helloConfig.msgDB, helloConfig.blacklistColl)(_.findOne(MongoDBObject("_id" -> check)))) match {
        //   case Some(_) => emit(outFail, cmd)
        //   case None    => emit(outPass, cmd)
        // }
        // pull(reIn)
      }

      override def onUpstreamFinish(): Unit = {} // necessary for some reason!
    })

    setHandler(outPass, eagerTerminateOutput)
    setHandler(outFail, eagerTerminateOutput)
  }
}
Output:
BLACKLIST OK. jsmith#yahoo.com
Pulled reIn...
HERE IN SEND (TemplateRenderedExpression)!!!
ACK!
BLACKLIST OK. 919-919-9119
Pulled reIn...
You can see from the output that the first record flowed nicely to the SEND/ACK steps. The second record printed the BLACKLIST message, meaning it emitted outPass then called pull on reIn... but then nothing happens downstream.
Anyone know why this would work differently in Akka Streams than the Casbah version that worked fine (code shown commented out)?
(I could just convert the Mongo call to a Future and Await on it, and that should work like the old code, but that kinda defeats the whole point of going async!)

Well then... "never mind"! :-)
The code above seemed like it should work. Then I noticed the Akka team had just released a new version (2.0.1). I'm not sure what tweaks lie within, but whatever they were, the code above now works as I'd hoped without modification.
Left this post up just in case anyone hits a similar problem.
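One more note for anyone who hits this on an older version: a likely culprit is that emit/pull were being called from the Mongo driver's callback thread, while GraphStageLogic methods may only be invoked from inside the stage. The supported bridge back in is getAsyncCallback. Below is a minimal sketch of the pattern against the shape above, where findOneAsync is a hypothetical stand-in for the driver call (not the actual driver API):
import scala.util.{Failure, Success}

override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
  new GraphStageLogic(shape) {
    override def preStart(): Unit = pull(reIn)

    setHandler(reIn, new InHandler {
      override def onPush(): Unit = {
        val cmd = grab(reIn)
        // getAsyncCallback may be invoked from any thread; its handler
        // body then runs safely inside the stage.
        val onResult = getAsyncCallback[Option[Document]] {
          case Some(_) => emit(outFail, cmd); pull(reIn)
          case None    => emit(outPass, cmd); pull(reIn)
        }
        // The lookup completes on the driver's thread pool, then hops
        // back into the stage via invoke().
        findOneAsync(cmd).onComplete {
          case Success(doc) => onResult.invoke(doc)
          case Failure(_)   => onResult.invoke(None) // or fail the stage
        }
      }
    })

    setHandler(outPass, eagerTerminateOutput)
    setHandler(outFail, eagerTerminateOutput)
  }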

Related

Why a Thread.sleep or closing the connection is required after waiting for a remove call to complete?

I'm again asking you to share your wisdom with me, the Scala padawan!
I'm playing with reactive mongo in Scala, and while writing a test using ScalaTest, I faced the following issue.
First the code:
"delete" when {
"passing an existent id" should {
"succeed" in {
val testRecord = TestRecord(someString)
Await.result(persistenceService.persist(testRecord), Duration.Inf)
Await.result(persistenceService.delete(testRecord.id), Duration.Inf)
Thread.sleep(1000) // Why do I need that to make the test succeeds?
val thrownException = intercept[RecordNotFoundException] {
Await.result(persistenceService.read(testRecord.id), Duration.Inf)
}
thrownException.getMessage should include(testRecord._id.toString)
}
}
}
And the read and delete methods, along with the code initializing the connection to the db (part of the constructor):
class MongoPersistenceService[R](url: String, port: String, databaseName: String, collectionName: String) {

  val driver = MongoDriver()
  val parsedUri: Try[MongoConnection.ParsedURI] = MongoConnection.parseURI("%s:%s".format(url, port))
  val connection: Try[MongoConnection] = parsedUri.map(driver.connection)
  val mongoConnection = Future.fromTry(connection)

  def db: Future[DefaultDB] = mongoConnection.flatMap(_.database(databaseName))
  def collection: Future[BSONCollection] = db.map(_.collection(collectionName))

  def read(id: BSONObjectID): Future[R] = {
    val query = BSONDocument("_id" -> id)
    val readResult: Future[R] = for {
      coll <- collection
      record <- coll.find(query).requireOne[R]
    } yield record
    readResult.recover {
      case NoSuchResultException => throw RecordNotFoundException(id)
    }
  }

  def delete(id: BSONObjectID): Future[Unit] = {
    val query = BSONDocument("_id" -> id)
    // first read then call remove. Read will throw if not present
    read(id).flatMap { (_) => collection.map(coll => coll.remove(query)) }
  }
}
So to make my test pass, I had to add a Thread.sleep right after waiting for the delete to complete. Knowing this is evil (usually punished by many whiplashes), I want to learn and find the proper fix here.
While trying other things, I found that instead of waiting, entirely closing the connection to the db also did the trick...
What am I misunderstanding here? Should a connection to the db be opened and closed for each call to it, rather than doing many actions (adding, removing, updating records) with one connection?
Note that everything works fine when I remove the read call in my delete function. Also, by closing the connection, I mean calling close on the MongoDriver from my test and also stopping and starting again the embedded Mongo which I'm using in the background.
Thanks for helping, guys.
Warning: this is a blind guess, I've no experience with MongoDB on Scala.
You may have forgotten to flatMap
Take a look at this bit:
collection.map(coll => coll.remove(query))
Since collection is a Future[BSONCollection] per your code, and remove returns a Future[WriteResult] per the docs, the actual type of this expression is Future[Future[WriteResult]].
Now, you have annotated your function as returning Future[Unit]. Scala will happily produce Unit by throwing away possibly meaningful values, which is what it does in your case:
read(id).flatMap { (_) =>
  collection.map(coll => {
    coll.remove(query) // we didn't wait for removal
    ()                 // before returning unit
  })
}
So your code should probably be
read(id).flatMap(_ => collection.flatMap(_.remove(query).map(_ => ())))
Or a for-comprehension:
for {
  _ <- read(id)
  coll <- collection
  _ <- coll.remove(query)
} yield ()
You can make Scala warn you about discarded values by adding a compiler flag (assuming SBT):
scalacOptions += "-Ywarn-value-discard"
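To see the nesting concretely, here is a self-contained sketch with plain scala.concurrent Futures (no Mongo involved) showing why awaiting the result of map does not wait for the inner work, while flatMap does:
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// map nests: the outer future is done as soon as the inner future is
// *created*, not when its work has finished.
val nested: Future[Future[Int]] =
  Future(1).map(_ => Future { Thread.sleep(100); 2 })
Await.result(nested, 1.second) // may return while the sleep is still running

// flatMap flattens: awaiting really waits for the inner work.
val flat: Future[Int] =
  Future(1).flatMap(_ => Future { Thread.sleep(100); 2 })
Await.result(flat, 1.second) // 2, only after the inner future completes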

Akka-Streams ActorPublisher does not receive any Request messages

I am trying to continuously read the wikipedia IRC channel using this lib: https://github.com/implydata/wikiticker
I created a custom Akka Publisher, which will be used in my system as a Source.
Here are some of my classes:
class IrcPublisher() extends ActorPublisher[String] {
  import scala.collection._

  var queue: mutable.Queue[String] = mutable.Queue()

  override def receive: Actor.Receive = {
    case Publish(s) =>
      println(s"->MSG, isActive = $isActive, totalDemand = $totalDemand")
      queue.enqueue(s)
      publishIfNeeded()
    case Request(cnt) =>
      println("Request: " + cnt)
      publishIfNeeded()
    case Cancel =>
      println("Cancel")
      context.stop(self)
    case _ =>
      println("Hm...")
  }

  def publishIfNeeded(): Unit = {
    while (queue.nonEmpty && isActive && totalDemand > 0) {
      println("onNext")
      onNext(queue.dequeue())
    }
  }
}

object IrcPublisher {
  case class Publish(data: String)
}
I am creating all these objects like so:
def createSource(wikipedias: Seq[String]) {
  val dataPublisherRef = system.actorOf(Props[IrcPublisher])
  val dataPublisher = ActorPublisher[String](dataPublisherRef)

  val listener = new MessageListener {
    override def process(message: Message) = {
      dataPublisherRef ! Publish(Jackson.generate(message.toMap))
    }
  }

  val ticker = new IrcTicker(
    "irc.wikimedia.org",
    "imply",
    wikipedias map (x => s"#$x.wikipedia"),
    Seq(listener)
  )
  ticker.start() // if I comment this...
  Thread.currentThread().join() //... and this I get Request(...)

  Source.fromPublisher(dataPublisher)
}
So the problem I am facing is with this Source object. Although this implementation works well with other sources (for example, from a local file), the ActorPublisher doesn't receive Request() messages.
If I comment out the two marked lines, I can see that my actor receives the Request(count) message from my flow. Otherwise all messages are pushed into the queue, but never into my flow (so I can see the MSG messages printed).
I think it's something to do with multithreading/synchronization here.
I am not familiar enough with wikiticker to solve your problem as given. One question I would have is: why is it necessary to join to the current thread?
However, I think you have overcomplicated the usage of Source. It would be easier for you to work with the stream as a whole rather than create a custom ActorPublisher.
You can use Source.actorRef to materialize a stream into an ActorRef and work with that ActorRef. This lets akka handle the enqueueing/dequeueing onto the buffer while you focus on the "business logic".
Say, for example, your entire stream is only to filter lines above a certain length and print them to the console. This could be accomplished with:
def dispatchIRCMessages(actorRef: ActorRef) = {
  val ticker =
    new IrcTicker("irc.wikimedia.org",
                  "imply",
                  wikipedias map (x => s"#$x.wikipedia"),
                  Seq(new MessageListener {
                    override def process(message: Message) =
                      // Source.actorRef[String] expects plain String elements,
                      // so send the payload directly rather than wrapping it in Publish
                      actorRef ! Jackson.generate(message.toMap)
                  }))
  ticker.start()
  Thread.currentThread().join()
}

//these variables control the buffer behavior
val bufferSize = 1024
val overFlowStrategy = akka.stream.OverflowStrategy.dropHead
val minMessageSize = 32

//no need for a custom Publisher/Queue
val streamRef =
  Source.actorRef[String](bufferSize, overFlowStrategy)
    .via(Flow[String].filter(_.size > minMessageSize))
    .to(Sink.foreach[String](println))
    .run()

dispatchIRCMessages(streamRef)
The dispatchIRCMessages has the added benefit that it will work with any ActorRef so you aren't required to only work with streams/publishers.
Hopefully this solves your underlying problem...
I think the main problem is Thread.currentThread().join(). This line will 'hang' the current thread, because the thread ends up waiting for itself to die. Please read https://docs.oracle.com/javase/8/docs/api/java/lang/Thread.html#join-long- .
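A minimal runnable demonstration of that self-deadlock:
// join() blocks until the target thread terminates; joining the current
// thread therefore blocks forever, because the thread can never
// terminate while it is still waiting on itself.
object JoinSelf extends App {
  println("before join")
  Thread.currentThread().join() // never returns
  println("never printed")
}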

Interact (i/o) with an external process in Scala

I'm looking for a simple way to start an external process and then write strings to its input and read its output.
In Python, this works:
mosesProcess = subprocess.Popen([mosesBinPath, '-f', mosesModelPath], stdin = subprocess.PIPE, stdout = subprocess.PIPE, stderr = subprocess.PIPE);
# ...
mosesProcess.stdin.write(aRequest);
mosesAnswer = mosesProcess.stdout.readline().rstrip();
# ...
mosesProcess.stdin.write(anotherRequest);
mosesAnswer = mosesProcess.stdout.readline().rstrip();
# ...
mosesProcess.stdin.close();
I think in Scala this should be done with scala.sys.process.ProcessBuilder and scala.sys.process.ProcessIO but I don't get how they work (especially the latter).
EDIT:
I have tried things like:
val inputStream = new scala.concurrent.SyncVar[java.io.OutputStream]
val outputStream = new scala.concurrent.SyncVar[java.io.InputStream]
val errStream = new scala.concurrent.SyncVar[java.io.InputStream]

val cmd = "myProc"
val pb = process.Process(cmd)
val pio = new process.ProcessIO(stdin => inputStream.put(stdin),
                                stdout => outputStream.put(stdout),
                                stderr => errStream.put(stderr))
pb.run(pio)

inputStream.get.write(("request1" + "\n").getBytes)
println(outputStream.get.read) // It is blocked here
inputStream.get.write(("request2" + "\n").getBytes)
println(outputStream.get.read)
inputStream.get.close()
But the execution gets stuck.
Granted, attrib below is not a great example on the write side of things. I have an EchoServer that would exercise both input and output.
import scala.sys.process._
import java.io._

object EchoClient {
  def main(args: Array[String]) {
    var bContinue = true
    var cmd = "C:\\\\windows\\system32\\attrib.exe"
    println(cmd)
    val process = Process(cmd)
    val io = new ProcessIO(
      writer,
      out => { scala.io.Source.fromInputStream(out).getLines.foreach(println) },
      err => { scala.io.Source.fromInputStream(err).getLines.foreach(println) })

    while (bContinue) {
      process run io
      var answer = readLine("Run again? (y/n)? ")
      if (answer == "n" || answer == "N")
        bContinue = false
    }
  }

  def reader(input: java.io.InputStream) = {
    // read here
  }

  def writer(output: java.io.OutputStream) = {
    // write here
  }

  // TODO: implement an error logger
}
Output below:
C:\\windows\system32\attrib.exe
A C:\dev\EchoClient$$anonfun$1.class
A C:\dev\EchoClient$$anonfun$2$$anonfun$apply$1.class
A C:\dev\EchoClient$$anonfun$2.class
A C:\dev\EchoClient$$anonfun$3$$anonfun$apply$2.class
A C:\dev\EchoClient$$anonfun$3.class
A C:\dev\EchoClient$.class
A C:\dev\EchoClient.class
A C:\dev\EchoClient.scala
A C:\dev\echoServer.bat
A C:\dev\EchoServerChg$$anonfun$main$1.class
A C:\dev\EchoServerChg$.class
A C:\dev\EchoServerChg.class
A C:\dev\EchoServerChg.scala
A C:\dev\ScannerTest$$anonfun$main$1.class
A C:\dev\ScannerTest$.class
A C:\dev\ScannerTest.class
A C:\dev\ScannerTest.scala
Run again? (y/n)?
Scala API for ProcessIO:
new ProcessIO(in: (OutputStream) ⇒ Unit, out: (InputStream) ⇒ Unit, err: (InputStream) ⇒ Unit)
I suppose you should provide at least two arguments: one OutputStream function (for writing to the process) and one InputStream function (for reading from the process).
For instance:
def readJob(in: InputStream) {
  // do something with in
}

def writeJob(out: OutputStream) {
  // do something with out
}

def errJob(err: InputStream) {
  // do something with err
}

val process = new ProcessIO(writeJob, readJob, errJob)
Please keep in mind that the streams are Java streams, so you will have to check the Java API.
Edit: the package page provides examples, maybe you could take a look at them.
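Building on that skeleton, here is a self-contained sketch of the request/response interaction the question asks for, using cat as a stand-in echo process (an assumption; substitute your own binary). The two details that typically cause the "stuck" behaviour are flushing stdin after every request and reading whole lines instead of single bytes:
import scala.sys.process._
import scala.concurrent.SyncVar
import java.io.{BufferedReader, InputStreamReader, PrintWriter}

object EchoInteraction extends App {
  val procIn  = new SyncVar[PrintWriter]    // the child's stdin
  val procOut = new SyncVar[BufferedReader] // the child's stdout

  val io = new ProcessIO(
    stdin  => procIn.put(new PrintWriter(stdin, true)), // autoflush on println
    stdout => procOut.put(new BufferedReader(new InputStreamReader(stdout))),
    stderr => scala.io.Source.fromInputStream(stderr).getLines().foreach(Console.err.println))

  val proc = Process("cat").run(io) // `cat` echoes its stdin back to stdout

  procIn.get.println("request1")
  println(procOut.get.readLine()) // prints "request1"
  procIn.get.println("request2")
  println(procOut.get.readLine()) // prints "request2"

  procIn.get.close() // closing stdin lets `cat` terminate
  proc.exitValue()   // wait for the child to exit
}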
ProcessIO is the way to go for low level control and input and output interaction. There even is an often overlooked helper object BasicIO that assists with creating common ProcessIO instances for reading, connecting in/out streams with utility functions. You can look at the source for BasicIO.scala to see what it is doing internally in creating the ProcessIO Instances.
You can sometimes find inspiration from test cases or tools created for the class itself by the project. In the case of Scala, have a look at the source on GitHub. We are fortunate in that there is a detailed example of ProcessIO being used for the scala GraphViz Dot process runner DotRunner.scala!
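For a taste of BasicIO, here is a minimal sketch of its simplest helper, which wires the child process straight to this JVM's own standard streams (again assuming a Unix-like cat for the demo):
import scala.sys.process._

// BasicIO.standard(connectInput = true) builds a ProcessIO that connects
// the child's stdin/stdout/stderr to the current process's streams.
val p = Process("cat").run(BasicIO.standard(connectInput = true))
p.exitValue() // type on the console; Ctrl-D ends `cat`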

Play / Logging / Print Response Body / Run over enumerator / buffer the body

I'm looking for a way to print the response body in Play framework, I have a code like this:
object AccessLoggingAction extends ActionBuilder[Request] {

  def invokeBlock[A](request: Request[A], block: (Request[A]) => Future[Result]) = {
    Logger.info(s"""Request:
      id=${request.id}
      method=${request.method}
      uri=${request.uri}
      remote-address=${request.remoteAddress}
      body=${request.body}
    """)
    val ret = block(request)
    /*
    ret.map { result =>
      Logger.info(s"""Response:
        id=${request.id}
        body=${result.body}
      """)
    }
    */ //TODO: find out how to print result.body (be careful not to consume the enumerator)
    ret
  }
}
Currently the commented-out code does not work as I wanted; I mean, it would print:
Response:
id=1
body=play.api.libs.iteratee.Enumerator$$anon$18#39e6c1a2
So, I need to find a way to get a String out of Enumerator[Array[Byte]]. I tried to grasp the concept of Enumerator by reading this: http://mandubian.com/2012/08/27/understanding-play2-iteratees-for-normal-humans/
So..., if I understand it correctly:
I shouldn't drain the enumerator in the process of converting it to a String; otherwise, the client would receive nothing.
Let's suppose I figure out how to implement the tee ("T") / filter mechanism. But then... wouldn't it defeat the purpose of Play as a non-blocking streaming framework (because I would be building up the complete array of bytes in memory before calling toString on it, and finally logging it)?
So, what's the correct way to log the response?
Thanks in advance,
Raka
This code works:
object AccessLoggingAction extends ActionBuilder[Request] {

  def invokeBlock[A](request: Request[A], block: (Request[A]) => Future[Result]) = {
    val start = System.currentTimeMillis
    Logger.info(s"""Request:
      id=${request.id}
      method=${request.method}
      uri=${request.uri}
      remote-address=${request.remoteAddress}
      body=${request.body}
    """)
    val resultFut = block(request)
    resultFut.map { result =>
      val time = System.currentTimeMillis - start
      Result(result.header, result.body &> Enumeratee.map(arrOfBytes => {
        val body = new String(arrOfBytes.map(_.toChar))
        Logger.info(s"""Response:
          id=${request.id}
          method=${request.method}
          uri=${request.uri}
          delay=${time}ms
          status=${result.header.status}
          body=${body}""")
        arrOfBytes
      }), result.connection)
    }
  }
}
I partly learned it from here (on how to get the byte array out of enumerator): Scala Play 2.1: Accessing request and response bodies in a filter.
I'm using Play 2.3.7 while the link I gave uses 2.1 (and still uses PlainResult, which no longer exists in 2.3).
As it appears to me, if you do logging inside result.body &> Enumeratee.map (as suggested in https://stackoverflow.com/a/27630208/1781549) and the result body is presented in more than one chunk, then each chunk will be logged independently. You probably don't want this.
I'd implement it like this:
val ret = block(request).flatMap { result =>
  val consume = Iteratee.consume[Array[Byte]]()
  val bodyF = Iteratee.flatten(result.body(consume)).run
  bodyF.map { bodyBytes: Array[Byte] =>
    //
    // Log the body
    //
    result.copy(body = Enumerator(bodyBytes))
  }
}
But be warned: the whole idea of this is to consume all the data from the result.body Enumerator before logging (and return the new Enumerator). So, if the response is big, or you rely on streaming, then it's probably also the thing you don't want.
I used the above answer as a starting point, but noticed that it would only log responses when a body was present. We've adapted it to this:
var responseBody: Option[String] = None

// Accumulate chunks rather than overwrite, so multi-chunk bodies
// are captured in full (see the chunking caveat above).
val captureBody = Enumeratee.map[Array[Byte]](arrOfBytes => {
  val chunk = new String(arrOfBytes.map(_.toChar))
  responseBody = responseBody.map(_ + chunk).orElse(Some(chunk))
  arrOfBytes
})

val withLogging = (result.body &> captureBody).onDoneEnumerating({
  logger.debug(.. create message here ..)
})

result.copy(body = withLogging)
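To illustrate the underlying "tap without consuming" idea in isolation, here is a tiny self-contained sketch with plain iteratees, outside any Play action:
import play.api.libs.iteratee._
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Enumeratee.map observes each chunk as it flows past and passes it on
// unchanged, so the downstream consumer still receives the whole body.
val body   = Enumerator("chunk1", "chunk2", "chunk3")
val logged = body &> Enumeratee.map { chunk => println(s"log: $chunk"); chunk }

Await.result(logged |>>> Iteratee.foreach[String](println), 1.second)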

Is it OK to use blocking actor messages when they are wrapped in a future?

My current application is based on akka 1.1. It has multiple ProjectAnalysisActors, each responsible for handling the analysis tasks for a specific project. The analysis is started when such an actor receives a generic start message. After finishing one step, it sends itself a message with the next step, as long as one is defined. The executing code basically looks as follows:
sealed trait AnalysisEvent {
  def run(project: Project): Future[Any]
  def nextStep: AnalysisEvent = null
}

case class StartAnalysis() extends AnalysisEvent {
  override def run ...
  override def nextStep: AnalysisEvent = new FirstStep
}

case class FirstStep() extends AnalysisEvent {
  override def run ...
  override def nextStep: AnalysisEvent = new SecondStep
}

case class SecondStep() extends AnalysisEvent {
  ...
}

class ProjectAnalysisActor(project: Project) extends Actor {
  def receive = {
    case event: AnalysisEvent =>
      val future = event.run(project)
      future.onComplete { f =>
        self ! event.nextStep
      }
  }
}
I have some difficulty with how to implement the run methods for each analysis step. At the moment I create a new future within each run method. Inside this future I send all follow-up messages into the different subsystems. Some of them are non-blocking fire-and-forget messages, but some of them return a result which should be stored before the next analysis step is started.
At the moment a typical run method looks as follows:
def run(project: Project): Future[Any] = {
  Future {
    progressActor ! typicalFireAndForget(project.name)
    val calcResult = (calcActor1 !! doCalcMessage(project)).getOrElse(...)
    val p: Project = ... // created updated project using calcResult
    val result = (storage !! updateProjectInformation(p)).getOrElse(...)
    result
  }
}
Since those blocking messages should be avoided, I'm wondering if this is the right way. Does it make sense to use them in this use case, or should I still avoid them? If so, what would be a proper solution?
Apparently the only purpose of the ProjectAnalysisActor is to chain future calls. Second, the run methods also seem to wait on results to continue their computations.
So I think you can try refactoring your code to use future composition, as explained here: http://akka.io/docs/akka/1.1/scala/futures.html
def run(project: Project): Future[Any] = {
  progressActor ! typicalFireAndForget(project.name)
  for (
    calcResult <- calcActor1 !!! doCalcMessage(project);
    p = ... // created updated project using calcResult
    result <- storage !!! updateProjectInformation(p)
  ) yield (
    result
  )
}
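For readers on modern Scala, the same shape with plain scala.concurrent Futures (the present-day equivalent of akka 1.1's !!!), using hypothetical step functions:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical async steps standing in for the actor calls above.
def doCalc(project: String): Future[Int]       = Future(project.length)
def storeInfo(updated: String): Future[String] = Future(s"stored $updated")

def run(project: String): Future[String] =
  for {
    calcResult <- doCalc(project)         // wait for the calculation
    updated     = s"$project-$calcResult" // pure intermediate step
    result     <- storeInfo(updated)      // then persist
  } yield result
// No thread is blocked: each step runs when the previous future completes.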