I'm trying to implement a queue of bundles using Chisel:
class element extends Bundle {
  val data = UInt(32.W)
}

class testQueue extends Module {
  val io = IO(new Bundle {
    val in = Flipped(Decoupled(new element))
    val out = Decoupled(new element)
  })
  val queue = Queue(io.in, 16)
  io.out <> queue
}
When I use the test API from the Chisel Bootcamp with this test code:
test(new testQueue()){ c =>
// Example test sequence showing the use and behavior of Queue
val testVector = Seq.tabulate(100){ i => {
val s=Wire(new element())
s.data :=1.U
the error is:
chisel3.internal.ChiselException: Error: Not in a UserModule. Likely cause: Missed Module() wrap, bare chisel API call, or attempting to construct hardware inside a BlackBox.
When I remove "Wire" from the last code block, it says:
chisel3.package$ExpectedHardwareException: data to be connected 'UInt<32>' must be hardware, not a bare Chisel type. Perhaps you forgot to wrap it in Wire(_) or IO(_)?
So what should I write to test this testQueue module using the enqueueSeq and expectDequeueSeq APIs?
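One way to approach it, sketched under assumptions: both errors come from trying to build hardware (Wire) or bare Chisel types as test data outside a module. Bundle literals from chisel3.experimental.BundleLiterals are plain values that can be constructed in a test, which is what enqueueSeq/expectDequeueSeq expect. This assumes a chiseltest version with the initSource/setSourceClock driver API used in the Bootcamp (newer releases may deprecate parts of it); TestQueueSpec is a hypothetical name, and inside the Bootcamp notebook you can call test(...) directly instead of wrapping it in a spec class.

```scala
import chisel3._
import chisel3.util._
import chisel3.experimental.BundleLiterals._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class element extends Bundle {
  val data = UInt(32.W)
}

class testQueue extends Module {
  val io = IO(new Bundle {
    val in  = Flipped(Decoupled(new element))
    val out = Decoupled(new element)
  })
  io.out <> Queue(io.in, 16)
}

// Hypothetical spec wrapper; in the Bootcamp, test(...) is available without it.
class TestQueueSpec extends AnyFlatSpec with ChiselScalatestTester {
  "testQueue" should "pass bundles through in order" in {
    test(new testQueue) { c =>
      // Bind the Decoupled drivers on both ports to the module clock.
      c.io.in.initSource().setSourceClock(c.clock)
      c.io.out.initSink().setSinkClock(c.clock)

      // Bundle literals need no Module context, unlike Wire(...),
      // which is what triggered "Not in a UserModule".
      val testVector = Seq.tabulate(100) { i => (new element).Lit(_.data -> i.U(32.W)) }

      // Producer and consumer run concurrently so the 16-entry queue never deadlocks.
      fork {
        c.io.in.enqueueSeq(testVector)
      }.fork {
        c.io.out.expectDequeueSeq(testVector)
      }.join()
    }
  }
}
```

The fork/join pair matters: enqueuing all 100 elements before dequeuing any would stall once the queue is full.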
I'm trying to understand how serialization works for a self-constructed case class together with a parser in a separate object, and I'm failing.
I tried to boil the problem down to:
parsing a string into case classes
constructing an RDD from those
taking the first element in order to print it
case class article(title: String, text: String) extends Serializable {
  override def toString = title + s"/" + text
}

object parser {
  def parse(line: String): article = {
    val subs = "</end>"
    val i = line.indexOf(subs)
    val title = line.substring(6, i)
    val text = line.substring(i + subs.length, line.length)
    article(title, text)
  }
}

val text = """"<beg>Title1</end>Text 1"
"<beg>Title2</end>Text 2""""
val lines = text.split('\n')
val res = lines.map( line => parser.parse(line) )
val rdd = sc.parallelize(res)
rdd.take(1).foreach(println)
I get:
Job aborted due to stage failure: Failed to serialize task, not attempting to retry it. Exception during serialization: java.io.NotSerializableException
Could a Scala expert please help me understand how serialization interacts with the workers and the master, and how to fix the parser/article interaction so that serialization works?
Thank you very much.
In the map function lines.map( line => parser.parse(line) ) you call parser.parse, and parser is your object, which is not serializable. Spark internally uses partitions that are spread across the cluster, and map functions are called on each partition. Because the partitions are not in the same JVM process, the function called on each partition needs to be serializable; that is why your parser object has to obey that rule.
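A minimal sketch of the fix the answer implies, runnable without Spark: mix Serializable into the parser object and check it with plain Java serialization, which is roughly what Spark's default closure serializer does when it ships a task. One caveat: in the question, lines is a local Array, so lines.map itself runs on the driver; the rule applies when parser.parse is used inside an RDD operation such as rdd.map(parser.parse). The names plainParser and SerializationCheck are hypothetical, added only for comparison.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Case classes are already Serializable; no explicit `extends Serializable` is needed.
case class article(title: String, text: String) {
  override def toString = title + "/" + text
}

// Mixing in Serializable lets Spark ship this object inside a task closure.
object parser extends Serializable {
  private val start = "<beg>"
  private val end   = "</end>"

  def parse(line: String): article = {
    val i = line.indexOf(end)
    article(line.substring(line.indexOf(start) + start.length, i), line.substring(i + end.length))
  }
}

// Hypothetical object without the marker trait, to show the failure mode.
object plainParser {
  def parse(line: String): article = parser.parse(line)
}

object SerializationCheck {
  // Java-serializes `obj`; returns false if some part of its object graph is not serializable.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val oos = new ObjectOutputStream(new ByteArrayOutputStream())
      oos.writeObject(obj)
      oos.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

With Spark available, the serializable parser can then be used on the executors, e.g. sc.parallelize(lines).map(parser.parse).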
I am using a third-party library to provide parsing services (user agent parsing, in my case). The library is not thread-safe and has to be used from a single thread. I would like to write a thread-safe API that multiple threads can call, exposed via the Futures API, since the library might introduce some blocking (IO). I would also like to apply back pressure when necessary and return a failed future when the parser can't keep up with the producers.
This could actually be a generic question: how to interact, with back pressure and in a concurrent environment, with any client/library that is not thread-safe (user agent/geolocation parsers, DB clients like Redis, log collectors like Fluentd).
I came up with the following approach:
1. Encapsulate the parser within a dedicated actor.
2. Create an Akka Streams source queue that receives UserAgentParseRequest messages, each containing the user agent and a Promise to complete, and use the ask pattern via mapAsync to interact with the parser actor.
3. Create another actor to encapsulate the source queue.
Is this the way to go? Is there another, maybe simpler, way to achieve this? Maybe using a GraphStage? Can it be done without the ask pattern and with less code?
The actor mentioned in number 3 exists because I'm not sure whether the source queue is thread-safe. I wish the docs simply stated it, but they don't, and there are multiple versions on the web, some stating it is and some stating it isn't.
Once materialized, is the source queue thread-safe for pushing elements from different threads?
(The code may not compile and is prone to potential failures; it is only intended to illustrate this question.)
class UserAgentRepo(dbFilePath: String)(implicit actorRefFactory: ActorRefFactory) {
  import akka.pattern.ask
  import akka.util.Timeout
  import scala.concurrent.duration._
  implicit val askTimeout = Timeout(5 seconds)

  // API to parser - delegates the request to the back pressure actor
  def parse(userAgent: String): Future[Option[UserAgentData]] = {
    val p = Promise[Option[UserAgentData]]
    parserBackPressureProvider ! UserAgentParseRequest(userAgent, p)
    p.future
  }

  // Actor to provide back pressure that delegates requests to parser actor
  private class ParserBackPressureProvider extends Actor {
    import context.dispatcher                        // ExecutionContext for the Future callbacks
    implicit val materializer = ActorMaterializer()  // needed to run the stream below
    private val parser = context.actorOf(Props[UserAgentParserActor])
    val queue = Source.queue[UserAgentParseRequest](100, OverflowStrategy.dropNew)
      .mapAsync(1)(request => (parser ? request.userAgent).mapTo[Option[UserAgentData]].map(_ -> request.p))
      .to(Sink.foreach {
        case (result, promise) => promise.success(result)
      })
      .run()
    override def receive: Receive = {
      case request: UserAgentParseRequest => queue.offer(request).map {
        case QueueOfferResult.Enqueued =>
        case _ => request.p.failure(new RuntimeException("parser busy"))
      }
    }
  }

  // Actor parser
  private class UserAgentParserActor extends Actor {
    private val up = new UserAgentParser(dbFilePath, true, 50000)
    override def receive: Receive = {
      case userAgent: String =>
        sender ! Try {
          // ... (the call to `up` is not shown)
        }.toOption.map(UserAgentData(userAgent, _))
    }
  }

  private case class UserAgentParseRequest(userAgent: String, p: Promise[Option[UserAgentData]])
  private val parserBackPressureProvider = actorRefFactory.actorOf(Props[ParserBackPressureProvider])
}
Do you have to use actors for this?
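It does not seem like you need all this complexity; Scala/Java has all the tools you need out of the box:
import java.util.concurrent.{LinkedBlockingQueue, RejectedExecutionException, ThreadPoolExecutor, TimeUnit}
import scala.concurrent.{ExecutionContext, Future}

class ParserFacade(parser: UserAgentParser, val capacity: Int = 100) {
  // One worker thread in front of a bounded queue: once `capacity` tasks are waiting,
  // new submissions are rejected with RejectedExecutionException.
  private implicit val ec = ExecutionContext.fromExecutor(
    new ThreadPoolExecutor(
      1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue[Runnable](capacity)
    )
  )

  def parse(ua: String): Future[Option[UserAgentData]] = try {
    Future(Some(UserAgentData(ua, parser.parseUa(ua))))
      .recover { case _ => None }
  } catch {
    case _: RejectedExecutionException =>
      Future.failed(new RuntimeException("parser is busy"))
  }
}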
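A variant of the same idea, sketched with only the standard library: depending on the Scala version, Future.apply may report a rejected task as a failed Future instead of throwing (I believe that is the case on 2.13), in which case the catch above never fires and the recover turns "busy" into None. Submitting a Runnable to the executor directly and completing a Promise avoids depending on that detail. SingleThreadedFacade and its parameters are hypothetical names; `work` stands in for the non-thread-safe parser call.

```scala
import java.util.concurrent.{LinkedBlockingQueue, RejectedExecutionException, ThreadPoolExecutor, TimeUnit}
import scala.concurrent.{Future, Promise}
import scala.util.Try

// Serializes all calls to `work` onto one thread; at most `capacity` calls wait in line.
class SingleThreadedFacade[A, B](work: A => B, capacity: Int) {
  // Default AbortPolicy: execute() throws RejectedExecutionException when the queue is full.
  private val executor = new ThreadPoolExecutor(
    1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue[Runnable](capacity))

  def submit(a: A): Future[B] = {
    val p = Promise[B]()
    try executor.execute(() => p.complete(Try(work(a))))
    catch {
      // Back pressure: fail fast instead of queueing without bound.
      case _: RejectedExecutionException =>
        p.failure(new RuntimeException("parser is busy"))
    }
    p.future
  }

  def shutdown(): Unit = executor.shutdown()
}
```

With capacity 2, the first task goes straight to the worker thread and the next two wait in the queue, so a fourth concurrent submission fails immediately.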
It's possible to create sources and sinks from actors using the Source.actorPublisher() and Sink.actorSubscriber() methods, respectively. But is it possible to create a Flow from an actor?
Conceptually there doesn't seem to be a good reason not to, given that an actor can implement both the ActorPublisher and ActorSubscriber traits, but unfortunately the Flow object doesn't have any method for doing this. In this excellent blog post it's done with an earlier version of Akka Streams, so the question is whether it's also possible in the latest (2.4.9) version.
I'm part of the Akka team and would like to use this question to clarify a few things about the raw Reactive Streams interfaces. I hope you'll find this useful.
In particular, we'll soon be publishing several posts on the Akka team blog about building custom stages, including Flows, so keep an eye on it.
Don't use ActorPublisher / ActorSubscriber
Please don't use ActorPublisher and ActorSubscriber. They're too low-level, and you might end up implementing them in a way that violates the Reactive Streams specification. They're a relic of the past, and even then they were meant for power users only. There really is no reason to use those classes nowadays. We never provided a way to build a Flow this way because the complexity would simply explode if it were exposed as a "raw" Actor API for you to implement while getting all the rules right.
If you really, really want to implement the raw Reactive Streams interfaces, then please use the specification's TCK to verify that your implementation is correct. You will likely be caught off guard by some of the more complex corner cases a Flow (or, in RS terminology, a Processor) has to handle.
Most operations are possible to build without going low-level
You should be able to build many flows simply by starting from a Flow[T] and adding the needed operations onto it, for example:
val newFlow: Flow[String, Int, NotUsed] = Flow[String].map(_.toInt)
This is a reusable description of the Flow.
Since you're asking about power-user mode, this is the most powerful operator on the DSL itself: statefulMapConcat. The vast majority of operations on plain stream elements can be expressed with it: Flow.statefulMapConcat[T](f: () ⇒ (Out) ⇒ Iterable[T]): Repr[T].
If you need timers, you could zip with a Source.tick, etc.
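To make the statefulMapConcat point concrete, here is a small sketch assuming Akka 2.6 (where an implicit ActorSystem provides the materializer; on 2.4.x you would create an ActorMaterializer explicitly). The factory function runs once per materialization, so each run of the stream gets fresh mutable state; returning an empty list drops an element, and returning several elements expands it. The object and flow names are illustrative.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Future

object StatefulMapConcatDemo {
  implicit val system: ActorSystem = ActorSystem("stateful-map-concat")

  // zipWithIndex: the counter lives inside the factory, so it is per materialization.
  val zipWithIndex: Flow[String, (String, Long), NotUsed] =
    Flow[String].statefulMapConcat { () =>
      var index = 0L
      elem => {
        val out = (elem, index)
        index += 1
        List(out)
      }
    }

  // Filtering with state: emit nothing when the element repeats the previous one.
  val dropConsecutiveDuplicates: Flow[String, String, NotUsed] =
    Flow[String].statefulMapConcat { () =>
      var last: Option[String] = None
      elem => {
        val out = if (last.contains(elem)) Nil else List(elem)
        last = Some(elem)
        out
      }
    }

  def run[A](xs: List[String], flow: Flow[String, A, NotUsed]): Future[Seq[A]] =
    Source(xs).via(flow).runWith(Sink.seq)
}
```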
GraphStage is the simplest and safest API to build custom stages
Instead, building Sources/Flows/Sinks has its own powerful and safe API: the GraphStage. Please read the documentation about building custom GraphStages (they can be a Sink/Source/Flow or even any arbitrary shape). It handles all of the complex Reactive Streams rules for you, while giving you full freedom and type-safety while implementing your stages (which could be a Flow).
For example, here is a GraphStage implementation of the filter(T => Boolean) operator, taken from the docs:
class Filter[A](p: A => Boolean) extends GraphStage[FlowShape[A, A]] {
  val in = Inlet[A]("Filter.in")
  val out = Outlet[A]("Filter.out")
  val shape = FlowShape.of(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) {
      setHandler(in, new InHandler {
        override def onPush(): Unit = {
          val elem = grab(in)
          if (p(elem)) push(out, elem)
          else pull(in)
        }
      })
      setHandler(out, new OutHandler {
        override def onPull(): Unit = {
          pull(in)
        }
      })
    }
}
It also handles asynchronous channels and is fusable by default.
In addition to the docs, these blog posts explain in detail why this API is the holy grail of building custom stages of any shape:
Akka team blog: Mastering GraphStages (part 1, introduction) - a high level overview
... tomorrow we'll publish one about its API as well ...
Kunicki blog: Implementing a Custom Akka Streams Graph Stage - another example implementing sources (really applies 1:1 to building Flows)
Konrad's solution demonstrates how to create a custom stage that utilizes Actors, but in most cases I think that is a bit overkill.
Usually you have some Actor that is capable of responding to questions:
val actorRef : ActorRef = ???
type Input = ???
type Output = ???
implicit val timeout : Timeout = ??? // the ask pattern (?) needs an implicit akka.util.Timeout
val queryActor : Input => Future[Output] =
  input => (actorRef ? input).mapTo[Output]
This can easily be combined with basic Flow functionality, which takes the maximum number of concurrent requests:
val actorQueryFlow : Int => Flow[Input, Output, _] =
(parallelism) => Flow[Input].mapAsync[Output](parallelism)(queryActor)
Now actorQueryFlow can be integrated into any stream...
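A concrete, runnable version of that pattern, assuming Akka 2.6 (implicit ActorSystem as materializer). Doubler is a hypothetical actor that answers each Int with its double; the Timeout value is an arbitrary choice. Note that mapAsync keeps up to `parallelism` asks in flight but still emits results in input order.

```scala
import akka.NotUsed
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.stream.scaladsl.{Flow, Sink, Source}
import akka.util.Timeout
import scala.concurrent.Future
import scala.concurrent.duration._

// Hypothetical actor that replies to every Int with its double.
class Doubler extends Actor {
  def receive: Receive = { case n: Int => sender() ! n * 2 }
}

object ActorQueryFlowDemo {
  implicit val system: ActorSystem = ActorSystem("actor-query-flow")
  implicit val timeout: Timeout = Timeout(3.seconds)

  private val doubler = system.actorOf(Props(new Doubler))

  // Input => Future[Output], as in the answer, with the ask made explicit.
  val queryActor: Int => Future[Int] = n => (doubler ? n).mapTo[Int]

  def actorQueryFlow(parallelism: Int): Flow[Int, Int, NotUsed] =
    Flow[Int].mapAsync(parallelism)(queryActor)

  def run(xs: List[Int]): Future[Seq[Int]] =
    Source(xs).via(actorQueryFlow(4)).runWith(Sink.seq)
}
```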
Here is a solution built using a graph stage. The actor has to acknowledge every message for back-pressure to work. The actor is notified when the stream fails or completes, and the stream fails when the actor terminates.
This can be useful if you don't want to use ask, e.g. when not every input message has a corresponding output message.
import akka.actor.{ActorRef, Status, Terminated}
import akka.stream._
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}

object ActorRefBackpressureFlowStage {
  case object StreamInit
  case object StreamAck
  case object StreamCompleted
  case class StreamFailed(ex: Throwable)
  case class StreamElementIn[A](element: A)
  case class StreamElementOut[A](element: A)
}

/**
 * Sends the elements of the stream to the given `ActorRef`, which sends back the back-pressure signal.
 * The first message is always `StreamInit`; the stream then waits for an acknowledgement message
 * (`StreamAck`) from the given actor, which means that it is ready to process elements.
 * It also requires a `StreamAck` after each stream element to make back-pressure work.
 * Stream elements are wrapped inside `StreamElementIn(elem)` messages.
 *
 * The target actor can emit elements at any time by sending a `StreamElementOut(elem)` message, which will
 * be emitted downstream when there is demand.
 *
 * If the target actor terminates, the stage will fail with a WatchedActorTerminatedException.
 * When the stream completes successfully, a `StreamCompleted` message will be sent to the destination actor.
 * When the stream completes with a failure, a `StreamFailed(ex)` message will be sent to the destination actor.
 */
class ActorRefBackpressureFlowStage[In, Out](private val flowActor: ActorRef) extends GraphStage[FlowShape[In, Out]] {
  import ActorRefBackpressureFlowStage._

  val in: Inlet[In] = Inlet("ActorFlowIn")
  val out: Outlet[Out] = Outlet("ActorFlowOut")

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new GraphStageLogic(shape) {

    private lazy val self = getStageActor {
      case (_, StreamAck) =>
        if (firstPullReceived) {
          if (!isClosed(in) && !hasBeenPulled(in)) {
            pull(in)
          }
        } else {
          pullOnFirstPullReceived = true
        }

      case (_, StreamElementOut(elemOut)) =>
        val elem = elemOut.asInstanceOf[Out]
        emit(out, elem)

      case (_, Terminated(targetRef)) =>
        failStage(new WatchedActorTerminatedException("ActorRefBackpressureFlowStage", targetRef))

      case (actorRef, unexpected) =>
        failStage(new IllegalStateException(s"Unexpected message: `$unexpected` received from actor `$actorRef`."))
    }

    var firstPullReceived: Boolean = false
    var pullOnFirstPullReceived: Boolean = false

    override def preStart(): Unit = {
      //initialize stage actor and watch flow actor.
      self.watch(flowActor)
      tellFlowActor(StreamInit)
    }

    setHandler(in, new InHandler {
      override def onPush(): Unit = {
        val elementIn = grab(in)
        tellFlowActor(StreamElementIn(elementIn))
      }

      override def onUpstreamFailure(ex: Throwable): Unit = {
        tellFlowActor(StreamFailed(ex))
        super.onUpstreamFailure(ex)
      }

      override def onUpstreamFinish(): Unit = {
        tellFlowActor(StreamCompleted)
        super.onUpstreamFinish()
      }
    })

    setHandler(out, new OutHandler {
      override def onPull(): Unit = {
        if (!firstPullReceived) {
          firstPullReceived = true
          if (pullOnFirstPullReceived) {
            if (!isClosed(in) && !hasBeenPulled(in)) {
              pull(in)
            }
          }
        }
      }

      override def onDownstreamFinish(): Unit = {
        tellFlowActor(StreamCompleted)
        super.onDownstreamFinish()
      }
    })

    private def tellFlowActor(message: Any): Unit = {
      flowActor.tell(message, self.ref)
    }
  }

  override def shape: FlowShape[In, Out] = FlowShape(in, out)
}
I'm looking for a simple way to start an external process and then write strings to its input and read its output.
In Python, this works:
mosesProcess = subprocess.Popen([mosesBinPath, '-f', mosesModelPath], stdin = subprocess.PIPE, stdout = subprocess.PIPE, stderr = subprocess.PIPE);
# ...
mosesAnswer = mosesProcess.stdout.readline().rstrip();
# ...
mosesAnswer = mosesProcess.stdout.readline().rstrip();
# ...
I think in Scala this should be done with scala.sys.process.ProcessBuilder and scala.sys.process.ProcessIO but I don't get how they work (especially the latter).
I have tried things like:
val inputStream = new scala.concurrent.SyncVar[java.io.OutputStream];
val outputStream = new scala.concurrent.SyncVar[java.io.InputStream];
val errStream = new scala.concurrent.SyncVar[java.io.InputStream];
val cmd = "myProc";
val pb = process.Process(cmd);
val pio = new process.ProcessIO(stdin => inputStream.put(stdin),
                                stdout => outputStream.put(stdout),
                                stderr => errStream.put(stderr));
pb.run(pio);
inputStream.get.write(("request1" + "\n").getBytes);
println(outputStream.get.read); // It is blocked here
inputStream.get.write(("request2" + "\n").getBytes);
But the execution gets stuck.
Granted, attrib below is not a great example for the write side of things. I have an EchoServer that would handle input/output:
import scala.sys.process._
import java.io._

object EchoClient {
  def main(args: Array[String]): Unit = {
    var bContinue = true
    var cmd = "C:\\windows\\system32\\attrib.exe"
    val process = Process(cmd)
    val io = new ProcessIO(
      writer,
      out => {scala.io.Source.fromInputStream(out).getLines().foreach(println)},
      err => {scala.io.Source.fromInputStream(err).getLines().foreach(println)})
    while (bContinue) {
      process run io
      var answer = scala.io.StdIn.readLine("Run again? (y/n)? ")
      if (answer == "n" || answer == "N")
        bContinue = false
    }
  }

  def reader(input: java.io.InputStream) = {
    // read here
  }

  def writer(output: java.io.OutputStream) = {
    // write here
  }

  // TODO: implement an error logger
}
Output:
A C:\dev\EchoClient$$anonfun$1.class
A C:\dev\EchoClient$$anonfun$2$$anonfun$apply$1.class
A C:\dev\EchoClient$$anonfun$2.class
A C:\dev\EchoClient$$anonfun$3$$anonfun$apply$2.class
A C:\dev\EchoClient$$anonfun$3.class
A C:\dev\EchoClient$.class
A C:\dev\EchoClient.class
A C:\dev\EchoClient.scala
A C:\dev\echoServer.bat
A C:\dev\EchoServerChg$$anonfun$main$1.class
A C:\dev\EchoServerChg$.class
A C:\dev\EchoServerChg.class
A C:\dev\EchoServerChg.scala
A C:\dev\ScannerTest$$anonfun$main$1.class
A C:\dev\ScannerTest$.class
A C:\dev\ScannerTest.class
A C:\dev\ScannerTest.scala
Run again? (y/n)?
Scala API for ProcessIO:
new ProcessIO(in: (OutputStream) ⇒ Unit, out: (InputStream) ⇒ Unit, err: (InputStream) ⇒ Unit)
Per that signature, you provide three functions: one that writes to the process (it receives the process's stdin as an OutputStream), one that reads the process's stdout (InputStream), and one that reads its stderr (InputStream).
For instance:
import java.io.{InputStream, OutputStream}
import scala.sys.process.ProcessIO

def readJob(in: InputStream): Unit = {
  // do something with in
}
def writeJob(out: OutputStream): Unit = {
  // do something with out
}
def errJob(err: InputStream): Unit = {
  // do something with err
}
val process = new ProcessIO(writeJob, readJob, errJob)
Please keep in mind that the streams are Java streams, so you will have to check the Java API.
Edit: the package page provides examples, maybe you could take a look at them.
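A minimal interactive sketch along those lines, assuming a Unix-like system where `cat` stands in for the question's myProc/moses binary. One likely reason the question's attempt hangs is that bytes written to the process's stdin are buffered on the JVM side until flushed; here a PrintWriter with autoFlush flushes on every println. InteractiveProcess and Handle are hypothetical names. Also note that the child process may buffer its own stdout when it is not a terminal, which would cause the same symptom.

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader, OutputStream, PrintWriter}
import scala.concurrent.SyncVar
import scala.sys.process.{Process, ProcessIO}

object InteractiveProcess {
  final case class Handle(process: Process, in: PrintWriter, out: BufferedReader)

  // Starts `cmd` and returns handles for writing its stdin and reading its stdout.
  def start(cmd: Seq[String]): Handle = {
    val stdin  = new SyncVar[OutputStream]
    val stdout = new SyncVar[InputStream]
    val io = new ProcessIO(
      in => stdin.put(in),    // each callback runs on its own thread; we only hand the stream over
      out => stdout.put(out),
      err => scala.io.Source.fromInputStream(err).getLines().foreach(line => System.err.println(line))
    )
    val process = Process(cmd).run(io)
    // autoFlush = true: every println is flushed, so the child actually receives the line.
    val writer = new PrintWriter(stdin.get, true)
    val reader = new BufferedReader(new InputStreamReader(stdout.get))
    Handle(process, writer, reader)
  }
}
```

Closing the writer closes the child's stdin, which lets a filter like cat exit.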
ProcessIO is the way to go for low-level control and input/output interaction. There is even an often-overlooked helper object, BasicIO, that assists with creating common ProcessIO instances for reading and for connecting input/output streams with utility functions. You can look at the source of BasicIO.scala to see how it creates ProcessIO instances internally.
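For the common read-only case, a short sketch using BasicIO(withIn, output, log), which builds a ProcessIO that feeds each stdout line to `output`. With withIn = false the child's stdin is closed right away, and with log = None stderr goes to the JVM's stderr. This assumes exitValue() also waits for the output-reading threads (as I understand the standard implementation does), so the collected lines are complete when it returns. BasicIODemo is a hypothetical name.

```scala
import scala.collection.mutable.ListBuffer
import scala.sys.process.{BasicIO, Process}

object BasicIODemo {
  // Runs `cmd` and returns its exit code together with the lines it printed on stdout.
  def linesOf(cmd: Seq[String]): (Int, List[String]) = {
    val lines = ListBuffer.empty[String]
    val io = BasicIO(withIn = false, line => lines += line, None)
    val exit = Process(cmd).run(io).exitValue()
    (exit, lines.toList)
  }
}
```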
You can sometimes find inspiration from test cases or tools created for the class itself by the project. In the case of Scala, have a look at the source on GitHub. We are fortunate in that there is a detailed example of ProcessIO being used for the scala GraphViz Dot process runner DotRunner.scala!
My current application is based on Akka 1.1. It has multiple ProjectAnalysisActors, each responsible for handling analysis tasks for a specific project. The analysis starts when such an actor receives a generic start message. After finishing one step, it sends itself a message with the next step, as long as one is defined. The executing code basically looks as follows:
sealed trait AnalysisEvent {
  def run(project: Project): Future[Any]
  def nextStep: AnalysisEvent = null
}

case class StartAnalysis() extends AnalysisEvent {
  override def run ...
  override def nextStep: AnalysisEvent = new FirstStep
}

case class FirstStep() extends AnalysisEvent {
  override def run ...
  override def nextStep: AnalysisEvent = new SecondStep
}

case class SecondStep() extends AnalysisEvent {
  override def run ...
}

class ProjectAnalysisActor(project: Project) extends Actor {
  def receive = {
    case event: AnalysisEvent =>
      val future = event.run(project)
      future.onComplete { f =>
        self ! event.nextStep
      }
  }
}
I'm having some difficulty implementing the run methods for each analysis step. At the moment I create a new future within each run method. Inside this future I send all follow-up messages to the different subsystems. Some of them are non-blocking fire-and-forget messages, but some return a result that should be stored before the next analysis step is started.
At the moment a typical run method looks as follows:
def run(project: Project): Future[Any] = {
  Future {
    progressActor ! typicalFireAndForget(project.name)
    val calcResult = (calcActor1 !! doCalcMessage(project)).getOrElse(...)
    val p: Project = ... // created updated project using calcResult
    val result = (storage !! updateProjectInformation(p)).getOrElse(...)
  }
}
Since these blocking messages should be avoided, I'm wondering whether this is the right way. Does it make sense to use them in this use case, or should I still avoid them? If so, what would be a proper solution?
Apparently the only purpose of the ProjectAnalysisActor is to chain future calls. Second, the run methods also seem to wait on results before continuing the computation.
So I think you can try refactoring your code to use Future Composition, as explained here: http://akka.io/docs/akka/1.1/scala/futures.html
def run(project: Project): Future[Any] = {
  progressActor ! typicalFireAndForget(project.name)
  for {
    calcResult <- calcActor1 !!! doCalcMessage(project)
    p = ... // created updated project using calcResult
    result <- storage !!! updateProjectInformation(p)
  } yield result
}
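The same composition, sketched with the standard library's scala.concurrent.Future rather than Akka 1.1's own Future and `!!!`, so the syntax differs from the code above. Project, calculate and store are hypothetical stand-ins for the question's model and subsystems. The second method shows how the step-by-step chaining the ProjectAnalysisActor does could be expressed as a fold over Futures, with no actor and no blocking.

```scala
import scala.concurrent.{ExecutionContext, Future}

object AnalysisPipeline {
  // Hypothetical stand-ins for the question's Project and subsystem calls.
  final case class Project(name: String, score: Int = 0)

  // Each subsystem call returns a Future instead of blocking with !!.
  def calculate(p: Project)(implicit ec: ExecutionContext): Future[Int] =
    Future(p.name.length)

  def store(p: Project)(implicit ec: ExecutionContext): Future[Project] =
    Future(p)

  // One analysis step: store starts only after calculate has completed.
  def analyse(project: Project)(implicit ec: ExecutionContext): Future[Project] =
    for {
      calcResult <- calculate(project)
      updated     = project.copy(score = calcResult)
      stored     <- store(updated)
    } yield stored

  // Chaining whole steps: step n + 1 starts when step n's Future completes.
  def runSteps[A](initial: A, steps: List[A => Future[A]])(implicit ec: ExecutionContext): Future[A] =
    steps.foldLeft(Future.successful(initial))((acc, step) => acc.flatMap(step))
}
```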