scala+jdbc+case class+actors design confusion - scala

I am writing an exporter that takes results from the database and writes each individual record to a comma-separated file. Each query gets its own worker, since each one needs to write a separate CSV file. To start off, I have decoupled the tasks into two different actors. Actor1 is a JdbcWorker, which queries the database given a query parameter, and Actor2 is a CSVWriter, which receives a case class representing a query result that needs to be appended to the CSV. My first question: I like the separation of concerns provided by these two workers, but is it good design to decouple the JDBC query from the CSV writer?
So, I have written actor1 as follows:
class DataQueryWorker(csvExporterWorker: ActorRef) extends Actor with ActorLogging {
  // Note: JDBC column indexes are 1-based, so the first column is 1, not 0.
  private implicit def ModelConverter(rs: ResultSet): QueryModel = {
    QueryModel(
      id = rs.getString(1),
      name = rs.getString(2),
      age = rs.getString(3),
      gender = rs.getString(4))
  }

  private def sendModelToCsvWorker(model: QueryModel): Unit = {
    csvExporterWorker ! model
  }

  private def startExport[T](queryString: String)(resultFunc: T => Unit)(implicit ModelConverter: ResultSet => T): Unit = {
    try {
      val connection = DriverManager.getConnection(DbConfig.connectionString,
        DbConfig.user,
        DbConfig.password)
      val statement = connection.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY)
      statement.setFetchSize(Integer.MIN_VALUE)
      val rs = statement.executeQuery(queryString)
      while (rs.next()) {
        resultFunc(ModelConverter(rs))
      }
    } catch {
      case e: Exception => //What to do in case of an exception???
    }
  }

  override def receive = {
    case startEvent => startExport(DbConfig.ModelExtractionQuery)(sendModelToCsvWorker)
  }
}
My next question: is the code above the proper way to query the database, wrap each row in a model, and send the result to the CSVWorker? I am not sure I am following Scala idioms properly. Also, what would be the proper way to handle exceptions in this case?
It would be great to get some guidance on this.
Thanks

I think your approach is ok with a couple of minor changes:
For the DB actor, you might want to look into making these long-lived actors, pooled behind a Router. Let this actor hold a Connection as its state, opening it once when started, and closing and reopening it on restart after a failure. I think this is a better approach, as you won't need to open a new connection for every export call. You just need to write some code to check the state of the connection (and reconnect if necessary) before making calls on it.
Once you make the DB actor stateful and long lived, you won't be able to pass the CSVWorker in via the constructor. You should instead pass it in via the message to this actor indicating that you want an export. You could do that via a case class like so:
case class ExportQuery(query:String, csvWorker:ActorRef)
Change your receive to look like this:
def receive = {
  case ExportQuery(query, csvWorker) =>
    ...
}
And lastly, remove the try/catch logic. Unless you can do something meaningful in response to the failure (like calling some alternate code path), it doesn't make sense to catch it. Let the actor fail and get restarted (closing and reopening the connection), and move on.

I think using actors here is probably overkill.
Actors are useful when you want to operate on mutable state with multiple threads safely. But in your case, you say that each query writes to a separate CSV file (so there's only one thread per CSV file). I don't think the CSVWorker actor is necessary. It could even be harmful: if the DBWorker is significantly faster than the CSVWorker, the actor mailbox could grow and consume a significant amount of memory.
Personally, I'd just call the CSV writer directly.
The question about separation of concerns depends on whether you expect this code to be re-used in unrelated contexts. If you're likely to want to use your JDBC worker with other writers, then it may be worth it (although there's a school of thought that says you're better off waiting until a need arises before refactoring - You Aint Gonna Need It or YAGNI). Otherwise, you might be better off simplifying.
If you do decide to attach the JDBC code to the CSV code directly, you might also want to take out the case class conversion. Again, if this is code that will be re-used elsewhere, then it's better to keep it.
Exception handling depends on your application, but in Scala (unlike in Java), if you don't know what to do about an Exception, you probably shouldn't do anything. Take the try..catch block out, and just let the exception propagate - something will catch it, and report it.
Java forces you to handle exceptions, which is a great idea in theory, but in practice often leads to error handling code that does nothing of any real use (either re-throwing, or worse, swallowing errors).
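As a rough illustration of calling the CSV writer directly and letting exceptions propagate, here is a minimal sketch. It assumes Scala 2.13+ (for scala.util.Using); QueryModel is a stand-in for the question's case class, and exportRows, toCsvLine and escape are hypothetical helper names, not part of any library. The writer is still closed if a write fails, but the exception is rethrown to the caller rather than swallowed.

```scala
import java.io.Writer
import scala.util.Using

object CsvExportSketch {
  // Stand-in for the question's QueryModel (assumed to hold four string fields).
  final case class QueryModel(id: String, name: String, age: String, gender: String)

  // Quote a field only when it contains a comma, a double quote, or a line break.
  def escape(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  def toCsvLine(m: QueryModel): String =
    Seq(m.id, m.name, m.age, m.gender).map(escape).mkString(",")

  // Writes every row and returns the row count. Using.resource closes the
  // writer afterwards and rethrows any exception instead of swallowing it.
  def exportRows(rows: Iterator[QueryModel], open: => Writer): Int =
    Using.resource(open) { out =>
      rows.foldLeft(0) { (count, row) =>
        out.write(toCsvLine(row))
        out.write("\n")
        count + 1
      }
    }
}
```

In the JDBC case, rows would be an iterator over the ResultSet, and open would create a FileWriter for the target file.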
Oh, and if you're writing a lot of code that turns ResultSets into case classes, and vice versa, you might want to look at using an Object Relation Mapping framework, like Slick or Squeryl. They're optimised for precisely this use case.

Related

Looking for something like a TestFlow analogous to TestSink and TestSource

I am writing a class that takes a Flow (representing a kind of socket) as a constructor argument and that allows sending messages and waiting for the respective answers asynchronously by returning a Future. Example:
class SocketAdapter(underlyingSocket: Flow[String, String, _]) {
  def sendMessage(msg: MessageType): Future[ResponseType]
}
This is not necessarily trivial because there may be other messages in the socket stream that are irrelevant, so some filtering is required.
In order to test the class I need to provide something like a "TestFlow" analogous to TestSink and TestSource. In fact I can create a flow by combining both. However, the problem is that I only obtain the actual probes upon materialization and materialization happens inside the class under test.
The problem is similar to the one I described in this question. My problem would be solved if I could materialize the flow first and then pass it to a client to connect to it. Again, I'm thinking about using MergeHub and BroadcastHub and again I see the problem that the resulting stream would behave differently because it is not linear anymore.
Maybe I misunderstood how a Flow is supposed to be used. In order to feed messages into the flow when sendMessage() is called, I need a certain kind of Source anyway. Maybe a Source.actorRef(...) or Source.queue(...), so I could pass in the ActorRef or SourceQueue directly. However, I'd prefer if this choice was up to the SocketAdapter class. Of course, this applies to the Sink as well.
It feels like this is a rather common case when working with streams and sockets. If it is not possible to create a "TestFlow" like I need it, I'm also happy with some advice on how to improve my design and make it better testable.
Update: I browsed through the documentation and found SourceRef and SinkRef. It looks like these could solve my problem but I'm not sure yet. Is it reasonable to use them in my case or are there any drawbacks, e.g. different behaviour in the test compared to production where there are no such refs?
Indirect Answer
The nature of your question suggests a design flaw which you are bumping into at testing time. The answer below does not address the issue in your question, but it demonstrates how to avoid the situation altogether.
Don't Mix Business Logic with Akka Code
Presumably you need to test your Flow because you have mixed a substantial amount of logic into the materialization. Let's assume you are using raw sockets for your IO. Your question suggests that your flow looks like:
val socketFlow : Flow[String, String, _] = {
  val socket = new Socket(...)
  //business logic for IO
}
You need a complicated test framework for your Flow because your Flow itself is also complicated.
Instead, you should separate out the logic into an independent function that has no akka dependencies:
type MessageProcessor = MessageType => ResponseType

object BusinessLogic {
  val createMessageProcessor : (Socket) => MessageProcessor = { socket =>
    //business logic for IO
    ???
  }
}
Now your flow can be very simple:
val socket : Socket = new Socket(...)
val socketFlow = Flow[MessageType].map(BusinessLogic.createMessageProcessor(socket))
As a result: your unit testing can exclusively work with createMessageProcessor, there's no need to test akka Flow because it is a simple veneer around the complicated logic that is tested independently.
Don't Use Streams For Concurrency Around 1 Element
The other big problem with your design is that SocketAdapter is using a stream to process just 1 message at a time. This is incredibly wasteful and unnecessary (you're trying to kill a mosquito with a tank).
Given the separated business logic your adapter becomes much simpler and independent of akka:
class SocketAdapter(messageProcessor : MessageProcessor) {
  def sendMessage(msg: MessageType): Future[ResponseType] = Future {
    messageProcessor(msg)
  }
}
Note how easy it is to use Future in some instances and Flow in other scenarios depending on the need. This comes from the fact that the business logic is independent of any concurrency framework.
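To make the testing benefit concrete, here is a minimal sketch of the adapter exercised with a stub processor and no Akka on the classpath. MessageType and ResponseType are assumed to be String here purely for illustration, and echoUpper is a hypothetical stub standing in for the real socket-backed logic.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ProcessorSketch {
  // Assumption: both message types are plain strings in this sketch.
  type MessageType = String
  type ResponseType = String
  type MessageProcessor = MessageType => ResponseType

  // Same shape as the adapter above, with the ExecutionContext made explicit.
  class SocketAdapter(messageProcessor: MessageProcessor)(implicit ec: ExecutionContext) {
    def sendMessage(msg: MessageType): Future[ResponseType] = Future {
      messageProcessor(msg)
    }
  }

  // Hypothetical stub used in place of the socket-backed processor in tests.
  val echoUpper: MessageProcessor = _.toUpperCase
}
```

A unit test can then construct new SocketAdapter(echoUpper) and await the returned Future without materializing any stream.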
This is what I came up with using SinkRef and SourceRef:
object TestFlow {
  def withProbes[In, Out](implicit actorSystem: ActorSystem,
                          actorMaterializer: ActorMaterializer)
  : (Flow[In, Out, _], TestSubscriber.Probe[In], TestPublisher.Probe[Out]) = {
    // Keep.both must stay on the same line as the call it belongs to.
    val f = Flow.fromSinkAndSourceMat(TestSink.probe[In], TestSource.probe[Out])(Keep.both)
    val ((sinkRefFuture, (inProbe, outProbe)), sourceRefFuture) =
      StreamRefs.sinkRef[In]()
        .viaMat(f)(Keep.both)
        .toMat(StreamRefs.sourceRef[Out]())(Keep.both)
        .run()
    val sinkRef = Await.result(sinkRefFuture, 3.seconds)
    val sourceRef = Await.result(sourceRefFuture, 3.seconds)
    (Flow.fromSinkAndSource(sinkRef, sourceRef), inProbe, outProbe)
  }
}
This gives me a flow I can completely control with the two probes but I can pass it to a client that connects source and sink later, so it seems to solve my problem.
The resulting Flow should only be used once, so it differs from a regular Flow that is rather a flow blueprint and can be materialized several times. However, this restriction applies to the web socket flow I am mocking anyway, as described here.
The only issue I still have is that some warnings are logged when the ActorSystem terminates after the test. This seems to be due to the indirection introduced by the SinkRef and SourceRef.
Update: I found a better solution without SinkRef and SourceRef by using mapMaterializedValue():
def withProbesFuture[In, Out](implicit actorSystem: ActorSystem,
                              ec: ExecutionContext)
: (Flow[In, Out, _],
   Future[(TestSubscriber.Probe[In], TestPublisher.Probe[Out])]) = {
  val (sinkPromise, sourcePromise) =
    (Promise[TestSubscriber.Probe[In]](), Promise[TestPublisher.Probe[Out]]())
  val flow =
    Flow
      .fromSinkAndSourceMat(TestSink.probe[In], TestSource.probe[Out])(Keep.both)
      .mapMaterializedValue { case (inProbe, outProbe) =>
        sinkPromise.success(inProbe)
        sourcePromise.success(outProbe)
        ()
      }
  val probeTupleFuture = sinkPromise.future
    .flatMap(sink => sourcePromise.future.map(source => (sink, source)))
  (flow, probeTupleFuture)
}
When the class under test materializes the flow, the Future is completed and I receive the test probes.

Akka Ask Pattern with many types of responses

I am writing a program that has to interact with a library that was implemented using Akka. In detail, this library exposes an Actor as endpoint.
As far as I know, and as explained in the book Applied Akka Patterns, the best way to interact with an Actor system from the outside is to use the Ask Pattern.
The library I have to use exposes an actor Main that accepts a Create message. In response to this message, it can respond with two different messages to the caller, CreateAck and CreateNack(error).
The code I am using is more or less the following.
implicit val timeout = Timeout(5 seconds)
def create() = (mainActor ? Create).mapTo[???]
The problem is clearly that I do not know which type to use in the mapTo function in place of ???.
Am I using the right approach? Is there any other useful pattern for accessing an Actor System from an outside program that does not use Actors?
In general it's best to leave Actors to talk between Actors, you'd simply receive a response then - simple.
If you indeed have to integrate them with the "outside", the ask pattern is fine indeed. Please note though that if you're doing this inside an Actor, this perhaps isn't the best way to go about it.
If there's a number of unrelated response types I'd suggest:
(1) Introduce a common type; this can be as simple as:
sealed trait CreationResponse
final case object CreatedThing extends CreationResponse
final case class FailedCreationOfThing(t: Throwable) extends CreationResponse
final case class SomethingElse...(...) extends CreationResponse
which makes the protocol understandable, and trackable. I recommend this as it's explicit and helps in understanding what's going on.
(2) For completely unrelated types simply collecting over the future would work by the way, without doing the mapTo:
val res: Future[...] = (bob ? CreateThing) collect {
  case t: ThatWorked => t // or transform it
  case nope: Nope => nope // or transform it to a different value
}
This would work fine type-wise if the results t and nope have a common supertype; that type would then be the ... in the result Future. If a message comes back that does not match any case, the resulting Future fails with a NoSuchElementException; you could add a case _ => whatever to handle that, or treat the failure as pointing to a programming error.
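The collect approach can be sketched without an actor system, since the ask result is just a Future[Any]. In the sketch below, CreateAck and CreateNack are local stand-ins for the library's messages (the error is assumed to be a String), and interpret is a hypothetical helper name. Mapping both replies into an Either gives the caller a single, explicit result type.

```scala
import scala.concurrent.{ExecutionContext, Future}

object AskReplySketch {
  // Local stand-ins for the library's reply messages named in the question.
  case object CreateAck
  final case class CreateNack(error: String)

  // `reply` stands in for `mainActor ? Create`, whose type is Future[Any].
  // Any message not covered by a case fails the resulting Future with
  // a NoSuchElementException.
  def interpret(reply: Future[Any])(implicit ec: ExecutionContext): Future[Either[String, Unit]] =
    reply.collect {
      case CreateAck         => Right(())
      case CreateNack(error) => Left(error)
    }
}
```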
See if CreateAck or CreateNack(error) inherit from any sort of class or object. If that's the case, you can use the parent class or object in the .mapTo[CreateResultType].
Another solution is to use .mapTo[Any] and use a match case to find the resulting type.

Scala (method / function).hashCode static value?

I was curious whether the hashCode() function always returns the same value for a lambda (or similar) in Scala.
My tests have shown to me some static value that does not change even over builds. Is this intended behavior or may it change in future?
If this was some static behavior it would help me a lot building my library.
EDIT:
Let's take this source code:
object Main {
  def main(args: Array[String]): Unit = {
    val x = (s: String) => 1
    val y = (s: String) => 2
    println(x.hashCode())
    println(y.hashCode())
  }
}
Its output on the console is, for me, always 1792393294 and 226170135.
What I'm currently doing is implementing a parser combinator library which I implemented in several languages. And I need to know when wrapper classes are the same (e.g. the underlying functions are the same) so I can implement something like a call stack, which I need to parse as far as possible on failure but prevent endless recursion on error.
Thanks in advance!
The default implementation of hashCode is an identity hash: it does not depend on the object's contents but on JVM-internal state (it is commonly described as being derived from the object's initial memory address, though the exact mechanism is JVM-specific). So if your program has constructed exactly the same objects in exactly the same order before constructing that object, it will in practice return the same value every time.
But this is not at all reliable; most programs do not do exactly the same things every time they run. As soon as you're doing something like responding to user input, that perfect reproducibility will disappear - e.g. maybe you sometimes add enough entries to a HashMap to trigger an enlargement of the table, and sometimes not. And if you construct the same value later in the program, it will of course have a different address; try doing
val z = (s: String) => 1
and observe that it will have a different hashCode from x. Not to mention that the numbers may well be different across different JVMs, different versions of the same JVM, or even when the same JVM is launched with a different -Xms setting.
Computers are often a lot more deterministic in practice than in theory. But this is not the kind of thing that's specified to happen, and certainly not something to rely on in your programs.
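A small sketch of the underlying point: function values do not override equals or hashCode, so both are identity-based, and two lambdas with the same body are still different objects. The object and value names here are purely illustrative.

```scala
object FunctionIdentitySketch {
  val x = (s: String) => 1
  val z = (s: String) => 1 // same body as x, but a separate object

  // Equality on function values is reference equality.
  val sameInstance: Boolean = x == x // an object always equals itself
  val sameBody: Boolean = x == z     // distinct objects, so not equal
}
```

So "the underlying functions are the same" can only be detected when the very same function object is reused, not by comparing hash codes of functions that merely look alike.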

Akka actor forward message with continuation

I have an actor which takes the result from another actor and applies some check on it.
class Actor1(actor2: ActorRef) extends Actor {
  def receive = {
    case SomeMessage =>
      val r = actor2 ? NewMessage()
      r.map(someTransform).pipeTo(sender)
  }
}
Now, if I make an ask of Actor1, two futures are generated, which doesn't seem overly efficient. Is there a way to provide a forward with some kind of continuation, or some other approach I could use here?
case SomeMessage => actor2.forward(NewMessage, someTransform)
Futures are executed in an ExecutionContext, which is similar to a thread pool. Creating a new future is not as expensive as creating a new thread, but it has a cost. The best way to work with futures is to create as many as needed and compose them so that things that can be computed in parallel are computed in parallel when the necessary resources are available. This way you will make the best use of your machine.
You mentioned that akka documentation discourages excessive use of futures. I don't know where you read this, but what I think it means is to prefer transforming futures rather than creating your own. This is exactly what you are doing by using map. Also, it may mean that if you create a future where it is not needed you are adding unnecessary overhead.
In your case you have a call that returns a future, and you need to apply someTransform and return the result. Using map is the way to go.
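As a minimal sketch of that recommendation: map does not start new work on its own; it registers a transformation that runs once the original future completes. Below, the reply parameter stands in for the typed result of the ask, and the body of someTransform is invented for illustration, since the question does not show it.

```scala
import scala.concurrent.{ExecutionContext, Future}

object MapReplySketch {
  // Hypothetical transformation; the question does not show its body.
  def someTransform(n: Int): String = s"checked:$n"

  // `reply` stands in for something like `(actor2 ? NewMessage()).mapTo[Int]`.
  // The returned Future completes with the transformed value of `reply`.
  def transformed(reply: Future[Int])(implicit ec: ExecutionContext): Future[String] =
    reply.map(someTransform)
}
```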

Writing applications with Scala actors in practice

I've now written a few applications using scala actors and I'm interested in how people have approached or dealt with some of the problems I've encountered.
A plethora of Message classes or !?
I have an actor which reacts to a user operation and must cause something to happen. Let's say it reacts to a message UserRequestsX(id). A continuing problem I have is that, because I want to modularize my programs, a single actor on its own is unable to complete the action without involving other actors. For example, suppose I need to use the id parameter to retrieve a bunch of values and then these need to be deleted via some other actor. If I were writing a normal Java program, I might do something like:
public void reportTrades(Date date) {
    Set<Trade> trades = persistence.lookup(date);
    reportService.report(trades);
}
Which is simple enough. However, using actors this becomes a bit of a pain because I want to avoid using !?. One actor reacts to the ReportTrades(date) message but it must ask a PersistenceActor for the trades and then a ReportActor to report them. The only way I've found of doing this is to do:
react {
  case ReportTrades(date) =>
    persistenceActor ! GetTradesAndReport(date)
}
So that in my PersistenceActor I have a react block:
react {
  case GetTradesAndReport(date) =>
    val ts = trades.get(date) //from persistent store
    reportActor ! ReportTrades(ts)
}
But now I have 2 problems:
I have to create extra message classes to represent the same request (i.e. "report trades"). In fact I have three in this scenario but I may have many more - it becomes a problem keeping track of these
What should I call the first and third message ReportTrades? It's confusing to call them both ReportTrades (or if I do, I must put them in separate packages). Essentially there is no such thing as overloading a class by val type.
Is there something I'm missing? Can I avoid this? Should I just give up and use !? Do people use some organizational structure to clarify what is going on?
To me, your ReportTrades message is mixing two different concepts. One is a Request, the other is a Response. They might be named GetTradesReport(Date) and SendTradesReport(List[Trade]), for example. Or, maybe, ReportTradesByDate(Date) and GenerateTradesReport(List[Trade]).
Are there some objections to using reply? Or passing trades around? If not, your code would probably look like
react {
  case ReportTrades(date) => persistenceActor ! GetTrades(date)
  case Trades(ts) => // do something with trades
}
and
react {
  case GetTrades(date) => reply(Trades(trades.get(date)))
}
respectively.
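The request/response naming suggested above can be sketched as plain data types, independent of any actor library. In this sketch the Trade fields, the in-memory Map standing in for the persistent store, and the handle function (standing in for the persistence actor's react block) are all assumptions made for illustration.

```scala
import java.time.LocalDate

object TradeProtocolSketch {
  // Assumed shape of a trade; the original does not define it.
  final case class Trade(id: Int, date: LocalDate)

  // Requests and responses get distinct, descriptive names.
  sealed trait PersistenceRequest
  final case class GetTrades(date: LocalDate) extends PersistenceRequest
  final case class Trades(trades: List[Trade])

  // Stand-in for the persistence actor's react block: one request in, one reply out.
  def handle(store: Map[LocalDate, List[Trade]])(msg: PersistenceRequest): Trades =
    msg match {
      case GetTrades(date) => Trades(store.getOrElse(date, Nil))
    }
}
```

Grouping the requests under a sealed trait also lets the compiler warn when a handler forgets a message type, which helps with keeping track of a growing set of message classes.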