How To Mock Out KafkaProducer Used Inside a Scala Class - scala

I want to write a unit test for a Scala class. The purpose of the class is to collect metrics and post them on a Kafka topic. I am trying to mock the producer in the unit test to ensure sanity of the rest of the code. Below is a simplified version of my class:
class MyEmitter(sparkConf: SparkConf) {
  <snip> -- member variables
  private val kafkaProducer = createProducer()
  def createProducer(): Producer[String, MyMetricClass] = {
    val props = new Properties()
    ...
    Code to initialize properties
    ...
    new KafkaProducer[String, MyMetricClass](props)
  }
  def initEmitter(metricName: String): SomeClass = {
    // Some implementation
  }
  def collect(key: String, value: String): Unit = {
    // Some implementation
  }
  def emit(): Unit = {
    val record = new ProducerRecord("<topic name>", "<key>", "<value>")
    kafkaProducer.send(record)
  }
}
What I would like to do in my unit test is mock out the producer and check whether send() has been called and, if so, whether the producer record matches the expectation. I have been unable to find a solution on my own, and searching online has not turned up anything either. If anyone knows how this problem could be solved, I would be most grateful.

'new' is generally an enemy of testing, so you should extract the creation of that object so that you can pass in either a real KafkaProducer or a mock.
One way to do it without changing the interface could be:
def createProducer(
  producer: Properties => Producer[String, MyMetricClass] = props => new KafkaProducer[String, MyMetricClass](props)
): Producer[String, MyMetricClass] = {
  val props = new Properties()
  producer(props)
}
So in real code you keep calling
myEmitter.createProducer()
But in a test you'd do
val producerMock = mock[KafkaProducer[String, MyMetricClass]]
myEmitter.createProducer(_ => producerMock)
Another good thing about this is that you can also stub the function itself, so you can verify that the props your method creates are the expected ones.
Hope it helps
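One caveat worth noting: in the question's class the kafkaProducer field is initialised by calling createProducer() with no arguments, so for the mock to be the instance that emit() actually uses, the factory has to reach the class through its constructor. Below is a minimal, dependency-free sketch of that idea. RecordSender, Emitter, RecordingSender and EmitterCheck are hypothetical stand-ins invented for illustration (not Kafka types), and the property value and topic/key/value strings are placeholders; with kafka-clients on the classpath the same shape should apply to the real Producer interface.

```scala
import java.util.Properties
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for Kafka's Producer, so the sketch runs without kafka-clients.
trait RecordSender {
  def send(topic: String, key: String, value: String): Unit
}

// The factory is a constructor parameter, so the private field is built
// from whatever the caller supplies (real producer or fake).
class Emitter(makeSender: Properties => RecordSender) {
  private val sender: RecordSender = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092") // placeholder value
    makeSender(props)
  }

  def emit(): Unit = sender.send("metrics", "some-key", "some-value")
}

// Hand-written fake that records every call instead of talking to a broker.
class RecordingSender extends RecordSender {
  val sent: ListBuffer[(String, String, String)] = ListBuffer.empty
  def send(topic: String, key: String, value: String): Unit =
    sent += ((topic, key, value))
}

object EmitterCheck {
  // Returns the Properties the factory received and the records that were sent.
  def run(): (Option[Properties], List[(String, String, String)]) = {
    var seenProps: Option[Properties] = None
    val fake = new RecordingSender
    val emitter = new Emitter({ props => seenProps = Some(props); fake })
    emitter.emit()
    (seenProps, fake.sent.toList)
  }
}
```

Because the factory is just a function, the test can both capture the Properties it was handed and inspect what was sent, without any mocking library.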

Related

Which design pattern is my Scala application using?

I have a Scala app with a trait that implements some function(s) and a class that extends that trait.
The class also has a function that calls the function defined in the parent trait, passing its own parameter along.
I observed this in a Spark + Kafka implementation using Scala. I'm guessing this is some kind of design pattern, but I don't know which one. Is it the Cake Pattern? The Dependency Injection Pattern? Or something else?
Below is the code I'm referring to:
trait SparkApplication {
  def sparkConfig: Map[String, String]
  def withSparkContext(f: SparkContext => Unit): Unit = {
    val conf = new SparkConf()
    sparkConfig.foreach { case (k, v) => conf.setIfMissing(k, v) }
    val sc = new SparkContext(conf)
    f(sc)
  }
}
trait SparkStreamingApplication extends SparkApplication {
  def withSparkStreamingContext(f: (SparkContext, StreamingContext) => Unit): Unit = {
    withSparkContext { sc =>
      val ssc = new StreamingContext(sc, Seconds(streamingBatchDuration.toSeconds))
      ssc.checkpoint(streamingCheckpointDir)
      f(sc, ssc)
      ssc.start()
      ssc.awaitTermination()
    }
  }
}
What is being used here (albeit with a possible error) is the so-called Loan Pattern. It gets its name because it is useful when you want to manage the lifecycle of a resource (in your case a SparkContext) while letting the user define how the resource is going to be used.
A classic example of this is files: you want to open a file, read its contents and then close it as soon as you are done, without giving the user the chance to forget to close the resource. You may implement this as follows:
import scala.io.Source
// Read the file at `path` and pass an iterator over its lines to `f`
def consume[A](path: String)(f: Iterator[String] => A): A = {
  val source = Source.fromFile(path)
  try {
    f(source.getLines())
  } finally {
    source.close()
  }
}
Then you'd use this as follows (in the example, to just print all the lines paired with their numbers):
consume("/path/to/some/file")(_.zipWithIndex.foreach(println))
As you may notice, there is something very close to this going on in your code, with the only difference that the resource whose lifecycle you are managing is a SparkContext.
Regarding the possible error I mentioned initially: you are loaning a SparkContext that you never close. That is probably OK, but the whole point of the Loan Pattern is to minimize the error surface when managing resources. You may be interested in doing something like the following (note the finally block, which stops the context even if f throws):
def withSparkContext(f: SparkContext => Unit): Unit = {
  val conf = new SparkConf()
  sparkConfig.foreach { case (k, v) => conf.setIfMissing(k, v) }
  val sc = new SparkContext(conf)
  try {
    f(sc)
  } finally {
    sc.stop() // shut down the context after the user is done
  }
}
You may read more regarding this pattern here.
As a side note, you may be interested in this project that creates a very nice and idiomatic interface around managed resources.
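If you are on Scala 2.13 or later, the standard library's scala.util.Using covers the same ground for anything that is AutoCloseable. The following is only a rough sketch of the Loan Pattern built on it; DemoResource and LoanDemo are hypothetical names made up for illustration, standing in for a real resource such as a file handle.

```scala
import scala.util.Using

// Hypothetical resource standing in for a file handle or similar.
final class DemoResource extends AutoCloseable {
  var closed: Boolean = false
  def read(): String = "payload"
  override def close(): Unit = { closed = true }
}

object LoanDemo {
  // Loan the resource to `f`; Using.resource closes it afterwards,
  // including when `f` throws.
  def withResource[A](resource: DemoResource)(f: DemoResource => A): A =
    Using.resource(resource)(f)
}
```

A call such as LoanDemo.withResource(new DemoResource)(_.read()) hands the resource to the function and closes it on the way out, which is the same contract as the consume method above.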

How to attach a HashMap to a Configuration object in Flink?

I want to share a HashMap across every node in Flink and allow the nodes to update that HashMap. I have this code so far:
object ParallelStreams {
  val env = StreamExecutionEnvironment.getExecutionEnvironment
  //Is there a way to attach a HashMap to this config variable?
  val config = new Configuration()
  config.setClass("HashMap", Class[CustomGlobal])
  env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
  class CustomGlobal extends ExecutionConfig.GlobalJobParameters {
    override def toMap: util.Map[String, String] = {
      new HashMap[String, String]()
    }
  }
  class MyCoMap extends RichCoMapFunction[String, String, String] {
    var users: HashMap[String, String] = null
    //How do I get access the HashMap I attach to the global config here?
    override def open(parameters: Configuration): Unit = {
      super.open(parameters)
      val globalParams = getRuntimeContext.getExecutionConfig.getGlobalJobParameters
      val globalConf = globalParams[Configuration]
      val hashMap = globalConf.getClass
    }
    //Other functions to override here
  }
}
I was wondering if you can attach a custom object to the config variable created here val config = new Configuration()? (Please see comments in the code above).
I noticed you can only attach primitive values. I created a custom class that extends ExecutionConfig.GlobalJobParameters and attached that class by doing config.setClass("HashMap", Class[CustomGlobal]) but I am not sure if that is how you are supposed to do it?
The common way to distribute parameters to operators is to have them as regular member variables in the function class. The function object that is created and assigned during plan construction is serialized and shipped to all workers. So you don't have to pass parameters via a configuration.
This would look as follows:
class MyMapper(map: HashMap[String, String]) extends MapFunction[String, String] {
  // class definition
}
val inStream: DataStream[String] = ???
val myHashMap: HashMap[String, String] = ???
val myMapper: MyMapper = new MyMapper(myHashMap)
val mappedStream: DataStream[String] = inStream.map(myMapper)
The myMapper object is serialized (using Java serialization) and shipped for execution. So the type of map must implement the Java Serializable interface.
EDIT: I missed the part where you want the map to be updatable from all parallel tasks. That is not possible with Flink. You would have to either fully replicate the map and all updates (by broadcasting) or use an external system (a key-value store) for that.

Injecting Play Framework dependencies into a Scala object using MacWire traits fails

Let's say I have a bunch of car objects in my project, for example:
object Porsche extends Car {
  override def start() {...}
  override def canStart(fuelInLitr: Int) = fuelInLitr > 5
  override val fuelInLitr = 45
  override val carId = 1234567
}
I'm extending Car, which is just a trait that defines the structure of a car:
trait Car {
  def start(): Unit
  def canStart(fuel: Double): Boolean
  val fuelInLitr: Int
  val carId: Int
}
Now, in the start() method I want to use an API service that gives me a car key based on the car's id, so that I can start the car.
So I have this CarApiService:
class CarApiService(wsClient: WSClient, configuration: Configuration) {
  implicit val formats: Formats = DefaultFormats
  def getCarkey(carId: String): Future[Option[CarKey]] = {
    val carInfoServiceApi = s"${configuration.get[String]("carsdb.carsInfo")}?carId=$carId"
    wsClient.url(carInfoServiceApi).withHttpHeaders(("Content-Type", "application/json")).get.map { response =>
      response.status match {
        case Status.OK => Some(parse(response.body).extract[CarKey])
        case Status.NO_CONTENT => None
        case _ => throw new Exception(s"carsdb failed to perform operation with status: ${response.status}, and body: ${response.body}")
      }
    }
  }
}
I want to be able to use getCarkey() in my car objects, so I created a CarsApiServicesModule that gives me access to the carApiService so I can use its methods:
trait CarsApiServicesModule {
  /// this supplies the carApiService with its configuration dependency
  lazy val configuration: Config = ConfigFactory.load()
  lazy val conf: Configuration = wire[Configuration]
  /// this supplies the carApiService with its WSClient dependency
  lazy val wsc: WSClient = wire[WSClient]
  lazy val carApiService: CarApiService = wire[CarApiService]
}
Now I want to mix this trait into my car object like this:
object Porsche extends Car with CarsApiServicesModule {
  // here I want to use myApiService
  // for example: carApiService.getCarkey(carId)...
}
but when compiling this I get this error:
Does anyone know what the issue is?
Also, does this design make sense?
You need to keep in mind that wire is just a helper macro that tries to generate the code for creating a new instance: it's quite dumb, in fact. Here, it would try to create a new instance of WSClient.
However, not all objects can be instantiated with a simple new call; sometimes you need to invoke a factory method.
In this case, if you take a look at the readme on GitHub, you'll see that to instantiate the WSClient you need to create it through the StandaloneAhcWSClient() object.
So wire won't help you here, and you'll need to write the initialisation code by hand. Luckily it's not too large.

Pass an actor as parameter

For example, I have
trait Logger {
  def log(m: Message) = ???
}
If I want to pass a Logger as a parameter, I would simply declare def foo(l: Logger).
But if I have
trait Logger {
  def log(m: Message) = ???
  def receive = {
    case m: Message => log(m)
  }
}
then I must pass an ActorRef to be able to call ! or ? on it. But that completely breaks type safety: the signature becomes def log(ar: ActorRef), and nothing says that there should be a Logger underneath.
So, in the end I'd like to call
class Foo(logger: Logger) {
  def foo(m: Message) = logger ! m
}
Well, yes. That's just how Akka works, because actors can change behavior. See How to restrict actor messages to specific types?.
Solution 1: define a wrapper class like
case class LoggerActor(ar: ActorRef) {
  def log(m: Message) = ar ! m
}
And then use LoggerActor where you want to make sure you have a Logger. Of course, if you want to do it everywhere, you'll probably want to automate creation of these classes somehow.
Solution 2: Use Akka Typed. Note it's experimental. I won't copy the documentation from there.
Solution 3: Use Scalaz actors instead, which were typed from the beginning. See https://github.com/scalaz/scalaz/blob/series/7.3.x/tests/src/test/scala/scalaz/concurrent/ActorTest.scala for usage examples. For your case it could be
val logger = new Logger { ... }
val loggerActor = new Actor[Message](logger.log)
You can specify that your ActorRef should implement Logger too:
def log(ar: ActorRef with Logger)

Akka Stream and HTTP Scala: How to send Messages to an Actor from a Route

I'm playing with akka-stream-and-http-experimental 1.0. So far, I have a user service that can accept and respond to HTTP requests. I'm also going to have an appointment service that manages appointments. To make an appointment, one must be an existing user, so the appointment service will check with the user service whether the user exists. This could obviously be done over HTTP, but I'd rather have the appointment service send a message to the user service. Being new to this, I'm not clear on how to use actors (as akka-http abstracts them away) to send and receive messages. The docs mention ActorRef and ActorPublisher, but there are no examples of the former and the latter looks like overkill for my needs.
My code looks like the following and is on Github:
trait UserReadResource extends ActorPlumbing {
  val userService: UserService
  val readRoute = {
    // route stuff
  }
}
trait ActorPlumbing {
  implicit val system: ActorSystem
  implicit def executor: ExecutionContextExecutor
  implicit val materializer: Materializer
  def config: Config
  val logger: LoggingAdapter
}
trait UserService { // Implemented by Slick and MongoDB in the backend
  def findByFirstName(firstName: String): Future[immutable.Seq[User]]
}
object UserApp extends App with UserReadResource with UserWriteResource with ActorPlumbing {
  override implicit val system = ActorSystem()
  override implicit def executor = system.dispatcher
  override implicit val materializer = ActorMaterializer()
  override def config = ConfigFactory.load()
  override val logger = Logging(system, getClass)
  private val collection = newCollection("users")
  val userRepository = new MongoDBUserRepository(collection)
  val userService: UserService = new MongoDBUserRepositoryAdapter(userRepository) with UserBusinessDelegate {
    // implicitly finds the executor in scope. Ain't that cute?
    override implicit def executor = implicitly
  }
  Http().bindAndHandle(readRoute ~ writeRoute, config.getString("http.interface"), config.getInt("http.port"))
}
Edit:
I figured out how to send messages, which can be done using Source.actorRef. That only emits the messages into the stream. What I'd like is for the route handler class to receive the response. That way, when I create the appointment service, its actor can call the user service actor and receive the response in the same manner as the user route handler in my example does.
Pseudo code:
val src = Source.single(name) // How to send this to an actor and get the response
Edit 2:
Based on @yardena's answer, I came up with the following, but the last line doesn't compile. My actor publisher returns a Future, which I'm guessing will be wrapped in a Promise and then delivered as a Future to the route handler.
get {
  parameters("firstName".?, "lastName".?).as(FindByNameRequest) { name =>
    type FindResponse = Future[FindByNameResponse]
    val src: Source[FindResponse, Unit] = Source.actorPublisher[FindResponse](businessDelegateProps).mapMaterializedValue {
      _ ! name
    }
    val emptyResponse = Future.apply(FindByNameResponse(OK, Seq.empty))
    val sink = Sink.fold(emptyResponse)((_, response: FindResponse) => response)
    complete(src.runWith(sink)) // doesn't compile
  }
}
I ended up using Actor.ask. Simple.
This link may be helpful: http://zuchos.com/blog/2015/05/23/how-to-write-a-subscriber-for-akka-streams/ as well as this answer by @Noah: Accessing the underlying ActorRef of an akka stream Source created by Source.actorRef
Basically you have 2 choices:
1) if you want a "simple" actor, which will forward into the stream all messages that it receives, you can use Source.actorRef. Then you can pipeline the messages into UserService by creating a processing stage using mapAsync.
2) Another option, in case you want the actor to have some custom behavior, is to write your own ActorPublisher.
HTH