How to configure Akka to expose metrics to Prometheus using Kamon? - scala

I am trying to configure my project, which is based on Akka 2.6.10, to expose metric values to Prometheus. I saw this question which uses Kamon, but I couldn't figure out what I am missing in my configuration. My build.sbt file has the following configuration:
name := """explore-akka"""
version := "1.1"
scalaVersion := "2.12.7"
val akkaVersion = "2.6.10"
lazy val kamonVersion = "2.1.9"
libraryDependencies ++= Seq(
// Akka basics
"com.typesafe.akka" %% "akka-actor" % akkaVersion,
"com.typesafe.akka" %% "akka-testkit" % akkaVersion,
// Metrics: Kamon + Prometheus
"io.kamon" %% "kamon-core" % kamonVersion,
"io.kamon" %% "kamon-akka" % kamonVersion,
"io.kamon" %% "kamon-prometheus" % kamonVersion
)
and the plugins.sbt:
resolvers += Resolver.bintrayRepo("kamon-io", "sbt-plugins")
addSbtPlugin("io.kamon" % "sbt-aspectj-runner" % "1.1.1")
I added this to application.conf:
kamon.instrumentation.akka.filters {
  actors.track {
    includes = [ "CounterSystem/user/Counter**" ]
  }
}
Then I start a MainClass which calls the counter actor:
object MainClass extends App {
Kamon.registerModule("akka-test", new PrometheusReporter())
Kamon.init()
CounterActor.run()
}
import akka.actor.{Actor, ActorSystem, Props}
import kamon.Kamon
object CounterActor extends App {
run()
def run() = {
import Counter._
val actorSystem = ActorSystem("CounterSystem")
val countActor = actorSystem.actorOf(Props[Counter], "Counter")
(1 to 100).foreach { v =>
Thread.sleep(1000)
countActor ! Increment
}
(1 to 50).foreach { v =>
Thread.sleep(1000)
countActor ! Decrement
}
countActor ! Print
}
class Counter extends Actor {
import Counter._
val counter = Kamon.counter("my-counter")
var count = 0
override def receive: Receive = {
case Increment =>
count += 1
println(s"incrementing... $count")
counter.withoutTags().increment()
case Decrement =>
count -= 1
println(s"decrementing... $count")
counter.withoutTags().increment()
case Print =>
sender() ! count
println(s"[counter] current count is: $count")
}
}
object Counter {
case object Increment
case object Decrement
case object Print
}
}
I think that after this I could launch the application using sbt run and see the metrics in the Prometheus console (http://127.0.0.1:9090/graph), but I do not see any metrics related to my actor. My guess is that I have to configure the scrape_config in the Prometheus file /etc/prometheus/prometheus.yml. Am I right? How should I configure it?

I had to configure Prometheus to scrape the Kamon web service through the config file
cat /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "kamon"
    scrape_interval: "5s"
    static_configs:
      - targets: ['localhost:9095']
    metrics_path: /
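The targets entry points at localhost:9095 because kamon-prometheus starts an embedded scrape endpoint on that port by default. As a hedged aside (setting names taken from Kamon 2.x's reference configuration), the endpoint can be rebound in application.conf if needed:
kamon.prometheus.embedded-server {
  hostname = "0.0.0.0"
  port = 9095
}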
Add these two Kamon libraries to build.sbt:
"io.kamon" %% "kamon-bundle" % "2.1.9",
"io.kamon" %% "kamon-prometheus" % "2.1.9",
Add this configuration to application.conf:
kamon.instrumentation.akka.filters {
  actors.track {
    includes = [ "AkkaQuickStart/user/*" ]
    # excludes = [ "AkkaQuickStart/system/**" ]
  }
}
Start Kamon and call the counter:
Kamon.init()
val counterSendMsg = Kamon.counter("counter-send-msg")
counterSendMsg.withTag("whom", message.whom).increment()
Here is the full demo application from Akka quick start with Kamon configured to count messages:
import akka.actor.typed.scaladsl.Behaviors
import akka.actor.typed.{ActorRef, ActorSystem, Behavior}
import kamon.Kamon
import scala.util.Random
object Greeter {
val counterSendMsg = Kamon.counter("counter-send-msg")
def apply(): Behavior[Greet] = Behaviors.receive { (context, message) =>
context.log.info("Hello {}!", message.whom)
//#greeter-send-messages
message.replyTo ! Greeted(message.whom, context.self)
counterSendMsg.withTag("whom", message.whom).increment()
//#greeter-send-messages
Behaviors.same
}
final case class Greet(whom: String, replyTo: ActorRef[Greeted])
final case class Greeted(whom: String, from: ActorRef[Greet])
}
object GreeterBot {
def apply(max: Int): Behavior[Greeter.Greeted] = {
bot(0, max)
}
private def bot(greetingCounter: Int, max: Int): Behavior[Greeter.Greeted] =
Behaviors.receive { (context, message) =>
val n = greetingCounter + 1
context.log.info("Greeting {} for {}", n, message.whom)
if (n == max) {
Behaviors.stopped
} else {
message.from ! Greeter.Greet(message.whom, context.self)
bot(n, max)
}
}
}
object GreeterMain {
def apply(): Behavior[SayHello] =
Behaviors.setup { context =>
//#create-actors
val greeter = context.spawn(Greeter(), "greeter")
//#create-actors
Behaviors.receiveMessage { message =>
//#create-actors
val replyTo = context.spawn(GreeterBot(max = 3), message.name)
//#create-actors
greeter ! Greeter.Greet(message.name, replyTo)
Behaviors.same
}
}
final case class SayHello(name: String)
}
object AkkaQuickstart {
def main(args: Array[String]): Unit = {
run()
}
def run() = {
Kamon.init()
import GreeterMain._
val greeterMain: ActorSystem[GreeterMain.SayHello] = ActorSystem(GreeterMain(), "AkkaQuickStart")
val allPerson = List("Charles", "Bob", "Felipe", "Simone", "Fabio")
def randomPerson = allPerson(Random.nextInt(allPerson.length))
while (true) {
greeterMain ! SayHello(randomPerson)
Thread.sleep(1000)
}
}
}
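Before looking in Prometheus, it can help to confirm that the Kamon endpoint itself is serving data. A minimal sketch, assuming the default embedded server on port 9095 and the metrics_path of / configured above:
// quick sanity check from a Scala REPL or a throwaway main:
// prints the beginning of the Prometheus-format page served by Kamon
println(scala.io.Source.fromURL("http://localhost:9095/").mkString.take(500))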
and my Prometheus web console:

Related

ZIO: How to return JSON ? [instead of using case class in ZIO-Http use schema to map?]

I tried getting the JSON body directly in code, which I then want to convert to Avro and write to a Kafka topic.
Here is my code with the case class:
import zhttp.http._
import zio._
import zhttp.http.{Http, Method, Request, Response, Status}
import zhttp.service.Server
import zio.json._
import zio.kafka._
import zio.kafka.serde.Serde
import zio.schema._
case class Experiments(experimentId: String,
variantId: String,
accountId: String,
deviceId: String,
date: Int)
//case class RootInterface (events: Seq[Experiments])
object Experiments {
implicit val encoder: JsonEncoder[Experiments] = DeriveJsonEncoder.gen[Experiments]
implicit val decoder: JsonDecoder[Experiments] = DeriveJsonDecoder.gen[Experiments]
implicit val codec: JsonCodec[Experiments] = DeriveJsonCodec.gen[Experiments]
implicit val schema: Schema[Experiments] = DeriveSchema.gen
}
object HttpService {
def apply(): Http[ExpEnvironment, Throwable, Request, Response] =
Http.collectZIO[Request] {
case req @ (Method.POST -> !! / "zioCollector") =>
val c = req.body.asString.map(_.fromJson[Experiments])
for {
u <- req.body.asString.map(_.fromJson[Experiments])
r <- u match {
case Left(e) =>
ZIO.debug(s"Failed to parse the input: $e").as(
Response.text(e).setStatus(Status.BadRequest)
)
case Right(u) =>
println(s"$u + =====")
ExpEnvironment.register(u)
.map(id => Response.text(id))
}
} yield r
}
}
// val experimentsSerde: Serde[Any, Experiments] = Serde.string.inmapM { string =>
// //desericalization
// ZIO.fromEither(string.fromJson[Experiments].left.map(errorMessage => new RuntimeException(errorMessage)))
// } { theMatch =>
// ZIO.effect(theMatch.toJson)
//
// }
object ZioCollectorMain extends ZIOAppDefault {
def run: ZIO[Environment with ZIOAppArgs with Scope, Any, Any] = {
Server.start(
port = 9001,
http = HttpService()).provide(ZLayerExp.layer)
}
}
I'm looking into zio-json but no success yet; any help is appreciated!
We could also use the derived schema somehow to get the Avro generic record.
Here's my JSON:
{
"experimentId": "abc",
"variantId": "123",
"accountId": "123",
"deviceId": "123",
"date": 1664544365
}
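For reference, the mapping between that JSON and the case class is just the derived zio-json codec. A minimal sketch, assuming the Experiments case class and the implicits from its companion object shown above:
import zio.json._

val raw =
  """{ "experimentId": "abc", "variantId": "123", "accountId": "123", "deviceId": "123", "date": 1664544365 }"""

// decoding returns an Either, so parse failures surface as a Left with the error message
val decoded: Either[String, Experiments] = raw.fromJson[Experiments]
// encoding goes back to a JSON string via the derived encoder
val reencoded: String = decoded.map(_.toJson).getOrElse("<parse error>")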
This function works for me in Scala 3 (sorry, I didn't include all the code but it should be enough):
import zio.*
import zio.Console.printLine
import zhttp.http.*
import zhttp.service.Server
import zio.json.*
...
case class Experiments(experimentId: String,
variantId: String,
accountId: String,
deviceId: String,
date: Int)
//case class RootInterface (events: Seq[Experiments])
object Experiments:
implicit val encoder: JsonEncoder[Experiments] = DeriveJsonEncoder.gen[Experiments]
implicit val decoder: JsonDecoder[Experiments] = DeriveJsonDecoder.gen[Experiments]
implicit val codec: JsonCodec[Experiments] = DeriveJsonCodec.gen[Experiments]
val postingExperiment: Http[Any, Throwable, Request, Response] =
Http.collectZIO[Request] {
case req @ (Method.POST -> !! / "zioCollector") =>
//val c = req.body.asString.map(_.fromJson[Experiments])
val experimentsZIO = req.body.asString.map(_.fromJson[Experiments])
for {
experimentsOrError <- experimentsZIO
response <- experimentsOrError match {
case Left(e) => ZIO.debug(s"Failed to parse the input: $e").as(
Response.text(e).setStatus(Status.BadRequest)
)
case Right(experiments) => ZIO.succeed(Response.json(experiments.toJson))
}
} yield response
}
I modified your code slightly (you didn't post your ExpEnvironment class), and it returns the posted object back to the caller.
and the test code is:
import sttp.client3.{SimpleHttpClient, UriContext, basicRequest}
object TestExperiments:
def main(args: Array[String]): Unit =
val client = SimpleHttpClient()
//post request
val request = basicRequest
.post(uri"http://localhost:9009/zioCollector")
.body("{ \"experimentId\": \"abc\", \"variantId\": \"123\", \"accountId\": \"123\", \"deviceId\": \"123\", \"date\": 1664544365 }")
val response = client.send(request)
println(response.body)
val invalidJsonRequest = basicRequest
.post(uri"http://localhost:9009/zioCollector")
.body("{ \"experimentId\": \"abc\", \"variantId\": \"123\", \"accountId\": \"123\", \"deviceId\": \"123\", \"date\": 1664544365 ") // missing the closing bracket
val invalidJsonResponse = client.send(invalidJsonRequest)
println(invalidJsonResponse.body)
You have to add: "com.softwaremill.sttp.client3" %% "core" % "3.8.3" to your sbt file.
build.sbt:
ThisBuild / scalaVersion := "3.2.0"
ThisBuild / version := "0.1.0-SNAPSHOT"
ThisBuild / organization := "TestSpeed"
ThisBuild / organizationName := "example"
lazy val root = (project in file("."))
.settings(
name := "TestZio",
libraryDependencies ++= Seq(
"dev.zio" %% "zio" % "2.0.2",
"dev.zio" %% "zio-json" % "0.3.0-RC11",
"io.d11" %% "zhttp" % "2.0.0-RC11",
"dev.zio" %% "zio-test" % "2.0.2" % Test,
"com.softwaremill.sttp.client3" %% "core" % "3.8.3" % Test
),
testFrameworks += new TestFramework("zio.test.sbt.ZTestFramework")
)
I didn't include anything related to avro because I am not familiar with it.

Streaming ByteArrayOutputStream in Action Result in scala Play Framework

I am trying to convert an SVG image to a JPEG image in the Scala Play framework.
I used Batik and it worked OK.
Now I would like to stream the output in the action result, instead of converting the ByteArrayOutputStream to a
byte array, which loads the entire output into memory.
How can I do that?
Here is the project code, which works without streaming the output:
build.sbt
name := "svg2png"
version := "1.0-SNAPSHOT"
lazy val root = (project in file(".")).enablePlugins(PlayScala)
resolvers += Resolver.sonatypeRepo("snapshots")
scalaVersion := "2.12.3"
libraryDependencies ++= Seq(
jdbc,
guice,
"org.apache.xmlgraphics" % "batik-transcoder" % "1.11",
"org.apache.xmlgraphics" % "batik-codec" % "1.11"
)
/project/plugins.sbt
addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.6.6")
/project/build.properties
sbt.version=1.0.2
/conf/routes
GET / controllers.Application.index
GET /image controllers.Application.getImage
/conf/application.conf
play.filters.enabled += "play.filters.cors.CORSFilter"
play.filters {
hosts {
allowed = ["."]
}
headers {
contentSecurityPolicy = null
}
}
play.i18n {
langs = [ "en" ]
}
contexts {
imageService {
fork-join-executor {
parallelism-factor = 4.0
parallelism-max = 8
}
}
}
/app/controllers/Application.scala
package controllers
import play.api.mvc._
import javax.inject._
import scala.concurrent.ExecutionContext
@Singleton
class Application @Inject()(imageService: services.ImageService,
cc: ControllerComponents)(implicit exec: ExecutionContext) extends AbstractController(cc) {
def index = Action {
Ok("test app").as(HTML)
}
def getImage : Action[AnyContent] = Action.async {
imageService.getImage.map{res => Ok(res).as("image/jpeg") }
}
}
/app/services/ImageService.scala
package services
import java.io.{ByteArrayOutputStream, StringReader}
import com.google.inject.{Inject, Singleton}
import scala.concurrent.{ExecutionContext, Future}
import akka.actor.ActorSystem
import org.apache.batik.transcoder.image.JPEGTranscoder
import org.apache.batik.transcoder.TranscoderInput
import org.apache.batik.transcoder.TranscoderOutput
@Singleton
class ImageService @Inject()(actorSystem: ActorSystem) {
implicit val AnalyticsServiceExecutionContext: ExecutionContext = actorSystem.dispatchers.lookup("contexts.imageService")
def getImage: Future[Array[Byte]] = {
Future {
val t: JPEGTranscoder = new JPEGTranscoder
t.addTranscodingHint(JPEGTranscoder.KEY_QUALITY, new java.lang.Float(0.8))
val imageSVGString: String =
"""<svg width="1000" height="1000" viewBox="0 0 1000 1000" version="1.1"
| xmlns="http://www.w3.org/2000/svg"
| xmlns:xlink="http://www.w3.org/1999/xlink">
| <circle cx="500" cy="500" r="300" fill="lightblue" />
|</svg>
""".stripMargin
val input: TranscoderInput = new TranscoderInput(new StringReader(imageSVGString))
val outputStream = new ByteArrayOutputStream
val output: TranscoderOutput = new TranscoderOutput(outputStream)
t.transcode(input, output)
outputStream.toByteArray
}
}
}
This works for me:
svg match {
case None => NotFound
case Some(svg) =>
val svgImage = new TranscoderInput(new StringReader(svg))
val pngOstream = new ByteArrayOutputStream
val outputPngImage = new TranscoderOutput(pngOstream)
val converter = fileExtension match {
case "png" => new PNGTranscoder()
case _ => new JPEGTranscoder()
}
if(converter.isInstanceOf[JPEGTranscoder]){
converter.addTranscodingHint(JPEGTranscoder.KEY_QUALITY, (0.8).toFloat)
}
converter.transcode(svgImage, outputPngImage)
Ok(pngOstream.toByteArray).as("image/" + fileExtension)
}
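Note that the snippet above still calls toByteArray, so the whole image is buffered in memory. To actually stream the result, one option is to pipe the transcoder output straight into a chunked response. A minimal sketch inside the Application controller, assuming Play 2.6 with Akka Streams on the classpath; imageSVGString and imageServiceContext are hypothetical stand-ins for the SVG string and the contexts.imageService dispatcher defined earlier:
import java.io.{PipedInputStream, PipedOutputStream, StringReader}
import scala.concurrent.Future
import akka.stream.scaladsl.StreamConverters
import org.apache.batik.transcoder.{TranscoderInput, TranscoderOutput}
import org.apache.batik.transcoder.image.JPEGTranscoder

def getImageStreamed: Action[AnyContent] = Action {
  val in  = new PipedInputStream()
  val out = new PipedOutputStream(in)
  // transcode on the dedicated imageService dispatcher so the pipe is filled
  // on one thread while the response body is read on another
  Future {
    val t = new JPEGTranscoder
    t.addTranscodingHint(JPEGTranscoder.KEY_QUALITY, new java.lang.Float(0.8f))
    try t.transcode(new TranscoderInput(new StringReader(imageSVGString)), new TranscoderOutput(out))
    finally out.close()
  }(imageServiceContext) // hypothetical handle to the "contexts.imageService" dispatcher
  // serve the piped bytes as a chunked response instead of buffering a full byte array
  Ok.chunked(StreamConverters.fromInputStream(() => in)).as("image/jpeg")
}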

Scala Akka Http - Use routes to return Future [A] as Json

Akka Http version: "com.typesafe.akka" %% "akka-http" % "10.0.11"
Stream version: "com.typesafe.akka" %% "akka-stream" % "2.5.7"
Play Json version: "com.typesafe.play" %% "play-json" % "2.6.7"
I have the following method in the crudService:
def getAll: Future[Seq[A]]
I want to return this in a route to provide the outcome as Json to the world. I currently have this:
val crudService = new CrudService[Todo]()
val route =
pathPrefix("todo" / "_all") {
get {
complete {
crudService.getAll
}
}
}
val bindingFuture = Http().bindAndHandle(route, hostname, port)
I have also tried this (first complete the Future):
val route =
pathPrefix("todo" / "_all") {
get {
onSuccess(crudService.getAll) { x =>
complete(x)
}
}
}
It keeps saying: Not applicable to ToResponseMarshallable. I can't find documentation that leads to a correct solution and I don't understand exactly the problem here. Can somebody help out?
You need to add Play JSON support, which is not built in, to provide (un)marshalling support.
Try the https://github.com/hseeberger/akka-http-json library and mix PlayJsonSupport into your class. You can view an example App here.
def route(implicit mat: Materializer) = {
import Directives._
import PlayJsonSupport._
pathSingleSlash {
post {
entity(as[Foo]) { foo =>
complete {
foo
}
}
}
}
}
Http().bindAndHandle(route, "127.0.0.1", 8000)
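Applied to the original getAll route, a minimal sketch (assuming akka-http-json's PlayJsonSupport is on the classpath; the Todo case class and its Format here are illustrative, since the original model wasn't posted):
import akka.http.scaladsl.server.Directives._
import de.heikoseeberger.akkahttpplayjson.PlayJsonSupport._
import play.api.libs.json.{Format, Json}

// hypothetical Todo model; any case class with a Play JSON Format works the same way
final case class Todo(id: Long, title: String, done: Boolean)
implicit val todoFormat: Format[Todo] = Json.format[Todo]

val route =
  pathPrefix("todo" / "_all") {
    get {
      // Future[Seq[Todo]] is marshalled to a JSON array via PlayJsonSupport
      complete(crudService.getAll)
    }
  }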

About an error accessing a field inside Tuple2

I am trying to access a field within a Tuple2 and the compiler is returning an error. The software pushes a case class to a Kafka topic; I then want to recover it using Spark Streaming so I can feed a machine learning algorithm and save the results to a Mongo instance.
Solved!
I finally solved my problem; I am going to post the final solution:
This is the github project:
https://github.com/alonsoir/awesome-recommendation-engine/tree/develop
build.sbt
name := "my-recommendation-spark-engine"
version := "1.0-SNAPSHOT"
scalaVersion := "2.10.4"
val sparkVersion = "1.6.1"
val akkaVersion = "2.3.11" // override Akka to be this version to match the one in Spark
libraryDependencies ++= Seq(
"org.apache.kafka" % "kafka_2.10" % "0.8.1"
exclude("javax.jms", "jms")
exclude("com.sun.jdmk", "jmxtools")
exclude("com.sun.jmx", "jmxri"),
//not working play module!! check
//jdbc,
//anorm,
//cache,
// HTTP client
"net.databinder.dispatch" %% "dispatch-core" % "0.11.1",
// HTML parser
"org.jodd" % "jodd-lagarto" % "3.5.2",
"com.typesafe" % "config" % "1.2.1",
"com.typesafe.play" % "play-json_2.10" % "2.4.0-M2",
"org.scalatest" % "scalatest_2.10" % "2.2.1" % "test",
"org.twitter4j" % "twitter4j-core" % "4.0.2",
"org.twitter4j" % "twitter4j-stream" % "4.0.2",
"org.codehaus.jackson" % "jackson-core-asl" % "1.6.1",
"org.scala-tools.testing" % "specs_2.8.0" % "1.6.5" % "test",
"org.apache.spark" % "spark-streaming-kafka_2.10" % "1.6.1" ,
"org.apache.spark" % "spark-core_2.10" % "1.6.1" ,
"org.apache.spark" % "spark-streaming_2.10" % "1.6.1",
"org.apache.spark" % "spark-sql_2.10" % "1.6.1",
"org.apache.spark" % "spark-mllib_2.10" % "1.6.1",
"com.google.code.gson" % "gson" % "2.6.2",
"commons-cli" % "commons-cli" % "1.3.1",
"com.stratio.datasource" % "spark-mongodb_2.10" % "0.11.1",
// Akka
"com.typesafe.akka" %% "akka-actor" % akkaVersion,
"com.typesafe.akka" %% "akka-slf4j" % akkaVersion,
// MongoDB
"org.reactivemongo" %% "reactivemongo" % "0.10.0"
)
packAutoSettings
//play.Project.playScalaSettings
Kafka Producer
package example.producer
import play.api.libs.json._
import example.utils._
import scala.concurrent.Future
import example.model.{AmazonProductAndRating,AmazonProduct,AmazonRating}
import example.utils.AmazonPageParser
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
/**
args(0) : productId
args(1) : userId
Usage: ./amazon-producer-example 0981531679 someUserId 3.0
*/
object AmazonProducerExample {
def main(args: Array[String]): Unit = {
val productId = args(0).toString
val userId = args(1).toString
val rating = args(2).toDouble
val topicName = "amazonRatingsTopic"
val producer = Producer[String](topicName)
//0981531679 is Scala Puzzlers...
AmazonPageParser.parse(productId,userId,rating).onSuccess { case amazonRating =>
//Is this the correct way? the best performance? possibly not, what about using avro or parquet? How can i push data in avro or parquet format?
//You can see that i am pushing json String to kafka topic, not raw String, but is there any difference?
//of course there are differences...
producer.send(Json.toJson(amazonRating).toString)
//producer.send(amazonRating.toString)
println("amazon product with rating sent to kafka cluster..." + amazonRating.toString)
System.exit(0)
}
}
}
This is the definition of the necessary case classes (UPDATED); the file is named models.scala:
package example.model
import play.api.libs.json.Json
import reactivemongo.bson.Macros
case class AmazonProduct(itemId: String, title: String, url: String, img: String, description: String)
case class AmazonRating(userId: String, productId: String, rating: Double)
case class AmazonProductAndRating(product: AmazonProduct, rating: AmazonRating)
// For MongoDB
object AmazonRating {
implicit val amazonRatingHandler = Macros.handler[AmazonRating]
implicit val amazonRatingFormat = Json.format[AmazonRating]
//added using @Yuval tip
lazy val empty: AmazonRating = AmazonRating("-1", "-1", -1d)
}
This is the full code of the spark streaming process:
package example.spark
import java.io.File
import java.util.Date
import play.api.libs.json._
import com.google.gson.{Gson,GsonBuilder, JsonParser}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
import com.mongodb.casbah.Imports._
import com.mongodb.QueryBuilder
import com.mongodb.casbah.MongoClient
import com.mongodb.casbah.commons.{MongoDBList, MongoDBObject}
import reactivemongo.api.MongoDriver
import reactivemongo.api.collections.default.BSONCollection
import reactivemongo.bson.BSONDocument
import org.apache.spark.streaming.kafka._
import kafka.serializer.StringDecoder
import example.model._
import example.utils.Recommender
/**
* Collect at least the specified number of JSON Amazon products in order to feed the recommendation system and feed the Mongo instance with results.
Usage: ./amazon-kafka-connector 127.0.0.1:9092 amazonRatingsTopic
on mongo shell:
use alonsodb;
db.amazonRatings.find();
*/
object AmazonKafkaConnector {
private var numAmazonProductCollected = 0L
private var partNum = 0
private val numAmazonProductToCollect = 10000000
//these settings must be in reference.conf
private val Database = "alonsodb"
private val ratingCollection = "amazonRatings"
private val MongoHost = "127.0.0.1"
private val MongoPort = 27017
private val MongoProvider = "com.stratio.datasource.mongodb"
private val jsonParser = new JsonParser()
private val gson = new GsonBuilder().setPrettyPrinting().create()
private def prepareMongoEnvironment(): MongoClient = {
val mongoClient = MongoClient(MongoHost, MongoPort)
mongoClient
}
private def closeMongoEnviroment(mongoClient : MongoClient) = {
mongoClient.close()
println("mongoclient closed!")
}
private def cleanMongoEnvironment(mongoClient: MongoClient) = {
cleanMongoData(mongoClient)
mongoClient.close()
}
private def cleanMongoData(client: MongoClient): Unit = {
val collection = client(Database)(ratingCollection)
collection.dropCollection()
}
def main(args: Array[String]) {
// Process program arguments and set properties
if (args.length < 2) {
System.err.println("Usage: " + this.getClass.getSimpleName + " <brokers> <topics>")
System.exit(1)
}
val Array(brokers, topics) = args
println("Initializing Streaming Spark Context and kafka connector...")
// Create context with 2 second batch interval
val sparkConf = new SparkConf().setAppName("AmazonKafkaConnector")
.setMaster("local[4]")
.set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)
sc.addJar("target/scala-2.10/blog-spark-recommendation_2.10-1.0-SNAPSHOT.jar")
val ssc = new StreamingContext(sparkConf, Seconds(2))
//this checkpointdir should be in a conf file, for now it is hardcoded!
val streamingCheckpointDir = "/Users/aironman/my-recommendation-spark-engine/checkpoint"
ssc.checkpoint(streamingCheckpointDir)
// Create direct kafka stream with brokers and topics
val topicsSet = topics.split(",").toSet
val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicsSet)
println("Initialized Streaming Spark Context and kafka connector...")
//create recommendation module
println("Creating rating recommender module...")
val ratingFile= "ratings.csv"
val recommender = new Recommender(sc,ratingFile)
println("Initialized rating recommender module...")
//THIS IS THE MOST INTERESTING PART AND WHAT I NEED!
//THE SOLUTION IS PROBABLY NOT THE MOST EFFICIENT, BECAUSE I HAD TO
//USE DATAFRAMES, ARRAYs AND SEQs, BUT IT IS FUNCTIONAL!
try{
messages.foreachRDD(rdd => {
val count = rdd.count()
if (count > 0){
val json= rdd.map(_._2)
val dataFrame = sqlContext.read.json(json) //converts json to DF
val myRow = dataFrame.select(dataFrame("userId"),dataFrame("productId"),dataFrame("rating")).take(count.toInt)
println("myRow is: " + myRow)
val myAmazonRating = AmazonRating(myRow(0).getString(0), myRow(0).getString(1), myRow(0).getDouble(2))
println("myAmazonRating is: " + myAmazonRating.toString)
val arrayAmazonRating = Array(myAmazonRating)
//this method needs Seq[AmazonRating]
recommender.predictWithALS(arrayAmazonRating.toSeq)
}//if
})
}catch{
case e: IllegalArgumentException => {println("illegal arg. exception")};
case e: IllegalStateException => {println("illegal state exception")};
case e: ClassCastException => {println("ClassCastException")};
case e: Exception => {println(" Generic Exception")};
}finally{
println("Finished taking data from kafka topic...")
}
ssc.start()
ssc.awaitTermination()
println("Finished!")
}
}
Thank you all, folks, @Yuval, @Emecas and @Riccardo.cardin.
The Recommender.predict method signature looks like this:
def predict(ratings: Seq[AmazonRating]) = {
// train model
val myRatings = ratings.map(toSparkRating)
val myRatingRDD = sc.parallelize(myRatings)
val startAls = DateTime.now
val model = ALS.train((sparkRatings ++ myRatingRDD).repartition(NumPartitions), 10, 20, 0.01)
val myProducts = myRatings.map(_.product).toSet
val candidates = sc.parallelize((0 until productDict.size).filterNot(myProducts.contains))
// get ratings of all products not in my history ordered by rating (higher first) and only keep the first NumRecommendations
val myUserId = userDict.getIndex(MyUsername)
val recommendations = model.predict(candidates.map((myUserId, _))).collect
val endAls = DateTime.now
val result = recommendations.sortBy(-_.rating).take(NumRecommendations).map(toAmazonRating)
val alsTime = Seconds.secondsBetween(startAls, endAls).getSeconds
println(s"ALS Time: $alsTime seconds")
result
}
//I think I've been as clear as possible, tell me if you need anything more and thanks for your patience teaching me @Yuval
Diagnosis
IllegalStateException suggests that you are operating on a StreamingContext that is already ACTIVE or STOPPED. See details here (lines 218-231):
java.lang.IllegalStateException: Adding new inputs, transformations, and output operations after starting a context is not supported
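In other words, every DStream has to be wired up before the context is started. A minimal sketch of the required ordering, reusing the sparkConf, kafkaParams and topicsSet values from AmazonKafkaConnector above:
val ssc = new StreamingContext(sparkConf, Seconds(2))
val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicsSet)
messages.foreachRDD { rdd =>
  // all transformations and outputs on rdd are declared here, before start()
  println(s"batch size: ${rdd.count()}")
}
ssc.start()            // only start once the graph is complete
ssc.awaitTermination()
// declaring another messages.map(...) or messages.foreachRDD(...) after ssc.start()
// is what triggers the IllegalStateException quoted above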
Code Review
Looking at your AmazonKafkaConnector code, you are doing map, filter and foreachRDD inside another foreachRDD over the same DirectStream object called messages.
General Advice:
Be functional, my friend: divide your logic into small pieces, one for each of the tasks you want to perform:
Streaming
ML Recommendation
Persistence
etc.
That will help you understand and debug the Spark pipeline you want to implement more easily.
The problem is that the statement rdd.take(count.toInt) returns an Array[T], as stated here:
def take(num: Int): Array[T]
Take the first num elements of the RDD.
You're telling your RDD to take the first n elements in it. So, contrary to what you assumed, you don't have an object of type Tuple2, but an array.
If you want to print each element of the array, you can use the method mkString defined on the Array type to obtain a single String with all the elements of the array.
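For instance, a hedged one-liner against the myRow array from the code above:
println(myRow.mkString("[", ", ", "]")) // joins every Row in the array into a single readable String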
It looks like what you're trying to do is simply a map over a DStream. A map operation is a projection from type A to type B, where A is a String (that you're receiving from Kafka), and B is your case class AmazonRating.
Let's add an empty value to your AmazonRating:
case class AmazonRating(userId: String, productId: String, rating: Double)
object AmazonRating {
lazy val empty: AmazonRating = AmazonRating("-1", "-1", -1d)
}
Now let's parse the JSONs:
val messages = KafkaUtils
.createDirectStream[String, String, StringDecoder, StringDecoder]
(ssc, kafkaParams, topicsSet)
messages
  .map { case (_, jsonRating) =>
    val format = Json.format[AmazonRating]
    val jsValue = Json.parse(jsonRating)
    format.reads(jsValue) match {
      case JsSuccess(rating, _) => rating
      case JsError(_) => AmazonRating.empty
    }
  }
  .filter(_ != AmazonRating.empty)
  .foreachRDD(_.foreachPartition(it => recommender.predict(it.toSeq)))

Scala compiler error: package api does not have a member materializeWeakTypeTag

I am new to scala, so I am quite prepared to accept that I am doing something wrong!
I am playing around with Akka, and have a test using scalatest and the akka-testkit. Here is my build.sbt config
name := """EventHub"""
version := "1.0"
scalaVersion := "2.10.3"
libraryDependencies ++= Seq(
"com.typesafe.akka" % "akka-actor_2.10" % "2.2.3",
"com.typesafe.akka" % "akka-testKit_2.10" % "2.2.3" % "test",
"org.scalatest" % "scalatest_2.10.0-M4" % "1.9-2.10.0-M4-B2" % "test",
"com.ning" % "async-http-client" % "1.8.1"
)
When I compile, I get a message that I don't understand. I have googled for this and found related Scala compiler issues and bugs. I have no idea if that is what I am seeing or if I am making a basic mistake somewhere. Here is a summary of the output (I have removed a lot of "noise" for brevity; I can add more detail if required!):
scalac:
while compiling: /Users/robert/Documents/Programming/Scala/Projects/EventHub/src/test/scala/Hub/Subscription/SubscriberSpec.scala
during phase: typer
library version: version 2.10.3
compiler version: version 2.10.3
...
...
== Expanded type of tree ==
TypeRef(
TypeSymbol(
class SubscriberSpec extends TestKit with WordSpec with BeforeAndAfterAll with ImplicitSender
)
)
uncaught exception during compilation: scala.reflect.internal.FatalError
And:
scalac: Error: package api does not have a member materializeWeakTypeTag
scala.reflect.internal.FatalError: package api does not have a member materializeWeakTypeTag
at scala.reflect.internal.Definitions$DefinitionsClass.scala$reflect$internal$Definitions$DefinitionsClass$$fatalMissingSymbol(Definitions.scala:1037)
at scala.reflect.internal.Definitions$DefinitionsClass.getMember(Definitions.scala:1055)
at scala.reflect.internal.Definitions$DefinitionsClass.getMemberMethod(Definitions.scala:1090)
at scala.reflect.internal.Definitions$DefinitionsClass.materializeWeakTypeTag(Definitions.scala:518)
at scala.tools.reflect.FastTrack$class.fastTrack(FastTrack.scala:34)
at scala.tools.nsc.Global$$anon$1.fastTrack$lzycompute(Global.scala:493)
at scala.tools.nsc.Global$$anon$1.fastTrack(Global.scala:493)
at scala.tools.nsc.typechecker.Namers$Namer.methodSig(Namers.scala:1144)
at scala.tools.nsc.typechecker.Namers$Namer.getSig$1(Namers.scala:1454)
at scala.tools.nsc.typechecker.Namers$Namer.typeSig(Namers.scala:1466)
at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$monoTypeCompleter$1$$anonfun$apply$1.apply$mcV$sp(Namers.scala:731)
at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$monoTypeCompleter$1$$anonfun$apply$1.apply(Namers.scala:730)
at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$monoTypeCompleter$1$$anonfun$apply$1.apply(Namers.scala:730)
at scala.tools.nsc.typechecker.Namers$Namer.scala$tools$nsc$typechecker$Namers$Namer$$logAndValidate(Namers.scala:1499)
at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$monoTypeCompleter$1.apply(Namers.scala:730)
at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$monoTypeCompleter$1.apply(Namers.scala:729)
at scala.tools.nsc.typechecker.Namers$$anon$1.completeImpl(Namers.scala:1614)
...
...
I am using IntelliJ as the IDE. There are a couple of Scala files; one contains an actor, the other a web client:
package Hub.Subscription
import scala.concurrent.{Promise, Future}
import com.ning.http.client.{AsyncCompletionHandler, AsyncHttpClient, Response}
trait WebClient {
def postUpdate(url: String, payload: Any, topic: String): Future[Int]
def postUnSubscribe(url: String, topic: String): Future[Int]
}
case class PostUpdateFailed(status: Int) extends RuntimeException
object AsyncWebClient extends WebClient{
private val client = new AsyncHttpClient
override def postUpdate(url: String, payload: Any, topic: String): Future[Int] = {
val request = client.preparePost(url).build()
val result = Promise[Int]()
client.executeRequest(request, new AsyncCompletionHandler[Response]() {
override def onCompleted(response: Response) = {
if (response.getStatusCode / 100 < 4)
result.success(response.getStatusCode)
else
result.failure(PostUpdateFailed(response.getStatusCode))
response
}
override def onThrowable(t: Throwable) {
result.failure(t)
}
})
result.future
}
override def postUnSubscribe(url: String, topic: String): Future[Int] = {
val request = client.preparePost(url).build()
val result = Promise[Int]
client.executeRequest(request, new AsyncCompletionHandler[Response] {
override def onCompleted(response: Response) = {
if (response.getStatusCode / 100 < 4)
result.success(response.getStatusCode)
else
result.failure(PostUpdateFailed(response.getStatusCode))
response
}
override def onThrowable(t: Throwable) {
result.failure(t)
}
})
result.future
}
def shutdown(): Unit = client.close()
}
And my actor:
package Hub.Subscription
import akka.actor.Actor
import Hub.Subscription.Subscriber.{Failed, Update, UnSubscribe}
import scala.concurrent.ExecutionContext
import java.util.concurrent.Executor
object Subscriber {
object UnSubscribe
case class Update(payload: Any)
case class Failed(callbackUrl: String)
}
class Subscriber(callbackUrl: String, unSubscribeUrl: String, topic: String) extends Actor{
implicit val executor = context.dispatcher.asInstanceOf[Executor with ExecutionContext]
def client: WebClient = AsyncWebClient
def receive = {
case Update(payload) => doUpdate(payload)
case UnSubscribe => doUnSubscribe
case Failed(clientUrl) => //log?
}
def doUpdate(payload: Any): Unit = {
val future = client.postUpdate(callbackUrl, payload, topic)
future onFailure {
case err: Throwable => sender ! Failed(callbackUrl)
}
}
def doUnSubscribe: Unit = {
//tell the client that they have been un-subscribed
val future = client.postUnSubscribe(unSubscribeUrl, topic)
future onFailure {
case err: Throwable => //log
}
}
}
And finally my test spec:
package Hub.Subscription
import akka.testkit.{ImplicitSender, TestKit}
import akka.actor.{ActorRef, Props, ActorSystem}
import org.scalatest.{WordSpec, BeforeAndAfterAll}
import scala.concurrent.Future
import scala.concurrent.duration._
object SubscriberSpec {
def buildTestSubscriber(url: String, unSubscribeUrl: String, topic: String, webClient: WebClient): Props =
Props(new Subscriber(url, unSubscribeUrl, topic) {
override def client = webClient
})
object FakeWebClient extends WebClient {
override def postUpdate(url: String, payload: Any, topic: String): Future[Int] = Future.successful(201)
override def postUnSubscribe(url: String, topic: String): Future[Int] = Future.failed(PostUpdateFailed(500))
}
}
class SubscriberSpec extends TestKit(ActorSystem("SubscriberSpec"))
with WordSpec
with BeforeAndAfterAll
with ImplicitSender {
import SubscriberSpec._
"A subscriber" must {
"forward the update to the callback url" in {
val fakeClient = FakeWebClient
val callbackUrl = "http://localhost:9000/UserEvents"
val subscriber: ActorRef = system.actorOf(buildTestSubscriber(callbackUrl, "unSubscribeUrl", "aTopic", fakeClient))
subscriber ! Subscriber.Update(Nil)
within(200 millis) {
expectNoMsg
}
}
}
override def afterAll(): Unit = {
system.shutdown()
}
}
Thanks in advance for any help / pointers!
Update: I should have noted that if I do not include the test spec, then all is well. But when I add the test spec, I get the errors above.
Btw I just realized that you're using scalatest compiled for 2.10.0-M4.
Scala's final releases aren't supposed to be binary compatible with corresponding milestone releases, so weird things might happen, including crashes.
If you change scalatest's version to "org.scalatest" %% "scalatest" % "1.9.1", everything is going to work just fine.
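A minimal sketch of the corrected dependency block, keeping everything else from the original build.sbt (the akka-testkit artifact name is also lowercased, since published artifact ids are lowercase):
libraryDependencies ++= Seq(
  "com.typesafe.akka" % "akka-actor_2.10"   % "2.2.3",
  "com.typesafe.akka" % "akka-testkit_2.10" % "2.2.3" % "test",
  "org.scalatest"    %% "scalatest"         % "1.9.1" % "test",
  "com.ning"          % "async-http-client" % "1.8.1"
)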