Spark plugin for custom dynamic metrics - Scala

I'm working on custom Spark metrics using the SparkPlugin interface. I'm able to register static metrics:
package com.mavencode.example.plugin

import com.codahale.metrics.MetricRegistry
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

class CustomMetricSparkPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = null

  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    override def init(ctx: PluginContext, extraConf: java.util.Map[String, String]): Unit = {
      val metricRegistry = ctx.metricRegistry()
      metricRegistry.register("blacklisted_count", EventMetricPlugin.blacklistedEventCounter)
      metricRegistry.register("avro_json_parser_duration", EventMetricPlugin.avroParserTimer)
      metricRegistry.register("avro_json_parser_partition_size", EventMetricPlugin.avroParserEventsPerPartition)
      metricRegistry.register("avro_json_parser_errors", EventMetricPlugin.avroParserErrorCounter)
    }
  }
}
However, I would like to register dynamic metrics, so that I can update a metric per dynamic event name, e.g. <event_name>.blacklisted_count:
metricRegistry.register("<event_name>.blacklisted_count", EventMetricPlugin.blacklistedEventCounter)
Any idea how to achieve this?
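One idea (an untested sketch, not a verified answer): since ctx.metricRegistry() hands back a Dropwizard MetricRegistry, and MetricRegistry.counter(name) is get-or-create, the plugin could keep a reference to the registry from init and create per-event counters lazily. EventMetrics below is a hypothetical helper, not part of the Spark API:
import com.codahale.metrics.{Counter, MetricRegistry}

object EventMetrics {
  // set once from ExecutorPlugin.init; None until the plugin has been initialised
  @volatile private var registry: Option[MetricRegistry] = None

  def init(metricRegistry: MetricRegistry): Unit =
    registry = Some(metricRegistry)

  // MetricRegistry.counter(name) returns the existing counter or registers a new one,
  // so previously unseen event names get their own metric on first use
  def blacklistedCounter(eventName: String): Option[Counter] =
    registry.map(_.counter(s"$eventName.blacklisted_count"))
}

// in ExecutorPlugin.init:
//   EventMetrics.init(ctx.metricRegistry())
// wherever an event is processed on the executor:
//   EventMetrics.blacklistedCounter(eventName).foreach(_.inc())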

Related

Creating a stream from an API in Apache Flink

First, let me describe what I want to do. I have an API that takes a function as an argument (it looks like this: dataFromApi => { /* do sth */ }) and I would like to process this data with Flink. I wrote this code to simulate the API:
val myIterator = new TestIterator
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

val th1 = new Thread {
  override def run(): Unit = {
    for (i <- 0 to 10) {
      Thread.sleep(1000)
      myIterator.addToQueue("test" + i)
    }
  }
}
th1.start()

val texts: DataStream[String] = env
  .fromCollection(new TestIterator)

texts.print()
This is my iterator:
class TestIterator extends Iterator[String] with Serializable {
  private val q: BlockingQueue[String] = new LinkedBlockingQueue[String]

  def addToQueue(s: String): Unit = {
    println("Put")
    q.put(s)
  }

  override def hasNext: Boolean = true

  override def next(): String = {
    println("Wait for queue")
    q.take()
  }
}
My idea was to execute myIterator.addToQueue(dataFromApi) whenever I receive data, but this code doesn't work: despite adding to the queue, execution blocks on q.take(). I tried to write my own SourceFunction based on the same queue idea, and I also tried the async I/O approach from https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/ , but I can't manage to get what I want.
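For reference, one direction that might work is to push the API callbacks into a shared, static queue that a custom SourceFunction drains; the sketch below is untested and QueueSource is a hypothetical name:
import java.util.concurrent.{BlockingQueue, LinkedBlockingQueue, TimeUnit}
import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext

object QueueSource {
  // a single static queue, so the API callback and the running source share the same instance
  val queue: BlockingQueue[String] = new LinkedBlockingQueue[String]
}

class QueueSource extends SourceFunction[String] {
  @volatile private var running = true

  override def run(ctx: SourceContext[String]): Unit = {
    while (running) {
      // poll with a timeout so cancel() is honoured even when the queue stays empty
      val element = QueueSource.queue.poll(100, TimeUnit.MILLISECONDS)
      if (element != null) {
        ctx.getCheckpointLock.synchronized {
          ctx.collect(element)
        }
      }
    }
  }

  override def cancel(): Unit = running = false
}

// usage sketch:
// the API callback calls QueueSource.queue.put(dataFromApi)
// val texts: DataStream[String] = env.addSource(new QueueSource)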

How to make a source function that polls an HTTP endpoint into a Flink stream every hour?

I am trying to have a source that polls an HTTP endpoint every hour and to use it as a Flink source whose output is broadcast to other operators.
I tried to implement it as a simple source function, but it does not seem to work as expected.
The code is:
import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext
import org.apache.http.{HttpRequest, HttpResponse}
import org.apache.http.entity.StringEntity
import org.apache.http.impl.bootstrap.{HttpServer, ServerBootstrap}
import org.apache.http.protocol.{HttpContext, HttpRequestHandler}

import java.util.concurrent.TimeUnit

class HttpStreamFun(url: String) extends SourceFunction[String] {
  @transient private var server: HttpServer = _

  override def run(ctx: SourceContext[String]): Unit = {
    server = ServerBootstrap
      .bootstrap()
      .registerHandler(
        url,
        new HttpRequestHandler() {
          override def handle(req: HttpRequest,
                              rep: HttpResponse,
                              context: HttpContext): Unit = {
            ctx.collect(req.getRequestLine.getUri)
            rep.setStatusCode(200)
            rep.setEntity(new StringEntity("OK"))
          }
        }
      )
      .create()
    server.start()
    server.awaitTermination(1, TimeUnit.HOURS)
  }

  override def cancel(): Unit = {
    server.stop()
  }
}
The main job adds the source as a DataStream like this:
val text: DataStream[String] = env.addSource(new HttpStreamFun(config.baseUri))
text.print()
Maybe you can try to use a BroadcastStream.
For more information, refer to
https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/broadcast_state/
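If the goal is really to poll an endpoint (rather than run an HTTP server), a sketch along the following lines might be closer to the intent; it is untested, HourlyHttpPollSource and fetch() are hypothetical, and the broadcast descriptor only illustrates the broadcast-state docs linked above:
import java.util.concurrent.TimeUnit
import org.apache.flink.api.common.state.MapStateDescriptor
import org.apache.flink.api.common.typeinfo.BasicTypeInfo
import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext

class HourlyHttpPollSource(url: String) extends SourceFunction[String] {
  @volatile private var running = true

  // placeholder HTTP call; swap in your preferred HTTP client
  private def fetch(): String = scala.io.Source.fromURL(url).mkString

  override def run(ctx: SourceContext[String]): Unit = {
    while (running) {
      ctx.getCheckpointLock.synchronized {
        ctx.collect(fetch())
      }
      // wait an hour before the next poll; note: cancel() only takes effect after the sleep finishes
      TimeUnit.HOURS.sleep(1)
    }
  }

  override def cancel(): Unit = running = false
}

// usage sketch: broadcast the polled payloads to downstream operators
// val descriptor = new MapStateDescriptor[String, String](
//   "httpConfig", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO)
// val broadcastConfig = env.addSource(new HourlyHttpPollSource(config.baseUri))
//   .broadcast(descriptor)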

play.application.loader [class java.lang.Class}] does not implement interface play.api.ApplicationLoader or interface play.ApplicationLoader

My service runs fine until I hit an API endpoint and get this error:
Cannot load play.application.loader[play.application.loader [class java.lang.Class}] does not implement interface play.api.ApplicationLoader or interface play.ApplicationLoader.]
My service loader:
class LagomPersistentEntityLoader extends LagomApplicationLoader {

  override def load(context: LagomApplicationContext): LagomApplication =
    new LagomPersistentEntityApplication(context) with AkkaDiscoveryComponents

  override def loadDevMode(context: LagomApplicationContext): LagomApplication =
    new LagomPersistentEntityApplication(context) with LagomDevModeComponents

  override def describeService: Option[Descriptor] = Some(readDescriptor[LagomTestingEntity])
}

trait UserComponents
  extends LagomServerComponents
  with SlickPersistenceComponents
  with HikariCPComponents
  with AhcWSComponents {

  override lazy val jsonSerializerRegistry: JsonSerializerRegistry = UserSerializerRegistry

  lazy val userRepo: UserRepository = wire[UserRepository]
  readSide.register(wire[UserEventsProcessor])

  clusterSharding.init(
    Entity(UserState.typeKey) { entityContext =>
      UserBehaviour(entityContext)
    }
  )
}

abstract class LagomPersistentEntityApplication(context: LagomApplicationContext)
  extends LagomApplication(context)
  with UserComponents {

  implicit lazy val actorSystemImpl: ActorSystem = actorSystem
  implicit lazy val ec: ExecutionContext = executionContext

  override lazy val lagomServer: LagomServer = serverFor[LagomTestingEntity](wire[LagomTestingEntityImpl])

  lazy val bodyParserDefault: Default = wire[Default]
}
application.conf:
play.application.loader = org.organization.service.LagomTestingEntityImpl
lagom-persistent-entity.cassandra.keyspace = lagom-persistent-entity
cassandra-journal.keyspace = ${lagom-persistent-entity.cassandra.keyspace}
cassandra-snapshot-store.keyspace = ${lagom-persistent-entity.cassandra.keyspace}
lagom.persistent.read-side.cassandra.keyspace = ${lagom-persistent-entity.cassandra.keyspace}
This service has both read-side and write-side support. If anyone needs more info on the code, please ask, because I really want to understand where the application is failing.
You should set play.application.loader to the name of your loader class, rather than your persistent entity class:
play.application.loader = org.organization.service.LagomPersistentEntityLoader
I'm assuming that LagomPersistentEntityLoader is in the same package as LagomTestingEntityImpl. If not, then adjust the fully-qualified class name as needed.

How to unit test gauge metrics in Flink

I have a DataStream in Flink and I generate my own metrics using a gauge in a ProcessFunction.
As these metrics are important for my activity, I would like to unit test them once the flow has been executed.
Unfortunately, I didn't find a way to implement a proper test reporter.
Here is a simple piece of code illustrating my issue.
I have two concerns with this code:
how do I trigger the gauge?
how do I get the reporter instantiated by env.execute?
Here is the sample:
import java.util.concurrent.atomic.AtomicInteger

import org.apache.flink.api.scala.metrics.ScalaGauge
import org.apache.flink.configuration.{ConfigConstants, Configuration}
import org.apache.flink.metrics.reporter.AbstractReporter
import org.apache.flink.metrics.{Gauge, Metric, MetricConfig}
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.functions.sink.SinkFunction
import org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment, _}
import org.apache.flink.util.Collector
import org.scalatest.FunSuite
import org.scalatest.Matchers._
import org.scalatest.PartialFunctionValues._

import scala.collection.JavaConverters._
import scala.collection.mutable

/* Test based on Flink test example https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/testing.html */
class MultiplyByTwo extends ProcessFunction[Long, Long] {

  override def processElement(data: Long, context: ProcessFunction[Long, Long]#Context, collector: Collector[Long]): Unit = {
    collector.collect(data * 2L)
  }

  val nbrCalls = new AtomicInteger(0)

  override def open(parameters: Configuration): Unit = {
    getRuntimeContext.getMetricGroup
      .addGroup("counter")
      .gauge[Int, ScalaGauge[Int]]("call", ScalaGauge[Int](() => nbrCalls.get()))
  }
}

// create a testing sink
class CollectSink extends SinkFunction[Long] {
  override def invoke(value: Long): Unit = {
    synchronized {
      CollectSink.values.add(value)
    }
  }
}

object CollectSink {
  val values: java.util.ArrayList[Long] = new java.util.ArrayList[Long]()
}

class StackOverflowTestReporter extends AbstractReporter {

  var gaugesMetrics: mutable.Map[String, String] = mutable.Map[String, String]()

  override def open(metricConfig: MetricConfig): Unit = {}

  override def close(): Unit = {}

  override def filterCharacters(s: String): String = s

  def report(): Unit = {
    gaugesMetrics = this.gauges.asScala.map(t => (metricValue(t._1), t._2))
  }

  private def metricValue(m: Metric): String = {
    m match {
      case g: Gauge[_] => g.getValue.toString
      case _ => ""
    }
  }
}

class StackOverflowTest extends FunSuite with StreamingMultipleProgramsTestBase {

  def createConfigForReporter(reporterName: String): Configuration = {
    val cfg: Configuration = new Configuration()
    cfg.setString(ConfigConstants.METRICS_REPORTER_PREFIX + reporterName + "." + ConfigConstants.METRICS_REPORTER_CLASS_SUFFIX, classOf[StackOverflowTestReporter].getName)
    cfg
  }

  test("test_metrics") {
    val env = StreamExecutionEnvironment.createLocalEnvironment(
      StreamExecutionEnvironment.getDefaultLocalParallelism,
      createConfigForReporter("reporter"))

    // configure your test environment
    env.setParallelism(1)
    env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)

    // values are collected in a static variable
    CollectSink.values.clear()

    // create a stream of custom elements and apply transformations
    env.fromElements[Long](1L, 21L, 22L)
      .process(new MultiplyByTwo())
      .addSink(new CollectSink())

    // execute
    env.execute()

    // verify your results
    CollectSink.values should have length 3
    CollectSink.values should contain (2L)
    CollectSink.values should contain (42L)
    CollectSink.values should contain (44L)

    // verify gauge counter
    // pseudo code ...
    val testReporter: StackOverflowTestReporter = _ // how to get testReporter instantiate in env
    testReporter.gaugesMetrics should have size 1
    testReporter.gaugesMetrics should contain key "count.call"
    testReporter.gaugesMetrics.valueAt("count.call") should be equals("3")
  }
}
Solution, thanks to Chesnay Schepler:
import java.util.concurrent.atomic.AtomicInteger

import org.apache.flink.api.common.time.Time
import org.apache.flink.api.scala.metrics.ScalaGauge
import org.apache.flink.configuration.{ConfigConstants, Configuration}
import org.apache.flink.metrics.reporter.MetricReporter
import org.apache.flink.metrics.{Metric, MetricConfig, MetricGroup}
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.functions.sink.SinkFunction
import org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment, _}
import org.apache.flink.test.util.MiniClusterResource
import org.apache.flink.util.Collector
import org.scalatest.Matchers._
import org.scalatest.PartialFunctionValues._
import org.scalatest.{BeforeAndAfterAll, FunSuite}

import scala.collection.mutable

/* Test based on Flink test example https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/testing.html */
class MultiplyByTwo extends ProcessFunction[Long, Long] {

  override def processElement(data: Long, context: ProcessFunction[Long, Long]#Context, collector: Collector[Long]): Unit = {
    nbrCalls.incrementAndGet()
    collector.collect(data * 2L)
  }

  val nbrCalls = new AtomicInteger(0)

  override def open(parameters: Configuration): Unit = {
    getRuntimeContext.getMetricGroup
      .addGroup("counter")
      .gauge[Int, ScalaGauge[Int]]("call", ScalaGauge[Int](() => nbrCalls.get()))
  }
}

// create a testing sink
class CollectSink extends SinkFunction[Long] {
  import CollectSink._

  override def invoke(value: Long): Unit = {
    synchronized {
      values.add(value)
    }
  }
}

object CollectSink {
  val values: java.util.ArrayList[Long] = new java.util.ArrayList[Long]()
}

class StackOverflowTestReporter extends MetricReporter {
  import StackOverflowTestReporter._

  override def open(metricConfig: MetricConfig): Unit = {}

  override def close(): Unit = {}

  override def notifyOfAddedMetric(metric: Metric, metricName: String, group: MetricGroup): Unit = {
    metric match {
      case gauge: ScalaGauge[_] => {
        // drop the group metrics that are meaningless for the test; it seems to be the first 6 scope components
        val gaugeKey = group.getScopeComponents.toSeq.drop(6).mkString(".") + "." + metricName
        gaugesMetrics(gaugeKey) = gauge.asInstanceOf[ScalaGauge[Int]]
      }
      case _ =>
    }
  }

  override def notifyOfRemovedMetric(metric: Metric, metricName: String, group: MetricGroup): Unit = {}
}

object StackOverflowTestReporter {
  var gaugesMetrics: mutable.Map[String, ScalaGauge[Int]] = mutable.Map[String, ScalaGauge[Int]]()
}

class StackOverflowTest extends FunSuite with BeforeAndAfterAll {

  val miniClusterResource: MiniClusterResource = buildMiniClusterResource()

  override def beforeAll(): Unit = {
    CollectSink.values.clear()
    StackOverflowTestReporter.gaugesMetrics.clear()
    miniClusterResource.before()
  }

  override def afterAll(): Unit = {
    miniClusterResource.after()
  }

  def createConfigForReporter(): Configuration = {
    val cfg: Configuration = new Configuration()
    cfg.setString(ConfigConstants.METRICS_REPORTER_PREFIX + "reporter" + "." + ConfigConstants.METRICS_REPORTER_CLASS_SUFFIX, classOf[StackOverflowTestReporter].getName)
    cfg
  }

  def buildMiniClusterResource(): MiniClusterResource = new MiniClusterResource(
    new MiniClusterResource.MiniClusterResourceConfiguration(
      createConfigForReporter(), 1, 1, Time.milliseconds(50L)))

  test("test_metrics") {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    env.fromElements[Long](1L, 21L, 22L)
      .process(new MultiplyByTwo())
      .addSink(new CollectSink())

    env.execute()

    CollectSink.values should have length 3
    CollectSink.values should contain (2L)
    CollectSink.values should contain (42L)
    CollectSink.values should contain (44L)

    // verify gauge counter
    val gaugeValues = StackOverflowTestReporter.gaugesMetrics.map(t => (t._1, t._2.getValue()))
    gaugeValues should have size 1
    gaugeValues should contain ("counter.call" -> 3)
  }
}
Your best bet is to use a MiniClusterResource to explicitly start a cluster before the job, and to configure a reporter that checks for specific metrics and exposes them through static fields.
@Rule
public final MiniClusterResource clusterResource = new MiniClusterResource(
    new MiniClusterResourceConfiguration.Builder()
        .setConfiguration(getConfig())
        .build());

private static Configuration getConfig() {
    Configuration config = new Configuration();
    config.setString(
        ConfigConstants.METRICS_REPORTER_PREFIX +
            "myTestReporter." +
            ConfigConstants.METRICS_REPORTER_CLASS_SUFFIX,
        MyTestReporter.class.getName());
    return config;
}

public static class MyTestReporter implements MetricReporter {
    static volatile Gauge<?> myGauge = null;

    @Override
    public void open(MetricConfig metricConfig) {
    }

    @Override
    public void close() {
    }

    @Override
    public void notifyOfAddedMetric(Metric metric, String name, MetricGroup metricGroup) {
        if ("myMetric".equals(name)) {
            myGauge = (Gauge<?>) metric;
        }
    }

    @Override
    public void notifyOfRemovedMetric(Metric metric, String s, MetricGroup metricGroup) {
    }
}

Getting an execution exception for a custom Slick profile

I'm getting an exception when I try to use my own custom profile with Slick. The reason I want to use it is that I want to keep JSON in my PostgreSQL database. Therefore, I'm using slick-pg.
The exception says:
slick.jdbc.PostgresProfile$ cannot be cast to util.ExtendedPostgresProfile.
This is my code for the ExtendedPostgresProfile:
package util

import com.github.tminglei.slickpg._

trait ExtendedPostgresProfile extends ExPostgresProfile with PgPlayJsonSupport {
  override val api = new API with PlayJsonImplicits
  override def pgjson: String = "jsonb"
}

object ExtendedPostgresProfile extends ExtendedPostgresProfile
This is my DAO class:
class ActivityDAO @Inject()(dbConfigProvider: DatabaseConfigProvider)(implicit ec: ExecutionContext) {

  private val dbConfig = dbConfigProvider.get[ExtendedPostgresProfile]

  import dbConfig._
  import profile.api._

  private class ActivityTable(tag: Tag) extends Table[Activity](tag, "activity") {
    def id: Rep[Long] = column[Long]("id", O.PrimaryKey, O.AutoInc)
    def activity: Rep[JsValue] = column[JsValue]("activity")
    def atTime: Rep[Timestamp] = column[Timestamp]("at_time")
    def activityTypeId: Rep[Int] = column[Int]("activiry_type_id")
    def userId: Rep[Long] = column[Long]("user_id")

    override def * : ProvenShape[Activity] =
      (id.?, activity, atTime.?, activityTypeId, userId.?) <> ((Activity.apply _).tupled, Activity.unapply)
  }

  private val activities = TableQuery[ActivityTable]

  def add(activity: Activity): Future[Long] = {
    val query = activities returning activities.map(_.id)
    db.run(query += activity)
  }

  def filter(userId: Long): Future[Seq[Activity]] = {
    db.run(activities.filter(_.userId === userId).result)
  }
}
I've tried searching for the answer myself, but haven't had much luck.
Is your custom profile configured in your play-slick configuration, as suggested in the Database Configuration section of the play-slick docs? That is, does it point to util.ExtendedPostgresProfile$ or to slick.jdbc.PostgresProfile$?
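For reference, a typical play-slick configuration pointing at a custom profile looks roughly like this (a sketch; the driver, URL and credentials are placeholders for your own settings):
slick.dbs.default.profile = "util.ExtendedPostgresProfile$"
slick.dbs.default.db.driver = "org.postgresql.Driver"
slick.dbs.default.db.url = "jdbc:postgresql://localhost:5432/mydb"
slick.dbs.default.db.user = "user"
slick.dbs.default.db.password = "password"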