Scala: error: overloaded method value info with alternatives for log4j

New to Scala, trying to compute and log latency for my Kafka records using log4j, but running into errors. I tried to look at some SO articles, but I think I'm missing some Scala concept here. Any help is appreciated.
Solution 1: does not give an error.
val currentTimeInMillis = Instant.now.toEpochMilli
val latency = Math.max(0, currentTimeInMillis - record.timestamp())
logger.info("record latency: {}", latency)
logger.info("record KafkaPartition: {}, record Offset: {}", record.kafkaPartition(), record.kafkaOffset())
Solution 2: this gives an error:
val currentTimeInMillis = Instant.now.toEpochMilli
val latency = Math.max(0, currentTimeInMillis - record.timestamp())
logger.info("record latency: {}, record KafkaPartition: {}, record Offset: {}", latency, record.kafkaPartition(), record.kafkaOffset())
I get the error below for Solution 2:
error: overloaded method value info with alternatives
[ERROR] (x$1: org.slf4j.Marker,x$2: String,x$3: Object*)Unit <and>
[ERROR] (x$1: org.slf4j.Marker,x$2: String,x$3: Any,x$4: Any)Unit <and>
[ERROR] (x$1: String,x$2: Object*)Unit
[ERROR] cannot be applied to (String, Long, Integer, Long)
[ERROR] logger.info("record latency: {}, record KafkaPartition: {}, record Offset: {}", latency, record.kafkaPartition(), record.kafkaOffset())
[ERROR] ^
[ERROR] one error found

I've run into this before with Scala. Adding .toString to log arguments that aren't Strings will fix the problem.
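For example, the Solution 2 call compiles once the non-String arguments are converted, because every argument then matches the (String, Object*) overload. A minimal sketch, using the same logger and record as above:

val currentTimeInMillis = Instant.now.toEpochMilli
val latency = Math.max(0, currentTimeInMillis - record.timestamp())
// All arguments are now Strings (i.e. Objects), so the varargs overload applies unambiguously.
logger.info("record latency: {}, record KafkaPartition: {}, record Offset: {}",
  latency.toString, record.kafkaPartition().toString, record.kafkaOffset().toString)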

Related

Scala: Cannot resolve overloaded methods (Flink WatermarkStrategy)

I'm following Flink's documentation on how to use WatermarkStrategy with KafkaConsumer. The code is shown below
val kafkaSource = new FlinkKafkaConsumer[MyType]("myTopic", schema, props)
kafkaSource.assignTimestampsAndWatermarks(
  WatermarkStrategy
    .forBoundedOutOfOrderness(Duration.ofSeconds(20)))
val stream: DataStream[MyType] = env.addSource(kafkaSource)
Anytime I try to compile the code above I get an error saying
error: overloaded method value assignTimestampsAndWatermarks with alternatives:
[ERROR] (x$1: org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks[String])org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase[String] <and>
[ERROR] (x$1: org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks[String])org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase[String] <and>
[ERROR] (x$1: org.apache.flink.api.common.eventtime.WatermarkStrategy[String])org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase[String]
[ERROR] cannot be applied to (org.apache.flink.api.common.eventtime.WatermarkStrategy[Nothing])
[ERROR] consumer.assignTimestampsAndWatermarks(
The code below returns WatermarkStrategy[Nothing] instead of WatermarkStrategy[String]:
WatermarkStrategy
.forBoundedOutOfOrderness(Duration.ofSeconds(20)))
I solved this by giving the watermark strategy an explicit type annotation:
val kafkaSource = new FlinkKafkaConsumer[MyType]("myTopic", schema, props)
val watermark: WatermarkStrategy[MyType] = WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofSeconds(20))
kafkaSource.assignTimestampsAndWatermarks(watermark)
@Mayokun is right. But to make the code simpler, you could put the type information right after the static method, so the compiler no longer infers WatermarkStrategy[Nothing]:
val kafkaSource = new FlinkKafkaConsumer[MyType]("myTopic", schema, props)
kafkaSource.assignTimestampsAndWatermarks(
  WatermarkStrategy.forBoundedOutOfOrderness[MyType](Duration.ofSeconds(20))
)

Errors in converting JSON to Map in Scala

I am new to Scala and I am trying to write a function that takes in a JSON string, converts it to a Scala dictionary (Map), and checks for certain keys.
Below is part of a function that checks for a bunch of keys:
import play.api.libs.json.Json
def setParams(jsonString: Map[String, Any]) = {
val paramsMap = Json.parse(jsonString)
if (parmsMap.contains("key_1")) {
println('key_1 present')
}
On compiling it with sbt, I get the following errors
/Users/usr/scala_codes/src/main/scala/wrapped_code.scala:29:26: overloaded method value parse with alternatives:
[error] (input: Array[Byte])play.api.libs.json.JsValue <and>
[error] (input: java.io.InputStream)play.api.libs.json.JsValue <and>
[error] (input: String)play.api.libs.json.JsValue
[error] cannot be applied to (Map[String,Any])
[error] val paramsMap = Json.parse(jsonString)
[error] ^
[error] /Users/usr/scala_codes/src/main/scala/wrapped_code.scala:31:9: not found: value parmsMap
[error] if (parmsMap.contains("key_1")) {
Also, in key-value pairs of the JSON, the keys are all strings but the values could be integers, floats or strings. Do I need to make changes for that?
It seems the input type of your setParams function should be String, not Map[String, Any].
You also have a typo: if (parmsMap.contains("key_1")) should be if (paramsMap.contains("key_1")).
Corrected function:
import play.api.libs.json.{JsValue, Json}

def setParams(jsonString: String): Unit = {
  val paramsMap = Json.parse(jsonString).as[Map[String, JsValue]]
  if (paramsMap.contains("key_1")) println("key_1 present")
}
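On the second part of the question: since the values may be integers, floats, or strings, one option is to keep them as JsValue and convert only where a key is read. A small sketch using play-json's asOpt (the JSON literal and key names here are made up for illustration):

import play.api.libs.json.{JsValue, Json}

val paramsMap: Map[String, JsValue] =
  Json.parse("""{"key_1": 1, "key_2": 2.5, "key_3": "abc"}""").as[Map[String, JsValue]]

// asOpt returns None if the key is absent or the value has a different type
val intValue: Option[Int] = paramsMap.get("key_1").flatMap(_.asOpt[Int])
val doubleValue: Option[Double] = paramsMap.get("key_2").flatMap(_.asOpt[Double])
val stringValue: Option[String] = paramsMap.get("key_3").flatMap(_.asOpt[String])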

Scala Spark type mismatch

I need to group my RDD by two columns and aggregate the count. I have a function:
def constructDiagnosticFeatureTuple(diagnostic: RDD[Diagnostic]): RDD[FeatureTuple] = {
  val grouped_patients = diagnostic
    .groupBy(x => (x.patientID, x.code))
    .map(_._2)
    .map { events =>
      val p_id = events.map(_.patientID).take(1).mkString
      val f_code = events.map(_.code).take(1).mkString
      val count = events.size.toDouble
      ((p_id, f_code), count)
    }
  //should be in form:
  //diagnostic.sparkContext.parallelize(List((("patient", "diagnostics"), 1.0)))
}
At compile time, I am getting an error:
/FeatureConstruction.scala:38:3: type mismatch;
[error] found : Unit
[error] required: org.apache.spark.rdd.RDD[edu.gatech.cse6250.features.FeatureConstruction.FeatureTuple]
[error] (which expands to) org.apache.spark.rdd.RDD[((String, String), Double)]
[error] }
[error] ^
How can I fix it?
I read this post: Scala Spark type missmatch found Unit, required rdd.RDD, but I do not use collect(), so it does not help me.
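The body of constructDiagnosticFeatureTuple ends with a val definition, and a definition has type Unit, which is why the compiler reports found: Unit, required: RDD[FeatureTuple]. A minimal sketch of one way to fix it, assuming FeatureTuple is ((String, String), Double) as the error message expands it; since the group key already carries patientID and code, the take(1).mkString step can be dropped:

def constructDiagnosticFeatureTuple(diagnostic: RDD[Diagnostic]): RDD[FeatureTuple] = {
  // The last expression is the RDD itself, so it becomes the function's result.
  diagnostic
    .groupBy(x => (x.patientID, x.code))
    .map { case ((patientID, code), events) =>
      ((patientID, code), events.size.toDouble)
    }
}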

How to create a key-value RDD out of Kafka topic data

I am reading data from a Kafka topic in a Spark Streaming job. I need to create a key-value RDD out of the data.
val messages = KafkaUtils.createStream(streamingContext, "localhost:2181","abc",topics, StorageLevel.MEMORY_ONLY)
messages.print()
Create a key-value RDD out of customerId and tokens:
val xactionByCustomer = messages.map(_._2).map { transaction =>
  val key = transaction.customerId
  var tokens = transaction.tokens
  (key, tokens)
}
Error:
[error] /home/ec2-user/alok/marseille/src/main/scala/com/jcalc/feed/MarkovPredictor.scala:115: value customerId is not a member of String
[error] val key = transaction.customerId
[error] ^
[error] /home/ec2-user/alok/marseille/src/main/scala/com/jcalc/feed/MarkovPredictor.scala:116: value tokens is not a member of String
[error] var tokens = transaction.tokens
[error] ^
[error] two errors found
[error] (compile:compileIncremental) Compilation failed
Sample data:
(null,W3Q6TF3CCI,X84N230CIH,NNN)
(null,O8IV7KEXT0,G1D590G05V,NNS)
(null,LBQKYNE081,MYU0O7JC5H,NHN)
(null,SRB4P501SW,E0FTI4RN7X,LHL)
(null,HELRFMAXVS,W6F704TN21,LHN)
(null,FS4PLQLI63,TK1O9YHS15,NNN)
(null,KI70UDVJLC,4ANBDAW7SU,LNN)
(null,IP6IVPGCWQ,MD93GGGBKA,NNN)
(null,976N9RPXSP,JKU0SV7UMH,LNL)
(null,J4V3AB1YVT,J9WXC1BRAY,LHN)
I am interested in the 2nd & 4th values only for the pair RDD.
Any help?
Your data looks like a tuple (String, String, String, String), and since you're interested in the 2nd & 4th values, mapping:
val xactionByCustomer = messages.map(row => (row._2, row._4))
should be enough.
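If messages is the (key, value) DStream returned by KafkaUtils.createStream, the compile errors show that the value is a plain String rather than a case class, so the fields have to be split out of it. A sketch under the assumption that the leading null in the printed sample is the Kafka message key, so each value looks like "W3Q6TF3CCI,X84N230CIH,NNN":

val xactionByCustomer = messages.map(_._2).map { value =>
  val fields = value.split(",")
  // fields(0) and fields(2) correspond to the 2nd and 4th values of the printed (key, value) records
  (fields(0), fields(2))
}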

Use SQL in DStream.transform() over Spark Streaming?

There are some examples of using SQL over Spark Streaming in foreachRDD(). But what if I want to use SQL in transform():
case class AlertMsg(host:String, count:Int, sum:Double)
val lines = ssc.socketTextStream("localhost", 8888)
lines.transform( rdd => {
  if (rdd.count > 0) {
    val t = sqc.jsonRDD(rdd)
    t.registerTempTable("logstash")
    val sqlreport = sqc.sql("SELECT host, COUNT(host) AS host_c, AVG(lineno) AS line_a FROM logstash WHERE path = '/var/log/system.log' AND lineno > 70 GROUP BY host ORDER BY host_c DESC LIMIT 100")
    sqlreport.map(r => AlertMsg(r(0).toString,r(1).toString.toInt,r(2).toString.toDouble))
  } else {
    rdd
  }
}).print()
I get this error:
[error] /Users/raochenlin/Downloads/spark-1.2.0-bin-hadoop2.4/logstash/src/main/scala/LogStash.scala:52: no type parameters for method transform: (transformFunc: org.apache.spark.rdd.RDD[String] => org.apache.spark.rdd.RDD[U])(implicit evidence$5: scala.reflect.ClassTag[U])org.apache.spark.streaming.dstream.DStream[U] exist so that it can be applied to arguments (org.apache.spark.rdd.RDD[String] => org.apache.spark.rdd.RDD[_ >: LogStash.AlertMsg with String <: java.io.Serializable])
[error] --- because ---
[error] argument expression's type is not compatible with formal parameter type;
[error] found : org.apache.spark.rdd.RDD[String] => org.apache.spark.rdd.RDD[_ >: LogStash.AlertMsg with String <: java.io.Serializable]
[error] required: org.apache.spark.rdd.RDD[String] => org.apache.spark.rdd.RDD[?U]
[error] lines.transform( rdd => {
[error] ^
[error] one error found
[error] (compile:compile) Compilation failed
It seems it only compiles if I use sqlreport.map(r => r.toString). Is that the correct usage?
dstream.transform takes a function transformFunc: (RDD[T]) ⇒ RDD[U].
In this case, the if must produce the same type in both branches, which is not the case:
if (count == 0) => RDD[String]
if (count > 0) => RDD[AlertMsg]
So remove the if rdd.count ... optimization so that you have a single transformation path.
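A sketch of that single transformation path, reusing the question's sqc, AlertMsg, and query (under the assumption that every batch contains parseable JSON, since the empty-RDD branch is gone):

lines.transform { rdd =>
  // Single code path: the result is always RDD[AlertMsg], so type inference succeeds.
  val t = sqc.jsonRDD(rdd)
  t.registerTempTable("logstash")
  val sqlreport = sqc.sql(
    "SELECT host, COUNT(host) AS host_c, AVG(lineno) AS line_a FROM logstash " +
    "WHERE path = '/var/log/system.log' AND lineno > 70 " +
    "GROUP BY host ORDER BY host_c DESC LIMIT 100")
  sqlreport.map(r => AlertMsg(r(0).toString, r(1).toString.toInt, r(2).toString.toDouble))
}.print()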