Scala hex literal for bytes

Hex literals containing the digits A-F are typed as Int by default. When I declare an Int with a 0x literal, it works correctly:
val a: Int = 0x34
val b: Int = 0xFF
But when I try to declare a Byte the same way, the second line does not compile:
val a: Byte = 0x34
val b: Byte = 0xFF // compilation error
I have found a workaround:
val a: Byte = 0x34
val b: Byte = 0xFF.toByte
But is there a cleaner way to declare a Byte from its hex literal?
For example, I am trying to declare a Byte array in a test method this way:
anObject.someMethod(1, 1.1f, 0xAB, "1") shouldBe
Array[Byte](0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF, 0xAF)
anObject.someMethod(2, 2.2f, 0xCD, "2") shouldBe
Array[Byte](0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE, 0xBE)
anObject.someMethod(3, 3.2f, 0xEF, "3") shouldBe
Array[Byte](0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD, 0xCD)
But not this way:
anObject.someMethod(1, 1.1f, 0xAB.toByte, "1") shouldBe
Array[Byte](0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte, 0xAF.toByte)
anObject.someMethod(2, 2.2f, 0xCD.toByte, "2") shouldBe
Array[Byte](0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte, 0xBE.toByte)
anObject.someMethod(3, 3.2f, 0xEF.toByte, "3") shouldBe
Array[Byte](0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte, 0xCD.toByte)
Tested with Scala 2.12.4.

The problem is that 0xFF is 255, which does not fit into a Byte (-128 to 127), so the compiler will not narrow the literal automatically. You can work around this with implicit conversions.
Before:
def f(b: Byte) = println(s"Byte = $b")
f(0x34)
f(0xFF) // compilation error
After:
import scala.language.implicitConversions
implicit def int2Byte(i: Int): Byte = i.toByte
def f(b: Byte) = println(s"Byte = $b")
f(0x34)
f(0xFF)
Output:
Byte = 52
Byte = -1
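With the conversion in scope, the Array[Byte] literals from the question should also compile without the explicit .toByte calls. A minimal sketch (shortened array; not verified against the original test suite):
import scala.language.implicitConversions
implicit def int2Byte(i: Int): Byte = i.toByte

// Each Int literal is converted element by element, because Array[Byte] expects Byte values.
val expected: Array[Byte] = Array[Byte](0xAF, 0xBE, 0xCD) // Array(-81, -66, -51)
Keep in mind that a blanket Int-to-Byte conversion silently truncates any Int wherever it is in scope, so it is safest to confine it to test code.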

Recall that in Scala we can easily define new ways to interpret arbitrary String literals by adding methods to the special class StringContext (the "pimp-my-library" pattern). For example, we can add a method b for creating single bytes to StringContext, so that we can write down bytes as follows:
val myByte = b"5C"
Here is how it can be implemented:
implicit class SingleByteContext(private val sc: StringContext) {
  def b(args: Any*): Byte = {
    val parts = sc.parts.toList
    assert(
      parts.size == 1 && args.size == 0,
      "Expected a string literal with exactly one part"
    )
    Integer.parseInt(parts(0), 16).toByte
  }
}
In order to use this new syntax, we have to bring the above implicit class into scope (e.g. import it). Then we can do this:
/* Examples */ {
  def printByte(b: Byte) = println("%02X".format(b))
  printByte(b"01")
  printByte(b"7F")
  printByte(b"FF")
  printByte(b"80")
}
This will print:
01
7F
FF
80
You can obviously tweak the implementation (e.g. rename "b" to "hex", "x", "Ox", or something similar).
Note that this technique can be easily extended to deal with entire byte arrays, as described in this answer to a similar question. This would allow you to write down byte arrays without repeating the annoying 0x-prefix, e.g.:
val myBytes = hexBytes"AB,CD,5E,7F,5A,8C,80,BC"
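For reference, a minimal sketch of such an array interpolator (the name hexBytes and the comma-separated format are assumptions mirroring the example above, not the linked answer):
implicit class ByteArrayContext(private val sc: StringContext) {
  // Interpret the literal as a comma-separated list of two-digit hex bytes.
  def hexBytes(args: Any*): Array[Byte] = {
    assert(sc.parts.size == 1 && args.isEmpty, "Expected a string literal with exactly one part")
    sc.parts.head.split(",").map(s => Integer.parseInt(s.trim, 16).toByte)
  }
}
val myBytes = hexBytes"AB,CD,5E,7F" // Array(-85, -51, 94, 127)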

Related

Unit Testing Apache Spark Application with Intellij Results in Error

I have a Spark application that is supposed to perform a data preparation step. I have some unit tests written to check data quality using deequ, and when I run one of them I get the errors below:
Error while instantiating 'org.apache.spark.sql.internal.SessionStateBuilder':
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.internal.SessionStateBuilder':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1148)
at org.apache.spark.sql.SparkSession.$anonfun$sessionState$2(SparkSession.scala:159)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:155)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:152)
at org.apache.spark.sql.DataFrameReader.<init>(DataFrameReader.scala:997)
at org.apache.spark.sql.SparkSession.read(SparkSession.scala:658)
at com.bigelectrons.housingml.dataprep.HousingDataTest.$anonfun$new$1(HousingDataTest.scala:32)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FlatSpecLike$$anon$1.apply(FlatSpecLike.scala:1682)
at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
at org.scalatest.FlatSpec.withFixture(FlatSpec.scala:1685)
at org.scalatest.FlatSpecLike.invokeWithFixture$1(FlatSpecLike.scala:1680)
at org.scalatest.FlatSpecLike.$anonfun$runTest$1(FlatSpecLike.scala:1692)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
at org.scalatest.FlatSpecLike.runTest(FlatSpecLike.scala:1692)
at org.scalatest.FlatSpecLike.runTest$(FlatSpecLike.scala:1674)
at org.scalatest.FlatSpec.runTest(FlatSpec.scala:1685)
at org.scalatest.FlatSpecLike.$anonfun$runTests$1(FlatSpecLike.scala:1750)
at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:396)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:373)
at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:410)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:379)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
at org.scalatest.FlatSpecLike.runTests(FlatSpecLike.scala:1750)
at org.scalatest.FlatSpecLike.runTests$(FlatSpecLike.scala:1749)
at org.scalatest.FlatSpec.runTests(FlatSpec.scala:1685)
at org.scalatest.Suite.run(Suite.scala:1147)
at org.scalatest.Suite.run$(Suite.scala:1129)
at org.scalatest.FlatSpec.org$scalatest$FlatSpecLike$$super$run(FlatSpec.scala:1685)
at org.scalatest.FlatSpecLike.$anonfun$run$1(FlatSpecLike.scala:1795)
at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
at org.scalatest.FlatSpecLike.run(FlatSpecLike.scala:1795)
at org.scalatest.FlatSpecLike.run$(FlatSpecLike.scala:1793)
at com.bigelectrons.housingml.dataprep.HousingDataTest.org$scalatest$BeforeAndAfterAll$$super$run(HousingDataTest.scala:20)
at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
at com.bigelectrons.housingml.dataprep.HousingDataTest.run(HousingDataTest.scala:20)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1346)
at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1340)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1340)
at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:1031)
at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:1010)
at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1506)
at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1010)
at org.scalatest.tools.Runner$.run(Runner.scala:850)
at org.scalatest.tools.Runner.run(Runner.scala)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2or3(ScalaTestRunner.java:38)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:25)
Caused by: java.lang.IllegalStateException: LiveListenerBus is stopped.
at org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:97)
at org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:80)
at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:99)
at org.apache.spark.sql.SparkSession.$anonfun$sharedState$1(SparkSession.scala:138)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:138)
at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:137)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:335)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1145)
... 60 more
Here is how I get access to a Spark session:
val spark: SparkSession = SparkSession.builder().config("spark.master", "local").appName("housing-data-test").getOrCreate()
Here is my actual code:
"simple unit test" should "check for data correctness" in {
appCfgT match {
case Success(appCfg) =>
preStart()
val rawDF: DataFrame = spark
.read
.format("csv")
.option("delimiter", ",")
.option("timestampFormat", "yyyy/MM/dd HH:mm:ss ZZ")
.option("inferSchema", value = true)
.option("mode", "DROPMALFORMED")
.option("header", value = true)
.option("multiLine", value = true)
.schema(encodedHousingSchema)
.load(appCfg.sourceFileUrl)
DataTestUtils.withSpark { session =>
val rows = session.sparkContext.parallelize(Seq(new HousingModel()))
val data = session.createDataFrame(rows)
println("******************************************************************************")
val verificationResult = VerificationSuite()
.onData(data)
.addCheck(
Check(CheckLevel.Error, "unit testing my data")
.hasSize(_ == 4092) // we expect 4092 rows
.isComplete("id") // should never be NULL
.isUnique("id") // should not contain duplicates
.isComplete("productName") // should never be NULL
// should only contain the values "high" and "low"
.isContainedIn("priority", Array("high", "low"))
.isNonNegative("numViews") // should not contain negative values
// at least half of the descriptions should contain a url
.containsURL("description", _ >= 0.5)
// half of the items should have less than 10 views
.hasApproxQuantile("numViews", 0.5, _ <= 10)
)
.run()
}
case Failure(fail) =>
// TODO: Fail the unit test!
}
}
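For what it's worth, the "LiveListenerBus is stopped" message generally means the underlying SparkContext has already been stopped while something still tries to build session state on top of it (for instance, if a helper like DataTestUtils.withSpark stops a shared session). A sketch of a common per-suite setup, with one local session stopped only in afterAll (a hypothetical layout, not a confirmed fix):
import org.apache.spark.sql.SparkSession
import org.scalatest.{BeforeAndAfterAll, FlatSpec, Matchers}

class HousingDataTest extends FlatSpec with Matchers with BeforeAndAfterAll {

  // One SparkSession for the whole suite, so no test can observe a stopped context.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("housing-data-test")
    .getOrCreate()

  override def afterAll(): Unit = {
    spark.stop()
    super.afterAll()
  }

  "simple unit test" should "check for data correctness" in {
    // ... test body using `spark` ...
  }
}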

JMSConnection serialization fail

Currently I'm building an application that reads messages (transactions in JSON) from a Kafka topic and sends them to IBM MQ in production. I'm having some trouble with serialization of the JMS classes and I'm a bit lost on how to fix it.
My code is:
object DispatcherMqApp extends Serializable {
  private val logger = LoggerFactory.getLogger(this.getClass)
  val config = ConfigFactory.load()

  def inicialize(transactionType: String) = {
    val spark = new SparkConf()
      .setAppName("Dispatcher MQ Categorization")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.streaming.stopGracefullyOnShutDown", "true")
    logger.debug(s"Loading configuration at ${printConfig(config).head} =>\n${printConfig(config)(1)}")
    val kafkaConfig = KafkaConfig.buildFromConfiguration(config, "dispatcher-mq")
    val streamCtx = new StreamingContext(spark, Seconds(kafkaConfig.streamingInterval))
    sys.ShutdownHookThread {
      logger.warn("Stopping the application ...")
      streamCtx.stop(stopSparkContext = true, stopGracefully = true)
      logger.warn("Application Finish with Success !!!")
    }
    val topic = config.getString(s"conf.dispatcher-mq.consumer-topic.$transactionType")
    logger.info(s"Topic: $topic")
    val zkdir = s"${kafkaConfig.zookeeperBaseDir}$transactionType-$topic"
    val kafkaManager = new KafkaManager(kafkaConfig)
    val stream = kafkaManager.createStreaming(streamCtx, kafkaConfig.offset, topic, zkdir)
    val kafkaSink = streamCtx.sparkContext.broadcast(kafkaManager.createProducer())
    val mqConfig = MQConfig(config.getString("conf.mq.name"),
      config.getString("conf.mq.host"),
      config.getInt("conf.mq.port"),
      config.getString("conf.mq.channel"),
      config.getString("conf.mq.queue-manager"),
      config.getInt("conf.mq.retries"),
      config.getString("conf.mq.app-name"),
      Try(config.getString("conf.mq.user")).toOption,
      Try(config.getString("conf.mq.password")).toOption,
      config.getString("conf.dispatcher-mq.send.category_file"))
    val queueConn = new MQService(mqConfig)
    (stream, queueConn, streamCtx, kafkaSink, zkdir)
  }

  def main(args: Array[String]): Unit = {
    val transactionType = args.head
    if (transactionType == "account" | transactionType == "credit") {
      val (messages, queueConn, sc, kafkaSink, zkdir) = inicialize(transactionType)
      val fieldsType = config.getString(s"conf.dispatcher-mq.send.fields.$transactionType")
      val source = config.getString("conf.dispatcher-mq.parameters.source")
      val mqVersion = config.getString(s"conf.dispatcher-mq.parameters.version.$transactionType")
      val topicError = config.getString("conf.kafka.topic_error")
      messages.foreachRDD(rdd => {
        val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
        rdd.map(_._2).filter(_.toUpperCase.contains("BYCATEGORIZER"))
          .foreach(message => {
            val msg: Option[TextMessage] = try {
              Some(queueConn.createOutputMq(message, fieldsType, source, mqVersion))
            } catch {
              case ex: Exception =>
                logger.error(s"[ERROR] input: [[$message]]\n$ex")
                val errorReport = ErrorReport("GENERAL", "DISPATCHER-MQ", transactionType.toString, ex.getMessage, None, Option(ex.toString))
                ErrorReportService.sendError(errorReport, topicError, kafkaSink.value)
                None
            }
            if (msg.nonEmpty) queueConn.submit(msg.get)
          })
        logger.info(s"Save Offset in $zkdir...\n${offsetRanges.toList.to}")
        ZookeeperConn.saveOffsets(zkdir, offsetRanges)
      })
      sc.start()
      sc.awaitTermination()
    } else
      logger.error(s"${args.head} is not a valid argument. ( account or credit ) !!! ")
  }
}
I'm getting an error serializing the JMSConnection, which is created behind the scenes in the createOutputMq method. The error is:
20/09/04 17:21:00 ERROR JobScheduler: Error running job streaming job 1599250860000 ms.0
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2054)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:918)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:917)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:323)
at org.apache.spark.rdd.RDD.foreach(RDD.scala:917)
at br.com.HIDDEN.dispatcher.DispatcherMqApp$$anonfun$main$1.apply(DispatcherMqApp.scala:80)
at br.com.HIDDEN.dispatcher.DispatcherMqApp$$anonfun$main$1.apply(DispatcherMqApp.scala:76)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:227)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:227)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:227)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:226)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.NotSerializableException: com.ibm.msg.client.jms.JmsConnection
Serialization stack:
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
... 30 more
20/09/04 17:21:00 ERROR ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Task not serializable
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2054)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:918)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:917)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:323)
at org.apache.spark.rdd.RDD.foreach(RDD.scala:917)
at br.com.HIDDEN.dispatcher.DispatcherMqApp$$anonfun$main$1.apply(DispatcherMqApp.scala:80)
at br.com.HIDDEN.dispatcher.DispatcherMqApp$$anonfun$main$1.apply(DispatcherMqApp.scala:76)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:227)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:227)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:227)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:226)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.NotSerializableException: com.ibm.msg.client.jms.JmsConnection
Serialization stack:
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
... 30 more
Does anybody have an idea how to fix it? The lines shown in the error message (76 and 80) are my messages.foreachRDD(rdd => { and .foreach(message => { lines, respectively.
Thanks in advance.
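For what it's worth, a common way to avoid shipping a non-serializable connection into a Spark closure is to construct it on the executors, e.g. once per partition, instead of capturing the driver-side instance. A rough sketch, assuming a serializable MQConfig (and the other captured vals) are available where the RDD is processed; the error reporting from the original code is omitted:
// Build the MQ connection inside foreachPartition so the driver-side MQService
// (and its JmsConnection) is never pulled into the serialized closure.
rdd.map(_._2).filter(_.toUpperCase.contains("BYCATEGORIZER"))
  .foreachPartition { messages =>
    val queueConn = new MQService(mqConfig) // created on the executor, never serialized
    messages.foreach { message =>
      scala.util.Try(queueConn.createOutputMq(message, fieldsType, source, mqVersion))
        .toOption
        .foreach(queueConn.submit)
    }
  }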

scala.MatchError Message whenever I run Scala Object

The following piece of code is part of a Twitter streaming app that I'm using with Spark Streaming:
val Array(consumerKey, consumerSecret, accessToken, accessTokenSecret) = args.take(4)
val filters = args.takeRight(args.length - 4)
// Set the system properties so that Twitter4j library used by twitter stream
// can use them to generate OAuth credentials
System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
System.setProperty("twitter4j.oauth.accessToken", accessToken)
System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)
Whenever I go to run the program, I get the following error:
Exception in thread "main" scala.MatchError: [Ljava.lang.String;@323659f8 (of class [Ljava.lang.String;)
at SparkPopularHashTags$.main(SparkPopularHashTags.scala:18)
at SparkPopularHashTags.main(SparkPopularHashTags.scala)
Line 18 is:
val Array(consumerKey, consumerSecret, accessToken, accessTokenSecret) = args.take(4)
I have the Twitter4j.properties file saved in my F:\Software\ItelliJ\Projects\twitterStreamApp\src folder, and it's formatted like so:
oauth.consumerKey=***
oauth.consumerSecret=***
oauth.accessToken=***
oauth.accessTokenSecret=***
Where the "*"s are my keys without quotations around them (i.e. oauth.consumerKey=h12b31289fh7139fbh138ry)
Can anyone assist me with this please?
import org.apache.spark.streaming.{ Seconds, StreamingContext }
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.twitter._
import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.{ SparkContext, SparkConf }
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume._
import twitter4j.auth.OAuthAuthorization
import twitter4j.conf.ConfigurationBuilder

object SparkPopularHashTags {
  val conf = new SparkConf().setMaster("local[4]").setAppName("Spark Streaming - PopularHashTags")
  val sc = new SparkContext(conf)

  def main(args: Array[String]) {
    sc.setLogLevel("WARN")
    val Array(consumerKey, consumerSecret, accessToken, accessTokenSecret) = args.take(4)
    // val filters = args.takeRight(args.length - 4)
    args.lift(0).foreach { consumerKey =>
      System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
    }
    args.lift(1).foreach { consumerSecret =>
      System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
    }
    args.lift(2).foreach { accessToken =>
      System.setProperty("twitter4j.oauth.accessToken", accessToken)
    }
    args.lift(3).foreach { accessTokenSecret =>
      System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)
    }
    val filters = args.drop(4)
    // Set the system properties so that Twitter4j library used by twitter stream
    // can use them to generate OAuth credentials
    // System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
    // System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
    // System.setProperty("twitter4j.oauth.accessToken", accessToken)
    // System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)
    // Set the Spark StreamingContext to create a DStream for every 5 seconds
    val ssc = new StreamingContext(sc, Seconds(5))
    val stream = TwitterUtils.createStream(ssc, None, filters)
    // Split the stream on space and extract hashtags
    val hashTags = stream.flatMap(status => status.getText.split(" ").filter(_.startsWith("#")))
    // Get the top hashtags over the previous 60 sec window
    val topCounts60 = hashTags.map((_, 1)).reduceByKeyAndWindow(_ + _, Seconds(60))
      .map { case (topic, count) => (count, topic) }
      .transform(_.sortByKey(false))
    // Get the top hashtags over the previous 10 sec window
    val topCounts10 = hashTags.map((_, 1)).reduceByKeyAndWindow(_ + _, Seconds(10))
      .map { case (topic, count) => (count, topic) }
      .transform(_.sortByKey(false))
    // print tweets in the correct DStream
    stream.print()
    // Print popular hashtags
    topCounts60.foreachRDD(rdd => {
      val topList = rdd.take(10)
      println("\nPopular topics in last 60 seconds (%s total):".format(rdd.count()))
      topList.foreach { case (count, tag) => println("%s (%s tweets)".format(tag, count)) }
    })
    topCounts10.foreachRDD(rdd => {
      val topList = rdd.take(10)
      println("\nPopular topics in last 10 seconds (%s total):".format(rdd.count()))
      topList.foreach { case (count, tag) => println("%s (%s tweets)".format(tag, count)) }
    })
    ssc.start()
    ssc.awaitTermination()
  }
}
This is the problem:
val Array(consumerKey, consumerSecret, accessToken, accessTokenSecret) = args.take(4)
This will fail if there are fewer than 4 arguments because it can't match the four values on the left hand side.
Instead, you need to test the elements of args individually to make sure they are present. For example
args.lift(0).foreach { consumerKey =>
  System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
}
args.lift(1).foreach { consumerSecret =>
  System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
}
args.lift(2).foreach { accessToken =>
  System.setProperty("twitter4j.oauth.accessToken", accessToken)
}
args.lift(3).foreach { accessTokenSecret =>
  System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)
}
val filters = args.drop(4)
This should happen only when you are not setting your program arguments, or are setting an insufficient number of them, i.e. fewer than 4.
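Alternatively (just a sketch along the same lines, not from the original answer), you can keep the destructuring but match on the whole array, so that a short argument list fails with a clear message instead of a MatchError:
args match {
  case Array(consumerKey, consumerSecret, accessToken, accessTokenSecret, filters @ _*) =>
    System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
    System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
    System.setProperty("twitter4j.oauth.accessToken", accessToken)
    System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)
    // filters holds any remaining arguments
  case _ =>
    sys.error("Expected at least 4 arguments: consumerKey consumerSecret accessToken accessTokenSecret [filters...]")
}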

java.lang.VerifyError: Bad type on operand stack using Scala Pickle

The following code:
import scala.pickling.{FastTypeTag, Pickler, Unpickler}
import scala.pickling.binary._
import scala.pickling.Defaults._
class Serializer[T : Pickler : FastTypeTag] {
  def serialize(data: T): Array[Byte] = {
    Foo.bar(data)
  }
}
object Foo {
  def bar[T: Pickler: FastTypeTag](t: T): Array[Byte] = t.pickle.value
  def unbar[T: Unpickler: FastTypeTag](bytes: Array[Byte]): T = bytes.unpickle[T]
}
class Message(message: String)
implicit object messageSerializer extends Serializer[Message]
def test[A: Pickler: FastTypeTag: Serializer](message: A): Array[Byte] = {
  implicitly[Serializer[A]].serialize(message)
}
val message = new Message("message")
test(message)
Evaluates to:
import scala.pickling.{FastTypeTag, Pickler, Unpickler}
import scala.pickling.binary._
import scala.pickling.Defaults._
defined class Serializer
defined module Foo
defined class Message
defined module messageSerializer
test: test[A](val message: A)(implicit <synthetic> val evidence$10: scala.pickling.Pickler[A],implicit <synthetic> val evidence$11: pickling.FastTypeTag[A],implicit <synthetic> val evidence$12: Serializer[A]) => Array[Byte]
message: Message = com.impresign.hub.core.A$A12$A$A12$Message@6c7df044
java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
com/impresign/hub/core/A$A12$A$A12$messageSerializer$.<init>(Lcom/impresign/hub/core/A$A12$A$A12;)V #212: invokespecial
Reason:
Type uninitializedThis (current frame, stack[3]) is not assignable to 'com/impresign/hub/core/A$A12$A$A12$messageSerializer$'
Current Frame:
bci: #212
flags: { flagThisUninit }
locals: { uninitializedThis, 'com/impresign/hub/core/A$A12$A$A12', 'scala/runtime/VolatileObjectRef', 'scala/Tuple2' }
stack: { uninitializedThis, 'com/impresign/hub/core/A$A12$A$A12', 'scala/Predef$', uninitializedThis, 'scala/runtime/VolatileObjectRef', 'com/impresign/hub/core/A$A12$A$A12' }
Bytecode:
0x0000000: 2a2b b200 2db8 0031 4db2 0036 b600 3ab9
0x0000010: 0040 0100 b900 4601 0099 00c5 bb00 4859
0x0000020: b200 36b6 003a b900 4001 00b2 004d 124f
0x0000030: b200 54b6 0058 b900 5e01 00b9 0062 0200
0x0000040: b200 36b6 003a b900 4001 00b2 004d 124f
0x0000050: b200 54b6 0058 b900 5e01 00b9 0065 0200
0x0000060: b700 684e 2dc6 006c 2db6 006c c000 6e3a
0x0000070: 042d b600 71c0 006e 3a05 1904 c100 7399
0x0000080: 0052 1904 c000 733a 0619 06b6 0076 c000
0x0000090: 783a 0719 05c1 0073 9900 3919 05c0 0073
0x00000a0: 3a08 1908 b600 76c0 007a 3a09 1907 1909
0x00000b0: 3a0a 59c7 000c 5719 0ac6 000e a700 1519
0x00000c0: 0ab6 0080 9900 0d19 07c0 0082 3a0b a700
0x00000d0: 0b2a 2c2b b700 843a 0b19 0ba7 0009 2a2c
0x00000e0: 2bb7 0084 b600 88c0 0078 b200 4d12 4fb2
0x00000f0: 0054 b600 58b7 008b b1
Stackmap Table:
full_frame(#191,{UninitializedThis,Object[#147],Object[#10],Object[#72],Object[#110],Object[#110],Object[#115],Object[#120],Object[#115],Object[#122],Object[#122]},{UninitializedThis,Object[#147],Object[#41],Object[#120]})
full_frame(#199,{UninitializedThis,Object[#147],Object[#10],Object[#72],Object[#110],Object[#110],Object[#115],Object[#120],Object[#115],Object[#122],Object[#122]},{UninitializedThis,Object[#147],Object[#41]})
full_frame(#209,{UninitializedThis,Object[#147],Object[#10],Object[#72]},{UninitializedThis,Object[#147],Object[#41]})
full_frame(#217,{UninitializedThis,Object[#147],Object[#10],Object[#72],Top,Top,Top,Top,Top,Top,Top,Object[#130]},{UninitializedThis,Object[#147],Object[#41]})
full_frame(#222,{UninitializedThis,Object[#147],Object[#10]},{UninitializedThis,Object[#147],Object[#41]})
full_frame(#228,{UninitializedThis,Object[#147],Object[#10]},{UninitializedThis,Object[#147],Object[#41],Object[#130]})
Output exceeds cutoff limit.
I've never seen anything like it. I'd like to use implicit Serializers with Scala Pickling 0.10 under the hood. Is this achievable? Scala 2.11.8.
If you change your implicit object declaration to a val,
implicit val mssgSerializer: Serializer[Message] = new Serializer[Message]
everything works nicely. As @laughedelic's comment mentions, this is probably due to some funny interaction between macros and object initialization.
Tested with Scala 2.11.8 and Scala Pickling 0.10.1.

Gatling - value check is not a member of io.gatling.http.request.builder.Http

I'm trying to capture the jsessionid from the string
jsessionid=an121kj533n232j53531314353.tomcat_17221212_1101
in Gatling 2.2.3 using the following function
.check(regex(""jsessionid=\"(.*?).tomcat"").saveAs("jsessid"))
I'm getting the following error during compilation:
[ERROR] i.g.c.ZincCompiler$ - C:\Users\xxx\Desktop\gatling-2.2.3\user-files\simulations\RecordedSimulation.scala:31: value check is not a member of io.gatling.http.request.builder.Http possible cause: maybe a semicolon is missing before 'value check'?
15:09:05.780 [ERROR] i.g.c.ZincCompiler$ - .check(regex("\"jsessionid=\"(.*?).tomcat").saveAs("jsessid"))
I've added the code for a better presentation of the issue. I am new to Scala/Gatling; any help is really appreciated. Actual code:
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import io.gatling.jdbc.Predef._
class RecordedSimulation extends Simulation {

  val httpProtocol = http
    .baseURL("https://XXXXX.net:9191")
    .inferHtmlResources(BlackList(""".*\.js""", """.*\.css""", """.*\.gif""", """.*\.jpeg""", """.*\.jpg""", """.*\.ico""", """.*\.woff""", """.*\.(t|o)tf""", """.*\.png"""), WhiteList())
    .acceptHeader("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
    .acceptEncodingHeader("gzip, deflate")
    .acceptLanguageHeader("en-US,en;q=0.5")
    .userAgentHeader("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0")

  val headers_2 = Map("Accept" -> "image/png,image/*;q=0.8,*/*;q=0.5")

  val uri1 = "https://login-qa.XXXX.net:443/minderagent/SmMakeCookie.ccc"
  val uri2 = "https://login-qa.XXXX.net:443/minderagent"
  val uri3 = "https://XXXXX.svr.us.XXXXX.net:9191/web"

  val scn = scenario("RecordedSimulation")
    // Launch
    .exec(http("request_0")
      .get("/web/")
      .check(regex("\"jsessionid=\"(.*?).tomcat").saveAs("jsessid")))
    .pause(24)
    // Login
    .exec(http("request_1")
      .post(uri2 + "/SSOlogin.fcc")
      .formParam("SMENC", "ISO-8859-1")
      .formParam("SMLOCALE", "US-EN")
      .formParam("location", "")
      .formParam("target", "HTTPS://XXXXX.svr.us.XXXXX.net:9191/web/")
      .formParam("smauthreason", "0")
      .formParam("smagentname", "XXXX+xirJXXJhSPoyD4OiZyLt1C0KEntKHOu0n3c9AIjJ0oMQ7vtB2z2PtaGfRQrMCNbVlFycMzQmdjGuQMXXXXXhnprOFAn8")
      .formParam("postpreservationdata", "")
      .formParam("USER", "XXXXX")
      .formParam("PASSWORD", "XXXXX")
      .resources(http("request_2")
        .get("/web/common/images/cme.jpg;jsessionid=${jsessid}.tomcat_1232323_11002")
        .headers(headers_2),
        http("request_3")
          .get("/cmeweb/common/images/Title.jpg;jsessionid=${jsessid}.tomcat_1232323_11002")
          .headers(headers_2),
        http("request_4")
          .get("/web/common/images/logo_chase.gif;jsessionid=${jsessid}.tomcat_1232323_11002")
          .headers(headers_2),
        http("request_5")
          .get(uri1 + "?SMSESSION=QUERY&PERSIST=0&TARGET=$SM$https%3a%2f%2fvsin8u4784%2esvr%2eus%2eXXXXX%2enet%3a9191%2ffavicon%2eico"),
        http("request_6")
          .get(uri2 + "/SSOlogin.fcc?TYPE=33554433&REALMOID=06-000a5275-6315-16fb-8d7a-895aa9454077&GUID=&SMAUTHREASON=0&METHOD=GET&SMAGENTNAME=$SM$UKnA%2bxirJXXJhSPoyD4OiZyLt1C0KEntKHOu0n3c9AIjJ0oMQ7vtB2z2PtaGfRQrMCNbVlFycMzQmdjGuQMhnpYx0srOFAn8&TARGET=$SM$HTTPS%3a%2f%2fvsin8u4784%2esvr%2eus%2eXXXXX%2enet%3a9191%2ffavicon%2eico")))
    .pause(12)
    // Click Batch
    .exec(http("request_7")
      .get("/web/batchtracking!view.action;jsessionid=${jsessid}.tomcat_1232323_11002"))
    .pause(13)
    // Search Batch
    .exec(http("request_8")
      .post("/cmeweb/batchtracking!doSearch.action")
      .formParam("startDate", "07/22/2016")
      .formParam("endDate", "07/22/2016")
      .formParam("cmeUser", "")
      .formParam("sortColumnAlias", "Default"))

  setUp(scn.inject(atOnceUsers(1))).protocols(httpProtocol)
}
I don't get the above error when I move the .check call under the Launch request, right after the .get.
But I still need help capturing the jsessionid value from 'jsessionid=an121kj533n232j53531314353.tomcat_17221212_1101'.
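For what it's worth, here is a sketch of what the capture could look like once .check is attached right after the .get (the exact regex is an assumption based on the sample string above, not a confirmed fix):
.exec(http("request_0")
  .get("/web/")
  // capture everything between "jsessionid=" and ".tomcat" into the session attribute "jsessid"
  .check(regex("""jsessionid=(.*?)\.tomcat""").saveAs("jsessid")))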