Class 'SessionTrigger' must either be declared abstract or implement abstract member - scala

I am building a tutorial on Flink 1.2 and I want to run some simple windowing examples. One of them is Session Windows.
The code I want to run is the following:
import <package>.Session
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows
import org.apache.flink.streaming.api.windowing.triggers.PurgingTrigger
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow
import scala.util.Try
object SessionWindowExample {

  def main(args: Array[String]) {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val source = env.socketTextStream("localhost", 9000)

    // session map
    val values = source.map(value => {
      val columns = value.split(",")
      val endSignal = Try(Some(columns(2))).getOrElse(None)
      Session(columns(0), columns(1).toDouble, endSignal)
    })

    val keyValue = values.keyBy(_.sessionId)

    // create global window
    val sessionWindowStream = keyValue.
      window(GlobalWindows.create()).
      trigger(PurgingTrigger.of(new SessionTrigger[GlobalWindow]()))

    sessionWindowStream.sum("value").print()

    env.execute()
  }
}
As you'll notice, I need to instantiate a new SessionTrigger object, which I do based on this class:
import <package>.Session
import org.apache.flink.streaming.api.windowing.triggers.Trigger.TriggerContext
import org.apache.flink.streaming.api.windowing.triggers.{Trigger, TriggerResult}
import org.apache.flink.streaming.api.windowing.windows.Window
class SessionTrigger[W <: Window] extends Trigger[Session, W] {

  override def onElement(element: Session, timestamp: Long, window: W, ctx: TriggerContext): TriggerResult = {
    if (element.endSignal.isDefined) TriggerResult.FIRE
    else TriggerResult.CONTINUE
  }

  override def onProcessingTime(time: Long, window: W, ctx: TriggerContext): TriggerResult = {
    TriggerResult.CONTINUE
  }

  override def onEventTime(time: Long, window: W, ctx: TriggerContext): TriggerResult = {
    TriggerResult.CONTINUE
  }
}
However, IntelliJ keeps complaining that:
Class 'SessionTrigger' must either be declared abstract or implement abstract member 'clear(window: W, ctx: TriggerContext):void' in 'org.apache.flink.streaming.api.windowing.triggers.Trigger'.
I tried adding this in the class:
override def clear(window: W, ctx: TriggerContext): Unit = ctx.deleteEventTimeTimer(4)
but it is not working. This is the error I am getting:
03/27/2017 15:48:38 TriggerWindow(GlobalWindows(), ReducingStateDescriptor{serializer=co.uk.DRUK.flink.windowing.SessionWindowExample.SessionWindowExample$$anon$2$$anon$1#1aec64d0, reduceFunction=org.apache.flink.streaming.api.functions.aggregation.SumAggregator#1a052a00}, PurgingTrigger(co.uk.DRUK.flink.windowing.SessionWindowExample.SessionTrigger#f2f2cc1), WindowedStream.reduce(WindowedStream.java:276)) -> Sink: Unnamed(4/4) switched to CANCELED
03/27/2017 15:48:38 Job execution switched to status FAILED.
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at co.uk.DRUK.flink.windowing.SessionWindowExample.SessionWindowExample$$anonfun$1.apply(SessionWindowExample.scala:27)
at co.uk.DRUK.flink.windowing.SessionWindowExample.SessionWindowExample$$anonfun$1.apply(SessionWindowExample.scala:24)
at org.apache.flink.streaming.api.scala.DataStream$$anon$4.map(DataStream.scala:521)
at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:38)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:185)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:272)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:655)
at java.lang.Thread.run(Thread.java:745)
Process finished with exit code 1
Anybody know why?

Well, the exception clearly says:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at co.uk.DRUK.flink.windowing.SessionWindowExample.SessionWindowExample$$anonfun$1.apply(SessionWindowExample.scala:27)
at co.uk.DRUK.flink.windowing.SessionWindowExample.SessionWindowExample$$anonfun$1.apply(SessionWindowExample.scala:24)
which obviously maps to the following line of code
Session(columns(0), columns(1).toDouble, endSignal)
So the next obvious thing is to log columns and value right after
val columns = value.split(",")
I suspect that value simply doesn't contain a second comma-separated column, at least for some lines.
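A minimal sketch of that kind of logging plus defensive parsing (this is an assumption on my part, not the asker's code, and it presumes Session takes a String id, a Double value and an Option[String] end signal, as the snippet above suggests):
val values = source.flatMap { value =>
  val columns = value.split(",")
  // log what actually arrived on the socket
  println(s"raw line: '$value' -> ${columns.length} column(s): ${columns.mkString("|")}")
  if (columns.length >= 2) {
    val endSignal = if (columns.length > 2) Some(columns(2)) else None
    List(Session(columns(0), columns(1).toDouble, endSignal))
  } else {
    Nil // malformed line: drop it instead of letting columns(1) throw
  }
}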

Related

Generating recursive structures in scalacheck

I'm trying to make a generator for a recursive datatype called Row. A row is a list of named Vals, where a Val is either an atomic Bin or else a nested Row.
This is my code:
package com.dtci.data.anonymize.parquet

import java.nio.charset.StandardCharsets
import org.scalacheck.Gen

object TestApp extends App {

  sealed trait Val

  case class Bin(bytes: Array[Byte]) extends Val

  object Bin {
    def from_string(str: String): Bin = Bin(str.getBytes(StandardCharsets.UTF_8))
  }

  case class Row(flds: List[(String, Val)]) extends Val

  val gen_bin = Gen.alphaStr.map(Bin.from_string)
  val gen_field_name = Gen.alphaLowerStr
  val gen_field = Gen.zip(gen_field_name, gen_val)
  val gen_row = Gen.nonEmptyListOf(gen_field).map(Row.apply)
  def gen_val: Gen[Val] = Gen.oneOf(gen_bin, gen_row)

  gen_row.sample.get.flds.foreach(fld => println(s"${fld._1} --> ${fld._2}"))
}
It crashes with the following stack trace:
Exception in thread "main" java.lang.NullPointerException
at org.scalacheck.Gen.$anonfun$flatMap$2(Gen.scala:84)
at org.scalacheck.Gen$R.flatMap(Gen.scala:243)
at org.scalacheck.Gen$R.flatMap$(Gen.scala:240)
at org.scalacheck.Gen$R$$anon$3.flatMap(Gen.scala:228)
at org.scalacheck.Gen.$anonfun$flatMap$1(Gen.scala:84)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen$$anon$1.$anonfun$doApply$1(Gen.scala:110)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$1.doApply(Gen.scala:109)
at org.scalacheck.Gen.$anonfun$map$1(Gen.scala:79)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen.$anonfun$flatMap$2(Gen.scala:84)
at org.scalacheck.Gen$R.flatMap(Gen.scala:243)
at org.scalacheck.Gen$R.flatMap$(Gen.scala:240)
at org.scalacheck.Gen$R$$anon$3.flatMap(Gen.scala:228)
at org.scalacheck.Gen.$anonfun$flatMap$1(Gen.scala:84)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen$$anon$1.$anonfun$doApply$1(Gen.scala:110)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$1.doApply(Gen.scala:109)
at org.scalacheck.Gen$.$anonfun$sequence$2(Gen.scala:492)
at scala.collection.LinearSeqOps.foldLeft(LinearSeq.scala:168)
at scala.collection.LinearSeqOps.foldLeft$(LinearSeq.scala:164)
at scala.collection.immutable.List.foldLeft(List.scala:79)
at org.scalacheck.Gen$.$anonfun$sequence$1(Gen.scala:490)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen.$anonfun$map$1(Gen.scala:79)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen$$anon$1.$anonfun$doApply$1(Gen.scala:110)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$1.doApply(Gen.scala:109)
at org.scalacheck.Gen.$anonfun$flatMap$2(Gen.scala:84)
at org.scalacheck.Gen$R.flatMap(Gen.scala:243)
at org.scalacheck.Gen$R.flatMap$(Gen.scala:240)
at org.scalacheck.Gen$R$$anon$3.flatMap(Gen.scala:228)
at org.scalacheck.Gen.$anonfun$flatMap$1(Gen.scala:84)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen$.$anonfun$sized$1(Gen.scala:551)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen$$anon$1.$anonfun$doApply$1(Gen.scala:110)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$1.doApply(Gen.scala:109)
at org.scalacheck.Gen.$anonfun$map$1(Gen.scala:79)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen.sample(Gen.scala:154)
What's wrong with my code, and what would have been the best way for me to diagnose it myself?
As a note, I've seen the remarks about Gen.oneOf being strict and needing Gen.lzy for recursive structures. But if, in my code, I wrap the definition of gen_val inside of Gen.lzy(...) then I get a stack overflow rather than the current null pointer exception.
First of all, be careful with object Main extends App. I find its field initialization semantics less obvious than a plain old main method with line-after-line execution:
object Main {
  def main(args: Array[String]): Unit = { ... }
}
This is likely part of the cause of the NullPointerException. Usually it can be fixed by carefully checking the field initialization order and marking some (or all) of the vals as lazy.
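As a sketch of what that means for the generators above (my reading of the fix, not code from the original answer): gen_field is a strict val that immediately calls gen_val, which refers to gen_row before gen_row has been initialized, so a null generator leaks into Gen.oneOf. Deferring the forward references removes the NullPointerException:
lazy val gen_field = Gen.zip(gen_field_name, gen_val)
lazy val gen_row   = Gen.nonEmptyListOf(gen_field).map(Row.apply)
def gen_val: Gen[Val] = Gen.lzy(Gen.oneOf(gen_bin, gen_row))
// On its own this can still blow the stack for deeply nested rows,
// which is what the sized/resized version below addresses.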
The StackOverflowError arises because the generated data structure gets too deep.
Generally, when you are dealing with any kind of recursion, always consider the base case at which the recursion stops and the step that eventually reaches that base case.
In your particular case we can use Gen.sized and Gen.resize, which control how "big" the generated elements are (check the docs and search around for more information):
package com.dtci.data.anonymize.parquet

import java.nio.charset.StandardCharsets
import org.scalacheck.Gen

object Main extends App {

  sealed trait Val

  case class Bin(bytes: Array[Byte]) extends Val

  object Bin {
    def from_string(str: String): Bin = Bin(str.getBytes(StandardCharsets.UTF_8))
  }

  case class Row(flds: List[(String, Val)]) extends Val

  val gen_bin = Gen.alphaStr.map(Bin.from_string)
  val gen_field_name = Gen.alphaLowerStr
  val gen_field = Gen.zip(gen_field_name, gen_val)
  val gen_row = Gen.sized(size => Gen.resize(size / 2, Gen.nonEmptyListOf(gen_field).map(Row.apply)))

  def gen_val: Gen[Val] = Gen.sized { size =>
    if (size <= 0) {
      gen_bin
    } else {
      Gen.oneOf(gen_bin, gen_row)
    }
  }

  gen_row.sample.get.flds.foreach(fld => println(s"${fld._1} --> ${fld._2}"))
}
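If the sampled rows are still deeper than you want, the starting size can also be capped explicitly when sampling; a hypothetical usage (the value 8 is arbitrary):
// Start the size parameter at 8 so the size/2 halving above bottoms out sooner.
val shallowRow = Gen.resize(8, gen_row).sample
shallowRow.foreach(row => row.flds.foreach(fld => println(s"${fld._1} --> ${fld._2}")))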

Null pointer exception with GenericWriteAheadSink and FileCheckpointCommitter in Flink 1.8

I'm implementing a custom NSQ sink for Flink. I have it working as a subclass of RichSinkFunction, but I'd like to get the write-ahead log implementation working for extra data integrity.
Using O'Reilly's WriteAheadSinkExample available here, I attempted to implement my own:
package com.wistia.analytics

import java.net.{InetSocketAddress, SocketAddress}
import com.github.mitallast.nsq._
import org.apache.flink.api.scala.createTypeInformation
import java.lang.Iterable
import java.nio.file.{Files, Paths}
import java.util.UUID
import org.apache.commons.lang3.StringUtils
import org.apache.flink.api.common.ExecutionConfig
import org.apache.flink.streaming.runtime.operators.{CheckpointCommitter, GenericWriteAheadSink}
import scala.collection.mutable

class WALNsqSink(val topic: String) extends GenericWriteAheadSink[String](
    // CheckpointCommitter that commits checkpoints to the local filesystem
    new FileCheckpointCommitter(System.getProperty("java.io.tmpdir")),
    // Serializer for records
    createTypeInformation[String].createSerializer(new ExecutionConfig),
    // Random JobID used by the CheckpointCommitter
    UUID.randomUUID.toString) {

  var client: NSQClient = _
  var producer: NSQProducer = _

  override def open(): Unit = {
    val lookup = new NSQLookup {
      def nodes(): List[SocketAddress] = List(new InetSocketAddress("127.0.0.1", 4150))
      def lookup(topic: String): List[SocketAddress] = List(new InetSocketAddress("127.0.0.1", 4150))
    }
    client = NSQClient(lookup)
    producer = client.producer()
  }

  def sendValues(readings: Iterable[String], checkpointId: Long, timestamp: Long): Boolean = {
    // collect the java.lang.Iterable into a Scala buffer before publishing
    val arr = mutable.ArrayBuffer[String]()
    readings.forEach { reading =>
      arr += reading
    }
    producer.mpubStr(topic = topic, data = arr)
    true
  }
}
I am reusing the FileCheckpointCommitter from the bottom of that example, but I get a NullPointerException inside GenericWriteAheadSink:
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:638)
at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:123)
at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:654)
at com.wistia.analytics.NsqProcessor$.main(NsqProcessor.scala:24)
at com.wistia.analytics.NsqProcessor.main(NsqProcessor.scala)
Caused by: java.lang.NullPointerException
at org.apache.flink.streaming.runtime.operators.GenericWriteAheadSink.processElement(GenericWriteAheadSink.java:277)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
Nonzero exit code returned from runner: 1
(Compile / run) Nonzero exit code returned from runner: 1
Total time: 45 s, completed Feb 10, 2020 6:41:06 PM
I have no idea where to go from here. Any help is appreciated.
The issue here is almost certainly that you never call the open() method of the superclass, which leaves some of its variables uninitialized. It should be solved by calling super.open() inside your open() method.
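Concretely, the open() override from the question would then look something like this (the NSQ setup is kept exactly as posted; only the super call is added):
override def open(): Unit = {
  super.open() // let GenericWriteAheadSink initialize its internal state first
  val lookup = new NSQLookup {
    def nodes(): List[SocketAddress] = List(new InetSocketAddress("127.0.0.1", 4150))
    def lookup(topic: String): List[SocketAddress] = List(new InetSocketAddress("127.0.0.1", 4150))
  }
  client = NSQClient(lookup)
  producer = client.producer()
}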

Scala Reflection exception during creation of DataSet in Spark

I want to run a Spark job on Spark Jobserver. During execution, I get an exception.
Stack trace:
java.lang.RuntimeException: scala.ScalaReflectionException: class com.some.example.instrument.data.SQLMapping in JavaMirror with org.apache.spark.util.MutableURLClassLoader#55b699ef of type class org.apache.spark.util.MutableURLClassLoader with classpath [file:/app/spark-job-server.jar] and parent being sun.misc.Launcher$AppClassLoader#2e817b38 of type class sun.misc.Launcher$AppClassLoader with classpath [.../classpath jars/] not found.
at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:123)
at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:22)
at com.some.example.instrument.DataRetriever$$anonfun$combineMappings$1$$typecreator15$1.apply(DataRetriever.scala:136)
at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:232)
at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:232)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:49)
at org.apache.spark.sql.Encoders$.product(Encoders.scala:275)
at org.apache.spark.sql.LowPrioritySQLImplicits$class.newProductEncoder(SQLImplicits.scala:233)
at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:33)
at com.some.example.instrument.DataRetriever$$anonfun$combineMappings$1.apply(DataRetriever.scala:136)
at com.some.example.instrument.DataRetriever$$anonfun$combineMappings$1.apply(DataRetriever.scala:135)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:237)
at scala.util.Try$.apply(Try.scala:192)
at scala.util.Success.map(Try.scala:237)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
In DataRetriever I convert a simple case class to a Dataset. The case class definitions:
case class SQLMapping(id: String,
                      it: InstrumentPrivateKey,
                      cc: Option[String],
                      ri: Option[SourceInstrumentId],
                      p: Option[SourceInstrumentId],
                      m: Option[SourceInstrumentId])

case class SourceInstrumentId(instrumentId: Long,
                              providerId: String)

case class InstrumentPrivateKey(instrumentId: Long,
                                providerId: String,
                                clientId: String)
Code that causes the problem:
import session.implicits._

def someFunc(future: Future[ID]): Dataset[SQLMapping] = {
  future.map { f =>
    val seq: Seq[SQLMapping] = getFromEndpoint(f)
    val ds: Dataset[SQLMapping] = seq.toDS()
    ...
  }
}
The job sometimes works, but if I re-run it, it throws the exception.
Update 28.03.2018
I forgot to mention one detail that turns out to be important: the Dataset was constructed inside a Future, and calling toDS() inside the future is what causes the ScalaReflectionException.
I decided to construct the Dataset outside of future.map.
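A hedged sketch of that workaround (not my actual code; ID, getFromEndpoint, the implicit ExecutionContext, session.implicits._ and the 10-second timeout are assumptions carried over from the snippets in this question): resolve the Future first, then build the Dataset on the calling thread.
import scala.concurrent.Await
import scala.concurrent.duration._

def someFunc(future: Future[ID]): Dataset[SQLMapping] = {
  // block until the endpoint call completes, then leave the Future context
  val seq: Seq[SQLMapping] = Await.result(future.map(f => getFromEndpoint(f)), 10.seconds)
  seq.toDS() // toDS() now runs outside future.map
}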
You can verify that a Dataset can't be constructed inside future.map with this example job:
package com.example.sparkapplications

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
import scala.concurrent.Await
import scala.concurrent.Future
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import spark.jobserver.SparkJob
import spark.jobserver.SparkJobValid
import spark.jobserver.SparkJobValidation

object FutureJob extends SparkJob {

  override def runJob(sc: SparkContext, jobConfig: Config): Any = {
    val session = SparkSession.builder().config(sc.getConf).getOrCreate()
    import session.implicits._

    val f = Future {
      val seq = Seq(
        Dummy("1", 1),
        Dummy("2", 2),
        Dummy("3", 3),
        Dummy("4", 4),
        Dummy("5", 5)
      )
      val ds = seq.toDS
      ds.collect()
    }

    Await.result(f, 10 seconds)
  }

  case class Dummy(id: String, value: Long)

  override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid
}
Later I will provide more information on whether the problem persists with Spark 2.3.0 and when the jar is passed directly via spark-submit.

Code working in Spark-Shell not in eclipse

I have a small piece of Scala code which works properly in the Spark shell but not in Eclipse with the Scala plugin. I can access HDFS using the plugin; I tried writing another file and it worked.
FirstSpark.scala
package bigdata.spark

import org.apache.spark.SparkConf
import java.io._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object FirstSpark {

  def main(args: Array[String]) = {
    val conf = new SparkConf().setMaster("local").setAppName("FirstSparkProgram")
    val sparkcontext = new SparkContext(conf)
    val textFile = sparkcontext.textFile("hdfs://pranay:8020/spark/linkage")
    val m = new Methods()
    val q = textFile.filter(x => !m.isHeader(x)).map(x => m.parse(x))
    q.saveAsTextFile("hdfs://pranay:8020/output")
  }
}
Methods.scala
package bigdata.spark

import java.util.function.ToDoubleFunction

class Methods {

  def isHeader(s: String): Boolean = {
    s.contains("id_1")
  }

  def parse(line: String) = {
    val pieces = line.split(',')
    val id1 = pieces(0).toInt
    val id2 = pieces(1).toInt
    val matches = pieces(11).toBoolean
    val mapArray = pieces.slice(2, 11).map(toDouble)
    MatchData(id1, id2, mapArray, matches)
  }

  def toDouble(s: String) = {
    if ("?".equals(s)) Double.NaN else s.toDouble
  }
}

case class MatchData(id1: Int, id2: Int, scores: Array[Double], matched: Boolean)
Error Message:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2032)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:335)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:334)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
Can anyone please help me with this?
Try changing class Methods { .. } to object Methods { .. }.
I think the problem is at val q = textFile.filter(x => !m.isHeader(x)).map(x => m.parse(x)). When Spark sees the filter and map functions, it tries to serialize the functions passed to them (x => !m.isHeader(x) and x => m.parse(x)) so that it can dispatch the work of executing them to all of the executors (this is the Task referred to). However, to do this it needs to serialize m, since that object is referenced inside the functions (it is in the closure of the two anonymous functions), and it cannot, because Methods is not serializable. You could add extends Serializable to the Methods class, but in this case an object is more appropriate (and is already Serializable).
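As a sketch, the object version of the code from the question would look like this (the unused java.util.function.ToDoubleFunction import is not needed here):
object Methods {

  def isHeader(s: String): Boolean =
    s.contains("id_1")

  def toDouble(s: String): Double =
    if ("?".equals(s)) Double.NaN else s.toDouble

  def parse(line: String): MatchData = {
    val pieces = line.split(',')
    val id1 = pieces(0).toInt
    val id2 = pieces(1).toInt
    val matches = pieces(11).toBoolean
    val mapArray = pieces.slice(2, 11).map(toDouble)
    MatchData(id1, id2, mapArray, matches)
  }
}
The driver code then drops the val m = new Methods() line and calls Methods.isHeader and Methods.parse directly inside filter and map.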

Scala Play no application started when grabbing data sources from application.conf

I am trying to read data sources from my application.conf file, but every time I run my server or try to run test cases, I get an error saying that there is no started application.
Here is an example of what I am trying to do:
A unit test that tries to read a property from my application.conf:
class DbConfigWebUnitTest extends PlaySpec with OneAppPerSuite {

  implicit override lazy val app: FakeApplication = FakeApplication(
    additionalConfiguration = Map(
      "db.test.url" -> "jdbc:postgresql://localhost:5432/suredbitswebtest",
      "db.test.user" -> "postgres",
      "db.test.password" -> "postgres",
      "db.test.driver" -> "org.postgresql.Driver"))

  val dbManagementWeb = new DbManagementWeb with DbConfigWeb with DbTestQualifier

  "DbConfigWebTest" must {
    "have the same username as what is defined in application.conf" in {
      dbManagementWeb.username must be("postgres")
    }
  }
}
Here is my DbConfigWeb
import play.api.Play.current

trait DbConfigWeb extends DbConfig { qualifier: DbQualifier =>

  val url: String = current.configuration.getString(qualifier + ".url").get
  val username: String = current.configuration.getString(qualifier + ".user").get
  val password: String = current.configuration.getString(qualifier + ".password").get
  val driver: String = current.configuration.getString(qualifier + ".driver").get

  override def database: DatabaseDef = JdbcBackend.Database.forURL(url, username, password, null, driver)

  override implicit val session = database createSession
}
trait DbQualifier {
  val qualifier: String
}

trait DbProductionQualifier extends DbQualifier {
  override val qualifier = "db.production"
}

trait DbTestQualifier extends DbQualifier {
  override val qualifier = "db.test"
}
and lastly here is my stack trace:
[suredbits-web] $ last test:test
[debug] Forking tests - parallelism = false
[debug] Create a single-thread test executor
[debug] Runner for sbt.FrameworkWrapper produced 0 initial tasks for 0 tests.
[debug] Runner for org.scalatest.tools.Framework produced 2 initial tasks for 2 tests.
[debug] Running TaskDef(com.suredbits.web.db.DbConfigWebUnitTest, sbt.ForkMain$SubclassFingerscan#48687c55, false, [SuiteSelector])
[error] Uncaught exception when running com.suredbits.web.db.DbConfigWebUnitTest: java.lang.RuntimeException: There is no started application
sbt.ForkMain$ForkError: There is no started application
at scala.sys.package$.error(package.scala:27)
at play.api.Play$$anonfun$current$1.apply(Play.scala:71)
at play.api.Play$$anonfun$current$1.apply(Play.scala:71)
at scala.Option.getOrElse(Option.scala:120)
at play.api.Play$.current(Play.scala:71)
at com.suredbits.web.db.DbConfigWeb$class.$init$(DbConfigWebProduction.scala:14)
at com.suredbits.web.db.DbConfigWebUnitTest$$anon$1.<init>(DbConfigWebUnitTest.scala:14)
at com.suredbits.web.db.DbConfigWebUnitTest.<init>(DbConfigWebUnitTest.scala:14)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at java.lang.Class.newInstance(Class.java:379)
at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:641)
at sbt.ForkMain$Run$2.call(ForkMain.java:294)
at sbt.ForkMain$Run$2.call(ForkMain.java:284)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I think the key problem is that vals in Scala traits are initialized at construction time, which is before the test Play application has been started (presumably its lifecycle is tied to each spec example). You have a couple of workarounds:
make everything in DbConfigWeb a def or perhaps a lazy val
give DbConfigWeb an abstract play.api.Application field from which to extract the config values (rather than using current) and pass the fake application to whatever DbManagementWeb is explicitly, e.g. as a constructor parameter
Here's a simplified version, using the first approach (which works for me):
import play.api.Play.current

trait DbConfig

trait DbConfigWeb extends DbConfig {
  self: DbQualifier =>

  // Using defs instead of vals
  def url: String = current.configuration.getString(qualifier + ".url").get
  def username: String = current.configuration.getString(qualifier + ".user").get
  def password: String = current.configuration.getString(qualifier + ".password").get
  def driver: String = current.configuration.getString(qualifier + ".driver").get
}

trait DbQualifier {
  val qualifier: String
}

trait DbTestQualifier extends DbQualifier {
  override val qualifier = "db.test"
}
and the spec:
import controllers.{DbConfigWeb, DbTestQualifier}
import org.scalatestplus.play.{OneAppPerSuite, PlaySpec}
import play.api.test.FakeApplication

class DbConfigTest extends PlaySpec with OneAppPerSuite {

  implicit override lazy val app: FakeApplication = FakeApplication(
    additionalConfiguration = Map(
      "db.test.url" -> "jdbc:h2:mem:play",
      "db.test.user" -> "sa",
      "db.test.password" -> "",
      "db.test.driver" -> "org.h2.Driver"))

  val dbManagementWeb = new DbConfigWeb with DbTestQualifier

  "DbConfigWebTest" must {
    "have the same username as what is defined in application.conf" in {
      dbManagementWeb.username must be("sa")
    }
  }
}
Personally I prefer the second approach, which passes the application state around explicitly rather than relying on play.api.Play.current, which you cannot count on always being started; a rough sketch of that variant follows below.
You mentioned in the comments that lazy vals were not working for you, but I can only conjecture that some chain of calls was forcing initialization: check again that this isn't the case.
Note also that the order of initialization for vals can be complex and, while some might disagree, it's a pretty safe bet to stick to defs as trait members unless you're sure it's an expensive operation (in which case a lazy val might be an option).
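For completeness, here is a rough sketch of what that second approach could look like, building on the simplified traits above. This is an assumption of mine rather than tested code; it presumes app.configuration.getString behaves the same way as current.configuration.getString.
import play.api.Application

trait DbConfigWeb extends DbConfig {
  self: DbQualifier =>

  // supplied explicitly by whoever mixes the trait in (e.g. the fake application in tests)
  def app: Application

  def url: String = app.configuration.getString(qualifier + ".url").get
  def username: String = app.configuration.getString(qualifier + ".user").get
  def password: String = app.configuration.getString(qualifier + ".password").get
  def driver: String = app.configuration.getString(qualifier + ".driver").get
}

// In the spec, the fake application is handed over explicitly, for example:
// val dbManagementWeb = new DbConfigWeb with DbTestQualifier { val app = DbConfigTest.this.app }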