Spark: Error while compressing and saving to text file - scala

I have a Scala Spark job. I want to compress the output using Gzip and then call saveAsTextFile.
compressedEvents.saveAsTextFile(outputDirectory, org.apache.hadoop.io.compress.GzipCodec)
But I get the following error:
[error] /var/lib/jenkins/workspace/producer-data-test/producer-data-test-build/src/main/scala/IpFromLogs.scala:46: object org.apache.hadoop.io.compress.GzipCodec is not a value
[error] compressedEvents.saveAsTextFile(outputDirectory, org.apache.hadoop.io.compress.GzipCodec)
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
I tried different variations of this call but none of them work. Please help!

The correct way to save is:
compressedEvents.saveAsTextFile(outputDirectory, classOf[GzipCodec])
Or
before you save, set the configuration:
sc.hadoopConfiguration.setClass(FileOutputFormat.COMPRESS_CODEC, classOf[GzipCodec], classOf[CompressionCodec])
And then save with:
compressedEvents.saveAsTextFile(outputDirectory)
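Putting the two options together, a minimal sketch (assuming sc is the SparkContext from the question and the new-API FileOutputFormat constants; the old mapred API spells the config keys differently):
import org.apache.hadoop.io.compress.{CompressionCodec, GzipCodec}
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Option 1: pass the codec class directly to saveAsTextFile.
compressedEvents.saveAsTextFile(outputDirectory, classOf[GzipCodec])

// Option 2: set the codec on the Hadoop configuration once, then save normally.
// Depending on the Hadoop version you may also need to enable output compression.
sc.hadoopConfiguration.setBoolean(FileOutputFormat.COMPRESS, true)
sc.hadoopConfiguration.setClass(FileOutputFormat.COMPRESS_CODEC, classOf[GzipCodec], classOf[CompressionCodec])
compressedEvents.saveAsTextFile(outputDirectory)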


How can I migrate from avro4s 3.0.4 to 4.0.0-RC2?
I have the following compile errors:
[error] /Users/nicolae.marasoiu/proj/data-availability-global-topic-conveyor/src/main/scala/com/ovoenergy/globaltopics/serdes/AvroFormatImplicits.scala:8:15: value const is not a member of object com.sksamuel.avro4s.SchemaFor
[error] SchemaFor.const(new Schema.Parser().parse(getClass.getResourceAsStream(hasSchema.resourcePath)))
[error] ^
[error] /Users/nicolae.marasoiu/proj/data-availability-global-topic-conveyor/src/main/scala/com/ovoenergy/globaltopics/serdes/AvroFormatImplicits.scala:11:26: not enough arguments for method apply: (implicit evidence$1: com.sksamuel.avro4s.Encoder[T], implicit evidence$2: com.sksamuel.avro4s.Decoder[T])com.sksamuel.avro4s.RecordFormat[T] in object RecordFormat.
[error] Unspecified value parameter evidence$2.
[error] RecordFormat.apply[T](AvroSchema[T](readSchema))
[error] ^
[error] /Users/nicolae.marasoiu/proj/data-availability-global-topic-conveyor/src/main/scala/com/ovoenergy/globaltopics/serdes/SerdeProvider.scala:29:37: org.apache.avro.Schema does not take parameters
[error] val schema = SchemaFor[T].schema(DefaultFieldMapper)
[error] ^
[error] /Users/nicolae.marasoiu/proj/data-availability-global-topic-conveyor/src/main/scala/com/ovoenergy/globaltopics/serdes/SerdeProvider.scala:37:70: no arguments allowed for nullary method build: ()com.sksamuel.avro4s.AvroOutputStream[T]
[error] val os = AvroOutputStream.binary[T].to(output).build(schema)
[error] ^
[error] four errors found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 17 s, completed 28-Jul-2020 19:42:21
[IJ]sbt:global-topic-conveyor>
First of all, you can always follow the changes in the avro4s GitHub repository; in particular, you can see the specific changes made between the versions you mentioned.
You did not attach any source code, so I'll address the errors as best I can understand them from the compiler output.
1. SchemaFor.const was removed in version 3.0.5; you can see that in the diff between the versions. Assuming you have a Schema s that you previously wrapped with SchemaFor.const(s), you now construct it as SchemaFor(s). The default field mapper will be used.
2. RecordFormat no longer takes arguments. Assuming you have a type T that you used as RecordFormat.apply[T](AvroSchema[T](readSchema)), change it to RecordFormat[T].
3. Very similar to number 1: SchemaFor.schema(fieldMapper) was removed. Instead, you can just do SchemaFor[T].schema; the schema no longer takes parameters, so you cannot call it with parentheses.
4. AvroOutputStream.build used to take a schema as a parameter; it no longer does. Change AvroOutputStream.binary[T].to(output).build(schema) into AvroOutputStream.binary[T].to(output).build().
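For illustration, a hedged before/after sketch of the four fixes against avro4s 4.x, using a stand-in case class Foo instead of the questioner's type T (exact signatures may differ slightly between 4.x minor versions):
import java.io.ByteArrayOutputStream
import com.sksamuel.avro4s.{AvroOutputStream, AvroSchema, RecordFormat, SchemaFor}
import org.apache.avro.Schema

case class Foo(id: String, value: Int)

// 1. SchemaFor.const(parsedSchema)                     ->  SchemaFor(parsedSchema)
val parsedSchema: Schema = AvroSchema[Foo]  // stands in for new Schema.Parser().parse(...)
val schemaFor: SchemaFor[Foo] = SchemaFor(parsedSchema)

// 2. RecordFormat.apply[T](AvroSchema[T](readSchema))  ->  RecordFormat[T]
val format: RecordFormat[Foo] = RecordFormat[Foo]

// 3. SchemaFor[T].schema(DefaultFieldMapper)           ->  SchemaFor[T].schema
val schema: Schema = SchemaFor[Foo].schema

// 4. AvroOutputStream.binary[T].to(output).build(schema)  ->  ...build()
val output = new ByteArrayOutputStream()
val os = AvroOutputStream.binary[Foo].to(output).build()
os.write(Foo("a", 1))
os.close()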

How to add parameters to run SBT in Ammonite?

I want to run this SBT command in Ammonite:
sbt -mem 3000 clean compile docker:publishLocal
I tried a few things like:
%.sbt("-mem 3000", 'clean, 'test)(pwd)
Which gives this exception:
[error] Expected symbol
[error] Not a valid command: -
[error] Expected end of input.
[error] Expected '--'
[error] Expected 'debug'
[error] Expected 'info'
[error] Expected 'warn'
[error] Expected 'error'
[error] Expected 'addPluginSbtFile'
[error] -mem 3000
[error] ^
How is this done?
I recently had to do the same thing, and I can tell you it is not fun when those "random" errors happen.
// I had to put the full path where sbt is, like this
val SBT = "C:\\Program Files (x86)\\sbt\\bin\\sbt.bat"
%(SBT, "-mem", "3000", "clean", "compile", "docker:publishLocal")(pwd)
With that working, the %.sbt form is the same idea: pass the flag and its value as separate arguments (a single "-mem 3000" string is handed to sbt as one token, which causes the parse error above):
%.sbt("-mem", "3000", 'clean, 'test)(pwd)

How to debug scalajs linker error: non-existent method java.lang.Class.getDeclaredFields

After adding the following method to map case classes to Map or js.Dictionary (I've tried five or six variants of it by now), my code compiles fine and without warnings, but then hits errors at the fastOptJS Scala.js linking stage.
The method
def ccToMap(cc: AnyRef) =
  (Map[String, Any]() /: cc.getClass.getDeclaredFields) {
    (a, f) =>
      f.setAccessible(true)
      a + (f.getName -> f.get(cc))
  }
Note that all the variants I have tried do the same thing in a slightly different manner.
The error
[info] Fast optimizing /Users/justin/Desktop/arete/jt/client/target/scala-2.11/client-fastopt.js
[error] Referring to non-existent class java.lang.reflect.Field
[error] called from com.jshin47.jtdc.client.module.visualization.DiodeStateVizC$.ccToMap(java.lang.Object)scala.collection.immutable.Map
[error] called from com.jshin47.jtdc.client.module.visualization.DiodeStateVizC$.<init>()
[error] called from com.jshin47.jtdc.client.module.landing.LandingLocC$$anonfun$2.apply(japgolly.scalajs.react.extra.router.RouterCtl)japgolly.scalajs.react.ReactElement
[error] called from com.jshin47.jtdc.client.module.landing.LandingLocC$$anonfun$2.apply(java.lang.Object)java.lang.Object
[error] called from scala.collection.LinearSeqOptimized$class.foreach(scala.collection.LinearSeqOptimized,scala.Function1)scala.Unit
[error] called from scala.collection.mutable.MutableList.foreach(scala.Function1)scala.Unit
[error] called from scala.collection.TraversableLike$WithFilter.map(scala.Function1,scala.collection.generic.CanBuildFrom)java.lang.Object
[error] called from scala.collection.immutable.Stream$StreamWithFilter.map(scala.Function1,scala.collection.generic.CanBuildFrom)java.lang.Object
[error] called from org.scalajs.testinterface.internal.Slave.org$scalajs$testinterface$internal$Slave$$execute(scala.scalajs.js.Dynamic)scala.Unit
[error] called from org.scalajs.testinterface.internal.Slave.handleMsgImpl(java.lang.String,scala.Function0)scala.Unit
[error] called from org.scalajs.testinterface.internal.BridgeBase.handleMsg(java.lang.String)scala.Unit
[error] called from org.scalajs.testinterface.internal.BridgeBase.$$anonfun$1(java.lang.String)scala.Unit
[error] called from org.scalajs.testinterface.internal.BridgeBase.init()scala.Unit
[error] called from org.scalajs.testinterface.internal.BridgeBase.$$js$exported$meth$init()java.lang.Object
[error] called from org.scalajs.testinterface.internal.BridgeBase.init
[error] exported to JavaScript with @JSExport
[error] involving instantiated classes:
[error] com.jshin47.jtdc.client.module.visualization.DiodeStateVizC$
[error] com.jshin47.jtdc.client.module.landing.LandingLocC$$anonfun$2
[error] scala.collection.mutable.Queue
[error] scala.collection.mutable.MutableList
[error] scala.collection.TraversableLike$WithFilter
[error] scala.collection.immutable.Stream$StreamWithFilter
[error] org.scalajs.testinterface.internal.Slave
[error] org.scalajs.testinterface.internal.Master
[error] Referring to non-existent method java.lang.Class.getDeclaredFields() [java.lang.reflect.Field
[error] called from com.jshin47.jtdc.client.module.visualization.DiodeStateVizC$.ccToMap(java.lang.Object)scala.collection.immutable.Map
[error] called from com.jshin47.jtdc.client.module.visualization.DiodeStateVizC$.<init>()
[error] called from com.jshin47.jtdc.client.module.landing.LandingLocC$$anonfun$2.apply(japgolly.scalajs.react.extra.router.RouterCtl)japgolly.scalajs.react.ReactElement
[error] called from com.jshin47.jtdc.client.module.landing.LandingLocC$$anonfun$2.apply(java.lang.Object)java.lang.Object
[error] called from scala.collection.LinearSeqOptimized$class.foreach(scala.collection.LinearSeqOptimized,scala.Function1)scala.Unit
[error] called from scala.collection.mutable.MutableList.foreach(scala.Function1)scala.Unit
[error] called from scala.collection.TraversableLike$WithFilter.map(scala.Function1,scala.collection.generic.CanBuildFrom)java.lang.Object
[error] called from scala.collection.immutable.Stream$StreamWithFilter.map(scala.Function1,scala.collection.generic.CanBuildFrom)java.lang.Object
[error] called from org.scalajs.testinterface.internal.Slave.org$scalajs$testinterface$internal$Slave$$execute(scala.scalajs.js.Dynamic)scala.Unit
[error] called from org.scalajs.testinterface.internal.Slave.handleMsgImpl(java.lang.String,scala.Function0)scala.Unit
[error] called from org.scalajs.testinterface.internal.BridgeBase.handleMsg(java.lang.String)scala.Unit
[error] called from org.scalajs.testinterface.internal.BridgeBase.$$anonfun$1(java.lang.String)scala.Unit
[error] called from org.scalajs.testinterface.internal.BridgeBase.init()scala.Unit
[error] called from org.scalajs.testinterface.internal.BridgeBase.$$js$exported$meth$init()java.lang.Object
[error] called from org.scalajs.testinterface.internal.BridgeBase.init
[error] exported to JavaScript with @JSExport
[error] involving instantiated classes:
[error] com.jshin47.jtdc.client.module.visualization.DiodeStateVizC$
[error] com.jshin47.jtdc.client.module.landing.LandingLocC$$anonfun$2
[error] scala.collection.mutable.Queue
[error] scala.collection.mutable.MutableList
[error] scala.collection.TraversableLike$WithFilter
[error] scala.collection.immutable.Stream$StreamWithFilter
[error] org.scalajs.testinterface.internal.Slave
[error] org.scalajs.testinterface.internal.Master
[trace] Stack trace suppressed: run last client/compile:fastOptJS for the full output.
[error] (client/compile:fastOptJS) There were linking errors
[error] Total time: 36 s, completed May 10, 2016 2:01:07 AM
What I've tried
Being unfamiliar with the details of the linker, I am only able to try some "obvious" diagnostics:
Whether I call the method or not, this error is thrown (it doesn't have to be in the code path, so it is thrown when the method itself is linked)
The Map type itself works just fine as an argument to the function where I am trying to invoke this
I know with certainty that this linker error appears if and only if this method (or a similar one) is present. (Without it, no error.)
Any tips on how to proceed, debug, etc. are appreciated.
Alternatively, if you have any tips on how to convert a case class to a Map without the above reflection-based approach, please let me know.
You simply can't use Java reflection in Scala.js:
Java reflection and, a fortiori, Scala reflection, are not supported. There is limited support for java.lang.Class, e.g., obj.getClass.getName will work for any Scala.js object (not for objects that come from JavaScript interop).
Use macros for this instead. See e.g. Scala Macros: Making a Map out of fields of a class in Scala
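If you can move to Scala 2.13+, here is a reflection-free sketch that also links under Scala.js, since it relies only on the Product interface (the question targets Scala 2.11, where the macro approach linked above is the way to go):
// Names are illustrative; works for any case class because it extends Product.
def ccToMap(cc: Product): Map[String, Any] =
  cc.productElementNames.zip(cc.productIterator).toMap

case class Point(x: Int, y: Int)
ccToMap(Point(1, 2))  // Map("x" -> 1, "y" -> 2)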

Can't initialize Spark Context while using sbt test

I have written unit tests for Spark in Scala using the Specs2 framework. In some of the tests, I create a SparkContext and pass in functions:
val conf = new SparkConf().setAppName("test").setMaster("local[2]")
val sc = new SparkContext(conf)
val rdd = sc.parallelize(arr)
val output = Util.getHistograms(rdd, header, skipCols, nBins)
These tests execute correctly in the Eclipse JUnit plug-in with no errors or failures, but when I run sbt test, I get a strange exception and the tests return with errors.
[info] Case 8: getHistograms should
[error] ! return with correct output
[error] akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not unique! (ChildrenContainer.scala:192)
[error] akka.actor.dungeon.ChildrenContainer$TerminatingChildrenContainer.reserve(ChildrenContainer.scala:192)
[error] akka.actor.dungeon.Children$class.reserveChild(Children.scala:77)
[error] akka.actor.ActorCell.reserveChild(ActorCell.scala:369)
[error] akka.actor.dungeon.Children$class.makeChild(Children.scala:202)
[error] akka.actor.dungeon.Children$class.attachChild(Children.scala:42)
[error] akka.actor.ActorCell.attachChild(ActorCell.scala:369)
[error] akka.actor.ActorSystemImpl.actorOf(ActorSystem.scala:552)
[error] org.apache.spark.rpc.akka.AkkaRpcEnv.actorRef$lzycompute$1(AkkaRpcEnv.scala:92)
[error] org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$actorRef$1(AkkaRpcEnv.scala:92)
[error] org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$setupEndpoint$1.apply(AkkaRpcEnv.scala:148)
[error] org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$setupEndpoint$1.apply(AkkaRpcEnv.scala:148)
[error] org.apache.spark.rpc.akka.AkkaRpcEndpointRef.actorRef$lzycompute(AkkaRpcEnv.scala:281)
[error] org.apache.spark.rpc.akka.AkkaRpcEndpointRef.actorRef(AkkaRpcEnv.scala:281)
[error] org.apache.spark.rpc.akka.AkkaRpcEndpointRef.hashCode(AkkaRpcEnv.scala:329)
[error] org.apache.spark.rpc.akka.AkkaRpcEnv.registerEndpoint(AkkaRpcEnv.scala:73)
[error] org.apache.spark.rpc.akka.AkkaRpcEnv.setupEndpoint(AkkaRpcEnv.scala:149)
[error] org.apache.spark.executor.Executor.<init>(Executor.scala:89)
[error] org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalBackend.scala:57)
[error] org.apache.spark.scheduler.local.LocalBackend.start(LocalBackend.scala:119)
[error] org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
[error] org.apache.spark.SparkContext.<init>(SparkContext.scala:514)
[error] UtilTest$$anonfun$8$$anonfun$apply$29.apply(UtilTest.scala:113)
[error] UtilTest$$anonfun$8$$anonfun$apply$29.apply(UtilTest.scala:111)
I guess the SparkContext (sc) is not getting created and I am getting a null, but I can't understand what is causing this.
Thanks in advance.
This was happening because sbt executes all the tests together, so multiple SparkContexts were being created as the Specification files ran multiple times.
To resolve this, create a separate object and initialize your SparkContext in it, then use that sc throughout the test code so that it isn't created multiple times.
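A minimal sketch of such a shared object (object and value names are illustrative; Spark 1.x API as in the question):
import org.apache.spark.{SparkConf, SparkContext}

object SharedSparkContext {
  // Created once per JVM; every Specification reuses the same instance.
  lazy val sc: SparkContext = {
    val conf = new SparkConf().setAppName("test").setMaster("local[2]")
    new SparkContext(conf)
  }
}

// In a test: val rdd = SharedSparkContext.sc.parallelize(arr)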
In fact the reason is even simpler: you cannot run multiple Spark contexts in the same JVM at the same time. sbt test executes tests in parallel, meaning that if your tests all spawn a Spark context, the tests will fail.
To prevent this from happening add the following to your build.sbt:
// super important with multiple tests running spark Contexts
parallelExecution in Test := false
which results in sequential test execution.
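On sbt 1.x the same setting is written with the slash syntax:
Test / parallelExecution := false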

Simple scalatra-test specs2 example throws Exception

I'm getting this exception when running the Scalatra specs2 example from the Scalatra docs:
ThrowableException: org.eclipse.jetty.http.HttpGenerator.flushBuffer()I (FutureTask.java:138)
Here is the test code (starting on line 5, skipping imports):
class MyAppTest extends MutableScalatraSpec {
  addServlet(classOf[MyApp], "/*")

  "GET / on AdminApp" should {
    "return status 200" in {
      get("/") {
        status must_== 200
      }
    }
  }
}
Here is the app definition:
class MyApp extends ScalatraServlet {
  get("/") {
    "aloha"
  }
}
I'm using scalatra-specs2 2.0.4 and scala 2.9.1. I'm running an embedded jetty server using xsbt-web-plugin 0.2.10 with sbt 0.11.2. The test was executed using sbt test.
Here is the full trace:
[info] GET / on AdminApp should
[error] ! Fragment evaluation error
[error] ThrowableException: org.eclipse.jetty.http.HttpGenerator.flushBuffer()I (FutureTask.java:138)
[error] org.eclipse.jetty.testing.HttpTester.generate(HttpTester.java:225)
[error] org.scalatra.test.ScalatraTests$class.submit(ScalatraTests.scala:46)
[error] com.example.MyAppTest.submit(MyAppTest.scala:5)
[error] org.scalatra.test.ScalatraTests$class.submit(ScalatraTests.scala:71)
[error] com.example.MyAppTest.submit(MyAppTest.scala:5)
[error] org.scalatra.test.ScalatraTests$class.get(ScalatraTests.scala:127)
[error] com.example.MyAppTest.get(MyAppTest.scala:5)
[error] com.example.MyAppTest$$anonfun$1$$anonfun$apply$3.apply(MyAppTest.scala:10)
[error] com.example.MyAppTest$$anonfun$1$$anonfun$apply$3.apply(MyAppTest.scala:10)
[error] org.eclipse.jetty.http.HttpGenerator.flushBuffer()I
[error] org.eclipse.jetty.testing.HttpTester.generate(HttpTester.java:225)
[error] org.scalatra.test.ScalatraTests$class.submit(ScalatraTests.scala:46)
[error] com.example.MyAppTest.submit(MyAppTest.scala:5)
[error] org.scalatra.test.ScalatraTests$class.submit(ScalatraTests.scala:71)
[error] com.example.MyAppTest.submit(MyAppTest.scala:5)
[error] org.scalatra.test.ScalatraTests$class.get(ScalatraTests.scala:127)
[error] com.example.MyAppTest.get(MyAppTest.scala:5)
[error] com.example.MyAppTest$$anonfun$1$$anonfun$apply$3.apply(MyAppTest.scala:10)
[error] com.example.MyAppTest$$anonfun$1$$anonfun$apply$3.apply(MyAppTest.scala:10)
This is the only search result that has turned up so far:
Fragment Evaluation Error.
Can someone point me in the right direction?
Thanks,
-f
Still unsure of the root cause, but the test executes successfully after rolling jetty-webapp back from 8.0.3.v20111011 to 7.6.0.v20120127.
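For reference, a hedged sketch of what that rollback can look like in build.sbt (the exact Ivy configuration string, e.g. "container", depends on the xsbt-web-plugin setup):
// Pin jetty-webapp to the 7.x version reported to work above.
libraryDependencies += "org.eclipse.jetty" % "jetty-webapp" % "7.6.0.v20120127" % "container"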
You probably have a conflict in your dependencies, more specifically with the Jetty library version. Since the flushBuffer method on HttpGenerator has changed between Jetty 6 and Jetty 7, you might be getting a NoSuchMethodError, which explains the strange org.eclipse.jetty.http.HttpGenerator.flushBuffer()I signature in the exception message (()I is the JVM descriptor for a no-argument method returning int).
This also explains why you get a "fragment evaluation error" and not a regular failure as explained in the link you mentioned.
If you give the latest specs2-1.10-SNAPSHOT a go, you will get a better message for "fragment evaluation error" showing the NoSuchMethodError when that happens. This will help you diagnose the issue faster.