saveAsNewAPIHadoopFile() giving error when used as output format - scala

I am running a modified version of the teragen program in Spark, written in Scala. I am trying to save the output file using the function saveAsNewAPIHadoopFile(). The relevant code is given below:
dataset.map(row => (NullWritable.get(), new BytesWritable(row))).saveAsNewAPIHadoopFile(output)
The code compiles successfully; however, when I run it, I get the following error:
Exception in thread "main" java.lang.RuntimeException: class scala.runtime.Nothing$ not org.apache.hadoop.mapreduce.OutputFormat
at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:1794)
at org.apache.hadoop.mapreduce.Job.setOutputFormatClass(Job.java:823)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:830)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:811)
at GenSort$.main(GenSort.scala:52)
at GenSort.main(GenSort.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Is there a way to make this work with saveAsNewAPIHadoopFile()? I would be grateful for any help.

saveAsNewAPIHadoopFile expects the key class, value class, and output format class in addition to the path. Called with only a path, the overload you are using infers its output-format type parameter as Nothing, which is exactly why Hadoop complains that scala.runtime.Nothing$ is not an org.apache.hadoop.mapreduce.OutputFormat.
The method signature is:
saveAsNewAPIHadoopFile(path: String,
  keyClass: Class[_],
  valueClass: Class[_],
  outputFormatClass: Class[_ <: org.apache.hadoop.mapreduce.OutputFormat[_, _]],
  conf: Configuration = self.context.hadoopConfiguration)
The implementation should then be:
dataset.map(row => (NullWritable.get(), new BytesWritable(row)))
  .saveAsNewAPIHadoopFile("hdfs://.....",
    classOf[NullWritable],
    classOf[BytesWritable],
    classOf[org.apache.hadoop.mapreduce.lib.output.TextOutputFormat[NullWritable, BytesWritable]])
or, obtaining the classes from instances (note that NullWritable's constructor is private, so NullWritable.get() is used):
dataset.map(row => (NullWritable.get(), new BytesWritable(row)))
  .saveAsNewAPIHadoopFile("hdfs://.....",
    NullWritable.get().getClass,
    new BytesWritable().getClass,
    new org.apache.hadoop.mapreduce.lib.output.TextOutputFormat[NullWritable, BytesWritable]().getClass)
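Alternatively, the single-argument call from the question can be kept, as long as the output format is supplied as an explicit type parameter; it is that missing type parameter which was inferred as Nothing and caused the original error:
dataset.map(row => (NullWritable.get(), new BytesWritable(row)))
  .saveAsNewAPIHadoopFile[org.apache.hadoop.mapreduce.lib.output.TextOutputFormat[NullWritable, BytesWritable]](output)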

Related

Instantiating scala collections via their apply method with scala reflection

I have a tool that is trying to build instances of subclasses of various Scala collections, for example scala.collection.Seq. I don't know in advance what specific class should be built, so I am trying to use reflection to get the apply method in the companion object, as follows (similar to declaring List[Int](1, 2, 3)).
import scala.reflect.runtime.{universe => ru}
import scala.reflect.runtime.universe._

def makeNewInstance[T <: scala.collection.Seq[_]](clazz: Class[T], args: List[_]): T = {
  val clazzMirror: ru.Mirror = ru.runtimeMirror(clazz.getClassLoader)
  val clazzSymbol = clazzMirror.classSymbol(clazz)
  // Look up the companion object of the collection class (e.g. ListBuffer's module)
  val companionObject = clazzSymbol.companion.asModule
  val instanceMirror = clazzMirror reflect (clazzMirror reflectModule companionObject).instance
  val typeSignature = instanceMirror.symbol.typeSignature
  // Find the companion's apply method and invoke it with the given arguments
  val ctor = typeSignature.member(TermName("apply")).asMethod
  instanceMirror.reflectMethod(ctor)(args: _*).asInstanceOf[T]
}

makeNewInstance(clazz = classOf[scala.collection.mutable.ListBuffer[Int]], args = List[Int](1, 2, 3))
Nonetheless, I am getting the following exception. I am unable to figure out what I should be passing into the apply method.
java.lang.IllegalArgumentException: argument type mismatch
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror1.jinvokeraw(JavaMirrors.scala:373)
at scala.reflect.runtime.JavaMirrors$JavaMirror$JavaMethodMirror.jinvoke(JavaMirrors.scala:339)
at scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror.apply(JavaMirrors.scala:355)
at Main$$anon$1.makeNewInstance(test.scala:12)
at Main$$anon$1.<init>(test.scala:15)
at Main$.main(test.scala:1)
at Main.main(test.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at scala.reflect.internal.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:70)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:101)
at scala.reflect.internal.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:70)
at scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:101)
at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:22)
at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:39)
at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:29)
at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:39)
at scala.tools.nsc.ScriptRunner.scala$tools$nsc$ScriptRunner$$runCompiled(ScriptRunner.scala:175)
at scala.tools.nsc.ScriptRunner$$anonfun$runScript$1.apply(ScriptRunner.scala:192)
at scala.tools.nsc.ScriptRunner$$anonfun$runScript$1.apply(ScriptRunner.scala:192)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1$$anonfun$apply$mcZ$sp$1.apply(ScriptRunner.scala:161)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply$mcZ$sp(ScriptRunner.scala:161)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply(ScriptRunner.scala:129)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply(ScriptRunner.scala:129)
at scala.tools.nsc.util.package$.trackingThreads(package.scala:43)
at scala.tools.nsc.util.package$.waitingForThreads(package.scala:27)
at scala.tools.nsc.ScriptRunner.withCompiledScript(ScriptRunner.scala:128)
at scala.tools.nsc.ScriptRunner.runScript(ScriptRunner.scala:192)
at scala.tools.nsc.ScriptRunner.runScriptAndCatch(ScriptRunner.scala:205)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:67)
at scala.tools.nsc.MainGenericRunner.run$1(MainGenericRunner.scala:87)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:98)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:103)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
Thanks in advance for any help you can offer.
I will answer my own question: the fix is to check whether the apply method accepts varargs or not:
if (ctor.isVarargs) instanceMirror.reflectMethod(ctor)(args.toList).asInstanceOf[T]
else instanceMirror.reflectMethod(ctor)(args:_*).asInstanceOf[T]
Thanks to a friend who pointed me in the right direction.
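Putting it together, here is a consolidated sketch of makeNewInstance with the varargs check folded in (same assumptions as the code in the question):
import scala.reflect.runtime.{universe => ru}
import scala.reflect.runtime.universe._

def makeNewInstance[T <: scala.collection.Seq[_]](clazz: Class[T], args: List[_]): T = {
  val clazzMirror: ru.Mirror = ru.runtimeMirror(clazz.getClassLoader)
  val companionObject = clazzMirror.classSymbol(clazz).companion.asModule
  val instanceMirror = clazzMirror reflect (clazzMirror reflectModule companionObject).instance
  val apply = instanceMirror.symbol.typeSignature.member(TermName("apply")).asMethod
  // A varargs apply takes the whole list as a single Seq argument;
  // a fixed-arity apply takes the arguments spread out.
  if (apply.isVarargs) instanceMirror.reflectMethod(apply)(args).asInstanceOf[T]
  else instanceMirror.reflectMethod(apply)(args: _*).asInstanceOf[T]
}

makeNewInstance(clazz = classOf[scala.collection.mutable.ListBuffer[Int]], args = List(1, 2, 3))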

Java List to Scala Conversion Error

I have a Java code base that returns a java.util.List, which I consume in my Scala layer as below:
import scala.collection.JavaConverters._
val myList = myServiceClient.getMyList.asScala.toList //fails here!
println(myList)
I then hit the following error:
Exception in thread "main" javax.xml.ws.soap.SOAPFaultException: scala.collection.immutable.$colon$colon cannot be cast to java.util.List
at org.apache.cxf.jaxws.JaxWsClientProxy.invoke(JaxWsClientProxy.java:161)
at com.sun.proxy.$Proxy49.getSlaveList(Unknown Source)
at Test$.main(Test.scala:35)
at Test.main(Test.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to java.util.List
at org.apache.cxf.binding.soap.SoapMessage.getHeaders(SoapMessage.java:56)
at org.apache.cxf.binding.soap.interceptor.SoapHeaderOutFilterInterceptor.handleMessage(SoapHeaderOutFilterInterceptor.java:37)
at org.apache.cxf.binding.soap.interceptor.SoapHeaderOutFilterInterceptor.handleMessage(SoapHeaderOutFilterInterceptor.java:29)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at org.apache.cxf.endpoint.ClientImpl.doInvoke(ClientImpl.java:514)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:423)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:324)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:277)
at org.apache.cxf.frontend.ClientProxy.invokeSync(ClientProxy.java:96)
at org.apache.cxf.jaxws.JaxWsClientProxy.invoke(JaxWsClientProxy.java:139)
... 8 more
So the original problem was a couple of lines above the snippet I posted in my question: I was putting a Scala Seq straight into the CXF request context, where the library expects a java.util.List (hence the ClassCastException thrown from SoapMessage.getHeaders). I had to convert the header list before passing it to the Apache CXF library:
val headerList = Seq(
  new Header(new QName("http://www.myService.com/MyServices/", "UserName"), "", new JAXBDataBinding(classOf[String])),
  new Header(new QName("http://www.myService.com/MyServices//", "Password"), "", new JAXBDataBinding(classOf[String]))
)

import scala.collection.JavaConverters._
// asJava converts the Scala Seq into the java.util.List that CXF expects
proxy.getRequestContext.put(Header.HEADER_LIST, headerList.asJava)

Task Not Serializable exception when using IgniteRDD

What is wrong with this code? I cannot get past the Task not serializable exception.
@throws(classOf[Exception])
override def setUp(cfg: BenchmarkConfiguration) {
  super.setUp(cfg)
  sc = new SparkContext("local[4]", "BenchmarkTest")
  sqlContext = new HiveContext(sc)
  ic = new IgniteContext[RddKey, RddVal](sc,
    () ⇒ configuration("client", client = true))
  icCache = ic.fromCache(PARTITIONED_CACHE_NAME)
  icCache.savePairs(sc.parallelize({
    (0 until 1000).map { n => (n.toLong, s"Value for key $n") }
  }, 10)) // Error happens here: this is "line 89"
  println(icCache.collect)
}
Here is the stack trace:
<20:47:45><yardstick> Failed to start benchmark server (will stop and exit).
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:805)
at org.apache.ignite.spark.IgniteRDD.savePairs(IgniteRDD.scala:170)
at org.yardstickframework.spark.SparkAbstractBenchmark.setUp(SparkAbstractBenchmark.scala:89)
at org.yardstickframework.spark.SparkCoreRDDBenchmark.setUp(SparkCoreRDDBenchmark.scala:18)
at org.yardstickframework.spark.SparkCoreRDDBenchmark$.main(SparkCoreRDDBenchmark.scala:72)
at org.yardstickframework.spark.SparkNode.start(SparkNode.scala:28)
at org.yardstickframework.BenchmarkServerStartUp.main(BenchmarkServerStartUp.java:61)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.serializer.SerializationDebugger$ObjectStreamClassMethods$.getObjFieldValues$extension(SerializationDebugger.scala:240)
It looks like your code is compiled against a different Scala version than the one the Ignite or Spark modules were compiled against. I got similar exceptions while testing when my code was compiled against Scala 2.10 and Spark was running Scala 2.11, or vice versa. The module com.databricks:spark-csv_2.10:1.1.0 might be the reason for this.
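If that is the cause, aligning the Scala binary version across every module fixes it. A minimal build.sbt sketch of what that alignment looks like (the versions here are illustrative, not taken from the question):
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // %% appends the Scala binary suffix (_2.10) automatically, keeping it in sync with scalaVersion
  "org.apache.spark" %% "spark-core" % "1.3.1" % "provided",
  // explicitly-suffixed artifacts must match the scalaVersion above
  "com.databricks" % "spark-csv_2.10" % "1.1.0"
)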

Result to Map in Scala Anorm

I am trying to get a map of name -> id from the result set.
val isp = SQL("select id, name from internet_service_providers").map { x => x[String]("name") -> x[String]("id") }
I am unable to understand why I am getting this error.
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at anorm.SqlStatementParser$$anonfun$3.apply(SqlStatementParser.scala:43)
at anorm.SqlStatementParser$$anonfun$3.apply(SqlStatementParser.scala:43)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.RegexParsers$class.parse(RegexParsers.scala:148)
at anorm.SqlStatementParser$.parse(SqlStatementParser.scala:11)
at anorm.SqlStatementParser$$anonfun$parse$1.apply(SqlStatementParser.scala:26)
at anorm.SqlStatementParser$$anonfun$parse$1.apply(SqlStatementParser.scala:26)
at scala.util.Try$.apply(Try.scala:161)
at anorm.SqlStatementParser$.parse(SqlStatementParser.scala:26)
at anorm.package$.SQL(package.scala:40)
at com.gumgum.nativead.NativeInventoryApp$.main(NativeInventoryApp.scala:49)
at com.gumgum.nativead.NativeInventoryApp.main(NativeInventoryApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
I am guessing that my way of creating the map in the code above might be completely wrong, or that there is a Scala version mismatch in the libraries used.
I am using Scala 2.11.5 and Anorm 2.4.0-M3 built with Scala 2.11.
First, the error java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; is not from Anorm but from Predef: the -> operator used to build the tuple is not found, which is quite weird. I would suggest checking your Scala version and dependencies to make sure several different Scala libraries are not being pulled in.
Then, if you want to turn a Row into a tuple, SqlParser.flatten can be used.
Finally, as the result will be a list of tuples, .toMap can be used.
import anorm.SqlParser.{ flatten, str }
SQL("...").as((str("name") ~ str("id")).map(flatten).*).toMap

Can deserialize avros to Scala case-classes from in-memory, but why not from files? Record can't be cast to case class?

I'm trying to use Salat-Avro to serialize and deserialize Scala case classes.
I can serialize and deserialize fine in memory, but I can only serialize to files; I can't deserialize from a file yet.
Why won't my DatumReader succeed when reading from a file like it did when reading from a stream?
[error] (run-main) java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
at Main.main(salat-avro-example.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
Caused by: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to models.Record
at Main$.<init>(salat-avro-example.scala:55)
at Main$.<clinit>(salat-avro-example.scala)
at Main.main(salat-avro-example.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
java.lang.RuntimeException: Nonzero exit code: 1
at scala.sys.package$.error(package.scala:27)
[error] {file:/home/julianpeeters/salat-avro-example/}default-7321ab/compile:run: Nonzero exit code: 1
[error] Total time: 18 s, completed Aug 30, 2012 12:04:01 AM
Here's the code:
val obj2 = grater[Record].asObjectFromDataFile(infile)
calls:
lazy val asDatumReader: AvroDatumReader[X] = asGenericDatumReader
lazy val asGenericDatumReader: AvroGenericDatumReader[X] = new AvroGenericDatumReader[X](asAvroSchema)

def asObjectFromDataFile(infile: File): X = {
  val asDataFileReader: DataFileReader[X] = new DataFileReader[X](infile, asDatumReader)
  asDataFileReader.next()
}
The code can also be seen on GitHub: Salat-Avro-Example.scala and Salat-Avro.avrograter.scala.
How do I fix this? Thanks!
Now I see that dataFileReader.next() did return a record, but the field values were still raw Avro types (strings as Utf8), and I needed to unmarshal the values back into a Scala object with applyValues. Something like the hackish snippet below worked for me:
import scala.collection.JavaConverters._

// Iterate the file reader and rebuild each generic record into the case class X
val objIterator = asDataFileReader.asScala
  .iterator
  .map(i => asGenericDatumReader.applyValues(i.asInstanceOf[GenericData.Record]).asInstanceOf[X])