My Scala program does not recognize java.io.File

Why doesn't my Scala program recognize java.io.File?
scala> val in = new java.util.Scanner(java.io.File("~/test.py"))
<console>:13: error: object java.io.File is not a value
val in = new java.util.Scanner(java.io.File("~/test.py"))
^

When you write code like java.io.File("~/test.py"), it means that you're calling the apply method of the java.io.File class. But the File API doesn't provide such a method.
If you want to create a new File instance, you should use the constructor:
new java.io.File("~/test.py")
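For example, a minimal sketch of the original Scanner call with the constructor fix (note that the JVM does not expand ~, so an explicit path is safer; /home/user/test.py below is just a placeholder):
// construct the File with new, then hand it to the Scanner
val in = new java.util.Scanner(new java.io.File("/home/user/test.py"))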

Related

Converting from java.util.List to spark dataset

I am still very new to Spark and Scala, but very familiar with Java. I have a Java jar with a function that returns a List (java.util.List) of Integers, but I want to convert these to a Spark Dataset so I can append it to another column and then perform a join. Is there any easy way to do this? I've tried things similar to this code:
val testDSArray : java.util.List[Integer] = new util.ArrayList[Integer]()
testDSArray.add(4)
testDSArray.add(7)
testDSArray.add(10)
val testDS : Dataset[Integer] = spark.createDataset(testDSArray, Encoders.INT())
but it gives me compiler errors (cannot resolve overloaded method)?
If you look at the type signature, you will see that in Scala the encoder is passed in a second (and implicit) parameter list.
You may:
Pass it in another parameter list.
val testDS = spark.createDataset(testDSArray)(Encoders.INT)
Not pass it, and let Scala's implicit mechanism resolve it.
import spark.implicits._
val testDS = spark.createDataset(testDSArray)
Convert the Java List to a Scala one first.
import collection.JavaConverters._
import spark.implicits._
val testDS = testDSArray.asScala.toDS()
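Putting the pieces together, here is a self-contained sketch of the first option, assuming a local SparkSession named spark (the builder settings below are placeholders):
import java.util.{ArrayList => JArrayList}
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// placeholder local session; adjust master/appName for your environment
val spark = SparkSession.builder().master("local[*]").appName("create-dataset").getOrCreate()

val testDSArray = new JArrayList[Integer]()
testDSArray.add(4)
testDSArray.add(7)
testDSArray.add(10)

// the encoder goes in the second (implicit) parameter list
val testDS: Dataset[Integer] = spark.createDataset(testDSArray)(Encoders.INT)
testDS.show()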

Scala JavaConverters doesn't seem to work with collections returned by static methods

I'm using a Java library from inside Scala 2.11 code. This Java library has a static load method that returns a Map<String,String>. Example usage in Java:
Map<String,String> map = Environment.load("dev");
I'm trying to get it working in Scala like so:
import scala.collection.JavaConverters._
val map : Map[String,String] = Environment.load("dev").asJava
And I'm getting a compiler error:
"Cannot resolve symbol asJava"
Any ideas?
Use asScala instead of asJava: asJava converts a Scala collection to a Java one, while asScala converts the Java Map returned by Environment.load into a Scala one, which is what you need here:
import scala.collection.JavaConverters._
val map: Map[String, String] = Environment.load("dev").asScala.toMap
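For illustration, a minimal sketch with a plain java.util.HashMap standing in for the map returned by Environment.load: asScala wraps the Java map as a mutable Scala view, and toMap copies it into an immutable Scala Map.
import scala.collection.JavaConverters._

// stand-in for the java.util.Map[String, String] returned by Environment.load("dev")
val javaMap = new java.util.HashMap[String, String]()
javaMap.put("env", "dev")
javaMap.put("region", "us-east-1")

val map: Map[String, String] = javaMap.asScala.toMap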

Not able to write SequenceFile in Scala for Array[NullWritable, ByteWritable]

I have a byte array in Scala: val nums = Array[Byte](1,2,3,4,5,6,7,8,9) (or take any other byte array).
I want to save it as a sequence file in HDFS. Below is the code I am writing in the Scala console.
import org.apache.hadoop.io.compress.GzipCodec
nums.map(x => (NullWritable.get(), new ByteWritable(x))).saveAsSequenceFile("/yourPath", classOf[GzipCodec])
But it's giving the following error:
error: value saveAsSequenceFile is not a member of Array[(org.apache.hadoop.io.NullWritable, org.apache.hadoop.io.ByteWritable)]
You also need to import these classes (in the Scala console):
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.io.ByteWritable
The method saveAsSequenceFile is available on an RDD, not on an array. So first you need to lift your array into an RDD, and then you will be able to call saveAsSequenceFile:
val v = sc.parallelize(Array(("owl",3), ("gnu",4), ("dog",1), ("cat",2), ("ant",5)), 2)
v.saveAsSequenceFile("hd_seq_file")
http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html
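Applied to the byte array from the question, a minimal sketch (assuming a Spark shell where sc is available; note that saveAsSequenceFile takes the compression codec as an Option):
import org.apache.hadoop.io.{ByteWritable, NullWritable}
import org.apache.hadoop.io.compress.GzipCodec

val nums = Array[Byte](1, 2, 3, 4, 5, 6, 7, 8, 9)

// lift the plain bytes into an RDD first, then wrap them in Writables
val rdd = sc.parallelize(nums).map(x => (NullWritable.get(), new ByteWritable(x)))

rdd.saveAsSequenceFile("/yourPath", Some(classOf[GzipCodec]))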

Scala error: value $ is not a member of object org.apache.spark.api.java.JavaSparkContext

I'm new to Spark. I'm trying to run the following code in Spark shell:
import org.apache.spark.api.java.JavaSparkContext
import org.apache.hadoop.conf
JavaSparkContext context = new JavaSparkContext(conf);
But I'm getting the following error:
<console>:32: error: value context is not a member of object
org.apache.spark.api.java.JavaSparkContext
val $ires10 = JavaSparkContext.context
^
<console>:29: error: value context is not a member of object
org.apache.spark.api.java.JavaSparkContext
JavaSparkContext context = new JavaSparkContext(conf);
Is there any import statement that I'm missing? I also added
import org.apache.spark.api.java.JavaSparkContext._
but it still didn't work. Please help.
UPDATE: Whether the configuration is valid or not is something you'll have to work on, but this addresses the error you asked about in your original question.
Your code is (almost) valid Java, but not valid Scala. Did you mean something like this:
import org.apache.spark.api.java.JavaSparkContext
val context = new JavaSparkContext()
Alternatively, since you're using Scala, you might want to try this instead.
import org.apache.spark.SparkContext
val context = new SparkContext()
As for the error you report: Scala treats the statement JavaSparkContext context as a reference to a member named context of an object named JavaSparkContext, not as a variable declaration as it is in Java.
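As an illustrative sketch only (not part of the original answer), here is how the context can be built from an explicit SparkConf; note that the Spark shell already provides a SparkContext as sc, so there you would normally reuse that rather than construct a new one. The app name and master values below are placeholders.
import org.apache.spark.{SparkConf, SparkContext}

// placeholder settings; adjust the master URL and app name for your environment
val conf = new SparkConf().setAppName("example").setMaster("local[*]")
val context = new SparkContext(conf)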

Creating a scala Reader from a file

How do you instantiate a scala.util.parsing.input.Reader to read from a file? The API mentions in passing something about PagedSeq and java.io.Reader, but it's not clear at all how to accomplish that.
You create a FileInputStream, pass that to an InputStreamReader and pass that to the apply method of the StreamReader companion object, which returns a StreamReader, a subtype of Reader.
scala> import scala.util.parsing.input.{StreamReader,Reader}
import scala.util.parsing.input.{StreamReader, Reader}
scala> import java.io._
import java.io._
scala> StreamReader(new InputStreamReader(new FileInputStream("test")))
res0: scala.util.parsing.input.StreamReader = scala.util.parsing.input.StreamReader@1e5a0cb
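Alternatively, since the question mentions PagedSeq, here is a minimal sketch using PagedSeqReader (assuming Scala 2.11, where PagedSeq still lives in scala.collection.immutable):
import scala.collection.immutable.PagedSeq
import scala.util.parsing.input.PagedSeqReader

// reads the file lazily, page by page
val reader = new PagedSeqReader(PagedSeq.fromFile("test"))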