Spark Shell - Clear imports - scala

I'm facing the below problem with the Spark shell. In a shell session,
I imported the following: import scala.collection.immutable.HashMap
Then I realized my mistake and imported the correct class: import java.util.HashMap
But now I get the following error when running my code:
<console>:34: error: reference to HashMap is ambiguous;
it is imported twice in the same scope by
import java.util.HashMap
and import scala.collection.immutable.HashMap
val colMap = new HashMap[String, HashMap[String, String]]()
Please assist: I have a long-running Spark shell session, i.e. I do not want to close and reopen my shell. Is there a way I can clear the previous import and use the correct class?
I know that we can also specify the fully qualified name, like: val colMap = new java.util.HashMap[String, java.util.HashMap[String, String]]()
But I'm looking for a way to clear an incorrectly loaded class.
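As a sketch of that kind of workaround, a rename-on-import also sidesteps the ambiguity without restarting the shell (the alias name JHashMap is just illustrative):
// rename on import: only this import binds the name JHashMap, so there is no clash
import java.util.{HashMap => JHashMap}
val colMap = new JHashMap[String, JHashMap[String, String]]()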
Thanks

Related

How to pass variable arguments to my scala program?

I am very new to Scala and Spark. I have a word-count program in which I pass the input file as an argument instead of hardcoding it. But when I run the program I get the error Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
I think it's because I have not supplied the argument that the main method expects, but I don't know how to do so.
I tried running the program as is and also tried changing the run configuration. I do not know how to pass the filename as an argument to my main class.
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.types.{StructType, StructField, StringType}
import org.apache.spark.sql.Row

object First {
  def main(args: Array[String]): Unit = {
    val filename = args(0)
    val cf = new SparkConf().setAppName("Tutorial").setMaster("local")
    val sc = new SparkContext(cf)
    val input = sc.textFile(filename)
    val w = input.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    w.collect.foreach(println)
    w.saveAsTextFile(args(1))
  }
}
I wish to run this program by passing the right arguments (the input file and the output path) to my main class. I am using the Scala Eclipse IDE. I do not know what changes to make in my program; please help me out here as I am new.
In the run configuration for the project, there is an option right next to main called '(x)=Arguments' where you can pass in arguments to main in the 'Program Arguments' section.
Additionally, after doing the above, you may print args.length to see how many arguments your code is actually receiving; a small guard like the sketch below also makes the failure mode much clearer.
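For illustration, a minimal guard at the top of the question's own main method (the usage string is made up) fails fast with a readable message instead of an ArrayIndexOutOfBoundsException:
object First {
  def main(args: Array[String]): Unit = {
    // fail fast if the input file and output directory were not passed
    if (args.length < 2) {
      System.err.println("Usage: First <inputFile> <outputDir>")
      sys.exit(1)
    }
    val filename = args(0)
    val outputDir = args(1)
    // ... build the SparkConf/SparkContext and run the job as in the question
  }
}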
It appears you are running Spark on Windows, so I'm not sure whether this will work exactly as-is, but you can definitely pass arguments like in any normal command-line application. The only difference is that you have to pass the arguments AFTER the Spark-related parameters.
For example, if the JAR filename is the.jar and the main object is com.obrigado.MyMain, then you could run a spark-submit job like so: spark-submit --class com.obrigado.MyMain the.jar path/to/inputfile. args(0) should then be path/to/inputfile.
However, like any command-line program, it's generally better to use POSIX-style (or at least named) arguments, and there are several good parsing libraries out there. Personally, I love using Scallop, as it's easy to use and doesn't seem to interfere with Spark's own CLI parsing.
Hopefully this fixes your issue!
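For reference, a rough sketch of what a Scallop-based entry point could look like (the option names are illustrative and the API details are from memory, so check the Scallop docs):
import org.rogach.scallop._

// illustrative option names; the vals become --input / --output flags
class Conf(arguments: Seq[String]) extends ScallopConf(arguments) {
  val input = opt[String](required = true, descr = "path to the input file")
  val output = opt[String](required = true, descr = "directory for the word counts")
  verify()
}

object First {
  def main(args: Array[String]): Unit = {
    val conf = new Conf(args)
    println(s"input=${conf.input()}, output=${conf.output()}")
    // ... build the SparkContext and run the word count as in the question
  }
}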

How to make aliases for the classes in scala

I am using the following code:
val fs = FileSystem.get(new Configuration())
val status = fs.listStatus(new Path("wasb:///example/"))
status.foreach(x => println(x.getPath))
from this question: How to enumerate files in HDFS directory
My problem is that I do not understand how to make an alias for a class; without it, the code fails. I found all the classes mentioned in the code, and the following code works:
val fs = org.apache.hadoop.fs.FileSystem.get(new org.apache.hadoop.conf.Configuration())
val status = fs.listStatus(new org.apache.hadoop.fs.Path("wasb:///example/"))
status
So the question is: how do I make an alias for a class in Scala? How do I point Path to org.apache.hadoop.fs.Path?
I looked at this question on Stack Overflow: Class alias in scala, but did not see the connection with my case.
I'm not sure about your term "alias". I think you want an import, e.g.
import org.apache.hadoop.fs.Path
or more generally
import org.apache.hadoop.fs._
Note that you can alias via an import, thus:
import org.apache.hadoop.fs.{Path => MyPath}
and then refer to Path as MyPath. This is particularly useful when writing code that imports two classes with the same name from different packages, e.g. java.util.Date and java.sql.Date. Aliasing allows you to resolve that ambiguity.
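For instance, a small sketch of that Date scenario (the variable names are just illustrative):
import java.sql.Date
import java.util.{Date => JUDate}       // rename on import to avoid the clash

val utilNow = new JUDate()              // java.util.Date via the alias
val sqlNow = new Date(utilNow.getTime)  // java.sql.Date keeps its plain name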

object play.http.HttpEntity.Streamed is not a value

Using Scala and Play 2.5.10 (according to plugin.sbt) I have this code:
import akka.stream.scaladsl.Source
import play.api.libs.streams.Streams
import play.http._
val source = Source.fromPublisher(Streams.enumeratorToPublisher(enumerator))
Ok.sendEntity(HttpEntity.Streamed(source, None, Some("application/zip")))
The imports there are mostly left over from experimenting, because no matter what I try I can't get the framework to accept HttpEntity.Streamed. With this setup the error is what the title says, and the console shows the same message.
Looking at the documentation here I can't really figure out why it doesn't work: https://www.playframework.com/documentation/2.5.10/api/java/play/http/HttpEntity.Streamed.html
This is also what the official examples use: https://www.playframework.com/documentation/2.5.x/ScalaStream
Does anyone at least have some pointers on where to start looking? I've never used Scala or Play before so any hints are welcome.
You should import this one:
import play.api.http.HttpEntity
import play.api.libs.streams.Streams
val entity: HttpEntity = HttpEntity.Streamed(fileContent, None, None)
Result(ResponseHeader(200), entity).as(mimeTypeForTheFile)
The error means that the HttpEntity.Streamed you imported from play.http (the Java API) is not a value; import play.api.http.HttpEntity (the Scala API) instead, and wrap the entity in a Result() with its ResponseHeader and content type.
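Applied to the question's snippet inside a controller action, a minimal sketch with the Scala-API import (assuming enumerator is an Enumerator[ByteString]):
import akka.stream.scaladsl.Source
import play.api.http.HttpEntity
import play.api.libs.streams.Streams

// bridge the old enumerator to an Akka Streams source, then stream it as the response body
val source = Source.fromPublisher(Streams.enumeratorToPublisher(enumerator))
Ok.sendEntity(HttpEntity.Streamed(source, None, Some("application/zip")))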

Scala - Check if a class is imported or not

In the Scala console, the command:
import testPackage._
will give the below output:
scala> import testPackage._
import testPackage._
But after importing, how can I check which classes were imported, or list the classes of testPackage in the console (just for verification)? Please help.
Within the REPL I'm not sure if there's a command for listing all imported classes; however, what you can do is use tab completion. Just type:
scala> val tmp : testPackage.
and then hit TAB. You should get a list of the types available within that scope.
HTH

Why does the Scala compiler give "value registerKryoClasses is not a member of org.apache.spark.SparkConf" for Spark 1.4?

I tried to register a class for Kryo as follows
val conf = new SparkConf().setMaster(...).setAppName(...)
conf.registerKryoClasses(Seq(classOf[MyClass]))
val sc = new SparkContext(conf)
However, I get the following error
value registerKryoClasses is not a member of org.apache.spark.SparkConf
I also tried conf.registerKryoClasses(classOf[MyClass]), but it still complains with the same error.
What mistake am I making? I am using Spark 1.4.
The method SparkConf.registerKryoClasses is defined in Spark 1.4 (since 1.2). However, it expects an Array[Class[_]] as an argument. This might be the problem. Try calling conf.registerKryoClasses(Array(classOf[MyClass])) instead.
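A minimal sketch of the corrected setup (MyClass here is a stand-in for whatever class you want Kryo to serialize):
import org.apache.spark.{SparkConf, SparkContext}

// stand-in for the question's own class
case class MyClass(id: Int, name: String)

val conf = new SparkConf().setMaster("local[*]").setAppName("kryo-example")
// registerKryoClasses expects an Array[Class[_]], not a Seq or a single Class
conf.registerKryoClasses(Array(classOf[MyClass]))
val sc = new SparkContext(conf)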