I am using the following code:
val fs = FileSystem.get(new Configuration())
val status = fs.listStatus(new Path("wasb:///example/"))
status.foreach(x => println(x.getPath))
from this question: How to enumerate files in HDFS directory
My problem is that I do not understand how to make an alias for a class, and without one the code fails. I found all the classes mentioned in the code, and the following fully qualified version works:
val fs = org.apache.hadoop.fs.FileSystem.get(new org.apache.hadoop.conf.Configuration())
val status = fs.listStatus(new org.apache.hadoop.fs.Path("wasb:///example/"))
status
So the question is: how do I make an alias for a class in Scala? How do I point Path to org.apache.hadoop.fs.Path?
I looked at this Stack Overflow question: Class alias in scala, but could not see how it applies to my case.
I'm not sure about your term "alias". I think you want an import, e.g.
import org.apache.hadoop.fs.Path
or more generally
import org.apache.hadoop.fs._
Note that you can alias via an import, thus:
import org.apache.hadoop.fs.{Path => MyPath}
and then refer to Path as MyPath. This is particularly useful when writing code that imports two classes with the same name from different packages, e.g. java.util.Date and java.sql.Date. Aliasing lets you resolve that ambiguity.
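Applied to the snippet in the question, plain imports are enough (a minimal sketch, assuming the Hadoop client library is on the classpath); a rename import is shown for the two-Date case mentioned above:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
val status = fs.listStatus(new Path("wasb:///example/"))
status.foreach(x => println(x.getPath))

// Rename imports give each Date an unambiguous name:
import java.util.{Date => JUDate}
import java.sql.{Date => SqlDate}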
Related
I'm a Scala newbie, using PySpark extensively (on Databricks, FWIW). I'm finding that Protobuf deserialization is too slow for me in Python, so I'm porting my deserialization UDF to Scala.
I've compiled my .proto files to Scala and then into a JAR using ScalaPB, as described here.
When I try to use these instructions to create a UDF like this:
import gnmi.gnmi._
import org.apache.spark.sql.{Dataset, DataFrame, functions => F}
import spark.implicits.StringToColumn
import scalapb.spark.ProtoSQL
// import scalapb.spark.ProtoSQL.implicits._
import scalapb.spark.Implicits._
val deserialize_proto_udf = ProtoSQL.udf { bytes: Array[Byte] => SubscribeResponse.parseFrom(bytes) }
I get the following error:
command-4409173194576223:9: error: could not find implicit value for evidence parameter of type frameless.TypedEncoder[Array[Byte]]
val deserialize_proto_udf = ProtoSQL.udf { bytes: Array[Byte] => SubscribeResponse.parseFrom(bytes) }
I've double-checked that I'm importing the correct implicits, to no avail. I'm pretty fuzzy on implicits, evidence parameters and Scala in general.
I would really appreciate it if someone could point me in the right direction. I don't even know how to start diagnosing!
Update
It seems like frameless doesn't include an implicit encoder for Array[Byte]?
This works:
frameless.TypedEncoder[Byte]
this does not:
frameless.TypedEncoder[Array[Byte]]
The code for frameless.TypedEncoder seems to include a generic Array encoder, but I'm not sure I'm reading it correctly.
@Dmytro, thanks for the suggestion. That helped.
Does anyone have ideas about what is going on here?
Update
OK, progress: this looks like a Databricks issue. I think that the notebook does something like the following on startup:
import spark.implicits._
I'm using ScalaPB, which requires that you don't do that.
I'm now hunting for a way to disable that automatic import, or to "unimport" or "shadow" those implicits after they get imported.
If spark.implicits._ is already imported, then a way to "unimport" it (hide or shadow those implicits) is to create a duplicate object and import it too:
import org.apache.spark.sql.{SQLContext, SQLImplicits}

object implicitShadowing extends SQLImplicits with Serializable {
  protected override def _sqlContext: SQLContext = ???
}
import implicitShadowing._
Testing with case class Person(id: Long, name: String):
// no import
List(Person(1, "a")).toDS() // doesn't compile, value toDS is not a member of List[Person]
import spark.implicits._
List(Person(1, "a")).toDS() // compiles
import spark.implicits._
import implicitShadowing._
List(Person(1, "a")).toDS() // doesn't compile, value toDS is not a member of List[Person]
How to override an implicit value?
Wildcard Import, then Hide Particular Implicit?
How to override an implicit value, that is imported?
How can an implicit be unimported from the Scala repl?
Not able to hide Scala Class from Import
NullPointerException on implicit resolution
Constructing an overridable implicit
Caching the circe implicitly resolved Encoder/Decoder instances
Scala implicit def do not work if the def name is toString
Is there a workaround for this format parameter in Scala?
Please check whether this helps.
A possible problem is that you don't want to just unimport spark.implicits._ (scalapb.spark.Implicits._); you probably also want to import scalapb.spark.ProtoSQL.implicits._. And I don't know whether implicitShadowing._ shadows some of those as well.
Another possible workaround is to resolve implicits manually and use them explicitly.
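As a generic illustration of that last workaround (plain Scala with made-up names, not the real scalapb/frameless API): the "evidence parameter" in the error is just an implicit parameter the compiler must fill in, and you can always resolve such an instance yourself and pass it explicitly:
trait Show[A] { def show(a: A): String }
object Show {
  implicit val intShow: Show[Int] = (a: Int) => a.toString
}
// The implicit parameter is the "evidence" the compiler looks for at each call site.
def render[A](a: A)(implicit s: Show[A]): String = s.show(a)
// Resolve the instance manually and pass it in the explicit argument list:
val intInstance: Show[Int] = implicitly[Show[Int]]
val rendered = render(42)(intInstance)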
I have a Spark UDF written in Scala. I'd like to use my function with some additional files.
import scala.io.Source
import org.json4s.jackson.JsonMethods.parse
import org.json4s.DefaultFormats
object consts {
  implicit val formats = DefaultFormats
  val my_map = parse(Source.fromFile("src/main/resources/map.json").mkString).extract[Map[String, Map[String, List[String]]]]
}
Now I want to use the my_map object inside a UDF, so I basically do this:
import package.consts.my_map

object myUDFs {
  *bla-bla and use my_map*
}
I've already tested my function locally, and it works well.
Now I want to understand how to package the JAR file so that the .json file stays inside it.
Thank you.
If you manage your project with Maven, you can place your .json file(s) under src/main/resources as it's the default place where Maven looks for your project's resources.
You also can define a custom path for your resources as described here: https://maven.apache.org/plugins/maven-resources-plugin/examples/resource-directory.html
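Once it is under src/main/resources, the file is copied to the root of the JAR, so it should be read from the classpath rather than from a src/main/... filesystem path. A minimal sketch of the consts object adjusted that way (same json4s extraction as in the question):
import scala.io.Source
import org.json4s.jackson.JsonMethods.parse
import org.json4s.DefaultFormats

object consts {
  implicit val formats = DefaultFormats

  // getResourceAsStream resolves "map.json" inside the JAR on the classpath.
  val my_map = parse(
    Source
      .fromInputStream(getClass.getClassLoader.getResourceAsStream("map.json"))
      .mkString
  ).extract[Map[String, Map[String, List[String]]]]
}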
UPD: I managed to do so by creating a fat JAR and reading my resource file this way:
parse(
  Source
    .fromInputStream(
      getClass.getClassLoader.getResourceAsStream("map.json")
    )
    .mkString
).extract[Map[String, Map[String, List[String]]]]
I'm facing the below problem with Spark Shell. In a shell session:
I imported the following: import scala.collection.immutable.HashMap
Then I realized my mistake and imported the correct class: import java.util.HashMap
But now I get the following error when running my code:
<console>:34: error: reference to HashMap is ambiguous;
it is imported twice in the same scope by
import java.util.HashMap
and import scala.collection.immutable.HashMap
val colMap = new HashMap[String, HashMap[String, String]]()
Please assist: I have a long-running Spark Shell session, i.e. I do not want to close and reopen my shell. Is there a way I can clear the previous imports and use the correct class?
I know that we can also specify the fully qualified name, like: val colMap = new java.util.HashMap[String, java.util.HashMap[String, String]]()
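Another workaround in the same spirit, using the rename imports from the first answer on this page (a sketch, entered as a fresh line in the same shell session):
// Give the newly needed class an unambiguous alias instead of clearing imports:
import java.util.{HashMap => JHashMap}
val colMap = new JHashMap[String, JHashMap[String, String]]()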
But I'm looking for a way to clear the incorrectly loaded class.
Thanks
I am trying to type-check an arbitrary AST using a Toolbox in Scala. Basically, I am building expressions with quasiquotes like
newTree = q"$oldTree + $x"
newTree = Typecheck(newTree)
where $oldTree is an AST with some type unknown to me. I need to fill in the fields like newTree.tpe based on the information already present in oldTree and x.
Typecheck() is defined as follows:
import scala.reflect.runtime.universe._
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox
import bitstream.types._
object Typecheck {
  def apply[A](treeStr: String): A = {
    val toolbox = runtimeMirror(getClass.getClassLoader).mkToolBox()
    val tree = toolbox.parse(treeStr)
    toolbox.typecheck(tree).asInstanceOf[A]
  }
}
Currently I am trying to process the following:
Typecheck("types.this.Bit.bit2Int(d2_old)")
where Bit.bit2Int() is a method defined in an object Bit in the package bitstream.types. This is a package containing custom classes and objects. I currently get the error:
scala.tools.reflect.ToolBoxError: reflective typecheck has failed: types is not an enclosing class
My guess is that bitstream.types isn't in the context of the mirror used by the Toolbox, but I don't know how to resolve this. I think this GitHub issue is related, but I'm not sure how to interpret the discussion on the issue page.
Try
Typecheck("_root_.bitstream.types.Bit.bit2Int(d2_old)")
i.e. with the package prefix and without this.
(I have no idea what d2_old is and whether you can put it there.)
Maybe you can add the import to the tree (note the s interpolator, so that treeStr is actually spliced in):
val tree = toolbox.parse(s"""
  import bitstream.types._
  ${treeStr}
""")
I have a situation where I have to get the fully qualified name of a class I generate dynamically in Scala. Here's what I have so far.
import scala.reflect.runtime.universe
import scala.tools.reflect.ToolBox
val tb = universe.runtimeMirror(getClass.getClassLoader).mkToolBox()
val generatedClass = "class Foo { def addOne(i: Int) = i + 1 }"
tb.compile(tb.parse(generatedClass))
val fooClass:String = ???
Clearly this is just a toy example, but I don't know how to get the fully qualified name of Foo. I tried sticking a package declaration into the generated code, but that threw an error when calling tb.compile.
Does anyone know how to get the fully qualified class name or (even better) to specify the package that Foo gets compiled under?
Thanks
EDIT
After using the proposed solution I was able to get the class name. However, the next step is to register this class to take some actions later. Specifically, I'm trying to make use of UDTRegistration within Apache Spark to handle my own custom UserDefinedTypes. This strategy works fine when I manually create all the types; however, I want to use them to extend other types I may not know about.
After reading this, it seems like what I'm trying to do might not be possible with code compiled at runtime via reflection. Maybe a better solution is to use Scala macros, but I'm very new to that area.
You may use define instead of compile to generate the new class and get its package:
val cls = tb.define(tb.parse(generatedClass).asInstanceOf[universe.ImplDef])
println(cls.fullName) //__wrapper$1$d1de39015284494799acd2875643f78e.Foo
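A possible follow-up, as an untested sketch (assuming tb, generatedClass and cls from the snippet above, and assuming that classes created with define can be referenced by their full name from later toolbox calls):
// Refer to the freshly defined class by its full name in a later eval.
val result = tb.eval(tb.parse(s"new ${cls.fullName}().addOne(41)"))
println(result) // expected: 42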