Another question led me to the need to create a sequence of Scala expressions. I seem to be unable to do that.
I have a SchemaRDD object z:
org.apache.spark.sql.SchemaRDD =
SchemaRDD[0] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
ParquetTableScan [event_type#0,timestamp#1,id#2,domain#3,code#4], (ParquetRelation ...., Some(Configuration: core-default.xml, core-site.xml, yarn-default.xml, yarn-site.xml, mapred-default.xml, mapred-site.xml, hdfs-default.xml, hdfs-site.xml), org.apache.spark.sql.SQLContext#e7f91e, []), []
and I want to project it on two columns. select should be the answer:
z.select _
res19: Seq[org.apache.spark.sql.catalyst.expressions.Expression] => org.apache.spark.sql.SchemaRDD = <function1>
However, I seem to be unable to generate a Seq[Expression], e.g.:
z.select(Seq('event_type,'code))
<console>:21: error: type mismatch;
found : Seq[Symbol]
required: org.apache.spark.sql.catalyst.expressions.Expression
z.select(Seq('event_type,'code))
^
or:
z.select('event_type,'code)
<console>:21: error: type mismatch;
found : Symbol
required: org.apache.spark.sql.catalyst.expressions.Expression
z.select('event_type,'code)
^
I thought that a symbol was an expression...
So, how do I invoke select?
From a private e-mail:
It looks like you're missing this import:
import sqc._
where sqc is your SQLContext variable (so it's subtly different from import SQLContext._).
This loads a bunch of implicit functions which convert the symbols to an Expression.
If you haven't encountered implicit functions in Scala yet, they are basically a way to define a function that transparently converts one type to another. For example, suppose I define a function like this, which takes an instance of Bar and returns a Foo:
implicit def toFoo(bar : Bar) : Foo = new Foo(bar.toString)
Then, anywhere in the code where I have a function which expects a Foo I can pass it a Bar instead and the compiler will introduce a toFoo() call on it in the background (i.e. "implicitly").
It's one of the quirkiest features of Scala and very much a double-edged sword. It lets you write very powerful and concise DSLs, but at the same time it introduces a lot of magic that can make code impossible to follow :-/
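To make the mechanism concrete, here is a minimal, self-contained sketch of the toFoo idea described above. Bar, Foo, Conversions and describe are hypothetical stand-ins invented for illustration; they are not part of Spark, and the import of Conversions._ only plays the role that import sqc._ plays in the answer.

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins for the Bar and Foo types mentioned above.
final case class Bar(n: Int)
final case class Foo(label: String)

object Conversions {
  // Tells the compiler how to turn a Bar into a Foo wherever a Foo is expected.
  implicit def toFoo(bar: Bar): Foo = Foo(bar.toString)
}

// A function that only knows about Foo.
def describe(foo: Foo): String = s"got ${foo.label}"

// Bringing the conversion into scope is the analogue of `import sqc._`.
import Conversions._

// A Bar is passed where a Foo is expected; the compiler inserts toFoo(...).
val described: String = describe(Bar(1))
```

Without the import of Conversions._, the last line fails with the same kind of "type mismatch" error as in the question.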
I am new to Scala, and I hope this question is not too basic. I couldn't find the answer to this question on the web (which might be because I don't know the relevant keywords).
I am trying to understand the following definition:
def functionName[T <: AnyRef](name: Symbol)(range: String*)(f: T => String)(implicit tag: ClassTag[T]): DiscreteAttribute[T] = {
val r = ....
new anotherFunctionName[T](name.toString, f, Some(r))
}
First, why is it defined as def functionName[...](...)(...)(...)(...)? Can't we define it as def functionName[...](..., ..., ..., ...)?
Second, how does range: String* differ from range: String?
Third, would it be a problem if implicit tag: ClassTag[T] did not exist?
First, why is it defined as def functionName[...](...)(...)(...)(...)? Can't we define it as def functionName[...](..., ..., ..., ...)?
One good reason to use currying is to support type inference. Consider these two functions:
def pred1[A](x: A, f: A => Boolean): Boolean = f(x)
def pred2[A](x: A)(f: A => Boolean): Boolean = f(x)
Since type information flows from left to right, if you try to call pred1 like this:
pred1(1, x => x > 0)
the type of x => x > 0 cannot be determined yet and you'll get an error:
<console>:22: error: missing parameter type
pred1(1, x => x > 0)
^
To make it work you have to specify argument type of the anonymous function:
pred1(1, (x: Int) => x > 0)
pred2, on the other hand, can be used without specifying the argument type:
pred2(1)(x => x > 0)
or simply:
pred2(1)(_ > 0)
Second, how does range: String* differ from range: String?
It is the syntax for defining repeated parameters, a.k.a. varargs. Ignoring other differences, it can be used only in the last position of a parameter list and is available inside the method as a scala.Seq (here scala.Seq[String]). A typical use is the apply method of collection types, which allows syntax like SomeDummyCollection(1, 2, 3). For more, see:
What does `:_*` (colon underscore star) do in Scala?
Scala variadic functions and Seq
Is there a difference in Scala between Seq[T] and T*?
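As a small illustration of repeated parameters, here is a sketch with a made-up joinAll method (not taken from the question). It shows the vararg being treated as a Seq[String] inside the method and the two ways of calling it.

```scala
// `parts` is declared as String*, but inside the method it behaves as a Seq[String].
def joinAll(sep: String)(parts: String*): String = parts.mkString(sep)

// Callers may pass zero or more arguments individually...
val direct: String = joinAll("-")("a", "b", "c")
val empty: String = joinAll("-")()

// ...or expand an existing sequence with the `: _*` ascription.
val existing = Seq("x", "y")
val expanded: String = joinAll("-")(existing: _*)
```

Passing existing without : _* would not compile, because a Seq[String] is not a String.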
Third, would it be a problem if implicit tag: ClassTag[T] did not exist?
As already stated by Aivean, it shouldn't be the case here. ClassTags are automatically generated by the compiler and should be accessible as long as the class exists. In general, if an implicit argument cannot be found, you'll get an error:
scala> import scala.concurrent._
import scala.concurrent._
scala> val answer: Future[Int] = Future(42)
<console>:13: error: Cannot find an implicit ExecutionContext. You might pass
an (implicit ec: ExecutionContext) parameter to your method
or import scala.concurrent.ExecutionContext.Implicits.global.
val answer: Future[Int] = Future(42)
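For the ClassTag case specifically, the following sketch shows the compiler supplying the tag at the call site. emptyArrayOf and runtimeName are hypothetical helpers written for this example, not part of the code in the question.

```scala
import scala.reflect.ClassTag

// Without the ClassTag, `new Array[T](n)` would not compile, because T is erased at runtime.
def emptyArrayOf[T](n: Int)(implicit tag: ClassTag[T]): Array[T] = new Array[T](n)

// The tag carries the runtime class of T.
def runtimeName[T](implicit tag: ClassTag[T]): String = tag.runtimeClass.getName

// No tag is passed explicitly; the compiler materializes ClassTag[String] for us.
val strings: Array[String] = emptyArrayOf[String](3)
val name: String = runtimeName[String]
```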
Multiple argument lists: this is called "currying", and enables you to call a function with only some of the arguments, yielding a function that takes the rest of the arguments and produces the result type (partial function application). Here is a link to Scala documentation that gives an example of using this. Further, any implicit arguments to a function must be specified together in one argument list, coming after any other argument lists. While defining functions this way is not necessary (apart from any implicit arguments), this style of function definition can sometimes make it clearer how the function is expected to be used, and/or make the syntax for partial application look more natural (f(x) rather than f(x, _)).
Arguments with an asterisk: "varargs". This syntax denotes that rather than a single argument being expected, a variable number of arguments can be passed in, which will be handled as (in this case) a Seq[String]. It is the equivalent of specifying (String... range) in Java.
The implicit ClassTag: this is often needed to ensure proper typing of the function result, where the type (T here) cannot be determined at compile time. Since Scala runs on the JVM, which does not retain generic type information beyond compile time, this is a workaround used in Scala to ensure information about the type(s) involved is still available at runtime.
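The partial application mentioned in the first point can be sketched as follows. scale is a made-up method used only to illustrate the idea.

```scala
// Two parameter lists: the factor first, the value second.
def scale(factor: Int)(x: Int): Int = factor * x

// Supplying only the first list yields a function awaiting the second one.
// The expected type Int => Int tells the compiler to convert the method to a function.
val double: Int => Int = scale(2)

// The same works wherever a function is expected, such as in map.
val scaled: List[Int] = List(1, 2, 3).map(scale(10))
```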
Check currying: methods may define multiple parameter lists. When a method is called with fewer parameter lists, this yields a function taking the missing parameter lists as its arguments.
range:String* is the syntax for varargs
The implicit ClassTag parameter in Scala is the alternative to a Class<T> clazz parameter in Java. It will always be available if your class is defined in scope. Read more about type tags.
I asked a longer question, but it seems it's too much code for people to sort through so I've created this question to focus on one smaller, specific problem I'm facing regarding use of macros in Scala.
Consider the following code snippet:
val tpe = weakTypeOf[T]
val companion = tpe.typeSymbol.companionSymbol
val fields = tpe.declarations.collectFirst {
case m: MethodSymbol if m.isPrimaryConstructor => m
}.get.paramss.head
val toMapParams = fields.map { field =>
val name = field.name
val decoded = name.decoded
q"$decoded -> t.$name"
}
Note that fields is just the list of parameters for the primary constructor of a case class in this code. Where I'm confused is the result of the quasiquote q"$decoded -> t.$name". What does this mean exactly? And what type should it have? I'm getting a compile error stating the following:
Multiple markers at this line
- Implicit conversions found: q"$decoded -> t.$name" => Quasiquote(q"$decoded -> t.$name")
- type mismatch; found : field.NameType required: c.universe.TermName
- type mismatch; found : field.NameType required: c.universe.TermName
Can anyone explain this error? Thanks.
The type of fields is List[Symbol], which means that the type of names of those fields is inconclusive (unknown whether it's a TermName or TypeName). This means that you can't insert such names essentially anywhere in a quasiquote.
A simple fix would be to do val name = field.name.toTermName, explicitly telling the compiler that it's looking at a term name, so that quasiquote knows how to process it.
Working in Scala-IDE, I have a Java library, in which one of the methods receives java.lang.Object. And I want to map a list of Int values to it. The only solution that works is:
val listOfInts = groupOfObjects.map(_.getNeededInt)
for(int <- listOfInts) libraryObject.libraryMethod(int)
while the following one:
groupOfObjects.map(_.getNeededInt).map(libraryMethod(_))
and even
val listOfInts = groupOfObjects.map(_.getNeededInt)
val result = listOfInts.map(libraryObject.libraryMethod(_))
say
type mismatch; found : Int required: java.lang.Object Note: an
implicit exists from scala.Int => java.lang.Integer, but methods
inherited from Object are rendered ambiguous. This is to avoid a
blanket implicit which would convert any scala.Int to any AnyRef. You
may wish to use a type ascription: x: java.lang.Integer.
and something like
val result = listOfInts.map(libraryObject.libraryMethod(x => x.toInt))
or
val result = listOfInts.map(libraryObject.libraryMethod(_.toInt))
does not work either.
1) Why is it happening? As far as I know, the for and map routines do not differ that much!
2) Also: what does "You may wish to use a type ascription: x: java.lang.Integer" mean? How would I do that? I tried designating the type explicitly, like x: Int => x.toInt, but that is erroneous too. So what is a "type ascription"?
UPDATE:
The solution proposed by T.Grottker adds to it. The error I get with it is:
missing parameter type for expanded function ((x$3) => x$3.asInstanceOf[java.lang.Object])
missing parameter type for expanded function ((x$3) => x$3.asInstanceOf{#null#}[java.lang.Object]{#null#}) {#null#}
and I'm like, OMG, it just grows! Who can explain what all these <null> things mean here? I just want to know the truth. (NOTE: I had to replace the <> brackets with # because the SO engine cut out the whole thing otherwise, so use your imagination to put them back.)
The type mismatch tells you exactly the problem: you can convert to java.lang.Integer but not to java.lang.Object. So tell it you want to ask for an Integer somewhere along the way. For example:
groupOfObjects.map(_.getNeededInt: java.lang.Integer).map(libraryObject.libraryMethod(_))
(The notation value: Type, when used outside of the declaration of a val, var or method parameter, means to view value as that type, if possible; value either needs to be an instance of a subtype of Type, or there needs to be an implicit conversion that can convert value into something of the appropriate type.)
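Here is a self-contained sketch of the type ascription at work. libraryMethod below is a hypothetical stand-in for the Java library method that accepts java.lang.Object; the real library from the question is not reproduced.

```scala
// Hypothetical stand-in for the Java library method that accepts java.lang.Object.
def libraryMethod(o: Object): String = o.getClass.getSimpleName

val ints: List[Int] = List(1, 2, 3)

// As in the question, passing a bare Int is rejected by the Scala 2 compiler:
//   ints.map(libraryMethod(_))   // type mismatch; found: Int, required: Object

// Ascribing java.lang.Integer triggers the Int => Integer conversion first;
// an Integer is an Object, so the call then type-checks.
val results: List[String] = ints.map(i => libraryMethod(i: java.lang.Integer))
```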
I have the following code:
private lazy val keys: List[String] = obj.getKeys().asScala.toList
obj.getKeys returns a java.util.Iterator<java.lang.String>
Calling asScala, via JavaConverters (which is imported), according to the docs:
java.util.Iterator <==> scala.collection.Iterator
scala.collection.Iterator defines
def toList: List[A]
So based on this I believed this should work, however here is the compilation error:
[scalac] <file>.scala:11: error: type mismatch;
[scalac] found : List[?0] where type ?0
[scalac] required: List[String]
[scalac] private lazy val keys : List[String] = obj.getKeys().asScala.toList
[scalac] one error found
I understand the type parameter of the Java Iterator is a Java String, and that I am trying to create a list of Scala strings, but (perhaps naively) I thought that there would be an implicit conversion.
You don't need to call asScala, it is an implicit conversion:
import scala.collection.JavaConversions._
val javaList = new java.util.LinkedList[String]() // as an example
val scalaList = javaList.iterator.toList
If you really don't have the type parameter of the iterator, just cast it to the correct type:
javaList.iterator.asInstanceOf[java.util.Iterator[String]].toList
EDIT: Some people prefer not to use the implicit conversions in JavaConversions, but use the asScala/asJava decorators in JavaConverters to make the conversions more explicit.
That would work if obj.getKeys() was a java.util.Iterator<String>. I suppose it is not.
If obj.getKeys() is just java.util.Iterator in raw form, not java.util.Iterator<String>, not even java.util.Iterator<?>, this is something Scala tends to dislike. In any case, there is no way Scala will type your expression as List[String] if it has no guarantee that obj.getKeys() contains Strings.
If you know your iterator is over Strings, but the type does not say so, you may cast:
obj.getKeys().asInstanceOf[java.util.Iterator[String]]
(then go on with .asScala.toList)
Note that, just as in Java and because of type erasure, that cast will not be checked (you will get a warning). If you want to check immediately that you have Strings, you may rather do
obj.getKeys().asScala.map(_.asInstanceOf[String])
which will check the type of each element while you iterate to build the list.
I dislike the other answers. Hell, I dislike anything that suggests using asInstanceOf unless there's no alternative. In this case, there is. If you do this:
private lazy val keys : List[String] = obj.getKeys().asScala.collect {
case s: String => s
}.toList
You turn the Iterator[_] into an Iterator[String] safely and efficiently.
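A runnable sketch of the collect approach follows. It assumes Scala 2.13 or later (hence scala.jdk.CollectionConverters rather than JavaConverters), and it builds its own iterator with an uninformative element type, since obj.getKeys() from the question is not available here.

```scala
import scala.jdk.CollectionConverters._ // assumption: Scala 2.13+

// Simulate a legacy API whose iterator says nothing useful about its element type.
val javaList = new java.util.ArrayList[Any]()
javaList.add("a")
javaList.add(1)
javaList.add("b")
val untyped: java.util.Iterator[Any] = javaList.iterator()

// collect keeps only the elements that really are Strings, with no cast involved.
val keys: List[String] = untyped.asScala.collect { case s: String => s }.toList
```

Note the design trade-off: elements that are not Strings are dropped silently, whereas the asInstanceOf variants fail with a ClassCastException.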
Note that starting with Scala 2.13, the package scala.jdk.CollectionConverters replaces the deprecated scala.collection.JavaConverters/JavaConversions when it comes to implicit conversions between Java and Scala collections:
import scala.jdk.CollectionConverters._
// val javaIterator: java.util.Iterator[String] = java.util.Arrays.asList("a", "b").iterator
javaIterator.asScala
// Iterator[String] = <iterator>
As of Scala 2.12.8, one could use
import scala.collection.JavaConverters._
asScalaIterator(java.util.Iterator variable).toSeq
How can I convert a java 1.4 Collection to a Scala Seq?
I am trying to pass a java-collection to a scala method:
import scala.collection.JavaConversions._
// list is a 1.4 java.util.ArrayList
// repository.getDir is Java legacy code
val list = repository.getDir(...)
perform(list)
def perform(entries: List[SVNDirEntry]) = ...
I always receive this error:
type mismatch; found : java.util.Collection[?0] where type ?0 required: List[SVNDirEntry]
So I guess I have to create the parameterized Sequence myself as Scala is only able to create an unparameterized Iterable?
First you have to make sure that list has the type java.util.List[SVNDirEntry]. To do this, use a cast:
list.asInstanceOf[java.util.List[SVNDirEntry]]
After that, the implicit conversion will be resolved for you if you import the JavaConversions object. An implicit conversion to a Scala sequence exists in the JavaConversions object. See the following example with a list of strings being passed to a method that expects a Scala sequence:
scala> val jvlist: java.util.List[_] = new java.util.ArrayList[String]
jvlist: java.util.List[_] = []
scala> jvlist.asInstanceOf[java.util.List[String]]
res0: java.util.List[String] = []
scala> import collection.JavaConversions._
import collection.JavaConversions._
scala> def perform(scalaseq: Seq[String]) = println("ok")
perform: (scalaseq: scala.collection.Seq[String])Unit
scala> perform(res0)
ok
These conversions do not copy the elements - they simply construct a wrapper around a Java collection. Both versions point to the same underlying data. Thus, there is no implicit conversion in JavaConversions to immutable Scala lists from mutable Java lists, because that would enable changing the contents of a Scala collection that is guaranteed to be immutable.
In short - prefer Seq[...] to List[...] when defining parameters for methods if you can live with a less specific interface (as in perform above). Or write your own function that does the conversion by copying the elements.
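The difference between wrapping and copying can be sketched as below. This assumes Scala 2.13 or later, where the explicit asScala decorator from scala.jdk.CollectionConverters takes the place of the JavaConversions implicits used above; perform here is a simplified stand-in that takes Strings instead of SVNDirEntry.

```scala
import scala.jdk.CollectionConverters._ // assumption: Scala 2.13+
import scala.collection.mutable

val javaList = new java.util.ArrayList[String]()
javaList.add("a")

// asScala only wraps the Java list: no elements are copied.
val wrapped: mutable.Buffer[String] = javaList.asScala

// toList copies the current elements into an immutable Scala List.
val copied: List[String] = javaList.asScala.toList

// A later change to the Java list is visible through the wrapper but not in the copy.
javaList.add("b")

// Accepting the general collection.Seq lets callers pass either form.
// (In 2.13 the plain `Seq` alias means immutable.Seq, which a Buffer is not.)
def perform(entries: collection.Seq[String]): Int = entries.size
```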
You have to cast the legacy collection down to the target type. Something along the lines of:
perform(list.asInstanceOf[List[SVNDirEntry]])