I am writing a Hive UDF in Scala (because I want to learn scala). To do this, I have to override three functions: evaluate, initialize and getDisplayString.
In the initialize function I have to:
Receive an array of ObjectInspector and return an ObjectInspector
Check if the array is null
Check if the array has the correct size
Check if the array contains the object of the correct type
To do this, I am using pattern matching and came up with the following function:
override def initialize(genericInspectors: Array[ObjectInspector]): ObjectInspector = genericInspectors match {
case null => throw new UDFArgumentException(functionNameString + ": ObjectInspector is null!")
case _ if genericInspectors.length != 1 => throw new UDFArgumentException(functionNameString + ": requires exactly one argument.")
case _ => {
listInspector = genericInspectors(0) match {
case concreteInspector: ListObjectInspector => concreteInspector
case _ => throw new UDFArgumentException(functionNameString + ": requires an input array.")
}
PrimitiveObjectInspectorFactory.getPrimitiveWritableObjectInspector(listInspector.getListElementObjectInspector.asInstanceOf[PrimitiveObjectInspector].getPrimitiveCategory)
}
}
Nevertheless, I have the impression that the function could be made more legible and, in general, prettier since I don't like to have code with too many levels of indentation.
Is there an idiomatic Scala way to improve the code above?
It's typical for patterns to include other patterns. The type of x here is String.
scala> val xs: Array[Any] = Array("x")
xs: Array[Any] = Array(x)
scala> xs match {
| case null => ???
| case Array(x: String) => x
| case _ => ???
| }
res0: String = x
The idiom for "any number of args" is "sequence pattern", which matches arbitrary args:
scala> val xs: Array[Any] = Array("x")
xs: Array[Any] = Array(x)
scala> xs match { case Array(x: String) => x case Array(_*) => ??? }
res2: String = x
scala> val xs: Array[Any] = Array(42)
xs: Array[Any] = Array(42)
scala> xs match { case Array(x: String) => x case Array(_*) => ??? }
scala.NotImplementedError: an implementation is missing
at scala.Predef$.$qmark$qmark$qmark(Predef.scala:230)
... 32 elided
scala> Array("x","y") match { case Array(x: String) => x case Array(_*) => ??? }
scala.NotImplementedError: an implementation is missing
at scala.Predef$.$qmark$qmark$qmark(Predef.scala:230)
... 32 elided
This answer should not be construed as advocating matching your way back to type safety.
Related
I tried to do collection matching in Scala without using scala.reflect.ClassTag
case class Foo(name: String)
case class Bar(id: Int)
case class Items(items: Vector[AnyRef])
val foo = Vector(Foo("a"), Foo("b"), Foo("c"))
val bar = Vector(Bar(1), Bar(2), Bar(3))
val fc = Items(foo)
val bc = Items(bar)
we can not do this:
fc match {
case Items(x) if x.isInstanceOf[Vector[Foo]]
}
because:
Warning: non-variable type argument Foo in type scala.collection.immutable.Vector[Foo] (the underlying of Vector[Foo]) is unchecked since it is eliminated by erasure
and this:
fc match {
case Items(x: Vector[Foo]) =>
}
but we can do this:
fc match {
case Items(x#(_: Foo) +: _) => ...
case Items(x#(_: Bar) +: _) => ...
}
bc match {
case Items(x#(_: Foo) +: _) => ...
case Items(x#(_: Bar) +: _) => ...
}
As you can see, we are check - is collection Foo + vector or Bar + vector.
And here we are have some problems:
We can do Vector(Foo("1"), Bar(2)), and this is will be match with Foo.
We are still need "val result = x.asInstanceOf[Vector[Bar]]" class casting for result extraction
Is there are some more beautiful way?
Like this:
fc match {
case Items(x: Vector[Foo]) => // result is type of Vector[Foo] already
}
What you're doing here is fundamentally just kind of unpleasant, so I'm not sure making it possible to do it in a beautiful way is a good thing, but for what it's worth, Shapeless's TypeCase is a little nicer:
case class Foo(name: String)
case class Bar(id: Int)
case class Items(items: Vector[AnyRef])
val foo = Vector(Foo("a"), Foo("b"), Foo("c"))
val bar = Vector(Bar(1), Bar(2), Bar(3))
val fc = Items(foo)
val bc = Items(bar)
val FooVector = shapeless.TypeCase[Vector[Foo]]
val BarVector = shapeless.TypeCase[Vector[Bar]]
And then:
scala> fc match {
| case Items(FooVector(items)) => items
| case _ => Vector.empty
| }
res0: Vector[Foo] = Vector(Foo(a), Foo(b), Foo(c))
scala> bc match {
| case Items(FooVector(items)) => items
| case _ => Vector.empty
| }
res1: Vector[Foo] = Vector()
Note that while ClassTag instances can also be used in this way, they don't do what you want:
scala> val FooVector = implicitly[scala.reflect.ClassTag[Vector[Foo]]]
FooVector: scala.reflect.ClassTag[Vector[Foo]] = scala.collection.immutable.Vector
scala> fc match {
| case Items(FooVector(items)) => items
| case _ => Vector.empty
| }
res2: Vector[Foo] = Vector(Foo(a), Foo(b), Foo(c))
scala> bc match {
| case Items(FooVector(items)) => items
| case _ => Vector.empty
| }
res3: Vector[Foo] = Vector(Bar(1), Bar(2), Bar(3))
…which will of course throw ClassCastExceptions if you try to use res3.
This really isn't a nice thing to do, though—inspecting types at runtime undermines parametricity, makes your code less robust, etc. Type erasure is a good thing, and the only problem with type erasure on the JVM is that it's not more complete.
If you want something that is simple using implicit conversions. then try this!
implicit def VectorConversionI(items: Items): Vector[AnyRef] = items match { case x#Items(v) => v }
Example:
val fcVertor: Vector[AnyRef] = fc // Vector(Foo(a), Foo(b), Foo(c))
val bcVertor: Vector[AnyRef] = bc // Vector(Bar(1), Bar(2), Bar(3))
I have a csv file with three columns and I want to get the third column into an Iterator. I want to filter out the headers by using the trytoDouble method in combination of the Pattern Matching.
def trytoDouble(s: String): Option[Double] = {
try {
Some(s.toDouble)
} catch {
case e: Exception => None
}
}
val asdf = Source.fromFile("my.csv").getLines().map(_.split(",").map(_.trim).map(utils.trytoDouble(_))).map{
_ match {
case Array(a, b, Some(c: Double)) => c
}
}
results in
An exception or error caused a run to abort: [Lscala.Option;#2b4a2ec7 (of class [Lscala.Option;)
scala.MatchError: [Lscala.Option;#2b4a2ec7 (of class [Lscala.Option;)
what did I do wrong?
Try using an extractor like StringDouble below. If the unapply returns Some then it matches, if it returns None then it doesn't match.
object StringDouble {
def unapply(str: String): Option[Double] = Try(str.toDouble).toOption
}
val doubles =
Source.fromFile("my.csv").getLines().map { line =>
line.split(",").map(_.trim)
}.map {
case Array(_, _, StringDouble(d)) => d
}
It will always give scala.MatchError
scala> val url = "/home/knoldus/data/moviedataset.csv"
url: String = /home/knoldus/data/moviedataset.csv
scala> val asdf1 = Source.fromFile(url).getLines().map(_.split(",").map(_.trim).map(trytoDouble(_))).toList
asdf1: List[Array[Option[Double]]] = List(Array(Some(1.0), None, Some(1993.0), Some(3.9), Some(4568.0)), Array(Some(2.0), None, Some(1932.0), Some(3.5), Some(4388.0)), Array(Some(3.0), None, Some(1921.0), Some(3.2), Some(9062.0)), Array(Some(4.0), None, Some(1991.0), Some(2.8), Some(6150.0)), Array(Some(5.0), None, Some(1963.0), Some(2.8), Some(5126.0)), ....
As it will return non-empty iterator so, to see the result i have converted it toList.
If you notice return type is List[Array[Option[Double]]], and you are trying to match with Array of tuple3 but it always returns Array[Option[Double]].
Therefore, it will always throw error.
Try This:
val asdf = Array("1,2,3","1,b,c").map(_.split(",").map(_.trim).map(trytoDouble(_))).map{
_(2) match {
case x:Option[Double] => x
}
}
having only one match case is always breaking code.
See the REPL example,
scala> def trytoDouble(s: String): Option[Double] = {
| try {
| Some(s.toDouble)
| } catch {
| case e: Exception => None
| }
| }
when there's no Double in your CSV,
scala> List("a,b,c").map(_.split(",").map(_.trim).map(trytoDouble(_)))
res1: List[Array[Option[Double]]] = List(Array(None, None, None))
When you match above result (Array(None, None, None)) with Array(a, b, Some(c: Double)) always going to fail,
scala> List("a,b,c").map(_.split(",").map(_.trim).map(trytoDouble(_))).map { data =>
| data match {
| case Array(a, b, Some(c: Double)) => c
| }
| }
scala.MatchError: [Lscala.Option;#22604c7e (of class [Lscala.Option;)
at .$anonfun$res4$4(<console>:15)
at .$anonfun$res4$4$adapted(<console>:13)
at scala.collection.immutable.List.map(List.scala:283)
... 33 elided
And when there Double,
scala> List("a,b,100").map(_.split(",").map(_.trim).map(trytoDouble(_))).map { data => data match { case Array(a, b, Some(c: Double)) => c }}
res5: List[Double] = List(100.0)
You basically need to check for Some(c) or None.
BUT, if I understand you you are to extract third field as Double which can be done this way,
scala> List("a,b,100", "100, 200, p").map(_.split(",")).map { case Array(a, b, c) => trytoDouble(c)}.filter(_.nonEmpty)
res14: List[Option[Double]] = List(Some(100.0))
The other answers are great, but I just noticed that an alternative of minimal code change is to simply write
val asdf = Source.fromFile(fileName).getLines().map(_.split(",").map(_.trim).map( utils.trytoDouble(_))).flatMap{
_ match {
case Array(a, b, c) => c
}
}
or slightly more efficient (since we are only interested in the last column anyway):
val asdf = Source.fromFile(fileName).getLines().map(_.split(",")).flatMap{
_ match {
case Array(a, b, c) => trytoDouble(c.trim)
}
}
Important to notice is here that flatMap will remove None objects.
I have a match statement like this:
val x = y match {
case array: Array[Float] => call z
case array: Array[Double] => call z
case array: Array[BigDecimal] => call z
case array: Array[_] => show error
}
How do I simplify this to use only two case statements, since first three case statements do same thing, instead of four.
Type erasure does not really gives you opportunity to understand how array was typed. What you should do instead is to extract head ( first element) of array and check it's type. For example following code works for me:
List(1,2,3) match {
case (a:Int) :: tail => println("yep")
}
This work, although not very nice:
def x(y: Array[_]) = y match {
case a if a.isInstanceOf[Array[Double]] ||
a.isInstanceOf[Array[Float]] ||
a.isInstanceOf[Array[BigDecimal]] => "call z"
case _ => "show error"
}
Would have thought that pattern matching with "|" as below would do the trick. However, this gives pattern type is incompatible with expected type on Array[Float] and Array[BigDecimal]. It might be that matching of generic on this single case where it could work has not been given so much attention:
def x(y: Array[_ <: Any]) = y match {
case a # (_:Array[Double] | _:Array[Float] | _:Array[BigDecimal]) => "call z"
case a: Array[_] => "show error"
}
May be it helps a bit:
import reflect.runtime.universe._
object Tester {
def test[T: TypeTag](y: Array[T]) = y match {
case c: Array[_] if typeOf[T] <:< typeOf[AnyVal] => "hi"
case c: Array[_] => "oh"
}
}
scala> Tester.test(Array(1,2,3))
res0: String = hi
scala> Tester.test(Array(1.0,2.0,3.0))
res1: String = hi
scala> Tester.test(Array("a", "b", "c"))
res2: String = oh
You can obtain the class of array elements as follows (it will be null for non-array types): c.getClass.getComponentType. So you can write:
if (Set(classOf[Float], classOf[Double], classOf[BigDecimal]).contains(c.getClass.getComponentType)) {
// call z
} else {
// show error
}
Not particularly Scala'ish, though; I think #thoredge's answer is the best for that.
You could also check whether the Array is empty first and then if not, just pattern match on Array.head...something like:
def x(y: Array[_]) = {
y.isEmpty match {
case true => "error"
case false => y.head match {
case a:Double | a:BigInt => do whatever
case _ => "error"
}
}
}
Is there a way to write this code in a more elegant way in Scala?
Basically I have the following algebraic data type:
trait Exp
case class Atom(i: Int) extends Exp
case class Add(l: Exp, r: Exp) extends Exp
and I want to check if a given variable matches a specific term inside
a conditional expression. For example I can write
val x = Add(Atom(1), Atom(2))
if (x match { case Add(Atom(1), Atom(2)) => true; case _ => false }) {
println("match")
}
However, it's ugly and verbose. Is there a more elegant alternative?
You can use a PartialFunction and its isDefinedAt method:
type MatchExp = PartialFunction[Exp, Unit] // alias
val isOnePlusTwo: MatchExp = {
case Add(Atom(1), Atom(2)) =>
}
if (isOnePlusTwo.isDefinedAt(x)) println("match")
A match expression produces either a Function or a PartialFunction, depending on the expected type. The type alias is only for convenience here.
Another possibility, if you do not need de-structuring through the pattern match, is to simply rely on the equality of the expression:
if (x == Add(Atom(1), Atom(2))) println("match")
This works because you use case classes which have correct equality implementations.
Use these for concision:
scala> import PartialFunction._
import PartialFunction._
scala> condOpt(x) { case Add(Atom(x), Atom(y)) => x + y }
res2: Option[Int] = Some(3)
scala> cond(x) { case Add(Atom(x), Atom(y)) => println("yep"); true }
yep
res3: Boolean = true
scala> List(x, Atom(42)) flatMap (condOpt(_) { case Add(Atom(x), Atom(y)) => x + y })
res4: List[Int] = List(3)
You could put the println into the match and get rid of the if entirely:
x match {
case Add(Atom(1), Atom(2)) => println("match")
case _ => {} //do nothing
}
If you have a pattern matching (case) in Scala, for example:
foo match {
case a: String => doSomething(a)
case f: Float => doSomethingElse(f)
case _ => ? // How does one determine what this was?
}
Is there a way to determine what type was actually caught in the catch-all?
case x => println(x.getClass)
Too easy :-)
Basically, you just need to bind the value in your catch-all statement to a name (x in this case), then you can use the standard getClass method to determine the type.
If you're trying to perform specific logic based on the type, you're probably doing it wrong. You could compose your match statements as partial functions if you need some 'default' cases that you don't want to define inline there. For instance:
scala> val defaultHandler: PartialFunction[Any, Unit] = {
| case x: String => println("String: " + x)
| }
defaultHandler: PartialFunction[Any,Unit] = <function1>
scala> val customHandler: PartialFunction[Any, Unit] = {
| case x: Int => println("Int: " + x)
| }
customHandler: PartialFunction[Any,Unit] = <function1>
scala> (customHandler orElse defaultHandler)("hey there")
String: hey there
foo match {
case a: String => doSomething(a)
case f: Float => doSomethingElse(f)
case x => println(x.getClass)
}