I am trying to do a secondary sort in Spark with Scala, following this blog post, and I have the following code:
import org.apache.spark.Partitioner

case class TokenZidKey(token: String, zid: Long)

object TokenZidKey {
  implicit def orderingByTokenZid[A <: TokenZidKey]: Ordering[A] = {
    Ordering.by(tzk => (tzk.token, tzk.zid))
  }
}

class TokenZidPartitioner(partitions: Int) extends Partitioner {
  override def numPartitions: Int = partitions
  override def getPartition(key: Any): Int = {
    // hashCode can be negative, so normalize the modulus into [0, numPartitions)
    val raw = key.asInstanceOf[TokenZidKey].token.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }
}
object SetSim {
  def main(args: Array[String]): Unit = {
    ... // setup elided; toDF below assumes the SparkSession implicits (spark.implicits._) are in scope

    val linesWithZid = dataLines.zipWithIndex()

    def mapZidByToken(r: (String, Long)): Iterable[(TokenZidKey, String)] = {
      for (t <- r._1.split(" ")) yield (TokenZidKey(t, r._2), r._1)
    }

    val tokensWithZid = linesWithZid.flatMap(mapZidByToken)
    val tokenZidPartitioner = new TokenZidPartitioner(tokensWithZid.context.defaultParallelism)
    val tokenZidsWithSecondarySort = tokensWithZid
      .repartitionAndSortWithinPartitions(tokenZidPartitioner)
      .map(r => (r._1.token, (r._1.zid, r._2)))
      .toDF("token", "zid_with_line")
  }
}
where tokensWithZid is an org.apache.spark.rdd.RDD[(TokenZidKey, String)], but I still get:
value repartitionAndSortWithinPartitions is not a member of org.apache.spark.rdd.RDD[(TokenZidKey, String)]
Any idea why, and how I can fix this? Is it that my custom TokenZidKey is not recognized as a sortable RDD key?
It turns out the code works. The issue was that I tested it in spark-shell, and when the definitions are entered separately, spark-shell does not treat the case class and its companion object as if they were in the same file, so the implicit Ordering in the companion is not picked up.
So if you want to test this in spark-shell, enter the case class and the companion object on a single line, like the following:
case class TokenZidKey(token: String, zid: Long); object TokenZidKey { implicit def orderingByTokenZid[A <: TokenZidKey]: Ordering[A] = Ordering.by(tzk => (tzk.token, tzk.zid)) }
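Alternatively, the REPL's :paste command (also available in spark-shell) compiles everything entered before Ctrl-D as a single unit, which likewise keeps the case class and its companion together:

scala> :paste
// Entering paste mode (ctrl-D to finish)

case class TokenZidKey(token: String, zid: Long)
object TokenZidKey {
  implicit def orderingByTokenZid[A <: TokenZidKey]: Ordering[A] =
    Ordering.by(tzk => (tzk.token, tzk.zid))
}

// Exiting paste mode, now interpreting.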
I have an array of Any (in real life, it's a Spark Row, but it's sufficient to isolate the problem)
object Row {
  val buffer: Array[Any] = Array(42, 21, true)
}
And I want to apply some operations on its elements.
So, I've defined a simple ADT describing a compute operation on a type A:
trait Op[A] {
  def cast(a: Any): A = a.asInstanceOf[A]
  def compute(a: A): A
}

case object Count extends Op[Int] {
  override def compute(a: Int): Int = a + 1
}

case object Exist extends Op[Boolean] {
  override def compute(a: Boolean): Boolean = a
}
Given that I have a list of all operations and I know which operation to apply to each element, let's use these operations:
object GenericsOp {
  import Row._

  val ops = Seq(Count, Exist)

  def compute() = {
    buffer(0) = ops(0).compute(ops(0).cast(buffer(0)))
    buffer(1) = ops(0).compute(ops(0).cast(buffer(1)))
    buffer(2) = ops(1).compute(ops(1).cast(buffer(2)))
  }
}
By design, for a given op, the types are aligned between cast and compute. But unfortunately the code above does not compile. The error is:
Type mismatch, expected: _$1, actual: AnyVal
Is there a way to make it work?
I've found a workaround by using an abstract type member instead of a type parameter.
object AbstractOp extends App {
  import Row._

  trait Op {
    type A
    def compute(a: A): A
  }

  case object Count extends Op {
    type A = Int
    override def compute(a: Int): Int = a + 1
  }

  case object Exist extends Op {
    type A = Boolean
    override def compute(a: Boolean): Boolean = a
  }

  val ops = Seq(Count, Exist)

  def compute() = {
    // bind each op to a stable identifier so that its path-dependent
    // type (op0.A, op1.A) can be named in the casts below
    val op0 = ops(0)
    val op1 = ops(1)
    buffer(0) = op0.compute(buffer(0).asInstanceOf[op0.A])
    buffer(1) = op0.compute(buffer(1).asInstanceOf[op0.A])
    buffer(2) = op1.compute(buffer(2).asInstanceOf[op1.A])
  }
}
Is there a better way?
It seems that your code can be simplified by making Op[A] extend Any => A:
trait Op[A] extends (Any => A) {
  def cast(a: Any): A = a.asInstanceOf[A]
  def compute(a: A): A
  def apply(a: Any): A = compute(cast(a))
}

case object Count extends Op[Int] {
  override def compute(a: Int): Int = a + 1
}

case object Exist extends Op[Boolean] {
  override def compute(a: Boolean): Boolean = a
}
object AbstractOp {
  val buffer: Array[Any] = Array(42, 21, true)
  val ops: Array[Op[_]] = Array(Count, Count, Exist)

  def main(args: Array[String]): Unit = {
    for (i <- 0 until buffer.size) {
      buffer(i) = ops(i)(buffer(i))
    }
    println(buffer.mkString("[", ",", "]"))
  }
}
Since it's asInstanceOf everywhere anyway, it does not make the code any less safe than what you had previously.
Update
If you cannot change the Op interface, then invoking cast and compute is a bit more cumbersome, but still possible:
trait Op[A] {
  def cast(a: Any): A = a.asInstanceOf[A]
  def compute(a: A): A
}

case object Count extends Op[Int] {
  override def compute(a: Int): Int = a + 1
}

case object Exist extends Op[Boolean] {
  override def compute(a: Boolean): Boolean = a
}

object AbstractOp {
  val buffer: Array[Any] = Array(42, 21, true)
  val ops: Array[Op[_]] = Array(Count, Count, Exist)

  def main(args: Array[String]): Unit = {
    for (i <- 0 until buffer.size) {
      buffer(i) = ops(i) match {
        case op: Op[t] => op.compute(op.cast(buffer(i)))
      }
    }
    println(buffer.mkString("[", ",", "]"))
  }
}
Note the ops(i) match { case op: Op[t] => ... } part with a type parameter in the pattern: this allows us to make sure that cast returns a t that is accepted by compute.
As a more general solution than Andrey Tyukin's, you can define the method outside Op, so it works even if Op can't be modified:
def apply[A](op: Op[A], x: Any) = op.compute(op.cast(x))
buffer(0) = apply(ops(0), buffer(0))
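For completeness, a minimal self-contained sketch of this approach, reusing the unmodified Op trait and the Count/Exist objects from the question (the object name ExternalApply is just for illustration):

object ExternalApply {
  import Row._

  // The type parameter A ties cast's result to compute's argument,
  // without touching the Op trait itself.
  def apply[A](op: Op[A], x: Any): A = op.compute(op.cast(x))

  val ops: Seq[Op[_]] = Seq(Count, Exist)

  def compute(): Unit = {
    buffer(0) = apply(ops(0), buffer(0))
    buffer(1) = apply(ops(0), buffer(1))
    buffer(2) = apply(ops(1), buffer(2))
  }
}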
Is there a way to enumerate the keys on which a PartialFunction is defined? More specifically, I have:
case class Key(key: String)

abstract class abstr {
  type MethodMap = PartialFunction[Key, String => Unit]

  def myMap: MethodMap // abstract

  def useIt(key: Key, value: String) = {
    val meth = myMap(key)
    meth(value)
  }

  def report = {
    for (key <- myMap.keySet) // how to do this?
      println("I support " + key)
  }
}
I use it like this:
class concrete extends abstr {
  var one: Boolean = true // needs an initializer; a bare `var one: Boolean` would be an abstract member

  def method1(v: String): Unit = ???
  def method2(v: String): Unit = ???

  def map1: MethodMap = {
    case Key("AAA") => method1
  }

  def map2: MethodMap = {
    case Key("AAA") => method2
  }

  override def myMap: MethodMap = if (one) map1 else map2
}
Of course, this is somewhat simplified, but the report function is necessary.
Some history: I first had it implemented using a Map, but then I changed it to a PartialFunction in order to support the last line above, override def myMap: MethodMap = if (one) map1 else map2.
Any suggestion for refactoring my code to support all of this is also appreciated.
No. A PartialFunction can be defined (and often is) on an infinite set. E.g. what do you expect report to print in these situations?
class concrete2 extends abstr {
  def myMap = { case Key(_) => ??? }
}
or
class concrete2 extends abstr {
  def myMap = { case Key(key) if key.length > 3 => ??? }
}
If you have a finite list of values you are interested in, you can do:
abstract class abstr {
  type MethodMap = PartialFunction[Key, String => Unit]

  def myMap: MethodMap // abstract

  val keys: Seq[Key] = ...

  def report = {
    for (key <- keys if myMap.isDefinedAt(key)) // isDefinedAt is the PartialFunction method
      println("I support " + key)
  }
}
Some history: I first had it implemented using a Map, but then I changed it to a PartialFunction in order to support the last line in the second part.
Why? This would work just as well with Map.
In your solution, is there any way to define the domain of the partial function to be the finite set keys?
def f: MethodMap = { case key if keys.contains(key) => ... }
Of course, the domain isn't part of the type.
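Since Map[K, V] itself extends PartialFunction[K, V], another option along the lines of "this would work just as well with Map" is to declare myMap as a Map: if (one) map1 else map2 still works, and report gets a real keySet. A sketch (the class name abstr2 is just for illustration):

abstract class abstr2 {
  // Map is a PartialFunction, so dispatch via myMap(key) is unchanged,
  // and keySet gives report its finite domain.
  type MethodMap = Map[Key, String => Unit]

  def myMap: MethodMap // abstract

  def useIt(key: Key, value: String) = myMap(key)(value)

  def report = {
    for (key <- myMap.keySet)
      println("I support " + key)
  }
}

A concrete subclass would then write map1 = Map(Key("AAA") -> method1) instead of a pattern-matching block.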
Can't I use a generic on the unapply method of an extractor along with an implicit "converter" to support a pattern match specific to the parameterised type?
I'd like to do this (note the use of [T] on the unapply line):
trait StringDecoder[A] {
  def fromString(string: String): Option[A]
}

object ExampleExtractor {
  def unapply[T](a: String)(implicit evidence: StringDecoder[T]): Option[T] = {
    evidence.fromString(a)
  }
}

object Example extends App {
  implicit val stringDecoder = new StringDecoder[String] {
    def fromString(string: String): Option[String] = Some(string)
  }

  implicit val intDecoder = new StringDecoder[Int] {
    def fromString(string: String): Option[Int] = Some(string.charAt(0).toInt)
  }

  val result = "hello" match {
    case ExampleExtractor[String](x) => x // <- type hint barfs
  }

  println(result)
}
But I get the following compilation error
Error: (25, 10) not found: type ExampleExtractor
case ExampleExtractor[String] (x) => x
^
It works fine if I have only one implicit val in scope and drop the type hint (see below), but that defeats the object.
object Example extends App {
  implicit val intDecoder = new StringDecoder[Int] {
    def fromString(string: String): Option[Int] = Some(string.charAt(0).toInt)
  }

  val result = "hello" match {
    case ExampleExtractor(x) => x
  }

  println(result)
}
A variant of your typed string decoder looks promising:
trait StringDecoder[A] {
  def fromString(s: String): Option[A]
}

class ExampleExtractor[T](ev: StringDecoder[T]) {
  def unapply(s: String) = ev.fromString(s)
}

object ExampleExtractor {
  def apply[A](implicit ev: StringDecoder[A]) = new ExampleExtractor(ev)
}
then
implicit val intDecoder = new StringDecoder[Int] {
  def fromString(s: String) = scala.util.Try {
    Integer.parseInt(s)
  }.toOption
}

val asInt = ExampleExtractor[Int]
val asInt(nb) = "1111" // nb must be lowercase to be bound as a pattern variable
seems to produce what you're asking for. One problem remains: it seems that trying to
val ExampleExtractor[Int](nB) = "1111"
results in a compiler crash (at least inside my 2.10.3 SBT Scala console).
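For what it's worth, the stable identifier also works in an ordinary match expression, which avoids the inline-type-hint form entirely:

val asInt = ExampleExtractor[Int]

"1111" match {
  case asInt(n) => println(n) // n: Int = 1111
  case _        => println("not an int")
}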
I'm having some problems with a macro I've written to help me log metrics, represented as case class instances, to InfluxDB. I presume I'm having a type erasure problem and that the type parameter T is getting lost, but I'm not entirely sure what's going on. (This is also my first exposure to Scala macros.)
import scala.language.experimental.macros
import play.api.libs.json.{JsNumber, JsString, JsObject, JsArray}

abstract class Metric[T] {
  def series: String
  def jsFields: JsArray = macro MetricsMacros.jsFields[T]
  def jsValues: JsArray = macro MetricsMacros.jsValues[T]
}

object Metrics {
  case class LoggedMetric(timestamp: Long, series: String, fields: JsArray, values: JsArray)
  case object Kick

  def log[T](metric: Metric[T]): Unit = {
    println(LoggedMetric(
      System.currentTimeMillis,
      metric.series,
      metric.jsFields,
      metric.jsValues
    ))
  }
}
And here's an example metric case class:
case class SessionCountMetric(a: Int, b: String) extends Metric[SessionCountMetric] {
  val series = "sessioncount"
}
Here's what happens when I try to log it:
scala> val m = SessionCountMetric(1, "a")
m: com.confabulous.deva.SessionCountMetric = SessionCountMetric(1,a)
scala> Metrics.log(m)
LoggedMetric(1411450638296,sessioncount,[],[])
Even though the macro itself seems to work fine:
scala> m.jsFields
res1: play.api.libs.json.JsArray = ["a","b"]
scala> m.jsValues
res2: play.api.libs.json.JsArray = [1,"a"]
Here's the actual macro itself:
import scala.language.experimental.macros
import scala.reflect.macros.blackbox.Context

object MetricsMacros {
  private def fieldNames[T: c.WeakTypeTag](c: Context) = {
    val tpe = c.weakTypeOf[T]
    tpe.decls.collect {
      case field if field.isMethod && field.asMethod.isCaseAccessor => field.asTerm.name
    }
  }

  def jsFields[T: c.WeakTypeTag](c: Context) = {
    import c.universe._
    val names = fieldNames[T](c)
    Apply(
      q"play.api.libs.json.Json.arr",
      names.map(name => Literal(Constant(name.toString))).toList
    )
  }

  def jsValues[T: c.WeakTypeTag](c: Context) = {
    import c.universe._
    val names = fieldNames[T](c)
    Apply(
      q"play.api.libs.json.Json.arr",
      names.map(name => q"${c.prefix.tree}.$name").toList
    )
  }
}
Update
I tried Eugene's second suggestion like this:
abstract class Metric[T] {
  def series: String
}

trait MetricSerializer[T] {
  def fields: Seq[String]
  def values(metric: T): Seq[Any]
}

object MetricSerializer {
  implicit def materializeSerializer[T]: MetricSerializer[T] = macro MetricsMacros.materializeSerializer[T]
}

object Metrics {
  def log[T: MetricSerializer](metric: T): Unit = {
    val serializer = implicitly[MetricSerializer[T]]
    println(serializer.fields)
    println(serializer.values(metric))
  }
}
with the macro now looking like this:
object MetricsMacros {
  def materializeSerializer[T: c.WeakTypeTag](c: Context) = {
    import c.universe._
    val tpe = c.weakTypeOf[T]
    val names = tpe.decls.collect {
      case field if field.isMethod && field.asMethod.isCaseAccessor => field.asTerm.name
    }
    val fields = Apply(
      q"Seq",
      names.map(name => Literal(Constant(name.toString))).toList
    )
    val values = Apply(
      q"Seq",
      names.map(name => q"metric.$name").toList
    )
    q"""
      new MetricSerializer[$tpe] {
        def fields = $fields
        def values(metric: Metric[$tpe]) = $values
      }
    """
  }
}
However, when I call Metrics.log -- specifically when it calls implicitly[MetricSerializer[T]] -- I get the following error:
error: value a is not a member of com.confabulous.deva.Metric[com.confabulous.deva.SessionCountMetric]
Why is it trying to use Metric[com.confabulous.deva.SessionCountMetric] instead of SessionCountMetric?
Conclusion
Fixed it.
def values(metric: Metric[$tpe]) = $values
should have been
def values(metric: $tpe) = $values
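i.e. the quasiquote at the end of materializeSerializer becomes:

q"""
  new MetricSerializer[$tpe] {
    def fields = $fields
    def values(metric: $tpe) = $values
  }
"""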
You're in a situation that's very close to one described in a recent question: scala macros: defer type inference.
As things stand right now, you'll have to turn log into a macro. An alternative would be to turn Metric.jsFields and Metric.jsValues into JsFieldable and JsValuable type classes materialized by implicit macros at the call sites of log (http://docs.scala-lang.org/overviews/macros/implicits.html).
I am trying to cast a String to Int using extractors. My code looks as follows.
object Apply {
  def unapply(s: String): Option[Int] = try {
    Some(s.toInt)
  } catch {
    case _: java.lang.Exception => None
  }
}

object App {
  def toT[T](s: AnyRef): Option[T] = s match {
    case v: T => Some(v)
    case _ => None
  }

  def foo(param: String): Int = {
    // reads a Map[String, String] m at runtime
    toT[Int](m("offset")).getOrElse(0)
  }
}
I get a runtime error: java.lang.String cannot be cast to java.lang.Integer. It seems the extractor is not being used at all. What should I do?
Edit: My use case is as follows. I am using Play and I want to parse the query string passed in the URL. I want to take a query string value (a String) and use it as an Int, a Double, etc. For example:
val offset = getQueryStringAs[Int]("offset").getOrElse(0)
I think the biggest problem here is that you are confusing casting and conversion. You have a Map[String, String], so you can't cast the values to Int; you have to convert them. Luckily, Scala adds a toInt method to strings through an implicit conversion to StringOps.
This should work for you:
m("offset").toInt
Note that toInt will throw a java.lang.NumberFormatException if the string can not be converted to an integer.
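If you'd rather have a default than an exception, one simple option is to wrap the conversion in scala.util.Try:

import scala.util.Try

// falls back to 0 when the value is missing or not a number
val offset = Try(m("offset").toInt).getOrElse(0)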
Edit:
What you want will, as far as I know, only work with type classes.
Here is an example:
trait StringConverter[A] {
  def convert(x: String): A
}

implicit object StringToInt extends StringConverter[Int] {
  def convert(x: String): Int = x.toInt
}

implicit object StringToDouble extends StringConverter[Double] {
  def convert(x: String): Double = x.toDouble
}

implicit def string2StringConversion(x: String) = new {
  def toT[A](implicit ev: StringConverter[A]) = ev.convert(x)
}
usage:
scala> "0.".toT[Double]
res6: Double = 0.0
There's a problem in your code, for which you should have received compiler warnings:
def toT[T](s: AnyRef): Option[T] = s match {
  case v: T => Some(v) // this doesn't work, because T is erased
  case _ => None
}
Now... where should Apply have been used? I see it declared, but I don't see it used anywhere.
EDIT
About the warning: look at the discussions around type erasure on Stack Overflow. For example, see this answer I wrote on how to get around it -- though the approach it describes is deprecated as of Scala 2.10.0.
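For illustration, a minimal sketch of a ClassTag-based variant of toT (my rewrite, not the linked answer's code): with a ClassTag in scope, the case v: T test is checked against the tag's runtime class instead of being erased. Note that this still only casts; it will not turn the string "42" into an Int.

import scala.reflect.ClassTag

def toT[T](s: Any)(implicit ct: ClassTag[T]): Option[T] = s match {
  case v: T => Some(v) // checked via ct's runtime class, no unchecked warning
  case _ => None
}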
To solve your problem I'd use type classes. For example:
abstract class Converter[T] {
  def convert(s: String): T
}

object Converter {
  def toConverter[T](converter: String => T): Converter[T] = new Converter[T] {
    override def convert(s: String): T = converter(s)
  }

  implicit val intConverter = toConverter(_.toInt)
  implicit val doubleConverter = toConverter(_.toDouble)
}
Then you can rewrite your method like this:
val map = Map("offset" -> "10", "price" -> "9.99")

def getQueryStringAs[T: Converter](key: String): Option[T] = {
  val converter = implicitly[Converter[T]]
  try {
    Some(converter convert map(key))
  } catch {
    case ex: Exception => None
  }
}
In use:
scala> getQueryStringAs[Int]("offset")
res1: Option[Int] = Some(10)
scala> getQueryStringAs[Double]("price")
res2: Option[Double] = Some(9.99)
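A missing key degrades the same way: map(key) throws and the catch turns any Exception into None, so one would expect:

scala> getQueryStringAs[Int]("missing")
res3: Option[Int] = None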