Anonymous comparator function in Scala with deconstructed tuples?

I have a list of tuples, that I want to sort ascending according to the second element in the tuple. I do it with this code:
freqs.sortWith( _._2 < _._2 )
But I don't like the ._2 naming; I would prefer a meaningful name for the second element, as in freqs.sortWith( _.weight < _.weight ).
Any ideas how to do this?

You don't need to repeat everything twice in sortWith, you can use sortBy(_._2) instead.
If you want to have a nice name, create a custom case class that has member variables of this name:
case class Foo(whatever: String, weight: Double)
val list: List[Foo] = ???
list.sortBy(_.weight)
It takes just a single line.
Alternatively, you can "pimp" the tuples locally:
class WeightOps(val whatever: String, val weight: Double)
implicit def tupleToWeightOps(t: (String, Double)): WeightOps =
new WeightOps(t._1, t._2)
then you can use .weight on tuples directly:
val list: List[(String, Double)] = ???
list.sortBy(_.weight)
Don't forget to keep the implicit scope as small as possible.
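A more compact variant of the same enrichment, as a sketch only: it assumes Scala 2.10 or later (for implicit class), uses Int weights instead of Double, and the names WeightSyntax, WeightOps and WeightDemo are made up for illustration. Importing the syntax inside a single object keeps the implicit scope small.

```scala
object WeightSyntax {
  // Value class: adds named accessors to the tuple without changing its type
  implicit class WeightOps(private val t: (String, Int)) extends AnyVal {
    def whatever: String = t._1
    def weight: Int = t._2
  }
}

object WeightDemo {
  import WeightSyntax._ // the enrichment is only visible inside this object

  val freqs: List[(String, Int)] = List(("foo", 7), ("bar", 3), ("foobar", 5))

  // Sort ascending by the second element, using the readable name
  val sorted: List[(String, Int)] = freqs.sortBy(_.weight)
}
```

The elements of sorted are still plain tuples, so the rest of the code does not need to know about WeightOps.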

scala> val list: List[(String, Int)] = List (("foo", 7), ("bar", 3), ("foobar", 5))
list: List[(String, Int)] = List((foo,7), (bar,3), (foobar,5))
scala> list.sortBy {case (ignore, price) => price }
res70: List[(String, Int)] = List((bar,3), (foobar,5), (foo,7))
A case extractor can be used to put meaningful names on variables.
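The same idea also works with sortWith, where both arguments of the comparator can be destructured in one pattern. A small sketch, assuming the same sample data as above; the object name NamedSort is only illustrative.

```scala
object NamedSort {
  val freqs: List[(String, Int)] = List(("foo", 7), ("bar", 3), ("foobar", 5))

  // sortBy: name only the field you sort on and ignore the other with _
  val byWeight: List[(String, Int)] =
    freqs.sortBy { case (_, weight) => weight }

  // sortWith: destructure both tuples passed to the comparator
  val byWeightWith: List[(String, Int)] =
    freqs.sortWith { case ((_, w1), (_, w2)) => w1 < w2 }
}
```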

Related

Merging two arrays in Scala

My requirement is that :
arr1 : Array[(String, String)] = Array((bangalore,Kanata), (kannur,Kerala))
arr2 : Array[(String, String)] = Array((001,anup), (002,sithu))
should give me
Array((001,anup,bangalore,Kanata), (002,sithu,kannur,Kerala))
I tried this :
val arr3 = arr2.map(field=>(field,arr1))
but it didn't work
@nicodp's answer addresses your question nicely: zip followed by map gives you the resulting array.
Recall that if one collection is longer than the other, its remaining elements are ignored.
My attempt addresses a related case, where the tuples themselves differ in size.
Consider:
val arr1 = Array(("bangalore","Kanata"), ("kannur","Kerala"))
val arr2 = Array(("001","anup", "ramakrishan"), ("002","sithu", "bhattacharya"))
zip and mapping on tuples will give the result as:
arr1.zip(arr2).map(field => (field._1._1, field._1._2, field._2._1, field._2._2))
Array[(String, String, String, String)] = Array((bangalore,Kanata,001,anup), (kannur,Kerala,002,sithu))
// This ignores the last field of arr2
While mapping, you can convert each tuple into an iterator and get a list from it. That way you do not have to keep track of whether you are dealing with a Tuple2 or a Tuple3:
arr1.zip(arr2).map{ case(k,v) => List(k.productIterator.toList, v.productIterator.toList).flatten }
// Array[List[Any]] = Array(List(bangalore, Kanata, 001, anup, ramakrishan), List(kannur, Kerala, 002, sithu, bhattacharya))
You can do a zip followed by a map:
scala> val arr1 = Array((1,2),(3,4))
arr1: Array[(Int, Int)] = Array((1,2), (3,4))
scala> val arr2 = Array((5,6),(7,8))
arr2: Array[(Int, Int)] = Array((5,6), (7,8))
scala> arr1.zip(arr2).map(field => (field._1._1, field._1._2, field._2._1, field._2._2))
res1: Array[(Int, Int, Int, Int)] = Array((1,2,5,6), (3,4,7,8))
The map acts as a flatten for tuples, that is, takes things of type ((A, B), (C, D)) and maps them to (A, B, C, D).
What zip does is... meh, let's see its type:
def zip[B](that: GenIterable[B]): List[(A, B)]
So, from there, we can argue that it takes an iterable collection (which can be another list) and returns a list which is the combination of the corresponding elements of both this: List[A] and that: List[B]. Recall that if one list is longer than the other, its remaining elements are ignored. You can find out more about the list functions in the documentation.
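To make the truncation concrete, here is a small sketch with inputs of different lengths. It also shows zipAll, which pads the shorter side instead of dropping elements; the empty-string padding values and the object name ZipLengths are assumptions chosen for the example.

```scala
object ZipLengths {
  val cities = Array(("bangalore", "Kanata"))
  val people = Array(("001", "anup"), ("002", "sithu"))

  // zip stops at the shorter input: ("002", "sithu") is silently dropped
  val truncated = cities.zip(people)

  // zipAll pads the shorter input with the given placeholder instead
  val padded = cities.zipAll(people, ("", ""), ("", ""))
}
```

Whether dropping or padding is right depends on the data; if the lengths are expected to match, checking them before zipping is another option.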
I agree that the cleanest solution is to use the zip method from the collections library:
val arr1 = Array(("bangalore","Kanata"), ("kannur","Kerala"))
val arr2 = Array(("001","anup"), ("002","sithu"))
arr1.zip(arr2).foldLeft(List.empty[Any]) {
case (acc, (a, b)) => acc ::: List(a.productIterator.toList ++ b.productIterator.toList)
}
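If the tuple sizes are known up front, a pattern match keeps the result fully typed and lets you choose the field order. The sketch below puts the fields of arr2 first, as in the output the question asks for; the names id, name, city and state are guesses at what the fields mean.

```scala
object MergeArrays {
  val arr1 = Array(("bangalore", "Kanata"), ("kannur", "Kerala"))
  val arr2 = Array(("001", "anup"), ("002", "sithu"))

  // Pair up corresponding elements, then flatten each nested pair into a 4-tuple
  val merged: Array[(String, String, String, String)] =
    arr2.zip(arr1).map { case ((id, name), (city, state)) => (id, name, city, state) }
}
```

Unlike the productIterator approach, the element type stays (String, String, String, String) rather than List[Any].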

Scala convert Iterable of Tuples to Array of Just Tuple._2

Say I have a Iterable[(Int, String)]. How do I get an array of just the "values"? That is, how do I convert from Iterable[(Int, String)] => Array[String]? The "keys" or "values" do not have to be unique, and that's why I put them in quotation marks.
iterable.map(_._2).toArray
_._2 takes the second element of each tuple; the underscore stands for the input parameter, whose name I don't care about.
Simply:
val iterable: Iterable[(Int, String)] = Iterable((1, "a"), (2, "b"))
val values = iterable.toArray.map(_._2)
Simply map over the iterable and extract the second element (tuple._2):
scala> val iterable: Iterable[(Int, String)] = Iterable((100, "Bring me the horizon"), (200, "Porcupine Tree"))
iterable: Iterable[(Int, String)] = List((100,Bring me the horizon), (200,Porcupine Tree))
scala> iterable.map(tuple => tuple._2).toArray
res3: Array[String] = Array(Bring me the horizon, Porcupine Tree)
In addition to the already suggested map, you might want to build the array while you map from tuple to string instead of converting afterwards, as this can save one pass over the data:
import scala.collection
val values: Array[String] = iterable.map(_._2)(collection.breakOut)
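Note that breakOut was removed in Scala 2.13. A version-independent way to avoid the intermediate collection, sketched here with made-up sample data, is to map over the iterator (or a view) and only then build the array:

```scala
object ValuesToArray {
  val iterable: Iterable[(Int, String)] = Iterable((1, "a"), (2, "b"))

  // The iterator maps lazily, so the array is built in a single pass
  val values: Array[String] = iterable.iterator.map(_._2).toArray
}
```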

How do I put a case class in an rdd and have it act like a tuple(pair)?

Say for example, I have a simple case class
case class Foo(k:String, v1:String, v2:String)
Can I get Spark to recognise this as a tuple for the purposes of something like the following, without converting to a tuple in, say, a map or keyBy step?
val rdd = sc.parallelize(List(Foo("k", "v1", "v2")))
// Swap values
rdd.mapValues(v => (v._2, v._1))
I don't even care if it loses the original case class after such an operation. I've tried the following with no luck. I'm fairly new to Scala; am I missing something?
case class Foo(k:String, v1:String, v2:String)
extends Tuple2[String, (String, String)](k, (v1, v2))
Edit: in the snippet above the case class extends Tuple2, but this does not have the desired effect: the RDD class and its functions still do not treat it like a tuple, so PairRDDFunctions such as mapValues, values, reduceByKey, etc. are not available.
Extending TupleN isn't a good idea for a number of reasons, with one of the best being the fact that it's deprecated, and on 2.11 it's not even possible to extend TupleN with a case class. Even if you make your Foo a non-case class, defining it on 2.11 with -deprecation will show you this: "warning: inheritance from class Tuple2 in package scala is deprecated: Tuples will be made final in a future version.".
If what you care about is convenience of use and you don't mind the (almost certainly negligible) overhead of the conversion to a tuple, you can enrich a RDD[Foo] with the syntax provided by PairRDDFunctions with a conversion like this:
import scala.language.implicitConversions
import org.apache.spark.rdd.{ PairRDDFunctions, RDD }
case class Foo(k: String, v1: String, v2: String)
// Converts on the fly so that the pair operations become available on RDD[Foo]
implicit def fooToPairRDDFunctions(rdd: RDD[Foo]): PairRDDFunctions[String, (String, String)] =
  new PairRDDFunctions(
    rdd.map {
      case Foo(k, v1, v2) => k -> (v1, v2)
    }
  )
And then:
scala> val rdd = sc.parallelize(List(Foo("a", "b", "c"), Foo("d", "e", "f")))
rdd: org.apache.spark.rdd.RDD[Foo] = ParallelCollectionRDD[6] at parallelize at <console>:34
scala> rdd.mapValues(_._1).first
res0: (String, String) = (a,b)
The reason your version with Foo extending Tuple2[String, (String, String)] doesn't work is that RDD.rddToPairRDDFunctions targets an RDD[Tuple2[K, V]] and RDD isn't covariant in its type parameter, so an RDD[Foo] isn't a RDD[Tuple2[K, V]]. A simpler example might make this clearer:
case class Box[A](a: A)
class Foo(k: String, v: String) extends Tuple2[String, String](k, v)
class PairBoxFunctions(box: Box[(String, String)]) {
def pairValue: String = box.a._2
}
implicit def toPairBoxFunctions(box: Box[(String, String)]): PairBoxFunctions =
new PairBoxFunctions(box)
And then:
scala> Box(("a", "b")).pairValue
res0: String = b
scala> Box(new Foo("a", "b")).pairValue
<console>:16: error: value pairValue is not a member of Box[Foo]
Box(new Foo("a", "b")).pairValue
^
But if you make Box covariant…
case class Box[+A](a: A)
class Foo(k: String, v: String) extends Tuple2[String, String](k, v)
class PairBoxFunctions(box: Box[(String, String)]) {
def pairValue: String = box.a._2
}
implicit def toPairBoxFunctions(box: Box[(String, String)]): PairBoxFunctions =
new PairBoxFunctions(box)
…everything's fine:
scala> Box(("a", "b")).pairValue
res0: String = b
scala> Box(new Foo("a", "b")).pairValue
res1: String = b
You can't make RDD covariant, though, so defining your own implicit conversion to add the syntax is your best bet. Personally I'd probably choose to do the conversion explicitly, but this is a relatively un-horrible use of implicit conversions.
Not sure if I get your question right, but let's say you have a case class:
import org.apache.spark.rdd.RDD
case class DataFormat(id: Int, name: String, value: Double)
val data: Seq[(Int, String, Double)] = Seq(
(1, "Joe", 0.1),
(2, "Mike", 0.3)
)
val rdd: RDD[DataFormat] = (
sc.parallelize(data).map(x=>DataFormat(x._1, x._2, x._3))
)
// Print all data
rdd.foreach(println)
// Print only names
rdd.map(x=>x.name).foreach(println)

Scala - Iterate over an Iterator of type Product[K,V]

I am a newbie to Scala and I am trying to understand collections. I have some sample Scala code in which a method is defined as follows:
override def write(records: Iterator[Product2[K, V]]): Unit = {...}
From what I understand, this function is passed an argument records, which is an Iterator of Product2[K, V]. What I don't understand is whether Product2 is a user-defined class or a built-in data structure. Moreover, how do I explore the key-value contents of a Product2, and how do I iterate over them?
Chances are Product2 is the built-in class. You can easily check this in a modern IDE (just hover over it with Ctrl pressed) or by inspecting the file header: if there are no related imports, like some.custom.package.Product2, it's the built-in one.
What is Product2 and where is it defined? You can easily find out such things by consulting Scala's ScalaDoc.
For the built-in class, you can treat it like a tuple of 2 elements (in fact Tuple2 extends Product2, as you can see below), with ._1 and ._2 accessor methods.
scala> val x: Product2[String, Int] = ("foo", 1)
// x: Product2[String,Int] = (foo,1)
scala> x._1
// res0: String = foo
scala> x._2
// res1: Int = 1
See How should I think about Scala's Product classes? for more.
Iteration is also hassle free, for example here is the map operation:
scala> val xs: Iterator[Product2[String, Int]] = List("foo" -> 1, "bar" -> 2, "baz" -> 3).iterator
xs: Iterator[Product2[String,Int]] = non-empty iterator
scala> val keys = xs.map(kv => kv._1)
keys: Iterator[String] = non-empty iterator
scala> val keys = xs.map(kv => kv._1).toList
keys: List[String] = List(foo, bar, baz)
scala> xs
res2: Iterator[Product2[String,Int]] = empty iterator
Keep in mind, though, that once an iterator has been consumed it is empty and cannot be reused.
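If the pairs are needed more than once, one option is to materialize the iterator a single time and work with the resulting collection. A minimal sketch, assuming the data fits in memory; the object name IteratorReuse is made up.

```scala
object IteratorReuse {
  val pairs: Iterator[Product2[String, Int]] =
    List("foo" -> 1, "bar" -> 2, "baz" -> 3).iterator

  // Consume the iterator exactly once and keep the elements in a List
  val materialized: List[Product2[String, Int]] = pairs.toList

  // The List can be traversed as often as needed
  val keys: List[String] = materialized.map(_._1)
  val values: List[Int] = materialized.map(_._2)

  // The original iterator is now exhausted
  val exhausted: Boolean = !pairs.hasNext
}
```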
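Product2 is just two values, of types K and V.
Use it like this (note the .iterator, since the method expects an Iterator rather than a List):
write(List((1, "one"), (2, "two")).iterator)
If you control the signature (that is, you are not overriding an existing method), the prototype can also be written as: def write(records: Iterator[(K, V)]): Unit = {...}
You can then access the values k of type K and v of type V with a pattern match. Use foreach rather than map here: map on an Iterator is lazy, so in a method returning Unit the function w would never be called.
def write(records: Iterator[(K, V)]): Unit = {
  records.foreach { case (k, v) => w(k, v) }
}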
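To see the Product2 signature from the question in action without any framework, here is a sketch with a hypothetical BufferingWriter class (not part of any library) that simply records what it is given. It reads the two components through ._1 and ._2, which every Product2 provides.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical writer used only for illustration: it stores each pair it receives
class BufferingWriter[K, V] {
  val written: ArrayBuffer[(K, V)] = ArrayBuffer.empty

  def write(records: Iterator[Product2[K, V]]): Unit =
    records.foreach { kv => written += ((kv._1, kv._2)) }
}

object WriterDemo {
  def run(): List[(Int, String)] = {
    val writer = new BufferingWriter[Int, String]
    // Iterator is covariant and Tuple2 extends Product2, so tuples are accepted
    writer.write(List((1, "one"), (2, "two")).iterator)
    writer.written.toList
  }
}
```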

Scala map and/or groupby functions

I am new to Scala and I am trying to figure out some scala syntax.
So I have a list of strings.
wordList: List[String] = List("this", "is", "a", "test")
I have a function that returns a list of pairs that contains consonants and vowels counts per word:
def countFunction(words: List[String]): List[(String, Int)]
So, for example:
countFunction(List("test")) => List(('Consonants', 3), ('Vowels', 1))
I now want to take a list of words and group them by count signatures:
def mapFunction(words: List[String]): Map[List[(String, Int)], List[String]]
//using wordList from above
mapFunction(wordList) => List(('Consonants', 3), ('Vowels', 1)) -> Seq("this", "test")
List(('Consonants', 1), ('Vowels', 1)) -> Seq("is")
List(('Consonants', 0), ('Vowels', 1)) -> Seq("a")
I'm thinking I need to use groupBy to do this:
def mapFunction(words: List[String]): Map[List[(String, Int)], List[String]] = {
words.groupBy(F: (A) => K)
}
I've read the Scala API docs for Map.groupBy and see that F represents the discriminator function and K is the type of the keys you want returned. So I tried this:
words.groupBy(countFunction => List[(String, Int)]
However, scala doesn't like this syntax. I tried looking up some examples for groupBy and nothing seems to help me with my use case. Any ideas?
Based on your description, your count function should take a word instead of a list of words. I would have defined it like this:
def countFunction(word: String): List[(String, Int)]
If you do that you should be able to call words.groupBy(countFunction), which is the same as:
words.groupBy(word => countFunction(word))
If you cannot change the signature of countFunction, then you should be able to call group by like this:
words.groupBy(word => countFunction(List(word)))
You shouldn't put the return type of the function in the call. The compiler can figure this out itself. You should just call it like this:
words.groupBy(countFunction)
If that doesn't work, please post your countFunction implementation.
Update:
I tested it in the REPL and this works (note that my countFunction has a slightly different signature from yours):
scala> def isVowel(c: Char) = "aeiou".contains(c)
isVowel: (c: Char)Boolean
scala> def isConsonant(c: Char) = ! isVowel(c)
isConsonant: (c: Char)Boolean
scala> def countFunction(s: String) = (('Consonants, s count isConsonant), ('Vowels, s count isVowel))
countFunction: (s: String)((Symbol, Int), (Symbol, Int))
scala> List("this", "is", "a", "test").groupBy(countFunction)
res1: scala.collection.immutable.Map[((Symbol, Int), (Symbol, Int)),List[java.lang.String]] = Map((('Consonants,0),('Vowels,1)) -> List(a), (('Consonants,1),('Vowels,1)) -> List(is), (('Consonants,3),('Vowels,1)) -> List(this, test))
You can include the type of the function passed to groupBy, but like I said you don't need it. If you want to pass it in you do it like this:
words.groupBy(countFunction: String => ((Symbol, Int), (Symbol, Int)))
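For completeness, a sketch that keeps the return types from the question (List[(String, Int)] keys and a Map result) but takes a single word, as suggested above. The body of countFunction is an assumption: it treats every character that is not one of a, e, i, o, u as a consonant, which is only reasonable for plain alphabetic words.

```scala
object GroupByDemo {
  private def isVowel(c: Char): Boolean = "aeiou".contains(c.toLower)

  // Count signature of a single word; non-vowel characters count as consonants
  def countFunction(word: String): List[(String, Int)] = {
    val vowels = word.count(isVowel)
    List("Consonants" -> (word.length - vowels), "Vowels" -> vowels)
  }

  // Group words that share the same count signature
  def mapFunction(words: List[String]): Map[List[(String, Int)], List[String]] =
    words.groupBy(countFunction)
}
```

Calling GroupByDemo.mapFunction(List("this", "is", "a", "test")) groups "this" and "test" under the same key, because both have three consonants and one vowel.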