How to select elements of collection based on another of different type? - scala

I know I can do this:
scala> val a = List(1,2,3)
a: List[Int] = List(1, 2, 3)
scala> val b = List(2,4)
b: List[Int] = List(2, 4)
scala> a.filterNot(b.toSet)
res0: List[Int] = List(1, 3)
But I'd like to select elements of a collection based on their integer key, as in the following:
case class N (p: Int , q: Int)
val x = List(N(1,100), N(2,200), N(3,300))
val y = List(2,4)
val z = .... ?
Z // want Z to be ((N1,100), (N3,300)) after removing the items of type N with 'p'
// matching any item in list y.
I know one way to do it is is something like the following which makes the above broken code work:
val z = x.filterNot(e => y.contains(e.p))
but this seems very inefficient. Is there a better way?

Just do
val z = y.toSet
x.filterNot {z.contains(_.p)}
That's linear.

The problem with contains is that the search will be a linear search and you are looking at O(N^2) solution(which is still OK, if the dataset is not large)
Anyways, a simple solution can be to use Binary search to get O(NlnN) solution. You can easily convert val y to Array from list and then use java's binary search method.
scala> case class N(p: Int, q: Int)
defined class N
scala> val x = List(N(1, 100), N(2, 200), N(3, 300))
x: List[N] = List(N(1,100), N(2,200), N(3,300))
scala> val y = Array(2, 4) // Using Array directly.
y: Array[Int] = Array(2, 4)
scala> val z = x.filterNot(e => java.util.Arrays.binarySearch(y, e.p) >= 0)
z: List[N] = List(N(1,100), N(3,300))

Related

How to avoid calling toVector on each Scala for comprehension / yield?

I have several methods that operate on Vector sequences and the following idiom is common when combining data from multiple vectors into a single one with the use of a for comprehension / yield:
(for (i <- 0 until y.length) yield y(i) + 0.5*dy1(i)) toVector
Notice the closing toVector and the enclosing parentheses around the for comprehension. I want to get rid of it because it's ugly, but removing it produces the following error:
type mismatch;
found : scala.collection.immutable.IndexedSeq[Double]
required: Vector[Double]
Is there a better way of achieving what I want that avoids explicitly calling toVector many times to essentially achieve a non-operation (converting and indexed sequence...to an indexed sequence)?
One way to avoid collection casting, e.g. toVector, is to invoke, if possible, only those methods that return the same collection type.
y.zipWithIndex.map{case (yv,idx) => yv + 0.5*dy1(idx)}
for yield on Range which you are using in your example yields a Vector[T] by default.
example,
scala> val squares= for (x <- Range(1, 3)) yield x * x
squares: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 4)
check the type,
scala> squares.isInstanceOf[Vector[Int]]
res14: Boolean = true
Note that Vector[T] also extends IndexedSeq[T].
#SerialVersionUID(-1334388273712300479L)
final class Vector[+A] private[immutable] (private[collection] val startIndex: Int, private[collection] val endIndex: Int, focus: Int)
extends AbstractSeq[A]
with IndexedSeq[A]
with GenericTraversableTemplate[A, Vector]
with IndexedSeqLike[A, Vector[A]]
with VectorPointer[A #uncheckedVariance]
with Serializable
with CustomParallelizable[A, ParVector[A]]
That's why above result is also an instance of IndexedSeq[T],
scala> squares.isInstanceOf[IndexedSeq[Int]]
res15: Boolean = true
You can define the type of your result as IndexedSeq[T] and still achieve what you want with Vector without explicitly calling .toVector
scala> val squares: IndexedSeq[Int] = for (x <- Range(1, 3)) yield x * x
squares: IndexedSeq[Int] = Vector(1, 4)
scala> squares == Vector(1, 4)
res16: Boolean = true
But for yield on Seq[T] gives List[T] by default.
scala> val squares = for (x <- Seq(1, 3)) yield x * x
squares: Seq[Int] = List(1, 9)
Only in that case if you want vector you must .toVector the result.
scala> squares.isInstanceOf[Vector[Int]]
res21: Boolean = false
scala> val squares = (for (x <- Seq(1, 3)) yield x * x).toVector
squares: Vector[Int] = Vector(1, 9)

Why Spark doesn't allow map-side combining with array keys?

I'm using Spark 1.3.1 and I'm curious why Spark doesn't allow using array keys on map-side combining.
Piece of combineByKey function:
if (keyClass.isArray) {
if (mapSideCombine) {
throw new SparkException("Cannot use map-side combining with array keys.")
}
}
Basically for the same reason why default partitioner cannot partition array keys.
Scala Array is just a wrapper around Java array and its hashCode doesn't depend on a content:
scala> val x = Array(1, 2, 3)
x: Array[Int] = Array(1, 2, 3)
scala> val h = x.hashCode
h: Int = 630226932
scala> x(0) = -1
scala> x.hashCode() == h1
res3: Boolean = true
It means that two arrays with exact the same content are not equal
scala> x
res4: Array[Int] = Array(-1, 2, 3)
scala> val y = Array(-1, 2, 3)
y: Array[Int] = Array(-1, 2, 3)
scala> y == x
res5: Boolean = false
As result Arrays cannot be used as a meaningful keys. If you're not convinced just check what happens when you use Array as key for Scala Map:
scala> Map(Array(1) -> 1, Array(1) -> 2)
res7: scala.collection.immutable.Map[Array[Int],Int] = Map(Array(1) -> 1, Array(1) -> 2)
If you want to use a collection as key you should use an immutable data structure like a Vector or a List.
scala> Map(Array(1).toVector -> 1, Array(1).toVector -> 2)
res15: scala.collection.immutable.Map[Vector[Int],Int] = Map(Vector(1) -> 2)
See also:
SI-1607
How does HashPartitioner work?
A list as a key for PySpark's reduceByKey

Match more than one element in collection

Define:
val x = List(1, 2, 3, 4)
I want to find if x contains either 1 or 3.
One way is
x.contains(1) || x.contains(3)
another is
x.exists(y => y == 1 || y == 3)
and another is:
x.exists(List(1,3).contains(_))
I would have preferred something similar to
x.containsAnyOf(1, 3)
Note that x.containsSlice does not work in this case.
Is there a better solution?
You can do
x exists Set(0, 1, 2)
Ofcourse, there's no containsAnyOf in Scala's standard library. You can make it look like there is by using the "pimp my library" pattern.
class ContainsAnyOf[T](seq: Seq[T]) {
def containsAnyOf(xs: T*) = seq.exists(xs.contains(_))
}
implicit def seqToContainsAnyOf[T](seq: Seq[T]) = new ContainsAnyOf(seq)
Now you can do:
scala> val a = List(1,2,3,4)
a: List[Int] = List(1, 2, 3, 4)
scala> a.containsAnyOf(1,3)
res0: Boolean = true

Mutating a mutable collection using map?

If you have a mutable data structure like an Array, is it possible to use map operations or something similar to change its values?
Say I have val a = Array(5, 1, 3), what the best way of say, subtracting 1 from each value? The best I've come up with is
for(i <- 0 until a.size) { a(i) = a(i) - 1 }
I suppose way would be make the array a var rather than a val so I can say
a = a map (_-1)
edit: fairly easy to roll my own if there's nothing built-in, although I don't know how to generalise to other mutable collections
scala> implicit def ArrayToMutator[T](a: Array[T]) = new {
| def mutate(f: T => T) = a.indices.foreach {i => a(i) = f(a(i))}
| }
ArrayToMutator: [T](a: Array[T])java.lang.Object{def mutate(f: (T) => T): Unit}
scala> val a = Array(5, 1, 3)
a: Array[Int] = Array(5, 1, 3)
scala> a mutate (_-1)
scala> a
res16: Array[Int] = Array(4, 0, 2)
If you don't care about indices, you want the transform method.
scala> val a = Array(1,2,3)
a: Array[Int] = Array(1, 2, 3)
scala> a.transform(_+1)
res1: scala.collection.mutable.WrappedArray[Int] = WrappedArray(2, 3, 4)
scala> a
res2: Array[Int] = Array(2, 3, 4)
(It returns a copy of the wrapped array so that chaining transforms is more efficient, but the original array is modified as you can see.)
How about using foreach since you don't really want to return a modified collection, you want to perform a task for each element. For example:
a.indices.foreach{i => a(i) = a(i) - 1}
or
a.zipWithIndex.foreach{case (x,i) => a(i) = x - 1}

Simple question about tuple of scala

I'm new to scala, and what I'm learning is tuple.
I can define a tuple as following, and get the items:
val tuple = ("Mike", 40, "New York")
println("Name: " + tuple._1)
println("Age: " + tuple._2)
println("City: " + tuple._3)
My question is:
How to get the length of a tuple?
Is tuple mutable? Can I modify its items?
Is there any other useful operation we can do on a tuple?
Thanks in advance!
1] tuple.productArity
2] No.
3] Some interesting operations you can perform on tuples: (a short REPL session)
scala> val x = (3, "hello")
x: (Int, java.lang.String) = (3,hello)
scala> x.swap
res0: (java.lang.String, Int) = (hello,3)
scala> x.toString
res1: java.lang.String = (3,hello)
scala> val y = (3, "hello")
y: (Int, java.lang.String) = (3,hello)
scala> x == y
res2: Boolean = true
scala> x.productPrefix
res3: java.lang.String = Tuple2
scala> val xi = x.productIterator
xi: Iterator[Any] = non-empty iterator
scala> while(xi.hasNext) println(xi.next)
3
hello
See scaladocs of Tuple2, Tuple3 etc for more.
One thing that you can also do with a tuple is to extract the content using the match expression:
def tupleview( tup: Any ){
tup match {
case (a: String, b: String) =>
println("A pair of strings: "+a + " "+ b)
case (a: Int, b: Int, c: Int) =>
println("A triplet of ints: "+a + " "+ b + " " +c)
case _ => println("Unknown")
}
}
tupleview( ("Hello", "Freewind"))
tupleview( (1,2,3))
Gives:
A pair of strings: Hello Freewind
A triplet of ints: 1 2 3
Tuples are immutable, but, like all cases classes, they have a copy method that can be used to create a new Tuple with a few changed elements:
scala> (1, false, "two")
res0: (Int, Boolean, java.lang.String) = (1,false,two)
scala> res0.copy(_2 = true)
res1: (Int, Boolean, java.lang.String) = (1,true,two)
scala> res1.copy(_1 = 1f)
res2: (Float, Boolean, java.lang.String) = (1.0,true,two)
Concerning question 3:
A useful thing you can do with Tuples is to store parameter lists for functions:
def f(i:Int, s:String, c:Char) = s * i + c
List((3, "cha", '!'), (2, "bora", '.')).foreach(t => println((f _).tupled(t)))
//--> chachacha!
//--> borabora.
[Edit] As Randall remarks, you'd better use something like this in "real life":
def f(i:Int, s:String, c:Char) = s * i + c
val g = (f _).tupled
List((3, "cha", '!'), (2, "bora", '.')).foreach(t => println(g(t)))
In order to extract the values from tuples in the middle of a "collection transformation chain" you can write:
val words = List((3, "cha"),(2, "bora")).map{ case(i,s) => s * i }
Note the curly braces around the case, parentheses won't work.
Another nice trick ad question 3) (as 1 and 2 are already answered by others)
val tuple = ("Mike", 40, "New York")
tuple match {
case (name, age, city) =>{
println("Name: " + name)
println("Age: " + age)
println("City: " + city)
}
}
Edit: in fact it's rather a feature of pattern matching and case classes, a tuple is just a simple example of a case class...
You know the size of a tuple, it's part of it's type. For example if you define a function def f(tup: (Int, Int)), you know the length of tup is 2 because values of type (Int, Int) (aka Tuple2[Int, Int]) always have a length of 2.
No.
Not really. Tuples are useful for storing a fixed amount of items of possibly different types and passing them around, putting them into data structures etc. There's really not much you can do with them, other than creating tuples, and getting stuff out of tuples.
1 and 2 have already been answered.
A very useful thing that you can use tuples for is to return more than one value from a method or function. Simple example:
// Get the min and max of two integers
def minmax(a: Int, b: Int): (Int, Int) = if (a < b) (a, b) else (b, a)
// Call it and assign the result to two variables like this:
val (x, y) = minmax(10, 3) // x = 3, y = 10
Using shapeless, you easily get a lot of useful methods, that are usually available only on collections:
import shapeless.syntax.std.tuple._
val t = ("a", 2, true, 0.0)
val first = t(0)
val second = t(1)
// etc
val head = t.head
val tail = t.tail
val init = t.init
val last = t.last
val v = (2.0, 3L)
val concat = t ++ v
val append = t :+ 2L
val prepend = 1.0 +: t
val take2 = t take 2
val drop3 = t drop 3
val reverse = t.reverse
val zip = t zip (2.0, 2, "a", false)
val (unzip, other) = zip.unzip
val list = t.toList
val array = t.toArray
val set = t.to[Set]
Everything is typed as one would expect (that is first has type String, concat has type (String, Int, Boolean, Double, Double, Long), etc.)
The last method above (.to[Collection]) should be available in the next release (as of 2014/07/19).
You can also "update" a tuple
val a = t.updatedAt(1, 3) // gives ("a", 3, true, 0.0)
but that will return a new tuple instead of mutating the original one.