Mutating a mutable collection using map? - scala

If you have a mutable data structure like an Array, is it possible to use map operations or something similar to change its values?
Say I have val a = Array(5, 1, 3), what the best way of say, subtracting 1 from each value? The best I've come up with is
for(i <- 0 until a.size) { a(i) = a(i) - 1 }
I suppose way would be make the array a var rather than a val so I can say
a = a map (_-1)
edit: fairly easy to roll my own if there's nothing built-in, although I don't know how to generalise to other mutable collections
scala> implicit def ArrayToMutator[T](a: Array[T]) = new {
| def mutate(f: T => T) = a.indices.foreach {i => a(i) = f(a(i))}
| }
ArrayToMutator: [T](a: Array[T])java.lang.Object{def mutate(f: (T) => T): Unit}
scala> val a = Array(5, 1, 3)
a: Array[Int] = Array(5, 1, 3)
scala> a mutate (_-1)
scala> a
res16: Array[Int] = Array(4, 0, 2)

If you don't care about indices, you want the transform method.
scala> val a = Array(1,2,3)
a: Array[Int] = Array(1, 2, 3)
scala> a.transform(_+1)
res1: scala.collection.mutable.WrappedArray[Int] = WrappedArray(2, 3, 4)
scala> a
res2: Array[Int] = Array(2, 3, 4)
(It returns a copy of the wrapped array so that chaining transforms is more efficient, but the original array is modified as you can see.)

How about using foreach since you don't really want to return a modified collection, you want to perform a task for each element. For example:
a.indices.foreach{i => a(i) = a(i) - 1}
or
a.zipWithIndex.foreach{case (x,i) => a(i) = x - 1}

Related

Compare two list and get the index of same elements

val a = List(1,1,1,0,0,2)
val b = List(1,0,3,2)
I want to get the List of indices of elements of "List b" which are existing in "List a".
Here output to be List(0,1,3)
I tried this
for(x <- a.filter(b.contains(_))) yield a.indexOf(x))
Sorry. I missed this. The list size may vary. Edited the Lists
Is there a better way to do this?
If you want a result of indices, it's often useful to start with indices.
b.indices.filter(a contains b(_))
REPL tested.
scala> val a = List(1,1,1,0,0,2)
a: List[Int] = List(1, 1, 1, 0, 0, 2)
scala> val b = List(1,0,3,2)
b: List[Int] = List(1, 0, 3, 2)
scala> b.indices.filter(a contains b(_))
res0: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 1, 3)
val result = (a zip b).zipWithIndex.flatMap {
case ((aItem, bItem), index) => if(aItem == bItem) Option(index) else None
}
a zip b will return all elements from a that have a matching pair in b.
For example, if a is longer, like in your example, the result would be List((1,1),(1,0),(1,3),(0,2)) (the list will be b.length long).
Then you need the index also, that's zipWithIndex.
Since you only want the indexes, you return an Option[Int] and flatten it.
You can use indexed for for this:
for{ i <- 0 to b.length-1
if (a contains b(i))
} yield i
scala> for(x <- b.indices.filter(a contains b(_))) yield x;
res27: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 1, 3)
Here is another option:
scala> val a = List(1,1,1,0,0,2)
a: List[Int] = List(1, 1, 1, 0, 0, 2)
scala> val b = List(1,0,3,2)
b: List[Int] = List(1, 0, 3, 2)
scala> b.zipWithIndex.filter(x => a.contains(x._1)).map(x => x._2)
res7: List[Int] = List(0, 1, 3)
I also want to point out that your original idea of: Finding elements in b that are in a and then getting indices of those elements would not work, unless all elements in b contained in a are unique, indexOf returns index of the first element. Just heads up.

Why Spark doesn't allow map-side combining with array keys?

I'm using Spark 1.3.1 and I'm curious why Spark doesn't allow using array keys on map-side combining.
Piece of combineByKey function:
if (keyClass.isArray) {
if (mapSideCombine) {
throw new SparkException("Cannot use map-side combining with array keys.")
}
}
Basically for the same reason why default partitioner cannot partition array keys.
Scala Array is just a wrapper around Java array and its hashCode doesn't depend on a content:
scala> val x = Array(1, 2, 3)
x: Array[Int] = Array(1, 2, 3)
scala> val h = x.hashCode
h: Int = 630226932
scala> x(0) = -1
scala> x.hashCode() == h1
res3: Boolean = true
It means that two arrays with exact the same content are not equal
scala> x
res4: Array[Int] = Array(-1, 2, 3)
scala> val y = Array(-1, 2, 3)
y: Array[Int] = Array(-1, 2, 3)
scala> y == x
res5: Boolean = false
As result Arrays cannot be used as a meaningful keys. If you're not convinced just check what happens when you use Array as key for Scala Map:
scala> Map(Array(1) -> 1, Array(1) -> 2)
res7: scala.collection.immutable.Map[Array[Int],Int] = Map(Array(1) -> 1, Array(1) -> 2)
If you want to use a collection as key you should use an immutable data structure like a Vector or a List.
scala> Map(Array(1).toVector -> 1, Array(1).toVector -> 2)
res15: scala.collection.immutable.Map[Vector[Int],Int] = Map(Vector(1) -> 2)
See also:
SI-1607
How does HashPartitioner work?
A list as a key for PySpark's reduceByKey

How to select elements of collection based on another of different type?

I know I can do this:
scala> val a = List(1,2,3)
a: List[Int] = List(1, 2, 3)
scala> val b = List(2,4)
b: List[Int] = List(2, 4)
scala> a.filterNot(b.toSet)
res0: List[Int] = List(1, 3)
But I'd like to select elements of a collection based on their integer key, as in the following:
case class N (p: Int , q: Int)
val x = List(N(1,100), N(2,200), N(3,300))
val y = List(2,4)
val z = .... ?
Z // want Z to be ((N1,100), (N3,300)) after removing the items of type N with 'p'
// matching any item in list y.
I know one way to do it is is something like the following which makes the above broken code work:
val z = x.filterNot(e => y.contains(e.p))
but this seems very inefficient. Is there a better way?
Just do
val z = y.toSet
x.filterNot {z.contains(_.p)}
That's linear.
The problem with contains is that the search will be a linear search and you are looking at O(N^2) solution(which is still OK, if the dataset is not large)
Anyways, a simple solution can be to use Binary search to get O(NlnN) solution. You can easily convert val y to Array from list and then use java's binary search method.
scala> case class N(p: Int, q: Int)
defined class N
scala> val x = List(N(1, 100), N(2, 200), N(3, 300))
x: List[N] = List(N(1,100), N(2,200), N(3,300))
scala> val y = Array(2, 4) // Using Array directly.
y: Array[Int] = Array(2, 4)
scala> val z = x.filterNot(e => java.util.Arrays.binarySearch(y, e.p) >= 0)
z: List[N] = List(N(1,100), N(3,300))

How can I find the index of the maximum value in a List in Scala?

For a Scala List[Int] I can call the method max to find the maximum element value.
How can I find the index of the maximum element?
This is what I am doing now:
val max = list.max
val index = list.indexOf(max)
One way to do this is to zip the list with its indices, find the resulting pair with the largest first element, and return the second element of that pair:
scala> List(0, 43, 1, 34, 10).zipWithIndex.maxBy(_._1)._2
res0: Int = 1
This isn't the most efficient way to solve the problem, but it's idiomatic and clear.
Since Seq is a function in Scala, the following code works:
list.indices.maxBy(list)
even easier to read would be:
val g = List(0, 43, 1, 34, 10)
val g_index=g.indexOf(g.max)
def maxIndex[ T <% Ordered[T] ] (list : List[T]) : Option[Int] = list match {
case Nil => None
case head::tail => Some(
tail.foldLeft((0, head, 1)){
case ((indexOfMaximum, maximum, index), elem) =>
if(elem > maximum) (index, elem, index + 1)
else (indexOfMaximum, maximum, index + 1)
}._1
)
} //> maxIndex: [T](list: List[T])(implicit evidence$2: T => Ordered[T])Option[Int]
maxIndex(Nil) //> res0: Option[Int] = None
maxIndex(List(1,2,3,4,3)) //> res1: Option[Int] = Some(3)
maxIndex(List("a","x","c","d","e")) //> res2: Option[Int] = Some(1)
maxIndex(Nil).getOrElse(-1) //> res3: Int = -1
maxIndex(List(1,2,3,4,3)).getOrElse(-1) //> res4: Int = 3
maxIndex(List(1,2,2,1)).getOrElse(-1) //> res5: Int = 1
In case there are multiple maximums, it returns the first one's index.
Pros:You can use this with multiple types, it goes through the list only once, you can supply a default index instead of getting exception for empty lists.
Cons:Maybe you prefer exceptions :) Not a one-liner.
I think most of the solutions presented here go thru the list twice (or average 1.5 times) -- Once for max and the other for the max position. Perhaps a lot of focus is on what looks pretty?
In order to go thru a non empty list just once, the following can be tried:
list.foldLeft((0, Int.MinValue, -1)) {
case ((i, max, maxloc), v) =>
if (v > max) (i + 1, v, i)
else (i + 1, max, maxloc)}._3
Pimp my library! :)
class AwesomeList(list: List[Int]) {
def getMaxIndex: Int = {
val max = list.max
list.indexOf(max)
}
}
implicit def makeAwesomeList(xs: List[Int]) = new AwesomeList(xs)
//> makeAwesomeList: (xs: List[Int])scalaconsole.scratchie1.AwesomeList
//Now we can do this:
List(4,2,7,1,5,6) getMaxIndex //> res0: Int = 2
//And also this:
val myList = List(4,2,7,1,5,6) //> myList : List[Int] = List(4, 2, 7, 1, 5, 6)
myList getMaxIndex //> res1: Int = 2
//Regular list methods also work
myList filter (_%2==0) //> res2: List[Int] = List(4, 2, 6)
More details about this pattern here: http://www.artima.com/weblogs/viewpost.jsp?thread=179766

Match more than one element in collection

Define:
val x = List(1, 2, 3, 4)
I want to find if x contains either 1 or 3.
One way is
x.contains(1) || x.contains(3)
another is
x.exists(y => y == 1 || y == 3)
and another is:
x.exists(List(1,3).contains(_))
I would have preferred something similar to
x.containsAnyOf(1, 3)
Note that x.containsSlice does not work in this case.
Is there a better solution?
You can do
x exists Set(0, 1, 2)
Ofcourse, there's no containsAnyOf in Scala's standard library. You can make it look like there is by using the "pimp my library" pattern.
class ContainsAnyOf[T](seq: Seq[T]) {
def containsAnyOf(xs: T*) = seq.exists(xs.contains(_))
}
implicit def seqToContainsAnyOf[T](seq: Seq[T]) = new ContainsAnyOf(seq)
Now you can do:
scala> val a = List(1,2,3,4)
a: List[Int] = List(1, 2, 3, 4)
scala> a.containsAnyOf(1,3)
res0: Boolean = true