Scala groupBy: Want array indices satisfying predicate, not array values

val m = Array(10,20,30,30,50,60,70,80) groupBy ( s => s %30 == 0)
m(true).map { kv => println(kv) }
This prints the values 30, 30, 60.
I want the indices, i.e. 2, 3, 5, to be printed instead.
How do I go about this?

val m = Array(10,20,30,30,50,60,70,80).zipWithIndex.groupBy(s =>
s._1 % 30 == 0).map(e => e._1 -> (e._2.unzip._2))
Just FYI, if you only want the true values, you could go with @missingfaktor's approach, or equally you could use partition:
val m = Array(10, 20, 30, 30, 50, 60, 70, 80).zipWithIndex.partition(s =>
s._1 % 30 == 0)._1.unzip._2

Here's another way to do it:
Array(10,20,30,30,50,60,70,80).zipWithIndex.filter{ _._1 % 30 == 0 }.map{ _._2 }
I find the .map{ _._2 } easier to comprehend than .unzip._2, but maybe that's just me. What's also interesting is that the above returns:
Array[Int] = Array(2, 3, 5)
While the unzip variant returns this:
scala.collection.mutable.IndexedSeq[Int] = ArrayBuffer(2, 3, 5)

Array(10, 20, 30, 30, 50, 60, 70, 80)
.zipWithIndex
.collect { case (element, index) if element % 30 == 0 => index }
// Array[Int] = Array(2, 3, 5)

Here's a more direct way:
val m = Array(10,20,30,30,50,60,70,80).zipWithIndex.filter(_._1 % 30 == 0).unzip
This obtains the values and indices as a pair, (ArrayBuffer(30, 30, 60),ArrayBuffer(2, 3, 5)). You can print just the indices with:
m._2.foreach(println _)

val a=Array(10,20,30,30,50,60,70,80)
println( a.indices.filter( a(_)%30==0 ) )
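The index-range approach above can be wrapped in a small helper. This is only a minimal sketch: the name `indicesWhere` is hypothetical (the standard library's `indexWhere` returns just the first matching index), and it assumes an indexed collection so that `xs(i)` is cheap. It filters the index range directly, so no intermediate (value, index) tuples are built.

```scala
// Hypothetical helper: returns the indices of the elements that satisfy `p`.
def indicesWhere[A](xs: IndexedSeq[A])(p: A => Boolean): IndexedSeq[Int] =
  xs.indices.filter(i => p(xs(i)))

val values = Vector(10, 20, 30, 30, 50, 60, 70, 80)
val hits = indicesWhere(values)(_ % 30 == 0)
// hits contains 2, 3, 5
```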

Related

access index and value of a scala list inside a map function

val arr = List(8, 15, 22, 1, 10, 6, 18, 18, 1)
arr.zipWithIndex.map(_._2) works and gives me the index of each element in the list.
How do I access both the index and the element inside the map function?

val arr = List(8, 15, 22, 1, 10, 6, 18, 18, 1)
arr.zipWithIndex.map(zippedList => (zippedList._1, zippedList._2))
To access the element use ._1, and to access the index use ._2.
You can also use pattern matching:
arr.zipWithIndex.map {
case (x, y) => print(x, y)
}
and then perform whatever operation you want on x and y.

This is typically done using zipWithIndex and pattern matching:
arr.zipWithIndex.map{ case (value, index) => ??? }
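As a concrete stand-in for the `???` placeholder, here is a small sketch that uses both the value and the index inside the pattern match. The computation (value times index) and the name `weighted` are purely illustrative and not taken from the question.

```scala
val arr = List(8, 15, 22, 1, 10, 6, 18, 18, 1)

// Multiply each element by its zero-based index.
val weighted = arr.zipWithIndex.map { case (value, index) => value * index }
// weighted == List(0, 15, 44, 3, 40, 30, 108, 126, 8)
```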

You can use a partial function to destructure the tuple:
val arr = List(8, 15, 22, 1, 10, 6, 18, 18, 1)
arr.zipWithIndex.map { case (value, index) => println(value -> index) }

Getting maximum value in remaining sub-list for each element in a list

val input = List(16, 17, 4, 5, 3, 0)
We have to process the list so that, starting from the first element, each element is replaced by the maximum element in the remaining list.
I want output like 17, 5, 5, 3, 0.
I've tried the code below:
val v1 = input.scanRight(Int.MinValue)(math.max).dropRight(1)
println("variation 1, with scan")
println(v1)

If I understand your requirement correctly, you could use foldRight to traverse the list from right to left, and at each iteration store the maximum value of the traversed elements in the accumulator, which is a Tuple of (List[Int], Int):
val input = List(16, 17, 4, 5, 3, 0)
input.foldRight((List[Int](), Int.MinValue)){ case (i, (ls, j)) =>
val m = i max j
(m :: ls, m)
}._1
// res1: List[Int] = List(17, 17, 5, 5, 3, 0)
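If the requirement is instead read as "the maximum of the elements strictly after each one" (which is what the asker's 17, 5, 5, 3, 0 suggests), the scanRight attempt from the question only needs one extra shift. This is a sketch under that assumed reading; the names `runningMax` and `maxAfter` are illustrative.

```scala
val input = List(16, 17, 4, 5, 3, 0)

// scanRight yields the running maximum from each position to the end,
// followed by the seed: List(17, 17, 5, 5, 3, 0, Int.MinValue).
val runningMax = input.scanRight(Int.MinValue)(math.max)

// drop(1) shifts the list so each slot holds the max strictly after it;
// dropRight(1) removes the Int.MinValue seed.
val maxAfter = runningMax.drop(1).dropRight(1)
// maxAfter == List(17, 5, 5, 3, 0)
```

The result has one element fewer than the input, because nothing remains after the last element.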

Reversing the list will also work in the above case:
var max = Int.MinValue
val buffer = new scala.collection.mutable.ArrayBuffer[Int](input.length)
for (i <- input.reverse if i >= max) {
max = i
i +=: buffer
}
println(buffer.toList)

How do I return Spark RDD partition values without a local iterator?

I'm learning Spark and how its parallelism relates to the distribution of RDD partitions. I have a 4-CPU machine, hence I have 4 units of parallelism. I couldn't find a way to return the members of partition index "0" without forcing the RDD through toLocalIterator.
I'm used to Spark being quite terse. Is there a more concise way to filter an RDD by partition? The following two methods work, but they seem clumsy.
scala> val data = 1 to 20
data: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
scala> val distData = sc.parallelize(data)
distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[75] at parallelize at <console>:26
scala> distData.mapPartitionsWithIndex{
(index,it) => {
it.toList.map(x => if (index == 0) (x)).iterator
}
}.toLocalIterator.toList.filterNot(
_.isInstanceOf[Unit]
)
res107: List[AnyVal] = List(1, 2, 3, 4, 5)
scala> distData.mapPartitionsWithIndex{
(index,it) => {
it.toList.map(x => if (index == 0) (x)).iterator
}
}.toLocalIterator.toList.filter(
_ match{
case x: Unit => false
case x => true
}
)
res108: List[AnyVal] = List(1, 2, 3, 4, 5)

distData.mapPartitionsWithIndex{ (index, it) =>
if (index == 0) it else Array[Int]().iterator
}
You can return an empty iterator and it will work fine.

Drop every Nth element from a Scala Array

My requirement is to drop every Nth element from a Scala Array (please note: every Nth element). I wrote the method below, which does the job. Since I am new to Scala, I couldn't avoid the Java hangover. Is there a simpler or more efficient alternative?
def DropNthItem(a: Array[String], n: Int): Array[String] = {
val in = a.indices.filter(_ % n != 0)
val ab: ArrayBuffer[String] = ArrayBuffer()
for ( i <- in)
ab += a(i-1)
return ab.toArray
}

You made a good start. Consider this simplification.
def DropNthItem(a: Array[String], n: Int): Array[String] =
a.indices.filter(x => (x+1) % n != 0).map(a).toArray

How about something like this?
arr.grouped(n).flatMap(_.take(n-1)).toArray
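To spell out how the grouped one-liner works, here is the same expression wrapped in a function with a worked input. This is a sketch, assuming n >= 1 (grouped rejects a non-positive size); the name `dropNthGrouped` is illustrative.

```scala
// Same idea as the one-liner above, wrapped in a function.
def dropNthGrouped(arr: Array[String], n: Int): Array[String] =
  arr.grouped(n)            // chunks of n elements; the last chunk may be shorter
    .flatMap(_.take(n - 1)) // keep at most the first n - 1 elements of each chunk
    .toArray

val kept = dropNthGrouped(Array("a", "b", "c", "d", "e", "f", "g"), 3)
// chunks: [a, b, c], [d, e, f], [g]  ->  kept contains a, b, d, e, g
```

A short final chunk is left untouched by take(n - 1), so trailing elements that do not complete a group of n are kept.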

You can do this functionally in two steps: use zipWithIndex to pair each element with its index, then use collect to build a new array containing only the elements whose one-based position (i + 1) is not a multiple of n.
def dropNth[A: reflect.ClassTag](arr: Array[A], n: Int): Array[A] =
arr.zipWithIndex.collect { case (a, i) if (i + 1) % n != 0 => a }

This will do it (the index is zero-based, so add 1 before taking the remainder):
def DropNthItem(a: Array[String], n: Int): Array[String] =
a.zipWithIndex.filter { case (_, i) => (i + 1) % n != 0 }.map(_._1)

If you're looking for performance (since you're using an ArrayBuffer), you might as well track the index with a var, manually increment it, and check it with an if to filter out n-multiple-indexed values.
def dropNth[A: reflect.ClassTag](arr: Array[A], n: Int): Array[A] = {
val buf = new scala.collection.mutable.ArrayBuffer[A]
var i = 0
for(a <- arr) {
if((i + 1) % n != 0) buf += a
i += 1
}
buf.toArray
}
It's faster still if we traverse the original array as an iterator using a while loop.
def dropNth[A: reflect.ClassTag](arr: Array[A], n: Int): Array[A] = {
val buf = new scala.collection.mutable.ArrayBuffer[A]
val it = arr.iterator
var i = 0
while(it.hasNext) {
val a = it.next
if((i + 1) % n != 0) buf += a
i += 1
}
buf.toArray
}

I'd go with something like this;
def dropEvery[A](arr: Seq[A], n: Int) = arr.foldLeft((Seq.empty[A], 0)) {
case ((acc, idx), _) if idx == n - 1 => (acc, 0)
case ((acc, idx), el) => (acc :+ el, idx + 1)
}._1
// example: dropEvery(1 to 100, 3)
// res0: Seq[Int] = List(1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 20, 22, 23, 25, 26, 28, 29, 31, 32, 34, 35, 37, 38, 40, 41, 43, 44, 46, 47, 49, 50, 52, 53, 55, 56, 58, 59, 61, 62, 64, 65, 67, 68, 70, 71, 73, 74, 76, 77, 79, 80, 82, 83, 85, 86, 88, 89, 91, 92, 94, 95, 97, 98, 100)
This needs only a single pass over the input and removes every nth element from it; I believe that is easy to see.
The first case matches when idx == n - 1: it ignores the element at that position, passes acc along unchanged, and resets the count to 0 for the next element.
If the first case doesn't match, the second case appends the element to the end of acc and increments the count by 1.
Since you're willing to get rid of the Java hangover, you might want to use implicit classes to use this in a very nice way:
implicit class MoreFuncs[A](arr: Seq[A]) {
def dropEvery(n: Int) = arr.foldLeft((Seq.empty[A], 0)) {
case ((acc, idx), _) if idx == n - 1 => (acc, 0)
case ((acc, idx), el) => (acc :+ el, idx + 1)
}._1
}
// example: (1 to 100 dropEvery 1) == Nil // true
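One thing to keep in mind with the fold above: Seq.empty gives a List (as the res0 output shows), and appending to a List with :+ copies the list on every step. A variant sketch that accumulates into a Vector instead, where append is effectively constant time, could look like this. The name `dropEveryVector` is illustrative, and the logic is otherwise unchanged.

```scala
// Variant of the fold above that accumulates into a Vector.
def dropEveryVector[A](xs: Seq[A], n: Int): Vector[A] =
  xs.foldLeft((Vector.empty[A], 0)) {
    case ((acc, idx), _) if idx == n - 1 => (acc, 0)             // nth element: skip it, reset the counter
    case ((acc, idx), el)                => (acc :+ el, idx + 1) // otherwise keep it and count up
  }._1

val result = dropEveryVector(1 to 10, 3)
// result == Vector(1, 2, 4, 5, 7, 8, 10)
```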

Ensure list contains only specific values

How can I make sure a list only contains a specific set of items?
List[Int]
A function to make sure the list only contains the values 10, 20 or 30.
I'm sure this is built in but I can't find it!

Your question doesn't specify what you want to happen when the list doesn't contain the requisite items.
The following will return true if all the items in the List match your criteria, false otherwise:
val ints1: List[Int] = List(1, 2, 3, 4, 5, 6, 7)
val ints2: List[Int] = List(10, 10, 10, 10)
ints1.forall(i => List(10, 20, 30).contains(i)) // false
ints2.forall(i => List(10, 20, 30).contains(i)) // true
The following will return a List with only those items which matched the criteria:
val ints1: List[Int] = List(10, 20, 30, 40, 50, 60, 70)
val ints2: List[Int] = List(10, 10, 10)
ints1.filter(i => List(10, 20, 30).contains(i)) // List(10, 20, 30)
ints2.filter(i => List(10, 20, 30).contains(i)) // List(10, 10, 10)

forall
You can use forall with a Set of the valid (legal) elements that you expect to see in the list.
list.forall(Set(10, 20, 30).contains) //true means list only contains 10, 20, 30
Set is Function
You need not call the contains method, since Set[Int] extends Int => Boolean. You can use a Set like a function:
list forall Set(10, 20, 30)
Filter
You can use filter to remove the elements which are not in the allowed set. Again, you can pass the Set itself as the predicate, since a Set is a function. Comparing the result with the original list tells you whether anything was removed.
list.filter(Set(10, 20, 30)) == list //true means list only contains 10, 20 and 30
Collect if you like pattern matching
collect takes a partial function. If you like pattern matching, just use collect and compare sizes:
list.collect {
case 10 => 10
case 20 => 20
case 30 => 30
}.size == list.size //true means list only contains 10, 20 and 30
Scala REPL
scala> val list = List(10, 20, 30, 40, 50)
list: List[Int] = List(10, 20, 30, 40, 50)
scala> list forall Set(10, 20, 30)
res6: Boolean = false

If you simply want to determine whether all of the values in your list are "legal", use forall:
def isLegal(i: Int): Boolean = ??? // e.g. is it 10, 20, or 30
val allLegal = list forall isLegal
If you want to trim down your list so that only legal values remain, use filter:
val onlyLegalValues = list filter isLegal
Note that a Set[Int] counts as an Int => Boolean function, so you could use that in place of your isLegal method:
val isLegal = Set(10, 20, 30)
val allLegal = list forall isLegal
val onlyLegalValues = list filter isLegal
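Putting the forall and filter ideas together, a minimal sketch might pair the yes/no check with a way to see which values broke the rule. The names `allowed`, `onlyAllowed` and `illegalValues` are hypothetical, and the allowed values are the 10, 20, 30 from the question.

```scala
val allowed = Set(10, 20, 30)

// True when every element is one of the allowed values (also true for an empty list).
def onlyAllowed(xs: List[Int]): Boolean = xs.forall(allowed)

// The offending values, in their original order; handy for an error message.
def illegalValues(xs: List[Int]): List[Int] = xs.filterNot(allowed)
```

For example, onlyAllowed(List(10, 40)) is false, and illegalValues(List(10, 40, 50, 20)) gives List(40, 50).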