How to search efficiently in a nested collection in a functional way - scala

I'd like to find the indices (coordinates) of the first element whose value is 4, in a nested Vector of Int, in a functional way.
val a = Vector(Vector(1,2,3), Vector(4,5), Vector(3,8,4))
a.map(_.zipWithIndex).zipWithIndex.collect {
  case (col, i) =>
    col.collectFirst {
      case (num, index) if num == 4 =>
        (i, index)
    }
}.collectFirst {
  case Some(x) => x
}
It returns:
Some((1, 0))
the coordinate of the first 4 occurrence.
This solution is quite simple, but it has a performance penalty, because the nested col.collectFirst is performed for every element of the top-level Vector, when we are only interested in the first match.
One possible solution is to write a guard in the pattern match. But I don't know how to write a guard based on a slow condition and then return something that was already calculated in the guard.
Can it be done better?

Recursive maybe?
If you insist on using Vectors, something like this will work (for a non-indexed seq, you'd need a different approach):
import scala.annotation.tailrec

@tailrec
def findit(
  what: Int,
  lists: IndexedSeq[IndexedSeq[Int]],
  i: Int = 0,
  j: Int = 0
): Option[(Int, Int)] =
  if (i >= lists.length) None
  else if (j >= lists(i).length) findit(what, lists, i + 1, 0)
  else if (lists(i)(j) == what) Some((i, j))
  else findit(what, lists, i, j + 1)

A simple thing you can do without changing the algorithm is to use Scala streams, so that you can exit as soon as you find the match. Streams are lazily evaluated, as opposed to strict sequences.
Just make a change similar to this
a.map(_.zipWithIndex.toStream).zipWithIndex.toStream.collect{ ...
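The same early-exit idea can also be sketched with an Iterator, which is lazy as well. This is only a minimal sketch: the helper name firstIndexOf is made up for illustration, and it assumes that scanning one inner Vector at a time with indexOf is acceptable.

```scala
// Hypothetical helper: rows are visited lazily, so rows after the first match are never scanned.
def firstIndexOf(target: Int, rows: Vector[Vector[Int]]): Option[(Int, Int)] =
  rows.iterator.zipWithIndex
    .map { case (row, i) => (i, row.indexOf(target)) } // indexOf is -1 when the row has no match
    .collectFirst { case (i, j) if j >= 0 => (i, j) }  // stops pulling rows at the first hit

val a = Vector(Vector(1, 2, 3), Vector(4, 5), Vector(3, 8, 4))
val hit = firstIndexOf(4, a) // Some((1, 0))
```

Because the iterator is consumed only as far as collectFirst needs, the third row is never inspected here.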
In terms of algorithmic changes, if you can somehow have your data sorted (even before you start to search) then you can use Binary search instead of looking at each element.
import scala.collection.Searching._
val dummy = 123
implicit val anOrdering = new Ordering[(Int, Int, Int)]{
override def compare(x: (Int, Int, Int), y: (Int, Int, Int)): Int = Integer.compare(x._1, y._1)
}
val seqOfIntsWithPosition = a.zipWithIndex.flatMap(vectorWithIndex => vectorWithIndex._1.zipWithIndex.map(intWithIndex => (intWithIndex._1, vectorWithIndex._2, intWithIndex._2)))
val sorted: IndexedSeq[(Int, Int, Int)] = seqOfIntsWithPosition.sortBy(_._1)
val element = sorted.search((4, dummy, dummy))
This code is not very pretty or readable, I just quickly wanted to show an example of how it could be done.
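A somewhat tidier sketch of the same binary-search idea might look like the following. The names sortedTriples, keys and lookup are illustrative assumptions. Note that a binary search returns some occurrence of the value, not necessarily the first one in row/column order, so this only fits if any match will do.

```scala
import scala.collection.Searching._

val a = Vector(Vector(1, 2, 3), Vector(4, 5), Vector(3, 8, 4))

// One-off preprocessing: (value, row, column) triples sorted by value.
val sortedTriples: Vector[(Int, Int, Int)] =
  a.zipWithIndex.flatMap { case (row, i) =>
    row.zipWithIndex.map { case (value, j) => (value, i, j) }
  }.sortBy(_._1)

// Keys only, so the search can rely on the built-in Ordering[Int].
val keys: Vector[Int] = sortedTriples.map(_._1)

// Hypothetical helper: coordinates of *some* occurrence of target, if present.
def lookup(target: Int): Option[(Int, Int)] =
  keys.search(target) match {
    case Found(idx) =>
      val (_, i, j) = sortedTriples(idx)
      Some((i, j))
    case _ => None // InsertionPoint: the value is absent
  }
```

The sort costs O(n log n) once, so this only pays off when many lookups are made against the same data.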

Related

Optimizing scala-spark code by removing "for loop"

I want to optimize this code (Scala Spark) by removing the for loop. How do I do it?
var varianceExplained = Array[(Int,Double)]();
var varExplained = Array[(Double)]();//{This one contains double values assigned before}
var sums = 0.00
for(x<-0 to varExplained.length-1)
{sums =sums+varExplained(x)
varianceExplained +:= (x,sums)
}
Not really sure how you would parallelize a computation in which each value depends on the preceding ones... The only thing I can add is how to remove the loop and make it a recursive function, as per functional programming best practices.
def go(acc: Array[(Int, Double)], iter: Int, sums: Double): Array[(Int, Double)] = {
  if (iter == varExplained.length) acc
  else {
    go((iter, sums + varExplained(iter)) +: acc, iter + 1, sums + varExplained(iter))
  }
}
go(Array[(Int, Double)](), 0, 0)
One possible solution is to replace the for loop with a map over the indices.
You can try the following:
val varianceExplained = varExplained.indices.map(i => (i, varExplained.take(i + 1).sum)).toArray
In this case you do not need the varianceExplained array. You get the required Array[(Int, Double)] as a result of the map operation. I have used similar strategies at work to make my code efficient.
Also, do try using vals instead of vars in your code.
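Since each entry is a running total, scanLeft is another option worth considering. The following is a minimal sketch with made-up sample values for varExplained. It produces the pairs in ascending index order, whereas the loop in the question prepends, so add .reverse if that order matters.

```scala
// Sample input (assumed values, chosen to be exactly representable as doubles).
val varExplained = Array(0.5, 0.25, 0.25)

// scanLeft emits every intermediate sum in one pass; tail drops the initial 0.0 seed.
val varianceExplained: Array[(Int, Double)] =
  varExplained
    .scanLeft(0.0)(_ + _)
    .tail
    .zipWithIndex
    .map { case (sum, i) => (i, sum) }
// Array((0, 0.5), (1, 0.75), (2, 1.0))
```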

Scala - increasing prefix of a sequence

I was wondering what the most elegant way of getting the increasing prefix of a given sequence is. My idea is as follows, but it is neither purely functional nor elegant:
val sequence = Seq(1,2,3,1,2,3,4,5,6)
var currentElement = sequence.head - 1
val increasingPrefix = sequence.takeWhile(e =>
if (e > currentElement) {
currentElement = e
true
} else
false)
The result of the above is:
List(1,2,3)
You can take your solution, @Samlik, and effectively zip in the currentElement variable, but then map it out when you're done with it.
sequence.take(1) ++ sequence.zip(sequence.drop(1)).
takeWhile({case (a, b) => a < b}).map({case (a, b) => b})
Also works with infinite sequences:
val sequence = Seq(1, 2, 3).toStream ++ Stream.from(1)
sequence is now an infinite Stream, but we can peek at the first 10 items:
scala> sequence.take(10).toList
res: List[Int] = List(1, 2, 3, 1, 2, 3, 4, 5, 6, 7)
Now, using the above snippet:
val prefix = sequence.take(1) ++ sequence.zip(sequence.drop(1)).
takeWhile({case (a, b) => a < b}).map({case (a, b) => b})
Again, prefix is a Stream, but not infinite.
scala> prefix.toList
res: List[Int] = List(1, 2, 3)
N.b.: This does not handle the cases when sequence is empty, or when the prefix is also infinite.
If by elegant you mean concise and self-explanatory, it's probably something like the following:
sequence.inits.dropWhile(xs => xs != xs.sorted).next
inits gives us an iterator that returns the prefixes longest-first. We drop all the ones that aren't sorted and take the next one.
If you don't want to do all that sorting, you can write something like this:
sequence.scanLeft(Some(Int.MinValue): Option[Int]) {
case (Some(last), i) if i > last => Some(i)
case _ => None
}.tail.flatten
If the performance of this operation is really important, though (it probably isn't), you'll want to use something more imperative, since this solution still traverses the entire collection (twice).
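One way to get a single pass that stops at the first non-increasing element, without mutable state, is a tail-recursive loop. This is a minimal sketch assuming a List[Int] input and a strictly increasing prefix. The names increasingPrefix and loop are illustrative and do not come from the answers above.

```scala
import scala.annotation.tailrec

def increasingPrefix(xs: List[Int]): List[Int] = {
  // acc holds the prefix so far in reverse, so its head is the last accepted element.
  @tailrec
  def loop(rest: List[Int], acc: List[Int]): List[Int] = (rest, acc) match {
    case (x :: tail, prev :: _) if x > prev => loop(tail, x :: acc)
    case (x :: tail, Nil)                   => loop(tail, x :: acc) // first element always accepted
    case _                                  => acc.reverse          // input exhausted or order broken
  }
  loop(xs, Nil)
}

val prefix = increasingPrefix(List(1, 2, 3, 1, 2, 3, 4, 5, 6)) // List(1, 2, 3)
```

Elements after the break are never inspected, and an empty input simply yields an empty list.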
And, another way to skin the cat:
val sequence = Seq(1,2,3,1,2,3,4,5,6)
sequence.head :: sequence
.sliding(2)
.takeWhile{case List(a,b) => a <= b}
.map(_(1)).toList
// List[Int] = List(1, 2, 3)
I will interpret elegance as the solution that most closely resembles the way we humans think about the problem although an extremely efficient algorithm could also be a form of elegance.
val sequence = List(1,2,3,2,3,45,5)
val increasingPrefix = takeWhile(sequence, _ < _)
I believe this code snippet captures the way most of us probably think about the solution to this problem.
This of course requires defining takeWhile:
/**
 * Takes elements from a sequence by applying a predicate over two elements at a time.
 * @param xs The list to take elements from
 * @param f The predicate that operates over two elements at a time
 * @return This function is guaranteed to return a sequence with at least one element
 *         (given non-empty input), as the first element is assumed to satisfy the
 *         predicate because there is no previous element to provide the predicate with.
 */
def takeWhile(xs: Seq[Int], f: (Int, Int) => Boolean): Seq[Int] = {
  // function that operates over tuples and returns true when the predicate does not hold
  val not = f.tupled.andThen(!_)
  // Maybe one day our languages will be better than this... (dependent types anyone?)
  val twos = xs.sliding(2).collect { case Seq(one, two) => (one, two) }
  val indexOfBreak = twos.indexWhere(not)
  // twos has one less element than xs, so we need to compensate for that.
  // An intuition is the fact that this function should always return the first element of
  // a non-empty list. If the predicate never fails (indexOfBreak == -1), keep everything.
  if (indexOfBreak < 0) xs else xs.take(indexOfBreak + 1)
}

Initial value in loops

Suppose you are writing a tail-recursive loop function to evaluate a collection of elements according to some criterion and want to end up with the element that scores best, and its score.
Naturally you will pass the best scoring element so far, as well as its score, as parameters to the function.
But since there is no best element at the start of the recursion, what should you initially pass as parameters to the loop function? Not wanting to use null, you could use Option[T] as parameter types, but then you have to check for isEmpty at each recursion while you know that it always has a value after the initial call. Isn't there a better way?
You can use list.head as initial value and loop over list. The first evaluation will be "wasted" since you're evaluating list.head against itself but that will calculate the score for list.head and the rest of the iteration can carry on and do what you want.
How is a "best element" evaluated?
Typically that is done through a numeric value.
At the start, you typically set that value to a number. Best practices state that the number should be defined as a constant, named something like MIN_VALUE. That value could be zero, negative, or the minimum representable floating-point number.
The head of the list, as @vptheron answered, seems to be the best choice, as you don't have a good starting value.
But, rather than use a tail-recursive function...
def getBestScore[A](scores: List[A], getBest: (A, A) => A): A = {
def go(as: List[A], acc: A): A = as match {
case x :: xs => go(xs, getBest(acc, x)) // getBest: ((A, A) => A)
case Nil => acc
}
go(scores, scores.head)
}
... you can use foldLeft to make it concise.
val best = scores.foldLeft(scores.head)(getBest)
Example:
scala> def getBest(x: Int, y: Int) = if(x > y) x else y
getBest: (x: Int, y: Int)Int
scala> val scores = List(1, 20, 3)
scores: List[Int] = List(1, 20, 3)
scala> scores.foldLeft(scores.head)(getBest)
res6: Int = 20
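If the empty-collection case matters, and both the best element and its score are wanted (as in the question), reduceOption avoids both the seed value and the scores.head call. A minimal sketch, where bestBy and score are hypothetical names introduced only for this example:

```scala
// Hypothetical helper: pairs each item with its score once, then keeps the higher-scoring pair.
// Returns None for an empty list instead of throwing, as scores.head would.
def bestBy[A](items: List[A])(score: A => Double): Option[(A, Double)] =
  items
    .map(x => (x, score(x)))
    .reduceOption((a, b) => if (b._2 > a._2) b else a) // ties keep the earlier element

val best = bestBy(List("a", "abc", "ab"))(_.length.toDouble) // Some(("abc", 3.0))
```

The Option appears only once, at the result, so the fold itself never has to check for emptiness on each step.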

Finding character in 2 dimensional scala list

So this might not be the best way to tackle it but my initial thought was a for expression.
Say I have a List like
List(List('a','b','c'),List('d','e','f'),List('h','i','j'))
I would like to find the row and column for a character, say 'e'.
def findChar(letter: Char, list: List[List[Char]]): (Int, Int) =
for {
r <- (0 until list.length)
c <- (0 until list(r).length)
if list(r)(c) == letter
} yield (r, c)
If there is a more elegant way I'm all ears but I would also like to understand what's wrong with this. Specifically the error the compiler gives me here is
type mismatch; found : scala.collection.immutable.IndexedSeq[(Int, Int)] required: (Int, Int)
on the line assigning to r. It seems to be complaining that my iterator doesn't match the return type but I don't quite understand why this is or what to do about it ...
In the signature of findChar you are telling the compiler that it returns (Int, Int). However, the result of your for expression (as inferred by Scala) is IndexedSeq[(Int, Int)] as the error message indicates. The reason is that (r, c) after yield is produced for every "iteration" in the for expression (i.e., you are generating a sequence of results, not just a single result).
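To make the inferred type more tangible, here is roughly what the for expression from the question expands to, written out by hand. It is a sketch rather than the compiler's exact output, and grid, all and first are names chosen for this example.

```scala
val grid = List(List('a', 'b', 'c'), List('d', 'e', 'f'), List('h', 'i', 'j'))

// The generators become flatMap/map calls and the `if` becomes withFilter,
// so the whole expression is a sequence of every matching (r, c) pair.
val all: IndexedSeq[(Int, Int)] =
  (0 until grid.length).flatMap { r =>
    (0 until grid(r).length)
      .withFilter(c => grid(r)(c) == 'e')
      .map(c => (r, c))
  }

// Picking a single result is a separate, explicit step.
val first: Option[(Int, Int)] = all.headOption // Some((1, 1))
```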
EDIT: As for findChar, you could do:
def findChar(letter: Char, list: List[List[Char]]) = {
val r = list.indexWhere(_ contains letter)
val c = list(r).indexOf(letter)
(r, c)
}
It is not the most efficient solution, but relatively short.
EDIT: Or reuse your original idea:
def findAll(letter: Char, list: List[List[Char]]) =
for {
r <- 0 until list.length
c <- 0 until list(r).length
if list(r)(c) == letter
} yield (r, c)
def findChar(c: Char, xs: List[List[Char]]) = findAll(c, xs).head
In both cases, be aware that an exception occurs if the searched letter is not contained in the input list.
EDIT: Or you write a recursive function yourself, like:
def findPos[A](c: A, list: List[List[A]]) = {
def aux(i: Int, xss: List[List[A]]) : Option[(Int, Int)] = xss match {
case Nil => None
case xs :: xss =>
val j = xs indexOf c
if (j < 0) aux(i + 1, xss)
else Some((i, j))
}
aux(0, list)
}
where aux is a (locally defined) auxiliary function that does the actual recursion (and remembers in which sublist we are, the index i). In this implementation a result of None indicates that the searched element was not there, whereas a successful result might return something like Some((1, 1)).
For your other ear, the question duplicates
How to capture inner matched value in indexWhere vector expression?
scala> List(List('a','b','c'),List('d','e','f'),List('h','i','j'))
res0: List[List[Char]] = List(List(a, b, c), List(d, e, f), List(h, i, j))
scala> .map(_ indexOf 'e').zipWithIndex.find(_._1 > -1)
res1: Option[(Int, Int)] = Some((1,1))

Sort List according to more than one constraint in Scala

I am desperately trying to find a way to sort a List of strings, where the strings are predefined identifiers of the following form: a1.1, a1.2, ..., a1.100, a2.1, a2.2, ..., a2.100, ..., b1.1, b1.2, ... and so on, which is already the correct ordering. So each identifier is ordered first by its first character (ascending alphabetical order) and, within that, ascending by the numbers that follow. I have tried sortWith, providing a sorting function that specifies the above rule for any two list members.
scala> List("a1.102", "b2.2", "b2.1", "a1.1").sortWith((a: String, b: String) => a.take(1) < b.take(1) && a.drop(1).toDouble < b.drop(1).toDouble)
res2: List[java.lang.String] = List(a1.102, a1.1, b2.2, b2.1)
This is not the ordering I expected. However, by swapping the ordering of the expressions, as
scala> List("a1.102", "b2.2", "b2.1", "a1.1").sortWith((a: String, b: String) => (a.drop(1).toDouble < b.drop(1).toDouble && a.take(1) < b.take(2)))
res3: List[java.lang.String] = List(a1.1, a1.102, b2.1, b2.2)
this indeed gives me (at least for this example) the desired ordering, which I do not understand either.
I would be so thankful if somebody could give me a hint about what exactly is going on there and how I can sort lists as I wish (with a more complex boolean expression than only comparing < or >). A further question: the strings I am sorting (in my example) are actually keys from a HashMap m. Will any solution also work for sorting m by its keys, as in
m.toSeq.sortWith((a: (String, String), b: (String, String)) => a._1.drop(1).toDouble < b._1.drop(1).toDouble && a._1.take(1) < b._1.take(1))
Many thanks in advance!
Update: I misread your example—you want a1.2 to precede a1.102, which the toDouble versions below won't get right. I'd suggest the following instead:
items.sortBy { s =>
val Array(x, y) = s.tail.split('.')
(s.head, x.toInt, y.toInt)
}
Here we use Scala's Ordering instance for Tuple3[Char, Int, Int].
It looks like you have a typo in your second ("correct") version: b.take(2) doesn't make sense, and should be b.take(1) to match the first. Once you fix that, you get the same (incorrect) ordering.
The real problem is that you only need the second condition in the case where the first characters match. So the following works as desired:
val items = List("a1.102", "b2.2", "b2.1", "a1.1")
items.sortWith((a, b) =>
a.head < b.head || (a.head == b.head && a.tail.toDouble < b.tail.toDouble)
)
I'd actually suggest the following, though:
items.sortBy(s => s.head -> s.tail.toDouble)
Here we take advantage of the fact that Scala provides an appropriate Ordering instance for Tuple2[Char, Double], so we can just provide a transformation function that turns your items into that type.
And to answer your last question: yes, either of these approaches should work just fine with your Map example.
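For the Map case, the tuple-key idea could be sketched as follows. The helper sortKey and the sample map m are assumptions made for illustration, and the sketch assumes every key has the shape letter, integer, dot, integer.

```scala
// Hypothetical helper: turns "a1.102" into ('a', 1, 102) so that the built-in
// tuple Ordering compares the letter first, then both numbers numerically.
def sortKey(id: String): (Char, Int, Int) =
  id.tail.split('.') match {
    case Array(major, minor) => (id.head, major.toInt, minor.toInt)
    case _ => throw new IllegalArgumentException(s"unexpected identifier: $id")
  }

val m = Map("a1.102" -> "x", "b2.2" -> "y", "b2.1" -> "z", "a1.1" -> "w", "a1.2" -> "v")

// Sort the entries by their parsed key; the values just come along.
val sortedEntries: Seq[(String, String)] = m.toSeq.sortBy { case (key, _) => sortKey(key) }
// keys in order: a1.1, a1.2, a1.102, b2.1, b2.2
```

Parsing both numbers as Int is what places a1.2 before a1.102, which a toDouble comparison would not.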
Create a tuple containing the string before the "." and then the integer after the ".". This will use a lexicographic order for the first part and an order on the integer for the second part.
scala> val order = Ordering.by((s:String) => (s.split("\\.")(0),s.split("\\.")(1).toInt))
order: scala.math.Ordering[String] = scala.math.Ordering$$anon$7@384eb259
scala> res2
res8: List[java.lang.String] = List(a1.5, a2.2, b1.11, b1.8, a1.10)
scala> res2.sorted(order)
res7: List[java.lang.String] = List(a1.5, a1.10, a2.2, b1.8, b1.11)
So consider what happens when your sorting function is passed a="a1.1" and b="a1.102".
What you'd like is for the function to return true. However, a.take(1) < b.take(1) returns false, so the function returns false.
Think about your cases a bit more carefully:
If the prefixes are equal, then the arguments are ordered properly only if the tails are.
If the prefixes are not equal, then the arguments are ordered properly only if the prefixes are.
So try this instead:
(a: String, b: String) => if (a.take(1) == b.take(1)) a.drop(1).toDouble < b.drop(1).toDouble else a.take(1) < b.take(1)
And that returns the proper ordering:
scala> List("a1.102", "b2.2", "b2.1", "a1.1").sortWith((a: String, b: String) => if (a.take(1) == b.take(1)) a.drop(1).toDouble < b.drop(1).toDouble else a.take(1) < b.take(1))
res8: List[java.lang.String] = List(a1.1, a1.102, b2.1, b2.2)
The reason it worked for you with the reversed ordering was luck. Consider the extra input "c0" to see what was happening:
scala> List("c0", "a1.102", "b2.2", "b2.1", "a1.1").sortWith((a: String, b: String) => (a.drop(1).toDouble < b.drop(1).toDouble && a.take(1) < b.take(2)))
res1: List[java.lang.String] = List(c0, a1.1, a1.102, b2.1, b2.2)
The reversed function sorts on the numeric part of the string first, then on the prefix. It just so happens that your numeric ordering you gave also preserved the prefix ordering, but that won't always be the case.