Using quickselect to find a certain element - Scala

First of all, I want to say that this is a school assignment and I am only seeking some guidance.
My task was to write an algorithm that finds the k-th smallest element in a Seq using quickselect. This should be easy enough, but when running some tests I hit a wall. For some reason, the input (List(1, 1, 1, 1), 1) sends it into an infinite loop.
Here is my implementation:
val rand = new scala.util.Random()
def find(seq: Seq[Int], k: Int): Int = {
  require(0 <= k && k < seq.length)
  val a: Array[Int] = seq.toArray[Int] // Can't modify the argument sequence
  val pivot = rand.nextInt(a.length)
  val (low, high) = a.partition(_ < a(pivot))
  if (low.length == k) a(pivot)
  else if (low.length < k) find(high, k - low.length)
  else find(low, k)
}
For some reason (or because I am tired) I cannot spot my mistake. If someone could give me a hint about where I go wrong, I would be grateful.

Basically you are depending on this line - val (low, high) = a.partition(_ < a(pivot)) - to split the array into two arrays. The first one contains the elements strictly smaller than the pivot element and the second contains the rest, i.e. the elements greater than or equal to the pivot (including the pivot itself).
Then you say that if the first array has length k, you have already seen k elements smaller than your pivot element, which makes the pivot element the (k+1)-th smallest. Whether that is what you want depends on how you count: your require allows k = 0, so decide whether k is 0-based or 1-based and be consistent. This is your first potential mistake.
Also... a greater problem occurs when all elements are equal, because then your first array always has 0 elements, the recursion never shrinks the input, and you loop forever.
Not only that... your code can give a wrong answer for inputs with repeated elements among the k smallest ones, like (1, 3, 4, 1, 2).
The solution lies in the observation that in the sequence (1, 1, 1, 1) the 4th smallest element is the 4th 1. Meaning the elements equal to the pivot have to count toward its rank, so you have to think in terms of <= rather than just <.
Also... a single two-way partition on _ < a(pivot) cannot tell you how many elements are equal to the pivot, so you cannot get everything you need from this line alone; you will have to write the split yourself.
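For illustration, here is a minimal sketch of such a three-way split (just one possible shape, written with three filter passes for clarity rather than speed; k is treated as 0-based to match your require):
import scala.util.Random

def quickselect(seq: Seq[Int], k: Int): Int = {
  require(0 <= k && k < seq.length)
  val pivot = seq(Random.nextInt(seq.length))
  val low   = seq.filter(_ < pivot)   // strictly smaller than the pivot
  val equal = seq.filter(_ == pivot)  // the pivot and its duplicates
  val high  = seq.filter(_ > pivot)   // strictly greater than the pivot
  if (k < low.length) quickselect(low, k)
  else if (k < low.length + equal.length) pivot // rank k falls inside the equal group
  else quickselect(high, k - low.length - equal.length)
}
With this split an input like (List(1, 1, 1, 1), 1) terminates immediately, because all four elements land in the equal group.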

How to trick Scala map method to produce more than one output per each input item?

A quite complex algorithm is being applied to a list of Spark Dataset rows (the list was obtained using groupByKey and flatMapGroups). Most rows are transformed 1:1 from input to output, but some scenarios require more than one output per input. The input row schema can change anytime. map() fits the 1:1 transformation quite well, but is there a way to use it to produce a 1:n output?
The only work-around I found relies on the foreach method, which has unpleasant overhead caused by creating the initial empty list (remember, unlike the simplified example below, the real-life list structure changes randomly).
My original problem is too complex to share here, but this example demonstrates the concept. Let's have a list of integers. Each should be transformed into its square value and, if the input is even, it should additionally produce one half of the original value:
val X = Seq(1, 2, 3, 4, 5)
val y = X.map(x => x * x) // map is intended for 1:1 transformation so it works great here
val z = X.map(x => for(n <- 1 to 5) (n, x * x)) // this attempt FAILS - generates a list of five rows with empty tuples

// this work-around works, but the newX definition is problematic
var newX = List[Int]() // in reality defined as the head of the input list, with the result's tail dropped at the end
val za = X.foreach(x => {
  newX = x * x :: newX
  if (x % 2 == 0) newX = (x / 2) :: newX
})
newX
Is there a better way than the foreach construct?
.flatMap produces any number of outputs from a single input.
val X = Seq(1, 2, 3, 4, 5)
X.flatMap { x =>
  if (x % 2 == 0) Seq(x * x, x / 2) else Seq(x * x)
}
#=> Seq[Int] = List(1, 4, 1, 9, 16, 2, 25)
flatMap in more detail
In X.map(f), f is a function that maps each input to a single output. By contrast, in X.flatMap(g), the function g maps each input to a sequence of outputs. flatMap then takes all the sequences produced (one for each element of X) and concatenates them.
The neat thing is .flatMap works not just for sequences, but for all sequence-like objects. For an option, for instance, Option(x)#flatMap(g) will allow g to return an Option. Similarly, Future(x)#flatMap(g) will allow g to return a Future.
Whenever the number of elements you return depends on the input, you should think of flatMap.
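For instance, here is a small sketch of flatMap on Option (the names maybeUser and lookupId are made up purely for illustration):
def lookupId(name: String): Option[Int] =
  if (name == "alice") Some(42) else None // pretend lookup

val maybeUser: Option[String] = Some("alice")
// lookupId itself returns an Option; flatMap flattens the would-be Option[Option[Int]] into Option[Int]
val id: Option[Int] = maybeUser.flatMap(lookupId) // Some(42)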

find out if a number is a good number in scala

Hi, I am new to Scala and functional programming. I want to pass a number to my function and check whether it is a good number.
A number is a good number if each of its digits is larger than the sum of the digits to its right.
For example:
9620  is good as (2 > 0, 6 > 2+0, 9 > 6+2+0)
The steps I am using to solve this are:
1. converting the number to a string and reversing it
2. storing all digits of the reversed number as elements of a list
3. looping i from 1 to length of the number - 1
4. calculating the sum of the first i digits as num2
5. extracting the i-th digit from the list as digit1, which is one digit ahead of the first i digits whose sum we calculated (because the list starts from zero)
6. comparing the outputs of steps 4 and 5; if num2 is greater than digit1 we break out of the loop and print that it is not a good number
Please find my code below:
import scala.util.control.Breaks._

val num1 = 9521.toString.reverse
val list1 = num1.map(_.asDigit).toList
breakable {
  for (i <- 1 to num1.length - 1) {
    val num2 = num1.take(i).map(_.asDigit).sum
    val digit1 = list1(i)
    if (num2 > digit1) {
      print("number is not a good number")
      break
    }
  }
}
I know this is not the most optimized way to solve this problem. Also, I am looking for a way to code this using tail recursion, where I pass two numbers and get all the good numbers between them.
Can this be done in a more optimized way?
Thanks in advance!
No String conversions required.
val n = 9620
val isGood = Stream.iterate(n)(_ / 10)
  .takeWhile(_ > 0)
  .map(_ % 10)                               // digits, least significant first
  .foldLeft((true, -1)) { case ((bool, sum), digit) =>
    if (sum < 0) (bool, digit)               // the rightmost digit is never checked; it only seeds the running sum
    else (bool && digit > sum, sum + digit)
  }._1
Here is a purely numeric version using a recursive function.
import scala.annotation.tailrec

def isGood(n: Int): Boolean = {
  @tailrec
  def loop(n: Int, sum: Int): Boolean =
    (n == 0) || (n % 10 > sum && loop(n / 10, sum + n % 10))
  loop(n / 10, n % 10)
}
This should compile into an efficient loop.
Using this function: (this will be the efficient way, as forall does not necessarily traverse the entire list of digits; it stops as soon as the condition v(i) > v.drop(i+1).sum becomes false, while traversing the vector v from left to right.)
def isGood(n: Int) = {
  val v1 = n.toString.map(_.asDigit)
  val v = if (v1.last != 0) v1 else v1.dropRight(1)
  (0 to v.size - 1).forall(i => v(i) > v.drop(i + 1).sum)
}
If we want to find good numbers in an interval of integers ranging from n1 to n2, we can use this function:
def goodNums(n1: Int, n2: Int) = (n1 to n2).filter(isGood(_))
In Scala REPL:
scala> isGood(9620)
res51: Boolean = true
scala> isGood(9600)
res52: Boolean = false
scala> isGood(9641)
res53: Boolean = false
scala> isGood(9521)
res54: Boolean = true
scala> goodNums(412,534)
res66: scala.collection.immutable.IndexedSeq[Int] = Vector(420, 421, 430, 510, 520, 521, 530, 531)
scala> goodNums(3412,5334)
res67: scala.collection.immutable.IndexedSeq[Int] = Vector(4210, 5210, 5310)
This is a more functional way. pairs is a list of tuples pairing each digit with the sum of the digits that follow it. It is easy to create these tuples with the drop, take and slice (a combination of drop and take) methods.
Finally, I can state my condition expressively with the forall method.
val n = 9620
val str = n.toString
val pairs = for { x <- 1 until str.length } yield (str.slice(x - 1, x).toInt, str.drop(x).map(_.asDigit).sum)
pairs.forall { case (a, b) => a > b }
If you want to be functional and expressive, avoid using break. If you need to check a condition for each element, it is a good idea to move your problem to collections, so you can use forall.
That is not the case here, but if you care about performance (i.e. you do not want to build the entire pairs collection when the condition already fails for the first element), you can change the source of the for comprehension from a Range to a Stream:
(1 until str.length).toStream
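A sketch of that lazy variant: the pairs are then only materialized until forall hits the first failing comparison.
val n = 9620
val str = n.toString
val pairsLazy = (1 until str.length).toStream
  .map(x => (str.slice(x - 1, x).toInt, str.drop(x).map(_.asDigit).sum))
val good = pairsLazy.forall { case (a, b) => a > b } // stops at the first pair where a <= b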
Functional style tends to prefer monadic operations such as map and reduce. To make this look functional and clear, I'd do something like:
def isGood(value: Int) =
  value.toString.reverse.map(digit => Some(digit.asDigit))
    .reduceLeft[Option[Int]] {
      case (sum, Some(digit)) => sum.collectFirst { case s if s < digit => s + digit }
    }.isDefined
Instead of using tail recursion to calculate this for ranges, just generate the range and then filter over it:
def goodInRange(low: Int, high: Int) = (low to high).filter(isGood(_))

Fibonacci Sequence fast implementation

I have written this function in Scala to calculate the Fibonacci number at a given index n:
def fibonacci(n: Long): Long = {
  if (n <= 1) n
  else
    fibonacci(n - 1) + fibonacci(n - 2)
}
However, it is not efficient for large indexes. Therefore I need to implement a function using a tuple, and this function should return two consecutive Fibonacci values as the result.
Can somebody give me any hints about this? I have never used Scala before. Thanks!
This question should maybe go to Mathematics.
There is an explicit formula for the Fibonacci sequence. If you need to calculate the Fibonacci number for n without the previous ones, this is much faster. You find it here (Binet's formula): http://en.wikipedia.org/wiki/Fibonacci_number
Here's a simple tail-recursive solution:
def fibonacci(n: Long): Long = {
  def fib(i: Long, x: Long, y: Long): Long = {
    if (i > 0) fib(i - 1, x + y, x)
    else x
  }
  fib(n, 0, 1)
}
The solution you posted takes exponential time since it creates two recursive invocation trees (fibonacci(n - 1) and fibonacci(n - 2)) at each step. By simply tracking the last two numbers, you can recursively compute the answer without any repeated computation.
Can you explain the middle part, why (i-1, x+y, x), etc.? Sorry if I am asking too much, but I hate to copy and paste code without knowing how it works.
It's pretty simple, but my poor choice of variable names might have made it confusing.
i is simply a counter saying how many steps we have left. If we're calculating the M-th (I'm using M since I already used n in my code) Fibonacci number, then i tells us how many more terms we have left to calculate before we reach the M-th term.
x is the m-th term in the Fibonacci sequence, or F(m) (where m = M - i).
y is the (m-1)-th term in the Fibonacci sequence, or F(m-1).
So, on the first call fib(n, 0, 1), we have i = M, x = 0, y = 1. If you look up the bidirectional Fibonacci sequence, you'll see that F(0) = 0 and F(-1) = 1, which is why x = 0 and y = 1 here.
On the next recursive call, fib(i-1, x+y, x), we pass x+y as our next x value. This comes straight from the definition:
F(n) = F(n-1) + F(n-2)
We pass x as the next y term, since our current F(n-1) is the same as F(n-2) for the next term.
On each step we decrement i since we're one step closer to the final answer.
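To make the roles of i, x and y concrete, here is how the calls unfold for fibonacci(5), one recursive step per arrow:
// fib(5, 0, 1) -> fib(4, 1, 0) -> fib(3, 1, 1)
//             -> fib(2, 2, 1) -> fib(1, 3, 2)
//             -> fib(0, 5, 3) -> 5   (i == 0, so return x)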
I am assuming that you don't have saved values from previous computations. If so, it will be faster to use the direct formula based on the golden ratio instead of the recursive definition. The formula can be found on the Wikipedia page for Fibonacci numbers:
floor(pow(phi, n)/sqrt(5) + 0.5)
where phi = (1 + sqrt(5))/2.
I have no knowledge of programming in Scala. I am hoping someone on SO will upgrade my pseudo-code to actual Scala code.
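For completeness, a sketch of how that pseudo-code could be rendered in Scala (note that Double precision makes it unreliable once n gets large, roughly beyond n ≈ 70):
import scala.math.{floor, pow, sqrt}

// Binet's formula: F(n) is the nearest integer to phi^n / sqrt(5)
def fibBinet(n: Int): Long = {
  val phi = (1 + sqrt(5)) / 2
  floor(pow(phi, n) / sqrt(5) + 0.5).toLong
}

fibBinet(10) // 55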
Update
Here's another solution, again using Streams as below (getting memoization for free), but a bit more intuitive (i.e. without the zip/tail invocation on the fibs Stream):
val fibs = Stream.iterate( (0,1) ) { case (a,b)=>(b,a+b) }.map(_._1)
that yields the same output as below for:
fibs take 5 foreach println
Scala supports memoization through Stream, its implementation of lazy lists. This is a perfect fit for a Fibonacci implementation, which is actually provided as an example in the Scala API docs for Stream. Quoting here:
import scala.math.BigInt

object Main extends App {
  val fibs: Stream[BigInt] =
    BigInt(0) #:: BigInt(1) #:: fibs.zip(fibs.tail).map { n => n._1 + n._2 }
  fibs take 5 foreach println
}

// prints
//
// 0
// 1
// 1
// 2
// 3

Seq with maximal elements

I have a Seq and a function Int => Int. What I need to achieve is to take from the original Seq only those elements whose mapped value equals the maximum of the resulting sequence (the one I'll have after applying the given function):
def mapper: Int => Int = x => x * x
val s = Seq(-2, -2, 2, 2)
val themax = s.map(mapper).max
s.filter(mapper(_) == themax)
But this seems wasteful, since it has to map twice (once for the filter, once for the maximum).
Is there a better way to do this? (Without using an explicit loop, hopefully.)
EDIT
The code has since been edited; in the original, the filter line was s.filter( mapper(_)==s.map(mapper).max ). As om-nom-nom has pointed out, this evaluates s.map(mapper).max on each filter iteration, leading to quadratic complexity.
Here is a solution that does the mapping only once, using the foldLeft function:
The principle is to go through the seq and, for each mapped element, if it is greater than all mapped elements seen so far, begin a new sequence with it; if it is equal to the current maximum, append it to the maximums collected so far; and if it is less, keep the previously computed Seq of maximums.
def getMaxElems1(s: Seq[Int])(mapper: Int => Int): Seq[Int] =
  s.foldLeft(Seq[(Int, Int)]()) { (res, elem) =>
    val e2 = mapper(elem)
    if (res.isEmpty || e2 > res.head._2)
      Seq((elem, e2))
    else if (e2 == res.head._2)
      res ++ Seq((elem, e2))
    else res
  }.map(_._1) // keep only the original elements
// test with your list
scala> getMaxElems1(s)(mapper)
res14: Seq[Int] = List(-2, -2, 2, 2)
//test with a list containing also non maximal elements
scala> getMaxElems1(Seq(-1, 2,0, -2, 1,-2))(mapper)
res15: Seq[Int] = List(2, -2, -2)
Remark about complexity:
The algorithm I present above has a complexity of O(N) for a list with N elements. However, in the zip-based approach:
the operation of mapping all elements is of complexity O(N)
the operation of computing the max is of complexity O(N)
the operation of zipping is of complexity O(N)
the operation of filtering the list according to the max is also of complexity O(N)
the operation of projecting back to the original elements (unzip/map) is of complexity O(M), with M the number of elements kept
So, finally, the algorithm you presented in your question has the same complexity (quality) as my answer's, and moreover the solution you present is clearer than mine. So, even if foldLeft is more powerful, for this operation I would recommend your idea, but zipping with the original list and computing the map only once (especially if your mapping is more complicated than a simple square). Here is the solution, worked out with the help of *scala_newbie* in the question/chat/comments.
def getMaxElems2(s: Seq[Int])(mapper: Int => Int): Seq[Int] = {
  val mappedS = s.map(mapper) // map done only once
  val m = mappedS.max         // find the max
  s.zip(mappedS).filter(_._2 == m).unzip._1
}
// test with your list
scala> getMaxElems2(s)(mapper)
res16: Seq[Int] = List(-2, -2, 2, 2)
//test with a list containing also non maximal elements
scala> getMaxElems2(Seq(-1, 2,0, -2, 1,-2))(mapper)
res17: Seq[Int] = List(2, -2, -2)

Scala: what is the most appropriate data structure for sorted subsets?

Given a large collection (let's call it 'a') of elements of type T (say, a Vector or List) and an evaluation function 'f' (say, (T) => Double) I would like to derive from 'a' a result collection 'b' that contains the N elements of 'a' that result in the highest value under f. The collection 'a' may contain duplicates. It is not sorted.
Maybe leaving the question of parallelizability (map/reduce etc.) aside for a moment, what would be the appropriate Scala data structure for compiling the result collection 'b'? Thanks for any pointers / ideas.
Notes:
(1) I guess my use case can be most concisely expressed as
val a = Vector( 9,2,6,1,7,5,2,6,9 )  // just an example
val f : (Int)=>Double = (n)=>n       // evaluation function
val b = a.sortBy( f ).takeRight( N ) // sort, then clip to the N highest
except that I do not want to sort the entire set.
(2) one option might be an iteration over 'a' that fills a TreeSet with 'manual' size bounding (reject anything worse than the worst item in the set, don't let the set grow beyond N). However, I would like to retain duplicates present in the original set in the result set, and so this may not work.
(3) if a sorted multi-set is the right data structure, is there a Scala implementation of this? Or a binary-sorted Vector or Array, if the result set is reasonably small?
You can use a priority queue:
def firstK[A](xs: Seq[A], k: Int)(implicit ord: Ordering[A]) = {
  val q = new scala.collection.mutable.PriorityQueue[A]()(ord.reverse)
  val (before, after) = xs.splitAt(k)
  q ++= before
  after.foreach(x => q += ord.max(x, q.dequeue))
  q.dequeueAll
}
We fill the queue with the first k elements and then compare each additional element to the head of the queue, swapping as necessary. This works as expected and retains duplicates:
scala> firstK(Vector(9, 2, 6, 1, 7, 5, 2, 6, 9), 4)
res14: scala.collection.mutable.Buffer[Int] = ArrayBuffer(6, 7, 9, 9)
And it doesn't sort the complete list. I've got an Ordering in this implementation, but adapting it to use an evaluation function would be pretty trivial.
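For instance, one way to plug in the question's evaluation function f is to derive the Ordering with Ordering.by and pass it explicitly (a sketch using the example values from the question):
val a = Vector(9, 2, 6, 1, 7, 5, 2, 6, 9)
val f: Int => Double = n => n
// Ordering.by(f) compares elements by their value under f
val b = firstK(a, 4)(Ordering.by(f)) // ArrayBuffer(6, 7, 9, 9)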