How can I make the following function more efficient? - Scala

Here is a function that builds a map from a given array, where the key is an integer from the array and the value is that number's frequency in the array.
I need to find the key with the maximum frequency. If two keys have the same frequency, I need to take the smaller key.
That's what I have written:
def findMinKeyWithMaxFrequency(arr: List[Int]): Int = {
  val ansMap: scala.collection.mutable.Map[Int, Int] = scala.collection.mutable.Map()
  arr.map(elem => ansMap += (elem -> arr.count(p => elem == p)))
  ansMap.filter(_._2 == ansMap.values.max).keys.min
}
val arr = List(1, 2, 3, 4, 5, 4, 3, 2, 1, 3, 4)
val ans = findMinKeyWithMaxFrequency(arr) // output: 3
How can I make it more efficient? It gives me the right answer, but I don't think it's the most efficient way to solve the problem.
In the given example the frequency of both 3 and 4 is 3, so the answer should be 3, as 3 is smaller than 4.
Edit 1:
Here is what I have done to make it a bit more efficient: convert arr into a Set, since we only need to find the frequency of the unique elements.
def findMinKeyWithMaxFrequency(arr: List[Int]): Int = {
  val ansMap = arr.toSet.map { e: Int => (e, arr.count(x => x == e)) }.toMap
  ansMap.filter(_._2 == ansMap.values.max).keys.min
}
Can it be more efficient? Is this the most functional way of writing a solution to the given problem?

def findMinKeyWithMaxFrequency(arr: List[Int]): Int =
  arr.groupBy(identity).toSeq.maxBy(p => (p._2.length, -p._1))._1
Use groupBy to get a count for each element in a single pass; then, after converting the result to a sequence of tuples, maxBy encodes the required rules: the highest count wins, and on a tie the smaller key wins (because its negation is larger).
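To see how the one-liner behaves on the question's example, here is the same logic broken into steps (a sketch; the comments show representative intermediate values):

```scala
val arr = List(1, 2, 3, 4, 5, 4, 3, 2, 1, 3, 4)

// groupBy(identity) buckets equal elements together, e.g.
// Map(3 -> List(3, 3, 3), 4 -> List(4, 4, 4), 1 -> List(1, 1), ...)
val grouped = arr.groupBy(identity)

// maxBy on (frequency, -key) ranks higher counts first and, on equal
// counts, prefers the smaller key (its negation is the larger value)
val ans = grouped.toSeq.maxBy(p => (p._2.length, -p._1))._1
// ans == 3: both 3 and 4 occur three times, and 3 is smaller
```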

Related

Can Lazy modify elements in Array in Scala?

I want to define an array where each element is a data set read from a certain path in the file system. Because reading the data is costly and the positions visited in the array are sparse, I want to use a lazy modifier so that a data set is not read until it is visited. How can I define this kind of array?
Yes, we can define it with the view method.
Instead of (0 to 10).toArray
scala> val a=(0 to 10).toArray
a: Array[Int] = Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
A view does not instantiate the collection; elements are evaluated only when they are accessed:
scala> val a = (0 to 10).view
a: scala.collection.SeqView[Int,scala.collection.immutable.IndexedSeq[Int]] = SeqView(...)
scala> for (x <- a) {
     |   println(x)
     | }
0
1
2
3
4
5
6
7
8
9
10
I hope this answers your question.
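One caveat worth adding: a view defers computation, but it does not cache results, so every traversal recomputes the elements. A small sketch illustrating this (the counter is only for demonstration):

```scala
var evaluations = 0
val v = (0 to 3).view.map { x => evaluations += 1; x * x }

v.foreach(_ => ()) // first traversal evaluates all four elements
v.foreach(_ => ()) // second traversal evaluates them again

// evaluations == 8: a view memoizes nothing, unlike a lazy val
```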
No, you cannot use lazy on the array to make the elements lazy. The most natural thing to do would be to use a caching library like ScalaCache.
You can also make a kind of wrapper class with a lazy field as you suggested in your comment. However, I would prefer not to expose the caching to the clients of the array. You should be able to just write myArray(index) to access an element.
If you don't want to use a library, this is another option (without lazy) that gives you an array like object with caching:
class CachingArray[A](size: Int, getElement: Int => A) {
  val elements = scala.collection.mutable.Map[Int, A]()
  def apply(index: Int) = elements.getOrElseUpdate(index, getElement(index))
}
Just initialize it with the size and a function that computes an element at a given index.
If you like, you can make it extend IndexedSeq[A] so it can be used more like a real array. Just implement length like this:
override def length: Int = size
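A quick usage sketch, with a counter to show that each element is computed only once (the load function here is purely illustrative):

```scala
class CachingArray[A](size: Int, getElement: Int => A) {
  val elements = scala.collection.mutable.Map[Int, A]()
  def apply(index: Int) = elements.getOrElseUpdate(index, getElement(index))
}

var loads = 0
val data = new CachingArray[String](100, i => { loads += 1; s"data set $i" })

val first  = data(42) // computes and caches index 42
val second = data(42) // served from the cache

// loads == 1: the costly read happened only once
```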

Create Spark dataset with parts of other dataset

I'm trying to create a new dataset by taking intervals from another dataset, for example, consider dataset1 as input and dataset2 as output:
dataset1 = [1, 2, 3, 4, 5, 6]
dataset2 = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]
I managed to do that using arrays, but for mlib a dataset is needed.
My code with array:
def generateSeries(values: Array[Double], n: Int): Seq[Array[Float]] = {
  val m = values.length
  val res = scala.collection.mutable.ArrayBuffer[Array[Float]]()
  for (i <- 0 to m - n) {
    res += values.slice(i, i + n).map(_.toFloat) // window of n values starting at i
  }
  res
}
FlatMap seems like the way to go, but how a function can search for the next value in the dataset?
The problem here is that an array is in no way similar to a DataSet. A DataSet is unordered and has no indices, so thinking in terms of arrays won't help you. Go for a Seq and treat it without using indices and positions at all.
So, to represent array-like behaviour on a DataSet, you need to create your own indices. This is simply done by pairing each value with its position in the "abstract array" we are representing.
So the type of your DataSet will be something like [(Int, Int)], where the first element is the index and the second is the value. They will arrive unordered, so you will need to rework your logic in a more functional way. It's not really clear what you're trying to achieve, but I hope I've given you a hint. Otherwise, explain the expected result better in a comment on my answer and I will edit.
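For what it's worth, for a local Seq the interval extraction from the question can be written without any index bookkeeping at all, using sliding (a sketch of the plain collections version, not a Spark answer):

```scala
val dataset1 = Seq(1, 2, 3, 4, 5, 6)

// sliding(2) yields the overlapping windows [1,2], [2,3], ..., [5,6];
// flattening them reproduces dataset2 from the question
val dataset2 = dataset1.sliding(2).flatten.toSeq
// dataset2 == Seq(1, 2, 2, 3, 3, 4, 4, 5, 5, 6)
```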

How to create a scala function that returns the number of even values in an array?

I'm attempting to create a function that returns the number of even values in an array. So far I have the following code, but it's not working:
(a: Array[Int]): Int = {
var howManyEven = 0
for(i <-0 to a.length) {
if(a(i)%2==0){
howManyEven+= 1
}
howManyEven
}
In addition, I'm stumped as to how to return the number of odd values in an array. Are my methods off? I guess I'm just not sure which methods to use to generate my desired output.
You have an off-by-one error (ignoring other typos and missing information), in that you're trying to go from 0 to a.length. But if the length is 10, then you're going from 0 to 10, which is 11 indices. It should be a.length - 1.
You could avoid having to reason about off-by-one errors by using a functional approach. The same thing can be accomplished in one line using standard methods in the collections library.
def howManyEven(a: Array[Int]): Int = a.count(_ % 2 == 0)
scala> howManyEven(Array(1, 2, 3, 4, 6, 8, 9, 10, 11))
res1: Int = 5
count is a method in the collections library that counts the elements in a collection that satisfy a Boolean property. In this case, checking that an element is even.
I suggest having a read of the methods available on List, for example. The Scala collections library is very rich, and has methods for almost anything you want to do. It's just a matter of finding the right one (or combination of). As you can see, the Java way of setting up for loops and using mutable variables tends to be error prone, and in Scala it is best to avoid that.
You can also achieve this using filter or groupBy:
def howManyEven(a: Array[Int]): Int = a.filter(_ % 2 == 0).size
def howManyEven(a: Array[Int]): Int = a.groupBy(_ % 2 == 0).getOrElse(true, Array.empty[Int]).length
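The odd-value count the question also asks about is just the complementary predicate (a sketch; howManyOdd is a name chosen here for illustration):

```scala
// Count the elements that are NOT evenly divisible by 2.
// Using != 0 rather than == 1 also handles negative odd numbers,
// since in Scala -3 % 2 evaluates to -1.
def howManyOdd(a: Array[Int]): Int = a.count(_ % 2 != 0)

howManyOdd(Array(1, 2, 3, 4, 6, 8, 9, 10, 11)) // the odd ones: 1, 3, 9, 11
```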

Seq with maximal elements

I have a Seq and a function Int => Int. What I need to achieve is to take from the original Seq only those elements that map to the maximum of the resulting sequence (the one I'll have after applying the given function):
def mapper: Int => Int = x => x * x
val s = Seq(-2, -2, 2, 2)
val themax = s.map(mapper).max
s.filter(mapper(_) == themax)
But this seems wasteful, since it has to map twice (once for the filter, other for the maximum).
Is there a better way to do this? (without using a cycle, hopefully)
EDIT
The code has since been edited; in the original, this was the filter line: `s.filter( mapper(_)==s.map(mapper).max )`. As om-nom-nom has pointed out, this evaluates `s.map(mapper).max` on each (filter) iteration, leading to quadratic complexity.
Here is a solution that does the mapping only once, using the `foldLeft` function:
The principle is to go through the seq, and for each mapped element: if it is greater than every maximum mapped so far, begin a new sequence with it; if it is equal, append it to the sequence of maximums collected so far; and if it is less, keep the previously computed Seq of maximums unchanged.
def getMaxElems1(s: Seq[Int])(mapper: Int => Int): Seq[Int] =
  s.foldLeft(Seq[(Int, Int)]()) { (res, elem) =>
    val e2 = mapper(elem)
    if (res.isEmpty || e2 > res.head._2)
      Seq((elem, e2))        // new maximum: start over
    else if (e2 == res.head._2)
      res ++ Seq((elem, e2)) // ties the maximum: append
    else
      res                    // below the maximum: ignore
  }.map(_._1) // keep only the original elements
// test with your list
scala> getMaxElems1(s)(mapper)
res14: Seq[Int] = List(-2, -2, 2, 2)
//test with a list containing also non maximal elements
scala> getMaxElems1(Seq(-1, 2,0, -2, 1,-2))(mapper)
res15: Seq[Int] = List(2, -2, -2)
Remark: About complexity
The algorithm I present above has a complexity of O(N) for a list with N elements. However, the approach from the question, done properly, is also just a series of O(N) passes:
the operation of mapping all elements is of complexity O(N)
the operation of computing the max is of complexity O(N)
the operation of zipping is of complexity O(N)
the operation of filtering the list according to the max is also of complexity O(N)
the operation of extracting the original elements from the surviving pairs is of complexity O(M), with M the number of final elements
So, finally, the algorithm you presented in your question has the same complexity as my answer's, and moreover your solution is clearer than mine. So, even though `foldLeft` is more powerful, for this operation I would recommend your idea, but zipping the original list and computing the map only once (especially if your map is more complicated than a simple square). Here is that solution, worked out with the help of *scala_newbie* in question/chat/comments.
def getMaxElems2(s: Seq[Int])(mapper: Int => Int): Seq[Int] = {
  val mappedS = s.map(mapper) // map done only once
  val m = mappedS.max         // find the max
  s.zip(mappedS).filter(_._2 == m).unzip._1
}
// test with your list
scala> getMaxElems2(s)(mapper)
res16: Seq[Int] = List(-2, -2, 2, 2)
//test with a list containing also non maximal elements
scala> getMaxElems2(Seq(-1, 2,0, -2, 1,-2))(mapper)
res17: Seq[Int] = List(2, -2, -2)

Scala: what is the most appropriate data structure for sorted subsets?

Given a large collection (let's call it 'a') of elements of type T (say, a Vector or List) and an evaluation function 'f' (say, (T) => Double) I would like to derive from 'a' a result collection 'b' that contains the N elements of 'a' that result in the highest value under f. The collection 'a' may contain duplicates. It is not sorted.
Maybe leaving the question of parallelizability (map/reduce etc.) aside for a moment, what would be the appropriate Scala data structure for compiling the result collection 'b'? Thanks for any pointers / ideas.
Notes:
(1) I guess my use case can be most concisely expressed as
val a = Vector( 9,2,6,1,7,5,2,6,9 ) // just an example
val f : (Int)=>Double = (n)=>n // evaluation function
val b = a.sortBy( f ).take( N ) // sort, then clip
except that I do not want to sort the entire set.
(2) one option might be an iteration over 'a' that fills a TreeSet with 'manual' size bounding (reject anything worse than the worst item in the set, don't let the set grow beyond N). However, I would like to retain duplicates present in the original set in the result set, and so this may not work.
(3) if a sorted multi-set is the right data structure, is there a Scala implementation of this? Or a binary-sorted Vector or Array, if the result set is reasonably small?
You can use a priority queue:
def firstK[A](xs: Seq[A], k: Int)(implicit ord: Ordering[A]) = {
  val q = new scala.collection.mutable.PriorityQueue[A]()(ord.reverse)
  val (before, after) = xs.splitAt(k)
  q ++= before
  after.foreach(x => q += ord.max(x, q.dequeue))
  q.dequeueAll
}
We fill the queue with the first k elements and then compare each additional element to the head of the queue, swapping as necessary. This works as expected and retains duplicates:
scala> firstK(Vector(9, 2, 6, 1, 7, 5, 2, 6, 9), 4)
res14: scala.collection.mutable.Buffer[Int] = ArrayBuffer(6, 7, 9, 9)
And it doesn't sort the complete list. I've used an Ordering in this implementation, but adapting it to use an evaluation function would be pretty trivial.
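For completeness, here is one way that adaptation might look, using Ordering.by to derive the ordering from the evaluation function (a sketch; topNBy is a name made up for this example):

```scala
import scala.collection.mutable.PriorityQueue

// Keep the n elements of xs with the highest values under f
def topNBy[A](xs: Seq[A], n: Int)(f: A => Double): Seq[A] = {
  val ord = Ordering.by(f)                    // compare elements by their f-value
  val q = new PriorityQueue[A]()(ord.reverse) // reversed: head is the current worst
  val (before, after) = xs.splitAt(n)
  q ++= before
  after.foreach(x => q += ord.max(x, q.dequeue))
  q.dequeueAll.toSeq
}

topNBy(Vector(9, 2, 6, 1, 7, 5, 2, 6, 9), 4)(_.toDouble)
// same elements as firstK above; the duplicate 9s are retained
```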