Efficiently Filter elements in a List based on its indexes - scala

I am doing an exercise that asks me to remove the elements at odd positions.
I wonder if there is a better alternative to what I came up with:
val a = List(1,2,3,4,5,6)
The first approach:
a.zipWithIndex.filter(x => (x._2 & 1) == 1).map(_._1)
and the second:
a.indices.filter(i => (i & 1) == 1).map(a(_))
Am I correct in thinking that the second approach is more efficient, since it does not need to produce an intermediate list as zipWithIndex does?

You can use a view to avoid intermediate lists:
a.view
.zipWithIndex
.filter(x => (x._2 & 1) == 1)
.map(_._1)
.force
This will only traverse a once when force is called.
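On Scala 2.13, where force on views is deprecated as far as I know, the same single-pass idea can be sketched with toList instead. The object name OddPositions and the value names are hypothetical, chosen only for illustration:

```scala
object OddPositions {
  val a = List(1, 2, 3, 4, 5, 6)

  // Lazy view: zipWithIndex and collect are applied lazily, and the
  // list is traversed once when toList materializes the result.
  val viaView: List[Int] =
    a.view.zipWithIndex.collect { case (x, i) if (i & 1) == 1 => x }.toList

  // Same idea with an iterator, which is also single-pass.
  val viaIterator: List[Int] =
    a.iterator.zipWithIndex.collect { case (x, i) if (i & 1) == 1 => x }.toList
}
```

Both values keep the elements at indices 1, 3 and 5 without building an intermediate list of pairs.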

You can use the collect method on the zipped list, which might be a bit clearer:
a.zipWithIndex.collect{
case (x,i) if i % 2 == 1 => x
}
https://scalafiddle.io/sf/YbureiX/0
I am not sure about the efficiency though

You can avoid building an intermediate collection by using withFilter. You can also convert the list to a Vector, so that the element at a particular index can be extracted in effectively constant time:
val a: Vector[Int] = List(1,2,3,4,5,6).toVector
val res: Seq[Int] = a.indices.withFilter(i => (i & 1) == 1).map(a(_))
println(res)
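If the indices are only needed to pick every second element, an index-free variant is also possible: drop the first element, then take the head of each pair. This is a sketch for comparison rather than part of the answer above, and EverySecond and oddIndexed are hypothetical names:

```scala
object EverySecond {
  // Keeps the elements at indices 1, 3, 5, ... without indexing into the list,
  // so it stays linear even on a plain List.
  def oddIndexed[A](xs: List[A]): List[A] =
    xs.drop(1).grouped(2).map(_.head).toList
}
```

Because nothing is looked up by position, this avoids the linear cost of a(i) on a List without converting to a Vector first.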

Scala. How to delete all n'th element from Stream

I am studying Streams in Scala. Can anybody help me with a function that deletes every n'th element from a Stream?
[2,3,99,1,66,3,4];3 must return this: [2,3,1,66,4]
myStream.zipWithIndex //attach index to every element
.filter(x => (1 + x._2) % n > 0) //adjust index, remove every nth
.map(_._1) //remove index
Oops, almost forgot: filter and map can be combined.
myStream.zipWithIndex
.collect{case (e,x) if (1 + x) % n > 0 => e}
I wanted to try doing this without zipWithIndex and arrived at:
def dropNth[T](s: Stream[T], n: Int): Stream[T] = {
val (firstn, rest) = s.splitAt(n)
if (firstn.length < n)
firstn
else
firstn.take(n - 1) #::: dropNth(rest, n)
}
There must be a way to replace the explicit recursion with a fold or scan, but it doesn't seem to be trivial.
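One way to drop the explicit recursion, though with grouped rather than a fold or scan, might look like the sketch below. It assumes Scala 2.13, where LazyList replaces the deprecated Stream, and it assumes n >= 1; the object name DropNth is hypothetical:

```scala
object DropNth {
  // Assumes n >= 1. Splits the stream into chunks of n elements and keeps
  // the first n - 1 elements of every chunk; a shorter final chunk is kept whole.
  def dropNth[T](s: LazyList[T], n: Int): LazyList[T] =
    s.grouped(n).flatMap(_.take(n - 1)).to(LazyList)
}
```

For the example from the question, DropNth.dropNth(LazyList(2, 3, 99, 1, 66, 3, 4), 3) yields the elements 2, 3, 1, 66, 4.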
(In my comment I missed the requirement to omit all nth.) Here is a solution with .zipWithIndex and flatMap based on @jwvh's answer:
stream.zipWithIndex.flatMap {
case (v, idx) if (idx + 1) % n > 0 => Stream(v) // keep when not nth
case _ => Stream.empty // omit nth
}
Here flatMap is used like a filter. If you need to replace the nth elements with something other than an empty Stream, this might be useful.
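As a sketch of that replacement use case, the empty branch can return any sequence instead. This assumes Scala 2.13 (LazyList instead of Stream) and n >= 1; ReplaceNth, replaceNth and the replacement parameter are hypothetical names introduced for illustration:

```scala
object ReplaceNth {
  // Assumes n >= 1. Every nth element is replaced by the elements of
  // `replacement`; an empty replacement simply removes it.
  def replaceNth[T](s: LazyList[T], n: Int, replacement: Seq[T]): LazyList[T] =
    s.zipWithIndex.flatMap {
      case (v, idx) if (idx + 1) % n > 0 => LazyList(v)              // keep when not nth
      case _                             => replacement.to(LazyList) // substitute nth
    }
}
```

With a single-element replacement a plain map would do; flatMap only pays off when the replacement can have zero or several elements.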

Scala collect function

Let's say I want to print duplicates in a list with their count. So I have 3 options as shown below:
def dups(dup:List[Int]) = {
//1)
println(dup.groupBy(identity).collect { case (x,ys) if ys.lengthCompare(1) > 0 => (x,ys.size) }.toSeq)
//2)
println(dup.groupBy(identity).collect { case (x, List(_, _, _*)) => x }.map(x => (x, dup.count(y => x == y))))
//3)
println(dup.distinct.map((a:Int) => (a, dup.count((b:Int) => a == b )) ).filter( (pair: (Int,Int) ) => { pair._2 > 1 } ))
}
Questions:
-> For option 2, is there any way to name the list parameter so that it can be used to append the size of the list just like I did in option 1 using ys.size?
-> For option 1, is there any way to avoid the last call to toSeq to return a List?
-> which one of the 3 choices is more efficient by using the least amount of loops?
As an example input: List(1,1,1,2,3,4,5,5,6,100,101,101,102)
Should print: List((1,3), (5,2), (101,2))
Based on @lutzh's answer below, the best way would be to do the following:
val list: List[(Int, Int)] = dup.groupBy(identity).collect({ case (x, ys @ List(_, _, _*)) => (x, ys.size) })(breakOut)
val list2: List[(Int, Int)] = dup.groupBy(identity).collect { case (x, ys) if ys.lengthCompare(1) > 0 => (x, ys.size) }(breakOut)
For option 1 is there any way to avoid the last call to toSeq to
return a List?
collect takes a CanBuildFrom, so if you assign it to something of the desired type you can use breakOut:
import collection.breakOut
val dups: List[(Int,Int)] =
dup
.groupBy(identity)
.collect({ case (x,ys) if ys.size > 1 => (x,ys.size)} )(breakOut)
collect will create a new collection (just like map), using a Builder. Usually the return type is determined by the origin type. With breakOut you basically ignore the origin type and look for a builder for the result type. So when collect creates the resulting collection, it will already create the "right" type, and you don't have to traverse the result again to convert it.
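Note that breakOut is only available up to Scala 2.12, as far as I know; it was removed with the 2.13 collections redesign. A sketch of the same "build the target type directly" idea on 2.13, going through an iterator so that no intermediate Map is built by collect (DupsWithoutBreakOut is a hypothetical name):

```scala
object DupsWithoutBreakOut {
  def dups(dup: List[Int]): List[(Int, Int)] =
    dup
      .groupBy(identity)
      .iterator // lazy: collect below does not build an intermediate Map
      .collect { case (x, ys) if ys.lengthCompare(1) > 0 => (x, ys.size) }
      .toList   // the List is the only collection built after groupBy
}
```

The order of the resulting pairs follows the iteration order of the grouped Map, so sort the result if a particular order matters.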
For option 2, is there any way to name the list parameter so that it
can be used to append the size of the list just like I did in option 1
using ys.size?
Yes, you can bind it to a variable with @:
val dups: List[(Int,Int)] =
dup
.groupBy(identity)
.collect({ case (x, ys @ List(_, _, _*)) => (x, ys.size) } )(breakOut)
which one of the 3 choices is more efficient?
Calling dup.count on a match seems inefficient, because dup then needs to be traversed again for every match. I'd avoid that.
My guess would be that the guard (if lengthCompare(1) > 0) takes a few cycles less than the List(_, _, _*) pattern, but I haven't measured. And am not planning to.
Disclaimer: There may be a completely different (and more efficient) way of doing it that I can't think of right now. I'm only answering your specific questions.
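One such different way, assuming Scala 2.13 or later, is groupMapReduce, which counts occurrences directly instead of first collecting every group into a list. This is only a sketch; DupCounts is a hypothetical name and the final sorted call is there just to make the output order predictable:

```scala
object DupCounts {
  def dups(dup: List[Int]): List[(Int, Int)] =
    dup
      .groupMapReduce(identity)(_ => 1)(_ + _) // Map from value to number of occurrences
      .iterator
      .filter { case (_, count) => count > 1 } // keep only the duplicated values
      .toList
      .sorted
}
```

For the example input, DupCounts.dups(List(1,1,1,2,3,4,5,5,6,100,101,101,102)) gives List((1,3), (5,2), (101,2)).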

Scala - finding a specific tuple in a list

Let's say we have this list of tuples:
val data = List(('a', List(1, 0)), ('b', List(1, 1)), ('c', List(0)))
The list has this signature:
List[(Char, List[Int])]
My task is to get the "List[Int]" element from a tuple inside "data" whose key is, for instance, letter "b". If I implement a method like "findIntList(data, 'b')", then I expect List(1, 1) as a result. I have tried the following approaches:
data.foreach { elem => if (elem._1 == char) return elem._2 }
data.find(x=> x._1 == ch)
for (elem <- data) yield elem match {case (x, y: List[Bit]) => if (x == char) y}
for (x <- data) yield if (x._1 == char) x._2
With all the approaches (except Approach 1, where I employ an explicit "return"), I get either a List[Option] or List[Any] and I don't know how to extract the "List[Int]" out of it.
One of many ways:
data.toMap.get('b').get
toMap converts a list of 2-tuples into a Map from the first element of the tuples to the second. get gives you the value for the given key and returns an Option, thus you need another get to actually get the list.
Or you can use:
data.find(_._1 == 'b').get._2
Note: Only use get on an Option when you can guarantee that you'll have a Some and not a None. See http://www.scala-lang.org/api/current/index.html#scala.Option for how to use Option idiomatically.
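A sketch of what handling the Option without get could look like. The name findIntList comes from the question; FindIntList and findIntListOrEmpty are hypothetical, and falling back to an empty list is just one possible default:

```scala
object FindIntList {
  val data = List(('a', List(1, 0)), ('b', List(1, 1)), ('c', List(0)))

  // Returns None instead of throwing when the key is missing.
  def findIntList(data: List[(Char, List[Int])], key: Char): Option[List[Int]] =
    data.collectFirst { case (k, ints) if k == key => ints }

  // Falls back to an empty list (an assumed default) when the key is missing.
  def findIntListOrEmpty(data: List[(Char, List[Int])], key: Char): List[Int] =
    findIntList(data, key).getOrElse(Nil)
}
```

collectFirst stops at the first match, so it does not convert the whole list to a Map just for a single lookup.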
Update: Explanation of the result types you see with your different approaches
Approach 2: find returns an Option[(Char, List[Int])], i.e. the whole tuple wrapped in an Option rather than the list itself, because it cannot guarantee that a matching element is found.
Approach 3: here you basically do a map, i.e. you apply a function to each element of your collection. For the element you are looking for, the function returns your List[Int]; for all other elements it returns the value (), which is the Unit value, roughly equivalent to void in Java but an actual value with its own type. Since the only common supertype of List[Int] and Unit is Any, you get a List[Any] as the result.
Approach 4 is basically the same as #3
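The difference between mapping and filtering in a for comprehension can be sketched as follows. ApproachTypes and the value names are hypothetical, and the else () branch is written out explicitly to show where the Unit values come from:

```scala
object ApproachTypes {
  val data = List(('a', List(1, 0)), ('b', List(1, 1)), ('c', List(0)))

  // Mapping: every element produces a value, either a List[Int] or (),
  // so the element type widens to Any.
  val mapped: List[Any] =
    for (x <- data) yield if (x._1 == 'b') x._2 else ()

  // Filtering with a guard: non-matching elements are skipped entirely,
  // so the element type stays List[Int].
  val filtered: List[List[Int]] =
    for ((k, v) <- data if k == 'b') yield v
}
```

filtered.headOption then gives an Option[List[Int]] that can be handled safely.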
Another way is
data.toMap.apply('b')
Or with one intermediate step this is even nicer:
val m = data.toMap
m('b')
where apply is used implicitly, i.e., the last line is equivalent to
m.apply('b')
There are multiple ways of doing it. One more way:
scala> def listInt(ls:List[(Char, List[Int])],ch:Char) = ls filter (a => a._1 == ch) match {
| case Nil => List[Int]()
| case x ::xs => x._2
| }
listInt: (ls: List[(Char, List[Int])], ch: Char)List[Int]
scala> listInt(data, 'b')
res66: List[Int] = List(1, 1)
When you are sure the key exists, you can try something like this, simply adding type information:
val char = 'b'
data.collect{case (x,y:List[Int]) if x == char => y}.head
or use headOption if you're not sure the character exists:
data.collect{case (x,y:List[Int]) if x == char => y}.headOption
You can also solve this using pattern matching. Keep in mind that you need to make it recursive, though. The solution should look something like this:
def findTupleValue(tupleList: List[(Char, List[Int])], char: Char): List[Int] = tupleList match {
case (k, list) :: _ if char == k => list // head matches the key: return its list
case _ :: theRest => findTupleValue(theRest, char) // otherwise keep searching the tail
case Nil => Nil // key not found: return an empty list (assumed default; without this case the match throws a MatchError)
}
This walks your tuple list recursively: it checks whether the head element matches your condition (the key you are looking for) and, if so, returns its list; otherwise it continues with the remainder of the list.

Extract elements from one list that aren't in another

Simply, I have two lists and I need to extract the new elements added to one of them.
I have the following
val x = List(1,2,3)
val y = List(1,2,4)
val existing :List[Int]= x.map(xInstance => {
if (!y.exists(yInstance =>
yInstance == xInstance))
xInstance
})
Result :existing: List[AnyVal] = List((), (), 3)
I need to remove all other elements except the numbers with the minimum cost.
Pick a suitable data structure, and life becomes a lot easier.
scala> x.toSet -- y
res1: scala.collection.immutable.Set[Int] = Set(3)
Also beware that:
if (condition) expr1
Is shorthand for:
if (condition) expr1 else ()
Using the result of this, which will usually have the static type Any or AnyVal, is almost always an error. It's only appropriate for side-effects:
if (condition) buffer += 1
if (condition) sys.error("boom!")
retronym's solution is okay if you don't have repeated elements and you don't care about the order. However, you don't indicate that this is so.
Hence it's probably going to be most efficient to convert y to a set (not x). We'll only need to traverse the list once and will have fast O(log(n)) access to the set.
All you need is
x filterNot y.toSet
// res1: List[Int] = List(3)
edit:
also, there's a built-in method that is even easier:
x diff y
(I had a look at the implementation; it looks pretty efficient, using a HashMap to count occurrences.)
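The three suggestions agree on the example from the question but differ once x contains repeated elements, because diff is a multiset difference: it removes only as many occurrences as y contains. A sketch with a modified x (the object and value names are hypothetical):

```scala
object NewElements {
  val x = List(1, 1, 2, 3, 3)
  val y = List(1, 2, 4)

  // Set difference: duplicates and order are lost.
  val asSet: Set[Int] = x.toSet -- y

  // filterNot against a set: order and duplicates of the survivors are kept.
  val ySet: Set[Int] = y.toSet
  val filtered: List[Int] = x.filterNot(ySet)

  // diff: removes one occurrence of 1 and of 2, so the second 1 survives.
  val multisetDiff: List[Int] = x.diff(y)
}
```

Here asSet is Set(3), filtered is List(3, 3) and multisetDiff is List(1, 3, 3), so which one is right depends on whether a repeated element counts as new.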
The easy way is to use filter instead so there's nothing to remove;
val existing :List[Int] =
x.filter(xInstance => !y.exists(yInstance => yInstance == xInstance))
val existing = x.filter(d => !y.exists(_ == d))
Returns
existing: List[Int] = List(3)

Scala, make my loop more functional

I'm trying to reduce the extent to which I write Scala (2.8) like Java. Here's a simplification of a problem I came across. Can you suggest improvements on my solutions that are "more functional"?
Transform the map
val inputMap = mutable.LinkedHashMap(1->'a',2->'a',3->'b',4->'z',5->'c')
by discarding any entries with value 'z' and indexing the characters as they are encountered
First try
var outputMap = new mutable.HashMap[Char,Int]()
var counter = 0
for(kvp <- inputMap){
val character = kvp._2
if(character !='z' && !outputMap.contains(character)){
outputMap += (character -> counter)
counter += 1
}
}
Second try (not much better, but uses an immutable map and a 'foreach')
var outputMap = new immutable.HashMap[Char,Int]()
var counter = 0
inputMap.foreach{
case(number,character) => {
if(character !='z' && !outputMap.contains(character)){
outputMap += (character -> counter)
counter += 1
}
}
}
Nicer solution:
inputMap.toList.filter(_._2 != 'z').map(_._2).distinct.zipWithIndex.toMap
I find this solution slightly simpler than arjan's:
inputMap.values.filter(_ != 'z').toSeq.distinct.zipWithIndex.toMap
The individual steps:
inputMap.values // Iterable[Char] = MapLike(a, a, b, z, c)
.filter(_ != 'z') // Iterable[Char] = List(a, a, b, c)
.toSeq.distinct // Seq[Char] = List(a, b, c)
.zipWithIndex // Seq[(Char, Int)] = List((a,0), (b,1), (c,2))
.toMap // Map[Char, Int] = Map((a,0), (b,1), (c,2))
Note that your problem doesn't inherently involve a map as input, since you're just discarding the keys. If I were coding this, I'd probably write a function like
def buildIndex[T](s: Seq[T]): Map[T, Int] = s.distinct.zipWithIndex.toMap
and invoke it as
buildIndex(inputMap.values.filter(_ != 'z').toSeq)
First, if you're doing this functionally, you should use an immutable map.
Then, to get rid of something, you use the filter method:
inputMap.filter(_._2 != 'z')
and finally, to do the remapping, you can just use the values (but as a set) with zipWithIndex, which will count up from zero, and then convert back to a map:
inputMap.filter(_._2 != 'z').values.toSet.zipWithIndex.toMap
Since the order of values isn't going to be preserved anyway*, presumably it doesn't matter that the order may have been shuffled yet again with the set transformation.
Edit: There's a better solution in a similar vein; see Arjan's. Assumption (*) is wrong, since it was a LinkedHashMap. So you do need to preserve order, which Arjan's solution does.
I would create some "pipeline" like this, but it has a lot of operations and could probably be shortened. The two List.map calls could be merged into one, but I think you get the general idea.
inputMap
.toList // List((5,c), (1,a), (2,a), (3,b), (4,z))
.sorted // List((1,a), (2,a), (3,b), (4,z), (5,c))
.filterNot((x) => {x._2 == 'z'}) // List((1,a), (2,a), (3,b), (5,c))
.map(_._2) // List(a, a, b, c)
.zipWithIndex // List((a,0), (a,1), (b,2), (c,3))
.map((x)=>{(x._2+1 -> x._1)}) // List((1,a), (2,a), (3,b), (4,c))
.toMap // Map((1,a), (2,a), (3,b), (4,c))
Performing these operations on lists preserves the ordering of elements.
EDIT: I misread the OP question - thought you wanted run length encoding. Here's my take on your actual question:
val values = inputMap.values.filterNot(_ == 'z').toSet.zipWithIndex.toMap
EDIT 2: As noted in the comments, use toSeq.distinct or similar if preserving order is important.
val values = inputMap.values.filterNot(_ == 'z').toSeq.distinct.zipWithIndex.toMap
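For comparison, a sketch that stays closest to the original loop but without any var: a foldLeft threads an immutable map through the traversal and uses the map's current size as the counter. FoldIndex and index are hypothetical names:

```scala
import scala.collection.mutable

object FoldIndex {
  val inputMap = mutable.LinkedHashMap(1 -> 'a', 2 -> 'a', 3 -> 'b', 4 -> 'z', 5 -> 'c')

  // acc.size equals the number of distinct characters seen so far,
  // so it plays the role of the counter variable in the loop.
  val index: Map[Char, Int] =
    inputMap.values.foldLeft(Map.empty[Char, Int]) { (acc, c) =>
      if (c == 'z' || acc.contains(c)) acc
      else acc + (c -> acc.size)
    }
}
```

This traverses the values once and relies on the LinkedHashMap's insertion order, just as the imperative version does.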
In my experience, maps and functional languages do not play nice. You'll note that all answers so far in one way or another involve turning the map into a list, filtering the list, and then turning the list back into a map.
I think this is due to maps being mutable data structures by nature. Consider that when building a list, the underlying structure of the list does not change when you add a new element, and for a true linked list adding an element is a constant-time O(1) operation. For a map, by contrast, the internal structure can change vastly when a new element is added, i.e. when the load factor becomes too high and the add algorithm resizes the map. In this way a functional language cannot just create a series of values and pop them into a map as it goes along, due to the possible side effects of introducing a new key/value pair.
That said, I still think there should be better support for filtering, mapping and folding/reducing maps. Since we start with a map, we know the maximum size of the map and it should be easy to create a new one.
If you want to get to grips with functional programming, then I'd recommend steering clear of maps to start with. Stick with the things that functional languages were designed for -- list manipulation.