Scala: indices of values in one list which are not in a second list

I am trying to find the indices of elements in one Scala list which are not present in a second list (assume the second list has distinct elements so that you do not need to invoke toSet on it). The best way I found is:
val ls = List("a", "b", "c") // the list
val excl = List("c", "d") // the list of items to exclude
val ixs = ls.zipWithIndex
  .filterNot { p => excl.contains(p._1) }
  .map { p => p._2 } // the list of indices
However, I feel there should be a more direct method. Any hints?

Seems OK to me. It's a bit more elegant as a for-comprehension, perhaps:
for ((e,i) <- ls.zipWithIndex if !excl.contains(e)) yield i
And for efficiency, you might want to make excl into a Set anyway:
val exclSet = excl.toSet
for ((e,i) <- ls.zipWithIndex if !exclSet(e)) yield i
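For reference, running this against the question's sample data gives:
val ls = List("a", "b", "c")
val excl = List("c", "d")
val exclSet = excl.toSet
val ixs = for ((e, i) <- ls.zipWithIndex if !exclSet(e)) yield i
// ixs == List(0, 1): "a" and "b" are kept, "c" is excluded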

One idea would be this:
(ls.zipWithIndex.toMap -- excl).values
This only works, however, if you are not interested in all positions when an element occurs multiple times in the list; handling that properly would need a multimap, which Scala does not have in the standard library.
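To illustrate that caveat, a quick sketch with a list containing a duplicate:
val dup = List("a", "b", "a")
(dup.zipWithIndex.toMap -- List("b")).values
// Iterable(2): toMap kept only the last index of "a", so index 0 is lost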
Another version would be to use a partial function, converting the second list to a set first (unless it is really small, lookup in a set will be much faster):
val set = excl.toSet
ls.zipWithIndex.collect{case (x,y) if !set(x) => y}

Related

How to add a list of elements into another list at a certain index without removing or replacing any elements

I'm learning Scala right now to prepare for college.
I want to add a list into another list at a certain index without replacing elements at that index. For instance, if I have an initial list:
var list1: List[Int] = List(1,2,3,4)
I want to add List(4,1,5) into it so that it will become:
var list1: List[Int] = List(1,2,4,1,5,3,4)
Edit: I've tried creating new lists by combining the head of the first list, the list I want to add, and the tail of the first list to return a brand-new list.
That works, but I was wondering if there were any more efficient and "smarter" ways. I've done some research on insert, but I'm not sure it satisfies what I'm trying to do, as I don't understand insert fully.
A key part of learning Scala is learning the standard library, which has a rich set of classes and methods to handle a lot of standard operations. In this case the splitAt method on a collection is going to help:
var list1 = List(1, 2, 3, 4)
val list2 = List(4, 1, 5)
val (pre, post) = list1.splitAt(2)
pre ++ list2 ++ post
This is a "smart" way of doing it because it clearly and simply shows the sequence of operations that is being done, making the code easier to write and easier to maintain.
Note that this is safe in the case where the initial list is shorter than 2 elements because splitAt takes care of this and just returns the initial list in pre and leaves post empty.
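For example, a quick check of that edge case:
List(1).splitAt(2) // (List(1), List()): no exception, even though the list has fewer than 2 elements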
Scala collections have a patch member, which can replace or insert elements:
var list1 = List(1,2,3,4)
list1.patch(2, List(4,1,5), 0)
Note: inserting elements is done by telling the collection to replace 0 elements.
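To make the replace-versus-insert distinction concrete, here is a small sketch (the values are illustrative):
val xs = List(1, 2, 3, 4)
xs.patch(2, List(9, 9), 2) // List(1, 2, 9, 9): replaces the two elements at index 2
xs.patch(2, List(9, 9), 0) // List(1, 2, 9, 9, 3, 4): pure insertion, nothing replaced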
def insertListAtIndex(index: Int, original: List[Int], appendedList: List[Int]): List[Int] = {
  original.zipWithIndex.flatMap { case (elem, ind) =>
    if (ind == index)
      appendedList :+ elem // insert the new list just before the element at this index
      // appendedList ::: List(elem) is an equivalent, more efficient concatenation
    else
      List(elem)
  }
}
val list1: List[Int] = List(1, 2, 3, 4)
val list2: List[Int] = List(4, 1, 5)
val expectedResult: List[Int] = List(1, 2, 4, 1, 5, 3, 4)
val insertAtIndex = 2
assert(insertListAtIndex(insertAtIndex, list1, list2) == expectedResult)
Here is another solution; it's more flexible and functional. The idea is to zip the list with its index, move through each element, and when you arrive at the index where you want to put the other list, insert that list along with the element that was originally at that position (shown in two ways in the comments). This way you avoid unnecessary exceptions even if the index supplied is out of range. Hope this helps.
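One behaviour worth knowing about: because the insertion happens only when a zipped index matches, an index at or beyond the end of the list silently inserts nothing. A quick check using the function above:
insertListAtIndex(10, List(1, 2), List(9)) // List(1, 2): out-of-range index, nothing inserted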

How to create a map from an RDD[String] using Scala?

My file is,
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
Here there are 7 rows and 5 columns (0, 1, 2, 3, 4).
I want the output as:
Map(0 -> Set("sunny","overcast","rainy"))
Map(1 -> Set("hot","mild","cool"))
Map(2 -> Set("high","normal"))
Map(3 -> Set("false","true"))
Map(4 -> Set("yes","no"))
The output must be of type Map[Int, Set[String]].
EDIT: Rewritten to present the map-reduce version first, as it's more suited to Spark
Since this is Spark, we're probably interested in parallelism/distribution. So we need to take care to enable that.
Splitting each string into words can be done in partitions. Getting the set of values used in each column is a bit more tricky - the naive approach of initialising a set then adding every value from every row is inherently serial/local, since there's only one set (per column) we're adding the value from each row to.
However, if we have the set for some part of the rows and the set for the rest, the answer is just the union of these sets. This suggests a reduce operation where we merge sets for some subset of the rows, then merge those and so on until we have a single set.
So, the algorithm:
1. Split each row into an array of strings, then turn this into an array of sets, each holding the single string value for that column. This can all be done with one map, and distributed.
2. Reduce this using an operation that merges the sets for each column in turn. This can also be distributed.
3. Turn the single row that results into a Map.
It's no coincidence that we do a map, then a reduce, which should remind you of something :)
Here's a one-liner that produces the single row:
val data = List(
  "sunny,hot,high,FALSE,no",
  "sunny,hot,high,TRUE,no",
  "overcast,hot,high,FALSE,yes",
  "rainy,mild,high,FALSE,yes",
  "rainy,cool,normal,FALSE,yes",
  "rainy,cool,normal,TRUE,no",
  "overcast,cool,normal,TRUE,yes")
val row = data.map(_.split("\\W+").map(s => Set(s)))
  .reduce { (a, b) => (a zip b).map { case (l, r) => l ++ r } }
Converting it to a Map as the question asks:
val theMap = row.zipWithIndex.map(_.swap).toMap
Zip the list with its index, since that's what we need as the key of the map. The elements of each tuple are unfortunately in the wrong order for .toMap, so swap them. Then we have a list of (key, value) pairs, which .toMap will turn into the desired result.
These don't need to change AT ALL to work with Spark. We just need to use an RDD instead of the List. Let's convert data into an RDD just to demo this:
import org.apache.spark.{SparkConf, SparkContext} // imports needed for the Spark demo

val conf = new SparkConf().setAppName("spark-scratch").setMaster("local")
val sc = new SparkContext(conf)
val rdd = sc.makeRDD(data)
val row = rdd.map(_.split("\\W+").map(s => Set(s)))
  .reduce { (a, b) => (a zip b).map { case (l, r) => l ++ r } }
(This can be converted into a Map as before)
An earlier one-liner works neatly (transpose is exactly what's needed here) but is very difficult to distribute (transpose inherently needs to visit every row):
data.map(_.split("\\W+")).transpose.map(_.toSet)
(Omitting the conversion to Map for clarity)
1. Split each string into words.
2. Transpose the result, so we have a list of all the first words, then a list of all the second words, etc.
3. Convert each of those lists to a set.
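Putting those steps together with the Map conversion (a sketch using the local data list from above, not Spark):
val asMap = data.map(_.split("\\W+")).transpose.map(_.toSet).zipWithIndex.map(_.swap).toMap
// Map(0 -> Set(sunny, overcast, rainy), 1 -> Set(hot, mild, cool), 2 -> Set(high, normal), ...)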
Maybe this does the trick:
val a = Array(
  "sunny,hot,high,FALSE,no",
  "sunny,hot,high,TRUE,no",
  "overcast,hot,high,FALSE,yes",
  "rainy,mild,high,FALSE,yes",
  "rainy,cool,normal,FALSE,yes",
  "rainy,cool,normal,TRUE,no",
  "overcast,cool,normal,TRUE,yes")
val b = new Array[Map[String, Set[String]]](5)
for (i <- 0 to 4)
  b(i) = Map(i.toString -> (Set() ++ (for (s <- a) yield s.split(",")(i))))
println(b.mkString("\n"))

Flattening a Set of pairs of sets to one pair of sets

I have a for-comprehension with a generator from a Set[MyType]
This MyType has a lazy val variable called factsPair which returns a pair of sets:
(Set[MyFact], Set[MyFact]).
I wish to loop through all of them and unify the facts into one flattened pair (Set[MyFact], Set[MyFact]) as follows. However, I am getting No implicit view available ... and not enough arguments for flatten: implicit (asTraversable ... errors. (I am a bit new to Scala, so I'm still getting used to the error messages.)
lazy val allFacts =
  (for {
    mytype <- mytypeList
  } yield mytype.factsPair).flatten
What do I need to specify to flatten for this to work?
Scala's flatten works on nested collections of the same kind. You have a Seq[(Set[MyFact], Set[MyFact])], which can't be flattened.
I would recommend learning the foldLeft function, because it's very general and quite easy to use as soon as you get the hang of it:
lazy val allFacts = myTypeList.foldLeft((Set[MyFact](), Set[MyFact]())) {
  case (accumulator, next) =>
    val pairs1 = accumulator._1 ++ next.factsPair._1
    val pairs2 = accumulator._2 ++ next.factsPair._2
    (pairs1, pairs2)
}
The first parameter is the initial value to which the other elements are appended; we start with an empty pair of sets, initialized as (Set[MyFact](), Set[MyFact]()).
The second parameter is the function that takes the accumulator, combines the next element into it, and returns the new accumulator. Because of all the tuples it doesn't look pretty, but it works.
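As a side note, the tuples can be destructured inside the pattern match, which reads a little better; a sketch with the same behaviour:
lazy val allFacts = myTypeList.foldLeft((Set.empty[MyFact], Set.empty[MyFact])) {
  case ((lefts, rights), next) =>
    val (l, r) = next.factsPair
    (lefts ++ l, rights ++ r)
}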
You won't be able to use flatten for this, because flatten on a collection returns a collection, and a tuple is not a collection.
You can, of course, just split, flatten, and join again:
val pairs = for {
  mytype <- mytypeList
} yield mytype.factsPair
val (first, second) = pairs.unzip
val allFacts = (first.flatten, second.flatten)
A tuple isn't traversable, so you can't flatten over it. You need to return something that can be iterated over, like a List, for example:
List((1,2), (3,4)).flatten // bad
List(List(1,2), List(3,4)).flatten // good
I'd like to offer a more algebraic view. What you have here can be nicely solved using monoids. For each monoid there is a zero element and an operation to combine two elements into one.
In this case, sets form a monoid: the zero element is the empty set and the operation is set union. And if we have two monoids, their Cartesian product is also a monoid, with the operations defined pairwise (see the examples on Wikipedia).
Scalaz defines monoids for sets as well as tuples, so we don't need to do anything there. We'll just need a helper function that combines multiple monoid elements into one, which is implemented easily using folding:
def msum[A](ps: Iterable[A])(implicit m: Monoid[A]): A =
ps.foldLeft(m.zero)(m.append(_, _))
(perhaps there already is such a function in Scala, I didn't find it). Using msum we can easily define
def pairs(ps: Iterable[MyType]): (Set[MyFact], Set[MyFact]) =
msum(ps.map(_.factsPair))
using Scalaz's implicit monoids for tuples and sets.
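For reference, a minimal sketch of how this wires up with Scalaz 7, assuming its standard Set and Tuple2 monoid instances come into scope via the Scalaz._ import:
import scalaz._
import Scalaz._

def msum[A](ps: Iterable[A])(implicit m: Monoid[A]): A =
  ps.foldLeft(m.zero)((a, b) => m.append(a, b))

// the tuple monoid combines component-wise; the Set monoid is union
val combined = msum(List((Set(1), Set(2)), (Set(3), Set(4))))
// combined == (Set(1, 3), Set(2, 4))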

How to collect elements of a collection based on the result of some method?

Suppose we have a list of values sorted according to some ordering. We also have a map of elements mapped to these values. We want to obtain a collection of elements from the map in the same order as their keys are in the list. A straightforward method to do this is:
val order = Seq("a", "b", "c")
val map = Map("a" -> "aaa", "c" -> "ccc")
val elems = order.map(map.get(_)).filter(_.isDefined).map(_.get)
However, the program needs to iterate over the collection three times. Is it possible to implement this functionality more efficiently? In particular, is it possible to do this with the collect method?
Well, a standard Scala map is also a PartialFunction, so you can use "collect".
val elems = order.collect(map)
If you base it on an Option return, then this works:
order flatMap (map get)
Though, of course, order collect map is enough in this particular example.
More generally you can use views; then the collection is only iterated once and all three operations are applied as you go:
order.view.map(map.get).filter(_.isDefined).map(_.get).force
You can use flatMap for that. Here is an example:
List(1,2,3,4,5).flatMap(x => if (x%2 == 1) Some(2*x) else None)
This is equivalent to
List(1,2,3,4,5).filter(_ % 2 == 1).map(2 * _)
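The same filter-then-map can also be expressed with collect, which pattern-matches and transforms in one pass (a sketch):
List(1, 2, 3, 4, 5).collect { case x if x % 2 == 1 => 2 * x } // List(2, 6, 10)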

Scala, make my loop more functional

I'm trying to reduce the extent to which I write Scala (2.8) like Java. Here's a simplification of a problem I came across. Can you suggest improvements on my solutions that are "more functional"?
Transform the map
val inputMap = mutable.LinkedHashMap(1->'a',2->'a',3->'b',4->'z',5->'c')
by discarding any entries with value 'z' and indexing the characters as they are encountered
First try
var outputMap = new mutable.HashMap[Char, Int]()
var counter = 0
for (kvp <- inputMap) {
  val character = kvp._2
  if (character != 'z' && !outputMap.contains(character)) {
    outputMap += (character -> counter)
    counter += 1
  }
}
Second try (not much better, but uses an immutable map and a foreach):
var outputMap = new immutable.HashMap[Char, Int]()
var counter = 0
inputMap.foreach {
  case (number, character) =>
    if (character != 'z' && !outputMap.contains(character)) {
      outputMap += (character -> counter)
      counter += 1
    }
}
Nicer solution:
inputMap.toList.filter(_._2 != 'z').map(_._2).distinct.zipWithIndex.toMap
I find this solution slightly simpler than Arjan's:
inputMap.values.filter(_ != 'z').toSeq.distinct.zipWithIndex.toMap
The individual steps:
inputMap.values // Iterable[Char] = MapLike(a, a, b, z, c)
.filter(_ != 'z') // Iterable[Char] = List(a, a, b, c)
.toSeq.distinct // Seq[Char] = List(a, b, c)
.zipWithIndex // Seq[(Char, Int)] = List((a,0), (b,1), (c,2))
.toMap // Map[Char, Int] = Map((a,0), (b,1), (c,2))
Note that your problem doesn't inherently involve a map as input, since you're just discarding the keys. If I were coding this, I'd probably write a function like
def buildIndex[T](s: Seq[T]): Map[T, Int] = s.distinct.zipWithIndex.toMap
and invoke it as
buildIndex(inputMap.values.filter(_ != 'z').toSeq)
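With the example inputMap from the question, this yields:
buildIndex(inputMap.values.filter(_ != 'z').toSeq) // Map(a -> 0, b -> 1, c -> 2)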
First, if you're doing this functionally, you should use an immutable map.
Then, to get rid of something, you use the filter method:
inputMap.filter(_._2 != 'z')
and finally, to do the remapping, you can just use the values (but as a set) with zipWithIndex, which will count up from zero, and then convert back to a map:
inputMap.filter(_._2 != 'z').values.toSet.zipWithIndex.toMap
Since the order of values isn't going to be preserved anyway*, presumably it doesn't matter that the order may have been shuffled yet again with the set transformation.
Edit: There's a better solution in a similar vein; see Arjan's. Assumption (*) is wrong, since it was a LinkedHashMap. So you do need to preserve order, which Arjan's solution does.
I would create a "pipeline" like this. It has a lot of operations and could probably be shortened; the two maps could be merged into one, but I think you get the general idea:
inputMap
  .toList // List((5,c), (1,a), (2,a), (3,b), (4,z))
  .sorted // List((1,a), (2,a), (3,b), (4,z), (5,c))
  .filterNot(x => x._2 == 'z') // List((1,a), (2,a), (3,b), (5,c))
  .map(_._2) // List(a, a, b, c)
  .zipWithIndex // List((a,0), (a,1), (b,2), (c,3))
  .map(x => (x._2 + 1 -> x._1)) // List((1,a), (2,a), (3,b), (4,c))
  .toMap // Map((1,a), (2,a), (3,b), (4,c))
Performing these operations on lists keeps the ordering of elements.
EDIT: I misread the OP's question - I thought you wanted run-length encoding. Here's my take on your actual question:
val values = inputMap.values.filterNot(_ == 'z').toSet.zipWithIndex.toMap
EDIT 2: As noted in the comments, use toSeq.distinct or similar if preserving order is important.
val values = inputMap.values.filterNot(_ == 'z').toSeq.distinct.zipWithIndex.toMap
In my experience, maps and functional languages do not play nicely together. You'll note that all the answers so far in one way or another involve turning the map into a list, filtering the list, and then turning the list back into a map.
I think this is because maps are mutable data structures by nature. Consider that when building a list, the underlying structure of the list does not change when you prepend a new element, and for a true (linked) list a prepend is a constant-time O(1) operation. For a map, by contrast, the internal structure can change drastically when a new element is added, i.e. when the load factor becomes too high and the add algorithm resizes the map. A functional language therefore cannot just create a series of values and pop them into a map as it goes, because of the possible side effects of introducing a new key/value pair.
That said, I still think there should be better support for filtering, mapping and folding/reducing maps. Since we start with a map, we know the maximum size of the map and it should be easy to create a new one.
If you're wanting to get to grips with functional programming then I'd recommending steering clear of maps to start with. Stick with the things that functional languages were designed for -- list manipulation.