Scala for loop over two lists simultaneously - scala

I have a List[Message] and a List[Author] which have the same number of items, and should be ordered so that at each index, the Message is from the Author.
I also have class that we'll call here SmartMessage, with a constructor taking 2 arguments: a Message and the corresponding Author.
What I want to do, is to create a List[SmartMessage], combining the data of the 2 simple lists.
Extra question: does List preserve insertion order in Scala? Just to make sure I create List[Message] and a List[Author] with same ordering.

You could use zip:
val ms: List[Message] = ???
val as: List[Author] = ???
var sms = for ( (m, a) <- (ms zip as)) yield new SmartMessage(m, a)
If you don't like for-comprehensions you could use map:
var sms = (ms zip as).map{ case (m, a) => new SmartMessage(m, a)}
Method zip creates collection of pairs. In this case List[(Message, Author)].
You could also use zipped method on Tuple2 (and on Tuple3):
var sms = (ms, as).zipped.map{ (m, a) => new SmartMessage(m, a)}
As you can see you don't need pattern matching in map in this case.
Extra
List is Seq and Seq preserves order. See scala collections overview.
There are 3 main branches of collections: Seq, Set and Map.
Seq preserves order of elements.
Set contains no duplicate elements.
Map contains mappings from keys to values.
List in scala is linked list, so you should prepend elements to it, not append. See Performance Characteristics of scala collections.

Related

Retrieve many values associated with the given key at once using Scala

I have a map with hundreds of elements and I want to retrieve many value at once associated with the given key.
Example:
Map(t1 -> 1),(t2 -> 2),.....(t340 ->340)
I know I can use the apply method but i'm trying to retrieve like 50 values at once and this would make the code to look like:
val a = map.apply("t1","t2","t3","t4","t5","t6"...."t50)
Are there any other ways that i can retrieve many values at once using apply method or other methods of the scala map collection?
Depending on what kind of output you expect it can be done in many different ways.
If you are expecting collection, you could do something like:
val keys: List[String]
keys.flatMap(map.get) // List[Int]
where you would get a list of values ordered according to keys, but if there were no value for a key it would be skipped.
If you needed values in a collection and order would be irrelevant then
val keySet = keys.toSet
map.filter(p => keySet(p._1)).values
should be enough.
Yet another option would be:
keys.map { k =>
k -> map.get(k)
}
to avoid loosing information which values were present and which not. Though this wouldn't be much different in practice than just:
map.filter(p => keySet(p._1))
If you expected fixed number of keys, and all have to be present, then I can only see this as:
for {
a1 <- map.get(k1)
a2 <- map.get(k2)
...
an <- map.get(kn)
} yield (a1, a2, ..., an)
which would return Option of tuple. That could be written in a prettier way using Cats e.g.
(map.get(k1), map.get(k2), ..., map.get(kn)).tupled
though most extension methods (and tuples) support up to 22 arguments, so I can only see solving your problem for 50 keys with a collection. (Or very long for comprehension yielding custom 50-field case class).

Can we replace map with flatMap?

I was trying to find line with maximum words, and i wrote the following lines, to run on spark-shell:
import java.lang.Math
val counts = textFile.map(line => line.split(" ").size).reduce((a, b) => Math.max(a, b))
But since, map is one to one , and flatMap is one to either zero or anything. So i tried replacing map with flatMap, in above code. But its giving error as:
<console>:24: error: type mismatch;
found : Int
required: TraversableOnce[?]
val counts = F1.flatMap(s => s.split(" ").size).reduce((a,b)=> Math.max(a,b))
If anybody could make me understand the reason, it will really be helpful.
flatMap must return an Iterable which is clearly not what you want. You do want a map because you want to map a line to the number of words, so you want a one-to-one function that takes a line and maps it to the number of words (though you could create a collection with one element, being the size of course...).
FlatMap is meant to associate a collection to an input, for instance if you wanted to map a line to all its words you would do:
val words = textFile.flatMap(x => x.split(" "))
and that would return an RDD[String] containing all the words.
In the end, map transforms an RDD of size N into another RDD of size N (e.g. your lines to their length) whereas flatMap transforms an RDD of size N into an RDD of size P (actually an RDD of size N into an RDD of size N made of collections, all these collections are then flattened to produce the RDD of size P).
P.S.: one last word that has nothing to do with your problem, it is more efficient to do (for a string s)
val nbWords = s.split(" ").length
than call .size(). Indeed, the split method returns an array of String and arrays do not have a size method. So when you call .size() you have an implicit conversion from Array[String] to SeqLike[String] which creates new objects. But Array[T] do have a length field so there's no conversion calling length. (It's a detail but I think it's good habit though).
Any use of map can be replaced by flatMap, but the function argument has to be changed to return a single-element List: textFile.flatMap(line => List(line.split(" ").size)). This isn't a good idea: it just makes your code less understandable and less efficient.
After reading Tired of Null Pointer Exceptions? Consider Using Java SE 8's Optional!'s part about why use flatMap() rather than Map(), I have realized the truly reason why flatMap() can not replace map() is that map() is not a special case of flatMap().
It's true that flatMap() means one-to-many, but that's not the only thing flatMap() does. It can also strip outer Stream() if put it simply.
See the definations of map and flatMap:
Stream<R> map(Function<? super T, ? extends R> mapper)
Stream<R> flatMap(Function<? super T, ? extends Stream<? extends R>> mapper)
the only difference is the type of returned value in inner function. What map() returned is "Stream<'what inner function returned'>", while what flatMap() returned is just "what inner function returned".
So you can say that flatMap() can kick outer Stream() away, but map() can't. This is the most difference in my opinion, and also why map() is not just a special case of flatMap().
ps:
If you really want to make one-to-one with flatMap, then you should change it into one-to-List(one). That means you should add an outer Stream() manually which will be stripped by flatMap() later. After that you'll get the same effect as using map().(Certainly, it's clumsy. So don't do like that.)
Here are examples for Java8, but the same as Scala:
use map():
list.stream().map(line -> line.split(" ").length)
deprecated use flatMap():
list.stream().flatMap(line -> Arrays.asList(line.split(" ").length).stream())

How to create a map from a RDD[String] using scala?

My file is,
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
Here there are 7 rows & 5 columns(0,1,2,3,4)
I want the output as,
Map(0 -> Set("sunny","overcast","rainy"))
Map(1 -> Set("hot","mild","cool"))
Map(2 -> Set("high","normal"))
Map(3 -> Set("false","true"))
Map(4 -> Set("yes","no"))
The output must be the type of [Map[Int,Set[String]]]
EDIT: Rewritten to present the map-reduce version first, as it's more suited to Spark
Since this is Spark, we're probably interested in parallelism/distribution. So we need to take care to enable that.
Splitting each string into words can be done in partitions. Getting the set of values used in each column is a bit more tricky - the naive approach of initialising a set then adding every value from every row is inherently serial/local, since there's only one set (per column) we're adding the value from each row to.
However, if we have the set for some part of the rows and the set for the rest, the answer is just the union of these sets. This suggests a reduce operation where we merge sets for some subset of the rows, then merge those and so on until we have a single set.
So, the algorithm:
Split each row into an array of strings, then change this into an
array of sets of the single string value for each column - this can
all be done with one map, and distributed.
Now reduce this using an
operation that merges the set for each column in turn. This also can
be distributed
turn the single row that results into a Map
It's no coincidence that we do a map, then a reduce, which should remind you of something :)
Here's a one-liner that produces the single row:
val data = List(
"sunny,hot,high,FALSE,no",
"sunny,hot,high,TRUE,no",
"overcast,hot,high,FALSE,yes",
"rainy,mild,high,FALSE,yes",
"rainy,cool,normal,FALSE,yes",
"rainy,cool,normal,TRUE,no",
"overcast,cool,normal,TRUE,yes")
val row = data.map(_.split("\\W+").map(s=>Set(s)))
.reduce{(a, b) => (a zip b).map{case (l, r) => l ++ r}}
Converting it to a Map as the question asks:
val theMap = row.zipWithIndex.map(_.swap).toMap
Zip the list with the index, since that's what we need as the key of
the map.
The elements of each tuple are unfortunately in the wrong
order for .toMap, so swap them.
Then we have a list of (key, value)
pairs which .toMap will turn into the desired result.
These don't need to change AT ALL to work with Spark. We just need to use a RDD, instead of the List. Let's convert data into an RDD just to demo this:
val conf = new SparkConf().setAppName("spark-scratch").setMaster("local")
val sc= new SparkContext(conf)
val rdd = sc.makeRDD(data)
val row = rdd.map(_.split("\\W+").map(s=>Set(s)))
.reduce{(a, b) => (a zip b).map{case (l, r) => l ++ r}}
(This can be converted into a Map as before)
An earlier oneliner works neatly (transpose is exactly what's needed here) but is very difficult to distribute (transpose inherently needs to visit every row)
data.map(_.split("\\W+")).transpose.map(_.toSet)
(Omitting the conversion to Map for clarity)
Split each string into words.
Transpose the result, so we have a list that has a list of the first words, then a list of the second words, etc.
Convert each of those to a set.
Maybe this do the trick:
val a = Array(
"sunny,hot,high,FALSE,no",
"sunny,hot,high,TRUE,no",
"overcast,hot,high,FALSE,yes",
"rainy,mild,high,FALSE,yes",
"rainy,cool,normal,FALSE,yes",
"rainy,cool,normal,TRUE,no",
"overcast,cool,normal,TRUE,yes")
val b = new Array[Map[String, Set[String]]](5)
for (i <- 0 to 4)
b(i) = Map(i.toString -> (Set() ++ (for (s <- a) yield s.split(",")(i))) )
println(b.mkString("\n"))

Flattening a Set of pairs of sets to one pair of sets

I have a for-comprehension with a generator from a Set[MyType]
This MyType has a lazy val variable called factsPair which returns a pair of sets:
(Set[MyFact], Set[MyFact]).
I wish to loop through all of them and unify the facts into one flattened pair (Set[MyFact], Set[MyFact]) as follows, however I am getting No implicit view available ... and not enough arguments for flatten: implicit (asTraversable ... errors. (I am a bit new to Scala so still trying to get used to the errors).
lazy val allFacts =
(for {
mytype <- mytypeList
} yield mytype.factsPair).flatten
What do I need to specify to flatten for this to work?
Scala flatten works on same types. You have a Seq[(Set[MyFact], Set[MyFact])], which can't be flattened.
I would recommend learning the foldLeft function, because it's very general and quite easy to use as soon as you get the hang of it:
lazy val allFacts = myTypeList.foldLeft((Set[MyFact](), Set[MyFact]())) {
case (accumulator, next) =>
val pairs1 = accumulator._1 ++ next.factsPair._1
val pairs2 = accumulator._2 ++ next.factsPair._2
(pairs1, pairs2)
}
The first parameter takes the initial element it will append the other elements to. We start with an empty Tuple[Set[MyFact], Set[MyFact]] initialized like this: (Set[MyFact](), Set[MyFact]()).
Next we have to specify the function that takes the accumulator and appends the next element to it and returns with the new accumulator that has the next element in it. Because of all the tuples, it doesn't look nice, but works.
You won't be able to use flatten for this, because flatten on a collection returns a collection, and a tuple is not a collection.
You can, of course, just split, flatten, and join again:
val pairs = for {
mytype <- mytypeList
} yield mytype.factsPair
val (first, second) = pairs.unzip
val allFacts = (first.flatten, second.flatten)
A tuple isn't traverable, so you can't flatten over it. You need to return something that can be iterated over, like a List, for example:
List((1,2), (3,4)).flatten // bad
List(List(1,2), List(3,4)).flatten // good
I'd like to offer a more algebraic view. What you have here can be nicely solved using monoids. For each monoid there is a zero element and an operation to combine two elements into one.
In this case, sets for a monoid: the zero element is an empty set and the operation is a union. And if we have two monoids, their Cartesian product is also a monoid, where the operations are defined pairwise (see examples on Wikipedia).
Scalaz defines monoids for sets as well as tuples, so we don't need to do anything there. We'll just need a helper function that combines multiple monoid elements into one, which is implemented easily using folding:
def msum[A](ps: Iterable[A])(implicit m: Monoid[A]): A =
ps.foldLeft(m.zero)(m.append(_, _))
(perhaps there already is such a function in Scala, I didn't find it). Using msum we can easily define
def pairs(ps: Iterable[MyType]): (Set[MyFact], Set[MyFact]) =
msum(ps.map(_.factsPair))
using Scalaz's implicit monoids for tuples and sets.

How can I idiomatically "remove" a single element from a list in Scala and close the gap?

Lists are immutable in Scala, so I'm trying to figure out how I can "remove" - really, create a new collection - that element and then close the gap created in the list. This sounds to me like it would be a great place to use map, but I don't know how to get started in this instance.
Courses is a list of strings. I need this loop because I actually have several lists that I will need to remove the element at that index from (I'm using multiple lists to store data associated across lists, and I'm doing this by simply ensuring that the indices will always correspond across lists).
for (i <- 0 until courses.length){
if (input == courses(i) {
//I need a map call on each list here to remove that element
//this element is not guaranteed to be at the front or the end of the list
}
}
}
Let me add some detail to the problem. I have four lists that are associated with each other by index; one list stores the course names, one stores the time the class begins in a simple int format (ie 130), one stores either "am" or "pm", and one stores the days of the classes by int (so "MWF" evals to 1, "TR" evals to 2, etc). I don't know if having multiple this is the best or the "right" way to solve this problem, but these are all the tools I have (first-year comp sci student that hasn't programmed seriously since I was 16). I'm writing a function to remove the corresponding element from each lists, and all I know is that 1) the indices correspond and 2) the user inputs the course name. How can I remove the corresponding element from each list using filterNot? I don't think I know enough about each list to use higher order functions on them.
This is the use case of filter:
scala> List(1,2,3,4,5)
res0: List[Int] = List(1, 2, 3, 4, 5)
scala> res0.filter(_ != 2)
res1: List[Int] = List(1, 3, 4, 5)
You want to use map when you are transforming all the elements of a list.
To answer your question directly, I think you're looking for patch, for instance to remove element with index 2 ("c"):
List("a","b","c","d").patch(2, Nil, 1) // List(a, b, d)
where Nil is what we're replacing it with, and 1 is the number of characters to replace.
But, if you do this:
I have four lists that are associated with each other by index; one
list stores the course names, one stores the time the class begins in
a simple int format (ie 130), one stores either "am" or "pm", and one
stores the days of the classes by int
you're going to have a bad time. I suggest you use a case class:
case class Course(name: String, time: Int, ampm: String, day: Int)
and then store them in a Set[Course]. (Storing time and days as Ints isn't a great idea either - have a look at java.util.Calendar instead.)
First a few sidenotes:
List is not an index-based structure. All index-oriented operations on it take linear time. For index-oriented algorithms Vector is a much better candidate. In fact if your algorithm requires indexes it's a sure sign that you're really not exposing Scala's functional capabilities.
map serves for transforming a collection of items "A" to the same collection of items "B" using a passed in transformer function from a single "A" to single "B". It cannot change the number of resulting elements. Probably you've confused map with fold or reduce.
To answer on your updated question
Okay, here's a functional solution, which works effectively on lists:
val (resultCourses, resultTimeList, resultAmOrPmList, resultDateList)
= (courses, timeList, amOrPmList, dateList)
.zipped
.filterNot(_._1 == input)
.unzip4
But there's a catch. I actually came to be quite astonished to find out that functions used in this solution, which are so basic for functional languages, were not present in the standard Scala library. Scala has them for 2 and 3-ary tuples, but not the others.
To solve that you'll need to have the following implicit extensions imported.
implicit class Tuple4Zipped
[ A, B, C, D ]
( val t : (Iterable[A], Iterable[B], Iterable[C], Iterable[D]) )
extends AnyVal
{
def zipped
= t._1.toStream
.zip(t._2).zip(t._3).zip(t._4)
.map{ case (((a, b), c), d) => (a, b, c, d) }
}
implicit class IterableUnzip4
[ A, B, C, D ]
( val ts : Iterable[(A, B, C, D)] )
extends AnyVal
{
def unzip4
= ts.foldRight((List[A](), List[B](), List[C](), List[D]()))(
(a, z) => (a._1 +: z._1, a._2 +: z._2, a._3 +: z._3, a._4 +: z._4)
)
}
This implementation requires Scala 2.10 as it utilizes the new effective Value Classes feature for pimping the existing types.
I have actually included these in a small extensions library called SExt, after depending your project on which you'll be able to have them by simply adding an import sext._ statement.
Of course, if you want you can just compose these functions directly into the solution:
val (resultCourses, resultTimeList, resultAmOrPmList, resultDateList)
= courses.toStream
.zip(timeList).zip(amOrPmList).zip(dateList)
.map{ case (((a, b), c), d) => (a, b, c, d) }
.filterNot(_._1 == input)
.foldRight((List[A](), List[B](), List[C](), List[D]()))(
(a, z) => (a._1 +: z._1, a._2 +: z._2, a._3 +: z._3, a._4 +: z._4)
)
Removing and filtering List elements
In Scala you can filter the list to remove elements.
scala> val courses = List("Artificial Intelligence", "Programming Languages", "Compilers", "Networks", "Databases")
courses: List[java.lang.String] = List(Artificial Intelligence, Programming Languages, Compilers, Networks, Databases)
Let's remove a couple of classes:
courses.filterNot(p => p == "Compilers" || p == "Databases")
You can also use remove but it's deprecated in favor of filter or filterNot.
If you want to remove by an index you can associate each element in the list with an ordered index using zipWithIndex. So, courses.zipWithIndex becomes:
List[(java.lang.String, Int)] = List((Artificial Intelligence,0), (Programming Languages,1), (Compilers,2), (Networks,3), (Databases,4))
To remove the second element from this you can refer to index in the Tuple with courses.filterNot(_._2 == 1) which gives the list:
res8: List[(java.lang.String, Int)] = List((Artificial Intelligence,0), (Compilers,2), (Networks,3), (Databases,4))
Lastly, another tool is to use indexWhere to find the index of an arbitrary element.
courses.indexWhere(_ contains "Languages")
res9: Int = 1
Re your update
I'm writing a function to remove the corresponding element from each
lists, and all I know is that 1) the indices correspond and 2) the
user inputs the course name. How can I remove the corresponding
element from each list using filterNot?
Similar to Nikita's update you have to "merge" the elements of each list. So courses, meridiems, days, and times need to be put into a Tuple or class to hold the related elements. Then you can filter on an element of the Tuple or a field of the class.
Combining corresponding elements into a Tuple looks as follows with this sample data:
val courses = List(Artificial Intelligence, Programming Languages, Compilers, Networks, Databases)
val meridiems = List(am, pm, am, pm, am)
val times = List(100, 1200, 0100, 0900, 0800)
val days = List(MWF, TTH, MW, MWF, MTWTHF)
Combine them with zip:
courses zip days zip times zip meridiems
val zipped = List[(((java.lang.String, java.lang.String), java.lang.String), java.lang.String)] = List((((Artificial Intelligence,MWF),100),am), (((Programming Languages,TTH),1200),pm), (((Compilers,MW),0100),am), (((Networks,MWF),0900),pm), (((Databases,MTWTHF),0800),am))
This abomination flattens the nested Tuples to a Tuple. There are better ways.
zipped.map(x => (x._1._1._1, x._1._1._2, x._1._2, x._2)).toList
A nice list of tuples to work with.
List[(java.lang.String, java.lang.String, java.lang.String, java.lang.String)] = List((Artificial Intelligence,MWF,100,am), (Programming Languages,TTH,1200,pm), (Compilers,MW,0100,am), (Networks,MWF,0900,pm), (Databases,MTWTHF,0800,am))
Finally we can filter based on course name using filterNot. e.g. filterNot(_._1 == "Networks")
List[(java.lang.String, java.lang.String, java.lang.String, java.lang.String)] = List((Artificial Intelligence,MWF,100,am), (Programming Languages,TTH,1200,pm), (Compilers,MW,0100,am), (Databases,MTWTHF,0800,am))
The answer I am about to give might be overstepping what you have been taught so far in your course, so if that is the case I apologise.
Firstly, you are right to question whether you should have four lists - fundamentally, it sounds like what you need is an object which represents a course:
/**
* Represents a course.
* #param name the human-readable descriptor for the course
* #param time the time of day as an integer equivalent to
* 12 hour time, i.e. 1130
* #param meridiem the half of the day that the time corresponds
* to: either "am" or "pm"
* #param days an encoding of the days of the week the classes runs.
*/
case class Course(name : String, timeOfDay : Int, meridiem : String, days : Int)
with which you may define an individual course
val cs101 =
Course("CS101 - Introduction to Object-Functional Programming",
1000, "am", 1)
There are better ways to define this type (better representations of 12-hour time, a clearer way to represent the days of the week, etc), but I won't deviate from your original problem statement.
Given this, you would have a single list of courses:
val courses = List(cs101, cs402, bio101, phil101)
And if you wanted to find and remove all courses that matched a given name, you would write:
val courseToRemove = "PHIL101 - Philosophy of Beard Ownership"
courses.filterNot(course => course.name == courseToRemove)
Equivalently, using the underscore syntactic sugar in Scala for function literals:
courses.filterNot(_.name == courseToRemove)
If there was the risk that more than one course might have the same name (or that you are filtering based on some partial criteria using a regular expression or prefix match) and that you only want to remove the first occurrence, then you could define your own function to do that:
def removeFirst(courses : List[Course], courseToRemove : String) : List[Course] =
courses match {
case Nil => Nil
case head :: tail if head == courseToRemove => tail
case head :: tail => head :: removeFirst(tail)
}
Use the ListBuffer is a mutable List like a java list
var l = scala.collection.mutable.ListBuffer("a","b" ,"c")
print(l) //ListBuffer(a, b, c)
l.remove(0)
print(l) //ListBuffer(b, c)