Scala iterating through 2 collections and finding matches - scala

I have 2 collections: a is a sequence of Scala objects of class C. b is a sequence of strings. C has a string field, name, that could possibly match an item in b. What I want is to loop through a and find all c.name that matches with one of the item in b. How do I do this in Scala?

Iterating through both a and b can get expensive because one loop nested inside another yields O(n^2) time. If b is sufficiently large, you probably want to make it into a Set first to bring this down to O(n).
val bSet = b.toSet;
a.filter(c => b.contains(c.name))
I would read this as "Apply the following filter to a: for each item c in a, include it in the result if and only if the name of c is in b."

Here's the equivalent for loop with yield.
for(c <- a if b.contains(c.name)) yield c.name

Related

What does Array((1L, 2L)) do?

What does the following code do?
var pair: RDD[(Long,Long)] = sparkContext.parallelize(Array((3L, 4L)))
Does variable "pair" consist of a single element/pair? The types of 3 and 4 is "long" not "int".
It creates an Array[Tuple2[Long,Long]] with one element (3L, 4L) (aka Tuple2(3L,4L)).
Such case could be written (maybe more readable), as Array(3L -> 4L).
Also note it's better not to use mutable type like Array (rather Seq or List).
Let's decompose it:
Array(el1, el2, el3, ...) creates a new Array containing the given elements. In our case, there is a single element in the array: (3L,4L).
Writing (a,b) creates a Tuple2 value containing a and b of types A and B (not necessarily the same types). The Tuple2[A,B] type can also be written as (A,B) for clarity.
In our case, a=3L and b=4L. You might be confused by the L suffix after literals. It is there so that the compiler can differentiate between 3 as an Int and 3L as a Long. Scala also supports other such suffixes, like f or d for explicitly declaring Float or Double literals.
So you're indeed creating an Array[Tuple2[Long,Long]], also expressed as Array[(Long,Long)], which goes into your RDD.

Scala combination function issue

I have a input file like this:
The Works of Shakespeare, by William Shakespeare
Language: English
and I want to use flatMap with the combinations method to get the K-V pairs per line.
This is what I do:
var pairs = input.flatMap{line =>
line.split("[\\s*$&#/\"'\\,.:;?!\\[\\(){}<>~\\-_]+")
.filter(_.matches("[A-Za-z]+"))
.combinations(2)
.toSeq
.map{ case array => array(0) -> array(1)}
}
I got 17 pairs after this, but missed 2 of them: (by,shakespeare) and (william,shakespeare). I think there might be something wrong with the last word of the first sentence, but I don't know how to solve it, can anyone tell me?
The combinations method will not give duplicates even if the values are in the opposite order. So the values you are missing already appear in the solution in the other order.
This code will create all ordered pairs of words in the text.
for {
line <- input
t <- line.split("""\W+""").tails if t.length > 1
a = t.head
b <- t.tail
} yield a -> b
Here is the description of the tails method:
Iterates over the tails of this traversable collection. The first value will be this traversable collection and the final one will be an empty traversable collection, with the intervening values the results of successive applications of tail.

Scala for loop over two lists simultaneously

I have a List[Message] and a List[Author] which have the same number of items, and should be ordered so that at each index, the Message is from the Author.
I also have class that we'll call here SmartMessage, with a constructor taking 2 arguments: a Message and the corresponding Author.
What I want to do, is to create a List[SmartMessage], combining the data of the 2 simple lists.
Extra question: does List preserve insertion order in Scala? Just to make sure I create List[Message] and a List[Author] with same ordering.
You could use zip:
val ms: List[Message] = ???
val as: List[Author] = ???
var sms = for ( (m, a) <- (ms zip as)) yield new SmartMessage(m, a)
If you don't like for-comprehensions you could use map:
var sms = (ms zip as).map{ case (m, a) => new SmartMessage(m, a)}
Method zip creates collection of pairs. In this case List[(Message, Author)].
You could also use zipped method on Tuple2 (and on Tuple3):
var sms = (ms, as).zipped.map{ (m, a) => new SmartMessage(m, a)}
As you can see you don't need pattern matching in map in this case.
Extra
List is Seq and Seq preserves order. See scala collections overview.
There are 3 main branches of collections: Seq, Set and Map.
Seq preserves order of elements.
Set contains no duplicate elements.
Map contains mappings from keys to values.
List in scala is linked list, so you should prepend elements to it, not append. See Performance Characteristics of scala collections.

How can I idiomatically "remove" a single element from a list in Scala and close the gap?

Lists are immutable in Scala, so I'm trying to figure out how I can "remove" - really, create a new collection - that element and then close the gap created in the list. This sounds to me like it would be a great place to use map, but I don't know how to get started in this instance.
Courses is a list of strings. I need this loop because I actually have several lists that I will need to remove the element at that index from (I'm using multiple lists to store data associated across lists, and I'm doing this by simply ensuring that the indices will always correspond across lists).
for (i <- 0 until courses.length){
if (input == courses(i) {
//I need a map call on each list here to remove that element
//this element is not guaranteed to be at the front or the end of the list
}
}
}
Let me add some detail to the problem. I have four lists that are associated with each other by index; one list stores the course names, one stores the time the class begins in a simple int format (ie 130), one stores either "am" or "pm", and one stores the days of the classes by int (so "MWF" evals to 1, "TR" evals to 2, etc). I don't know if having multiple this is the best or the "right" way to solve this problem, but these are all the tools I have (first-year comp sci student that hasn't programmed seriously since I was 16). I'm writing a function to remove the corresponding element from each lists, and all I know is that 1) the indices correspond and 2) the user inputs the course name. How can I remove the corresponding element from each list using filterNot? I don't think I know enough about each list to use higher order functions on them.
This is the use case of filter:
scala> List(1,2,3,4,5)
res0: List[Int] = List(1, 2, 3, 4, 5)
scala> res0.filter(_ != 2)
res1: List[Int] = List(1, 3, 4, 5)
You want to use map when you are transforming all the elements of a list.
To answer your question directly, I think you're looking for patch, for instance to remove element with index 2 ("c"):
List("a","b","c","d").patch(2, Nil, 1) // List(a, b, d)
where Nil is what we're replacing it with, and 1 is the number of characters to replace.
But, if you do this:
I have four lists that are associated with each other by index; one
list stores the course names, one stores the time the class begins in
a simple int format (ie 130), one stores either "am" or "pm", and one
stores the days of the classes by int
you're going to have a bad time. I suggest you use a case class:
case class Course(name: String, time: Int, ampm: String, day: Int)
and then store them in a Set[Course]. (Storing time and days as Ints isn't a great idea either - have a look at java.util.Calendar instead.)
First a few sidenotes:
List is not an index-based structure. All index-oriented operations on it take linear time. For index-oriented algorithms Vector is a much better candidate. In fact if your algorithm requires indexes it's a sure sign that you're really not exposing Scala's functional capabilities.
map serves for transforming a collection of items "A" to the same collection of items "B" using a passed in transformer function from a single "A" to single "B". It cannot change the number of resulting elements. Probably you've confused map with fold or reduce.
To answer on your updated question
Okay, here's a functional solution, which works effectively on lists:
val (resultCourses, resultTimeList, resultAmOrPmList, resultDateList)
= (courses, timeList, amOrPmList, dateList)
.zipped
.filterNot(_._1 == input)
.unzip4
But there's a catch. I actually came to be quite astonished to find out that functions used in this solution, which are so basic for functional languages, were not present in the standard Scala library. Scala has them for 2 and 3-ary tuples, but not the others.
To solve that you'll need to have the following implicit extensions imported.
implicit class Tuple4Zipped
[ A, B, C, D ]
( val t : (Iterable[A], Iterable[B], Iterable[C], Iterable[D]) )
extends AnyVal
{
def zipped
= t._1.toStream
.zip(t._2).zip(t._3).zip(t._4)
.map{ case (((a, b), c), d) => (a, b, c, d) }
}
implicit class IterableUnzip4
[ A, B, C, D ]
( val ts : Iterable[(A, B, C, D)] )
extends AnyVal
{
def unzip4
= ts.foldRight((List[A](), List[B](), List[C](), List[D]()))(
(a, z) => (a._1 +: z._1, a._2 +: z._2, a._3 +: z._3, a._4 +: z._4)
)
}
This implementation requires Scala 2.10 as it utilizes the new effective Value Classes feature for pimping the existing types.
I have actually included these in a small extensions library called SExt, after depending your project on which you'll be able to have them by simply adding an import sext._ statement.
Of course, if you want you can just compose these functions directly into the solution:
val (resultCourses, resultTimeList, resultAmOrPmList, resultDateList)
= courses.toStream
.zip(timeList).zip(amOrPmList).zip(dateList)
.map{ case (((a, b), c), d) => (a, b, c, d) }
.filterNot(_._1 == input)
.foldRight((List[A](), List[B](), List[C](), List[D]()))(
(a, z) => (a._1 +: z._1, a._2 +: z._2, a._3 +: z._3, a._4 +: z._4)
)
Removing and filtering List elements
In Scala you can filter the list to remove elements.
scala> val courses = List("Artificial Intelligence", "Programming Languages", "Compilers", "Networks", "Databases")
courses: List[java.lang.String] = List(Artificial Intelligence, Programming Languages, Compilers, Networks, Databases)
Let's remove a couple of classes:
courses.filterNot(p => p == "Compilers" || p == "Databases")
You can also use remove but it's deprecated in favor of filter or filterNot.
If you want to remove by an index you can associate each element in the list with an ordered index using zipWithIndex. So, courses.zipWithIndex becomes:
List[(java.lang.String, Int)] = List((Artificial Intelligence,0), (Programming Languages,1), (Compilers,2), (Networks,3), (Databases,4))
To remove the second element from this you can refer to index in the Tuple with courses.filterNot(_._2 == 1) which gives the list:
res8: List[(java.lang.String, Int)] = List((Artificial Intelligence,0), (Compilers,2), (Networks,3), (Databases,4))
Lastly, another tool is to use indexWhere to find the index of an arbitrary element.
courses.indexWhere(_ contains "Languages")
res9: Int = 1
Re your update
I'm writing a function to remove the corresponding element from each
lists, and all I know is that 1) the indices correspond and 2) the
user inputs the course name. How can I remove the corresponding
element from each list using filterNot?
Similar to Nikita's update you have to "merge" the elements of each list. So courses, meridiems, days, and times need to be put into a Tuple or class to hold the related elements. Then you can filter on an element of the Tuple or a field of the class.
Combining corresponding elements into a Tuple looks as follows with this sample data:
val courses = List(Artificial Intelligence, Programming Languages, Compilers, Networks, Databases)
val meridiems = List(am, pm, am, pm, am)
val times = List(100, 1200, 0100, 0900, 0800)
val days = List(MWF, TTH, MW, MWF, MTWTHF)
Combine them with zip:
courses zip days zip times zip meridiems
val zipped = List[(((java.lang.String, java.lang.String), java.lang.String), java.lang.String)] = List((((Artificial Intelligence,MWF),100),am), (((Programming Languages,TTH),1200),pm), (((Compilers,MW),0100),am), (((Networks,MWF),0900),pm), (((Databases,MTWTHF),0800),am))
This abomination flattens the nested Tuples to a Tuple. There are better ways.
zipped.map(x => (x._1._1._1, x._1._1._2, x._1._2, x._2)).toList
A nice list of tuples to work with.
List[(java.lang.String, java.lang.String, java.lang.String, java.lang.String)] = List((Artificial Intelligence,MWF,100,am), (Programming Languages,TTH,1200,pm), (Compilers,MW,0100,am), (Networks,MWF,0900,pm), (Databases,MTWTHF,0800,am))
Finally we can filter based on course name using filterNot. e.g. filterNot(_._1 == "Networks")
List[(java.lang.String, java.lang.String, java.lang.String, java.lang.String)] = List((Artificial Intelligence,MWF,100,am), (Programming Languages,TTH,1200,pm), (Compilers,MW,0100,am), (Databases,MTWTHF,0800,am))
The answer I am about to give might be overstepping what you have been taught so far in your course, so if that is the case I apologise.
Firstly, you are right to question whether you should have four lists - fundamentally, it sounds like what you need is an object which represents a course:
/**
* Represents a course.
* #param name the human-readable descriptor for the course
* #param time the time of day as an integer equivalent to
* 12 hour time, i.e. 1130
* #param meridiem the half of the day that the time corresponds
* to: either "am" or "pm"
* #param days an encoding of the days of the week the classes runs.
*/
case class Course(name : String, timeOfDay : Int, meridiem : String, days : Int)
with which you may define an individual course
val cs101 =
Course("CS101 - Introduction to Object-Functional Programming",
1000, "am", 1)
There are better ways to define this type (better representations of 12-hour time, a clearer way to represent the days of the week, etc), but I won't deviate from your original problem statement.
Given this, you would have a single list of courses:
val courses = List(cs101, cs402, bio101, phil101)
And if you wanted to find and remove all courses that matched a given name, you would write:
val courseToRemove = "PHIL101 - Philosophy of Beard Ownership"
courses.filterNot(course => course.name == courseToRemove)
Equivalently, using the underscore syntactic sugar in Scala for function literals:
courses.filterNot(_.name == courseToRemove)
If there was the risk that more than one course might have the same name (or that you are filtering based on some partial criteria using a regular expression or prefix match) and that you only want to remove the first occurrence, then you could define your own function to do that:
def removeFirst(courses : List[Course], courseToRemove : String) : List[Course] =
courses match {
case Nil => Nil
case head :: tail if head == courseToRemove => tail
case head :: tail => head :: removeFirst(tail)
}
Use the ListBuffer is a mutable List like a java list
var l = scala.collection.mutable.ListBuffer("a","b" ,"c")
print(l) //ListBuffer(a, b, c)
l.remove(0)
print(l) //ListBuffer(b, c)

Nested array comprehensions in CoffeeScript

In Python
def cross(A, B):
"Cross product of elements in A and elements in B."
return [a+b for a in A for b in B]
returns an one-dimensional array if you call it with two arrays (or strings).
But in CoffeeScript
cross = (A, B) -> (a+b for a in A for b in B)
returns a two-dimensional array.
Do you think it's by design in CoffeeScript or is it a bug?
How do I flatten arrays in CoffeScript?
First I would say say that 2 array comprehensions in line is not a very maintainable pattern. So lets break it down a little.
cross = (A, B) ->
for a in A
for b in B
a+b
alert JSON.stringify(cross [1,2], [3,4])
What's happening here is that the inner creates a closure, which has its own comprehension collector. So it runs all the b's, then returns the results as an array which gets pushed onto the parent comprehension result collector. You are sort of expecting a return value from an inner loop, which is a bit funky.
Instead I would simply collect the results myself.
cross = (A, B) ->
results = []
for a in A
for b in B
results.push a + b
results
alert JSON.stringify(cross [1,2], [3,4])
Or if you still wanted to do some crazy comprehension magic:
cross = (A, B) ->
results = []
results = results.concat a+b for b in B for a in A
results
alert JSON.stringify(cross [1,2], [3,4])
Whether this is a bug in CS or not is a bit debatable, I suppose. But I would argue it's good practice to do more explicit comprehension result handling when dealing with nested iterators.
https://github.com/jashkenas/coffee-script/issues/1191