Scala - Create a tuple from another - scala

I am trying to find out how, using the map operator, I can create a tuple with a different number of fields from an existing ones.
Namely, if I have a tuple in the form (String, Int1, Int2) I want to create a tuple with 2 fields in the form of (String, Int1 + Int2), where the first field will be the same as in the original tuple and the second field will be the addition of the 2nd and the 3rd field of the original tuple.

using pattern match:
tuple match { case (a, b, c) => (a, b + c) }
map is not a member of tuple. in case the tuple is type of element in a collection, it can be used.
collection map { case (a, b, c) => (a, b + c) }

I suggest using Shyamendra Solanki's answer, however for the sake of completeness I wanted to point out that Shapeless provides map and flatMap methods for tuples

For a single tuple of type (String,Int,Int), in addition to extracting values with pattern matching as mentioned by #Shyamendra Solanki, note methods _1, _2 and _3; hence
def create(t : (String,Int,Int)) = (t._1, t._2+t._3)
is also a feasible approach, though perhaps not so readable.
For a given collection of tuples, consider also for comprehensions where
def create(xs : Seq[(String,Int,Int)]) = for ((s,i1,i2) <- xs) yield (s, i1+i2 )
conveys the desired semantics.

Related

Flatten value in paired RDD in spark

I have a paired RDD that looks like
(a1, (a2, a3))
(b1, (b2, b3))
...
I want to flatten the values to obtain
(a1, a2, a3)
(b1, b2, b3)
...
Currently I'm doing
rddData.map(x => (x._1, x._2._1, x._2._2))
Is there a better way of performing the conversion? The above solution gets ugly if value contains many elements instead of just 2.
When I'm trying to avoid all the ugly underscore number stuff that comes with tuple manipulation I like to use case notation:
rddData.map { case (a, (b, c)) => (a, b, c) }
You can also give your variables meaningful names to make your code self documenting and the use of curly braces means you have fewer nested parentheses.
EDIT:
The map { case ... } pattern is pretty compact and can be used for surprisingly deep nested tuples as long as the structure is known at compile time. If you absolutely, positively cannot know the structure of the tuple at compile time, then here is some hacky, slow code that, probably, can flatten any arbitrarily nested tuple... as long as there are no more than 23 elements in total. It works by recursivly converting each element of the tuple to a list, flatmap-ing it to a single list, then using scary reflection to convert the list back into a tuple as seen here.
def flatten(b:Product): List[Any] = {
b.productIterator.toList.flatMap {
case x: Product => flatten(x)
case y: Any => List(y)
}
}
def toTuple[Any](as:List[Any]):Product = {
val tupleClass = Class.forName("scala.Tuple" + as.size)
tupleClass.getConstructors.apply(0).newInstance(as.map(_.asInstanceOf[AnyRef]):_*).asInstanceOf[Product]
}
rddData.map(t => toTuple(flatten(t)))
There is no better way. The 1st answer is equivalent to:
val abc2 = xyz.map{ case (k, v) => (k, v._1, v._2) }
which is equivalent to your own example.

Why does Scala require pattern variables to be linear?

Scala requires pattern variables to be linear, i.e. pattern
variable may not occur more than once in a pattern. Thus, this example does not compile:
def tupleTest(tuple: (Int, Int)) = tuple match {
case (a, a) => a
case _ => -1
}
But you can use two pattern variables and a guard to check equality instead:
def tupleTest(tuple: (Int, Int)) = tuple match {
case (a, b) if a == b => a
case _ => -1
}
So why does Scala require pattern variables to be linear? Are there any cases that can not be transformed like this?
Edit
It is easy to transform the first example into the second (Scala to Scala). Of all occurrences of a variable v in the pattern take the expressions that is evaluated first and assign it to the variable v. For each other occurrence introduce a new variable with a name that is not used in the current scope. For each of those variables v' add a guard v == v'. It is the same way a programmer would go (=> same efficiency). Is there any problem with this approach? I'd like to see an example that can not be transformed like this.
Because case (a, b) is basically assigning val a to _._1 and val b to _._2 (at least you can view it like that). In case of case (a, a), you cannot assign val a to both _._1 and _._2.
Actually the thing you want to do would have been looked like
case (a, `a`) => ???
as scala uses backtick to match an identifier. But unfortunately that still doesn't work as the visibility of a is given only after => (would have been fun though, I also hate writing case (a, b) if a = b =>). And the reason of this is probably just because it is harder to write a compiler that supports that

Apache-Spark : What is map(_._2) shorthand for?

I read a project's source code, found:
val sampleMBR = inputMBR.map(_._2).sample
inputMBR is a tuple.
the function map's definition is :
map[U classTag](f:T=>U):RDD[U]
it seems that map(_._2) is the shorthand for map(x => (x._2)).
Anyone can tell me rules of those shorthand ?
The _ syntax can be a bit confusing. When _ is used on its own it represents an argument in the anonymous function. So if we working on pairs:
map(_._2 + _._2) would be shorthand for map(x, y => x._2 + y._2). When _ is used as part of a function name (or value name) it has no special meaning. In this case x._2 returns the second element of a tuple (assuming x is a tuple).
collection.map(_._2) emits a second component of the tuple. Example from pure Scala (Spark RDDs work the same way):
scala> val zipped = (1 to 10).zip('a' to 'j')
zipped: scala.collection.immutable.IndexedSeq[(Int, Char)] = Vector((1,a), (2,b), (3,c), (4,d), (5,e), (6,f), (7,g), (8,h), (9,i), (10,j))
scala> val justLetters = zipped.map(_._2)
justLetters: scala.collection.immutable.IndexedSeq[Char] = Vector(a, b, c, d, e, f, g, h, i, j)
Two underscores in '_._2' are different.
First '_' is for placeholder of anonymous function; Second '_2' is member of case class Tuple.
Something like:
case class Tuple3 (_1: T1, _2: T2, _3: T3)
{...}
I have found the solutions.
First the underscore here is as placeholder.
To make a function literal even more concise, you can use underscores
as placeholders for one or more parameters, so long as each parameter
appears only one time within the function literal.
See more about underscore in Scala at What are all the uses of an underscore in Scala?.
The first '_' is referring what is mapped to and since what is mapped to is a tuple you might call any function within the tuple and one of the method is '_2' so what below tells us transform input into it's second attribute.

How can I idiomatically "remove" a single element from a list in Scala and close the gap?

Lists are immutable in Scala, so I'm trying to figure out how I can "remove" - really, create a new collection - that element and then close the gap created in the list. This sounds to me like it would be a great place to use map, but I don't know how to get started in this instance.
Courses is a list of strings. I need this loop because I actually have several lists that I will need to remove the element at that index from (I'm using multiple lists to store data associated across lists, and I'm doing this by simply ensuring that the indices will always correspond across lists).
for (i <- 0 until courses.length){
if (input == courses(i) {
//I need a map call on each list here to remove that element
//this element is not guaranteed to be at the front or the end of the list
}
}
}
Let me add some detail to the problem. I have four lists that are associated with each other by index; one list stores the course names, one stores the time the class begins in a simple int format (ie 130), one stores either "am" or "pm", and one stores the days of the classes by int (so "MWF" evals to 1, "TR" evals to 2, etc). I don't know if having multiple this is the best or the "right" way to solve this problem, but these are all the tools I have (first-year comp sci student that hasn't programmed seriously since I was 16). I'm writing a function to remove the corresponding element from each lists, and all I know is that 1) the indices correspond and 2) the user inputs the course name. How can I remove the corresponding element from each list using filterNot? I don't think I know enough about each list to use higher order functions on them.
This is the use case of filter:
scala> List(1,2,3,4,5)
res0: List[Int] = List(1, 2, 3, 4, 5)
scala> res0.filter(_ != 2)
res1: List[Int] = List(1, 3, 4, 5)
You want to use map when you are transforming all the elements of a list.
To answer your question directly, I think you're looking for patch, for instance to remove element with index 2 ("c"):
List("a","b","c","d").patch(2, Nil, 1) // List(a, b, d)
where Nil is what we're replacing it with, and 1 is the number of characters to replace.
But, if you do this:
I have four lists that are associated with each other by index; one
list stores the course names, one stores the time the class begins in
a simple int format (ie 130), one stores either "am" or "pm", and one
stores the days of the classes by int
you're going to have a bad time. I suggest you use a case class:
case class Course(name: String, time: Int, ampm: String, day: Int)
and then store them in a Set[Course]. (Storing time and days as Ints isn't a great idea either - have a look at java.util.Calendar instead.)
First a few sidenotes:
List is not an index-based structure. All index-oriented operations on it take linear time. For index-oriented algorithms Vector is a much better candidate. In fact if your algorithm requires indexes it's a sure sign that you're really not exposing Scala's functional capabilities.
map serves for transforming a collection of items "A" to the same collection of items "B" using a passed in transformer function from a single "A" to single "B". It cannot change the number of resulting elements. Probably you've confused map with fold or reduce.
To answer on your updated question
Okay, here's a functional solution, which works effectively on lists:
val (resultCourses, resultTimeList, resultAmOrPmList, resultDateList)
= (courses, timeList, amOrPmList, dateList)
.zipped
.filterNot(_._1 == input)
.unzip4
But there's a catch. I actually came to be quite astonished to find out that functions used in this solution, which are so basic for functional languages, were not present in the standard Scala library. Scala has them for 2 and 3-ary tuples, but not the others.
To solve that you'll need to have the following implicit extensions imported.
implicit class Tuple4Zipped
[ A, B, C, D ]
( val t : (Iterable[A], Iterable[B], Iterable[C], Iterable[D]) )
extends AnyVal
{
def zipped
= t._1.toStream
.zip(t._2).zip(t._3).zip(t._4)
.map{ case (((a, b), c), d) => (a, b, c, d) }
}
implicit class IterableUnzip4
[ A, B, C, D ]
( val ts : Iterable[(A, B, C, D)] )
extends AnyVal
{
def unzip4
= ts.foldRight((List[A](), List[B](), List[C](), List[D]()))(
(a, z) => (a._1 +: z._1, a._2 +: z._2, a._3 +: z._3, a._4 +: z._4)
)
}
This implementation requires Scala 2.10 as it utilizes the new effective Value Classes feature for pimping the existing types.
I have actually included these in a small extensions library called SExt, after depending your project on which you'll be able to have them by simply adding an import sext._ statement.
Of course, if you want you can just compose these functions directly into the solution:
val (resultCourses, resultTimeList, resultAmOrPmList, resultDateList)
= courses.toStream
.zip(timeList).zip(amOrPmList).zip(dateList)
.map{ case (((a, b), c), d) => (a, b, c, d) }
.filterNot(_._1 == input)
.foldRight((List[A](), List[B](), List[C](), List[D]()))(
(a, z) => (a._1 +: z._1, a._2 +: z._2, a._3 +: z._3, a._4 +: z._4)
)
Removing and filtering List elements
In Scala you can filter the list to remove elements.
scala> val courses = List("Artificial Intelligence", "Programming Languages", "Compilers", "Networks", "Databases")
courses: List[java.lang.String] = List(Artificial Intelligence, Programming Languages, Compilers, Networks, Databases)
Let's remove a couple of classes:
courses.filterNot(p => p == "Compilers" || p == "Databases")
You can also use remove but it's deprecated in favor of filter or filterNot.
If you want to remove by an index you can associate each element in the list with an ordered index using zipWithIndex. So, courses.zipWithIndex becomes:
List[(java.lang.String, Int)] = List((Artificial Intelligence,0), (Programming Languages,1), (Compilers,2), (Networks,3), (Databases,4))
To remove the second element from this you can refer to index in the Tuple with courses.filterNot(_._2 == 1) which gives the list:
res8: List[(java.lang.String, Int)] = List((Artificial Intelligence,0), (Compilers,2), (Networks,3), (Databases,4))
Lastly, another tool is to use indexWhere to find the index of an arbitrary element.
courses.indexWhere(_ contains "Languages")
res9: Int = 1
Re your update
I'm writing a function to remove the corresponding element from each
lists, and all I know is that 1) the indices correspond and 2) the
user inputs the course name. How can I remove the corresponding
element from each list using filterNot?
Similar to Nikita's update you have to "merge" the elements of each list. So courses, meridiems, days, and times need to be put into a Tuple or class to hold the related elements. Then you can filter on an element of the Tuple or a field of the class.
Combining corresponding elements into a Tuple looks as follows with this sample data:
val courses = List(Artificial Intelligence, Programming Languages, Compilers, Networks, Databases)
val meridiems = List(am, pm, am, pm, am)
val times = List(100, 1200, 0100, 0900, 0800)
val days = List(MWF, TTH, MW, MWF, MTWTHF)
Combine them with zip:
courses zip days zip times zip meridiems
val zipped = List[(((java.lang.String, java.lang.String), java.lang.String), java.lang.String)] = List((((Artificial Intelligence,MWF),100),am), (((Programming Languages,TTH),1200),pm), (((Compilers,MW),0100),am), (((Networks,MWF),0900),pm), (((Databases,MTWTHF),0800),am))
This abomination flattens the nested Tuples to a Tuple. There are better ways.
zipped.map(x => (x._1._1._1, x._1._1._2, x._1._2, x._2)).toList
A nice list of tuples to work with.
List[(java.lang.String, java.lang.String, java.lang.String, java.lang.String)] = List((Artificial Intelligence,MWF,100,am), (Programming Languages,TTH,1200,pm), (Compilers,MW,0100,am), (Networks,MWF,0900,pm), (Databases,MTWTHF,0800,am))
Finally we can filter based on course name using filterNot. e.g. filterNot(_._1 == "Networks")
List[(java.lang.String, java.lang.String, java.lang.String, java.lang.String)] = List((Artificial Intelligence,MWF,100,am), (Programming Languages,TTH,1200,pm), (Compilers,MW,0100,am), (Databases,MTWTHF,0800,am))
The answer I am about to give might be overstepping what you have been taught so far in your course, so if that is the case I apologise.
Firstly, you are right to question whether you should have four lists - fundamentally, it sounds like what you need is an object which represents a course:
/**
* Represents a course.
* #param name the human-readable descriptor for the course
* #param time the time of day as an integer equivalent to
* 12 hour time, i.e. 1130
* #param meridiem the half of the day that the time corresponds
* to: either "am" or "pm"
* #param days an encoding of the days of the week the classes runs.
*/
case class Course(name : String, timeOfDay : Int, meridiem : String, days : Int)
with which you may define an individual course
val cs101 =
Course("CS101 - Introduction to Object-Functional Programming",
1000, "am", 1)
There are better ways to define this type (better representations of 12-hour time, a clearer way to represent the days of the week, etc), but I won't deviate from your original problem statement.
Given this, you would have a single list of courses:
val courses = List(cs101, cs402, bio101, phil101)
And if you wanted to find and remove all courses that matched a given name, you would write:
val courseToRemove = "PHIL101 - Philosophy of Beard Ownership"
courses.filterNot(course => course.name == courseToRemove)
Equivalently, using the underscore syntactic sugar in Scala for function literals:
courses.filterNot(_.name == courseToRemove)
If there was the risk that more than one course might have the same name (or that you are filtering based on some partial criteria using a regular expression or prefix match) and that you only want to remove the first occurrence, then you could define your own function to do that:
def removeFirst(courses : List[Course], courseToRemove : String) : List[Course] =
courses match {
case Nil => Nil
case head :: tail if head == courseToRemove => tail
case head :: tail => head :: removeFirst(tail)
}
Use the ListBuffer is a mutable List like a java list
var l = scala.collection.mutable.ListBuffer("a","b" ,"c")
print(l) //ListBuffer(a, b, c)
l.remove(0)
print(l) //ListBuffer(b, c)

difference between foldLeft and reduceLeft in Scala

I have learned the basic difference between foldLeft and reduceLeft
foldLeft:
initial value has to be passed
reduceLeft:
takes first element of the collection as initial value
throws exception if collection is empty
Is there any other difference ?
Any specific reason to have two methods with similar functionality?
Few things to mention here, before giving the actual answer:
Your question doesn't have anything to do with left, it's rather about the difference between reducing and folding
The difference is not the implementation at all, just look at the signatures.
The question doesn't have anything to do with Scala in particular, it's rather about the two concepts of functional programming.
Back to your question:
Here is the signature of foldLeft (could also have been foldRight for the point I'm going to make):
def foldLeft [B] (z: B)(f: (B, A) => B): B
And here is the signature of reduceLeft (again the direction doesn't matter here)
def reduceLeft [B >: A] (f: (B, A) => B): B
These two look very similar and thus caused the confusion. reduceLeft is a special case of foldLeft (which by the way means that you sometimes can express the same thing by using either of them).
When you call reduceLeft say on a List[Int] it will literally reduce the whole list of integers into a single value, which is going to be of type Int (or a supertype of Int, hence [B >: A]).
When you call foldLeft say on a List[Int] it will fold the whole list (imagine rolling a piece of paper) into a single value, but this value doesn't have to be even related to Int (hence [B]).
Here is an example:
def listWithSum(numbers: List[Int]) = numbers.foldLeft((List.empty[Int], 0)) {
(resultingTuple, currentInteger) =>
(currentInteger :: resultingTuple._1, currentInteger + resultingTuple._2)
}
This method takes a List[Int] and returns a Tuple2[List[Int], Int] or (List[Int], Int). It calculates the sum and returns a tuple with a list of integers and it's sum. By the way the list is returned backwards, because we used foldLeft instead of foldRight.
Watch One Fold to rule them all for a more in depth explanation.
reduceLeft is just a convenience method. It is equivalent to
list.tail.foldLeft(list.head)(_)
foldLeft is more generic, you can use it to produce something completely different than what you originally put in. Whereas reduceLeft can only produce an end result of the same type or super type of the collection type. For example:
List(1,3,5).foldLeft(0) { _ + _ }
List(1,3,5).foldLeft(List[String]()) { (a, b) => b.toString :: a }
The foldLeft will apply the closure with the last folded result (first time using initial value) and the next value.
reduceLeft on the other hand will first combine two values from the list and apply those to the closure. Next it will combine the rest of the values with the cumulative result. See:
List(1,3,5).reduceLeft { (a, b) => println("a " + a + ", b " + b); a + b }
If the list is empty foldLeft can present the initial value as a legal result. reduceLeft on the other hand does not have a legal value if it can't find at least one value in the list.
For reference, reduceLeft will error if applied to an empty container with the following error.
java.lang.UnsupportedOperationException: empty.reduceLeft
Reworking the code to use
myList foldLeft(List[String]()) {(a,b) => a+b}
is one potential option. Another is to use the reduceLeftOption variant which returns an Option wrapped result.
myList reduceLeftOption {(a,b) => a+b} match {
case None => // handle no result as necessary
case Some(v) => println(v)
}
The basic reason they are both in Scala standard library is probably because they are both in Haskell standard library (called foldl and foldl1). If reduceLeft wasn't, it would quite often be defined as a convenience method in different projects.
From Functional Programming Principles in Scala (Martin Odersky):
The function reduceLeft is defined in terms of a more general function, foldLeft.
foldLeft is like reduceLeft but takes an accumulator z, as an additional parameter, which is returned when foldLeft is called on an empty list:
(List (x1, ..., xn) foldLeft z)(op) = (...(z op x1) op ...) op x
[as opposed to reduceLeft, which throws an exception when called on an empty list.]
The course (see lecture 5.5) provides abstract definitions of these functions, which illustrates their differences, although they are very similar in their use of pattern matching and recursion.
abstract class List[T] { ...
def reduceLeft(op: (T,T)=>T) : T = this match{
case Nil => throw new Error("Nil.reduceLeft")
case x :: xs => (xs foldLeft x)(op)
}
def foldLeft[U](z: U)(op: (U,T)=>U): U = this match{
case Nil => z
case x :: xs => (xs foldLeft op(z, x))(op)
}
}
Note that foldLeft returns a value of type U, which is not necessarily the same type as List[T], but reduceLeft returns a value of the same type as the list).
To really understand what are you doing with fold/reduce,
check this: http://wiki.tcl.tk/17983
very good explanation. once you get the concept of fold,
reduce will come together with the answer above:
list.tail.foldLeft(list.head)(_)
Scala 2.13.3, Demo:
val names = List("Foo", "Bar")
println("ReduceLeft: "+ names.reduceLeft(_+_))
println("ReduceRight: "+ names.reduceRight(_+_))
println("Fold: "+ names.fold("Other")(_+_))
println("FoldLeft: "+ names.foldLeft("Other")(_+_))
println("FoldRight: "+ names.foldRight("Other")(_+_))
outputs:
ReduceLeft: FooBar
ReduceRight: FooBar
Fold: OtherFooBar
FoldLeft: OtherFooBar
FoldRight: FooBarOther