Scala List of tuples to flat list

I have a list of tuple pairs, List[(String, String)], and want to flatten it to a list of strings, List[String].

Some of the options might be:
Concatenate each pair:
list.map(t => t._1 + t._2)
Interleave them, one after the other (after your comment, it seems this is what you were asking for):
list.flatMap(t => List(t._1, t._2))
Split them into first and second elements and append the two lists:
list.map(_._1) ++ list.map(_._2)
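A quick sketch of the three options on some sample pairs, so the different results are easy to compare (the data and value names are just for illustration):

```scala
val list = List(("John", "Paul"), ("George", "Ringo"))

// Concatenate: one String per pair.
val concatenated = list.map(t => t._1 + t._2)

// Interleave: the two elements of each pair, one after the other.
val interleaved = list.flatMap(t => List(t._1, t._2))

// Split and append: all first elements, followed by all second elements.
val splitAppended = list.map(_._1) ++ list.map(_._2)
```

Only the interleaved version keeps the two elements of each pair next to each other.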

Well, you can always use flatMap as in:
list flatMap (x => List(x._1, x._2))
Although your question is a little vague.

Try:
val tt = List(("John","Paul"),("George","Ringo"))
tt.flatMap{ case (a,b) => List(a,b) }
This results in:
List(John, Paul, George, Ringo)

In general, for lists of tuples of any arity, consider this:
myTuplesList.map(_.productIterator.map(_.toString)).flatten
Note that productIterator yields the elements of a tuple typed as Any, hence we convert each value to a String here.
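A sketch of the productIterator approach on tuples whose elements are not all Strings (the triples below are made-up data). flatMap does the map and the flatten in one step; both forms give the same list:

```scala
// Made-up triples with mixed element types.
val triples = List(("John", 1940, true), ("Paul", 1942, false))

// productIterator yields each field as Any; toString turns it into a String.
val flat: List[String] = triples.flatMap(_.productIterator.map(_.toString))

// The map + flatten form from the answer gives the same result.
val flat2: List[String] = triples.map(_.productIterator.map(_.toString)).flatten
```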

See:
https://stackoverflow.com/a/43716004/4610065
In this case, with shapeless:
import shapeless.syntax.std.tuple._
List(("John","Paul"),("George","Ringo")).flatMap(_.toList)

Related

Conditionally using .reverse in the same line with Scala

I have a composition of combinators in Scala, and the last one is .top, which I could use as .top(num)(Ordering[(Int, Int)].reverse) depending on a boolean parameter.
How do I implement this composition of combinators to use or not use .reverse depending on the boolean parameter, in the same line? I mean, without creating another val to indicate whether .reverse is used?
val mostPopularHero = sparkContext
.textFile("resource/marvel/Marvel-graph.txt") // build up superhero co-appearance data
.map(countCoOccurrences) // convert to (hero ID, number of connections) RDD
.reduceByKey((x, y) => x + y) // combine entries that span more than one line
.map(x => (x._2, x._1)) // flip it from (hero ID, number of connections) to (number of connections, hero ID)
.top(num)(Ordering[(Int, Int)].reverse)
Solution 0
As nicodp has already pointed out, if you have a boolean variable b in scope, you can simply replace the expression
Ordering[(Int, Int)]
by an if-expression
if (b) Ordering[(Int, Int)] else Ordering[(Int, Int)].reverse
I have to admit that this is the shortest and clearest solution I could come up with.
However... I didn't quite like that the expression Ordering[(Int, Int)] appears in the code twice. It doesn't really matter in this case, because it's short, but what if the expression were a bit longer? Apparently, even Ruby has something for such cases.
So, I tried to come up with some ways to not repeat the subexpression Ordering[(Int, Int)]. The nicest solution would be if we had a default Id-monad implementation in the standard library, because then we could simply wrap the one value in pure, and then map it using the boolean.
But there is no Id in standard library. So, here are a few other proposals, just for the case that the expression in question becomes longer:
Solution 1
You can use blocks as expressions in Scala, so you can replace the above Ordering[(Int, Int)] with:
{val x = Ordering[(Int, Int)]; if (b) x else x.reverse}
Update: Wait! This is shorter than the version with repetition! ;)
Solution 2
Define the function that conditionally reverses an ordering, declare Ordering[(Int, Int)] as the type of the argument, and then, instead of re-typing Ordering[(Int, Int)] as an expression, use implicitly:
((x: Ordering[(Int, Int)]) => if (b) x else x.reverse)(implicitly)
Solution 3
We don't have Id, but we can abuse constructors and eliminators of other functors. For example, one could wrap the complex expression in a List or Option, then map it, then unpack the result. Here is a variant with Some:
Some(Ordering[(Int, Int)]).map{ x => if(b) x else x.reverse }.get
Ideally, this would have been Id instead of Some. Notice that Solution 1 does something similar with the default ambient monad.
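To check that Solutions 0 to 3 really are interchangeable, here is a small sketch that wraps each of them in a hypothetical helper (solution0 to solution3 are names made up for this example) and sorts the same made-up pairs with each. As in the answer, b = true keeps the natural order and b = false reverses it; plain List.sorted stands in for Spark's top so the snippet runs without a SparkContext:

```scala
// Solution 0: the if-expression, repeating Ordering[(Int, Int)].
def solution0(b: Boolean): Ordering[(Int, Int)] =
  if (b) Ordering[(Int, Int)] else Ordering[(Int, Int)].reverse

// Solution 1: a block expression binds the ordering once.
def solution1(b: Boolean): Ordering[(Int, Int)] =
  { val x = Ordering[(Int, Int)]; if (b) x else x.reverse }

// Solution 2: a typed function literal applied to `implicitly`.
def solution2(b: Boolean): Ordering[(Int, Int)] =
  ((x: Ordering[(Int, Int)]) => if (b) x else x.reverse)(implicitly)

// Solution 3: wrap in Some, map, then unwrap.
def solution3(b: Boolean): Ordering[(Int, Int)] =
  Some(Ordering[(Int, Int)]).map { x => if (b) x else x.reverse }.get

val pairs = List((3, 30), (1, 10), (2, 20))

// For each flag value, sort the same data with every variant.
val results: Map[Boolean, List[List[(Int, Int)]]] =
  List(true, false).map { b =>
    b -> List(solution0(b), solution1(b), solution2(b), solution3(b)).map(ord => pairs.sorted(ord))
  }.toMap
```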
Solution 4
Finally, if the above pattern occurs more than once in your code, it might be worth it to introduce some extra syntax to deal with it:
implicit class ReversableOrderingOps[X](ord: Ordering[X]) {
def reversedIf(b: Boolean): Ordering[X] = if (b) ord.reverse else ord
}
Now you can define orderings like this:
val myConditionHolds = true
val myOrd = Ordering[(Int, Int)] reversedIf myConditionHolds
or use it in your lengthy expression directly:
val mostPopularHero = sparkContext
.textFile("resource/marvel/Marvel-graph.txt")
.map(countCoOccurrences)
.reduceByKey((x, y) => x + y)
.map(x => (x._2, x._1))
.top(num)(Ordering[(Int, Int)] reversedIf myConditionHolds)
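Since the Spark pipeline above needs a SparkContext, here is a self-contained sketch of Solution 4 on a plain List. The top function is a hypothetical local stand-in that, like RDD.top, is assumed to return the num largest elements under the given ordering, largest first; the (connections, hero ID) pairs are made up. The implicit class is wrapped in an object only so the snippet compiles as-is:

```scala
// Extension from Solution 4, placed in an object so it can be imported anywhere.
object OrderingSyntax {
  implicit class ReversableOrderingOps[X](ord: Ordering[X]) {
    def reversedIf(b: Boolean): Ordering[X] = if (b) ord.reverse else ord
  }
}
import OrderingSyntax._

// Hypothetical local stand-in for RDD.top(num)(ord):
// the `num` largest elements according to `ord`, largest first.
def top[A](xs: Seq[A], num: Int)(ord: Ordering[A]): Seq[A] =
  xs.sorted(ord.reverse).take(num)

// Made-up (number of connections, hero ID) pairs.
val counts = List((5, 1), (12, 2), (7, 3))

val mostPopular  = top(counts, 1)(Ordering[(Int, Int)] reversedIf false)
val leastPopular = top(counts, 1)(Ordering[(Int, Int)] reversedIf true)
```

With the flag set, the reversed ordering makes top pick the hero with the fewest connections instead of the most.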
I'm not quite sure if you have access to the boolean parameter here or not, but you can work this out as follows:
.top(num)(if (booleanParameter) Ordering[(Int, Int)].reverse else Ordering[(Int, Int)])

Scala: reduceLeft with String

I have a list of Integers and I want to make a String of it.
val xs = List(1, 2, 3, 4, 5)
(xs foldLeft "") (_+_) // String = 12345
With foldLeft it works perfectly, but my question is: does it also work with reduceLeft? And if so, how?
It cannot work this way with reduceLeft. Informally you can view reduceLeft as a special case of foldLeft where the accumulated value is of the same type as the collection's elements. Because in your case the element type is Int and the accumulated value is String, there is no way to use reduceLeft in the way you used foldLeft.
However in this specific case you can simply convert all your Int elements to String up front, and then reduce:
scala> xs.map(_.toString) reduceLeft(_+_)
res5: String = 12345
Note that this will throw an exception if the list is empty. This is another difference with foldLeft, which handles the empty case just fine (because it has an explicit starting value).
This is also less efficient because we create a whole new collection (of strings) just to reduce it on the spot.
All in all, foldLeft is a much better choice here.
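A small sketch comparing the two approaches from this answer on the question's list, including the empty-list case (scala.util.Try is used only to capture the exception instead of crashing):

```scala
import scala.util.Try

val xs = List(1, 2, 3, 4, 5)

// foldLeft: the accumulator (String) differs from the element type (Int).
val viaFold = xs.foldLeft("")(_ + _)

// reduceLeft: convert to String first so accumulator and elements share a type.
val viaReduce = xs.map(_.toString).reduceLeft(_ + _)

// On an empty list, foldLeft returns its start value, while reduceLeft throws.
val emptyFold   = List.empty[Int].foldLeft("")(_ + _)
val emptyReduce = Try(List.empty[Int].map(_.toString).reduceLeft(_ + _))
```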
It takes a little bit of work to get the types right. Spelling them out, though, you could use something like:
(xs reduceLeft ((a: Any, b: Int) => a.toString + b)).toString

Adding Constant to RDD

I have a really stupid question. I know that an RDD is immutable, but is there any way to add a column of constants to an RDD?
More specifically, I have an RDD of type RDD[a: String, b: String], and I wish to add a column of 1s after it so that I have an RDD of type RDD[a: String, b: String, c: Int].
The reason is that I want to use the reduceByKey function to process these strings, and an arbitrary Int (that will be constantly updated) will help the function in reducing.
In Scala, the solution is simply to use map:
rdd.map( t => (t._1, t._2, 1))
Or
rdd.map{ case (a, b) => (a, b, 1)}
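The same idea on a plain List, so it runs without Spark (the rows are made-up data). One thing worth knowing for the stated goal: reduceByKey is only available on RDDs of key-value pairs, so a triple (a, b, 1) cannot be reduced directly; you would typically key it, e.g. as ((a, b), 1). The last step below mimics reduceByKey locally with groupBy; it is only a local analogue, not Spark's implementation:

```scala
// Plain List standing in for an RDD[(String, String)]; the data is made up.
val rows = List(("a", "x"), ("b", "y"), ("a", "x"))

// Append a constant 1 to every row, as in rdd.map { case (a, b) => (a, b, 1) }.
val withOnes = rows.map { case (a, b) => (a, b, 1) }

// Local analogue of rdd.map { case (a, b) => ((a, b), 1) }.reduceByKey(_ + _):
// key by the pair, then sum the 1s per key.
val counts = rows
  .map { case (a, b) => ((a, b), 1) }
  .groupBy(_._1)
  .map { case (key, group) => key -> group.map(_._2).sum }
```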
You can easily do it with the map function; here's an example in Python (PySpark):
rdd.map(lambda t: (t[0], t[1], 1))

How to use Scala reduceLeft on case classes?

I understand how to use reduceLeft on simple lists of integers, but attempts to use it on case class objects fail.
Assume I have:
case class LogMsg(time:Int, cat:String, msg:String)
val cList = List(LogMsg(1,"a", "bla"), LogMsg(2,"a", "bla"), LogMsg(4,"b", "bla"))
and I want to find the largest difference in time between LogMsgs.
I want to do something like:
cList.reduceLeft((a, b) => b.time - a.time)
which of course doesn't work.
The first iteration of reduceLeft compares the first two elements, which are both of type LogMsg. After that it compares the next element (LogMsg) with the result of the first iteration (Int).
Do I just have the syntax wrong or should I be doing this another way?
I'd probably do something like this:
(cList, cList.tail).zipped.map((a, b) => b.time - a.time).max
You'll need to check beforehand that cList has at least 2 elements.
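A variant of this answer that performs the length check itself by returning an Option (largestGap is a hypothetical name). It uses a plain zip in place of zipped, which is deprecated in Scala 2.13 in favour of lazyZip, and drop(1) instead of tail so that an empty list does not throw:

```scala
case class LogMsg(time: Int, cat: String, msg: String)

val cList = List(LogMsg(1, "a", "bla"), LogMsg(2, "a", "bla"), LogMsg(4, "b", "bla"))

// Pair each message with its successor. zip stops at the shorter list,
// so a list with fewer than 2 elements yields no gaps at all.
def largestGap(msgs: List[LogMsg]): Option[Int] = {
  val gaps = msgs.zip(msgs.drop(1)).map { case (a, b) => b.time - a.time }
  if (gaps.isEmpty) None else Some(gaps.max)
}
```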
reduceLeft can't be used to return the largest difference, because it always returns the element type of the List you're reducing (or a supertype of it), i.e. LogMsg in this case, and you're asking for an Int.
My try:
cList.sliding(2).map(t => t(1).time - t(0).time).max
Another one that came into my mind: since LogMsg is a case class, we can take advantage of pattern matching:
cList.sliding(2).collect {
  case List(LogMsg(a, _, _), LogMsg(b, _, _)) => b - a
}.max
I would recommend using foldLeft, which is like reduceLeft but lets you choose the initial value of the result.
val head::tail = cList
tail.foldLeft((head.time, 0)) ((a,b) => (b.time, math.max(a._2,b.time-a._1)))._2
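A quick sketch checking that the sliding and foldLeft answers agree on the sample data from the question. The foldLeft version is rewritten with a pattern match instead of val head::tail, so an empty list falls into an explicit case; returning 0 there is an assumption, mirroring the 0 the answer starts its accumulator with:

```scala
case class LogMsg(time: Int, cat: String, msg: String)

val cList = List(LogMsg(1, "a", "bla"), LogMsg(2, "a", "bla"), LogMsg(4, "b", "bla"))

// sliding(2) answer: each window holds two consecutive messages.
val viaSliding = cList.sliding(2).map(t => t(1).time - t(0).time).max

// foldLeft answer: the accumulator is (previous time, largest gap so far).
val viaFold = cList match {
  case head :: tail =>
    tail.foldLeft((head.time, 0)) { case ((prevTime, maxGap), msg) =>
      (msg.time, math.max(maxGap, msg.time - prevTime))
    }._2
  case Nil => 0 // assumption: an empty log has no gap
}
```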

difference between foldLeft and reduceLeft in Scala

I have learned the basic difference between foldLeft and reduceLeft
foldLeft:
initial value has to be passed
reduceLeft:
takes first element of the collection as initial value
throws exception if collection is empty
Is there any other difference?
Is there any specific reason to have two methods with similar functionality?
A few things to mention here, before giving the actual answer:
Your question doesn't have anything to do with the left variants; it's rather about the difference between reducing and folding.
The difference is not in the implementation at all; just look at the signatures.
The question doesn't have anything to do with Scala in particular, it's rather about the two concepts of functional programming.
Back to your question:
Here is the signature of foldLeft (could also have been foldRight for the point I'm going to make):
def foldLeft [B] (z: B)(f: (B, A) => B): B
And here is the signature of reduceLeft (again the direction doesn't matter here)
def reduceLeft [B >: A] (f: (B, A) => B): B
These two look very similar and thus caused the confusion. reduceLeft is a special case of foldLeft (which by the way means that you sometimes can express the same thing by using either of them).
When you call reduceLeft on, say, a List[Int], it will literally reduce the whole list of integers into a single value, which is going to be of type Int (or a supertype of Int, hence [B >: A]).
When you call foldLeft on, say, a List[Int], it will fold the whole list (imagine rolling up a piece of paper) into a single value, but this value doesn't even have to be related to Int (hence [B]).
Here is an example:
def listWithSum(numbers: List[Int]) = numbers.foldLeft((List.empty[Int], 0)) {
(resultingTuple, currentInteger) =>
(currentInteger :: resultingTuple._1, currentInteger + resultingTuple._2)
}
This method takes a List[Int] and returns a Tuple2[List[Int], Int], i.e. (List[Int], Int). It calculates the sum and returns a tuple containing the list of integers and its sum. By the way, the list is returned in reverse order, because we used foldLeft instead of foldRight.
Watch One Fold to rule them all for a more in depth explanation.
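A small usage sketch of listWithSum. The foldRight variant (listWithSumRight, a hypothetical name) is added only to illustrate the remark about ordering: the same combining step, run from the right, keeps the list in its original order:

```scala
// listWithSum as defined in the answer.
def listWithSum(numbers: List[Int]) = numbers.foldLeft((List.empty[Int], 0)) {
  (resultingTuple, currentInteger) =>
    (currentInteger :: resultingTuple._1, currentInteger + resultingTuple._2)
}

// Same step with foldRight; note the swapped parameter order (element, accumulator).
def listWithSumRight(numbers: List[Int]) = numbers.foldRight((List.empty[Int], 0)) {
  (currentInteger, resultingTuple) =>
    (currentInteger :: resultingTuple._1, currentInteger + resultingTuple._2)
}

val reversedWithSum = listWithSum(List(1, 2, 3))      // list comes back reversed
val inOrderWithSum  = listWithSumRight(List(1, 2, 3)) // list keeps its order
```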
reduceLeft is just a convenience method. For a non-empty list, list.reduceLeft(op) is equivalent to
list.tail.foldLeft(list.head)(op)
foldLeft is more generic: you can use it to produce something completely different from what you originally put in, whereas reduceLeft can only produce an end result of the collection's element type or a supertype of it. For example:
List(1,3,5).foldLeft(0) { _ + _ }
List(1,3,5).foldLeft(List[String]()) { (a, b) => b.toString :: a }
foldLeft applies the closure to the last folded result (the initial value on the first call) and the next value.
reduceLeft, on the other hand, first applies the closure to the first two values of the list, and then combines each remaining value with the cumulative result. See:
List(1,3,5).reduceLeft { (a, b) => println("a " + a + ", b " + b); a + b }
If the list is empty foldLeft can present the initial value as a legal result. reduceLeft on the other hand does not have a legal value if it can't find at least one value in the list.
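A sketch that records which pairs the closure actually receives, instead of printing them, so the difference between the two traversals is visible (the ListBuffer logging is only for illustration). It also evaluates the two foldLeft examples above:

```scala
import scala.collection.mutable.ListBuffer

// Record the (accumulator, element) pair passed to each closure call.
val foldCalls   = ListBuffer.empty[(Int, Int)]
val reduceCalls = ListBuffer.empty[(Int, Int)]

val folded  = List(1, 3, 5).foldLeft(0) { (a, b) => foldCalls += ((a, b)); a + b }
val reduced = List(1, 3, 5).reduceLeft { (a, b) => reduceCalls += ((a, b)); a + b }

// The second foldLeft example: the result type differs from the element type.
val strings = List(1, 3, 5).foldLeft(List[String]()) { (a, b) => b.toString :: a }
```

foldLeft calls the closure three times, starting from the initial value 0, while reduceLeft calls it twice, starting with the first two elements.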
For reference, reduceLeft throws the following exception if applied to an empty collection:
java.lang.UnsupportedOperationException: empty.reduceLeft
Reworking the code to use
myList.foldLeft("")((a, b) => a + b)
is one potential option. Another is to use the reduceLeftOption variant which returns an Option wrapped result.
myList reduceLeftOption {(a,b) => a+b} match {
case None => // handle no result as necessary
case Some(v) => println(v)
}
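A self-contained sketch of the empty-list behaviour discussed here, assuming myList holds Strings (the answer does not say); words and none are made-up names:

```scala
import scala.util.Try

val words = List("Foo", "Bar")
val none  = List.empty[String]

// reduceLeftOption wraps the result instead of throwing on an empty list.
val some  = words.reduceLeftOption(_ + _)
val empty = none.reduceLeftOption(_ + _)

// foldLeft with "" as its start value also copes with the empty case.
val folded = none.foldLeft("")(_ + _)

// Plain reduceLeft on an empty list throws UnsupportedOperationException.
val thrown = Try(none.reduceLeft(_ + _))
```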
The basic reason they are both in the Scala standard library is probably that they are both in the Haskell standard library (called foldl and foldl1). If reduceLeft weren't, it would quite often be defined as a convenience method in different projects.
From Functional Programming Principles in Scala (Martin Odersky):
The function reduceLeft is defined in terms of a more general function, foldLeft.
foldLeft is like reduceLeft but takes an accumulator z, as an additional parameter, which is returned when foldLeft is called on an empty list:
(List(x1, ..., xn) foldLeft z)(op) = (...(z op x1) op ...) op xn
[as opposed to reduceLeft, which throws an exception when called on an empty list.]
The course (see lecture 5.5) provides abstract definitions of these functions, which illustrate their differences, although they are very similar in their use of pattern matching and recursion.
abstract class List[T] { ...
def reduceLeft(op: (T,T)=>T) : T = this match{
case Nil => throw new Error("Nil.reduceLeft")
case x :: xs => (xs foldLeft x)(op)
}
def foldLeft[U](z: U)(op: (U,T)=>U): U = this match{
case Nil => z
case x :: xs => (xs foldLeft op(z, x))(op)
}
}
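A runnable sketch of the lecture definitions, written as standalone functions over the standard List instead of methods on the course's own List class (myFoldLeft and myReduceLeft are hypothetical names). The recursion follows the slides one-to-one:

```scala
// foldLeft: return the accumulator on Nil, otherwise fold the head into it and recurse.
def myFoldLeft[T, U](xs: List[T])(z: U)(op: (U, T) => U): U = xs match {
  case Nil       => z
  case x :: rest => myFoldLeft(rest)(op(z, x))(op)
}

// reduceLeft: fail on Nil, otherwise fold the tail starting from the head.
def myReduceLeft[T](xs: List[T])(op: (T, T) => T): T = xs match {
  case Nil       => throw new Error("Nil.reduceLeft")
  case x :: rest => myFoldLeft(rest)(x)(op)
}
```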
Note that foldLeft returns a value of type U, which is not necessarily the same as the element type T, whereas reduceLeft returns a value of type T, the list's element type.
To really understand what you are doing with fold/reduce, check this very good explanation:
http://wiki.tcl.tk/17983
Once you get the concept of fold, reduce will come together with the answer above:
list.tail.foldLeft(list.head)(_)
Scala 2.13.3, Demo:
val names = List("Foo", "Bar")
println("ReduceLeft: "+ names.reduceLeft(_+_))
println("ReduceRight: "+ names.reduceRight(_+_))
println("Fold: "+ names.fold("Other")(_+_))
println("FoldLeft: "+ names.foldLeft("Other")(_+_))
println("FoldRight: "+ names.foldRight("Other")(_+_))
outputs:
ReduceLeft: FooBar
ReduceRight: FooBar
Fold: OtherFooBar
FoldLeft: OtherFooBar
FoldRight: FooBarOther