Scala: How to interpret foldLeft - scala

I have two examples of foldLeft that I cannot really grasp the logic.
First example:
val donuts: List[String] = List("Plain", "Strawberry", "Glazed")
println(donuts.foldLeft("")((acc, curr) => s" $acc, $curr Donut ")) // acc for accumulated, curr for current
This will give
, Plain Donut , Strawberry Donut , Glazed Donut
Does not foldLeft accumulate value that was earlier?
I was expecting the result to be
, Plain Donut , Plain Strawberry Donut , Strawberry Glazed Donut
Second example:
def multi(num:Int,num1:Int):Int=num*10
(1 until 3).foldLeft(1)(multi)
Can someone explain what is happening in every step? I cannot see how this can become 100. Also, the second argument multi is a function. Why does it not take any input variables (num and num1)?

Does not foldLeft accumulate value that was earlier?
yes it does. It uses result of previous iteration and current element and produces new result (using the binary operator you have provided) so for the first example next steps are happening:
the start value of accumulator - ""
acc = "", curr = "Plain" -> " , Plain Donut "
acc = " , Plain Donut ", curr = "Strawberry" -> " , Plain Donut , Strawberry Donut "
acc = " , Plain Donut , Strawberry Donut ", curr = "Strawberry" -> " , Plain Donut , Strawberry Donut , Glazed Donut"
For the second example current value is simply ignored - i.e. multi can be rewritten as def multi(acc:Int, curr:Int):Int = acc*10 where curr is not actually used so foldLeft simply multiplies starting value (1) by 10 n times where n is number of elements in the sequence (i.e. (1 until 3).length which is 2).
Why does it not take any input variables (num and num1)?
foldLeft is a function which accepts a function. It accepts
a generic function which in turn accepts two parameters and returns result of the same type as the first parameter (op: (B, A) => B, where B is the the result type and A is sequence element type). multi matches this definition when B == A == Int and is passed to the foldLeft which will internally provide the input variables on the each step.

A left fold needs three things: a list to works on, an initial value, and a function.
That function takes each element of the list and applies it to the initial value. Each time, the initial value is the result of the previous fucntion application.
Summing a list is a great toy example.
val nums = List(1, 2, 3, 4)
val total = nums.foldLeft(0)((a, b) => a + b)
total is 10.
That initial value has to pass all of the state you need. Currently, you pass the accumulated string. What you should be passing is a tuple giving your function extra information to work with: the accumulated string, and a string representing the previous element in the list.
Your result would look something like:
("Plain donut, ...", "Glazed")
Then you just need to select out the accumulated string and discard the second element.

Related

How to consecutive and non-consecutive list in scala?

val keywords = List("do", "abstract","if")
val resMap = io.Source
.fromFile("src/demo/keyWord.txt")
.getLines()
.zipWithIndex
.foldLeft(Map.empty[String,Seq[Int]].withDefaultValue(Seq.empty[Int])){
case (m, (line, idx)) =>
val subMap = line.split("\\W+")
.toSeq //separate the words
.filter(keywords.contains) //keep only key words
.groupBy(identity) //make a Map w/ keyword as key
.mapValues(_.map(_ => idx+1)) //and List of line numbers as value
.withDefaultValue(Seq.empty[Int])
keywords.map(kw => (kw, m(kw) ++ subMap(kw))).toMap
}
println("keyword\t\tlines\t\tcount")
keywords.sorted.foreach{kw =>
println(kw + "\t\t" +
resMap(kw).distinct.mkString("[",",","]") + "\t\t" +
resMap(kw).length)
}
This code is not mine and i don't own it ... .using for study purpose. However, I am still learning and I am stuck at implement consecutive to nonconsecutive list, such as the word "if" is in many line and when three or more consecutive line numbers appear then they should be written with a dash in between, e.g. 20-22, but not 20, 21, 22. How can I implement? I just wanted to learn this.
output:
keyword lines count
abstract [1] 1
do [6] 1
if [14,15,16,17,18] 5
But I want the result to be such as [14-18] because word "if" is in line 14 to 18.
First off, I'll give the customary caution that SO isn't meant to be a place to crowdsource answers to homework or projects. I'll give you the benefit of the doubt that this isn't the case.
That said, I hope you gain some understanding about breaking down this problem from this suggestion:
your existing implementation has nothing in place to understand if the int values are indeed consecutive, so you are going to need to add some code that sorts the Ints returned from resMap(kw).distinct in order to set yourself up for the next steps. You can figure out how to do this.
you will then need to group the Ints by their consecutive nature. For example, if you have (14,15,16,18,19,20,22) then this really needs to be further grouped into ((14,15,16),(18,19,20),(22)). You can come up with your algorithm for this.
map over the outer collection (which is a Seq[Seq[Int]] at this point), having different handling depending on whether or not the length of the inside Seq is greater than 1. If greater than one, you can safely call head and tail to get the Ints that you need for rendering your range. Alternatively, you can more idiomatically make a for-comprehension that composes the values from headOption and tailOption to build the same range string. You said something about length of 3 in your question, so you can adjust this step to meet that need as necessary.
lastly, now you have Seq[String] looking like ("14-16","18-20","22") that you need to join together using a mkString call similar to what you already have with the square brackets
For reference, you should get further acquainted with the Scaladoc for the Seq trait:
https://www.scala-lang.org/api/2.12.8/scala/collection/Seq.html
Here's one way to go about it.
def collapseConsecutives(nums :Seq[Int]) :List[String] =
nums.foldRight((nums.last, List.empty[List[Int]])) {
case (n, (prev,acc)) if prev-n == 1 => (n, (n::acc.head) :: acc.tail)
case (n, ( _ ,acc)) => (n, List(n) :: acc)
}._2.map{ ns =>
if (ns.length < 3) ns.mkString(",") //1 or 2 non-collapsables
else s"${ns.head}-${ns.last}" //3 or more, collapsed
}
usage:
println(kw + "\t\t" +
collapseConsecutives(resMap(kw).distinct).mkString("[",",","]") + "\t\t" +
resMap(kw).length)

Scala - Use of .indexOf() and .indexWhere()

I have a tuple like the following:
(Age, List(19,17,11,3,2))
and I would like to get the position of the first element where their position in the list is greater than their value. To do this I tried to use .indexOf() and .indexWhere() but I probably can't find exactly the right syntax and so I keep getting:
value indexWhere is not a member of org.apache.spark.rdd.RDD[(String,
Iterable[Int])]
My code so far is:
val test =("Age", List(19,17,11,3,2))
test.indexWhere(_.2(_)<=_.2(_).indexOf(_.2(_)) )
I also searched the documentation here with no result: http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List
If you want to perform this for each element in an RDD, you can use RDD's mapValues (which would only map the right-hand-side of the tuple) and pass a function that uses indexWhere:
rdd.mapValues(_.zipWithIndex.indexWhere { case (v, i) => i+1 > v} + 1)
Notes:
Your example seems wrong, if you want the last matching item it should be 5 (position of 2) and not 4
You did not define what should be done when no item matches your condition, e.g. for List(0,0,0) - in this case the result would be 0 but not sure that's what you need

How do `map` and `reduce` methods work in Spark RDDs?

Following code is from the quick start guide of Apache Spark.
Can somebody explain me what is the "line" variable and where it comes from?
textFile.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)
Also, how does a value get passed into a,b?
Link to the QSG http://spark.apache.org/docs/latest/quick-start.html
First, according to your link, the textfile is created as
val textFile = sc.textFile("README.md")
such that textfile is a RDD[String] meaning it is a resilient distributed dataset of type String. The API to access is very similar to that of regular Scala collections.
So now what does this map do?
Imagine you have a list of Strings and want to convert that into a list of Ints, representing the length of each String.
val stringlist: List[String] = List("ab", "cde", "f")
val intlist: List[Int] = stringlist.map( x => x.length )
The map method expects a function. A function, that goes from String => Int. With that function, each element of the list is transformed. So the value of intlist is List( 2, 3, 1 )
Here, we have created an anonymous function from String => Int. That is x => x.length. One can even write the function more explicit as
stringlist.map( (x: String) => x.length )
If you do use write the above explicit, you can
val stringLength : (String => Int) = {
x => x.length
}
val intlist = stringlist.map( stringLength )
So, here it is absolutely evident, that stringLength is a function from String to Int.
Remark: In general, map is what makes up a so called Functor. While you provide a function from A => B, map of the functor (here List) allows you use that function also to go from List[A] => List[B]. This is called lifting.
Answers to your questions
What is the "line" variable?
As mentioned above, line is the input parameter of the function line => line.split(" ").size
More explicit
(line: String) => line.split(" ").size
Example: If line is "hello world", the function returns 2.
"hello world"
=> Array("hello", "world") // split
=> 2 // size of Array
How does a value get passed into a,b?
reduce also expects a function from (A, A) => A, where A is the type of your RDD. Lets call this function op.
What does reduce. Example:
List( 1, 2, 3, 4 ).reduce( (x,y) => x + y )
Step 1 : op( 1, 2 ) will be the first evaluation.
Start with 1, 2, that is
x is 1 and y is 2
Step 2: op( op( 1, 2 ), 3 ) - take the next element 3
Take the next element 3:
x is op(1,2) = 3 and y = 3
Step 3: op( op( op( 1, 2 ), 3 ), 4)
Take the next element 4:
x is op(op(1,2), 3 ) = op( 3,3 ) = 6 and y is 4
Result here is the sum of the list elements, 10.
Remark: In general reduce calculates
op( op( ... op(x_1, x_2) ..., x_{n-1}), x_n)
Full example
First, textfile is a RDD[String], say
TextFile
"hello Tyth"
"cool example, eh?"
"goodbye"
TextFile.map(line => line.split(" ").size)
2
3
1
TextFile.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)
3
Steps here, recall `(a, b) => if (a > b) a else b)`
- op( op(2, 3), 1) evaluates to op(3, 1), since op(2, 3) = 3
- op( 3, 1 ) = 3
Map and reduce are methods of RDD class, which has interface similar to scala collections.
What you pass to methods map and reduce are actually anonymous function (with one param in map, and with two parameters in reduce). textFile calls provided function for every element (line of text in this context) it has.
Maybe you should read some scala collection introduction first.
You can read more about RDD class API here:
https://spark.apache.org/docs/1.2.1/api/scala/#org.apache.spark.rdd.RDD
what map function does is, it takes the list of arguments and map it to some function. Similar to map function in python, if you are familiar.
Also, File is like a list of Strings. (not exactly but that's how it's being iterated)
Let's consider this is your file.
val list_a: List[String] = List("first line", "second line", "last line")
Now let's see how map function works.
We need two things, list of values which we already have and function to which we want to map this values. let's consider really simple function for understanding.
val myprint = (arg:String)=>println(arg)
this function simply takes single String argument and prints on the console.
myprint("hello world")
hello world
if we match this function to your list, it's gonna print all the lines
list_a.map(myprint)
We can write an anonymous function as mentioned below as well, which does the same thing.
list_a.map(arg=>println(arg))
in your case, line is the first line of the file. you could change the argument name as you like. for example, in above example, if I change arg to line it would work without any issue
list_a.map(line=>println(line))

Scala - List of Strings to Square Cypher String

I am following an exercise in Scala to build a square cypher. Here's an overview of the problem:
List("hello", "world", "fille", "r") is written taking the first letter from each String in the List and concatenating to the final string. Essentially, if you write them in square cypher form, you get:
hwfr
eoi
lrl
lll
ode
Which if you read from top to bottom, left to right, is the message. My expected output needs to be a List[String] that becomes List("hwfr", "eoi", ...). I don't know what methods or where to start in order to manipulate the original List in order to adhere to the form that I need. I can't map zip since zip only takes two arguments and I have an indeterminate amount of Strings. I'm not exactly sure how I might iterate over this List to get the result I need and would appreciate any suggestions or tips.
scala> val list = List("hello", "world", "fille", "rtext")
list: List[String] = List(hello, world, fille, rtext)
scala> list.transpose
res6: List[List[Char]] = List(List(h, w, f, r), List(e, o, i, t), List(l, r, l, e), List(l, l, l, x), List(o, d, e, t))
does the trick, api
Here is a version which does not care about equal word length. There should be more efficient versions, but I wanted to keep it relatively short.
Basic idea: Find out how long the longest word is (max). Since you know that, you start with index i = 0 and take the character at that position i from each string and form a string from it until you are at i = max - 1 (which is the position of the last character of the longest word. When the words are not at equal length, you have to make sure that you don't access a character which is not there.
Example: i = 1, then you get e from hello, o from world, i from fille, but accessing character 1 on r would result in an exception. That is why we check for size of the string beforehand and in that case append the empty string. if(i < elem.size) elem(i) else ""
val list = List("hello", "world", "fille", "r")
val max = list.maxBy(_.size).size //gives you the size of the longest word
val result: List[String] = (0 until max).map(i => list.foldLeft("")
((s, elem) => s + (if(i < elem.size) elem(i) else "")))(collection.breakOut)
println(result) //List(hwfr, eoi, lrl, lll, ode)
Edit:
If you still want it to be readable from left-right/top-bottom (if they are not ordered by length and you don't want to order them), you can introduce spaces. Change if(i < elem.size) elem(i) else "" to if(i < elem.size) elem(i) else " ".
List("hello", "world", "fille", "r") would become List(hwfr, eoi , lrl , lll , ode ) and List("hello", "world", "r", "fille") would become List(hwrf, eo i, lr l, ll l, od e)

How to sum the corresponding values in the List into a Tuple?

I have a list details of this type :
case class Detail(point: List[Double], cluster: Int)
val details = List(Detail(List(2.0, 10.0),1), Detail(List(2.0, 5.0),3),
Detail(List(8.0, 4.0),2), Detail(List(5.0, 8.0),2))
I want filter this list into a tuple which contains a sum of each corresponding point where the cluster is 2
So I filter this List :
details.filter(detail => detail.cluster == 2)
which returns :
List(Detail(List(8.0, 4.0),2), Detail(List(5.0, 8.0),2))
It's the summing of the corresponding values I'm having trouble with. In this example the tuple should contain (8+5, 4+8) = (13, 12)
I'm thinking to flatten the List and then sum each corresponding value but
List(details).flatten
just returns the same List
How to sum the corresponding values in the List into a Tuple ?
I could achieve this easily using a for loop and just extract the details I need into a counter but what is the functional solution ?
What do you want to happen if the lists for different Details have different lengths?
Or same length which is different from 2? Tuples are generally only used when you need a fixed in advance number of elements; you won't even be able to write a return type if you need tuples of different lengths.
Assuming that all of them are lists of the same length and you get a list in return, something like this should work (untested):
details.filter(_.cluster == 2).map(_.point).transpose.map(_.sum)
I.e. first get all points as a list of lists, transpose it so you get a list for each "coordinate", and sum each of these lists.
If you do know that each point has two coordinates, this should likely be reflected in your Point type, by using (Double, Double) instead of List[Double] and you can just fold over the list of points, which should be a bit more efficient. Look at definition of foldLeft and the standard implementation of sum in terms of foldLeft:
def sum(list: List[Int]): Int = list.foldLeft(0)((acc, x) => acc + x)
and it should be easy to do what you want.
You can use just one foldLeft with PF without filter:
details.foldLeft((0.0,0.0))({
case ((accX, accY), Detail(x :: y :: Nil, 2)) => (accX + x, accY + y)
case (acc, _) => acc
})
res1: (Double, Double) = (13.0,12.0)