Understanding map in Scala [closed] - scala

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
Please help me understand what map(_(0)) means here:
scala> a.collect
res97: Array[org.apache.spark.sql.Row] = Array([1039], [1010], [1002], [926])
scala> a.collect.map(_(0))
res98: Array[Any] = Array(1039, 1010, 1002, 926)

1. In functional programming, map applies a function to each element of a collection and returns a new collection containing the results.
Say you want to append some text to each element of an array. You can do that as follows:
scala> val data = Array("a", "b", "c")
data: Array[String] = Array(a, b, c)
scala> data.map(element => element+"-add something")
res10: Array[String] = Array(a-add something, b-add something, c-add something)
Here I name each item element and append something to it. Because the function uses its parameter only once, the name is unnecessary: _ can stand in for the parameter.
So the same map can be written as follows.
scala> data.map(_+"-add something")
res9: Array[String] = Array(a-add something, b-add something, c-add something)
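Also, note that the _ shorthand only works for short expressions in which each parameter is used exactly once; otherwise, name the parameter explicitly.
2. collection(index) is the way to access the element at position index (counting from zero) in a collection.
For example: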
scala> val collection = Array(Vector(1039), Vector(1010), Vector(1002), Vector(926))
collection: Array[scala.collection.immutable.Vector[Int]] = Array(Vector(1039), Vector(1010), Vector(1002), Vector(926))
scala> collection(0)
res13: scala.collection.immutable.Vector[Int] = Vector(1039)
So, combining #1 and #2: in your case you are mapping over the original collection and taking the first element of each item. In the example below, _.head has the same effect as _(0).
scala> collection.map(_.head)
res17: Array[Int] = Array(1039, 1010, 1002, 926)
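To make the placeholder rule concrete, here is a minimal sketch on the same data array. The value names (suffixed, joined, doubled) are made up for illustration. Each _ stands for a separate parameter, which is why a parameter that is used twice needs a name.

```scala
val data = Array("a", "b", "c")

// One underscore: a one-parameter function, same as element => element + "-x".
val suffixed = data.map(_ + "-x")

// Two underscores: a two-parameter function, same as (x, y) => x + y.
val joined = data.reduce(_ + _)

// A parameter that is used twice needs a name;
// `_ + _` would mean (x, y) => x + y, not x => x + x.
val doubled = data.map(x => x + x)
```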
Refs
https://twitter.github.io/scala_school/collections.html#map
Map, Map and flatMap in Scala

You are accessing the zeroth element of the items in the collection a. _ is a common placeholder in Scala when working with the items in a collection.
More concretely, your code is equivalent to
a.collect.map(item => item(0))
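The equivalence can be checked without Spark. In the sketch below, Vector[Any] stands in for Row (an assumption made only to keep the example self-contained). item(0) is itself shorthand for item.apply(0), so all three forms should produce the same result.

```scala
// Vector[Any] stands in for Spark's Row here (assumption, to avoid a Spark dependency).
val rows: Array[Vector[Any]] =
  Array(Vector(1039), Vector(1010), Vector(1002), Vector(926))

// Placeholder syntax.
val viaPlaceholder = rows.map(_(0))

// Explicit anonymous function.
val viaLambda = rows.map(item => item(0))

// item(0) is sugar for a call to apply.
val viaApply = rows.map(item => item.apply(0))
```

Because the elements are typed as Any, the result is an Array[Any], which matches the res98 type in the question.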


Append items to a list in a for loop using an immutable list? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed last year.
val list = List()
for(i <- 1 to 10){
list:+i
}
println(list)
This ends up giving me an empty list although it should be filled with numbers from 1 to 10? I have a theory that it creates a new list each time due to the ":" operator but I am not entirely sure. I have solved the issue using a ListBuffer instead but I want to learn how to approach such a problem using immutable lists instead. Thank you.
There is no single functional solution to this class of problem, but here are some options.
For the simple case in the question, you can do this
List.range(1,11) // List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
If you want to calculate a different value for each element based on index, use tabulate:
List.tabulate(10)(x => x*3) // List(0, 3, 6, 9, 12, 15, 18, 21, 24, 27)
(You can pass a function if the logic is more complicated than this)
If you are building a list but are not sure whether you need every element, use Option and then flatten:
def genValue(i: Int): Option[Int] = ???
List.tabulate(10)(genValue).flatten
This will discard any values where genValue returns None and extract the Int where it returns Some(???).
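As a concrete illustration of the Option case, here is a small sketch. evenDoubled is a hypothetical stand-in for genValue that keeps only even indices.

```scala
// Hypothetical stand-in for genValue: keep even indices only, doubled.
def evenDoubled(i: Int): Option[Int] =
  if (i % 2 == 0) Some(i * 2) else None

// tabulate builds a List[Option[Int]]; flatten drops the None entries
// and unwraps the Some entries.
val kept = List.tabulate(10)(evenDoubled).flatten
// kept == List(0, 4, 8, 12, 16)
```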
If each operation may return a different number of elements, use List then flatten:
def genValue(i: Int): List[Int] = ???
List.tabulate(10)(genValue).flatten
This will take all the elements from all the List values returned by genValue and put them into a single List[Int].
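A small sketch of the List case, where repeated is a hypothetical stand-in for genValue that returns a different number of elements for each index:

```scala
// Hypothetical stand-in for genValue: index i yields i copies of i.
def repeated(i: Int): List[Int] = List.fill(i)(i)

// Index 0 contributes nothing, index 1 one element, and so on.
val flat = List.tabulate(4)(repeated).flatten
// flat == List(1, 2, 2, 3, 3, 3)

// The same result in one step with flatMap.
val viaFlatMap = List.range(0, 4).flatMap(repeated)
```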
If the length of the List is not known in advance then the best solution is likely to be a recursive function. While this may seem daunting to start with, it is worth learning how to use them as they are often the cleanest way of solving a problem.
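For example, here is a minimal sketch of a recursive function whose output length is not known up front. The Collatz sequence is used purely as an illustration, and the names collatz and loop are made up. The accumulator is built by prepending and reversed once at the end.

```scala
import scala.annotation.tailrec

// Collatz sequence from `start` down to 1; assumes start >= 1.
def collatz(start: Int): List[Int] = {
  @tailrec
  def loop(n: Int, acc: List[Int]): List[Int] =
    if (n == 1) (1 :: acc).reverse            // done: restore original order
    else if (n % 2 == 0) loop(n / 2, n :: acc)
    else loop(3 * n + 1, n :: acc)

  loop(start, Nil)
}

val steps = collatz(6)
// steps == List(6, 3, 10, 5, 16, 8, 4, 2, 1)
```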
You cannot add an element to an immutable list, because that would mean mutating it.
You are right when you say:
I have a theory that it creates a new list each time
As a first step, consider:
var list = List.empty[Int]
for(i <- 1 to 10) {
list = list :+ i
}
println(list)
Note that list is now a var so that we can reassign it, but each list is still an immutable object. In fact, on each iteration we assign to the variable list a new list with one more element appended.
If you don't like using a variable, you could use a fold operation. It is not much different from the for loop above: it still constructs partial lists, adding elements one by one.
val result = (1 to 10).foldLeft(List.empty[Int]){ (partial_list, item) =>
partial_list :+ item
}
println(result)
Here is the simplified signature of :+:
def :+(elem: B): List[B]
It returns a new List with elem appended, so it does not alter the current list.
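A quick check of that behaviour (the value names original and extended are made up for illustration):

```scala
val original = List(1, 2, 3)

// :+ builds and returns a new list; the result must be captured to be of any use.
val extended = original :+ 4

// original is untouched: it is still List(1, 2, 3).
```

This is why the loop in the question prints an empty list: each list :+ i result is computed and then thrown away.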
To make this work switch to a ListBuffer i.e. something mutable:
import scala.collection.mutable.ListBuffer
val buffer = ListBuffer.empty[Int]
for(i <- 1 to 10) {
buffer += i
}
println(buffer)
If you want to keep the immutable List you could accumulate with fold:
val list = List.empty[Int]
(1 to 10)
.foldLeft(list) { (acc, value) => acc :+ value }
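One caveat about appending inside a fold: :+ on an immutable List has to copy the list, so it takes time proportional to the list's length, while prepending with :: takes constant time. A common alternative, sketched here with made-up value names, is to prepend inside the fold and reverse once at the end.

```scala
// Appending at every step: each :+ copies the accumulated list.
val viaAppend =
  (1 to 10).foldLeft(List.empty[Int])((acc, value) => acc :+ value)

// Prepending is constant time per step; one reverse at the end restores the order.
val viaPrepend =
  (1 to 10).foldLeft(List.empty[Int])((acc, value) => value :: acc).reverse
```

For ten elements the difference is negligible; it starts to matter for long lists.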

How to pass a List to function in scala? [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 1 year ago.
I am trying to pass a list to the scala function, transform it and return it.
val fun=(lst:List[Int]):List[Int]=>lst+10
error: identifier expected but integer literal found.
There are several issues with your code.
You are mixing the definition of the type of fun with its implementation.
You probably want something like this:
val fun: List[Int] => List[Int] = (lst: List[Int]) => lst + 10
Then the type of the parameter is actually redundant with the type definition of fun, so you can remove it:
val fun: List[Int] => List[Int] = lst => lst + 10
You are trying to use the + operation/method on a List[Int] and an Int. There's no such method. If you want to append an item to a list, you can use :+ instead:
val fun: List[Int] => List[Int] = lst => lst :+ 10
More about the available methods on List in the official documentation: https://www.scala-lang.org/api/2.13.x/scala/collection/immutable/List.html
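The question does not say whether the goal was to append 10 or to add 10 to every element, so here is a sketch of both readings. The names appendTen and addTenToEach are made up for illustration.

```scala
// Reading 1: append the value 10 to the end of the list.
val appendTen: List[Int] => List[Int] = lst => lst :+ 10

// Reading 2 (assumption about the intent): add 10 to every element.
val addTenToEach: List[Int] => List[Int] = lst => lst.map(_ + 10)

val appended = appendTen(List(1, 2, 3))     // List(1, 2, 3, 10)
val shifted = addTenToEach(List(1, 2, 3))   // List(11, 12, 13)
```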

how to get a non repeated element in scala val ele = List (1,2,3,4,2,3,5,6) output(1,4,5,6) only non repeated elements [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
Given
val ele = List (1,2,3,4,2,3,5,6)
I want output like this
val ele1 = List(1,4,5,6)
Only the non-repeated elements.
Put differently, you want to keep the elements whose number of occurrences is exactly one.
Luckily, Scala provides a method for each of those two keywords, filter and count:
val list = List(1, 2, 3, 4, 2, 3, 5, 6)
val result = list.filter(element => list.count(_ == element) == 1)
Note that this approach is not computationally optimal (it recounts the whole list for every element, so it is quadratic in the list length), but it's easy to read and maps directly onto the English formulation :-)
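If the list is large, one possible alternative is to count every value once up front and then filter against those counts. This is only a sketch; the names counts and nonRepeated are made up.

```scala
val ele = List(1, 2, 3, 4, 2, 3, 5, 6)

// Count each distinct value once: value -> number of occurrences.
val counts: Map[Int, Int] =
  ele.groupBy(identity).map { case (value, occurrences) => value -> occurrences.size }

// Keep the values that occur exactly once, in their original order.
val nonRepeated = ele.filter(value => counts(value) == 1)
// nonRepeated == List(1, 4, 5, 6)
```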
Here's a solution that's not terribly performant but it's pretty easy to understand.
val ele = List (1,2,3,4,2,3,5,6,2)
val distEle = ele.distinct //List(1, 2, 3, 4, 5, 6)
val rslt = distEle diff (ele diff distEle) //List(1, 4, 5, 6)
1. Get a List of all the distinct elements.
2. Remove those from the original List of all elements. This creates a List of the repeated elements.
3. Remove the repeated elements from the distinct elements.
The result preserves the order of unique elements as they appeared in the original list.

Scala - List Variable outside of foreach loop [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
The code below isn't giving me the desired output. I am getting the output of finallist as individual characters separated by commas; I was expecting lists with two values only (filename, sizeofcolumn).
val pathurl="adl://*****.azuredatalakestore.net/<folder>/<sub_folder>"
val filelist=dbutils.fs.ls(pathurl)
val newdf = df.select("path").rdd.map(r => r(0)).collect.toList
var finallist = scala.collection.mutable.ListBuffer.empty[Any]
newdf.foreach(f => {
val MasterPq = spark.read.option("header","true").option("inferSchema","true").parquet(f.toString())
val size = MasterPq.columns.length
val mergedlist = List(f.toString(), size.toString())
mergedlist.map((x => {finallist = finallist ++ x}))
})
println(finallist)
The bug in your code is that you're using the ++ method to append values to your list. This method concatenates two collections.
scala> List(1, 2) ++ List(3, 4)
res0: List[Int] = List(1, 2, 3, 4)
In Scala, a String can be treated as a sequence of characters, so you're appending each individual character to your list.
scala> List(1, 2) ++ "Hello"
res3: List[AnyVal] = List(1, 2, H, e, l, l, o)
Since you're using a mutable list, you can append values with the += method. If you just want to get your code working, then the following should be enough, but it is not a good solution.
// mergedlist.map((x => {finallist = finallist ++ x}))
mergedlist.foreach(x => finallist += x)
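The difference between the two operators can be reproduced without Spark. In this sketch the file name and column count are made-up sample values.

```scala
import scala.collection.mutable.ListBuffer

val buffer = ListBuffer.empty[Any]

// ++ concatenates collections: the String is treated as a sequence of Chars,
// and the result is a new buffer (the original is left unchanged).
val byChars = buffer ++ "ab"

// += appends its argument as a single element, mutating the buffer in place.
buffer += "ab"

// The same fix applied to a list of values (sample data, not from the question).
val finallist = ListBuffer.empty[Any]
List("file.parquet", "12").foreach(x => finallist += x)
```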
You're probably new to Scala, coming from an imperative language like Java. Scala collections do not work the way you may be used to from such languages: they are immutable by default. Instead of modifying a collection, you use functions such as map to build a new collection based on the old one.
The map function is one of the most used functions on lists. It takes as its parameter an anonymous function that takes one element and transforms it into another value. This function is applied to every element of the list, thereby building a new list. Here's an example:
scala> val list = List(1, 2, 3).map(i => i * 2)
list: List[Int] = List(2, 4, 6)
In this example, a function that multiplies integers by two is applied onto each element in the list. The results are put into the new list. Maybe this illustration helps to comprehend the process:
List(1,  2,  3)
     |   |   |
    *2  *2  *2
     ↓   ↓   ↓
List(2,  4,  6)
We could use the map function to solve your task.
We can use it to map each element in the newdf list to a tuple of (filename, number of columns).
val finallist = newdf.map { f =>
val masterPq = spark.read.option("header","true").option("inferSchema","true").parquet(f.toString())
val size = masterPq.columns.length
(f.toString(), size.toString())
}
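The shape of that solution can be tried without a Spark session. In this sketch, fakeSchemas and columnCount are hypothetical stand-ins for reading a Parquet file and counting its columns, and the paths are made-up sample data.

```scala
// Hypothetical lookup standing in for spark.read...parquet(path).columns.length.
val fakeSchemas = Map("a.parquet" -> 3, "b.parquet" -> 5)
def columnCount(path: String): Int = fakeSchemas(path)

val paths = List("a.parquet", "b.parquet")

// One tuple per input path; no mutable buffer is needed.
val summary: List[(String, Int)] = paths.map(path => (path, columnCount(path)))
// summary == List(("a.parquet", 3), ("b.parquet", 5))
```

Keeping the count as an Int rather than a String is a choice made for this sketch; the answer's code converts it with toString.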
I think this code is shorter, simpler, easier to read and just way more beautiful. I definitely recommend learning more about Scala's collections, and about immutable collections in general. Once you understand them, you'll love them!

Use of underscore and map function in dataframes/scala [duplicate]

This question already has an answer here:
Underscores in a Scala map/foreach
Closed 4 years ago.
I'm trying to understand the use of the map function and the underscore _ in the code below. keys is a List[String] and df is a DataFrame. I ran a sample and found that listOfVal is a list of Column values, but could someone help explain how this works? What does _ mean in this case, and what gets applied by the map function? Many thanks
val listOfVal = keys.map(df(_))
ps: I've read the two questions suggested but I think they are different use cases
In Scala, _ can act as a placeholder for the parameter of an anonymous function. For example:
List("A", "B", "C").map(_.toLowerCase)
// `_.toLowerCase` represents anonymous function `x => x.toLowerCase`
// res1: List[String] = List(a, b, c)
List(1, 2, 3, 4, 5).foreach(print(_))
// `print(_)` represents anonymous function `x => print(x)`
// prints: 12345 (foreach itself returns Unit)
In your sample code, keys.map(df(_)) is equivalent to:
keys.map(c => df(c))
Let's say your keys is a list of column names:
List[String]("col1", "col2", "col3")
Then it simply gets mapped to:
List[Column](df("col1"), df("col2"), df("col3"))
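The same mapping can be tried without Spark. In this sketch, the Column case class and the df method are hypothetical stand-ins for Spark's Column type and for applying a DataFrame to a column name; only the shape of the call is the point.

```scala
// Hypothetical stand-ins for Spark's Column and for df(colName).
final case class Column(name: String)
def df(name: String): Column = Column(name)

val keys = List("col1", "col2", "col3")

// Placeholder syntax, as in the question.
val viaPlaceholder = keys.map(df(_))

// The explicit anonymous function it expands to.
val viaLambda = keys.map(c => df(c))

// Passing the method itself also works in this sketch.
val viaMethod = keys.map(df)
```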