The code below isn't giving me the desired output. I am getting the output of finallist as individual characters separated by commas; I was expecting lists with two values only (filename, sizeofcolumn).
val pathurl="adl://*****.azuredatalakestore.net/<folder>/<sub_folder>"
val filelist=dbutils.fs.ls(pathurl)
val newdf = df.select("path").rdd.map(r => r(0)).collect.toList
var finallist = scala.collection.mutable.ListBuffer.empty[Any]
newdf.foreach(f => {
val MasterPq = spark.read.option("header","true").option("inferSchema","true").parquet(f.toString())
val size = MasterPq.columns.length
val mergedlist = List(f.toString(), size.toString())
mergedlist.map((x => {finallist = finallist ++ x}))
})
println(finallist)
The bug in your code is that you're using the ++ method to append values to your list. This method concatenates two collections.
scala> List(1, 2) ++ List(3, 4)
res0: List[Int] = List(1, 2, 3, 4)
In Scala, a string is treated as a sequence of characters, so you're appending each individual character to your list.
scala> List(1, 2) ++ "Hello"
res3: List[AnyVal] = List(1, 2, H, e, l, l, o)
Since you're using a mutable list, you can append values with the += method. If you just want to get your code working, then the following should be enough, but it is not a good solution.
// mergedlist.map((x => {finallist = finallist ++ x}))
mergedlist.map(x => finallist += x)
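To make the difference concrete, here is a minimal sketch (plain Scala, no Spark; the names whole and chars are only for illustration) comparing += with ++= on a ListBuffer[Any] when the value is a String:

```scala
import scala.collection.mutable.ListBuffer

// += appends the string as a single element
val whole: List[Any] = {
  val b = ListBuffer.empty[Any]
  b += "abc"
  b.toList
}

// ++= treats the string as a collection and appends every character
val chars: List[Any] = {
  val b = ListBuffer.empty[Any]
  b ++= "abc"
  b.toList
}
```

Here whole has one element, while chars has three, which is the behaviour you observed in finallist.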
You're probably new to Scala, coming from an imperative language like Java. Scala collections do not work the way you're used to from such languages. Scala's collections are immutable by default. Instead of modifying a collection, you use functions such as map to build a new list based on the old one.
The map function is one of the most used functions on lists. It takes an anonymous function as its parameter; that function takes one element and transforms it into another value. It is applied to every element of the list, thereby building a new list. Here's an example:
scala> val list = List(1, 2, 3).map(i => i * 2)
list: List[Int] = List(2, 4, 6)
In this example, a function that multiplies integers by two is applied onto each element in the list. The results are put into the new list. Maybe this illustration helps to comprehend the process:
List(1, 2, 3)
     |  |  |
    *2 *2 *2
     ↓  ↓  ↓
List(2, 4, 6)
We could use the map function to solve your task.
We can use it to map each element in the newdf list to a tuple of (filename, number of columns).
val finallist = newdf.map { f =>
val masterPq = spark.read.option("header","true").option("inferSchema","true").parquet(f.toString())
val size = masterPq.columns.length
(f.toString(), size.toString())
}
I think this code is shorter, simpler, easier to read and just way more beautiful. I would definitely recommend that you learn more about Scala's collections and immutable collections in general. Once you understand them, you'll just love them!
val list = List()
for(i <- 1 to 10){
list:+i
}
println(list)
This ends up giving me an empty list although it should be filled with numbers from 1 to 10? I have a theory that it creates a new list each time due to the ":" operator but I am not entirely sure. I have solved the issue using a ListBuffer instead but I want to learn how to approach such a problem using immutable lists instead. Thank you.
There is no single functional solution to this class of problem, but here are some options.
For the simple case in the question, you can do this
List.range(1,11) // List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
If you want to calculate a different value for each element based on index, use tabulate:
List.tabulate(10)(x => x*3) // List(0, 3, 6, 9, 12, 15, 18, 21, 24, 27)
(You can pass a function if the logic is more complicated than this)
If you are building a list but are not sure whether you need every element, use Option and then flatten:
def genValue(i: Int): Option[Int] = ???
List.tabulate(10)(genValue).flatten
This will discard any values where genValue returns None and extract the Int where it returns Some(???).
If each operation may return a different number of elements, use List then flatten:
def genValue(i: Int): List[Int] = ???
List.tabulate(10)(genValue).flatten
This will take all the elements from all the List values returned by genValue and put them into a single List[Int].
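As a small sketch of the two flatten variants above, with made-up generator functions (evenOnly and repeat are hypothetical stand-ins for genValue):

```scala
// Option variant: keep only even indices, drop the rest
def evenOnly(i: Int): Option[Int] = if (i % 2 == 0) Some(i) else None
val evens = List.tabulate(10)(evenOnly).flatten   // List(0, 2, 4, 6, 8)

// List variant: index i contributes i copies of itself
def repeat(i: Int): List[Int] = List.fill(i)(i)
val repeated = List.tabulate(4)(repeat).flatten   // List(1, 2, 2, 3, 3, 3)
```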
If the length of the List is not known in advance then the best solution is likely to be a recursive function. While this may seem daunting to start with, it is worth learning how to use them as they are often the cleanest way of solving a problem.
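For illustration only, here is one way such a recursive function might look. The Collatz sequence is just an example I picked because its length is not known up front; collatz and loop are hypothetical names, and the sketch assumes start is at least 1.

```scala
import scala.annotation.tailrec

// Builds the Collatz sequence from `start` down to 1.
// Elements are prepended to the accumulator and reversed once at the end.
def collatz(start: Int): List[Int] = {
  @tailrec
  def loop(n: Int, acc: List[Int]): List[Int] =
    if (n == 1) (1 :: acc).reverse
    else loop(if (n % 2 == 0) n / 2 else 3 * n + 1, n :: acc)
  loop(start, Nil)
}

val fromSix = collatz(6)   // List(6, 3, 10, 5, 16, 8, 4, 2, 1)
```

The accumulator parameter plays the role the mutable buffer would play in an imperative loop, and @tailrec asks the compiler to check that the recursion is compiled to a loop.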
You cannot add an element (that is mutate the list) to an immutable list.
You are right when you say:
I have a theory that it creates a new list each time
As a first step, consider:
var list = List.empty[Int]
for(i <- 1 to 10) {
list = list :+ i
}
println(list)
Note that list is now a variable, so we can reassign it, but the list itself is still an immutable object. In fact, on each iteration we assign to the variable list a new list with one element appended.
If you don't like using a variable, you could use a fold operation, which is not much different from the for loop above: it still constructs partial lists, adding elements one by one.
val result = (1 to 10).foldLeft(List.empty[Int]){ (partial_list, item) =>
partial_list :+ item
}
println(result)
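One caveat worth knowing: :+ on a List has to copy the list on every append, so each step takes time proportional to the current length. A common alternative, sketched here as a variation on the fold above (not part of the original code), is to prepend with :: and reverse once at the end:

```scala
// Prepending is constant time on List; a single reverse restores the order.
val prepended = (1 to 10)
  .foldLeft(List.empty[Int]) { (partialList, item) => item :: partialList }
  .reverse
```

For ten elements the difference does not matter, but for long lists it can.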
Here is the simplified signature of :+:
def :+(elem: B): List[B]
It returns a new List with elem appended, so it does not alter the current list.
To make this work switch to a ListBuffer i.e. something mutable:
import scala.collection.mutable.ListBuffer
val buffer = ListBuffer.empty[Int]
for(i <- 1 to 10) {
buffer += i
}
println(buffer)
If you want to keep the immutable List you could accumulate with fold:
val list = List.empty[Int]
(1 to 10)
.foldLeft(list) { (acc, value) => acc :+ value }
This question already has an answer here:
Underscores in a Scala map/foreach
(1 answer)
I'm trying to understand the use of the map function and the underscore _ in the code below. keys is a List[String] and df is a DataFrame. I ran a sample and found out that listOfVal is a list of Column values, but could someone help explain how this works? What does _ mean in this case, and what gets applied by the map function? Many thanks
val listOfVal = keys.map(df(_))
ps: I've read the two questions suggested but I think they are different use cases
In Scala, _ can act as a place-holder for an anonymous function. For example:
List("A", "B", "C").map(_.toLowerCase)
// `_.toLowerCase` represents anonymous function `x => x.toLowerCase`
// res1: List[String] = List(a, b, c)
List(1, 2, 3, 4, 5).foreach(print(_))
// `print(_)` represents anonymous function `x => print(x)`
// prints: 12345
In your sample code, keys.map(df(_)) is equivalent to:
keys.map(c => df(c))
Let's say your keys is a list of column names:
List[String]("col1", "col2", "col3")
Then it simply gets mapped to:
List[Column](df("col1"), df("col2"), df("col3"))
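If you want to try the equivalence without Spark, here is a minimal sketch in which a plain Map stands in for the DataFrame (lookup is a hypothetical substitute for df; both are called as lookup(key) or df(key)):

```scala
// Hypothetical stand-in for df: anything with an apply(key) method will do
val lookup = Map("col1" -> 1, "col2" -> 2, "col3" -> 3)
val keys = List("col1", "col2", "col3")

val viaPlaceholder = keys.map(lookup(_))      // placeholder syntax
val viaLambda      = keys.map(c => lookup(c)) // explicit anonymous function
```

Both forms produce the same list, because the compiler expands lookup(_) into c => lookup(c).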
Please help me understand what map(_(0)) means here:
scala> a.collect
res97: Array[org.apache.spark.sql.Row] = Array([1039], [1010], [1002], [926])
scala> a.collect.map(_(0))
res98: Array[Any] = Array(1039, 1010, 1002, 926)
1. .map in functional programming applies the function you want to each element of your collection.
Say, you want to add some data to each element in an array you have, which can be done as below,
scala> val data = Array("a", "b", "c")
data: Array[String] = Array(a, b, c)
scala> data.map(element => element+"-add something")
res10: Array[String] = Array(a-add something, b-add something, c-add something)
Here I'm saying: to each element, add something. But naming element is unnecessary, because the same operation is applied to every element anyway, so _ stands for each element here.
So the same map can be written as follows.
scala> data.map(_+"-add something")
res9: Array[String] = Array(a-add something, b-add something, c-add something)
Also, note that the _ shorthand only works for simple expressions in which each parameter is used exactly once.
2. collection(index) is the way to access the element at a given index in a collection.
e.g.
scala> val collection = Array(Vector(1039), Vector(1010), Vector(1002), Vector(926))
collection: Array[scala.collection.immutable.Vector[Int]] = Array(Vector(1039), Vector(1010), Vector(1002), Vector(926))
scala> collection(0)
res13: scala.collection.immutable.Vector[Int] = Vector(1039)
So, combining #1 and #2, in your case you are mapping the original collection and getting the first element.
scala> collection.map(_.head)
res17: Array[Int] = Array(1039, 1010, 1002, 926)
Refs
https://twitter.github.io/scala_school/collections.html#map
Map, Map and flatMap in Scala
You are accessing the zeroth element of the items in the collection a. _ is a common placeholder in Scala when working with the items in a collection.
More concretely, your code is equivalent to
a.collect.map(item => item(0))
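A Spark-free sketch of the same idea, assuming a Vector as a stand-in for Row (both support item(0) to read the first field):

```scala
// Hypothetical stand-in for a.collect: an array of one-element "rows"
val rows = Array(Vector(1039), Vector(1010), Vector(1002), Vector(926))

val withPlaceholder = rows.map(_(0))              // shorthand
val withLambda      = rows.map(item => item(0))   // explicit form

// Each underscore stands for a separate parameter, in order:
val total = List(1, 2, 3).reduce(_ + _)           // same as (a, b) => a + b
```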
Has anyone got an example of how to use andThen with Lists? I notice that andThen is defined for List, but the documentation doesn't have an example showing how to use it.
My understanding is that f andThen g means: execute function f, then execute function g, where the input of g is the output of f. Is this correct?
Question 1 - I have written the following code but I do not see why I should use andThen because I can achieve the same result with map.
scala> val l = List(1,2,3,4,5)
l: List[Int] = List(1, 2, 3, 4, 5)
//simple function that decrements each element of the list
scala> def f(l:List[Int]):List[Int] = {l.map(x=>x-1)}
f: (l: List[Int])List[Int]
//function which increments each element of the list
scala> def g(l:List[Int]):List[Int] = {l.map(x=>x+1)}
g: (l: List[Int])List[Int]
scala> val p = f _ andThen g _
p: List[Int] => List[Int] = <function1>
//printing original list
scala> l
res75: List[Int] = List(1, 2, 3, 4, 5)
//p works as expected.
scala> p(l)
res74: List[Int] = List(1, 2, 3, 4, 5)
//but I can achieve the same with two maps. What is the point of andThen?
scala> l.map(x=>x+1).map(x=>x-1)
res76: List[Int] = List(1, 2, 3, 4, 5)
Could someone share practical examples where andThen is more useful than methods like filter, map, etc.? One use I can see above is that with andThen I can create a new function, p, which is a combination of other functions. But this shows the usefulness of andThen on functions, not of andThen on List.
andThen is inherited from PartialFunction a few parents up the inheritance tree for List. You use List as a PartialFunction when you access its elements by index. That is, you can think of a List as a function from an index (from zero) to the element that occupies that index within the list itself.
If we have a list:
val list = List(1, 2, 3, 4)
We can call list like a function (because it is one):
scala> list(0)
res5: Int = 1
andThen allows us to compose one PartialFunction with another. For example, perhaps I want to create a List where I can access its elements by index, and then multiply the element by 2.
val list2 = list.andThen(_ * 2)
scala> list2(0)
res7: Int = 2
scala> list2(1)
res8: Int = 4
This is essentially the same as using map on the list, except the computation is lazy. Of course, you could accomplish the same thing with a view, but there might be some generic case where you'd want to treat the List as just a PartialFunction, instead (I can't think of any off the top of my head).
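A small sketch of the List-as-PartialFunction view described above. I give the doubling function an explicit type (timesTwo is just an illustrative name) so the example does not depend on how a given Scala version resolves the overloads of andThen:

```scala
val list = List(1, 2, 3, 4)
val timesTwo: Int => Int = _ * 2

// Compose "look up by index" with "multiply by two"; nothing is computed yet
val doubled: PartialFunction[Int, Int] = list.andThen(timesTwo)

val first       = doubled(0)              // 2, computed on access
val definedAt3  = doubled.isDefinedAt(3)  // true: index 3 exists
val definedAt4  = doubled.isDefinedAt(4)  // false: past the end of the list
```

The result is a function of the index, not a new List, which is why nothing is evaluated until you apply it.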
In your code, you aren't actually using andThen on the List itself. Rather, you're using it for functions that you're passing to map, etc. There is no difference in the results between mapping a List twice over f and g and mapping once over f andThen g. However, using the composition is preferred when mapping multiple times becomes expensive. In the case of Lists, traversing multiple times can become a tad computationally expensive when the list is large.
With the solution l.map(x=>x+1).map(x=>x-1) you are traversing the list twice.
When composing 2 functions using the andThen combinator and then applying it to the list, you only traverse the list once.
val h = ((x:Int) => x+1).andThen((x:Int) => x-1)
l.map(h) //traverses it only once
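To check that the variants agree, here is a quick sketch (inc and dec are illustrative names). The last line uses a view, which is a different technique from andThen: it is another way to avoid building the intermediate list.

```scala
val inc: Int => Int = _ + 1
val dec: Int => Int = _ - 1
val l = List(1, 2, 3, 4, 5)

val twoPasses = l.map(inc).map(dec)             // builds an intermediate list
val onePass   = l.map(inc andThen dec)          // one traversal, composed function
val viaView   = l.view.map(inc).map(dec).toList // lazy view, materialised once
```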
Sorry for the lack of a descriptive title; I couldn't think of anything better. Edit it if you think of one.
Let's say I have two Lists of Objects, and they are always changing. They need to remain as separate lists, but many operations have to be done on both of them. This leads me to doing stuff like:
//assuming A and B are the lists
A.foo(params)
B.foo(params)
In other words, I'm doing the exact same operation to two different lists at many places in my code. I would like a way to reduce them down to one list without explicitly having to construct another list. I know that just combining lists A and B into a list C would solve all my problems, but then we'd just be back to the same operation if I needed to add a new object to the list (because I'd have to add it to C as well as its respective list).
It's in a tight loop and performance is very important. Is there any way to construct an iterator or something that would iterate A and then move on to B, all transparently? I know another solution would be to construct the combined list (C) every time I'd like to perform some kind of function on both of these lists, but that is a huge waste of time (computationally speaking).
Iterator is what you need here. Turning a List into an Iterator and concatenating 2 Iterators are both O(1) operations.
scala> val l1 = List(1, 2, 3)
l1: List[Int] = List(1, 2, 3)
scala> val l2 = List(4, 5, 6)
l2: List[Int] = List(4, 5, 6)
scala> (l1.iterator ++ l2.iterator) foreach (println(_)) // use List.elements for Scala 2.7.*
1
2
3
4
5
6
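If you do this in many places, you could wrap it in a small helper. This is only a sketch (both is a hypothetical name); note that an Iterator can be consumed only once, so the helper creates a fresh one on every call:

```scala
// Chains the two lists lazily; no combined list is allocated
def both[A](a: List[A], b: List[A]): Iterator[A] = a.iterator ++ b.iterator

val l1 = List(1, 2, 3)
val l2 = List(4, 5, 6)

val total   = both(l1, l2).sum               // 21
val doubled = both(l1, l2).map(_ * 2).toList // a new iterator for each use
```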
I'm not sure I understand what you mean.
Anyway, this is my solution:
scala> var listA :List[Int] = Nil
listA: List[Int] = List()
scala> var listB :List[Int] = Nil
listB: List[Int] = List()
scala> def dealWith(op : List[Int] => Unit){ op(listA); op(listB) }
dealWith: ((List[Int]) => Unit)Unit
Then, if you want to perform an operation on both listA and listB, you can use it as follows:
scala> listA ::= 1
scala> listB ::= 0
scala> dealWith{ _ foreach println }
1
0