Creating a list of StructFields from data frame - scala

Ultimately I need to build a schema from a CSV. I can read the CSV into a data frame, and I've got a case class defined.
case class metadata_class (colname:String,datatype:String,length:Option[Int],precision:Option[Int])
val foo = spark.read.format("csv").option("delimiter",",").option("header","true").schema(Encoders.product[metadata_class].schema).load("/path/to/file").as[metadata_class].toDF()
Now I'm trying to iterate through that data frame and build a list of StructFields. My current effort:
val sList: List[StructField] = List(
for (m <- foo.as[metadata_class].collect) {
StructField(m.colname,getType(m.datatype))
})
That gives me a type mismatch:
found : Unit
required: org.apache.spark.sql.types.StructField
for (m <- foo.as[metadata_class].collect) {
^
What am I doing wrong here? Or am I not even close?

A for loop without yield is not the usual way to build a collection in Scala. Such a loop has type Unit, so in your code the single element passed to List(...) is Unit and the result would be a List[Unit]:
val sList: List[Unit] = List(
for (m <- foo.as[metadata_class].collect) {
StructField(m.colname, getType(m.datatype))
}
)
But you declared sList as List[StructField], and that mismatch is the cause of the compile error.
I suggest using the map function instead of a for loop to iterate over the metadata_class objects and create a StructField from each one:
val structFields: List[StructField] = foo.as[metadata_class]
.collect
.map(m => StructField(m.colname, getType(m.datatype)))
.toList
This gives you a List[StructField].
In Scala, control constructs are expressions with a type. A for loop without yield is one too, and its type is Unit.
Read more about statements and expressions:
statement vs expression in scala
statements and expressions in scala
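To make the Unit point concrete, here is a minimal sketch in plain Scala (no Spark needed). The names ForYieldDemo and Meta are hypothetical stand-ins for the question's metadata rows, not part of the original code. It contrasts a for loop without yield, a for comprehension with yield, and the equivalent map call.

```scala
object ForYieldDemo {
  // Hypothetical stand-in for the question's metadata rows.
  final case class Meta(colname: String, datatype: String)

  val rows: Array[Meta] = Array(Meta("id", "int"), Meta("name", "string"))

  // A for loop without yield runs only for its side effects; its type is Unit.
  var seen: Int = 0
  val loopResult: Unit = for (m <- rows) { seen += 1 }

  // Adding yield turns the loop into an expression that builds a new collection.
  val names: List[String] = (for (m <- rows) yield m.colname).toList

  // The for/yield above is translated by the compiler into this map call.
  val sameNames: List[String] = rows.map(m => m.colname).toList
}
```

So if the for syntax is preferred, adding yield (and moving the loop out of List(...)) is enough; map is simply the more direct spelling of the same thing.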

Related

appending elements to list of list in scala

I have created an empty Scala mutable list:
import scala.collection.mutable.ListBuffer
val list_of_list : List[List[String]] = List.empty
I want to append elements to it as below:
filtered_df.collect.map(
r => {
val val_list = List(r(0).toString,r(4).toString,r(5).toString)
list_of_list += val_list
}
)
The error that I am getting is:
Error:(113, 26) value += is not a member of List[List[String]]
Expression does not convert to assignment because receiver is not assignable.
list_of_list += val_list
Can someone help?
Your declaration seems wrong:
val list_of_list : List[List[String]] = List.empty
means that you've declared a scala.collection.immutable.List, whose operations return a new list rather than changing the existing one.
To fix the error you need to change the outer List type to ListBuffer that you imported above the declaration as follows:
val list_of_list : ListBuffer[List[String]] = ListBuffer.empty
Also, it looks like you don't need map here, since you are not transforming the data collected from the DataFrame but only running a side effect, so you can change it to foreach:
filtered_df.collect.foreach {
r => {
val val_list = List(r(0).toString,r(4).toString,r(5).toString)
list_of_list += val_list
}
}
You can also do this in a functional way, without resorting to ListBuffer, by using an immutable List and foldRight:
val list_of_list: List[List[String]] =
filtered_df.collect.toList
.foldRight(List.empty[List[String]])((r, acc) => List(r(0).toString,r(4).toString,r(5).toString) :: acc)
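toList converts the collected Array to a List before foldRight is called. The intent is stack safety; note, though, that in current Scala versions foldRight is implemented iteratively for both Array and List, so the conversion should not be strictly necessary.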
More info about foldLeft and foldRight
You have to change that val list_of_list to var list_of_list. That alone would not be enough as you also have to change the type of list_of_list into a mutable alternative.
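The options discussed in these answers can be compared side by side in plain Scala. This is only a sketch: AppendDemo, rows and pick are hypothetical names, and rows stands in for filtered_df.collect with each row modelled as a Seq[Any] instead of a Spark Row.

```scala
import scala.collection.mutable.ListBuffer

object AppendDemo {
  // Hypothetical stand-in for filtered_df.collect: each row is a sequence of values.
  val rows: Array[Seq[Any]] = Array(
    Seq("a", 1, 2, 3, "x", "y"),
    Seq("b", 4, 5, 6, "p", "q")
  )

  // Picks columns 0, 4 and 5 as strings, as in the question.
  def pick(r: Seq[Any]): List[String] =
    List(r(0).toString, r(4).toString, r(5).toString)

  // 1. Mutable buffer held in a val: += appends in place.
  val viaBuffer: List[List[String]] = {
    val buf = ListBuffer.empty[List[String]]
    rows.foreach(r => buf += pick(r))
    buf.toList
  }

  // 2. Immutable List held in a var: :+= builds a new list and reassigns the var.
  val viaVar: List[List[String]] = {
    var acc = List.empty[List[String]]
    rows.foreach(r => acc :+= pick(r))
    acc
  }

  // 3. No mutation at all: map each row, then convert the Array to a List.
  val viaMap: List[List[String]] = rows.map(r => pick(r)).toList
}
```

All three produce the same list. The map version is usually the shortest when every row yields exactly one element.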

How to define a function in scala for flatMap

New to Scala, I want to try to rewrite some code in flatMap by calling a function instead of writing the whole process inside "()".
The original code is like:
val longForm = summary.flatMap(row => {
/*This is the code I want to replace with a function*/
val metric = row.getString(0)
(1 until row.size).map{i=>
(metric,schema(i).name,row.getString(i).toDouble)
}
}/*End of function*/)
The function I wrote is:
def tfunc(line:Row):List[Any] ={
val metric = line.getString(0)
var res = List[Any]
for (i<- 1 to line.size){
/*Save each iteration result as a List[tuple], then append to the res List.*/
val tup = (metric,schema(i).name,line.getString(i).toDouble)
val tempList = List(tup)
res = res :: tempList
}
res
}
The function did not compile; it failed with the following error:
error: missing argument list for method apply in object List
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing apply _ or apply(_) instead of apply.
var res = List[Any]
What is wrong with this function?
And for flatMap, is it the right way to return the result as a List?
You haven't explained why you want to replace that code block. Is there a particular goal you're after? There are many, many different ways that block could be rewritten. How can we know which would be better at meeting your requirements?
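As for the error itself: var res = List[Any] names the apply method of the List object without calling it. An empty list is written List[Any]() or List.empty[Any]. Note also that 1 to line.size runs one index past the last column; the original code uses until.
Here's one approach.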
def tfunc(line :Row) :List[(String,String,Double)] ={
val metric = line.getString(0)
List.tabulate(line.length - 1){ idx =>
(metric, schema(idx+1).name, line.getString(idx+1).toDouble)
}
}
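To show how such a function plugs into flatMap, here is a minimal sketch in plain Scala. It is not the asker's Spark code: LongFormDemo is a hypothetical name, columns stands in for the Spark schema, and each row is modelled as a Seq[String] instead of a Row.

```scala
object LongFormDemo {
  // Hypothetical stand-ins: column names instead of a Spark schema,
  // and Seq[String] instead of Row.
  val columns: Seq[String] = Seq("summary", "age", "height")
  val summary: Seq[Seq[String]] = Seq(
    Seq("mean", "30.5", "170.0"),
    Seq("max", "64.0", "201.5")
  )

  // One input row becomes one (metric, column, value) tuple per data column.
  def tfunc(line: Seq[String]): List[(String, String, Double)] = {
    val metric = line.head
    List.tabulate(line.length - 1) { idx =>
      (metric, columns(idx + 1), line(idx + 1).toDouble)
    }
  }

  // flatMap calls tfunc once per row and concatenates the resulting lists.
  val longForm: Seq[(String, String, Double)] = summary.flatMap(row => tfunc(row))
}
```

Returning a List from the function is fine for flatMap; any collection type would do, since the per-row results are flattened into one sequence anyway.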

Can't we use scala flatMap method on List of integers (i.e) List[Int]?

Can Scala's flatMap method be used on a list of integers, i.e. List[Int]?
I am getting a compile-time error for the code below:
object FlatMapExample {
def main(args:Array[String])
{
val numberList = List(1,2,3)
val mappedList = numberList.map { elem => elem*2 }
println(mappedList)
val flatMappedList = numberList.flatMap { elem => elem*2 }//compile time error
println(flatMappedList)
}
}
Compile time error:
type mismatch ; found: Int required :scala.collection.GenTraversableOnce[?]
flatMap() expects the function to return a collection of values rather than a single element. Thus this would work:
val list = List(1,2,3)
list.flatMap(elem => List(elem * 2)) // List(2, 4, 6)
If you just want to multiply by two, use map.
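A short sketch of the difference, using only the standard library (the object name FlatMapIntDemo is made up for illustration). It shows where flatMap earns its keep on a List[Int]: when each element maps to zero, one, or several results.

```scala
object FlatMapIntDemo {
  val numbers: List[Int] = List(1, 2, 3)

  // map: exactly one result per element.
  val doubled: List[Int] = numbers.map(n => n * 2)

  // flatMap with a one-element list gives the same result as map.
  val wrapped: List[Int] = numbers.flatMap(n => List(n * 2))

  // flatMap with several results per element: the inner lists are concatenated.
  val expanded: List[Int] = numbers.flatMap(n => List(n, n * 10))

  // flatMap with an Option: None contributes nothing, Some contributes its value.
  val evens: List[Int] = numbers.flatMap(n => if (n % 2 == 0) Some(n) else None)
}
```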

Loop and create list in Scala

I am getting an empty list when I try to build the list with the :: operator. My code looks like this:
def getAllInfo(locks: List[String]): List[LockBundle] = DB.withTransaction { implicit s =>
val myList = List[LockBundle]()
locks.foreach(
l => findForLock(l) :: myList
)
myList
}
def findForLock(lock: String): Option[LockBundle] = { ... }
Any suggestion?
Use flatMap
locks.flatMap(l => findForLock(l))
Your code becomes
def getAllInfo(locks: List[String]): List[LockBundle] = DB.withTransaction { implicit s =>
locks.flatMap(l => findForLock(l))
}
Alternatively, you could use map and flatten, something like this: locks.map(l => findForLock(l)).flatten
Functional programming is all about transformations. You just have to transform your existing list into another list using a transformation which is your function findForLock.
Problem with your code
val myList = List[LockBundle]()
locks.foreach(
l => findForLock(l) :: myList
)
myList
First of all, foreach returns Unit, so it is meant for side-effecting operations and not for transformations. Since you need a transformation, do not use foreach.
Next, findForLock(l) :: myList produces a new list, but that value is ignored because nothing stores it; myList itself is immutable and stays empty. To keep the value you need an accumulator, for example a var, or a parameter that is passed along in a recursive function.
Correcting your code
If you want to do it your way, you need to use an accumulator.
First fix your types: findForLock(l) returns an Option, but your list is of type List[LockBundle], so change the list type to List[Option[LockBundle]].
To get a List[LockBundle] from the List[Option[LockBundle]], just call flatten on it. See the code snippet below:
var myList = List[Option[LockBundle]]()
locks.foreach(
l => myList = findForLock(l) :: myList
)
myList.flatten
The above way is not functional and is not recommended. Note also that prepending with :: builds the list in reverse order.
Your code doesn't work because the foreach combinator calls the given closure for each element, but all the closure does is evaluate the expression findForLock(l) :: myList, whose result is discarded.
As pamu suggested, you can use flatMap with a function that maps each element to the value returned by findForLock and flattens the result, which turns each Option into an element of the list if it's Some, or into nothing if it's None.
Keep in mind that this works only because an Option can be treated as a collection here: in Scala 2.12 and earlier through the implicit conversion option2Iterable from Option to Iterable, and from Scala 2.13 on because Option is itself an IterableOnce. In general, flatMap expects the function to return the same kind of container as the one it is called on (in this case List or Option).
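The flatMap solution and the accumulator idea mentioned above can be sketched together in plain Scala. Everything here is a stand-in: LockDemo, the LockBundle case class and the body of findForLock are invented for illustration (the real findForLock is elided in the question), and the database transaction is left out.

```scala
object LockDemo {
  // Hypothetical stand-in for the real LockBundle.
  final case class LockBundle(lock: String)

  // Hypothetical lookup: only locks whose name starts with "a" are found.
  def findForLock(lock: String): Option[LockBundle] =
    if (lock.startsWith("a")) Some(LockBundle(lock)) else None

  val locks: List[String] = List("a1", "b1", "a2")

  // flatMap keeps the Some values and drops the None values.
  val viaFlatMap: List[LockBundle] = locks.flatMap(l => findForLock(l))

  // Accumulator without a var: foldLeft passes the list built so far into each step.
  // Prepending with :: builds the list back to front.
  val folded: List[LockBundle] =
    locks.foldLeft(List.empty[LockBundle]) { (acc, l) =>
      findForLock(l) match {
        case Some(bundle) => bundle :: acc
        case None         => acc
      }
    }

  // Reverse once at the end to restore the input order.
  val viaFold: List[LockBundle] = folded.reverse
}
```

Both values hold the same list. The fold is mainly useful when each step needs to inspect what has been accumulated so far; otherwise flatMap says the same thing more briefly.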

Convert java.util.Map to Scala List[NewObject]

I have a java.util.Map[String, MyObject] and want to create a Scala List[MyNewObject] consisting of all entries of the map with some special values.
I found a way but, well, this is really ugly:
val result = ListBuffer[MyNewObject]()
myJavaUtilMap.forEach
(
(es: Entry[String, MyObject]) =>
{ result += MyNewObject(es.getKey(), es.getValue().getMyParameter); println("Aa")}
)
How can I get rid of the println("Aa")? Just deleting it does not help, because forEach needs a Consumer but the += operation yields the list.
Is there a more elegant way to convert the java.util.Map to a List[MyNewObject]?
Scala has conversions that give you all the nice methods of the Scala collection API on Java collections:
import collection.JavaConversions._
val result = myJavaUtilMap.map{
case (k,v) => MyNewObject(k, v.getMyParameter)
}.toList
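Note that JavaConversions was deprecated in Scala 2.12 and removed in 2.13. On newer versions the same idea can be written with explicit .asScala calls. The sketch below assumes Scala 2.13 or later (on 2.11/2.12 the import would be scala.collection.JavaConverters._ instead). MapToListDemo, MyObject and MyNewObject are hypothetical stand-ins for the question's classes.

```scala
import scala.jdk.CollectionConverters._

object MapToListDemo {
  // Hypothetical stand-ins for the question's classes.
  class MyObject(val getMyParameter: Int)
  final case class MyNewObject(key: String, parameter: Int)

  val myJavaUtilMap = new java.util.HashMap[String, MyObject]()
  myJavaUtilMap.put("a", new MyObject(1))
  myJavaUtilMap.put("b", new MyObject(2))

  // asScala wraps the Java map as a Scala map without copying;
  // map then builds one MyNewObject per entry.
  val result: List[MyNewObject] =
    myJavaUtilMap.asScala.map { case (k, v) => MyNewObject(k, v.getMyParameter) }.toList
}
```

The explicit .asScala makes the conversion visible at the call site, which is the main reason the implicit JavaConversions variant was retired.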
By the way: to define a function which returns Unit, you can explicitly specify the return type:
val f = (x: Int) => x: Unit