Scala: adding to a Scala List

I am trying to append to a List[String] based on a condition, but the list prints as empty.
Here is the simple code:
object Mytester {
  def main(args: Array[String]): Unit = {
    val columnNames = List("t01354", "t03345", "t11858", "t1801566", "t180387685", "t015434")
    //println(columnNames)
    val prim = List[String]()
    for (i <- columnNames) {
      if (i.startsWith("t01"))
        println("Printing i : " + i)
      i :: prim :: Nil
    }
    println(prim)
  }
}
Output :
Printing i : t01354
Printing i : t015434
List()
Process finished with exit code 0

This line, i :: prim :: Nil, creates a new List[] but that new List is not saved (i.e. assigned to a variable) so it is thrown away. prim is never changed, and it can't be because it is a val.
If you want a new List of only those elements that meet a certain condition then filter the list.
val prim: List[String] = columnNames.filter(_.startsWith("t01"))
// prim: List[String] = List(t01354, t015434)
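To make the point above concrete, here is a minimal sketch (the object name PrependDemo and its member names are illustrative, not from the question). It shows that :: returns a new list and leaves the original untouched, and that a loop only accumulates if the result is assigned back to a var.

```scala
object PrependDemo {
  val columnNames = List("t01354", "t03345", "t11858", "t1801566", "t180387685", "t015434")

  // `::` builds and returns a NEW list; `empty` itself never changes.
  val empty = List[String]()
  val oneElement = "t01354" :: empty

  // Loop version: the new list has to be assigned back, so a var is needed.
  def withLoop: List[String] = {
    var acc = List[String]()
    for (name <- columnNames) {
      if (name.startsWith("t01")) {
        acc = name :: acc // prepending builds the list in reverse order
      }
    }
    acc.reverse // restore the original order
  }

  // Idiomatic version: no mutation at all.
  val withFilter: List[String] = columnNames.filter(_.startsWith("t01"))

  def main(args: Array[String]): Unit = {
    println(empty)      // still empty after the prepend above
    println(oneElement)
    println(withLoop)
    println(withFilter)
  }
}
```

Both withLoop and withFilter produce the same two elements; the filter version is simply shorter and has no mutable state.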

1) why can't I add to List?
List is immutable; you have to use a mutable list (called ListBuffer).
definition
scala> val list = scala.collection.mutable.ListBuffer[String]()
list: scala.collection.mutable.ListBuffer[String] = ListBuffer()
add elements
scala> list+="prayagupd"
res3: list.type = ListBuffer(prayagupd)
scala> list+="urayagppd"
res4: list.type = ListBuffer(prayagupd, urayagppd)
print list
scala> list
res5: scala.collection.mutable.ListBuffer[String] = ListBuffer(prayagupd, urayagppd)
2) Filtering a list in Scala
Also, in your case the best way to solve the problem is to use List#filter; there is no need for a for loop.
scala> val columnNames = List("t01354", "t03345", "t11858", "t1801566", "t180387685", "t015434")
columnNames: List[String] = List(t01354, t03345, t11858, t1801566, t180387685, t015434)
scala> val columnsStartingWithT01 = columnNames.filter(_.startsWith("t01"))
columnsStartingWithT01: List[String] = List(t01354, t015434)
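If you prefer to keep the shape of the original for loop, the two ideas above can be combined. The sketch below (GuardDemo and its member names are made up for illustration) shows a for-comprehension with a guard, and a ListBuffer that is converted to an immutable List once the loop is done.

```scala
import scala.collection.mutable.ListBuffer

object GuardDemo {
  val columnNames = List("t01354", "t03345", "t11858", "t1801566", "t180387685", "t015434")

  // for-comprehension with a guard; the compiler rewrites this to withFilter + map.
  val viaFor: List[String] =
    for (name <- columnNames if name.startsWith("t01")) yield name

  // Mutable accumulation, converted to an immutable List at the end.
  val viaBuffer: List[String] = {
    val buf = ListBuffer[String]()
    for (name <- columnNames if name.startsWith("t01")) buf += name
    buf.toList
  }
}
```

Both values hold the same elements as columnNames.filter(_.startsWith("t01")).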
Related resources
Add element to a list In Scala
filter a List according to multiple contains

In addition to what jwvh explained.
Note that in Scala you'd usually do what you want as
val prim = columnNames.filter(_.startsWith("t01"))

Related

Transform a list of object to lists of its field

I have a List[MyObject], with MyObject containing the fields field1, field2 and field3.
I'm looking for an efficient way of doing :
Tuple3(_.map(_.field1), _.map(_.field2), _.map(_.field3))
In java I would do something like :
List<Field1Type> f1 = new ArrayList<Field1Type>();
List<Field2Type> f2 = new ArrayList<Field2Type>();
List<Field3Type> f3 = new ArrayList<Field3Type>();
for (MyObject mo : myObjects) {
    f1.add(mo.getField1());
    f2.add(mo.getField2());
    f3.add(mo.getField3());
}
I would like something more functional since I'm in scala but I can't put my finger on it.
Get 2 or 3 sub-groups with unzip / unzip3
Assuming the starting point:
val objects: Seq[MyObject] = ???
You can unzip to get all 3 sub-groups:
val (firsts, seconds, thirds) =
objects
.unzip3((o: MyObject) => (o.f1, o.f2, o.f3))
What if I have more than 3 relevant sub-groups ?
If you really need more sub-groups, you need to implement your own unzipN. However, instead of working with Tuple22, I would personally use an adapter:
case class MyObjectsProjection(private val objs: Seq[MyObject]) {
  lazy val f1s: Seq[String] =
    objs.map(_.f1)
  lazy val f2s: Seq[String] =
    objs.map(_.f2)
  ...
  lazy val f22s: Seq[String] =
    objs.map(_.f22)
}
val objects: Seq[MyObject] = ???
val objsProjection = MyObjectsProjection(objects)
objsProjection.f1s
objsProjection.f2s
...
objsProjection.f22s
Notes:
Change MyObjectsProjection according to your needs.
This is from a Scala 2.12/2.11 vanilla perspective.
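The following will decompose your objects into three lists. Because each element is prepended, the resulting lists come out in reverse order; reverse them (or use foldRight) if order matters: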
case class MyObject[T,S,R](f1: T, f2: S, f3: R)
val myObjects: Seq[MyObject[Int, Double, String]] = ???
val (l1, l2, l3) = myObjects.foldLeft((List.empty[Int], List.empty[Double], List.empty[String]))((acc, nxt) => {
(nxt.f1 :: acc._1, nxt.f2 :: acc._2, nxt.f3 :: acc._3)
})
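For comparison, here is a small self-contained sketch of both approaches on concrete data. The case class Item and the sample values are hypothetical stand-ins for MyObject. It uses unzip3 with an explicit projection, and foldRight instead of foldLeft so that prepending keeps the original element order.

```scala
object UnzipDemo {
  // Hypothetical stand-in for MyObject with three fields.
  final case class Item(f1: Int, f2: Double, f3: String)

  val items = List(Item(1, 1.5, "a"), Item(2, 2.5, "b"), Item(3, 3.5, "c"))

  // unzip3 with an explicit function that projects each Item to a triple.
  val (ints, doubles, strings) =
    items.unzip3((i: Item) => (i.f1, i.f2, i.f3))

  // foldRight visits elements from the right, so prepending preserves order.
  val (ints2, doubles2, strings2) =
    items.foldRight((List.empty[Int], List.empty[Double], List.empty[String])) {
      case (item, (is, ds, ss)) => (item.f1 :: is, item.f2 :: ds, item.f3 :: ss)
    }
}
```

Both variants traverse the list once; unzip3 is the shorter of the two when exactly three fields are needed.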

Full outer join in Scala

Given a list of lists, where each list has an object that represents the key, I need to write a full outer join that combines all the lists. Each record in the resulting list is the combination of all the fields of all the lists. In case that one key is present in list 1 and not present in list 2, then the fields in list 2 should be null or empty.
One solution I thought of is to embed an in-memory database, create the tables, run a select and get the result. However, I'd like to know if there are any libraries that handle this in a simpler way. Any ideas?
For example, let's say I have two lists, where the key is the first field in the list:
val list1 = List ((1,2), (3,4), (5,6))
val list2 = List ((1,"A"), (7,"B"))
val allLists = List (list1, list2)
The full outer joined list would be:
val allListsJoined = List ((1,2,"A"), (3,4,None), (5,6,None), (7,None,"B"))
NOTE: the solution needs to work for N lists
def fullOuterJoin[K, V1, V2](xs: List[(K, V1)], ys: List[(K, V2)]): List[(K, Option[V1], Option[V2])] = {
val map1 = xs.toMap
val map2 = ys.toMap
val allKeys = map1.keySet ++ map2.keySet
allKeys.toList.map(k => (k, map1.get(k), map2.get(k)))
}
Example usage:
val list1 = List ((1,2), (3,4), (5,6))
val list2 = List ((1,"A"), (7,"B"))
println(fullOuterJoin(list1, list2))
Which prints:
List((1,Some(2),Some(A)), (3,Some(4),None), (5,Some(6),None), (7,None,Some(B)))
Edit per suggestion in comments:
If you're interested in joining an arbitrary number of lists and don't care about type info, here's a version that does that:
def fullOuterJoin[K](xs: List[List[(K, Any)]]): List[(K, List[Option[Any]])] = {
val maps = xs.map(_.toMap)
val allKeys = maps.map(_.keySet).reduce(_ ++ _)
allKeys.toList.map(k => (k, maps.map(m => m.get(k))))
}
val list1 = List ((1,2), (3,4), (5,6))
val list2 = List ((1,"A"), (7,"B"))
val list3 = List((1, 3.5), (7, 4.0))
val lists = List(list1, list2, list3)
println(fullOuterJoin(lists))
which outputs:
List((1,List(Some(2), Some(A), Some(3.5))), (3,List(Some(4), None, None)), (5,List(Some(6), None, None)), (7,List(None, Some(B), Some(4.0))))
If you want both an arbitrary number of lists and well-typed results, that's probably beyond the scope of a Stack Overflow answer but could probably be accomplished with shapeless.
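Two details of the untyped N-list version are worth keeping in mind: a Set does not promise any particular key order in general, and reduce throws on an empty input list. The variant below is only a sketch under those observations; the name fullOuterJoinSorted and the Ordering requirement on the key are my additions, not part of the answer above.

```scala
object JoinDemo {
  // Joins any number of keyed lists. Keys are sorted, so the output order is deterministic.
  def fullOuterJoinSorted[K: Ordering](lists: List[List[(K, Any)]]): List[(K, List[Option[Any]])] = {
    // NOTE: toMap keeps only the last value when a key occurs twice in one list.
    val maps = lists.map(_.toMap)
    // flatMap + distinct also works for an empty input, unlike reduce.
    val allKeys = maps.flatMap(_.keySet).distinct.sorted
    allKeys.map(k => (k, maps.map(_.get(k))))
  }

  val list1: List[(Int, Any)] = List((1, 2), (3, 4), (5, 6))
  val list2: List[(Int, Any)] = List((7, "B"), (1, "A"))

  val joined = fullOuterJoinSorted(List(list1, list2))
}
```

Here joined contains one entry per key in ascending order (1, 3, 5, 7), each paired with one Option per input list.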
Here is a way to do it using collect separately on both lists:
val list1Ite = list1.collect{
case ele if list2.filter(e=> e._1 == ele._1).size>0 => { //if list2 _1 contains ele._1
val left = list2.find(e=> e._1 == ele._1) //find the available element
(ele._1, ele._2, left.get._2) //perform join
}
case others => (others._1, others._2, None) //others add None as _3
}
//list1Ite: List[(Int, Int, java.io.Serializable)] = List((1,2,A), (3,4,None), (5,6,None))
Do similar operation but exclude the elements which are already available in list1Ite
val list2Ite = list2.collect{
case ele if list1.filter(e=> e._1 == ele._1).size==0 => (ele._1, None , ele._2)
}
//list2Ite: List[(Int, None.type, String)] = List((7,None,B))
Combine both list1Ite and list2Ite to result
val result = list1Ite.++(list2Ite)
//result: List[(Int, Any, java.io.Serializable)] = List((1,2,A), (3,4,None), (5,6,None), (7,None,B))

difference between pipe and comma delimiter in spark-scala

Can someone tell me why we have two separate ways of representing the pipe (|) and comma (,) delimiters? For example:
sc.textFile(file).map( x => x.split(","))
for comma, and
sc.textFile(file).map( x => x.split('|'))
for pipe.
With both delimiters in double quotes, the pipe version fails while the comma version gives me the correct result.
Below is the full code which I am running
package com.rakesh.singh
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.log4j._
object MPMovie {
def namex ( x : String) = {
val fields = x.split('|')
val id = fields(0).toInt
val name = fields(1).toString
(id , name)
}
def main(rakesh : Array[String]) = {
Logger.getLogger("yoyo").setLevel(Level.ERROR)
val conf = new SparkConf().setAppName("Movies").setMaster("local[2]")
val sc = new SparkContext(conf)
val rdd = sc.textFile("F:/Raakesh/ml-100k/movies.data")
val names = sc.textFile("F:/Raakesh/ml-100k/names.data")
val mappednames = names.map(namex)
val splited = rdd.map(x => (x.split("\t")(1).toInt,1))
//.map(x => (x,1))
val counteachmovie = splited.reduceByKey( (a ,b )=> a + b).map( x => (x._2 , x._1))
val mpm = counteachmovie.max()
println(s"the final value of mpm is $mpm")
mappednames.foreach(println)
val finalname = mappednames.lookup(mpm._2)(0)
println(s"the final value of mpm is $finalname")
}
}
and data files are
movies.data
196 101 3 881250949
186 101 3 891717742
22 103 1 878887116
244 102 2 880606923
names.data
101|Sajan
102|Mela
103|Hum
There are two different split methods:
The split(",") method comes originally from String.split(regex: String), it works with arbitrary regexes as separators, e.g.
scala> "helloABCworldCABfooBBACCAbar".split("[ABC]+")
res0: Array[String] = Array(hello, world, foo, bar)
The other split('|') comes from StringOps.split(separator: Char), and is rather like a generic Scala-collection operation. It doesn't work with regex, but it works on all StringLike collections, for example on StringBuilders:
scala> val b = new StringBuilder
b: StringBuilder =
scala> b ++= "hello|"
res2: b.type = hello|
scala> b ++= "world"
res3: b.type = hello|world
scala> b.split('|')
res4: Array[String] = Array(hello, world)
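The "|" doesn't work with the first method because, as a regex, an unescaped | is the alternation ("OR") operator with an empty pattern on each side; it matches the empty string at every position, so the input is split between characters instead of at the pipe. In order to use the pipe | with the split(regex: String) version, you either have to escape it like this "\\|" or (often easier) enclose it in a character class: "[|]".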
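The options described above can be compared side by side. The following is a small sketch (SplitDemo and its member names are illustrative) applied to one line of the names file; Pattern.quote is an additional standard-library way to escape the delimiter that the answer does not mention.

```scala
import java.util.regex.Pattern

object SplitDemo {
  val line = "101|Sajan"

  // Regex overload with an unescaped pipe: the pattern matches the empty string,
  // so the line is cut between characters instead of at the pipe.
  val unescaped: Array[String] = line.split("|")

  // Regex overload with the pipe treated as a literal character.
  val escaped: Array[String]   = line.split("\\|")
  val charClass: Array[String] = line.split("[|]")
  val quoted: Array[String]    = line.split(Pattern.quote("|"))

  // Char overload: no regex involved at all.
  val viaChar: Array[String] = line.split('|')
}
```

The last four values all hold the two fields 101 and Sajan; only unescaped does not.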

Appending Data to List or any other collection Dynamically in scala [duplicate]

I am new to Scala.
Can we add/append data to a List or any other collection dynamically in Scala?
I mean, can we add data to a List or any collection using foreach (or any other loop)?
I am trying to do something like the following:
var propertyData = sc.textFile("hdfs://ip:8050/property.conf")
var propertyList = new ListBuffer[(String,String)]()
propertyData.foreach { line =>
var c = line.split("=")
propertyList.append((c(0), c(1)))
}
And suppose property.conf file contains:
"spark.shuffle.memoryFraction"="0.5"
"spark.yarn.executor.memoryOverhead"="712"
This compiles fine, but the values are not added to the ListBuffer.
I tried it using Darshan's code from his (updated) question:
val propertyData = List(""""spark.shuffle.memoryFraction"="0.5"""", """"spark.yarn.executor.memoryOverhead"="712" """)
val propertyList = new ListBuffer[(String,String)]()
propertyData.foreach { line =>
val c = line.split("=")
propertyList.append((c(0), c(1)))
}
println(propertyList)
It works as expected: it prints to the console:
ListBuffer(("spark.shuffle.memoryFraction","0.5"), ("spark.yarn.executor.memoryOverhead","712" ))
I didn't do it in a Spark Context, although I will try that in a few minutes. So, I provided the data in a list of Strings (shouldn't make a difference). I also changed the "var" keywords to "val" since none of them needs to be a mutable variable, but of course that makes no difference either. The code works whether they are val or var.
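See my comment below. But here is idiomatic Spark/Scala code which does behave exactly as you would expect. It builds the result with map, as a new RDD, instead of mutating a driver-side ListBuffer from inside foreach, whose closure runs on the executors:
import org.apache.spark.{SparkConf, SparkContext}

object ListTest extends App {
  val conf = new SparkConf().setAppName("listtest")
  val sc = new SparkContext(conf)
  val propertyData = sc.textFile("listproperty.conf")
  // Turn each "key=value" line into a (key, value) pair
  val propertyList = propertyData map { line =>
    val xs: Array[String] = line.split("""\=""")
    (xs(0), xs(1))
  }
  propertyList foreach (println(_))
}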
Yes, that's possible using mutable collections (see this link). Example:
import scala.collection.mutable
val buffer = mutable.ListBuffer.empty[String]
// add elements
buffer += "a string"
buffer += "another string"
or in a loop:
val buffer = mutable.ListBuffer.empty[Int]
for(i <- 1 to 10) {
buffer += i
}
You can either use a mutable collection (not functional), or return a new collection (functional and more idiomatic) as below :
scala> val a = List(1,2,3)
a: List[Int] = List(1, 2, 3)
scala> val b = a :+ 4
b: List[Int] = List(1, 2, 3, 4)
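Applied to the property lines from the question, the functional style could look like the sketch below. It uses a plain in-memory List instead of a Spark RDD, and the helper names parseLine and unquote are hypothetical. Each line is mapped to a pair, so nothing is appended inside a loop.

```scala
object PropertiesDemo {
  // Same content as the property.conf lines in the question, held in memory.
  val lines = List(
    "\"spark.shuffle.memoryFraction\"=\"0.5\"",
    "\"spark.yarn.executor.memoryOverhead\"=\"712\""
  )

  // Strip surrounding whitespace and one pair of double quotes, if present.
  private def unquote(s: String): String =
    s.trim.stripPrefix("\"").stripSuffix("\"")

  // Split on the first '=' only; lines without '=' are skipped.
  def parseLine(line: String): Option[(String, String)] =
    line.split("=", 2) match {
      case Array(key, value) => Some((unquote(key), unquote(value)))
      case _                 => None
    }

  // flatMap keeps the successfully parsed pairs and drops the rest.
  val properties: List[(String, String)] =
    lines.flatMap(line => parseLine(line))
}
```

When elements do have to be added one at a time to an immutable List, prepending with :: is constant time while :+ copies the whole list, so a common pattern is to prepend and reverse once at the end.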

Append/Add JsObject into JsArray in Play Framework

I am a newbie to Play Framework. I need to append/add JsObject elements to a JsArray.
Aim (what I need)
{"s_no":1,"s_name":"one",
"sub_s": [{"sub_s_no":1,"sub_s_name":"one_sub","sub_s_desc":"one_sub"},{"sub_s_no":2,"sub_s_name":"two_sub","sub_s_desc":"two_sub"}]},
{"s_no":2,"s_name":"two","sub_s":[{"sub_s_no":2,"sub_s_name":"two_sub","sub_s_desc":"two_sub"},
{"sub_s_no":3,"sub_s_name":"three_sub","sub_s_desc":"three_sub"}]}
What I Got
JsObject 1
{"s_no":1,"s_name":"one",
"sub_s":[{"sub_s_no":1,"sub_s_name":"one_sub","sub_s_desc":"one_sub"},{"sub_s_no":2,"sub_s_name":"two_sub","sub_s_desc":"two_sub"}]}
JsObject 2
{"s_no":2,"s_name":"two","sub_s":[{"sub_s_no":2,"sub_s_name":"two_sub","sub_s_desc":"two_sub"},
{"sub_s_no":3,"sub_s_name":"three_sub","sub_s_desc":"three_sub"}]}
I have two JsObjects so far and will get more. I need to add/append all of these JsObjects to a JsArray.
I tried the .+: and .append methods, but both gave an empty JsArray.
You are getting an empty JsArray because JsArray is immutable, so the original JsArray is never modified. You need to assign the result of the append to a new variable for it to work as you expect.
import play.api.libs.json._

val jsonString1 = """{"s_no":1,"sub_s":[1,2]}"""
val jsonString2 = """{"s_no":2,"sub_s":[3,4]}"""
val jsObj1 = Json.parse(jsonString1)
val jsObj2 = Json.parse(jsonString2)
val emptyArray = Json.arr()
// :+ returns a new JsArray; emptyArray itself is left unchanged
val filledArray = emptyArray :+ jsObj1 :+ jsObj2
Json.prettyPrint(emptyArray)
Json.prettyPrint(filledArray)
And some of the REPL output
> filledArray: play.api.libs.json.JsArray = [{"s_no":1,"s_name":"one","sub_s":[{"sub_s_no":1,"sub_s_name":"one_sub","sub_s_desc":"one_sub"},{"sub_s_no":2,"sub_s_name":"two_sub","sub_s_desc":"two_sub"}]},{"s_no":2,"s_name":"two","sub_s":[{"sub_s_no":2,"sub_s_name":"two_sub","sub_s_desc":"two_sub"},{"sub_s_no":3,"sub_s_name":"three_sub","sub_s_desc":"three_sub"}]}]
> // pretty print of the empty array
> res1: String = [ ]
> // pretty print of the filled array
> res2: String = [ {"s_no" : 1 ...}, {"s_no" : 2 ...} ]