Scala - Convert List[String] to tuple List[(Int, Int)]

I would like to read lines from a Source and convert each one to a tuple (Int, Int). I did it using foreach:
val values = collection.mutable.ListBuffer[(Int, Int)]()
Source.fromFile(invitationFile.ref.file).getLines().filter(line => !line.isEmpty).foreach(line => {
val value = line.split("\\s")
values += ((value(0).toInt, (value(1).toInt)))
})
What's the best way to write the same code without using foreach?

Use map, it builds a new list for you:
Source.fromFile(invitationFile.ref.file)
.getLines()
.filter(line => !line.isEmpty)
.map(line => {
val value = line.split("\\s")
(value(0).toInt, value(1).toInt)
})
.toList

foreach should be a final operation, not a transformation.
In your case, you want to use the function map. Since getLines() returns an Iterator, finish with toList to get a List[(Int, Int)]:
val values = Source.fromFile(invitationFile.ref.file).getLines()
.filter(line => !line.isEmpty)
.map(line => line.split("\\s"))
.map(fields => (fields(0).toInt, fields(1).toInt))
.toList

Using a for comprehension:
val values = (for {
line <- Source.fromFile(invitationFile.ref.file).getLines()
if !line.isEmpty
splits = line.split("\\s")
} yield (splits(0).toInt, splits(1).toInt)).toList
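
The answers above all assume every non-empty line has at least two integer fields. A minimal sketch of the same map-based approach that also skips short lines is shown below; the object name ParsePairs and the helper parsePairs are hypothetical, not part of the original question, and toInt will still throw a NumberFormatException on non-numeric fields.

```scala
object ParsePairs {
  // Hypothetical helper: turns whitespace-separated lines into (Int, Int) pairs.
  def parsePairs(lines: Iterator[String]): List[(Int, Int)] =
    lines
      .map(_.trim)
      .filter(_.nonEmpty)   // skip blank lines
      .map(_.split("\\s+")) // tolerate runs of spaces or tabs
      // keep only lines with at least two fields; extra fields are ignored
      .collect { case Array(a, b, _*) => (a.toInt, b.toInt) }
      .toList
}
```

With a file, it could be called as ParsePairs.parsePairs(Source.fromFile(invitationFile.ref.file).getLines()).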

Related

How to convert var to List?

How to convert one var to two var List?
Below is my input variable:
val input="[level:1,var1:name,var2:id][level:1,var1:name1,var2:id1][level:2,var1:add1,var2:city]"
I want my result to be:
val first= List(List("name","name1"),List("add1"))
val second= List(List("id","id1"),List("city"))

First of all, the input is not valid JSON:
val input="[level:1,var1:name,var2:id][level:1,var1:name1,var2:id1][level:2,var1:add1,var2:city]"
You have to turn it into an RDD of valid JSON strings (as you are going to use Apache Spark):
val validJsonRdd = sc.parallelize(Seq(input)).flatMap(x => x.replace(",", "\",\"").replace(":", "\":\"").replace("[", "{\"").replace("]", "\"}").replace("}{", "}&{").split("&"))
Once you have a valid JSON RDD, you can easily convert it to a DataFrame and then apply your logic:
import org.apache.spark.sql.functions._
val df = spark.read.json(validJsonRdd)
.groupBy("level")
.agg(collect_list("var1").as("var1"), collect_list("var2").as("var2"))
.select(collect_list("var1").as("var1"), collect_list("var2").as("var2"))
You should get the desired output in the DataFrame:
+------------------------------------------------+--------------------------------------------+
|var1 |var2 |
+------------------------------------------------+--------------------------------------------+
|[WrappedArray(name1, name2), WrappedArray(add1)]|[WrappedArray(id1, id2), WrappedArray(city)]|
+------------------------------------------------+--------------------------------------------+
You can convert the arrays to lists if required.
To get the values as in the question, you can do the following:
val rdd = df.collect().map(row => (row(0).asInstanceOf[Seq[Seq[String]]], row(1).asInstanceOf[Seq[Seq[String]]]))
val first = rdd(0)._1.map(x => x.toList).toList
//first: List[List[String]] = List(List(name1, name2), List(add1))
val second = rdd(0)._2.map(x => x.toList).toList
//second: List[List[String]] = List(List(id1, id2), List(city))
I hope the answer is helpful

reduceByKey is the important function for achieving your required output. More explanation is available in the step-by-step reduceByKey explanation.
You can do the following:
val input="[level:1,var1:name1,var2:id1][level:1,var1:name2,var2:id2][level:2,var1:add1,var2:city]"
val groupedrdd = sc.parallelize(Seq(input)).flatMap(_.split("]\\[").map(x => {
val values = x.replace("[", "").replace("]", "").split(",").map(y => y.split(":")(1))
(values(0), (List(values(1)), List(values(2))))
})).reduceByKey((x, y) => (x._1 ::: y._1, x._2 ::: y._2))
val first = groupedrdd.map(x => x._2._1).collect().toList
//first: List[List[String]] = List(List(add1), List(name1, name2))
val second = groupedrdd.map(x => x._2._2).collect().toList
//second: List[List[String]] = List(List(city), List(id1, id2))
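
For a single short string like the one in the question, the same grouping can also be sketched with plain Scala collections instead of Spark. This is only an illustration under that assumption; the object GroupByLevel and its methods are hypothetical names, and the groups are sorted by level so the result order is deterministic.

```scala
object GroupByLevel {
  // Parses "[level:1,var1:a,var2:b][...]" into (level, var1, var2) triples.
  def parse(input: String): List[(String, String, String)] =
    input.stripPrefix("[").stripSuffix("]").split("\\]\\[").toList.map { record =>
      // each field looks like "key:value"; keep only the value
      val values = record.split(",").map(_.split(":")(1))
      (values(0), values(1), values(2))
    }

  // Groups the triples by level and returns (var1 lists, var2 lists).
  def firstAndSecond(input: String): (List[List[String]], List[List[String]]) = {
    val grouped = parse(input).groupBy(_._1).toList.sortBy(_._1).map(_._2)
    (grouped.map(_.map(_._2)), grouped.map(_.map(_._3)))
  }
}
```

Calling GroupByLevel.firstAndSecond(input) returns the two nested lists as a pair, which can be destructured with val (first, second) = ....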

Modifying List of String in scala

I have an input file that I would like to read as a Scala stream, modify each record, and then write back out.
My input is as follows:
Name,id,phone-number
abc,1,234567
dcf,2,345334
I want to change the above input as follows:
Name,id,phone-number
testabc,test1,test234567
testdcf,test2,test345334
I am trying to read a file as scala stream as follows:
val inputList = Source.fromFile("/test.csv")("ISO-8859-1").getLines
After the above step I get an Iterator[String], which I then map:
val newList = inputList.map{line =>
line.split(',').map{s =>
"test" + s
}.mkString (",")
}.toList
But the new list is empty.
I am not sure whether I can define an empty list or an empty array and then append the modified records to it.
Any suggestions?

You might want to transform the iterator into a stream (tail drops the header row):
val l = Source.fromFile("test.csv")
.getLines()
.toStream
.tail
.map { row =>
row.split(',')
.map { col =>
s"test$col"
}.mkString (",")
}
l.foreach(println)
testabc,test1,test234567
testdcf,test2,test345334

Here's a similar approach that returns a List[Array[String]]. You can use mkString, toString, or similar if you want a String returned.
scala> scala.io.Source.fromFile("data.txt")
.getLines.drop(1)
.map(l => l.split(",").map(x => "test" + x)).toList
res3: List[Array[String]] = List(
Array(testabc, test1, test234567),
Array(testdcf, test2, test345334)
)
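
Both answers drop the header row, while the desired output in the question keeps it unchanged. A minimal sketch that preserves the header is shown below; PrefixCsv and prefixRows are hypothetical names. One possible cause of the empty list in the question (an assumption, since the full code is not shown) is that the Iterator returned by getLines had already been consumed once, so materialising it with toList first avoids that.

```scala
object PrefixCsv {
  // Prefixes every field of every data row; the header row is kept as-is.
  def prefixRows(lines: List[String], prefix: String): List[String] =
    lines match {
      case header :: rows =>
        // limit -1 keeps trailing empty fields instead of dropping them
        header :: rows.map(_.split(",", -1).map(prefix + _).mkString(","))
      case Nil => Nil
    }
}
```

It could be used as PrefixCsv.prefixRows(Source.fromFile("/test.csv")("ISO-8859-1").getLines.toList, "test").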

Raw string to list of tuples

I have the raw string below and I want to convert it to a List, a List of tuples, or a List of maps. Basically, I need to iterate over it with foreach.
val rawStr = "(foo,bar), (foo1,bar1), (foo3,bar3)"
How would I go for it?

Split the string on any of the characters (, "," or ), drop the blank pieces, and then group the rest in pairs:
rawStr.split(s"""[(|,|)]""").filterNot(s => s.isEmpty || s.trim.isEmpty)
.grouped(2)
.toList
.map(pair => (pair(0), pair(1))).toList
Scala REPL
scala> val rawStr = "(foo,bar), (foo1,bar1), (foo3,bar3)"
rawStr: String = "(foo,bar), (foo1,bar1), (foo3,bar3)"
scala> rawStr.split(s"""[(|,|)]""").filterNot(s => s.isEmpty || s.trim.isEmpty).grouped(2).toList.map(pair => (pair(0), pair(1))).toList
res13: List[(String, String)] = List(("foo", "bar"), ("foo1", "bar1"), ("foo3", "bar3"))

This one can also deal with invalid input:
"\\(([^,]+)\\s*,\\s*([^,]+)\\)".r
.findAllMatchIn(rawStr)
.map(m => m.group(1) -> m.group(2)).toMap
You can give it
val rawStr = "(foo,bar,baz), (foo1,bar1), (foo3,bar3)"
or
val rawStr = "(foo), (foo1,bar1), (foo3,bar3)"
and it won't crash
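
A variant of the regex approach that returns a List of tuples (so duplicate first elements are kept, unlike with toMap) might look like the sketch below. The object ParseTuples and the pattern itself are illustrative assumptions rather than the answer's exact code; malformed groups such as (foo) or (foo,bar,baz) are simply skipped.

```scala
object ParseTuples {
  // Matches "(a,b)" where neither part contains a comma or a parenthesis;
  // whitespace around each part is not captured.
  private val Pair = """\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)""".r

  def parse(raw: String): List[(String, String)] =
    Pair.findAllMatchIn(raw).map(m => (m.group(1), m.group(2))).toList
}
```

The result can then be traversed with ParseTuples.parse(rawStr).foreach { case (a, b) => println(s"$a -> $b") }.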

Not able to complete the scala Program to count the employee in each country

I am doing some basic programs in scala
import scala.io.Source
/* records.txt
USA,Surender
USA,Raja
CHINA,Yen
CHINA,Chen
INDIA,Adam
INDIA,Edward
*/
object ReadingFile
{
def main (args :Array[String]){
val fileLoc = "D:\\inputfiles\\records.txt"
val lines = Source.fromFile(fileLoc).getLines().toList
val linesSplit = lines.map(x => x.split(","))
val linesMap = linesSplit.map(x => (x(0),x(1)))
}
}
I don't know how to apply an aggregate function to linesMap. What do I need to add to my code to get the output below?
USA,2
CHINA,2
INDIA,2

Source.fromFile(fileLoc)
.getLines()
.toList
.map(_.split(","))
.groupBy(_(0))
.map(i => (i._1, i._2.size))
You can also use mapValues:
Source.fromFile(fileLoc)
.getLines()
.toList
.map(_.split(","))
.groupBy(_(0))
.mapValues(_.size)
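
To turn the grouped counts into the COUNTRY,count lines shown in the question, something like the following sketch could be used. CountByCountry, counts, and render are hypothetical names; because a Map has no guaranteed order, the output here is sorted alphabetically by country, which differs from the order in the question.

```scala
object CountByCountry {
  // Counts records per country from "COUNTRY,Name" lines.
  def counts(lines: List[String]): Map[String, Int] =
    lines
      .filter(_.trim.nonEmpty)   // ignore blank lines
      .map(_.split(",")(0).trim) // keep only the country column
      .groupBy(identity)
      .map { case (country, rows) => country -> rows.size }

  // Formats the counts as "COUNTRY,count" lines, sorted by country name.
  def render(lines: List[String]): List[String] =
    counts(lines).toList.sortBy(_._1).map { case (c, n) => s"$c,$n" }
}
```

For the file in the question this could be called as CountByCountry.render(Source.fromFile(fileLoc).getLines().toList).foreach(println).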

Spark scala RDD traversing

How can I traverse the following RDD using Spark Scala? I want to print every value present in the Seq together with its associated key.
res1: org.apache.spark.rdd.RDD[(java.lang.String, Seq[java.lang.String])] = MapPartitionsRDD[6] at groupByKey at <console>:14
I tried the following code for it:
val ss=mapfile.map(x=>{
val key=x._1
val value=x._2.sorted
var i=0
while (i < value.length) {
(key,value(i))
i += 1
}
}
)
ss.top(20).foreach(println)

I tried converting your code as follows:
val ss = mapfile.flatMap {
case (key, value) => value.sorted.map((key, _))
}
ss.top(20).foreach(println)
Is this what you want?
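
The while loop in the question evaluates to Unit, so map produces an RDD of Unit values; flatMap instead emits one (key, value) pair per element of each Seq. The same idea can be sketched on a plain Scala collection, without Spark; FlattenPairs and flatten are hypothetical names used only for illustration.

```scala
object FlattenPairs {
  // Expands each (key, values) entry into one (key, value) pair per value,
  // with the values of each key sorted first.
  def flatten(grouped: Seq[(String, Seq[String])]): Seq[(String, String)] =
    grouped.flatMap { case (key, values) => values.sorted.map(v => (key, v)) }
}
```

For example, FlattenPairs.flatten(Seq("a" -> Seq("y", "x"))).foreach(println) prints one pair per line.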

I tried this and it works for the return type mentioned in the question.
val ss = mapfile.flatMap { case (key, value) => value.sorted.map((key, _)) }.groupByKey().map(x => (x._1, x._2.toSeq))
ss.top(20).foreach(println)
Note: ss is of type org.apache.spark.rdd.RDD[(java.lang.String, Seq[java.lang.String])]
