I am very new to Scala, and I am not sure how this is done. I have googled it with no luck.
Let's assume the code is:
var arr = readLine().split(" ")
Now arr is a string array. Assuming I know that the line I input is a series of numbers e.g. 1 2 3 4, I want to convert arr to an Int (or int) array.
I know that I can convert individual elements with .toInt, but I want to convert the whole array.
Thank you and apologies if the question is dumb.
Applying a function to every element of a collection is done with .map:
scala> val arr = Array("1", "12", "123")
arr: Array[String] = Array(1, 12, 123)
scala> val intArr = arr.map(_.toInt)
intArr: Array[Int] = Array(1, 12, 123)
Note that the _.toInt notation is equivalent to x => x.toInt:
scala> val intArr = arr.map(x => x.toInt)
intArr: Array[Int] = Array(1, 12, 123)
Obviously this will throw an exception if one of the elements is not an integer:
scala> val arr = Array("1", "12", "123", "NaN")
arr: Array[String] = Array(1, 12, 123, NaN)
scala> val intArr = arr.map(_.toInt)
java.lang.NumberFormatException: For input string: "NaN"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
...
... 33 elided
Starting with Scala 2.13, you might want to use String::toIntOption in order to safely parse Strings into Option[Int]s, and thus also handle items that can't be parsed:
Array("1", "12", "abc", "123").flatMap(_.toIntOption)
// Array[Int] = Array(1, 12, 123)
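If you would rather keep track of which elements failed to parse, using map instead of flatMap preserves the Options (a small sketch, Scala 2.13+):

```scala
// map keeps a slot for every input element; failed parses show up as None.
val parsed = Array("1", "12", "abc", "123").map(_.toIntOption)
// returns Array(Some(1), Some(12), None, Some(123))
```

flatMap, as in the answer above, simply drops the None entries.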
I have data like the example below: an array containing different words.
scala> val x=rdd.flatMap(_.split(" "))
x: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[9] at flatMap at <console>:26
scala> x.collect
res41: Array[String] = Array(Roses, are, red, Violets, are, blue)
I want to find the length of each word in the array in Scala.
Spark allows you to chain functions defined on an RDD[T], which is RDD[String] in your case. You can add a map after your flatMap to get the lengths.
val sentence: String = "Apache Spark is a cluster compute engine"
val sentenceRDD = sc.makeRDD(List(sentence))
val wordLength = sentenceRDD.flatMap(_.split(" ")).map(_.length)
wordLength.take(2)
For instance, using your value x, we can do something like this to find the length of each word in the array:
>x.map(s => s -> s.length)
This will print out the following:
Array[(String, Int)] = Array((Roses,5), (are,3), (red,3), (Violets,7), (are,3), (blue,4))
If you are using Spark, then change it as follows:
>x.map(s => s -> s.length).collect()
This will print out the following:
Array[(String, Int)] = Array((Roses,5), (are,3), (red,3), (Violets,7), (are,3), (blue,4))
If you want only the length of each word then use this:
>x.map(_.length).collect()
Output:
Array(5,3,3,7,3,4)
You can simply write:
val a = Array("Roses", "are", "red", "Violets", "are", "blue")
val b = a.map(x => x.length)
This will give you Array[Int] = Array(5, 3, 3, 7, 3, 4)
val list = List("1","10","12","30","40","50")
Based on a parameter n (e.g. "12" here), the elements after "12" should form a
list List("30,40,50"), and the final list should be created as below.
Expected Output
List("1","10","12",List("30,40,50") )
list.dropWhile(_ != "12").tail gives List("30", "40", "50"), but I am not able to achieve the desired output.
partition will give the closest output to what you are looking for.
scala> list.partition(_ <= "12")
res21: (List[String], List[String]) = (List(1, 10, 12),List(30, 40, 50))
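Note that `_ <= "12"` compares the strings lexicographically, which happens to work for this data but would misplace e.g. "9" (since "9" <= "12" is false). If the elements are numeric, comparing their parsed values is safer (a small sketch):

```scala
val list = List("1", "10", "12", "30", "40", "50")
// Compare numeric values rather than strings to avoid lexicographic surprises.
val (upTo12, rest) = list.partition(_.toInt <= 12)
// upTo12: List("1", "10", "12"), rest: List("30", "40", "50")
```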
All elements of a List must have the same type. splitAt or partition can accomplish this, albeit with a different return type than you want. I suspect that the desired return type List[String, ..., List[String]] is a "type smell" that may indicate another issue.
Maybe span could help:
val (a, b) = list.span(_ != "12")
val h :: t = b
val res = a :+ h :+ List(t.mkString(","))
produces for input List("123", "10", "12", "30", "40", "50"):
List("123", "10", "12", List("30,40,50"))
If you can live with the output type (which is List[java.io.Serializable]), here is an acc method that takes the desired parameter, a String s in this case:
def acc(list: List[String], s: String) = {
  val i = list.indexOf(s)
  list.take(i + 1) :+ List(list.drop(i + 1).mkString(","))
}
In Scala REPL:
scala> val list = List("1","10","12","30","40","50")
list: List[String] = List(1, 10, 12, 30, 40, 50)
scala> acc(list,"12")
res29: List[java.io.Serializable] = List(1, 10, 12, List(30,40,50))
scala> acc(list,"10")
res30: List[java.io.Serializable] = List(1, 10, List(12,30,40,50))
scala> acc(list,"40")
res31: List[java.io.Serializable] = List(1, 10, 12, 30, 40, List(50))
scala> acc(list,"30")
res32: List[java.io.Serializable] = List(1, 10, 12, 30, List(40,50))
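As noted above regarding the return type, if List[java.io.Serializable] is undesirable, a variant returning a pair keeps everything statically typed (a sketch; accTyped is a hypothetical name):

```scala
// Hypothetical, fully-typed alternative: return the head section and the
// joined tail as a pair instead of mixing element types in one List.
def accTyped(list: List[String], s: String): (List[String], String) = {
  val i = list.indexOf(s)
  (list.take(i + 1), list.drop(i + 1).mkString(","))
}

accTyped(List("1", "10", "12", "30", "40", "50"), "12")
// returns (List("1", "10", "12"), "30,40,50")
```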
I'm trying to transform an RDD of tuple of Strings of this format :
(("abc","xyz","123","2016-02-26T18:31:56"),"15") TO
(("abc","xyz","123"),"2016-02-26T18:31:56","15")
Basically, separating out the timestamp string as a separate tuple element. I tried the following, but it's still not clean and correct.
val result = rdd.map(r => (r._1.toString.split(",").toVector.dropRight(1).toString, r._1.toString.split(",").toList.last.toString, r._2))
However, it results in
(Vector(("abc", "xyz", "123"),"2016-02-26T18:31:56"),"15")
The expected output I'm looking for is
(("abc", "xyz", "123"),"2016-02-26T18:31:56","15")
This way I can access the elements using r._1, r._2 (the timestamp string) and r._3 in a separate map operation.
Any hints/pointers will be greatly appreciated.
Vector.toString will include the String 'Vector' in its result. Instead, use Vector.mkString(",").
Example:
scala> val xs = Vector(1,2,3)
xs: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3)
scala> xs.toString
res25: String = Vector(1, 2, 3)
scala> xs.mkString
res26: String = 123
scala> xs.mkString(",")
res27: String = 1,2,3
However, if you want to be able to access (abc,xyz,123) as a Tuple and not as a string, you could also do the following:
val res = rdd.map {
  case ((a: String, b: String, c: String, ts: String), d: String) => ((a, b, c), ts, d)
}
I'm new to Scala (and Spark). I'm trying to read in a csv file and extract multiple arbitrary columns from the data. The following function does this, but with hard-coded column indices:
def readCSV(filename: String, sc: SparkContext): RDD[String] = {
  val input = sc.textFile(filename).map(line => line.split(","))
  val out = input.map(csv => csv(2) + "," + csv(4) + "," + csv(15))
  return out
}
Is there a way to use map with an arbitrary number of column indices passed to the function in an array?
If you have a sequence of indices, you could map over it and return the values:
scala> val m = List(List(1,2,3), List(4,5,6))
m: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6))
scala> val indices = List(0,2)
indices: List[Int] = List(0, 2)
// For each inner sequence, get the relevant values
// indices.map(inner) is the same as indices.map(i => inner(i))
scala> m.map(inner => indices.map(inner))
res1: List[List[Int]] = List(List(1, 3), List(4, 6))
// If you want to join all of them use .mkString
scala> m.map(inner => indices.map(inner).mkString(","))
res2: List[String] = List(1,3, 4,6) // that's actually a List containing 2 Strings
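Applied to the question's readCSV, the same idea works; the core of it, shown here without Spark for brevity (extractColumns is a hypothetical name), is:

```scala
// Hypothetical helper: pick arbitrary column indices out of each CSV line.
def extractColumns(lines: Seq[String], indices: Seq[Int]): Seq[String] =
  lines
    .map(_.split(","))
    .map(cols => indices.map(i => cols(i)).mkString(","))

extractColumns(Seq("a,b,c,d", "e,f,g,h"), Seq(0, 2))
// returns List("a,c", "e,g")
```

In the original function, the same map(cols => indices.map(i => cols(i)).mkString(",")) would replace the hard-coded csv(2)+","+csv(4)+","+csv(15), with indices: Seq[Int] passed in as a parameter. (Note that for an Array you need indices.map(i => cols(i)) rather than indices.map(cols), since Array, unlike List, does not extend Function1.)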
I have to remove all of the List's elements from the Array.
scala> var numbers=Array("321","3232","2401","7777","666","555")
numbers: Array[String] = Array(321, 3232, 2401, 7777, 666, 555)
scala> var nums=List("321","3232","2401")
nums: List[String] = List(321, 3232, 2401)
Would filter be useful here?
You should use numbers.diff(nums) - as simple as that:
scala> var numbers = Array("321", "3232", "2401", "7777", "666", "555")
numbers: Array[String] = Array(321, 3232, 2401, 7777, 666, 555)
scala> var nums = List("321", "3232", "2401")
nums: List[String] = List(321, 3232, 2401)
scala> numbers diff nums
res0: Array[String] = Array(7777, 666, 555)
Indeed, diff is the neatest and simplest approach; here are some other, more verbose ways:
numbers filterNot { nums.contains(_) }
for ( n <- numbers if !nums.contains(n) ) yield n
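If nums grows large, each contains call scans the whole list; converting it to a Set first makes the membership test effectively constant-time, and since Set[A] extends A => Boolean, a Set can be passed to filterNot directly (a small sketch):

```scala
val numbers = Array("321", "3232", "2401", "7777", "666", "555")
val numSet = Set("321", "3232", "2401") // fast membership tests
numbers.filterNot(numSet)
// returns Array("7777", "666", "555")
```

Note one behavioral difference: diff removes only as many occurrences as appear in the argument, while filterNot removes every occurrence.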