I need to remove all the elements of a List from an Array.
scala> var numbers=Array("321","3232","2401","7777","666","555")
numbers: Array[String] = Array(321, 3232, 2401, 7777, 666, 555)
scala> var nums=List("321","3232","2401")
nums: List[String] = List(321, 3232, 2401)
Would filter be useful here?
You should use numbers.diff(nums); it's as simple as that:
scala> var numbers = Array("321", "3232", "2401", "7777", "666", "555")
numbers: Array[String] = Array(321, 3232, 2401, 7777, 666, 555)
scala> var nums = List("321", "3232", "2401")
nums: List[String] = List(321, 3232, 2401)
scala> numbers diff nums
res0: Array[String] = Array(7777, 666, 555)
Using diff is indeed a neat and simple approach. Some other, more verbose ways:
numbers filterNot { nums.contains(_) }
for ( n <- numbers if !nums.contains(n) ) yield n
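The two families of solutions are not always interchangeable. diff computes a multiset difference (each element of nums cancels only one matching occurrence), while filterNot drops every occurrence. The sketch below uses hypothetical data with a duplicated "321" to show the difference, and converts nums to a Set so each membership check avoids scanning the whole List:

```scala
// Hypothetical data: "321" appears twice to expose the semantic difference
val numbers = Array("321", "321", "3232", "7777", "666")
val nums = List("321", "3232")

// diff is a multiset difference: each element of nums removes one occurrence
val viaDiff = numbers diff nums // Array(321, 7777, 666)

// filterNot removes every occurrence; a Set[String] is itself a String => Boolean
val numsSet = nums.toSet
val viaFilter = numbers.filterNot(numsSet) // Array(7777, 666)

println(viaDiff.mkString(", "))
println(viaFilter.mkString(", "))
```

If numbers never contains duplicates, both approaches give the same result.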
Related
I have data like the following, where an array holds different words:
scala> val x=rdd.flatMap(_.split(" "))
x: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[9] at flatMap at <console>:26
scala> x.collect
res41: Array[String] = Array(Roses, are, red, Violets, are, blue)
I want to find the length of each word in the array in Scala.
Spark allows you to chain the functions that are defined on an RDD[T], which is RDD[String] in your case. You can add a map call after your flatMap to get the lengths:
val sentence: String = "Apache Spark is a cluster compute engine"
val sentenceRDD = sc.makeRDD(List(sentence))
val wordLength = sentenceRDD.flatMap(_.split(" ")).map(_.length)
wordLength.take(2)
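Because RDD transformations such as flatMap and map mirror the methods on ordinary Scala collections, the same chain can be tried without a SparkContext. A minimal sketch, assuming a plain List stands in for the RDD (this is not Spark code, just the collections analogue):

```scala
// Same pipeline on a local List instead of an RDD (no Spark required)
val sentence: String = "Apache Spark is a cluster compute engine"
val wordLength = List(sentence).flatMap(_.split(" ")).map(_.length)

// take(2) on a List returns the first two lengths: "Apache" and "Spark"
println(wordLength.take(2)) // List(6, 5)
```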
For instance, I'll use your value x to demonstrate. We can do something like this to find the length of each word in the array:
>x.map(s => s -> s.length)
This will print out the following:
Array[(String, Int)] = Array((Roses,5), (are,3), (red,3), (Violets,7), (are,3), (blue,4))
If you are using Spark (so x is an RDD), change it as follows:
>x.map(s => s -> s.length).collect()
This will print out the following:
Array[(String, Int)] = Array((Roses,5), (are,3), (red,3), (Violets,7), (are,3), (blue,4))
If you only want the length of each word, use this:
>x.map(_.length).collect()
Output:
Array(5,3,3,7,3,4)
You can just write:
val a = Array("Roses", "are", "red", "Violets", "are", "blue")
val b = a.map(x => x.length)
This will give you Array[Int] = Array(5, 3, 3, 7, 3, 4)
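If you need to look up a word's length later rather than just list the lengths, the word/length pairs from the answers above can be turned into a Map. A small sketch; note that toMap keeps a single entry per key, so the repeated "are" collapses into one entry:

```scala
val a = Array("Roses", "are", "red", "Violets", "are", "blue")

// Pair each word with its length; toMap collapses the duplicate "are"
val lengthByWord: Map[String, Int] = a.map(w => w -> w.length).toMap

println(lengthByWord("Violets")) // 7
println(lengthByWord.size)       // 5
```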
Let's say I have the following list of strings:
val lines: List[String] = List("GOOL,1182", "AMZN,1920", "MSFT,124", "APPL,192.2")
In practice, this kind of list is typically obtained by reading a CSV file.
Conceptually, I would like to:
1. Split each line by ","
2. After splitting all the lines, assign the first column to one List and the second column to another List.
The approach I came up with is the following:
var col1List = List[String]()
var col2List = List[String]()
lines.foreach{ x =>
val cols = x split ","
col1List = col1List ::: List(cols(0))
col2List = col2List ::: List(cols(1))
}
Afterward, I got the following Lists:
List[String] = List(GOOL, AMZN, MSFT, APPL)
List[String] = List(1182, 1920, 124, 192.2)
Is there a better way to do this in Scala?
What you are looking for is the .unzip method.
Here's an example:
val lines: List[String] = List("GOOL,1182", "AMZN,1920", "MSFT,124", "APPL,192.2")
val (l1, l2) = lines.map(_.split(",")).map(arr => (arr.head, arr.last)).unzip
println((l1, l2))
Result:
(List(GOOL, AMZN, MSFT, APPL),List(1182, 1920, 124, 192.2))
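Since the second column holds numbers, you may want numeric values rather than strings. A sketch of the same unzip approach that converts the second field while pairing; it assumes every line has exactly two comma-separated fields and that the second one parses as a Double (toDouble throws a NumberFormatException otherwise):

```scala
val lines: List[String] = List("GOOL,1182", "AMZN,1920", "MSFT,124", "APPL,192.2")

// Split each line once, convert the second field, then unzip into two Lists
val (tickers, prices) = lines.map { line =>
  val fields = line.split(",")
  (fields(0), fields(1).toDouble)
}.unzip

println(tickers) // List(GOOL, AMZN, MSFT, APPL)
println(prices)  // List(1182.0, 1920.0, 124.0, 192.2)
```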
bottaio is kind of right: you need unzip. In fact, you need nothing but unzip:
val lines: List[String] = List("GOOL,1182", "AMZN,1920", "MSFT,124", "APPL,192.2")
val (xs, ys) = lines.unzip{ str => val a = str.split(","); (a(0), a(1)) }
println(xs)
println(ys)
// Output:
// List(GOOL, AMZN, MSFT, APPL)
// List(1182, 1920, 124, 192.2)
Notice that unzip itself accepts a function that transforms the entries of the list into pairs (it is declared as an implicit parameter, but you can pass it explicitly, as here).
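The same trick extends to three columns with unzip3, which takes a function producing triples. A sketch with hypothetical three-field lines; like the answer above, it assumes each line has at least three fields:

```scala
// Hypothetical three-column input
val lines = List("GOOL,1182,23", "AMZN,1920,57", "MSFT,124,345")

// unzip3 takes a function producing triples, just as unzip takes one producing pairs
val (col1, col2, col3) = lines.unzip3 { str =>
  val a = str.split(",")
  (a(0), a(1), a(2))
}

println(col1) // List(GOOL, AMZN, MSFT)
println(col3) // List(23, 57, 345)
```

For more than three columns, the transpose approach is the more general option.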
This works in general for any number of columns in the lines read from a .csv file into the lines List (as long as every line has the same number of fields):
lines.map(_.split(",")).transpose
In Scala REPL:
scala> val lines: List[String] = List("GOOL,1182", "AMZN,1920", "MSFT,124", "APPL,192.2")
lines: List[String] = List(GOOL,1182, AMZN,1920, MSFT,124, APPL,192.2)
scala> lines.map(_.split(",")).transpose
res30: List[List[String]] = List(List(GOOL, AMZN, MSFT, APPL), List(1182, 1920, 124, 192.2))
scala> val lines: List[String] = List("GOOL,1182,23,56", "AMZN,1920,57,21", "MSFT,124,345,987", "APPL,192.2,765,908")
lines: List[String] = List(GOOL,1182,23,56, AMZN,1920,57,21, MSFT,124,345,987, APPL,192.2,765,908)
scala> lines.map(_.split(",")).transpose
res29: List[List[String]] = List(List(GOOL, AMZN, MSFT, APPL), List(1182, 1920, 124, 192.2), List(23, 57, 345, 765), List(56, 21, 987, 908))
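One caveat with transpose: it throws an IllegalArgumentException when the rows have different sizes, which can happen with a malformed CSV line. The sketch below shows the failure and a defensive wrapper; the helper name columns and the ragged input are hypothetical:

```scala
import scala.util.Try

// Hypothetical input where the second line is missing its second field
val ragged = List("GOOL,1182", "AMZN")

// transpose requires every row to have the same size; otherwise it throws
val attempt = Try(ragged.map(_.split(",").toList).transpose)
println(attempt.isFailure) // true

// Hypothetical helper: transpose only when all rows have the same width
def columns(lines: List[String]): Option[List[List[String]]] = {
  val rows = lines.map(_.split(",").toList)
  if (rows.map(_.size).distinct.size <= 1) Some(rows.transpose) else None
}

println(columns(ragged)) // None
```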
Can someone suggest the best way to combine two lists in Scala so that the resulting list contains only the elements present in both lists?
Example:
List[Int] = List(10,20,30)
List[Int] = List(30,50)
Result: List[Int] = List(30)
Nested for loop (AND condition)
You can use a nested for comprehension:
val list1 = List(10, 20, 30)
val list2 = List(30, 50)
val result = for(value1 <- list1; value2 <- list2; if value1 == value2) yield value1
println(result)
which would print List(30)
intersect (built-in method)
You can use the intersect method, which gives you the values common to both lists:
println(list1.intersect(list2))
which should give you List(30)
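The two approaches above differ once duplicates appear. intersect is a multiset intersection (an element is kept as many times as it occurs in both lists), while the nested for comprehension yields one element per matching pair. A sketch with hypothetical lists containing a repeated 30:

```scala
// Hypothetical lists: 30 appears twice in list1 but once in list2
val list1 = List(30, 30, 10)
val list2 = List(30, 50)

// intersect keeps 30 only once, because list2 contains it once
val viaIntersect = list1.intersect(list2) // List(30)

// filter + contains keeps every occurrence from list1
val viaFilter = list1.filter(list2.contains(_)) // List(30, 30)

// The nested for comprehension yields one value per matching (v1, v2) pair
val viaFor = for (v1 <- list1; v2 <- list2; if v1 == v2) yield v1 // List(30, 30)
```

Appending .distinct to any of these gives each common value exactly once.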
Besides the intersect solution, you can also use a filter with a contains inside:
l1.filter(l2.contains(_))
With your input:
l1: List[Int] = List(10, 20, 30)
l2: List[Int] = List(30, 50)
Result will be:
List[Int] = List(30)
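For this particular input, dropWhile or takeWhile happen to give the same result, but they only examine a prefix of the list, so they are not a general replacement for intersect or filter: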
scala> l2.takeWhile(l1.contains(_))
res8: List[Int] = List(30)
scala> l1.dropWhile(!l2.contains(_))
res10: List[Int] = List(30)
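To see why takeWhile and dropWhile only work by coincidence here, move the common element away from the front. A sketch with hypothetical lists: takeWhile stops at the first non-match, and dropWhile keeps everything after the first match:

```scala
// Hypothetical lists where the common element is not at the front
val l1 = List(10, 30, 20)
val l2 = List(50, 30)

val viaFilter    = l1.filter(l2.contains(_))     // List(30)
val viaTakeWhile = l2.takeWhile(l1.contains(_))  // List(): stops immediately at 50
val viaDropWhile = l1.dropWhile(!l2.contains(_)) // List(30, 20): keeps the whole tail
```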
val a = List(1,1,1,0,0,2)
val b = List(1,0,3,2)
I want to get the list of indices of the elements of List b that exist in List a.
Here the output should be List(0, 1, 3).
I tried this
for(x <- a.filter(b.contains(_))) yield a.indexOf(x)
Edit: sorry, I missed this. The list sizes may vary; I have edited the lists.
Is there a better way to do this?
If you want a result of indices, it's often useful to start with indices.
b.indices.filter(a contains b(_))
REPL tested.
scala> val a = List(1,1,1,0,0,2)
a: List[Int] = List(1, 1, 1, 0, 0, 2)
scala> val b = List(1,0,3,2)
b: List[Int] = List(1, 0, 3, 2)
scala> b.indices.filter(a contains b(_))
res0: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 1, 3)
val result = (a zip b).zipWithIndex.flatMap {
case ((aItem, bItem), index) => if(aItem == bItem) Option(index) else None
}
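a zip b pairs the elements of a and b by position, stopping at the end of the shorter list.
For example, if a is longer, as in your example, the result would be List((1,1),(1,0),(1,3),(0,2)) (the list will be b.length long).
Then you also need the index; that's what zipWithIndex provides.
Since you only want the indexes, you return an Option[Int] and let flatMap flatten it away.
Note that this compares the elements at the same position, so for the edited lists above it returns List(0) rather than List(0, 1, 3).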
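The Option/flatMap step can also be written with collect, which filters and maps in one pass. A sketch of the same position-by-position comparison (it still answers "where do a and b hold the same value", not "which elements of b occur in a"):

```scala
val a = List(1, 1, 1, 0, 0, 2)
val b = List(1, 0, 3, 2)

// zip pairs by position and stops at the shorter list: (1,1), (1,0), (1,3), (0,2)
val pairs = a zip b

// collect keeps the index only when both elements at that position are equal
val samePositions = pairs.zipWithIndex.collect { case ((x, y), i) if x == y => i }
println(samePositions) // List(0)
```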
You can use an indexed for comprehension for this:
for{ i <- 0 to b.length-1
if (a contains b(i))
} yield i
scala> for(x <- b.indices.filter(a contains b(_))) yield x;
res27: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 1, 3)
Here is another option:
scala> val a = List(1,1,1,0,0,2)
a: List[Int] = List(1, 1, 1, 0, 0, 2)
scala> val b = List(1,0,3,2)
b: List[Int] = List(1, 0, 3, 2)
scala> b.zipWithIndex.filter(x => a.contains(x._1)).map(x => x._2)
res7: List[Int] = List(0, 1, 3)
I also want to point out that your original idea (finding the elements of b that are in a and then getting the indices of those elements) would not work unless all the elements of b contained in a are unique, because indexOf returns the index of the first occurrence. Just a heads-up.
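Running the original attempt makes the indexOf problem concrete: every repeated value maps back to its first position (and the indices refer to a, not b). The sketch then pairs each element of b with its index so no lookup is needed, converting a to a Set so each membership check avoids a linear scan of a:

```scala
val a = List(1, 1, 1, 0, 0, 2)
val b = List(1, 0, 3, 2)

// The original attempt: indexOf always returns the first occurrence in a
val attempt = for (x <- a.filter(b.contains(_))) yield a.indexOf(x)
println(attempt) // List(0, 0, 0, 3, 3, 5)

// Pair each element of b with its own index, keep indices whose element is in a
val aSet = a.toSet
val indices = b.zipWithIndex.collect { case (x, i) if aSet.contains(x) => i }
println(indices) // List(0, 1, 3)
```

Unlike b.indices.filter(a contains b(_)), this avoids calling b(i) on a List, which walks the list from the start on every call.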
I am very new to Scala, and I am not sure how this is done. I have googled it with no luck.
Let us assume the code is:
var arr = scala.io.StdIn.readLine().split(" ")
Now arr is a string array. Assuming I know that the input line is a series of numbers, e.g. 1 2 3 4, I want to convert arr to an Int (or int) array.
I know that I can convert individual elements with .toInt, but I want to convert the whole array.
Thank you and apologies if the question is dumb.
Applying a function to every element of a collection is done using .map:
scala> val arr = Array("1", "12", "123")
arr: Array[String] = Array(1, 12, 123)
scala> val intArr = arr.map(_.toInt)
intArr: Array[Int] = Array(1, 12, 123)
Note that the _.toInt notation is equivalent to x => x.toInt:
scala> val intArr = arr.map(x => x.toInt)
intArr: Array[Int] = Array(1, 12, 123)
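When the strings come from readLine().split(" ") as in the question, extra spaces produce empty strings, and "".toInt fails. A sketch assuming the input is a non-blank line of whitespace-separated integers (a blank line would still yield one empty string); the helper name parseInts and the sample line are hypothetical:

```scala
// Hypothetical input line with extra spaces, as a user might type it
val line = " 1  2 3 4 "

// Splitting on a single space leaves empty strings that toInt would reject
val naive = line.split(" ")

// Hypothetical helper: trim first, then split on runs of whitespace
def parseInts(s: String): Array[Int] = s.trim.split("\\s+").map(_.toInt)

val ints = parseInts(line)
println(ints.mkString(", ")) // 1, 2, 3, 4
```

With real input you would call parseInts(scala.io.StdIn.readLine()).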
Obviously, this will raise an exception if one of the elements is not an integer:
scala> val arr = Array("1", "12", "123", "NaN")
arr: Array[String] = Array(1, 12, 123, NaN)
scala> val intArr = arr.map(_.toInt)
java.lang.NumberFormatException: For input string: "NaN"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
...
... 33 elided
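On Scala versions before 2.13 (where toIntOption is not available), one way to skip unparseable items is to wrap the conversion in scala.util.Try. A minimal sketch:

```scala
import scala.util.Try

val arr = Array("1", "12", "123", "NaN")

// Try captures the NumberFormatException, toOption turns a failure into None,
// and flatMap drops the Nones
val intArr = arr.flatMap(s => Try(s.toInt).toOption)
println(intArr.mkString(", ")) // 1, 12, 123
```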
Starting with Scala 2.13, you might want to use String::toIntOption to safely convert Strings to Option[Int]s and thus also handle items that can't be converted:
Array("1", "12", "abc", "123").flatMap(_.toIntOption)
// Array[Int] = Array(1, 12, 123)
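flatMap silently drops bad items. If you would rather reject the whole input when any item fails to parse, a sketch (Scala 2.13+, since it uses toIntOption; the helper name parseAll is hypothetical):

```scala
// Hypothetical helper: succeed only if every element parses, otherwise None
def parseAll(xs: Array[String]): Option[Array[Int]] = {
  val parsed = xs.map(_.toIntOption)
  if (parsed.forall(_.isDefined)) Some(parsed.map(_.get)) else None
}

val ok  = parseAll(Array("1", "12", "123")) // Some(Array(1, 12, 123))
val bad = parseAll(Array("1", "12", "abc")) // None
```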