How to compare two lists using Scala?

I have two lists
val firstList = List(("A","B",12),("P","Q",13),("L","M",21))
val secondList = List(("A",11),("P",34),("L",43))
I want output as below
val outPutList = List(("P","Q",13,34),("L","M",21,43))
I want to compare the third member of each tuple in firstList to the second member of the corresponding tuple in secondList. That is, I want to keep only the entries where the secondList value (secondList.map(_._2)) is greater than the firstList value (firstList.map(_._3)).

Using a for comprehension as follows,
for ( ((a,b,m), (c,n)) <- (firstList zip secondList) if n > m) yield (a,b,m,n)
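The zip above relies on the two lists lining up by position. As a minimal sketch of a position-independent variant (my own addition, not part of the answer above), you could build a lookup Map from secondList and match on the key instead:
// Assumes the keys in secondList are unique, as in the example data.
val byKey = secondList.toMap // Map("A" -> 11, "P" -> 34, "L" -> 43)

val outPutList = for {
  (a, b, m) <- firstList
  n <- byKey.get(a).toList // skip entries with no matching key
  if n > m
} yield (a, b, m, n)
// List((P,Q,13,34), (L,M,21,43))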

Related

Passing parameters to foldLeft scala

I am trying to use Scala's foldLeft function to compare a Stream with a List of Lists.
This is a snippet of the code I have:
def freqParse(pairs: (Pair, Int, String), record: String): (Pair, Int, String) = {
  val m: Pair = ("", "")
  val t: FreqPairs = Map((m, 0))
  (pairs._1, pairs._2, pairs._3)
}
val freqItems = items.map(v => (v._1)).toList
val cross = freqItems.flatMap(x => freqItems.map(y => (x, y))) // cross product to get pairs of frequent items
val freq = lines.foldLeft((pairs, 0, delim))(freqParse)
cross is basically a list where each element is a pair of strings, List(List(a,b), List(a,c)...).
lines is an input stream with a record per line (25902 lines in total).
I want to count how many times each pair (or each element in cross) occurs in the entirety of the stream. Essentially comparing all elements in cross to all elements in lines.
I decided to use foldLeft because that way I can take each line in the stream, split it at a delimiter, and then check whether both elements of a pair from cross appear or not.
I am able to split each record in lines but I don't know how to pass the cross variable to the function to begin the comparison.
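One common way to do this is to leave foldLeft alone and pass the extra values through a curried helper, so they are captured by the closure. This is only a sketch under my own assumptions: cross as a List[(String, String)], lines as a collection of delimited records, and hypothetical sample values standing in for the real data.
type Pair = (String, String)
type FreqPairs = Map[Pair, Int]

// Hypothetical stand-ins for the real data.
val cross = List(("a", "b"), ("a", "c"))
val lines = List("a,b,d", "a,c,b", "b,c")

// cross and delim are fixed up front by currying, so the function passed to
// foldLeft only takes the accumulator and the current record.
def countPairs(cross: List[Pair], delim: String)(acc: FreqPairs, record: String): FreqPairs = {
  val items = record.split(delim).toSet
  cross.foldLeft(acc) { (counts, pair) =>
    if (items.contains(pair._1) && items.contains(pair._2))
      counts.updated(pair, counts.getOrElse(pair, 0) + 1)
    else counts
  }
}

val freq: FreqPairs = lines.foldLeft(Map.empty[Pair, Int])(countPairs(cross, ","))
// Map((a,b) -> 2, (a,c) -> 1)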

Spark-Scala: Map the first element of list with every other element of list when lists are of varying length

I have a dataset of the following type in a text file:
1004,bb5469c5|2021-09-19 01:25:30,4f0d-bb6f-43cf552b9bc6|2021-09-25 05:12:32,1954f0f|2021-09-19 01:27:45,4395766ae|2021-09-19 01:29:13,
1018,36ba7a7|2021-09-19 01:33:00,
1020,23fe40-4796-ad3d-6d5499b|2021-09-19 01:38:59,77a90a1c97b|2021-09-19 01:34:53,
1022,3623fe40|2021-09-19 01:33:00,
1028,6c77d26c-6fb86|2021-09-19 01:50:50,f0ac93b3df|2021-09-19 01:51:11,
1032,ac55-4be82f28d|2021-09-19 01:54:20,82229689e9da|2021-09-23 01:19:47,
I read the file using sc.textFile, which returns an RDD[String], after which I perform the operations .map(x=>x.substring(1,x.length()-1)).map(x=>x.split(",").toList)
After split.toList I want to map the first element of each of the lists obtained to every other element of the list for which I use .map(x=>(x(0),x(1))).toDF("c1","c2")
This works fine for the lists that have only one value after the split, but for obvious reasons it skips all other elements of lists having more than one value. For example:
.map(x=>(x(0),x(1))) returns [1020,23fe40-4796-ad3d-6d5499b|2021-09-19 01:38:59] but skips out on the third element here 77a90a1c97b|2021-09-19 01:34:53
How can I write a map function which returns [1020,23fe40-4796-ad3d-6d5499b|2021-09-19 01:38:59], [1020,77a90a1c97b|2021-09-19 01:34:53] given that all the lists created using .map(x=>x.split(",").toList) are of varying lengths (have varying number of elements)?
I noted the ',' at the end of each line, but split drops the trailing empty entries.
The solution is as follows; just try it and you will see it works:
// x._n cannot work here, since split returns an Array rather than a tuple.
val rdd = spark.sparkContext.textFile("/FileStore/tables/oddfile_01.txt")
val rdd2 = rdd.map(line => line.split(','))
val rdd3 = rdd2.map(x => (x(0), x.tail.toList))
val rdd4 = rdd3.flatMap{case (x, y) => y.map((x, _))}
rdd4.collect
Cardinality does change in this approach though.
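If you still want the DataFrame with columns c1 and c2 that the question was building towards, the pairs in rdd4 can be converted directly; this assumes a SparkSession named spark is in scope:
import spark.implicits._
val df = rdd4.toDF("c1", "c2")
df.show(false)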

Scala - conditional product/join of two arrays with default values using for comprehensions

I have two Sequences, say:
val first = Array("B", "L", "T")
val second = Array("T70", "B25", "B80", "A50", "M100", "B50")
How do I get a product such that each element of the first array is joined with every element of the second array that startsWith it, while also yielding a default result when no element in the second array meets the condition?
Effectively, to get an output:
expectedProductArray = Array("B-B25", "B-B80", "B-B50", "L-Default", "T-T70")
I tried doing,
val myProductArray: Array[String] = for {
  f <- first
  s <- second if s.startsWith(f)
} yield s"""$f-$s"""
and I get:
myProductArray = Array("B-B25", "B-B80", "B-B50", "T-T70")
Is there an idiomatic way of adding a default value for values in the first sequence that have no corresponding value in the second sequence under the given criteria? Appreciate your thoughts.
Here's one approach: turn the array second into a Map and look the elements of array first up in it with getOrElse:
val first = Array("B", "L", "T")
val second = Array("T70", "B25", "B80", "A50", "M100", "B50")
val m = second.groupBy(_(0).toString)
// m: scala.collection.immutable.Map[String,Array[String]] =
// Map(M -> Array(M100), A -> Array(A50), B -> Array(B25, B80, B50), T -> Array(T70))
first.flatMap(x => m.getOrElse(x, Array("Default")).map(x + "-" + _))
// res1: Array[String] = Array(B-B25, B-B80, B-B50, L-Default, T-T70)
In case you prefer using for-comprehension:
for {
  x <- first
  y <- m.getOrElse(x, Array("Default"))
} yield s"$x-$y"
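Note that the Map above is keyed on the first character, which is enough here because every element of first is a single letter. If the prefixes could be longer than one character, a sketch that filters with startsWith directly (my own variant, not from the answer) would be:
first.flatMap { f =>
  val matches = second.filter(_.startsWith(f))
  if (matches.isEmpty) Array(s"$f-Default") else matches.map(m => s"$f-$m")
}
// Array(B-B25, B-B80, B-B50, L-Default, T-T70)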

change a list column order scala

I have the following list and I want to change the order of the columns:
val list=List("banana,QS,1,0,0",
"apple,EE,1,2,1",
"peas,US,1,4,4")
The expected result is:
val list=List("banana,QS,0,1,0",
"apple,EE,1,1,2",
"peas,US,4,1,4")
Best Regards
If you are looking to swap the 3rd column with the 4th:
- split on ","
- construct a new List with the two columns swapped
- mkString the List back into a comma-separated string
For example,
scala> val list = List("banana,QS,1,0,0", "apple,EE,1,2,1", "peas,US,1,4,4")
list: List[String] = List(banana,QS,1,0,0, apple,EE,1,2,1, peas,US,1,4,4)
scala> list.map(_.split(",")).map(elem => List(elem(0), elem(1), elem(3), elem(2), elem(4)).mkString(","))
res0: List[String] = List(banana,QS,0,1,0, apple,EE,2,1,1, peas,US,4,1,4)
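A slightly more general sketch (a hypothetical helper, not part of the answer) that swaps two arbitrary zero-based column indices, so wider rows do not need every column spelled out by hand:
def swapColumns(row: String, i: Int, j: Int): String = {
  val cols = row.split(",")
  val tmp = cols(i); cols(i) = cols(j); cols(j) = tmp
  cols.mkString(",")
}

list.map(swapColumns(_, 2, 3))
// List(banana,QS,0,1,0, apple,EE,2,1,1, peas,US,4,1,4)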

Scala indices of values in one list which are not in a second list

I am trying to find the indices of elements in one Scala list which are not present in a second list (assume the second list has distinct elements so that you do not need to invoke toSet on it). The best way I found is:
val ls = List("a", "b", "c") // the list
val excl = List("c", "d") // the list of items to exclude
val ixs = ls.zipWithIndex.
  filterNot { p => excl.contains(p._1) }.
  map { p => p._2 } // the list of indices
However, I feel there should be a more direct method. Any hints?
Seems OK to me. It's a bit more elegant as a for-comprehension, perhaps:
for ((e,i) <- ls.zipWithIndex if !excl.contains(e)) yield i
And for efficiency, you might want to make excl into a Set anyway
val exclSet = excl.toSet
for ((e,i) <- ls.zipWithIndex if !exclSet(e)) yield i
One idea would be this:
(ls.zipWithIndex.toMap -- excl).values
This only works, however, if you are not interested in every position when an element occurs multiple times in the list. That would need a MultiMap, which Scala does not have in the standard library.
Another version would be to use a partial function with collect, converting the second list to a Set first (unless it is really small, lookup in a Set will be much faster):
val set = excl.toSet
ls.zipWithIndex.collect{case (x,y) if !set(x) => y}
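A quick check with the sample data from the question:
val ls = List("a", "b", "c")
val excl = List("c", "d")
val set = excl.toSet
ls.zipWithIndex.collect { case (x, i) if !set(x) => i }
// List(0, 1)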