Scala Spark: how to compare two lists of tuples

I would like to find out whether each tuple in my firstArray exists in my secondArray of tuples.
If it does not, I would like to return every tuple that doesn't match, and report exactly which element of the tuple doesn't match.
It could be something like:
for each element (x,y) in firstArray:
for each element (k,z) in secondArray:
if (x != k) print(something)
return (x,y)
if (y != z) print(something)
return (x,y)
Example:
val firstArray: Array[(String,String)] = Array(("elem1","elem2"), ("elem3","elem4"))
val secondArray: Array[(String,String)] = Array(("elem1","elem2"), ("elem5","elem4"), ("elem3","elem7"))
Desired output
Output:
("elem3","elem4") is eliminated because elem4 doesn't match elem7
val result: Array[(String,String)] = Array(("elem3","elem4"))

You can try something like:
val res = firstArray.filterNot(secondArray.contains(_))
It will return the elements of first array that are not present in the second.
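As a quick check, here is a minimal sketch of the same filterNot call applied to the arrays from the question. The wrapper object FilterNotExample and the intermediate Set (named known) are additions for illustration only; the Set is there so that each membership test does not rescan the second array.

```scala
object FilterNotExample {
  val firstArray: Array[(String, String)] =
    Array(("elem1", "elem2"), ("elem3", "elem4"))
  val secondArray: Array[(String, String)] =
    Array(("elem1", "elem2"), ("elem5", "elem4"), ("elem3", "elem7"))

  // Assumption: converting to a Set is optional; it only speeds up lookups.
  val known: Set[(String, String)] = secondArray.toSet

  // Keep the tuples of firstArray that have no exact match in secondArray.
  val res: Array[(String, String)] = firstArray.filterNot(known.contains)
}
```

With this data, res holds only ("elem3","elem4"), which is the result the question asks for.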
Edit
The following code will loop over the two arrays, and compare the tuples
for {
(i,j) <- firstArray
(k,l) <- secondArray
}
{
println((i,j) match {
case (a,b) if (a == k && b ==l) => "Tuple found"
case (a,_) if (a == k)=> "First elem only found."
case (_,b) if (b ==l)=> "Second elem only found."
case _ => "No match"
})
}
Hope this will help
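If the goal is to return the unmatched tuples together with the reason, rather than print a line for every combination, the two ideas above can be combined. This is only a sketch: the object MismatchReport, the method unmatched and the message format are made up for illustration and are not part of the original answer.

```scala
object MismatchReport {
  type Pair = (String, String)

  // For every pair of `first` with no exact match in `second`, collect a
  // reason for each pair of `second` that matches on exactly one element.
  def unmatched(first: Seq[Pair], second: Seq[Pair]): Seq[(Pair, Seq[String])] =
    first
      .filterNot(p => second.contains(p))
      .map { case (x, y) =>
        val reasons = second.collect {
          case (k, z) if x == k && y != z => s"$y doesn't match $z"
          case (k, z) if x != k && y == z => s"$x doesn't match $k"
        }
        ((x, y), reasons)
      }
}
```

For the example data this yields a single entry for ("elem3","elem4") with two reasons, one for each partially matching tuple in the second collection.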

Related

Add a marker character between duplicate pair in list

I am working on an exercise where I need to figure out how to add a designated marker char between two duplicate elements in a list.
input - a string
output - a list of string pairs
There are two rules:
If the input string has duplicate characters, a char x needs to be added between them. For example, trees becomes tr, ex, es.
If the duplicate char pair is xx, a q is added between them. For example, boxx becomes bo, xq, x.
Both rules run together on the input. For example, if the input is HelloScalaxxxx, the output should be List("He", "lx", "lo", "Sc", "al", "ax", "xq", "xq", "x").
I got the first rule working with the following code, but I am struggling to satisfy the second rule.
input.foldRight[List[Char]](Nil) {
case (h, t) =>
println(h :: t)
if (t.nonEmpty) {
(h, t.head) match {
case ('x', 'x') => t ::: List(h, 'q')
case _ => if (h == t.head) h :: 'x' :: t else h :: t
}
} else h :: t
}
.mkString("").grouped(2).toSeq
I think I am close, for the input HelloScalaxxxx it produces List("He", "lx", "lo", "Sc", "al", "ax", "xq", "xq", "xq"), but with an extra q in the last pair.
I don't want to use a regex-based solution. Looking for an idiomatic Scala version.
I tried searching for existing answers but no luck. Any help would be appreciated. Thank you.
I assume you want to apply the xx rule first...but you can decide.
"Trees & Scalaxxxx"
.replaceAll("(x)(?=\\1)","$1q")
.replaceAll("([^x])(?=\\1)","$1x")
.grouped(2).toList
//res0: List[String] = List(Tr, ex, es, " &", " S", ca, la, xq, xq, xq, x)
And here's the non-regex offering.
"Trees & Scalaxxxx"
.foldLeft(('_',"")){
case (('x',acc),'x') => ('x', s"${acc}qx")
case ((p,acc),c) if c == p &&
p.isLetter => ( c , s"${acc}x$c")
case ((_,acc),c) => ( c , s"$acc$c")
}._2.grouped(2).toList
Tail recursive solution
def processString(input: String): List[String] = {
@scala.annotation.tailrec
def inner(buffer: List[String], str: String): List[String] = {
// recursion ending condition. Nothing left to process
if (str.isEmpty) return buffer
val c0 = str.head
val c1 = if (str.isDefinedAt(1)) {
str(1)
} else {
// recursion ending condition. Only head remains.
return buffer :+ c0.toString
}
val (newBuffer, remainingString) =
(c0, c1) match {
case ('x', 'x') => (buffer :+ "xq", str.substring(1))
case (_, _) if c0 == c1 => (buffer :+ s"${c0}x", str.substring(1))
case _ => (buffer :+ s"$c0$c1", str.substring(2))
}
inner(newBuffer, remainingString)
}
// start here. Pass empty buffer and complete input string
inner(List.empty, input)
}
println(processString("trees"))
println(processString("boxx"))
println(processString("HelloScalaxxxx"))
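The same pair-by-pair logic can also be sketched with pattern matching on a List[Char], which avoids the explicit return statements and the repeated appends to the end of the buffer. The object PairMarker and the method pairs are hypothetical names; the rules themselves are the ones from the tail-recursive solution above.

```scala
object PairMarker {
  // Walk the characters two at a time; when a pair is a duplicate, emit the
  // first char plus a marker and re-examine the second char on the next step.
  @scala.annotation.tailrec
  def pairs(chars: List[Char], acc: List[String] = Nil): List[String] =
    chars match {
      case Nil                    => acc.reverse
      case c :: Nil               => (c.toString :: acc).reverse
      case 'x' :: 'x' :: _        => pairs(chars.tail, "xq" :: acc)
      case a :: b :: _ if a == b  => pairs(chars.tail, s"${a}x" :: acc)
      case a :: b :: rest         => pairs(rest, s"$a$b" :: acc)
    }
}
```

Calling PairMarker.pairs("HelloScalaxxxx".toList) gives List("He", "lx", "lo", "Sc", "al", "ax", "xq", "xq", "x"), the output requested in the question.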

Scala count number of times function returns each value, functionally

I want to count the number of times that a function f returns each value in its range (0 to f_max, inclusive) when applied to a given list l, and return the result as an array, in Scala.
Currently, I accomplish this as follows:
def count (l: List): Array[Int] = {
val arr = new Array[Int](f_max + 1)
l.foreach {
el => arr(f(el)) += 1
}
return arr
}
So arr(n) is the number of times that f returns n when applied to each element of l. This works; however, it is imperative style, and I am wondering if there is a clean way to do this purely functionally.
Thank you
How about a more general approach:
def count[InType, ResultType](l: Seq[InType], f: InType => ResultType): Map[ResultType, Int] = {
l.view // create a view so we don't create new collections after each step
.map(f) // apply your function to every item in the original sequence
.groupBy(x => x) // group the returned values
.map(x => x._1 -> x._2.size) // count returned values
}
val f = (i:Int) => i
count(Seq(1,2,3,4,5,6,6,6,4,2), f)
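The map returned by the general approach has no entry for values that f never produces. To get back an indexed result like the one in the question, the map can be expanded afterwards. A minimal sketch, assuming f returns an Int between 0 and fMax; here f and fMax are passed as parameters (the question treats them as being in scope), and CountValues is a made-up wrapper name.

```scala
object CountValues {
  // Count how often f yields each value in 0..fMax, without mutation.
  def count[A](l: List[A], f: A => Int, fMax: Int): Vector[Int] = {
    // Group the elements by their result, then keep only the group sizes.
    val counts: Map[Int, Int] =
      l.groupBy(f).map { case (value, hits) => value -> hits.size }
    // Fill in 0 for every value that never occurred.
    Vector.tabulate(fMax + 1)(n => counts.getOrElse(n, 0))
  }
}
```

For example, CountValues.count(List(1,2,3,4,5,6,6,6,4,2), (i: Int) => i, 6) gives Vector(0, 1, 2, 1, 2, 1, 3).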
l.foldLeft(Vector.fill(f_max + 1)(0)) { (acc, el) =>
val result = f(el)
acc.updated(result, acc(result) + 1)
}
Alternatively, a good balance of performance and external purity would be:
def count(l: List[???]): Vector[Int] = {
val arr = l.foldLeft(Array.fill(f_max + 1)(0)) { (acc, el) =>
val result = f(el)
acc(result) += 1
acc // return the accumulator; without this the lambda would return Unit
}
arr.toVector
}

How to skip keys in map function on map in scala

Given a map of type Map[String, String], I want to know how to skip a key while mapping over it:
val m = Map("1"-> "1", "2"-> "2")
m.map[(String, String), Map[String, String]].map{
case(k,v)=>
if (v == "1") {
// Q1: how to skip this key
// Do not need to return anything
} else {
// If the value is value that I want, apply some other transformation on it
(k, someOtherTransformation(v))
}
}
.collect does exactly what you want: it takes a partial function, and any element (a pair, for a Map) for which the function is not defined is dropped:
Map("1"-> "1", "2"-> "2").collect { case (k, v) if v != "1" => (k, v * 2) }
//> scala.collection.immutable.Map[String,String] = Map(2 -> 22)
Here the partial function is defined only for v != "1" (because of the guard), hence the element with v == "1" is dropped.
You could put a "guard" on your case clause ...
case (k,v) if v != "1" => // apply some transformation on it
case (k,v) => (k,v) // leave as is
... or simply leave the elements you're not interested in unchanged.
case (k,v) => if (v == "1") (k,v) else // apply some transformation on it
The output of map is a new collection the same size as the input collection with all/some/none of the elements modified.
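The difference in size can be seen by running both operations on the map from the question. This is a small sketch; the object name MapVsCollect is made up, and doubling the string with v * 2 simply stands in for whatever transformation is needed.

```scala
object MapVsCollect {
  val m: Map[String, String] = Map("1" -> "1", "2" -> "2")

  // map: every input pair produces exactly one output pair.
  val mapped: Map[String, String] =
    m.map { case (k, v) => if (v == "1") (k, v) else (k, v * 2) }

  // collect: pairs for which the partial function is undefined are dropped.
  val collected: Map[String, String] =
    m.collect { case (k, v) if v != "1" => (k, v * 2) }
}
```

Here mapped still has two entries, Map("1" -> "1", "2" -> "22"), while collected has only one, Map("2" -> "22").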
Victor Moroz's answer is good for this case, but for cases where you can't make the decision on whether to skip immediately in the pattern match, use flatMap:
Map("1"-> "1", "2"-> "2").flatMap {
case (k,v) =>
val v1 = someComplexCalculation(k, v)
if (v1 < 0) {
None
} else {
// If the value is value that I want, apply some other transformation on it
Some((k, someOtherTransformation(v1)))
}
}
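A runnable variant of the flatMap idea might look like the following. The helper parse is a hypothetical stand-in for someComplexCalculation, and doubling the number stands in for someOtherTransformation; neither comes from the original answer.

```scala
import scala.util.Try

object FlatMapSkip {
  // Hypothetical calculation whose outcome is only known after running it.
  def parse(v: String): Option[Int] = Try(v.toInt).toOption

  val m: Map[String, String] = Map("a" -> "10", "b" -> "oops", "c" -> "-3")

  // Returning None skips the key; returning Some keeps the transformed pair.
  val kept: Map[String, Int] = m.flatMap { case (k, v) =>
    parse(v).filter(_ >= 0).map(n => k -> n * 2)
  }
}
```

Only the key a survives: b is not a number and c is negative, so both are skipped.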
Why not use .filterNot to remove all unwanted values (according to your condition) and then a .map?
Sample code:
Map("1"-> "1", "2" -> "2").filterNot(_._2 == "1").map(someFunction)
//someFunction -> whatever you would implement

Tune Nested Loop in Scala

I was wondering if I can tune the following Scala code :
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] = {
var listNoDuplicates: List[(Class1, Class2)] = Nil
for (outerIndex <- 0 until listOfTuple.size) {
if (outerIndex != listOfTuple.size - 1)
for (innerIndex <- outerIndex + 1 until listOfTuple.size) {
if (listOfTuple(outerIndex)._1.flag.equals(listOfTuple(innerIndex)._1.flag))
listNoDuplicates = listOfTuple(outerIndex) :: listNoDuplicates
}
}
listNoDuplicates
}
Usually, if you have something that looks like:
var accumulator: A = new A
for( b <- collection ) {
accumulator = update(accumulator, b)
}
val result = accumulator
it can be converted into something like:
val result = collection.foldLeft( new A ){ (acc,b) => update( acc, b ) }
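As a concrete instance of this rewrite, here is a sketch that sums a list both ways. The names LoopToFold, viaLoop and viaFold are illustrative only.

```scala
object LoopToFold {
  val xs: List[Int] = List(1, 2, 3, 4)

  // Imperative version: a var that is updated on every iteration.
  def viaLoop: Int = {
    var accumulator = 0
    for (x <- xs) {
      accumulator = accumulator + x
    }
    accumulator
  }

  // Functional version: the accumulator is threaded through foldLeft.
  val viaFold: Int = xs.foldLeft(0) { (acc, x) => acc + x }
}
```

Both compute 10; the fold simply makes the initial value and the update step explicit arguments.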
So here we can first use a map to enforce the uniqueness of flags. Supposing the flag has type F:
val result = listOfTuples.foldLeft( Map[F,(ClassA,ClassB)]() ){
( map, tuple ) => map + ( tuple._1.flag -> tuple )
}
Then the remaining tuples can be extracted from the map and converted to a list:
val uniqList = result.values.toList
It will keep the last tuple encountered. If you want to keep the first one, replace foldLeft with foldRight and swap the arguments of the lambda.
Example:
case class ClassA( flag: Int )
case class ClassB( value: Int )
val listOfTuples =
List( (ClassA(1),ClassB(2)), (ClassA(3),ClassB(4)), (ClassA(1),ClassB(-1)) )
val result = listOfTuples.foldRight( Map[Int,(ClassA,ClassB)]() ) {
( tuple, map ) => map + ( tuple._1.flag -> tuple )
}
val uniqList = result.values.toList
//uniqList: List((ClassA(1),ClassB(2)), (ClassA(3),ClassB(4)))
Edit: If you need to retain the order of the initial list, use instead:
val uniqList = listOfTuples.filter( result.values.toSet )
This compiles, but as I can't test it it's hard to say if it does "The Right Thing" (tm):
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] =
(for {outerIndex <- 0 until listOfTuple.size
if outerIndex != listOfTuple.size - 1
innerIndex <- outerIndex + 1 until listOfTuple.size
if listOfTuple(outerIndex)._1.flag == listOfTuple(innerIndex)._1.flag
} yield listOfTuple(outerIndex)).reverse.toList
Note that you can use == instead of equals (use eq if you need reference equality).
BTW: https://codereview.stackexchange.com/ is better suited for this type of question.
Do not index into lists (as in listOfTuple(i)). Indexed access on a List is O(n) per lookup, so performance is very poor. So, some ways...
The easiest:
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] =
SortedSet(listOfTuple: _*)(Ordering by (_._1.flag)).toList
This will preserve the last element of the list. If you want it to preserve the first element, pass listOfTuple.reverse instead. Because of the sorting, performance is, at best, O(nlogn). So, here's a faster way, using a mutable HashSet:
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] = {
// Use a hash set to track the flags already seen
import scala.collection.mutable.HashSet
val seen = HashSet[Flag]()
// now fold
listOfTuple.foldLeft(Nil: List[(Class1,Class2)]) {
case (acc, el) =>
val result = if (seen(el._1.flag)) acc else el :: acc
seen += el._1.flag
result
}.reverse
}
One can avoid using a mutable HashSet in two ways:
Make seen a var, so that it can be updated.
Pass the set along with the list being created in the fold. The case then becomes:
case ((seen, acc), el) =>
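Spelled out, the second option could look like the sketch below. The case classes Class1 and Class2 and the Int type of flag are assumptions made so the example is self-contained; the question does not show their definitions.

```scala
object DedupByFlag {
  // Hypothetical stand-ins for the classes in the question.
  final case class Class1(flag: Int)
  final case class Class2(value: Int)

  def removeDuplicates(listOfTuple: List[(Class1, Class2)]): List[(Class1, Class2)] = {
    // The accumulator is a pair: the flags seen so far and the kept tuples.
    val start = (Set.empty[Int], List.empty[(Class1, Class2)])
    val (_, kept) = listOfTuple.foldLeft(start) {
      case ((seen, acc), el) =>
        val flag = el._1.flag
        if (seen(flag)) (seen, acc) else (seen + flag, el :: acc)
    }
    // Elements were prepended, so reverse to restore the original order.
    kept.reverse
  }
}
```

This keeps the first tuple for each flag and preserves the order of the input list, without any mutable state.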

What is the Scala way of finding whether all the elements of an Array have the same length?

I am new to Scala but have long experience with Java, and I have some understanding of FP languages like Haskell.
I am wondering how to implement this in Scala. There is an array whose elements are all strings, and I just want to know if there is a way to do this in Scala in an FP way. Here is my current version, which works:
def checkLength(vals: Array[String]): Boolean = {
var len = -1
for(x <- vals){
if(len < 0)
len = x.length()
else{
if (x.length() != len)
return false
else
len = x.length()
}
}
return true;
}
And I am pretty sure there is a better way of doing this in Scala/FP...
list.forall( str => str.size == list(0).size )
Edit: Here's a definition that's as general as possible and also allows checking whether a property other than length is the same for all elements:
def allElementsTheSame[T,U](f: T => U)(list: Seq[T]) = {
val first: Option[U] = list.headOption.map( f(_) )
list.forall( f(_) == first.get ) //safe to use get here!
}
type HasSize = { val size: Int }
val checkLength = allElementsTheSame((x: HasSize) => x.size)_
checkLength(Array( "123", "456") )
checkLength(List( List(1,2), List(3,4) ))
Since everyone seems to be so creative, I'll be creative too. :-)
def checkLength(vals: Array[String]): Boolean = vals.map(_.length).removeDuplicates.size <= 1
Mind you, removeDuplicates will likely be named distinct on Scala 2.8.
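On Scala 2.8 and later, where the method is indeed called distinct, the same one-liner might be written as in this sketch. The wrapper object SameLength is added only to make the snippet self-contained.

```scala
object SameLength {
  // Map every string to its length, drop repeated lengths, and check that
  // at most one distinct length remains (an empty array also passes).
  def checkLength(vals: Array[String]): Boolean =
    vals.map(_.length).distinct.length <= 1
}
```

Note that this visits every element even when the first two already differ, so forall is the better choice when early exit matters.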
Tip: Use forall to determine whether all elements in the collection do satisfy a certain predicate (e.g. equality of length).
If you know that your lists are always non-empty, a straight forall works well. If you don't, it's easy to add that in:
list match {
case x :: rest => rest forall (_.size == x.size)
case _ => true
}
Now lists of length zero return true instead of throwing exceptions.
list.groupBy{_.length}.size == 1
You convert the list into a map of groups of equal length strings. If all the strings have the same length, then the map will hold only one such group.
The nice thing about this solution is that you don't need to know anything about the length of the strings, and don't need to compare them to, say, the first string. It also works on an empty list, in which case it returns false (if that's what you want).
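How the empty case is treated depends only on the comparison used on the number of groups. A small sketch of both choices; the names GroupByLength, sameLengthStrict and sameLengthLenient are made up for illustration.

```scala
object GroupByLength {
  // Exactly one group: an empty list has zero groups, so this returns false.
  def sameLengthStrict(list: List[String]): Boolean =
    list.groupBy(_.length).size == 1

  // At most one group: an empty list is treated as trivially uniform.
  def sameLengthLenient(list: List[String]): Boolean =
    list.groupBy(_.length).size <= 1
}
```

Both variants agree on non-empty lists and differ only on Nil.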
Here's another approach:
def check(list:List[String]) = list.foldLeft(true)(_ && list.head.length == _.length)
Just my €0.02
def allElementsEval[T, U](xs: Iterable[T])(f: T => U) =
if (xs.isEmpty) true
else {
val first = f(xs.head)
xs forall { f(_) == first }
}
This works with any Iterable, evaluates f the minimum number of times possible, and while the block can't be curried, the type inferencer can infer the block parameter type.
"allElementsEval" should "return true for an empty Iterable" in {
allElementsEval(List[String]()){ x => x.size } should be (true)
}
it should "eval the function at each item" in {
allElementsEval(List("aa", "bb", "cc")) { x => x.size } should be (true)
allElementsEval(List("aa", "bb", "ccc")) { x => x.size } should be (false)
}
it should "work on Vector and Array as well" in {
allElementsEval(Vector("aa", "bb", "cc")) { x => x.size } should be (true)
allElementsEval(Vector("aa", "bb", "ccc")) { x => x.size } should be (false)
allElementsEval(Array("aa", "bb", "cc")) { x => x.size } should be (true)
allElementsEval(Array("aa", "bb", "ccc")) { x => x.size } should be (false)
}
It's just a shame that head :: tail pattern matching fails so insidiously for Iterables.