Comparing items in two lists

Comparing items in two lists - scala

I have two lists: List(1,1,1) and List(1,0,1).
I want to get the following:
1) A count of every position where the first list contains a 1 and the second list contains a 0, and vice versa.
In the above example this would be 1 and 0, since the first list contains a 1 at the middle position and the second list contains a 0 at that same position.
2) A count of every position where both lists contain a 1.
In the above example this is two, since both lists contain a 1 at two positions. I can get this using the intersect method of class List.
I am just looking for an answer to point 1 above. I could use an iterative approach to count the items, but is there a more functional method?
Here is the entire code :
class Similarity {
  def getSimilarity(number1: List[Int], number2: List[Int]) = {
    val num: List[Int] = number1.intersect(number2)
    println("P is " + num.length)
  }
}
object HelloWorld {
  def main(args: Array[String]): Unit = {
    val s = new Similarity
    s.getSimilarity(List(1, 1, 1), List(1, 0, 1))
  }
}

For the first one:
scala> val a = List(1,1,1)
a: List[Int] = List(1, 1, 1)
scala> val b = List(1,0,1)
b: List[Int] = List(1, 0, 1)
scala> a.zip(b).filter(x => x._1==1 && x._2==0).size
res7: Int = 1
For the second:
scala> a.zip(b).filter(x => x._1==1 && x._2==1).size
res7: Int = 2

You can easily count all combinations and collect them in a map with
def getSimilarity(number1 : List[Int] , number2 : List[Int]) = {
  // sorry for the one-liner, explanation follows
  val countMap = (number1 zip number2) groupBy (identity) mapValues {_.length}
  countMap // return the map so that callers can look counts up
}
/*
* Example
* number1 = List(1,1,0,1,0,0,1)
* number2 = List(0,1,1,1,0,1,1)
*
* countMap = Map((1,0) -> 1, (1,1) -> 3, (0,1) -> 2, (0,0) -> 1)
*/
The trick is a common one
// zip the elements pairwise
(number1 zip number2)
/* List((1,0), (1,1), (0,1), (1,1), (0,0), (0,1), (1,1))
*
* then group together with the identity function, so pairs
* with the same elements are grouped together and the key is the pair itself
*/
.groupBy(identity)
/* Map( (1,0) -> List((1,0)),
* (1,1) -> List((1,1), (1,1), (1,1)),
* (0,1) -> List((0,1), (0,1)),
* (0,0) -> List((0,0))
* )
*
* finally you count the pairs mapping the values to the length of each list
*/
.mapValues(_.length)
/* Map( (1,0) -> 1,
* (1,1) -> 3,
* (0,1) -> 2,
* (0,0) -> 1
* )
*/
Then all you need to do is look up the pair you are interested in on the map.
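The lookup step can be sketched as follows. This is only an illustration: pairCounts is a hypothetical helper name, and it uses a strict map over the grouped entries instead of mapValues so that the result is a plain Map on any Scala 2.12+ version. Pairs that never occur are absent from the map, hence the getOrElse with a default of 0.

```scala
// Hypothetical helper name; builds the pair-count map described above.
def pairCounts(xs: List[Int], ys: List[Int]): Map[(Int, Int), Int] =
  xs.zip(ys).groupBy(identity).map { case (pair, hits) => pair -> hits.length }

val counts = pairCounts(List(1, 1, 1), List(1, 0, 1))

// Pairs that never occur are missing from the map, so default to 0.
val oneZero = counts.getOrElse((1, 0), 0)
val zeroOne = counts.getOrElse((0, 1), 0)
val bothOne = counts.getOrElse((1, 1), 0)
```

For the lists from the question, the three lookups cover both points asked about: the (1, 0) and (0, 1) counts for point 1 and the (1, 1) count for point 2.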

a.zip(b).filter(x => x._1 != x._2).size

Almost the same solution as the one proposed by Jatin, except that you can use List.count for better readability:
def getSimilarity(l1: List[Int], l2: List[Int]) =
l1.zip(l2).count({case (x,y) => x != y})

You can also use foldLeft. Assuming there are no negative numbers (so a pair sums to 1 only when it is (1, 0) or (0, 1)):
a.zip(b).foldLeft(0)( (x,y) => if (y._1 + y._2 == 1) x + 1 else x )

1) You could zip the two lists to get a list of (Int, Int) pairs, collect only the pairs (1, 0) and (0, 1), replace (1, 0) with 1 and (0, 1) with -1, and take the sum. If the count of (1, 0) equals the count of (0, 1), the sum will be 0:
val (l1, l2) = (List(1,1,1) , List(1,0,1))
(l1 zip l2).collect{
case (1, 0) => 1
case (0, 1) => -1
}.sum == 0
You could use the view method to avoid creating intermediate collections.
2) You could use filter and length to count the elements that satisfy some condition:
(l1 zip l2).filter{ _ == (1, 1) }.length
(l1 zip l2).collect{ case (1, 1) => () }.length
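A minimal sketch of the view suggestion, assuming the same two lists as above. Calling view before zip makes the zipped pairs lazy, and count consumes them directly, so no intermediate list of pairs or filtered list is built. The names oneZero, zeroOne and balanced are illustrative only.

```scala
val l1 = List(1, 1, 1)
val l2 = List(1, 0, 1)

// view makes zip lazy; count walks the pairs once without building a new list.
val oneZero = l1.view.zip(l2).count { case (a, b) => a == 1 && b == 0 }
val zeroOne = l1.view.zip(l2).count { case (a, b) => a == 0 && b == 1 }

// Same check as the "sum == 0" trick: are the two counts equal?
val balanced = oneZero == zeroOne
```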

Related

Attacking grouping sets of values functionally

Given a map associating indices to values, how do I create a separate map that accumulates the values that are above a particular threshold when the number of values that can be grouped together cannot exceed some limiting value?
For example, given a mapping like this:
val raw = Map(0 -> 2, 1 -> 1, 2 -> 2, 3 -> 0, 4 -> 1, 5 -> 2)
Group the values against a limit of 2, where each group can contain at most 2 values: if the first value is >= 2, the group contains that single value. In contrast, if the first value is less than 2, the group has size 2 and its value is the sum of the first value and the second value.
Executing that on the mapping above would yield a map of the group's index to the value, e.g.,
Map(0 -> 2, 1 -> 3, 2 -> 1, 3 -> 2) // Result
Obviously the way to do this in a non-functional way would be like this:
var c = 0
var sortedIndex = 0
var acc: Map[Int, Int] = Map() // Result accumulator
val limit = 2 // Anything larger will be forced into the next group
while (c < raw.size) {
  if (raw(c) >= limit) {
    acc = acc ++ Map(sortedIndex -> raw(c))
    c = c + 1
  } else {
    // Sum this value with the next one (assumes a next value exists)
    acc = acc ++ Map(sortedIndex -> (raw(c) + raw(c + 1)))
    c = c + 2
  }
  sortedIndex = sortedIndex + 1
}
acc
How would I do this functionally? I.e., immutable states, reducing my use of loops. (I understand that loops are not "dead" in FP, just trying to reinforce a use case where I can get away with NOT using loops.)

I do not think you need to work with a Map for this problem, since the keys of the map are simply indices. In any case, the following works for your problem:
val testLimit = 2 // Update the constants as required
val takeUpto = 2
def accumulator(input: List[Int], output: List[Int] = List.empty[Int]): List[Int] = {
  input match {
    case Nil => output // We have reached the end of the input
    case head :: tail if head >= testLimit => accumulator(tail, output :+ head)
    case m =>
      val (toSum, next) = m.splitAt(takeUpto)
      accumulator(next, output :+ toSum.sum)
  }
}
// Prints List(2, 3, 1, 2), which corresponds to Map(0 -> 2, 1 -> 3, 2 -> 1, 3 -> 2)
// The List equivalent of val raw = Map(0 -> 2, 1 -> 1, 2 -> 2, 3 -> 0, 4 -> 1, 5 -> 2) is List(2, 1, 2, 0, 1, 2)
println(accumulator(List(2, 1, 2, 0, 1, 2)))
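If the Map form from the question is still wanted, one possible sketch is to flatten the map into a list ordered by index, group recursively, and re-index with zipWithIndex. The name group is hypothetical, and this variant prepends to the accumulator and reverses once at the end (rather than appending with :+), which is an assumption about what is preferable for long inputs, not something the answer above requires.

```scala
import scala.annotation.tailrec

val raw = Map(0 -> 2, 1 -> 1, 2 -> 2, 3 -> 0, 4 -> 1, 5 -> 2)
val limit = 2

// Hypothetical helper: same grouping rule, built by prepending and reversing once.
@tailrec
def group(input: List[Int], acc: List[Int] = Nil): List[Int] = input match {
  case Nil                           => acc.reverse
  case head :: tail if head >= limit => group(tail, head :: acc)      // single-value group
  case a :: b :: tail                => group(tail, (a + b) :: acc)   // two-value group
  case a :: Nil                      => group(Nil, a :: acc)          // trailing small value
}

// Order the values by their original index, group them, then re-index the groups.
val values = raw.toList.sortBy(_._1).map(_._2)
val result = group(values).zipWithIndex.map { case (v, i) => i -> v }.toMap
```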

scala Stream transformation and evaluation model

Consider the following list transformation:
List(1,2,3,4) map (_ + 10) filter (_ % 2 == 0) map (_ * 3)
It is evaluated in the following way:
List(1, 2, 3, 4) map (_ + 10) filter (_ % 2 == 0) map (_ * 3)
List(11, 12, 13, 14) filter (_ % 2 == 0) map (_ * 3)
List(12, 14) map (_ * 3)
List(36, 42)
So there are three passes, and each one creates a new list structure.
So, the first question: can Stream help to avoid this, and if so, how? Can all evaluations be made in a single pass without creating additional structures?
Is the following Stream evaluation model correct?
Stream(1, ?) map (_ + 10) filter (_ % 2 == 0) map (_ * 3)
Stream(11, ?) filter (_ % 2 == 0) map (_ * 3)
// filter condition fail, evaluate the next element
Stream(2, ?) map (_ + 10) filter (_ % 2 == 0) map (_ * 3)
Stream(12, ?) filter (_ % 2 == 0) map (_ * 3)
Stream(12, ?) map (_ * 3)
Stream(36, ?)
// finish
If it is, then there are the same number of passes and the same number of new Stream structures created as in the case of a List. If it is not, then the second question: what is the Stream evaluation model for this particular type of transformation chain?

One way to avoid intermediate collections is to use view.
List(1,2,3,4).view map (_ + 10) filter (_ % 2 == 0) map (_ * 3)
It doesn't avoid every intermediate, but it can be useful. This page has lots of info and is well worth the time.
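To make the element-at-a-time evaluation order visible, here is a small sketch that records each step in a log. Note that it swaps in an Iterator instead of a view or a Stream, because an Iterator's evaluation order is simple to reason about and the same across Scala versions; the log buffer and step labels are purely illustrative.

```scala
import scala.collection.mutable.ListBuffer

val log = ListBuffer.empty[String]

// Each element flows through all three stages before the next element is touched.
val result = List(1, 2, 3, 4).iterator
  .map    { x => log += s"add $x";    x + 10 }
  .filter { x => log += s"test $x";   x % 2 == 0 }
  .map    { x => log += s"triple $x"; x * 3 }
  .toList

// log now reads: add 1, test 11, add 2, test 12, triple 12, add 3, test 13, add 4, test 14, triple 14
```

Only the final toList allocates a collection; the stages in between hold no intermediate list.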

No, you can't avoid it by using Stream.
But you can avoid it by using the collect method, and you should keep in mind that every time you use a map after a filter, a collect may be what you need.
Here is the code:
scala> def time(n: Int)(call : => Unit): Long = {
| val start = System.currentTimeMillis
| var cnt = n
| while(cnt > 0) {
| cnt -= 1
| call
| }
| System.currentTimeMillis - start
| }
time: (n: Int)(call: => Unit)Long
scala> val xs = List.fill(10000)((math.random * 100).toInt)
xs: List[Int] = List(37, 86, 74, 1, ...)
scala> val ys = Stream(xs :_*)
ys: scala.collection.immutable.Stream[Int] = Stream(37, ?)
scala> time(10000){ xs map (_+10) filter (_%2 == 0) map (_*3) }
res0: Long = 7182
// Note: calling force evaluates the whole stream.
scala> time(10000){ ys map (_+10) filter (_%2 == 0) map (_*3) force }
res1: Long = 17408
scala> time(10000){ xs.view map (_+10) filter (_%2 == 0) map (_*3) force }
res2: Long = 6322
scala> time(10000){ xs collect { case x if (x+10)%2 == 0 => (x+10)*3 } }
res3: Long = 2339

As far as I know, if you always iterate through the whole collection, Stream does not help you.
It will create the same number of new Streams as with the List.
Correct me if I am wrong, but I understand it as follows:
Stream is a lazy structure, so when you do:
val result = Stream(1, 2, 3, 4) map (_ + 10) filter (_ % 2 == 0) map (_ * 3)
the result is another stream that links to the results of the previous transformations. So if you force evaluation with a foreach (or e.g. mkString)
result.foreach(println)
then for each iteration the above chain is evaluated to get the current item.
However, you can reduce the number of passes by one if you replace filter with withFilter. The filter is then applied lazily, together with the following map function.
List(1,2,3,4) map (_ + 10) withFilter (_ % 2 == 0) map (_ * 3)
You can reduce it to one pass with flatMap:
List(1,2,3,4) flatMap { x =>
val y = x + 10
if (y % 2 == 0) Some(y * 3) else None
}
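As a quick sanity check, the variants discussed so far can be compared side by side. This is only a sketch over the four-element list from the question; the value names are illustrative.

```scala
val xs = List(1, 2, 3, 4)

// Three passes, two intermediate lists.
val chained = xs.map(_ + 10).filter(_ % 2 == 0).map(_ * 3)

// withFilter does not build a list of its own; the test runs inside the next map.
val viaWithFilter = xs.map(_ + 10).withFilter(_ % 2 == 0).map(_ * 3)

// One pass: each element yields zero or one result.
val viaFlatMap = xs.flatMap { x =>
  val y = x + 10
  if (y % 2 == 0) Some(y * 3) else None
}

// One pass: a partial function both filters and transforms.
val viaCollect = xs.collect { case x if (x + 10) % 2 == 0 => (x + 10) * 3 }
```

All four produce the same list, so the choice between them is about the number of passes and readability rather than the result.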

Scala can filter and transform a collection in a variety of ways.
First your example:
List(1,2,3,4) map (_ + 10) filter (_ % 2 == 0) map (_ * 3)
Could be optimized:
List(1,2,3,4) filter (_ % 2 == 0) map (v => (v+10)*3)
Or, folds could be used:
List(1,2,3,4).foldLeft(List[Int]()){ case (a,b) if b % 2 == 0 => a ++ List((b+10)*3) case (a,b) => a }
Or, perhaps a for-expression:
for( v <- List(1,2,3,4); w=v+10 if w % 2 == 0 ) yield w*3
Or, maybe the clearest to understand, collect:
List(1,2,3,4).collect{ case v if v % 2 == 0 => (v+10)*3 }
But to address your questions about Streams: yes, streams can be used,
and for large collections where what is wanted is often found early, a
Stream is a good choice:
def myStream( s:Stream[Int] ): Stream[Int] =
((s.head+10)*3) #:: myStream(s.tail.filter( _ % 2 == 0 ))
myStream(Stream.from(2)).take(2).toList // An infinitely long list yields
// 36 & 42 where the 3rd element
// has not been processed yet
With this Stream example the filter is only applied to the next element as it is needed, not to the entire list -- good thing, or it would never stop :)

Play Scala - groupBy remove repetitive values

I apply the groupBy function to my List collection, but I want to remove the repeated values from the value part of the resulting Map. Here is the initial List collection:
PO_ID PRODUCT_ID RETURN_QTY
1 1 10
1 1 20
1 2 30
1 2 10
When I apply groupBy to that List, it will produce something like this:
(1, 1) -> (1, 1, 10),(1, 1, 20)
(1, 2) -> (1, 2, 30),(1, 2, 10)
What I really want is something like this:
(1, 1) -> (10),(20)
(1, 2) -> (30),(10)
So, is there any way to remove the repeated part [(1,1),(1,2)] from the Map's values?
Thanks..

For
val a = Seq( (1,1,10), (1,1,20), (1,2,30), (1,2,10) )
consider
a.groupBy( v => (v._1,v._2) ).mapValues( _.map (_._3) )
which delivers
Map((1,1) -> List(10, 20), (1,2) -> List(30, 10))
Note that mapValues is applied to each value of the Map produced by groupBy, i.e. to each list of triplets, whereas the inner map extracts the third element of each triplet.
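A variant of the same idea, sketched with a strict map over the grouped entries instead of mapValues. The reason for the swap is an assumption about what the caller wants: mapValues yields a lazy view (and is deprecated in Scala 2.13), while map builds an ordinary Map immediately. The names rows and qtyByKey are illustrative.

```scala
val rows = Seq((1, 1, 10), (1, 1, 20), (1, 2, 30), (1, 2, 10))

// Group by (PO_ID, PRODUCT_ID), then keep only RETURN_QTY in each group.
val qtyByKey: Map[(Int, Int), Seq[Int]] =
  rows
    .groupBy { case (po, product, _) => (po, product) }
    .map { case (key, group) => key -> group.map(_._3) }
```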

Is it easier to pull the tuple apart first?
scala> val ts = Seq( (1,1,10), (1,1,20), (1,2,30), (1,2,10) )
ts: Seq[(Int, Int, Int)] = List((1,1,10), (1,1,20), (1,2,30), (1,2,10))
scala> ts map { case (a,b,c) => (a,b) -> c }
res0: Seq[((Int, Int), Int)] = List(((1,1),10), ((1,1),20), ((1,2),30), ((1,2),10))
scala> ((Map.empty[(Int, Int), List[Int]] withDefaultValue List.empty[Int]) /: res0) { case (m, (k,v)) => m + ((k, m(k) :+ v)) }
res1: scala.collection.immutable.Map[(Int, Int),List[Int]] = Map((1,1) -> List(10, 20), (1,2) -> List(30, 10))
Guess not.

How can I find repeated items in a Scala List?

I have a Scala List that contains some repeated numbers. I want to count the number of times a specific number will repeat itself. For example:
val list = List(1,2,3,3,4,2,8,4,3,3,5)
val repeats = list.takeWhile(_ == List(3,3)).size
And the val repeats would equal 2.
Obviously the above is pseudo-code, and takeWhile will not find two repeated 3s since _ represents an integer. I tried mixing both takeWhile and take(2) but with little success. I also looked at the code from "How to find count of repeatable elements in scala list", but it appears the author is looking to achieve something different.
Thanks for your help.

This will work in this case:
val repeats = list.sliding(2).count(_.forall(_ == 3))
The sliding(2) method gives you an iterator over lists containing each element and its successor, and then we just count the windows in which both elements are equal to 3.
The question is whether this produces the result you want for List(3, 3, 3): should that count as 2 repeats or just 1?
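The List(3, 3, 3) case, plus one edge case worth knowing about, can be checked directly. This sketch only uses sliding and count as above; the value names are illustrative. The edge case is that sliding returns a single, shorter window when the list has fewer elements than the window size, so the forall-based test and the equality-based test disagree on a one-element list.

```scala
val triple = List(3, 3, 3)
val windows = triple.sliding(2).toList             // List(List(3, 3), List(3, 3))
val overlapping = windows.count(_ == List(3, 3))   // overlapping windows are both counted

val single = List(3)
val shortWindows = single.sliding(2).toList        // List(List(3)): one partial window
val viaForall = shortWindows.count(_.forall(_ == 3))
val viaEquals = shortWindows.count(_ == List(3, 3))
```

Comparing each window against List(3, 3) avoids counting the partial window, at the cost of hard-coding the window contents.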

val repeats = list.sliding(2).toList.count(_==List(3,3))
More generally, the following code returns a tuple of (element, number of consecutive repeats) for every distinct element:
scala> list.distinct.map(x=>(x,list.sliding(2).toList.count(_.forall(_==x))))
res27: List[(Int, Int)] = List((1,0), (2,0), (3,2), (4,0), (8,0), (5,0))
which means that the element '3' occurs twice in a row at 2 places, and every other element at 0 places.
If we want to find elements that repeat 3 times consecutively instead, we just need to change the window size:
list.distinct.map(x=>(x,list.sliding(3).toList.count(_.forall(_==x))))
In the Scala REPL:
scala> val list = List(1,2,3,3,3,4,2,8,4,3,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 3, 4, 2, 8, 4, 3, 3, 3, 5)
scala> list.distinct.map(x=>(x,list.sliding(3).toList.count(_==List(x,x,x))))
res29: List[(Int, Int)] = List((1,0), (2,0), (3,2), (4,0), (8,0), (5,0))
The window size can also be made a parameter by defining a function:
def repeatsByTimes(list:List[Int],n:Int) =
list.distinct.map(x=>(x,list.sliding(n).toList.count(_.forall(_==x))))
Now in REPL:
scala> val list = List(1,2,3,3,4,2,8,4,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 4, 2, 8, 4, 3, 3, 5)
scala> repeatsByTimes(list,2)
res33: List[(Int, Int)] = List((1,0), (2,0), (3,2), (4,0), (8,0), (5,0))
scala> val list = List(1,2,3,3,3,4,2,8,4,3,3,3,2,4,3,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 3, 4, 2, 8, 4, 3, 3, 3, 2, 4, 3, 3, 3, 5)
scala> repeatsByTimes(list,3)
res34: List[(Int, Int)] = List((1,0), (2,0), (3,3), (4,0), (8,0), (5,0))
We can go further still. Given a list of integers and a maximum number of consecutive repetitions to look for, we may want a list of 3-tuples of the form (element, run length, number of places where a run of that length occurs). This is more exhaustive information than the above. It can be obtained with a function like this:
def repeats(list: List[Int], maxRep: Int) = {
  var v: List[(Int, Int, Int)] = List()
  for (i <- 1 to maxRep)
    v = v ++ list.distinct.map(x =>
      (x, i, list.sliding(i).toList.count(_.forall(_ == x))))
  v.sortBy(_._1)
}
In the Scala REPL:
scala> val list = List(1,2,3,3,3,4,2,8,4,3,3,3,2,4,3,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 3, 4, 2, 8, 4, 3, 3, 3, 2, 4, 3, 3, 3, 5)
scala> repeats(list,3)
res38: List[(Int, Int, Int)] = List((1,1,1), (1,2,0), (1,3,0), (2,1,3),
(2,2,0), (2,3,0), (3,1,9), (3,2,6), (3,3,3), (4,1,3), (4,2,0), (4,3,0),
(5,1,1), (5,2,0), (5,3,0), (8,1,1), (8,2,0), (8,3,0))
These results can be understood as follows:
(1,1,1): the element '1' occurs 1 time in a row at 1 place.
(1,2,0): the element '1' occurs 2 times in a row at 0 places.
...
(3,2,6): the element '3' occurs 2 times in a row at 6 places.
...
(3,3,3): the element '3' occurs 3 times in a row at 3 places.
... and so on.

Thanks to Luigi Plinge I was able to use methods in run-length encoding to group together items in a list that repeat. I used some snippets from this page here: http://aperiodic.net/phil/scala/s-99/
var n = 0
runLengthEncode(totalFrequencies).foreach{ o =>
if(o._1 > 1 && o._2==subjectNumber) n+=1
}
n
The method runLengthEncode is as follows:
private def pack[A](ls: List[A]): List[List[A]] = {
if (ls.isEmpty) List(List())
else {
val (packed, next) = ls span { _ == ls.head }
if (next == Nil) List(packed)
else packed :: pack(next)
}
}
private def runLengthEncode[A](ls: List[A]): List[(Int, A)] =
pack(ls) map { e => (e.length, e.head) }
I'm not entirely satisfied that I needed to use the mutable var n to count the number of occurrences but it did the trick. This will count the number of times a number repeats itself no matter how many times it is repeated.
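The mutable counter can be avoided with count. The following is a sketch under a few assumptions: repeatedRuns is a hypothetical name, the helpers are declared without private so the block stands alone, and pack returns Nil for an empty list (unlike the version above) so that encoding an empty list does not fail on e.head.

```scala
// Groups consecutive equal elements: List(1, 1, 2) => List(List(1, 1), List(2)).
def pack[A](ls: List[A]): List[List[A]] =
  if (ls.isEmpty) Nil
  else {
    val (packed, next) = ls.span(_ == ls.head)
    packed :: pack(next)
  }

// Run-length encoding as (length, element) pairs.
def runLengthEncode[A](ls: List[A]): List[(Int, A)] =
  pack(ls).map(run => (run.length, run.head))

// Hypothetical helper: number of runs of `subject` that are longer than one element.
def repeatedRuns[A](ls: List[A], subject: A): Int =
  runLengthEncode(ls).count { case (len, elem) => len > 1 && elem == subject }

val repeats = repeatedRuns(List(1, 2, 3, 3, 4, 2, 8, 4, 3, 3, 5), 3)
```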

If you knew your list was not very long you could do it with Strings.
val list = List(1,2,3,3,4,2,8,4,3,3,5)
val matchList = List(3,3)
(matchList.mkString(",")).r.findAllMatchIn(list.mkString(",")).length

From your pseudocode I got this working:
val pairs = list.sliding(2).toList // create pairs of consecutive elements
val result = pairs.groupBy(x => x).map { case (x, y) => (x, y.size) } // group the pairs and keep the size, which is the number of occurrences
result will be a Map[List[Int], Int], so you can look up the count like this:
result(List(3,3)) // will return 2
I couldn't tell whether you also want to check runs of other lengths; if so, change the argument of sliding to the desired size.

def pack[A](ls: List[A]): List[List[A]] = {
if (ls.isEmpty) List(List())
else {
val (packed, next) = ls span { _ == ls.head }
if (next == Nil) List(packed)
else packed :: pack(next)
}
}
def encode[A](ls: List[A]): List[(Int, A)] = pack(ls) map { e => (e.length, e.head) }
val numberOfNs = list.distinct.map{ n =>
(n -> list.count(_ == n))
}.toMap
val runLengthPerN = encode(list).groupBy(_._2).map { case (n, runs) => n -> runs.map(_._1).max }
val nRepeatedMostInSuccession = runLengthPerN.maxBy(_._2)._1
Here pack and encode are defined as above, following problem 9 and problem 10 of Scala's 99 problems.
Since numberOfNs and runLengthPerN are Maps, you can get the population count of any number in the list with numberOfNs(number) and the length of its longest repetition in succession with runLengthPerN(number). To get the individual run lengths, compute encode(list), which returns (length, element) pairs.

Why does val range = Range(1, 2, 3, 4) give error?

for (i <- Array.apply(1 to 4))
print(i);
Range(1, 2, 3, 4)
Range(1, 10)
//res0: scala.collection.immutable.Range = Range(1, 2, 3, 4, 5, 6, 7, 8, 9)
So why does val range = Range(1, 2, 3, 4) give an error?

A Range is a special kind of collection that is restricted in what it can represent in order to perform its operations efficiently. It is only able to represent a sequence of numbers with a fixed step between elements. As such, it only needs to be told the start, end, and step size in order to be constructed. An Array, on the other hand, can hold arbitrary values, so its constructor must be told explicitly what those values are.
The definition of Range.apply is that it takes either:
two arguments: a start and end for a range, or
three arguments: a start, end, and step size for the range.
Here are the definitions of apply from scala.collection.immutable.Range:
/** Make a range from `start` until `end` (exclusive) with given step value.
* @note step != 0
*/
def apply(start: Int, end: Int, step: Int): Range = new Range(start, end, step)
/** Make a range from `start` until `end` (exclusive) with step value 1.
*/
def apply(start: Int, end: Int): Range = new Range(start, end, 1)
Contrast this with the apply for scala.Array, which accepts a variable-length argument T*:
/** Creates an array with given elements.
*
* @param xs the elements to put in the array
* @return an array containing all elements from xs.
*/
def apply[T: ClassManifest](xs: T*): Array[T] = {
val array = new Array[T](xs.length)
var i = 0
for (x <- xs.iterator) { array(i) = x; i += 1 }
array
}
If your goal is to have an Array of the numbers 1 to 4, try this:
(1 to 4).toArray
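For comparison, here is a short sketch of the ways to build the numbers 1 to 4 with the two- and three-argument forms described above. The value names are illustrative; the point to notice is that Range(start, end) and Array.range(start, end) both exclude the end, while 1 to 4 and Range.inclusive include it.

```scala
val exclusive = Range(1, 5)            // start, end (exclusive): 1, 2, 3, 4
val stepped   = Range(1, 10, 3)        // start, end (exclusive), step: 1, 4, 7
val inclusive = Range.inclusive(1, 4)  // same elements as 1 to 4

val asArray: Array[Int]       = (1 to 4).toArray
val viaArrayRange: Array[Int] = Array.range(1, 5) // end is exclusive here as well
```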

Well,
scala> Array("abc") // an array containing a string
res0: Array[String] = Array(abc)
scala> Array(1) // an array containing a number
res1: Array[Int] = Array(1)
scala> Array(true) // an array containing a boolean
res2: Array[Boolean] = Array(true)
scala> Array(1 to 4) // an array containing a range
res3: Array[scala.collection.immutable.Range.Inclusive] = Array(Range(1, 2, 3, 4))
Why should it have worked any other way? Anyway, this is what you should have used (note that the end is exclusive, so Array.range(1, 5) is needed to include 4):
scala> Array.range(1, 4)
res4: Array[Int] = Array(1, 2, 3)