I find a lot about how to remove duplicates, but what is the most elegant way to remove the unique items first and then deduplicate what remains?
E.g. a sequence (1, 2, 5, 2, 3, 4, 4, 0, 2) should be converted into (2, 4).
I can think of using a for-loop to add a count to each distinct item, but I could imagine that Scala has a more elegant way to achieve this.
distinct and diff will work for you:
val a = List(1, 2, 5, 2, 3, 4, 4, 0, 2)
> a: List[Int] = List(1, 2, 5, 2, 3, 4, 4, 0, 2)
val b = a diff a.distinct
> b: List[Int] = List(2, 4, 2)
val c = (a diff a.distinct).distinct
> c: List[Int] = List(2, 4)
In place of distinct you can use toSet as well.
Also keep in mind that i => i can be replaced by identity and map(_._1) by keys, like this:
Seq(1, 2, 5, 2, 3, 4, 4, 0, 2).groupBy(identity).filter(_._2.size > 1).keys.toSeq
This is where a countByKey method, such as the one that can be found in Spark's API, would be useful.
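For what it's worth, a plain-collections stand-in for such a counting method might look like the sketch below. The names DuplicateCounts, countByValue and duplicates are made up for illustration and are not Spark's API; the second helper keeps the duplicates in order of first appearance.

```scala
object DuplicateCounts {
  // Hypothetical helper: counts how often each element occurs in xs.
  def countByValue[A](xs: Seq[A]): Map[A, Int] =
    xs.foldLeft(Map.empty[A, Int]) { (acc, x) =>
      acc.updated(x, acc.getOrElse(x, 0) + 1)
    }

  // Elements that occur more than once, in order of first appearance.
  def duplicates[A](xs: Seq[A]): Seq[A] = {
    val counts = countByValue(xs)
    xs.distinct.filter(x => counts(x) > 1)
  }
}
```

With the sequence from the question, DuplicateCounts.duplicates(Seq(1, 2, 5, 2, 3, 4, 4, 0, 2)) should give Seq(2, 4).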
Pretty straightforward:
Seq(1, 2, 5, 2, 3, 4, 4, 0, 2).groupBy(i => i).filter(_._2.size > 1).map(_._1).toSeq
Using the link from Ende Neu I think your code would become this:
Seq(1, 2, 5, 2, 3, 4, 4, 0, 2).groupBy(identity).collect { case (v, l) if l.length > 1 => v }.toSeq
I have the following array of arrays that represents a cycle in a graph that I want to print in the below format.
scala> result.collect
Array[Array[Long]] = Array(Array(0, 1, 4, 0), Array(1, 5, 2, 1), Array(1, 4, 0, 1), Array(2, 3, 5, 2), Array(2, 1, 5, 2), Array(3, 5, 2, 3), Array(4, 0, 1, 4), Array(5, 2, 3, 5), Array(5, 2, 1, 5))
0:0->1->4;
1:1->5->2;1->4->0;
2:2->3->5;2->1->5;
3:3->5->2;
4:4->0->1;
5:5->2->3;5->2->1;
How can I do this? I have tried a for loop with if statements, as I would in other languages, but Scala's ifs in for loops are filters and cannot use if/else to handle two different cases.
Example Python-style pseudocode:
for (array,i) in enumerate(range(0,result.length)):
    if array[i] == array[i+1]:
        # print thing needed
    else:
        # print other thing
I also tried to do result.groupBy to make it easier to print but doing that ruins the arrays.
Array[(Long, Iterable[Array[Long]])] = Array((4,CompactBuffer([J@3677a08a)), (0,CompactBuffer([J@695fd7e)), (1,CompactBuffer([J@50b0f441, [J@142efc4d)), (3,CompactBuffer([J@1fd66db2)), (5,CompactBuffer([J@36811d3b, [J@61c4f556)), (2,CompactBuffer([J@2eba1b7, [J@2efcf7a5)))
Is there a way to nicely print the output needed in Scala?
This should do it:
result
  .groupBy(_.head)
  .toArray
  .sortBy(_._1)
  .map {
    case (node, cycles) =>
      val paths = cycles.map { cycle =>
        cycle
          .init // drop last node
          .mkString("->")
      }
      s"$node:${paths.mkString(";")}"
  }
  .mkString(";\n")
This is the output for the sample input you provided:
0:0->1->4;
1:1->5->2;1->4->0;
2:2->3->5;2->1->5;
3:3->5->2;
4:4->0->1;
5:5->2->3;5->2->1
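If the trailing semicolon on the last line matters, a small variation could append ";" to every path instead of joining the lines with ";\n". This is only a sketch: it assumes the cycles have already been collected into a local Array[Array[Long]], and CyclePrinter and format are names invented here.

```scala
object CyclePrinter {
  // Groups cycles by their start node and renders one line per node,
  // terminating every path (including the last one) with ";".
  def format(cycles: Array[Array[Long]]): String =
    cycles
      .groupBy(_.head)
      .toSeq
      .sortBy(_._1)
      .map { case (node, group) =>
        // init drops the repeated start node at the end of each cycle
        group.map(c => c.init.mkString("->") + ";").mkString(s"$node:", "", "")
      }
      .mkString("\n")
}
```

Calling println(CyclePrinter.format(result.collect)) would then print each node's line with a closing semicolon.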
Hi, I am new to Scala and have a basic question. I have a list of lists which looks like this:
(4,List(List(2, 4, 0, 2, 4), List(3, 4, 0, 2, 4), List(4, 0, 1, 2, 4)))
I want to get the lists which start with 4. How can I do it?
You can use filter to traverse the List and apply a predicate to each inner list that checks whether its first element is 4.
Example:
scala> val (data, options) = (4, List(List(2, 4, 0, 2, 4), List(3, 4, 0, 2, 4), List(4, 0, 1, 2, 4)))
data: Int = 4
options: List[List[Int]] = List(List(2, 4, 0, 2, 4), List(3, 4, 0, 2, 4), List(4, 0, 1, 2, 4))
scala> options.filter(_.headOption.contains(data))
res0: List[List[Int]] = List(List(4, 0, 1, 2, 4))
Also see: Scala List.filter with two conditions, applied only once
There are several ways.
Here is another
listOfLists.collect{ case l @ 4 :: _ => l}
Potentially more powerful because we can filter on the first n elements, e.g.
listOfLists.collect{ case l @ 4 :: 0 :: 1 :: _ => l}
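When the first element to match is held in a variable rather than written as a literal, the pattern needs backticks so that the name is treated as an existing value instead of a fresh binding. A minimal sketch, with StartsWith and startingWith as made-up names:

```scala
object StartsWith {
  // Keeps only the lists whose first element equals `first`.
  // The backticks make the pattern compare against the parameter
  // instead of binding a new variable called first.
  def startingWith[A](first: A, lists: List[List[A]]): List[List[A]] =
    lists.collect { case l @ `first` :: _ => l }
}
```

Empty inner lists simply fail to match and are dropped, so no exception is thrown for them.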
If you have a Tuple (Int, List[List[Int]]), and want to return Lists that start with the Int provided in the start, for this case 4:
I would recommend you do something like this:
val myTuple = (4,List(List(2, 4, 0, 2, 4), List(3, 4, 0, 2, 4), List(4, 0, 1, 2, 4)))
myTuple._2.filter(_.headOption.contains(myTuple._1))
And this will return List(List(4, 0, 1, 2, 4))
What we are doing here is, we are first accessing the List[List[Int]] in the Tuple by doing myTuple._2 then we filter to remove Lists that don't have a head value as 4 - which we passed in as myTuple._1.
Note we are using headOption instead of head to get the first element of each List. head throws a NoSuchElementException on an empty List, whereas headOption returns None, so empty inner Lists are simply filtered out (more details on this can be found here http://www.bks2.com/blog/2012/12/31/head_vs_headOption/)
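To see the difference between the two, here is a small sketch with an empty inner list thrown in. The object and value names are invented for the example.

```scala
import scala.util.Try

object HeadVsHeadOption {
  val lists: List[List[Int]] = List(List(4, 0, 1), List.empty[Int], List(2, 4))

  // headOption returns None for the empty list, so it is just filtered out.
  val safe: List[List[Int]] = lists.filter(_.headOption.contains(4))

  // head throws NoSuchElementException on the empty list;
  // Try captures that as a Failure instead of crashing.
  val unsafe: Try[List[List[Int]]] = Try(lists.filter(_.head == 4))
}
```

Here safe is List(List(4, 0, 1)), while unsafe is a Failure because of the empty list.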
val t = (4, List(List(2, 4, 0, 2, 4), List(3, 4, 0, 2, 4), List(4, 0, 1, 2, 4)))
t._2.filter(_.head==t._1)
In REPL:
scala> t._2.filter(_.head==t._1)
res5: List[List[Int]] = List(List(4, 0, 1, 2, 4))
I want to iterate over a scala list in an incremental way, i.e. the first pass should yield the head, the second the first 2 elements, the next the first 3, etc...
I can code this myself as a recursive function, but does a pre-existing function exist for this in the standard library?
You can use the .inits method to get there, albeit there may be performance issues for a large list (I haven't played around with making this lazy):
scala> val data = List(0,1,2,3,4)
data: List[Int] = List(0, 1, 2, 3, 4)
scala> data.inits.toList.reverse.flatten
res2: List[Int] = List(0, 0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4)
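If laziness is the concern, one option is to build the prefixes on demand from an Iterator. This is just a sketch (Prefixes and prefixes are made-up names): each prefix is only computed when the iterator is advanced, although every take still walks the list from the start.

```scala
object Prefixes {
  // Lazily yields List(x1), List(x1, x2), ... up to the full list.
  // Nothing is computed until the iterator is consumed.
  def prefixes[A](xs: List[A]): Iterator[List[A]] =
    Iterator.range(1, xs.length + 1).map(n => xs.take(n))
}
```

For example, Prefixes.prefixes(List(0, 1, 2)).toList gives List(List(0), List(0, 1), List(0, 1, 2)), and calling .flatten on that reproduces the flattened form shown above minus the empty prefix.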
You can use take like so:
scala> val myList = (1 to 10).toList
myList: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> for(cnt <- myList.indices) yield myList.take(cnt+1)
res1: scala.collection.immutable.IndexedSeq[List[Int]] = Vector(List(1), List(1, 2), List(1, 2, 3), List(1, 2, 3, 4), List(1, 2, 3, 4, 5), List(1, 2, 3, 4, 5, 6), List(1, 2, 3, 4, 5, 6, 7), List(1, 2, 3, 4, 5, 6, 7, 8), List(1, 2, 3, 4, 5, 6, 7, 8, 9), List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
OK, since I've whined enough, here's an iterator version that tries reasonably hard not to waste space or compute more than is needed at any one point:
class stini[A](xs: List[A]) extends Iterator[List[A]] {
  var ys: List[A] = Nil
  var remaining = xs
  def hasNext = remaining.nonEmpty
  def next = {
    val e = remaining.head
    remaining = remaining.tail
    ys = e :: ys
    ys.reverse
  }
}
val it = new stini(List(1, 2, 3, 4))
it.toList
//> List[List[Int]] =
// List(List(1), List(1, 2), List(1, 2, 3), List(1, 2, 3, 4))
Try: for((x, i) <- l.view.zipWithIndex) println(l.take(i + 1))
if you need something side-effected (I just did println to give you an example)
This feels like a peculiar problem and I am very new to Scala, so I don't know how to ask the right questions in order to get progress on this problem.
As a demonstration, say I have a list of lists like this:
val data = List(List(1, 2, 3, 4), List(1, 2, 2, 3, 4), List(1, 2, 3, 3, 3, 4), List(1, 2, 3, 4), List(2, 3, 4))
and I want to reduce it to a single List of integers that looks mostly like the distinct set of all the lists, with one exception: where a list contains the same integer more than once, I want that repetition represented in the final list. As a general rule, the list with the most occurrences of a given integer contributes its repetitions of that integer to the final list. That would ideally give:
List(1, 2, 2, 3, 3, 3, 4)
I know I can do data.flatten.distinct and get:
List(1, 2, 3, 4)
but that's not what I want and I know there's probably a bit more work to get to the desired result.
I am wondering if there is a good way to achieve the desired result in a functional way in scala.
Try this
val data = List(List(1, 2, 3, 4), List(1, 2, 2, 3, 4), List(1, 2, 3, 3, 3, 4), List(1, 2, 3, 4), List(2, 3, 4))
val map = data.map(_.groupBy(identity)).foldLeft(Map[Int, List[Int]]()) {
  case (r, c) => r ++ c.map {
    case (k, v) => k -> (if (v.size > r.getOrElse(k, List()).size) v else r(k))
  }
}.values.flatten
//> map : Iterable[Int] = List(2, 2, 4, 1, 3, 3, 3)
It does not maintain the ordering. After this you can sort the result, e.g. with toList.sorted.
Maybe this is cleaner
data.flatMap(_.groupBy(identity)).groupBy(_._1).mapValues(_.sortBy(_._2.size).reverse(0)._2).values.flatten
//> res0: Iterable[Int] = List(2, 2, 4, 1, 3, 3, 3)
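Another way to read the requirement is "for every value, keep as many copies as the largest count found in any single list". Under that assumption, and assuming the result should come out sorted, a sketch could look like this (MaxMultiplicity and merge are names invented here):

```scala
object MaxMultiplicity {
  def merge(data: List[List[Int]]): List[Int] = {
    // For each value, the largest number of times it occurs in any single list.
    val maxCounts: Map[Int, Int] =
      data
        .flatMap(_.groupBy(identity).map { case (k, v) => k -> v.size })
        .groupBy(_._1)
        .map { case (k, pairs) => k -> pairs.map(_._2).max }

    // Emit each value maxCounts(value) times, in ascending order.
    maxCounts.toList.sortBy(_._1).flatMap { case (k, n) => List.fill(n)(k) }
  }
}
```

For the data in the question this yields List(1, 2, 2, 3, 3, 3, 4).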
I don't quite get it, but you can just order elements
data.flatten.sorted
Which would give you
List(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4)
if you want them ordered by number of encounters, you can do it like this:
data.flatten.groupBy(k => k).mapValues(_.size).toList.sortBy(_._2).map(_._1)
which would give you
List(1, 4, 2, 3)
The ScalaCheck User Guide mentions sized generators. The example code
def matrix[T](g: Gen[T]): Gen[Seq[Seq[T]]] = Gen.sized { size =>
  val side = scala.math.sqrt(size).toInt // small change so that it compiles
  Gen.vectorOf(side, Gen.vectorOf(side, g))
}
explained nothing to me. After some exploration I found that the length of the generated sequence does not depend on the size of the generator passed in (there is a resize method in the Gen object that "Creates a resized version of a generator" according to the documentation; maybe that means something different?).
val g = Gen.choose(1,5)
val g2 = Gen.resize(15, g)
println(matrix(g).sample) // (1)
println(matrix(g2).sample) // (2)
// (1) and (2) produce Seqs of the same length
Could you explain what I have missed and give me some examples of how you use sized generators in test code?
vectorOf (which has since been replaced by listOf) generates lists with a size that depends (linearly) on the size parameter that ScalaCheck sets when it evaluates a generator. When ScalaCheck tests a property it will increase this size parameter for each test, resulting in properties that are tested with larger and larger lists (if listOf is used).
If you create a matrix generator by just using the listOf generator in a nested fashion, you will get matrices with a size that depends on the square of the size parameter. Hence when using such a generator in a property you might end up with very large matrices, since ScalaCheck increases the size parameter for each test run. However, if you use the resize generator combinator in the way it is done in the ScalaCheck User Guide, your final matrix size depends linearly on the size parameter, resulting in nicer performance when testing your properties.
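The arithmetic behind that can be sketched without ScalaCheck at all. The snippet below only models the element counts; it assumes a nested list generator can produce up to size rows of up to size elements each, and that the User Guide approach picks side = sqrt(size). MatrixSizing and its methods are names made up for this illustration.

```scala
object MatrixSizing {
  // Upper bound on elements when both dimensions scale with size.
  def nestedMax(size: Int): Int = size * size

  // Side length when the matrix is built as sqrt(size) x sqrt(size).
  def sizedSide(size: Int): Int = math.sqrt(size.toDouble).toInt

  // Total elements for the sqrt-based matrix: at most size.
  def sizedTotal(size: Int): Int = {
    val side = sizedSide(size)
    side * side
  }
}
```

At size 100 the nested bound is 10000 elements, while the sqrt-based matrix has 100, which is why the second form scales linearly.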
You should really not have to use the resize generator combinator very often. If you need to generate lists that are bounded by some specific size, it's much better to do something like the example below instead, since there is no guarantee that the listOf/ containerOf generators really use the size parameter the way you expect.
def genBoundedList[T](maxSize: Int, g: Gen[T]): Gen[List[T]] = {
  Gen.choose(0, maxSize) flatMap { sz => Gen.listOfN(sz, g) }
}
The vectorOf method that you use is deprecated, and you should use the listOf method. This generates a list of random length where the maximum length is limited by the size of the generator. You should therefore resize the generator that actually generates the list if you want control over the maximum number of elements that are generated:
scala> val g1 = Gen.choose(1,5)
g1: org.scalacheck.Gen[Int] = Gen()
scala> val g2 = Gen.listOf(g1)
g2: org.scalacheck.Gen[List[Int]] = Gen()
scala> g2.sample
res19: Option[List[Int]] = Some(List(4, 4, 4, 4, 2, 4, 2, 3, 5, 1, 1, 1, 4, 4, 1, 1, 4, 5, 5, 4, 3, 3, 4, 1, 3, 2, 2, 4, 3, 4, 3, 3, 4, 3, 2, 3, 1, 1, 3, 2, 5, 1, 5, 5, 1, 5, 5, 5, 5, 3, 2, 3, 1, 4, 3, 1, 4, 2, 1, 3, 4, 4, 1, 4, 1, 1, 4, 2, 1, 2, 4, 4, 2, 1, 5, 3, 5, 3, 4, 2, 1, 4, 3, 2, 1, 1, 1, 4, 3, 2, 2))
scala> val g3 = Gen.resize(10, g2)
g3: java.lang.Object with org.scalacheck.Gen[List[Int]] = Gen()
scala> g3.sample
res0: Option[List[Int]] = Some(List(1))
scala> g3.sample
res1: Option[List[Int]] = Some(List(4, 2))
scala> g3.sample
res2: Option[List[Int]] = Some(List(2, 1, 2, 4, 5, 4, 2, 5, 3))