I have two arrays like [1, 2, 3] and ["a", "b", "c"] and I want to map over the zipped values (1, "a"), (2, "b"), and (3, "c") using Zip2.
If I do this:
let foo = map(Zip2([1, 2, 3], ["a", "b", "c"]).generate()) { $0.0 }
foo has the type ZipGenerator2<IndexingGenerator<Array<Int>>, IndexingGenerator<Array<String>>>?.
Is there a way to make that an array?
The following will get you an array from the return value of Zip2:
var myZip = Zip2([1, 2, 3], ["a", "b", "c"]).generate()
var myZipArray: Array<(Int, String)> = []
while let elem = myZip.next() {
    myZipArray += elem
}
println(myZipArray) // [(1, a), (2, b), (3, c)]
Alternatively, pass the Zip2 value straight to the Array initializer:
let myZip = Zip2([1, 2, 3], ["a", "b", "c"])
let myZipArray = Array(myZip)
println(myZipArray) // [(1, a), (2, b), (3, c)]
-- now for fun --
My guess is that a new Array can be initialized from anything that responds to generate():
println(Array("abcde")) // [a, b, c, d, e]
Assume that vals is the result of Zip2, which I'll presume is an array of two-element tuples. Like this:
let vals = [(1, "a"), (2, "b"), (3, "c")]
With that, just invoke the map() method on an array.
vals.map { $0.0 }
For example:
> vals.map { $0.1 }
$R16: String[] = size=3 {
  [0] = "a"
  [1] = "b"
  [2] = "c"
}
I am trying to do this in Scala, but nothing I have tried works. In PySpark it is straightforward:
from operator import itemgetter
rdd = sc.parallelize([(0, [(0,'a'), (1,'b'), (2,'c')]), (1, [(3,'x'), (5,'y'), (6,'z')])])
mapped = rdd.mapValues(lambda v: map(itemgetter(0), v))
mapped.collect()
# [(0, [0, 1, 2]), (1, [3, 5, 6])]
In Scala, the equivalent is mapValues with an inner map over each array:
val rdd = sparkContext.parallelize(List(
  (0, Array((0, "a"), (1, "b"), (2, "c"))),
  (1, Array((3, "x"), (5, "y"), (6, "z")))
))
rdd
  .mapValues(v => v.map(_._1))
  .foreach(v => println(v._1 + "; " + v._2.toSeq.mkString(",")))
0; 0,1,2
1; 3,5,6
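The same projection can be tried without a Spark cluster. The sketch below uses a plain Scala List of pairs as a stand-in for the RDD (an assumption made purely for illustration; `data` and `firsts` are hypothetical names), since the inner `v.map(_._1)` is ordinary collection code:

```scala
// Plain-collection stand-in for the RDD of (key, Array[(Int, String)]) pairs.
val data = List(
  (0, Array((0, "a"), (1, "b"), (2, "c"))),
  (1, Array((3, "x"), (5, "y"), (6, "z")))
)

// Same idea as mapValues: keep the key, project each tuple to its first element.
val firsts = data.map { case (k, v) => (k, v.map(_._1).toList) }

// Print one "key; v1,v2,..." line per entry, as in the Spark version.
firsts.foreach { case (k, v) => println(s"$k; ${v.mkString(",")}") }
```

The arrays are converted to lists only so that the result compares and prints by value.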
Is there a function in Scala that groups the elements of a list by their number of occurrences?
For example, I have this list:
val x = List("c", "b", "b", "c", "a", "d", "c")
And I want to get a new list like this:
x = List((3, "c"), (2, "b"), (1, "a"), (1, "d"))
You can first count the occurrences of each element and then reverse the resulting tuples:
List("c", "b", "b", "c", "a", "d", "c")
.groupBy(identity).mapValues(_.size) // Map(b -> 2, d -> 1, a -> 1, c -> 3)
.toList // List((b,2), (d,1), (a,1), (c,3))
.map{ case (k, v) => (v, k) } // List((2,b), (1,d), (1,a), (3,c))
You don't specifically mention an ordering for the output; if one is required, this solution needs to be adapted, because groupBy does not preserve the order of the input.
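One possible adaptation, assuming the desired order is "highest count first, ties broken alphabetically" (the question does not state this, so treat it as a guess), is to sort the swapped tuples. `byCount` is a name introduced here for illustration:

```scala
val x = List("c", "b", "b", "c", "a", "d", "c")

val byCount = x
  .groupBy(identity)                     // element -> all of its occurrences
  .toList
  .map { case (k, vs) => (vs.size, k) }  // (count, element)
  .sortBy { case (n, k) => (-n, k) }     // descending count, then element

println(byCount)
```

Negating the count inside sortBy avoids a separate reverse step and keeps the tie-break ascending.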
Try this to get exactly what you want, in the order you mentioned (i.e., the order of first occurrence in the list is preserved while taking counts):
scala> val x = List("c", "b", "b", "c", "a", "d", "c")
x: List[String] = List(c, b, b, c, a, d, c)
scala> x.distinct.map(v=>(x.filter(_==v).size,v))
res225: List[(Int, String)] = List((3,c), (2,b), (1,a), (1,d))
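The filter inside the map rescans the whole list once per distinct element. A sketch that counts once and then looks the counts up, while still keeping first-occurrence order, could look like this (`counts` and `ordered` are hypothetical names, not from the answer above):

```scala
val x = List("c", "b", "b", "c", "a", "d", "c")

// Count every element once.
val counts: Map[String, Int] =
  x.groupBy(identity).map { case (k, vs) => k -> vs.size }

// distinct keeps the order of first occurrence; attach the precomputed count.
val ordered = x.distinct.map(v => (counts(v), v))

println(ordered)
```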
I want to select some elements (features) of an RDD based on a binary array. I have an array of 0s and 1s of size 40 that specifies whether the element at each index is present or not.
My RDD was created from the kddcup99 dataset:
val rdd = sc.textFile("./data/kddcup.txt")
val data = rdd.map(_.split(','))
How can I filter or select the elements of data (RDD[Array[String]]) whose corresponding index in the binary array has the value 1?
If I understood your question correctly, you have an array like:
val arr = Array(1, 0, 1, 1, 1, 0)
And an RDD[Array[String]] which looks like:
val rdd = sc.parallelize(Array(
Array("A", "B", "C", "D", "E", "F") ,
Array("G", "H", "I", "J", "K", "L")
) )
Now, to get the elements at the indices where arr has 1, you first need the indices whose value in arr is 1:
val requiredIndices = arr.zipWithIndex.filter(_._1 == 1).map(_._2)
requiredIndices: Array[Int] = Array(0, 2, 3, 4)
Then, similarly for the RDD, you can use zipWithIndex and contains to keep only the elements whose index is in your requiredIndices array:
rdd.map(_.zipWithIndex.filter(x => requiredIndices.contains(x._2)).map(_._1))
// after collect(): Array[Array[String]] = Array(Array(A, C, D, E), Array(G, I, J, K))
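As an alternative sketch, each row can be zipped directly with the mask, which avoids the index lookup for every element. It is shown on a plain List of rows rather than an RDD so it runs without Spark; `selectByMask`, `rows` and `selected` are names made up for this example:

```scala
val arr = Array(1, 0, 1, 1, 1, 0)

// Stand-in for the RDD contents.
val rows = List(
  Array("A", "B", "C", "D", "E", "F"),
  Array("G", "H", "I", "J", "K", "L")
)

// Pair every value with its mask bit and keep the values whose bit is 1.
// zip stops at the shorter of the two arrays.
def selectByMask(row: Array[String], mask: Array[Int]): Array[String] =
  row.zip(mask).collect { case (value, 1) => value }

val selected = rows.map(selectByMask(_, arr))

selected.foreach(r => println(r.mkString(",")))
```

With an RDD, the same function could presumably be passed to rdd.map in place of the zipWithIndex/contains chain.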
I have an array, something like that:
val a = Array("a", "c", "c", "z", "c", "b", "a")
and I want to get a map with keys of all different values of this array and values with a collection of relevant indexes for each such group, i.e. for a given array the answer would be:
"a" -> Array(0, 6),
"b" -> Array(5),
"c" -> Array(1, 2, 4),
"z" -> Array(3)
Surprisingly, it proved to be somewhat more complicated than I had anticipated. The best I've come up with so far is:
a.zipWithIndex.groupBy {
  case (cnt, idx) => cnt
}.map {
  case (cnt, arr) => (cnt, arr.map {
    case (k, v) => v
  })
}
which is neither concise nor easy to understand. Any better ideas?
Your code can be rewritten as a one-liner, but it looks ugly.
Another way is to use mutable.MultiMap:
import collection.mutable.{ HashMap, MultiMap, Set }
val as = Array("a", "c", "c", "z", "c", "b", "a")
val mm = new HashMap[String, Set[Int]] with MultiMap[String, Int]
and then just add every binding:
as.zipWithIndex foreach (mm.addBinding _).tupled
//mm = Map(z -> Set(3), b -> Set(5), a -> Set(0, 6), c -> Set(1, 2, 4))
Finally, you can convert it with mm.toMap if you want an immutable version.
Here's a version with foldRight. I think it's reasonably clear.
val a = Array("a", "c", "c", "z", "c", "b", "a")
a.zipWithIndex.foldRight(Map[String, List[Int]]()) {
  case ((e, i), m) => m.updated(e, i :: m.getOrElse(e, Nil))
}
//> res0: scala.collection.immutable.Map[String,List[Int]] = Map(a -> List(0, 6)
//| , b -> List(5), c -> List(1, 2, 4), z -> List(3))
Another version using foldLeft and an immutable Map with default value:
val a = Array("a", "c", "c", "z", "c", "b", "a")
a.zipWithIndex.foldLeft(Map[String, List[Int]]().withDefaultValue(Nil))( (m, p) => m + ((p._1, p._2 +: m(p._1))))
// res6: scala.collection.immutable.Map[String,List[Int]] = Map(a -> List(6, 0), c -> List(4, 2, 1), z -> List(3), b -> List(5))
Starting in Scala 2.13, we can use the new groupMap which (as its name suggests) is a one-pass equivalent of a groupBy and a mapping over grouped items:
// val a = Array("a", "c", "c", "z", "c", "b", "a")
a.zipWithIndex.groupMap(_._1)(_._2)
// Map("z" -> Array(3), "b" -> Array(5), "a" -> Array(0, 6), "c" -> Array(1, 2, 4))
This:
- zips each item with its index, giving (item, index) tuples
- groups elements based on their first tuple part (_._1) (group part of groupMap)
- maps grouped values to their second tuple part (_._2, i.e. their index) (map part of groupMap)
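To make the steps above concrete, here is a small self-contained sketch that runs groupMap next to the groupBy-then-map form it replaces. It assumes Scala 2.13 or later; `indexes` and `viaGroupBy` are names chosen for this illustration:

```scala
// groupMap needs Scala 2.13 or later.
val a = Array("a", "c", "c", "z", "c", "b", "a")

// Group by the item (_._1) and keep only the index (_._2) of each pair.
val indexes: Map[String, Array[Int]] =
  a.zipWithIndex.groupMap(_._1)(_._2)

// The two-step form it replaces: group first, then strip the item from each pair.
val viaGroupBy: Map[String, List[Int]] =
  a.zipWithIndex.groupBy(_._1).map { case (k, ps) => k -> ps.map(_._2).toList }

indexes.foreach { case (k, v) => println(s"$k -> ${v.mkString(", ")}") }
```

Arrays compare by reference in Scala, so convert the values with toList before comparing the two maps.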
Consider such a map:
Map("one" -> Iterable(1,2,3,4), "two" -> Iterable(3,4,5), "three" -> Iterable(1,2))
I want to get a list of all possible permutations of elements under Iterable, one element for each key. For this example, this would be something like:
// first element of "one", first element of "two", first element of "three"
// second element of "one", second element of "two", second element of "three"
// third element of "one", third element of "two", first element of "three"
// etc.
Seq(Iterable(1,3,1), Iterable(2,4,2), Iterable(3,5,1),...)
What would be a good way to accomplish that?
val m = Map("one" -> Iterable(1,2,3,4), "two" -> Iterable(5,6,7), "three" -> Iterable(8,9))
If you want every combination:
for (a <- m("one"); b <- m("two"); c <- m("three")) yield Iterable(a,b,c)
If you want each iterable to march up together, but stop when the shortest is exhausted:
(m("one"), m("two"), m("three")).zipped.map((a,b,c) => Iterable(a,b,c))
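On Scala 2.13 and later, zipped on tuples is deprecated in favour of lazyZip. A sketch of the same "stop at the shortest" behaviour with lazyZip, assuming 2.13+ and using `one`, `two`, `three` and `marched` as illustrative names, might be:

```scala
// lazyZip needs Scala 2.13 or later.
val m = Map("one" -> Iterable(1, 2, 3, 4), "two" -> Iterable(5, 6, 7), "three" -> Iterable(8, 9))

val one = m("one").toList
val two = m("two").toList
val three = m("three").toList

// Walk the three lists in lockstep; the shortest list decides the length.
val marched = one.lazyZip(two).lazyZip(three).map((a, b, c) => List(a, b, c))

println(marched)
```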
If you want each iterable to wrap around but stop when the longest one has been exhausted:
val maxlen = m.values.map(_.size).max
def icf[A](i: Iterable[A]) = Iterator.continually(i).flatMap(identity).take(maxlen).toList
(icf(m("one")), icf(m("two")), icf(m("three"))).zipped.map((a,b,c) => Iterable(a,b,c))
Edit: If you want arbitrary numbers of input lists, then you're best off with recursive functions. For Cartesian products:
def cart[A](iia: Iterable[Iterable[A]]): List[List[A]] = {
  if (iia.isEmpty) List()
  else {
    val h = iia.head
    val t = iia.tail
    if (t.isEmpty) h.map(a => List(a)).toList
    else h.toList.map(a => cart(t).map(x => a :: x)).flatten
  }
}
and to replace zipped you want something like:
def zipper[A](iia: Iterable[Iterable[A]]): List[List[A]] = {
  def zipp(iia: Iterable[Iterator[A]], part: List[List[A]] = Nil): List[List[A]] = {
    if (iia.isEmpty || !iia.forall(_.hasNext)) part
    else zipp(iia, iia.map(_.next()).toList :: part)
  }
  // rows are accumulated in reverse, so reverse once at the end
  zipp(iia.map(_.iterator).toList).reverse
}
You can try these out with cart(m.values), zipper(m.values), and zipper(m.values.map(icf)).
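For the Cartesian product of an arbitrary number of lists, a fold is another option. This is a sketch rather than a drop-in replacement for cart above: `cartesian` is a hypothetical name, and unlike cart it returns List(List()) for an empty input instead of List().

```scala
// Build the product right to left: prepend every element of the current list
// to every combination already built from the lists after it.
def cartesian[A](lists: List[List[A]]): List[List[A]] =
  lists.foldRight(List(List.empty[A])) { (xs, acc) =>
    for (x <- xs; rest <- acc) yield x :: rest
  }

val small = cartesian(List(List(1, 2), List(3, 4)))
println(small)

val m = Map("one" -> Iterable(1, 2, 3, 4), "two" -> Iterable(5, 6, 7), "three" -> Iterable(8, 9))
val all = cartesian(m.values.map(_.toList).toList)
println(all.size)
```

The seed List(List.empty[A]) represents the single empty combination, which is what lets the fold work for any number of input lists.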
If you are after a Cartesian product, I have a solution for lists of lists of something.
xproduct (List (List (1, 2, 3, 4), List (3, 4, 5), List (1, 2)))
res3: List[List[_]] = List(List(1, 3, 1), List(2, 3, 1), List(3, 3, 1), List(4, 3, 1), List(1, 3, 2), List(2, 3, 2), List(3, 3, 2), List(4, 3, 2), List(1, 4, 1), List(2, 4, 1), List(3, 4, 1), List(4, 4, 1), List(1, 4, 2), List(2, 4, 2), List(3, 4, 2), List(4, 4, 2), List(1, 5, 1), List(2, 5, 1), List(3, 5, 1), List(4, 5, 1), List(1, 5, 2), List(2, 5, 2), List(3, 5, 2), List(4, 5, 2))
Invoke it with Rex's m:
xproduct (List (m("one").toList, m("two").toList, m("three").toList))
Have a look at this answer. The question is about a fixed number of lists to combine, but some answers address the general case.