I have two arrays like [1, 2, 3] and ["a", "b", "c"] and I want to map over the zipped values (1, "a"), (2, "b"), and (3, "c") using Zip2.
If I do this:
let foo = map(Zip2([1, 2, 3], ["a", "b", "c"]).generate()) { $0.0 }
foo has the type ZipGenerator2<IndexingGenerator<Array<Int>>, IndexingGenerator<Array<String>>>?.
Is there a way to make that an array?
The following will get you an array from the return value of Zip2:
var myZip = Zip2([1, 2, 3], ["a", "b", "c"]).generate()
var myZipArray: Array<(Int, String)> = []
while let elem = myZip.next() {
myZipArray += elem
}
println(myZipArray) // [(1, a), (2, b), (3, c)]
-- UPDATE: EVEN BETTER! --
let myZip = Zip2([1, 2, 3], ["a", "b", "c"])
let myZipArray = Array(myZip)
println(myZipArray) // [(1, a), (2, b), (3, c)]
-- now for fun --
I'm going to guess that we can init a new Array with anything that responds to generate() ?
println(Array("abcde")) // [a, b, c, d, e]
Assume that vals holds the zipped values, which I'll presume is an array of 2-tuples (pairs). Like this:
let vals = [(1, "a"), (2, "b"), (3, "c")]
With that, just invoke the map() method on an array.
vals.map { $0.0 }
For example:
> vals.map { $0.1 }
$R16: String[] = size=3 {
[0] = "a"
[1] = "b"
[2] = "c"
}
Related
I want to do this in Scala, but nothing I have tried works. In PySpark it is straightforward:
from operator import itemgetter
rdd = sc.parallelize([(0, [(0,'a'), (1,'b'), (2,'c')]), (1, [(3,'x'), (5,'y'), (6,'z')])])
mapped = rdd.mapValues(lambda v: list(map(itemgetter(0), v)))
Output
mapped.collect()
[(0, [0, 1, 2]), (1, [3, 5, 6])]
val rdd = sparkContext.parallelize(List(
(0, Array((0, "a"), (1, "b"), (2, "c"))),
(1, Array((3, "x"), (5, "y"), (6, "z")))
))
rdd
.mapValues(v => v.map(_._1))
.foreach(v=>println(v._1+"; "+v._2.toSeq.mkString(",") ))
Output:
0; 0,1,2
1; 3,5,6
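For readers without a Spark session at hand, the projection done inside mapValues can be checked on plain Scala collections. This is only a sketch: the object name FirstElements and the value data are made up for illustration, and a List of pairs stands in for the RDD.

```scala
object FirstElements {
  // Stand-in for the RDD: a list of (key, array of pairs).
  val data: List[(Int, Array[(Int, String)])] = List(
    (0, Array((0, "a"), (1, "b"), (2, "c"))),
    (1, Array((3, "x"), (5, "y"), (6, "z")))
  )

  // Keep each key and project every pair in its value onto the first component.
  val mapped: List[(Int, List[Int])] =
    data.map { case (key, pairs) => (key, pairs.map(_._1).toList) }
}
```

The function passed to map here does the same work as v => v.map(_._1) in the mapValues call above; only the container differs.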
Is there a function in Scala that groups the elements of a list by their number of occurrences?
For example, I have this list:
val x = List("c", "b", "b", "c", "a", "d", "c")
And I want to get a new list like this:
x = List((3, "c"), (2, "b"), (1, "a"), (1, "d"))
You can first count the occurrences of each element and then swap the components of the resulting tuples:
List("c", "b", "b", "c", "a", "d", "c")
.groupBy(identity).mapValues(_.size) // Map(b -> 2, d -> 1, a -> 1, c -> 3)
.toList // List((b,2), (d,1), (a,1), (c,3))
.map{ case (k, v) => (v, k) } // List((2,b), (1,d), (1,a), (3,c))
You don't specifically mention a notion of order for the output, but if this was a requirement, this solution would need to be adapted.
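If ordering does matter, one possible adaptation is sketched below. It assumes the desired order is "highest count first, ties in order of first appearance", which the question does not state explicitly; the object and value names are illustrative only.

```scala
object CountByFrequency {
  val x = List("c", "b", "b", "c", "a", "d", "c")

  // Count occurrences; map over the grouped entries rather than using mapValues,
  // which is deprecated on Map since Scala 2.13.
  val counts: Map[String, Int] =
    x.groupBy(identity).map { case (elem, occurrences) => (elem, occurrences.size) }

  // distinct keeps first-appearance order; sortBy is stable, so ties keep that order.
  val ordered: List[(Int, String)] =
    x.distinct.map(elem => (counts(elem), elem)).sortBy { case (count, _) => -count }
}
```

Here ordered evaluates to List((3,c), (2,b), (1,a), (1,d)), matching the list asked for in the question.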
Try this to get exactly what you want, in the order you mentioned (i.e., the order of first appearance in the list is preserved while counting):
x.distinct.map(v=>(x.filter(_==v).size,v))
In the Scala REPL:
scala> val x = List("c", "b", "b", "c", "a", "d", "c")
x: List[String] = List(c, b, b, c, a, d, c)
scala> x.distinct.map(v=>(x.filter(_==v).size,v))
res225: List[(Int, String)] = List((3,c), (2,b), (1,a), (1,d))
I want to select some elements (features) of an RDD based on a binary array. I have an array of 0s and 1s, of size 40, that specifies whether the element at each index should be kept.
My RDD was created from the kddcup99 dataset:
val rdd=sc.textFile("./data/kddcup.txt")
val data=rdd.map(_.split(','))
How can I filter or select the elements of data (an RDD[Array[String]]) whose corresponding index in the binary array has the value 1?
If I understood your question correctly, you have an array like :
val arr = Array(1, 0, 1, 1, 1, 0)
And a RDD[Array[String]] which looks like :
val rdd = sc.parallelize(Array(
Array("A", "B", "C", "D", "E", "F") ,
Array("G", "H", "I", "J", "K", "L")
) )
Now, to get the elements at the indices where arr has 1, you first need the indices whose value in arr is 1:
val requiredIndices = arr.zipWithIndex.filter(_._1 == 1).map(_._2)
requiredIndices: Array[Int] = Array(0, 2, 3, 4)
Then, similarly, on the RDD you can use zipWithIndex and contains to check whether each index is present in your requiredIndices array:
rdd.map(_.zipWithIndex.filter(x => requiredIndices.contains(x._2) ).map(_._1) )
// Array[Array[String]] = Array(Array(A, C, D, E), Array(G, I, J, K))
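An alternative that avoids the repeated contains lookup is to zip each row directly with the mask. The following is a minimal sketch on plain arrays; the names MaskSelect, mask, and select are made up for illustration, and it assumes the mask and every row have the same length (zip silently truncates to the shorter one).

```scala
object MaskSelect {
  val mask = Array(1, 0, 1, 1, 1, 0)

  // Pair every field with its mask bit and keep the fields whose bit is 1.
  def select(row: Array[String]): Array[String] =
    row.zip(mask).collect { case (field, 1) => field }

  val rows = Array(
    Array("A", "B", "C", "D", "E", "F"),
    Array("G", "H", "I", "J", "K", "L")
  )
  val selected: Array[Array[String]] = rows.map(select)
}
```

On an RDD[Array[String]] the same function could presumably be applied with rdd.map(select), since it only closes over a small local array.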
I have an array, something like that:
val a = Array("a", "c", "c", "z", "c", "b", "a")
and I want to get a map with keys of all different values of this array and values with a collection of relevant indexes for each such group, i.e. for a given array the answer would be:
Map(
"a" -> Array(0, 6),
"b" -> Array(5),
"c" -> Array(1, 2, 4),
"z" -> Array(3)
)
Surprisingly, it proved to be somewhat more complicated than I had anticipated. The best I've come up with so far is:
a.zipWithIndex.groupBy {
case(cnt, idx) => cnt
}.map {
case(cnt, arr) => (cnt, arr.map {
case(k, v) => v
})
}
which is neither concise nor easy to understand. Any better ideas?
Your code can be rewritten as a one-liner, but it looks ugly.
as.zipWithIndex.groupBy(_._1).mapValues(_.map(_._2))
Another way is to use mutable.MultiMap
import collection.mutable.{ HashMap, MultiMap, Set }
val as = Array("a", "c", "c", "z", "c", "b", "a")
val mm = new HashMap[String, Set[Int]] with MultiMap[String, Int]
and then just add every binding
as.zipWithIndex foreach (mm.addBinding _).tupled
//mm = Map(z -> Set(3), b -> Set(5), a -> Set(0, 6), c -> Set(1, 2, 4))
Finally, you can convert it with mm.toMap if you want an immutable version.
Here's a version with foldRight. I think it's reasonably clear.
val a = Array("a", "c", "c", "z", "c", "b", "a")
a
.zipWithIndex
.foldRight(Map[String, List[Int]]())
{case ((e,i), m)=> m updated (e, i::m.getOrElse(e, Nil))}
//> res0: scala.collection.immutable.Map[String,List[Int]] = Map(a -> List(0, 6)
//| , b -> List(5), c -> List(1, 2, 4), z -> List(3))
Another version using foldLeft and an immutable Map with default value:
val a = Array("a", "c", "c", "z", "c", "b", "a")
a.zipWithIndex.foldLeft(Map[String, List[Int]]().withDefaultValue(Nil))( (m, p) => m + ((p._1, p._2 +: m(p._1))))
// res6: scala.collection.immutable.Map[String,List[Int]] = Map(a -> List(6, 0), c -> List(4, 2, 1), z -> List(3), b -> List(5))
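Note that prepending in a foldLeft leaves each index list in reverse order, as the output above shows (a -> List(6, 0)). If ascending indices are wanted, one option is to append to a Vector instead. This is a sketch with illustrative names, not part of the original answer.

```scala
object FoldLeftIndices {
  val a = Array("a", "c", "c", "z", "c", "b", "a")

  // Appending to a Vector keeps the indices in ascending order.
  val indices: Map[String, Vector[Int]] =
    a.zipWithIndex.foldLeft(Map.empty[String, Vector[Int]].withDefaultValue(Vector.empty[Int])) {
      case (acc, (elem, i)) => acc.updated(elem, acc(elem) :+ i)
    }
}
```

Vector is chosen here because appending to it is cheap, whereas appending to a List copies the whole list each time.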
Starting in Scala 2.13, we can use the new groupMap which (as its name suggests) is a one-pass equivalent of a groupBy and a mapping over grouped items:
// val a = Array("a", "c", "c", "z", "c", "b", "a")
a.zipWithIndex.groupMap(_._1)(_._2)
// Map("z" -> Array(3), "b" -> Array(5), "a" -> Array(0, 6), "c" -> Array(1, 2, 4))
This:
zips each item with its index, giving (item, index) tuples
groups elements based on their first tuple part (_._1) (group part of groupMap)
maps grouped values to their second tuple part (_._2 i.e. their index) (map part of groupMap)
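To make the equivalence concrete, the sketch below computes the same result with groupMap and with groupBy followed by a map. It requires Scala 2.13 or later for groupMap; the object and value names are illustrative, and Lists are used instead of Arrays only so that the two results can be compared with ==.

```scala
object IndexGroups {
  val a = Array("a", "c", "c", "z", "c", "b", "a")

  // (element, index) pairs as a List, so results compare structurally.
  val pairs: List[(String, Int)] = a.zipWithIndex.toList

  // Scala 2.13+: group by the element and keep only the index, in one pass.
  val viaGroupMap: Map[String, List[Int]] = pairs.groupMap(_._1)(_._2)

  // Pre-2.13 equivalent: group first, then strip the element from each pair.
  val viaGroupBy: Map[String, List[Int]] =
    pairs.groupBy(_._1).map { case (elem, group) => (elem, group.map(_._2)) }
}
```

Both values hold the same map; groupMap simply avoids building the intermediate groups of full pairs.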
Consider such a map:
Map("one" -> Iterable(1,2,3,4), "two" -> Iterable(3,4,5), "three" -> Iterable(1,2))
I want to get a list of all possible permutations of elements under Iterable, one element for each key. For this example, this would be something like:
// first element of "one", first element of "two", first element of "three"
// second element of "one", second element of "two", second element of "three"
// third element of "one", third element of "two", first element of "three"
// etc.
Seq(Iterable(1,3,1), Iterable(2,4,2), Iterable(3,5,1),...)
What would be a good way to accomplish that?
val m = Map("one" -> Iterable(1,2,3,4), "two" -> Iterable(5,6,7), "three" -> Iterable(8,9))
If you want every combination:
for (a <- m("one"); b <- m("two"); c <- m("three")) yield Iterable(a,b,c)
If you want each iterable to march up together, but stop when the shortest is exhausted:
(m("one"), m("two"), m("three")).zipped.map((a,b,c) => Iterable(a,b,c))
If you want each iterable to wrap around but stop when the longest one has been exhausted:
val maxlen = m.values.map(_.size).max
def icf[A](i: Iterable[A]) = Iterator.continually(i).flatMap(identity).take(maxlen).toList
(icf(m("one")), icf(m("two")), icf(m("three"))).zipped.map((a,b,c) => Iterable(a,b,c))
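On Scala 2.13 and later, zipped on tuples is deprecated in favour of lazyZip. A sketch of the "march up together" variant in that style follows; the object name and the value marched are made up for illustration, and the first input is converted to a List so that the result is a List as well.

```scala
object LazyZipDemo {
  val m = Map(
    "one"   -> Iterable(1, 2, 3, 4),
    "two"   -> Iterable(5, 6, 7),
    "three" -> Iterable(8, 9)
  )

  // Like zipped, lazyZip stops as soon as the shortest input is exhausted.
  val marched: List[List[Int]] =
    m("one").toList.lazyZip(m("two")).lazyZip(m("three")).map((a, b, c) => List(a, b, c))
}
```

With the map m above, marched is List(List(1, 5, 8), List(2, 6, 9)), because "three" has only two elements.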
Edit: If you want arbitrary numbers of input lists, then you're best off with recursive functions. For Cartesian products:
def cart[A](iia: Iterable[Iterable[A]]): List[List[A]] = {
if (iia.isEmpty) List()
else {
val h = iia.head
val t = iia.tail
if (t.isEmpty) h.map(a => List(a)).toList
else h.toList.map(a => cart(t).map(x => a :: x)).flatten
}
}
and to replace zipped you want something like:
def zipper[A](iia: Iterable[Iterable[A]]): List[List[A]] = {
def zipp(iia: Iterable[Iterator[A]], part: List[List[A]] = Nil): List[List[A]] = {
if (iia.isEmpty || !iia.forall(_.hasNext)) part.reverse // rows were prepended, so restore input order
else zipp(iia, iia.map(_.next).toList :: part)
}
zipp(iia.map(_.iterator))
}
You can try these out with cart(m.values), zipper(m.values), and zipper(m.values.map(icf)).
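For comparison, the Cartesian product can also be written as a fold instead of explicit recursion. This is a sketch under a hypothetical name, cartesian, and it is not the cart function above: unlike cart, it returns List(Nil) rather than List() when given no input lists.

```scala
object Cartesian {
  // Hypothetical helper: builds the product from the right, one input list at a time.
  def cartesian[A](lists: List[List[A]]): List[List[A]] =
    lists.foldRight(List(List.empty[A])) { (choices, acc) =>
      // Prepend every choice from the current list to every partial combination.
      for (choice <- choices; rest <- acc) yield choice :: rest
    }

  val small: List[List[Int]] = cartesian(List(List(1, 2), List(3, 4, 5)))
}
```

With the map m from this answer, it could be called as Cartesian.cartesian(m.values.map(_.toList).toList).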
If you are after a Cartesian product, I have a solution for lists of lists of something.
xproduct (List (List (1, 2, 3, 4), List (3, 4, 5), List (1, 2)))
res3: List[List[_]] = List(List(1, 3, 1), List(2, 3, 1), List(3, 3, 1), List(4, 3, 1), List(1, 3, 2), List(2, 3, 2), List(3, 3, 2), List(4, 3, 2), List(1, 4, 1), List(2, 4, 1), List(3, 4, 1), List(4, 4, 1), List(1, 4, 2), List(2, 4, 2), List(3, 4, 2), List(4, 4, 2), List(1, 5, 1), List(2, 5, 1), List(3, 5, 1), List(4, 5, 1), List(1, 5, 2), List(2, 5, 2), List(3, 5, 2), List(4, 5, 2))
Invoke it with Rex's m:
xproduct (List (m("one").toList, m("two").toList, m("three").toList))
Have a look at this answer. The question is about a fixed number of lists to combine, but some answers address the general case.