How to pass arguments to a function that's nested inside the ipywidgets.interactive_output() function - callback

Documented behavior:
a = widgets.IntSlider()
b = widgets.IntSlider()
c = widgets.IntSlider()
ui = widgets.HBox([a, b, c])

def f(a, b, c):
    print((a, b, c))

out = widgets.interactive_output(f, {'a': a, 'b': b, 'c': c})
display(ui, out)
Desired behavior:
Widget > Value Assigned to Variable > Input in analysis > Compute > Plot
Problem: nesting a compute_function causes an error saying that df1, df2, df3, df4, df5 are not defined (they are not being fed into the outer function).
Alternatively, I can try to feed df1, df2, df3, df4, df5 into the interact function f in this example. However, this gives me an error that the inputs should be widgets.
a = widgets.IntSlider()
b = widgets.IntSlider()
c = widgets.IntSlider()
ui = widgets.HBox([a, b, c])

def f(a, b, c):
    df = compute_function(a, b, c, df1, df2, df3, df4, df5)
    plot = df.plot()
    display(plot)

out = widgets.interactive_output(f, {'a': a, 'b': b, 'c': c})
display(ui, out)
Perhaps the desired outcome described above is not a good approach. Which would be the better approach:
- compute all possible permutations into a huge dataframe, which would be passed into widgets.interactive_output() to filter the huge dataframe?
- nest a computation function inside widgets.interactive_output()?

Related

What can be the best approach - Get all possible sublists

I have a List "a, b, c, d", and this is the expected result:
a
ab
abc
abcd
b
bc
bcd
c
cd
d
I tried brute force, but I think there might be a more efficient solution, given that the list may be very long.
Here's a one-liner.
"abcd".tails.flatMap(_.inits.toSeq.init.reverse).mkString(",")
//res0: String = a,ab,abc,abcd,b,bc,bcd,c,cd,d
The mkString() is added just so we can see the result. Otherwise the result is an Iterator[String], which is a pretty memory efficient collection type.
The reverse is only there so that it comes out in the order you specified. If the order of the results is unimportant then that can be removed.
The toSeq.init is there to remove empty elements left behind by the inits call. If those can be dealt with elsewhere then this can also be removed.
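For intuition, here is roughly what each step of the one-liner produces (a REPL sketch; it assumes the standard StringOps tails/inits, which return an Iterator[String]):

"abcd".tails.toList              // List("abcd", "bcd", "cd", "d", "")
"bcd".inits.toList               // List("bcd", "bc", "b", "")
"bcd".inits.toSeq.init           // List("bcd", "bc", "b")  (init drops the trailing "")
"bcd".inits.toSeq.init.reverse   // List("b", "bc", "bcd")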
This may not be the best solution, but one way of doing this is by using the sliding function, as follows:
val lst = List('a', 'b', 'c', 'd')
val groupedElements = (1 to lst.size).flatMap(x => lst.sliding(x, 1))
groupedElements.foreach(x => println(x.mkString("")))
//output
/* a
b
c
d
ab
bc
cd
abc
bcd
abcd
*/
It may not be the best solution, but I think it is a good one, and it's tail-recursive.
First, this function gets the prefixes of a List:
def getSubList[A](lista: Seq[A]): Seq[Seq[A]] = {
  for {
    i <- 1 to lista.length
  } yield lista.take(i)
}
And then this one performs the recursion, calling the first function to obtain all the possible sublists:
import scala.annotation.tailrec

def getSubListRecursive[A](lista: Seq[A]): Seq[Seq[A]] = {
  @tailrec
  def go(acc: Seq[Seq[A]], rest: Seq[A]): Seq[Seq[A]] = {
    rest match {
      case Nil => acc
      case l => go(acc = acc ++ getSubList(l), rest = l.tail)
    }
  }
  go(Nil, lista)
}
The output:
val l = Seq("a", "b", "c", "d")
getSubListRecursive(l)
// res4: Seq[Seq[String]] = List(List(a), List(a, b), List(a, b, c), List(a, b, c, d), List(b), List(b, c), List(b, c, d), List(c), List(c, d), List(d))

map3 in Scala in Parallelism

def map2[A, B, C](a: Par[A], b: Par[B])(f: (A, B) => C): Par[C] =
  (es: ExecutorService) => {
    val af = a(es)
    val bf = b(es)
    UnitFuture(f(af.get, bf.get))
  }

def map3[A, B, C, D](pa: Par[A], pb: Par[B], pc: Par[C])(f: (A, B, C) => D): Par[D] =
  map2(map2(pa, pb)((a, b) => (c: C) => f(a, b, c)), pc)(_(_))
I have map2 and need to implement map3 in terms of map2. I found the solution on GitHub but it is hard to understand. Can anyone take a look at it and explain map3, and also what _(_) does?
On a purely abstract level, map2 means you can run two tasks in parallel, and that is a new task in itself. The implementation provided for map3 is: run in parallel (the task that consists of running the first two in parallel) and (the third task).
Now down to the code: first, let's give names to all the objects created (I also expanded the _ notations for clarity):
def map3[A, B, C, D](pa: Par[A], pb: Par[B], pc: Par[C])(f: (A, B, C) => D): Par[D] = {
  def partialCurry(a: A, b: B)(c: C): D = f(a, b, c)
  val pc2d: Par[C => D] = map2(pa, pb)((a, b) => partialCurry(a, b))
  def applyFunc(func: C => D, c: C): D = func(c)
  map2(pc2d, pc)((c2d, c) => applyFunc(c2d, c))
}
Now remember that map2 takes two Par[_], and a function to combine the eventual values, to get a Par[_] of the result.
The first time you use map2 (the inside one), you parallelize the first two tasks, and combine them into a function. Indeed, using f, if you have a value of type A and a value of type B, you just need a value of type C to build one of type D, so this exactly means that partialCurry(a, b) is a function of type C => D (partialCurry itself is of type (A, B) => C => D).
Now you have again two values of type Par[_], so you can again map2 on them, and there is only one natural way to combine them to get the final value.
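For concreteness, here is a minimal, self-contained sketch that compiles and runs. It assumes the thunk-based encoding Par[A] = ExecutorService => Future[A] and the UnitFuture wrapper from Functional Programming in Scala, which this exercise comes from:

import java.util.concurrent.{ExecutorService, Executors, Future, TimeUnit}

object ParDemo {
  type Par[A] = ExecutorService => Future[A]

  // A Future that is already complete; it just wraps a computed value.
  case class UnitFuture[A](get: A) extends Future[A] {
    def get(timeout: Long, unit: TimeUnit): A = get
    def isDone: Boolean = true
    def isCancelled: Boolean = false
    def cancel(evenIfRunning: Boolean): Boolean = false
  }

  def unit[A](a: A): Par[A] = _ => UnitFuture(a)

  def map2[A, B, C](a: Par[A], b: Par[B])(f: (A, B) => C): Par[C] =
    es => {
      val af = a(es)
      val bf = b(es)
      UnitFuture(f(af.get, bf.get))
    }

  def map3[A, B, C, D](pa: Par[A], pb: Par[B], pc: Par[C])(f: (A, B, C) => D): Par[D] =
    map2(map2(pa, pb)((a, b) => (c: C) => f(a, b, c)), pc)(_(_))

  def main(args: Array[String]): Unit = {
    val es = Executors.newFixedThreadPool(2)
    println(map3(unit(1), unit(2), unit(3))(_ + _ + _)(es).get) // prints 6
    es.shutdown()
  }
}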
The previous answer is correct but I found it easier to think about like this:
def map3[A, B, C, D](a: Par[A], b: Par[B], c: Par[C])(f: (A, B, C) => D): Par[D] = {
  val f1 = (a: A, b: B) => (c: C) => f(a, b, c)
  val f2: Par[C => D] = map2(a, b)(f1)
  map2(f2, c)((f3: C => D, c: C) => f3(c))
}
Create a function f1 that is a version of f with the first two arguments partially applied; we can then map2 that with a and b, giving us a function of type C => D in the Par context (f2).
Finally we can use f2 and c as arguments to map2, then apply f3 (of type C => D) to c to give us a D in the Par context.
Hope this helps someone!
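As for the _(_) in the question: it is shorthand for a two-argument function literal that applies its first argument (here, the C => D produced by the inner map2) to its second. A tiny standalone illustration (the names are mine):

val applyTo: (Int => String, Int) => String = _(_)   // same as (f, x) => f(x)
applyTo(i => s"got $i", 42)                          // "got 42"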

aggregating multiple values at once

So I'm running into a speed issue where I have a dataset that needs to be aggregated multiple times.
Initially my team had set up three accumulators and were running a single foreach loop over the data. Something along the lines of
val accum1: Accumulable[A]
val accum2: Accumulable[B]
val accum3: Accumulable[C]

data.foreach { u =>
  accum1 += u
  accum2 += u
  accum3 += u
}
I am trying to switch these accumulations into an aggregation so that I can get a speed boost and still have access to the accumulated values for debugging. I am currently trying to figure out a way to aggregate these three types at once, since running 3 separate aggregations is significantly slower. Does anyone have any thoughts on how I can do this? Perhaps aggregating agnostically and then pattern matching to split into separate RDDs?
Thank you
As far as I can tell, all you need here is aggregate, with zeroValue, seqOp and combOp corresponding to the operations performed by your accumulators.
val zeroValue: (A, B, C) = ??? // (accum1.zero, accum2.zero, accum3.zero)

def seqOp(r: (A, B, C), t: T): (A, B, C) = r match {
  case (a, b, c) => {
    // Apply operations equivalent to
    //   accum1.addAccumulator(a, t)
    //   accum2.addAccumulator(b, t)
    //   accum3.addAccumulator(c, t)
    // and return the first argument, r
    ???
  }
}

def combOp(r1: (A, B, C), r2: (A, B, C)): (A, B, C) = (r1, r2) match {
  case ((a1, b1, c1), (a2, b2, c2)) => {
    // Apply operations equivalent to
    //   accum1.addInPlace(a1, a2)
    //   accum2.addInPlace(b1, b2)
    //   accum3.addInPlace(c1, c2)
    // and return the first argument, r1
    ???
  }
}

val rdd: RDD[T] = ???
val accums: (A, B, C) = rdd.aggregate(zeroValue)(seqOp, combOp)
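For a concrete (hypothetical) instance of the same pattern, here is a single pass computing sum, count, and max over an RDD[Int]; the names and values are illustrative, not from the question:

val nums: RDD[Int] = sc.parallelize(Seq(3, 1, 4, 1, 5))

val zero = (0L, 0L, Int.MinValue) // (sum, count, max)

def seq(r: (Long, Long, Int), t: Int): (Long, Long, Int) = r match {
  case (sum, count, max) => (sum + t, count + 1, math.max(max, t))
}

def comb(r1: (Long, Long, Int), r2: (Long, Long, Int)): (Long, Long, Int) = (r1, r2) match {
  case ((s1, c1, m1), (s2, c2, m2)) => (s1 + s2, c1 + c2, math.max(m1, m2))
}

val (sum, count, max) = nums.aggregate(zero)(seq, comb)
// sum = 14, count = 5, max = 5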

Spark: How to efficiently have intersections preserving duplicates (in Scala)?

I have 2 RDDs, each of which is a multiset of strings (i.e., it contains duplicates). I want to find the intersection of the two sets, preserving duplicates. Example:
RDD1 : a, b, b, c, c, c, c
RDD2 : a, a, b, c, c
The intersection I want is the set a, b, c, c, i.e. the intersection should contain each element the minimum number of times that it is present in both sets.
The default intersection transformation does not preserve duplicates AFAIK. Is there a way to efficiently compute the intersection using some other transformations and/or the intersection transformation? I'm trying to avoid doing it algorithmically, which is unlikely to be as efficient as doing it the Spark way. (For the interested, I'm trying to compute Jaccard bag similarity for a set of files).
Borrowing a little from the implementation of intersection, you could do something like:
val rdd1 = sc.parallelize(Seq("a", "b", "b", "c", "c", "c", "c"))
val rdd2 = sc.parallelize(Seq("a", "a", "b", "c", "c"))
val cogrouped = rdd1.map(k => (k, null)).cogroup(rdd2.map(k => (k, null)))
val groupSize = cogrouped.map { case (key, (buf1, buf2)) => (key, math.min(buf1.size, buf2.size)) }
val finalSet = groupSize.flatMap { case (key, size) => List.fill(size)(key) }
finalSet.collect
// Array(a, b, c, c)
This works because cogroup will maintain duplicate occurrences of values of a pair for each grouping (in this case, all of your nulls). Also note that we are doing no more shuffles here than we would have with the original use of intersection.
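An equivalent formulation (my own sketch, not part of the answer above) counts occurrences per key first, which avoids materializing the buffers of nulls:

val counts1 = rdd1.map((_, 1)).reduceByKey(_ + _)
val counts2 = rdd2.map((_, 1)).reduceByKey(_ + _)
val intersected = counts1.join(counts2).flatMap {
  case (key, (c1, c2)) => List.fill(math.min(c1, c2))(key)
}
intersected.collect() // Array(a, b, c, c), in some order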

Building a list of "incremental sums"

What is an efficient, functional way of building a list of "incremental sums"?
For example, given
val (a,b,c,d) = (2,3,5,6)
val list1 = List(a, b, c, d)
How would you implement f such that:
list1.map(f)
would result in
List(a, a+b, a+b+c, a+b+c+d)
Can you do
list1.scanLeft(0)(_ + _).tail
?
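Note that scanLeft keeps the initial accumulator at the head of the result, which is why the .tail is needed. For the values above, this evaluates as follows (a quick sketch):

val list1 = List(2, 3, 5, 6)
list1.scanLeft(0)(_ + _)        // List(0, 2, 5, 10, 16)
list1.scanLeft(0)(_ + _).tail   // List(2, 5, 10, 16), i.e. List(a, a+b, a+b+c, a+b+c+d)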