Adding lists by element in pyspark

I'd like to take an RDD of integer lists and reduce it down to one list. For example:
[1, 2, 3, 4]
[2, 3, 4, 5]
to
[3, 5, 7, 9]
I can do this in plain Python using the zip function, but I'm not sure how to replicate it in Spark. The only approach I know is to call collect on the RDD first, and I would rather keep the data in the RDD.

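If every list in the RDD has the same length, you can use reduce with zip. The element-wise addition runs on the executors; reduce is an action, so only the single summed list is returned to the driver, not the whole dataset: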
rdd = sc.parallelize([[1,2,3,4],[2,3,4,5]])
rdd.reduce(lambda x, y: [i+j for i, j in zip(x, y)])
# [3, 5, 7, 9]
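If the lists might differ in length, zip silently truncates to the shorter one. The following is a minimal sketch of one way around that, assuming missing positions should count as 0. The name `add_elementwise` is hypothetical, and the example uses plain Python's `functools.reduce` so it runs without a Spark cluster; since the function is associative and commutative, the same function could be passed to `rdd.reduce`.

```python
from functools import reduce
from itertools import zip_longest


def add_elementwise(x, y):
    """Add two lists position by position, treating missing positions as 0."""
    return [i + j for i, j in zip_longest(x, y, fillvalue=0)]


# Local stand-in for the RDD contents; the third row is deliberately shorter.
rows = [[1, 2, 3, 4], [2, 3, 4, 5], [10, 20]]

total = reduce(add_elementwise, rows)
print(total)  # [13, 25, 7, 9]
```

With equal-length lists this gives the same result as the zip version above.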

Related

Can you merge two Flux, without blocking, such that the result only contains unique elements?

Is there a way to merge two Flux such that the result only contains unique elements? I can block on the output and then convert it to a set, but is there a way that does not depend on blocking?
Source (Kotlin)
val set1 = Flux.just(1, 2, 3, 4, 5)
val set2 = Flux.just(2, 4, 6, 8, 10)
val mergedSet = set1.mergeWith(set2)
println(mergedSet.collectList().block())
Output
[1, 2, 3, 4, 5, 2, 4, 6, 8, 10]
Desired Output (order is not important)
[1, 2, 3, 4, 5, 6, 8, 10]

You can use Flux.merge and then apply distinct() to the result:
Flux.merge(Flux.just(1, 2, 3, 4, 5), Flux.just(2, 4, 6, 8, 10)).distinct();
This gives you a Flux that emits each value only once, and nothing has to block to build it.
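To illustrate what merging followed by distinct() does conceptually, here is a small Python sketch rather than Reactor code. It assumes the usual approach of remembering values that have already been seen. The `distinct` generator is a hypothetical helper, and `chain` only stands in for a merge of two sources that emit synchronously.

```python
from itertools import chain


def distinct(items):
    """Yield each value the first time it appears, skipping later repeats."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


set1 = [1, 2, 3, 4, 5]
set2 = [2, 4, 6, 8, 10]

# Values are filtered lazily as they flow through, not collected first.
merged = list(distinct(chain(set1, set2)))
print(merged)  # [1, 2, 3, 4, 5, 6, 8, 10]
```

The set of seen values grows with the number of unique elements, which is the trade-off of this kind of filtering on long streams.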

How can I merge 2 observables in a custom fashion?

Custom fashion is:
obs1 = [1, 3, 5, 7, 9], obs2 = [2, 4, 6, 8, 10] -> mergedObs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
I was thinking about obs1.zipWith(obs2) with the BiFunction (a, b) -> Observable.just(a, b), but then it's not obvious to me how to flatten the resulting Observable<Observable<Integer>>.

That looks like an ordered merge: whenever all sources have an item ready, the smallest one is emitted next.
Flowables.orderedMerge() (from the RxJavaExtensions project, not core RxJava):
Given a fixed number of input sources (which can be self-comparable or given a Comparator) merges them into a single stream by repeatedly picking the smallest one from each source until all of them complete.
Flowables.orderedMerge(Flowable.just(1, 3, 5), Flowable.just(2, 4, 6))
.test()
.assertResult(1, 2, 3, 4, 5, 6);
Edit
If the sources are guaranteed to be the same length, you can also zip them into a structure and then flatten that:
Observable.zip(source1, source2, (a, b) -> Arrays.asList(a, b))
.flatMapIterable(list -> list);
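The two approaches agree on the example from the question but differ in general. The following Python sketch is an analogy on plain lists, not RxJava code: `heapq.merge` plays the role of the ordered merge, and a zip followed by flattening plays the role of the zip and flatMapIterable pair.

```python
import heapq

obs1 = [1, 3, 5, 7, 9]
obs2 = [2, 4, 6, 8, 10]

# Ordered merge: always takes the smallest head; inputs must already be sorted.
ordered = list(heapq.merge(obs1, obs2))
print(ordered)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Zip then flatten: strictly alternates between the two inputs.
interleaved = [value for pair in zip(obs1, obs2) for value in pair]
print(interleaved)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# The results diverge once the inputs do not alternate in value.
a, b = [1, 2, 9], [3, 4, 5]
print(list(heapq.merge(a, b)))                    # [1, 2, 3, 4, 5, 9]
print([v for pair in zip(a, b) for v in pair])    # [1, 3, 2, 4, 9, 5]
```

Zipping also stops at the shorter input, which is why it only suits sources of equal length.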

Quicksort with swapping middle and first element in each partition

Assume we have a standard, two-way partition QuickSort algorithm that always pivots on the first element. However, in this slight variant of QuickSort, we first swap the first and middle elements, and then pivot on the 'new' first element. My question is, will this change the worst-case running time?
My initial thinking was no, as in each sub-array the elements are still in random order relative to each other, and thus switching the first and middle elements would not change the overall runtime. But as I am interested in finding the worst-case scenario, I'm not sure if there's some 'special' array that would cause this slight variant to change the worst-case runtime of the original algorithm.

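Quicksort’s worst case occurs when the pivot is always the minimum or maximum of its sub-array, and swapping in the middle element does not prevent that: the worst case stays quadratic, only the inputs that trigger it change. With that in mind, you can build a worst-case array by working backwards (this assumes the partition step leaves the remaining elements in their original relative order):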
[1, 2] (every element in a two-element array is a min or max)
[3, 1, 2] post-swap produces the above
[1, 3, 2] pre-swap
[4, 1, 3, 2] post-swap produces the above
[1, 4, 3, 2] pre-swap
[5, 1, 4, 3, 2] post-swap produces the above
[4, 1, 5, 3, 2] pre-swap
[6, 4, 1, 5, 3, 2] post-swap produces the above
[1, 4, 6, 5, 3, 2] pre-swap
etc.
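The construction can be checked with a short Python sketch. It makes two assumptions that the answer does not state explicitly: the middle index is (n - 1) // 2, which matches every array listed, and partitioning keeps the remaining elements in order. The names `worst_case_input` and `quicksort_variant` are hypothetical, and the comparison count is taken as one per non-pivot element in each partition.

```python
def worst_case_input(n):
    """Build the pre-swap worst-case array of size n (n >= 2) by working backwards."""
    arr = [1, 2]
    for k in range(3, n + 1):
        candidate = [k] + arr  # post-swap form: the maximum is first, so it is the pivot
        mid = (k - 1) // 2
        # Undo the first/middle swap to obtain the pre-swap form.
        candidate[0], candidate[mid] = candidate[mid], candidate[0]
        arr = candidate
    return arr


def quicksort_variant(items, counter):
    """Quicksort that swaps first and middle, then pivots on the new first element."""
    if len(items) <= 1:
        return list(items)
    a = list(items)
    mid = (len(a) - 1) // 2
    a[0], a[mid] = a[mid], a[0]
    pivot, rest = a[0], a[1:]
    counter[0] += len(rest)  # one comparison per remaining element
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort_variant(smaller, counter) + [pivot] + quicksort_variant(larger, counter)


n = 6
data = worst_case_input(n)
counter = [0]
result = quicksort_variant(data, counter)
print(data)        # [1, 4, 6, 5, 3, 2]
print(result)      # [1, 2, 3, 4, 5, 6]
print(counter[0])  # 15, which is n * (n - 1) / 2
```

Every partition removes only the pivot, so the comparison count is 5 + 4 + 3 + 2 + 1, the quadratic worst case.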

Is it possible to match N elements with a coffeescript splat?

Is it possible to specify how many elements a splat should match? Something like:
foo = [1, 2, 3, 4, 5, 6]
[firstThree...(3), fourth, rest...] = foo
console.log firstThree # [1, 2, 3]
console.log fourth # 4
console.log rest # [5, 6]

As far as I know, there is no way to limit the number of elements a splat can take.
But you can use ranges (search for "range" in the Loops and Comprehensions section of the docs) to get a similar syntax in your destructuring assignment:
foo = [1, 2, 3, 4, 5, 6]
[firstThree, fourth, rest] = [foo[0..2], foo[3], foo[4..-1]]
firstThree
# => [1, 2, 3]
fourth
# => 4
rest
# => [5, 6]

Remove list element at index?

There is a serious lack of methods available to lists in Swift. It is really disappointing, coming from a Python background. For example, I want to remove the first element, something like this would work in Python:
mylist = mylist[1:]
How do I remove an element from a list (preferably by index, but I can do whatever method is easiest)?

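Use remove(at:) (called removeAtIndex before Swift 3). To drop just the first element, removeFirst() also works.
var arr = [1, 2, 3]
arr.remove(at: 1) // arr is now [1, 3]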

If you want to remove a range of values, you can use removeSubrange (called removeRange before Swift 3):
var x = [1, 2, 3, 4, 5]
x.removeSubrange(1...2) // x is now [1, 4, 5]
var y = [1, 2, 3, 4, 5]
y.removeSubrange(1..<2) // y is now [1, 3, 4, 5]
Note that the range must lie within the array's bounds. If it does not, the program stops with a runtime error (a crash, not an exception you can catch).
