Julia: convert EnumerableGroupBy to array

I have the following code that does a groupby on an array of numbers and returns an array of tuples with the numbers and respective counts:
using Query
a = [1, 2, 1, 2, 3, 4, 6, 1, 5, 5, 5, 5]
key_counts = a |> @groupby(_) |> @map g -> (key(g), length(values(g)))
collect(key_counts)
Is there a way to complete the last step of the pipeline, converting key_counts of type QueryOperators.EnumerableMap{Tuple{Int64, Int64}, QueryOperators.EnumerableIterable{Grouping{Int64, Int64}, QueryOperators.EnumerableGroupBy{Grouping{Int64, Int64}, Int64, Int64, QueryOperators.EnumerableIterable{Int64, Vector{Int64}}, var"#12#17", var"#13#18"}}, var"#15#20"} directly to Vector{Tuple{Int, Int}}, by integrating the collect operation into the pipeline as a one-liner?

The question has been clarified. My answer is no longer intended as a solution but provides additional information.
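Using key_counts |> collect instead of collect(key_counts) works on the second line, but appending |> collect to the end of the pipeline does not. That feels like unwanted behavior, but it is most likely a parsing effect. A macro called without parentheses takes the rest of the line as its argument, and the body of g -> ... extends as far to the right as possible, so |> collect ends up inside the anonymous function. Writing the macro call with parentheses, as in a |> @groupby(_) |> @map(g -> (key(g), length(values(g)))) |> collect, should keep collect at the pipeline level.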
The response below is no longer relevant:
When I run your code I actually do receive a Vector{Tuple{Int, Int}} as output.
I'm using Julia v1.6.0 with Query v1.0.0.
using Query
a = [1, 2, 1, 2, 3, 4, 6, 1, 5, 5, 5, 5]
key_counts = a |> @groupby(_) |> @map g -> (key(g), length(values(g)))
output = collect(key_counts)
typeof(output) # Vector{Tuple{Int64, Int64}} (alias for Array{Tuple{Int64, Int64}, 1})

Related

Adding lists by element in pyspark

I'd like to take an RDD of integer lists and reduce it down to one list. For example...
[1, 2, 3, 4]
[2, 3, 4, 5]
to
[3, 5, 7, 9]
I can do this in Python using the zip function, but I'm not sure how to replicate it in Spark other than by calling collect on the RDD, and I want to keep the data in the RDD.
If all lists in the RDD have the same length, you can use reduce with zip:
rdd = sc.parallelize([[1,2,3,4],[2,3,4,5]])
rdd.reduce(lambda x, y: [i+j for i, j in zip(x, y)])
# [3, 5, 7, 9]
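To show what the lambda does, here is a minimal local sketch in plain Python. It assumes no SparkContext and uses functools.reduce in place of rdd.reduce, with the same element-wise addition. The helper name add_elementwise is hypothetical.

```python
from functools import reduce

def add_elementwise(x, y):
    # Pair up positions with zip and add them. zip stops at the shorter
    # list, which is why all lists should have the same length.
    return [i + j for i, j in zip(x, y)]

rows = [[1, 2, 3, 4], [2, 3, 4, 5]]
total = reduce(add_elementwise, rows)
print(total)  # [3, 5, 7, 9]
```

Spark's RDD.reduce expects a commutative and associative operator, because partitions are reduced independently and then combined. Element-wise addition of equal-length lists satisfies both properties.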

`to` function for a number in Scala?

What does the to function do in the following?
rdd.flatMap(x => x.to(3))
rdd is composed of {1, 2, 3, 3} and the above function returns {1, 2, 3, 2, 3, 3, 3}
I have been googling "scala number to function" and its variants, but can't seem to find what it does.
To function
To create a Range in Scala, use the predefined methods to and by.
Example:
1 to 3 will give Range(1, 2, 3)
What does to function do in RDD
a) Looking into map function:
sc.range(1L, 6L).map(x => x to 3).collect.foreach(println)
This prints
NumericRange(1, 2, 3)
NumericRange(2, 3)
NumericRange(3)
NumericRange() // 4 to 3 returns an empty Range
NumericRange() // 5 to 3 returns an empty Range
b) Looking into flatMap function:
sc.range(1L, 6L).flatMap(x => x to 3).collect.foreach(println)
This prints
1
2
3
2
3
3
flatMap combines mapping and flattening: it applies a function that returns a collection to each element, then concatenates the resulting collections into one.
An important consequence is that empty collections disappear, which is why NumericRange() does not appear in the flatMap result.
rdd.flatMap(x => x.to(3)) works as follows:
1. rdd.map(x => x.to(3))
a) take the first element of {1, 2, 3, 3}, which is 1
b) apply x => x.to(3), giving 1.to(3),
which is equivalent to 1 to 3 and generates the range {1, 2, 3}
c) take the second element of {1, 2, 3, 3}, which is 2
d) apply x => x.to(3), giving 2.to(3), which is equivalent to 2 to 3
and generates the range {2, 3}
e) repeat for the rest: 3 to 3 gives {3}, and the final 3 also gives {3}
f) so in the end you get {{1, 2, 3}, {2, 3}, {3}, {3}}
2. flatMap is a combination of map and flatten,
so flatMap turns {{1, 2, 3}, {2, 3}, {3}, {3}} into {1, 2, 3, 2, 3, 3, 3}
So x.to(y) simply generates the inclusive range [x, y]. You can verify this in the REPL:
C:\Windows\System32>scala
Welcome to Scala version 2.10.6 (Java HotSpot(TM) Client VM, Java 1.7.0_55).
Type in expressions to have them evaluated.
Type :help for more information.
scala> 2 to 5
res0: scala.collection.immutable.Range.Inclusive = Range(2, 3, 4, 5)
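The same map-then-flatten steps can be mirrored in Python as a sketch, using Python as a stand-in for Scala. The helper to is hypothetical and mimics Scala's inclusive x to y. itertools.chain.from_iterable does the flattening.

```python
from itertools import chain

def to(x, end):
    # Hypothetical analogue of Scala's inclusive `x to end`.
    # It returns an empty list when x > end.
    return list(range(x, end + 1))

rdd_like = [1, 2, 3, 3]

# Step 1 (map): one range per element gives nested lists
mapped = [to(x, 3) for x in rdd_like]
print(mapped)  # [[1, 2, 3], [2, 3], [3], [3]]

# Step 2 (flatten): concatenate the nested lists into one
flat = list(chain.from_iterable(mapped))
print(flat)  # [1, 2, 3, 2, 3, 3, 3]

# Empty ranges (4 to 3, 5 to 3) contribute nothing after flattening
flat_1_to_5 = list(chain.from_iterable(to(x, 3) for x in range(1, 6)))
print(flat_1_to_5)  # [1, 2, 3, 2, 3, 3]
```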

Remove list element at index?

There is a serious lack of methods available to lists in Swift. It is really disappointing, coming from a Python background. For example, I want to remove the first element, something like this would work in Python:
mylist = mylist[1:]
How do I remove an element from a list (preferably by index, but I can do whatever method is easiest)?
Use removeAtIndex (renamed remove(at:) in Swift 3):
var arr = [1, 2, 3]
arr.removeAtIndex(1)
If you want to remove a range of values, you can use removeRange (renamed removeSubrange in Swift 3):
var x = [1, 2, 3, 4, 5]
x.removeRange(1...2) // result is [1, 4, 5]
With the half-open range 1..<2, only index 1 is removed:
var x = [1, 2, 3, 4, 5]
x.removeRange(1..<2) // result is [1, 3, 4, 5]
Note that if you specify a range outside the array's bounds, this method triggers a runtime error that stops the program. Swift traps here rather than throwing a catchable exception.

How do I detect Change in Observable?

Say I have an IObservable<int> and I want an observable that ignores consecutive repeated numbers from the original one. How can I do that?
I have tried GroupBy(), but the source is a hot observable, so that is not going to work. All I need is to compare each value with the previous one.
You want to use DistinctUntilChanged.
// yields 1, 4, 4, 2, 2, 2, 3, 4, 4, 3
IObservable<int> a = ...;
// yields 1, 4, 2, 3, 4, 3
IObservable<int> b = a.DistinctUntilChanged();
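Here is a minimal Python analogue over a finite sequence that illustrates what DistinctUntilChanged does. It is not the Rx API, and distinct_until_changed is a hypothetical name. Each value is compared only with the one immediately before it, so a value can reappear later (4 and 3 below).

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def distinct_until_changed(source: Iterable[T]) -> Iterator[T]:
    # Emit a value only if it differs from the previous one.
    # A private sentinel marks "no previous value yet".
    sentinel = object()
    previous = sentinel
    for item in source:
        if previous is sentinel or item != previous:
            yield item
        previous = item

result = list(distinct_until_changed([1, 4, 4, 2, 2, 2, 3, 4, 4, 3]))
print(result)  # [1, 4, 2, 3, 4, 3]
```

For a finite list, [k for k, _ in itertools.groupby(xs)] gives the same result, because itertools.groupby groups runs of consecutive equal elements.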

Object range with conditions

In Groovy I can write
def n = 10
print 1..<n
Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]
Are there other languages that allow specifying a range with conditions?
Examples:
def n = 10
print 1<=..n
Output: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
def n = -2
print 1<=..n
Output: [1]
def n = -2
print 1..n
Output: [1, 0, -1, -2]
Python has the built-in range() function, which does a similar thing. It does not use operators for the condition, but you can specify a start value, a stop value and a step value. It produces all values starting with start, then start+step, and so on, until it reaches the stop value, which is not included. In Python 3, range() returns a lazy range object; wrap it in list() to get a list.
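As a sketch of how range() can reproduce the Groovy outputs above: the exclusive case maps directly. The inclusive and counting-down cases need the stop value shifted by one step. The helper inclusive_range is hypothetical and not part of Python.

```python
from typing import List

def inclusive_range(start: int, stop: int) -> List[int]:
    # Hypothetical helper mimicking Groovy's `start..stop`: both ends are
    # included, and it counts down when stop < start. range() excludes stop.
    step = 1 if stop >= start else -1
    return list(range(start, stop + step, step))

n = 10
exclusive = list(range(1, n))   # like Groovy's 1..<n
print(exclusive)                # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(inclusive_range(1, n))    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(inclusive_range(1, -2))   # [1, 0, -1, -2]
```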