Error when combining two Flux using zipWith - publish-subscribe

I am learning reactive streams and am trying to combine two Flux instances as follows:
List<Integer> elems = new ArrayList<>();
Flux.just(10,20,30,40)
.log()
.map(x -> x * 2)
.zipWith(Flux.range(0, Integer.MAX_VALUE),
(two, one) -> String.format("First : %d, Second : %d \n", one, two))
.subscribe(elems::add);
When calling subscribe, I got the following error:
Multiple markers at this line
- The method subscribe(Consumer<? super String>) in the type Flux<String> is not applicable for the arguments
(elems::add)
- The type List<Integer> does not define add(String) that is applicable here
I also got a few suggested fixes, but none of these alternatives worked.
Any suggestions on how to solve this issue?

Sometimes method references make you overlook the obvious. I have rewritten your code, but with an anonymous class.
List<Integer> elems = new ArrayList<>();
Flux.just(10,20,30,40)
.log()
.map(x -> x * 2)
.zipWith(Flux.range(0, Integer.MAX_VALUE),
(two, one) -> String.format("First : %d, Second : %d \n", one, two))
.subscribe(new Consumer<String>() {
    @Override
    public void accept(String s) {
    }
});
I used code completion in my IDE (IntelliJ) to create this anonymous class. As you can see, the input to this consumer is a String, which comes from
String.format("First : %d, Second : %d \n", one, two)
So the compiler is complaining that you cannot add a String to a List<Integer>, which is what you are trying to do with elems::add.
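A minimal sketch of one possible fix, assuming you actually want to collect the formatted strings (the only change is the element type of the list; alternatively you could drop the zipWith/format step and keep collecting Integers):
import java.util.ArrayList;
import java.util.List;

import reactor.core.publisher.Flux;

public class ZipWithExample {
    public static void main(String[] args) {
        // The combinator produces Strings, so collect Strings
        List<String> elems = new ArrayList<>();
        Flux.just(10, 20, 30, 40)
            .log()
            .map(x -> x * 2)
            .zipWith(Flux.range(0, Integer.MAX_VALUE),
                (two, one) -> String.format("First : %d, Second : %d \n", one, two))
            .subscribe(elems::add); // elems::add is now a valid Consumer<String>

        elems.forEach(System.out::print);
    }
}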

Related

get one element from each GroupedObservable in RxJava

I'm struggling with groupBy in RxJava.
The problem is that I can't get just one element from each group.
For example, I have a list of elements:
SomeModel:
class SomeModel {
    int importantField1;
    int mainData;
    SomeModel(int importantField1, int mainData) {
        this.importantField1 = importantField1;
        this.mainData = mainData;
    }
}
My list of models for example:
List<SomeModel> dataList = new ArrayList<>();
dataList.add(new SomeModel(3, 1));
dataList.add(new SomeModel(3, 1));
dataList.add(new SomeModel(2, 1));
In my real project the data model is more complex; I added identical models on purpose, because it matters for my project.
Then I try to take one element from each group in this manner:
List<SomeModel> resultList = Observable.fromIterable(dataList)
.sorted((s1, s2) -> Long.compare(s2.importantField1, s1.importantField1))
.groupBy(s -> s.importantField1)
.firstElement()
// Some transformation to Observable. May be it is not elegant, but it is not a problem
.flatMapObservable(item -> item)
.groupBy(s -> s.mainData)
//Till this line I have all I need. But then I need to take only one element from each branch
.flatMap(groupedItem -> groupedItem.firstElement().toObservable())
.toList()
.blockingGet();
And of course it's not working: I still have two identical elements in the resultList.
I can't add .firstElement() after the last .flatMap operator, because after the last .groupBy there may be more than one branch.
I need only one element from each branch.
I've tried this:
.flatMap(groupedItem -> groupedItem.publish(item -> item.firstElement().concatWith(item.singleElement()).toObservable()))
No effect. I took this code sample from this post: post
There the author suggests this:
.flatMap(grp -> grp.publish(o -> o.first().concatWith(o.ignoreElements())))
but even if I remove the last two lines of my code:
.toList()
.blockingGet();
and change resultList to Disposable disposable, the suggested option does not work because of an error:
.concatWith(o.ignoreElements()) - concatWith does not accept a Completable.
Some method signatures changed in 3.x since my post, so you'll need these:
first -> firstElement
firstElement returns a Maybe, which is no good inside publish, plus there is no Maybe.concatWith(CompletableSource); hence the need to convert to Observable:
.flatMap(grp ->
grp.publish(o ->
o.firstElement()
.toObservable()
.concatWith(o.ignoreElements())
)
)
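For context, here is a minimal end-to-end sketch of how that fix slots into the original pipeline, assuming RxJava 3 and the SomeModel class from the question:
import java.util.ArrayList;
import java.util.List;

import io.reactivex.rxjava3.core.Observable;

public class OnePerGroupExample {
    public static void main(String[] args) {
        List<SomeModel> dataList = new ArrayList<>();
        dataList.add(new SomeModel(3, 1));
        dataList.add(new SomeModel(3, 1));
        dataList.add(new SomeModel(2, 1));

        List<SomeModel> resultList = Observable.fromIterable(dataList)
            .sorted((s1, s2) -> Long.compare(s2.importantField1, s1.importantField1))
            .groupBy(s -> s.importantField1)
            .firstElement()                       // keep only the first (highest) group
            .flatMapObservable(group -> group)
            .groupBy(s -> s.mainData)
            // one element per branch, while still consuming the rest of each group
            .flatMap(grp -> grp.publish(o ->
                o.firstElement().toObservable().concatWith(o.ignoreElements())))
            .toList()
            .blockingGet();

        System.out.println(resultList.size()); // one element per mainData branch
    }
}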

How is new object instantiation handled in case of Datasets?

I have the following scenario:
case class A(name:String)
class Eq { def isMe(s:String) = s == "ME" }
val a = List(A("ME")).toDS
a.filter(l => new Eq().isMe(l.name))
Does this create a new Eq object for every data point on each executor?
Nice one! I didn't know there was a different filter method for a typed Dataset.
In order to answer your question, I will take a deep dive into Spark internals.
filter on a typed Dataset has the following signature:
def filter(func: T => Boolean): Dataset[T]
Note that func is parameterized with T, hence Spark needs to deserialize your object A along with the function. In the query plan this shows up as a TypedFilter node:
TypedFilter Main$$$Lambda$, class A, [StructField(name,StringType,true)], newInstance(class A)
where Main$$$Lambda$ is a randomly generated function name.
During the optimization phase, this node might be eliminated by the EliminateSerialization rule if the following condition is met:
ds.map(...).filter(...) can be optimized by this rule to save extra deserialization, but ds.map(...).as[AnotherType].filter(...) can not be optimized.
If the rule is applicable, TypedFilter is replaced by Filter.
The catch here is the Filter's condition. In fact, it is another special expression named Invoke, where:
targetObject is the filter function Main$$$Lambda$
functionName is apply, since it is a regular Scala function.
Spark eventually runs in one of two modes: code generation or interpretation. Let's concentrate on the first one, as it is the default.
Here is a simplified stack trace of the method invocations that generate the code:
SparkPlan.execute
//https://github.com/apache/spark/blob/03e30063127fd71bef8a14553381e805fe5b6679/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L596
-> WholeStageCodegenExec.execute
[child: Filter]
-> child.execute
[condition Invoke]
-> Invoke.genCode
//https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L345
-> doGenCode
Simplified code after the generation phase:
final class GeneratedIteratorForCodegenStage1 extends BufferedRowIterator {
    private Object[] references;
    private scala.collection.Iterator input;
    private UnsafeRowWriter writer = new UnsafeRowWriter();

    public GeneratedIteratorForCodegenStage1(Object[] references) {
        this.references = references;
    }

    public void init(scala.collection.Iterator inputs) {
        this.input = inputs;
    }

    protected void processNext() throws IOException {
        while (input.hasNext() && !stopEarly()) {
            InternalRow row = input.next();
            do {
                // Create the A object
                UTF8String value = row.getUTF8String(0);
                A a = new A(value.toString());
                // Filter by A's value
                boolean result = (Boolean) ((scala.Function1) references[0]).apply(a);
                if (!result) continue;
                writer.write(0, value);
                append(writer.getRow());
            } while (false);
            if (shouldStop()) return;
        }
    }
}
We can see that the projection is constructed with an array of objects passed in the references variable. But where, and how many times, is the references variable instantiated?
It is created during WholeStageCodegenExec and instantiated only once per partition.
This leads us to the answer: although the filter function is created only once per partition and not per data point, the Eq and A objects are created per data point.
If you are curious about where it is added to the codegen context: it happens here,
where javaType is scala.Function1
and value is the implementation, Main$$$Lambda$.

Is it possible to combine arbitrarily many timed Flux into one?

I am aware of combineLatest() to combine the latest values of two to six Flux instances (Combining Publishers in Project Reactor). However, assume I have a List<Flux<Integer>> listOfFlux. Is it somehow possible to combine all of them into one, e.g. something like listOfFlux.combineAllLatest((a, b) -> a + b)?
Yes, there is an operator variant just for that:
Flux.combineLatest(Iterable<? extends Publisher<? extends T>> sources,
Function<Object[],V> combinator)
You can use it like:
List<Flux<Integer>> listOfFlux = //...
Flux<Integer> result = Flux.combineLatest(listOfFlux, arr -> {
//...
});
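A self-contained sketch of how this could look, assuming the combinator simply sums the latest value from each source (the Object[] cast is unavoidable with this variant):
import java.util.Arrays;
import java.util.List;

import reactor.core.publisher.Flux;

public class CombineAllLatest {
    public static void main(String[] args) {
        List<Flux<Integer>> listOfFlux = Arrays.asList(
            Flux.just(1, 2, 3),
            Flux.just(10, 20),
            Flux.just(100));

        // Each emission combines the latest value seen from every source
        Flux<Integer> result = Flux.combineLatest(listOfFlux,
            arr -> Arrays.stream(arr).mapToInt(o -> (Integer) o).sum());

        result.subscribe(System.out::println);
    }
}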

UnsupportedOperationException while modifying a list inside a flatMap function

I get an UnsupportedOperationException when trying to modify a list from inside the "apply" method of flatMap in RxJava 2.
compositeDisposable.add(createObservable()
.flatMap(new Function<List<String>, ObservableSource<List<String>>>() {
@Override
public ObservableSource<List<String>> apply(List<String> s) throws Exception {
List<String> modiList = new ArrayList<String>();
modiList.addAll(s);
modiList.add("barber");
//s.add("barber") and return Observable.fromArray(s) throws an error
return Observable.fromArray(modiList);
}
})
.subscribeWith(getObserver()));
However, if I create a new list, it works fine, as shown above.
Any insights into why?
Below is my Observable creation logic:
String[] arr = {"hi", "hello", "bye"};
Observable<List<String>> observable;
observable = Observable.fromCallable(() -> Arrays.asList(arr));
As akarnokd has pointed out, mutating the list value in your flatMap is generally a bad idea, but your immediate problem is that the List implementation returned by Arrays.asList is a fixed-size view of the backing array, so structural modifications such as add throw UnsupportedOperationException.
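If the goal is to emit a list that downstream operators may modify, one minimal sketch (assuming RxJava 2, as in the question) is to copy into an ArrayList when creating the source:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import io.reactivex.Observable;

public class MutableListSource {
    public static void main(String[] args) {
        String[] arr = {"hi", "hello", "bye"};

        // Copy the fixed-size Arrays.asList view into a mutable ArrayList
        Observable<List<String>> observable =
            Observable.fromCallable(() -> new ArrayList<>(Arrays.asList(arr)));

        observable
            .flatMap(list -> {
                list.add("barber"); // supported now, though mutating upstream values is still discouraged
                return Observable.just(list);
            })
            .subscribe(System.out::println);
    }
}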

Scheduling an IEnumerable periodically with .NET Reactive Extensions

Say, for example, I have an enumerable
Dim e = Enumerable.Range(0, 1024)
I'd like to be able to do
Dim o = e.ToObservable(TimeSpan.FromSeconds(1))
so that the observable generates values every second until the enumerable is exhausted. I can't figure out a simple way to do this.
You can use Interval together with Zip to get the desired functionality:
var sequence = Observable.Interval(TimeSpan.FromSeconds(1))
.Zip(e.ToObservable(), (tick, index) => index)
I was also looking for a solution, and after reading the Intro to Rx I made one myself:
There is an Observable.Generate() overload which I used to make my own ToObservable() extension method, taking a TimeSpan as the period:
public static class MyEx {
    public static IObservable<T> ToObservable<T>(this IEnumerable<T> enumerable, TimeSpan period)
    {
        return Observable.Generate(
            enumerable.GetEnumerator(),
            x => x.MoveNext(),
            x => x,
            x => x.Current,
            x => period);
    }

    public static IObservable<T> ToObservable<T>(this IEnumerable<T> enumerable, Func<T, TimeSpan> getPeriod)
    {
        return Observable.Generate(
            enumerable.GetEnumerator(),
            x => x.MoveNext(),
            x => x,
            x => x.Current,
            x => getPeriod(x.Current));
    }
}
Already tested in LINQPad. My only concern is what happens to the enumerator instance after the resulting observable is, for example, disposed. Any corrections appreciated.
You'd need something to schedule notifying observers with each value taken from the Enumerable.
You can use the recursive Schedule overload on an Rx scheduler.
Public Shared Function Schedule ( _
scheduler As IScheduler, _
dueTime As TimeSpan, _
action As Action(Of Action(Of TimeSpan)) _
) As IDisposable
On each scheduled invocation, call enumerator.MoveNext(), then OnNext(enumerator.Current), and finally OnCompleted when MoveNext() returns false. This is pretty much the bare-bones way of doing it.
An alternative way to express your requirement is to restate it as "for a sequence, have a minimum interval between each value".
See this answer. The test case resembles your original question.
You could always do this very simple approach:
dim e = Enumerable.Range(0, 1024)
dim o = e.ToObservable().Do(Sub (x) Thread.Sleep(1000))
When you subscribe to o, each value takes a second to be produced.
I can only assume that you are using Range to dumb down your question.
Do you want every value that the Enumerable pushes to be delayed by a second?
var e = Enumerable.Range(0, 10);
var o = Observable.Interval(TimeSpan.FromSeconds(1))
.Zip(e, (_,i)=>i);
Or do you want only the latest value of the Enumerable to be pushed each second, i.e. reading from an Enumerable that is evaluated as you enumerate it (perhaps some IO)? In that case CombineLatest is more useful than Zip.
Or perhaps you just want a value every second, in which case just use the Observable.Interval method:
var o = Observable.Interval(TimeSpan.FromSeconds(1));
If you explain your problem space, the community will be able to help you better.
Lee
*Excuse the C# answer, but I don't know what the equivalent VB.NET code would be.