Connect one observable to another existing one - system.reactive

I have two observables: one that produces elements, and another that is a Subject that subscribers are subscribed to. I would like subscribers to get whatever comes through the first one. One way is to do the following:
first.Subscribe(a => second.OnNext(a));
I would prefer to Subscribe only at the final end, not to Subscribe just to pipe one observable into the other. I am curious whether there is any Rx extension that allows something like that before I go and write my own.

This works for me:
var subject = new Subject<long>();
subject.Subscribe(x => Console.WriteLine(x));
var observable = Observable.Interval(TimeSpan.FromSeconds(1.0));
observable.Subscribe(subject);
The Console.WriteLine observer receives the series of values produced by the Interval observable. This works because Subject<T> implements both IObserver<T> and IObservable<T>, so it can be passed directly to Subscribe.

Related

RxJS Observable not returning data

The following code, when subscribed to, returns an expected array of objects.
this.store.select(this.selectors.evidenceSelector);
The objects include a 'subjectId' field.
This code, based on multiple examples on the web, returns nothing when subscribed to:
this.store
  .select(this.selectors.evidenceSelector)
  .pipe(
    groupBy(ev => ev['subjectId']),
    mergeMap(group$ => group$.pipe(toArray())),
  );
The subscription never gets triggered...
Any suggestions?
Oh after reading again... I think I've figured out what's going on here.
toArray only emits a value when the source stream completes. Since you're listening to your store (which will never complete), you'll never get anything out of the toArray. You have to use something like scan if you want to accumulate and emit a result on every new emission.
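A rough sketch of the scan approach, assuming the selector emits evidence items carrying a subjectId (the accumulator shape and the console.log subscriber are illustrative assumptions, not your code):
import { scan } from 'rxjs/operators';

this.store
  .select(this.selectors.evidenceSelector)
  .pipe(
    // re-emit the accumulated grouping on every store emission,
    // instead of waiting for the completion that toArray needs
    scan((groups, ev) => ({
      ...groups,
      [ev['subjectId']]: [...(groups[ev['subjectId']] || []), ev],
    }), {} as { [subjectId: string]: any[] })
  )
  .subscribe(groups => console.log(groups));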

Keeping entries in a state store just for a defined time

Problem: I need to find out who sent a message in the last e.g. 24 hours. I have the following stream and state store for lookups.
@SendTo(Bindings.MESSAGE_STORE)
@StreamListener(Bindings.MO)
public KStream<?, ?> groupBySender(KStream<String, Message> messages) {
    return messages.selectKey((key, message) -> message.from)
            .map((k, v) -> new KeyValue<>(k, v.sentAt.toString()))
            .groupByKey()
            .reduce((oldTimestamp, newTimestamp) -> newTimestamp,
                    Materialized.as(AggregatorApplication.MESSAGE_STORE))
            .toStream();
}
It works fine:
[
"key=123 value=2019-06-21T13:29:05.509Z",
"key=from value=2019-06-21T13:29:05.509Z",
]
so a lookup looks like:
store.get(from);
but I would like to remove entries older than 24 hours from the store automatically; currently they will probably be persisted forever.
Is there a better way to do this, maybe some windowing operation or the like?
At the moment, KTables (which are basically key-value stores) don't support a TTL (cf. https://issues.apache.org/jira/browse/KAFKA-4212).
The current recommendation is to use a windowed store if you want to expire data. You might want to use a custom .transform() instead of windowedBy().reduce() to get more flexibility (cf. https://docs.confluent.io/current/streams/developer-guide/processor-api.html).
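A rough sketch of the windowed variant, adapted from the snippet above (the 24-hour window, the key unwrapping, and the window-store lookup are assumptions, and the usual Kafka Streams imports such as TimeWindows, Duration and WindowStore are omitted):
@SendTo(Bindings.MESSAGE_STORE)
@StreamListener(Bindings.MO)
public KStream<?, ?> groupBySender(KStream<String, Message> messages) {
    return messages.selectKey((key, message) -> message.from)
            .map((k, v) -> new KeyValue<>(k, v.sentAt.toString()))
            .groupByKey()
            // bucket each sender's latest timestamp into 24h windows;
            // windows older than the store's retention period are dropped
            .windowedBy(TimeWindows.of(Duration.ofHours(24)))
            .reduce((oldTimestamp, newTimestamp) -> newTimestamp,
                    Materialized.<String, String, WindowStore<Bytes, byte[]>>as(
                            AggregatorApplication.MESSAGE_STORE))
            .toStream()
            // unwrap the windowed key so downstream still sees the sender id
            .map((windowedKey, value) -> new KeyValue<>(windowedKey.key(), value));
}
Look-ups then go through a ReadOnlyWindowStore instead of a key-value store, e.g. store.fetch(from, Instant.now().minus(Duration.ofHours(24)), Instant.now()).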

Can the subflows of groupBy depend on the keys they were generated from?

I have a flow with data associated to users. I also have a state for each user, that I can get asynchronously from DB.
I want to separate my flow with one subflow per user, and load the state for each user when materializing the subflow, so that the elements of the subflow can be treated with respect to this state.
If I don't want to merge the subflows downstream, I can do something with groupBy and Sink.lazyInit:
def getState(userId: UserId): Future[UserState] = ...
def getUserId(element: Element): UserId = ...
def treatUser(state: UserState): Sink[Element, _] = ...

val treatByUser: Sink[Element, _] = Flow[Element].groupBy(
  Int.MaxValue,
  getUserId
).to(
  Sink.lazyInit(
    elt => getState(getUserId(elt)).map(treatUser),
    ??? // this is never called, since the subflow is created only when an element comes
  )
)
However, this does not work if treatUser becomes a Flow, since there is no equivalent for Sink.lazyInit.
Since subflows of groupBy are materialized only when a new element is pushed, it should be possible to use this element to materialize the subflow, but I wasn't able to adapt the source code for groupBy so that this works consistently. Likewise, Sink.lazyInit doesn't seem to be easily translatable to the Flow case.
Any idea how to solve this issue?
The relevant Akka issue you have to look at is #20129: add Sink.dynamic and Flow.dynamic.
In the associated PR #20579 they actually implemented the LazySink functionality.
They are planning to do LazyFlow next:
Will do next lazyFlow with similar signature.
Unfortunately you have to wait for that functionality to be implemented in Akka or write it yourself (then consider a PR to Akka).
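In the meantime, a hedged workaround is to peek at the first element of each subflow with prefixAndTail(1) and build the per-user flow from the state loaded for that element. This is a sketch only: treatUserFlow, the Output type and the final mergeSubstreams are assumptions, and on Akka 2.6+ Source.fromFuture becomes Source.future.
def treatUserFlow(state: UserState): Flow[Element, Output, _] = ???

val treatByUser: Flow[Element, Output, NotUsed] =
  Flow[Element]
    .groupBy(Int.MaxValue, getUserId)
    // grab the first element of the subflow so we know which user it belongs to
    .prefixAndTail(1)
    .flatMapConcat { case (head, tail) =>
      // load the state for this user, then run the buffered head plus the
      // rest of the subflow through the state-aware flow
      Source.fromFuture(getState(getUserId(head.head)))
        .flatMapConcat(state => (Source(head) ++ tail).via(treatUserFlow(state)))
    }
    .mergeSubstreams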

Spark: How to structure a series of side effect actions inside mapping transformation to avoid repetition?

I have a Spark Streaming application that needs to take these steps:
Take a string, apply some map transformations to it
Map again: If this string (now an array) has a specific value in it, immediately send an email (or do something OUTSIDE the spark environment)
collect() and save in a specific directory
apply some other transformation/enrichment
collect() and save in another directory.
As you can see, this implies two lazily activated computations, which perform the OUTSIDE action twice. I am trying to avoid caching, as at some hundreds of lines per second this would kill my server.
I am also trying to maintain the order of operations, though this is not as important: is there a solution I do not know of?
EDIT: my program as of now:
kafkaStream;
lines = take the value, discard the topic;
lines.foreachRDD {
    splittedRDD = arg.map { split the string };
    assRDD = splittedRDD.map { associate to a table };
    flaggedRDD = assRDD.map { add a boolean parameter under an if condition + send mail };
    externalClass.saveStaticMethod( flaggedRDD.collect() and save in file );
    enrichRDD = flaggedRDD.map { enrich with external data };
    externalClass.saveStaticMethod( enrichRDD.collect() and save in file );
}
I put the saving part after the email so that if something goes wrong with it at least the mail has been sent.
In the end, the methods I found were these:
In the DStream transformation before the side-effecting one, make a copy of the DStream: one copy goes on with the transformation, the other gets the .foreachRDD{ outside action }. There is no major downside to this, as it is just one more RDD on a worker node (see the sketch after this list).
Extracting the { outside action } from the transformation and mapping over the already-sent mails: filter if the mail has already been sent. This is almost a superfluous operation, as it will filter out all of the RDD elements.
Caching before going on (although I was trying to avoid it, there was not much else to do).
If you are trying to avoid caching, solution 1 is the way to go.
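A rough sketch of solution 1, following the pseudocode above (split, associateToTable, addFlag, sendMail, saveToFile and the flagged field are placeholders for your own methods, not a tested implementation):
lines.foreachRDD { rdd =>
  val splitted = rdd.map(split)
  val assigned = splitted.map(associateToTable)
  // the boolean flag is still added here, but no mail is sent inside map any more
  val flagged  = assigned.map(addFlag)

  // side-effect branch: the mail is sent exactly once per batch, only here
  flagged.filter(_.flagged).collect().foreach(sendMail)

  // main branch: save, enrich, save again (flagged is recomputed unless cached,
  // but the OUTSIDE action no longer rides along with that recomputation)
  saveToFile(flagged.collect())
  val enriched = flagged.map(enrichWithExternalData)
  saveToFile(enriched.collect())
}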

Are Rx extensions suitable for reading a file and storing to a database

I have a really long Excel file which I read using EPPlus. For each line I test whether it meets certain criteria, and if so I add the line (an object representing the line) to a collection. When the file has been read, I store those objects in the database. Would it be possible to do both things at the same time? My idea is to have a collection of objects that would somehow be consumed by a thread that saves the objects to the DB. At the same time the Excel reader method would populate the collection... Could this be done using Rx, or is there a better method?
Thanks.
An alternate answer, based on comments to my first:
Create a function returning an IEnumerable<Records> from EPPlus/Xls using yield return,
then convert the sequence to an observable on the thread pool, and you've got the Rx equivalent of a producer/consumer with a BlockingCollection.
static IEnumerable<Records> epplusRecords()
{
    while (...)
        yield return nextRecord;
}

var myRecords = epplusRecords()
    .ToObservable(Scheduler.ThreadPool)
    .Where(rec => meetsCriteria(rec))
    .Select(rec => newShape(rec))
    .Do(newRec => writeToDb(newRec))
    .ToArray();
Your case seems to be one of pulling data (IEnumerable) rather than pushing data (IObservable/Rx).
Hence I would suggest LINQ to Objects as the way to model the solution.
Something like the code shown below:
public static IEnumerable<Records> ReadRecords(string excelFile)
{
    // Read from the Excel file and yield values
}

// use LINQ operators to do the filtering
var filtered = ReadRecords("fileName").Where(r => /* your condition */);
foreach (var r in filtered)
    WriteToDb(r);
NOTE: Using IEnumerable you don't create intermediate collections in this case, and the whole process looks like a pipeline.
It doesn't seem like a good fit, as there's no inherent concurrency, timing, or eventing in the use case.
That said, it may be a use case for PLINQ, if EPPlus supports concurrent reads. Something like:
epplusRecords()
    .AsParallel()
    .Where(rec => meetsCriteria(rec))
    .Select(rec => newShape(rec))
    .ForAll(newRec => writeToDb(newRec)); // PLINQ has no Do; ForAll runs the side effect in parallel