I have an Akka stream that processes some messages. When an event occurs, the stream should create a new instance of a different Akka stream.
At the moment this is what I am doing. Is this the best way?
if (event.happened) new AnalysisFlow(info.id, info.time).flow
Thanks
If the event is part of the stream, you might be able to use groupBy to split your stream into substreams.
You can also use flatMapConcat or flatMapMerge to transform your stream of elements into a stream of Sources, which are then run and flattened using a concat or merge strategy, respectively.
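For example, here is a minimal sketch of the flatMapConcat approach; the Event type and the body of the sub-source are hypothetical stand-ins for your AnalysisFlow:

import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

object FlatMapConcatExample extends App {
  implicit val system: ActorSystem = ActorSystem("example")

  // Hypothetical event type: `happened` decides whether a sub-stream is spawned.
  case class Event(happened: Boolean, id: String)

  Source(List(Event(true, "a"), Event(false, "b"), Event(true, "c")))
    // For each event, build a new Source (standing in for
    // new AnalysisFlow(info.id, info.time).flow) and flatten its output
    // back into the outer stream, one sub-stream at a time.
    .flatMapConcat { event =>
      if (event.happened) Source.single(s"analysis of ${event.id}")
      else Source.empty
    }
    .runWith(Sink.foreach(println))
}

With flatMapMerge(breadth, f) instead, up to breadth sub-streams run concurrently and their outputs are merged as they arrive.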
I need to model a series of LINEAR Akka Streams pipelines that together form a one-producer, N-consumers system.
As a quick reference, you can imagine a single producer that publishes messages to a Kafka topic, and N consumers that consume from that topic (each with its own consumer group). In my specific case, however, this all has to be handled in memory, within the same process.
Since each linear stream is handled independently, I cannot use the standard tools provided by Akka Streams, such as the GraphDSL Broadcast operator. Each linear stream has to have its own Source and Sink.
Moreover, those linear streams are dynamically constructed at runtime, meaning I have to provide some reusable Source and Sink implementation.
I tried to use the actor interop stages (ActorSink.actorRefWithBackpressure() for the producer and N ActorSource.actorRef() for the consumers), but they don't model my case, as I cannot access the full materialized value of the stream, i.e. the source actor ref.
What I would need is something with the same semantics as the Kafka Source and Sink, but backed by a fully in-memory data structure. Is there anything (maybe in Alpakka) that I could use for this? Otherwise, what would be the correct approach?
BroadcastHub is probably the closest thing to what you're describing. It includes a built-in buffer to allow attached subscribers to fall behind by up to the size of the buffer.
Note that any "time travel" semantics (consumers receiving messages produced before the consuming stream materialized) are going to be limited or non-existent (I'm not sure which) compared to Kafka.
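A minimal sketch of what that can look like; the element type, buffer size, and consumers are illustrative:

import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{BroadcastHub, Keep, Sink, Source}

object BroadcastHubExample extends App {
  implicit val system: ActorSystem = ActorSystem("hub")

  // Materialize the producer once; the materialized value is a Source that
  // can be attached to any number of independent consumer streams.
  val producer: Source[Int, NotUsed] =
    Source(1 to 100)
      .toMat(BroadcastHub.sink(bufferSize = 256))(Keep.right)
      .run()

  // Each attached stream is an independent linear stream with its own Sink,
  // roughly analogous to a consumer group. Consumers that attach late only
  // see elements still in (or produced after) the hub's buffer.
  producer.runWith(Sink.foreach(i => println(s"consumer 1: $i")))
  producer.runWith(Sink.foreach(i => println(s"consumer 2: $i")))
}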
I want to build a graph in Akka Streams where the source is a Kafka topic (topic_a) and the sink is always topic_b and, depending on the message's data, also topic_c (the message sent to topic_c will be different from the one sent to topic_b).
Is there any way to achieve this in Akka Streams? Thanks!
You need a graph like this one:
topicASource ~> broadcast ~> topicBSink
                broadcast ~> filterFlow ~> topicCSink
It can be created easily using the GraphDSL or the simplified API; a sketch with the simplified API is shown below.
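In the simplified API, alsoTo plays the role of the Broadcast stage. The sources, sinks, and routing predicate below are illustrative stand-ins for the real Kafka stages (e.g. from Alpakka Kafka):

import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

object FanOutExample extends App {
  implicit val system: ActorSystem = ActorSystem("fanout")

  // Stand-ins for the Kafka source and sinks.
  val topicASource = Source(List("keep me", "drop me", "keep me too"))
  val topicBSink   = Sink.foreach[String](m => println(s"topic_b: $m"))
  val topicCSink   = Sink.foreach[String](m => println(s"topic_c: $m"))

  topicASource
    .alsoTo(topicBSink)               // every element goes to topic_b
    .filter(_.startsWith("keep"))     // hypothetical routing condition
    .map(m => s"transformed: $m")     // topic_c gets a different message
    .to(topicCSink)
    .run()
}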
I am using the Kafka Processor API to do some custom calculations. Because of the complex processing involved, the DSL was not the best fit. The stream code looks like the one below.
KeyValueBytesStoreSupplier storeSupplier = Stores.persistentKeyValueStore("storeName");
StoreBuilder<KeyValueStore<String, StoreObject>> storeBuilder = Stores.keyValueStoreBuilder(storeSupplier,
        Serdes.String(), storeObjectSerde);
topology.addSource("SourceReadername", stringDeserializer, sourceSerde.deserializer(), "sourceTopic")
        .addProcessor("processor", () -> new CustomProcessor("storeName"), "SourceReadername")
        .addStateStore(storeBuilder, "processor") // make the store available to the processor
        .addSink("sinkName", "outputTopic", stringSerializer, resultSerde.serializer(),
                "processor");
I need to clear some items from the state store based on events arriving on a separate topic. I cannot find the right way to join with another stream using the Processor API, or some other way to listen for events on another topic, in order to trigger the cleanup code in the CustomProcessor class.
Is there a way to receive events from another topic in the Processor API? Or perhaps to mix the DSL with the Processor API so that events from either topic reach the process method, letting me run the cleanup code whenever an event arrives on the cleanup topic?
Thanks
You just need to add another input topic (addSource) and a processor (addProcessor) that handles the messages from that topic and, based on them, removes entries from the state store; a sketch is shown below. One note: both topics should use the same keys (because of partitioning).
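A rough sketch, assuming the processor.api introduced in Kafka 2.7; the topic, processor, and store names are hypothetical, and the store's value type is simplified to String where your code uses StoreObject:

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.Topology
import org.apache.kafka.streams.processor.api.{Processor, ProcessorContext, ProcessorSupplier, Record}
import org.apache.kafka.streams.state.KeyValueStore

// Deletes the store entry for every key that arrives on the cleanup topic.
class CleanupProcessor(storeName: String) extends Processor[String, String, Void, Void] {
  private var store: KeyValueStore[String, String] = _

  override def init(context: ProcessorContext[Void, Void]): Unit =
    store = context.getStateStore(storeName)

  override def process(record: Record[String, String]): Unit =
    store.delete(record.key) // remove the entry for this key
}

object CleanupWiring {
  def addCleanup(topology: Topology): Topology = {
    val stringDeser = Serdes.String().deserializer()
    topology
      // Second input topic carrying the cleanup events.
      .addSource("cleanupSource", stringDeser, stringDeser, "cleanupTopic")
      .addProcessor(
        "cleanupProcessor",
        new ProcessorSupplier[String, String, Void, Void] {
          override def get(): Processor[String, String, Void, Void] =
            new CleanupProcessor("storeName")
        },
        "cleanupSource"
      )
      // Give the new processor access to the store that "processor" already
      // owns; both topics must be co-partitioned on the same keys.
      .connectProcessorAndStateStores("cleanupProcessor", "storeName")
  }
}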
In Scala, I'm using org.apache.kafka.streams.KafkaStreams and am able to read from an input stream, do some computation, and send the result to an output stream. Is there a way, via branch or filter, to take the resulting record from the input stream and send it to two output streams?
branch does exactly what you want. It returns an array of KStreams, which you can send individually, via to(), to two different topics; see the sketch below.
If you want to send the same stream to two topics, use through followed by to.
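For example, with the Scala DSL (topic names and predicates are illustrative; the imports assume Kafka Streams 2.7+, and branch is deprecated in newer versions in favor of split):

import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.kstream.KStream
import org.apache.kafka.streams.scala.serialization.Serdes._

object BranchExample {
  val builder = new StreamsBuilder()
  val input: KStream[String, String] = builder.stream[String, String]("input-topic")

  // branch returns one KStream per predicate; each record goes to the
  // first branch whose predicate matches.
  val branches = input.branch(
    (_, value) => value.startsWith("a"), // branches(0)
    (_, _) => true                       // branches(1): everything else
  )
  branches(0).to("topic-a")
  branches(1).to("topic-b")
}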
I have a Kafka topic / stream that sometimes receives duplicates of events. How can I deduplicate the stream in KSQL?
De-duplicating a stream is not currently possible in raw KSQL. You might be able to write a UDF for this.
Note that a table will only store the latest update (message) for a given key. Depending on your use case, that could be helpful.