Kafka state-store sharing across sub-topologies

I am trying to create a custom joining consumer to join multiple events.
I have created a topology that has four sub-topologies (subtopology-0, subtopology-1, subtopology-2, subtopology-3), not necessarily in the order reported by topology.describe().
I have created a state store in each of three sub-topologies (subtopology-0, subtopology-1, subtopology-2) and am trying to attach all of the state stores created in the different sub-topologies using .connectProcessorAndStateStores("PROCESS2", "COUNTS"), as per the Kafka developer guide https://kafka.apache.org/0110/documentation/streams/developer-guide
Here is the code snippet of how I am creating and attaching processors to the topology.
class StreamCustomizer implements KafkaStreamsInfrastructureCustomizer {
    public void someMethod(StreamsBuilder builder) {
        Topology topology = builder.build();
        topology.addProcessor("Processor1", new Processor() {...}, "state-store-1")
            .addStateStore(store1,..);
        topology.addProcessor("Processor2", new Processor() {...}, "state-store-1")
            .addStateStore(store1,..);
        topology.addProcessor("Processor3", new Processor() {...}, "state-store-1")
            .addStateStore(store1,..);
        topology.addProcessor("Processor4", new Processor4() {...}, "Processor1", "Processor2", "Processor3")
            .connectProcessorAndStateStores("Processor4", "state-store-1", "state-store-2", "state-store-3");
    }
}
This is how the processor is defined for all the sub-topologies described above:
new Processor() {
    private ProcessorContext context;
    private KeyValueStore<K, V> store;

    public void init(ProcessorContext context) {
        this.context = context;
        store = (KeyValueStore<K, V>) context.getStateStore("store-name");
    }
}
This is how Processor4 is written, with all the state stores retrieved from the context in its init method.
new Processor4() {
    private KeyValueStore<K, V> store1;
    private KeyValueStore<K, V> store2;
    private KeyValueStore<K, V> store3;
}
I am observing strange behaviour: with the above code, store1, store2, and store3 are all re-initialized, and none of the keys that were stored in their respective sub-topologies (1, 2, 3) are preserved. However, the same code works, i.e., every state store preserves the key-value pairs stored in its respective sub-topology, when the state stores are declared at class level.
class StreamCustomizer implements KafkaStreamsInfrastructureCustomizer {
    private KeyValueStore<K, V> store1;
    private KeyValueStore<K, V> store2;
    private KeyValueStore<K, V> store3;
}
and then, in the processor implementation, just initialize the state store in the init method.
new Processor() {
    private ProcessorContext context;

    public void init(ProcessorContext context) {
        this.context = context;
        store1 = (KeyValueStore<K, V>) context.getStateStore("store-name-1");
    }
}
Can someone please help me find the reason, or tell me if there is anything wrong in this topology? Also, I have read that a state store can be shared within the same sub-topology.

Hard to say (the code snippets are not really clear); however, if you share state, you effectively merge sub-topologies. Thus, if you do it correctly, you will end up with a single sub-topology containing all your processors.
As long as you see four sub-topologies, the state stores are not shared yet, i.e., not connected correctly.
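To illustrate why sharing a store merges sub-topologies, here is a minimal plain-Java model. It is not the Kafka Streams API: SubTopologyModel and its methods are hypothetical names, and the model only assumes that processors linked through a parent or through a shared store belong to the same group.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Plain-Java model, NOT the Kafka Streams API: processors that are linked,
// directly or through a shared store, end up in the same group.
class SubTopologyModel {
    private final Map<String, String> parent = new HashMap<>();
    private final Map<String, String> firstUserOfStore = new HashMap<>();

    // Register a processor and link it to its upstream processors.
    void addProcessor(String name, String... parents) {
        parent.putIfAbsent(name, name);
        for (String p : parents) {
            union(name, p);
        }
    }

    // Connect a processor to stores; sharing a store links the processors.
    void connectProcessorAndStateStores(String processor, String... stores) {
        parent.putIfAbsent(processor, processor);
        for (String store : stores) {
            String other = firstUserOfStore.putIfAbsent(store, processor);
            if (other != null) {
                union(processor, other);
            }
        }
    }

    // Number of separate groups, i.e. modelled sub-topologies.
    int groupCount() {
        Set<String> roots = new HashSet<>();
        for (String name : parent.keySet()) {
            roots.add(find(name));
        }
        return roots.size();
    }

    private String find(String name) {
        String root = name;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        return root;
    }

    private void union(String a, String b) {
        parent.putIfAbsent(a, a);
        parent.putIfAbsent(b, b);
        parent.put(find(a), find(b));
    }
}
```

In this model, four unrelated processors with one store each form four groups, and the count drops to one as soon as a single processor is connected to all three stores. That matches the check suggested above: if topology.describe() still lists four sub-topologies, the stores are not actually shared.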

Related

KStream.processValues() - getting a null state store from FixedKeyProcessor

I have the following topology, which uses the processValues() method to combine the Streams DSL with the Processor API. I'm adding a state store here.
KStream<String, SecurityCommand> securityCommands =
    builder.stream(
        "security-command",
        Consumed.with(Serdes.String(), JsonSerdes.securityCommand()));
StoreBuilder<KeyValueStore<String, UserAccountSnapshot>> storeBuilder =
    Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("user-account-snapshot"),
        Serdes.String(),
        JsonSerdes.userAccountSnapshot());
builder.addStateStore(storeBuilder);
securityCommands.processValues(() -> new SecurityCommandProcessor(), Named.as("security-command-processor"), "user-account-snapshot")
    .processValues(() -> new UserAccountSnapshotUpdater(), Named.as("user-snapshot-updater"), "user-account-snapshot")
    .to("security-event", Produced.with(
        Serdes.String(),
        JsonSerdes.userAccountEvent()));
The SecurityCommandProcessor code follows:
class SecurityCommandProcessor implements FixedKeyProcessor<String, SecurityCommand, UserAccountEvent> {
    private KeyValueStore<String, UserAccountSnapshot> kvStore;
    private FixedKeyProcessorContext context;

    @Override
    public void init(FixedKeyProcessorContext context) {
        this.kvStore = (KeyValueStore<String, UserAccountSnapshot>) context.getStateStore("user-account-snapshot");
        this.context = context;
    }
    ...
}
The problem is that context.getStateStore("user-account-snapshot") returns null.
I tried doing nearly the same code, by using the obsolete transformValues() and I'm able to get the state store. The problem is with processValues(). Am I doing anything wrong?

The issue is that you're using a lambda instance for the FixedKeyProcessorSupplier. When the processor needs access to a state store, you'll need to override the stores method, which returns null when it's not overridden. The FixedKeyProcessorSupplier extends the ConnectedStoreProvider interface.
So you'll need to provide a concrete instance of the processor supplier.
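The mechanism described here can be shown with a small plain-Java sketch. The interfaces below are hypothetical stand-ins that only mirror the shape of ConnectedStoreProvider and FixedKeyProcessorSupplier (store names replace the real StoreBuilder objects); they are not the Kafka Streams types themselves.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical stand-in for ConnectedStoreProvider.
interface StoreProviderModel {
    // Like ConnectedStoreProvider#stores(): the default returns null.
    default Set<String> stores() {
        return null;
    }
}

// Hypothetical stand-in for a processor supplier: one abstract method,
// so it can be written as a lambda.
interface SupplierModel<T> extends StoreProviderModel {
    T get();
}

// A concrete supplier can override stores(); a lambda cannot.
class SnapshotSupplier implements SupplierModel<String> {
    @Override
    public String get() {
        return "processor";
    }

    @Override
    public Set<String> stores() {
        return Collections.singleton("user-account-snapshot");
    }
}
```

A lambda such as `() -> "processor"` only implements get(), so its stores() call falls through to the default and yields null, whereas new SnapshotSupplier().stores() returns the store name.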
Let me know how it goes.
HTH,
Bill

Axon Framework - Configuring Multiple EventStores in Axon Configuration

We have a use case in which each aggregate root should have a different event store. We use the following configuration, which currently has only one event store configured:
@Configuration
@EnableDiscoveryClient
public class AxonConfig {
    private static final String DOMAIN_EVENTS_COLLECTION_NAME = "coll-capture.domainEvents";
    //private static final String DOMAIN_EVENTS_COLLECTION_NAME_TEST =
    //"coll-capture.domainEvents-test";

    @Value("${mongodb.database}")
    private String databaseName;

    @Value("${spring.application.name}")
    private String appName;

    @Bean
    public RestTemplate restTemplate() {
        CloseableHttpClient httpClient = HttpClientBuilder.create().build();
        HttpComponentsClientHttpRequestFactory clientHttpRequestFactory =
            new HttpComponentsClientHttpRequestFactory(httpClient);
        return new RestTemplate(clientHttpRequestFactory);
    }

    @Bean
    @Profile({"uat", "prod"})
    public CommandRouter springCloudHttpBackupCommandRouter(DiscoveryClient discoveryClient,
            Registration localInstance,
            RestTemplate restTemplate,
            @Value("${axon.distributed.spring-cloud.fallback-url}") String messageRoutingInformationEndpoint) {
        return new SpringCloudHttpBackupCommandRouter(discoveryClient,
            localInstance,
            new AnnotationRoutingStrategy(),
            serviceInstance -> appName.equalsIgnoreCase(serviceInstance.getServiceId()),
            restTemplate,
            messageRoutingInformationEndpoint);
    }

    @Bean
    public Repository<TestEnquiry> testEnquiryRepository(EventStore eventStore) {
        return new EventSourcingRepository<>(TestEnquiry.class, eventStore);
    }

    @Bean
    public Repository<Test2Enquiry> test2enquiryRepository(EventStore eventStore) {
        return new EventSourcingRepository<>(Test2Enquiry.class, eventStore);
    }

    @Bean
    public EventStorageEngine eventStorageEngine(MongoClient client) {
        MongoTemplate mongoTemplate = new DefaultMongoTemplate(client, databaseName)
            .withDomainEventsCollection(DOMAIN_EVENTS_COLLECTION_NAME);
        return new MongoEventStorageEngine(mongoTemplate);
    }
}
Now we want to configure "DOMAIN_EVENTS_COLLECTION_NAME_TEST" (just as an example) as well in the EventStorageEngine. How can we achieve this support for multiple event stores and select which collection each tracking processor should be part of?

If you are going the route of segregating the event streams, then combining them from an event handling perspective could become a necessity indeed. Especially when having several bounded contexts, segregating the event streams into distinct storage solutions is reasonable.
If you want to define which [message source / event store] is used by a TrackingEventProcessor, you will have to deal with the EventProcessingConfigurer. More specifically, you should invoke the EventProcessingConfigurer#registerTrackingEventProcessor(String, Function<Configuration, StreamableMessageSource<TrackedEventMessage<?>>>) method. The first String parameter is the name of the processor you want to configure as being "tracking". The second parameter defines a Function which gives you the message source to be used by this TrackingEventProcessor (TEP). It is here where you should provide the event store you want this TEP to ingest events from.
Pairing them up at a later stage could also occur of course, which is also supported by Axon Framework. This boils down to a specific form of StreamableMessageSource implementation.
More specifically, you can use the MultiStreamableMessageSource, where you can connect any number of StreamableMessageSources together.
Note that Axon's EmbeddedEventStore is in essence an implementation of a StreamableMessageSource. Once you have built the MultiStreamableMessageSource, you will of course have to specify it as the messageSource for your TrackingEventProcessors.
As a last note, know that this solution can only be used with TrackingEventProcessors, as those are the only Event Processors provided by Axon that ingest a StreamableMessageSource as the source for their events.
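To give a feel for what combining several sources into one stream means, here is a plain-Java model. It does not use Axon: MergedEventSourceModel and Event are hypothetical names, and ordering the combined stream by timestamp (oldest first) is an assumption made for this sketch, not a statement about how MultiStreamableMessageSource is implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java model, NOT Axon's MultiStreamableMessageSource.
class MergedEventSourceModel {
    // Hypothetical event: a timestamp plus the name of the store it came from.
    static final class Event {
        final long timestamp;
        final String source;

        Event(long timestamp, String source) {
            this.timestamp = timestamp;
            this.source = source;
        }
    }

    // Combine the events of several sources into one list, oldest first.
    // The sort is stable, so events with equal timestamps keep source order.
    static List<Event> merge(List<List<Event>> sources) {
        List<Event> merged = new ArrayList<>();
        for (List<Event> source : sources) {
            merged.addAll(source);
        }
        merged.sort(Comparator.comparingLong((Event e) -> e.timestamp));
        return merged;
    }
}
```

A single consumer reading the merged list sees events from both stores interleaved, which is the role the combined message source plays for a TrackingEventProcessor.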

Detecting abandoned processes in Kafka Streams 2.0

I have this case: users collect orders as order lines. I implemented this with a Kafka topic containing events with order changes; they are merged, stored in a local key-value store, and broadcast to a second topic with order versions.
I need to somehow react to abandoned orders: ones that were started but have had no change for at least the last x hours.
A simple solution could be to scan the local store every y minutes and post an event changing the order status to Abandoned. It seems I cannot access the store from outside a processor... But that is also not very elegant coding. Any suggestions are welcome.
--edit
I cannot just add punctuation to the merge/validation transformer, because its output is different and should be routed elsewhere, like on this image (single Kafka Streams app):
so the "abandoned orders processor/transformer" will be a no-op for its input (the only trigger here is time). Another thing is that in such a case (as on the image) my transformer gets a ForwardingDisabledProcessorContext upon initialization, so I cannot emit any messages in the punctuator. I could just pass the kafkaTemplate bean there and produce new messages, but then the whole processor/transformer is just an empty shell whose only purpose is to access the local store...
This is the snippet of code I used:
public class AbandonedOrdersTransformer implements ValueTransformer<OrderEvent, OrderEvent> {
    private ProcessorContext context;
    private KeyValueStore<String, Order> stateStore;

    @Override
    public void init(ProcessorContext processorContext) {
        this.context = processorContext;
        stateStore = (KeyValueStore) processorContext.getStateStore(KafkaConfig.OPENED_ORDERS_STORE);
        //main scheduler
        this.context.schedule(TimeUnit.MINUTES.toMillis(5), PunctuationType.WALL_CLOCK_TIME, (timestamp) -> {
            KeyValueIterator<String, Order> iter = this.stateStore.all();
            while (iter.hasNext()) {
                KeyValue<String, Order> entry = iter.next();
                if (OrderStatuses.NEW.equals(entry.value.getStatus()) &&
                        (timestamp - entry.value.getLastChanged().getTime()) > TimeUnit.HOURS.toMillis(4)) {
                    //SEND ABANDON EVENT "event"
                    context.forward(entry.key, event);
                }
            }
            iter.close();
            context.commit();
        });
    }

    @Override
    public OrderEvent transform(OrderEvent orderEvent) {
        //do nothing
        return null;
    }

    @Override
    public void close() {
        //do nothing
    }
}
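The scan itself, separated from Kafka Streams, can be sketched in plain Java. This is only an illustration of the "NEW and unchanged for more than x hours" check: AbandonedOrderScan and OrderSnapshot are hypothetical names, and a Map stands in for the local key-value store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the scan only; a Map stands in for the local store.
class AbandonedOrderScan {
    // Hypothetical order snapshot: a status and the time of the last change.
    static final class OrderSnapshot {
        final String status;
        final long lastChangedMillis;

        OrderSnapshot(String status, long lastChangedMillis) {
            this.status = status;
            this.lastChangedMillis = lastChangedMillis;
        }
    }

    // Returns the keys of NEW orders with no change for longer than maxIdleMillis.
    static List<String> findAbandoned(Map<String, OrderSnapshot> store,
                                      long nowMillis, long maxIdleMillis) {
        List<String> abandoned = new ArrayList<>();
        for (Map.Entry<String, OrderSnapshot> entry : store.entrySet()) {
            OrderSnapshot order = entry.getValue();
            boolean idleTooLong = nowMillis - order.lastChangedMillis > maxIdleMillis;
            if ("NEW".equals(order.status) && idleTooLong) {
                abandoned.add(entry.getKey());
            }
        }
        return abandoned;
    }
}
```

Keeping the check in a function like this makes it easy to unit-test with fixed timestamps, independently of where the periodic trigger ends up living.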

Hazelcast is calling loadAll() TWICE for the same key

As far as I'm aware, when using a MapLoader, Hazelcast calls loadAllKeys() once on the node which owns the partition which owns the map's name.
loadAll(Collection<Long> keys) is then only called on partitions which own a given key retrieved from loadAllKeys(). After this, the values are distributed as-needed around the cluster.
I'm performing a basic test with one node, one map and one record in my persistent store.
What I'm finding is loadAllKeys() is correctly called once, however, loadAll(Collection<Long> keys) is called twice. Why is this the case?
My implementation of loadAll(Collection<Long> keys) is as follows:
@Override
public synchronized Map<Long, MyCacheEntry> loadAll(Collection<Long> keys) {
    return myCacheRepository.loadMyCacheDataForKeys(keys)
        .stream()
        .collect(Collectors.toMap(MyCacheEntity::getId,
            entity -> gson.fromJson(entity.getValue(), MyCacheEntry.class)
        ));
}
This means that I am doing two lookups to my persistent storage instead of one. Seeing as though I have one record in my database I would expect loadAll(Collection<Long> keys) to only be called once.
What is happening here?
My crude, test Hazelcast/Spring configuration is as follows:
@Configuration
public class HazelcastConfiguration {
    private final MyMapStore myMapStore;

    @Inject
    HazelcastConfiguration(@Lazy MyMapStore myMapStore) {
        this.myMapStore = myMapStore;
    }

    @PreDestroy
    public void shutdown() {
        Hazelcast.shutdownAll();
    }

    @Bean
    public HazelcastInstance hazelcastInstance() {
        Config config = new Config();
        config.getGroupConfig().setName("MyGroup");
        NetworkConfig networkConfig = config.getNetworkConfig();
        networkConfig.setPortAutoIncrement(false);
        JoinConfig joinConfig = networkConfig.getJoin();
        joinConfig.getMulticastConfig().setEnabled(false);
        joinConfig.getTcpIpConfig().setEnabled(true).setMembers(Collections.singletonList("127.0.0.1"));
        MapConfig mapConfig = new MapConfig("MyMap");
        mapConfig.setBackupCount(1);
        mapConfig.setAsyncBackupCount(1);
        mapConfig.setInMemoryFormat(InMemoryFormat.OBJECT);
        mapConfig.setTimeToLiveSeconds(0);
        EntryListenerConfig entryListenerConfig = new EntryListenerConfig();
        entryListenerConfig.setImplementation(new MyCacheEntryListener());
        mapConfig.addEntryListenerConfig(entryListenerConfig);
        MapStoreConfig mapStoreConfig = new MapStoreConfig();
        mapStoreConfig.setInitialLoadMode(MapStoreConfig.InitialLoadMode.EAGER);
        mapStoreConfig.setWriteDelaySeconds(1);
        mapStoreConfig.setImplementation(myMapStore);
        mapConfig.setMapStoreConfig(mapStoreConfig);
        config.addMapConfig(mapConfig);
        return Hazelcast.newHazelcastInstance(config);
    }
}

How do I concurrently process Reactor Kafka Streams by Topic and Partition with Auto Acknowledgement?

I am trying to achieve concurrent processing of Kafka Topic-Partitions using Reactor Kafka with auto-acknowledgement. The documentation here makes it seem like this is possible:
http://projectreactor.io/docs/kafka/milestone/reference/#concurrent-ordered
The only difference between that and what I am attempting is I am using auto-acknowledgement.
I have the following code (relevant method is receiveAuto):
public class KafkaFluxFactory<K, V> {
    private final Map<String, Object> properties;

    public KafkaFluxFactory(Map<String, Object> properties) {
        this.properties = properties;
    }

    public Flux<ConsumerRecord<K, V>> receiveAuto(Collection<String> topics, Scheduler scheduler) {
        return KafkaReceiver.create(ReceiverOptions.create(properties).subscription(topics))
            .receiveAutoAck()
            .flatMap(flux -> flux.groupBy(this::extractTopicPartition))
            .flatMap(topicPartitionFlux -> topicPartitionFlux.publishOn(scheduler));
    }

    private TopicPartition extractTopicPartition(ConsumerRecord<K, V> record) {
        return new TopicPartition(record.topic(), record.partition());
    }
}
When I use this to create a Flux of Consumer Records from Kafka with a parallel Scheduler (Schedulers.newParallel("debug", 10)), I see that they all end up getting processed on the same Thread.
Any thoughts on what I may be doing wrong?

After quite a bit of trial-and-error plus some rethinking of what I want to accomplish I realized I was trying to solve two problems in one bit of code.
The two things I need are:
In-order processing of Kafka Partitions
Ability to parallelize the processing of each partition
In trying to solve both with this piece of code, I was limiting downstream users' abilities to configure the level of parallelization. I therefore changed the method to return a Flux of GroupedFluxes which provides downstream users with the correct granularity of determining what is parallelizable:
public Flux<GroupedFlux<TopicPartition, ConsumerRecord<K, V>>> receiveAuto(Collection<String> topics) {
    return KafkaReceiver.create(createReceiverOptions(topics))
        .receiveAutoAck()
        .flatMap(flux -> flux.groupBy(this::extractTopicPartition));
}
Downstream, users are able to parallelize each emitted GroupedFlux using whatever Scheduler they wish:
public <V> void work(Flux<GroupedFlux<TopicPartition, V>> flux) {
    flux.doOnNext(groupPublisher -> groupPublisher
            .publishOn(Schedulers.elastic())
            .subscribe(this::doWork))
        .subscribe();
}
This has the desired behavior of processing each TopicPartition GroupedFlux in order and in parallel with the other GroupedFluxes.
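The guarantee being described (in order within a partition, in parallel across partitions) can also be modelled without Reactor. The sketch below is a plain java.util.concurrent stand-in, not Reactor Kafka code: PerPartitionDispatcher is a hypothetical name, and it uses one single-threaded executor per partition instead of a GroupedFlux per TopicPartition.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Plain-Java model, NOT Reactor: one single-threaded executor per partition.
class PerPartitionDispatcher<R> {
    private final Map<Integer, ExecutorService> workers = new ConcurrentHashMap<>();
    private final Consumer<R> handler;

    PerPartitionDispatcher(Consumer<R> handler) {
        this.handler = handler;
    }

    // Records of one partition run in submission order on that partition's
    // thread; different partitions run in parallel with each other.
    void dispatch(int partition, R record) {
        workers.computeIfAbsent(partition, p -> Executors.newSingleThreadExecutor())
               .execute(() -> handler.accept(record));
    }

    // Stop accepting work and wait for the queued records to finish.
    void shutdownAndAwait(long timeoutSeconds) {
        for (ExecutorService worker : workers.values()) {
            worker.shutdown();
        }
        try {
            for (ExecutorService worker : workers.values()) {
                worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can react to it.
            Thread.currentThread().interrupt();
        }
    }
}
```

The design choice is the same as in the Reactor version: the unit of ordering is the partition, so anything finer than one worker per partition would give up the in-order guarantee.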

I guess it executes sequentially, at least in your consumer. To consume in parallel, you should convert your Flux to a ParallelFlux:
public ParallelFlux<ConsumerRecord<K, V>> receiveAuto(Collection<String> topics, Scheduler scheduler) {
    return KafkaReceiver.create(ReceiverOptions.create(properties).subscription(topics))
        .receiveAutoAck()
        .flatMap(flux -> flux.groupBy(this::extractTopicPartition))
        .flatMap(topicPartitionFlux -> topicPartitionFlux.parallel().runOn(Schedulers.parallel()));
}
Then, in your consumer function, if you want to consume in parallel, you should use a method such as:
void subscribe(Consumer<? super T> onNext, Consumer<? super Throwable> onError, Runnable onComplete, Consumer<? super Subscription> onSubscribe)
or any other overload that takes a Consumer<? super T> onNext argument.
If you just use the method below, you will consume the flux sequentially:
void subscribe(Subscriber<? super T> s)
