Kafka Streams persistent store error: the state store, may have migrated to another instance - apache-kafka

I am using Kafka Streams with Spring Boot. In my use case, when I receive a customer event from another microservice I need to store it in a customer materialized view, and when I receive an order event I need to join the customer and the order and store the result in a customer-order materialized view. To achieve this I created a persistent key-value store customer-store and update it when a new event arrives.
StoreBuilder<KeyValueStore<String, Customer>> customerStateStore =
        Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("customer"), Serdes.String(), customerSerde)
                .withLoggingEnabled(new HashMap<>());
streamsBuilder.addStateStore(customerStateStore);
KTable<String, Customer> customerKTable = streamsBuilder.table("customer", Consumed.with(Serdes.String(), customerSerde));
customerKTable.foreach((key, value) -> System.out.println("Customer from Topic: " + value));
I configured the topology and the streams object and started it. When I try to access the store through a ReadOnlyKeyValueStore, I get the following exception, even though I stored some objects a few moments ago:
streams.start();
ReadOnlyKeyValueStore<String, Customer> customerStore = streams.store("customer", QueryableStoreTypes.keyValueStore());
System.out.println("customerStore.approximateNumEntries()-> " + customerStore.approximateNumEntries());
The code is uploaded to GitHub for reference. I appreciate your help.
Exception:
org.apache.kafka.streams.errors.InvalidStateStoreException: the state store, customer, may have migrated to another instance.
at org.apache.kafka.streams.state.internals.QueryableStoreProvider.getStore(QueryableStoreProvider.java:60)
at org.apache.kafka.streams.KafkaStreams.store(KafkaStreams.java:1043)
at com.kafkastream.service.EventsListener.main(EventsListener.java:94)

The state store usually needs some time before it becomes queryable. The simplest approach is to retry until the store is available, as shown below (code from the official documentation).
public static <T> T waitUntilStoreIsQueryable(final String storeName,
                                              final QueryableStoreType<T> queryableStoreType,
                                              final KafkaStreams streams) throws InterruptedException {
    while (true) {
        try {
            return streams.store(storeName, queryableStoreType);
        } catch (InvalidStateStoreException ignored) {
            // store not yet ready for querying
            Thread.sleep(100);
        }
    }
}
You can find additional info in the document.
https://docs.confluent.io/current/streams/faq.html#interactive-queries
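Applied to the code in the question, the usage would be along these lines (a sketch; the store name "customer" and the Customer type are taken from the question):
streams.start();
// retry until the local "customer" store is initialized and queryable, instead of calling streams.store() directly
ReadOnlyKeyValueStore<String, Customer> customerStore =
        waitUntilStoreIsQueryable("customer", QueryableStoreTypes.<String, Customer>keyValueStore(), streams);
System.out.println("customerStore.approximateNumEntries() -> " + customerStore.approximateNumEntries());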

Related

Can the state stores in Kafka be shared across streams?

We have a scenario where a state store populated from one KStream needs to be accessed in another KStream. Is there any way to achieve this?
They can be accessed with Interactive Queries.
Across applications, or across instances of the same application, you need to use RPC, for example by adding an HTTP or gRPC server.
https://docs.confluent.io/platform/current/streams/developer-guide/interactive-queries.html
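Within a single instance, querying a store looks roughly like this (a minimal sketch using the newer StoreQueryParameters API; the store name "my-store" and the key are just examples, and to reach other instances you would locate the owning host via KafkaStreams#queryMetadataForKey() and call it over your own HTTP/gRPC endpoint):
// look up the local store by name and query it read-only
ReadOnlyKeyValueStore<String, String> store = streams.store(
        StoreQueryParameters.fromNameAndType("my-store", QueryableStoreTypes.<String, String>keyValueStore()));
String value = store.get("some-key");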
You can attach the same state store to multiple processors if you use the Processor API, but also if you use the Processor API Integration in the DSL.
There are two ways to do that (see javadocs). You can either manually add the store to the processors, like:
// create store
StoreBuilder<KeyValueStore<String, String>> keyValueStoreBuilder =
        Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("myProcessorState"),
                Serdes.String(),
                Serdes.String());
// add store
builder.addStateStore(keyValueStoreBuilder);

inputStream.process(new ProcessorSupplier() {
    public Processor get() {
        return new MyProcessor();
    }
}, "myProcessorState");
or you can implement stores() on the passed in ProcessorSupplier:
class MyProcessorSupplier implements ProcessorSupplier {

    // supply processor
    public Processor get() {
        return new MyProcessor();
    }

    // provide store(s) that will be added and connected to the associated processor;
    // the store name from the builder ("myProcessorState") is used to access the store later via the ProcessorContext
    public Set<StoreBuilder> stores() {
        StoreBuilder<KeyValueStore<String, String>> keyValueStoreBuilder =
                Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("myProcessorState"),
                        Serdes.String(),
                        Serdes.String());
        return Collections.singleton(keyValueStoreBuilder);
    }
}
These are examples for KStream#process(), but it works similarly for the family of KStream#*transform*() methods.
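For the transform variant, connecting the store works the same way; a minimal sketch (MyTransformer is a hypothetical Transformer implementation that fetches "myProcessorState" from the ProcessorContext in its init() method):
KStream<String, String> outputStream =
        inputStream.transform(() -> new MyTransformer(), "myProcessorState");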

Adding data to state store for stateful processing and fault tolerance

I have a microservice that performs some stateful processing. The application constructs a KStream from an input topic, does some stateful processing, then writes data to an output topic.
I will be running 3 instances of this application in the same consumer group. There are 3 parameters that I need to store so that, in the event a microservice goes down, the microservice that takes over can query the shared state store and continue where the crashed service left off.
I am thinking of pushing these 3 parameters into a state store and querying the data when the other microservice takes over. From my research I have seen a lot of examples where people perform event counting using a state store, but that's not exactly what I want. Does anyone know of an example, or what the right approach to this problem is?
So you want to do 2 things:
a. the service going down has to store the parameters:
If you want to do it in a straightforward way, then all you have to do is write a message to the topic associated with the state store (the one you are reading into a KTable). Use the Kafka Producer API or a KStream (could be kTable.toStream()) to do it, and that's it.
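With the plain Producer API that could look roughly like this (a minimal sketch; the topic name "parametersTopicName" matches the example further down, while the broker address, the String serialization and the key/value used here are assumptions):
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    // key = parameter name, value = parameter value
    producer.send(new ProducerRecord<>("parametersTopicName", "someParameter", "someValue"));
}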
Otherwise you could manually create a state store:
// take these serdes as just an example
Serde<String> keySerde = Serdes.String();
Serde<String> valueSerde = Serdes.String();
KeyValueBytesStoreSupplier storeSupplier = Stores.inMemoryKeyValueStore(stateStoreName);
streamsBuilder.addStateStore(Stores.keyValueStoreBuilder(storeSupplier, keySerde, valueSerde));
then use it in a transformer or processor to add items to it; you'll have to declare this in the transformer/processor:
// depending on the serdes above you might have something other than String here
private KeyValueStore<String, String> stateStore;
and initialize the stateStore variable:
@Override
public void init(ProcessorContext context) {
    stateStore = (KeyValueStore<String, String>) context.getStateStore(stateStoreName);
}
and later use the stateStore variable:
@Override
public KeyValue<String, String> transform(String key, String value) {
    // using stateStore among other actions you might take here
    // (processedValue stands for whatever your processing derives from the incoming record)
    stateStore.put(key, processedValue);
    return KeyValue.pair(key, processedValue);
}
b. read the parameters in the service taking over:
You could do it with a plain Kafka consumer, but with Kafka Streams you first have to make the store available: the easiest way is to create a KTable over the topic; then get the queryable store name that is automatically created with the KTable; then actually get access to the store; and finally extract a record value from the store (i.e. a parameter value by its key).
// this example is a modified copy of the KTable javadocs example
final StreamsBuilder streamsBuilder = new StreamsBuilder();
// By creating a KTable over the topic containing your parameters, a store is automatically created.
//
// The serde for your MyParametersClassType could be
// new org.springframework.kafka.support.serializer.JsonSerde(MyParametersClassType.class),
// though further configuration might be necessary here - e.g. setting the trusted packages for the ObjectMapper behind JsonSerde.
//
// If the parameter value class is a String then you could use Serdes.String() instead of a MyParametersClassType serde.
final KTable<String, MyParametersClassType> paramsTable =
        streamsBuilder.table("parametersTopicName", Consumed.with(Serdes.String(), <<your MyParametersClassType serde>>));
...
// see the example from the KafkaStreams javadocs for more KafkaStreams-related details
final KafkaStreams streams = ...;
streams.start();
...
// get the queryable store name that is automatically created with the KTable
final String queryableStoreName = paramsTable.queryableStoreName();
// get access to the store; a KTable is backed by a timestamped key-value store,
// so each stored value is wrapped in a ValueAndTimestamp
final ReadOnlyKeyValueStore<String, ValueAndTimestamp<MyParametersClassType>> view =
        streams.store(queryableStoreName, QueryableStoreTypes.timestampedKeyValueStore());
// extract a record value from the store by its key (unwrap the ValueAndTimestamp)
final MyParametersClassType parameter = view.get(key).value();

Detecting abandoned processes in Kafka Streams 2.0

I have this case: users collect orders as order lines. I implemented this with a Kafka topic containing order-change events; they are merged, stored in a local key-value store, and broadcast as order versions on a second topic.
I need to somehow react to abandoned orders - ones that were started but have seen no change for at least the last x hours.
A simple solution could be to scan the local storage every y minutes and post an order status change event to Abandoned. It seems I cannot access the store from outside a processor... But it is also not very elegant coding. Any suggestions are welcome.
--edit
I cannot just add a punctuator to the merge/validation transformer, because its output is different and should be routed elsewhere, as in this image (a single Kafka Streams app):
so the "abandoned orders processor/transformer" would be a no-op for its input (the only trigger here is time). Another issue is that in such a case (as in the image) my transformer gets a ForwardingDisabledProcessorContext upon initialization, so I cannot emit any messages from the punctuator. I could just pass in a kafkaTemplate bean and produce new messages directly, but then the whole processor/transformer is just an empty shell that exists only to access the local store...
This is a snippet of the code I used:
public class AbandonedOrdersTransformer implements ValueTransformer<OrderEvent, OrderEvent> {

    private ProcessorContext context;
    private KeyValueStore<String, Order> stateStore;

    @Override
    public void init(ProcessorContext processorContext) {
        this.context = processorContext;
        stateStore = (KeyValueStore<String, Order>) processorContext.getStateStore(KafkaConfig.OPENED_ORDERS_STORE);
        // main scheduler
        this.context.schedule(TimeUnit.MINUTES.toMillis(5), PunctuationType.WALL_CLOCK_TIME, (timestamp) -> {
            KeyValueIterator<String, Order> iter = this.stateStore.all();
            while (iter.hasNext()) {
                KeyValue<String, Order> entry = iter.next();
                if (OrderStatuses.NEW.equals(entry.value.getStatus()) &&
                        (timestamp - entry.value.getLastChanged().getTime()) > TimeUnit.HOURS.toMillis(4)) {
                    // SEND ABANDON EVENT "event"
                    context.forward(entry.key, event);
                }
            }
            iter.close();
            context.commit();
        });
    }

    @Override
    public OrderEvent transform(OrderEvent orderEvent) {
        // do nothing
        return null;
    }

    @Override
    public void close() {
        // do nothing
    }
}

How can I create a state store that is restorable from an existing changelog topic?

I am using the streams DSL to deduplicate a topic called users:
topology.addStateStore(Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("users"), byteStringSerde, userSerde));
KStream<ByteString, User> users = topology.stream("users", Consumed.with(byteStringSerde, userSerde));

users.transform(() -> new Transformer<ByteString, User, KeyValue<ByteString, User>>() {
    private KeyValueStore<ByteString, User> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        store = (KeyValueStore<ByteString, User>) context.getStateStore("users");
    }

    @Override
    public KeyValue<ByteString, User> transform(ByteString key, User value) {
        User user = store.get(key);
        if (user != null) {
            store.put(key, value);
            return new KeyValue<>(key, value);
        }
        return null;
    }

    @Override
    public KeyValue<ByteString, User> punctuate(long timestamp) {
        return null;
    }

    @Override
    public void close() {
    }
}, "users");
Given this code, Kafka Streams creates an internal changelog topic for the users store. I am wondering, is there some way I can use the existing users topic instead of creating an essentially identical changelog topic?
PS. I see that StreamsBuilder says this is possible:
However, no internal changelog topic is created since the original input topic can be used for recovery
But following the code to InternalStreamsBuilder#table() and InternalStreamsBuilder#createKTable(), I am not seeing how it's achieving this effect.
Not all things the DSL does are possible at the Processor API level -- it uses some internals that are not part of the public API to achieve what you describe.
It's the call to InternalTopologyBuilder#connectSourceStoreAndTopic() that does the trick (cf. InternalStreamsBuilder#table()).
For your use case of de-duplication, it seems that you need two topics though (depending on what de-duplication logic you apply). Restoring via a changelog topic does key-based updates and thus does not consider values (which might be part of your de-duplication logic, too).
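If the DSL behavior quoted from the StreamsBuilder javadocs is enough for you, a sketch of that route (my suggestion, not part of the original answer; builder stands for the StreamsBuilder the question calls topology, and "users-store" is just an example store name) is to let table() materialize the store, which is then restored from the "users" topic itself rather than from an extra changelog topic:
KTable<ByteString, User> usersTable = builder.table(
        "users",
        Consumed.with(byteStringSerde, userSerde),
        Materialized.as("users-store"));
// "users-store" is queryable and is restored from the "users" topic; no internal changelog topic is created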

Shouldn't EntityManager#merge work with non identifiable entity?

I'm designing JAX-RS APIs.
POST /myentities
PUT /myentities/{myentityId}
I did it like this.
@PUT
@Path("/{myentityId: \\d+}")
@Consumes(...)
@Transactional
public Response updateMyentity(
        @PathParam("myentityId") final long myentityId,
        final Myentity myentity) {
    if (!Objects.equal(myentityId, myentity.getId())) {
        // throw bad request
    }
    entityManager.merge(myentity);
    return Response.noContent().build();
}
I suddenly got curious and questioned myself.
When a client invokes the following request
PUT /myentities/101 HTTP/1.1
Host: ..
<myentity id=101>
</myentity>
Is it possible that the request is processed even if there is no resource identified by 101?
I ran a test.
acceptEntityManager(entityManager -> {
    // persist
    final Device device1 = mergeDevice(entityManager, null);
    entityManager.flush();
    logger.debug("device1: {}", device1);
    assertNotNull(device1.getId());
    // remove
    entityManager.remove(device1);
    entityManager.flush();
    assertNull(entityManager.find(Device.class, device1.getId()));
    // merge again, with non existing id
    final Device device2 = entityManager.merge(device1);
    entityManager.flush();
    logger.debug("device2: {}", device2);
});
I found that the second merge works and a new id is assigned.
Is this normal? Shouldn't EntityManager#merge deny the operation?
Does this mean that any client can attack the API by calling
PUT /myentities/<any number>
?
If I understand this right, then yes, this is how merge should work.
merge creates a new instance of your entity class, copies the current state of the entity that you provide (in your case device1), and adds that NEW instance to the transaction.
So the NEW instance is managed, and any changes you make to the old instance are not tracked in the transaction and will not be propagated to the database via the flush operation.
So while you removed device1 from the database with entityManager.remove(device1), the call
final Device device2 = entityManager.merge(device1); adds a new copy of it (with a new ID) to the database. Changes you make to device1 are not tracked in the transaction, so they won't be reflected in the database tables once you flush. Changes to device2, on the other hand, WILL be tracked in the transaction.
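If you want the PUT endpoint from the question to reject unknown ids instead of letting merge behave like an insert, one possible guard (a sketch, not part of the original answer; the response codes are just an example) is to look the entity up first:
@PUT
@Path("/{myentityId: \\d+}")
@Consumes(...)
@Transactional
public Response updateMyentity(@PathParam("myentityId") final long myentityId,
                               final Myentity myentity) {
    if (!Objects.equal(myentityId, myentity.getId())) {
        return Response.status(Response.Status.BAD_REQUEST).build();
    }
    // refuse to merge when no row with this id exists, so merge cannot silently insert a new one
    if (entityManager.find(Myentity.class, myentityId) == null) {
        return Response.status(Response.Status.NOT_FOUND).build();
    }
    entityManager.merge(myentity);
    return Response.noContent().build();
}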