I am trying to learn Kafka Streams using Confluent's test platform and the setup instructions here. I can start up and connect to my test broker, but the streams application never writes to my sink topic. Looking at the logs, Kafka Streams is constantly fetching and monitoring the offset (if I am reading the logs correctly), but it never actually reads or writes anything.
14:07:29.654 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Received successful Heartbeat response
14:07:29.770 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Fetch READ_UNCOMMITTED at offset 4 for partition exportStatusUpdatesV2_rquinlivan-0 returned fetch data (error=NONE, highWaterMark=4, lastStableOffset = -1, logStartOffset = 0, abortedTransactions = null, recordsSizeInBytes=0)
14:07:29.770 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Added READ_UNCOMMITTED fetch request for partition exportStatusUpdatesV2_rquinlivan-0 at offset 4 to node localhost:29092 (id: 1 rack: null)
14:07:29.770 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Sending READ_UNCOMMITTED fetch for partitions [exportStatusUpdatesV2_rquinlivan-0] to broker localhost:29092 (id: 1 rack: null)
14:07:30.273 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Fetch READ_UNCOMMITTED at offset 4 for partition exportStatusUpdatesV2_rquinlivan-0 returned fetch data (error=NONE, highWaterMark=4, lastStableOffset = -1, logStartOffset = 0, abortedTransactions = null, recordsSizeInBytes=0)
14:07:30.273 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Added READ_UNCOMMITTED fetch request for partition exportStatusUpdatesV2_rquinlivan-0 at offset 4 to node localhost:29092 (id: 1 rack: null)
14:07:30.273 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Sending READ_UNCOMMITTED fetch for partitions [exportStatusUpdatesV2_rquinlivan-0] to broker localhost:29092 (id: 1 rack: null)
14:07:30.775 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Fetch READ_UNCOMMITTED at offset 4 for partition exportStatusUpdatesV2_rquinlivan-0 returned fetch data (error=NONE, highWaterMark=4, lastStableOffset = -1, logStartOffset = 0, abortedTransactions = null, recordsSizeInBytes=0)
14:07:30.776 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Added READ_UNCOMMITTED fetch request for partition exportStatusUpdatesV2_rquinlivan-0 at offset 4 to node localhost:29092 (id: 1 rack: null)
14:07:30.776 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Sending READ_UNCOMMITTED fetch for partitions [exportStatusUpdatesV2_rquinlivan-0] to broker localhost:29092 (id: 1 rack: null)
14:07:31.279 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Fetch READ_UNCOMMITTED at offset 4 for partition exportStatusUpdatesV2_rquinlivan-0 returned fetch data (error=NONE, highWaterMark=4, lastStableOffset = -1, logStartOffset = 0, abortedTransactions = null, recordsSizeInBytes=0)
14:07:31.279 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Added READ_UNCOMMITTED fetch request for partition exportStatusUpdatesV2_rquinlivan-0 at offset 4 to node localhost:29092 (id: 1 rack: null)
14:07:31.279 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Sending READ_UNCOMMITTED fetch for partitions [exportStatusUpdatesV2_rquinlivan-0] to broker localhost:29092 (id: 1 rack: null)
14:07:31.782 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Fetch READ_UNCOMMITTED at offset 4 for partition exportStatusUpdatesV2_rquinlivan-0 returned fetch data (error=NONE, highWaterMark=4, lastStableOffset = -1, logStartOffset = 0, abortedTransactions = null, recordsSizeInBytes=0)
14:07:31.782 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Added READ_UNCOMMITTED fetch request for partition exportStatusUpdatesV2_rquinlivan-0 at offset 4 to node localhost:29092 (id: 1 rack: null)
14:07:31.782 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Sending READ_UNCOMMITTED fetch for partitions [exportStatusUpdatesV2_rquinlivan-0] to broker localhost:29092 (id: 1 rack: null)
14:07:32.284 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Fetch READ_UNCOMMITTED at offset 4 for partition exportStatusUpdatesV2_rquinlivan-0 returned fetch data (error=NONE, highWaterMark=4, lastStableOffset = -1, logStartOffset = 0, abortedTransactions = null, recordsSizeInBytes=0)
14:07:32.284 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Added READ_UNCOMMITTED fetch request for partition exportStatusUpdatesV2_rquinlivan-0 at offset 4 to node localhost:29092 (id: 1 rack: null)
14:07:32.284 [katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Sending READ_UNCOMMITTED fetch for partitions [exportStatusUpdatesV2_rquinlivan-0] to broker localhost:29092 (id: 1 rack: null)
14:07:32.656 [kafka-coordinator-heartbeat-thread | katanaTest] DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=katanaTest-a821659c-a994-4c6f-9714-fa48020b6378-StreamThread-1-consumer, groupId=katanaTest] Sending Heartbeat request to coordinator localhost:29092 (id: 2147483646 rack: null)
I don't understand from this log output what the issue is, and there is never an error logged. How can I debug why my streams application isn't working? What is the recommended method of debugging a Kafka Streams application?
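For reference, two built-in aids that help diagnose an apparently idle topology are printing the topology description and adding peek() stages so every record is logged as it flows. A minimal sketch (the topic names "source-topic" and "sink-topic" are placeholders, not taken from the logs above):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class TopologyDebugSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("source-topic", Consumed.with(Serdes.String(), Serdes.String()))
               // log every record the topology actually receives
               .peek((key, value) -> System.out.println("in  key=" + key + " value=" + value))
               // ... transformation logic goes here ...
               .peek((key, value) -> System.out.println("out key=" + key + " value=" + value))
               .to("sink-topic", Produced.with(Serdes.String(), Serdes.String()));

        Topology topology = builder.build();
        // print the wiring of sources, processors and sinks to verify the topic names
        System.out.println(topology.describe());
    }
}

If describe() shows the expected source and sink topics but the "in" lines never appear, records are not reaching the application (offsets, topic name, or the produce side); if "in" appears but "out" does not, records are being dropped inside the topology.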
Related
A stateful Kafka Streams process is losing its state when tasks move from one pod to another during rebalancing.
When a pod is killed, it restarts, the task stays assigned to the same pod, and the process resumes correctly (no issues with this scenario).
When we scale down and the task is forced to move to another pod, we can see that Kafka Streams restored the changelog, but it didn't work: the process created a new state with version 1 instead of 3992 and CounterSize 1 instead of 2964 (this can be checked in the logs).
After checking the logs we saw that the same key goes to different partitions in the changelog topic when the task is assigned to a new pod, as per the screenshots below (not sure if this is an issue).
Details of what we are using:
The application name is customer-state.
We are using AWS MSK (Apache Kafka version 2.8.0) with 6 brokers.
The application is deployed as a StatefulSet in EKS/Kubernetes with 3 replicas.
Java 11
Spring Cloud 2020.0.3
Kafka Streams 2.8.0
We applied a few configuration changes, but we cannot figure out why it is not picking up the latest state from the changelog.
spring.cloud.stream.kafka.streams.binder.configuration:
  max.request.size: 5242892
  max.partition.fetch.bytes: 5242892
  max.fetch.bytes: 15728676
  acceptable.recovery.lag: 0
  num.standby.replicas: 1
  num.stream.threads: 2
spring.kafka.streams.binder:
  functions:
    process.applicationId: customer-state-process
  configuration:
    # A Helm chart populates POD_NAME with the pod name; since this is a StatefulSet
    # the value is always customer-state-0, customer-state-1 or customer-state-2.
    group.instance.id: ${POD_NAME}
    session.timeout.ms: 30000
    acceptable.recovery.lag: 0
What are the settings for the state-store?
@Bean
public StreamsBuilderFactoryBeanConfigurer streamsBuilderFactoryBeanCustomizer() {
    return factoryBean -> {
        try {
            final StreamsBuilder streamsBuilder = factoryBean.getObject();
            if (isNull(streamsBuilder)) {
                throw new NullPointerException("streamsBuilder is null");
            }
            // Customer's state store initialization
            final PrimitiveAvroSerde<String> customerKeySerde = createKeySerde(factoryBean, true);
            final SpecificAvroSerde<StateAvro> valueSerde = createValueSerde(factoryBean);
            final StoreBuilder<KeyValueStore<String, StateAvro>> storeBuilder = Stores.keyValueStoreBuilder(
                    Stores.persistentKeyValueStore(STATE_STORE_NAME),
                    customerKeySerde,
                    valueSerde);
            streamsBuilder.addStateStore(storeBuilder);
        } catch (Exception e) {
            // keep the original exception as the cause so the failure is diagnosable
            throw new RuntimeException("Can't create State Store", e);
        }
    };
}
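The logs further down show the transformer "Fetching state store in transformer"; the usual pattern for that, given the store registered above, is to look the store up by name in init(). A minimal sketch of the pattern (not the author's actual transformer; Object stands in for the incoming event type, and the store name is inferred from the changelog topic name seen in the logs):

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

public class EventToStateTransformerSketch
        implements Transformer<String, Object, KeyValue<String, StateAvro>> {

    // matches the changelog name customer-state-process-customer-state-store-changelog from the logs
    private static final String STATE_STORE_NAME = "customer-state-store";

    private KeyValueStore<String, StateAvro> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        // the store must also be connected to the transformer when building the topology, e.g.
        // stream.transform(EventToStateTransformerSketch::new, STATE_STORE_NAME)
        store = (KeyValueStore<String, StateAvro>) context.getStateStore(STATE_STORE_NAME);
    }

    @Override
    public KeyValue<String, StateAvro> transform(String customerId, Object event) {
        // a null result here is what produces the "Creating new state" path in the logs
        StateAvro existing = store.get(customerId);
        // ... merge the event into the state, then store.put(customerId, updatedState) ...
        return null; // or KeyValue.pair(customerId, updatedState) to forward downstream
    }

    @Override
    public void close() { }
}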
How is CounterSize implemented and saved to Kafka?
It's a Java Map serialized to Avro, and then published to Kafka.
The CounterSize printed in the logs is just the number of entries in that Map.
"type" : "record",
"name" : "StateAvro",
"namespace" : "com.dpml.avro.state",
"fields" : [ {
"name" : "version",
"type" : "long"
}, {
"name" : "DepositCounter",
"type" : [
"null",
"MetaCounterAvro"
],
"default": null
} ...
{
  "type" : "record",
  "name" : "MetaCounterAvro",
  "namespace" : "com.dpml.avro.state",
  "fields" : [ {
    "name" : "entries",
    "type" : {
      "type" : "array",
      "items" : "PairLongString",
      "java-class" : "java.util.Map"
    }
  }, {
    "name" : "hours",
    "type" : "long"
  } ]
}
This is how the log line is produced:
log.debug("Successfully updated state for customerId={}, {}", getCustomerId(), createLogInfo(avro));
private String createLogInfo(StateAvro stateAvro) {
    final var depositSize = Optional.ofNullable(stateAvro.getDepositCounter())
            .map(MetaCounterAvro::getEntries)
            .map(List::size)
            .orElse(0);
    return String.format("version=%d, eventTime=%d, triggerEvent=%s, updatedByEventId=%s, CounterSize=%s",
            stateAvro.getVersion(),
            stateAvro.getEventTime(),
            stateAvro.getTriggerEvent().name(),
            stateAvro.getUpdatedByEventId(),
            depositSize);
}
LOGS:
=> offset from incoming event partition=2 offset=74800529
2022-02-24T10:52:58.867Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=topic-input-name, offset=74800529 task=1_2 - Received event {"header": {"timestamp": 1645699978378, "eventId": "8f10dc2e-fcf2-4fcb-a312-0bba7170e16d", "customerId": 1234567890}
2022-02-24T10:52:58.868Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - State already exists for customerId=1234567890 version=3990, eventTime=1645699976221, updatedByEventId=21844581-5fd9-41da-b05d-e1cb719349a5, CounterSize=2962
2022-02-24T10:52:58.868Z INFO c.k.d.k.t.CustomerNewEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - CustomerEvent -> adding customerId=1234567890 timeEvent=1645699978378 eventId=8f10dc2e-fcf2-4fcb-a312-0bba7170e16d Technical info -> partition=2 offset=74800529 task=1_2
2022-02-24T10:52:58.869Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Successfully updated state for customerId=1234567890 version=3991, eventTime=1645699978378, triggerEvent=CustomerEvent, updatedByEventId=8f10dc2e-fcf2-4fcb-a312-0bba7170e16d, CounterSize=2963
=> scaled down from 3 pods to 2 pods to force the rebalance
2022-02-24T10:52:59.011Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Attempt to heartbeat failed since group is rebalancing
2022-02-24T10:52:59.012Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] (Re-)joining group
2022-02-24T10:52:59.090Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Successfully joined group with generation Generation{generationId=192444, memberId='customer-state-0-1-6693263a-0054-46bb-b775-697741beb01a', protocol='stream'}
2022-02-24T10:52:59.101Z INFO o.a.k.c.c.i.AbstractCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Successfully synced group in generation Generation{generationId=192444, memberId='customer-state-0-1-6693263a-0054-46bb-b775-697741beb01a', protocol='stream'}
2022-02-24T10:52:59.101Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Updating assignment with Assigned partitions: [topic-input-name-2, topic-input-name-0] Current owned partitions: [topic-input-name-2] Added partitions (assigned - owned): [topic-input-name-0] Revoked partitions (owned - assigned): []
2022-02-24T10:52:59.101Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Notifying assignor about the new Assignment(partitions=[topic-input-name-0, topic-input-name-2], userDataSize=98)
2022-02-24T10:52:59.102Z INFO o.a.k.s.p.i.StreamsPartitionAssignor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer] No followup rebalance was requested, resetting the rebalance schedule.
2022-02-24T10:52:59.102Z INFO o.a.k.s.p.i.TaskManager [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Handle new assignment with: New active tasks: [1_0, 1_2] New standby tasks: [1_3] Existing active tasks: [1_2] Existing standby tasks: [1_1]
2022-02-24T10:52:59.102Z INFO o.a.k.s.p.i.StandbyTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_1] Suspended running
2022-02-24T10:52:59.150Z INFO o.a.k.c.c.KafkaConsumer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Subscribed to partition(s): customer-state-process-customer-state-store-changelog-2
2022-02-24T10:52:59.154Z INFO o.a.k.s.p.i.StandbyTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_1] Closed clean
2022-02-24T10:52:59.155Z INFO i.c.k.s.KafkaAvroSerializerConfig [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - KafkaAvroSerializerConfig values: bearer.auth.token = [hidden] schema.registry.url = [https://cp-schema-registry.cp-schema-registry.svc.cluster.local:443] basic.auth.user.info = [hidden] auto.register.schemas = true max.schemas.per.subject = 1000 basic.auth.credentials.source = URL schema.registry.basic.auth.user.info = [hidden] bearer.auth.credentials.source = STATIC_TOKEN value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicRecordNameStrategy key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
2022-02-24T10:52:59.155Z INFO i.c.k.s.KafkaAvroDeserializerConfig [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - KafkaAvroDeserializerConfig values: bearer.auth.token = [hidden] schema.registry.url = [https://cp-schema-registry.cp-schema-registry.svc.cluster.local:443] basic.auth.user.info = [hidden] auto.register.schemas = true max.schemas.per.subject = 1000 basic.auth.credentials.source = URL schema.registry.basic.auth.user.info = [hidden] bearer.auth.credentials.source = STATIC_TOKEN specific.avro.reader = true value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicRecordNameStrategy key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
2022-02-24T10:52:59.157Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Adding newly assigned partitions: topic-input-name-0
2022-02-24T10:52:59.157Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] State transition from RUNNING to PARTITIONS_ASSIGNED
2022-02-24T10:52:59.157Z INFO o.a.k.s.KafkaStreams [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-client [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf] State transition from RUNNING to REBALANCING
2022-02-24T10:52:59.158Z INFO o.a.k.c.c.i.ConsumerCoordinator [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer instanceId=customer-state-0-1, clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-consumer, groupId=customer-state-process] Setting offset for partition topic-input-name-0 to the committed offset FetchPosition{offset=73628671, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-2.d2-msk-cluster-pt.c2.kafka.eu-central-1.amazonaws.com:9094 (id: 2 rack: euc1-az3)], epoch=141}}
2022-02-24T10:52:59.199Z INFO o.a.k.s.p.i.ProcessorStateManager [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] task [1_0] State store customer-state-store did not find checkpoint offset, hence would default to the starting offset at changelog customer-state-process-customer-state-store-changelog-0
2022-02-24T10:52:59.199Z INFO o.a.k.s.p.i.StreamTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] task [1_0] Initialized
2022-02-24T10:52:59.240Z INFO o.a.k.s.p.i.ProcessorStateManager [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_3] State store customer-state-store did not find checkpoint offset, hence would default to the starting offset at changelog customer-state-process-customer-state-store-changelog-3
2022-02-24T10:52:59.240Z INFO o.a.k.s.p.i.StandbyTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] standby-task [1_3] Initialized
2022-02-24T10:52:59.286Z INFO o.a.k.c.c.KafkaConsumer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Subscribed to partition(s): customer-state-process-customer-state-store-changelog-0, customer-state-process-customer-state-store-changelog-3, customer-state-process-customer-state-store-changelog-2
2022-02-24T10:52:59.286Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Seeking to EARLIEST offset of partition customer-state-process-customer-state-store-changelog-0
2022-02-24T10:52:59.287Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Seeking to EARLIEST offset of partition customer-state-process-customer-state-store-changelog-3
2022-02-24T10:52:59.471Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Resetting offset for partition customer-state-process-customer-state-store-changelog-3 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-2.d2-msk-cluster-pt.c2.kafka.eu-central-1.amazonaws.com:9094 (id: 2 rack: euc1-az3)], epoch=0}}.
2022-02-24T10:52:59.471Z INFO o.a.k.c.c.i.SubscriptionState [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - [Consumer clientId=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1-restore-consumer, groupId=null] Resetting offset for partition customer-state-process-customer-state-store-changelog-0 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-4.d2-msk-cluster-pt.c2.kafka.eu-central-1.amazonaws.com:9094 (id: 4 rack: euc1-az3)], epoch=0}}.
2022-02-24T10:53:09.480Z INFO o.a.k.s.p.i.StoreChangelogReader [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Restoration in progress for 1 partitions. {customer-state-process-customer-state-store-changelog-0: position=19011, end=42297, totalRestored=19011}
2022-02-24T10:53:11.769Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Processed 939 total records, ran 2 punctuators, and committed 4 total tasks since the last update
2022-02-24T10:53:14.105Z INFO o.a.k.s.p.i.StoreChangelogReader [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Finished restoring changelog customer-state-process-customer-state-store-changelog-0 to store customer-state-store with a total number of 42297 records
2022-02-24T10:53:14.106Z INFO c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Fetching state store in transformer
2022-02-24T10:53:14.152Z INFO o.a.k.s.p.i.StreamTask [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] task [1_0] Restored and ready to run
2022-02-24T10:53:14.152Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] Restoration took 14995 ms for all tasks [1_0, 1_2, 1_3]
2022-02-24T10:53:14.152Z INFO o.a.k.s.p.i.StreamThread [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-thread [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] State transition from PARTITIONS_ASSIGNED to RUNNING
2022-02-24T10:53:14.152Z INFO o.a.k.s.KafkaStreams [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - stream-client [customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf] State transition from REBALANCING to RUNNING
2022-02-24T10:53:14.162Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=null, offset=-1 task=1_0 - Received event {"header": {"timestamp": 1645699978060, "eventId": "96d3e5cc-f667-4735-91b8-529374c00d82", "customerId": 1234567890 }
=> At this point the framework has finished restoring the changelog, so it should find the state from 2022-02-24T10:52:58.868Z, but it didn't
=> the offset from the incoming event becomes -1
2022-02-24T10:53:14.162Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Creating new state for customerId=1234567890
2022-02-24T10:53:14.514Z INFO c.k.d.k.t.CustomerNewEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - CustomerEvent -> adding customerId=1234567890 timeEvent=1645699978060 eventId=96d3e5cc-f667-4735-91b8-529374c00d82 Technical info -> partition=-1 offset=-1 task=1_0
2022-02-24T10:53:14.515Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Successfully updated state for customerId=1234567890, version=1, eventTime=1645699978060, triggerEvent=CustomerEvent, updatedByEventId=96d3e5cc-f667-4735-91b8-529374c00d82, CounterSize=1
2022-02-24T10:53:14.515Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=null, offset=-1 task=1_0 - Received event {"header": {"timestamp": 1645699977697, "eventId": "42a9535c-ab90-41da-965c-ee4c5d871626", "customerId": 1234567890 }
2022-02-24T10:53:14.515Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - State already exists for customerId=1234567890, version=1, eventTime=1645699978060, triggerEvent=CustomerEvent, updatedByEventId=96d3e5cc-f667-4735-91b8-529374c00d82, CounterSize=1
2022-02-24T10:53:14.515Z INFO c.k.d.k.t.CustomerNewEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - CustomerEvent -> adding customerId=1234567890 timeEvent=1645699977697 eventId=42a9535c-ab90-41da-965c-ee4c5d871626 Technical info -> partition=-1 offset=-1 task=1_0
2022-02-24T10:53:14.516Z DEBUG c.k.d.k.t.DomainEventProcessor [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Successfully updated state for customerId=1234567890, version=2, eventTime=1645699977697, triggerEvent=CustomerEvent, updatedByEventId=42a9535c-ab90-41da-965c-ee4c5d871626, CounterSize=2
2022-02-24T10:53:14.516Z DEBUG c.k.d.k.t.EventToStateTransformer [thread=customer-state-process-2dff5b91-2c07-4a22-b7a8-7fbdbd413eaf-StreamThread-1] [] - Technical info -> context=null, offset=-1 task=1_0 - Received event {"header": {"timestamp": 1645699975892, "eventId": "459dc2ec-7a69-47f1-8c3e-53a5e5f4303b", "customerId": 1234567890 }
I'm trying to increase the number of consumers to match the number of partitions of a Kafka topic that we are reading from. There are three partitions, so I set the partitions setting for incoming messages to three, as shown below:
mp:
  messaging:
    incoming:
      topicA:
        auto:
          offset:
            reset: earliest
        topics: TOPIC-NAME
        connector: smallrye-kafka
        value:
          deserializer: org.apache.kafka.common.serialization.StringDeserializer
        group:
          id: consumer-group
        partitions: 3
However, I've been running the app for a while and it seems that it only processes messages from partition 0, not from partitions 1 and 2. I can see in the log that it creates three consumers.
2021-06-16 23:35:59,826 INFO [org.apa.kaf.cli.con.int.AbstractCoordinator] (vert.x-kafka-consumer-thread-0) [Consumer clientId=kafka-consumer-topicA-0, groupId=consumer-group] Successfully joined group with generation 15
2021-06-16 23:35:59,826 INFO [org.apa.kaf.cli.con.int.AbstractCoordinator] (vert.x-kafka-consumer-thread-2) [Consumer clientId=kafka-consumer-topicA-2, groupId=consumer-group] Successfully joined group with generation 15
2021-06-16 23:35:59,826 INFO [org.apa.kaf.cli.con.int.AbstractCoordinator] (vert.x-kafka-consumer-thread-1) [Consumer clientId=kafka-consumer-topicA-1, groupId=consumer-group] Successfully joined group with generation 15
2021-06-16 23:35:59,831 INFO [org.apa.kaf.cli.con.int.ConsumerCoordinator] (vert.x-kafka-consumer-thread-1) [Consumer clientId=kafka-consumer-topicA-1, groupId=consumer-group] Adding newly assigned partitions: TOPIC-NAME-1
2021-06-16 23:35:59,831 INFO [org.apa.kaf.cli.con.int.ConsumerCoordinator] (vert.x-kafka-consumer-thread-0) [Consumer clientId=kafka-consumer-topicA-0, groupId=consumer-group] Adding newly assigned partitions: TOPIC-NAME-0
2021-06-16 23:35:59,831 INFO [org.apa.kaf.cli.con.int.ConsumerCoordinator] (vert.x-kafka-consumer-thread-2) [Consumer clientId=kafka-consumer-topicA-2, groupId=consumer-group] Adding newly assigned partitions: TOPIC-NAME-2
But it only ever seems to process messages from partition 0:
2021-06-16 23:38:00,141 INFO [MessageListener] (vert.x-worker-thread-2) Partition number:0; offset: 1593011
2021-06-16 23:38:00,282 INFO [MessageListener] (vert.x-worker-thread-1) Partition number:0; offset: 1593012
2021-06-16 23:38:00,412 INFO [MessageListener] (vert.x-worker-thread-4) Partition number:0; offset: 1593013
2021-06-16 23:38:00,543 INFO [MessageListener] (vert.x-worker-thread-6) Partition number:0; offset: 1593014
2021-06-16 23:38:00,692 INFO [MessageListener] (vert.x-worker-thread-8) Partition number:0; offset: 1593015
2021-06-16 23:38:00,838 INFO [MessageListener] (vert.x-worker-thread-10) Partition number:0; offset: 1593016
2021-06-16 23:38:00,977 INFO [MessageListener] (vert.x-worker-thread-12) Partition number:0; offset: 1593017
2021-06-16 23:38:01,131 INFO [MessageListener] (vert.x-worker-thread-14) Partition number:0; offset: 1593018
2021-06-16 23:38:01,272 INFO [MessageListener] (vert.x-worker-thread-16) Partition number:0; offset: 1593019
2021-06-16 23:38:01,406 INFO [MessageListener] (vert.x-worker-thread-18) Partition number:0; offset: 1593020
2021-06-16 23:38:01,535 INFO [MessageListener] (vert.x-worker-thread-0) Partition number:0; offset: 1593021
2021-06-16 23:38:01,670 INFO [MessageListener] (vert.x-worker-thread-3) Partition number:0; offset: 1593022
2021-06-16 23:38:01,799 INFO [MessageListener] (vert.x-worker-thread-5) Partition number:0; offset: 1593023
Here's the code snippet of the listener class:
@Incoming("topicA")
@Blocking
public CompletionStage<Void> consume(final IncomingKafkaRecord<String, String> message) {
    log.info("Partition number:" + message.getPartition() + "; offset: " + message.getOffset());
    return message.ack();
}
Is this a bug with SmallRye Kafka?
It appears these settings will make the app read from all partitions and consume all messages:
mp.messaging.incoming.your-events.auto.offset.reset=earliest
mp.messaging.incoming.your-events.group.id=${quarkus.uuid}
If you are using an emitter, this will work without the above settings:
int partition = 0;
Message<Integer> message = Message.of(value)
        .addMetadata(OutgoingKafkaRecordMetadata.<String>builder()
                .withKey(key)
                .withPartition(partition) // change for each partition: 0, 1, 2...
                .withTopic("your-events")
                .build());
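To actually publish such a message you still need an Emitter bound to the channel; a minimal sketch, assuming a channel named your-events and an Integer payload (the exact import packages for Channel, Emitter and OutgoingKafkaRecordMetadata vary with the SmallRye/Quarkus version):

import javax.enterprise.context.ApplicationScoped;
import javax.inject.Inject;
import org.eclipse.microprofile.reactive.messaging.Channel;
import org.eclipse.microprofile.reactive.messaging.Emitter;
import org.eclipse.microprofile.reactive.messaging.Message;
import io.smallrye.reactive.messaging.kafka.api.OutgoingKafkaRecordMetadata;

@ApplicationScoped
public class YourEventsProducer {

    @Inject
    @Channel("your-events")
    Emitter<Integer> emitter;

    public void send(String key, int value, int partition) {
        // attach Kafka-specific metadata so the record lands on the chosen partition
        Message<Integer> message = Message.of(value)
                .addMetadata(OutgoingKafkaRecordMetadata.<String>builder()
                        .withKey(key)
                        .withPartition(partition)
                        .withTopic("your-events")
                        .build());
        emitter.send(message);
    }
}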
I have a cluster with 3 Kafka brokers and a topic called fallback_topic.
There is only one consumer group that consumes from this topic, and only one consumer in this consumer group.
After injecting a few messages, I can see the messages have been published to Kafka. The log size has advanced with the new messages; however, the consumer offset stays the same and no message is ever consumed.
Below is the log from when consumer.poll(3000) ran. Partitions 4, 7 and 10 received new messages from the producer, but when the consumer tried to read them, it reported error=OFFSET_OUT_OF_RANGE.
04:20:41.311 [kafka-coordinator-heartbeat-thread | uniqueConsumerGroup] DEBUG o.a.k.clients.FetchSessionHandler - [Consumer clientId=consumer-1, groupId=uniqueConsumerGroup] Node 654000 sent a full fetch response that created a new incremental fetch session 685508830 with 7 response partition(s)
04:20:41.311 [kafka-coordinator-heartbeat-thread | uniqueConsumerGroup] DEBUG o.a.k.c.consumer.internals.Fetcher - [Consumer clientId=consumer-1, groupId=uniqueConsumerGroup] Fetch READ_UNCOMMITTED at offset 1062 for partition fallback_topic-1 returned fetch data (error=NONE, highWaterMark=1062, lastStableOffset = -1, logStartOffset = 1062, abortedTransactions = null, recordsSizeInBytes=0)
04:20:41.311 [kafka-coordinator-heartbeat-thread | uniqueConsumerGroup] DEBUG o.a.k.c.consumer.internals.Fetcher - [Consumer clientId=consumer-1, groupId=uniqueConsumerGroup] Fetch READ_UNCOMMITTED at offset 124094 for partition fallback_topic-4 returned fetch data (error=OFFSET_OUT_OF_RANGE, highWaterMark=-1, lastStableOffset = -1, logStartOffset = -1, abortedTransactions = null, recordsSizeInBytes=0)
04:20:41.311 [kafka-coordinator-heartbeat-thread | uniqueConsumerGroup] DEBUG o.a.k.c.consumer.internals.Fetcher - [Consumer clientId=consumer-1, groupId=uniqueConsumerGroup] Fetch READ_UNCOMMITTED at offset 762 for partition fallback_topic-7 returned fetch data (error=OFFSET_OUT_OF_RANGE, highWaterMark=-1, lastStableOffset = -1, logStartOffset = -1, abortedTransactions = null, recordsSizeInBytes=0)
04:20:41.311 [kafka-coordinator-heartbeat-thread | uniqueConsumerGroup] DEBUG o.a.k.c.consumer.internals.Fetcher - [Consumer clientId=consumer-1, groupId=uniqueConsumerGroup] Fetch READ_UNCOMMITTED at offset 897 for partition fallback_topic-10 returned fetch data (error=OFFSET_OUT_OF_RANGE, highWaterMark=-1, lastStableOffset = -1, logStartOffset = -1, abortedTransactions = null, recordsSizeInBytes=0)
My understanding is that this error happens when the leader of the partition has moved the offset but a follower hasn't caught up yet. But there was no broker outage, so the consumer was using the same leader the whole time. Can anyone help me understand why there is an OFFSET_OUT_OF_RANGE error? Thank you very much. Below is my code; I skipped consumer.commitAsync() because my problem happens before the commit.
List<Event> events = new ArrayList<Event>();
consumer.subscribe(Arrays.asList("fallback_topic"));
ConsumerRecords<String, byte[]> records;
do {
    logger.info("Start polling messages from " + topic);
    records = consumer.poll(3000);
    logger.info("done polling.");
    records.partitions().forEach(tp -> logger.info("found records from " + tp.topic() + "-" + tp.partition()));
    for (ConsumerRecord<String, byte[]> record : records) {
        Event event = EventKafkaSerializer.serializer.deserializeEvent(new ByteArrayInputStream(record.value()));
        logger.info(event.getId() + " " + event.getData().toString());
        events.add(event);
    }
} while (records.count() > 0);
logger.info("Found total events " + events.size());
Found out why: I forgot to call consumer.close() at the end.
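For anyone hitting the same issue, one way to guarantee the consumer is always closed is try-with-resources, since KafkaConsumer implements Closeable; a minimal sketch assuming the same props and poll loop as above:

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FallbackTopicReader {

    public static void readAll(Properties props) {
        // KafkaConsumer implements Closeable, so try-with-resources guarantees close()
        // runs even if deserialization or processing throws
        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("fallback_topic"));
            ConsumerRecords<String, byte[]> records;
            do {
                records = consumer.poll(3000);
                // ... process the records exactly as in the loop above ...
            } while (records.count() > 0);
        } // close() leaves the group cleanly and commits offsets if auto-commit is enabled
    }
}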
I'm using Flink streaming and flink-connector-kafka to process data from Kafka. I configure FlinkKafkaConsumer010 with setStartFromTimestamp(1586852770000L). At that time, all records in Kafka topic A have timestamps before 1586852770000L. I then send some messages to partition-0 and partition-4 of topic A (topic A has 6 partitions; the current system time is already after 1586852770000L), but my Flink program doesn't consume any data from topic A. Is this an issue?
If I stop my Flink program and restart it, it consumes data from partition-0 and partition-4 of topic A, but it still won't consume any data from the other 4 partitions if I send data to them, unless I restart my Flink program yet again.
The Kafka client log is as follows:
2020-04-15 11:48:46,447 TRACE org.apache.kafka.clients.consumer.internals.Fetcher - Sending ListOffsetRequest (type=ListOffsetRequest, replicaId=-1, partitionTimestamps={TopicA-4=1586836800000}, minVersion=1) to broker server1:9092 (id: 185 rack: null)
2020-04-15 11:48:46,463 TRACE org.apache.kafka.clients.NetworkClient - Sending {replica_id=-1,topics=[{topic=TopicA,partitions=[{partition=0,timestamp=1586836800000}]}]} to node 184.
2020-04-15 11:48:46,466 TRACE org.apache.kafka.clients.NetworkClient - Completed receive from node 185, for key 2, received {responses=[{topic=TopicA,partition_responses=[{partition=4,error_code=0,timestamp=1586852770000,offset=4}]}]}
2020-04-15 11:48:46,467 TRACE org.apache.kafka.clients.consumer.internals.Fetcher - Received ListOffsetResponse {responses=[{topic=TopicA,partition_responses=[{partition=4,error_code=0,timestamp=1586852770000,offset=4}]}]} from broker server1:9092 (id: 185 rack: null)
2020-04-15 11:48:46,467 DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - Handling ListOffsetResponse response for TopicA-4. Fetched offset 4, timestamp 1586852770000
2020-04-15 11:48:46,448 TRACE org.apache.kafka.clients.consumer.internals.Fetcher - Sending ListOffsetRequest (type=ListOffsetRequest, replicaId=-1, partitionTimestamps={TopicA-0=1586836800000}, minVersion=1) to broker server2:9092 (id: 184 rack: null)
2020-04-15 11:48:46,463 TRACE org.apache.kafka.clients.NetworkClient - Sending {replica_id=-1,topics=[{topic=TopicA,partitions=[{partition=0,timestamp=1586836800000}]}]} to node 184.
2020-04-15 11:48:46,467 TRACE org.apache.kafka.clients.NetworkClient - Completed receive from node 184, for key 2, received {responses=[{topic=TopicA,partition_responses=[{partition=0,error_code=0,timestamp=1586863210000,offset=47}]}]}
2020-04-15 11:48:46,467 TRACE org.apache.kafka.clients.consumer.internals.Fetcher - Received ListOffsetResponse {responses=[{topic=TopicA,partition_responses=[{partition=0,error_code=0,timestamp=1586863210000,offset=47}]}]} from broker server2:9092 (id: 184 rack: null)
2020-04-15 11:48:46,467 DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - Handling ListOffsetResponse response for TopicA-0. Fetched offset 47, timestamp 1586863210000
2020-04-15 11:48:46,448 TRACE org.apache.kafka.clients.consumer.internals.Fetcher - Sending ListOffsetRequest (type=ListOffsetRequest, replicaId=-1, partitionTimestamps={TopicA-2=1586836800000}, minVersion=1) to broker server3:9092 (id: 183 rack: null)
2020-04-15 11:48:46,465 TRACE org.apache.kafka.clients.NetworkClient - Sending {replica_id=-1,topics=[{topic=TopicA,partitions=[{partition=2,timestamp=1586836800000}]}]} to node 183.
2020-04-15 11:48:46,468 TRACE org.apache.kafka.clients.NetworkClient - Completed receive from node 183, for key 2, received {responses=[{topic=TopicA,partition_responses=[{partition=2,error_code=0,timestamp=-1,offset=-1}]}]}
2020-04-15 11:48:46,468 TRACE org.apache.kafka.clients.consumer.internals.Fetcher - Received ListOffsetResponse {responses=[{topic=TopicA,partition_responses=[{partition=2,error_code=0,timestamp=-1,offset=-1}]}]} from broker server3:9092 (id: 183 rack: null)
2020-04-15 11:48:46,468 DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - Handling ListOffsetResponse response for TopicA-2. Fetched offset -1, timestamp -1
2020-04-15 11:48:46,481 INFO org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase - Consumer subtask 0 will start reading the following 2 partitions from timestamp 1586836800000: [KafkaTopicPartition{topic='TopicA', partition=4}, KafkaTopicPartition{topic='TopicA', partition=0}]
From the log, every partition except partition-0 and partition-4 returns an offset of -1. Why is the returned offset -1 instead of the latest offset?
In the Kafka client's code (Fetcher.java, lines 674-680):
// Handle v1 and later response
log.debug("Handling ListOffsetResponse response for {}. Fetched offset {}, timestamp {}",
        topicPartition, partitionData.offset, partitionData.timestamp);
if (partitionData.offset != ListOffsetResponse.UNKNOWN_OFFSET) {
    OffsetData offsetData = new OffsetData(partitionData.offset, partitionData.timestamp);
    timestampOffsetMap.put(topicPartition, offsetData);
}
The value of ListOffsetResponse.UNKNOWN_OFFSET is -1, so the other 4 partitions are filtered out and the Kafka consumer will not consume data from them.
My Flink version is 1.9.2 and the Flink Kafka connector dependency is:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.10_2.11</artifactId>
    <version>1.9.2</version>
</dependency>
The documentation of the Flink Kafka connector says:
setStartFromTimestamp(long): Start from the specified timestamp. For
each partition, the record whose timestamp is larger than or equal to
the specified timestamp will be used as the start position. If a
partition’s latest record is earlier than the timestamp, the partition
will simply be read from the latest record.
Test program code:
import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
import org.junit.Test

class TestFlinkKafka {

  @Test
  def testFlinkKafkaDemo: Unit = {
    //1. set up the streaming execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)
    // To use fault tolerant Kafka Consumers, checkpointing needs to be enabled at the execution environment
    env.enableCheckpointing(60000)

    //2. kafka source
    val topic = "message"
    val schema = new SimpleStringSchema()
    //server1:9092,server2:9092,server3:9092
    val props = getKafkaConsumerProperties("localhost:9092", "flink-streaming-client", "latest")
    val consumer = new FlinkKafkaConsumer010(topic, schema, props)

    //consume data from a specific timestamp's offset
    //2020/4/14 20:0:0
    //consumer.setStartFromTimestamp(1586865600000L)
    //2020/4/15 20:0:0
    consumer.setStartFromTimestamp(1586952000000L)
    consumer.setCommitOffsetsOnCheckpoints(true)

    //3. transform
    val stream = env.addSource(consumer)
      .map(x => x)

    //4. sink
    stream.print()

    //5. execute
    env.execute("testFlinkKafkaConsumer")
  }

  def getKafkaConsumerProperties(brokerList: String, groupId: String, offsetReset: String): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", brokerList)
    props.setProperty("group.id", groupId)
    props.setProperty("auto.offset.reset", offsetReset)
    props.setProperty("flink.partition-discovery.interval-millis", "30000")
    props
  }
}
Set the log level for Kafka:
log4j.logger.org.apache.kafka=TRACE
Create the Kafka topic:
kafka-topics --zookeeper localhost:2181/kafka --create --topic message --partitions 6 --replication-factor 1
Send messages to the Kafka topic:
kafka-console-producer --broker-list localhost:9092 --topic message
{"name":"tom"}
{"name":"michael"}
This problem was resolved by upgrading the Flink/Kafka connector to the newer, universal connector -- FlinkKafkaConsumer -- available from flink-connector-kafka_2.11. This version of the connector is recommended for all versions of Kafka from 1.0.0 forward. With Kafka 0.10.x or 0.11.x, it is better to use the version-specific flink-connector-kafka-0.10_2.11 or flink-connector-kafka-0.11_2.11 connectors. (And in all cases, substitute 2.12 for 2.11 if you are using Scala 2.12.)
See the Flink documentation for more information on Flink's Kafka connector.
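For reference, the dependency change described above would look roughly like this (assuming Scala 2.11 and Flink 1.9.2, to match the versions in the question):

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_2.11</artifactId>
    <version>1.9.2</version>
</dependency>

In the test program, FlinkKafkaConsumer010 is then replaced with FlinkKafkaConsumer (same org.apache.flink.streaming.connectors.kafka package); the rest of the code stays the same.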
My spring-kafka consumer stops consuming messages after a while. The stoppage happens every time, but never after the same duration. When the app is no longer consuming, at the end of the log I always see the statement that the consumer sent a LEAVE_GROUP signal. If I am not seeing any errors or exceptions, why is the consumer leaving the group?
org.springframework.boot:spring-boot-starter-parent:2.0.4.RELEASE
spring-kafka:2.1.8.RELEASE
org.apache.kafka:kafka-clients:1.0.2
I've set logging as:
logging.level.org.apache.kafka=DEBUG
logging.level.org.springframework.kafka=INFO
Other settings:
spring.kafka.listener.concurrency=5
spring.kafka.listener.type=single
spring.kafka.listener.ack-mode=record
spring.kafka.listener.poll-timeout=10000
spring.kafka.consumer.heartbeat-interval=5000
spring.kafka.consumer.max-poll-records=50
spring.kafka.consumer.fetch-max-wait=10000
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.properties.security.protocol=SSL
spring.kafka.consumer.retry.maxAttempts=3
spring.kafka.consumer.retry.backoffperiod.millisecs=2000
ContainerFactory setup
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> recordsKafkaListenerContainerFactory(RetryTemplate retryTemplate) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    factory.setConcurrency(listenerCount);
    factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.RECORD);
    factory.getContainerProperties().setPollTimeout(pollTimeoutMillis);
    factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
    factory.getContainerProperties().setAckOnError(false);
    factory.setRetryTemplate(retryTemplate);
    factory.setStatefulRetry(true);
    factory.getContainerProperties().setIdleEventInterval(60000L);
    return factory;
}
Listener configuration
@Component
public class RecordsEventListener implements ConsumerSeekAware {

    private static final org.slf4j.Logger LOG = org.slf4j.LoggerFactory.getLogger(RecordsEventListener.class);

    // fields implied by the usages below (declarations omitted in the original snippet)
    private boolean isReplay;
    private boolean partitonSeekToBeginningDone;
    private BinaryExceptionClassifier binaryExceptionClassifier;

    @Value("${mode.replay:false}")
    public void setModeReplay(boolean enabled) {
        this.isReplay = enabled;
    }

    @KafkaListener(topics = "${event.topic}", containerFactory = "RecordsKafkaListenerContainerFactory")
    public void handleEvent(@Payload String payload) throws RecordsEventListenerException {
        try {
            //business logic
        } catch (Exception e) {
            LOG.error("Process error for event: {}", payload, e);
            if (isRetryableException(e)) {
                LOG.warn("Retryable exception detected. Going to retry.");
                throw new RecordsEventListenerException(e);
            } else {
                LOG.warn("Dropping event because non retryable exception");
            }
        }
    }

    private Boolean isRetryableException(Exception e) {
        return binaryExceptionClassifier.classify(e);
    }

    @Override
    public void registerSeekCallback(ConsumerSeekCallback callback) {
        //do nothing
    }

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        //do this only once per start of app
        if (isReplay && !partitonSeekToBeginningDone) {
            assignments.forEach((t, p) -> callback.seekToBeginning(t.topic(), t.partition()));
            partitonSeekToBeginningDone = true;
        }
    }

    @Override
    public void onIdleContainer(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        //do nothing
        LOG.info("Container is IDLE; no messages to pull.");
        assignments.forEach((t, p) -> LOG.info("Topic:{}, Partition:{}, Offset:{}", t.topic(), t.partition(), p));
    }

    boolean isPartitionSeekToBeginningDone() {
        return partitonSeekToBeginningDone;
    }

    void setPartitonSeekToBeginningDone(boolean partitonSeekToBeginningDone) {
        this.partitonSeekToBeginningDone = partitonSeekToBeginningDone;
    }
}
When the app is no longer consuming, at the end of the log I always see the statement that the consumer sent the LEAVE_GROUP signal:
2019-05-02 18:31:05.770 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending Heartbeat request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:05.770 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send HEARTBEAT {group_id=app,generation_id=6,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5347 to node 2147482638
2019-05-02 18:31:05.872 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Received successful Heartbeat response
2019-05-02 18:31:10.856 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending Heartbeat request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:10.857 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send HEARTBEAT {group_id=app,generation_id=6,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5348 to node 2147482638
2019-05-02 18:31:10.958 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Received successful Heartbeat response
2019-05-02 18:31:11.767 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Sending LeaveGroup request to coordinator x.x.x.com:9093 (id: 2147482638 rack: null)
2019-05-02 18:31:11.767 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-1, groupId=app] Using older server API v0 to send LEAVE_GROUP {group_id=app,member_id=consumer-1-98d28e69-b0b9-4c2b-82cd-731e53b74b87} with correlation id 5349 to node 2147482638
2019-05-02 18:31:11.768 DEBUG 9548 --- [kafka-coordinator-heartbeat-thread | app] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-1, groupId=app] Disabling heartbeat thread
Full log
Thanks to all who replied. It turns out it was indeed the broker dropping the consumer on session timeout. The broker, a very old version (0.10.0.1), did not support the newer features outlined in KIP-62 that the spring-kafka version we used could take advantage of.
Since we could not dictate an upgrade of the broker or changes to the session timeout, we simply modified our processing logic so that each task finishes within the session timeout.
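As an illustration of that constraint (not necessarily the change the author made): on such a setup the work done between two polls has to fit inside the session timeout, and one hypothetical lever is to shrink the batch handed to the listener on each poll, for example:

# hypothetical value: fewer records per poll means less processing time between polls,
# keeping each poll cycle well under the broker-enforced session timeout
spring.kafka.consumer.max-poll-records=10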