Kafka Transaction API - Multi cluster - apache-kafka

My consumer gets data from cluster A, which is not secured and uses PLAINTEXT communication.
Once the consumer receives a message from cluster A, the application splits it into 3 parts based on business logic and produces the parts to 3 topics on cluster B (SASL_SSL).
Consumer
-> Cluster A (PLAINTEXT)
-> Topic: raw-item
Split message into three parts
Producer
-> Cluster B (SASL_SSL)
-> Topics: item, price, inventory
If anything goes wrong with any of the 3 destination topics (item, price, inventory), then the entire transaction should be rolled back.
Are multi-cluster transactions supported by the Spring Kafka configuration?
I get the exception below:
2022-04-18 12:28:55.827 INFO 26312 --- [ntainer#0-0-C-1] o.a.k.clients.producer.KafkaProducer : [Producer clientId=multi-topic-tx-producer-1, transactionalId=tx-432eeb17-7af0-43a9-bfe9-d757234faca4raw-item-consumer-grp.OSMI_C02_CATALOG_MKPDOMAIN.0] Aborting incomplete transaction
2022-04-18 12:28:55.835 ERROR 26312 --- [ntainer#0-0-C-1] o.s.k.l.KafkaMessageListenerContainer : Authentication/Authorization Exception and no authExceptionRetryInterval set
org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: raw-item-consumer-grp
The consumer group raw-item-consumer-grp reads from cluster A, where no ACLs are configured, so why is it asking for group permission?
Is this because multi-cluster transactions are not supported by the Spring Kafka configuration?
The configuration below clearly shows that my consumer reads from cluster A and the producer sends data to the SASL_SSL-enabled cluster B.
spring:
  profiles: local
  kafka:
    consumer:
      bootstrap-servers: localhost:9192,localhost:9193,localhost:9194
      groupId: raw-item-consumer-grp
      client-id: raw-item-consumer-client
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      enable-auto-commit: false
      auto-offset-reset: earliest
      isolation-level: read-committed
    producer:
      client-id: multi-topic-tx-producer
      bootstrap-servers: localhost:9092,localhost:9093,localhost:9094
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
      ssl:
        trust-store-location: kafka.producer.truststore.jks
        trust-store-password: password
      transaction-id-prefix: tx-${random.uuid}
      properties:
        sasl:
          jaas:
            config: org.apache.kafka.common.security.scram.ScramLoginModule required username="rawitem-multitopic-sasl-producer" password="Dem12345";
          mechanism: SCRAM-SHA-512
        security:
          protocol: SASL_SSL
        ssl.endpoint.identification.algorithm:
        enable.idempotence: true
        acks: all
        retries: 10
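For reference, the consume-split-produce flow described above would look roughly like this with Spring Kafka annotations (a minimal sketch, not the actual application code: SplitResult and splitMessage are hypothetical placeholders, and it assumes a KafkaTransactionManager built from the cluster B producer factory is wired into the listener container so that the three sends share one transaction):

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
public class RawItemListener {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public RawItemListener(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @KafkaListener(topics = "raw-item", groupId = "raw-item-consumer-grp")
    public void onMessage(String payload) {
        // The three sends run inside the transaction started by the listener
        // container; an exception anywhere in this method aborts all of them.
        SplitResult parts = splitMessage(payload);
        kafkaTemplate.send("item", parts.item());
        kafkaTemplate.send("price", parts.price());
        kafkaTemplate.send("inventory", parts.inventory());
    }

    // Hypothetical placeholder for the business-specific splitting logic.
    private SplitResult splitMessage(String payload) {
        return new SplitResult(payload, payload, payload);
    }

    private record SplitResult(String item, String price, String inventory) {}
}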

Related

Kafka SSL Not streaming data to SSL Druid

I am new to Druid and trying to do Kafka (SSL) ingestion into an SSL-enabled Druid. Druid is running on HTTPS.
Kafka Version : 2.2.2
Druid Version : 0.18.1
Kafka SSL works, and I can confirm it using the producer and consumer scripts:
bin/kafka-console-producer.sh --broker-list kafka01:9093 --topic testssl --producer.config config/client.properties
bin/kafka-console-consumer.sh --bootstrap-server kafka01:9093 --topic testssl --consumer.config config/client.properties --from-beginning
The above works, so I can confirm that Kafka SSL is set up.
Druid SSL Configuration :
druid.enablePlaintextPort=false
druid.enableTlsPort=true
druid.server.https.keyStoreType=jks
druid.server.https.keyStorePath=.jks
druid.server.https.keyStorePassword=
druid.server.https.certAlias=
druid.client.https.protocol=TLSv1.2
druid.client.https.trustStoreType=jks
druid.client.https.trustStorePath=.jks
druid.client.https.trustStorePassword=
Kafka SSL configuration :
ssl.truststore.location=<location>.jks --- The same is used for druid also
ssl.truststore.password=<password>
ssl.keystore.location=<location>.jks --- The same is used for druid also
ssl.keystore.password=<password>
ssl.key.password=<password>
ssl.enabled.protocols=TLSv1.2
ssl.client.auth=none
ssl.endpoint.identification.algorithm=
security.protocol=SSL
My consumerProperties spec looks like this:
"consumerProperties": {
  "bootstrap.servers": "kafka01:9093",
  "security.protocol": "SSL",
  "ssl.enabled.protocols": "TLSv1.2",
  "ssl.endpoint.identification.algorithm": "",
  "group.id": "<group_name>",
  "ssl.keystore.type": "JKS",
  "ssl.keystore.location": "/datadrive/<location>.jks",
  "ssl.keystore.password": "<password>",
  "ssl.key.password": "<password>",
  "ssl.truststore.location": "/datadrive/<location>.jks",
  "ssl.truststore.password": "<password>",
  "ssl.truststore.type": "JKS"
}
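A quick way to sanity-check these consumer properties outside Druid is to use them with the plain Kafka Java client (a sketch under the same assumptions; the topic, paths and passwords are the placeholders from above):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SslConsumerCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka01:9093");
        props.put("group.id", "ssl-check");
        props.put("security.protocol", "SSL");
        props.put("ssl.enabled.protocols", "TLSv1.2");
        props.put("ssl.endpoint.identification.algorithm", "");
        props.put("ssl.keystore.type", "JKS");
        props.put("ssl.keystore.location", "/datadrive/<location>.jks");
        props.put("ssl.keystore.password", "<password>");
        props.put("ssl.key.password", "<password>");
        props.put("ssl.truststore.type", "JKS");
        props.put("ssl.truststore.location", "/datadrive/<location>.jks");
        props.put("ssl.truststore.password", "<password>");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest");

        // If this prints a non-zero count, the SSL consumer settings themselves
        // are likely fine and the problem is more likely on the Druid side.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("testssl"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
            System.out.println("Fetched " + records.count() + " records");
        }
    }
}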
After ingestion, the datasource gets created and the segments also get created, but with 0 rows.
And after some time I continuously get the following in the Druid logs:
[task-runner-0-priority-0] org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=consumer-1, groupId=kafka-supervisor-llhigfpg] Sending READ_COMMITTED IncrementalFetchRequest(toSend=(), toForget=(), implied=(testssl-0)) to broker kafka01:9093 (id: 0 rack: null)
And after some time, in coordinator-overlord.log, I get:
2020-08-03T16:51:42,881 DEBUG [JettyScheduler] org.eclipse.jetty.io.WriteFlusher - ignored: WriteFlusher#278a176a{IDLE}->null
java.util.concurrent.TimeoutException: Idle timeout expired: 300001/300000 ms
I am not sure what has gone wrong, and I could not find much about this issue online. Need help on this.
NOTE: When Druid is non-HTTPS and Kafka is not SSL-enabled, everything works fine.

Multiple instance with Spring Cloud Bus Kafka

My question is how to manage multiple instances with Spring Cloud Stream Kafka.
Let me explain: in a Spring Cloud Stream microservices context (Eureka, Config Server, Kafka), I want to have 2 instances of the same microservice. When I change a configuration in my Git repository, the Config Server (via a webhook) pushes a message into the Kafka topic.
If I use the same group-id in my microservice, only one of the two instances will receive the notification and reload its Spring context.
But I need to refresh all instances ...
So, to do that, I have configured a unique group-id: ${spring.application.name}.bus.${hostname}
It works well, but the problem is that each time I start a new instance of my service, it creates a new consumer group in Kafka. Now I have a lot of unused consumer groups.
[![consumers for a microservice][1]][1]
[1]: https://i.stack.imgur.com/6jIzx.png
Here is the Spring Cloud Stream configuration of my service:
spring:
  cloud:
    bus:
      destination: sys.spring-cloud-bus.refresh
      enabled: true
      refresh:
        enabled: true
      env:
        enabled: true
      trace:
        enabled: false
    stream:
      bindings:
        # Override spring cloud bus configuration with a specific binder named "bus"
        springCloudBusInput:
          binder: bus
          destination: sys.spring-cloud-bus.refresh
          content-type: application/json
          group: ${spring.application.name}.bus.${hostname}
        springCloudBusOutput:
          binder: bus
          destination: sys.spring-cloud-bus.refresh
          content-type: application/json
          group: ${spring.application.name}.bus.${hostname}
      binders:
        bus:
          type: kafka
          defaultCandidate: false
          environment:
            spring:
              cloud:
                stream:
                  kafka:
                    binder:
                      brokers: kafka-dev.hcuge.ch:9092
      kafka:
        streams:
          bindings:
            springCloudBusInput:
              consumer:
                startOffset: latest # Reset the offset to the latest value to avoid consuming configserver notifications on startup
                resetOffsets: true
How can I avoid creating so many consumer groups? Should I remove the old consumer groups in Kafka?
I think my solution is not the best way to do it, so if you have a better option, I'm interested ;)
Thank you
If you don't provide a group, bus will use a random group anyway.
The broker will eventually remove the unused groups according to its offsets.retention.minutes property (currently 7 days by default).
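If you do want to clean up stale groups yourself rather than waiting for the broker, the Kafka AdminClient can delete groups that have no active members (a sketch; the broker address is taken from the configuration above and the group name is a made-up example):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class StaleGroupCleanup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-dev.hcuge.ch:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Deleting a consumer group only succeeds when it has no active members.
            admin.deleteConsumerGroups(Collections.singletonList("my-service.bus.old-host"))
                 .all()
                 .get();
        }
    }
}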

Spring Cloud @StreamListener consumer not registering CONSUMER-ID, HOST and CLIENT-ID in consumer group

We have a Spring Cloud consumer to read messages from one Kafka topic. Following is the interface for the channel:
@Component
public interface CollectionStreams {

    String INPUT_REPORT = "report-in";
    String OUTPUT_REPORT = "report-out";

    @Input(INPUT_REPORT)
    SubscribableChannel inboundReport();

    @Output(OUTPUT_REPORT)
    MessageChannel outboundReportToJM();
}
The problem we are facing is that when describing the consumer group “report” we are not able to see the CONSUMER-ID, HOST and CLIENT-ID as expected.
[root@innolx131112 templates]# kubectl -n tmo-ccm exec kafka-test-client -- /usr/bin/kafka-consumer-groups --bootstrap-server kafka:9092 --describe -group report
Note: This will not show information about old Zookeeper-based consumers.
Consumer group 'report' has no active members.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
report 3 2 3 1 - - -
report 1 4 4 0 - - -
report 2 1 1 0 - - -
report 4 2 2 0 - - -
report 0 2 2 0 - - -
By the way, we are running our application as well as Kafka in Kubernetes.
Due to this issue we are not able to run multiple PODs of our application, as all PODs ...
And we have defined the method as follows to read messages from the topic:
@StreamListener(CollectionStreams.INPUT_REPORT)
//public void handleMessage(@Payload MessageT message) {
public void handleMessage(@Payload MessageT message, @Headers MessageHeaders msg) {
Following is the configuration YAML:
cloud:
  stream:
    kafka:
      binder:
        brokers: kafka
        autoCreateTopics: false
      bindings:
        report-in:
          consumer:
            autoCommitOffset: false
            autoCommit: false
            auto-offset-reset: earliest
            autoCommitOnError: false
            resetOffsets: false
            autoRebalanceEnabled: false
            ackEachRecord: false
    bindings:
      report-in:
        destination: report
        contentType: application/json
        group: report
        consumer:
          concurrency: 5
          partitioned: true
      report-out:
        destination: jobmanager
        contentType: application/json
        group: jobmanager
        producer:
          autoAddPartitions: true
We also have another consumer for which we have not set any Kafka-related consumer props, and surprisingly those consumers register themselves properly.
Config:
cloud:
  stream:
    kafka:
      binder:
        autoCreateTopics: false
        brokers: kafka
    bindings:
      parse-in:
        destination: parser
        contentType: application/json
        group: parser
        consumer:
          concurrency: 3
          partitioned: true
      parse-out:
        destination: jobmanager
        contentType: application/json
        group: jobmanager
        producer:
          partitionKeyExpression: headers['contentType']
          autoAddPartitions: true
And the describe command output is as below:
[root@innolx131112 shyama]# kubectl -n tmo-ccm exec kafka-test-client -- /usr/bin/kafka-consumer-groups --bootstrap-server kafka:9092 --describe -group parser
Note: This will not show information about old Zookeeper-based consumers.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
parser 0 14 14 0 consumer-3-cb99e45e-21b3-4efb-ac7b-4f642a9486be /192.168.0.26 consumer-3
parser 1 13 13 0 consumer-3-cb99e45e-21b3-4efb-ac7b-4f642a9486be /192.168.0.26 consumer-3
parser 2 15 15 0 consumer-4-bec1e4af-d771-47fe-83ad-3440f3a6d4bd /192.168.0.26 consumer-4
parser 3 15 15 0 consumer-4-bec1e4af-d771-47fe-83ad-3440f3a6d4bd /192.168.0.26 consumer-4
parser 4 12 12 0 consumer-5-b9ac7e36-58cf-40cb-b37d-a0fa092a0d56 /192.168.0.26 consumer-5
Is it because we are providing Kafka-related consumer props for the 1st consumer (the report consumer) that it is not able to register?
Consumer group 'report' has no active members.
This simply means your app wasn't running when you entered the command.
That info is transient and not retained when the app stops.
The partitions might be assigned to a different instance next time.
EDIT
whenever we are instantiating more consumers using the same group, they are processing the same old messages already processed by the old consumers
Well, looking at your configuration more carefully, that's exactly what you asked for...
autoRebalanceEnabled: false
... means that Kafka will not use group management and the partitions will be allocated by Spring Cloud Stream.
autoCommitOffset: false
means that Spring will not commit any offsets and that is the responsibility of your application. If you don't commit the offset, you will get the behavior you observe.
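With autoCommitOffset: false, the offset has to be acknowledged in the listener itself. A minimal sketch of what that looks like with the Kafka binder's Acknowledgment header, assuming the report-in binding above (the payload type is simplified to String here; the question uses MessageT):

import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.messaging.handler.annotation.Payload;
import org.springframework.stereotype.Component;

@Component
public class ReportListener {

    @StreamListener(CollectionStreams.INPUT_REPORT)
    public void handleMessage(@Payload String message,
                              @Header(KafkaHeaders.ACKNOWLEDGMENT) Acknowledgment ack) {
        // ... process the message ...
        // With autoCommitOffset: false the binder commits nothing on its own;
        // the offset is only committed when the listener acknowledges it.
        ack.acknowledge();
    }
}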

Producer Partition Count override not effective

Interpreting this - https://docs.spring.io/spring-cloud-stream/docs/current/reference/htmlsingle/#_producer_properties
my understanding is that if the partitionCount override is less than the actual number of partitions on an existing Kafka topic, then the producer should use the actual number of partitions rather than the override value. My experience is that the producer uses the partitionCount value regardless of how many partitions (> partitionCount) are actually configured on the Kafka topic.
Ideally, I would like the producer to read the number of partitions on a pre-configured topic from kafka and write messages across all available partitions.
Spring-Cloud version: Finchley.RELEASE
Kafka Broker version : 1.0.0
application.yml:
spring:
  application:
    name: my-app
  cloud:
    stream:
      default:
        contentType: application/json
      kafka:
        binder:
          brokers:
            - ${KAFKA_HOST}:${KAFKA_PORT}
          auto-create-topics: false
      bindings:
        input-channel:
          destination: input-topic
          contentType: application/json
          group: input-group
        output-channel:
          destination: output-topic
          contentType: application/json
          producer:
            partition-count: 2
            partition-key-expression: payload['Id']
So, I am expecting that if the output topic is already configured with 6 partitions, the producer will recognise this and write to all of those. Could someone please verify my interpretation above? Or point out what I am missing to get the desired functionality?
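For reference, one way to see how many partitions the broker actually reports for the output topic is to ask the plain Kafka producer client directly (a sketch; the broker address and topic name are placeholders based on the configuration above):

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.serialization.StringSerializer;

public class PartitionCountCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Compare this number against the partition-count override in the binding above.
            List<PartitionInfo> partitions = producer.partitionsFor("output-topic");
            System.out.println("Broker reports " + partitions.size() + " partitions");
        }
    }
}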

Kafka messages are reprocessed

We have micro-services that produce and consume messages from Kafka using spring-boot and spring-cloud-stream.
versions:
spring-boot: 1.5.8.RELEASE
spring-cloud-stream: Ditmars.RELEASE
Kafka server: kafka_2.11-1.0.0
EDIT:
We are working in a Kubernetes environment using StatefulSets, with a cluster of 3 Kafka nodes and a cluster of 3 Zookeeper nodes.
We experienced several occurrences of old messages being reprocessed even though those messages were already processed a few days ago.
Several notes:
Before that happened, the following logs were printed (there are more similar lines; this is just a summary):
Revoking previously assigned partitions [] for group enrollment-service
Discovered coordinator dev-kafka-1.kube1.iaas.watercorp.com:9092 (id: 2147483646 rack: null)
Successfully joined group enrollment-service with generation 320
The above-mentioned incidents of revoking and reassigning partitions happen every few hours, and only in a few of those incidents are old messages re-consumed. In most cases the reassignment doesn't trigger message consumption.
The messages are from different partitions.
There is more than 1 message per partition being reprocessed.
application.yml:
spring:
  cloud:
    stream:
      kafka:
        binder:
          brokers: kafka
          defaultBrokerPort: 9092
          zkNodes: zookeeper
          defaultZkPort: 2181
          minPartitionCount: 2
          replicationFactor: 1
          autoCreateTopics: true
          autoAddPartitions: true
          headers: type,message_id
          requiredAcks: 1
          configuration:
            "[security.protocol]": PLAINTEXT #TODO: This is a workaround. Should be security.protocol
        bindings:
          user-enrollment-input:
            consumer:
              autoRebalanceEnabled: true
              autoCommitOnError: true
              enableDlq: true
          user-input:
            consumer:
              autoRebalanceEnabled: true
              autoCommitOnError: true
              enableDlq: true
          enrollment-mail-output:
            producer:
              sync: true
              configuration:
                retries: 10000
          enroll-users-output:
            producer:
              sync: true
              configuration:
                retries: 10000
      default:
        binder: kafka
        contentType: application/json
        group: enrollment-service
        consumer:
          maxAttempts: 1
        producer:
          partitionKeyExtractorClass: com.watercorp.messaging.PartitionKeyExtractor
      bindings:
        user-enrollment-input:
          destination: enroll-users
          consumer:
            concurrency: 10
            partitioned: true
        user-input:
          destination: user
          consumer:
            concurrency: 5
            partitioned: true
        enrollment-mail-output:
          destination: send-enrollment-mail
          producer:
            partitionCount: 10
        enroll-users-output:
          destination: enroll-users
          producer:
            partitionCount: 10
Is there any configuration that I might be missing? What can cause this behavior?
So the actual problem is the one that is described in the following ticket: https://issues.apache.org/jira/browse/KAFKA-3806.
Using the suggested workaround fixed it.