Logstash pipeline issues when sending to multiple Kafka topics - apache-kafka

I am using Logstash to extract change data from a SQL Server DB and send it to different Kafka topics.
Some Logstash config files send to the Ticket topic, others to the Availability topic.
If I run just the configs that send to the Ticket topic on their own in a pipeline, it works fine. If I run the configs for the Availability topic on their own in a pipeline, they also send the data OK.
However, when I include the configs for both topics together, I get errors. Please see the extract from the logs below. This time the Availability topic failed; other times the Ticket topic fails.
[2021-03-22T07:30:00,172][WARN ][org.apache.kafka.clients.NetworkClient][AvaililityDOWN] [Producer clientId=Avail_down1] Error while fetching metadata with correlation id 467 : {dcsvisionavailability=TOPIC_AUTHORIZATION_FAILED}
[2021-03-22T07:30:00,172][ERROR][org.apache.kafka.clients.Metadata][AvaililityDOWN] [Producer clientId=Avail_down1] Topic authorization failed for topics [dcsvisionavailability]
[2021-03-22T07:30:00,203][INFO ][logstash.inputs.jdbc ][Ticket1][a296a0df2f603fe98d8c108e860be4d7a17f840f9215bb90c5254647bb9c37cd] (0.004255s) SELECT sys.fn_cdc_map_lsn_to_time(__$start_lsn) transaction_date, abs(convert(bigint, __$seqval)) seqval, * FROM cdc.dbo_TICKET_CT where ( __$operation = 2 or __$operation = 4) and modified_date > '2021-03-22T07:27:00.169' order by modified_date ASC
[2021-03-22T07:30:00,203][INFO ][logstash.inputs.jdbc ][AvailabilityMAXUP][7805e7bd44f20b373e99845b687dc15d7c2a3de084fb4424dd492be93b39b64a] (0.004711s) With Logstash as(
SELECT sys.fn_cdc_map_lsn_to_time(__$start_lsn) transaction_date, abs(convert(bigint, __$seqval)) seqval, *
FROM cdc.dbo_A_TERM_MAX_UPTIME_DAY_CT
)
select * from Logstash
where ( __$operation = 2 or __$operation = 4 or __$operation = 1 ) and TMZONE = 'Etc/UTC' and transaction_date > '2021-03-22T07:15:00.157' order by seqval ASC
[2021-03-22T07:30:00,281][WARN ][org.apache.kafka.clients.NetworkClient][AvailabilityMAXUP] [Producer clientId=Avail_MaxUp1] Error while fetching metadata with correlation id 633 : {dcsvisionavailability=TOPIC_AUTHORIZATION_FAILED}
[2021-03-22T07:30:00,281][ERROR][org.apache.kafka.clients.Metadata][AvailabilityMAXUP] [Producer clientId=Avail_MaxUp1] Topic authorization failed for topics [dcsvisionavailability]
[2021-03-22T07:30:00,297][WARN ][org.apache.kafka.clients.NetworkClient][AvaililityDOWN] [Producer clientId=Avail_down1] Error while fetching metadata with correlation id 468 : {dcsvisionavailability=TOPIC_AUTHORIZATION_FAILED}
[2021-03-22T07:30:00,297][ERROR][org.apache.kafka.clients.Metadata][AvaililityDOWN] [Producer clientId=Avail_down1] Topic authorization failed for topics [dcsvisionavailability]
[2021-03-22T07:30:00,406][WARN ][org.apache.kafka.clients.NetworkClient][AvailabilityMAXUP] [Producer clientId=Avail_MaxUp1] Error while fetching metadata with correlation id 634 : {dcsvisionavailability=TOPIC_AUTHORIZATION_FAILED}
[2021-03-22T07:30:00,406][ERROR][org.apache.kafka.clients.Metadata][AvailabilityMAXUP] [Producer clientId=Avail_MaxUp1] Topic authorization failed for topics [dcsvisionavailability]
[2021-03-22T07:30:00,406][WARN ][logstash.outputs.kafka ][AvailabilityMAXUP][3685b3e90091e526485060db8df552a756f11f0f7fd344a5051e08b484a8ff8a] producer send failed, dropping record {:exception=>Java::OrgApacheKafkaCommonErrors::TopicAuthorizationException, :message=>"Not authorized to access topics: [dcsvisionavailability]", :record_value=>"<A_TERM_MAX_UPTIME_DAY>\
This is the output section of the Availability config:
output {
  kafka {
    bootstrap_servers => "namespaceurl.windows.net:9093"
    topic_id => "dcsvisionavailability"
    security_protocol => "SASL_SSL"
    sasl_mechanism => "PLAIN"
    jaas_path => "C:\Logstash\keys\kafka_sasl_jaasAVAILABILITY.java"
    client_id => "Avail_MaxUp1"
    codec => line {
      format => "<A_TERM_MAX_UPTIME_DAY>
      <stuff deleted>"
    }
  }
}
The pipelines.yml file has this in it:
## Ticket topic
- pipeline.id: Ticket1
  path.config: "TicketCT2KafkaEH8.conf"
  queue.type: persisted
- pipeline.id: PublicComments1
  path.config: "Public_DiaryCT2KafkaEH1.conf"
  queue.type: persisted
## Availability topic
- pipeline.id: AvailabilityDOWN
  path.config: "Availability_Down_TimeCT2KafkaEH3.conf"
  queue.type: persisted
- pipeline.id: AvailabilityMAXUP
  path.config: "Availability_Max_UptimeCT2KafkaEH2.conf"
  queue.type: persisted
I have tried running separate Logstash instances, and yes, that works: I keep the pipeline running, open another command window, and run one other config that sends to a different topic (specifying a different --path.data for the second instance), roughly as sketched below.
However, with 40 configs going to 4 different topics I don't really want to run that many instances in parallel. Any advice welcomed.
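For reference, the two-instance workaround looks roughly like this (the Windows paths and data directory name are just examples; the second config file name is taken from the pipeline above):
REM main pipeline, driven by config\pipelines.yml
C:\Logstash\bin\logstash.bat
REM second instance in another window, with its own data directory to avoid lock conflicts
C:\Logstash\bin\logstash.bat -f Availability_Down_TimeCT2KafkaEH3.conf --path.data C:\Logstash\data_availability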

I have been able to resolve the issue. It was to do with the jaas_path file.
I had a different jaas_path file for each topic, and each one specified the topic in the EntityPath at the end of the connection string, as follows:
KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="$ConnectionString"
  password="Endpoint=sb://<stuff redacted>.windows.net/;SharedAccessKeyName=keyname;SharedAccessKey=<key redacted>;EntityPath=dcsvisionavailability";
};
When I instead used a common namespace-level key from Event Hubs for all the topics, i.e. a connection string without the ;EntityPath=topicname suffix, it worked.
This makes sense, since I had already specified the topic with the line
topic_id => "dcsvisionavailability" in the Logstash conf file.
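For illustration, the working JAAS file is the same as before minus the EntityPath suffix (names redacted/illustrative as above):
KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="$ConnectionString"
  password="Endpoint=sb://<stuff redacted>.windows.net/;SharedAccessKeyName=keyname;SharedAccessKey=<key redacted>";
};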

I'm glad it worked, Walter.
For others who may look up a solution to this issue: the problem here was that the same SharedAccessKeyName was used with different tokens for different Kafka topics when authorizing against Event Hubs.
That is why it all worked fine when only requests targeting a single topic came in for authorization.
When requests for different topics reached Event Hubs at the same time with the same SharedAccessKeyName but different tokens, only one would go through and the other would fail with an exception because of the token conflict.
The options for solving this were:
1. use the same token for all requests (for all topics), or
2. use a different SharedAccessKeyName with its own token for each topic.
Walter chose #1, and the topic name therefore needed to be removed from the password (as Event Hubs might have had topic-wise authorization).
For topic-wise authorization, solution #2 is the better choice (sketched below).
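For completeness, option #2 would mean keeping one JAAS file per topic, each using a key created on that specific Event Hub (the key name below is purely illustrative):
KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="$ConnectionString"
  password="Endpoint=sb://<stuff redacted>.windows.net/;SharedAccessKeyName=availability-send;SharedAccessKey=<key redacted>;EntityPath=dcsvisionavailability";
};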

Related

How to set consumer config values for Kafka Mirrormaker-2 2.6.1?

I am attempting to use MirrorMaker 2 to replicate data between AWS Managed Kafka (MSK) clusters in two different AWS regions: one in eu-west-1 (CLOUD_EU) and the other in us-west-2 (CLOUD_NA), both running Kafka 2.6.1. For testing I am currently trying to replicate topics one way only, from EU -> NA.
I am starting a MirrorMaker Connect cluster using ./bin/connect-mirror-maker.sh and a properties file (included below).
This works fine for topics with small messages on them, but one of my topics has binary messages up to 20 MB in size. When I try to replicate that topic I get an error every 30 seconds:
[2022-04-21 13:47:05,268] INFO [Consumer clientId=consumer-29, groupId=null] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 2: {}. (org.apache.kafka.clients.FetchSessionHandler:481)
org.apache.kafka.common.errors.DisconnectException
When logging in DEBUG to get more information we get
[2022-04-21 13:47:05,267] DEBUG [Consumer clientId=consumer-29, groupId=null] Disconnecting from node 2 due to request timeout. (org.apache.kafka.clients.NetworkClient:784)
[2022-04-21 13:47:05,268] DEBUG [Consumer clientId=consumer-29, groupId=null] Cancelled request with header RequestHeader(apiKey=FETCH, apiVersion=11, clientId=consumer-29, correlationId=35) due to node 2 being disconnected (org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient:593)
It gets stuck in a loop constantly disconnecting with request timeout every 30s and then trying again.
Looking at this, I suspect that the problem is that request.timeout.ms is at its default (30 s) and the consumer times out trying to read the topic with many large messages.
I followed the guide at https://github.com/apache/kafka/tree/trunk/connect/mirror to try to configure the consumer properties. However, no matter what I set, the consumer timeout remains fixed at the default, confirmed both by Kafka printing its config in the log and by timing the interval between the disconnect messages. For example, in the properties file that I start MM-2 with I set:
CLOUD_EU.consumer.request.timeout.ms=120000
Based on various guides I have found while looking at this, I have also tried:
CLOUD_EU.request.timeout.ms=120000
CLOUD_EU.cluster.consumer.request.timeout.ms=120000
CLOUD_EU.consumer.override.request.timeout.ms=120000
CLOUD_EU.cluster.consumer.override.request.timeout.ms=120000
None of these have worked.
How can I change the consumer request.timeout setting? The log is approximately 10,000 lines long, but everywhere the ConsumerConfig is printed it shows request.timeout.ms = 30000.
Properties file I am using:
# specify any number of cluster aliases
clusters = CLOUD_EU, CLOUD_NA
# connection information for each cluster
CLOUD_EU.bootstrap.servers = kafka.eu-west-1.amazonaws.com:9092
CLOUD_NA.bootstrap.servers = kafka.us-west-2.amazonaws.com:9092
# enable and configure individual replication flows
CLOUD_EU->CLOUD_NA.enabled = true
CLOUD_EU->CLOUD_NA.topics = METRICS_ATTACHMENTS_OVERSIZE_EU
CLOUD_NA->CLOUD_EU.enabled = false
replication.factor=3
tasks.max = 1
############################# Internal Topic Settings #############################
checkpoints.topic.replication.factor=3
heartbeats.topic.replication.factor=3
offset-syncs.topic.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
config.storage.replication.factor=3
############################ Kafka Settings ###################################
# CLOUD_EU cluster overrides
CLOUD_EU.consumer.request.timeout.ms=120000
CLOUD_EU.consumer.session.timeout.ms=150000

Problems with Amazon MSK default configuration and publishing with transactions

Recently we have started doing some testing of our Kafka connectors to MSK, Amazon's managed Kafka service. Publishing records seems to work fine, however not when transactions are enabled.
Our cluster consists of 2 brokers (because we have 2 zones) using the default MSK configuration. We are creating our Java Kafka producer using the following properties:
bootstrap.servers=x.us-east-1.amazonaws.com:9094,y.us-east-1.amazonaws.com:9094
client.id=kafkautil
max.block.ms=5000
request.timeout.ms=5000
security.protocol=SSL
transactional.id=transactions
However, when the producer is started with the transactional.id setting, which enables transactions, the initTransactions() method hangs:
producer = new KafkaProducer<Object, Object>(kafkaProperties);
if (kafkaProperties.containsKey(ProducerConfig.TRANSACTIONAL_ID_CONFIG)) {
    // this hangs
    producer.initTransactions();
}
Looking at the log output, we see streams of the following, and it never seemed to time out:
TransactionManager - Enqueuing transactional request (type=FindCoordinatorRequest,
coordinatorKey=y, coordinatorType=TRANSACTION)
TransactionManager - Request (type=FindCoordinatorRequest, coordinatorKey=y,
coordinatorType=TRANSACTION) dequeued for sending
NetworkClient - Found least loaded node z:9094 (id: -2 rack: null) connected with no
in-flight requests
Sender - Sending transactional request (type=FindCoordinatorRequest, coordinatorKey=y,
coordinatorType=TRANSACTION) to node z (id: -2 rack: null)
NetworkClient - Sending FIND_COORDINATOR {coordinator_key=y,coordinator_type=1} with
correlation id 424 to node -2
NetworkClient - Completed receive from node -2 for FIND_COORDINATOR with
correlation id 424, received {throttle_time_ms=0,error_code=15,error_message=null,
coordinator={node_id=-1,host=,port=-1}}
TransactionManager LogContext.java:129 - Received transactional response
FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null',
error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null)) for request
(type=FindCoordinatorRequest, coordinatorKey=xxx, coordinatorType=TRANSACTION)
As far as I can determine, the broker is available and each of the hosts in the bootstrap.servers property is reachable. If I connect to each of them and publish without transactions, it works.
Any idea what we are missing?
However, when the producer is started with the transactional.id setting, which enables transactions, the initTransactions() method hangs:
This turned out to be a problem with the default AWS MSK properties and the number of brokers. If you create a Kafka cluster with fewer than 3 brokers, the following settings need to be adjusted.
These settings should be set (I think) to the number of brokers:
Property                                    Kafka Default   AWS Default   Description
default.replication.factor                  1               3             Default replication factor for automatically created topics.
min.insync.replicas                         1               2             Minimum number of replicas that must acknowledge a write for the write to be considered successful.
offsets.topic.replication.factor            3               3             Replication factor for the internal topic that stores consumer offsets.
transaction.state.log.replication.factor    3               3             Replication factor for the transaction topic.
Here are the Kafka docs on broker properties: https://kafka.apache.org/documentation/#brokerconfigs
Because we have 2 brokers, we ended up with:
default.replication.factor=2
min.insync.replicas=2
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=2
This seemed to resolve the issue. IMHO this is a real problem with AWS MSK and its default configuration; AWS should generate the default configuration and tune it depending on the number of brokers in the cluster.

Kafka streams fail on decoding timestamp metadata inside StreamTask

We got strange errors from Kafka Streams when starting our app:
java.lang.IllegalArgumentException: Illegal base64 character 7b
at java.base/java.util.Base64$Decoder.decode0(Base64.java:743)
at java.base/java.util.Base64$Decoder.decode(Base64.java:535)
at java.base/java.util.Base64$Decoder.decode(Base64.java:558)
at org.apache.kafka.streams.processor.internals.StreamTask.decodeTimestamp(StreamTask.java:985)
at org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:303)
at org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:265)
at org.apache.kafka.streams.processor.internals.AssignedTasks.initializeNewTasks(AssignedTasks.java:71)
at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:385)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:769)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
and, as a result, an error about the failed stream: ERROR KafkaStreams - stream-client [xxx] All stream threads have died. The instance will be in error state and should be closed.
According to the code inside org.apache.kafka.streams.processor.internals.StreamTask, the failure happened due to an error while decoding the timestamp metadata (StreamTask.decodeTimestamp()). It happened in prod, and we can't reproduce it in stage.
What could be the root cause of such errors?
Extra info: our app uses Kafka Streams and consumes messages from several Kafka brokers using the same application.id and state.dir (we are actually switching from one broker to another, but for some period we were connected to both brokers, so we had two Kafka Streams instances, one per broker). As I understand it, the consumer group lives on the broker side (so that shouldn't be a problem), but the state dir is on the client side. Maybe some race condition occurred because the same state.dir was used by two Kafka Streams instances? Could that be the root cause?
We use kafka-streams v.2.4.0, kafka-clients v.2.4.0, Kafka Broker v.1.1.1, with the following configs:
default.key.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
default.value.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
default.timestamp.extractor: org.apache.kafka.streams.processor.WallclockTimestampExtractor
default.deserialization.exception.handler: org.apache.kafka.streams.errors.LogAndContinueExceptionHandler
commit.interval.ms: 5000
num.stream.threads: 1
auto.offset.reset: latest
Finally, we figured out the root cause of the corrupted metadata for some consumer groups.
It was one of our internal monitoring tools (written with pykafka) that corrupted the metadata of temporarily inactive consumer groups.
The metadata was unencrypted and contained invalid data like the following: {"consumer_id": "", "hostname": "monitoring-xxx"}.
To see what exactly is stored in the consumer metadata, we could use the following code:
Map<String, Object> config = Map.of("group.id", "...", "bootstrap.servers", "...");
String topicName = "...";
Consumer<byte[], byte[]> kafkaConsumer = new KafkaConsumer<byte[], byte[]>(config, new ByteArrayDeserializer(), new ByteArrayDeserializer());
Set<TopicPartition> topicPartitions = kafkaConsumer.partitionsFor(topicName).stream()
        .map(partitionInfo -> new TopicPartition(topicName, partitionInfo.partition()))
        .collect(Collectors.toSet());
kafkaConsumer.committed(topicPartitions).forEach((key, value) ->
        System.out.println("Partition: " + key + " metadata: " + (value != null ? value.metadata() : null)));
Several options to fix the already corrupted metadata:
Change the consumer group to a new one. Be aware that you might lose or duplicate messages depending on the latest or earliest offset reset policy, so for some cases this option might not be acceptable.
Overwrite the metadata manually (the timestamp is encoded according to the logic inside StreamTask.decodeTimestamp()):
Map<TopicPartition, OffsetAndMetadata> updatedTopicPartitionToOffsetMetadataMap = kafkaConsumer.committed(topicPartitions).entrySet().stream()
        .collect(Collectors.toMap(Map.Entry::getKey, entry -> new OffsetAndMetadata(entry.getValue().offset(), "AQAAAXGhcf01")));
kafkaConsumer.commitSync(updatedTopicPartitionToOffsetMetadataMap);
Or specify the metadata as Af//////////, which means NO_TIMESTAMP in Kafka Streams.
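If you need to encode a specific timestamp instead of reusing the value above, a minimal sketch of the encoding (mirroring the version-byte-plus-long layout that StreamTask.decodeTimestamp() reads; the class name and timestamp are just examples) could look like this:
import java.nio.ByteBuffer;
import java.util.Base64;

public class StreamsMetadataEncoder {
    // Encodes commit metadata as a magic/version byte followed by the epoch-millis timestamp, Base64-encoded.
    public static String encode(long timestampMs) {
        ByteBuffer buffer = ByteBuffer.allocate(Byte.BYTES + Long.BYTES);
        buffer.put((byte) 1);          // metadata version expected by StreamTask
        buffer.putLong(timestampMs);   // partition time in epoch milliseconds
        return Base64.getEncoder().encodeToString(buffer.array());
    }

    public static void main(String[] args) {
        // Produces a string of the same shape as "AQAAAXGhcf01"
        System.out.println(encode(System.currentTimeMillis()));
    }
}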

Kafka - how to use @KafkaListener(topicPattern="${kafka.topics}") where property kafka.topics is 'sss.*'?

I'm trying to implement a Kafka consumer with topic names as a pattern, e.g. @KafkaListener(topicPattern="${kafka.topics}") where the property kafka.topics is 'sss.*'. Now when I send a message to topic 'sss.test' or any other topic name like 'sss.xyz' or 'sss.pqr', it throws the error below:
WARN o.apache.kafka.clients.NetworkClient - Error while fetching metadata with correlation id 12 : {sss.xyz-topic=LEADER_NOT_AVAILABLE}
I tried enabling listeners and advertised.listeners in the server.properties file, but when I restart Kafka it consumes messages from all of the old topics that were already tried. The moment I use a new topic name, it throws the above error.
Does Kafka not support pattern matching? Or is there some configuration I'm missing? Please suggest.

Kafka - Error while fetching metadata with correlation id - LEADER_NOT_AVAILABLE

I have set up a Kafka cluster locally, with three brokers using these properties:
broker.id=0
listeners=PLAINTEXT://:9092
broker.id=1
listeners=PLAINTEXT://:9091
broker.id=2
listeners=PLAINTEXT://:9090
Things were working fine, but now I am getting this error:
WARN Error while fetching metadata with correlation id 1 : {TRAIL_TOPIC=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
I am also trying to write messages via a Java-based client, and I get the error: unable to fetch metadata in 6000 ms.
I faced the same problem: the topic did not exist, and the broker configuration auto.create.topics.enable was set to false by default. I was using bin/connect-standalone, so I hadn't specified the topics I would use.
I changed this config to true and that solved my problem; see the snippet below.
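For reference, a minimal sketch of the change described above is the following line in each broker's server.properties (restart the brokers after changing it):
auto.create.topics.enable=true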