Toubleshooting ECS container logs going to sqs - amazon-ecs

I am trying to trouble shoot an issue. I have a set of logs from ecs that are to be sent to an sqs queue. The container logs show that the logs are being sent. However, when I poll the sqs queue none of the messages are received.
Any help with this would be greatly appreciated.
2021-10-06T22:16:30.330Z ERROR [input.s3] s3/collector.go:137 handleSQSMessage failed: json unmarshal sqs message body failed at offset 1 with syntax error: invalid character 'T' looking for beginning of value {"queue_url": "https://sqs.us-east-1.amazonaws.com/721086286010/filebeat-cloudtrail-sqs", "region": "us-east-1"}
This has been an ongoing issue with elastic.

Related

Spring Cloud Stream Kafka Binder Async Producer messages getting lost if broker goes down

I am using Spring Cloud Stream Kafka Binder to produce message into Kafka. I kept producer sync to false, enabled error channel for producer, unclean leader election true on server side.
spring.cloud.stream.kafka.bindings.outputChannelName.producer.sync: false
error-channel-enabled: true
unclean.leader.election.enabled: true
I subscribed to errorChannel, to log the message those are failed to send.
I saw for async producer till delivery.timeout.ms reached messages are getting retried for UNKNOWN_TOPIC_OR_PARTITION, NOT_LEADER_FOR_PARTITION, NETWORK_EXCEPTION errors if broker goes down. Not for all errors its retrying. And after delivery.timeout.ms exceeds records are getting expired and reached to errorChannel and printed in logs. But still I have some messages are getting lost, those are not visible in success or errorChannel. I don't find any related exception / error even after log level set to Debug.
Can someone faced same issue earlier?
My Query is:
How to track the lost messages?
Is there any specific configuration I am missing?
Is there any specific exception I need to search in logs?
Any suggestions are welcome.

Artemis - How to avoid TransactionRolledBackException for Non-Transactional session

I use live/backup with shared-storage, and I use a non-transacted JMS session. I always send one message, and I always receive one message then acknowledge and receive second message only after successful first acknowledge.
I got this exception in my non-transacted session:
Execution of JMS message listener failed. Caused by: [javax.jms.TransactionRolledBackException - AMQ219030: The transaction was rolled back on failover to a backup server]
javax.jms.TransactionRolledBackException: AMQ219030: The transaction was rolled back on failover to a backup server
at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.rollbackOnFailover(ClientSessionImpl.java:904)
at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.commit(ClientSessionImpl.java:927)
at org.apache.activemq.artemis.jms.client.ActiveMQMessage.acknowledge(ActiveMQMessage.java:719)
It happens because the session was marked as "rollbackOnly". I got this state after the following steps:
I use Spring-JMS. Consumer session works 24/7 (infinite loop session.receive())
The Master Node crashed, then the Master node was restarted
After recovery (After a couple of hours), I sent a message to the queue. The consumer read the message and throw Exception on acknowledge(because was marked as rollback-only)
I read message again (this is not very bad for my task) but Redelivery Count has not been increased
My consumer code:
onMessage(Message message) {
if (redeliveryCount(message) > 0){
processAsDublicate(message); // It's not invoked - it is error in my business logic.
}
}
I migrated from another broker and and I thought not to change the client logic
Question:
How to avoid TransactionRolledBackException for Non-Transactional session? If this is not possible i should change consumer code?
Thank you in Advance
UPDATE AFTER ANSWER:
https://github.com/apache/activemq-artemis/tree/2.14.0/examples/features/ha/replicated-failback
This example is not suitable for my case - I don't have non-acknowledged messages. I got this state after the following steps: 1) Restart server 2) consume message 3) acknoledge message
We use a broker for ~30 applications (24/7) ~ 200 consumers in total
For example, on the weekend we restart the JMS Broker
Will all consumers start getting this exception after consume new messages
(They don't have non-acknowledged messages)
The TransactionRolledBackException is expected as you can see in the replicated-failback example.
To prevent a consumer from receiving the same message more times, an idempotent consumer must be implemented, ie Apache Camel provides an Idempotent consumer component that would work with any JMS provider, see: http://camel.apache.org/idempotent-consumer.html

Kafka issue in kubernetes

When any message push to Kafka topic i'm seeing below message.
my message contains only 5 elements and it is plain text only it wont exceed 300 characters still seeing below error.
WARN [SocketServer brokerId=0] Unexpected error from /127.0.0.1; closing connection (org.apache.kafka.common.network.Selector)
org.apache.kafka.common.network.InvalidReceiveException: Invalid receive (size = 1920298859 larger than 504857600)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:104)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:381)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:342)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:609)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:541)
at org.apache.kafka.common.network.Selector.poll(Selector.java:467)
at kafka.network.Processor.poll(SocketServer.scala:689)
at kafka.network.Processor.run(SocketServer.scala:594)
at java.lang.Thread.run(Thread.java:748)
If no active consumer for long time around 8 to 10 hours kafka itself killing in Kubernetes.
I want some good article to setup Kafka cluster in Kubernetes any help appreciated.
1920298859 is a recognizable value, it's ruok decoded as a integer:
>>> struct.unpack("!I", struct.pack("!4s", "ruok".encode("UTF8")))
(1920298859,)
It looks like you have some Zookeeper tooling/healthcheck trying to connect to Kafka. The ruok command should only be sent to Zookeeper hosts. Kafka only accepts connections from Kafka clients.

Kafka Streams shutdown after IllegalStateException: No current assignment for partition

I have a Kafka Streams application that launches and runs successfully. We have 4 instances of the application running. Occasionally one of our instance of the application is legitimately killed which causes several rounds of rebalancing until the old node is replaced.
Sometimes during the rebalance, one ore more previously healthy nodes fail. The logs are indicating that the Streams application transitions into a PENDING_SHUTDOWN state directly after receiving the following exception:
java.lang.IllegalStateException: No current assignment for partition public.chat.message-28
at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:256)
at org.apache.kafka.clients.consumer.internals.SubscriptionState.resetFailed(SubscriptionState.java:418)
at org.apache.kafka.clients.consumer.internals.Fetcher$2.onFailure(Fetcher.java:621)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
at org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
at org.apache.kafka.clients.consumer.internals.RequestFutureAdapter.onFailure(RequestFutureAdapter.java:30)
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onFailure(RequestFuture.java:209)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
at org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:571)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:389)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:297)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:292)
at org.apache.kafka.clients.consumer.internals.Fetcher.getAllTopicMetadata(Fetcher.java:275)
at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1849)
at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1827)
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.refreshChangelogInfo(StoreChangelogReader.java:259)
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.initialize(StoreChangelogReader.java:133)
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:79)
at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:328)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:866)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:804)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:773)
Prior to this error we often seem to also recieve some informational logs reporting a disconnect exception:
Error sending fetch request (sessionId=568252460, epoch=7) to node 4: org.apache.kafka.common.errors.DisconnectException
I have a feeling the two are related but I'm unable to reason why at present.
Is anyone able to give me some hints as to what may be causing this issue and any possible solutions?
Additional Info:
Kafka 2.2.1
32 partitions spread evenly across the 4 worker nodes
StreamsConfig settings:
kafkaStreamProps.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 2);
kafkaStreamProps.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
kafkaStreamProps.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
kafkaStreamProps.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 120000);
kafkaStreamProps.put(StreamsConfig.TOPOLOGY_OPTIMIZATION, StreamsConfig.OPTIMIZE);
This looks like it could be related to https://issues.apache.org/jira/browse/KAFKA-9073, which has been fixed in Kafka Streams 2.3.2.
If you can't wait for that release, you could try creating a private build using the changeset from this pull request: https://github.com/apache/kafka/pull/7630/files

Kafka Console Producer sending message failed

I would like to ask anyone face before this kind of stuation?
Few days ago the kafka is able to work properly but today it starting having problem. The console producer is unable to send message and receive by console consumer. After few seconds it prompt :
" ERROR Error when sending message to topic test with key: null, value: 11 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms. "
Anyone can help? :'(
You'll see this if the broker you are bootstrapping from is not reachable. Try to telnet to the host and port you are trying to access from the producer. If that communication is OK, turn on debug logging by locating the tools-log4j.properties and changing the warn level to debug.