Producer timeout while sending requests on ActiveMQ Artemis broker - activemq-artemis

I am trying to introduce a timeout on my producer when sending messages to an ActiveMQ Artemis 2.17.0 broker. I use the following code for this purpose:
@Bean
public ConnectionFactory jmsConnectionFactoryOnline() {
    ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory(brokerUrl, username, password);
    connectionFactory.setCallTimeout(5000);
    return connectionFactory;
}

@Bean
public JmsPoolConnectionFactory pooledConnectionFactory() {
    JmsPoolConnectionFactory poolingFactory = new JmsPoolConnectionFactory();
    poolingFactory.setConnectionFactory(jmsConnectionFactoryOnline());
    poolingFactory.setMaxConnections(MAX_CONNECTIONS);
    poolingFactory.setMaxSessionsPerConnection(MAX_SESSIONS_PER_CONNECTION);
    poolingFactory.setConnectionIdleTimeout(0);
    return poolingFactory;
}
When I simulate a loss of network connectivity on the ActiveMQ Artemis node using iptables -A INPUT -s ip_producer -j DROP, I can see on the producer side that existing connections honor the 5-second timeout initially set.
Unfortunately, subsequent requests (new connections) seem to ignore it, since I observe the producer waiting up to the connection timeout interval (60 seconds) before declaring the request broken.
Can you guide me on how to resolve this, or point out what I am doing wrong that prevents me from setting a timeout on my producer?

The connectionTTL attribute of ActiveMQConnectionFactory determines how long a connection is kept alive in the absence of any data arriving, and its default value is 60 seconds (60000 ms).
Setting the connectionTTL attribute to a value lower than 60000 will reduce the connection timeout.
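For example, here is a minimal sketch of the connection factory bean from the question with a lower connection TTL. The concrete values (10000 ms TTL, 2000 ms failure check period) are assumptions for illustration, not values taken from the question or the answer:
@Bean
public ConnectionFactory jmsConnectionFactoryOnline() {
    ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory(brokerUrl, username, password);
    connectionFactory.setCallTimeout(5000);
    // Assumed example values: consider the connection dead after 10 s without data
    // (default 60000 ms) and check it every 2 s (default 30000 ms).
    connectionFactory.setConnectionTTL(10000);
    connectionFactory.setClientFailureCheckPeriod(2000);
    return connectionFactory;
}
The same connectionTTL setting can also be supplied as a parameter on the broker URL passed to the factory.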

Related

Artemis - How to avoid TransactionRolledBackException for Non-Transactional session

I use live/backup with shared storage, and I use a non-transacted JMS session. I always send one message, and I always receive one message, acknowledge it, and only receive the second message after the first acknowledge succeeds.
I got this exception in my non-transacted session:
Execution of JMS message listener failed. Caused by: [javax.jms.TransactionRolledBackException - AMQ219030: The transaction was rolled back on failover to a backup server]
javax.jms.TransactionRolledBackException: AMQ219030: The transaction was rolled back on failover to a backup server
at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.rollbackOnFailover(ClientSessionImpl.java:904)
at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.commit(ClientSessionImpl.java:927)
at org.apache.activemq.artemis.jms.client.ActiveMQMessage.acknowledge(ActiveMQMessage.java:719)
It happens because the session was marked as "rollbackOnly". I got into this state after the following steps:
I use Spring JMS. The consumer session works 24/7 (an infinite loop of session.receive()).
The master node crashed, then the master node was restarted.
After recovery (a couple of hours later), I sent a message to the queue. The consumer read the message and threw an exception on acknowledge (because the session was marked as rollback-only).
I read the message again (this is not too bad for my task), but the redelivery count was not increased.
My consumer code:
public void onMessage(Message message) {
    if (redeliveryCount(message) > 0) {
        processAsDuplicate(message); // never invoked - this is an error for my business logic
    }
}
I migrated from another broker and I thought I would not have to change the client logic.
Question:
How can I avoid the TransactionRolledBackException for a non-transactional session? If this is not possible, should I change the consumer code?
Thank you in advance.
UPDATE AFTER ANSWER:
https://github.com/apache/activemq-artemis/tree/2.14.0/examples/features/ha/replicated-failback
This example is not suitable for my case - I don't have unacknowledged messages. I got into this state after the following steps: 1) restart the server, 2) consume a message, 3) acknowledge the message.
We use one broker for ~30 applications (24/7), ~200 consumers in total.
For example, on the weekend we restart the JMS broker.
Will all consumers start getting this exception after consuming new messages
(even though they have no unacknowledged messages)?
The TransactionRolledBackException is expected, as you can see in the replicated-failback example.
To prevent a consumer from processing the same message more than once, an idempotent consumer must be implemented; e.g., Apache Camel provides an Idempotent Consumer component that works with any JMS provider, see: http://camel.apache.org/idempotent-consumer.html
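As a rough illustration of the idea without Camel, here is a minimal sketch of an idempotent JMS listener that tracks already-processed JMSMessageIDs. The in-memory set and the process() method are assumptions for the example; a real deployment would need a persistent or shared idempotent repository so duplicates are still detected after a restart:
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;

public class IdempotentListener implements MessageListener {

    // Assumed in-memory store of processed message IDs (not durable across restarts).
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    @Override
    public void onMessage(Message message) {
        try {
            // add() returns false if the ID was already seen, i.e. this is a redelivery.
            if (!processedIds.add(message.getJMSMessageID())) {
                message.acknowledge(); // drop the duplicate without re-running business logic
                return;
            }
            process(message);      // hypothetical business method
            message.acknowledge(); // CLIENT_ACKNOWLEDGE mode assumed
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }

    private void process(Message message) {
        // business logic goes here
    }
}
With this pattern the duplicate is detected by message ID rather than by the redelivery count, which, as noted in the question, is not increased in this failover scenario.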

Kafka producer dealing with lost connection to broker

With a producer configuration like below, I am creating a Singleton producer that is used throughout the application:
properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.consul1:9092,kafka.consul2:9092");
properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.ACKS_CONFIG, "1");
I am connected to a k8s-hosted Kafka cluster. The broker's advertised.listeners is configured to return IP addresses and not host names. While normally everything works as expected, the problem occurs when Kafka is restarted: sometimes the IP address changes. Since the producer only knows about the old IP, it keeps trying to connect to that host to send messages and none of the messages go through.
I observe that an org.apache.kafka.common.errors.TimeoutException is thrown when the send fails. Currently the messages are sent asynchronously:
producer.send(data,
    (RecordMetadata recordMetadata, Exception e) -> {
        if (e != null) {
            LOGGER.error("Exception while sending message to kafka", e);
        }
    });
How should the TimeoutException be handled now? Given that the producer is shared across the application, closing and recreating it in the callback does not sound right.
According to the JavaDocs of the Callback interface, the TimeoutException is a retriable exception that could be handled by increasing the number of retries of the producer.
In the Kafka documentation you will find details on the retries configuration:
retries (Default 0): Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error. Note that this retry is no different than if the client resent the record upon receiving the error. Allowing retries without setting max.in.flight.requests.per.connection to 1 will potentially change the ordering of records because if two batches are sent to a single partition, and the first fails and is retried but the second succeeds, then the records in the second batch may appear first.
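As a sketch, the retries-based approach could look like the following, continuing the Properties object from the question. The concrete values are assumptions for illustration, not recommendations from the answer:
// Retry transient failures a few times instead of failing the send immediately.
properties.setProperty(ProducerConfig.RETRIES_CONFIG, "5");
// Wait between retry attempts.
properties.setProperty(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "500");
// Keep at most one in-flight request per connection so retries cannot reorder records,
// as the quoted documentation warns.
properties.setProperty(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");
Because the producer is a shared singleton, adjusting its configuration is less intrusive than closing and recreating it inside the callback.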

Kafka retries concept - on what basis will retries be stopped in Kafka?

As I am new to Kafka, I am trying to understand the retries concept. On what basis is the retry process considered complete?
For example, say we set the retries parameter to 7. My questions are:
Will Kafka retry all 7 times?
Will it retry until the process succeeds? If so, how does Kafka know that it succeeded?
If this depends on some other parameter, what is that parameter and how does it work?
In distributed systems, retries are inevitable. From network errors to replication issues and even outages in downstream dependencies, services operating at a massive scale must be prepared to encounter, identify, and handle failure as gracefully as possible.
Kafka will retry until the initiated process completes successfully or the retry count reaches zero.
Kafka maintains the status of each API call (producer, consumer, and Streams), and if an error condition is met, the retry count is decreased.
Please go through the completeBatch function of Sender.java at the following URL for more information.
https://github.com/apache/kafka/blob/68ac551966e2be5b13adb2f703a01211e6f7a34b/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java
I guess you are talking about the producer retrying to send failed messages.
From the Kafka producer retries property documentation:
"Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error."
This means that the Kafka producer will retry if the error it encountered is considered retriable. Not all errors are retriable - for example, if the target Kafka topic does not exist, there's no point in trying to send the message again.
But if, for example, the connection was interrupted, it makes sense to try again.
Important to note - retries are only relevant if you have set acks != 0.
So, in your example you have 7 retries configured.
I assume that acks is set to a value different from 0, because otherwise no retries will be attempted.
If your message failed with a non-retriable error, the Kafka producer will not try to send the message again (it will actually 'give up' on that message and move on to the next messages).
If your message failed with a retriable error, the Kafka producer will retry sending until the message is successfully sent, or until the retries are exhausted (when 7 retries were attempted and none of them succeeded).
The Kafka producer client knows when your message was successfully sent to the broker because when acks is set to 1 or all, the Kafka broker "acknowledges" every message received and informs the producer (a kind of handshake between the producer and the broker).
See acks & retries at https://kafka.apache.org/documentation/#producerconfigs
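To illustrate the retriable vs. non-retriable distinction in client code, here is a minimal sketch of a send callback that inspects the exception type. The logging statements are placeholders and the surrounding producer setup is assumed:
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.errors.RetriableException;

Callback callback = (RecordMetadata metadata, Exception e) -> {
    if (e == null) {
        return; // the broker acknowledged the record
    }
    if (e instanceof RetriableException) {
        // Transient failure: the producer has already retried up to the configured
        // retries; reaching the callback means they were exhausted.
        System.err.println("Retriable error after exhausting retries: " + e);
    } else {
        // Non-retriable failure (e.g. record too large): retrying will not help.
        System.err.println("Fatal send error: " + e);
    }
};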
Kafka retries happen for transient exceptions such as NotEnoughReplicasException.
In Kafka versions <= 2.0 the default retries is 0.
In Kafka versions > 2.0 the default retries is Integer.MAX_VALUE.
From Kafka 2.1, retries are bounded by timeouts; there are a couple of relevant producer configurations:
delivery.timeout.ms = 120000 ms - by default the producer will retry for 2 minutes; if the retries are not successful within 2 minutes, the request is not sent to the broker and the failure has to be handled manually.
retry.backoff.ms = 100 ms - by default the producer retries every 100 ms until delivery.timeout.ms is reached.
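A minimal sketch of these timeout-bounded settings, restating the defaults mentioned above as explicit configuration (the surrounding producer setup is assumed):
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties props = new Properties();
// Upper bound on the total time to deliver a record, including all retries (default 120000 ms).
props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000);
// Wait between retry attempts (default 100 ms).
props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 100);
// With the delivery timeout as the bound, retries can stay at its large default.
props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);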

Kafka Producer error Expiring 10 record(s) for TOPIC:XXXXXX: 6686 ms has passed since batch creation plus linger time

Kafka version: 0.10.2.1
Kafka Producer error Expiring 10 record(s) for TOPIC:XXXXXX: 6686 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 10 record(s) for TOPIC:XXXXXX: 6686 ms has passed since batch creation plus linger time
This exception occurs because you are queueing records at a much faster rate than they can be sent.
When you call the send method, the ProducerRecord is stored in an internal buffer for sending to the broker. The method returns immediately once the ProducerRecord has been buffered, regardless of whether it has been sent.
Records are grouped into batches for sending to the broker, to reduce the transport overhead per message and increase throughput.
Once a record is added to a batch, there is a time limit for sending that batch, to ensure that it is sent within a specified duration. This is controlled by the producer configuration parameter request.timeout.ms, which defaults to 30 seconds. See related answer.
If the batch has been queued longer than the timeout limit, the exception is thrown, and the records in that batch are removed from the send queue.
The producer configs block.on.buffer.full, metadata.fetch.timeout.ms and timeout.ms have been removed. They were initially deprecated in Kafka 0.9.0.0.
Therefore, try increasing request.timeout.ms.
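For instance, a minimal sketch for the 0.10.x producer described in the question; the 60-second value is an arbitrary example, not a recommendation:
// Continuing the producer Properties from the question (Kafka 0.10.2.1 client).
// Allow more time before a queued batch is expired (default 30000 ms). In this client
// version the expiry window is roughly request.timeout.ms plus linger.ms, which is why
// the error message mentions "batch creation plus linger time".
props.put("request.timeout.ms", "60000");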
Still, if you have any problem related to throughput, you can also refer to the following blog.
This issue arises when either the brokers/topics/partitions cannot be contacted by the producer, or the producer times out before the queued records are sent.
I found that you can encounter this issue even with live brokers. In my case, the topic partition leaders were pointing to inactive broker ids. To fix this, you have to migrate those leaders to active brokers.
Use the topic reassignment tool for the impacted topics.
Topic migration: https://kafka.apache.org/21/documentation.html#basic_ops_automigrate
I had the same message and I fixed it by cleaning the Kafka data from ZooKeeper. After that it worked.
I faced the same issue in an AKS cluster; simply restarting the Kafka and ZooKeeper servers resolved it.
FOR KAFKA DOCKER CASE
I spent a lot of time finding out what happened, including changing server.properties, producer.properties and my code (Eclipse). That did not work for me (I send messages from my laptop to Kafka running in Docker on a Linux server).
I cleaned Kafka and ZooKeeper and reinstalled them via docker-compose.yml (I'm a newbie). Please look at my docker-compose.yml file and note how I changed the IPs to my Linux server's IP:
[screenshots: the bitnami/kafka and wurstmeister/kafka docker-compose.yml files before and after the change, with the advertised listener address set to 10.5.1.30, my Linux server's IP address]
After that, I ran my code, and here is the result:
[screenshot: output showing the message was sent successfully]
full code:
import java.util.Properties;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SimpleProducer {
    public static void main(String[] args) throws Exception {
        try {
            String topicName = "demo";
            Properties props = new Properties();
            props.put("bootstrap.servers", "10.5.1.30:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            Producer<String, String> producer = new KafkaProducer<String, String>(props);
            Future<RecordMetadata> f = producer.send(new ProducerRecord<String, String>(topicName, "Eclipse3"));
            System.out.println("Message sent successfully, total of message is: " + f.get().toString());
            producer.close();
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
        System.out.println("Successful");
    }
}
Hope that helps. Peace !!!
Say a topic has 100 partitions (0-99). Kafka lets you produce records to a topic by specifying a particular partition. I faced this issue when trying to produce to a partition > 99, because the brokers reject those records.
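A minimal sketch of guarding against this, assuming a producer and a topic that already exist in the surrounding code ("my-topic" and the partition number are made up for illustration):
import java.util.List;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.PartitionInfo;

// Check the real partition count before producing to an explicit partition.
List<PartitionInfo> partitions = producer.partitionsFor("my-topic");
int partition = 150; // would be out of range for a 100-partition topic
if (partition < partitions.size()) {
    producer.send(new ProducerRecord<>("my-topic", partition, "key", "value"));
} else {
    // Records sent to a non-existent partition are rejected by the broker and the
    // batch eventually expires with a TimeoutException on the client side.
    System.err.println("Partition " + partition + " does not exist for my-topic");
}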
We tried everything, but no luck:
Decreased the producer batch size and increased request.timeout.ms.
Restarted the target Kafka cluster - still no luck.
Checked replication on the target Kafka cluster - that was working fine as well.
Added retries and retry.backoff.ms to the producer properties.
Added linger.ms to the Kafka producer properties as well.
Finally, in our case there was an issue with the Kafka cluster itself: we were unable to fetch metadata between 2 of the servers.
When we changed the target Kafka cluster to our dev box, it worked fine.

Is there any way to check if kafka is up and running from kafka-net

I am using the kafka-net client to send messages to Kafka. I'm just wondering if there is any way to check whether the Kafka server is up and can receive messages. I shut Kafka down, but the producer was still created successfully and SendMessageAsync just freezes for quite a long time. I've tried passing a timeout, but it doesn't change anything. I use kafka-net 0.9.
It works just fine when the Kafka server is up and running.
The broker's id is registered in ZooKeeper (/brokers/ids/[brokerId]) as an ephemeral node, which allows other brokers and consumers to detect failures. (Right now the definition of health is fairly naive: if it is registered in ZooKeeper under /brokers/ids/[brokerId], the broker is healthy; otherwise it is dead.)
The ZooKeeper ephemeral node exists as long as the broker's session is active.
You could check whether a broker is up via ZkUtils.getSortedBrokerList(zkClient), which returns all active broker ids under /brokers/ids:
import org.I0Itec.zkclient.ZkClient;
import kafka.utils.ZkUtils;
import kafka.utils.ZKStringSerializer$;

ZkClient zkClient = new ZkClient(properties.getProperty("zkQuorum"), zkSessionTimeout, zkConnectionTimeout, ZKStringSerializer$.MODULE$);
ZkUtils.getSortedBrokerList(zkClient);
Reference
Kafka data structures in Zookeeper
Try this.
In your constructor, put
options = new KafkaOptions(uri);
var endpoint = new DefaultKafkaConnectionFactory().Resolve(options.KafkaServerUri.First(), options.Log);
client = new KafkaTcpSocket(new DefaultTraceLog(), endpoint);
and then before you send each message,
// test if the broker is alive
var request = new MetadataRequest { Topics = new List<string>() { Topic } };
var task1 = client.WriteAsync(request.Encode()).ConfigureAwait(false);
Task<KafkaDataPayload> task2 = Task.Factory.StartNew(() => task1.GetAwaiter().GetResult());
if (task2.Wait(30000) == false)
{
    throw new TimeoutException("Timeout while sending message to kafka broker!");
}
If you have a high volume of messages, this is going to be a performance hit, but with a low volume of messages it shouldn't matter.