How to unlock ActiveMQ Artemis broker - activemq-artemis

I did something to lock my ActiveMQ Artemis 2.8.1 broker. I needed to run ./artemis data exp to get data on my queue setup. It failed to run, giving an error saying that the broker was locked: /var/lib/[broker]/lock
So I stopped the broker and ran the data exp successfully, but now when I try to start the broker I get the same error, and I don't know how to stop whatever was started by data exp.
Error: There is another process using the server at /var/lib/broker1/lock. Cannot start the process!
So how do I unlock the broker in this situation? I've tried using systemctl to restart Artemis altogether, but that didn't do anything, and the Artemis tab is missing entirely from the console.

You should be able to simply remove the lock file at /var/lib/broker1/lock and then start the broker again.
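If it helps, here is a minimal sketch of that recovery, assuming the broker lives under /var/lib/broker1 and runs as a systemd service named artemis (both are assumptions, adjust them to your installation):

# make sure no broker or data tool process is still holding the lock before deleting it
ps aux | grep -i '[a]rtemis'
# remove the stale lock file left behind by the interrupted run
rm /var/lib/broker1/lock
# start the broker again (or run ./artemis run from the broker's bin directory)
sudo systemctl start artemis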

Related

KafkaTopicProvisioner failed to obtain partition

I observed my services going down with the exception below. The reason was that one of our three Kafka brokers was down, and Spring kept trying to connect to that same broker. Before it could skip the faulty broker and connect to the next available one, Kubernetes restarted the pod (due to a liveness probe failure, configured at 60 seconds). Because of the restart, the next attempt also tried the same faulty broker first, so the pod never came up.
How can we configure Spring not to wait more than 10 seconds for a faulty broker?
I found the cloud.stream.binder.healthTimeout property, but I'm not sure if it's the right one. How can I replicate the issue locally?
Kafka version: 2.2.1
{"timestamp":"2020-01-21T17:16:47.598Z","level":"ERROR","thread":"main","logger":"org.springframework.cloud.stream.binder.kafka.provisioning.KafkaTopicProvisioner","message":"Failed to obtain partition information","context":"default","exception":"org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.\n"}

ActiveMQ Artemis Error - AMQ224088: Timeout (10 seconds) while handshaking has occurred

In ActiveMQ Artemis, I occasionally receive the connection error below. I can't see any obvious impact on the brokers or message queues. Can anyone advise exactly what it means or what impact it could be having?
The current action performed is to either restart the brokers or check that they're still connected to the cluster. Is either of these actions necessary?
Current ActiveMQ Artemis version deployed is v2.7.0.
// error log line received at least once a month
2019-05-02 07:28:14,238 ERROR [org.apache.activemq.artemis.core.server] AMQ224088: Timeout (10 seconds) while handshaking has occurred.
This error indicates that something on the network is connecting to the ActiveMQ Artemis broker, but it's not completing any protocol handshake. This is commonly seen with, for example, load balancers that do a health check by creating a socket connection without sending any real data just to see if the port is open on the target machine.
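For instance, one way to reproduce the kind of connection described above (the host and port are placeholders) is to hold a raw TCP connection open without ever sending protocol data:

# keeps a TCP connection open for ~15 seconds with no CORE/AMQP/STOMP handshake,
# which exceeds the broker's 10-second handshake window and produces this log entry
sleep 15 | nc broker-host 61616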
The timeout is configurable so that the ERROR messages aren't logged, but disabling it also disables the clean-up of such connections, which may or may not be a problem in your use case. You should just be able to set handshake-timeout=0 on the relevant acceptor URL in broker.xml.
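A hedged sketch of what that might look like in broker.xml (the acceptor name, port, and protocol list are illustrative placeholders; only the handshake-timeout=0 parameter is the relevant change):

<acceptor name="artemis">tcp://0.0.0.0:61616?protocols=CORE,AMQP,STOMP;handshake-timeout=0</acceptor>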
When you see this message there should be no need to restart the broker.
In the next ActiveMQ Artemis release the IP address of the remote client where the connection originated will be included as part of the message.

Spring Cloud Stream Kafka Binder autoCommitOnError=false get unexpected behavior

I am using Spring Boot 2.1.1.RELEASE and Spring Cloud Greenwich.RC2, and the managed version for spring-cloud-stream-binder-kafka is 2.1.0RC4. The Kafka version is 1.1.0. I have set the following properties as the messages should not be consumed if there is an error.
spring.cloud.stream.bindings.input.group=consumer-gp-1
...
spring.cloud.stream.kafka.bindings.input.consumer.autoCommitOnError=false
spring.cloud.stream.kafka.bindings.input.consumer.enableDlq=false
spring.cloud.stream.bindings.input.consumer.max-attempts=3
spring.cloud.stream.bindings.input.consumer.back-off-initial-interval=1000
spring.cloud.stream.bindings.input.consumer.back-off-max-interval=3000
spring.cloud.stream.bindings.input.consumer.back-off-multiplier=2.0
....
There are 20 partitions in the Kafka topic and Kerberos is used for authentication (not sure if this is relevant).
The Kafka consumer calls a web service for every message it processes, and if the web service is unavailable I expect the consumer to try to process the message 3 times before moving on to the next message. For my test I disabled the web service, so none of the messages could be processed correctly. From the logs I can see that this is happening.
After a while I stopped and then restarted the Kafka consumer (with the web service still disabled). I was expecting that after the restart the consumer would attempt to process the messages that were not successfully processed the first time around. From the logs (I printed out each message with its fields) I couldn't see this happening after the restart. I thought the partitions might be influencing something, but I checked the logs and all 20 partitions were assigned to this single consumer.
Is there a property I have missed? I thought the expected behaviour when I restart the consumer is that the Kafka broker would pass the records that were not successfully processed to the consumer again.
Thanks
The parameters are working as expected. See the comments.

Lagom Kafka Unexpected Close Error

In the Lagom dev environment, after starting Kafka using lagomKafkaStart,
it sometimes shows that the KafkaServer closed unexpectedly, and after that I need to run the clean command to get it running again.
Please advise: is this the expected behaviour?
This can happen if you forcibly shut down sbt and the ZooKeeper data becomes corrupted.
Other than running the clean command, you can manually delete the target/lagom-dynamic-projects/lagom-internal-meta-project-kafka/ directory.
This will clear your local data from Kafka, but not from any other database (Cassandra or RDBMS). If you are using Lagom's message broker API, it will automatically repopulate the Kafka topic from the source database when you restart your service.
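As a concrete sketch of that manual clean-up, from the root of the Lagom project (assuming the default target directory layout mentioned above):

# stop the dev environment first, then delete Lagom's embedded Kafka/ZooKeeper state
rm -rf target/lagom-dynamic-projects/lagom-internal-meta-project-kafka/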

How to start Zookeeper and then Kafka?

I'm getting started with Confluent Platform, which requires running ZooKeeper (zookeeper-server-start /etc/kafka/zookeeper.properties) and then Kafka (kafka-server-start /etc/kafka/server.properties). I am writing an Upstart script that should run both Kafka and ZooKeeper. The issue is that Kafka should block until ZooKeeper is ready (because it depends on it), but I can't find a reliable way to know when ZooKeeper is ready. Here are some attempts in pseudo-code, run after starting the ZooKeeper server:
Use a hardcoded block
sleep 5
Does not work reliably on slower computers and/or waits longer than needed.
Check when something (hopefully Zookeeper) is running on port 2181
wait until $(echo stat | nc localhost ${port}) is not none
This did not seem to work as it doesn't wait long enough for Zookeeper to accept a Kafka connection.
Check the logs
wait until specific string in zookeeper log is found
This is sketchy, and there isn't even a string that can't also appear in error cases (e.g. "binding to port [...]").
Is there a reliable way to know when Zookeeper is ready to accept a Kafka connection? Otherwise, I will have to resort to a combination of 1 and 2.
I found that using a timer is not reliable. The second option (waiting for the port) worked for me:
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties && \
while ! nc -z localhost 2181; do sleep 0.1; done && \
bin/kafka-server-start.sh -daemon config/server.properties
The Kafka error message from your comment is definitely relevant:
FATAL [Kafka Server 0], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) java.lang.RuntimeException: A broker is already registered on the path /brokers/ids/0. This probably indicates that you either have configured a brokerid that is already in use, or else you have shutdown this broker and restarted it faster than the zookeeper timeout so it appears to be re-registering.
This indicates that ZooKeeper is up and running, and Kafka was able to connect to it. As I would have expected, technique #2 was sufficient for verifying that ZooKeeper is ready to accept connections.
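If you want a slightly stronger readiness check than a bare port probe, one sketch (assuming ZooKeeper's four-letter-word commands are enabled; newer ZooKeeper releases require whitelisting them via 4lw.commands.whitelist) is to wait for the ruok command to answer imok:

# keep polling until ZooKeeper answers the "ruok" health command with "imok"
while [ "$(echo ruok | nc localhost 2181)" != "imok" ]; do sleep 0.1; done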
Instead, the problem appears to be on the Kafka side. Kafka registers a ZooKeeper ephemeral node to represent the starting broker. An ephemeral node is deleted automatically when the client's ZooKeeper session expires (e.g. the process terminates, so it stops heartbeating to ZooKeeper). However, this is based on timeouts. If the Kafka broker restarts rapidly, then after the restart it sees that a znode representing that broker already exists. To the newly started process, this looks like another broker is already started and registered at that path. Since brokers are expected to have unique IDs, it aborts.
Waiting for a period of time past the ZooKeeper session expiration is an appropriate response to this problem. If necessary, you could potentially tune the session expiration to happen faster, as discussed in the ZooKeeper Administrator's Guide (see the discussion of tickTime, minSessionTimeout and maxSessionTimeout). However, tuning session expiration to something too rapid could cause clients to experience spurious session expirations during normal operation.
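For reference, a minimal zoo.cfg sketch of those settings (the values are illustrative, not recommendations; by default minSessionTimeout is 2 × tickTime and maxSessionTimeout is 20 × tickTime):

# all values are in milliseconds
tickTime=2000
minSessionTimeout=4000
maxSessionTimeout=10000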
I have less knowledge of Kafka, but perhaps there is also something that can be done on the Kafka side. I know that some management tools like Apache Ambari take steps to guarantee assignment of a unique ID to each broker during provisioning.