ActiveMQ Artemis Shared Storage slave fails to start if master is not started - activemq-artemis

We have a master/slave setup with the shared storage strategy.
We observed that if we start the slave when the master is down, we have the following message:
AMQ221032: Waiting to become backup node
And the server does not become live.
So it seems that the slave requires the master to have been up at some point before it can become operational.
Is this the expected behavior? Is there a way to let the slave become live at startup if the master is down?

Generally speaking what you're seeing is not the expected behavior for master/slave using shared storage. If the master is not started and the slave is started then the slave should acquire the lock on the shared storage and start. I just tested this out using the transaction-failover example which is shipped with ActiveMQ Artemis and the backup started just fine when the master wasn't started. Here's the logging I saw when starting the backup when the master wasn't started:
2022-07-03 21:50:55,955 INFO [org.apache.activemq.artemis.core.server] AMQ221032: Waiting to become backup node
2022-07-03 21:50:55,956 INFO [org.apache.activemq.artemis.core.server] AMQ221033: ** got backup lock
...
2022-07-03 21:50:56,156 INFO [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 2.23.0 [0db7f4ea-fb44-11ec-8718-3ce1a1d12939] started, waiting live to fail before it gets active
...
2022-07-03 21:50:56,661 INFO [org.apache.activemq.artemis.core.server] AMQ221010: Backup Server is now live
The behavior you're seeing indicates that perhaps another backup is already started and has acquired the backup lock on the journal. It's hard to say with the information you've provided.
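For reference, a shared-store backup is declared via the ha-policy in broker.xml. Below is a minimal sketch of the backup broker's configuration (element names per the Artemis 2.x HA documentation; the directory paths are placeholders and must point at the same shared location on both brokers, since that is where the file lock is taken):
<ha-policy>
   <shared-store>
      <slave>
         <failover-on-shutdown>true</failover-on-shutdown>
      </slave>
   </shared-store>
</ha-policy>
<!-- both brokers must point at the same shared journal/bindings/paging/large-messages directories -->
<journal-directory>/path/to/shared/store/journal</journal-directory>
<bindings-directory>/path/to/shared/store/bindings</bindings-directory>
<paging-directory>/path/to/shared/store/paging</paging-directory>
<large-messages-directory>/path/to/shared/store/large-messages</large-messages-directory>
If a second backup were pointed at the same shared store, it would hold the backup lock and produce exactly the "Waiting to become backup node" message you observed.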

Related

kafka broker restart in clean state

We are using Apache Kafka 1.0. We had stopped one of the Kafka brokers in our cluster for an activity and its disk got fully wiped. We added it back with the same broker id, so it started syncing with the other brokers. We saw the below once it started syncing -
In the application, producing to Kafka was mostly working fine, but we saw a massive spike in consumer offset commit failures across consumers, with error messages like the one below -
[Consumer clientId=abcasdfsadf, groupId=service_name_group_id] Offset commit failed on partition topic-name-1 at offset 2770664: The request timed out.
During this period, the brokers that were already running also logged the warning below pretty frequently (25-30k times in an hour) -
[2023-02-01 20:03:36,739] WARN Attempting to send response via channel for which there is no open connection, connection id broker-source-ip:broker-source-port-remote-ip:remote-port-1203082 (kafka.network.Processor)
Another observation was that the cluster seemed to be expanding and contracting the ISR pretty frequently for some topics (it did eventually get in sync). Once the restarted broker got in sync, all the errors went away. Network and disk I/O were higher than usual, but the machine the broker runs on has more than enough bandwidth.
Was wondering if anyone has encountered a similar issue before and what the cause could be.

Impact of the starting kafka service on bootup

I have configured the Kafka service to auto-start on bootup. I wanted to understand the impact of doing so:
1. If the service starts on all Kafka servers at the same time.
2. If the Kafka service auto-starts before all Zookeeper servers have started.
3. If the Kafka service auto-starts after some time gap.
Are there any other impacts of starting the Kafka service on bootup automatically?
The only significant impact would be the I/O usage of the machine. The order in which the brokers start doesn't matter, but if they start before Zookeeper, they will fail to start at all.
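If the brokers are started by systemd, one way to avoid that failure mode is to make the Kafka unit depend on the Zookeeper unit so ordering is enforced at boot. A minimal sketch (the unit name zookeeper.service and the /opt/kafka paths are assumptions for illustration):
[Unit]
Description=Apache Kafka broker
Requires=zookeeper.service
After=zookeeper.service network.target

[Service]
Type=simple
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
With Restart=on-failure the broker is also retried automatically if it happens to come up before Zookeeper is ready and exits.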

Delete the kafka connect topics without stopping the process

I was running a Kafka Connect worker in distributed mode (it's a test cluster). I wanted to reset the default connect-* topics, so without stopping the worker I deleted them. After the worker restarted, I'm getting this error:
ERROR [Worker clientId=connect-1, groupId=debezium-cluster1] Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:324)
org.apache.kafka.common.config.ConfigException:
Topic 'connect-offsets' supplied via the 'offset.storage.topic' property is required to have 'cleanup.policy=compact' to guarantee consistency and durability of source connector offsets,
but found the topic currently has 'cleanup.policy=delete'.
Continuing would likely result in eventually losing source connector offsets and problems restarting this Connect cluster in the future.
Change the 'offset.storage.topic' property in the Connect worker configurations to use a topic with 'cleanup.policy=compact'.
Deleting the internal topics while the workers are still running sounds risky. The workers have internal state, which now no longer matches the state in the Kafka brokers.
A safer approach would be to shut down the workers (or at least shut down all the connectors), delete the topics, and restart the workers/connectors.
It looks like the topics got auto-created, perhaps by the workers when you deleted them mid-flight.
You could manually apply the configuration change to the topic as suggested, or you could also specify a new set of topics for the worker to use (connect01- for example) and let the workers recreate them correctly.
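If you go with the first option, the cleanup policy can be changed in place with the kafka-configs tool; a sketch assuming a reasonably recent Kafka CLI and a broker listening on localhost:9092:
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name connect-offsets \
  --alter --add-config cleanup.policy=compact
The same check applies to the other internal topics (the config and status storage topics), which Connect also expects to be compacted.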

Kafka is not restarting - cleared logs, disk space. Restarted but it turns off again and again

We are a talent system and have installed Zookeeper and Kafka on our AWS instance to send our requests to the core engine to get matches.
On our UI we are getting the error:
NoBrokersAvailable
and when we check, Kafka is down. We restart it and it's still down. We checked the logs and cleared them, and we also checked and cleared disk space.
Still the same problem of Kafka not starting. What should we do?

Synchronization mode of ha replication

Version : ActiveMQ Artemis 2.10.1
When we use ha-policy and replication, is the synchronization mode between the master and the slave full synchronization? Can we choose full synchronization or asynchronization?
I'm not 100% certain what you mean by "full synchronization" so I'll just explain how the brokers behave...
When a master broker receives a durable (i.e. persistent) message it will write the message to disk and send the message to the slave in parallel. The broker will then wait for the local disk write operation to complete as well as receive a response from the slave that it accepted the message before it responds to the client who originally sent the message.
This behavior is not configurable.
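For reference, replication is switched on purely by the ha-policy in broker.xml; there is no element there to choose between synchronous and asynchronous replication. A minimal sketch of the two sides (element names per the Artemis 2.x HA documentation; cluster-connection and connector configuration omitted):
<!-- master (live) broker -->
<ha-policy>
   <replication>
      <master>
         <check-for-live-server>true</check-for-live-server>
      </master>
   </replication>
</ha-policy>
<!-- slave (backup) broker -->
<ha-policy>
   <replication>
      <slave>
         <allow-failback>true</allow-failback>
      </slave>
   </replication>
</ha-policy>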