ActiveMQ Artemis latency issue - activemq-artemis

I have a cluster of 6 ActiveMQ Artemis nodes (version 2.27.1) running on 3 servers, each of which hosts one master and one slave node.
As part of my test, I send about 10,000 messages per minute into a specific queue and watch the consumer latency on my OpenSearch dashboard.
When I start the nodes in pairs (a master and its corresponding slave in each iteration), everything works normally and I don't see any latency.
But when I start the nodes in the order officially recommended by the Artemis documentation, or in any other order, I see latency.
Here are some results of my test, showing the difference between the scenarios:
[image: artemis-latency-test-results]

Related

Messages are stuck in ActiveMQ Artemis cluster queues

We have a problem with Apache ActiveMQ Artemis cluster queues. Sometimes messages begin to pile up in particular cluster queues. It usually happens 1-4 times per day, mostly in production (in the last 90 days it happened only once on a test environment).
These messages are not delivered to consumers on other cluster brokers until we restart the cluster connector (or the entire broker).
The problem looks related to ARTEMIS-3809.
Our setup is: 6 servers in one environment (3 pairs of master/backup servers). Operating system is Linux (Red Hat).
We have tried to:
upgrade from 2.22.0 to 2.23.1
increase minLargeMessageSize on the cluster connectors to 1024000
Messages still get stuck in the cluster queues.
Another problem: I tried to configure min-large-message-size as written in the documentation (in cluster-connection), but it caused errors at startup (broker.xml did not pass XSD validation), so the only option was to specify minLargeMessageSize in the URL parameters of the connector for each cluster broker. I don't know whether this setting has any effect.
So we had to write a script that checks whether messages are stuck in the cluster queues and restarts the cluster connector.
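The core of such a check might look roughly like the sketch below. It is only an illustration: `looks_stuck` and its inputs are hypothetical, and it assumes the script already samples each cluster queue's message count and acknowledged-message counter through whatever management interface is available. A queue is treated as stuck when it has a backlog in every recent sample and the acknowledged counter has not moved.

```python
from typing import Sequence, Tuple


def looks_stuck(samples: Sequence[Tuple[int, int]], min_samples: int = 3) -> bool:
    """Decide whether a queue looks stuck.

    samples: (message_count, messages_acknowledged) pairs, oldest first.
    Both values are assumed to come from periodic polling of the broker.
    """
    if len(samples) < min_samples:
        return False  # not enough history to judge
    recent = samples[-min_samples:]
    # A backlog must be present in every recent sample.
    has_backlog = all(count > 0 for count, _ in recent)
    # No acknowledgements between the first and last recent sample.
    no_progress = recent[-1][1] == recent[0][1]
    return has_backlog and no_progress
```

The number of samples and the polling interval are judgment calls; too short a window would restart connectors on queues that are merely slow.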
How can we debug this situation?
When the messages are stuck, nothing wrong is written to the log (no errors, no stacktraces etc.).
Which classes should we set to debug or trace logging level to find out what is happening with the cluster connectors?
I believe you can remedy the situation by setting this on your cluster-connection:
<producer-window-size>-1</producer-window-size>
See ARTEMIS-3805 for more details.
Generally speaking, moving messages around the cluster via the cluster-connection, while convenient, isn't terribly efficient (much less so for "large" messages). Ideally you would have a sufficient number of clients on each node to consume the messages that were originally produced there. If you don't have that many clients then you may want to re-evaluate the size of your cluster as it may actually decrease overall message throughput rather than increase it.
If you're just using 3 HA pairs in order to establish a quorum for replication then you should investigate the recently added pluggable quorum voting which allows integration with a 3rd party component (e.g. ZooKeeper) for leader election eliminating the need for a quorum of brokers.

Consume directly from ActiveMQ Artemis replica

In a cluster scenario using the HA/data replication feature, is there a way for consumers to consume/fetch data from a slave node instead of always reaching out to the master node (the master of that particular queue)?
If you think about scalability, having all consumers call a single node responsible to be the master of a specific queue means all traffic goes to a single node.
Kafka allows consumers to fetch data from the closest node if that node holds a replica of the leader. Is there something similar in ActiveMQ?
In short, no. Consumers can only consume from an active broker and slave brokers are not active, they are passive.
If you want to increase scalability you can add additional brokers (or HA broker pairs) to the cluster. That said, I would recommend careful benchmarking to confirm that you actually need additional capacity before increasing your cluster size. A single ActiveMQ Artemis broker can handle millions of messages per second depending on the use-case.
As I understand it, Kafka's semantics are quite different from a "traditional" message broker like ActiveMQ Artemis so the comparison isn't particularly apt.

Kafka cluster with single broker

I'm looking to start using Kafka for a system and I'm trying to cover all use cases.
Normally it would be run as a cluster of brokers on virtual servers (replication factor 3-5), but some customers don't care about resilience. For them, a broker failure that requires a manual reboot of the whole system is fine; they only care about hardware costs.
So my question is, are there any issues with using Kafka as a single broker system for small installations with low throughput?
Cheers
It's absolutely OK to use a single Kafka broker. Note, however, that with a single broker you won't have a highly available service, meaning that when the broker fails you will have downtime.
Your replication-factor will be limited to 1 and therefore all of the partitions of a topic will be stored on the same node.
For a proof-of-concept or non-critical dev work, a single node cluster works just fine. However having a cluster has multiple benefits. It's okay to go with a single node cluster if the following are not important/relevant for you.
scalability [spreads load across multiple brokers to maintain certain throughput]
fail-over [guards against data loss in case one/more node(s) go down]
availability [system remains reachable and functioning even if one/more node(s) go down]

During rolling upgrade/restart, how to detect when a kafka broker is "done"?

I need to automate a rolling restart of a kafka cluster (3 kafka brokers). I can easily do it manually - restart one after the other, while checking the log to see when it's fine (e.g., when the new process has joined the cluster).
What is a good way to automate this check? How can I ask the broker whether it's up and running, connected to its peers, all topics up-to-date and such? In my restart script, I have access to the metrics, but to be frank, I did not really see one there which gives me a clear picture.
Another way to put it: what would be a good "readiness" probe that does not simply check some TCP/IP port, but looks at the actual server?
I would suggest exposing JMX metrics and tracking the following for cluster health:
the controller count (must be 1 over the whole cluster)
under replicated partitions (should be zero for healthy cluster)
unclean leader elections (if you don't disable this in server.properties make sure there are none in the metric counts)
ISR shrinks within a reasonable time period, like 10 minute window (should be none)
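As a rough sketch, the four checks above can be folded into one predicate. The `ClusterMetrics` class and its field names are hypothetical; the sketch assumes the values have already been collected from the brokers' JMX metrics and summed over the cluster for the chosen observation window.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterMetrics:
    """Hypothetical snapshot of metrics summed over all brokers."""
    active_controller_count: int      # must be exactly 1 cluster-wide
    under_replicated_partitions: int  # should be 0
    unclean_leader_elections: int     # elections seen in the window
    isr_shrinks: int                  # ISR shrink events in the window


def cluster_problems(m: ClusterMetrics) -> List[str]:
    """Return a list of problems; an empty list means the cluster looks healthy."""
    problems = []
    if m.active_controller_count != 1:
        problems.append(
            f"expected exactly 1 active controller, found {m.active_controller_count}"
        )
    if m.under_replicated_partitions != 0:
        problems.append(
            f"{m.under_replicated_partitions} under-replicated partitions"
        )
    if m.unclean_leader_elections != 0:
        problems.append(f"{m.unclean_leader_elections} unclean leader elections")
    if m.isr_shrinks != 0:
        problems.append(f"{m.isr_shrinks} ISR shrinks in the window")
    return problems
```

Returning the list of problems rather than a bare boolean makes it easier for a restart script to log why it is waiting.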
Also, Yelp has tooling for rolling restarts implemented in Python. It requires Jolokia JMX agents to be installed on the brokers, and it polls the metrics to make sure some of the above conditions are true.
Assuming your cluster was healthy at the beginning of the restart operation, at a minimum, after each broker restart, you should ensure that the under-replicated partition count returns to zero before restarting the next broker.
As the previous responders mentioned, there is existing code out there to automate this. I don't use Jolokia myself, but my solution (which I'm working on now) also uses JMX metrics.
Kafka Utils by Yelp is one of the best tools for detecting when a Kafka broker is "done". Specifically, kafka_rolling_restart is the tool that gets broker details from ZooKeeper and URP (under-replicated partitions) metrics from each broker. After a broker is restarted, the total URP count across the Kafka cluster is collected periodically, and when it drops to zero the next broker is restarted. The controller broker is restarted last.
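The loop described above can be sketched in a few lines. This is not Yelp's implementation, only an illustration of the same idea; `restart_broker` and `total_urp` are hypothetical callables that the caller supplies (for example, one that restarts a service over SSH and one that sums the URP metric over all brokers).

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    broker_ids: Iterable[int],
    controller_id: int,
    restart_broker: Callable[[int], None],
    total_urp: Callable[[], int],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Restart brokers one at a time, controller last.

    After each restart, wait until the cluster-wide under-replicated
    partition count is back to zero before touching the next broker.
    Returns the brokers in the order they were restarted.
    """
    ids = list(broker_ids)
    # Ordinary brokers first, the controller (if present) at the end.
    order = [b for b in ids if b != controller_id]
    order += [b for b in ids if b == controller_id]

    restarted: List[int] = []
    for broker in order:
        restart_broker(broker)
        waited = 0.0
        while total_urp() != 0:
            if waited >= timeout_seconds:
                raise TimeoutError(
                    f"URP count did not return to zero after restarting broker {broker}"
                )
            sleep(poll_seconds)
            waited += poll_seconds
        restarted.append(broker)
    return restarted
```

The timeout matters: without it, a broker that never rejoins would leave the script polling forever instead of stopping with the cluster in a known state.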

What is the real use of kafka based multi ordering service

I am new to Fabric technologies. I read some articles about Kafka-based ordering services and their advantages. Some of the articles say that a Kafka-based multi-orderer service is suitable for fault tolerance. I have now set up 3 Kafka-based orderers (orderer0, orderer1, orderer2). Then I stopped 2 orderers using the following commands:
docker stop orderer1.example.com
docker stop orderer2.example.com
At this point the REST API was still working correctly. Then I stopped orderer0 using
docker stop orderer0.example.com
Now my REST API is not working; it fails with a network connection problem. Then I started orderer1 and orderer2 using the following commands:
docker start orderer1.example.com
docker start orderer2.example.com
But my REST API is still not working; it fails with the same network connection problem.
And finally I started orderer0 using
docker start orderer0.example.com
Now the network is working fine.
My questions are:
What is the actual use of Kafka-based ordering services?
How can we implement a Kafka-based ordering service so that one orderer going down does not break the network?
Fabric:1.1.0
Composer:0.19.16
Node:8.11.3
OS: Ubuntu 16.04
I had the same problem as you when I wanted to set up several orderers. I have 2 solutions for it:
I changed the SDK. Currently your SDK tries to contact orderer0 and returns an error if that fails; this needs to change so that the request loops over a list of orderers and returns an error only if none of them is valid.
Easier: set up a load balancer upstream of the orderers.
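The first option amounts to a simple failover loop. The sketch below is not the Fabric SDK API; `send_with_failover` and the `send` callable are hypothetical stand-ins for whatever the SDK uses to submit a request to one orderer, and it assumes an unreachable orderer surfaces as a `ConnectionError`.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def send_with_failover(orderers: Sequence[str], send: Callable[[str], T]) -> T:
    """Try each orderer in turn and return the first successful response.

    Raises ConnectionError only if every orderer in the list fails.
    """
    errors: List[str] = []
    for url in orderers:
        try:
            return send(url)
        except ConnectionError as exc:
            # Remember the failure and move on to the next orderer.
            errors.append(f"{url}: {exc}")
    raise ConnectionError("no orderer reachable: " + "; ".join(errors))
```

With this in place, stopping orderer0 only costs one failed attempt per request instead of failing the request outright.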
To answer your question: the advantage of setting up Kafka-based ordering services is that the data of the proposed blocks is spread over several servers. There is fault tolerance because an orderer that crashes and reconnects to the Kafka cluster will be able to resynchronize. Performance should also be better (this is theoretical; I did not test this point).
As per Kafka Ordering Services
Each channel maps to a separate single-partition topic in Kafka
This means that all messages in the topic are totally-ordered in the order in which they were sent.
and
At a minimum, [the number of brokers] should be set to 4. (As we will explain in Step 4 below, this is the minimum number of nodes necessary in order to exhibit crash fault tolerance, i.e. with 4 brokers, you can have 1 broker go down, all channels will continue to be writeable and readable, and new channels can be created.)
The above assumes a Kafka replication factor of 3 and the producing client to set min.insync.replicas ideally to 2 to make sure that all writes are replicated to at least two servers.
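The arithmetic behind the "4 brokers, 1 may fail" figure can be written out as a small worst-case calculation. This is a sketch under the assumptions stated in the quote (writes require min.insync.replicas live replicas, and creating a new channel requires enough live brokers to place a full set of replicas); the function name is made up for illustration.

```python
def tolerated_broker_failures(
    brokers: int, replication_factor: int, min_insync_replicas: int
) -> int:
    """Worst-case number of broker failures that keeps existing channels
    writable and still allows new channels to be created."""
    # Existing topics stay writable while at least min_insync_replicas
    # of their replicas are alive (worst case: every failure hits a replica).
    writable = replication_factor - min_insync_replicas
    # A new topic needs replication_factor live brokers to place its replicas.
    creatable = brokers - replication_factor
    return max(0, min(writable, creatable))


print(tolerated_broker_failures(4, 3, 2))  # 1
print(tolerated_broker_failures(3, 3, 2))  # 0
```

With only 3 brokers and a replication factor of 3, existing channels survive one failure, but no new channel can be created until the broker is back, which is why the result is 0.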
Based on your network issues, this sounds to me like you did not actually configure all three brokers correctly (I would need to see your entire Docker setup and what the Dockerfile is actually doing). Even assuming you did configure all three brokers for this "REST API", you still need a single-partition Kafka topic with 3 replicas; the default replication factor is 1, and auto-created topics use that default. So I suggest you clean everything up, start the three brokers, manually create the topic with 1 partition and 3 replicas, and then start Hyperledger.
If the REST API is the actual problem, not the Kafka connection, then you would need a load balancer, I guess.