Using JMX to monitor a Kafka topic - apache-kafka

I am using JMX to monitor a Kafka topic.
val url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker1:9393/jmxrmi");
val jmxc = JMXConnectorFactory.connect(url, null);
val mbsc = jmxc.getMBeanServerConnection();
val messageCountObj = new ObjectName("kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=mytopic");
val messagesInPerSec = mbsc.getAttribute(messageCountObj,"MeanRate")
Using this code I can get the MeanRate of "mytopic" on broker1, but I have 10 brokers. How can I get the MeanRate of "mytopic" from all of my brokers?
I have tried "service:jmx:rmi:///jndi/rmi://broker1:9393,broker2:9393,broker3:9393/jmxrmi"
but got an error :(

It would be nice if it were that simple ;)
There's no way to do this as you outlined. You will need to make a separate connection to each broker.
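For illustration, a rough sketch in Java of doing exactly that (the broker host names, the port, and the list length are placeholders; exception handling is omitted, as in the pseudo code below):
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Placeholder broker host names; substitute your 10 brokers and the real JMX port.
String[] brokers = {"broker1", "broker2", "broker3"};
ObjectName topicMetric = new ObjectName(
        "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=mytopic");
double totalRate = 0d;
for (String broker : brokers) {
    JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + broker + ":9393/jmxrmi");
    try (JMXConnector jmxc = JMXConnectorFactory.connect(url, null)) {
        MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
        // The same attribute you already read on broker1, read once per broker.
        totalRate += (Double) mbsc.getAttribute(topicMetric, "MeanRate");
    }
}
System.out.println("Sum of per-broker MeanRate: " + totalRate);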
One possible solution would be to use MBeanServer federation, which registers proxies for each of your brokers in one MBeanServer. If you did this on broker1, you could connect to service:jmx:rmi:///jndi/rmi://broker1:9393/jmxrmi and query the stats for all of your brokers in one go, but you would need to query 10 different ObjectNames, read the value from each, and then compute the MeanRate yourself. [Java] Pseudo code:
ObjectName wildcard = new ObjectName("*:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=mytopic");
double totalRate = 0d;
int respondingBrokers = 0;
for (ObjectName on : mbsc.queryNames(wildcard, null)) {
    totalRate += (Double) mbsc.getAttribute(on, "MeanRate");
    respondingBrokers++;
}
// Average of the per-broker mean rates: totalRate / respondingBrokers
Note: there is no exception handling, and I am assuming the rate type is a Double.
You could also create and register a custom MBean that computed the aggregate mean on the federated broker.
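A rough sketch of that idea (all class, attribute, and ObjectName domain names here are made up, and it assumes the federated broker proxies are registered in the platform MBeanServer):
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical standard MBean interface for the aggregate value.
public interface TopicMeanRateAggregateMBean {
    double getMeanRate() throws Exception;
}

// Implementation that queries the federated per-broker proxies and averages their rates.
public class TopicMeanRateAggregate implements TopicMeanRateAggregateMBean {
    private final MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    private final ObjectName wildcard;

    public TopicMeanRateAggregate(String topic) throws Exception {
        // Matches the MessagesInPerSec proxy of every federated broker for this topic.
        this.wildcard = new ObjectName("*:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=" + topic);
    }

    @Override
    public double getMeanRate() throws Exception {
        double total = 0d;
        int brokers = 0;
        for (ObjectName on : server.queryNames(wildcard, null)) {
            total += (Double) server.getAttribute(on, "MeanRate");
            brokers++;
        }
        return brokers == 0 ? 0d : total / brokers;
    }
}

// Register it once on the federating JVM, e.g.:
// ManagementFactory.getPlatformMBeanServer().registerMBean(
//         new TopicMeanRateAggregate("mytopic"),
//         new ObjectName("custom:type=TopicMeanRateAggregate,topic=mytopic"));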
If you are Maven oriented, you can build the OpenDMK from here.

Related

Hazelcast AtomicLong data loss when multiple members leave

Hazelcast fails when multiple members disconnect from the cluster.
My scenario is quite basic and my configuration has a backup count of 3 (it does not work). I have 4 members in a cluster and I use the AtomicLong API to save my key->value pairs. When all members are alive, everything is perfect. However, some data loss occurs when I kill 2 members at the same time (without waiting in between). My member count is always 4. Is there any way to avoid this kind of data loss?
Config config = new Config();
NetworkConfig network = config.getNetworkConfig();
network.setPort(DistributedCacheData.getInstance().getPort());
config.getCacheConfig("default").setBackupCount(3);
JoinConfig join = network.getJoin();
join.getMulticastConfig().setEnabled(false);
join.getTcpIpConfig().setEnabled(true);
config.setNetworkConfig(network);
config.setInstanceName("member-name-here");
Thanks.
IAtomicLong has a hard-coded 1 sync backup; you cannot configure it to have more than 1 backup. What you are doing is configuring the Cache with 3 backups.
Below is an example that demonstrates multiple members disconnecting with an IMap, which does support a configurable backup count:
Config config = new Config();
config.getMapConfig("myMap").setBackupCount(3);
HazelcastInstance[] instances = {
        Hazelcast.newHazelcastInstance(config),
        Hazelcast.newHazelcastInstance(config),
        Hazelcast.newHazelcastInstance(config),
        Hazelcast.newHazelcastInstance(config)
};
IMap<Integer, Integer> myMap = instances[0].getMap("myMap");
for (int i = 0; i < 1000; i++) {
    myMap.set(i, i);
}
System.out.println(myMap.size()); // 1000
instances[1].getLifecycleService().terminate();
instances[2].getLifecycleService().terminate();
System.out.println(myMap.size()); // still 1000, because the map keeps 3 backups

Spark Structured Streaming: Running Kafka consumer on separate worker thread

I have a Spark application that needs to read two streams from two Kafka clusters (Kafka A and B) using Structured Streaming, and do some joins and filtering on the two streams. Is it possible to have a Spark job that reads the stream from A, and also runs a thread (called Consumer) on each worker that reads Kafka B and puts the data into a map? Then later, when we are filtering, we could do something like stream.filter(row => consumer.idNotInMap(row.id)).
I have some questions regarding this approach:
If this approach works, will it cause any problems when the application is run on a cluster?
Will all Consumer instances on each worker receive the same data in cluster mode? Or can we even let each Consumer listen only on the Kafka partitions for that worker node (which is probably controlled by Spark)?
How will the Consumer instance get serialized and passed to the workers?
Currently it is initialized on the driver node, but is there a way to initialize it once for each worker node?
I feel like in my case I should use stream joining instead. I have already tried that and it did not work, which is why I am taking this approach. It did not work because the stream from Kafka A is append-only, while stream B needs state that can be updated, which makes it update-only, and joining streams in append and update mode is not supported in Spark.
Here is some pseudo-code:
// SparkJob.scala
val consumer = new Consumer()
val getMetadata = udf(id => consumer.get(id))
val enrichedDataSet = stream.withColumn("metadata", getMetadata(stream("id")))
// Consumer.java
class Consumer implements Serializable {
    private final ConcurrentHashMap<Integer, String> metadata;

    public Consumer() {
        metadata = new ConcurrentHashMap<>();
        // read the stream
        listen();
    }

    // process kafka data inside this loop
    private void listen() {
        Thread t = new Thread(() -> {
            KafkaConsumer consumer = ...;
            while (consumer.hasNext()) {
                var message = consumer.next();
                // update metadata or put in new metadata
                metadata.put(message.id, message);
            }
        });
        t.start();
    }

    public String get(Integer key) {
        return metadata.get(key);
    }
}
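On the "initialize it once for each worker node" part, one commonly used pattern (sketched here with a hypothetical ConsumerHolder class, not part of the original code) is a lazily created per-JVM singleton, so the driver never has to serialize the Consumer itself:
// Hypothetical holder: each executor JVM builds its own Consumer lazily on first access.
class ConsumerHolder {
    private static Consumer instance;

    static synchronized Consumer get() {
        if (instance == null) {
            instance = new Consumer(); // starts its own Kafka B listener thread in this JVM
        }
        return instance;
    }
}
The UDF would then call ConsumerHolder.get().get(id) instead of capturing a driver-side Consumer instance.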

ActiveMQ-Artemis 2.6 Set queue dead letter address using JMS management API

I'm trying to set the dead letter address for a queue via the JMS management API. From reading the latest Artemis docs it appears that I should be able to do this using the QueueControl.setDeadLetterAddress(...) method. See https://activemq.apache.org/artemis/docs/latest/management.html and search for "setDeadLetterAddress".
It is my understanding that the parameters of these methods should be found in the Artemis QueueControl javadocs here:
https://activemq.apache.org/artemis/docs/javadocs/javadoc-latest/org/apache/activemq/artemis/api/core/management/QueueControl.html
However, that documentation does not have any mention of a setDeadLetterAddress method or what parameters it might accept.
Does the QueueControl.setDeadLetterAddress method still exist and can it be called from the JMSManagementHelper.putOperationInvocation(...) method?
Many thanks!
Looking at the QueueControlImpl class code, it is clear that the setDeadLetterAddress operation is no longer present. However, an operation on the ActiveMQServerControlImpl class named addAddressSettings does provide the capability to set the DLA for a queue (as well as plenty of other settings).
For example:
Queue managementQueue = ActiveMQJMSClient.createQueue("activemq.management");
Queue replyQueue = ActiveMQJMSClient.createQueue("management.reply");
JMSContext context = connectionFactory.createContext();
JMSConsumer consumer = context.createConsumer(replyQueue);
JMSProducer producer = context.createProducer();
producer.setJMSReplyTo(replyQueue);
// Using AddressSettings isn't required, but is provided
// for clarity.
AddressSettings settings = new AddressSettings()
.setDeadLetterAddress(new SimpleString("my.messages.dla"))
.setMaxDeliveryAttempts(5)
.setExpiryAddress(new SimpleString("ExpiryAddress"))
.setExpiryDelay(-1L) // No expiry
.setLastValueQueue(false)
.setMaxSizeBytes(-1) // No max
.setPageSizeBytes(10485760)
.setPageCacheMaxSize(5)
.setRedeliveryDelay(500)
.setRedeliveryMultiplier(1.5)
.setMaxRedeliveryDelay(2000)
.setRedistributionDelay(1000)
.setSendToDLAOnNoRoute(true)
.setAddressFullMessagePolicy(AddressFullMessagePolicy.PAGE)
.setSlowConsumerThreshold(-1) // No slow consumer checking
.setSlowConsumerCheckPeriod(1000)
.setSlowConsumerPolicy(SlowConsumerPolicy.NOTIFY)
.setAutoCreateJmsQueues(true)
.setAutoDeleteJmsQueues(false)
.setAutoCreateJmsTopics(true)
.setAutoDeleteJmsTopics(false)
.setAutoCreateQueues(true)
.setAutoDeleteQueues(false)
.setAutoCreateAddresses(true)
.setAutoDeleteAddresses(false);
Message m = context.createMessage();
JMSManagementHelper.putOperationInvocation(m, ResourceNames.BROKER, "addAddressSettings",
"my.messages",
settings.getDeadLetterAddress().toString(),
settings.getExpiryAddress().toString(),
settings.getExpiryDelay(),
settings.isLastValueQueue(),
settings.getMaxDeliveryAttempts(),
settings.getMaxSizeBytes(),
settings.getPageSizeBytes(),
settings.getPageCacheMaxSize(),
settings.getRedeliveryDelay(),
settings.getRedeliveryMultiplier(),
settings.getMaxRedeliveryDelay(),
settings.getRedistributionDelay(),
settings.isSendToDLAOnNoRoute(),
settings.getAddressFullMessagePolicy().toString(),
settings.getSlowConsumerThreshold(),
settings.getSlowConsumerCheckPeriod(),
settings.getSlowConsumerPolicy().toString(),
settings.isAutoCreateJmsQueues(),
settings.isAutoDeleteJmsQueues(),
settings.isAutoCreateJmsTopics(),
settings.isAutoDeleteJmsTopics(),
settings.isAutoCreateQueues(),
settings.isAutoDeleteQueues(),
settings.isAutoCreateAddresses(),
settings.isAutoDeleteAddresses());
producer.send(managementQueue, m);
Message response = consumer.receive();
// addAddressSettings returns void but this will also return errors if the
// method or parameters are wrong.
log.info("addAddressSettings Reply: {}", JMSManagementHelper.getResult(response));

How to implement a high availability service (2 nodes) using ZooKeeper

I need to provide an H/A mechanism.
I understand that ZooKeeper can be used for leader election.
I'm looking for the right pattern for this flow:
I need to implement a service that invokes a flow.
When it starts the flow [the flow is a looping flow], it must validate that it is the leader (say, by its IP address).
I understand that I can put a value into ZooKeeper that identifies the entering instance, and dispose of it when one loop ends, or after a period of time.
Is that the right pattern?
It also seems that there are race condition issues if I use something like:
...
...
List<String> names = zk.getChildren( path, false );
String id = null;
// See whether we have already run for election in this process
for ( String name : names ) {
    if ( name.startsWith( myIP ) ) {
        id = name;
        break;
    }
}
if ( id == null ) {
    id = zk.create( path + "/" + myIP, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL );
}
boolean isLeader = id != null;
For example:
2 services read null,
and then the 2nd overrides the 1st one's signature, and both of them run the task.
Can you help?
Thanks
Using ZooKeeper correctly can be difficult and takes some studying of its APIs and semantics. There are some nice high-level libraries such as Curator that already have implemented common algorithms such as leader election. The ZooKeeper documentation also has a recipe for leader election.
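For illustration, a minimal sketch using Curator's LeaderSelector recipe (the connect string, the ZNode path, and runFlowOnce() are placeholders for your own values and flow):
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
import org.apache.curator.retry.ExponentialBackoffRetry;

CuratorFramework client = CuratorFrameworkFactory.newClient(
        "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
client.start();

LeaderSelector selector = new LeaderSelector(client, "/my-service/leader",
        new LeaderSelectorListenerAdapter() {
            @Override
            public void takeLeadership(CuratorFramework c) throws Exception {
                // Only the elected leader reaches this point; run one iteration of the
                // looping flow, then return to give up leadership until re-elected.
                runFlowOnce();
            }
        });
selector.autoRequeue(); // rejoin the election after each leadership turn
selector.start();
Curator manages the ephemeral sequential nodes and watches for you, which avoids the check-then-create race described in the question.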

How to use Flink streaming to process data streams of complex protocols

I'm using Flink streaming to handle data traffic logs in a 3G network (GPRS Tunnelling Protocol), and I'm having trouble synthesizing the information that belongs to a single user session.
For example: how do I match the start and end of one session? I don't know whether Flink streaming is suited to handling complex protocols like that.
p/s:
We capture the data exchanged between the SGSN and the GGSN in a 3G network (using the GTP protocol with GTP-C/U messages). A session is started when the SGSN sends the CreateReq (TEID, Seq, IMSI, TEID_dl, TEID_data_dl) message and the GGSN responds with the CreateRsp (TEID_dl, Seq, TEID_ul, TEID_data_ul) message.
After the session is established, the other GTP-C messages (e.g. UpdateReq, DeleteReq) sent from the SGSN to the GGSN use TEID_ul and the response messages use TEID_dl; GTP-U messages use TEID_data_ul (SGSN -> GGSN) and TEID_data_dl (GGSN -> SGSN). GTP-U messages contain information such as the AppID (facebook, twitter, web), URL, ...
Finally, I want to handle the continuous log data stream and match the GTP-C and GTP-U messages of the same user (IMSI) to build a report.
I've tried this:
val sessions = createReqs.connect(createRsps).flatMap(new CoFlatMapFunction[CreateReq, CreateRsp, Session] {
  // holds CreateReqs indexed by (teid_dl, seq)
  private val createReqs = mutable.HashMap.empty[(String, String), CreateReq]
  // holds CreateRsps indexed by (teid, seq)
  private val createRsps = mutable.HashMap.empty[(String, String), CreateRsp]

  override def flatMap1(req: CreateReq, out: Collector[Session]): Unit = {
    val key = (req.teid_dl, req.header.seqNum)
    val oRsp = createRsps.get(key)
    if (!oRsp.isEmpty) {
      val rsp = oRsp.get
      println("OK")
      out.collect(new Session(rsp.header.time, req.imsi, req.teid_dl, req.teid_ddl, rsp.teid_upl, rsp.teid_dupl, req.rat, req.apn))
      createRsps.remove(key)
    } else {
      createReqs.put(key, req)
    }
  }

  override def flatMap2(rsp: CreateRsp, out: Collector[Session]): Unit = {
    val key = (rsp.header.teid, rsp.header.seqNum)
    val oReq = createReqs.get(key)
    if (!oReq.isEmpty) {
      val req = oReq.get
      out.collect(new Session(rsp.header.time, req.imsi, req.teid_dl, req.teid_ddl, rsp.teid_upl, rsp.teid_dupl, req.rat, req.apn))
      createReqs.remove(key)
    } else {
      createRsps.put(key, rsp)
    }
  }
}).print()
This code always returns an empty result, even though the input stream contains the CreateReq and CreateRsp messages of the same session and they appear very close together (within 1 second). When I debug, oReq.isEmpty == true every time.
What am I doing wrong?
To be honest it is a bit difficult to see through the telco specifics here, but if I understand correctly you have at least 3 streams, the first two being the CreateReq and the CreateRsp streams.
To detect the establishment of a session I would use the ConnectedDataStream abstraction to share state between the two aforementioned streams. Check out this example for usage or the related Flink docs.
Is this what you are trying to achieve?