Akka DistributedPubSubMediator at-least-once delivery guarantees for publishing to a topic - scala

I need to have at-least-once delivery guarantees for messages published to a DistributedPubSubMediator topic.
I looked into DistributedPubSubMediator.scala and can see the following in TopicLike trait (Akka 2.4.6):
trait TopicLike extends Actor {
  var subscribers = Set.empty[ActorRef]
  def defaultReceive: Receive = {
    case msg =>
      subscribers foreach { _ forward msg }
  }
}
However, I couldn't find any way to retrieve the subscribers set from the mediator. It would be great if there were a request message, GetTopicSubscribers, that exposed this information to mediator clients:
mediator ! GetTopicSubscribers("mytopic")
So after publishing to a topic the publisher could wait for Ack messages from all active subscribers. Is there any other way to accomplish something like that?
It would be great if akka.contrib.pattern.ReliableProxy could somehow be plugged into DistributedPubSubMediator.

You could have your publisher ask the mediator for the number of subscribers via akka.cluster.pubsub.DistributedPubSubMediator.CountSubscribers("myTopic").
Then it just needs to keep a count of how many Ack messages it gets from subscribers.
There is no need to track the actual subscribers or which ones have acknowledged. When your Ack count reaches the subscriber count, you know they have all received the message (Akka's at-most-once delivery means that no message, and so no Ack, arrives twice).
Note however this comment in the source code:
// Only for testing purposes, to poll/await replication
case object Count
final case class CountSubscribers(topic: String)
This suggests that CountSubscribers is perhaps not something to rely on too heavily.
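The Ack-counting idea can be sketched independently of Akka. The following is a minimal illustration in Python, not Akka code: AckTracker, on_ack and the message ID "msg-1" are hypothetical names, and the expected count is assumed to come from something like the CountSubscribers reply. It also assumes the subscriber set does not change between counting and publishing.

```python
from dataclasses import dataclass, field


@dataclass
class AckTracker:
    """Counts Ack messages per published message until the expected total is reached."""
    expected: int                                   # subscriber count for the topic (assumed input)
    received: dict = field(default_factory=dict)    # message ID -> number of Acks seen so far

    def on_ack(self, message_id: str) -> bool:
        """Record one Ack; return True once every expected subscriber has acknowledged."""
        self.received[message_id] = self.received.get(message_id, 0) + 1
        return self.received[message_id] >= self.expected


tracker = AckTracker(expected=3)
# Three subscribers acknowledge the same message one after another.
results = [tracker.on_ack("msg-1") for _ in range(3)]
```

On its own this only detects completion. To approach at-least-once behaviour the publisher would still need a timeout after which it republishes, and subscribers would have to tolerate duplicates.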

Related

Feedback message from subscriber in mqtt protocol

I used the MQTT protocol to send messages between two computers. I have patterned from this code.
publisher:
import paho.mqtt.client as mqtt
from random import randrange, uniform
import time
mqttBroker ="mqtt.eclipse.org"
client = mqtt.Client("Temperature_Inside")
client.connect(mqttBroker)
while True:
    randNumber = randrange(10)
    client.publish("TEMPERATURE", randNumber)
    print("Just published " + str(randNumber) + " to topic TEMPERATURE")
    time.sleep(1)
subscriber:
import paho.mqtt.client as mqtt
import time
def on_message(client, userdata, message):
    print("received message: ", str(message.payload.decode("utf-8")))
mqttBroker ="mqtt.eclipse.org"
client = mqtt.Client("Smartphone")
client.connect(mqttBroker)
client.loop_start()
client.subscribe("TEMPERATURE")
client.on_message=on_message
time.sleep(1)
client.loop_stop()
I want feedback to be sent to the publisher when I receive the message. Is there a way to get this kind of message feedback?
There is no end to end delivery notification in the MQTT protocol. This is very deliberate.
MQTT is a pub/sub system, designed to totally separate the producer of information from the consumer. There could be 0 to infinite number of subscribers to a topic when a producer publishes a message. There could also be offline subscribers who will have the message delivered when they reconnect (which could be any time after the message was published)
What MQTT does provide is the QOS levels, but it is important to remember that these only apply to a single leg of the delivery journey. E.g. a message published at QOS 2 ensures it will reach the broker, but makes no guarantees about any subscribers as their subscription may be at QOS 0.
If your system requires end to end delivery notification then you will need to implement a response message yourself. This is normally done by including a unique ID in the initial message and sending a separate response message on a different topic that also contains that ID.
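The unique-ID response pattern described above can be sketched without a real broker. This is a minimal illustration only: InMemoryBroker is a toy stand-in written for this example (it is not part of paho), and the response topic name "TEMPERATURE/ack" is an assumption.

```python
import json
import uuid
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for an MQTT broker: hands each message to every topic subscriber."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, payload):
        for callback in list(self.subscribers[topic]):
            callback(payload)


broker = InMemoryBroker()
pending = {}      # message ID -> value still waiting for a response
confirmed = []    # message IDs the consumer has responded to


def consumer_on_message(payload):
    message = json.loads(payload)
    # ... process message["value"] here, then respond with the same ID ...
    broker.publish("TEMPERATURE/ack", json.dumps({"id": message["id"]}))


def producer_on_ack(payload):
    ack_id = json.loads(payload)["id"]
    # Only count responses for messages that are actually outstanding.
    if pending.pop(ack_id, None) is not None:
        confirmed.append(ack_id)


broker.subscribe("TEMPERATURE", consumer_on_message)
broker.subscribe("TEMPERATURE/ack", producer_on_ack)

message_id = str(uuid.uuid4())
pending[message_id] = 7
broker.publish("TEMPERATURE", json.dumps({"id": message_id, "value": 7}))
```

With a real client the two callbacks would be registered as message handlers on separate connections, and anything left in pending after a timeout would be treated as unconfirmed.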
To ensure your message will get delivered you can use QoS. This can be set when publishing or subscribing. So for your case you will want either QoS 1 or 2. QoS 2 ensures it will reach the broker exactly once when publishing, and if subscribed at QoS 2 it will ensure the subscriber gets the message exactly once. Note though QoS 2 is the slowest form of publishing and subscribing. I find the most common way to deal with messages is with QoS 1, and then in your subscribe on_message you can determine how to deal with duplicate messages yourself. The paho mqtt client allows you to set QoS when publishing or subscribing but it defaults to 0. I've updated your code below.
# publisher
import paho.mqtt.client as mqtt
from random import randrange, uniform
import time
import json
mqttBroker ="mqtt.eclipse.org"
client = mqtt.Client("Temperature_Inside")
client.connect(mqttBroker)
id = 1
while True:
    randNumber = randrange(10)
    message_dict = { 'id': id, 'value': randNumber }
    client.publish("TEMPERATURE", json.dumps(message_dict), 1)
    print("Just published " + str(randNumber) + " to topic TEMPERATURE")
    id += 1
    time.sleep(1)
# subscriber
import paho.mqtt.client as mqtt
import time
import json
from datetime import datetime
parsed_messages = {}
def on_message(client, userdata, message):
    json_body = json.loads(str(message.payload.decode("utf-8")))
    msg_id = json_body['id']
    if msg_id in parsed_messages:
        # Duplicate delivery (possible with QoS 1): report when it was first seen
        print("Message already received at: ", parsed_messages[msg_id].strftime("%H:%M:%S"))
    else:
        print("received message: ", json_body['value'])
        parsed_messages[msg_id] = datetime.now()
mqttBroker ="mqtt.eclipse.org"
client = mqtt.Client("Smartphone")
client.connect(mqttBroker)
client.loop_start()
client.subscribe("TEMPERATURE", 1)
client.on_message=on_message
time.sleep(1)
client.loop_stop()
Note that it is important that the subscriber also requests QoS 1 when subscribing. Otherwise it will subscribe with QoS 0, which is the default for the paho client, and the message will be downgraded to QoS 0, meaning it will be delivered at most once (and may not be delivered at all if a packet is dropped).
As stated, the above only ensures that the message is received by the subscriber. If you want the publisher to be notified when the subscriber has processed the message, the subscriber will need to publish to a new topic (with some uuid) when it has finished processing, and the publisher can subscribe to that topic. However, when I see this being done I often question why MQTT is used at all, rather than just sending HTTP requests.
Here is a good link on MQTT QoS if you're interested (although it lacks detail on what happens from subscriber side).

How to efficiently produce messages out of a collection to Kafka

In my Scala (2.11) stream application I am consuming data from one queue in IBM MQ and writing it to a Kafka topic that has one partition. After consuming the data from MQ, the message payload is split into 3000 smaller messages that are stored in a sequence of Strings. Each of these 3000 messages is then sent to Kafka (version 2.x) using KafkaProducer.
How would you send those 3000 messages?
I can't increase the number of queues in IBM MQ (not under my control) nor the number of partitions in the topic (ordering of messages is required, and writing a custom partitioner will impact too many consumers of the topic).
The Producer settings are currently:
acks=1
linger.ms=0
batch.size=65536
But optimizing them is probably a question of its own and not part of my current problem.
Currently, I am doing
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
private lazy val kafkaProducer: KafkaProducer[String, String] = new KafkaProducer[String, String](someProperties)
val messages: Seq[String] = Seq(String1, …, String3000)
for (msg <- messages) {
  val future = kafkaProducer.send(new ProducerRecord[String, String](someTopic, someKey, msg))
  val recordMetadata = future.get()
}
To me this does not look like the most elegant or efficient way. Is there a programmatic way to increase throughput?
edit after answer from #radai
Thanks to the answer pointing me in the right direction, I had a closer look at the different Producer methods. The book Kafka - The Definitive Guide lists these methods:
Fire-and-forget
We send a message to the server and don’t really care if it arrives successfully or not. Most of the time, it will arrive successfully, since Kafka is highly available and the producer will retry sending messages automatically. However, some messages will get lost using this method.
Synchronous send
We send a message, the send() method returns a Future object, and we use get() to wait on the future and see if the send() was successful or not.
Asynchronous send
We call the send() method with a callback function, which gets triggered when it receives a response from the Kafka broker.
And now my code looks like this (leaving out error handling and the definition of Callback class):
val asyncProducer = new KafkaProducer[String, String](someProperties)
for (msg <- messages) {
  val record = new ProducerRecord[String, String](someTopic, someKey, msg)
  asyncProducer.send(record, new compareProducerCallback)
}
asyncProducer.flush()
I have compared all the methods for 10000 very small messages. Here are my measurements:
Fire-and-forget: 173683464 ns (about 0.17 s)
Synchronous send: 29195039875 ns (about 29.2 s)
Asynchronous send: 44153826 ns (about 0.04 s)
To be honest, there is probably more potential to optimize all of them by choosing the right properties (batch.size, linger.ms, ...).
the biggest reason i can see for your code to be slow is that you're waiting on every single send future.
kafka was designed to send batches. by sending one record at a time you're waiting a full round-trip time for every single record and you're not getting any benefit from compression.
the "idiomatic" thing to do would be send everything, and then block on all the resulting futures in a 2nd loop.
also, if you intend to do this i'd bump linger back up (otherwise your 1st record would result in a batch of size one, slowing you down overall. see https://en.wikipedia.org/wiki/Nagle%27s_algorithm) and call flush() on the producer once your send loop is done.
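the send-everything-then-block pattern can be sketched like this. it is a minimal illustration in Python, not Kafka code: FakeProducer is a hypothetical stand-in whose send() returns a future immediately, and the returned "offset" is just a counter.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class FakeProducer:
    """Hypothetical stand-in for a Kafka producer: send() returns a Future immediately."""

    def __init__(self):
        # A single worker processes submissions in order, like one partition.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._offset = 0

    def _deliver(self, value: str) -> int:
        offset = self._offset
        self._offset += 1
        return offset

    def send(self, value: str) -> Future:
        return self._pool.submit(self._deliver, value)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


producer = FakeProducer()
messages = [f"message-{i}" for i in range(3000)]

# First loop: hand every record to the producer without waiting.
futures = [producer.send(msg) for msg in messages]

# Second loop: block on the results only after everything has been sent.
offsets = [future.result() for future in futures]
producer.close()
```

the point is only the shape of the two loops: nothing waits on a result until all records have been handed over, so the producer is free to batch them.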

How I can prioritize mailbox in untyped Actor?

Problem:
An actor processes all messages from its mailbox in FIFO order.
Suppose we want to kill an actor by sending it a MyPoisonPill message. The actor still handles the messages already in its mailbox until it is MyPoisonPill's turn.
Question:
How can I prioritize messages in an actor's mailbox?
UPD:
Let's treat the poison pill as my own message type, because I am not sure whether Akka's PoisonPill has any priority in the mailbox.
There are different strategies for how messages are delivered. You can use a BoundedPriorityMailbox to assign priorities to your messages.
Other types of mailboxes are listed at https://doc.akka.io/docs/akka/2.5/mailboxes.html#builtin-mailbox-implementations
An example implementation is given at https://blog.knoldus.com/how-to-create-a-priority-based-mailbox-for-an-actor/
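To show what a priority mailbox does conceptually, here is a minimal sketch in Python. It is not Akka's implementation or API: PriorityMailbox and priority_of are hypothetical names, and the rule "MyPoisonPill first, everything else after" is an assumption taken from the question.

```python
import heapq
import itertools


class PriorityMailbox:
    """Queue that hands out the lowest-priority-number message first."""

    def __init__(self, priority_of):
        self._priority_of = priority_of
        self._heap = []
        # The counter breaks ties so equal-priority messages stay in FIFO order.
        self._counter = itertools.count()

    def enqueue(self, message):
        entry = (self._priority_of(message), next(self._counter), message)
        heapq.heappush(self._heap, entry)

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def priority_of(message):
    # Lower number means higher priority.
    return 0 if message == "MyPoisonPill" else 1


mailbox = PriorityMailbox(priority_of)
for message in ["work-1", "work-2", "MyPoisonPill", "work-3"]:
    mailbox.enqueue(message)

order = [mailbox.dequeue() for _ in range(len(mailbox))]
```

Here the pill overtakes the two work messages that were enqueued before it, which is the behaviour the question asks for.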

How can I send a message to my akka actor system's event stream without addressing the message to any actor in particular?

I'm interested in implementing:
1. an akka actor A that sends messages to an event stream;
2. an akka actor L that listens to messages of a certain type that have been published on the event stream.
If possible, I would like to reuse the actor system's event stream.
I know how to do 2. It is explained here: https://doc.akka.io/docs/akka/2.5/event-bus.html#event-stream
But how can I do 1?
I know how to make A send a message addressed to another actor(Ref), but I do not want to address the message to any particular actor(Ref). I just want the message to appear in the event stream and be picked up by whoever is listening to messages of that type. Is this possible somehow?
A side-question: if I implement 2 as described in https://doc.akka.io/docs/akka/2.5/event-bus.html#event-stream, does the listener know who sent the message?
As per the documentation link that you posted you can publish messages to the EventStream:
system.eventStream.publish(Jazz("Sonny Rollins"))
The message will be delivered to all actors that have subscribed to this message type:
system.eventStream.subscribe(jazzListener, classOf[Jazz])
For the subscribers to know the sender, I suggest you define an ActorRef field in your payload and the sending actor can put its self reference in it when publishing the message. NB Defining the sender's ActorRef explicitly in the message type is how the new akka-typed library deals with all actor interactions, so it's a good idea to get used to this pattern.
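The two ideas in this answer, publishing without an addressee and carrying the sender in the payload, can be sketched as follows. This is a simplified Python illustration, not Akka's EventStream: the EventStream class here is a toy, and the sender field and the value "actor-A" are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Jazz:
    artist: str
    sender: str    # hypothetical field identifying the publisher


class EventStream:
    """Toy event bus: subscribers register for a class and receive matching events."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event):
        # Deliver to handlers registered for the event's class or one of its base classes.
        for event_type, handlers in self._subscribers.items():
            if isinstance(event, event_type):
                for handler in handlers:
                    handler(event)


received = []
stream = EventStream()
stream.subscribe(Jazz, received.append)

stream.publish(Jazz("Sonny Rollins", sender="actor-A"))
stream.publish("not a Jazz event")    # ignored: nobody subscribed to str
```

The publisher never names a recipient, and the listener learns who published only because the payload says so.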

Akka, Camel and ActiveMQ: throttling consumers

I've got a very basic skeleton Scala application (with Akka, Camel and ActiveMQ) where I want to publish onto an ActiveMQ queue as quickly as possible, but then only consume from that queue at a particular rate (eg. 1 per second).
Here's some code to illustrate that:
MyProducer.scala
class MyProducer extends Actor with Producer with Oneway {
  def endpointUri = "activemq:myqueue"
}
MyConsumer.scala
class MyConsumer extends Actor with Consumer {
  def endpointUri = "activemq:myqueue"
  def receive = {
    case msg: CamelMessage => println("Ping!")
  }
}
In my main method, I then have all the boilerplate to set up Camel and get it talking to ActiveMQ, and then I have:
// Start the consumer
val consumer = system.actorOf(Props[MyConsumer])
val producer = system.actorOf(Props[MyProducer])
// Imagine I call this line 100+ times
producer ! "message"
How can I make it so that MyProducer sends things to ActiveMQ as quickly as possible (ie. no throttling) whilst making sure that MyConsumer only reads a message every x seconds? I'd like each message to stay on the ActiveMQ queue until the last possible moment (ie. when it's read by MyConsumer).
So far, I've managed to use a TimerBasedThrottler to consume at a certain rate, but this still consumes all of the messages in one big go.
Apologies if I've missed something along the way, I'm relatively new to Akka/Camel.
How many consumers make up "MyConsumer"?
a) If there is only one, it is unclear why a simple sleep between reading/consuming messages would not work.
b) If there are multiple consumers, which behavior do you require?
- Each consumer is throttled to the specified consumption rate. In that case each consumer thread still behaves as described in a).
- The overall pool of consumers is throttled to the consumption rate. In that case a central throttler would need to track the inter-message delay and block each consumer until the required delay has elapsed. There would be the added complexity of handling backlogs, to allow "catch-up". You probably get the drift here.
It may be that you were looking for something else or more specific in this question. If so, please elaborate.
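The central throttler for a pool of consumers could look roughly like the sketch below. It is a minimal Python illustration under stated assumptions, not Akka or Camel code: Throttler and acquire are hypothetical names, the clock and sleep functions are injectable only so the example can run without real waiting, and backlog "catch-up" is deliberately not handled.

```python
import threading
import time


class Throttler:
    """Shared gate that makes callers wait until the inter-message delay has elapsed."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = clock()

    def acquire(self):
        """Block until this caller may consume one message; return the time waited."""
        with self._lock:
            now = self._clock()
            wait = max(0.0, self._next_allowed - now)
            # Reserve the next slot before releasing the lock.
            self._next_allowed = max(now, self._next_allowed) + self._min_interval
        if wait > 0:
            self._sleep(wait)
        return wait


# Demonstration with a fake clock so no real time passes.
fake_now = [0.0]
slept = []


def fake_sleep(seconds):
    slept.append(seconds)
    fake_now[0] += seconds


throttler = Throttler(1.0, clock=lambda: fake_now[0], sleep=fake_sleep)
waits = [throttler.acquire() for _ in range(3)]
```

Each consumer would call acquire() before reading the next message, so the message stays on the queue until its slot comes up. Sleeping outside the lock lets several consumers hold reserved slots at once.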