Does receiving a message ID (future) indicate the message was sent in Pub/Sub? - publish-subscribe

I am sending messages to Google Pub/Sub and reading back the message ID from the future in a callback function, using the following code:
"""Publishes multiple messages to a Pub/Sub topic with an error handler."""
import time
from google.cloud import pubsub_v1
# ...
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_name)
def get_callback(f, data):
def callback(f):
try:
print(f.result())
except: # noqa
print('Please handle {} for {}.'.format(f.exception(), data))
return callback
for i in range(10):
data = str('message')
# When you publish a message, the client returns a future.
future = publisher.publish(
topic_path, data=data.encode('utf-8') # data must be a bytestring.
)
# Publish failures shall be handled in the callback function.
future.add_done_callback(get_callback(future, data))
print('Published message with error handler.')
I receive the message ID back successfully, with no errors or exceptions; however, I find that some messages do not appear in Pub/Sub (when viewing them from the GCP console).
The message ID is printed by the line print(f.result()) within the callback function.
My question is:
Is it safe to assume that a message has been sent to Pub/Sub successfully once a message ID has been received?
If so, what could be the cause of the 'dropped' messages?

If publish has returned successfully with a message ID, then yes, Cloud Pub/Sub guarantees to deliver the message to subscribers. If you do not see the message, there are several things that could cause this:
The subscription did not exist at the time the message was published. Cloud Pub/Sub only delivers messages to subscribers if the subscription was created before the message was published.
Another subscriber for the subscription was already running and received the message. If you are using the GCP console to fetch messages while a subscriber is up and running, it is possible that that subscriber got the messages.
If you got the message on the console once and then reloaded to get the messages again, you may not see the message again until the ack deadline has passed, which defaults to 10 seconds.
A single pull request (which is what the GCP console "View Messages" feature does) may not be sufficient to retrieve the message. If you click it multiple times or start up a basic subscriber (see the sketch below), you will likely see the message.
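For the last case, here is a minimal subscriber sketch using the Python client (the project and subscription IDs are placeholders; assumes the google-cloud-pubsub library):

from google.cloud import pubsub_v1

project_id = 'my-project'            # placeholder
subscription_id = 'my-subscription'  # placeholder

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_id)

def callback(message):
    print('Received: {}'.format(message.data))
    message.ack()

# Unlike the console's single pull, a streaming pull keeps receiving
# messages until it is cancelled.
streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
try:
    streaming_pull_future.result(timeout=30)
except Exception:
    streaming_pull_future.cancel()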

Related

How to list/view messages on subscription in Gcloud pubsub?

I can acknowledge all messages on a subscription as follows:
gcloud pubsub subscriptions pull --auto-ack --limit=999 my-sub
(Although I often have to run this repeatedly before all messages are acknowledged).
However, I don't want to acknowledge them; I just want to see all unacknowledged messages (even just a count of how many unacknowledged messages there are would be helpful).
I thought it might be:
gcloud pubsub subscriptions pull --limit=1 my-sub
But when I run this command it shows a different message every time.
I did this in the hope that I could run:
gcloud pubsub subscriptions pull --limit=999 my-sub
To see all unacknowledged messages.
You can use gcloud pubsub subscriptions pull to get messages for a subscription. If you do not pass in --auto-ack, then some of the messages should be displayed when you make the call. These messages will likely not be returned in subsequent requests until the ack deadline has passed since they will be considered outstanding when returned to the gcloud command. That is why you see a different message every time you call with --limit=1.
Additionally, setting the limit for the call to pull does not guarantee that all messages will be returned in a single response, only that no more than that many messages will be returned. If you want to see all messages, you'll have to run the command repeatedly, but you won't necessarily be able to see all of them in a deterministic way.
You would probably be better off writing a little subscriber app that receives messages and ultimately nacks them (or just lets the ack deadline expire if you aren't worried about the messages being delivered quickly to another subscriber). For example:
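A minimal sketch of such a peek-without-acking subscriber using the Python client (the project ID is a placeholder):

from google.cloud import pubsub_v1

project_id = 'my-project'  # placeholder
subscription_id = 'my-sub'

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_id)

def callback(message):
    # Print the message, then nack it so it remains unacknowledged
    # and becomes available for redelivery.
    print(message.data)
    message.nack()

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
try:
    streaming_pull_future.result(timeout=30)
except Exception:
    streaming_pull_future.cancel()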
Unfortunately, there is no direct command to get the unacknowledged messages or the number of unacknowledged messages in Pub/Sub.
However, you can use Cloud Monitoring and the per-subscription metric
pubsub.googleapis.com/subscription/num_undelivered_messages.
Cloud Monitoring has an API, so you can get this number programmatically.
This is how to do it:
You can get the values via the projects.timeSeries.list method. Set the name to projects/<your project> and use this filter:
metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages"
Or, if you want a specific subscription, you can add this filter:
resource.label.subscription_id = "<subscription name>".
The result will be one or more TimeSeries objects whose points field contains the data points for the specified time range, with each point's int64 value set to the number of messages not yet acknowledged by subscribers.
You can also see the official documentation for an introduction to the Cloud Monitoring API and for monitoring your API usage.
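Putting this together, a minimal sketch using the Python client library google-cloud-monitoring (the project ID is a placeholder; assumes the v2+ client):

import time

from google.cloud import monitoring_v3

project_id = 'my-project'  # placeholder

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
# Look back five minutes so at least one sampled point is included.
interval = monitoring_v3.TimeInterval(
    {
        'end_time': {'seconds': now},
        'start_time': {'seconds': now - 300},
    }
)
results = client.list_time_series(
    request={
        'name': 'projects/{}'.format(project_id),
        'filter': 'metric.type = '
                  '"pubsub.googleapis.com/subscription/num_undelivered_messages"',
        'interval': interval,
        'view': monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    subscription = series.resource.labels['subscription_id']
    latest = series.points[0].value.int64_value  # points are newest first
    print('{}: {} undelivered messages'.format(subscription, latest))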

Is there a way to explicitly acknowledge message receipt with QuickFIX/J?

For a guaranteed message receiver in an ACK-based protocol such as Apache Kafka, TIBCO EMS/RVCM, IBM MQ or JMS, there is a way to explicitly acknowledge the receipt of a message. Explicit acks are not sent automatically when you return from a dispatcher's callback; there is an extra method on the session or message to say "I've processed this message". The reason this explicit ack exists is that you can safely queue received messages to be processed by another thread later, and only call the explicit-ack method once you are really done processing the message (safely storing it to a DB, forwarding it to another MOM, etc.). Having this explicit method ensures that you do not lose messages even if you crash after receiving messages but before processing them.
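As a concrete illustration of this explicit-ack style (not QuickFIX/J; this reuses the Google Pub/Sub Python client from earlier on this page, with placeholder IDs and a hypothetical process() function), the callback can enqueue the message and a worker acks only after processing finishes:

import queue
import threading

from google.cloud import pubsub_v1

work_queue = queue.Queue()

def callback(message):
    # Do NOT ack here; hand the message off to a worker thread instead.
    work_queue.put(message)

def worker():
    while True:
        message = work_queue.get()
        process(message)   # hypothetical: store to DB, forward to another MOM, ...
        message.ack()      # explicit ack only after the real processing is done

threading.Thread(target=worker, daemon=True).start()

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path('my-project', 'my-subscription')  # placeholders
streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)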
Now with QuickFIX/J (or FIX in general) I know it's not ACK-based; instead it persists the last received SeqNum in a file, and rather than sending acks, message guarantees are achieved by sending ResendRequests for missed SeqNums. But still, is there a way to tell the QuickFIX/J API "I don't want you to automatically persist this last SeqNum once I exit this onMessage() callback, but to hold off until I tell you so"? In other words, is there a Session variation which doesn't persist SeqNums automatically, so that I can then call something on the FIX message to persist the last SeqNum once I've really processed/saved that message?
(If this feature doesn't exist I think it would be a good addition to the API)

Sending response in a Pub-Sub architecture

At the moment I have a single AWS EC2 instance which handles all incoming http client requests. It analyses each request and then decides which back end worker server should handle the request and then makes a http call to the chosen server. The back end server then responds when it has processed the request. The front end server will then respond to the client. The front end server is effectively a load balancer.
I now want to move to a pub-sub architecture instead of having the front end server push the requests to the back end instances. The front end server will do some basic processing and then simply put the request into an SNS queue, and the logic of which back end server should handle the request is left to the back end servers themselves.
My question is with this model what is the best way to have the back end servers notify the front end server that they have processed the request? Previously they just replied to the http request the front end server sent but now there is no direct request, just an item of work being published to a queue and a back end instance picking it off the queue.
Pub-sub architectures are not well suited to responses/acknowledgements. Their fire-and-forget broadcasting pattern decouples publishers and subscribers: a publisher does not know if or how many subscribers there are, and the subscribers do not know which publisher generated a message. It can also be difficult to guarantee the sequence of responses: they won't necessarily match the sequence of messages, due to the nature of network communication, handling times that vary per message, and so on. So each message that needs to be acknowledged needs a unique ID that the subscriber can include in its response, so the publisher can match a response with the message sent. For example:
1. The publisher sends a "new event" message and attaches a UUID to the event.
2. Many subscribers get the message; some may be the handlers for the request, but others might be observers, loggers, analytics, etc.
3. If only one subscriber handles the message (e.g. the first subscriber to get a key from somewhere), that subscriber generates a "new event handled" message and includes the UUID.
4. The original publisher, as well as any number of other subscribers, may get that message.
5. The original publisher sees that the ID is in its cache as an unconfirmed message, and now marks it as confirmed.
6. If a certain amount of time passes without receiving a confirmation with a given ID, the original publisher republishes the original message, with a new ID, and removes the old ID from the cache.
In step 3, if many subscribers handled the message instead of just one, then it is less obvious how the original publisher should handle "responses": how does it know how many subscribers will handle the message? Some could be down or too busy to respond, and some may still be in the process of responding by the time the original publisher determines that "not enough handlers have responded".
Publish-subscribe architectures should be designed not to expect a response, but instead to check for some condition that should have happened as a result of the command being handled, such as a thumbnail having been generated (which the publisher can take as evidence that some handler processed the message).

Handling REST request with Message Queue

I have two applications as stated below:
Spring Boot application - acts as a REST endpoint and publishes the request to the message queue (Apache Pulsar).
Heron (Storm) topology - processes the messages received from the message queue (Pulsar) and holds all of the processing logic.
My requirement: I need to serve different user queries through the Spring Boot application, which emits each query to the message queue, where it is consumed at the spout. Once the spout and bolts have processed the request, a message is published back from a bolt. That response from the bolt is handled by the Spring Boot application (as a consumer), which replies to the user request. Typically as shown below:
To serve a response to the same request, right now I am caching the DeferredResult object in memory (I set a reqID on each message that is sent to the topology, and I also maintain a key/value pair for it), and when the response message arrives I parse the request ID and set the result on the DeferredResult (I know this is a bad design, HOW SHOULD ONE SOLVE THIS ISSUE?).
How can I serve the response back to the same request in this scenario, where the order of messages received from the topology is not sequential (each request takes its own time to process, and the producer bolt fires the response as and when it receives one)?
I'm kind of stuck with this design and not able to proceed further.
//Controller
public DeferredResult<ResponseEntity<?>> process(//someinput) {
    DeferredResult<ResponseEntity<?>> result = new DeferredResult<>(config.getTimeout());
    CompletableFuture<String> serviceResponse = service.processAsync(inputSource);
    serviceResponse.whenComplete((response, exception) -> {
        if (!ObjectUtils.isEmpty(exception))
            result.setErrorResult(//error);
        else
            result.setResult(//complete);
    });
    return result;
}

//In Service
public CompletableFuture<String> processAsync(//input) {
    producer.send(input);
    CompletableFuture<String> result = new CompletableFuture<>();
    //consumer has a listener as shown below
    // **I want to avoid the line below, how can I redesign this?**
    map.put(id, result);
    return result;
}

//In the same service, a listener is present on the consumer for reading the messages
consumerListener(Message msg) {
    int reqID = msg.getRequestID();
    map.get(reqID).complete(msg.getData());
}
As shown above, as soon as I receive a message I get the corresponding CompletableFuture object and set its result, which internally completes the DeferredResult and returns the response to the user.
How can I serve the response back to the same request in this scenario, where the order of messages received from the topology is not sequential (each request takes its own time to process, and the producer bolt fires the response as and when it receives one)?
It sounds like you are looking for the Correlation Identifier messaging pattern. In broad strokes, you compute/create an identifier that gets attached to the message sent to Pulsar, and arrange for Heron to copy that identifier from the request it receives to the response it sends.
Thus, when your Spring Boot component is consuming messages from pulsar at step 5, you match the correlation id to the correct http request, and return the result.
Using the original requestId() as your correlation identifier should be fine, as far as I can tell.
To serve a response to the same request, right now I am caching the DeferredResult object in memory (I set a reqID on each message that is sent to the topology, and I also maintain a key/value pair for it), and when the response message arrives I parse the request ID and set the result on the DeferredResult (I know this is a bad design, HOW SHOULD ONE SOLVE THIS ISSUE?).
Ultimately, you are likely to be doing that at some level; which is to say that the consumer at step 5 is going to be using the correlation id to look up something that was stored by the producer. Trying to pass the original request across four different process boundaries is likely to end in tears.
The more general form is to store a callback, rather than a CompletableFuture, in the map; but in this case the callback probably just completes the future.
The one thing I would want to check carefully in the design: you want to be sure that the consumer at step 5 sees the future it is supposed to use before the message arrives. In other words, there should be a happens-before memory barrier somewhere to ensure that the map lookup at step 5 doesn't fail.
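A language-neutral sketch of that shape (in Python; send() and the listener wiring are hypothetical; the point is that the callback is registered before publishing, so the lookup at step 5 cannot miss):

import threading
import uuid

callbacks = {}
callbacks_lock = threading.Lock()

def process_async(payload, on_response):
    correlation_id = str(uuid.uuid4())
    # Register the callback BEFORE publishing, which establishes the
    # happens-before ordering the listener's lookup relies on.
    with callbacks_lock:
        callbacks[correlation_id] = on_response
    send({'correlation_id': correlation_id, 'payload': payload})  # hypothetical producer call
    return correlation_id

def on_message(message):
    # Listener side: match the response back to the original request.
    with callbacks_lock:
        callback = callbacks.pop(message['correlation_id'], None)
    if callback is not None:
        callback(message['data'])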

How to handle OnComplete message with internal queuing reactive stream subscriber?

I'm using Akka-Stream 1.0 with a simple reactive stream:
A publisher sends N messages
A subscriber consumes the N messages
with
override val requestStrategy = new MaxInFlightRequestStrategy(max = 20) {
  override def inFlightInternally: Int = messageBacklog.size
}
The publisher closes the stream after N messages (dynamically) by sending an OnComplete message.
The subscriber receives the messages and then goes into the canceled state right away. The problem is that the subscriber needs some time to process each message, so I usually have a backlog of messages, which can no longer be processed once the subscriber is canceled (IMHO in ActorSubscriber.scala:195).
Processing a message means that my subscriber offloads the work to someone else (sending content back via Spray's ChunkedMessages) and gets an ack message back as soon as a message is completed. Since the actor is canceled, the ack message is never processed and the backlog is never drained.
What is the recommended way to let me complete the backlog?
I could 'invent' my own 'done marker', but that sounds very strange to me. Obviously my code works with MaxInFlightRequestStrategy and a max of 1, since then the demand is always only 1, meaning I never have a backlog of messages.
After long hours of debugging and experimenting, I think I understand what was (and is) going on; hopefully it saves other people some time:
I had a conceptual misunderstanding of how to implement a reactive subscriber:
I was spooling messages internally in an ActorSubscriber and releasing those spooled messages back to the business logic at the right time via self ! SpooledMessage, which threw off the subscriber's bookkeeping: each spooled message was counted twice as 'received', causing the internals to request even more messages from upstream.
Fixing this by processing the spooled messages within the actor itself resolved the problem, and also allowed me to use OnComplete properly: as soon as that message is received, the subscriber gets no new messages, but I process the internal queue on its own (without using self ! ...) and thus complete the whole stream processing.