amazon sqs :read message not in order - zend-framework

i would like to take messages from amazon sqs in the same order in which it is inserted into sqs ( first in first out model).
Is their any way to implement it??
I am using zend php for programing.

Unordered message delivery is inherent in the design of SQS. You could try to work around it by numbering the messages and storing the out-of-order messages locally until the missing messages arrive, but its probably not worth the hassle.
SQS is really a bit of an odd duck, it does what it says, but what it does isn't what most people are looking for in a message bus. I really wish Amazon would offer and additional queuing solution more like RabbitMQ. SQS is really only suited for distributing tasks that aren't even remotely coupled, and where things like order and latency aren't important. For instance it would be great for sending completed orders to a shipping center, or perhaps scheduling print jobs.
Their own documentation shows it being used to schedule thumbnail creation, but when I recently used it for this exact purpose I quickly discovered that my users weren't going to be impressed with the latency: which at times is 30-50 seconds.
You can still run RabbitMQ on EC2 nodes, and while not as scalable as SQS it does cluster and should take you pretty far.

You could try IronMQ. It is hosted like SQS, has guaranteed first in first out ordering, no eventual consistency delays, is uber scalable and you can be up and running in minutes.
Here's a PHP library for it: https://github.com/iron-io/iron_mq_php
Disclaimer: I work for Iron.io

The SQS documentation answers this for you (bold is my emphasis to directly answer your question):
Amazon SQS makes a best effort to preserve order in messages, but due
to the distributed nature of the queue, we cannot guarantee you will
receive messages in the exact order you sent them. If your system
requires that order be preserved, we recommend you place sequencing
information in each message so you can reorder the messages upon
receipt.

I have tried to implement the FIFO fashion for receiving the messages in the same order they were sent
For this you can use message sequence no which it sent every time with message and validate at the receiver end
By Using this way you can get desired output in FIFO order

Related

Monitor if Kafka is up?

I need to simply monitor if my Kafka cluster is up. Occasionally the machines running Kafka were shutdown. I want to send an email alert if the cluster is not available.
I can create a producer and consumer to send and receive dummy messages periodically. Is there a simpler way to do it?
You can use https://github.com/obsidiandynamics/kafdrop
It won't send you emails, but it much easier than send dummy messages
Actually knowing if cluster is up is not so easy at all, there is discussion with community what is the best practice to decide if kafka cluster is up and active but there is no current good way to get this information, as kafka architecture is distributed system, you might have big clusters and while one or more brokers are down , still having your cluster to give high available service, not effecting the integrity of data. Also you might have problems with one topic while on other topics it might work fine.
One suggestion I read which might give you the most certain approach is to produce "dummy" msgs to your applicative topics, and "skip" these msgs on consumption, that guarantee you that your application would work. I don't like this approach very much as it requires to "send junk to your main topics"
Other approaches are like you say "produce/consume to/from test/healthcheck topic" but it is might not give full guarantee that your application would work, this is a lot like select from dummy in other db approaches... if for them is good enough....
Another suggestion is to use AdminClient to read the metrics of cluster, if metrics are provided that usually means the cluster is healthy , also not very good guarantee...
I asked in comment which language are you using, maybe you are using something like spring which has HealthIndicator to check component status, but for your case it would be little different.
First of all, you should know that Kafka by default should be High
Available, so while building the cluster you should follow the bold
lines of best practices, you should ensure that you have replicas of
machines. This is good assumption that will make you satisfied over implementing all of this.
But, if you want to check health of a cluster, you can use admin process, you can use AdminClient, with help of some utilities; you can check list of topics, groups, etc that you have. But this not 100% guarantee for you although it is good workaround.
You can do that using as you mentioned periodic scheduler, and send email based on the findings you get. But again this is not the ideal solution, and HA cluster infrastructure should save lots of time for you if you build it correctly from the beginning.

Mixing communication methods for microservices

I am working on a project which is actually will be a better version of an old project. We want it to be scalable to be able to deal with high load. So we decided to go with microservices instead of monolithic. Then I started to do research about microservices, how they communicate, common design patterns and other things. Since I want my services to be scalable, event based communication made sense to me. So I decided to use kafka for this purpose.
We have much more services in the system but to simplify my question lets say I have 2 types of services which are work-node and master-node. I want both of them to be scalable. For now they are communicating over kafka.
My question : for a case I want to publish an event (produce a message on a topic) from master-node and get that event (consume from the topic) from all work-nodes. But for an other case I need to send a message to specific work-node. To be able to cover first case, all my work-nodes have different group ids in kafka and when a message published on a topic they all get that message. I know that I am not able to send a message to specific consumer with kafka. Since my nodes are scalable and their number can increase or decrease depending on the load, creating a topic for each node does not seem a good idea. My first solution was adding work-node id in message. So other work-nodes can ignore that message. Well it works but I don't think it is a good solution. My second solution is sending http request if I am going to send a message to specific node. But I don't know mixing 2 communication methods is a good solution.
What do you guys think about this problem. Is there a better solution that I am missing ? Or my whole design is going wrong ?
Kafka is not an appropriate technology for the use case you describe. I would recommend using Cadence Workflow which natively supports routing tasks to specific nodes as well as dozens of other features that messaging systems lack.
Feel free to join Cadence Workflow slack channel if you have specific questions.
I think you should able to. Consider regular Kafka flow. You have some consumer groups subscribed to the topic. Producer doesn't send message to specific partition until you specify.
Now think about the scenario that you produce some message based on your algorithm to the specific partitions.
Message received from A
some kind of algorithm like hashcode generated always 0 for A
Message send to Partition 0
Consumer 1 connected to Partiton 0
Only Consumer 1 gets the message coming from A

Pub/Sub and consumer aware publishing. Stop producing when nobody is subscribed

I'm trying to find a messaging system that supports the following use case.
Producer registers topic namespace
client subscribes to topic
first client triggers notification on producer to start producing
new client with subscription to the same topic receives data (potentially conflated, similar to hot/cold observables in RX world)
When the last client goes away, unsubscribe or crash, notify the producer to stop producing to said topic
I am aware that according to the pub/sub pattern A producer is defined to be blissfully unaware of the existence of consumers, meaning that my use-case simply does not fit the pub/sub paradigm.
So far I have looked into Kafka, Redis, NATS.io and Amazon SQS, but without much success. I've been thinking about a few possible ways to solve this, Haven't however found anything that would satisfy my needs yet.
One option that springs to mind, for bullet 2) is to model a request/reply pattern as amongs others detailed on the NATS page to have the producer listen to clients. A client would then publish a 'subscribe' message into the system that the producer would pick up on a wildcard subscription. This however leaves one big problem, which is unsubscribing. Assuming the consumer stops as it should, publishing an unsubscribe message just like the subscribe would work. But in the case of a crash or similar this won't work.
I'd be grateful for any ideas, references or architectural patterns/best practices that satisfy the above.
I've been doing quite a bit of research over the past week but haven't come across any satisfying Q&A or articles. Either I'm approaching it entirely wrong, or there just doesn't seem to be much out there which would surprise me as to me, this appears to be a fairly common scenario that applies to many domains.
thanks in advance
Chris
//edit
An actual simple use-case that I have at hand is stock quote distribution.
Quotes come from external source
subscribe to stock A quotes from external system when the first end-user looks at stock A
Stop receiving quotes for stock A from external system when no more end-users look at said stock
RabbitMQ has internal events you can use with the Event Exchange Plugin. Event such as consumer.created or consumer.deleted could be use to trigger some actions at your server level: for example, checking the remaining number of consumers using RabbitMQ Management API and takes action such as closing a topic, based on your use cases.
I don't have any messaging consumer present based publishing in mind. Got ever worst because you'll need kind of heartbeat mechanism to handle consumer crashes.
So here are my two cents, not sue if you're looking for an out of the box solution, but if not, you could wrap your application around a zookeeper cluster to handle all your use cases.
Simply use watchers on ephemeral nodes to check when you have no more consumers ( including crashes) and put some watcher around a 'consumers' path to be advertised when you get consumers.
Consumers side, you would have to register your zk node ID whenever you start it.
It's not so complicated to do, and zk is not the only solution for this, you might use other consensus techs as well.
A start for zookeeper :
https://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html
( strongly advise to use curator api, which handle lot of recipes in a smooth way)
Yannick
Unfortunately you haven't specified your use business use case that you try to solve with such requirements. From the sound of it you want not the pub/sub system, but an orchestration one.
I would recommend checking out the Cadence Workflow that is capable of supporting your listed requirements and many more orchestration use cases.
Here is a strawman design that satisfies your requirements:
Any new subscriber sends an event to a workflow with a TopicName as a workflowID to subscribe. If workflow with given ID doesn't exist it is automatically started.
Any subscribe sends another signal to unsubscribe.
When no subscribers are left workflow exits.
Publisher sends an event to the workflow to deliver to subscribers.
Workflow delivers the event to the subscribers using an activity.
If workflow with given TopicName doesn't run the publish event to it is going to fail.
Cadence offers a lot of other advantages over using queues for task processing.
Built it exponential retries with unlimited expiration interval
Failure handling. For example it allows to execute a task that notifies another service if both updates couldn't succeed during a configured interval.
Support for long running heartbeating operations
Ability to implement complex task dependencies. For example to implement chaining of calls or compensation logic in case of unrecoverble failures (SAGA)
Gives complete visibility into current state of the update. For example when using queues all you know if there are some messages in a queue and you need additional DB to track the overall progress. With Cadence every event is recorded.
Ability to cancel an update in flight.
Distributed CRON support
See the presentation that goes over Cadence programming model.

Is a message queue like RabbitMQ the ideal solution for this application?

I have been working on a project that is basically an e-commerce. It's a multi tenant application in which every client has its own domain and the website adjusts itself based on the clients' configuration.
If the client already has a software that manages his inventory like an ERP, I would need a medium on which, when the e-commerce generates an order, external applications like the ERP can be notified that this has happened to take actions in response. It would be like raising events over different applications.
I thought about storing these events in a database and having the client make requests in a short interval to fetch the data, but something about polling and using a REST Api for this seems hackish.
Then I thought about using Websockets, but if the client is offline for some reason when the event is generated, the delivery cannot be assured.
Then I encountered Message Queues, RabbitMQ to be specific. With a message queue, modeling the problem in a simplistic manner, the e-commerce would produce events on one end and push them to a queue that a clients worker would be processing as events arrive.
I don't know what is the best approach, to be honest, and would love some of you experienced developers give me a hand with this.
I do agree with Steve, using a message queue in your situation is ideal. Message queueing allows web servers to respond to requests quickly, instead of being forced to perform resource-heavy procedures on the spot. You can put your events to the queue and let the consumer/worker handle the request when the consumer has time to handle the request.
I recommend CloudAMQP for RabbitMQ, it's easy to try out and you can get started quickly. CloudAMQP is a hosted RabbitMQ service in the cloud. I also recommend this RabbitMQ guide: https://www.cloudamqp.com/blog/2015-05-18-part1-rabbitmq-for-beginners-what-is-rabbitmq.html
Your idea of using a message queue is a good one, better than database or websockets for the reasons you describe. With the message queue (RabbitMQ, or another server/broker based system such as Apache Qpid) approach you should consider putting a broker in a "DMZ" sort of network location so that your internal ecommerce system can push events out to it, and your external clients can reach into without risking direct access to your core business systems. You could also run a separate broker per client.

Is Kafka ready for production use?

I have an application in production that has to process several gigabytes of messages per day. I like the Kafka architecture and performance a lot; it perfectly fits my needs.
I'd like to replace my messaging layer with Kafka at some point. Is the 0.7.1 version good enough for production use in terms of stability and consistency in performance?
It is definitely in use at several Big Data companies already, including LinkedIn, where it was created (and later open sourced), and Tumblr. Just Tumblr by itself handles many gigabytes of messages per day. I'm sure LinkedIn is way up there too. You can see a list of companies known to currently use it here:
https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
Also, be sure to subscribe to their mailing list, there are lots of people actively trying it out and using it in production environments.
I'm sure it can handle whatever volume you can throw at it.
There is one critical feature I think Kafka is missing before it is ready for production.
"Flushing messages to disc if the producer can't reach any Kafka broker"
The issue has been filed a long time ago here:
https://issues.apache.org/jira/browse/KAFKA-156
This feature will makes the complete Kafka event pipline even more robust for some use-cases when the producer always has to be able to send events. For example when you track pageviews or like-button clicks and you don't want to miss any events, even if all Kafka brokers are unreachable.
I must agree with Dave, Kafka is a good tool but it missing some basic features which some can be done manually but then you need to think what Kafka provide. some missing things are:
(As Dave said) Flushing messages to disk when the producer fail to send them
Consumers ability to track which messages were handled (not just consumed) and which wasn't in case of a restart.
Monitoring - a way to receive the current status of the entities in the system like the current size of the queue in the producer or the write\read pace at the brokers (those can be done but are not part of the tool).
I have used kafka for quite sometime. Using native java and python clients would be preferred.
I had to struggle a lot finding a proper node.js client. literally re-wrote my whole code many a times using different clients as they had lot of bugs.
Finally settled with franz-kafka for node.js.
Apart from that maintaining the consumer offsets is a bit difficult. It is missing some good features like exchanges that exist in AMQP based Apache Qpid or RabbitMQ
Since it's distributed, supports offline messages and the performance is really impressive. I too preferred it :)