I want to record the messages streamed on a NATS Streaming channel and replay them later on demand.
I want to build infrastructure to test my microservice app.
All of my microservices talk to each other over NATS Streaming, and I would like to "record" the stream of data, clean my database, and replay the stream to test whether the system stays consistent.
I saw that there is configuration for a file store or SQL store, but both are for storing the current state of NATS as described in the documentation. I also didn't find a way to "replay" that data.
Is there any way to do it?
Thanks!
When messages are published, they are stored in the channel (message log) they are published to. You can then start subscriptions that point to any sequence/time in that channel and replay messages. As for persistence, there is support for memory, file and SQL stores.
More on documentation:
Message logs
Subscriptions
Store implementations
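To make the replay part concrete, here is a minimal sketch with the Go client (stan.go); the cluster ID, client ID, and channel name are assumptions for a local setup:

```go
package main

import (
	"log"
	"time"

	stan "github.com/nats-io/stan.go"
)

func main() {
	// Connect to a local NATS Streaming server (cluster and client IDs are assumptions).
	sc, err := stan.Connect("test-cluster", "replay-client")
	if err != nil {
		log.Fatal(err)
	}
	defer sc.Close()

	handler := func(m *stan.Msg) {
		log.Printf("seq=%d data=%s", m.Sequence, string(m.Data))
	}

	// Replay everything stored in the "orders" channel from the beginning...
	sub, err := sc.Subscribe("orders", handler, stan.DeliverAllAvailable())
	if err != nil {
		log.Fatal(err)
	}
	defer sub.Unsubscribe()

	// ...or start from a specific sequence / point in time instead:
	//   sc.Subscribe("orders", handler, stan.StartAtSequence(100))
	//   sc.Subscribe("orders", handler, stan.StartAtTimeDelta(30*time.Minute))

	time.Sleep(5 * time.Second) // give the replay a moment to run in this sketch
}
```

The "recording" is simply the channel's message log, so as long as the channel's retention limits (max messages/bytes/age) are large enough to cover the window you care about, you can clean your database and re-drive the system from any starting sequence or time.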
I have a system that uses MongoDB for persistence and RabbitMQ as the message broker. My challenge is that I only want to implement the transactional outbox for RabbitMQ publish-failure scenarios. I'm not sure whether that is possible, because my consumers use the same MongoDB persistence: when I write code that covers the outbox only for publish failures, the published messages reach the consumers before MongoDB's commitTransaction, so the consumers can't find the message in MongoDB yet.
My code is something like the below:
1- start a session transaction
2- insert the document with the session (so it doesn't persist until I call commit)
3- publish to RabbitMQ
4- if the publish succeeds, commitTransaction
5- if it fails, insert into the outbox document with the session, then commitTransaction
6- if something went wrong on the MongoDB side, abortTransaction (if the publish succeeded and MongoDB failed, my consumers first check MongoDB for the document and, if it doesn't exist, do nothing)
So the problem is that messages reach the consumer before the MongoDB write is persisted. Can you advise a solution that covers this?
As far as I can tell, the architecture outlined in the picture at https://microservices.io/patterns/data/transactional-outbox.html maps directly to MongoDB change streams:
keep the transaction around the insert
insert into the outbox table in the transaction
set up a message relay process which opens a change stream on the outbox table and, for every inserted document, publishes a message to the message broker
The publication to the message broker can be retried, and the change stream reads can also be retried in case of errors. You need to track resume tokens correctly; see e.g. https://docs.mongodb.com/ruby-driver/master/reference/change-streams/#resuming-a-change-stream.
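As a rough illustration only (not production code), a relay along these lines could look like the following, using the official MongoDB Go driver and streadway/amqp; the database, collection, and queue names are made up, and the resume token is kept in memory rather than persisted:

```go
package main

import (
	"context"
	"log"

	"github.com/streadway/amqp"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()

	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	outbox := client.Database("app").Collection("outbox") // hypothetical names

	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		log.Fatal(err)
	}
	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}

	// Watch inserts on the outbox collection. A real relay would load the last
	// resume token from durable storage instead of starting from "now".
	var resumeToken bson.Raw
	opts := options.ChangeStream()
	if resumeToken != nil {
		opts.SetResumeAfter(resumeToken)
	}
	stream, err := outbox.Watch(ctx, mongo.Pipeline{}, opts)
	if err != nil {
		log.Fatal(err)
	}
	defer stream.Close(ctx)

	for stream.Next(ctx) {
		var event struct {
			FullDocument bson.M `bson:"fullDocument"`
		}
		if err := stream.Decode(&event); err != nil {
			log.Fatal(err)
		}
		body, _ := bson.MarshalExtJSON(event.FullDocument, false, false)

		// Publish the outbox entry; retry/backoff omitted for brevity.
		if err := ch.Publish("", "events", false, false, amqp.Publishing{
			ContentType: "application/json",
			Body:        body,
		}); err != nil {
			log.Fatal(err)
		}
		resumeToken = stream.ResumeToken() // persist this after a successful publish
	}
}
```

In a real deployment the resume token would be stored somewhere durable after each successful publish, so a restarted relay resumes from the last processed outbox entry rather than from the current moment.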
Limitations of this approach:
only one message relay process, so no scalability and no redundancy - if it dies, you won't get notifications until it comes back
Your proposed solution has a different set of issues; for example, by publishing notifications before committing, you open yourself up to the possibility of the notification processor not being able to find the document it got from the message broker, as you said.
So I would like to share my solution.
Unfortunately, it's not possible to implement the transactional outbox pattern only for failure scenarios.
What I decided was to build the architecture around high availability, so:
MongoDB as highly available persistence and RabbitMQ as a highly available message broker.
I removed all the session transactions I had coded before and implemented an immediate write and publish.
In the worst-case scenario:
1- insert into document (success)
2- publish to RabbitMQ (failed)
3- insert into outbox (failed)
What I will end up with is unpublished documents in my MongoDB. Even in this worst case I could republish the messages from MongoDB with another application, but I won't write that application until I actually face that case, because we cannot cover every failure scenario in our code. So our message brokers and persistence layers must be highly available.
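For what it's worth, here is a minimal sketch of that immediate write-and-publish flow in Go, using the official MongoDB driver and streadway/amqp with publisher confirms to detect a failed publish; the database, collection, and queue names are invented for the example:

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/streadway/amqp"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()

	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	db := client.Database("app") // hypothetical database name

	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		log.Fatal(err)
	}
	ch, _ := conn.Channel()
	ch.Confirm(false) // enable publisher confirms
	confirms := ch.NotifyPublish(make(chan amqp.Confirmation, 1))

	doc := bson.M{"orderId": 42, "createdAt": time.Now()}

	// 1- immediate write, no transaction
	if _, err := db.Collection("orders").InsertOne(ctx, doc); err != nil {
		log.Fatal(err)
	}

	// 2- publish; on failure (or a missing broker ack) fall back to the outbox
	body, _ := bson.MarshalExtJSON(doc, false, false)
	err = ch.Publish("", "orders", false, false, amqp.Publishing{
		ContentType: "application/json",
		Body:        body,
	})
	if err == nil {
		if c := <-confirms; c.Ack {
			return // published and acknowledged
		}
	}

	// 3- publish failed: record it in the outbox so it can be republished later
	if _, err := db.Collection("outbox").InsertOne(ctx, doc); err != nil {
		log.Printf("worst case: document stored but neither published nor in outbox: %v", err)
	}
}
```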
As I understand it, when I want to send a movie (4 GB) to a Kafka broker, one producer serializes that 4 GB video file and sends it to the broker, and the many consumers who want to see that movie consume the file.
I heard Netflix uses Kafka to send and watch movies. I am curious how they use producers, brokers, and consumers. I use Netflix, and it's really fast, so I want to know how they use Kafka (especially how they use producers and consumers).
And as far as I know, when sending a video file you need to encode it and serialize it to send the data (maybe encoding is serializing in this case?). Did I understand this correctly? If I am missing something, could you give me some tips and guidance?
Netflix uses Kafka as part of its centralized data lineage solution. It does not use Kafka to encode or stream video content. You can read more about how Kafka is used here.
Now, to answer your question about why its video streaming service is so fast: you'll need to understand how Netflix leverages AWS resources like EC2, S3, and others to create a highly scalable, fault-tolerant microservice architecture.
On top of this, Netflix works with ISPs to localize content using a program called Netflix Open Connect. This allows them to cache content close to users, which minimizes latency and saves on compute.
Kafka is a "Streaming Platform" but it's intended for streaming data and it's not designed to stream videos or audio.
While Netflix is using Kafka, it's not to stream videos to users but instead to process events in their backend, see their technology blog. Note that I'm not a Netflix employee nor I have any insider knowledge, it's just based on the information they disclosed publicly on their blog and at conferences.
That said, it's still possible to send a video file using a producer and receive it with a consumer but I don't think it's what you had in mind.
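For completeness, here is a sketch of what that could look like with the segmentio/kafka-go client: the file is split into chunks that stay under the broker's default message size limit, and each chunk is produced as one message (the topic name, file name, and chunk size are arbitrary). It is a curiosity rather than something you would do for real video delivery.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"os"

	"github.com/segmentio/kafka-go"
)

func main() {
	ctx := context.Background()

	w := kafka.NewWriter(kafka.WriterConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "movie-chunks", // hypothetical topic
	})
	defer w.Close()

	f, err := os.Open("movie.mp4") // hypothetical file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	buf := make([]byte, 512*1024) // stay well under the default ~1 MB message limit
	for chunk := 0; ; chunk++ {
		n, err := f.Read(buf)
		if n > 0 {
			msg := kafka.Message{
				Key:   []byte(fmt.Sprintf("movie.mp4-%06d", chunk)),
				Value: append([]byte(nil), buf[:n]...), // copy, since buf is reused
			}
			if err := w.WriteMessages(ctx, msg); err != nil {
				log.Fatal(err)
			}
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
	}
}
```

A consumer would then read the chunks back in order and reassemble the file, which is exactly the kind of job a CDN or object store handles far better.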
Studying Kafka in the documentation, I found the following sentence:
Queuing is the standard messaging type that most people think of: messages are produced by one part of an application and consumed by another part of that same application. Other applications aren't interested in these messages, because they're for coordinating the actions or state of a single system. This type of message is used for sending out emails, distributing data sets that are computed by another online application, or coordinating with a backend component.
This implies that Kafka topics aren't suitable for streaming data to external applications. However, in our application we use Kafka for exactly that purpose: we have consumers that read messages from Kafka topics and try to send them to an external system. With this approach we have a number of problems:
We need a separate topic for each external application (assuming the number of external applications is > 300, this doesn't scale well).
Sending a message to an external system can fail when the external application is unavailable or for some other reason. It is incorrect to keep retrying the same message and never commit the offset. On the other hand, there is no nicely organized log where I can see all failed messages and try to resend them.
What are other best-practice approaches for streaming data to an external application? Or is Kafka a good choice for this purpose?
Just sharing a piece of experience. We use Kafka extensively for integrating external applications in the enterprise landscape.
We use the topic-per-event-type pattern. The current number of topics is about 500. Governance is difficult, but we have our own utility tooling, so it is feasible.
Where possible, we extend the external application to integrate with Kafka. The consumers then become part of the external application, and when the application is not available they simply don't pull the data.
If extending the external system is not possible, we use connectors, mostly implemented by us internally. We distinguish two types of errors: recoverable and non-recoverable. If the error is non-recoverable (for example, the message is corrupted or invalid), we log the error and commit the offset. If the error is recoverable (for example, the database the message should be written to is unavailable), we do not commit the offset, suspend the consumers for some period of time, and try again after that period. In your case it probably makes sense to have more topics with different behavior (logging errors, rerouting failed messages to different topics, and so on).
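A minimal sketch of that commit/suspend behavior with the segmentio/kafka-go client (the topic, group ID, and deliver function are placeholders):

```go
package main

import (
	"context"
	"errors"
	"log"
	"time"

	"github.com/segmentio/kafka-go"
)

// errRecoverable marks failures worth retrying (e.g. the external system is down).
var errRecoverable = errors.New("recoverable")

// deliver is a placeholder for pushing the message to the external application.
func deliver(msg kafka.Message) error { return nil }

func main() {
	ctx := context.Background()

	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		GroupID: "external-app-connector", // hypothetical
		Topic:   "events",                 // hypothetical
	})
	defer r.Close()

	for {
		msg, err := r.FetchMessage(ctx)
		if err != nil {
			log.Fatal(err)
		}

		for {
			err := deliver(msg)
			if err == nil {
				break // delivered
			}
			if errors.Is(err, errRecoverable) {
				// Recoverable (e.g. target system down): keep the offset uncommitted,
				// suspend for a while, then retry the same message.
				log.Printf("delivery failed, retrying in 30s: %v", err)
				time.Sleep(30 * time.Second)
				continue
			}
			// Non-recoverable (corrupted/invalid message): log it and move on.
			log.Printf("dropping invalid message at offset %d: %v", msg.Offset, err)
			break
		}

		r.CommitMessages(ctx, msg)
	}
}
```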
I am new to Kafka and data streaming and need some advice for the following requirement.
Our system expects close to 1 million incoming messages per day. Each message carries a project identifier and should be pushed only to users of that project. For our case, let's say we have projects A, B, and C. Users who open project A's dashboard only see/receive messages for project A.
This is my idea so far for implementing a solution to the requirement:
Messages are pushed to a Kafka topic as they arrive; let's call it the root topic. Once on the root topic, messages are read by a Kafka consumer/listener which, based on the project identifier in the message, pushes each message to a project-specific topic. So any message ends up on topic A, B, or C. I'm thinking of using websockets to update the project users' dashboards as messages arrive. There will be N consumers/listeners for the N project topics, and these consumers push the project-specific messages to the project-specific websocket endpoints.
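For illustration, here is a rough sketch of the routing consumer I have in mind, using segmentio/kafka-go; the topic names (root-topic, project-A, ...) and the projectId field in the payload are just placeholders:

```go
package main

import (
	"context"
	"encoding/json"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	ctx := context.Background()

	root := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		GroupID: "project-router",
		Topic:   "root-topic", // hypothetical
	})
	defer root.Close()

	// Lazily create one writer per project topic (hypothetical names project-A, project-B, ...).
	writers := map[string]*kafka.Writer{}
	writerFor := func(project string) *kafka.Writer {
		if w, ok := writers[project]; ok {
			return w
		}
		w := kafka.NewWriter(kafka.WriterConfig{
			Brokers: []string{"localhost:9092"},
			Topic:   "project-" + project,
		})
		writers[project] = w
		return w
	}

	for {
		msg, err := root.FetchMessage(ctx)
		if err != nil {
			log.Fatal(err)
		}

		// Assume the payload is JSON with a "projectId" field.
		var payload struct {
			ProjectID string `json:"projectId"`
		}
		if err := json.Unmarshal(msg.Value, &payload); err != nil {
			log.Printf("skipping malformed message: %v", err)
			root.CommitMessages(ctx, msg)
			continue
		}

		if err := writerFor(payload.ProjectID).WriteMessages(ctx, kafka.Message{
			Key:   msg.Key,
			Value: msg.Value,
		}); err != nil {
			log.Fatal(err)
		}
		root.CommitMessages(ctx, msg)
	}
}
```

The per-project consumers that push to the websocket endpoints would then read from these project-* topics.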
Please advise if I can make any improvements to the above design.
I chose Kafka as the messaging system here because it is highly scalable and fault-tolerant.
There is no complex transformation or data enrichment before the data gets sent to the client. Does it make sense to use Apache Flink or Hazelcast Jet for the streaming, or is Kafka Streams good enough for this simple requirement?
Also, when should I consider using Hazelcast Jet or Apache Flink in my project?
Should I use Flink, say, when I have to update a few properties in the message based on a web-service call or database lookup before sending it to the users?
Should I use Hazelcast Jet only when I need the entire dataset in memory to compute a property value, or will using Jet bring some benefit even for the simple use case specified above? Please advise.
Kafka Streams is a great tool for transforming one Kafka topic into another Kafka topic.
What you need is a tool to move data from a Kafka topic to another system via websockets.
A stream processor gives you convenient tooling to build this data pipeline (among other things, connectors to Kafka and websockets, and a scalable, fault-tolerant execution environment). So you might want to use a stream processor even if you don't transform the data.
The benefit of Hazelcast Jet is its embedded, scalable caching layer. You might want to cache your database/web-service calls so that the enrichment is performed locally, reducing remote service calls.
See how to use Jet to read from Kafka and how to write data to a TCP socket (not websocket).
I would like to give you another option. I'm not a Spark/Jet expert at all, but I've been studying them for a few weeks.
I would use Pentaho Data Integration (Kettle) to consume from Kafka, and I would write a Kettle step (or a User Defined Java Class step) to write the messages to a Hazelcast IMap.
Then I would use this approach http://www.c2b2.co.uk/middleware-blog/hazelcast-websockets.php to provide the websockets for the end users.
So I am currently working on a chat application, and I wonder if I could use Redis to store the chat messages. The messages will only exist on the web, and I want a chat history of at least 20 messages for each private chat. The chat subscribers are already stored in MongoDB.
I mainly want to use Redis because it lets me skip the MongoDB layer and gain speed.
I already use Pub/Sub, but what about storing a copy in Redis lists? Also, what about read statuses; how could I implement those?
Redis only loses data in case of a power outage; if the system is shut down properly, it saves its data, and in that case nothing is lost.
It is a good approach to dump data from Redis to MongoDB/any other DB when a size limit is reached, or on a date basis (weekly or monthly), so that your real-time chat database stays lightweight.
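To tie this to the 20-message history from the question, here is a minimal sketch with go-redis; the key names are made up, and the history is capped with LTRIM so Redis stays lightweight:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/go-redis/redis/v8"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	chatID := "chat:42" // hypothetical private-chat key
	payload := `{"from":"alice","text":"hi"}`

	// Store the message and trim the history to the 20 most recent entries.
	pipe := rdb.TxPipeline()
	pipe.LPush(ctx, chatID+":history", payload)
	pipe.LTrim(ctx, chatID+":history", 0, 19)
	pipe.Publish(ctx, chatID, payload) // fan out to live Pub/Sub subscribers
	if _, err := pipe.Exec(ctx); err != nil {
		log.Fatal(err)
	}

	// Load the last 20 messages when a client opens the chat.
	history, err := rdb.LRange(ctx, chatID+":history", 0, -1).Result()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(history)
}
```

If you want the periodic dump to MongoDB described above, archive the older entries before trimming them instead of simply discarding them.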
Many modern systems nowadays prepare for power outages: a UPS kicks in and the system shuts down properly.
See: https://hackernoon.com/how-to-shutdown-your-servers-in-case-of-power-failure-ups-nut-co-34d22a08e92
Also, what about read statuses; how could I implement those?
It depends on the protocol you are implementing; if you are using XMPP, see this.
Otherwise, you can use a property in the message model, e.g. "DeliveryStatus", and set it to one of your enum values (1. Sent, 2. Delivered, 3. Read). Mark the message as Sent as soon as it is received at the server. For Delivered and Read, your clients send back packets indicating that the respective action has occurred.
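For example, the message model could carry the status like this (a rough sketch in Go; all names are illustrative):

```go
package chat

import "time"

// DeliveryStatus tracks how far a message has progressed.
type DeliveryStatus int

const (
	StatusSent      DeliveryStatus = 1 // stored by the server
	StatusDelivered DeliveryStatus = 2 // receiving client acknowledged receipt
	StatusRead      DeliveryStatus = 3 // receiving client reported it was read
)

// Message is a minimal chat-message model with a delivery status.
type Message struct {
	ID        string
	ChatID    string
	From      string
	Text      string
	Status    DeliveryStatus
	UpdatedAt time.Time
}

// MarkDelivered and MarkRead would be called when the client sends back the
// corresponding acknowledgement packet.
func (m *Message) MarkDelivered() { m.Status = StatusDelivered; m.UpdatedAt = time.Now() }
func (m *Message) MarkRead()      { m.Status = StatusRead; m.UpdatedAt = time.Now() }
```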
As pointed out in the comment above, the important thing to consider here is the persistence model. Redis offers some persistence (with snapshots and AOF files). The important thing is to first understand what you need:
Can you afford to lose all the data? Can you afford to lose some of the data? If the answer is no, then perhaps you should not bother with Redis.