I am creating a new desktop application in WPF. The objective of the application is to read data from a device and show real-time charts in WPF clients. At the same time, I want to save the incoming data to a database. There are a few other operations to be performed on the incoming data in parallel. The connected device produces data at around 1000 messages per second. I was trying to evaluate different message queuing systems for this scenario and came across Apache Kafka as one of the alternatives. I have a few questions regarding it:
Is this even a valid use case for Kafka? In other words, should we even use Kafka for desktop applications?
Are there any example projects/POCs for this? I cannot find any.
What about the problem of shipping the application? Since we are not going to have any central server for the desktop applications, we would have to run/deploy Kafka on each user's system.
I want to stream financial data (trades, order book) from an exchange websocket endpoint and store that data somewhere to build up my own data history for backtesting purposes. Furthermore, I might want to analyze the data in real time.
I found the idea of an event-driven system very interesting, so I ended up building my own Dockerized Confluent Kafka cluster (with the Avro schema registry) and a Python producer that sends the streaming data into a Kafka topic. Then I set up a Faust app to stream-process the data and store the result in a new Kafka topic.
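For reference, here is a minimal sketch of that producer stage, in Python with the websockets and confluent-kafka packages. The exchange URL, topic name, and payload fields are made up, and it sends plain JSON rather than Avro for brevity:

```python
# Sketch: forward messages from an exchange websocket into a Kafka topic.
# The endpoint URL, topic name, and message fields are assumptions.
import asyncio
import json

import websockets
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

async def stream_trades():
    # Hypothetical endpoint; real exchanges usually need a subscribe message first.
    async with websockets.connect("wss://exchange.example.com/ws/trades") as ws:
        async for raw in ws:
            trade = json.loads(raw)
            # Key by symbol so all trades for one instrument land in the same partition.
            producer.produce(
                "trades.raw",
                key=str(trade.get("symbol", "")),
                value=json.dumps(trade),
            )
            producer.poll(0)  # serve delivery callbacks, keep the internal queue drained

asyncio.run(stream_trades())
```

The Faust app then consumes trades.raw and writes its processed output to the second topic.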
It's working fine on my laptop, but now I'm wondering how I could put this into production. Obviously I cannot run it on my laptop, because the application needs to run 24/7 without interruptions.
When I look at fully managed Kafka cloud solutions like Confluent, I find them quite expensive, especially as I'm not running a business; it's rather a private hobby project. And maybe I don't even need that kind of highly scalable, professional service.
What could be a cost-efficient approach for me to get my streaming and storage application to work?
Is there another Kafka cloud solution better tailored to my needs?
Should I set up my own server? Maybe a Raspberry Pi?
Or should I use a different approach?
I'm sorry if my problem description isn't very specific; it reflects how overwhelmed I am by all these system-architecture questions and cloud services.
Any advice and recommendations are appreciated!
I need to create a solution that receives events from a web/desktop application that runs on kiosks. There are hundreds of kiosks spread across the country, and each one generates automatic events from time to time as well as events when something happens.
Although this is a locked-down desktop application, it is built in Angular v8. I mean, it runs in a webview.
I was researching scalable but reliable solutions and found that Apache Kafka seems to be a great fit. I know there are clients for NodeJS, but I couldn't find any option for Angular. Angular runs in the browser, so it must communicate with the backend through HTTP/S.
In the end, I realized the best way to send events from Angular is to create an API that just receives messages on an HTTP/S endpoint and publishes them to a Kafka topic. Or is there any adapter for Kafka that exposes topics as REST?
I suppose this approach is way faster than storing messages in a database. Is this statement correct?
Thanks in advance.
this approach is way faster than storing messages in a database. Is this statement correct?
It can be slower. Kafka is asynchronous, so don't expect to get a response in the same time frame it would take to perform a database read/write. (Again, this would require some API, and it also largely depends on the database used.)
is there any adapter for Kafka that exposes topics as REST?
Yes, the Confluent REST Proxy is an Apache 2.0-licensed product.
There is also the divolte/divolte-collector project for collecting click data and other browser-driven events.
Otherwise, as you've discovered, create your own API in any language you are comfortable with, and have it use a Kafka producer client.
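If you go the "own API" route, a minimal sketch could look like the following, illustrated here in Python with Flask and confluent-kafka; the endpoint path, topic name, and payload fields are assumptions, and any language with a Kafka producer client works the same way:

```python
# Sketch of an "HTTP in, Kafka out" bridge for the kiosk events.
# Endpoint path, topic name, and payload fields are assumptions.
import json

from flask import Flask, request, jsonify
from confluent_kafka import Producer

app = Flask(__name__)
producer = Producer({"bootstrap.servers": "localhost:9092"})

@app.route("/events", methods=["POST"])
def publish_event():
    event = request.get_json(force=True)
    # Key by kiosk id so events from one kiosk stay ordered within a partition.
    producer.produce(
        "kiosk-events",
        key=str(event.get("kioskId", "")),
        value=json.dumps(event),
    )
    producer.poll(0)  # serve delivery callbacks
    return jsonify({"status": "accepted"}), 202

if __name__ == "__main__":
    app.run(port=8080)
```

The Angular app then simply POSTs its events to this endpoint over HTTPS.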
I am using Kafka for distributed message processing in a Spring Boot application. My application produces messages, on an event basis, to three different topics. There is a separate Spring Boot application which will be used by a data-analysis team to analyze the data. That application is a simple report-type application with only one filter: topic.
Now I have to implement this, but I am a little confused about how to show the data in the UI. I have written listeners (consumers) which consume the messages, but how do I show the data in the UI in real time? Should I store it in some database like Redis and then show that data in the UI? Is this the correct way to deal with a consumer in Kafka? Won't it be slow, since the number of messages can grow drastically over time?
In a nutshell, I want to know how we can show messages in any UI efficiently and in real time.
Thanks
You can write a consumer that forwards messages to a websocket (see the sketch after this list).
Or you can use Kafka Connect to write to a database, then put a REST API in front of it.
Or use the Kafka Streams Interactive Queries feature and add an RPC layer on top for JavaScript to call.
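A rough sketch of the first option, written in Python with aiokafka and websockets purely for illustration (the topic name and port are made up; with Spring you would do the equivalent with a Kafka listener pushing into a WebSocket session):

```python
# Sketch: consume a Kafka topic and fan each message out to connected browsers.
# Topic name and port are assumptions; requires aiokafka and websockets (>= 10).
import asyncio

import websockets
from aiokafka import AIOKafkaConsumer

connected_clients = set()

async def forward_kafka_to_websockets():
    consumer = AIOKafkaConsumer(
        "report-events",                      # hypothetical topic name
        bootstrap_servers="localhost:9092",
        group_id="ui-forwarder",
    )
    await consumer.start()
    try:
        async for msg in consumer:
            payload = msg.value.decode("utf-8")
            for ws in list(connected_clients):
                try:
                    await ws.send(payload)
                except websockets.ConnectionClosed:
                    connected_clients.discard(ws)
    finally:
        await consumer.stop()

async def handler(ws):
    # Register the browser connection and keep it until it closes.
    connected_clients.add(ws)
    try:
        await ws.wait_closed()
    finally:
        connected_clients.discard(ws)

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await forward_kafka_to_websockets()

asyncio.run(main())
```

The UI opens a websocket to port 8765 and renders whatever arrives; anything that needs history or filtering is better served by the database/REST options above.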
Studying Kafka in the documentation, I found the following sentence:
Queuing is the standard messaging type that most people think of: messages are produced by one part of an application and consumed by another part of that same application. Other applications aren't interested in these messages, because they're for coordinating the actions or state of a single system. This type of message is used for sending out emails, distributing data sets that are computed by another online application, or coordinating with a backend component.
This implies that Kafka topics aren't suitable for streaming data to external applications. However, in our application we use Kafka for exactly that purpose. We have consumers which read messages from Kafka topics and try to send them to an external system. With this approach we have a number of problems:
We need a separate topic for each external application (assuming the number of external applications is > 300, this doesn't scale well).
Sending a message to an external system can fail when the external application is unavailable or for some other reason. It seems wrong to keep retrying the same message and never commit the offset. On the other hand, there is no nicely structured log where I can see all the failed messages and try to resend them.
What are the best-practice approaches to streaming data to an external application? Or is Kafka even a good choice for this purpose?
Just sharing a piece of experience. We use Kafka extensively for integrating external applications in the enterprise landscape.
We use a topic-per-event-type pattern. The current number of topics is about 500. Governance is difficult, but we have our own utility tool, so it is feasible.
Where possible, we extend the external application to integrate with Kafka. The consumers then become part of the external application, and when the application is not available they simply don't pull the data.
If extending the external system is not possible, we use connectors, which are mostly implemented by us internally. We distinguish two types of errors: recoverable and non-recoverable. If the error is not recoverable, for example the message is corrupted or not valid, we log the error and commit the offset. If the error is recoverable, for example the database the message is written to is unavailable, then we do not commit the offset, suspend the consumers for some period of time, and after that period try again. In your case it probably makes sense to have more topics with different behavior (logging errors, rerouting failed messages to different topics, and so on).
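A rough sketch of that commit/retry split, using the confluent-kafka Python client purely for illustration; the topic name and the write_to_external_system() helper are hypothetical:

```python
# Sketch: commit non-recoverable failures, retry recoverable ones in place.
# Topic name and write_to_external_system() are assumptions.
import json
import time

from confluent_kafka import Consumer

class ExternalSystemUnavailable(Exception):
    pass

def write_to_external_system(event):
    ...  # call the external application here; raise ExternalSystemUnavailable when it is down

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "external-app-connector",
    "enable.auto.commit": False,   # commit only once we have decided what to do
})
consumer.subscribe(["orders"])     # hypothetical topic

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    while True:
        try:
            event = json.loads(msg.value())      # may raise ValueError
            write_to_external_system(event)      # may raise ExternalSystemUnavailable
            consumer.commit(message=msg)         # success: move on
            break
        except ValueError:
            # Non-recoverable: corrupted/invalid message. Log it and commit
            # so the consumer does not get stuck on it.
            print(f"skipping invalid message at offset {msg.offset()}")
            consumer.commit(message=msg)
            break
        except ExternalSystemUnavailable:
            # Recoverable: back off and retry the same message; the offset
            # stays uncommitted until the write eventually succeeds.
            # (In real code keep max.poll.interval.ms in mind, or pause()
            # the partitions and keep poll()ing while backing off.)
            time.sleep(30)
```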
I am designing a WhatsApp-like messenger application for the desktop using WPF and .NET. Now, when a user creates a group, I want the other members of the group to receive a notification that they were added to it. My frontend is built in C#/.NET and is connected to a RESTful web service (Ruby on Rails). I am using Postgres for the database. I also have a Redis layer to cache my Rails models.
I am considering the following options.
1) Use Postgres's built-in NOTIFY/LISTEN mechanism, which the clients can subscribe to directly. I foresee two issues here:
i) Postgres might not be able to handle tens of thousands of clients subscribed directly.
ii) There is no guarantee of delivery if the client is disconnected.
2) Use Redis's Pub/Sub mechanism, to which the clients can subscribe. I am still concerned about the lack of delivery guarantees here.
3) Use a message queue like RabbitMQ. The producer for this queue will be Postgres, which will push messages in through triggers. The consumers, of course, will be the .NET clients.
So far, I am inclined to use the 3rd option.
Does anyone have any suggestions how to design this?
In an application like WhatsApp itself, the client running on your phone is an integral part of a large and complex event-based, distributed system.
Without more context, it would be impossible to point you in the right direction. That said:
For option 1: You seem to imply that each client, as in a WhatsApp client, would directly (or through some web service) communicate with Postgres as an event bus, which is not sound and would not scale because you can only have ONE Postgres instance.
For option 2: You have the same problem as in option 1, with worse failure modes.
For option 3: RabbitMQ seems like a reasonable ally here. It is distributed in nature and scales well. As a matter of fact, it runs on Erlang, just as most of WhatsApp does. Using triggers inside Postgres to publish messages, however, does not make a lot of sense.
You need a message bus because you would have lots of updates to do in the background, not to directly connect your users to each other. As you said, clients can be offline.
Architecture is more about deferring decisions than taking them.
I suggest that you start simple. Build a small, monolithic, synchronous system first, pushing updates as persisted data to all the involved users. For example: in a group of n users, just write n records to a table. It is already complicated enough to reliably keep track of who has received and read what.
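Purely to illustrate that fan-out-on-write idea (Python with psycopg2 for brevity; in your stack this would live in the Rails service, and the table and column names are made up):

```python
# Sketch: one notification row per group member, written in a single transaction.
# Table and column names are assumptions.
import psycopg2

def notify_group_created(conn, group_id, member_ids):
    # Each client later fetches (or is pushed) its own rows and marks them
    # delivered/read, which is where the real bookkeeping complexity lives.
    with conn, conn.cursor() as cur:
        cur.executemany(
            """
            INSERT INTO group_notifications (group_id, user_id, kind, delivered)
            VALUES (%s, %s, 'added_to_group', FALSE)
            """,
            [(group_id, user_id) for user_id in member_ids],
        )

conn = psycopg2.connect("dbname=messenger")   # hypothetical connection string
notify_group_created(conn, group_id=42, member_ids=[1, 2, 3])
```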
These heavy "group" updates can then be moved to long-running processes using RabbitMQ or the like, but a system with several thousand users can work very well without such a thing, especially because a simple message from user A to user B does not need many writes.