Orleans Inter-Silo Communication using streams - distributed-computing

I have two silos, Silo A and Silo B, each with its own distinct grain and interface. I need them to communicate with each other via streams. Is this possible? I am running locally for now and using the SMS provider with in-memory storage. Streaming works within the same silo; how can this be achieved across silos?

If you have grain A on Silo A that wants to communicate via a stream with grain B on Silo B, just produce to the stream from grain A and subscribe to consume from grain B, and it will work, as long as both silos are part of the same cluster and have the same stream provider configured.
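For example, here is a minimal sketch assuming an Orleans 3.x-style setup in which both silos join the same cluster and register the same provider (e.g. siloBuilder.AddSimpleMessageStreamProvider("SMS").AddMemoryGrainStorage("PubSubStore")); the grain interfaces, the stream GUID and the "chat" namespace are illustrative assumptions:

```csharp
using System;
using System.Threading.Tasks;
using Orleans;
using Orleans.Streams;

public interface IProducerGrain : IGrainWithGuidKey { Task Publish(string message); }
public interface IConsumerGrain : IGrainWithGuidKey { }

// Producer grain, activated on Silo A.
public class ProducerGrain : Grain, IProducerGrain
{
    public async Task Publish(string message)
    {
        var provider = GetStreamProvider("SMS");
        // Stream identity = GUID + namespace; producer and consumer must agree on both.
        var stream = provider.GetStream<string>(Guid.Empty, "chat");
        await stream.OnNextAsync(message);
    }
}

// Consumer grain, activated on Silo B. The implicit subscription activates it
// automatically (with the stream's GUID as its key) when events arrive on "chat".
[ImplicitStreamSubscription("chat")]
public class ConsumerGrain : Grain, IConsumerGrain
{
    public override async Task OnActivateAsync()
    {
        var provider = GetStreamProvider("SMS");
        var stream = provider.GetStream<string>(this.GetPrimaryKey(), "chat");
        await stream.SubscribeAsync((item, token) =>
        {
            Console.WriteLine($"Silo B received: {item}");
            return Task.CompletedTask;
        });
    }
}
```

Which silo each grain activates on is the runtime's decision; the stream pub/sub (backed here by the "PubSubStore" grain storage) routes events across silo boundaries for you.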

Related

Which messaging system for a web dashboard?

I would like to build a web dashboard system and I am facing a problem. I need to get information that is in the cache of one of the instances of my program. For this I had thought of doing Pub/Sub with Kafka, but I don't know how to publish a message and then get a response from one of my subscribers. Do you know a pattern that allows this, and a service that lets me do it?
EDIT: I would like to design an infrastructure that follows the pattern shown in the attached diagram.
The attached diagram shows a simple request->response flow. Kafka is designed for a different type of architecture, so IMHO you should not focus on Kafka in this case.
However, if you still want to use Kafka for other reasons, I can suggest two options:
Stick with the request->response flow and use ReplyingKafkaTemplate or AggregatingKafkaTemplate to handle it; the second is an extension of the first and adds functionality for handling more than one response. You can send a request to a Kafka topic from the dashboard application, have one of the bot instances poll the message and send a reply to a reply topic, and then process the reply in the dashboard application (see the sketch after these options).
Use Kafka to implement the Event-Carried State Transfer pattern: move state (mutual guilds data) from the bot instances directly to the dashboard application via a Kafka topic. You can use several tools to implement this:
Bot applications send events to a Kafka topic via a simple KafkaProducer or KafkaTemplate; then use one of the Kafka Connect sink connectors to save the data in the dashboard's database.
Bot applications send events to a Kafka topic via a simple KafkaProducer or KafkaTemplate. Run a Kafka Streams thread in the dashboard application and build state using Kafka Streams functionality (grouping, aggregating, etc.), then read the state directly from the Kafka Streams internal RocksDB state store.
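Since ReplyingKafkaTemplate is a Spring for Apache Kafka (Java) class, here is a hedged, hand-rolled sketch of the same correlation-id request/reply pattern using the Confluent.Kafka .NET client; the broker address, topic names and the per-request consumer group are illustrative assumptions, not the Spring API:

```csharp
using System;
using System.Text;
using Confluent.Kafka;

static class KafkaRequestReply
{
    const string Broker = "localhost:9092";           // assumption
    const string RequestTopic = "dashboard.requests"; // assumption
    const string ReplyTopic = "dashboard.replies";    // assumption

    // Dashboard side: publish a request carrying a correlation id, then wait for a
    // reply on the reply topic that echoes the same id.
    public static string SendAndReceive(string request, TimeSpan timeout)
    {
        var correlationId = Guid.NewGuid().ToString();

        // Start listening for replies before sending, so the reply cannot be missed.
        using var consumer = new ConsumerBuilder<Null, string>(new ConsumerConfig
        {
            BootstrapServers = Broker,
            GroupId = "dashboard-" + correlationId,   // sketch only; see note below
            AutoOffsetReset = AutoOffsetReset.Earliest
        }).Build();
        consumer.Subscribe(ReplyTopic);

        using var producer = new ProducerBuilder<Null, string>(
            new ProducerConfig { BootstrapServers = Broker }).Build();
        var headers = new Headers();
        headers.Add("correlationId", Encoding.UTF8.GetBytes(correlationId));
        producer.Produce(RequestTopic, new Message<Null, string> { Value = request, Headers = headers });
        producer.Flush(timeout);

        // Poll the reply topic until a message with our correlation id shows up.
        var deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            var result = consumer.Consume(TimeSpan.FromMilliseconds(250));
            if (result == null) continue;
            if (result.Message.Headers != null &&
                result.Message.Headers.TryGetLastBytes("correlationId", out var bytes) &&
                Encoding.UTF8.GetString(bytes) == correlationId)
            {
                return result.Message.Value;
            }
        }
        throw new TimeoutException("No reply from any bot instance.");
    }

    // Bot side: answer a request from the local cache and copy the correlation-id
    // headers onto the reply so the dashboard can match it.
    public static void HandleRequest(ConsumeResult<Null, string> request, IProducer<Null, string> producer)
    {
        var reply = "value from this bot's cache for: " + request.Message.Value; // placeholder lookup
        producer.Produce(ReplyTopic, new Message<Null, string>
        {
            Value = reply,
            Headers = request.Message.Headers // carries correlationId back
        });
    }
}
```

In practice you would keep a single long-lived reply consumer per dashboard instance and dispatch replies by correlation id rather than creating a consumer group per request, which is essentially what ReplyingKafkaTemplate automates for you.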

Is it me or are DynamoDb Streams just really lacking?

I have an application running in multiple regions in AWS; this application reads from global DynamoDB table(s). Updates occur in the background via another process, and I wanted to be able to monitor for these updates so the application can invalidate its cache (I'm not using DAX).
I was thinking I could use DynamoDB Streams for this. However, I ran into a number of roadblocks with the Spring Kinesis Streams Binder (e.g. the fact that it requires 2 tables [SpringIntegrationMetadataStore & SpringIntegrationLockRegistry] be created), and my company doesn't allow dynamic creation of tables (that was fun to hunt down, as I couldn't find any mention of it in the docs; 🤷‍♀️ maybe I missed it). Now I think I have found out that only 1 application can listen to a Kinesis stream at a time?
Is that true?
Is there a way for multiple applications that only read from DynamoDB to get notified when an update occurs? I was thinking each app could monitor a DynamoDB stream for updates and invalidate its cache accordingly. If the above is true, then I need to do something more involved or complex (use SNS/SQS for updates, ElastiCache, Redis, Kafka), which just seems like overkill for this scenario.
e.g. the fact that it requires 2 tables [SpringIntegrationMetadataStore & SpringIntegrationLockRegistry]
Well, that's how consumer group management is handled by the Spring Cloud Stream Kinesis Binder. Even if you used only the KCL, it would still require an extra table in DynamoDB. Your concern therefore sounds more like a lack of confidence in the cloud services you use.
Now I think I have found out that only 1 application can listen to a Kinesis stream at a time?
That's not true if all your consumer applications are configured for different consumer groups.
Please make yourself familiar with Spring Cloud Stream and its model: https://docs.spring.io/spring-cloud-stream/docs/3.1.1/reference/html/spring-cloud-stream.html#_main_concepts
Another approach could be an AWS Lambda trigger for DynamoDB Streams: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.html
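As a hedged illustration of that route, here is a C# Lambda handler sketch (using the Amazon.Lambda.DynamoDBEvents and AWSSDK.SimpleNotificationService packages); the SNS fan-out, the topic ARN and the "Id" key name are assumptions for this example, and you could equally call each application's own invalidation endpoint instead:

```csharp
using System.Text.Json;
using System.Threading.Tasks;
using Amazon.Lambda.Core;
using Amazon.Lambda.DynamoDBEvents;
using Amazon.SimpleNotificationService;
using Amazon.SimpleNotificationService.Model;

public class CacheInvalidationFunction
{
    private readonly IAmazonSimpleNotificationService _sns = new AmazonSimpleNotificationServiceClient();
    private const string TopicArn = "arn:aws:sns:us-east-1:123456789012:cache-invalidation"; // placeholder

    // Triggered by the DynamoDB stream; each record is one INSERT/MODIFY/REMOVE on the table.
    public async Task Handler(DynamoDBEvent dynamoEvent, ILambdaContext context)
    {
        foreach (var record in dynamoEvent.Records)
        {
            var keys = record.Dynamodb.Keys;
            var payload = JsonSerializer.Serialize(new
            {
                EventName = record.EventName.ToString(),
                Key = keys.TryGetValue("Id", out var id) ? id.S : null // "Id" is an assumed key attribute
            });

            context.Logger.LogLine($"Publishing invalidation: {payload}");

            // Each reader application subscribes (e.g. via its own SQS queue) and
            // drops the matching entry from its local cache.
            await _sns.PublishAsync(new PublishRequest { TopicArn = TopicArn, Message = payload });
        }
    }
}
```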

Kafka Streams Architecture

I am hoping to clarify a few ideas on Kafka Streams from an architectural standpoint.
I understand the stream processing and data enrichment uses, and that the data can be reused by other applications if pushed back into Kafka, but what is the correct implementation of a Streams Application?
My initial thoughts would be to create an application that pulls in a table, joins it to a stream, and then fires off an event for each entry rather than pushing it back into Kafka. If multiple services use this data, then each would materialize their own table, right?
I haven't implemented a test application yet, which may answer some of these questions, but I think this is a good point for planning. Basically, where should the event be triggered, in the streaming app or in a separate consumer app?
My initial thoughts would be to create an application that pulls in a table, joins it to a stream, and then fires off an event for each entry rather than pushing it back into Kafka.
In an event-driven architecture, where would the application send the events to (and how), if you think that Kafka topics shouldn't be the destination for sharing the events with other apps? Do you have other preferences?
If multiple services use this data, then each would materialize their own table, right?
Yes, that is one option.
Another option is to use the interactive queries feature in KStreams (aka queryable state), which allows your first application to expose its tables and state stores to other applications directly (e.g., via a REST API). Other apps would then not need to materialize their own tables. However, an architectural downside is that you now have a direct coupling between your first app and any downstream applications through request-response communication. While this pattern of direct inter-service communication is popular for a microservices architecture, a compelling alternative is to not use direct communication but instead let microservices/apps communicate indirectly with each other via Kafka (i.e., to use the previous option).
Basically, where should the event be triggered, in the streaming app or in a separate consumer app?
This is a matter of preference; see above. To inform your thinking, you may want to read the four-part mini-series about event-driven architectures with Kafka: https://www.confluent.io/blog/journey-to-event-driven-part-1-why-event-first-thinking-changes-everything (disclaimer: this blog series was written by a colleague of mine).

Kafka user - project design advice

I am new to Kafka and data streaming and need some advice on the following requirement:
Our system expects close to 1 million incoming messages per day. Each message carries a project identifier and should be pushed only to users of that project. For our case, let's say we have projects A, B and C. Users who open project A's dashboard only see/receive messages for project A.
This is my idea so far for implementing a solution:
The messages are pushed to a Kafka topic as they arrive; let's call it the root topic. Once on the root topic, the messages can be read by a Kafka consumer/listener, which, based on the project identifier in the message, pushes each one to a project-specific topic, so any message ends up on topic A, B or C. I am thinking of using websockets to update the project users' dashboards as messages arrive. There will be N consumers/listeners for the N project topics; these consumers push project-specific messages to project-specific websocket endpoints (see the sketch below).
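A minimal sketch of that fan-out step (not a definitive implementation), assuming the Confluent.Kafka .NET client, JSON payloads carrying a projectId field, and illustrative topic names:

```csharp
using System;
using System.Text.Json;
using System.Threading;
using Confluent.Kafka;

class ProjectRouter
{
    // Reads the root topic and re-publishes each message to a per-project topic.
    public static void Run(CancellationToken ct)
    {
        using var consumer = new ConsumerBuilder<string, string>(new ConsumerConfig
        {
            BootstrapServers = "localhost:9092",
            GroupId = "project-router",
            AutoOffsetReset = AutoOffsetReset.Earliest
        }).Build();
        using var producer = new ProducerBuilder<string, string>(
            new ProducerConfig { BootstrapServers = "localhost:9092" }).Build();

        consumer.Subscribe("root-topic");
        while (!ct.IsCancellationRequested)
        {
            var msg = consumer.Consume(TimeSpan.FromMilliseconds(500));
            if (msg == null) continue;

            // Assume the payload is JSON with a "projectId" field such as "A", "B" or "C".
            using var doc = JsonDocument.Parse(msg.Message.Value);
            var projectId = doc.RootElement.GetProperty("projectId").GetString();

            // Keying by project id keeps per-project ordering; each project topic then has
            // its own consumer pushing messages to that project's websocket endpoint.
            producer.Produce($"project-{projectId}", new Message<string, string>
            {
                Key = projectId,
                Value = msg.Message.Value
            });
        }
        producer.Flush(TimeSpan.FromSeconds(5));
    }
}
```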
Please advise if I can make any improvements to the above design.
I chose Kafka as the messaging system here because it is highly scalable and fault-tolerant.
There is no complex transformation or data enrichment before a message gets sent to the client. Does it make sense to use Apache Flink or Hazelcast Jet for the streaming, or is Kafka Streams good enough for this simple requirement?
Also, when should I consider using Hazelcast Jet or Apache Flink in my project?
Should I use Flink, say, when I have to update a few properties in the message based on a web service call or database lookup before sending it to the users?
Should I use Hazelcast Jet only when I need the entire dataset in memory to arrive at a property value, or will using Jet bring some benefits even for my simple use case above? Please advise.
Kafka Streams is a great tool for transforming one Kafka topic into another Kafka topic.
What you need is a tool to move data from a Kafka topic to another system via web sockets.
A stream processor gives you convenient tooling for building this data pipeline (among other things, connectors to Kafka and to websockets, plus a scalable, fault-tolerant execution environment). So you might want to use a stream processor even if you don't transform the data.
The benefit of Hazelcast Jet is its embedded, scalable caching layer. You might want to cache your database/web-service calls so that the enrichment is performed locally, reducing remote service calls.
See how to use Jet to read from Kafka and how to write data to a TCP socket (not websocket).
I would like to give you another option. I'm not a Spark/Jet expert at all, but I've been studying them for a few weeks.
I would use Pentaho Data Integration (Kettle) to consume from Kafka, and I would write a Kettle step (or a User Defined Java Class step) to write the messages to a Hazelcast IMap.
Then I would use this approach http://www.c2b2.co.uk/middleware-blog/hazelcast-websockets.php to provide the websockets for the end users.

How to design a realtime database update system?

I am designing a WhatsApp-like messenger application for the desktop using WPF and .NET. When a user creates a group, I want the other members of the group to receive a notification that they were added to it. My frontend is built in C#/.NET and is connected to a RESTful web service (Ruby on Rails). I am using Postgres for the database. I also have a Redis layer to cache my Rails models.
I am considering the following options.
1) Use Postgres's built-in NOTIFY/LISTEN mechanism, which the clients can subscribe to directly. I foresee two issues here:
i) Postgres might not be able to handle tens of thousands of clients subscribed directly.
ii) There is no guarantee of delivery if the client is disconnected.
2) Use Redis's Pub/Sub mechanism, to which the clients can subscribe. I am still concerned about the lack of delivery guarantees here.
3) Use a message queue like RabbitMQ. The producer for this queue would be Postgres, which would push messages through triggers. The consumers, of course, would be the .NET clients.
So far, I am inclined to use the 3rd option.
Does anyone have any suggestions how to design this?
In an application like WhatsApp itself, the client running in your phone is an integral part of a large and complex event-based, distributed system.
Without more context, it would be impossible to point in the right direction. That said:
For option 1: You seem to imply that each client, as in a WhatsApp client, would directly (or through some web service) communicate with Postgres as an event bus, which is not sound and would not scale because you can only have ONE Postgres instance.
For option 2: You have the same problem as in option 1, with worse failure modes.
For option 3: RabbitMQ seems like a reasonable ally here. It is distributed in nature and scales well. As a matter of fact, it runs on Erlang, just as most of WhatsApp does. Using triggers inside Postgres to publish messages, however, does not make a lot of sense.
You need a message bus for the many updates you will have to do in the background, not to connect your users directly to each other. As you said, clients can be offline.
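To make the consumer side concrete, here is a hedged sketch of a .NET client listening for its notifications over RabbitMQ (RabbitMQ.Client 6.x-style API; the per-user queue naming and the payload format are assumptions, with the backend publishing from application code rather than from Postgres triggers):

```csharp
using System;
using System.Text;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

class NotificationListener
{
    public static IConnection Start(string userId)
    {
        var factory = new ConnectionFactory { HostName = "localhost" };
        var connection = factory.CreateConnection();
        var channel = connection.CreateModel();

        // One durable queue per user; the backend publishes "added to group" events here.
        var queue = $"notifications.{userId}";
        channel.QueueDeclare(queue, durable: true, exclusive: false, autoDelete: false);

        var consumer = new EventingBasicConsumer(channel);
        consumer.Received += (sender, ea) =>
        {
            var payload = Encoding.UTF8.GetString(ea.Body.ToArray());
            Console.WriteLine($"Notification for {userId}: {payload}"); // update the WPF UI here
            channel.BasicAck(ea.DeliveryTag, multiple: false);          // ack only after handling
        };
        channel.BasicConsume(queue, autoAck: false, consumer);
        return connection; // keep the connection alive for the lifetime of the client
    }
}
```

Because the queue is durable and keeps messages while the consumer is disconnected (once the queue exists), a client that was offline picks up its pending notifications when it reconnects.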
Architecture is more about deferring decisions than taking them.
I suggest that you start simple. Build a small, monolithic, synchronous system first, pushing updates as persisted data to all the involved users. For example, in a group of n users, just write n records to a table. It is already complicated enough to reliably keep track of who has received and read what.
These heavy "group" updates can later be moved to long-running processes using RabbitMQ or the like, but a system with several thousand users can work perfectly well without such a thing, especially because a simple message from user A to user B does not require many writes.
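A sketch of that starting point, written in C# for consistency with the rest of this page even though the asker's backend is Rails; Npgsql and the group_notifications table are assumptions:

```csharp
using Npgsql;

class GroupNotifications
{
    // When a group is created, persist one notification row per member in a single
    // transaction, so each member can later fetch (or be pushed) what they missed.
    public static void NotifyMembersAdded(string connectionString, long groupId, long[] memberIds)
    {
        using var conn = new NpgsqlConnection(connectionString);
        conn.Open();
        using var tx = conn.BeginTransaction();

        foreach (var memberId in memberIds)
        {
            using var cmd = new NpgsqlCommand(
                "INSERT INTO group_notifications (user_id, group_id, kind, created_at) " +
                "VALUES (@user, @group, 'added_to_group', now())", conn, tx);
            cmd.Parameters.AddWithValue("user", memberId);
            cmd.Parameters.AddWithValue("group", groupId);
            cmd.ExecuteNonQuery();
        }
        tx.Commit();
    }
}
```

As the answer suggests, this per-member write loop is exactly the kind of work that can later be handed off to a RabbitMQ-backed background worker once volume demands it.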