Spring Cloud Stream flow as one application - spring-cloud

As far as I know, there is an option to run a couple of Spring Cloud Stream components as one application by using AggregateApplication or AggregateApplicationBuilder.
From what I understand, Spring will not use a broker (Rabbit or Kafka) for communication between the steps in this situation; it will just pass the result of the previous step as an argument to the next one almost directly. Am I right?
If I am, is there another way to run several components in one instance of an application while still using a broker? I'm aware that this is not an architecture that suits Cloud Stream well, but right now I don't have infrastructure on which I can run Data Flow, and I would also like to use the durability of a broker.

In general, aggregation has been designed as a replacement for communication over a message broker - to reduce latency by avoiding an extra hop. That being said, it may make sense to add an option to have the channels bound for use cases like yours. Can you open a feature request in GitHub, please?

Related

Expected problems and limitations of implementing Kafka Producer in Azure Function

I have a rather high-level, architectural question which might not have a 100% clear answer. We're currently thinking about implementing a Kafka Producer within Azure Functions, as opposed to having a dedicated Producer client running in some container. The Azure Function would be invoked by a REST API call which includes the payload. The alternative solution would require something similar: the Producer application would expose a custom API endpoint via some Java-based framework to take in data that is then passed to Kafka via the Producer API. It would be a constantly running Java application in some container (and, if necessary, redundant for parallelism).
My gut feeling tells me this approach with Azure Functions might not be good practice, because as far as I'm aware the Producer concept in Kafka is more something "continuous" rather than something instantiated "per record", and not as short-lived as an Azure Function, which may be instantiated thousands of times in a short period of time. This approach seems unintuitive to me, as we would invoke a whole Producer lifecycle for each incoming record, generating a lot of additional network traffic to our Kafka cluster and potentially resulting in arbitrary message ordering (negligible for some use cases), disregarding the fact that it's probably quite an expensive solution.
But I could also be completely mistaken; maybe it is good/best practice and there are no significant downsides regarding the concerns I mentioned. Technically, the Azure Functions approach should be much easier to scale and, depending on the load, it could actually be cheaper to invoke X Azure Functions than to keep a producer running 24/7, but that is highly dependent on the use case. Operations in the "custom Producer" case also need to be taken into account; serverless does not require these kinds of considerations regarding operations, deployment, and maintenance.
Any thoughts or experiences on this?
No, producers aren't necessarily continuous. If you've used kafka-console-producer, then you'd know this. Lambda/Function methods are no different.
Plus, Java is not necessary. Save yourself some cost and startup time by not triggering a JVM startup within a serverless function. Or, if you do, compile a native binary using GraalVM (Quarkus or Spring Native can help with this).
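One common way to soften the "whole Producer lifecycle per record" concern from the question is to create the producer once per function instance and reuse it on later invocations that land on the same warm instance. The following is only a minimal sketch under stated assumptions: `FakeProducer` is a hypothetical stand-in for a real Kafka client, `handle_request` is a hypothetical entry point (a real Azure Function has its own signature), and the topic name `events` is made up.

```python
import threading
from typing import List, Optional, Tuple


class FakeProducer:
    """Hypothetical stand-in for a real Kafka producer client."""

    instances_created = 0

    def __init__(self) -> None:
        # A real client would open broker connections here, which is the
        # expensive part worth avoiding on every invocation.
        FakeProducer.instances_created += 1
        self.sent: List[Tuple[str, bytes]] = []

    def send(self, topic: str, value: bytes) -> None:
        self.sent.append((topic, value))


_producer: Optional[FakeProducer] = None
_lock = threading.Lock()


def get_producer() -> FakeProducer:
    """Create the producer on first use, then reuse it for later calls."""
    global _producer
    with _lock:
        if _producer is None:
            _producer = FakeProducer()
        return _producer


def handle_request(payload: bytes) -> int:
    """Hypothetical function entry point: publish the payload, return a status."""
    get_producer().send("events", payload)
    return 202
```

Module-level state like `_producer` only survives while the platform keeps the instance warm, so a cold start still pays the connection cost once.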

Publish to Apache Kafka topic from Angular front end

I need to create a solution that receives events from a web/desktop application that runs on kiosks. There are hundreds of kiosks spread across the country, and each one generates automatic events from time to time, as well as events when something happens.
Although this is a locked-down desktop application, it is built with Angular v8; that is, it runs in a webview.
I was researching scalable but reliable solutions and found that Apache Kafka seems to be a great fit. I know there are clients for NodeJS, but I couldn't find any option for Angular. Angular runs in the browser, so it must communicate with the backend through HTTP/S.
In the end, I realized the best way to send events from Angular is to create an API that just receives a message on an HTTP/S endpoint and publishes it to a Kafka topic. Or is there any adapter for Kafka that exposes topics as REST?
I suppose this approach is way faster than storing the message in a database. Is this statement correct?
Thanks in advance.
"this approach is way faster than store message in database. Is this statement correct?"
It can be slower. Kafka is asynchronous, so don't expect to get a response in the same time-period you could perform a database read/write. (Again, would require some API, and also, largely depends on the database used)
"is there any adapter for Kafka that exposes topics as REST?"
Yes, the Confluent REST Proxy is an Apache2 licensed product.
There is also a project divolte/divolte-collector for collecting click-data and other browser-driven events.
Otherwise, as you've discovered, create your own API in any language you are comfortable with, and have it use a Kafka producer client.
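The "create your own API" option can be sketched with only the standard library. This is an illustration under assumptions, not a reference implementation: the `publish` callable is a placeholder for whatever Kafka producer client you choose, and the `kioskId` field and `kiosk-events` topic name are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable

# Placeholder type for "send these bytes to this topic"; a real service
# would pass in a function backed by a Kafka producer client.
Publish = Callable[[str, bytes], None]


def handle_event(raw_body: bytes, publish: Publish, topic: str = "kiosk-events") -> int:
    """Validate a JSON event and hand it to the publisher; return an HTTP status."""
    try:
        event = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400
    if not isinstance(event, dict) or "kioskId" not in event:
        return 400
    publish(topic, json.dumps(event).encode("utf-8"))
    return 202  # accepted: the event was handed off, not yet processed


def make_handler(publish: Publish):
    """Build an http.server handler class that forwards POST bodies."""

    class EventHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            status = handle_event(self.rfile.read(length), publish)
            self.send_response(status)
            self.end_headers()

    return EventHandler
```

Returning 202 rather than 200 reflects the asynchronous nature mentioned above: the API acknowledges receipt, while delivery to consumers happens later.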

Is it me or are DynamoDB Streams just really lacking?

I have an application running in multiple regions in AWS; this application reads from global DynamoDB table(s). Updates occur in the background via another process, and I wanted to be able to monitor for these updates so the application can invalidate its cache (I'm not using DAX).
I was thinking I could use DynamoDB Streams for this; however, I ran into a number of roadblocks with the Spring Kinesis Streams Binder (e.g. the fact that it requires 2 tables [SpringIntegrationMetadataStore & SpringIntegrationLockRegistry] to be created, while my company doesn't allow dynamic creation of tables, so that was fun to hunt down as I couldn't find any mention in the docs - 🤷‍♀️ maybe I missed it). Now I think I have found out that only 1 application can listen to a Kinesis stream at a time?
Is that true?
Is there a way for multiple applications that only read from DynamoDB to get notified when an update occurs? I was thinking that I could use DynamoDB Streams such that each app would monitor the stream for updates and be able to invalidate its cache. If the above is true, then I need to do something more involved or complex (use SNS/SQS for updates, ElastiCache, Redis, Kafka), which just seems like overkill for this scenario.
"e.g. the fact that it requires 2 tables [SpringIntegrationMetadataStore & SpringIntegrationLockRegistry]"
Well, that's how consumer group management is handled by the Spring Cloud Stream Kinesis Binder. Even if you used only the KCL, it would still require an extra table in DynamoDB. Therefore your concern sounds more like a lack of confidence in the cloud services you use.
"Now I think I have found out that only 1 application can listen to a Kinesis stream at a time?"
That's not true if all your consumer applications are configured for different consumer groups.
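The consumer group idea can be illustrated with a toy in-memory model. This is a deliberately simplified sketch, not the Kinesis or Spring Cloud Stream API: `ToyStream` and the group names are hypothetical, and real binders also divide shards among the members of one group. The point it shows is that each group keeps its own checkpoint, so every group sees every record.

```python
from collections import defaultdict
from typing import Dict, List


class ToyStream:
    """In-memory stand-in for a stream; not a real Kinesis or Kafka API."""

    def __init__(self) -> None:
        self.records: List[str] = []
        # One read position (checkpoint) per consumer group. Persisting this
        # kind of state is why a binder needs some metadata storage.
        self.offsets: Dict[str, int] = defaultdict(int)

    def publish(self, record: str) -> None:
        self.records.append(record)

    def poll(self, group: str) -> List[str]:
        """Return the records this group has not seen yet and advance its checkpoint."""
        start = self.offsets[group]
        batch = self.records[start:]
        self.offsets[group] = len(self.records)
        return batch
```

With two applications polling as `app-one` and `app-two`, both receive the full sequence of updates, while two instances sharing the same group name would split the work between them.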
Please, make yourself familiar with Spring Cloud Stream and its model: https://docs.spring.io/spring-cloud-stream/docs/3.1.1/reference/html/spring-cloud-stream.html#_main_concepts
Another option would probably be an AWS Lambda trigger for DynamoDB Streams: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.html

architecture pattern for microservices

I have a microservices architecture whose logs have to be sent to a remote Kafka topic.
After that, the consumer of this topic will send the logs to an ELK stack (run by another team).
I want to have a dedicated microservice (fwk-proxy-elasticsearch) whose responsibility is to collect the logs from the other ones and send them to the remote Kafka topic.
What's the best protocol to dispatch all the logs aggregated from my microservices to the fwk-proxy-elasticsearch microservice?
I want this pattern so that I don't duplicate the security configuration of the remote Kafka topic. I want to centralize it in a single place.
May I use the Vert.x event bus for that? Or is Kafka better? Or some other tool?
May I use Vert.x to send messages from JVM to JVM?
Moreover, in a microservice architecture, is it a good pattern to centralize a use case in a dedicated microservice (a remote HTTP connection, for example)?
From my point of view, it allows business microservices to focus on a business issue and not worry about the protocol over which the result has to be sent.
Thanks!
I believe you can use either the Vert.x event bus or Kafka to propagate the logs; there are pros and cons to each approach.
While I understand the reasoning behind this decision, I would still consider a dedicated solution built for this purpose, like Fluentd, which is able to aggregate the logs and push them to multiple destinations (including Kafka, via the dedicated plugin). I'm sure there are other similar solutions.
There are a couple of important benefits that I see if you use a dedicated solution, instead of building it yourself:
The level of configurability, which is definitely useful in the future (in a custom-built solution, you need to write code each time you want to build something new)
The number of destinations where you can export the logs
Support for a hybrid architecture - with a few config updates, you will be able to grab logs from non-JVM microservices

Kafka user - project design advice

I am new to Kafka and data streaming and need some advice on the following requirement.
Our system is expecting close to 1 million incoming messages per day. Each message carries a project identifier and should be pushed only to users of that project. For our case, let's say we have projects A, B and C. Users who open project A's dashboard only see / receive messages of project A.
This is my idea so far for implementing a solution:
The messages should be pushed to a Kafka topic as they arrive; let's call this topic the Root Topic. Once pushed to the Root Topic, the messages can be read by a Kafka Consumer/Listener which, based on the project identifier in the message, pushes each message to a project-specific topic. So any message can end up at Topic A, B or C. I am thinking of using websockets to update the project users' dashboards as the messages arrive. There will be N Consumers/Listeners for the N project topics. These consumers will push the project-specific messages to the project-specific websocket endpoints.
Please advise if I can make any improvements to the above design.
I chose Kafka as the messaging system here because it is highly scalable and fault tolerant.
There is no complex transformation or data enrichment before a message gets sent to the client. Would it make sense to use Apache Flink or Hazelcast Jet for the streaming, or is Kafka Streams good enough for this simple requirement?
Also, when should I consider using Hazelcast Jet or Apache Flink in my project?
Should I use Flink when, say, I have to update a few properties in the message based on a web service call or database lookup before sending it to the users?
Should I use Hazelcast Jet only when I need the entire dataset in memory to arrive at a property value? Or will using Jet bring some benefits even for the simple use case specified above? Please advise.
Kafka Streams is a great tool for transforming one Kafka topic into another Kafka topic.
What you need is a tool to move data from a Kafka topic to another system via web sockets.
A stream processor gives you convenient tooling to build this data pipeline (among other things, connectors to Kafka and web sockets and a scalable, fault-tolerant execution environment). So you might want to use a stream processor even if you don't transform the data.
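Independent of which processor is chosen, the routing step from the question (one incoming flow fanned out by project identifier) is small. A framework-free sketch, where `ProjectRouter`, the `projectId` field, and the callbacks standing in for websocket sessions are all hypothetical names:

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A subscriber is anything that can accept a message, for example a wrapper
# around an open websocket session.
Subscriber = Callable[[dict], None]


class ProjectRouter:
    """Deliver each message only to subscribers of the message's project."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, project_id: str, callback: Subscriber) -> None:
        self._subscribers[project_id].append(callback)

    def route(self, message: dict) -> int:
        """Push the message to matching subscribers; return how many received it."""
        targets = self._subscribers.get(message["projectId"], [])
        for callback in targets:
            callback(message)
        return len(targets)
```

In a setup like this, the consumer of the root topic would call `route` for every record, which may make the intermediate per-project topics optional when the number of projects is small.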
The benefit of Hazelcast Jet is its embedded, scalable caching layer. You might want to cache your database/web service calls so that the enrichment is performed locally, reducing remote service calls.
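The effect of caching enrichment lookups can be shown with a plain in-process cache. This is not Jet's distributed cache, only a local analogy under assumptions: `slow_lookup` is a hypothetical stand-in for the database or web service call, and the `projectId` and `projectName` fields are made up.

```python
from functools import lru_cache

# Counts how often the "remote" service is actually hit.
CALLS = {"count": 0}


def slow_lookup(project_id: str) -> str:
    """Hypothetical remote call (database or web service)."""
    CALLS["count"] += 1
    return f"Project {project_id.upper()}"


@lru_cache(maxsize=1024)
def cached_lookup(project_id: str) -> str:
    """Serve repeated project ids from memory instead of calling out again."""
    return slow_lookup(project_id)


def enrich(message: dict) -> dict:
    """Return a copy of the message with the looked-up project name added."""
    return {**message, "projectName": cached_lookup(message["projectId"])}
```

A local cache like this never sees updates made elsewhere, so a real pipeline would need an expiry or invalidation strategy, which is part of what a managed caching layer provides.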
See how to use Jet to read from Kafka and how to write data to a TCP socket (not websocket).
I would like to give you another option. I'm not a Spark/Jet expert at all, but I've been studying them for a few weeks.
I would use Pentaho Data Integration (Kettle) to consume from Kafka, and I would write a Kettle step (or a User Defined Java Class step) to write the messages to a Hazelcast IMap.
Then I would use this approach http://www.c2b2.co.uk/middleware-blog/hazelcast-websockets.php to provide the websockets for the end users.