Use Kafka in AWS EBS microservice docker environment to avoid losing user requests and handle more concurrent hits - scala

Currently, I am using AWS EBS microservice docker environment to deploy the micro-services which are written in Scala and Akka. If anyone of the microservice docker is crashed and restarted again. In this case, we will lose the user requests and service will not return any response for those cases. My current architecture can handle up to 1000 concurrent requests without any issues. To avoid this issues, I am planning to store and retrieve all the requests and responses using Kafka.
So I want to use Kafka to manage the request and responses of all my web services and include a separate service or web socket to process all the requests and store the responses again to Kafka. In this case, if my core process docker crashed or restarted. It won't lose any request and responses at any point in time. It will again start to read the requests from Kafka and process it.
All the web services will store the request in relevant topic in Kafka and get the response from relevant response topic and return back to an API response. I have found the following library to use Kafka in Scala web services.
https://github.com/akka/reactive-kafka/
Please check the attached architecture diagram which I am going to use it to efficiently handle a large number of concurrent requests from client apps. Is it a good approach to proceed? Do I need to changes anything in my architecture?
I have created this architecture after done more research about Kafka and microservice dockers. Please let me know if anything wrong with this architecture.

This is Kafka's bread and butter so I don't think you're going to run into any architectural issues with this. Just be aware that there is a pretty large amount of operational overhead with Kafka. A good resource for getting started is Kafka: The Definitive Guide written by Confluent (https://www.confluent.io/wp-content/uploads/confluent-kafka-definitive-guide-complete.pdf) . It goes into detail on a lot of common operational issues that they don't mention in the documentation.

Related

Ressource announcement - Message queing registry on Kubernetes

I have an application which makes use of RabbitMQ messages - it sends messages. Other applications can react on these messages but they need to know which messages are available on the system and what they semantically mean.
My message queuing system is RabbitMQ and RabbitMQ as well as the applications are hosted and administered using Kubernetes.
I am aware of
https://kubemq.io/: That seems to be an alternative to RabbitMQ?
https://knative.dev/docs/eventing/event-registry/ also an alternative to RabbitMQ? but with a meta-layer approach to integrate existing event sources? The documentation is not clear for me.
Is there a general-purpose "MQ-interface service availabe, a solution, where I can register which messages are sent by an application, how the payload is technically and semantically set up, which serialization format is used and under what circumstances errors will be sent?
Can I do this in Kubernetes YAML-files?
RabbitMQ does not have this in any kind of generic fashion so you would have to write it yourself. Rabbit messages are just a series of bytes, there is no schema registry.

Alternative of Confluent REST Proxy

We have some applications which want to communicate with Kafka using REST API calls to both consume and produce messages. If we do not want to use Confluent REST Proxy, what are the options ?
One possible alternative is the Strimzi Kafka Bridge (https://github.com/strimzi/strimzi-kafka-bridge).
It's part of the broader Strimzi project about running Kafka on Kubernetes but work even running as standalone (when your Kafka cluster is on bare metal).
Of course it's open source and Apache 2.0 licensed.
the reason [not to use it] is monetary
You can use the Confluent REST Proxy with no software/licensing costs.
We are thinking of not buying any additional hardware for this new request and use existing configuration to meet the requirement.I am mostly interested to know if consumer/producer can be created to meet this requirement
You don't need extra hardware.
Pick an existing server with at least 2GB available of memory, and run kafka-rest-start and see how well it works
if we can create Rest-API calls which will be used by other applications to consume data from Kafka and push data to Kafka
That's the main purpose of REST Proxy, yes.

Monitoring UI for Apache kafka - kafka manager vs kafka monitor [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I am new to kafka. We want to monitor and manage kafka topics. We tried different open source monitoring tools like
kafka-monitor
kafka-manager
Both tools are good. But we are unable to make a decision which should be included in our deployment stack. Which one is better and why, and in which scenario?
'kafka manager' from yahoo looks the older one and 'kafka monitor' from LinkedIn is newer one
Kafka Monitor-
Lenses
Lenses (ex Landoop) enhances Kafka with User Interface, streaming SQL engine and cluster monitoring. It enables faster monitoring of Kafka data pipelines.
They provide a free all-in-one docker (Lenses Box) which can serve a single broker for up to 25M messages. Note that this is recommended for development environments.
Cloudera SMM
Streams Messaging Manager is the solution for monitoring and managing clusters running Cloudera or Hortonworks kafka. It also comes with replication capability.
Confluent
Another option is Confluent Enterprise which is a Kafka distribution for production environments. It also includes Control Centre, which is a management system for Apache Kafka that enables cluster monitoring and management from a User Interface.
Yahoo CMAK (Cluster Manager for Apache Kafka, previously known as Kafka Manager)
Kafka Manager or CMAK is a tool for monitoring Kafka offering less functionality compared to the aforementioned tools.
KafDrop
KafDrop is a UI for monitoring Apache Kafka clusters. The tool displays information such as brokers, topics, partitions, and even lets you view messages. It is a lightweight application that runs on Spring Boot and requires very little configuration.
LinkedIn Burrow
Burrow is a monitoring companion for Apache Kafka that provides consumer lag checking as a service without the need for specifying thresholds. It monitors committed offsets for all consumers and calculates the status of those consumers on demand. An HTTP endpoint is provided to request status on demand, as well as provide other Kafka cluster information. There are also configurable notifiers that can send status out via email or HTTP calls to another service.
Kafka Tool
Kafka Tool is a GUI application for managing and using Apache Kafka clusters. It provides an intuitive UI that allows one to quickly view objects within a Kafka cluster as well as the messages stored in the topics of the cluster. It contains features geared towards both developers and administrators.
If you cannot afford licenses, then go for Yahoo Kafka Manager, LinkedIn Burrow or KafDrop. Confluent's and Landoop's products are the best out there, but unfortunately, they require licensing.
For more details, you can refer to my blog post Overview of UI Monitoring tools for Apache Kafka Clusters.
If you want to pay for licensing and Kafka cluster support, then you can use Confluent Control Center
Alternatively, the free route would be to use JMX exporters from Datadog and/or Prometheus/Influxdb (with Grafana dashboards) to see overall system health checks (CPU, network, memory, etc)... Much more information than what you get only by monitoring Kafka processes with Kafka tools
At my company, we used the Yahoo product, we investigated the LinkedIn product, and several others mentioned. My company ultimately chose to use Prometheus+Grafana. Everyone loves it and I'd highly recommend it.
There are two big advantages to Prometheus+Grafana. The first is it does full featured Kafka metrics ingestion+visualization+alerting but it's not limited to Kafka. While our initial needs were just to monitor Kafka, we also wanted metrics on HTTP servers+traffic, server utilization (cpu/ram/disk), and custom application level metrics. Prometheus handles all of the above. Secondly, Prometheus + Grafana are very high quality, well designed, and easy to use. A lot of other products in this space are old and complicated to work with. Prometheus + Grafana are both excellent to work with, they are very customizable, polished, and easy to use. Grafana has a very flashy + functional JavaScript interface that lets you make exactly the customized dashboards that you want. Prometheus has a very polished metric collection engine, storage engine, query language, and alerting system. Something like Yahoo Kafka Manager has much more limited functionality in all of these categories.
If you want to try Prometheus, you need to do two things:
1) install+configure the JMX->Prometheus exporter on your Kafka brokers:
https://github.com/prometheus/jmx_exporter
2) Setup a Prometheus server to collect metrics + and setup a Grafana dashboard to display the graphs that you want.
I'd also say that this is just for monitoring+dashboards+alerting. For management functions, you still need other tools.
The kafka-monitor is (despite the name) a load generation and reporting tool. Yahoo's kafka-manager is an overall monitoring tool.

Is a web frontend producing directly to a Kafka broker a viable idea?

I have just started learning Kafka. So trying to build a social media web application. I am fairly clear on how to use Kafka for my backend ( communicating from backend to databases and other services).
However, I am not sure how should frontend communicate with backend. I was considering an architecture as: Frontend -> Kafka -> Backend.
Frontend acts as producer and backend as consumer. In this case, frontend would supposedly have all required resources to publish to Kafka broker (even if I implement security on Kafka). Now, is this scenario possible:
Lets say I impersonate the frontend and send absurd/invalid messages to my Kafka broker. Now I can handle and filter these messages when they reach to my backend. But I know that Kafka stores these messages temporarily. Wouldn't my Kafka server face DDOS problems if such "fake" messages are published to it in high volume, since it is gonna store them anyway as they dont get filtered out until they actually get consumed by backend?
If so, how can I prevent this?
Or is this not a good option? I can also try using REST for frontend/backend communication and then Kafka will be used from backend to communicate with database(s) and other stuff.
Or I can have a middleware (again, REST) that detects and filters out such messages.
Easiest way is to have the front end produce to the Kafka REST Proxy
See details here https://docs.confluent.io/1.0/kafka-rest/docs/intro.html
That way there is no kafka client code required in your front end and you can use HTTP(S) with standard off the shelf load balancers, and API Management tools.
Could you not consider the other direction, to use Kafka as a transport system for updating assets available to frontend ? This has been proposed for hybrid React / NodeJS/Express solutions.

ActiveMQ Deployment model

I have gone through (not fully) ActiveMQ and tried to figure out the deployment model for my application.
I am bit confused on that.
I want to make the system High Availability and decided to use the following. Please correct me if anything is wrong or disadvantage of the model.
Deployment Modle:
Will deploy Brokers in M1 and M2 respectivley.
Use Hardware load balancer (Either F5 or Zeus) to connect either one of the broker (M1 or M2) based on the load.
Want to publish a message using Load balancer URL.
I have gone through network of brokers and we need to mintain some topology. I fell which makes the system more complicated if system grows horizontally. So it is better to have one load balancer to distribute the load.
Questions
Is this above model will send message to any one of the Broker?
Consumer Will be deployed in Tomcat (Think i need to use embeded brokers to configure either M1 or M2). Is it possible to use Load balancer URL instaed of M1 or M2?
Is it possible to have single Web Console Admin to monitor both M1 and M2.
Do we have any performance issue using Spring's feature to consume message.
Sorry to shoot out so many questions. Please help me to correct the deployment model.
I think the best way to get some load balancing with some activemq servers is having a : network of brokers and your consumers/producers (in your webapps) should use some failover
So if a producer p1 send a message on a queue on broker 1, the consumer c1 can read the message on broker 2.
[Edit] I have never tried to add some hardware balancer instead of the activemq protocole failover. It should work : just try it and tell us.
3- I do not think it is possible to have only one Web Console to monitor both of your brokers.
4- As far as I am concerned I do not have any performance issue with my Spring configuration.
There are a lot of questions there.
The first thing is to do is start simple. If your application's load is being handled with just one broker, consider setting up high availability through a master-slave setup. For this you do not need a load balancer - the ActiveMQ client library has a failover mechanism where you can define the URLs to a set of brokers that the client should attempt to connect to.
If you are looking at setting up an infrastructure where one broker will not be able to deal with the message load (you can test the maximum throughput of your broker using the performance module), I would advise you to read up on how networks of brokers work. If you do go down this path, you really need to understand ActiveMQ.
On monitoring, a web console can only show you the internals of a single broker. To get insight around what is going on around a set of brokers you will need a monitoring tool such as FuseHQ/Hyperic that is able to aggregate JMX information from a number of boxes.
Performance with Spring is not a problem as long as you configure it correctly (see the section on PooledConnectionFactory).
I see that you are a new user, so if this answers your question, please tick it.