Is a web frontend producing directly to a Kafka broker a viable idea?

Is a web frontend producing directly to a Kafka broker a viable idea? - apache-kafka

I have just started learning Kafka. So trying to build a social media web application. I am fairly clear on how to use Kafka for my backend ( communicating from backend to databases and other services).
However, I am not sure how should frontend communicate with backend. I was considering an architecture as: Frontend -> Kafka -> Backend.
Frontend acts as producer and backend as consumer. In this case, frontend would supposedly have all required resources to publish to Kafka broker (even if I implement security on Kafka). Now, is this scenario possible:
Lets say I impersonate the frontend and send absurd/invalid messages to my Kafka broker. Now I can handle and filter these messages when they reach to my backend. But I know that Kafka stores these messages temporarily. Wouldn't my Kafka server face DDOS problems if such "fake" messages are published to it in high volume, since it is gonna store them anyway as they dont get filtered out until they actually get consumed by backend?
If so, how can I prevent this?
Or is this not a good option? I can also try using REST for frontend/backend communication and then Kafka will be used from backend to communicate with database(s) and other stuff.
Or I can have a middleware (again, REST) that detects and filters out such messages.

Easiest way is to have the front end produce to the Kafka REST Proxy
See details here https://docs.confluent.io/1.0/kafka-rest/docs/intro.html
That way there is no kafka client code required in your front end and you can use HTTP(S) with standard off the shelf load balancers, and API Management tools.

Could you not consider the other direction, to use Kafka as a transport system for updating assets available to frontend ? This has been proposed for hybrid React / NodeJS/Express solutions.

Related

Connecting to topics using Rest proxy

I am new to Kafka .I have implemented my consumer as normal Java springboot application.I need to connect to the topic deployed on remote broker using Kafka rest proxy.
I am not able to understand how it will function differently if i use Kafka rest proxy.Where i should do change in my code to include the rest proxy.Do i need to structure my code complete different as i didn't think about rest proxy while creation.
I maybe wrong with the terminologies.
Any help or guidance would be of great help.

REST proxy would be used with any HTTP client, not a Kafka consumer (so create a WebClient bean rather than a ConsumerFactory, etc)
You can refer its documentation for how you can consume records over HTTP, but, simply put, the code will be completely different up until you parse the data

Alternative of Confluent REST Proxy

We have some applications which want to communicate with Kafka using REST API calls to both consume and produce messages. If we do not want to use Confluent REST Proxy, what are the options ?

One possible alternative is the Strimzi Kafka Bridge (https://github.com/strimzi/strimzi-kafka-bridge).
It's part of the broader Strimzi project about running Kafka on Kubernetes but work even running as standalone (when your Kafka cluster is on bare metal).
Of course it's open source and Apache 2.0 licensed.

the reason [not to use it] is monetary
You can use the Confluent REST Proxy with no software/licensing costs.
We are thinking of not buying any additional hardware for this new request and use existing configuration to meet the requirement.I am mostly interested to know if consumer/producer can be created to meet this requirement
You don't need extra hardware.
Pick an existing server with at least 2GB available of memory, and run kafka-rest-start and see how well it works
if we can create Rest-API calls which will be used by other applications to consume data from Kafka and push data to Kafka
That's the main purpose of REST Proxy, yes.

Use Kafka in AWS EBS microservice docker environment to avoid losing user requests and handle more concurrent hits

Currently, I am using AWS EBS microservice docker environment to deploy the micro-services which are written in Scala and Akka. If anyone of the microservice docker is crashed and restarted again. In this case, we will lose the user requests and service will not return any response for those cases. My current architecture can handle up to 1000 concurrent requests without any issues. To avoid this issues, I am planning to store and retrieve all the requests and responses using Kafka.
So I want to use Kafka to manage the request and responses of all my web services and include a separate service or web socket to process all the requests and store the responses again to Kafka. In this case, if my core process docker crashed or restarted. It won't lose any request and responses at any point in time. It will again start to read the requests from Kafka and process it.
All the web services will store the request in relevant topic in Kafka and get the response from relevant response topic and return back to an API response. I have found the following library to use Kafka in Scala web services.
https://github.com/akka/reactive-kafka/
Please check the attached architecture diagram which I am going to use it to efficiently handle a large number of concurrent requests from client apps. Is it a good approach to proceed? Do I need to changes anything in my architecture?
I have created this architecture after done more research about Kafka and microservice dockers. Please let me know if anything wrong with this architecture.

This is Kafka's bread and butter so I don't think you're going to run into any architectural issues with this. Just be aware that there is a pretty large amount of operational overhead with Kafka. A good resource for getting started is Kafka: The Definitive Guide written by Confluent (https://www.confluent.io/wp-content/uploads/confluent-kafka-definitive-guide-complete.pdf) . It goes into detail on a lot of common operational issues that they don't mention in the documentation.

What are the benefits of Apache Kafka's native binary TCP protocol over it's restful API?

As per Apache Kafka's documentation, Kafka uses binary TCP protocol in it's native API's communication but they have also provided URL based restful API for the languages which don't support Apache Kafka's native API. I was just wondering if there is any benefit of native binary TCP protocol (supported in native API) over restful URL based communication with broker node? And I was also thinking that will restful API still maintain only once property?
Edit:
The restful API guide is here: https://www.confluent.io/blog/a-comprehensive-open-source-rest-proxy-for-kafka which explains how to produce and consume Kafka's message over restful API

There is no REST API included in Apache Kafka for producing or consuming messages as with the native Kafka protocol client implemented in Java.
There is a REST APIs in Apache Kafka for configuring Kafka Connect.
There are a number of third party REST Proxy implementations (such as the Confluent Kafka REST Proxy) which allow pub/sub over a REST interface but these are separate open source projects outside of Apache Kafka.
If you mean to ask what are the advantages to use the native Kafka Java Producer/Consumer API rather than these third party REST/HTTP Proxy implementation then these are some reasons:
One benefit is greater parallelism. A Kafka client will typically open up TCP connections to multiple brokers in the cluster and send or fetch data in parallel across multiple partitions of the same topic.
Another benefit is better network utilization as HTTP headers can add a lot of size to otherwise small messages while Kafka’s wire protocol is a compact binary protocol.
Kafka clients handle load balancing, failover, and cluster expansion or contraction automatically while REST clients typically require a third party load balancer to achieve the same functionality.
Kafka client can send their own authentication credentials for access control and bandwidth throttling (quotas) while all REST clients look to the kafka cluster as one Kafka client and therefore have common ACL privileges.
Kafka client libraries buffer and batch messages together into smaller numbers of Kafka produce or fetch requests while HTTP can only batch data if the programmer thought to publish them as a single batch.
The native Kafka protocol supports more than just what the producer/consumer api exposes. There is also an Admin API for creating topics, and modifying topic configurations. These functions are not (yet) exposed through the most popular REST Proxy implementations.

ActiveMQ Deployment model

I have gone through (not fully) ActiveMQ and tried to figure out the deployment model for my application.
I am bit confused on that.
I want to make the system High Availability and decided to use the following. Please correct me if anything is wrong or disadvantage of the model.
Deployment Modle:
Will deploy Brokers in M1 and M2 respectivley.
Use Hardware load balancer (Either F5 or Zeus) to connect either one of the broker (M1 or M2) based on the load.
Want to publish a message using Load balancer URL.
I have gone through network of brokers and we need to mintain some topology. I fell which makes the system more complicated if system grows horizontally. So it is better to have one load balancer to distribute the load.
Questions
Is this above model will send message to any one of the Broker?
Consumer Will be deployed in Tomcat (Think i need to use embeded brokers to configure either M1 or M2). Is it possible to use Load balancer URL instaed of M1 or M2?
Is it possible to have single Web Console Admin to monitor both M1 and M2.
Do we have any performance issue using Spring's feature to consume message.
Sorry to shoot out so many questions. Please help me to correct the deployment model.

I think the best way to get some load balancing with some activemq servers is having a : network of brokers and your consumers/producers (in your webapps) should use some failover
So if a producer p1 send a message on a queue on broker 1, the consumer c1 can read the message on broker 2.
[Edit] I have never tried to add some hardware balancer instead of the activemq protocole failover. It should work : just try it and tell us.
3- I do not think it is possible to have only one Web Console to monitor both of your brokers.
4- As far as I am concerned I do not have any performance issue with my Spring configuration.

There are a lot of questions there.
The first thing is to do is start simple. If your application's load is being handled with just one broker, consider setting up high availability through a master-slave setup. For this you do not need a load balancer - the ActiveMQ client library has a failover mechanism where you can define the URLs to a set of brokers that the client should attempt to connect to.
If you are looking at setting up an infrastructure where one broker will not be able to deal with the message load (you can test the maximum throughput of your broker using the performance module), I would advise you to read up on how networks of brokers work. If you do go down this path, you really need to understand ActiveMQ.
On monitoring, a web console can only show you the internals of a single broker. To get insight around what is going on around a set of brokers you will need a monitoring tool such as FuseHQ/Hyperic that is able to aggregate JMX information from a number of boxes.
Performance with Spring is not a problem as long as you configure it correctly (see the section on PooledConnectionFactory).
I see that you are a new user, so if this answers your question, please tick it.