Assume that I own a Kafka cluster and I ask some clients (web apps) to send data to Kafka. How can I make sure that a client who creates a producer to connect to my Kafka brokers will do the partitioning the right way, if the client uses a custom partitioner?
AFAIK it's not possible to restrict Kafka clients/cluster to a particular partitioner. But if your producer is hidden behind some facade interface, you can check whether the key of each message has been created the right way.
Your facade can accept ProducerRecords, for example. In that case you have access to the key and value fields.
https://kafka.apache.org/090/javadoc/org/apache/kafka/clients/producer/ProducerRecord.html
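A minimal sketch of such a facade, assuming String keys and values and a hypothetical tenant-<id> key convention (swap in whatever "the right way" means in your setup):

```java
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical facade: clients hand over ProducerRecords, and we reject any
// record that picks its own partition or breaks the agreed key convention,
// so the default partitioner hashes every key consistently.
public class ValidatingProducerFacade {
    private final Producer<String, String> producer;

    public ValidatingProducerFacade(Producer<String, String> producer) {
        this.producer = producer;
    }

    public void send(ProducerRecord<String, String> record) {
        if (record.partition() != null) {
            throw new IllegalArgumentException("explicit partitions are not allowed");
        }
        if (record.key() == null || !record.key().startsWith("tenant-")) { // hypothetical convention
            throw new IllegalArgumentException("key must follow the tenant-<id> convention");
        }
        producer.send(record);
    }
}
```

Since the facade owns the KafkaProducer instance, clients never get to install a custom partitioner at all; the only thing left to validate is the record itself.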
I work in an organization where we must use the shared Kafka cluster.
Due to internal company policy, the account we use for authentication has only read/write permissions assigned.
We are not able to request the topic-create permission.
To create the topic we need to follow the onboarding procedure and know the topic name upfront.
As we know, Kafka Streams creates internal topics to persist the stream's state.
Is there a way to disable the fault tolerance and keep the stream state in memory or persist in the file system?
Thank you in advance.
This entirely depends on how you write the topology. For example, stateless DSL operators such as map/filter/forEach don't create any internal topics.
If you actually need to do aggregations and build state stores, then you really shouldn't disable the topics. Yes, state stores are kept either in memory or in RocksDB on disk, but they are also backed by changelog topics so the state can be redistributed, or rebuilt in case of failure.
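That said, if you knowingly accept the loss of fault tolerance, the DSL lets you disable the changelog per store. A minimal sketch, assuming a simple count over an input topic with default serdes configured (topic and store names are hypothetical):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.Stores;

StreamsBuilder builder = new StreamsBuilder();

// Keep the count's state in an in-memory store and disable its changelog
// topic, so no internal topic is created for it. Trade-off: the state is
// gone for good after a crash or rebalance.
builder.<String, String>stream("input-topic")
       .groupByKey()
       .count(Materialized.<String, Long>as(Stores.inMemoryKeyValueStore("counts-store"))
                          .withLoggingDisabled());
```

Note that this only covers changelog topics; repartition topics (created when you change the key before an aggregation or join) cannot be switched off this way.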
If you want to prevent them, I think you'll need an authorizer class defined on the broker that can restrict topic creation based, at least, on the client-side application.id and client.id regex patterns; there's nothing you can do in the client config alone.
I want to use Kafka Connect to read events from a Kafka topic and write them to RabbitMQ.
In order to do so, I need to use the RabbitMQ sink.
Each of the events coming from Kafka should be sent to a different queue (based on some field in the event structure), which means a different routing key should be used. As far as I know, there's an option to configure a static routing key in the sink configuration. Is there any option to configure it dynamically, based on the events, to achieve the required behavior?
I only found references to MirrorMaker v2.
Can I reuse org.apache.kafka.connect.mirror.MirrorSourceConnector as if it were a "plain" Connector with Kafka as a source, or is there something else, hopefully simpler, available?
I'm trying to use Kafka Connect and (a combination of) its SMTs to simulate message routing behaviour found in other message brokers.
For example, I would like to consume from a topic, extract values from the message (either headers or payload), and route the message to another topic within the same cluster depending on the data found in the message.
Thank you
"within the same cluster"
Then that's what Kafka Streams or ksqlDB are for. You can import and use SMT methods directly from your own code, although you'll also need the Connect Converter classes to get the Schema/Struct types that most SMTs require.
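For the routing part, a minimal Kafka Streams sketch that picks the destination topic per record via a TopicNameExtractor (the "route" header and topic names are hypothetical, and the destination topics must already exist, since Streams won't create them):

```java
import java.nio.charset.StandardCharsets;

import org.apache.kafka.common.header.Header;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> events = builder.stream("inbound-topic");

// The TopicNameExtractor lambda sees key, value, and the RecordContext
// (headers, source topic, timestamp) and returns the destination topic.
events.to((key, value, recordContext) -> {
    Header route = recordContext.headers().lastHeader("route"); // hypothetical header
    return route != null
            ? new String(route.value(), StandardCharsets.UTF_8)
            : "unrouted-events";
});
```

The same lambda could just as well inspect a field of the deserialized payload instead of a header.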
While you could use MirrorMaker, its purpose is replication between clusters, not routing within one.
When creating a Kafka producer, we can assign a client id. What is it used for? Can I get the producer's client id in a consumer, for example, to see which producer produced the message?
No, a consumer cannot get the producer's client-id.
From the Kafka documentation, client-ids are:
An id string to pass to the server when making requests. The purpose of this is to be able to track the source of requests beyond just ip/port by allowing a logical application name to be included in server-side request logging.
They are only used for identifying clients in the broker logs.
No, you'd have to pass it on as part of the key or value if you need it at the consumer side.
Kafka's philosophy is to decouple producers and consumers. A topic can be read by 0-n consumers and be written to by 0-n producers. Kafka is usually used for communication between (micro)service boundaries where services don't care about who produced a message, just about its contents.
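A minimal sketch of both points, assuming String keys/values and hypothetical names: client.id exists only for broker-side bookkeeping, so if consumers need to know the producer, send that identity explicitly, e.g. in a record header:

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// client.id shows up in broker-side request logs and metrics only;
// it is never delivered to consumers.
props.put(ProducerConfig.CLIENT_ID_CONFIG, "checkout-service");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    ProducerRecord<String, String> record =
            new ProducerRecord<>("orders", "order-42", "{\"total\": 10}");
    // Carry the producer's identity in the message itself, since the
    // client.id above never reaches the consumer.
    record.headers().add("producer-app",
            "checkout-service".getBytes(StandardCharsets.UTF_8));
    producer.send(record);
}
```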
Is it possible to create an alias of a topic name?
Or, put another way...
If a user writes to topic examplea, is it possible to override that at the broker so they actually write to topic exampleb?
Alternatively, if the topic was actually written as examplea, could the consumer refer to it as exampleb?
I'm thinking it could probably be achieved with a small hack at the broker, where it rewrites its replies to metadata requests, but I'd rather not do that if it can be done in some standard way.
Aliases are not natively supported in Kafka.
One workaround could be to produce to examplea and have a consumer/producer pair that consumes from examplea and produces to exampleb. The pair could be written with the plain Kafka clients, as a connector in Connect, as a MirrorMaker instance (though you'd need to modify it to change the topic name), or as a Kafka Streams job. Note that messages will appear in exampleb slightly after examplea, because they're copied after being written.
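The Kafka Streams variant is about as small as such a copier gets. A sketch, assuming the application id is free to choose and both topics already exist:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "examplea-to-exampleb-copier"); // hypothetical id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Copy the raw bytes; no need to deserialize the payload.
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.ByteArraySerde.class);
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.ByteArraySerde.class);

StreamsBuilder builder = new StreamsBuilder();
builder.stream("examplea").to("exampleb"); // stateless copy, no internal topics

new KafkaStreams(builder.build(), props).start();
```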
How are you writing to the Kafka topic? If it's via a REST proxy, you should be able to rewrite the topic portion of the URL using NGINX or a similar reverse-proxy intermediary.