Reading only one message from a topic using the REST Proxy

I use Kafka version 2.2.0cp2 through the REST Proxy (in a Docker container). I need the consumer to always read only one message at a time.
I set max.poll.records=1 in the file /etc/kafka/consumer.properties, both as:
consumer.max.poll.records=1
and as:
max.poll.records=1
It had no effect.
Setting this value in other configs did not have any effect either.

So consumer.properties is not read by the REST Proxy.
Assuming consumer properties can be changed at all, the kafka-rest container env var would be KAFKA_REST_CONSUMER_MAX_POLL_RECORDS, but that setting only controls the Proxy server's internal poll loop, not the amount of data returned to the HTTP client...
There would have to be a limit parameter on the API itself, which does not exist - https://docs.confluent.io/current/kafka-rest/api.html#get--consumers-(string-group_name)-instances-(string-instance)-records

I don't see any consumer poll setting mentioned in the link below:
https://docs.confluent.io/current/kafka-rest/config.html
But if you know the average message size, you can pass max_bytes as below to limit the amount of record data returned:
GET /consumers/testgroup/instances/my_consumer/records?timeout=3000&max_bytes=300000 HTTP/1.1
max_bytes:
The maximum number of bytes of unencoded keys and values that should
be included in the response. This provides approximate control over
the size of responses and the amount of memory required to store the
decoded response. The actual limit will be the minimum of this setting
and the server-side configuration consumer.request.max.bytes. Default
is unlimited
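
For illustration, here is a minimal Java sketch of that request (assuming a REST Proxy listening on localhost:8082 and the example group/instance names above; the Accept header has to match the embedded format the consumer instance was created with, JSON in this case):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyFetch {
    public static void main(String[] args) throws Exception {
        // Hypothetical proxy address; group/instance names taken from the example above.
        String url = "http://localhost:8082/consumers/testgroup/instances/my_consumer/records"
                + "?timeout=3000&max_bytes=300000";

        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                // Must match the format the consumer instance was created with.
                .header("Accept", "application/vnd.kafka.json.v2+json")
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The body is a JSON array of records; max_bytes bounds the total
        // unencoded key/value size, not the number of records returned.
        System.out.println(response.body());
    }
}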

Related

How to define max.poll.records (SCS with Kafka) over containers

I'm trying to figure out the max.poll.records mechanism for Kafka with Spring Cloud Stream (SCS) in a K8s environment.
What is the recommended way to control max.poll.records?
How can I verify the value that is actually in effect?
Is it possible to define it once for all channels and then override for a specific channel?
(referring to this excerpt from the documentation):
To avoid repetition, Spring Cloud Stream supports setting values for
all channels, in the format of
spring.cloud.stream.kafka.default.consumer.<property>=<value>. The
following properties are available for Kafka consumers only and must
be prefixed with spring.cloud.stream.kafka.bindings.<channelName>.consumer.
Is this path supported: spring.cloud.stream.binding.<channel name>.consumer.configuration?
Is this: spring.cloud.stream.**kafka**.binding.<channel name>.consumer.configuration?
How are conflicts being resolved? Let's say in a case where both spring.cloud.stream.binding... and spring.cloud.stream.**kafka**.binding... are set?
I've tried all of the mentioned configurations, but I couldn't see in the log what the actual max.poll.records value is, and frankly the documentation is not entirely clear on the subject.
These are the configurations:
spring.cloud.stream.kafka.default.consumer.configuration.max.poll.records - the default if nothing else is specified for a given channel
spring.cloud.stream.kafka.bindings.<channelName>.consumer.configuration.max.poll.records - the per-channel override
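
For reference, a plain-Java sketch (outside Spring Cloud Stream) of the underlying consumer property that both of the above settings ultimately map to; the broker address and group id here are hypothetical:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OneRecordPerPollConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "scs-test");                // hypothetical group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // This is the consumer property that the SCS '...consumer.configuration.max.poll.records' settings set.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // The consumer logs its full 'ConsumerConfig values:' block at startup (INFO level),
        // which is one way to check which value actually took effect.
        consumer.close();
    }
}

Checking that startup log block ('ConsumerConfig values: ... max.poll.records = ...') is probably the easiest way to see which of the Spring properties won.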

Dynamic destination with spring cloud stream - Kafka

I'm developing a router (events proxy) application with Spring Cloud Stream over Kafka, using the functional paradigm. The application consumes from a constant input topic, maps and filters the message, and then should send it to some topic determined by input fields (only a single message at a time, not multiple results).
Is the best way to do this to set the spring.cloud.stream.sendto.destination header on the output message?
And if so, how should I set up the bindings for the producer?
You can also use StreamBridge.
With regard to the binding configuration...
If the destinations are truly dynamic, where you don't know the name of the destination (e.g., it may come in a message header), there is nothing you can do with regard to configuring it.
If they are semi dynamic, where you do know the name(s) and it's a limited set of names, then you can configure them like any other binding.
For example, let's say you are sending to destination foo; then you can use spring.cloud.stream.bindings.foo.....
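
A minimal sketch of the StreamBridge approach (the routing rule, topic names, and header used here are hypothetical; StreamBridge.send creates and caches an output binding for whatever destination name you pass it):

import org.springframework.cloud.stream.function.StreamBridge;
import org.springframework.messaging.Message;
import org.springframework.stereotype.Component;

@Component
public class EventRouter {

    private final StreamBridge streamBridge;

    public EventRouter(StreamBridge streamBridge) {
        this.streamBridge = streamBridge;
    }

    // Hypothetical routing rule: pick the destination topic from a header of the event.
    public void route(Message<?> event) {
        Object region = event.getHeaders().get("region"); // hypothetical header
        String destination = "orders-" + region;          // e.g. "orders-eu", "orders-us"
        streamBridge.send(destination, event);
    }
}

If the set of destinations is known up front (the "semi dynamic" case above), each of those names can also be configured like a normal binding, e.g. via spring.cloud.stream.bindings.orders-eu... properties (names hypothetical).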

Kafka Streams - accessing data from the metrics registry

I'm having a difficult time finding documentation on how to access the data within the Kafka Streams metric registry, and I think I may be trying to fit a square peg in a round hole. I was hoping to get some advice on the following:
Goal
Collect metrics being recorded in the Kafka Streams metrics registry and send these values to an arbitrary end point
Workflow
This is what I think needs to be done, and I've completed all of the steps except the last (I'm having trouble with that one because the metrics registry is private). But I may be going about this the wrong way:
Define a class that implements the MetricsReporter interface. Build a list of the metrics that Kafka creates in the metricChange method (e.g. whenever this method is called, update a hashmap with the currently registered metrics).
Specify this class in the metric.reporters configuration property
Set up a process that polls the Kafka Streams metric registry for the current data, and ship the values to an arbitrary end point
Anyway, the last step doesn't appear to be possible in Kafka 0.10.0.1 since the metrics registry isn't exposed. Could someone please let me know if this is the correct workflow (it sounds like it's not...), or if I am misunderstanding the process for extracting the Kafka Streams metrics?
Although the metrics registry is not exposed, you can still get the value of a given KafkaMetric via its KafkaMetric.value() / KafkaMetric.value(timestamp) methods. For example, as you observed in the JmxReporter, it keeps the list of KafkaMetrics built up from init() and the metricChange/metricRemoval methods, and then in its MBean implementation, when getAttribute is called, it calls the corresponding KafkaMetric.value() function. For your customized reporter you can apply a similar pattern: for example, periodically poll the value() of all kept KafkaMetrics and pipe the results to your end point.
The MetricsReporter interface in org.apache.kafka.common.metrics already enables you to manage all Kafka Streams metrics in the reporter, so the internal Kafka registry is not needed.
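
A rough sketch of such a reporter (the class name and endpoint logic are hypothetical; this assumes a client version where KafkaMetric.value() is still available, as in the 0.10.x line mentioned above):

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.MetricsReporter;

public class EndpointMetricsReporter implements MetricsReporter {

    private final Map<MetricName, KafkaMetric> metrics = new ConcurrentHashMap<>();

    @Override
    public void init(List<KafkaMetric> initial) {
        for (KafkaMetric m : initial) {
            metrics.put(m.metricName(), m);
        }
    }

    @Override
    public void metricChange(KafkaMetric metric) {
        metrics.put(metric.metricName(), metric);
    }

    @Override
    public void metricRemoval(KafkaMetric metric) {
        metrics.remove(metric.metricName());
    }

    // Call this periodically from your own scheduler to push current values.
    public void report() {
        metrics.forEach((name, metric) -> sendToEndpoint(name.name(), metric.value()));
    }

    private void sendToEndpoint(String name, double value) {
        // Placeholder for the arbitrary end point (HTTP POST, StatsD, ...).
        System.out.println(name + " = " + value);
    }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}

The class is then registered via the metric.reporters property of the streams configuration (e.g. metric.reporters=com.example.EndpointMetricsReporter), as in step 2 above.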

Read keys only from Kafka

Is it possible to read only the keys from Kafka? We have an application where the values stored in the Kafka log are quite big. In order to debug and quickly check whether a certain message is (still) in the log, and at which offset, it would be great to just fetch and grep through the keys instead of reading the whole message value. Just discarding the value on the consumer side would be a big waste of time and bandwidth.
Can we get the keys only? How? Java solutions preferred, but Scala would be fine too.
As per the Kafka wire protocol, there is no way to fetch only the keys or only the values. The fetch request does not contain any field for requesting only keys or values, so the returned message set will contain the full keys and values of the returned messages.
You can of course filter out the values on the client side, but currently I don't see any way to avoid the network overhead you are trying to eliminate.
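
A client-side sketch along those lines, keeping only the keys (broker, group, and topic names are hypothetical; the values are still fetched over the network, they are just never decoded):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KeyScanner {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // hypothetical broker
        props.put("group.id", "key-scanner");                // hypothetical group
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        // Leave the values as raw bytes so no time is spent deserializing them.
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // hypothetical topic
            ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, byte[]> record : records) {
                System.out.println(record.key() + " @ offset " + record.offset());
            }
        }
    }
}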

jute.maxbuffer affects only incoming traffic

Does this value only affect incoming traffic? If I set this value to, say, 4 MB on the ZooKeeper server as well as on the ZooKeeper client and then start my client, will I still get data > 4 MB when I request the path /abc/asyncMultiMap/subs? If /subs has more than 4 MB of data, will the server break it up into chunks <= 4 MB and send it to the client in pieces?
I am using ZooKeeper 3.4.6 on both the client (via vertx-zookeeper) and the server. I see errors on the clients complaining that the packet length is greater than 4 MB.
java.io.IOException: Packet len4194374 is out of range!
at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
"This is a server-side setting"
This statement is incorrect, jute.maxbuffer is evaluated on client as well by Record implementing classes that receive InputArchive. Each time a field is read and stored into an InputArchive the value is checked against jute.maxbuffer. Eg ClientCnxnSocket.readConnectResult
I investigated it in ZK 3.4
There is no chunking in the response.
This is a server-side setting. You will get this error if the entire response is greater than the jute.maxbuffer setting. This response limit also includes the list of children of a znode, so even if /subs does not hold a lot of data itself but has enough children that the total length of their paths exceeds the max buffer size, you will get the error.
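
Since the check also runs on the client, the client JVM needs the property raised as well; a minimal sketch with a hypothetical 8 MB limit and ensemble address follows (the property is generally expected to be set consistently on servers and clients, and because it is read when the ZooKeeper classes are first loaded, passing it as -Djute.maxbuffer=8388608 on the command line is the safer option):

import org.apache.zookeeper.ZooKeeper;

public class LargeNodeClient {
    public static void main(String[] args) throws Exception {
        // Must happen before any ZooKeeper/jute classes are loaded;
        // equivalent to -Djute.maxbuffer=8388608 on the JVM command line.
        System.setProperty("jute.maxbuffer", String.valueOf(8 * 1024 * 1024));

        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { }); // hypothetical ensemble
        byte[] data = zk.getData("/abc/asyncMultiMap/subs", false, null);
        System.out.println("read " + data.length + " bytes");
        zk.close();
    }
}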