We have been chasing a strange behavior for a while now and are asking for your help because we have run out of ideas. Maybe someone has a few good ideas/pointers?
Components we use:
We are using a Kafka 2.2.1 (iirc) cluster - 3 node setup
Topics have 6 partitions
We are working in Java and using org.springframework.kafka:spring-kafka 2.6.3 (also tested with the latest 2.7.7 - did not help)
The problem:
In our use case we have a few topics without a continuous consumer; instead we fire one up (only one at a time, consuming all partitions) periodically / on demand. We use MANUAL AckMode because sometimes we just want to count the messages and not really process them.
The consumer is created from Java code programmatically - done by this snippet:
@Autowired
KafkaListenerContainerFactory kafkaListenerContainerFactory;
MessageListenerContainer listenerContainer = kafkaListenerContainerFactory.createContainer(topicName);
listenerContainer.setupMessageListener(this); // our class implements AcknowledgingMessageListener<Object, String>
listenerContainer.getContainerProperties().setGroupId("MyService");
listenerContainer.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
listenerContainer.start();
... we wait until no more inbound messages arrive (for X seconds, in a timeout fashion)
listenerContainer.stop();
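The listener side looks roughly like the sketch below (the counter field is just a placeholder, not our real code); with AckMode.MANUAL we simply skip acknowledge() when we only want to count:
private final java.util.concurrent.atomic.AtomicLong messageCount = new java.util.concurrent.atomic.AtomicLong(); // placeholder counter

@Override
public void onMessage(ConsumerRecord<Object, String> record, Acknowledgment acknowledgment) {
    messageCount.incrementAndGet();      // count-only mode: no acknowledge()
    // acknowledgment.acknowledge();     // called only when we really process the message
}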
The strange thing is that although everything usually works as expected, sometimes / somehow we get into a "state" in which we do not receive any messages anymore when we start the consumer. However, we know for sure there are messages, as our "MyService" consumer group has an offset lag there (we have Prometheus+Grafana metrics from Kafka for that, so we can see it).
What makes the above even more weird:
This "state" can self-heal after a while without doing anything, so everything returns to normal. However, how long this state exists varies a lot! Sometimes just 1-2 hours other times might take days (and sometimes weeks).
We have DEV, STAGE, PROD environment - PROD is more powerful regarding Kafka side. The above weird behavior we experience in DEV and STAGE, but PROD not (just happened once during 6 months, for 2 hours then gone).
What we can also add to the above:
The retention period does not play a role here. This state can happen even with messages that are only a few hours old on the topics.
When this weird state happens, restarting the JVM (so the Java app) does not help.
We can break out of this state with a small trick: while the consumer is active, we send a message to the topic. Then Kafka somehow recovers from this state and suddenly starts to deliver all the messages not yet processed by the "MyService" consumer group.
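The "trick" is literally just producing any message to the topic while the stuck consumer is running, e.g. via a KafkaTemplate (a sketch; the template bean and the payload are placeholders):
@Autowired
KafkaTemplate<String, String> kafkaTemplate;

void nudge(String topicName) {
    kafkaTemplate.send(topicName, "nudge");  // any message seems to wake the consumer up so it catches up
}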
Any ideas appreciated!
Related
DISCLAIMER: This question is specifically about the Erlang/OTP Kafka Client library Brod (No Tag available yet)
I am trying to establish three consumer groups: one should just write messages plainly to the console, another should update a state-representing API with certain messages, and the third should store every message into a long-term database (Crate). I use a supervisor to start three corresponding brod_group_subscriber_v2 processes (see this GIST). If I first start three brod (Kafka) clients and attach each group subscriber to its own client, everything works perfectly: offsets are committed to Kafka for every group and reads start from the latest committed offset.
If I use only one client (as should be possible and is intended according to the Brod docs and issues, see Reference by zmstone), only the last group in my CHILD_SPEC works; the other two do not receive handle_message calls.
At the moment, starting a client for every group is not an issue for me, as there are only three. But as our project grows, we plan to establish a number of additional consumer groups, and I don't really think it is a good idea to run 20 to 30 brod clients and block resources for each of them.
I am building the following Kafka Streams topology (pseudo code):
gK = builder.stream().groupByKey();
g1 = gK.windowedBy(TimeWindows.of("PT1H")).reduce().mapValues().toStream().mapValues().selectKey();
g2 = gK.reduce().mapValues();
g1.leftJoin(g2).to();
As you may notice, this is a rhombus-like topology that starts at a single input topic and ends in a single output topic, with messages flowing through two parallel flows that eventually get joined together at the end. One flow applies (tumbling?) windowing, the other does not. Both parts of the flow work on the same key (apart from the WindowedKey intermediately introduced by the windowing).
The timestamps for my messages are event-time. That is, they get picked from the message body by my custom configured TimestampExtractor implementation, along the lines of the sketch below. The actual timestamps in my messages go several years into the past.
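For illustration, a minimal sketch of such an extractor (the payload type and its timestamp field are assumptions, not my actual classes):
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

public class BodyTimestampExtractor implements TimestampExtractor {

    // hypothetical payload type carrying the event time
    public interface TimedEvent {
        long eventTimeMillis();
    }

    @Override
    public long extract(ConsumerRecord<Object, Object> record, long partitionTime) {
        Object value = record.value();
        if (value instanceof TimedEvent) {
            return ((TimedEvent) value).eventTimeMillis();  // event-time taken from the message body
        }
        return partitionTime;  // fall back for unexpected payloads
    }
}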
That all works well at first sight, both in my unit tests with a couple of input/output messages and in the runtime environment (with real Kafka).
The problem seems to appear when the number of messages becomes significant (e.g. 40K).
My failing scenario is the following:
1. ~40K records with the same key get uploaded into the input topic first
2. ~40K updates come out of the output topic, as expected
3. another ~40K records, all sharing a single key that is different from the key in step 1), get uploaded into the input topic
4. only ~100 updates come out of the output topic, instead of the expected new ~40K updates. There is nothing special about those ~100 updates; their contents seem to be right, but only for certain time windows. For other time windows there are no updates, even though the flow logic and input data should definitely generate 40K records. In fact, when I exchange the datasets in steps 1) and 3), I get exactly the same situation: ~40K updates coming from the second dataset and the same ~100 from the first.
I can easily reproduce this issue in the unit tests using TopologyTestDriver locally (but only on bigger numbers of input records).
In my tests, I've tried disabling caching with StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG. Unfortunately, that didn't make any difference.
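For reference, the caching was disabled roughly like this in the test configuration (a sketch; props stands for my streams configuration properties):
Properties props = new Properties();
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);  // turn off record caches entirely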
UPDATE
I tried both reduce() calls and aggregate() calls instead. The issue persists in both cases.
What I'm also noticing is that, with StreamsConfig.TOPOLOGY_OPTIMIZATION set to StreamsConfig.OPTIMIZE as well as without it, the mapValues() handler gets called in the debugger before the preceding reduce() (or aggregate()) handlers, at least the first time. I didn't expect that.
Tried both join() and leftJoin(); unfortunately, same result.
In the debugger, the second portion of the data doesn't trigger the reduce() handler in the "left" flow at all, but does trigger the reduce() handler in the "right" flow.
With my configuration, if the number of records in both datasets is 100 each, the problem doesn't manifest itself and I'm getting the 200 output messages I expect. When I raise the number to 200 in each dataset, I'm getting fewer than the expected 400 messages out.
So, it seems at the moment that something like "old" windows get dropped and the new records for those old windows get ignored by the stream.
There is a window retention setting that can be configured, but with its default value (which I use) I was expecting windows to retain their state and stay active for at least 12 hours (which significantly exceeds the duration of my unit test run).
Tried to amend the left reducer with the following window store config:
Materialized.as(
    Stores.inMemoryWindowStore(
        "rollup-left-reduce",
        Duration.ofDays(5 * 365),
        Duration.ofHours(1), false)
)
still no difference in results.
The same issue persists even with only the single "left" flow, without the "right" flow and without the join(). It seems that the problem is in the window retention settings of my setup. The timestamps (event-time) of my input records span 2 years, and the second dataset starts from the beginning of those 2 years again. This place in Kafka Streams makes sure that the second dataset's records get ignored:
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/InMemoryWindowStore.java#L125
The Kafka Streams version is 2.4.0. I am also using Confluent dependencies version 5.4.0.
My questions are:
What could be the reason for such behaviour?
Did I miss anything in my stream topology?
Is such a topology expected to work at all?
After some debugging time I found the reason for my problem.
My input datasets contain records with timestamps that span 2 years. When I load the first dataset, the "observed" time of my stream gets set to the maximum timestamp from that input dataset.
The upload of the second dataset, which starts with records whose timestamps are 2 years before the new observed time, causes the stream to internally drop those messages. This can be seen if you set the Kafka logging to TRACE level.
So, to fix my problem I had to configure the retention and grace period for my windows:
instead of
.windowedBy(TimeWindows.of(windowSize))
I have to specify
.windowedBy(TimeWindows.of(windowSize).grace(Duration.ofDays(5 * 365)))
Also, I had to explicitly configure reducer storage settings as:
Materialized.as(
    Stores.inMemoryWindowStore(
        "rollup-left-reduce",
        Duration.ofDays(5 * 365),
        windowSize, false)
)
That's it, the output is as expected.
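For completeness, a minimal sketch of how the left windowed branch ends up looking with both fixes applied (the builder, topic name, key/value types and the reducer body are placeholders, not my actual code):
Duration windowSize = Duration.ofHours(1);
Duration grace = Duration.ofDays(5 * 365);

KTable<Windowed<String>, String> leftReduced = builder
    .<String, String>stream("input-topic")
    .groupByKey()
    .windowedBy(TimeWindows.of(windowSize).grace(grace))     // keep windows open for very old event-times
    .reduce(
        (oldValue, newValue) -> newValue,                    // placeholder reducer
        Materialized.as(
            Stores.inMemoryWindowStore(
                "rollup-left-reduce",
                Duration.ofDays(5 * 365),                    // store retention matching the grace period
                windowSize,
                false)));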
I was trying to leverage some enhancements to Kafka Connect in the 2.0.0 release, as specified by this KIP https://cwiki.apache.org/confluence/display/KAFKA/KIP-298%3A+Error+Handling+in+Connect, and I came across this good blog post by Robin https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues.
Here are my questions:
I have set errors.tolerance=all in my connector config. If I understand correctly, it will not fail on bad records and will move forward. Is my understanding correct?
In my case, the consumer doesn't fail and stays in the RUNNING state (which is expected), but the consumer offsets don't move forward for the partitions with the bad records. Any guess why this may be happening?
I have set errors.log.include.messages and errors.log.enable to true for my connector, but I don't see any additional logging for the failed records. The logs are similar to what I used to see before enabling these properties. I didn't see any message like this: https://github.com/apache/kafka/blob/5a95c2e1cd555d5f3ec148cc7c765d1bb7d716f9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/errors/LogReporter.java#L67
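For reference, the relevant part of my connector configuration looks roughly like this (the connector name, class and topic are placeholders):
name=my-sink-connector
connector.class=com.example.MySinkConnector
tasks.max=1
topics=my-topic
errors.tolerance=all
errors.log.enable=true
errors.log.include.messages=true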
Some Context:
In my connector, I do some transformations and validations for every record, and if any of these fail, I throw a RetriableException. Earlier I was throwing a RuntimeException, but I changed to RetriableException after reading the comments on the RetryWithToleranceOperator class.
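Roughly, the validation path looks like the sketch below (the actual check is just a placeholder):
// import org.apache.kafka.connect.errors.RetriableException;
private void validate(SinkRecord record) {
    if (record.value() == null) {  // placeholder validation
        throw new RetriableException("Validation failed for record at offset " + record.kafkaOffset());
    }
}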
I have tried to keep it brief but let me know if any additional context is required.
Thanks so much in advance!
Looking for the best approach for designing my Kafka consumer. Basically, I would like to know what is the best way to avoid data loss in case there are any exceptions/errors during message processing.
My use case is as below.
a) The reason why I am using a SERVICE to process the message is that in the future I am planning to write an ERROR PROCESSOR application which would run at the end of the day and try to process the failed messages again (not all messages, but messages which fail because of any dependencies, like a missing parent).
b) I want to make sure there is zero message loss, so I will save the message to a file in case there are any issues while saving the message to the DB.
c) In a production environment there can be multiple instances of the consumer and services running, so there is a high chance that multiple applications try to write to the same file.
Q-1) Is writing to a file the only option to avoid data loss?
Q-2) If it is the only option, how do I make sure multiple applications can write to the same file and read from it at the same time? Please consider that in the future, once the error processor is built, it might be reading messages from the same file while another application is trying to write to it.
ERROR PROCESSOR - Our source follows an event-driven mechanism, and there is a high chance that sometimes the dependent event (for example, the parent entity of something) gets delayed by a couple of days. In that case, I want my ERROR PROCESSOR to be able to process the same messages multiple times.
I've run into something similar before. So, diving straight into your questions:
Not necessarily; you could perhaps send those messages back to Kafka into a new topic (let's say error-topic). Then, when your error processor is ready, it could just listen to this error-topic and consume those messages as they come in.
I think this question has been addressed in the response to the first one. So, instead of using a file to write to and read from, and opening multiple file handles to do this concurrently, Kafka might be a better choice as it is designed for such problems.
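A rough sketch of that idea, with placeholder topic names, types and business logic:
// On a processing failure, forward the record to an error topic instead of writing it to a shared file.
void process(ConsumerRecord<String, String> record, KafkaProducer<String, String> producer) {
    try {
        handleBusinessLogic(record.value());   // placeholder for the actual service call
    } catch (Exception e) {
        producer.send(new ProducerRecord<>("error-topic", record.key(), record.value()));
    }
}

void handleBusinessLogic(String payload) { /* placeholder */ }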
Note: The following point is just some food for thought based on my limited understanding of your problem domain. So, you may just choose to ignore this safely.
One more point worth considering in your design of the service component: you might as well consider merging points 4 and 5 by sending all the error messages back to Kafka. That would enable you to process all error messages in a consistent way, as opposed to putting some messages in the error DB and some in Kafka.
EDIT: Based on the additional information on the ERROR PROCESSOR requirement, here's a diagrammatic representation of the solution design.
I've deliberately kept the output of the ERROR PROCESSOR abstract for now just to keep it generic.
I hope this helps!
If you don't commit the consumed message's offset before writing to the database, then nothing would be lost as long as Kafka retains the message. The tradeoff is that if the consumer did write to the database but the Kafka offset commit fails or times out, you'd end up consuming the records again and potentially have duplicates being processed in your service.
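A minimal sketch of that commit-after-write pattern (the consumer setup and the DB call are placeholders, and enable.auto.commit is assumed to be false):
ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
for (ConsumerRecord<String, String> record : records) {
    writeToDatabase(record);   // placeholder for the actual DB write
}
consumer.commitSync();         // if this commit fails or times out, the records may be consumed again (duplicates)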
Even if you did write to a file, you wouldn't be guaranteed ordering unless you opened a file per partition and ensured all consumers only ran on a single machine (because you're preserving state there, which isn't fault-tolerant). Deduplication would still need to be handled as well.
Also, rather than writing your own consumer that writes to a database, you could look into the Kafka Connect framework. For validating messages, you could similarly deploy a Kafka Streams application that filters bad messages out of the input topic into a separate topic that is then sent to the DB.
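For example, a small Kafka Streams filter in front of the Connect sink could look roughly like this (topic names and the validation check are placeholders):
StreamsBuilder builder = new StreamsBuilder();
builder.<String, String>stream("input-topic")
       .filter((key, value) -> value != null && !value.isEmpty())  // placeholder validation
       .to("validated-topic");                                     // Kafka Connect then sinks validated-topic into the DB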
We're using the latest master build (at the time of this writing: https://github.com/linkedin/Burrow/commit/12e681a3a8a61f84f17677996dc3e6a2b79fac41).
Our Kafka brokers are running 1.1.0.
We switched recently from https://github.com/Morningstar/kafka-offset-monitor to Burrow, because we're adding authorization to our clusters.
Now, most of our consumer lags are 0 most of the time according to Burrow, whereas on kafka-offset-monitor they were around 1K - 100K most of the time - both are OK from our point of view.
For reasons unknown to us, the consumer lag "jumps", e.g. from 0 to 1.4 billion(!), from one minute to the next, and back again after another minute. We have about 20 consumers on our main topic, and all of their lags jump - but by different amounts. Some "only" jump from 1K to 1M, others from 0 to the billions described above.
Is anybody else seeing this?
Is there a known reason, or do we have to adjust our config? We didn't change anything about the default config for the evaluation or notifications...
We use https://github.com/rgannu/burrow-graphite to report to graphite, and our alarming system is based on those metrics...
Any help is appreciated!