Will flink resume from the last offset after executing yarn application kill and running again? - apache-kafka

I use FlinkKafkaConsumer to consume kafka and enable checkpoint. Now I'm a little confused on the offset management and checkpoint mechanism.
I have already know flink will start reading partitions from the consumer group’s.
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-start-position-configuration
and the offset will store into checkpoint in remote fileSystem.
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-and-fault-tolerance
What happen if I stop the application by executing the yarn application -kill appid
and run the start command like ./bin flink run ...?
Will flink get the offset from checkpoint or from group-id managed by kafka?

If you run the job again without defining a savepoint ($ bin/flink run -s :savepointPath [:runArgs]) flink will try to get the offsets of your consumer-group from kafka (in older versions from zookeeper). But you will loose all other state of your flink job (which might be ignorable if you have a stateless flink job).
I must admit that this behaviour is quite confusing. By default starting a job without a savepoint is like starting from zero. As far as I know only the implementation of the kafka source differs from that behaviour. If you wanna change that behaviour you can set the setStartFromGroupOffsets of the FlinkKafkaConsumer[08/09/10] to false. This is described here: Kafka Consumers Start Position Configuration
It might be worth having a closer look at the documentation of flink: What is a savepoint and how does it differ from checkpoints.
In a nutshell
Checkpoints:
The primary purpose of Checkpoints is to provide a recovery mechanism in case of unexpected job failures. A Checkpoint’s lifecycle is managed by Flink
Savepoints:
Savepoints are created, owned, and deleted by the user. Their use-case is for planned, manual backup and resume
There are currently ongoing discussions on how to "unify" savepoints and checkpoints. Find a lot of technical details here: Flink improvals 47: Checkpoints vs Savepoints

Related

Apache Flink Streaming Job: deployment patterns

We want to use Apache Flink for the streaming job – read from one Kafka topic and write to another. The infrastructure will be deployed to Kubernetes. I want to restart the job on any PR merge to master branch.
Therefore, I wonder whether Flink guarantees that resubmitting the job will continue the data stream from the last processed message? Because one of the most important job's feature is message deduplication on time window.
What are the patterns of updating streaming jobs for Apache Flink? Should I just stop the old job and submit the new one?
My suggestion would be to simply try it.
Deploy your app manually and then stop it. Run kafka-consumer-groups script to find your consumer group. Then restart/upgrade the app, and run the command again with the same group. If the lag goes down (as it should), rather than resets to the beginning of the topic, then it's working as expected, as it would for any Kafka consumer.
read from one Kafka topic and write to another.
Ideally, Kafka Streams is used for this.
Kafka consumer offsets are saved as part of the checkpoint. So as long as your workflow is running in exactly-once mode, and your Kafka source is properly configured (e.g. you've set a group id), then restarting your job from the last checkpoint or savepoint will guarantee no duplicate records in the destination Kafka topic.
If you're stopping/restarting the job as part of your CI processing, then you'd want to:
Stop with savepoint.
Re-start from the savepoint
You could also set ExecutionCheckpointingOptions.ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH to true (so that a checkpoint is taken when the job is terminated), enable externalized checkpoints, and then restart from the last checkpoint, but the savepoint approach is easier.
And you'd want to have some regular process that removes older savepoints, though the size of the savepoints will be very small (only the consumer offsets, for your simple workflow).

flink with checkpoint doesn't die after kafka disconnection

My flink streaming job (with checkpoint) gets data from a source kafka and print it (very simple job).
If I kill the source kafka,
what I expect : flink job will die (because flink will lose the source)
What happend : flink job doesn't die retrying to connect to the source kafka
I want my flink job is dead after disconnection.
Even I set restart strategy, flink job doesn't die.
How can I kill my flink job after kafka disconnection?
FYI)
My flink version : 1.14.4
Checkpoint setting (in scala)
...
env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE)
env.getCheckpointConfig.setCheckpointTimeout(TimeUnit.HOURS.toMillis(1))
env.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)
env.getCheckpointConfig.setTolerableCheckpointFailureNumber(0)
env.getConfig.setUseSnapshotCompression(true)
...
Thanks in advance!
The checkpointing settings aren't relevant in this case, because they are specifically tied to making checkpoints. So only in case a checkpoint fails, these settings become relevant.
In your use case, there will be some time that passes before Flink can detect that the Kafka brokers are no longer available. Flink will not stop immediately, unless at that point Flink tries to connect to Kafka and fails. In case you don't want Flink to restart in case of a failure, then you'll need to set the restart-strategy to none. See https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/state/task_failure_recovery/#no-restart-strategy

Standby tasks not writing updates to .checkpoint files

I have a Kafka Streams application that is configured to have 1 standby replica created for each task. I have two instances of the application running. When the application starts the application writes .checkpoint files for each of the partitions it is responsible for. It writes these files for partitions owned by both active and standby tasks.
When sending a new Kafka event to be processed by the application, the instance containing that active task for the partition updates the offsets in the .checkpoint file. However, the .checkpoint file for the standby task on the second instance is never updated. It remains at the old offset.
I believe this is causing us to see OffsetOutOfRangeEceptions to be thrown when we rebalance which results in tasks being torn down and created from scratch.
Am I right in thinking that offsets should be written for partitions in both standby and active tasks?
Is this an indication that my standby tasks are not consuming or could it be that it is purely not able to write the offset?
Any ideas what could be causing this behaviour?
Streams version: 2.3.1
This issue has been fixed in Kafka 2.4.0 which resolves the following bug issues.apache.org/jira/browse/KAFKA-8755
Note: The issue looks to only effect applications the are configured OPTIMIZE="all"

Apache Flink - duplicate message processing during job deployments, with ActiveMQ as source

Given,
I have a Flink job that reads from ActiveMQ source & writes to a mysql database - keyed on an identifier. I have enabled checkpoints for this job every one second. I point the checkpoints to a Minio instance, I verified the checkpoints are working with the jobid. I deploy this job is an Openshift (Kubernetes underneath) - I can scale up/down this job as & when required.
Problem
When the job is deployed (rolling) or the job went down due to a bug/error, and if there were any unconsumed messages in ActiveMQ or unacknowledged messages in Flink (but written to the database), when the job recovers (or new job is deployed) the job process already processed messages, resulting in duplicate records inserted in the database.
Question
Shouldn't the checkpoints help the job recover from where it left?
Should I take the checkpoint before I (rolling) deploy new job?
What happens if the job quit with error or cluster failure?
As the jobid keeps changing on every deployment, how does the recovery happens?
Edit As I cannot expect idempotency from the database, to avoid duplicates saved into the database (Exactly-Once), can I write database specific (upsert) query to update if the given record is present & insert if not?
JDBC currently only supports at least once, meaning you get duplicate messages upon recovery. There is currently a draft to add support for exactly once, which would probably be released with 1.11.
Shouldn't the checkpoints help the job recover from where it left?
Yes, but the time between last successful checkpoints and recovery could produce the observed duplicates. I gave a more detailed answer on a somewhat related topic.
Should I take the checkpoint before I (rolling) deploy new job?
Absolutely. You should actually use cancel with savepoint. That is the only reliable way to change the topology. Additionally, cancel with savepoints avoids any duplicates in the data as it gracefully shuts down the job.
What happens if the job quit with error or cluster failure?
It should automatically restart (depending on your restart settings). It would use the latest checkpoint for recovery. That would most certainly result in duplicates.
As the jobid keeps changing on every deployment, how does the recovery happens?
You usually point explicitly to the same checkpoint directory (on S3?).
As I cannot expect idempotency from the database, is upsert the only way to achieve Exactly-Once processing?
Currently, I do not see a way around it. It should change with 1.11.

how offsets kept in savepoint

We are using kafka as source of our pipeline. I want to take the existing state from production environment to a new environment. My question is what happens with the offset in the new environment ? since we took the savepoint from production and the offsets are kept in the savepoint does it mean that in the new environment the job will start consuming messages with the offsets from production or it will actually starts with a new ones like as new consumer ?
The offsets in the new job will start from those stored in the savepoint, provided you restart the new job from the savepoint, like this:
$ bin/flink run -s :savepointPath [:runArgs]
Relevant docs include the last paragraph of this section about Kafka Consumers Start Position Configuration, which states
Note that these start position configuration methods do not affect the start position when the job is automatically restored from a failure or manually restored using a savepoint. On restore, the start position of each Kafka partition is determined by the offsets stored in the savepoint or checkpoint ...
as well as this section about Resuming from Savepoints.