Does Kafka send the acknowledgement after the data is written to the page cache or to disk?

Many articles tell me that Kafka writes data to the page cache first, which improves write performance.
However, I have a doubt: with acks=-1 and a replication factor of 2, the data at that point exists only in the page cache of both nodes.
If Kafka sends the acknowledgement at this moment and, immediately afterwards, both nodes suffer a power outage or system crash at the same time, then neither node has persisted the data to disk yet.
In this extreme case, can data loss still occur?

Data loss can occur in the situation outlined.
Related reading:
this other answer
Confluent blog post: "Since the log data is not flushed from the page cache to disk synchronously, Kafka relies on replication to multiple broker nodes, in order to provide durability. By default, the broker will not acknowledge the produce request until it has been replicated to other brokers."
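To make that concrete, here is a minimal producer sketch (the broker address and the topic name "events" are placeholders) that asks for acks=all, the equivalent of acks=-1. The completion callback fires once the in-sync replicas have the record, which, per the quote above, means it sits in their page caches rather than being fsynced:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AckedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all (same as acks=-1): the leader waits for the full set of
        // in-sync replicas before acknowledging. The ack means "replicated", not "fsynced".
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "key", "value"), (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    // At this point the record is in the page cache of every in-sync replica;
                    // durability comes from replication, not from a synchronous flush.
                    System.out.printf("acked at %s-%d@%d%n",
                            metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        }
    }
}
```

So the scenario in the question (all replicas crash before the OS flushes) remains the residual risk the replication model accepts in exchange for throughput.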

Related

Artemis - Messages Sync Between Memory & Journal

From reading the Artemis docs I understand that Artemis stores all currently active messages in memory and can offload messages to a paging area for a given queue/topic according to the settings, and that Artemis journals are append-only.
With respect to this:
How and when does the broker sync messages to and from the journal (only during restart?)
How does it identify the message to be deleted from the journal? (For example: if the journal is append-only and a consumer ACKs a persistent message, how does the broker remove that single message from the journal without keeping an index?)
Isn't it a performance hit to keep every active message in memory, and couldn't it even make the broker run out of memory? To avoid this, paging settings have to be configured for every queue/topic, otherwise the broker may fill up with messages. Please correct me if I'm wrong.
Any reference link that explains message syncing and these details would be helpful. The Artemis docs cover the append-only mode, but perhaps there is a section/article explaining these storage concepts that I am missing.
By default, a durable message is persisted to disk after the broker receives it and before the broker sends a response back to the client that the message was received. In this way, if the client receives the response back from the broker, it can know for sure that the durable message it sent was received and persisted to disk.
When using the NIO journal-type in broker.xml (i.e. the default configuration), data is synced to disk using java.nio.channels.FileChannel.force(boolean).
Since the journal is append-only during normal operation then when a message is acknowledged it is not actually deleted from the journal. The broker simply appends a delete record to the journal for that particular message. The message will then be physically removed from the journal later during "compaction". This process is controlled by the journal-compact-min-files & journal-compact-percentage parameters in broker.xml. See the documentation for more details on that.
Keeping message data in memory actually improves performance dramatically vs. evicting it from memory and then having to read it back from disk later. As you note, this can lead to memory consumption problems which is why the broker supports paging, blocking, etc. The main thing to keep in mind is that a message broker is not a storage medium like a database. Paging is a palliative measure meant to be used as a last resort to keep the broker functioning. Ideally the broker should be configured to handle the expected load without paging (e.g. acquire more RAM, allocate more heap). In other words, message production and message consumption should be balanced. The broker is designed for messages to flow through it. It can certainly buffer messages (potentially millions depending on the configuration & hardware) but when it's forced to page the performance will drop substantially simply because disk is orders of magnitude slower than RAM.
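For reference, the parameters mentioned above live in broker.xml. A minimal fragment, assuming the default NIO journal and using purely illustrative values, might look like this:

```xml
<core xmlns="urn:activemq:core">
   <!-- default journal type; synced to disk via java.nio.channels.FileChannel.force(boolean) -->
   <journal-type>NIO</journal-type>

   <!-- compaction runs once at least 10 journal files exist and
        less than 30% of their records are still live -->
   <journal-compact-min-files>10</journal-compact-min-files>
   <journal-compact-percentage>30</journal-compact-percentage>

   <address-settings>
      <address-setting match="#">
         <!-- start paging to disk once an address holds roughly 100 MB of messages -->
         <max-size-bytes>104857600</max-size-bytes>
         <address-full-policy>PAGE</address-full-policy>
      </address-setting>
   </address-settings>
</core>
```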

Can a message loss occur in Kafka even if producer gets acknowledgement for it?

Kafka doc says:
Kafka relies heavily on the filesystem for storing and caching messages.
A modern operating system provides read-ahead and write-behind techniques that prefetch data in large block multiples and group smaller logical writes into large physical writes.
Modern operating systems have become increasingly aggressive in their use of main memory for disk caching. A modern OS will happily divert all free memory to disk caching with little performance penalty when the memory is reclaimed. All disk reads and writes will go through this unified cache
...rather than maintain as much as possible in-memory and flush it all out to the filesystem in a panic when we run out of space, we invert that. All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's pagecache.
Further this article says:
(3) a message is ‘committed’ when all in sync replicas have applied it to their log, and (4) any committed message will not be lost, as long as at least one in sync replica is alive.
So even if I configure the producer with acks=all (which causes the producer to receive the acknowledgement only after all brokers commit the message) and the producer receives the acknowledgement for a certain message, does that mean there is still a possibility that the message can get lost, especially if all brokers go down and the OS never flushes the committed message from cache to disk?
With acks=all and if the replication factor of the topic is > 1, it's still possible to lose acknowledged messages but pretty unlikely.
For example, if you have 3 replicas (all in sync), with acks=all you would need to lose all 3 brokers at the same time, before any of them had time to do the actual write to disk. With acks=all, the acknowledgement is sent once all in-sync replicas have received the message, and you can ensure this number stays high with min.insync.replicas=2, for example.
You can reduce the possibility of this scenario even further if you use the rack awareness feature (and obviously brokers are physically in different racks or even better data centers).
To summarize, by using all these options you can reduce the likelihood of losing data enough that it is unlikely to ever happen.
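As a sketch of how those settings fit together (the broker address, topic name, and partition count below are made up), creating the topic with 3 replicas and min.insync.replicas=2 via the AdminClient ensures an acks=all produce is only acknowledged once at least 2 in-sync replicas have the record:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class DurableTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical cluster

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 replicas, and refuse writes from acks=all producers unless
            // at least 2 replicas are currently in sync.
            NewTopic topic = new NewTopic("orders", 6, (short) 3) // name and partition count are illustrative
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```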

What makes Kafka high in throughput?

Most articles depict Kafka as having better read/write throughput than other message brokers (MBs) like ActiveMQ. Per my understanding, reading/writing with the help of offsets makes it faster, but I am not clear how offsets make it faster.
After reading about the Kafka architecture I have gained some understanding, but it is still not clear what makes Kafka scalable and high in throughput, based on the points below:
Probably with the offset, the client knows exactly which message it needs to read, which may be one of the factors behind the high performance.
In the case of other MBs, the broker needs to coordinate among consumers so that a message is delivered to only one consumer. But this is the case for queues only, not for topics. So what makes a Kafka topic faster than another MB's topic?
Kafka provides partitioning for scalability, but other message brokers like ActiveMQ also provide clustering. So how is Kafka better for big data/high loads?
In other MBs we can have listeners, so as soon as a message arrives the broker delivers it. In the case of Kafka, consumers need to poll; does that mean more load on both the broker and the client side?
Lots of details on what makes Kafka different from and faster than other messaging systems are in Jay Kreps' blog post here:
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
There are actually a lot of differences that make Kafka perform well including but not limited to:
Maximized use of sequential disk reads and writes
Zero-copy processing of messages
Use of Linux OS page cache rather than Java heap for caching
Partitioning of topics across multiple brokers in a cluster
Smart client libraries that offload certain functions from the brokers
Batching of multiple published messages to yield less frequent network round trips to the broker (see the producer sketch after this list)
Support for multiple in-flight messages
Prefetching data into client buffers for faster subsequent requests.
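To illustrate the batching and in-flight-request points (broker address and topic name are placeholders), a producer configured roughly like the sketch below trades a few milliseconds of latency for far fewer, larger, compressed requests:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BatchingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Batching: accumulate up to 64 KB per partition, or wait up to 20 ms,
        // before sending one request instead of many small ones.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);
        props.put(ProducerConfig.LINGER_MS_CONFIG, 20);
        // Compress whole batches to cut network and disk I/O further.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        // Allow several unacknowledged requests in flight per connection.
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100_000; i++) {
                producer.send(new ProducerRecord<>("events", Integer.toString(i), "payload-" + i));
            }
        } // close() flushes any batch still sitting in the accumulator
    }
}
```

The numbers above are only illustrative; the point is that batching, compression, and pipelining are producer-side knobs rather than anything the application code has to implement itself.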
It's largely marketing that Kafka is fast for a message broker. For example, IBM MessageSight appliances did 13M msgs/sec with microsecond latency in 2013. On one machine. A year before Kreps even started the GitHub:
https://www.zdnet.com/article/ibm-launches-messagesight-appliance-aimed-at-m2m/
Kafka is good for a lot of things. True low latency messaging is not one of them. You flatly can't use batch delivery (e.g. a range of offsets) in any pure latency-centric environment. When an event arrives, delivery must be attempted immediately if you want the lowest latency. That doesn't mean waiting around for a couple seconds to batch read a block of events or enduring the overhead of requesting every message. Try using Kafka with an offset range of 1 (so: 1 message) if you want to compare it to a normal push-based broker and you'll see what I mean.
Instead, I recommend focusing on the thing pull-based stream buffering does give you:
Replayability!!!
Personally, I think this makes downstream data engineering systems a bit easier to build in the face of failure, particularly since you don't have to rely on their built-in replication models (if they even have one). For example, it's very easy for me to consume messages, lose the disks, restore the machine, and replay the lost data. The data streams become the single source of truth against which other systems can synchronize and this is exceptionally useful!!!
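A rough sketch of that replay workflow (topic, partition, and group id below are placeholders): after rebuilding a failed downstream system, a consumer simply seeks back and re-reads the retained log:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "rebuild-after-disk-loss");  // hypothetical group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("events", 0);
            consumer.assign(Collections.singletonList(tp));
            // Rewind to the start of the retained log (or to a known-good offset)
            // and rebuild downstream state from the stream itself.
            consumer.seekToBeginning(Collections.singletonList(tp));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("replaying offset %d: %s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```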
There's no free lunch in messaging, pull and push each have their advantages and disadvantages vs. each other. It might not surprise you that people have also tried push-pull messaging and it's no free lunch either :).

Is it possible to avoid acknowledged message get lost in kafka?

We are considering using Kafka as critical messaging middleware.
But it looks like the message durability guarantee is optimistic in Kafka's replication design:
For better performance, each follower sends an acknowledgment after the message is written to memory. So, for each committed message, we guarantee that the message is stored in multiple replicas in memory. However, there is no guarantee that any replica has persisted the commit message to disks though.
In the worst case, if the whole cluster goes down at the same time before acknowledged messages are flushed to disk, some data may be lost. Is it possible to avoid this case?
There are multiple configurations for adjusting the frequency of log flushes. You can tune how often the flush scheduler thread checks whether a flush is necessary (log.flush.scheduler.interval.ms), and you can decrease the number of messages needed to trigger a flush (log.flush.interval.messages).
Although you'd never need to worry about this case if you're able to replicate across different data centers.
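The per-topic counterparts of those broker settings are flush.messages and flush.ms. A hedged sketch of forcing per-message fsync on a single topic through the AdminClient (cluster address and topic name are made up) would look like this:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class ForceFlushTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical cluster

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "payments"); // hypothetical topic
            Collection<AlterConfigOp> ops = List.of(
                    // fsync after every single message...
                    new AlterConfigOp(new ConfigEntry("flush.messages", "1"), AlterConfigOp.OpType.SET),
                    // ...and at least once per second regardless of traffic.
                    new AlterConfigOp(new ConfigEntry("flush.ms", "1000"), AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Collections.singletonMap(topic, ops)).all().get();
        }
    }
}
```

Forcing a disk sync on every append costs a lot of throughput, so this is normally reserved for low-volume, high-value topics rather than used cluster-wide.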
I don't think it is possible to guarantee that an acknowledged message won't get lost. However, we can reduce the probability of loss by taking certain measures, listed below:
Increase the replication factor for the topic
In Producer code set the config acks=all
Keep min.insync.replicas high
For example, by using a replication factor of 5, min.insync.replicas=4, and acks=all, a message will not be acknowledged until at least 4 replicas have received it (not necessarily persisted it, though!).
The higher the number, the less probable it is that your message will get lost.

Reliable usage of DirectKafkaAPI

I am planning to develop a reliable streaming application based on the direct Kafka API. I will have one producer and one consumer. I wanted to know what the best approach is to achieve reliability in my consumer. I can employ two solutions:
Increasing the retention time of messages in Kafka
Using write-ahead logs
I am a bit confused regarding the usage of write-ahead logs with the direct Kafka API, as there is no receiver, but the documentation indicates:
"Exactly-once semantics: The first approach uses Kafka’s high level API to store consumed offsets in Zookeeper. This is traditionally the way to consume data from Kafka. While this approach (in combination with write ahead logs) can ensure zero data loss (i.e. at-least once semantics), there is a small chance some records may get consumed twice under some failures. "
So I wanted to know what the best approach is: whether it suffices to increase the TTL (retention) of messages in Kafka, or whether I also have to enable write-ahead logs.
I guess it would be good practice not to rely on just one of the above, since the backup data (retained messages, checkpoint files) can be lost and then recovery could fail.
The direct approach eliminates the data duplication problem because there is no receiver, and hence no need for write-ahead logs. As long as you have sufficient Kafka retention, messages can be recovered from Kafka.
Also, the direct approach supports exactly-once message delivery semantics by default; it does not use ZooKeeper. Offsets are tracked by Spark Streaming within its checkpoints.
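A minimal sketch of the direct approach using the spark-streaming-kafka-0-10 integration (broker address, topic, group id, and checkpoint path are all placeholders):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class DirectStreamSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("direct-kafka-sketch").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));
        // Offsets and driver state are persisted here, not in ZooKeeper.
        jssc.checkpoint("/tmp/direct-kafka-checkpoints"); // hypothetical path

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "direct-sketch");           // hypothetical group
        kafkaParams.put("auto.offset.reset", "earliest");
        kafkaParams.put("enable.auto.commit", false);

        // No receiver, no write-ahead log: each batch reads its offset range straight from
        // Kafka, so a failed task simply re-reads the same range as long as retention allows.
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("events"), kafkaParams));

        stream.foreachRDD(rdd -> System.out.println("records in batch: " + rdd.count()));

        jssc.start();
        jssc.awaitTermination();
    }
}
```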