How does Topic QoS affect a DDS service? - publish-subscribe

Most of the Topic QoS policies are also applicable to DataWriters and DataReaders, for example DURABILITY, DEADLINE, LATENCY_BUDGET, RELIABILITY, etc.
What happens when these QoS policies are set to different values on a Topic and on its DataWriter or DataReader?
Is that considered an incompatible request? Is there a hierarchy between Topic and DataWriter/DataReader QoS policies?

Setting the DataReader/DataWriter QoS differently from the Topic QoS is allowed. The only really significant role of the Topic QoS in DDS is when the durability kind is set to Transient/Persistent (an optional feature). In that case, the middleware ensures the data remains available even when all application processes are stopped (for transient durability, some middleware processes must still be running).
The data set the middleware retains is what a DataReader whose QoS was copied from the Topic QoS would contain. When copying the QoS for this purpose, the history kind/depth and resource-limits settings are taken from the DurabilityService setting, and the mechanism also includes some automatic cleanup of disposed data (the "service cleanup delay" part of the DurabilityService setting).
Therefore, if a DataWriter and a Topic both have the durability kind set to transient/persistent, you can get incompatible-QoS notifications if the two QoS settings are otherwise incompatible.
My advice is always to think carefully about the QoS when you're thinking about the topics you need, then set the Topic QoS accordingly. Almost always, you can then let the DataReaders and the DataWriters inherit from the Topic QoS.

Related

Kafka Streams - disable internal topic creation

I work in an organization where we must use the shared Kafka cluster.
Due to internal company policy, the account we use for authentication has only read/write permissions assigned.
We are not able to request the topic-create permission.
To create the topic we need to follow the onboarding procedure and know the topic name upfront.
As we know, Kafka Streams creates internal topics to persist the stream's state.
Is there a way to disable the fault tolerance and keep the stream state in memory or persist it to the file system?
Thank you in advance.
This entirely depends on how you write the topology. For example, stateless DSL operators such as map/filter/forEach don't create any internal topics.
If you actually need to do aggregation and build state stores, then you really shouldn't disable topics. Yes, state stores are held either in memory or as RocksDB on disk, but they're still backed by internal topics so they can actually be distributed, or rebuilt in case of failure.
If you want to prevent them, I think you'll need an authorizer class defined on the broker that can restrict topic creation based, at least, on the client-side application.id and client.id regex patterns; there's nothing you can do in the client config.
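To make the first point concrete, here is a minimal Java sketch (assuming the Kafka Streams DSL; the topic names, application id, and bootstrap address are placeholders) contrasting a purely stateless topology with the kind of stateful operation that would require internal topics:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StatelessTopology {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "stateless-app");       // also the prefix of any internal topics
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("input-events");

        // Stateless operators only: no repartition or changelog topics are created.
        events.filter((key, value) -> value != null)
              .mapValues(value -> value.toUpperCase())
              .to("output-events");

        // By contrast, something like
        //   events.groupByKey().count()
        // builds a state store and would need an internal changelog topic
        // (and possibly a repartition topic) for fault tolerance.

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```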

Is it safe to use non-unique Kafka keys for financial events?

We use Kafka to send events containing financial information. We plan to use the account id as the Kafka key so that all events for an account are sent to the same Kafka partition, which ensures ordering of events per account. Events could be lost if compaction were enabled, but our devops team says they carefully review all pull requests for the code-based configuration and will not enable compaction on the topic by accident. Is this safe, or should we use a unique id for every event and implement custom partitioning?
As you said yourself, keys are for ordering. In particular, keys are not related to fault tolerance.
To avoid losing messages, you should do the following:
On the topic level:
Set the replica count to a number that helps you sleep better at night (let's say 3 replicas).
On the producer level:
Set acks to 'all' (the setting name and value may differ depending on the client you are using).
This makes sure a message is considered committed only once all in-sync replicas have acknowledged it.
On the consumer level:
Use an 'at least once' delivery guarantee.
This makes sure you don't lose any data. On the other hand, you might process some messages more than once; if your code is not idempotent, you will have to handle that.
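To ground this in code, here is a minimal producer-side sketch assuming the Java client; the topic name, account id, and bootstrap address are placeholders, and the replication factor of 3 would be set separately when the topic is created:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AccountEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        // Producer-level durability: wait until all in-sync replicas acknowledge the write.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Idempotence avoids duplicates caused by producer retries.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String accountId = "account-42";                                     // illustrative key
            ProducerRecord<String, String> record =
                new ProducerRecord<>("financial-events", accountId, "{\"type\":\"DEPOSIT\",\"amount\":100}");
            // Using the account id as the key routes every event for that account
            // to the same partition, which is what preserves per-account ordering.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();                                 // in real code: alert/retry/park
                }
            });
            producer.flush();
        }
    }
}
```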

How to scale microservices when Kafka is used?

I have built a microservice platform based on Kubernetes, and Kafka is used as the MQ between the services. Now a very confusing question has arisen: Kubernetes is designed to make it easy to scale microservices out, but when the number of instances exceeds the number of Kafka partitions, some microservice instances cannot consume anything. What should I do?
This is a Kafka limitation and has nothing to do with your service scheduler.
Kafka consumer groups simply cannot scale beyond the partition count. So, if you have a single-partition topic because you care about strict event ordering, then only one replica of your service can be actively consuming from the topic, and you'd need to handle failover in ways that are outside the scope of Kafka itself.
If your concern is the k8s autoscaler, then you can look into the KEDA autoscaler for Kafka services.
Kafka, as OneCricketeer notes, bounds the parallelism of consumption by the number of partitions.
If you couple processing with consumption, this limits the number of instances which will be performing work at any given time to the number of partitions to be consumed. Because the Kafka consumer group protocol includes support for reassigning partitions consumed by a crashed (or non-responsive...) consumer to a different consumer in the group, running more instances of the service than there are partitions at least allows for the other instances to be hot spares for fast failover.
It's possible to decouple processing from consumption. The broad outline could be to have every instance of your service join the consumer group; at most as many instances as there are partitions will actually consume from the topic. They can then make a load-balanced network request to another (or the same) instance based on the message they consume to do the processing. If you allow the consumer to have multiple requests in flight, this expands your scaling horizon to max-in-flight-requests * number-of-partitions.
If it happens that the messages in a partition don't need to be processed in order, simple round-robin load-balancing of the requests is sufficient.
Conversely, if it's the case that there are effectively multiple logical streams of messages multiplexed into a given partition (e.g. if messages are keyed by equipment ID; the second message for ID A needs to be processed after the first message, but could be processed in any order relative to messages from ID B), you can still do this, but it needs some care around ensuring ordering. Additionally, given the amount of throughput you should be able to get from a consumer of a single partition, needing to scale out to the point where you have more processing instances than partitions suggests that you'll want to investigate load-balancing approaches where if request B needs to be processed after request A (presumably because request A could affect the result of request B), A and B get routed to the same instance so they can leverage local in-memory state rather than do a read-from-db then write-to-db pas de deux.
This sort of architecture can be implemented in any language, though maintaining a reasonable level of availability and consistency is going to be difficult. There are frameworks and toolkits which can deliver a lot of this functionality: Akka (JVM), Akka.Net, and Protoactor all implement useful primitives in this area (disclaimer: I'm employed by Lightbend, which maintains and provides commercial support for one of those, though I'd have (and actually have) made the same recommendations prior to my employment there).
When consuming messages from Kafka in this style of architecture, you will definitely have to make the choice between at-most-once and at-least-once delivery guarantees and that will drive decisions around when you commit offsets. Note particularly that you need to be careful, if doing at-least-once, to not commit until every message up to that offset has been processed (or discarded), lest you end up with "at-least-zero-times", which isn't a useful guarantee. If doing at-least-once, you may also want to try for effectively-once: at-least-once with idempotent processing.
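As an illustration of that commit discipline, here is a minimal at-least-once consumer sketch using the Java client, assuming a simple single-threaded loop; the group id, topic name, bootstrap address, and processing step are placeholders:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AtLeastOnceWorker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-service");                 // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // Disable auto-commit so offsets are only committed after processing succeeds.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic"));                             // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);   // must finish (or deliberately discard) before committing
                }
                // Committing only after the whole batch is processed gives at-least-once:
                // a crash before this line means the batch is re-delivered, never skipped.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // Placeholder for the real work; make it idempotent if duplicates must be harmless.
        System.out.printf("partition=%d offset=%d value=%s%n",
                record.partition(), record.offset(), record.value());
    }
}
```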

Is it necessary to use transactions explicitly in Kafka Streams to get "effectively once" behaviour?

A Confluent article states:
Stream processing applications written in the Kafka Streams library can turn on exactly-once semantics by simply making a single config change, to set the config named “processing.guarantee” to “exactly_once” (default value is “at_least_once”), with no code change required.
But as transactions are said to be used, I would like to know: Are transactions used implicitly by Kafka Streams, or do I have to use them explicitly?
In other words, do I have to call something like .beginTransaction() and .commitTransaction(), or is all of this really being taken care of under the hood, and all that remains for me to be done is fine-tuning commit.interval.ms and cache.max.bytes.buffering?
Kafka Streams uses the transactions API implicitly to achieve exactly-once semantics, so you do not need to call the transaction methods yourself or set any other configuration.
If you continue reading the blog post, it says:
"More specifically, when processing.guarantee is configured to exactly_once, Kafka Streams sets the internal embedded producer client with a transaction id to enable the idempotence and transactional messaging features, and also sets its consumer client with the read-committed mode to only fetch messages from committed transactions from the upstream producers."
More details can be found in KIP-129: Streams Exactly-Once Semantics
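For illustration, a minimal Kafka Streams configuration sketch showing that single switch, with placeholder application id, topics, and bootstrap address (newer Kafka Streams releases, 2.8 and later, also accept "exactly_once_v2" for this config):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class ExactlyOnceApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "eos-app");               // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // The single switch from the article: no transaction calls in application code.
        // Kafka Streams manages the embedded producer's transactions and puts its
        // consumer into read-committed mode internally.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, "exactly_once");
        // Optional fine-tuning mentioned in the question:
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 100L);

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input").to("output");                                    // placeholder topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```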

RabbitMQ Durability

I'm using RabbitMQ on Docker.
When running RabbitMQ, I want to set the message durability (durable/transient).
Is there any way to set durability, other than when declaring the Queue and Exchange?
Yes, it is possible to specify the delivery-mode message attribute for any published message. However, the target queue must also be durable for a message to be persisted.
See the chapter Message Attributes and Payload in the RabbitMQ documentation:
Messages in the AMQP model have attributes. Some attributes are so common that the AMQP 0-9-1 specification defines them and application developers do not have to think about the exact attribute name. Some examples are:
Content type
Content encoding
Routing key
Delivery mode (persistent or not)
Message priority
Message publishing timestamp
Expiration period
Publisher application id
Simply publishing a message to a durable exchange or the fact that the queue(s) it is routed to are durable doesn't make a message persistent: it all depends on the persistence mode of the message itself. Publishing messages as persistent affects performance (just like with data stores, durability comes at a certain cost in performance).
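To illustrate, here is a minimal sketch using the RabbitMQ Java client; the queue name, broker address, and message body are placeholders:

```java
import java.nio.charset.StandardCharsets;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

public class PersistentPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                                  // placeholder broker address

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            // The queue itself must be durable so it survives a broker restart...
            boolean durable = true;
            channel.queueDeclare("task-queue", durable, false, false, null);

            // ...and the message must be published with delivery-mode = 2 (persistent).
            // MessageProperties.PERSISTENT_TEXT_PLAIN is a ready-made BasicProperties
            // instance with deliveryMode set to 2.
            byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
            channel.basicPublish("", "task-queue",
                    MessageProperties.PERSISTENT_TEXT_PLAIN, body);

            // For a transient message, use MessageProperties.TEXT_PLAIN (deliveryMode = 1)
            // or build your own AMQP.BasicProperties.
        }
    }
}
```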