The DDS specification says that the default value of the Reliability QoS is RELIABLE for a DataWriter and BEST_EFFORT for a DataReader. When a DataWriter and a DataReader are created with default QoS values, how can reliable communication be performed?
Does the ack/nack mechanism work, even though the DataReader is configured as BEST_EFFORT?
Why are the default Reliability values different for DataWriter and DataReader?
When a DataWriter and a DataReader are created with default QoS values, how can reliable communication be performed? Does the ack/nack mechanism work, even though the DataReader is configured as BEST_EFFORT?
No, if you are applying default QoS settings to both the DataWriter and the DataReader, there will be no effort made to repair any lost packets and no ack/nack mechanism kicks in. Even though the DataWriter is able to provide reliable communication by default, it won't if the DataReader does not ask for it.
The only way to get reliable communication going would be to modify the DataReader QoS to use RELIABLE reliability.
Why are the default Reliability values different for DataWriter and DataReader?
That is what the DDS Specification prescribes, but it does not explain the rationale behind that.
I suspect the thought may have been that this particular selection of reliability policies tends to cover the largest number of use cases with the fewest configuration changes. Best-effort data flows are common in the types of systems that DDS is used for. With all policies set to default, there will be no QoS mismatches between DataWriters and DataReaders, and best effort is indeed what is used. There is no (noticeable) overhead associated with that. If a DataReader does require guaranteed delivery, it just has to select the RELIABLE policy, and the DataWriter will engage in the reliability protocol with that DataReader.
That said, for all QoS policies that are Request-Offered (RxO), as Reliability is, I consider it good practice to set them explicitly and to document why that particular combination of policies was selected for each dataflow.
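To make the Request-Offered matching concrete, here is a small plain-Python sketch (not any vendor's DDS API, just the matching rule) showing which combinations are compatible and what ends up being used on the wire:

    # Plain-Python sketch of the Request-Offered (RxO) rule for the Reliability QoS.
    # This is not a DDS API; it just encodes the matching behaviour described above.

    BEST_EFFORT, RELIABLE = 0, 1  # ordered: RELIABLE is the "stronger" kind

    def reliability_match(offered_by_writer, requested_by_reader):
        """Return the reliability actually used, or None if the QoS is incompatible."""
        if requested_by_reader > offered_by_writer:
            return None                 # reader asks for more than the writer offers
        return requested_by_reader      # the reader's request decides what is used

    # Defaults: writer offers RELIABLE, reader requests BEST_EFFORT -> best effort, no acks.
    assert reliability_match(RELIABLE, BEST_EFFORT) == BEST_EFFORT
    # Reader explicitly requests RELIABLE -> the reliability protocol kicks in.
    assert reliability_match(RELIABLE, RELIABLE) == RELIABLE
    # Writer set to BEST_EFFORT but reader wants RELIABLE -> incompatible, no data flows.
    assert reliability_match(BEST_EFFORT, RELIABLE) is None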
We are building an event sourced system at my company, relying on Kafka.
In order to be GDPR compliant, we need to be able to update the events.
Our idea is to use the compaction and tombstone capabilities.
This means that we cannot use the default partitioning strategy, as we want each message to have a unique key (in order to overwrite a specific message), but we still want events occurring on the same aggregate to end up on the same partition.
Which brings us to the creation of a custom partitioner (basically copying the "hash modulo" logic of the default partitioner, but using a different value than the message key to compute the hash).
The issue is that we're evolving in a polyglot environment (we have PHP, Python, and Java/Kotlin services publishing and consuming events).
We want to ensure that all these services will produce messages to the same partition given a specific partition key (in case different services will publish events to the same topic).
Our main idea was to use a common hashing algorithm, but it is hard to find one with both a strong distribution guarantee and good stability (not just part of an experimental lib).
PHP natively supports a wide range of hashing algorithms, but the same support is harder to find in the other languages.
As Kafka's default partitioner relies on murmur2, we started looking in that direction as well. Unfortunately, it is not natively supported by PHP (although some implementations exist). Furthermore, this algorithm uses a seed, which means that we would need to use the exact same seed in all our publisher services, which is starting to make the approach look quite complex.
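For concreteness, here is roughly what we have in mind in Python with kafka-python, computing the partition in application code from the aggregate id and passing it explicitly to send() instead of plugging in a partitioner class. MD5 is only a stand-in for whatever common hash we would settle on, and the helper names are placeholders:

    import hashlib

    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    def partition_for(aggregate_id, num_partitions):
        # Stable "hash modulo" on the aggregate id rather than the (unique) message key.
        # MD5 is used here only because every language in our stack ships a stable
        # implementation; any well-distributed, stable hash would do, as long as
        # every producer uses the same one.
        digest = hashlib.md5(aggregate_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % num_partitions

    def publish(topic, aggregate_id, event_key, event_value):
        partitions = producer.partitions_for(topic)  # set of partition ids for the topic
        partition = partition_for(aggregate_id, len(partitions))
        # The record keeps its unique key (needed for compaction/tombstones), but the
        # partition is derived from the aggregate id, so all events of one aggregate
        # land on the same partition.
        producer.send(topic, key=event_key, value=event_value, partition=partition)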
However, we could be looking at the design from the wrong angle. Sharing event store write capabilities across polyglot services might not be a good idea, and each service could have its own partitioning logic as long as it ensures the "one partition per aggregate" requirement. The thing is that we have to think this through ahead of time, because no technical safeguard will prevent a future service from publishing on a "shared" event stream (and not using the exact same partitioning logic will have a huge impact when that happens).
Does anyone have experience with building an event store with Kafka in a polyglot environment who could shed some light on this specific topic, please?
In reactive programming, resilience is achieved by replication, containment, isolation, and delegation.
Two of the well-known design patterns are bulkheads with a supervisor, and circuit breakers. Are these only for achieving isolation and containment?
What are the most well-known design patterns for microservices, especially the ones that provide resiliency?
Reactive programming cannot be reduced to just design patterns. There are many considerations about system architecture, DevOps, and so on to keep in mind when you are designing high-performance, high-availability systems.
Specifically, about resiliency, you should be thinking, for example, in:
Containerization
Services Orchestration
Fault Tolerant Jobs
Pub/Sub Model
And looots of other things :)
Other than Bulkhead and Circuit Breakers, a few other things can be implemented:
Retry Pattern on idempotent operations. This requires that the operation being retried is idempotent and produces the same result on repeated execution (see the sketch after this list).
Proper timeout configurations, such as connection and command timeouts, wherever there is a network dependency
Bounded request queues at the virtual host/listener level
A failover strategy, such as serving from a cache
Redundancy and failover systems can be incorporated to achieve resiliency against system failures as well
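To illustrate the retry item above, a minimal Python sketch (the decorator and its parameters are only an example, not any particular library):

    import functools
    import time

    def retry(attempts=3, base_delay=0.2, exceptions=(Exception,)):
        """Retry a call that may fail transiently. Only safe for idempotent operations."""
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                for attempt in range(1, attempts + 1):
                    try:
                        return fn(*args, **kwargs)
                    except exceptions:
                        if attempt == attempts:
                            raise                                  # out of attempts: propagate
                        time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
            return wrapper
        return decorator

    # Usage: the wrapped call must be idempotent, e.g. a GET or an upsert keyed by id.
    # @retry(attempts=5, exceptions=(ConnectionError, TimeoutError))
    # def fetch_profile(user_id): ...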
You can implement various resilience patterns to achieve different levels of resilience based on your needs.
Unit Isolation – split systems into parts and isolate the parts against each other. The entire system must never fail.
Shed Load – implement a rate limiter, which sheds any extra load an application can't handle, to ensure that an application is resilient to spikes in the number of requests. Any request that is processed by an application consumes resources like CPU, memory, IO, and so on. If requests come at a rate that exceeds an application's available resources, the app may become unresponsive, behave inconsistently, or crash.
Retry – enable an application to handle transient failures when it tries to connect to a service or network resource, by transparently retrying a failed operation.
Timeout – wait for a predetermined length of time and take alternative action if that time is exceeded.
Circuit Breaker – when connecting to a remote service or resource, handle faults that might take a variable amount of time to recover from (a minimal sketch follows this list).
Bounded Queue – limit request queue sizes in front of heavily used resources.
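As an illustration of the Circuit Breaker entry, a deliberately minimal Python sketch (illustrative only; real implementations add thread safety, metrics, and richer half-open handling):

    import time

    class CircuitBreaker:
        """Opens after `max_failures` consecutive errors, fails fast while open,
        and lets a trial call through once `reset_after` seconds have passed."""

        def __init__(self, max_failures=5, reset_after=30.0):
            self.max_failures = max_failures
            self.reset_after = reset_after
            self.failures = 0
            self.opened_at = None

        def call(self, fn, *args, **kwargs):
            if self.opened_at is not None:
                if time.monotonic() - self.opened_at < self.reset_after:
                    raise RuntimeError("circuit open: failing fast")
                # half-open: the reset window has elapsed, allow one trial call
            try:
                result = fn(*args, **kwargs)
            except Exception:
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.opened_at = time.monotonic()  # trip (or re-trip) the breaker
                raise
            else:
                self.failures = 0
                self.opened_at = None                  # success closes the circuit
                return result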
I'm new to distributed systems, and I'm reading about "simple Paxos". It creates a lot of chatter and I'm thinking about performance implications.
Let's say you're building a globally-distributed database, with several small-ish clusters located in different locations. It seems important to minimize the amount of cross-site communication.
What are the decisions you definitely need to use consensus for? The only one I thought of for sure was deciding whether to add or remove a node (or set of nodes?) from the network. It seems like this is necessary for vector clocks to work. Another I was less sure about was deciding on an ordering for writes to the same location, but should this be done by a leader which is elected via Paxos?
It would be nice to avoid having all nodes in the system making decisions together. Could a few nodes at each local cluster participate in cross-cluster decisions, and all local nodes communicate using a local Paxos to determine local answers to cross-site questions? The latency would be the same assuming the network is not saturated, but the cross-site network traffic would be much lighter.
Let's say you can split your database's tables along rows, and assign each subset of rows to a subset of nodes. Is it normal to elect a set of nodes to contain each subset of the data using Paxos across all machines in the system, and then only run Paxos between those nodes for all operations dealing with that subset of data?
And a catch-all: are there any other design-related or algorithmic optimizations people are doing to address this?
Good questions, and good insights!
It creates a lot of chatter and I'm thinking about performance implications.
Let's say you're building a globally-distributed database, with several small-ish clusters located in different locations. It seems important to minimize the amount of cross-site communication.
What are the decisions you definitely need to use consensus for? The only one I thought of for sure was deciding whether to add or remove a node (or set of nodes?) from the network. It seems like this is necessary for vector clocks to work. Another I was less sure about was deciding on an ordering for writes to the same location, but should this be done by a leader which is elected via Paxos?
Yes, performance is a problem that my team had seen in practice as well. We maintain a consistent database and distributed lock manager, and originally used Paxos for all writes, some reads, and cluster membership updates.
Here are some of the optimizations we did:
As much as possible, nodes sent the transitions to a Distinguished Proposer/Learner (elected via Paxos), which
decided on write ordering, and
batched transitions while waiting for the response from the prior instance, as sketched below. (But batching too much also caused problems.)
We had considered using multi-paxos but we ended up doing something cooler (see below).
With these optimizations, we were still hurting for performance, so we split our server into three layers. The bottom layer is Paxos; it does what you suggest, viz. merely decides the node membership of the middle layer. The middle layer is a custom, in-house, high-speed chain-consensus protocol, which does consensus and ordering for the DB. (BTW, chain consensus can be viewed as Vertical Paxos.) The top layer now just maintains the database/locks and client connections. This design has led to several orders of magnitude improvement in latency and throughput.
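To make the batching idea concrete, here is a rough sketch of the proposer-side loop (plain Python; paxos_propose is a stand-in for running one consensus instance, not code from our system):

    import queue

    def proposer_loop(pending, paxos_propose, max_batch=64):
        """Drain whatever transitions queued up while the previous consensus
        instance was in flight and propose them as a single value.

        `pending` is a queue.Queue of transitions; `paxos_propose(batch)` stands in
        for running one instance of the consensus protocol and blocking until the
        value is chosen.
        """
        while True:
            batch = [pending.get()]          # block until at least one transition arrives
            while len(batch) < max_batch:    # cap the batch: batching too much also hurts
                try:
                    batch.append(pending.get_nowait())
                except queue.Empty:
                    break
            paxos_propose(batch)             # one consensus instance per batch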
It would be nice to avoid having all nodes in the system making decisions together. Could a few nodes at each local cluster participate in cross-cluster decisions, and all local nodes communicate using a local Paxos to determine local answers to cross-site questions? The latency would be the same assuming the network is not saturated, but the cross-site network traffic would be much lighter.
Let's say you can split your database's tables along rows, and assign each subset of rows to a subset of nodes. Is it normal to elect a set of nodes to contain each subset of the data using Paxos across all machines in the system, and then only run Paxos between those nodes for all operations dealing with that subset of data?
These two together remind me of the Google Spanner paper. If you skip over the parts about time, it's essentially doing 2PC globally and Paxos on the shards. (IIRC.)
I am trying to build a distributed task queue, and I am wondering if there is any data store which has some or all of the following properties. I am looking to have a completely decentralized, multi-node/multi-master, self-replicating datastore cluster to avoid any single point of failure.
Essential
Supports Python pickled objects as values.
Persistent.
The more, the better, in decreasing order of importance (I do not expect any datastore to meet all the criteria :-))
Distributed.
Synchronous Replication across multiple nodes supported.
Runs/Can run on multiple nodes, in multi-master configuration.
Datastore cluster exposed as a single server.
Round-robin access to/selection of a node for read/write action.
Decent python client.
Support for Atomicity in get/put and replication.
Automatic failover
Decent documentation and/or Active/helpful community
Significantly mature
Decent read/write performance
Any suggestions would be much appreciated.
Cassandra (open-sourced by Facebook) has pretty much all of these properties. There are several Python clients, including pycassa.
Edited to add:
Cassandra is fully distributed, multi-node P2P, with tunable consistency levels (i.e. your replication can be synchronous or asynchronous or a mixture of both). Clients can connect to any server. Failover is automatic, and new servers can be added on-the-fly for load balancing. Cassandra is in production use by companies such as Facebook. There is an O'Reilly book. Write performance is extremely high, read performance is also high.
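For example, talking to a cluster from Python with pycassa looks roughly like this (written from memory, so check the pycassa docs for exact parameter names; QUORUM reads/writes give the synchronous-replication behaviour asked for):

    import pickle

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily
    from pycassa.cassandra.ttypes import ConsistencyLevel

    # Any node in the cluster can serve the request; the pool balances over them.
    pool = ConnectionPool('TaskQueue',
                          server_list=['node1:9160', 'node2:9160', 'node3:9160'])

    # QUORUM means a read/write only succeeds once a majority of replicas have it.
    tasks = ColumnFamily(pool, 'Tasks',
                         read_consistency_level=ConsistencyLevel.QUORUM,
                         write_consistency_level=ConsistencyLevel.QUORUM)

    # Values are just bytes, so pickled Python objects are fine.
    tasks.insert('task-42', {'payload': pickle.dumps({'action': 'resize', 'id': 42})})
    row = tasks.get('task-42')
    payload = pickle.loads(row['payload'])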
I have seen it mentioned several times as a best practice that there should be one distributor process configured per message type, but never any explanation as to why this is so. Since increasing the number of distributors increases the deployment complexity, I'd like to know the reasoning behind it. My guess is that if all available subscribers for a given message type are busy, the distributor may be stuck waiting for one to free up, while messages of other types which may have free subscribers are piling up in the distributor's work queue. Is this accurate? Any other reasons?
It is true that the Distributor will not hand out more work until a Worker is done. Therefore, if Workers are tied up with a given message type, the others will sit there until they are done. NSB doesn't have a concept of priority; all messages are created equal. Workers do not subscribe to specific message types; they just get handed work from the Distributor.
If certain message types have "priority" over others, then they should have their own Distributor. If the "priority" is all the same, then adding more Workers will increase performance up to a certain point. This will depend upon what resource you are operating upon. If it is a database, your endpoint may be more data bound than CPU bound. In that case, adding more Workers won't help, as they create increasing contention on potentially the same resource. In this case you may need to look into partitioning the resource somehow.
Having one logical endpoint per message type (logical endpoint is equal to either one endpoint or many copies of an endpoint behind a distributor) allows you the flexibility to monitor and scale each use case independently.
Also, it enables you to version the endpoint for one message type independently from all the others.
There is higher deployment complexity in that you have more processes installed, and ultimately you have to strike a balance (as always) between flexibility and complexity, but keep in mind that many of these deployment headaches can be automated away.