Can Triggers be used in Cassandra for production for a multi datacenter environment? - triggers

I have a multi datacenter(DC1, DC2) environment having 3 nodes in each datacenter with RF=3 per datacenter.
Wanted to know if triggers can be used in production in a multi-datacenter environment. If so, how can this be achieved?
Case A: If I start inserting the data to DC1, it would have 3 replicas with in DC1 and is responsible of replicating the data to other data center DC2. Every time an insert into DC2 takes place, I would like to have an trigger event to occur and notify about the latest inserted value in the application. Is it possible?
Case B: If not point 2, is it good to insert the data simultaneously on to two datacenters DC1, DC2 (pointing to a single table) and avoid triggers concept?
Will it have any impact with the network traffic? Based on the latest timestamp, the table would have the last insert to the table which serves the purpose when queried from either of the regions.
Consistency level as LOCAL_QUORUM for Read
Consistency level as ONE for write
dse 4.8.2
With these Consistency levels, good consistency can be achieved lowering the latency for write operation across the datacenters.
Usecase:
We have an application (2 domains) for two different regions(DC1 &
DC2). Users of DC1 region uses domain 1 to access the application and
users of DC2 region uses domain 2 for the same. The data is ingested
to DC1 for the same region and when this replicates in its DC, the
coordinator of DC1 would replicate the data in other DC (DC2). The
moment Dc2 receives the data from DC1, we want to let the application
know about the latest information (Polling_ available using some
trigger event mechanism. Just wanted to know if this can be
implemented with cassandra triggers.
Can someone give the feedback on Case A and Case B? and which would be efficient in production.
Thanks

In either case stated above I am not sure why you want to use a trigger to notify your application that a value was inserted. In the scenario as I understand it your application already knows the newest value. Once the write has been successful you can notify your application with the newest value.
In both cases A and B you are working against some of the basic principals of how Cassandra functions. At an application level you should now need to worry about ensuring replication or eventual consistency of your data across multiple nodes and data centers. That is a large part of what Cassandra brings to the table.
In both Case A and B you are going to get multiple inserts of the same data for each write in each node it is replicated to in both data centers. As you write to DC1 it will also be written to DC2. If you then write to DC2 it will be written back to DC1. This will end with a large number of rows containing the same data and will increase disk requirements and compaction frequency. This will also increase network traffic as the two DC's talk back and forth to gain eventual consistency.
From what I can see here I also have to ask why you are doing an RF=3 on a 3 node cluster. This means that each node in each data center will have all the data essentially making each server a complete replica of the others. This seems like it may be overkill (depending on the data of course) as you are not going to get a lot of the scalability benefits that Cassandra offers.
Cassandra will handle the syncing of data between the data centers and across nodes so your application does not need to worry about this.
One other quick note - Currently your writes are using a CL=ONE. This means that you may end up with cross-DC latency on a write request. If you change this to LOCAL_ONE then you limit your CL query until one of the nodes in the local DC has written the value instead of possibly a node in the other DC. Cassandra will still handle the replication and syncing of the data.

Generally, multiple data center concept is used for workload separation(say different DCs for real-time query,analytic and search). Cassandra by itself takes care of replicating the data across multiple DCs.
So, coming to your question Case B doesn't seems a right option because:
Cassandra automatically replicates data across multiple DCs link
Case A is feasible.alerts/notifications using triggers
Hope, it will be helpful.

Related

Sharding with replication

Sharding with replication]1
I have a multi tenant database with 3 tables(store,products,purchases) in 5 server nodes .Suppose I've 3 stores in my store table and I am going to shard it with storeId .
I need all data for all shards(1,2,3) available in nodes 1 and 2. But node 3 would contain only shard for store #1 , node 4 would contain only shard for store #2 and node 5 for shard #3. It is like a sharding with 3 replicas.
Is this possible at all? What database engines can be used for this purpose(preferably sql dbs)? Did you have any experience?
Regards
I have a feeling you have not adequately explained why you are trying this strange topology.
Anyway, I will point out several things relating to MySQL/MariaDB.
A Galera cluster already embodies multiple nodes (minimum of 3), but does not directly support "sharding". You can have multiple Galera clusters, one per "shard".
As with my comment about Galera, other forms of MySQL/MariaDB can have replication between nodes of each shard.
If you are thinking of having a server with all data, but replicate only parts to readonly Replicas, there are settings for replicate_do/ignore_database. I emphasize "readonly" because changes to these pseudo-shards cannot easily be sent back to the Primary server. (However see "multi-source replication")
Sharding is used primarily when there is simply too much traffic to handle on a single server. Are you saying that the 3 tenants cannot coexist because of excessive writes? (Excessive reads can be handled by replication.)
A tentative solution:
Have all data on all servers. Use the same Galera cluster for all nodes.
Advantage: When "most" or all of the network is working all data is quickly replicated bidirectionally.
Potential disadvantage: If half or more of the nodes go down, you have to manually step in to get the cluster going again.
Likely solution for the 'disadvantage': "Weight" the nodes differently. Give a height weight to the 3 in HQ; give a much smaller (but non-zero) weight to each branch node. That way, most of the branches could go offline without losing the system as a whole.
But... I fear that an offline branch node will automatically become readonly.
Another plan:
Switch to NDB. The network is allowed to be fragile. Consistency is maintained by "eventual consistency" instead of the "[virtually] synchronous replication" of Galera+InnoDB.
NDB allows you to immediately write on any node. Then the write is sent to the other nodes. If there is a conflict one of the values is declared the "winner". You choose which algorithm for determining the winner. An easy-to-understand one is "whichever write was 'first'".

How should I setup Kafka paritions vs topics in my setup

I'm looking to use Kafka as my event store/stream for orders, here are a few attributes:
I have two regions to cater for: London and New York
An order started in London is highly likely to have further events (updates) come from London, however we do need to support cross-regional reads/writes (i.e for an event started in London, writes can come from New York)
The business would benefit from a lower latency so having London writing to New York or vice versa should be minimized
An order has a lifestyle of 24h, it can be archived from the event log at this point as we no longer need it.
Need resiliency, if the London Kafka plant goes down, I should be able to failover to New York and vice versa.
Ordering of the events needs to be consistent across all regions
Order numbers are only in the 1000s per 24h.
So I'm trying to get my setup of Kafka correct so I can minimise the amount of work I have to do external to Kafka, so my concerns/questions are:
Originating region seems like a natural partitioning key, but as far as I can see it, I gain nothing from a partitioning a topic...I could just have 2 topics, one for London, one for New York? Am I correct?
As far as I can see, in order to have the ability to failover, I need to setup two SEPARATE clusters and use mirror maker to sync the two topics across regions. But this would mean I would need to build logic into my applications so that they publish an event to the correct cluster - am I understanding correctly? Is there any way I can setup Kafka so I don't have to do this and I just connect to the local cluster and read/write to that, letting the cluster take care of where it routes the events to
You might want to look into the "rack awareness" configuration for brokers, which helps with rack aware partition replication. This is mostly used to improve cross availability zone traffic, you can read about it more here. The gist of it, is that your consumers can fetch records from the "nearest" replica. In your case a consumer sitting in London might only fetch data from brokers in London, assuming you operate a single cross-region cluster.
Concerning latency: If you don't have any sub-seconds requirements, I would highly recommend to operate a single cluster instead of two. The latency between the east coast and the UK shouldn't be too bad. Keep it simple, Kafka is very robust and can handle most faults within a single cluster (e.g. a broker dying). Start with a single cluster in one location, you will still be able to add a second one and migrate your data over using mirror maker or a dedicated service.
This would also result in you not having the "same" topic twice for each region. Separate your topics based on their content, not their location. Otherwise you'll have lots of fun, when migrating the data format you use for orders. You want to be as flexible as possible for future changes.

How to reliably shard data across multiple servers

I am currently reading up on some distributed systems design patterns. One of the designs patterns when you have to deal with a lot of data (billions of entires or multiple peta bytes) would be to spread it out across multiple servers or storage units.
One of the solutions for this is to use a Consistent hash. This should result in an even spread across all servers in the hash.
The concept is rather simple: we can just add new servers and only the servers in the range would be affected, and if you loose servers the remaining servers in the consistent hash would take over. This is when all servers in the hash have the same data (in memory, disk or database).
My question is how do we handle adding and removing servers from a consistent hash where there are so much data that it can't be stored on a single host. How do they figure out what data to store and what not too?
Example:
Let say that we have 2 machines running, "0" and "1". They are starting to reach 60% of their maximum capacity, so we decide to add an additional machine "2". Now a large part the data on machine 0 has to be migrated to machine 2.
How would we automate so this will happen without downtime and while being reliable.
My own suggested approach would be that the service hosing consistent hash and the machines would have be aware of how to transfer data between each other. When a new machine is added, will the consistent hash service calculate the affected hash ranges. Then inform the affect machine
of the affected hash range and that they need to transfer affected data to machine 2. Once the affected machines are done transferring their data, they would ACK back to the consistent hash service. Once all affected services are done transferring data, the consistent hash service would start sending data to machine 2, and inform the affected machine that they can remove their transferred data now. If we have peta bytes on each server can this process take a long time. We there for need to keep track of what entires where changes during the transfer so we can ensure to sync them after, or we can submit the write/updates to both machine 0 and 2 during the transfer.
My approach would work, but i feel it is a little risky with all the backs and forth, so i would like to hear if there is a better way.
How would we automate so this will happen without downtime and while being reliable?
It depends on the technology used to store your data, but for example in Cassandra, there is no "central" entity that governs the process and it is done like almost everything else; by having nodes gossiping with each other. There is no downtime when a new node joins the cluster (performance might be slightly impacted though).
The process is as follow:
The new node joining the cluster is defined as an empty node without system tables or data.
When a new node joins the cluster using the auto bootstrap feature, it will perform the following operations
- Contact the seed nodes to learn about gossip state.
- Transition to Up and Joining state (to indicate it is joining the cluster; represented by UJ in the nodetool status).
- Contact the seed nodes to ensure schema agreement.
- Calculate the tokens that it will become responsible for.
- Stream replica data associated with the tokens it is responsible for from the former owners.
- Transition to Up and Normal state once streaming is complete (to indicate it is now part of the cluster; represented by UN in the nodetool status).
Taken from https://thelastpickle.com/blog/2017/05/23/auto-bootstrapping-part1.html
So when the joining node is in the Joining State, it is receiving data from other nodes but not ready for reads until the process is complete (Up status).
DataStax also has some material on this https://academy.datastax.com/units/2017-ring-dse-foundations-apache-cassandra?path=developer&resource=ds201-datastax-enterprise-6-foundations-of-apache-cassandra

Why isn't RDBMS Partition Tolerant in CAP Theorem and why is it Available?

Two points I don’t understand about RDBMS being CA in CAP Theorem :
1) It says RDBMS is not Partition Tolerant but how is RDBMS any less Partition Tolerant than other technologies like MongoDB or Cassandra? Is there a RDBMS setup where we give up CA to make it AP or CP?
2) How is it CAP-Available? Is it through master-slave setup? As in when the master dies, slave takes over writes?
I’m a novice at DB architecture and CAP theorem so please bear with me.
It is very easy to misunderstand the CAP properties, hence I'm providing some illustrations to make it easier.
Consistency: A query Q will produce the same answer A regardless the node that handles the request. In order to guarantee full consistency we need to ensure that all nodes agree on the same value at all times. Not to be confused with eventual consistency in which the network moves towards having all data consistent but there are periods of time in which it is not.
Availability: If the distributed system receives query Q it will always produce an answer for that query. This should not be confused with "high-availability", this is not about having the capacity to process a higher troughput of queries, it is about not refusing to answer.
Partition Tolerance: The system continues to function despite the existence of a partition. This is not about having mechanisms to "fix" the partition, it is about tolerating the partition, i.e. continuing despite the partition.
Note that the following examples do not cover all possible scenarios. Consider the following caption:
An example for CP:
The system is partition tolerant because its nodes keep accepting requests despite the partition; it is consistent because the only nodes providing answers are those that maintain a connection to the master node that handles all the write requests; it is not available because the nodes in the other partition do not provide an answer to the queries they receive.
Examples for AP:
Either because (respectively) we have the slave nodes replying to requests regardless whether they able to reach master or because the slave nodes in the other partition elect a new master, or because we have a masterless cluster, availability is achieved because all questions are getting an answer - consistency is dropped because both partitions are replying while potentially yielding different states.
Examples for CA:
If we disconnect nodes when a partition occurs, we can ensure that we have at most one partition which ultimately means that the network is not partitioned anymore, or simply there is no service at all. This is the opposite of partition tolerance, because the system is avoiding the partition instead of functioning despite it. Consistency and availability holds in these partially or fully disconnected systems because all working nodes (if any) have the same state and all received queries (if any) will get an answer - shutdown nodes do not receive queries.
To answer the questions:
Under default configurations, databases such as Cassandra and MongoDB are partition tolerant because they do not shutdown nodes to cope with partitions, whereas RDBMS such as MySQL do.
Availability has very little to do with master/slave setup, e.g. Cassandra is masterless and very available because it doesn't really matter which node dies. As for availability in a master/slave setup, there is no reason to stop responding to all queries when master is dead, but you may need to suspend write operations while electing a new one.
A lot of databases now actually have different configurations and depending on the settings you set, it can be either CA, CP, AP, etc but can not achieve all three at the same time. Some databases actually make an effort to support all three but still prioritizes them in a certain way.
For example, MySQL can be CP and CA depending on the configurations. By default, it is CA because it follows a master slave paradigm which data is replicated to the slaves. Partition tolerance is sacrificed in the event that a set of the slaves loses the connection to the master and therefore decides to elect a new master creating two masters with their own set of slaves.
However, MySQL also has another configuration which is a clustered configuration. It prioritizes CP over availability eg. the cluster will shutdown if there are not enough live nodes to serve all the data.
There are probably more configurations for MySQL that makes it satisfy other CAP theorem combinations but overall, I just wanted say that it depends on what your system requires. Sometimes databases are better for one configuration vs another so its best to see what kinds of problems that may also occur in using a certain configuration.
As for implementing the CAP theorem, I would advise taking a further look into different databases and how they implement the priorities for the CAP theorem. There are just too many different ways of implementing them eg. generally, the master slave model is used for CA systems, the hash ring for AP systems, etc.
CAP theorem is problematic and it applies only to distributed database systems. When you have distributed databases then network partition and node crashes can happen. And when network partition happens you must have partition tolerance (the P of your CAP).
So to answer your question number 1) It’s either CP or AP. It can be configured as Will mentioned.
More about why partition tolerance is a must:
https://codahale.com/you-cant-sacrifice-partition-tolerance/
More about problems around CAP theorem:
https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
I agree that RDBMS can have all the properties of CAP. I have started studying noSQL DBs and had prior experience with IBM DB2.
Here is how IBM DB2 satisfies all the 3 CAP properties
C : Consistency : Every relational database satisfies this due to the transactional nature of RDBMS.
A : Availability : Availability means that when a query is made for a data that exists, it should be returned. Again, a relational database is designed to do this easily.
P : Partition Tolerance : This is the most interesting one. From DB2 stand point, in the application that I was working on, we had 2 databases spread across different data centres. One was the primary and communicated with the secondary via heartbeats. Each of these primary and secondary databases, had 12 physical instances where data was distributed on the basis of some predefined logic. If the primary goes down, the secondary detects this and takes the place of primary. Since the primary and secondary were always maintained in sync, data remains consistent as well.
This is how I think that RDBMS satisfies all 3 properties of CAP Theorem.
I may be wrong, and open to discussion on this.

Do NoSQL datacenter aware features enable fast reads and writes when nodes are distributed across high-latency connections?

We have a data system in which writes and reads can be made in a couple of geographic locations which have high network latency between them (crossing a few continents, but not this slow). We can live with 'last write wins' conflict resolution, especially since edits can't be meaningfully merged.
I'd ideally like to use a distributed system that allows fast, local reads and writes, and copes with the replication and write propagation over the slow connection in the background. Do the datacenter-aware features in e.g. Voldemort or Cassandra deliver this?
It's either this, or we roll our own, probably based on collecting writes using something like
rsync and sorting out the conflict resolution ourselves.
You should be able to get the behavior you're looking for using Voldemort. (I can't speak to Cassandra, but imagine that it's similarly possible using it.)
The key settings in the configuration will be:
replication-factor — This is the total number of times the data is stored. Each put or delete operation must eventually hit this many nodes. A replication factor of n means it can be possible to tolerate up to n - 1 node failures without data loss.
required-reads — The least number of reads that can succeed without throwing an exception.
required-writes — The least number of writes that can succeed without the client getting back an exception.
So for your situation, the replication would be set to whatever number made sense for your redundancy requirements, while both required-reads and required-writes would be set to 1. Reads and writes would return quickly, with a concomitant risk of stale or lost data, and the data would only be replicated to the other nodes afterwards.
I have no experience with Voldemort, so I can only comment on Cassandra.
You can deploy Cassandra to multiple datacenters with an inter-DC latency higher than a few milliseconds (see http://spyced.blogspot.com/2010/04/cassandra-fact-vs-fiction.html).
To ensure fast local reads, you can configure the cluster to replicate your data to a certain number of nodes in each datacenter (see "Network Topology Strategy"). For example, you specify that there should always be two replica in each data center. So even when you lose a node in a data center, you will still be able to read your data locally.
Write requests can be sent to any node in a Cassandra cluster. So for fast writes, your clients would always speak to a local node. The node receiving the request (the "coordinator") will replicate the data to other nodes (in other datacenters) in the background. If nodes are down, the write request will still succeed and the coordinator will replicate the data to the failed nodes at a later time ("hinted handoff").
Conflict resolution is based on a client-supplied timestamp.
If you need more than eventual consistency, Cassandra offers several consistency options (including datacenter-aware options).