I see conflicting answers about which KV store options can be used for the Grafana Mimir HA tracker:
https://grafana.com/docs/mimir/latest/operators-guide/configure/configuring-high-availability-deduplication/ says only etcd and consul are supported KV stores.
https://grafana.com/docs/mimir/latest/operators-guide/configure/reference-configuration-parameters/#distributor says etcd, consul, inmemory and multi are supported options for the KV store.
I tried "inmemory", but the distributor accepts metrics from all replicas, so I doubt whether "inmemory" is even a valid option.
Description
Is it possible for Spring Data Redis to use Elasticache's configuration endpoint to perform all cluster operations (i.e., reading, writing, etc.)?
Long Description
I have a Spring Boot application that uses a Redis cluster as data store. The Redis cluster is hosted on AWS Elasticache running in cluster-mode enabled. The Elasticache cluster has 3 shards spread out over 12 nodes. The Redis version that the cluster is running is 6.0.
The service isn't correctly writing or retrieving data from the cluster. Whenever performing any of these operations, I get a message similar to the following:
io.lettuce.core.RedisCommandExecutionException: MOVED 16211 10.0.7.254:6379
In searching the internet, it appears that the service isn't correctly configured for a cluster. The fix seems to be to set the spring.redis.cluster.nodes property to a list of all the nodes in the Elasticache cluster (see here and here). I find this rather needless, considering that the Elasticache configuration endpoint is supposed to be used for all read and write operations (see the "Finding Endpoints for a Redis (Cluster Mode Enabled) Cluster" section here).
My question is this: can Spring Data Redis use Elasticache's configuration endpoint to perform all reads and writes, the way the AWS documentation describes? I'd rather not hand over a list of all the nodes if Spring Data Redis can use the configuration endpoint the way it's meant to be used. This seems like a serious limitation to me.
Thanks in advance!
Here is what I found works:
@Bean
public RedisConnectionFactory lettuceConnectionFactory()
{
    // Client-side settings (pooling, timeouts, SSL, ...)
    LettuceClientConfiguration config =
        LettucePoolingClientConfiguration
            .builder()
            // ...your configuration settings...
            .build();
    // Register only the configuration endpoint as the single seed node;
    // the client discovers the remaining cluster nodes from it.
    RedisClusterConfiguration clusterConfig = new RedisClusterConfiguration();
    clusterConfig.addClusterNode(new RedisNode("xxx.v1tc03.clustercfg.use1.cache.amazonaws.com", 6379));
    return new LettuceConnectionFactory(clusterConfig, config);
}
where xxx is the name from your elasticache cluster.
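For background on the MOVED error in the question: Redis Cluster assigns every key to one of 16384 hash slots, and a node that does not own the slot replies with MOVED plus the address of the node that does. A cluster-aware client computes the slot itself and follows those redirects. Below is a minimal sketch of the slot calculation as described in the Redis Cluster specification (CRC16/XMODEM modulo 16384, with hash-tag handling). The class and method names are made up for illustration; this is not the Lettuce implementation.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper (hypothetical names), not part of Lettuce or Spring Data Redis.
final class RedisSlot {
    static final int SLOT_COUNT = 16384;

    // CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // Slot that a key maps to. If the key contains "{...}" with a non-empty
    // body, only that body is hashed (the "hash tag" rule).
    static int slotFor(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }
}
```

Keys that share a hash tag, such as {user1}:profile and {user1}:cart, land in the same slot, which is what allows multi-key operations on them in cluster mode.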
We have a cluster of 3 nodes. Two of them are offline (missing), and I cannot get them to rejoin the cluster automatically; only the master is online.
Usually, you can use the InnoDB Cluster AdminAPI:
var cluster = dba.getCluster();
but I cannot use the cluster instance because the metadata is not up to date. And I cannot upgrade the metadata because dba.upgradeMetadata() requires the missing members to be online. (Catch-22)
I tried to dissolve the cluster by using:
var cluster = dba.rebootClusterFromCompleteOutage();
cluster.dissolve({force:true});
but this requires the metadata to be updated as well.
The question is: how do I dissolve the cluster completely, or upgrade the metadata so that I can use the cluster methods?
This "chicken-and-egg" issue was fixed in MySQL Shell 8.0.20. dba.rebootClusterFromCompleteOutage() is now allowed in such a situation:
BUG#30661129 – DBA.UPGRADEMETADATA() AND DBA.REBOOTCLUSTERFROMCOMPLETEOUTAGE() BLOCK EACH OTHER
More info at: https://mysqlserverteam.com/mysql-shell-adminapi-whats-new-in-8-0-20/
If every node in the cluster has been upgraded to the latest version of MySQL, the cluster isn't fully operational, and you need to update the metadata for mysqlsh, you'll need to use an older version of mysqlsh (for example, from https://downloads.mysql.com/archives/shell/) to get the cluster back up and running. Once it is up and running, you can run dba.upgradeMetadata() on the R/W node. Make sure you update all of your routers, or they will lose their connection.
I've just read the "Deploying Cassandra with Stateful Sets" topic in the Kubernetes documentation.
The deployment process:
1. Creation of StorageClass
2. Creation of PersistentVolumes (in my case 4 PersistentVolumes), with storageClassName set to the StorageClass created in step 1
3. Creation of Cassandra Headless Service
4. Using a StatefulSet to create a Cassandra ring, with storageClassName in the StatefulSet yml definition set to the StorageClass created in step 1
As a result, there are 4 pods (cassandra-0, cassandra-1, cassandra-2, cassandra-3), which mount the volumes created in step 2 (pv-0, pv-1, pv-2, pv-3).
I wonder how / if these persistent volumes synchronize data with each other.
E.g. if I add a record that pod cassandra-0 writes to persistent volume pv-0, will someone who reads from the database a moment later through the cassandra-1 pod/PV see the data that was added to pv-0? Can anyone tell me how it works exactly?
This is not related to Kubernetes.
The replication is done by the database and is configurable.
See the CAP theorem and Eventual Consistency for Cassandra.
You can control the level of consistency in Cassandra. Whether a record is visible on the other replicas immediately or only later depends on how you configure Cassandra.
See also: Synchronous Replication, Asynchronous Replication
Cassandra Consistency:
how to set cassandra read and write consistency
How is the consistency level configured?
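To make the consistency trade-off concrete, here is a small sketch of the usual rule of thumb: with replication factor RF, a read is guaranteed to overlap the latest acknowledged write when the number of replicas read (R) plus the number of replicas written (W) is greater than RF. The class and method names are made up for illustration and are not part of any Cassandra driver.

```java
// Illustrative helper (hypothetical names), not a Cassandra driver API.
final class Consistency {
    // Number of replicas that QUORUM waits for: a strict majority of RF.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when every read set must overlap every write set (R + W > RF),
    // so a read contacts at least one replica holding the latest write.
    static boolean readSeesLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }
}
```

For example, with RF = 3, writing and reading at QUORUM (2 + 2 > 3) gives read-your-writes behaviour, while writing and reading at ONE (1 + 1 <= 3) only gives eventual consistency.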
The mechanism that spreads data across the cluster is the same whether Cassandra is deployed in Kubernetes or on bare-metal instances. Cassandra distributes the data across the nodes according to a hash of the partition key (known as a token), and it uses the same algorithm to locate the data when reading.
There are other factors to take into consideration: the replication factor (the number of copies) and the consistency level used.
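As a rough illustration of token-based placement, here is a simplified sketch: each node owns one token on a ring, a key is hashed to a token, and the replicas are the next RF nodes clockwise from that token. This is a teaching model under stated assumptions, not Cassandra's code: it uses CRC32 as a stand-in hash (Cassandra's default partitioner uses Murmur3), one token per node (no virtual nodes), and the class and method names are invented.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Simplified token ring (hypothetical names); one token per node, no vnodes.
final class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Stand-in hash for illustration only; Cassandra uses Murmur3 by default.
    static long token(String value) {
        CRC32 crc = new CRC32();
        crc.update(value.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        ring.put(token(node), node);
    }

    // Walk clockwise from the key's token and collect up to RF nodes.
    List<String> replicasFor(String partitionKey, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int wanted = Math.min(replicationFactor, ring.size());
        Long cursor = ring.ceilingKey(token(partitionKey));
        if (cursor == null) {
            cursor = ring.firstKey(); // wrap around the ring
        }
        while (replicas.size() < wanted) {
            replicas.add(ring.get(cursor));
            cursor = ring.higherKey(cursor);
            if (cursor == null) {
                cursor = ring.firstKey();
            }
        }
        return replicas;
    }
}
```

In this model, with RF = 1 each row lives on exactly one node, so losing that node makes the row unavailable until it returns. With RF = 3 on a three-node ring, every node holds a copy of every row.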
You may want to take a look at DS201: DataStax Enterprise Foundations of Apache Cassandra™ in DataStax Academy, which covers the basics of Cassandra.
Just to slightly extend Carlos' answer: Kubernetes is not involved, and the volumes are completely isolated from each other. Replication and distribution are entirely up to the database software to handle. As far as K8s is concerned, they are just separate processes and separate volumes.
Thanks for comments guys!
So, when I have my DB with 3 PVs:
cassandra-pod0   cassandra-pod1   cassandra-pod2
      |                |                |
cassandra-pv0    cassandra-pv1    cassandra-pv2
The data is divided across the 3 PVs. When I kill cassandra-pod1, it is possible that I will (temporarily) lose part of the data. Am I right?
Can someone tell me what the purpose of the "Managed Infrastructure Mixer Client" is? It is showing up in my GCE logs and I can't find any information on it. It is adding and removing GCE instances.
I believe it is related to GCP's recommended settings:
Automatic restart - On (recommended)
On host maintenance - Migrate VM instance (recommended)
This is the User Agent used by Managed Instance Groups (MIGs) when performing operations on instances. These operations can result from a user operating on the MIG (e.g. resizing, recreating instances), as well as from operations performed by the Autoscaler, Autohealer, Updater, etc.
Note that this string may change in the future.
There is an emerging trend of ripping global state out of traditional "static" config management tools like Chef/Puppet/Ansible, and instead storing configurations in some centralized/distributed tool, of which the main players appear to be:
ZooKeeper (Apache)
Consul (Hashicorp)
Eureka (Netflix)
Each of these tools works differently, but the principle is the same:
Store your env vars and other dynamic configurations (that is, stuff that is subject to change) in these tools as key/value pairs
Connect to these tools/services via clients at startup and pull down your config KV pairs. This typically requires the client to supply a service name ("MY_APP"), and an environment ("DEV", "PROD", etc.).
There is an excellent Consul Java client which explains all of this beautifully and provides ample code examples.
My understanding of these tools is that they are built on top of consensus algorithms such as Zab, Paxos and Gossip that allow config updates to spread almost virally, with eventual consistency, throughout your nodes. So the idea there is that if you have a myapp app that has 20 nodes, say myapp01 through myapp20, if you make a config change to one of them, that change will naturally "spread" throughout the 20 nodes over a period of seconds/minutes.
My problem is: how do these updates actually deploy to each node? In none of the client APIs (the one I linked to above, the ZooKeeper API, or the Eureka API) do I see some kind of callback functionality that can be set up and used to notify the client when the centralized service (e.g. the Consul cluster) wants to push and reload config updates.
So I ask: how is this supposed to work (dynamic config deployment and reload on clients)? I'm interested in any viable answer for any of those 3 tools, though Consul's API seems to be the most advanced IMHO.
You could use cfg4j for that. It's a Java configuration library for distributed services. It supports Consul as one of the configuration sources.
That's a nice question. I can explain how the Consul HTTP client works.
I also initially thought that it used a push mechanism, but while exploring Consul recently I found that all Consul clients poll the server for the changes they want to watch. It is a slightly different kind of polling, though: Consul supports blocking queries. These are HTTP requests with a maximum timeout of 10 minutes. The query waits until something changes on the watched key/folder and then returns with the latest index. If the index has changed, the client reloads the configuration. For more info: Consul Blocking Query
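Here is a minimal sketch of such a watch loop using only the JDK's HTTP client (Java 11+), assuming a Consul agent at a base URL such as http://localhost:8500. It relies on the documented KV endpoint (/v1/kv/<key>), the index and wait query parameters, and the X-Consul-Index response header. The class name, method names and the callback are made up for illustration, and error handling is deliberately crude.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.Consumer;

// Illustrative long-poll watcher (hypothetical names), not an official Consul client.
final class ConsulWatcher {
    // Blocking-query URL: the request returns once the key's index exceeds `index`
    // or the wait time elapses. raw=true returns the plain value instead of JSON.
    static URI blockingUri(String baseUrl, String key, long index) {
        return URI.create(baseUrl + "/v1/kv/" + key + "?raw=true&index=" + index + "&wait=5m");
    }

    // Index to use for the next request. If the index went backwards
    // (e.g. after a server restore), start over from 0.
    static long nextIndex(long current, long seen) {
        return seen < current ? 0 : seen;
    }

    static void watch(String baseUrl, String key, Consumer<String> onChange) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        long index = 0; // 0 makes the first request return immediately
        while (!Thread.currentThread().isInterrupted()) {
            HttpRequest request = HttpRequest.newBuilder(blockingUri(baseUrl, key, index))
                    .timeout(Duration.ofMinutes(6)) // a bit longer than wait=5m
                    .GET()
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                Thread.sleep(1000); // crude back-off, e.g. while the key does not exist
                continue;
            }
            long seen = Long.parseLong(response.headers().firstValue("X-Consul-Index").orElse("0"));
            long next = nextIndex(index, seen);
            if (next > index) {
                onChange.accept(response.body()); // value changed: reload configuration
            }
            index = next;
        }
    }
}
```

A caller would typically run watch on a background thread, for example with a key like config/myapp/db-url (a made-up name), and swap in the new value inside the callback.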