Kubernetes Keycloak high availability cluster

I'm trying to deploy Keycloak in Kubernetes with multiple replicas. I am using Helm 3.0 charts with the latest Kubernetes. It deploys fine when I have one replica in my StatefulSet, but I need high availability and therefore at least two replicas. So far it only works with one replica: with two replicas, I can't log in as either an admin or a regular user.
Can someone provide a working Keycloak deployment (preferably via Helm) that supports multiple replicas?
These are the JGroups discovery protocols I have tried in the chart values:
jgroups:
  discoveryProtocol: dns.DNS_PING
jgroups:
  discoveryProtocol: kubernetes.KUBE_PING
jgroups:
  discoveryProtocol: JDBC_PING
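For context, each of these discovery protocols has a cluster-side prerequisite: dns.DNS_PING resolves peers through a headless Service, JDBC_PING stores peer addresses in the shared database, and kubernetes.KUBE_PING queries the Kubernetes API, so the Keycloak pods need RBAC permission to list pods in their namespace. A minimal sketch of that RBAC (the Role and ServiceAccount names are illustrative and must match what the chart actually creates):
# Sketch: RBAC needed for kubernetes.KUBE_PING to discover peer pods.
# Names (keycloak, keycloak-kube-ping) are illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: keycloak-kube-ping
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: keycloak-kube-ping
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: keycloak-kube-ping
subjects:
- kind: ServiceAccount
  name: keycloak            # assumption: the ServiceAccount the Keycloak pods run as
  namespace: default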
StatefulSet snippet:
apiVersion: v1
items:
- apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    ...
    labels:
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: keycloak
      helm.sh/chart: keycloak-7.5.0
    name: ...
    namespace: default
  spec:
    podManagementPolicy: Parallel
    replicas: 2
    revisionHistoryLimit: 10
    ...
        containers:
        - command:
          - /scripts/keycloak.sh
          env:
          ...
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /auth/
              port: http
              scheme: HTTP
            initialDelaySeconds: 300
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: keycloak
          ports:
          ...
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /auth/realms/master
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
        # (the snippet jumps here to the bundled PostgreSQL StatefulSet from the same list output)
          - name: POSTGRES_DB
            value: keycloak
          - name: POSTGRESQL_ENABLE_LDAP
            value: "no"
          image: docker.io/bitnami/postgresql:12.2.0-debian-10-r91
          imagePullPolicy: IfNotPresent
          livenessProbe:
            exec:
              command:
              - /bin/sh
              - -c
              - exec pg_isready -U "keycloak" -d "keycloak" -h 127.0.0.1 -p 5432
            failureThreshold: 6
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: bizmall-postgresql
          ports:
          - containerPort: 5432
            name: tcp-postgresql
            protocol: TCP
          readinessProbe:
            exec:
              command:
              - /bin/sh
              - -c
              - -e
              - |
                exec pg_isready -U "keycloak" -d "keycloak" -h 127.0.0.1 -p 5432
                [ -f /opt/bitnami/postgresql/tmp/.initialized ] || [ -f /bitnami/postgresql/.initialized ]
            failureThreshold: 6
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
          securityContext:
            runAsUser: 1001
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /dev/shm
            name: dshm
          - mountPath: /bitnami/postgresql
            name: data
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext:
          fsGroup: 1001
        terminationGracePeriodSeconds: 30
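Since dns.DNS_PING discovers peers through DNS, a headless Service selecting the Keycloak pods also has to exist (the codecentric chart normally creates one itself). A sketch of what that Service looks like, with illustrative names and labels:
# Sketch: headless Service used by dns.DNS_PING to find the other Keycloak pods.
apiVersion: v1
kind: Service
metadata:
  name: keycloak-headless          # illustrative; the chart usually creates its own
  namespace: default
spec:
  clusterIP: None                  # headless: DNS returns the individual pod IPs
  publishNotReadyAddresses: true   # pods must see each other before they pass readiness
  selector:
    app.kubernetes.io/name: keycloak
  ports:
  - name: jgroups-tcp
    port: 7600
    targetPort: 7600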

Here is the Helm chart for Keycloak that we use to deploy it in HA mode with 3 replicas: https://github.com/codecentric/helm-charts/tree/master/charts/keycloak
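For reference, the values we override for HA look roughly like the sketch below. It reuses the keys shown in the question; depending on the chart version these may need to be nested under a top-level keycloak: block, or expressed as JGROUPS_DISCOVERY_* environment variables instead, so verify against the chart's values.yaml.
# Sketch of Helm values for an HA install; key names follow the question's snippets.
replicas: 3
jgroups:
  discoveryProtocol: dns.DNS_PING   # requires the chart's headless Service for peer lookup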

Related

How to deploy Zookeeper on Kubernetes - Liveness probe failed "The list of enabled four letter word commands is : [[srvr]]"

Hello, I am trying to deploy ZooKeeper in Kubernetes with this YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zookeeper-cluster-import-deployment
  namespace: mynamespace
  labels:
    name: zookeeper-cluster-import
    app: zookeeper-cluster-import
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: zookeeper-cluster-import
  template:
    metadata:
      labels:
        app: zookeeper-cluster-import
    spec:
      restartPolicy: Always
      containers:
      - name: zookeeper
        image: dockerbase_zookeeper:v01
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 2181
          name: zookeeper
        env:
        - name: ALLOW_ANONYMOUS_LOGIN
          value: "yes"
        - name: ZOO_4LW_COMMANDS_WHITELIST
          value: "ruok"
        livenessProbe:
          exec:
            command: ['/bin/bash', '-c', 'echo "ruok" | nc -w 2 localhost 2181 | grep imok']
          initialDelaySeconds: 60
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 6
          successThreshold: 1
        readinessProbe:
          exec:
            command: ['/bin/bash', '-c', 'echo "ruok" | nc -w 2 localhost 2181 | grep imok']
          initialDelaySeconds: 120
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 6
          successThreshold: 1
        resources:
          requests:
            cpu: 400m
            memory: 2Gi
          limits:
            cpu: 500m
            memory: 2Gi
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
          runAsGroup: 1000
          runAsUser: 1000
      imagePullSecrets:
      - name: secret-repos
      volumes:
      - name: pv-01
        persistentVolumeClaim:
          claimName: pv-claim-01
But when I apply this YAML, the liveness probe fails, and at the end of the pod logs I see: The list of enabled four letter word commands is : [[srvr]]
I don't understand, because in env I set:
- name: ZOO_4LW_COMMANDS_WHITELIST
  value: "ruok"
It is as if the whitelist setting is not being picked up.
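One thing worth checking, since dockerbase_zookeeper:v01 is a custom image: the ZOO_4LW_COMMANDS_WHITELIST variable only takes effect if the image's entrypoint writes it into ZooKeeper's configuration. If it does not, the whitelist can be set in zoo.cfg directly (the property is 4lw.commands.whitelist). A hedged sketch, assuming the standard /conf/zoo.cfg location; the ConfigMap name and the other settings are illustrative and should be merged with whatever the image already configures:
apiVersion: v1
kind: ConfigMap
metadata:
  name: zookeeper-config
  namespace: mynamespace
data:
  zoo.cfg: |
    # minimal example config; keep whatever else your image already sets
    dataDir=/data
    clientPort=2181
    4lw.commands.whitelist=srvr,ruok
---
# In the Deployment's pod spec, mount it over the default config file:
#   volumeMounts:
#   - name: zk-config
#     mountPath: /conf/zoo.cfg
#     subPath: zoo.cfg
#   volumes:
#   - name: zk-config
#     configMap:
#       name: zookeeper-config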

Set up a single Solr exporter for all instances of Solr StatefulSets running in different namespaces in a Kubernetes cluster

In our project, we are using the Solr exporter to fetch Solr metrics such as heap usage and send them to Prometheus. We have configured Alertmanager to fire an alert when Solr heap usage exceeds 80%. This setup is currently duplicated for every Solr instance.
Is there a way to configure a common Solr exporter that fetches Solr metrics from all instances across the different namespaces in the cluster? (This is to reduce the resource consumption in Kubernetes caused by running multiple Solr exporter instances.)
The YAML configs for the Solr exporter are given below:
apiVersion: v1
kind: Service
metadata:
  name: solr-exporter
  labels:
    app: solr-exporter
spec:
  ports:
  - port: 9983
    name: client
  selector:
    app: solr
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: solr
  name: solr-exporter
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: solr
      pod: solr-exporter
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: solr
        pod: solr-exporter
    spec:
      containers:
      - command:
        - /bin/bash
        - -c
        - /opt/solr/contrib/prometheus-exporter/bin/solr-exporter -p 9983 -z zk-cs:2181
          -n 7 -f /opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml
        image: solr:8.1.1
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /metrics
            port: 9983
            scheme: HTTP
          initialDelaySeconds: 480
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 5
        name: solr-exporter
        ports:
        - containerPort: 9983
          protocol: TCP
        readinessProbe:
          failureThreshold: 2
          httpGet:
            path: /metrics
            port: 9983
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 50m
            memory: 64Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml
          name: solr-exporter-config
          readOnly: true
          subPath: solr-exporter-config.xml
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        - |-
          apt-get update
          apt-get install curl -y
          PROTOCOL="http://"
          COUNTER=0;
          while [ $COUNTER -lt 30 ]; do
            curl -v -k -s --connect-timeout 10 "${PROTOCOL}solrcluster:8983/solr/admin/info/system" && exit 0
            # increment the counter so the loop actually gives up after ~60 seconds
            COUNTER=$((COUNTER+1));
            sleep 2
          done;
          echo "Did NOT see a Running Solr instance after 60 secs!";
          exit 1;
        image: solr:8.1.1
        imagePullPolicy: Always
        name: mpi-demo-solr-exporter-init
        resources: {}
        securityContext:
          runAsUser: 0
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: solr-exporter-config
        name: solr-exporter-config
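Note that a single solr-exporter process only scrapes one SolrCloud cluster at a time (one -z ZooKeeper ensemble, or one -b base URL), so one exporter for every namespace only works if all the Solr instances register in the same ZooKeeper ensemble. What you can do is run the exporter Deployments in a single namespace and point each one at a cluster in another namespace via the fully qualified Service DNS name. A sketch of the container spec under that assumption (the zk-cs Service living in a team-a namespace is illustrative):
      # Sketch: exporter running in a shared monitoring namespace, scraping a SolrCloud
      # cluster whose ZooKeeper Service (zk-cs) lives in the team-a namespace.
      containers:
      - name: solr-exporter
        image: solr:8.1.1
        command:
        - /bin/bash
        - -c
        - >-
          /opt/solr/contrib/prometheus-exporter/bin/solr-exporter
          -p 9983
          -z zk-cs.team-a.svc.cluster.local:2181
          -n 7
          -f /opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml
        ports:
        - containerPort: 9983
          protocol: TCP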

Multiple kube-schedulers

I'm trying to create a kube-scheduler that will be used for my organisation's pod deployments (a data-center). The objective is to have the pods of any deployment split equally between the two sites of the data-center. To achieve this I created a new kube-scheduler manifest (see below), in which I changed the ports that are used and added a new configuration file (see below) using the --config parameter. The problem is that even after setting the --port parameter to a new port that isn't already used by the original kube-scheduler, it still tries to use the old one, so the new kube-scheduler can't start:
I1109 20:27:59.225996 1 registry.go:173] Registering SelectorSpread plugin
I1109 20:27:59.226097 1 registry.go:173] Registering SelectorSpread plugin
I1109 20:27:59.926994 1 serving.go:331] Generated self-signed cert in-memory
failed to create listener: failed to listen on 0.0.0.0:10251: listen tcp 0.0.0.0:10251: bind: address already in use
How can I force the use of the specified port, OR how can I delete the original kube-scheduler so that there won't be any conflict between the two? The first solution would be preferable.
Configuration kube-scheduler (manifest - /etc/kubernetes/manifest/kube-scheduler.yaml) :
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    #- --port=0
    image: k8s.gcr.io/kube-scheduler:v1.19.3
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
status: {}
Configuration kube-custom-scheduler.yaml (/etc/kubernetes/manifest/kube-custom-scheduler.yaml):
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-custom-scheduler
    tier: control-plane
  name: kube-custom-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --config=/etc/kubernetes/scheduler-custom.conf
    - --master=true
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=false
    - --secure-port=10269
    - --scheduler-name=kube-custom-shceduler
    - --port=10261
    image: k8s.gcr.io/kube-scheduler:v1.19.3
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10269
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10269
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /etc/kubernetes/scheduler-custom.conf
      name: customconfig
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /etc/kubernetes/scheduler-custom.conf
      type: FileOrCreate
    name: customconfig
status: {}
Custom configuration mentioned in the kube-custom-scheduler.yaml (/etc/kubernetes/scheduler-custom.conf):
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
  - name: PodTopologySpread
    args:
      defaultConstraints:
      - maxSkew: 1
        topologyKey: topology.kube.io/datacenter
        whenUnsatisfiable: ScheduleAnyway
Please don't hesitate to ask if you need more information about the cluster.
According to https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/, --port is overridden by the file specified with --config. Adding healthzBindAddress and metricsBindAddress to /etc/kubernetes/scheduler-custom.conf works:
healthzBindAddress: 0.0.0.0:10261
metricsBindAddress: 0.0.0.0:10261
If you want to remove the default scheduler, you can move kube-scheduler.yaml out of /etc/kubernetes/manifests/.
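Putting the pieces together, the custom scheduler's config file could then look roughly like this. It is only a sketch using the v1beta1 fields above; the scheduler name (written here without the typo from the manifest's --scheduler-name flag) and the kubeconfig path are taken from the question, and pods must set spec.schedulerName to that name to be scheduled by it:
# /etc/kubernetes/scheduler-custom.conf (sketch)
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
leaderElection:
  leaderElect: false
healthzBindAddress: 0.0.0.0:10261
metricsBindAddress: 0.0.0.0:10261
profiles:
- schedulerName: kube-custom-scheduler
  pluginConfig:
  - name: PodTopologySpread
    args:
      defaultConstraints:
      - maxSkew: 1
        topologyKey: topology.kube.io/datacenter
        whenUnsatisfiable: ScheduleAnyway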

Kubernetes StatefulSet - obtain spec.replicas metadata and reference elsewhere in configuration

I am configuring a StatefulSet where I want the number of replicas (spec.replicas, as shown below) to somehow be passed as a parameter into each application instance. My application needs spec.replicas to determine how many rows to load from a MySQL table. I don't want to hard-code the number of replicas in both spec.replicas and the application parameter, as that breaks when scaling the number of replicas up or down, since the application parameter needs to adjust when scaling.
Here is my StatefulSet config:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  labels:
    run: my-app
  name: my-app
  namespace: my-ns
spec:
  replicas: 3
  selector:
    matchLabels:
      run: my-app
  serviceName: my-app
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        run: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:latest
        command:
        - /bin/sh
        - /bin/start.sh
        - dev
        - 2000m
        - "0"
        - "3"   # <-- needs to be replaced with the number of replicas
        - 127.0.0.1
        - "32990"
        imagePullPolicy: Always
        livenessProbe:
          httpGet:
            path: /health
            port: 8081
          initialDelaySeconds: 180
          periodSeconds: 10
          timeoutSeconds: 3
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /ready
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 3
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          limits:
            memory: 2500Mi
      imagePullSecrets:
      - name: snapshot-pull
      restartPolicy: Always
I have read the Kubernetes docs, and the fields you can expose to a container (for example via the Downward API) are scoped at the pod or container level, never the StatefulSet, at least as far as I have seen.
Thanks in advance.
You could use a YAML anchor to do this. Check out:
https://helm.sh/docs/chart_template_guide/yaml_techniques/#yaml-anchors
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  labels:
    run: my-app
  name: my-app
  namespace: my-ns
spec:
  replicas: &numReplicas 3
  selector:
    matchLabels:
      run: my-app
  serviceName: my-app
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        run: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:latest
        command:
        - /bin/sh
        - /bin/start.sh
        - dev
        - 2000m
        - "0"
        - *numReplicas
        - 127.0.0.1
        - "32990"
        imagePullPolicy: Always
        livenessProbe:
          httpGet:
            path: /health
            port: 8081
          initialDelaySeconds: 180
          periodSeconds: 10
          timeoutSeconds: 3
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /ready
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 3
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          limits:
            memory: 2500Mi
      imagePullSecrets:
      - name: snapshot-pull
      restartPolicy: Always
Normally you would use the Downward API for this kind of thing: https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/
However, it is currently not possible for Kubernetes to propagate Deployment/StatefulSet spec data into the pod spec with the Downward API, nor should it be. If you are responsible for this software, I'd set up some internal functionality so that it can find its peers and determine their count periodically.
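One hedged way to implement that peer lookup is to let each pod read its own StatefulSet from the API server and use spec.replicas directly, so the application stays correct when you scale. A minimal RBAC sketch (the my-app-sa and my-app-read-statefulset names are illustrative, and the StatefulSet's pod template would set serviceAccountName: my-app-sa):
# Sketch: give the pods read access to their own StatefulSet so the application
# can fetch spec.replicas at runtime instead of receiving it as a static argument.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: my-ns
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-app-read-statefulset
  namespace: my-ns
rules:
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  resourceNames: ["my-app"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-read-statefulset
  namespace: my-ns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: my-app-read-statefulset
subjects:
- kind: ServiceAccount
  name: my-app-sa
  namespace: my-ns
# The app can then GET /apis/apps/v1/namespaces/my-ns/statefulsets/my-app
# and read .spec.replicas at startup (and periodically, to react to scaling).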

Kubernetes deployment incurs downtime

When running a deployment I get downtime: requests start failing after a variable amount of time (20-40 seconds).
The readiness check for the entry container is meant to fail once the preStop hook sends SIGUSR1; the hook then waits 31 seconds before sending SIGTERM. In that window the pod should be removed from the service, since the readiness check is set to fail after 2 failed attempts at 5-second intervals.
How can I see the events for pods being added to and removed from the service, to find out what's causing this?
And the events around the readiness checks themselves?
I use Google Container Engine version 1.2.2 and use GCE's network load balancer.
service:
apiVersion: v1
kind: Service
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: http
    protocol: TCP
  - name: https
    port: 443
    targetPort: https
    protocol: TCP
  selector:
    app: myapp
deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
        version: 1.0.0-61--66-6
    spec:
      containers:
      - name: myapp
        image: ****
        resources:
          limits:
            cpu: 100m
            memory: 250Mi
          requests:
            cpu: 10m
            memory: 125Mi
        ports:
        - name: http-direct
          containerPort: 5000
        livenessProbe:
          httpGet:
            path: /status
            port: 5000
          initialDelaySeconds: 30
          timeoutSeconds: 1
        lifecycle:
          preStop:
            exec:
              # SIGTERM triggers a quick exit; gracefully terminate instead
              command: ["/bin/sh", "-c", "sleep 31"]
      - name: haproxy
        image: travix/haproxy:1.6.2-r0
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 25Mi
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        env:
        - name: "SSL_CERTIFICATE_NAME"
          value: "ssl.pem"
        - name: "OFFLOAD_TO_PORT"
          value: "5000"
        - name: "HEALT_CHECK_PATH"
          value: "/status"
        volumeMounts:
        - name: ssl-certificate
          mountPath: /etc/ssl/private
        livenessProbe:
          httpGet:
            path: /status
            port: 443
            scheme: HTTPS
          initialDelaySeconds: 30
          timeoutSeconds: 1
        readinessProbe:
          httpGet:
            path: /readiness
            port: 81
          initialDelaySeconds: 0
          timeoutSeconds: 1
          periodSeconds: 5
          successThreshold: 1
          failureThreshold: 2
        lifecycle:
          preStop:
            exec:
              # SIGTERM triggers a quick exit; gracefully terminate instead
              command: ["/bin/sh", "-c", "kill -USR1 1; sleep 31; kill 1"]
      volumes:
      - name: ssl-certificate
        secret:
          secretName: ssl-c324c2a587ee-20160331
When a probe fails, the prober emits a warning event with reason Unhealthy and a message such as "Readiness probe failed: …" or "… probe errored: …".
You should be able to find those events using either kubectl get events or kubectl describe pods -l app=myapp,version=1.0.0-61--66-6 (filtering the pods by their labels).