I have an HA Kubernetes cluster that was initialized with custom certificates. I want to run Hazelcast on it, but Hazelcast member discovery through the Kubernetes API fails.
This is my deployment file:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: hazelcast
labels:
app: hazelcast
spec:
replicas: 3
serviceName: hazelcast-service
selector:
matchLabels:
app: hazelcast
template:
metadata:
labels:
app: hazelcast
spec:
imagePullSecrets:
- name: nexuspullsecret
containers:
- name: hazelcast
image: 192.168.161.187:9050/hazelcast-custom:4.0.2
imagePullPolicy: "Always"
ports:
- name: hazelcast
containerPort: 5701
livenessProbe:
httpGet:
path: /hazelcast/health/node-state
port: 5701
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
readinessProbe:
httpGet:
path: /hazelcast/health/node-state
port: 5701
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 1
successThreshold: 1
failureThreshold: 1
resources:
requests:
memory: "0"
cpu: "0"
limits:
memory: "2048Mi"
cpu: "500m"
volumeMounts:
- name: hazelcast-storage
mountPath: /data/hazelcast
env:
- name: JAVA_OPTS
value: "-Dhazelcast.rest.enabled=true -Dhazelcast.config=/data/hazelcast/hazelcast.xml"
volumes:
- name: hazelcast-storage
configMap:
name: hazelcast-configuration
---
apiVersion: v1
kind: Service
metadata:
name: hazelcast-service
spec:
type: ClusterIP
selector:
app: hazelcast
ports:
- protocol: TCP
port: 5701
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: hazelcast-cluster-role
rules:
- apiGroups: [""]
resources: ["endpoints", "pods", "nodes"]
verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: hazelcast-cluster-role-binding
subjects:
- kind: ServiceAccount
name: default
namespace: default
roleRef:
kind: ClusterRole
name: hazelcast-cluster-role
apiGroup: rbac.authorization.k8s.io
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
name: hazelcast
namespace: default
spec:
maxUnavailable: 0
selector:
matchLabels:
app: hazelcast
---
apiVersion: v1
kind: ConfigMap
metadata:
name: hazelcast-configuration
data:
hazelcast.xml: |-
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xmlns="http://www.hazelcast.com/schema/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.hazelcast.com/schema/config
http://www.hazelcast.com/schema/config/hazelcast-config-4.0.xsd">
<network>
<rest-api enabled="true"></rest-api>
<join>
<!-- deactivate normal discovery -->
<multicast enabled="false"/>
<tcp-ip enabled="false" />
<!-- activate the Kubernetes plugin -->
<kubernetes enabled="true">
<service-name>hazelcast-service</service-name>
<namespace>default</namespace>
<kubernetes-api-retries>20</kubernetes-api-retries>
</kubernetes>
</join>
</network>
<user-code-deployment enabled="true">
<class-cache-mode>ETERNAL</class-cache-mode>
<provider-mode>LOCAL_AND_CACHED_CLASSES</provider-mode>
</user-code-deployment>
<reliable-topic name="ConfirmationTimeout">
<read-batch-size>10</read-batch-size>
<topic-overload-policy>DISCARD_OLDEST</topic-overload-policy>
<statistics-enabled>true</statistics-enabled>
</reliable-topic>
<ringbuffer name="ConfirmationTimeout">
<capacity>10000</capacity>
<backup-count>1</backup-count>
<async-backup-count>0</async-backup-count>
<time-to-live-seconds>0</time-to-live-seconds>
<in-memory-format>BINARY</in-memory-format>
<merge-policy batch-size="100">com.hazelcast.spi.merge.PutIfAbsentMergePolicy</merge-policy>
</ringbuffer>
<scheduled-executor-service name="ConfirmationTimeout">
<capacity>100</capacity>
<capacity-policy>PER_NODE</capacity-policy>
<pool-size>32</pool-size>
<durability>3</durability>
<merge-policy batch-size="100">com.hazelcast.spi.merge.PutIfAbsentMergePolicy</merge-policy>
</scheduled-executor-service>
<cp-subsystem>
<cp-member-count>3</cp-member-count>
<group-size>3</group-size>
<session-time-to-live-seconds>300</session-time-to-live-seconds>
<session-heartbeat-interval-seconds>5</session-heartbeat-interval-seconds>
<missing-cp-member-auto-removal-seconds>14400</missing-cp-member-auto-removal-seconds>
<fail-on-indeterminate-operation-state>false</fail-on-indeterminate-operation-state>
<raft-algorithm>
<leader-election-timeout-in-millis>15000</leader-election-timeout-in-millis>
<leader-heartbeat-period-in-millis>5000</leader-heartbeat-period-in-millis>
<max-missed-leader-heartbeat-count>10</max-missed-leader-heartbeat-count>
<append-request-max-entry-count>100</append-request-max-entry-count>
<commit-index-advance-count-to-snapshot>10000</commit-index-advance-count-to-snapshot>
<uncommitted-entry-count-to-reject-new-appends>100</uncommitted-entry-count-to-reject-new-appends>
<append-request-backoff-timeout-in-millis>100</append-request-backoff-timeout-in-millis>
</raft-algorithm>
<locks>
<fenced-lock>
<name>TimeoutLock</name>
<lock-acquire-limit>1</lock-acquire-limit>
</fenced-lock>
</locks>
</cp-subsystem>
<metrics enabled="true">
<management-center>
<retention-seconds>30</retention-seconds>
</management-center>
<jmx enabled="false"/>
<collection-frequency-seconds>10</collection-frequency-seconds>
</metrics>
</hazelcast>
I have tested this deployment file on an HA Kubernetes cluster that uses the default (non-custom) SSL certificates, and it works without any problems.
These are the log files:
########################################
# JAVA_OPTS=-Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/hazelcast/logging.properties -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.rest.enabled=true -Dhazelcast.config=/data/hazelcast/hazelcast.xml
# CLASSPATH=/opt/hazelcast/*:/opt/hazelcast/lib/*:/opt/hazelcast/user-lib/*
# CLASSPATH_DEFAULT=/opt/hazelcast/*:/opt/hazelcast/lib/*:/opt/hazelcast/user-lib/*
# starting now....
########################################
+ exec java -server -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/hazelcast/logging.properties -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.rest.enabled=true -Dhazelcast.config=/data/hazelcast/hazelcast.xml com.hazelcast.core.server.HazelcastMemberStarter
Sep 08, 2020 7:08:36 AM com.hazelcast.internal.config.AbstractConfigLocator
INFO: Loading configuration '/data/hazelcast/hazelcast.xml' from System property 'hazelcast.config'
Sep 08, 2020 7:08:37 AM com.hazelcast.internal.config.AbstractConfigLocator
INFO: Using configuration file at /data/hazelcast/hazelcast.xml
Sep 08, 2020 7:08:40 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.2] Prefer IPv4 stack is true, prefer IPv6 addresses is false
Sep 08, 2020 7:08:40 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.2] Picked [10.42.128.11]:5701, using socket ServerSocket[addr=/0.0.0.0,localport=5701], bind any local is true
Sep 08, 2020 7:08:40 AM com.hazelcast.system
INFO: [10.42.128.11]:5701 [dev] [4.0.2] Hazelcast 4.0.2 (20200702 - 2de3027) starting at [10.42.128.11]:5701
Sep 08, 2020 7:08:40 AM com.hazelcast.system
INFO: [10.42.128.11]:5701 [dev] [4.0.2] Copyright (c) 2008-2020, Hazelcast, Inc. All Rights Reserved.
Sep 08, 2020 7:08:42 AM com.hazelcast.spi.impl.operationservice.impl.BackpressureRegulator
INFO: [10.42.128.11]:5701 [dev] [4.0.2] Backpressure is disabled
Sep 08, 2020 7:08:43 AM com.hazelcast.spi.discovery.integration.DiscoveryService
INFO: [10.42.128.11]:5701 [dev] [4.0.2] Kubernetes Discovery properties: { service-dns: null, service-dns-timeout: 5, service-name: hazelcast-service, service-port: 0, service-label: null, service-label-value: true, namespace: default, pod-label: null, pod-label-value: null, resolve-not-ready-addresses: true, use-node-name-as-external-address: false, kubernetes-api-retries: 20, kubernetes-master: https://kubernetes.default.svc}
Sep 08, 2020 7:08:43 AM com.hazelcast.spi.discovery.integration.DiscoveryService
INFO: [10.42.128.11]:5701 [dev] [4.0.2] Kubernetes Discovery activated with mode: KUBERNETES_API
Sep 08, 2020 7:08:43 AM com.hazelcast.instance.impl.Node
INFO: [10.42.128.11]:5701 [dev] [4.0.2] Activating Discovery SPI Joiner
Sep 08, 2020 7:08:43 AM com.hazelcast.cp.CPSubsystem
INFO: [10.42.128.11]:5701 [dev] [4.0.2] CP Subsystem is enabled with 3 members.
Sep 08, 2020 7:08:44 AM com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl
INFO: [10.42.128.11]:5701 [dev] [4.0.2] Starting 2 partition threads and 3 generic threads (1 dedicated for priority tasks)
Sep 08, 2020 7:08:44 AM com.hazelcast.internal.diagnostics.Diagnostics
INFO: [10.42.128.11]:5701 [dev] [4.0.2] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
Sep 08, 2020 7:08:45 AM com.hazelcast.core.LifecycleService
INFO: [10.42.128.11]:5701 [dev] [4.0.2] [10.42.128.11]:5701 is STARTING
Sep 08, 2020 7:08:47 AM com.hazelcast.kubernetes.RetryUtils
WARNING: Couldn't discover Hazelcast members using Kubernetes API, [1] retrying in 1 seconds...
Sep 08, 2020 7:08:49 AM com.hazelcast.kubernetes.RetryUtils
WARNING: Couldn't discover Hazelcast members using Kubernetes API, [2] retrying in 2 seconds...
Sep 08, 2020 7:08:51 AM com.hazelcast.kubernetes.RetryUtils
WARNING: Couldn't discover Hazelcast members using Kubernetes API, [3] retrying in 3 seconds...
Sep 08, 2020 7:08:54 AM com.hazelcast.kubernetes.RetryUtils
WARNING: Couldn't discover Hazelcast members using Kubernetes API, [4] retrying in 5 seconds...
Sep 08, 2020 7:09:00 AM com.hazelcast.kubernetes.RetryUtils
WARNING: Couldn't discover Hazelcast members using Kubernetes API, [5] retrying in 7 seconds...
Sep 08, 2020 7:09:07 AM com.hazelcast.kubernetes.RetryUtils
WARNING: Couldn't discover Hazelcast members using Kubernetes API, [6] retrying in 11 seconds...
Sep 08, 2020 7:09:12 AM com.hazelcast.internal.ascii.rest.HttpPostCommandProcessor
WARNING: [10.42.128.11]:5701 [dev] [4.0.2] An error occurred while handling request HttpCommand [HTTP_GET]{uri='/hazelcast/health/node-state'}AbstractTextCommand[HTTP_GET]{requestId=0}
java.lang.NullPointerException
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handleHealthcheck(HttpGetCommandProcessor.java:137)
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handle(HttpGetCommandProcessor.java:79)
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handle(HttpGetCommandProcessor.java:47)
at com.hazelcast.internal.ascii.TextCommandServiceImpl$CommandExecutor.run(TextCommandServiceImpl.java:396)
at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:217)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80)
Sep 08, 2020 7:09:14 AM com.hazelcast.internal.ascii.rest.HttpPostCommandProcessor
WARNING: [10.42.128.11]:5701 [dev] [4.0.2] An error occurred while handling request HttpCommand [HTTP_GET]{uri='/hazelcast/health/node-state'}AbstractTextCommand[HTTP_GET]{requestId=0}
java.lang.NullPointerException
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handleHealthcheck(HttpGetCommandProcessor.java:137)
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handle(HttpGetCommandProcessor.java:79)
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handle(HttpGetCommandProcessor.java:47)
at com.hazelcast.internal.ascii.TextCommandServiceImpl$CommandExecutor.run(TextCommandServiceImpl.java:396)
at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:217)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80)
Sep 08, 2020 7:09:19 AM com.hazelcast.kubernetes.RetryUtils
WARNING: Couldn't discover Hazelcast members using Kubernetes API, [7] retrying in 17 seconds...
Sep 08, 2020 7:09:22 AM com.hazelcast.internal.ascii.rest.HttpPostCommandProcessor
WARNING: [10.42.128.11]:5701 [dev] [4.0.2] An error occurred while handling request HttpCommand [HTTP_GET]{uri='/hazelcast/health/node-state'}AbstractTextCommand[HTTP_GET]{requestId=0}
java.lang.NullPointerException
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handleHealthcheck(HttpGetCommandProcessor.java:137)
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handle(HttpGetCommandProcessor.java:79)
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handle(HttpGetCommandProcessor.java:47)
at com.hazelcast.internal.ascii.TextCommandServiceImpl$CommandExecutor.run(TextCommandServiceImpl.java:396)
at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:217)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80)
Sep 08, 2020 7:09:24 AM com.hazelcast.internal.ascii.rest.HttpPostCommandProcessor
WARNING: [10.42.128.11]:5701 [dev] [4.0.2] An error occurred while handling request HttpCommand [HTTP_GET]{uri='/hazelcast/health/node-state'}AbstractTextCommand[HTTP_GET]{requestId=0}
java.lang.NullPointerException
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handleHealthcheck(HttpGetCommandProcessor.java:137)
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handle(HttpGetCommandProcessor.java:79)
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handle(HttpGetCommandProcessor.java:47)
at com.hazelcast.internal.ascii.TextCommandServiceImpl$CommandExecutor.run(TextCommandServiceImpl.java:396)
at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:217)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80)
Sep 08, 2020 7:09:32 AM com.hazelcast.internal.ascii.rest.HttpPostCommandProcessor
WARNING: [10.42.128.11]:5701 [dev] [4.0.2] An error occurred while handling request HttpCommand [HTTP_GET]{uri='/hazelcast/health/node-state'}AbstractTextCommand[HTTP_GET]{requestId=0}
java.lang.NullPointerException
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handleHealthcheck(HttpGetCommandProcessor.java:137)
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handle(HttpGetCommandProcessor.java:79)
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handle(HttpGetCommandProcessor.java:47)
at com.hazelcast.internal.ascii.TextCommandServiceImpl$CommandExecutor.run(TextCommandServiceImpl.java:396)
at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:217)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80)
Sep 08, 2020 7:09:34 AM com.hazelcast.internal.ascii.rest.HttpPostCommandProcessor
WARNING: [10.42.128.11]:5701 [dev] [4.0.2] An error occurred while handling request HttpCommand [HTTP_GET]{uri='/hazelcast/health/node-state'}AbstractTextCommand[HTTP_GET]{requestId=0}
java.lang.NullPointerException
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handleHealthcheck(HttpGetCommandProcessor.java:137)
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handle(HttpGetCommandProcessor.java:79)
at com.hazelcast.internal.ascii.rest.HttpGetCommandProcessor.handle(HttpGetCommandProcessor.java:47)
at com.hazelcast.internal.ascii.TextCommandServiceImpl$CommandExecutor.run(TextCommandServiceImpl.java:396)
at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:217)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80)
This is our custom Dockerfile for Hazelcast (we needed to make some changes to the stock image):
FROM alpine:3.11
# Versions of Hazelcast and Hazelcast plugins
ARG HZ_VERSION=4.0.2
ARG CACHE_API_VERSION=1.1.1
ARG JMX_PROMETHEUS_AGENT_VERSION=0.13.0
ARG BUCKET4J_VERSION=4.10.0
# Build constants
ARG HZ_HOME="/opt/hazelcast"
# JARs to download
# for lib directory:
ARG HAZELCAST_ALL_URL="https://repo1.maven.org/maven2/com/hazelcast/hazelcast-all/${HZ_VERSION}/hazelcast-all-${HZ_VERSION}.jar"
# for user-lib directory:
ARG JCACHE_API_URL="https://repo1.maven.org/maven2/javax/cache/cache-api/${CACHE_API_VERSION}/cache-api-${CACHE_API_VERSION}.jar"
ARG PROMETHEUS_AGENT_URL="https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/${JMX_PROMETHEUS_AGENT_VERSION}/jmx_prometheus_javaagent-${JMX_PROMETHEUS_AGENT_VERSION}.jar"
ARG BUCKET4J_CORE_URL="https://repo1.maven.org/maven2/com/github/vladimir-bukhtoyarov/bucket4j-core/${BUCKET4J_VERSION}/bucket4j-core-${BUCKET4J_VERSION}.jar"
ARG BUCKET4J_HAZELCAST_URL="https://repo1.maven.org/maven2/com/github/vladimir-bukhtoyarov/bucket4j-hazelcast/${BUCKET4J_VERSION}/bucket4j-hazelcast-${BUCKET4J_VERSION}.jar"
ARG BUCKET4J_JCACHE_URL="https://repo1.maven.org/maven2/com/github/vladimir-bukhtoyarov/bucket4j-jcache/${BUCKET4J_VERSION}/bucket4j-jcache-${BUCKET4J_VERSION}.jar"
# Runtime constants / variables
ENV HZ_HOME="${HZ_HOME}" \
CLASSPATH_DEFAULT="${HZ_HOME}/*:${HZ_HOME}/lib/*:${HZ_HOME}/user-lib/*" \
JAVA_OPTS_DEFAULT="-Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=${HZ_HOME}/logging.properties -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED" \
PROMETHEUS_PORT="" \
PROMETHEUS_CONFIG="${HZ_HOME}/jmx_agent_config.yaml" \
LOGGING_LEVEL="" \
CLASSPATH="" \
JAVA_OPTS=""
# Expose port
EXPOSE 5701
COPY *.sh *.yaml *.jar *.properties ${HZ_HOME}/
RUN echo "Updating Alpine system" \
&& apk upgrade --update-cache --available \
&& echo "Installing new APK packages" \
&& apk add openjdk11-jre bash curl procps nss
RUN mkdir "${HZ_HOME}/user-lib"\
&& cd "${HZ_HOME}/user-lib" \
&& for USER_JAR_URL in ${JCACHE_API_URL} ${PROMETHEUS_AGENT_URL} ${BUCKET4J_CORE_URL} ${BUCKET4J_HAZELCAST_URL} ${BUCKET4J_JCACHE_URL}; do curl -sf -O -L ${USER_JAR_URL}; done
# Install
RUN echo "Downloading Hazelcast and related JARs" \
&& mkdir "${HZ_HOME}/lib" \
&& cd "${HZ_HOME}/lib" \
&& for JAR_URL in ${HAZELCAST_ALL_URL}; do curl -sf -O -L ${JAR_URL}; done \
&& echo "Granting read permission to ${HZ_HOME}" \
&& chmod 755 -R ${HZ_HOME} \
&& echo "Setting Pardot ID to 'docker'" \
&& echo 'hazelcastDownloadId=docker' > "${HZ_HOME}/hazelcast-download.properties" \
&& echo "Cleaning APK packages" \
&& rm -rf /var/cache/apk/*
WORKDIR ${HZ_HOME}
# Start Hazelcast server
CMD ["/opt/hazelcast/start-hazelcast.sh"]
The Hazelcast Kubernetes discovery plugin does not allow you to specify a custom location for the certificates. They are always read from the default location: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt.
The parameter ca-certificate lets you inline the certificate itself, but not specify a path to the certificate file.
If you think such a feature would be useful, feel free to create a GitHub issue at https://github.com/hazelcast/hazelcast-kubernetes (you can also send a PR with the change).
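For example, a rough sketch of inlining a custom cluster CA in the join configuration (assuming ca-certificate can be set alongside the other kubernetes properties shown above; the PEM content is a placeholder for your own certificate):
<kubernetes enabled="true">
    <service-name>hazelcast-service</service-name>
    <namespace>default</namespace>
    <kubernetes-api-retries>20</kubernetes-api-retries>
    <!-- placeholder: paste the PEM of your cluster's CA certificate here -->
    <ca-certificate>
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
    </ca-certificate>
</kubernetes>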
Related
I am trying to make the Airflow KubernetesPodOperator work with minikube, but unfortunately the operator does not find the Kubernetes cluster.
The DAG returned the following error:
ERROR - HTTPSConnectionPool(host='192.168.49.2', port=8443): Max retries exceeded with url: /api/v1/namespaces/default/pods?labelSelector=dag_id%... (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f9a63cdcc10>: Failed to establish a new connection: [Errno 110] Connection timed out'))
I suspect an error in my config file definition, which you will find below. The server address I used is the one returned by minikube ip.
apiVersion: v1
clusters:
- cluster:
certificate-authority: ./ca.crt
extensions:
- extension:
last-update: Tue, 26 Apr 2022 15:16:20 CEST
provider: minikube.sigs.k8s.io
version: v1.25.2
name: cluster_info
server: https://192.168.49.2:8443
name: minikube
contexts:
- context:
cluster: minikube
extensions:
- extension:
last-update: Tue, 26 Apr 2022 15:16:20 CEST
provider: minikube.sigs.k8s.io
version: v1.25.2
name: context_info
namespace: default
user: minikube
name: minikube
current-context: minikube
kind: Config
preferences: {}
users:
- name: minikube
user:
client-certificate: ./client.crt
client-key: ./client.key
Any ideas what I could have done wrong?
Thanks!
I am running Kafka Streams 3.1.0 on an AWS OCP cluster, and I am facing this error when the pod restarts:
10:33:18,529 [INFO ] Loaded Kafka Streams properties {topology.optimization=all, processing.guarantee=at_least_once, bootstrap.servers=PLAINTEXT://app-kafka-headless.app.svc.cluster.local:9092, state.dir=/var/data/state-store, metrics.recording.level=INFO, consumer.auto.offset.reset=earliest, cache.max.bytes.buffering=10485760, producer.compression.type=lz4, num.stream.threads=3, application.id=AppProcessor}
10:33:18,572 [ERROR] Error changing permissions for the directory /var/data/state-store
java.nio.file.FileSystemException: /var/data/state-store: Operation not permitted
at java.base/sun.nio.fs.UnixException.translateToIOException(Unknown Source)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source)
at java.base/sun.nio.fs.UnixFileAttributeViews$Posix.setMode(Unknown Source)
at java.base/sun.nio.fs.UnixFileAttributeViews$Posix.setPermissions(Unknown Source)
at java.base/java.nio.file.Files.setPosixFilePermissions(Unknown Source)
at org.apache.kafka.streams.processor.internals.StateDirectory.configurePermissions(StateDirectory.java:154)
at org.apache.kafka.streams.processor.internals.StateDirectory.<init>(StateDirectory.java:144)
at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:867)
at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:851)
at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:821)
at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:733)
at com.xyz.app.kafka.streams.AbstractProcessing.run(AbstractProcessing.java:54)
at com.xyz.app.kafka.streams.AppProcessor.main(AppProcessor.java:97)
10:33:18,964 [INFO ] Topologies:
Sub-topology: 0
Source: app-stream (topics: [app-app-stream])
--> KSTREAM-AGGREGATE-0000000002
Processor: KSTREAM-AGGREGATE-0000000002 (stores: [KSTREAM-AGGREGATE-STATE-STORE-0000000001])
--> none
<-- app-stream
10:33:18,991 [WARN ] stream-thread [main] Failed to delete state store directory of /var/data/state-store/AppProcessor for it is not empty
On the OCP cluster, the user running the app is assigned by the cluster, and the state store is backed by a persistent volume (allowing the pod to restart with the same data), so the /var/data/state-store/ folder has the following permissions, drwxrwsr-x. (u:root g:1001030000):
1001030000@app-processor-0:/$ ls -al /var/data/state-store/
total 24
drwxrwsr-x. 4 root 1001030000 4096 Mar 21 10:43 .
drwxr-xr-x. 3 root root 25 Mar 23 11:04 ..
drwxr-x---. 2 1001030000 1001030000 4096 Mar 23 11:04 AppProcessor
drwxrws---. 2 root 1001030000 16384 Mar 21 10:36 lost+found
1001030000@app-processor-0:/$ chmod 750 /var/data/state-store/
chmod: changing permissions of '/var/data/state-store/': Operation not permitted
The relevant parts of the pod manifest are:
spec:
containers:
- name: app-processor
volumeMounts:
- mountPath: /var/data/state-store
name: data
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- SETGID
- SETUID
securityContext:
fsGroup: 1001030000
runAsUser: 1001030000
seLinuxOptions:
level: s0:c32,c19
volumes:
- name: data
persistentVolumeClaim:
claimName: data-app-processor-0
How should this be handled?
Should we use a subPath on the volumeMount?
Thanks for your insights.
As suggested, the fix I found was to set a subPath below the mountPath.
Here is the relevant part of the Helm template used:
spec:
containers:
- name: app-processor
volumeMounts:
- name: data
mountPath: {{ dir .Values.streams.state_dir | default "/var/data/" }}
subPath: {{ base .Values.streams.state_dir | default "state-store" }}
Where .Values.streams.state_dir is mapped to the Kafka Streams property state.dir.
Note that this value is mandatory and must be set in the values.
With this setup the state-store directory is created by the securityContext.runAsUser user instead of root, so the org.apache.kafka.streams.processor.internals.StateDirectory class can enforce the permissions.
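For reference, with state.dir set to /var/data/state-store (as in the question), the rendered volume mount would look roughly like this:
volumeMounts:
- name: data
  # parent directory of state.dir
  mountPath: /var/data
  # last path element of state.dir, created inside the PV as the runAsUser user
  subPath: state-store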
I'm new to Kubernetes and trying to do a simple project connecting MySQL and phpMyAdmin using Kubernetes on my Ubuntu 20.04. I created the needed components, which are shown below.
mysql.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: mysql-deployment
labels:
app: mysql
spec:
replicas: 1
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
containers:
- name: mysql
image: mysql
ports:
- containerPort: 3306
env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secret
key: mysql-root-password
- name: MYSQL_USER
valueFrom:
secretKeyRef:
name: mysql-secret
key: mysql-user-username
- name: MYSQL_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secret
key: mysql-user-password
- name: MYSQL_DATABASE
valueFrom:
configMapKeyRef:
name: mysql-configmap
key: mysql-database
---
apiVersion: v1
kind: Service
metadata:
name: mysql-service
spec:
selector:
app: mysql
ports:
- protocol: TCP
port: 3306
targetPort: 3306
phpmyadmin.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: phpmyadmin
labels:
app: phpmyadmin
spec:
replicas: 1
selector:
matchLabels:
app: phpmyadmin
template:
metadata:
labels:
app: phpmyadmin
spec:
containers:
- name: phpmyadmin
image: phpmyadmin
ports:
- containerPort: 3000
env:
- name: PMA_HOST
valueFrom:
configMapKeyRef:
name: mysql-configmap
key: database_url
- name: PMA_PORT
value: "3306"
- name: PMA_USER
valueFrom:
secretKeyRef:
name: mysql-secret
key: mysql-user-username
- name: PMA_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secret
key: mysql-user-password
---
apiVersion: v1
kind: Service
metadata:
name: phpmyadmin-service
spec:
selector:
app: phpmyadmin
ports:
- protocol: TCP
port: 8080
targetPort: 3000
ingress-service.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-service
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
spec:
defaultBackend:
service:
name: phpmyadmin-service
port:
number: 8080
rules:
- host: test.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: phpmyadmin-service
port:
number: 8080
When I execute microk8s kubectl get ingress ingress-service, the output is:
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress-service public test.com 127.0.0.1 80 45s
and when I tried to access test.com, I got a 502 error.
My kubectl version:
Client Version: v1.22.2-3+9ad9ee77396805
Server Version: v1.22.2-3+9ad9ee77396805
My microk8s client and server versions:
Client:
Version: v1.5.2
Revision: 36cc874494a56a253cd181a1a685b44b58a2e34a
Go version: go1.15.15
Server:
Version: v1.5.2
Revision: 36cc874494a56a253cd181a1a685b44b58a2e34a
UUID: b2bf55ad-6942-4824-99c8-c56e1dee5949
As for the microk8s version itself, I followed the installation instructions from here, so it should be 1.21/stable. (I couldn't find a way to check the exact version; if someone knows how, please tell me.)
mysql.yaml logs:
2021-10-14 07:05:38+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.26-1debian10 started.
2021-10-14 07:05:38+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2021-10-14 07:05:38+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.26-1debian10 started.
2021-10-14 07:05:38+00:00 [Note] [Entrypoint]: Initializing database files
2021-10-14T07:05:38.960693Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.26) initializing of server in progress as process 41
2021-10-14T07:05:38.967970Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-10-14T07:05:39.531763Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-10-14T07:05:40.591862Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1 is enabled for channel mysql_main
2021-10-14T07:05:40.592247Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1.1 is enabled for channel mysql_main
2021-10-14T07:05:40.670594Z 6 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.
2021-10-14 07:05:45+00:00 [Note] [Entrypoint]: Database files initialized
2021-10-14 07:05:45+00:00 [Note] [Entrypoint]: Starting temporary server
2021-10-14T07:05:45.362827Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.26) starting as process 90
2021-10-14T07:05:45.486702Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-10-14T07:05:45.845971Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-10-14T07:05:46.022043Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1 is enabled for channel mysql_main
2021-10-14T07:05:46.022189Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1.1 is enabled for channel mysql_main
2021-10-14T07:05:46.023446Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2021-10-14T07:05:46.023728Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2021-10-14T07:05:46.026088Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
2021-10-14T07:05:46.044967Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: /var/run/mysqld/mysqlx.sock
2021-10-14T07:05:46.045036Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.26' socket: '/var/run/mysqld/mysqld.sock' port: 0 MySQL Community Server - GPL.
2021-10-14 07:05:46+00:00 [Note] [Entrypoint]: Temporary server started.
Warning: Unable to load '/usr/share/zoneinfo/iso3166.tab' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/leap-seconds.list' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/zone.tab' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/zone1970.tab' as time zone. Skipping it.
2021-10-14 07:05:48+00:00 [Note] [Entrypoint]: Creating database testing-database
2021-10-14 07:05:48+00:00 [Note] [Entrypoint]: Creating user testinguser
2021-10-14 07:05:48+00:00 [Note] [Entrypoint]: Giving user testinguser access to schema testing-database
2021-10-14 07:05:48+00:00 [Note] [Entrypoint]: Stopping temporary server
2021-10-14T07:05:48.422053Z 13 [System] [MY-013172] [Server] Received SHUTDOWN from user root. Shutting down mysqld (Version: 8.0.26).
2021-10-14T07:05:50.543822Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.26) MySQL Community Server - GPL.
2021-10-14 07:05:51+00:00 [Note] [Entrypoint]: Temporary server stopped
2021-10-14 07:05:51+00:00 [Note] [Entrypoint]: MySQL init process done. Ready for start up.
2021-10-14T07:05:51.711889Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.26) starting as process 1
2021-10-14T07:05:51.725302Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-10-14T07:05:51.959356Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-10-14T07:05:52.162432Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1 is enabled for channel mysql_main
2021-10-14T07:05:52.162568Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1.1 is enabled for channel mysql_main
2021-10-14T07:05:52.163400Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2021-10-14T07:05:52.163556Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2021-10-14T07:05:52.165840Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
2021-10-14T07:05:52.181516Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/run/mysqld/mysqlx.sock
2021-10-14T07:05:52.181562Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.26' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server - GPL.
phpmyadmin.yaml logs:
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.1.114.139. Set the 'ServerName' directive globally to suppress this message
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.1.114.139. Set the 'ServerName' directive globally to suppress this message
[Thu Oct 14 03:57:32.653011 2021] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.51 (Debian) PHP/7.4.24 configured -- resuming normal operations
[Thu Oct 14 03:57:32.653240 2021] [core:notice] [pid 1] AH00094: Command line: 'apache2 -D FOREGROUND'
Here is also the Allocatable section from the describe nodes command:
Allocatable:
cpu: 4
ephemeral-storage: 113289380Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 5904508Ki
pods: 110
and the Allocated resources:
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 550m (13%) 200m (5%)
memory 270Mi (4%) 370Mi (6%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Any help? Thanks in advance.
Turns out it was a fudged-up mistake of mine: I specified phpMyAdmin's container port as 3000, while the default image listens on port 80. After changing the containerPort and the phpmyadmin-service's targetPort to 80, the phpMyAdmin page opens.
So sorry to kkopczak and AndD for the fuss, and big thanks for trying to help! :)
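For completeness, a sketch of the corrected port settings (the stock phpmyadmin image listens on port 80):
# in the phpmyadmin Deployment container spec
ports:
- containerPort: 80
# in phpmyadmin-service
ports:
- protocol: TCP
  port: 8080
  targetPort: 80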
I have pgpool hosted in a container, and below is the container config for the Kubernetes deployment.
Mount paths:
- name: cgroup
mountPath: /sys/fs/cgroup:ro
- name: var-run
mountPath: /run
And the volumes for those mount paths (including the cgroups one) are defined as below:
- name: cgroup
hostPath:
path: /sys/fs/cgroup
type: Directory
- name: var-run
emptyDir:
medium: Memory
Also, in the Kubernetes deployment I have set:
securityContext:
privileged: true
But when I exec into the pod to check the pgpool status, I get the issue below:
[root@app-pg-6448dfb58d-vzk67 /]# journalctl -xeu pgpool
-- Logs begin at Sat 2020-07-04 16:28:41 UTC, end at Sat 2020-07-04 16:29:13 UTC. --
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 systemd[1]: Started Pgpool-II.
-- Subject: Unit pgpool.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit pgpool.service has finished starting up.
--
-- The start-up result is done.
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: [1-1] 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "stateme
nt_level_load_balance"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "statement_lev
el_load_balance"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "auto_failback
"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "auto_failback
_interval"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "enable_consen
sus_with_half_votes"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "enable_shared
_relcache"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "relcache_quer
y_target"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: FATAL: could not open pid file as /var/run/pgpool-II-11/p
gpool.pid. reason: No such file or directory
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 systemd[1]: pgpool.service: main process exited, code=exited, status=3/NOTIMPLEMENTED
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 systemd[1]: Unit pgpool.service entered failed state.
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 systemd[1]: pgpool.service failed.
systemctl status pgpool inside the pod container:
➜ app-app kubectl exec -it app-pg-6448dfb58d-vzk67 -- bash
[root@app-pg-6448dfb58d-vzk67 /]# systemctl status pgpool
● pgpool.service - Pgpool-II
Loaded: loaded (/usr/lib/systemd/system/pgpool.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sat 2020-07-04 16:28:41 UTC; 1h 39min ago
Process: 34 ExecStart=/usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf $OPTS (code=exited, status=3)
Main PID: 34 (code=exited, status=3)
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "stat...lance"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "auto...lback"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "auto...erval"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "enab...votes"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "enab...cache"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "relc...arget"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: FATAL: could not open pid file as /var/run/pgpoo...ectory
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 systemd[1]: pgpool.service: main process exited, code=exited, status=3/NOTIMPLEMENTED
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 systemd[1]: Unit pgpool.service entered failed state.
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 systemd[1]: pgpool.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
If required, this is the whole deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-pg
labels:
helm.sh/chart: app-pgpool-1.0.0
app.kubernetes.io/name: app-pgpool
app.kubernetes.io/instance: app-service
app.kubernetes.io/version: "1.0.3"
app.kubernetes.io/managed-by: Helm
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: app-pgpool
app.kubernetes.io/instance: app-service
template:
metadata:
labels:
app.kubernetes.io/name: app-pgpool
app.kubernetes.io/instance: app-service
spec:
volumes:
- name: "pgpool-config"
persistentVolumeClaim:
claimName: "pgpool-pvc"
- name: cgroup
hostPath:
path: /sys/fs/cgroup
type: Directory
- name: var-run
emptyDir:
# Tmpfs needed for systemd.
medium: Memory
# volumes:
# - name: pgpool-config
# configMap:
# name: pgpool-config
# - name: pgpool-config
# azureFile:
# secretName: azure-fileshare-secret
# shareName: pgpool
# readOnly: false
imagePullSecrets:
- name: app-secret
serviceAccountName: app-pg
securityContext:
{}
containers:
- name: app-pgpool
securityContext:
{}
image: "appacr.azurecr.io/pgpool:1.0.3"
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
stdin: true
tty: true
ports:
- name: http
containerPort: 9999
protocol: TCP
# livenessProbe:
# httpGet:
# path: /
# port: http
# readinessProbe:
# httpGet:
# path: /
# port: http
resources:
{}
volumeMounts:
- name: "pgpool-config"
mountPath: /etc/pgpool-II
- name: cgroup
mountPath: /sys/fs/cgroup:ro
- name: var-run
mountPath: /run
UPDATE:
Running this same setup with Docker Compose works perfectly, with no issues at all:
version: '2'
services:
pgpool:
container_name: pgpool
image: appacr.azurecr.io/pgpool:1.0.3
logging:
options:
max-size: 100m
ports:
- "9999:9999"
networks:
vpcbr:
ipv4_address: 10.5.0.2
restart: unless-stopped
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- $HOME/Documents/app/docker-compose/pgpool.conf:/etc/pgpool-II/pgpool.conf
- $HOME/Documents/app/docker-compose/pool_passwd:/etc/pgpool-II/pool_passwd
privileged: true
stdin_open: true
tty: true
I don't know what I am doing wrong; I am not able to start pgpool and cannot pinpoint the issue. What permission are we missing here? Is cgroups the culprit or not?
Some direction would be appreciated.
While this might not be a direct answer to your question, I have seen some very cryptic errors when trying to run any PostgreSQL product from raw manifests. My recommendation would be to try leveraging the chart from Bitnami; they have put a lot of effort into ensuring that all of the security/permission culprits are taken care of properly.
https://github.com/bitnami/charts/tree/master/bitnami/postgresql-ha
Hopefully this helps.
Also, if you do not want to use Helm, you can run the helm template command
https://helm.sh/docs/helm/helm_template/
which will generate manifests out of the chart's template files based on the provided values.yaml.
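A minimal sketch of that workflow (the release name, values file and output file names are placeholders):
helm repo add bitnami https://charts.bitnami.com/bitnami
# render the chart locally into plain manifests using your own values
helm template my-pg bitnami/postgresql-ha -f my-values.yaml > rendered-manifests.yaml
# apply the rendered manifests without Helm managing the release
kubectl apply -f rendered-manifests.yaml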
I'm trying to set up a Redis cluster and I followed this guide here: https://rancher.com/blog/2019/deploying-redis-cluster/
Basically I'm creating a StatefulSet with 6 replicas, so that I can have 3 master nodes and 3 slave nodes.
After all the nodes are up, I create the cluster, and it all works fine... but if I look into the file "nodes.conf" (where the configuration of all the nodes should be saved) on each Redis node, I can see it's empty.
This is a problem, because whenever a Redis node gets restarted, it looks in that file for its node configuration, to update its own IP address and MEET the other nodes, but it finds nothing, so it basically starts a new cluster on its own, with a new ID.
My storage is an NFS shared folder. The YAML responsible for the storage access is this one:
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
name: nfs-provisioner-raid5
spec:
replicas: 1
strategy:
type: Recreate
template:
metadata:
labels:
app: nfs-provisioner-raid5
spec:
serviceAccountName: nfs-provisioner-raid5
containers:
- name: nfs-provisioner-raid5
image: quay.io/external_storage/nfs-client-provisioner:latest
volumeMounts:
- name: nfs-raid5-root
mountPath: /persistentvolumes
env:
- name: PROVISIONER_NAME
value: 'nfs.raid5'
- name: NFS_SERVER
value: 10.29.10.100
- name: NFS_PATH
value: /raid5
volumes:
- name: nfs-raid5-root
nfs:
server: 10.29.10.100
path: /raid5
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: nfs-provisioner-raid5
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: nfs.raid5
provisioner: nfs.raid5
parameters:
archiveOnDelete: "false"
This is the YAML of the redis cluster StatefulSet:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: redis-cluster
labels:
app: redis-cluster
spec:
serviceName: redis-cluster
replicas: 6
selector:
matchLabels:
app: redis-cluster
template:
metadata:
labels:
app: redis-cluster
spec:
containers:
- name: redis
image: redis:5-alpine
ports:
- containerPort: 6379
name: client
- containerPort: 16379
name: gossip
command: ["/conf/fix-ip.sh", "redis-server", "/conf/redis.conf"]
readinessProbe:
exec:
command:
- sh
- -c
- "redis-cli -h $(hostname) ping"
initialDelaySeconds: 15
timeoutSeconds: 5
livenessProbe:
exec:
command:
- sh
- -c
- "redis-cli -h $(hostname) ping"
initialDelaySeconds: 20
periodSeconds: 3
env:
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
volumeMounts:
- name: conf
mountPath: /conf
readOnly: false
- name: data
mountPath: /data
readOnly: false
volumes:
- name: conf
configMap:
name: redis-cluster
defaultMode: 0755
volumeClaimTemplates:
- metadata:
name: data
labels:
name: redis-cluster
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: nfs.raid5
resources:
requests:
storage: 1Gi
This is the configMap:
apiVersion: v1
kind: ConfigMap
metadata:
name: redis-cluster
labels:
app: redis-cluster
data:
fix-ip.sh: |
#!/bin/sh
CLUSTER_CONFIG="/data/nodes.conf"
echo "creating nodes"
if [ -f ${CLUSTER_CONFIG} ]; then
if [ -z "${POD_IP}" ]; then
echo "Unable to determine Pod IP address!"
exit 1
fi
echo "Updating my IP to ${POD_IP} in ${CLUSTER_CONFIG}"
sed -i.bak -e "/myself/ s/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/${POD_IP}/" ${CLUSTER_CONFIG}
echo "done"
fi
exec "$@"
redis.conf: |+
cluster-enabled yes
cluster-require-full-coverage no
cluster-node-timeout 15000
cluster-config-file /data/nodes.conf
cluster-migration-barrier 1
appendonly yes
protected-mode no
and I created the cluster using the command:
kubectl exec -it redis-cluster-0 -- redis-cli --cluster create --cluster-replicas 1 $(kubectl get pods -l app=redis-cluster -o jsonpath='{range.items[*]}{.status.podIP}:6379 ')
What am I doing wrong?
This is what I see in the /data folder: the nodes.conf file shows 0 bytes.
Lastly, this is the log from the redis-cluster-0 pod:
creating nodes
1:C 07 Nov 2019 13:01:31.166 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 07 Nov 2019 13:01:31.166 # Redis version=5.0.4, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 07 Nov 2019 13:01:31.166 # Configuration loaded
1:M 07 Nov 2019 13:01:31.179 * No cluster configuration found, I'm e55801f9b5d52f4e599fe9dba5a0a1e8dde2cdcb
1:M 07 Nov 2019 13:01:31.182 * Running mode=cluster, port=6379.
1:M 07 Nov 2019 13:01:31.182 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 07 Nov 2019 13:01:31.182 # Server initialized
1:M 07 Nov 2019 13:01:31.182 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 07 Nov 2019 13:01:31.185 * Ready to accept connections
1:M 07 Nov 2019 13:08:04.264 # configEpoch set to 1 via CLUSTER SET-CONFIG-EPOCH
1:M 07 Nov 2019 13:08:04.306 # IP address for this node updated to 10.40.0.27
1:M 07 Nov 2019 13:08:09.216 # Cluster state changed: ok
1:M 07 Nov 2019 13:08:10.144 * Replica 10.44.0.14:6379 asks for synchronization
1:M 07 Nov 2019 13:08:10.144 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '27972faeb07fe922f1ab581cac0fe467c85c3efd', my replication IDs are '31944091ef93e3f7c004908e3ff3114fd733ea6a' and '0000000000000000000000000000000000000000')
1:M 07 Nov 2019 13:08:10.144 * Starting BGSAVE for SYNC with target: disk
1:M 07 Nov 2019 13:08:10.144 * Background saving started by pid 1041
1041:C 07 Nov 2019 13:08:10.161 * DB saved on disk
1041:C 07 Nov 2019 13:08:10.161 * RDB: 0 MB of memory used by copy-on-write
1:M 07 Nov 2019 13:08:10.233 * Background saving terminated with success
1:M 07 Nov 2019 13:08:10.243 * Synchronization with replica 10.44.0.14:6379 succeeded
thank you for the help.
This looks to be an issue with the shell script that is mounted from the ConfigMap. Can you update it as below?
fix-ip.sh: |
#!/bin/sh
CLUSTER_CONFIG="/data/nodes.conf"
echo "creating nodes"
if [ -f ${CLUSTER_CONFIG} ]; then
echo "[ INFO ]File:${CLUSTER_CONFIG} is Found"
else
touch $CLUSTER_CONFIG
fi
if [ -z "${POD_IP}" ]; then
echo "Unable to determine Pod IP address!"
exit 1
fi
echo "Updating my IP to ${POD_IP} in ${CLUSTER_CONFIG}"
sed -i.bak -e "/myself/ s/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/${POD_IP}/" ${CLUSTER_CONFIG}
echo "done"
exec "$@"
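To pick up the change, re-apply the ConfigMap and restart the pods, for example (a sketch; the manifest filename is a placeholder):
kubectl apply -f redis-cluster-configmap.yaml
kubectl rollout restart statefulset redis-cluster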
I just deployed with the updated script and it worked. See the output below:
master $ kubectl get po
NAME READY STATUS RESTARTS AGE
redis-cluster-0 1/1 Running 0 83s
redis-cluster-1 1/1 Running 0 54s
redis-cluster-2 1/1 Running 0 45s
redis-cluster-3 1/1 Running 0 38s
redis-cluster-4 1/1 Running 0 31s
redis-cluster-5 1/1 Running 0 25s
master $ kubectl exec -it redis-cluster-0 -- redis-cli --cluster create --cluster-replicas 1 $(kubectl get pods -l app=redis-cluster -o jsonpath='{range.items[*]}{.status.podIP}:6379 ')
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 10.40.0.4:6379 to 10.40.0.1:6379
Adding replica 10.40.0.5:6379 to 10.40.0.2:6379
Adding replica 10.40.0.6:6379 to 10.40.0.3:6379
M: 9984141f922bed94bfa3532ea5cce43682fa524c 10.40.0.1:6379
slots:[0-5460] (5461 slots) master
M: 76ebee0dd19692c2b6d95f0a492d002cef1c6c17 10.40.0.2:6379
slots:[5461-10922] (5462 slots) master
M: 045b27c73069bff9ca9a4a1a3a2454e9ff640d1a 10.40.0.3:6379
slots:[10923-16383] (5461 slots) master
S: 1bc8d1b8e2d05b870b902ccdf597c3eece7705df 10.40.0.4:6379
replicates 9984141f922bed94bfa3532ea5cce43682fa524c
S: 5b2b019ba8401d3a8c93a8133db0766b99aac850 10.40.0.5:6379
replicates 76ebee0dd19692c2b6d95f0a492d002cef1c6c17
S: d4b91700b2bb1a3f7327395c58b32bb4d3521887 10.40.0.6:6379
replicates 045b27c73069bff9ca9a4a1a3a2454e9ff640d1a
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
....
>>> Performing Cluster Check (using node 10.40.0.1:6379)
M: 9984141f922bed94bfa3532ea5cce43682fa524c 10.40.0.1:6379
slots:[0-5460] (5461 slots) master
1 additional replica(s)
M: 045b27c73069bff9ca9a4a1a3a2454e9ff640d1a 10.40.0.3:6379
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: 1bc8d1b8e2d05b870b902ccdf597c3eece7705df 10.40.0.4:6379
slots: (0 slots) slave
replicates 9984141f922bed94bfa3532ea5cce43682fa524c
S: d4b91700b2bb1a3f7327395c58b32bb4d3521887 10.40.0.6:6379
slots: (0 slots) slave
replicates 045b27c73069bff9ca9a4a1a3a2454e9ff640d1a
M: 76ebee0dd19692c2b6d95f0a492d002cef1c6c17 10.40.0.2:6379
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
S: 5b2b019ba8401d3a8c93a8133db0766b99aac850 10.40.0.5:6379
slots: (0 slots) slave
replicates 76ebee0dd19692c2b6d95f0a492d002cef1c6c17
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
master $ kubectl exec -it redis-cluster-0 -- redis-cli cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:61
cluster_stats_messages_pong_sent:76
cluster_stats_messages_sent:137
cluster_stats_messages_ping_received:71
cluster_stats_messages_pong_received:61
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:137
master $ for x in $(seq 0 5); do echo "redis-cluster-$x"; kubectl exec redis-cluster-$x -- redis-cli role;echo; done
redis-cluster-0
master
588
10.40.0.4
6379
588
redis-cluster-1
master
602
10.40.0.5
6379
602
redis-cluster-2
master
588
10.40.0.6
6379
588
redis-cluster-3
slave
10.40.0.1
6379
connected
602
redis-cluster-4
slave
10.40.0.2
6379
connected
602
redis-cluster-5
slave
10.40.0.3
6379
connected
588