I'm trying to run the Prometheus Operator on Kubernetes, but after trying to persist data on Rancher-RBD, I get this error:
level=info ts=2020-10-31T12:40:33.171Z caller=main.go:353 msg="Starting Prometheus" version="(version=2.22.0, branch=HEAD, revision=0a7fdd3b76960808c3a91d92267c3d815c1bc354)"
level=info ts=2020-10-31T12:40:33.171Z caller=main.go:358 build_context="(go=go1.15.3, user=root#6321101b2c50, date=20201015-12:29:59)"
level=info ts=2020-10-31T12:40:33.171Z caller=main.go:359 host_details="(Linux 4.14.35-1902.3.2.el7uek.x86_64 #2 SMP Tue Jul 30 03:59:02 GMT 2019 x86_64 prometheus-prometheus-0 (none))"
level=info ts=2020-10-31T12:40:33.171Z caller=main.go:360 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-10-31T12:40:33.171Z caller=main.go:361 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2020-10-31T12:40:33.173Z caller=query_logger.go:87 component=activeQueryTracker msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"
panic: Unable to create mmap-ed active query log
goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x7fff711299c3, 0xb, 0x14, 0x30867c0, 0xc000e6f050, 0x30867c0)
/app/promql/query_logger.go:117 +0x4cf
main.main()
/app/cmd/prometheus/main.go:388 +0x536c
This is my operator deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/name: prometheus-operator
app.kubernetes.io/version: v0.43.0
name: prometheus-operator
namespace: monitorings
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/component: controller
app.kubernetes.io/name: prometheus-operator
template:
metadata:
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/name: prometheus-operator
app.kubernetes.io/version: v0.43.0
spec:
containers:
- args:
- --kubelet-service=kube-system/kubelet
- --logtostderr=true
- --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.43.0
image: quay.io/prometheus-operator/prometheus-operator:v0.43.0
name: prometheus-operator
ports:
- containerPort: 8080
name: http
resources:
limits:
cpu: 200m
memory: 200Mi
requests:
cpu: 100m
memory: 100Mi
securityContext:
allowPrivilegeEscalation: false
nodeSelector:
beta.kubernetes.io/os: linux
securityContext:
runAsNonRoot: true
runAsUser: 65534
serviceAccountName: prometheus-operator
I tried to add an initContainer to change the permissions, but the problem still exists:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
name: prometheus
labels:
app: strimzi
spec:
replicas: 1
serviceAccountName: prometheus-server
podMonitorSelector:
matchLabels:
app: strimzi
resources:
requests:
memory: 400Mi
enableAdminAPI: false
ruleSelector:
matchLabels:
role: alert-rules
app: strimzi
alerting:
alertmanagers:
- namespace: monitorings
name: alertmanager
port: alertmanager
additionalScrapeConfigs:
name: additional-scrape-configs
key: prometheus-additional.yaml
imagePullSecrets:
- name: nexuspullsecret
initContainers:
- name: init
image: debian:stable
command: ["chmod", "-R", "777", "/mnt"]
volumeMounts:
- name: prometheus-prometheus-db
mountPath: /mnt
subPath: prometheus
storage:
volumeClaimTemplate:
spec:
storageClassName: rancher-rbd
name: prometheus-prometheus-db
resources:
requests:
storage: 10Gi
I changed my initContainers like this and now it works:
initContainers:
- name: "init-datapath"
image: debian:stable
command: ["chown", "-R", "65534:65534", "/data"]
command: ["/bin/chmod","-R","777","/data"]
volumeMounts:
- name: prometheus-prometheus-db
mountPath: /data
subPath: ""
storage:
volumeClaimTemplate:
spec:
storageClassName: rancher-rbd
name: prometheus-prometheus-db
resources:
requests:
storage: 10Gi
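An alternative to the init container, as a minimal sketch assuming the operator version in use exposes a pod-level securityContext on the Prometheus resource and that the Prometheus container runs as the nobody user (UID 65534, as the default image does), is to let the kubelet fix group ownership of the RBD volume via fsGroup:

spec:
  securityContext:
    runAsUser: 65534
    runAsGroup: 65534
    fsGroup: 65534

With fsGroup set, the mounted volume is group-owned by 65534 when it is attached, which is usually enough for Prometheus to create /prometheus/queries.active and its TSDB files without any chmod/chown step.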
Related
I have app-1 pods created by a StatefulSet, and in it I am creating PVCs as well:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: app-1
spec:
replicas: 3
selector:
matchLabels:
app: app-1
serviceName: "app-1"
template:
metadata:
labels:
app: app-1
spec:
containers:
- name: app-1
image: registry.k8s.io/nginx-slim:0.8
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 250m
memory: 256Mi
ports:
- containerPort: 4567
volumeMounts:
- name: app-1-state-volume-claim
mountPath: /app1Data
- name: app-2-data-volume-claim
mountPath: /app2Data
volumeClaimTemplates:
- metadata:
name: app-1-state-volume-claim
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "managed-csi-premium"
resources:
requests:
storage: 1Gi
- metadata:
name: app-2-data-volume-claim
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "managed-csi-premium"
resources:
requests:
storage: 1Gi
The state of app-1 is maintained in the PVC app-1-state-volume-claim.
app-1 also creates data for app-2 in the PVC app-2-data-volume-claim.
I want to access app-2-data-volume-claim in another workload, the DaemonSet described below:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: app-2
spec:
selector:
matchLabels:
name: app-2
template:
metadata:
labels:
name: app-2
spec:
containers:
- name: app-2
image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 200Mi
volumeMounts:
- name: app2Data
mountPath: /app2Data
volumes:
- name: app2Data
persistentVolumeClaim:
claimName: app-2-data-volume-claim
This is failing with the output below:
persistentvolumeclaim "app-2-data-volume-claim" not found
How can I do that? I cannot use an Azure file share due to an app-1 limitation.
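For what it's worth, a PVC created from a volumeClaimTemplates entry is named <template-name>-<statefulset-name>-<ordinal>, so no PVC literally called app-2-data-volume-claim ever exists. The claim the DaemonSet would have to reference looks more like the hypothetical sketch below (and, separately, a ReadWriteOnce volume can normally only be attached to one node at a time):

persistentVolumeClaim:
  # hypothetical generated name for replica 0: <template>-<statefulset>-<ordinal>
  claimName: app-2-data-volume-claim-app-1-0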
I have installed Nexus on a K3s Raspberry Pi cluster with the following setup, for Kubernetes learning purposes. First I created a StatefulSet:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: nexus
namespace: dev-ops
spec:
serviceName: "nexus"
replicas: 1
selector:
matchLabels:
app: nexus-server
template:
metadata:
labels:
app: nexus-server
spec:
containers:
- name: nexus
image: klo2k/nexus3:latest
env:
- name: MAX_HEAP
value: "800m"
- name: MIN_HEAP
value: "300m"
resources:
limits:
memory: "4Gi"
cpu: "1000m"
requests:
memory: "2Gi"
cpu: "500m"
ports:
- containerPort: 8081
volumeMounts:
- name: nexusstorage
mountPath: /sonatype-work
volumes:
- name: nexusstorage
persistentVolumeClaim:
claimName: nexusstorage
Storage class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: nexusstorage
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
numberOfReplicas: "3"
staleReplicaTimeout: "30"
fsType: "ext4"
diskSelector: "ssd"
nodeSelector: "ssd"
PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: nexusstorage
namespace: dev-ops
spec:
accessModes:
- ReadWriteOnce
storageClassName: nexusstorage
resources:
requests:
storage: 50Gi
Service
apiVersion: v1
kind: Service
metadata:
name: nexus-server
namespace: dev-ops
annotations:
prometheus.io/scrape: 'true'
prometheus.io/path: /
prometheus.io/port: '8081'
spec:
selector:
app: nexus-server
type: LoadBalancer
ports:
- port: 8081
targetPort: 8081
nodePort: 32000
This setup spins up Nexus, but if I restart the pod the data does not persist and I have to recreate all the settings and users from scratch.
What am I missing in this case?
UPDATE
I got it working: Nexus needs the right permissions on the mounted directory. The working StatefulSet looks as follows:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: nexus
namespace: dev-ops
spec:
serviceName: "nexus"
replicas: 1
selector:
matchLabels:
app: nexus-server
template:
metadata:
labels:
app: nexus-server
spec:
securityContext:
runAsUser: 200
runAsGroup: 200
fsGroup: 200
containers:
- name: nexus
image: klo2k/nexus3:latest
env:
- name: MAX_HEAP
value: "800m"
- name: MIN_HEAP
value: "300m"
resources:
limits:
memory: "4Gi"
cpu: "1000m"
requests:
memory: "2Gi"
cpu: "500m"
ports:
- containerPort: 8081
volumeMounts:
- name: nexus-storage
mountPath: /nexus-data
volumes:
- name: nexus-storage
persistentVolumeClaim:
claimName: nexus-storage
The important snippet to get it working:
securityContext:
runAsUser: 200
runAsGroup: 200
fsGroup: 200
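If an image cannot simply be run as a different UID, the same result can usually be achieved with a root init container that chowns the data directory before the main container starts, similar to the Prometheus case above. A minimal sketch, where the UID/GID 200 and the /nexus-data path are assumptions taken from this thread and should be checked against the actual image:

initContainers:
- name: fix-permissions
  image: busybox
  # chown the PVC contents to the assumed nexus UID/GID (200:200) before the main container starts
  command: ["sh", "-c", "chown -R 200:200 /nexus-data"]
  securityContext:
    runAsUser: 0
  volumeMounts:
  - name: nexus-storage
    mountPath: /nexus-data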
I'm not familiar with that image, although checking Docker Hub, they mention using a Dockerfile similar to Sonatype's. I would therefore change the mount point for your volume to /nexus-data.
This is the default path for storing data (they set this env var, then declare a VOLUME), which we can confirm by looking at the repository that most likely produced your ARM-capable image.
And following up on your last comment, let's try to also mount it at /opt/sonatype/sonatype-work/nexus3...
In your StatefulSet, change volumeMounts to this:
volumeMounts:
- name: nexusstorage
mountPath: /nexus-data
- name: nexusstorage
mountPath: /opt/sonatype/sonatype-work/nexus3
volumes:
- name: nexusstorage
persistentVolumeClaim:
claimName: nexusstorage
Although, as far as I understand, the second volumeMount entry should not be necessary. Maybe something is wrong with your storage provider?
Are you sure your PVC is writable? Reverting to your initial configuration, enter your pod (kubectl exec -it) and try to write a file at the root of your PVC.
I am trying to update my pod's time to the Asia/Kolkata zone, as per 'Kubernetes timezone in POD with command and argument'. However, the time still remains UTC; only the time zone name is updated from UTC to Asia.
I was able to fix it using volume mounts, as below: create a ConfigMap and apply the deployment YAML.
kubectl create configmap tz --from-file=/usr/share/zoneinfo/Asia/Kolkata -n <required namespace>
Why is the environment variable method not working? And if the pod is evicted from one host to another while using the volume-mounted time, will the time still be correct after the eviction?
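As a side note, the volume-mount YAML further below uses a hostPath for the zoneinfo file; here is a rough sketch of how the tz ConfigMap created above could be mounted instead (assuming --from-file produced a single key named Kolkata, its default behaviour):

volumeMounts:
- name: tz-config
  mountPath: /etc/localtime
  # project only the Kolkata key as the /etc/localtime file
  subPath: Kolkata
volumes:
- name: tz-config
  configMap:
    name: tz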
The env-variable deployment YAML, which does not update the time, is below:
apiVersion: apps/v1
kind: Deployment
metadata:
name: connector
labels:
app: connector
namespace: clients
spec:
replicas: 1
selector:
matchLabels:
app: connector
template:
metadata:
labels:
app: connector
spec:
containers:
- image: connector
name: connector
resources:
requests:
memory: "32Mi" # "64M"
cpu: "250m"
limits:
memory: "64Mi" # "128M"
cpu: "500m"
ports:
- containerPort: 3307
protocol: TCP
env:
- name: TZ
value: Asia/Kolkata
volumeMounts:
- name: connector-rd
mountPath: /home/mongobi/mongosqld.conf
subPath: mongosqld.conf
volumes:
- name: connector-rd
configMap:
name: connector-rd
items:
- key: mongod.conf
  path: mongosqld.conf
The volume-mount YAML is below:
apiVersion: apps/v1
kind: Deployment
metadata:
name: connector
labels:
app: connector
namespace: clients
spec:
replicas: 1
selector:
matchLabels:
app: connector
template:
metadata:
labels:
app: connector
spec:
containers:
- image: connector
name: connector
resources:
requests:
memory: "32Mi" # "64M"
cpu: "250m"
limits:
memory: "64Mi" # "128M"
cpu: "500m"
ports:
- containerPort: 3307
protocol: TCP
volumeMounts:
- name: tz-config
mountPath: /etc/localtime
- name: connector-rd
mountPath: /home/mongobi/mongosqld.conf
subPath: mongosqld.conf
volumes:
- name: connector-rd
configMap:
name: connector-rd
items:
- key: mongod.conf
path: mongosqld.conf
- name: tz-config
hostPath:
path: /usr/share/zoneinfo/Asia/Kolkata
In this scenario you need to set the type attribute to File for the hostPath volume in the deployment configuration. The configuration below should work for you:
- name: tz-config
hostPath:
path: /usr/share/zoneinfo/Asia/Kolkata
type: File
Simply setting the TZ env variable in the deployment works for me.
I am trying to install HDFS on an EKS cluster. I deployed a namenode and two datanodes, and all of them come up successfully.
But a strange problem is happening: when I check the namenode GUI or query the dfsadmin client for the datanode list, it randomly shows only one datanode, i.e. sometimes datanode-0 and sometimes datanode-1. It never displays both/all datanodes.
What could the issue be here? I am even using a headless service for the datanodes. Please help.
#clusterIP service of namenode
apiVersion: v1
kind: Service
metadata:
name: hdfs-name
namespace: pulse
labels:
app.kubernetes.io/name: hdfs-name
app.kubernetes.io/version: "1.0"
spec:
ports:
- port: 8020
protocol: TCP
name: nn-rpc
- port: 9870
protocol: TCP
name: nn-web
selector:
app.kubernetes.io/name: hdfs-name
app.kubernetes.io/version: "1.0"
type: ClusterIP
---
#namenode stateful deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: hdfs-name
namespace: pulse
labels:
app.kubernetes.io/name: hdfs-name
app.kubernetes.io/version: "1.0"
spec:
serviceName: hdfs-name
replicas: 1 #TODO 2 namenodes (1 active, 1 standby)
selector:
matchLabels:
app.kubernetes.io/name: hdfs-name
app.kubernetes.io/version: "1.0"
template:
metadata:
labels:
app.kubernetes.io/name: hdfs-name
app.kubernetes.io/version: "1.0"
spec:
initContainers:
- name: delete-lost-found
image: busybox
command: ["sh", "-c", "rm -rf /hadoop/dfs/name/lost+found"]
volumeMounts:
- name: hdfs-name-pv-claim
mountPath: /hadoop/dfs/name
containers:
- name: hdfs-name
image: bde2020/hadoop-namenode
env:
- name: CLUSTER_NAME
value: hdfs-k8s
- name: HDFS_CONF_dfs_permissions_enabled
value: "false"
#- name: HDFS_CONF_dfs_replication #not needed
# value: "2"
ports:
- containerPort: 8020
name: nn-rpc
- containerPort: 9870
name: nn-web
resources:
limits:
cpu: "500m"
memory: 1Gi
requests:
cpu: "500m"
memory: 1Gi
volumeMounts:
- name: hdfs-name-pv-claim
mountPath: /hadoop/dfs/name
volumeClaimTemplates:
- metadata:
name: hdfs-name-pv-claim
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: ebs
resources:
requests:
storage: 1Gi
---
#headless service of datanode
apiVersion: v1
kind: Service
metadata:
name: hdfs-data
namespace: pulse
labels:
app.kubernetes.io/name: hdfs-data
app.kubernetes.io/version: "1.0"
spec:
ports:
- port: 9866
protocol: TCP
name: dn-rpc
- port: 9864
protocol: TCP
name: dn-web
selector:
app.kubernetes.io/name: hdfs-data
app.kubernetes.io/version: "1.0"
clusterIP: None
type: ClusterIP
---
#datanode stateful deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: hdfs-data
namespace: pulse
labels:
app.kubernetes.io/name: hdfs-data
app.kubernetes.io/version: "1.0"
spec:
serviceName: hdfs-data
replicas: 2
selector:
matchLabels:
app.kubernetes.io/name: hdfs-data
app.kubernetes.io/version: "1.0"
template:
metadata:
labels:
app.kubernetes.io/name: hdfs-data
app.kubernetes.io/version: "1.0"
spec:
containers:
- name: hdfs-data
image: bde2020/hadoop-datanode
env:
- name: CORE_CONF_fs_defaultFS
value: hdfs://hdfs-name:8020
ports:
- containerPort: 9866
name: dn-rpc
- containerPort: 9864
name: dn-web
resources:
limits:
cpu: "500m"
memory: 1Gi
requests:
cpu: "500m"
memory: 1Gi
volumeMounts:
- name: hdfs-data-pv-claim
mountPath: /hadoop/dfs/data
volumeClaimTemplates:
- metadata:
name: hdfs-data-pv-claim
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: ebs
resources:
requests:
storage: 1Gi
Running hdfs dfsadmin -report randomly shows only one datanode, e.g. sometimes datanode-0 and sometimes datanode-1.
The datanodes' hostnames are different (datanode-0, datanode-1), but their name is the same (127.0.0.1:9866 (localhost)). Can this be the issue? If yes, how do I solve it?
Also, I don't see any HDFS block replication happening, even though the replication factor is 3.
UPDATE
It turned out to be an Istio proxy issue. I uninstalled Istio and it worked. The Istio proxy was setting the name to 127.0.0.1 instead of the actual IP.
I ran into this same issue, and the workaround I'm currently using is to disable the Envoy redirect for inbound traffic to the namenode on port 9000 (8020 in your case) by adding this annotation to the Hadoop namenode:
traffic.sidecar.istio.io/excludeInboundPorts: "9000"
Reference: https://istio.io/v1.4/docs/reference/config/annotations/
After reading through some Istio issues, it seems the source IP is not retained when traffic is redirected through Envoy.
Related issues:
https://github.com/istio/istio/issues/5679
https://github.com/istio/istio/pull/23275
I have not tried the TPROXY approach yet, since I'm not currently running Istio 1.6, which includes the TPROXY source IP preservation fix.
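For context, here is a sketch of roughly where that annotation would go in the namenode StatefulSet from the question, with the port changed to 8020 as noted above (not a verified configuration):

template:
  metadata:
    annotations:
      traffic.sidecar.istio.io/excludeInboundPorts: "8020"
    labels:
      app.kubernetes.io/name: hdfs-name
      app.kubernetes.io/version: "1.0"

This keeps the sidecar in place for everything else while letting datanode registrations reach the namenode RPC port directly, so the namenode sees the real source address.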
I have the following file, which I'm using to set up Prometheus on my Kubernetes cluster:
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus-deployment
namespace: plant-simulator-monitoring
spec:
replicas: 1
selector:
matchLabels:
name: prometheus-server
template:
metadata:
labels:
app: prometheus-server
spec:
containers:
- name: prometheus
image: prom/prometheus:latest
args:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/prometheus/"
ports:
- containerPort: 9090
volumeMounts:
- name: prometheus-config-volume
mountPath: /etc/prometheus/
- name: prometheus-storage-volume
mountPath: /prometheus/
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "1Gi"
cpu: "1000m"
volumes:
- name: prometheus-config-volume
configMap:
defaultMode: 420
name: prometheus-server-conf
- name: prometheus-storage-volume
emptyDir: {}
When I apply this to my Kubernetes cluster, I see the following error:
ts=2020-03-16T21:40:33.123641578Z caller=sync.go:165 component=daemon err="plant-simulator-monitoring:deployment/prometheus-deployment: running kubectl: The Deployment \"prometheus-deployment\" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{\"app\":\"prometheus-server\"}: `selector` does not match template `labels`"
I cannot see anything wrong with my YAML file. Is there something I'm missing?
As I mentioned in the comments, you have an issue with matching labels.
In spec.selector.matchLabels you have name: prometheus-server, and in spec.template.metadata.labels you have app: prometheus-server. The values there need to be the same. Below is what I get when I use your YAML:
$ kubectl apply -f deploymentoriginal.yaml
The Deployment "prometheus-deployment" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{"app":"prometheus-server"}: `selector` does not match template `labels`
And the output when I used the YAML below, with the same labels:
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus-deployment
namespace: plant-simulator-monitoring
spec:
replicas: 1
selector:
matchLabels:
name: prometheus-server
template:
metadata:
labels:
name: prometheus-server
spec:
containers:
- name: prometheus
image: prom/prometheus:latest
args:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/prometheus/"
ports:
- containerPort: 9090
volumeMounts:
- name: prometheus-config-volume
mountPath: /etc/prometheus/
- name: prometheus-storage-volume
mountPath: /prometheus/
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "1Gi"
cpu: "1000m"
volumes:
- name: prometheus-config-volume
configMap:
defaultMode: 420
name: prometheus-server-conf
- name: prometheus-storage-volume
emptyDir: {}
$ kubectl apply -f deploymentselectors.yaml
deployment.apps/prometheus-deployment created
More detailed info about selectors/labels can be found in the official Kubernetes docs.
There is a mismatch between the label in the selector (name: prometheus-server) and in the metadata (app: prometheus-server). The below should work:
selector:
matchLabels:
app: prometheus-server
template:
metadata:
labels:
app: prometheus-server