I've deployed Promtail, Grafana, Loki, and Alertmanager with Helm charts on a k3d cluster on my local machine. I would like to define rules in Loki so that when something happens, Alertmanager gets notified. For now I'm testing with a single simple rule, just to check that it works.
My Loki version: {"version":"2.6.1","revision":"6bd05c9a4","branch":"HEAD","buildUser":"root#ea1e89b8da02","buildDate":"2022-07-18T08:49:07Z","goVersion":""}
My Grafana version:
And the Loki configuration:
loki:
# should loki be deployed on cluster?
enabled: true
image:
repository: grafana/loki
pullPolicy: Always
pullSecrets:
- registry
priorityClassName: normal
resources:
limits:
memory: 3Gi
cpu: 0
requests:
memory: 0
cpu: 0
config:
chunk_store_config:
max_look_back_period: 30d
table_manager:
retention_deletes_enabled: true
retention_period: 30d
query_range:
split_queries_by_interval: 0
parallelise_shardable_queries: false
querier:
max_concurrent: 2048
frontend:
max_outstanding_per_tenant: 4096
compress_responses: true
ingester:
wal:
enabled: true
dir: /tmp/wal
schema_config:
configs:
- from: 2022-12-05
store: boltdb-shipper
object_store: filesystem
schema: v11
index:
prefix: index_
period: 24h
storage_config:
boltdb_shipper:
active_index_directory: /tmp/loki/boltdb-shipper-active
cache_location: /tmp/loki/boltdb-shipper-cache
cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space
shared_store: filesystem
filesystem:
directory: /tmp/loki/chunks
compactor:
working_directory: /tmp/loki/boltdb-shipper-compactor
shared_store: filesystem
ruler:
storage:
type: local
local:
directory: /tmp/loki/rules/
ring:
kvstore:
store: inmemory
rule_path: /tmp/loki/rules-temp
alertmanager_url: http://onprem-kube-prometheus-alertmanager.svc.mylocal-monitoring:9093
enable_api: true
enable_alertmanager_v2: true
write:
extraVolumeMounts:
- name: rules-config
mountPath: /tmp/loki/rules/fake/
extraVolumes:
- name: rules-config
configMap:
name: rules-cfgmap
items:
- key: "rules.yaml"
path: "rules.yaml"
read:
extraVolumeMounts:
- name: rules-config
mountPath: /tmp/loki/rules/fake/
extraVolumes:
- name: rules-config
configMap:
name: rules-cfgmap
items:
- key: "rules.yaml"
path: "rules.yaml"
promtail:
image:
registry: docker
pullPolicy: Always
imagePullSecrets:
- name: registry
priorityClassName: normal
resources:
limits:
memory: 256Mi
cpu: 0
requests:
memory: 0
cpu: 0
livenessProbe:
failureThreshold: 5
httpGet:
path: /ready
port: http-metrics
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
config:
snippets:
pipelineStages:
- cri: {}
common:
- action: replace
source_labels:
- __meta_kubernetes_pod_node_name
target_label: node_name
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: namespace
- action: replace
source_labels:
- __meta_kubernetes_pod_container_name
target_label: container
- action: replace
replacement: /var/log/pods/*$1/*.log
separator: /
source_labels:
- __meta_kubernetes_pod_uid
- __meta_kubernetes_pod_container_name
target_label: __path__
- action: replace
replacement: /var/log/pods/*$1/*.log
regex: true/(.*)
separator: /
source_labels:
- __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
- __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
- __meta_kubernetes_pod_container_name
target_label: __path__
monitoring:
enabled: false
networkPolicies:
enabled: false
The problem is that when I check the rules with curl -X GET localhost:3100/loki/api/v1/rules, it returns: unable to read rule dir /tmp/loki/rules/fake: open /tmp/loki/rules/fake: no such file or directory.
So it seems it can't find the rules file.
I also tried to change the config like this:
write:
extraVolumeMounts:
- name: rules-conf
mountPath: /tmp/loki/rules/fake/rules.yaml
extraVolumes:
- name: rules-conf
read:
extraVolumeMounts:
- name: rules-conf
mountPath: /tmp/loki/rules/fake/rules.yaml
extraVolumes:
- name: rules-conf
And my configmap:
---
apiVersion: v1
kind: ConfigMap
metadata:
name: rules-cfgmap
namespace: mylocal-monitoring
data:
rules.yaml: |
groups:
- name: PrometheusAlertsGroup
rules:
- alert: test1
expr: |
1 > 0
for: 0m
labels:
severity: critical
annotations:
summary: TEST: testing test
description: test
And the rules file:
groups:
- name: PrometheusAlertsGroup
rules:
- alert: test1
expr: |
1 > 0
for: 0m
labels:
severity: critical
annotations:
summary: TEST: testing test
description: test
But the problem is the same. Any ideas?
It finally worked when I created /tmp/loki/rules/fake/rules.yaml manually, but having to create it by hand is not the point.
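For what it's worth, the ruler's local storage scans per-tenant subdirectories under ruler.storage.local.directory, and "fake" is the tenant ID Loki uses when multi-tenancy (auth_enabled) is off, which is why the path ends in /fake. Below is a minimal sketch of how I would expect the mount to be declared, assuming the chart in use is the single-binary loki chart from loki-stack, which exposes top-level extraVolumes/extraVolumeMounts (the write:/read: keys belong to the simple-scalable chart and may simply be ignored here; both key names are assumptions to verify against the chart's values.yaml):
loki:
  extraVolumes:
    - name: rules-config
      configMap:
        name: rules-cfgmap
  extraVolumeMounts:
    - name: rules-config
      mountPath: /tmp/loki/rules/fake
Mounting the ConfigMap at the directory level makes each key (rules.yaml) appear as a file inside /tmp/loki/rules/fake, which matches what creating the file manually achieved.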
Related
I'm trying to deploy the Telegraf Helm chart on Kubernetes:
helm upgrade --install telegraf-instance -f values.yaml influxdata/telegraf
When I add the Modbus input plugin with holding_registers, I get this error:
[telegraf] Error running agent: Error loading config file /etc/telegraf/telegraf.conf: Error parsing data: line 49: key `name’ is in conflict with line 2fd
My values.yaml is below:
## Default values.yaml for Telegraf
## This is a YAML-formatted file.
## ref: https://hub.docker.com/r/library/telegraf/tags/
replicaCount: 1
image:
repo: "telegraf"
tag: "1.21.4"
pullPolicy: IfNotPresent
podAnnotations: {}
podLabels: {}
imagePullSecrets: []
args: []
env:
- name: HOSTNAME
value: "telegraf-polling-service"
resources: {}
nodeSelector: {}
affinity: {}
tolerations: []
service:
enabled: true
type: ClusterIP
annotations: {}
rbac:
create: true
clusterWide: false
rules: []
serviceAccount:
create: false
name:
annotations: {}
config:
agent:
interval: 60s
round_interval: true
metric_batch_size: 1000000
metric_buffer_limit: 100000000
collection_jitter: 0s
flush_interval: 60s
flush_jitter: 0s
precision: ''
hostname: '9825128'
omit_hostname: false
processors:
- enum:
mapping:
field: "status"
dest: "status_code"
value_mappings:
healthy: 1
problem: 2
critical: 3
inputs:
- modbus:
name: "PS MAIN ENGINE"
controller: 'tcp://192.168.0.101:502'
slave_id: 1
holding_registers:
- name: "Coolant Level"
byte_order: CDAB
data_type: FLOAT32
scale: 0.001
address: [51410, 51411]
- modbus:
name: "SB MAIN ENGINE"
controller: 'tcp://192.168.0.102:502'
slave_id: 1
holding_registers:
- name: "Coolant Level"
byte_order: CDAB
data_type: FLOAT32
scale: 0.001
address: [51410, 51411]
outputs:
- influxdb_v2:
token: token
organization: organisation
bucket: bucket
urls:
- "url"
metrics:
health:
enabled: true
service_address: "http://:8888"
threshold: 5000.0
internal:
enabled: true
collect_memstats: false
pdb:
create: true
minAvailable: 1
I resolved the problem with the following steps:
deleted the config section of my values.yaml
added my telegraf.conf to the /additional_config path
created a ConfigMap in Kubernetes with the following command:
kubectl create configmap external-config --from-file=/additional_config
added the following to values.yaml:
volumes:
- name: my-config
configMap:
name: external-config
volumeMounts:
- name: my-config
mountPath: /additional_config
args:
- "--config=/etc/telegraf/telegraf.conf"
- "--config-directory=/additional_config"
I am working on a use case where I need to add a new container to the JupyterHub pod. This new container (a sidecar) monitors JupyterHub directories.
The JupyterHub container, while coming up, creates a dynamic PV; see the section below:
singleuser:
podNameTemplate:
extraTolerations: []
nodeSelector: {}
extraNodeAffinity:
required: []
preferred: []
extraPodAffinity:
required: []
preferred: []
extraPodAntiAffinity:
required: []
preferred: []
networkTools:
image:
name: jupyterhub/k8s-network-tools
tag: "set-by-chartpress"
pullPolicy:
pullSecrets: []
resources: {}
cloudMetadata:
# block set to true will append a privileged initContainer using the
# iptables to block the sensitive metadata server at the provided ip.
blockWithIptables: true
ip: 169.254.169.254
networkPolicy:
enabled: true
ingress: []
egress:
# Required egress to communicate with the hub and DNS servers will be
# augmented to these egress rules.
#
# This default rule explicitly allows all outbound traffic from singleuser
# pods, except to a typical IP used to return metadata that can be used by
# someone with malicious intent.
- to:
- ipBlock:
cidr: 0.0.0.0/0
except:
- 169.254.169.254/32
interNamespaceAccessLabels: ignore
allowedIngressPorts: []
events: true
extraAnnotations: {}
extraLabels:
hub.jupyter.org/network-access-hub: "true"
extraFiles: {}
extraEnv: {}
lifecycleHooks: {}
initContainers: []
extraContainers: []
uid: 1000
fsGid: 100
serviceAccountName:
storage:
type: dynamic
extraLabels: {}
extraVolumes: []
extraVolumeMounts: []
static:
pvcName:
subPath: "{username}"
capacity: 10Gi
homeMountPath: /home/jovyan
dynamic:
storageClass:
pvcNameTemplate: claim-{username}{servername}
volumeNameTemplate: volume-{username}{servername}
storageAccessModes: [ReadWriteOnce]
image:
name: jupyterhub/k8s-singleuser-sample
tag: "set-by-chartpress"
pullPolicy:
pullSecrets: []
startTimeout: 300
cpu:
limit:
guarantee:
memory:
limit:
guarantee: 1G
extraResource:
limits: {}
guarantees: {}
cmd:
defaultUrl:
extraPodConfig: {}
profileList: []
I have added my new container in the extraContainers section of the deployment file. My container does start, but the dynamic PV is not mounted into that container.
Is the use case I am trying to achieve technically possible at the Kubernetes level?
The full YAML file is here for reference:
https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/main/jupyterhub/values.yaml
Config Map for reference
singleuser:
baseUrl: /
cloudMetadata:
enabled: true
ip: xx.xx.xx.xx
cpu: {}
events: true
extraAnnotations: {}
extraConfigFiles:
config_files:
- cm_key: ''
content: ''
file_path: ''
- cm_key: ''
content: ''
file_path: ''
enabled: false
extraContainers:
- image: 'docker:19.03-rc-dind'
lifecycle:
postStart:
exec:
command:
- sh
- '-c'
- update-ca-certificates; echo Certificates Updated
name: dind
securityContext:
privileged: true
volumeMounts:
- mountPath: /var/lib/docker
name: dind-storage
- mountPath: /usr/local/share/ca-certificates/
name: docker-cert
extraEnv:
ACTUAL_HADOOP_CONF_DIR: ''
ACTUAL_HIVE_CONF_DIR: ''
ACTUAL_SPARK_CONF_DIR: ''
CDH_PARCEL_DIR: ''
DOCKER_HOST: ''
JAVA_HOME:
LIVY_URL: ''
SPARK2_PARCEL_DIR: ''
extraLabels:
hub.jupyter.org/network-access-hub: 'true'
extraNodeAffinity:
preferred: []
required: []
extraPodAffinity:
preferred: []
required: []
extraPodAntiAffinity:
preferred: []
required: []
extraPodConfig: {}
extraResource:
guarantees: {}
limits: {}
extraTolerations: []
fsGid: 0
image:
name: >-
/jupyterhub/jpt-spark-magic
pullPolicy: IfNotPresent
tag: xxx
imagePullSecret:
email: null
enabled: false
registry: null
username: null
initContainers: []
lifecycleHooks: {}
memory:
guarantee: 1G
networkPolicy:
egress:
- to:
- ipBlock:
cidr: 0.0.0.0/0
except:
- 169.254.169.254/32
enabled: false
ingress: []
networkTools:
image:
name: >-
/k8s-hub-multispawn
pullPolicy: IfNotPresent
tag: '12345'
nodeSelector: {}
profileList:
- description: Python for data enthusiasts
display_name: 0
kubespawner_override:
cmd:
- jpt-entry-cmd.sh
cpu_limit: 1
environment:
XYZ_SERVICE_URL: 'http://XYZ:8080'
CURL_CA_BUNDLE: /etc/ssl/certs/ca-certificates.crt
DOCKER_HOST: 'tcp://localhost:2375'
HADOOP_CONF_DIR: /etc/hadoop/conf
HADOOP_HOME: /usr/hdp/3.1.5.6091-7/hadoop/
HDP_DIR: /usr/hdp/3.1.5.6091-7
HDP_HOME_DIR: /usr/hdp/3.1.5.6091-7
HDP_VERSION: 3.1.5.6091-7
HIVE_CONF_DIR: /usr/hdp/3.1.5.6091-7/hive
HIVE_HOME: /usr/hdp/3.1.5.6091-7/hive
INTEGRATION_ENV: HDP3
JAVA_HOME: /usr/jdk64/jdk1.8.0_112
LD_LIBRARY_PATH: >-
/usr/hdp/3.1.5.6091-7/hadoop/lib/native:/usr/jdk64/jdk1.8.0_112/jre:/usr/hdp/3.1.5.6091-7/usr/lib/:/usr/hdp/3.1.5.6091-7/usr/lib/
LIVY_URL: 'http://ammaster01.fake.org:8999'
MLFLOW_TRACKING_URI: 'http://mlflow:5100'
NO_PROXY: mlflow
SPARK_CONF_DIR: /etc/spark2/conf
SPARK_HOME: /usr/hdp/3.1.5.6091-7/spark2
SPARK2_PARCEL_DIR: /usr/hdp/3.1.5.6091-7/spark2
TOOLS_BASE_PATH: /usr/local/bin
image: >-
/jupyterhub/jpt-spark-magic:1.1.2
mem_limit: 4096M
uid: 0
- description: R for data enthusiasts
display_name: 1
kubespawner_override:
cmd:
- start-all.sh
environment:
XYZ_SERVICE_URL: 'http://XYZ-service:8080'
DISABLE_AUTH: 'true'
XYZ: /home/rstudio/kitematic
image: '/jupyterhub/rstudio:364094'
uid: 0
- description: Python for data enthusiasts test2
display_name: 2
kubespawner_override:
cmd:
- jpt-entry-cmd.sh
cpu_limit: 4
environment:
XYZ_SERVICE_URL: 'http://XYZ-service:8080'
CURL_CA_BUNDLE: /etc/ssl/certs/ca-certificates.crt
DOCKER_HOST: 'tcp://localhost:2375'
HADOOP_CONF_DIR: /etc/hadoop/conf
HADOOP_HOME: /usr/hdp/3.1.5.6091-7/hadoop/
HDP_DIR: /usr/hdp/3.1.5.6091-7
HDP_HOME_DIR: /usr/hdp/3.1.5.6091-7
HDP_VERSION: 3.1.5.6091-7
HIVE_CONF_DIR: /usr/hdp/3.1.5.6091-7/hive
HIVE_HOME: /usr/hdp/3.1.5.6091-7/hive
INTEGRATION_ENV: HDP3
JAVA_HOME: /usr/jdk64/jdk1.8.0_112
LD_LIBRARY_PATH: >-
/usr/hdp/3.1.5.6091-7/hadoop/lib/native:/usr/jdk64/jdk1.8.0_112/jre:/usr/hdp/3.1.5.6091-7/usr/lib/:/usr/hdp/3.1.5.6091-7/usr/lib/
LIVY_URL: 'http://xyz:8999'
MLFLOW_TRACKING_URI: 'http://mlflow:5100'
NO_PROXY: mlflow
SPARK_CONF_DIR: /etc/spark2/conf
SPARK_HOME: /usr/hdp/3.1.5.6091-7/spark2
SPARK2_PARCEL_DIR: /usr/hdp/3.1.5.6091-7/spark2
TOOLS_BASE_PATH: /usr/local/bin
image: >-
/jupyterhub/jpt-spark-magic:1.1.2
mem_limit: 8192M
uid: 0
- description: Python for data enthusiasts test3
display_name: 3
kubespawner_override:
cmd:
- jpt-entry-cmd.sh
cpu_limit: 8
environment:
XYZ_SERVICE_URL: 'http://XYZ-service:8080'
CURL_CA_BUNDLE: /etc/ssl/certs/ca-certificates.crt
DOCKER_HOST: 'tcp://localhost:2375'
HADOOP_CONF_DIR: /etc/hadoop/conf
HADOOP_HOME: /usr/hdp/3.1.5.6091-7/hadoop/
HDP_DIR: /usr/hdp/3.1.5.6091-7
HDP_HOME_DIR: /usr/hdp/3.1.5.6091-7
HDP_VERSION: 3.1.5.6091-7
HIVE_CONF_DIR: /usr/hdp/3.1.5.6091-7/hive
HIVE_HOME: /usr/hdp/3.1.5.6091-7/hive
INTEGRATION_ENV: HDP3
JAVA_HOME: /usr/jdk64/jdk1.8.0_112
LD_LIBRARY_PATH: >-
/usr/hdp/3.1.5.6091-7/hadoop/lib/native:/usr/jdk64/jdk1.8.0_112/jre:/usr/hdp/3.1.5.6091-7/usr/lib/:/usr/hdp/3.1.5.6091-7/usr/lib/
LIVY_URL: 'http://fake.org:8999'
MLFLOW_TRACKING_URI: 'http://mlflow:5100'
NO_PROXY: mlflow
SPARK_CONF_DIR: /etc/spark2/conf
SPARK_HOME: /usr/hdp/3.1.5.6091-7/spark2
SPARK2_PARCEL_DIR: /usr/hdp/3.1.5.6091-7/spark2
TOOLS_BASE_PATH: /usr/local/bin
image: >-
/jupyterhub/jpt-spark-magic:1.1.2
mem_limit: 16384M
uid: 0
startTimeout: 300
storage:
capacity: 10Gi
dynamic:
pvcNameTemplate: 'claim-{username}{servername}'
storageAccessModes:
- ReadWriteOnce
storageClass: nfs-client
volumeNameTemplate: 'volume-{username}{servername}'
extraLabels: {}
extraVolumeMounts:
- mountPath: /etc/krb5.conf
name: krb
readOnly: true
- mountPath: /usr/jdk64/jdk1.8.0_112
name: java-home
readOnly: true
- mountPath: /xyz/conda/envs
name: xyz-conda-envs
readOnly: false
- mountPath: /usr/hdp/
name: bigdata
readOnly: true
subPath: usr-hdp
- mountPath: /etc/hadoop/
name: bigdata
readOnly: true
subPath: HDP
- mountPath: /etc/hive/
name: bigdata
readOnly: true
subPath: hdp-hive
- mountPath: /etc/spark2/
name: bigdata
readOnly: true
subPath: hdp-spark2
extraVolumes:
- emptyDir: {}
name: dind-storage
- name: docker-cert
secret:
secretName: docker-cert
- hostPath:
path: /var/lib/ut_xyz_ts/jdk1.8.0_112
type: Directory
name: java-home
- hostPath:
path: /xyz/conda/envs
type: Directory
name: xyz-conda-envs
- hostPath:
path: /etc/krb5.conf
type: File
name: krb
- name: bigdata
persistentVolumeClaim:
claimName: bigdata
homeMountPath: '/home/{username}'
static:
subPath: '{username}'
type: dynamic
uid: 0
Thanks in advance.
To your question of whether two or more containers of the same pod can technically share the same volume: the answer is yes. Refer here: https://youtu.be/GQJP9QdHHs8?t=82
But you also need a volumeMount (see the example in the video) defined in your extra container's spec, as in the sketch below. If you can check that, or share the output of kubectl describe deployment <your-deployment>, I can confirm it.
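For illustration, a minimal sketch of what that could look like in the singleuser section, assuming KubeSpawner names the dynamic home volume after volumeNameTemplate (volume-{username}{servername} in your values); the sidecar name and image are illustrative, and whether the {username} placeholders are expanded inside extraContainers should be verified against a pod that actually gets spawned (kubectl get pod <singleuser-pod> -o yaml):
singleuser:
  extraContainers:
    - name: dir-monitor               # the sidecar that watches the home directory
      image: my-monitor:latest        # illustrative image name
      volumeMounts:
        - name: 'volume-{username}{servername}'   # must match the pod's volume name
          mountPath: /home/jovyan                 # same path the notebook container uses
          readOnly: true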
prometheus-prometheus-kube-prometheus-prometheus-0 0/2 Terminating 0 4s
alertmanager-prometheus-kube-prometheus-alertmanager-0 0/2 Terminating 0 10s
After updating the EKS cluster from 1.15 to 1.16, everything works fine except these two pods; they keep terminating and are unable to initialise. Hence, Prometheus monitoring does not work. I am getting the errors below when describing the pods:
Error: failed to start container "prometheus": Error response from daemon: OCI runtime create failed: container_linux.go:362: creating new parent process caused: container_linux.go:1941: running lstat on namespace path "/proc/29271/ns/ipc" caused: lstat /proc/29271/ns/ipc: no such file or directory: unknown
Error: failed to start container "config-reloader": Error response from daemon: cannot join network of a non running container: 7e139521980afd13dad0162d6859352b0b2c855773d6d4062ee3e2f7f822a0b3
Error: cannot find volume "config" to mount into container "config-reloader"
Error: cannot find volume "config" to mount into container "prometheus"
Here is the YAML of the affected pod:
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/psp: eks.privileged
creationTimestamp: "2021-04-30T16:39:14Z"
deletionGracePeriodSeconds: 600
deletionTimestamp: "2021-04-30T16:49:14Z"
generateName: prometheus-prometheus-kube-prometheus-prometheus-
labels:
app: prometheus
app.kubernetes.io/instance: prometheus-kube-prometheus-prometheus
app.kubernetes.io/managed-by: prometheus-operator
app.kubernetes.io/name: prometheus
app.kubernetes.io/version: 2.26.0
controller-revision-hash: prometheus-prometheus-kube-prometheus-prometheus-56d9fcf57
operator.prometheus.io/name: prometheus-kube-prometheus-prometheus
operator.prometheus.io/shard: "0"
prometheus: prometheus-kube-prometheus-prometheus
statefulset.kubernetes.io/pod-name: prometheus-prometheus-kube-prometheus-prometheus-0
name: prometheus-prometheus-kube-prometheus-prometheus-0
namespace: mo
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: StatefulSet
name: prometheus-prometheus-kube-prometheus-prometheus
uid: 326a09f2-319c-449d-904a-1dd0019c6d80
resourceVersion: "9337443"
selfLink: /api/v1/namespaces/monitoring/pods/prometheus-prometheus-kube-prometheus-prometheus-0
uid: e2be062f-749d-488e-a6cc-42ef1396851b
spec:
containers:
- args:
- --web.console.templates=/etc/prometheus/consoles
- --web.console.libraries=/etc/prometheus/console_libraries
- --config.file=/etc/prometheus/config_out/prometheus.env.yaml
- --storage.tsdb.path=/prometheus
- --storage.tsdb.retention.time=10d
- --web.enable-lifecycle
- --storage.tsdb.no-lockfile
- --web.external-url=http://prometheus-kube-prometheus-prometheus.monitoring:9090
- --web.route-prefix=/
image: quay.io/prometheus/prometheus:v2.26.0
imagePullPolicy: IfNotPresent
name: prometheus
ports:
- containerPort: 9090
name: web
protocol: TCP
readinessProbe:
failureThreshold: 120
httpGet:
path: /-/ready
port: web
scheme: HTTP
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 3
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- mountPath: /etc/prometheus/config_out
name: config-out
readOnly: true
- mountPath: /etc/prometheus/certs
name: tls-assets
readOnly: true
- mountPath: /prometheus
name: prometheus-prometheus-kube-prometheus-prometheus-db
- mountPath: /etc/prometheus/rules/prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0
name: prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: prometheus-kube-prometheus-prometheus-token-mh66q
readOnly: true
- args:
- --listen-address=:8080
- --reload-url=http://localhost:9090/-/reload
- --config-file=/etc/prometheus/config/prometheus.yaml.gz
- --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
- --watched-dir=/etc/prometheus/rules/prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0
command:
- /bin/prometheus-config-reloader
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: SHARD
value: "0"
image: quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0
imagePullPolicy: IfNotPresent
name: config-reloader
ports:
- containerPort: 8080
name: reloader-web
protocol: TCP
resources:
limits:
cpu: 100m
memory: 50Mi
requests:
cpu: 100m
memory: 50Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- mountPath: /etc/prometheus/config
name: config
- mountPath: /etc/prometheus/config_out
name: config-out
- mountPath: /etc/prometheus/rules/prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0
name: prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: prometheus-kube-prometheus-prometheus-token-mh66q
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
hostname: prometheus-prometheus-kube-prometheus-prometheus-0
nodeName: ip-10-1-49-45.ec2.internal
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 2000
runAsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
serviceAccount: prometheus-kube-prometheus-prometheus
serviceAccountName: prometheus-kube-prometheus-prometheus
subdomain: prometheus-operated
terminationGracePeriodSeconds: 600
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: config
secret:
defaultMode: 420
secretName: prometheus-prometheus-kube-prometheus-prometheus
- name: tls-assets
secret:
defaultMode: 420
secretName: prometheus-prometheus-kube-prometheus-prometheus-tls-assets
- emptyDir: {}
name: config-out
- configMap:
defaultMode: 420
name: prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0
name: prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0
- emptyDir: {}
name: prometheus-prometheus-kube-prometheus-prometheus-db
- name: prometheus-kube-prometheus-prometheus-token-mh66q
secret:
defaultMode: 420
secretName: prometheus-kube-prometheus-prometheus-token-mh66q
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2021-04-30T16:39:14Z"
status: "True"
type: PodScheduled
phase: Pending
qosClass: Burstable
In case someone needs the answer: in my case (the situation above) there were two Prometheus operators running in different namespaces, one in the default namespace and another in the monitoring namespace. I removed the one from the default namespace and that resolved the pod-crashing issue.
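If it helps others avoid the same clash without deleting an operator, the operator can usually be scoped to watch only specific namespaces instead. A rough sketch, assuming a recent kube-prometheus-stack chart (check the exact keys against your chart version's values.yaml):
prometheusOperator:
  namespaces:
    releaseNamespace: true   # watch only the namespace the chart is released into
    additional: []           # plus any explicitly listed namespaces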
I'm trying to deploy the MongoDB community operator on OpenShift 3.11, using the following commands:
git clone https://github.com/mongodb/mongodb-kubernetes-operator.git
cd mongodb-kubernetes-operator
oc new-project mongodb
oc create -f deploy/crds/mongodb.com_mongodb_crd.yaml -n mongodb
oc create -f deploy/operator/role.yaml -n mongodb
oc create -f deploy/operator/role_binding.yaml -n mongodb
oc create -f deploy/operator/service_account.yaml -n mongodb
oc apply -f deploy/openshift/operator_openshift.yaml -n mongodb
oc apply -f deploy/crds/mongodb.com_v1_mongodb_openshift_cr.yaml -n mongodb
The operator pod is running successfully, but the MongoDB replica set pods do not spin up. The error is as follows:
[kubenode#master mongodb-kubernetes-operator]$ oc get pods
NAME READY STATUS RESTARTS AGE
example-openshift-mongodb-0 1/2 CrashLoopBackOff 4 2m
mongodb-kubernetes-operator-66bfcbcf44-9xvj7 1/1 Running 0 2m
[kubenode#master mongodb-kubernetes-operator]$ oc logs -f example-openshift-mongodb-0 -c mongodb-agent
panic: Failed to get current user: user: unknown userid 1000510000
goroutine 1 [running]:
com.tengen/cm/util.init.3()
/data/mci/2f46ec94982c5440960d2b2bf2b6ae15/mms-automation/build/go-dependencies/src/com.tengen/cm/util/user.go:14 +0xe5
I have gone through all the issues raised on the mongodb-kubernetes-operator repository which are related to this issue (reference), and found a suggestion to set the MANAGED_SECURITY_CONTEXT environment variable to true in the operator, mongodb and mongodb-agent containers.
I have done so for all of these containers, but am still facing the same issue.
Here is the confirmation that the environment variables are correctly set:
[kubenode#master mongodb-kubernetes-operator]$ oc set env statefulset.apps/example-openshift-mongodb --list
# statefulsets/example-openshift-mongodb, container mongodb-agent
AGENT_STATUS_FILEPATH=/var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
AUTOMATION_CONFIG_MAP=example-openshift-mongodb-config
HEADLESS_AGENT=true
MANAGED_SECURITY_CONTEXT=true
# POD_NAMESPACE from field path metadata.namespace
# statefulsets/example-openshift-mongodb, container mongod
AGENT_STATUS_FILEPATH=/healthstatus/agent-health-status.json
MANAGED_SECURITY_CONTEXT=true
[kubenode#master mongodb-kubernetes-operator]$ oc set env deployment.apps/mongodb-kubernetes-operator --list
# deployments/mongodb-kubernetes-operator, container mongodb-kubernetes-operator
# WATCH_NAMESPACE from field path metadata.namespace
# POD_NAME from field path metadata.name
MANAGED_SECURITY_CONTEXT=true
OPERATOR_NAME=mongodb-kubernetes-operator
AGENT_IMAGE=quay.io/mongodb/mongodb-agent:10.19.0.6562-1
VERSION_UPGRADE_HOOK_IMAGE=quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.2
Operator Information
Operator Version: 0.3.0
MongoDB Image used: 4.2.6
Cluster Information
[kubenode#master mongodb-kubernetes-operator]$ openshift version
openshift v3.11.0+62803d0-1
[kubenode#master mongodb-kubernetes-operator]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2018-10-15T09:45:30Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2020-12-07T17:59:40Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Update
When I check the replica pod YAML (see below), I see three occurrences of the runAsUser security context set to 1000510000. I'm not sure how, but this is being set even though I'm not setting it manually (see the note after the pod YAML below).
[kubenode#master mongodb-kubernetes-operator]$ oc get -o yaml pod example-openshift-mongodb-0
apiVersion: v1
kind: Pod
metadata:
annotations:
openshift.io/scc: restricted
creationTimestamp: 2021-01-19T07:45:05Z
generateName: example-openshift-mongodb-
labels:
app: example-openshift-mongodb-svc
controller-revision-hash: example-openshift-mongodb-6549495b
statefulset.kubernetes.io/pod-name: example-openshift-mongodb-0
name: example-openshift-mongodb-0
namespace: mongodb
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: StatefulSet
name: example-openshift-mongodb
uid: 3e91eb40-5a2a-11eb-a5e0-0050569b1f59
resourceVersion: "15616863"
selfLink: /api/v1/namespaces/mongodb/pods/example-openshift-mongodb-0
uid: 3ea17a28-5a2a-11eb-a5e0-0050569b1f59
spec:
containers:
- command:
- agent/mongodb-agent
- -cluster=/var/lib/automation/config/cluster-config.json
- -skipMongoStart
- -noDaemonize
- -healthCheckFilePath=/var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
- -serveStatusPort=5000
- -useLocalMongoDbTools
env:
- name: AGENT_STATUS_FILEPATH
value: /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
- name: AUTOMATION_CONFIG_MAP
value: example-openshift-mongodb-config
- name: HEADLESS_AGENT
value: "true"
- name: MANAGED_SECURITY_CONTEXT
value: "true"
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
image: quay.io/mongodb/mongodb-agent:10.19.0.6562-1
imagePullPolicy: Always
name: mongodb-agent
readinessProbe:
exec:
command:
- /var/lib/mongodb-mms-automation/probes/readinessprobe
failureThreshold: 60
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources: {}
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- SETGID
- SETUID
runAsUser: 1000510000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/automation/config
name: automation-config
readOnly: true
- mountPath: /data
name: data-volume
- mountPath: /var/lib/mongodb-mms-automation/authentication
name: example-openshift-mongodb-agent-scram-credentials
- mountPath: /var/log/mongodb-mms-automation/healthstatus
name: healthstatus
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: mongodb-kubernetes-operator-token-lr9l4
readOnly: true
- command:
- /bin/sh
- -c
- |2
# run post-start hook to handle version changes
/hooks/version-upgrade
# wait for config to be created by the agent
while [ ! -f /data/automation-mongod.conf ]; do sleep 3 ; done ; sleep 2 ;
# start mongod with this configuration
exec mongod -f /data/automation-mongod.conf ;
env:
- name: AGENT_STATUS_FILEPATH
value: /healthstatus/agent-health-status.json
- name: MANAGED_SECURITY_CONTEXT
value: "true"
image: mongo:4.2.6
imagePullPolicy: IfNotPresent
name: mongod
resources: {}
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- SETGID
- SETUID
runAsUser: 1000510000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /data
name: data-volume
- mountPath: /var/lib/mongodb-mms-automation/authentication
name: example-openshift-mongodb-agent-scram-credentials
- mountPath: /healthstatus
name: healthstatus
- mountPath: /hooks
name: hooks
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: mongodb-kubernetes-operator-token-lr9l4
readOnly: true
dnsPolicy: ClusterFirst
hostname: example-openshift-mongodb-0
imagePullSecrets:
- name: mongodb-kubernetes-operator-dockercfg-jhplw
initContainers:
- command:
- cp
- version-upgrade-hook
- /hooks/version-upgrade
image: quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.2
imagePullPolicy: Always
name: mongod-posthook
resources: {}
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- SETGID
- SETUID
runAsUser: 1000510000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /hooks
name: hooks
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: mongodb-kubernetes-operator-token-lr9l4
readOnly: true
nodeName: node1.192.168.27.116.nip.io
nodeSelector:
node-role.kubernetes.io/compute: "true"
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1000510000
seLinuxOptions:
level: s0:c23,c2
serviceAccount: mongodb-kubernetes-operator
serviceAccountName: mongodb-kubernetes-operator
subdomain: example-openshift-mongodb-svc
terminationGracePeriodSeconds: 30
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-volume-example-openshift-mongodb-0
- name: automation-config
secret:
defaultMode: 416
secretName: example-openshift-mongodb-config
- name: example-openshift-mongodb-agent-scram-credentials
secret:
defaultMode: 384
secretName: example-openshift-mongodb-agent-scram-credentials
- emptyDir: {}
name: healthstatus
- emptyDir: {}
name: hooks
- name: mongodb-kubernetes-operator-token-lr9l4
secret:
defaultMode: 420
secretName: mongodb-kubernetes-operator-token-lr9l4
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2021-01-19T07:46:45Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2021-01-19T07:46:39Z
message: 'containers with unready status: [mongodb-agent]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: null
message: 'containers with unready status: [mongodb-agent]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: 2021-01-19T07:45:05Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://bd3ede9178bb78267bc19d1b5da0915d3bcd1d4dcee3e142c7583424bd2aa777
image: docker.io/mongo:4.2.6
imageID: docker-pullable://docker.io/mongo#sha256:c880f6b56f443bb4d01baa759883228cd84fa8d78fa1a36001d1c0a0712b5a07
lastState: {}
name: mongod
ready: true
restartCount: 0
state:
running:
startedAt: 2021-01-19T07:46:55Z
- containerID: docker://5e39c0b6269b8231bbf9cabb4ff3457d9f91e878eff23953e318a9475fb8a90e
image: quay.io/mongodb/mongodb-agent:10.19.0.6562-1
imageID: docker-pullable://quay.io/mongodb/mongodb-agent#sha256:790c2670ef7cefd61cfaabaf739de16dbd2e07dc3b539add0da21ab7d5ac7626
lastState:
terminated:
containerID: docker://5e39c0b6269b8231bbf9cabb4ff3457d9f91e878eff23953e318a9475fb8a90e
exitCode: 2
finishedAt: 2021-01-19T19:39:58Z
reason: Error
startedAt: 2021-01-19T19:39:58Z
name: mongodb-agent
ready: false
restartCount: 144
state:
waiting:
message: Back-off 5m0s restarting failed container=mongodb-agent pod=example-openshift-mongodb-0_mongodb(3ea17a28-5a2a-11eb-a5e0-0050569b1f59)
reason: CrashLoopBackOff
hostIP: 192.168.27.116
initContainerStatuses:
- containerID: docker://7c31cef2a68e3e6100c2cc9c83e3780313f1e8ab43bebca79ad4d48613f124bd
image: quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.2
imageID: docker-pullable://quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook#sha256:e99105b1c54e12913ddaf470af8025111a6e6e4c8917fc61be71d1bc0328e7d7
lastState: {}
name: mongod-posthook
ready: true
restartCount: 0
state:
terminated:
containerID: docker://7c31cef2a68e3e6100c2cc9c83e3780313f1e8ab43bebca79ad4d48613f124bd
exitCode: 0
finishedAt: 2021-01-19T07:46:45Z
reason: Completed
startedAt: 2021-01-19T07:46:44Z
phase: Running
podIP: 10.129.0.119
qosClass: BestEffort
startTime: 2021-01-19T07:46:39Z
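A note on the runAsUser: 1000510000 values observed above: under OpenShift's restricted SCC (which the pod annotation openshift.io/scc: restricted indicates), the admission controller assigns runAsUser and fsGroup from the project's UID range rather than from anything in the operator's manifests; MANAGED_SECURITY_CONTEXT only tells the agent/operator not to manage the security context itself. A sketch of where that range typically lives, with the range value assumed from the observed UID:
apiVersion: v1
kind: Namespace
metadata:
  name: mongodb
  annotations:
    openshift.io/sa.scc.uid-range: 1000510000/10000           # restricted SCC picks runAsUser from this range
    openshift.io/sa.scc.supplemental-groups: 1000510000/10000 # source of the fsGroup seen in the pod spec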
I am following the article here: https://istio.io/docs/tasks/security/authn-policy/
Specifically, when I follow the instructions in the Setup section, I can't connect to any httpbin residing in the foo and bar namespaces, but the one in legacy is fine. I suspect something is wrong with the sidecar proxy that was installed.
Here is the httpbin pod YAML (after injection with the istioctl kube-inject --includeIPRanges "10.32.0.0/16" command). I use --includeIPRanges so that the pod can communicate with external IPs (for debugging purposes, e.g. to install dnsutils and other packages):
apiVersion: v1
kind: Pod
metadata:
annotations:
sidecar.istio.io/inject: "true"
sidecar.istio.io/status: '{"version":"4120ea817406fd7ed43b7ecf3f2e22abe453c44d3919389dcaff79b210c4cd86","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs"],"imagePullSecrets":null}'
creationTimestamp: 2018-08-15T11:40:59Z
generateName: httpbin-8b9cf99f5-
labels:
app: httpbin
pod-template-hash: "465795591"
version: v1
name: httpbin-8b9cf99f5-9c47z
namespace: foo
ownerReferences:
- apiVersion: extensions/v1beta1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: httpbin-8b9cf99f5
uid: 1450d75d-a080-11e8-aece-42010a940168
resourceVersion: "65722138"
selfLink: /api/v1/namespaces/foo/pods/httpbin-8b9cf99f5-9c47z
uid: 1454b68d-a080-11e8-aece-42010a940168
spec:
containers:
- image: docker.io/citizenstig/httpbin
imagePullPolicy: IfNotPresent
name: httpbin
ports:
- containerPort: 8000
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-pkpvf
readOnly: true
- args:
- proxy
- sidecar
- --configPath
- /etc/istio/proxy
- --binaryPath
- /usr/local/bin/envoy
- --serviceCluster
- httpbin
- --drainDuration
- 45s
- --parentShutdownDuration
- 1m0s
- --discoveryAddress
- istio-pilot.istio-system:15007
- --discoveryRefreshDelay
- 1s
- --zipkinAddress
- zipkin.istio-system:9411
- --connectTimeout
- 10s
- --statsdUdpAddress
- istio-statsd-prom-bridge.istio-system.istio-system:9125
- --proxyAdminPort
- "15000"
- --controlPlaneAuthPolicy
- NONE
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: INSTANCE_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: ISTIO_META_POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: ISTIO_META_INTERCEPTION_MODE
value: REDIRECT
image: docker.io/istio/proxyv2:1.0.0
imagePullPolicy: IfNotPresent
name: istio-proxy
resources:
requests:
cpu: 10m
securityContext:
privileged: false
readOnlyRootFilesystem: true
runAsUser: 1337
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/istio/proxy
name: istio-envoy
- mountPath: /etc/certs/
name: istio-certs
readOnly: true
dnsPolicy: ClusterFirst
initContainers:
- args:
- -p
- "15001"
- -u
- "1337"
- -m
- REDIRECT
- -i
- 10.32.0.0/16
- -x
- ""
- -b
- 8000,
- -d
- ""
image: docker.io/istio/proxy_init:1.0.0
imagePullPolicy: IfNotPresent
name: istio-init
resources: {}
securityContext:
capabilities:
add:
- NET_ADMIN
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
nodeName: gke-tvlk-data-dev-default-medium-pool-46397778-q2sb
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: default-token-pkpvf
secret:
defaultMode: 420
secretName: default-token-pkpvf
- emptyDir:
medium: Memory
name: istio-envoy
- name: istio-certs
secret:
defaultMode: 420
optional: true
secretName: istio.default
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2018-08-15T11:41:01Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2018-08-15T11:44:28Z
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2018-08-15T11:40:59Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://758e130a4c31a15c1b8bc1e1f72bd7739d5fa1103132861eea9ae1a6ae1f080e
image: citizenstig/httpbin:latest
imageID: docker-pullable://citizenstig/httpbin#sha256:b81c818ccb8668575eb3771de2f72f8a5530b515365842ad374db76ad8bcf875
lastState: {}
name: httpbin
ready: true
restartCount: 0
state:
running:
startedAt: 2018-08-15T11:41:01Z
- containerID: docker://9c78eac46a99457f628493975f5b0c5bbffa1dac96dab5521d2efe4143219575
image: istio/proxyv2:1.0.0
imageID: docker-pullable://istio/proxyv2#sha256:77915a0b8c88cce11f04caf88c9ee30300d5ba1fe13146ad5ece9abf8826204c
lastState:
terminated:
containerID: docker://52299a80a0fa8949578397357861a9066ab0148ac8771058b83e4c59e422a029
exitCode: 255
finishedAt: 2018-08-15T11:44:27Z
reason: Error
startedAt: 2018-08-15T11:41:02Z
name: istio-proxy
ready: true
restartCount: 1
state:
running:
startedAt: 2018-08-15T11:44:28Z
hostIP: 10.32.96.27
initContainerStatuses:
- containerID: docker://f267bb44b70d2d383ce3f9943ab4e917bb0a42ecfe17fe0ed294bde4d8284c58
image: istio/proxy_init:1.0.0
imageID: docker-pullable://istio/proxy_init#sha256:345c40053b53b7cc70d12fb94379e5aa0befd979a99db80833cde671bd1f9fad
lastState: {}
name: istio-init
ready: true
restartCount: 0
state:
terminated:
containerID: docker://f267bb44b70d2d383ce3f9943ab4e917bb0a42ecfe17fe0ed294bde4d8284c58
exitCode: 0
finishedAt: 2018-08-15T11:41:00Z
reason: Completed
startedAt: 2018-08-15T11:41:00Z
phase: Running
podIP: 10.32.19.61
qosClass: Burstable
startTime: 2018-08-15T11:40:59Z
Here is the example command where I get the error (sleep.legacy -> httpbin.foo):
> kubectl exec $(kubectl get pod -l app=sleep -n legacy -o jsonpath={.items..metadata.name}) -c sleep -n legacy -- curl http://httpbin.foo:8000/ip -s -o /dev/null -w "%{http_code}\n"
000
command terminated with exit code 7
Here is the example command where I get a success status (sleep.legacy -> httpbin.legacy):
> kubectl exec $(kubectl get pod -l app=sleep -n legacy -o jsonpath={.items..metadata.name}) -csleep -n legacy -- curl http://httpbin.legacy:8000/ip -s -o /dev/null -w "%{http_code}\n"
200
I have followed the instructions to verify that no mTLS policy, etc. is defined:
> kubectl get policies.authentication.istio.io --all-namespaces
No resources found.
> kubectl get meshpolicies.authentication.istio.io
No resources found.
> kubectl get destinationrules.networking.istio.io --all-namespaces -o yaml | grep "host:"
host: istio-policy.istio-system.svc.cluster.local
host: istio-telemetry.istio-system.svc.cluster.local
Never mind, I think I found the cause: a misconfiguration on my part.
If you look at the statsd address, it is set to the unresolvable hostname istio-statsd-prom-bridge.istio-system.istio-system:9125 (the namespace suffix is duplicated). I noticed this after seeing that the proxy container had restarted/crashed multiple times.
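For reference, a minimal sketch of the proxy args with the address corrected to a single namespace suffix (assuming the statsd service really lives in istio-system; verify with kubectl get svc -n istio-system):
- --statsdUdpAddress
- istio-statsd-prom-bridge.istio-system:9125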