I made the Kubernetes cluster with 2 azure Ubuntu VMs and trying to monitor the cluster. For that, I have deployed node-exporter daemonSet, heapster, Prometheus and grafana. Configured the node-exporter as a target in Prometheus rules files. but I am getting Get http://master-ip:30002/metrics: context deadline exceeded error. I have also increased scrape_interval and scrape_timeout values in the Prometheus-rules file.
The following are the manifest files for the Prometheus-rules file and node-exporter daemonSet and service files.
apiVersion: apps/v1
kind: DaemonSet
metadata:
labels:
app: node-exporter
name: node-exporter
namespace: kube-system
spec:
selector:
matchLabels:
app: node-exporter
template:
metadata:
labels:
app: node-exporter
spec:
containers:
- args:
- --web.listen-address=<master-IP>:30002
- --path.procfs=/host/proc
- --path.sysfs=/host/sys
- --path.rootfs=/host/root
- --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
- --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
image: quay.io/prometheus/node-exporter:v0.18.1
name: node-exporter
resources:
limits:
cpu: 250m
memory: 180Mi
requests:
cpu: 102m
memory: 180Mi
volumeMounts:
- mountPath: /host/proc
name: proc
readOnly: false
- mountPath: /host/sys
name: sys
readOnly: false
- mountPath: /host/root
mountPropagation: HostToContainer
name: root
readOnly: true
- args:
- --logtostderr
- --secure-listen-address=[$(IP)]:9100
- --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
- --upstream=http://<master-IP>:30002/
env:
- name: IP
valueFrom:
fieldRef:
fieldPath: status.podIP
image: quay.io/coreos/kube-rbac-proxy:v0.4.1
name: kube-rbac-proxy
ports:
- containerPort: 9100
hostPort: 9100
name: https
resources:
limits:
cpu: 20m
memory: 40Mi
requests:
cpu: 10m
memory: 20Mi
hostNetwork: true
hostPID: true
nodeSelector:
kubernetes.io/os: linux
securityContext:
runAsNonRoot: true
runAsUser: 65534
serviceAccountName: node-exporter
tolerations:
- operator: Exists
volumes:
- hostPath:
path: /proc
name: proc
- hostPath:
path: /sys
name: sys
- hostPath:
path: /
name: root
---
apiVersion: v1
kind: Service
metadata:
labels:
k8s-app: node-exporter
name: node-exporter
namespace: kube-system
spec:
type: NodePort
ports:
- name: https
port: 9100
targetPort: https
nodePort: 30002
selector:
app: node-exporter
---prometheus-config-map.yaml-----
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-server-conf
labels:
name: prometheus-server-conf
namespace: default
data:
prometheus.yml: |-
global:
scrape_interval: 5m
evaluation_interval: 3m
scrape_configs:
- job_name: 'node'
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
static_configs:
- targets: ['<master-IP>:30002']
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
Can we take service as NodePort for Node-exporter daemonSet? if the answer NO, how could we configure in a prometheus-rules file as the target? Could anyone help me to understand the scenario? Are any suggestible links also fine?
As #gayahtri confirmed in comments
it worked for me. – gayathri
If you have same issue as mentioned in topic check out this github issue
specifically this answer added by #simonpasquier
We have debugged it offline and the problem was the network.
Running the Prometheus container with "--network=host" solved the issue.
Related
I've got Kubernetes cluster with ECK Operator deployed. I also deploy Filebeat to my cluster. Here's file:
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
name: filebeat
namespace: logging-dev
spec:
type: filebeat
version: 8.2.0
elasticsearchRef:
name: elastic-logging-dev
kibanaRef:
name: kibana
config:
filebeat:
autodiscover:
providers:
- type: kubernetes
node: ${NODE_NAME}
hints:
enabled: true
default_config:
type: container
paths:
- /var/log/containers/*${data.kubernetes.container.id}.log
processors:
- add_cloud_metadata: { }
- add_host_metadata: { }
daemonSet:
podTemplate:
metadata:
annotations:
co.elastic.logs/enabled: "false"
spec:
serviceAccountName: filebeat
automountServiceAccountToken: true
terminationGracePeriodSeconds: 30
dnsPolicy: ClusterFirstWithHostNet
hostNetwork: true # Allows to provide richer host metadata
tolerations:
- key: node-role.kubernetes.io/control-plane
effect: NoSchedule
- key: node-role.kubernetes.io/master
effect: NoSchedule
containers:
- name: filebeat
securityContext:
runAsUser: 0
volumeMounts:
- name: varlogcontainers
mountPath: /var/log/containers
- name: varlogpods
mountPath: /var/log/pods
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
resources:
requests:
memory: 500Mi
cpu: 100m
limits:
memory: 500Mi
cpu: 200m
volumes:
- name: varlogcontainers
hostPath:
path: /var/log/containers
- name: varlogpods
hostPath:
path: /var/log/pods
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
It's working very well, but I want to also parse Kubernetes logs, eg.:
E0819 18:57:51.309161 1 watcher.go:327] failed to prepare current and previous objects: conversion webhook for minio.min.io/v2, Kind=Tenant failed: Post "https://operator.minio-operator.svc:4222/webhook/v1/crd-conversion?timeout=30s": dial tcp 10.233.8.119:4222: connect: connection refused
How can I do that?
In Fluentd it's quite simple:
<filter kubernetes.var.log.containers.kube-apiserver-*_kube-system_*.log>
#type parser
key_name log
format /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)$/
time_format %m%d %H:%M:%S.%N
types pid:integer
reserve_data true
remove_key_name_field false
</filter>
But I cannot find any example, tutorial or whatever how to do this with Filebeat.
Filebeat 7.12.1
ECK operator 2.2
I'm trying to setup the filbeat for the Nginx-ingress access logs in my ECK stack (installed in GKE). I can access the logs directly in the pod but nothing is coming to my Kibana dashboard.
I have set up two filebeat.autodiscover.providers
hints.enabled: true, which looks for all the containers with co.elastic.logs/enabled: "true"
Checks the container containing name ingress. I can confirm that the name of the pod is nginx-ingress-ingress-nginx-controller-xxxx-xxxxx
Below is my Filebeat auto discover content:
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
name: filebeat
namespace: search
spec:
type: filebeat
version: 7.12.1
elasticsearchRef:
name: elastic-search
kibanaRef:
name: kibana-web
config:
filebeat.autodiscover.providers:
- node: ${NODE_NAME}
type: kubernetes
hints.enabled: true
#add_resource_metadata.namespace.enabled: true
hints.default_config.enabled: "false"
- node: ${NODE_NAME}
type: kubernetes
#add_resource_metadata.namespace.enabled: true
hints.default_config.enabled: "false"
templates:
- condition:
contains:
kubernetes.container.name: ingress
config:
- paths: ["/var/log/containers/*${data.kubernetes.container.id}.log"]
type: container
exclude_lines: ["^\\s+[\\-`('.|_]"]
processors:
- add_cloud_metadata: {}
- add_host_metadata: {}
daemonSet:
podTemplate:
spec:
serviceAccountName: filebeat
automountServiceAccountToken: true
terminationGracePeriodSeconds: 30
dnsPolicy: ClusterFirstWithHostNet
#hostNetwork: true # Allows to provide richer host metadata
containers:
- name: filebeat
securityContext:
runAsUser: 0
# If using Red Hat OpenShift uncomment this:
#privileged: true
volumeMounts:
- name: varlogcontainers
mountPath: /var/log/containers
- name: varlogpods
mountPath: /var/log/pods
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
resources:
requests:
memory: 200Mi
cpu: 0.2
limits:
memory: 300Mi
cpu: 0.4
volumes:
- name: varlogcontainers
hostPath:
path: /var/log/containers
- name: varlogpods
hostPath:
path: /var/log/pods
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
Adding the answer here in case someone else run into this issue.
The issue is how I'm checking the contains condition. It should've been kubernetes.pod.name instead of kubernetes.container.name. So I replaced
- condition:
contains:
kubernetes.container.name: ingress
to
- condition:
contains:
kubernetes.pod.name: ingress
in the above file and things started to work!
env:
kubernetes provider: gke
kubernetes version: v1.13.12-gke.25
grafana version: 6.6.2 (official image)
grafana deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
name: grafana
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
name: grafana
labels:
app: grafana
spec:
containers:
- name: grafana
image: grafana/grafana:6.6.2
ports:
- name: grafana
containerPort: 3000
# securityContext:
# runAsUser: 104
# allowPrivilegeEscalation: true
resources:
limits:
memory: "1Gi"
cpu: "500m"
requests:
memory: "500Mi"
cpu: "100m"
volumeMounts:
- mountPath: /var/lib/grafana
name: grafana-storage
volumes:
- name: grafana-storage
persistentVolumeClaim:
claimName: grafana-pvc
Problem
when I deployed this grafana dashboard first time, its working fine. after sometime I restarted the pod to check whether volume mount is working or not. after restarting, I getting below error.
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied
GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
what I understand from this error, user could create these files. How can I give this user appropriate permission to start grafana successfully?
I recreated your deployment with appropriate PVC and noticed that grafana pod was failing.
Output of command: $ kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
grafana-6466cd95b5-4g95f 0/1 Error 2 65s
Further investigation pointed the same errors as yours:
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied
GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
This error showed on first creation of a pod and the deployment. There was no need to recreate any pods.
What I did to make it work was to edit your deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: grafana
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
name: grafana
labels:
app: grafana
spec:
securityContext:
runAsUser: 472
fsGroup: 472
containers:
- name: grafana
image: grafana/grafana:6.6.2
ports:
- name: grafana
containerPort: 3000
resources:
limits:
memory: "1Gi"
cpu: "500m"
requests:
memory: "500Mi"
cpu: "100m"
volumeMounts:
- mountPath: /var/lib/grafana
name: grafana-storage
volumes:
- name: grafana-storage
persistentVolumeClaim:
claimName: grafana-pvc
Please take a specific look on part:
securityContext:
runAsUser: 472
fsGroup: 472
It is a setting described in official documentation: Kubernetes.io: set the security context for a pod
Please take a look on this Github issue which is similar to yours and pointed me to solution that allowed pod to spawn correctly:
https://github.com/grafana/grafana-docker/issues/167
Grafana had some major updates starting from version 5.1. Please take a look: Grafana.com: Docs: Migrate to v5.1 or later
Please let me know if this helps.
On v8.0, I do that setting runAsUser: 0.
It works.
---
apiVersion: v1
kind: Service
metadata:
name: grafana
spec:
ports:
- name: grafana-tcp
port: 3000
protocol: TCP
targetPort: 3000
selector:
project: grafana
type: LoadBalancer
status:
loadBalancer: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
project: grafana
name: grafana
spec:
replicas: 1
selector:
matchLabels:
project: grafana
strategy:
type: RollingUpdate
template:
metadata:
labels:
project: grafana
name: grafana
spec:
securityContext:
runAsUser: 0
containers:
- image: grafana/grafana
name: grafana
ports:
- containerPort: 3000
protocol: TCP
resources: {}
volumeMounts:
- mountPath: /var/lib/grafana
name: grafana-volume
volumes:
- name: grafana-volume
hostPath:
# directory location on host
path: /opt/grafana
# this field is optional
type: DirectoryOrCreate
restartPolicy: Always
status: {}
I have a namespace called airflow that has 2 pods: webserver and scheduler. I want to deploy scheduler on node A and webserver on node B.
And here you can see deployment files:
scheduler:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
namespace: airflow
name: airflow-scheduler
labels:
name: airflow-scheduler
spec:
replicas: 1
template:
metadata:
labels:
app: airflow-scheduler
spec:
terminationGracePeriodSeconds: 60
containers:
- name: scheduler
image: 123423.dkr.ecr.us-east-1.amazonaws.com/airflow:$COMMIT_SHA1
volumeMounts:
- name: logs
mountPath: /logs
command: ["airflow"]
args: ["scheduler"]
imagePullPolicy: Always
resources:
limits:
memory: "3072Mi"
requests:
cpu: "500m"
memory: "2048Mi"
volumes:
- name: logs
persistentVolumeClaim:
claimName: logs
webserver:
apiVersion: v1
kind: Service
metadata:
name: airflow-webserver
namespace: airflow
labels:
run: airflow-webserver
spec:
ports:
- port: 80
targetPort: 8080
selector:
run: airflow-webserver
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: airflow-webserver
namespace: airflow
annotations:
kubernetes.io/ingress.class: nginx
certmanager.k8s.io/cluster-issuer: letsencrypt-prod
spec:
tls:
- hosts:
- airflow.awesome.com.br
secretName: airflow-crt
rules:
- host: airflow.awesome.com.br
http:
paths:
- path: /
backend:
serviceName: airflow-webserver
servicePort: 80
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
namespace: airflow
name: airflow-webserver
labels:
run: airflow-webserver
spec:
replicas: 1
template:
metadata:
labels:
run: airflow-webserver
spec:
terminationGracePeriodSeconds: 60
containers:
- name: webserver
image: 123423.dkr.ecr.us-east-1.amazonaws.com/airflow:$COMMIT_SHA1
volumeMounts:
- name: logs
mountPath: /logs
ports:
- containerPort: 8080
command: ["airflow"]
args: ["webserver"]
imagePullPolicy: Always
resources:
limits:
cpu: "200m"
memory: "3072Mi"
requests:
cpu: "100m"
memory: "2048Mi"
volumes:
- name: logs
persistentVolumeClaim:
claimName: logs
What's the proper way to ensure that pods will be deployed on different nodes?
edit1:
antiaffinity is not working:
I've tried to set podAntiAffinity on scheduler but it's not working:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: name
operator: In
values:
- airflow-webserver
topologyKey: "kubernetes.io/hostname"
If you want to have these pods run on different nodes but you don't care about which nodes exactly, you can use the Pod anti-affinity feature. It basically defines that the pod X should not run on the same node (it can be also used with failure domain / zones etc., not just with nodes) as pod Y and uses labels to specify the pods. So you will need to add some labels and specify them in the spec sections. More info about it is in Kube docs.
If in addition you want to also specify on which node it should run, you can use the Node affinity feature. See Kube docs for more details.
I get
All host(s) tried for query failed (tried: 10.244.0.72/10.244.0.72:9042 (com.datastax.driver.core.exceptions.TransportException: [10.244.0.72/10.244.0.72:9042] Channel has been closed))
when trying to access Cassandra within the same namespace. Although when I forward ports it works ok from localhost. keyspace is created successfully.
kubectl port-forward cassandra1-0 9042:9042
My yaml
apiVersion: v1
kind: Service
metadata:
name: cassandra1
labels:
app: cassandra1
spec:
ports:
- name: "cql"
protocol: "TCP"
port: 9042
targetPort: 9042
- name: "thrift"
protocol: "TCP"
port: 9160
targetPort: 9160
selector:
app: cassandra1
type: NodePort
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: cassandra1
labels:
app: cassandra1
spec:
serviceName: cassandra1
replicas: 1
selector:
matchLabels:
app: cassandra1
template:
metadata:
labels:
app: cassandra1
spec:
terminationGracePeriodSeconds: 1800
containers:
- name: cassandra1
image: gcr.io/google-samples/cassandra:v13
imagePullPolicy: Always
ports:
- containerPort: 7000
name: intra-node
- containerPort: 7001
name: tls-intra-node
- containerPort: 7199
name: jmx
- containerPort: 9042
name: cql
- containerPort: 9160
name: thrift
resources:
limits:
cpu: "500m"
memory: 1Gi
requests:
cpu: "500m"
memory: 1Gi
securityContext:
capabilities:
add:
- IPC_LOCK
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- nodetool drain
env:
- name: MAX_HEAP_SIZE
value: 512M
- name: HEAP_NEWSIZE
value: 100M
- name: CASSANDRA_SEEDS
value: "cassandra1-0.cassandra1.default.svc.cluster.local"
- name: CASSANDRA_CLUSTER_NAME
value: "cassandra1"
- name: CASSANDRA_DC
value: "DC1-cassandra1"
- name: CASSANDRA_RACK
value: "Rack1-cassandra1"
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
readinessProbe:
exec:
command:
- /bin/bash
- -c
- /ready-probe.sh
initialDelaySeconds: 15
timeoutSeconds: 5
# These volume mounts are persistent. They are like inline claims,
# but not exactly because the names need to match exactly one of
# the stateful pod volumes.
volumeMounts:
- name: cassandra1-data
mountPath: /cassandra1_data
volumeClaimTemplates:
- metadata:
name: cassandra1-data
namespace: default
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
Cassandra starts with following properties:
Starting Cassandra on 10.244.0.72
CASSANDRA_CONF_DIR /etc/cassandra
CASSANDRA_CFG /etc/cassandra/cassandra.yaml
CASSANDRA_AUTO_BOOTSTRAP true
CASSANDRA_BROADCAST_ADDRESS 10.244.0.72
CASSANDRA_BROADCAST_RPC_ADDRESS 10.244.0.72
CASSANDRA_CLUSTER_NAME cassandra1
CASSANDRA_COMPACTION_THROUGHPUT_MB_PER_SEC
CASSANDRA_CONCURRENT_COMPACTORS
CASSANDRA_CONCURRENT_READS
CASSANDRA_CONCURRENT_WRITES
CASSANDRA_COUNTER_CACHE_SIZE_IN_MB
CASSANDRA_DC DC1-cassandra1
CASSANDRA_DISK_OPTIMIZATION_STRATEGY ssd
CASSANDRA_ENDPOINT_SNITCH SimpleSnitch
CASSANDRA_GC_WARN_THRESHOLD_IN_MS
CASSANDRA_INTERNODE_COMPRESSION
CASSANDRA_KEY_CACHE_SIZE_IN_MB
CASSANDRA_LISTEN_ADDRESS 10.244.0.72
CASSANDRA_LISTEN_INTERFACE
CASSANDRA_MEMTABLE_ALLOCATION_TYPE
CASSANDRA_MEMTABLE_CLEANUP_THRESHOLD
CASSANDRA_MEMTABLE_FLUSH_WRITERS
CASSANDRA_MIGRATION_WAIT 1
CASSANDRA_NUM_TOKENS 32
CASSANDRA_RACK Rack1-cassandra1
CASSANDRA_RING_DELAY 30000
CASSANDRA_RPC_ADDRESS 0.0.0.0
CASSANDRA_RPC_INTERFACE
CASSANDRA_SEEDS cassandra1-0.cassandra1.default.svc.cluster.local
CASSANDRA_SEED_PROVIDER org.apache.cassandra.locator.SimpleSeedProvider
changed ownership of '/cassandra_data/data' from root to cassandra
changed ownership of '/cassandra_data' from root to cassandra
In my application that runs in the same namespace i tried setting cassandraport to 9042 and host to:
10.240.0.4 (hostIP)
10.244.0.72 (podIP)
cassandra1 (name of the service)
cassandra1.default
cassandra1.default.svc.cluster.local
cassandra1-0.cassandra1.default.svc.cluster.local
_cql._tcp.cassandra1.default.svc.cluster.local
I also tried different types of a service:
headless, ClusterIP, NodePort
Does anybody has ANY ideas what is wrong or what else can i try to get this to work?