Need to apply the Prometheus rules from the Alertmanager repo in k8s (Kubernetes)

I have two GitLab repos:
=> gitlab a
=> gitlab b
gitlab a contains the StatefulSet and pod for Prometheus and the Prometheus Pushgateway.
gitlab b contains the Alertmanager service, the Alertmanager pod, and the Prometheus rules.
All the pods and containers are up and running.
I am trying to apply the Prometheus rules to the Prometheus StatefulSet.
[Image: prometheusRule.png]
I need to apply the kind: PrometheusRule resource to the Prometheus StatefulSet.
Can someone help?
Applied rules YAML:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: prometheus-k8s-rules
  namespace: cmp-monitoring
spec:
  groups:
  - name: node-exporter.rules
    rules:
    - expr: |
        count without (cpu) (
          count without (mode) (
            node_cpu_seconds_total{job="node-exporter"}
          )
        )
      record: instance:node_num_cpu:sum
    - expr: |
        1 - avg without (cpu, mode) (
          rate(node_cpu_seconds_total{job="node-exporter", mode="idle"}[1m])
        )
      record: instance:node_cpu_utilisation:rate1m
Prometheus StatefulSet:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  serviceName: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: prometheus
        image: prom/prometheus
        imagePullPolicy: Always
        ports:
        - name: http
          containerPort: 9090
        volumeMounts:
        - name: prometheus-config
          mountPath: "/etc/prometheus/prometheus.yml"
          subPath: prometheus.yml
        - name: prometheus-data
          mountPath: "/prometheus"
        #- name: rules-general
        #  mountPath: "/etc/prometheus/prometheus.rules.yml"
        #  subPath: prometheus.rules.yml
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 120
          periodSeconds: 40
          successThreshold: 1
          timeoutSeconds: 10
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 120
          periodSeconds: 40
          successThreshold: 1
          timeoutSeconds: 10
          failureThreshold: 3
      securityContext:
        fsGroup: 1000
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-server-conf
      #- name: rules-general
      #  configMap:
      #    name: prometheus-server-conf
  volumeClaimTemplates:
  - metadata:
      name: prometheus-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: rbd-default
      resources:
        requests:
          storage: 10Gi
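A note on why the rule may not be picked up: PrometheusRule is a custom resource that only the Prometheus Operator reads, so a Prometheus started from a plain StatefulSet will not load it. One possible approach, sketched here under assumptions, is to ship the same rule groups as an ordinary rule file in a ConfigMap and mount it through the commented-out rules-general entries. The ConfigMap name prometheus-rules is hypothetical, and it would need to live in the same namespace as the StatefulSet.

```yaml
# Hypothetical ConfigMap holding the rule groups as a plain Prometheus rule file.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules   # assumed name, not from the original post
data:
  prometheus.rules.yml: |
    groups:
    - name: node-exporter.rules
      rules:
      - record: instance:node_num_cpu:sum
        expr: |
          count without (cpu) (
            count without (mode) (
              node_cpu_seconds_total{job="node-exporter"}
            )
          )
      - record: instance:node_cpu_utilisation:rate1m
        expr: |
          1 - avg without (cpu, mode) (
            rate(node_cpu_seconds_total{job="node-exporter", mode="idle"}[1m])
          )
# In prometheus.yml (ConfigMap prometheus-server-conf), add:
#   rule_files:
#     - /etc/prometheus/prometheus.rules.yml
#
# In the StatefulSet, enable the commented-out entries and point the volume
# at the new ConfigMap:
#   volumeMounts:
#   - name: rules-general
#     mountPath: "/etc/prometheus/prometheus.rules.yml"
#     subPath: prometheus.rules.yml
#   volumes:
#   - name: rules-general
#     configMap:
#       name: prometheus-rules
```

Prometheus reads rule files at startup or on a configuration reload, so the pod would likely need a restart after these changes.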

Related

GKE "no.scale.down.node.pod.not.enough.pdb" log even with existing PDB

My GKE cluster is displaying a "Scale down blocked by pod" note. Clicking it and going to the Logs Explorer shows a filtered view with log entries for the pods that had the incident: no.scale.down.node.pod.not.enough.pdb. That is strange, because the pods in those log entries do have a PDB defined for them. It seems to me that GKE is wrongly reporting the cause of the blocked node scale-down. These are the manifests for one of the pods with this issue:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: ms-new-api-beta
  name: ms-new-api-beta
  namespace: beta
spec:
  ports:
  - port: 8000
    protocol: TCP
    targetPort: 8000
  selector:
    app: ms-new-api-beta
  type: NodePort
The Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: ms-new-api-beta
  name: ms-new-api-beta
  namespace: beta
spec:
  selector:
    matchLabels:
      app: ms-new-api-beta
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'
      labels:
        app: ms-new-api-beta
    spec:
      containers:
      - command:
        - /deploy/venv/bin/gunicorn
        - '--bind'
        - '0.0.0.0:8000'
        - 'newapi.app:app'
        - '--chdir'
        - /deploy/app
        - '--timeout'
        - '7200'
        - '--workers'
        - '1'
        - '--worker-class'
        - uvicorn.workers.UvicornWorker
        - '--log-level'
        - DEBUG
        env:
        - name: ENV
          value: BETA
        image: >-
          gcr.io/.../api:${trigger['tag']}
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /rest
            port: 8000
            scheme: HTTP
          initialDelaySeconds: 120
          periodSeconds: 20
          timeoutSeconds: 30
        name: ms-new-api-beta
        ports:
        - containerPort: 8000
          name: http
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /rest
            port: 8000
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 2
        resources:
          limits:
            cpu: 150m
          requests:
            cpu: 100m
        startupProbe:
          failureThreshold: 30
          httpGet:
            path: /rest
            port: 8000
          periodSeconds: 120
      imagePullSecrets:
      - name: gcp-docker-registry
The Horizontal Pod Autoscaler:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: ms-new-api-beta
  namespace: beta
spec:
  maxReplicas: 5
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ms-new-api-beta
  targetCPUUtilizationPercentage: 100
And finally, the Pod Disruption Budget:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ms-new-api-beta
  namespace: beta
spec:
  minAvailable: 0
  selector:
    matchLabels:
      app: ms-new-api-beta
no.scale.down.node.pod.not.enough.pdb is not complaining about the lack of a PDB. It is complaining that evicting the pod to scale down the node would violate the existing PDB(s).
The "budget" is how much disruption the Pod can permit. The platform will not take any intentional action which violates that budget.
There may be another PDB in place that would be violated. To check, review the PDBs in the pod's namespace:
kubectl get pdb -n beta
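To illustrate how an overlapping budget could block the scale-down, here is a hypothetical second PDB. The name ms-new-api-beta-strict is invented for this sketch and does not appear in the question. With the HPA holding the Deployment at minReplicas: 2, a budget of minAvailable: 2 over the same pods leaves zero allowed disruptions, so the node cannot be drained even though the minAvailable: 0 PDB from the question would permit it.

```yaml
# Hypothetical conflicting PDB, for illustration only.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ms-new-api-beta-strict   # assumed name, not from the question
  namespace: beta
spec:
  # With only 2 healthy replicas, requiring 2 available allows 0 evictions.
  minAvailable: 2
  selector:
    matchLabels:
      app: ms-new-api-beta
```

The status.disruptionsAllowed field in the output of kubectl get pdb -n beta -o yaml shows how many evictions each budget currently permits.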

Zonal network endpoint group unhealthy even though that container application working properly

I've created a Kubernetes cluster on Google Cloud. Even though the application is running properly (which I've checked by sending requests from inside the cluster), the NEG health check does not seem to be working properly. Any ideas on the cause?
I've tried changing the service from NodePort to LoadBalancer and different ways of adding annotations to the service. I was thinking that it might be related to the HTTPS requirement on the Django side.
# [START kubernetes_deployment]
apiVersion: apps/v1
kind: Deployment
metadata:
  name: moner-app
  labels:
    app: moner-app
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: moner-app
  template:
    metadata:
      labels:
        app: moner-app
    spec:
      containers:
      - name: moner-core-container
        image: my-template
        imagePullPolicy: Always
        resources:
          requests:
            memory: "128Mi"
          limits:
            memory: "512Mi"
        startupProbe:
          httpGet:
            path: /ht/
            port: 5000
            httpHeaders:
            - name: "X-Forwarded-Proto"
              value: "https"
          failureThreshold: 30
          timeoutSeconds: 10
          periodSeconds: 10
          initialDelaySeconds: 90
        readinessProbe:
          initialDelaySeconds: 120
          httpGet:
            path: "/ht/"
            port: 5000
            httpHeaders:
            - name: "X-Forwarded-Proto"
              value: "https"
          periodSeconds: 10
          failureThreshold: 3
          timeoutSeconds: 10
        livenessProbe:
          initialDelaySeconds: 30
          failureThreshold: 3
          periodSeconds: 30
          timeoutSeconds: 10
          httpGet:
            path: "/ht/"
            port: 5000
            httpHeaders:
            - name: "X-Forwarded-Proto"
              value: "https"
        volumeMounts:
        - name: cloudstorage-credentials
          mountPath: /secrets/cloudstorage
          readOnly: true
        env:
        # [START_secrets]
        - name: THIS_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: GRACEFUL_TIMEOUT
          value: '120'
        - name: GUNICORN_HARD_TIMEOUT
          value: '90'
        - name: DJANGO_ALLOWED_HOSTS
          value: '*,$(THIS_POD_IP),0.0.0.0'
        ports:
        - containerPort: 5000
        args: ["/start"]
      # [START proxy_container]
      - image: gcr.io/cloudsql-docker/gce-proxy:1.16
        name: cloudsql-proxy
        command: ["/cloud_sql_proxy", "--dir=/cloudsql",
                  "-instances=moner-dev:us-east1:core-db=tcp:5432",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        resources:
          requests:
            memory: "64Mi"
          limits:
            memory: "128Mi"
        volumeMounts:
        - name: cloudsql-oauth-credentials
          mountPath: /secrets/cloudsql
          readOnly: true
        - name: ssl-certs
          mountPath: /etc/ssl/certs
        - name: cloudsql
          mountPath: /cloudsql
      # [END proxy_container]
      # [START volumes]
      volumes:
      - name: cloudsql-oauth-credentials
        secret:
          secretName: cloudsql-oauth-credentials
      - name: ssl-certs
        hostPath:
          path: /etc/ssl/certs
      - name: cloudsql
        emptyDir: {}
      - name: cloudstorage-credentials
        secret:
          secretName: cloudstorage-credentials
      # [END volumes]
# [END kubernetes_deployment]
---
# [START service]
apiVersion: v1
kind: Service
metadata:
  name: moner-svc
  annotations:
    cloud.google.com/neg: '{"ingress": true, "exposed_ports": {"5000":{}}}' # Creates an NEG after an Ingress is created
    cloud.google.com/backend-config: '{"default": "moner-backendconfig"}'
  labels:
    app: moner-svc
spec:
  type: NodePort
  ports:
  - name: moner-core-http
    port: 5000
    protocol: TCP
    targetPort: 5000
  selector:
    app: moner-app
# [END service]
---
# [START certificates_setup]
apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: managed-cert
spec:
  domains:
  - domain.com
  - app.domain.com
# [END certificates_setup]
---
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: moner-backendconfig
spec:
  customRequestHeaders:
    headers:
    - "X-Forwarded-Proto:https"
  healthCheck:
    checkIntervalSec: 15
    port: 5000
    type: HTTP
    requestPath: /ht/
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: managed-cert-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: moner-ssl
    networking.gke.io/managed-certificates: managed-cert
    kubernetes.io/ingress.class: "gce"
spec:
  defaultBackend:
    service:
      name: moner-svc
      port:
        name: moner-core-http
Apparently, you didn't have a GCP firewall rule allowing traffic on port 5000 to your GKE nodes. Creating an ingress firewall rule with IP range 0.0.0.0/0 and port TCP 5000, targeted at your GKE nodes, could allow your setup to work even with port 5000.
I'm still not sure why, but I managed to get it working when I moved the service to port 80 and kept the health check on port 5000.
Service config:
# [START service]
apiVersion: v1
kind: Service
metadata:
  name: moner-svc
  annotations:
    cloud.google.com/neg: '{"ingress": true, "exposed_ports": {"5000":{}}}' # Creates an NEG after an Ingress is created
    cloud.google.com/backend-config: '{"default": "moner-backendconfig"}'
  labels:
    app: moner-svc
spec:
  type: NodePort
  ports:
  - name: moner-core-http
    port: 80
    protocol: TCP
    targetPort: 5000
  selector:
    app: moner-app
# [END service]
Backend config:
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: moner-backendconfig
spec:
  customRequestHeaders:
    headers:
    - "X-Forwarded-Proto:https"
  healthCheck:
    checkIntervalSec: 15
    port: 5000
    type: HTTP
    requestPath: /ht/

"Invalid host. To browse Nexus, click here/. To use the Docker registry, point your client at" when accessing Nexus in Kubernetes

I am using Helm 3 to install Nexus in Kubernetes v1.18:
helm install nexus stable/sonatype-nexus
I then expose Nexus to the outside through Traefik 2.x using the domain nexus.dolphin.com. But when I use the domain to access the Nexus service, it gives me this message:
Invalid host. To browse Nexus, click here/. To use the Docker registry, point your client
I have read this question, but it does not seem to fit my situation. This is my current Nexus YAML config:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nexus-sonatype-nexus
  namespace: infrastructure
  selfLink: /apis/apps/v1/namespaces/infrastructure/deployments/nexus-sonatype-nexus
  uid: 023de15b-19eb-442d-8375-11532825919d
  resourceVersion: '1710210'
  generation: 3
  creationTimestamp: '2020-08-16T07:17:07Z'
  labels:
    app: sonatype-nexus
    app.kubernetes.io/managed-by: Helm
    chart: sonatype-nexus-1.23.1
    fullname: nexus-sonatype-nexus
    heritage: Helm
    release: nexus
  annotations:
    deployment.kubernetes.io/revision: '1'
    meta.helm.sh/release-name: nexus
    meta.helm.sh/release-namespace: infrastructure
  managedFields:
  - manager: Go-http-client
    operation: Update
    apiVersion: apps/v1
    time: '2020-08-16T07:17:07Z'
    fieldsType: FieldsV1
  - manager: kube-controller-manager
    operation: Update
    apiVersion: apps/v1
    time: '2020-08-18T16:26:34Z'
    fieldsType: FieldsV1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sonatype-nexus
      release: nexus
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: sonatype-nexus
        release: nexus
    spec:
      volumes:
      - name: nexus-sonatype-nexus-data
        persistentVolumeClaim:
          claimName: nexus-sonatype-nexus-data
      - name: nexus-sonatype-nexus-backup
        emptyDir: {}
      containers:
      - name: nexus
        image: 'sonatype/nexus3:3.20.1'
        ports:
        - name: nexus-docker-g
          containerPort: 5003
          protocol: TCP
        - name: nexus-http
          containerPort: 8081
          protocol: TCP
        env:
        - name: install4jAddVmParams
          value: >-
            -Xms1200M -Xmx1200M -XX:MaxDirectMemorySize=2G
            -XX:+UnlockExperimentalVMOptions
            -XX:+UseCGroupMemoryLimitForHeap
        - name: NEXUS_SECURITY_RANDOMPASSWORD
          value: 'false'
        resources: {}
        volumeMounts:
        - name: nexus-sonatype-nexus-data
          mountPath: /nexus-data
        - name: nexus-sonatype-nexus-backup
          mountPath: /nexus-data/backup
        livenessProbe:
          httpGet:
            path: /
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 1
          periodSeconds: 30
          successThreshold: 1
          failureThreshold: 6
        readinessProbe:
          httpGet:
            path: /
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 1
          periodSeconds: 30
          successThreshold: 1
          failureThreshold: 6
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        imagePullPolicy: IfNotPresent
      - name: nexus-proxy
        image: 'quay.io/travelaudience/docker-nexus-proxy:2.5.0'
        ports:
        - name: nexus-proxy
          containerPort: 8080
          protocol: TCP
        env:
        - name: ALLOWED_USER_AGENTS_ON_ROOT_REGEX
          value: GoogleHC
        - name: CLOUD_IAM_AUTH_ENABLED
          value: 'false'
        - name: BIND_PORT
          value: '8080'
        - name: ENFORCE_HTTPS
          value: 'false'
        - name: NEXUS_DOCKER_HOST
        - name: NEXUS_HTTP_HOST
        - name: UPSTREAM_DOCKER_PORT
          value: '5003'
        - name: UPSTREAM_HTTP_PORT
          value: '8081'
        - name: UPSTREAM_HOST
          value: localhost
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: nexus-sonatype-nexus
      serviceAccount: nexus-sonatype-nexus
      securityContext:
        fsGroup: 2000
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
status:
  observedGeneration: 3
  replicas: 1
  updatedReplicas: 1
  readyReplicas: 1
  availableReplicas: 1
  conditions:
  - type: Progressing
    status: 'True'
    lastUpdateTime: '2020-08-18T16:23:54Z'
    lastTransitionTime: '2020-08-18T16:23:54Z'
    reason: NewReplicaSetAvailable
    message: >-
      ReplicaSet "nexus-sonatype-nexus-79fd4488d5" has successfully
      progressed.
  - type: Available
    status: 'True'
    lastUpdateTime: '2020-08-18T16:26:34Z'
    lastTransitionTime: '2020-08-18T16:26:34Z'
    reason: MinimumReplicasAvailable
    message: Deployment has minimum availability.
Why can't the domain access Nexus by default, and what should I do to access Nexus by domain?
According to the documentation, you should set the Helm chart property nexusProxy.env.nexusHttpHost to nexus.dolphin.com.
The Docker image used here has a proxy that allows you to access the Nexus HTTP and Nexus Docker services by different domains. If you don't specify either, you get the behaviour you're seeing.
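As a minimal sketch of that setting, a values file for the chart could contain the fragment below. The file name values.yaml is an assumption; the key path and host name are the ones given in the answer.

```yaml
# values.yaml (hypothetical file name)
nexusProxy:
  env:
    # Host name the proxy should accept for the Nexus web UI
    nexusHttpHost: nexus.dolphin.com
```

The file could then be passed to Helm with -f values.yaml on install or upgrade.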

Kubernetes Dashboard Ingress returning empty response from server

I am trying to set up the Kubernetes dashboard. I have enabled the custom SSL certs from my domain and can curl the pod directly with no issues. I can also curl the service, and it works with no issues. However, when I try to access it via the ingress, I get "(52) Empty reply from server". I have an NLB forwarding to the port of the nginx controller service (the ingress works fine with another app). Here is my ingress config:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
  labels:
    app: dashboard
    name: dashboard-ingress
  name: dashboard-ingress
  namespace: kubernetes-dashboard
spec:
  rules:
  - host: k8sdash.domain.com
    http:
      paths:
      - backend:
          serviceName: kubernetes-dashboard
          servicePort: 443
        path: /
Here is the DaemonSet config for my ingress controllers:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "3"
  creationTimestamp: "2020-05-19T15:48:13Z"
  generation: 3
  labels:
    app: lb
    app.kubernetes.io/component: controller
    chart: nginx-ingress-1.36.3
    heritage: Tiller
    release: lb
  name: lb-controller
  namespace: kube-system
  resourceVersion: "747622"
  selfLink: /apis/apps/v1/namespaces/kube-system/daemonsets/lb-controller
  uid: 19d830ba-f2d9-4c6f-bc8d-d64667a900c7
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: lb
      release: lb
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: lb
        app.kubernetes.io/component: controller
        component: controller
        release: lb
    spec:
      containers:
      - args:
        - /nginx-ingress-controller
        - --default-backend-service=kube-system/lb-default-backend
        - --publish-service=kube-system/lb-controller
        - --election-id=ingress-controller-leader
        - --ingress-class=nginx
        - --configmap=kube-system/lb-controller
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.30.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: lb-controller
        ports:
        - containerPort: 80
          hostPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          hostPort: 443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          runAsUser: 101
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      hostNetwork: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: lb
      serviceAccountName: lb
      terminationGracePeriodSeconds: 60
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 3
  desiredNumberScheduled: 3
  numberAvailable: 3
  numberMisscheduled: 0
  numberReady: 3
  observedGeneration: 3

Pod load distribution in Kubernetes

I have a service in Kubernetes that receives HTTP requests to create users.
With only 1 pod, it correctly handles 100 requests per minute; beyond that, latency appears. The point is: if 1 pod can handle 100 requests per minute, shouldn't 5 pods handle 500 requests per minute?
Even with 10 pods, when the load exceeds 100 requests per minute, it is not distributed correctly and latency appears in the service.
As I understand it, the default load balancing is round robin. The problem is that I see RAM increase in only one of the pods, so the load is not being distributed correctly.
This is my service's Deployment YAML and my HPA YAML.
Deploy Yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: create-user-service
  labels:
    app: create-user-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: create-user-service
  template:
    metadata:
      labels:
        app: create-user-service
    spec:
      volumes:
      - name: key
        secret:
          secretName: my-secret-key
      containers:
      ### [LISTPARTY CONTAINER]
      - name: create-user-service
        image: docker/user-create/create-user-service:0.0.1
        volumeMounts:
        - name: key
          mountPath: /var/secrets/key
        ports:
        - containerPort: 8080
        env:
        - name: PORT
          value: "8080"
        resources:
          limits:
            cpu: "2.5"
            memory: 6Gi
          requests:
            cpu: "1.5"
            memory: 5Gi
        livenessProbe: ## is healthy
          failureThreshold: 3
          httpGet:
            path: /healthcheck/livenessprobe
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: create-user-service
spec:
  ports:
  - port: 8080
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: create-user-service
  type: NodePort
HPA Yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: create-user-service
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: create-user-service
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 75
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 75
  - external:
      metricName: serviceruntime.googleapis.com|api|request_count
      metricSelector:
        matchLabels:
          resource.type: api
          resource.labels.service: create-user-service.endpoints.user-create.cloud.goog
      targetAverageValue: "3"
    type: External
What might be happening?
Thank you all.