Prometheus/Alertmanager - many alerts firing - Why?

I have a 4 node K8s cluster set up via kubeadm on a local VM cluster. I am using the following:
Kubernetes 1.24
Helm 3.10.0
kube-prometheus-stack Helm chart 41.7.4 (app version 0.60.1)
When I go into either Prometheus or Alertmanager, there are many alerts that are always firing. Another thing to note is that Alertmanager "cluster status" is reporting as "disabled". Not sure what bearing (if any) that may have on this. I have not added any new alerts of my own - everything was presumably deployed with the Helm chart.
I do not understand what these alerts are triggering for other than what I can glean from their names. It does not seem a good thing that these alerts should be firing. Either there is something seriously wrong with the cluster or something is poorly configured in the alerting configuration of the Helm chart. I'm leaning toward the second case, but will admit, I really don't know.
Here is a listing of the firing alerts, along with label info:
etcdMembersDown
alertname=etcdMembersDown, job=kube-etcd, namespace=kube-system, pod=etcd-gagnon-m1, service=prometheus-stack-kube-prom-kube-etcd, severity=critical
etcdInsufficientMembers
alertname=etcdInsufficientMembers, endpoint=http-metrics, job=kube-etcd, namespace=kube-system, pod=etcd-gagnon-m1, service=prometheus-stack-kube-prom-kube-etcd, severity=critical
TargetDown
alertname=TargetDown, job=kube-scheduler, namespace=kube-system, service=prometheus-stack-kube-prom-kube-scheduler, severity=warning
alertname=TargetDown, job=kube-etcd, namespace=kube-system, service=prometheus-stack-kube-prom-kube-etcd, severity=warning
alertname=TargetDown, job=kube-proxy, namespace=kube-system, service=prometheus-stack-kube-prom-kube-proxy, severity=warning
alertname=TargetDown, job=kube-controller-manager, namespace=kube-system, service=prometheus-stack-kube-prom-kube-controller-manager, severity=warning
KubePodNotReady
alertname=KubePodNotReady, namespace=monitoring, pod=prometheus-stack-grafana-759774797c-r44sb, severity=warning
KubeDeploymentReplicasMismatch
alertname=KubeDeploymentReplicasMismatch, container=kube-state-metrics, deployment=prometheus-stack-grafana, endpoint=http, instance=192.168.42.19:8080, job=kube-state-metrics, namespace=monitoring, pod=prometheus-stack-kube-state-metrics-848f74474d-gp6pw, service=prometheus-stack-kube-state-metrics, severity=warning
KubeControllerManagerDown
alertname=KubeControllerManagerDown, severity=critical
KubeProxyDown
alertname=KubeProxyDown, severity=critical
KubeSchedulerDown
alertname=KubeSchedulerDown, severity=critical
Here is my values.yaml:
defaultRules:
  create: true
  rules:
    alertmanager: true
    etcd: true
    configReloaders: true
    general: true
    k8s: true
    kubeApiserverAvailability: true
    kubeApiserverBurnrate: true
    kubeApiserverHistogram: true
    kubeApiserverSlos: true
    kubeControllerManager: true
    kubelet: true
    kubeProxy: true
    kubePrometheusGeneral: true
    kubePrometheusNodeRecording: true
    kubernetesApps: true
    kubernetesResources: true
    kubernetesStorage: true
    kubernetesSystem: true
    kubeSchedulerAlerting: true
    kubeSchedulerRecording: true
    kubeStateMetrics: true
    network: true
    node: true
    nodeExporterAlerting: true
    nodeExporterRecording: true
    prometheus: true
    prometheusOperator: true
prometheus:
  enabled: true
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - prometheus.<hidden>
    paths:
      - /
    pathType: ImplementationSpecific
grafana:
  enabled: true
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - grafana.<hidden>
    path: /
  persistence:
    enabled: true
    size: 10Gi
alertmanager:
  enabled: true
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - alerts.<hidden>
    paths:
      - /
    pathType: ImplementationSpecific
  config:
    global:
      slack_api_url: '<hidden>'
    route:
      receiver: "slack-default"
      group_by:
        - alertname
        - cluster
        - service
      group_wait: 30s
      group_interval: 5m # 5m
      repeat_interval: 2h # 4h
      routes:
        - receiver: "slack-warn-critical"
          matchers:
            - severity =~ "warning|critical"
          continue: true
    receivers:
      - name: "null"
      - name: "slack-default"
        slack_configs:
          - send_resolved: true # false
            channel: "#alerts-test"
      - name: "slack-warn-critical"
        slack_configs:
          - send_resolved: true # false
            channel: "#alerts-test"
kubeControllerManager:
  service:
    enabled: true
    ports:
      http: 10257
    targetPorts:
      http: 10257
  serviceMonitor:
    https: true
    insecureSkipVerify: "true"
kubeEtcd:
  serviceMonitor:
    scheme: https
    servername: <do I need it - don't know what this should be>
    cafile: <do I need it - don't know what this should be>
    certFile: <do I need it - don't know what this should be>
    keyFile: <do I need it - don't know what this should be>
kubeProxy:
  serviceMonitor:
    https: true
kubeScheduler:
  service:
    enabled: true
    ports:
      http: 10259
    targetPorts:
      http: 10259
  serviceMonitor:
    https: true
    insecureSkipVerify: "true"
Is there something wrong with this configuration? Are there any Kubernetes objects that might be missing or misconfigured? It seems very odd that one could install this Helm chart and experience this many "failures". Is there perhaps a major problem with my cluster? I would think that if something were really wrong with etcd, the kube-scheduler, or kube-proxy, I would experience problems everywhere, but I do not.
If there is any other information I can pull from the cluster or related artifacts that might help, let me know and I will include them.
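One pattern worth checking, offered as a hedged sketch rather than a confirmed diagnosis: kubeadm by default binds the metrics endpoints of kube-controller-manager, kube-scheduler, and etcd to 127.0.0.1, and kube-proxy's metrics endpoint defaults to 127.0.0.1:10249. Prometheus runs in a pod and cannot reach loopback addresses on the nodes, so TargetDown and the various "...Down" alerts can fire even though the components themselves are healthy. The fragment below shows the kubeadm settings that would change those bind addresses. It assumes the kubeadm v1beta3 config API (the one used by Kubernetes 1.24), and the port 2381 for etcd metrics is an illustrative choice, not something taken from the question.

```yaml
# Sketch only: kubeadm configuration that exposes control-plane metrics
# on all interfaces. On an existing cluster the same flags would have to be
# edited in the static pod manifests under /etc/kubernetes/manifests and in
# the kube-proxy ConfigMap in kube-system.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    bind-address: "0.0.0.0"
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
etcd:
  local:
    extraArgs:
      # Assumed port; this endpoint serves metrics over plain HTTP.
      listen-metrics-urls: "http://0.0.0.0:2381"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
metricsBindAddress: "0.0.0.0:10249"
```

Binding to 0.0.0.0 makes these endpoints reachable from the node network, so it is worth weighing against firewall rules first. Note also that kube-proxy serves its metrics over plain HTTP, so `kubeProxy.serviceMonitor.https: true` in the values above may itself keep that target down.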

Related

In Grafana I am getting a "400 Bad Request Client sent an HTTP request to an HTTPS server" when trying to update datasource configmaps

In Grafana I notice that when I deploy a configmap that should add a datasource it makes no change and does not add the new datasource - note that the configmap is in the cluster and in the correct namespace.
If I make a change to the configmap I get the following error if I look at the logs for the grafana-sc-datasources container:
POST request sent to http://localhost:3000/api/admin/provisioning/datasources/reload. Response: 400 Bad Request Client sent an HTTP request to an HTTPS server.
I assume I do not see any changes because the grafana-sc-datasources container cannot make the POST request.
I played around a bit, and at one point I did see changes being applied to the datasources:
I changed the protocol to http under grafana: / server: / protocol:. With that setting I was NOT able to open the Grafana website, but when I changed a datasource configmap in the cluster I saw a successful 200 message in the logs of the grafana-sc-datasources container: POST request sent to http://localhost:3000/api/admin/provisioning/datasources/reload. Response: 200 OK {"message":"Datasources config reloaded"}.
So I assume I just need to know how to get the POST request sent over https instead of http.
Can someone point me to what might be wrong and how to fix it?
Note that I am pretty new to K8s, Grafana, and Helm charts.
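A possible direction, sketched under assumptions: newer versions of the Grafana Helm chart expose the sidecar's reload endpoint as `sidecar.datasources.reloadURL`, which defaults to the http URL seen in the log above. If the chart version in use has that key, pointing it at https would look roughly like this. Whether the key exists in the chart version deployed here has not been verified.

```yaml
# Sketch only: assumes the chart version in use supports
# sidecar.datasources.reloadURL.
sidecar:
  datasources:
    enabled: true
    label: grafana_datasource
    # Same path as in the log line, with the scheme switched to https
    reloadURL: "https://localhost:3000/api/admin/provisioning/datasources/reload"
```

Because the certificate is issued for the public domain rather than localhost, the sidecar may then fail certificate verification. The k8s-sidecar image documents a `REQ_SKIP_TLS_VERIFY` environment variable for that case; how to set it through the chart depends on the chart version and is left open here.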
Here is a configmap that I am trying to get to work:
apiVersion: v1
kind: ConfigMap
metadata:
  name: jaeger-${NACKLE_ENV}-grafana-datasource
  labels:
    grafana_datasource: '1'
data:
  jaeger-datasource.yaml: |-
    apiVersion: 1
    datasources:
      - name: Jaeger-${NACKLE_ENV}
        type: jaeger
        access: browser
        url: http://jaeger-${NACKLE_ENV}-query.${NACKLE_ENV}.svc.cluster.local:16690
        version: 1
        basicAuth: false
Here is the current Grafana values file:
# use 1 replica when using a StatefulSet
# If we need more than 1 replica, then we'll have to:
# - remove the `persistence` section below
# - use an external database for all replicas to connect to (refer to Grafana Helm chart docs)
replicas: 1
image:
  pullSecrets:
    - docker-hub
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: eks.amazonaws.com/capacityType
              operator: In
              values:
                - ON_DEMAND
persistence:
  enabled: true
  type: statefulset
  storageClassName: biw-durable-gp2
podDisruptionBudget:
  maxUnavailable: 1
admin:
  existingSecret: grafana
sidecar:
  datasources:
    enabled: true
    label: grafana_datasource
  dashboards:
    enabled: true
    label: grafana_dashboard
    labelValue: 1
dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: 'default'
        orgId: 1
        folder: ''
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/default
dashboards:
  default:
    node-exporter:
      gnetId: 1860
      revision: 23
      datasource: Prometheus
    core-dns:
      gnetId: 12539
      revision: 5
      datasource: Prometheus
    fluentd:
      gnetId: 7752
      revision: 6
      datasource: Prometheus
ingress:
  apiVersion: networking.k8s.io/v1
  enabled: true
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
    alb.ingress.kubernetes.io/healthcheck-path: '/api/health'
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTPS
    alb.ingress.kubernetes.io/backend-protocol: HTTPS
    # Redirect to HTTPS at the ALB
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
  spec:
    rules:
      - http:
          paths:
            - path: /*
              pathType: ImplementationSpecific
              backend:
                service:
                  name: ssl-redirect
                  port:
                    name: use-annotation
    defaultBackend:
      service:
        name: grafana
        port:
          number: 80
livenessProbe: { "httpGet": { "path": "/api/health", "port": 3000, "scheme": "HTTPS" }, "initialDelaySeconds": 60, "timeoutSeconds": 30, "failureThreshold": 10 }
readinessProbe: { "httpGet": { "path": "/api/health", "port": 3000, "scheme": "HTTPS" } }
service:
  type: NodePort
  name: grafana
rolePrefix: app-role
env: eks-test
serviceAccount:
  name: grafana
  annotations:
    eks.amazonaws.com/role-arn: ""
pod:
  spec:
    serviceAccountName: grafana
grafana.ini:
  server:
    # don't use enforce_domain - it causes an infinite redirect in our setup
    # enforce_domain: true
    enable_gzip: true
    # NOTE - if I set the protocol to http I do see it make changes to datasources but I can not see the website
    protocol: https
    cert_file: /biw-cert/domain.crt
    cert_key: /biw-cert/domain.key
  users:
    auto_assign_org_role: Editor
  # https://grafana.com/docs/grafana/v6.5/auth/gitlab/
  auth.gitlab:
    enabled: true
    allow_sign_up: true
    org_role: Editor
    scopes: read_api
    auth_url: https://gitlab.biw-services.com/oauth/authorize
    token_url: https://gitlab.biw-services.com/oauth/token
    api_url: https://gitlab.biw-services.com/api/v4
    allowed_groups: nackle-teams/devops
securityContext:
  fsGroup: 472
  runAsUser: 472
  runAsGroup: 472
extraConfigmapMounts:
  - name: "cert-configmap"
    mountPath: "/biw-cert"
    subPath: ""
    configMap: biw-grafana-cert
    readOnly: true

Kiali ingress resource is not created after Kiali deployment

I deployed the kiali-operator Helm chart in the kiali-operator namespace and Istio in the istio-system namespace. Now I am trying to deploy the Kiali workload in the istio-system namespace.
However, the ingress resource was not created. I have included the Kiali deployment YAML below for reference.
apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  name: kiali
  namespace: istio-system
spec:
  istio_labels:
    app_label_name: "app.kubernetes.io/name"
  installation_tag: "kiali"
  istio_namespace: "istio-system"
  version: "default"
  auth:
    strategy: token
  custom_dashboards:
    - name: "envoy"
  deployment:
    accessible_namespaces: ["**"]
    ingress:
      # default: additional_labels is empty
      # additional_labels:
      #   ingressAdditionalLabel: "ingressAdditionalLabelValue"
      class_name: "nginx"
      # default: enabled is undefined
      enabled: true
      # default: override_yaml is undefined
      override_yaml:
        metadata:
          annotations:
            kubernetes.io/ingress.class: "nginx"
            nginx.ingress.kubernetes.io/secure-backends: "true"
            nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
        spec:
          rules:
            - http:
                paths:
                  - path: "/kiali"
                    pathType: Prefix
                    backend:
                      service:
                        name: "kiali"
                        port:
                          number: 20001
    instance_name: "kiali"
  external_services:
    custom_dashboards:
      enabled: true
    istio:
      component_status:
        components:
          - app_label: "istiod"
            is_core: true
            is_proxy: false
          - app_label: "istio-ingressgateway"
            is_core: true
            is_proxy: true
            # default: namespace is undefined
            namespace: istio-system
          - app_label: "istio-egressgateway"
            is_core: false
            is_proxy: true
            # default: namespace is undefined
            namespace: istio-system
        enabled: true
      config_map_name: "istio"
      envoy_admin_local_port: 15000
      # default: istio_canary_revision is undefined
      istio_canary_revision:
        current: "1-9-9"
        upgrade: "1-10-2"
      istio_identity_domain: "svc.cluster.local"
      istio_injection_annotation: "sidecar.istio.io/inject"
      istio_sidecar_annotation: "sidecar.istio.io/status"
      istio_sidecar_injector_config_map_name: "istio-sidecar-injector"
      istiod_deployment_name: "istiod"
      istiod_pod_monitoring_port: 15014
      root_namespace: ""
      url_service_version: ""
    prometheus:
      # Prometheus service name is "metrics" and is in the "telemetry" namespace
      url: "<prome_url>"
    grafana:
      auth:
        ca_file: ""
        insecure_skip_verify: false
        password: "password"
        token: ""
        type: "basic"
        use_kiali_token: false
        username: "user"
      enabled: true
      # Grafana service name is "grafana" and is in the "telemetry" namespace.
      in_cluster_url: '<grafana_url>'
      url: '<grafana_url>'
    tracing:
      enabled: true
      in_cluster_url: '<jaeger-url>'
      use_grpc: true
Thanks in advance for any help!

How to connect opensearch dashboards to SSO AzureAD

I'm trying to have SSO in opensearch-dashboards via openid to AzureAD.
Overall, there is no need for encrypted communication between the OpenSearch nodes, and no need for encrypted communication between Dashboards and the master pod. All I need is working SSO to Azure AD to see the dashboards.
I get errors in the dashboards pod like: "res":{"statusCode":302,"responseTime":746,"contentLength":9} and "tags":["error","plugins","securityDashboards"],"pid":1,"message":"OpenId authentication failed: Error: [index_not_found_exception] no such index [_plugins], with { index=\"_plugins\" & resource.id=\"_plugins\" & resource.type=\"index_expression\" & index_uuid=\"_na_\" }"} and the browser tells me "The page isn't redirecting properly".
On my last try, the ingress pod logged the error: Service "default/opensearch-values-opensearch-dashboards" does not have any active Endpoint.
I would really appreciate any advice on what I am missing.
I use the Helm installation of OpenSearch on AWS EKS (with the nginx ingress controller to publish the address).
In AD I have an app registered with a URL like https://<some_address>/auth/openid/login
Here are my actual helm values:
opensearch.yaml
---
clusterName: "opensearch-cluster"
nodeGroup: "master"
masterService: "opensearch-cluster-master"
roles:
- master
- ingest
- data
- remote_cluster_client
replicas: 3
minimumMasterNodes: 1
majorVersion: ""
global:
dockerRegistry: "<registry>"
opensearchHome: /usr/share/opensearch
config:
log4j2.properties: |
rootLogger.level = debug
opensearch.yml: |
cluster.name: opensearch-cluster
network.host: 0.0.0.0
plugins.security.disabled: true
plugins:
security:
ssl:
transport:
pemcert_filepath: esnode.pem
pemkey_filepath: esnode-key.pem
pemtrustedcas_filepath: root-ca.pem
enforce_hostname_verification: false
http:
enabled: false
pemcert_filepath: esnode.pem
pemkey_filepath: esnode-key.pem
pemtrustedcas_filepath: root-ca.pem
allow_unsafe_democertificates: true
allow_default_init_securityindex: true
authcz:
admin_dn:
- CN=kirk,OU=client,O=client,L=test,C=de
audit.type: internal_opensearch
enable_snapshot_restore_privilege: true
check_snapshot_restore_write_privileges: true
restapi:
roles_enabled: ["all_access", "security_rest_api_access"]
system_indices:
enabled: true
indices:
[
".opendistro-alerting-config",
".opendistro-alerting-alert*",
".opendistro-anomaly-results*",
".opendistro-anomaly-detector*",
".opendistro-anomaly-checkpoints",
".opendistro-anomaly-detection-state",
".opendistro-reports-*",
".opendistro-notifications-*",
".opendistro-notebooks",
".opendistro-asynchronous-search-response*",
]
extraEnvs: []
envFrom: []
secretMounts: []
hostAliases: []
image:
repository: "opensearchproject/opensearch"
tag: ""
pullPolicy: "IfNotPresent"
podAnnotations: {}
labels: {}
opensearchJavaOpts: "-Xmx512M -Xms512M"
resources:
requests:
cpu: "1000m"
memory: "100Mi"
initResources: {}
sidecarResources: {}
networkHost: "0.0.0.0"
rbac:
create: false
serviceAccountAnnotations: {}
serviceAccountName: ""
podSecurityPolicy:
create: false
name: ""
spec:
privileged: true
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
- emptyDir
persistence:
enabled: true
enableInitChown: true
labels:
enabled: false
accessModes:
- ReadWriteOnce
size: 8Gi
annotations: {}
extraVolumes: []
extraVolumeMounts: []
extraContainers: []
extraInitContainers:
- name: sysctl
image: docker.io/bitnami/bitnami-shell:10-debian-10-r199
imagePullPolicy: "IfNotPresent"
command:
- /bin/bash
- -ec
- |
CURRENT=`sysctl -n vm.max_map_count`;
DESIRED="262144";
if [ "$DESIRED" -gt "$CURRENT" ]; then
sysctl -w vm.max_map_count=262144;
fi;
CURRENT=`sysctl -n fs.file-max`;
DESIRED="65536";
if [ "$DESIRED" -gt "$CURRENT" ]; then
sysctl -w fs.file-max=65536;
fi;
securityContext:
privileged: true
priorityClassName: ""
antiAffinityTopologyKey: "kubernetes.io/hostname"
antiAffinity: "soft"
nodeAffinity: {}
topologySpreadConstraints: []
podManagementPolicy: "Parallel"
enableServiceLinks: true
protocol: http
httpPort: 9200
transportPort: 9300
service:
labels: {}
labelsHeadless: {}
headless:
annotations: {}
type: ClusterIP
nodePort: ""
annotations: {}
httpPortName: http
transportPortName: transport
loadBalancerIP: ""
loadBalancerSourceRanges: []
externalTrafficPolicy: ""
updateStrategy: RollingUpdate
maxUnavailable: 1
podSecurityContext:
fsGroup: 1000
runAsUser: 1000
securityContext:
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1000
securityConfig:
enabled: true
path: "/usr/share/opensearch/plugins/opensearch-security/securityconfig"
actionGroupsSecret:
configSecret:
internalUsersSecret:
rolesSecret:
rolesMappingSecret:
tenantsSecret:
config:
securityConfigSecret: ""
dataComplete: true
data:
config.yml: |-
config:
dynamic:
authc:
basic_internal_auth_domain:
description: "Authenticate via HTTP Basic"
http_enabled: true
transport_enabled: true
order: 1
http_authenticator:
type: "basic"
challenge: false
authentication_backend:
type: "internal"
openid_auth_domain:
order: 0
http_enabled: true
transport_enabled: true
http_authenticator:
type: openid
challenge: false
config:
enable_ssl: true
verify_hostnames: false
subject_key: preferred_username
roles_key: roles
openid_connect_url: https://login.microsoftonline.com/<ms_id>/v2.0/.well-known/openid-configuration
authentication_backend:
type: noop
roles_mapping.yml: |-
all_access:
reserved: false
backend_roles:
- "admin"
description: "Maps admin to all_access"
terminationGracePeriod: 120
sysctlVmMaxMapCount: 262144
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 2000
schedulerName: ""
imagePullSecrets:
- name: regcred
nodeSelector: {}
tolerations: []
ingress:
enabled: false
annotations: {}
path: /
hosts:
- chart-example.local
tls: []
nameOverride: ""
fullnameOverride: ""
masterTerminationFix: false
lifecycle: {}
keystore: []
networkPolicy:
create: false
http:
enabled: false
fsGroup: ""
sysctl:
enabled: false
plugins:
enabled: false
installList: []
extraObjects: []
opensearch-dashboards.yaml
---
opensearchHosts: "http://opensearch-cluster-master:9200"
replicaCount: 1
image:
repository: "<registry>"
tag: "1.3.1"
pullPolicy: "IfNotPresent"
imagePullSecrets:
- name: regcred
nameOverride: ""
fullnameOverride: ""
serviceAccount:
create: true
annotations: {}
name: ""
rbac:
create: true
secretMounts: []
podAnnotations: {}
extraEnvs: []
envFrom: []
extraVolumes: []
extraVolumeMounts: []
extraInitContainers: ""
extraContainers: ""
podSecurityContext: {}
securityContext:
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1000
config:
opensearch_dashboards.yml: |
opensearch_security.cookie.secure: false
opensearch_security.auth.type: openid
opensearch_security.openid.client_id: <client_id>
opensearch_security.openid.client_secret: <client_secret>
opensearch_security.openid.base_redirect_url: https://<some_aws_id>.elb.amazonaws.com
opensearch_security.openid.connect_url: https://login.microsoftonline.com/<MS id>/v2.0/.well-known/openid-configuration
priorityClassName: ""
opensearchAccount:
secret: ""
keyPassphrase:
enabled: false
labels: {}
hostAliases: []
serverHost: "0.0.0.0"
service:
type: ClusterIP
port: 5601
loadBalancerIP: ""
nodePort: ""
labels: {}
annotations: {}
loadBalancerSourceRanges: []
httpPortName: http
ingress:
enabled: false
annotations: {}
hosts:
- host: chart-example.local
paths:
- path: /
backend:
serviceName: chart-example.local
servicePort: 80
tls: []
resources:
requests:
cpu: "100m"
memory: "512M"
limits:
cpu: "100m"
memory: "512M"
autoscaling:
enabled: false
minReplicas: 1
maxReplicas: 10
targetCPUUtilizationPercentage: 80
updateStrategy:
type: "Recreate"
nodeSelector: {}
tolerations: []
affinity: {}
extraObjects: []
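One thing that stands out in the values above, noted as a hedged observation rather than a confirmed fix: opensearch.yml sets `plugins.security.disabled: true`, while Dashboards is configured with `opensearch_security.auth.type: openid`. The Dashboards security plugin talks to REST endpoints under `_plugins/_security` on the cluster, and with the security plugin disabled those routes are not registered, so OpenSearch may treat `_plugins` as an index name. That would be consistent with the "no such index [_plugins]" error. A minimal sketch of the change, assuming the rest of opensearch.yml stays as posted:

```yaml
# Sketch only: re-enable the security plugin so the OpenID auth domain
# defined in securityConfig can actually be used.
config:
  opensearch.yml: |
    cluster.name: opensearch-cluster
    network.host: 0.0.0.0
    # Assumption: OpenID login needs the security plugin active
    plugins.security.disabled: false
```

Enabling the plugin brings back its requirement for TLS on the transport layer, so the `plugins.security.ssl.transport` certificate settings already present in the file would need to point at real certificates (or the demo ones) for the nodes to start.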

UI 404 - Vault Kubernetes

I'm testing out Vault in Kubernetes and am installing it via the Helm chart. I've created an overrides file that is an amalgamation of a few different pages from the official docs.
The pods seem to come up OK and reach Ready status, and I can unseal Vault manually using 3 of the generated keys. However, I get a 404 when browsing to the UI, which is exposed externally on a Load Balancer in AKS. Here's my config:
global:
  enabled: true
  tlsDisable: false
injector:
  enabled: false
server:
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
  # livenessProbe:
  #   enabled: true
  #   path: "/v1/sys/health?standbyok=true"
  #   initialDelaySeconds: 60
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-server-tls/vault.ca
  extraVolumes:
    - type: secret
      name: vault-server-tls # Matches the ${SECRET_NAME} from above
  standalone:
    enabled: true
    config: |
      listener "tcp" {
        address = "[::]:8200"
        cluster_address = "[::]:8201"
        tls_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
        tls_key_file = "/vault/userconfig/vault-server-tls/vault.key"
        tls_client_ca_file = "/vault/userconfig/vault-server-tls/vault.ca"
      }
      storage "file" {
        path = "/vault/data"
      }
# Vault UI
ui:
  enabled: true
  serviceType: "LoadBalancer"
  serviceNodePort: null
  externalPort: 443
  # For Added Security, edit the below
  # loadBalancerSourceRanges:
  #   5.69.25.6/32
I'm still trying to get to grips with Vault. My liveness probe is commented out because it was permanently failing and causing the pod to be re-scheduled, even though checking the vault service status it appeared to be healthy and awaiting an unseal. That's a side issue though compared to the UI, just mentioning in case the failing liveness is related.
Thanks!

So, I don't think the documentation around deploying in Kubernetes from Helm is really that clear but I was basically missing a ui = true flag from the HCL config stanza. It's to be noted that this is in addition to the value passed to the helm chart:
# Vault UI
ui:
  enabled: true
  serviceType: "LoadBalancer"
  serviceNodePort: null
  externalPort: 443
Which I had mistakenly assumed was enough to enable the UI.
Here's the config now, with working UI:
global:
  enabled: true
  tlsDisable: false
injector:
  enabled: false
server:
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-server-tls/vault.ca
  extraVolumes:
    - type: secret
      name: vault-server-tls # Matches the ${SECRET_NAME} from above
  standalone:
    enabled: true
    config: |
      ui = true
      listener "tcp" {
        address = "[::]:8200"
        cluster_address = "[::]:8201"
        tls_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
        tls_key_file = "/vault/userconfig/vault-server-tls/vault.key"
        tls_client_ca_file = "/vault/userconfig/vault-server-tls/vault.ca"
      }
      storage "file" {
        path = "/vault/data"
      }
# Vault UI
ui:
  enabled: true
  serviceType: "LoadBalancer"
  serviceNodePort: null
  externalPort: 443

Error extracting container id - source value does not contain matcher's logs_path '/var/lib/docker/containers/'

I am collecting container logs using Filebeat in a Kubernetes cluster, and the collected log now shows this error:
2020-06-10T09:26:35.831Z ERROR [kubernetes] add_kubernetes_metadata/matchers.go:91 Error extracting container id - source value does not contain matcher's logs_path '/var/lib/docker/containers/'.
This is the full log output:
I found that Filebeat was running on the node meowk8sslave2, logged in to that node, and found that the path exists. Why does this error happen? This is my Filebeat config:
{
  "filebeat.yml": "filebeat.inputs:
- type: container
  paths:
    - /var/log/containers/*.log
  processors:
  - add_kubernetes_metadata:
      host: ${NODE_NAME}
      matchers:
      - logs_path:
          logs_path: \"/var/log/containers/\"
output.elasticsearch:
  host: '${NODE_NAME}'
  hosts: '${ELASTICSEARCH_HOSTS:elasticsearch-master:9200}'
"
}

Check inside your Filebeat pod to see exactly where the logs are made available.
I was testing the ELK stack on Minikube.
In my case they were under /var/lib/docker/containers/*/*.log
So this config worked for me:
filebeatConfig:
  filebeat.yml: |
    filebeat.inputs:
    - type: container
      paths:
        - /var/lib/docker/containers/*/*.log
      processors:
      - add_kubernetes_metadata:
          host: ${NODE_NAME}
          matchers:
          - logs_path:
              logs_path: "/var/lib/docker/containers/"
    output.elasticsearch:
      host: '${NODE_NAME}'
      hosts: '${ELASTICSEARCH_HOSTS:elasticsearch-master:9200}'
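Which path is visible inside the pod depends on what the DaemonSet mounts from the host. As a rough sketch modeled on typical Filebeat DaemonSet manifests (the volume names here are illustrative, not taken from the question), both directories are usually mounted, because on Docker-based nodes the files under /var/log/containers are symlinks that ultimately resolve into /var/lib/docker/containers:

```yaml
# Sketch only: pod spec fragment for a Filebeat DaemonSet.
# Volume names are hypothetical.
containers:
  - name: filebeat
    volumeMounts:
      - name: varlog
        mountPath: /var/log
        readOnly: true
      - name: varlibdockercontainers
        mountPath: /var/lib/docker/containers
        readOnly: true
volumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: varlibdockercontainers
    hostPath:
      path: /var/lib/docker/containers
```

Whichever directory the input's `paths` point at, the `logs_path` matcher needs to name the same directory, since the matcher extracts the container ID from the source path of each log file.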

Change
filebeat.inputs:
- type: container
  paths:
    - /var/log/containers/*.log
  processors:
    - add_kubernetes_metadata:
        host: ${NODE_NAME}
        matchers:
        - logs_path:
            logs_path: "/var/log/containers/"
# To enable hints based autodiscover, remove `filebeat.inputs` configuration and uncomment this:
# filebeat.autodiscover:
#   providers:
#     - type: kubernetes
#       node: ${NODE_NAME}
#       hints.enabled: true
#       hints.default_config:
#         type: container
#         paths:
#           - /var/log/containers/*${data.kubernetes.container.id}.log
to
# filebeat.inputs:
# - type: container
#   paths:
#     - /var/log/containers/*.log
#   processors:
#     - add_kubernetes_metadata:
#         host: ${NODE_NAME}
#         matchers:
#         - logs_path:
#             logs_path: "/var/log/containers/"
# To enable hints based autodiscover, remove `filebeat.inputs` configuration and uncomment this:
filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log
Reference: https://discuss.elastic.co/t/problem-to-update-to-filebeat-7-7-0-and-parser-nginx-ingress-controller-on-kubernetes/232461/2

This works for me:
Update the processors
from:
processors:
  - add_cloud_metadata: ~
  - add_kubernetes_metadata:
      in_cluster: true
  - drop_event.when.regexp.message: "kube-probe"
to:
processors:
  - add_cloud_metadata: ~
  - add_kubernetes_metadata:
      in_cluster: true
      host: ${NODE_NAME}
      matchers:
      - logs_path:
          logs_path: "/var/log/containers/"
  - drop_event.when.regexp.message: "kube-probe"
You may also need to update your nginx module log path to:
/var/log/containers/*-${data.kubernetes.container.id}.log
