hey guys i’m trying to setup datadog as custom metric for my kubernetes hpa using the official guide:
https://docs.datadoghq.com/agent/cluster_agent/external_metrics/?tab=helm
running on EKS 1.18 & Datadog Cluster Agent (v1.10.0).
the problem is that i can't get the external metrics's for my HPA:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: hibob-hpa
spec:
minReplicas: 1
maxReplicas: 5
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: something
metrics:
- type: External
external:
metricName: **kubernetes_state.container.cpu_limit**
metricSelector:
matchLabels:
pod: **something-54c4bd4db7-pm9q5**
targetAverageValue: 9
horizontal-pod-autoscaler unable to get external metric:
canary/nginx.net.request_per_s/&LabelSelector{MatchLabels:map[string]string{kube_app_name: nginx,},MatchExpressions:[]LabelSelectorRequirement{},}: unable to fetch metrics from external metrics API: the server is currently unable to handle the request (get nginx.net.request_per_s.external.metrics.k8s.io)
This is the errors i'm getting inside the cluster-agent:
datadog-cluster-agent-585897dc8d-x8l82 cluster-agent 2021-08-20 06:46:14 UTC | CLUSTER | ERROR | (pkg/clusteragent/externalmetrics/metrics_retriever.go:77 in retrieveMetricsValues) | Unable to fetch external metrics: [Error while executing metric query avg:nginx.net.request_per_s{kubea_app_name:ingress-nginx}.rollup(30): API error 403 Forbidden: {"status":********#datadoghq.com"}, strconv.Atoi: parsing "": invalid syntax]
# datadog-cluster-agent status
Getting the status from the agent.
2021-08-19 15:28:21 UTC | CLUSTER | WARN | (pkg/util/log/log.go:541 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
===============================
Datadog Cluster Agent (v1.10.0)
===============================
Status date: 2021-08-19 15:28:21.519850 UTC
Agent start: 2021-08-19 12:11:44.266244 UTC
Pid: 1
Go Version: go1.14.12
Build arch: amd64
Agent flavor: cluster_agent
Check Runners: 4
Log Level: INFO
Paths
=====
Config File: /etc/datadog-agent/datadog-cluster.yaml
conf.d: /etc/datadog-agent/conf.d
Clocks
======
System UTC time: 2021-08-19 15:28:21.519850 UTC
Hostnames
=========
ec2-hostname: ip-10-30-162-8.eu-west-1.compute.internal
hostname: i-00d0458844a597dec
instance-id: i-00d0458844a597dec
socket-fqdn: datadog-cluster-agent-585897dc8d-x8l82
socket-hostname: datadog-cluster-agent-585897dc8d-x8l82
hostname provider: aws
unused hostname providers:
configuration/environment: hostname is empty
gce: unable to retrieve hostname from GCE: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname
Metadata
========
Leader Election
===============
Leader Election Status: Running
Leader Name is: datadog-cluster-agent-585897dc8d-x8l82
Last Acquisition of the lease: Thu, 19 Aug 2021 12:13:14 UTC
Renewed leadership: Thu, 19 Aug 2021 15:28:07 UTC
Number of leader transitions: 17 transitions
Custom Metrics Server
=====================
External metrics provider uses DatadogMetric - Check status directly from Kubernetes with: `kubectl get datadogmetric`
Admission Controller
====================
Disabled: The admission controller is not enabled on the Cluster Agent
=========
Collector
=========
Running Checks
==============
kubernetes_apiserver
--------------------
Instance ID: kubernetes_apiserver [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/kubernetes_apiserver.d/conf.yaml.default
Total Runs: 787
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 0, Total: 660
Service Checks: Last Run: 3, Total: 2,343
Average Execution Time : 1.898s
Last Execution Date : 2021-08-19 15:28:17.000000 UTC
Last Successful Execution Date : 2021-08-19 15:28:17.000000 UTC
=========
Forwarder
=========
Transactions
============
Deployments: 350
Dropped: 0
DroppedOnInput: 0
Nodes: 497
Pods: 3
ReplicaSets: 576
Requeued: 0
Retried: 0
RetryQueueSize: 0
Services: 263
Transaction Successes
=====================
Total number: 3442
Successes By Endpoint:
check_run_v1: 786
intake: 181
orchestrator: 1,689
series_v1: 786
==========
Endpoints
==========
https://app.datadoghq.eu - API Key ending with:
- f295b
=====================
Orchestrator Explorer
=====================
ClusterID: f7b4f97a-3cf2-11ea-aaa8-0a158f39909c
ClusterName: production
ContainerScrubbing: Enabled
======================
Orchestrator Endpoints
======================
===============
Forwarder Stats
===============
Pods: 3
Deployments: 350
ReplicaSets: 576
Services: 263
Nodes: 497
===========
Cache Stats
===========
Elements in the cache: 393
Pods:
Last Run: (Hits: 0 Miss: 0) | Total: (Hits: 7 Miss: 5)
Deployments:
Last Run: (Hits: 36 Miss: 1) | Total: (Hits: 40846 Miss: 2444)
ReplicaSets:
Last Run: (Hits: 297 Miss: 1) | Total: (Hits: 328997 Miss: 19441)
Services:
Last Run: (Hits: 44 Miss: 0) | Total: (Hits: 49520 Miss: 2919)
Nodes:
Last Run: (Hits: 9 Miss: 0) | Total: (Hits: 10171 Miss: 755)```
and this is what i get from datadogmetric:
Name: dcaautogen-2f116f4425658dca91a33dd22a3d943bae5b74
Namespace: datadog
Labels: <none>
Annotations: <none>
API Version: datadoghq.com/v1alpha1
Kind: DatadogMetric
Metadata:
Creation Timestamp: 2021-08-19T15:14:14Z
Generation: 1
Managed Fields:
API Version: datadoghq.com/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:spec:
f:status:
.:
f:autoscalerReferences:
f:conditions:
.:
k:{"type":"Active"}:
.:
f:lastTransitionTime:
f:lastUpdateTime:
f:status:
f:type:
k:{"type":"Error"}:
.:
f:lastTransitionTime:
f:lastUpdateTime:
f:message:
f:reason:
f:status:
f:type:
k:{"type":"Updated"}:
.:
f:lastTransitionTime:
f:lastUpdateTime:
f:status:
f:type:
k:{"type":"Valid"}:
.:
f:lastTransitionTime:
f:lastUpdateTime:
f:status:
f:type:
f:currentValue:
Manager: datadog-cluster-agent
Operation: Update
Time: 2021-08-19T15:14:44Z
Resource Version: 164942235
Self Link: /apis/datadoghq.com/v1alpha1/namespaces/datadog/datadogmetrics/dcaautogen-2f116f4425658dca91a33dd22a3d943bae5b74
UID: 6e9919eb-19ca-4131-b079-4a8a9ac577bb
Spec:
External Metric Name: nginx.net.request_per_s
Query: avg:nginx.net.request_per_s{kube_app_name:nginx}.rollup(30)
Status:
Autoscaler References: canary/hibob-hpa
Conditions:
Last Transition Time: 2021-08-19T15:14:14Z
Last Update Time: 2021-08-19T15:53:14Z
Status: True
Type: Active
Last Transition Time: 2021-08-19T15:14:14Z
Last Update Time: 2021-08-19T15:53:14Z
Status: False
Type: Valid
Last Transition Time: 2021-08-19T15:14:14Z
Last Update Time: 2021-08-19T15:53:14Z
Status: True
Type: Updated
Last Transition Time: 2021-08-19T15:14:44Z
Last Update Time: 2021-08-19T15:53:14Z
Message: Global error (all queries) from backend
Reason: Unable to fetch data from Datadog
Status: True
Type: Error
Current Value: 0
Events: <none>
this is my cluster agent deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "18"
meta.helm.sh/release-name: datadog
meta.helm.sh/release-namespace: datadog
creationTimestamp: "2021-02-05T07:36:39Z"
generation: 18
labels:
app.kubernetes.io/instance: datadog
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: datadog
app.kubernetes.io/version: "7"
helm.sh/chart: datadog-2.7.0
name: datadog-cluster-agent
namespace: datadog
resourceVersion: "164881216"
selfLink: /apis/apps/v1/namespaces/datadog/deployments/datadog-cluster-agent
uid: ec52bb4b-62af-4007-9bab-d5d16c48e02c
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: datadog-cluster-agent
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
annotations:
ad.datadoghq.com/cluster-agent.check_names: '["prometheus"]'
ad.datadoghq.com/cluster-agent.init_configs: '[{}]'
ad.datadoghq.com/cluster-agent.instances: |
[{
"prometheus_url": "http://%%host%%:5000/metrics",
"namespace": "datadog.cluster_agent",
"metrics": [
"go_goroutines", "go_memstats_*", "process_*",
"api_requests",
"datadog_requests", "external_metrics", "rate_limit_queries_*",
"cluster_checks_*"
]
}]
checksum/api_key: something
checksum/application_key: something
checksum/clusteragent_token: something
checksum/install_info: something
creationTimestamp: null
labels:
app: datadog-cluster-agent
name: datadog-cluster-agent
spec:
containers:
- env:
- name: DD_HEALTH_PORT
value: "5555"
- name: DD_API_KEY
valueFrom:
secretKeyRef:
key: api-key
name: datadog
optional: true
- name: DD_APP_KEY
valueFrom:
secretKeyRef:
key: app-key
name: datadog-appkey
- name: DD_EXTERNAL_METRICS_PROVIDER_ENABLED
value: "true"
- name: DD_EXTERNAL_METRICS_PROVIDER_PORT
value: "8443"
- name: DD_EXTERNAL_METRICS_PROVIDER_WPA_CONTROLLER
value: "false"
- name: DD_EXTERNAL_METRICS_PROVIDER_USE_DATADOGMETRIC_CRD
value: "true"
- name: DD_EXTERNAL_METRICS_AGGREGATOR
value: avg
- name: DD_CLUSTER_NAME
value: production
- name: DD_SITE
value: datadoghq.eu
- name: DD_LOG_LEVEL
value: INFO
- name: DD_LEADER_ELECTION
value: "true"
- name: DD_COLLECT_KUBERNETES_EVENTS
value: "true"
- name: DD_CLUSTER_AGENT_KUBERNETES_SERVICE_NAME
value: datadog-cluster-agent
- name: DD_CLUSTER_AGENT_AUTH_TOKEN
valueFrom:
secretKeyRef:
key: token
name: datadog-cluster-agent
- name: DD_KUBE_RESOURCES_NAMESPACE
value: datadog
- name: DD_ORCHESTRATOR_EXPLORER_ENABLED
value: "true"
- name: DD_ORCHESTRATOR_EXPLORER_CONTAINER_SCRUBBING_ENABLED
value: "true"
- name: DD_COMPLIANCE_CONFIG_ENABLED
value: "false"
image: gcr.io/datadoghq/cluster-agent:1.10.0
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 6
httpGet:
path: /live
port: 5555
scheme: HTTP
initialDelaySeconds: 15
periodSeconds: 15
successThreshold: 1
timeoutSeconds: 5
name: cluster-agent
ports:
- containerPort: 5005
name: agentport
protocol: TCP
- containerPort: 8443
name: metricsapi
protocol: TCP
readinessProbe:
failureThreshold: 6
httpGet:
path: /ready
port: 5555
scheme: HTTP
initialDelaySeconds: 15
periodSeconds: 15
successThreshold: 1
timeoutSeconds: 5
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/datadog-agent/install_info
name: installinfo
readOnly: true
subPath: install_info
dnsConfig:
options:
- name: ndots
value: "3"
dnsPolicy: ClusterFirst
nodeSelector:
kubernetes.io/os: linux
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: datadog-cluster-agent
serviceAccountName: datadog-cluster-agent
terminationGracePeriodSeconds: 30
volumes:
- configMap:
defaultMode: 420
name: datadog-installinfo
name: installinfo
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2021-05-13T15:46:33Z"
lastUpdateTime: "2021-05-13T15:46:33Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: "2021-02-05T07:36:39Z"
lastUpdateTime: "2021-08-19T12:12:06Z"
message: ReplicaSet "datadog-cluster-agent-585897dc8d" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 18
readyReplicas: 1
replicas: 1
updatedReplicas: 1
For the record i got this sorted.
According to the helm default values file you must set the app key in order to use metrics provider:
# datadog.appKey -- Datadog APP key required to use metricsProvider
## If you are using clusterAgent.metricsProvider.enabled = true, you must set
## a Datadog application key for read access to your metrics.
appKey: # <DATADOG_APP_KEY>
I guess this is a lack of information in the docs and also a check that is missing at the cluster-agent startup. Going to open an issue about it.
From the official documentation on troubleshooting the agent here, you have:
If you see the following error when describing the HPA manifest:
Warning FailedComputeMetricsReplicas 3s (x2 over 33s) horizontal-pod-autoscaler failed to get nginx.net.request_per_s external metric: unable to get external metric default/nginx.net.request_per_s/&LabelSelector{MatchLabels:map[string]string{kube_container_name: nginx,},MatchExpressions:[],}: unable to fetch metrics from external metrics API: the server is currently unable to handle the request (get nginx.net.request_per_s.external.metrics.k8s.io)
Make sure the Datadog Cluster Agent is running, and the service exposing the port 8443, whose name is registered in the APIService, is up.
I believe the key phrase here is whose name is registered in the APIService. Did you perform the API Service registration for your external metrics service? This source should provide some details on how to set it up. Since you're getting 403 - Unauthorized errors, it simply implies the TLS setup is causing issues.
Perhaps you can follow the guide in general and ensure that your node-agent is functioning correctly and has token environment variable correctly configured.
Related
I'd like to be able to provision alerts, and try to follow this instructions, but no luck!
OS Grafana version: 9.2.0 Running in Kubernetes
Steps that I take:
Created a new alert rule from UI.
Extract alert rule from API:
curl -k https://<my-grafana-url>/api/v1/provisioning/alert-rules/-4pMuQFVk -u admin:<my-admin-password>
It returns the following:
---
id: 18
uid: "-4pMuQFVk"
orgID: 1
folderUID: 3i72aQKVk
ruleGroup: cpu_alert_group
title: my_cpu_alert
condition: B
data:
- refId: A
queryType: ''
relativeTimeRange:
from: 600
to: 0
datasourceUid: _SaubQF4k
model:
editorMode: code
expr: system_cpu_usage
hide: false
intervalMs: 1000
legendFormat: __auto
maxDataPoints: 43200
range: true
refId: A
- refId: B
queryType: ''
relativeTimeRange:
from: 0
to: 0
datasourceUid: "-100"
model:
conditions:
- evaluator:
params:
- 3
type: gt
operator:
type: and
query:
params:
- A
reducer:
params: []
type: last
type: query
datasource:
type: __expr__
uid: "-100"
expression: A
hide: false
intervalMs: 1000
maxDataPoints: 43200
refId: B
type: classic_conditions
updated: '2022-12-07T20:01:47Z'
noDataState: NoData
execErrState: Error
for: 5m
Deleted the alert rule from UI.
Made a configmap from above alert-rule, as such:
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-alerting
data:
alerting.yaml: |-
apiVersion: 1
groups:
- id: 18
uid: "-4pMuQFVk"
orgID: 1
folderUID: 3i72aQKVk
ruleGroup: cpu_alert_group
title: my_cpu_alert
condition: B
data:
- refId: A
queryType: ''
relativeTimeRange:
from: 600
to: 0
datasourceUid: _SaubQF4k
model:
editorMode: code
expr: system_cpu_usage
hide: false
intervalMs: 1000
legendFormat: __auto
maxDataPoints: 43200
range: true
refId: A
- refId: B
queryType: ''
relativeTimeRange:
from: 0
to: 0
datasourceUid: "-100"
model:
conditions:
- evaluator:
params:
- 3
type: gt
operator:
type: and
query:
params:
- A
reducer:
params: []
type: last
type: query
datasource:
type: __expr__
uid: "-100"
expression: A
hide: false
intervalMs: 1000
maxDataPoints: 43200
refId: B
type: classic_conditions
updated: '2022-12-07T20:01:47Z'
noDataState: NoData
execErrState: Error
for: 5m
I mount above configmap in grafana container (in /etc/grafna/provisioning/alerting).
The full manifest of the deployment is as follow:
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "2"
meta.helm.sh/release-name: grafana
meta.helm.sh/release-namespace: monitoring
creationTimestamp: "2022-12-08T18:31:30Z"
generation: 4
labels:
app.kubernetes.io/instance: grafana
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: grafana
app.kubernetes.io/version: 9.3.0
helm.sh/chart: grafana-6.46.1
name: grafana
namespace: monitoring
resourceVersion: "648617"
uid: dc06b802-5281-4f31-a2b2-fef3cf53a70b
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/instance: grafana
app.kubernetes.io/name: grafana
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
checksum/config: 98cac51656714db48a116d3109994ee48c401b138bc8459540e1a497f994d197
checksum/dashboards-json-config: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
checksum/sc-dashboard-provider-config: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
checksum/secret: 203f0e4d883461bdd41fe68515fc47f679722dc2fdda49b584209d1d288a5f07
creationTimestamp: null
labels:
app.kubernetes.io/instance: grafana
app.kubernetes.io/name: grafana
spec:
automountServiceAccountToken: true
containers:
- env:
- name: GF_SECURITY_ADMIN_USER
valueFrom:
secretKeyRef:
key: admin-user
name: grafana
- name: GF_SECURITY_ADMIN_PASSWORD
valueFrom:
secretKeyRef:
key: admin-password
name: grafana
- name: GF_PATHS_DATA
value: /var/lib/grafana/
- name: GF_PATHS_LOGS
value: /var/log/grafana
- name: GF_PATHS_PLUGINS
value: /var/lib/grafana/plugins
- name: GF_PATHS_PROVISIONING
value: /etc/grafana/provisioning
image: grafana/grafana:9.3.0
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 10
httpGet:
path: /api/health
port: 3000
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 30
name: grafana
ports:
- containerPort: 3000
name: grafana
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /api/health
port: 3000
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/grafana/grafana.ini
name: config
subPath: grafana.ini
- mountPath: /etc/grafana/provisioning/alerting
name: grafana-alerting
- mountPath: /var/lib/grafana
name: storage
dnsPolicy: ClusterFirst
enableServiceLinks: true
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 472
runAsGroup: 472
runAsUser: 472
serviceAccount: grafana
serviceAccountName: grafana
terminationGracePeriodSeconds: 30
volumes:
- configMap:
defaultMode: 420
name: grafana-alerting
name: grafana-alerting
- configMap:
defaultMode: 420
name: grafana
name: config
- emptyDir: {}
name: storage
However, grafana fail to start with this error:
Failed to start grafana. error: failure to map file alerting.yaml: failure parsing rules: rule group has no name set
I fixed above error by adding Group Name, but similar errors about other missing elements kept showing up again and again (to the point that I stopped chasing, as I couldn't figure out what exactly the correct schema is). Digging in, it looks like the format/schema returned from API in step 2, is different than the schema that's pointed out in the documentation.
Why is the schema of the alert-rule returned from API different? Am I supposed to convert it, and if so how? Otherwise, what am I doing wrong?
Edit: Replaced Statefulset with deployment, since I was able to reproduce this in a normal/minimal Grafana deployment too.
I'm trying to make use of Schema Registry Transfer SMT plugin but I get following error.
PUT /connectors/kafka-connector-sb-16-sb-16/config returned 400 (Bad Request): Connector configuration is invalid and contains the following 2 error(s):
Invalid value cricket.jmoore.kafka.connect.transforms.SchemaRegistryTransfer for configuration transforms.AvroSchemaTransfer.type: Class cricket.jmoore.kafka.connect.transforms.SchemaRegistryTransfer could not be found.
Invalid value null for configuration transforms.AvroSchemaTransfer.type: Not a Transformation
You can also find the above list of errors at the endpoint `/connector-plugins/{connectorType}/config/validate`
I use strimzi operator to manage my kafka objects in kubernetes.
Kafka Connect config:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
name: {{.Release.Name}}-{{.Values.ENVNAME}}
annotations:
strimzi.io/use-connector-resources: "true"
labels:
app: {{.Release.Name}}
# owner: {{ .Values.owner | default "NotSet" }}
# branch: {{ .Values.branch | default "NotSet" | trunc 56 | trimSuffix "-" }}
spec:
logging:
type: inline
loggers:
connect.root.logger.level: "DEBUG"
template:
pod:
imagePullSecrets:
- name: wecapacrdev001-azurecr-io-pull-secret
metadata:
annotations:
prometheus.io/path: /metrics
prometheus.io/port: "9404"
prometheus.io/scrape: "true"
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/arch
operator: In
values:
- arm64
authentication:
type: scram-sha-512
username: {{.Values.ENVNAME}}
passwordSecret:
secretName: {{.Values.ENVNAME}}
password: password
bootstrapServers: {{.Values.ENVNAME}}-kafka-bootstrap:9093
tls:
trustedCertificates:
- secretName: {{.Values.ENVNAME}}-cluster-ca-cert
certificate: ca.crt
config:
group.id: {{.Values.ENVNAME}}-connect-cluster
offset.storage.topic: {{.Values.ENVNAME}}-connect-cluster-offsets
config.storage.topic: {{.Values.ENVNAME}}-connect-cluster-configs
status.storage.topic: {{.Values.ENVNAME}}-connect-cluster-status
config.storage.replication.factor: -1
offset.storage.replication.factor: -1
status.storage.replication.factor: -1
key.converter: org.apache.kafka.connect.converters.ByteArrayConverter
value.converter: org.apache.kafka.connect.converters.ByteArrayConverter
image: capcr.azurecr.io/devops/kafka:0.27.1-kafka-2.8.0-arm64-v2.7
jvmOptions:
-Xms: 3g
-Xmx: 3g
livenessProbe:
initialDelaySeconds: 30
timeoutSeconds: 15
metricsConfig:
type: jmxPrometheusExporter
valueFrom:
configMapKeyRef:
key: metrics-config.yml
name: {{.Release.Name}}-{{.Values.ENVNAME}}
readinessProbe:
initialDelaySeconds: 300
timeoutSeconds: 15
replicas: 3
resources:
limits:
cpu: 1000m
memory: 4Gi
requests:
cpu: 100m
memory: 3Gi
Connector config:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
name: {{.Release.Name}}-{{.Values.ENVNAME}}
labels:
strimzi.io/cluster: kafka-connect-cluster-{{.Values.ENVNAME}}
app: {{.Release.Name}}
# owner: {{ .Values.owner | default "NotSet" }}
# branch: {{ .Values.branch | default "NotSet" | trunc 56 | trimSuffix "-" }}
spec:
class: org.apache.kafka.connect.mirror.MirrorSourceConnector
config:
offset-syncs.topic.replication.factor: "3"
refresh.topics.interval.seconds: "60"
replication.factor: "3"
key.converter: org.apache.kafka.connect.converters.ByteArrayConverter
value.converter: org.apache.kafka.connect.converters.ByteArrayConverter
source.cluster.alias: SPLITTED
source.cluster.auto.commit.interval.ms: "3000"
source.cluster.auto.offset.reset: earliest
source.cluster.bootstrap.servers: {{.Values.ENVNAME}}-kafka-bootstrap:9093
source.cluster.fetch.max.bytes: "60502835"
source.cluster.group.id: {{.Values.ENVNAME}}-mirroring
source.cluster.max.poll.records: "100"
source.cluster.max.request.size: "60502835"
source.cluster.offset.storage.topic: {{.Values.ENVNAME}}-connect-cluster-offsets
source.cluster.config.storage.topic: {{.Values.ENVNAME}}-connect-cluster-configs
source.cluster.status.storage.topic: {{.Values.ENVNAME}}-connect-cluster-status
source.cluster.producer.compression.type: gzip
source.cluster.replica.fetch.max.bytes: "60502835"
source.cluster.sasl.jaas.config: org.apache.kafka.common.security.scram.ScramLoginModule
required username="{{.Values.ENVNAME}}" password="{{.Values.kafka.password}}";
source.cluster.sasl.mechanism: SCRAM-SHA-512
source.cluster.security.protocol: SASL_SSL
source.cluster.ssl.keystore.location: /opt/kafka/keystore/kafka_keystore.p12
source.cluster.ssl.keystore.password: password
source.cluster.ssl.keystore.type: PKCS12
source.cluster.ssl.truststore.location: /opt/kafka/keystore/kafka_keystore.p12
source.cluster.ssl.truststore.password: password
source.cluster.ssl.truststore.type: PKCS12
sync.topic.acls.enabled: "false"
target.cluster.alias: AGGREGATED
target.cluster.auto.commit.interval.ms: "3000"
target.cluster.auto.offset.reset: earliest
target.cluster.bootstrap.servers: {{.Values.ENVNAME}}-kafka-bootstrap:9093
target.cluster.fetch.max.bytes: "60502835"
target.cluster.group.id: {{.Values.ENVNAME}}-test
target.cluster.max.poll.records: "100"
target.cluster.max.request.size: "60502835"
target.cluster.offset.storage.topic: {{.Values.ENVNAME}}-connect-cluster-offsets
target.cluster.config.storage.topic: {{.Values.ENVNAME}}-connect-cluster-configs
target.cluster.status.storage.topic: {{.Values.ENVNAME}}-connect-cluster-status
target.cluster.producer.compression.type: gzip
target.cluster.replica.fetch.max.bytes: "60502835"
target.cluster.sasl.jaas.config: org.apache.kafka.common.security.scram.ScramLoginModule
required username="{{.Values.ENVNAME}}" password="{{.Values.kafka.password}}";
target.cluster.sasl.mechanism: SCRAM-SHA-512
target.cluster.security.protocol: SASL_SSL
target.cluster.ssl.keystore.location: /opt/kafka/keystore/kafka_keystore.p12
target.cluster.ssl.keystore.password: password
target.cluster.ssl.keystore.type: PKCS12
target.cluster.ssl.truststore.location: /opt/kafka/keystore/kafka_keystore.p12
target.cluster.ssl.truststore.password: password
target.cluster.ssl.truststore.type: PKCS12
tasks.max: "4"
topics: ^topic-to-be-replicated$
topics.blacklist: ""
transforms: AvroSchemaTransfer
transforms.AvroSchemaTransfer.dest.schema.registry.url: http://schemaregistry-{{.Values.ENVNAME}}-cp-schema-registry:8081
transforms.AvroSchemaTransfer.src.schema.registry.url: http://schemaregistry-{{.Values.ENVNAME}}-cp-schema-registry:8081
transforms.AvroSchemaTransfer.transfer.message.keys: "false"
transforms.AvroSchemaTransfer.type: cricket.jmoore.kafka.connect.transforms.SchemaRegistryTransfer
transforms.TopicRename.regex: (.*)
transforms.TopicRename.replacement: replica.$1
transforms.TopicRename.type: org.apache.kafka.connect.transforms.RegexRouter
tasksMax: 4
I tried to build the plugin and build the docker image like described here:
ARG STRIMZI_BASE_IMAGE=0.29.0-kafka-3.1.1-arm64
FROM quay.io/strimzi/kafka:$STRIMZI_BASE_IMAGE
USER root:root
COPY schema-registry-transfer-smt/target/schema-registry-transfer-smt-0.2.1.jar /opt/kafka/plugins/
USER 1001
I can see in logs that the plugin is loaded
INFO Registered loader: PluginClassLoader{pluginLocation=file:/opt/kafka/plugins/schema-registry-transfer-smt-0.2.1.jar} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader) [main]
INFO Loading plugin from: /opt/kafka/plugins/schema-registry-transfer-smt-0.2.1.jar (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader) [main]
DEBUG Loading plugin urls: [file:/opt/kafka/plugins/schema-registry-transfer-smt-0.2.1.jar] (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader) [main]
However, when I execute kubectl describe KafkaConnect ..., I get following status which probably means the plugin is not loaded.
Status:
Conditions:
Last Transition Time: 2022-10-31T07:22:43.749290Z
Status: True
Type: Ready
Connector Plugins:
Class: org.apache.kafka.connect.file.FileStreamSinkConnector
Type: sink
Version: 2.8.0
Class: org.apache.kafka.connect.file.FileStreamSourceConnector
Type: source
Version: 2.8.0
Class: org.apache.kafka.connect.mirror.MirrorCheckpointConnector
Type: source
Version: 1
Class: org.apache.kafka.connect.mirror.MirrorHeartbeatConnector
Type: source
Version: 1
Class: org.apache.kafka.connect.mirror.MirrorSourceConnector
Type: source
Version: 1
The logs prints classpath jvm.classpath = ... and it doesn't "include schema-registry-transfer-smt-0.2.1.jar".
This look like configuration error. I tried to create and use fat jar, compile with java11, change scope of provided dependencies and many more. The repository's last commit was created in 2019. Does it make sense to bump all dependency versions? The error message doesn't suggest that.
We deploy service with helm. The ingress template looks like that:
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ui-app-ingress
{{- with .Values.ingress.annotations}}
annotations:
{{- toYaml . | nindent 4}}
{{- end}}
spec:
rules:
- host: {{ .Values.ingress.hostname }}
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: {{ include "ui-app-chart.fullname" . }}
port:
number: 80
tls:
- hosts:
- {{ .Values.ingress.hostname }}
secretName: {{ .Values.ingress.certname }}
as you can see, we already use networking.k8s.io/v1 but if i watch the treafik logs, i find this error:
1 reflector.go:138] pkg/mod/k8s.io/client-go#v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource (get ingresses.extensions)
what results in tls cert error:
time="2022-06-07T15:40:35Z" level=debug msg="Serving default certificate for request: \"example.de\""
time="2022-06-07T15:40:35Z" level=debug msg="http: TLS handshake error from 10.1.0.4:57484: remote error: tls: unknown certificate"
time="2022-06-07T15:40:35Z" level=debug msg="Serving default certificate for request: \"example.de\""
time="2022-06-07T15:53:06Z" level=debug msg="Serving default certificate for request: \"\""
time="2022-06-07T16:03:31Z" level=debug msg="Serving default certificate for request: \"<ip-adress>\""
time="2022-06-07T16:03:32Z" level=debug msg="Serving default certificate for request: \"<ip-adress>\""
PS C:\WINDOWS\system32>
i already found out that networking.k8s.io/v1beta1 is not longer served, but networking.k8s.io/v1 was defined in the template all the time as ApiVersion.
Why does it still try to get from v1beta1? And how can i fix this?
We use this TLSOptions:
apiVersion: traefik.containo.us/v1alpha1
kind: TLSOption
metadata:
name: default
namespace: default
spec:
minVersion: VersionTLS12
maxVersion: VersionTLS13
cipherSuites:
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
we use helm-treafik rolled out with terraform:
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "2"
meta.helm.sh/release-name: traefik
meta.helm.sh/release-namespace: traefik
creationTimestamp: "2021-06-12T10:06:11Z"
generation: 2
labels:
app.kubernetes.io/instance: traefik
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: traefik
helm.sh/chart: traefik-9.19.1
name: traefik
namespace: traefik
resourceVersion: "86094434"
uid: 903a6f54-7698-4290-bc59-d234a191965c
spec:
progressDeadlineSeconds: 600
replicas: 3
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/instance: traefik
app.kubernetes.io/name: traefik
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app.kubernetes.io/instance: traefik
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: traefik
helm.sh/chart: traefik-9.19.1
spec:
containers:
- args:
- --global.checknewversion
- --global.sendanonymoususage
- --entryPoints.traefik.address=:9000/tcp
- --entryPoints.web.address=:8000/tcp
- --entryPoints.websecure.address=:8443/tcp
- --api.dashboard=true
- --ping=true
- --providers.kubernetescrd
- --providers.kubernetesingress
- --providers.file.filename=/etc/traefik/traefik.yml
- --accesslog=true
- --accesslog.format=json
- --log.level=DEBUG
- --entrypoints.websecure.http.tls
- --entrypoints.web.http.redirections.entrypoint.to=websecure
- --entrypoints.web.http.redirections.entrypoint.scheme=https
- --entrypoints.web.http.redirections.entrypoint.permanent=true
- --entrypoints.web.http.redirections.entrypoint.to=:443
image: traefik:2.4.8
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /ping
port: 9000
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 2
name: traefik
ports:
- containerPort: 9000
name: traefik
protocol: TCP
- containerPort: 8000
name: web
protocol: TCP
- containerPort: 8443
name: websecure
protocol: TCP
readinessProbe:
failureThreshold: 1
httpGet:
path: /ping
port: 9000
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 2
resources: {}
securityContext:
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
readOnlyRootFilesystem: true
runAsGroup: 0
runAsNonRoot: false
runAsUser: 0
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /data
name: data
- mountPath: /tmp
name: tmp
- mountPath: /etc/traefik
name: traefik-cm
readOnly: true
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 65532
serviceAccount: traefik
serviceAccountName: traefik
terminationGracePeriodSeconds: 60
tolerations:
- effect: NoSchedule
key: env
operator: Equal
value: conhub
volumes:
- emptyDir: {}
name: data
- emptyDir: {}
name: tmp
- configMap:
defaultMode: 420
name: traefik-cm
name: traefik-cm
status:
availableReplicas: 3
conditions:
- lastTransitionTime: "2022-06-07T09:19:58Z"
lastUpdateTime: "2022-06-07T09:19:58Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: "2021-06-12T10:06:11Z"
lastUpdateTime: "2022-06-07T16:39:01Z"
message: ReplicaSet "traefik-84c6f5f98b" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 2
readyReplicas: 3
replicas: 3
updatedReplicas: 3
resource "helm_release" "traefik" {
name = "traefik"
namespace = "traefik"
create_namespace = true
repository = "https://helm.traefik.io/traefik"
chart = "traefik"
set {
name = "service.spec.loadBalancerIP"
value = azurerm_public_ip.pub_ip.ip_address
}
set {
name = "service.annotations.service\\.beta\\.kubernetes\\.io/azure-load-balancer-resource-group"
value = var.resource_group_aks
}
set {
name = "additionalArguments"
value = "{--accesslog=true,--accesslog.format=json,--log.level=DEBUG,--entrypoints.websecure.http.tls,--entrypoints.web.http.redirections.entrypoint.to=websecure,--entrypoints.web.http.redirections.entrypoint.scheme=https,--entrypoints.web.http.redirections.entrypoint.permanent=true,--entrypoints.web.http.redirections.entrypoint.to=:443}"
}
set {
name = "deployment.replicas"
value = 3
}
timeout = 600
depends_on = [
azurerm_kubernetes_cluster.aks
]
}
I found out that the problem was the version of the traefik image:
i quick fixed it by setting the latest image:
kubectl set image deployment/traefik traefik=traefik:2.7.0 -n traefik
I've a custom operator that listens to changes in a CRD I've defined in a Kubernetes cluster.
Whenever something changed in the defined custom resource, the custom operator would reconcile and idempotently create a secret (that would be owned by the custom resource).
What I expect is for the operator to Reconcile only when something changed in the custom resource or in the secret owned by it.
What I observe is that for some reason the Reconcile function triggers for every CR on the cluster in strange intervals without observable changes to related entities. I've tried focusing on a specific instance of the CR and follow the times in which Reconcile was called for it. The intervals of these calls are very strange. It seems that the calls are alternating between two series - one starts at 10 hours and diminishes seven minutes at a time. The other starts at 7 minutes and grows by 7 minutes a time.
To demonstrate, Reconcile triggered at these times (give or take a few seconds):
00:00
09:53 (10 hours - 1*7 minute interval)
10:00 (0 hours + 1*7 minute interval)
19:46 (10 hours - 2*7 minute interval)
20:00 (0 hours + 2*7 minute interval)
29:39 (10 hours - 3*7 minute interval)
30:00 (0 hours + 3*7 minute interval)
Whenever the diminishing intervals become less than 7 hours, it resets back to 10 hour intervals. The same with the growing series - as soon as the intervals are higher than 3 hours it resets back to 7 minutes.
My main question is how can I investigating why Reconcile is being triggered?
I'm attaching here the manifests for the CRD, the operator and a sample manifest for a CR:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.4.1
creationTimestamp: "2021-10-13T11:04:42Z"
generation: 1
name: databaseservices.operators.talon.one
resourceVersion: "245688703"
uid: 477f8d3e-c19b-43d7-ab59-65198b3c0108
spec:
conversion:
strategy: None
group: operators.talon.one
names:
kind: DatabaseService
listKind: DatabaseServiceList
plural: databaseservices
singular: databaseservice
scope: Namespaced
versions:
- name: v1alpha1
schema:
openAPIV3Schema:
description: DatabaseService is the Schema for the databaseservices API
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type: string
kind:
description: 'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type: string
metadata:
type: object
spec:
description: DatabaseServiceSpec defines the desired state of DatabaseService
properties:
cloud:
type: string
databaseName:
description: Foo is an example field of DatabaseService. Edit databaseservice_types.go
to remove/update
type: string
serviceName:
type: string
servicePlan:
type: string
required:
- cloud
- databaseName
- serviceName
- servicePlan
type: object
status:
description: DatabaseServiceStatus defines the observed state of DatabaseService
type: object
type: object
served: true
storage: true
subresources:
status: {}
status:
acceptedNames:
kind: DatabaseService
listKind: DatabaseServiceList
plural: databaseservices
singular: databaseservice
conditions:
- lastTransitionTime: "2021-10-13T11:04:42Z"
message: no conflicts found
reason: NoConflicts
status: "True"
type: NamesAccepted
- lastTransitionTime: "2021-10-13T11:04:42Z"
message: the initial names have been accepted
reason: InitialNamesAccepted
status: "True"
type: Established
storedVersions:
- v1alpha1
----
apiVersion: operators.talon.one/v1alpha1
kind: DatabaseService
metadata:
creationTimestamp: "2021-10-13T11:14:08Z"
generation: 1
labels:
app: talon
company: amber
repo: talon-service
name: db-service-secret
namespace: amber
resourceVersion: "245692590"
uid: cc369297-6825-4fbf-aa0b-58c24be427b0
spec:
cloud: google-australia-southeast1
databaseName: amber
serviceName: pg-amber
servicePlan: business-4
----
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "75"
secret.reloader.stakater.com/reload: db-credentials
simpledeployer.talon.one/image: <path_to_image>/production:latest
creationTimestamp: "2020-06-22T09:20:06Z"
generation: 77
labels:
simpledeployer.talon.one/enabled: "true"
name: db-operator
namespace: db-operator
resourceVersion: "245688814"
uid: 900424cd-b469-11ea-b661-4201ac100014
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
name: db-operator
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
name: db-operator
spec:
containers:
- command:
- app/db-operator
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: OPERATOR_NAME
value: db-operator
- name: AIVEN_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: db-credentials
- name: AIVEN_PROJECT
valueFrom:
secretKeyRef:
key: projectname
name: db-credentials
- name: AIVEN_USERNAME
valueFrom:
secretKeyRef:
key: username
name: db-credentials
- name: SENTRY_URL
valueFrom:
secretKeyRef:
key: sentry_url
name: db-credentials
- name: ROTATION_INTERVAL
value: monthly
image: <path_to_image>/production#sha256:<some_sha>
imagePullPolicy: Always
name: db-operator
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: db-operator
serviceAccountName: db-operator
terminationGracePeriodSeconds: 30
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2020-06-22T09:20:06Z"
lastUpdateTime: "2021-09-07T11:56:07Z"
message: ReplicaSet "db-operator-cb6556b76" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
- lastTransitionTime: "2021-09-12T03:56:19Z"
lastUpdateTime: "2021-09-12T03:56:19Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
observedGeneration: 77
readyReplicas: 1
replicas: 1
updatedReplicas: 1
Note:
When Reconcile finishes, I return:
return ctrl.Result{Requeue: false, RequeueAfter: 0}
So that shouldn't be the reason for the repeated triggers.
I will add that I have recently updated the Kubernetes cluster version to v1.20.8-gke.2101.
This would require more info on how your controller is set up. For example what is the sync period you have set. This could be due to default sync period set which reconciles all the objects at given interval of time.
SyncPeriod determines the minimum frequency at which watched resources are
reconciled. A lower period will correct entropy more quickly, but reduce
responsiveness to change if there are many watched resources. Change this
value only if you know what you are doing. Defaults to 10 hours if unset.
there will a 10 percent jitter between the SyncPeriod of all controllers
so that all controllers will not send list requests simultaneously.
For more information check this: https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.2/pkg/manager/manager.go#L134
same problem
my Reconcile triggered at these times
00:00
09:03 (9 hours + 3 min)
18:06 (9 hours + 3 min)
00:09 (9 hours + 3 min)
sync period is not set so it should be default.
kubernetes 1.20.11 version
Have a Next.js project.
This is my next.config.js file, which I followed through with on this guide: https://dev.to/tesh254/environment-variables-from-env-file-in-nextjs-570b
module.exports = withCSS(withSass({
webpack: (config) => {
config.plugins = config.plugins || []
config.module.rules.push({
test: /\.svg$/,
use: ['#svgr/webpack', {
loader: 'url-loader',
options: {
limit: 100000,
name: '[name].[ext]'
}}],
});
config.plugins = [
...config.plugins,
// Read the .env file
new Dotenv({
path: path.join(__dirname, '.env'),
systemvars: true
})
]
const env = Object.keys(process.env).reduce((acc, curr) => {
acc[`process.env.${curr}`] = JSON.stringify(process.env[curr]);
return acc;
}, {});
// Fixes npm packages that depend on `fs` module
config.node = {
fs: 'empty'
}
/** Allows you to create global constants which can be configured
* at compile time, which in our case is our environment variables
*/
config.plugins.push(new webpack.DefinePlugin(env));
return config
}
}),
);
I have a .env file which holds the values I need. It works when run on localhost.
In my Kubernetes environment, within the deploy file which I can modify, I have the same environment variables set up. But when I try and identify them they come off as undefined, so my application cannot run.
I refer to it like:
process.env.SOME_VARIABLE
which works locally.
Does anyone have experience making environment variables function on Next.js when deployed? Not as simple as it is for a backend service. :(
EDIT:
This is what the environment variable section looks like.
EDIT 2:
Full deploy file, edited to remove some details
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "38"
creationTimestamp: xx
generation: 40
labels:
app: appname
name: appname
namespace: development
resourceVersion: xx
selfLink: /apis/extensions/v1beta1/namespaces/development/deployments/appname
uid: xxx
spec:
progressDeadlineSeconds: xx
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: appname
tier: sometier
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: appname
tier: sometier
spec:
containers:
- env:
- name: NODE_ENV
value: development
- name: PORT
value: "3000"
- name: SOME_VAR
value: xxx
- name: SOME_VAR
value: xxxx
image: someimage
imagePullPolicy: Always
name: appname
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 3000
scheme: HTTP
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
requests:
cpu: 100m
memory: 100Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
status:
availableReplicas: 1
conditions:
- lastTransitionTime: xxx
lastUpdateTime: xxxx
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
observedGeneration: 40
readyReplicas: 1
replicas: 1
updatedReplicas: 1
.env Works in docker or docker-compose, they do not work in Kubernetes, if you want to add them you can by configmaps objects or add directly to each deployment an example (from documentation):
apiVersion: v1
kind: Pod
metadata:
name: envar-demo
labels:
purpose: demonstrate-envars
spec:
containers:
- name: envar-demo-container
image: gcr.io/google-samples/node-hello:1.0
env:
- name: DEMO_GREETING
value: "Hello from the environment"
- name: DEMO_FAREWELL
value: "Such a sweet sorrow
Also, the best and standard way is to use config maps, for example:
containers:
- env:
- name: DB_DEFAULT_DATABASE
valueFrom:
configMapKeyRef:
key: DB_DEFAULT_DATABASE
name: darwined-env
And the config map:
apiVersion: v1
data:
DB_DEFAULT_DATABASE: darwined_darwin_dev_1
kind: ConfigMap
metadata:
creationTimestamp: null
labels:
io.kompose.service: darwin-env
name: darwined-env
Hope this helps.