I am running Airflow with the Kubernetes executor.
I have everything set up under the [kubernetes] section and things are working fine. However, I would prefer to use a pod template file for the worker.
So I generated a pod.yaml from one of the worker containers that spins up.
I have placed this file in a location accessible to the scheduler pod, something like
/opt/airflow/yamls/workerpod.yaml
But when I try to specify this file in the pod_template_file parameter, it gives me these errors:
[2020-03-02 22:12:24,115] {pod_launcher.py:84} ERROR - Exception when attempting to create Namespaced Pod.
Traceback (most recent call last):
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/airflow/contrib/kubernetes/pod_launcher.py", line 81, in run_pod_async
resp = self._client.create_namespaced_pod(body=req, namespace=pod.namespace, **kwargs)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 6115, in create_namespaced_pod
(data) = self.create_namespaced_pod_with_http_info(namespace, body, **kwargs)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 6206, in create_namespaced_pod_with_http_info
collection_formats=collection_formats)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 334, in call_api
_return_http_data_only, collection_formats, _preload_content, _request_timeout)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 168, in __call_api
_request_timeout=_request_timeout)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 377, in request
body=body)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 266, in POST
body=body)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 222, in request
raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'ab2bc6dc-96f9-4014-8a08-7dae6e008aad', 'Cache-Control': 'no-store', 'Content-Type': 'application/json', 'Date': 'Mon, 02 Mar 2020 22:12:24 GMT', 'Content-Length': '660'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"examplebashoperatorrunme0-c9ca5d619bc54bf2a456e133ad79dd00\" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{0}: 0 is not an allowed group spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 0: must be in the ranges: [1000040000, 1000049999] spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 0: running with the root UID is forbidden]","reason":"Forbidden","details":{"name":"examplebashoperatorrunme0-c9ca5d619bc54bf2a456e133ad79dd00","kind":"pods"},"code":403}
[2020-03-02 22:12:24,141] {kubernetes_executor.py:863} WARNING - ApiException when attempting to run task, re-queueing. Message: pods "examplebashoperatorrunme0-c9ca5d619bc54bf2a456e133ad79dd00" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{0}: 0 is not an allowed group spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 0: must be in the ranges: [1000040000, 1000049999] spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 0: running with the root UID is forbidden]
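Reading the response body: the pod is being submitted with runAsUser 0 and fsGroup 0, while the SCC only allows UIDs in the project's range (1000040000-1000049999 here) and forbids running as root. For reference, a pod-level securityContext that would pass that validation looks roughly like the sketch below; the exact values are assumptions and would have to come from your project's allowed ranges.
securityContext:
  runAsUser: 1000040000   # any UID inside the range reported in the SCC error
  fsGroup: 1000040000     # must not be 0; use a group the SCC allows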
Just to clarify, the pod.yaml file was generated from the same running container that is produced by the configs in the [kubernetes] section of airflow.cfg, which works just fine. The run_as_user is correct. The service account is correct. But I am still getting this error.
I am unsure whether I should place this file relative to where I kick off my kubectl apply.
Since it goes in airflow.cfg, I didn't think that would be the case; rather, it should be accessible from within the scheduler container.
One strange thing I noticed: even though I have specified, and appear to be using, the KubernetesExecutor, when the individual worker pods come up they say LocalExecutor. That is something I had changed to KubernetesExecutor in the workerpod.yaml file.
Here is the pod YAML file:
apiVersion: v1
kind: Pod
metadata:
annotations:
openshift.io/scc: nonroot
labels:
app: airflow-worker
kubernetes_executor: "True"
name: airflow-worker
# namespace: airflow
spec:
affinity: {}
containers:
env:
- name: AIRFLOW_HOME
value: /opt/airflow
- name: AIRFLOW__CORE__EXECUTOR
value: KubernetesExecutor
#value: LocalExecutor
- name: AIRFLOW__CORE__DAGS_FOLDER
value: /opt/airflow/dags
- name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
valueFrom:
secretKeyRef:
key: MYSQL_CONN_STRING
name: db-secret
image: ourrepo.example.com/airflow-lab:latest
imagePullPolicy: IfNotPresent
name: base
# resources:
# limits:
# cpu: "1"
# memory: 1Gi
# requests:
# cpu: 400m
# memory: 1Gi
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- SETGID
- SETUID
volumeMounts:
- mountPath: /opt/airflow/dags
name: airflow-dags
readOnly: true
subPath: airflow/dags
- mountPath: /opt/airflow/logs
name: airflow-logs
- mountPath: /opt/airflow/airflow.cfg
name: airflow-config
readOnly: true
subPath: airflow.cfg
# - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
# name: airflow-cluster-access-token-5228g
# readOnly: true
dnsPolicy: ClusterFirst
# imagePullSecrets:
# - name: airflow-cluster-access-dockercfg-85twh
priority: 0
restartPolicy: Never
schedulerName: default-scheduler
securityContext:
# fsGroup: 0
runAsUser: 1001
seLinuxOptions:
level: s0:c38,c12
serviceAccount: airflow-cluster-access
serviceAccountName: airflow-cluster-access
# tolerations:
# - effect: NoSchedule
# key: node.kubernetes.io/memory-pressure
# operator: Exists
volumes:
- name: airflow-dags
persistentVolumeClaim:
claimName: ucdagent
- emptyDir: {}
name: airflow-logs
- configMap:
defaultMode: 420
name: airflow-config
name: airflow-config
# - name: airflow-cluster-access-token-5228g
# secret:
# defaultMode: 420
# secretName: airflow-cluster-access-token-5228g
Here is the working [kubernetes] config from airflow.cfg:
[kubernetes]
#pod_template_file = /opt/airflow/yamls/workerpod.yaml
dags_in_image = False
worker_container_repository = ${AIRFLOW_IMAGE_NAME}
worker_container_tag = ${AIRFLOW_IMAGE_TAG}
worker_container_image_pull_policy = IfNotPresent
delete_worker_pods = False
in_cluster = true
namespace = ${AIRFLOW_NAMESPACE}
airflow_configmap = airflow-config
run_as_user = 1001
dags_volume_subpath = airflow/dags
dags_volume_claim = ucdagent
worker_service_account_name = airflow-cluster-access
[kubernetes_secrets]
AIRFLOW__CORE__SQL_ALCHEMY_CONN = db-secret=MYSQL_CONN_STRING
UPDATE: My Airflow version is 1.10.7. I am guessing this is a newer parameter. I am trying to find out whether this is currently an empty config reference or whether it has been implemented in the latest release, which right now is 1.10.9.
UPDATE: This parameter has not been implemented as of 1.10.9.
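For reference, in the later 1.10.x and 2.x releases that do support pod_template_file, the file is typically a stripped-down worker template rather than a full dumped pod. A minimal sketch, reusing the image from above (everything else here is an assumption, not the exact file in question), would be:
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker
spec:
  containers:
    - name: base                               # the Kubernetes executor expects the worker container to be named "base"
      image: ourrepo.example.com/airflow-lab:latest
      env:
        - name: AIRFLOW__CORE__EXECUTOR
          value: LocalExecutor                 # worker pods typically run their single task with LocalExecutor; only the scheduler runs KubernetesExecutor
  restartPolicy: Never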
Related
I recently upgraded Airflow from 1.10.11 to 2.2.3, following the steps given in https://airflow.apache.org/docs/apache-airflow/stable/upgrading-from-1-10/index.html. I first upgraded to 1.10.15 as suggested, which worked fine. But after upgrading to 2.2.3, I'm unable to execute the DAGs from the UI, as the task goes into the queued state. When I check the task pod logs, I see this error:
[2022-02-22 06:46:23,886] {cli_action_loggers.py:105} WARNING - Failed to log action with (sqlite3.OperationalError) no such table: log
[SQL: INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES (?, ?, ?, ?, ?, ?, ?)]
[parameters: ('2022-02-22 06:46:23.880923', 'dag id', 'task id', 'cli_task_run', None, 'airflow', '{"host_name": "pod name", "full_command": "[\'/home/airflow/.local/bin/airflow\', \'tasks\', \ task id\', \'manual__2022-02-22T06:45:47.840912+00:00\', \'--local\', \'--subdir\', \'DAGS_FOLDER/dag_file.py\']"}')]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2022-02-22 06:46:23,888] {dagbag.py:500} INFO - Filling up the DagBag from /opt/airflow/dags/repo/xxxxx.py
Traceback (most recent call last):
File "/home/airflow/.local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/__main__.py", line 48, in main
args.func(args)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 282, in task_run
dag = get_dag(args.subdir, args.dag_id)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 193, in get_dag
f"Dag {dag_id!r} could not be found; either it does not exist or it failed to parse."
airflow.exceptions.AirflowException: Dag 'xxxxx' could not be found; either it does not exist or it failed to parse
I did try exec'ing into the webserver and scheduler using "kubectl exec -it airflow-dev-webserver-6c5755d5dd-262wd -n dev --container webserver -- /bin/sh". I could see all the DAGs under /opt/airflow/dags/repo/. Even the error says "Filling up the DagBag from /opt/airflow/dags/repo", but I couldn't understand what was making the task execution go into the queued state.
I figured out the issue using the steps below:
I triggered a DAG, after which I could see a task pod going into the error state. So I ran "kubectl logs {pod_name} git-sync" to check whether the DAGs were being copied in the first place. Then I found the below error:
[screenshot: git-sync log output showing a permission error while writing the DAGs]
Then I realized that the problem was with permissions for writing the DAGs to the DAGs folder. To fix this, I changed the mount to "readOnly: false" under the "volumeMounts" section.
[screenshot: the updated volumeMounts section with readOnly set to false]
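In text form, the relevant change is on the dags mount of the worker container (an excerpt; the names match the full template below):
volumeMounts:
  - mountPath: /opt/airflow/dags
    name: airflow-dags
    readOnly: false      # previously true; flipping it resolved the git-sync permission error
    subPath: /repo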
That's it! It worked. The following finally worked:
Pod Template File:
apiVersion: v1
kind: Pod
metadata:
labels:
component: worker
release: airflow-dev
tier: airflow
spec:
containers:
- args: []
command: []
env:
- name: AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY
value: ECR repo link
- name: AIRFLOW__SMTP__SMTP_PORT
value: '587'
- name: AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG
value: docker image tag
- name: AIRFLOW__KUBERNETES__GIT_SYNC_RUN_AS_USER
value: '65533'
- name: AIRFLOW__CORE__ENABLE_XCOM_PICKLING
value: 'True'
- name: AIRFLOW__KUBERNETES__LOGS_VOLUME_CLAIM
value: dw-airflow-dev-logs
- name: AIRFLOW__KUBERNETES__RUN_AS_USER
value: '50000'
- name: AIRFLOW__KUBERNETES__DAGS_IN_IMAGE
value: 'False'
- name: AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION
value: 'False'
- name: AIRFLOW__SMTP__SMTP_MAIL_FROM
value: email id
- name: AIRFLOW__CORE__LOAD_EXAMPLES
value: 'False'
- name: AIRFLOW__SMTP__SMTP_PASSWORD
value: xxxxxxxxx
- name: AIRFLOW__SMTP__SMTP_HOST
value: smtp-relay.gmail.com
- name: AIRFLOW__KUBERNETES__NAMESPACE
value: dev
- name: AIRFLOW__SMTP__SMTP_USER
value: xxxxxxxxxx
- name: AIRFLOW__CORE__EXECUTOR
value: LocalExecutor
- name: AIRFLOW_HOME
value: /opt/airflow
- name: AIRFLOW__CORE__DAGS_FOLDER
value: /opt/airflow/dags
- name: AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT
value: /opt/airflow/dags
- name: AIRFLOW__KUBERNETES__FS_GROUP
value: "50000"
- name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
valueFrom:
secretKeyRef:
key: connection
name: airflow-dev-airflow-metadata
- name: AIRFLOW_CONN_AIRFLOW_DB
valueFrom:
secretKeyRef:
key: connection
name: airflow-dev-airflow-metadata
- name: AIRFLOW__CORE__FERNET_KEY
valueFrom:
secretKeyRef:
key: fernet-key
name: airflow-dev-fernet-key
envFrom: []
image: docker image
imagePullPolicy: IfNotPresent
name: base
ports: []
volumeMounts:
- mountPath: /opt/airflow/dags
name: airflow-dags
readOnly: false
subPath: /repo
- mountPath: /opt/airflow/logs
name: airflow-logs
- mountPath: /etc/git-secret/ssh
name: git-sync-ssh-key
subPath: ssh
- mountPath: /opt/airflow/airflow.cfg
name: airflow-config
readOnly: true
subPath: airflow.cfg
- mountPath: /opt/airflow/config/airflow_local_settings.py
name: airflow-config
readOnly: true
subPath: airflow_local_settings.py
hostNetwork: false
imagePullSecrets:
- name: airflow-dev-registry
initContainers:
- env:
- name: GIT_SYNC_REPO
value: xxxxxxxxxxxxx
- name: GIT_SYNC_BRANCH
value: master
- name: GIT_SYNC_ROOT
value: /git
- name: GIT_SYNC_DEST
value: repo
- name: GIT_SYNC_DEPTH
value: '1'
- name: GIT_SYNC_ONE_TIME
value: 'true'
- name: GIT_SYNC_REV
value: HEAD
- name: GIT_SSH_KEY_FILE
value: /etc/git-secret/ssh
- name: GIT_SYNC_ADD_USER
value: 'true'
- name: GIT_SYNC_SSH
value: 'true'
- name: GIT_KNOWN_HOSTS
value: 'false'
image: k8s.gcr.io/git-sync:v3.1.6
name: git-sync
securityContext:
runAsUser: 65533
volumeMounts:
- mountPath: /git
name: airflow-dags
readOnly: false
- mountPath: /etc/git-secret/ssh
name: git-sync-ssh-key
subPath: ssh
nodeSelector: {}
restartPolicy: Never
securityContext:
fsGroup: 50000
runAsUser: 50000
serviceAccountName: airflow-dev-worker-serviceaccount
volumes:
- emptyDir: {}
name: airflow-dags
- name: airflow-logs
persistentVolumeClaim:
claimName: dw-airflow-dev-logs
- name: git-sync-ssh-key
secret:
items:
- key: gitSshKey
mode: 444
path: ssh
secretName: airflow-private-dags-dev
- configMap:
name: airflow-dev-airflow-config
name: airflow-config
prometheus-prometheus-kube-prometheus-prometheus-0 0/2 Terminating 0 4s
alertmanager-prometheus-kube-prometheus-alertmanager-0 0/2 Terminating 0 10s
After updating the EKS cluster from 1.15 to 1.16, everything works fine except these two pods; they keep terminating and are unable to initialize. Hence, Prometheus monitoring does not work. I am getting the below errors while describing the pods:
Error: failed to start container "prometheus": Error response from daemon: OCI runtime create failed: container_linux.go:362: creating new parent process caused: container_linux.go:1941: running lstat on namespace path "/proc/29271/ns/ipc" caused: lstat /proc/29271/ns/ipc: no such file or directory: unknown
Error: failed to start container "config-reloader": Error response from daemon: cannot join network of a non running container: 7e139521980afd13dad0162d6859352b0b2c855773d6d4062ee3e2f7f822a0b3
Error: cannot find volume "config" to mount into container "config-reloader"
Error: cannot find volume "config" to mount into container "prometheus"
Here is my YAML file for the deployment:
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/psp: eks.privileged
creationTimestamp: "2021-04-30T16:39:14Z"
deletionGracePeriodSeconds: 600
deletionTimestamp: "2021-04-30T16:49:14Z"
generateName: prometheus-prometheus-kube-prometheus-prometheus-
labels:
app: prometheus
app.kubernetes.io/instance: prometheus-kube-prometheus-prometheus
app.kubernetes.io/managed-by: prometheus-operator
app.kubernetes.io/name: prometheus
app.kubernetes.io/version: 2.26.0
controller-revision-hash: prometheus-prometheus-kube-prometheus-prometheus-56d9fcf57
operator.prometheus.io/name: prometheus-kube-prometheus-prometheus
operator.prometheus.io/shard: "0"
prometheus: prometheus-kube-prometheus-prometheus
statefulset.kubernetes.io/pod-name: prometheus-prometheus-kube-prometheus-prometheus-0
name: prometheus-prometheus-kube-prometheus-prometheus-0
namespace: mo
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: StatefulSet
name: prometheus-prometheus-kube-prometheus-prometheus
uid: 326a09f2-319c-449d-904a-1dd0019c6d80
resourceVersion: "9337443"
selfLink: /api/v1/namespaces/monitoring/pods/prometheus-prometheus-kube-prometheus-prometheus-0
uid: e2be062f-749d-488e-a6cc-42ef1396851b
spec:
containers:
- args:
- --web.console.templates=/etc/prometheus/consoles
- --web.console.libraries=/etc/prometheus/console_libraries
- --config.file=/etc/prometheus/config_out/prometheus.env.yaml
- --storage.tsdb.path=/prometheus
- --storage.tsdb.retention.time=10d
- --web.enable-lifecycle
- --storage.tsdb.no-lockfile
- --web.external-url=http://prometheus-kube-prometheus-prometheus.monitoring:9090
- --web.route-prefix=/
image: quay.io/prometheus/prometheus:v2.26.0
imagePullPolicy: IfNotPresent
name: prometheus
ports:
- containerPort: 9090
name: web
protocol: TCP
readinessProbe:
failureThreshold: 120
httpGet:
path: /-/ready
port: web
scheme: HTTP
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 3
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- mountPath: /etc/prometheus/config_out
name: config-out
readOnly: true
- mountPath: /etc/prometheus/certs
name: tls-assets
readOnly: true
- mountPath: /prometheus
name: prometheus-prometheus-kube-prometheus-prometheus-db
- mountPath: /etc/prometheus/rules/prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0
name: prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: prometheus-kube-prometheus-prometheus-token-mh66q
readOnly: true
- args:
- --listen-address=:8080
- --reload-url=http://localhost:9090/-/reload
- --config-file=/etc/prometheus/config/prometheus.yaml.gz
- --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
- --watched-dir=/etc/prometheus/rules/prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0
command:
- /bin/prometheus-config-reloader
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: SHARD
value: "0"
image: quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0
imagePullPolicy: IfNotPresent
name: config-reloader
ports:
- containerPort: 8080
name: reloader-web
protocol: TCP
resources:
limits:
cpu: 100m
memory: 50Mi
requests:
cpu: 100m
memory: 50Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- mountPath: /etc/prometheus/config
name: config
- mountPath: /etc/prometheus/config_out
name: config-out
- mountPath: /etc/prometheus/rules/prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0
name: prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: prometheus-kube-prometheus-prometheus-token-mh66q
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
hostname: prometheus-prometheus-kube-prometheus-prometheus-0
nodeName: ip-10-1-49-45.ec2.internal
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 2000
runAsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
serviceAccount: prometheus-kube-prometheus-prometheus
serviceAccountName: prometheus-kube-prometheus-prometheus
subdomain: prometheus-operated
terminationGracePeriodSeconds: 600
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: config
secret:
defaultMode: 420
secretName: prometheus-prometheus-kube-prometheus-prometheus
- name: tls-assets
secret:
defaultMode: 420
secretName: prometheus-prometheus-kube-prometheus-prometheus-tls-assets
- emptyDir: {}
name: config-out
- configMap:
defaultMode: 420
name: prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0
name: prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0
- emptyDir: {}
name: prometheus-prometheus-kube-prometheus-prometheus-db
- name: prometheus-kube-prometheus-prometheus-token-mh66q
secret:
defaultMode: 420
secretName: prometheus-kube-prometheus-prometheus-token-mh66q
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2021-04-30T16:39:14Z"
status: "True"
type: PodScheduled
phase: Pending
qosClass: Burstable
In case someone needs to know the answer: in my case (the situation above) there were two Prometheus operators running in different namespaces, one in the default namespace and another in the monitoring namespace. I removed the one from the default namespace, and that resolved my pod-crashing issue.
I'm trying to deploy the MongoDB community operator on OpenShift 3.11, using the following commands:
git clone https://github.com/mongodb/mongodb-kubernetes-operator.git
cd mongodb-kubernetes-operator
oc new-project mongodb
oc create -f deploy/crds/mongodb.com_mongodb_crd.yaml -n mongodb
oc create -f deploy/operator/role.yaml -n mongodb
oc create -f deploy/operator/role_binding.yaml -n mongodb
oc create -f deploy/operator/service_account.yaml -n mongodb
oc apply -f deploy/openshift/operator_openshift.yaml -n mongodb
oc apply -f deploy/crds/mongodb.com_v1_mongodb_openshift_cr.yaml -n mongodb
The operator pod is running successfully, but the MongoDB replica set pods do not spin up. The error is as follows:
[kubenode#master mongodb-kubernetes-operator]$ oc get pods
NAME READY STATUS RESTARTS AGE
example-openshift-mongodb-0 1/2 CrashLoopBackOff 4 2m
mongodb-kubernetes-operator-66bfcbcf44-9xvj7 1/1 Running 0 2m
[kubenode#master mongodb-kubernetes-operator]$ oc logs -f example-openshift-mongodb-0 -c mongodb-agent
panic: Failed to get current user: user: unknown userid 1000510000
goroutine 1 [running]:
com.tengen/cm/util.init.3()
/data/mci/2f46ec94982c5440960d2b2bf2b6ae15/mms-automation/build/go-dependencies/src/com.tengen/cm/util/user.go:14 +0xe5
I have gone through all of the issues raised on the mongodb-kubernetes-operator repository that relate to this problem (reference), and found a suggestion to set the MANAGED_SECURITY_CONTEXT environment variable to true in the operator, mongodb and mongodb-agent containers.
I have done so for all of these containers, but I am still facing the same issue.
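For completeness, setting that variable simply means adding an env entry like the following to each of those containers (an illustrative excerpt; the variable name matches the listing below):
env:
  - name: MANAGED_SECURITY_CONTEXT
    value: "true"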
Here is the confirmation that the environment variables are correctly set:
[kubenode#master mongodb-kubernetes-operator]$ oc set env statefulset.apps/example-openshift-mongodb --list
# statefulsets/example-openshift-mongodb, container mongodb-agent
AGENT_STATUS_FILEPATH=/var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
AUTOMATION_CONFIG_MAP=example-openshift-mongodb-config
HEADLESS_AGENT=true
MANAGED_SECURITY_CONTEXT=true
# POD_NAMESPACE from field path metadata.namespace
# statefulsets/example-openshift-mongodb, container mongod
AGENT_STATUS_FILEPATH=/healthstatus/agent-health-status.json
MANAGED_SECURITY_CONTEXT=true
[kubenode#master mongodb-kubernetes-operator]$ oc set env deployment.apps/mongodb-kubernetes-operator --list
# deployments/mongodb-kubernetes-operator, container mongodb-kubernetes-operator
# WATCH_NAMESPACE from field path metadata.namespace
# POD_NAME from field path metadata.name
MANAGED_SECURITY_CONTEXT=true
OPERATOR_NAME=mongodb-kubernetes-operator
AGENT_IMAGE=quay.io/mongodb/mongodb-agent:10.19.0.6562-1
VERSION_UPGRADE_HOOK_IMAGE=quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.2
Operator Information
Operator Version: 0.3.0
MongoDB Image used: 4.2.6
Cluster Information
[kubenode#master mongodb-kubernetes-operator]$ openshift version
openshift v3.11.0+62803d0-1
[kubenode#master mongodb-kubernetes-operator]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2018-10-15T09:45:30Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2020-12-07T17:59:40Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Update
When I check the replica pod YAML (see below), I see three occurrences of the runAsUser security context set to 1000510000. I'm not sure how, but this is being set even though I'm not setting it manually.
[kubenode#master mongodb-kubernetes-operator]$ oc get -o yaml pod example-openshift-mongodb-0
apiVersion: v1
kind: Pod
metadata:
annotations:
openshift.io/scc: restricted
creationTimestamp: 2021-01-19T07:45:05Z
generateName: example-openshift-mongodb-
labels:
app: example-openshift-mongodb-svc
controller-revision-hash: example-openshift-mongodb-6549495b
statefulset.kubernetes.io/pod-name: example-openshift-mongodb-0
name: example-openshift-mongodb-0
namespace: mongodb
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: StatefulSet
name: example-openshift-mongodb
uid: 3e91eb40-5a2a-11eb-a5e0-0050569b1f59
resourceVersion: "15616863"
selfLink: /api/v1/namespaces/mongodb/pods/example-openshift-mongodb-0
uid: 3ea17a28-5a2a-11eb-a5e0-0050569b1f59
spec:
containers:
- command:
- agent/mongodb-agent
- -cluster=/var/lib/automation/config/cluster-config.json
- -skipMongoStart
- -noDaemonize
- -healthCheckFilePath=/var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
- -serveStatusPort=5000
- -useLocalMongoDbTools
env:
- name: AGENT_STATUS_FILEPATH
value: /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
- name: AUTOMATION_CONFIG_MAP
value: example-openshift-mongodb-config
- name: HEADLESS_AGENT
value: "true"
- name: MANAGED_SECURITY_CONTEXT
value: "true"
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
image: quay.io/mongodb/mongodb-agent:10.19.0.6562-1
imagePullPolicy: Always
name: mongodb-agent
readinessProbe:
exec:
command:
- /var/lib/mongodb-mms-automation/probes/readinessprobe
failureThreshold: 60
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources: {}
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- SETGID
- SETUID
runAsUser: 1000510000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/automation/config
name: automation-config
readOnly: true
- mountPath: /data
name: data-volume
- mountPath: /var/lib/mongodb-mms-automation/authentication
name: example-openshift-mongodb-agent-scram-credentials
- mountPath: /var/log/mongodb-mms-automation/healthstatus
name: healthstatus
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: mongodb-kubernetes-operator-token-lr9l4
readOnly: true
- command:
- /bin/sh
- -c
- |2
# run post-start hook to handle version changes
/hooks/version-upgrade
# wait for config to be created by the agent
while [ ! -f /data/automation-mongod.conf ]; do sleep 3 ; done ; sleep 2 ;
# start mongod with this configuration
exec mongod -f /data/automation-mongod.conf ;
env:
- name: AGENT_STATUS_FILEPATH
value: /healthstatus/agent-health-status.json
- name: MANAGED_SECURITY_CONTEXT
value: "true"
image: mongo:4.2.6
imagePullPolicy: IfNotPresent
name: mongod
resources: {}
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- SETGID
- SETUID
runAsUser: 1000510000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /data
name: data-volume
- mountPath: /var/lib/mongodb-mms-automation/authentication
name: example-openshift-mongodb-agent-scram-credentials
- mountPath: /healthstatus
name: healthstatus
- mountPath: /hooks
name: hooks
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: mongodb-kubernetes-operator-token-lr9l4
readOnly: true
dnsPolicy: ClusterFirst
hostname: example-openshift-mongodb-0
imagePullSecrets:
- name: mongodb-kubernetes-operator-dockercfg-jhplw
initContainers:
- command:
- cp
- version-upgrade-hook
- /hooks/version-upgrade
image: quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.2
imagePullPolicy: Always
name: mongod-posthook
resources: {}
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- SETGID
- SETUID
runAsUser: 1000510000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /hooks
name: hooks
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: mongodb-kubernetes-operator-token-lr9l4
readOnly: true
nodeName: node1.192.168.27.116.nip.io
nodeSelector:
node-role.kubernetes.io/compute: "true"
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1000510000
seLinuxOptions:
level: s0:c23,c2
serviceAccount: mongodb-kubernetes-operator
serviceAccountName: mongodb-kubernetes-operator
subdomain: example-openshift-mongodb-svc
terminationGracePeriodSeconds: 30
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-volume-example-openshift-mongodb-0
- name: automation-config
secret:
defaultMode: 416
secretName: example-openshift-mongodb-config
- name: example-openshift-mongodb-agent-scram-credentials
secret:
defaultMode: 384
secretName: example-openshift-mongodb-agent-scram-credentials
- emptyDir: {}
name: healthstatus
- emptyDir: {}
name: hooks
- name: mongodb-kubernetes-operator-token-lr9l4
secret:
defaultMode: 420
secretName: mongodb-kubernetes-operator-token-lr9l4
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2021-01-19T07:46:45Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2021-01-19T07:46:39Z
message: 'containers with unready status: [mongodb-agent]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: null
message: 'containers with unready status: [mongodb-agent]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: 2021-01-19T07:45:05Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://bd3ede9178bb78267bc19d1b5da0915d3bcd1d4dcee3e142c7583424bd2aa777
image: docker.io/mongo:4.2.6
imageID: docker-pullable://docker.io/mongo@sha256:c880f6b56f443bb4d01baa759883228cd84fa8d78fa1a36001d1c0a0712b5a07
lastState: {}
name: mongod
ready: true
restartCount: 0
state:
running:
startedAt: 2021-01-19T07:46:55Z
- containerID: docker://5e39c0b6269b8231bbf9cabb4ff3457d9f91e878eff23953e318a9475fb8a90e
image: quay.io/mongodb/mongodb-agent:10.19.0.6562-1
imageID: docker-pullable://quay.io/mongodb/mongodb-agent@sha256:790c2670ef7cefd61cfaabaf739de16dbd2e07dc3b539add0da21ab7d5ac7626
lastState:
terminated:
containerID: docker://5e39c0b6269b8231bbf9cabb4ff3457d9f91e878eff23953e318a9475fb8a90e
exitCode: 2
finishedAt: 2021-01-19T19:39:58Z
reason: Error
startedAt: 2021-01-19T19:39:58Z
name: mongodb-agent
ready: false
restartCount: 144
state:
waiting:
message: Back-off 5m0s restarting failed container=mongodb-agent pod=example-openshift-mongodb-0_mongodb(3ea17a28-5a2a-11eb-a5e0-0050569b1f59)
reason: CrashLoopBackOff
hostIP: 192.168.27.116
initContainerStatuses:
- containerID: docker://7c31cef2a68e3e6100c2cc9c83e3780313f1e8ab43bebca79ad4d48613f124bd
image: quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.2
imageID: docker-pullable://quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook@sha256:e99105b1c54e12913ddaf470af8025111a6e6e4c8917fc61be71d1bc0328e7d7
lastState: {}
name: mongod-posthook
ready: true
restartCount: 0
state:
terminated:
containerID: docker://7c31cef2a68e3e6100c2cc9c83e3780313f1e8ab43bebca79ad4d48613f124bd
exitCode: 0
finishedAt: 2021-01-19T07:46:45Z
reason: Completed
startedAt: 2021-01-19T07:46:44Z
phase: Running
podIP: 10.129.0.119
qosClass: BestEffort
startTime: 2021-01-19T07:46:39Z
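For context on where 1000510000 comes from: under the restricted SCC (note the openshift.io/scc: restricted annotation above), OpenShift assigns runAsUser and fsGroup from the project's UID-range annotations, so the mongodb namespace would carry something like the following. The exact values are assumptions inferred from the observed UID:
apiVersion: v1
kind: Namespace
metadata:
  name: mongodb
  annotations:
    openshift.io/sa.scc.uid-range: 1000510000/10000             # first UID of this range becomes runAsUser
    openshift.io/sa.scc.supplemental-groups: 1000510000/10000   # used to derive fsGroup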
Following the project from here, I am trying to integrate the Airflow Kubernetes executor using an NFS server as the backing storage for the PV. I have a PV airflow-pv which is linked to the NFS server. The Airflow webserver and scheduler use a PVC airflow-pvc which is bound to airflow-pv. I've placed my DAG files on the NFS server under /var/nfs/airflow/development/<dags/logs>. I can see newly added DAGs in the webserver UI as well. However, when I execute a DAG from the UI, the scheduler fires up a new pod for that task, but the new worker pod fails to run, saying:
Unable to mount volumes for pod "tutorialprintdate-3e1a4443363e4c9f81fd63438cdb9873_development(976b1e64-b46d-11e9-92af-025000000001)": timeout expired waiting for volumes to attach or mount for pod "development"/"tutorialprintdate-3e1a4443363e4c9f81fd63438cdb9873". list of unmounted volumes=[airflow-dags]. list of unattached volumes=[airflow-dags airflow-logs airflow-config default-token-hjwth]
Here are my webserver and scheduler deployment files:
apiVersion: v1
kind: Service
metadata:
name: airflow-webserver-svc
namespace: development
spec:
type: NodePort
ports:
- name: web
protocol: TCP
port: 8080
selector:
app: airflow-webserver-app
namespace: development
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: airflow-webserver-dep
namespace: development
spec:
replicas: 1
selector:
matchLabels:
app: airflow-webserver-app
namespace: development
template:
metadata:
labels:
app: airflow-webserver-app
namespace: development
spec:
restartPolicy: Always
containers:
- name: airflow-webserver-app
image: airflow:externalConfigs
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8080
args: ["-webserver"]
env:
- name: AIRFLOW_KUBE_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: AIRFLOW__CORE__FERNET_KEY
valueFrom:
secretKeyRef:
name: airflow-secrets
key: AIRFLOW__CORE__FERNET_KEY
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: airflow-secrets
key: MYSQL_PASSWORD
- name: MYSQL_PASSWORD
valueFrom:
secretKeyRef:
name: airflow-secrets
key: MYSQL_PASSWORD
- name: DB_HOST
value: mysql-svc.development.svc.cluster.local
- name: DB_PORT
value: "3306"
- name: MYSQL_DATABASE
value: airflow
- name: MYSQL_USER
value: airflow
- name: MYSQL_PASSWORD
value: airflow
- name: AIRFLOW__CORE__EXECUTOR
value: "KubernetesExecutor"
volumeMounts:
- name: airflow-config
mountPath: /usr/local/airflow/airflow.cfg
subPath: airflow.cfg
- name: airflow-files
mountPath: /usr/local/airflow/dags
subPath: airflow/development/dags
- name: airflow-files
mountPath: /usr/local/airflow/plugins
subPath: airflow/development/plugins
- name: airflow-files
mountPath: /usr/local/airflow/logs
subPath: airflow/development/logs
- name: airflow-files
mountPath: /usr/local/airflow/temp
subPath: airflow/development/temp
volumes:
- name: airflow-files
persistentVolumeClaim:
claimName: airflow-pvc
- name: airflow-config
configMap:
name: airflow-config
The scheduler YAML file is exactly the same, except that the container args are args: ["-scheduler"]. Here is my airflow.cfg file:
apiVersion: v1
kind: ConfigMap
metadata:
name: "airflow-config"
namespace: development
data:
airflow.cfg: |
[core]
airflow_home = /usr/local/airflow
dags_folder = /usr/local/airflow/dags
base_log_folder = /usr/local/airflow/logs
executor = KubernetesExecutor
plugins_folder = /usr/local/airflow/plugins
load_examples = false
[scheduler]
child_process_log_directory = /usr/local/airflow/logs/scheduler
[webserver]
rbac = false
[kubernetes]
airflow_configmap =
worker_container_repository = airflow
worker_container_tag = externalConfigs
worker_container_image_pull_policy = IfNotPresent
delete_worker_pods = true
dags_volume_claim = airflow-pvc
dags_volume_subpath =
logs_volume_claim = airflow-pvc
logs_volume_subpath =
env_from_configmap_ref = airflow-config
env_from_secret_ref = airflow-secrets
in_cluster = true
namespace = development
[kubernetes_node_selectors]
# the key-value pairs to be given to worker pods.
# the worker pods will be scheduled to the nodes of the specified key-value pairs.
# should be supplied in the format: key = value
[kubernetes_environment_variables]
# the below configs get overwritten by the above [kubernetes] configs
AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM = airflow-pvc
AIRFLOW__KUBERNETES__DAGS_VOLUME_SUBPATH = var/nfs/airflow/development/dags
AIRFLOW__KUBERNETES__LOGS_VOLUME_CLAIM = airflow-pvc
AIRFLOW__KUBERNETES__LOGS_VOLUME_SUBPATH = var/nfs/airflow/development/logs
[kubernetes_secrets]
AIRFLOW__CORE__SQL_ALCHEMY_CONN = airflow-secrets=AIRFLOW__CORE__SQL_ALCHEMY_CONN
AIRFLOW_HOME = airflow-secrets=AIRFLOW_HOME
[cli]
api_client = airflow.api.client.json_client
endpoint_url = https://airflow.crunchanalytics.cloud
[api]
auth_backend = airflow.api.auth.backend.default
[admin]
# ui to hide sensitive variable fields when set to true
hide_sensitive_variable_fields = true
After firing a manual task, the scheduler logs tell me that KubernetesExecutorConfig() executed with all values as None. It seems like it didn't pick up the configs? I've tried almost everything I know of, but cannot manage to make it work. Could someone tell me what I am missing?
[2019-08-01 14:44:22,944] {jobs.py:1341} INFO - Sending ('kubernetes_sample', 'run_this_first', datetime.datetime(2019, 8, 1, 13, 45, 51, 874679, tzinfo=<Timezone [UTC]>), 1) to executor with priority 3 and queue default
[2019-08-01 14:44:22,944] {base_executor.py:56} INFO - Adding to queue: airflow run kubernetes_sample run_this_first 2019-08-01T13:45:51.874679+00:00 --local -sd /usr/local/airflow/dags/airflow/development/dags/k8s_dag.py
[2019-08-01 14:44:22,948] {kubernetes_executor.py:629} INFO - Add task ('kubernetes_sample', 'run_this_first', datetime.datetime(2019, 8, 1, 13, 45, 51, 874679, tzinfo=<Timezone [UTC]>), 1) with command airflow run kubernetes_sample run_this_first 2019-08-01T13:45:51.874679+00:00 --local -sd /usr/local/airflow/dags/airflow/development/dags/k8s_dag.py with executor_config {}
[2019-08-01 14:44:22,949] {kubernetes_executor.py:379} INFO - Kubernetes job is (('kubernetes_sample', 'run_this_first', datetime.datetime(2019, 8, 1, 13, 45, 51, 874679, tzinfo=<Timezone [UTC]>), 1), 'airflow run kubernetes_sample run_this_first 2019-08-01T13:45:51.874679+00:00 --local -sd /usr/local/airflow/dags/airflow/development/dags/k8s_dag.py', KubernetesExecutorConfig(image=None, image_pull_policy=None, request_memory=None, request_cpu=None, limit_memory=None, limit_cpu=None, gcp_service_account_key=None, node_selectors=None, affinity=None, annotations={}, volumes=[], volume_mounts=[], tolerations=None))
[2019-08-01 14:44:23,042] {kubernetes_executor.py:292} INFO - Event: kubernetessamplerunthisfirst-7fe05ddb34aa4cb9a5604e420d5b60a3 had an event of type ADDED
[2019-08-01 14:44:23,046] {kubernetes_executor.py:324} INFO - Event: kubernetessamplerunthisfirst-7fe05ddb34aa4cb9a5604e420d5b60a3 Pending
[2019-08-01 14:44:23,049] {kubernetes_executor.py:292} INFO - Event: kubernetessamplerunthisfirst-7fe05ddb34aa4cb9a5604e420d5b60a3 had an event of type MODIFIED
[2019-08-01 14:44:23,049] {kubernetes_executor.py:324} INFO - Event: kubernetessamplerunthisfirst-7fe05ddb34aa4cb9a5604e420d5b60a3 Pending
For reference, here are my PV and PVC:
kind: PersistentVolume
apiVersion: v1
metadata:
name: airflow-pv
labels:
mode: local
environment: development
spec:
persistentVolumeReclaimPolicy: Retain
storageClassName: airflow-pv
capacity:
storage: 4Gi
accessModes:
- ReadWriteMany
nfs:
server: 10.105.225.217
path: "/"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: airflow-pvc
namespace: development
spec:
storageClassName: airflow-pv
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
selector:
matchLabels:
mode: local
environment: development
Using Airflow version: 1.10.3
Since there is no answer yet, I'll share my findings so far. In my airflow.cfg, under the [kubernetes] section, we are to pass the following values:
dags_volume_claim = airflow-pvc
dags_volume_subpath = airflow/development/dags
logs_volume_claim = airflow-pvc
logs_volume_subpath = airflow/development/logs
The way the scheduler creates a new pod from the above configs is as follows (showing only the volumes and volumeMounts):
"volumes": [
{
"name": "airflow-dags",
"persistentVolumeClaim": {
"claimName": "airflow-pvc"
}
},
{
"name": "airflow-logs",
"persistentVolumeClaim": {
"claimName": "airflow-pvc"
}
}],
"containers": [
{ ...
"volumeMounts": [
{
"name": "airflow-dags",
"readOnly": true,
"mountPath": "/usr/local/airflow/dags",
"subPath": "airflow/development/dags"
},
{
"name": "airflow-logs",
"mountPath": "/usr/local/airflow/logs",
"subPath": "airflow/development/logs"
}]
...}]
Kubernetes does NOT like multiple volumes pointing to the same PVC (airflow-pvc). To fix this, I had to create two PVCs (and PVs) for dags and logs, sketched below: dags_volume_claim = airflow-dags-pvc and logs_volume_claim = airflow-log-pvc. That works fine.
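For reference, a minimal sketch of the split claims (the names follow the config values above; the matching PVs mirror the original airflow-pv definition and are omitted here):
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: airflow-dags-pvc
  namespace: development
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: airflow-log-pvc
  namespace: development
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi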
I don't know if this has already been addressed in a newer version of Airflow (I am using 1.10.3). The Airflow scheduler should handle the case where people use the same PVC by creating a pod with a single volume and two volumeMounts referring to that volume, e.g.
"volumes": [
{
"name": "airflow-dags-logs", <--just an example name
"persistentVolumeClaim": {
"claimName": "airflow-pvc"
}
}
"containers": [
{ ...
"volumeMounts": [
{
"name": "airflow-dags-logs",
"readOnly": true,
"mountPath": "/usr/local/airflow/dags",
"subPath": "airflow/development/dags" <--taken from configs
},
{
"name": "airflow-dags-logs",
"mountPath": "/usr/local/airflow/logs",
"subPath": "airflow/development/logs" <--taken from configs
}]
...}]
I deployed a pod with the above configuration and it works!
I am following the article here: https://istio.io/docs/tasks/security/authn-policy/
Specifically, when I follow the instructions in the Setup section, I can't connect to any httpbin residing in the foo and bar namespaces, but the one in legacy is okay. I suspect there is something wrong with the sidecar proxy that was installed.
Here is the httpbin pod YAML file (after being injected with the istioctl kube-inject --includeIPRanges "10.32.0.0/16" command). I use --includeIPRanges so that the pod can communicate with external IPs (for my debugging purposes, to install dnsutils, etc.):
apiVersion: v1
kind: Pod
metadata:
annotations:
sidecar.istio.io/inject: "true"
sidecar.istio.io/status: '{"version":"4120ea817406fd7ed43b7ecf3f2e22abe453c44d3919389dcaff79b210c4cd86","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs"],"imagePullSecrets":null}'
creationTimestamp: 2018-08-15T11:40:59Z
generateName: httpbin-8b9cf99f5-
labels:
app: httpbin
pod-template-hash: "465795591"
version: v1
name: httpbin-8b9cf99f5-9c47z
namespace: foo
ownerReferences:
- apiVersion: extensions/v1beta1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: httpbin-8b9cf99f5
uid: 1450d75d-a080-11e8-aece-42010a940168
resourceVersion: "65722138"
selfLink: /api/v1/namespaces/foo/pods/httpbin-8b9cf99f5-9c47z
uid: 1454b68d-a080-11e8-aece-42010a940168
spec:
containers:
- image: docker.io/citizenstig/httpbin
imagePullPolicy: IfNotPresent
name: httpbin
ports:
- containerPort: 8000
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-pkpvf
readOnly: true
- args:
- proxy
- sidecar
- --configPath
- /etc/istio/proxy
- --binaryPath
- /usr/local/bin/envoy
- --serviceCluster
- httpbin
- --drainDuration
- 45s
- --parentShutdownDuration
- 1m0s
- --discoveryAddress
- istio-pilot.istio-system:15007
- --discoveryRefreshDelay
- 1s
- --zipkinAddress
- zipkin.istio-system:9411
- --connectTimeout
- 10s
- --statsdUdpAddress
- istio-statsd-prom-bridge.istio-system.istio-system:9125
- --proxyAdminPort
- "15000"
- --controlPlaneAuthPolicy
- NONE
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: INSTANCE_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: ISTIO_META_POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: ISTIO_META_INTERCEPTION_MODE
value: REDIRECT
image: docker.io/istio/proxyv2:1.0.0
imagePullPolicy: IfNotPresent
name: istio-proxy
resources:
requests:
cpu: 10m
securityContext:
privileged: false
readOnlyRootFilesystem: true
runAsUser: 1337
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/istio/proxy
name: istio-envoy
- mountPath: /etc/certs/
name: istio-certs
readOnly: true
dnsPolicy: ClusterFirst
initContainers:
- args:
- -p
- "15001"
- -u
- "1337"
- -m
- REDIRECT
- -i
- 10.32.0.0/16
- -x
- ""
- -b
- 8000,
- -d
- ""
image: docker.io/istio/proxy_init:1.0.0
imagePullPolicy: IfNotPresent
name: istio-init
resources: {}
securityContext:
capabilities:
add:
- NET_ADMIN
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
nodeName: gke-tvlk-data-dev-default-medium-pool-46397778-q2sb
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: default-token-pkpvf
secret:
defaultMode: 420
secretName: default-token-pkpvf
- emptyDir:
medium: Memory
name: istio-envoy
- name: istio-certs
secret:
defaultMode: 420
optional: true
secretName: istio.default
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2018-08-15T11:41:01Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2018-08-15T11:44:28Z
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2018-08-15T11:40:59Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://758e130a4c31a15c1b8bc1e1f72bd7739d5fa1103132861eea9ae1a6ae1f080e
image: citizenstig/httpbin:latest
imageID: docker-pullable://citizenstig/httpbin@sha256:b81c818ccb8668575eb3771de2f72f8a5530b515365842ad374db76ad8bcf875
lastState: {}
name: httpbin
ready: true
restartCount: 0
state:
running:
startedAt: 2018-08-15T11:41:01Z
- containerID: docker://9c78eac46a99457f628493975f5b0c5bbffa1dac96dab5521d2efe4143219575
image: istio/proxyv2:1.0.0
imageID: docker-pullable://istio/proxyv2@sha256:77915a0b8c88cce11f04caf88c9ee30300d5ba1fe13146ad5ece9abf8826204c
lastState:
terminated:
containerID: docker://52299a80a0fa8949578397357861a9066ab0148ac8771058b83e4c59e422a029
exitCode: 255
finishedAt: 2018-08-15T11:44:27Z
reason: Error
startedAt: 2018-08-15T11:41:02Z
name: istio-proxy
ready: true
restartCount: 1
state:
running:
startedAt: 2018-08-15T11:44:28Z
hostIP: 10.32.96.27
initContainerStatuses:
- containerID: docker://f267bb44b70d2d383ce3f9943ab4e917bb0a42ecfe17fe0ed294bde4d8284c58
image: istio/proxy_init:1.0.0
imageID: docker-pullable://istio/proxy_init@sha256:345c40053b53b7cc70d12fb94379e5aa0befd979a99db80833cde671bd1f9fad
lastState: {}
name: istio-init
ready: true
restartCount: 0
state:
terminated:
containerID: docker://f267bb44b70d2d383ce3f9943ab4e917bb0a42ecfe17fe0ed294bde4d8284c58
exitCode: 0
finishedAt: 2018-08-15T11:41:00Z
reason: Completed
startedAt: 2018-08-15T11:41:00Z
phase: Running
podIP: 10.32.19.61
qosClass: Burstable
startTime: 2018-08-15T11:40:59Z
Here is the example command where I get the error (sleep.legacy -> httpbin.foo):
> kubectl exec $(kubectl get pod -l app=sleep -n legacy -o jsonpath={.items..metadata.name}) -c sleep -n legacy -- curl http://httpbin.foo:8000/ip -s -o /dev/null -w "%{http_code}\n"
000
command terminated with exit code 7
Here is the example command where I get a success status (sleep.legacy -> httpbin.legacy):
> kubectl exec $(kubectl get pod -l app=sleep -n legacy -o jsonpath={.items..metadata.name}) -csleep -n legacy -- curl http://httpbin.legacy:8000/ip -s -o /dev/null -w "%{http_code}\n"
200
I have followed the instructions to ensure there is no mTLS policy defined, etc.:
> kubectl get policies.authentication.istio.io --all-namespaces
No resources found.
> kubectl get meshpolicies.authentication.istio.io
No resources found.
> kubectl get destinationrules.networking.istio.io --all-namespaces -o yaml | grep "host:"
host: istio-policy.istio-system.svc.cluster.local
host: istio-telemetry.istio-system.svc.cluster.local
Never mind, I think I found out why: there was a configuration mistake on my part.
If you take a look at the statsd address, it is defined with the unrecognized hostname istio-statsd-prom-bridge.istio-system.istio-system:9125 (the namespace suffix is duplicated). I noticed this after seeing the proxy container being restarted/crashing multiple times.
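In other words, the --statsdUdpAddress argument in the sidecar args above should point at the service in a single namespace. The corrected pair of lines would look like this (inferred from the hostname in the error, so treat it as a sketch):
- --statsdUdpAddress
- istio-statsd-prom-bridge.istio-system:9125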