Kubernetes doesn't allow mounting a file to a container

I ran into the error below when trying to deploy an application in a Kubernetes cluster. It looks like Kubernetes doesn't allow mounting a file into a container; do you know the possible reason?
Deployment config file:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: model-loader-service
  namespace: "{{ .Values.nsPrefix }}-aai"
spec:
  selector:
    matchLabels:
      app: model-loader-service
  template:
    metadata:
      labels:
        app: model-loader-service
      name: model-loader-service
    spec:
      containers:
      - name: model-loader-service
        image: "{{ .Values.image.modelLoaderImage }}:{{ .Values.image.modelLoaderVersion }}"
        imagePullPolicy: {{ .Values.pullPolicy }}
        env:
        - name: CONFIG_HOME
          value: /opt/app/model-loader/config/
        volumeMounts:
        - mountPath: /etc/localtime
          name: localtime
          readOnly: true
        - mountPath: /opt/app/model-loader/config/
          name: aai-model-loader-config
        - mountPath: /var/log/onap
          name: aai-model-loader-logs
        - mountPath: /opt/app/model-loader/bundleconfig/etc/logback.xml
          name: aai-model-loader-log-conf
          subPath: logback.xml
        ports:
        - containerPort: 8080
        - containerPort: 8443
      - name: filebeat-onap-aai-model-loader
        image: {{ .Values.image.filebeat }}
        imagePullPolicy: {{ .Values.pullPolicy }}
        volumeMounts:
        - mountPath: /usr/share/filebeat/filebeat.yml
          name: filebeat-conf
        - mountPath: /var/log/onap
          name: aai-model-loader-logs
        - mountPath: /usr/share/filebeat/data
          name: aai-model-loader-filebeat
      volumes:
      - name: localtime
        hostPath:
          path: /etc/localtime
      - name: aai-model-loader-config
        hostPath:
          path: "/dockerdata-nfs/{{ .Values.nsPrefix }}/aai/model-loader/appconfig/"
      - name: filebeat-conf
        hostPath:
          path: /dockerdata-nfs/{{ .Values.nsPrefix }}/log/filebeat/logback/filebeat.yml
Detailed information about this issue:
message: 'invalid header field value "oci runtime error: container_linux.go:247:
starting container process caused \"process_linux.go:359: container init
caused \\\"rootfs_linux.go:53: mounting \\\\\\\"/dockerdata-nfs/onap/log/filebeat/logback/filebeat.yml\\\\\\\"
to rootfs \\\\\\\"/var/lib/docker/aufs/mnt/7cd32a29938e9f70a727723f550474cb5b41c0966f45ad0c323360779f08cf5c\\\\\\\"
at \\\\\\\"/var/lib/docker/aufs/mnt/7cd32a29938e9f70a727723f550474cb5b41c0966f45ad0c323360779f08cf5c/usr/share/filebeat/filebeat.yml\\\\\\\"
caused \\\\\\\"not a directory\\\\\\\"\\\"\"\n"'
....
$ docker version
Client:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Tue Jan 10 20:38:45 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Tue Jan 10 20:38:45 2017
 OS/Arch:      linux/amd64
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:48:23Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.3-rancher3", GitCommit:"772c4c54e1f4ae7fc6f63a8e1ecd9fe616268e16", GitTreeState:"clean", BuildDate:"2017-11-27T19:51:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

caused "not a directory" is kind of self-explanatory. What is the exact volume and volumeMount definition you use? Do you use subPath in your declaration?
EDIT: change
- name: filebeat-conf
  hostPath:
    path: /dockerdata-nfs/{{ .Values.nsPrefix }}/log/filebeat/logback/filebeat.yml
to
- name: filebeat-conf
  hostPath:
    path: /dockerdata-nfs/{{ .Values.nsPrefix }}/log/filebeat/logback/
and add subPath: filebeat.yml to the corresponding volumeMount.
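That is, the volumeMount in the filebeat sidecar would become (a sketch reusing the names from your manifest):
- mountPath: /usr/share/filebeat/filebeat.yml
  name: filebeat-conf
  subPath: filebeat.yml   # mount just this one file out of the host directory
With hostPath pointing at the directory and subPath selecting the single file, you avoid mounting a whole directory over a path that is expected to be a file.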

SELinux may also be the culprit here. Log on to the node and execute sestatus. If the policy is disabled, you will see SELINUX=disabled in the output; otherwise it will be something similar to this:
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: mcs
Current mode: permissive
Mode from config file: permissive
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 31
First Option:
You can disable SELinux by editing the /etc/selinux/config file and changing SELINUX=permissive to SELINUX=disabled. Once done, reboot the machine and redeploy to see if that fixed it. However, this is not the recommended way and should be seen as a temporary fix.
Second Option:
Log on to the node and execute ps -efZ | grep kubelet, which will give something like this:
system_u:system_r:kernel_t:s0 root 1592 1 2 May23 ? 09:58:18 /usr/local/bin/kubelet --anonymous-auth=false
Now, from this output, capture the string system_u:system_r:kernel_t:s0, which can be translated into a securityContext in your deployment as below.
securityContext:
  seLinuxOptions:
    user: system_u
    role: system_r
    type: spc_t
    level: s0
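As a placement hint, a minimal sketch of where this could sit in the Deployment above (pod-level, so it applies to all containers; setting it on an individual container also works):
spec:
  template:
    spec:
      securityContext:
        seLinuxOptions:
          user: system_u
          role: system_r
          type: spc_t
          level: s0
      containers:
      - name: filebeat-onap-aai-model-loader
        ...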
Deploy your application and check the logs to see if it is fixed. Do let me know if this works for you or if you need any further help.

Is this a multi-node cluster? If so, the file needs to exist on all Kubernetes nodes, since the pod is typically scheduled on a randomly available host machine. In any case, ConfigMaps are a much better way to supply static/read-only files to a container.
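For example, a minimal sketch of the ConfigMap route for the filebeat config (the ConfigMap name, the kubectl command, and the source path are illustrative, not taken from the question):
# create the ConfigMap once from the existing file
#   kubectl create configmap filebeat-conf --from-file=filebeat.yml=/path/to/filebeat.yml
# then reference it in the pod spec instead of the hostPath volume:
volumeMounts:
- mountPath: /usr/share/filebeat/filebeat.yml
  name: filebeat-conf
  subPath: filebeat.yml
volumes:
- name: filebeat-conf
  configMap:
    name: filebeat-conf
Unlike hostPath, the ConfigMap is served by the cluster API, so the file does not need to be present on every node.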

Related

Is EFS a good logs backup option if Loki pod terminated accidentally in EKS Fargate

I am currently using Loki to store logs generated by my applications on EKS Fargate. The sidecar pattern with promtail is used to scrape logs. A single Loki pod is used, and S3 is configured as the destination to store logs. It works nicely as expected. However, when I tested the availability of the logging system by deleting pods, I discovered that if Loki's pod was deleted, some logs would be missing (roughly the 20 minutes before the pod was deleted), even after the pod restarted.
To solve this problem, I tried to use EFS as the persistent volume of Loki's pod, mounting the path /loki. I followed this article for the whole process (https://aws.amazon.com/blogs/aws/new-aws-fargate-for-amazon-eks-now-supports-amazon-efs/). But I got an error from the Loki pod with msg="error running loki" err="mkdir /loki/compactor: permission denied"
Therefore, I have 2 questions in my mind:
Should I use EFS as a solution for log backup in my case?
Why did I get a permission denied error inside the pod, and is there any way to solve this problem?
My Loki-config.yaml
auth_enabled: false
server:
  http_listen_port: 3100
  # grpc_listen_port: 9096
ingester:
  wal:
    enabled: true
    dir: /loki/wal
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    # final_sleep: 0s
  chunk_idle_period: 3m
  chunk_retain_period: 30s
  max_transfer_retries: 0
  chunk_target_size: 1048576
schema_config:
  configs:
    - from: 2020-05-15
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 24h
storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    shared_store: s3
  aws:
    bucketnames: bucketnames
    endpoint: s3.us-west-2.amazonaws.com
    region: us-west-2
    access_key_id: access_key_id
    secret_access_key: secret_access_key
    sse_encryption: true
compactor:
  working_directory: /loki/compactor
  shared_store: s3
  compaction_interval: 5m
limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 48h
chunk_store_config:
  max_look_back_period: 0s
table_manager:
  retention_deletes_enabled: true
  retention_period: 96h
querier:
  query_ingesters_within: 0
analytics:
  reporting_enabled: false
Deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: fargate-api-dev
  name: dev-loki
spec:
  selector:
    matchLabels:
      app: dev-loki
  template:
    metadata:
      labels:
        app: dev-loki
    spec:
      volumes:
        - name: loki-config
          configMap:
            name: dev-loki-config
        - name: dev-loki-efs-pv
          persistentVolumeClaim:
            claimName: dev-loki-efs-pvc
      containers:
        - name: loki
          image: loki:2.6.1
          args:
            - -print-config-stderr=true
            - -config.file=/tmp/loki.yaml
          resources:
            limits:
              memory: "500Mi"
              cpu: "200m"
          ports:
            - containerPort: 3100
          volumeMounts:
            - name: dev-loki-config
              mountPath: /tmp
              readOnly: false
            - name: dev-loki-efs-pv
              mountPath: /loki
Promtail-config.yaml
server:
  log_level: info
  http_listen_port: 9080
clients:
  - url: http://loki.com/loki/api/v1/push
positions:
  filename: /run/promtail/positions.yaml
scrape_configs:
  - job_name: api-log
    static_configs:
      - targets:
          - localhost
        labels:
          job: apilogs
          pod: ${POD_NAME}
          __path__: /var/log/*.log
I had a similar issue using EFS as the volume to store the logs, and I found this solution: https://github.com/grafana/loki/issues/2018#issuecomment-1030221498
Basically, the Loki container on its own is not able to create the directories it needs to start working, so we used an initContainer to do it for it.
This solution worked like a charm.
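For reference, a minimal sketch of that workaround added to the Deploy.yaml pod spec above: an initContainer that creates the /loki subdirectories from the config and hands ownership to Loki before the main container starts. The busybox image and the 10001 UID/GID are assumptions (check the user your Loki image actually runs as).
      initContainers:
        - name: init-loki-dirs
          image: busybox:1.35   # assumption: any small image with a shell works
          command:
            - sh
            - -c
            # directories taken from the Loki config above; 10001 is assumed to match the loki user
            - mkdir -p /loki/wal /loki/index /loki/index_cache /loki/compactor && chown -R 10001:10001 /loki
          volumeMounts:
            - name: dev-loki-efs-pv
              mountPath: /loki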

Kubernetes Executor does not spawn Kubernetes Pod Operator

I can't figure out where to look. I am running Airflow on GKE. It was running fine but recently started failing.
It seems like something changed in the cluster, but based on the logs, I can't figure out what.
My KubernetesExecutor stopped spawning KubernetesPodOperators, and there are no logs or errors.
If I apply the template I use for the operator directly (kubectl apply -f), it runs successfully.
Airflow 2.1.2
Kubectl
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.7", GitCommit:"132a687512d7fb058d0f5890f07d4121b3f0a2e2", GitTreeState:"clean", BuildDate:"2021-05-12T12:40:09Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.8-gke.900", GitCommit:"28ab8501be88ea42e897ca8514d7cd0b436253d9", GitTreeState:"clean", BuildDate:"2021-06-30T09:23:36Z", GoVersion:"go1.15.13b5", Compiler:"gc", Platform:"linux/amd64"}
Executor Template
apiVersion: v1
kind: Pod
metadata:
  ...
spec:
  restartPolicy: Never
  serviceAccountName: airflow # this account has rights to create pods
  automountServiceAccountToken: true
  volumes:
    - name: dags
      emptyDir: {}
    - name: logs
      emptyDir: {}
    - configMap:
        name: airflow-git-sync-configmap
      name: airflow-git-sync-configmap
  initContainers:
    - name: git-sync-clone
      securityContext:
        runAsUser: 65533
        runAsGroup: 65533
      image: k8s.gcr.io/git-sync/git-sync:v3.3.1
      imagePullPolicy: Always
      volumeMounts:
        - mountPath: /tmp/git
          name: dags
      resources:
        ...
      args: ["--one-time"]
      envFrom:
        - configMapRef:
            name: airflow-git-sync-configmap
        - secretRef:
            name: airflow-git-sync-secret
  containers:
    - name: base
      image: <artifactory_url>/airflow:latest
      volumeMounts:
        - name: dags
          mountPath: /opt/airflow/dags
        - name: logs
          mountPath: /opt/airflow/logs
      imagePullPolicy: Always
Pod template
apiVersion: v1
kind: Pod
metadata:
  ....
spec:
  serviceAccountName: airflow
  automountServiceAccountToken: true
  volumes:
    - name: sql
      emptyDir: {}
  initContainers:
    - name: git-sync
      image: k8s.gcr.io/git-sync/git-sync:v3.3.1
      imagePullPolicy: Always
      args: ["--one-time"]
      volumeMounts:
        - name: sql
          mountPath: /tmp/git/
      resources:
        requests:
          memory: 300Mi
          cpu: 500m
        limits:
          memory: 600Mi
          cpu: 1000m
      envFrom:
        - configMapRef:
            name: git-sync-configmap
        - secretRef:
            name: airflow-git-sync-secret
  containers:
    - name: base
      imagePullPolicy: Always
      image: <artifactory_url>/clickhouse-client-gcloud:20.6.4.44
      volumeMounts:
        - name: sql
          mountPath: /opt/sql
      resources:
        ....
      env:
        - name: GS_SERVICE_ACCOUNT
          valueFrom:
            secretKeyRef:
              name: gs-service-account
              key: service_account.json
        - name: DB_CREDENTIALS
          valueFrom:
            secretKeyRef:
              name: estimation-db-secret
              key: db_cred.json
DAG code
from textwrap import dedent

from airflow import DAG
from airflow.utils.dates import days_ago
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

TEMPLATE_PATH = "/opt/airflow/dags/airflow-dags.git/pod_templates"

args = {
    ...
}


def create_pipeline(dag_: DAG):
    task_startup_client = KubernetesPodOperator(
        name="clickhouse-client",
        task_id="clickhouse-client",
        labels={"application": "clickhouse-client-gsutil"},
        pod_template_file=f"{TEMPLATE_PATH}/template.yaml",
        cmds=["sleep", "60000"],
        reattach_on_restart=True,
        is_delete_operator_pod=False,
        get_logs=True,
        log_events_on_failure=True,
        dag=dag_,
    )

    task_startup_client


with DAG(
    dag_id="MANUAL-GKE-clickhouse-client",
    default_args=args,
    schedule_interval=None,
    max_active_runs=1,
    start_date=days_ago(2),
    tags=["utility"],
) as dag:
    create_pipeline(dag)
I ran Airflow with DEBUG logging and there is nothing there, just a successful completion:
Scheduler log
...
Event: manualgkeclickhouseclientaticlickhouseclient.9959fa1fd13a4b6fbdaf40549a09d2f9 Succeeded
...
Executor logs
[2021-08-15 18:40:27,045] {settings.py:208} DEBUG - Setting up DB connection pool (PID 1)
[2021-08-15 18:40:27,046] {settings.py:276} DEBUG - settings.prepare_engine_args(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=1
[2021-08-15 18:40:27,095] {cli_action_loggers.py:40} DEBUG - Adding <function default_action_log at 0x7f0556c5e280> to pre execution callback
[2021-08-15 18:40:28,070] {cli_action_loggers.py:66} DEBUG - Calling callbacks: [<function default_action_log at 0x7f0556c5e280>]
[2021-08-15 18:40:28,106] {settings.py:208} DEBUG - Setting up DB connection pool (PID 1)
[2021-08-15 18:40:28,107] {settings.py:244} DEBUG - settings.prepare_engine_args(): Using NullPool
[2021-08-15 18:40:28,109] {dagbag.py:496} INFO - Filling up the DagBag from /opt/airflow/dags/ati-airflow-dags.git/dag_clickhouse-client.py
[2021-08-15 18:40:28,110] {dagbag.py:311} DEBUG - Importing /opt/airflow/dags/ati-airflow-dags.git/dag_clickhouse-client.py
/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/backcompat/backwards_compat_converters.py:26 DeprecationWarning: This module is deprecated. Please use `kubernetes.client.models.V1Volume`.
/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/backcompat/backwards_compat_converters.py:27 DeprecationWarning: This module is deprecated. Please use `kubernetes.client.models.V1VolumeMount`.
[2021-08-15 18:40:28,135] {dagbag.py:461} DEBUG - Loaded DAG <DAG: MANUAL-GKE-clickhouse-client>
[2021-08-15 18:40:28,176] {plugins_manager.py:281} DEBUG - Loading plugins
[2021-08-15 18:40:28,176] {plugins_manager.py:225} DEBUG - Loading plugins from directory: /opt/airflow/plugins
[2021-08-15 18:40:28,177] {plugins_manager.py:205} DEBUG - Loading plugins from entrypoints
[2021-08-15 18:40:28,238] {plugins_manager.py:418} DEBUG - Integrate DAG plugins
Running <TaskInstance: MANUAL-GKE-clickhouse-client.clickhouse-client 2021-08-15T18:39:38.150950+00:00 [queued]> on host manualgkeclickhouseclientclickhouseclient.9959fa1fd13a4b6fbd
[2021-08-15 18:40:28,670] {cli_action_loggers.py:84} DEBUG - Calling callbacks: []
[2021-08-15 18:40:28,670] {settings.py:302} DEBUG - Disposing DB connection pool (PID 1)

Dynamic tagging for Fluentd td-agent source plugin

I'm trying to implement a Streaming Sidecar Container logging architecture in Kubernetes using Fluentd.
In a single pod I have:
emptyDir Volume (as log storage)
Application container
Fluent log-forwarder container
Basically, the Application container logs are stored in the shared emptyDir volume. The Fluentd log-forwarder container tails this log file in the shared emptyDir volume and forwards it to an external log aggregator.
The Fluentd log-forwarder container uses the following config in td-agent.conf:
<source>
  @type tail
  tag "#{ENV['TAG_VALUE']}"
  path (path to log file in volume)
  pos_file /var/log/td-agent/tmp/access.log.pos
  format json
  time_key time
  time_format %iso8601
  keep_time_key true
</source>
<match *.*>
  @type forward
  @id forward_tail
  heartbeat_type tcp
  <server>
    host (server-host-address)
  </server>
</match>
I'm using an environment variable to set the tag value so I can change it dynamically; e.g., when I have to use this container side by side with a different application container, I don't have to modify this config and rebuild the image again.
Now, I set the environment variable value during pod creation in Kubernetes:
.
.
spec:
  containers:
    - name: application-pod
      image: application-image:1.0
      ports:
        - containerPort: 1234
      volumeMounts:
        - name: logvolume
          mountPath: /var/log/app
    - name: log-forwarder
      image: log-forwarder-image:1.0
      env:
        - name: "TAG_VALUE"
          value: "app.service01"
      volumeMounts:
        - name: logvolume
          mountPath: /var/log/app
  volumes:
    - name: logvolume
      emptyDir: {}
After deploying the pod, I found that the tag value in the Fluentd log-forwarder container comes out empty (expected value: "app.service01"). I imagine it's because Fluentd's td-agent initializes before the TAG_VALUE environment variable gets assigned.
So, the main question is...
How can I dynamically set the td-agent's tag value?
But really, what I'm wondering is:
Is it possible to assign an environment variable before a container's initialization in Kubernetes?
As an answer to your first question (how can I dynamically set the td-agent's tag value?), what you are already doing seems like the best way: defining tag "#{ENV['TAG_VALUE']}" inside the Fluentd config file.
For your second question, environment variables are assigned before a container's initialization.
So it should work; I tested with the sample YAML below, and it worked fine.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-conf
data:
  fluentd.conf.template: |
    <source>
      @type tail
      tag "#{ENV['TAG_VALUE']}"
      path /var/log/nginx/access.log
      format nginx
    </source>
    <match *.*>
      @type stdout
    </match>
---
apiVersion: v1
kind: Pod
metadata:
  name: log-forwarder
  labels:
    purpose: test-fluentd
spec:
  containers:
    - name: nginx
      image: nginx:latest
      volumeMounts:
        - name: logvolume
          mountPath: /var/log/nginx
    - name: fluentd
      image: fluent/fluentd
      env:
        - name: "TAG_VALUE"
          value: "test.nginx"
        - name: "FLUENTD_CONF"
          value: "fluentd.conf"
      volumeMounts:
        - name: fluentd-conf
          mountPath: /fluentd/etc
        - name: logvolume
          mountPath: /var/log/nginx
  volumes:
    - name: fluentd-conf
      configMap:
        name: fluentd-conf
        items:
          - key: fluentd.conf.template
            path: fluentd.conf
    - name: logvolume
      emptyDir: {}
  restartPolicy: Never
And when I curl the nginx pod, I see this output on the fluentd container's stdout.
kubectl logs -f log-forwarder fluentd
2019-03-20 09:50:54.000000000 +0000 test.nginx: {"remote":"10.20.14.1","host":"-","user":"-","method":"GET","path":"/","code":"200","size":"612","referer":"-","agent":"curl/7.60.0","http_x_forwarded_for":"-"}
2019-03-20 09:50:55.000000000 +0000 test.nginx: {"remote":"10.20.14.1","host":"-","user":"-","method":"GET","path":"/","code":"200","size":"612","referer":"-","agent":"curl/7.60.0","http_x_forwarded_for":"-"}
2019-03-20 09:50:56.000000000 +0000 test.nginx: {"remote":"10.128.0.26","host":"-","user":"-","method":"GET","path":"/","code":"200","size":"612","referer":"-","agent":"curl/7.60.0","http_x_forwarded_for":"-"}
As you can see, my environment variable TAG_VALUE=test.nginx has been applied to the log entries.
I hope it will be useful.
You can use the combination of fluent-plugin-kubernetes_metadata_filter and fluent-plugin-rewrite-tag-filter to set the container name or other metadata as the tag.

Kubernetes job pod completed successfully but one of the containers were not ready

I've got some strange looking behavior.
When a job is run, it completes successfully but one of the containers says it's not (or was not..) ready:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
default **********-migration-22-20-16-29-11-2018-xnffp 1/2 Completed 0 11h 10.4.5.8 gke-******
job yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: migration-${timestamp_hhmmssddmmyy}
  labels:
    jobType: database-migration
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: app
          image: "${appApiImage}"
          imagePullPolicy: IfNotPresent
          command:
            - php
            - artisan
            - migrate
        - name: cloudsql-proxy
          image: gcr.io/cloudsql-docker/gce-proxy:1.11
          command: ["/cloud_sql_proxy",
                    "-instances=${SQL_INSTANCE_NAME}=tcp:3306",
                    "-credential_file=/secrets/cloudsql/credentials.json"]
          securityContext:
            runAsUser: 2  # non-root user
            allowPrivilegeEscalation: false
          volumeMounts:
            - name: cloudsql-instance-credentials
              mountPath: /secrets/cloudsql
              readOnly: true
      volumes:
        - name: cloudsql-instance-credentials
          secret:
            secretName: cloudsql-instance-credentials
What may be the cause of this behavior? There are no readiness or liveness probes defined on the containers.
If I do a describe on the pod, the relevant info is:
...
Command:
  php
  artisan
  migrate
State:          Terminated
  Reason:       Completed
  Exit Code:    0
  Started:      Thu, 29 Nov 2018 22:20:18 +0000
  Finished:     Thu, 29 Nov 2018 22:20:19 +0000
Ready:          False
Restart Count:  0
Requests:
  cpu:  100m
...
A Pod with a Ready status means it "is able to serve requests and should be added to the load balancing pools of all matching Services", see https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions
In your case, you don't want to serve requests, but simply to execute php artisan migrate once, and done. So you don't have to worry about this status, the important part is the State: Terminated with a Reason: Completed and a zero exit code: your command did whatever and then exited successfully.
If the result of the command is not what you expected, you'd have to investigate the logs from the container that ran this command with kubectl logs your-pod -c app (where app is the name of the container you defined), and/or you would expect the php artisan migrate command to NOT issue a zero exit code.
In my case, I was using Istio and experienced the same issue; removing the Istio sidecar from the job pod solves this problem.
My solution if using Istio:
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"

Why Kubernetes Pod gets into Terminated state giving Completed reason and exit code 0?

I am struggling to find any answer to this in the Kubernetes documentation. The scenario is the following:
Kubernetes version 1.4 over AWS
8 pods running a NodeJS API (Express) deployed as a Kubernetes Deployment
One of the pods gets restarted for no apparent reason late at night (no traffic, no CPU spikes, no memory pressure, no alerts...). The number of restarts increases as a result.
Logs don't show anything abnormal (I ran kubectl logs -p to see the previous logs; no errors at all in there).
Resource consumption is normal, and I cannot see any events about Kubernetes rescheduling the pod onto another node or similar.
Describing the pod gives a TERMINATED state with a COMPLETED reason and exit code 0. I don't have the exact output from kubectl as this pod has been replaced multiple times now.
The pods are NodeJS server instances; they cannot complete, as they are always running and waiting for requests.
Would this be internal Kubernetes rearranging of pods? Is there any way to know when this happens? Shouldn't there be an event somewhere saying why it happened?
Update
This just happened in our prod environment. The result of describing the offending pod is:
api:
  Container ID:   docker://7a117ed92fe36a3d2f904a882eb72c79d7ce66efa1162774ab9f0bcd39558f31
  Image:          1.0.5-RC1
  Image ID:       docker://sha256:XXXX
  Ports:          9080/TCP, 9443/TCP
  State:          Running
    Started:      Mon, 27 Mar 2017 12:30:05 +0100
  Last State:     Terminated
    Reason:       Completed
    Exit Code:    0
    Started:      Fri, 24 Mar 2017 13:32:14 +0000
    Finished:     Mon, 27 Mar 2017 12:29:58 +0100
  Ready:          True
  Restart Count:  1
Update 2
Here is the deployment.yaml file used:
apiVersion: "extensions/v1beta1"
kind: "Deployment"
metadata:
  namespace: "${ENV}"
  name: "${APP}${CANARY}"
  labels:
    component: "${APP}${CANARY}"
spec:
  replicas: ${PODS}
  minReadySeconds: 30
  revisionHistoryLimit: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        component: "${APP}${CANARY}"
    spec:
      serviceAccount: "${APP}"
      ${IMAGE_PULL_SECRETS}
      containers:
        - name: "${APP}${CANARY}"
          securityContext:
            capabilities:
              add:
                - IPC_LOCK
          image: "134078050561.dkr.ecr.eu-west-1.amazonaws.com/${APP}:${TAG}"
          env:
            - name: "KUBERNETES_CA_CERTIFICATE_FILE"
              value: "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
            - name: "NAMESPACE"
              valueFrom:
                fieldRef:
                  fieldPath: "metadata.namespace"
            - name: "ENV"
              value: "${ENV}"
            - name: "PORT"
              value: "${INTERNAL_PORT}"
            - name: "CACHE_POLICY"
              value: "all"
            - name: "SERVICE_ORIGIN"
              value: "${SERVICE_ORIGIN}"
            - name: "DEBUG"
              value: "http,controllers:recommend"
            - name: "APPDYNAMICS"
              value: "true"
            - name: "VERSION"
              value: "${TAG}"
          ports:
            - name: "http"
              containerPort: ${HTTP_INTERNAL_PORT}
              protocol: "TCP"
            - name: "https"
              containerPort: ${HTTPS_INTERNAL_PORT}
              protocol: "TCP"
The Dockerfile of the image referenced in the above Deployment manifest:
FROM ubuntu:14.04
ENV NVM_VERSION v0.31.1
ENV NODE_VERSION v6.2.0
ENV NVM_DIR /home/app/nvm
ENV NODE_PATH $NVM_DIR/v$NODE_VERSION/lib/node_modules
ENV PATH $NVM_DIR/v$NODE_VERSION/bin:$PATH
ENV APP_HOME /home/app
RUN useradd -c "App User" -d $APP_HOME -m app
RUN apt-get update; apt-get install -y curl
USER app
# Install nvm with node and npm
RUN touch $HOME/.bashrc; curl https://raw.githubusercontent.com/creationix/nvm/${NVM_VERSION}/install.sh | bash \
&& /bin/bash -c 'source $NVM_DIR/nvm.sh; nvm install $NODE_VERSION'
ENV NODE_PATH $NVM_DIR/versions/node/$NODE_VERSION/lib/node_modules
ENV PATH $NVM_DIR/versions/node/$NODE_VERSION/bin:$PATH
# Create app directory
WORKDIR /home/app
COPY . /home/app
# Install app dependencies
RUN npm install
EXPOSE 9080 9443
CMD [ "npm", "start" ]
npm start is an alias for a regular node app.js command that starts a NodeJS server on port 9080.
Check the version of docker you run, and whether the docker daemon was restarted during that time.
If the docker daemon was restarted, all the containers would be terminated (unless you use the new "live restore" feature in 1.12). In some docker versions, docker may incorrectly report "exit code 0" for all containers terminated in this situation. See https://github.com/docker/docker/issues/31262 for more details.
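If you want containers to survive a daemon restart, a minimal sketch of enabling live restore (Docker 1.12+); the file path below is the common default and may differ on your distribution:
# /etc/docker/daemon.json
{
  "live-restore": true
}
Reload or restart the Docker daemon afterwards for the setting to take effect.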
If this is still relevant, we just had a similar problem in our cluster.
We managed to find more information by inspecting the logs from Docker itself. SSH onto your k8s node and run the following:
sudo journalctl -fu docker.service
I had a similar problem when we upgraded to version 2.x: pods get restarted even after the DAGs ran successfully.
I later resolved it, after a long time of debugging, by overriding the pod template and specifying it in the airflow.cfg file.
[kubernetes]
....
pod_template_file = {{ .Values.airflow.home }}/pod_template.yaml
---
# pod_template.yaml
apiVersion: v1
kind: Pod
metadata:
  name: dummy-name
spec:
  serviceAccountName: default
  restartPolicy: Never
  containers:
    - name: base
      image: dummy_image
      imagePullPolicy: IfNotPresent
      ports: []
      command: []