I am facing the following issue related to specifying a namespace quota:
the namespace quota is not getting created via Helm.
My file namespacequota.yaml is shown below:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: namespacequota
  namespace: {{ .Release.Namespace }}
spec:
  hard:
    requests.cpu: "3"
    requests.memory: 10Gi
    limits.cpu: "6"
    limits.memory: 12Gi
The command used for installation is given below:
helm install privachart3 . -n test-1
However, the ResourceQuota is not getting created:
kubectl get resourcequota -n test-1
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
NAME                  CREATED AT
gke-resource-quotas   2021-01-20T06:14:16Z
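One way to narrow this down is to render the chart locally and check whether the ResourceQuota manifest is produced at all; Helm only renders files placed under the chart's templates/ directory, so that is worth double-checking too. A debugging sketch using the same release name and namespace:
helm template privachart3 . -n test-1 | grep -A 10 "kind: ResourceQuota"
helm install privachart3 . -n test-1 --dry-run --debug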
I can define the ResourceQuota using the kubectl command below:
kubectl apply -f namespacequota.yaml --namespace=test-1
The only change required in the file above is commenting out line 5, i.e. the line containing the {{ .Release.Namespace }} template.
kubectl get resourcequota -n test-1
NAME                  CREATED AT
gke-resource-quotas   2021-01-20T06:14:16Z
namespacequota        2021-01-23T07:30:27Z
However, in this case, when I try to install the chart, the PVC is created but the Pod is not.
Capacity is not an issue, as I am just trying to create a single MariaDB pod via a Deployment.
The command used for the install is given below:
helm install chart3 . -n test-1
The observed output is given below:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
NAME: chart3
LAST DEPLOYED: Sat Jan 23 08:38:50 2021
NAMESPACE: test-1
STATUS: deployed
REVISION: 1
TEST SUITE: None
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
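One way to see why the Pod never appears (a debugging sketch; the exact Deployment and ReplicaSet names depend on the chart) is to check the ReplicaSet and namespace events, which is where a ResourceQuota admission rejection such as "must specify limits.cpu, limits.memory" normally shows up:
kubectl describe replicaset -n test-1
kubectl get events -n test-1 --sort-by=.metadata.creationTimestamp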
I got the answer from another forum.
Once a namespace quota is set, we need to explicitly set the Pod's resource requests and limits.
In my case I just needed to specify the resources under the container image:
- image: wordpress:4.8-apache
  resources:
    requests:
      memory: "64Mi"
      cpu: "250m"
    limits:
      memory: "128Mi"
      cpu: "500m"
After that, I am able to observe the Pods as well:
[george#dis ]$ kubectl get resourcequota -n geo-test
NAME AGE REQUEST LIMIT
gke-resource-quotas 31h count/ingresses.extensions: 0/100, count/ingresses.networking.k8s.io: 0/100, count/jobs.batch: 0/5k, pods: 2/1500, services: 2/500
namespace-quota 7s requests.cpu: 500m/1, requests.memory: 128Mi/1Gi limits.cpu: 1/3, limits.memory: 256Mi/3Gi
[george#dis ]$
[george#dis ]$ kubectl get pod -n geo-test
NAME READY STATUS RESTARTS AGE
wordpress-7687695f98-w7m5b 1/1 Running 0 32s
wordpress-mysql-7ff55f869d-2w6zs 1/1 Running 0 32s
[george#dis ]$
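As an aside, instead of adding requests and limits to every container, a LimitRange in the same namespace can supply defaults that satisfy the ResourceQuota for pods that omit them. A minimal sketch (the name and values are illustrative, not taken from the chart above):
apiVersion: v1
kind: LimitRange
metadata:
  name: namespace-default-limits
  namespace: geo-test
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 250m
      memory: 64Mi
    default:
      cpu: 500m
      memory: 128Mi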
Related
On my Jetson NX, I would like to write a YAML file that mounts 2 cameras into a pod.
The YAML:
containers:
- name: my-pod
  image: my_image:v1.0.0
  imagePullPolicy: Always
  volumeMounts:
  - mountPath: /dev/video0
    name: dev-video0
  - mountPath: /dev/video1
    name: dev-video1
  resources:
    limits:
      nvidia.com/gpu: 1
  ports:
  - containerPort: 9000
  command: ["/bin/bash"]
  args: ["-c", "while true; do echo hello; sleep 10; done"]
  securityContext:
    privileged: true
volumes:
- hostPath:
    path: /dev/video0
    type: ""
  name: dev-video0
- hostPath:
    path: /dev/video1
    type: ""
  name: dev-video1
But when I deploy it as a pod, I get the error:
MountVolume.SetUp failed for volume "default-token-c8hm5" : failed to sync secret cache: timed out waiting for the condition
I tried removing the volumes from the YAML, and then the pod deploys successfully. Any comments on this issue?
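For the mount error, it usually helps to look at the pod's events and the kubelet log first, since "failed to sync secret cache: timed out waiting for the condition" tends to point at the kubelet/API-server side rather than at the hostPath volumes themselves. A debugging sketch (the pod name is a placeholder):
kubectl describe pod <my-pod>
kubectl get events --sort-by=.lastTimestamp
journalctl -u kubelet --since "10 min ago"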
Another issue: when a pod runs into problems, it consumes the rest of the storage on my Jetson NX. I guess Kubernetes may create lots of temporary files or logs when something goes wrong? Is there any solution to this? Otherwise all of my pods get evicted.
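On the storage point, growth on a node usually comes from container logs and writable container layers. If it is logs, the kubelet can cap and rotate them, and eviction thresholds can be tuned; a sketch of the relevant KubeletConfiguration fields (values are illustrative, not tuned for a Jetson NX, and the log limits apply to CRI runtimes):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: 10Mi
containerLogMaxFiles: 3
evictionHard:
  nodefs.available: "10%"
  imagefs.available: "15%"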
I can't figure out where to look. I am running Airflow on GKE. It was running fine but recently started failing. Basically, the DAG starts and then the tasks fail, even though they were running okay a week ago.
It seems like something changed in the cluster, but based on the logs I can't figure out what.
My KubernetesExecutor stopped spawning KubernetesPodOperator pods, and there are no logs or errors.
If I run the template I use for the operator directly (kubectl apply -f), it runs successfully.
Airflow 2.1.2
Kubectl
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.7", GitCommit:"132a687512d7fb058d0f5890f07d4121b3f0a2e2", GitTreeState:"clean", BuildDate:"2021-05-12T12:40:09Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.8-gke.900", GitCommit:"28ab8501be88ea42e897ca8514d7cd0b436253d9", GitTreeState:"clean", BuildDate:"2021-06-30T09:23:36Z", GoVersion:"go1.15.13b5", Compiler:"gc", Platform:"linux/amd64"}
Executor Template
apiVersion: v1
kind: Pod
metadata:
  ...
spec:
  restartPolicy: Never
  serviceAccountName: airflow # this account have rights to create pods
  automountServiceAccountToken: true
  volumes:
  - name: dags
    emptyDir: {}
  - name: logs
    emptyDir: {}
  - configMap:
      name: airflow-git-sync-configmap
    name: airflow-git-sync-configmap
  initContainers:
  - name: git-sync-clone
    securityContext:
      runAsUser: 65533
      runAsGroup: 65533
    image: k8s.gcr.io/git-sync/git-sync:v3.3.1
    imagePullPolicy: Always
    volumeMounts:
    - mountPath: /tmp/git
      name: dags
    resources:
      ...
    args: ["--one-time"]
    envFrom:
    - configMapRef:
        name: airflow-git-sync-configmap
    - secretRef:
        name: airflow-git-sync-secret
  containers:
  - name: base
    image: <artifactory_url>/airflow:latest
    volumeMounts:
    - name: dags
      mountPath: /opt/airflow/dags
    - name: logs
      mountPath: /opt/airflow/logs
    imagePullPolicy: Always
Pod template
apiVersion: v1
kind: Pod
metadata:
  ....
spec:
  serviceAccountName: airflow
  automountServiceAccountToken: true
  volumes:
  - name: sql
    emptyDir: {}
  initContainers:
  - name: git-sync
    image: k8s.gcr.io/git-sync/git-sync:v3.3.1
    imagePullPolicy: Always
    args: ["--one-time"]
    volumeMounts:
    - name: sql
      mountPath: /tmp/git/
    resources:
      requests:
        memory: 300Mi
        cpu: 500m
      limits:
        memory: 600Mi
        cpu: 1000m
    envFrom:
    - configMapRef:
        name: git-sync-configmap
    - secretRef:
        name: airflow-git-sync-secret
  containers:
  - name: base
    imagePullPolicy: Always
    image: <artifactory_url>/clickhouse-client-gcloud:20.6.4.44
    volumeMounts:
    - name: sql
      mountPath: /opt/sql
    resources:
      ....
    env:
    - name: GS_SERVICE_ACCOUNT
      valueFrom:
        secretKeyRef:
          name: gs-service-account
          key: service_account.json
    - name: DB_CREDENTIALS
      valueFrom:
        secretKeyRef:
          name: estimation-db-secret
          key: db_cred.json
DAG code
from textwrap import dedent

from airflow import DAG
from airflow.utils.dates import days_ago
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

TEMPLATE_PATH = "/opt/airflow/dags/airflow-dags.git/pod_templates"

args = {
    ...
}


def create_pipeline(dag_: DAG):
    task_startup_client = KubernetesPodOperator(
        name="clickhouse-client",
        task_id="clickhouse-client",
        labels={"application": "clickhouse-client-gsutil"},
        pod_template_file=f"{TEMPLATE_PATH}/template.yaml",
        cmds=["sleep", "60000"],
        reattach_on_restart=True,
        is_delete_operator_pod=False,
        get_logs=True,
        log_events_on_failure=True,
        dag=dag_,
    )
    task_startup_client


with DAG(
    dag_id="MANUAL-GKE-clickhouse-client",
    default_args=args,
    schedule_interval=None,
    max_active_runs=1,
    start_date=days_ago(2),
    tags=["utility"],
) as dag:
    create_pipeline(dag)
I ran Airflow with DEBUG logging and there is nothing unusual, just a successful completion:
Scheduler log
...
Event: manualgkeclickhouseclientaticlickhouseclient.9959fa1fd13a4b6fbdaf40549a09d2f9 Succeeded
...
Executor logs
[2021-08-15 18:40:27,045] {settings.py:208} DEBUG - Setting up DB connection pool (PID 1)
[2021-08-15 18:40:27,046] {settings.py:276} DEBUG - settings.prepare_engine_args(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=1
[2021-08-15 18:40:27,095] {cli_action_loggers.py:40} DEBUG - Adding <function default_action_log at 0x7f0556c5e280> to pre execution callback
[2021-08-15 18:40:28,070] {cli_action_loggers.py:66} DEBUG - Calling callbacks: [<function default_action_log at 0x7f0556c5e280>]
[2021-08-15 18:40:28,106] {settings.py:208} DEBUG - Setting up DB connection pool (PID 1)
[2021-08-15 18:40:28,107] {settings.py:244} DEBUG - settings.prepare_engine_args(): Using NullPool
[2021-08-15 18:40:28,109] {dagbag.py:496} INFO - Filling up the DagBag from /opt/airflow/dags/ati-airflow-dags.git/dag_clickhouse-client.py
[2021-08-15 18:40:28,110] {dagbag.py:311} DEBUG - Importing /opt/airflow/dags/ati-airflow-dags.git/dag_clickhouse-client.py
/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/backcompat/backwards_compat_converters.py:26 DeprecationWarning: This module is deprecated. Please use `kubernetes.client.models.V1Volume`.
/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/backcompat/backwards_compat_converters.py:27 DeprecationWarning: This module is deprecated. Please use `kubernetes.client.models.V1VolumeMount`.
[2021-08-15 18:40:28,135] {dagbag.py:461} DEBUG - Loaded DAG <DAG: MANUAL-GKE-clickhouse-client>
[2021-08-15 18:40:28,176] {plugins_manager.py:281} DEBUG - Loading plugins
[2021-08-15 18:40:28,176] {plugins_manager.py:225} DEBUG - Loading plugins from directory: /opt/airflow/plugins
[2021-08-15 18:40:28,177] {plugins_manager.py:205} DEBUG - Loading plugins from entrypoints
[2021-08-15 18:40:28,238] {plugins_manager.py:418} DEBUG - Integrate DAG plugins
Running <TaskInstance: MANUAL-GKE-clickhouse-client.clickhouse-client 2021-08-15T18:39:38.150950+00:00 [queued]> on host manualgkeclickhouseclientclickhouseclient.9959fa1fd13a4b6fbd
[2021-08-15 18:40:28,670] {cli_action_loggers.py:84} DEBUG - Calling callbacks: []
[2021-08-15 18:40:28,670] {settings.py:302} DEBUG - Disposing DB connection pool (PID 1)
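Since the executor log ends with a clean shutdown but no operator pod ever shows up, it can help to watch the namespace while a run is triggered, check the namespace events for failed pod creations, and confirm the service account can still create pods. A debugging sketch (<airflow-namespace> is a placeholder for the namespace the workers run in; the "airflow" service account comes from the templates above):
kubectl get pods -n <airflow-namespace> --watch
kubectl get events -n <airflow-namespace> --sort-by=.metadata.creationTimestamp
kubectl auth can-i create pods -n <airflow-namespace> --as=system:serviceaccount:<airflow-namespace>:airflow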
Versions
Airflow: 1.10.7
Kubernetes: 1.14.9
Setup
Airflow is configured to use the Kubernetes Executor; normal operations work just fine.
DAGs and logs are accessed via EFS volumes defined with PersistentVolume and PersistentVolumeClaim specs.
I have the following k8s spec, which I want to run backfill jobs with:
apiVersion: v1
kind: Pod
metadata:
  name: backfill-test
  namespace: airflow
spec:
  serviceAccountName: airflow-service-account
  volumes:
  - name: airflow-dags
    persistentVolumeClaim:
      claimName: airflow-dags
  - name: airflow-logs
    persistentVolumeClaim:
      claimName: airflow-logs
  containers:
  - name: somename
    image: myimage
    volumeMounts:
    - name: airflow-dags
      mountPath: /usr/local/airflow/dags
      readOnly: true
    - name: airflow-logs
      mountPath: /usr/local/airflow/logs
      readOnly: false
    env:
    - name: AIRFLOW__CORE__EXECUTOR
      value: KubernetesExecutor
    - name: AIRFLOW__KUBERNETES__NAMESPACE
      value: airflow
    - name: AIRFLOW__CORE__DAGS_FOLDER
      value: dags
    - name: AIRFLOW__CORE__BASE_LOG_FOLDER
      value: logs
    # - name: AIRFLOW__KUBERNETES__DAGS_VOLUME_MOUNT_POINT
    #   value: /usr/local/airflow/dags
    - name: AIRFLOW__KUBERNETES__DAGS_VOLUME_SUBPATH
      value: dags
    - name: AIRFLOW__KUBERNETES__LOGS_VOLUME_SUBPATH
      value: logs
    - name: AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM
      value: airflow-dags
    - name: AIRFLOW__KUBERNETES__LOGS_VOLUME_CLAIM
      value: airflow-logs
    - name: AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY
      value: someimage_uri
    - name: AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG
      value: latest
    - name: AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME
      value: airflow-service-account
    - name: AIRFLOW_HOME
      value: usr/local/airflow
    # command: ["sleep", "1h"]
    command: ["airflow", "backfill",
              "my_dag",
              # # "--subdir", ".",
              # "--local",
              "--task_regex", "my_task_task",
              "--start_date", "2020-07-01T00:00:00",
              "--end_date", "2020-08-01T00:00:00"]
  restartPolicy: Never
Problem
The issue with this seems to be a pathing problem when the task is added to the queue.
When running the initial command, the CLI finds the DAG and associated task:
airflow#backfill-test:~$ airflow run my_dag my_task 2020-07-01T01:15:00+00:00 --local --raw --force
[2020-08-27 23:14:42,038] {__init__.py:51} INFO - Using executor KubernetesExecutor
[2020-08-27 23:14:42,040] {dagbag.py:403} INFO - Filling up the DagBag from /usr/local/airflow/dags
Running %s on host %s <TaskInstance: my_dag.my_task 2020-07-01T01:15:00+00:00 [failed]> backfill-test
However, the task gets added to the queue with a strange path. The log of the actual task execution attempt is below.
[2020-08-27 23:14:43,019] {taskinstance.py:867} INFO - Starting attempt 3 of 2
[2020-08-27 23:14:43,019] {taskinstance.py:868} INFO -
--------------------------------------------------------------------------------
[2020-08-27 23:14:43,043] {taskinstance.py:887} INFO - Executing <Task(PostgresOperator): my_task> on 2020-07-01T01:15:00+00:00
[2020-08-27 23:14:43,046] {standard_task_runner.py:52} INFO - Started process 191 to run task
[2020-08-27 23:14:43,085] {logging_mixin.py:112} INFO - [2020-08-27 23:14:43,085] {dagbag.py:403} INFO - Filling up the DagBag from /usr/local/airflow/dags/usr/local/airflow/my_dag.py
[2020-08-27 23:14:53,006] {logging_mixin.py:112} INFO - [2020-08-27 23:14:53,006] {local_task_job.py:103} INFO - Task exited with return code 1
Adding --subdir to the initial command doesn't actually get propagated to the task queue, and results in the same log output.
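One detail that may explain the doubled path (an observation from the log above, not a verified fix): the worker fills the DagBag from /usr/local/airflow/dags/usr/local/airflow/my_dag.py, which looks like the relative AIRFLOW__CORE__DAGS_FOLDER value being joined with a dag file path derived from the relative AIRFLOW_HOME (note it is missing its leading slash). A sketch of the same env entries with absolute paths:
- name: AIRFLOW__CORE__DAGS_FOLDER
  value: /usr/local/airflow/dags
- name: AIRFLOW__CORE__BASE_LOG_FOLDER
  value: /usr/local/airflow/logs
- name: AIRFLOW_HOME
  value: /usr/local/airflow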
I need to enable a few feature gates on my bare-metal K8s cluster (v1.13). I've tried using the kubelet --config flag to enable them, as kubelet --feature-gates <feature-gate> throws an error saying that it has been deprecated.
I've created a .yml file with the following configuration:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
feature-gates:
VolumeSnapshotDataSource=true
and after running "kubelet --config" with that file, I got the following error:
I0119 21:59:52.987945 29087 server.go:417] Version: v1.14.2
I0119 21:59:52.988165 29087 plugins.go:103] No cloud provider specified.
W0119 21:59:52.988188 29087 server.go:556] standalone mode, no API client
F0119 21:59:52.988203 29087 server.go:265] failed to run Kubelet: no client provided, cannot use webhook authentication
Does anyone know what could be happening and how to fix this problem?
You don't apply --feature-gates to the kubelet; you apply it to the API server. Depending on how you installed Kubernetes on bare metal, you either need to stop the API server, edit the command you start it with, and add the following parameter:
--feature-gates=VolumeSnapshotDataSource=true
Or, if it runs in a pod, find the manifest, edit it, and re-deploy it (redeployment should happen automatically once you finish editing). It should look like this:
...
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=10.132.0.48
    - --allow-privileged=true
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --enable-admission-plugins=NodeRestriction
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --insecure-port=0
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-cluster-ip-range=10.96.0.0/12
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    - --feature-gates=VolumeSnapshotDataSource=true
    image: k8s.gcr.io/kube-apiserver:v1.16.4
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 10.132.0.48
        path: /healthz
        port: 6443
        scheme: HTTPS
...
VolumeSnapshotDataSource is enabled by default (beta) from Kubernetes 1.17 onward; it needs to be enabled explicitly in the API server if the Kubernetes version is lower than 1.17.
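As a side note, if a feature gate ever does need to be set on the kubelet through --config, the KubeletConfiguration field is featureGates, a map of gate name to boolean, rather than feature-gates with key=value strings. A sketch of the corrected file (whether the kubelet actually needs this particular gate is a separate question):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  VolumeSnapshotDataSource: true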
I am working on a stress test for a new scheduler in Kubernetes. I need to launch a lot of CPU- and memory-bound pods to analyze performance.
I am using image: polinux/stress in my pods.
I would like to ask whether there is a way, when I write the YAML file, to have a successfully created pod delete itself after a time that I set.
The following YAML file is the pod I am writing for stress testing. Can it be extended so that the pod removes itself after a period of time?
apiVersion: v1
kind: Pod
metadata:
  name: alltest12
  namespace: test
spec:
  containers:
  - name: alltest
    image: polinux/stress
    resources:
      requests:
        memory: "1000Mi"
        cpu: "1"
      limits:
        memory: "1000Mi"
        cpu: "1"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "500M", "--vm-hang", "1"]
If polinux/stress contains a shell, I believe you can have the thing kill itself:
containers:
- image: polinux/stress
  command:
  - sh
  - -c
  - |
    sh -c "sleep 300; kill -9 1" &
    stress --vm 1 --vm-bytes 500M --vm-hang 1
Or even slightly opposite:
- |
  stress --vm etc etc &
  child_pid=$!
  sleep 300
  kill -9 $child_pid
And you can parameterize that setup using env:
env:
- name: LIVE_SECONDS
  value: "300"
command:
- sh
- -c
- |
  stress --vm 1 --vm-bytes 500M --vm-hang 1 &
  child_pid=$!
  sleep ${LIVE_SECONDS}
  kill -9 $child_pid
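Worth noting as an alternative (assuming a pod that ends in the Failed phase with reason DeadlineExceeded is acceptable for the stress test): the pod spec itself has activeDeadlineSeconds, which makes the kubelet terminate the pod after the given number of seconds without any shell tricks. The pod object is not deleted, only stopped, so it still has to be cleaned up separately. A minimal sketch reusing the pod from the question:
apiVersion: v1
kind: Pod
metadata:
  name: alltest12
  namespace: test
spec:
  activeDeadlineSeconds: 300
  containers:
  - name: alltest
    image: polinux/stress
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "500M", "--vm-hang", "1"]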