I'm trying to set up KubeEdge with cloudcore in EKS (k8s version 1.21.12) and edgecore on an external server. As part of the KubeEdge setup, I had to manually create a node object on the cloud side, which will be labelled as the edge node.
But when I run kubectl apply -f node.json, I get the following response:
C:\Users\akhi1\manifest>kubectl apply -f node.json
node/edge-node-01 created
C:\Users\akhi1\manifest>kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-xx-xx-xxx-213.ap-southeast-1.compute.internal Ready <none> 3h48m v1.21.12-eks-xxxx << this node was already in my eks cluster
As you can see, I'm not able to see the newly created node 'edge-node-01' in the list.
On checking the Kubernetes events, I got the following:
C:\Users\akhi1\manifests>kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
13m Normal DeletingNode node/edge-node-01 Deleting node edge-node-01 because it does not exist in the cloud provider
For manual node registration, I followed this doc:
https://kubernetes.io/docs/concepts/architecture/nodes/#manual-node-administration
My node.json would look like this:
{
  "kind": "Node",
  "apiVersion": "v1",
  "metadata": {
    "name": "edge-node-01",
    "labels": {
      "name": "my-first-k8s-edge-node"
    }
  }
}
I have also checked the NodeRestriction admission controller but couldn't find anything related to this.
Please let me know why EKS is blocking me from creating a node object that doesn't have an underlying EC2 instance attached.
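For reference, a rough way to watch what happens to the node right after it is created (a sketch; it assumes kubectl access to the same EKS cluster):
# Re-apply the manifest and watch the node object and its events
kubectl apply -f node.json
kubectl get node edge-node-01 -o yaml
kubectl get events --field-selector involvedObject.name=edge-node-01 --watch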
Thanks in advance,
Akhil
Related
I have deployed a Pod using the Kubernetes REST API POST /api/v1/namespaces/{namespace}/pods.
The request body has the Pod spec with volumes, something like below:
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "test",
    "namespace": "default"
  },
  "spec": {
    "volumes": [
      {
        "name": "test-secrets",
        "secret": {
          "secretName": "test-secret-one"
        }
      }
    ],
    "containers": [
      <<container json>>.........
    ]
  }
}
Now I want to change the secret name from test-secret-one to test-secret-two for the Pod.
How can I achieve this? And which REST API do I need to use?
Patch REST API - I can use it to change the container image, but it can't be used for volumes. If it can be used, can you give me an example or a reference?
Is there any Kubernetes REST API to restart the Pod? Note that we are not using a Deployment object model; it is deployed directly as a Pod, not as a Deployment.
Can anyone help here?
I'm posting the answer as Community Wiki as the solution came from @Matt in the comments.
Volumes aren't updatable fields, you will need to recreate the pod
with the new spec.
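As an illustration, recreating the Pod through the same REST API could look roughly like this (the API server address, token, and the file holding the updated spec are placeholders):
# Delete the existing Pod, then create it again from a spec that references test-secret-two
curl -k -X DELETE -H "Authorization: Bearer $TOKEN" \
  "https://$APISERVER/api/v1/namespaces/default/pods/test"
curl -k -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  --data @pod-with-new-secret.json \
  "https://$APISERVER/api/v1/namespaces/default/pods"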
The answer to most of your questions is use a deployment and patch it.
The deployment will manage the updates and restarts for you.
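For example, if the workload were managed by a Deployment (say, one also named test), a JSON patch on the pod template could swap the secret and trigger a rolling restart; a sketch:
# Replace the secretName of the first volume in the pod template (Deployment name is illustrative)
kubectl patch deployment test --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/volumes/0/secret/secretName","value":"test-secret-two"}]'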
A different approach is also possible and was suggested by @Kitt:
If you only update the content of Secrets and ConfigMaps instead of
renaming them, the mounted volumes will be refreshed by the kubelet within
the duration --sync-frequency (1m by default).
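A minimal sketch of that approach, regenerating the existing Secret in place so the mounted volume gets refreshed (the key and value here are illustrative):
# Rebuild the Secret with the new data and apply it over the existing object
kubectl create secret generic test-secret-one --from-literal=password=new-value \
  --dry-run=client -o yaml | kubectl apply -f -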
I created a cluster on Google Kubernetes Engine. The nodes get deleted/created very often (at least once a day). Even though new instances are created to replace them, and pods are moved to these new nodes, I would like to understand why the nodes disappear.
I checked the settings used to create the cluster and the node pool:
"Automatic node upgrade" is Disabled on the node pool.
"Pre-emptible nodes" is Disabled.
"Automatic node repair" is Enabled, but I doesn't look like there was a node repair, since I don't see anything in gcloud container operations list at the time when my nodes were deleted.
I can see that the current nodes were all (re-)created at 21:00, while the cluster was created at 08:35:
➜ ~ gcloud container clusters describe my-cluster --format=json
{
  "createTime": "2019-04-11T08:35:39+00:00",
  ...
  "nodePools": [
    {
      ...
      "management": {
        "autoRepair": true
      },
      "name": "default-pool",
    }
  ],
  "status": "RUNNING",
  ...
}
How can I trace the reason why the nodes were deleted?
I tried to reproduce your problem by creating a cluster, manually stopping the kubelet on a node (by running systemctl stop kubelet) to trigger repair and watching the node recover. In my case, I do see an operation for the auto node repair, but I can also see in the GCE operations log that the VM was deleted and recreated (by the GKE robot account).
If you run gcloud compute operations list (or check the cloud console page for operations) you should see what caused the VM to be deleted and recreated.
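For example, something along these lines (standard gcloud list flags; the limits are arbitrary):
# Recent GCE operations on the project, newest first; look for delete/insert on the node VMs
gcloud compute operations list --sort-by=~insertTime --limit=30
# GKE-level operations (e.g. auto node repair or upgrades) show up here
gcloud container operations list --sort-by=~startTime --limit=30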
This just happened to me on Sunday 13/10/2019.
All data from the stateful partition was also gone.
I've created clusters using the kops command. For each cluster, I have to create a hosted zone and add its name servers to the DNS provider. To create a hosted zone, I've created a sub-domain of the hosted zone in AWS (example.com) by using the following command:
ID=$(uuidgen) && aws route53 create-hosted-zone --name subdomain1.example.com --caller-reference $ID | jq .DelegationSet.NameServers
The name servers I get by executing the above command are included in a newly created file, subdomain1.json, with the following content:
{
  "Comment": "Create a subdomain NS record in the parent domain",
  "Changes": [
    {
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "subdomain1.example.com",
        "Type": "NS",
        "TTL": 300,
        "ResourceRecords": [
          {
            "Value": "ns-1.awsdns-1.co.uk"
          },
          {
            "Value": "ns-2.awsdns-2.org"
          },
          {
            "Value": "ns-3.awsdns-3.com"
          },
          {
            "Value": "ns-4.awsdns-4.net"
          }
        ]
      }
    }
  ]
}
To get the parent-zone-id, I've used the following command:
aws route53 list-hosted-zones | jq '.HostedZones[] | select(.Name=="example.com.") | .Id'
To apply the subdomain NS records to the parent hosted zone:
aws route53 change-resource-record-sets --hosted-zone-id <parent-zone-id> --change-batch file://subdomain1.json
Then I created a cluster using the kops command:
kops create cluster --name=subdomain1.example.com --master-count=1 --master-zones ap-southeast-1a --node-count=1 --zones=ap-southeast-1a --authorization=rbac --state=s3://example.com --kubernetes-version=1.11.0 --yes
I'm able to create a cluster, validate it and get its nodes. By using the same procedure, I created one more cluster (subdomain2.example.com).
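For completeness, the validation step can be sketched roughly as follows (cluster name and state bucket taken from the command above):
# Validate the cluster and list its nodes
kops validate cluster --name=subdomain1.example.com --state=s3://example.com
kubectl get nodes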
I've set aliases for the two clusters using these commands:
kubectl config set-context subdomain1 --cluster=subdomain1.example.com --user=subdomain1.example.com
kubectl config set-context subdomain2 --cluster=subdomain2.example.com --user=subdomain2.example.com
To set up federation between these two clusters, I've used these commands:
kubectl config use-context subdomain1
kubectl create clusterrolebinding admin-to-cluster-admin-binding --clusterrole=cluster-admin --user=admin
kubefed init interstellar --host-cluster-context=subdomain1 --dns-provider=aws-route53 --dns-zone-name=example.com
The output of the kubefed init command should indicate that the federation control plane is up. But for me it just keeps showing "waiting for the federation control plane to come up...", and it never comes up. What might be the error?
I've followed the following tutorial to create 2 clusters.
https://gist.github.com/arun-gupta/02f534c7720c8e9c9a875681b430441a
There was a problem with the default image used for the federation API server and controller manager binaries. By default, the image mentioned below is used by the kubefed init command:
"gcr.io/k8s-jkns-e2e-gce-federation/fcp-amd64:v0.0.0-master_$Format:%h$".
But this image is old and no longer available; the federation control plane tries to pull it but fails. This is the error I was getting.
To rectify it, build an fcp image of your own, push it to a repository, and use that image in the kubefed init command. Below are the instructions to be executed (run all of these commands from "$GOPATH/src/k8s.io/kubernetes/federation"):
To create the fcp image and push it to a repository:
docker load -i _output/release-images/amd64/fcp-amd64.tar
docker tag gcr.io/google_containers/fcp-amd64:v1.9.0-alpha.2.60_430416309f9e58-dirty REGISTRY/REPO/IMAGENAME[:TAG]
docker push REGISTRY/REPO/IMAGENAME[:TAG]
Now create a federation control plane with the following command:
_output/dockerized/bin/linux/amd64/kubefed init myfed --host-cluster-context=HOST_CLUSTER_CONTEXT --image=REGISTRY/REPO/IMAGENAME[:TAG] --dns-provider="PROVIDER" --dns-zone-name="YOUR_ZONE" --dns-provider-config=/path/to/provider.conf
I am getting the following error while accessing the app deployed on Azure Kubernetes Service:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}
I have followed all the steps given here: https://learn.microsoft.com/en-us/azure/aks/tutorial-kubernetes-prepare-app
I know that this is something to do with authentication and RBAC, but I don't know what exactly is wrong and where I should make changes.
Just follow the steps in the link you posted and you will be able to finish it successfully. The purpose of each step is below:
Create the image and make sure it works without any error.
Create an Azure Container Registry and push the image into the registry.
Create a Service Principal for AKS so that it can pull the image from the registry.
Change the yaml file so it pulls the image from the Azure registry, then create the pods on the AKS nodes.
You just need these four steps to run the application on AKS. Then get the IP address through the command kubectl get service azure-vote-front --watch, as in step 4. If you cannot access the application, check your steps carefully again.
Also, you can check the status of all the pods through the command kubectl describe pods, or a single pod with kubectl describe pod podName.
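A minimal sketch of those checks (the service name follows the tutorial; the pod name is illustrative):
# Wait for an EXTERNAL-IP to be assigned, then browse to that IP
kubectl get service azure-vote-front --watch
# Inspect pod status and recent events
kubectl get pods
kubectl describe pod <azure-vote-front-pod-name>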
Update
I tested with the image you provided, and the result is here:
You can get the service information to know which port you should use to browse.
Using Kubernetes, I deploy an app to Google Cloud Container Engine on a cluster with 3 small instances.
On a first-time deploy, all goes well using:
kubectl create -f deployment.yaml
And:
kubectl create -f service.yaml
Then I change the image in my deployment.yaml and update it like so:
kubectl apply -f deployment.yaml
After the update, a couple of things happen:
Kubernetes updates its Pods correctly, ending up with 3 updated instances.
Shortly after this, another ReplicaSet is created (?)
Also, double the number of Pods (2 * 3 = 6) is suddenly present, where half of them have a status of Running and the other half Unknown.
So I inspected my Pods and came across this error:
FailedSync Error syncing pod, skipping: network is not ready: [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
Also I can't use the dashboard anymore using kubectl proxy. The page shows:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "no endpoints available for service \"kubernetes-dashboard\"",
  "reason": "ServiceUnavailable",
  "code": 503
}
So I decided to delete all pods forcefully:
kubectl delete pod <pod-name> --grace-period=0 --force
Then, three Pods are triggered for creation, since this is defined in my service.yaml. But upon inspecting my Pods using kubectl describe pods/<pod-name>, I see:
no nodes available to schedule pods
I have no idea where this all went wrong. In essence, all I did was update the image of a deployment.
Any ideas?
I've run into similar issues on Kubernetes. According to your reply to my question on your post (see above):
I noticed that this happens only when I deploy to a micro instance on Google Cloud, which simply has insufficient resources to handle the deployment. Scaling up the initial resources (CPU, Memory) resolved my issue
It seems to me like what's happening here is that the OOM killer from the Linux kernel ends up killing the kubelet, which in turn makes the Node useless to the cluster (its status becomes "Unknown").
A real solution to this problem (to prevent an entire node from dropping out of service) is to add resource limits. Make sure you're not just adding requests; add limits because you want your services -- rather than K8s system services -- to be killed so that they can be rescheduled appropriately (if possible).
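As an illustration (the deployment name and the values are placeholders), requests and limits can be set on an existing Deployment like this:
# Set both requests and limits so app containers, not node components, are killed first under memory pressure
kubectl set resources deployment/my-app \
  --requests=cpu=100m,memory=128Mi \
  --limits=cpu=500m,memory=512Mi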
Also, inside of the cluster settings (specifically in the Node Pool; select it from https://console.cloud.google.com/kubernetes/list), there is a box you can check for "Automatic Node Repair" that would at least partially remediate this problem rather than giving you an undefined amount of downtime.
If your intention is just to update the image, try to use kubectl set image instead. That at least works for me.
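For example (the deployment, container, and image names here are placeholders):
# Update only the container image on the existing Deployment; this triggers a rolling update
kubectl set image deployment/my-deployment my-container=gcr.io/my-project/my-app:v2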
Googling kubectl apply turns up a lot of known issues. See this issue, for example, or this one.
You did not post which version of Kubernetes you deployed, but if you can, try to upgrade your cluster to the latest version to see if the issue still persists.