Unable to access EFS from ECS Fargate task - amazon-ecs

Trying to launch a Fargate task that uses an EFS Volume.
When starting the task from ECS Console, I'm getting this error :
ResourceInitializationError: failed to invoke EFS utils commands to set up EFS volumes: stderr: Failed to resolve "fs-019a4b2d1774c5586.efs.eu-west-1.amazonaws.com" - check that your file system ID is correct, and ensure that the VPC has an EFS mount target for this file system ID. See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail. Attempting to lookup mount target ip address using botocore. Failed to import necessary dependency botocore, please install botocore first. : unsuccessful EFS utils command execution; code: 1
File system Id is correct. I've mounted the volume from an ec2 instance in the same VPC, all good.
Following steps defined here : https://aws.amazon.com/premiumsupport/knowledge-center/ecs-fargate-mount-efs-containers-tasks/?nc1=h_ls
I cannot figure out where to specify outbound rule for ECS service or task. See image
Thanks in advance.

As #MarkB stated, i've edited the outbound rule and added the port 2049 (NFS) to the EFS security group, and it's workin fine.

Basically the ECS'S security group should allow ssh in the ingress and nfs protocol on the port 2049 to the Securitygroup of the mount target and
Mount target's security group should allow nfs protocol on the 2049 port.

Related

How to fix error with GitLab runner inside Kubernetes cluster - try setting KUBERNETES_MASTER environment variable

I have setup two VMs that I am using throughout my journey of educating myself in CI/CD, GitLab, Kubernetes, Cloud Computing in general and so on. Both VMs have Ubuntu 22.04 Server as a host.
VM1 - MicroK8s Kubernetes cluster
Most of the setup is "default". Since I'm not really that knowledgeable, I have only configured two pods and their respective services - one with PostGIS and the other one with GeoServer. My intent is to add a third pod, which is the deployment of a app that I a have in VM2 and that will communicate with the GeoServer in order to provide a simple map web service (Leaflet + Django). All pods are exposed both within the cluster via internal IPs as well as externally (externalIp).
I have also installed two GitLab-related components here:
GitLab Runner with Kubernetes as executor
GitLab Kubernetes Agent
In VM2 both are visible as connected.
VM2 - GitLab
Here is where GitLab (default installation, latest version) runs. In the configuration (/etc/gitlab/gitlab.rb) I have enabled the agent server.
Initially I had the runner in VM1 configured to have Docker as executor. I had not issues with that. However then I thought it would be nice to try out running the runner inside the cluster so that everything is capsuled (using the internal cluster IPs without further configuration and exposing the VM's operating system).
Both the runner and agent are showing as connected but running a pseudo-CI/CD pipeline (the one provided by GitLab, where you have build, test and deploy stages with each consisting of a simple echo and waiting for a few seconds) returns the following error:
Running with gitlab-runner 15.8.2 (4d1ca121)
on testcluster-k8s-runner Hko2pDKZ, system ID: s_072d6d140cfe
Preparing the "kubernetes" executor
Using Kubernetes namespace: gitlab-runner
ERROR: Preparation failed: getting Kubernetes config: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
Will be retried in 3s ...
Using Kubernetes namespace: gitlab-runner
ERROR: Preparation failed: getting Kubernetes config: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
Will be retried in 3s ...
Using Kubernetes namespace: gitlab-runner
ERROR: Preparation failed: getting Kubernetes config: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
Will be retried in 3s ...
ERROR: Job failed (system failure): getting Kubernetes config: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
I am unable to find any information regarding KUBERNETES_MASTER except in issue tickets (GitLab) and questions (SO and other Q&A platforms). I have no idea what it is, where to set it. My guess would be it belongs in the runner's configuration on VM1 or at least the environment of the gitlab-runner (the user that contains the runner's userspace with its respective /home/gitlab-runner directory).
The only one possible solution I have found so far is to create the .kube directory from the user which uses kubectl (in my case microk8s kubectl since I use MicroK8s) to the home directory of the GitLab runner. I didn't see anything special in this directory (no hidden files) except for a cache subdirectory, hence my decision to simply create it at /home/gitlab-runner/.kube, which didn't change a thing.

Aws ECS Fargate enforce readonlyfilesystem

I need to enforce on ECS Fargate services 'readonlyrootFileSystem' to reduce Security hub vulnerabilities.
I thought it was an easy task by just setting it true in the task definition.
But it backfired as the service does not deploy because the commands in the dockerfile are not executed because they do not have access to folders and also this is incompatible with ssm execute commands, so I won't be able to get inside the container.
I managed to set the readonlyrootFileSystem To true and have my service back on by mounting a volume. To do I mounted a tmp volume that is used by the container to install dependencies at start and a data volume to store data (updates).
So now according to the documentation the security hub vulnerability should be fixed as the rule needs that variable not be False but still security hub is flagging the task as non complaint.
---More update---
the task definition of my service spins also a datadog image for monitoring. That also needs to have its filesystem as readonly to satisfy security hub.
Here I cannot solve as above because datadog agent needs access to /etc/ folder and if I mount a volume there I will lose files and the service wont' start.
is there a way out of this?
Any ideas?
In case someone stumbles into this.
The solution (or workaround, call it as you please), was to set readonlyrootFileSystem True for both container and sidecard (datadog in this case) and use bind mounts.
The rules for monitoring ECS using datadog can be found here
The bind mount that you need to add for your service depend on how you have setup your dockerfile.
in my case it was about adding a volume for downloading data.
Moreover since with readonly FS ECS exec (SSM) does not work, if you want this you also have to add mounts: if added two mounts in /var/lib/amazon and /var/log/amazon. This will allow to have ssm (docker exec basically into your container)
As for datadog, I just needed to fix the mounts so that the agent could work. In my case, since it was again a custom image, I mounted a volume on /etc/datadog-agent.
happy days!

Issue with mounting EFS access point from an AWS ECS Fargate task (Cloudformation related)

I am using AWS cloudformation to provision some resources. Part of it is to create an ECS task definition that will mount an EFS access point. A custom resource is defined in Cloudformation which a lambda function in Python will run the ECS Fargate task. However, when I create a stack from the Cloudformation template to provision all the things, the ECS task failed to mount the EFS through the access point with the following error message:
ResourceInitializationError: failed to invoke EFS utils commands to set up EFS volumes: stderr: Failed to resolve "fs-082b4402fbb9c9972.efs.us-east-1.amazonaws.com" - check that your file system ID is correct, and ensure that the VPC has an EFS mount target for this file system ID. See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail. Attempting to lookup mount target ip address using botocore. Failed to import necessary dependency botocore, please install botocore first. : unsuccessful EFS utils command execution; code: 1
I met similar error before when I created the ECS task in an AZ without a mount target. But it is definitely not the case.
If I run the ECS task manually from the console or run a local python code to run the ECS task, there are no error at all.
Since the cloudformation template are nested templates which create all VPC and other resources together, I am not sure whether the Cloudformation custom resource (Lambda calling ECS task) should have more DependsOn: resources. I have already added the mount targets and acccess point to DependsOn:.
Tried to separate the Cloudformation custom resource into another file so that this part will only be created after all other parts of stack are completed. However, the result is the same.
PS I have added 300 seconds delay to the lambda function which call the ECS task. It works normally afterwards. Then I tried to create the original stack without 300 seconds' delay. The result is also positive. Just wondered what the problem was.

Hashicorp Vault: Is it possible to make edits to pre-existing server configuration file?

I have a Kubernetes cluster which utilizes Vault secrets. I am attempting to modify the conf.hcl that was used to establish Vault. I went into the pod which contains Vault, and appended:
max_lease_ttl = "999h"
default_lease_ttl = "999h"
I did attempt to apply the changes using the only server option available according to the documentation, but failed due to it already being established:
vault server -config conf.hcl
Error initializing listener of type tcp: listen tcp4 0.0.0.0:8200: bind: address already in use
You can't reinitialize in the pod since it's the port is already bound on the containers (Vault is already running there).
You need to restart the pod/deployment with a new config. Not sure how your Vault deployment is configured but the config could be in the container itself, or in some mounted volume or perhaps a ConfigMap.

Unable to connect to the server: dial tcp [::1]:8080: connectex: No connection could be made because the target machine actively refused it

Am working on Azure Kubernates where we can store Docker Images in Azure. Here am trying to check my kubectl version, then am getting
Unable to connect to the server: dial tcp [::1]:8080: connectex: No
connection could be made because the target machine actively refused
it.
For this I followed MSDN:uilding Microservices with AKS and VSTS – Part 2 and MSDOCS:Kubernetes on windows
So, can you please suggest me “How to resolve for this issue?”
I am on windows 10, and for me I did not enable kubernetes on Docker Desktop.
As you can see here, there are no contexts available.
So go to settings of docker desktop and enable it as follows.
Now run a command as follows.
kubectl config get-contexts
Ensure you see something like this.
Also you can also try listing the nodes as follows.
kubectl get nodes
I think you might missed out to configure the cluster, for that you need to run the below command in your command prompt.
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster
The above CLI command creates .config file with complete cluster and nodes details in your local machine.
After that you run kubectl get nodes command in your command prompt, then you can get the list of nodes inside the cluster like in the below image.
For reference follow this Deploy an Azure Kubernetes Service (AKS) cluster.
If you can see that your config file is correctly configured by going to $HOME/.kube/config - Linux or %UserProfile%/.kube/config - Windows but you are still receiving the error message - try running command line as an administrator.
More information on the config file can be found here: https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/
In my case, I was shuffling between az aks k8s cluster and local docker-desktop.
So every time I change the cluster context I need to restart the docker, else I get the same described error.
Unable to connect to the server: dial tcp 127.0.0.1:6443: connectex: No connection could be made because the target machine actively refused it.
PS: make sure your cluster is started as shown in this picture showing (Stop local cluster)
For me it appeared to be due to Windows not having a HOME environment variable set. According to the docs kubectl will use the config file $(HOME)/.kube/config. But since this variable isn't set on Window it can't locate the file.
I created a HOME variable with the same value as USERPROFILE and it started working.
I'm using Hyper-V on Local Windows and I met this error because I didn't configure minikube.
(I know the question is about Azure, not minikube. But this article is on the top for the error message. So, I've put the solution here.)
1. enable Hyper-V.
Type in systeminfo on your Terminal. If you can find the line below,
Hyper-V Requirements: A hypervisor has been detected. Features required for Hyper-V will not be displayed.
Hyper-V works correctly.
If you can't, enable it from settings.
2. Create Hyper-V Network Switch
Open Hyper-V manager. (Searching it is the fastest way.)
Next, click your PC name on the left.
Then, you can find Virtual Switch Manager menu on the right.
Click it and choose External Virtual Switch with name: "Minikube Switch"
Click apply to create it.
3. start minikube
Go back to terminal and type in:
minikube start --vm-driver hyperv --hyperv-virtual-switch "Minikube Switch"
For more information, check the steps in this article.
Check docker is running and you started minikube or whichever cloud kube you using.
my issue resolved after running "minikube start --driver=docker"
Essentially this problem occurs if your minikube or kind isn't configured. Just try to restart your minikube or kind. If that doesn't solve your problem then try to restart your hypervisor which minikube uses.
minikube start
This command solved my issue.
I was facing the same error while firing the command "kubectl get pods"
The issue has been resolved by having following steps below:
a) First find out current-context
kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
b) if no context is set then set it manually by using
kubectl config set-context <Your context>
Hope this will help you.
If you're facing this error on windows, its possible that your docker instance is not running.
These are the steps I followed to replicate the above error;
Stopped docker and then tried to start-up an nginx-deployment. Doing this caused the mentioned error above to happen.
How did I solve it?
Check if minikube is running in my case this was not running
Start minikube
Retry applying your configuration above. In my case see the screenshot below
When you see that your deployment has been created, then all should be fine.
I had exactly the same problem even after having correct config (by running an azure cli command).
It seems that kubectl expects HOME env.variable set but it did not exist for me. There is however a solution:
If you add a KUBECONFIG environmental variable that will point to config it will start working.
Example:
setx KUBECONFIG %UserProfile%\.kube\config
When the variable is present kubectl has no troubles reading from file.
P.S. It is an alternative to setting a HOME variable as suggested in another answer.
Azure self-hosted agent doesn't have the permission to access Kubernates cluster:
Remove Azure self-hosted agent - .\config.cmd Remove
configure again ( .\config.cmd) with a user have permission to access Kubernates cluster
I encountered similar problem:
> kubectl cluster-info
"To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Unable to connect to the server: dial tcp xxx.x.x.x:8080: connectex: No connection could be made because the target machine actively refused it."
> kubectl cluster-info dump
Unable to connect to the server: dial tcp xxx.0.0.x:8080: connectex: No connection could be made because the target machine actively refused it.
This setup was working fine until Docker for Desktop bought it's own copy of kubectl. There are 2 ways to overcome this situation:
1 - Quit / Stop Docker for Desktop while using the cluster
2 - Set KUBECONFIG file path
I tried both the options and they worked.
Found a good source for .kube/config, sending it over here for quick reference:
apiVersion: v1
clusters:
- cluster:
certificate-authority: fake-ca-file
server: https://1.2.3.4
name: development
- cluster:
insecure-skip-tls-verify: true
server: https://5.6.7.8
name: scratch
contexts:
- context:
cluster: development
namespace: frontend
user: developer
name: dev-frontend
- context:
cluster: development
namespace: storage
user: developer
name: dev-storage
- context:
cluster: scratch
namespace: default
user: experimenter
name: exp-scratch
current-context: ""
kind: Config
preferences: {}
users:
- name: developer
user:
client-certificate: fake-cert-file
client-key: fake-key-file
- name: experimenter
user:
password: some-password
username: exp
Reference: https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/
Following #ilya-chernomordik,
I've added my config path to the System Variable by doing
setx KUBECONFIG "D:\Minikube\Minikube.minikube\config"
I have changed the default Location from C: Drive to D: Drive as i have less space in C.
Now the problem is fixed.
edit: after 5 mins, the api server again stopped. It's been more than 5-6 hours i'm trying to solve this issue. I'm not sure why this problem is happening, even after adding the coreect path.
On Rancher Desktop, make sure context is correctly choosen
In my situation, I'm in windows with docker desktop in a simple scenario just for studies, but the case is:
In the docker version in 20.10 or above, it come with kubernetes installed. Then it doesn't necessary installed a cluster adm like minikube. Then, when it just need to enable kubernetes in Docker Desktop configuration. Like:
Go to Docker Desktop: settings > kubernetes > check the box inside section Enable kubernetes and then click in Restart Kubernetes Cluster
When we do this, the docker provide all needed to works Kubernetes properly.
Referenced by: Blog