Google Cloud Composer failed after restart - kubernetes

I have Google Cloud Composer running in two GCP projects. I updated a Composer environment variable in both. One environment restarted fine within a few minutes. The other one has a problem and shows the error below, as shown in the images.
Update operation failed. Couldn't start composer-agent, a GKE job that updates kubernetes resources. Please check if your GKE cluster exists and is healthy.
This is the error I see when I open the Composer environment
This is the environment overview
GKE cluster notification
GKE pods overview
I have been trying to find out how to resolve the problem, but I haven't found any satisfactory answers. My colleagues suspect a firewall or org policy issue, but I haven't changed any of those.
Can someone tell me what caused this problem, given that Cloud Composer is managed by Google, and how to resolve it now?

Since Cloud Composer is a managed resource, if the GKE cluster that serves your Composer environment is unhealthy, you should try contacting Google Cloud Support. That GKE cluster should just work, and you should not even need to know it exists.
Also check whether you have reached any limits or quotas in your project.
If nothing else helps, recreating the Cloud Composer environment is always an option.

Related

Cloud Run for Anthos is not available under deployment in GCP

I tried to add a cluster to Cloud Run using Anthos. In all the tutorials and forums I see the "Cloud Run (fully managed)" and "Cloud Run for Anthos" options, but when I tried it I did not see these options under deployment.
I even tried to add the cluster from the "Cloud Run for Anthos" option. It throws the error below:
"Cloud Run for Anthos is no longer available as a GKE add-on, and is now installed using Anthos fleets: https://cloud.google.com/anthos/run/docs/install"
The add-on itself is not getting enabled. Even though I enabled the "Cloud Run API", I still face the same issue.
Does Anthos not get enabled in the trial version, or what am I missing here?
Please help me resolve this issue. I have attached a screenshot for reference.
Cloud Run (fully managed) and Cloud Run for Anthos are two different products, even though they share a name.
Cloud Run for Anthos is basically Knative (an older version), whereas Cloud Run (fully managed) is a newer technology developed by Google; from what I understand, its backend is not Kubernetes.
If you want to use Cloud Run for Anthos, you should create your cluster from the Anthos interface, not from the GKE interface, and enable Cloud Run for Anthos there.
I recommend using Knative instead, because you get all the new features (node selectors, ...) which are not included in Cloud Run for Anthos (there is no roadmap information or release dates).
https://knative.dev/docs/install/

Anthos Config Management, Config sync not installed

I want to install Config Sync via the Anthos console UI, so that a repository is watched and delivers configuration to my Kubernetes clusters.
While I don't get any error messages, the cluster never leaves the "In progress" status, although nothing is installed. These are EKS clusters.
Config Management dashboard
I've also used nomos to check for the installation and it shows the same "NOT INSTALLED" status.
The config management operator was installed following this guide (and checked with a bunch of other guides for consistency):
https://cloud.google.com/anthos-config-management/docs/how-to/installing-kubectl#deploying
The interesting part is that if I deploy a ConfigManagement object with my Git repo configuration using kubectl, everything works, except that I see this odd configuration in the console, which doesn't help much:
anthos-console
Any pointer to where I can troubleshoot this further will be much appreciated.

Is KubeFlow still supported on GCP?

I am trying to use KubeFlow on GCP and I am following this codelab, but "click-to-deploy" is no longer supported, so I followed the "kubectl and kpt" documentation instead. However, I keep getting the error "You cannot perform this action because the Cloud SDK component manager is disabled for this installation." and none of the solutions I found worked. Two friends told me they have been trying to make KubeFlow work since last year and it never worked, but I still see people posting questions about KubeFlow on Stack Overflow. So I want to ask whether it still works and, if so, where I can find a decent guide to follow.
Thanks!
I finally got it working. For that error message, it turned out that I just didn't install the Cloud SDK properly. There will be a lot of other issues too down the road, but at least the KubeFlow web UI is working for me now.
Yes. As the "kubectl and kpt" documentation says, the first step in preparing to install the cluster is installing gcloud, the CLI that manages authentication, local configuration, developer workflow, and interactions with Google Cloud APIs.
Without it you simply can't work with objects (in your case you need to enable kpt, anthoscli, and beta) or perform tasks like
creating a Compute Engine VM instance, managing a Google Kubernetes
Engine cluster, and deploying an App Engine application, either from
the command line or in scripts and other automations.
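Since the root cause here was a Cloud SDK that was not installed properly, a quick sanity check before following the guide is to confirm the command-line tools are actually on your PATH. This is only a minimal sketch: the tool list (gcloud, kubectl, kpt) is my assumption based on the "kubectl and kpt" path mentioned above, and `find_missing_tools` is a hypothetical helper name, not part of any SDK.

```python
import shutil

# Assumed tool list for the "kubectl and kpt" deployment path; adjust as needed.
REQUIRED_TOOLS = ["gcloud", "kubectl", "kpt"]


def find_missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

This only tells you whether the executables can be found; it does not check versions or whether the component manager is enabled for your installation.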

creating a proper kubeconfig file for a 2 node gentoo linux kubernetes cluster

I have two servers at home running Gentoo Linux ~amd64. I would like to install Kubernetes on them to play with it a bit.
Gentoo now packages all the Kubernetes-related dependencies under one package called sys-cluster/kubernetes, and the latest version available at the moment is 1.18.3.
The last time I played with Kubernetes was several years ago, and I think I have forgotten everything.
So I installed Kubernetes on both servers. Since I use systemd and the package contains only a kubelet systemd service, I also created systemd unit files for kube-apiserver, kube-controller-manager, kube-proxy and kube-scheduler.
This package also comes with kubeadm, but I would like to know how to install and configure Kubernetes manually.
Now I want to create a kubeconfig file for my cluster configuration.
I googled and found the following url: http://docs.shippable.com/deploy/tutorial/create-kubeconfig-for-self-hosted-kubernetes-cluster/
The first step there is "Make sure you can access the cluster", but I thought I needed to create the kubeconfig precisely so that the services know how to access my cluster!
That site also talks about secrets that are already configured, which in my case they aren't. I'm starting from scratch, so this is probably not the way to go.
In general I want to know how to properly create a kubeconfig file for my setup; then I'll configure the services to use this kubeconfig file and go on from there.
Any information regarding this issue would be greatly appreciated.
I also asked this in the Kubernetes Slack channel, and they pointed me to this project: https://github.com/kelseyhightower/kubernetes-the-hard-way
It is a documentation project on how to configure Kubernetes the hard way. The documentation sets everything up in Google Cloud, but it's easy to understand what they did there and how to configure the same thing on your own network.
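To give an idea of what a kubeconfig file actually contains, here is a minimal sketch that assembles one for certificate-based authentication. Everything passed in at the bottom (cluster name, server URL, certificate paths, user name) is a made-up placeholder for illustration, and `build_kubeconfig` is a hypothetical helper, not something from the guide above. It prints JSON, which should work as a kubeconfig because kubeconfig files are YAML and JSON is valid YAML.

```python
import json


def build_kubeconfig(cluster_name, server_url, ca_path,
                     user_name, client_cert_path, client_key_path):
    """Assemble a kubeconfig structure with one cluster, one user, one context."""
    context_name = f"{user_name}@{cluster_name}"
    return {
        "apiVersion": "v1",
        "kind": "Config",
        # Where the API server lives and which CA signed its certificate.
        "clusters": [{
            "name": cluster_name,
            "cluster": {
                "server": server_url,
                "certificate-authority": ca_path,
            },
        }],
        # Client certificate and key used to authenticate as this user.
        "users": [{
            "name": user_name,
            "user": {
                "client-certificate": client_cert_path,
                "client-key": client_key_path,
            },
        }],
        # A context ties one user to one cluster.
        "contexts": [{
            "name": context_name,
            "context": {"cluster": cluster_name, "user": user_name},
        }],
        "current-context": context_name,
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    config = build_kubeconfig(
        cluster_name="home-cluster",
        server_url="https://192.0.2.10:6443",
        ca_path="/etc/kubernetes/pki/ca.pem",
        user_name="admin",
        client_cert_path="/etc/kubernetes/pki/admin.pem",
        client_key_path="/etc/kubernetes/pki/admin-key.pem",
    )
    print(json.dumps(config, indent=2))
```

Each component (kubelet, kube-proxy, kube-scheduler and so on) would typically get its own file with its own user and certificate pair. The certificates themselves have to exist first, which is the part the hard-way guide walks through.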

Error copying pip.conf from bucket to Cloud Composer Airflow environment

Similar to this lonely questioner I'm trying to install a Python package from a private PyPI repo such that it's available to our Google Cloud Composer Airflow instance.
I've followed these instructions but Airflow continues not to know about my package:
No module named 'foopackage'
I can't find any reference to my pip.conf in any logs anywhere so I'm not sure whether the file is in the right place, or has the right contents.
How can I proceed with debugging this problem?
The Cloud Composer environment logs show that there was a problem with copying pip.conf from the bucket, but don't give any other details:
{
insertId: "16qa4c8g540zxs3"
logName: "projects/{my-env}/logs/composer-agent"
receiveTimestamp: "2020-02-06T15:59:03.164564368Z"
resource: {…}
severity: "ERROR"
textPayload: "Copying gs://{my-bucket}/config/pip/pip.conf...
"
timestamp: "2020-02-06T15:59:00.857642186Z"
}
I first thought this might be a permissions issue, but the file seems to have the same set of permissions as other files in this bucket.
Where can I get more detailed information on what went wrong when copying that file?
update
I'm on composer-1.7.2-airflow-1.10.2.
update
The service account for my Composer environment already has the project.editor role.
This is an indicator that the Docker image used for the web server failed to build. To find the root cause, please view the Cloud Build logs in your project.
The cause is an operation that failed or took too long and timed out on Composer's backend. In some cases these errors persist in the backend and block future attempts. You can try re-enabling the API.
The first solution that comes to mind is running the following commands in Cloud Shell:
gcloud services disable composer.googleapis.com
gcloud services enable composer.googleapis.com
After enabling the API, please update your Composer environment as usual.
When you install packages, the Composer environment re-creates the Docker containers for the Airflow workers and scheduler, then performs a rolling update within the GKE cluster so that workers stay available. You can check Kubernetes Engine > Workloads to see whether your environment timed out while waiting for the scheduler and workers to come back online.
When the Composer environment uses a custom service account that does not have IAM access to Cloud Build, builds fail immediately, so please check this. You can diagnose it under Cloud Build > History: builds without a log failed before even trying to build a container.
When your package implements bindings, it will fail at runtime if the libraries don't exist on the system. This means it is incompatible with Cloud Composer, because getting shared libraries into the build environment is not currently supported.
Also, make sure your project is packaged correctly.
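On the question of whether pip.conf has the right contents, you can at least check the file locally before uploading it to the bucket. The sketch below assumes the usual pip layout, a [global] section with index-url or extra-index-url; `check_pip_conf` is a hypothetical helper and the URL is a placeholder. It does not verify that the repository is reachable or that the credentials work.

```python
import configparser


def check_pip_conf(text):
    """Return a list of problems found in pip.conf text (empty if none)."""
    # Interpolation is disabled so '%' characters in URLs are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(text)
    except configparser.Error as exc:
        return [f"could not parse file: {exc}"]
    if not parser.has_section("global"):
        return ["missing [global] section"]
    keys = ("index-url", "extra-index-url")
    if not any(parser.has_option("global", key) for key in keys):
        return ["[global] has neither index-url nor extra-index-url"]
    return []


if __name__ == "__main__":
    # Placeholder contents; replace with the text of your own pip.conf.
    sample = "[global]\nextra-index-url = https://pypi.example.invalid/simple/\n"
    problems = check_pip_conf(sample)
    print(problems if problems else "pip.conf looks structurally fine")
```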
I hope you find the above pieces of information useful.