Error copying pip.conf from bucket to Cloud Composer Airflow environment - google-cloud-storage

Similar to this lonely questioner, I'm trying to install a Python package from a private PyPI repo so that it's available to our Google Cloud Composer Airflow instance.
I've followed these instructions but Airflow still doesn't know about my package:
No module named 'foopackage'
I can't find any reference to my pip.conf in any logs anywhere so I'm not sure whether the file is in the right place, or has the right contents.
How can I proceed with debugging this problem?
The Cloud Composer environment logs show that there was a problem with copying pip.conf from the bucket, but don't give any other details:
{
insertId: "16qa4c8g540zxs3"
logName: "projects/{my-env}/logs/composer-agent"
receiveTimestamp: "2020-02-06T15:59:03.164564368Z"
resource: {…}
severity: "ERROR"
textPayload: "Copying gs://{my-bucket}/config/pip/pip.conf...
"
timestamp: "2020-02-06T15:59:00.857642186Z"
}
I first thought this might be a permissions issue, but the file seems to have the same set of permissions as other files in this bucket.
Where can I get more detailed information on what went wrong when copying that file?
update
I'm on composer-1.7.2-airflow-1.10.2.
update
The service account for my Composer environment already has the project.editor role.

This is an indicator that the Docker image used for the web server failed to build. To find the root cause, please view the Cloud Build logs in your project.
The reason for this is an operation that failed or took too long and timed out on Composer's backend. In some cases these errors persist in the backend and block future attempts. You can try re-enabling the API:
The first solution that comes to my mind is running the following commands in Cloud Shell:
gcloud services disable composer.googleapis.com
gcloud services enable composer.googleapis.com
After enabling the API, please update your Composer environment as usual.
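If re-enabling the API clears the backend error, a typical retry of the private-package install might look like this (the environment name, location, and package version below are placeholders, not values from the question):
gsutil cp pip.conf gs://{my-bucket}/config/pip/pip.conf
gcloud composer environments update my-environment \
    --location us-central1 \
    --update-pypi-package foopackage==1.0.0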
When you install packages, the Composer environment re-creates the Docker containers for the Airflow workers and scheduler, then performs a rolling update within the GKE cluster so that workers stay available. You can check Kubernetes Engine > Workloads to see whether your environment timed out waiting for the scheduler and workers to come back online.
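To check this from the cluster side, something along these lines should work (the cluster name and zone are placeholders; they are shown on the environment's details page):
gcloud container clusters get-credentials my-composer-cluster --zone us-central1-a
kubectl get pods --all-namespaces | grep -E 'airflow|scheduler|worker'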
When the Composer environment uses a custom service account that does not have IAM access to use Cloud Build, builds fail immediately, so please check this. You can diagnose it under Cloud Build > History: builds without a log mean the build failed before it even tried to build a container.
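If the custom service account really is missing Cloud Build access, granting it a Cloud Build role is one way to test that theory (the project ID, service account email, and the exact role here are my assumptions):
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:my-composer-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/cloudbuild.builds.builder"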
If your package implements bindings, it will fail at runtime when the required shared libraries don't exist on the system. That makes it incompatible with Cloud Composer, because getting shared libraries into the build environment is not currently supported.
Another thing: make sure your project is packaged correctly.
I hope you find the above pieces of information useful.

Related

Google Cloud Composer failed after restart

I have Google Cloud Composer running in 2 GCP projects. I updated a Composer environment variable in both. One environment restarted fine within a few minutes; the other has a problem and shows the error below, as shown in the images.
Update operation failed. Couldn't start composer-agent, a GKE job that updates kubernetes resources. Please check if your GKE cluster exists and is healthy.
This is the error I see when I open the Composer environment
This is the environment overview
GKE cluster notification
GKE pods overview
I've been trying to figure out how to resolve the problem but haven't found any satisfactory answers. My colleagues suspect a firewall or org-policy issue, but I haven't changed either.
Can someone let me know what caused this problem, given that Cloud Composer is managed by Google, and how to resolve it now?
Since Cloud Composer is a managed resource, when the GKE cluster that serves your Composer environment is unhealthy, you should contact Google Cloud Support. That GKE cluster should just work and you shouldn't even need to know about its existence.
Also check whether you have reached any limits or quotas in your project.
When nothing helps, recreating the Cloud Composer environment is always a good idea.
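Before recreating anything, two quick checks may help (cluster name, zone, and project ID are placeholders):
# Does the GKE cluster behind the environment report a healthy status?
gcloud container clusters describe my-composer-cluster --zone us-central1-a --format="value(status)"
# Project-wide quotas; look for anything at or near its limit
gcloud compute project-info describe --project my-project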

Anthos Config Management, Config sync not installed

I want to install Config Sync via the Anthos console UI, so that a repository is watched and delivers configuration to my Kubernetes clusters.
While I don't get any error messages, the cluster never leaves the "In progress" status, although nothing is installed. These are EKS clusters.
Config Management dashboard
I've also used nomos to check for the installation and it shows the same "NOT INSTALLED" status.
The config management operator was installed following this guide (and checked with a bunch of other guides for consistency):
https://cloud.google.com/anthos-config-management/docs/how-to/installing-kubectl#deploying
The interesting thing here is that if I deploy a ConfigManagement object with my Git repo configuration using kubectl, everything works, except that I see this odd configuration in the console, which doesn't help much:
anthos-console
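For reference, the ConfigManagement object I apply looks roughly like this (the repo URL, branch, and secret type are placeholders, not my real settings):
cat <<EOF | kubectl apply -f -
apiVersion: configmanagement.gke.io/v1
kind: ConfigManagement
metadata:
  name: config-management
spec:
  sourceFormat: unstructured
  git:
    syncRepo: https://github.com/example/config-repo
    syncBranch: main
    secretType: none
EOF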
Any pointer to where I can troubleshoot this further will be much appreciated.

ERROR: (gcloud.app.deploy) Error Response: [9] Flex operation projects/.../regions/us-central1/operations/... error [FAILED_PRECONDITION]

I'm pretty new to Google Cloud, and I just wanted to deploy my first Streamlit web app. I'm on Windows, using the command line. I already did the Google Cloud "Hello World" example, which worked without any errors.
When I deploy the Streamlit web app, after 3-4 minutes of waiting on "Updating Server" I get the following error:
ERROR: (gcloud.app.deploy) Error Response: [9] Flex operation projects/XXXX/regions/us-central1/operations/f0c89d22-2d09-410d-bf99-fc49ad337800 error [FAILED_PRECONDITION]: An internal error occurred while processing task /app-engine-flex/flex_await_healthy/flex_await_healthy>2021-05-27T06:13:50.278Z10796.jc.0: 2021-05-27 06:15:32.787 An update to the [server] config option section was detected. To have these changes be reflected, please restart streamlit.
That's my app.yaml file:
service: default
runtime: custom
env: flex
manual_scaling:
  instances: 1
resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10
Posting my comment as an answer for better visibility and to summarize.
In this particular case, the error was caused by a mistake in the Dockerfile (see the sketch after the steps below).
Here are some steps you can follow to fix or narrow down the error:
Try to deploy a test app to see the differences in configuration. Example.
Try deploying your app after updating gcloud with the gcloud components update command.
Make sure you run the SDK as an Admin.
If the error recurs, run gcloud app deploy app.yaml --verbosity=debug to get a more specific error.
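For context, a minimal Dockerfile that serves a Streamlit app on the port App Engine flex expects might look like the sketch below (the Python version, app filename, and Streamlit flags are assumptions, not the asker's actual file):
cat > Dockerfile <<'EOF'
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# App Engine flex routes traffic to port 8080 by default
EXPOSE 8080
CMD ["streamlit", "run", "app.py", "--server.port=8080", "--server.address=0.0.0.0"]
EOF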
It's good practice to include references in questions for folks who aren't familiar with e.g. Streamlit. I assume it's this: https://streamlit.io/
I suspect (!) that Streamlit does not (by default) satisfy App Engine's requirements:
A web app on port 8080
No additional (apt get) dependencies
No C-based dependencies
The Streamlit wiki references various deployment alternatives and includes Google Kubernetes Engine (aka GKE) (see below) but not App Engine.
This doesn't mean that it won't work on App Engine (standard), just that it may not be trivial.
The GKE instructions reference installing Cython, an optimizing C-compiler, and that gives me pause about using App Engine standard. Unless you're familiar with Kubernetes, I'd discourage you from trying GKE, as there's more complexity.
So it would be helpful if others with Streamlit experience weighed in but, until then, you may wish to consider using Streamlit sharing.
It would be helpful if someone who has deployed Streamlit to App Engine (flexible?) or perhaps Cloud Run can provide an overview.

Is KubeFlow still supported on GCP?

I am trying to use KubeFlow on GCP and I am following this codelab, but "click-to-deploy" is no longer supported, so I followed the "kubectl and kpt" documentation instead. However, I keep getting the error "You cannot perform this action because the Cloud SDK component manager is disabled for this installation." and none of the solutions I found worked. Two friends of mine told me they have been trying to make KubeFlow work since last year and it never worked, but I still see people posting KubeFlow questions on Stack Overflow, so I want to ask: is it still working, and if so, where can I find a decent guide to follow?
Thanks!
I finally got it working. For that error message, it turned out that I just didn't install the Cloud SDK properly. There will be a lot of other issues too down the road, but at least the KubeFlow web UI is working for me now.
Yes. As the kubectl and kpt guide says, the first step in getting ready to install the cluster is installing gcloud, the CLI that manages authentication, local configuration, developer workflow, and interactions with Google Cloud APIs.
Without it you simply can't work with the objects (in your case you need to enable kpt and anthoscli beta) or perform tasks like creating a Compute Engine VM instance, managing a Google Kubernetes Engine cluster, and deploying an App Engine application, either from the command line or in scripts and other automations.
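If the SDK was installed from the official installer (the component manager is disabled when gcloud comes from apt, yum, or a similar package manager), the components the guide relies on can be added with something like:
gcloud components install kubectl kpt anthoscli beta
gcloud components update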

Unable to install gcloud SDK

When I try to install google cloud SDK, I was getting the following error:
ERROR:
(gcloud.components.update) Failed to fetch component listing from
server. Check your network settings and try again. This will install
all the core command line tools necessary for working with the Google
Cloud Platform. Failed to install.
After reinstalling Python (v3.7.0), I added it to the path and also set the CLOUDSDK_PYTHON environment variable to make sure. Now when I attempt the installation, it simply hangs:
If I attempt the installation through the terminal by executing install.bat, it also gets stuck after asking whether to send diagnostics to Google:
Welcome to the Google Cloud SDK!
Active code page: 65001
To help improve the quality of this product, we collect anonymized usage data
and anonymized stacktraces when crashes are encountered; additional information
is available at <https://cloud.google.com/sdk/usage-statistics>. This data is
handled in accordance with our privacy policy
<https://policies.google.com/privacy>. You may choose to opt in this
collection now (by choosing 'Y' at the below prompt), or at any time in the
future by running the following command:
gcloud config set disable_usage_reporting false
Do you want to help improve the Google Cloud SDK (y/N)? n
Nothing gets printed after that.
It seems that the "(gcloud.components.update) Failed to fetch component listing from server" error might be caused by a proxy or antivirus in your environment, so I'd recommend trying a clean installation in a VM or on another network.
Also, I was able to find some similar errors on the issue tracker and the team gave a solution at comment 10. As you can see on the issue tracker, sometimes this behavior occurs because the Python SDK is installed in the default "Program Files" location; you could try changing the location of the Python installation.
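For example, on Windows you could point the installer at a Python interpreter outside "Program Files" before running it again (the path below is just a placeholder):
set CLOUDSDK_PYTHON=C:\Python37\python.exe
install.bat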