Google Cloud Datalab deployment unsuccessful - sort of

This is a different scenario from the other questions on this topic. My deployment almost succeeded, and I can see the following lines at the end of my log:
[datalab].../Updating module [datalab]...done.
Jul 25 16:22:36 datalab-deploy-main-20160725-16-19-55 startupscript: Deployed module [datalab] to [https://main-dot-datalab-dot-.appspot.com]
Jul 25 16:22:36 datalab-deploy-main-20160725-16-19-55 startupscript: Step deploy datalab module succeeded.
Jul 25 16:22:36 datalab-deploy-main-20160725-16-19-55 startupscript: Deleting VM instance...
The landing page keeps showing a wait bar indicating the deployment is still in progress. I have tried deploying several times in the last couple of days.
Regarding the additions described on the landing page:
An App Engine "datalab" module is added. - when I click on the pop-out url "https://datalab-dot-.appspot.com/" it throws an error page with "404 page not found"
A "datalab" Compute Engine network is added. - Under "Compute Engine > Operations" I can see a create instance for datalab deployment with my id and a delete instance operation with *******-ompute#developer.gserviceaccount.com id. not sure what it means.
Datalab branch is added to the git repo- Yes and with all the components.
I think the deployment is partially successful. When I visit the landing page again, the only option I see is to deploy Datalab again, not to start it. Can someone spot the problem? Appreciate the help.
I read the other posts on this topic and tried to verify my deployment using "https://console.developers.google.com/apis/api/source/overview?project=". I get the following message:
The API doesn't exist or you don't have permission to access it

You can try looking at the App Engine dashboard here, to verify that there is a "datalab" service deployed.
If that is missing, then you need to redeploy again (or switch to the new locally-run version).
If that is present, then you should also be able to see a "datalab" network here, and a VM instance named something like "gae-datalab-main-..." here. If either of those are missing, then try going back to the App Engine console, deleting the "datalab" service, and redeploying.
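If you prefer the command line, the same checks can be done with gcloud (a minimal sketch; it assumes the Cloud SDK is installed and configured for the affected project):
gcloud app services list          # should include a "datalab" service
gcloud compute networks list      # should include a "datalab" network
gcloud compute instances list     # look for an instance named gae-datalab-main-...
If you need to delete the "datalab" service before redeploying, "gcloud app services delete datalab" should also work.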

Related

Azure DevOps: Pipeline fails to deploy to Linux Web App

I have a pipeline deploying to my Azure web app that errors out most of the time because it couldn't deploy to the web app. The task takes around 25 mins:
...
Copying file: 'frontend/.gitignore'
Copying file: 'frontend/README.md'
Copying file: 'frontend/package.json'
Copying file: 'frontend/tsconfig.json'
Copying file: 'frontend/yarn.lock'
Omitting next output lines...
An error has occurred during web site deployment.
Kudu Sync failed
/opt/Kudu/Scripts/starter.sh "/home/site/deployments/tools/deploy.sh"
##[error]Failed to deploy web package to App Service.
##[error]To debug further please check Kudu stack trace URL : https://$someapp:***@someapp.scm.azurewebsites.net/api/vfs/LogFiles/kudu/trace
##[error]Error: Package deployment using ZIP Deploy failed. Refer logs for more details.
...
When I enable system.debug = true, I see these logs repeated many times before the artifact files start copying:
POLL URL RESULT: {"statusCode":202,"statusMessage":"Accepted","headers":{"transfer-encoding":"chunked","content-type":"application/json; charset=utf-8","location":"http://XXXXXXXXX.scm.azurewebsites.net:80/api/deployments/latest?deployer=VSTS_ZIP_DEPLOY&time=2021-07-09_09-01-41Z","server":"Kestrel","date":"Fri, 09 Jul 2021 09:23:37 GMT","connection":"close"},"body":{"id":"68a7a8811796416b993924437493ff87","status":0,"status_text":"Building and Deploying '68a7a8811796416b993924437493ff87'.","author_email":"N/A","author":"N/A","deployer":"VSTS_ZIP_DEPLOY","message":"Created via a push deployment","progress":"Running deployment command...","received_time":"2021-07-09T09:01:50.4159225Z","start_time":"2021-07-09T09:01:51.775357Z","end_time":null,"last_success_end_time":null,"complete":false,"active":false,"is_temp":false,"is_readonly":true,"url":null,"log_url":null,"site_name":"XXXXXXXXXXXXe"}}
Deployment status: 0 'Building and Deploying '68a7a8811796416b993924437493ff87'.'. retry after 5 seconds
setting affinity cookie ["ARRAffinity=c06e9bb74f52245b3695b3079a52f6acbc70c3ee812f67e4fa3f5f65088ff4f7;Path=/;HttpOnly;Secure;Domain=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX.scm.azurewebsites.net","ARRAffinitySameSite=c06e9bb74f52245b3695b3079a52f6acbc70c3ee812f67e4fa3f5f65088ff4f7;Path=/;HttpOnly;SameSite=None;Secure;Domain=XXXXXXXXXXXXXXX.scm.azurewebsites.net"]
[GET]https://XXXXXXXXXXX-test.scm.azurewebsites.net:443/api/deployments/latest?deployer=VSTS_ZIP_DEPLOY&time=2021-07-09_09-01-41Z
POLL URL RESULT: {"statusCode":202,"statusMessage":"Accepted","headers":{"transfer-encoding":"chunked","content-type":"application/json; charset=utf-8","location":"http://XXXXXXXXXXXXXXXXXXX.scm.azurewebsites.net:80/api/deployments/latest?deployer=VSTS_ZIP_DEPLOY&time=2021-07-09_09-01-41Z","server":"Kestrel","date":"Fri, 09 Jul 2021 09:23:45 GMT","connection":"close"},"body":{"id":"68a7a8811796416b993924437493ff87","status":0,"status_text":"Building and Deploying '68a7a8811796416b993924437493ff87'.","author_email":"N/A","author":"N/A","deployer":"VSTS_ZIP_DEPLOY","message":"Created via a push deployment","progress":"Running deployment command...","received_time":"2021-07-09T09:01:50.4159225Z","start_time":"2021-07-09T09:01:51.775357Z","end_time":null,"last_success_end_time":null,"complete":false,"active":false,"is_temp":false,"is_readonly":true,"url":null,"log_url":null,"site_name":"XXXXXXXXXXXX"}}
Deployment status: 0 'Building and Deploying '68a7a8811796416b993924437493ff87'.'. retry after 5 seconds
setting affinity cookie ["ARRAffinity=c06e9bb74f52245b3695b3079a52f6acbc70c3ee812f67e4fa3f5f65088ff4f7;Path=/;HttpOnly;Secure;Domain=XXXXXXXXXXXXXXXXXXXXXX.scm.azurewebsites.net","ARRAffinitySameSite=c06e9bb74f52245b3695b3079a52f6acbc70c3ee812f67e4fa3f5f65088ff4f7;Path=/;HttpOnly;SameSite=None;Secure;Domain=XXXXXXXXXXXXXXXXXX"]
This task fails only in a specific slot in my web app; the authors slots and the production slot work fine, and the job takes around 6 mins.
Any ideas what could be wrong?
As per the discussion and troubleshooting performed here, I tried to set up a Linux App Service on the Standard S1 pricing tier with 5 (max) slots enabled and CI/CD configured via Azure Pipelines. Unfortunately, I wasn't able to reproduce the same error as yours despite multiple different attempts.
I'd suggest you try the following:
"Kudu Sync failed" in the deployment log resembles this open issue from about a year ago: ZipDeploy on Azure web app Linux fails during kudu sync #2972. Please check the trace/deployment log files on Kudu at https://<appname>.scm.azurewebsites.net/api/vfs/LogFiles/kudu/trace or /deployment, or from Kudu's DebugConsole (/LogFiles/kudu/*), and check whether this is caused by deployment lock failures. In that case, check this wiki out for dealing with locked files during deployment.
Try a different deployment method like run from package (to avoid resource locking), FTP/S, or local Git deployment (see the Azure CLI sketch after this list).
This should help you narrow down whether the issue is caused by the App Service/deployment method or by the ADO pipeline/task.
Scale up to the next higher tier and re-trigger your pipeline. If it succeeds, you may scale back down to the original tier. This would indirectly restart your SCM sites as well.
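As a rough sketch of the run-from-package route via the Azure CLI (the resource group, app, and slot names are placeholders):
az webapp config appsettings set -g <resource-group> -n <app-name> --slot <slot-name> --settings WEBSITE_RUN_FROM_PACKAGE=1
az webapp deployment source config-zip -g <resource-group> -n <app-name> --slot <slot-name> --src package.zip
Deploying once this way tells you whether the ZIP-deploy path (and its Kudu sync step) is the culprit.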
If the above workarounds don't help, you could check on the following:
Customize your deploy task with options like TakeAppOfflineFlag, DeploymentType or RenameFilesFlag to streamline your deployment.
Try restarting the app/slot just before the deployment in order to recycle the app pool (see the sketch after this list).
Check if your app is running into any of the prescribed limits (ex: file system storage) for your tier.
Drill down into available metrics for your app to identify any CPU/Memory anomalies.
Try the Diagnose and solve problems tool for any additional insights about your app.
If your environment permits, try setting up and deploying to a new slot within your App Service, or try verifying if this happens to another app in a different region.
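For the restart suggestion above, a minimal Azure CLI sketch (again with placeholder names) that could run as a step just before the deploy task:
az webapp restart -g <resource-group> -n <app-name> --slot <slot-name>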

Kubeflow fails to deploy using both CLI and Console

I deleted my KF cluster last night to create a new one (using the kubectl cluster command, not kfctl delete), and then when I tried to create a new one, it failed; it works with neither the CLI nor the Console. I found other people who have run into this issue before, for example here and here.
"However, as I said even with CLI my deployment fails, the error from console is:
ailed to apply: (kubeflow.error): Code 500 with message: coordinator Apply failed for gcp: (kubeflow.error): Code 500 with message: gcp apply could not update deployment manager Error could not update storage-kubeflow.yaml; Insert deployment error: googleapi: Error 403: Request had insufficient authentication scopes.
More details:
Reason: insufficientPermissions, Message: Insufficient Permission"
and the error I get from Console is:
"Please enable APIs for your project and try again
Please enable cloud resource manager API: https://console.developers.google.com/apis/api/cloudresourcemanager.googleapis.com/ and iam API: https://console.developers.google.com/apis/api/iam.googleapis.com/"
Note that this error is wrong; all the APIs are already active. I'm quite sure this is a KF bug, but I'm not sure how to find a workaround. Any thoughts?
With the CLI, I'm using my own account, which has "owner" privileges.
Thanks
It seems you have an issue with IAM and the installation of Kubeflow, a third-party product that is itself not supported by us; nevertheless, I went ahead and dug up some information about this machine learning product.
The main issues are permissions (although it seems you have already covered those), the number of projects, and some fine-grained points.
I was checking and found the following resources that may help:
a) Troubleshooting Kubeflow [1]
b) Deploying Kubeflow in GKE [2]
c) Kubeflow auto-deployer for GKE [3]
There is also some discussion about mismatched permissions settings in Kubeflow that may be worth reading [4].
Finally, there is a group that also operates on a best-effort basis, due to the nature of Kubeflow: google-kubeflow-support@google.com. It may come in handy.
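Since the CLI error mentions "insufficient authentication scopes", it may also be worth refreshing your credentials and re-enabling the two APIs from the command line. A minimal sketch (it assumes the Cloud SDK; replace <your-project-id> with the affected project):
gcloud auth login                      # refresh your user credentials
gcloud auth application-default login  # refresh application-default credentials
gcloud config set project <your-project-id>
gcloud services enable cloudresourcemanager.googleapis.com iam.googleapis.com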
I trust this information will be useful for you to solve your issue.

IBM Cloud Private CE - Unauthorized Access to Catalog

I have installed ICP CE 2.1.0 on a Google Cloud VM, and the installation went well; no errors in the installation process. When accessing the GUI I am able to see deployments and services, but as soon as I access any part of the Catalog I get a blank white page with the text:
{"statusCode":401,"details":"Unexpected response code 401 from request:\nGET https://xx.xxx.xxx.xx:8443/console/api/v1/header?serviceId=catalog-ui&dev=false&accessUrl=https%3A%2F%2Fxx.xxx.xxx.xx%3A8443* ...... }
I have tried killing the individual pods, but I get the same error. When looking at the pod logs for the catalog-ui, I see error 500 messages.
Has anyone experienced this, or can anyone tell me why this is the case? I understand that a cloud VM is maybe not the best use case, but it should work?
Can you confirm the version level of ICP? Your post mentioned "ICP CE 2.1.0", but if you check the user icon (top right corner) and click About, we should be able to see the full version details.
The reason for asking is that at the 2.1.0.0 level there was an intermittent catalog issue just like you describe. Generally it was caused by resource constraints on the Kubernetes cluster; a few quick checks are sketched below.
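A minimal sketch with kubectl (it assumes the ICP system pods run in the kube-system namespace, as in a default install; the pod name is a placeholder):
kubectl -n kube-system get pods | grep catalog              # check the catalog-ui pod status and restarts
kubectl -n kube-system logs <catalog-ui-pod-name>           # inspect the 500s you mentioned
kubectl describe nodes | grep -A 5 "Allocated resources"    # look for CPU/memory pressure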
Details for ICP 2.1.0.3, which is the latest available release:
https://www.ibm.com/support/knowledgecenter/SSBS6K_2.1.0.3/getting_started/whats_new.html

"Project is not fully initialized with the default service accounts." Error in brand new account on first project?

I just signed up for Google Cloud Engine and started the most basic Container Engine quickstart on a brand new project:
https://cloud.google.com/container-engine/docs/quickstart
A few steps in, it has me run this command
gcloud container clusters create example-cluster
which errors out:
$ gcloud container clusters create example-cluster
ERROR: (gcloud.container.clusters.create) ResponseError: code=503, message=Project hello-world-161713 is not fully initialized with the default service accounts. Please try again later.
so far, "trying again later" has not helped: it's been doing this every time for the past few hours.
as usual, Google has no obvious way of getting help in any timely manner, and Googling the error turns up nothing useful.
kind of a long shot but i found a link sending me over here on one of their pages (great support guys) so figured i'd give it a shot. thanks in advance.
The Container Engine API needs to be enabled, and unfortunately that error message is not correct (trying again later won't help).
If you visit the Google Container Engine page in the web console, https://console.cloud.google.com/kubernetes/list, it should enable Google Container Engine. Make sure you select the project you're using with the quickstart. You can create your cluster from that page too if you'd prefer.
You can also enable the Container Engine API manually here: https://console.cloud.google.com/apis/api/container.googleapis.com/overview
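If you prefer the command line, enabling the API with gcloud should work as well (a minimal sketch; hello-world-161713 is the project ID from the error above, so substitute your own):
gcloud services enable container.googleapis.com --project hello-world-161713
gcloud container clusters create example-cluster    # then retry the quickstart step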

Trouble adding a new service

I have followed the instructions at https://github.com/cloudfoundry/oss-docs/tree/master/vcap/adding_a_system_service and copied the echo service and created my new service. (That document is somewhat out-of-date in that "excluded components" no longer exists.)
In any case, my service shows up as running with a gateway and a node when I look at 'vcap status' on the server. However, when I look at 'vmc services' from the client, my service is not in the list. Where is this list maintained, and why is my service not on it?
Various services, including blob, filesystem, mongodb, etc., are shown in the 'vmc services' list even though they have never been included in my config. Where is this maintained, and why are other services on this list?
The cloud_controller.log file shows a "Create service request:" for echo every minute. This service is not in my config file (it was once, but it was removed and I repeated the deployment). What is prompting this request for a service that was not defined in the config?
The _gateway.log for my service shows the following:
INFO -- Sending info to cloud controller: ...api.vcap.me/services/v1/offerings
INFO -- Fetching handles from cloud controller .../offerings/.../handles
ERROR -- Failed registering with cloud controller, status=400
DEBUG -- [GaaS-Provisioner] Connected to node mbus..
ERROR -- Failed fetching handles, status=404
Why does my gateway fail to register with the cloud controller? I have found some reports that suggest that the problem is with domain name mapping. I have verified that the server can find itself:
$ curl api.vcap.me
Welcome to VMware's Cloud Application Platform
What can I do to register my service?
You can also try asking your question on the vcap-dev Google group:
https://groups.google.com/a/cloudfoundry.org/forum/?fromgroups#!forum/vcap-dev
They are focused on answering and discussing OSS subjects for Cloud Foundry!
If you follow the document correctly, things should work just fine. I understand that the mechanism for maintaining the excluded list of components has changed and can be a point of confusion when following the steps mentioned in the article (just ignore that step entirely).
ERROR -- Failed registering with cloud controller, status=400
Well, this is a point of worry. I recently followed the article step by step and was able to add a new service.
Is the echo service showing up in vmc services?
Have you copied the yml files for node and gateway at ./cloudfoundry/.deployments/devbox/config?
Are the tokens for your gateway unique, and do they match across the two files ./cloudfoundry/.deployments/devbox/config/cloud_controller.yml and ./cloudfoundry/.deployments/devbox/config/*_gateway.yml? (A quick check is sketched below.)
I would recommend that you first concentrate on getting the echo service listed in the vmc services output. Once that works, replicate the steps (taking absolute care to modify things like the token) to get your custom service working.
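As a quick way to compare the tokens (a sketch; echo_gateway.yml is an assumed file name, and the exact key names may differ in your config):
grep -n "token" ./cloudfoundry/.deployments/devbox/config/cloud_controller.yml
grep -n "token" ./cloudfoundry/.deployments/devbox/config/echo_gateway.yml
The token values printed for your service must be identical in both files.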
Cheers,
Ankit
You should follow this guide.
It worked for me.
Regards.