All of a sudden unable to deploy the code in GitHub Actions - github

For various reasons, I deleted the AWS S3 keys and replaced them. I am now unable to deploy the code in GitHub Actions; the deployment fails with the error below.
"Resource is not in the state servicesStable"
In the AWS events tab, we get the following error:
"unable to consistently start tasks successfully"
Let me know if it is required and I will share the task-def.json file.

Related

OpenSearch 1.3 > 2.3 upgrade, CloudFormation fails on domain update

I recently updated our CDK code to move our OpenSearch cluster from version 1.3 to 2.3. The cluster itself seems to have upgraded to a healthy state and is still accessible / usable by our application, but CloudFormation failed when attempting to update our domain resource with:
Resource handler returned message: "Resource handler returned message: "Invalid request provided: DP Nodes are OOS, Tags operation is not allowed"
This kicked the stack into UPDATE_ROLLBACK_FAILED, because the rollback itself is not allowed: the cluster cannot be downgraded back to 1.3.
I'm struggling to find any information about this error and am not quite sure how to resolve it to unblock the CloudFormation stack.
Things I have tried:
Digging through CloudWatch logs only revealed information pertaining to queries.
Forcing the rollback to complete while skipping the Domain resource. This got me back to an UPDATE_COMPLETE state, but each subsequent deploy of this stack will fail again, since the core issue is not resolved.
This was an odd presentation of a permissions issue. As I was reading through some docs, I stumbled upon this section, which discusses changes to tag-based access control.
This led me to start looking into CloudTrail, where I stumbled upon the exact error that fired when this deploy happened. It was a little odd because the assumed role granted admin access to CloudFormation, but the last line of this event record caught my eye:
"sourceIPAddress": "cloudformation.amazonaws.com",
"userAgent": "cloudformation.amazonaws.com",
"errorCode": "ValidationException",
"errorMessage": "DP Nodes are OOS, Tags operation is not allowed",
"eventSource": "es.amazonaws.com",
After I added es.amazonaws.com to the trust relationship of that role, the deploy re-ran successfully.
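The trust-relationship change described above can be sketched in Python. This is only an illustration under assumptions: the starting policy (a role that trusts only cloudformation.amazonaws.com) and the helper name add_trusted_service are hypothetical, since the original role's policy is not shown in the answer.

```python
import copy
import json


def add_trusted_service(trust_policy: dict, service: str) -> dict:
    """Return a copy of trust_policy whose sts:AssumeRole statements also trust `service`."""
    policy = copy.deepcopy(trust_policy)
    for statement in policy.get("Statement", []):
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # Only touch statements that allow assuming the role.
        if statement.get("Effect") != "Allow" or "sts:AssumeRole" not in actions:
            continue
        principal = statement.setdefault("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. "Principal": "*"; left untouched in this sketch
        services = principal.get("Service", [])
        if isinstance(services, str):
            services = [services]
        if service not in services:
            services = services + [service]
        principal["Service"] = services
    return policy


# Hypothetical starting policy: a role that only CloudFormation can assume.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "cloudformation.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

updated = add_trusted_service(example_policy, "es.amazonaws.com")
print(json.dumps(updated, indent=2))
```

The printed JSON is the kind of document that could then be pasted into the role's trust relationship in the IAM console, or passed to aws iam update-assume-role-policy.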
Hopefully this helps someone else.

Error while pushing to github for the first time "remote: Internal Server Error" [duplicate]

This error pops up intermittently when I try to deploy my GitHub Pages sites. I have no clue what to do; I already deleted and recreated the repository, but the error persists. I have the same problem with all my GitHub Pages repositories.
Here is one repository example: https://github.com/cnftstats/borgs
Run actions/deploy-pages@v1
Actor: github-pages[bot]
Action ID: 1998855719
Artifact URL: https://pipelines.actions.githubusercontent.com/odmqpuZ7yGar25NNWIM53v9pBjO9vEwDjecGIYtf9ECZfcxi8V/_apis/pipelines/workflows/1998855719/artifacts?api-version=6.0-preview
{"count":1,"value":[{"containerId":359584,"size":14684160,"signedContent":null,"fileContainerResourceUrl":"https://pipelines.actions.githubusercontent.com/odmqpuZ7yGar25NNWIM53v9pBjO9vEwDjecGIYtf9ECZfcxi8V/_apis/resources/Containers/359584","type":"actions_storage","name":"github-pages","url":"https://pipelines.actions.githubusercontent.com/odmqpuZ7yGar25NNWIM53v9pBjO9vEwDjecGIYtf9ECZfcxi8V/_apis/pipelines/1/runs/21/artifacts?artifactName=github-pages","expiresOn":"2022-06-15T13:26:01.9505756Z","items":null}]}
Creating deployment with payload:
{
"artifact_url": "https://pipelines.actions.githubusercontent.com/odmqpuZ7yGar25NNWIM53v9pBjO9vEwDjecGIYtf9ECZfcxi8V/_apis/pipelines/1/runs/21/artifacts?artifactName=github-pages&%24expand=SignedContent",
"pages_build_version": "bf8f96d22c5dd116a5d94ee24cd398bdda60035f",
"oidc_token": "***"
}
Failed to create deployment for bf8f96d22c5dd116a5d94ee24cd398bdda60035f.
{"message":"Deployment request failed for bf8f96d22c5dd116a5d94ee24cd398bdda60035f due to in progress deployment. Please cancel e92de3f483b775a12d4f784d7cc661ff2847fa62 first or wait for it to complete.","documentation_url":"https://docs.github.com/rest/reference/repos#create-a-github-pages-deployment"}
Error: Error: Request failed with status code 400
Error: Error: Request failed with status code 400
Sending telemetry for run id 1998855719
GitHub Actions is currently experiencing degraded performance, and GitHub has published this on its status page. You are therefore most likely experiencing a side effect of the current problems. Other users are reporting the same issue as well. Try again later, once GitHub has resolved the issue.
Update: More products are now affected and are experiencing degraded performance. Check the status page for more details: https://www.githubstatus.com
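The "try again later" advice can be sketched as a retry loop with exponential backoff. This is a minimal illustration, not part of the deploy-pages action: the function name retry_with_backoff, the default delays, and whatever operation is passed in are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

In practice, re-running the failed workflow from the Actions tab after the incident is resolved achieves the same thing by hand.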
[It was a GitHub bug that affected all users; date: 18/03/2022]
It happened to me today too. :(
Maybe it is a GitHub Pages bug: https://github.com/actions/deploy-pages/issues/22
https://github.community/t/pages-deploy-wedged-incorrect-request-failed-due-to-in-progress-deployment/234793/4

Github pages fails to deploy

If you get a 500 error, make sure the Build and deployment source is set to "GitHub Actions":
Settings -> Pages section -> Build and deployment -> Source: GitHub Actions

Kubeflow fails to deploy using both CLI and Console

I deleted my KF cluster last night in order to create a new one (using the kubectl cluster command, not kfctl delete), and when I tried to create a new one, it failed. It does not work with either the CLI or the Console. I found that other people have run into this issue before, for example (here and here).
However, as I said, even with the CLI my deployment fails. The error from the console is:
"Failed to apply: (kubeflow.error): Code 500 with message: coordinator Apply failed for gcp: (kubeflow.error): Code 500 with message: gcp apply could not update deployment manager Error could not update storage-kubeflow.yaml; Insert deployment error: googleapi: Error 403: Request had insufficient authentication scopes.
More details:
Reason: insufficientPermissions, Message: Insufficient Permission"
and the error I get from Console is:
"Please enable APIs for your project and try again
Please enable cloud resource manager API: https://console.developers.google.com/apis/api/cloudresourcemanager.googleapis.com/ and iam API: https://console.developers.google.com/apis/api/iam.googleapis.com/"
Note that this error is wrong: all the APIs are already active. I'm quite sure this is a KF bug, but I'm not sure how to find a workaround. Any thoughts?
With CLI, I'm using my own account which has "owner" privileges.
Thanks
It seems you have an issue with IAM and the installation of Kubeflow, a third-party product that is not itself supported by us; nevertheless, I went ahead and dug up some information about this machine learning product.
The main issues (although it seems you have already covered permissions) are permissions, the number of projects, and some fine-grained points.
I checked and found the following resources that may help:
a) Troubleshooting Kubeflow [1]
b) Deploying Kubeflow in GKE [2]
c) Kubeflow auto deployer for GKE [3]
There is also some discussion about mismatched permission settings in Kubeflow that may be worth reading [4].
Finally, there is a group, also run on a best-effort basis due to the nature of Kubeflow, that may come in handy: "google-kubeflow-support@google.com".
I trust this information will help you solve your issue.

Google cloud datalab deployment unsuccessful - sort of

This is a different scenario from the other questions on this topic. My deployment almost succeeded, and I can see the following lines at the end of my log:
[datalab].../#015Updating module [datalab]...done.
Jul 25 16:22:36 datalab-deploy-main-20160725-16-19-55 startupscript: Deployed module [datalab] to [https://main-dot-datalab-dot-.appspot.com]
Jul 25 16:22:36 datalab-deploy-main-20160725-16-19-55 startupscript: Step deploy datalab module succeeded.
Jul 25 16:22:36 datalab-deploy-main-20160725-16-19-55 startupscript: Deleting VM instance...
The landing page keeps showing a wait bar indicating that the deployment is still in progress. I have tried deploying several times in the last couple of days.
About the additions described on the landing page:
An App Engine "datalab" module is added. - When I click on the pop-out URL "https://datalab-dot-.appspot.com/", it shows an error page with "404 page not found".
A "datalab" Compute Engine network is added. - Under "Compute Engine > Operations" I can see a create-instance operation for the datalab deployment with my ID, and a delete-instance operation with the ID *******-ompute@developer.gserviceaccount.com. I am not sure what that means.
A Datalab branch is added to the git repo. - Yes, and with all the components.
I think the deployment is partially successful. When I visit the landing page again, the only option I see is to deploy Datalab again, not to start it. Can someone spot the problem? I appreciate the help.
I read the other posts on this topic and tried to verify my deployment using "https://console.developers.google.com/apis/api/source/overview?project=". I get the following message:
The API doesn't exist or you don't have permission to access it
You can try looking at the App Engine dashboard here, to verify that there is a "datalab" service deployed.
If that is missing, then you need to redeploy again (or switch to the new locally-run version).
If that is present, then you should also be able to see a "datalab" network here, and a VM instance named something like "gae-datalab-main-..." here. If either of those is missing, then try going back to the App Engine console, deleting the "datalab" service, and redeploying.
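The checks above amount to a small decision procedure, sketched here in Python. The function name next_step and its boolean inputs are hypothetical; the three values would come from looking at the App Engine dashboard, the networks list, and the VM instances list by hand. The answer does not say what to do when everything is present, so the last branch is an assumption.

```python
def next_step(service_deployed: bool, network_present: bool, vm_present: bool) -> str:
    """Map the three manual console checks to the action suggested in the answer."""
    if not service_deployed:
        # No "datalab" service in App Engine: the deployment did not complete.
        return "redeploy"
    if not (network_present and vm_present):
        # Service exists, but the network or the gae-datalab-main-... VM is missing.
        return "delete the datalab service, then redeploy"
    # Assumption: nothing in the answer covers this case.
    return "no redeploy suggested by these checks"
```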