Google Cloud Build: No space left on device - gcloud

I have been using Google's Cloud Build to build my artifacts/Docker images for deployment, but I am suddenly getting the following error when submitting a build:
Creating temporary tarball archive of 1103 file(s) totalling 99.5 MiB before compression.
ERROR: gcloud crashed (IOError): [Errno 28] No space left on device
I have increased diskSizeGB as well, but I am still getting this error. Where does the Cloud Build build actually happen, and on which VM? How do I get rid of this error?

Cloud Build is a managed service. While its builds run on GCE VMs, those VMs are managed by the service and opaque to you. You cannot access the build service's resources directly.
What value did you try for diskSizeGB?
Please update your question to include the salient parts of your cloudbuild.yaml and the gcloud command that you're using to submit the build.
I'm wondering whether the error corresponds to a lack of space locally (on your host) rather than on the service's VM.
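That hint is probably the key: the "Creating temporary tarball archive" step runs on your local machine before anything is uploaded, so Errno 28 points at a full local disk (typically the temp directory), and diskSizeGB only resizes the remote build VM's disk. A minimal sketch of things to try, assuming a Unix-like host (the project and image names are placeholders):

    # gcloud is a Python tool and should honour TMPDIR when staging the
    # tarball, so point it at a disk with free space:
    TMPDIR=/path/with/space gcloud builds submit --tag gcr.io/my-project/my-image .

    # Shrink the upload by excluding bulky paths with a .gcloudignore file:
    cat > .gcloudignore <<'EOF'
    .git
    node_modules/
    *.log
    EOF

    # For reference, diskSizeGB lives under options in cloudbuild.yaml and
    # only affects the *remote* build VM's disk:
    # options:
    #   diskSizeGB: 100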

Related

Container App Environment creation timing out

Where I work has just started migrating to the cloud. We've successfully deployed a number of resources into Azure using Terraform and Pipelines.
Where we are running into issues is deploying a Container App Environment. We have code that was working in a less locked-down environment (set up for a proof of concept), but we are now having issues using that code in our go-forward environment.
When deploying, the Container App Environment spends 30 minutes attempting to create before it returns a context deadline exceeded error. Looking in the Azure Portal, I can see the resource in the "Waiting" provisioning state, and I can also see the MC_ and AKS resources that get generated. It then fails around 4 hours later.
Any advice?
I suspect it's related to security on the Virtual Network that the subnets are sitting on, but I'm not seeing any logs on the deployment to confirm. The original subnets had a Network Security Group (NSG) assigned, and I configured the rules that Microsoft provides; I then added a couple of subnets without an NSG assigned, with no luck.
My next step is to try provisioning it via the GUI and see if that works.
I also managed to break our build in the "anything goes" environment.
The root cause is an incomplete configuration of the Virtual Network, which has custom DNS entries. This has now been passed to our network architects to resolve. If I can get more details on the fix they apply, I'll include them here for anyone else who runs into the issue.
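For anyone debugging something similar, a couple of read-only checks that may help narrow this down (resource names below are placeholders):

    # Does the VNet use custom DNS servers? An empty result means
    # Azure-provided DNS.
    az network vnet show \
      --resource-group my-rg \
      --name my-vnet \
      --query dhcpOptions.dnsServers

    # Watch the environment's provisioning state while it creates:
    az containerapp env show \
      --resource-group my-rg \
      --name my-environment \
      --query properties.provisioningState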

How to sync the user directory on Bitbucket Server to Jira with both running on AKS?

When trying to sync the user directories of Jira to other Atlassian products (Confluence and Bitbucket Server running on AKS), a 403 error is returned.
Upon looking into this error the following steps have been attempted:
https://confluence.atlassian.com/stashkb/unable-to-connect-to-jira-for-authentication-forbidden-403-323391874.html
The IP addresses have been added to Jira's whitelist. The next step suggested in solutions online is to restart the Jira service.
This, however, causes issues: upon running the stop/start-jira.sh scripts inside the pod, the service comes back with none of the previous settings, and all configuration, including backups, is gone, taking us back to square one.
Cluster size (current set-up): 3 x Standard D8 v3 (8 vCPUs, 32 GiB memory) cluster on AKS.
Used the following images, installed through the UI:
atlassian/jira-software
cptactionhank/docker-atlassian-jira
To reproduce: exec into the pod, go to /opt/atlassian/jira/bin, and run ./start-jira.sh or ./stop-jira.sh.
What happens is that, when going back to the URL, the Jira instance has been reset and all configuration files in the pod for the service are lost.
The pod logs commonly show error code 137 when restarting.
Update: the following Helm chart has also been used and achieved the same result:
https://github.com/int128/devops-kompose/tree/master/atlassian-jira-software
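Two observations, offered as assumptions rather than a confirmed diagnosis: exit code 137 means the process was killed with SIGKILL, which is what happens when you stop PID 1 inside a container (or when the container exceeds its memory limit); and losing every setting on restart strongly suggests the Jira home directory is not backed by a persistent volume, so the container's writable layer is discarded on each restart. A minimal sketch of persisting the home directory (the claim name, size, and the mount path assumed for the atlassian/jira-software image are all placeholders):

    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: jira-home
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
    EOF
    # ...then mount the claim at Jira's home directory in the pod spec:
    #   volumeMounts:
    #     - name: jira-home
    #       mountPath: /var/atlassian/application-data/jira
    #   volumes:
    #     - name: jira-home
    #       persistentVolumeClaim:
    #         claimName: jira-home

With the home directory persisted, restarting Jira is better done at the Kubernetes level (for example, deleting the pod so its controller recreates it) rather than by running stop-jira.sh against PID 1.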

Why would running a container on GCE get stuck on "Metadata request unsuccessful: Forbidden (403)"?

I'm trying to run a container in a custom VM on Google Compute Engine. This is to perform a heavy ETL process so I need a large machine but only for a couple of hours a month. I have two versions of my container with small startup changes. Both versions were built and pushed to the same google container registry by the same computer using the same Google login. The older one works fine but the newer one fails by getting stuck in an endless list of the following error:
E0927 09:10:13 7f5be3fff700 api_server.cc:184 Metadata request unsuccessful: Server responded with 'Forbidden' (403): Transport endpoint is not connected
Can anyone tell me exactly what's going on here? Can anyone explain why one of my images doesn't have this problem (well, it gives a few of these messages but gets past them) while the other does (thousands of these messages, and it ran for over 24 hours before I killed it)?
If I SSH into a GCE instance, both versions of the container pull and run just fine. I suspect the INTEGRITY_RULE checking from the logs, but I know nothing about how that works.
MORE INFO: this comes down to "restart policy: Never". Even a simple centos:7 container that says "hello world", deployed from the console, triggers this if the restart policy is Never. At least in the short term I can fix this in the entrypoint script, as the instance will be destroyed when the monitor realises that the process has finished.
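A sketch of that entrypoint workaround, under the stated assumption that keeping PID 1 alive after the work completes avoids the failure mode (the script and path names are hypothetical):

    #!/bin/sh
    # run-and-wait.sh - hypothetical container entrypoint
    /app/run-etl.sh        # the actual ETL work (placeholder)
    sleep infinity         # keep PID 1 alive; the monitor deletes the VM later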
I suggest you try creating a third container focused on the metadata-service functionality to isolate the issue. It may be that there's a timing difference between the two containers that isn't being overcome.
Make sure you can curl the metadata service from the VM and that the request to the metadata service is using the VM's service account.
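For reference, those checks look like this (these are the documented GCE metadata endpoints; the Metadata-Flavor header is required):

    # Which service account is attached to the VM?
    curl -H "Metadata-Flavor: Google" \
      "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"

    # Can a token actually be minted for it?
    curl -H "Metadata-Flavor: Google" \
      "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"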

Node app staging fails at Installing App Management stage

I've been trying to push a new build of an app to Bluemix, but staging keeps failing at "Installing App Management" because it can't create regular files and directories due to the disk quota being exceeded.
I've already tried pushing it with "-k 2G", but it still fails.
Is there any way to find out how or why the disk quota keeps being exceeded? There's no way I'm anywhere near using 2 GB of disk space.
Switching to npm v3 is a potential solution here, as it reduces the number of duplicated dependencies.
You can do that in your package.json, for example:
"engines": { "npm": "3.x" }
By design, Cloud Foundry applications on IBM Bluemix are limited to a disk quota of 2 GB (the default is 1 GB). Usually, if a cloud application needs more than 1 GB (and even 1 GB is a lot for a cloud application), it should be redesigned according to cloud patterns: broken down into microservices, or backed by an external storage service if it simply needs static storage (for example, the Object Storage service on Bluemix).
You also have to consider that a cloud application's filesystem is ephemeral: the application can be automatically redeployed to a different virtual environment without any notice to the end users.
Even logs should be collected by an external service (by routing the log stream) if you need to keep them safe; otherwise they will be lost as soon as the application is restarted on a different cluster node.
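A sketch of routing the log stream out with the cf CLI, assuming a syslog-compatible endpoint (the drain URL and app name are placeholders):

    cf create-user-provided-service my-log-drain -l syslog://logs.example.com:514
    cf bind-service my-app my-log-drain
    cf restage my-app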

Create VM Azure REST API in Java - The specified deployment slot Production is occupied

I got the error "The specified deployment slot Production is occupied" when I tried to create a VM with the REST API.
In my XML I have <DeploymentSlot>Production</DeploymentSlot>, but I can't find any information on resolving this issue.
Any ideas?
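In the classic Service Management API, creating a VM this way creates a deployment in the cloud service's Production slot, so the error usually means that cloud service already holds a deployment there. A sketch of inspecting and clearing the slot with plain REST calls (the subscription ID, service name, and management certificate are placeholders; deleting a deployment is destructive):

    # What currently occupies the Production slot?
    curl --cert mycert.pem \
      -H "x-ms-version: 2014-06-01" \
      "https://management.core.windows.net/<subscription-id>/services/hostedservices/<service-name>/deploymentslots/Production"

    # Either delete the existing deployment...
    curl --cert mycert.pem -X DELETE \
      -H "x-ms-version: 2014-06-01" \
      "https://management.core.windows.net/<subscription-id>/services/hostedservices/<service-name>/deploymentslots/Production"

    # ...or add your VM as a role to the existing deployment ("Add Role")
    # instead of issuing a new "Create Virtual Machine Deployment" request.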