How can I check the remaining size of a persistent disk on Google Cloud? And where can I find the code in the instances? - postgresql

I created a project on Google Cloud a long time ago and I am currently having some problems with it. The only result I seem to be receiving is Internal Server Error.
I tried connecting to the compute instance through SSH, but it does not help much because:
as far as I remember, I used to be able to see all the code on the compute instance. It is no longer there; the home folder only has some hidden files. I am not sure where to look for the actual project files.
the only error I managed to get from a log file was: Error syncing pod 9c8e56bc-4298-11e6-ab50, skipping: failed to "StartContainer" for "postgres" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=postgres pod=postgres_default(9c8e56bc-4298-11e6-ab50); this makes me think there are some issues with Postgres, which has a persistent disk of its own, but there seems to be no easy way to find out how much of that disk is occupied.
even though I am an admin on that project and should receive detailed emails (with stack traces) every time there is an error, I am not receiving anything at all.
This behaviour started today, all of a sudden, and I haven't touched the project in almost 2 years, so I am completely lost.
Thanks.

How can I check the remaining size of a persistent disk on Google Cloud?
For this part, I finally found a way to do it today. I'll describe all the steps here so that it is easy for anyone to follow.
First, go to the Disks page of the Google Cloud Console: https://console.cloud.google.com/compute/disks
Identify the persistent disk you are interested in. In my case, it was called pg-data-disk. Click the respective VM instance, which is linked in the "In use by" column.
This will open an SSH connection to the VM instance to which your persistent disk is attached. In the SSH window, run the following command: sudo lsblk. The output lists the block devices attached to the instance.
You will thus discover the disk ID (in my case this was sdb), so you can now run: sudo df -h /dev/<YOUR DISK ID>. This command reports the disk usage of the filesystem on that device.
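The same check can be scripted from inside the VM. This is only a minimal sketch, assuming Python 3 is installed on the instance and that you know the path where the persistent disk is mounted; the function name and the "/" placeholder are illustrative and not part of the original setup. The figures can differ slightly from df because of blocks reserved for root.

```python
import shutil


def disk_usage_summary(mount_point: str) -> dict:
    """Return total, used and free space (in GiB) for the filesystem at mount_point."""
    usage = shutil.disk_usage(mount_point)
    gib = 1024 ** 3
    return {
        "total_gib": round(usage.total / gib, 2),
        "used_gib": round(usage.used / gib, 2),
        "free_gib": round(usage.free / gib, 2),
        # Guard against a zero-sized filesystem to avoid dividing by zero.
        "used_percent": round(100 * usage.used / usage.total, 1) if usage.total else 0.0,
    }


if __name__ == "__main__":
    # "/" is a placeholder; replace it with the mount point of your persistent disk.
    print(disk_usage_summary("/"))
```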
As for the other part of the question, I was actually using Docker containers which were orchestrated by Kubernetes. And I totally forgot about it.
Will upgrade my RAM and get back to work.
Thank you all.

Related

MDT step by step deployment capture not generating wim

New to MDT.
So I am following through the MS step by step guides:
https://learn.microsoft.com/en-us/windows/deployment/windows-10-poc
https://learn.microsoft.com/en-us/windows/deployment/windows-10-poc-mdt
I am at step 28 in the second guide:
Deploy Windows 10 in a test lab using Microsoft Deployment Toolkit
The deployment wizard has been launched in a VM on the host system, and I have watched the process continue for an hour. It finally finishes, but it does not create the .wim on the server share as expected and as referred to in the bootstrap.ini:
Bootstrap.ini
[Settings]
Priority=Default
[Default]
DeployRoot=\\SRV1\MDTBuildLab$
UserDomain=CONTOSO
UserID=MDT_BA
UserPassword=pass#word1
SkipBDDWelcome=YES
I have verified that the share "DeployRoot" exists and can be connected to using the provided credentials and that the share has the correct permissions to create/delete files.
I am not sure what I'm missing, but my expectation was that a .wim would be created in \\srv1\MDTBuildLab$\Captures. There is nothing in that folder.
Just before stopping, the deployment wizard reboots several times in quick succession. That doesn't look correct to me, but as I have never witnessed a successful capture, I can't say for sure that this isn't what is supposed to happen.
I'm not even sure where I can view any log files to figure out why it fails.
Any assistance appreciated!
Further Info:
I activated monitoring. It gets to step 86 of 93. The last thing I see is "Applying WinPE (BD)" or something similar, and then it restarts. Several quick reboots follow (the loading bar appears for a second or two and then the machine reboots), which I think are failing, and finally it gives up. The process never completes!
When I attempt to mount the client REFW10X64-001.vhdx to check the logs, I am greeted with this message:
The disk image isn't initialized, contains partitions that aren't recognizable, or contains volumes that haven't been assigned drive letters. Please use the Disk Management snap-in to make sure that the disk, partitions, and volumes are in a usable state.
So it looks like the last step left the disk in an unusable state, which would explain why the last several boots fail to load anything.
So: no errors, no warnings, no logs, no finish and no wim generated.
How do I troubleshoot this?
I know this post is old, but the normal behavior would be as follows:
Using the boot image, you boot into WinPE
The task sequence is started and the OS gets applied to the disk
Reboot
Boot into full Windows where the task sequence also continues
Under full Windows, one of the last steps is that WinPE gets applied again
Reboot
Computer boots automatically into WinPE
The wim file gets created (WinPE is running on the RAM disk and the regular C: drive (and any additional drives) is being mirrored into the wim file)
Computer performs the FINISHACTION.
We would need at least BDD.log and smsts.log to further troubleshoot. My guess is that WinPE was not applied correctly.

AWS EC2 free tier instance is automatically stopping frequently

I am using Ubuntu 18.04 on an AWS EC2 free tier instance, running websites on an Apache server and Node.js with a PostgreSQL database. All deployments complete without problems and the web apps work fine without any exceptions or errors.
However, I am facing an annoying issue: the instance stops frequently without any exception or error logs. After rebooting the instance everything works fine again, but after some time it stops on its own, either within a few hours on the same day it was rebooted or 1-2 days later.
I created another free tier instance with a separate account, and it shows the same issue. I cannot find any logs or troubleshooting options to get rid of this problem.
I would like to know how this can be troubleshot, or where I can find logs of any errors or exceptions for this instance.
The suggestions given by AWS under "Instance Status Check" are not a practical solution to apply every time.
Something with your VM itself is causing its health checks to fail.
Have a look at the syslogs and your application logs. Also take a look at the CloudWatch metrics to see whether any of them change dramatically close to the time the instance stops.
You can also add a CloudWatch alarm with a recovery action to automatically reboot if there’s an issue with your VM.
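The alarm suggested above could be defined roughly as follows. This is a sketch under assumptions: the helper name, alarm name, period and evaluation count are illustrative choices, not values from the original answer. It only builds the keyword arguments for the CloudWatch put_metric_alarm call, so it can be inspected without AWS credentials.

```python
def recovery_alarm_params(instance_id: str, region: str, action: str = "reboot") -> dict:
    """Build put_metric_alarm arguments for an EC2 status-check alarm.

    action is "reboot" (instance status check) or "recover" (system status check).
    """
    if action not in ("reboot", "recover"):
        raise ValueError("action must be 'reboot' or 'recover'")
    # Reboot reacts to instance-level failures, recover to host-level failures.
    metric = "StatusCheckFailed_Instance" if action == "reboot" else "StatusCheckFailed_System"
    return {
        "AlarmName": f"auto-{action}-{instance_id}",  # illustrative name
        "Namespace": "AWS/EC2",
        "MetricName": metric,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds; an assumed value
        "EvaluationPeriods": 3,  # three failing minutes in a row; an assumed value
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:{action}"],
    }
```

With boto3 installed and credentials configured, the result can be passed on as boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params). Note that an automatic reboot only hides the symptom; the logs and metrics are still needed to find the cause.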

Why would running a container on GCE get stuck Metadata request unsuccessful forbidden (403)

I'm trying to run a container in a custom VM on Google Compute Engine. This is to perform a heavy ETL process so I need a large machine but only for a couple of hours a month. I have two versions of my container with small startup changes. Both versions were built and pushed to the same google container registry by the same computer using the same Google login. The older one works fine but the newer one fails by getting stuck in an endless list of the following error:
E0927 09:10:13 7f5be3fff700 api_server.cc:184 Metadata request unsuccessful: Server responded with 'Forbidden' (403): Transport endpoint is not connected
Can anyone tell me exactly what's going on here? Can anyone please explain why one of my images doesn't have this problem (well it gives a few of these messages but gets past them) and the other does have this problem (thousands of this message and taking over 24 hours before I killed it).
If I SSH into a GCE instance, both versions of the container pull and run just fine. I suspect the INTEGRITY_RULE checking mentioned in the logs, but I know nothing about how that works.
MORE INFO: this is down to "restart policy: never". Even a simple Centos:7 container that says "hello world", deployed from the console, triggers this if the restart policy is never. At least in the short term I can fix this in the entrypoint script, as the instance will be destroyed when the monitor realises that the process has finished.
I suggest you try creating a 3rd container that's focused on the metadata service functionality to isolate the issue. It may be that there's a timing difference between the 2 containers that's not being overcome.
Make sure you can ‘curl’ the metadata service from the VM and that the request to the metadata service is using the VM's service account.
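The metadata check above can also be run from a small script on the VM or inside the container. A minimal sketch, assuming the standard metadata endpoint and the default service account; the function names are hypothetical, and the fetch only works from inside a Compute Engine instance.

```python
import urllib.request

# Standard metadata path for the email of the VM's default service account.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/email"
)


def build_metadata_request(url: str = METADATA_URL) -> urllib.request.Request:
    """Create a request carrying the header the metadata server requires."""
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def fetch_service_account_email(timeout: float = 5.0) -> str:
    """Return the service account email; raises on HTTP errors such as 403."""
    with urllib.request.urlopen(build_metadata_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Calling fetch_service_account_email() on the VM and again inside each container shows whether the 403 is specific to the container environment.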

Google Compute Engine snapshot of instance with persistent disks attached failed

I have a working VM instance that I'm trying to copy to allow redundancy behind google load balancer.
A test run with a dummy instance worked fine, creating a new instance from a snapshot of a running one.
Now, the real "original" instance has a persistent disk attached, and this causes a problem when starting up the cloned instance because of the (obviously) missing persistent disk mount.
The serial console output is as follows:
* Stopping cold plug devices [ OK ]
* Stopping log initial device creation [ OK ]
* Starting enable remaining boot-time encrypted block devices [ OK ]
The disk drive for /mnt/XXXX-log is not ready yet or not present.
keys:Continue to wait, or Press S to skip mounting or M for manual recovery
As I understand it, there is no way to send any of these keystrokes to the instance. Is there any other way to overcome this issue? I know that I could unmount the disk before the snapshot, but the workflow I would like to set up is creating periodic snapshots of production servers, so unmounting disks every time before taking one would require instance downtime (plus all the unnecessary risks of an action that seems pointless).
Is there a way to boot this type of cloned instance successfully and attach a new persistent disk afterwards?
Is this happening because the original persistent disk is in use, or would the same problem occur even if the original instance were offline (for example due to a failure, in which case I would try to create a new instance from a snapshot)?
One workaround that I use to get around the same issue is that I don't actually unmount the disk; I just comment out the mount line in /etc/fstab and take the snapshot. This way my instance has no downtime or unavailable disks while snapshotting. (I am using Ubuntu 14.04 as the OS, if that matters.)
Later I uncomment the line again when I use that snapshot on a new instance.
Alternatively, you can look into adding the nofail option to that mount line instead of commenting it out, which is a cleaner solution.
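To illustrate the nofail idea: the option goes into the fourth field of the fstab entry, so that boot can continue when the device is absent (the exact behaviour depends on the init system). The helper below is a hypothetical sketch that adds the option to a single fstab line; the device and mount point in the usage example are made up, and the rewritten line uses single spaces between fields.

```python
def add_nofail(fstab_line: str) -> str:
    """Return the fstab line with 'nofail' added to its mount options.

    Comments, blank lines and malformed lines are returned unchanged.
    """
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return fstab_line
    fields = stripped.split()
    if len(fields) < 4:
        # Not a full fstab entry (device, mount point, type, options).
        return fstab_line
    options = fields[3].split(",")
    if "nofail" not in options:
        options.append("nofail")
    fields[3] = ",".join(options)
    return " ".join(fields)


if __name__ == "__main__":
    # Hypothetical entry for an extra persistent disk.
    print(add_nofail("/dev/sdb /mnt/data ext4 defaults 0 2"))
```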
By the way, I am doing a similar task: building a load-balanced setup with multiple web server nodes, each cloned from the said snapshot, with extra persistent disks mounted for uploads, data, logs and so on.
I'm a little unclear as to what you're trying to accomplish. It sounds like you're looking to periodically snapshot the data volumes of a production server so you can clone them later.
In all likelihood, you simply need to sync and fsfreeze the volume before you make your snapshot, rather than unmounting and remounting it. The GCP Snapshots documentation has a basic example of this.
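The sync and fsfreeze sequence can be wrapped so that the filesystem is always thawed again, even if the snapshot step fails. This is a rough sketch, not the example from the GCP documentation: the function name and mount point are hypothetical, the commands need root privileges, and the snapshot call itself is left to whatever tool you use.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def frozen_filesystem(mount_point: str, run=subprocess.run):
    """Flush pending writes and freeze mount_point while the with-block runs."""
    run(["sync"], check=True)                         # flush dirty pages to disk
    run(["fsfreeze", "-f", mount_point], check=True)  # block new writes
    try:
        yield
    finally:
        # Always unfreeze, otherwise writers on the volume stay blocked.
        run(["fsfreeze", "-u", mount_point], check=True)
```

A typical use would be a with frozen_filesystem("/mnt/data"): block that triggers the snapshot of the corresponding disk and nothing else, so the freeze stays as short as possible.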

MobileFirst container is stuck in my Bluemix space though I deleted it

I created a container using the evaluation image for MobileFirst (I built an image and pushed it), then deleted the container. Although I deleted that container, it still shows in my Dashboard with state "Unknown". What is worse, it is taking 1 GB of memory out of my 2 GB quota. So I am not able to create a new container with memory >= 1 GB, nor am I able to delete the one in the "Unknown" state. I tried logging out and using a different browser, with no luck.
The result of "ice ps" is zero rows.
The command
ice ps
returns only running containers.
Please try to run the following command
ice ps -a
This one lists all containers in your space.
If you see your container listed use the following command to remove it:
ice rm <container id> --force
If this solution does not work for you, I suggest you open a ticket with IBM Bluemix Support so that someone there can help you delete the container. Here is the link to create a support ticket:
https://developer.ibm.com/bluemix/support/#support