NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running - server

I have a server running Ubuntu 16.04, with the NVIDIA driver, CUDA, and cuDNN installed a long time ago. Yesterday the server was rebooted, and after logging in through SSH I tried to check the GPUs with nvidia-smi, but it failed.
The error message was "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."
The server has:
Ubuntu 16.04
Nvidia k80c x4
CUDA 9.0
Has anyone faced this problem and can offer a possible solution? I don't really want to reinstall the driver as it could cause other problems.

This problem was eventually solved by reinstalling the NVIDIA GPU driver. The root cause was that the server had shut down because of an unstable power supply while the GPU was in use.
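Before reinstalling, it can help to confirm that the driver's kernel module really is what is missing. The following is only a minimal sketch and not part of the original answer: the helper name `nvidia_module_listed` is hypothetical, and it checks nothing more than whether a module named `nvidia` appears in `/proc/modules`-style text.

```shell
# Hypothetical helper: reads /proc/modules-style text on stdin and
# succeeds if a kernel module named exactly "nvidia" is listed.
nvidia_module_listed() {
    grep -q '^nvidia '
}

# Demonstration on sample text (the real check would read /proc/modules).
sample='nvidia_uvm 1 0 - Live 0x0
nvidia 1 1 nvidia_uvm, Live 0x0'
if printf '%s\n' "$sample" | nvidia_module_listed; then
    echo "nvidia module is listed"
else
    echo "nvidia module is not listed"
fi
```

On the server itself, `nvidia_module_listed < /proc/modules` performs the real check. If the module is absent after a reboot and the driver was installed through DKMS, `dkms status` shows whether it was built for the kernel reported by `uname -r`.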

Related

Ubuntu 22.04 LTS automatic shutdown when log off to use RDP

I am experiencing some issues with Ubuntu 22.04 LTS.
I log out of the Ubuntu server and then connect to it using Remote Desktop from Windows 11. Logging out first is required to be able to connect over RDP.
However, after a short period of time, the server shuts down or goes into sleep mode. I need to physically turn it off and on again.
Can someone please explain why this happens and suggest a possible fix?
Thank you.

Building Flutter Engine on Ubuntu

I use Ubuntu 20.04. When I run step 7 of https://github.com/flutter/flutter/wiki/Setting-up-the-Engine-development-environment, I get the following error:
sudo ./build/install-build-deps-android.sh
ERROR: Only Ubuntu 12.04 (precise), 14.04 (trusty), 14.10 (utopic), 15.04 (vivid), 16.04 (xenial), 18.04 (bionic), and Debian (rodete and stretch) are currently supported
And the doc says:
If you're on Linux, run the following. Note: These scripts are distro- and version-specific, so are not guaranteed to work on every configuration. If they fail, you may need to find comparable packages to the ones that weren't found.
Based on the documentation, how do I know which packages are not compatible?
Maybe you could modify the install-build-deps-android.sh script to accept 20.04 and then try to continue with the build process?
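The error is a release check, not a report of missing packages, so it says nothing yet about which packages are incompatible. As a rough sketch of what the check amounts to (the function name `is_supported_codename` is hypothetical, and the codename list is copied from the error message above, not from the script's source):

```shell
# Hypothetical helper: succeeds if the given release codename is one of
# those named in the error message quoted above.
is_supported_codename() {
    case "$1" in
        precise|trusty|utopic|vivid|xenial|bionic|rodete|stretch) return 0 ;;
        *) return 1 ;;
    esac
}

# Ubuntu 18.04 is "bionic"; Ubuntu 20.04 is "focal".
for codename in bionic focal; do
    if is_supported_codename "$codename"; then
        echo "$codename: supported"
    else
        echo "$codename: not in the supported list"
    fi
done
```

On the machine itself, `lsb_release -cs` prints the codename to pass in, assuming the `lsb_release` tool is installed.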

Google VM Instance Stopped Working after Upgrade

I have a Google VM instance which was running flawlessly. It has CentOS 7 and Plesk installed on it.
I just stopped it, upgraded the machine type (to a better CPU and more RAM), and started it again. My server stopped responding entirely. No websites are running, I can't connect over SSH, and Google Cloud SDK Shell is unable to reach the server. It says NETWORK ERROR, CONNECTION TIMED OUT. All my other instances work well.
I tried rebooting and resetting multiple times. I have been reading material on the internet for the last 6 hours, but with no luck. I also tried cloning the instance's disk and creating a new instance from the cloned disk, but that gave the same network connection issue. Maybe something in the OS got corrupted? Please suggest a fix. I have a number of websites hosted on the server which are down because of this. Thanks a lot in advance.
I took a screenshot of my VM using Google Cloud Shell. It is as follows:
I connected with serial console which is as follows:
While creating a ticket with Google, I found that they had posted some information under "Known Issues". I am pasting it in full because there is no direct link to it. The symptoms they described were exactly what was happening to me:
Below is the known issue posted by Google:
Description:
We are experiencing an issue with Google Compute Engine instances running RHEL and CentOS 7 and 8. More details on this issue are available in the following article and bugs:
https://access.redhat.com/solutions/5272311
https://bugzilla.redhat.com/show_bug.cgi?id=1861977 (RHEL 8)
https://bugzilla.redhat.com/show_bug.cgi?id=1862045 (RHEL 7)
Symptoms: Instances running RHEL and CentOS 7 and 8 that run yum update may fail to boot after restart with errors messages referring to a combination of: "X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID", "FXSAVE_STATE" or "Find image based on IP". This issue affects instances with specific versions of the shim package installed. To find the currently installed shim version, use the following command:
rpm -q shim-x64
Affected shim versions:
CentOS 7: shim-x64-15-7.el7_9.x86_64
CentOS 8: shim-x64-15-13.el8.x86_64
RHEL 7: shim-x64-15-7.el7_8.x86_64
RHEL 8: shim-x64-15-14.el8_2.x86_64
Workaround: Do not update or reboot instances running RHEL or CentOS 7 and 8. If you are on an affected shim version, run yum downgrade shim\* grub2\* mokutil to downgrade to the correct version. This command may not work on CentOS 8. If you have already rebooted, you will need to attach the disk to another instance, chroot into the disk, then run the yum downgrade command.
We will provide an update by Thursday, 2020-07-30 14:00 US/Pacific with current details.
Start time:
July 30, 2020 at 9:08:34 PM GMT+5
How to diagnose:
Instances running RHEL and CentOS 7 and 8 that run yum update may fail to boot after restart with errors messages referring to a combination of: "X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID", "FXSAVE_STATE" or "Find image based on IP". This issue affects instances with specific versions of the shim package installed. To find the currently installed shim version, use the following command:
rpm -q shim-x64
Affected shim versions:
CentOS 7: shim-x64-15-7.el7_9.x86_64
CentOS 8: shim-x64-15-13.el8.x86_64
RHEL 7: shim-x64-15-7.el7_8.x86_64
RHEL 8: shim-x64-15-14.el8_2.x86_64
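The version comparison described above can be scripted. This is a small sketch, not something from Google's notice: the function name `is_affected_shim` is hypothetical, and the four package strings are copied verbatim from the list of affected versions.

```shell
# Hypothetical helper: succeeds if the given package string (the form
# printed by "rpm -q shim-x64") is one of the affected versions listed above.
is_affected_shim() {
    case "$1" in
        shim-x64-15-7.el7_9.x86_64)  return 0 ;;  # CentOS 7
        shim-x64-15-13.el8.x86_64)   return 0 ;;  # CentOS 8
        shim-x64-15-7.el7_8.x86_64)  return 0 ;;  # RHEL 7
        shim-x64-15-14.el8_2.x86_64) return 0 ;;  # RHEL 8
        *) return 1 ;;
    esac
}

# Demonstration with the CentOS 7 string from the notice.
if is_affected_shim "shim-x64-15-7.el7_9.x86_64"; then
    echo "affected"
else
    echo "not in the affected list"
fi
```

On a running instance, the argument would come from the command in the notice: `is_affected_shim "$(rpm -q shim-x64)"`.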
Workaround:
Do not update or reboot instances running RHEL or CentOS 7 and 8. If you are on an affected shim version, run yum downgrade shim\* grub2\* mokutil to downgrade to the correct version. This command may not work on CentOS 8. If you have already rebooted, you will need to attach the disk to another instance, chroot into the disk, then run the yum downgrade command.
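For an instance that has already rebooted, the attach-and-chroot route might look roughly like the following. This is a dry-run sketch under stated assumptions: the function name `print_rescue_steps`, the device `/dev/sdb1`, and the mount point `/mnt/rescue` are all hypothetical, and the function only prints the commands so they can be reviewed first. Real disks may need extra steps that are not shown, such as mounting a separate boot or EFI partition and making DNS work inside the chroot.

```shell
# Hypothetical dry-run helper: prints (does not run) the rescue commands.
# $1: partition of the broken disk as seen on the rescue instance.
# $2: directory to mount it on.
print_rescue_steps() {
    dev=$1
    mnt=$2
    cat <<EOF
sudo mkdir -p $mnt
sudo mount $dev $mnt
sudo mount --bind /dev $mnt/dev
sudo mount --bind /proc $mnt/proc
sudo mount --bind /sys $mnt/sys
sudo chroot $mnt yum downgrade 'shim*' 'grub2*' mokutil
EOF
}

print_rescue_steps /dev/sdb1 /mnt/rescue
```

The quoted `'shim*'` and `'grub2*'` are equivalent to the backslash-escaped forms in the notice: both keep the local shell from expanding the wildcards before yum sees them.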

Can't start minikube v1.1.1 on Ubuntu 19.04

Problem with starting minikube 1.1.1 on Ubuntu 19.04:
X Unable to start VM
* Error: [VBOX_HOST_ADAPTER] start: Error setting up host only network on machine start: The host-only adapter we just created is not visible. This is a well known VirtualBox bug. You might want to uninstall it and reinstall at least version 5.0.12 that is is supposed to fix this issue
I've seen the question "How to fix VM issue with minikube start?", but I'm running Ubuntu 19.04, not OS X. I've also gone through "Minikube not starting on Ubuntu, throwing errors" a couple of times.
I've installed the dkms package, I've modprobed vbox-dkms into the kernel, and UEFI Secure Boot was set up as part of the dkms installation, but I still can't get minikube to start.
Does anyone have any ideas where to look next or what the problem might be?
P.S. I'm running VirtualBox 6.0.6_Ubuntu r129722.
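One place to look next is whether every VirtualBox host kernel module actually loaded, since host-only networking depends on `vboxnetadp` in addition to the main `vboxdrv` driver. The following is a sketch only: the helper name `missing_vbox_modules` is hypothetical, and it reads `/proc/modules`-style text from stdin.

```shell
# Hypothetical helper: reads /proc/modules-style text on stdin and prints
# the name of each VirtualBox host module that is not listed.
missing_vbox_modules() {
    loaded=$(cut -d' ' -f1)
    for mod in vboxdrv vboxnetflt vboxnetadp; do
        printf '%s\n' "$loaded" | grep -qx "$mod" || echo "$mod"
    done
}

# Demonstration on sample text in which vboxnetadp is absent.
printf 'vboxdrv 1 0 - Live 0x0\nvboxnetflt 1 0 - Live 0x0\n' | missing_vbox_modules
```

On the affected machine, `missing_vbox_modules < /proc/modules` runs the real check. With Secure Boot enabled, a module that is built but not signed with an enrolled key can fail to load, and `mokutil --sb-state` reports whether Secure Boot is active.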

Vagrant Stopped Working, Says VirtualBox Version is Not Supported

I have Vagrant 1.6.3 and VirtualBox 4.3.12 on Windows 8.1 Pro (64-bit). Vagrant was working fine until yesterday.
Suddenly, it started giving this error on vagrant up (in PowerShell):
The provider 'virtualbox' that was requested to back the machine
'default' is reporting that it isn't usable on this system. The
reason is shown below:
Vagrant has detected that you have a version of VirtualBox installed
that is not supported. Please install one of the supported versions
listed below to use Vagrant:
4.0, 4.1, 4.2, 4.3
I have tried the following:
Uninstall both Vagrant and VirtualBox (5-6 times) and reinstall. I rebooted after every install/uninstall.
Tried different (latest as well as older) versions of both Vagrant and VirtualBox. I have tried Vagrant 1.6.3 and Vagrant 1.6.5, VirtualBox 4.3.16, 4.3.12, 4.3.10.
Disabling the antivirus and firewall. (Both Vagrant and VirtualBox are added to the exclusion list.)
Checking that VirtualBox is added in Environment Path.
Is there anything I am missing? I tried searching, and the only two relevant links I found were: "Vagrant has detected that you have a version of VirtualBox installed that is not supported" (the fix there does not work for me) and "Vagrant has detected that the VirtualBox installed is not supported" (fixed in Vagrant 1.5).
What should I do now?
The problem was the firewall. I had recently switched to Comodo Firewall, and it was causing the issue.