TPU not found on Google VM (jax version 0.2.16)


I'm running a TPU v3-8 VM on Google Cloud. On the VM, I installed jax with pip install "jax[tpu]==0.2.16" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html.
Unfortunately, I get the message "No GPU/TPU found, falling back to CPU." when calling jax.device_count(). The same happens with pip install jax==0.2.12. It only works when I use pip install "jax[tpu]>=0.2.16" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html (i.e., the newest jax version). However, I need jax 0.2.12 or 0.2.16 because I want to train GPT-J on a TPU following this tutorial: https://github.com/kingoflolz/mesh-transformer-jax/blob/master/howto_finetune.md
How can I get it running with these versions?

Could you please try explicitly setting TPU_LIBRARY_PATH to the current location of libtpu.so? It is most likely /home/<your username>/.local/lib/python3.8/site-packages/libtpu/libtpu.so
Here is the relevant GitHub issue: https://github.com/google/jax/issues/13321
As mentioned there:
"The underlying problem is that this version of jax still expected libtpu.so to be automatically installed in the VM image (https://github.com/google/jax/blob/jax-v0.2.16/jax/_src/cloud_tpu_init.py#L104), which the TPU VM base image no longer does."

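You can also set the variable from inside the training script instead of exporting it in the shell. The sketch below is a minimal, hypothetical helper. It assumes the jax[tpu] install placed a libtpu package containing libtpu.so in site-packages, matching the path above. It locates that package with importlib without importing it, and it sets TPU_LIBRARY_PATH only if the variable is not already set. The helper has to run before jax is imported, so that the variable is in place when jax sets up its TPU backend.

```python
import importlib.util
import os


def find_libtpu_so():
    """Return the path of libtpu.so inside the installed 'libtpu' package, or None.

    Assumption: the package is named 'libtpu' and ships libtpu.so at its top level,
    as in .../site-packages/libtpu/libtpu.so.
    """
    spec = importlib.util.find_spec("libtpu")  # locates the package without importing it
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = os.path.join(location, "libtpu.so")
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_tpu_library_path():
    """Set TPU_LIBRARY_PATH unless it is already set; return the value now in effect."""
    if os.environ.get("TPU_LIBRARY_PATH"):
        return os.environ["TPU_LIBRARY_PATH"]
    path = find_libtpu_so()
    if path is not None:
        os.environ["TPU_LIBRARY_PATH"] = path
    return path
```

For example, the top of a training script would call configure_tpu_library_path() first, then import jax and check jax.device_count(). On a v3-8 this should report 8 devices if the library was found.
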
Related

IBM Blockchain platform cannot rebuild native dependencies

I'm trying to get the IBM Blockchain Platform extension working in Visual Studio Code (on Linux), but it keeps failing with:
Could not rebuild native dependencies
Failed to execute command "npm" with arguments "rebuild, grpc, --target=6.1.5, --runtime=electron, --update-binary, --fallback-to-build, --target_arch=x64, --dist-url=https://atom.io/download/electron" return code 1. Please ensure that you have node and npm installed
I have node and npm installed
node -v
v10.17.0
and
npm -v
6.11.3
Both satisfy the extension's stated requirements. I have Visual Studio Code version 1.41.1. What could be the issue?

The problem you are experiencing is described in this issue: https://github.com/IBM-Blockchain/blockchain-vscode-extension/issues/1621
There are currently no pre-built versions of grpc (used by the Fabric Node SDK) for Electron 6. When the install falls back to building grpc from source, the grpc node module fails to compile because of changes in newer versions of gcc.
The easiest solution is to downgrade to VS Code 1.39 and install the extension there.
Alternatively, you can install gcc version 7 and make it the default in your Linux environment, or use a Linux distribution that ships gcc 7 as the default, for example Ubuntu 18.04. Either way, grpc can then compile from source.
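If you take the gcc 7 route, confirm which gcc is actually the default before retrying the extension. The sketch below is minimal and assumes gcc is on PATH. The helper names are hypothetical. It only reads the output of gcc -dumpversion, which prints either just the major version (such as 7) or a full version (such as 7.5.0), depending on how gcc was configured.

```python
import re
import shutil
import subprocess


def parse_gcc_major(version_text):
    """Extract the major version from 'gcc -dumpversion' output, e.g. '7.5.0' -> 7."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError(f"unexpected gcc version string: {version_text!r}")
    return int(match.group(1))


def default_gcc_major(compiler="gcc"):
    """Return the major version of the compiler found on PATH, or None if it is missing."""
    if shutil.which(compiler) is None:
        return None
    result = subprocess.run(
        [compiler, "-dumpversion"], capture_output=True, text=True, check=True
    )
    return parse_gcc_major(result.stdout)


if __name__ == "__main__":
    major = default_gcc_major()
    if major is None:
        print("gcc not found on PATH")
    else:
        # The answer above recommends gcc 7 as the default for building grpc from source.
        print(f"default gcc major version: {major}", "(OK)" if major == 7 else "(not 7)")
```
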

How does the snapcraft nodejs plugin handle the Node.js environment when you create an app snap for a different OS?

I'm trying to understand how the nodejs plugin I use to snap my Node.js app handles the Node.js environment. For example, in this application:
parts:
  webserver:
    source: .
    plugin: nodejs
    nodejs-version: "12.13.1"
    nodejs-package-manager: "yarn"
    nodejs-yarn-version: "v1.21.1"
Here I specify Node.js v12.13.1 and Yarn v1.21.1, and then run these commands:
snapcraft clean
snapcraft --debug
snap install my-snap-file.snap --dangerous
I can now run the command/service on my machine (amd64, Ubuntu 16.04 LTS), which has Node.js v12.2.0 installed. However, I can't find the Node.js environment either in a Multipass instance or on another machine running Ubuntu Core 18. I can't run commands such as node --version, and the snap app itself doesn't work either, neither the command nor the service.
I discovered another problem while digging into the Ubuntu Core 18 environment on a Raspberry Pi 3. After installing my snap with the Node.js app, I cannot run the ./node executable in the folder /snap//bin. I get the error:
./node: 1: ./node: Syntax error: "(" unexpected
My questions are:
Why do I get the ./node error?
Does my-snap-file.snap bundle Node.js v12.13.1 inside the snap?
How can I test that Node.js is working with the right version in Multipass and on other machines where I've installed only the snap that bundles Node.js?
Thanks

Thanks to forum.snapcraft.io, I've solved the issue, and I'm posting the answer here to help others with this kind of problem. The error occurs because the snap was NOT built on the architecture you want to run it on. Make sure you build for a given architecture on that architecture, e.g. build the armhf (Raspberry Pi 3) version on an armhf device. My snapcraft.yaml contained:
architectures:
  - build-on: amd64
    run-on: [amd64, armhf]
"Here you tell snapcraft that building on amd64 produces binary snaps that can run on both amd64 and armhf, which is not true: building on amd64 pulls in only amd64 binaries. I'd drop that statement completely and make sure to build the armhf version on an armhf device (or on build.snapcraft.io)." (Credit: Ogra)
For details, read the forum post about architectures in snapcraft.yaml:
https://forum.snapcraft.io/t/architectures/4972
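This also explains the "Syntax error" message. When the kernel refuses to execute a binary built for a different architecture, the shell falls back to interpreting the file as a shell script, so the failure looks like a shell syntax error. To confirm which architecture the bundled node binary was built for, read the e_machine field of its ELF header. The sketch below is minimal and uses hypothetical function names. It assumes Python 3 is available where you run it; otherwise, copy the binary to another machine and check it there. The e_machine values are the standard ones from the ELF specification.

```python
import struct
import sys

# A few standard ELF e_machine values.
ELF_MACHINES = {
    3: "i386",
    40: "ARM 32-bit (e.g. armhf)",
    62: "x86-64 (amd64)",
    183: "AArch64 (arm64)",
}


def elf_architecture_from_header(header):
    """Return a readable architecture name from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF binary")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    order = "<" if header[5] == 1 else ">"
    # e_machine is the 16-bit field at offset 18, right after the 16-byte e_ident and e_type.
    (machine,) = struct.unpack(order + "H", header[18:20])
    return ELF_MACHINES.get(machine, f"unknown (e_machine={machine})")


def elf_architecture(path):
    """Read the ELF header of the file at `path` and report its architecture."""
    with open(path, "rb") as f:
        return elf_architecture_from_header(f.read(20))


if __name__ == "__main__":
    for binary in sys.argv[1:]:
        print(binary, "->", elf_architecture(binary))
```

Pass it the path of the node binary inside the installed snap. If it reports x86-64 on the Raspberry Pi, the snap was built for amd64.
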

Chaquopy upgrade pip

While running the latest Chaquopy, I get the following error:
Collecting tensorflow==1.13.1
Could not find a version that satisfies the requirement tensorflow==1.13.1 (from versions: 1.10.1)
No matching distribution found for tensorflow==1.13.1
I'm wondering whether Chaquopy's internal pip is too old and needs to be upgraded. How can I do that?

The issue isn't the version of pip, it's the version of TensorFlow. Try changing your project to use version 1.10.1 instead, as the message suggests.

Lustre Client on Linux Kernel 4+

Does anyone know whether it is possible to install the Lustre client software on a Linux machine with kernel 4+? From what I have tried so far, all the working examples are on kernel 3.10. If I try to install kmod-lustre-client on a 4+ machine, it fails with:
rpm -ivh kmod-lustre-client-2.10.5-1.el7.x86_64.rpm
error: Failed dependencies:
kernel < 3.10.0-863 is needed by kmod-lustre-client-2.10.5-1.el7.x86_64
kernel(PDE_DATA) = 0x44f0d59d is needed by kmod-lustre-client-2.10.5-1.el7.x86_64

According to lustre/ChangeLog in the b2_10 branch, it works with kernels as new as 4.4.133-94.33 (SLES12 SP3) and 4.4.0-131 (Ubuntu 16.04).
If you are using a newer kernel, you also need a newer version of Lustre. The lustre/ChangeLog at the tip of master (close to the 2.12 release) reports support for kernel 4.15.0-32 (Ubuntu 18.04).
It looks like you are trying to install a binary kernel module RPM built for the RHEL7 kernel on a non-RHEL kernel. That is never going to work. You need to either get the right RPMs/Debs for your kernel from https://lustre.org/download/ or download the source and rebuild it for your kernel.
The 2.10.x releases are currently the maintained LTS releases (bugfixes are backported to them), while 2.11.0 is a feature release that does not get backported bugfixes.
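The first rpm dependency in the question is a plain version ceiling, "kernel < 3.10.0-863", so every 4.x kernel is rejected before anything else is checked. The sketch below illustrates that comparison roughly, using hypothetical helper names. It uses a simplified parser, not rpm's own version-comparison algorithm, and it ignores the second requirement (the kernel(PDE_DATA) symbol version) entirely.

```python
import platform
import re

# Upper bound taken from the rpm error: "kernel < 3.10.0-863 is needed".
KMOD_MAX_KERNEL = (3, 10, 0, 863)


def kernel_version_tuple(release):
    """Turn a kernel release string into (major, minor, patch, build).

    Simplification: only the leading "X.Y.Z-N" part is used; suffixes such as
    ".el7.x86_64" or "-generic" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", release)
    if match is None:
        raise ValueError(f"cannot parse kernel release {release!r}")
    major, minor, patch, build = match.groups()
    return (int(major), int(minor), int(patch), int(build or 0))


def kmod_rpm_accepts(release, upper=KMOD_MAX_KERNEL):
    """True if the kernel satisfies the RPM's 'kernel < 3.10.0-863' requirement."""
    return kernel_version_tuple(release) < upper


if __name__ == "__main__":
    running = platform.release()
    print(running, "->", "accepted" if kmod_rpm_accepts(running) else "rejected")
```
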

Error F_SETLK when building bazel on CentOS 6.5

I am working on building and installing tensorflow on my institution's cluster computer, which is running CentOS 6.5.
Obviously, the first step is building and installing bazel. The build works just fine, but when I try to run the bazel binary, I get the following error:
Error: unexpected result from F_SETLK: Function not implemented
gcc version is 4.7.2
java version is jdk1.8.0_65
Edit: I have also tried compiling gcc 4.9.4 and building with that version, and I have tried building both the latest Bazel distribution and 0.3.1 from the git repo. All variants produce the same error.

This happens if the filesystem where Bazel tries to install itself (unpack its embedded tools) doesn't support locking.
The workaround (until the relevant issue is resolved) is to pass a path on a local, writable (and lockable) filesystem to --output_user_root, for example:
bazel --output_user_root=/usr/local/$USER/bazelout build <targets>
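Before picking a directory for --output_user_root, you can check whether its filesystem supports the same kind of lock that fails in the error message. The sketch below is minimal and uses a hypothetical helper name. In CPython, fcntl.lockf called with LOCK_NB issues fcntl(F_SETLK), which matches the call named in Bazel's error. ENOSYS is the "Function not implemented" error from the question. Treating ENOLCK as "no lock support" as well is an assumption.

```python
import errno
import fcntl
import sys
import tempfile

# errno values treated as "this filesystem does not support POSIX record locks".
# ENOSYS is the error from the question; ENOLCK is an assumption.
UNSUPPORTED_LOCK_ERRNOS = {errno.ENOSYS, errno.ENOLCK}


def supports_posix_locks(directory):
    """Try a non-blocking exclusive lock on a scratch file created in `directory`."""
    with tempfile.NamedTemporaryFile(dir=directory) as scratch:
        try:
            # LOCK_NB makes lockf use fcntl(F_SETLK) instead of the blocking F_SETLKW.
            fcntl.lockf(scratch, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in UNSUPPORTED_LOCK_ERRNOS:
                return False
            raise
        fcntl.lockf(scratch, fcntl.LOCK_UN)
        return True


if __name__ == "__main__":
    for path in sys.argv[1:] or [tempfile.gettempdir()]:
        print(path, "supports locking:", supports_posix_locks(path))
```

Run it with the candidate directories as arguments, then pass one that reports True to --output_user_root.
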