h2o4gpu does not run ensemble.RandomForestClassifier on the GPU

I built a random forest classifier using the sklearn Python API. After hearing about the h2o4gpu package for sklearn GPU acceleration, I installed and imported it, but ensemble.RandomForestClassifier still seems to be running on the CPU. Am I missing something?

Depending on the parameters you specify, h2o4gpu can fall back to the sklearn (CPU) implementation. You can also force the h2o4gpu backend by passing
backend='h2o4gpu'
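A minimal sketch of what that looks like, assuming the sklearn-style h2o4gpu wrapper and the backend parameter described above (the toy dataset is just for illustration):

import h2o4gpu
from sklearn.datasets import make_classification

# Toy data just to exercise the classifier.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# backend='h2o4gpu' forces the GPU backend; the default 'auto' may quietly
# fall back to the sklearn CPU implementation when a parameter has no GPU
# equivalent.
clf = h2o4gpu.RandomForestClassifier(backend='h2o4gpu')
clf.fit(X, y)
print(clf.predict(X[:5]))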

Related

Using a PyTorch model trained on an RTX 2080 on an RTX 3060

I am trying to run my PyTorch model (trained on an Nvidia RTX 2080) on the newer Nvidia RTX 3060 with CUDA support. It is possible to load the model and execute it. If I run it on the CPU with the --no_cuda flag it runs smoothly and gives back the correct predictions, but if I run it with CUDA it only returns wrong predictions which make no sense.
Does the different GPU architecture of the cards affect the prediction?
It turned out that the problem was the different floating-point behaviour of the two architectures. The flag torch.backends.cuda.matmul.allow_tf32 = False needs to be set to get stable execution of a model trained on a different architecture.
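A minimal, self-contained sketch of that fix (the tiny linear model and random batch below stand in for the real trained model and data):

import torch
import torch.nn as nn

# Ampere GPUs (RTX 30xx) run float32 matmuls in TF32 by default, which keeps
# only 10 mantissa bits and can change results relative to a Turing card
# (RTX 20xx). Disabling TF32 restores full FP32 matmuls.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False  # equivalent switch for cuDNN convolutions

model = nn.Linear(16, 4).cuda().eval()  # placeholder for the trained model
x = torch.randn(8, 16, device='cuda')   # placeholder input batch
with torch.no_grad():
    preds = model(x)
print(preds)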

What's the meaning of the top number from plot_model of keras?

What's the meaning of the top number "139873604363984" from plot_model of keras?
The topmost block in your diagram represents an implicit Keras input layer. The number itself is probably an internal Python id of the layer object. The appearance of this block is caused by a bug in Keras explained and fixed in this pull request. As of Keras 2.3.1 this seems to be fixed, and implicit layers are no longer shown by default.
If you are still using an older version of Keras, upgrading it by running pip install Keras --upgrade should get rid of this block.
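For reference, a minimal sketch of calling plot_model on a toy model (it needs the pydot and graphviz packages installed; the architecture here is just an illustration):

from keras.models import Sequential
from keras.layers import Dense
from keras.utils import plot_model

# A toy model; with Keras >= 2.3.1 the implicit input layer (the block
# labelled with a bare Python id) is no longer drawn.
model = Sequential([
    Dense(10, activation='relu', input_shape=(4,)),
    Dense(1, activation='sigmoid'),
])
plot_model(model, to_file='model.png', show_shapes=True)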

Can I use parallel ARPACK in scipy?

I've been using scipy.sparse.linalg.eigs on some large matrices, and not surprisingly, it takes a while. So I've been looking for ways to speed it up. My understanding is that, under the hood, the scipy code uses ARPACK, and there is a parallel version of ARPACK which uses MPI. Is it possible to have scipy use the parallel version of ARPACK without too much pain? If so, how?
(I should note that MATLAB's equivalent of eigs does seem to be multithreaded, so that may be the least painful option.)
It would seem that the (MPI-)parallel version of ARPACK is an entirely different project called PARPACK:
"A parallel version of the ARPACK library is now available. The message passing layers currently supported are BLACS and MPI. Parallel ARPACK (PARPACK) is provided as an extension to the current ARPACK library (Release 2.1)."
Have you looked at petsc4py?
Or maybe even:
"explore calling a parallel sparse linear algebra library like CUSP or cuSPARSE from Python if speed is your concern and you have an NVIDIA GPU."
(see this answer)
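For context, the serial call under discussion looks like the sketch below (the matrix size and density are placeholders); the ARPACK iteration here is single-threaded, so any parallelism comes only from the BLAS backing the dense operations:

import scipy.sparse as sp
from scipy.sparse.linalg import eigs

# A large random sparse matrix, standing in for the real problem.
n = 5000
A = sp.random(n, n, density=1e-3, format='csr', random_state=0)

# Ask ARPACK for the 6 eigenvalues of largest magnitude.
vals, vecs = eigs(A, k=6, which='LM')
print(vals)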

How do I force MATLAB to run deep learning code on the CPU instead of the GPU?

I don't have a CUDA-enabled Nvidia GPU, and I want to force MATLAB to run the code on the CPU instead of the GPU (yes, I know, it will be very, very slow). How can I do it?
As an example, let’s try to run this code on my PC without CUDA. Here is the error given by MATLAB:
There is a problem with the CUDA driver or with this GPU device. Be sure that you have a supported GPU and that the latest driver is installed.
Error in nnet.internal.cnn.SeriesNetwork/activations (line 48)
output = gpuArray(data);
Error in SeriesNetwork/activations (line 269)
YChannelFormat = predictNetwork.activations(X, layerID);
Error in DeepLearningImageClassificationExample (line 262)
trainingFeatures = activations(convnet, trainingSet, featureLayer, ...
Caused by:
The CUDA driver could not be loaded. The library name used was 'nvcuda.dll'. The error was:
The specified module could not be found.
With R2016a, the ConvNet "functionality requires the Parallel Computing Toolbox™ and a CUDA®-enabled NVIDIA® GPU with compute capability 3.0 or higher."
See: http://uk.mathworks.com/help/nnet/convolutional-neural-networks.html
The code example that you link to requires a GPU, so the solution is very simple:
You need to use different code.
Your question does not mention specifically what you are trying to achieve, so it is hard to say whether you would need to create something yourself or could pick up an existing solution, but this CPU vs GPU deep learning benchmark may be an inspiration.

Armadillo linear system solver (with OpenBLAS)

I've been testing various open-source codes for solving a linear system of equations in C++. So far the fastest I've found is Armadillo, used together with the OpenBLAS package. Solving a dense linear NxN system with N=5000 takes around 8.3 seconds on my system, which is really fast (without OpenBLAS installed it takes around 30 seconds).
One reason for this speed-up is that Armadillo+OpenBLAS seems to enable multiple threads. It runs on two of my cores, whereas Armadillo without OpenBLAS only uses one. I have an i7 processor, so I want to increase the number of cores and test it further. I'm using Ubuntu, so following the OpenBLAS documentation I can do in the terminal:
export OPENBLAS_NUM_THREADS=4
However, running the code again doesn't seem to increase the number of cores being used or the speed. Am I doing something wrong, or is two the maximum for Armadillo's solve(A,b) command? I wasn't able to find Armadillo's source code anywhere to take a look.
Incidentally, does anybody know which method Armadillo/OpenBLAS uses for solving Ax=b (standard LU decomposition with parallelism, or something else)? Thanks!
Edit: Actually, the number of cores being stuck at 2 seems to be a bug when installing OpenBLAS with the Synaptic package manager; see here. Reinstalling from source allows it to detect how many cores I actually have (8). Now I can use export OPENBLAS_NUM_THREADS=4 etc. to govern it.
Armadillo doesn't prevent OpenBLAS from using more cores. It's possible that the current implementation of OpenBLAS simply chooses 2 cores for certain operations.
You can see Armadillo's source code directly in the downloadable package (it's open source), in the folder "include". Specifically, have a look at the file "include/armadillo_bits/fn_solve.hpp" (which contains the user-accessible solve() function) and the file "include/armadillo_bits/auxlib_meat.hpp" (which contains the wrapper and housekeeping code for calling the torturous BLAS and LAPACK functions).
If you already have Armadillo installed on your machine, have a look at "/usr/include/armadillo_bits" or "/usr/local/include/armadillo_bits".