SegNet - CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR - neural-network

Im trying to train my own dataset on SegNet (with caffe), I prepared the dataset same as segnet tutorial. when I try to run the train, it shows me this error:
I0915 08:33:50.851986 49060 net.cpp:482] Collecting Learning Rate and Weight Decay.
I0915 08:33:50.852017 49060 net.cpp:247] Network initialization done.
I0915 08:33:50.852030 49060 net.cpp:248] Memory required for data: 1064448016
I0915 08:33:50.852730 49060 solver.cpp:42] Solver scaffolding done.
I0915 08:33:50.853065 49060 solver.cpp:250] Solving VGG_ILSVRC_16_layer
I0915 08:33:50.853080 49060 solver.cpp:251] Learning Rate Policy: step
F0915 08:33:51.324506 49060 math_functions.cu:123] Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR
*** Check failure stack trace: ***
# 0x7fa27a0d3daa (unknown)
# 0x7fa27a0d3ce4 (unknown)
# 0x7fa27a0d36e6 (unknown)
# 0x7fa27a0d6687 (unknown)
# 0x7fa27a56946e caffe::caffe_gpu_asum<>()
# 0x7fa27a54b264 caffe::SoftmaxWithLossLayer<>::Forward_gpu()
# 0x7fa27a440b29 caffe::Net<>::ForwardFromTo()
# 0x7fa27a440f57 caffe::Net<>::ForwardPrefilled()
# 0x7fa27a436745 caffe::Solver<>::Step()
# 0x7fa27a43707f caffe::Solver<>::Solve()
# 0x406676 train()
# 0x404bb1 main
# 0x7fa2795e5f45 (unknown)
# 0x40515d (unknown)
# (nil) (unknown)
my dataset is .jpg (train) .png (labels gray-scale images) and .txt file as in the tutorial. what can be the problem? thanks for helping

The ground-truth images should be 1 channel 0-255 images without alpha layer, so the NN will recognise the difference between the classes.
img = Image.open(filename).convert('L') # Not 'LA' (A - alpha)

Thanks to isn4, Here is the resolution:
Turns out you have to change the range of the pixel values as well as the actual number of pixel values. Segnet gets confused if you have 256 possible pixel values (0-255) and don't have class weightings for each of them. So I changed all of my PNG label images from 255 and 0 as the pixel possibilities to 1 and 0 as the pixel possibilities.
Here's my python script for doing so:
import os
import cv2
import numpy as np
img = cv2.imread('/usr/local/project/old_png_labels/label.png, 0)
a_img = np.array(img, np.double)
normalized = cv2.normalize(img, a_img, 1.0, 0.0, cv2.NORM_MINMAX)
cv2.imwrite('/usr/local//project/png_labels/label.png, normalized)

Related

A bug is encountered with the "Maskable PPO" training with custom Env setup

I encountered an error while using SB3-contrib Maskable PPO action masking algorithm.
File ~\anaconda3\lib\site-packages\sb3_contrib\common\maskable\distributions.py:231, in MaskableMultiCategoricalDistribution.apply_masking(self, masks)
228 masks = th.as_tensor(masks)
230 # Restructure shape to align with logits
--> 231 masks = masks.view(-1, sum(self.action_dims))
233 # Then split columnwise for each discrete action
234 split_masks = th.split(masks, tuple(self.action_dims), dim=1)
RuntimeError: shape '[-1, 1600]' is invalid for input of size 800
I am running learning progamme with an action being a MultiBinary space with 800 selections of 0, 1.
The action space is defined as below:
self.action_space = spaces.MultiBinary(800)
Within the custom environment class, an "action_mask" function was created such that it returns a List of 800 boolean values.
Now, when I follow the document and start to train the model, the error message pops:
from sb3_contrib import MaskablePPO
from Equities_RL_Env import Equities_RL_Env
import time
from sb3_contrib.common.maskable.utils import get_action_masks
models_dir = f"models/V1 31-Jul/"
logdir = f"logs/{time.strftime('%d %b %Y %H-%M',time.localtime())}/"
if not os.path.exists(models_dir):
os.makedirs(models_dir)
if not os.path.exists(logdir):
os.makedirs(logdir)
env = Equities_RL_Env(Normalize_frame(historical_frame), pf)
env.reset()
model = MaskablePPO('MlpPolicy', env, verbose=1, tensorboard_log=logdir)
TIMESTEPS = 1000
iters = 0
while iters <= 1000000:
iters += 1
model.learn(total_timesteps=TIMESTEPS, reset_num_timesteps=False, tb_log_name=f"PPO")
model.save(f"{models_dir}/{TIMESTEPS*iters}")
May I know is there a way to define that shape within the custom environment?

Using numba in a method to randomize matrices

I'm still not very familiar with numba and my problem is that I have the piece of code bellow that I use for randomize the edges of graphs.
This code is simply used to swap some edges in a connectivity matrix given the number of desired swaps and a seed for the random number generator.
My problem is that when I try to use it with numba to speed it up I did not menage to run it. The error it returns is also pasted bellow.
#nb.jit(nopython=True)
def _randomize_adjacency_wei(A, n_swaps, seed):
np.random.seed(seed)
# Number of nodes
n_nodes = A.shape[0]
# Copy the adj. matrix
Arnd = A.copy()
# Choose edges that will be swaped
edges = np.random.choice(n_nodes, size=(4, n_swaps), replace=True).T
#itr = range(n_swaps)
#for it in tqdm(itr) if verbose else itr:
it = 0
for it in range(n_swaps):
i,j,k,l = edges[it,:]
if len(np.unique([i,j,k,l]))<4:
continue
else:
# Old values of weigths
w_ij,w_il,w_kj,w_kl=Arnd[i,j],Arnd[i,l],Arnd[k,j],Arnd[k,l]
# Swaping edges
Arnd[i,j]=Arnd[j,i]=w_il
Arnd[k,l]=Arnd[l,k]=w_kj
Arnd[i,l]=Arnd[l,i]=w_ij
Arnd[k,j]=Arnd[j,k]=w_kl
return Arnd
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function unique at 0x7f1a1c03b0d0>) found for signature:
>>> unique(list(int64)<iv=None>)
There are 2 candidate implementations:
- Of which 2 did not match due to:
Overload in function 'np_unique': File: numba/np/arrayobj.py: Line 1915.
With argument(s): '(list(int64)<iv=None>)':
Rejected as the implementation raised a specific error:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Unknown attribute 'ravel' of type list(int64)<iv=None>
File "../../../home/vinicius/anaconda3/lib/python3.8/site-packages/numba/np/arrayobj.py", line 1918:
def np_unique_impl(a):
b = np.sort(a.ravel())
^
During: typing of get attribute at /home/vinicius/anaconda3/lib/python3.8/site-packages/numba/np/arrayobj.py (1918)
File "../../../home/vinicius/anaconda3/lib/python3.8/site-packages/numba/np/arrayobj.py", line 1918:
def np_unique_impl(a):
b = np.sort(a.ravel())
^
raised from /home/vinicius/anaconda3/lib/python3.8/site-packages/numba/core/typeinfer.py:1071
During: resolving callee type: Function(<function unique at 0x7f1a1c03b0d0>)
During: typing of call at <ipython-input-165-90ffd30fe0e8> (19)
File "<ipython-input-165-90ffd30fe0e8>", line 19:
def _randomize_adjacency_wei(A, n_swaps, seed):
<source elided>
i,j,k,l = edges[it,:]
if len(np.unique([i,j,k,l]))<4:
^
Thanks in advance,
Vinicius
According to the comments, you are passing a list to np.unique() but this is not supported by Numba.
Modifying the code this way:
i, j, k, l = e = edges[it, :]
if len(np.unique(e)) < 4:
...
The following example doesn't produce any errors:
>>> A = np.random.randint(0, 5, (8,8))
>>> r = _randomize_adjacency_wei(A, 4, 33)

MATLAB H5 files cannot exceed 2GB

If I create a h5 file larger than 2GB, I get an error. Smaller than 2GB things work fine. I didn't think this filesize was a problem with the h5 format. This seems to not be a problem on my windows 8.1 machine, but is on my Ubuntu 20.04 machine.
MWE:
fname = 'tmp001.h5';
h5create(fname,'/DS1',[195 2048 1500],'DataType','single'); % doesn't work
h5write(fname,'/DS1',1,[1 1 1],[1 1 1]);
fname = 'tmp002.h5';
h5create(fname,'/DS1',[195 2048 1400],'DataType','single');% doesn't work
h5write(fname,'/DS1',1,[1 1 1],[1 1 1]);
fname = 'tmp003.h5';
h5create(fname,'/DS1',[195 2048 1300],'DataType','single');% does work
h5write(fname,'/DS1',1,[1 1 1],[1 1 1]);
This only seems to be a problem if I give it only a segment of the data set, and even then it works providing the size of the data is more than 2^14! See files 4 and 5:
fname = 'tmp004.h5';
h5create(fname,'/DS1',[536886273],'DataType','single');% does work
h5write(fname,'/DS1',rand(536886273,1)); % filesize > 2gb but length > 2^14
fname = 'tmp005.h5';
h5create(fname,'/DS1',[536886273],'DataType','single');% doesn't work
h5write(fname,'/DS1',1,1,1); % filesize > 2gb but length < 2^14
fname = 'tmp006.h5';
h5create(fname,'/DS1',[536886272],'DataType','single');% does work
h5write(fname,'/DS1',1,1,1); % length > 2^14 but okay since filesize < 2gb
Error:
Warning: The following error was caught while executing 'H5ML.id' class destructor:
Error using hdf5lib2
The HDF5 library encountered an error and produced the following stack trace information:
H5FD_truncate driver truncate request failed
H5F_dest low level truncate failed
H5F_try_close problems closing file
H5O_close problem attempting file close
H5D_close unable to release object header
H5I_dec_app_ref can't decrement ID ref count
H5I_dec_app_ref_always_close can't decrement ID ref count
H5Dclose can't decrement count on dataset ID
Error in H5ML.id/close (line 47)
H5ML.hdf5lib2(obj.callback, obj.identifier);
Error in H5ML.id/delete (line 36)
obj.close();

Keras: What is the correct data format for recurrent networks?

I am trying to build a recurrent network which classifies sequences (multidimensional data streams). I must be missing something, since while running my code:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Activation
import numpy as np
ils = 10 # input layer size
ilt = 11 # input layer time steps
hls = 12 # hidden layer size
nhl = 2 # number of hidden layers
ols = 1 # output layer size
p = 0.2 # dropout probability
f_a = 'relu' # activation function
opt = 'rmsprop' # optimizing function
#
# Building the model
#
model = Sequential()
# The input layer
model.add(LSTM(hls, input_shape=(ilt, ils), return_sequences=True))
model.add(Activation(f_a))
model.add(Dropout(p))
# Hidden layers
for i in range(nhl - 1):
model.add(LSTM(hls, return_sequences=True))
model.add(Activation(f_a))
model.add(Dropout(p))
# Output layer
model.add(LSTM(ols, return_sequences=False))
model.add(Activation('softmax'))
model.compile(optimizer=opt, loss='binary_crossentropy')
#
# Making test data and fitting the model
#
m_train, n_class = 1000, 2
data = np.array(np.random.random((m_train, ilt, ils)))
labels = np.random.randint(n_class, size=(m_train, 1))
model.fit(data, labels, nb_epoch=10, batch_size=32)
I get output (truncated):
Using Theano backend.
line 611, in __call__
node = self.make_node(*inputs, **kwargs)
File "/home/koala/.local/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 430, in make_node
new_inputs.append(format(outer_seq, as_var=inner_seq))
File "/home/koala/.local/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 422, in format
rval = tmp.filter_variable(rval)
File "/home/koala/.local/lib/python2.7/site-packages/theano/tensor/type.py", line 233, in filter_variable
self=self))
TypeError: Cannot convert Type TensorType(float32, 3D) (of Variable Subtensor{:int64:}.0) into Type TensorType(float32, (False, False, True)). You can try to manually convert Subtensor{:int64:}.0 into a TensorType(float32, (False, False, True)).
Is this a problem with the data format at all.
For me the problem was fixed when I went and tried it on my real dataset. The difference being that in the real dataset I have more than 1 label. So an example of dataset on which this code works is:
(...)
ols = 2 # Output layer size
(...)
m_train, n_class = 1000, ols
data = np.array(np.random.random((m_train, ilt, ils)))
labels = np.random.randint(n_class, size=(m_train, 1))
# Make labels onehot
onehot_labels = np.zeros(shape=(labels.shape[0], ols))
onehot_labels[np.arange(labels.shape[0]), labels.astype(np.int)] = 1

Error : H5LTfind_dataset(file_id, dataset_name_) Failed to find HDF5 dataset label

I want to use HDF5 file to input my data and labels in my CNN.
I created the hdf5 file with matlab.
Here is my code:
h5create(['uNetDataSet.h5'],'/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/train/image',[522 775 3 numFrames]);
h5create(['uNetDataSet.h5'],'/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/train/anno',[522 775 3 numFrames]);
h5create(['uNetDataSet.h5'],'/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/label',[1 numFrames]);`
h5write(['uNetDataSet.h5'],'/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/train/image',images);
h5write(['uNetDataSet.h5'],'/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/train/anno',anno);
h5write(['uNetDataSet.h5'],'/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/label',label);`
Where image, anno are 4D unit8 and label is a 1x85 unit16 vector.
When I display my .h5 file I got this:
HDF5 uNetDataSet.h5
Group '/'
Group '/home'
Group '/home/alexandra'
Group '/home/alexandra/Documents'
Group '/home/alexandra/Documents/my-u-net'
Group '/home/alexandra/Documents/my-u-net/warwick_dataset'
Group '/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset'
Dataset 'label'
Size: 1x85
MaxSize: 1x85
Datatype: H5T_IEEE_F64LE (double)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/train'
Dataset 'anno'
Size: 522x775x3x85
MaxSize: 522x775x3x85
Datatype: H5T_IEEE_F64LE (double)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'image'
Size: 522x775x3x85
MaxSize: 522x775x3x85
Datatype: H5T_IEEE_F64LE (double)
ChunkSize: []
Filters: none
FillValue: 0.000000`
When I read the label dataset with h5read it works.
But when I try to train my network I got this error:
I0713 09:47:18.620510 4278 layer_factory.hpp:77] Creating layer loadMydata
I0713 09:47:18.620535 4278 net.cpp:91] Creating Layer loadMydata
I0713 09:47:18.620550 4278 net.cpp:399] loadMydata -> label
I0713 09:47:18.620580 4278 net.cpp:399] loadMydata -> anno
I0713 09:47:18.620600 4278 net.cpp:399] loadMydata -> image
I0713 09:47:18.620622 4278 hdf5_data_layer.cpp:79] Loading list of HDF5 filenames from: /home/alexandra/Documents/my-u-net/my_data.txt
I0713 09:47:18.620656 4278 hdf5_data_layer.cpp:93] Number of HDF5 files: 1
F0713 09:47:18.621317 4278 hdf5.cpp:14] Check failed: H5LTfind_dataset(file_id, dataset_name_) Failed to find HDF5 dataset label
*** Check failure stack trace: ***
# 0x7f2edf557daa (unknown)
# 0x7f2edf557ce4 (unknown)
# 0x7f2edf5576e6 (unknown)
# 0x7f2edf55a687 (unknown)
# 0x7f2edf908597 caffe::hdf5_load_nd_dataset_helper<>()
# 0x7f2edf907365 caffe::hdf5_load_nd_dataset<>()
# 0x7f2edf9579fe caffe::HDF5DataLayer<>::LoadHDF5FileData()
# 0x7f2edf956818 caffe::HDF5DataLayer<>::LayerSetUp()
# 0x7f2edf94fcbc caffe::Net<>::Init()
# 0x7f2edf950b45 caffe::Net<>::Net()
# 0x7f2edf91d08a caffe::Solver<>::InitTrainNet()
# 0x7f2edf91e18c caffe::Solver<>::Init()
# 0x7f2edf91e4ba caffe::Solver<>::Solver()
# 0x7f2edf930ed3 caffe::Creator_SGDSolver<>()
# 0x40e67e caffe::SolverRegistry<>::CreateSolver()
# 0x40794b train()
# 0x40590c main
# 0x7f2ede865f45 (unknown)
# 0x406041 (unknown)
# (nil) (unknown)
Aborted (core dumped)
In my .prototxt file :
layer {
top: 'label'
top:'anno'
top: 'image'
name: 'loadMydata'
type: "HDF5Data"
hdf5_data_param { source: '/home/alexandra/Documents/my-u-net/my_data.txt' batch_size: 1 }
include: { phase: TRAIN }
}
I don't know where I did something wrong, if anyone could help me it would be great !
your hdf5 file 'uNetDataSet.h5' does not have label in it.
What you have instead is '/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/label' - I hope you can spot the difference.
Try creating the dataset with
h5create(['uNetDataSet.h5'],'/image',[522 775 3 numFrames]);
h5create(['uNetDataSet.h5'],'/anno',[522 775 3 numFrames]);
h5create(['uNetDataSet.h5'],'/label',[1 numFrames]);
Please see this answer for more details. Also note that you might need to permute the input data before saving it to hdf5 using matlab.