HDF5 data or label incorrect [duplicate] - matlab

I have my training data and labels saved in data.mat (200 training samples with 6000 features each; the labels are -1/+1).
I am trying to convert my data to HDF5 and run Caffe using:
load data.mat
hdf5write('my_data.h5', '/new_train_x', single( reshape(new_train_x,[200, 6000, 1, 1]) ) );
hdf5write('my_data.h5', '/label_train', single( reshape(label_train,[200, 1, 1, 1]) ), 'WriteMode', 'append' );
And my layer.prototxt (just data layer) is:
layer {
  type: "HDF5Data"
  name: "data"
  top: "new_train_x" # note: same name as in the HDF5 file
  top: "label_train" #
  hdf5_data_param {
    source: "/path/to/list/file.txt"
    batch_size: 20
  }
  include { phase: TRAIN }
}
but I get an error:
( Check failed: hdf_blobs_[i]->shape(0) == num (200 vs. 6000))
I1222 17:02:48.915861 3941 layer_factory.hpp:76] Creating layer data
I1222 17:02:48.915871 3941 net.cpp:110] Creating Layer data
I1222 17:02:48.915877 3941 net.cpp:433] data -> new_train_x
I1222 17:02:48.915890 3941 net.cpp:433] data -> label_train
I1222 17:02:48.915900 3941 hdf5_data_layer.cpp:81] Loading list of HDF5 filenames from: file.txt
I1222 17:02:48.915923 3941 hdf5_data_layer.cpp:95] Number of HDF5 files: 1
F1222 17:02:48.993865 3941 hdf5_data_layer.cpp:55] Check failed: hdf_blobs_[i]->shape(0) == num (200 vs. 6000)
*** Check failure stack trace: ***
@ 0x7fd2e6608ddd google::LogMessage::Fail()
@ 0x7fd2e660ac90 google::LogMessage::SendToLog()
@ 0x7fd2e66089a2 google::LogMessage::Flush()
@ 0x7fd2e660b6ae google::LogMessageFatal::~LogMessageFatal()
@ 0x7fd2e69f9eda caffe::HDF5DataLayer<>::LoadHDF5FileData()
@ 0x7fd2e69f901f caffe::HDF5DataLayer<>::LayerSetUp()
@ 0x7fd2e6a48030 caffe::Net<>::Init()
@ 0x7fd2e6a49278 caffe::Net<>::Net()
@ 0x7fd2e6a9157a caffe::Solver<>::InitTrainNet()
@ 0x7fd2e6a928b1 caffe::Solver<>::Init()
@ 0x7fd2e6a92c19 caffe::Solver<>::Solver()
@ 0x41222d caffe::GetSolver<>()
@ 0x408ed9 train()
@ 0x406741 main
@ 0x7fd2e533ca40 (unknown)
@ 0x406f69 _start
Aborted (core dumped)
Many thanks!!!! Any advice would be appreciated!

The problem
It seems like there is indeed a conflict between the element orders of the arrays: Matlab arranges the elements from the first dimension to the last (column-major, like Fortran), while Caffe and HDF5 store the arrays from the last dimension to the first (row-major, like C):
Suppose we have X of shape n-by-c-by-h-by-w; then the "second element of X" is X(2,1,1,1) in Matlab but X[0][0][0][1] in C (1-based vs 0-based indexing doesn't make life easier at all).
Therefore, when you save an array of size [200, 6000, 1, 1] in Matlab, what HDF5 and Caffe actually see is an array of shape [6000, 200] (trailing singleton dimensions are dropped).
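As a quick illustration of this ordering conflict (a numpy sketch for demonstration only, not part of the original pipeline):

import numpy as np

# The same six values, laid out column-major (Matlab/Fortran) vs row-major (C/HDF5)
a = np.arange(6).reshape(2, 3, order='F')  # fills down columns first: [[0 2 4], [1 3 5]]
b = np.arange(6).reshape(2, 3, order='C')  # fills across rows first:  [[0 1 2], [3 4 5]]
# Walking the Fortran-ordered array in C order scrambles the elements:
print(a.ravel(order='C'))  # [0 2 4 1 3 5]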
Using the h5ls command line tool can help you spot the problem.
In Matlab you saved:
>> hdf5write('my_data.h5', '/new_train_x', ...
       single( reshape(new_train_x, [200, 6000, 1, 1]) ) );
>> hdf5write('my_data.h5', '/label_train', ...
       single( reshape(label_train, [200, 1, 1, 1]) ), ...
       'WriteMode', 'append' );
Now you can inspect the resulting my_data.h5 using h5ls (in a Linux terminal):
user@host:~/$ h5ls ./my_data.h5
label_train Dataset {200}
new_train_x Dataset {6000, 200}
As you can see, the arrays are written "backwards".
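If you prefer checking from Python, a minimal h5py sketch (assuming h5py is installed) reports the same C-order shapes that h5ls and Caffe see:

import h5py

with h5py.File('my_data.h5', 'r') as f:
    for name, dset in f.items():
        print(name, dset.shape)
# Before the fix this prints:
#   label_train (200,)
#   new_train_x (6000, 200)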
Solution
Taking this conflict into account when exporting data from Matlab, you should permute the dimensions before saving:
load data.mat
hdf5write('my_data.h5', '/new_train_x', ...
    single( permute(reshape(new_train_x, [200, 6000, 1, 1]), [4:-1:1]) ) );
hdf5write('my_data.h5', '/label_train', ...
    single( permute(reshape(label_train, [200, 1, 1, 1]), [4:-1:1]) ), ...
    'WriteMode', 'append' );
Inspecting the resulting my_data.h5 with h5ls now gives:
user@host:~/$ h5ls ./my_data.h5
label_train Dataset {200, 1, 1, 1}
new_train_x Dataset {200, 6000, 1, 1}
Which is what you expected in the first place.
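Alternatively, if you can export the arrays to Python, you can sidestep the permute altogether by writing the file with h5py, since numpy arrays are row-major to begin with. A minimal sketch, assuming new_train_x and label_train are available as numpy arrays (random placeholders used here):

import h5py
import numpy as np

new_train_x = np.random.randn(200, 6000).astype(np.float32)               # placeholder data
label_train = np.random.choice([-1.0, 1.0], size=200).astype(np.float32)  # placeholder labels

with h5py.File('my_data.h5', 'w') as f:
    # shapes come out exactly as Caffe expects: num first
    f.create_dataset('new_train_x', data=new_train_x.reshape(200, 6000, 1, 1))
    f.create_dataset('label_train', data=label_train.reshape(200, 1, 1, 1))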

Related

TRN-Pytorch model - RuntimeError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

I'm using the TRN-Pytorch model in Colab, with PyTorch version 0.4.1. While training the model I got a RuntimeError: Dimension out of range (expected to be in range of [-1, 0], but got 1).
This is the code for training:
!python3 main.py something RGB \
--arch BNInception --num_segments 3 \
--consensus_type TRN --batch-size 2
and I got this error:
storing name: TRN_something_RGB_BNInception_TRN_segment3
Initializing TSN with base model: BNInception.
TSN Configurations:
input_modality: RGB
num_segments: 3
new_length: 1
consensus_module: TRN
dropout_ratio: 0.8
img_feature_dim: 256
/content/drive/My Drive/TRN-pytorch/models.py:87: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_.
normal(self.new_fc.weight, 0, std)
/content/drive/My Drive/TRN-pytorch/models.py:88: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
constant(self.new_fc.bias, 0)
video number:4
/usr/local/lib/python3.6/dist-packages/torchvision/transforms/transforms.py:208: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
"please use transforms.Resize instead.")
video number:1
group: first_conv_weight has 1 params, lr_mult: 1, decay_mult: 1
group: first_conv_bias has 1 params, lr_mult: 2, decay_mult: 0
group: normal_weight has 71 params, lr_mult: 1, decay_mult: 1
group: normal_bias has 71 params, lr_mult: 2, decay_mult: 0
group: BN scale/shift has 2 params, lr_mult: 1, decay_mult: 0
Freezing BatchNorm2D except the first one.
Traceback (most recent call last):
  File "main.py", line 324, in <module>
    main()
  File "main.py", line 128, in main
    train(train_loader, model, criterion, optimizer, epoch, log_training)
  File "main.py", line 175, in train
    prec1, prec5 = accuracy(output.data, target, topk=(1,5))
  File "main.py", line 301, in accuracy
    batch_size = target.size(1)
RuntimeError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
main.py file is here
Please help me solve this issue.
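The traceback points at accuracy() calling target.size(1). If target is a 1-D tensor of class indices (an assumption based on the error message; that is what a classification DataLoader usually yields), dimension 1 simply does not exist, and the batch size would instead be target.size(0). A minimal sketch reproducing the message:

import torch

target = torch.tensor([2, 0, 1])  # hypothetical 1-D target of class indices
print(target.size(0))  # 3 - the batch size
print(target.size(1))  # raises: Dimension out of range (expected to be in range of [-1, 0], but got 1)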

Keras: What is the correct data format for recurrent networks?

I am trying to build a recurrent network which classifies sequences (multidimensional data streams). I must be missing something, since while running my code:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Activation
import numpy as np
ils = 10 # input layer size
ilt = 11 # input layer time steps
hls = 12 # hidden layer size
nhl = 2 # number of hidden layers
ols = 1 # output layer size
p = 0.2 # dropout probability
f_a = 'relu' # activation function
opt = 'rmsprop' # optimizing function
#
# Building the model
#
model = Sequential()
# The input layer
model.add(LSTM(hls, input_shape=(ilt, ils), return_sequences=True))
model.add(Activation(f_a))
model.add(Dropout(p))
# Hidden layers
for i in range(nhl - 1):
    model.add(LSTM(hls, return_sequences=True))
    model.add(Activation(f_a))
    model.add(Dropout(p))
# Output layer
model.add(LSTM(ols, return_sequences=False))
model.add(Activation('softmax'))
model.compile(optimizer=opt, loss='binary_crossentropy')
#
# Making test data and fitting the model
#
m_train, n_class = 1000, 2
data = np.array(np.random.random((m_train, ilt, ils)))
labels = np.random.randint(n_class, size=(m_train, 1))
model.fit(data, labels, nb_epoch=10, batch_size=32)
I get output (truncated):
Using Theano backend.
line 611, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/koala/.local/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 430, in make_node
    new_inputs.append(format(outer_seq, as_var=inner_seq))
  File "/home/koala/.local/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 422, in format
    rval = tmp.filter_variable(rval)
  File "/home/koala/.local/lib/python2.7/site-packages/theano/tensor/type.py", line 233, in filter_variable
    self=self))
TypeError: Cannot convert Type TensorType(float32, 3D) (of Variable Subtensor{:int64:}.0) into Type TensorType(float32, (False, False, True)). You can try to manually convert Subtensor{:int64:}.0 into a TensorType(float32, (False, False, True)).
Is this a problem with the data format at all?
For me the problem was fixed when I tried it on my real dataset. The difference is that in the real dataset I have more than one label class. An example of a dataset on which this code works is:
(...)
ols = 2 # Output layer size
(...)
m_train, n_class = 1000, ols
data = np.array(np.random.random((m_train, ilt, ils)))
labels = np.random.randint(n_class, size=(m_train, 1))
# Make labels onehot
onehot_labels = np.zeros(shape=(labels.shape[0], ols))
onehot_labels[np.arange(labels.shape[0]), labels.astype(np.int)] = 1
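For reference, the manual one-hot step above does the same thing as Keras' built-in helper (assuming a Keras version of that era, where it lives in keras.utils.np_utils):

import numpy as np
from keras.utils.np_utils import to_categorical

labels = np.random.randint(2, size=(1000, 1))
onehot_labels = to_categorical(labels, 2)  # shape (1000, 2), same result as the manual loop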

Error : H5LTfind_dataset(file_id, dataset_name_) Failed to find HDF5 dataset label

I want to use an HDF5 file to feed my data and labels into my CNN.
I created the HDF5 file with Matlab.
Here is my code:
h5create('uNetDataSet.h5','/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/train/image',[522 775 3 numFrames]);
h5create('uNetDataSet.h5','/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/train/anno',[522 775 3 numFrames]);
h5create('uNetDataSet.h5','/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/label',[1 numFrames]);
h5write('uNetDataSet.h5','/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/train/image',images);
h5write('uNetDataSet.h5','/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/train/anno',anno);
h5write('uNetDataSet.h5','/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/label',label);
where images and anno are 4-D uint8 arrays and label is a 1x85 uint16 vector.
When I display my .h5 file I get this:
HDF5 uNetDataSet.h5
Group '/'
  Group '/home'
    Group '/home/alexandra'
      Group '/home/alexandra/Documents'
        Group '/home/alexandra/Documents/my-u-net'
          Group '/home/alexandra/Documents/my-u-net/warwick_dataset'
            Group '/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset'
              Dataset 'label'
                Size: 1x85
                MaxSize: 1x85
                Datatype: H5T_IEEE_F64LE (double)
                ChunkSize: []
                Filters: none
                FillValue: 0.000000
              Group '/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/train'
                Dataset 'anno'
                  Size: 522x775x3x85
                  MaxSize: 522x775x3x85
                  Datatype: H5T_IEEE_F64LE (double)
                  ChunkSize: []
                  Filters: none
                  FillValue: 0.000000
                Dataset 'image'
                  Size: 522x775x3x85
                  MaxSize: 522x775x3x85
                  Datatype: H5T_IEEE_F64LE (double)
                  ChunkSize: []
                  Filters: none
                  FillValue: 0.000000
When I read the label dataset with h5read it works.
But when I try to train my network I got this error:
I0713 09:47:18.620510 4278 layer_factory.hpp:77] Creating layer loadMydata
I0713 09:47:18.620535 4278 net.cpp:91] Creating Layer loadMydata
I0713 09:47:18.620550 4278 net.cpp:399] loadMydata -> label
I0713 09:47:18.620580 4278 net.cpp:399] loadMydata -> anno
I0713 09:47:18.620600 4278 net.cpp:399] loadMydata -> image
I0713 09:47:18.620622 4278 hdf5_data_layer.cpp:79] Loading list of HDF5 filenames from: /home/alexandra/Documents/my-u-net/my_data.txt
I0713 09:47:18.620656 4278 hdf5_data_layer.cpp:93] Number of HDF5 files: 1
F0713 09:47:18.621317 4278 hdf5.cpp:14] Check failed: H5LTfind_dataset(file_id, dataset_name_) Failed to find HDF5 dataset label
*** Check failure stack trace: ***
@ 0x7f2edf557daa (unknown)
@ 0x7f2edf557ce4 (unknown)
@ 0x7f2edf5576e6 (unknown)
@ 0x7f2edf55a687 (unknown)
@ 0x7f2edf908597 caffe::hdf5_load_nd_dataset_helper<>()
@ 0x7f2edf907365 caffe::hdf5_load_nd_dataset<>()
@ 0x7f2edf9579fe caffe::HDF5DataLayer<>::LoadHDF5FileData()
@ 0x7f2edf956818 caffe::HDF5DataLayer<>::LayerSetUp()
@ 0x7f2edf94fcbc caffe::Net<>::Init()
@ 0x7f2edf950b45 caffe::Net<>::Net()
@ 0x7f2edf91d08a caffe::Solver<>::InitTrainNet()
@ 0x7f2edf91e18c caffe::Solver<>::Init()
@ 0x7f2edf91e4ba caffe::Solver<>::Solver()
@ 0x7f2edf930ed3 caffe::Creator_SGDSolver<>()
@ 0x40e67e caffe::SolverRegistry<>::CreateSolver()
@ 0x40794b train()
@ 0x40590c main
@ 0x7f2ede865f45 (unknown)
@ 0x406041 (unknown)
@ (nil) (unknown)
Aborted (core dumped)
In my .prototxt file:
layer {
  top: 'label'
  top: 'anno'
  top: 'image'
  name: 'loadMydata'
  type: "HDF5Data"
  hdf5_data_param { source: '/home/alexandra/Documents/my-u-net/my_data.txt' batch_size: 1 }
  include: { phase: TRAIN }
}
I don't know where I went wrong; if anyone could help me it would be great!
Your HDF5 file uNetDataSet.h5 does not have a dataset named label in it.
What you have instead is '/home/alexandra/Documents/my-u-net/warwick_dataset/Warwick_Dataset/label' - I hope you can spot the difference.
Try creating the datasets with:
h5create(['uNetDataSet.h5'],'/image',[522 775 3 numFrames]);
h5create(['uNetDataSet.h5'],'/anno',[522 775 3 numFrames]);
h5create(['uNetDataSet.h5'],'/label',[1 numFrames]);
Please see this answer for more details. Also note that you might need to permute the input data before saving it to hdf5 using matlab.
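A quick way to confirm the datasets now sit at the root of the file is an h5py sketch (assuming Python with h5py is available):

import h5py

with h5py.File('uNetDataSet.h5', 'r') as f:
    print(list(f.keys()))  # expect: ['anno', 'image', 'label']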
