I am trying to build a neural network in PyTorch to recognize faces from the famous Olivetti faces dataset (ORL dataset). Each image is 32x32 = 1024 pixels, and there are 400 images in total across 40 classes. I loaded the dataset from the .mat file into Python variables:
from scipy.io import loadmat
import pandas as pd

orl = loadmat('ORL_32x32.mat')
x = orl["fea"]  # image features
y = orl["gnd"]  # class labels
df = pd.DataFrame(x)
df_label = pd.DataFrame(y)
df.to_csv("data.csv", index = False)
df_label.to_csv("y.csv", index = False)
After that I ran the following code:
label = torchvision.transforms.functional.to_tensor(df_label.values) #shape torch.Size([1, 400, 1])
df_tensor = torchvision.transforms.functional.to_tensor(df.values) #shape torch.Size([1, 400, 1024])
After that, I created a tensor dataset and started training through epochs.
trn = TensorDataset(df_tensor,label)
#print(type(trn))
trn_dataloader = torch.utils.data.DataLoader(trn,batch_size=400,shuffle=False, num_workers=4)
for epoch in range(EPOCHS):
    for batch_idx, (data, target) in enumerate(trn_dataloader):
        print(data.shape) #torch.Size([1, 400, 1024])
This is a big problem, because data.shape should be torch.Size([1, 1, 1024]): just one image, not the whole dataset treated as one image.
What is the best way to solve the whole problem?
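You have set the batch size of the dataloader to 400, which you stated is the number of images in the dataset, so a single batch would contain every image. Note, however, that TensorDataset indexes along the first dimension, and your tensor has shape (1, 400, 1024), so the dataset currently holds a single sample of shape (400, 1024). Once the tensor is arranged as (400, 1, 1024), setting the batch size to 1 gives data the shape (1, 1, 1024).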
Depending on how you are training your model, you will adjust the batch size accordingly, but usually you do not train with 1 as batch size.
Since you are working with PyTorch, I would advise arranging your data in the standard layout for images, which is (batch size, number of channels, height, width). Your images are flattened, however, so the shape should instead be (batch size, number of features).
It looks like the data is not being loaded in the right arrangement: to_tensor treats the 400x1024 array as a single one-channel image, so the channel and batch size dimensions end up swapped. This can be fixed by permuting the tensor:
df_tensor = df_tensor.permute(1, 0, 2) # Shape: (1, 400, 1024) -> (400, 1, 1024)
Or by dropping the channel dimension, since these are flattened images:
df_tensor = df_tensor.squeeze(0) # Shape: (1, 400, 1024) -> (400, 1024)
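Putting the pieces together, here is a minimal sketch of the squeeze-based fix. It uses random stand-in tensors with the shapes reported in the question rather than the real ORL data, and the batch size of 32 and the names features and targets are assumptions made for illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with the shapes from the question (random values, not the real images).
df_tensor = torch.rand(1, 400, 1024)        # features as produced by to_tensor
label = torch.randint(1, 41, (1, 400, 1))   # labels as produced by to_tensor

# With the leading axis of size 1, the dataset contains a single sample.
print(len(TensorDataset(df_tensor, label)))  # 1

# Drop the spurious leading axis so that dim 0 indexes the 400 images.
features = df_tensor.squeeze(0)   # shape (400, 1024)
targets = label.reshape(-1)       # shape (400,)

dataset = TensorDataset(features, targets)
# batch_size=32 is an arbitrary choice for this sketch.
loader = DataLoader(dataset, batch_size=32, shuffle=True)

data, target = next(iter(loader))
print(len(dataset), data.shape, target.shape)
```

The labels need the same treatment as the features, otherwise TensorDataset would still see one label sample of shape (400, 1).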
Related
I currently have 11 .pt files, each of size "torch.Size([1000000, 3, 50, 40])". Each input to the CNN is a 3x50x40 tensor, and each .pt file holds 1 million of them. I cannot combine them due to memory limitations, and I do not want to save them as 11 million individual .pt files. Can anyone help me understand how to get these into a DataLoader?
With a smaller dataset I have used:
data_tensor = torch.load('tensor_1.pt')
dataset = torch.utils.data.TensorDataset(data_tensor, target_tensor)
train_set, val_set, test_set = random_split(dataset, [int(size*.8), int(size*.1), size-int(size*.8)-int(size*.1)])
train_loader = DataLoader(train_set, batch_size=128, num_workers=4, shuffle=True)
but with the size of these files this will not work. Thank you!
So. First of all, I am new to Neural Network (NN).
As part of my PhD, I am trying to solve some problem through NN.
For this, I have created a program that creates some data set made of
a collection of input vectors (each with 63 elements) and its corresponding
output vectors (each with 6 elements).
So, my program looks like this:
Nₜᵣ = 25; # number of inputs in the data set
xtrain, ytrain = dataset_generator(Nₜᵣ); # generates In/Out vectors: xtrain/ytrain
datatrain = zip(xtrain,ytrain); # assemble my data
Now, both xtrain and ytrain are of type Array{Array{Float64,1},1}, meaning that
if (say) Nₜᵣ = 2, they look like:
julia> xtrain #same for ytrain
2-element Array{Array{Float64,1},1}:
[1.0, -0.062, -0.015, -1.0, 0.076, 0.19, -0.74, 0.057, 0.275, ....]
[0.39, -1.0, 0.12, -0.048, 0.476, 0.05, -0.086, 0.85, 0.292, ....]
The first 3 elements of each vector are normalized to unity (they represent x,y,z coordinates), and the following 60 numbers are also normalized to unity and correspond to some measurable attributes.
The program continues like:
layer1 = Dense(length(xtrain[1]),46,tanh); # setting 6 layers
layer2 = Dense(46,36,tanh) ;
layer3 = Dense(36,26,tanh) ;
layer4 = Dense(26,16,tanh) ;
layer5 = Dense(16,6,tanh) ;
layer6 = Dense(6,length(ytrain[1])) ;
m = Chain(layer1,layer2,layer3,layer4,layer5,layer6); # composing the layers
squaredCost(ym,y) = (1/2)*norm(y - ym).^2;
loss(x,y) = squaredCost(m(x),y); # define loss function
ps = Flux.params(m); # initializing mod.param.
opt = ADAM(0.01, (0.9, 0.8)); #
and finally:
trainmode!(m,true)
itermax = 700; # set max number of iterations
losses = [];
for iter in 1:itermax
    Flux.train!(loss,ps,datatrain,opt);
    push!(losses, sum(loss.(xtrain,ytrain)));
end
It runs perfectly; however, I noticed that as I train my model with an increasing data set (Nₜᵣ = 10, 15, 25, etc.), the loss function seems to increase. See the image below:
Where: y1: Nₜᵣ=10, y2: Nₜᵣ=15, y3: Nₜᵣ=25.
So, my main question:
Why is this happening? I cannot see an explanation for this behavior. Is this somehow expected?
Remarks: Note that
All elements from the training data set (input and output) are normalized to [-1,1].
I have not tried changing the activation functions.
I have not tried changing the optimization method.
Considerations: I need a training data set of near 10000 input vectors, and so I am expecting an even worse scenario...
Some personal thoughts:
Am I arranging my training dataset correctly? Say, if every single data vector is made of 63 numbers, is it correct to group them in an array and then pile them into an Array{Array{Float64,1},1}? I have no experience using NN and Flux. How can I make a data set of 10000 I/O vectors differently? Can this be the issue? (I am very inclined to this)
Can this behavior be related to the chosen act. functions? (I am not inclined to this)
Can this behavior be related to the opt. algorithm? (I am not inclined to this)
Am I training my model wrong? Is the iteration loop really iterations, or are they epochs? I am struggling to put this concept of "epochs" and "iterations" into practice.
loss(x,y) = squaredCost(m(x),y); # define loss function
Your losses aren't normalized: the quantity you record, sum(loss.(xtrain,ytrain)), is a sum over every training sample, so adding more data can only increase it. However, the cost per data point doesn't seem to be increasing. To get rid of this effect, you might want to use a normalized cost function, for example the mean squared cost (the summed cost divided by the number of samples).
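To make the effect concrete, here is a small sketch written in plain Python rather than Flux. It assumes a hypothetical model whose prediction is off by 0.1 in each of the 6 output components; the function names and the random targets are illustrative only and are not taken from the question.

```python
import random

def squared_cost(y_pred, y_true):
    """Half the squared Euclidean distance, mirroring squaredCost in the question."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(y_pred, y_true))

def summed_and_mean_loss(n_samples, seed=0):
    """Return (summed loss, mean loss) over n_samples synthetic samples."""
    rng = random.Random(seed)
    per_sample = []
    for _ in range(n_samples):
        y_true = [rng.uniform(-1, 1) for _ in range(6)]
        # Hypothetical model output: every component misses the target by 0.1.
        y_pred = [t + 0.1 for t in y_true]
        per_sample.append(squared_cost(y_pred, y_true))
    total = sum(per_sample)
    return total, total / n_samples

# The summed loss grows with the data set size; the mean loss does not.
for n in (10, 15, 25):
    total, mean = summed_and_mean_loss(n)
    print(n, round(total, 3), round(mean, 3))
```

With the same per-sample error, the summed loss scales linearly with the number of samples, while the mean stays at about 0.03 (0.5 * 6 * 0.1^2), so curves for different data set sizes become comparable.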
I have a network model that is trained using batch training. Once it is trained, I want to predict the output for a single example.
Here is my model code:
model = Sequential()
model.add(Dense(32, batch_input_shape=(5, 1, 1)))
model.add(LSTM(16, stateful=True))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
I have a sequence of single inputs to single outputs. I'm doing some test code to map characters to next characters (A->B, B->C, etc).
I create an input data of shape (15,1,1) and an output data of shape (15, 1) and call the function:
model.fit(x, y, nb_epoch=epochs, batch_size=5, shuffle=False, verbose=0)
The model trains, and now I want to take a single character and predict the next character (input A, it predicts B). I create an input of shape (1, 1, 1) and call:
pred = model.predict(x, batch_size=1, verbose=0)
This gives:
ValueError: Shape mismatch: x has 5 rows but z has 1 rows
I saw one solution was to add "dummy data" to your predict values, so the input shape for the prediction would be (5,1,1) with data [x 0 0 0 0] and you would just take the first element of the output as your value. However, this seems inefficient when dealing with larger batches.
I also tried to remove the batch size from the model creation, but I got the following message:
ValueError: If a RNN is stateful, a complete input_shape must be provided (including batch size).
Is there another way? Thanks for the help.
Currently (Keras v2.0.8) it takes a bit more effort to get predictions on single rows after training in batch.
Basically, the batch_size is fixed at training time, and has to be the same at prediction time.
The workaround right now is to take the weights from the trained model, and use those as the weights in a new model you've just created, which has a batch_size of 1.
The quick code for that is
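model = create_model(batch_size=64)  # create_model is your own model-building function
model.fit(X, y)
weights = model.get_weights()
# Rebuild the same architecture with batch_size=1 and copy the trained weights over
single_item_model = create_model(batch_size=1)
single_item_model.set_weights(weights)
single_item_model.compile(compile_params)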
Here's a blog post that goes into more depth:
https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/
I've used this approach in the past to have multiple models at prediction time: one that makes predictions on big batches, one that makes predictions on small batches, and one that makes predictions on single items. Since batch predictions are much more efficient, this gives us the flexibility to take in any number of prediction rows (not just a number that is evenly divisible by batch_size), while still getting predictions pretty rapidly.
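As a rough illustration of how an arbitrary number of rows could be divided among such models, here is a small sketch in plain Python. The function name plan_batches and the batch sizes 64, 8 and 1 are assumptions made for this example; they are not part of Keras or of the setup described above.

```python
def plan_batches(n_rows, batch_sizes=(64, 8, 1)):
    """Greedily split n_rows across fixed-batch-size models, largest first.

    Returns a dict mapping batch size -> number of full batches to run.
    The sizes must include 1 so that any remainder can still be served.
    """
    if 1 not in batch_sizes:
        raise ValueError("batch_sizes must include 1 to cover every remainder")
    plan = {}
    remaining = n_rows
    for size in sorted(batch_sizes, reverse=True):
        # Run as many full batches of this size as fit, pass the rest down.
        plan[size], remaining = divmod(remaining, size)
    return plan

print(plan_batches(150))  # {64: 2, 8: 2, 1: 6}
```

Each entry says how many times the model built with that batch size would be called, so 150 rows are covered by 2 + 2 + 6 = 10 predict calls instead of 150 single-item calls.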
@ClimbsRocks showed a nice workaround. I cannot provide a "correct" answer in the sense of "this is how Keras intends it to be done", but I can share another workaround which might help somebody depending on the use case.
In this workaround I use predict_on_batch(). This method allows you to pass a single sample out of a batch without throwing an error. Unfortunately, it returns a vector in the shape the target has according to the training settings. However, each sample in that output then holds the prediction for your single sample.
You can access it like this:
to_predict = #Some single sample that would be part of a batch (has to have the right shape)#
model.predict_on_batch(to_predict)[0].flatten() #Flatten is optional
The result of the prediction is exactly the same as if you would pass an entire batch to predict().
Here is a code example.
The code is from my question, which also deals with this issue (but in a slightly different manner).
sequence_size = 5
number_of_features = 1
input = (sequence_size, number_of_features)
batch_size = 2
model = Sequential()
#Of course you can replace the Gated Recurrent Unit with a LSTM-layer
model.add(GRU(100, return_sequences=True, activation='relu', input_shape=input, batch_size=2, name="GRU"))
model.add(GRU(1, return_sequences=True, activation='relu', input_shape=input, batch_size=batch_size, name="GRU2"))
model.compile(optimizer='adam', loss='mse')
model.summary()
#Summary-output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
GRU (GRU) (2, 5, 100) 30600
_________________________________________________________________
GRU2 (GRU) (2, 5, 1) 306
=================================================================
Total params: 30,906
Trainable params: 30,906
Non-trainable params: 0
def generator(data, batch_size, sequence_size, num_features):
    """Simple generator"""
    while True:
        for i in range(len(data) - (sequence_size * batch_size + sequence_size) + 1):
            start = i
            end = i + (sequence_size * batch_size)
            yield data[start : end].reshape(batch_size, sequence_size, num_features), \
                  data[end - ((sequence_size * batch_size) - sequence_size) : end + sequence_size].reshape(batch_size, sequence_size, num_features)
#Task: Predict the continuation of a linear range
data = np.arange(100)
hist = model.fit_generator(
    generator=generator(data, batch_size, sequence_size, number_of_features),
    steps_per_epoch=total_batches,  # total_batches is not defined in this snippet
    epochs=200,
    shuffle=False
)
to_predict = np.asarray([[np.asarray([x]) for x in range(95,100,1)]]) #Only single element of a batch
correct = np.asarray([100,101,102,103,104])
print( model.predict_on_batch(to_predict)[0].flatten() )
#Output:
[ 99.92908 100.95854 102.32129 103.28584 104.20213 ]
I am trying to emulate something equivalent to a SeparableConvolution2D layer for the Theano backend (it already exists for the TensorFlow backend). As the first step, what I need to do is pass ONE channel from a tensor into the next layer. So say I have a 2D convolution layer called conv1 with 16 filters, which produces an output with shape (batch_size, 16, height, width). I need to select the subtensor with shape (:, 0, :, :) and pass it to the next layer. Simple enough, right?
This is my code:
from keras import backend as K
image_input = Input(batch_shape = (batch_size, 1, height, width ), name = 'image_input' )
conv1 = Convolution2D(16, 3, 3, name='conv1', activation = 'relu')(image_input)
conv2_input = K.reshape(conv1[:,0,:,:] , (batch_size, 1, height, width))
conv2 = Convolution2D(16, 3, 3, name='conv1', activation = 'relu')(conv2_input)
This throws:
Exception: You tried to call layer "conv1". This layer has no information about its expected input shape, and thus cannot be built. You can build it manually via: layer.build(batch_input_shape)
Why does the layer not have the required shape information? I'm using reshape from the theano backend. Is this the right way of passing individual channels to the next layer?
I asked this question on the keras-user group and I got an answer there:
https://groups.google.com/forum/#!topic/keras-users/bbQ5CbVXT1E
Quoting it:
You need to use a lambda layer, like: Lambda(lambda x: x[:, 0:1, :, :], output_shape=lambda x: (x[0], 1, x[2], x[3]))
Note that such a manual implementation of a separable convolution would be horribly inefficient. The correct solution is to use the TensorFlow backend.
I'm building a network, and when I use the custom train function provided in the LeNet example with a batch size bigger than 110, my accuracy gets bigger than 1 (100%).
If I use batch size 32, I get 30 percent accuracy. With batch size 64, my net's accuracy is 64 percent. And with batch size 128, the accuracy is 1.2.
My images are 32x32.
Train dataset: 56 images of Neutral faces. 60 images of Surprise faces. Test dataset: 15 images of Neutral faces. 15 images of Surprise faces.
This is my code:
def train(solver):
    niter = 200
    test_interval = 25
    train_loss = zeros(niter)
    test_acc = zeros(int(np.ceil(niter / test_interval)))
    output = zeros((niter, 32, 2))
    for it in range(niter):
        solver.step(1)
        train_loss[it] = solver.net.blobs['loss'].data
        solver.test_nets[0].forward(start='conv1')
        output[it] = solver.test_nets[0].blobs['ip2'].data[:32]
        if it % test_interval == 0:
            print 'Iteration', it, 'testing...'
            correct = 0
            for test_it in range(100):
                solver.test_nets[0].forward()
                correct += sum(solver.test_nets[0].blobs['ip2'].data.argmax(1) == solver.test_nets[0].blobs['label'].data)
            test_acc[it // test_interval] = correct / 1e4
So, what is wrong with my code?
In your testing code you run 100 iterations (for test_it in range(100)), and on each iteration you add to correct the number of examples in the batch that were predicted correctly. You then divide that number by 1e4.
Let's assume your model is very good and has almost 100% prediction rate. Then with batch size of 32 on each of 100 iterations you will add 32 to correct, yielding 3200. You then divide it by 1e4, ending up with 0.32, which is almost consistent with what you see (your number is slightly less because sometimes your model does mispredict the target).
To fix it, you can replace
test_acc[it // test_interval] = correct / 1e4
with
test_acc[it // test_interval] = correct / (100.0 * batch_size)