PyTorch: How to load tensors from a few .pt files lazily into a neural network DataLoader

I currently have 11 .pt files, each holding a tensor of size torch.Size([1000000, 3, 50, 40]). Each input tensor for the CNN is 3x50x40, and each .pt file has 1MM of these tensors. I cannot combine them due to memory limitations, and I do not want to save them as 11MM individual .pt files. Can anyone help me understand how to get these into a DataLoader?
With a smaller dataset I have used:
data_tensor = torch.load('')
dataset = torch.utils.data.TensorDataset(data_tensor, target_tensor)
train_set, val_set, test_set = random_split(dataset, [int(size*.8), int(size*.1), size-int(size*.8)-int(size*.1)])
train_loader = DataLoader(train_set, batch_size=128, num_workers=4, shuffle=True)
but with the size of these files this will not work. Thank you!
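One possible approach, sketched here under stated assumptions rather than as a definitive answer: wrap the files in a map-style dataset that maps a global sample index to a (file, local index) pair and keeps only a small number of files in memory at a time. The class name `LazyChunkDataset` and the parameters `paths`, `chunk_sizes`, `load_fn` and `max_cached` are illustrative names, not part of PyTorch. In the setting above `load_fn` would be `torch.load`, and in practice the class would subclass `torch.utils.data.Dataset`. A plain dictionary stands in for the files so the sketch runs on its own, and targets are left out for brevity.

```python
import bisect
from collections import OrderedDict


class LazyChunkDataset:
    """Map-style dataset over several chunk files, loading chunks on demand."""

    def __init__(self, paths, chunk_sizes, load_fn, max_cached=1):
        self.paths = list(paths)
        self.load_fn = load_fn          # assumed: torch.load in the real setting
        self.max_cached = max_cached    # how many chunks may sit in memory
        # offsets[i] is the global index of the first sample in chunk i.
        self.offsets = [0]
        for size in chunk_sizes:
            self.offsets.append(self.offsets[-1] + size)
        self._cache = OrderedDict()     # chunk index -> loaded chunk

    def __len__(self):
        return self.offsets[-1]

    def _chunk(self, chunk_idx):
        if chunk_idx in self._cache:
            self._cache.move_to_end(chunk_idx)
        else:
            if len(self._cache) >= self.max_cached:
                self._cache.popitem(last=False)  # evict least recently used
            self._cache[chunk_idx] = self.load_fn(self.paths[chunk_idx])
        return self._cache[chunk_idx]

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the chunk whose index range contains the global index.
        chunk_idx = bisect.bisect_right(self.offsets, index) - 1
        return self._chunk(chunk_idx)[index - self.offsets[chunk_idx]]


# Tiny stand-in for the .pt files: each "file" is a list of samples.
fake_files = {"a.pt": [10, 11, 12], "b.pt": [20, 21]}
dataset = LazyChunkDataset(["a.pt", "b.pt"], [3, 2], fake_files.__getitem__)
print(len(dataset), dataset[0], dataset[3])
```

Note that fully random access (shuffle=True) would make such a cache reload files constantly. A sampler that shuffles the file order and then the indices within each file is one way around that, at the cost of less thorough shuffling. Each DataLoader worker would also hold its own cache.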


ORL dataset Pytorch dataset input data

I am trying to make a neural network in Pytorch to recognize faces from the famous Olivetti faces dataset (ORL dataset). The dimensions of the images are 32x32=1024, and there are a total of 400 of them with 40 classes. I transferred the dataset from the .mat file to Python's familiar variable environment.
orl = loadmat('ORL_32x32.mat')
x = orl["fea"]
y = orl["gnd"]
df = pd.DataFrame(x)
df_label = pd.DataFrame(y)
df.to_csv("data.csv", index = False)
df_label.to_csv("y.csv", index = False)
And after that I did the following code
label = torchvision.transforms.functional.to_tensor(df_label.values) #shape torch.Size([1, 400, 1])
df_tensor = torchvision.transforms.functional.to_tensor(df.values) #shape torch.Size([1, 400, 1024])
After that, I created a tensor dataset and started training through epochs.
trn = TensorDataset(df_tensor,label)
trn_dataloader = torch.utils.data.DataLoader(trn, batch_size=400, shuffle=False, num_workers=4)
for epoch in range(EPOCHS):
    for batch_idx, (data, target) in enumerate(trn_dataloader):
        print(data.shape) #torch.Size([1, 400, 1024])
This is actually a big problem, because data.shape should be torch.Size([1, 1, 1024]): just one image, not the whole dataset looking like one image.
What is the best way to solve the whole problem?
You have specified the batch size of the dataloader to be 400, which you stated is the number of images in the dataset. The data tensor in the dataloader loop will therefore contain all images. If you set the batch size to 1, you will see that data will have shape (1, 1, 1024).
Depending on how you are training your model, you will adjust the batch size accordingly, but usually you do not train with 1 as batch size.
Since you are working with PyTorch, I would advise reshaping your data to the standard layout for images, which is (batch size, number of channels, height, width). Since you appear to be working with flattened images, the shape should instead be (batch size, number of features).
To me it seems that your data is not arranged the right way after loading: the channel and batch size dimensions are swapped. This can be fixed by permuting the tensor:
df_tensor = df_tensor.permute(1, 0, 2) # Shape: (1, 400, 1024) -> (400, 1, 1024)
Or by dropping the channel dimension, since these are flattened images:
df_tensor = df_tensor.squeeze(0) # Shape: (1, 400, 1024) -> (400, 1024)
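The two fixes above can be checked on shapes alone. The following is a minimal sketch that uses NumPy arrays as stand-ins so that it runs without PyTorch. NumPy's `transpose` and `squeeze` play the role of the tensor's `permute` and `squeeze`, and the zero-filled `features` array is an assumption standing in for `df.values`.

```python
import numpy as np

# Stand-in for df.values from the question: 400 flattened 32x32 images.
features = np.zeros((400, 1024), dtype=np.float32)

# Treating the 2-D array as one H x W image adds a leading channel axis,
# which gives the (1, 400, 1024) shape seen in the question.
as_single_image = features[None, :, :]
print(as_single_image.shape)                   # (1, 400, 1024)

# Option 1: swap the axes so that samples come first.
permuted = as_single_image.transpose(1, 0, 2)
print(permuted.shape)                          # (400, 1, 1024)

# Option 2: drop the singleton axis entirely.
squeezed = as_single_image.squeeze(0)
print(squeezed.shape)                          # (400, 1024)

# Batching slices along axis 0, so a batch of 8 now holds 8 images.
batch_size = 8
first_batch = squeezed[:batch_size]
print(first_batch.shape)                       # (8, 1024)
```

The label tensor of shape (1, 400, 1) needs the same treatment, since both tensors passed to TensorDataset must agree in their first dimension. Building the tensors with `torch.from_numpy(df.values)` instead of `to_tensor` should avoid the extra axis in the first place.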

NaN Loss and Quantity of Data

While I was training a neural network with a training dataset of 40000 objects, I was having problems with the loss function being equal to NaN at each epoch. After sampling the dataset and using only 50% of it, the problem no longer occurred. I was wondering how the size of the training data could have an impact in this setting. I used the following function to do the training:
def train_test_net_notLinearCDF(X,y,coefficient,test_input):
    # Neural network
    model = Sequential()
    model.add(Dense(80, activation="relu", input_dim=X.shape[1]))
    model.add(Dense(20, activation="tanh"))
    model.add(Dense(1, activation="linear"))
    opt_adam = Adam(clipvalue=0.5)
    model.compile(loss='mean_squared_error', optimizer=opt_adam)
    history = model.fit(X, y, epochs=100, validation_split = 0.2,batch_size=32)
    # Plot the training and validation loss curves
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    fig1 = plt.gcf()
    plt.suptitle('Training and validation MSE ' + coefficient)
    plt.legend(['Train', 'Val'], loc='upper left')
    fig1.savefig('losses_varying_alpha_'+coefficient+'.png', dpi=300)
    y_pred = model.predict(test_input)
    return y_pred
Thanks in advance.

What is wrong with my siamese network? Why does it output the same value(appx 0.5) irrespective of the input pairs?

I'm trying to build a Siamese network for a dataset. I've picked 10 images per class from this dataset, which has 131 classes in total. I'm using the model below to train my network, but it fails to converge. I saw strange behaviour: after 3000 epochs my results are 0.5000003 irrespective of the input pair I give, and my loss stops at 0.61. The network follows the specifications given in the paper. I tried changing the following things:
Changing Dense layer activation to ReLU
Importing 'ImageNet' weights of ResNet50
Tried increasing and decreasing learning rate.
I also checked the batch inputs to see if the correct input pair (x) is paired with the correct y value. However, I think I'm doing something basically wrong. Glad if you could help me. Thank you :)
The notebook is hosted on Kaggle
If you have some doubts on how certain parts of the code works refer
#Building a sequential model
input_shape=(100, 100, 3)
left_input = Input(input_shape)
right_input = Input(input_shape)
W_init = keras.initializers.RandomNormal(mean = 0.0, stddev = 1e-2)
b_init = keras.initializers.RandomNormal(mean = 0.5, stddev = 1e-2)
model = keras.models.Sequential([
    keras.layers.Conv2D(64, (10,10), activation='relu', input_shape=input_shape, kernel_initializer=W_init, kernel_regularizer=l2(2e-4)),
    keras.layers.Conv2D(128, (7,7), activation='relu', kernel_initializer=W_init, bias_initializer=b_init, kernel_regularizer=l2(2e-4)),
    keras.layers.Conv2D(128, (4,4), activation='relu', kernel_initializer=W_init, bias_initializer=b_init, kernel_regularizer=l2(2e-4)),
    keras.layers.Conv2D(256, (4,4), activation='relu', kernel_initializer=W_init, bias_initializer=b_init, kernel_regularizer=l2(2e-4)),
    keras.layers.Dense(4096, activation='sigmoid', kernel_initializer=W_init, bias_initializer=b_init, kernel_regularizer=l2(1e-3))
])
encoded_l = model(left_input)
encoded_r = model(right_input)
subtracted = keras.layers.Subtract()([encoded_l, encoded_r])
prediction = Dense(1, activation='sigmoid', bias_initializer=b_init)(subtracted)
siamese_net = Model(input=[left_input, right_input], output=prediction)
optimizer= Adam(learning_rate=0.0006)
siamese_net.compile(loss='binary_crossentropy', optimizer=optimizer)
plot_model(siamese_net, show_shapes=True, show_layer_names=True)
I have seen the notebook on Kaggle. Thanks for all the information. However, it seems that the training and validation split is wrong, as this model trains on the initial 91 classes only. What about the remaining 40 classes? The train and validation sets should come from the same classes. Suppose I have 10 images in a class: I can use 8 images for training and 2 images for validation. The split should be on images, not on classes. Also, I couldn't see the testing script. It would be a great help if you could provide that as well.
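A per-class split of the kind described above could look like the following minimal sketch. The function name `split_per_class`, its parameters and the file names are hypothetical, and the 80/20 ratio simply mirrors the 8-and-2 example.

```python
import random
from collections import defaultdict


def split_per_class(samples, val_fraction=0.2, seed=0):
    """Split (path, label) pairs so that every class appears in both subsets."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)
    train, val = [], []
    for label, paths in sorted(by_class.items()):
        rng.shuffle(paths)
        # Hold out at least one image per class for validation.
        n_val = max(1, round(len(paths) * val_fraction))
        val += [(p, label) for p in paths[:n_val]]
        train += [(p, label) for p in paths[n_val:]]
    return train, val


# Hypothetical file names: 131 classes with 10 images each, as in the question.
samples = [(f"class{c}/img{i}.jpg", c) for c in range(131) for i in range(10)]
train, val = split_per_class(samples)
print(len(train), len(val))
```

Training and validation pairs would then be built separately from `train` and `val`, so that no image appears in both.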

Using Keras LSTM to predict a single example after using batch training

I have a network model that is trained using batch training. Once it is trained, I want to predict the output for a single example.
Here is my model code:
model = Sequential()
model.add(Dense(32, batch_input_shape=(5, 1, 1)))
model.add(LSTM(16, stateful=True))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
I have a sequence of single inputs to single outputs. I'm doing some test code to map characters to next characters (A->B, B->C, etc).
I create input data of shape (15, 1, 1) and output data of shape (15, 1) and call the function:
model.fit(x, y, nb_epoch=epochs, batch_size=5, shuffle=False, verbose=0)
The model trains, and now I want to take a single character and predict the next character (input A, it predicts B). I create an input of shape (1, 1, 1) and call:
pred = model.predict(x, batch_size=1, verbose=0)
This gives:
ValueError: Shape mismatch: x has 5 rows but z has 1 rows
I saw one solution was to add "dummy data" to your predict values, so the input shape for the prediction would be (5,1,1) with data [x 0 0 0 0] and you would just take the first element of the output as your value. However, this seems inefficient when dealing with larger batches.
I also tried to remove the batch size from the model creation, but I got the following message:
ValueError: If a RNN is stateful, a complete input_shape must be provided (including batch size).
Is there another way? Thanks for the help.
Currently (Keras v2.0.8) it takes a bit more effort to get predictions on single rows after training in batch.
Basically, the batch_size is fixed at training time, and has to be the same at prediction time.
The workaround right now is to take the weights from the trained model, and use those as the weights in a new model you've just created, which has a batch_size of 1.
The quick code for that is
model = create_model(batch_size=64)
model.fit(X, y)
weights = model.get_weights()
single_item_model = create_model(batch_size=1)
single_item_model.set_weights(weights)
Here's a blog post that goes into more depth:
I've used this approach in the past to have multiple models at prediction time- one that makes predictions on big batches, one that makes predictions on small batches, and one that makes predictions on single items. Since batch predictions are much more efficient, this gives us the flexibility to take in any number of prediction rows (not just a number that is evenly divisible by batch_size), while still getting predictions pretty rapidly.
@ClimbsRocks showed a nice workaround. I cannot provide a "correct" answer in the sense of "this is how Keras intends it to be done", but I can share another workaround which might help somebody, depending on the use case.
In this workaround I use predict_on_batch(). This method allows passing a single sample out of a batch without throwing an error. Unfortunately, it returns a vector in the shape that the target has according to the training settings. However, each sample in the target then yields the prediction for your single sample.
You can access it like this:
to_predict = #Some single sample that would be part of a batch (has to have the right shape)#
model.predict_on_batch(to_predict)[0].flatten() #Flatten is optional
The result of the prediction is exactly the same as if you would pass an entire batch to predict().
Here is a code example.
The code is from my question, which also deals with this issue (but in a slightly different manner).
sequence_size = 5
number_of_features = 1
input = (sequence_size, number_of_features)
batch_size = 2
model = Sequential()
#Of course you can replace the Gated Recurrent Unit with a LSTM-layer
model.add(GRU(100, return_sequences=True, activation='relu', input_shape=input, batch_size=2, name="GRU"))
model.add(GRU(1, return_sequences=True, activation='relu', input_shape=input, batch_size=batch_size, name="GRU2"))
model.compile(optimizer='adam', loss='mse')
Layer (type)                 Output Shape              Param #
GRU (GRU)                    (2, 5, 100)               30600
GRU2 (GRU)                   (2, 5, 1)                 306
Total params: 30,906
Trainable params: 30,906
Non-trainable params: 0
def generator(data, batch_size, sequence_size, num_features):
    """Simple generator"""
    while True:
        for i in range(len(data) - (sequence_size * batch_size + sequence_size) + 1):
            start = i
            end = i + (sequence_size * batch_size)
            yield data[start : end].reshape(batch_size, sequence_size, num_features), \
                  data[end - ((sequence_size * batch_size) - sequence_size) : end + sequence_size].reshape(batch_size, sequence_size, num_features)
#Task: Predict the continuation of a linear range
data = np.arange(100)
hist = model.fit_generator(
    generator=generator(data, batch_size, sequence_size, number_of_features),
    # further arguments (such as steps_per_epoch and epochs) are not shown
)
to_predict = np.asarray([[np.asarray([x]) for x in range(95,100,1)]]) #Only single element of a batch
correct = np.asarray([100,101,102,103,104])
print( model.predict_on_batch(to_predict)[0].flatten() )
[ 99.92908 100.95854 102.32129 103.28584 104.20213 ]

using precomputed kernels with libsvm

I'm currently working on classifying images with different image descriptors. Since they have their own metrics, I am using precomputed kernels. So given these NxN kernel matrices (for a total of N images), I want to train and test an SVM. I'm not very experienced with SVMs, though.
What confuses me is how to enter the input for training. Using an MxM subset of the kernel (M being the number of training images) trains the SVM with M features. However, if I understood it correctly, this limits me to test data with a similar number of features. Trying to use a sub-kernel of size MxN causes infinite loops during training, and using more features when testing gives poor results.
As a result, I only get reasonable results with equally sized training and test sets. But if I only want to classify, say, one image, or train with a given number of images for each class and test with the rest, this doesn't work at all.
How can I remove the dependency between the number of training images and the number of features, so I can test with any number of images?
I'm using libsvm for MATLAB, the kernels are distance-matrices ranging between [0,1].
You seem to already have figured out the problem... According to the README file included in the MATLAB package:
To use precomputed kernel, you must include sample serial number as
the first column of the training and testing data.
Let me illustrate with an example:
%# read dataset
[dataClass, data] = libsvmread('./heart_scale');
%# split into train/test datasets
trainData = data(1:150,:);
testData = data(151:270,:);
trainClass = dataClass(1:150,:);
testClass = dataClass(151:270,:);
numTrain = size(trainData,1);
numTest = size(testData,1);
%# radial basis function: exp(-gamma*|u-v|^2)
sigma = 2e-3;
rbfKernel = @(X,Y) exp(-sigma .* pdist2(X,Y,'euclidean').^2);
%# compute kernel matrices between every pairs of (train,train) and
%# (test,train) instances and include sample serial number as first column
K = [ (1:numTrain)' , rbfKernel(trainData,trainData) ];
KK = [ (1:numTest)' , rbfKernel(testData,trainData) ];
%# train and test
model = svmtrain(trainClass, K, '-t 4');
[predClass, acc, decVals] = svmpredict(testClass, KK, model);
%# confusion matrix
C = confusionmat(testClass,predClass)
The output:
optimization finished, #iter = 70
nu = 0.933333
obj = -117.027620, rho = 0.183062
nSV = 140, nBSV = 140
Total nSV = 140
Accuracy = 85.8333% (103/120) (classification)
C =
65 5
12 38
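The shapes are the key point of the example above: the test kernel has one row per test sample and one column per training sample, so the number of test images is independent of the number of training images. The following is a small Python sketch of just that bookkeeping, not of libsvm itself. The helper names `rbf_kernel` and `with_serial_numbers`, the toy vectors and the `gamma` value are made up for illustration, and with precomputed distance matrices the kernel function would be replaced by a lookup.

```python
import math


def rbf_kernel(rows, cols, gamma):
    """Kernel matrix with one row per item in `rows` and one column per item in `cols`."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v))) for v in cols]
        for u in rows
    ]


def with_serial_numbers(kernel):
    """Prepend the 1-based sample serial number expected in the first column."""
    return [[i + 1] + row for i, row in enumerate(kernel)]


# Toy 2-D feature vectors (made up for illustration).
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # M = 3 training samples
test = [[0.5, 0.5]]                             # a single test sample

K_train = with_serial_numbers(rbf_kernel(train, train, gamma=0.5))  # M rows, M + 1 columns
K_test = with_serial_numbers(rbf_kernel(test, train, gamma=0.5))    # 1 row, M + 1 columns

print(len(K_train), len(K_train[0]))
print(len(K_test), len(K_test[0]))
```

Classifying a single image therefore only requires its kernel values against the M training images, i.e. one row of length M plus the serial number.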