Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
I am trying to train a classifier to distinguish songs genres from the raw audio spectrum. For this I use a deep convolutional network in tflearn. However, the network will not converge/learn/the loss is increasing. I would be grateful if someone had an idea of why this might be.
The data I'm using is 128x128 grayscale images of the spectrogram, classified between Classical music (500 examples) and Hard rock (500 examples), 1-hot encoded.
Here's what the samples look like:
Classical extract
I can tell the difference between the two classes (I cannot show it because of stackoverflow's limit), and I doubt that a deep CNN simply is not capable of classifying these.
Here's what my loss looks like:
Loss plot in tflearn
The code I used in tflearn for the model is the following:
convnet = input_data(shape=[None, 128, 128, 1], name='input')
convnet = conv_2d(convnet, 64, 2, activation='elu', weights_init="Xavier")
convnet = max_pool_2d(convnet, 2)
convnet = conv_2d(convnet, 32, 2, activation='elu', weights_init="Xavier")
convnet = max_pool_2d(convnet, 2)
convnet = conv_2d(convnet, 128, 2, activation='elu', weights_init="Xavier")
convnet = max_pool_2d(convnet, 2)
convnet = conv_2d(convnet, 64, 2, activation='elu', weights_init="Xavier")
convnet = max_pool_2d(convnet, 2)
convnet = fully_connected(convnet, 1024, activation='elu')
convnet = dropout(convnet, 0.5)
convnet = fully_connected(convnet, 2, activation='softmax')
convnet = regression(convnet, optimizer='rmsprop', learning_rate=0.01, loss='categorical_crossentropy', name='targets')
model = tflearn.DNN(convnet)
model.fit({'input': train_X}, {'targets': train_y}, n_epoch=100, batch_size=64, shuffle=True, validation_set=({'input': test_X}, {'targets': test_y}),
snapshot_step=100, show_metric=True)
Thank you very much for you help !
The few things I would usually try are:
lower learning rate
try another activation
remove dropout temporarily
HTH
Related
I currently have 11 pt files of size "torch.Size([1000000, 3, 50, 40])". Each tensor for the cnn is 3x50x40. Each pt file has 1MM of these tensors. I cannot combined them due to memory limitations and I do not want to save them as 11MM individual pt files. Can anyone help me understand how to get these into a DataLoader?
With a smaller dataset I have used:
data_tensor = torch.load('tensor_1.pt')
dataset = torch.utils.data.TensorDataset(data_tensor, target_tensor)
train_set, val_set, test_set = random_split(dataset, [int(size*.8), int(size*.1), size-int(size*.8)-int(size*.1)])
train_loader = DataLoader(train_set, batch_size=128, num_workers=4, shuffle=True)
but with the size of these files this will not work. Thank you!
I am using one layer to extract features from image. The old layer is
self.conv = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1, bias=False)
New layer is
resConv = models.resnet50(pretrained=True)
resConv.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1, bias=False)
self.conv = resConv.conv1
Any performance difference? or both layers are same.
Almost, there are a few differences however. According to the paper https://arxiv.org/abs/1512.03385, resnet50's conv1 layer (as you would find in torchvision.models.resnet50()) has
conv1 = {Conv2d} Conv2d(3, 64, kernel_size=(7,7), stride=(2,2), padding=(3,3), bias=False).
So the differences area) the kernel_size is 7x7 not 3x3b) padding of 3x3 not 1x1 andc) the weights from resnet50.conv1 from the pretrained model will be conditioned on the training and not initialized as random normal as the will be for nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1, bias=False)
The main difference here is, in self.Conv, weights are default (for example zero) and need to train, but in resConv.Conv1 which uses a pre-trained model, weights are tuned because it is trained with large datasets before.
performance result depends on your task but in general, it helps to get a better result (better local optimal) or need fewer epoch to train.
I'm trying to build a Siamese Network for https://www.kaggle.com/moltean/fruits dataset. I've picked 10 Images per class from this dataset. There are a total of 131 classes in this dataset. I'm using the below model to train my network. However, it is failing to converge. I saw a strange behaviour, after 3000 epochs my results are 0.5000003 irrespective of the input pair I give and my loss stops at 0.61. The specifications of the network are as specified in the paper. I tried changing the following things,
Changing Denes layer activation to ReLU
Importing 'ImageNet' weights of ResNet50
Tried increasing and decreasing learning rate.
I also checked the batch inputs to see if the correct input pair (x) is paired with the correct y value. However, I think I'm doing something basically wrong. Glad if you could help me. Thank you :)
The notebook is hosted in Kaggle https://www.kaggle.com/krishnaprasad96/siamese-network.
If you have some doubts on how certain parts of the code works refer https://medium.com/#krishnaprasad_54871/siamese-networks-line-by-line-explanation-for-beginners-55b8be1d2fc6
#Building a sequential model
input_shape=(100, 100, 3)
left_input = Input(input_shape)
right_input = Input(input_shape)
W_init = keras.initializers.RandomNormal(mean = 0.0, stddev = 1e-2)
b_init = keras.initializers.RandomNormal(mean = 0.5, stddev = 1e-2)
model = keras.models.Sequential([
keras.layers.Conv2D(64, (10,10), activation='relu', input_shape=input_shape, kernel_initializer=W_init, kernel_regularizer=l2(2e-4)),
keras.layers.MaxPooling2D(),
keras.layers.Conv2D(128, (7,7), activation='relu', kernel_initializer=W_init, bias_initializer=b_init, kernel_regularizer=l2(2e-4)),
keras.layers.MaxPooling2D(),
keras.layers.Conv2D(128, (4,4), activation='relu', kernel_initializer=W_init, bias_initializer=b_init, kernel_regularizer=l2(2e-4)),
keras.layers.MaxPooling2D(),
keras.layers.Conv2D(256, (4,4), activation='relu', kernel_initializer=W_init, bias_initializer=b_init, kernel_regularizer=l2(2e-4)),
keras.layers.MaxPooling2D(),
keras.layers.Flatten(),
keras.layers.Dense(4096, activation='sigmoid', kernel_initializer=W_init, bias_initializer=b_init, kernel_regularizer=l2(1e-3))
])
encoded_l = model(left_input)
encoded_r = model(right_input)
subtracted = keras.layers.Subtract()([encoded_l, encoded_r])
prediction = Dense(1, activation='sigmoid', bias_initializer=b_init)(subtracted)
siamese_net = Model(input=[left_input, right_input], output=prediction)
optimizer= Adam(learning_rate=0.0006)
siamese_net.compile(loss='binary_crossentropy', optimizer=optimizer)
plot_model(siamese_net, show_shapes=True, show_layer_names=True)
I have seen the notebook on kaggle. Thanks for all the information. But it seems that training and validation spilt is wrong. As this model trains on initial 91 classes only. What about remaining 40 classes. Train and validation spilt should be from the same class. Suppose I have 10 images in a class. I can use 8 image for train and 2 images for validation. Train and validation spilt should be on images not on classes. Also I couldn't see the testing script. It would be a great help if you can provide that also.
I am working with a Boston housing price. I have my X and Y with a shape of (506, 13). Then, i define my model
def basic_model_1():
t_model = Sequential()
t_model.add(Dense(13, activation="tanh", input_dim = 13))
t_model.add(Dense(10, activation="tanh"))
t_model.add(Dropout(0.2))
t_model.add(Dense(6, activation="tanh"))
t_model.add(Dense(3, activation="tanh"))
t_model.add(Dense(1))
print(t_model.summary())
t_model.compile(loss='mean_squared_error',
optimizer='adam',
metrics=['accuracy'])
t_model.fit(X,Y, nb_epoch = 200 , batch_size= 10, validation_split= 0.20)
return(t_model)
When i run this model, i get pretty bad performance of val_acc 0.0098. I changed activation function to sigmoid or relu. Performance increases slightly. What do i need to increase model performance?
In my opinion you could:
1) Add more neurons at every layer (use a multiple of 2 for better performance, try 64, 128, 256).
2) Add more dropout layers, one after every Dense layer.
3) Add much more data.
There is nothing wrong with your model architecture. Only thing I would suggest is to use kernel_initializer='normal', activation='relu' in all Dense layers (specially in output layer) since it's a regression model.
I am doing Tensorflow tutorial, getting what TF is. But I am confused about what neural network should I use in my work.
I am looking at Single Layer Neural Network, CNN, RNN, and LSTM RNN.
There is a sensor which measures something and represents the result in 2 boolean ways. Here, they are Blue and Red, like this:
the sensor gives result values every 5minutes. If we pile up the values for each color, we can see some patterns:
number inside each circle represents the sequence of result values given from sensor. (107 was given right after 106) when you see from 122 to 138, you can see decalcomanie-like pattern.
I want to predict the next boolean value before the sensor result. I may do supervised learning using past results. But I'm not sure which neural network or method is suitable. Thinking that this work needs pattern using past results (have to see context), and memorize past results, maybe LSTM RNN (long-short term memory recurrent neural network) would be suitable one. Could you tell me what is the right one?
So it sounds like you need to process a sequences of images. You could actually use both CNN and RNN together. I did this a month ago when I was training a network to swipe left or right on tinder using the sequence of profile pictures. What you would do is pass all of the images through a CNN and then into the RNN. Below is part of the code for my tinder bot. See how I distribute the convolutions over the sequence and then push it through the RNN. Finally I put a softmax classifier on the last time step to make the prediction, however in your case I think you will distribuite the prediction in time since you want the next item in the sequence.
self.input_tensor = tf.placeholder(tf.float32, (None, self.max_seq_len, self.img_height, self.img_width, 3), 'input_tensor')
self.expected_classes = tf.placeholder(tf.int64, (None,))
self.is_training = tf.placeholder_with_default(False, None, 'is_training')
self.learning_rate = tf.placeholder(tf.float32, None, 'learning_rate')
self.tensors = {}
activation = tf.nn.elu
rnn = tf.nn.rnn_cell.LSTMCell(256)
with tf.variable_scope('series') as scope:
state = rnn.zero_state(tf.shape(self.input_tensor)[0], tf.float32)
for t, img in enumerate(reversed(tf.unpack(self.input_tensor, axis = 1))):
y = tf.map_fn(tf.image.per_image_whitening, img)
features = 48
for c_layer in range(3):
with tf.variable_scope('pool_layer_%d' % c_layer):
with tf.variable_scope('conv_1'):
filter = tf.get_variable('filter', (3, 3, y.get_shape()[-1].value, features))
b = tf.get_variable('b', (features,))
y = tf.nn.conv2d(y, filter, (1, 1, 1, 1), 'SAME') + b
y = activation(y)
self.tensors['img_%d_conv_%d' % (t, 2 * c_layer)] = y
with tf.variable_scope('conv_2'):
filter = tf.get_variable('filter', (3, 3, y.get_shape()[-1].value, features))
b = tf.get_variable('b', (features,))
y = tf.nn.conv2d(y, filter, (1, 1, 1, 1), 'SAME') + b
y = activation(y)
self.tensors['img_%d_conv_%d' % (t, 2 * c_layer + 1)] = y
y = tf.nn.max_pool(y, (1, 3, 3, 1), (1, 3, 3, 1), 'SAME')
self.tensors['pool_%d' % c_layer] = y
features *= 2
print(y.get_shape())
with tf.variable_scope('rnn'):
y = tf.reshape(y, (-1, np.prod(y.get_shape().as_list()[1:])))
y, state = rnn(y, state)
self.tensors['rnn_%d' % t] = y
scope.reuse_variables()
with tf.variable_scope('output_classifier'):
W = tf.get_variable('W', (y.get_shape()[-1].value, 2))
b = tf.get_variable('b', (2,))
y = tf.nn.dropout(y, tf.select(self.is_training, 0.5, 1.0))
y = tf.matmul(y, W) + b
self.tensors['classifier'] = y
Yes, an RNN (recurrent neural network) fits the task of accumulating state along along a sequence in order to predict its next element. LSTM (long short-term memory) is a particular design for the recurrent pieces of the network that has turned out to be very successful in avoiding numerical challenges from long-lasting recurrences; see colah's much-cited blogpost for more. (Alternatives to the LSTM cell design exist but I would only fine tune that much later, possibly never.)
The TensorFlow RNN codelab explains LSTM RNNs for the case of language models, which predict the (n+1)-st word of a sentence from the preceding n words, for each n (like for each timestep in your series of measurements). Your case is simpler than language models in that you only have two words (red and blue), so if you read anything about embeddings of words, ignore it.
You also mentioned other types of neural networks. These are not aimed at accumulating state along a sequence, such as your boolean sequence of red/blue inputs. However, your second image suggests that there might be pattern in the sequence of counts of successive red/blue values. You could try using the past k counts as input to a plain feed-forward (i.e., non-recursive) neural network that predicts the probability of the next measurement having the same color as the current one. - Maybe that works with a single layer, or maybe two or even three work better; experimentation will tell. This is a less fancy approach than an RNN, but if it works good enough, it gives you a simpler solution with fewer technicalities to worry about.
CNNs (convolutional neural networks) would not be my first choice here. These aim to discover a set of fixed-scale features at various places in the input, for example, some texture or curved edge anywhere in an image. But you only want to predict one next item that extends your input sequence. A plain neural network (see above) may discover useful patterns on the k previous values, and training it with all earlier partial sequences will help it find those patterns. The CNN approach would help to discover them during prediction at long-gone parts of the input; I have no intuition why that would help.