Passing Individual Channels of Tensors to Layers in Keras - neural-network

I am trying to emulate something equivalent to a SeparableConvolution2D layer for the Theano backend (it already exists for the TensorFlow backend). As a first step, I need to pass ONE channel from a tensor into the next layer. Say I have a 2D convolution layer called conv1 with 16 filters, which produces an output of shape (batch_size, 16, height, width). I need to select the subtensor (:, 0, :, :) and pass it to the next layer. Simple enough, right?
This is my code:
from keras import backend as K
from keras.layers import Input, Convolution2D
image_input = Input(batch_shape = (batch_size, 1, height, width ), name = 'image_input' )
conv1 = Convolution2D(16, 3, 3, name='conv1', activation = 'relu')(image_input)
conv2_input = K.reshape(conv1[:,0,:,:] , (batch_size, 1, height, width))
conv2 = Convolution2D(16, 3, 3, name='conv1', activation = 'relu')(conv2_input)
This throws:
Exception: You tried to call layer "conv1". This layer has no information about its expected input shape, and thus cannot be built. You can build it manually via: layer.build(batch_input_shape)
Why does the layer not have the required shape information? I'm using reshape from the theano backend. Is this the right way of passing individual channels to the next layer?

I asked this question on the keras-users group and got an answer there:
https://groups.google.com/forum/#!topic/keras-users/bbQ5CbVXT1E
Quoting it:
You need to use a Lambda layer, like: Lambda(lambda x: x[:, 0:1, :, :], output_shape=lambda x: (x[0], 1, x[2], x[3]))
Note that such a manual implementation of a separable convolution would be horribly inefficient. The correct solution is to use the TensorFlow backend.
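To see why the quoted answer slices with 0:1 rather than indexing with 0, here is a minimal NumPy sketch. NumPy only stands in for the backend tensor, the sizes are made up, and channel_output_shape is a hypothetical helper that mirrors the output_shape lambda. An integer index drops the channel axis, while a length-one slice keeps it, so the result is still a 4D (batch, channels, height, width) tensor that a convolution layer can accept.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch_size, channels, height, width = 2, 16, 8, 8
conv1_out = np.zeros((batch_size, channels, height, width))

dropped = conv1_out[:, 0, :, :]   # integer index: the channel axis disappears
kept = conv1_out[:, 0:1, :, :]    # length-one slice: the channel axis survives

def channel_output_shape(input_shape):
    """Mirror of the output_shape lambda: same shape, but a single channel."""
    return (input_shape[0], 1, input_shape[2], input_shape[3])

print(dropped.shape)                          # (2, 8, 8)
print(kept.shape)                             # (2, 1, 8, 8)
print(channel_output_shape(conv1_out.shape))  # (2, 1, 8, 8)
```

Because the slice keeps all four dimensions, no reshape is needed before the next layer.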

Related

ORL dataset Pytorch dataset input data

I am trying to build a neural network in PyTorch to recognize faces from the famous Olivetti faces dataset (ORL dataset). Each image is 32x32 (1024 pixels), and there are 400 images in total across 40 classes. I loaded the dataset from the .mat file into Python variables.
import pandas as pd
from scipy.io import loadmat
orl = loadmat('ORL_32x32.mat')
x = orl["fea"]
y = orl["gnd"]
df = pd.DataFrame(x)
df_label = pd.DataFrame(y)
df.to_csv("data.csv", index = False)
df_label.to_csv("y.csv", index = False)
After that, I ran the following code:
label = torchvision.transforms.functional.to_tensor(df_label.values) #shape torch.Size([1, 400, 1])
df_tensor = torchvision.transforms.functional.to_tensor(df.values) #shape torch.Size([1, 400, 1024])
After that, I created a tensor dataset and started training through epochs.
trn = TensorDataset(df_tensor,label)
#print(type(trn))
trn_dataloader = torch.utils.data.DataLoader(trn,batch_size=400,shuffle=False, num_workers=4)
for epoch in range(EPOCHS):
    for batch_idx, (data, target) in enumerate(trn_dataloader):
        print(data.shape) #torch.Size([1, 400, 1024])
This is a big problem, because data.shape should be torch.Size([1, 1, 1024]): just one image, not the whole dataset treated as a single image.
What is the best way to solve the whole problem?
You set the batch size of the dataloader to 400, which you said is the number of images in the dataset, so a batch is meant to contain all images. However, TensorDataset indexes along the first dimension, and your tensor has shape (1, 400, 1024), so the dataset currently holds a single sample made up of all 400 images. Once the image dimension comes first (see below), setting the batch size to 1 gives data of shape (1, 1, 1024).
Depending on how you train your model, you will adjust the batch size accordingly, but usually you do not train with a batch size of 1.
Since you are working with PyTorch, I would advise reshaping your data to the standard layout for images, which is (batch size, number of channels, height, width). It seems you are working with flattened images, so the shape should be (batch size, number of features).
It looks like the dimensions end up in the wrong order when your data is loaded: the channel and batch dimensions are swapped. This can be fixed by permuting the tensor:
df_tensor = df_tensor.permute(1, 0, 2) # Shape: (1, 400, 1024) -> (400, 1, 1024)
Or by dropping the channel dimension, since these are flattened images:
df_tensor = df_tensor.squeeze(0) # Shape: (1, 400, 1024) -> (400, 1024)
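As a quick check of the squeeze approach, here is a minimal sketch with random tensors standing in for the real ORL data. The batch size of 32 is an arbitrary assumption, and the labels get the same treatment as the features so that both index images along dimension 0.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-ins for the 400 flattened 32x32 images and their labels.
features = torch.randn(1, 400, 1024)        # shape reported in the question
labels = torch.randint(0, 40, (1, 400, 1))

# Drop the extra leading dimension so that dimension 0 indexes images.
features = features.squeeze(0)              # (400, 1024)
labels = labels.squeeze(0).squeeze(-1)      # (400,)

dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=32, shuffle=True)  # 32 is an assumption

data, target = next(iter(loader))
print(len(dataset))   # 400
print(data.shape)     # torch.Size([32, 1024])
print(target.shape)   # torch.Size([32])
```

With the image dimension first, len(dataset) reports 400 samples and each batch contains batch_size images.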

How to build a recurrent neural net in Keras where each input goes through a layer first?

I'm trying to build a neural net in Keras that would look like this:
Where x_1, x_2, ... are input vectors that undergo the same transformation f. f is itself a layer whose parameters must be learned. The sequence length n is variable across instances.
I'm having trouble understanding two things here:
What should the input look like?
I'm thinking of a 2D tensor with shape (number_of_x_inputs, x_dimension), where x_dimension is the length of a single vector x. Can such a 2D tensor have a variable shape? I know tensors can have variable shapes for batch processing, but I don't know if that helps me here.
How do I pass each input vector through the same transformation before feeding it to the RNN layer?
Is there a way to extend, for example, a GRU so that an f layer is applied before the actual GRU cell?
I'm not an expert, but I hope this helps.
Question 1:
Vectors x1, x2, ..., xn can have different shapes, but I'm not sure whether the instances of x1 can have different shapes. When sequences have different lengths, I usually pad the short ones with 0s.
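The zero-padding idea can be sketched with NumPy as follows. The function name pad_sequences_with_zeros and the toy sizes are assumptions for illustration only. Each sequence of vectors is copied into a zero array whose time dimension matches the longest sequence in the batch.

```python
import numpy as np

def pad_sequences_with_zeros(sequences, x_dimension):
    """Stack variable-length sequences into one (batch, max_len, x_dimension) array."""
    max_len = max(len(seq) for seq in sequences)
    padded = np.zeros((len(sequences), max_len, x_dimension), dtype=np.float32)
    for i, seq in enumerate(sequences):
        padded[i, :len(seq)] = seq   # trailing time steps stay zero
    return padded

# Three toy sequences of lengths 3, 5 and 2, each made of 4-dimensional vectors.
sequences = [np.ones((3, 4)), np.ones((5, 4)), np.ones((2, 4))]
batch = pad_sequences_with_zeros(sequences, x_dimension=4)
print(batch.shape)  # (3, 5, 4)
```

A recurrent layer would usually also need to be told to ignore the padded steps; Keras provides a Masking layer for that purpose.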
Question 2:
I'm not sure about extending a GRU, but I would do something like this:
from keras.layers import Input, Conv1D, LSTM, Dense, concatenate
from keras.models import Model
x_dims = [50, 40, 30, 20, 10]
n = 5
def network():
    # Layers are created once and reused, so every input shares the same weights.
    shared_f = Conv1D(5, 3, activation='relu')
    shared_LSTM = LSTM(10)
    inputs = []
    to_concat = []
    for i in range(n):
        x_i = Input(shape=(x_dims[i], 1), name='x_' + str(i))
        inputs.append(x_i)
        step1 = shared_f(x_i)
        to_concat.append(shared_LSTM(step1))
    merged = concatenate(to_concat)
    final = Dense(2, activation='softmax')(merged)
    model = Model(inputs=inputs, outputs=[final])
    # model = Model(inputs=[sequence], outputs=[part1])
    model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
    return model
m = network()
In this example, I used a Conv1D as the shared f transformation, but you could use something else (Embedding, etc.).

How to use groups parameter in PyTorch conv2d function

I am trying to compute a per-channel gradient image in PyTorch. To do this, I want to perform a standard 2D convolution with a Sobel filter on each channel of an image. I am using the torch.nn.functional.conv2d function for this.
In my minimum working example code below, I get an error:
import torch
import torch.nn.functional as F
filters = torch.autograd.Variable(torch.randn(1,1,3,3))
inputs = torch.autograd.Variable(torch.randn(1,3,10,10))
out = F.conv2d(inputs, filters, padding=1)
RuntimeError: Given groups=1, weight[1, 1, 3, 3], so expected
input[1, 3, 10, 10] to have 1 channels, but got 3 channels instead
This suggests that groups needs to be 3. However, when I set groups=3, I get a different error:
import torch
import torch.nn.functional as F
filters = torch.autograd.Variable(torch.randn(1,1,3,3))
inputs = torch.autograd.Variable(torch.randn(1,3,10,10))
out = F.conv2d(inputs, filters, padding=1, groups=3)
RuntimeError: invalid argument 4: out of range at
/usr/local/src/pytorch/torch/lib/TH/generic/THTensor.c:440
When I check that code snippet in the THTensor class, it refers to a bunch of dimension checks, but I don't know where I'm going wrong.
What does this error mean? How can I perform my intended convolution with this conv2d function? I believe I am misunderstanding the groups parameter.
If you want to apply a per-channel convolution then your out-channel should be the same as your in-channel. This is expected, considering each of your input channels creates a separate output channel that it corresponds to.
In short, this will work
import torch
import torch.nn.functional as F
filters = torch.autograd.Variable(torch.randn(3,1,3,3))
inputs = torch.autograd.Variable(torch.randn(1,3,10,10))
out = F.conv2d(inputs, filters, padding=1, groups=3)
whereas, filters of size (2, 1, 3, 3) or (1, 1, 3, 3) will not work.
Additionally, you can make your out-channel a multiple of your in-channel. This is useful when you want several convolution filters for each input channel.
However, this only makes sense if it is a multiple. At the time of writing, a non-multiple silently fell back to the closest smaller multiple, so a filter of size (4, 1, 3, 3) or (5, 1, 3, 3) resulted in an out-channel of size 3. Note that current PyTorch releases instead raise an error when the number of output channels is not divisible by groups.
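For the Sobel case from the question, a minimal sketch could look like this. It uses plain tensors instead of torch.autograd.Variable (which is no longer required in recent PyTorch versions) and only the horizontal Sobel kernel; the variable names are assumptions. The kernel is repeated once per input channel, giving a weight of shape (3, 1, 3, 3) to use with groups=3.

```python
import torch
import torch.nn.functional as F

# Horizontal Sobel kernel, shape (3, 3).
sobel_x = torch.tensor([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]])

inputs = torch.randn(1, 3, 10, 10)
in_channels = inputs.shape[1]

# One copy of the kernel per input channel:
# (out_channels, in_channels / groups, kH, kW) = (3, 1, 3, 3).
filters = sobel_x.repeat(in_channels, 1, 1, 1)

out = F.conv2d(inputs, filters, padding=1, groups=in_channels)
print(out.shape)  # torch.Size([1, 3, 10, 10])

# Cross-check: channel 0 of the grouped result equals filtering channel 0 alone.
single = F.conv2d(inputs[:, 0:1], sobel_x.view(1, 1, 3, 3), padding=1)
print(torch.allclose(out[:, 0:1], single, atol=1e-5))
```

The cross-check at the end illustrates what groups=3 means here: each output channel depends only on its matching input channel.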

How to determine accuracy with triplet loss in a convolutional neural network

A triplet network (inspired by the "Siamese network") comprises 3 instances of the same feed-forward network (with shared parameters). When fed 3 samples, the network outputs 2 intermediate values: the L2 (Euclidean) distances between the embedded representations of two of its inputs and the representation of the third.
I'm feeding the network triplets of images: x = the anchor image, a standard image; x+ = the positive image, which contains the same object as x (that is, x+ has the same class as x); and x- = the negative image, which has a different class than x.
I'm using the triplet loss cost function described here.
How do I determine the network's accuracy?
I am assuming that you are working on image retrieval or a similar task.
You should first generate some triplets, either randomly or using a hard (or semi-hard) negative mining method. Then split your triplets into a training set and a validation set.
If you do it this way, you can define your validation accuracy as the proportion of validation triplets in which the feature distance between anchor and positive is smaller than that between anchor and negative. You can see an example here which is written in PyTorch.
Alternatively, you can measure directly in terms of your final testing metric. For image retrieval, for example, we typically measure the performance of the model on the test set using mean average precision. If you use this metric, you should first define some queries on your validation set and their corresponding ground-truth images.
Either of these two metrics is fine. Choose whichever you think fits your case.
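The first metric (the proportion of triplets ranked correctly) can be sketched with NumPy as below. The function name triplet_accuracy and the toy 2-D embeddings are made up for illustration; in practice the three arrays would hold the network's embeddings for the validation triplets.

```python
import numpy as np

def triplet_accuracy(anchor, positive, negative):
    """Fraction of triplets with d(anchor, positive) < d(anchor, negative)."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(d_pos < d_neg))

# Toy 2-D embeddings for four validation triplets (made-up numbers).
anchor = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 3.0]])
positive = np.array([[0.1, 0.0], [1.0, 1.2], [5.0, 0.0], [0.0, 3.1]])
negative = np.array([[1.0, 0.0], [3.0, 3.0], [2.5, 0.0], [4.0, 3.0]])

print(triplet_accuracy(anchor, positive, negative))  # 0.75
```

Here the third triplet is ranked incorrectly (the positive is farther from the anchor than the negative), so three of four triplets count as correct.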
I am performing a similar task, using triplet loss for classification. Here is how I combined the triplet loss with a classifier.
First, train your model using the standard triplet loss function for N epochs. Once you are sure that the model (we shall refer to this as the embedding generator) is trained, save the weights, as we shall use them later.
Let's say that your embedding generator is defined as:
import torch
import torch.nn as nn
import torch.nn.functional as F
class EmbeddingNetwork(nn.Module):
    def __init__(self):
        super(EmbeddingNetwork, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 64, (7,7), stride=(2,2), padding=(3,3)),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.001),
            nn.MaxPool2d((3, 3), 2, padding=(1,1))
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(64,64,(1,1), stride=(1,1)),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.001),
            nn.Conv2d(64,192, (3,3), stride=(1,1), padding=(1,1)),
            nn.BatchNorm2d(192),
            nn.LeakyReLU(0.001),
            nn.MaxPool2d((3,3),2, padding=(1,1))
        )
        self.fullyConnected = nn.Sequential(
            nn.Linear(7*7*256,32*128),
            nn.BatchNorm1d(32*128),
            nn.LeakyReLU(0.001),
            nn.Linear(32*128,128)
        )
    def forward(self,x):
        x = self.conv1(x)
        x = self.conv2(x)
        # Flatten to (batch, features); the first Linear layer's in_features must match this size.
        x = x.view(x.size(0), -1)
        x = self.fullyConnected(x)
        return torch.nn.functional.normalize(x, p=2, dim=-1)
Now we shall use this embedding generator to build a classifier: load the weights we saved before into this part of the network, then freeze it so that training the classifier does not interfere with the triplet model. This can be done as:
class classifierNet(nn.Module):
    def __init__(self, EmbeddingNet):
        super(classifierNet, self).__init__()
        self.embeddingLayer = EmbeddingNet
        self.classifierLayer = nn.Linear(128,62)
        self.dropout = nn.Dropout(0.5)
    def forward(self, x):
        x = self.dropout(self.embeddingLayer(x))
        x = self.classifierLayer(x)
        return F.log_softmax(x, dim=1)
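Now we shall load the weights we saved before and freeze them using:
embeddingNetwork = EmbeddingNetwork().to(device)
embeddingNetwork.load_state_dict(torch.load('embeddingNetwork.pt'))
# Freeze the embedding weights so that only the classifier layer is trained.
for param in embeddingNetwork.parameters():
    param.requires_grad = False
classifierNetwork = classifierNet(embeddingNetwork)
Now train this classifier network with a standard classification loss such as NLLLoss, which matches the log_softmax output of forward; CrossEntropyLoss gives the same result because it applies log_softmax internally.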
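To check that freezing behaves as intended, here is a small self-contained sketch. Two tiny Linear layers are hypothetical stand-ins for the embedding generator and the classifier layer, and the sizes and learning rate are arbitrary. It sets requires_grad to False on the embedding part, hands only the trainable parameters to the optimizer, and verifies that one training step leaves the embedding weights untouched.

```python
import torch
import torch.nn as nn

# Hypothetical tiny stand-ins for the embedding generator and classifier layer.
embedding = nn.Linear(8, 4)
classifier = nn.Linear(4, 3)

# Freeze the embedding weights so the optimizer never updates them.
for param in embedding.parameters():
    param.requires_grad = False

model = nn.Sequential(embedding, classifier)
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

before = embedding.weight.clone()
inputs = torch.randn(5, 8)
targets = torch.randint(0, 3, (5,))

# One training step on random data.
loss = nn.CrossEntropyLoss()(model(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(torch.equal(before, embedding.weight))  # True: embedding unchanged
print(len(trainable))                         # 2: classifier weight and bias
```

If the frozen part contains BatchNorm layers, as the embedding generator above does, it may also be worth calling .eval() on it during classifier training, since BatchNorm running statistics are updated in training mode regardless of requires_grad.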

In Tensorflow, What kind of neural network should I use?

I am working through the TensorFlow tutorial to learn what TF is, but I am confused about which neural network I should use in my work.
I am looking at single-layer neural networks, CNNs, RNNs, and LSTM RNNs.
There is a sensor that measures something and reports the result as one of two values. Here they are Blue and Red, like this:
The sensor gives a result every 5 minutes. If we pile up the values for each color, we can see some patterns:
The number inside each circle is the position of that result in the sequence from the sensor (107 was given right after 106). From 122 to 138, you can see a decalcomania-like pattern.
I want to predict the next boolean value before the sensor reports it. I could do supervised learning on past results, but I'm not sure which neural network or method is suitable. Since this task needs patterns from past results (it has to see context) and has to remember them, maybe an LSTM RNN (long short-term memory recurrent neural network) would be suitable. Could you tell me which one is right?
So it sounds like you need to process a sequence of images. You could actually use a CNN and an RNN together. I did this a month ago when I was training a network to swipe left or right on Tinder using the sequence of profile pictures. What you would do is pass all of the images through a CNN and then into the RNN. Below is part of the code for my Tinder bot. See how I distribute the convolutions over the sequence and then push the result through the RNN. Finally, I put a softmax classifier on the last time step to make the prediction; in your case, however, I think you will distribute the prediction in time, since you want the next item in the sequence.
self.input_tensor = tf.placeholder(tf.float32, (None, self.max_seq_len, self.img_height, self.img_width, 3), 'input_tensor')
self.expected_classes = tf.placeholder(tf.int64, (None,))
self.is_training = tf.placeholder_with_default(False, None, 'is_training')
self.learning_rate = tf.placeholder(tf.float32, None, 'learning_rate')
self.tensors = {}
activation = tf.nn.elu
rnn = tf.nn.rnn_cell.LSTMCell(256)
with tf.variable_scope('series') as scope:
    state = rnn.zero_state(tf.shape(self.input_tensor)[0], tf.float32)
    for t, img in enumerate(reversed(tf.unpack(self.input_tensor, axis = 1))):
        y = tf.map_fn(tf.image.per_image_whitening, img)
        features = 48
        for c_layer in range(3):
            with tf.variable_scope('pool_layer_%d' % c_layer):
                with tf.variable_scope('conv_1'):
                    filter = tf.get_variable('filter', (3, 3, y.get_shape()[-1].value, features))
                    b = tf.get_variable('b', (features,))
                    y = tf.nn.conv2d(y, filter, (1, 1, 1, 1), 'SAME') + b
                    y = activation(y)
                    self.tensors['img_%d_conv_%d' % (t, 2 * c_layer)] = y
                with tf.variable_scope('conv_2'):
                    filter = tf.get_variable('filter', (3, 3, y.get_shape()[-1].value, features))
                    b = tf.get_variable('b', (features,))
                    y = tf.nn.conv2d(y, filter, (1, 1, 1, 1), 'SAME') + b
                    y = activation(y)
                    self.tensors['img_%d_conv_%d' % (t, 2 * c_layer + 1)] = y
                y = tf.nn.max_pool(y, (1, 3, 3, 1), (1, 3, 3, 1), 'SAME')
                self.tensors['pool_%d' % c_layer] = y
                features *= 2
                print(y.get_shape())
        with tf.variable_scope('rnn'):
            y = tf.reshape(y, (-1, np.prod(y.get_shape().as_list()[1:])))
            y, state = rnn(y, state)
            self.tensors['rnn_%d' % t] = y
        scope.reuse_variables()
with tf.variable_scope('output_classifier'):
    W = tf.get_variable('W', (y.get_shape()[-1].value, 2))
    b = tf.get_variable('b', (2,))
    y = tf.nn.dropout(y, tf.select(self.is_training, 0.5, 1.0))
    y = tf.matmul(y, W) + b
    self.tensors['classifier'] = y
Yes, an RNN (recurrent neural network) fits the task of accumulating state along a sequence in order to predict its next element. LSTM (long short-term memory) is a particular design for the recurrent pieces of the network that has turned out to be very successful in avoiding numerical challenges from long-lasting recurrences; see colah's much-cited blog post for more. (Alternatives to the LSTM cell design exist, but I would only fine-tune that much later, possibly never.)
The TensorFlow RNN codelab explains LSTM RNNs for the case of language models, which predict the (n+1)-st word of a sentence from the preceding n words, for each n (like for each timestep in your series of measurements). Your case is simpler than language models in that you only have two words (red and blue), so if you read anything about embeddings of words, ignore it.
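As a rough illustration of predicting the next value at every time step, here is a minimal sketch. Note that it is written in PyTorch rather than TensorFlow, purely to keep the example short; the class name NextColorLSTM, the hidden size, and the made-up readings are all assumptions. The targets are the inputs shifted by one step, as in the language-model setting described above, and no embedding is used because there are only two values.

```python
import torch
import torch.nn as nn

class NextColorLSTM(nn.Module):
    """Outputs, at every time step, the logit of the next value being 1."""
    def __init__(self, hidden_size=16):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                      # x: (batch, time, 1)
        hidden, _ = self.lstm(x)               # (batch, time, hidden_size)
        return self.head(hidden).squeeze(-1)   # (batch, time)

# Made-up readings: 1 = red, 0 = blue.
readings = torch.tensor([1, 1, 0, 0, 0, 1, 0, 1, 1, 0], dtype=torch.float32)
inputs = readings[:-1].view(1, -1, 1)   # all readings except the last
targets = readings[1:].view(1, -1)      # the same readings shifted by one step

model = NextColorLSTM()
logits = model(inputs)
loss = nn.BCEWithLogitsLoss()(logits, targets)
loss.backward()                         # a single backward pass, not a full training loop
print(logits.shape)  # torch.Size([1, 9])
```

A real training run would repeat the forward and backward passes with an optimizer over many sequences; this sketch only shows how inputs and targets line up.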
You also mentioned other types of neural networks. These are not aimed at accumulating state along a sequence, such as your boolean sequence of red/blue inputs. However, your second image suggests that there might be a pattern in the sequence of counts of successive red/blue values. You could try using the past k counts as input to a plain feed-forward (i.e., non-recurrent) neural network that predicts the probability of the next measurement having the same color as the current one. Maybe that works with a single layer, or maybe two or even three work better; experimentation will tell. This is a less fancy approach than an RNN, but if it works well enough, it gives you a simpler solution with fewer technicalities to worry about.
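One possible way to turn the raw readings into inputs for such a feed-forward network is sketched below. The function name make_examples, the choice k = 2, and the readings are made up for illustration. For each time step, the features are the lengths of the k most recent runs of equal colors (the last one being the current run, counted up to that step), and the label says whether the next reading repeats the current color.

```python
def make_examples(values, k):
    """Build (features, label) pairs from a sequence of two-valued readings.

    features: lengths of the k most recent runs, the current run last.
    label: 1 if the next value equals the current one, else 0.
    """
    examples = []
    runs = []  # run lengths seen so far; the last entry is the still-growing run
    for i in range(len(values) - 1):
        if i > 0 and values[i] == values[i - 1]:
            runs[-1] += 1
        else:
            runs.append(1)
        if len(runs) >= k:
            label = 1 if values[i + 1] == values[i] else 0
            examples.append((runs[-k:], label))
    return examples

# Made-up readings: 'R' = red, 'B' = blue.
readings = list("RRBBBRBRRRBB")
examples = make_examples(readings, k=2)
print(examples[:3])   # [([2, 1], 1), ([2, 2], 1), ([2, 3], 0)]
print(len(examples))  # 9
```

Each feature list can then be fed to a small dense network with a single sigmoid output.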
CNNs (convolutional neural networks) would not be my first choice here. These aim to discover a set of fixed-scale features at various places in the input, for example, some texture or curved edge anywhere in an image. But you only want to predict one next item that extends your input sequence. A plain neural network (see above) may discover useful patterns on the k previous values, and training it with all earlier partial sequences will help it find those patterns. The CNN approach would help to discover them during prediction at long-gone parts of the input; I have no intuition why that would help.