I'm having trouble understanding how convolution layers are added.
I'm trying to add convolution layers, but I get this error:
ValueError: GpuCorrMM shape inconsistency:
bottom shape: 128 32 30 30
weight shape: 3 32 3 3
top shape: 128 1 28 28 (expected 128 3 28 28)
Apply node that caused the error: GpuCorrMM_gradInputs{valid, (1, 1)}(GpuContiguous.0, GpuContiguous.0)
Inputs types: [CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, 4D)]
Inputs shapes: [(3, 32, 3, 3), (128, 1, 28, 28)]
Inputs strides: [(288, 9, 3, 1), (784, 0, 28, 1)]
Inputs values: ['not shown', 'not shown']
I'm trying to understand what nb_filter, stack_size, nb_row, and nb_col are in a convolutional layer.
My objective is to copy the VGG model.
model = Sequential()
model.add(Convolution2D(32, 1, 3, 3, border_mode='full'))
model.add(Activation('relu'))
model.add(Convolution2D(32, 32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(2, 2)))
model.add(Dropout(0.25))
model.add(Convolution2D(64, 32, 3, 3, border_mode='full'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(64*8*8, 512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(512, nb_classes))
model.add(Activation('softmax'))
# let's train the model using SGD + momentum (how original).
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
--
I'm currently using Theano and Keras.
Any tip is appreciated.
You need to correct the output shape for the convolutional layer. The output of a conv layer depends on several factors, such as the input size, the number of kernels, the kernel size, the stride and the padding. Generally, for an input of size BxCxW1xH1 the output is BxFxW2xH2, where B is the batch size, C is the number of input channels, F is the number of output feature maps and W1xH1 is the spatial input size; you can compute W2 and H2 from W1, H1, the kernel size, the stride and the padding. It is illustrated very well in this tutorial from Stanford: http://cs231n.github.io/convolutional-networks/#comp
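As a minimal sketch of that computation (the helper name is mine; it assumes square kernels and no dilation), W2 = (W1 - K + 2P) / S + 1, and likewise for H2:

def conv_output_size(w1, h1, kernel, stride=1, padding=0):
    # W2 = (W1 - K + 2P) / S + 1, and the same for the height
    w2 = (w1 - kernel + 2 * padding) // stride + 1
    h2 = (h1 - kernel + 2 * padding) // stride + 1
    return w2, h2

# A 3x3 'valid' convolution on a 30x30 input gives 28x28,
# matching the 28x28 top shape in the traceback above.
print(conv_output_size(30, 30, kernel=3))  # (28, 28)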
Hope it helps!
I'm trying to apply a linear layer to a 2D matrix, connecting it only by column, as in the picture below.
The input shape is (batch_size, 3, 50). I first tried a 2D convolution, adding a channel dimension of 1, so the input shape becomes (batch_size, 1, 3, 50):
import torch
import torch.nn as nn

class ColumnConv(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(
                in_channels=1,
                out_channels=1,
                kernel_size=(3, 1),
                stride=1,
            ),  # shape is B, 1, 1, 50
            nn.ReLU(),
            nn.Flatten(),  # shape is B, 50
        )

    def forward(self, x):
        return self.layers(x)
But it doesn't seem to work.
I'm planning to use a list of 50 nn.Linear layers and apply them to column slices of the input, but that seems more like a workaround and is not optimized for performance.
Is there a more "pytorchic" way of doing this?
The PyTorch nn.Linear module can be applied to multidimensional input; the linear transformation is applied to the last dimension. So, to apply it by column, swap the rows and columns.
linear_3_to_1 = nn.Linear(3, 1)
x = torch.randn(1, 1, 3, 50)
x = x.transpose(2, 3)  # swap 3 and 50 -> shape (1, 1, 50, 3)
out = linear_3_to_1(x).flatten()  # (1, 1, 50, 1) flattened to (50,)
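For the original batched shape (batch_size, 3, 50) without the extra channel dimension, the same idea looks roughly like this (a sketch; the squeeze just drops the size-1 output dimension):

import torch
import torch.nn as nn

linear_3_to_1 = nn.Linear(3, 1)
x = torch.randn(8, 3, 50)               # (batch_size, 3, 50)
out = linear_3_to_1(x.transpose(1, 2))  # columns last: (8, 50, 3) -> (8, 50, 1)
out = out.squeeze(-1)                   # (8, 50): one value per column, batch kept
print(out.shape)                        # torch.Size([8, 50])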
I am using the python xgboost library, and I am unable to get a simple working example using the gblinear booster:
import numpy as np
import xgboost as xgb
import matplotlib.pyplot as plt

M = np.array([
[1, 2],
[2, 4],
[3, 6],
[4, 8],
[5, 10],
[6, 12],
])
xg_reg = xgb.XGBRegressor(objective ='reg:linear', booster='gblinear')
X, y = M[:, :-1], M[:, -1]
xg_reg.fit(X,y)
plt.scatter(range(-5, 20), [xg_reg.predict([i]) for i in range(-5, 20)])
plt.scatter(M[:,0], M[:,-1])
plt.show()
Predictions are in blue, and the real data is in orange.
Am I missing something?
I think the issue is that the model does not converge to the optimum with the configuration and the amount of data that you have chosen. GBMs do not use the boosting model to fit the target directly; instead, each step fits the gradient and then adds a fraction of that fit (the fraction equals the learning rate) to the prediction from the previous step.
So the obvious ways to improve are: increase the learning rate, increase the number of iterations, increase the data size.
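To illustrate the point, here is a toy sketch of that additive update (not xgboost's actual implementation; the names are mine), showing why a small learning rate with few rounds undershoots:

import numpy as np

X = np.arange(1.0, 7.0)
y = 2 * X

def boost(learning_rate, n_rounds):
    # Each round fits a simple linear base learner (slope only) to the residuals
    # (the negative gradient of squared loss) and adds only a fraction of it.
    prediction = np.zeros_like(y)
    for _ in range(n_rounds):
        residual = y - prediction
        slope = np.dot(X, residual) / np.dot(X, X)
        prediction += learning_rate * slope * X
    return prediction

print(boost(0.1, 10))  # still well below y = [2, 4, ..., 12]
print(boost(1.0, 10))  # matches y after the first round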
For example, this variant of your code already gives a better prediction:
X = np.expand_dims(range(1,7), axis=1)
y = 2*X
# note increased learning rate!
xg_reg = xgb.XGBRegressor(objective ='reg:linear', booster='gblinear', learning_rate=1)
xg_reg.fit(X, y, verbose=20, eval_set=[(X,y)])
plt.scatter(range(-5, 20), [xg_reg.predict([i]) for i in range(-5, 20)], label='prediction')
plt.scatter(X[:20,:], y[:20], label='target')
plt.legend()
plt.show()
This gives a metric value of 0.872 on the training data (I've added evaluation in the fit call to see how it changes). It is further reduced to ~0.1 if you increase the number of samples from 7 to 70.
I am using the xcorr function:
in = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13];
lag = 8;
out = xcorr(in, lag)
it produces the output:
out = [175, 238, 308, 384, 465, 550, 638, 728, 819, 728, 638, 550, 465, 384, 308, 238, 175];
I do not understand from MATLAB's documentation how to get those values. Is there some kind of formula that I can use for that?
In general, the MATLAB documentation puts the formulas in a section named "More About"; look at that section to understand which formula MATLAB implements.
This is the link to the "More About" section of the xcorr function:
https://it.mathworks.com/help/signal/ref/xcorr.html#bubr0h6
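In short, the raw (unscaled) estimate that xcorr returns by default is essentially (paraphrasing that section, for the autocorrelation case where y = x):

R(m) = sum over n from 0 to N-|m|-1 of x(n+|m|) * x(n),   for m = -lag, ..., lag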
For greater clarity, look at this code:
in = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13];
lag = 8;
N = length(in);
correlation = zeros(2*lag+1, 1);  % one value per lag m = -8..8
for m = -8:8
    correlation(m+8+1) = sum(in.*[zeros(1,abs(m)) in(1:N-abs(m))]);
end
Here sum(in.*[zeros(1,abs(m)) in(1:N-abs(m))]) computes the sum of the product between in and its shifted version. To build the shifted version of in, simply pad the first abs(m) elements with zeros; the remaining N-abs(m) elements are in(1:N-abs(m)). I've used abs because the lag m can be either negative or positive.
Try the code, and also print [zeros(1,abs(m)) in(1:N-abs(m))] for various values of m to get a better feel for what the shifted version of the vector looks like.
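If it helps, here is a quick NumPy cross-check of the same (raw, unscaled) values; the variable names are mine:

import numpy as np

x = np.arange(14)                       # same data as `in`
lag = 8
full = np.correlate(x, x, mode='full')  # all lags from -(N-1) to N-1
mid = len(x) - 1                        # index of lag 0
print(full[mid - lag : mid + lag + 1])
# [175 238 308 384 465 550 638 728 819 728 638 550 465 384 308 238 175]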
For homework: why do we use [zeros(1,abs(m)) in(1:N-abs(m))] and not [zeros(1,abs(m)) in(1:N)]?
P.S. In this case you are calculating the autocorrelation, so the y vector is x.
For more details about the theory, check the References section to see which books MATLAB refers to.
I'm kinda confused ... I first started with:
model = Sequential() # (32, 32, 3)
model.add(Conv2D(4, (3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(8, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
and two more models: model2 with one extra block of Conv2D, Conv2D, MaxPool, Dropout using 16 and 32 conv filters, and model3 with two extra blocks using 16, 32, 64, and 128 conv filters.
I got a maximum accuracy of about 75%, 78%, and 82%, respectively.
Then I changed the number of filters in the Conv2D layers of the first model to 16 and 32, and the number of nodes in the dense layer after Flatten to 300, and got an accuracy of 97%.
However, making the same changes to models 2 and 3, which are essentially the same as model 1 but with more layers, didn't improve accuracy and actually made it worse. I tried different numbers of filters in the extra conv layers and different numbers of nodes in the dense layer, from 300 to 1500, but nothing seems to make a difference. No matter what, the first model, with the fewest layers, seems to do the best.
Why is that?
My goal is to use MATLAB to verify circular convolution calculations. I try to do this using cconv.
However, MATLAB does not give the same answer to problems that I know the answer to. Why?
An example is the circular convolution modulo 4 between [1, 2, 4, 5, 6] and [7, 8, 9, 3], as can be found in this paper by Abassi.
According to the paper the answer is: [112, 91, 71, 88, 124].
But according to Matlab it is: [131, 127, 122, 106].
a = [1,2,4,5,6]
b = [7,8,9,3]
y = cconv(a,b,4)
ans =
131 127 122 106
What am I doing wrong here?
y = cconv(a,b,5)
The third argument should be 5, not 4, for what the paper describes.
The MATLAB code used in the Abbasi paper is given at the end:
A=fft(a);
B=fft(b);
y=ifft(A.*B);
I don't know why you use cconv if this does the job.
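As a quick cross-check (a NumPy sketch; fft(b, 5) zero-pads b to the length of a so the element-wise product is defined), a 5-point circular convolution reproduces the paper's answer:

import numpy as np

a = np.array([1, 2, 4, 5, 6])
b = np.array([7, 8, 9, 3])
N = len(a)                                      # 5-point circular convolution
y = np.fft.ifft(np.fft.fft(a, N) * np.fft.fft(b, N)).real
print(np.round(y).astype(int))                  # [112  91  71  88 124]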