Forward Propagation with Dropout - neural-network

I am working through Andrew Ng's new deep learning course on Coursera.
We are implementing the following code:
def forward_propagation_with_dropout(X, parameters, keep_prob = 0.5):
    np.random.seed(1)
    # retrieve parameters
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]
    # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    ### START CODE HERE ### (approx. 4 lines) # Steps 1-4 below correspond to the Steps 1-4 described above.
    D1 = np.random.rand(*A1.shape) # Step 1: initialize matrix D1 = np.random.rand(..., ...)
    D1 = (D1 < 0.5) # Step 2: convert entries of D1 to 0 or 1 (using keep_prob as the threshold)
    A1 = A1*D1 # Step 3: shut down some neurons of A1
    A1 = A1 / keep_prob # Step 4: scale the value of neurons that haven't been shut down
    ### END CODE HERE ###
    Z2 = np.dot(W2, A1) + b2
    A2 = relu(Z2)
    ### START CODE HERE ### (approx. 4 lines)
    D2 = np.random.rand(*A2.shape) # Step 1: initialize matrix D2 = np.random.rand(..., ...)
    D2 = (D2 < 0.5) # Step 2: convert entries of D2 to 0 or 1 (using keep_prob as the threshold)
    A2 = A2 * D2 # Step 3: shut down some neurons of A2
    A2 = A2 / keep_prob # Step 4: scale the value of neurons that haven't been shut down
    ### END CODE HERE ###
    Z3 = np.dot(W3, A2) + b3
    A3 = sigmoid(Z3)
    cache = (Z1, D1, A1, W1, b1, Z2, D2, A2, W2, b2, Z3, A3, W3, b3)
    return A3, cache
Calling:
X_assess, parameters = forward_propagation_with_dropout_test_case()
A3, cache = forward_propagation_with_dropout(X_assess, parameters, keep_prob = 0.7)
print ("A3 = " + str(A3))
My output was:
A3 = [[ 0.36974721 0.49683389 0.04565099 0.49683389 0.36974721]]
The expected output is:
A3 = [[ 0.36974721 0.00305176 0.04565099 0.49683389 0.36974721]]
Only one number is different. Any idea why?
I think it is because of the way I shaped D1 and D2.

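I think it is because you wrote D1 = (D1 < 0.5) and D2 = (D2 < 0.5), which hard-codes the threshold.
You need to use keep_prob instead of 0.5, i.e. D1 = (D1 < keep_prob) and D2 = (D2 < keep_prob). The test call passes keep_prob = 0.7, so with 0.5 a different set of neurons is shut down while the survivors are still scaled by 1/0.7.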
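A minimal sketch of the corrected masking step, written as a standalone helper. The name dropout_forward and the use of np.random.default_rng are assumptions made for illustration; the course code uses np.random.seed(1) with np.random.rand, so this sketch will not reproduce the grader's expected numbers. It only shows the role keep_prob plays in both the mask and the rescaling:

```python
import numpy as np

def dropout_forward(A, keep_prob, rng):
    """Apply inverted dropout to activations A (hypothetical helper)."""
    # Each unit is kept with probability keep_prob (True = keep).
    D = rng.random(A.shape) < keep_prob
    # Zero the dropped units and rescale the survivors by 1/keep_prob
    # so the expected activation stays the same.
    A_dropped = (A * D) / keep_prob
    return A_dropped, D

rng = np.random.default_rng(1)
A = np.ones((1000, 100))
A_dropped, D = dropout_forward(A, keep_prob=0.7, rng=rng)

kept_fraction = D.mean()             # should be close to keep_prob
mean_activation = A_dropped.mean()   # should stay close to the original mean of 1
print(kept_fraction, mean_activation)
```

If the threshold were left at 0.5 while still dividing by keep_prob = 0.7, only about half the units would survive and the mean activation would drift away from 1, which is the kind of mismatch the question ran into.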

Related

RuntimeError: Too many open files when I use pytorch dataloader. (Not opening any files)

I am trying to build and train a neural network using lower-level PyTorch.
I have the following simple toy code that defines and trains a 6-layer fully connected neural network. The dataset is from a simple sinusoidal function.
import numpy as np
import torch
from torch.utils.data import TensorDataset
# initialise network weights
W1 = torch.randn((1, 30), requires_grad=True)
W2 = torch.randn((30, 30), requires_grad=True)
W3 = torch.randn((30, 30), requires_grad=True)
W4 = torch.randn((30, 30), requires_grad=True)
W5 = torch.randn((30, 30), requires_grad=True)
W6 = torch.randn((30, 1), requires_grad=True)
B1 = torch.randn((30), requires_grad=True)
B2 = torch.randn((30), requires_grad=True)
B3 = torch.randn((30), requires_grad=True)
B4 = torch.randn((30), requires_grad=True)
B5 = torch.randn((30), requires_grad=True)
B6 = torch.randn((1), requires_grad=True)
def Neural_net(x, W1, W2, W3, W4, W5 , W6, B1, B2, B3, B4, B5, B6):
    # calculate hidden and output layers
    h1 = torch.tanh((x @ W1) + B1)
    h2 = torch.tanh((h1 @ W2) + B2)
    h3 = torch.tanh((h2 @ W3) + B3)
    h4 = torch.tanh((h3 @ W4) + B4)
    h5 = torch.tanh((h4 @ W5) + B5)
    output = (h5 @ W6) + B6
    return output
# generating dataset
features = torch.linspace(1,20,50)
features = features.view(len(features),1)
labels = torch.sin(0.5*features)
# Creating Dataloader
data_size = np.shape(features)[0]
data_set = torch.FloatTensor(features)
labels = torch.FloatTensor(labels)
dataset = TensorDataset(data_set, labels)
num_batches = 10
dataloader = torch.utils.data.DataLoader(dataset, batch_size=data_size//num_batches,
                                         shuffle=True, num_workers=2, drop_last=False)
# training
num_epochs = 10000
criterion = torch.nn.MSELoss()
h = 0.01/num_batches
for epoch in range(num_epochs):
    for i, data in enumerate(dataloader):
        x = data[0]
        y = data[1]
        dL1 = 0
        dL2 = 0
        dL3 = 0
        dL4 = 0
        dL5 = 0
        dL6 = 0
        dLb1 = 0
        dLb2 = 0
        dLb3 = 0
        dLb4 = 0
        dLb5 = 0
        dLb6 = 0
        # forward
        outputs = Neural_net(x, W1, W2, W3, W4, W5,W6, B1, B2, B3, B4, B5, B6)
        loss = criterion(outputs, y)
        # backward
        dL1 = torch.autograd.grad(loss,W1,create_graph=True)[0]
        dL2 = torch.autograd.grad(loss,W2,create_graph=True)[0]
        dL3 = torch.autograd.grad(loss,W3,create_graph=True)[0]
        dL4 = torch.autograd.grad(loss,W4,create_graph=True)[0]
        dL5 = torch.autograd.grad(loss,W5,create_graph=True)[0]
        dL6 = torch.autograd.grad(loss,W6,create_graph=True)[0]
        dLb1 = torch.autograd.grad(loss,B1,create_graph=True)[0]
        dLb2 = torch.autograd.grad(loss,B2,create_graph=True)[0]
        dLb3 = torch.autograd.grad(loss,B3,create_graph=True)[0]
        dLb4 = torch.autograd.grad(loss,B4,create_graph=True)[0]
        dLb5 = torch.autograd.grad(loss,B5,create_graph=True)[0]
        dLb6 = torch.autograd.grad(loss,B6,create_graph=True)[0]
        # optimise
        W1 = W1 - h * dL1
        W2 = W2 - h * dL2
        W3 = W3 - h * dL3
        W4 = W4 - h * dL4
        W5 = W5 - h * dL5
        W6 = W6 - h * dL6
        #
        B1 = B1 - h * dLb1
        B2 = B2 - h * dLb2
        B3 = B3 - h * dLb3
        B4 = B4 - h * dLb4
        B5 = B5 - h * dLb5
        B6 = B6 - h * dLb6
    if epoch%10 == 0:
        print('epoch = ',epoch,'loss = ', loss)
The problem is that, while the code seems to work at first and trains the network for a number of epochs, it stops after some time with the following RuntimeError:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Input In [24], in <cell line: 5>()
3 h = 0.01/num_batches
5 for epoch in range(num_epochs): # loop over the dataset multiple times
----> 7 for i, data in enumerate(dataloader):#, 0):
8 x = data[0]
9 y = data[1]
File ~/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py:517, in _BaseDataLoaderIter.__next__(self)
515 if self._sampler_iter is None:
516 self._reset()
--> 517 data = self._next_data()
518 self._num_yielded += 1
519 if self._dataset_kind == _DatasetKind.Iterable and \
520 self._IterableDataset_len_called is not None and \
521 self._num_yielded > self._IterableDataset_len_called:
File ~/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py:1182, in _MultiProcessingDataLoaderIter._next_data(self)
1179 return self._process_data(data)
1181 assert not self._shutdown and self._tasks_outstanding > 0
-> 1182 idx, data = self._get_data()
1183 self._tasks_outstanding -= 1
1184 if self._dataset_kind == _DatasetKind.Iterable:
1185 # Check for _IterableDatasetStopIteration
File ~/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py:1148, in _MultiProcessingDataLoaderIter._get_data(self)
1144 # In this case, `self._data_queue` is a `queue.Queue`,. But we don't
1145 # need to call `.task_done()` because we don't use `.join()`.
1146 else:
1147 while True:
-> 1148 success, data = self._try_get_data()
1149 if success:
1150 return data
File ~/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py:1013, in _MultiProcessingDataLoaderIter._try_get_data(self, timeout)
1011 except OSError as e:
1012 if e.errno == errno.EMFILE:
-> 1013 raise RuntimeError(
1014 "Too many open files. Communication with the"
1015 " workers is no longer possible. Please increase the"
1016 " limit using `ulimit -n` in the shell or change the"
1017 " sharing strategy by calling"
1018 " `torch.multiprocessing.set_sharing_strategy('file_system')`"
1019 " at the beginning of your code") from None
1020 raise
RuntimeError: Too many open files. Communication with the workers is no longer possible. Please increase the limit using `ulimit -n` in the shell or change the sharing strategy by calling `torch.multiprocessing.set_sharing_strategy('file_system')` at the beginning of your code
Any help would be much appreciated.

Artificial neural network in Octave

I'm having trouble with an easy exercise about an artificial neural network with 2 features, a hidden layer of 5 neurons and two possible outputs (0 or 1).
My X matrix is a 51x2 matrix, and y is a 51x1 vector.
I know I'm not supposed to use while E>1, but I wanted to see whether my error would eventually drop below 1.
I'd like to know what I am doing wrong. My error doesn't seem to decrease (it stays around 1.5 no matter how many iterations I run). Do you see where in the code I am making a mistake? I'm supposed to use gradient descent.
function [E, v,w] = costFunction(X, y,alpha1,alpha2)
  [m n] = size(X);
  E = 1;
  v = 2*rand(5,3)-1;
  w = 2*rand(2,6)-1;
  grad_v=zeros(size(v));
  grad_w=zeros(size(w));
  K = 2;
  E = 2;
  while E> 1
    a1 = [ones(m,1) X];
    z2 = a1 * v';
    a2 = sigmoid(z2);
    a2 = [ones(size(a2,1),1),a2];
    z3 = a2 * w';
    h = sigmoid(z3);
    cost = sum((-y.*log(h)) - ((1-y).*log(1-h)),2);
    E = (1/m)*sum(cost);
    Delta1=0;
    Delta2=0;
    for t = 1:m
      a1 = [1;X(t,:)'];
      z2 = v * a1;
      a2 = sigmoid(z2);
      a2 = [1;a2];
      z3 = w * a2;
      a3 = sigmoid(z3);
      d3 = a3 - y(t,:)';
      d2 = (w(:,2:end)'*d3).*sigmoidGradient(z2);
      Delta2 += (d3*a2');
      Delta1 += (d2*a1');
    end
    grad_v = (1/m) * Delta1;
    grad_w = (1/m) * Delta2;
    v -= alpha1 * grad_v;
    w -= alpha2 * grad_w;
  end
end

Equating symbolic coefficients

I would like to find a particular solution y_p of the ODE y'' - y' - 2y = 4x^2.
I wrote the following script:
syms x A0 A1 A2
ypa = A2*x^2+A1*x+A0; % y_p assume
cyp = diff(ypa,2) - diff(ypa) - 2*ypa % according to ODE
P1 = 4*x^2; P2 = cyp ; % Equating P1 and P2
C = coeffs(P1 - P2,x);
A0 = solve(C(1),A0)
A1 = solve(C(2),A1)
A2 = solve(C(3),A2)
I got the correct answer for A2 (-2), but not for A0 (should be -3) or A1 (should be 2). How can I get them automatically?
P.S. I'm using MATLAB R2013a.
Instead of calling solve 3 times, once on each equation of C, you should call it once on the entire system of equations so that the proper substitutions are done to give you a numeric result for each variable:
>> [A0, A1, A2] = solve(C)
A0 =
-3
A1 =
2
A2 =
-2
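To see why the three coefficients have to be solved together, the system can also be written out by hand. Substituting y_p = A2*x^2 + A1*x + A0 into y'' - y' - 2y gives -2*A2*x^2 + (-2*A2 - 2*A1)*x + (2*A2 - A1 - 2*A0), and matching this against 4x^2 yields three linear equations in which only the x^2 equation contains A2 alone. A minimal sketch in Python with NumPy (not the MATLAB Symbolic Math Toolbox; the matrix M is derived by hand from the expansion above and is an illustration, not part of the original script):

```python
import numpy as np

# Unknowns are ordered as [A0, A1, A2].
# Row i equates the coefficient of x^i in (y_p'' - y_p' - 2*y_p)
# with the coefficient of x^i in 4*x^2.
M = np.array([
    [-2.0, -1.0,  2.0],   # x^0: -2*A0 - A1 + 2*A2 = 0
    [ 0.0, -2.0, -2.0],   # x^1: -2*A1 - 2*A2 = 0
    [ 0.0,  0.0, -2.0],   # x^2: -2*A2 = 4
])
rhs = np.array([0.0, 0.0, 4.0])

# Solving the whole system at once performs the back-substitution
# that three separate single-equation solves leave undone.
A0, A1, A2 = np.linalg.solve(M, rhs)
print(A0, A1, A2)
```

The x^0 and x^1 rows still involve A1 and A2, which is why solving them one at a time returns expressions in the other unknowns rather than numbers.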

Compute the change of basis matrix in Matlab

I have an assignment where I need to write a function which, given two bases (each represented as a matrix of vectors), returns the change of basis matrix from one basis to the other.
So far this is the function I came up with, based on the algorithm that I will explain next:
function C = cob(A, B)
    % Returns C, which is the change of basis matrix from A to B,
    % that is, given basis A and B, we represent B in terms of A.
    % Assumes that A and B are square matrices
    n = size(A, 1);
    % Creates a square matrix full of zeros
    % of the same size as the number of rows of A.
    C = zeros(n);
    for i=1:n
        C(i, :) = (A\B(:, i))';
    end
end
And here are my tests:
clc
clear out
S = eye(3);
B = [1 0 0; 0 1 0; 2 1 1];
D = B;
disp(cob(S, B)); % Returns cob matrix from S to B.
disp(cob(B, D));
disp(cob(S, D));
Here's the algorithm that I used, based on some notes. If I have two bases B = {b1, ... , bn} and D = {d1, ... , dn} for a certain vector space, and I want to represent basis D in terms of basis B, I need to find a change of basis matrix S. The vectors of these bases are related as follows:
(d1 ... dn)^T = S * (b1, ... , bn)^T
Or, by splitting up all the rows:
d1 = s11 * b1 + s12 * b2 + ... + s1n * bn
d2 = s21 * b1 + s22 * b2 + ... + s2n * bn
...
dn = sn1 * b1 + sn2 * b2 + ... + snn * bn
Note that d1, b1, d2, b2, etc, are all column vectors. This can be further represented as
d1 = [b1 b2 ... bn] * [s11; s12; ... s1n];
d2 = [b1 b2 ... bn] * [s21; s22; ... s2n];
...
dn = [b1 b2 ... bn] * [sn1; sn2; ... snn];
Let's call the matrix [b1 b2 ... bn], whose columns are the vectors of B, A, so we have:
d1 = A * [s11; s12; ... s1n];
d2 = A * [s21; s22; ... s2n];
...
dn = A * [sn1; sn2; ... snn];
Note that what we need now to find are all the entries sij for i=1...n and j=1...n. We can do that by left-multiplying both sides by the inverse of A, i.e. by A^(-1).
So, S might look something like this
S = [s11 s12 ... s1n;
s21 s22 ... s2n;
...
sn1 sn2 ... snn;]
If this idea is correct, then finding the change of basis matrix S from B to D is exactly what my code does.
Is my idea correct? If not, what's wrong? If yes, can I improve it?
Things become much easier when one has an intuitive understanding of the algorithm.
There are two key points to understand here:
1. C(B,B) is the identity matrix (i.e., do nothing to change from B to B)
2. C(E,D)C(B,E) = C(B,D); think of this as B -> E -> D = B -> D
A direct corollary of 1 and 2 is
C(E,D)C(D,E) = C(D,D), the identity matrix
in other words
C(E,D) = C(D,E)^(-1)
Summarizing.
Algorithm to calculate the matrix C(B,D) to change from B to D:
1. Define C(B,E) = [b1, ..., bn] (column vectors)
2. Define C(D,E) = [d1, ..., dn] (column vectors)
3. Compute C(E,D) as the inverse of C(D,E).
4. Compute C(B,D) as the product C(E,D)C(B,E).
Example
B = {(1,2), (3,4)}
D = {(1,1), (1,-1)}
C(B,E) = | 1  3 |
         | 2  4 |
C(D,E) = | 1  1 |
         | 1 -1 |
C(E,D) = | .5  .5 |
         | .5 -.5 |
C(B,D) = | .5  .5 | | 1  3 | = | 1.5  3.5 |
         | .5 -.5 | | 2  4 |   | -.5  -.5 |
Verification
1.5 d1 + -.5 d2 = 1.5(1,1) + -.5(1,-1) = (1,2) = b1
3.5 d1 + -.5 d2 = 3.5(1,1) + -.5(1,-1) = (3,4) = b2
which shows that the columns of C(B,D) are in fact the coordinates of b1 and b2 in the basis D.
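The worked example above can be checked numerically. The following is a minimal sketch in Python with NumPy rather than MATLAB; the function name change_of_basis is hypothetical, and it assumes both bases are given as square matrices whose columns are the basis vectors written in the standard basis E:

```python
import numpy as np

def change_of_basis(B, D):
    """Return C(B,D): column j holds the coordinates of b_j in basis D.

    B and D are square matrices whose columns are the basis vectors
    (hypothetical helper, following the notation of the answer above).
    """
    # Solves D @ C = B, i.e. C = inv(D) @ B, without forming the inverse.
    return np.linalg.solve(D, B)

B = np.array([[1.0, 3.0],
              [2.0, 4.0]])    # columns b1 = (1,2), b2 = (3,4)
D = np.array([[1.0, 1.0],
              [1.0, -1.0]])   # columns d1 = (1,1), d2 = (1,-1)

C = change_of_basis(B, D)
print(C)
```

In MATLAB the same computation would be the single expression D\B applied to the whole matrix, so no loop over columns is needed.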

Cheap hash of three inputs independent of their order

I have a module that takes three inputs, each of which is three bits wide.
output = f(inputA, inputB, inputC)
The output depends on the values of the three inputs but does not depend on their order.
i.e. f(inputA, inputB, inputC) = f(inputB, inputC, inputA)
The solution should work well for both FPGAs and ASICs. Currently I am implementing it without taking advantage of the symmetry, but I assume that explicitly forcing the synthesizer to consider the symmetry will result in a better implementation.
I am planning on implementing this using a hash, h, of the three inputs that does not depend on their order. I can then do:
hash <= h(inputA, inputB, inputC);
output <= VALUE0 when hash = 0 else
          VALUE1 when hash = 1 else
          .....
My question is what should I use for the hash function?
My thoughts so far:
If each input is 3 bits wide there are 512 possibilities, but only 120 when you consider the symmetry, so theoretically I should be able to use a hash that is 7 bits wide. Practically it may need to be longer.
Each bit of the hash is a function of the input bits and must respect the symmetry of the three inputs. The bits of the hash should be independent from one another. But I'm not sure how to generate these functions.
As mentioned in your question, you could sort and concatenate your inputs.
In pseudo code:
if (A < B)
    swap(A, B);
if (B < C)
    swap(B, C);
if (A < B)
    swap(A, B);
As a block diagram: [figure not available]
The 6-in/6-out function needed for a "conditional swap" block:
A3x = A3 B3 ;
A2x = A3 B3' B2 + A3' A2 B3 + A2 B2 ;
A1x = A2 B3' B2' B1 + A3' A2' A1 B2 + A3 A2 B2' B1
    + A2' A1 B3 B2 + A3 B3' B1 + A3' A1 B3 + A1 B1;
B3x = B3 + A3 ;
B2x = A3' B2 + A2 B3' + B3 B2 + A3 A2 ;
B1x = A3' A2' B1 + A1 B3' B2' + A2' B3 B1 + A3 A1 B2'
    + A3' B2 B1 + A2 A1 B3' + A3' B3 B1 + A3 A1 B3'
    + B3 B2 B1 + A3 A2 A1 ;
I have to admit that this solution is not exactly "cheap" and results in a 9-bit hash rather than in a 7-bit hash. Therefore, a look-up table might in fact be the best solution.
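Both options mentioned above, the 9-bit sort-and-concatenate key and a look-up table that compresses it to 7 bits, can be prototyped in software before committing to HDL. The following is a behavioural sketch in Python, not synthesizable code; the names sorted_key, LUT and dense_hash are made up for illustration, and numbering the sorted triples in enumeration order is just one possible assignment of the 120 codes:

```python
from itertools import combinations_with_replacement, permutations, product

def sorted_key(a, b, c):
    """9-bit order-independent key: sort the 3-bit inputs, then concatenate."""
    hi, mid, lo = sorted((a, b, c), reverse=True)
    return (hi << 6) | (mid << 3) | lo

# Dense 7-bit index: number every sorted triple 0..119 (the look-up table).
LUT = {sorted_key(*t): i
       for i, t in enumerate(combinations_with_replacement(range(8), 3))}

def dense_hash(a, b, c):
    """7-bit order-independent hash obtained through the look-up table."""
    return LUT[sorted_key(a, b, c)]

num_classes = len(LUT)

# Exhaustive check over all 512 input combinations and their reorderings.
order_independent = all(
    dense_hash(*p) == dense_hash(a, b, c)
    for a, b, c in product(range(8), repeat=3)
    for p in permutations((a, b, c))
)
print(num_classes, order_independent)
```

A model like this can also generate the table contents or test vectors for the hardware version, whichever of the two encodings is chosen.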