Neural network - exercise

I am currently teaching myself the concept of neural networks, and I am working with the very good pdf from
http://neuralnetworksanddeeplearning.com/chap1.html
There are also a few exercises, which I did, but there is one exercise (or at least one step of it) that I really don't understand.
Task:
There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first 3 layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least 0.99, and incorrect outputs have activation less than 0.01.
I also found the solution, as can be seen in the second image.
I understand why the matrix has to have this shape, but I really struggle to understand the step where the user calculates
0.99 + 3*0.01
4*0.01
I really don't understand these two steps. I would be very happy if someone could help me understand this calculation.
Thank you very much for your help.

The output of the previous layer is a 10x1 vector x. The weight matrix W is 4x10, so the new output layer will be 4x1. There are two assumptions to go through.
First, suppose x is 1 in exactly one row, e.g. xT = [1 0 0 0 0 0 0 0 0 0]. If you multiply this vector with the matrix W, your output will be yT = [0 0 0 0], because there is only a single 1 in x. The multiplication with W just picks out the 0th column of W, which is all zeros.
The second assumption is: what if x is not exactly 1 anymore? Instead of a one, x can be xT = [0.99 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01]. If you multiply this x with the first row of W, the result is 0.05 (I believe there is a typo in the solution here). When xT = [0.01 0.99 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01], multiplication with the first row of W gives 1.03, because:
0.01*0 + 0.99*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 = 1.03
So I believe there is a typo: the author probably assumed 4 ones in the first row of W, which is not true, because there are 5 ones. If there really were 4 ones in the first row, then the results would be 0.04 for 0.99 in the first row of x and 1.02 for 0.99 in the second row of x.
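As a quick numerical check of that argument (my own sketch, not from the original post; it takes the first row of W to be the least significant bit of the digits 0-9, which indeed contains 5 ones):
w_row1 = [0 1 0 1 0 1 0 1 0 1];   % first row of W: least significant bit of digits 0..9 (5 ones)
x1 = [0.99 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01]';   % network is confident the digit is 0
x2 = [0.01 0.99 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01]';   % network is confident the digit is 1
w_row1 * x1   % 0.05 = 0.99*0 + 5*0.01
w_row1 * x2   % 1.03 = 0.99*1 + 4*0.01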

How to create an adjacency/joint probability matrix in matlab

From a binary matrix, I want to calculate a kind of adjacency/joint probability density matrix (not quite sure how to label it, so please feel free to rename it).
For example, I start with this matrix:
A = [1 1 0 1 1
1 0 0 1 1
0 0 0 1 0]
I want to produce this output:
Output = [1 4/5 1/5
4/5 1 1/5
1/5 1/5 1]
Basically, for each pair of rows, I want to calculate the proportion of positions where they agree (1 and 1, or 0 and 0). A row always agrees with itself and thus has a 1 along the diagonal. No matter how many different js are added, the result will still be 3x3, but an extra i variable will result in a 4x4.
I like to think of the inputs along i in the A matrix as the persons and the js as the questions, so the final output is a 3x3 (number of persons) matrix.
I am having some trouble with this in Matlab. If you could please point me in the right direction, that would be fabulous.
So, you can do this in two parts.
bothOnes = A*A';
gives you a matrix showing how many 1s each pair of rows share, and
bothZeros = (1-A)*(1-A)';
gives you a matrix showing how many 0s each pair of rows share.
If you just add them up, you get how many elements they share of either type:
bothSame = A*A' + (1-A)*(1-A)';
Then just divide by the row length to get the desired fractional representation:
output = (A*A' + (1-A)*(1-A)') / size(A, 2);
That should get you there.
Note that this only works if A contains only 1's and 0's, but it can be adapted for other cases.
Here are some alternatives, assuming A can only contain 0 and 1:
If you have the Statistics Toolbox:
result = 1-squareform(pdist(A, 'hamming'));
Manual approach with implicit expansion:
result = mean(permute(A, [1 3 2])==permute(A, [3 1 2]), 3);
Using bitwise operations. This is a more esoteric approach, and is only valid if A has at most 53 columns, due to floating-point limitations:
t = bin2dec(char(A+'0')); % convert each row from binary to decimal
u = bitxor(t, t.'); % bitwise xor
v = mean(dec2bin(u, size(A,2))-'0', 2); % fraction of differing bits, padded to the full width
result = 1 - reshape(v, size(A,1), []); % reshape to obtain result

proper way to normalize my distance matrices (matlab)

I have a question about a comparison that I want to make between two distance matrices. Let's say that I have my ground truth matrix:
gt = [1 0 0 0 1;
0 1 0 0 1;
0 0 1 0 0;
0 0 0 1 0];
and then I have two other extracted matrices:
v1 = [0.6136 0.1012 0.1146 0.1647 0.7445;
0.2264 0.7457 -0.0015 -0.0093 1.0026;
-0.0107 0.1975 1.1219 0.1699 0.1926;
-0.0019 0.0564 0.1560 0.7723 0.0565];
v2 = [0.8209 0.1390 0.1538 0.0203 0.9997;
0.2295 0.7720 -0.0028 -0.0112 1.0329;
-0.0167 0.2593 0.8172 0.2227 0.2501;
-0.0000 0.0549 0.1561 1.2728 0.0569];
Then I want to extract the distance matrix from the columns of each of the above matrices to the columns of the ground truth matrix gt. The way I am getting this distance is dist1 = pdist2(gt', v1','euclidean'); and dist2 = pdist2(gt', v2','euclidean');. However, the two resulting distance matrices are not comparable, right? Since the value ranges of the v1 and v2 matrices are different, I need to apply some kind of normalization in order to be able to draw conclusions from the result (please correct me if I am wrong).
However, I am not sure whether this should happen before or after I compute the distance matrices, and what type of normalization to use. The negative values play a penalizing role (for that reason I am saying that I might need to apply the normalization after I compute the distance matrices; otherwise my first pick would be to normalize v1 and v2 before computing their distances to gt), so their effect should be kept even after the normalization.
Can you please give some feedback on how, and what type of, normalization to apply?
Thanks

Line chart with means and CIs in Stata

I have a data set with effect estimates for different points in time (1 month, 2 months, 6 months, 12 months and 18 months) and their standard errors. Now I want to plot the means for each period and the corresponding CIs around the means.
My sample looks like:
effect horizon se
0.03 1 0.2
0.02 6 0.01
0.01 6 0.3
0.00 1 0.4
0.04 18 0.2
0.02 2 0.05
0.01 2 0.02
... ...
The means of the effects for each horizon lead to 5 data points that I want to plot in a line chart together with the confidence intervals. I tried this:
egen means = mean(effect), by(horizon)
line means horizon
But how can I add the symmetric confidence bands? Such that I get something that looks like this:
Not entirely certain that this makes sense statistically, but here's how I might do this:
gen variance = se^2
collapse (mean) effect (sum) SV = variance (count) noobs = effect, by(horizon)
gen se_mean = sqrt(SV*(1/noobs)^2)
gen LB = effect - 1.96*se_mean
gen UB = effect + 1.96*se_mean
twoway (rline LB UB horizon, lpattern(dash dash)) (line effect horizon, lpattern(solid)), yline(0, lcolor(gray))
Which yields:
To get the SE of the mean effects T̅, I am using the formula
V(T̅) = (1/n²) · Σᵢ V(Tᵢ)
(which assumes the covariances of the effects are all zero). I then take the square root to get the SE of T̅.
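Written out, this gives
SE(T̅) = sqrt(V(T̅)) = sqrt((1/n²) · Σᵢ V(Tᵢ)) = sqrt(SV)/noobs,
which is exactly what the line gen se_mean = sqrt(SV*(1/noobs)^2) above computes, with SV the per-horizon sum of variances and noobs the number of effect estimates n at that horizon.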

Histogram Equalization method without use of histeq

I am new to Matlab and am trying to implement code that performs the same function as histeq, without actually using that function. With my code the image colour changes drastically, when it should not change that much. The average intensity in the image (ranging between 0 and 255) is 105.3196. The image is of an open source pollen particle.
Any help would be much appreciated. The sooner the better! Please keep any help simple, as my Matlab understanding is limited. Thanks.
clc;
clear all;
close all;
pollenJpg = imread ('pollen.jpg', 'jpg');
greyscalePollen = rgb2gray (pollenJpg);
histEqPollen = histeq(greyscalePollen);
averagePollen = mean2 (greyscalePollen)
sizeGreyScalePollen = size(greyscalePollen);
rowsGreyScalePollen = sizeGreyScalePollen(1,1);
columnsGreyScalePollen = sizeGreyScalePollen(1,2);
for i = (1:rowsGreyScalePollen)
    for j = (1:columnsGreyScalePollen)
        if (greyscalePollen(i,j) > averagePollen)
            greyscalePollen(i,j) = greyscalePollen(i,j) + (0.1 * averagePollen);
            if (greyscalePollen(i,j) > 255)
                greyscalePollen(i,j) = 255;
            end
        elseif (greyscalePollen(i,j) < averagePollen)
            greyscalePollen(i,j) = greyscalePollen(i,j) - (0.1 * averagePollen);
            if (greyscalePollen(i,j) > 0)
                greyscalePollen(i,j) = 0;
            end
        end
    end
end
figure;
imshow (pollenJpg);
title ('Original Image');
figure;
imshow (greyscalePollen);
title ('Attempted Histogram Equalization of Image');
figure;
imshow (histEqPollen);
title ('True Histogram Equalization of Image');
To implement the equalisation algorithm described on the Wikipedia page, follow these steps:
Decide on a binSize to group greyscale values into. (This is tweakable: the larger the bin, the further the result can drift from the ideal case, but I think choosing it too small can cause problems on real images.)
Then, calculate the probability of a pixel being a shade of grey:
pixelCount = imageWidth * imageHeight
histogram = all zero
for each pixel in image at coordinates i, j
    histogram[floor(pixel / binSize) + 1] += 1 / pixelCount   // 1-based arrays, not 0-based
    // Note a technicality here: you may need to write special code
    // to handle pixels of exactly 255, because they fall into a bin
    // of their own. Or instead use rounding with an offset.
The histogram in this calculation is scaled (divided by the pixel count) so that the values make sense as probabilities. You can of course factor the division out of the for loop.
Now you need to calculate the cumulative sum of this:
histogramSum = all zero   // length of histogramSum must be one bigger than histogram
for i = 1 .. length(histogram)
    histogramSum[i + 1] = histogramSum[i] + histogram[i]
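For reference, a minimal MATLAB sketch of these two steps (my own sketch, not from the original answer; it assumes the greyscalePollen image from the question, 10 bins, and folds the lone value 255 into the last bin rather than giving it a bin of its own):
img = greyscalePollen;                         % uint8 greyscale image from the question
nBins = 10;
binSize = 255 / nBins;
pixelCount = numel(img);
binIdx = floor(double(img(:)) / binSize) + 1;  % 1-based bin index for every pixel
binIdx = min(binIdx, nBins);                   % fold the single value 255 into the last bin
histogram = accumarray(binIdx, 1, [nBins 1]) / pixelCount;   % scaled histogram (probabilities)
histogramSum = [0; cumsum(histogram)];         % cumulative sum, one element longer than histogram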
Now you have to invert this function, and this is the tricky part. It is best not to calculate an explicit inverse, but to calculate it on the spot and apply it to the image directly. The basic idea is to search for the pixel value in histogramSum (find the closest index below), and then do a linear interpolation between that index and the next index.
foreach pixel in image at coordinates i, j
    hIndex = findIndex(pixel, histogramSum)   // You have to write findIndex, it should be simple
    equalisationFactor = (pixel - histogramSum[hIndex]) / (histogramSum[hIndex + 1] - histogramSum[hIndex]) * binSize
    // The line above is the linear interpolation step.
    // Notice the technicality that you need to handle:
    // histogramSum[hIndex + 1] may be out of bounds
    equalisedImage[i, j] = pixel * equalisationFactor
Edit: without drilling into the maths, I can't be 100% sure, but I think that division by 0 errors are possible. These can occur if one bin is empty, so consecutive sums are equal. So you need special code to handle this case too. The best you can do is take the value for the factor as halfway between hIndex, hIndex + n, where n is the highest value for which histogramSum[hIndex + n] == histogramSum[hIndex].
And that should be it, once you have dealt with all the technicalities.
The above algorithm is slow (especially in the findIndex step). You may be able to optimize it with a special lookup data structure, but only do that when it's working, and only if necessary.
One more thing about your Matlab code: the rows and columns are inverted. Because of the symmetry in the algorithm, the result is the same, but it can cause puzzling bugs in other algorithms, and be very confusing if you examine pixel values during debugging. In the pseudocode above I used them the same as you, though.
Relatively few (5) lines of code can do this. I used a low contrast file called 'pollen.jpg' that I found at http://commons.wikimedia.org/wiki/File%3ALepismium_lorentzianum_pollen.jpg
I read it in using your code, ran all of the above, and then did the following:
% find out the index of pixels sorted by intensity:
[gv gi] = sort(greyscalePollen(:));
% create a table of "approximately equal" intensity values:
N = numel(gv);
newVals = repmat(0:255, [ceil(N/256) 1]);
% perform lookup:
% the pixels in sorted order need new values from "equal bins" table:
newImg = zeros(size(greyscalePollen));
newImg(gi) = newVals(1:N);
% if the size of the image doesn't divide into 256, the last bin will have
% slightly fewer pixels in it than the others
When I run this algorithm, and then create a composite of the four images (original, your attempt, my attempt, and histeq), you get the following:
I think it's convincing. The images are not exactly identical - I believe that is because the Matlab histeq routine ignores all pixels with value 0. Since it is fully vectorized, it is also pretty fast (although not nearly as fast as histeq, which beat it by about a factor of 15 on my image).
EDIT: a bit of explanation might be in order. The repmat command I use to create the newVals matrix creates a matrix that looks like this:
0 1 2 3 4 ... 255
0 1 2 3 4 ... 255
0 1 2 3 4 ... 255
...
0 1 2 3 4 ... 255
Since matlab stores matrices in "first index first" order, if you read this matrix with a single index (as I do in the line newVals(1:N)), you access first all the zeros, then all the ones, etc:
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 ...
So - when I know the indices of the pixels in order of their intensity (as returned by the second output of the sort command, which I called gi), I can easily assign the value 0 to the first N/256 pixels, the value 1 to the next N/256, etc., with the command I used:
newImg(gi) = newVals(1:N);
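A tiny runnable example of that column-major read-out, in case it helps (not from the original answer):
M = repmat(0:3, [2 1])   % two stacked copies of 0..3
M(:).'                   % 0 0 1 1 2 2 3 3  -- each value repeated twice, read column by column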
I hope this makes the code a little easier to understand.

Extremely large weighted average

I am using 64-bit Matlab with 32 GB of RAM (just so you know).
I have a file (vector) of 1.3 million numbers (integers). I want to make another vector of the same length, where each point is a weighted average of the entire first vector, weighted by the inverse distance from that position (actually it's position^-0.1, not ^-1, but for example purposes). I can't use Matlab's 'filter' function, because it can only average things before the current point, right? To explain more clearly, here's an example with 3 elements:
data = [ 2 6 9 ]
weights = [ 1 1/2 1/3; 1/2 1 1/2; 1/3 1/2 1 ]
results=data*weights= [ 8 11.5 12.666 ]
i.e.
8 = 2*1 + 6*1/2 + 9*1/3
11.5 = 2*1/2 + 6*1 + 9*1/2
12.666 = 2*1/3 + 6*1/2 + 9*1
So each point in the new vector is the weighted average of the entire first vector, weighting by 1/(distance from that position+1).
I could just remake the weight vector for each point and then calculate the results vector element by element, but this requires 1.3 million iterations of a for loop, each of which contains 1.3 million multiplications. I would rather use straight matrix multiplication, multiplying a 1 x 1.3M vector by a 1.3M x 1.3M matrix, which works in theory, but I can't load a matrix that large.
I am then trying to make the matrix using a shell script and index it in matlab so only the relevant column of the matrix is called at a time, but that is also taking a very long time.
I don't have to do this in Matlab, so any advice people have about utilizing such large numbers and getting averages would be appreciated. Since I am using a weight of ^-0.1, and not ^-1, it does not drop off that fast - the millionth point is still weighted at 0.25 compared to the original point's weighting of 1, so I can't just cut it off as it gets big either.
Hope this was clear enough?
Here is the code for the answer below (so it can be formatted?):
data = load('/Users/mmanary/Documents/test/insertion.txt');
data=data.';
total=length(data);
x=1:total;
datapad=[zeros(1,total) data];
weights = ([(total+1):-1:2 1:total]).^(-.4);
weights = weights/sum(weights);
Fdata = fft(datapad);
Fweights = fft(weights);
Fresults = Fdata .* Fweights;
results = ifft(Fresults);
results = results(1:total);
plot(x,results)
The only sensible way to do this is with FFT convolution, the same idea that underpins fast filtering routines. It is very easy to do manually:
% Simulate some data
n = 10^6;
x = randi(10,1,n);
xpad = [zeros(1,n) x];
% Setup smoothing kernel
k = 1 ./ [(n+1):-1:2 1:n];
% FFT convolution
Fx = fft(xpad);
Fk = fft(k);
Fxk = Fx .* Fk;
xk = ifft(Fxk);
xk = xk(1:n);
Takes less than half a second for n=10^6!
This is probably not the best way to do it, but with lots of memory you could definitely parallelize the process.
You can construct sparse matrices, each consisting of the entries of your weight matrix that have the value i^(-1) (where i = 1 .. 1.3 million), multiply each of them with your original vector, and sum all the results together.
So for your example the product would be essentially:
a = rand(3,1);
b1 = [1 0 0;
0 1 0;
0 0 1];
b2 = [0 1 0;
1 0 1;
0 1 0] / 2;
b3 = [0 0 1;
0 0 0;
1 0 0] / 3;
c = sparse(b1) * a + sparse(b2) * a + sparse(b3) * a;
Of course, you wouldn't construct the sparse matrices this way. If you wanted fewer iterations of the inner loop, you could put more than one of the i's in each matrix.
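As a rough sketch of that idea (not the author's code; it assumes a column data vector x and the 1/(distance+1) weighting from the question), each band of equal weight can be built as a sparse matrix with spdiags and applied in turn:
n = numel(x);
result = x;                                    % the distance-0 band has weight 1
for d = 1:n-1
    w = 1 / (d + 1);                           % weight for all entries with |i - j| == d
    B = spdiags(w * ones(n, 2), [-d d], n, n); % sparse matrix holding just that band
    result = result + B * x;                   % add this band's contribution
end
This still does the same O(n^2) work overall; grouping several distances into one sparse matrix, as suggested above, is what reduces the number of loop iterations.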
Look into the parfor loop in MATLAB: http://www.mathworks.com/help/toolbox/distcomp/parfor.html
I can't use matlab's 'filter' function, because it can only average
things before the current point, right?
That is not correct. You can always add samples (i.e. pad with zeros) to your data, or remove samples from the filtered data. Since filtering with filter (you can also use conv, by the way) is a linear operation, this won't change the result: it is like adding and then removing zeros, which does nothing, and linearity allows you to swap the order to add samples -> filter -> remove samples.
Anyway, in your example, you can take the averaging kernel to be:
weights = 1 ./ [3 2 1 2 3]; % this kernel introduces a delay of 2 samples
and then simply:
result = filter(weights, 1, [data, zeros(1,3)]); % or conv(data, weights)
% removing the delay introduced by the kernel
result = result(3:end-1);
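For the 3-element example from the question, this reproduces the expected result (a quick check of my own, not part of the original answer):
data = [2 6 9];
weights = 1 ./ [3 2 1 2 3];                      % symmetric 1/(distance+1) kernel
result = filter(weights, 1, [data, zeros(1,3)]);
result = result(3:end-1)                         % 8.0000  11.5000  12.6667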
You considered only 2 options: multiplying a 1.3M x 1.3M matrix with a vector once, or multiplying two 1.3M vectors 1.3M times.
But you can divide your weight matrix into as many sub-matrices as you wish and do a multiplication of an n x 1.3M matrix with the vector, 1.3M/n times.
I assume that the fastest option will be the one with the smallest number of iterations, with n chosen so that the sub-matrix is the largest one that fits in your memory without making your computer start swapping pages to your hard drive.
With your memory size you should start with n = 5000.
You can also make it faster by using parfor (with n divided by the number of processors).
The brute force way will probably work for you, with one minor optimisation in the mix.
The ^-0.1 operations to create the weights will take a lot longer than the + and * operations to compute the weighted-means, but you re-use the weights across all the million weighted-mean operations. The algorithm becomes:
Create a weightings vector with all the weights any computation would need:
weights = (abs(-n:n) + 1).^-0.1
For each element in the vector:
Index the relevant portion of the weights vector to consider the current element as the 'centre'.
Perform the weighted-mean with the weights portion and the entire vector. This can be done with a fast vector dot-multiply followed by a scalar division.
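A minimal sketch of that loop (my own sketch, not the answer's code; it assumes a column data vector x and the 1/(distance+1)^0.1 weighting from the question):
n = numel(x);
weights = (abs(-n:n) + 1).^-0.1;           % precomputed once, length 2n+1, centre at index n+1
results = zeros(n, 1);
for i = 1:n
    w = weights(n - i + 2 : 2*n - i + 1);  % slice so that element i sits at the kernel centre
    results(i) = (w * x) / sum(w);         % weighted mean: dot product, then scalar division
end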
The main loop does n^2 additions and multiplications. With n equal to 1.3 million, that's 3.4 trillion operations. A single core of a modern 3 GHz CPU can do, say, 6 billion additions/multiplications a second, so that comes out to around 10 minutes. Add time for indexing the weights vector and overheads, and I still estimate you could come in under half an hour.