Simulation of custom probability distribution in Matlab - matlab

I'm trying to simulate the following distribution:
a | 0 | 1 | 7 | 11 | 13
-----------------------------------------
p(a) | 0.34 | 0.02 | 0.24 | 0.29 | 0.11
I already simulated a similar problem: four types of balls with chances of 0.3, 0.1, 0.4 and 0.2. I created a vector F = [0 0.3 0.4 0.8 1] and used repmat to grow it to 1000 rows. Then I compared it with a column vector of 1000 random numbers, expanded to 5 columns using the same repmat approach. I summed the resulting comparison matrix along its columns and took the differences of that sum vector to get the frequencies (e.g. [301 117 386 196]).
But with the current distribution I don't know how to create the initial matrix F and whether I can use the same approach I used before at all.
I need the answer to be "vectorised", so no (for, while or if) loops.
This question on math.stackexchange

What if you create the following arrays:
largeNumber = 1000000;
% round() guards against non-integer counts caused by floating-point products
a=repmat( [0], 1, round(largeNumber*0.34) );
b=repmat( [1], 1, round(largeNumber*0.02) );
% ...
e=repmat( [13], 1, round(largeNumber*0.11) );
Then you concatenate all of these arrays (to get a single array in which each value appears in proportion to its probability), shuffle it, and extract the first N elements to get a vector of N samples drawn from your distribution.
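The concatenate, shuffle and extract steps could be sketched as follows. This is only an illustration: the names counts, pool and samples are made up here, and repelem (available from R2015a) stands in for the individual repmat calls.

```matlab
a = [0 1 7 11 13];                        % values of the distribution
p = [0.34 0.02 0.24 0.29 0.11];           % their probabilities
largeNumber = 1000000;
counts = round(largeNumber * p);          % how often each value goes into the pool
pool = repelem(a, counts);                % one long row vector, values in proportion to p
pool = pool(randperm(numel(pool)));       % shuffle the pool
N = 1000;                                 % number of samples wanted (illustrative)
samples = pool(1:N);                      % first N shuffled entries
```

Note that this draws without replacement from a finite pool, so it only approximates independent sampling; the approximation is good as long as N is much smaller than largeNumber.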
EDIT: of course this answer is the way to go.
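The cumulative-vector approach described in the question also seems to carry over to this distribution. A minimal vectorised sketch, assuming N = 1000 samples; the names u, idx and freq are illustrative, and F here holds only the upper edges of the cumulative distribution:

```matlab
a = [0 1 7 11 13];                        % values
p = [0.34 0.02 0.24 0.29 0.11];           % probabilities
N = 1000;                                 % number of samples (assumed)
F = cumsum(p);                            % upper edge of each value's interval
F(end) = 1;                               % guard against rounding in the last edge
u = rand(N, 1);                           % uniform random numbers
% Count how many edges lie below each u; adding 1 gives the interval index.
idx = sum(bsxfun(@gt, u, F), 2) + 1;
samples = a(idx);                         % map interval index to value
freq = accumarray(idx, 1, [numel(a) 1])'; % how often each value was drawn
```

The only change from the four-ball case is that the interval index is mapped through a, since the values are no longer simply 1 to 4.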

Related

Neural network - exercise

I am currently teaching myself the concept of neural networks, working through the very good online book at
http://neuralnetworksanddeeplearning.com/chap1.html
It also contains a few exercises, which I did, but there is one exercise I really don't understand, or at least one step of it.
Task:
There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first 3 layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least 0.99, and incorrect outputs have activation less than 0.01.
I also found the solution, as can be seen in the second image.
I understand why the matrix has to have this shape, but I really struggle to understand the step where the user calculates
0.99 + 3*0.01
4*0.01
I really don't understand these two steps. I would be very happy if someone can help me understand this calculation
Thank you very much for help
The output of the previous layer, x, is 10x1. The weight matrix W is 4x10, so the new output layer is 4x1. Consider two cases.
First, suppose x is exactly 1 in one row and 0 elsewhere, e.g. xT = [1 0 0 0 0 0 0 0 0 0]. Multiplying W by this vector gives yT = [0 0 0 0], because the single 1 in x picks out the first column of W (the column for digit 0), which contains only zeros.
Second, suppose the entries of x are no longer exactly 1 and 0, e.g. xT = [0.99 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01]. Multiplying the first row of W by this x gives 0.05 (I believe the solution has a typo here). When xT = [0.01 0.99 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01], multiplying the first row of W by x gives 1.03, because:
0.01*0 + 0.99*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 = 1.03
So I believe there is a typo: the author probably assumed 4 ones in the first row of W, which is not the case, because there are 5. If there were 4 ones in the first row, the results would indeed be 0.04 (4*0.01) when the 0.99 is in the first row of x, and 1.02 (0.99 + 3*0.01) when the 0.99 is in the second row of x.
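The arithmetic above can be checked numerically. The sketch below assumes, as in this answer, that the first row of W corresponds to the least significant bit; the variable names are illustrative.

```matlab
d = 0:9;                                  % the ten digits
% Row k of W holds bit k of each digit (row 1 = least significant bit).
W = [bitget(d, 1); bitget(d, 2); bitget(d, 3); bitget(d, 4)];
nOnes = sum(W(1, :));                     % number of ones in the first row

x0 = 0.01 * ones(10, 1);  x0(1) = 0.99;   % digit 0 is active
x1 = 0.01 * ones(10, 1);  x1(2) = 0.99;   % digit 1 is active

low  = W(1, :) * x0;                      % 5 * 0.01, approximately 0.05
high = W(1, :) * x1;                      % 0.99 + 4 * 0.01, approximately 1.03
```

With five ones in the row, the two values are about 0.05 and 1.03, whereas the solution's 4*0.01 and 0.99 + 3*0.01 correspond to a row with only four ones.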

How can I delete table row values in pairs? For example, if either column is less than 0.10, how do I delete the row?

I have two sets of data from different instruments that have common X-variables (XThompsons) but various Y-variables (YCounts) due to various experimental conditions. The data resemble the example below:
[Table1]
XThompsons | YCounts (1) | YCounts (2) | YCounts (3) | .... | ....
------------------------------------------------------------------
[Table2]
XThompsons | YCounts (1) | YCounts (2) | YCounts (3) | .... | ....
------------------------------------------------------------------
For two sets of data like this, I have written a script that takes a single Y-column from Table1 and does some math against every Y-column in Table2. However, when comparing two table columns, if either column has a value at or below a specific threshold (0.10), I want to delete that row. In the example below I want to delete row 4 and row 6, because one of the two columns contains a value of 0.10 or less.
XThompsons | Table1.YCounts(1) | Table2.YCounts(2)
--------------------------------------------------
1 1.00 0.50
2 0.22 0.12
3 0.29 0.14
4 0.29 0.09 (delete row)
5 0.11 0.49
6 0.02 0.83 (delete row)
How can I carry this out in Matlab? My current code is below; I convert each table row to an array first. How can I make it so that if Y < 0.10 delete the row?
datax = readtable('table1.xls'); % Instrument 1
datay = readtable('table2.xls'); % Instrument 2
SIDATA = [];
for idx = 2:width(datax) % outer loop runs over the columns of datax
    % Read the indexed column of datax (instrument 1) then normalize to 1
    x = table2array(datax(:,idx));
    x = x ./ max(x);
    % Read indexed column of datay (instrument 2) and carry out loop
    for idy = 2:width(datay)
        % Normalize y data to 1
        y = table2array(datay(:,idy));
        y = y ./ max(y);
        % Calculate similarity index (SI) using the datax index for all collision energies for datay
        xynum = sum(sqrt(x) .* sqrt(y));
        xyden = sqrt(sum(x) .* sum(y));
        SIDATA(idy,idx) = (xynum/xyden);
    end
end
Help would be appreciated.
Thanks!
Generally, when looping through a matrix and pruning values, you want to iterate from the end of the matrix back to row one; this way, if you delete any rows, you don't skip any. (If you delete row 2 and then advance to row 3, you skip the data that was formerly in row 3.)
To me, the easiest way to do this is that if all your data is in one matrix A, with columns Y1 Y2,
APruned = A((A(:,1) > 0.1) & (A(:,2) > 0.1),:)
This takes the A matrix, finds the rows where Y1 > 0.1, finds the rows where Y2 > 0.1, finds the overlap, and then outputs only the rows in A where both of these are true.
You should read about logical indexing for more on this topic.
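Applied to the example rows from the question, the logical-indexing idea might look like this. It is a sketch only: the matrix A is typed in by hand, and the names threshold and keep are illustrative.

```matlab
% Example data from the question: columns are Table1.YCounts(1), Table2.YCounts(2)
A = [1.00 0.50
     0.22 0.12
     0.29 0.14
     0.29 0.09
     0.11 0.49
     0.02 0.83];
threshold = 0.10;
keep = all(A > threshold, 2);   % true where every column in the row exceeds the threshold
APruned = A(keep, :);           % rows 4 and 6 are dropped
```

Using all(..., 2) instead of chaining & conditions keeps the expression the same length no matter how many columns are compared.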
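EDIT: It looks like you could also clean up your earlier code using element-wise operations, assuming datax and datay hold numeric arrays (e.g. after table2array) and your MATLAB release supports implicit expansion (R2016b or later):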
A = [datax./max(datax) datay./max(datay)];

Merge different sized arrays into a table

Overview
I am currently working with a series of .txt files that I am importing into MATLAB. For simplicity, I'll show my problem conceptually. Obligatory disclaimer: I'm new to MATLAB (and to programming in general).
These .txt files contain data from tracking a ROI in a video (frame-by-frame) with time ('t') in the first column and velocity ('v') in the second as shown below;
T1 =            T2 =            etc.
t     v         t     v
0     NaN       0     NaN
0.1   100       0.1   200
0.2   200       0.2   500
0.3   400       0.3   NaN
0.4   150
0.5   NaN
Problem
Files differ in their size, the columns remain fixed but the rows vary from trial to trial as shown in T1 and T2.
The time column is the same for each of these files so I wanted to organise data in a table as follows;
time  v1    v2    etc.
0     NaN   NaN
0.1   100   200
0.2   200   500
0.3   400   NaN
0.4   150   0
0.5   NaN   0
Note that I want to add 0s (or NaN) to end of shorter trials to fix the issue of size differences.
Edit
Both solutions worked well for my dataset. I appreciate all the help!
You could import each file into a table using readtable and then use outerjoin to combine the tables in the way you would expect. This works whether or not all of the data starts at t = 0.
To create a table from a file:
T1 = readtable('filename1.dat');
T2 = readtable('filename2.dat');
Then to perform the outerjoin (pseudo data created for demonstration purposes).
t1 = table((1:4)', (5:8)', 'VariableNames', {'t', 'v'});
%//     t    v
%//     _    _
%//     1    5
%//     2    6
%//     3    7
%//     4    8
%// t2 is missing row 2
t2 = table([1;3;4], [1;3;4], 'VariableNames', {'t', 'v'});
%//     t    v
%//     _    _
%//     1    1
%//     3    3
%//     4    4
%// Now perform an outer join and merge the key column
t3 = outerjoin(t1, t2, 'Keys', 't', 'MergeKeys', true)
%//     t    v_t1    v_t2
%//     _    ____    ____
%//     1    5       1
%//     2    6       NaN
%//     3    7       3
%//     4    8       4
I would suggest using the padarray and horzcat functions (note that padarray is part of the Image Processing Toolbox). Respectively, they:
Pad a matrix or vector with extra data, effectively adding extra 0's or any specified value (NaNs work too).
Concatenate matrices or vectors horizontally.
First, obtain the length of the longest vector you have to concatenate. Let's call this value max_len. Once you have that, you can pad each (column) vector by doing:
v1 = padarray(v1, max_len - length(v1), 0, 'post');
% You can replace the '0' by any value you want !
Finally, once you have vectors of the same size, you can concatenate them using horzcat :
big_table = horzcat(v1, v2, ... , vn);
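If the toolbox that provides padarray is not available, the same padding can be done with plain indexing. A minimal sketch using the two example trials from the question, assuming column vectors and a shared 0.1 time step; the variable names are illustrative:

```matlab
v1 = [NaN; 100; 200; 400; 150; NaN];      % velocities of trial 1 (6 samples)
v2 = [NaN; 200; 500; NaN];                % velocities of trial 2 (4 samples)
max_len = max(numel(v1), numel(v2));      % length of the longest trial

% Assigning past the end grows the vector; an empty range leaves it unchanged.
v1(end+1:max_len, 1) = NaN;
v2(end+1:max_len, 1) = NaN;

time = (0:max_len-1)' * 0.1;              % shared time column (assumed step of 0.1)
T = table(time, v1, v2);                  % columns are named after the variables
```

Padding with NaN rather than 0 keeps the padded rows distinguishable from real zero velocities.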

Matlab: arithmetic operation on columns inside a for loop (simple yet devious!)

I'm trying to represent a simple matrix m*n (let's assume it has only one row!) such that m1n1 = m1n1^1, m1n2 = m1n1^2, m1n3 = m1n1^3, m1n4 = m1n1^4, ... m1ni = m1n1^i.
In other words, I am trying to iterate over a matrix's columns n times, adding a new vector (column) at the end each time, such that each entry has the same value as in the first vector but raised to the power of its column number n.
This is the original vector:
v =
1.2421
2.3348
0.1326
2.3470
6.7389
and this is v after the third iteration:
v =
1.2421 1.5429 1.9165
2.3348 5.4513 12.7277
0.1326 0.0176 0.0023
2.3470 5.5084 12.9282
6.7389 45.4128 306.0329
Now, given that I'm a total noob in Matlab, I really underestimated the difficulty of such a seemingly easy task, which took me almost a day of debugging and surfing the web for clues. Here's what I have come up with:
rows = 5;
columns = 3;
v = x(1:rows,1);
k = v;
Ncol = ones(rows,1);
extraK = ones(rows,1);
disp(v)
for c = 1:columns
    Ncol = k(:,length(k(1,:))).^c; % a verbose way of selecting the last column only.
    extraK = cat(2,extraK,Ncol);
end
k = cat(2,k,extraK);
disp(extraK(:,2:columns+1)) % to cut off the first column
Now this code (for some weird reason) works only if rows = 6 or less, and columns = 3 or less.
when rows = 7, this is the output:
v = 1.0e+03 *
0.0012 0.0015 0.0019
0.0023 0.0055 0.0127
0.0001 0.0000 0.0000
0.0023 0.0055 0.0129
0.0067 0.0454 0.3060
0.0037 0.0138 0.0510
0.0119 0.1405 1.6654
How could I get it to run on any number of rows and columns?
Thanks!
I have found a couple of things wrong with your code:
I'm not sure as to why you are defining d = 3;. This is just nitpicking, but you can remove that from your code safely.
You are not doing the power operation properly. Specifically, look at this statement:
Ncol = k(:,length(k(1,:))).^c; % a verbose way of selecting the last column only.
You are selectively choosing the last column, which is great, but you are not applying the power operation properly. If I understand your statement, you wish to take the original vector, and perform a power operation to the power of n, where n is the current iteration. Therefore, you really just need to do this:
Ncol = k.^c;
Once you replace Ncol with the above line, the code should now work. I also noticed that you crop out the first column of your result. The reason why you are getting duplicate columns is because your for loop starts from c = 1. Since you have already computed v.^1 = v, you can just start your loop at c = 2. Change your loop starting point to c = 2, and you can get rid of the removal of the first column.
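Put together, the corrected loop might look like the sketch below. It assumes v is typed in directly rather than read from x, and it appends each new column straight onto k instead of going through extraK.

```matlab
v = [1.2421; 2.3348; 0.1326; 2.3470; 6.7389];   % example vector from the question
columns = 3;                                    % number of columns wanted
k = v;                                          % first column is v.^1
for c = 2:columns                               % start at 2: v.^1 is already there
    k = cat(2, k, v.^c);                        % append v raised to the c-th power
end
disp(k)
```

Because the loop starts at c = 2, there is no duplicate first column to cut off afterwards.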
However, I'm going to do this in an alternative way in one line of code. Before we do this, let's go through the theory of what you're trying to do.
Given a vector v that is m elements long stored in a m x 1 vector, what you want is to have a matrix of size m x n, where n is the desired number of columns, and for each column starting from left to right, you wish to take v to the nth power.
Therefore, given your example from your third "iteration", the first column represents v, the second column represents v.^2, and the third column represents v.^3.
I'm going to introduce you to the power of bsxfun. bsxfun stands for Binary Singleton EXpansion function. Given two inputs, wherever one input has a singleton dimension (a dimension of size 1) and the other does not, bsxfun replicates that input along the singleton dimension to match the size of the other input. It then applies an element-wise operation to the expanded inputs to produce your output.
For example, if we had two vectors like so:
A = [1 2 3]
B = [1
2
3]
Note that one of them is a row vector, and the other is a column vector. bsxfun would see that A and B both have singleton dimensions: A has only 1 row, and B has only 1 column. Therefore, it would duplicate B for as many columns as there are in A and duplicate A for as many rows as there are in B, and we effectively get:
A = [1 2 3
1 2 3
1 2 3]
B = [1 1 1
2 2 2
3 3 3]
Once we have these two matrices, you can apply any element wise operations to these matrices to get your output. For example, you could add, subtract, take the power or do an element wise multiplication or division.
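As a quick check of this description, the sketch below compares bsxfun with an explicit expansion using repmat; the names Aexp, Bexp and S are illustrative, and addition is chosen arbitrarily as the element-wise operation.

```matlab
A = [1 2 3];                    % row vector (1 x 3)
B = [1; 2; 3];                  % column vector (3 x 1)

Aexp = repmat(A, 3, 1);         % A copied down 3 rows
Bexp = repmat(B, 1, 3);         % B copied across 3 columns

S = bsxfun(@plus, A, B);        % expansion done implicitly by bsxfun
sameResult = isequal(S, Aexp + Bexp);
```

bsxfun produces the same result without ever building the expanded copies in your own code.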
Now, how this scenario applies to your problem is the following. What you are doing is you have a vector v, and you will have a matrix of powers like so:
M = [1 2 3 ... n
1 2 3 ... n
...........
...........
1 2 3 ... n]
Essentially, we will have a column of 1s, followed by a column of 2s, up to as many columns as you want n. We would apply bsxfun on the vector v which is a column vector, and another vector that is only a single row of values from 1 up to n. You would apply the power operation to achieve your result. Therefore, you can conveniently calculate your output by doing:
columns = 3;
out = bsxfun(@power, v, 1:columns);
Let's try a few examples given your vector v:
>> v = [1.2421; 2.3348; 0.1326; 2.3470; 6.7389];
>> columns = 3;
>> out = bsxfun(@power, v, 1:columns)
out =
1.2421 1.5428 1.9163
2.3348 5.4513 12.7277
0.1326 0.0176 0.0023
2.3470 5.5084 12.9282
6.7389 45.4128 306.0321
>> columns = 7;
>> format bank
>> out = bsxfun(@power, v, 1:columns)
out =
Columns 1 through 5
1.24 1.54 1.92 2.38 2.96
2.33 5.45 12.73 29.72 69.38
0.13 0.02 0.00 0.00 0.00
2.35 5.51 12.93 30.34 71.21
6.74 45.41 306.03 2062.32 13897.77
Columns 6 through 7
3.67 4.56
161.99 378.22
0.00 0.00
167.14 392.28
93655.67 631136.19
Note that for setting the columns to 3, we get what we see in your post. For pushing the columns up to 7, I had to change the way the numbers were presented so you can see the numbers clearly. Not doing this would put this into exponential form, and there were a lot of zeroes that followed the significant digits.
Good luck!
When computing cumulative powers you can reuse previous results: for scalar x and n, x.^n is equal to x * x.^(n-1), where x.^(n-1) has been already obtained. This may be more efficient than computing each power independently, because multiplication is faster than power.
Let N be the maximum exponent desired. To use the described approach, the column vector v is repeated N times horizontally (repmat), and then a cumulative product is applied along each row (cumprod):
v =[1.2421; 2.3348; 0.1326; 2.3470; 6.7389]; %// data. Column vector
N = 3; %// maximum exponent
result = cumprod(repmat(v, 1, N), 2);
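To see that the cumulative product gives the same powers as computing each one independently, the two results can be compared up to rounding error. A small sketch; the names viaCumprod, viaPower and maxDiff are illustrative.

```matlab
v = [1.2421; 2.3348; 0.1326; 2.3470; 6.7389];   % data, column vector
N = 3;                                          % maximum exponent

viaCumprod = cumprod(repmat(v, 1, N), 2);       % reuse previous powers row by row
viaPower   = bsxfun(@power, v, 1:N);            % each power computed independently

% The two differ only by floating-point rounding, so compare with a tolerance.
maxDiff = max(abs(viaCumprod(:) - viaPower(:)));
```

An exact isequal comparison may fail here, since repeated multiplication and a direct power can round differently in the last bits.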

Ignoring similar columns when concatenating matrices horizontally

In matlab I have a 128 by n matrix, which we can call
[A B C]
where each letter is a 128 by 1 matrix.
So what I want to do is concat the above matrix with another matrix,
[A~ D E].
Where A~ is similar in its values to A.
What I want to get as the result of the concat would be:
[A B C D E],
where A~ is omitted.
What is the best way to do this? Note that I do not know beforehand that A~ is similar.
To clarify, my problem is how would I determine if two columns are similar? By similar I mean where between two columns, many of the row values are close in value.
Maybe an illustration would help as well
Vector A: [1 2 3 4 5 6 7 8 9]'
| | | | | | | | |
Vector B: [20 2.4 4 5 0 7 7 7.6 10]'
where there are some instances where the values are completely different, but for the most part the values are close. I don't have a defined threshold for this, but ideally it would be something that I could experiment with.
If you want to omit only identical columns, this is one way to do it:
%# Define the example matrices.
Matrix1 = [ 1 2 3; 4 5 6; 7 8 9 ]';
Matrix2 = [ 4 5 6; 7 8 10 ]';
%# Concatenate the matrices and keep only unique columns.
OutputMatrix = unique([ Matrix1, Matrix2 ]', 'rows')';
To solve this, a matching algorithm called vl_ubcmatch can be used.
[matches, scores] = vl_ubcmatch(da, db) ; For each descriptor in da,
vl_ubcmatch finds the closest descriptor in db (as measured by the L2
norm of the difference between them). The index of the original match
and the closest descriptor is stored in each column of matches and the
distance between the pair is stored in scores.
source:
http://www.vlfeat.org/overview/sift.html
Thus, the solution is to find the matched columns that are closest to each other (since scores holds distances, these are the matches with the smallest scores) and eliminate them before concatenating.
I think it's pdist2 you need.
Consider the following example:
>> X = rand(25, 5);
>> Y = rand(100, 5);
>> Y(22, : ) = 0.99*X(22,:);
>> D = pdist2(X,Y, 'euclidean');
>> [~,ind] = min(D(:));
>> [i,j]=ind2sub(size(D),ind)
i =
22
j =
22
which is indeed the entry we manipulated to be similar. Read help pdist2 or doc pdist2 for more background.
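Note that pdist2 compares rows, so column data would need to be transposed first. For the looser notion of similarity described in the question (many row values close, a few far apart), a toolbox-free sketch is shown below. The tolerance tol, the fraction minFrac and the matrices M1 and M2 are made up for illustration and are meant to be experimented with.

```matlab
A = [1 2 3 4 5 6 7 8 9]';               % example vectors from the question
B = [20 2.4 4 5 0 7 7 7.6 10]';

tol = 1.0;                              % hypothetical: how close two entries must be
minFrac = 0.7;                          % hypothetical: fraction of close entries required

fracClose = mean(abs(A - B) <= tol);    % share of rows where the values are close
isSimilar = fracClose >= minFrac;

% Made-up matrices: M2's first column is the candidate duplicate of M1's first.
M1 = [A, 2*A];
M2 = [B, 3*A];
keep = [~isSimilar, true];              % drop M2's first column if it is similar
merged = [M1, M2(:, keep)];
```

Unlike a Euclidean distance, this measure is not dominated by the few rows that differ wildly (such as 1 versus 20 above).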