How to repeat the simulation 1000 times to have 1000 different datasets - simulation

I have 1 binary covariate x and want to use logistic regression to get the outcome y binary using the logistic formula. Then after that, I want to simulate a data on the basis of the probability from the logistic equation
I have tried and I am able to simulate the first data but repeating the process and adding subject id and simulation number is giving me the problem.
simulation id y x
1 1 1 0
1 2 0 1
1 3 0 0
1 4 1 1
1 5 0 1
2 1 1 1
2 2 0 0
2 3 1 1
2 4 1 1
2 5 1 0
3 1 1 1
3 2 1 0
3 3 1 0
3 4 0 0
3 5 0 1
The code is as follows:
nsample <- 100
set.seed(1234)
id <- rep(1:nsample)
p <- 0.01
b0 <- log(p/(1-p))
b1 <- 0.5
b2 <- 0.1
b5 <- -4
x1 <- rbinom(nsample, 1, 0.4)
x2 <- rbinom(nsample, 1, 0.6)
z2 <- b0 + b1*x1+b2*x2
p_vector <- 1/(1+exp(-z2))
y <- rbinom(n = length(nsample), size = 1, prob = p_vector)

Related

Is there a way to do this without a for-loop?

Currently I have this function in MATLAB
function [ y ] = pyramid( x )
%PYRAMID Returns a "pyramid"-shapped matrix.
y = zeros(x); % Creates an empty matrix of x by x.
rings = ceil(x/2); % Compute number of "rings".
for n = 1:rings
% Take the first and last row of the ring and set values to n.
y([n,x-n+1],n:x-n+1) = n*ones(2,x-2*(n-1));
% Take the first and last column of the ring and set values to n.
y(n:x-n+1,[n,x-n+1]) = n*ones(x-2*(n-1),2);
end
end
Which produces the following output:
piramide(4)
ans =
1 1 1 1
1 2 2 1
1 2 2 1
1 1 1 1
piramide(5)
ans =
1 1 1 1 1
1 2 2 2 1
1 2 3 2 1
1 2 2 2 1
1 1 1 1 1
piramide(6)
ans =
1 1 1 1 1 1
1 2 2 2 2 1
1 2 3 3 2 1
1 2 3 3 2 1
1 2 2 2 2 1
1 1 1 1 1 1
Is there a way to achive the same result without using a for-loop ?
If you have the Image Processing Toolbox you can use bwdist:
function y = pyramid(x)
m([1 x], 1:x) = 1;
m(1:x, [1 x]) = 1;
y = bwdist(m,'chessboard')+1;
end
Other solution using min:
pyramid = #(x) min(min((1:x),(1:x).'), min((x:-1:1),(x:-1:1).'));

Modify parts of a matrix based on linear equation on row and column numbers

For example:
>> tmp = ones(5,5)
tmp =
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
I want a command like:
tmp(colNum - 2*rowNum > 0) = 0
that modifies entries of tmp when the column number is more than twice the row number e.g. it should produce:
tmp =
1 1 0 0 0
1 1 1 1 0
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
As a second example, tmp(colNum - rowNum == 0) = 0 should set the diagonal elements of tmp to be zero.
A possibly more efficient solution is to use bsxfun like so
nRows = 5;
nCols = 5;
bsxfun(#(col,row)~(col - 2*row > 0), 1:nCols, (1:nRows)')
You can generalize this to just accept a function so it becomes
bsxfun(#(col,row)~f(col,row), 1:nCols, (1:nRows)')
And now just replace f with exactly the way you specify the equation in your question i.e.
f = #(colNum, rowNum)(colNum - 2*rowNum > 0)
or
f = #(colNum, rowNum)(colNum - rowNum == 0)
of course it might make more sense to specify your function to accept (row,col) instead of (col,row) as that's how MATLAB indexes
You can use meshgrid to generate a grid of 2D coordinates, then use this to impose any condition you wish. The variant you seek outputs 2 2D matrices where the first matrix gives you the column locations and the second matrix outputs the row locations.
For example, given your situation above:
>> [X,Y] = meshgrid(1:5, 1:5)
X =
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
Y =
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
You can see that each unique spatial location shared between X and Y give you the desired 2D location as if you were envisioning a 2D grid.
Therefore, you would do something like this for your first situation:
[X,Y] = meshgrid(1:5,1:5); % Generate 2D coordinates
tmp = ones(5); % Generate desired matrix
tmp(X > 2*Y) = 0; % Set desired locations to 0
We get:
>> tmp
tmp =
1 1 0 0 0
1 1 1 1 0
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
Finally for your second example:
[X,Y] = meshgrid(1:5,1:5); % Generate 2D coordinates
tmp = ones(5); % Generate desired matrix
tmp(X == Y) = 0; % Set desired locations to 0
We get:
>> tmp
tmp =
0 1 1 1 1
1 0 1 1 1
1 1 0 1 1
1 1 1 0 1
1 1 1 1 0
Simply put, generate a grid of 2D coordinates, then use those directly to index into your desired matrix using logical / Boolean conditions to set the desired locations to 0.

How to normalize matrix setting 0 for minimum values and 1 for maximum values?

I need to transform a neural network output matrix with size 2 X N in zeros and ones, where 0 will represent the minimum value of the column and 1 contrariwise. This will be necessary in order to calculate the confusion matrix.
For example, consider this matrix 2 X 8:
2 33 4 5 6 7 8 9
1 44 5 4 7 5 2 1
I need to get this result:
1 0 0 1 0 1 1 1
0 1 1 0 1 0 0 0
How can I do this in MATLAB without for loops? Thanks in advance.
>> d = [ 2 33 4 5 6 7 8 9;
1 44 5 4 7 5 2 1];
>> bsxfun(#rdivide, bsxfun(#minus, d, min(d)), max(d) - min(d))
ans =
1 0 0 1 0 1 1 1
0 1 1 0 1 0 0 0
The bsxfun function is necessary to broadcast the minus and division operations to matrices of different dimensions (min and max have only 1 row each).
Other solution is the following (works only for 2 rows):
>> [d(1,:) > d(2,:); d(1,:) < d(2,:)]
ans =
1 0 0 1 0 1 1 1
0 1 1 0 1 0 0 0
If it's just 2xN, then this will work:
floor(A./[max(A); max(A)])
In general:
floor(A./repmat(max(A),size(A,1),1))

Generate matrix of vectors from labels for multiclass classification (vectorized)

I'm structuring my input for a multiclass classifier (m data points, k classes). In my input, I have the labels for the training data as integers in a vector y (i.e. y is m dimensional and each entry in y is an integer between 1 and k).
I'd like to transform this into an m x k matrix. Each row has 1 at the index corresponding to the label of that data point and 0 otherwise (e.g. if the data point has label 3, the row looks like [0 0 1 0 0 0 0 ...]).
I can do this by constructing a vector a = [1 2 3 4 ... k] and then computing
M_ = y*(1./b)
M = M_ .== 1
(where ./ is elementwise division and .== is elementwise logical equals). This achieves what I want by setting everything in the intermediate matrix that is not exactly 1 to 0.
But this solution seems silly and roundabout. Is there a more direct way that I'm missing?
You can use logical arrays:
M = [1:k] == y;
Given a label vector y such as [1 2 2 1 3 2 3 1] and a number of classes k such as 3, you can convert this to a label matrix Y as follows.
function Y = labelmatrix(y, k)
m = length(y);
Y = repmat(y(:),1,k) .== repmat(1:k,m,1);
The idea is to perform the following expansions:
1 1 1 1 2 3
2 2 2 1 2 3
2 2 2 1 2 3
1 1 1 .== 1 2 3
3 3 3 1 2 3
2 2 2 1 2 3
3 3 3 1 2 3
1 1 1 1 2 3
This yields:
1 0 0
0 1 0
0 1 0
1 0 0
0 0 1
0 1 0
0 0 1
1 0 0
Or just by indexing:
%// Dummy code to generate some input data
y = [1 4 3 7 2 1];
m = length(y);
k = max(y);
%// Actual conversion using y elements as index
M = zeros(m, k);
M(sub2ind(size(M), [1:m], y)) = 1
%// Result
M =
1 0 0 0 0 0 0
0 0 0 1 0 0 0
0 0 1 0 0 0 0
0 0 0 0 0 0 1
0 1 0 0 0 0 0
1 0 0 0 0 0 0

Selecting that pixel that minimizes the distance

Say I have the following two matrices:
>> x = [1 4 3; 6 4 3; 6 9 3; 2 4 3; 5 4 0; 5 3 1; 6 4 7];
>> y = [0 0 1; 1 1 0; 1 1 0; 0 1 1; 0.2 0.8 0.54; 1 1 1; 0 0 0];
Where you can think of x as some image, and y as the degree of membership of each element of x to some region of interest.
Say I set those elements in x that have degree of membership = 1 to 1 (core) and the other elements to 0 as follows:
x = zeros(size(y));
x(y==1) = 1;
In which case I will have the following output:
0 0 1
1 1 0
1 1 0
0 1 1
0 0 0
1 1 1
0 0 0
Now, for the elements of 0, I substitute their values with the value of y in the corresponding location as follows:
x(x==0)=y(x==0);
Now, I select those pixels that are considered 4-neighbours of core but not in core as follows:
four_neighbourhood_pixels = imdilate(core, strel('diamond', 1)) - core;
My question is: how can we select a pixel p that belongs to four_neighbourhood_pixels that minimizes the distance between x & core?
Provided that for distance I calculate it as follows:
pdist([x,core],'minkowski');
Provided that x in the preceding command will be the matrix after substituting the zeros with the degree of membership values y i the corresponding location?
So, how can I select that pixel that belongs to four_neighbourhood_pixels that minimizes the distance between x with the zeros substituted and core?
Thanks.
If I understand correctly, core is the following matrix:
0 0 1
1 1 0
1 1 0
0 1 1
0 0 0
1 1 1
0 0 0
First find the distance between x and core.
dist=pdist([x,core],'minkowski');
dist1=squareform(dist);
[row1,row2]=find(dist1==min(dist1(:)); %interpretation: you get the minimum distance between row1 and row2 of [x core]
Veify if my understanding is correct:
You want a pixel from x which minimizes the distance dist and it should belong to four_neighbourhood_pixels. This is the matrix [x core]
1 4 3 0 0 1
6 4 3 1 1 0
6 9 3 1 1 0
2 4 3 0 1 1
5 4 0 0 0 0
5 3 1 1 1 1
6 4 7 0 0 0
Suppose you get the minimum value between 2nd row and 3rd row. Now based on this tell us what you mean by "find a pixel which minimizes..."