Using NMF with negative values

When a data set contains negative values, can we still apply non-negative matrix factorization (NMF)?
If so, how?

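If you are using R, the nneg() function from the NMF package transforms a matrix so that it becomes non-negative:
library(NMF)
nneg.data.matrix <- nneg(data.matrix)
any(nneg.data.matrix < 0)  # should be FALSE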
The additional argument 'method' accepts the following choices:
pmax - Each entry is constrained to be above the value of the argument 'threshold'.
posneg - The matrix is split into its "positive" and "negative" parts, with the entries of each part constrained to be above 'threshold'. The result consists of these two parts stacked by rows (i.e. rbind-ed) into a single matrix, which has twice as many rows as the input matrix.
absolute - The absolute value of each entry is constrained to be above 'threshold'.
min - Global shift: if the minimum entry is negative, its absolute value is added to every entry (i.e. the minimum is subtracted), and then the threshold is applied.
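To make the four options concrete, here is a small Python sketch of what each transformation does to a matrix stored as a list of lists. It is a hypothetical re-implementation written only for illustration (the names nneg_sketch and pmax are made up here), not the NMF package's code, and it uses a threshold of 0 unless another value is passed; in R you would call nneg() itself.

```python
def pmax(x, threshold=0.0):
    """Clip every entry of the list-of-lists matrix x from below at threshold."""
    return [[max(v, threshold) for v in row] for row in x]


def nneg_sketch(x, method="pmax", threshold=0.0):
    """Hypothetical illustration of the 'method' options listed above
    (not the NMF package's implementation)."""
    if method == "pmax":
        return pmax(x, threshold)
    if method == "posneg":
        pos = pmax(x, threshold)                                 # positive part
        neg = pmax([[-v for v in row] for row in x], threshold)  # negative part, sign flipped
        return pos + neg                                         # stacked by rows, like rbind: 2x the rows
    if method == "absolute":
        return pmax([[abs(v) for v in row] for row in x], threshold)
    if method == "min":
        lowest = min(min(row) for row in x)
        shift = -lowest if lowest < 0 else 0.0                   # shift only when the minimum is negative
        return pmax([[v + shift for v in row] for row in x], threshold)
    raise ValueError(f"unknown method: {method}")
```

For example, nneg_sketch([[1, -2], [-3, 4]], "min") shifts every entry by 3, so the former minimum becomes 0.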

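Alternatively, you can shift the whole matrix by subtracting its minimum value from every entry (that is, adding the absolute value of the minimum), which makes every entry non-negative. This is what method='min' does:
library(NMF)  # provides rmatrix() and nneg()
set.seed(1)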
x <- rmatrix(5,5, rnorm, mean=0, sd=5)
x
[,1] [,2] [,3] [,4] [,5]
[1,] -3.1322691 -4.102342 7.558906 -0.22466805 4.5948869
[2,] 0.9182166 2.437145 1.949216 -0.08095132 3.9106815
[3,] -4.1781431 3.691624 -3.106203 4.71918105 0.3728249
[4,] 7.9764040 2.878907 -11.073499 4.10610598 -9.9467585
[5,] 1.6475389 -1.526942 5.624655 2.96950661 3.0991287
nneg(x, method='min')
[,1] [,2] [,3] [,4] [,5]
[1,] 7.941230 6.971158 18.632405 10.84883 15.668386
[2,] 11.991716 13.510645 13.022716 10.99255 14.984181
[3,] 6.895356 14.765123 7.967297 15.79268 11.446324
[4,] 19.049903 13.952406 0.000000 15.17961 1.126741
[5,] 12.721038 9.546558 16.698154 14.04301 14.172628


Matlab: Covariance Matrix from matrix of combinations using E(X) and E(X^2)

I have a set of independent binary random variables (say A, B, C), each of which takes a positive value with some probability and zero otherwise. For these I have generated a matrix of 0s and 1s listing all possible combinations of the variables that contain at least one 1, i.e.:
A B C
1 0 0
0 1 0
0 0 1
1 1 0
etc.
I know the values and probabilities of A, B, C, so I can calculate E(X) and E(X^2) for each. I want to treat each combination in the above matrix as a new random variable equal to the product of the random variables present in that combination (those with a 1 in that row). For example, random variable Row4 = A*B.
I have created a matrix of the same size as the above, which shows the relevant E(X)s instead of the 1s, and 1s instead of the 0s. This allows me to easily calculate the vector of expected values of the new random variables (one per combination) as the product of each row. I have also generated a similar matrix which shows E(X^2) instead of E(X), and another one which shows prob(X>0) instead of E(X).
I'm looking for a Matlab script that computes the covariance matrix of these new variables, i.e. taking each row as a random variable. I presume it will have to use the formula:
Cov(X,Y)=E(XY)-E(X)E(Y)
For example, for rows (1 1 0) and (1 0 1):
Cov(X,Y)=E[(AB)(AC)]-E(X)E(Y)
=E[(A^2)BC]-E(X)E(Y)
=E(A^2)E(B)E(C)-E(X)E(Y)
I already have these values from the matrices mentioned above. For each covariance, I'm just unsure how to determine which variables appear in both rows, because for those I will have to select E(X^2) instead of E(X).
Alternatively, the above can be written as:
Cov(X,Y)=E(X)E(Y)*[1/prob(A>0)-1]
But the problem remains, since the probabilities in the denominator are only those of the variables shared by the two combinations.
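The same reasoning extends to any pair of rows. As a sketch, write S_X and S_Y for the sets of variables present in rows X and Y (notation introduced here). Independence gives the derivation below; the last step assumes, as in the question, that each A_i takes a single positive value with probability prob(A_i>0), so that E(A_i^2)/E(A_i)^2 = 1/prob(A_i>0).

```latex
\begin{aligned}
X &= \prod_{i \in S_X} A_i, \qquad Y = \prod_{i \in S_Y} A_i,\\
E(XY) &= \prod_{i \in S_X \cap S_Y} E(A_i^2) \prod_{i \in S_X \triangle S_Y} E(A_i)
       = E(X)\,E(Y) \prod_{i \in S_X \cap S_Y} \frac{E(A_i^2)}{E(A_i)^2},\\
\operatorname{Cov}(X,Y) &= E(X)\,E(Y)\left[\prod_{i \in S_X \cap S_Y} \frac{E(A_i^2)}{E(A_i)^2} - 1\right]
 = E(X)\,E(Y)\left[\prod_{i \in S_X \cap S_Y} \frac{1}{\operatorname{prob}(A_i>0)} - 1\right].
\end{aligned}
```

When A is the only shared variable, the product has a single factor and this reduces to Cov(X,Y)=E(X)E(Y)*[1/prob(A>0)-1].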
Any advice on how to automate the computation of the covariance matrix in Matlab would be greatly appreciated.
I'm pretty sure this is not the most efficient way to do it, but it's a start:
Let r1...rn be the combinations of the random variables, and let R be the matrix:
A B C
r1 1 0 0
r2 0 1 0
r3 0 0 1
r4 1 1 0
If you have the vectors E1, E2 and ER defined as:
E1 = [E(A) E(B) E(C) ...]
E2 = [E(A²) E(B²) E(C²) ...]
ER = [E(r1) E(r2) E(r3) ...]
If you want to compute E(r1*r2), you can:
1) Extract rows 1 and 2 of R:
v1 = R(1,:)
v2 = R(2,:)
2) Sum both vectors into vs:
vs = v1 + v2
3) Loop over vs: a 2 means the variable appears in both rows, so its value from E2 has to be used; a 1 means it appears in only one row, so its value from E1 is used; a 0 means the value is not used at all.
4) Multiplying the selected values inside the loop gives E(r1*r2); subtracting ER(1)*ER(2) then gives Cov(r1,r2).
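The four steps can be sketched in Python, for illustration, as a function that fills the whole covariance matrix at once. The name covariance_matrix and its arguments are hypothetical: R is the 0/1 combination matrix as a list of rows, and E1 and E2 hold E(X) and E(X^2) for each original variable, as defined above.

```python
def covariance_matrix(R, E1, E2):
    """Cov(r_i, r_j) for the rows of the 0/1 matrix R, where each row stands for
    the product of the independent variables marked with a 1.
    E1[k] = E(X_k) and E2[k] = E(X_k^2) for variable k."""
    n = len(R)

    def expect_product(counts):
        # counts[k] is 0, 1 or 2: how many of the two rows contain variable k
        result = 1.0
        for k, c in enumerate(counts):
            if c == 2:
                result *= E2[k]   # variable shared by both rows: use E(X^2)
            elif c == 1:
                result *= E1[k]   # variable in only one row: use E(X)
        return result             # c == 0 contributes nothing

    # ER[i] = E(r_i): a single 0/1 row only ever selects values from E1
    ER = [expect_product(row) for row in R]

    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            vs = [a + b for a, b in zip(R[i], R[j])]  # step 2: vs = v1 + v2
            cov[i][j] = expect_product(vs) - ER[i] * ER[j]
    return cov
```

The diagonal entries come out as variances, because a row combined with itself has a 2 for every variable it contains.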

MATLAB: How to change value for linear increasing column-structure

My question is about changing values in a matrix linearly. I have a 594x1183 matrix in which every cell has a value of 10. I want to change certain parts of the matrix to other values (see image below). In the solid-lined box the matrix has values of 10; in the dash-lined box I want a value of -16.
As you can see, from column 1019 to the end (1183) the value should be -16. The same holds for column 1020 (to the end), and so on up to column 1054 (to the end), for rows 54 to 182.
I could do it manually in Excel (time-consuming) or write a loop over every row (128 iterations, also time-consuming). I think there must be a quicker way to solve this problem.
So basically, in the first row of the box, columns 1019 to the end of the matrix (column 1183) should have a value of -16 (columns 1 to 1018 keep the value 10). In the next row, columns 1020 to the end should be -16 (columns 1 to 1019 keep the value 10), and so on up to column 1054 in the last row (row 128 of the box): there, columns 1 to 1053 have a value of 10 and columns 1054 to 1183 have a value of -16.
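You can build a coordinate grid with meshgrid and use it to write inequalities that act as a logical index into the array.
y = 594;   % number of rows
x = 1183;  % number of columns
x0 = 1054;
x1 = 1019;
y0 = 54;   % first row of the box
y1 = 182;  % last row of the box
A = 10*ones(y,x);
[X,Y] = meshgrid(1:x,1:y);  % X holds column indices, Y holds row indices
% Select rows y0..y1 and, in each of them, the entries on the right-hand side
% of the line through (x0,y0) and (x1,y1)
A( Y >= y1*(X-x0)/(x1-x0) + y0*(x1-X)/(x1-x0) & Y <= y1 & Y >= y0 ) = -16;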
You can check the result with spy(A < 0); spy(A) alone would mark every entry, because all entries of A are nonzero.
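For illustration, the same idea (coordinate grids turned into a boolean mask, which is then used to pick the entries to overwrite) can be sketched in Python with plain lists. It assumes, as the question describes, that the -16 region starts at column 1019 in row 54 and at column 1054 in row 182, with a boundary moving linearly in between; the function name staircase and its parameters are hypothetical.

```python
def staircase(n_rows=594, n_cols=1183, fill=10, value=-16,
              row_first=54, row_last=182, col_first=1019, col_last=1054):
    """Return an n_rows x n_cols matrix (list of lists) filled with `fill`, except
    that in rows row_first..row_last (1-based) every column at or to the right of
    a boundary moving linearly from col_first to col_last is set to `value`.
    Assumes row_last > row_first."""
    dr = row_last - row_first
    dc = col_last - col_first
    rows = range(1, n_rows + 1)  # like the Y grid of meshgrid(1:n_cols, 1:n_rows)
    cols = range(1, n_cols + 1)  # like the X grid

    def in_region(y, x):
        # x >= col_first + (y - row_first) * dc / dr, multiplied through by dr > 0
        # so the comparison stays in exact integer arithmetic
        return (row_first <= y <= row_last
                and x * dr >= col_first * dr + (y - row_first) * dc)

    mask = [[in_region(y, x) for x in cols] for y in rows]
    return [[value if m else fill for m in mask_row] for mask_row in mask]
```

If the boundary should run in the opposite direction, swapping col_first and col_last is enough here, because the mask is written directly in terms of the column where each row's -16 region begins.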

Matlab matrix with fixed sum over rows

I'm trying to construct a matrix in Matlab in which every row sums to the same constant, while every such combination is included.
For example, take an NxM matrix where M is fixed and N depends on K, the value that every row must sum to.
For example, K = 3 and M = 3 give the matrix:
[1,1,1
2,1,0
2,0,1
1,2,0
1,0,2
0,2,1
0,1,2
3,0,0
0,3,0
0,0,3]
At the moment I do this by first creating the matrix of all possible combinations regardless of their sum (for example, this also contains [2,2,1] and [3,3,3]), and then throwing away the rows whose sum is not equal to K.
However, this is very memory-inefficient (especially for larger K and M), and I couldn't think of a nice way to construct this matrix without first constructing the full matrix.
Is this possible in a nice way? Or should I use a whole bunch of for-loops?
Here is a very simple version using dynamic programming. The basic idea of dynamic programming is to build up a data structure (here S) which holds the intermediate results for smaller instances of the same problem.
M=3;
K=3;
%S{k+1,m} will hold all rows of m non-negative integers that sum to k
S=cell(K+1,M);
%Initialisation: for M=1 there is only one trivial solution, the number k itself.
S(:,1)=num2cell((0:K)');
for iM=2:M
    for temporary_k=0:K
        for new_element=0:temporary_k
            % take every shorter row that sums to temporary_k-new_element ...
            h=S{temporary_k-new_element+1,iM-1};
            % ... and append new_element as its last column
            h(:,end+1)=new_element;
            S{temporary_k+1,iM}=[S{temporary_k+1,iM};h];
        end
    end
end
final_result=S{K+1,M}
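Here is the same dynamic-programming idea sketched in Python for illustration (fixed_sum_rows is a hypothetical name). As in the Matlab code, S[k][m] collects every row that sums to k, here with m counted from 0 so that the rows have length m+1, and longer rows are built by appending one new element to shorter ones.

```python
def fixed_sum_rows(K, M):
    """All rows of M non-negative integers summing to K, built bottom-up.
    S[k][m] holds every row of length m + 1 that sums to k."""
    # Base case: a row of length 1 summing to k is just (k,)
    S = [[[(k,)] if m == 0 else [] for m in range(M)] for k in range(K + 1)]
    for m in range(1, M):
        for k in range(K + 1):
            for new_element in range(k + 1):
                # extend every shorter row that sums to k - new_element
                for row in S[k - new_element][m - 1]:
                    S[k][m].append(row + (new_element,))
    return S[K][M - 1]
```

Every row is produced exactly once, so nothing has to be generated and then discarded.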
This may be more efficient than your original approach, although it still generates (and then discards) more rows than needed.
Let M denote the number of columns, and S the desired sum. The problem can be interpreted as partitioning an interval of length S into M subintervals with non-negative integer lengths.
The idea is to generate not the subinterval lengths, but the subinterval edges; and from those compute the subinterval lengths. This can be done in the following steps:
The subinterval edges are M-1 integer values (not necessarily different) between 0 and S. These can be generated as a Cartesian product using for example this answer.
Sort the interval edges and remove duplicate sets of edges. This is why the algorithm is not totally efficient: it produces duplicates. But hopefully the number of discarded tentative solutions will be smaller than in your original approach, because this one does take the fixed sum into account.
Compute subinterval lengths from their edges. Each length is the difference between two consecutive edges, including a fixed initial edge at 0 and a final edge at S.
Code:
%// Data
S = 3; %// desired sum
M = 3; %// number of pieces
%// Step 1 (adapted from linked answer):
combs = cell(1,M-1);
[combs{end:-1:1}] = ndgrid(0:S);
combs = cat(M+1, combs{:});
combs = reshape(combs,[],M-1);
%// Step 2
combs = unique(sort(combs,2), 'rows');
%// Step 3
combs = [zeros(size(combs,1),1) combs repmat(S, size(combs,1),1)];
result = diff(combs,[],2);
The result is sorted in lexicographical order. In your example,
result =
0 0 3
0 1 2
0 2 1
0 3 0
1 0 2
1 1 1
1 2 0
2 0 1
2 1 0
3 0 0
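A possible refinement of steps 1 and 2, shown in Python for illustration: instead of generating the full Cartesian product of edges and then sorting and removing duplicates, itertools.combinations_with_replacement yields each sorted set of M-1 edges exactly once, so no duplicates are produced at all. Step 3 (differences between consecutive edges, with fixed edges at 0 and S) is unchanged. The function name is hypothetical, and this is a variation on the answer's method rather than the answer's own code.

```python
from itertools import combinations_with_replacement


def fixed_sum_rows_by_edges(S, M):
    """Rows of M non-negative integers summing to S, computed from subinterval edges.
    combinations_with_replacement yields each sorted multiset of M - 1 edges in
    [0, S] exactly once, so no sort/unique step is needed."""
    result = []
    for edges in combinations_with_replacement(range(S + 1), M - 1):
        full = (0,) + edges + (S,)  # add the fixed edges at 0 and S
        # consecutive differences, like diff(combs,[],2) in step 3
        result.append(tuple(b - a for a, b in zip(full, full[1:])))
    return result
```

Because the edges come out in lexicographic order, for S = 3 and M = 3 this yields the same ten rows in the same order as the result shown above.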

Calculating the elements of differently sized matrices in Matlab

Can anybody help me find a way to calculate elements from differently sized matrices in Matlab?
Let's say that I have 2 matrices of numbers.
Example:
A=[1 2 3;
4 5 6;
7 8 9]
B=[10 20 30;
40 50 60]
First, we need to find the maximum number in each column.
In this case, Ans=[40 50 60].
Then we need to find the coefficient k.
The coefficient k is equal to 1 divided by the number of columns of matrix A.
In this case, k = 1/3 ≈ 0.33.
I want to create a matrix C filled with the results of this calculation.
Here is an example in MS Excel:
H4 = ABS((C2-C6)/C9)*0.33+ABS((D2-D6)/D9)*0.33+ABS((E2-E6)/E9)*0.33
I4 = ABS((C3-C6)/C9)*0.33+ABS((D3-D6)/D9)*0.33+ABS((E3-E6)/E9)*0.33
J4 = ABS((C4-C6)/C9)*0.33+ABS((D4-D6)/D9)*0.33+ABS((E4-E6)/E9)*0.33
And then, likewise:
H5 = ABS((C2-C7)/C9)*0.33+ABS((D2-D7)/D9)*0.33+ABS((E2-E7)/E9)*0.33
I5 = ABS((C3-C7)/C9)*0.33+ABS((D3-D7)/D9)*0.33+ABS((E3-E7)/E9)*0.33
J5 = ABS((C4-C7)/C9)*0.33+ABS((D4-D7)/D9)*0.33+ABS((E4-E7)/E9)*0.33
C =
0.34 =|(1-10)|/40*0.33+|(2-20)|/50*0.33+|(3-30)|/60*0.33
0.28 =|(4-10)|/40*0.33+|(5-20)|/50*0.33+|(6-30)|/60*0.33
0.22 =|(7-10)|/40*0.33+|(8-20)|/50*0.33+|(9-30)|/60*0.33
0.95 =|(1-40)|/40*0.33+|(2-50)|/50*0.33+|(3-60)|/60*0.33
0.89 =|(4-40)|/40*0.33+|(5-50)|/50*0.33+|(6-60)|/60*0.33
0.83 =|(7-40)|/40*0.33+|(8-50)|/50*0.33+|(9-60)|/60*0.33
Actually, A is a 15x4 matrix and B is a 5x4 matrix.
The real matrices may be even larger than these.
How can I write this in Matlab?
Thank you!
You can do it like so. Let's assume that A and B are defined as you did before:
A = vec2mat(1:9, 3)
B = vec2mat(10:10:60, 3)
A =
1 2 3
4 5 6
7 8 9
B =
10 20 30
40 50 60
vec2mat transforms a vector into a matrix. You simply specify how many columns you want, and it automatically determines the right number of rows to reshape the vector into a correctly shaped matrix (thanks @LuisMendo!). vec2mat belongs to the Communications Toolbox; without it, reshape(1:9, 3, []).' gives the same A. Let's also define more things based on your post:
maxCol = max(B); %// Finds maximum of each column in B
coefK = 1 / size(A,2); %// 1 divided by number of columns in A
I am going to assume that coefK multiplies every element of A. You would thus compute your desired matrix as follows:
cellMat = arrayfun(@(x) sum(coefK*(bsxfun(@rdivide, ...
abs(bsxfun(@minus, A, B(x,:))), maxCol)), 2), 1:size(B,1), ...
'UniformOutput', false);
outputMatrix = cell2mat(cellMat).'
You thus get:
outputMatrix =
0.3450 0.2833 0.2217
0.9617 0.9000 0.8383
Seems like a lot to chew on, right? Let's go through it slowly.
Let's start with the bsxfun(@minus, A, B(x,:)) call. Here we take the matrix A and subtract from it the row of B indexed by x. In our case, x is either 1 or 2, because B has two rows. What is cool about bsxfun is that it subtracts the row B(x,:) from every row of A.
Next, we take the absolute value of this result and divide every number in it by the maximum of its column, stored in maxCol. This is done with a second bsxfun call, which divides each element of the matrix from the first step by the corresponding element of maxCol.
Once we do this, we weight every value in the matrix by coefK. In our case, this is 1/3.
Then we sum across the columns (sum(..., 2)), which gives one value per row of A; these values form row x of the output matrix.
As we wish to do this for every row of B, we apply arrayfun, which substitutes x = 1, 2, 3, ... up to the number of rows in B. For each value of x, we get a numRows x 1 vector, where numRows is the number of rows in A. This code only works if A and B have the same number of columns; I have not included any error checking. In this case, both matrices have 3 columns. We must set 'UniformOutput' to false because each call of the function returns a vector rather than a single number.
arrayfun therefore returns a cell array in which each cell holds the results for one row of B. We use cell2mat to concatenate these cells into a single matrix.
You'll notice that this is the result we want, but transposed, because summing along the second dimension produces column vectors. Simply transpose the result and we get our final answer.
Good luck!
Dedication
This post is dedicated to Luis Mendo and Divakar - The bsxfun masters.
Assuming that by "maximum number in each column" you mean the column-wise maximum after vertically concatenating A and B, you can try this one-liner:
sum(abs(bsxfun(@rdivide,bsxfun(@minus,permute(A,[3 1 2]),permute(B,[1 3 2])),permute(max(vertcat(A,B)),[1 3 2]))),3)./size(A,2)
Output:
ans =
0.3450 0.2833 0.2217
0.9617 0.9000 0.8383
If by "maximum number in each column" you mean the column-wise maximum of B, you can try:
sum(abs(bsxfun(@rdivide,bsxfun(@minus,permute(A,[3 1 2]),permute(B,[1 3 2])),permute(max(B),[1 3 2]))),3)./size(A,2)
The output for this case is the same as in the previous case, owing to the values of A and B.
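As a cross-check of the formula behind both answers, here is a plain Python sketch (weighted_abs_diff and use_max_of are hypothetical names). Entry (i, j) is coefK times the sum over columns c of |A(j,c) - B(i,c)| / colmax(c), with the column maxima taken either from B or from A and B stacked, matching the two interpretations above.

```python
def weighted_abs_diff(A, B, use_max_of="B"):
    """out[i][j] = (1/ncols) * sum_c |A[j][c] - B[i][c]| / colmax[c].
    colmax is the column-wise max of B, or of A and B stacked if use_max_of == "AB"."""
    ncols = len(A[0])
    if any(len(row) != ncols for row in A + B):
        raise ValueError("A and B must have the same number of columns")
    source = B if use_max_of == "B" else A + B
    colmax = [max(row[c] for row in source) for c in range(ncols)]
    k = 1.0 / ncols  # coefK = 1 / size(A,2)
    return [[k * sum(abs(a[c] - b[c]) / colmax[c] for c in range(ncols))
             for a in A]   # one column per row of A
            for b in B]    # one row per row of B
```

Unlike the Matlab one-liners, this sketch rejects inputs whose column counts differ instead of failing inside bsxfun.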

Matlab calculate outliers from data and time they occur

In Matlab I have a large matrix A. The first column of the matrix contains a time in seconds. Columns 2 to 13 contain results from a calculation. For each column (except the first) I calculated the whisker as:
quantile(A,[.75])-1.5*(quantile(A,[.75])-quantile(A,[.25]))
Now I would like to know how many outliers (= values below the whisker) there are in each column, and when they occur. This will let me calculate how the outliers are spread over time.
I would like to create a loop that gives me 12 matrices with two columns each. The second column should contain the values of the outliers (= values of cells below the whisker) without any zeros in between, and the first column should contain the time at which each outlier occurs (in chronological order).
How can I create this?
regards,
Vincent
Let
A =
0.6260 0.7690 0.1209 0.5523 0.0495
0.6609 0.5814 0.8627 0.6299 0.4896
0.7298 0.9283 0.4843 0.0320 0.1925
0.8908 0.5801 0.8449 0.6147 0.1231
0.9823 0.0170 0.2094 0.3624 0.2055
For the second column:
B = quantile(A(:,2),[.75])-1.5*(quantile(A(:,2),[.75])-quantile(A(:,2),[.25]))
Then:
index = find(A(:,2) < B)
value_outlier = A(index,2)
outlier_time = A(index,1)
No need to loop: use matrix operations and logical indexing instead.
Assuming you have a matrix A and that the outlier threshold thr is a 1x12 vector with the threshold for each column:
vals = A(:,2:13);
outliers = bsxfun(@lt, vals, thr); % @lt is the 'less than' function handle
% outliers is an Nx12 logical matrix that is true (1) where the value < threshold
% and false (0) otherwise.
To get the times at which these outliers occurred (for a given column, say column 2 of the data portion of the original matrix):
t = A(outliers(:,2), 1);
% outliers(:,2) is the logical index of the rows where outliers occurred in that column
You can also easily get the number of outliers in each column (or row) by summing:
num_outliers = sum(outliers,1);
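The same logic, sketched in Python for illustration: outliers_by_column is a hypothetical helper that returns, for each data column, the (time, value) pairs below that column's threshold, which corresponds to the 12 two-column matrices the question asks for. The thresholds are assumed to be precomputed (note that the conventional lower whisker is Q1 - 1.5*IQR, whereas the formula in the question starts from the 0.75 quantile), and the rows are assumed to be in chronological order already.

```python
def outliers_by_column(A, thr):
    """A: list of rows [time, v1, v2, ...]; thr: one threshold per data column.
    Returns, per data column, the (time, value) pairs whose value is below
    that column's threshold, in row order."""
    n_data = len(A[0]) - 1
    if len(thr) != n_data:
        raise ValueError("need one threshold per data column")
    result = []
    for c in range(n_data):
        # equivalent of [A(outliers(:,c),1) A(outliers(:,c),c+1)] for one column
        result.append([(row[0], row[c + 1]) for row in A if row[c + 1] < thr[c]])
    return result
```

The number of outliers per column is then [len(r) for r in result], the counterpart of sum(outliers,1).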