In CF item-item recommenders, how can I calculate item similarity when the matrix is sparse? - recommendation-engine

To find item neighbors, I first need to calculate item similarity. How can I calculate it on a sparse matrix, and is the result still correct?

In item-based collaborative filtering we calculate the similarity between items.
Here we can use cosine similarity: no matter how sparse the vectors are, cosine similarity finds neighbors from the angle between the two vectors in the vector space, not from the magnitude of their values. Zero entries add nothing to the dot product or to the norms, so they do not change the result.
For example:
      Per1 Per2 Per3
Item1 5    3    1
Item2 2    3    3
If we calculate the cosine similarity of two vectors:
Cos_sim_1 = (5*2 + 3*3 + 1*3) / sqrt((25+9+1)*(4+9+9))
Cos_sim_1 = 22 / sqrt(770) ≈ 0.793
And if the matrix is sparse:
      Per1 Per2 Per3 Per4 Per5 Per6 Per7 Per8
Item1 5    3    1    0    0    0    0    0
Item2 2    3    3    0    0    0    0    0
And the cosine similarity of sparse vectors:
Cos_sim_2 = (5*2 + 3*3 + 1*3 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0) / sqrt((25+9+1+0+0+0+0+0)*(4+9+9+0+0+0+0+0))
Cos_sim_2 = 22 / sqrt(770) ≈ 0.793
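The same arithmetic can be checked in a few lines of MATLAB. This is only a minimal sketch, assuming unrated items are stored as zeros exactly as in the tables above; the variable names are made up for illustration.

```matlab
% Ratings of two items by three users (values from the example above)
item1 = [5 3 1];
item2 = [2 3 3];
cos_dense = dot(item1, item2) / (norm(item1) * norm(item2));

% Same ratings padded with five users who rated neither item,
% stored as sparse row vectors
item1_sp = sparse([item1 zeros(1,5)]);
item2_sp = sparse([item2 zeros(1,5)]);
num = full(item1_sp * item2_sp.');   % dot product of the sparse vectors
den = sqrt(full(sum(item1_sp.^2)) * full(sum(item2_sp.^2)));
cos_sparse = num / den;

fprintf('%.4f %.4f\n', cos_dense, cos_sparse);   % both values are identical
```

The padded zeros leave both the numerator and the denominator unchanged, which is why the two results agree.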
I hope it helps!!!!!

Related

Find two MAXIMUM values' position in 3D matrix (MATLAB)

I have been having a problem identifying the positions of the two maximum values in a 3D matrix (MATLAB). Say I have a matrix A as follows:
A(:,:,1) =
5 3 5
0 1 0
A(:,:,2) =
0 2 0
8 0 8
A(:,:,3) =
3 0 0
0 7 7
A(:,:,4) =
6 6 0
4 0 0
For the first slice, A(:,:,1), the first row holds the highest value (5). I need the two column positions where it occurs, which in this case are 1 and 3. The same applies to the other slices of A.
I have searched through SO, but since I am not good at MATLAB, I couldn't find a way to work this out.
Please help me with this. It would be better if I didn't need a for loop to get the desired output.
Shot #1 Finding the indices for maximum values across each 3D slice -
%// Reshape A into a 2D matrix
A_2d = reshape(A,[],size(A,3))
%// Find linear indices of maximum numbers for each 3D slice
idx = find(reshape(bsxfun(@eq,A_2d,max(A_2d,[],1)),size(A)))
%// Convert those linear indices to dim1, dim2,dim3 indices and
%// present the final output as a Nx3 array
[dim1_idx,dim2_idx,dim3_idx] = ind2sub(size(A),idx)
out_idx_triplet = [dim1_idx dim2_idx dim3_idx]
Sample run -
>> A
A(:,:,1) =
5 3 5
0 1 0
A(:,:,2) =
0 2 0
8 0 8
A(:,:,3) =
3 0 0
0 7 7
A(:,:,4) =
6 6 0
4 0 0
out_idx_triplet =
1 1 1
1 3 1
2 1 2
2 3 2
2 2 3
2 3 3
1 1 4
1 2 4
out_idx_triplet(:,2) is what you are looking for!
Shot #2 Finding the indices for highest two numbers across each 3D slice -
%// Get size of A
[m,n,r] = size(A)
%// Reshape A into a 2D matrix
A_2d = reshape(A,[],r)
%// Find linear indices of highest two numbers for each 3D slice
[~,sorted_idx] = sort(A_2d,1,'descend')
idx = bsxfun(@plus,sorted_idx(1:2,:),[0:r-1]*m*n)
%// Convert those linear indices to dim1, dim2,dim3 indices
[dim1_idx,dim2_idx,dim3_idx] = ind2sub(size(A),idx(:))
%// Present the final output as a Nx3 array
out_idx_triplet = [dim1_idx dim2_idx dim3_idx]
out_idx_triplet(:,2) is what you are looking for!
The following code gives you the column and row of the respective maximum.
The first step will obtain the maximum of each sub-matrix containing the first and second dimension. Since max works per default with the first dimension, the matrix is reshaped to combine the original first and second dimension.
max_vals = max(reshape(A,size(A,1)*size(A,2),size(A,3)));
max_vals =
5 8 7 6
In the second step, the index of elements equal to the respective max_vals of each sub-matrix is obtained using arrayfun over the third dimension. Since the output of arrayfun are cells, cell2mat is used to transform the output into a matrix. As a last step, the linear index from find is transformed into sub-indices by ind2sub.
[i,j] = ind2sub(size(A(:,:,1)),cell2mat(arrayfun(@(i)find(A(:,:,i)==max_vals(i)),1:size(A,3),'UniformOutput',false)))
i =
1 2 2 1
1 2 2 1
j =
1 1 2 1
3 3 3 2
Hence, the values in j are the ones you want to have.

I want to generate (or count) all possible binary matrices that satisfy a certain condition

I want to generate (or count) all possible binary matrices that satisfy the condition below.
Let A be an arbitrary 4*4 binary matrix:
A= [0 0 1 1]
[0 0 1 1]
[1 1 0 0]
[1 1 0 0]
[sum(row1) sum(r2) sum(r3) sum(r4) sum(column1) sum(c2) sum(c3) sum(c4)]
condition: [2 2 2 2 2 2 2 2]
1) How many matrices satisfy the above condition?
2) How can I generate them?
The answer to 1 is 90,
but I want a formula or an algorithm,
because I want to use it for 1024*1024 or larger matrices and for any arbitrary condition vector.
Brute-force approach: generate all 4x4 binary matrices and test which of them fulfill your condition:
condition = [2 2 2 2 2 2 2 2]; %// desired conditions
matrices = reshape(dec2bin(0:2^16-1,16).'-'0', 4,4,[]); %'// all binary matrices
ind = all(bsxfun(@eq, [squeeze(sum(matrices,2)); squeeze(sum(matrices,1))],...
condition(:))); %// row sums first, then column sums; gives 1 for matrices that fulfill the condition, or else 0
result = matrices(:,:,ind); %// pick solution matrices
number = size(result,3); %// number of solution matrices
The solution matrices are result(:,:,1), result(:,:,2), ...; and number is the number of solution matrices.
You could speed this up a little by exploiting symmetries.
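One way to shrink the brute-force search is sketched below, under the assumption that every row must have the same sum k (true for the condition in the question). Instead of all 2^16 matrices, it only combines rows that already have k ones and then checks the column sums. This is an illustration with made-up variable names, not the symmetry-based speedup mentioned above, and it still grows exponentially, so it does not scale to 1024*1024.

```matlab
n = 4;                        % matrix size
k = 2;                        % required sum of every row and every column
pos = nchoosek(1:n, k);       % each row lists the positions of the ones in a candidate row
rows = zeros(size(pos,1), n);
for t = 1:size(pos,1)
    rows(t, pos(t,:)) = 1;    % build the 6 candidate rows with exactly k ones
end

% Every way of picking 4 candidate rows (6^4 = 1296 combinations)
[a, b, c, d] = ndgrid(1:size(rows,1));
combos = [a(:) b(:) c(:) d(:)];

count = 0;
for t = 1:size(combos,1)
    M = rows(combos(t,:), :); % row sums are k by construction
    if all(sum(M,1) == k)     % only the column sums need checking
        count = count + 1;
    end
end
disp(count)                   % 90, matching the count stated in the question
```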

Checking values of two vectors against each other and then using the column location of equal entries to extract columns from a matrix in MATLAB

I'm doing a curve fitting problem in MATLAB, and so far I've set up some orthonormal polynomials along a specified range of x-values with x = (0:0.0001:40);
The polynomials themselves are each a manipulation of that x vector and are stored as rows in a matrix. I also have some data entries in the form of two vectors: one for the data x-coords and one for the actual values. I need a way to use the x-coords of my data points to find the same values in my continuous x vector, then take the corresponding columns from my polynomial matrix and add them to a new matrix.
EDIT: To be more clear. I have, for example:
x = [0 1 2 3 4 5]
Polynomial =
1 1 1 1 1 1
0 1 2 3 4 5
0 1 4 9 16 25
% Data values:
xcoord = [1 3 4]
values = [5 3 8]
I want to check the x-coord values against 'x' to find the corresponding columns and then pull out those columns from the polynomial matrix to get:
Polynomial =
1 1 1
1 3 4
1 9 16
If your x and xcoord were the same length, you could use logical indexing, which is elegant: something along the lines of Polynomial(:, x==xcoord). Since that doesn't seem to be the case, a less fancy solution is a for-loop with find(xcoord(i)==x), collecting the matching column on each iteration.
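A loop-free variant is also possible with ismember, which returns a logical mask over x marking the entries that appear in xcoord. The sketch below uses the small example from the question; the name Selected is made up here, and exact matching is an assumption that holds for these integer values.

```matlab
x = [0 1 2 3 4 5];
Polynomial = [ones(1,6); x; x.^2];   % rows: 1, x, x^2 as in the question
xcoord = [1 3 4];                    % x-coordinates of the data points

cols = ismember(x, xcoord);          % logical mask over the columns of Polynomial
Selected = Polynomial(:, cols);      % keep only the matching columns
disp(Selected)
```

On a grid such as 0:0.0001:40 the values are floating point, so exact equality may miss points; comparing within a tolerance (for example with ismembertol, where available) or rounding both vectors first is probably safer.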

sparse matrix parallel to the full matrix syntax of A(ind,ind)=1 in Matlab

In Matlab I can populate all combinations of elements in a full matrix by doing the following:
A=zeros(5);
ind=[1 4 5];
A(ind,ind)=1
A =
1 0 0 1 1
0 0 0 0 0
0 0 0 0 0
1 0 0 1 1
1 0 0 1 1
How can I accomplish that when my matrix A is sparse? (say A=zeros(1e6) and I only want ~1000 elements to be 1 etc...)
You can use the sparse command, as follows:
% create a 5x5 sparse matrix A, with 1's at A(ind,ind)
[row,col] = meshgrid(ind,ind); % form indexing combinations
row = row(:); % rearrange matrices to column vectors
col = col(:);
A = sparse(row, col, 1, 5, 5);
While it is possible to index sparse matrices using the conventional A(1,2) = 1 style, generally this is not a good idea. MATLAB sparse matrices are stored very differently to full matrices behind the scenes and do not support efficient dynamic indexing of this kind.
To get good performance sparse matrices should be built in one go using the sparse(i,j,x,m,n) syntax.
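For the large case mentioned in the question, the same one-shot construction might look like the sketch below. The size n and the index values are hypothetical and chosen only for illustration.

```matlab
n = 1e6;                             % matrix is n-by-n, far too large to store as full
ind = [3 500000 999999];             % hypothetical indices to connect
[row, col] = meshgrid(ind, ind);     % all index combinations
A = sparse(row(:), col(:), 1, n, n); % build the sparse matrix in one call

disp(nnz(A))                         % 9 stored nonzeros
disp(full(A(500000, 3)))             % 1
```

Note that sparse sums the values of repeated (row, col) pairs, so ind should not contain duplicates if every entry is meant to be exactly 1.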

Avoiding sub2ind and ind2sub

I need to access several indices around a certain point in 3D.
For example, for point (x1,y1,z1) I need to get all the indices of its 3x3x3 neighborhood such that (x1,y1,z1) is centered. For neighborhood of size 3, I do it with
[x,y,z] = meshgrid(-1:1,-1:1,-1:1);
x_neighbors = bsxfun(@plus,x,x1);
y_neighbors = bsxfun(@plus,y,y1);
z_neighbors = bsxfun(@plus,z,z1);
Here, I center x1,y1,z1 to (0,0,0) by adding the distances from (x1,y1,z1) to any point in the 3x3x3 box.
That gives me the coordinates of the 3x3x3 neighborhood of (x1,y1,z1). I then need to turn them into linear indices so I can access them:
lin_ind = sub2ind(size(volume),y_neighbors,x_neighbors,z_neighbors);
That is costly in what I do.
My question is, how to avoid sub2ind. If inx is the linear index of (x1,y1,z1),
inx = sub2ind(size(volume),y1,x1,z1);
how can I find the 3x3x3 neighborhood of the linear index by adding or subtracting or any other simple operation of inx?
As long as you know the dimensions of your 3D array, you can compute the linear offsets of all the elements of the 3x3x3 neighborhood. To illustrate this, consider a 2D example of a 4x5 matrix. The linear indices look like this:
1 5 9 13 17
2 6 10 14 18
3 7 11 15 19
4 8 12 16 20
The 3x3 neighborhood of 10 is [5 6 7 9 10 11 13 14 15]. The 3x3 neighborhood of 15 is [10 11 12 14 15 16 18 19 20]. If we subtract off the index of the central element, in both cases we get [-5 -4 -3 -1 0 1 3 4 5]. More generally, for MxN matrix we will have [-M-1 -M -M+1 -1 0 1 M-1 M M+1], or [(-M+[-1 0 1]) -1 0 1 (M+[-1 0 1])].
Generalizing to three dimensions, if the array is MxNxP, the linear index offsets from the central element will be [(-M*N+[-M-1 -M -M+1 -1 0 1 M-1 M M+1]) [-M-1 -M -M+1 -1 0 1 M-1 M M+1] (M*N+[-M-1 -M -M+1 -1 0 1 M-1 M M+1])]. You can reshape this to 3x3x3 if you wish.
Note that this sort of indexing doesn't deal well with edges; if you want to find the neighbors of an element on the edge of the array you should probably pad the array on all sides first (thereby changing M, N, and P).
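The 2D case above can be reproduced with a short sketch. The variable names are chosen here for illustration, and the two centers (10 and 15) are interior elements, so no padding is needed.

```matlab
M = 4; N = 5;
idx = reshape(1:M*N, M, N);          % matrix whose entries are their own linear indices

% Offsets of a 3x3 neighborhood: row shift (-1:1) plus column shift M*(-1:1)
offsets = bsxfun(@plus, (-1:1).', M*(-1:1));
offsets = offsets(:).';              % [-5 -4 -3 -1 0 1 3 4 5] for M = 4

nb10 = 10 + offsets;                 % neighborhood of linear index 10
nb15 = 15 + offsets;                 % neighborhood of linear index 15

% Compare with the sub-blocks read directly from idx
disp(isequal(nb10, reshape(idx(1:3, 2:4), 1, [])))
disp(isequal(nb15, reshape(idx(2:4, 3:5), 1, [])))
```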
Just adding the (generalized) code to @nhowe's answer:
This is an example for a neighborhood of size 5x5x5, therefore r (the radius) is 2:
ns = 5;
r = 2;
[M,N,D] = size(vol);
rs = (1:ns)-(r+1);
% 2d generic coordinates:
neigh2d = bsxfun(@plus, M*rs,rs');
% 3d generic coordinates:
pages = (M*N)*rs;
pages = reshape(pages,1,1,length(pages));
neigh3d = bsxfun(@plus,neigh2d,pages);
To get the neighborhood of any linear index of vol, just add the linear index to neigh3d:
new_neigh = bsxfun(@plus,neigh3d, lin_index);
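As a sanity check, the offset approach can be compared against sub2ind on a small volume. This is a sketch under assumed values: the 6x7x8 size, the center (3,4,5), and the radius of 1 are all hypothetical, and the center is kept away from the edges.

```matlab
vol = zeros(6, 7, 8);                % hypothetical volume
[M, N, D] = size(vol);
r = 1;                               % radius 1 gives a 3x3x3 neighborhood
rs = -r:r;

% Generic offsets, built as in the answer above
neigh2d = bsxfun(@plus, M*rs, rs');
neigh3d = bsxfun(@plus, neigh2d, reshape((M*N)*rs, 1, 1, []));

% Neighborhood of an interior point given as (row, column, page)
r0 = 3; c0 = 4; p0 = 5;
lin_index = sub2ind(size(vol), r0, c0, p0);
new_neigh = lin_index + neigh3d;

% Reference result computed the costly way, with sub2ind on every neighbor
[dr, dc, dp] = ndgrid(rs);
expected = sub2ind(size(vol), r0 + dr, c0 + dc, p0 + dp);
disp(isequal(new_neigh, expected))
```

The offsets in neigh3d depend only on the size of vol, so they can be computed once and reused for every center point.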