MATLAB: backtrack a path from matrices

I have the following data:
minval = NaN 7 8 9 9 9 10 10 10 10
NaN NaN 10 10 10 10 10 10 10 10
NaN NaN NaN 10 10 9 10 10 10 9
NaN NaN NaN NaN 9 9 10 9 10 10
NaN NaN NaN NaN NaN 9 10 10 10 10
NaN NaN NaN NaN NaN NaN 10 11 10 10
NaN NaN NaN NaN NaN NaN NaN 10 10 10
NaN NaN NaN NaN NaN NaN NaN NaN 10 10
NaN NaN NaN NaN NaN NaN NaN NaN NaN 10
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
and I do the following:
C=size(minval,2);
D1(1,2:end) = minval(1,2:C);
D2 = bsxfun(@plus,minval(2:C-1,3:C),D1(1,1:C-2)');
D2 = [zeros(1,size(D2,2)) ;D2];
D2(D2==0) = NaN;
D1(2,3:end) = nanmin(D2);
D3 = bsxfun(@plus,minval(3:C-1,4:C),D1(2,2:C-2)');
D3 = [zeros(2,size(D3,2)) ;D3];
D3(D3==0) = NaN;
D1(3,4:end)= nanmin(D3);
Then I want to backtrack the path that D1(end,end) comes from.
Can anyone help? Thank you.

In MATLAB you can index out parts of matrices directly. There's no need for loops here:
C=size(minval,2);
D1(2:C) = minval(1,2:C);
For these loops, I suspect you are not doing what you intended:
for e=3:C
for b=2:e-1
D2(e)=min(minval(b,e)+D1(b-1));
end
end
In the inner loop, every value of b (from 2 to e-1) overwrites D2(e), so only the result for the last value of b is kept; calling min on a single scalar simply returns that scalar. There may well be a much simpler way of getting the result you want. min and similar functions work not just on two single values but on entire matrices. For a matrix, min returns the minimum of each column, ignoring NaN by default, e.g.:
min(minval)
ans =
NaN 7 8 9 9 9 10 9 10 9
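Neither the question's code nor the loop above records which b produced each minimum, and that is exactly what backtracking needs. Here is a minimal sketch of the idea in plain Python (not the asker's exact code): keep an argmin table next to the cost table, then walk it backwards from the final cell. The function names, the 0-based indexing, the toy cost matrix, and the choice to treat NaN or unset entries as "not allowed" are all assumptions made for illustration.

```python
import math

def segment_dp(cost, levels):
    """cost[b][e] is the cost of a segment from b to e (0-based), NaN if not allowed.

    D[k][e] is the cheapest way to end segment number k at column e;
    prev[k][e] remembers the start b of that last segment (the argmin).
    """
    n = len(cost)
    D = [[math.inf] * n for _ in range(levels)]
    prev = [[-1] * n for _ in range(levels)]
    for e in range(n):                      # level 0: a single segment starting at 0
        if not math.isnan(cost[0][e]):
            D[0][e] = cost[0][e]
            prev[0][e] = 0
    for k in range(1, levels):
        for e in range(n):
            for b in range(1, e + 1):
                c = cost[b][e]
                if math.isnan(c) or D[k - 1][b - 1] == math.inf:
                    continue                # assumption: NaN/unset means "not allowed"
                total = c + D[k - 1][b - 1]
                if total < D[k][e]:
                    D[k][e] = total
                    prev[k][e] = b          # record where the minimum came from
    return D, prev

def backtrack(prev, k, e):
    """Follow the recorded argmins back from cell (k, e); returns (start, end) pairs."""
    path = []
    while k >= 0 and e >= 0 and prev[k][e] >= 0:
        b = prev[k][e]
        path.append((b, e))
        e = b - 1                           # the previous segment ends just before b
        k -= 1
    return path[::-1]

nan = math.nan
cost = [                                    # made-up toy data, not the question's minval
    [nan, 7, 8, 9, 9],
    [nan, nan, 10, 10, 10],
    [nan, nan, nan, 10, 10],
    [nan, nan, nan, nan, 8],
    [nan, nan, nan, nan, nan],
]
D, prev = segment_dp(cost, levels=2)
print(D[1][4], backtrack(prev, 1, 4))       # prints: 16 [(0, 2), (3, 4)]
```

In MATLAB the same bookkeeping is available from the second output of min, which returns the row index of each column minimum.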

Related

How to pad an irregularly shaped matrix in matlab

I have a matrix with values in the center and NaNs on the border (imagine a matrix representing a watershed, which is never square). I need to pad it by one cell to do some component stress calculations. I am trying to stay within core MATLAB functionality and avoid outside libraries; what I am trying to do is similar to padarray with 'symmetric', but for an irregular border:
padarray(Zb,[1 1],'symmetric','both');
For example:
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN 2 5 39 55 44 8 NaN NaN NaN
NaN NaN NaN NaN 7 33 48 31 66 17 NaN NaN
NaN NaN NaN NaN 28 NaN 89 NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Becomes:
NaN NaN 2 2 5 39 55 44 8 8 NaN NaN
NaN NaN 2 2 5 39 55 44 8 17 17 NaN
NaN NaN 2 2 7 33 48 31 66 17 17 NaN
NaN NaN NaN 28 28 33 89 31 66 17 17 NaN
NaN NaN NaN 28 28 28 89 89 NaN NaN NaN NaN
(Not sure how to handle convex corners with two adjacent values since I need to control edge effects).
This post follows on an earlier question today in which I was able to extract the locations of these padded cells (buffers) into a dilated logical. However using fillmissing with nearest did not create the effect I expected (what padarray does).
Zb_ext(logical(ZbDilated)) = fillmissing(Zb_ext(logical(ZbDilated)),'nearest');
I might be able to reverse what I did to find the padcells to find the adjacent values and use those to replace the pad cell NaNs. But I thought I would first see if there was a simpler solution?
You can use two 2D convolutions to achieve this, where conv2 is within the core MATLAB library so nothing external is needed, and it should be fast.
However, you noted this:
Not sure how to handle convex corners with two adjacent values since I need to control edge effects
I've taken the liberty of defining a "sensible" output for convex corners which is to take the average value, because from your example it seems undefined how these cases, and more complicated ones like cell (5,6), should be handled.
I've added detailed comments to the below code for explanation
% Example matrix
A = [
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN 2 5 39 55 44 8 NaN NaN NaN
NaN NaN NaN NaN 7 33 48 31 66 17 NaN NaN
NaN NaN NaN NaN 28 NaN 89 NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
];
% Track the "inner" indices, where values are defined
inner = (~isnan(A));
B = A; % Copy A so we don't change it
B(~inner) = 0; % Replace NaN with 0 so that convolutions work OK
% First dilate the inner region by one element, taking the average of
% neighbours which are up/down/left/right (no diagonals). This is required
% to avoid including interior points (which only touch diagonally) in the
% averaging. These can be considered the "cardinal neighbours"
kernel = [0 1 0 ; 1 0 1; 0 1 0]; % Cardinal directions in 3x3 stencil
s = conv2(B,kernel,'same'); % 2D convolution to get sum of neighbours
n = conv2(inner,kernel,'same'); % 2D convolution to get count of neighbours
s(inner) = 0; % Zero out the inner region
s = s./n; % Get the mean of neighbours
% Second, dilate the inner region but including the mean from all
% directions. This lets us handle convex corners in the image
s2 = conv2(B,ones(3),'same'); % Sum of neighbours (and self, doesn't matter)
n = conv2(inner,ones(3),'same'); % Count of neighbours (self=0 for dilated elems)
s2 = s2./n; % Get the mean of neighbours
% Finally piece together the 3 matrices:
out = s2; % Start with outmost dilation inc. corners
out(~isnan(s)) = s(~isnan(s)); % Override with inner dilation for cardinal neighbours
out(inner) = A(inner); % Override with original inner data
So for this example, the output would be the same as your example output, except for corners as mentioned:
NaN NaN 2 2 5 39 55 44 8 8 NaN NaN
NaN NaN 2 2 5 39 55 44 8 12.5 17 NaN
NaN NaN 2 4.5 7 33 48 31 66 17 17 NaN
NaN NaN NaN 28 28 50 89 60 66 17 17 NaN
NaN NaN NaN 28 28 58.5 89 89 NaN NaN NaN NaN
Related (and utilised): MATLAB/Octave: Calculate the sum of adjacent/neighboring elements in a matrix
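To check the rule independently of the convolutions, it can be restated cell by cell. The following is a plain-Python sketch under the same convention as the answer (mean of the cardinal neighbours first, mean of all eight neighbours as a fallback). The function names are hypothetical and the loops are written for readability, not speed.

```python
import math

CARDINAL = [(-1, 0), (1, 0), (0, -1), (0, 1)]
DIAGONAL = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def neighbour_mean(grid, r, c, offsets):
    """Mean of the defined (non-NaN) neighbours of cell (r, c), or NaN if none."""
    vals = []
    for dr, dc in offsets:
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
            if not math.isnan(grid[rr][cc]):
                vals.append(grid[rr][cc])
    return sum(vals) / len(vals) if vals else math.nan

def pad_one_cell(grid):
    """Fill every NaN cell that touches the data, leaving original values untouched."""
    out = [row[:] for row in grid]
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if not math.isnan(value):
                continue                    # original data is kept as is
            near = neighbour_mean(grid, r, c, CARDINAL)
            if math.isnan(near):            # no cardinal neighbour: use all 8 neighbours
                near = neighbour_mean(grid, r, c, CARDINAL + DIAGONAL)
            out[r][c] = near
    return out

n = math.nan
A = [
    [n, n, n, n, n, n, n, n, n, n, n, n],
    [n, n, n, 2, 5, 39, 55, 44, 8, n, n, n],
    [n, n, n, n, 7, 33, 48, 31, 66, 17, n, n],
    [n, n, n, n, 28, n, 89, n, n, n, n, n],
    [n, n, n, n, n, n, n, n, n, n, n, n],
]
out = pad_one_cell(A)
print(out[1][9], out[2][3], out[4][5])      # prints: 12.5 4.5 58.5
```

The three printed cells are the convex-corner cases from the example output above (12.5, 4.5 and 58.5).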

How to get utility Matrix from initial dataset?

While applying Alternating Least Squares, I found that I need a utility matrix.
I'm working with the MovieLens 20M dataset, which contains a ratings file (userId, movieId, rating).
I know the utility matrix is M x N, where M is the number of users and N is the number of movies.
My question: how do I build the utility matrix from the ratings file?
Because the pivot call on the 20M dataset would not fit in memory on my computer, I am showing the process for the 1M dataset.
import re
import os
import zipfile
import numpy as np
import pandas as pd
from sklearn import preprocessing
from urllib.request import urlretrieve
# Creating required folders, if they don't exist
def create_dir(dirname):
    if os.path.exists(dirname):
        print(f"Directory {dirname} already exists.")
    else:
        os.mkdir(dirname)
create_dir('Datasets')
print("Downloading movielens data...")
urlretrieve("http://files.grouplens.org/datasets/movielens/ml-1m.zip", "movielens.zip")
zip_ref = zipfile.ZipFile('movielens.zip', "r")
zip_ref.extractall()
print("Extraction done")
# Loading ratings dataset and renamed extracted folder
ratings = pd.read_csv('ml-1m/ratings.dat', sep='::', engine='python', names=['userId', 'movieId', 'rating', 'timestamp'])
ratings = ratings.drop(columns=['timestamp'])
ratings.to_csv('Datasets/ratings.csv', index=False)
print(ratings.shape)
pivot_table = ratings.pivot_table(index=['userId'], columns=['movieId'], values='rating')
pivot_table.to_csv('Datasets/user_vs_movies.csv', index=False)
pivot_table.head()
Output:
Downloading movielens data...
Extraction done
(1000209, 3)
movieId 1 2 3 4 5 6 7 8 9 10 ... 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952
userId
1 5.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN 2.0 NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 rows × 3706 columns
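The dense pivot stores a cell for every (user, movie) pair, almost all of them NaN, which is why it runs out of memory on the larger dataset. One possible alternative, sketched here with the standard library only, is to store just the observed ratings as (row, column) -> rating entries plus two index maps. The function name and the three sample triples are made up for illustration and are not taken from the MovieLens files.

```python
def build_utility(ratings):
    """Build a sparse utility matrix from (user_id, movie_id, rating) triples.

    Returns the stored cells plus the id -> row and id -> column maps.
    A (row, col) key that is absent from `cells` means "not rated".
    """
    user_index, movie_index = {}, {}
    cells = {}
    for user, movie, rating in ratings:
        r = user_index.setdefault(user, len(user_index))     # next free row on first sight
        c = movie_index.setdefault(movie, len(movie_index))  # next free column on first sight
        cells[(r, c)] = rating
    return cells, user_index, movie_index

# Toy triples, invented for the example
sample = [(1, 1193, 5), (1, 661, 3), (2, 1193, 4)]
cells, user_index, movie_index = build_utility(sample)
print(len(user_index), len(movie_index), cells[(1, 0)])      # prints: 2 2 4
```

Memory then grows with the number of ratings rather than with M x N. In practice the triples could be streamed from the ratings file line by line instead of being loaded all at once.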

Matlab: Plot non-equal matrices in a cell array without a loop

Knowing that:
There is a lot of discussion about plotting equal-sized matrices in a cell array, and it is quite easy to do without a loop.
For example, to plot the 2-by-2 matrices in mycell:
mycell = {[1 1; 2 1], [1 1; 3 1], [1 1; 4 1]};
We can use cellfun to add a row of NaN at the bottom of each matrix and then convert the cell to a matrix:
mycellnaned = cellfun(@(x) {[x;nan(1,2)]}, mycell);
mymat = cell2mat(mycellnaned); % mycell is a 1-by-3 row here, so no transpose is needed
mymat looks like:
1 1 1 1 1 1
2 1 3 1 4 1
NaN NaN NaN NaN NaN NaN
Then we can plot it easily:
mymatx = mymat(:,1:2:end);
mymaty = mymat(:,2:2:end);
figure;
plot(mymatx, mymaty,'+-');
The problem:
The problem is now, how do I do something similar with a cell containing non-equal matrices? Such as:
mycell = {
[1:2; ones(1,2)]';
[1:4; ones(1,4)*2]';
[1:6; ones(1,6)*3]';
[1:8; ones(1,8)*4]';
[1:10; ones(1,10)*5]';
[1:12; ones(1,12)*6]';
};
mycell = repmat(mycell,1000,1);
I would not be able to convert them into one matrix like I did before. I could use a loop, as suggested in this answer, but it would be very inefficient if the cell contains thousands of matrices.
Therefore, I'm looking for a more efficient way of plotting non-equal sized matrices in a cell array.
Note that different colours should be used for different matrices in the figure.
Well, while I was writing the question, I figured it out...
I'd like to keep the question open since there might be better solutions.
For everyone else's reference, the solution is simple: add NaN to make the matrices equal sized:
% find out the maximum length of all matrices in the array
cellLengthMax = max(cellfun('length', mycell));
% fill the matrices so they are equal in size.
mycellfilled = cellfun(@(x) {[
x
nan(cellLengthMax-size(x,1), 2)
nan(1, 2)
]}, mycell);
Then convert to a matrix and plot:
mymat = cell2mat(mycellfilled');
mymatx = mymat(:,1:2:end);
mymaty = mymat(:,2:2:end);
figure;
plot(mymatx, mymaty,'+-');
mymat looks like:
1 1 1 2 1 3 1 4 1 5 1 6
2 1 2 2 2 3 2 4 2 5 2 6
NaN NaN 3 2 3 3 3 4 3 5 3 6
NaN NaN 4 2 4 3 4 4 4 5 4 6
NaN NaN NaN NaN 5 3 5 4 5 5 5 6
NaN NaN NaN NaN 6 3 6 4 6 5 6 6
NaN NaN NaN NaN NaN NaN 7 4 7 5 7 6
NaN NaN NaN NaN NaN NaN 8 4 8 5 8 6
NaN NaN NaN NaN NaN NaN NaN NaN 9 5 9 6
NaN NaN NaN NaN NaN NaN NaN NaN 10 5 10 6
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 11 6
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 12 6
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
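The padding step on its own, separated from the plotting, can be sketched in plain Python as follows. The function name and the two short curves are hypothetical; the layout mirrors mymat, with one x/y column pair per curve and a trailing row of NaN.

```python
import math

def pad_and_join(curves):
    """curves: list of lists of (x, y) pairs with differing lengths.

    Returns rows of a rectangular table with one x column and one y column
    per curve, padded with NaN, plus one final all-NaN row.
    """
    height = max(len(c) for c in curves) + 1       # +1 for the trailing NaN row
    rows = []
    for i in range(height):
        row = []
        for curve in curves:
            x, y = curve[i] if i < len(curve) else (math.nan, math.nan)
            row.extend([x, y])
        rows.append(row)
    return rows

curves = [
    [(1, 1), (2, 1)],
    [(1, 2), (2, 2), (3, 2), (4, 2)],
]
rows = pad_and_join(curves)
print(len(rows), len(rows[0]), rows[0])            # prints: 5 4 [1, 1, 1, 2]
```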
Update:
Time cost for plotting 6000 matrices:
using the solution proposed here: 1.183546 seconds.
using a loop: 3.450423 seconds.
Still not very satisfactory. I would really like to reduce the time to 0.1 seconds, because I'm trying to design an interactive UI where the user can change a few parameters and the result gets plotted instantly.
I don't want to reduce the resolution of the figure.
Update:
I ran the profiler, and it seems that 99% of the time is spent in plot(mymatx, mymaty,'+-');. So the conclusion is that there is probably no way to speed this up further.

Include rows of NaN in matrix at predetermined row numbers.

Initially, I have
Matrix A=
[ 1 2 3
4 255 6
NaN NaN NaN
7 8 9
10 11 12
NaN NaN NaN
10 9 11 ];
I find out the row numbers which are all NaN.
Row_NaN_MatA = [3 6];
After eliminating these rows, I am left with:
Matrix B1 =
[ 1 2 3
4 255 6
7 8 9
10 11 12
10 9 11 ];
After applying a filter, I make the second row of Matrix B = NaN NaN NaN. Therefore
Matrix B2 =
[ 1 2 3
NaN NaN NaN
7 8 9
10 11 12
10 9 11 ];
Now, the question is: after all this processing, I want to get the initial matrix back, but with all the deleted elements as NaN. So the required output is:
Output Matrix=
[ 1 2 3
NaN NaN NaN
NaN NaN NaN
7 8 9
10 11 12
NaN NaN NaN
10 9 11 ];
I know the dimensions of output I want (= initial Matrix A dimensions), and the row numbers which should be NaN (= Row_NaN_MatA) . The rest of the rows should be equal to rows of Matrix B2.
How can I do this?
Use setdiff to get the row IDs that are not in Row_NaN_MatA, by taking the set difference between a 1D array of all row indices of A and Row_NaN_MatA, like so -
output = A
output(setdiff(1:size(A,1),Row_NaN_MatA),:) = B2
You can also use ismember for the same effect -
output(~ismember(1:size(A,1),Row_NaN_MatA),:) = B2
Or use bsxfun there -
output(all(bsxfun(@ne,Row_NaN_MatA(:),1:size(A,1))),:) = B2
Sample run -
>> A
A =
1 2 3
4 255 6
NaN NaN NaN
7 8 9
10 11 12
NaN NaN NaN
10 9 11
>> B1
B1 =
1 2 3
4 255 6
7 8 9
10 11 12
10 9 11
>> B2
B2 =
1 2 3
NaN NaN NaN
7 8 9
10 11 12
10 9 11
>> output
output =
1 2 3
NaN NaN NaN
NaN NaN NaN
7 8 9
10 11 12
NaN NaN NaN
10 9 11
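The same bookkeeping, written out step by step in plain Python as a cross-check of the indexing logic. The function name is hypothetical, and row numbers are kept 1-based to match Row_NaN_MatA in the question.

```python
import math

def restore_rows(filtered, n_rows, nan_rows):
    """Re-insert all-NaN rows at the given 1-based row numbers.

    filtered: the rows that survived the deletion (B2 in the question)
    n_rows:   number of rows of the original matrix A
    nan_rows: 1-based numbers of the rows that were removed
    """
    width = len(filtered[0])
    removed = set(nan_rows)
    kept = iter(filtered)                   # rows of B2 are consumed in order
    out = []
    for row_number in range(1, n_rows + 1):
        if row_number in removed:
            out.append([math.nan] * width)
        else:
            out.append(list(next(kept)))
    return out

n = math.nan
B2 = [[1, 2, 3], [n, n, n], [7, 8, 9], [10, 11, 12], [10, 9, 11]]
output = restore_rows(B2, n_rows=7, nan_rows=[3, 6])
print(output[3], output[6])                 # prints: [7, 8, 9] [10, 9, 11]
```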

How to trim leading and trailing NaN values from n-dimensional array?

This is easy in two dimensions, for example:
>> A = NaN(5,4)
>> A(2:4,2:3) = [1 2; 3 4; 5 6]
>> A(2,2) = NaN
>> A(4,3) = NaN
A =
NaN NaN NaN NaN
NaN NaN 2 NaN
NaN 3 4 NaN
NaN 5 NaN NaN
NaN NaN NaN NaN
>> A(~all(isnan(A),2),~all(isnan(A),1))
ans =
NaN 2
3 4
5 NaN
Note that NaN values in rows and columns that are not all NaN are retained.
How to expand this to multiple dimensions? For example if A has three dimensions:
>> A = NaN(5,4,3)
>> A(2:4,2:3,2) = [1 2; 3 4; 5 6]
>> A(2,2,2) = NaN
>> A(4,3,2) = NaN
A(:,:,1) =
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
A(:,:,2) =
NaN NaN NaN NaN
NaN NaN 2 NaN
NaN 3 4 NaN
NaN 5 NaN NaN
NaN NaN NaN NaN
A(:,:,3) =
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
How do I then get
ans =
NaN 2
3 4
5 NaN
I'd like to do this in four dimensions, and with much larger matrices than the example matrix A here.
My solution to the problem based on the input A as posted by OP:
>> [i,j,k] = ind2sub(size(A),find(~isnan(A)));
>> l = min([i j k]);
>> u = max([i j k]);
>> B=A(l(1):u(1),l(2):u(2),l(3):u(3))
B =
NaN 2
3 4
5 NaN
>> size(B)
ans =
3 2
Since you stated that you want to do this on much larger matrices, I'm not sure about the performance of @ronalchn's solution, that is, all the all() calls. But I have no idea to what extent that matters; maybe someone can comment...
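The bounding-box idea above does not depend on the number of dimensions: collect the index of every non-NaN element, take the per-axis minimum and maximum, and crop to that box. Here is a plain-Python sketch on nested lists, with hypothetical function names and 0-based indices, in which the axis order of the nested lists stands in for MATLAB's dimensions.

```python
import math

def defined_indices(arr, prefix=()):
    """Yield the index tuple of every non-NaN leaf in a nested list."""
    for i, item in enumerate(arr):
        if isinstance(item, list):
            yield from defined_indices(item, prefix + (i,))
        elif not math.isnan(item):
            yield prefix + (i,)

def crop(arr, lows, highs):
    """Slice a nested list to lows[d]..highs[d] (inclusive) along each axis d."""
    lo, hi = lows[0], highs[0]
    if len(lows) == 1:
        return arr[lo:hi + 1]
    return [crop(sub, lows[1:], highs[1:]) for sub in arr[lo:hi + 1]]

def trim_nan(arr):
    """Crop to the bounding box of the non-NaN elements (empty list if none)."""
    idx = list(defined_indices(arr))
    if not idx:
        return []
    lows = [min(axis) for axis in zip(*idx)]
    highs = [max(axis) for axis in zip(*idx)]
    return crop(arr, lows, highs)

n = math.nan
# The 5x4x3 example from the question, indexed as A[row][col][page]
A = [[[n] * 3 for _ in range(4)] for _ in range(5)]
A[1][2][1] = 2
A[2][1][1] = 3
A[2][2][1] = 4
A[3][1][1] = 5
B = trim_nan(A)
print(len(B), len(B[0]), len(B[0][0]))      # prints: 3 2 1
```

One difference to keep in mind: a bounding box only trims the outer border, whereas indexing with ~all(isnan(A),...) also drops all-NaN rows, columns or pages that lie between valid data.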
Try this:
2 dimensions
A(~all(isnan(A),2),~all(isnan(A),1))
3 dimensions
A(~all(all(isnan(A),2),3),...
~all(all(isnan(A),1),3),...
~all(all(isnan(A),1),2))
4 dimensions
A(~all(all(all(isnan(A),2),3),4),...
~all(all(all(isnan(A),1),3),4),...
~all(all(all(isnan(A),1),2),4),...
~all(all(all(isnan(A),1),2),3))
Basically, the rule for N dimensions is:
apply isnan() once to the whole array,
then, for the index of the i-th dimension, wrap it in all() N-1 times,
where the second arguments of those all() calls are the numbers 1 to N in any order, excluding i.
Since Theodros Zelleke wants to see whose method is faster (nice way of saying he thinks his method is so fast), here's a benchmark. Matrix A defined as:
A = NaN*ones(100,400,3,3);
A(2:4,2:3,2,2) = [1 2; 3 4; 5 6];
A(2,2,2,2) = NaN;A(4,3,2,2) = NaN;
A(5:80,4:200,2,2)=ones(76,197);
His test defined as:
tic;
for i=1:100
[i,j,k,z] = ind2sub(size(A),find(~isnan(A)));
l = min([i j k z]);
u = max([i j k z]);
B=A(l(1):u(1),l(2):u(2),l(3):u(3),l(4):u(4));
end
toc
With results:
Elapsed time is 0.533932 seconds.
Elapsed time is 0.519216 seconds.
Elapsed time is 0.575037 seconds.
Elapsed time is 0.525000 seconds.
My test defined as:
tic;
for i=1:100
isnanA=isnan(A);
ai34=all(all(isnanA,3),4);
ai12=all(all(isnanA,1),2);
B=A(~all(ai34,2),~all(ai34,1),~all(ai12,4),~all(ai12,3));
end
toc
With results:
Elapsed time is 0.224869 seconds.
Elapsed time is 0.225132 seconds.
Elapsed time is 0.246762 seconds.
Elapsed time is 0.236989 seconds.