I have an UNBALANCED dataset containing five fields like:
a_code b_code sector year value
1 2 15 1970 1000
2 3 16 1971 2900
3 2 15 1970 3900
I want to create a 4-dimensional matrix in MATLAB for the "value" field, i.e. a matrix such that M(a_code,b_code,sector,year) = value. I have 75 a_codes, 75 b_codes, 19 sectors and 45 years, so the full NaN matrix is NaN(75,75,19,45).
Because my dataset is unbalanced, not every (a_code, b_code, sector, year) combination has a value (for example, there is none for a_code = 3, b_code = 1, sector = 15, year = 1970). For the missing combinations I want NaN. I know how to create a 4-dimensional matrix of NaN values, but how do I replace those NaNs with the values from my dataset?
Probably I should write a loop, but I don't know how.
Here is some simple code to fulfill your requirements:
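D = [1 2 15 1970 1000; 2 3 16 1971 2900; 3 2 15 1970 3900];
m = min(D(:, 1:end-1)) - 1;        % offset so the smallest key in each column maps to index 1
shape = max(D(:, 1:end-1)) - m;    % extent needed along each of the four dimensions
X = NaN(shape);                    % combinations without data stay NaN
for k = 1:size(D, 1)
    n = D(k, 1:end-1) - m;         % shifted subscripts for row k
    X(sub2ind(shape, n(1), n(2), n(3), n(4))) = D(k, end);
end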
X(1, 1, 1, 1) %=> 1000
X(2, 2, 2, 2) %=> 2900
X(3, 1, 1, 1) %=> 3900
You may want to elaborate on your specific situation; there may be more suitable approaches. For example, from your question it's not quite clear why you need your data represented as a 4D array.
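The loop can also be avoided, because sub2ind accepts whole columns of subscripts. The sketch below assumes the same sample matrix D (columns a_code, b_code, sector, year, value) and that each combination occurs at most once; the names keys, n and Y are illustrative. It also shows accumarray with NaN as the fill value, which builds the same array in one call (note that accumarray sums duplicate combinations instead of overwriting them).

```matlab
% Sample data: columns are a_code, b_code, sector, year, value
D = [1 2 15 1970 1000; 2 3 16 1971 2900; 3 2 15 1970 3900];
keys = D(:, 1:end-1);                    % the four index columns
m = min(keys, [], 1) - 1;                % offset so the smallest key in each column maps to 1
n = bsxfun(@minus, keys, m);             % shifted subscripts, one row per observation
shape = max(n, [], 1);                   % size of the dense 4-D array

% Option 1: preallocate with NaN, then assign all values at once via linear indices
X = NaN(shape);
X(sub2ind(shape, n(:,1), n(:,2), n(:,3), n(:,4))) = D(:, end);

% Option 2: accumarray, using NaN as the fill value for missing combinations
Y = accumarray(n, D(:, end), shape, [], NaN);

isequaln(X, Y)                           % true: NaN appears in the same places in both
```

For the full dataset, shape would come out as [75 75 19 45] provided the codes, sectors and years each form a contiguous range of integers.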
I have a matrix, for example
A = [1 11 22 33; 44 13 12 33; 1 14 33 44]
and I want to calculate the mean of each column separately. The tricky part is that, in each column, I only want to average the numbers that are greater than that column's 25th percentile.
I was thinking of simply computing the 25th percentile and then using it as a criterion for selecting rows. Unfortunately, this does not work.
To clarify further: for each column, first calculate the 25th percentile
prctile(A,25,1)
and then calculate the mean of only those numbers that are bigger than the percentile of their own column.
Any help?
Thanks!
You can create a version of A in which the values at or below the 25th percentile are NaN, then use the 'omitnan' flag in mean to exclude those points:
A = [1 11 22 33; 44 13 12 33; 1 14 33 44];
B = A; % copy to leave A unaltered
B( B <= prctile(B,25,1) ) = NaN; % Set values at or below the column percentile to NaN (implicit expansion, R2016b+)
C = mean( B, 1, 'omitnan' ); % Omit the NaN values from the calculation
% C >>
% [ 44.00 13.50 27.50 44.00 ]
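An alternative that avoids modifying a copy of A is to build a logical mask and divide each column's sum of kept entries by the number of kept entries. This is a sketch on the same example matrix; like the answer above, it assumes prctile is available, and p25 and keep are illustrative names.

```matlab
A = [1 11 22 33; 44 13 12 33; 1 14 33 44];
p25 = prctile(A, 25, 1);                 % 25th percentile of each column (1-by-4)
keep = bsxfun(@gt, A, p25);              % true where an entry is strictly above its column's percentile
C = sum(A .* keep, 1) ./ sum(keep, 1)    % mean of the kept entries in each column
% C = [44 13.5 27.5 44]
```

A column in which no entry lies above its percentile yields 0/0, i.e. NaN, which is also what mean with 'omitnan' returns for an all-NaN column.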
I have a vector of 21 years of daily data and want to create rolling windows of 365 days such that each window starts one month (30 days) after the previous one. Below, n_interval is the offset between the first data point of one window and the first data point of the next.
Let's assume my daily data start on Jan. 1, 2000. Then the first column would cover Jan. 1, 2000 to Jan. 1, 2001, the second column would start on Feb. 1, 2000 and end on Feb. 1, 2001, and so on, and the last column would cover Jan. 1, 2017 to Jan. 1, 2018. For example, if
vec = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17]
then for n_interval = 3 and window_size = 5 the output matrix should look like this (each column is one window):
mat = [1 4 7 10 13;
       2 5 8 11 14;
       3 6 9 12 15;
       4 7 10 13 16;
       5 8 11 14 17]
Given your example vector
vec = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17];
we can create an indexing scheme as follows. (In this answer, each row of mat holds one window.)
First, we need to determine how many rows mat will have. Assuming we want every element of vec to appear in mat at least once, we need to make sure that the last index in the last row is greater than or equal to the number of elements in vec. It's fairly easy to see that the index in the last column of the last row of mat is
last_index = n_interval*(n_rows-1) + n_columns
We want to ensure that last_index >= numel(vec). Substituting the above expression into the inequality and solving for n_rows gives
n_rows >= (numel(vec) - n_columns)/n_interval + 1
We set n_rows to the ceil of this bound, so that it is the smallest integer that satisfies the inequality. Now that we know the number of rows, we generate the list of starting indices for each row:
start_index = 1:n_interval:(n_interval*(n_rows-1)+1);
In the index matrix we want each column to be 1 plus the previous column. In other words, we offset the columns according to the array index_offset = 0:(n_columns-1).
Using bsxfun, we generate the index matrix by computing the sums of all pairs of elements of start_index and index_offset:
index = bsxfun(@plus, index_offset, start_index');
The final thing we need to worry about is going out of bounds. To handle this, we apply the mod function to wrap the out-of-bounds indices:
index_wrapped = mod(index-1, numel(vec))+1;
Then we simply sample the vector according to index_wrapped
mat = vec(index_wrapped);
The complete code is
n_interval = 3;
n_columns = 5;
vec = 1:17;
n_rows = ceil((numel(vec)-n_columns)/n_interval + 1);
start_index = 1:n_interval:(n_interval*(n_rows-1)+1);
index_offset = 0:(n_columns-1);
index = bsxfun(@plus, index_offset, start_index');
index_wrapped = mod(index-1, numel(vec))+1;
mat = vec(index_wrapped); % each row is one window; use mat.' to get windows as columns, as in the question
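The code above wraps around to the start of vec when the last window would run past the end. For a time series you may prefer to keep only complete windows and to store each window in a column, as in the question's layout. The sketch below makes both of those assumptions; window_size, n_windows and start_index are illustrative names, and for the daily data you would set window_size = 365 and n_interval = 30.

```matlab
vec = 1:17;            % stand-in for the daily series
window_size = 5;       % 365 for the daily data
n_interval = 3;        % 30 for a roughly monthly step

% Only complete windows: the last start index must leave room for window_size points
n_windows = floor((numel(vec) - window_size)/n_interval) + 1;
start_index = 1 + n_interval*(0:n_windows-1);            % first index of each window (row)

% window_size-by-n_windows index matrix: column j runs from start_index(j) onwards
index = bsxfun(@plus, (0:window_size-1)', start_index);
mat = vec(index);      % each column is one window
```

With vec = 1:17 the last window ends exactly at element 17, so nothing is dropped; with real data, any trailing days that do not fill a whole window are left out instead of being wrapped.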
Suppose I create the following matrix M.
>> M = reshape(linspace(11,18,8),[2, 2, 2])
M(:,:,1) =
11 13
12 14
M(:,:,2) =
15 17
16 18
>> M([1,2],[2, 1],[2,1])
ans(:,:,1) =
17 15
18 16
ans(:,:,2) =
13 11
14 12
Please explain how the command M([1,2],[2, 1],[2,1]) produces the above result. Explain the indexing in detail.
Let us start with a simple example:
M([1,2],1,1) returns rows 1 and 2 of column 1 of the first 2-D matrix:
ans =
11
12
M([2,1],1,1) returns the same two elements, but with rows 1 and 2 swapped:
ans =
12
11
So the first index controls the order of the rows. Similarly, the second index controls the order of the columns, and the third index controls the order of the 2-D matrices.
Now let us take a small example before we get to your problem:
M([1,2],[2,1],1)
swaps the two columns of the first 2-D matrix (for both rows):
ans =
13 11
14 12
And M([1,2],[1,2],[2,1]) swaps the first and second 2-D matrices:
ans(:,:,1) =
15 17
16 18
ans(:,:,2) =
11 13
12 14
Combining the last two examples therefore swaps the 2-D matrices and swaps their columns while keeping the row order, which is exactly what your output shows:
M([1,2],[2, 1],[2,1])
ans(:,:,1) =
17 15
18 16
ans(:,:,2) =
13 11
14 12
Hope this helps.
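A quick way to convince yourself of the rule is to rebuild the result element by element: with index vectors r, c and p, the output satisfies R(i,j,k) = M(r(i), c(j), p(k)). The sketch below checks that rule for the indices in the question; the names r, c, p, R and R2 are only for illustration.

```matlab
M = reshape(linspace(11, 18, 8), [2, 2, 2]);
r = [1 2];   % row indices, in output order
c = [2 1];   % column indices, in output order
p = [2 1];   % page (2-D matrix) indices, in output order
R = M(r, c, p);

% Rebuild the same result one element at a time: R2(ii,jj,kk) = M(r(ii), c(jj), p(kk))
R2 = zeros(numel(r), numel(c), numel(p));
for ii = 1:numel(r)
    for jj = 1:numel(c)
        for kk = 1:numel(p)
            R2(ii, jj, kk) = M(r(ii), c(jj), p(kk));
        end
    end
end

isequal(R, R2)                        % true
isequal(R, M(:, end:-1:1, end:-1:1))  % true: here the indices just reverse columns and pages
```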
First, M is a three-dimensional array consisting of two 2x2 matrices: M(:,:,1) gets the first 2x2 matrix, and M(:,:,2) gets the second.
Some examples:
M(1, 2, 2) gives 17, the element in row 1 and column 2 of the 2nd matrix.
M(1, 1, [2, 1]) gives 15 and 11, in that order. It takes the element in row 1, column 1 of each of the two matrices, in [2 1] order (so the 2nd matrix comes first).
M(1, 1, [1, 2]) gives 11 and 15, in that order. It takes the same element, but in [1 2] order (so the 1st matrix comes first).
M(1, [1 2], [2, 1]) gives 15 17 and 11 13, in that order. It takes the elements in row 1, columns 1 and 2 (in that order), of each of the two matrices in [2 1] order (so the 2nd matrix comes first).
M(2, [1 2], [2, 1]) gives 16 18 and 12 14, in that order. It takes the elements in row 2, columns 1 and 2 (in that order), of each of the two matrices in [2 1] order.
M(2, [1 1], [2, 1]) gives 16 16 and 12 12, in that order. It takes the element in row 2, column 1 twice (the same column may be repeated), of each of the two matrices in [2 1] order.
So,
M([1 2], [2 1], [2, 1]) gives:
17 15
18 16
(That is, the first page of the output is a 2x2 matrix whose 1st and 2nd columns are the 2nd and 1st columns of the 2nd matrix (because of [2 1]), and whose 1st and 2nd rows are the 1st and 2nd rows of that matrix (because of [1 2]).)
and, as the second page, the same rearrangement of the 1st matrix:
13 11
14 12
....
Also, you may want to read the documentation: https://www.mathworks.com/help/matlab/math/matrix-indexing.html
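One detail behind the examples above: with three subscripts, the result always has size numel(rows)-by-numel(cols)-by-numel(pages), so M(1, 1, [2, 1]) is a 1-by-1-by-2 array rather than a plain vector. The sketch below shows this and uses squeeze to drop the singleton dimensions; v and w are illustrative names.

```matlab
M = reshape(linspace(11, 18, 8), [2, 2, 2]);

v = M(1, 1, [2, 1]);        % one row, one column, two pages
size(v)                     % [1 1 2]
squeeze(v).'                % [15 11]

w = M(2, [1 1], [2, 1]);    % a repeated column index is allowed
size(w)                     % [1 2 2]
squeeze(w)                  % [16 12; 16 12]: rows follow the column index, columns follow the page index
```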
Let's say I have
A=[1 3 4 5 6 7 9 12 15 16 17 18 20 23 24 25 26];
I want to find the middle value of each group of consecutive numbers using MATLAB.
For example, the first group of consecutive numbers is
B=[3 4 5 6 7];
so the answer should be 5. The 2nd group of consecutive numbers (i.e. [15 16 17 18]) should give 16, etc.
In the end, my final answer should be
[5 16 24]
Here is a vectorized approach:
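d = [diff(A) == 1, 0];                 % d(i) is true when A(i+1) == A(i) + 1
subs = cumsum([d(1), diff(d) == 1]).*(d | [0, diff(d) == -1]) + 1;  % 1 for isolated values, 2, 3, ... for each group
temp = accumarray(subs', A', [], @median);  % median of each group
final = floor(temp(2:end))             % drop group 1 (the isolated values)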
Here is some sample code that does what you are looking for. I'll let you play with the different outputs to see what they do exactly, although I added some comments to help you follow:
clear
clc
A=[1 3 4 5 6 7 9 12 15 16 17 18 20 23 24 25 26]
a=diff(A); %// Check the diff array to identify occurrences different from 1.
b=find([a inf]>1);
NumElements=diff([0 b]); %//Number of elements in the sequence
LengthConsec = NumElements((NumElements~=1)) %// Get sequences with >1 values
EndConsec = b(NumElements~=1) %// Check end values to deduce starting values
StartConsec = EndConsec-LengthConsec+1;
%// Initialize a cell array containing the sequences (they can have different
%// lengths, so a numeric array is not recommended) and an array containing the
%// median values.
ConsecCell = cell(1,numel(LengthConsec));
MedianValue = zeros(1,numel(LengthConsec));
for k = 1:numel(LengthConsec)
    ConsecCell{1,k} = A(StartConsec(k):EndConsec(k));
    MedianValue(k) = floor(median(ConsecCell{1,k}));
end
%//Display the result
MedianValue
Giving the following:
MedianValue =
5 16 24
A diff + strfind based approach:
loc_consec_nums = diff(A)==1 %// locations of consecutive (cons.) numbers
starts = strfind([0 loc_consec_nums],[0 1]) %// start indices of cons. numbers
ends = strfind([loc_consec_nums 0],[1 0]) %// index of the second-to-last element of each group of cons. numbers
out = A(ceil(sum([starts ; ends],1)./2)) %// middle index of each group (the lower middle one for even-length groups)
%// and finally index into A with them for the desired output
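Because every group consists of consecutive integers, its median is simply the average of its first and last elements, so the true start and end indices of the groups are all that is needed. The sketch below computes them with diff and find (isStep, starts and ends are illustrative names) and applies floor to reproduce 16 for the even-length group [15 16 17 18]; it assumes each group is a run of steps of exactly +1.

```matlab
A = [1 3 4 5 6 7 9 12 15 16 17 18 20 23 24 25 26];
isStep = diff(A) == 1;                        % true where A(i+1) == A(i) + 1
starts = find(diff([0 isStep]) == 1);         % first index (into A) of each group
ends   = find(diff([isStep 0]) == -1) + 1;    % last index (into A) of each group
out = floor((A(starts) + A(ends)) / 2)        % floor of each group's median
```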
I have a 2-D matrix; let's assume its values are
a =
17 24 1 8 15
23 5 7 14 16
4 6 13 20 22
10 12 19 21 3
17 24 1 8 15
11 18 25 2 9
This matrix is randomly divided into three different matrices, let's say
b =
17 24 1 8 15
23 5 7 14 16
c =
4 6 13 20 22
11 18 25 2 9
d =
10 12 19 21 3
17 24 1 8 15
How can I find the indices, in the original matrix a, of the rows of (for example) matrix d? Note that rows of the matrix can be duplicated.
For example, what is the index of [10 12 19 21 3] in matrix a?
Or the index of [17 24 1 8 15] in matrix a, for which only one index value should be returned?
I would appreciate it so much if you can help me with this. Thank you in advance
You can use ismember with the 'rows' option. For example:
tf = ismember(a, c, 'rows')
Should produce:
tf =
0
0
1
0
0
1
To get the row indices, you can apply find to the result of ismember (note that this is unnecessary if you only plan to use the vector for logical indexing). Here find(tf) returns the vector [3; 6].
If you want the row number in matrix a that matches a single vector, you can either use the method above and apply find, or use the second output argument of ismember. For example:
[tf, loc] = ismember([10 12 19 21 3], a, 'rows')
returns loc = 4 for your example. Note that here a is the second argument, so that the output variable loc holds a meaningful result (a row index into a). For a row that occurs more than once in a, such as [17 24 1 8 15], loc holds a single index: the lowest matching one, according to the MATLAB documentation.
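To handle duplicated rows explicitly, you can ask for both the single index returned by ismember and the full list of matches. The sketch below uses the matrices from the question; allIdx is an illustrative name. A row like [17 24 1 8 15], which occurs twice in a, gets one index in loc, while find lists every occurrence.

```matlab
a = [17 24 1 8 15; 23 5 7 14 16; 4 6 13 20 22; 10 12 19 21 3; 17 24 1 8 15; 11 18 25 2 9];
d = [10 12 19 21 3; 17 24 1 8 15];

% One index per row of d (0 where the row does not occur in a)
[tf, loc] = ismember(d, a, 'rows');

% Every index at which each row of d occurs in a (one cell per row of d)
allIdx = cell(size(d, 1), 1);
for k = 1:size(d, 1)
    allIdx{k} = find(ismember(a, d(k, :), 'rows')).';
end
```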
Handling floating-point numbers
If your data contains floating-point numbers, the ismember approach may fail, because exact floating-point comparisons are unreliable. Here's a shorter variant of Amro's solution:
x = reshape(c', size(c, 2), 1, []);
tf = any(all(abs(bsxfun(@minus, a', x)) < eps), 3)';
Essentially this is a one-liner, but I've split it into two commands for clarity:
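x holds the target rows to be searched, concatenated along the third dimension.
bsxfun subtracts each target row in turn from all rows of a, and the magnitude of the result is compared to some small threshold value (e.g. eps). If all elements in a row fall below it, that row is marked as 1. Note that eps (about 2.2e-16) is an absolute threshold, so for values of magnitude 1 or more it behaves like an exact comparison; a tolerance suited to the scale of your data, such as 1e-9, is usually safer.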
It depends on how you build those divided matrices. For example:
a = magic(5);
d = a([2 1 2 3],:);
then the matching rows are obviously: 2 1 2 3
EDIT:
Let me expand on the idea of using ismember shown by @EitanT to handle floating-point comparisons:
tf = any(cell2mat(arrayfun(@(i) all(abs(bsxfun(@minus, a, d(i,:)))<1e-9,2), ...
    1:size(d,1), 'UniformOutput',false)), 2)
Not pretty, but it works :) This is necessary for comparisons such as 0.1*3 == 0.3, which evaluates to false.
(basically it compares each row of d against all rows of a using an absolute difference)
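The failure mode is easy to reproduce on a tiny example. The sketch below uses made-up matrices (a holds 0.3 exactly, d holds 0.1*3) and an assumed absolute tolerance of 1e-9; it shows ismember missing the match and a tolerance-based comparison, in the style of the answers above, finding it. nearMatch is an illustrative name.

```matlab
a = [0.3 1; 2 3];
d = [0.1*3 1];                           % 0.1*3 evaluates to 0.30000000000000004, not 0.3

exact = ismember(a, d, 'rows')           % [0; 0]: the exact comparison misses row 1

tol = 1e-9;                              % absolute tolerance (an assumption; choose one suited to your data)
x = reshape(d.', size(d, 2), 1, []);     % each target row as a column, stacked along dim 3
nearMatch = any(all(abs(bsxfun(@minus, a.', x)) < tol, 1), 3).'   % [1; 0]
```

In MATLAB R2015a and later, ismembertol with the 'ByRows' option is a built-in alternative; note that its default tolerance is scaled by the magnitude of the data rather than absolute.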