Is it possible to filter a matrix by a tagged array? - matlab

I have a 10 dimensional matrix with many, many columns (in the hundreds of thousands). I have, however, implemented a tag based on the day of an experiment and a condition.
So my original matrix looks something like
0.1 0.25 0.64 0.15 0.1 0.96 0.01 0.05....
.
.
.
.
0.2 0.3 0.049 0 0.3 0.71 0.4 0.45....
I was able to implement a tag for the day and experiment type so my matrix looks like
0.1 0.25 0.64 0.15 0.1 0.96 0.01 0.05....
.
.
.
.
0.2 0.3 0.049 0 0.3 0.71 0.4 0.45....
1 1 1 1 2 2 2 2
1 1 2 2 2 3 3 3
The top tag row represents a day, and the bottom tag row represents a condition. Is there any way to "filter" this matrix, call it A, by day and condition in MATLAB? So for example, if I want the day 1 condition 2 "mini-matrix", I can get
0.64 0.15
.
.
.
0.049 0

Yes, you can do this with logical indexing: select only the columns whose day or condition row matches a given value.
For example, say your input matrix is A, the entries in the third row A(3,:) are the days, and the entries in the fourth row A(4,:) are the conditions.
Then A(:, A(3,:) == 2) gives you the subset of columns in A where the day is 2.
And A(:, A(3,:) == 2 & A(4,:) == 1) gives you the columns where the day is 2 and the condition is 1.
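As a minimal sketch of the same idea applied to the example in the question: the 2-row data matrix below is a hypothetical stand-in for the real 10-row one, and the names data, day, condition, dayRow, condRow, mask, withTags and sub are made up for illustration. It assumes the two tag rows are the last two rows of A.

```matlab
% Hypothetical small data matrix (stand-in for the real 10-row matrix)
data = [0.1 0.25 0.64  0.15 0.1 0.96 0.01 0.05;
        0.2 0.3  0.049 0    0.3 0.71 0.4  0.45];
day       = [1 1 1 1 2 2 2 2];   % day tag, one entry per column
condition = [1 1 2 2 2 3 3 3];   % condition tag, one entry per column
A = [data; day; condition];      % tag rows appended below the data

dayRow  = size(A, 1) - 1;        % assumption: second-to-last row is the day
condRow = size(A, 1);            % assumption: last row is the condition

% Logical mask: true for columns tagged day 1 and condition 2
mask = A(dayRow, :) == 1 & A(condRow, :) == 2;

withTags = A(:, mask);           % selected columns, tag rows included
sub      = A(1:end-2, mask);     % selected columns, data rows only
```

Here sub holds the two columns 0.64 / 0.049 and 0.15 / 0, which is the day 1 condition 2 "mini-matrix" from the question. Indexing the rows with 1:end-2 is only needed if you do not want the tag rows in the result.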

Related

Simulink misses data points in a from-workspace block for discrete simulation

I have a simulation running at 50 Hz, and some data that comes in at 10 Hz. I have extra 'in-between' points with dummy data at the following 50 Hz time points, and interpolation set to off. This should in theory ensure that between 10 Hz time steps, the dummy data is being held and only at the 10 Hz steps is the actual data present. For example, my data vector would be
[0.0 0.8 0.1 0.12 0.2 0.22 0.3 0.32 0.4 0.42 0.5 0.52 ...
-1 -1 1 -1 2 -1 3 -1 4 -1 5 -1 ...]
However, with a scope attached directly to the 'from-workspace' block, Simulink returns this:
[0.0 0.8 0.1 0.12 0.2 0.22 0.3 0.32 0.34 0.4 0.42 0.5 0.52...
-1 -1 -1 -1 2 -1 3 3 -1 4 -1 5 5...]
where some values are skipped and others are repeated in a consistent pattern. Is there something in Simulink's time-step algorithms that would cause this?
Edit: A solution I ended up finding was to offset the entire time vector by 1/100th of a second so that the sim was taking data between points rather than on points, and that seemed to fix it. Not ideal, but functional.

Octave - why is surf not working but trisurf does?

I am able to plot a trisurf chart, but surf does not work.
What am I doing wrong?
pkg load statistics;
figure (1,'name','Matrix Map');
colormap('hot');
t = dlmread('C:\Map3D.csv');
tx =t(:,1);ty=t(:,2);tz=t(:,3);
tri = delaunay(tx,ty);
handle = surf(tx,ty,tz); #This does NOT work
#handle = trisurf(tri,tx,ty,tz); #This does work
error: surface: rows (Z) must be the same as length (Y) and columns (Z) must be the same as length (X)
My data is in a CSV (commas not shown here)
1 2 -0.32
2 2 0.33
3 2 0.39
4 2 0.09
5 2 0.14
1 2.5 -0.19
2 2.5 0.13
3 2.5 0.15
4 2.5 0.24
5 2.5 0.33
1 3 0.06
2 3 0.44
3 3 0.36
4 3 0.45
5 3 0.51
1 3.5 0.72
2 3.5 0.79
3 3.5 0.98
4 3.5 0.47
5 3.5 0.55
1 4 0.61
2 4 0.13
3 4 0.44
4 4 0.47
5 4 0.58
1 4.5 0.85
The surf error message differs between MATLAB and Octave.
Error message from MATLAB:
Z must be a matrix, not a scalar or vector.
The problem is pretty clear here, since you passed Z (your tz) as a vector.
Error message from Octave:
surface: rows (Z) must be the same as length (Y) and columns (Z) must be the same as length (X)
In your example this condition fails: columns (Z) = 1, but length (X) = 26.
One consequence is that with surf every grid point needs an entry in Z; you cannot leave "holes" or undefined points in your grid. In your case you have an X-grid from 1 to 5 and a Y-grid from 2 to 4.5, but the point with coordinates (2, 4.5) is not defined.
@Luis Mendo, MATLAB and Octave do allow the form surf(matrix_x, matrix_y, matrix_z), but the third argument matrix_z still has to be a matrix (not a scalar or vector). Apparently, a matrix with only one row or column is not considered a matrix.
To solve the issue, I suggest something like:
tx = 1:5; % tx is a vector of length 5
ty = 2:0.5:4.5; % ty is a vector of length 6
tz = [-0.32 0.33 0.39 0.09 0.14;
-0.19 0.13 0.15 0.24 0.33;
0.06 0.44 0.36 0.45 0.51;
0.72 0.79 0.98 0.47 0.55;
0.61 0.13 0.44 0.47 0.58;
0.85 0. 0. 0. 0.]; % tz is a matrix of size 6*5
surf(tx,ty,tz);
Note that I had to invent some values at the points where your grid was not defined. I put 0. there, but you can replace it with whatever value you prefer.
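If you would rather not type the matrix by hand, one possible sketch is to build it from the three CSV columns. This assumes the x and y values lie exactly on a regular grid; t below is a reduced, hypothetical sample of the data (in practice it would come from dlmread), and the names xs, ys, ix, iy and Z are made up for illustration. Grid points that are missing from the data are filled with NaN instead of an invented value.

```matlab
% Reduced sample of the (x, y, z) rows; normally t = dlmread(...)
t = [1 2   -0.32;
     2 2    0.33;
     3 2    0.39;
     1 2.5 -0.19;
     2 2.5  0.13;
     3 2.5  0.15;
     1 3    0.06];   % (2, 3) and (3, 3) are missing on purpose

[xs, ~, ix] = unique(t(:, 1));   % grid x values and column index per row of t
[ys, ~, iy] = unique(t(:, 2));   % grid y values and row index per row of t

% Z has one row per y value and one column per x value, as surf expects
Z = NaN(numel(ys), numel(xs));
Z(sub2ind(size(Z), iy, ix)) = t(:, 3);
```

Calling surf(xs, ys, Z) then satisfies the size check from the error message, since rows (Z) equals length (ys) and columns (Z) equals length (xs). Faces that touch a NaN entry are typically not drawn, so the undefined points show up as gaps rather than as made-up values.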

Spark get top N highest score results for each (item1, item2, score)

I have a DataFrame of the following format:
item_id1: Long, item_id2: Long, similarity_score: Double
What I'm trying to do is to get top N highest similarity_score records for each item_id1.
So, for example:
1 2 0.5
1 3 0.4
1 4 0.3
2 1 0.5
2 3 0.4
2 4 0.3
With top 2 similar items would give:
1 2 0.5
1 3 0.4
2 1 0.5
2 3 0.4
I vaguely guess that it can be done by first grouping records by item_id1, then sorting in reverse by score and then limiting the results. But I'm stuck with how to implement it in Spark Scala.
Thank you.
I would suggest using window functions for this:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
df
.withColumn("rnk", row_number().over(Window.partitionBy($"item_id1").orderBy($"similarity_score".desc))) // highest score first
.where($"rnk" <= 2) // keep the top 2 rows per item_id1
Alternatively, you could use dense_rank/rank instead of row_number, depending on how you want to handle ties in similarity_score.

How to count number of elements between elements of some sorted vector?

I have two vectors:
The first has many values between 0 and 1.
The second has 100 values (interval edges) between 0 and 1: [0 0.01 0.02 .... 1], where 0 to 0.01 is the first interval, 0.01 to 0.02 the second, and so on.
I need to create a vector where each element is the number of elements of the first vector that fall in the corresponding interval of the second.
For example:
first [0.00025 0.0001 0.0011 0.0025 0.009 ...(a lot of values bigger than 0.01) ... 1]
then the first element of the result vector should be 5, and so on.
Any ideas how to implement this in MATLAB?

changing multiple columns of a matrix with respect to sorted indices of its specific columns

Let's say I have a 2 by 9 matrix. I want to reorder the 2 by 3 blocks inside this matrix according to a descending sort of the a(2,3), a(2,6), and a(2,9) elements. For example:
a =
0.4 0.4 0.5 0.6 0.2 0.2 0.6 0.2 0.6
0.5 0.8 0.9 0.9 0.6 0.6 0.1 0.2 0.8
[b i] = sort(a(2,3:3:end),2,'descend')
b =
0.9 0.8 0.6
i =
1 3 2
So, I want to have the following matrix:
a =
0.4 0.4 0.5 0.6 0.2 0.6 0.6 0.2 0.2
0.5 0.8 0.9 0.1 0.2 0.8 0.9 0.6 0.6
Try converting to a cell array first and then using your i to rearrange the cells:
[b i] = sort(a(2,3:3:end),2,'descend')
A = mat2cell(a, 2, 3*ones(1,3));
cell2mat(A(i))
If for whatever reason you don't want to convert the whole of a into a cell array, you can do it by extending your indexing vector i to index all the columns. In your case you'd need:
I = [1,2,3,7,8,9,4,5,6]
which you could generate using a loop or else use bsxfun to get
[1 7 4
2 8 5
3 9 6]
and then "flatten" using reshape:
I = reshape(bsxfun(@plus, 3*i-2, (0:2)'), 1, []) % i is the index vector returned by sort above
and then finally
a(:,I)
Typically, when a 2d matrix is separated into blocks, best practice is to use more dimensions:
a=reshape(a,size(a,1),3,[]);
Now you can access each block via a(:,:,1)
To sort use:
[~,idx]=sort(a(2,3,:),'descend')
a=a(:,:,idx)
If you really need a 2d matrix, change back:
a=reshape(a,2,[])
sortrows-based approach:
n = 3; %// number of columns per block
m = size(a,1);
a = reshape(sortrows(reshape(a, m*n, []).', -m*n).', m, []);
This works by reshaping each block into a row, sorting rows according to last column, and reshaping back.
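To make the three steps visible, here is a sketch that splits the one-liner into intermediate variables and cross-checks it against the 3-D reshape approach, using the matrix from the question. The names blocks, sorted, result, a3, idx and check are introduced only for this illustration.

```matlab
a = [0.4 0.4 0.5 0.6 0.2 0.2 0.6 0.2 0.6;
     0.5 0.8 0.9 0.9 0.6 0.6 0.1 0.2 0.8];
n = 3;            % number of columns per block
m = size(a, 1);   % number of rows per block

% Step 1: each block becomes one row (its m*n elements in column-major order)
blocks = reshape(a, m*n, []).';
% Step 2: sort the rows, descending, by the last column,
% which holds a(2,3), a(2,6) and a(2,9)
sorted = sortrows(blocks, -m*n);
% Step 3: turn each row back into an m-by-n block
result = reshape(sorted.', m, []);

% Cross-check with the 3-D approach: blocks along the third dimension
a3 = reshape(a, m, n, []);
[~, idx] = sort(a3(m, n, :), 'descend');
check = reshape(a3(:, :, idx(:)), m, []);
```

For this input both result and check contain the blocks in the order 1, 3, 2, matching the index vector i = 1 3 2 from the question.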