I have two arrays. The first one is a consecutive sequential one, like:
seq1 =
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
...continues
The second one is like:
seq2 =
2 250
3 260
5 267
6 270
8 280
10 290
13 300
18 310
20 320
21 330
...continues
I need to embed seq2 into seq1 in such a way that I end up with the sequence:
seq3 =
1 0
2 250
3 260
4 260
5 267
6 270
7 270
8 280
9 280
10 290
11 290
... continues
I could do this with loops but the arrays are really big so I don't want to use two for loops, it is taking too long. How can I do this in a vectorised manner?
I think this does what you want:
[~, jj, vv] = find(sum(bsxfun(#le, seq2(:,1), seq1(:,1).'), 1));
seq3 = seq1;
seq3(jj,2) = seq2(vv,2);
How it works
The required index is obtained by computing how many values in the first column of seq2 are less than or equal to each value in the first column or seq1 (code sum(bsxfun(#le, ...), 1)). This will be used to select the appropriate entries from the second column of seq2 and write them into the result. But before that, the value 0 in this index needs to be discarded. This is done using the three-output version of find (code [~, jj, vv] = find(...)).
If your second column of data is always increasing, you can solve this easily with accumarray and cummax:
seq = [seq1; seq2];
seq3 = cummax(accumarray(seq(:, 1), seq(:, 2), [], #max));
seq3 = [(1:numel(seq3)).' seq3];
And here's what you get for your sample inputs:
seq3 =
1 0
2 250
3 260
4 260
5 267
6 270
7 270
8 280
9 280
10 290
11 290
12 290
13 300
14 300
15 300
16 300
17 300
18 310
19 310
20 320
21 330
How it works...
After concatenating seq1 and seq2, accumarray collects all the values in the second column that have the same value in the first column (i.e. [0 250] for the value 2), then gets the maximum value of each set. The function cummax is then used to fill any zero values with the previous non-zero value. Finally, an index column is added to the new sequence.
Related
I have a csv file with experiment results that goes like this:
64 4 8 1 1 2 1 ttt 62391 4055430 333 0.0001 10 161 108 288 0
64 4 8 1 1 2 1 ttt 60966 3962810 322 0.0001 10 164 112 295 0
64 4 8 1 1 2 1 ttt 61530 3999475 325 0.0001 10 162 112 291 0
64 4 8 1 1 2 1 ttt 61430 4054428 332 0.0001 10 158 110 286 0
64 4 8 1 1 2 1 ttt 63891 4152938 339 0.0001 9 149 109 274 0
64 4 32 1 1 2 1 ttt 63699 4204182 345 0.0001 4 43 179 240 0
64 4 32 1 1 2 1 ttt 63326 4116218 336 0.0001 4 45 183 248 0
64 4 32 1 1 2 1 ttt 62654 4135211 340 0.0001 4 48 178 248 0
64 4 32 1 1 2 1 ttt 63192 4107506 339 0.0001 4 49 175 245 0
64 4 32 1 1 2 1 ttt 62707 4138666 345 0.0001 4 46 179 245 0
64 4 64 1 1 2 1 ttt 60968 3962929 323 0.0001 4 46 191 256 0
64 4 64 1 1 2 1 ttt 58765 3819787 305 0.0001 4 50 196 267 0
64 4 64 1 1 2 1 ttt 58946 3831499 308 0.0001 5 52 187 260 0
64 4 64 1 1 2 1 ttt 60646 3942047 321 0.0001 4 47 187 254 0
64 4 64 1 1 2 1 ttt 59723 3882044 311 0.0001 4 46 201 269 0
64 8 8 1 1 2 1 ttt 63414 4185382 382 0.0001 33 517 109 643 0
64 8 8 1 1 2 1 ttt 62429 4057899 372 0.0001 33 538 110 667 0
64 8 8 1 1 2 1 ttt 60622 3940452 384 0.0001 33 556 115 689 0
64 8 8 1 1 2 1 ttt 64433 4188192 369 0.0001 33 519 110 644 0
My goal is to be able to plot various combinations (choose which, in different charts) of the columns before the "ttt", with the average and standard deviation of the columns (choose which) after "ttt" (by grouping them by the before "ttt" columns).
Is this possible in GNUPlot and if yes how? If not, do you have any alternate suggestions regarding my problem?
Here is a completely revised and more general version.
Since you want to filter by 3 columns you need to have 3 properties to distinguish the data in the plot. This would be for example color, x-position and pointtype. What the script basically does:
Generates random data for testing (take your file instead)
$Data looks like this:
8 64 57773 0
4 32 64721 2
8 32 56757 1
4 16 56226 2
8 8 56055 1
8 64 59874 0
8 32 58733 0
4 16 55525 2
8 32 58869 0
8 64 64470 0
4 32 60930 1
8 64 57073 2
...
the variables ColX, ColC, ColP, and ColS define which columns are taken for x-position, color, pointtype and statistics.
find unique values of ColX, ColC, ColP, (check help smooth frequency) and put them to datablocks $ColX, $ColC, and $ColP.
put the unique values to arrays ArrX, ArrC, ArrP
loop all possible combinations and do statistics on ColS and put it to $Data2. Add 3 columns at the beginning for color, x-position and pointtype.
$Data2 looks like this:
1 1 1 0 8 4 61639.4 2788.4
1 1 2 0 8 8 59282.1 2740.2
1 2 1 0 16 4 59372.3 2808.6
1 2 2 0 16 8 60502.3 2825.0
1 3 1 0 32 4 59850.7 2603.8
1 3 2 0 32 8 60617.7 1979.8
1 4 1 0 64 4 60399.4 3273.6
1 4 2 0 64 8 59930.7 2919.8
2 1 1 1 8 4 59172.6 2288.2
2 1 2 1 8 8 58992.2 2888.0
2 2 1 1 16 4 59350.1 2364.6
2 2 2 1 16 8 61034.0 2368.5
2 3 1 1 32 4 59920.8 2867.6
2 3 2 1 32 8 59711.9 3464.2
2 4 1 1 64 4 60936.7 3439.7
2 4 2 1 64 8 61078.7 2349.3
3 1 1 2 8 4 58976.0 2376.3
3 1 2 2 8 8 61731.5 1635.7
3 2 1 2 16 4 58276.0 2101.7
3 2 2 2 16 8 58594.5 3358.5
3 3 1 2 32 4 60471.5 3737.6
3 3 2 2 32 8 59909.1 2024.0
3 4 1 2 64 4 62044.2 1446.7
3 4 2 2 64 8 60454.0 3215.1
Finally, plot the data. I couldn't figure out how plotting style with yerror works properly together with variable pointtypes. So, instead I split it into two plot commands with vectors and with points. The third one keyentry is just to get an empty line in the legend and the forth one is to get the pointtype into the legend.
I hope you can figure out all the other details and adapt it to your data.
Code:
### grouped statistics on filtered (unsorted) data
reset session
set colorsequence classic
# generate some random test data
rand1(n) = 2**(int(rand(0)*2)+2) # values 4,8
rand2(n) = 2**(int(rand(0)*4)+3) # values 8,16,32,64
rand3(n) = int(rand(0)*10000)+55000 # values 55000 to 65000
rand4(n) = int(rand(0)*3) # values 0,1,2
set print $Data
do for [i=1:200] {
print sprintf("% 3d% 4d% 7d% 3d", rand1(0), rand2(0), rand3(0), rand4(0))
}
set print
print $Data # (just for test purpose)
ColX = 2 # column for x
ColC = 4 # column for color
ColP = 1 # column for pointtype
ColS = 3 # column for statistics
# get unique values of the columns
set table $ColX
plot $Data u (column(ColX)) smooth freq
unset table
set table $ColC
plot $Data u (column(ColC)) smooth freq
unset table
set table $ColP
plot $Data u (column(ColP)) smooth freq
unset table
# put unique values into arrays
set table $Dummy
array ArrX[|$ColX|-6] # gnuplot creates 6 extra lines
array ArrC[|$ColC|-6]
array ArrP[|$ColP|-6]
plot $ColX u (ArrX[$0+1]=$1)
plot $ColC u (ArrC[$0+1]=$1)
plot $ColP u (ArrP[$0+1]=$1)
unset table
print ArrX, ArrC, ArrP # just for test purpose
# define filter function
Filter(c,x,p) = ArrX[x]==column(ColX) && ArrC[c]==column(ColC) && \
ArrP[p]==column(ColP) ? column(ColS) : NaN
# loop all values and do statistics, write data into $Data2
set print $Data2
do for [c=1:|ArrC|] {
do for [x=1:|ArrX|] {
do for [p=1:|ArrP|] {
undef var STATS*
stats $Data u (Filter(c,x,p)) nooutput
if (exists('STATS_mean') && exists('STATS_stddev')) {
print sprintf("% 3d% 3d% 3d% 3d% 3d% 3d% 9.1f % 7.1f", c, x, p, ArrC[c], ArrX[x], ArrP[p], STATS_mean, STATS_stddev)
}
}
}
print ""; print ""
}
set print
# print $Data2 # just for testing purpose
set xlabel sprintf("Column %d", ColX)
set ylabel sprintf("Column %d", ColS)
set xrange[0.5:|ArrX|+1]
set xtics () # remove all xtics
do for [x=1:|ArrX|] { set xtics add (sprintf("%d",ArrX[x]) x)} # set xtics "manually"
# function for x position and offsets,
# actually not dependent on 'n' but to shorten plot command
# columns in $Data2: 1=color, 2=x, 3=pointtype
width = 0.5 # float number!
step = width/(|ArrC|-1)
PosX(n) = column(2) - width/2.0 + step*(column(1)-1) + (column(3)-1)*step*0.3
plot \
for [c=1:|ArrC|] $Data2 u (PosX(0)):($7-$8):(0):(2*$8) index c-1 w vectors \
heads size 0.04,90 lw 2 lc c ti sprintf("%g",ArrC[c]),\
for [c=1:|ArrC|] '' u (PosX(0)):7:($3*2+4):(c) index c-1 w p ps 1.5 pt var lc var not, \
keyentry w p ps 0 ti "\n", \
for [p=1:|ArrP|] '' u (0):(NaN) w p pt p*2+4 ps 1.5 lc rgb "black" ti sprintf("%g",ArrP[p])
### end of code
Result:
I do not think gnuplot can produce exactly what you are asking for in a single plot command. I will show you two alternatives in the hope that one or both is a useful starting point.
Alternative 1: standard boxplot
spacing = 1.0
width = 0.25
unset key
set xlabel "Column 3"
set ylabel "Column 9"
plot 'data' using (spacing):9:(width):3 with boxplot lw 2
This collects points based on the value in column 3 and for each such value it produces a boxplot. This is a widely used method of showing the distribution of point values in a category, but it is a quartile analysis not a display of mean + standard deviation.
Alternative 2: calculate mean and standard deviation for categories known in advance
The boxplot analysis has the advantage that you do not need to know in advance what values may be present in column 3. Gnuplot can calculate mean and standard deviation based on a column 3 value, but you need to specify in advance what that value is. Here is a set of commands tailored to the specific example file you provided. It calculates, but does not plot, the requested categorical mean and standard deviation. You can use these numbers to construct a plot, but that will require additional commands. You could, for example, save the values for each category in a new file, or array, or datablock and then go back and plot these together.
col3entry = "8 32 64"
do for [i in col3entry] {
stats "data" using ($3 == real(i) ? $9 : NaN) name "Condition".i nooutput
print i, ": ", value("Condition".i."_mean"), value("Condition".i."_stddev")
}
output:
8: 62345.1111111111 1259.34784220021
32: 63115.6 392.552977316438
64: 59809.6 881.583711283279
I have this matrix:
A = [92 92 92 91 91 91 146 146 146 0
0 0 112 112 112 127 127 127 35 35
16 16 121 121 121 55 55 55 148 148
0 0 0 96 96 0 0 0 0 0
0 16 16 16 140 140 140 0 0 0]
How can I replace consecutive zero value with shuffled consecutive value from matrix B?
B = [3 3 3 5 5 6 6 2 2 2 7 7 7]
The required result is some matrix like this:
A = [92 92 92 91 91 91 146 146 146 0
6 6 112 112 112 127 127 127 35 35
16 16 121 121 121 55 55 55 148 148
7 7 7 96 96 5 5 3 3 3
0 16 16 16 140 140 140 2 2 2]
You simply can do it like this:
[M,N]=size(A);
for i=1:M
for j=1:N
if A(i,j)==0
A(i,j)=B(i+j);
end
end
end
If I understand it correctly from what you've described, your solution is going to need the following steps:
Loop over the rows of your matrix, e.g. for row = 1:size(A, 1)
Loop over the elements of each row, identify where each run of zeroes starts and store the indices and the length of the run. For example you might end up with a matrix like: consecutiveZeroes = [ 2 1 2 ; 4 1 3 ; 4 6 5 ; 5 8 3 ] indicating that you have a run at (2, 1) of length 2, a run at (4, 1) of length 3, a run at (4, 6) of length 5, and a run at (5, 8) of length 3.
Now loop over the elements of B counting up how many elements there are of each value. For example you might store this as replacementValues = [ 3 3 ; 2 5 ; 2 6 ; 3 2 ; 3 7 ] meaning three 3's, two 5's, two 6's etc.
Now take a row from your consecutiveZeroes matrix and randomly choose a row of replacementValues that specifies the same number of elements, replace the zeroes in A with the values from replacementValues, and delete the row from replacementValues to show that you've used it.
If there isn't a row in replacementValues that describes a long enough run of values to replace one of your runs of zeroes, find a combination of two or more rows from replacementValues that will work.
You can't do this with just a single pass through the matrix, because presumably you could have a matrix A like [ 15 7 0 0 0 0 0 0 3 ; 2 0 0 0 5 0 0 0 9 ] and a vector B like [ 2 2 2 3 3 3 7 7 5 5 5 5 ], where you can only achieve what you want if you use the four 5's and two 7's and not the three 2's and three 3's to substitute for the run of six zeroes, because you have to leave the 2's and 3's for the two runs of three zeroes in the next row. The easiest approach if efficiency is not critical would probably be to run the algorithm multiple times trying different random combinations until you get one that works - but you'll need to decide how many times to try before giving up in case the input data actually has no solution.
If you get stuck on any of these steps I suggest asking a new, more specific question.
I have an array which looks similar to:
0 2 3 4 0 0 7 8 0 10
0 32 44 47 0 0 37 54 0 36
I wish to remove all
0
0
from this to get:
2 3 4 7 8 10
32 44 47 37 54 36
I've tried x(x == 0) = []
but I get:
x =
2 32 3 44 4 47 7 37 8 54 10 36
How can I remove all zero columns?
Here is a possible solution:
x(:,all(x==0))=[]
You had the right approach with x(x == 0) = [];. By doing this, you would remove the right amount of elements that can still form a 2D matrix and this actually gives you a vector of values that are non-zero. All you have to do is reshape the matrix back to its original form with 2 rows:
x(x == 0) = [];
y = reshape(x, 2, [])
y =
2 3 4 7 8 10
32 44 47 37 54 36
Another way is with any:
y = x(:,any(x,1));
In this case, we look for any columns that are non-zero and use these locations to index into x and extract out those corresponding columns.
Result:
y =
2 3 4 7 8 10
32 44 47 37 54 36
Another way which is more for academic purposes is to use unique. Assuming that your matrix has all positive values:
[~,~,id] = unique(x.', 'rows');
y = x(:, id ~= 1)
y =
2 3 4 7 8 10
32 44 47 37 54 36
We transpose x so that each column becomes a row, and we look for all unique rows. The reason why the matrix needs to have all positive values is because the third output of unique assigns unique ID to each unique row in sorted order. Therefore, if we have all positive values, then a row of all zeroes would be assigned an ID of 1. Using this array, we search for IDs that were not assigned a value of 1, and use those to index into x to extract out the necessary columns.
You could also use sum.
Sum over the columns and any column with zeros only will be zeros after the summation as well.
sum(x,1)
ans =
0 34 47 51 0 0 44 62 0 46
x(:,sum(x,1)>0)
ans =
2 3 4 7 8 10
32 44 47 37 54 36
Also by reshaping nonzeros(x) as follows:
reshape(nonzeros(x), size(x,1), [])
ans =
2 3 4 7 8 10
32 44 47 37 54 36
I am new here and new to matlab.
I have a matrix file in matlab and what I want to do is make a plot of the average all the rows. However, I only want to plot a few data points (about 20) of the values before, and after, the maximum value within that row. The matrix file has 550 columns.
I have worked out how to identify the maximum value and the column number of this maximum value using;
[maxvalue maxindex] = max(filename, [], 2)
As the maximum is never in the same column, i really need help in calculating the average values for each row (before and afer the max value), and then how to plot this where the maximum value would be 0 on the x-axis.
For example - i have a matrix like this;
14 51 623 23 4 1 4 5
0 0 3 5 67 37 37 5
0 0 0 0 574 4 5 6
and max value = 623
67
574
and max index = 3
5
5
So i would like to, plot the average of the 3 rows, 2 data points before and after the max values...so to plot the average of;
14, 51, 623, 23, 4, 1
3, 5, 67, 37, 37
0, 0, 574, 4, 5
Thanks so much for any help!
data = [14 51 623 23 4 1 4 5
0 0 3 5 67 37 37 5
0 0 0 0 574 4 5 6]; %// example data
data = data.'; %'// it's easier to work along columns
[~, pos] = max(data); %// position of maxima
ind = bsxfun(#plus,bsxfun(#plus, pos,(-2:2).'),(0:size(data,2)-1)*size(data,1));
%'// linear index into the matrix obtained from pos
data_trimmed = data(ind).'; %'// index and transpose back
data = data.'; %'// undo transpose to put data back into shape
Result:
data_trimmed =
14 51 623 23 4
3 5 67 37 37
0 0 574 4 5
I am trying to replace the first and second column of an edgelist matrix (edgenumber x 3) by specific node numbers as such:
5 1 1
1 38 1
2 1 1
28 17 1
18 1 1
25 1 1
that the node numbers (connection from node 5 to node 1) are replaced by the corresponding values from a vector. The edgelist is generated from an unweighted 40x40 adjacency matrix.
The vector degree_list of size 40x1 contains the "real"node numbers of this edgelist which i want to add to a larger 321x321 adjacency matrix. (if there's an easier way to do that then by concatenating the edgelists, that would also be great).
degree_list=[183,150,151,39,184,185,152,...];
So of the above edgelist I would like to replace all 1s in coll 1 and 2 by 183, all 2s by 150, etc.
Then I need to keep this new edgelist which I the add to the larger edgelist, transform it back to an adjacency matrix and have my new correct bigger adjM.
I have tried to find a solution here and on other websites but not been successful. Thank you very much for your help,
Chris
Code
a1 = [
5 1 1
1 38 1
2 1 1
28 17 1
18 1 1
25 1 1]
degree_list=[183,150,151,39,184,185,152];
col12 = a1(:,[1 2])
col12_uniq = unique(col12)
degree_list = [degree_list numel(degree_list)+1:max(col12_uniq)]
uniq_dim3 = bsxfun(#eq,col12,permute(repmat(col12_uniq,1,2),[3 2 1]))
match_dim3 = bsxfun(#times,uniq_dim3,permute(degree_list(col12_uniq),[3 1 2]))
a1_out = [sum(match_dim3,3) a1(:,3)]
Output
a1 =
5 1 1
1 38 1
2 1 1
28 17 1
18 1 1
25 1 1
a1_out =
184 183 1
183 38 1
150 183 1
28 17 1
18 183 1
25 183 1