I understand the basic use of varfun in MATLAB. I need to apply both mean and std to the grouped variables of this table:
N mut time
___ ___ ____
250 0.1 0.07
250 0.1 0.05
250 0.1 0.04
250 0.1 0.03
250 0.2 0.03
250 0.2 0.04
250 0.2 0.03
250 0.2 0.05
250 0.3 0.05
250 0.3 0.06
750 0.2 0.24
750 0.3 0.29
750 0.3 0.3
750 0.3 0.31
750 0.3 0.3
750 0.4 0.33
750 0.4 0.34
750 0.4 0.33
750 0.4 0.32
750 0.5 0.38
750 0.5 0.39
This table has two values of N and five different values of mut and I need to compute the average of time grouped by N and mut.
To do this I use the varfun function with the function handle @mean
Tgroup = varfun(@mean,T,'InputVariables','time','GroupingVariables',{'N','mut'})
and I get:
mut N GroupCount mean_time
___ ___ __________ _________
0.1 250 4 0.0475
0.2 250 4 0.0375
0.2 750 1 0.24
0.3 250 2 0.055
0.3 750 4 0.3
0.4 750 4 0.33
0.5 750 2 0.385
But now I also want to add a column that contains the standard deviation. To do this I create an anonymous function
func = @(x)[mean(x), std(x)]
and I use it in varfun
varfun(@(x)[mean(x),std(x)],([T(1:5:50,:);T(400:5:450,:)]),'InputVariables','time','GroupingVariables',{'mut','N'})
Unfortunately I get this:
ans =
mut N GroupCount Fun_time
___ ___ __________ ___________________
0.1 250 4 0.0475 0.017078
0.2 250 4 0.0375 0.0095743
0.2 750 1 0.24 0
0.3 250 2 0.055 0.0070711
0.3 750 4 0.3 0.008165
0.4 750 4 0.33 0.008165
0.5 750 2 0.385 0.0070711
where the last column 'Fun_time' contains two sub columns, the first being the mean of grouped times, the second the standard deviation of grouped times.
How can I split these two columns directly using an anonymous function? This is very similar to what is done in R with the plyr package.
You could use dplyr to simplify the process:
library(dplyr)
DF = read.table(text="
N mut time
250 0.1 0.07
250 0.1 0.05
250 0.1 0.04
250 0.1 0.03
250 0.2 0.03
250 0.2 0.04
250 0.2 0.03
250 0.2 0.05
250 0.3 0.05
250 0.3 0.06
750 0.2 0.24
750 0.3 0.29
750 0.3 0.3
750 0.3 0.31
750 0.3 0.3
750 0.4 0.33
750 0.4 0.34
750 0.4 0.33
750 0.4 0.32
750 0.5 0.38
750 0.5 0.39",header=TRUE)
newDF = DF %>%
group_by(N,mut) %>%
summarise(count = n(),meanTime = mean(time),sdTime = sd(time) ) %>%
as.data.frame()
# > newDF
# N mut count meanTime sdTime
#1 250 0.1 4 0.0475 0.017078251
#2 250 0.2 4 0.0375 0.009574271
#3 250 0.3 2 0.0550 0.007071068
#4 750 0.2 1 0.2400 NaN
#5 750 0.3 4 0.3000 0.008164966
#6 750 0.4 4 0.3300 0.008164966
#7 750 0.5 2 0.3850 0.007071068
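For readers who want to check the numbers without R or MATLAB, the same grouped summary can be sketched in plain Python. This is only an illustration using the standard library; the function name summarise and the choice to report NaN for single-row groups (mirroring the R output above) are assumptions of this sketch, not part of the original answer.

```python
from collections import defaultdict
from statistics import mean, stdev

# (N, mut, time) rows copied from the table in the question
rows = [
    (250, 0.1, 0.07), (250, 0.1, 0.05), (250, 0.1, 0.04), (250, 0.1, 0.03),
    (250, 0.2, 0.03), (250, 0.2, 0.04), (250, 0.2, 0.03), (250, 0.2, 0.05),
    (250, 0.3, 0.05), (250, 0.3, 0.06),
    (750, 0.2, 0.24),
    (750, 0.3, 0.29), (750, 0.3, 0.30), (750, 0.3, 0.31), (750, 0.3, 0.30),
    (750, 0.4, 0.33), (750, 0.4, 0.34), (750, 0.4, 0.33), (750, 0.4, 0.32),
    (750, 0.5, 0.38), (750, 0.5, 0.39),
]


def summarise(rows):
    """Return {(N, mut): (count, mean, sample std)} for the time column."""
    groups = defaultdict(list)
    for n, mut, time in rows:
        groups[(n, mut)].append(time)
    summary = {}
    for key in sorted(groups):
        times = groups[key]
        # The sample standard deviation is undefined for a single value.
        sd = stdev(times) if len(times) > 1 else float("nan")
        summary[key] = (len(times), mean(times), sd)
    return summary


summary = summarise(rows)
for (n, mut), (count, avg, sd) in summary.items():
    print(n, mut, count, round(avg, 4), round(sd, 6))
```

Each statistic ends up in its own field of the result, which is the layout the question asks for.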
I am trying to optimise the running time of my code by getting rid of some for loops. However, I have a variable that is incremented in each iteration in which sometimes the index is repeated. I provide here a minimal example:
a = [1 4 2 2 1 3 4 2 3 1]
b = [0.5 0.2 0.3 0.4 0.1 0.05 0.7 0.3 0.55 0.8]
c = [3 5 7 9]
for i = 1:10
c(a(i)) = c(a(i)) + b(i)
end
Ideally, I would like to compute it by writing:
c(a) = c(a) + b
but obviously this would not give the same result: when an index is repeated, its value has to be updated several times, so this way of vectorising does not work.
Also, I am working in Matlab or Octave, in case that is important.
Thank you very much for any help; I am not sure whether this can be vectorised at all.
Edit: thank you very much for your answers so far. I have discovered accumarray, which I did not know before, and I also understood why the for loop gave such different timings in Matlab and Octave. I also understand my problem better now. My example was too simple; I thought I could extend it myself. However, what if b were a matrix?
(Let's forget about c at the moment):
a = [1 4 2 2 1 3 4 2 3 1]
b =[0.69 -0.41 -0.13 -0.13 -0.42 -0.14 -0.23 -0.17 0.22 -0.24;
0.34 -0.39 -0.36 0.68 -0.66 -0.19 -0.58 0.78 -0.23 0.25;
-0.68 -0.54 0.76 -0.58 0.24 -0.23 -0.44 0.09 0.69 -0.41;
0.11 -0.14 0.32 0.65 0.26 0.82 0.32 0.29 -0.21 -0.13;
-0.94 -0.15 -0.41 -0.56 0.15 0.09 0.38 0.58 0.72 0.45;
0.22 -0.59 -0.11 -0.17 0.52 0.13 -0.51 0.28 0.15 0.19;
0.18 -0.15 0.38 -0.29 -0.87 0.14 -0.13 0.23 -0.92 -0.21;
0.79 -0.35 0.45 -0.28 -0.13 0.95 -0.45 0.35 -0.25 -0.61;
-0.42 0.76 0.15 0.99 -0.84 -0.03 0.27 0.09 0.57 0.64;
0.59 0.82 -0.39 0.13 -0.15 -0.71 -0.84 -0.43 0.93 -0.74]
I understand now that what I am doing is a row sum per group, and given that I am using Octave I cannot use "splitapply". I tried to generalise your answers, but accumarray would not work for matrices, and I could not generalise @rahnema1's solution either. The desired output would be:
[0.34 0.26 -0.93 -0.56 -0.42 -0.76 -0.69 -0.02 1.87 -0.53;
0.22 -1.03 1.53 -0.21 0.37 1.54 -0.57 0.73 0.23 -1.15;
-0.20 0.17 0.04 0.82 -0.32 0.10 -0.24 0.37 0.72 0.83;
0.52 -0.54 0.02 0.39 -1.53 -0.05 -0.71 1.01 -1.15 0.04]
that is "equivalent" to
[sum(b([1 5 10],:))
sum(b([3 4 8],:))
sum(b([6 9],:))
sum(b([2 7],:))]
Thank you very much. If you think I should post this as a separate question instead of adding the edit, I will do so.
Original question
It can be done with accumarray:
a = [1 4 2 2 1 3 4 2 3 1];
b = [0.5 0.2 0.3 0.4 0.1 0.05 0.7 0.3 0.55 0.8];
c = [3 5 7 9];
c(:) = c(:) + accumarray(a(:), b(:));
This sums the values from b in groups defined by a, and adds that to the original c.
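To make the grouping explicit, here is a minimal plain-Python sketch of what the accumarray call computes. The helper name accumulate is hypothetical, and the loop is written for clarity rather than speed.

```python
def accumulate(indices, values, size):
    """Sum values into bins selected by 1-based indices (accumarray-like)."""
    sums = [0.0] * size
    for idx, val in zip(indices, values):
        # Repeated indices keep adding to the same bin.
        sums[idx - 1] += val
    return sums


a = [1, 4, 2, 2, 1, 3, 4, 2, 3, 1]
b = [0.5, 0.2, 0.3, 0.4, 0.1, 0.05, 0.7, 0.3, 0.55, 0.8]
c = [3, 5, 7, 9]

group_sums = accumulate(a, b, len(c))
c_new = [ci + si for ci, si in zip(c, group_sums)]
print([round(x, 2) for x in c_new])
```

A direct assignment such as c(a) = c(a) + b misses this because each repeated index is written only once, so only one of its contributions survives.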
Edited question
If b is a matrix, you can use
full(sparse(repmat(a, 1, size(b,2)), repelem(1:size(b,2), size(b,1)), b))
or
accumarray([repmat(a, 1, size(b,2)).' repelem(1:size(b,2), size(b,1)).'], b(:))
Here repmat tiles a once per column of b, so both index vectors have one entry per element of b.
Matrix multiplication and implicit expansion can also be used (Octave):
nc = numel(c);
c += b * (1:nc == a.');
For input of large size it may be more memory efficient to use sparse matrix:
nc = numel(c);
nb = numel(b);
c += b * sparse(1:nb, a, 1, nb, nc);
Edit: When b is a matrix you can extend this solution as:
nc = numel(c);
na = numel(a);
out = sparse(a, 1:na, 1, nc, na) * b;
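The idea behind the matrix case can be sketched in plain Python: build an indicator matrix with one row per group and one column per row of b, then multiply it by b. The function name group_row_sums and the tiny input below are made up for illustration; they are not the data from the question.

```python
def group_row_sums(a, b, n_groups):
    """Sum the rows of b that share a 1-based group label in a."""
    n_rows = len(b)
    n_cols = len(b[0])
    # indicator[g][i] is 1 when row i of b belongs to group g + 1
    indicator = [[1 if a[i] == g + 1 else 0 for i in range(n_rows)]
                 for g in range(n_groups)]
    # Matrix product: indicator (n_groups x n_rows) times b (n_rows x n_cols)
    return [[sum(indicator[g][i] * b[i][j] for i in range(n_rows))
             for j in range(n_cols)]
            for g in range(n_groups)]


a = [1, 2, 1]                    # hypothetical group label for each row
b = [[1, 2], [3, 4], [5, 6]]     # hypothetical 3 x 2 matrix
out = group_row_sums(a, b, 2)
print(out)
```

In Octave the sparse call plays the role of indicator, which avoids storing the zeros when there are many rows.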
The following script fits a bowing-like curve using curve_fit (from scipy.optimize); see below:
import numpy as np
from scipy.optimize import curve_fit
ydata = np.array([1.6504, 1.63928044, 1.62855028, 1.6181874, 1.60817119, 1.59848249, 1.58910347, 1.58001759, 1.57120948, 1.56266487, 1.55437054, 1.54631424, 1.5384846, 1.53087109, 1.52346397, 1.5162542, 1.5092334, 1.50239383, 1.4957283, 1.48923013, 1.48289315, 1.47671162, 1.4706802, 1.46479393, 1.45904821, 1.45343874, 1.44796151, 1.44261281, 1.43738913, 1.43228723, 1.42730406, 1.42243677, 1.4176827, 1.41303936, 1.40850439, 1.40407561, 1.39975096, 1.39552851, 1.39140647, 1.38738314, 1.38345695, 1.37962642, 1.37589018, 1.37224696, 1.36869555, 1.36523487, 1.36186389, 1.35858169, 1.35538741, 1.35228028, 1.34925958, 1.34632469, 1.34347504, 1.34071015, 1.33802957, 1.33543295, 1.33291998, 1.33049042, 1.32814407, 1.32588081, 1.32370057, 1.32160331, 1.31958908, 1.31765795, 1.31581005, 1.31404556, 1.31236472, 1.3107678, 1.30925513, 1.30782709, 1.30648411, 1.30522666, 1.3040553, 1.30297062, 1.30197327, 1.30106398, 1.30024355, 1.29951286, 1.29887287, 1.29832464, 1.29786933, 1.29750821, 1.29724268, 1.29707426, 1.29700463, 1.29703564, 1.29716927, 1.29740773, 1.2977534, 1.29820885, 1.29877688, 1.29946049, 1.3002629, 1.30118751, 1.30223793, 1.30341792, 1.30473139, 1.30618232, 1.30777475, 1.30951267, 1.3114])
xdata = np.linspace(0.0, 1.0, 101)  # 0.00, 0.01, ..., 1.00
sigma = np.ones(len(xdata))
sigma[[0, -1]] = 0.01
def function_cte(x, b):
    return 1.31*x + 1.57*(1-x) - b*x*(1-x)
def function_linear(x, c1, c2):
    return 1.31*x + 1.57*(1-x) - (c1+c2*x)*x*(1-x)
popt_cte, pcov_cte = curve_fit(function_cte, xdata, ydata, sigma=sigma)
popt_lin, pcov_lin = curve_fit(function_linear, xdata, ydata, sigma=sigma)
However, in the resulting plot (figure not included here), the initial points of both fitted functions disagree with the data to fit (xdata, ydata).
I would like a fit that is constrained to pass through the endpoints (0.0, 1.57) and (1.0, 1.31) while still minimizing the error. Is there a way to do this based on this code, or is it better to take another approach?
thanks!
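One thing worth checking before changing the fitting code is sketched here in plain Python, with the two model functions copied from the question. Both models evaluate to 1.57 at x = 0 and 1.31 at x = 1 for any parameter values, while the data listed above starts at 1.6504 and ends at 1.3114. The helper name endpoint_values is hypothetical.

```python
def function_cte(x, b):
    return 1.31 * x + 1.57 * (1 - x) - b * x * (1 - x)


def function_linear(x, c1, c2):
    return 1.31 * x + 1.57 * (1 - x) - (c1 + c2 * x) * x * (1 - x)


def endpoint_values(model, *params):
    """Evaluate a model at x = 0 and x = 1 for the given parameters."""
    return model(0.0, *params), model(1.0, *params)


# First and last entries of ydata as listed in the question
data_first, data_last = 1.6504, 1.3114

for b in (-1.0, 0.0, 0.5, 3.0):
    # The bowing term b*x*(1-x) vanishes at both ends, so b has no effect here.
    print(b, endpoint_values(function_cte, b))
print(endpoint_values(function_linear, 0.4, -2.0))
print("data endpoints:", data_first, data_last)
```

If this reading is right, the gap at x = 0 comes from the constant 1.57 in the model rather than from curve_fit, since no choice of b, c1 or c2 can move the endpoints.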
The following code produces the figure I want but only when I remove the \tcbox. I'd like a border around the figure. What I have done below works with other tikz figures. The problem seems to be with the table data. Can anyone please advise?
\begin{figure}
\tcbox{
\begin{tikzpicture}
\begin{axis}[
legend pos=south east,
xlabel=Variable 1, % label x axis
ylabel=Variable 2, % label y axis
]
\addplot[
scatter, only marks,
scatter/classes={
a={mark=square*,blue},
b={mark=triangle*,red}
}
]
table[x=x,y=y,meta=label]{
x y label
0.1 0.35 a
0.2 0.4 a
0.25 0.35 a
0.3 0.4 a
0.3 0.35 a
0.4 0.3 a
0.45 0.3 a
0.4 0.4 a
0.6 0.7 b
0.65 0.55 b
0.65 0.55 b
0.7 0.6 b
0.75 0.65 b
0.8 0.75 b
0.9 0.6 b
0.7 0.6 b
0.5 0.7 b
0.5 0.55 b
0.6 0.8 b
};
\legend{}
\end{axis}
\end{tikzpicture}
}
\caption{CAP HERE}
\label{statsexample}
\end{figure}
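The issue is with your row separator. Inside the argument of \tcbox the line breaks of the inline table are no longer recognised as row ends, so the rows have to be terminated explicitly. You can use row sep=crcr and end each row with \\ instead: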
\begin{figure}
\tcbox{
\begin{tikzpicture}
\begin{axis}[
legend pos=south east,
xlabel=Variable 1, % label x axis
ylabel=Variable 2, % label y axis
]
\addplot[
scatter, only marks,
scatter/classes={
a={mark=square*,blue},
b={mark=triangle*,red}
}
]
table[x=x,y=y,meta=label,row sep=crcr]{
x y label\\
0.1 0.35 a\\
0.2 0.4 a\\
0.25 0.35 a\\
0.3 0.4 a\\
0.3 0.35 a\\
0.4 0.3 a\\
0.45 0.3 a\\
0.4 0.4 a\\
0.6 0.7 b\\
0.65 0.55 b\\
0.65 0.55 b\\
0.7 0.6 b\\
0.75 0.65 b\\
0.8 0.75 b\\
0.9 0.6 b\\
0.7 0.6 b\\
0.5 0.7 b\\
0.5 0.55 b\\
0.6 0.8 b\\
};
\end{axis}
\end{tikzpicture}
}
\caption{CAP HERE}
\label{statsexample}
\end{figure}
I have a vector:
0.02
-0.02
0
-0.02
-0.08
-0.05
-0.04
-0.1
0
0.05
0.05
0.05
0.08
0.04
How do I normalize this with the first value starting at 100?
Simply divide by the first element and multiply by 100:
a = [0.02 -0.02 0 -0.02 -0.08 -0.05 -0.04 -0.1 0 0.05 0.05 0.05 0.08 0.04]
b = a ./ a(1) * 100
b =
100 -100 0 -100 -400 -250 -200 -500 0 250 250 250 400 200
myArr = [0.02 -0.02 0 -0.02 -0.08 -0.05 -0.04 ...
-0.1 0 0.05 0.05 0.05 0.08 0.04]
myArr = 100*myArr/myArr(1)
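The same rebasing can be sketched in plain Python for comparison. The function name rebase and the guard against a zero first element are additions of this sketch, not part of the answers above.

```python
def rebase(values, base=100.0):
    """Scale values so that the first element becomes `base`."""
    first = values[0]
    if first == 0:
        # Dividing by a zero first element would make the result meaningless.
        raise ValueError("first element must be non-zero")
    return [v / first * base for v in values]


a = [0.02, -0.02, 0, -0.02, -0.08, -0.05, -0.04,
     -0.1, 0, 0.05, 0.05, 0.05, 0.08, 0.04]
rebased = rebase(a)
print([round(v, 6) for v in rebased])
```

Note that this scales the vector relative to its first element; because the values here change sign, the result contains negative entries such as -100 and -400.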
I need to split one large raw matrix into several smaller matrices according to the value in the 1st column (the 1st column contains either 1, 2, or 3).
If the 1st column is 1, a random 75% of those rows are saved in file A1 and the remaining 25% in file A2.
If the 1st column is 2, a random 75% of those rows are saved in file B1 and the remaining 25% in file B2.
If the 1st column is 3, a random 75% of those rows are saved in file C1 and the remaining 25% in file C2.
How should I write the code?
Example:
a raw matrix has 15 rows x 6 columns:
7 rows are 1 in 1st column, 5 rows are 2 in 1st column, and 3 rows are 3 in 1st column.
1 -0.05 -0.01 0.03 0.07 0.11
1 -0.4 -0.36 -0.32 -0.28 -0.24
1 0.3 0.34 0.38 0.42 0.46
1 0.75 0.79 0.83 0.87 0.91
1 0.45 0.49 0.53 0.57 0.61
1 0.8 0.84 0.88 0.92 0.96
1 0.05 0.09 0.13 0.17 0.21
2 0.5 0.54 0.58 0.62 0.66
2 0.4 0.44 0.48 0.52 0.56
2 0.9 0.94 0.98 1.02 1.06
2 0.85 0.89 0.93 0.97 1.01
2 0.75 0.79 0.83 0.87 0.91
3 0.36 0.4 0.44 0.48 0.52
3 0.6 0.64 0.68 0.72 0.76
3 0.4 0.44 0.48 0.52 0.56
7 rows have 1 in the 1st column: randomly take 75% of them (7*0.75 = 5.25, rounded to 5) as a new matrix (5 rows x 6 columns); the remaining 25% become another new matrix.
5 rows have 2 in the 1st column: randomly take 75% of them (5*0.75 = 3.75, rounded to 4) as a new matrix (4 rows x 6 columns); the remaining 25% become another new matrix.
3 rows have 3 in the 1st column: randomly take 75% of them (3*0.75 = 2.25, rounded to 2) as a new matrix (2 rows x 6 columns); the remaining 25% become another new matrix.
Result:
A1=
1 -0.4 -0.36 -0.32 -0.28 -0.24
1 0.3 0.34 0.38 0.42 0.46
1 0.75 0.79 0.83 0.87 0.91
1 0.8 0.84 0.88 0.92 0.96
1 -0.05 -0.01 0.03 0.07 0.11
B1=
2 0.9 0.94 0.98 1.02 1.06
2 0.85 0.89 0.93 0.97 1.01
2 0.5 0.54 0.58 0.62 0.66
2 0.75 0.79 0.83 0.87 0.91
C1=
3 0.36 0.4 0.44 0.48 0.52
3 0.4 0.44 0.48 0.52 0.56
Here is one possible solution to your problem using the function randperm:
% Create matrices
firstcol=ones(15,1);
firstcol(8:12)=2;
firstcol(13:15)=3;
mat=[firstcol rand(15,5)];
% Sort according to first column
A=mat(mat(:,1)==1,:);
B=mat(mat(:,1)==2,:);
C=mat(mat(:,1)==3,:);
% Randomly rearrange lines
A=A(randperm(size(A,1)),:);
B=B(randperm(size(B,1)),:);
C=C(randperm(size(C,1)),:);
% Select first 75% lines (rounding)
A1=A(1:round(0.75*size(A,1)),:);
A2=A(round(0.75*size(A,1))+1:end,:);
B1=B(1:round(0.75*size(B,1)),:);
B2=B(round(0.75*size(B,1))+1:end,:);
C1=C(1:round(0.75*size(C,1)),:);
C2=C(round(0.75*size(C,1))+1:end,:);
Hope it helps.
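The shuffle-then-cut approach above can also be sketched in plain Python. The function name split_groups, the seed argument and the toy rows are assumptions of this sketch. The cut point uses floor(x + 0.5) so that halves round up as MATLAB's round does for positive numbers, whereas Python's built-in round sends exact halves to the nearest even integer.

```python
import math
import random


def split_groups(rows, fraction=0.75, seed=None):
    """Split rows by their first column, then cut each group randomly."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    result = {}
    for label, members in groups.items():
        shuffled = members[:]
        rng.shuffle(shuffled)  # random order, like randperm on the row indices
        n_first = math.floor(fraction * len(shuffled) + 0.5)
        result[label] = (shuffled[:n_first], shuffled[n_first:])
    return result


# Toy data: 7 rows labelled 1, 5 labelled 2, 3 labelled 3; second column is a row id
rows = [(1, i) for i in range(7)] + [(2, i) for i in range(5)] + [(3, i) for i in range(3)]
parts = split_groups(rows, seed=0)
for label, (first, rest) in sorted(parts.items()):
    print(label, len(first), len(rest))
```

With the group sizes from the question this gives 5 and 2 rows for label 1, 4 and 1 for label 2, and 2 and 1 for label 3; each pair can then be written to its own file.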