I have the following code from my predecessor. I am unable to figure out what math is happening here, how the values avgCov and stdCov differ, and what they signify.
Cprofile_f is a curve similar to a Gaussian curve, like a peak. Cprofile_f is an array of known size (5700).
b1, d1 are index values. Usually, b1 is 2000, d1 is 4300.
avgCov=sum(Cprofile_f(b1:d1))/(d1-b1)
stdCov=0;
for ii=b1:d1
stdCov =stdCov + sqrt((avgCov - Cprofile_f(ii))^2);
end
stdCov =1- stdCov/(d1-b1)/avgCov
I am trying to figure out what stdCov means here.
Looks like it's computing the average (avgCov) and a spread measure (stdCov, sort of a standard deviation) in order to compute 1 minus the coefficient of variation (stored back in stdCov). Strictly speaking, sqrt((avgCov - Cprofile_f(ii))^2) is just abs(avgCov - Cprofile_f(ii)), so the loop accumulates absolute deviations; stdCov therefore ends up as 1 minus the mean absolute deviation divided by the mean, i.e. a coefficient-of-variation-like measure rather than one based on the true standard deviation.
https://en.wikipedia.org/wiki/Coefficient_of_variation
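For reference, here is a vectorized MATLAB sketch that should be equivalent to the loop above (variable names taken from the question):
seg    = Cprofile_f(b1:d1);               % the portion of the peak between the two indices
avgCov = sum(seg)/(d1-b1);                % mean-like average (note: the original divides by d1-b1, not numel(seg))
madCov = sum(abs(avgCov - seg))/(d1-b1);  % sqrt(x^2) is abs(x), so this is the mean absolute deviation
stdCov = 1 - madCov/avgCov;               % 1 minus a coefficient-of-variation-like ratio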
Maple helpfully can work out the solution to Laplace's equation in a square region and give me the answer in closed form (in terms of an infinite sum). If I try to plot the function of two variables as a 3d plot it gives me most of the surface but not all of it:
Here is the Maple code which produces the solution and turns it into an expression suitable for plotting
lapeq:=diff(v(x,y),x$2)+diff(v(x,y),y$2)=0;
bcs:=v(x,0)=0,v(0,y)=0,v(1,y)=0,v(x,1)=100;
sol1:=pdsolve({lapeq,bcs});
vxy:=eval(v(x,y),sol1);
the result of which is
All good so far. Plotting it via
plot3d(vxy,x=0..1,y=0..1);
gives a result which is fine for x in the full range (0<x<1) but only for y between 0 and around 0.9:
I have tried to evalf some point in the unknown region and Maple can't tell me numerical values there. Is there any way to get Maple to "try a bit harder" to evaluate those numbers?
You could try setting the number of terms in the sum explicitly, replacing the infinite upper bound with a finite one.
Compare
lapeq:=diff(v(x,y),x$2)+diff(v(x,y),y$2)=0;
bcs:=v(x,0)=0,v(0,y)=0,v(1,y)=0,v(x,1)=100;
sol1:=pdsolve({lapeq,bcs});
vxy:=subs(infinity=100,sol1);
plot3d(rhs(vxy),x=0..1,y=0..1);
With
restart;
lapeq:=diff(v(x,y),x$2)+diff(v(x,y),y$2)=0;
bcs:=v(x,0)=0,v(0,y)=0,v(1,y)=0,v(x,1)=100;
sol1:=pdsolve({lapeq,bcs});
vxy:=eval(v(x,y),sol1);
plot3d(vxy,x=0..1,y=0..1);
I'm not a huge fan of chopping the infinite sum at some value of the upper bound for n, without at least demonstrating either symbolically or numerically that it is justified. I.e., that the chopping does not give a false impression of convergence.
So, you asked how to make it work "harder". I'll take that to mean that you too might prefer to let evalf/Sum itself decide whether each infinite numeric sum converges -- rather than manually truncate it yourself at some finite value for the upper value of the range for n.
For fun, and caution, I also divide both numerator and denominator of K by the potentially large exp call (potentially much larger than 1). That may not be necessary here.
restart;
lapeq:=diff(v(x,y),x$2)+diff(v(x,y),y$2)=0:
bcs:=v(x,0)=0,v(0,y)=0,v(1,y)=0,v(x,1)=100:
sol1:=pdsolve({lapeq,bcs}):
vxy:=eval(v(x,y),sol1):
K:=op(1,vxy):
J:=simplify(combine(numer(K)/exp(2*Pi*n)))
/simplify(combine(denom(K)/exp(2*Pi*n))):
F:=subs(__d=J,
proc(x,y) local k, m, n, r;
if y<0.8 then
r:=Sum(__d,n=1..infinity);
else
UseHardwareFloats:=false;
m := ceil(1*abs(y/0.80)^16);
r:=add(Sum(eval(__d,n=m*n-k),n=1..infinity),
k=0..m-1);
end if;
evalf(r);
end proc):
plot3d( F, 0..1, 0..0.99 );
Naturally this is slower than mere chopping of terms to obtain a finite sum. And you might be satisfied with some technique that establishes that the excluded terms' sums are negligible.
I am using 64-bit Windows with Matlab R2017a.
I have Matlab data stored in a vector here. When I plot the data using the command figure; plot(B), it looks like this:
Normally, when you remove the mean from a signal like this which looks almost periodic, the signal becomes symmetric about the x-axis. I tried this using the code B2 = B - mean(B);. Upon plotting with the command figure; plot(B2), I get this:
which is not symmetric (max value is around 0.9 and min value is around -1.25). However, this result is not true for a very similar dataset found here. Before removing the mean, C looks like this:
And after, C2 = C - mean(C) looks like this:
which is symmetric about the x-axis (max value is around 1.1 and min value is around -1.1).
What results in this difference for these two seemingly similar datasets?
"Normally, when you remove the mean from a signal like this which looks almost periodic, the signal becomes symmetric about the x-axis."
That is only true if your values are distributed symmetrically about the mean. And your "looks periodic" is exactly what your dataset is: it looks kind of periodic, but it isn't. You have many more values close to zero than close to -2. You can see this (a) by calculating the median, which is -0.1618 for dataset B, and (b) visually: the signal rests near zero for much longer (approx. 700 samples) than it stays around -2.2 (approx. 400 samples).
While Christian's answer is 100% correct, it doesn't offer a solution to the problem.
To center your signal about the x-axis the way you want, you would need to calculate:
B3 = B - (max(B) + min(B))/2
Note: this only works so nicely because your signal "looks periodic".
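To illustrate why mean removal does not center an asymmetric waveform while midrange removal does, here is a small self-contained sketch with a made-up signal (not the questioner's dataset B): one period rests at 0 for ~700 samples and at -2.2 for ~400 samples, mimicking the behaviour described above.
s = repmat([zeros(1,700), -2.2*ones(1,400)], 1, 5);   % synthetic, asymmetric "almost periodic" signal

s_mean     = s - mean(s);             % mean removal: extremes are NOT symmetric
s_midrange = s - (max(s)+min(s))/2;   % midrange removal: extremes ARE symmetric

[max(s_mean), min(s_mean)]            % [0.8, -1.4]  -> asymmetric about 0
[max(s_midrange), min(s_midrange)]    % [1.1, -1.1]  -> symmetric about 0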
I am trying to perform an interpolation/fit (preferably non-linear, but linear should also be fine) on 4D data. My data has a form of:
[a,b,c] = func(input)
obviously, func is unknown and ultimately data looks like (input, a, b, c):
0 -0.1253 0.0341 0.01060
35 -0.0985 0.0176 0.02060
50 -0.0315 -0.0533 0.1118
60 -0.0518 -0.0327 0.03020
80 0.2939 -0.0713 0.05670
100 0.3684 -0.0765 0.06740
I take observations at e.g. input = [0, 35, 50, 60, 80, 100] (0 being min and 100 being max; I take 6 samples between min and max) and then I get corresponding a, b and c values (I understand that 6 sample points are a poor design of experiment, so I will extend it in the future).
I am trying to guess the values of a, b and c at, say, input = 19. Any pointers?
How to estimate goodness of fit in such scenario?
This is not 4D interpolation; this is three separate 1D interpolations. You just interpolate interp1([0 35],[-0.1253 -0.0985],19) and do the same for b and c (interp1(input,a,19)).
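A minimal sketch of this in MATLAB, using the sample data from the question (interp1 is linear by default; 'pchip' or 'spline' give non-linear alternatives; the vector is renamed inp here only to avoid shadowing MATLAB's built-in input function):
inp = [0 35 50 60 80 100];                           % "input" column from the question's data
a = [-0.1253 -0.0985 -0.0315 -0.0518 0.2939 0.3684];
b = [ 0.0341  0.0176 -0.0533 -0.0327 -0.0713 -0.0765];
c = [ 0.0106  0.0206  0.1118  0.0302  0.0567  0.0674];

q  = 19;                            % query point
aq = interp1(inp, a, q);            % linear by default
bq = interp1(inp, b, q, 'pchip');   % shape-preserving non-linear interpolation
cq = interp1(inp, c, q, 'spline');  % cubic spline interpolation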
Note that for the most basic 1D interpolation in a mesh grid (not what you have), you need 2 data points in general. For the most basic 2D interpolation, you need 4 data points. For 3D interpolation, 8 minimum, 4D, 16.... (2^d in general).
Also note that 1D interpolation uses 2 "dims": one guides the interpolation, the other one is interpolated. In general, with [v,a,b,c] data you would use 3D interpolation.
All that said, that is not what you have in this case. You have scattered data, not a grid, thus the problem becomes considerably more complicated.
In case you can generate a few more points (not necessarily 16) you can use the function griddatan for interpolating scattered data. Note that you cannot just say "give me [a,b,c] for input=19"; there could be an infinite number of (a,b,c) triples satisfying that condition. In any case, you always need to supply dim-1 of the coordinates as sample points and get the last one interpolated. Just a word of advice: this function is computationally and memory-wise very expensive. Do not use it for big sets of data points because it will crash your PC.
In the case where you want to find a set of parameters that makes input=19, you are getting into more complicated territory. You want to minimise a function f(x), where x=[a,b,c], so that f(x) matches the target input.
In math terms:
x* = argmin_x |f(x) - input|^2
This is a harder problem and arguably more of a mathematics question than a programming one. Perhaps an N-D B-spline fit of your data would be a good f.
I have a dataset with 274 samples (9 months) of the daily energy (watt-hours) used in a residential household. I'm not sure if I'm applying the lpc function correctly.
My code is the following:
filename='9-months.csv';
energy = csvread(filename);
C=zeros(5,1);
counter=0;
N=3;
for n=274:-1:31
w2=energy(1:n-1,1);
a=lpc(w2,N);
energy_estimated=0;
for X = 1:N
energy_estimated = energy_estimated + (-a(X+1)*energy(n-X));
end
w_real=energy(n);
error2=abs(w_real-energy_estimated);
counter=counter+1;
C(counter,1)=error2;
end
mean_error=round(mean(C));
Being "n" the sample on analysis, I will use the energy array's values, from 1 to n-1, to calculate the lpc coefficientes (with N=3).
After that, it will apply the calculated coefficients on the "for" cycle presented, in order to calculate the estimated energy.
Finally, error2 outputs the error between the real energy and estimated value.
On the example presented ( http://www.mathworks.com/help/signal/ref/lpc.html ) some filters are used. Do I need to apply any filter to it? Is my methodology correct?
Thank you very much in advance!
The lpc function seems to be used correctly, but there are a few other things about your code. I am addressing the part at the "for n" loop:
for n=31:274 %it seems more logical to me to go forward in time
w2=energy(1:n-1,1);
a=lpc(w2,N);
energy_estimate=filter([0 -a(2:end)],1,w2);
energy_estimate=energy_estimate(end);
estimates(n)=energy_estimate;
end
error=energy(31:274)-estimates(31:274)';
meanerror=mean(error); %you don't really round mean errors
filter does exactly what you are trying to do with the X=1:N loop, but it performs the calculation for the entire w2 vector. If you just want the last value, take the (end) element as shown.
There is also no reason to calculate the error for every single value inside the loop and append it to a vector; you can do that faster in one step after the loop.
If you're trying to estimate future values with an LPC it could work like that, but you are implying that every value depends only on the last 3 values. Have you tried something like a polynomial approach? I would think that would be closer to reality.
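For what it's worth, here is a rough sketch of what such a polynomial approach could look like (the cubic order and the names estimates_poly/error_poly are only illustrative assumptions, not something from the question):
estimates_poly = zeros(274,1);
for n = 31:274
    t = (1:n-1)';                               % time indices of the samples known so far
    [p,~,mu] = polyfit(t, energy(1:n-1,1), 3);  % cubic trend, with centring/scaling (mu) for numerical stability
    estimates_poly(n) = polyval(p, n, [], mu);  % extrapolate the trend one step ahead to sample n
end
error_poly = energy(31:274) - estimates_poly(31:274);
mean_error_poly = mean(abs(error_poly));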
I have a problem. I have a matrix A with integer values between 0 and 5.
for example like:
x=randi(5,10,10)
Now I want to apply a filter, size 3x3, which gives me the most common value in each neighborhood.
I have tried 2 solutions:
fun = @(z) mode(z(:));
y1 = nlfilter(x,[3 3],fun);
which takes very long...
and
y2 = colfilt(x,[3 3],'sliding',@mode);
which also takes long.
I have some really big matrices and both solutions take a long time.
Is there any faster way?
+1 to @Floris for the excellent suggestion to use hist. It's very fast. You can do a bit better though. hist is based on histc, which can be used instead. histc is a compiled function, i.e., not written in Matlab, which is why the solution is much faster.
Here's a small function that attempts to generalize what @Floris did (also, that solution returns a vector rather than the desired matrix) and achieve what you're doing with nlfilter and colfilt. It doesn't require that the input have particular dimensions and uses im2col to efficiently rearrange the data. In fact, the first three lines and the call to im2col are virtually identical to what colfilt does in your case.
function a=intmodefilt(a,nhood)
[ma,na] = size(a);                                                 % input size
aa(ma+nhood(1)-1,na+nhood(2)-1) = 0;                               % allocate a zero-padded array
aa(floor((nhood(1)-1)/2)+(1:ma),floor((nhood(2)-1)/2)+(1:na)) = a; % centre the input inside the padding
[~,a(:)] = max(histc(im2col(aa,nhood,'sliding'),min(a(:))-1:max(a(:)))); % histogram each sliding neighborhood, keep index of the fullest bin
a = a-1;                                                           % shift bin indices back to values
Usage:
x = randi(5,10,10);
y3 = intmodefilt(x,[3 3]);
For large arrays, this is over 75 times faster than colfilt on my machine. Replacing hist with histc is responsible for a factor of two speedup. There is of course no input checking so the function assumes that a is all integers, etc.
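To check the claimed speedup on your own machine, something along these lines should work (timeit needs R2013b or later; the array size is arbitrary, and intmodefilt.m must be on the path):
x = randi(5,1000,1000);                               % arbitrary large test array
t_colfilt = timeit(@() colfilt(x,[3 3],'sliding',@mode));
t_intmode = timeit(@() intmodefilt(x,[3 3]));
speedup   = t_colfilt / t_intmode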
Lastly, note that randi(IMAX,N,N) returns values in the range 1:IMAX, not 0:IMAX as you seem to state.
One suggestion would be to reshape your array so each 3x3 block becomes a column vector. If your initial array dimensions are divisible by 3, this is simple. If they aren't, you need to work a little bit harder. And you need to repeat this nine times, starting at different offsets into the matrix - I will leave that as an exercise.
Here is some code that shows the basic idea (using only functions available in FreeMat - I don't have Matlab on my machine at home...):
N = 100;
A = randi(0,5*ones(3*N,3*N));
B = reshape(permute(reshape(A,[3 N 3 N]),[1 3 2 4]), [ 9 N*N]);
hh = hist(B, 0:5); % histogram of each 3x3 block: bin with largest value is the mode
[mm mi] = max(hh); % mi will contain bin with largest value
figure; hist(B(:),0:5); title 'histogram of B'; % flat, as expected
figure; hist(mi-1, 0:5); title 'histogram of mi' % not flat?...
Here are the plots:
The strange thing, when you run this code, is that the distribution of mi is not flat, but skewed towards smaller values. When you inspect the histograms, you will see that is because you will frequently have more than one bin with the "max" value in it. In that case, you get the first bin with the max number. This is obviously going to skew your results badly; something to think about. A much better filter might be a median filter - the one that has equal numbers of neighboring pixels above and below. That has a unique solution (while mode can have up to four values, for nine pixels - namely, four bins with two values each).
Something to think about.
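If the median idea sounds appealing, a quick comparison against the mode filter could look like the sketch below (assuming the Image Processing Toolbox's medfilt2 is available; it zero-pads the borders by default):
x  = randi(5,10,10);
y3 = intmodefilt(x,[3 3]);   % mode filter from the earlier answer
y4 = medfilt2(x,[3 3]);      % median filter: 3x3 sliding window, unique result per neighborhood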
Can't show you a mex example today (wrong computer); but there are ample good examples on the Mathworks website (and all over the web) that are quite easy to follow. See for example http://www.shawnlankton.com/2008/03/getting-started-with-mex-a-short-tutorial/