Receiving NaN from an anonymous function in matlab - matlab

I'd like to implement a piecewise periodic function, which should be zero in certain intervals and look like a test function elsewhere (e.g. exp(a^2/(abs(x)^2-a^2)) for abs(x)< a and zero otherwise).
I tried
nu = #(x) ((8*10^(-4)/exp(1)*exp(30^2./(abs(mod(x,365)-31).^2-30.^2))).* ...
and((1<mod(x,365)),(mod(x,365)<61)) + ...
(8*10^(-4)/exp(1)*exp(10^2./(abs(mod(x,365)-300).^2-10.^2))).* ...
and((290<mod(x,365)),(mod(x,365)<310)));
respectively
nu = #(x) ((0*x).* and((0<=mod(x,365)),(mod(x,365)<=1)) + ...
(8*10^(-4)/exp(1)*exp(30^2./(abs(mod(x,365)-31).^2-30.^2))).* ...
and((1<mod(x,365)),(mod(x,365)<61)) + ...
(0*x).* and((61<=mod(x,365)),(mod(x,365)<=290)) + ...
(8*10^(-4)/exp(1)*exp(10^2./(abs(mod(x,365)-300).^2-10.^2))).* ...
and((290<mod(x,365)),(mod(x,365)<310)) + ...
(0*x).* and((310<=mod(x,365)),(mod(x,365)<365)));
which should behave the same. The aim is to have a period of [0,365), therefore the modulo.
Now my problem is that nu(1)=nu(61)=nu(290)=nu(310)=NaN and also in a small neighborhood of them, e.g. nu(0.99)=NaN. But I excluded these points from the exponential function, where this one would cause problems. And even if I use a smaller interval for the exponential functions (e.g (2,60) and (291,309)) I receive NaN at the same points.
Any ideas? Thanks in advice!

One trick I use when performing vectorised calculations in which there's risk of a division by zero or related error is to use the conditional to modify the problem value. For instance, suppose you wanted to invert all entries in a vector, but leave zero at zero (and set any value within, say, 1e-8 to zero, too). You'd do this:
outVect = 1./(inVect+(abs(inVect)<=1e-8)).*(abs(inVect)>1e-8);
For values satisfying the condition that abs(value)>1e-8, this calculates 1/value. If abs(value)<=1e-8, it actually calculates 1/(value+1), then multiplies by zero, resulting in a zero value. Without the conditional inside the denominator, it would calculate 1/value when value is zero, resulting in inf... and then multiply inf by zero, resulting in NaN.
The same technique should work with your more complicated anonymous function.

Related

Index when mean is constant

I am relatively new to matlab. I found the consecutive mean of a set of 1E6 random numbers that has mean and standard deviation. Initially the calculated mean fluctuate and then converges to a certain value.
I will like to know the index (i.e 100th position) at which the mean converges. I have no idea how to do that.
I tried using the logical operator but i have to go through 1e6 data points. Even with that i still can't find the index.
Y_c= sigma_c * randn(n_r, 1) + mu_c; %Random number creation
Y_f=sigma_f * randn(n_r, 1) + mu_f;%Random number creation
P_u=gamma*(B*B)/2.*N_gamma+q*B.*N_q + Y_c*B.*N_c; %Calculation of Ultimate load
prog_mu=cumsum(P_u)./cumsum(ones(size(P_u))); %Progressive Cumulative Mean of system response
logical(diff(prog_mu==0)); %Find index
I suspect the issue is that the mean will never truly be constant, but will rather fluctuate around the "true mean". As such, you'll most likely never encounter a situation where the two consecutive values of the cumulative mean are identical. What you should do is determine some threshold value, below which you consider fluctuations in the mean to be approximately equal to zero, and compare the difference of the cumulative mean to that value. For instance:
epsilon = 0.01;
const_ind = find(abs(diff(prog_mu))<epsilon,1,'first');
where epsilon will be the threshold value you choose. The find command will return the index at which the variation in the cumulative mean first drops below this threshold value.
EDIT: As was pointed out, this method may potentially fail if the first few random numbers are generated such that the difference between them is less than the epsilon value, but have not yet converged. I would like to suggest a different approach, then.
We calculate the cumulative means, as before, like so:
prog_mu=cumsum(P_u)./cumsum(ones(size(P_u)));
We also calculate the difference in these cumulative means, as before:
df_prog_mu = diff(prog_mu);
Now, to ensure that conversion has been achieved, we find the first index where the cumulative mean is below the threshold value epsilon and all subsequent means are also below the threshold value. To phrase this another way, we want to find the index after the last position in the array where the cumulative mean is above the threshold:
conv_index = find(~df_prog_mu,1,'last')+1;
In doing so, we guarantee that the value at the index, and all subsequent values, have converged below your predetermined threshold value.
I wouldn't imagine that the mean would suddenly become constant at a single index. Wouldn't it asymptotically approach a constant value? I would reccommend a for loop to calculate the mean (it sounds like maybe you've already done this part?) like this:
avg = [];
for k=1:length(x)
avg(k) = mean(x(1:k));
end
Then plot the consecutive mean:
plot(avg)
hold on % this will allow us to plot more data on the same figure later
If you're trying to find the point at which the consecutive mean comes within a certain range of the true mean, try this:
Tavg = 5; % or whatever your true mean is
err = 0.01; % the range you want the consecutive mean to reach before we say that it "became constant"
inRange = avg>(Tavg-err) & avg<(Tavg+err); % gives you a binary logical array telling you which values fell within the range
q = 1000; % set this as high as you can while still getting a value for consIndex
constIndex = [];
for k=1:length(inRange)
if(inRange(k) == sum(inRange(k:k+q))/(q-1);)
constIndex = k;
end
end
The below answer takes a similar approach but makes an unsafe assumption that the first value to fall within the range is the value where the function starts to converge. Any value could randomly fall within that range. We need to make sure that the following values also fall within that range. In the above code, you can edit "q" and "err" to optimize your result. I would recommend double checking it by plotting.
plot(avg(constIndex), '*')

How to calculate the "rest value" of a plot?

Didn't know how to paraphrase the question well.
Function for example:
Data:https://www.dropbox.com/s/wr61qyhhf6ujvny/data.mat?dl=0
In this case how do I calculate that the rest point of this function is ~1? I have access to the vector that makes the plot.
I guess the mean is an approximation but in some cases it can be pretty bad.
Under the assumption that the "rest" point is the steady-state value in your data and the fact that the steady-state value happens the majority of the times in your data, you can simply bin all of the points and use each unique value as a separate bin. The bin with the highest count should correspond to the steady-state value.
You can do this by a combination of histc and unique. Assuming your data is stored in y, do this:
%// Find all unique values in your data
bins = unique(y);
%// Find the total number of occurrences per unique value
counts = histc(y, bins);
%// Figure out which bin has the largest count
[~,max_bin] = max(counts);
%// Figure out the corresponding y value
ss_value = bins(max_bin);
ss_value contains the steady-state value of your data, corresponding to the most occurring output point with the assumptions I laid out above.
A minor caveat with the above approach is that this is not friendly to floating point data whose unique values are generated by floating point values whose decimal values beyond the first few significant digits are different.
Here's an example of your data from point 2300 to 2320:
>> format long g;
>> y(2300:2320)
ans =
0.99995724232555
0.999957488454868
0.999957733165346
0.999957976465197
0.999958218362579
0.999958458865564
0.999958697982251
0.999958935720613
0.999959172088623
0.999959407094224
0.999959640745246
0.999959873049548
0.999960104014889
0.999960333649014
0.999960561959611
0.999960788954326
0.99996101464076
0.999961239026462
0.999961462118947
0.999961683925704
0.999961904454139
Therefore, what I'd recommend is to perhaps round so that the first 5 or so significant digits are maintained.
You can do this to your dataset before you continue:
num_digits = 5;
y_round = round(y*(10^num_digits))/(10^num_digits);
This will first multiply by 10^n where n is the number of digits you desire so that the decimal point is shifted over by n positions. We round this result, then divide by 10^n to bring it back to the scale that it was before. If you do this, for those points that were 0.9999... where there are n decimal places, these will get rounded to 1, and it may help in the above calculations.
However, more recent versions of MATLAB have this functionality already built-in to round, and you can just do this:
num_digits = 5;
y_round = round(y,num_digits);
Minor Note
More recent versions of MATLAB discourage the use of histc and recommend you use histcounts instead. Same function definition and expected inputs and outputs... so just replace histc with histcounts if your MATLAB version can handle it.
Using the above logic, you could also use the median too. If the majority of data is fluctuating around 1, then the median would have a high probability that the steady-state value is chosen... so try this too:
ss_value = median(y_round);

Finding the best threshold level without a for loop in MATLAB

Let A and B be two matrices of the same size. For a matrix M, let ht(M,t) threshold all the entries of M by t. That is All entries whose absolute value is less than t are set to 0. Suppose I want to find the optimal threshold t such that norm(ht(A,t)-B,'fro')^2 is minimized.
The only way that I can see to do this is deficient: do a for loop over the unique values of A and threshold A and setting C=ht(A,t)-B, compute sum(sum(C.*C)).
This is just too slow when A is large. I have considered sorting the elements of A and finding some efficient way to set a few entries to zero at a time, but I'm not sure this can all be done without a for loop.
Is there a way to do it?
Here's a very simple example (so simple a for loop works easily in this case):
B =
0.101508820368332 0
0 0.301996943246957
Set
A=B+.1*ones(2)
A =
0.201508820368332 0.1
0.1 0.401996943246957
Simple inspection shows that if we zero out the off-diagonal entries of A we minimize the difference between A and B. There are 3 possible threshold values, given by unique(A)=[.1,.2015,.402]. Given a potential threshold value t, we can hard threshold A by:
function [A_thresholded] = ht(A,t)
%
A_thresholded = A .* (abs(A)>t);
The form of the data in a matrix is irrelevant. You can convert them to vectors and simply compute the square-norm. In fact, you can sort the contents of A in increasing order (and permute B to preserve pairing). When you increase the threshold to include one more value in A, the norm only changes by that one increment. Therefore, you can find your solution in O(n log n). Hope this helps.

Variable precicion arithmetic for symbolic integral in Matlab

I am trying to calculate some integrals that use very high power exponents. An example equation is:
(-exp(-(x+sqrt(p)).^2)+exp(-(x-sqrt(p)).^2)).^2 ...
./( exp(-(x+sqrt(p)).^2)+exp(-(x-sqrt(p)).^2)) ...
/ (2*sqrt(pi))
where p is constant (1000 being a typical value), and I need the integral for x=[-inf,inf]. If I use the integral function for numeric integration I get NaN as a result. I can avoid that if I set the limits of the integration to something like [-20,20] and a low p (<100), but ideally I need the full range.
I have also tried setting syms x and using int and vpa, but in this case vpa returns:
1.0 - 1.0*numeric::int((1125899906842624*(exp(-(x - 10*10^(1/2))^2) - exp(-(x + 10*10^(1/2))^2))^2)/(3991211251234741*(exp(-(x - 10*10^(1/2))^2) + exp(-(x + 10*10^(1/2))^2)))
without calculating a value. Again, if I set the limits of the integration to lower values I do get a result (also for low p), but I know that the result that I get is wrong – e.g., if x=[-100,100] and p=1000, the result is >1, which should be wrong as the equation should be asymptotic to 1 (or alternatively the codomain should be [0,1) ).
Am I doing something wrong with vpa or is there another way to calculate high precision values for my integrals?
First, you're doing something that makes solving symbolic problems more difficult and less accurate. The variable pi is a floating-point value, not an exact symbolic representation of the fundamental constant. In Matlab symbolic math code, you should always use sym('pi'). You should do the same for any other special numeric values, e.g., sqrt(sym('2')) and exp(sym('1')), you use or they will get converted to an approximate rational fraction by default (the source of strange large number you see in the code in your question). For further details, I recommend that you read through the documentation for the sym function.
Applying the above, here's a runnable example:
syms x;
p = 1000;
f = (-exp(-(x+sqrt(p)).^2)+exp(-(x-sqrt(p)).^2)).^2./(exp(-(x+sqrt(p)).^2)...
+exp(-(x-sqrt(p)).^2))/(2*sqrt(sym('pi')));
Now vpa(int(f,x,-100,100)) and vpa(int(f,x,-1e3,1e3)) return exactly 1.0 (to 32 digits of precision, see below).
Unfortunately, vpa(int(f,x,-Inf,Inf)), does not return an answer, but a call to the underlying MuPAD function numeric::int. As I explain in this answer, this is what can happen when int cannot obtain a result. Normally, it should try to evaluate the the integral numerically, but your function appears to be ill-defined at ±∞, resulting in divide by zero issues that the variable precision quadrature methods can't handle well. You can evaluate the integral at wider bounds by increasing the variable precision using the digits function (just remember to set digits back to the default of 32 when done). Setting digits(128) allowed me to evaluate vpa(int(f,x,-1e4,1e4)). You can also more efficiently evaluate your integral over a wider range via 2*vpa(int(f,x,0,1e4)) at lower effective digits settings.
If your goal is to see exactly how much less than one p = 1000 corresponds to, you can use something like vpa(1-2*int(f,x,0,1e4)). At digits(128), this returns
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000086457415971094118490438229708839420392402555445545519907545198837816908450303280444030703989603548138797600750757834260181259102
Applying double to this shows that it is approximately 8.6e-89.

What is this code doing? Machine Learning

I'm just learning matlab and I have a snippet of code which I don't understand the syntax of. The x is an n x 1 vector.
Code is below
p = (min(x):(max(x)/300):max(x))';
The p vector is used a few lines later to plot the function
plot(p,pp*model,'r');
It generates an arithmetic progression.
An arithmetic progression is a sequence of numbers where the next number is equal to the previous number plus a constant. In an arithmetic progression, this constant must stay the same value.
In your code,
min(x) is the initial value of the sequence
max(x) / 300 is the increment amount
max(x) is the stopping criteria. When the result of incrementation exceeds this stopping criteria, no more items are generated for the sequence.
I cannot comment on this particular choice of initial value and increment amount, without seeing the surrounding code where it was used.
However, from a naive perspective, MATLAB has a linspace command which does something similar, but not exactly the same.
Certainly looks to me like an odd thing to be doing. Basically, it's creating a vector of values p that range from the smallest to the largest values of x, which is fine, but it's using steps between successive values of max(x)/300.
If min(x)=300 and max(x)=300.5 then this would only give 1 point for p.
On the other hand, if min(x)=-1000 and max(x)=0.3 then p would have thousands of elements.
In fact, it's even worse. If max(x) is negative, then you would get an error as p would start from min(x), some negative number below max(x), and then each element would be smaller than the last.
I think p must be used to create pp or model somehow as well so that the plot works, and without knowing how I can't suggest how to fix this, but I can't think of a good reason why it would be done like this. using linspace(min(x),max(x),300) or setting the step to (max(x)-min(x))/299 would make more sense to me.
This code examines an array named x, and finds its minimum value min(x) and its maximum value max(x). It takes the maximum value and divides it by the constant 300.
It doesn't explicitly name any variable, setting it equal to max(x)/300, but for the sake of explanation, I'm naming it "incr", short for increment.
And, it creates a vector named p. p looks something like this:
p = [min(x), min(x) + incr, min(x) + 2*incr, ..., min(x) + 299*incr, max(x)];