Get a zero-mean vector from an arbitrary vector - MATLAB

As far as I know, to get a zero-mean vector from a given vector, we should subtract the mean of the given vector from each member of the vector. For example, consider the following:
r=rand(1,6)
we get
0.8687 0.0844 0.3998 0.2599 0.8001 0.4314
let us create another vector s by following operation
s=r-mean(r(:));
after this we get
0.3947 -0.3896 -0.0743 -0.2142 0.3260 -0.0426
if we calculate mean of s by following formula
mean(s)
we get
ans =
-5.5511e-017
actually as i have checked this number is very small
-5.5511*exp(-017)
ans =
-2.2981e-007
So should we consider this vector to have zero mean? Is the small deviation from 0 due to round-off error? For example, when we create white noise or a similar random uncorrelated sequence, is it assumed that the sequence has zero mean even with such a small deviation from 0, so that in this case
-5.5511e-017 = 0 ?
approximately, of course

e-017 means "times 10 to the power of -17" (10^-17), not exp(-17), so the number is far smaller than the -2.2981e-007 you computed, and for practical purposes it is 0. If you type
format long;
you will see the real precision used by Matlab
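To make the difference concrete, here is a small sketch comparing the scientific-notation literal with the exp expression used in the question. The variable names a, b and c are only for illustration.

```matlab
a = -5.5511e-17;          % literal: -5.5511 times 10^-17
b = -5.5511 * 10^(-17);   % the same value written out explicitly
c = -5.5511 * exp(-17);   % NOT the same: exp(-17) is e^-17, about 4.14e-8

% Print all three; a and b agree, c is many orders of magnitude larger
fprintf('%g\n%g\n%g\n', a, b, c);
```

The third line reproduces the -2.2981e-007 from the question, which is why that check was misleading.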

You can also refer to the eps command. Although MATLAB uses doubles, which can encode numbers as small as 2.2251e-308, the precision is determined by the size of the number.
Use it in the form eps(number): it tells you how large the influence of the least significant bit is.
On my machine, for example, eps(0.3) returns 5.5511e-17, exactly the number you report.
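A minimal sketch of how eps can be used to judge whether a computed mean is zero up to round-off. The vector is the one printed in the question; the tolerance rule (a small multiple of the element count times eps at the largest magnitude) is an assumption chosen for illustration, not a MATLAB convention.

```matlab
r = [0.8687 0.0844 0.3998 0.2599 0.8001 0.4314];  % values from the question
s = r - mean(r);                                  % subtract the mean

% Spacing of doubles near 0.3 (one unit in the last place)
ulp_03 = eps(0.3);                                % 2^-54, about 5.5511e-17

% Assumed tolerance: a few ulps per element at the scale of the data
tol = 10 * numel(r) * eps(max(abs(r)));
is_zero_mean = abs(mean(s)) <= tol;

fprintf('mean(s) = %g, tol = %g\n', mean(s), tol);
```

A tolerance that scales with the data is usually more robust than a fixed constant such as 1e-16, since the spacing between doubles grows with their magnitude.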

Related

Matlab function NNZ, numerical zero

I am working on a code in Least Square Non Negative solution recovery context on Matlab, and I need (with no more details because it's not that important for this question) to know the number of non zero elements in my matrices and arrays.
The function NNZ on matlab does exactly what I want, but it happens that I need more information about what Matlab thinks of a "zero element", it could be 0 itself, or the numerical zero like 1e-16 or less.
Does anybody have this information about the NNZ function? I couldn't get the original script.
Thanks.
PS : I am not an expert on Matlab, so accept my apologies if it's a really simple task.
I tried "open nnz", on Matlab but I only get a small script of commented code lines...
Since nnz counts everything that isn't an exact zero (i.e. 1e-100 is non-zero), you just have to apply a relational operator to your data first to find how many values exceed some tolerance around zero. For a matrix A:
n = nnz(abs(A) > 1e-16);
Also, this discussion of floating-point comparison might be of interest to you.
You can add in a tolerance by doing something like:
nnz(abs(myarray)>tol);
This will create a binary array that is 1 when abs(myarray)>tol and 0 otherwise and then count the number of non-zero entries.
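As a small illustration of the difference between the two counts, here is a sketch with a made-up matrix A (the values and the tolerance are assumptions chosen to make the effect visible):

```matlab
% Hypothetical matrix: two exact zeros, two "numerical zeros", two real entries
A = [0 1e-100 1e-17; 2 0 -3];

n_exact = nnz(A);             % counts everything that is not exactly 0
tol = 1e-16;                  % assumed tolerance around zero
n_tol = nnz(abs(A) > tol);    % counts only entries larger than tol in magnitude

fprintf('exact: %d, with tolerance: %d\n', n_exact, n_tol);
```

Here nnz(A) counts 1e-100 and 1e-17 as non-zero, while the thresholded version keeps only 2 and -3.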

How to calculate the "rest value" of a plot?

Didn't know how to paraphrase the question well.
Function for example:
Data:https://www.dropbox.com/s/wr61qyhhf6ujvny/data.mat?dl=0
In this case how do I calculate that the rest point of this function is ~1? I have access to the vector that makes the plot.
I guess the mean is an approximation but in some cases it can be pretty bad.
Under the assumption that the "rest" point is the steady-state value in your data and the fact that the steady-state value happens the majority of the times in your data, you can simply bin all of the points and use each unique value as a separate bin. The bin with the highest count should correspond to the steady-state value.
You can do this by a combination of histc and unique. Assuming your data is stored in y, do this:
%// Find all unique values in your data
bins = unique(y);
%// Find the total number of occurrences per unique value
counts = histc(y, bins);
%// Figure out which bin has the largest count
[~,max_bin] = max(counts);
%// Figure out the corresponding y value
ss_value = bins(max_bin);
ss_value contains the steady-state value of your data, corresponding to the most occurring output point with the assumptions I laid out above.
A caveat with the above approach is that it is not friendly to floating-point data: values that agree in the first few significant digits but differ further out each count as a separate unique value.
Here's an example of your data from point 2300 to 2320:
>> format long g;
>> y(2300:2320)
ans =
0.99995724232555
0.999957488454868
0.999957733165346
0.999957976465197
0.999958218362579
0.999958458865564
0.999958697982251
0.999958935720613
0.999959172088623
0.999959407094224
0.999959640745246
0.999959873049548
0.999960104014889
0.999960333649014
0.999960561959611
0.999960788954326
0.99996101464076
0.999961239026462
0.999961462118947
0.999961683925704
0.999961904454139
Therefore, what I'd recommend is to round so that only the first 5 or so decimal places are kept.
You can do this to your dataset before you continue:
num_digits = 5;
y_round = round(y*(10^num_digits))/(10^num_digits);
This will first multiply by 10^n where n is the number of digits you desire so that the decimal point is shifted over by n positions. We round this result, then divide by 10^n to bring it back to the scale that it was before. If you do this, for those points that were 0.9999... where there are n decimal places, these will get rounded to 1, and it may help in the above calculations.
However, more recent versions of MATLAB have this functionality already built-in to round, and you can just do this:
num_digits = 5;
y_round = round(y,num_digits);
Minor Note
More recent versions of MATLAB discourage the use of histc and recommend histcounts instead. It is not quite a drop-in replacement, though: histcounts treats its second argument as bin edges and returns one fewer count than there are edges, so the call has to be adapted rather than swapped in directly.
Using the above logic, you could also use the median. If the majority of the data fluctuates around 1, the median is very likely to land on the steady-state value, so try this too:
ss_value = median(y_round);
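The whole procedure can be sketched end to end on synthetic data. The signal below is an assumption standing in for the linked data.mat (a first-order response settling at 1). The counting uses unique with accumarray instead of histc, and mode is shown as a more compact alternative; neither is part of the original answer.

```matlab
% Synthetic data (assumed): rises from 0 and settles at 1
t = 0:2000;
y = 1 - exp(-t / 50);

% Round to 5 decimal places so near-identical values share a bin
num_digits = 5;
y_round = round(y * 10^num_digits) / 10^num_digits;

% Count occurrences of each unique rounded value
[bins, ~, idx] = unique(y_round);
counts = accumarray(idx(:), 1);
[~, max_bin] = max(counts);
ss_value = bins(max_bin);

% mode returns the most frequent value directly
ss_mode = mode(y_round);

fprintf('steady state: %g (mode: %g)\n', ss_value, ss_mode);
```

Both routes should agree whenever a single rounded value clearly dominates the data.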

Does the rand function ever produce values of 0 or 1 in MATLAB/Octave?

I'm looking for a function that will generate random values between 0 and 1, inclusive. I have generated 120,000 random values by using rand() function in octave, but haven't once got the values 0 or 1 as output. Does rand() ever produce such values? If not, is there any other function I can use to achieve the desired result?
If you read the documentation of rand in both Octave and MATLAB, it is an open interval between (0,1), so no, it shouldn't generate the numbers 0 or 1.
However, you can perhaps generate a set of random integers, then normalize the values so that they lie between [0,1]. So perhaps use something like randi (MATLAB docs, Octave docs), which generates integer values from 1 up to a given maximum. With this, define the maximum number, then subtract 1 and divide by the maximum minus 1 to get values between [0,1] inclusive:
max_num = 10000; %// Define maximum number
N = 1000; %// Define size of vector
out = (randi(max_num, N, 1) - 1) / (max_num - 1); %// Output
If you want this to act more like rand but including 0 and 1, make the max_num variable quite large.
Mathematically, if you sample from a (continuous) uniform distribution on the closed interval [0 1], values 0 and 1 (or any value, in fact) have probability strictly zero.
Programmatically,
If you have a random generator that produces values of type double on the closed interval [0 1], the probability of getting the value 0, or 1, is not zero, but it's so small it can be neglected.
If the random generator produces values from the open interval (0, 1), the probability of getting a value 0, or 1, is strictly zero.
So the probability is either strictly zero or so small it can be neglected. Therefore, you shouldn't worry about that: in either case the probability is zero for practical purposes. Even if rand were of type (1) above, and thus could produce 0 and 1, it would produce them with probability so small that you would "never" see those values.
Does that sound strange? Well, that happens with any number. You "never" see rand ever outputting exactly 1/4, either. There are so many possible outputs, all of them equally likely, that the probability of any given output is virtually zero.
rand produces numbers from the open interval (0,1), which does not include 0 or 1, so you should never get those values. This was more clearly documented in previous versions, but it's still stated in the help text for rand (type help rand rather than doc rand).
However, since it produces doubles, there are only a finite number of values that it will actually produce. The precise set varies depending on the RNG algorithm used. For Mersenne twister, the default algorithm, the possible values are all multiples of 2^(-53), within the open interval (0,1). (See doc RandStream.list, and then "Choosing a Random Number Generator" for info on other generators).
Note that 2^(-53) is eps/2. Therefore, it's equivalent to drawing from the closed interval [2^(-53), 1-2^(-53)], or [eps/2, 1-eps/2].
You can scale this interval to [0,1] by subtracting eps/2 and dividing by 1-eps. (Use format hex to display enough precision to check that at the bit level).
So x = (rand-eps/2)/(1-eps) should give you values on the closed interval [0,1].
But I should give a word of caution: they've put a lot of effort into making sure that output of rand gives an appropriate distribution of any given double within (0,1), and I don't think you're going to get the same nice properties on [0,1] if you apply the scaling I suggested. My knowledge of floating-point math and RNGs isn't up to explaining why, or what you might do about that.
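The endpoint arithmetic in this answer can be checked directly. The sketch below does not call rand; it plugs in the smallest and largest values the answer says the default generator can return (taken from the answer, not verified here) and confirms that the scaled results are exactly 0 and 1. The variable names are illustrative.

```matlab
% Endpoints of rand's output range as stated in the answer
lo = eps / 2;          % 2^-53
hi = 1 - eps / 2;      % 1 - 2^-53

% Apply the proposed scaling x = (rand - eps/2) / (1 - eps)
x_lo = (lo - eps / 2) / (1 - eps);
x_hi = (hi - eps / 2) / (1 - eps);

fprintf('x_lo = %.17g, x_hi = %.17g\n', x_lo, x_hi);
```

This only checks the two endpoints; it says nothing about how evenly the values in between are distributed after scaling, which is the concern raised in the caution above.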
I just tried this:
octave:1> max(rand(10000000,1))
ans = 1.00000
octave:2> min(rand(10000000,1))
ans = 3.3788e-08
Did not give me 0 strictly, so watch out for floating point operations.
Edit
Even though I said, watch out for floating point operations I did fall for that. As #eigenchris pointed out:
format long g
octave:1> a=max(rand(1000000,1))
a = 0.999999711020176
It yields a floating number close to one, not equal, as you can see now after changing the precision, as #rayryeng suggested.
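To avoid being misled by the display format, the check can be done on the stored values instead of the printed ones. A short sketch, assuming only that rand behaves as documented (open interval):

```matlab
x = rand(1000000, 1);

% Print with 17 significant digits so display rounding cannot hide anything
fprintf('min = %.17g\nmax = %.17g\n', min(x), max(x));

% Compare the stored values directly rather than reading the display
hits_endpoints = any(x == 0 | x == 1);
disp(hits_endpoints);
```

With the documented open interval, hits_endpoints is expected to be false regardless of how the maximum is displayed.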
Although not direct to the question here, I find it helpful to link to this SO post Octave - random generate number that has a one liner to generate 1s and 0s using r = rand > 0.5.

MATLAB: using the find function to get indices of a certain value in an array

I have made an array of doubles and when I want to use the find command to search for the indices of specific values in the array, this yields an empty matrix which is not what I want. I assume the problem lies in the precision of the values and/or decimal places that are not shown in the readout of the array.
command:
peaks=find(y1==0.8236)
array readout:
y1 =
Columns 1 through 11
0.2000 0.5280 0.8224 0.4820 0.8239 0.4787 0.8235 0.4796 0.8236 0.4794 0.8236
Columns 12 through 20
0.4794 0.8236 0.4794 0.8236 0.4794 0.8236 0.4794 0.8236 0.4794
output:
peaks =
Empty matrix: 1-by-0
I tried using the command
format short
but I guess this only truncates the displayed values and not the actual values in the array.
How can I use the find command to give an array of indices?
By default, each element of a numerical matrix in Matlab is stored using floating point double precision. As you surmise in the question format short and format long merely alter the displayed format, rather than the actual format of the numbers.
So if y1 is created using something like y1 = rand(100, 1), and you want to find particular elements in y1 using find, you'll need to know the exact value of the element you're looking for to floating point double precision - which depending on your application is probably non-trivial. Certainly, peaks=find(y1==0.8236) will return the empty matrix if y1 only contains values like 0.823622378...
So, how to get around this problem? It depends on your application. One approach is to truncate all the values in y1 to a given precision that you want to work in. Funnily enough, a SO matlab question on exactly this topic attracted two good answers about 12 hours ago, see here for more.
If you do decide to go down this route, I would recommend something like this:
a = 1e-4; %# Define level of precision
y1Round = round((1/a) * y1); %# Round to appropriate precision, and leave y1 in integer form
Index = find(y1Round == SomeValue); %# Perform the find operation; SomeValue must be in the same integer form, e.g. 8236 for 0.8236
Note that I use the find command with y1Round in integer form. This is because integers (up to 2^53 in magnitude) are stored exactly when using floating point double, so you won't need to worry about floating point precision.
An alternative approach to this problem would be to use find with some tolerance for error, for example:
Index = find(abs(y1 - SomeValue) < Tolerance);
Which path you choose is up to you. However, before adopting either of these approaches, I would have a good hard look at your application and see if it can be reformulated in some way such that you don't need to search for specific "real" numbers from among a set of "reals". That would be the most ideal outcome.
EDIT: The code advocated in the other two answers to this question is neater than my second approach - so I've altered it accordingly.
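A concrete run of the rounding approach, as a sketch. The values in y1 are made up to mimic entries that display as 0.8236 under format short, and target is a hypothetical name for the scaled search value.

```matlab
% Hypothetical data: entries 2 and 4 both display as 0.8236 in format short
y1 = [0.2 0.823622378 0.4794 0.82361];

a = 1e-4;                      % level of precision
y1Round = round(y1 / a);       % integer form, e.g. 8236
target = round(0.8236 / a);    % search value in the same integer form

Index = find(y1Round == target);
disp(Index);
```

Scaling the search value with the same expression as the data keeps both sides of the comparison in the same integer form.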
Testing for equality with floating-point numbers is almost always a bad idea. What you probably want to do is test to see which numbers are close enough to the target value:
peaks = find( abs( y1 - .8236 ) < .0001 );
The problem is indeed with the precision. The array that you see displayed is not the actual array, as the actual array has more digits for each of the numbers. Changing the format just changes the way in which the array is displayed, so it doesn't solve the problem.
You have two options, either modify the array or modify what you are looking for. It is probably better to modify what you are looking for, since then you are not changing the actual values.
So instead of looking for equality, you can look for proximity (so the difference between the number you are searching for and the number in the array is at most some small epsilon):
peaks = find( abs(y1-0.8236) < epsilon )
In general, when you are dealing with floats, always try to avoid exact comparisons and use some error thresholds, since the representation of these numbers is limited so they are often stored with small inaccuracies.
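The contrast between exact and tolerance-based comparison can be seen in a small sketch. The array is made up for illustration, and target and tol are assumed names and values.

```matlab
% Hypothetical data: entries 5 and 7 are close to, but not exactly, 0.8236
y1 = [0.2 0.528 0.8224 0.482 0.82362 0.4794 0.823601];

target = 0.8236;
tol = 1e-4;                                  % assumed error threshold

exact_hits = find(y1 == target);             % exact comparison finds nothing
peaks = find(abs(y1 - target) < tol);        % proximity comparison

disp(isempty(exact_hits));
disp(peaks);
```

The tolerance should be chosen from the precision that matters in the application. Here 1e-4 matches the four decimals shown in the display.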

Issue in viewing double values

I am having an issue with viewing double data in matlab console. Actually, I am importing a matrix from my data file. The value of a particular row and column was 1.543 but in the console when I use disp(x) where x is the matrix imported, it is showing as 1.0e+03 * 0.0002. However, when I try to access that particular element in the matrix using disp(x(25,25)) where 25 and 25 are the row and column numbers it is showing to be 1.543. So I am confused. Any clarifications. It is just that when I print the whole matrix it is showing as 1.0e+03 * 0.0002.
The following command should fix it. It is only a display issue; the precision of the actual values in the matrix is not affected:
format shortG
That happens due to high dynamic range of your data.
Try for example :
x = [10^-10 10^10];
disp(x);
The result is:
1.0e+010 *
0.0000 1.0000
It looks like the first value is zero, but it isn't. It is almost zero compared to the second one. That is not surprising: if you add the small value to the big one and then subtract the big one again, you get zero. That is due to floating-point arithmetic. The following expression is true:
isequal( (x(1)+x(2)) - x(2) , 0)
What can be done?
1) A really high dynamic range can cause troubles in any kind of computations. Try to understand where it came from, and solve the problem in a broader context.
2) You can try to set
format long
It can improve the situation visually for some of the cases.
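To separate what is stored from what is displayed, the values can be inspected without relying on the matrix display. A small sketch using the same x as above (the other variable names are illustrative):

```matlab
x = [10^-10 10^10];

% The small element is stored as a non-zero value
small_is_nonzero = (x(1) ~= 0);

% It is lost only when added to a much larger number
is_absorbed = isequal((x(1) + x(2)) - x(2), 0);

% Printing each element with its own exponent avoids the shared scale factor
fprintf('%g ', x);
fprintf('\n');
```

Printing with fprintf and %g gives each element its own exponent, which is similar in spirit to what format shortG does for the console display.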