Using Matlab to find output at specified input values - matlab

spacing_Pin = transpose(-27:0.0001:2);
thetah_2nd = Phi_intrp3(ismembertol(spacing_Pin,P_in2nd));
With this code, I want to evaluate Phi_intrp3at indices where spacing_Pinis equal to P_in2nd
I know I have asked similar questions before. And I have got some really helpful answers already. But in this case they do not seem to apply. P_in2ndhas only 40 entries, whereas spacing_Pinhas far more. Therefore I cannot consider the absolute value of the difference of spacing_Pinand P_in2ndto find out where they are closest to equal.
so P_in2ndhas values between -25.9747 and -0.0147. The decimals have 4 digits after the dot, but these are sometimes rounded by Matlab (format short). That's the catch, I think, P_in2nd is not found in spacing_Pin. The result is an empty matrix.
Here's the first 5 entries of P_in2nd:
-25,9747431735299
-24,9747431735299
-23,9947431735299
-23,0047431735299
-22,0047431735299
Now, I want to evaluate ¸Phi_intrp3at these values. For this purpose I can change spacing_Pin, but not P_in2nd. For example, when I search for the first entry of P_in2ndin spacing_Pin, I find that entry 10254 = -25,9747000000000. So I want to evaluate Phi_intrp3at this input entry.
Is there a way of doing this?

Related

Matlab function NNZ, numerical zero

I am working on a code in Least Square Non Negative solution recovery context on Matlab, and I need (with no more details because it's not that important for this question) to know the number of non zero elements in my matrices and arrays.
The function NNZ on matlab does exactly what I want, but it happens that I need more information about what Matlab thinks of a "zero element", it could be 0 itself, or the numerical zero like 1e-16 or less.
Does anybody has this information about the NNZ function, cause I couldn't get the original script
Thanks.
PS : I am not an expert on Matlab, so accept my apologies if it's a really simple task.
I tried "open nnz", on Matlab but I only get a small script of commented code lines...
Since nnz counts everything that isn't an exact zero (i.e. 1e-100 is non-zero), you just have to apply a relational operator to your data first to find how many values exceed some tolerance around zero. For a matrix A:
n = nnz(abs(A) > 1e-16);
Also, this discussion of floating-point comparison might be of interest to you.
You can add in a tolerance by doing something like:
nnz(abs(myarray)>tol);
This will create a binary array that is 1 when abs(myarray)>tol and 0 otherwise and then count the number of non-zero entries.

Subtracting matrices of unequal dimension

When I try to do the following command I get an error.
err = sqrt(mean((xi256-xc256).^2))
I am aware that the matrix sizes are different.
whos xi256 xc256` gives:
Name Size Bytes Class Attributes
xc256 27x1 216 double
xi256 513x1 4104 double
I am supposed to negate find the difference of these two matrices. In fact the command given at the top was in the course notes and the course has been running for several years! I have tried googling ways to resolve this error to perform this subtraction regardless but have found no solution. Maybe one of the matrices can be scaled to match the dimensions of the other? However, I have not been able to find any such functions that would let me do this.
I need to find the RMS error in a given set of data. xc256 was calculated through a numerical method, xi256 gives the true value.
Edit: I was able to use another set of results.
First check that xc256 is correctly computed and that you do not have a matrix transposition gone wrong or something like that. Computing the RMS between two vector of different sizes does not make sense, and padding or replicating will get you rid of the error, but is most probably not what you actually want.
There are just two situations that I can think of, I will list them here:
The line is wrong (not likely as it looks pretty normal, but make sure to check the book!)
The input of the line is wrong
Assuming it is in point 2, again there are two possibilities:
xi256 is of incorrect size (likelyhood of this depends on how you got it)
xc256 is of incorrect size
Assuming it is again point 2, there are yet again 2 likely possibilities:
xc256 should be a scalar
xc256 should be a vector with the same size as xi256
From here on there is insufficient information to continue, but check whether you accidentally made it 27 times as long, or 19 times too short somewhere. Just use some breakpoints throughout the code to see how the size develops.

strcmp files - Very large file size output

I'm reading in a csv file that is about 80MB - data_O3. It's about 250,000 x 5 in size. I created E, which is a little bit larger because it has all the days (data_O3 is missing some days). I want to compare the two so that if the date (saved in variable d3) and siteID (d4) are the same, the data point (column 5) is placed in E.
for j = 1:size(data_O3,1)
E(strcmp(d3,data_O3{j,3})&d4 == data_O3{j,4},5) = data_O3(j,5);
end
This script works fine, but for some reason, running it takes longer than expected. I've run the same code for other data that were only slightly smaller with no problem. Is this an issue with the strcmp code or something else?
The script and files used can be found here: https://www.dropbox.com/sh/7bzq3m1ixfeuhu6/i4oOvxHPkn
There are certainly see a number of ways to speed this up significantly.
First of all, read in all numeric data in as numbers. Matlab is not optimized to work with strings, and even cells should generally be avoided as much as possible. If you want to keep everything as strings, use another language (python or perl)
Once you have the state, county and site read in as numbers, then create a number instead of a string for the siteID. One approach would be to use the formula:
siteID = siteNum + 1e4*countyCode + 1e7*stateCode
That would generate unique siteIDs for all sites.
Use datenum to convert the date field into a number.
You are now in a position where the data_O3 defined on line 79 can be a purely numeric array (no cells!), as can your E matrix. That alone will make the process many times faster.
You also might want to define the E as something other than NaN. Maybe give it values of -1.
There may be more optimizations you can do in the comparison, but do the above first and I expect you will see a huge improvement.

MATLAB: using the find function to get indices of a certain value in an array

I have made an array of doubles and when I want to use the find command to search for the indices of specific values in the array, this yields an empty matrix which is not what I want. I assume the problem lies in the precision of the values and/or decimal places that are not shown in the readout of the array.
command:
peaks=find(y1==0.8236)
array readout:
y1 =
Columns 1 through 11
0.2000 0.5280 0.8224 0.4820 0.8239 0.4787 0.8235 0.4796 0.8236 0.4794 0.8236
Columns 12 through 20
0.4794 0.8236 0.4794 0.8236 0.4794 0.8236 0.4794 0.8236 0.4794
output:
peaks =
Empty matrix: 1-by-0
I tried using the command
format short
but I guess this only truncates the displayed values and not the actual values in the array.
How can I used the find command to give an array of indices?
By default, each element of a numerical matrix in Matlab is stored using floating point double precision. As you surmise in the question format short and format long merely alter the displayed format, rather than the actual format of the numbers.
So if y1 is created using something like y1 = rand(100, 1), and you want to find particular elements in y1 using find, you'll need to know the exact value of the element you're looking for to floating point double precision - which depending on your application is probably non-trivial. Certainly, peaks=find(y1==0.8236) will return the empty matrix if y1 only contains values like 0.823622378...
So, how to get around this problem? It depends on your application. One approach is to truncate all the values in y1 to a given precision that you want to work in. Funnily enough, a SO matlab question on exactly this topic attracted two good answers about 12 hours ago, see here for more.
If you do decide to go down this route, I would recommend something like this:
a = 1e-4 %# Define level of precision
y1Round = round((1/a) * y1); %# Round to appropriate precision, and leave y1 in integer form
Index = find(y1Round == SomeValue); %# Perform the find operation
Note that I use the find command with y1Round in integer form. This is because integers are stored exactly when using floating point double, so you won't need to worry about floating point precision.
An alternative approach to this problem would be to use find with some tolerance for error, for example:
Index = find(abs(y1 - SomeValue) < Tolerance);
Which path you choose is up to you. However, before adopting either of these approaches, I would have a good hard look at your application and see if it can be reformulated in some way such that you don't need to search for specific "real" numbers from among a set of "reals". That would be the most ideal outcome.
EDIT: The code advocated in the other two answers to this question is neater than my second approach - so I've altered it accordingly.
Testing for equality with floating-point numbers is almost always a bad idea. What you probably want to do is test to see which numbers are close enough to the target value:
peaks = find( abs( y - .8236 ) < .0001 );
The problem is indeed with the precision. The array that you see displayed is not the actual array, as the actual array has more digits for each of the numbers. Changing the format just changes the way in which the array is displayed, so it doesn't solve the problem.
You have two options, either modify the array or modify what you are looking for. It is probably better to modify what you are looking for, since then you are not changing the actual values.
So instead of looking for equality, you can look for proximity (so the difference between the number you are searching for and the number in the array is at most some small epsilon):
peaks = find( abs(y1-0.8236) < epsilon )
In general, when you are dealing with floats, always try to avoid exact comparisons and use some error thresholds, since the representation of these numbers is limited so they are often stored with small inaccuracies.

Problem using the find function in MATLAB

I have two arrays of data that I'm trying to amalgamate. One contains actual latencies from an experiment in the first column (e.g. 0.345, 0.455... never more than 3 decimal places), along with other data from that experiment. The other contains what is effectively a 'look up' list of latencies ranging from 0.001 to 0.500 in 0.001 increments, along with other pieces of data. Both data sets are X-by-Y doubles.
What I'm trying to do is something like...
for i = 1:length(actual_latency)
row = find(predicted_data(:,1) == actual_latency(i))
full_set(i,1:4) = [actual_latency(i) other_info(i) predicted_info(row,2) ...
predicted_info(row,3)];
end
...in order to find the relevant row in predicted_data where the look up latency corresponds to the actual latency. I then use this to created an amalgamated data set, full_set.
I figured this would be really simple, but the find function keeps failing by throwing up an empty matrix when looking for an actual latency that I know is in predicted_data(:,1) (as I've double-checked during debugging).
Moreover, if I replace find with a for loop to do the same job, I get a similar error. It doesn't appear to be systematic - using different participant data sets throws it up in different places.
Furthermore, during debugging mode, if I use find to try and find a hard-coded value of actual_latency, it doesn't always work. Sometimes yes, sometimes no.
I'm really scratching my head over this, so if anyone has any ideas about what might be going on, I'd be really grateful.
You are likely running into a problem with floating point comparisons when you do the following:
predicted_data(:,1) == actual_latency(i)
Even though your numbers appear to only have three decimal places of precision, they may still differ by very small amounts that are not being displayed, thus giving you an empty matrix since FIND can't get an exact match.
One feature of floating point numbers is that certain numbers can't be exactly represented, since they aren't an integer power of 2. This occurs with the numbers 0.1 and 0.001. If you repeatedly add or multiply one of these numbers you can see some unexpected behavior. Amro pointed out one example in his comment: 0.3 is not exactly equal to 3*0.1. This can also be illustrated by creating your look-up list of latencies in two different ways. You can use the normal colon syntax:
vec1 = 0.001:0.001:0.5;
Or you can use LINSPACE:
vec2 = linspace(0.001,0.5,500);
You'd think these two vectors would be equal to one another, but think again!:
>> isequal(vec1,vec2)
ans =
0 %# FALSE!
This is because the two methods create the vectors by performing successive additions or multiplications of 0.001 in different ways, giving ever so slightly different values for some entries in the vector. You can take a look at this technical solution for more details.
When comparing floating point numbers, you should therefore do your comparisons using some tolerance. For example, this finds the indices of entries in the look-up list that are within 0.0001 of your actual latency:
tolerance = 0.0001;
for i = 1:length(actual_latency)
row = find(abs(predicted_data(:,1) - actual_latency(i)) < tolerance);
...
The topic of floating point comparison is also covered in this related question.
You may try to do the following:
row = find(abs(predicted_data(:,1) - actual_latency(i))) < eps)
EPS is accuracy of floating-point operation.
Have you tried using a tolerance rather than == ?