Here is a very quick one which should be simple to answer if I can explain myself adequately.
I want to create a 144 x 96 x 10000 array called A such that
A(1,1,:) = 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.010....10000 etc.
....
A(144,96,:) = 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.010....10000 etc.
I assume I should use a combination of ones and repmat but I can't seem to figure this one out.
Thanks.
Permute will kill you on large arrays, so you can also try:
array = 0.001:0.001:10;   % 10000 values, matching the requested 144 x 96 x 10000 size
A = repmat(reshape(array, 1, 1, numel(array)), [144 96 1]);
You could do it the following way:
array = 0.001:0.001:10;   % 10000 values
M = permute(repmat(array, 144, 1, 96), [1 3 2]);
It looks like repmat doesn't like [144 96 1] directly, so we create the array with the dimensions in a different order and then just rearrange them with permute.
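A quick way to sanity-check either construction (a minimal sketch; it assumes the vector 0.001:0.001:10, i.e. 10000 values along the third dimension):
array = 0.001:0.001:10;                                 % 10000 values
A = repmat(reshape(array, 1, 1, numel(array)), [144 96 1]);
M = permute(repmat(array, 144, 1, 96), [1 3 2]);
isequal(A, M)                                           % true: both are 144 x 96 x 10000
isequal(squeeze(A(144,96,:)).', array)                  % true: every (i,j) "tube" equals array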
I have a simulation running at 50 Hz, and some data that comes in at 10 Hz. I have extra 'in-between' points with dummy data at the following 50 Hz time points, and interpolation set to off. This should in theory ensure that between 10 Hz time steps, the dummy data is being held and only at the 10 Hz steps is the actual data present. For example, my data vector would be
[0.0 0.02 0.1 0.12 0.2 0.22 0.3 0.32 0.4 0.42 0.5 0.52 ...
-1 -1 1 -1 2 -1 3 -1 4 -1 5 -1 ...]
However, with a scope attached directly to the From Workspace block, Simulink is returning this:
[0.0 0.02 0.1 0.12 0.2 0.22 0.3 0.32 0.34 0.4 0.42 0.5 0.52...
-1 -1 -1 -1 2 -1 3 3 -1 4 -1 5 5...]
where some values are skipped and others are repeated in a consistent pattern. Is there something in Simulink's time-step algorithm that would cause this?
Edit: A solution I ended up finding was to offset the entire time vector by 1/100th of a second so that the sim was taking data between points rather than on points, and that seemed to fix it. Not ideal, but functional.
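For reference, a minimal sketch of what that workaround might look like when building the From Workspace input in MATLAB (the variable name simin and the exact vectors are illustrative assumptions):
t = [0 0.02 0.1 0.12 0.2 0.22 0.3 0.32 0.4 0.42 0.5 0.52].';  % 10 Hz points plus dummy in-between points
d = [-1 -1 1 -1 2 -1 3 -1 4 -1 5 -1].';
simin.time = t + 0.01;               % offset every time stamp by 1/100 s
simin.signals.values = d;
simin.signals.dimensions = 1;        % feed simin to the From Workspace block with interpolation off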
I am using mle and mlecov to estimate the mean and variance of the scalar noise signal n which is assumed to be normally distributed with the following models for mean and standard deviation:
mean(x,y) = @(x,y) k(1)+k(2)*x+k(3)*x.^2+k(4)*y+k(5)*y.^2;
sd(x,y)   = @(x,y) k(6)+k(7)*x+k(8)*x.^2+k(9)*y+k(10)*y.^2;
where x is in the [0,3] interval and y is in the [0,pi/2] interval (thus, scaling does not immediately seem to be an issue). The sample of n, x and y values used for MLE contains 10981 points. Here are some graphs to show the sample qualitatively:
Figure 1. Histogram of the noise samples.
Figure 2. Scatter plot of the noise samples vs. the x and y samples respectively.
My goal is to compute the maximum likelihood estimates for the k(i) model parameters, i=1,...,10, as well as their standard deviation, kSE(i) (given by the square root of the diagonal elements of the asymptotic covariance matrix output by mlecov).
For the maximum likelihood estimation, I minimize the negative log-likelihood L(k(1),...,k(10)) = sum_i [ log(sd(x_i,y_i)) + (n_i - mean(x_i,y_i))^2 / (2*sd(x_i,y_i)^2) ], up to an additive constant.
I also supply MATLAB with the analytical gradient of the negative log-likelihood, used by mle and mlecov, so that numerical approximations of the gradient hopefully do not contribute to the numerical issue I am about to describe.
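For context, a minimal sketch of how such a fit might be wired up (the names n, x, y and the starting point k0 are assumptions, and the analytical gradient is omitted here):
mu  = @(k,x,y) k(1) + k(2)*x + k(3)*x.^2 + k(4)*y + k(5)*y.^2;
sig = @(k,x,y) k(6) + k(7)*x + k(8)*x.^2 + k(9)*y + k(10)*y.^2;
% negative log-likelihood of the normal model above (additive constant dropped)
nll  = @(k,data,cens,freq) sum( log(sig(k,x,y)) + (data - mu(k,x,y)).^2 ./ (2*sig(k,x,y).^2) );
k0   = 0.01*ones(1,10);                      % assumed starting point
khat = mle(n, 'nloglf', nll, 'start', k0);
acov = mlecov(khat, n, 'nloglf', nll);
kSE  = sqrt(diag(acov));                     % asymptotic standard errors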
Numerical Issue
To demonstrate the issue, I present three scenarios.
Scenario 1. I directly run mle and mlecov on the sample data. This outputs the following Stata-like summary:
-----------------------------------------------------------------------------
Coeffs | Val. Std. Err. z P>|z| [95% Conf. Interval]
---------+-------------------------------------------------------------------
k1 | -0.0153 0.0014 -11.27 0.000 -0.0179 -0.0126
k2 | 0.0075 0.0016 4.79 0.000 0.0045 0.0106
k3 | 0.0045 0.0006 7.44 0.000 0.0033 0.0056
k4 | 0.0131 0.0023 5.57 0.000 0.0085 0.0177
k5 | -0.0101 0.0012 -8.45 0.000 -0.0125 -0.0078
k6 | 0.0114 0.0011 10.25 0.000 0.0092 0.0135
k7 | 0.0244 0.0011 21.86 0.000 0.0222 0.0266
k8 | -0.0001 0.0004 -0.34 0.732 -0.0010 0.0007
k9 | -0.0190 0.0018 -10.48 0.000 -0.0225 -0.0154
k10 | 0.0057 0.0009 6.32 0.000 0.0039 0.0074
-----------------------------------------------------------------------------
The "Val." column corresponds to the k(i) estimates and the "Std. Err." column corresponds to kSE(i). The "P>|z|" column gives the p-value for a single coefficient Wald test of the null hypothesis k(i)==0 (if this p-value is <0.05, we reject the null hypothesis and thus conclude that the coefficient k(i) may be significant at the 95% level).
Note that to compute the asymptotic covariance matrix of the k(i) estimates, mlecov computes the Hessian H of L(k(1),...,k(10)) - which I provide an analytic gradient for. The condition number of H is cond(H)=2.7437e3. The mlecov function does a Cholesky factorization of the Hessian, which gives the upper-triangular matrix R with cond(R)=52.38.
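Those two condition numbers are mutually consistent; a short sketch of how they can be checked (assuming acov is the covariance matrix returned by mlecov):
H = inv(acov);      % observed information, i.e. the Hessian of the negative log-likelihood at the estimate
condH = cond(H)     % 2.7437e3 in Scenario 1
R = chol(H);        % upper-triangular Cholesky factor, H = R'*R
condR = cond(R)     % 52.38; for symmetric positive definite H, cond(R) = sqrt(cond(H))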
Scenario 2. I multiply all samples by 0.1 and thus run mle and mlecov on the sample data n*0.1, x*0.1 and y*0.1. This outputs the following summary:
-----------------------------------------------------------------------------
Coeffs | Val. Std. Err. z P>|z| [95% Conf. Interval]
---------+-------------------------------------------------------------------
k1 | -0.0010 0.0001 -7.39 0.000 -0.0013 -0.0008
k2 | 0.0063 0.0016 3.97 0.000 0.0032 0.0093
k3 | 0.0494 0.0060 8.21 0.000 0.0376 0.0611
k4 | 0.0023 0.0024 0.95 0.340 -0.0024 0.0070
k5 | -0.0462 0.0123 -3.75 0.000 -0.0704 -0.0221
k6 | 0.0014 0.0001 12.30 0.000 0.0012 0.0016
k7 | 0.0220 0.0011 20.86 0.000 0.0200 0.0241
k8 | 0.0078 0.0042 1.87 0.062 -0.0004 0.0160
k9 | -0.0228 0.0020 -11.27 0.000 -0.0267 -0.0188
k10 | 0.0747 0.0097 7.70 0.000 0.0557 0.0937
-----------------------------------------------------------------------------
The p-values have changed. Also, now cond(H)=9.3831e5 (!!!) and cond(R)=968.6616. Note that when I remove the second-order terms (x.^2 and y.^2) from the mean and standard deviation models, this problem no longer occurs (i.e. the p-values stay the same and the k(i) values, except for the constant terms k(1) and k(6), are simply scaled by 0.1). Does this indicate a numerical issue?
Scenario 3. I decided to also try scaling n, x and y to the interval [-1,1] by dividing their samples by the largest element (i.e. n(i)=n(i)/max(abs(n)), x(i)=x(i)/max(abs(x)) and y(i)=y(i)/max(abs(y))). Running mle and mlecov on this scaled sample outputs the following summary:
-----------------------------------------------------------------------------
Coeffs | Val. Std. Err. z P>|z| [95% Conf. Interval]
---------+-------------------------------------------------------------------
k1 | -0.0347 0.0041 -8.40 0.000 -0.0428 -0.0266
k2 | 0.1193 0.0141 8.46 0.000 0.0917 0.1470
k3 | 0.0482 0.0164 2.94 0.003 0.0160 0.0803
k4 | -0.0002 0.0120 -0.02 0.987 -0.0238 0.0234
k5 | -0.0305 0.0103 -2.96 0.003 -0.0506 -0.0103
k6 | 0.0557 0.0035 16.11 0.000 0.0489 0.0624
k7 | 0.1131 0.0107 10.60 0.000 0.0922 0.1341
k8 | 0.1164 0.0128 9.13 0.000 0.0914 0.1414
k9 | -0.1132 0.0094 -11.99 0.000 -0.1317 -0.0947
k10 | 0.0583 0.0079 7.37 0.000 0.0428 0.0738
-----------------------------------------------------------------------------
The p-values have changed again! Now cond(H)=4.7550e3 (higher than Scenario 1 (unscaled) but lower than Scenario 2 (everything multiplied by 0.1)). Also, cond(R)=68.9565, which is only slightly higher than for Scenario 1.
My problem
The expected behavior across the three analyses, for me, is that k(i) and kSE(i) would change but the p-values would remain the same - in other words, scaling the data should not make any model coefficient more or less statistically significant. This is contrary to the above scenarios, where the p-values change each time!
Please help me to debug this numerical issue - or explain whether this is in fact the expected behavior and I have misunderstood something. Thank you for reading this long post and helping - I tried to encapsulate all relevant problem details here.
First, I assume you are controlling the random seed of the sampling so that's the same in all scenarios.
That taken care of, I think it may have something to do with the optimization problem you're trying to solve.
I have firsthand experience that tiny numerical changes (in my case, scaling the loglikelihood function by a factor, or equivalently: adding copies of all the datapoints) will change your result when the objective function is not convex.
I would try to derive the analytical gradient of the loglikelihood function in all of the parameters.
This should give you an idea of whether the optimization problem is convex.
If it is not convex, there are some things to do to make sure you get the real MLE.
Run the optimization many times (e.g. 1000) from different starting points and pick the estimate with the highest log-likelihood (see the sketch after this list)
Change the tolerance and number of steps of the optimizer
Try other optimizers, like trust-region searches or particle swarms
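A minimal multi-start sketch along those lines (it assumes the nll handle and data n from the question; the number of restarts and the spread of the random starts are arbitrary choices):
bestNll = Inf;
for r = 1:1000
    k0 = 0.1*randn(1,10);                    % random starting point
    try
        kTry = mle(n, 'nloglf', nll, 'start', k0);
        v = nll(kTry, n, [], []);
        if v < bestNll
            bestNll = v;
            kBest = kTry;                    % keep the best fit found so far
        end
    catch
        % skip starts where the optimizer fails (e.g. a nonpositive sd model)
    end
end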
I would start by simulating a simpler version of this problem and build it up gradually to see where this behaviour starts happening. For example, start with just 1 parameter for the mean and 1 for the noise, and see what happens with the p values then.
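A hedged sketch of that simpler experiment (constant mean, constant standard deviation; the synthetic data and seed are assumptions), checking whether the z-statistics, and hence the p-values, survive a rescaling by 0.1:
rng(0);
n = 0.5 + 0.1*randn(10981,1);                % synthetic noise sample
nllc = @(k,data,cens,freq) sum( log(k(2)) + (data - k(1)).^2 ./ (2*k(2)^2) );
for s = [1 0.1]                              % unscaled, then everything scaled by 0.1
    khat = mle(s*n, 'nloglf', nllc, 'start', [mean(s*n) std(s*n)]);
    se   = sqrt(diag(mlecov(khat, s*n, 'nloglf', nllc)));
    disp(khat(:).' ./ se(:).')               % the z-statistics should be identical for both values of s
end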
I have two vectors:
The first has many values between 0 and 1.
The second defines 100 intervals between 0 and 1 via the edges [0 0.01 0.02 ... 1], where [0, 0.01] is the first interval, [0.01, 0.02] the second, and so on.
I need to create a vector in which each element is the number of elements of the first vector that fall in the corresponding interval of the second.
For example:
first = [0.00025 0.0001 0.0011 0.0025 0.009 ... (a lot of values bigger than 0.01) ... 1]
then the first element of the result vector should be 5, and so on.
Any ideas how to implement this in MATLAB?
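One possible sketch (histcounts is assumed to be available, i.e. R2014b or newer; on older releases histc behaves similarly):
first  = rand(1, 5000);            % example data in [0, 1]
edges  = 0:0.01:1;                 % 101 edges defining 100 intervals
counts = histcounts(first, edges); % counts(k) = number of values in [edges(k), edges(k+1))
% older alternative: c = histc(first, edges); its last element only counts values exactly equal to 1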
I have some response data as a vector A, where the variables are L and D.
I just want to find the exponents for L and D that fit my data in the form mentioned in the title, A = (L^x)*(D^y).
I want to fit a curved line, and not a surface.
I feel it should be fairly simple, but reading a few old answers also didn't help my case.
Is there some easy way to do this?
In case you want to see the data, here it is:
A = [0 0.06 0.12 0.44 0.56 0.94 1 1 0 0.04 0.58 0.74 0.86 1 1]
L = [100 100 100 100 100 100 100 100 43.7 49.7 56 61.5 65 77 93.8]
D = [11.3 10.1 8.9 8.5 8.1 7.7 6.5 5.3 5 5 5 5 5 5 5]
Thanks a lot.
More info:
I wrote the above equation as log(A) = x*log(L) + y*log(D), and tried to use
X = [ones(size(logL)) logL logD];
b = regress(logA,X);
but MATLAB didn't return any coefficients; it just gave b = NaN NaN NaN (presumably because A contains zeros, so log(A) has -Inf entries).
Jos from the MathWorks forum gave me the correct answer. Here it is:
nlm = fitnlm([L(:) D(:)], A, 'y~(x1^b1)*(x2^b2)', [0 0])
In case you don't have fitnlm, NonLinearModel.fit will also do. In fact, I used the latter.
Hope this helps someone.
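For completeness, a short hedged sketch of pulling the fitted exponents out of that model (property names as in current releases; the last line is just an illustrative comparison):
nlm  = fitnlm([L(:) D(:)], A(:), 'y ~ (x1^b1)*(x2^b2)', [0 0]);
b    = nlm.Coefficients.Estimate;       % b(1) is the exponent on L, b(2) the exponent on D
Afit = (L(:).^b(1)) .* (D(:).^b(2));    % fitted values, to compare against A(:)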
I need to read an ASCII data file using the MATLAB fscanf command. The data is basically floating-point numbers with fixed field width and precision. In each row of the data file there are 10 columns of numeric values, and the number of rows varies from one file to another. Below is an example of the first line:
0.000 0.000 0.005 0.000 0.010 0.000 0.015 0.000 0.020 -0.000
The field width is 7 and precision is 3.
I have tried:
x = fscanf(fid,'%7.3f\r\n');
x = fscanf(fid,[repmat('%7.3f',1,10) '\r\n']);
but they return nothing!
When I do not specify the field width and precision, for example x = fscanf(fid,'%f');, it reads all the data, but since some values occupy exactly 7 characters (for example 158.000) it joins two consecutive numbers, which results in a wrong output. Here is an example:
0.999158.000
it reads this as 0.999158 and .000
Any hint or help will be highly appreciated.
If your data might not be separated by a space (as in the 0.999158.000 example from the question), you could try using textscan to read the file.
Notice that with this fixed-width format you cannot have a value such as -158.000: it is 8 characters wide, so it would be split across two 7-character fields.
Since textscan returns a cell array, you might need to convert it into a matrix (if you do not like working with cell arrays).
fp = fopen('input_file_5.txt');
x = textscan(fp, repmat('%7.3f', 1, 10));   % 10 fixed-width fields of 7 characters per row
fclose(fp);
m = [x{:}]                                  % concatenate the 10 cell columns into one numeric matrix
Input file
0.999130.000 0.005 0.000 0.010 0.000 0.015 0.000 0.020 -0.000
0.369-30.000123.005 0.000 0.040 0.000 0.315 0.000 0.020-10.000
Output
m =
Columns 1 through 8
0.9990 130.0000 0.0050 0 0.0100 0 0.0150 0
0.3690 -30.0000 123.0050 0 0.0400 0 0.3150 0
Columns 9 through 10
0.0200 0
0.0200 -10.0000
Hope this helps.
For reading ASCII text files with well-defined input, as specified in the question, you should use the dlmread function.
>> X = dlmread(filename, delimiter);
will read numeric data from filename that is delimited (along the same row) with delimiter into the matrix X. For your case you can use
>> X = dlmread(filename, ' ');
as your data is delimited by a space, ' '.