tfest :: too many parameters for chosen data size - matlab

I am trying to find transfer function for some input data and output data using the code
Temperature = [zeros(1,153) 300*ones(1,47)];
out_temp = [zeros(1,147) ScopeData4.signals(1).values'];
N = 1;
tfdata_tem = iddata(out_temp,Temperature,0.001);
sys = tfest(tfdata_tem,N);
but in the end I get the following error despite the fact that i have increased the number of samples and reduced the order to 1
There are too many parameters to estimate for chosen estimation data size. Reduce model order or use a larger data set.

The most likely problem is that your data set doesn't contain a rich enough set of frequencies for the underlying algorithm to estimate a model (of any order).
The iddata1 sample data set gives an example of what typical data should look like.
In particular, note that the input signal is comprised of many steps, occurring at non-regular intervals, unlike your data that has just one step.
load iddata1 z1;
plot(z1);

As the answer by Phil Goddard shows in the figure, you need two column output values and input values. But the values in your programming are two row values. That means you need to change it into
Temperature = [zeros(1,153) 300*ones(1,47)]';
out_temp = [zeros(1,147) ScopeData4.signals(1).values']';
N = 1;
tfdata_tem = iddata(out_temp,Temperature,0.001);
sys = tfest(tfdata_tem,N);

Related

Append for MATLAB

I am training an ANN, and I want to have different instances of training. In each instance, I want to find the maximum difference between the actual and predicted output. Then I want to take the average of all these maximums.
My code so far is:
maximum = [];
k=1;
for k = 1:5
%Train network
layers = [ ...
imageInputLayer([250 1 1])
reluLayer
fullyConnectedLayer(100)
fullyConnectedLayer(100)
fullyConnectedLayer(1)
regressionLayer];
options = trainingOptions('sgdm','InitialLearnRate',0.1, ...
'MaxEpochs',1000);
net = trainNetwork(nnntrain,nnnfluidtrain,layers,options);
net.Layers
%Test network
predictedn = predict(net,nnntest);
maximum = append(maximum, max(abs(predictedn-nnnfluidtest)));
k=k+1
end
My intent is to produce a list named 'maximum' with five elements (the max of each ANN training instance) that I would then like to take the average of.
However, it keeps giving me the error:
wrong number of input arguments for obsolete matrix-based syntax
when it tries to append. The first input is a list while the second is a 1x1 single.
Appending in MATLAB is a native operation. You append elements by actually building a new vector where the original vector is part of the input.
Therefore:
maximum = [maximum max(abs(predictedn-nnnfluidtest))];
If for some reason you would like to do it in function form, the function you are looking for is cat which is short form for concatenate. The append function is seen in multiple toolboxes but each one of them does not do what you want. cat is what you want but you still need to provide the original input vector as part of the arguments:
maximum = cat(2, maximum, max(abs(predictedn-nnnfluidtest)));
The first argument is the axis you want to append to. To respect the code that you're doing above, you want the columns to increase as you extend your vector so that is the second axis, or the axis being 2.

Matlab Curve fitting returns different parameter values every time

I'm trying to fit some data with the following equation, which is a modified error function and contains 4 unknown parameters (I'm mainly interested in par4):
Y = par1+(par2*(erf((X-par3)/ par4)))
If I set some sensible start points, the data point are fitted reasonably well (R-square ~ 0.95) but the error on some of the parameters found has a very wide range. Below is a image of the data to fit, the fitted curve and the parameters found.
Data and fit:
par4 = 0.9109 (-52.89, 54.71)
par2 = 0.02647 (0.02421, 0.02872)
par3 = 38.15 (8.128, 68.18)
par1 = 0.8647 (0.8624, 0.867)
If I change even slightly the start points (e.g. for par4, from 0.5 to 0.1), the final fit values change. Similarly, if I make `TolFun value larger the final parameters and their error change as well as the fit. In some cases, I'm even presented with no error intervals for the found parameters. For instance (if I set par4 start point = 3 and the others remain unchanged) I get:
par4 = 0.1712
par2 = 0.02674
par3 = 38.63
par1 = 0.8649 (0.8625, 0.8672)
As you can imagine, I'm no expert in the maths of the fitting process but I thought that the different results mean that the data is poor (no data points in the centre of the curve) and many curves can reproduce it with a relatively good R-square. Also, the large error intervals may mean that the spread in the data is large. Are my thoughts correct?
Also, what does the option TolFun control? And, I know the error intervals are calculated on 95% (2sigma) criterion; can I change that to 68% (1sigma) to have a narrower error interval?

Matlab vectorization of multiple embedded for loops

Suppose you have 5 vectors: v_1, v_2, v_3, v_4 and v_5. These vectors each contain a range of values from a minimum to a maximum. So for example:
v_1 = minimum_value:step:maximum_value;
Each of these vectors uses the same step size but has a different minimum and maximum value. Thus they are each of a different length.
A function F(v_1, v_2, v_3, v_4, v_5) is dependant on these vectors and can use any combination of the elements within them. (Apologies for the poor explanation). I am trying to find the maximum value of F and record the values which resulted in it. My current approach has been to use multiple embedded for loops as shown to work out the function for every combination of the vectors elements:
% Set the temp value to a small value
temp = 0;
% For every combination of the five vectors use the equation. If the result
% is greater than the one calculated previously, store it along with the values
% (postitions) of elements within the vectors
for a=1:length(v_1)
for b=1:length(v_2)
for c=1:length(v_3)
for d=1:length(v_4)
for e=1:length(v_5)
% The function is a combination of trigonometrics, summations,
% multiplications etc..
Result = F(v_1(a), v_2(b), v_3(c), v_4(d), v_5(e))
% If the value of Result is greater that the previous value,
% store it and record the values of 'a','b','c','d' and 'e'
if Result > temp;
temp = Result;
f = a;
g = b;
h = c;
i = d;
j = e;
end
end
end
end
end
end
This gets incredibly slow, for small step sizes. If there are around 100 elements in each vector the number of combinations is around 100*100*100*100*100. This is a problem as I need small step values to get a suitably converged answer.
I was wondering if it was possible to speed this up using Vectorization, or any other method. I was also looking at generating the combinations prior to the calculation but this seemed even slower than my current method. I haven't used Matlab for a long time but just looking at the number of embedded for loops makes me think that this can definitely be sped up. Thank you for the suggestions.
No matter how you generate your parameter combination, you will end up calling your function F 100^5 times. The easiest solution would be to use parfor instead in order to exploit multi-core calculation. If you do that, you should store the calculation results and find the maximum after the loop, because your current approach would not be thread-safe.
Having said that and not knowing anything about your actual problem, I would advise you to implement a more structured approach, like first finding a coarse solution with a bigger step size and narrowing it down successivley by reducing the min/max values of your parameter intervals. What you have currently is the absolute brute-force method which will never be very effective.

Generating Data Set in Matlab

I wanted to ask how to generate a data set in Matlab. I need it to test Feature Selection Algorithms on high dimensional data... The data set should be synthetic, multivariate and contain INTERACTING features.
Synthetic data sets like the MONKS problem is available on http://archive.ics.uci.edu/ml/datasets/MONK%27s+Problems .... unfortunately I have no clue how to visualize/generate and modify the data according to my need. The goal is to run an algorithm which detects interacting features.
I will be very thankful for a kind reply.
I'm not sure this is what you are looking for, but if I needed to do this, I would start by generating anonymous functions and generic variable names that I could apply randomly within a dataset.
For example, you could generate a dataset:
myData = rand(100,6);
and create a few functions which include interdependencies
interact = #(x) x*x;
interact2 = #(x) x*(x-1);
then create a random logical distribution
y = round(rand(100,1)); %(100 rows of random 0's or 1's)
go through the dataset and use the interact function on only rows where y is true
dataset(y == 1,:) = interact(dataset(y==1,:));
repeat the above with the other interaction functions you define if you desire. it would probably be useful to do this so that you can avoid row dependencies (see below) so generating a few datasets could be in order, i.e.
dataset2(y==1,:) = interact2(dataset(y==1,:));
A similar approach might be taken with variables (in the example set it shows some categorical variables).
myVariable = repmat('data', 100, 1);
listofvariables = genvarname(cellstr(myVariable));
y = round(rand(100,1)); % logical index for the data
randomly select a generic variable to repeat
applyvar = round(rand(1,1)*100);
selectedVariable = listofvariables(applyvar);
replace indices of the variable list with your repeated variable
listofvariables(y == 1) = selectedVariable;
put together the dataset(s) in some order of your choosing
[cellstr(num2str(dataset(:,1))) listofvariables cellstr(num2str(dataset(:,2)) cellstr(num2str(dataset2(:,2))]

Find a Binary Data Sequence in a Signal

Here's my goal:
I'm trying to find a way to search through a data signal and find (index) locations where a known, repeating binary data sequence is located. Then, because the spreading code and demodulation is known, pull out the corresponding chip of data and read it. Currently, I believe xcorr will do the trick.
Here's my problem:
I can't seem to interpret my result from xcorr or xcorr2 to give me what I'm looking for. I'm either having a problem cross-referencing from the vector location of my xcorr function to my time vector, or a problem properly identifying my data sequence with xcorr, or both. Other possibilities may exist.
Where I am at/What I have:
I have created a random BPSK signal that consists of the data sequence of interest and garbage data over a repeating period. I have tried processing it using xcorr, which is where I am stuck.
Here's my code:
%% Clear Variables
clc;
clear all, close all;
%% Create random data
nbits = 2^10;
ngarbage = 3*nbits;
data = randi([0,1],1,nbits);
garbage = randi([0,1],1,ngarbage);
stream = horzcat(data,garbage);
%% Convert from Unipolar to Bipolar Encoding
stream_b = 2*stream - 1;
%% Define Parameters
%%% Variable Parameters
nsamples = 20*nbits;
nseq = 5 %# Iterate stream nseq times
T = 10; %# Number of periods
Ts = 1; %# Symbol Duration
Es = Ts/2; %# Energy per Symbol
fc = 1e9; %# Carrier frequency
%%% Dependent Parameters
A = sqrt(2*Es/Ts); %# Amplitude of Carrier
omega = 2*pi*fc %# Frequency in radians
t = linspace(0,T,nsamples) %# Discrete time from 0 to T periods with nsamples samples
nspb = nsamples/length(stream) %# Number of samples per bit
%% Creating the BPSK Modulation
%# First we have to stretch the stream to fit the time vector. We can quickly do this using _
%# simple matrix manipulation.
% Replicate each bit nspb/nseq times
repStream_b = repmat(stream_b',1,nspb/nseq);
% Tranpose and replicate nseq times to be able to fill to t
modSig_proto = repmat(repStream_b',1,nseq);
% Tranpose column by column, then rearrange into a row vector
modSig = modSig_proto(:)';
%% The Carrier Wave
carrier = A*cos(omega*t);
%% Modulated Signal
sig = modSig.*carrier;
Using XCORR
I use xcorr2() to eliminate the zero padding effect of xcorr on unequal vectors. See comments below for clarification.
corr = abs(xcorr2(data,sig); %# pull the absolute correlation between data and sig
[val,ind] = sort(corr(:),'descend') %# sort the correlation data and assign values and indices
ind_max = ind(1:nseq); %# pull the nseq highest valued indices and send to ind_max
Now, I think this should pull the five highest correlations between data and sig. These should correspond to the end bit of data in the stream for every iteration of stream, because I would think that is where the data would most strongly cross-correlate with sig, but they do not. Sometimes the maxes are not even one stream length apart. So I'm confused here.
Question
In a three part question:
Am I missing a certain step? How do I use xcorr in this case to find where data and sig are most strongly correlated?
Is my entire method wrong? Should I not be looking for the max correlations?
Or should I be attacking this problem from another angle, id est, not use xcorr and maybe use filter or another function?
Your overall method is great and makes a lot of sense. The problem you're having is that you're getting some actual correlation with your garbage data. I noticed that you shifted all of your sream to be zero-centered, but didn't do the same to your data. If you zero-center the data, your correlation peaks will be better defined (at least that worked when I tried it).
data = 2*data -1;
Also, I don't recommend using a simple sort to find your peaks. If you have a wide peak, which is especially possible with a noisy signal, you could have two high points right next to each other. Find a single maximum, and then zero that point and a few neighbors. Then just repeat however many times you like. Alternatively, if you know how long your epoch is, only do a correlation with one epoch's worth of data, and iterate through the signal as it arrives.
With #David K 's and #Patrick Mineault's help I manage to track down where I went wrong. First #Patrick Mineault suggested I flip the signals. The best way to see what you would expect from the result is to slide the small vector along the larger, searched vector. So
corr = xcorr2(sig,data);
Then I like to chop off the end there because it's just extra. I did this with a trim function I made that simply takes the signal you're sliding and trims it's irrelevant pieces off the end of the xcorr result.
trim = #(x,s2) x(1:end - (length(s2) - 1));
trim(corr,data);
Then, as #David K suggests, you need to have the data stream you're looking for encoded the same as your searched signal. So in this case
data = 2*data-1;
Second, if you just have your data at it's original bit length, and not at it's stretched, iterated length, it can be found in the signal but it will be VERY noisy. To reduce the noise, simply stretch the data to match it's stretched length in the iterated signal. So
rdata = repmat(data',1,nspb/nseq);
rdata = repmat(rdata',1,nseq);
data = rdata(:)';
Now finally, we should have crystal clear correlations for this case. And to pull out the maxes that should correspond to those correlations I wrote
[sortedValues sortIndex] = sort(corr(:),'descend');
c = 0 ;
for r = 1 : length(sortedValues)
if sortedValues(r,:) == max(corr)
c = c + 1;
maxIndex(1,c) = sortIndex(r,:);
else break % If you don't do this, you get loop lock
end
end
Now c should end up being nseq for this case and you should have 5 index times where the corrs should be! You can easily pull out the bits with another loop and c or length(maxIndex). I've also made this into a more "real world" toy script, where there is a data stream, doppler, fading, and it's over a time vector in seconds instead of samples.
Thanks for the help!
Try flipping the signal, i.e.:
corr = abs(xcorr2(data,sig(end:-1:1));
Is that any better?