I'm creating a function to use multiple matrices in an analysis study.
The matrices share the same name plus a date reference (month by month, year by year): nov-1956 is matrix5611, dec-1956 is matrix5612, jan-1957 is matrix5701, and so on until the end of 1999.
For each one, the mean value of each month/year should be compared (depending on which area of study you are focused on).
I'm trying to use loops to vary the name of the input matrix instead of writing it out manually date by date, but a function that helps with this would be useful.
Any ideas or useful functions?
If you have your data in different matrices, you can use eval to store the means in a matrix, in this example MeanMatrix, in which the row (Y dimension) is the year and the column (X dimension) is the month:
Edit: It's not running number from 5611 but yymm...
Edit: It seems that matrices don't begin from January 1956 but from November 1956.
% add here missing months matrix index strings.
MissingMatricesCellArray = {'5601', '5602', '5603', '5604', '5605', '5606', '5607', '5608', '5609', '5610'};
% MissingmatricesCellArray = {};
for Year = 56:99
    for Month = 1:12
        NumString = sprintf('%02d%02d', Year, Month);
        % calculate and store means only for matrices that are not missing.
        if ~(ismember(cellstr(NumString), MissingMatricesCellArray))
            MeanMatrix(Year,Month) = mean(mean(eval(['matrix', NumString])));
        end
    end
end
Then you can compare the means of months and years the way you wish.
I would prefer to use cell arrays for this rather than eval.
for y = 56:99 % for each year
for m = 1:12 % for each month
ind = createYearMonthInd(y,m);
matrix{ind} = ... % whatever you want here (note the curly braces)
end
end
function ind = createYearMonthInd(y,m)
ind = y * 100 + m;
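As a rough sketch of the cell-array idea (the years-by-months layout and the sample matrices are assumptions for illustration, not part of the answer above), the matrices could be kept in one cell array, with empty cells marking missing months:

```matlab
years = 56:99;                       % two-digit years, as in the matrix names
M = cell(numel(years), 12);          % M{yearIndex, month}; empty cell = missing month

% Hypothetical sample data standing in for matrix5611, matrix5612, matrix5701
M{1, 11} = magic(4);
M{1, 12} = ones(3);
M{2, 1}  = 2 * ones(3);

% Mean of every stored matrix; months without data stay NaN
MeanMatrix = NaN(size(M));
hasData = ~cellfun(@isempty, M);
MeanMatrix(hasData) = cellfun(@(A) mean(A(:)), M(hasData));
```

With this layout, row y - 55 corresponds to year y, so MeanMatrix(1, 11) holds the mean for November 1956.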
Trying to calculate the variance of a European option using repeated trial (instead of 1 trial). I want to compare the variance using the standard randn function and the sobolset. I'm not quite sure how to draw repeated samples from the latter.
Generating from randn is easy:
num_steps = 100;
num_paths = 10;
z = randn(num_steps, num_paths); % 100 steps, for 10 paths (trials)
Once I have this, I can loop through all 10 columns of the z matrix, and can also repeat the experiment many times, as the randn function will provide a new set of random variables every time.
for exp_num = 1: 20
for col = 1: 10
price_vec = z(:, col);
end
end
I'm not quite sure how to do this with the sobolset. I understand I can create a matrix of dimensions to start with (say 100* 10). I can loop through as above through all the columns for the first experiment. However, when I try the next experiment (#2), the loop starts from the beginning and all the numbers are the same. Meaning I don't get any variation in my pricing. It seems I will need to find a way to randomize the column selection at the start of every experiment number. Is there a better way to do this??
data1 = sobolset(1000, 'Skip', 1000, 'Leap', 100);
data2 = net(data1, 10);
for exp_num = 1:20
    % how do I change the start of the column selection here, so that the
    % next data3 is different from the one in the previous exp_num?
    for col = 1:10
        data3(:, col) = data2(:, col);
        % perform calculations
    end
end
I hope this is making sense....
Thanks for the help!
Update: 8/21
I tried the following:
num_runs = 100
num_samples = 1000
for j = 1: num_runs
for i = 1 : num_samples
sobol_set = sobolset(num_samples,'Skip',j*50,'Leap',1e2);
sobol_set = net(sobol_set, 5);
sobol_seq = sobol_set(:, i)';
z_uncorr = norminv(sobol_seq, 0, 1)
% do pricing with z_uncorr through some function F
end
end
After generating 100 prices (through some function F, mentioned above), I find that the variance of the 100 prices is higher than that I get from the standard pseudo random numbers. This should not be the case. I think I'm still not sampling correctly from the sobolset. Any advice would be appreciated.
I have created code that does a Newton's Method approximation. It prints in a table-like format the approximation at each step and the associated error. I want to add a column that shows an integer value that represents the number of correct digits in approximation against the true value.
I am attempting to convert each cell of approximation into a string and counting how many digits are accurate. Example, approx. = 3.14555, true = 3.1555. The number of accurate digits will be 2. Although I have this idea in my head, I am doing it all wrong in my code below. Do you know how to create a proper loop to achieve this? I have less than a year of MATLAB experience; my mental toolbox is limited.
% Program Code of Newton's Method to find root
% This program will not produce a result if initial guess is too far from
% true value
clear;clc;format('long','g')
% Can work for various functions
%FUNCTION: 2*x*log(x)-2*log(x)*x^(3)+2*x^(2)*log(x)-x^(2)+1
%INTIAL GUESS: .01
%ERROR: 1.e-8
a=input('Enter the function in the form of variable x:','s');
x(1)=input('Enter Initial Guess:');
error=input('Enter allowed Error:');
% Passing through the function and calculating the derivative
f=inline(a);
dif=diff(str2sym(a));
d=inline(dif);
% Looping through Newton's Method
for i=1:100
x(i+1)=x(i)-((f(x(i))/d(x(i))));
err(i)=abs(x(i+1)-x(i));
% The loop is broken if acceptable error magnitude is reached
if err(i)<error
break
end
end
root=x(i);
Root = (x(:,1:(end-1)))';
Error = err';
disp('The final approximation is:')
disp(root)
%BELOW IS ALL WRONG, I AM TRYING TO ADD A COLUMN TO 'table'
%THAT SHOWS HOW MANY DIGITS IN APPROXIMATION IS ACCURATE
iter = 0;
y = zeros(1,length(x));
plot(x,y,'+')
zero1 = ('0.327967785331818'); %ACTUAL VALUE
for i = 1:length(Root)
chr = mat2str(Root(i))
for j = 1:length(chr(i))
if chr(i)~=zero1(i)
iter = 0;
return
elseif chr(i)==zero1(i)
iter = iter + 1;
acc(i) = iter
end
end
end
table(Root, Error) %ADD ACCURACY COLUMN HERE
Perhaps something like multiplying both numbers by powers of 10 and then flooring them until the answers are no longer equal:
approx=3.14555;
truth=3.1555;
approx1=0;
truth1=0;
i=0;
while approx1==truth1
approx1=floor(approx*10^i);
truth1=floor(truth*10^i);
i=i+1;
end
acc=i-1;
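The same floor-and-compare idea can be applied to a whole column of iterates. The sketch below is only an illustration: the sample values and the maxDigits cap are assumptions, and the cap is there because two identical numbers would otherwise never leave the loop. Floating-point rounding can shift the count by one when a scaled value lands very close to an integer.

```matlab
truth     = 3.1555;                      % reference value from the example above
approxs   = [3.14555; 3.2; 3.1555];      % hypothetical iterates to score
maxDigits = 15;                          % assumed cap, roughly double precision

acc = zeros(size(approxs));
for k = 1:numel(approxs)
    n = 0;
    % Compare the leading digits at increasing powers of 10 until they differ
    while n < maxDigits && floor(approxs(k) * 10^n) == floor(truth * 10^n)
        n = n + 1;
    end
    acc(k) = n;                          % number of matching leading digits
end
```

The resulting acc vector has one entry per iterate, so it could be passed to table alongside Root and Error as a third column.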
I'm trying to take 5 years of data from a netcdf file for a variable and create an annual cycle. So take all 5 Jans and average them, take all 5 Febs and average them, etc., and plot the result on a line graph. I'm just starting the code (I want to get this to work before I move on) and am getting the following error message: In an assignment A(:) = B, the number of elements in A and B must be the same.
My question is, is there a better way to do this?
Thanks for your help in advance. I'm a newbie, so I know this may be a simple question.
ncid = netcdf.open('example.nc','NC_NOWRITE');
PS1 = netcdf.getVar(ncid,netcdf.inqVarID(ncid, 'ps'), 'single');
for i = 1:12
MonthlyPS1(i) = PS1(month==i);
end
That should dump data into bins for each month, where I can later calculate the average.
If PS1 is a 5-by-12 array of real numbers, then:
MonthlyPS1 = mean(PS1);
If PS1 is a 12-by-5 array of real numbers, then:
MonthlyPS1 = mean(PS1, 2);
If PS1 is a 60-by-1 array of real numbers, then:
MonthlyPS1 = mean(reshape(PS1, 12, 5), 2);
If PS1 is not an array of real numbers, you need to explain what PS1 is, as requested by CST-Link. Also, it is not clear what the variable month is.
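To make the 60-by-1 case concrete, here is a small check on made-up data. The values are hypothetical and assume the series is ordered January to December within each year:

```matlab
% Hypothetical stand-in for 5 years of monthly values (Jan..Dec, year after year)
PS1 = (1:60)';

byMonth    = reshape(PS1, 12, 5);    % row = month, column = year
MonthlyPS1 = mean(byMonth, 2);       % 12-by-1 annual cycle, one mean per month
```

Here the January mean is the average of elements 1, 13, 25, 37 and 49. Calling plot(1:12, MonthlyPS1) afterwards would draw the annual cycle as a line graph.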
I will try to write a solution, but I'm not sure it is what you need (there's a certain amount of guessing in my answer).
Also, I will not write it in idiomatic Matlab code, but rather with explicit loops and calculations; this way you may see more easily whether it's what you want:
% allocate space for monthly mean values
PS1_mean = zeros(size(PS1,1), size(PS1,2), 12)
for d = 1:size(PS1,3)
% calculate the month for date d
m = mod(d, 12);
if m == 0
m = 12;
end;
% cumulate the data
PS1_mean(:, :, m) = PS1_mean(:, :, m) + PS1(:, :, d);
end;
% calculate the mean value
% (this might be tricky for incomplete years)
n_years = fix(size(PS1,3) / 12);
PS1_mean = PS1_mean / n_years;
To plot a "slice" of data for e.g. March, you may try:
mesh(PS1_mean(:,:,3));
Again, I hope that's what you're looking for.
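For comparison, the explicit loop can probably be collapsed into a reshape plus a mean along a fourth dimension. This is a sketch under the same guess as the loop (PS1 is lon-by-lat-by-time with monthly steps starting in January); the sample array is hypothetical, and incomplete trailing years are simply dropped:

```matlab
% Hypothetical lon-by-lat-by-time array covering 2 complete years
PS1 = reshape(1:(2*3*24), 2, 3, 24);

nx     = size(PS1, 1);
ny     = size(PS1, 2);
nYears = floor(size(PS1, 3) / 12);

% Keep complete years only, then split the time axis into month-by-year
PS1_full = PS1(:, :, 1:12*nYears);
PS1_mean = mean(reshape(PS1_full, nx, ny, 12, nYears), 4);
```

PS1_mean(:, :, 3) is then the mean March field, matching the slice used in the mesh call.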
Hello everyone I have a new small problem:
The data I am using have a weird trade time that goes from 17.00 of one day to 16.15 of the day after.
That means that, e.g., for the day 09-27-2013 The source I am using registers the transactions occurred as follows:
DATE , TIME , PRICE
09/27/2013,17:19:42,3225.00,1 #%first obs of the vector
09/27/2013,18:37:59,3225.00,1 #%second obs of the vector
09/27/2013,08:31:32,3200.00,1
09/27/2013,08:36:17,3203.00,1
09/27/2013,09:21:34,3210.50,1 #%fifth obs of the vector
Now the first and second obs are incorrect for me: they belong to the 9/27 trading day but were executed on 9/26. Since I am working on some functions in Matlab that rely on non-decreasing times, I need to solve this issue. The date format I am using is actually Matlab's datenum format, so I am trying to solve the problem by just subtracting one from the incorrect observations:
%#Call time the time vector, I can identify the 'incorrect' observations
idx=find(diff(time)<0);
time(idx)=time(idx)-1;
It is easy to tell that this will only fix the 'last' incorrect observation of each series. In the previous example this would only correct the second element, and I would have to run the code several times (I thought about a while loop) until idx is empty. This is not a big issue when working with small series, but I have up to 20 million observations and probably hundreds of thousands of consecutively incorrect ones.
Is there a way to fix this in a vectorized way?
idx=find(diff(time)<0);
while ~isempty(idx)
    time(idx)=time(idx)-1; idx=find(diff(time)<0);
end
However, given that the computation would not be so complex I thought that a for loop could efficiently solve the issue and my idea was the following:
[N]=size(time,1);
for i=N:-1:1
if diff(time(i,:)<0)
time(i,:)=time(i,:)-1;
end
end
Sadly, it does not seem to work.
Here is an example of data I am actually using.
735504.591157407
735507.708030093 %# I made this up to give you an example of two consecutively wrong observations
735507.708564815 %# This is an incorrect observation
735507.160138889
735507.185358796
735507.356562500
Thanks everyone in advance
Sensible version -
for count = 1:numel(time)
dtime = diff([0 ;time]);
ind1 = find(dtime<0,1,'last')-1;
time(ind1) = time(ind1)-1;
end
Faster-but-crazier version -
dtime = diff([0 ;time]);
for count = 1:numel(time)
ind1 = find(dtime<0,1,'last')-1;
time(ind1) = time(ind1)-1;
dtime(ind1+1) = 0;
dtime(ind1) = dtime(ind1)-1;
end
Even crazier version -
dtime = diff([0 ;time]);
ind1 = numel(dtime);
for count = 1:numel(time)
ind1 = find(dtime(1:ind1)<0,1,'last')-1;
time(ind1) = time(ind1)-1;
dtime(ind1) = dtime(ind1)-1;
end
Some average computation runtimes for these versions with various datasizes -
Datasize 1: 3432 elements
Version 1 - 0.069 sec
Version 2 - 0.042 sec
Version 3 - 0.034 sec
Datasize 2: 20 Million elements
Version 1 - 37029 sec
Version 2 - 23303 sec
Version 3 - 20040 sec
So apparently I had 3 other different problems in the data source that I think could have stalled the routine Divakar proposed. Anyway, I thought it was too slow, so I started thinking about another solution and came up with a super quick vectorized one.
Given that the observations I wanted to modify fall in a known interval of time, the function just looks for every observation falling in that interval and modifies it as I want (-1 in my case).
function [ datetime ] = correct_date( datetime,starttime, endtime)
%#datetime is my vector of dates and times in matlab numerical format
%#starttime is the starting hour of the interval expressed in datestr format. e.g. '17:00:00'
%#endtime is the ending hour of the interval expressed in datestr format. e.g. '23:59:59'
if (nargin < 1) || (nargin > 3),
error('Requires 1 to 3 input arguments.')
end
% default values
if nargin == 1,
starttime='17:00';
endtime='23:59:59';
elseif nargin == 2,
endtime='23:59:59';
end
tvec=[datenum(starttime) datenum(endtime)];
tvec=tvec-floor(tvec); %#As I am working on multiples days I need to isolate only HH:MM:SS for my interval limits
temp=datetime-floor(datetime); %#same motivation as in the previous line
idx=find(temp>=tvec(1)&temp<=tvec(2)); %#logical find the indices
datetime(idx)=datetime(idx)-1; %#modify them as I want
clear tvec temp idx
end
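The core of that function can be illustrated on a few made-up timestamps. This is a minimal sketch, assuming the misdated trades are exactly those with a time of day at or after 17:00; the datenum values are hypothetical:

```matlab
% Hypothetical datenum values: two evening trades stamped with the next trading day
t = 735504 + [17.5; 18.6; 8.5; 9.3] / 24;

startFrac = 17 / 24;                    % 17:00 expressed as a fraction of a day
isEvening = mod(t, 1) >= startFrac;     % time of day at or after 17:00
t(isEvening) = t(isEvening) - 1;        % move those trades back one calendar day
```

After the shift the vector is strictly increasing, which is what functions that need non-decreasing times require. A logical mask is used here instead of find, since indexing accepts it directly.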
So I have a list of 190 numbers ranging from 1:19 (each number is repeated 10 times) that I need to sample 10 at a time. Within each sample of 10, I don't want the numbers to repeat, I tried incorporating a while loop, but computation time was way too long. So far I'm at the point where I can generate the numbers and see if there are repetitions within each subset. Any ideas?
N=[];
for i=1:10
N=[N randperm(19)];
end
B=[];
for j=1:10
if length(unique(N(j*10-9:j*10)))<10
B=[B 1];
end
end
sum(B)
Below is an updated version of the code. This might show more clearly what I want (19 targets taken 10 at a time without repetition until all 19 targets have been repeated 10 times).
nTargs = 19;
pairs = nchoosek(1:nTargs, 10);
nPairs = size(pairs, 1);
order = randperm(nPairs);
values=randsample(order,19);
targs=pairs(values,:);
Alltargs=false;
while ~Alltargs
targs=pairs(randsample(order,19),:);
B=[];
for i=1:19
G=length(find(targs==i))==10;
B=[B G];
end
if sum(B)==19
Alltargs=true;
end
end
Here are some very simple steps to do this, basically you just shuffle the vector once, and then you grab the last 10 unique values:
v = repmat(1:19,1,10);
v = v(randperm(numel(v)));
[a idx]=unique(v);
result = unique(v);
v(idx)=[];
The algorithm should be fairly efficient, if you want to do the next 10, just run the last part again and combine the results into a totalResult
You want to sample the numbers 1:19 randomly in blocks of 10 without repetitions. The Matlab function 'randsample' has an optional 'replacement' argument which you can set to 'false' if you do not want repetitions. For example:
N = [];
replacement = false;
for i = 1:19
N = [N randsample(19,10,replacement)];
end
This generates a 19 x 10 matrix of random integers in the range [1,..,19] without repetitions within each column.
Edit: Here is a solution that addresses the requirement that each of the integers [1,..,19] occurs exactly 10 times, in addition to no repetition within each column / sample:
nRange = 19; nRep = 10;
valueRep = true; % true while there are repetitions
nLoops = 0; % count the number of iterations
while valueRep
l = zeros(1,nRep);
v = [];
for m = 1:nRep
v = [v, randperm(nRange,nRange)];
end
m1 = reshape(v,nRep,nRange);
for n = 1:nRep
l(n) = length(unique(m1(:,n)));
end
if all(l == nRep)
valueRep = false;
end
nLoops = nLoops + 1;
end
result = m1;
For the parameters in the question it takes about 300 iterations to find a result.
I think you should approach this constructively.
It's easy to find an initial set of 19 groups that fulfill your conditions just by rearranging the series 1:19: series1 = repmat(1:19,1,10); and rearranged = reshape(series1,10,19);
Then shuffle the values:
I would select two random columns, copy them, and switch the values at two random positions.
Then test whether the result fulfills your condition, e.g. test = @(x) numel(unique(x))==10, and if yes, replace your columns.
Just keep shuffling until your time runs out or you are happy.
Of course, you might come up with more efficient shuffling or testing.
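The steps above might look roughly like this. It is a sketch, not a tuned implementation: the number of swap attempts (nSwaps) is an arbitrary assumption, and the result is not guaranteed to be uniformly distributed over all valid arrangements.

```matlab
nTargs = 19; blockLen = 10;

% Valid starting point: each column holds 10 consecutive values of the
% repeated series 1:19, so no column contains a repeat
groups = reshape(repmat(1:nTargs, 1, blockLen), blockLen, nTargs);

isValid = @(x) numel(unique(x)) == blockLen;   % column has no repeated value
nSwaps  = 5000;                                % assumed number of attempts

for s = 1:nSwaps
    cols = randperm(nTargs, 2);                % two distinct random columns
    rows = randi(blockLen, 1, 2);              % one random position in each
    trial = groups(:, cols);                   % work on a copy
    tmp = trial(rows(1), 1);
    trial(rows(1), 1) = trial(rows(2), 2);
    trial(rows(2), 2) = tmp;
    % Accept the swap only if both columns are still repeat-free
    if isValid(trial(:, 1)) && isValid(trial(:, 2))
        groups(:, cols) = trial;
    end
end
```

Because every step only exchanges two values, each number 1:19 still appears exactly 10 times overall, and each column of groups is one sample of 10 without repetition.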
I was given another solution through the MATLAB forum that works pretty well (Credit to Niklas Nylen over on the MATLAB forum). Computation time is pretty low too. It basically shuffles the numbers until there are no repetitions within every 10 values. Thanks all for your help.
y = repmat(1:19,1,10);
% Run enough iterations to get the output random enough, I selected 100000
for ii = 1:100000
% Select random index
index = randi(length(y)-1);
% Check if it is allowed to switch places
if y(index)~=y(min(index+10, length(y))) && y(index+1)~=y(max(1,index-9))
% Make the switch
yTmp = y(index);
y(index)=y(index+1);
y(index+1)=yTmp;
end
end