How to loop through a vector that corresponds to another vector in MATLAB

I have a column of years from 1981 to 2000 that corresponds to another column of prices for a good. I am trying to make a loop that iterates through only the years from 1990 to 2000 and prints the prices in order that correlates with their year. I have this code so far but I'm not sure why it won't run, any help would be awesome.
for x=1:year == 1990:2000
v = find(isfinite(price));
v
end

If your input data is something like this where the first column is year and the second column is price
data = [1990, 2.50;
1991, 3.00;
...
2000, 4.00];
You can loop through the years in your for loop (Note the syntax and how this compares to the one in your post) and then find the second column where the price corresponds to that year using logical indexing.
for year = 1990:2000
% Grabs column 2 where column 1 is equal to the year
price = data(data(:,1) == year, 2);
end
Even if your data lives in two different data structures you can do something similar (as long as they are the same size).
years = [1990, 1991, 1992, ... 2000];
prices = [2.50, 3.00, 3.50, ... 4.00];
for year = 1990:2000
price = prices(years == year);
end
Edit
If you are for-loop averse, you can definitely do the same thing without a for loop. The most robust solution is to use arrayfun.
annualPrices = arrayfun(@(x) prices(years == x), years, 'UniformOutput', false);
This will return a cell array where each element holds all prices for a given year.
If you're guaranteed to have exactly one price per year, however, you can omit the UniformOutput argument and you'll get a numeric array of prices.
annualPrices = arrayfun(@(x) prices(years == x), years);
One of the benefits is that neither of these approaches requires extra operations (such as sorting) on your data.
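Since the original question asked to print the prices for 1990 to 2000 next to their years, here is a minimal sketch of that step. It assumes one price per year stored in column vectors; the price values are made up for illustration.

```matlab
% Hypothetical data: one price per year, 1981 through 2000
years  = (1981:2000).';
prices = (1:20).' * 0.25;            % made-up prices, for illustration only

% Logical mask for the years of interest
mask = years >= 1990 & years <= 2000;
selYears  = years(mask);
selPrices = prices(mask);

% Print each selected year next to its price
for k = 1:numel(selYears)
    fprintf('%d: %.2f\n', selYears(k), selPrices(k));
end
```

The mask keeps the rows in their original order, so no sorting is needed as long as the input is already in year order.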

Example 1:
Let's make a matrix holding your data:
M = ones(100,2); % 1st column for the year and the second column for the prices
M(:,1) = (1951:2050).';
M(:,2) = rand(100,1);
A one liner to your question can be as follows:
M((M(:,1)<= 2000 & M(:,1) >= 1990),2)
Example 2:
If you have prices and years in two vectors, first make sure your years are sorted:
[sortedYears,Idx] = sort(years); % sort the years vector
sortedPrices = prices(Idx); % use the index to sort the prices in the same order
Now use the following one liner:
sortedPrices((sortedYears<= 2000 & sortedYears >= 1990));

Related

How to create a matrix using a for loop MATLAB

I have three vectors of the same size, pressure, year and month. Basically I would like to create a matrix of pressure values that correspond with the months and years that they were measured using a for loop. It should be 12x100 in order to appear as 12 months going down and 100 years going left to right.
I am just unsure of how to actually create the matrix, besides creating the initial structure. So far I can only find pressure for a single month (below I did January) for all years.
A = zeros([12, 100]);
for some_years = 1900:2000
press = pressure(year == some_years & month == 1)
end
And I can only print the pressures for January for all years, but I would like to store all pressures for all months of the years in a matrix. If anyone can help it would be greatly appreciated. Thank You.
Starting with variables pressure, year, and month. I would do something like:
A fairly robust solution using for loops:
T = length(pressure); % number of observations; all three vectors must have this length
if(length(year) ~= T || length(month) ~= T)
error('length mismatch');
end
min_year = min(year); % this year will correspond to index 1
max_year = max(year);
A = NaN(max_year - min_year + 1, 12); % I like to initialize to NaN (not a number)
% this way missing values are NaN
% rows are years, columns are months; use A.' for the 12-by-years layout from the question
for i=1:T
year_index = year(i) - min_year + 1;
month_index = month(i); % I'm assuming months run from 1 to 12
A(year_index, month_index) = pressure(i);
end
If your data is SUPER nicely formatted....
If your data has NO missing, duplicate, or out of order year month pairs (i.e. data is formatted like):
year month pressure
1900 1 ...
1900 2 ...
... ... ...
1900 12 ...
1901 1 ...
... ... ...
Then you could do the ONE liner:
A = reshape(pressure, 12, max(year) - min(year) + 1)';
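A loop-free alternative that still tolerates missing months is accumarray. The sketch below is only an illustration: yr, mo and p are hypothetical stand-ins for year, month and pressure, and it builds the 12-by-years layout the question asked for. Note that with the default accumulation function, duplicate year/month pairs would be summed rather than overwritten.

```matlab
% Hypothetical sample data (yr, mo, p stand in for year, month, pressure)
yr = [1900; 1900; 1901; 1901];
mo = [1; 2; 1; 12];
p  = [10; 20; 30; 40];

nYears = max(yr) - min(yr) + 1;
subs = [mo, yr - min(yr) + 1];                    % [row, column] = [month, year index]
A = accumarray(subs, p, [12, nYears], [], NaN);   % cells with no data become NaN
```

Here A(12, 2) holds the December 1901 value, and any month without a measurement stays NaN.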

find unique times among years in time series

Suppose I have a date vector, tt, and a corresponding data series, aa. For example:
dd = datestr(datenum('2007-01-01 00:00','yyyy-mm-dd HH:MM'):1/24:...
datenum('2011-12-31 23:00','yyyy-mm-dd HH:MM'),...
'yyyy-mm-dd HH:MM');
tt = datevec(datenum(dd,'yyyy-mm-dd HH:MM'));
tt(1002,:) = [];
aa = rand(length(tt),1)
How is it possible to ensure that the hours and days are consistent among the years?
For example, I only want to keep times that are the same among years e.g.
2009-01-01 01:00
would be the same as
2010-01-01 01:00
and so on.
If one year has a measurements at
2009-01-01 02:00
but yyyy-01-01 02:00
is not present in the other years, this time should be removed.
I would like to return tt and aa with only those times that are consistent among the years kept. How can this be done?
I was considering finding the indices for the unique years first as:
[~,~,iyears] = unique(tt(:,1),'rows');
and then find the indices for the unique month, day, and hour as:
[~,~,iid] = unique(tt(:,2:4),'rows');
but I am not sure how to combine these to give the desired output?
The solution below uses a loop to store data in an uninitialized array, which can be inefficient, but unless your dataset is huge (with many, many years) it should do the job. The general idea is to break the dataset up into years. I am storing the resulting time vectors in a cell array, because they probably won't have the same length. I then do a set intersection of all the time vectors to get a vector of common times. From there it's straightforward.
years = unique(tt(:,1), 'rows');
% Put the "sub-times" of each year into cell array
for ii = 1:length(years)
times_each_year{ii} = tt(tt(:,1)==years(ii),2:end);
end
% Do intersection of all "sub-times" sets
common_times = times_each_year{1};
for ii = 2:length(years)
common_times = intersect(common_times, times_each_year{ii},'rows');
end
% Find and delete the points that are not member of the "sub-times":
idx = ~ismember(tt(:,2:end),common_times,'rows');
deleted_points = datestr(tt(idx,:)); % for later review
tt(idx,:) = [];
However, note that the deleted_points vector contains more points than one might expect. That's because 2008 was a leap year, so all the points corresponding to February 29th were deleted.
Another such oddity might await you if your data is "contaminated" by daylight savings time.
Code
a1 = str2num(datestr(tt,'mmddHHMM')); %// If in your data minutes are always 00, you can use 'mmddHH' instead and save some runtime
k1 = unique(a1);
gt1 = histc(a1,k1);
valid_rows = ismember(a1,k1(gt1==max(gt1)));
new_tt = tt(valid_rows,:); %// Desired tt output
new_aa = aa(valid_rows,:); %// Desired aa output
Explanation
To understand how it works, let's test out the code at a micro-level. Let's assume some small data that corresponds to tt -
data1 = [4 5 1 4 5 1 4 5 6]
data1 stands in for tt: it is data collected over a few sets, just as tt holds data over a few years, with the month, day, hour and minute combined into a single number.
Notice that it represents data from three sets/years: {4,5}, {1,4,5} and {1,4,5,6}. Our job is to find all those values in data1 that are repeated across all three years/sets of data. Thus, the final output must be {4,5}.
Let's see how this can be coded up.
Step 1: Get the unique values
unique_val = unique(data1)
We would have - [1 4 5 6]
Step 2: Get the count of unique values in the data
count_unique_val = histc(data1,unique_val)
Output is - [2 3 3 1]
Step 3: Get the indices from the unique values array where their counts are equal to the maximum of counts, indicating those are the unique values that are repeated across all the sets.
index1 = count_unique_val==max(count_unique_val)
Output comes out as - [0 1 1 0]
Step 4: Get those "consistent" unique values
consistent_val = unique_val(index1)
Gives us - [4 5], which is what we were looking for.
Step 5: Finally get the indices where the consistent data is present,
which can be used later on to select the rows with "consistent" data.
index_consistent_val = ismember(data1,consistent_val)
Output is - [1 1 0 1 1 0 1 1 0], which makes sense too.
Please note that in the original code a1 = str2num(datestr(tt,'mmddHHMM')); gets us the single parameter from the four parameters of month, date, hour and minutes as discussed in the comments earlier too.
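The counting idea can also be written without the string round trip by labelling each month/day/hour combination with unique(...,'rows') and counting with accumarray. This is a sketch under two assumptions that are not in the original answer: no timestamp is duplicated within a year, and a time is kept only if its count equals the number of distinct years (rather than the maximum count). The tiny tt below is a made-up stand-in with columns [year month day hour].

```matlab
% Tiny stand-in for tt: [year month day hour]
tt = [2009 1 1 1;
      2009 1 1 2;
      2010 1 1 1;
      2011 1 1 1;
      2011 1 1 3];
aa = (1:size(tt, 1)).';                        % made-up data series

[~, ~, tIdx] = unique(tt(:, 2:end), 'rows');   % label each month/day/hour combination
nYears = numel(unique(tt(:, 1)));              % number of distinct years
counts = accumarray(tIdx, 1);                  % occurrences of each combination
keep   = counts(tIdx) == nYears;               % present in every year

new_tt = tt(keep, :);
new_aa = aa(keep);
```

Comparing against the number of years avoids keeping a time that is merely the most frequent one when no time occurs in every year.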

find the indices to calculate the monthly averages of some hours in a time series

If I have one year worth of data
Jday = datenum('2010-01-01 00:00','yyyy-mm-dd HH:MM'):1/24:...
datenum('2010-12-31 23:00','yyyy-mm-dd HH:MM');
dat = rand(length(Jday),1);
and would like to calculate the monthly averages of 'dat', I would use:
% monthly averages
dateV = datevec(Jday);
[~,~,b] = unique(dateV(:,1:2),'rows');
monthly_av = accumarray(b,dat,[],@nanmean);
I would, however, like to calculate the monthly averages for the points that occur during the day i.e. between hours 6 and 18, how can this be done?
I can isolate the hours I wish to use in the monthly averages:
idx = dateV(:,4) >= 6 & dateV(:,4) <= 18;
and can then change 'b' to include only these points by:
b(double(idx) == 0) = 0;
and then calculate the averages by
monthly_av_new = accumarray(b,dat,[],@nanmean);
but this doesn't work because accumarray can only work with positive integers thus I get an error
Error using accumarray
First input SUBS must contain positive integer subscripts.
What would be the best way of doing what I've outlined? Keep in mind that I do not want to alter the variable 'dat' when doing this i.e. remove some values from 'dat' prior to calculating the averages.
Thinking about it, would the best solution be
monthly_av = accumarray(b(idx),dat(idx),[],@nanmean);
You almost have it. Just use logical indexing with idx in b and in dat:
monthly_av_new = accumarray(b(idx),dat(idx),[],@nanmean);
(and the line b(double(idx) == 0) = 0; is no longer needed).
This way, b(idx) contains only the indices corresponding to your desired hour interval, and dat(idx) contains the corresponding values.
EDIT: Now I see you already found the solution! Yes, I think it's the best approach.
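To see the masking at work on something small, here is a sketch with made-up vectors. monthIdx and hourOfDay are hypothetical stand-ins for b and dateV(:,4), and @mean is swapped in for nanmean on the assumption that the data contains no NaN values.

```matlab
% Made-up stand-ins: month label, hour of day, and value for six samples
monthIdx  = [1; 1; 1; 2; 2; 2];
hourOfDay = [3; 7; 12; 6; 18; 22];
dat       = [10; 20; 30; 40; 50; 60];

% Keep only samples taken between 06:00 and 18:00 inclusive
idx = hourOfDay >= 6 & hourOfDay <= 18;

% Average the kept samples per month; dat itself is left untouched
monthly_av_day = accumarray(monthIdx(idx), dat(idx), [], @mean);
```

Because the same mask is applied to both the subscripts and the values, the two inputs stay aligned and no zero subscripts ever reach accumarray.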

How do I create ranking (descending) table in matlab based on inputs from two separate data tables? [closed]

I have four data sets (please bear with me here):
1st Table: List of 10 tickers (stock symbols) in one column in txt format in matlab.
2nd table: dates in numerical format in one column (10 days in double format).
3rd table: I have 10*10 data set of random numbers (assume 0-1 for simplicity). (Earnings Per Share growth EPS for example)--so I want high EPS growth in my ranking for portfolio construction.
4th table: I have another 10*10 data set of random numbers (assume 0-1 for simplicity). (Price to earnings ratios for example daily).-so I want low P/E ratio in my ranking for portfolio construction.
NOW: I want to rank portfolio of stocks each day made up of 3 stocks (largest values) from table one for a particular day and bottom three stocks from table 2 (smallest values). The output must be list of tickers for each day (3 in this case) based on combined ranking of the two factors (table 3 & 4 as described).
Any ideas? In short I need to end up with a top bucket with three tickers...
It is not entirely clear from the post what you are trying to achieve. Here is a take based on guessing, with various options.
Your first two "tables" store symbols for stocks and days (irrelevant for ranking). Your third and fourth are scores arranged in a stock x day manner. Let's assume stocks vertical, days horizontal and stocks symbolized with a value in [1:10].
N = 10; % num of stocks
M = 10; % num of days
T3 = rand(N,M); % table 3 stocks x days
T4 = rand(N,M); % table 4 stocks x days
Sort the score tables in ascending and descending order (to get upper and lower scores per day, i.e. per column):
[Sl,L] = sort(T3, 'descend');
[Ss,S] = sort(T4, 'ascend');
Keep three largest and smallest:
largest = L(1:3,:); % bucket of 3 largest per day
smallest = S(1:3,:); % bucket of 3 smallest per day
IF you need the ones in both (a 0 marks an empty slot):
% Intersection of both buckets
indexI = zeros(3,M);
for i=1:M
z = largest(ismember(largest(:,i),smallest(:,i)), i); % index column i, not column 1
if ~isempty(z)
indexI(1:length(z),i) = z;
end
end
IF you need the ones in either one (a 0 marks an empty slot):
% Union of both buckets
indexU = zeros(6,M);
for i=1:M
z = unique([largest(:,i),smallest(:,i)]);
indexU(1:length(z),i) = z;
end
IF you need a ranking of scores/stocks from the set of largest_of_3 and smallest_of_4:
scoreAll = [Sl(1:3,:); Ss(1:3,:)];
indexAll = [largest;smallest];
[~,indexSort] = sort(scoreAll,'descend');
for i=1:M
indexBest(:,i) = indexAll(indexSort(1:3,i),i);
end
UPDATE
To get a weighted ranking of the final scores, define a weight vector (one weight per row of scoreAll) and use one of the two options below, then sort scoreAllW instead of scoreAll:
w = [0.3; 0.3; 0.3; 0.7; 0.7; 0.7];
scoreAllW = scoreAll.*repmat(w,1,M); % Option 1
scoreAllW = bsxfun(@times, scoreAll, w); % Option 2
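Another possible reading of "combined ranking" is a rank sum: rank every stock per day on each factor, add the two ranks, and keep the three stocks with the lowest total. This is only one interpretation, not something the question confirms, and the equal weighting and the small score matrices below are assumptions made for illustration.

```matlab
% Hypothetical scores: 5 stocks (rows) x 2 days (columns)
T3 = [0.9 0.1; 0.8 0.2; 0.1 0.9; 0.5 0.5; 0.3 0.8];   % higher is better
T4 = [0.1 0.9; 0.2 0.8; 0.9 0.1; 0.5 0.5; 0.7 0.2];   % lower is better

% Per-day rank of every stock (1 = best); sorting the sort order inverts the permutation
[~, order3] = sort(T3, 1, 'descend');
[~, rank3]  = sort(order3, 1);
[~, order4] = sort(T4, 1, 'ascend');
[~, rank4]  = sort(order4, 1);

% Combine with equal weights and keep the three best stocks per day
combined  = rank3 + rank4;
[~, best] = sort(combined, 1, 'ascend');
top3 = best(1:3, :);                                   % stock indices, one column per day
```

The rows of top3 are stock indices, so with a cell array of ticker symbols in the same order, indexing that array with top3(:, d) would give the bucket for day d.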

Daily values to Monthly Means for Multiple Years Matlab

I have observed daily data that I need to compare to generated Monthly data so I need to get a mean of each month over the thirty year period.
My observed data set is currently in 365x31 with rows being each day (no leap years!) and the extra column being the month number (1-12).
The problem I am having is that I can only seem to get a script to compute the mean over all years, i.e. I cannot figure out how to make the script do it for each column separately. An example of the data is below:
1 12 14
1 -15 10
2 13 3
2 2 37
...all the way to 12 for 365 rows
SO: to recap, I need to get the mean of [12; -15; 13; 2] then [14; 10; 3; 37] and so on.
I have been trying to use the unique() function to loop through which works for getting the number rows to average but incorrect means. Now I need it to do each month(28-31 rows) and column individually. Result should be a 12x30 matrix. I feel like I am missing something SIMPLE. Code:
u = unique(m); %get unique values of m (months) ie) 1-12
for i=1:length(u)
month(i) = mean(obatm(u(i), (2:31))); % the average for each month of each year
end
Appreciate any ideas! Thanks!
You can simply filter the rows for each month and then apply mean, like so:
month = zeros(12, size(obatm, 2) - 1); % one column per year
for k = 1:12
month(k, :) = mean(obatm(obatm(:, 1) == k, 2:end), 1); % skip column 1 (the month number)
end
EDIT:
If you want something fancy, you can also do this:
cols = size(obatm, 2) - 1;
subs = bsxfun(@plus, obatm(:, 1), (0:12:12 * (cols - 1)));
vals = obatm(:, 2:end);
month = reshape(accumarray(subs(:), vals(:), [12 * cols, 1], @mean), 12, cols)
Look, Ma, no loops!