Extracting values in a matrix based on values of another matrix without a for loop in Octave - matlab

I'd like to know if there's a way to operate on the values of one matrix, based separatedly on the values of each row from another matrix, without using a for loop.
One specific example below.
data is a ~500k row matrix with three columns: the first is a list of hours/dates (I have them as serial numbers) covering 24 hours a day for several years, the second column is a location ID, and the third one is the electricity cost in that location.
hours has two columns: the first one is a list of dates/hours as a serial number and the second one is a certain upper limit specific for that hour.
What I need to do is find, for every date and hour in hours, the largest value of electrical cost that is below the upper limit set in hours, and save the ID of the corresponding location in a third column of hours if the id is unique, or zero if there's no location that satisfy the condition or if there's more than one.
A symplified version of the code I'm using:
#Add a third column for the hours matrix
ids=zeros(rows(hours),1)
hours=horzcat(hours,ids)
for i=1:rows(hour)
#Get the data for all locations in that hour
idx=(data(:,1)==hour(i,1) )
hourlydata=(data(idx,:))
#Get the ID for the maximum below the limit in that hour
idx=(hourlydata(:,3)<hour(i,2))
idlimit=(hourlydata(idx,3)==max(hourlydata(idx,3))
dataid=hourlydata(idlimit,2)
#Check if the data exists and is unique
if(rows(dataid)==1)
id=dataid(1,1)
else
id=0
endif
#Save the ID
hour(i,3)=id
endfor
Is there any way to do this without using the for loop?
(I already optimized the data matrix to make it as small as possible, but it's still pretty big, so I might encounter memory constraints when trying to implement a solution)

You can work with arrayfun. Suppose your data is
data = [datenum('2014-01-17'), 1, 13;datenum('2014-01-18'), 2, 7]
hours = [datenum('2014-01-17'), 17; datenum('2014-01-18'), 3]
Then define a selector function
function id = select( d, limit, data )
idx = (data(:,1)==d );
hourlydata = (data(idx,:));
idx = (hourlydata(:,3)<limit);
idlimit = (hourlydata(idx,3)==max(hourlydata(idx,3)));
dataid=hourlydata(idlimit,2);
if length(dataid)==1
id=dataid(1,1)
else
id=0
end
end
Now we find a vector the same size as hours containing the id
arrayfun(#(d, limit) select(d, limit, data), hours(:,1), hours(:,2))
ans =
1
0
You can easily merge this vector with hours.
Now i doubt that this is much faster, but no loop. Works with MATLAB, haven't checked with Octave.

Related

MATLAB Extracting Column Number

My goal is to create a random, 20 by 5 array of integers, sort them by increasing order from top to bottom and from left to right, and then calculate the mean in each of the resulting 20 rows. This gives me a 1 by 20 array of the means. I then have to find the column whose mean is closest to 0. Here is my code so far:
RandomArray= randi([-100 100],20,5);
NewArray=reshape(sort(RandomArray(:)),20,5);
MeanArray= mean(transpose(NewArray(:,:)))
X=min(abs(x-0))
How can I store the column number whose mean is closest to 0 into a variable? I'm only about a month into coding so this probably seems like a very simple problem. Thanks
You're almost there. All you need is a find:
RandomArray= randi([-100 100],20,5);
NewArray=reshape(sort(RandomArray(:)),20,5);
% MeanArray= mean(transpose(NewArray(:,:))) %// gives means per row, not column
ColNum = find(abs(mean(NewArray,1))==min(abs(mean(NewArray,1)))); %// gives you the column number of the minimum
MeanColumn = RandomArray(:,ColNum);
find will give you the index of the entry where abs(mean(NewArray)), i.e. the absolute values of the mean per column equals the minimum of that same array, thus the index where the mean of the column is closest to 0.
Note that you don't need your MeanArray, as it transposes (which can be done by NewArray.', and then gives the mean per column, i.e. your old rows. I chucked everything in the find statement.
As suggested in the comment by Matthias W. it's faster to use the second output of min directly instead of a find:
RandomArray= randi([-100 100],20,5);
NewArray=reshape(sort(RandomArray(:)),20,5);
% MeanArray= mean(transpose(NewArray(:,:))) %// gives means per row, not column
[~,ColNum] = min(abs(mean(NewArray,1)));
MeanColumn = RandomArray(:,ColNum);

Changing numbers for given indices between matrices

I'm struggling with one of my matlab assignments. I want to create 10 different models. Each of them is based on the same original array of dimensions 1x100 m_est. Then with for loop I am choosing 5 random values from the original model and want to add the same random value to each of them. The cycle repeats 10 times chosing different values each time and adding different random number. Here is a part of my code:
steps=10;
for s=1:steps
for i=1:1:5
rl(s,i)=m_est(randi(numel(m_est)));
rl_nr(s,i)=find(rl(s,i)==m_est);
a=-1;
b=1;
r(s)=(b-a)*rand(1,1)+a;
end
pert_layers(s,:)=rl(s,:)+r(s);
M=repmat(m_est',s,1);
end
for k=steps
for m=1:1:5
M_pert=M;
M_pert(1:k,rl_nr(k,1:m))=pert_layers(1:k,1:m);
end
end
In matrix M I am storing 10 initial models and want to replace the random numbers with indices from rl_nr matrix into those stored in pert_layers matrix. However, the last loop responsible for assigning values from pert_layers to rl_nr indices does not work properly.
Does anyone know how to solve this?
Best regards
Your code uses a lot of loops and in this particular circumstance, it's quite inefficient. It's better if you actually vectorize your code. As such, let me go through your problem description one point at a time and let's code up each part (if applicable):
I want to create 10 different models. Each of them is based on the same original array of dimensions 1x100 m_est.
I'm interpreting this as you having an array m_est of 100 elements, and with this array, you wish to create 10 different "models", where each model is 5 elements sampled from m_est. rl will store these values from m_est while rl_nr will store the indices / locations of where these values originated from. Also, for each model, you wish to add a random value to every element that is part of this model.
Then with for loop I am choosing 5 random values from the original model and want to add the same random value to each of them.
Instead of doing this with a for loop, generate all of your random indices in one go. Since you have 10 steps, and we wish to sample 5 points per step, you have 10*5 = 50 points in total. As such, why don't you use randperm instead? randperm is exactly what you're looking for, and we can use this to generate unique random indices so that we can ultimately use this to sample from m_est. randperm generates a vector from 1 to N but returns a random permutation of these elements. This way, you only get numbers enumerated from 1 to N exactly once and we will ensure no repeats. As such, simply use randperm to generate 50 elements, then reshape this array into a matrix of size 10 x 5, where the number of rows tells you the number of steps you want, while the number of columns is the total number of points per model. Therefore, do something like this:
num_steps = 10;
num_points_model = 5;
ind = randperm(numel(m_est));
ind = ind(1:num_steps*num_points_model);
rl_nr = reshape(ind, num_steps, num_points_model);
rl = m_est(rl_nr);
The first two lines are pretty straight forward. We are just declaring the total number of steps you want to take, as well as the total number of points per model. Next, what we will do is generate a random permutation of length 100, where elements are enumerated from 1 to 100, but they are in random order. You'll notice that this random vector uses only a value within the range of 1 to 100 exactly once. Because you only want to get 50 points in total, simply subset this vector so that we only get the first 50 random indices generated from randperm. These random indices get stored in ind.
Next, we simply reshape ind into a 10 x 5 matrix to get rl_nr. rl_nr will contain those indices that will be used to select those entries from m_est which is of size 10 x 5. Finally, rl will be a matrix of the same size as rl_nr, but it will contain the actual random values sampled from m_est. These random values correspond to those indices generated from rl_nr.
Now, the final step would be to add the same random number to each model. You can certainly use repmat to replicate a random column vector of 10 elements long, and duplicate them 5 times so that we have 5 columns then add this matrix together with rl.... so something like:
a = -1;
b = 1;
r = (b-a)*rand(num_steps, 1) + a;
r = repmat(r, 1, num_points_model);
M_pert = rl + r;
Now M_pert is the final result you want, where we take each model that is stored in rl and add the same random value to each corresponding model in the matrix. However, if I can suggest something more efficient, I would suggest you use bsxfun instead, which does this replication under the hood. Essentially, the above code would be replaced with:
a = -1;
b = 1;
r = (b-a)*rand(num_steps, 1) + a;
M_pert = bsxfun(#plus, rl, r);
Much easier to read, and less code. M_pert will contain your models in each row, with the same random value added to each particular model.
The cycle repeats 10 times chosing different values each time and adding different random number.
Already done in the above steps.
I hope you didn't find it an imposition to completely rewrite your code so that it's more vectorized, but I think this was a great opportunity to show you some of the more advanced functions that MATLAB has to offer, as well as more efficient ways to generate your random values, rather than looping and generating the values one at a time.
Hopefully this will get you started. Good luck!

How do I loop a portfolio return calculation for multiple periods in matlab?

I have the following data/sets (simplified version to make the example clear):
3*10 matrix of stocks identified by a number for a given period (3 stocks (my portfolio) in rows * by 10 days (in columns) named indexBest.
10*10 matrix of returns for each period for each security in my universe named dailyret (my universe of stocks is 10).
I want to create a loop where I am able to calculate the sum returns of each portfolio for each period into one matrix ideally 1*10 or 10*1 (returns * date or vice versa).
I have done this for a single period for a portfolio, see below: but I want to be able to automate this process instead of doing all this 10* over for each portfolio for each period. Please help!
Portfolio_1_L= indexBest(:,1); %helps me get the portfolio constituents for period one (3 stocks basically).
Returns_1_L= dailyret(Portfolio_1(:,1));%to calculate returns of the portfolio for period 1 I have referenced the extraction of returns to the portfolio constituents.
Sum_ret_Port_1_L=sum(Returns_1_L); %sum return of the portfolio for period one
How do I loop this process for all other 9 periods?
Use a for loop and use the index variable in place of hardcoded 1 in your example code. Also index the output to store the result for each day.
for day = 1:10
Portfolio_1_L = indexBest(:,day);
Returns_1_L = dailyret(Portfolio_1_L);
Sum_ret_Port_1_L(day) = sum(Returns_1_L);
end

indexing to find corresponding number

I have a time series of measurements taken at different depths of a water column. I have divided these into individual cells (for later) and require some help on how to complete the following: e.g.
time = [733774,733774,733775,733775,733775,733776,733776];
bthD = [20,10,0,15,10,20,10];
bthA = (1000:100:1600);
%Hypsographic
Hypso = [(10:1:20)',(1000:100:2000)'];
d = [1,1.3,1,2.5,2.5,1,1.2];
data = horzcat(time',bthD',d');
uniqueTimes = unique(time);
counts = hist(time,uniqueTimes);
newData = mat2cell(data,counts,length(uniqueTimes));
So, in newData I have three cells, that correspond to different days of measurements, in each cell I have newData(:,1) being time, newData(:,2) being depth, and newData(:,3) being the measurement. I would like to find what the area is at each depth in the cells, the area at different depths is given in the variable 'Hypso'.
How could I achieve this?
Your problem formulation is excellent! Very easy to understand what you need here. All you need is the function interp1. Use the first column of Hypso, I assume, as your depth, and the second column as the area. You can use the vectorized ability of the interp1 function to find all values in one call:
areaAtDepth = interp1(Hypso(:,1),Hypso(:,2),bthD)
areaAtDepth =
Columns 1 through 6
2000 1000 NaN 1500 1000 2000
Column 7
1000
You'll notice the Nan in the third column of the output. This is because it's associated depth, 0, is outside the range of the data, or support of the data I believe. You'll need to decide what you want to do when data is outside the range, or perhaps it never should be, so an error should be logged; it's up to you! Let me know if you have any more questions!

Matlab: Code Performance Issue Using "ismember"

I need this section of my code to run faster, as it is called many many times. I am new to Matlab and I feel as though there MUST be a way to do this that is not so round-about. Any help you could give on how to improve the speed of what I have or other functions to look into that would help me perform this task would be appreciated.
(Task is to get only lines of "alldata" where the first column is in the set of "minuteintervals" into "alldataMinutes". "minuteintervals" is just the minimum value of "alldata" column one increasing by twenty to the maximum of alldata.
minuteintervals= min(alldata(:,1)):20:max(alldata(:,1)); %20 second intervals
alldataMinutes= zeros(30000,4);
counter=1;
for x=1:length(alldata)
if ismember(alldata(x,1), minuteintervals)
alldataMinutes(counter,:)= alldata(x,:);
counter= counter+1;
end
end
alldataMinutes(counter:length(alldataMinutes),:)= [];
This should give you what you want, and it should be substantially faster:
minuteintervals = min(alldata(:,1)):20:max(alldata(:,1)); %# Interval set
index = ismember(alldata(:,1),minuteintervals); %# Logical index showing first
%# column values in the set
alldataMinutes = alldata(index,:); %# Extract the corresponding rows
This works by passing a vector of values to the function ISMEMBER, instead of passing values one at a time. The output index is a logical vector the same size as alldata(:,1), with a value of 1 (i.e. true) for elements of alldata(:,1) that are in the set minuteintervals, and a value of 0 (i.e. false) otherwise. You can then use logical indexing to easily extract the rows corresponding to the ones in index, placing them in alldataMinutes.