How to find the index of specific value in a vector? - matlab

I have a 100*20 matrix called pr (power receive in my case) the 100 represent number of users and 20 number of antennas each user receive certain power from the 20 antennas.(more than one user could receive power from the same antenna).
then i find the maximum power each user receive and put it in a 100*1 vector If this maximum values greater than (-112) counter increase. I need to create new vector 20*1 where 20 is the antennas number and count the number of users that receive power greater than(-112) for each antenna
[master_ant,id]=max(pr,[],2); %find vector of max values and vector of the corresponding index
for i=1:100
if(master_ant(i)>=-112) %check the rang
covered_user=covered_user+1;%counter increment
end
end
i tried this
[master_ant,id]=max(pr,[],2);
for i=1:100
if(master_ant(i)>=-112)
covered_user(id)=covered_user(id)+1;

The easiest way to do this is to consider another approach. The function sum, can actually (and is supposed to) do all the job for you.
a = randi([-130, -60],100,20); % Example matrix
covered_user = sum(a>=-112); % One-liner solution

Related

Vectorizing code - How to reduce MATLAB computational time

I have this piece of code
N=10^4;
for i = 1:N
[E,X,T] = fffun(); % Stochastic simulation. Returns every time three different vectors (whose length is 10^3).
X_(i,:)=X;
T_(i,:)=T;
GRID=[GRID T];
end
GRID=unique(GRID);
% Second part
for i=1:N
for j=1:(kmax)
f=find(GRID==T_(i,j) | GRID==T_(i,j+1));
s=f(1);
e=f(2)-1;
counter(X_(i,j), s:e)=counter(X_(i,j), s:e)+1;
end
end
The code performs N different simulations of a stochastic process (which consists of 10^3 events, occurring at discrete moments (T vector) that depends on the specific simulation.
Now (second part) I want to know, as a function of time istant, how many simulations are in a particular state (X assumes value between 1 and 10). The idea I had: create a grid vector with all the moments at which something happens in any simulation. Then, looping over the simulations, loop over the timesteps in which something happens and incrementing all the counter indeces that corresponds to this particular slice of time.
However this second part is very heavy (I mean days of processing on a standard quad-core CPU). And it shouldn't.
Are there any ideas (maybe about comparing vectors in a more efficient way) to cut the CPU time?
This is a standalone 'second_part'
N=5000;
counter=zeros(11,length(GRID));
for i=1:N
disp(['Counting sim #' num2str(i)]);
for j=1:(kmax)
f=find(GRID==T_(i,j) | GRID==T_(i,j+1),2);
s=f(1);
e=f(2)-1;
counter(X_(i,j), s:e)=counter(X_(i,j), s:e)+1;
end
end
counter=counter/N;
stop=find(GRID==Tmin);
stop=stop-1;
plot(counter(:,(stop-500):stop)')
with associated dummy data ( filedropper.com/data_38 ). In the real context the matrix has 2x rows and 10x columns.
Here is what I understand:
T_ is a matrix of time steps from N simulations.
X_ is a matrix of simulation state at T_ in those simulations.
so if you do:
[ut,~,ic]= unique(T_(:));
you get ic which is a vector of indices for all unique elements in T_. Then you can write:
counter = accumarray([ic X_(:)],1);
and get counter with no. of rows as your unique timesteps, and no. of columns as the unique states in X_ (which are all, and must be, integers). Now you can say that for each timestep ut(k) the number of time that the simulation was in state m is counter(k,m).
In your data, the only combination of m and k that has a value greater than 1 is (1,1).
Edit:
From the comments below, I understand that you record all state changes, and the time steps when they occur. Then every time a simulation change a state you want to collect all the states from all simulations and count how many states are from each type.
The main problem here is that your time is continuous, so basically each element in T_ is unique, and you have over a million time steps to loop over. Fully vectorizing such a process will need about 80GB of memory which will probably stuck your computer.
So I looked for a combination of vectorizing and looping through the time steps. We start by finding all unique intervals, and preallocating counter:
ut = unique(T_(:));
stt = 11; % no. of states
counter = zeros(stt,numel(ut));r = 1:size(T_,1);
r = 1:size(T_,1); % we will need that also later
Then we loop over all element in ut, and each time look for the relevant timestep in T_ in all simulations in a vectorized way. And finally we use histcounts to count all the states:
for k = 1:numel(ut)
temp = T_<=ut(k); % mark all time steps before ut(k)
s = cumsum(temp,2); % count the columns
col_ind = s(:,end); % fins the column index for each simulation
% convert the coulmns to linear indices:
linind = sub2ind(size(T_),r,col_ind.');
% count the states:
counter(:,k) = histcounts(X_(linind),1:stt+1);
end
This takes about 4 seconds at my computer for 1000 simulations, so it adds to a little more than one hour for the whole process. Not very quick...
You can try also one or two of the tweaks below to squeeze run time a little bit more:
As you can read here, accumarray seems to work faster in small arrays then histcouns. So may want to switch to it.
Also, computing linear indices directly is a quicker method than sub2ind, so you may want to try that.
implementing these suggestions in the loop above, we get:
R = size(T_,1);
r = (1:R).';
for k = 1:K
temp = T_<=ut(k); % mark all time steps before ut(k)
s = cumsum(temp,2); % count the columns
col_ind = s(:,end); % fins the column index for each simulation
% convert the coulmns to linear indices:
linind = R*(col_ind-1)+r;
% count the states:
counter(:,k) = accumarray(X_(linind),1,[stt 1]);
end
In my computer switching to accumarray and or removing sub2ind gain a slight improvement but it was not consistent (using timeit for testing on 100 or 1K elements in ut), so you better test it yourself. However, this still remains very long.
One thing that you may want to consider is trying to discretize your timesteps, so you will have much less unique elements to loop over. In your data about 8% of the time intervals a smaller than 1. If you can assume that this is short enough to be treated as one time step, then you could round your T_ and get only ~12.5K unique elements, which take about a minute to loop over. You can do the same for 0.1 intervals (which are less than 1% of the time intervals), and get 122K elements to loop over, what will take about 8 hours...
Of course, all the timing above are rough estimates using the same algorithm. If you do choose to round the times there may be even better ways to solve this.

How to identify an optimal subsample from a data set with missing values in MATLAB

I would like to identify the largest possible contiguous subsample of a large data set. My data set consists of roughly 15,000 financial time series of up to 360 periods in length. I have imported the data into MATLAB as a 360 by 15,000 numerical matrix.
This matrix contains a lot of NaNs due to some of the financial data not being available for the entire period. In the illustration, NaN entries are shown in dark blue, and non-NaN entries appear in light blue. It is these light blue non-NaN entries which I would like to ideally combine into an optimal subsample.
I would like to find the largest possible contiguous block of data that is contained in my matrix, while ensuring that my matrix contains a sufficient number of periods.
In a first step I would like to sort my matrix from left to right in descending order by the number of non-NaN entries in each column, that is, I would like to sort by the vector obtained by entering sum(~isnan(data),1).
In a second step I would like to find the sub-array of my data matrix that is at least 72 entries along the first dimension and is otherwise as large as possible, measured by the total number of entries.
What is the best way to implement this?
A big warning (may or may not apply depending on context)
As Oleg mentioned, when an observation is missing from a financial time series, it's often missing for reason: eg. the entity went bankrupt, the entity was delisted, or the instrument did not trade (i.e. illiquid). Constructing a sample without NaNs is likely equivalent to constructing a sample where none of these events occur!
For example, if this were hedge fund return data, selecting a sample without NaNs would exclude funds that blew up and ceased trading. Excluding imploded funds would bias estimates of expected returns upwards and estimates of variance or covariance downwards.
Picking a sample period with the fewest time series with NaNs would also exclude periods like the 2008 financial crisis, which may or may not make sense. Excluding 2008 could lead to an underestimate of how haywire things could get (though including it could lead to overestimate the probability of certain rare events).
Some things to do:
Pick a sample period as long as possible but be aware of the limitations.
Do your best to handle survivorship bias: eg. if NaNs represent delisting events, try to get some kind of delisting return.
You almost certainly will have an unbalanced panel with missing observations, and your algorithm will have to be deal with that.
Another general finance / panel data point, selecting a sample at some time point t and then following it into the future is perfectly ok. But selecting a sample based upon what happens during or after the sample period can be incredibly misleading.
Code that does what you asked:
This should do what you asked and be quite fast. Be aware of the problems though if whether an observation is missing is not random and orthogonal to what you care about.
Inputs are a T by n sized matrix X:
T = 360; % number of time periods (i.e. rows) in X
n = 15000; % number of time series (i.e. columns) in X
T_subsample = 72; % desired length of sample (i.e. rows of newX)
% number of possible starting points for series of length T_subsample
nancount_periods = T - T_subsample + 1;
nancount = zeros(n, nancount_periods, 'int32'); % will hold a count of NaNs
X_isnan = int32(isnan(X));
nancount(:,1) = sum(X_isnan(1:T_subsample, :))'; % 'initialize
% We need to obtain a count of nans in T_subsample sized window for each
% possible time period
j = 1;
for i=T_subsample + 1:T
% One pass: add new period in the window and subtract period no longer in the window
nancount(:,j+1) = nancount(:,j) + X_isnan(i,:)' - X_isnan(j,:)';
j = j + 1;
end
indicator = nancount==0; % indicator of whether starting_period, series
% has no NaNs
% number of nonan series of length T_subsample by starting period
max_subsample_size_by_starting_period = sum(indicator);
max_subsample_size = max(max_subsample_size_by_starting_period);
% find the best starting period
starting_period = find(max_subsample_size_by_starting_period==max_subsample_size, 1);
ending_period = starting_period + T_subsample - 1;
columns_mask = indicator(:,starting_period);
columns = find(columns_mask); %holds the column ids we are using
newX = X(starting_period:ending_period, columns_mask);
Here's an idea,
Assuming you can rearrange the series, calculate the distance (you decide the metric, but if looking at is nan vs not is nan, Hamming is ok).
Now hierarchically cluster the series and rearrange them using either a dendrogram
or http://www.mathworks.com/help/bioinfo/examples/working-with-the-clustergram-function.html
You should probably prune any series that doesn't have a minimum number of non nan values before you start.
First I have only little insight in financial mathematics. I understood it that you want to find the longest continuous chain of non-NaN values for each time series. The time series should be sorted depending on the length of this chain and each time series, not containing a chain above a threshold, discarded. This can be done using
data = rand(360,15e3);
data(abs(data) <= 0.02) = NaN;
%% sort and chop data based on amount of consecutive non-NaN values
binary_data = ~isnan(data);
% find edges, denote their type and calculate the biggest chunk in each
% column
edges = [2*binary_data(1,:)-1; diff(binary_data, 1)];
chunk_size = diff(find(edges));
chunk_size(end+1) = numel(edges)-sum(chunk_size);
[row, ~, id] = find(edges);
num_row_elements = diff(find(row == 1));
num_row_elements(end+1) = numel(chunk_size) - sum(num_row_elements);
%a chunk of NaN has a -1 in id, a chunk of non-NaN a 1
chunks_per_row = mat2cell(chunk_size .* id,num_row_elements,1);
% sort by largest consecutive block of non-NaNs
max_size = cellfun(#max, chunks_per_row);
[max_size_sorted, idx] = sort(max_size, 'descend');
data_sorted = data(:,idx);
% remove all elements that only have block sizes smaller then some number
some_number = 20;
data_sort_chop = data_sorted(:,max_size_sorted >= some_number);
Note that this can be done a lot simpler, if the order of periods within a time series doesn't matter, aka data([1 2 3],id) and data([3 1 2], id) are identical.
What I do not know is, if you want to discard all periods within a time series that don't correspond to the biggest value, get all those chains as individual time series, ...
Feel free to drop a comment if it has to be more specific.

Retrieve a specific permutation without storing all possible permutations in Matlab

I am working on 2D rectangular packing. In order to minimize the length of the infinite sheet (Width is constant) by changing the order in which parts are placed. For example, we could place 11 parts in 11! ways.
I could label those parts and save all possible permutations using perms function and run it one by one, but I need a large amount of memory even for 11 parts. I'd like to be able to do it for around 1000 parts.
Luckily, I don't need every possible sequence. I would like to index each permutation to a number. Test a random sequence and then use GA to converge the results to find the optimal sequence.
Therefore, I need a function which gives a specific permutation value when run for any number of times unlike randperm function.
For example, function(5,6) should always return say [1 4 3 2 5 6] for 6 parts. I don't need the sequences in a specific order, but the function should give the same sequence for same index. and also for some other index, the sequence should not be same as this one.
So far, I have used randperm function to generate random sequence for around 2000 iterations and finding a best sequence out of it by comparing length, but this works only for few number of parts. Also using randperm may result in repeated sequence instead of unique sequence.
Here's a picture of what I have done.
I can't save the outputs of randperm because I won't have a searchable function space. I don't want to find the length of the sheet for all sequences. I only need do it for certain sequence identified by certain index determined by genetic algorithm. If I use randperm, I won't have the sequence for all indexes (even though I only need some of them).
For example, take some function, 'y = f(x)', in the range [0,10] say. For each value of x, I get a y. Here y is my sheet length. x is the index of permutation. For any x, I find its sequence (the specific permutation) and then its corresponding sheet length. Based on the results of some random values of x, GA will generate me a new list of x to find a more optimal y.
I need a function that duplicates perms, (I guess perms are following the same order of permutations each time it is run because perms(1:4) will yield same results when run any number of times) without actually storing the values.
Is there a way to write the function? If not, then how do i solve my problem?
Edit (how i approached the problem):
In Genetic Algorithm, you need to crossover parents(permutations), But if you crossover permutations, you will get the numbers repeated. for eg:- crossing over 1 2 3 4 with 3 2 1 4 may result something like 3 2 3 4. Therefore, to avoid repetition, i thought of indexing each parent to a number and then convert the number to binary form and then crossover the binary indices to get a new binary number then convert it back to decimal and find its specific permutation. But then later on, i discovered i could just use ordered crossover of the permutations itself instead of crossing over their indices.
More details on Ordered Crossover could be found here
Below are two functions that together will generate permutations in lexographical order and return the nth permutation
For example, I can call
nth_permutation(5, [1 2 3 4])
And the output will be [1 4 2 3]
Intuitively, how long this method takes is linear in n. The size of the set doesn't matter. I benchmarked nth_permutations(n, 1:1000) averaged over 100 iterations and got the following graph
So timewise it seems okay.
function [permutation] = nth_permutation(n, set)
%%NTH_PERMUTATION Generates n permutations of set in lexographical order and
%%outputs the last one
%% set is a 1 by m matrix
set = sort(set);
permutation = set; %First permutation
for ii=2:n
permutation = next_permute(permutation);
end
end
function [p] = next_permute(p)
%Following algorithm from https://en.wikipedia.org/wiki/Permutation#Generation_in_lexicographic_order
%Find the largest index k such that p[k] < p[k+1]
larger = p(1:end-1) < p(2:end);
k = max(find(larger));
%If no such index exists, the permutation is the last permutation.
if isempty(k)
display('Last permutation reached');
return
end
%Find the largest index l greater than k such that p[k] < p[l].
larger = [false(1, k) p(k+1:end) > p(k)];
l = max(find(larger));
%Swap the value of p[k] with that of p[l].
p([k, l]) = p([l, k]);
%Reverse the sequence from p[k + 1] up to and including the final element p[n].
p(k+1:end) = p(end:-1:k+1);
end

Changing numbers for given indices between matrices

I'm struggling with one of my matlab assignments. I want to create 10 different models. Each of them is based on the same original array of dimensions 1x100 m_est. Then with for loop I am choosing 5 random values from the original model and want to add the same random value to each of them. The cycle repeats 10 times chosing different values each time and adding different random number. Here is a part of my code:
steps=10;
for s=1:steps
for i=1:1:5
rl(s,i)=m_est(randi(numel(m_est)));
rl_nr(s,i)=find(rl(s,i)==m_est);
a=-1;
b=1;
r(s)=(b-a)*rand(1,1)+a;
end
pert_layers(s,:)=rl(s,:)+r(s);
M=repmat(m_est',s,1);
end
for k=steps
for m=1:1:5
M_pert=M;
M_pert(1:k,rl_nr(k,1:m))=pert_layers(1:k,1:m);
end
end
In matrix M I am storing 10 initial models and want to replace the random numbers with indices from rl_nr matrix into those stored in pert_layers matrix. However, the last loop responsible for assigning values from pert_layers to rl_nr indices does not work properly.
Does anyone know how to solve this?
Best regards
Your code uses a lot of loops and in this particular circumstance, it's quite inefficient. It's better if you actually vectorize your code. As such, let me go through your problem description one point at a time and let's code up each part (if applicable):
I want to create 10 different models. Each of them is based on the same original array of dimensions 1x100 m_est.
I'm interpreting this as you having an array m_est of 100 elements, and with this array, you wish to create 10 different "models", where each model is 5 elements sampled from m_est. rl will store these values from m_est while rl_nr will store the indices / locations of where these values originated from. Also, for each model, you wish to add a random value to every element that is part of this model.
Then with for loop I am choosing 5 random values from the original model and want to add the same random value to each of them.
Instead of doing this with a for loop, generate all of your random indices in one go. Since you have 10 steps, and we wish to sample 5 points per step, you have 10*5 = 50 points in total. As such, why don't you use randperm instead? randperm is exactly what you're looking for, and we can use this to generate unique random indices so that we can ultimately use this to sample from m_est. randperm generates a vector from 1 to N but returns a random permutation of these elements. This way, you only get numbers enumerated from 1 to N exactly once and we will ensure no repeats. As such, simply use randperm to generate 50 elements, then reshape this array into a matrix of size 10 x 5, where the number of rows tells you the number of steps you want, while the number of columns is the total number of points per model. Therefore, do something like this:
num_steps = 10;
num_points_model = 5;
ind = randperm(numel(m_est));
ind = ind(1:num_steps*num_points_model);
rl_nr = reshape(ind, num_steps, num_points_model);
rl = m_est(rl_nr);
The first two lines are pretty straight forward. We are just declaring the total number of steps you want to take, as well as the total number of points per model. Next, what we will do is generate a random permutation of length 100, where elements are enumerated from 1 to 100, but they are in random order. You'll notice that this random vector uses only a value within the range of 1 to 100 exactly once. Because you only want to get 50 points in total, simply subset this vector so that we only get the first 50 random indices generated from randperm. These random indices get stored in ind.
Next, we simply reshape ind into a 10 x 5 matrix to get rl_nr. rl_nr will contain those indices that will be used to select those entries from m_est which is of size 10 x 5. Finally, rl will be a matrix of the same size as rl_nr, but it will contain the actual random values sampled from m_est. These random values correspond to those indices generated from rl_nr.
Now, the final step would be to add the same random number to each model. You can certainly use repmat to replicate a random column vector of 10 elements long, and duplicate them 5 times so that we have 5 columns then add this matrix together with rl.... so something like:
a = -1;
b = 1;
r = (b-a)*rand(num_steps, 1) + a;
r = repmat(r, 1, num_points_model);
M_pert = rl + r;
Now M_pert is the final result you want, where we take each model that is stored in rl and add the same random value to each corresponding model in the matrix. However, if I can suggest something more efficient, I would suggest you use bsxfun instead, which does this replication under the hood. Essentially, the above code would be replaced with:
a = -1;
b = 1;
r = (b-a)*rand(num_steps, 1) + a;
M_pert = bsxfun(#plus, rl, r);
Much easier to read, and less code. M_pert will contain your models in each row, with the same random value added to each particular model.
The cycle repeats 10 times chosing different values each time and adding different random number.
Already done in the above steps.
I hope you didn't find it an imposition to completely rewrite your code so that it's more vectorized, but I think this was a great opportunity to show you some of the more advanced functions that MATLAB has to offer, as well as more efficient ways to generate your random values, rather than looping and generating the values one at a time.
Hopefully this will get you started. Good luck!

Modifying matrix values ± a specific index value - MATLAB

I am attempting to create a model whereby there is a line - represented as a 1D matrix populated with 1's - and points on the line are generated at random. Every time a point is chosen (A), it creates a 'zone of exclusion' (based on an exponential function) such that choosing another point nearby has a much lower probability of occurring.
Two main questions:
(1) What is the best way to generate an exponential such that I can multiply the numbers surrounding the chosen point to create the zone of exclusion? I know of exppdf however i'm not sure if this allows me to create an exponential which terminates at 1, as I need the zone of exclusion to end and the probability to return to 1 eventually.
(2) How can I modify matrix values plus/minus a specific index (including that index)? I got as far as:
x(1:100) = 1; % Creates a 1D-matrix populated with 1's
p = randi([1 100],1,1);
x(p) =
But am not sure how to go about using the randomly generated number to alter values in the matrix.
Any help would be much appreciated,
Anna
Don't worry about exppdf, pick the width you want (how far away from the selected point does the probability return to 1?) and define some simple function that makes a small vector with zero in the middle and 1 at the edges. So here I'm just modifying a section of length 11 centred on p and doing nothing to the rest of x:
x(1:100)=1;
p = randi([1 100],1,1);
% following just scaled
somedist = (abs(-5:5).^2)/25;
% note - this will fail if p is at edges of data, but see below
x(p-5:p+5)=x(p-5:p+5).*somedist;
Then, instead of using randi to pick points you can use datasample which allows for giving weights. In this case your "data" is just the numbers 1:100. However, to make edges easier I'd suggest initialising with a "weight" vector which has zero padding - these sections of x will not be sampled from but stop you from having to make edge checks.
x = zeros([1 110]);
x(6:105)=1;
somedist = (abs(-5:5).^2)/25;
nsamples = 10;
for n = 1:nsamples
p = datasample(1:110,1,'Weights',x);
% if required store chosen p somewhere
x(p-5:p+5)=x(p-5:p+5).*somedist;
end
For an exponential exclusion zone you could do something like:
somedist = exp(abs(-5:5))/exp(5)-exp(0)/exp(5);
It doesn't quite return to 1 but fairly close. Here's the central region of x (ignoring the padding) after two separate runs: