How do I create ranking (descending) table in matlab based on inputs from two separate data tables? [closed] - matlab

Closed 10 years ago.
I have four data sets (please bear with me here):
1st Table: List of 10 tickers (stock symbols) in one column in txt format in matlab.
2nd table: dates in numerical format in one column (10 days in double format).
3rd table: a 10*10 data set of random numbers (assume 0-1 for simplicity), for example earnings-per-share (EPS) growth. I want high EPS growth in my ranking for portfolio construction.
4th table: another 10*10 data set of random numbers (assume 0-1 for simplicity), for example daily price-to-earnings (P/E) ratios. I want low P/E ratios in my ranking for portfolio construction.
NOW: Each day I want to rank a portfolio made up of the top three stocks (largest values) from table 3 and the bottom three stocks (smallest values) from table 4. The output must be a list of tickers for each day (3 in this case) based on the combined ranking of the two factors (tables 3 & 4 as described).
Any ideas? In short I need to end up with a top bucket with three tickers...

It is not entirely clear from the post what you are trying to achieve. Here is a take based on guessing, with various options.
Your first two "tables" store symbols for stocks and days (irrelevant for ranking). Your third and fourth are scores arranged in a stock x day manner. Let's assume stocks vertical, days horizontal and stocks symbolized with a value in [1:10].
N = 10; % num of stocks
M = 10; % num of days
T3 = rand(N,M); % table 3 stocks x days
T4 = rand(N,M); % table 4 stocks x days
Sort the score tables in ascending and descending order (to get upper and lower scores per day, i.e. per column):
[Sl,L] = sort(T3, 'descend');
[Ss,S] = sort(T4, 'ascend');
Keep three largest and smallest:
largest = L(1:3,:); % bucket of 3 largest per day
smallest = S(1:3,:); % bucket of 3 smallest per day
If you need the ones in both (0 means no entry):
% Intersection of both buckets
indexI = zeros(3,M);
for i=1:M
% index column i explicitly; a bare logical index would read from column 1
z = largest(ismember(largest(:,i),smallest(:,i)), i);
if ~isempty(z)
indexI(1:length(z),i) = z;
end
end
IF you need the ones in either one (0 is nan):
% Union of both buckets
indexU = zeros(6,M);
for i=1:M
z = unique([largest(:,i),smallest(:,i)]);
indexU(1:length(z),i) = z;
end
IF you need a ranking of scores/stocks from the set of largest_of_3 and smallest_of_4:
scoreAll = [Sl(1:3,:); Ss(1:3,:)];
indexAll = [largest;smallest];
[~,indexSort] = sort(scoreAll,'descend');
for i=1:M
indexBest(:,i) = indexAll(indexSort(1:3,i),i);
end
UPDATE
To get a weighted ranking of the final scores, define the weight vector (scores x 1) and use one of the two options below, then sort scoreAllW instead of scoreAll:
w = [0.3; 0.3; 0.3; 0.7; 0.7; 0.7];
scoreAllW = scoreAll.*repmat(w,1,M); % Option 1
scoreAllW = bsxfun(@times, scoreAll, w); % Option 2
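If what you are after is a single combined ranking of the two factors, one common alternative (not what the code above does) is a rank sum: rank every stock per day on each factor, add the two ranks, and keep the three stocks with the smallest total. Here is a minimal sketch of that idea. The ticker symbols, the random data, and the equal weighting of the two ranks are all assumptions made for illustration.

```matlab
% Hypothetical tickers and random scores, for illustration only
tickers = {'AAA';'BBB';'CCC';'DDD';'EEE';'FFF';'GGG';'HHH';'III';'JJJ'};
N = numel(tickers);   % number of stocks
M = 10;               % number of days
rng(0);               % reproducible random data
T3 = rand(N,M);       % EPS growth, higher is better
T4 = rand(N,M);       % P/E ratio, lower is better

[~,order3] = sort(T3, 1, 'descend');  % best EPS growth first
[~,order4] = sort(T4, 1, 'ascend');   % lowest P/E first

% Turn the sort orders into per-day ranks (1 = best)
rank3 = zeros(N,M);
rank4 = zeros(N,M);
for j = 1:M
    rank3(order3(:,j), j) = (1:N)';
    rank4(order4(:,j), j) = (1:N)';
end

combined = rank3 + rank4;             % smaller total = better on both factors
[~,best] = sort(combined, 1, 'ascend');
top3 = tickers(best(1:3,:));          % 3 x M cell array of ticker symbols
```

Each column of top3 then holds the three tickers for one day. Ties in the rank sum are broken by the original stock order, which may or may not be what you want.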

Related

Sort Matlab data into groups

I have a column of numerical data (imported from excel) and I would like to sort each of the column entries into 4 different groups based on custom size ranges, then calculate how many column entries are in each group, as a fraction of the total number of entries in the column.
For example, if my column was 1,3,13,11,5,9. I want to calculate how many entries fit into group 1-3, how many fit into group 4-7, and so on. Then calculate the amount of entries in each group as a fraction of the total number of column entries. ie, 6 in this example.
Does anyone know how to do this best?
Thanks
Hannah :)
Sorry, I misread your question.
Here is the updated code:
ranges = [1 3
4 7
8 11
12 13];
groups = size(ranges,1);
a = [ 1,3,13,11,5,9];
counter = zeros(groups,1);
for i=1:groups
counter(i) = sum(a>=ranges(i,1) & a<=ranges(i,2));
end
relative_counter = counter / numel(a);
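If the groups are contiguous and the data are integers, as in the example, the same counts can probably be obtained without a loop using histcounts (available in newer MATLAB releases). This is only a sketch under that assumption: the edges below are chosen so that the bins [1,4), [4,8), [8,12) and [12,14] match the ranges 1-3, 4-7, 8-11 and 12-13 for integer values.

```matlab
a = [1,3,13,11,5,9];
edges = [1 4 8 12 14];             % bin edges; the last bin includes its right edge
counts = histcounts(a, edges);     % number of entries per group
relative_counter = counts / numel(a);
```

For non-integer data or ranges with gaps between them, the loop above is the safer choice.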
Old answer:
I do not understand how you get your group bounds (in your question the first group has 3 elements and the 2nd group has 4?)
Have a look at the following code (be careful and test how it should behave at group borders):
groups =4;
a = [ 1,3,13,11,5,9];
range = max(a)-min(a);
rangePerGroup = range/groups;
a_noOffset = a-min(a);
counter = zeros(groups,1);
for i=1:groups
counter(i) = sum(a_noOffset>=rangePerGroup*(i-1) & a_noOffset<=rangePerGroup*i);
end
relative_counter = counter / numel(a);

Length scaling orientation data in MATLAB

Problem
I have a data set of describing geological structures. Each structure has a row with two attributes - its length and orientation (0-360 degrees).
Within this data set, there are two types of structure.
Type 1: fewer data points, but the structures are physically larger (large length, and so more significant).
Type 2: more data points, but the structures are physically smaller (small length, and so less significant).
I want to create a rose plot to show the spread of the structures' orientations. However, I want this plot to also represent the significance of the structures in combination with the direction they face - taking into account the lengths.
Is it possible to scale this by length in MATLAB somehow so that the subset which is less numerous is not under represented, when the structures are large?
Example
A data set might contain:
10 structures orientated North-South, 50km long.
100 structures orientated East-West, 0.5km long.
In this situation the East-West population would look to be more significant than the North-South population based on absolute numbers. However, in reality the length of the members contributing to this population are much smaller and so the structures are less significant.
Code
This is the code I have so far:
load('WG_rose_data.xy')
azimuth = WG_rose_data(:,2);
length = WG_rose_data(:,1);
rose(azimuth,20);
Where WG_rose_data.xy is a data file with 2 columns containing the length and azimuth (orientation) data for the geological structures.
For each row in your data, you could duplicate it a given number of times, according to its length value. Therefore, if you had a structure with length 50, it counts for 50 data points, whereas a structure with length 1 only counts as 1 data point. Of course you have to round your lengths since you can only have integer numbers of rows.
This could be achieved like so, with your example data in the matrix d
% Set up example data: 10 large vertical structures, 100 small ones perpendicular
d = [repmat([0, 50], 10, 1); repmat([90, .5], 100, 1)];
% For each row, duplicate the data in column 1, according to the length in column 2
d1 = [];
for ii = 1:size(d,1)
% make d(ii,2) = length copies of d(ii,1) = orientation
d1(end+1:end+ceil(d(ii,2))) = d(ii,1);
end
Output rose plot: [figure not reproduced here]
You could fine tune how to duplicate the data to achieve the desired balance of actual data and length weighting.
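An alternative to duplicating rows, which avoids rounding the lengths, is to sum the lengths that fall into each orientation bin and use those totals as the bar heights. This is a different technique from the duplication above, sketched here with the same example data; the choice of 24 bins is an assumption.

```matlab
% Example data: column 1 = orientation (degrees), column 2 = length
d = [repmat([0, 50], 10, 1); repmat([90, .5], 100, 1)];

nBins = 24;
binWidth = 360 / nBins;                             % 15 degrees per bin
binIdx = floor(mod(d(:,1), 360) / binWidth) + 1;    % bin index in 1..nBins
totalLength = accumarray(binIdx, d(:,2), [nBins 1]); % summed length per bin
```

With this data the north-south bin totals 500 and the east-west bin totals 50, so the few long structures dominate, which is the weighting the question asks for.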
Thanks for all the help with this. This code is my final working version for reference:
clear all
close all
% Input dataset
original_data = load('WG_rose_data.xy');
d = [];
%reformat azimuth
d(:,1)= original_data(:,2);
%reformat length
d(:,2)= original_data(:,1);
% For each row, duplicate the data in column 1, according to the length in column 2
d1 = [];
for a = 1:size(d,1)
d1(end+1:end+ceil(d(a,2))) = d(a,1);
end
%create opposite directions for rose diagram
length_d1_azi = length(d1);
d1_op_azi=zeros(1,length_d1_azi);
for i = 1:length_d1_azi
d1_op_azi(i)=d1(i)-180;
if d1_op_azi(i) < 0
d1_op_azi(i) = 360 + d1_op_azi(i); % wrap negative angles into 0-360
end
end
%join calculated opposites to original input
new_length = length_d1_azi*2;
all_azi=zeros(new_length,1); % named all_azi to avoid shadowing the built-in all
for i = 1:length_d1_azi
all_azi(i)=d1(i);
end
for j = length_d1_azi+1:new_length
all_azi(j)=d1_op_azi(j-length_d1_azi);
end
%convert input array into radians to plot
d1_rad=degtorad(all_azi);
rose(d1_rad,24)
set(gca,'View',[-90 90],'YDir','reverse');

How can I loop through dates and store weight values in a matrix? [MATLAB]

Hi so I am new to MATLAB. I am trying to find the means of weight values for each month over five years and put these values into a matrix that will be 5x12 in size.
I am attempting to accomplish this with a loop but I'm having a little trouble, if anyone can push me towards the right direction that would be awesome, thanks. What I have so far is this:
weight_data = (10 weights per month for 10 years, 1200 weights total)
year = (years 2000-2010) %year 1-10 corresponds with the 1200 weights)
month = (months 1-12) %weights for all months (120 months, correspond with 1200 weights)
weight_vec = zeros([12, 5]);
for n = year(1:5)
weights = weight_data(n);
mean_weights = mean(weights);
end
This only gives me one number though, I assume the mean from the 5 years I'm trying to loop through. I also know I need to incorporate the months somehow but I'm just confused on how to do this.
Assuming your weight_data is a 3-D array (10 weights x 12 months x 10 years), the weights of the i-th year are located in weight_data(:,:,i).
w=weight_data(:,:,i)
w is a (10 x 12) matrix that contains the 10 weight values for each of the 12 months.
You can use mean to compute the mean value of the weights of each month:
w=mean(weight_data(:,:,i))
Therefore you can setup a loop over the years:
for i=1:1:5
mean_weights(:,i)=mean(weight_data(:,:,i))'
end
(The ' in mean(weight_data(:,:,i))' transposes the output of mean from a row array to a column array so that it fits in your output matrix, which is 12 x 5.)
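The loop can also be written in one line, since mean accepts a dimension argument. A small sketch, assuming the same (10 x 12 x 10) layout and using random numbers in place of your data:

```matlab
weight_data = rand(10, 12, 10);    % assumed layout: weights x months x years

% Average over the 10 weights (dimension 1) for the first 5 years,
% then drop the leading singleton dimension to get a 12 x 5 matrix
mean_weights = squeeze(mean(weight_data(:,:,1:5), 1));
```

Transpose the result if you want the 5 x 12 (years x months) layout mentioned in the question.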
Hope this helps,
Qapla'

How to loop through a vector that corresponds to another vector MATLAB

I have a column of years from 1981 to 2000 that corresponds to another column of prices for a good. I am trying to make a loop that iterates through only the years from 1990 to 2000 and prints the prices in order that correlates with their year. I have this code so far but I'm not sure why it won't run, any help would be awesome.
for x=1:year == 1990:2000
v = find(isfinite(price));
v
end
If your input data is something like this where the first column is year and the second column is price
data = [1990, 2.50;
1991, 3.00;
...
2000, 4.00];
You can loop through the years in your for loop (Note the syntax and how this compares to the one in your post) and then find the second column where the price corresponds to that year using logical indexing.
for year = 1990:2000
% Grabs column 2 where column 1 is equal to the year
price = data(data(:,1) == year, 2);
end
Even if your data lives in two different data structures you can do something similar (as long as they are the same size).
years = [1990, 1991, 1992, ... 2000];
prices = [2.50, 3.00, 3.50, ... 4.00];
for year = 1990:2000
price = prices(years == year);
end
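Since the question asks to print the prices, here is a small complete version of that loop with an fprintf call added. The price values are made up for illustration, and the format assumes exactly one price per year.

```matlab
years  = (1981:2000)';
prices = linspace(2.50, 4.00, numel(years))';   % made-up prices, one per year

for year = 1990:2000
    price = prices(years == year);      % logical indexing picks the matching price
    fprintf('%d: %.2f\n', year, price); % print year and price on one line
end
```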
Edit
If you are for-loop averse, you can definitely do the same thing without a for loop. The most robust solution is to use arrayfun.
annualPrices = arrayfun(@(x)prices(years == x), years, 'UniformOutput', false);
This will return a cell array where each element is all prices for a given year.
If you're guaranteed to only have one price per year, however, you can omit the UniformOutput input and you'll get an array of prices.
annualPrices = arrayfun(@(x)prices(years == x), years);
One of the benefits is that neither of these approaches requires extra operations (such as sorting) on your data.
Example 1:
Let's make a matrix holding your data:
M = ones(100,2); % 1st column for the year and the second column for the prices
M(:,1) = (1951:2050).';
M(:,2) = rand(100,1);
A one liner to your question can be as follows:
M((M(:,1)<= 2000 & M(:,1) >= 1990),2)
Example 2:
If you have prices and years in two vectors, first make sure your years are sorted:
[sortedYears,Idx] = sort(years); % sort the years vector
sortedPrices = prices(Idx); % use the index to sort the prices in the same order
Now use the following one liner:
sortedPrices((sortedYears<= 2000 & sortedYears >= 1990));

matlab updating time vector

I have 19 cells (19x1) with temperature data for an entire year where the first 18 cells represent 20 days (each) and the last cell represents 5 days, hence (18*20)+5 = 365days.
In each cell there should be 7200 measurements (apart from cell 19) where each measurement is taken every 4 minutes thus 360 measurements per day (360*20 = 7200).
The time vector for the measurements is only expressed as day number i.e. 1,2,3...and so on (thus no decimal day),
which is therefore displayed as 360 x 1's... and so on.
As the sensor failed during some days, some of the cells contain less than 7200 measurements, where one in
particular only contains 858 rows, which looks similar to the following example:
a=rand(858,3);
a(1:281,1)=1;
a(281:327,1)=2;
a(327:328,1)=5;
a(329:330,1)=9;
a(331:498,1)=19;
a(499:858,1)=20;
Where column 1 = day, column 2 and 3 are the data.
By knowing that each day number should be repeated 360 times is there a method for including an additional
amount of every value from 1:20 in order to make up the 360. For example, the first column requires
79 x 1's, 46 x 2's, 360 x 3's... and so on; where the final array should therefore have 7200 values in
order from 1 to 20.
If this is possible, in the rows where these values have been added, the second and third column should
changed to nan.
I realise that this is an unusual question, and that it is difficult to understand what is asked, but I hope I have been clear in expressing what I'm attempting to
achieve. Any advice would be much appreciated.
Here's one way to do it for a given element of the cell matrix:
full=zeros(7200,3)+NaN;
for i = 1:20 % for each day
starti = (i-1)*360; % find corresponding 360 indices into full array
full( starti + (1:360), 1 ) = i; % assign the day
idx = find(a(:,1)==i); % find any matching data in a for that day
full( starti + (1:length(idx)), 2:3 ) = a(idx,2:3); % copy matching data over
end
You could probably use arrayfun to make this slicker, and maybe (??) faster.
You could make this into a function and use cellfun to apply it to your cell.
PS - if you ask your question at the Matlab help forums you'll most definitely get a slicker & more efficient answer than this. Probably involving bsxfun or arrayfun or accumarray or something like that.
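Update - to do this for each element in the cell array, the only change is that instead of searching for i as the day number you calculate it based on how far along the cell array you are. You'd do something like (untested):
filled = cell(size(cellarray)); % one padded matrix per cell
for k = 1:length(cellarray)
a = cellarray{k}; % measurements for this block of days
full = nan(7200,3); % start from an all-NaN matrix for every cell
for i = 1:20 % the last cell only holds 5 days, so shorten this range for it
starti = (i-1)*360; % ... as before
day = (k-1)*20 + i; % first cell is days 1-20, second is 21-40,...
full( starti + (1:360),1 ) = day; % <-- replace i with day
idx = find(a(:,1)==day); % <-- replace i with day
full( starti + (1:length(idx)), 2:3 ) = a(idx,2:3); % same as before
end
filled{k} = full; % keep the padded result for this cell
end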
I am not sure I understood correctly what you want to do, but the code below works out how many measurements are missing for each day and adds extra rows at the bottom of your 'a' matrix so that you get the full 7200x3 matrix.
nbMissing = 7200-size(a,1); % total number of rows to add
a1 = nan(nbMissing,3); % padding rows, data columns left as NaN
l = 0; % number of padding rows filled so far
for i = 1:20
nbMissing_i = 360-sum(a(:,1)==i); % rows missing for day i
a1(l+1:l+nbMissing_i,1) = i;
l = l+nbMissing_i;
end
a_filled = [a;a1];
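If the padded rows should sit next to their day rather than at the bottom, one option is to sort the combined matrix on the day column afterwards. Below is a self-contained sketch of the same padding idea using the sample matrix from the question. It relies on repelem (newer MATLAB releases) and on sortrows keeping the original order of rows with equal keys, which I believe is the documented behaviour but is worth checking on your version.

```matlab
% Sample data from the question: column 1 = day, columns 2-3 = measurements
a = rand(858,3);
a(1:281,1) = 1;   a(281:327,1) = 2;  a(327:328,1) = 5;
a(329:330,1) = 9; a(331:498,1) = 19; a(499:858,1) = 20;

present = accumarray(a(:,1), 1, [20 1]);   % rows available for each day
missing = 360 - present;                   % rows to add for each day

padDays = repelem((1:20)', missing);       % day numbers for the padding rows
pad = [padDays, nan(numel(padDays), 2)];   % data columns are NaN

a_sorted = sortrows([a; pad], 1);          % order by day; padding follows real data
```

After this, every day from 1 to 20 appears exactly 360 times and the result has 7200 rows.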