Let's say I have a few matrices where the first column is a serial date number and the second column is the data for that date. Each matrix contains only runs of at least a minimum number of consecutive days. In the example below (A), this minimum is 3: the runs in A can be 3, 4, 5 or more consecutive days long, so I call it a 3+ matrix. My full set of matrices ranges from 2+ to 5+.
A=
694094 91
694095 92
694096 94
694097 86
694157 95
694158 99
694159 99
694160 97
694183 100
694184 99
694185 96
694505 94
694506 92
694507 89
...
I want to find a way to count the consecutive-day events in each year, i.e. the number of 3-day, 4-day, and longer events. Using the example above, the output would look like:
B=
1900 3
1901 1
....
This says that there are three 3+ consecutive-day events in 1900 and, according to the example, only one in 1901. The years come from the serial date numbers (see the MATLAB documentation on serial date numbers). My data range from 1900 to 2013.
So far I've tried using the diff function to split the events by runs of 1s, find those indices, and then use histc to count the events per year, but I'm realizing this approach doesn't work. I'm sure accumarray could help here, but I'm still unclear on how it works after going through the examples on MathWorks and Stack Overflow.
Code
N = 3; %// 3 for 3+ events. Change it to 2 or 5 for 2+ and 5+ events respectively
%// Year IDs
year_ID = str2num(datestr(A(:,1),'yyyy'))
%// Binary row vector: ones mark dates that directly follow the previous date,
%// zeros mark the start of a new pack of consecutive dates
diffA1 = [0 ; diff(A(:,1))==1]'
%// Row numbers of A that mark the start of N+ events.
%// STRFIND acts here as a sliding matcher: it slides the pattern
%// [0 ones(1,N-1)] across diffA1, so each match is the start of a pack followed
%// by at least N-1 more consecutive days. For 3+ events the pattern is [0 1 1].
row_ID = strfind(diffA1,[0 ones(1,N-1)])
%// N+ events for each year
Nplus_event = year_ID(row_ID)
%// Desired output as a count of such N+ events against each year
B = [unique(Nplus_event) histc(Nplus_event,unique(Nplus_event))]
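For comparison, here is a minimal Python sketch of the same counting logic: runs of at least N consecutive serial dates, counted by the year in which each run starts. It assumes the dates are MATLAB serial date numbers sorted in ascending order, and converts them using the fixed 366-day offset between MATLAB's day 1 (1-Jan-0000) and day 1 of Python's date.fromordinal (1-Jan-0001). The function names are illustrative.

```python
from collections import Counter
from datetime import date


def datenum_to_year(dn):
    """Year of a MATLAB serial date number (day 1 = 1-Jan-0000)."""
    # Python ordinals start at 1-Jan-0001, which is 366 days after MATLAB's day 1
    return date.fromordinal(int(dn) - 366).year


def count_runs_per_year(dates, n_min):
    """Count runs of >= n_min consecutive days, keyed by the year each run starts.

    `dates` must be serial date numbers sorted in ascending order.
    """
    counts = Counter()
    run_start = 0
    for k in range(1, len(dates) + 1):
        # a run ends at the last date, or where the next date is not exactly one day later
        if k == len(dates) or dates[k] - dates[k - 1] != 1:
            if k - run_start >= n_min:
                counts[datenum_to_year(dates[run_start])] += 1
            run_start = k
    return sorted(counts.items())
```

With the dates of A shown above, count_runs_per_year(dates, 3) returns [(1900, 3), (1901, 1)], matching B; passing n_min = 2 or 5 covers the 2+ and 5+ matrices.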
To put it generically: I need to compute group means that exclude the observations of each row's own group.
As an example, say I have firms, products and product characteristics. Each firm (f=1,...,F) produces several products (i=1,...,I). For product i of firm f, I want the mean of a given characteristic over all products of all firms except firm f.
So I could have a dataset like this:
firm prod width
1 1 30
1 2 10
1 3 20
2 1 25
2 2 15
2 4 40
3 2 10
3 4 35
To reproduce the table:
firm=[1,1,1,2,2,2,3,3]
prod=[1,2,3,1,2,4,2,4]
width=[30,10,20,25,15,40,10,35]
x=[firm' prod' width']
For firm 1, I want a mean over all products of all other firms, i.e. excluding every firm 1 product, so the grouping is at the firm level. (This mean will be used as an instrumental variable for the width of all firm 1 products.)
So the mean I should get for firm 1 is (25+15+40+10+35)/5 = 25.
Then I repeat the process for the other firms:
firm prod width mean_desired
1 1 30 25
1 2 10 25
1 3 20 25
2 1 25
2 2 15
2 4 40
3 2 10
3 4 35
My main difficulty is excluding the firm's own values.
This question is related to "Calculating group mean/medians in MATLAB where group ID is in a separate column", but there the own group is not excluded.
P.S. Out of curiosity, in case anyone here works in economics: I am trying to construct Hausman or BLP instruments.
Here's a way that avoids loops, but it may be memory-expensive. Let x denote your three-column data matrix.
m = bsxfun(@ne, x(:,1).', unique(x(:,1))); % or, if the firm IDs are 1, 2, ..., K: m = ~sparse(x(:,1), 1:size(x,1), true);
result = m*x(:,3);
result = result./sum(m,2);
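This builds a logical matrix m with one row per firm, where row k is true for the observations that do not belong to the k-th firm (first line, which compares every entry of the firm column of x with the unique firm IDs). Multiplying m by the width column of x gives, for each firm, the sum over all other firms (second line), and dividing by the number of those observations gives the desired mean (third line).
This yields one value per firm, in the order of unique(x(:,1)). If the firm IDs are exactly 1, 2, ..., K, use result(x(:,1)) to repeat the results as per the original firm column; otherwise use the third output of unique: [~, ~, k] = unique(x(:,1)); result(k)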
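A minimal Python sketch of the same leave-own-group-out mean, using a different technique: instead of building the firm-by-observation mask m, it subtracts each firm's own sum and count from the overall totals, so memory grows with the number of firms rather than firms times observations. The function name is illustrative, and it assumes at least two distinct firms (otherwise the denominator is zero).

```python
from collections import defaultdict


def leave_own_group_out_means(groups, values):
    """For each row, the mean of `values` over all rows that belong to a different group."""
    total_sum = sum(values)
    total_count = len(values)
    group_sum = defaultdict(float)
    group_count = defaultdict(int)
    for g, v in zip(groups, values):
        group_sum[g] += v
        group_count[g] += 1
    # mean over the other groups = (overall sum - own sum) / (overall count - own count)
    other_mean = {
        g: (total_sum - group_sum[g]) / (total_count - group_count[g])
        for g in group_sum
    }
    # repeat each group's value for every row of that group
    return [other_mean[g] for g in groups]
```

On the example data this gives 25 for every firm 1 row (the hand calculation in the question), 21 for firm 2 and 140/6, about 23.33, for firm 3.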
I'm doing a lab in MATLAB and have hit a snag. The prompt is:
a. Generate a vector to manipulate in the following exercises by using a random number generator to create "pull-ups" counts for 50 people. The counts should be from 1 to 10. Use this vector of counts for the next two problems.
b. How many people did more than 5 pull-ups? Do your results make sense for a uniformly distributed random number generator?
c. Generate another vector of "pull-ups" counts for 50 athletes; this time use the range from 11 to 20. Append this new vector to the previous vector (now you have 100 "pull-ups" counts).
d. Find the average number of "pull-ups" for the 100 total people. Do your results make sense?
e. Use the 100-person vector from c and create a new vector that contains only the counts at the odd-numbered indices (not the odd-valued counts, but the counts for every other person starting with person 1).
f. Use the 100-person vector from c and make a new vector of the "even-valued counts".
Parts a and b are no problem, but I have no idea how to do part c. I've been trying this:
x=randi(20,11,50)
I know this gives me 550 values (an 11-by-50 matrix) ranging from 1 to 20. What I want is 50 values from 11 to 20, appended to the vector from part a, so that I have 100 values: 50 from 1 to 10 and 50 from 11 to 20. Any idea what I'm doing wrong?
Pass a two-element vector [imin imax] as the first input to randi to set the lower and upper limits of the random integers. If you pass only a scalar, the values range from 1 to that scalar. The second and third inputs are the size of the output, so for 50 values we want a 50 x 1 output. Appending the result to your 50 x 1 vector from part a (call it a) is then just [a; x].
x = randi([11 20], 50, 1)
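The same steps as a minimal Python sketch, with random.Random standing in for randi: randint(lo, hi) includes both endpoints, like randi([lo hi]). The seed and the variable names are arbitrary choices made for reproducibility.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible

# part a: 50 counts from 1 to 10, both ends included (like randi(10, 50, 1))
counts = [rng.randint(1, 10) for _ in range(50)]
# part c: 50 counts from 11 to 20 (like randi([11 20], 50, 1))
athletes = [rng.randint(11, 20) for _ in range(50)]
# append the new counts after the old ones: 100 values in total (like [a; x])
all_counts = counts + athletes
```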
I have daily river flow data for 1975-2009 and I am asked to find the 7 consecutive days within each year that have the smallest flows.
Any advice on how to start? I've only been using MATLAB for a couple of weeks.
Thanks!
You could convolve the data with ones(1,7) and look for the minimum; its index is the starting day of the driest 7-day period:
[~,startingDay] = min(conv(flow,ones(1,7),'valid'))
(This is essentially a moving-average filter without the normalization.)
Loop through the years to get each year's result.
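A minimal Python sketch of that per-year loop. It assumes one value per day with no gaps (so neighbouring entries are neighbouring days) and a parallel list giving each day's year; windows never cross a year boundary, matching "within each year". The window sums play the role of conv(flow, ones(1,7), 'valid'), and the function name is illustrative.

```python
from collections import defaultdict


def driest_window_by_year(years, flows, window=7):
    """Map each year to (0-based offset of the first day within that year, total flow)
    of its lowest-flow run of `window` consecutive days."""
    by_year = defaultdict(list)
    for y, q in zip(years, flows):
        by_year[y].append(q)
    result = {}
    for y, q in by_year.items():
        # one sum per full window, like conv(q, ones(1, window), 'valid')
        sums = [sum(q[k:k + window]) for k in range(len(q) - window + 1)]
        if not sums:
            continue  # fewer than `window` days recorded for this year
        # first index of the minimum, as MATLAB's min returns
        start = min(range(len(sums)), key=sums.__getitem__)
        result[y] = (start, sums[start])
    return result
```

Offsets are 0-based here; add 1 to compare with MATLAB's startingDay.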
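Start by finding the cumulative sum with cumsum. The difference between cumulative sums 7 days apart gives the total for those 7 days; then pick the minimum of those totals. Prepending a zero to the cumulative sum makes sure the first window (days 1 to 7) is included.
a = [0; cumsum(flow(:))];      % a(k+1) is the sum of flow(1:k)
b = a(8:end) - a(1:end-7);     % b(k) is the sum of flow(k:k+6)
[m, startDay] = min(b);
Here m holds the smallest total over 7 consecutive days, and startDay is the index of the first of those days (if several windows tie, min returns the first one).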
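The corrected cumulative-sum indexing, sketched in Python: itertools.accumulate with initial=0 (Python 3.8+) plays the role of [0; cumsum(flow)], so the window starting on the very first day is included. Indices are 0-based and the function name is illustrative.

```python
from itertools import accumulate


def min_window_sum(flow, window=7):
    """Return (smallest total, 0-based start index) over all runs of `window` consecutive values."""
    # prefix[k] is the sum of flow[0:k]; the leading 0 keeps the first window
    prefix = list(accumulate(flow, initial=0))
    totals = [prefix[k + window] - prefix[k] for k in range(len(flow) - window + 1)]
    start = min(range(len(totals)), key=totals.__getitem__)
    return totals[start], start
```

On [1, 1, 1, 1, 1, 1, 1, 9, 9, 9] the driest window starts at index 0, which is exactly the window that an unpadded a(8:end) - a(1:end-7) skips.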
Edit for clarity:
I have two matrices, p.valor (2x1000) and p.clase (1x1000). p.valor consists of random numbers ranging from -6 to 6. p.clase contains, in order, 200 1s, 200 2s and 600 3s. What I want to do is:
Plot p.valor using a different color/marker for each class given in p.clase, as in the following figure.
I first wrote the following to find which locations in p.valor correspond to the 1s, 2s and 3s in p.clase:
%identify the locations of all 1s, 2s and 3s in p.clase
f1=find(p.clase==1);
f2=find(p.clase==2);
f3=find(p.clase==3);
%define vectors in p.valor representing the locations of 1,2,3 in p.clase
x1=p.valor(f1);
x2=p.valor(f2);
x3=p.valor(f3);
There are 200 ones in p.clase, so f1 = 1:200. The problem is that each 1 (and likewise each 2 and 3) corresponds to TWO elements of p.valor, since p.valor has 2 rows. So even though p.clase, and therefore x1, has only one row, I need to include the elements from both rows in every column listed in f1.
None of the alternatives I have tried has worked so far. Examples:
plot(x1(:,1), x1(:,2),'ro')
hold on
plot(x2(:,1),x2(:,2),'k.')
hold on
plot(x3(:,1),x3(:,2),'b+')
and
y1=p.valor(201:400);
y2=p.valor(601:800);
y3=p.valor(1401:2000);
scatter(x1,y1,'k+')
hold on
scatter(x2,y1,'b.')
hold on
scatter(x3,y1,'ro')
and
y1=p.valor(201:400);
y2=p.valor(601:800);
y3=p.valor(1401:2000);
plot(x1,y1,'k+')
hold on
plot(x2,y2,'b.')
hold on
plot(x3,y3,'ro')
My figures have the right axes, but the plotted values do not match the correct figure (see the top of the question).
So my question is: how do I include the values in the second row of p.valor in my plot?
I hope this is clearer!
Values from both rows can be accessed at once with subscript indexing:
X = p.valor(:, findX)       % findX is one of f1, f2, f3
plot(X(1,:), X(2,:), 'ro')  % assumes row 1 is the x and row 2 the y coordinate
Here X is a matrix with 2 rows and length(findX) columns.
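The same column-wise selection, sketched in Python with the 2-row matrix stored as two lists: each class label picks whole columns, i.e. both the row-1 and the row-2 value, which is what p.valor(:, findX) does. The function name is illustrative; the resulting (x, y) pairs are what you would pass to a plotting call, one marker style per class.

```python
def columns_by_class(valor, clase):
    """Group the columns of a 2-row matrix by the class label of each column.

    valor: [row1, row2], two equal-length lists; clase: one label per column.
    Returns {label: [(row1 value, row2 value), ...]} in column order.
    """
    groups = {}
    for j, label in enumerate(clase):
        # take BOTH rows of column j, like p.valor(:, j) in MATLAB
        groups.setdefault(label, []).append((valor[0][j], valor[1][j]))
    return groups
```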
M = magic(5)
M =
17 24 1 8 15
23 5 7 14 16
4 6 13 20 22
10 12 19 21 3
11 18 25 2 9
M2 = M(1:2, :)
M2 =
17 24 1 8 15
23 5 7 14 16
MATLAB uses column-major linear indexing, so to move to the next row you just add 1 to a linear index. To move to the next column you add the number of rows: 2 for M2 here, or 5 for M.
For example, M2(3) is 24. The element below it is M2(3 + 1), i.e. M2(4), which returns 5. The element to its right is M2(3 + 2), i.e. M2(5), which returns 1. If you add the number of columns instead, as you suggested, you land on an unrelated element: M2(3 + 5) returns 14.
So that approach cannot work. Freude's method is correct, and subscript indexing is much easier than linear indexing for this; I just wanted to explain why what you tried doesn't work in MATLAB. With linear indexing, column j of a 2-row matrix sits at positions 2*j-1 and 2*j, so the linear equivalent of p.valor(:,findX) would be p.valor([2*findX-1; 2*findX]). (Aside from that, X=p.value(findX findX+1000) is a syntax error; I assume you meant X=p.value([findX findX+1000]).)
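To make the column-major arithmetic concrete, here is a small Python sketch that flattens M2 the way MATLAB stores it and computes 1-based linear indices as (col - 1) * n_rows + row. The helper names are illustrative.

```python
def column_major(matrix):
    """Flatten a list-of-rows matrix in MATLAB's column-major storage order."""
    return [matrix[r][c] for c in range(len(matrix[0])) for r in range(len(matrix))]


def linear_index(row, col, n_rows):
    """1-based MATLAB linear index of element (row, col) in a matrix with n_rows rows."""
    return (col - 1) * n_rows + row


M2 = [[17, 24, 1, 8, 15],
      [23, 5, 7, 14, 16]]
# flat[k - 1] plays the role of M2(k) in MATLAB
flat = column_major(M2)
```

Here linear_index(1, 2, 2) is 3 and flat[2] is 24, i.e. M2(3); flat[3] (M2(4)) is 5, flat[4] (M2(5)) is 1, and flat[7] (M2(8)) is 14.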
I have series of times and returns in several matrices; let's call them a, b and c. Each is n-by-2, with column 1 holding times in seconds and column 2 the returns. The returns are over a fixed set of time buckets (15 s, 30 s, 45 s, etc.), but not every matrix has every bucket: a might be 30-by-2 while b is only 28-by-2 because it is missing, say, the 45-second time and its return. I want to go through each matrix and, wherever a time bucket is missing, insert that bucket with a zero return. I am happy to create a 30-by-1 control vector containing all the times to cross-reference against.
You can use ismember to locate these missing positions: if a is the control vector and b is the vector with missing data, ind=find(ismember(a,b)==0); gives the indices of the elements of a that are missing from b.
For example:
a=1:10;
b=[1:2 4:5 7:10];
ind=find(ismember(a,b)==0);
ind =
3 6
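To add zeros in the right places in b (this works because ind is in ascending order, so each insertion index already accounts for the zeros inserted before it):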
for n=1:numel(ind)
b=[b(1:ind(n)-1) , 0 , b(ind(n):end)];
end
b =
1 2 0 4 5 0 7 8 9 10
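For the original n-by-2 case (time, return), a minimal Python sketch of a different technique: build a lookup from observed time to return, then make one pass over the control times and fill gaps with zero. It assumes the times match the control times exactly (e.g. whole seconds); the function name is illustrative. Unlike the insertion loop, it never rebuilds the array once per missing bucket.

```python
def fill_missing_buckets(control_times, rows, fill=0.0):
    """Return one (time, return) row per control time, using `fill` for missing times.

    rows: iterable of (time, return) pairs whose times match control_times exactly.
    Rows whose time is not in control_times are dropped.
    """
    observed = dict(rows)  # time -> return
    return [(t, observed.get(t, fill)) for t in control_times]
```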