Getting the sums of other columns based on the first column's values - MATLAB

An Excel file contains 5 columns: the first column contains the year (1987 to 2080), the second the month, the third the day, and the fourth and fifth contain values. I would like to sum columns 4 and 5 for each year in column 1. For example, I would like the sums of columns 4 and 5 for 1987, then 1988, then 1989, and so on.
An example data file is attached.
I have tried the following code, assuming that each year contains 365 days:
n=1;
for i=1:365:size(data,1)
Total(n,:) = sum(data(i:i+365-1,:));
n=n+1;
end
The problem is that not all years contain 365 days. Some of them (e.g. 1988, 1992) are leap years with 366 days, and for those years the sums come out wrong.
I'm looking for help getting the sums of columns 4 and 5 for each year in column 1.
Any help would be greatly appreciated.

UPDATE: a much faster solution is at the end!
It can be done as follows, with one line per column:
% some example data
years = ceil(1987:0.3:2080)';
months = randi(12,numel(years),1);
days = randi(30,numel(years),1);
values = randi(42,numel(years),2);
% data similar to yours;
data = [ years months days values ];
The easy-to-read, longer way would be:
% years
y = data(:,1);
% unique years
uy = unique(y);
% for column 4
C4 = arrayfun(@(x) sum( data(y == x, 4) ), uy )
% for column 5
C5 = arrayfun(@(x) sum( data(y == x, 5) ), uy )
or, shorter, in one line per column:
C4 = arrayfun(@(x) sum( data( (data(:,1) == x), 4) ), unique(data(:,1)) )
This returns a 94x1 double array with the sums for all 94 unique years of the example data.
If you want to arrange the results in one matrix, you could do it as follows:
summary = [uy, C4, C5]
which returns something like:
summary =
%  year   sum of column 4   sum of column 5
   1987          3                 3
   1988         40                40
   1989         56                56
   1990         96                96
   1991         54                54
   1992         15                15
   1993         73                73
   1994         42                42
   1995         66                66
   1996         56                56
   ...
You could also sum all columns at once; already with just 2 columns this should be about 50% faster:
cols = 4:5;
C = cell2mat( arrayfun(@(x) sum( data(y == x, cols), 1 ), uy, 'uni', 0 ) )
The problem with these solutions is that your matrix is about 30000x5, and for every unique year the logical indexing scans the whole matrix to "search" for the rows of the current year. There is a built-in function that does exactly this grouping, accumarray, which gives a simpler and much faster solution:
[~,~, i_uy] = unique(data(:,1));
C4 = accumarray(i_uy,data(:,4));
C5 = accumarray(i_uy,data(:,5));
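If you want both sums from a single accumarray call, one option (a sketch, not taken from the answer above) is to pass two-column subscripts of the form (year index, column index). The small data matrix below is made-up stand-in data in the question's [year month day value4 value5] layout.

```matlab
% Hypothetical stand-in for the question's data: [year month day value4 value5]
data = [1987 1  1 2 10;
        1987 1  2 3 20;
        1988 2 29 5 30;
        1989 6 15 7 40;
        1989 6 16 1 50];

cols = 4:5;                                   % columns to sum per year
[uy, ~, i_uy] = unique(data(:,1));            % i_uy(r) = index of row r's year in uy

% 2-D subscripts: (year index, column index) for every value in data(:,cols)
[rowSub, colSub] = ndgrid(i_uy, 1:numel(cols));
vals = data(:, cols);
C = accumarray([rowSub(:), colSub(:)], vals(:));  % numel(uy) x numel(cols) sums

summary = [uy, C]    % [year, sum of column 4, sum of column 5]
```

Because ndgrid and the (:) operator both work in column-major order, rowSub(:), colSub(:) and vals(:) stay aligned element by element.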

Related

Matlab: Count till sum equals 360 > insert event1, next 360 >insert event 2 etc

I have been trying to solve this problem for a while now and I would appreciate a push in the right direction.
I have a matrix called Turn. It contains 1 column of data with somewhere between 10000 and 15000 rows (the number varies). What I would like to do is as follows:
Start at row 1 and add the values of row 2, row 3, etc. until sum == 360. When sum == 360, insert 'event 1' in column 2 at that row.
Start counting again at the next row (after 'event 1') until sum == 360. When sum == 360, insert 'event 2' in column 2 at that row, and so on.
So I basically want to partition my data into groups with sum == 360; these will be called events.
The row number at which sum == 360 is important to me as well (every row is a time point, so it tells me the duration of an event). I want to put those row numbers in a new matrix, where row 1 holds the row number at which event 1 happened, row 2 the row number of event 2, etc.
You can find the row indices where events occur using the following code. Basically, you use the modulo operator to find where the cumulative sum of the first column of Turn is a multiple of 360.
mod360 = mod(cumsum(Turn(:,1)),360);
eventInds = find(mod360 == 0);
You could then loop over eventInds to place whatever values you'd like in the appropriate rows in the second column of Turn.
I don't think you'll be able to place the string 'event 1' in the column, though, because a character array acts like a vector and will result in a dimension mismatch. You could just store the numerical value 1 for the first event, 2 for the second event, and so on.
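A minimal sketch of the numeric-label idea, assuming a small made-up Turn vector: instead of looping over eventInds, a single indexed assignment can write the event numbers 1, 2, 3, ... into column 2 and leave 0 on all other rows.

```matlab
% Hypothetical stand-in for Turn (one column of values)
Turn = [90; 90; 180; 120; 240; 360; 10];

% Rows where the running total is an exact multiple of 360
eventInds = find(mod(cumsum(Turn(:,1)), 360) == 0);

% Column 2: event number on each event row, 0 everywhere else
Turn(:,2) = 0;
Turn(eventInds, 2) = (1:numel(eventInds))';
```

eventInds itself is already the "new matrix" of row numbers asked for in the question: its k-th entry is the row at which event k happened.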
Ryan's answer looks like the way to go. But if your condition is such that you need to find the row numbers where the cumulative sum crosses 360 without hitting it exactly, you have to do a little more work. For that case, try this vectorized (loop-free) code to get the row IDs where each 360 grouping occurs -
threshold = 360;
cumsum_val = cumsum(Turn);
ind1 = find(cumsum_val>=threshold,1)
num_events = floor(cumsum_val(end)/threshold);
[x1,y1] = find(bsxfun(@gt,cumsum_val,threshold.*(1:num_events)));
[~,b,~] = unique(y1,'first');
row_nums = x1(b)
After that you can get the event data, like this -
event1 = Turn(1:row_nums(1));
event2 = Turn(row_nums(1)+1:row_nums(2));
event3 = Turn(row_nums(2)+1:row_nums(3));
...
event21 = Turn(row_nums(20)+1:row_nums(21));
...
eventN = Turn(row_nums(N-1)+1:row_nums(N));
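Rather than writing out event1 ... eventN by hand, one option (a sketch, not part of the original answer) is to split Turn into a cell array with mat2cell, using the same boundaries as the lines above: event 1 is rows 1..row_nums(1), event k is rows row_nums(k-1)+1..row_nums(k). The Turn vector and row_nums below are the 20-value sample and its output with threshold = 30.

```matlab
% Sample Turn (20 values) and the row_nums the code above returns for threshold = 30
Turn = [7;6;3;4;5;3;9;2;3;2;3;5;4;10;5;2;10;10;5;2];
row_nums = [7; 14; 18];

% Number of rows in each event
eventLengths = diff([0; row_nums]);

% One cell per event; rows after row_nums(end) do not complete another event and are left out
events = mat2cell(Turn(1:row_nums(end)), eventLengths, 1);
```

events{k} then plays the role of eventk, and numel(events{k}) gives the duration of event k in rows.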
Edit 1
Sample case:
We create a small data set of 20 random integers instead of the 10000 to 15000 rows of the original problem. We also use a threshold of 30 instead of 360 to account for the small data size.
Code
Turn = randi(10,[20 1])
threshold = 30
cumsum_val = cumsum(Turn);
ind1 = find(cumsum_val>=threshold,1);
num_events = floor(cumsum_val(end)/threshold);
[x1,y1] = find(bsxfun(@gt,cumsum_val,threshold.*(1:num_events)));
[~,b,~] = unique(y1,'first');
row_nums = x1(b)
Run
Turn =
7
6
3
4
5
3
9
2
3
2
3
5
4
10
5
2
10
10
5
2
threshold =
30
row_nums =
7
14
18
The run shows row_nums as 7, 14, 18, i.e. the rows at which the cumulative sum first exceeds 30, 60 and 90 (37, 66 and 93 here). With the event indexing above, event 1 is Turn(1:7), event 2 is Turn(8:14) and event 3 is Turn(15:18). If you would rather treat each of these rows as the start of the next grouping, prepend 1 to row_nums so that the first grouping starts at the 1st index.
Given a column vector x, say,
x = randi(100,10,1)
the following gives you the index of the last row where the cumulative sum of the items up to that row is at most 360 (the row where it reaches exactly 360, if it does):
i = max( find( cumsum(x) <= 360) )
Then, you would have to use that index to find the next set of cumulative sums that add up to 360, something like
offset = max( find( cumsum(x(i+1:end)) <= 360 ) )
i_new = i + offset
You might need to add +1/-1 to the offset and the index.
>> x = randi(100,10,1)'
x =
90 47 47 44 8 79 45 9 91 6
>> cumsum(x)
ans =
90 137 184 228 236 315 360 369 460 466
>> i = max(find(cumsum(x)<=360))
i =
7
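Repeating the offset step until the data runs out can be sketched as a while loop. This is an illustration under two assumptions not stated in the answer: a trailing block whose total stays below 360 is treated as incomplete and left unassigned, and the loop stops if a single value already exceeds 360. The x below extends the example above with three made-up values so that two groups form.

```matlab
% Example x from above plus three hypothetical extra values
x = [90 47 47 44 8 79 45 9 91 6 200 60 100]';
target = 360;

ends = zeros(0, 1);   % last row of each complete group
i = 0;                % rows 1..i are already assigned to a group
while i < numel(x)
    cs = cumsum(x(i+1:end));          % running total restarted after row i
    if cs(end) < target
        break;                        % remaining rows cannot reach the target
    end
    offset = find(cs <= target, 1, 'last');
    if isempty(offset)
        break;                        % x(i+1) alone already exceeds the target
    end
    i = i + offset;                   % the "i_new = i + offset" step
    ends(end+1, 1) = i;               %#ok<AGROW>
end
```

Here the first group ends at row 7 (sum 360) and the second at row 11 (sum 306, the largest total not above 360).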

Plotting the mean and standard deviation of data when more than one data point was collected at a particular point

I have a matrix where one column is the day of year and the other is the data associated with that day of year. On some days of the year there are multiple data points, while on others there are one or none. This makes the information difficult to plot, so I would like to plot the data based on its mean and standard deviation. If data was collected three times on the 320th day of the year, the mean and standard deviation of these three data points would be computed, and in the plot the line would go through the mean, with the standard deviation shown as error bars. So say the data is:
DOY DATA
30, 12
30, 10
30, 8
120, 6
110, 5
I'd Like to transform it to:
DOY DATA STD
30, 10, 2
120, 6, 0
110, 5, 0
I then wish to plot this data with the standard deviation representing error bars.
How would I go about this?
Thanks
You can use Matlab's dataset to get easy grouping -
>> doy = [30 30 30 120 110]';
>> data = [12 10 8 6 5]';
The next line creates a dataset object with two columns, called "doy" and "data"
>> ds = dataset(doy, data);
This line says to calculate group statistics, using "doy" as the grouping variable and computing the mean and std for each group. It also gives you the number of observations in each group in the column GroupCount.
>> grpstats(ds, 'doy', {'mean', 'std'})
ans =
doy GroupCount mean_data std_data
30 30 3 10 2
110 110 1 5 0
120 120 1 6 0
You could also use accumarray, especially if you don't have the Statistics Toolbox:
doy = [30 30 30 120 110]';
data = [12 10 8 6 5]';
[~,ind,subs] = unique(doy);
means = accumarray(subs, data, size(ind), @mean);
stds = accumarray(subs, data, size(ind), @std);
final = [doy(ind), means, stds]
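The accumarray route can also reproduce the GroupCount column of grpstats: accumulating a scalar 1 per row counts the observations in each group. A minimal sketch using the same doy and data vectors:

```matlab
doy  = [30 30 30 120 110]';
data = [12 10 8 6 5]';

[udoy, ~, subs] = unique(doy);                 % sorted unique days and group index per row
counts = accumarray(subs, 1);                  % observations per day (like GroupCount)
means  = accumarray(subs, data, [], @mean);
stds   = accumarray(subs, data, [], @std);     % std of a single value is 0

final = [udoy, counts, means, stds]            % [doy, count, mean, std]
```

For the plot in the question, errorbar(final(:,1), final(:,3), final(:,4)) draws the means with plus/minus one standard deviation as error bars.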

Leap year mean in MATLAB

Good morning. I have a matrix (eout) with 14610 rows and 16 columns.
The 14610 rows represent the individual days of the period between 01-01-1960 and 31-12-2000.
What I need is a new matrix with 40 rows and 16 columns containing the mean for each year,
something like a mean over each consecutive block of 365 rows.
The problem I have is the leap year every 4 years.
Any suggestions to solve this?
For a start, to get the number of days for a given year:
function n = ndays(year)
tmp = repmat([1,1,0,0,0],numel(year),1);
n = datenum([year(:)+1,tmp])-datenum([year(:), tmp]);
end
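With this, you could collect the rows, e.g. with mat2cell. Note that 14610 rows and 40 yearly means correspond to 1960 through 1999 (40 years, 10 of them leap years); 1960:2000 would be 41 years and 14976 days, and mat2cell fails unless the row counts add up to the number of rows:
rows_per_year = ndays(1960:1999);
chunks = mat2cell(yourInputMatrix, rows_per_year, size(yourInputMatrix,2));
means = cell2mat(cellfun(@(x) mean(x,1), chunks, 'UniformOutput', false));
(The latter part is untested.)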
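An alternative that avoids counting days per year is to derive the calendar year of every row from its date and group with accumarray, as in the first question above. This is a sketch under the assumption that row 1 is 1960-01-01 and the rows are consecutive days through 1999-12-31; the eout below is hypothetical stand-in data (the row number in every column) chosen so the yearly means are easy to check.

```matlab
% Dates of the rows: 1960-01-01 .. 1999-12-31 (14610 days)
dn = (datenum(1960,1,1):datenum(1999,12,31))';

% Hypothetical stand-in for eout: 16 columns, each holding the row number
nCols = 16;
eout = repmat((1:numel(dn))', 1, nCols);

% Calendar year of every row, then a group index per year
v = datevec(dn);
[uy, ~, idx] = unique(v(:,1));

% Mean of each column per year (leap years simply contribute 366 rows)
yearlyMeans = zeros(numel(uy), nCols);
for c = 1:nCols
    yearlyMeans(:, c) = accumarray(idx, eout(:, c), [], @mean);
end
```

With this stand-in data the 1960 mean is mean(1:366) = 183.5, which confirms that the leap day lands in the right year.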

Daily values to Monthly Means for Multiple Years Matlab

I have observed daily data that I need to compare to generated monthly data, so I need the mean of each month over the thirty-year period.
My observed data set is currently a 365x31 matrix, with each row being a day (no leap years!), the other 30 columns being the years, and the extra first column being the month number (1-12).
The problem I am having is that I can only get a script to compute the mean over all years, i.e. I cannot figure out how to get it to do each column separately. An example of the data is below:
1 12 14
1 -15 10
2 13 3
2 2 37
...all the way to 12 for 365 rows
So, to recap: I need the mean of [12; -15; 13; 2], then of [14; 10; 3; 37], and so on.
I have been trying to loop with the unique() function, which works for finding the rows to average but gives incorrect means. I need it to handle each month (28-31 rows) and each column individually. The result should be a 12x30 matrix. I feel like I am missing something SIMPLE. Code:
u = unique(m); %get unique values of m (months) ie) 1-12
for i=1:length(u)
month(i) = mean(obatm(u(i), 2:31)); % the average for each month of each year
end
Appreciate any ideas! Thanks!
You can simply filter the rows for each month and then apply mean, like so:
month = zeros(12, size(obatm, 2));
for k = 1:12
month(k, :) = mean(obatm(obatm(:, 1) == k, :));
end
EDIT:
If you want something fancy, you can also do this:
cols = size(obatm, 2) - 1;
subs = bsxfun(@plus, obatm(:, 1), (0:12:12 * (cols - 1)));
vals = obatm(:, 2:end);
month = reshape(accumarray(subs(:), vals(:), [12 * cols, 1], @mean), 12, cols)
Look, Ma, no loops!
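To convince yourself that the loop and the accumarray version agree, you can run both on synthetic data. The sketch below builds a hypothetical obatm with the month numbers of a non-leap year (2001) in column 1 and 30 columns of random values; the loop uses columns 2:end so that, like the accumarray version, it returns the 12x30 result asked for (the loop in the answer keeps the month number as an extra first column).

```matlab
% Hypothetical obatm: column 1 = month of each day of 2001 (a non-leap year), columns 2..31 = 30 years
v = datevec(datenum(2001,1,1) + (0:364)');
obatm = [v(:,2), rand(365, 30)];

% Loop version, data columns only -> 12x30
loopMeans = zeros(12, size(obatm,2) - 1);
for k = 1:12
    loopMeans(k, :) = mean(obatm(obatm(:,1) == k, 2:end), 1);
end

% accumarray version from the answer
cols = size(obatm, 2) - 1;
subs = bsxfun(@plus, obatm(:, 1), (0:12:12 * (cols - 1)));
vals = obatm(:, 2:end);
accMeans = reshape(accumarray(subs(:), vals(:), [12 * cols, 1], @mean), 12, cols);

maxDiff = max(abs(loopMeans(:) - accMeans(:)))   % should be at rounding-error level
```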

Representing the day and some parameters of a month

Could you please help me with this?
I have 3 matrices, P (Power), T (Temperature) and H (Humidity).
Every matrix has 31 columns (days) with 24 rows per column,
containing the data for March 2000.
For example, the matrix P has 31 columns, where every column holds
one day of Power data over 24 hours; the same holds for T and H.
I tried to write a MATLAB program that accomplishes my goal, but
it gave me errors.
My aim is:
In the MATLAB command window, the program should prompt the user with the following phrase:
Please enter the day number of March, 2000 from 1 to 31:
I know this part is done as follows:
Name = input('Please enter the day number of March, 2000 from 1 to 31: ')
Then, when for example 5 is entered, the result shown should be a matrix containing the following:
1st column: the day name (or a number representing it)
2nd column: simple numbers from 1 to 24 representing the hours for that day
3rd column: the 24 points of P of that day extracted from the original P
(the column number 5 of the original P)
4th column: the 24 points of T of that day extracted from the original T
(the column number 5 of the original T)
5th column: the 24 points of H of that day extracted from the original H
(the column number 5 of the original H)
Any help will be highly appreciated,
Regards
Here is what you ask for:
% some sample data
P = rand(24,31);
T = rand(24,31);
H = rand(24,31);
% input day number
daynum=input('Please enter the day number of March, 2000 from 1 to 31: ');
[r, c] = size(P);
% generate output matrix
OutputMatrix = zeros(r,5);
OutputMatrix(:,1) = repmat(weekday(datenum(2000,3,daynum)),r,1);
OutputMatrix(:,2) = 1:r;
OutputMatrix(:,3) = P(:,daynum);
OutputMatrix(:,4) = T(:,daynum);
OutputMatrix(:,5) = H(:,daynum);
disp(OutputMatrix)
The matrix can be generated in one line, but this way is clearer.
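For reference, the one-line construction could look like the sketch below, which concatenates the five columns horizontally. The day number is fixed to 5 here instead of coming from input() so the snippet runs unattended; P, T and H are random sample data as in the answer.

```matlab
P = rand(24,31);  T = rand(24,31);  H = rand(24,31);   % sample data
daynum = 5;                  % fixed instead of input() for this sketch
r = size(P, 1);

% [weekday number, hour 1..24, P, T and H of the chosen day]
OutputMatrix = [repmat(weekday(datenum(2000,3,daynum)), r, 1), (1:r)', ...
                P(:,daynum), T(:,daynum), H(:,daynum)];
```

weekday returns 1 for Sunday through 7 for Saturday; March 5, 2000 was a Sunday, so column 1 holds 1 here.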
Is it always for March 2000? :) Where do you get this information from?