How to perform repeated regression in matlab? - matlab

I have an excel file that contains 5 columns and 48 rows (water demand, population and rainfall data for four years (1997-2000) of each month)
Year Month Water_Demand Population Rainfall
1997 1 355 4500 25
1997 2 375 5000 20
1997 3 320 5200 21
.............% rest of the month data of year 1997.
1997 12 380 6000 24
1998 1 390 6500 23
1998 2 370 6700 20
............. % rest of the month data of year 1998
1998 12 400 6900 19
1999 1
1999 2
.............% rest of the month data of year 1999 and 2000
2000 12 390 7000 20
I want to do a multiple linear regression in MATLAB. The dependent variable is water demand and the independent variables are population and rainfall. I have written the following code for all 48 rows:
A1=data(:,3);
A2=data(:,4);
A3=data(:,5);
x=[ones(size(A1)),A2,A3];
y=A1;
b=regress(y,x);
yfit=b(1)+b(2).*A2+b(3).*A3;
Now I want to repeat this. First, I want to exclude row 1 (i.e. the data for year 1997, month 1) and run the regression on the remaining 47 rows. Then I want to exclude row 2 and run the regression on row 1 and rows 3-48. Then I want to exclude row 3 and run the regression on rows 1-2 and rows 4-48, and so on. Each run always uses 47 data points, because exactly one row is excluded. Finally, I want a table of the regression coefficients and yfit for each run.

A simple way I can think of is creating a for loop and a temporary "under test" matrix that is exactly the matrix you have without the line you want to exclude, like this
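number_of_lines = size(data,1);   % 48 rows in this data set
C = zeros(3,number_of_lines);     % column n holds the coefficients of run n
for n = 1:number_of_lines
    under_test = data;
    % this excludes the nth line of the matrix
    under_test(n,:) = [];
    B1 = under_test(:,3);   % water demand (dependent variable)
    B2 = under_test(:,4);   % population
    B3 = under_test(:,5);   % rainfall
    x1 = [ones(size(B1)),B2,B3];
    y1 = B1;
    C(:,n) = regress(y1,x1);
end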
I'm sure you can optimize this by using some of the matlab functions that operate on vectors, without using the for loop. But I think for only 48 lines it should be fast enough.
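The question also asks for yfit from each run, which the loop above does not store. Here is a minimal sketch of one way to collect both, under some assumptions: the data matrix is synthetic (the values are made up, only the 48-by-5 layout follows the question), the names coeffs, yfit and keep are my own, and the backslash operator is used in place of regress so that no toolbox is needed (for a full-rank design matrix both give the least-squares coefficients).

```matlab
% Synthetic stand-in for the 48-by-5 spreadsheet (all values are made up).
rng(0);
nRows  = 48;
year   = kron((1997:2000)', ones(12,1));
month  = repmat((1:12)', 4, 1);
pop    = 4500 + 50*(1:nRows)' + 20*randn(nRows,1);
rain   = 20 + 3*randn(nRows,1);
demand = 100 + 0.04*pop + 1.5*rain + 5*randn(nRows,1);
data   = [year, month, demand, pop, rain];

X = [ones(nRows,1), data(:,4), data(:,5)];   % design matrix for all rows
y = data(:,3);                               % water demand

coeffs = zeros(nRows, 3);       % row n: coefficients of the run without row n
yfit   = zeros(nRows, nRows);   % row n: fitted values for all rows from run n

for n = 1:nRows
    keep    = true(nRows,1);
    keep(n) = false;                 % leave row n out
    b = X(keep,:) \ y(keep);         % least-squares fit on the other 47 rows
    coeffs(n,:) = b.';
    yfit(n,:)   = (X*b).';           % yfit(n,n) is the prediction for the excluded row
end
```

Using a logical mask instead of deleting a row from a copy avoids rebuilding the matrix in every iteration, and the diagonal of yfit gives the out-of-sample prediction for each excluded row if you need it.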

Related

in matlab multiple count ifs in matrix

Say I have the following data, S =
Year Week Postcode
2009 24 2035
2009 24 4114
2009 24 4127
2009 26 4114
2009 26 4556
2009 27 7054
2009 27 6061
2009 27 4114
2009 27 2092
2009 27 2315
2009 27 7054
2009 27 4217
2009 27 4551
2009 27 2035
2010 1 4132
2010 1 2155
2010 5 4114 ... (>60000 rows)
In Matlab, I would like to create a matrix with:
column 1: year (2006-2014)
column 2: week (1-52 for each year)
then the next n columns are unique postcodes where the data in each of these columns counts the occurrences from my data, S.
For example:
year week 2035 4114 4127 4556 7054
2009 24 1 1 1 0 0
2009 25 0 0 0 0 0
2009 26 0 1 0 1 0
2009 27 1 1 0 0 2
2009 28 0 0 0 0 0
Thanks if you can help!
Here is a working script which achieves this tabulation. The output is in the data table. You should:
Read the documentation on unique, tables, logical indexing and sortrows, as these are the key tools I used below.
Adapt the script to work with your data. This may involve changing matrices to cell arrays to deal with string inputs etc.
Possibly adapt this to be a function, for cleaner use if this is used regularly / on different data.
Code, fully commented for explanation:
% Use rng for repeatability in rand, n = num data entries
rng('default')
n = 100;
% Set up test data. You would use 3 equal length vectors of real data here
years = floor(rand(n,1)*9 + 2006); % random integer between 2006,2014
weeks = floor(rand(n,1)*52 + 1); % random integer between 1, 52
postcodes = floor(rand(n,1)*10)*7 + 4000; % arbitrary integers over 4000
% Create year/week values like 2017.13, get unique indices
[~, idx, ~] = unique(years + weeks/100);
% Set up table with year/week data
data = table();
data.Year = years(idx);
data.Week = weeks(idx);
% Get columns
uniquepostcodes = unique(postcodes);
% Cycle over unique columns, assign data
for ii = 1:numel(uniquepostcodes)
% Variable names cannot start with a numeric value, make start with 'p'
postcode = ['p', num2str(uniquepostcodes(ii))];
% Create data column variable for each unique postcode
data.(postcode) = zeros(size(data.Year,1),1);
% Count occurrences of postcode in each date row
% This uses logical indexing of original data, looking for all rows
% which satisfy year and week of current row, and postcode of column.
for jj = 1:numel(data.Year)
data.(postcode)(jj) = sum(years == data.Year(jj) & ...
weeks == data.Week(jj) & ...
postcodes == uniquepostcodes(ii));
end
end
% Sort week/year data so all is chronological
data = sortrows(data, [1,2]);
% To check all original data was counted, you could run
% sum(sum(table2array(data(:,3:end))))
% ans = n, means that all data points were counted somewhere
On my PC, this takes less than 2.4 seconds for n = 60,000. There are almost definitely optimisations which can be made, but for something which may be used infrequently, this seems acceptable.
There is a linear increase in processing time, relative to the number of unique postcodes. This is because of the loop structure. So if you double the unique postcodes (20 rather than my example of 10) the time is nearer 4.8 seconds - twice as long.
If this solves your problem, consider accepting this as the answer.
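As a possible optimisation of the nested loops, the same counts can probably be built in one pass with unique and accumarray. This is only a sketch: S holds a handful of rows copied from the question, and the names yw, pc, counts and result are my own. Like the script above, it only produces rows for year/week pairs that actually occur in the data.

```matlab
% A few rows from the question: [year, week, postcode]
S = [2009 24 2035;
     2009 24 4114;
     2009 24 4127;
     2009 26 4114;
     2009 26 4556;
     2009 27 7054;
     2009 27 4114;
     2009 27 7054;
     2009 27 2035];

% Distinct (year, week) pairs and, for every row of S, which pair it belongs to
[yw, ~, rowIdx] = unique(S(:,1:2), 'rows');
% Distinct postcodes and, for every row of S, which postcode it has
[pc, ~, colIdx] = unique(S(:,3));

% Add 1 at (pair, postcode) for every row of S
counts = accumarray([rowIdx(:), colIdx(:)], 1, [size(yw,1), numel(pc)]);

result = [yw, counts];   % columns: year, week, then one column per postcode in pc
```

Here pc lists the postcodes in the same order as the count columns, so it can be used to label them afterwards.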

MATLAB APPLY CUMSUM IN STEPS

I have data of integers in x = 500 X 612 matrix. I need a new variable xx in a 500 X 612 matrix but I need to apply cumsum along each row (500) across 12 column steps and applying cumsum like this 51 times --> 500 X (12 X 51) matrix. Then I need a for loop to produce 51 plots of the 500 rows and 12 columns of the cumsum time series. thank you!
I will rephrase what the question is asking to benefit those who are reading.
The OP wishes to segment a matrix into chunks by splitting it up into groups of columns. A cumsum is applied along each row within each chunk, and the chunks are then concatenated together to build the final matrix. As such, given this source matrix:
x =
1 2 3 4 5 6 7 8 9 10 11 12
13 14 15 16 17 18 19 20 21 22 23 24
Supposing that we wish to split up the matrix at columns 3, 6, 9 and 12, we will have four chunks to work with. We do a cumsum on each of these blocks individually and piece the final result together. So the result would look like the following:
xx =
1 3 6 4 9 15 7 15 24 10 21 33
13 27 42 16 33 51 19 39 60 22 45 69
First, you need to determine how many chunks you want to break the matrix up into. In this example, we wish to segment the matrix into 4 chunks: columns 1 - 3, columns 4 - 6, columns 7 - 9, and columns 10 - 12. As such, I'm going to reshape this matrix so that each column is an individual row from a chunk in this matrix. We then apply cumsum over this reshaped matrix and reshape it back to what you had originally.
Therefore, do this:
num_chunks = 4; %// Columns 3, 6, 9, 12
divide_point = size(x,2) / num_chunks; %// Determine how many elements are in a row for a cumsum
x_reshape = reshape(x.', divide_point, []); %// Get reshaped matrix
xy = cumsum(x_reshape); %// cumsum over all columns individually
xx = reshape(xy, size(x,2), size(x,1)).'; %// Reconstruct matrix
The third line of code, x_reshape = reshape(x.', divide_point, []);, may seem a bit daunting, but it's actually not that bad. When you reshape something in MATLAB, it collects values column-wise and reshapes the input into an output of the specified size. We want each row of a chunk to end up in its own column so that cumsum can work on each column, so the values have to be collected row-wise, which is why the matrix is transposed first.
Next, divide_point tells you how many elements there are in a single row of one chunk. We therefore want a matrix of size divide_point x N, where N is the total number of chunk rows over all chunks. Rather than calculating N by hand, the [] syntax lets reshape infer it automatically from the total number of elements in the input.
We then perform cumsum on each of these columns and reshape the result back into the shape of the input. Because reshape fills values in column-major order, we first reshape into the transposed size (size(x,2) x size(x,1)) and then transpose that result to recover the row order you want.
We get:
xx =
1 3 6 4 9 15 7 15 24 10 21 33
13 27 42 16 33 51 19 39 60 22 45 69
In general, the total number of columns in your matrix needs to be evenly divisible by the number of chunks. For example, given the above, if you were to try to segment this 12-column matrix into 5 chunks, divide_point would be 12 / 5 = 2.4, which is not an integer, and reshape would throw an error.
As another example, let's say we wanted to break up the matrix into 6 chunks. Therefore, by setting num_chunks = 6, we get:
xx =
1 3 3 7 5 11 7 15 9 19 11 23
13 27 15 31 17 35 19 39 21 43 23 47
You can see that cumsum restarts at every second column, as we desired 6 chunks and to get 6 chunks with a matrix of 12 columns, a chunk is created at every second column.
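For comparison, here is a sketch of an alternative that avoids the transposes by reshaping into a 3-D array, so that each page holds one chunk. The names step, x3 and c3 are my own, and the 2 x 12 matrix is the small example from above rather than the 500 x 612 data.

```matlab
x = [1:12; 13:24];

step    = 3;                    % number of columns per chunk
nChunks = size(x,2) / step;     % must be an integer, here 4

% Page k of x3 is the block x(:, (k-1)*step+1 : k*step)
x3 = reshape(x, size(x,1), step, nChunks);

% Cumulative sum along the columns of every page, i.e. within each chunk
c3 = cumsum(x3, 2);

% Flatten the pages back side by side into the original shape
xx = reshape(c3, size(x));
```

For the 500 x 612 case with step = 12, c3(:,:,k) would be the 500 x 12 block for chunk k, which could be handed to plot inside a loop over k = 1:51 to produce the requested figures.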

how to extract elements of a sparse matrix?

I have a sparse matrix:
A=
(14,13) 0.5286
(15,14) 0.6781
(16,15) 0.5683
(17,16) 1.2773
(18,17) 1.0502
(19,18) 0.4966
(21,19) 0.9951
(21,20) 0.4522
(22,21) 0.8507
(23,22) 1.0727
(24,23) 0.8288
(25,24) 0.5811
(26,25) 0.8235
(28,26) 1.5128
(30,28) 0.7966
(30,29) 0.6363
(31,29) 0.8254
(32,31) 0.8573
(33,32) 1.0753
It is the result of a minimum spanning tree. Now I want to extract the numbers 13, 14, 15, ..., 26, 28, 29, ..., 33 (as you can see, 27 does not appear among them).
pred gives: 13 14 15 16 17 18 19 21 22 23 24 25 26 28 29 30 31 32, in which 20 and 33 are missing.
How can I extract all of the numbers listed above?
[ii jj] = find(A);
answer = unique([ii(:); jj(:)]);
should do it.
Note that the find command with two outputs gives you the row and column index of all nonzero elements. Since you have a minimum spanning tree, each number you care about needs to occur at least once in the row or column (for example your matrix never has the number 29 in the first index, but it occurs in the second).
The unique function makes sure that each number that occurs is only represented once.
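To see it in action, here is a small sketch built from a few of the entries listed in the question. The 33 x 33 size is an assumption on my part, and the variable names rows, cols, vals and nodes are my own.

```matlab
% A few of the (row, column, value) entries from the question
rows = [14 15 21 21 30 31 33];
cols = [13 14 19 20 28 29 32];
vals = [0.5286 0.6781 0.9951 0.4522 0.7966 0.8254 1.0753];
A = sparse(rows, cols, vals, 33, 33);   % assumed size

% Row and column indices of all nonzero entries
[ii, jj] = find(A);

% Every index that appears at least once, sorted and without repeats
nodes = unique([ii(:); jj(:)]).';
```

Note that 13 only appears as a column index and 33 only as a row index, which is why both outputs of find are needed.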

Matlab simulation: Query regarding generating random numbers

I am doing some simulation studies and, for the initial studies, I am trying to simulate 100 gas particles and then randomly group these particles into 5 groups, 10 or 100 times (with no group left empty). After that I have to find the group with the highest number of particles, and that number.
for example
100 gas particles
1 2 3 4 5(groups) Total particle group/Highest number
20|20|20|20|20 100 1-2-3-4-5/20
70|16|04|01|09 100 1/70
18|28|29|10|15 100 3/29
.
.
etc
I have used this to generate 5 random numbers a single time:
for i=1:1
randi([1,100],1,5)
end
ans =
50 41 9 60 88
But how do I find the highest number and its group?
Use the max function :
a = [50 41 9 60 88];
[C,I] = max(a)
C should be equal to 88 and I to 5.
For the special case of a tie (the first line in your example), max returns the index of the first occurrence of the maximum, so I would be 1 there.
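If you need every tied group rather than just the first one, a comparison against the maximum should do it. The sketch below uses the three example rows from the question; the names groups, highest, firstIdx, tied and labels are my own.

```matlab
% The three example runs from the question, one run per row
groups = [20 20 20 20 20;
          70 16  4  1  9;
          18 28 29 10 15];

% Row-wise maximum and the index of its first occurrence
[highest, firstIdx] = max(groups, [], 2);

nRuns  = size(groups,1);
tied   = cell(nRuns,1);    % all group numbers that reach the maximum
labels = cell(nRuns,1);    % text in the "group/highest number" style
for r = 1:nRuns
    tied{r}   = find(groups(r,:) == highest(r));
    list      = sprintf('%d-', tied{r});          % e.g. '1-2-3-4-5-'
    labels{r} = sprintf('%s/%d', list(1:end-1), highest(r));
end
```

Passing [] and 2 to max makes it work along each row, so all runs are handled in one call.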

How to extract new matrix from existing one

I have a large number of entries arranged in three columns. Sample of the data is:
A=[1 3 2 3 5 4 1 5 ;
22 25 27 20 22 21 23 27;
17 15 15 17 12 19 11 18]'
I want the first column (hours) to control the entire matrix to create new matrix as follows:
Anew=[1 2 3 4 5 ; 22.5 27 22.5 21 24.5; 14 15 16 19 15]'
Where the 2nd column of Anew is the average value of each corresponding hour for example:
from matrix A:
at hour 1, we have 2 values in 2nd column correspond to hour 1
which are 22 and 23 so the average is 22.5
Also the 3rd column: at hour 1 we have 17 and 11 and the
average is 14 and this continues to the hour 5 I am using Matlab
You can use ACCUMARRAY for this:
Anew = [unique(A(:,1)),...
cell2mat(accumarray(A(:,1),1:size(A,1),[],@(x){mean(A(x,2:3),1)}))]
This uses the first column A(:,1) as indices (x) to pick the values in columns 2 and 3 for averaging (mean(A(x,2:3),1)). The curly brackets and the call to cell2mat allow you to work on both columns at once. Otherwise, you could do each column individually, like this
Anew = [unique(A(:,1)), ...
accumarray(A(:,1),A(:,2),[],@mean), ...
accumarray(A(:,1),A(:,3),[],@mean)]
which may actually be a bit more readable.
EDIT
The above assumes that there's no missing entry for any of the hours. It will result in an error otherwise. Thus, a more robust way to calculate Anew is to allow for missing values. For easy identification of the missing values, we use the fillval input argument to accumarray and set it to NaN.
Anew = [(1:max(A(:,1)))', ...
accumarray(A(:,1),A(:,2),[],@mean,NaN), ...
accumarray(A(:,1),A(:,3),[],@mean,NaN)]
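To illustrate the missing-hour case, here is a small sketch with made-up data in which hours 2 and 4 never occur. The variable Apresent is my own name for the result with the empty hours dropped.

```matlab
% Made-up sample: hours 2 and 4 are missing
A = [ 1  3  3  5  1  5;
     22 25 20 22 23 27;
     17 15 17 12 11 18]';

% One row per hour from 1 to the largest hour; hours without data get NaN
Anew = [(1:max(A(:,1)))', ...
        accumarray(A(:,1), A(:,2), [], @mean, NaN), ...
        accumarray(A(:,1), A(:,3), [], @mean, NaN)];

% Optionally keep only the hours that actually have data
Apresent = Anew(~isnan(Anew(:,2)), :);
```

Rows 2 and 4 of Anew contain NaN in the averaged columns, so the gaps stay visible instead of causing a size mismatch.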
You can use consolidator (a function from the MATLAB File Exchange, not part of base MATLAB) to do the work for you.
[Afinal(:,1),Afinal(:,2:3)] = consolidator(A(:,1),A(:,2:3),@mean);
Afinal
Afinal =
1 22.5 14
2 27 15
3 22.5 16
4 21 19
5 24.5 15