How to Suitably Print Out Desired Output In MATLAB? [duplicate] - matlab

This question already exists:
How do I perform an if-statement on isnan to check for something?
Closed 3 months ago.
I am writing a program in MATLAB for my Computer Science 1 class. We are only allowed to use what we have learned in class unless explicitly told otherwise. My knowledge of MATLAB is fairly small. In this program, I am trying to read from an Excel file using xlsread. The file has the names, like Day1, Day2, etc. in the rows, and the columns are machines. It's formatted like this, in general terms:
Day1 2 3\n (\n is not actually part of it, just for readability)
Day2 6 9\n
Day3 1 43\n
Some days, the machines don't work, so no widgets are produced that day. In that case, the value is NaN. The program takes the average (without using a built-in average function) and prints the output to another file. I need to find out how to do this for all of the rows. I'm fairly certain it requires a for loop. Here's the code:
for r = 1: rows
    for c = 1: cols
        if isnan(machines(r, c)) == 0
            if(isnan >= blankDays)
                fprintf(output, '\nThe machine was not working for %f.1f%% of the days\n', blankDays);
            else
                fprintf(output, '\nThe average is %f.4f\n', myAverage);
            end
        end
    end
end
Help would be appreciated.
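A minimal sketch of one way to approach this, under several assumptions that are not in the question: the averages are taken per machine (per column), the sample matrix stands in for the numeric output of xlsread, and report.txt is a hypothetical output file name. It counts the non-NaN entries by hand instead of using a built-in average function, and writes the format specifiers as %.1f and %.4f.

```matlab
% Hypothetical data standing in for the numeric output of xlsread:
% rows are days, columns are machines, NaN marks a day without production.
machines = [2 3; 6 NaN; NaN 43; 1 9];
[rows, cols] = size(machines);

output = fopen('report.txt', 'w');   % hypothetical output file name
averages = zeros(1, cols);
blankPct = zeros(1, cols);

for c = 1:cols
    total = 0;    % running sum of the valid values in this column
    worked = 0;   % number of days the machine produced something
    for r = 1:rows
        if ~isnan(machines(r, c))
            total = total + machines(r, c);
            worked = worked + 1;
        end
    end
    blankPct(c) = 100 * (rows - worked) / rows;
    averages(c) = total / worked;   % NaN if the machine never worked
    fprintf(output, 'Machine %d: average %.4f, idle %.1f%% of the days\n', ...
        c, averages(c), blankPct(c));
end

fclose(output);
```

If the averages are wanted per day instead, the two loops can be swapped so that the outer loop runs over rows.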

Related

Avoiding the use of for loops

I would like to get some help with how to avoid using for loops. I've seen several similar questions but have not been able to figure out something specific to my needs. Currently I use loops, and the code is very bulky and messy. Below is the structure of the data and what I would like to achieve:
I have to index data from 1080 timepoints which come from variable mean_data, which is a 1080x1 double. Some of the timepoints belong to specific task conditions and specific events that I am interested in. There are 3 conditions (cond1, cond2, cond3) and 4 task events (event1, event2, event3, event4). This information comes from variable params. Specifically, column 8 of params holds the condition information (1, 2, 3 mean cond1, cond2, cond3, respectively). The event information can be obtained from column 11 in params. Below is what I can do with loops:
for c=1:size(params,1)
    if params(c,8)==1
        cond1_event1(end+1,1)=mean([data(params(c,11)+3,1),data(params(c,11)+4,1)]);
        cond1_event2(end+1,1)=mean([data(params(c,11)+6,1),data(params(c,11)+7,1)]);
        cond1_event3(end+1,1)=mean([data(params(c,11)+8,1),data(params(c,11)+9,1)]);
        cond1_event4(end+1,1)=mean([data(params(c,11)+10,1),data(params(c,11)+11,1)]);
    elseif params(c,8)==2
        cond2_event1(end+1,1)=mean([data(params(c,11)+3,1),data(params(c,11)+4+1,1)]);
        etc.
    elseif params(c,8)==3
        cond3_event1(end+1,1)=mean([data(params(c,11)+3,1),data(params(c,11)+4,1)]);
        etc.
    end
end
The loops make it clear, but the code is just too long. Does anyone have any suggestions for making this a bit more elegant? The output should yield 12 variables (3 conditions x 4 events). Each variable is an nx1 double. Thank you.
You could simply use logical indexing, then use vertical concatenation.
idx = params(c, 8) == 1;
cond1_event1 = [cond1_event1; mean(...)];
Then repeat for the other conditions, or loop over the condition codes 1:3.
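A minimal sketch of how logical indexing could replace the row-by-row loop. The sample data and params values are made up, the offsets table is read off the cond1 branch of the loop in the question, and all three conditions are assumed to use the same offsets. Storing the results in a 3x4 cell array (result{cond, event}) rather than 12 separately named variables is a design choice of this sketch, not something the question asks for.

```matlab
% Hypothetical example data: 1080 samples and a small params matrix.
data = (1:1080)';
params = zeros(6, 11);
params(:, 8)  = [1; 2; 3; 1; 2; 3];              % condition codes
params(:, 11) = [10; 50; 90; 130; 170; 210];     % event reference indices

% Sample offsets for events 1..4, taken from the cond1 branch of the loop.
offsets = [3 4; 6 7; 8 9; 10 11];

result = cell(3, 4);   % result{cond, event} is an n-by-1 double
for k = 1:3
    % Logical indexing: reference indices of all rows with condition k.
    onset = params(params(:, 8) == k, 11);
    for e = 1:4
        pair = [data(onset + offsets(e, 1)), data(onset + offsets(e, 2))];
        result{k, e} = mean(pair, 2);   % row-wise mean of the two samples
    end
end
```

The remaining loops run over 3 conditions and 4 events only, so their cost does not grow with the number of rows in params.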

Matlab - Splitting a column into two (efficiently)

I had previously written some code to split 3 columns into 4, but it was very inefficient and time consuming. As I am working with millions of rows, it wasn't suitable. (Below is my previous code.)
tline = fgetl(fid);
ID=tline(1:4);
IDN = str2double(ID);
Day=tline(6:8);
DayN = str2double(Day);
HalfHour=tline(9:10);
HalfHourN = str2double(HalfHour);
Usage=tline(12:end);
UsageN = str2double(Usage);
There must be a more efficient and quicker way of doing this?
Going back to basics, I have produced an x by 3 matrix, but I require an x by 4 matrix.
To show what I am trying to do, examining one row -
I am trying to change
1001 36501 1005
to
1001 365 01 1005
Any help would be much appreciated!
Edit:
The second column I am trying to divide into two, is always composed of 5 characters. I am trying to get the first 3 characters into their own column, likewise for the remaining characters.
What might take time in your case is actually the use of the str2double function. It is known that this built-in function becomes very slow when the data set is large. You might try to get rid of it if possible.
You can use modulo:
ans = (36501 - mod(36501,100))/100
This gives you 365.
If you want the 01, it is mod(36501,100), which returns 1.
This effectively splits your second column into two separate numbers, which you can then rename, etc.
On second thought, if all the numbers in your second column have 5 digits, this can be extremely efficient, since MATLAB computes mod as b = a - m.*floor(a./m);
See http://uk.mathworks.com/help/matlab/ref/mod.html. It should work for vectors (i.e. your second column).
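The modulo approach can be applied to the whole second column at once. A minimal sketch, assuming the data is already in a numeric x-by-3 matrix (here a made-up two-row matrix M) and that the second column always encodes three day digits followed by two half-hour digits:

```matlab
% Hypothetical x-by-3 input matrix; column 2 holds the 5-digit code.
M = [1001 36501 1005;
     1002   148 2010];   % 148 stands for the code 00148

day      = floor(M(:, 2) / 100);   % leading three digits, e.g. 365
halfHour = mod(M(:, 2), 100);      % trailing two digits, e.g. 1

% Assemble the x-by-4 result.
M4 = [M(:, 1), day, halfHour, M(:, 3)];
```

Leading zeros are not preserved in numeric form, so 01 is stored as 1.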

MATLAB Loop Programming

I've been stuck on a MATLAB coding problem where I need to create market weights for many stocks from a large data file with multiple days and portfolios.
The other day I received help from an expert, who used nested loops. It worked, but I don't understand what he did in the final line. Could anyone shed some light on it and explain that last line of code?
xp = x; % where x = market value
dates=unique(x(:,1)); % finds the unique dates in the data set (dates are in column 1)
for i=1:size(dates,1) % (creates an empty matrix to fill the data in)
    for j=5:size(xp,2)
        xp(xp(:,1)==dates(i),j)=xp(xp(:,1)==dates(i),j)./sum(xp(xp(:,1)==dates(i),j)); % (help???)
    end
end
Any comments are much appreciated!
To understand the code, you have to understand the colon operator, logical indexing and the difference between / and ./. If any of these is unclear, please look it up in the documentation.
The following code does the same, but is easier to read because I separated each step into a single line:
dates=unique(x(:,1));
%iterates over all dates. size(dates,1) returns the number of dates
for i=1:size(dates,1)
    %iterates over the fifth to last columns, which contain the data that will be normalised.
    for j=5:size(xp,2)
        %mdate is a logical vector, which is used to select the rows with the currently processed date.
        mdate=(xp(:,1)==dates(i));
        %s is the sum of all values in column j that share the same date
        s=sum(xp(mdate,j));
        %divide all values in column j with the same date by s, so that they sum to 1
        xp(mdate,j)=xp(mdate,j)./s;
    end
end
I suggest using the debugger to step through this code.
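To see the effect on concrete numbers, here is a minimal sketch with made-up data: two dates, three filler columns, and a single market value column in position 5. The layout of x is an assumption for illustration only.

```matlab
% Hypothetical data: column 1 = date, columns 2-4 = filler, column 5 = market value.
x = [1 0 0 0 10;
     1 0 0 0 30;
     2 0 0 0  5;
     2 0 0 0 15];
xp = x;

dates = unique(x(:, 1));
for i = 1:size(dates, 1)
    mdate = (xp(:, 1) == dates(i));   % rows belonging to the current date
    for j = 5:size(xp, 2)
        % Divide each value by the total for its date, giving market weights.
        xp(mdate, j) = xp(mdate, j) ./ sum(xp(mdate, j));
    end
end

disp(xp(:, 5)');   % weights per row
```

For date 1 the values 10 and 30 become 0.25 and 0.75, and for date 2 the values 5 and 15 also become 0.25 and 0.75, so the weights within each date sum to 1.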

strcmp files - Very large file size output

I'm reading in a csv file that is about 80MB - data_O3. It's about 250,000 x 5 in size. I created E, which is a little bit larger because it has all the days (data_O3 is missing some days). I want to compare the two so that if the date (saved in variable d3) and siteID (d4) are the same, the data point (column 5) is placed in E.
for j = 1:size(data_O3,1)
    E(strcmp(d3,data_O3{j,3})&d4 == data_O3{j,4},5) = data_O3(j,5);
end
This script works fine, but for some reason, running it takes longer than expected. I've run the same code for other data that were only slightly smaller with no problem. Is this an issue with the strcmp code or something else?
The script and files used can be found here: https://www.dropbox.com/sh/7bzq3m1ixfeuhu6/i4oOvxHPkn
There are certainly a number of ways to speed this up significantly.
First of all, read all numeric data in as numbers. MATLAB is not optimized to work with strings, and even cells should generally be avoided as much as possible. If you want to keep everything as strings, use another language (Python or Perl).
Once you have the state, county and site read in as numbers, then create a number instead of a string for the siteID. One approach would be to use the formula:
siteID = siteNum + 1e4*countyCode + 1e7*stateCode
That would generate unique siteIDs for all sites.
Use datenum to convert the date field into a number.
You are now in a position where the data_O3 defined on line 79 can be a purely numeric array (no cells!), as can your E matrix. That alone will make the process many times faster.
You also might want to initialize E with something other than NaN. Maybe give it values of -1.
There may be more optimizations you can do in the comparison, but do the above first and I expect you will see a huge improvement.
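A minimal sketch of the steps above with made-up values. The codes, dates, measurements, and the three-column layout [date, siteID, value] are assumptions for illustration and do not reflect the actual files. The final matching step uses ismember with the 'rows' option, which is a suggestion of this sketch rather than part of the advice above.

```matlab
% Hypothetical state, county and site codes, already read in as numbers.
stateCode  = [6; 6; 36];
countyCode = [37; 37; 61];
siteNum    = [1002; 9033; 10];
siteID = siteNum + 1e4*countyCode + 1e7*stateCode;   % formula from above

% Hypothetical date strings converted to serial date numbers.
dateNum = datenum({'2010-01-01'; '2010-01-02'; '2010-01-01'}, 'yyyy-mm-dd');

% Purely numeric stand-in for data_O3: [date, siteID, value].
dataO3 = [dateNum, siteID, [0.031; 0.042; 0.027]];

% Stand-in for E: every date/site pair of interest, initialized to -1.
d1 = dateNum(1);
d2 = dateNum(2);
E = [d1 60371002 -1;
     d2 60371002 -1;
     d1 60379033 -1;
     d2 60379033 -1];

% Assumption: match whole (date, siteID) rows instead of looping with strcmp.
[found, loc] = ismember(E(:, 1:2), dataO3(:, 1:2), 'rows');
E(found, 3) = dataO3(loc(found), 3);
```

Rows of E without a matching measurement keep the placeholder value -1.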

Cutting a 250000 value long vector into pieces of 50 and summing them with MATLAB [duplicate]

This question already has an answer here:
Summing up till a certain interval
(1 answer)
Closed 8 years ago.
I have a dataset (which is an hdf5 file) that contains a list of 250000 values, all quite small (sub 10). I want to cut this into 5000 pieces, so 50 each, and I want to individually sum the values in each of these 5000 pieces. I then want to create a histogram of these 5000 pieces, so I need to store them as well.
I am trying to do this using MATLAB, as my very limited programming skills have been developed using this, and it seems suitable for these purposes. Now, I haven't gotten very far, but what I have done so far is:
for n = 1:50:249951
    % CR check before (before pumping?)
    ROdata = h5read('hdf5file', '/data', [n], [n+49]);
    sum(ROdata)
end
Of course, this does not yet store the values of the sum for each n. But more importantly, it does not work. For n = 1, all is fine, and I get the correct value. But already for n = 51, (so summing 51-100), I do not get the correct sum. What's going wrong here?
How should I store these (not working) sums?
Are you looking for something like this?
I assumed you already read your data and you have a 250000x1 vector.
%example data
data = randi(42,1,250000);
% rearranges your vector in groups of 50
A = reshape(data,[],5000);
% sums every column of your reshaped vector
sums = sum(A,1);
hist(sums,5000)
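For comparison, a minimal sketch of a loop that stores each sum, using an in-memory vector of made-up values instead of the HDF5 file. One likely cause of the wrong sums in the question: the fourth argument of h5read is a count, not an end index, so [n+49] requests n+49 values starting at n rather than 50.

```matlab
% Hypothetical stand-in for the 250000 values read from the file.
data = randi(9, 250000, 1);

pieceLen = 50;
nPieces = numel(data) / pieceLen;   % 5000 pieces
sums = zeros(nPieces, 1);           % preallocate storage for the sums

for k = 1:nPieces
    first = (k - 1) * pieceLen + 1;
    % Reading from the file instead would look like
    % h5read(filename, '/data', first, pieceLen), i.e. start and count.
    sums(k) = sum(data(first:first + pieceLen - 1));
end

% The reshape approach gives the same result without a loop.
sumsReshape = sum(reshape(data, pieceLen, []), 1)';
```

Both sums and sumsReshape hold 5000 values that can be passed to a histogram function.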