Read a long, complex text file in Matlab

I have a very long text file which contains the data from 4 different stations with different time steps:
1:00
station 1
a number 1 (e.g.0.6E-06)
matrix1 (41x36)
station 2
number 2 (e.g.0.1E-06)
matrix2 (41x36)
station 3
number 3 (e.g.0.2E-06)
matrix3 (41x36)
station 4
number 4 (e.g.0.4E-06)
matrix4 (41x36)
2:00
station 1
a number (e.g.0.24E-06)
matrix5 (41x36)
station 2
a number (e.g.0.3E-06)
matrix6 (41x36)
station 3
number (e.g.0.12E-06)
matrix7 (41x36)
station 4
number (e.g.0.14E-06)
matrix8 (41x36)
.....
and so on
I need to read this data for each station and each time step. Note that each matrix should be scaled by multiplying it by the number above it. An example is here: https://files.fm/u/sn447ttc#/view/example.txt
Could you please help?
Thank you a lot.

My idea here would be to read the text file using fopen and textscan. Afterwards you can search for occurrences of the keyword FACTOR to subdivide the output. Here's the code:
fid=fopen('example.txt'); % open the document
dataRaw=textscan(fid,'%s','Delimiter',''); % read the file with no delimiter to achieve a cell array with 1 cell per line of the text file
fclose(fid); % close the document
rows=cellfun(@(x) strfind(x,'FACTOR'),dataRaw,'uni',0); % search for occurrences of 'FACTOR'
hasFactor=find(~cellfun(@isempty,rows{1})); % get row numbers of the lines that contain the word FACTOR
dataRaw=dataRaw{1}; % convert array for easier indexing
for ii=1:(numel(hasFactor)-1) % loop over occurrences of the word FACTOR
array=cellfun(@str2num,dataRaw(hasFactor(ii)+2:hasFactor(ii+1)-1),'uni',0); % extract numerical data
output{ii}=str2num(dataRaw{hasFactor(ii)+1})*cat(1,array{:}); % create output scaled by the factor
end
array=cellfun(@str2num,dataRaw(hasFactor(end)+2:end),'uni',0);
output{end+1}=str2num(dataRaw{hasFactor(end)+1})*cat(1,array{:}); % These last 2 lines add the last array to the output
outputMat=cat(3,output{:}); % convert to a 3-dimensional matrix
outputStations=[{output(1:4:end)} {output(2:4:end)} {output(3:4:end)} {output(4:4:end)}]; % Sort the output to have 1 cell for each station
outputColumnSums=cellfun(@(x) cellfun(@sum,x,'uni',0),outputStations,'uni',0); % To sum up all the columns of each matrix
outputRowSums=cellfun(@(x) cellfun(@(y) sum(y,2),x,'uni',0),outputStations,'uni',0); % To sum up all the rows of each matrix
This approach is pretty slow and probably can be vectorized, but if you don't need it to be fast it should do the job. I created a cell-output with 1 cell per array and a 3 dimensional array as optional output. Hope that's fine with you
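As a minimal sketch of the same idea that avoids special-casing the last block, the end row of every block can be computed up front. The miniature lines array below is made up for illustration (2x2 matrices instead of 41x36), and it assumes each block is a line containing FACTOR, then the factor, then the matrix rows; the real file layout may differ:

```matlab
% Hypothetical stand-in for the lines read by textscan (assumed layout).
lines = {'FACTOR'; '0.5'; '1 2'; '3 4'; 'FACTOR'; '2'; '1 1'; '2 2'};
hasFactor = find(~cellfun(@isempty, strfind(lines, 'FACTOR')));  % header rows
blockEnd  = [hasFactor(2:end) - 1; numel(lines)];                % last row of each block
output = cell(numel(hasFactor), 1);
for ii = 1:numel(hasFactor)
    rowsTxt = lines(hasFactor(ii) + 2 : blockEnd(ii));           % matrix rows as text
    mat = cell2mat(cellfun(@str2num, rowsTxt, 'uni', 0));        % text -> numeric matrix
    output{ii} = str2double(lines{hasFactor(ii) + 1}) * mat;     % scale by the factor
end
```

Because blockEnd already contains the last line of the file, the final matrix is handled inside the loop like all the others.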

I have looked into your situation, and it seems the problem is not as trivial as anticipated. Keep in mind that if I have made a mistake in my assumptions about where the data is located, you can let me know so I can edit this, or you can just change the numbers to suit your case. I initially loaded the delimited file into an Excel spreadsheet, just to visualize it.
After reading up on dlmread, I found that one can specify the exact rows and columns to pull from example.txt, as shown here:
data = dlmread('example.txt', ' ', [4 1 45 37]); % [r1 c1 r2 c2]
data2 = dlmread('example.txt', ' ', [47 1 88 37]);
The result is two matrices that are 41-by-37, containing only numbers. I started data at row 4 to bypass the header information/strings. Noticing the pattern, I set it up as a loop:
No_of_matrices_expected = 4;
dataCell = cell(No_of_matrices_expected, 1);
iterations = length(dataCell)
% Initial Conditions
rowBeginning = 4;
col1 = 1; % Constant
rowEnd = rowBeginning + 40; % == 44, right before next header information
col2 = 36; % Constant
for n = 1 : iterations
dataCell{n} = dlmread('example.txt', ' ', [rowBeginning, col1, rowEnd, col2]);
rowBeginning = rowBeginning + 41 + 2; % skip previous matrix and skip header info
rowEnd = rowBeginning + 40;
end
However, I stumbled across what you stated earlier which was that there are four different stations, each with their own time stamps. So running this loop more than 4 times led to unexpected results and MATLAB crashed. The reason is that the new timestamp creates an extra row for the date. Now, you could change the loop above to compensate for this extra row, or you can make multiple for loops for each station. This will be your decision to make.
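One way to compensate for the extra row, sketched under the assumption that every time step adds exactly one timestamp line before station 1. The counts nStations and nTimes are hypothetical and should be checked against the real file:

```matlab
nStations = 4;                 % stations per time step (from the question)
nTimes = 2;                    % hypothetical number of time steps to read
rowsPerMatrix = 41;            % data rows per matrix
headerRows = 2;                % header lines before each matrix, as assumed above
rowBeginning = 4;              % first data row of the first matrix
starts = zeros(nStations*nTimes, 1);
for n = 1:nStations*nTimes
    starts(n) = rowBeginning;                               % remember where matrix n begins
    rowBeginning = rowBeginning + rowsPerMatrix + headerRows;
    if mod(n, nStations) == 0
        rowBeginning = rowBeginning + 1;                    % skip the extra timestamp line
    end
end
```

Each starts(n) can then be passed to dlmread as rowBeginning, with rowEnd = starts(n) + 40 as in the loop above.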
Now if you wanted to save the header information, I would recommend taking a look into textscan. You can simply use this function to pull the first column of all the data into a cell array of strings. Then you can pull out the header information that you want. Keep in mind, use fopen if you want to use textscan.
I'll let you use what I have found thus far, but let me know if you need more help.

Related

How can I convert this table to a cell array as shown in the screenshot?

I am trying to convert a table imported from a CSV file (see below) into two cell arrays. As shown in the screenshots, Cell 1 contains the "measure" column of the table, with measures of the same ID going in the same cell. Similarly, Cell 2 contains "t" in the same way.
Matlab is not my language but I have a function to test out written in Matlab only, so I am really unsure how I could achieve this task.
Let the data be defined as
data = array2table([1 100 1; 1 200 2; 1 300 3; 2 500 3; 2 600 4; 2 700 5; 2 800 6], ...
'VariableNames', {'id' 'measure' 't'}); % example data
You can use findgroups and splitapply as follows:
g = findgroups(data.id); % grouping variable
result_measure = splitapply(@(x){x.'}, data.measure, g).'; % split measure as per g
result_t = splitapply(@(x){x.'}, data.t, g).'; % split t as per g
Alternatively, findgroups for a single grouping variable can be replaced by unique (third output), and splitapply for a single data variable can be replaced by accumarray:
[~, ~, g] = unique(data.id); % grouping variable
result_measure = accumarray(g, data.measure, [], @(x){x.'}).'; % split measure as per g
result_t = accumarray(g, data.t, [], @(x){x.'}).'; % split t as per g
In case someone is interested in my less Matlab-like solution to the problem (a plain loop):
array_id=table2array(data(:,'id'));
array_t=table2array(data(:,'t'));
array_measure=table2array(data(:,'measure'));
uni_id=unique(array_id);
t_cell=cell(1,length(uni_id));
measure_cell=cell(1,length(uni_id));
for i=1:length(uni_id)
temp_t=table2array(data(data.id==uni_id(i),'t'));
temp_measure=table2array(data(data.id==uni_id(i),'measure'));
t_cell{i}=temp_t';
measure_cell{i}=temp_measure';
end
Apparently this is nothing comparable to what Luis has, but it gets the job done.

Using Matlab to randomly split an Excel Sheet

I have an Excel sheet containing 1838 records and I need to RANDOMLY split these records into 3 Excel Sheets. I am trying to use Matlab but I am quite new to it and I have just managed the following code:
[xlsn, xlst, raw] = xlsread('data.xls');
numrows = 1838;
randindex = ceil(3*rand(numrows, 1));
raw1 = raw(:,randindex==1);
raw2 = raw(:,randindex==2);
raw3 = raw(:,randindex==3);
Your general procedure will be to read the spreadsheet into some matlab variables, operate on those matrices such that you end up with three thirds and then write each third back out.
So you've got the read covered with xlsread, which gives you the two matrices xlsn and xlst (numeric and text data). I would suggest using the syntax
[~, ~, raw] = xlsread('data.xls');
In the xlsread help file (you can access this by typing doc xlsread into the command window) it says that the three output arguments hold the numeric cells, the text cells and the whole lot. This is because a matlab matrix can only hold one type of value and a spreadsheet will usually be expected to have text or numbers. The raw value will hold all of the values but in a 'cell array' instead, a different kind of matlab data type.
So then you will have a cell array called raw. From here you want to do the following:
work out how many rows you have (I assume each record is a row) by using the size function and specifying the appropriate dimension (again check the help file to see how to do this)
create an index of random numbers between 1 and 3 inclusive, which you can use as a mask
randindex = ceil(3*rand(numrows, 1));
apply the mask to your cell array to extract the records matching each index
raw1 = raw(randindex==1,:); % the mask has one entry per row (record); do the same for the other two index values
write each cell back to a file
xlswrite('output1.xls', raw1);
You will probably have to fettle the arguments to get it to work the way you want but be sure to check the doc functionname page to get the syntax just right. Your main concern will be to get the indexing correct - matlab indexes row-first whereas spreadsheets tend to be column-first (e.g. cell A2 is column A and row 2, but matlab matrix element M(1,2) is the first row and the second column of matrix M, i.e. cell B1).
UPDATE: splitting the file evenly is surprisingly more trouble: because we're using random numbers for the index, it's not guaranteed to split evenly. So instead we can generate a vector of random floats and then pick out the lowest 33% of them to make index 1, the highest 33% to make index 3, and let the rest be 2.
randvec = rand(numrows, 1); % float between 0 and 1
pct33 = prctile(randvec,100/3); % value of 33rd percentile
pct67 = prctile(randvec,200/3); % value of 67th percentile
randindex = ones(numrows,1);
randindex(randvec>pct33) = 2;
randindex(randvec>pct67) = 3;
It probably still won't be absolutely even - 1838 isn't a multiple of 3. You can see how many members each group has this way
numel(find(randindex==1))
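As an alternative sketch (a different technique from the percentile approach, and one that does not need prctile, which is part of the Statistics and Machine Learning Toolbox), the labels 1, 2, 3 can be dealt out round-robin over a random permutation of the row numbers. The group sizes then differ by at most one:

```matlab
numrows = 1838;                              % number of records, from the question
order = randperm(numrows);                   % random permutation of the row numbers
randindex = zeros(numrows, 1);
randindex(order) = mod(0:numrows-1, 3) + 1;  % deal labels 1,2,3 in turn to shuffled rows
counts = accumarray(randindex, 1);           % how many rows ended up in each group
```

The resulting randindex can be used as a mask exactly like the one above.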

Matlab matching first column of a row as index and then averaging all columns in that row

I need help with taking the following data which is organized in a large matrix and averaging all of the values that have a matching ID (index) and outputting another matrix with just the ID and the averaged value that trail it.
File with data format:
(This is the StarData variable)
ID>>>>Values
002141865 3.867144e-03 742.000000 0.001121 16.155089 6.297494 0.001677
002141865 5.429278e-03 1940.000000 0.000477 16.583748 11.945627 0.001622
002141865 4.360715e-03 1897.000000 0.000667 16.863406 13.438383 0.001460
002141865 3.972467e-03 2127.000000 0.000459 16.103060 21.966853 0.001196
002141865 8.542932e-03 2094.000000 0.000421 17.452007 18.067214 0.002490
Do not be misled by the example I posted: the first number is repeated for about 15 lines, then the ID changes, and so on through an entire set of different IDs. Then the whole group of IDs is repeated again. Think of the first block as [1 2 3; 1 5 9; 2 5 7; 2 4 6]; the block then repeats with different values in every column except the index. The values trailing the ID are what I need to average in Matlab, outputting a clean matrix with one row per ID, fully averaged over all occurrences of that ID.
Thanks for any help given.
A modification of this answer does the job, as follows:
[value_sort ind_sort] = sort(StarData(:,1));
[~, ii, jj] = unique(value_sort);
n = diff([0; ii]);
averages = NaN(length(n),size(StarData,2)); % preallocate
averages(:,1) = value_sort(ii); % the ID values (ii indexes the sorted vector, not StarData)
for col = 2:size(StarData,2)
averages(:,col) = accumarray(jj,StarData(ind_sort,col))./n;
end
The result is in variable averages. Its first column contains the values used as indices, and each subsequent column contains the average for that column according to the index value.
Compatibility issues for Matlab 2013a onwards:
The function unique has changed in Matlab 2013a. For that version onwards, add 'legacy' flag to unique, i.e. replace second line by
[~, ii, jj] = unique(value_sort,'legacy')
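A variant that should not depend on which unique behaviour is in use, sketched on the small example from the question. It takes the group index from the third output of unique and lets accumarray compute the mean directly, so neither the sort nor the counts n are needed:

```matlab
StarData = [1 2 3; 1 5 9; 2 5 7; 2 4 6];    % small example block from the question
[ids, ~, jj] = unique(StarData(:,1));        % jj maps each row to its ID group
averages = zeros(numel(ids), size(StarData,2));
averages(:,1) = ids;                         % first column: one row per distinct ID
for col = 2:size(StarData,2)
    averages(:,col) = accumarray(jj, StarData(:,col), [], @mean);  % mean per ID
end
```

For this input, ID 1 averages to 3.5 and 6, and ID 2 to 4.5 and 6.5.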

matlab updating time vector

I have 19 cells (19x1) with temperature data for an entire year where the first 18 cells represent 20 days (each) and the last cell represents 5 days, hence (18*20)+5 = 365days.
In each cell there should be 7200 measurements (apart from cell 19) where each measurement is taken every 4 minutes thus 360 measurements per day (360*20 = 7200).
The time vector for the measurements is only expressed as day number i.e. 1,2,3...and so on (thus no decimal day),
which is therefore displayed as 360 x 1's... and so on.
As the sensor failed during some days, some of the cells contain less than 7200 measurements, where one in
particular only contains 858 rows, which looks similar to the following example:
a=rand(858,3);
a(1:281,1)=1;
a(281:327,1)=2;
a(327:328,1)=5;
a(329:330,1)=9;
a(331:498,1)=19;
a(499:858,1)=20;
Where column 1 = day, column 2 and 3 are the data.
By knowing that each day number should be repeated 360 times is there a method for including an additional
amount of every value from 1:20 in order to make up the 360. For example, the first column requires
79 x 1's, 46 x 2's, 360 x 3's... and so on; where the final array should therefore have 7200 values in
order from 1 to 20.
If this is possible, in the rows where these values have been added, the second and third column should
changed to nan.
I realise that this is an unusual question, and that it is difficult to understand what is asked, but I hope I have been clear in expressing what I'm attempting to
achieve. Any advice would be much appreciated.
Here's one way to do it for a given element of the cell matrix:
full=zeros(7200,3)+NaN;
for i = 1:20 % for each day
starti = (i-1)*360; % find corresponding 360 indices into full array
full( starti + (1:360), 1 ) = i; % assign the day
idx = find(a(:,1)==i); % find any matching data in a for that day
full( starti + (1:length(idx)), 2:3 ) = a(idx,2:3); % copy matching data over
end
You could probably use arrayfun to make this slicker, and maybe (??) faster.
You could make this into a function and use cellfun to apply it to your cell.
PS - if you ask your question at the Matlab help forums you'll most definitely get a slicker & more efficient answer than this. Probably involving bsxfun or arrayfun or accumarray or something like that.
Update - to do this for each element in the cell array, the main change is that instead of searching for i as the day number you calculate it based on how far along the cell array you are. You'd do something like (untested):
filled = cell(size(cellarray)); % one padded matrix per cell
for k = 1:length(cellarray)
    a = cellarray{k}; % measurements for this 20-day block
    full = nan(7200,3); % start again with an empty block
    for i = 1:20 % 20 days per cell (the last cell only has data for 5 of them)
        starti = (i-1)*360; % ... as before
        day = (k-1)*20 + i; % first cell is days 1-20, second is 21-40,...
        full( starti + (1:360),1 ) = day; % <-- replace i with day
        idx = find(a(:,1)==day); % <-- replace i with day
        full( starti + (1:length(idx)), 2:3 ) = a(idx,2:3); % same as before
    end
    filled{k} = full; % keep the padded block for this cell
end
I am not sure I understood correctly what you want to do, but the code below works out how many measurements are missing for each day and appends additional rows at the bottom of your 'a' matrix, so that you get the full 7200x3 matrix.
nbMissing = 7200-size(a,1); % total number of rows to add
a1 = nan(nbMissing,3); % filler rows, data columns left as NaN
l = 0; % number of filler rows already used
for i = 1:20
nbMissing_i = 360-sum(a(:,1)==i); % rows missing for day i
a1(l+1:l+nbMissing_i,1) = i;
l = l+nbMissing_i;
end
a_filled = [a;a1];

Increasing the length of a column in MATLAB

I'm just beginning to teach myself MATLAB, and I'm making a 501x6 array. The columns will contain probabilities for rolling a 101-sided die, and as such, the columns contain 101, 201, 301 entries, not 501. Is there a way to 'stretch the column' so that I add 0s above and below the useful data? So far I've only thought of making a column like a=[zeros(200,1);die;zeros(200,1)] so that the data only shows up in rows 201-301, and similarly, b=[zeros(150,1);die2;zeros(150,1)], if I wanted 200 or 150 zeros to precede and follow the data, respectively, in order for it to fit in the array.
Thanks for any suggestions.
You can do several things:
Start with an all-zero matrix, and only modify the elements you need to be non-zero:
A = zeros(501,6);
A(someValue:someOtherValue, 5) = value;
% OR: assign the range to a vector:
A(someValue:someOtherValue, 5) = 1:20; % if someValue:someOtherValue is the same length as 1:20
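Applied to the case in the question, a small sketch could look like this. The vector die1 is a made-up stand-in for the real probabilities, and centring the data in the column is an assumption taken from the a=[zeros(200,1);die;zeros(200,1)] example:

```matlab
A = zeros(501, 6);                      % all-zero target array
die1 = ones(101, 1) / 101;              % hypothetical probabilities, 101 entries
pad = (size(A,1) - numel(die1)) / 2;    % zeros needed above and below (200 here)
A(pad+1 : pad+numel(die1), 1) = die1;   % data lands in rows 201-301 of column 1
```

A column with 201 entries would be written the same way, with pad working out to 150.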