Matlab: how to read a constant-width text file and turn it into a matrix?

I have an ASCII text file in which each row has the following format:
------------------------------
Variable Columns Type
------------------------------
ID 1-11 Character
YEAR 12-15 Integer
MONTH 16-17 Integer
ELEMENT 18-21 Character
VALUE1 22-26 Integer
MFLAG1 27-27 Character
QFLAG1 28-28 Character
SFLAG1 29-29 Character
VALUE2 30-34 Integer
MFLAG2 35-35 Character
QFLAG2 36-36 Character
SFLAG2 37-37 Character
. . .
. . .
. . .
VALUE31 262-266 Integer
MFLAG31 267-267 Character
QFLAG31 268-268 Character
SFLAG31 269-269 Character
------------------------------
I only need the variables "year", "month", "element" and "valuei", i = 1,2,...,31 (there are 31 values in each row).
The flag fields (like MFLAGi) can contain either a character or white space.
Also, a value might not fill its whole field with digits, so it can be padded with spaces.
Two sample lines from my text file:
USC00190736189301TMAX 33 6 117 6 0 I6 -89 6 -28 6 -83 6 -67 6 -67 6 -28 6 -6 6 -139 6 -111 6 -117 6 -89 6 -106 6 -111 6 -106 6 -106 6 -39 6 -78 6 -61 6 -33 6 -6 6 6 6 39 6 28 6 6 6 -61 6 61 6 56 6 0 6
USC00190736189301TMIN -56 6 11 I6 -106 6 -161 6 -106 6 -133 6 -144 6 -117 6 -161 6 -156 6 -206 6 -183 6 -161 6 -161 6 -139 6 -178 6 -189 6 -161 6 -133 6 -150 6 -156 6 -156 6 -100 6 -50 6 -39 6 -67 6 -78 6 -111 6 -94 6 -33 6 -50 6
For example, in line 1 VALUE1 uses only 2 of its 5 columns (' 33'),
and both MFLAG1 and QFLAG1 are white space.
I want to put "year", "month", "element" and "valuei" in a matrix and, depending on the "element" value, choose some of the rows to make my final matrix. How can I do that?
What I have thought of:
%open file
fid = fopen('myt.txt')
% read from file
%'whitespace','' do not overlook white spaces in counting
C = textscan(fid , formatspec ,'whitespace','')
I have two problems with this:
(1) I think the formatspec should be
'%*11c %4d %2d %4c %5d %*3c'
where %*11c skips the ID, %4d reads the year, %2d the month, %4c the element, %5d reads valuei and %*3c skips its three flags. The last part ('%5d %*3c') has to be repeated 31 times. How can I repeat that part 31 times and concatenate all the parts together?
(2) I end up with a cell array C. Since "element" is a string, I can't convert C into a matrix. Apparently C is stored column by column, and each column is one whole string. How can I then access the data row by row to select the rows I need (according to the value of "element")?
Am I using the wrong method for what I want? What should I do?

For (1), you can use repmat:
idspec = '%*11c %4d %2d %4c ';
valuespec = repmat('%5d %*3c',[1 31]);
filespec = [idspec valuespec];
(or something similar)
For (2), I can see a few options:
a) You could read the file twice: once skipping the character column and using the 'CollectOutput' option, so that C would basically contain a matrix, and a second time skipping everything but ELEMENT, so that C would hold the remaining information.
b) Using 'CollectOutput' in a single pass, you'd get C with the year and month, then the ELEMENT, and then the rest.
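As an alternative to textscan, a fixed-width record can also be sliced by column position. The sketch below is only an illustration under stated assumptions: the two records are synthetic (built to match the 269-column layout, not real data), recs is assumed to be a char matrix with one record per row, and the choice of 'TMAX' as the element to keep is hypothetical.

```matlab
% Two synthetic 269-character records following the layout in the question
% (hypothetical data: every VALUEi is 33 or -56, every SFLAGi is '6').
rec1 = ['USC00190736' '1893' '01' 'TMAX' repmat('   33  6', 1, 31)];
rec2 = ['USC00190736' '1893' '01' 'TMIN' repmat('  -56  6', 1, 31)];
recs = char(rec1, rec2);                       % N-by-269 char matrix

yr = str2double(cellstr(recs(:, 12:15)));      % YEAR    columns 12-15
mo = str2double(cellstr(recs(:, 16:17)));      % MONTH   columns 16-17
el = cellstr(recs(:, 18:21));                  % ELEMENT columns 18-21

vals = zeros(size(recs, 1), 31);
for k = 1:31
    c = 22 + 8*(k-1);                          % first column of VALUEk
    vals(:, k) = str2double(cellstr(recs(:, c:c+4)));
end

M = [yr mo vals];                              % numeric part, N-by-33
keep = strcmp(el, 'TMAX');                     % rows whose ELEMENT is TMAX
Mtmax = M(keep, :);                            % final matrix for that element
```

Keeping ELEMENT in a separate cell array sidesteps the problem of mixing text and numbers in one matrix: the logical mask keep selects rows of the numeric matrix. For a real file, the lines would first have to be stacked into a char matrix (for example with char on the lines read by fgetl), which assumes every line has the full record width.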

Why does readmatrix in Matlab skip the first n lines?

In my simulation I write data to a file using writematrix, then later read it back using readmatrix. I append to a single file at each time step, and each line is the same length as or longer than the previous line.
For some reason when using readmatrix on the output file, the first n lines are skipped entirely, as in not read at all. For example, my file looks like this:
...
11.8,1,2,3,4,5,6,7,8,9,10,2
11.9,1,2,3,4,5,6,7,8,9,10,2
...
12.3,1,2,3,4,5,6,7,8,9,10,2
12.4,7,8,9,10,7,8,9,10,1,2,1,1,2,3,4,5,6,3,4,5,6,1
12.5,7,8,9,10,7,8,9,10,1,2,1,1,2,3,4,5,6,3,4,5,6,1
...
30.5,7,8,9,10,7,8,9,10,1,2,2,1,2,3,4,5,6,3,4,5,6,2
30.6,7,8,9,10,7,8,9,10,1,2,2,1,2,3,4,5,6,3,4,5,6,2
30.7,17,18,19,20,1,2,7,8,9,10,1,1,2,3,4,5,6,3,4,5,6,2,11,12,13,14,15,16,7,8,9,10,1
30.8,17,18,19,20,1,2,7,8,9,10,1,1,2,3,4,5,6,3,4,5,6,2,11,12,13,14,15,16,7,8,9,10,1
...
(The first column is a time stamp, so the first ellipsis represents t=0 to t=11.7. At t=30.7 there is another step jump in the number of entries.) When I read the file using the command
data = readmatrix('/path/to/file/data.csv');
the matrix data looks like
12.4 7 8 9 10 7 8 9 10 1 2 1 1 2 3 4 5 6 3 4 5 6 1
12.5 7 8 9 10 7 8 9 10 1 2 1 1 2 3 4 5 6 3 4 5 6 1
12.6 7 8 9 10 7 8 9 10 1 2 1 1 2 3 4 5 6 3 4 5 6 1
...
30.5 7 8 9 10 7 8 9 10 1 2 2 1 2 3 4 5 6 3 4 5 6 2
30.6 7 8 9 10 7 8 9 10 1 2 2 1 2 3 4 5 6 3 4 5 6 2
30.7 17 18 19 20 1 2 7 8 9 10 1 1 2 3 4 5 6 3 4 5 6 2 11 12 13 14 15 16 7 8 9 10 1
30.8 17 18 19 20 1 2 7 8 9 10 1 1 2 3 4 5 6 3 4 5 6 2 11 12 13 14 15 16 7 8 9 10 1
...
That is to say, all the entries before t=12.4 (i.e. the first step jump in line length) are skipped.
In the file, if I delete everything before the first step jump (i.e everything before t=12.4), then I get the same matrix data, so we can conclude the subsequent step jumps cause no issue. If I delete everything from the second step jump (i.e. everything after t=30.6) then it still skips all the entries before t=12.4. If I have no step jumps (i.e. only t=0 to t=12.3) then it happily reads in the first lines.
I've tried reading the same file using csvread and it returns all of the data from the beginning of the file (albeit padded with zeros instead of nans), so I'm confident the issue isn't with the file.
Why is this happening?
A minimum working example is the first code block without the ellipses.
For reference, the first lines have 12 comma-separated values, and each step jump increases that by 11.
Edit:
Output from detectImportOptions
ans =
  DelimitedTextImportOptions with properties:
   Format Properties:
                    Delimiter: {','}
                   Whitespace: '\b\t '
                   LineEnding: {'\n' '\r' '\r\n'}
                 CommentStyle: {}
    ConsecutiveDelimitersRule: 'split'
        LeadingDelimitersRule: 'keep'
                EmptyLineRule: 'skip'
                     Encoding: 'UTF-8'
   Replacement Properties:
                  MissingRule: 'fill'
              ImportErrorRule: 'fill'
             ExtraColumnsRule: 'addvars'
   Variable Import Properties: Set types by name using setvartype
                VariableNames: {'Var1', 'Var2', 'Var3' ... and 20 more}
                VariableTypes: {'double', 'double', 'double' ... and 20 more}
        SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 20 more}
              VariableOptions: Show all 23 VariableOptions
        Access VariableOptions sub-properties using setvaropts/getvaropts
        PreserveVariableNames: false
   Location Properties:
                    DataLines: [4 Inf]
            VariableNamesLine: 0
               RowNamesColumn: 0
            VariableUnitsLine: 0
     VariableDescriptionsLine: 0
  To display a preview of the table, use preview
MATLAB's readmatrix tries to be smart and locate a 2-D matrix within the data of the CSV file you pass it. It looks like it is skipping the first few lines, which are shorter and don't have explicit trailing empty "cells".
You can control this through the import options. Run opts = detectImportOptions(...); on your file and look at the DataLines property. If it doesn't start at 1, set it to [1 Inf] to force readmatrix to read all the lines. Then call readmatrix, explicitly passing in that options object.
To do this compactly (and probably more efficiently), call readmatrix with explicit options right off the bat, like this:
readmatrix(path2mat,delimitedTextImportOptions('DataLines',[1,Inf]))
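The two-step variant (detect the options, override DataLines, then read) could look roughly like this. It is a sketch only: the file is a small made-up CSV written to a temporary location (fname is a hypothetical name), with short lines followed by longer ones, and no particular value is claimed for the auto-detected DataLines.

```matlab
% Write a small made-up CSV whose later lines are longer than the first ones
fname = [tempname '.csv'];                 % hypothetical temporary file
fid = fopen(fname, 'w');
fprintf(fid, '0.1,1,2\n0.2,1,2\n0.3,1,2,3,4\n0.4,1,2,3,4\n');
fclose(fid);

opts = detectImportOptions(fname);         % let MATLAB guess the layout
disp(opts.DataLines)                       % inspect where it thinks the data starts
opts.DataLines = [1 Inf];                  % force reading from the first line
data = readmatrix(fname, opts);            % shorter rows are filled (NaN by default)

delete(fname);                             % clean up the temporary file
```

Inspecting opts.DataLines before overriding it is a quick way to confirm that the detection, rather than the file itself, is what drops the leading rows.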

Single point crossover

I have two arrays (matrices with one row), temp1 and temp2, as follows:
temp1=[1 2 3 4 5 6 7 8 9]
temp2=[10 11 12 13 14 15 16 17 18]
and I have an index pn=3. I need output as follows:
tempNew=[1 2 3 13 14 15 16 17 18]
i.e. how do I create tempNew such that all values on indices up to pn come from temp1 and all values beyond index pn come from temp2?
temp1=[1 2 3 4 5 6 7 8 9]
temp2=[10 11 12 13 14 15 16 17 18]
pn=3;
tempNew = [temp1(1:pn),temp2(pn+1:end)]
tempNew =
1 2 3 13 14 15 16 17 18
Use pn to index into both of your tempX arrays, which gives two temporary arrays, then simply concatenate them using square brackets.
Indexing always starts at 1 in MATLAB, so 1:pn will give you the first pn values of an array. end signifies the end of an array, so pn+1:end will give you all values from index pn+1 up to the last one of an array.
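Since the question is titled single-point crossover, it may be useful to see that the same indexing also yields the complementary child. This is a small sketch; the names child1 and child2 are illustrative and not from the question.

```matlab
temp1 = 1:9;
temp2 = 10:18;
pn = 3;                                     % crossover point

child1 = [temp1(1:pn), temp2(pn+1:end)];    % head of temp1, tail of temp2
child2 = [temp2(1:pn), temp1(pn+1:end)];    % head of temp2, tail of temp1

% Edge cases: an empty range such as 1:0 or (n+1):end selects nothing,
% so pn = 0 copies temp2 and pn = numel(temp1) copies temp1.
allFromTemp2 = [temp1(1:0), temp2(1:end)];
```

Here child1 equals the tempNew from the answer, and child2 is the mirror image built from the other two halves.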

How to efficiently compare elements in two vectors in MATLAB without using loops?

Say I have a matrix A whose first column contains item IDs with repetition and second column contains their weights.
A= [1 40
3 33
2 12
4 22
2 10
3 6
1 15
6 29
4 10
1 2
5 18
5 11
2 8
6 25
1 14
2 11
4 28
3 38
5 35
3 9];
I now want to find the difference of each instance of A and its associated minimum weight. For that, I make a matrix B with its first column containing the unique IDs from column 1 of A, and its column 2 containing the associated minimum weight found from column 2 of A.
B=[1 2
2 8
3 6
4 10
5 11
6 25];
Then, I want to store in column 3 of A the difference of each entry and its associated minimum weight.
A= [1 40 38
3 33 27
2 12 4
4 22 12
2 10 2
3 6 0
1 15 13
6 29 4
4 10 0
1 2 0
5 18 7
5 11 0
2 8 0
6 25 0
1 14 12
2 11 3
4 28 18
3 38 32
5 35 24
3 9 3];
This is the code I wrote to do this:
for i=1:size(A,1)
    % weight (column 2) minus the minimum weight stored in B for this row's ID (column 1)
    A(i,3) = A(i,2) - B(B(:,1)==A(i,1),2);
end
But this code takes a long time to execute as it needs to loop through B every time it loops through A. That is, it has a complexity of size(A) x size(B). Is there a better way to do this without using loops, that would execute faster?
You can use accumarray to first compute the minimum value in the second column of A for each unique value in the first column of A. We can then index into the result using the first column of A and compare to the second column of A to create the third column.
% Compute the mins
min_per_group = accumarray(A(:,1), A(:,2), [], @min);
% Compute the difference between the second column and the minima
A(:,3) = A(:,2) - min_per_group(A(:,1));
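Using A(:,1) directly as the subscript works when the IDs are small positive integers. If they are not (for example large or non-consecutive IDs), one possible variant is to map them to group numbers with unique first. The data below is made up for illustration, and the variable names ids and g are assumptions, not from the question.

```matlab
A = [10 40; 30 33; 20 12; 10 2; 30 6; 20 8];      % made-up data, IDs are not 1..n

[ids, ~, g] = unique(A(:,1));                      % g maps each row to a group 1..numel(ids)
min_per_group = accumarray(g, A(:,2), [], @min);   % one minimum per unique ID
B = [ids, min_per_group];                          % the lookup table from the question

A(:,3) = A(:,2) - min_per_group(g);                % difference to the group minimum
```

The third output of unique plays the role that the raw ID column plays in the answer, so the result vector has one entry per distinct ID instead of one per possible ID value.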

How to load a text file in Matlab when the number of values in every line are different

I have a non-rectangular text file like A, which has 10 values in the first line, 14 values in the 2nd line, 16 values in the 3rd line and so on. Here is an example of 4 lines of my text file:
line1:
1.68595314026 -1.48498177528 2.39820933342 27 20 15 2 4 62 -487.471069336 -517.781921387 5 96 -524.886108398 -485.697143555
Line2:
1.24980998039 -0.988095104694 1.89048337936 212 209 191 2 1 989 -641.149658203 -249.001220703 3 1036 -608.681762695 -300.815673828
Line3:
8.10434532166 -4.81520080566 4.90576314926 118 115 96 3 0 1703 749.967773438 -754.015136719 1 1359 1276.73632813 -941.855895996 2 1497 1338.98852539 -837.659179688
Line 4:
0.795098006725 -0.98456710577 1.89322447777 213 200 68 5 0 1438 -1386.39111328 -747.421386719 1 1565 -1153.50915527 -342.951965332 2 1481 -1341.57043457 -519.307800293 3 1920 -1058.8828125 -371.696960449 4 1303 -1466.5802002 -308.764587402
Now, I want to load this text file into a matrix M in Matlab. I tried to use the importdata function for loading it
M = importdata('A.txt');
but it loads the file in a rectangular matrix (all rows have same number of columns!!!) which is not right. The expected created matrix size should be like this:
size(M(1,:))= 1 10
size(M(2,:))= 1 14
size(M(3,:))= 1 16
How can I load this text file in a correct way into Matlab?
As @Jens suggested, you should use a cell array. Assuming your file contains only numeric values separated by whitespace, for instance:
1 3 6
7 8 9 12 15
1 2
0 3 7
You can parse it into cell array like this:
% Read full file
str = fileread('A.txt');
% Convert the text into a cell array of strings, one per line
c = textscan(str, '%s', 'Delimiter', '\n');
c = c{1};
% Convert 'string' elements to 'double'
n = cellfun(@str2num, c, 'UniformOutput', false)
You can then access individual lines like this:
>> n{1}
ans =
1 3 6
>> n{2}
ans =
7 8 9 12 15
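str2num evaluates its input with eval, so for files that contain only numbers a possible variant is to parse each line with sscanf instead. The sketch below uses an in-memory string as a stand-in for fileread('A.txt'); the variable names txt, lines and len are illustrative.

```matlab
txt = sprintf('1 3 6\n7 8 9 12 15\n1 2\n0 3 7\n');     % stand-in for fileread('A.txt')
lines = strsplit(strtrim(txt), sprintf('\n'));          % one cell per line

% sscanf returns a column vector, so transpose it to get one row per line
n = cellfun(@(s) sscanf(s, '%f').', lines, 'UniformOutput', false);

len = cellfun(@numel, n);                               % number of values on each line
```

As in the answer, n{k} holds the numbers of line k as a row vector, and len shows that the rows keep their different lengths.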

how to read a row*col matrix from a file in MATLAB

I have the following data in a file .
1 3 5
2 6 8
10 12 14
16 18 20
I want to read it into a 4-by-3 matrix. Currently I am reading it with the following code, assuming the data is stored in a file named "A.txt".
A=textread('A.txt');
But the problem with this code is that if the file has any trailing space, MATLAB reads it as a zero. For example, if the file "A.txt" has a space after the data, this piece of code makes MATLAB read the following:
1 3 5
2 6 8
10 12 14
16 18 20 0
So I want to read the matrix with an explicit row*col size. Can you please help me?
One option is to read the empty fields as NaNs and remove the NaNs after reading the file:
A = textread('A.txt','','emptyvalue',NaN)
A =
1 3 5 NaN
2 6 8 NaN
10 12 14 NaN
16 18 20 NaN
A = A(:,any(~isnan(A)))
A =
1 3 5
2 6 8
10 12 14
16 18 20
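One detail worth noting: any(~isnan(A)) works along the first non-singleton dimension, so for a file with a single row it collapses to one scalar instead of one flag per column. A slightly more defensive sketch passes the dimension explicitly. The matrices below are typed in by hand to mimic the textread result above; r is a hypothetical one-row case.

```matlab
A = [ 1  3  5 NaN;
      2  6  8 NaN;
     10 12 14 NaN;
     16 18 20 NaN];                 % same shape as the textread result above
A = A(:, ~all(isnan(A), 1));        % drop columns that are NaN in every row

r = [1 3 5 NaN];                    % hypothetical file with only one row
r = r(:, ~all(isnan(r), 1));        % still tests column by column
```

Only columns that are NaN in every row are removed, so a NaN that appears inside otherwise valid data is left alone.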