Matlab textscan gone wrong: cellfun to select data from certain lines - matlab

Hi, I am using the following code to read some values from the lines in data.txt that contain 'GPGGA':
fid = fopen('D:\data.txt','r');
A=textscan(fid,'%s %*s %f %s %f %s %*s %*s %*s %*s %*s %*s %*s %*s %*s','Delimiter',',');
fclose(fid);
Loc = [A{[2, 4]}];
row_idxs = cellfun( @(s) strcmp(s, '$GPGGA'), A{1});
Loc = Loc(row_idxs, :);
display(Loc);
The code works perfectly if the last line in data.txt is deleted. Not sure why it throws this error when the last line is included in the text file. What is the reason? I'm confused!
"??? Error using ==> horzcat
CAT arguments dimensions are not consistent.
Error in ==> test at 4
Loc = [A{[2, 4]}];"
data.txt
$GPGSV,4,1,16,05,15,046,23,29,47,071,21,16,31,291,18,31,39,202,18*73
$GPGSV,4,1,16,05,15,046,23,29,47,071,21,16,31,291,18,31,39,202,18*73
$GPGSV,4,1,16,05,15,046,23,29,47,071,21,16,31,291,18,31,39,202,18*73
$GPGSV,4,1,16,05,15,046,23,29,47,071,21,16,31,291,18,31,39,202,18*73
$GPGSV,4,2,16,23,13,298,17,25,15,119,17,06,22,247,16,03,04,251,14*75
$GPGSV,4,2,16,23,13,298,17,25,15,119,17,06,22,247,16,03,04,251,14*75
$GPGSV,4,2,16,23,13,298,17,25,15,119,17,06,22,247,16,03,04,251,14*75
$GPGSV,4,2,16,23,13,298,17,25,15,119,17,06,22,247,16,03,04,251,14*75
$GPGGA,1.8,98.90,S,18.0014,E,1,04,1.0,87.8,M,48.0,M,,*76
$GPGGA,1.3,98.91,S,18.0015,E,1,04,1.0,100.7,M,48.0,M,,*40
$GPGGA,1.3,98.92,S,18.0016,E,1,04,1.0,105.4,M,48.0,M,,*4F
$GPGGA,1.8,98.93,S,18.0017,E,1,04,1.0,87.8,M,48.0,M,,*76
$GPGGA,1.8,98.94,S,18.0018,E,1,04,1.0,87.8,M,48.0,M,,*76
$GPGSV,4,4,16,27,,,,26,,,,24,,,,22,,,*79

Your format string is no good. It only describes 15 columns, but the $GPGSV lines in the sample data you've posted have 20 columns. I suggest using the following code (which runs without error on my machine) instead:
fid = fopen('D:\data.txt','r');
A=textscan(fid,'%s %*s %f %s %f %s %*[^\n]', 'Delimiter',',');
fclose(fid);
Loc = [A{[2, 4]}];
row_idxs = cellfun( @(s) strcmp(s, '$GPGGA'), A{1});
Loc = Loc(row_idxs, :);
display(Loc);
Note the construct %*[^\n] in my format string. This tells textscan to skip everything from this point to the end of the line. It is much neater than writing out lots of %*s over and over. Also, it means you're less likely to miscount the number of columns when building the format string :-)
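To see the effect on lines of different lengths, here is a small sketch that passes two of the sample lines to textscan as text instead of reading them from a file. It also uses strcmp directly on the cell array, which should give the same logical index as the cellfun call. The variable names txt and isGGA are only illustrative.

```matlab
% Two lines from the sample data: one with 20 fields, one with 15
txt = sprintf('%s\n%s\n', ...
    '$GPGSV,4,1,16,05,15,046,23,29,47,071,21,16,31,291,18,31,39,202,18*73', ...
    '$GPGGA,1.8,98.90,S,18.0014,E,1,04,1.0,87.8,M,48.0,M,,*76');

% Read six leading fields, then skip whatever is left on each line
A = textscan(txt, '%s %*s %f %s %f %s %*[^\n]', 'Delimiter', ',');

% Every output column should now have one entry per line
disp(cellfun(@numel, A));

% strcmp accepts a cell array, so cellfun is optional here
isGGA = strcmp(A{1}, '$GPGGA');
Loc = [A{2}(isGGA) A{4}(isGGA)];
disp(Loc);
```

If the numbers printed by cellfun(@numel, A) are not all equal, the format string and the data are out of step, and concatenating the columns will fail.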

Related

Scan text file into Matlab

I have a text file which I want to import into matlab. Here are the first 2 rows of the text file (tempfile.txt):
1,"4/26/2016","6:40:00 PM","111","0","13.45","NaN","ACTIVE","NaN",
2,"4/26/2016","6:40:30 PM","73","0","14.99","NaN","ACTIVE","NaN",
When I tried using textscan:
fid = fopen('tempfile.txt');
data = textscan(fid, '%*d %s %s %s %*d %*d %*d %*s %*s', 'Delimiter', ',')
It only imports the first row of the text file. I have tried adding \n to the formatSpec but it still does not work. Please help!
Your problem is that all of your fields after the first are double-quoted, i.e. they are strings, so you cannot parse them as floats/doubles. Instead, parse them as strings and convert them to doubles in MATLAB:
data = textscan(fid, '%d %s %s %s %s %s %s %s %s', 'Delimiter', ',')
works fine at parsing your data; then use str2num to convert the fields you need back to numeric values. Why do you have double quotes around everything?
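The conversion step could look like the sketch below. It assumes that %s returns each field with its double quotes still attached, so they are stripped first, and it uses str2double in place of str2num because str2double accepts a cell array directly. The sample values are taken from the fourth column of the two rows in the question.

```matlab
% Fourth column as %s is assumed to return it (quotes included)
raw = {'"111"'; '"73"'};

% Remove the double quotes, then convert the whole cell array at once
vals = str2double(strrep(raw, '"', ''));
disp(vals);
```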
=============EDIT============
Since you only want 3 fields, you should do something like:
fid = fopen('abc1.txt');
data = textscan(fid, '%*d %s %s %s %*s %*s %*s %*s %*s', 'Delimiter', ',')
It seems that you have a comma-separated values (CSV) file; try this function instead:
M = csvread('tempfile.txt')

Dimensions of matrices being concatenated are not consistent

I read a CSV file with textscan, and when I try to write to a file I receive this error: Error using horzcat. Dimensions of matrices being concatenated are not consistent.
If I change the first format specifier in textscan (I mean %s) to %f, the error vanishes.
The error occurs when MATLAB tries to build [datatest{1} probability].
probability is a 1000*1 double
datatest{1} is a 1000*1 cell
datatest=textscan(FileID,'%s %*f %f %f %*s %*s %*s %*s %*s %*s %*s %*s %*s %f %f %f %f %f %f %f %f %f %f',1000,'headerlines',1,'delimiter',',');
csvwrite('output.csv',[datatest{1} probability]);
Your variable datatest{1} contains 1000 cells, each of which contains a string (the strings may or may not all be the same length).
In your statement [datatest{1} probability] you are trying to concatenate cells (containing strings) with a double numeric type, which does not work. The concatenation operator needs to operate on data of similar type.
Even if you were to create a cell array containing all your desired columns, myCellArray={datatest{1} probability}, it would not help, because that cell array cannot be passed to the function csvwrite.
csvwrite, and its better sister dlmwrite, do not accept cell arrays. You would have to convert the cell values into numeric values. Unfortunately, you want to write both strings and numeric values, so the direct way is to use low-level functions like fprintf.
In your case, to write the file you were expecting, you can use the following code.
col1 = datatest{1} ; %// extract the column of interest for easier indexing later on
fidw = fopen('output.csv','w') ; %// get a handle on a file to write (necessary with "fprintf")
for iline = 1:numel(probability) %// loop on each line
    fprintf( fidw , '%s, %f\n' , col1{iline} , probability(iline) ) ; %// write the line
end
fclose(fidw) ; %// close the file - IMPORTANT - (necessary with "fprintf")
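On MATLAB releases that have the table type (R2013b and later), writetable is a possible alternative to the loop. This is a different technique from the fprintf approach above, shown only as a minimal sketch: the values of col1 and probability are made-up stand-ins, and the temporary file name is chosen purely for illustration.

```matlab
% Stand-in data (the real col1 and probability come from the code above)
col1 = {'a'; 'b'; 'c'};
probability = [0.1; 0.2; 0.3];

% Build a table with one text column and one numeric column
T = table(col1, probability);

% Write it without a header row; the file name is only an example
outFile = [tempname '.csv'];
writetable(T, outFile, 'WriteVariableNames', false);

% Show the result
type(outFile);
```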

Using textscan to read certain rows

I am trying to read data from a text file using textscan in MATLAB. Currently, the code below reads rows 1 to 4. I need it to read rows 5 to 8, then rows 9 to 13, and so on. How would I achieve this?
fileID=fopen(fileName);
num_rows=4;
nHeaderLines = 2;
formatSpec = '%*s %*s %s %s %*s %*s %*s %f %*s';
dataIn = textscan(fileID,formatSpec,num_rows,'HeaderLines',nHeaderLines, 'Delimiter',',' );
fclose(fileID);
Use
file = fopen('myfile');
content = textscan(file,'%s','delimiter','\n');
fclose(file);
and you have all the lines in your file as cell array of strings. Then take any number of rows you want and process them as you like.
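One way to do the "process them as you like" step in blocks of four is sketched below. The dataLines cell array is built by hand as a stand-in for content{1} with its two header lines removed, and the field values are invented to match the number of columns in the question's formatSpec.

```matlab
% Stand-in for the data lines (content{1} without its header lines)
dataLines = cell(10, 1);
for k = 1:10
    dataLines{k} = sprintf('a,b,x%d,y%d,c,d,e,%d,f', k, k, k);
end

formatSpec = '%*s %*s %s %s %*s %*s %*s %f %*s';
blockSize = 4;
nBlocks = ceil(numel(dataLines) / blockSize);
blocks = cell(nBlocks, 1);

for b = 1:nBlocks
    % Row indices of this block; the last block may be shorter
    idx = (b-1)*blockSize + 1 : min(b*blockSize, numel(dataLines));
    % Join the block's lines into one text and parse it
    chunk = sprintf('%s\n', dataLines{idx});
    blocks{b} = textscan(chunk, formatSpec, 'Delimiter', ',');
end

disp(blocks{2}{3});   % numeric column of rows 5 to 8
```

Each entry of blocks then holds the textscan output for one group of rows, so blocks{1} corresponds to rows 1 to 4, blocks{2} to rows 5 to 8, and so on.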

textscan Unexpected Empty Cell with valid Format string

I am reading a tab-delimited file. Five representative lines of this file are:
Date Time Property Path 1 Path 2 Path 3 Path 4 Path 5 Path 6 Path 7 Path 8
Lev 1 Lev 1 Lev 1 Lev 1 Lev 1 Lev 1 Lev 1 Lev 1
1/1 00:00:00 F1 (sm³/s) -1.3405E-003 -1.1170E-002 -1.0123E-004 9.7769E-003 -8.4673E-004 1.1710E-003 2.6890E-004 2.2413E-003
1/1 01:00:00 F1 (sm³/s) 1.9988E-004 1.6655E-003 2.2252E-004 1.6883E-003 1.8612E-003 2.0221E-004 2.0795E-004 1.7333E-003
1/1 02:00:00 F1 (sm³/s) -4.0722E-004 -3.3931E-003 -4.4324E-004 -2.1177E-003 -3.7075E-003 -2.5364E-004 -3.7330E-004 -3.1115E-003
When I use the following format string I get the expected results:
test = '1/1 00:00:00 F1 (sm³/s) -1.3405E-003 -1.1170E-002 -1.0123E-004 9.7769E-003 -8.4673E-004 1.1710E-003 2.6890E-004 2.2413E-003';
textscan(test, '%*s %*s %*s %*s %f %f %f %f %f %f %f %f')
Gives me:
ans =
[-0.0013] [-0.0112] [-1.0123e-04] [0.0098] [-8.4673e-04] [0.0012] [2.6890e-04] [0.0022]
Which is what I want, but when I attempt:
C = textscan(fid,...
'%*s %*s %*s %*s %f %f %f %f %f %f %f %f',...
'CollectOutput', false,...
'Headerlines', 2);
I get a 1x8 cell of empty cells.
What is the error in the format string translation?
I don't think there's anything wrong with your format string specifically.
Try pulling in the lines individually with fgetl or similar and just check that there's nothing you weren't expecting in the file. For example - your code seems to work for me but I can replicate your error by putting an additional blank line at the start of the file, which causes textscan to try and read the second header line as a line of data (and fail inelegantly). That particular error can be removed by increasing the value of HeaderLines.
fid = fopen('test.txt');
fgetl(fid) % repeat until you see your first line of data
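The same check can be scripted. The sketch below writes a small stand-in file with an extra blank line at the top, peeks at the first few lines with fgetl, and counts how many lines come before the first data line. The regular expression assumes that data lines start with a date such as 1/1, and the file contents are invented for the example.

```matlab
% Write a small stand-in file: a blank line, two header lines, one data line
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '\nDate\tTime\tProperty\nLev 1\tLev 1\n1/1\t00:00:00\tF1\n');
fclose(fid);

% Peek at the first few lines
fid = fopen(fname);
nPeek = 4;
peek = cell(nPeek, 1);
for k = 1:nPeek
    peek{k} = fgetl(fid);
end

% First line that starts with a date such as 1/1 (assumed data pattern)
isData = ~cellfun(@isempty, regexp(peek, '^\d+/\d+', 'once'));
nHeader = find(isData, 1) - 1;

% Rewind so a later textscan call would start from the top of the file
frewind(fid);
fclose(fid);
disp(nHeader);
```

The value of nHeader is what would be passed as HeaderLines. For this stand-in file it accounts for the blank line as well as the two header lines.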
I tried your code and it works:
file=('d.txt');
fid=fopen(file);
C = textscan(fid,...
'%*s %*s %*s %*s %f %f %f %f %f %f %f %f',...
'CollectOutput', false,...
'Headerlines', 2);
Output:
celldisp(C)
C{1} =
-0.0013405
0.00019988
-0.00040722
C{2} =
-0.01117
0.0016655
-0.0033931
C{3} =
-0.00010123
0.00022252
-0.00044324
C{4} =
0.0097769
0.0016883
-0.0021177
C{5} =
-0.00084673
0.0018612
-0.0037075
C{6} =
0.001171
0.00020221
-0.00025364
C{7} =
0.0002689
0.00020795
-0.0003733
C{8} =
0.0022413
0.0017333
-0.0031115
I came across a problem where my textscan would only return empty cell arrays, and a Google search led me here. I solved it by calling fgetl(fid) a couple of times and then frewind(fid) (fid being the file identifier returned by fopen). Something about reading the lines first made it easier to bring in the values.

Changing from textread to textscan MATLAB

I have a problem changing my code from the textread function to textscan.
Contents of data.txt (note: I've changed all actual coordinates to dddd.mmmmmm,ddddd.mmmmmm):
$GPGGA,104005.3,dddd.mmmmmm,N,ddddd.mmmmmm,W,1,05,4.4,73.4,M,48.0,M,,*7E
$GPGGA,104006.3,dddd.mmmmmm,N,ddddd.mmmmmm,W,1,05,2.1,73.5,M,48.0,M,,*7F
$GPGGA,104007.3,dddd.mmmmmm,N,ddddd.mmmmmm,W,1,05,2.1,74.0,M,48.0,M,,*70
$GPGGA,104008.3,dddd.mmmmmm,N,ddddd.mmmmmm,W,1,05,2.4,73.9,M,48.0,M,,*7C
$GPGGA,104009.3,dddd.mmmmmm,N,ddddd.mmmmmm,W,1,04,2.4,73.9,M,48.0,M,,*75
Code:
fid = fopen('E:\data.txt','r');
Location=zeros(2,);
Block = 1;
while(~feof(fid))
A=textscan(fid,'%*s %*s %s %*s %s %*s %*s %*s %*s %*s','delimiter',',','delimiter','\n');
Location(:)=[%s %s]';
x=Location(1,:);
y=Location(2,:);
Block = Block+1;
end
display(Location);
The new code is wrong. I'm using 2 delimiters here. I want to take out the latitude and longitude values from each line if they are not null. How can I correct it? Also what do I need to do to take Lat Long values only from lines starting with $GPGGA if there are many different lines in the text file?
This code should work for both your requirements and put in the correct signs (please check):
fid = fopen('data.txt','r');
A=textscan(fid,'%s %*s %f %s %f %s %*s %*s %*s %*s %*s %*s %*s %*s %*s','Delimiter',',');
fclose(fid);
Location = [A{[2, 4]}];
row_idxs = cellfun( @(s) strcmp(s, '$GPGGA'), A{1});
Location = Location(row_idxs, :);
LatSigns = -2*cellfun(@(dir) strcmp(dir, 'S'), A{3}(row_idxs))+1;
LongSigns = -2*cellfun(@(dir) strcmp(dir, 'W'), A{5}(row_idxs))+1;
Location = Location .* [LatSigns LongSigns];
display(Location);
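The sign handling relies on strcmp returning 1 or 0, so that -2*x+1 maps a match to -1 and a non-match to +1. A minimal sketch with invented direction letters and coordinate values (none of them from the question's data):

```matlab
% Invented hemisphere letters and unsigned coordinates
latDir = {'N'; 'S'; 'N'};
lonDir = {'W'; 'W'; 'E'};
vals = [10 20; 30 40; 50 60];   % columns: latitude, longitude

% strcmp gives 1 for a match, so S and W become -1 and the rest +1
latSign = 1 - 2*strcmp(latDir, 'S');
lonSign = 1 - 2*strcmp(lonDir, 'W');

% Apply the signs element by element
signedVals = vals .* [latSign lonSign];
disp(signedVals);
```

Here the first row becomes 10 and -20, the second -30 and -40, and the third stays 50 and 60.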