Scan text file into Matlab - matlab

I have a text file which I want to import into matlab. Here are the first 2 rows of the text file (tempfile.txt):
1,"4/26/2016","6:40:00 PM","111","0","13.45","NaN","ACTIVE","NaN",
2,"4/26/2016","6:40:30 PM","73","0","14.99","NaN","ACTIVE","NaN",
When I tried using textscan:
fid = fopen('tempfile.txt');
data = textscan(fid, '%*d %s %s %s %*d %*d %*d %*s %*s', 'Delimiter', ',')
It only imports the first row of the text file. I have tried adding \n to the formatSpec but it still does not work. Please help!

Your problem is that all of your fields except the first are double-quoted, i.e. they are strings, so you cannot parse them as integers or floats with %d. Parse them as strings instead and convert them to doubles in Matlab:
data = textscan(fid, '%d %s %s %s %s %s %s %s %s', 'Delimiter', ',')
This parses your data. The %s fields keep their surrounding double quotes, so strip those (or read with %q, which drops them) and then use str2double to convert the values to numeric. Why do you have double quotes around everything?
=============EDIT============
Since you only want 3 fields, you should do something like:
fid = fopen('tempfile.txt');
data = textscan(fid, '%*d %s %s %s %*s %*s %*s %*s %*s', 'Delimiter', ',');
fclose(fid);
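As a rough sketch of the approach above (not the only way to do it), the following reads the two sample rows from a character vector instead of a file, uses %q so that the surrounding quotes are dropped, and converts the fourth column with str2double. The variable names txt, dates, times and counts are illustrative, and %*[^\n] is an assumption of this sketch: it skips the rest of each line, including the trailing comma.

```matlab
% Two sample rows from the question, held in memory instead of tempfile.txt.
txt = sprintf('%s\n', ...
    '1,"4/26/2016","6:40:00 PM","111","0","13.45","NaN","ACTIVE","NaN",', ...
    '2,"4/26/2016","6:40:30 PM","73","0","14.99","NaN","ACTIVE","NaN",');

% Skip column 1, read columns 2-4 as quoted text, skip the rest of the line.
data = textscan(txt, '%*d %q %q %q %*[^\n]', 'Delimiter', ',');

dates  = data{1};              % cell array of char, quotes removed
times  = data{2};              % cell array of char, quotes removed
counts = str2double(data{3});  % numeric column vector
```

With a real file, pass the file identifier from fopen to textscan in place of txt.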

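It seems that you have a comma-separated values (CSV) file. For a CSV file that contains only numeric values you can use csvread instead; be aware that it errors on text fields such as the quoted dates here: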
M = csvread('tempfile.txt')

Related

Obtain a table of proper size from a csv in Matlab

I'm trying to obtain a table from a csv file in Matlab. The file is available at the following link: http://vincentarelbundock.github.io/Rdatasets/csv/carData/SLID.csv
fid = fopen('SLID.csv', 'r');
C = textscan(fid, '%s %f %f %d %s %s', 'Delimiter', ',', ...
'headerLines', 1, 'TreatAsEmpty','NA');
fclose(fid);
T = cell2table(C,...
'VariableNames',{'id' 'wages' 'education' 'age' 'sex' 'language'});
whos T
But this way I obtain a 1x6 table, where each element is a cell of size 7425x1. How can I obtain a 7425x6 table instead?
You can get the table you want using the table command:
T = table(C{1},C{2},C{3},C{4},C{5},C{6})
After that, you can set the column names using the table properties:
T.Properties.VariableNames{'Var2'} = 'wages';
etc.
Also, you may want to import the data using the %q specifier, which will remove the double quotes when reading the values from the file:
C = textscan(fid, '%q%f%f%d%q%q', 'Delimiter', ',',...
'headerLines', 1, 'TreatAsEmpty','NA')
But that depends on how you will work with the data later.
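A minimal sketch that combines both suggestions, assuming a few made-up rows laid out like SLID.csv (the header line is left out, so 'headerLines' is not needed here). The column names are passed to table directly through the 'VariableNames' option instead of being set one by one afterwards.

```matlab
% Made-up rows in the same layout as SLID.csv (header line omitted).
txt = sprintf('%s\n', ...
    '"1",10.56,15,40,"Male","English"', ...
    '"2",11,13.2,19,"Male","English"', ...
    '"3",NA,16,49,"Male","Other"');

% %q strips the double quotes; 'NA' in a numeric column becomes NaN.
C = textscan(txt, '%q %f %f %d %q %q', 'Delimiter', ',', ...
    'TreatAsEmpty', 'NA');

% C{:} expands to six column arguments, so the table gets one row per record.
T = table(C{:}, 'VariableNames', ...
    {'id', 'wages', 'education', 'age', 'sex', 'language'});
```

Here size(T) is 3x6 rather than 1x6, because each cell of C is passed as its own column.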

Dimensions of matrices being concatenated are not consistent

I read a CSV file with textscan, and when I try to write the result to a file I get this error: Error using horzcat. Dimensions of matrices being concatenated are not consistent.
If I change the first format specifier in textscan from %s to %f, the error vanishes.
The error occurs when Matlab tries to build [datatest{1} probability], where
probability is a 1000*1 double
datatest{1} is a 1000*1 cell
datatest=textscan(FileID,'%s %*f %f %f %*s %*s %*s %*s %*s %*s %*s %*s %*s %f %f %f %f %f %f %f %f %f %f',1000,'headerlines',1,'delimiter',',');
csvwrite('output.csv',[datatest{1} probability]);
Your variable datatest{1} is a cell array of 1000 cells, each containing a string (the strings may or may not all have the same length).
In your statement [datatest{1} probability] you are trying to concatenate cells (containing strings) with a numeric double array, which does not work. The concatenation operator needs operands of compatible types.
Even if you created a cell array containing all your desired columns, myCellArray={datatest{1} probability}, it would not help, because a cell array cannot be passed to csvwrite.
csvwrite and its more flexible sibling dlmwrite do not accept cell arrays, only numeric data. Since you want to write both strings and numeric values, the straightforward way is to use low-level functions like fprintf.
In your case, to write the file you were expecting, you can use the following code.
col1 = datatest{1} ;                  % extract the column of interest for easier indexing later on
fidw = fopen('output.csv','w') ;      % get a handle on a file to write (necessary with "fprintf")
for iline = 1:numel(probability)      % loop on each line
    fprintf( fidw , '%s, %f\n' , col1{iline} , probability(iline) ) ; % write the line
end
fclose(fidw) ;                        % close the file - IMPORTANT - (necessary with "fprintf")
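If your Matlab version has tables (R2013b or later), another option is to put both columns into a table and let writetable handle the mixed types. This is only a sketch with hypothetical stand-in data for datatest{1} and probability, and it writes to a temporary file rather than output.csv.

```matlab
% Hypothetical stand-ins for datatest{1} and probability.
col1 = {'sampleA'; 'sampleB'; 'sampleC'};
probability = [0.25; 0.5; 0.75];

% A table can hold a text column next to a numeric column.
T = table(col1, probability);

% Temporary file name, for illustration only.
outFile = [tempname '.csv'];

% Write the rows without a header line of variable names.
writetable(T, outFile, 'WriteVariableNames', false);
```

The fprintf loop gives you full control over the number format, whereas writetable picks the format for you.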

Using textscan to read certain rows

I am trying to read data from a text file using textscan in Matlab. Currently, the code provided below reads rows 1 to 4. I need it to read rows from 5 to 8, then rows from 9 to 13 and so on. How would I achieve this?
fileID=fopen(fileName);
num_rows=4;
nHeaderLines = 2;
formatSpec = '%*s %*s %s %s %*s %*s %*s %f %*s';
dataIn = textscan(fileID,formatSpec,num_rows,'HeaderLines',nHeaderLines, 'Delimiter',',' );
fclose(fileID);
Use
file = fopen('myfile');
content = textscan(file,'%s','delimiter','\n');
fclose(file);
Now content{1} holds all the lines of your file as a cell array of strings. You can then take any block of rows you want (for example content{1}(5:8)) and process it as you like.
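One way to process the lines block by block could look like the sketch below. It assumes a small made-up file with two comma-separated columns and no header lines; the names rowText, blocks and chunk are illustrative. Each block of lines is joined back into one character vector and parsed with a second textscan call.

```matlab
% Build a small sample file (made-up data) so the sketch is self-contained.
fileName = [tempname '.txt'];
fid = fopen(fileName, 'w');
for k = 1:10
    fprintf(fid, 'id%d,%d\n', k, 10*k);
end
fclose(fid);

% Read every line as one string.
fid = fopen(fileName);
content = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
delete(fileName);
rowText = content{1};

% Parse the lines in blocks of num_rows (rows 1-4, 5-8, 9-10).
num_rows = 4;
nBlocks = ceil(numel(rowText) / num_rows);
blocks = cell(nBlocks, 1);
for b = 1:nBlocks
    first = (b-1)*num_rows + 1;
    last  = min(b*num_rows, numel(rowText));
    chunk = strjoin(rowText(first:last)', sprintf('\n'));  % one char vector per block
    blocks{b} = textscan(chunk, '%s %f', 'Delimiter', ',');
end
```

blocks{2} then holds the parsed columns for rows 5 to 8, and the last block is simply shorter when the number of rows is not a multiple of num_rows.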

Matlab textscan gone wrong: cellfun to select data from certain lines

Hi I am using the following code to read some values from lines containing 'GPGGA' from data.txt
fid = fopen('D:\data.txt','r');
A=textscan(fid,'%s %*s %f %s %f %s %*s %*s %*s %*s %*s %*s %*s %*s %*s','Delimiter',',');
fclose(fid);
Loc = [A{[2, 4]}];
row_idxs = cellfun( @(s) strcmp(s, '$GPGGA'), A{1});
Loc = Loc(row_idxs, :);
display(Loc);
The code works perfectly if the last line in data.txt is deleted. Not sure why it throws this error when the last line is included in the text file. What is the reason? I'm confused!
"??? Error using ==> horzcat
CAT arguments dimensions are not consistent.
Error in ==> test at 4
Loc = [A{[2, 4]}];"
data.txt
$GPGSV,4,1,16,05,15,046,23,29,47,071,21,16,31,291,18,31,39,202,18*73
$GPGSV,4,1,16,05,15,046,23,29,47,071,21,16,31,291,18,31,39,202,18*73
$GPGSV,4,1,16,05,15,046,23,29,47,071,21,16,31,291,18,31,39,202,18*73
$GPGSV,4,1,16,05,15,046,23,29,47,071,21,16,31,291,18,31,39,202,18*73
$GPGSV,4,2,16,23,13,298,17,25,15,119,17,06,22,247,16,03,04,251,14*75
$GPGSV,4,2,16,23,13,298,17,25,15,119,17,06,22,247,16,03,04,251,14*75
$GPGSV,4,2,16,23,13,298,17,25,15,119,17,06,22,247,16,03,04,251,14*75
$GPGSV,4,2,16,23,13,298,17,25,15,119,17,06,22,247,16,03,04,251,14*75
$GPGGA,1.8,98.90,S,18.0014,E,1,04,1.0,87.8,M,48.0,M,,*76
$GPGGA,1.3,98.91,S,18.0015,E,1,04,1.0,100.7,M,48.0,M,,*40
$GPGGA,1.3,98.92,S,18.0016,E,1,04,1.0,105.4,M,48.0,M,,*4F
$GPGGA,1.8,98.93,S,18.0017,E,1,04,1.0,87.8,M,48.0,M,,*76
$GPGGA,1.8,98.94,S,18.0018,E,1,04,1.0,87.8,M,48.0,M,,*76
$GPGSV,4,4,16,27,,,,26,,,,24,,,,22,,,*79
Your format string is the problem. It describes only 15 columns, but the $GPGSV lines in the sample data you've posted have 20 columns. I suggest using the following code (which runs without error on my machine) instead:
fid = fopen('D:\data.txt','r');
A=textscan(fid,'%s %*s %f %s %f %s %*[^\n]', 'Delimiter',',');
fclose(fid);
Loc = [A{[2, 4]}];
row_idxs = cellfun( @(s) strcmp(s, '$GPGGA'), A{1});
Loc = Loc(row_idxs, :);
display(Loc);
Note the construct %*[^\n] in my format string. This tells textscan to ignore all columns from this point onwards. It is much neater than writing out lots of %*s over and over. Also, it means you're less likely to miscount the number of columns when building the format string :-)
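To see the effect without needing D:\data.txt, here is a small sketch that runs the same format string on three lines taken from the sample data, held in a character vector. The variable txt is illustrative, and strcmp is applied directly to the cell array here, which should give the same logical index as the cellfun call.

```matlab
% Three lines from the sample data: 20, 15 and 20 columns respectively.
txt = sprintf('%s\n', ...
    '$GPGSV,4,1,16,05,15,046,23,29,47,071,21,16,31,291,18,31,39,202,18*73', ...
    '$GPGGA,1.8,98.90,S,18.0014,E,1,04,1.0,87.8,M,48.0,M,,*76', ...
    '$GPGSV,4,2,16,23,13,298,17,25,15,119,17,06,22,247,16,03,04,251,14*75');

% Read six leading columns, then skip whatever is left on each line.
A = textscan(txt, '%s %*s %f %s %f %s %*[^\n]', 'Delimiter', ',');

% Both numeric columns now have one entry per line, so they concatenate.
Loc = [A{[2, 4]}];

% Keep only the rows whose first field is $GPGGA.
row_idxs = strcmp(A{1}, '$GPGGA');
Loc = Loc(row_idxs, :);
```

Because every line yields exactly one value per column, regardless of how many columns it has, the horzcat error no longer occurs.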

Changing from textread to textscan MATLAB

I have a problem changing my code from the textread function to textscan.
Contents of data.txt (note: I've changed all actual coordinates to dddd.mmmmmm,ddddd.mmmmmm):
$GPGGA,104005.3,dddd.mmmmmm,N,ddddd.mmmmmm,W,1,05,4.4,73.4,M,48.0,M,,*7E
$GPGGA,104006.3,dddd.mmmmmm,N,ddddd.mmmmmm,W,1,05,2.1,73.5,M,48.0,M,,*7F
$GPGGA,104007.3,dddd.mmmmmm,N,ddddd.mmmmmm,W,1,05,2.1,74.0,M,48.0,M,,*70
$GPGGA,104008.3,dddd.mmmmmm,N,ddddd.mmmmmm,W,1,05,2.4,73.9,M,48.0,M,,*7C
$GPGGA,104009.3,dddd.mmmmmm,N,ddddd.mmmmmm,W,1,04,2.4,73.9,M,48.0,M,,*75
Code:
fid = fopen('E:\data.txt','r');
Location=zeros(2,);
Block = 1;
while(~feof(fid))
A=textscan(fid,'%*s %*s %s %*s %s %*s %*s %*s %*s %*s','delimiter',',','delimiter','\n');
Location(:)=[%s %s]';
x=Location(1,:);
y=Location(2,:);
Block = Block+1;
end
display(Location);
The new code is wrong. I'm using 2 delimiters here. I want to take out the latitude and longitude values from each line if they are not null. How can I correct it? Also what do I need to do to take Lat Long values only from lines starting with $GPGGA if there are many different lines in the text file?
This code should work for both your requirements and put in the correct signs (please check):
fid = fopen('data.txt','r');
A=textscan(fid,'%s %*s %f %s %f %s %*s %*s %*s %*s %*s %*s %*s %*s %*s','Delimiter',',');
fclose(fid);
Location = [A{[2, 4]}];
row_idxs = cellfun( @(s) strcmp(s, '$GPGGA'), A{1});
Location = Location(row_idxs, :);
LatSigns = -2*cellfun(@(dir) strcmp(dir, 'S'), A{3}(row_idxs))+1;
LongSigns = -2*cellfun(@(dir) strcmp(dir, 'W'), A{5}(row_idxs))+1;
Location = Location .* [LatSigns LongSigns];
display(Location);
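The sign handling can be checked in isolation with a few made-up values. In this sketch the hemisphere letters and coordinates are hypothetical, and strcmp is applied directly to the cell arrays: a match gives 1, so -2*match+1 maps 'S' and 'W' to -1 and everything else to +1.

```matlab
% Hypothetical hemisphere letters and unsigned coordinates for three fixes.
latDir   = {'N'; 'S'; 'N'};
longDir  = {'W'; 'W'; 'E'};
Location = [10.5 20.25; 30.5 40.75; 50.0 60.0];

% strcmp returns 1 where the letter matches, so -2*match + 1 gives -1 or +1.
LatSigns  = -2*strcmp(latDir,  'S') + 1;
LongSigns = -2*strcmp(longDir, 'W') + 1;

% Apply the signs column by column.
Signed = Location .* [LatSigns LongSigns];
```

Southern latitudes and western longitudes come out negative, the rest stay positive.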