[Edited:] I have a file data2007a.csv. Using TextEdit on a MacBook, I copied and pasted its first few consecutive lines into a new file, datatest1.csv, for testing:
Nomenclature,ReporterISO3,ProductCode,ReporterName,PartnerISO3,PartnerName,Year,TradeFlowName,TradeFlowCode,TradeValue in 1000 USD
S3,ABW,0,Aruba,ANT,Netherlands Antilles,2007,Export,6,448.91
S3,ABW,0,Aruba,ATG,Antigua and Barbuda,2007,Export,6,0.312
S3,ABW,0,Aruba,CHN,China,2007,Export,6,24.715
S3,ABW,0,Aruba,COL,Colombia,2007,Export,6,95.885
S3,ABW,0,Aruba,DOM,Dominican Republic,2007,Export,6,11.432
I wanted to use textscan to read only columns 2, 3 and 5 into MATLAB (starting from the second row), so I wrote the following code:
clc,clear all
fid = fopen('datatest1.csv');
data = textscan(fid,'%*s %s %d %*s %s %*[^\n]',...
'Delimiter',',',...
'HeaderLines',1);
fclose(fid);
But I ended up with only the second row of columns 2,3 and 5:
I then kept the first row of data2007a.csv, selected several other rows, and saved them as datatest2.csv:
Nomenclature,ReporterISO3,ProductCode,ReporterName,PartnerISO3,PartnerName,Year,TradeFlowName,TradeFlowCode,TradeValue in 1000 USD
S3,ABW,1,Aruba,USA,United States,2007,Export,6,1.392
S3,ABW,1,Aruba,VEN,Venezuela,2007,Export,6,5633.157
S3,ABW,2,Aruba,ANT,Netherlands Antilles,2007,Export,6,310.734
S3,ABW,2,Aruba,USA,United States,2007,Export,6,342.42
S3,ABW,2,Aruba,VEN,Venezuela,2007,Export,6,63.722
S3,AGO,0,Angola,DEU,Germany,2007,Export,6,105.334
S3,AGO,0,Angola,ESP,Spain,2007,Export,6,8533.125
And I wrote:
clc,clear all
fid = fopen('datatest2.csv');
data = textscan(fid,'%*s %s %d %*s %s %*[^\n]',...
'Delimiter',',',...
'HeaderLines',1);
fclose(fid);
data{1}
It gives exactly what I wanted:
When I use the same code on my original data file data2007a.csv, it behaves as in the first case: only one row is read.
What is going wrong and how can I fix it?
[Added:] If one replicates my experiments [1], one finds that both cases work and the problem does not exist! I really don't know what is going on.
[1] By "replicate" I mean copying and pasting the data given above and saving it as two new files, say, datatest4a.csv and datatest4b.csv. I used visdiff('datatest1.csv', 'datatest4a.csv') to compare the two files and it returned:
Given how you fixed it, I think this is an end-of-line character issue. This sometimes comes up when moving text files between Windows and Unix-based systems, as they use different conventions.
When you add %*[^\n] to the end of a textscan format, as you have here, it means to skip everything up to the end of the line. But if it expects a specific end-of-line character and can't find one, it will skip everything to the end of the file. This would explain why you get one row read correctly and then nothing else.
If you don't specify what the end of line character is, Matlab appears to default to... something... in this not very clear specification in the help:
The default end-of-line sequence is \n, \r, or \r\n, depending on the contents of your file.
One way to try to cure this without having to create a new file would be to add 'EndOfLine', '\r\n' to your textscan call. From the help:
If you specify '\r\n', then textscan treats any of \r, \n, and the combination of the two (\r\n) as end-of-line characters.
This will hopefully handle most standard(ish) EOL conventions. It is likely that copy-pasting and saving with a different bit of software than was originally used to create the file changed the end of line characters such that Matlab was able to recognise them.
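As a rough sketch of the suggestion above, the following writes a small copy of the sample data with Windows-style \r\n line endings, counts the carriage-return and line-feed bytes to see which convention the file uses, and then reads it with 'EndOfLine', '\r\n'. The temporary file name and the variables nCR and nLF are made up for this illustration, and the \r\n endings are an assumption; the original data2007a.csv may use a different convention, so checking the byte counts on the real file first is probably worthwhile.

```matlab
% Build a small test file with explicit \r\n line endings (an assumption
% for this demo; the original file's endings are unknown).
fileName = [tempname '.csv'];
fid = fopen(fileName, 'w');
fprintf(fid, 'Nomenclature,ReporterISO3,ProductCode,ReporterName,PartnerISO3,PartnerName,Year,TradeFlowName,TradeFlowCode,TradeValue in 1000 USD\r\n');
fprintf(fid, 'S3,ABW,0,Aruba,ANT,Netherlands Antilles,2007,Export,6,448.91\r\n');
fprintf(fid, 'S3,ABW,0,Aruba,ATG,Antigua and Barbuda,2007,Export,6,0.312\r\n');
fclose(fid);

% Diagnostic: count carriage returns (13) and line feeds (10) in the raw bytes.
fid = fopen(fileName, 'r');
raw = fread(fid, Inf, '*char')';
fclose(fid);
nCR = sum(raw == char(13));
nLF = sum(raw == char(10));

% Read columns 2, 3 and 5, telling textscan to accept \r, \n or \r\n.
fid = fopen(fileName, 'r');
data = textscan(fid, '%*s %s %d %*s %s %*[^\n]', ...
    'Delimiter', ',', ...
    'HeaderLines', 1, ...
    'EndOfLine', '\r\n');
fclose(fid);
delete(fileName);
```

If nCR is nonzero while nLF is zero, the file uses bare \r line endings, which is one plausible way for a file to read differently after being re-saved in another editor.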
I'm quite new to Matlab and programming in general and would love to get some help with the following. I've looked here on the website, but couldn't find an answer.
I am trying to use a for-loop and fprintf to give me a bunch of separate text files, whose file names contain the index I use for my for-loop. See for example this piece of code to get the idea of what I'd like to do:
for z=1:20
for x=1:z;
b=[x exp(x)];
fid = fopen('table z.txt','a');
fprintf(fid,'%6.2f, %6.2f\n',b);
fclose(fid);
end
end
What I'm looking for, is a script that (in this case) gives me 20 separate .txt files with names 'table i.txt' (i is 1 through 20) where
table 1.txt only contains [1, exp(1)],
table 2.txt contains [1, exp(1)] \newline [2, exp(2)]
and so on.
If I run the script above, I get only one text file (named 'table z.txt') with all the data appended underneath. So the file name passed to fopen doesn't 'feel' the z values, but interprets z as a letter (which, seeing the quotation marks, doesn't really surprise me).
I think there must be an elegant way of doing this, but I haven't been able to find it. I hope someone can help.
Best,
L
Use num2str and string concatenation [ ... ]:
fid = fopen( ['table ' num2str(z) '.txt'],'a');
Opening your file in the innermost loop is inefficient; you should create the file as soon as you know z (see the example below). To format a string the same way fprintf does, you can use sprintf.
for z=1:20
    fname = sprintf('table %d.txt',z);
    fid = fopen(fname,'w');
    for x=1:z
        fprintf(fid,'%6.2f, %6.2f\n', x, exp(x));
    end
    fclose(fid);
end
I am using MATLAB.
I have 51 files in their own directory, all with the .out extension, created by a separate program and numbered 0 to 50,
i.e.
0.out
1.out
2.out
and so on until 50.out.
I need to load each file one by one to do calculations on them within a for loop. How would I do this using the count variable to load the file, if the directory is set beforehand?
i.e.
% set directory
cd(......)
% for loop
for count = 0:50
data = count.out   % <----- this line
.....
Many thanks!
First generate the file name with
fileName = [int2str(count) '.out'];
then open the file with
fid = fopen(fileName, 'r');
The loading phase depends on the kind of file you want to read. Assuming it is a text file you can, for example, read it line after line with
while ~feof(fid)
line = fgetl(fid);
end
or use more specialized functions (see http://www.mathworks.it/it/help/matlab/text-files.html). Before the end of the for loop you'll have to close the file by calling
fclose(fid);
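Putting these steps together, a minimal sketch of the whole loop might look like the following. It assumes the files are plain text, and to stay self-contained it first creates three small .out files in a temporary folder; demoDir, lastIndex and lineCounts are names invented for this illustration (the question would use 0:50 and its own directory).

```matlab
% Create a few demo files so the sketch runs on its own (hypothetical data).
demoDir = tempname;
mkdir(demoDir);
lastIndex = 2;                          % the question uses 50
for count = 0:lastIndex
    fid = fopen(fullfile(demoDir, [int2str(count) '.out']), 'w');
    fprintf(fid, 'file %d, first line\nfile %d, second line\n', count, count);
    fclose(fid);
end

% Loop over the numbered files: build the name, open, read, close.
lineCounts = zeros(1, lastIndex + 1);
for count = 0:lastIndex
    fileName = fullfile(demoDir, [int2str(count) '.out']);
    fid = fopen(fileName, 'r');
    if fid == -1
        error('Could not open %s', fileName);
    end
    while true
        tline = fgetl(fid);             % returns -1 (not char) at end of file
        if ~ischar(tline), break; end
        % ... calculations on tline would go here ...
        lineCounts(count + 1) = lineCounts(count + 1) + 1;
    end
    fclose(fid);
end
rmdir(demoDir, 's');                    % remove the demo files again
```

Checking the return value of fopen is optional, but it tends to give a clearer error than a failing read on an invalid file identifier.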
Another quite nice way to do it is to use the dir function:
http://www.mathworks.co.uk/help/matlab/ref/dir.html
a = dir('c:\docs\*.out')
This will give you a structure array containing all the info about the *.out files in the directory (or path) you point it to. You can then loop through it element by element, using fopen or csvread or whatever file-reading function you want to use.
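A small sketch of the dir approach, under the assumption that the files are plain text: it creates three demo .out files in a temporary folder, lists them with dir, and reads the first line of each. The names demoDir, listing and names are invented for this example.

```matlab
% Create a few demo files so the sketch is self-contained (hypothetical data).
demoDir = tempname;
mkdir(demoDir);
for k = 0:2
    fid = fopen(fullfile(demoDir, sprintf('%d.out', k)), 'w');
    fprintf(fid, '%d\n', k);
    fclose(fid);
end

% List every .out file in the folder and loop over the result.
listing = dir(fullfile(demoDir, '*.out'));
names = cell(1, numel(listing));
for k = 1:numel(listing)
    fileName = fullfile(demoDir, listing(k).name);
    fid = fopen(fileName, 'r');
    firstLine = fgetl(fid);             % replace with any reading function
    fclose(fid);
    names{k} = listing(k).name;
end
rmdir(demoDir, 's');                    % remove the demo files again
```

One thing to keep in mind is that dir returns names in the order the operating system reports them, which is typically alphabetical, so 10.out may come before 2.out. If numeric order matters, building the names from the loop counter is the simpler route.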
I am relatively new to using Matlab and I don't have much knowledge about programming either. For a project I am working on currently I need to process a lot of data which is logged using the following format.
$GPRMC,202124.985,V,,,,,,,091112,,,N*44
2038,4674,4667,5593,3379
2087,5133,5111,6084,3372
2138,5134,5114,6080,3376
2188,5133,5114,6084,3377
2238,5130,5113,6084,3410
2287,5134,5113,6080,3416
2337,5133,5110,6080,3417
2387,5133,5110,6084,3416
2438,5130,5113,6081,3396
2487,5132,5110,6080,3410
$GPRMC,202125.985,V,,,,,,,091112,,,N*45
2985,5130,5113,6085,3988
3035,5130,5118,6084,4541
3085,5138,5113,6082,5186
3135,5130,5114,6081,6001
3185,5134,5110,6084,6311
3234,5134,5113,6084,6319
3284,5131,5114,6084,6316
3339,5131,5110,6084,6260
3389,5130,5114,6080,6178
3438,5134,5110,6085,6077
$GPRMC,202126.985,V,,,,,,,091112,,,N*46
3942,5131,5114,6085,5916
3992,5130,5110,6084,5917
4042,5133,5110,6084,5950
4091,5131,5114,6080,5996
4142,5134,5114,6085,6062
4192,5134,5114,6084,6129
4242,5134,5110,6080,6150
4291,5130,5110,6079,6186
4341,5130,5110,6089,6246
4391,5130,5118,6083,6266
It continues like this until the end of the file. What I want to do is separate the data so that all the '$GPRMC' rows are listed together as text (not split into fields) in one file or array, while all the other (numerical) rows are listed together in another file or array (comma-separated is desirable). Is it even possible? If it is, then can you please give me some pointers?
Not quite sure what you mean by separated or not separated. If you copy the text you posted into some file like testf.dat, a simple script like this using fopen, fprintf, and fgets might be what you're looking for:
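infile = fopen('testf.dat');
outf1 = fopen('GPRMC.dat','w');
outf2 = fopen('nums.dat','w');
tline = fgets(infile);                  % fgets keeps the newline character
while ischar(tline)                     % fgets returns -1 at end of file
    if strncmp(tline, '$GPRMC', 6)      % also safe for lines shorter than 6 characters
        fprintf(outf1, '%s', tline);    % '%s' so the data is not parsed as a format string
    else
        fprintf(outf2, '%s', tline);
    end
    tline = fgets(infile);
end
fclose(infile);
fclose(outf1);
fclose(outf2);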
I am trying to read all the lines in a .m file with the following
file_content = textscan(fid, '%s', 'delimiter', '\n', 'whitespace', '')
but this just returns
file_content =
{0x1 cell}
when my file actually has 224 lines. If I use
file_content = textscan(fid,'%s',224,'delimiter','\n')
I get all the lines:
file_content =
{224x1 cell}
What would be a more proper way to read all the data (mostly strings) in a .m file?
Thanks
Since you do not list your needs (are you reading a huge file?, many small files? is speed an issue? what do you really want to do?) I'm giving you the simplest possible answer:
You do this:
f = fopen('data.txt');
g = textscan(f,'%s','delimiter','\n');
fclose(f);
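Remember to close the file after reading. Otherwise the file identifier stays open with its position at the end of the file, and a second textscan call on the same identifier returns nothing.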
You can get the first line as g{1}{1}, the second as g{1}{2} and so on.
Here is the matlab documentation for textscan which gives a lot more details.
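The sketch below illustrates that point with a throwaway three-line file (the file name and contents are invented for this example). It reads all lines, shows that a second textscan call on the same identifier comes back empty because the position is already at the end of the file, and then uses frewind to read again.

```matlab
% Write a small demo file (hypothetical content, no trailing newline).
fileName = [tempname '.txt'];
f = fopen(fileName, 'w');
fprintf(f, 'line one\nline two\nline three');
fclose(f);

f = fopen(fileName, 'r');
g = textscan(f, '%s', 'Delimiter', '\n');       % one cell per line
second = textscan(f, '%s', 'Delimiter', '\n');  % already at end of file
frewind(f);                                     % move back to the start
third = textscan(f, '%s', 'Delimiter', '\n');   % reads the lines again
fclose(f);
delete(fileName);

firstLine = g{1}{1};
```

An empty result like the {0x1 cell} in the question is consistent with the identifier having already been read to the end by an earlier call, although that is only a guess about the asker's session.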
Here's a method that worked for me:
fid = fopen('filename','r'); %opening in read mode (default)
inter = textscan(fid,'%[^\n]');
lines = inter{1,1};
fclose(fid);
This command reads the whole file 'line by line'. For example, I had a text file with 1332 lines; this code creates a variable inter, which is a {1,1 cell}, and lines, which is a [1x102944 char].
I'm not sure why or how this works (it'd be great if someone else reading this knows!) but it works for my program.
That call to textscan means "read everything up to a \n".
In general your file may have mixed line endings, or none at all and have records separated by ':' or something.