How to use textscan to read all the lines in a file (MATLAB)

I am trying to read all the lines in a .m file with the following:
file_content = textscan(fid, '%s', 'delimiter', '\n', 'whitespace', '')
but this just returns
file_content =
{0x1 cell}
when my file actually has 224 lines. If I use
file_content = textscan(fid,'%s',224,'delimiter','\n')
I get all the lines:
file_content =
{224x1 cell}
What would be a more proper way to read all the data (mostly strings) in a .m file?
Thanks

Since you do not list your needs (are you reading a huge file?, many small files? is speed an issue? what do you really want to do?) I'm giving you the simplest possible answer:
You do this:
f = fopen('data.txt');
g = textscan(f,'%s','delimiter','\n');
fclose(f);
Remember to close the file after reading. Otherwise the handle stays open, and a second textscan on the same fid picks up at the end of the file and returns nothing.
You can get the first line as g{1}{1}, the second as g{1}{2}, and so on.
The MATLAB documentation for textscan gives a lot more detail.

Here's a method that worked for me:
fid = fopen('filename','r'); %opening in read mode (default)
inter = textscan(fid,'%[^\n]');
lines = inter{1,1};
fclose(fid);
This command reads the whole file line by line. For example, I had a text file with 1332 lines; this code creates a variable inter, which is a {1,1 cell}, and lines, which is a [1x102944 char].
I'm not sure why or how this works (it'd be great if someone else reading this knows how!), but it works for my program.

That call to textscan means "read everything up to a \n".
In general your file may have mixed line endings, or none at all and have records separated by ':' or something.
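If the line endings are unknown or mixed, one option is to read the whole file as text and split it yourself. The following is a minimal sketch, assuming the file fits in memory. The temporary sample file is made up purely so the snippet runs on its own, and the split pattern only covers \r\n, \n and \r.

```matlab
% Write a small sample file with mixed line endings (illustration only).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'first\r\nsecond\nthird\r');
fclose(fid);

% Read the whole file into one char vector.
txt = fileread(fname);
delete(fname);

% Split on \r\n, \n, or \r. '\r\n' is listed first so that a CRLF pair
% counts as a single line break rather than two.
lines = regexp(txt, '\r\n|\n|\r', 'split');

% A trailing line ending leaves one empty element at the end; drop it.
if ~isempty(lines) && isempty(lines{end})
    lines(end) = [];
end

fprintf('Read %d lines\n', numel(lines));
```

Records separated by some other character, such as ':', could be handled the same way by changing the pattern.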

Related

How to delete every nth lines in a txt file?

I have a txt file (ANSYS 1st principal nodal stress list) with almost 16k lines in it. I want to delete specific lines, for example the 1st, 2nd, 3rd, 4th, 5th, 39th, 40th, 41st, 42nd, 43rd, etc. I don't need to search for anything; I know which lines should be deleted. Can anybody help?

Maybe not the most efficient way but this works:
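data_file = 'data.txt';
lines_to_skip = [1:5, 39:43];
% read every line into a cell array
fid = fopen(data_file);
file_content = {};
ii = 0;
while ~feof(fid)
    ii = ii + 1;
    file_content{ii} = fgetl(fid);
end
fclose(fid); % close the read handle before reopening for writing
% logical mask of the lines to keep
lines = true(1,ii);
lines(lines_to_skip) = false;
% overwrite the file with the kept lines only
fid = fopen(data_file,'w');
fprintf(fid,'%s\r\n',file_content{lines});
fclose(fid);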

If you are using Linux, you can use sed. For example, this command deletes line 2 in place:
sed -i '2d' data.txt

This is tagged as Matlab, but doing this inside Matlab is going to be painful because it doesn't usually offer a convenient way to remove bytes in the middle of a file, so you'd have to write some code to write the text to a new file, skipping lines as appropriate.
If you're on a UNIX system it'll be much easier using sed. There's a great answer here explaining how to do that. The key command is:
# To delete line 10 and 12:
sed -i -e '10d;12d' your-file.txt

reading data from csv files with `textscan` in MATLAB

[Edited:] I have a file data2007a.csv and I copied and pasted (using TextEdit in MacBook) the first consecutive few lines to a new file datatest1.csv for testing:
Nomenclature,ReporterISO3,ProductCode,ReporterName,PartnerISO3,PartnerName,Year,TradeFlowName,TradeFlowCode,TradeValue in 1000 USD
S3,ABW,0,Aruba,ANT,Netherlands Antilles,2007,Export,6,448.91
S3,ABW,0,Aruba,ATG,Antigua and Barbuda,2007,Export,6,0.312
S3,ABW,0,Aruba,CHN,China,2007,Export,6,24.715
S3,ABW,0,Aruba,COL,Colombia,2007,Export,6,95.885
S3,ABW,0,Aruba,DOM,Dominican Republic,2007,Export,6,11.432
I wanted to use textscan to read it into MATLAB with only columns 2,3,5 (starting from the second row) and I wrote the following code
clc,clear all
fid = fopen('datatest1.csv');
data = textscan(fid,'%*s %s %d %*s %s %*[^\n]',...
'Delimiter',',',...
'HeaderLines',1);
fclose(fid);
But I ended up with only the second row of columns 2, 3 and 5.
I then kept the first row of data2007a.csv, selected several other rows, and saved them as datatest2.csv:
Nomenclature,ReporterISO3,ProductCode,ReporterName,PartnerISO3,PartnerName,Year,TradeFlowName,TradeFlowCode,TradeValue in 1000 USD
S3,ABW,1,Aruba,USA,United States,2007,Export,6,1.392
S3,ABW,1,Aruba,VEN,Venezuela,2007,Export,6,5633.157
S3,ABW,2,Aruba,ANT,Netherlands Antilles,2007,Export,6,310.734
S3,ABW,2,Aruba,USA,United States,2007,Export,6,342.42
S3,ABW,2,Aruba,VEN,Venezuela,2007,Export,6,63.722
S3,AGO,0,Angola,DEU,Germany,2007,Export,6,105.334
S3,AGO,0,Angola,ESP,Spain,2007,Export,6,8533.125
And I wrote:
clc,clear all
fid = fopen('datatest2.csv');
data = textscan(fid,'%*s %s %d %*s %s %*[^\n]',...
'Delimiter',',',...
'HeaderLines',1);
fclose(fid);
data{1}
It gives exactly what I wanted.
When I use the same code for my original data file data2007a.csv, it behaves as in the first case.
What is going wrong and how can I fix it?
[Added:] If one replicates my experiments [1], one finds that both cases work and the problem does not exist! I really don't know what is going on.
[1] By "replicate" I mean copy and paste the data given above and save it as two new files, say, datatest4a.csv and datatest4b.csv. I used visdiff('datatest1.csv', 'datatest4a.csv') to compare two files and it returned:

Given how you fixed it, I think this is an end-of-line character issue. This sometimes comes up when moving text files between Windows and Unix based systems, as they use different conventions.
When you add %*[^\n] to the end of a textscan format, as you have here, it means "skip everything to the end of the line". But if it expects a specific end-of-line character and can't find one, it will skip everything to the end of the file. This would explain why you get one row read correctly and then nothing else.
If you don't specify what the end of line character is, Matlab appears to default to... something... in this not very clear specification in the help:
The default end-of-line sequence is \n, \r, or \r\n, depending on the contents of your file.
One way to try to cure this without having to create a new file would be to add 'EndOfLine', '\r\n' to your textscan call. From the help:
If you specify '\r\n', then textscan treats any of \r, \n, and the combination of the two (\r\n) as end-of-line characters.
This will hopefully handle most standard(ish) EOL conventions. It is likely that copy-pasting and saving with a different bit of software than was originally used to create the file changed the end of line characters such that Matlab was able to recognise them.
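As a sketch of that suggestion, here is the call from the question with the EndOfLine option added. The sample file written at the top exists only to make the snippet self-contained: its rows are copied from the question, the temporary file name is made up, and the CRLF line endings are an assumption, since the question does not say which line endings the original file used.

```matlab
% Write a small CRLF-terminated sample file (illustration only).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\r\n', ...
    'Nomenclature,ReporterISO3,ProductCode,ReporterName,PartnerISO3,PartnerName,Year,TradeFlowName,TradeFlowCode,TradeValue in 1000 USD', ...
    'S3,ABW,0,Aruba,ANT,Netherlands Antilles,2007,Export,6,448.91', ...
    'S3,ABW,0,Aruba,ATG,Antigua and Barbuda,2007,Export,6,0.312');
fclose(fid);

% Same call as in the question, plus an explicit EndOfLine option.
fid = fopen(fname);
data = textscan(fid, '%*s %s %d %*s %s %*[^\n]', ...
    'Delimiter', ',', ...
    'HeaderLines', 1, ...
    'EndOfLine', '\r\n');
fclose(fid);
delete(fname);

disp(data{1});   % column 2 (ReporterISO3)
disp(data{3});   % column 5 (PartnerISO3)
```

If the real file still yields only one row with this option, inspecting its raw bytes (for example with fread) would show which characters actually separate the lines.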

Write data to one file, numerous times

I have a very long data set that needs to be analyzed, so I've decided to write it to a file several times during the code and store it as a .csv file.
I've tried the following:
fid = fopen('c:\temp\results\final.csv','w'); %fid is a global variable
In another file I do the following:
string_data = strcat(fullFileName, '_' ,stim_vec,'_THD.csv');
fprintf(fid, '\n%s\n', string_data);
dlmwrite('c:\temp\results\final.csv', THD, '-append');
string_data = strcat(fullFileName, '_' ,stim_vec,'_phase_diff.csv');
fprintf(fid, '\n%s\n', string_data);
dlmwrite('c:\temp\results\final.csv', phase_diff, '-append');
And again in the first file
fclose(pid)
My problem is that the data from the dlmwrite commands is only written once. I think the fprintf data overwrites it, but I couldn't figure out how.
Can someone please assist me?
Thanks,

Reading huge .csv files with matlab - file is not well organized

I have several .csv files that I read with matlab using textscan, because csvread and xlsread do not support files of this size (200 MB to 600 MB).
I use this line to read them:
C = textscan(fileID,'%s%d%s%f%f%d%d%d%d%d%d%d','delimiter',',');
The problem I have found is that sometimes the data is not in this format, and then textscan stops reading at that line without any error.
So what I have done is to read it this way:
C = textscan(fileID,'%s%d%s%f%f%s%s%s%s%s%s%s%s%s%s%s','delimiter',',');
This way I can see that in 2 rows out of 3 million there is a change in the format.
I want to read all the lines except the bad/different lines.
In addition, is it possible to read only the lines where the first string is 'PAA'?
I have tried to load the file directly into matlab, but it's super slow and sometimes it gets stuck. For the really big ones it reports a memory problem.
Any recommendations?

For large files which are still small enough to fit your memory, parsing all lines at once is typically the best choice.
f = fopen('data.txt');
g = textscan(f,'%s','delimiter','\n');
fclose(f);
In the next step, identify the lines starting with PAA using strncmp.
Now that your data is filtered, apply your textscan expression above to each line. If it fails, try the other.
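The filter-then-parse steps might look roughly like this. It is a sketch under stated assumptions: the sample lines are made up and stand in for g{1}, and a line is treated as well formed only when the 12-field format from the question yields exactly one value per field.

```matlab
% Sample lines standing in for g{1} (made-up data; the third line is
% deliberately malformed).
lines = { ...
    'PAA,1,x,1.5,2.5,1,2,3,4,5,6,7'; ...
    'PBB,2,y,3.5,4.5,1,2,3,4,5,6,7'; ...
    'PAA,oops,z'; ...
    'PAA,3,w,5.5,6.5,7,6,5,4,3,2,1'};

% Keep only the lines whose first three characters are 'PAA'.
isPAA = strncmp(lines, 'PAA', 3);
paaLines = lines(isPAA);

fmt = '%s%d%s%f%f%d%d%d%d%d%d%d';
good = {};
for k = 1:numel(paaLines)
    % textscan also accepts a character vector instead of a file ID.
    C = textscan(paaLines{k}, fmt, 'Delimiter', ',');
    % Accept the line only if every field produced exactly one value.
    if all(cellfun(@numel, C) == 1)
        good{end+1} = C; %#ok<AGROW>
    end
end

fprintf('%d of %d PAA lines parsed cleanly\n', numel(good), numel(paaLines));
```

Calling textscan once per line is likely to be slow for millions of lines, so it may be worth running it only on the lines a cheaper check (such as counting commas) flags as suspicious.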

Matlab is slow with this kind of thing because it needs to load everything into memory. I would suggest using grep/bash/cmd lines to reduce your file to readable lines before processing them in Matlab. In Linux you can run:
awk '/^PAA/' yourfile.csv > yourNewFile.csv
This gives you a new file with all the lines that start with PAA (note: case sensitive).
To find lines that do not have the same format, you can use:
awk -F ',' 'NF != 12 {print NR, $0}' yourfile.csv > yourBadLines.txt
This counts the comma-separated fields on each line and prints the line number and content of every line that does not have exactly 12 fields (11 commas), which is the number of fields in the textscan format above.

Preventing fgets from deleting first line

I'm opening a file, reading the first line using fgets, using regexp to test what format the file is in, and if the file is in the desired format, I use fscanf to read the entire file.
fid = fopen('E:\Tick Data\Data Output\Differentformatfiles\AUU01.csv','rt');
% reads first line of file but seems to be deleting the line:
str = fgets(fid);
% test for pattern mm/dd/yyyy
if(regexp(str, '\d\d/\d\d/\d\d\d\d'))
c = fscanf(fid, '%d/%d/%d,%d:%d:%d,%f,%d,%*c');
Unfortunately, if the contents of my file look like:
20010701,08:29:30.000,95.00,29,E
20010702,08:29:30.000,95.00,68,E
20010703,08:29:30.000,95.00,5,E
20010704,08:29:30.000,95.00,40,E
20010705,08:29:30.000,95.00,72,E
str will equal 20010701,08:29:30.000,95.00,29,E, but c will only equal the last 4 lines:
20010702,08:29:30.000,95.00,68,E
20010703,08:29:30.000,95.00,5,E
20010704,08:29:30.000,95.00,40,E
20010705,08:29:30.000,95.00,72,E
Is there a way to prevent fgets from deleting the first line? Or another function I should use?

It isn't actually erasing it; fgets just moves the file position on to the next line. You could use a combination of ftell and fseek (or frewind) to go back to the beginning of that line, but since you've already got the line stored in str, I would add two lines:
if(regexp(str, '\d\d/\d\d/\d\d\d\d'))
    c1 = sscanf(str, '%d/%d/%d,%d:%d:%d,%f,%d,%*c'); % scan the line already read
    c2 = fscanf(fid, '%d/%d/%d,%d:%d:%d,%f,%d,%*c'); % scan the rest of the file
    c = [c1; c2]; % concatenate the two numeric column vectors
end
It certainly isn't the most elegant solution, but it's robust and easy to shoehorn into your existing code.
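For comparison, the rewind alternative could be sketched like this. The sample file is written only so the snippet runs on its own: its rows are copied from the question and the temporary file name is made up.

```matlab
% Write a small sample file (illustration only).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    '20010701,08:29:30.000,95.00,29,E', ...
    '20010702,08:29:30.000,95.00,68,E');
fclose(fid);

fid = fopen(fname, 'rt');
startPos = ftell(fid);        % remember where the first line starts
str = fgets(fid);             % read the first line; the position advances
fseek(fid, startPos, 'bof');  % jump back (frewind(fid) also works here)
again = fgets(fid);           % the same first line is read a second time
fclose(fid);
delete(fname);

disp(isequal(str, again));
```

After the fseek call, a single fscanf on fid would see the whole file, including the first line.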
