How to load a csv file in MATLAB by skipping erroneous rows? - matlab

I am using the csvread command to read a large CSV file:
M=csvread('myfile.csv');
But there are few rows in it, apparently, which do not allow MATLAB to load the file because of being text (or otherwise garbage). For example, line number 45372, 117573, etc. So, how do I skip them when loading the file?

When few mean already few you can use try-catch structure in while loop. It is also needed to know number of columns.
Following code expect error message:
Mismatch between file and format string.
Trouble reading number from file (row <row#>, field <field#>) ==> <faulty line content>
The code tries to load whole csv in once. If it fails it analyses error message, identifies the faulty line, loads the file to the line before and appends the result to the data variable. In next run it starts loading one line below the faulty line. This iteration is terminated when the remaining file is succesfully read to the end.
StartRow=<number of ignored rows (header)>;
Ncols=<width>;
data=zeros(0,Ncols);
csvFault=true;
while csvFault
try
Temp=csvread('YourFile.csv',StartRow,0) % Try to read file from StartRow to the end-of-file.
csvFault=false; % Executed only when CSVread is succesful
catch msg
%% Error occured
faultyRow=regexp(msg.message,' ','split'); % Split error message into cells
faultyRow=faultyRow{12}; % 12th cell contain row identifier
faultyRow=str2double(faultyRow(1:end-1)); % Identifier is number followed by 1 letter
%% Read the data from the segment between faulty lines
Temp=csvread('YourFile.csv',StartRow,0,[StartRow,0,faultyRow-1,Ncols-1])
StartRow=faultyRow+1;
end
data=[data;Temp];
end

Related

How can I import a table in MATLAB with footer rows?

I have a text file with header lines above the table and below the table is a blank line and then a table with summary statistics for the table. Handling the header lines is easy as most of the standard functions have an option for that (i.e. readtable). The length of the file is not always the same. The issue with readtable is that the footer table has fewer columns than the main table, so the function is unable to read those lines and returns an error.
This is the error that I get with readtable:
Error using readtable (line 216)
Reading failed at line 2285. All lines of a text file must have the same number of delimiters. Line 2285 has 0 delimiters, while preceding lines
have 24.
Note: readtable detected the following parameters:
'Delimiter', '\t', 'HeaderLines', 21, 'ReadVariableNames', true, 'Format', '%T%f%f%f%q%f%f%f%f%f%f%q%f%f%f%f%f%f%f%f%f%f%f%f%f'
Here's what I've come up with as an alternative solution:
dataStartRow = 23;
numRows = length(readmatrix(filePath, 'NumHeaderLines',0));
dataEndRow = numRows - 8;
opts = detectImportOptions(filePath);
opts.DataLines = [dataStartRow, dataEndRow];
dataTable = readtable(filePath, opts);
This works but I have another file with a different number of footer rows and I don't know how to deal with this without hardcoding in the number of footer lines.
I've considered using fgetl, and reading lines in one by one to determine when to stop adding to the table, but that seems very inefficient. How can I import this table with an unknown number of table lines and an unknown number of footer lines?
First of all, don't conclude that something 'seems very inefficient' unless you've profiled or timed it and found that it's actually too slow for your requirements.
In this case though, you can change the action MATLAB takes in the event of an error, by setting this property of your delimitedTextImportOptions object:
opts = detectImportOptions(filePath);
opts.ImportErrorRule = 'omitrow'; # ignore any lines that don't match the detected pattern
dataTable = readtable(filePath, opts);
If you use this code regularly to read in new data, I'd consider doing some sort of validation on dataTable to make sure it is consistent with what you expect, just in case a new file causes detectImportOptions to give a different result for some reason. For example if you know that the number of columns and their format should always be the same, you could specify
opts.Format = '%T%f%f%f%q%f%f%f%f%f%f%q%f%f%f%f%f%f%f%f%f%f%f%f%f';
and then check that the resulting table is not empty.

Matlab saving a .mat file with a variable name

I am creating a matlab application that is analyzing data on a daily basis.
The data is read in from an csv file using xlsread()
[num, weather, raw]=xlsread('weather.xlsx');
% weather.xlsx is a spreadsheet that holds a list of other files (csv) i
% want to process
for i = 1:length(weather)
fn = [char(weather(i)) '.csv'];
% now read in the weather file, get data from the local weather files
fnOpen = xlsread(fn);
% now process the file to save out the .mat file with the location name
% for example, one file is dallasTX, so I would like that file to be
% saved as dallasTx.mat
% the next is denverCO, and so denverCO.mat, and so on.
% but if I try...
fnSave=[char(weather(i)) '.mat'] ;
save(fnSave, fnOpen) % this doesn't work
% I will be doing quite a bit of processing of the data in another
% application that will open each individual .mat file
end
++++++++++++++
Sorry about not providing the full information.
The error I get when I do the above is:
Error using save
Argument must contain a string.
And Xiangru and Wolfie, the save(fnSave, 'fnOpen') works as you suggested it would. Now I have a dallasTX.mat file, and the variable name inside is fnOpen. I can work with this now.
Thanks for the quick response.
It would be helpful if you provide the error message when it doesn't work.
For this case, I think the problem is the syntax for save. You will need to do:
save(fnSave, 'fnOpen'); % note the quotes
Also, you may use weather{i} instead of char(weather(i)).
From the documentation, when using the command
save(filename, variables)
variables should be as described:
Names of variables to save, specified as one or more character vectors or strings. When using the command form of save, you do not need to enclose the input in single or double quotes. variables can be in one of the following forms.
This means you should use
save(fnSave, 'fnOpen');
Since you want to also use a file name stored in a variable, command syntax isn't ideal as you'd have to use eval. In this case the alternative option would be
eval(['save ', fnSave, ' fnOpen']);
If you had a fixed file name (for future reference), this would be simpler
save C:/User/Docs/MyFile.mat fnOpen

Postgres: Inconsistent reported row numbers by failed Copy error

I have a file foo that I'd like to copy into a table.
\copy stage.from_csv FROM '/path/foo.dat' CSV;
If foo has an error like column mismatch or bad type, the return error is normal:
CONTEXT: COPY from_csv, line 5, column report_year: "aa"
However, if the error is caused by an extraneous quotation mark, the reported line number is always one greater than the size of the file.
CONTEXT: COPY from_csv, line 11: "02,2004,"05","123","09228","00","SUSX","PR",30,,..."
The source file has 10 lines, and I placed the error in line 5. If you examine the information in the CONTEXT message, it contains the line 5 data, so I know postgres can identify the row itself. However, it cannot identify the row by number. I have done this with a few different file lengths and the returned line number behavior is consistent.
Anyone know the cause and/or how to get around this?
That is because the error manifests only at the end of the file.
Look at this CSV file:
"google","growing",1,2
"google","growing",2,3
"google","falling","3,4
"google","growing",4,5
"yahoo","growing",1,2
There is a bug in the third line, an extra " has been added before the 3.
Now to parse a CSV file, you first read until you hit the next line break.
But be careful, line breaks that are within double quotes don't count!
So all of the line breaks are part of a quoted string, and the line continues until the end of file.
Now that we have read our line, we can continue parsing it and notice that the number of quotes is unbalanced. Hence the error message:
ERROR: unterminated CSV quoted field
CONTEXT: COPY stock, line 6: ""google","falling","3,4
"google","growing",4,5
"yahoo","growing",1,2
"
In a nutshell: the error does not occur until we have reached the end of file.

Abaqus *.inp file created using Matlab

I was trying to do parametric studies in ABAQUS. I created an *.inp file (master) using GUI in abaqus, then wrote a matlab code to create a new *.inp file using the master. Master *.inp file can be found here and will be required to run the code.
In the new *.inp file everything was same as the master except a few specific lines which I am changing for parametric studies, code is given below. I am getting the files nicely but the problem is ABAQUS can't read the file and gives error messages. By visual inspection I don't find any faults. I guess matlab is writing the *.inp file in some other format which ABAQUS can't interpret.
clc;
%Number of lines to be copied
total_lines=4538; %total number of lines
lines_b4_RP1=4406; % lines before reference point 1
%creating new files
for A=0
for R=[20 30 40 50 100 200 300 400 500]
fileroot = sprintf('P_SHS_120X120X1_NLA_I15_A%dR%d.inp', A,R);
main_inp=fopen('P_SHS_120X120X1_NLA_I15_A0R10.inp','r'); %inputting the main inp file to be copied
wfile=fopen(fileroot,'w+'); %wfile= writing the new file
for i=1:total_lines
data=fgets(main_inp);
if i<lines_b4_RP1
fprintf(wfile,'%s\n', data);
elseif i==lines_b4_RP1
formatline1=('%s\n');
txtline='*Node';
fprintf(wfile, formatline1 ,txtline);
elseif i==(lines_b4_RP1+1)
formatline2=('%d%s%d%s%d%s%d\r\n');
comma=',';
refpt1=1;
xcoord1=R*cosd(A);
ycoord1=R*sind(A);
zcoord1=-20;
fprintf(wfile, formatline2, refpt1,comma,xcoord1,comma,ycoord1,comma,zcoord1);
elseif i==(lines_b4_RP1+2)
fprintf(wfile, formatline1 ,txtline);
elseif i==(lines_b4_RP1+3)
refpt2=2;
xcoord2=R*cosd(A);
ycoord2=R*sind(A);
zcoord2=420;
fprintf(wfile, formatline2 ,refpt2,comma,xcoord2,comma,ycoord2,comma,zcoord2);
elseif i>(lines_b4_RP1+3)
fprintf(wfile,'%s\n', data);
else break;
end
end
fclose(main_inp);
fclose(wfile);
end
end
Thanks in advance.
N.B. A sample *.dat file containing the error message is given here.
You are using fgets to get each line of the input file. From the matlab help:
fgets: Read line from file, keeping newline characters
You then print each line using
fprintf(wfile,'%s\n', data);
This creates two newlines at the end of each data line in the file. A second problem in your file is that you use \r\n in your format specifier. In matlab (unlike C) this will give you two newlines. e.g.
>> fprintf('Hello\rWorld\nFoo\r\nBar\n')
Hello
World
Foo
Bar
>>
I would suggest in future to test this approach with a much simpler format that you can use. Also there is a
*preprint
option that allows you to echo the contents of the input file back into the dat file. This creates big dat files, but it is useful for debugging.

How to avoid the repeated paragraghs of long txt files being ignored for importdata in matlab

I am trying to import all double from a txt file, which has this form
#25x1 string
#9999x2 double
.
.
.
#(repeat ten times)
However, when I am trying to use import Wizard, only the first
25x1 string
9999x2 double.
was successfully loaded, the other 9 were simply ignored
How may I import all the data? (Does importdata has a maximum length or something?)
Thanks
It's nothing to do with maximum length, importdata is just not set up for the sort of data file you describe. From the help file:
For ASCII files and spreadsheets, importdata expects
to find numeric data in a rectangular form (that is, like a matrix).
Text headers can appear above or to the left of the numeric data,
as follows:
Column headers or file description text at the top of the file, above
the numeric data. Row headers to the left of the numeric data.
So what is happening is that the first section of your file, which does match the format importdata expects, is being read, and the rest ignored. Instead of importdata, you'll need to use textscan, in particular, this style:
C = textscan(fileID,formatSpec,N)
fileID is returned from fopen. formatspec tells textscan what to expect, and N how many times to repeat it. As long as fileID remains open, repeated calls to textscan continue to read the file from wherever the last read action stopped - rather than going back to the start of the file. So we can do this:
fileID = fopen('myfile.txt');
repeats = 10;
for n = 1:repeats
% read one string, 25 times
C{n,1} = textscan(fileID,'%s',25);
% read two floats, 9999 times
C{n,2} = textscan(fileID,'%f %f',9999);
end
You can then extract your numerical data out of the cell array (if you need it in one block you may want to try using 'CollectOutput',1 as an option).