Index exceeds matrix dimensions error when reading a csv file using MATLAB? - matlab

I have a few csv files from which I want to read specific lines in order to collect specific information.
I found that I can read these files without problems if I manually remove one line, but I would like to skip that line in code so that I do not have to go through every file and remove it by hand.
Example:
My file looks like this
blabla
blabla
blabla
S>
blabla
blabla
nquan = 12
blabla
I am reading this file using the following code in matlab:
din = 'C:/example/';
CNVfiles = dir ([din '*.cnv']);
fid = fopen([din CNVfiles(1).name], 'r'); % dir returns a struct array; open the first match by name
I want to get the number '12' from the line '# nquan = 12' (this is the number of columns, Ncol, that I will need later):
p = ' ';
while ~isequal(p(1:7),'* nquan')
p = fgets(fid);
end
Ncol = str2double(p(11:end));
fclose(fid);
However, this gives me the error 'Index exceeds matrix dimensions' at 'end'. When I look at 'p', it contains '* S>', so I am guessing that the problem occurs when reading the '* S>' line in the files.
When I manually remove the line '* S>', everything works and I get Ncol = 12. However, I would like to avoid doing this manually since I have many cnv files like this.
I was thinking of skipping that line, but I do not know how to do that. Any ideas what is wrong here, and what I can do to make it work?
Many thanks,
Sophie

You are getting this error because when your loop reaches the line in your file that contains "* S>", the value of p is '* S>'. As you can see, p is an array of length 4. When you then evaluate p(1:7), MATLAB raises the error because you are indexing elements that do not exist.
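One way to avoid the out-of-range index is to compare with strncmp, which returns false for lines shorter than the requested length instead of raising an error. The following is only a sketch: it assumes the header line has the form '* nquan = 12' as in the question's loop, and the temporary file and its contents are made up so that the example is self-contained.

```matlab
% Hypothetical sample file, written only so that the sketch can run on its own.
fname = [tempname '.cnv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '* blabla', '* S>', '* blabla', '* nquan = 12', '* blabla');
fclose(fid);

fid = fopen(fname, 'r');
Ncol = NaN;                        % stays NaN if no matching line is found
p = fgetl(fid);                    % fgetl returns -1 (numeric) at end of file
while ischar(p)
    % strncmp is false for lines shorter than 7 characters (such as '* S>'),
    % so short lines are skipped without an indexing error.
    if strncmp(p, '* nquan', 7)
        idx = strfind(p, '=');     % take everything after the first '='
        Ncol = str2double(p(idx(1)+1:end));
        break
    end
    p = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Checking ischar(p) also stops the loop cleanly at the end of a file that has no nquan line, which the original while condition would not do.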

Related

How to read a number from a text file in MATLAB

I have 1000 text files and want to read a number from each file.
The text files have the following format:
af;laskjdf;lkasjda123241234123
$sakdfja;lskfj12352135qadsfasfa
falskdfjqwr1351
##alskgja;lksjgklajs23523,
asdfa#####1217653asl123654fjaksj
asdkjf23s#q23asjfklj
asko3
I need to read the number ("1217653") that follows "#####" in each txt file.
The number immediately follows "#####" in every text file.
"#####" and the number following it appear only once in each file.
clc
clear
MyFolderInfo = dir('yourpath/folder');
fidin = fopen(file_name,'r','n','utf-8');
while ~feof(fidin)
tline=fgetl(fidin);
disp(tline)
end
fclose(fidin);
It is not finished yet. I am stuck because it cannot read past the blank line.
This is another approach, using the function regexp. It gives a more flexible way of reading files and does not require reading the full file in one go. The main difference from the example already given is that I read the file line by line, but since the question uses this approach I believe it is worth answering. This will return all occurrences of "#####NUMBER".
function test()
h = fopen('myfile.txt');
res = {}; % One cell per line, each holding that line's matches
k = 1;
str = fgetl(h);
while ischar(str) % fgetl returns '' for an empty line and -1 (numeric) at EOF
res{k} = regexp(str,'#####\d+','match');
k = k+1;
str = fgetl(h);
end
fclose(h);
for k=1:length(res)
disp(res{k});
end
end
EDIT
Using the expression '#####(\d+)' and the argument 'tokens' instead of 'match' will return only the digits after the "#####", as a string. The intent of this post was also, apart from showing another way to read the file, to show how to use regexp with a simple example. Both alternatives can be used with suitable conversion.
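As a small illustration of the difference between 'match' and 'tokens', here is a sketch applied to one of the sample lines from the question. The variable names are chosen for this example only.

```matlab
txt = 'asdfa#####1217653asl123654fjaksj';             % sample line from the question

% 'match' returns the whole matched text, including the hashes.
m = regexp(txt, '#####\d+', 'match', 'once');

% 'tokens' returns only the parenthesised group, wrapped in a cell array.
t = regexp(txt, '#####(\d+)', 'tokens', 'once');

num = str2double(t{1});                               % convert the digits to a number
```

With 'match' the hashes would have to be stripped before conversion, so 'tokens' is the shorter route when only the number is needed.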
Assuming the following:
All files are ASCII files.
The number you are looking to extract is directly following #####.
The number you are looking for is a natural number.
##### followed by a number only occurs once per file.
You can use this code snippet inside a for loop to extract each number:
regx='#####(\d+)';
str=fileread(fileName);
num=str2double(regexp(str,regx,'tokens','once'));
Example of for loop
This code will iterate through ALL files in yourpath/folder and save the numbers into num.
regx='#####(\d+)'; % Create regex
folderDir='yourpath/folder';
files=dir(folderDir); % List the contents of folderDir
files=files(~[files.isdir]); % Drop directories, including . and .. (they are not guaranteed to come first)
num=zeros(1,length(files)); % Preallocate
for i=1:length(files) % Iterate through files
str=fileread(fullfile(folderDir,files(i).name)); % Read the whole file as text
num(i)=str2double(regexp(str,regx,'tokens','once')); % Extract the number using regex
end
If you want to extract more "advanced" numbers, e.g. signed integers or real numbers, or handle several occurrences of #####NUMBER in a file, you will need to update your question with a better representation of your text files.

MATLAB: Printing data on a specific line

I have the function below:
function [] = Write(iteration)
status=close('all');
nomrep=num2str(iteration);
fid=fopen('ID.dat','w');
frewind(fid);
for l=1:iteration
line=fgetl(fid);
end
fprintf(fid,[nomrep,' \n']);
status=fclose(fid);
end
I expect Write(15) to create ID.dat and print 2 and 15 on consecutive lines, starting at line 15.
But it always prints those values at the beginning of the file.
I also tried fgetl(fid) alone, and replaced the for loop with a while loop, but it still did not work.
Is it because I should fill the preceding lines with some dummy space? In addition, I executed
for i=1:5
Write(i);
end
which should print 1 to 5 on separate lines, but even this does not work.
This line is the problem:
fid=fopen('ID.dat','w');
Every time you open the file, you overwrite the previous contents (that is what the 'w' argument does). Change 'w' to 'a' (for append), and your file will retain its contents from one write to the next.
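A minimal sketch of the append behaviour, assuming a temporary file name instead of ID.dat so that nothing existing is touched. Each pass opens the file with 'a', writes one number, and closes it again.

```matlab
fname = [tempname '.dat'];          % hypothetical temporary file for the demo
for k = 1:5
    fid = fopen(fname, 'a');        % 'a' keeps existing contents and writes at the end
    fprintf(fid, '%d\n', k);
    fclose(fid);
end
contents = fileread(fname);         % the file now holds 1 to 5, one number per line
delete(fname);
```

Note that in append mode every write goes to the end of the file, so this does not place text at an arbitrary line number. It only ensures that earlier writes survive.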

Reading huge .csv files with MATLAB - file is not well organized

I have several .csv files that I read in MATLAB using textscan, because csvread and xlsread do not support files of this size (200 MB to 600 MB).
I use this line to read it:
C = textscan(fileID,'%s%d%s%f%f%d%d%d%d%d%d%d','delimiter',',');
The problem I have found is that sometimes the data is not in this format, and then textscan stops reading at that line without any error.
So what I have done is to read it this way:
C = textscan(fileID,'%s%d%s%f%f%s%s%s%s%s%s%s%s%s%s%s','delimiter',',');
This way I can see that in 2 rows out of 3 million there is a change in the format.
I want to read all the lines except the bad/different lines.
In addition, is it possible to read only the lines whose first string is 'PAA'?
I have tried to load the file directly into MATLAB, but it is very slow and sometimes gets stuck. For the really big files it reports a memory problem.
Any recommendations?
For large files which are still small enough to fit your memory, parsing all lines at once is typically the best choice.
f = fopen('data.txt');
g = textscan(f,'%s','delimiter','\n');
fclose(f);
In the next step, identify the lines starting with PAA using strncmp.
With the data filtered, apply your textscan expression above to each line. If it fails, try the other one.
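The steps above can be sketched as follows. This is an illustration under stated assumptions, not a tuned solution: the sample rows are invented, and a well-formed row is assumed to be one with exactly 11 commas (12 fields), matching the 12-column format in the question.

```matlab
% Hypothetical sample data, written to a temporary file for the demo.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'PAA,1,x,1.5,2.5,1,2,3,4,5,6,7', ...
                     'PBB,2,y,0.5,0.1,1,2,3,4,5,6,7', ...
                     'PAA,3,broken line', ...
                     'PAA,4,z,3.5,4.5,7,6,5,4,3,2,1');
fclose(fid);

% Read every line as one string, as in the answer.
f = fopen(fname);
g = textscan(f, '%s', 'delimiter', '\n');
fclose(f);
delete(fname);
rows = g{1};

% Keep only the lines that start with PAA.
rows = rows(strncmp(rows, 'PAA', 3));

% Assumption: a good line has exactly 11 commas, i.e. 12 fields.
isGood = cellfun(@(s) numel(strfind(s, ',')) == 11, rows);
good = rows(isGood);

% Parse the surviving lines in one call by joining them with newlines.
joined = strjoin(good.', sprintf('\n'));
C = textscan(joined, '%s%d%s%f%f%d%d%d%d%d%d%d', 'delimiter', ',');
```

Counting commas is a cheap filter. It would not catch a row that has the right number of fields but the wrong types, which would still need the fallback format from the question.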
MATLAB is slow with this kind of thing because it needs to load everything into memory. I would suggest using command-line tools (grep, awk, bash) to reduce your file to the lines you need before processing them in MATLAB. On Linux, the following writes a new file containing only the lines that start with PAA (note: case sensitive):
awk '/^PAA/' yourfile.csv > yourNewFile.csv
To find the lines that do not have the expected format, you can use:
awk -F ',' 'NF != 12 {print NR, $0}' yourfile.csv > yourNewFile.csv
This splits each line on commas and prints the line number and content of every line that does not have exactly 12 fields, which is the number of columns in the textscan format above.

Create more than one text file using MATLAB's fopen in a for-loop

I'm quite new to MATLAB and programming in general and would love to get some help with the following. I've looked on this website, but couldn't find an answer.
I am trying to use a for-loop and fprintf to produce a set of separate text files whose file names contain the index of my for-loop. See for example this piece of code to get the idea of what I'd like to do:
for z=1:20
for x=1:z;
b=[x exp(x)];
fid = fopen('table z.txt','a');
fprintf(fid,'%6.2f, %6.2f\n',b);
fclose(fid);
end
end
What I'm looking for is a script that (in this case) gives me 20 separate .txt files with names 'table i.txt' (where i runs from 1 through 20), where
table 1.txt only contains [1, exp(1)],
table 2.txt contains [1, exp(1)] \newline [2, exp(2)]
and so on.
If I run the script above, I get only one text file (named 'table z.txt') with all the data appended underneath. So the file name passed to fopen doesn't 'feel' the z values, but interprets z as a letter (which, given the quotation marks, doesn't really surprise me).
I think there must be an elegant way of doing this, but I haven't been able to find it. I hope someone can help.
Best,
L
Use num2str and string concatenation [ ... ]:
fid = fopen( ['table ' num2str(z) '.txt'],'a');
Opening your file in the innermost loop is inefficient; you should create the file as soon as you know z (see the example below). To format a string the same way as fprintf does, you can use sprintf.
for z=1:20
fname = sprintf('table %d.txt',z);
fid = fopen(fname,'w');
for x=1:z
fprintf(fid,'%6.2f, %6.2f\n', x, exp(x));
end
fclose(fid);
end

Preventing fgets from deleting first line

I'm opening a file, reading the first line using fgets, using regexp to test what format the file is in, and if the file is in the desired format, I use fscanf to read the entire file.
fid = fopen('E:\Tick Data\Data Output\Differentformatfiles\AUU01.csv','rt');
% reads first line of file but seems to be deleting the line:
str = fgets(fid);
% test for pattern mm/dd/yyyy
if(regexp(str, '\d\d/\d\d/\d\d\d\d'))
c = fscanf(fid, '%d/%d/%d,%d:%d:%d,%f,%d,%*c');
Unfortunately, if the contents of my file look like:
20010701,08:29:30.000,95.00,29,E
20010702,08:29:30.000,95.00,68,E
20010703,08:29:30.000,95.00,5,E
20010704,08:29:30.000,95.00,40,E
20010705,08:29:30.000,95.00,72,E
str will equal 20010701,08:29:30.000,95.00,29,E, but c will only equal the last 4 lines:
20010702,08:29:30.000,95.00,68,E
20010703,08:29:30.000,95.00,5,E
20010704,08:29:30.000,95.00,40,E
20010705,08:29:30.000,95.00,72,E
Is there a way to prevent fgets from deleting the first line? Or another function I should use?
It isn't actually erasing it, it's just moving on to the next line. You could use a combination of ftell and fseek to go back to the beginning of that line, but since you've already got the line stored in str, I would add two lines:
if(regexp(str, '\d\d/\d\d/\d\d\d\d'))
c1 = sscanf(str, '%d/%d/%d,%d:%d:%d,%f,%d,%*c'); % scan the string already read
c2 = fscanf(fid, '%d/%d/%d,%d:%d:%d,%f,%d,%*c'); % scan the rest of the file
c = [c1;c2]; % concatenate the two numeric column vectors
end
It certainly isn't the most elegant solution, but it's robust and easy to shoehorn into your existing code.
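For comparison, the rewind alternative mentioned above could look like the sketch below. It records the position with ftell before reading the first line and jumps back with fseek. The temporary file and its two rows are invented for the example and use the mm/dd/yyyy layout that the question's pattern tests for.

```matlab
% Hypothetical sample file in mm/dd/yyyy format, for illustration only.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '07/01/2001,08:29:30,95.00,29,E', ...
                     '07/02/2001,08:29:30,95.00,68,E');
fclose(fid);

fid = fopen(fname, 'rt');
pos = ftell(fid);                   % remember where the first line starts
str = fgets(fid);                   % peek at the first line to detect the format
c = [];
if ~isempty(regexp(str, '\d\d/\d\d/\d\d\d\d', 'once'))
    fseek(fid, pos, 'bof');         % go back so fscanf sees the first line too
    c = fscanf(fid, '%d/%d/%d,%d:%d:%d,%f,%d,%*c');
end
fclose(fid);
delete(fname);
```

Because the peek happens at the very start of the file, frewind(fid) would work equally well here. The ftell/fseek pair is the more general form when the line of interest is not the first one.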