How to delete every nth line in a txt file? - matlab

I have a txt file (an ANSYS 1st principal nodal stress list) with almost 16k lines in it. I want to delete specific lines, for example the 1st, 2nd, 3rd, 4th, 5th, 39th, 40th, 41st, 42nd, 43rd, etc. I don't need to search for anything; I already know which lines should be deleted. Can anybody help?

Maybe not the most efficient way, but this works:
data_file = 'data.txt';
lines_to_skip = [1:5, 39:43];
% Read the whole file into a cell array, one line per cell
fid = fopen(data_file);
ii = 0;
while ~feof(fid)
    this_line = fgetl(fid);
    if ~ischar(this_line), break; end % guard against a trailing empty read
    ii = ii + 1;
    file_content{ii} = this_line;
end
fclose(fid);
% Mark which lines to keep
lines = true(1, ii);
lines(lines_to_skip) = false;
% Rewrite the file with only the kept lines
fid = fopen(data_file, 'w');
fprintf(fid, '%s\r\n', file_content{lines});
fclose(fid);
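On newer Matlab releases there is a shorter way to express the same idea; a minimal sketch, assuming readlines (R2020b+) and writelines (R2022a+) are available:
lines_to_skip = [1:5, 39:43];
txt = readlines(data_file);    % string array, one element per line
txt(lines_to_skip) = [];       % delete the unwanted lines
writelines(txt, data_file);    % overwrite the original file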

If you are using Linux, you can use sed; for example, this deletes line 2 of the file in place:
sed -i '2d' data.txt

This is tagged as Matlab, but doing this inside Matlab is going to be painful, because Matlab doesn't offer a convenient way to remove bytes from the middle of a file; you'd have to write code that copies the text to a new file, skipping lines as appropriate.
If you're on a UNIX system it'll be much easier using sed. There's a great answer here explaining how to do that. The key command is:
# To delete lines 10 and 12:
sed -i -e '10d;12d' your-file.txt

Related

Search and replace variable

I have some 100+ conf files that I am working with. I need to find and replace various variables in all these files. For example, I'd like to find the line
Amplitude = 100; and replace it with: Amplitude = 200; in all files.
I've searched online and found a solution only for one file. I'm looking for a way to do this in Matlab. Any ideas?
If these files can be opened as normal text files, then I wouldn't use matlab. Notepad++ has a replace option for as many files as you want; just make sure you test it on a backup file first. So have it find "Amplitude = 100" and replace that with what you want.
To see how to do it, look here:
how-to-find-and-replace-lines-in-multiple-files
If you can't do that, put all the files in the same directory (you have to do this anyway). Then load the files in matlab from that directory and run a for loop. However, it might be a bit slow.
Basically, if you can do 1 file, you can do all of them with a for loop.
If you need help with that, I can show some code I used before.
Well, the Matlab solution would be to (recursively) open all files in the directory. Here I show a non-recursive example (it does not check subfolders), though it would be easy enough to modify it to search subfolders too if needed:
d = dir(yourPath);
for i = 1 : length(d)
    if ~d(i).isdir
        % d(i) is a file, not a directory
        replaceSingleFile(fullfile(d(i).folder, d(i).name));
    end
end
As you say, you already know how to do the replacement for a single file, but to make the answer complete, the solution could be along these lines (inside the function replaceSingleFile):
F = fopen(fileYouWantReplaced);
i = 1;
while ~feof(F)
    L = fgetl(F);
    if ~ischar(L), break; end % stop cleanly at end of file
    L = strrep(L, 'Amplitude = 100;', 'Amplitude = 200;');
    Buf{i} = L;
    i = i + 1;
end
fclose(F);
% Now just write all of Buf back to the same file.
F = fopen(fileYouWantReplaced, 'w'); % Discard contents.
for i = 1 : numel(Buf)
    fprintf(F, '%s\n', Buf{i});
end
fclose(F);
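To tie the two snippets together, the code above can be wrapped into the replaceSingleFile function that the directory loop calls; a minimal sketch (the function name comes from the loop above, the argument name is my own):
function replaceSingleFile(fname)
    F = fopen(fname);
    Buf = {};
    while ~feof(F)
        L = fgetl(F);
        if ~ischar(L), break; end % guard against a trailing empty read
        Buf{end+1} = strrep(L, 'Amplitude = 100;', 'Amplitude = 200;');
    end
    fclose(F);
    F = fopen(fname, 'w'); % reopen the same file, discarding its contents
    fprintf(F, '%s\n', Buf{:});
    fclose(F);
end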

Reading huge .csv files with matlab - file is not well organized

I have several .csv files that I read with matlab using textscan, because csvread and xlsread do not support files of this size (200 MB - 600 MB).
I use this line to read it:
C = textscan(fileID,'%s%d%s%f%f%d%d%d%d%d%d%d','delimiter',',');
The problem I have found is that sometimes the data is not in this format, and then textscan stops reading at that line without any error.
So what I have done is to read it this way:
C = textscan(fileID,'%s%d%s%f%f%s%s%s%s%s%s%s%s%s%s%s','delimiter',',');
This way I can see that in 2 rows out of 3 million there is a change in the format.
I want to read all the lines except the bad/different lines.
In addition, is it possible to read only the lines whose first string is 'PAA'?
I have tried to load it directly into matlab, but it is super slow and sometimes gets stuck, and for the really big files it reports a memory problem.
Any recommendations?
For large files which are still small enough to fit in your memory, parsing all lines at once is typically the best choice.
f = fopen('data.txt');
g = textscan(f,'%s','delimiter','\n');
fclose(f);
In the next step you have to identify the lines starting with PAA; use strncmp for that.
Now, having your data filtered, apply your textscan expression from above to each line. If it fails, try the other one.
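A rough sketch of those two steps, assuming the lines are in g as returned by the snippet above (the 12-field format and the 'PAA' prefix come from the question; checking for empty cells is just one way to detect a mismatched line):
allLines = g{1};                                   % one line per cell
paaLines = allLines(strncmp(allLines, 'PAA', 3));  % keep lines starting with PAA
for k = 1:numel(paaLines)
    C = textscan(paaLines{k}, '%s%d%s%f%f%d%d%d%d%d%d%d', 'delimiter', ',');
    if any(cellfun(@isempty, C))                   % format mismatch on this line
        C = textscan(paaLines{k}, '%s%d%s%f%f%s%s%s%s%s%s%s%s%s%s%s', 'delimiter', ',');
    end
    % ... collect C into your output here ...
end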
Matlab is slow with this kind of thing because it needs to load everything into memory. I would suggest using grep/bash/cmd tools to reduce your file to readable lines before processing them in Matlab. On Linux you can do:
awk '/^PAA/' yourfile.csv > yourNewFile.csv   # keeps only the lines that start with PAA (NOTE: case sensitive)
To find the lines that do not have the expected format, you can use:
awk -F ',' 'NF != 12 {print NR, $0}' yourfile.csv > yourNewFile.csv
This prints the line number and content of every line that does not have exactly 12 comma-separated fields, so you can inspect or discard the malformed ones.

Preventing fgets from deleting first line

I'm opening a file, reading the first line using fgets, using regexp to test what format the file is in, and if the file is in the desired format, I use fscanf to read the entire file.
fid = fopen('E:\Tick Data\Data Output\Differentformatfiles\AUU01.csv','rt');
% reads first line of file but seems to be deleting the line:
str = fgets(fid);
% test for pattern mm/dd/yyyy
if(regexp(str, '\d\d/\d\d/\d\d\d\d'))
c = fscanf(fid, '%d/%d/%d,%d:%d:%d,%f,%d,%*c');
Unfortunately, if the contents of my file look like:
20010701,08:29:30.000,95.00,29,E
20010702,08:29:30.000,95.00,68,E
20010703,08:29:30.000,95.00,5,E
20010704,08:29:30.000,95.00,40,E
20010705,08:29:30.000,95.00,72,E
str will equal 20010701,08:29:30.000,95.00,29,E, but c will only contain the last 4 lines:
20010702,08:29:30.000,95.00,68,E
20010703,08:29:30.000,95.00,5,E
20010704,08:29:30.000,95.00,40,E
20010705,08:29:30.000,95.00,72,E
Is there a way to prevent fgets from deleting the first line? Or another function I should use?
It isn't actually erasing it; it's just moving on to the next line. You could use a combination of ftell and fseek to go back to the beginning of that line, but since you've already got the line stored in str, I would add two lines:
if(regexp(str, '\d\d/\d\d/\d\d\d\d'))
    c1 = sscanf(str, '%d/%d/%d,%d:%d:%d,%f,%d,%*c'); % scan the stored first line
    c2 = fscanf(fid, '%d/%d/%d,%d:%d:%d,%f,%d,%*c'); % scan the rest of the file
    c = [c1; c2]; % concatenate the two column vectors
It certainly isn't the most elegant solution, but it's robust and easy to shoehorn into your existing code.
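Alternatively, if you would rather have fscanf see the first line again, you can rewind the file after the header check; a small sketch, assuming the same fid and format as above:
frewind(fid); % move the read position back to the start of the file
c = fscanf(fid, '%d/%d/%d,%d:%d:%d,%f,%d,%*c');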

Index exceeds matrix dimensions error when reading a csv file using matlab?

I have a few csv files from which I want to read specific lines and so collect specific information from them.
I found that I can read these files just fine if I manually remove one line, but I would like to skip this line in code, to avoid going through each of these files and removing it by hand.
Example:
My file looks like this
blabla
blabla
blabla
S>
blabla
blabla
nquan = 12
blabla
I am reading this file using the following code in matlab:
din = 'C:/example/';
CNVfiles = dir([din '*.cnv']);
fid = fopen(fullfile(din, CNVfiles(1).name), 'r'); % open the first matching file
I want to be able to get the number '12' from the line '# nquan = 12' (which is the number of columns (Ncol) that I will need later):
p = '       ';
while ~isequal(p(1:7),'* nquan')
    p = fgets(fid);
end
Ncol = str2double(p(11:end));
fclose(fid);
However, it gives me an error stating 'Index exceeds matrix dimensions' at 'end'... When I look at what 'p' is, it tells me '* S>', and hence I am guessing that I have an issue when reading the '* S>' line in the files.
When I manually remove the line '* S>', it all works and I get my Ncol = 12. However, I would like to avoid doing this manually, since I have a bunch of cnv files like that.
I was thinking of skipping that line, but I do not know how to do that... any idea what is wrong here, and what I can do to make it work?
Many thanks,
Sophie
You are getting this error because when your loop reaches the line in your file which contains '* S>', the value of p is equal to '* S>'. As you can see, p is an array of length 4. When you then try p(1:7), Matlab complains, since you are accessing elements that aren't present.
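A sketch of a loop that survives short lines, based on the explanation above: strncmp returns false (rather than erroring) when a line is shorter than seven characters, so lines like '* S>' are simply skipped. The '* nquan' prefix and the p(11:end) extraction are taken from your code.
p = fgets(fid);
while ischar(p) && ~strncmp(p, '* nquan', 7)
    p = fgets(fid);
end
Ncol = str2double(p(11:end));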

how to use textscan to read all the lines in a file

I am trying to read all the lines in a .m file with the following
file_content = textscan(fid, '%s', 'delimiter', '\n', 'whitespace', '')
but this just returns
file_content =
{0x1 cell}
when actually my file has 224 lines. So if I use
file_content = textscan(fid,'%s',224,'delimiter','\n')
I get all the lines
file_content =
{224x1 cell}
What would be a more proper way to read all the data (mostly strings) in a .m file?
Thanks
Since you do not list your needs (are you reading a huge file? many small files? is speed an issue? what do you really want to do?), I'm giving you the simplest possible answer:
You do this:
f = fopen('data.txt');
g = textscan(f,'%s','delimiter','\n');
fclose(f);
Remember to close the file after reading, because otherwise the file stays open and later reads through the same fid will start at end-of-file.
You can get the first line as g{1}{1}, the second as g{1}{2} and so on.
Here is the matlab documentation for textscan which gives a lot more details.
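A guess worth checking, given the note about closing files above: if textscan returns {0x1 cell}, the fid may already have been read to end-of-file earlier in your session. A minimal sketch of the check, using your original call:
frewind(fid); % move the read position back to the start of the file
file_content = textscan(fid, '%s', 'delimiter', '\n', 'whitespace', '');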
Here's a method that worked for me:
fid = fopen('filename','r'); %opening in read mode (default)
inter = textscan(fid,'%[^\n]');
lines = inter{1,1};
fclose(fid);
This command reads the whole file 'line by line'. For example, I had a text file with 1332 lines; this code creates a variable inter, which is a {1,1 cell}, and lines, which came out as a [1x102944 char] for my file.
I'm not sure why/how this works (it'd be great if someone else reading this knows!), but it works for my program.
That call to textscan means "read everything up to a \n".
In general your file may have mixed line endings, or none at all, with records separated by ':' or something else.