MATLAB: How do you insert a line of text at the beginning of a file? - matlab

I have a file full of ascii data. How would I append a string to the first line of the file? I cannot find that sort of functionality using fopen (it seems to only append at the end and nothing else.)

The following is a pure MATLAB solution:
% write first line
dlmwrite('output.txt', 'string 1st line', 'delimiter', '')
% append rest of file
dlmwrite('output.txt', fileread('input.txt'), '-append', 'delimiter', '')
% overwrite on original file
movefile('output.txt', 'input.txt')

Option 1:
I would suggest calling some system commands from within MATLAB. One possibility on Windows is to write your new line of text to its own file and then use the DOS for command to concatenate the two files. Here's what the call would look like in MATLAB:
!for %f in ("file1.txt", "file2.txt") do type "%f" >> "new.txt"
I used the ! (bang) operator to invoke the command from within MATLAB. The command above sequentially pipes the contents of "file1.txt" and "file2.txt" to the file "new.txt". Keep in mind that you will probably have to end the first file with a new line character to get things to append correctly.
Another alternative to the above command would be:
!for %f in ("file2.txt") do type "%f" >> "file1.txt"
which appends the contents of "file2.txt" to "file1.txt", resulting in "file1.txt" containing the concatenated text instead of creating a new file.
If you have your file names in strings, you can create the command as a string and use the SYSTEM command instead of the ! operator. For example:
a = 'file1.txt';
b = 'file2.txt';
system(['for %f in ("' b '") do type "%f" >> "' a '"']);
Option 2:
One MATLAB only solution, in addition to Amro's, is:
dlmwrite('file.txt',['first line' 13 10 fileread('file.txt')],'delimiter','');
This uses FILEREAD to read the text file contents into a string, concatenates the new line you want to add (along with the ASCII codes for a carriage return and a line feed/new line), then overwrites the original file using DLMWRITE.
I get the feeling Option #1 might perform faster than this pure MATLAB solution for huge text files, but I don't know that for sure. ;)

How about using the frewind(fid) function to take the pointer to the beginning of the file?
I had a similar requirement and tried frewind() followed by the necessary fprintf() statement.
But, warning: It will overwrite on whichever is the 1st line. Since in my case, I was the one writing the file, I put a dummy data at the starting of the file and then at the end, let that be overwritten after the operations specified above.
BTW, even I am facing one problem with this solution, that, depending on the length(/size) of the dummy data and actual data, the program either leaves part of the dummy data on the same line, or bring my new data to the 2nd line..
Any tip in this regards is highly appreciated.

Related

no carriage return while using dlmwrite( ) to write array values to file in Matlab

I am trying to write the array values to CSV file in MATLAB using the following code
m=[3 12 15 ; 4 23 565];
dlmwrite('C:\Users\amar-admin\Desktop\abc.txt', m)
type C:\Users\amar-admin\Desktop\abc.txt
the output printed in the console is
3,12,15
4,23,565
but the output in File is
3,12,154,23,565
You may want to set the 'newline' option to 'pc':
dlmwrite('C:\Users\amar-admin\Desktop\abc.txt', m, 'newline', 'pc');
This will ensure that the file is created with a carriage return (\r) and a line feed (\n) at the end of each line, instead of potentially just a line feed, which could affect how it is displayed in certain text viewers. See this post for more information about the differences between \n and \r.
the issue was solved using .rtf extension
dlmwrite('C:\Users\amar-admin\Desktop\abc.rtf', m)
But, I still wonder if same is possible to .txt file

reading data from csv files with `textscan` in MATLAB

[Edited:] I have a file data2007a.csv and I copied and pasted (using TextEdit in MacBook) the first consecutive few lines to a new file datatest1.csv for testing:
Nomenclature,ReporterISO3,ProductCode,ReporterName,PartnerISO3,PartnerName,Year,TradeFlowName,TradeFlowCode,TradeValue in 1000 USD
S3,ABW,0,Aruba,ANT,Netherlands Antilles,2007,Export,6,448.91
S3,ABW,0,Aruba,ATG,Antigua and Barbuda,2007,Export,6,0.312
S3,ABW,0,Aruba,CHN,China,2007,Export,6,24.715
S3,ABW,0,Aruba,COL,Colombia,2007,Export,6,95.885
S3,ABW,0,Aruba,DOM,Dominican Republic,2007,Export,6,11.432
I wanted to use textscan to read it into MATLAB with only columns 2,3,5 (starting from the second row) and I wrote the following code
clc,clear all
fid = fopen('datatest1.csv');
data = textscan(fid,'%*s %s %d %*s %s %*[^\n]',...
'Delimiter',',',...
'HeaderLines',1);
fclose(fid);
But I ended up with only the second row of columns 2,3 and 5:
I then keep the first row in data2007a.csv and selected several others to saved as datatest2.csv:
Nomenclature,ReporterISO3,ProductCode,ReporterName,PartnerISO3,PartnerName,Year,TradeFlowName,TradeFlowCode,TradeValue in 1000 USD
S3,ABW,1,Aruba,USA,United States,2007,Export,6,1.392
S3,ABW,1,Aruba,VEN,Venezuela,2007,Export,6,5633.157
S3,ABW,2,Aruba,ANT,Netherlands Antilles,2007,Export,6,310.734
S3,ABW,2,Aruba,USA,United States,2007,Export,6,342.42
S3,ABW,2,Aruba,VEN,Venezuela,2007,Export,6,63.722
S3,AGO,0,Angola,DEU,Germany,2007,Export,6,105.334
S3,AGO,0,Angola,ESP,Spain,2007,Export,6,8533.125
And I wrote:
clc,clear all
fid = fopen('datatest2.csv');
data = textscan(fid,'%*s %s %d %*s %s %*[^\n]',...
'Delimiter',',',...
'HeaderLines',1);
fclose(fid);
data{1}
It gives exactly what I wanted:
When I use the same code for my original data file data2007a.csv, it goes as in the first case.
What is going wrong and how can I fix it?
[Added:] If one replicates my experiments1, one can find that both cases work and the problem does not exist! I really don't know what is going on.
1 For "replicate" I mean copy-and-paste the data given above and save it as two new files, say, datatest4a.csv and datatest4b.csv. I used visdiff('datatest1.csv', 'datatest4a.csv') to compare two files and it returned:
Given how you fixed it, I think this is an end-of-line character issue. This sometimes comes up when moving text files between Windows and Unix based systems, as they use different conventions.
When you add %*[^\n] to the end of a textscan format, as you have here. it means to skip everything to the end of line. But if it expects a specific end of line character, and can't find one, it will skip everything to the end of the file. This would explain why you get one row correctly read and then nothing else.
If you don't specify what the end of line character is, Matlab appears to default to... something... in this not very clear specification in the help:
The default end-of-line sequence is \n, \r, or \r\n, depending on the contents of your file.
One way to try and cure this without having to create a new file would be to add this 'EndOfLine', '\r\n' to your textscan call:
If you specify '\r\n', then textscan treats any of \r, \n, and the
combination of the two (\r\n) as end-of-line characters.
This will hopefully handle most standard(ish) EOL conventions. It is likely that copy-pasting and saving with a different bit of software than was originally used to create the file changed the end of line characters such that Matlab was able to recognise them.

MATLAB simultaneous read and write the same file

I want to read and write the same file simultaneously. Here is a simplified code:
clc;
close all;
clearvars;
fd = fopen ('abcd.txt','r+'); %opening file abcd.txt given below
while ~feof(fd)
nline = fgetl(fd);
find1 = strfind(nline,'abcd'); %searching for matching string
chk1 = isempty(find1);
if(chk1==0)
write = '0000'; %in this case, matching pattern found
% then replace that line by 0000
fprintf(fd,'%s \n',write);
else
continue;
end
end
File abcd.txt
abcde
abcd23
abcd2
abcd355
abcd65
I want to find text abcd in string of each line and replace the entire line by 0000. However, there is no change in the text file abcd.txt. The program doesn't write anything in the text file.
Someone can say read each line and write a separate text file line by line. However, there is a problem in this approach. In the original problem, instead of finding matching text `abcd, there is array of string with thousands of elements. In that case, I want to read the file, parse the file for find matching string, replace string as per condition, go to next iteration to search next matching string and so on. So in this approach, line by line reading original file and simultaneously writing another file does not work.
Another approach can be reading the entire file in memory, replacing the string and iterate. But I am not very sure how will it work. Another issue is memory usage.
Any comments?
What you are attempting to do is not possible in a efficient way. Replacing abcde with 0000, which should be done for the first line, would require all remaining text forward because you remove one char.
Instead solve this reading one file and write to a second, then remove the original file and rename the new one.

Reading huge .csv files with matlab - file is not well orgenized

I have several .csv files that I read with matlab using textscan, beause csvread and xlsread do not support this size of a file 200Mb-600Mb.
I use this line to read it:
C = textscan(fileID,'%s%d%s%f%f%d%d%d%d%d%d%d','delimiter',',');
the problem that I have found that sometimes the data is not in this format and then the textscan stop to read in that line without any error.
So what I have done is to read it in this way
C = textscan(fileID,'%s%d%s%f%f%s%s%s%s%s%s%s%s%s%s%s','delimiter',',');
In this way I see the in 2 rows out of 3 milion there is a change in the format.
I want to read all the lines except the bad/different lines.
In addition if its possible to read only the lines that the first string is 'PAA'. is it possible ?
I have tried to load it directly to matlab but its super slow and sometime it get stuck. Or for the realy big one it will announce memory problem.
Any recomendations?
For large files which are still small enough to fit your memory, parsing all lines at once is typically the best choice.
f = fopen('data.txt');
g = textscan(f,'%s','delimiter','\n');
fclose(f);
In a next step you have to identify the lines starting with PAA use strncmp.
Now having your data filtered, apply your textscan expression above to each line. If it fails, try the other.
Matlab is slow with this kind of thing because it needs to load everything into memory. I would suggest using grep/bash/cmd lines to reduce your file to readable lines before processing them in Matlab, in Linux you can:
awk '{if (p ~ /^PAA/ && $1 ~ /^PAA/) print; p=$1}' yourfile.csv > yourNewFile.csv %// This will give you a new file with all the lines that starts with PAA (NOTE: Case sensitive)
To Find lines that does not have the same format, you can use:
awk -F ',' 'NF = 12 {print NR, $0} ' yourfile.csv > yourNewFile.csv
This line looks at 12 delimiters for each line, and discard any line that has more than 12 ",".

Removing quotes that seem to be nonexistent in MatLab

I have an application that translates the .csv files I currently have into the correct format I need. But, the files I that I do have seem to have '"' double quotes around them, as seen in this image, which will not work with the program. As a result, I'm using this command to remove them:
for m = 1:currentsize(1)
for n = 1:currentsize(2)
replacement{m,n} = strrep(current{m,n}, '"', '');
end
end
I'm not entirely sure that this works though, as it spits this back at me as it runs:
Warning: Inputs must be character arrays or cell arrays of strings.
When I open the file in matlab, it seems to only have the single quotes around it, which is normal for any other file. However, when I open it in notepad++, it seems to have '"' double quotes around everything. Is there a way to remove these double quotes in any other way? My code doesn't seem to do anything as seen here:
After using xlswrite to write the replacement cell-array to a .csv file, one appears corrupted. Any idea why?
So, my questions are:
Is there any way to remove the quotes in a more efficient manner or without rewriting to a csv?
and
What exactly is causing the corruption in the xlswrite function? The variable replacement seems perfectly normal.
Thanks in advance!
Regarding the "corrupted" file. That's not a corrupted file, that's an xls file (not xlsx). You could verify this opening the text file in a hex editor to compare the signature. This happens when you chose no file extension or ask excel to write a file which can't be encoded into csv. I assume it's some character which causes the problems, try to isolate the critical line writing only parts of the cell.
Regarding your warning, not having the actual input I could only guess what's wrong. Again, I can only give some advices to debug the problem yourself. Run this code:
lastwarn('')
for m = 1:currentsize(1)
for n = 1:currentsize(2)
replacement{m,n} = strrep(current{m,n}, '"', '');
if ~isempty(lastwarn)
keyboard
lastwarn('')
end
end
end
This will launch the debugger when a warning is raised, allowing you to check the content of current{m,n}
It is mentionned in the documentation of strrep that :
The strrep function does not find empty strings for replacement. That is, when origStr and oldSubstr both contain the empty string (''), strrep does not replace '' with the contents of newSubstr.
You can use this user function substr like this :
for m = 1:currentsize(1)
for n = 1:currentsize(2)
replacement{m,n} = substr(current{m,n}, 2, -1);
end
end
Please use xlswrite function like this :
[status,message] = xlswrite(___)
and check the status if it is zero, that means the function did not succeed and an error message is generated in message as string.