I have this code and want to write an array to a tab-delimited text file:
fid = fopen('oo.txt', 'wt+');
for x = 1 :length(s)
fprintf(fid, '%s\t\n', s(x)(1)) ;
end;
fclose(fid);
but I receive this error:
Error: ()-indexing must appear last in an index expression.
How should I write s(x)(1)? s is an array:
s <2196017x1 cell>
When I use the following code I get no error, but it writes individual characters rather than words:
fprintf(fid, '%s\t\n', ( s{x}{1})) ;
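In MATLAB, you cannot chain () indexing onto the result of another indexing expression or function call, as in s(x)(1), without first assigning the intermediate result to a temporary variable (Octave does allow this). The restriction stems from ambiguities that chained indexing would introduce. Because s(x) is a 1x1 cell here, the temporary also needs {} to get at the text inside it:
tmp = s(x);
fprintf(fid, '%s\t\n', tmp{1});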
There are some ways around this, but they aren't pretty.
It is unclear exactly what your data structure is, but it looks like s is a cell array, so you should really be using {} indexing to access its contents:
fprintf(fid, '%s\t\n', s{x});
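A self-contained sketch of the {} approach, assuming a small hypothetical cell array in place of the 2196017x1 s and a temporary file from tempname instead of oo.txt. It writes all elements as one tab-separated row, with no tab after the last field:

```matlab
% Small hypothetical stand-in for the question's 2196017x1 cell array s
s = {'To'; 'be'; 'or'; 'not'};

fname = [tempname '.txt'];   % temporary file instead of oo.txt
fid = fopen(fname, 'w');
for x = 1:numel(s)
    % s{x} returns the char array stored in cell x; s(x) would return a 1x1 cell
    fprintf(fid, '%s', s{x});
    if x < numel(s)
        fprintf(fid, '\t');  % tab between fields, none after the last one
    end
end
fprintf(fid, '\n');
fclose(fid);

txt = fileread(fname);       % read the result back to check it
delete(fname);
disp(txt)
```

For 2 million elements the per-element fprintf calls are slow; the single fprintf(fid, '%s\t', s{:}) form used in the update avoids the loop at the cost of a trailing tab.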
Update
If you're trying to read individual words in from your input file and then write those out to a tab-delimited file, I'd probably do something like the following:
fid = fopen('input.txt', 'r');
contents = fread(fid, '*char')';
fclose(fid);
% Break a string into words and yield a cell array of strings
% (strtrim drops leading/trailing whitespace, which would otherwise produce empty words)
words = regexp(strtrim(contents), '\s+', 'split');
% Write these out to a file separated by tabs
fout = fopen('output.tsv', 'w');
fprintf(fout, '%s\t', words{:});
fclose(fout);
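The fprintf(fout, '%s\t', words{:}) call leaves a tab after the last word. A sketch of one way to avoid that with strjoin (available since R2013a), swapping in fileread for the fopen/fread/fclose sequence and using a temporary file with made-up contents in place of input.txt:

```matlab
% Hypothetical input text standing in for input.txt
inName = [tempname '.txt'];
fid = fopen(inName, 'w');
fprintf(fid, 'To be, or not\nto be\n');
fclose(fid);

contents = fileread(inName);
% strtrim drops the final newline so the split does not yield an empty last word
words = regexp(strtrim(contents), '\s+', 'split');

outName = [tempname '.tsv'];
fout = fopen(outName, 'w');
% strjoin puts a tab only between words; sprintf('\t') builds a literal tab character
fprintf(fout, '%s\n', strjoin(words, sprintf('\t')));
fclose(fout);

result = fileread(outName);
delete(inName);
delete(outName);
```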
Related
I am trying to concatenate three lines from a Shakespeare.txt file (I want to keep the lines as they are, as 3 rows). The file contains:
To be,
or not to be:
that is the question.
My code right now is
fid = fopen('Shakespeare.txt')
while ~feof(fid)
a = fgets(fid);
b = fgets(fid);
c = fgets(fid);
end
fprintf('%s', strcat(a, b, c))
I'm supposed to use strcat, and again, I want the lines concatenated but kept as three rows.
One way to keep the rows separate is to store the lines of the text file in a string array; here a 1-by-3 string array is used. It is also a good idea to use fgetl(), which reads the text file one line at a time without the trailing newline. Converting each output of fgetl() with string() ensures the lines are stored as strings rather than as character (char) arrays. Finally, the \n in the fprintf format inserts a line break after each string in String_Array.
fid = fopen('Shakespeare.txt');
while ~feof(fid)
String_Array(1) = string(fgetl(fid));
String_Array(2) = string(fgetl(fid));
String_Array(3) = string(fgetl(fid));
end
fclose(fid);
fprintf('%s\n', String_Array);
Ran using MATLAB R2019b
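Since the question requires strcat, it may help to see why the original code loses the rows: for character-array inputs, strcat removes trailing whitespace, including the newline that fgets keeps, whereas for cell-array inputs it does not. A sketch under that reading, using a temporary file holding the three lines in place of Shakespeare.txt:

```matlab
% Temporary file holding the three lines, standing in for Shakespeare.txt
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'To be,', 'or not to be:', 'that is the question.');
fclose(fid);

fid = fopen(fname, 'r');
a = fgets(fid);   % fgets keeps each line's newline character
b = fgets(fid);
c = fgets(fid);
fclose(fid);
delete(fname);

% With char inputs strcat trims the newlines and the rows collapse into one
flat = strcat(a, b, c);
% Wrapping the inputs in cells stops strcat from trimming, so the rows survive
joined = strcat({a}, {b}, {c});
fprintf('%s', joined{1});
```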
Loading a well-formatted, delimited text file in MATLAB is relatively simple, but I am struggling with a text file that I have to read in. Sadly, I cannot change the structure of the source file, so I have to deal with what I have.
The basic file structure is:
123 180        (two integers, whitespace-delimited)
1.5674e-8      (floating-point numbers in column 1, column 2 empty)
...
100 4501       (another two integers)
5.3456e-4      (followed by even more floating-point numbers)
...
45 String      (an integer in column 1, a string in column 2)
...
A simple call such as
[data1,data2]=textread('filename.txt','%f %s', ...
'emptyvalue', NaN)
does not work.
How can I properly filter the input data? All examples I found online and in the Matlab help so far deal with well structured data, so I am a bit lost at where to start.
As I have to read a large number of these files (more than 100), I would rather not iterate through every single line in every file. I hope there is a much faster approach.
EDIT:
I made a sample file available here: test.txt (google drive)
I've looked at the text file you supplied and tried to draw a few general conclusions:
When there are two integers on a line, the second integer corresponds to the number of rows following.
You always have two integers (A, B) followed by B floats, and this pattern is repeated twice.
After that you have some free-form text (or at least, I couldn't deduce anything useful about the format after that).
This is a messy format so I doubt there are going to be any nice solutions. Some useful general principles are:
Use fgetl when you need to read a single line (it reads up to the next newline character)
Use textscan when it's possible to read multiple lines at once - it is much faster than reading a line at a time. It has many options for how to parse, which it is worth getting to know (I recommend typing doc textscan and reading the entire thing).
If in doubt, just read the lines in as strings and then analyse them in MATLAB.
With that in mind, here is a simple parser for your files. It will probably need some modifications as you infer more about the structure of the files, but it is reasonably fast on the ~700-line test file you gave.
I've just given the variables dummy names like "a", "b", "floats" etc. You should change them to something more specific to your needs.
function output = readTestFile(filename)
fid = fopen(filename, 'r');
% Read the first line
line = '';
while isempty(line)
line = fgetl(fid);
end
nums = textscan(line, '%d %d', 'CollectOutput', 1);
a = nums{1}(1);
b = nums{1}(2);
% Read 'b' of the next lines:
contents = textscan(fid, '%f', b);
floats1 = contents{1};
% Read the next line:
line = '';
while isempty(line)
line = fgetl(fid);
end
nums = textscan(line, '%d %d', 'CollectOutput', 1);
c = nums{1}(1);
d = nums{1}(2);
% Read 'd' of the next lines:
contents = textscan(fid, '%f', d);
floats2 = contents{1};
% Read the rest:
rest = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
output.a = a;
output.b = b;
output.c = c;
output.d = d;
output.floats1 = floats1;
output.floats2 = floats2;
output.rest = rest{1};
end
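To try the same idea without the sample file, here is a sketch against a small synthetic file built to match the structure inferred above (the numbers are made up). It swaps in fscanf for reading the integer header lines; the point it shares with readTestFile is that textscan with a repeat count reads exactly that many values, and the next read continues from where it stopped:

```matlab
% Synthetic file mimicking the inferred layout (values are made up)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1 3\n1.5e-8\n2.5e-8\n3.5e-8\n');   % header "A B", then B floats
fprintf(fid, '2 2\n5.3456e-4\n6.0e-4\n');        % second header and its floats
fprintf(fid, '45 String\n');                     % free-form remainder
fclose(fid);

fid = fopen(fname, 'r');
hdr1 = fscanf(fid, '%d', 2);              % [A; B] as a 2x1 double
block1 = textscan(fid, '%f', hdr1(2));    % read exactly B floats
hdr2 = fscanf(fid, '%d', 2);
block2 = textscan(fid, '%f', hdr2(2));
rest = textscan(fid, '%s', 'Delimiter', '\n');   % remaining lines as text
fclose(fid);
delete(fname);

floats1 = block1{1};
floats2 = block2{1};
```

Because each block is read in one textscan call rather than line by line, this should stay fast across many files.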
You can read in the file line by line using the lower-level functions, then parse each line manually.
You open the file handle as you would in C:
fid = fopen(filename);
Then you can read a line using fgetl:
line = fgetl(fid);
Tokenizing the line on spaces is probably the best first pass. Store each piece in a cell array, because a matrix doesn't support ragged rows:
rest = line;   % remainder of the line still to be tokenized
colnum = 1;
while ~isempty(rest)
    [token, rest] = strtok(rest, ' ');
    if ~isempty(token)   % skip the empty token left by trailing spaces
        entries{linenum, colnum} = token;
        colnum = colnum + 1;
    end
end
Then you can wrap all of that inside another while loop to iterate over the lines:
entries = {};
linenum = 1;
while ~feof(fid)
    line = fgetl(fid);
    % strtok loop from above goes here
    linenum = linenum + 1;
end
fclose(fid);
It's up to you whether it's best to parse the file as you read it or read it into a cell array first and then go over it afterwards.
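Your cell entries are all going to be strings (char arrays), so you will need to convert them to numbers, for example with str2num. It does a good job of working out the format, so that might be all you need; str2double is a safer alternative for single values because, unlike str2num, it does not call eval.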
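Putting the pieces together, a runnable sketch of the line-by-line approach, assuming a temporary file with a few made-up lines in the question's style. It uses tline instead of line (to avoid shadowing the built-in line function), guards against fgetl returning -1, and converts with str2double:

```matlab
% Hypothetical sample lines in the style of the question's file
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '123 180\n1.5674e-8\n45 String\n');
fclose(fid);

fid = fopen(fname, 'r');
entries = {};
linenum = 1;
while ~feof(fid)
    tline = fgetl(fid);
    if ~ischar(tline)        % fgetl returns -1 once the end of file is reached
        break;
    end
    rest = tline;            % remainder of the line still to be tokenized
    colnum = 1;
    while ~isempty(rest)
        [token, rest] = strtok(rest, ' ');
        if ~isempty(token)
            entries{linenum, colnum} = token;
            colnum = colnum + 1;
        end
    end
    linenum = linenum + 1;
end
fclose(fid);
delete(fname);

% Pad the ragged rows with '' so every cell holds text, then convert;
% str2double returns NaN for anything that is not a number
entries(cellfun(@isempty, entries)) = {''};
values = str2double(entries);
```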
I have a file in the following format:
**400**,**100**::400,descendsFrom,**76**::0
**400**,**119**::400,descendsFrom,**35**::0
**400**,**4**::400,descendsFrom,**45**::0
...
...
Now I need to read only the parts in bold. I've written the following formatspec:
formatspec = '%d,%d::%*d,%*s,%d::%*d\n';
data = textscan(fileID, formatspec);
It doesn't seem to work. Can someone tell me what's wrong?
I also need to know how to avoid using a delimiter, and how to express the exact layout of my file in the format, as in the case above.
EDITED
A possible problem is the %s part of the formatspec. Because %s matches an arbitrary string, the whole descendsFrom,76::0 part of the line is assigned to it. So with the formatspec '%d,%d::%d,%s,%d::%d\n' you will get the following cells from the first line:
400 100 400 'descendsFrom,76::0'
To solve this problem you have two possibilities:
formatspec = '%d,%d::%d,descendsFrom,%d::%d\n';
OR
formatspec = '%d,%d::%d,%12s,%d::%d\n';
In the first case, the literal text descendsFrom must appear in every row (as in your example). In the second case, that text can change, but it must be exactly 12 characters long, the length of descendsFrom.
Your delimiter is ",", so you could first split on it and then run a regex. Here is how I would go about it:
fileID = fopen('file.csv');
D = textscan(fileID,'%s %s %s %s ','Delimiter',','); %read everything as strings
column1 = regexprep(D{1}, '\*', '')      % escape * so it is matched literally
column2 = regexprep(D{2}, '[*:]', '')    % strip asterisks and colons
column3 = D{3}
column4 = regexprep(D{4}, '[*:]', '')
fclose(fileID);
This should generate your 4 columns which you can then combine
Rather than relying on the Delimiter option alone, a more efficient way is to run regexprep directly on the entire line, which would generate:
test = '**400**,**4**::400,descendsFrom,**45**::0'
test = regexprep(test, '[*:]', '')
% result: test = '400,4400,descendsFrom,450'
textscan accepts multiple delimiters; supply them as a cell array of character vectors. You don't need the end-of-line character in the format, and you need to set 'MultipleDelimsAsOne' so that '::' counts as a single delimiter. I don't have MATLAB to hand, but something along these lines should work:
formatspec = '%d %d %*d %*s %d %*d';
data = textscan(fileID, formatspec,'Delimiter',{',',':'},'MultipleDelimsAsOne',1);
If you want a matrix of numbers rather than a cell array, also try adding the option 'CollectOutput',1.
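A self-contained check of that multiple-delimiter format, assuming the ** in the question is bold markup rather than literal file content, and reading from a char array instead of fileID so no file is needed:

```matlab
% Sample text, assuming the ** in the question is bold markup, not file content
txt = sprintf(['400,100::400,descendsFrom,76::0\n' ...
               '400,119::400,descendsFrom,35::0\n' ...
               '400,4::400,descendsFrom,45::0\n']);

% Keep fields 1, 2 and 5; %*d and %*s read a field and discard it
formatspec = '%d %d %*d %*s %d %*d';
data = textscan(txt, formatspec, 'Delimiter', {',', ':'}, ...
                'MultipleDelimsAsOne', 1, 'CollectOutput', 1);
M = data{1};   % int32 matrix, one row per input line
```

If the asterisks really are in the file, they would need stripping first (for example with the regexprep calls from the previous answer) before this format can parse the numbers.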
The following two statements read the first line from an input file (fid) and parse said line into strings delimited by whitespace.
a = textscan(fid,'%s',1,'Delimiter','\n');
b = textscan(a{1}{1},'%s');
I would like to know if this action can be accomplished in a single statement, having a form similar to the following (which is syntactically invalid).
b = textscan(textscan(fid,'%s',1,'Delimiter','\n'),'%s');
Thanks.
Instead of
a = textscan(fid, '%s', 1, 'Delimiter', '\n');
you can use
a = fgetl(fid);
That will return the next line in fid as a string (the newline character at the end is stripped). You can then split that line into white-space separated chunks as follows:
b = regexp(a, '\s*', 'split');
Combined:
b = regexp(fgetl(fid), '\s*', 'split');
Note that this is not 100% equivalent to your code, since textscan adds another cell layer (representing different lines in the file). That's not a problem, though; simply use
b = {regexp(fgetl(fid), '\s*', 'split')};
if you need that extra cell-layer.
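One more difference: when a line starts with whitespace, regexp(..., 'split') returns an empty first element, while textscan skips leading whitespace. A sketch that stays a single statement and avoids this by trimming first, using strsplit (R2013a or later) and a temporary file with made-up contents:

```matlab
% Temporary file with made-up contents; the first line has leading spaces and a tab
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '   To be  or\tnot to be\nsecond line\n');
fclose(fid);

fid = fopen(fname, 'r');
% Single statement: read one line, trim the ends, split at runs of whitespace
b = strsplit(strtrim(fgetl(fid)));
fclose(fid);
delete(fname);

% For comparison, splitting untrimmed text leaves an empty first element
parts = regexp('   To be', '\s*', 'split');
```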
How could I write a function to take in the following:
filename: (a string that corresponds to the name of a file)
wordA and wordB: two strings, neither of which contains spaces
The function should do this:
A- read the text file line by line
B- replace every occurrence of wordA with wordB
C- write the modified text to a file with the same name as the original, but prepended with 'new_'. For instance, if the input file name was 'data.txt', the output would be 'new_data.txt'.
Here is what I have done. It has many mistakes, but I have the main idea. Could you please help me find my mistakes and make the function work?
function [ ] = replaceStr( filename,wordA, wordB )
% to replace wordA to wordB in a txt and then save it in a new file.
newfile=['new_',filename]
fh=fopen(filename, 'r')
fh1=fgets(fh)
fh2=fopen(newfile,'w')
line=''
while ischar(line)
line=fgetl(fh)
newLine=[]
while ~isempty(line)
[word line]= strtok(line, ' ')
if strcmp(wordA,wordB)
word=wordB
end
newLine=[ newLine word '']
end
newLine=[]
fprintf('fh2,newLine')
end
fclose(fh)
fclose(fh2)
end
You can read the entire file in a string using the FILEREAD function (it calls FOPEN/FREAD/FCLOSE underneath), substitute text, then save it all at once to a file using FWRITE.
str = fileread(filename); %# read contents of file into string
str = strrep(str, wordA, wordB); %# Replace wordA with wordB
fid = fopen(['new_' filename], 'w');
fwrite(fid, str, '*char'); %# write characters (bytes)
fclose(fid);
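One caveat with ['new_' filename]: if filename includes a folder, the prefix lands in front of the path rather than the file name. A sketch that splits the path with fileparts first, assuming a hypothetical data.txt created in a temporary folder and hypothetical words 'cat' and 'dog':

```matlab
% Work in a temporary folder so the hypothetical input file has a full path
folder = tempname;
mkdir(folder);
filename = fullfile(folder, 'data.txt');   % hypothetical input file
wordA = 'cat';                             % hypothetical words to swap
wordB = 'dog';

fid = fopen(filename, 'w');
fprintf(fid, 'the cat sat\non the cat mat\n');
fclose(fid);

% Same folder, with 'new_' prefixed to the base name only
[pth, name, ext] = fileparts(filename);
newfile = fullfile(pth, ['new_' name ext]);

str = fileread(filename);           % whole file as one char row
str = strrep(str, wordA, wordB);    % replace every occurrence
fid = fopen(newfile, 'w');
fwrite(fid, str, 'char');           % write the characters back out
fclose(fid);

result = fileread(newfile);
rmdir(folder, 's');                 % clean up the temporary folder
```

Note that strrep matches substrings, so wordA is also replaced inside longer words; if only whole words should change, regexprep with word anchors is one possible alternative.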
Some things to fix:
It will be much easier to use the function STRREP instead of parsing the text yourself.
I would use FGETS instead of FGETL to keep the newline character as part of the string, since you will want to output them to your new file anyway.
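Your FPRINTF call is malformed: 'fh2,newLine' is a single string, so MATLAB prints that literal text to the Command Window instead of writing newLine to the file. The file ID and the data must be separate arguments, as in fprintf(fh2, '%s', newLine).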
Here's a corrected version of your code with the above fixes:
fidInFile = fopen(filename,'r'); %# Open input file for reading
fidOutFile = fopen(['new_' filename],'w'); %# Open output file for writing
nextLine = fgets(fidInFile); %# Get the first line of input
while ischar(nextLine)                      %# Loop until fgets returns -1 (end of file)
nextLine = strrep(nextLine,wordA,wordB); %# Replace wordA with wordB
fprintf(fidOutFile,'%s',nextLine); %# Write the line to the output file
nextLine = fgets(fidInFile); %# Get the next line of input
end
fclose(fidInFile); %# Close the input file
fclose(fidOutFile); %# Close the output file