Nested textscan statements - matlab

The following two statements read the first line from an input file (fid) and parse said line into strings delimited by whitespace.
a = textscan(fid,'%s',1,'Delimiter','\n');
b = textscan(a{1}{1},'%s');
I would like to know if this action can be accomplished in a single statement, having a form similar to the following (which is syntactically invalid).
b = textscan(textscan(fid,'%s',1,'Delimiter','\n'),'%s');
Thanks.

Instead of
a = textscan(fid, '%s', 1, 'Delimiter', '\n');
you can use
a = fgetl(fid);
That will return the next line in fid as a string (the newline character at the end is stripped). You can then split that line into white-space separated chunks as follows:
b = regexp(a, '\s*', 'split');
Combined:
b = regexp(fgetl(fid), '\s*', 'split');
Note that this is not 100% equivalent to your code, since using textscan adds another cell-layer (representing different lines in the file). That's not a problem, though, simply use
b = {regexp(fgetl(fid), '\s*', 'split')};
if you need that extra cell-layer.

Related

string concatenation from a .txt file matlab

I am trying to concatenate three lines (I want to leave the lines as is; 3 rows) from Shakespeare.txt file that shows:
To be,
or not to be:
that is the question.
My code right now is
fid = fopen('Shakespeare.txt')
while ~feof(fid)
a = fgets(fid);
b = fgets(fid);
c = fgets(fid);
end
fprintf('%s', strcat(a, b, c))
I'm supposed to use strcat and again, I want concatenated and leave them as three rows.
One method of keeping the rows separate is by storing the lines of the text file in a string array. Here a 1 by 3 string array is used. It may also be a good idea to use fgetl() which grabs each line of the text file at a time. Concantenating the outputs of fgetl() as strings may also be another option to ensure the they do not get stored as character (char) arrays. Also using the \n indicates to line break when printing the strings within the array String_Array.
fid = fopen('Shakespeare.txt');
while ~feof(fid)
String_Array(1) = string(fgetl(fid));
String_Array(2) = string(fgetl(fid));
String_Array(3) = string(fgetl(fid));
end
fprintf('%s\n', String_Array);
Ran using MATLAB R2019b

Matlab Error: ()-indexing must appear last in an index expression

I have this code and want to write an array in a tab delimited txt file :
fid = fopen('oo.txt', 'wt+');
for x = 1 :length(s)
fprintf(fid, '%s\t\n', s(x)(1)) ;
end;
fclose(fid);
but I receive this error :
Error: ()-indexing must appear last in an index expression.
how should i call s(x)(1)? s is an array
s <2196017x1 cell>
when I use this code I get no error but return me some characters not words.
fprintf(fid, '%s\t\n', ( s{x}{1})) ;
With MATLAB, you cannot immediately index into the result of a function using () without first assigning it to a temporary variable (Octave does allow this though). This is due to some of the ambiguities that happen when you allow this.
tmp = s(x);
fprintf(fid, '%s\t\n', tmp(1)) ;
There are some ways around this but they aren't pretty
It is unclear what exactly your data structure is, but it looks like s is a cell so you should really be using {} indexing to access it's contents
fprintf(fid, '%s\t\n', s{x});
Update
If you're trying to read individual words in from your input file and then write those out to a tab-delimited file, I'd probably do something like the following:
fid = fopen('input.txt', 'r');
contents = fread(fid, '*char')';
fclose(fid)
% Break a string into words and yield a cell array of strings
words = regexp(contents, '\s+', 'split');
% Write these out to a file separated by tabs
fout = fopen('output.tsv', 'w');
fprintf(fout, '%s\t', words{:});
fclose(fout)

Read textfile with a mix of floats, integers and strings in the same column

Loading a well formatted and delimited text file in Matlab is relatively simple, but I struggle with a text file that I have to read in. Sadly I can not change the structure of the source file, so I have to deal with what I have.
The basic file structure is:
123 180 (two integers, white space delimited)
1.5674e-8
.
.
(floating point numbers in column 1, column 2 empty)
.
.
100 4501 (another two integers)
5.3456e-4 (followed by even more floating point numbers)
.
.
.
.
45 String (A integer in column 1, string in column 2)
.
.
.
A simple
[data1,data2]=textread('filename.txt','%f %s', ...
'emptyvalue', NaN)
Does not work.
How can I properly filter the input data? All examples I found online and in the Matlab help so far deal with well structured data, so I am a bit lost at where to start.
As I have to read a whole bunch of those files >100 I rather not iterate trough every single line in every file. I hope there is a much faster approach.
EDIT:
I made a sample file available here: test.txt (google drive)
I've looked at the text file you supplied and tried to draw a few general conclusions -
When there are two integers on a line, the second integer corresponds to the number of rows following.
You always have (two integers (A, B) followed by "B" floats), repeated twice.
After that you have some free-form text (or at least, I couldn't deduce anything useful about the format after that).
This is a messy format so I doubt there are going to be any nice solutions. Some useful general principles are:
Use fgetl when you need to read a single line (it reads up to the next newline character)
Use textscan when it's possible to read multiple lines at once - it is much faster than reading a line at a time. It has many options for how to parse, which it is worth getting to know (I recommend typing doc textscan and reading the entire thing).
If in doubt, just read the lines in as strings and then analyse them in MATLAB.
With that in mine, here is a simple parser for your files. It will probably need some modifications as you are able to infer more about the structure of the files, but it is reasonably fast on the ~700 line test file you gave.
I've just given the variables dummy names like "a", "b", "floats" etc. You should change them to something more specific to your needs.
function output = readTestFile(filename)
fid = fopen(filename, 'r');
% Read the first line
line = '';
while isempty(line)
line = fgetl(fid);
end
nums = textscan(line, '%d %d', 'CollectOutput', 1);
a = nums{1}(1);
b = nums{1}(2);
% Read 'b' of the next lines:
contents = textscan(fid, '%f', b);
floats1 = contents{1};
% Read the next line:
line = '';
while isempty(line)
line = fgetl(fid);
end
nums = textscan(line, '%d %d', 'CollectOutput', 1);
c = nums{1}(1);
d = nums{1}(2);
% Read 'd' of the next lines:
contents = textscan(fid, '%f', d);
floats2 = contents{1};
% Read the rest:
rest = textscan(fid, '%s', 'Delimiter', '\n');
output.a = a;
output.b = b;
output.c = c;
output.d = d;
output.floats1 = floats1;
output.floats2 = floats2;
output.rest = rest{1};
end
You can read in the file line by line using the lower-level functions, then parse each line manually.
You open the file handle like in C
fid = fopen(filename);
Then you can read a line using fgetl
line = fgetl(fid);
String tokenize it on spaces is probably the best first pass, storing each piece in a cell array (because a matrix doesn't support ragged arrays)
colnum = 1;
while ~isempty(rem)
[token, rem] = strtok(rem, ' ');
entries{linenum, colnum} = token;
colnum = colnum + 1;
end
then you can wrap all of that inside another while loop to iterate over the lines
linenum = 1;
while ~feof(fid)
% getl, strtok, index bookkeeping as above
end
It's up to you whether it's best to parse the file as you read it or read it into a cell array first and then go over it afterwards.
Your cell entries are all going to be strings (char arrays), so you will need to use str2num to convert them to numbers. It does a good job of working out the format so that might be all you need.

Textscan Matlab ; Doesn't read the format

I have a file in the following format:
**400**,**100**::400,descendsFrom,**76**::0
**400**,**119**::400,descendsFrom,**35**::0
**400**,**4**::400,descendsFrom,**45**::0
...
...
Now I need to read, the part only in the bold. I've written the following formatspec:
formatspec = '%d,%d::%*d,%*s,%d::%*d\n';
data = textscan(fileID, formatspec);
It doesn't seem to work. Can someone tell me what's wrong?
I also need to know how to 'not use' delimiter, and how to proceed if I want to express the exact way my file is written in, for example in the case above.
EDITED
A possible problem is with the %s part of the formatspec variable. Because %s is an arbitrary string therefore the descendsFrom,76::0 part of the line is ordered to this string. So with the formatspec '%d,%d::%d,%s,%d::%d\n' you will get the following cells form the first line:
400 100 400 'descendsFrom,76::0'
To solve this problem you have two possibilities:
formatspec = %d,%d::%d,descendsFrom,%d::%d\n
OR
formatspec = %d,%d::%d,%12s,%d::%d\n
In the first case the 'descendForm' string has to be contained by each row (as in your example). In the second case the string can be changed but its length must be 12.
Your Delimiter is "," you should first delimit it then maybe run a regex. Here is how I would go about it:
fileID = fopen('file.csv');
D = textscan(fileID,'%s %s %s %s ','Delimiter',','); %read everything as strings
column1 = regexprep(D{1},'*','')
column2 = regexprep(D{2},{'*',':'},{'',''})
column3 = D{3}
column4 = regexprep(D{4},{'*',':'},{'',''})
This should generate your 4 columns which you can then combine
I believe the Delimiter can only be one symbol. The more efficient way is to directly do regexprep on your entire line, which would generate:
test = '**400**,**4**::400,descendsFrom,**45**::0'
test = regexprep(test,{'*',':'},{'',''})
>> test = 400,4400,descendsFrom,450
You can do multiple delimiters in textscan, they need to be supplied as a cell array of strings. You don't need the end of line character in the format, and you need to set 'MultipleDelimsAsOne'. Don't have MATLAB to hand but something along these lines should work:
formatspec = '%d %d %*d %*s %d %*d';
data = textscan(fileID, formatspec,'Delimiter',{',',':'},'MultipleDelimsAsOne',1);
If you want to return it as a matrix of numbers not a cell array, try adding also the option 'CollectOutput',1

How to read line from a text file as a string in matlab?

I am trying to read a text file in MATLAB which has a format like the following. I am looking to read the whole line as a string.
2402:0.099061 2404:0.136546 2406:0.447161 2407:0.126333 2408:0.213803 2411:0.068189
I tried couple of things.
textscan(fid, '%s') reads the line but splits the line into cells at spaces.
fscanf(fid, '%s') reads the line as a string but removes all the spaces.
fgetl(fid) will do what you're looking for. Newline is stripped off.
textscan uses a whitespace delimeter by default. Set the delimiter to an empty string:
>> q = textscan(fid, '%s', 'Delimiter', '');
>> q{1}{:}
ans = 2402:0.099061 2404:0.136546 2406:0.447161 2407:0.126333 2408:0.213803 2411:0.068189
If you want to read the whole file as string (your file has only one line), try:
s = fileread('input.txt'); %# returns a char vector
s = strtrim(s); %# trim whitespaces
If you look at the source code of FILEREAD function, it is basically reading the file in binary mode as an array of characters: fread(fid, '*char')
whitespace is treated as a delimiter by default with textscan.
specify a different delimiter (that is not present in your data) when calling, that should do the trick, add this f.e.
'delimiter', '|'
you can also use
file = textread(<fileref goes here>, '%s', 'delimiter', '\n')
then
file{1,1}
will return
ans =
2402:0.099061 2404:0.136546 2406:0.447161 2407:0.126333 2408:0.213803 2411:0.068189
hope this helps
Use:
clc;
fid = fopen('fileName.m');
while ischar(tline)
disp(strcat("Line imported: ",tline))
tline = fgetl(fid);
end
fclose(fid);