Using fscanf in MATLAB to read an unknown number of columns - matlab

I want to use fscanf for reading a text file containing 4 rows with an unknown number of columns. The newline is represented by two consecutive spaces.
It was suggested that I pass : as the sizeA parameter but it doesn't work.
How can I read in my data?
update: The file format is
String1 String2 String3
10 20 30
a b c
1 2 3
I have to fill 4 arrays, one for each row.

See if this will work for your application.
fid1=fopen('test.txt');
i=1;
check=0;
while check~=1
str=fscanf(fid1,'%s',1);
if strcmp(str,'')~=1;
string(i)={str};
end
i=i+1;
check=strcmp(str,'');
end
fclose(fid1);
X=reshape(string,[],4);
ar1=X(:,1)
ar2=X(:,2)
ar3=X(:,3)
ar4=X(:,4)
Once you have 'ar1','ar2','ar3','ar4' you can parse them however you want.

I have found a solution, i don't know if it is the only one but it works fine:
A=fscanf(fid,'%[^\n] *\n')
B=sscanf(A,'%c ')
Z=fscanf(fid,'%[^\n] *\n')
C=sscanf(Z,'%d')
....

You could use
rawText = getl(fid);
lines = regexp(thisLine,' ','split);
tokens = {};
for ix = 1:numel(lines)
tokens{end+1} = regexp(lines{ix},' ','split'};
end
This will give you a cell array of strings having the row and column shape or your original data.
To read an arbitrary line of text then break it up according the the formating information you have available. My example uses a single space character.
This uses regular expressions to define the separator. Regular expressions powerful but too complex to describe here. See the MATLAB help for regexp and regular expressions.

Related

Converting a .txt file with 1 million digits of "e" into a vector in matlab

I have a text file with 1 million decimal digits of "e" number with 80 digits on each line excluding the first and the last line which have 76 and 4 digits and the file has 12501 lines. I want to convert it into a vector in matlab with each digit on each row. I tried num2str function, but the problem is that it gets converted like for example '7.1828e79' (13 characters). What can I do?
P.S.1: The first two lines of the text file (76 and 80 digits) are:
7182818284590452353602874713526624977572470936999595749669676277240766303535 47594571382178525166427427466391932003059921817413596629043572900334295260595630
P.S.2: I used "dlmread" and got a 12501x1 vector, with the first and second row of 7.18281828459045e+75 and 4.75945713821785e+79 and the problem is that when I use num2str for example for the first row value, I get: '7.182818284590453e+75' as a string and not the whole 76 digits. My aim was to do something like this:
e1=dlmread('e.txt');
es1=num2str(e1);
for i=1:12501
for j=1:length(es1(1,:))
a1((i-1)*length(es1(1,:))+j)=es1(i,j);
end
end
e_digits=a1.';
but I get a string like this:
a1='7.182818284590453e+754.759457138217852e+797.381323286279435e+799.244761460668082e+796.133138458300076e+791.416928368190255e+79 5...'
with 262521 characters instead of 1 million digits.
P.S.3: I think the problem might be solved if I can manipulate the text file in a way that I have one digit on each line and simply use dlmread.
Well, this is not hard, there are many ways to do it.
So first you want to load in your file as a Char Array using something simple like (you want a Char Array so that you can easily manipulate it to forget about the lines breaks) :
C = fileread('yourfile.txt'); %loads file as Char Array
D = C(~isspace(C)); %Removes SPACES which are line-breaks
Next, you want to actually append a SPACE between each char (this is because you want to use the num2str transform - and matlab needs to see the space), you can do this using a RESHAPE, a STRTRIM or simply a REGEX:
E = strtrim(regexprep(D, '.{1}', '$0 ')); %Add a SPACE after each Numeric Digit
Now you can transform it using str2num into a Vector:
str2num(E)'; %Converts the Char Array back to Vector
which will give you a single digit each row.
Using your example, I get a vector of 156 x 1, with 1 digit each row as you required.
You can get a digit per row like this
fid = fopen('e.txt','r');
c = textscan(fid,'%s');
c=cat(1,c{:});
c = cellfun(#(x) str2num(reshape(x,[],1)),c,'un',0);
c=cat(1,c{:});
And it is not the only possible way.
Could you please tell what is the final task, how do you plan using the array of e digits?

Slow regexprep with a very long string

I have simulation data in an ascii file with a lot of data points. I'm trying to extract variable names and their values from it. The below is an example of what the file format looks like:
*ESA
*COM on Tue Sep 27 15:23:02 2016
*COM C:\Users\vi813c\Documents\My Matlab\
*COM The pathname to the ESB file was: C:\Users\vi813c\Documents\My Matlab
Case013
*RTITLE
Run Date/Time = 20-SEP-2016 13:29:00
MSC.EASY5 time-history plot with 20001 data points
*EOD
*FLOAT
TIME FDLB(1) FSLB(1) FVLB(1) MXLB(1) \
MYLB(1) MZLB(1) FDLB(2) FSLB(2) FVLB(2) \
MXLB(2) MYLB(2) MZLB(2) FDLB(3) FSLB(3) \
FVLB(3) MXLB(3) MYLB(3) MZLB(3)
0 884.439 -0 53645.8 -972.132
-311780 207.866 5403.68 1981.49 327781
258746 -1.74898E+006 84631.4 5384.25 -1308.47
326538 -97028.6 -1.74013E+006 -61858.1
0.002 882.616 0.008033 53661.1 -972.4
-311702 207.779 5400.42 1982.11 327784
258726 -1.74906E+006 84628.3 5381.01 -1308.44
326541 -97040.1 -1.74021E+006 -61858.8
0.004 876.819 0.031336 53705.6 -973.183
-311683 207.661 5391.19 1983.9 327795
258693 -1.74935E+006 84624 5371.85 -1309.63
326552 -97040.6 -1.74051E+006 -61858.8
0.006 869.491 0.061631 53763.3 -974.213
-311806 207.618 5377.45 1986.76 327813
258659 -1.74995E+006 84621.7 5358.2 -1312.04
326569 -97040.3 -1.7411E+006 -61861
0.008 861.718 0.095625 53828.1 -975.379
-312039 207.648 5360.82 1990.12 327834
A summary of data format characteristics is as follows:
Everything above "*FLOAT" is a header and I need to get rid of it
Stuff between "*FLOAT" and the first numeric value are the variable names
The variable names and the values are delimited by space(s) and '\'
The data are "lumped". Each lump has values for the variables at a given simulation time step. In the example above, there are 19 variables so that there are 19 numeric values in each lump
There can be multiple data sets; each preceded with "*FLOAT" and a variable name section
The following is how I am currently handling this data:
fileread the file --> one big string of characters
regexprep {'\s+,'\','\n'} with ',' --> comma delimited for strsplit
strfind "*FLOAT"
strsplit by ',' --> now becomes a cell
find the first numeric value by isnan(str2double(parse))
Then between the index from 2. and the index from 4 are the variable names and between the index from 4 and the next "*FLOAT" are the numeric data
This scheme is sort of working, but I can't stop thinking that there's gotta be a better way to do this. For one, the step 1. is extremely slow. I guess it's one big string for regexprep to work on with multiple things to replace.
How can I improve my script?
I gave this a shot with the string class which is new in 16b.
str = string(fileread('file.txt'));
fileNewline = [13 newline]; % This data has carriage returns
str = extractAfter(str, ['*FLOAT' fileNewline]);
str = erase(str, ['\' fileNewline]);
str = splitlines(str);
% Get the variable names
varNames = split(str(1))';
% Get the data
data = reshape(str(2:end), 4, [])';
data = strip(data);
data = join(data);
data = split(data);
data = double(data);
I'm not sure about how to load the file faster.
As mentioned in another comment, textscan could probably help. It might end up being the fastest solution. With the correct format specified and using the 'HeaderLines' option, I think you can make it work.

Matlab: Function that returns a string with the first n characters of the alphabet

I'd like to have a function generate(n) that generates the first n lowercase characters of the alphabet appended in a string (therefore: 1<=n<=26)
For example:
generate(3) --> 'abc'
generate(5) --> 'abcde'
generate(9) --> 'abcdefghi'
I'm new to Matlab and I'd be happy if someone could show me an approach of how to write the function. For sure this will involve doing arithmetic with the ASCII-codes of the characters - but I've no idea how to do this and which types that Matlab provides to do this.
I would rely on ASCII codes for this. You can convert an integer to a character using char.
So for example if we want an "e", we could look up the ASCII code for "e" (101) and write:
char(101)
'e'
This also works for arrays:
char([101, 102])
'ef'
The nice thing in your case is that in ASCII, the lowercase letters are all the numbers between 97 ("a") and 122 ("z"). Thus the following code works by taking ASCII "a" (97) and creating an array of length n starting at 97. These numbers are then converted using char to strings. As an added bonus, the version below ensures that the array can only go to 122 (ASCII for "z").
function output = generate(n)
output = char(97:min(96 + n, 122));
end
Note: For the upper limit we use 96 + n because if n were 1, then we want 97:97 rather than 97:98 as the second would return "ab". This could be written as 97:(97 + n - 1) but the way I've written it, I've simply pulled the "-1" into the constant.
You could also make this a simple anonymous function.
generate = #(n)char(97:min(96 + n, 122));
generate(3)
'abc'
To write the most portable and robust code, I would probably not want those hard-coded ASCII codes, so I would use something like the following:
output = 'a':char(min('a' + n - 1, 'z'));
...or, you can just generate the entire alphabet and take the part you want:
function str = generate(n)
alphabet = 'a':'z';
str = alphabet(1:n);
end
Note that this will fail with an index out of bounds error for n > 26, so you might want to check for that.
You can use the char built-in function which converts an interger value (or array) into a character array.
EDIT
Bug fixed (ref. Suever's comment)
function [str]=generate(n)
a=97;
% str=char(a:a+n)
str=char(a:a+n-1)
Hope this helps.
Qapla'

MATLAB reading CSV file with timestamp and values

I have the following sample from a CSV file. Structure is:
Date ,Time(Hr:Min:S:mS), Value
2015:08:20,08:20:19:123 , 0.05234
2015:08:20,08:20:19:456 , 0.06234
I then would like to read this into a matrix in MATLAB.
Attempt :
Matrix = csvread('file_name.csv');
Also tried an attempt formatting the string.
fmt = %u:%u:%u %u:%u:%u:%u %f
Matrix = csvread('file_name.csv',fmt);
The problem is when the file is read the format is wrong and displays it differently.
Any help or advice given would be greatly appreciated!
EDIT
When using #Adriaan answer the result is
2015 -11 -9
8 -17 -1
So it seems that MATLAB thinks the '-' is the delimiter(separator)
Matrix = csvread('file_name.csv',1,0);
csread does not support a format specifier. Just enter the number of header rows (I took it to be one, as per example), and number of header columns, 0.
You file, however, contains non-numeric data. Thus import it with importdata:
data = importdata('file_name.csv')
This will get you a structure, data with two fields: data.data contains the numeric data, i.e. a vector containing your value. data.textdata is a cell containing the rest of the data, you need the first two column and extract the numerics from it, i.e.
for ii = 2:size(data.textdata,1)
tmp1 = data.textdata{ii,1};
Date(ii,1) = datenum(tmp1,'YYYY:MM:DD');
tmp2 = data.textdata{ii,2};
Date(ii,2) = datenum(tmp2,'HH:MM:SS:FFF');
end
Thanks to #Excaza it turns out milliseconds are supported.

Function to split string in matlab and return second number

I have a string and I need two characters to be returned.
I tried with strsplit but the delimiter must be a string and I don't have any delimiters in my string. Instead, I always want to get the second number in my string. The number is always 2 digits.
Example: 001a02.jpg I use the fileparts function to delete the extension of the image (jpg), so I get this string: 001a02
The expected return value is 02
Another example: 001A43a . Return values: 43
Another one: 002A12. Return values: 12
All the filenames are in a matrix 1002x1. Maybe I can use textscan but in the second example, it gives "43a" as a result.
(Just so this question doesn't remain unanswered, here's a possible approach: )
One way to go about this uses splitting with regular expressions (MATLAB's strsplit which you mentioned):
str = '001a02.jpg';
C = strsplit(str,'[a-zA-Z.]','DelimiterType','RegularExpression');
Results in:
C =
'001' '02' ''
In older versions of MATLAB, before strsplit was introduced, similar functionality was achieved using regexp(...,'split').
If you want to learn more about regular expressions (abbreviated as "regex" or "regexp"), there are many online resources (JGI..)
In your case, if you only need to take the 5th and 6th characters from the string you could use:
D = str(5:6);
... and if you want to convert those into numbers you could use:
E = str2double(str(5:6));
If your number is always at a certain position in the string, you can simply index this position.
In the examples you gave, the number is always the 5th and 6th characters in the string.
filename = '002A12';
num = str2num(filename(5:6));
Otherwise, if the formating is more complex, you may want to use a regular expression. There is a similar question matlab - extracting numbers from (odd) string. Modifying the code found there you can do the following
all_num = regexp(filename, '\d+', 'match'); %Find all numbers in the filename
num = str2num(all_num{2}) %Convert second number from str