Reading text file and dealing with numbers - matlab

I'm attempting to count the number of letters in a text file, but I keep getting stuck when numbers are involved.
So far I have been able to deal with letters and symbols, but the ischar function doesn't help me when it comes to numbers.
function ok = lets(file_name)
fid = fopen(file_name, 'rt');
if fid < 0
ok = -1;
return
end
C = [];
D = [];
oneline = fgets(fid);
while ischar(oneline)
C = oneline(isletter(oneline));
W = length(C);
D = [D ; W];
oneline = fgets(fid);
end
total = 0;
for i = 1:length(D)
total = D(i) + total;
end
fclose(fid);
ok = total;
How can I deal with counting letters if there are also numbers in a text file?

I approached the problem the following way:
function ok = lets(file_name)
file = memmapfile( file_name, 'writable', false );
uppercase = [65:90];   % 'A'-'Z'
lowercase = [97:122];  % 'a'-'z'
data = file.Data;
ok = sum(histc(data,uppercase)+histc(data,lowercase));
end
I mapped the file into memory using the memmapfile function and compared the data with the character codes from the ASCII table: uppercase letters are represented by [65:90] and lowercase letters by [97:122]. By applying the histc function, I got the number of times each letter appears in the file, and the total number of letters is the sum of all these counts.
Note that I called histc twice to avoid having a bin from 90 to 97, which would also count the [\]^_` characters (codes 91 to 96).
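For comparison, here is a minimal sketch of the same ASCII-range idea without histc, which recent MATLAB releases mark as not recommended. It assumes plain ASCII text (one byte per character), and the function name countAsciiLetters is hypothetical. The logical range tests make the excluded gap at codes 91 to 96 explicit instead of relying on bin edges.

```matlab
function n = countAsciiLetters(txt)
% Count the ASCII letters in txt (a char or uint8 vector).
% 'A'-'Z' are codes 65-90 and 'a'-'z' are codes 97-122; the codes
% 91-96 ([\]^_`) lie between the two ranges and are never counted.
codes = double(txt);
isUpper = codes >= 65 & codes <= 90;
isLower = codes >= 97 & codes <= 122;
n = nnz(isUpper | isLower);
end
```

For an ASCII file, countAsciiLetters(fileread('sample.txt')) should give the same count as the histc version.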
I applied the function to a sample file called sample.txt containing the following lines:
abc23D&f![
k154&¨&skj
djaljaljds
Here is my output:
>> lets('sample.txt')
Elapsed time is 0.017783 seconds.
ans =
19
Edit:
To output ok = -1 when there is a problem reading the file:
function ok = lets(file_name)
try
file = memmapfile( file_name, 'writable', false );
catch
file=[];
ok=-1;
end
if ~isempty(file)
uppercase = [65:90];   % 'A'-'Z'
lowercase = [97:122];  % 'a'-'z'
data = file.Data;
ok = sum(histc(data,uppercase)+histc(data,lowercase));
end
end
With the fopen approach, you get the ok = -1 case "by default", since fopen returns a negative file identifier when it cannot open the file:
function ok = lets(file_name)
fid = fopen(file_name, 'rt');
if fid < 0
ok = -1;
else
celldata=textscan(fid,'%s');
fclose(fid);
uppercase = [65:90];   % 'A'-'Z'
lowercase = [97:122];  % 'a'-'z'
data = uint8([celldata{1}{:}]);
ok = sum(histc(data,lowercase)+histc(data,uppercase));
end
end

I think you are making this a lot more complicated than it needs to be; just use isletter as you had and then use length.
function ok = lets(file_name)
%Open the file; return -1 if it cannot be opened
fid = fopen(file_name, 'rt');
if fid < 0
ok = -1;
return
end
%Initialize the count
ok = 0;
%Get first line
oneline = fgets(fid);
%fgets returns -1 (not a char) at the end of the file
while ischar(oneline)
%Remove everything that's not a letter
oneline(~isletter(oneline)) = [];
%Add number of letters to output
ok = ok + length(oneline);
%Get next line
oneline = fgets(fid);
end
fclose(fid);
end
I used the input file,
Ar,TF,760,2.5e-07,1273.14,4.785688323049946e+24,24.80738364864047,37272905351.7263,37933372595.0276
Ar,TF,760,5e-07,1273.14,4.785688323049946e+24,40.3092219226107,2791140681.70926,2978668073.513113
Ar,TF,760,7.5e-07,1273.14,4.785688323049946e+24,54.80989010679312,738684259.1671219,836079550.0157251
and got 18. This count includes the e's in the numbers (the exponents, as in 2.5e-07). Do you want those to be counted?
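If the answer is no, one option is to delete numeric literals before counting, so that the e in an exponent such as 2.5e-07 disappears together with the rest of the number. The sketch below assumes numbers are written in plain decimal or exponent notation, as in the sample lines above, and uses an illustrative regular expression; the function name countLettersOutsideNumbers is hypothetical. A side effect is that digits glued to a word (for example the 2e5 in Ar2e5) are also treated as a number.

```matlab
function n = countLettersOutsideNumbers(txt)
% Count letters in txt, ignoring letters that belong to numeric literals
% such as 2.5e-07 or 4.78e+24 (assumes decimal/exponent notation).
numPattern = '[-+]?\d*\.?\d+([eE][-+]?\d+)?';  % illustrative number pattern
stripped = regexprep(txt, numPattern, '');     % remove every number
n = nnz(isletter(stripped));
end
```

On the three Ar,TF,... lines above, this counts only A, r, T and F on each line, 12 letters in total.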


Matlab: func2str from a function in a m-file

In a main m-file I have
conformal = maketform('custom', 2, 2, [], @conformalInverse_0001, []);
which is used in imtransform and refers to the function defined in conformalInverse_0001.m:
function U = conformalInverse_0001(X, ~)
%#codegen
U = [zeros(size(X))];
Z = complex(X(:,1),X(:,2));
W = 1./(4.*Z.^2-1);
U(:,2) = imag(W);
U(:,1) = real(W);
How can I get the string '1./(4.*Z.^2-1)' in the main program?
I found a way to solve it, but it's not so elegant...
Assume conformalInverse_0001.m is a file in your folder.
You can parse the file as a text file, and search for your formula.
Example:
Assume you know the formula is on the 5th line of the file and that the line starts with W =.
You can use something like the following code to read '1./(4.*Z.^2-1)' in the main program:
%Open file for reading.
fid = fopen('conformalInverse_0001.m', 'r');
%Read the first 5 lines.
s = textscan(fid, '%s', 5, 'delimiter', '\n');
fclose(fid);
%Get the 5th line.
s = s{1}(5);
%Convert cell array to string.
s = s{1};
%Drop the leading 'W = ' (4 characters) and the trailing ';'.
s = s(5:end-1)
Result: s = 1./(4.*Z.^2-1)
You can check the textscan documentation to find a more elegant solution.
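A possibly tidier variant of the same text-parsing idea, sketched under the assumption that the formula sits on a single line of the form W = ...; in the m-file: read the whole file with fileread and let a line-anchored regular expression capture the right-hand side, so the line number no longer matters. The helper name extractAssignment is hypothetical, and varName is assumed to be a plain identifier without regular-expression metacharacters.

```matlab
function expr = extractAssignment(txt, varName)
% Return the right-hand side of the first line of the form
%   <varName> = <expr>;
% in txt (e.g. the output of fileread on an m-file), or '' if none exists.
pat = ['^\s*' varName '\s*=\s*(.*?)\s*;'];
% 'lineanchors' lets ^ match at every line start;
% 'dotexceptnewline' keeps the lazy capture on a single line.
tok = regexp(txt, pat, 'tokens', 'once', 'lineanchors', 'dotexceptnewline');
if isempty(tok)
    expr = '';
else
    expr = tok{1};
end
end
```

For the file shown in the question, extractAssignment(fileread('conformalInverse_0001.m'), 'W') should return '1./(4.*Z.^2-1)'.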
I'm not sure I fully understand the problem here, but what about adding something like this to your conformalInverse_0001 function:
str = '1./(4.*Z.^2-1)';
save('temp_str','str') % writes temp_str.mat; save whatever data you need from it
and then adding this to your main file, where you want to extract it:
load('temp_str.mat') % or you can use importdata
I have hacked together two solutions with textscan: the first uses a known line number, and the second searches for the first line that starts with the substring 'W = '.
% conformalInverse_m_path holds the path to conformalInverse_0001.m
f_id = fopen(conformalInverse_m_path);
conformalInverse_cell = textscan(f_id,'%s','delimiter','\n'); % one cell per line
fclose(f_id);
file_lines = conformalInverse_cell{1};
% Solution 1: read line line_num = 5 and process the string
line_num = 5;
func_string = file_lines{line_num}; disp(func_string); % W = 1./(4.*Z.^2-1);
func_string_2 = func_string(5:end-1); disp(func_string_2); % 1./(4.*Z.^2-1)
% Solution 2: find the first line that starts with 'W = ' and process it
W_string = 'W = ';
line_nr = [];
for i = 1:numel(file_lines)
if strncmp(file_lines{i}, W_string, numel(W_string))
line_nr = i;
break;
end
end
func_string_3 = file_lines{line_nr}(5:end-1);

discard newline character

Here is the code that I am using:
files2 = dir('X_*.txt');
for kty=1:p
fidF = fopen(['X(A)_' num2str(kty) '.txt'], 'w');
for i = 1:length(files2)
fid = fopen(files2(i).name);
while(~feof(fid))
string = fgetl(fid)
fprintf(fidF, '%s', string)
end
fclose(fidF);
end
end
p is equal to 90 because there are 90 different X text files, each containing a different angle. The new X(A) text files should also be 90 different files.
The code gets rid of the second line, and that part is working.
My question: when I run this code, it creates the X(A) text files (90 files), but all of them contain the angle variable from the X_1 file. It should be:
X_1 > X(A)_1 (each variable should transfer to new file.)
(X_65 > X(A)_65)
X_2 > X(A)_2
...
...
How can I fix the code?
files2 = dir('angle_*.txt');
for i = 1:length(files2)
fidF = fopen(['angle(A)_' num2str(i) '.txt'], 'w');
fid = fopen(files2(i).name);
while(~feof(fid))
string = fgetl(fid)
fprintf(fidF, '%s', string)
end
fclose(fidF);
fclose(fid);
end
The results are:
angle_1=272 angle(A)_1=272
angle_2=276 angle(A)_2=308
angle_3=280 angle(A)_3=312
angle_4=284 angle(A)_4=316
angle_5=288 angle(A)_5=320
angle_6=292 angle(A)_6=324
angle_7=296 angle(A)_7=328
angle_8=300 angle(A)_8=332
angle_9=304 angle(A)_9=336
angle_10=308 angle(A)_10=340
angle_11=312 angle(A)_11=344
angle_12=316 angle(A)_12=348
The angle_10 variable goes to angle(A)_2, and the following files are copied in the same shifted order.
In order to have your input and output files match, you need to remove one of the for loops and have X(A)_#.txt match files2(#).name:
files2 = dir('X_*.txt');
for i = 1:length(files2)
fid = fopen(files2(i).name);
fNum = regexp(files2(i).name, '([0-9]*)', 'match');
fidF = fopen(['X(A)_' fNum{1} '.txt'], 'w');
while(~feof(fid))
string = fgetl(fid);
fprintf(fidF, '%s', string);
end
fclose(fidF);
fclose(fid);
end
I've removed the loop over 1:p and kept only the loop over the input files, with i as the loop variable. i indexes the input file list, and the number in each output file name is taken from the matching input file name with regexp, so X_10.txt is written to X(A)_10.txt.
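The shifted results in the question (angle_10 landing in angle(A)_2) are consistent with dir listing the names in text order, where angle_10.txt sorts before angle_2.txt. If you prefer to keep a plain index i for both input and output, a sketch of sorting the names numerically first is below; sortByFileNumber is a hypothetical helper, and it assumes the first run of digits in each name is the file number.

```matlab
function sortedNames = sortByFileNumber(names)
% Sort file names such as 'angle_10.txt' by the number they contain,
% so that angle_2.txt comes before angle_10.txt. Assumes the first run
% of digits in each name is the file number; names without digits get
% NaN and are placed last.
nums = zeros(size(names));
for k = 1:numel(names)
    token = regexp(names{k}, '\d+', 'match', 'once');  % '' if no digits
    nums(k) = str2double(token);                       % NaN for ''
end
[~, order] = sort(nums);
sortedNames = names(order);
end
```

Usage: after files2 = dir('angle_*.txt'); names = sortByFileNumber({files2.name});, the input names{i} pairs with the output ['angle(A)_' num2str(i) '.txt'], provided the file numbers run 1, 2, 3, ... without gaps.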

Save a sparse array in csv

I have a huge sparse matrix a and I want to save it in a .csv file. I cannot call full(a) because I do not have enough RAM, so calling dlmwrite with full(a) as the argument is not possible. Note also that dlmwrite does not work with sparse matrices.
The desired .csv format is shown below. Note that the first row and the first column, which contain the labels, should be included in the .csv file. The semicolon in the (0,0) position of the .csv file is necessary too.
;A;B;C;D;E
A;0;1.5;0;1;0
B;2;0;0;0;0
C;0;0;1;0;0
D;0;2.1;0;1;0
E;0;0;0;0;0
Could you please help me to tackle this problem and finally save the sparse matrix in the desired form?
You can use the csvwrite function:
csvwrite('matrix.csv',a)
You could do this iteratively, as follows:
A = sprand(20,30000,.1);
delimiter = ';';
filename = 'filecontaininghugematrix.csv';
dims = size(A);
N = max(dims);
% create names first
idx = 1:26;
alphabet = dec2base(9+idx,36);
n = ceil(log(N)/log(26));
q = 26.^(1:n);
names = cell(sum(q),1);
p = 0;
for ii = 1:n
temp = repmat({idx},ii,1);
names(p+(1:q(ii))) = num2cell(alphabet(fliplr(combvec(temp{:})')),2); % combvec is part of the Deep Learning Toolbox
p = p + q(ii);
end
names(N+1:end) = [];
% formats for writing
headStr = repmat(['%s' delimiter],1,dims(2));
headStr = [delimiter headStr(1:end-1) '\n'];
lineStr = repmat(['%g' delimiter],1,dims(2)); % %g prints 0 and 1.5 rather than 0.000000 and 1.500000
lineStr = ['%s' delimiter lineStr(1:end-1) '\n'];
fid = fopen(filename,'w');
% write header
header = names(1:dims(2));
fprintf(fid,headStr,header{:});
% write matrix rows
for ii = 1:dims(1)
row = full(A(ii,:));
fprintf(fid, lineStr, names{ii}, row);
end
fclose(fid);
The names cell array is quite memory-demanding for this example. I don't have time to fix that now, so think about this part yourself if it is really a problem ;) Hint: write the header element by element, first A;, then B;, and so on. For the rows, you can write a function that maps the index ii to the desired label, in which case the whole first part is unnecessary.
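Following that hint, here is a sketch that needs neither the names cell array nor combvec: a small function maps an index to its spreadsheet-style label (1 to A, 26 to Z, 27 to AA, the same order the names array uses), the header is written label by label, and only one row at a time is converted with full. The names writeLabeledCsv and colName are hypothetical, and %g is used so that zeros print as 0, as in the desired output.

```matlab
function writeLabeledCsv(A, filename, delimiter)
% Write matrix A (sparse or full) as delimited text with spreadsheet-style
% labels in the first row and first column. Rows are converted to full
% storage one at a time, so full(A) is never formed.
if nargin < 3
    delimiter = ';';
end
[nRows, nCols] = size(A);
fid = fopen(filename, 'w');
if fid < 0
    error('Could not open %s for writing.', filename);
end
% Header: the leading delimiter leaves cell (0,0) empty, then one label per column.
for jj = 1:nCols
    fprintf(fid, '%s%s', delimiter, colName(jj));
end
fprintf(fid, '\n');
% Body: the row label followed by the row's values.
rowFmt = ['%s' repmat([delimiter '%g'], 1, nCols) '\n'];
for ii = 1:nRows
    fprintf(fid, rowFmt, colName(ii), full(A(ii, :)));
end
fclose(fid);
end

function name = colName(k)
% Map a positive integer to a spreadsheet-style label:
% 1 -> 'A', 26 -> 'Z', 27 -> 'AA', 28 -> 'AB', ... (bijective base 26).
name = '';
while k > 0
    r = mod(k - 1, 26);            % 0..25 selects the current letter
    name = [char('A' + r) name];   % prepend it
    k = floor((k - 1) / 26);       % move to the next letter position
end
end
```

Writing the header one label at a time is slower than a single fprintf call, but it keeps memory use independent of the number of columns.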

Making a matrix of strings read in from file using feof function in matlab

I have an index file (called runnumber_odour.txt) that looks like this:
run00001.txt ptol
run00002.txt cdeg
run00003.txt adef
run00004.txt adfg
I need some way of loading this into a matrix in MATLAB, so that I can search the second column for one of those strings, load the corresponding file and do some data analysis with it (i.e. if I search for "ptol", it should load run00001.txt and analyse the data in that file).
I've tried this:
clear; clc ;
% load index file - runnumber_odour.txt
runnumber_odour = fopen('Runnumber_odour.txt','r');
count = 1;
lines2skip = 0;
while ~feof(runnumber_odour)
runnumber_odourmat = zeros(817,2);
if count <= lines2skip
count = count+1;
[~] = fgets(runnumber_odour); % throw away unwanted line
continue;
else
line = strcat(fgets(runnumber_odour));
runnumber_odourmat = [runnumber_odourmat ;cell2mat(textscan(line, '%f')).'];
count = count +1;
end
end
runnumber_odourmat
But that just produces an 817-by-2 matrix of zeros (i.e. nothing is written to the matrix), and without the line runnumber_odourmat = zeros(817,2); I get the error "undefined function or variable 'runnumber_odourmat'".
I have also tried this with strtrim instead of strcat but that also doesn't work, with the same problem.
So, how do I load that file into a matrix in MATLAB?
You can do all of this easily with a containers.Map object, so you won't have to do any searching: each entry in the second column becomes a key that maps to the file name in the first column. The code is as follows:
clc; close all; clear all;
fid = fopen('fileList.txt','r'); %# open file for reading
count = 1;
content = {};
lines2skip = 0;
fileMap = containers.Map();
while ~feof(fid)
if count <= lines2skip
count = count+1;
[~] = fgets(fid); % throw away unwanted line
else
line = strtrim(fgets(fid));
parts = regexp(line,'\s+','split'); % split on runs of whitespace
if numel(parts) >= 2
fileMap(parts{2}) = parts{1};
end
count = count +1;
end
end
fclose(fid);
fileName = fileMap('ptol')
% do what you need to do with this filename
This gives quick access to any entry by its key.
You can then do what was described in the previous question you had asked, with the answer I provided.
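Since the index file has exactly two whitespace-separated columns, the loop can also be replaced by a single textscan call and the containers.Map constructor that takes cell arrays of keys and values. This is a minimal sketch, assuming every line has both columns and there are no header lines to skip; readIndexFile is a hypothetical name.

```matlab
function fileMap = readIndexFile(filename)
% Build a containers.Map from odour code (column 2) to data file name
% (column 1) for a two-column index file with lines such as
%   run00001.txt ptol
fid = fopen(filename, 'r');
if fid < 0
    error('Could not open %s.', filename);
end
cols = textscan(fid, '%s %s');   % cols{1}: file names, cols{2}: codes
fclose(fid);
fileMap = containers.Map(cols{2}, cols{1});
end
```

With fileMap = readIndexFile('runnumber_odour.txt');, the lookup fileMap('ptol') then returns 'run00001.txt'.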

Reading data into MATLAB from a textfile

I have a textfile with the following structure:
1999-01-04
1,100.00
1,060.00
1,092.50
0
6,225
1,336,605
37
1999-01-05
1,122.50
1,087.50
1,122.50
0
3,250
712,175
14
...
The file contains repeated sets of eight values (a date followed by seven numbers, each on their own line).
I want to read it into MATLAB and get the values into different vectors. I've tried to accomplish this with several different methods, but none have worked - all output some sort of error.
In case it's important, I'm doing this on a Mac.
EDIT: This is a shorter version of the code I previously had in my answer...
If you'd like to read your data file directly, without having to preprocess it first as dstibbe suggested, the following should work:
fid = fopen('datafile.txt','rt');
data = textscan(fid,'%s %s %s %s %s %s %s %s','Delimiter','\n');
fclose(fid);
data = [datenum(data{1}) cellfun(@str2double,[data{2:end}])]';
The above code places each set of 8 values into an 8-by-N matrix, with N being the number of 8 line sets in the data file. The date is converted to a serial date number so that it can be included with the other double-precision values in the matrix. The following functions (used in the above code) may be of interest: TEXTSCAN, DATENUM, CELLFUN, STR2DOUBLE.
I propose yet another solution, which is the shortest in terms of MATLAB code. First, using sed, we format the file as a CSV file (comma separated, with each record on one line):
cat a.dat | sed -e 's/,//g ; s/[ \t]*$/,/g' -e '0~8 s/^\(.*\),$/\1\n/' |
sed -e :a -e '/,$/N; s/,\n/,/; ta' -e '/^$/d' > file.csv
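Explanation: First we get rid of the thousands comma separators, trim the spaces at the end of each line and append a comma. Then we remove that trailing comma from every 8th line. Finally we join the lines and remove the empty ones. Note that the 0~8 line address is a GNU sed extension, so it may not work with the BSD sed that ships with macOS.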
The output will look like this:
1999-01-04,1100.00,1060.00,1092.50,0,6225,1336605,37
1999-01-05,1122.50,1087.50,1122.50,0,3250,712175,14
Next, in MATLAB, we simply use textscan to read each line, with the first field read as a string (to be converted to a date number with datenum) and the rest as numbers:
fid = fopen('file.csv', 'rt');
a = textscan(fid, '%s %f %f %f %f %f %f %f', 'Delimiter',',', 'CollectOutput',1);
fclose(fid);
M = [datenum(a{1}) a{2}]
and the resulting matrix M is:
730124 1100 1060 1092.5 0 6225 1336605 37
730125 1122.5 1087.5 1122.5 0 3250 712175 14
Use a script to modify your text file into something that MATLAB can read, e.g. make it a matrix:
M = [
1999-01-04
1,100.00
1,060.00
1,092.50
0
6,225
1,336,605; <-- notice the ';'
37
1999-01-05
1,122.50
1,087.50
1,122.50
0
3,250; <-- notice the ';'
712,175
14
...
]
Import this into MATLAB and read the various vectors from the matrix.
Note: my MATLAB is a bit rusty, so this might contain errors.
It isn't entirely clear what form you want the data in once you've read it. The code below puts it all in one matrix, with each row representing a group of 8 lines in your text file. You may wish to use different variables for different columns, or (if you have access to the Statistics Toolbox) use a dataset array.
% Read file as text
text = fileread('c:/data.txt');
% Split by line
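x = regexp(text, '\n', 'split');
% Trim stray whitespace (such as \r from Windows line endings) and drop blank lines
x = strtrim(x);
x(cellfun('isempty', x)) = [];
% Remove commas from numbers
x = regexprep(x, ',', '');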
% Number of items per object
n = 8;
% Get dates
index = 1:length(x);
dates = datenum(x(rem(index, n) == 1));
% Get other numbers
nums = str2double(x(rem(index, n) ~= 1));
nums = reshape(nums, (n-1), length(nums)/(n-1))';
% Combine dates and numbers
thedata = [dates nums];
You could also look into the function textscan for alternative ways of solving the problem.
This is similar to Richie's answer and also uses str2double to convert the strings to doubles, but it processes the file line by line instead of splitting it with a regular expression. The output is a cell array of seven vectors, one per numeric line; the date lines are skipped.
function vectors = readdata(filename)
fid=fopen(filename);
tline = fgetl(fid);
counter = 0;
vectors = cell(7,1);
while ischar(tline)
% disp(tline) % uncomment to echo each line as it is read
if counter > 0 % counter 0 is the date line, which is skipped
vectors{counter} = [vectors{counter} str2double(tline)];
end
counter = counter + 1;
if counter > 7
counter = 0;
end
tline = fgetl(fid);
end
fclose(fid);
This has regular expression checking to make sure your data is formatted well.
fid = fopen('data.txt','rt');
%these will be your 8 value arrays
val1 = [];
val2 = [];
val3 = [];
val4 = [];
val5 = [];
val6 = [];
val7 = [];
val8 = [];
linenum = 0; % line number in file
valnum = 0; % number of value (1-8)
while 1
line = fgetl(fid);
linenum = linenum+1;
if valnum == 8
valnum = 1;
else
valnum = valnum+1;
end
%-- if reached end of file, end
if ~ischar(line) || isempty(line)
fclose(fid);
break;
end
switch valnum
case 1
pat = '(?<yr>\d{4})-(?<mo>\d{2})-(?<dy>\d{2})'; % val1 (e.g. 1999-01-04)
case 2
pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val2 (e.g. 1,100.00) [valid up to 1billion-1]
case 3
pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val3 (e.g. 1,060.00) [valid up to 1billion-1]
case 4
pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val4 (e.g. 1,092.50) [valid up to 1billion-1]
case 5
pat = '(?<val>\d+)'; % val5 (e.g. 0)
case 6
pat = '(?<val>\d*[,]*\d*[,]*\d+)'; % val6 (e.g. 6,225) [valid up to 1billion-1]
case 7
pat = '(?<val>\d*[,]*\d*[,]*\d+)'; % val7 (e.g. 1,336,605) [valid up to 1billion-1]
case 8
pat = '(?<val>\d+)'; % val8 (e.g. 37)
otherwise
error('bad linenum')
end
l = regexp(line,pat,'names'); % l is for line
if length(l) == 1 % match
if valnum == 1
serialtime = datenum(str2num(l.yr),str2num(l.mo),str2num(l.dy)); % convert to matlab serial date
val1 = [val1;serialtime];
else
this_val = strrep(l.val,',',''); % strip out comma and convert to number
eval(['val',num2str(valnum),' = [val',num2str(valnum),';',this_val,'];']) % save this value into appropriate array
end
else
warning(['line number ',num2str(linenum),' skipped! [didnt pass regexp]: ',line]);
end
end