Matlab .txt file analyses - matlab

I am analyzing a set of text in a .txt file. The file has 30 lines, and each line contains different phrases made up of text, numbers, and symbols.
What's the best way to import this file into MATLAB for analysis
(e.g. how many capital I's are in the text file, or how many #text phrases are in the file; each line is a tweet)?

I think you'd best read the file line-by-line and save each line in a cell of a cell array:
fid = fopen(filename);
txtlines = cell(0);
tline = fgetl(fid);
while ischar(tline)
txtlines{numel(txtlines)+1}=tline;
tline = fgetl(fid);
end
fclose(fid);
This way you can easily access each line with txtlines{ii}.
If you always need to perform operations on the complete text (i.e. how many a's are in the whole text file, and not per line), you can of course just throw the lines together in a single variable.
Executing an operation on each line can be done simply with cellfun, for example counting the number of capital 'I's:
capI_per_line = cellfun(@(str) numel(strfind(str,'I')),txtlines);
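The same cellfun pattern can be applied to the hashtag part of the question. The following is a minimal sketch, assuming a "#text phrase" means a '#' followed by one or more letters, digits, or underscores; the sample lines are made up and stand in for the txtlines cell array read from the file.

```matlab
% Hypothetical sample data standing in for txtlines read from the file
txtlines = {'I like #matlab and #regexp', 'no tags here', 'I said: I am #1'};

% Count tokens made of '#' followed by one or more word characters
% (this definition of a hashtag is an assumption, adjust as needed)
hashtags_per_line = cellfun(@(str) numel(regexp(str, '#\w+', 'match')), txtlines);

% Total over all lines
total_hashtags = sum(hashtags_per_line);
```

If a hashtag should be defined differently (for example, letters only), only the pattern passed to regexp needs to change.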

If the file is reasonably sized (most 30-line files are), I would read it all into memory at once.
fid = fopen('saturate.m');
str = fread(fid,inf,'*char')';
fclose(fid);
Then, depending on your needs, you can use basic matrix operations, string operations, or regexp-style analysis on the str variable.
For example, "how many capital 'I''s?" is:
numIs = sum(str=='I');
Or, "how many instances of 'someString'?" is:
numSomeString = length(strfind(str, 'someString'));
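If per-line results are needed after reading the whole file at once, the single string can be split on line endings. This is a rough sketch, assuming the file uses LF or CRLF line endings; the sample string is made up and stands in for the str variable returned by fread.

```matlab
% Hypothetical file contents; in practice str comes from the fread call
str = sprintf('I am here\nI was #there\n');

% Split on LF or CRLF line endings
txtlines = regexp(str, '\r?\n', 'split');

% A trailing newline leaves one empty piece at the end; drop empty pieces
txtlines = txtlines(~cellfun(@isempty, txtlines));

numLines = numel(txtlines);
capI_per_line = cellfun(@(s) sum(s == 'I'), txtlines);
```

Note that dropping empty pieces also removes blank lines inside the file, which may or may not be what you want.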

Related

How do I write a MATLAB code that edits another MATLAB file (.m)?

I have two files, Editor.m and Parameters.m. I want to write a code in Editor.m that when run does the following task:
reads Parameters.m
searches for a line in it (e.g. dt=1)
replaces it with something else (e.g. dt=0.6)
saves Parameters.m.
So, at the end of this process, Parameters.m will contain the line dt=0.6 instead of dt=1, without me having edited it directly.
Is there a way to do this? If so, how?
You can use regexprep to replace the value of interest.
% Read the file contents
fid = fopen('Parameters.m', 'r');
contents = fread(fid, '*char').';
fclose(fid);
% Replace the necessary values
contents = regexprep(contents, '(?<=dt=)\d*\.?\d+', '0.6');
% Save the new string back to the file
fid = fopen('Parameters.m', 'w');
fwrite(fid, contents);
fclose(fid);
If you can guarantee that it will only ever appear as 'dt=1' (and never, say, as 'dt=10'), then you can use strrep instead:
contents = strrep(contents, 'dt=1', 'dt=0.6');
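To illustrate why that guarantee matters, here is a small sketch that works on an in-memory string rather than on Parameters.m. The sample contents and the lookahead pattern are assumptions for illustration; the pattern only matches a '1' that is not followed by another digit or a decimal point.

```matlab
% Hypothetical file contents containing both dt=1 and dt=10
contents = sprintf('dt=1\ndt=10\n');

% strrep replaces every occurrence of the substring, including the one inside 'dt=10'
naive = strrep(contents, 'dt=1', 'dt=0.6');

% A lookbehind plus a negative lookahead restricts the match to exactly 'dt=1'
careful = regexprep(contents, '(?<=dt=)1(?![0-9.])', '0.6');
```

Here naive turns the second line into 'dt=0.60', while careful leaves 'dt=10' untouched.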

Selecting and printing specific lines from a non rectangular text file (In MATLAB)

Hopefully somebody can help me as I am about to finish a program but I am having trouble with the format that my text file has.
My text file is quite large. I paste here the first 9 lines (note that it is a non-rectangular text file which contains numeric and string data):
AFH,98.3,76.4,D,2,56.3,H
TYU,65.2,K,47,I
UJK,67.5,J
AFH,65.5,56.5,L,8,34.1,P
TYU,56.2,S,97,T
UJK,88.5,J
AFH,32.1,11.4,G,4,45.6,F
TYU,22.8,D,37,U
UJK,78.3,Z
The only data that I need from the entire text file are the lines that start with 'AFH'. I need to manage reading just these specific lines and write them in a new text file, so that I can use this last text file as input to run the rest of my program.
I can't find any way to be able to select just my 'AFH' lines to print them in a new text file.
The easiest way to do it is read the file line by line using fgetl and ignore the irrelevant lines.
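fid = fopen('input.txt'); % Open the file (the file name is a placeholder)
myText = {}; % Start empty; matching lines are appended below
tline = fgetl(fid); % Read first line
while ischar(tline) % Keep going until the end of the file
if strncmp(tline,'AFH',3) % Check if it starts with 'AFH' (also safe for lines shorter than 3 characters)
myText{end+1} = tline; % Save the line in the cell array
end
tline = fgetl(fid); % Read the next line
end
fclose(fid);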
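Since the question also asks for the selected lines to end up in a new text file, the filtering loop can write each match directly instead of collecting it. The sketch below is self-contained: it first creates a small temporary input file from the sample lines in the question, and the variable names and temporary file names are placeholders.

```matlab
% Create a small temporary input file (placeholder for the real data file)
inFile = [tempname '.txt'];
outFile = [tempname '.txt'];
fid = fopen(inFile, 'w');
fprintf(fid, '%s\n', 'AFH,98.3,76.4,D,2,56.3,H', 'TYU,65.2,K,47,I', ...
    'UJK,67.5,J', 'AFH,65.5,56.5,L,8,34.1,P');
fclose(fid);

% Copy only the lines that start with 'AFH' to the output file
fin = fopen(inFile, 'r');
fout = fopen(outFile, 'w');
tline = fgetl(fin);
while ischar(tline)
    if strncmp(tline, 'AFH', 3)
        fprintf(fout, '%s\n', tline);
    end
    tline = fgetl(fin);
end
fclose(fin);
fclose(fout);

% Read the result back to check it
result = fileread(outFile);
```

Writing inside the loop avoids holding the selected lines in memory, which may help given that the real file is described as quite large.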

Reading one column from a text file

What is the equivalent of fgetl and fgets in MATLAB for reading one column at a time (not a line) from a text file?
You cannot avoid reading the file. However, if your dataset is large, you can tell MATLAB to ignore the irrelevant parts while reading the file.
For instance, if your columns are space delimited, and you want to read the floating-point numbers in the first column, you can try the following:
fid = fopen('input.txt');
C = textscan(fid, '%f %*[^\n]');
C = C{:};
fclose(fid);
This still reads the entire file, but stores only the first column in memory.
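The same format string can be tried without a file, since textscan also accepts a character vector as its first argument. This is a small sketch with made-up, space-delimited sample data.

```matlab
% Hypothetical space-delimited data: one number followed by other columns
txt = sprintf('1.5 foo bar\n2.25 baz qux\n-3 x y\n');

% %f reads the first column; %*[^\n] skips everything up to the end of the line
C = textscan(txt, '%f %*[^\n]');
firstCol = C{1};
```

For other delimiters, the 'Delimiter' option of textscan would need to be set accordingly.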

Textscan generates a vector twice the expected size

I want to load a csv file in a matrix using matlab.
I used the following code:
formatSpec = ['%*f', repmat('%f',1,20)];
fid = fopen(filename);
X = textscan(fid, formatSpec, 'Delimiter', ',', 'CollectOutput', 1);
fclose(fid);
X = X{1};
The csv file has 1000 rows and 21 columns.
However, the matrix X generated has 2000 rows and 20 columns.
I tried using different delimiters like '\t' or '\n', but it doesn't change.
When I displayed X, I noticed that it displayed the correct csv file but with extra rows of zeros every 2 rows.
I also tried adding the 'HeaderLines' parameters:
`X = textscan(fid, formatSpec1, 'Delimiter', '\n', 'CollectOutput', 1, 'HeaderLines', 1);`
but this time, the result is an empty matrix.
Am I missing something?
EDIT: @horchler
I could read with no problem the 'test.csv' file.
There is no extra comma at the end of each row. I generated my csv file with a python script: I read the rows of another csv file, modified these (selecting some of them and doing arithmetic operations on them) and wrote the new rows on another csv file. In order to do this, I converted each element of the first csv file into floats...
New Edit:
Reading the textscan documentation more carefully, I think the problem is that my input file is neither a textfile nor a str, but a file containing floats
EDIT: three lines from the file
0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,2
1,-0.3834323,-1.92452324171,-1.2453254094,0.43455627857,-0.24571121,0.4340657,1,1,0,0,0,0.3517396202,1,0,0,0.3558122164,0.2936975319,0.4105696144,0,1,0
-0.78676,-1.09767,0.765554578,0.76579043,0.76,1,0,0,323124.235998,1,0,0,0,1,0,0,1,0,0,0,2
How about using regex?
X = [];
fid = fopen(filename);
while 1
fl = fgetl(fid);
if ~ischar(fl), break, end
r = regexp(fl,'-?\d+\.?\d*','match'); % cell array of number strings
r = str2double(r(1:21)); % convert to numbers; keep 21 because your 2nd line somehow has 22 elements
% all lines must have same # elements or an error will be thrown
% Error: CAT arguments dimensions are not consistent.
X = [X; r];
end
fclose(fid);
Using csvread to read a csv file seems a good option. However, I also tend to read csv files with textscan as files are sometimes badly written. Having more options to read them is therefore necessary.
I face a reading problem like yours when I think the file is written a certain way but it is actually written another way. To debug it I use fgetl and print, for each line read, both the output of fgetl and its double version (see the example below). Examining the double version, you may find which character causes a problem.
In your case, I would first look for multiple consecutive delimiters (',' and '\t'), and in textscan I would activate the option 'MultipleDelimsAsOne' (while turning off 'CollectOutput').
fid = fopen(filename);
tline = fgetl(fid);
while ischar(tline)
disp(tline);
double(tline)
pause;
tline = fgetl(fid);
end
fclose(fid);
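A complementary check is to count the fields on each line and flag the lines that differ from the rest, since a row with one value too many (like the 22-element second line quoted in the question) is enough to shift everything textscan reads afterwards. The sketch below uses shortened, made-up rows in place of lines read with fgetl.

```matlab
% Hypothetical rows standing in for lines read from the csv file
rows = {'0,1,0,2', '1,-0.38,-1.92,0.43,0', '-0.78,1,0,2'};

% Number of fields = number of commas + 1
nFields = cellfun(@(s) sum(s == ',') + 1, rows);

% Treat the most common field count as the expected one
expected = mode(nFields);

% Indices of rows whose field count differs
badRows = find(nFields ~= expected);
```

Using the most common count as the reference is an assumption; if the expected number of columns is known (21 in the question), compare against that instead.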

Open text files in matlab and save them from matlab

I have a big text file containing data that needs to be extracted and inserted into a new text file. I possibly need to store this data in a cell array or matrix?
But for now, I am trying to test a smaller dataset, to check whether the code below works.
I have code that opens a text file, scans through it, replicates the data, and saves it in another text file called "output.txt".
Problem: It doesn't seem to save the file properly. It just shows an empty array in the text file, such as this " [] ". The original text file just contains strings of characters.
%opens the text file and checks it line by line.
fid1 = fopen('sample.txt');
tline = fgetl(fid1);
while ischar(tline)
disp(tline);
tline = fgetl(fid1);
end
fclose(fid1);
% save the sample.txt file to a new text file
fid = fopen('output.txt', 'w');
fprintf(fid, '%s %s\n', fid1);
fclose(fid);
% view the contents of the file
type exp.txt
Where do I go from here?
It's not good practice to read an input file by loading all of its contents into memory at once. This way, the file size you're able to read is limited by the amount of memory on the machine (or by the amount of memory the OS is willing to allocate to a single process).
Instead, use fopen and its related functions to read the file line by line or char by char.
For example,
fid1 = fopen('sample.txt', 'r');
fid = fopen('output.txt', 'w');
tline = fgetl(fid1);
while ischar(tline)
fprintf(fid, '%s\n', tline);
tline = fgetl(fid1);
end
fclose(fid1);
fclose(fid);
type output.txt
Of course, if you know in advance that the input file is never going to be large, you can read it all at once using textread or some equivalent function.
Try using textread; it reads data from a text file and stores it as a matrix or a cell array. At the end of the day, I assume you would want the data to be stored in a variable so you can manipulate it as required. Once you are done manipulating, open a file using fopen and use fprintf to write the data in the format you want.
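As one possible variant of the read, manipulate, write workflow described above, the sketch below uses fileread in place of textread (the MathWorks documentation marks textread as not recommended). The temporary file names and the upper-casing step are placeholders for the real files and the real manipulation.

```matlab
% Create a small temporary input file (placeholder for sample.txt)
inFile = [tempname '.txt'];
outFile = [tempname '.txt'];
fid = fopen(inFile, 'w');
fprintf(fid, 'first line\nsecond line\n');
fclose(fid);

% Read the whole file into one character vector
txt = fileread(inFile);

% Placeholder manipulation: convert to upper case
txt = upper(txt);

% Write the result to the output file
fid = fopen(outFile, 'w');
fprintf(fid, '%s', txt);
fclose(fid);

% Read it back to check
copied = fileread(outFile);
```

This only suits files that comfortably fit in memory; for large files the line-by-line loop shown earlier is the safer choice.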