octave large data set, doesn't disp() or write properly - matlab

I have a huge data set that I'm caching, then writing filtered analysis data to disk. I have various disp() commands in my code, along with fprintf() calls.
I'd like to see the results both in the file, and on the screen while the processes are running, but what I'm finding is that I get nothing until I terminate the program, at which point all the data is written into my file and the disp() floods the terminal.
Would there be a way to force disp() and the fprintf() to execute as they're being processed??
Here's an example:
function one(varargin)
setenv GNUTERM 'x11';
dirname = strcat(pwd, '/fileset');
files = dir(dirname);
disp('reading directory'), disp(dirname);
fileidx = find(~[files.isdir]);
out = fopen('write_data.txt', 'w');
fprintf(out, '"--- var a[0]", "--- var [1]";\n');
numfiles = length(fileidx);
for i = 1:numfiles
dispstring = sprintf('processing file %d of %d...', i, numfiles);
disp(dispstring);
filename = [dirname, '/', files(fileidx(i)).name];
disp(filename);
fid = fopen(filename, 'r');
%some processing here to obtain timevalues and maxvars
for i = 1:length(timevalues)
fprintf(out, '%d, %d;\n', timevalues(i), maxvars(i));
end
end
fclose(out);
end
I saw this post, but I wasn't sure which of the methods suggested applied to me. It also seemed like fflush() was meant for pushing data into a plot at higher priority.

I have had this problem before, and you can use fflush to solve it. Write
fflush(stdout);
to force the terminal to be updated with the results of all the prints and disps to stdout that came before the call to fflush(stdout). I'm not sure if you should bother flushing the output to the file as it will probably make your code slower, but if you want to you can do
fflush(out);
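As a minimal sketch of where those calls would sit inside a processing loop (Octave only, since fflush and stdout are Octave functions; outPath, numItems and the squared values are made up for the demo):

```matlab
% Octave sketch: flush after each progress message so it appears immediately.
outPath = [tempname() '.txt'];      % hypothetical output file for the demo
out = fopen(outPath, 'w');
numItems = 3;
for k = 1:numItems
    disp(sprintf('processing item %d of %d...', k, numItems));
    fflush(stdout);                 % push pending terminal output now
    fprintf(out, '%d, %d;\n', k, k^2);
    fflush(out);                    % optional: push file data to disk as well
end
fclose(out);
contents = fileread(outPath);       % read back to confirm what was written
delete(outPath);
```

If messages still arrive late in Octave, the output pager may be holding them back; running more off before the loop disables it.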

Related

MATLAB : read data file in real time

I have an application, A, that writes to a File.
I want to use MATLAB to read N lines in realtime from this file.
My question is related to this stack post: How to plot real time data from text file in MATLAB
The author of one of the answers, mentions the following approach:
You can't plot using hard real time conditions, thus it can always happen that matlab misses a 10ms timeslot. You have to use option 2 to get all data.
To get started: Write a function which only reads the new data which was written since last call. To achieve this, do not close the file handle. It stores the position.
As such, here is my code:
myfile_fid=fopen(filePath, 'rt')
waitForFileToHaveData(filePath, 10);
for readingIdx = 1:10
fgetl(myfile_fid)
end
My waitForFileToHaveData function, is defined as follows:
function waitForFileToHaveData(filePath, desired_length)
if (getNumLinesOfFile(filePath) < desired_length)
disp('###### not enough data in the file');
pause(0.02);
waitForFileToHaveData(filePath, desired_length);
end
end
function num = getNumLinesOfFile(file_to_read)
[status, result] = system( ['wc -l ', file_to_read] );
if(status~=1)
scanCell = textscan(result,'%u %s');
num = scanCell{1} - 2;
else
num = 0;
end
end
Result:
When I get into the for loop, myfile_fid evaluates to 3, while fgetl(myfile_fid) evaluates to -1. If I print out the result of getNumLinesOfFile(filePath), I see 20. The odd part is that if I wait, say for the file to have 40 lines, and execute the code above, I do not get the error. I tried to look at the documentation to see why fgetl returns -1, but I cannot seem to find it in the 2018b MATLAB documentation. There is mention that myfile_fid can be -1, but that is only if the file cannot be opened. However, at runtime, myfile_fid evaluates to 3.
Using MATLAB, is it possible to read N number of lines since last read in a file that is being written to by another application?
fgetl returns -1 when fileID reaches the end-of-file marker; see the MATLAB fgetl documentation. This means that if the first result from fgetl is -1, the file is empty.
I'm not sure why you are getting -1 if getNumLinesOfFile returns 20. Check carefully that both calls are reading the same file. Maybe the file has changed?
Here is MATLAB code that waits until 10 new lines have been added and then reads them with fgetl:
myfile_fid = fopen(filePath, 'rt');
newLines = 10;
linesRead = 0;
while(waitForFileToHaveData(filePath, linesRead + newLines))
linesRead = linesRead + newLines;
for readingIdx = 1:newLines
line = fgetl(myfile_fid)
end
end
fclose(myfile_fid);
I updated the waitForFileToHaveData function to return 1:
function ready = waitForFileToHaveData(filePath, desired_length)
while (getNumLinesOfFile(filePath) < desired_length)
disp('###### not enough data in the file');
pause(0.02);
end
ready = 1;
end
Note:
If the file had exactly 10 lines with no newline after line 10, and you read them, and then another 10 lines were added, the file position now sits just before the newline that ends line 10. The first fgetl call therefore reads only that line terminator, and since fgetl strips the terminator, it returns an empty array.
Side note:
The original waitForFileToHaveData function uses recursion, which is inefficient. You can easily use a while loop instead.
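An alternative to counting lines with wc is to keep the handle open and simply read until fgetl stops returning text. The sketch below simulates the writer by appending to a temporary file itself; filePath, got and the two-pass loop are made up for the demo, and the no-op fseek is a precaution to reset any end-of-file state before reading again.

```matlab
% Hypothetical demo file; in real use another process would be writing it.
filePath = [tempname() '.txt'];
w = fopen(filePath, 'w');  fprintf(w, 'a\nb\n');  fclose(w);

fid = fopen(filePath, 'rt');       % keep this handle open between reads
got = {};
for pass = 1:2
    fseek(fid, 0, 'cof');          % no-op seek: reset end-of-file state (precaution)
    tline = fgetl(fid);
    while ischar(tline)            % fgetl returns numeric -1 at end of file
        got{end+1} = tline;        % collect each newly available line
        tline = fgetl(fid);
    end
    if pass == 1                   % simulate the writer appending more data
        w = fopen(filePath, 'a');  fprintf(w, 'c\n');  fclose(w);
    end
end
fclose(fid);
delete(filePath);
```

In a live setting the inner loop would sit inside a polling loop with pause. Note that a line the writer has only partly written could be returned incomplete, so this is a starting point rather than a robust reader.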

How can I run a MATLAB script on .csv files in two separate folders at the same time?

So I have an iterative loop that extracts data from .csv files in MATLAB's active folder and plots it. I would like to take it one step further and run the script on two folders, each with their own .csv files.
One folder is called stress and the other strain. As the name implies, they contain .csv files for stress and strain data for several samples, each of which is called E3-01, E3-02, E3-03, etc. In other words, both folders have the same number of files and the same names.
The way I see it, the process would have the following steps:
1. Look in the stress folder, look inside file E3-01, extract the data in the column labelled Stress
2. Look in the strain folder, look inside file E3-01, extract the data in the column labelled Strain
3. Combine the data together for sample E3-01 and plot it
4. Repeat steps 1-3 for all files in the folders
Like I said, I already have a script that can find the right column and extract the data. What I'm not sure about is how to tell MATLAB to alternate the folder that the script is being run on.
Instead of a script, would a function be better? Something that accepts 4 inputs: the names of the two folders and the columns to extract?
EDIT: Apologies, here's the code I have so far:
clearvars;
files = dir('*.csv');
prompt = {'Plot name:','x label:','y label:','x values:','y values:','Points to eliminate:'};
dlg_title = 'Input';
num_lines = 1;
defaultans = {'Title','x label','y label','Surface component 1.avg(epsY) [True strain]','Stress','0'};
answer = inputdlg(prompt,dlg_title,num_lines,defaultans);
name_plot = answer{1};
x_label = answer{2};
y_label = answer{3};
x_col = answer{4};
y_col = answer{5};
des_cols = {y_col,x_col};
smallest_n = 100000;
points_elim = answer{6};
avg_x_values = [];
avg_y_values = [];
for file = files'
M=xlsread(file.name);
[row,col]=size(M);
if smallest_n > row
smallest_n = row;
end
end
smallest_n=smallest_n-points_elim;
avg_x_values = zeros(smallest_n,size(files,1));
avg_y_values = zeros(smallest_n,size(files,1));
hold on;
set(groot, 'DefaultLegendInterpreter', 'none');
set(gca,'FontSize',20);
ii = 0;
for file = files'
ii = ii + 1;
[n,s,r] = xlsread(file.name);
colhdrs = s(1,:);
[row, col] = find(strcmpi(s,x_col));
x_values = n(1:end-points_elim,col);
[row, col] = find(strcmpi(s,y_col));
y_values = n(1:end-points_elim,col);
plot(x_values,y_values,'DisplayName',s{1,1});
avg_x_values(:,ii)=x_values(1:smallest_n);
avg_y_values(:,ii)=y_values(1:smallest_n);
end
ylabel({y_label});
xlabel({x_label});
title({name_plot});
colormap(gray);
hold off;
avg_x_values = mean(avg_x_values,2);
avg_y_values = mean(avg_y_values,2);
plot(avg_x_values,avg_y_values);
set(gca,'FontSize',20);
ylabel({y_label});
xlabel({x_label});
title({name_plot});
EDIT 2: @Adriaan I tried to write the following function to get a column from a file:
function [out_col] = getcolumn(col,file)
file = dir(file);
[n,s,r] = xlsread(file.name);
colhdrs = s(1,:);
[row, col] = find(strcmpi(s,col));
out_col = n(1:end,col);
end
but I get the error
Function 'subsindex' is not defined for values of class 'struct'.
Error in getcolumn (line 21)
y = x(:,n);
not sure why.
You can do both, of course, and it depends on preference mainly, provided you're the sole user of the script. If others are going to use it as well, use functions instead, as they can contain a proper help file and calling help functionname will then give you that help.
For instance:
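folders1 = dir('../strain/*.csv');   % quoted path; list only the .csv files
folders2 = dir('../stress/*.csv');
for ii = 1:numel(folders1)
% dir returns a struct array, so index with () and read the name field
operand1 = fullfile('../strain', folders1(ii).name);
operand2 = fullfile('../stress', folders2(ii).name);
%... rest of script
%
% Or function:
data = YourFunction(operand1, operand2)
end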
So all in all you can use both, although from experience I find functions easier to use in the end, as you just pass parameters and don't need to trawl through the complete code to change the parameters each run.
Additionally, you can partition off small parts of your program which each do a fixed task. If you nest your functions, and finally call just a single function in your scripts, you don't have to look at hundreds of lines of code each time you run the script, but rather can just run a single function (which can also be inside a script or function, ad infinitum).
Finally, a function has its own scope, meaning that any variables that are in that function stay within that function unless you explicitly set them as output (apart from global variables, but those are problematic anyway). This can be a good thing or a bad thing, depending on the rest of your code. If your function would output ~20 variables for further processing, the function probably should include more steps. It'd be a good thing if you create lots of intermediate variables (I always do), because when the function has finished running, the scope of that function will be removed from memory, saving you clear tmpVar1 tmpVar2 tmpVar3 etc. every few lines in your script.
For the script the argument in favour would be that it is easier to debug; you don't need dbstop on error and can step a bit easier through the script, keeping check of all your variables. But, after the debugging has been completed, this argument becomes moot, and thus in general I'd start with writing a script, and once it performs as desired, I rework it to a function at minimal extra effort.
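For the two-folder case in the question, pairing files by name could look like the sketch below. It builds its own temporary stress and strain folders with empty placeholder files so it runs anywhere; root, names and pairs are made up for the demo, and the actual column extraction is left as a comment.

```matlab
% Hypothetical demo folders standing in for the question's stress/strain.
root = tempname();
stressDir = fullfile(root, 'stress');
strainDir = fullfile(root, 'strain');
mkdir(root);  mkdir(stressDir);  mkdir(strainDir);
names = {'E3-01.csv', 'E3-02.csv'};
for k = 1:numel(names)             % create empty placeholder files
    fclose(fopen(fullfile(stressDir, names{k}), 'w'));
    fclose(fopen(fullfile(strainDir, names{k}), 'w'));
end

stressFiles = dir(fullfile(stressDir, '*.csv'));
pairs = cell(numel(stressFiles), 2);
for k = 1:numel(stressFiles)
    name = stressFiles(k).name;
    pairs{k, 1} = fullfile(stressDir, name);
    pairs{k, 2} = fullfile(strainDir, name);   % same file name, other folder
    if ~exist(pairs{k, 2}, 'file')
        warning('No strain file matches %s', name);
    end
    % column extraction and plotting for this sample would go here
end
allFound = all(cellfun(@(p) exist(p, 'file') == 2, pairs(:)));

% clean up the demo folders
delete(fullfile(stressDir, '*.csv'));  delete(fullfile(strainDir, '*.csv'));
rmdir(stressDir);  rmdir(strainDir);  rmdir(root);
```

Looking the second file up by name, rather than by position in a second dir listing, keeps the pairs aligned even if one folder has an extra file.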

Read data from text file and passing to function

I have a program which takes in all its data from a .txt file. Is it possible to read the required data from the text file and pass that data to a function to work with? I have tried reading the data first and passing it to the function, but then my plot refuses to work.
Right now I am doing it by sending the name of the text file to the function and then reading the data, but this means that I am reading the data each time I call the function. I was hoping that I could read the data just once and then pass it on to the function. I think that not reading the data many times would speed up my program considerably.
My code looks like this
main.m
young bein_AB_light.txt %%calling the function with bein_AB_light.txt as parameter.
young.m
function young(filename)
fid = fopen(filename,'r');
C = textscan(fid,'%*f%*f%*f%*f%f');
fclose(fid);
Y=10500*C{1}.^2.29; %
plot(C{1},Y,'.K')
if(strfind(filename,'AB'))
xlabel('BMD[g/cm^3]');
ylabel('Youngstudull');
title('Reiknadur Youngstudull fyrir AB bein')
else
xlabel('BMD[g/cm^3]');
ylabel('Youngstudull');
title('Reiknadur Youngstudull fyrir SCI bein')
end
end
EDIT...
This is what I was trying, but it gives me an error when it tries to plot. plot does not accept filename{1} as the X coordinates. I have also tried the cell2mat function to change the input, but that did not work.
main.m
fid = fopen(filename,'r');
AB_Bein = textscan(fid,'%*f%*f%*f%*f%f');
fclose(fid);
young AB_bein %%calling the function with AB_Bein as parameter.
young.m
function young(filename)
Y=10500*filename{1}.^2.29; %
plot(filename{1},Y,'.K')
if(strfind(filename,'AB'))
xlabel('BMD[g/cm^3]');
ylabel('Youngstudull');
title('Reiknadur Youngstudull fyrir AB bein')
else
xlabel('BMD[g/cm^3]');
ylabel('Youngstudull');
title('Reiknadur Youngstudull fyrir SCI bein')
end
end
It's possible that your problem is the way you are calling young.
If I create a function
function fileContents= young(filename)
fid = fopen(filename,'r');
C = textscan(fid,'%*f%*f%*f%*f%f');
fclose(fid);
fileContents=C{1};
and then call it using
fileContents= young('textfile.txt');
rather than
young textfile.txt
then the data from the file is returned in the variable named fileContents.
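The reason the call style matters: command syntax such as young AB_Bein is equivalent to young('AB_Bein'), so the function receives the text 'AB_Bein' rather than the variable. A rough sketch of reading once and passing the numbers follows; the demo file, bmd and the anonymous function youngModulus are stand-ins invented for illustration.

```matlab
% Hypothetical demo file with five numeric columns; the fifth plays the role of BMD.
fname = [tempname() '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1 2 3 4 0.5\n1 2 3 4 1.0\n');
fclose(fid);

% Read the file once, keeping only the fifth column.
fid = fopen(fname, 'r');
C = textscan(fid, '%*f%*f%*f%*f%f');
fclose(fid);
delete(fname);
bmd = C{1};                         % plain numeric column vector

% Stand-in for young(): it now takes data, not a file name.
youngModulus = @(x) 10500 * x.^2.29;
Y = youngModulus(bmd);              % function syntax passes the variable itself
% plot(bmd, Y, '.k') would then draw the points as in the question
```

Because the function no longer sees a file name, the AB/SCI title choice would need its own argument, for example a label string passed alongside the data.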

for loop+structure allocation in matlab

This is a problem I am working on in Matlab.
I am looping through a series of *.ALL files and stripping the field name by a delimiter '.'. The first field is the network, second station name, third location code and fourth component. I pre-allocate my structure based on the number of files (3) I run through which for this example is a 3x3x3 structure that I would like to define as struct(station,loc code,component). You can see these definitions in my code example below.
I would like to loop through the station, loc code, and component and fill their values in the structure. The only problem is for some reason the way I've defined the loop it's actually looping through the files more than once. I only want to loop through each file once and extract the station, comp, and loc code from it and put it inside the structure. Because it's looping through the files more than once it's taken like 10 minutes to fill the structure. This is not very efficient at all. Can someone point out the culprit line for me or tell me what I'm doing incorrectly?
Here's my code below:
close all;
clear;
[files]=dir('*.ALL');
for i = 1:length(files)
fields=textscan(files(i).name, '%s', 'Delimiter', '.');
net{i,1}=fields{1}{:};
sta{i,1}=fields{1}{2};
loc{i,1}=fields{1}{3};
comp{i,1}=fields{1}{4};
data = [];
halfhour(1:2) = struct('data',data);
hour(1:24) = struct('halfhour',halfhour);
day(1:366) = struct('hour',hour);
PSD_YE_DBT(1:length(files),1:length(files),1:length(files)) = ...
struct('sta','','loc','','comp','','allData',[],'day',day);
end
for s=1:1:length(sta)
for l=1:1:length(loc)
for c=1:1:length(comp)
tempFileName = strcat(net{s},'.',sta{s},'.',loc{l},'.',comp{c},'.','ALL');
fid = fopen(tempFileName);
PSD_YE_DBT(s,l,c).sta = sta{s};
PSD_YE_DBT(s,l,c).loc = loc{l};
PSD_YE_DBT(s,l,c).comp = comp{c};
end
end
end
Example file names for the three files I'm working with are:
XO.GRUT.--.HHE.ALL
XO.GRUT.--.HHN.ALL
XO.GRUT.--.HHZ.ALL
Thanks in advance!
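One possible restructuring, offered as a sketch rather than a tested fix: the three nested loops in the code above visit every station/location/component combination (27 for three files) instead of each file once, and the preallocation sits inside the first loop, so it is redone for every file. Parsing each name once into a flat struct array avoids both. The info struct and its layout are assumptions; names is hard-coded from the example file names and would come from dir('*.ALL') in practice.

```matlab
% Example names from the question; in practice: files = dir('*.ALL'); names = {files.name};
names = {'XO.GRUT.--.HHE.ALL', 'XO.GRUT.--.HHN.ALL', 'XO.GRUT.--.HHZ.ALL'};

% One entry per file, so each file is visited exactly once.
info = struct('net', {}, 'sta', {}, 'loc', {}, 'comp', {});
for k = 1:numel(names)
    parts = strsplit(names{k}, '.');   % {'XO','GRUT','--','HHE','ALL'}
    info(k).net  = parts{1};
    info(k).sta  = parts{2};
    info(k).loc  = parts{3};
    info(k).comp = parts{4};
end
```

Any per-file data (such as the day/hour/halfhour hierarchy) could then be attached to info(k) as each file is read.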

Matlab: running all functions in a given directory in a function/script

I'm very new to Matlab and I'm looking for some advice from someone who is more experienced.
I want to write a function that will loop through a given directory and run all matlab functions in that directory. What is the best/most robust way to do this? I've provided my implementation below but, I'm worried because most of my matlab experience thus far tells me that for every function I implement, there is usually an equivalent matlab built-in or at least a better/faster/safer way to achieve the same ends.
I'd be happy to provide any other necessary info. Thanks!
function [results] = runAllFiles(T)
files = dir('mydir/');
% get all file names in mydir
funFiles = files(arrayfun(@(f) isMatFun(f), files));
% prune the above list to get a list of files in dir where isMatFun(f) == true
funNames = arrayfun(@(f) {stripDotM(f)}, funFiles);
% strip the '.m' suffix from all the file names
results = cellfun(@(f) {executeStrAsFun(char(f), T)}, funNames);
% run the files as functions and combine the results in a matrix
end
function [results] = executeStrAsFun(fname, args)
try
fun = str2func(fname); % convert string to a function
results = fun(args); % run the function
catch err
fprintf('Function: %s\n', err.stack(1).name);
fprintf('Line: %d\n', err.stack(1).line);
fprintf('Message: %s\n', err.message);
results = ['ERROR: Couldn''t run function: ' fname];
end
end
Well, for looking up all the .m-files in a directory, you can make use of files = what('mydir/'); and then consult files.m to get all .m-files (including their extension). At first sight, I would use eval to evaluate each function, but on the other hand: your solution of using str2func looks even better.
So I guess you could do the following:
function [results] = runAllFiles(T)
files = what('mydir/');
mFiles = arrayfun(@(f) {stripDotM(f)}, files.m);
% strip the '.m' suffix from all the file names
results = cellfun(@(f) {executeStrAsFun(char(f), T)}, mFiles);
% run the files as functions and combine the results in a matrix
end
function [results] = executeStrAsFun(fname, args)
try
fun = str2func(fname); % convert string to a function
results = fun(args); % run the function
catch err
fprintf('Function: %s\n', err.stack(1).name);
fprintf('Line: %d\n', err.stack(1).line);
fprintf('Message: %s\n', err.message);
results = ['ERROR: Couldn''t run function: ' fname];
end
end
A problem I foresee is when you have both functions and scripts in your directory, but I know of no (built-in) way to verify whether an .m-file is a function or a script. You could always check the contents of the file, but that might get somewhat complicated.
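Checking the contents could be as simple as the heuristic sketched below: find the first line that is neither blank nor a comment and see whether its first word is function. This is only an approximation. It does not handle block comments, classdef files, Octave's # comments, or a declaration written without a space after the keyword. The demo file and the variable isFunctionFile are made up for illustration.

```matlab
% Hypothetical demo file: a comment, a blank line, then a function declaration.
mfile = [tempname() '.m'];
fid = fopen(mfile, 'w');
fprintf(fid, '%% a comment\n\nfunction y = demo(x)\ny = x;\nend\n');
fclose(fid);

isFunctionFile = false;
fid = fopen(mfile, 'r');
tline = fgetl(fid);
while ischar(tline)                        % stop at end of file
    stripped = strtrim(tline);
    if ~isempty(stripped) && stripped(1) ~= '%'
        % first real line of code: function file if it starts with the keyword
        isFunctionFile = strcmp(strtok(stripped), 'function');
        break;
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(mfile);
```

Files for which isFunctionFile stays false could then be skipped before str2func is called.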