I need to concatenate multiple files' data into one matrix.
So far, I have been testing how to load my data with something like the following:
fid = fopen('data01.txt', 'r');
raw = textscan(fid, '%d/%d/%d %d:%d:%f %f %f %f %d', 'delimiter', ',');
m = cellfun(@double, raw, 'UniformOutput', false);
value_of_interest = m{:,10}
...But my data set on disk consists of many files, all in a single directory. I'd prefer to refer to that directory by its path rather than placing my script in it. How can I modify my script so that it loads the data from all of the files in that folder?
So far I have this:
dirname = uigetdir;
files = dir(dirname);
fileIndex = find(~[files.isdir]);
for i = 1:length(fileIndex)
    fileName = files(fileIndex(i)).name;
    fid = fopen(fileName, 'r');
    raw = textscan(fid, '%d/%d/%d %d:%d:%f %f %f %f %d', 'delimiter', ',');
    fclose(fid);
    m = cellfun(@double, raw, 'UniformOutput', false);
    time = [m{:,4}, m{:,5}, m{:,6}]; %needs to contain a float
    converted_time = ((m{:,4} * 3600.0) + (m{:,5} * 60.0) + m{:,6}); %hh:mm:ss -> seconds
    values = power(m{:,10}, 2);
    values(values <= thresh) = 0;
    % need to concat into the var 'values' here... also need to accumulate the time variable
end
plot(converted_time, values);
...But I need to put the two together.
EDIT: I should mention that I may run out of memory, as explained in the comments on my accepted answer.
First, look again at how you define fileName. The name field returned by dir contains only the file name, not the full path, so build the path with fileName = fullfile(dirname, files(fileIndex(i)).name); which is equivalent to [dirname, filesep, files(fileIndex(i)).name] and, unlike a hard-coded '\', works on every platform. This solves the problem of referencing files that are not in your current folder.
Now, to avoid holding the data from all those files in memory at once, you can plot each file's data inside the loop:
...
plot(converted_time, values);
hold('on');
end
The command hold('on'), often written simply as hold on, modifies the plot axes so that subsequent data can be plotted without erasing any previous lines.
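If memory does allow keeping everything (the original goal of concatenating all files into one matrix), a minimal sketch of the accumulating loop follows. It assumes every file matches the question's textscan format; the demo folder, the two generated demo files, and the value of thresh are hypothetical stand-ins (in real use, set dirname = uigetdir and delete the demo part).

```matlab
% Sketch: concatenate time and values from every file in one folder.
% DEMO ONLY: build a temporary folder with two small files in the
% question's format. Replace this part with: dirname = uigetdir;
dirname = tempname;
mkdir(dirname);
demo = {'2013/01/15 12:30:45.5,1.0,2.0,3.0,4', ...
        '2013/01/15 12:30:46.5,1.0,2.0,3.0,1'};
for k = 1:numel(demo)
    fidw = fopen(fullfile(dirname, sprintf('data%02d.txt', k)), 'w');
    fprintf(fidw, '%s\n', demo{k});
    fclose(fidw);
end

thresh = 5;                        % hypothetical threshold value
files = dir(dirname);
files = files(~[files.isdir]);     % drop '.', '..' and subfolders

all_time = [];
all_values = [];
for i = 1:numel(files)
    % dir returns bare names, so prepend the folder with fullfile
    fid = fopen(fullfile(dirname, files(i).name), 'r');
    raw = textscan(fid, '%d/%d/%d %d:%d:%f %f %f %f %d', 'delimiter', ',');
    fclose(fid);
    m = cellfun(@double, raw, 'UniformOutput', false);
    converted_time = m{4} * 3600 + m{5} * 60 + m{6};   % hh:mm:ss -> seconds
    values = m{10} .^ 2;
    values(values <= thresh) = 0;
    % append this file's column vectors to the running totals
    all_time = [all_time; converted_time];     %#ok<AGROW>
    all_values = [all_values; values];         %#ok<AGROW>
end
plot(all_time, all_values);
```

Growing arrays inside the loop is simple but slow for very many files; if the total row count is known, preallocating is the usual alternative.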
Related
I have been building a database that contains images, their preset values, and other important parameters. Unfortunately, I can't save the initial data for, say, 10 images in one .csv file. My code creates a .csv file without errors, but it saves only the last value and overwrites all the previous ones. I also tried a modified version (commented out in the code below) using sprintf, but it creates a separate .csv file for every iteration. I want one .csv file with 7 columns containing all the respective values.
My code is below, and its output is attached (Output).
Could someone show me how to save, for instance, 10 values (which could grow to hundreds in the final database) in a single .csv file?
clc
clear all
myFolder = 'C:\Users\USER\Desktop\PixROIDirectory\PixelLabelData_1';
filePattern = fullfile(myFolder, '*.png'); % Change to whatever pattern you need
theFiles = dir(filePattern);
load('gTruthPIXDATA.mat','gTruth')
gTruth.LabelDefinitions;
for i=1:10
    %gTruth.LabelData{i,1};
    baseFileName = theFiles(i).name;
    fullFileName = fullfile(myFolder, baseFileName);
    fprintf(1, 'Now reading %s\n', fullFileName);
    imageArray = imread(fullFileName);
    oUt = regionprops(imageArray,'BoundingBox');
    Y = floor(oUt.BoundingBox);
    X_axis = Y(1);
    Y_axis = Y(2);
    Width = Y(3);
    Height = Y(4);
    CLASS = gTruth.LabelDefinitions{1,1};
    JPG = gTruth.DataSource.Source{i,1};
    PNG = gTruth.LabelData{i,1};
    OUTPUT = [JPG X_axis Y_axis Width Height CLASS PNG]
    % myFile = sprintf('value%d.csv',i);
    % csvwrite(myFile,OUTPUT);
end
Try fprintf (https://www.mathworks.com/help/matlab/ref/fprintf.html).
Open your output file for writing once, before the loop; then you can append a line to it on each iteration and close it after the loop.
Simple example:
A = [1:10]; % made up a matrix of numbers
fid = fopen('test.csv','w'); % open a blank csv and set as writable
for i = 1:length(A) % loop through the matrix
fprintf(fid,'%i\n',A(i)); % print each integer, then a line break \n
end
fclose(fid); % close the file for writing
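Applied to the 7 columns in the question, the same pattern might look like the sketch below. Because gTruth is not available here, the file names, bounding boxes, and class name are hypothetical stand-ins for JPG, X_axis, Y_axis, Width, Height, CLASS and PNG. Each column gets its own conversion (%s for text, %d for the integer box values), which also sidesteps the question's OUTPUT = [JPG X_axis ...] line: when JPG is a character vector, that concatenation turns the numbers into characters.

```matlab
% Sketch: one CSV file, one row per image. All values are hypothetical.
jpgNames  = {'img01.jpg', 'img02.jpg'};   % stand-in for JPG
pngNames  = {'lbl01.png', 'lbl02.png'};   % stand-in for PNG
boxes     = [10 20 30 40; 5 6 7 8];       % [X_axis Y_axis Width Height] per image
className = 'car';                        % stand-in for CLASS

fid = fopen('output.csv', 'w');           % open once, before the loop
fprintf(fid, 'JPG,X,Y,Width,Height,Class,PNG\n');   % header row
for i = 1:numel(jpgNames)
    % one line per iteration, appended to the same open file
    fprintf(fid, '%s,%d,%d,%d,%d,%s,%s\n', jpgNames{i}, ...
        boxes(i,1), boxes(i,2), boxes(i,3), boxes(i,4), ...
        className, pngNames{i});
end
fclose(fid);                              % close once, after the loop
```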
I have (very large) comma-separated files compressed in bz2 format. If I uncompress them and read them with
fileID = fopen('file.dat');
X = textscan(fileID,'%d %d64 %s %f %d %f %f %d', 'delimiter', ',');
fclose(fileID);
everything is fine. But I would like to read them without uncompressing them, something like
fileID = fopen('file.bz2');
X = textscan(fileID,'%d %d64 %s %f %d %f %f %d', 'delimiter', ',');
fclose(fileID);
which, unfortunately, returns an empty X. Any suggestions? Do I unavoidably have to uncompress them via the system(' ... ') command?
You could try to use the form of textscan that takes a string instead of a stream. Using the Matlab Java integration, you can leverage Java chained streams to decompress on the fly and read single lines, which can then be parsed:
% Build a stream chain that reads, decompresses and decodes the file into lines
fileStr = javaObject('java.io.FileInputStream', 'file.dat.gz');
inflatedStr = javaObject('java.util.zip.GZIPInputStream', fileStr);
charStr = javaObject('java.io.InputStreamReader', inflatedStr);
lines = javaObject('java.io.BufferedReader', charStr);
% If you know the size in advance you can preallocate the arrays instead
% of just stating the types, so that appending keeps each column's type
X = { int32([]), int64([]), {}, [], int32([]), [], [], int32([]) };
curL = lines.readLine();
while ischar(curL) % on EOF, readLine returns null, which becomes [] (type double)
% Parse a single line from the file
curX = textscan(curL,'%d %d64 %s %f %d %f %f %d', 'delimiter', ',');
% Append new line results
for iCol=1:length(X)
X{iCol}(end+1) = curX{iCol};
end
curL = lines.readLine();
end
lines.close(); % Don't forget this or the file will remain open!
I'm not exactly vouching for the performance of this method, with all the array appending going on, but at least that is how you can read a GZ file on the fly in Matlab/Octave. Also:
If you have a Java stream class that decompresses another format (try e.g. Apache Commons Compress), you can read it the same way. You could read bzip2 or xz files.
There are also classes to access archives, like zip files in the base Java distribution, or tar/RAR/7z and more in Apache Commons Compress. These classes usually have some way of finding files stored within the archive, allowing you to open an input stream to them within the archive and read in the same way as above.
On a Unix system I would use named pipes and do something like this:
system('mkfifo pipename');
system('bzcat file.bz2 > pipename &');   % decompress into the pipe in the background
fileID = fopen('pipename', 'r');
X = textscan(fileID,'%d %d64 %s %f %d %f %f %d', 'delimiter', ',');
fclose(fileID);
system('rm pipename');                   % delete the named pipe
I have boatloads of tab delimited textfiles that contain numerical data in 1000x2 format.
They're named file00001.txt - file10000.txt
I would like to write a script to load each of these files and make a variable containing ONLY the 400th row of the 2nd column of each of these files.
After that I'm going to try and plot a graph with the data I collected - but that's not important here.
I would be very grateful for your help.
Edit -
My most recent endeavour is:
numfiles = 10;
mydata = cell(1, numfiles);
for k = 1:numfiles
myfilename = sprintf('DM0000%d.txt', k);
mydata{k} = importdata(myfilename);
end
I'm running into a few problems -
1) If numfiles is greater than 9, the 10th entry in the mydata variable comes up as []. This may have something to do with how my files are named. They're named in this fashion:
DM00000 ...DM00009, DM00010, DM00011, etc.
2) Also, this is pretty slow to load. Someone suggested using fopen; if so, where should I put it, and how?
I'm guessing it'd be something along the lines of fopen('filename', 'r')?
Based on your edit, this is what I'd recommend:
numfiles = 10;
row = 400;
column = 2;
data = zeros(1, numfiles);
for k = 1:numfiles
    filename = sprintf('DM%05d.txt', k);
    fid = fopen(filename,'r');
    tempdata = textscan(fid, '%f%f');
    fclose(fid);
    data(k) = tempdata{column}(row);
end
I've updated the format specification in sprintf to create the filenames correctly: you were missing the zero padding. %05d pads k to five digits, so sprintf('DM%05d.txt', 10) gives 'DM00010.txt' rather than 'DM000010.txt'. I'm using textscan to import the data as doubles (change the %f to something else if required; see the formatSpec documentation). I also changed data to be a matrix rather than a cell array. You mentioned that you want to plot the data, which is easier with a matrix, and I couldn't see any need for a cell array here.
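On the speed question: textscan also accepts a repeat count N as its third argument, so it can stop after the first 400 rows instead of parsing all 1000. A minimal sketch, assuming the files are tab-delimited 1000x2 numeric data as described; the temporary folder and the three generated demo files are hypothetical and only make the example self-contained.

```matlab
% Sketch: read only the first `row` lines of each file via textscan's N.
numfiles = 3;
row = 400;
column = 2;

% DEMO ONLY: write numfiles tab-delimited 1000x2 files into a temp folder
folder = tempname;
mkdir(folder);
for k = 1:numfiles
    fidw = fopen(fullfile(folder, sprintf('DM%05d.txt', k)), 'w');
    % columns are written in column order: line j is "j<TAB>k*j"
    fprintf(fidw, '%d\t%d\n', [(1:1000); k * (1:1000)]);
    fclose(fidw);
end

data = zeros(1, numfiles);
for k = 1:numfiles
    fid = fopen(fullfile(folder, sprintf('DM%05d.txt', k)), 'r');
    tempdata = textscan(fid, '%f%f', row);   % apply the format `row` times, then stop
    fclose(fid);
    data(k) = tempdata{column}(row);
end
```

With 10000 files the gain is modest per file but adds up, since 60% of each file is never parsed.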
I am trying to read in a csv file which will have the format
Var1 Val1A Val1B ... Val1Q
Var2 Val2A Val2B ... Val2Q
...
And I will not know ahead of time how many variables (rows) or how many runs (columns) will be in the file.
I have been trying to get textscan to work, but no matter what I try, I can't isolate all the variable names or get a rows-by-columns cell array. This is what I've been trying:
fID = fopen(strcat(pwd,'/',inputFile),'rt');
if fID == -1
disp('Could not find file')
return
end
vars = textscan(fID, '%s,%*s','delimiter','\n');
fclose(fID);
Does anyone have a suggestion?
If the file has the same number of columns in each row (you just don't know how many to begin with), try the following.
First, determine the number of columns by parsing just the first row, then parse the full file:
% Open the file, get the first line
fid = fopen('myfile.txt');
firstLine = fgetl(fid);
fclose(fid);
tmp = textscan(firstLine, '%s');
% tmp{1} holds one cell per whitespace-separated field,
% so its length is the number of columns
n = numel(tmp{1});
% Now scan the file
fid = fopen('myfile.txt');
tmp = textscan(fid, repmat('%s ', [1, n]));
fclose(fid);
For any given file, do all the lines have the same number of fields? If they do, you could start by reading the first line, use it to count the number of fields, and then use textscan to read the file.
fID = fopen(strcat(pwd,'/',inputFile),'rt');
firstLine = fgetl(fID);
numFields = length(strfind(firstLine,' ')) + 1;   % assumes exactly one space between fields
fclose(fID);
formatString = repmat('%s',1,numFields);
fID = fopen(strcat(pwd,'/',inputFile),'rt');
vars = textscan(fID, formatString, 'Delimiter', ' ');
fclose(fID);
Now vars is a cell array whose first entry holds the variable names and whose remaining entries hold the observations.
Here I assumed the delimiter was a space, even though you said it was a CSV file. If it really is commas, count commas instead of spaces and pass ',' as the delimiter.
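For a genuinely comma-separated file, a sketch of the same idea is below. It assumes the first field of every row is the variable name and all other fields are numeric; the temporary file and its contents are hypothetical demo data. frewind returns to the start of the file so it only has to be opened once, and 'CollectOutput' gathers the numeric columns into one matrix.

```matlab
% Sketch: CSV with unknown numbers of rows and columns.
% DEMO ONLY: write a small file in the question's layout
fname = [tempname '.csv'];
fidw = fopen(fname, 'w');
fprintf(fidw, 'Var1,1.5,2.5,3.5\nVar2,4.5,5.5,6.5\n');
fclose(fidw);

fid = fopen(fname, 'rt');
firstLine = fgetl(fid);
numFields = numel(strfind(firstLine, ',')) + 1;   % commas + 1 = columns
frewind(fid);                                      % back to the first line
% first column as text, the remaining columns as numbers
formatString = ['%s' repmat('%f', 1, numFields - 1)];
C = textscan(fid, formatString, 'Delimiter', ',', 'CollectOutput', true);
fclose(fid);

names  = C{1};   % numRows x 1 cell array of variable names
values = C{2};   % numRows x (numFields - 1) numeric matrix
```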
I need to read the following csv file in MATLAB:
2009-04-29 01:01:42.000;16271.1;16271.1
2009-04-29 02:01:42.000;2.5;16273.6
2009-04-29 03:01:42.000;2.599609;16276.2
2009-04-29 04:01:42.000;2.5;16278.7
...
I'd like to have three columns:
timestamp;value1;value2
I tried the approaches described here:
Reading date and time from CSV file in MATLAB
modified as:
filename = 'prova.csv';
fid = fopen(filename, 'rt');
a = textscan(fid, '%s %f %f', ...
'Delimiter',';', 'CollectOutput',1);
fclose(fid);
But it returns a 1x2 cell whose first element is a{1}='ÿþ2'; the other is empty.
I had also tried to adapt to my case the answers to these questions:
importing data with time in MATLAB
Read data files with specific format in matlab and convert date to matal serial time
but I didn't succeed.
How can I import that csv file?
EDIT: After @macduff's answer, I tried copy-pasting the data above into a new file and using:
a = textscan(fid, '%s %f %f','Delimiter',';');
and it works.
Unfortunately, that doesn't solve the problem, because I have to process automatically generated CSV files, which seem to be the cause of the strange MATLAB behavior.
What about trying:
a = textscan(fid, '%s %f %f','Delimiter',';');
For me I get:
a =
{4x1 cell} [4x1 double] [4x1 double]
So each element of a corresponds to a column in your csv file. Is this what you need?
Thanks!
Seems you're going about it the right way. The example you provide poses no problems here, I get the output you desire. What's in the 1x2 cell?
If I were you I'd try again with a smaller subset of the file, say 10 lines, and see if the output changes. If yes, then try 100 lines, etc., until you find where the 4x1 cell + 4x2 array breaks down into the 1x2 cell. It might be that there's an empty line or a single empty field or whatever, which forces textscan to collect data in an additional level of cells.
Note that 'CollectOutput',1 will collect the last two columns into a single array, so you'll end up with 1 cell array of 4x1 containing strings, and 1 array of 4x2 containing doubles. Is that indeed what you want? Otherwise, see @macduff's post.
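Once the timestamps come back as a cell array of strings, they can be turned into MATLAB serial date numbers with datenum and a matching format string. A minimal sketch, using the textscan call reported to work above; the temporary file holds two lines copied from the question's sample and is only there to make the example self-contained.

```matlab
% Sketch: parse the semicolon-separated sample, then convert timestamps.
% DEMO ONLY: write two lines of the question's sample to a temp file
fname = [tempname '.csv'];
fidw = fopen(fname, 'w');
fprintf(fidw, '2009-04-29 01:01:42.000;16271.1;16271.1\n');
fprintf(fidw, '2009-04-29 02:01:42.000;2.5;16273.6\n');
fclose(fidw);

fid = fopen(fname, 'rt');
a = textscan(fid, '%s %f %f', 'Delimiter', ';');
fclose(fid);

% FFF matches the milliseconds field
t = datenum(a{1}, 'yyyy-mm-dd HH:MM:SS.FFF');
result = [t a{2} a{3}];   % timestamp;value1;value2 as one numeric matrix
```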
I've had to parse large files like this, and I found I didn't like textscan for this job. I just use a basic while loop to parse the file, and I use datevec to extract the timestamp components into a 6-element time vector.
%% Optional: preallocate for speed if you have large files
n = 1000;             % <# of rows in file - if known>
timestamp = zeros(n,6);
value1 = zeros(n,1);
value2 = zeros(n,1);
fname = 'prova.csv';  % file name from the question
fid = fopen(fname, 'rt');
if fid < 0
    error('Error opening file %s\n', fname); % exit point
end
cntr = 0;
while true
    tline = fgetl(fid);                      % get one line
    if ~ischar(tline), break; end            % break out of loop at end of file
    cntr = cntr + 1;
    splitLine = strsplit(tline, ';');        % split the line on ; delimiters
    % datevec parses the time into a standard 6-element [Y M D h m s] vector
    timestamp(cntr,:) = datevec(splitLine{1}, 'yyyy-mm-dd HH:MM:SS.FFF');
    value1(cntr) = str2double(splitLine{2}); % fields are text, so convert to numbers
    value2(cntr) = str2double(splitLine{3});
end
fclose(fid);
%% Drop unused preallocated rows, then concatenate if you like
result = [timestamp(1:cntr,:) value1(1:cntr) value2(1:cntr)];