I have a folder with 20,000+ data entries, and I am trying to graph the time in between each entry to see where our process slows down. I've been trying to use the dir() function in MATLAB, combining a couple of different code snippets I've found, but I feel like I'm way out of my depth and not even able to get the basic structure right.
for i = 1:25910
    n = num2str(i);
    d = dir(['P' n '_Bump.datx']);
    moddate = d.date;
    plot(n, moddate)
end
I'm more familiar with Python, if there's a similar function there that could pull the timestamp off a file.
Data is formatted like:
P1_Bump.datx
P2_Bump.datx
...
P25910_Bump.datx
You can pull all of your files into a table (which is slightly easier to deal with than a struct in this case) like this:
files = dir( '*_Bump.datx' );
files = struct2table( files, 'AsArray', true );
Then get the dates from the datenum field of the table (since the date field is a char and not a MATLAB date):
dates = datetime( files.datenum, 'convertfrom', 'datenum' )
To calculate the time between files, you can use diff:
timesBetween = diff(dates);
Then plot
figure;
plot( timesBetween );
Note that timesBetween will be one element shorter than the number of files, since it holds the differences between consecutive files.
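Putting those pieces together, a minimal sketch (assuming all the P*_Bump.datx files sit in the current folder; the sort is my addition, since dir() returns names in character order, e.g. P10 before P2, not in time order):
files = dir('*_Bump.datx');
files = struct2table(files, 'AsArray', true);
dates = datetime(files.datenum, 'ConvertFrom', 'datenum');
dates = sort(dates);            % character order is not time order
timesBetween = diff(dates);
figure;
plot(timesBetween);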
I am writing MATLAB code to open data files from a continuous air-sampling instrument, and could use help cleaning it up and formatting the plots correctly.
This is my code currently:
% importing data
A101 = readtable("AE33_AE33-S10-01288_20220101.dat");
A102 = readtable("AE33_AE33-S10-01288_20220102.dat");
A103 = readtable("AE33_AE33-S10-01288_20220103.dat");
A104 = readtable("AE33_AE33-S10-01288_20220104.dat");
% removing empty cells
A101 = A101(~any(ismissing(A101),2),:);
A102 = A102(~any(ismissing(A102),2),:);
A103 = A103(~any(ismissing(A103),2),:);
A104 = A104(~any(ismissing(A104),2),:);
% times
t1 = A101{:,2};
t2 = A102{:,2};
t3 = A103{:,2};
t4 = A104{:,2};
% BC data for each day
b = A101{:,56};
c = A102{:,56};
d = A103{:,56};
e = A104{:,56};
% plotting
plot(t1, b, t2, c, t3, d, t4, e);
title('BC concentration');
xlabel('Time (hours)');
ylabel('BC concentration (ng/m3)');
ylim([0,1600]);
legend({'1/01', '1/02', '1/03', '1/04'})
As of now, the days are all overlaid, but I need one continuous graph with an extended time series. The time is already formatted as hr:min:sec in the second column of each table, and the instrument creates a new file for each day.
Additionally, if anyone could help me with a more concise way of importing the files and removing the blank cells, I would appreciate it, since I'll eventually be looking at months of data. I'm not particularly knowledgeable about coding or MATLAB, so I'm sure some of these are easy fixes. Thanks!
If I am understanding you correctly, you want all of your data to be in one large column array instead of 4 smaller column arrays, right? If that is what you want, then all you need to do is concatenate your data arrays.
For example, in your case, you would want:
AllData = [b;c;d;e];
AllTime = [t1;t2;t3;t4];
This would give you two column arrays that contain all the data points and all the time points respectively. One issue you may run into is if your time data restarts at 0 every day, in which case you would need to add 24 hours to each successive day's time array.
For example:
AllTimeAdjusted = [t1; t2+24; t3+48; t4+72];
Note: You will not be able to preserve the legend entries as they are, and may need to find another way to label the days.
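For the more concise import you also asked about, here is a rough sketch of the same concatenation idea in a loop (my variable names; assumes the filenames all match AE33_AE33-S10-01288_*.dat, so the date-stamped names sort chronologically, and that column 2 comes in as numeric hours - if readtable gives you a duration instead, add hours(24*(k-1)) rather than 24*(k-1)):
files = dir('AE33_AE33-S10-01288_*.dat');   % one file per day, sorted by date stamp
AllTime = [];
AllData = [];
for k = 1:numel(files)
    T = readtable(fullfile(files(k).folder, files(k).name));
    T = T(~any(ismissing(T),2),:);          % drop rows with empty cells
    AllTime = [AllTime; T{:,2} + 24*(k-1)]; %#ok<AGROW> shift each day by 24 h
    AllData = [AllData; T{:,56}];           %#ok<AGROW>
end
plot(AllTime, AllData);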
New to MATLAB, so this may be simpler than I'm realising.
I'm working with a large number of text files for some data analysis and would like to separate them into a number of categories. They come in a format similar to Tp_angle_RunNum.txt and Ts_angle_RunNum.txt. I would like to have 4 different groups: the Tp and Ts files for angle1, and the same for angle2.
Any help would be appreciated!
A few things here:
Remember that your text files are not necessarily going to be where you process your data, so think about how you want to store the data in memory while you work with it. It may be beneficial to pull all of the data into a MATLAB Table; you can read your text files directly into tables with the function readtable(), with columns to capture Ts, Tp, and angle... or angle1 and angle2. You can use logicals for all 4, i.e., a true or false as to which group the data row belongs to. You could also capture the run number, so you know exactly which file each row came from. There are lots of ways to store the data, but if you get into very large data sets, Tables are far easier to manipulate and work with. They are also very fast, and compact if you use categorical types as applicable.
dir() will likely be a necessary function for you for the directory listing; however, there are some alternatives and some considerations based upon your platform. I suggest you take a look at the doc page for dir.
Take advantage of MATLAB strings and vector processing. It's not the only way to do things, but it is by far the easiest. Strings were introduced in R2016b and have gotten better since, as more and more features and capabilities work with strings. More recently, you can use patterns instead of (or with!) regular expressions for finding what you need to process. The doc page I linked above has some great examples, so no point in me reinventing that wheel.
fileparts() is also your friend in MATLAB when working with many files. You can use it to separate the path, the filename, and the extension, and you might use it to simplify your processing (a quick sketch follows this list).
Regarding vector processing, you can take an entire array of strings and pass it to the functions used with pattern. Or, if you take my suggestion of working with tables, you can get rows of your table that match specific characteristics.
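As promised, a quick sketch of fileparts() plus string splitting on one of these names (the filename here is just an illustration):
[~, name, ext] = fileparts('Tp_21_3.txt'); % name = 'Tp_21_3', ext = '.txt'
parts  = split(string(name), "_");         % ["Tp"; "21"; "3"]
kind   = parts(1);                         % "Tp" or "Ts"
angle  = double(parts(2));                 % 21
runNum = double(parts(3));                 % run number 3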
Let's look at a few of these concepts together with some sample code. I don't have your files, but I can demonstrate with just the output of the dir command and some dummy files. I'm going to break this into two parts: a dirTable function (a dir wrapper I like to use instead of dir, and keep on my path) and a script that uses it. I'd suggest you copy the code and take advantage of code sections to run a section at a time; see the doc page on Create and Run Sections in Code if this is new to you.
dirTable.m
% Filename: dirTable.m
function dirListing = dirTable( names )
    arguments
        names (:,1) string {mustBeText} = pwd()
    end
    dirListing = table();
    for nIdx = 1:numel( names )
        tempListing = dir( names( nIdx ) );
        dirListing = [ dirListing;...
            struct2table( tempListing, 'AsArray', true ) ]; %#ok<AGROW>
    end
    if ~isempty( dirListing )
        % Adjust some of the types...
        dirListing.name = categorical( dirListing.name );
        dirListing.folder = categorical( dirListing.folder );
        dirListing.date = datetime( dirListing.date );
    end
end
Example script
%% Create some dummy files - assuming windows, or I'd use the "touch" cmd.
cmds = "type nul >> Ts_42_" + (1:3) + ".txt";
cmds = [cmds;"type nul >> Tp_42_" + (1:3) + ".txt"];
cmds = [cmds;"type nul >> Ts_21_" + (1:3) + ".txt"];
cmds = [cmds;"type nul >> Tp_21_" + (1:3) + ".txt"];
for idx = 1:numel(cmds)
    system(cmds(idx));
end
%% Get the directory listing for all the files
% Note, the filenames come out as categoricals by my design, though that
% doesn't help much for this example - in fact, I'll have to cast the
% categoricals to string a few times. That's OK, it's not a heavy lift.
% If you use dir directly, you'd not only be casting to string, but you'd
% also have to deal with the structure and clunky if/else conditions
% everywhere.
listing = dirTable();
%% Define patterns for each of the 4 groups
% - pretending the first code cell doesn't exist.
Tp_Angle1_pattern = "Tp_21";
Ts_Angle1_pattern = "Ts_21";
Tp_Angle2_pattern = "Tp_42";
Ts_Angle2_pattern = "Ts_42";
%% Cycle a group's data, creating a single table from all the files
% I could be more clever here and loop through the patterns as well and
% create a table of tables; however, I am going to keep this code easier
% to read at the cost of repetitiveness. I will however use a local
% function to gather all the runs from one group into a single table.
Tp_Angle1_matches = startsWith(string(listing.name), Tp_Angle1_pattern);
Tp_Angle1_filenames = string(listing.name(Tp_Angle1_matches));
Tp_Angle1_data = aggregateDataFilesToTable(Tp_Angle1_filenames);
% Repeat for each group... Or loop the above code for a single table
% if you loop for a single table, make sure to add column(s) for the group
% information
%% My local function for reading all the files in a group
function data_table = aggregateDataFilesToTable(filenames)
    arguments
        filenames (:,1) string
    end
    % We could assume that, since we're using run numbers at the end of the
    % filename, we'll get the filenames pre-sorted for us. If the numbers
    % aren't zero padded, we'd need to extract the run number to determine
    % the sort order of the files to read in. I'm going to be lazy and
    % assume zero padding for simplicity.
    data_table = table();
    for fileIdx = 1:numel(filenames)
        % For the following line, two things:
        % 1) the [data_table;readtable()] syntax appends the table from
        % readtable to the end of data_table.
        % 2) The comment at the end acknowledges that this variable is growing
        % in a loop, which is usually not the best practice; however, since I
        % have no way of knowing the total table dimensions ahead of time, I
        % cannot pre-allocate the table before the loop - hence the table()
        % call before the for loop. If you have a way of knowing this ahead of
        % time, do pre-allocate!
        data_table = [data_table;readtable(filenames(fileIdx))]; %#ok<AGROW>
    end
end
NOTE 1: Using empty parens is not necessary on function calls with no parameters; however, I find it easier for others to read, since they know they are looking at a function call and not a variable.
NOTE 2: I know the dummy files are empty. That won't matter for this example, as an empty table appended to an empty table is another empty table. And the OP's question was about the file manipulation and grouping, etc.
NOTE 3: In case the syntax is new to you, BOTH functions in my example use function argument blocks, which were introduced in R2019b - they make code much easier to maintain and read than not validating inputs, or validating them in more complex ways. I was going to leave that out of this example, but they were already in my dirTable function, so I figured I'd just explain it instead.
I have 672 samples like these in a .txt file:
{
sleep:1360.36,
eat:4.36,
live:16.37,
travel:22.18,
work:22,
school:0.84,
vt:386.87
},
I want to put them in an Excel file where {sleep, eat, live, travel, work, school, vt} are represented in rows and each sample is represented in a column, with the corresponding number matching each. I've never dealt with text files in this format in MATLAB, so I have no idea how to do this. Can anyone help me?
You can import data from Excel into MATLAB using xlsread and export data using xlswrite. See the documentation.
Syntax
xlswrite(filename,A,sheet,xlRange)
where A might be a cell array whose cells contain numbers or strings, sheet is the name of the Excel sheet, and xlRange is the range in the Excel sheet (example: A1:B5).
Code example:
A = {'Column1', 'Column2', 'Column3'; 1, 2, 3};
xlswrite('example.xls', A, 'ExampleSheet', 'A1:C2');
Some hints:
If you know the number of rows and columns of your data only at runtime but still want to give a range, you must somehow assemble the range string yourself: rows are easy with sprintf, while column names are more difficult (A, B, C, ..., Z, AA, AB, ...) - see the sketch after these hints.
If you do not have Excel on your computer, you will get csv files (see documentation)
Although each call to xlswrite returns quite fast, the system is still working. If another call to xlswrite comes too soon, you might get unexpected (delay-dependent) errors, with no way to avoid them other than waiting for sufficient time. I usually collect my data and then write everything to an Excel file in one go.
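On the first hint, here is one rough way to build the column letters at runtime (my own sketch; excelColumn is not a built-in function):
function col = excelColumn(n)
% Convert a 1-based column index into Excel letters (1 -> 'A', 27 -> 'AA').
    col = '';
    while n > 0
        r = mod(n - 1, 26);
        col = [char('A' + r) col]; %#ok<AGROW>
        n = floor((n - 1) / 26);
    end
end
Then something like ['A1:' excelColumn(size(A,2)) num2str(size(A,1))] gives the full range for a matrix A.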
Very possible; you can do it in MATLAB if you are familiar with it (although it is also quite easy to do in Excel). To load in your file (no need to convert it, MATLAB reads txt files), you can do something like:
fileID = fopen('test2.txt'); % Your file name
Input = textscan(fileID,'%s %f','Delimiter',':');
GoodRows = find(~isnan(Input{2}));
column1 = Input{1}(GoodRows,:); % Column 1 is a cell array, since it stores strings
column2 = Input{2}(GoodRows,:); % Column 2 is a numeric matrix, which lets you take the numbers and do averages etc.
The cell and the matrix share indexes, so you can eventually reformat your data into a cell array and export it from MATLAB.
column1 =
'sleep'
'eat'
'live'
'travel'
'work'
'school'
'vt'
column2 =
1.0e+003 *
1.3604
0.0044
0.0164
0.0222
0.0220
0.0008
0.3869
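Since every sample repeats the same 7 fields in the same order, you can then reshape column2 so each sample becomes a column and write everything in one go (a sketch; assumes all 672 samples parsed cleanly, and 'samples.xls' is just a placeholder name):
nFields = numel(column1) / 672;           % should be 7
values  = reshape(column2, nFields, []);  % one sample per column
out     = [column1(1:nFields), num2cell(values)]; % field names, then data
xlswrite('samples.xls', out);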
==============EDIT===============
If you have multiple columns after the String, i.e.:
sleep,1.5,1.4,1.3
If you want to keep using textscan, you will need to specify how many columns there are. This is done by either:
Input = textscan(fileID,'%s %f %f %f %f','Delimiter',':'); % add a %f for each numeric column
Or
Input = textscan(fileID,['%s' repmat(' %f',[1,N])],'Delimiter',':'); % where N is the number of numeric columns you have
I'm currently working with netCDF output from climate models and would like to obtain a text file of the time series for each latitude/longitude combination in the netCDF. For example, if the netCDF has 10 latitudes and 10 longitudes, I would obtain 100 text files, each with a time series in a column format. I'm fairly familiar with MATLAB and netCDF, but I can't seem to wrap my head around this. Naming the text files is not important; I will rename them "latitude_longitude_PCP.txt", where PCP is precipitation at the latitude and longitude location.
Any help would be appreciated. Thanks.
--Darren
There are several ways this problem could be solved.
Method 1. If you were able to put your netcdf file on a THREDDS Data Server, you could use the NetCDF Subset Service Grid as Point to specify a longitude/latitude point and get back the data in CSV or XML format. Here's an example from Unidata's THREDDS Data Server: http://thredds.ucar.edu/thredds/ncss/grid/grib/NCEP/GFS/Global_0p5deg/best/pointDataset.html
Method 2. If you wanted to use Matlab to extract a time series at a specific longitude/latitude location you could use the "nj_tseries" function from NCTOOLBOX, available at: http://nctoolbox.github.io/nctoolbox/
Method 3. If you really want to write an ASCII time series at every i,j location in your [time,lon,lat] grid using MATLAB, you could do something like this (using NCTOOLBOX):
url='http://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_2p5deg/best';
nc = ncgeodataset(url);
nc.variables
var='Downward_Short-Wave_Radiation_Flux_surface_12_Hour_Average';
lon = nc.data('lon');
lat = nc.data('lat');
jd = nj_time(nc,var);
ncvar = nc.variable(var);
for j = 1:length(lat)
    for i = 1:length(lon)
        v = ncvar.data(:,j,i);
        outfile = sprintf('%6.2flon%6.2flat.csv', lon(i), lat(j));
        fid = fopen(outfile, 'wt');
        data = [datevec(jd) v];
        fprintf(fid, '%2.2d %2.2d %2.2d %2.2d %2.2d %2.2d %7.2f\n', data');
        fclose(fid);
        disp([outfile ' created.'])
    end
end
If you had enough memory to read all the data into MATLAB, you could read outside the double loop, which would be a lot faster. But writing ASCII is slow anyway, so it might not matter that much.
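A rough sketch of that variant, using the same NCTOOLBOX names as above (assumes the whole [time, lat, lon] block fits in memory):
v_all = ncvar.data(:,:,:);   % one read for the entire grid
dvec  = datevec(jd);         % convert the time stamps once, up front
for j = 1:length(lat)
    for i = 1:length(lon)
        v = double(v_all(:,j,i));
        outfile = sprintf('%6.2flon%6.2flat.csv', lon(i), lat(j));
        fid = fopen(outfile, 'wt');
        fprintf(fid, '%2.2d %2.2d %2.2d %2.2d %2.2d %2.2d %7.2f\n', [dvec v]');
        fclose(fid);
    end
end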
%% Create demo data
data = reshape(1:20*30*40,[20 30 40]);
nccreate('t.nc','data','Dimensions',{'lat', 20, 'lon',30, 'time', inf});
ncwrite('t.nc', 'data',data);
ncdisp('t.nc');
%% Write timeseries to ASCII files
% Giving an idea of the size of your data can help people
% recommend different approaches tailored to the data size.
% For smaller data, it might be faster to read the full
% 3D data into memory
varInfo = ncinfo('t.nc','data');
disp(varInfo);
for latInd = 1:varInfo.Size(1)
    for lonInd = 1:varInfo.Size(2)
        fileName = ['t_ascii_lat',num2str(latInd),'_lon',num2str(lonInd),'.txt'];
        tSeries = ncread('t.nc','data',[latInd, lonInd, 1],[1,1,varInfo.Size(3)]);
        dlmwrite(fileName,squeeze(tSeries));
    end
end
%% spot check
act = dlmread('t_ascii_lat10_lon29.txt');
exp = squeeze(data(10,29,:));
assert(isequal(act,exp));
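As the comment above notes, for data that fits in memory you can replace the per-series ncread calls with one full read and index it inside the loop (a sketch under that assumption):
allData = ncread('t.nc','data');   % entire 20x30x40 array in one read
for latInd = 1:size(allData,1)
    for lonInd = 1:size(allData,2)
        fileName = ['t_ascii_lat',num2str(latInd),'_lon',num2str(lonInd),'.txt'];
        dlmwrite(fileName, squeeze(allData(latInd,lonInd,:)));
    end
end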
I am quite new to data analysis, so I'm sorry if this is a rookie question; I am learning as I go.
I have just started doing some work in variable star astronomy. I have about 100 files for every night of observation that all contain the same basic information (star coordinates, magnitude, etc.). I am loading all of the files into my workspace as arrays using a for-loop:
files = dir('*.out');
for i = 1:length(files)
    eval(['load ' files(i).name ' -ascii']);
end
I'm only really interested in two columns in each file. Is there a way to extract a column and set it to a vector while this for-loop is running? I'm sure that it's possible, but the actual syntax for it is escaping me.
Try using load as a function and save its output to a variable:
files = dir('*.out');
twoCols = {};
for ii = 1:length(files)
    data = load( files(ii).name, '-ascii' ); % load file into "data"
    twoCols{ii} = data(:,1:2);               % take only two columns
end
Now the variable twoCols holds the two columns from each file in a separate cell.
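If you later want all the files stacked into one big N-by-2 matrix (assuming every file has at least two columns), vertcat does it in one line:
allCols = vertcat(twoCols{:});   % stack the per-file matrices vertically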
You have to assign the load result to a new variable. Then, if your variable is, let's say, starsInfo, you can use:
onlyTwoFirst = starsInfo(:,1:2)
That means take all the rows, but only columns 1 and 2.