MATLAB - importing ASCII data

I've got an ASCII file and I'm trying to import it into MATLAB in order to make some plots. Is there any way of importing the data, even though it uses , (comma) rather than . (dot) as the decimal separator?
00:00:00,000;-2,14;
00:00:00,001;-1,80;
The first column I want to create is the time, corresponding to 00:00:00,001; 00:00:00,002; etc.
The second column should be the amplitude of the sample i.e. -2,14; -1,80 etc.

Yup. First use importdata so that each row of your text file is read as a cell in a cell array. Afterwards, so that the times can be processed in MATLAB, you'll need to replace each , character with a . character. This will allow you to use MATLAB's commands for date and time processing. Regular expressions, which find patterns in strings, help here: use regexprep to replace all , characters with a . character.
For the purposes of this answer, the example data that I'm going to be using is:
00:00:00,000;-2,14;
00:00:00,001;-1,80;
00:00:00,002;-0,80;
00:00:00,003;2,40;
00:00:00,004;3,78;
Therefore, assuming that your data is stored in a text file called data.txt, do:
%// Load in each row as a cell array
A = importdata('data.txt');
%// Each row has , replaced with .
Arep = regexprep(A, ',', '\.');
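Since the comma here is a literal character rather than a pattern, strrep should give the same result without involving regular expressions. A minimal sketch, with two rows typed in by hand instead of read from data.txt (the variable names rows and rows_rep are only for illustration):

```matlab
% Two sample rows, standing in for the output of importdata
rows = {'00:00:00,000;-2,14;'; '00:00:00,001;-1,80;'};

% strrep accepts a cell array of strings and replaces every literal comma
rows_rep = strrep(rows, ',', '.');

disp(rows_rep);
```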
Now, what we can do is split up all of the quantities by using ; as the delimiter. We can use regexp to help us split up the quantities. We can further decompose the data by:
Arep_decomp = regexp(Arep, '[^;]+', 'match');
The first parameter is the cell array that contains each of the rows in the text file (with the commas converted to periods). The second parameter is a pattern that specifies what you're looking for in each string in the cell array. [^;]+ matches each run of one or more characters that are not semi-colons, so every match stops just before the next semi-colon. 'match' means that you want to retrieve the matched strings themselves, which are returned as cell arrays.
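To see what the pattern does on its own, here is a small sketch applied to a single row (the variable names row and tokens are made up for this illustration):

```matlab
% One sample row, with the comma already replaced by a period
row = '00:00:00.001;-1.80;';

% '[^;]+' matches each maximal run of characters that are not semi-colons
tokens = regexp(row, '[^;]+', 'match');

disp(tokens{1});   % the time string
disp(tokens{2});   % the magnitude string
```

For a single string the result is a flat cell array with one entry per field; for a cell array of rows, regexp returns one such cell array per row, which is where the nested indexing Arep_decomp{i}{j} comes from.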
The result after the above line's execution gives:
Arep_decomp{1}{1} =
00:00:00.000
Arep_decomp{1}{2} =
-2.14
Arep_decomp{2}{1} =
00:00:00.001
Arep_decomp{2}{2} =
-1.80
Arep_decomp{3}{1} =
00:00:00.002
Arep_decomp{3}{2} =
-0.80
Arep_decomp{4}{1} =
00:00:00.003
Arep_decomp{4}{2} =
2.40
Arep_decomp{5}{1} =
00:00:00.004
Arep_decomp{5}{2} =
3.78
You can see that the output cell array, Arep_decomp is a 5 element cell array, where each cell is a nested 2 element cell array, where the first element is the time, and the second element is the magnitude. Note that these are all strings.
What you can do now is create two numeric arrays that will convert these quantities into numeric representations. Specifically, the time format that you have looks like the form:
HH:MM:SS.FFF
H is for hours, M is for minutes, S is for seconds and F is for milliseconds. Use datenum to convert these time representations into actual date numbers. You would do this so that you can plot them on a graph, but you will probably want to display the times on the plot as well, which can be done by setting a few axis properties. Use cellfun to extract the time strings as a separate cell array for labelling the plot later, to convert those time strings into date numbers via datenum, and to convert the magnitude strings into actual numbers.
Therefore:
datestr = cellfun(@(x) x{1}, Arep_decomp, 'uni', 0);
datenums = cellfun(@(x) datenum(x, 'HH:MM:SS.FFF'), datestr);
mags = cellfun(@(x) str2double(x{2}), Arep_decomp);
The first line of code extracts out each of the time strings as a single cell array - the uni=0 flag is important to do this. Next, we convert each time string into a date number, and we convert the magnitude strings into physical numbers by str2double.
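If only the elapsed time matters and not the tick labels, a time string can also be turned into seconds directly. This is a sketch of an alternative to datenum, not part of the method above; the sample string and the names t, parts and elapsed are assumptions made for illustration:

```matlab
% A sample time in HH:MM:SS.FFF form
t = '01:02:03.500';

% Read hours and minutes as integers and seconds (with fraction) as a float
parts = sscanf(t, '%d:%d:%f');

% Combine into elapsed seconds since 00:00:00.000
elapsed = parts(1)*3600 + parts(2)*60 + parts(3);

fprintf('%.3f seconds\n', elapsed);
```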
Now, all you have to do is plot the data. That can be done by:
plot(datenums, mags);
set(gca, 'XTick', datenums);
set(gca, 'XTickLabel', datestr);
The above code plots the data with the date numbers on the horizontal axis and the magnitudes on the vertical axis, but we will probably want the horizontal axis to show the time strings that you wanted. Therefore, we use two calls to set to ensure that the only ticks that are visible are at the date numbers themselves, and we relabel those ticks with the string representations of the times.
Once we run the above code, we get a plot of the magnitudes against the labelled times (figure not included here).
Because the time step in between times is so small, it may clutter the horizontal axis as the labels are long, yet the interval is short. Therefore, you may consider only displaying times at a certain interval and you can do that by doing something like:
step_size = 5;
plot(datenums, mags);
set(gca, 'XTick', datenums(1:step_size:end));
set(gca, 'XTickLabel', datestr(1:step_size:end));
step_size controls the spacing between labelled ticks: only every step_size-th point gets a tick and a label. Obviously, you need to make sure that step_size is smaller than the total number of points in your data.
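To make the effect of the stride concrete, this small sketch lists which points would be labelled for a hypothetical data set of 12 samples (n and idx are illustrative names):

```matlab
n = 12;          % hypothetical number of data points
step_size = 5;

% Indices of the points that receive a tick and a label
idx = 1:step_size:n;

disp(idx);
```

Here only the 1st, 6th and 11th points are labelled.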
For your copying and pasting pleasure, this is what the full code I wrote looks like:
%// Load in each row as a cell array
A = importdata('data.txt');
%// Each row has , replaced with .
Arep = regexprep(A, ',', '\.');
Arep_decomp = regexp(Arep, '[^;]+', 'match');
datestr = cellfun(@(x) x{1}, Arep_decomp, 'uni', 0);
datenums = cellfun(@(x) datenum(x, 'HH:MM:SS.FFF'), datestr);
mags = cellfun(@(x) str2double(x{2}), Arep_decomp);
step_size = 1;
%step_size = 5;
plot(datenums, mags);
set(gca, 'XTick', datenums(1:step_size:end));
set(gca, 'XTickLabel', datestr(1:step_size:end));

Related

How to split cell array values into two columns in MATLAB?

I have data in a cell array as shown in the variable viewer here:
{[2.13949546690144;56.9515770543056],
[1.98550875192835;50.4110852121618],
...}
I want to split it into two columns with two decimal-point numbers as:
2.13 56.95
1.98 50.41
by removing opening and closing braces and semicolons such as [;]
(to do as like "Text to columns" in Excel).
If your N-element cell array C has 2-by-1 numeric data in each cell, you can easily convert that into an N-by-2 numeric matrix M like so (using the round function to round each element to 2 decimal places):
M = round([C{:}].', 2);
The syntax C{:} creates a comma-separated list of the contents of C, equivalent to C{1}, C{2}, ... C{N}. These are all horizontally concatenated using [ ... ], then the result is transposed using .'.
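A small sketch with made-up integer data shows the shape of each step (the values in C are assumptions chosen to be easy to follow):

```matlab
% Three cells, each holding a 2-by-1 column
C = {[1; 2], [3; 4], [5; 6]};

% [C{:}] places the columns side by side, giving a 2-by-3 matrix
wide = [C{:}];

% Transposing turns each original cell into one row
M = wide.';

disp(M);
```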
% let's build a matching example...
c = cell(2,1);
c{1} = [2.13949546690144; 56.9515770543056];
c{2} = [1.98550875192835; 50.4110852121618];
% convert your cell array to a double array...
m = cell2mat(c);
% take the odd rows and place them to the left
% take the even rows and place them to the right
m = [m(1:2:end,:) m(2:2:end,:)];
% round the whole matrix to two decimal digits
m = round(m,2);
Depending on your environment settings, you may still see a lot of trailing zeros after the first two decimal digits, but don't worry, everything is fine from a precision point of view. If you want to display only the "real" digits of your numbers, use this command:
format short g;
You should use cell2mat:
A={2.14,1.99;56.95,50.41};
B=cell2mat(A);
As for the rounding, you can do:
B=round(100*B)/100;
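Note that the output requested in the question (2.13 and 1.98) corresponds to cutting off the extra digits rather than rounding them. If that is really what is wanted, a sketch using fix instead of round could look like this; the matrix x is typed in by hand from the question's values:

```matlab
x = [2.13949546690144 56.9515770543056;
     1.98550875192835 50.4110852121618];

% Rounding to two decimals moves 2.139... up to 2.14
rounded = round(100*x)/100;

% Truncating toward zero keeps 2.13 instead
truncated = fix(100*x)/100;

disp(rounded);
disp(truncated);
```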

Unify timestamps as date strings

MATLAB R2015b
I have a table containing a date string and a time string in various formats in two columns for each row:
11.01.2016 | 00:00:00 | data
10/19/16 | 05:29:00 | data
12.02.16 | 06:40 | data
I want to convert these two columns to one column with a common format:
31.12.2017 14:00:00
My current solution uses a loop over each row and combines the columns as strings, checks for the various formats to use datetime with an appropriate format string and then uses datestr with the desired format string. Datetime was not able to automatically determine the format of the input string.
As you can imagine, this is horribly slow for large tables (approx. 50000 rows).
Is there any faster solution?
Thanks in advance.
I gave a try to vectorize the code. The trick is to
convert tables > cell > char-array, then
manipulate char strings, then
convert back from char-array > cell > table
Also, there is an important step: all cells with shorter lengths have to be padded with the 'null' character in a vectorized way. Without this, it will not be possible to convert from cell > char-array. Here is the code.
clc
clear all
%% create Table T
d={'11.01.2016';
'10/19/16';
'12.02.16'};
t={'00:00:00';
'05:29:00';
'06:40'};
dat=[123;
456;
789];
T = table(d,t,dat);
%% deal with dates in Table T
% separate date column and convert to cell
dd = table2cell(T(:,1));
% equalize the lengths of all elements of cell
% by padding 'null' in end of shorter dates
nmax=max(cellfun(@numel,dd));
func = @(x) [x,zeros(1,nmax-numel(x))];
temp1 = cellfun(func,dd,'UniformOutput',false);
% convert to array for vectorized manipulation of char strings
ddd=cell2mat(temp1);
% replace the separators in 3rd and 6th location with '.' (period)
ddd(:,[3 6]) = repmat(['.' '.'], length(dd),1);
% find indexes of shorter dates
short_year_idx = find(uint16(ddd(:,nmax)) == 0);
% find the year value for those short_year cases
yy = ddd(short_year_idx,[7 8]);
% replace null chars with '20XX' string in desired place
ddd(short_year_idx,7:nmax) = ...
[repmat('20',size(short_year_idx,1),1) yy];
% convert char array back to cell and replace in table
dddd = mat2cell(ddd,ones(1,size(d,1)),nmax);
T(:,1) = table(dddd);
%% deal with times in Table T
% separate time column and convert to cell
tt = table2cell(T(:,2));
% equalize the lengths of all elements of cell
% by padding 'null' in end of shorter times
nmax=max(cellfun(@numel,tt));
func = @(x) [x,zeros(1,nmax-numel(x))];
temp1 = cellfun(func,tt,'UniformOutput',false);
% convert to array for vectorized manipulation of char strings
ttt=cell2mat(temp1);
% find indexes of shorter times (assuming only ':00' in end is missing)
short_time_idx = find(uint16(ttt(:,nmax)) == 0);% dirty hack, as null=0 in ascii
% replace null chars with ':00' string
ttt(short_time_idx,[6 7 8]) = repmat(':00',size(short_time_idx,1),1);
% convert char array back to cell and replace in table
tttt = mat2cell(ttt,ones(1,size(t,1)),nmax);
T(:,2) = table(tttt);
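For the padding step, the char function can also build the character matrix directly from a cell array of strings. It pads shorter rows with trailing spaces rather than null characters, so the test for short rows would compare against a space instead of 0. A small sketch under that assumption, reusing the sample dates from above:

```matlab
dd = {'11.01.2016'; '10/19/16'; '12.02.16'};

% char pads every row with blanks up to the longest string
ddd = char(dd);

% Rows whose last column is a blank are the short (two-digit year) dates
short_year_idx = find(ddd(:, end) == ' ');

disp(size(ddd));
disp(short_year_idx);
```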
If you call the two columns cell arrays c1 and c2, then something like this should work:
c = datestr(datenum(strcat(c1,{' '},c2)), 'dd.mm.yyyy HH:MM:SS')
Then you would need to drop the old columns and put this one c in their place. On the inside, datenum must be doing something similar to what you're doing, however, so I'm not sure if this will be faster. I suspect that it is because (we can hope) the standard functions are optimized.
If your table isn't representing those as cell arrays, then you may need to do a pre-processing step to form the cell arrays for strcat.
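Because the rows mix several layouts, one way to keep the conversion vectorized is to group the rows by layout and call datenum once per group with an explicit format. The sketch below assumes only the three layouts shown in the question occur, that dotted dates are day.month.year, and that a missing seconds field means ':00'; the names masks, formats and unified are illustrative:

```matlab
d = {'11.01.2016'; '10/19/16'; '12.02.16'};
t = {'00:00:00'; '05:29:00'; '06:40'};

% Complete times that lack the seconds field (assumed to be HH:MM)
short = cellfun(@numel, t) == 5;
t(short) = strcat(t(short), ':00');

% Join date and time with a single space
stamps = strcat(d, {' '}, t);

% Classify each row by its date layout
hasSlash = ~cellfun(@isempty, strfind(d, '/'));
longYear = cellfun(@numel, d) == 10;
masks   = {hasSlash, ~hasSlash & longYear, ~hasSlash & ~longYear};
formats = {'mm/dd/yy HH:MM:SS', 'dd.mm.yyyy HH:MM:SS', 'dd.mm.yy HH:MM:SS'};

% Convert each group in one call and write it back in the common format
unified = cell(size(stamps));
for k = 1:numel(masks)
    m = masks{k};
    if any(m)
        nums = datenum(stamps(m), formats{k});
        unified(m) = cellstr(datestr(nums, 'dd.mm.yyyy HH:MM:SS'));
    end
end

disp(unified);
```

The loop runs once per layout rather than once per row, so its cost should not grow with the number of distinct rows in the way a row-by-row loop does. Two-digit years are resolved by datenum's default pivot year.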

Printing progress in command window

I'd like to use fprintf to show code execution progress in the command window.
I've got a N x 1 array of structures, let's call it myStructure. Each element has the fields name and data. I'd like to print the name side by side with the number of data points, like such:
name1 number1
name2 number2
name3 number3
name4 number4
...
I can use repmat N times along with fprintf. The problem with that is that all the numbers have to come in between the names in a cell array C.
fprintf(repmat('%s\t%d',N,1),C{:})
I can use cellfun to get the names and number of datapoints.
names = {myStructure.name};
numpoints = cellfun(@numel,{myStructure.data});
However I'm not sure how to get this into a cell array with alternating elements for C to make the fprintf work.
Is there a way to do this? Is there a better way to get fprintf to behave as I desire?
You're very close. What I would do is change your cellfun call so that the output is a cell array instead of a numeric array. Use the 'UniformOutput' flag and set this to 0 or false.
When you're done, make a new cell array where both the name cell array and the size cell array are stacked on top of each other. You can then call fprintf once.
% Save the names in a cell array
A = {myStructure.name};
% Save the sizes in another cell array
B = cellfun(@numel, {myStructure.data}, 'UniformOutput', 0);
% Create a master cell array where the first row are the names
% and the second row are the sizes
out = [A; B];
% Print out the elements side-by-side
fprintf('%s\t%d\n', out{:});
The trick with the third line of code is that when you unroll the cell array using {:}, this creates a comma-separated list unrolled in column-major format, and so doing out{:} actually gives you:
A{1}, B{1}, A{2}, B{2}, ..., A{n}, B{n}
... which provides the interleaving you need. Therefore, providing this order into fprintf coincides with the format specifiers that are specified and thus gives you what you need. That's why it's important to stack the cell arrays so that each column gives the information you need.
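The column-major unrolling can be checked on a tiny example. This sketch uses sprintf instead of fprintf so the result can be inspected as a string; the contents of A and B are made up:

```matlab
A = {'a', 'b', 'c'};   % names
B = {1, 2, 3};         % sizes

% Stack so that each column holds one name and its size
out = [A; B];

% out{:} expands column by column: 'a', 1, 'b', 2, 'c', 3
s = sprintf('%s\t%d\n', out{:});

disp(s);
```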
Minor Note
Of course one should never forget that one of the easiest ways to tackle your problem is to just use a simple for loop. Even though for loops are considered bad practice, their performance has come a long way throughout MATLAB's evolution.
Simply put, just do this:
for ii = 1 : numel(myStructure)
fprintf('%s\t%d\n', myStructure(ii).name, numel(myStructure(ii).data));
end
The above code is arguably more readable in comparison to what we did above with cell arrays. You're accessing the structure directly rather than having to create intermediate variables for the purpose of calling fprintf once.
Example Run
Here's an example of this running. Using the data shown below:
clear myStructure;
myStructure(1).name = 'hello';
myStructure(1).data = rand(5,1);
myStructure(2).name = 'hi';
myStructure(2).data = zeros(3,3);
myStructure(3).name = 'huh';
myStructure(3).data = ones(6,4);
I get the following output after running the printing code:
hello 5
hi 9
huh 24
We can see that the sizes are correct as the first element in the structure is simply a random 5 element vector, the second element is a 3 x 3 = 9 zeroes matrix while the last element is a 6 x 4 = 24 ones matrix.

Using Matlab to randomly split an Excel Sheet

I have an Excel sheet containing 1838 records and I need to RANDOMLY split these records into 3 Excel Sheets. I am trying to use Matlab but I am quite new to it and I have just managed the following code:
[xlsn, xlst, raw] = xlsread('data.xls');
numrows = 1838;
randindex = ceil(3*rand(numrows, 1));
raw1 = raw(:,randindex==1);
raw2 = raw(:,randindex==2);
raw3 = raw(:,randindex==3);
Your general procedure will be to read the spreadsheet into some matlab variables, operate on those matrices such that you end up with three thirds and then write each third back out.
So you've got the read covered with xlsread, which in your code returns the matrices xlsn and xlst along with raw. I would suggest using the syntax
[~, ~, raw] = xlsread('data.xls');
In the xlsread help file (you can access this by typing doc xlsread into the command window) it says that the three output arguments hold the numeric cells, the text cells and the whole lot. This is because a matlab matrix can only hold one type of value and a spreadsheet will usually be expected to have text or numbers. The raw value will hold all of the values but in a 'cell array' instead, a different kind of matlab data type.
So then you will have a cell array called raw. From here you want to do four things:
work out how many rows you have (I assume each record is a row) by using the size function and specifying the appropriate dimension (again check the help file to see how to do this)
create an index of random numbers between 1 and 3 inclusive, which you can use as a mask
randindex = ceil(3*rand(numrows, 1));
apply the mask to your cell array to extract the records matching each index
raw1 = raw(randindex==1,:); % rows are records, so the mask goes in the first index; do the same for the other two index values
write each cell back to a file
xlswrite('output1.xls', raw1);
You will probably have to fettle the arguments to get it to work the way you want but be sure to check the doc functionname page to get the syntax just right. Your main concern will be to get the indexing correct - matlab indexes row-first whereas spreadsheets tend to be column-first (e.g. cell A2 is column A and row 2, but matlab matrix element M(1,2) is the first row and the second column of matrix M, i.e. cell B1).
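As a quick check of the indexing order, here is a sketch with a small hand-made cell array standing in for raw, where each row is one record and the group labels are fixed instead of random:

```matlab
% Four records with two fields each
raw = {1, 'a'; 2, 'b'; 3, 'c'; 4, 'd'};

% One group label per record (fixed here so the result is predictable)
randindex = [1; 2; 1; 3];

% The mask selects rows (first index); the colon keeps all columns
raw1 = raw(randindex == 1, :);

disp(raw1);
```

Putting the mask in the second position instead would try to select columns, which fails as soon as the mask is longer than the number of columns.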
UPDATE: splitting the file evenly is surprisingly more trouble: because we're using random numbers for the index, it's not guaranteed to split evenly. So instead we can generate a vector of random floats and then pick out the lowest 33% of them to make index 1, the highest 33% to make index 3 and let the rest be 2.
randvec = rand(numrows, 1); % float between 0 and 1
pct33 = prctile(randvec,100/3); % value of 33rd percentile
pct67 = prctile(randvec,200/3); % value of 67th percentile
randindex = ones(numrows,1);
randindex(randvec>pct33) = 2;
randindex(randvec>pct67) = 3;
It probably still won't be absolutely even - 1838 isn't a multiple of 3. You can see how many members each group has this way
numel(find(randindex==1))
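If the Statistics Toolbox function prctile is not available, a different technique gives group sizes that differ by at most one: deal out the labels 1, 2, 3 in turn and shuffle them with randperm. This is a sketch of that alternative, not the percentile method above; labels and counts are illustrative names:

```matlab
numrows = 1838;

% Labels 1,2,3,1,2,3,... so the group sizes differ by at most one
labels = mod((0:numrows-1).', 3) + 1;

% Shuffle the labels so that group membership is random
randindex = labels(randperm(numrows));
randindex = randindex(:);   % make sure it is a column vector

% Number of records assigned to each group
counts = accumarray(randindex, 1);
disp(counts.');
```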

Count the number of times something showed up in the top 36 rows

I have a cell of size 1x7 where each cell inside of that is 365x5xN in which each N is a different location (siteID). It is already sorted according to column 5 (the columns are Lat, Lon, siteID, date, and data).
(The data can be found here: https://www.dropbox.com/sh/li3hh1nvt11vok5/4YGfwStQlo. Variable in question is PM25)
I want to go through the entire 1x7 cell and, looking at only the top 36 rows (basically, the top 10 percentile), count the number of times each date shows up. In other words, I want to know on which days the data value fell in the top 10 percentile.
Does anyone know how I can do this? I can't get my mind around how to approach this issue --> counting across all these cells and spitting out a quantity for each day of the year
Assuming you have a sorted cell array, you may use this -
%%// Get all the dates for all the rows in sorted cell array
all_dates = [];
for k1=1:size(sorted_cell,2)
all_dates = [all_dates reshape(cell2mat(sorted_cell{1,k1}(:,4,:)),1,[])];
end
all_unique_dates = unique(all_dates);
all_out = [num2cell(all_unique_dates)' num2cell(zeros(numel(all_unique_dates),1))];%%//'
%%// Get all the dates for the first 36 rows in sorted cell array
dates = [];
for k1=1:size(sorted_cell,2)
dates = [dates reshape(cell2mat(sorted_cell{1,k1}(1:36,4,:)),1,[])];
end
%%// Get unique dates and their counts
unique_dates = unique(dates);
count = histc(dates, unique_dates);
%%// As output create a cell array with the first column as dates
%%// and the second column as the counts
out = [num2cell(unique_dates)' num2cell(count)']
%%// Get all the dates and the corresponding counts.
%%// Thus many would still have counts as zeros.
all_out(ismember(all_unique_dates,unique_dates),:)=out;
Often when something looks tricky from the outside, it's easier to start from the inside instead. How can we get the top dates from a single array?
dates = unique(array(1:36,4));
Now, how to do that for each cell? A loop is always straightforward, but this is a pretty simple function, so let's use the one-liner:
datecell = cellfun(@(x) unique(x(1:36,4)), cellarray, 'UniformOutput', false);
Now we have just the dates we want, for each cell. If there's no need to keep them separated, let's just stick them all together into one big array:
dates = vertcat(datecell{:}); % stack the column vectors (cell2mat would need equal lengths)
dates = unique(dates); % in case there are any duplicates
If you want to actually count each date as well (it's a little unclear), it might be a little too involved for an anonymous function, so we could either write our own function to pass to cellfun, or just cop out and stick it in a loop:
dates = {};
counts = {};
for ii = 1:length(cellarray)
[dates{ii}, ~, idx] = unique(cellarray{ii}(1:36,4));
counts{ii} = accumarray(idx, 1);
end
Now, those cell arrays may contain duplication, so we'll have to combine the counts where necessary in a similar way:
dates = vertcat(dates{:});   % stack the per-cell column vectors
counts = vertcat(counts{:});
[dates, ~, idx] = unique(dates);
counts = accumarray(idx, counts); % add the counts of duplicated dates together
Note that re-assigning different data to the same variable names like this isn't particularly good practice - I'm just feeling exceptionally lazy tonight, and coming up with good, descriptive names is hard ;)
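The count-then-merge idea can be tried on toy data first. In this sketch, top_a and top_b are hypothetical day-of-year values standing in for the top rows of two cells:

```matlab
top_a = [12; 40; 12];
top_b = [40; 40; 7];

% Count occurrences within each cell separately
[dates_a, ~, ia] = unique(top_a);
counts_a = accumarray(ia(:), 1);
[dates_b, ~, ib] = unique(top_b);
counts_b = accumarray(ib(:), 1);

% Stack the per-cell results, then add up counts of dates seen in both cells
all_dates  = [dates_a; dates_b];
all_counts = [counts_a; counts_b];
[merged_dates, ~, idx] = unique(all_dates);
merged_counts = accumarray(idx(:), all_counts);

disp([merged_dates merged_counts]);
```

Day 40 appears once in the first cell and twice in the second, so its merged count is 3.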