MATLAB R2015b
I have a table containing a date string and a time string in various formats in two columns for each row:
11.01.2016 | 00:00:00 | data
10/19/16 | 05:29:00 | data
12.02.16 | 06:40 | data
I want to convert these two columns to one column with a common format:
31.12.2017 14:00:00
My current solution loops over each row, combines the two columns as strings, checks which format applies so that datetime can be called with an appropriate format string, and then uses datestr with the desired format string. datetime was not able to determine the format of the input strings automatically.
As you can imagine, this is horribly slow for large tables (approx. 50000 rows).
Is there any faster solution?
Thanks in advance.
I tried to vectorize the code. The trick is to
convert table > cell > char array, then
manipulate the char array, then
convert back from char array > cell > table.
An important detail is to pad every cell that holds a shorter string with the 'null' character, in a vectorized way. Without this, the conversion from cell > char array is not possible. Here is the code:
clc
clear all
%% create Table T
d={'11.01.2016';
'10/19/16';
'12.02.16'};
t={'00:00:00';
'05:29:00';
'06:40'};
dat=[123;
456;
789];
T = table(d,t,dat);
%% deal with dates in Table T
% separate date column and convert to cell
dd = table2cell(T(:,1));
% equalize the lengths of all elements of cell
% by padding 'null' in end of shorter dates
nmax=max(cellfun(@numel,dd));
func = @(x) [x,zeros(1,nmax-numel(x))];
temp1 = cellfun(func,dd,'UniformOutput',false);
% convert to array for vectorized manipulation of char strings
ddd=cell2mat(temp1);
% replace the separators in 3rd and 6th location with '.' (period)
ddd(:,[3 6]) = repmat(['.' '.'], length(dd),1);
% find indexes of shorter dates
short_year_idx = find(uint16(ddd(:,nmax)) == 0);
% find the year value for those short_year cases
yy = ddd(short_year_idx,[7 8]);
% replace null chars with '20XX' string in desired place
ddd(short_year_idx,7:nmax) = ...
[repmat('20',size(short_year_idx,1),1) yy];
% convert char array back to cell and replace in table
dddd = mat2cell(ddd,ones(1,size(d,1)),nmax);
T(:,1) = table(dddd);
%% deal with times in Table T
% separate time column and convert to cell
tt = table2cell(T(:,2));
% equalize the lengths of all elements of cell
% by padding 'null' in end of shorter times
nmax=max(cellfun(@numel,tt));
func = @(x) [x,zeros(1,nmax-numel(x))];
temp1 = cellfun(func,tt,'UniformOutput',false);
% convert to array for vectorized manipulation of char strings
ttt=cell2mat(temp1);
% find indexes of shorter times (assuming only ':00' at the end is missing)
short_time_idx = find(uint16(ttt(:,nmax)) == 0); % dirty hack, as null=0 in ascii
% replace null chars with ':00' string
ttt(short_time_idx,[6 7 8]) = repmat(':00',size(short_time_idx,1),1);
% convert char array back to cell and replace in table
tttt = mat2cell(ttt,ones(1,size(t,1)),nmax);
T(:,2) = table(tttt);
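As a side note on the padding step, a minimal sketch of a possibly simpler route: char applied to a cell array of strings pads the shorter rows with spaces, and cellstr strips that padding again on the way back. The variable names below are illustrative and the sample dates are the ones from the question.

```matlab
% Dates of unequal length, as in the question
dd = {'11.01.2016'; '10/19/16'; '12.02.16'};

% char pads shorter rows with trailing spaces to build a rectangular char matrix
ddd = char(dd);

% rows whose last column is a space are the short (two-digit year) dates
isShort = ddd(:, end) == ' ';

% cellstr removes the trailing spaces again when converting back
back = cellstr(ddd);
```

With this, the short rows are detected by a trailing space instead of a null character; the rest of the manipulation would stay the same.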
If you call the two columns cell arrays c1 and c2, then something like this should work:
c = datestr(datenum(strcat(c1,{' '},c2)), 'dd.mm.yyyy HH:MM:SS')
Then you would need to drop the old columns and put this one c in their place. On the inside, datenum must be doing something similar to what you're doing, however, so I'm not sure if this will be faster. I suspect that it is because (we can hope) the standard functions are optimized.
If your table isn't representing those as cell arrays, then you may need to do a pre-processing step to form the cell arrays for strcat.
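Because the rows mix several layouts, one way to keep the datenum idea vectorized is to classify the rows by pattern and call datenum once per layout with an explicit format. The following is only a sketch under assumptions: the three patterns and format strings are guesses based on the sample rows in the question, slash dates are taken to be month/day/year, and two-digit years rely on datenum's default pivot year.

```matlab
% Sample data mirroring the three layouts in the question
d = {'11.01.2016'; '10/19/16'; '12.02.16'};
t = {'00:00:00'; '05:29:00'; '06:40'};

% Give HH:MM times an explicit seconds field, then join date and time
t = regexprep(t, '^(\d{2}:\d{2})$', '$1:00');
s = strcat(d, {' '}, t);

% One regular expression and one datenum format per assumed layout
pats = {'^\d{2}\.\d{2}\.\d{4}$', '^\d{2}/\d{2}/\d{2}$', '^\d{2}\.\d{2}\.\d{2}$'};
fmts = {'dd.mm.yyyy HH:MM:SS',   'mm/dd/yy HH:MM:SS',   'dd.mm.yy HH:MM:SS'};

n = nan(numel(d), 1);
for k = 1:numel(pats)
    % logical mask of the rows whose date matches layout k
    mask = ~cellfun(@isempty, regexp(d, pats{k}, 'once'));
    if any(mask)
        n(mask) = datenum(s(mask), fmts{k});   % one vectorized call per layout
    end
end

% Common output format; rows that matched no pattern would still be NaN in n
out = cellstr(datestr(n, 'dd.mm.yyyy HH:MM:SS'));
```

The loop runs once per layout rather than once per row, so the cost should be dominated by a handful of vectorized datenum calls.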
I have a data set with multiple fields of which several of the fields are different names for equivalent properties. I have rescaled and adjusted the data so that the quantities are comparable and want to merge them into a single field.
As a toy example, let's say I have:
s = struct('pounds', [nan nan 4.8], 'pennies', [120 370 nan]);
s.pennies = s.pennies/100;
How do I merge my incomplete fields to get the desired output:
snew = struct('pounds', [1.2 3.7 4.8]);
If you have modified your field values such that they should be equivalent, and simply need to combine the non-NaN values, one option is to vertically concatenate the fields then use min or max down each column (which will ignore the NaN values). Then just remove the unwanted field with rmfield:
>> s = struct('pounds', [nan,nan,4.8], 'pennies', [120,370,nan]);
>> s.pounds = min([s.pounds; s.pennies./100], [], 1); % Scaling included here
>> s = rmfield(s, 'pennies')
s =
struct with fields:
pounds: [1.2000 3.7000 4.8000]
The following works for any number of fields. Since it is guaranteed that only one field is not NaN at each position, you can
Convert to a matrix such that each original field becomes a row of the matrix.
Keep only the numbers, ignoring NaN's. By assumption, this gives exactly one number per column.
Arrange that into a struct with the desired field name.
s = struct('pounds',[nan,nan,4.8], 'pennies', [120,370,nan])
s.pennies = s.pennies/100; % example data
target_field = 'pounds'; % field to which the conversion has been done
t = struct2cell(s); % convert struct to cell array
t = vertcat(t{:}); % convert cell array to matrix
t = t(~isnan(t)).'; % keep only numbers, ignoring NaN's
result = struct(target_field, t); % arrange into a struct
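Try my two-liner below:
c=struct2cell(s);
s=struct('pounds',unique([c{:}]));
You can also do it with the one-liner below:
s=struct('pounds',unique(cell2mat(cellfun(@(x) x(:), struct2cell(s),'UniformOutput',false)))')
Note that unique sorts the values and keeps every NaN as a separate element, so both versions return the merged numbers in ascending order followed by the NaN entries rather than in their original positions.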
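If the positions of the values matter, a short sketch that keeps them in place is to stack the fields as rows and take the column-wise max, which ignores NaN wherever a column contains at least one number. It assumes, as in the question, that every field is a row vector of the same length and that the fields have already been rescaled; snew is the name used in the question.

```matlab
% Fields already rescaled to the same unit (pennies divided by 100)
s = struct('pounds', [nan nan 4.8], 'pennies', [1.2 3.7 nan]);

c = struct2cell(s);        % one cell per field
m = vertcat(c{:});         % one row per field
merged = max(m, [], 1);    % column-wise max skips NaN entries

snew = struct('pounds', merged);
```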
Need to read in data stored as two columns of hex values in text file temp.dat into a Matlab variable with 8 rows and two columns.
Would like to stick with the fscanf method.
temp.dat looks like this (8 rows, two columns):
0000 7FFF
30FB 7641
5A82 5A82
7641 30FB
7FFF 0000
7641 CF05
5A82 A57E
30FB 89BF
% Matlab code
fpath = './';
fname = 'temp.dat';
fid = fopen([fpath fname],'r');
% Matlab treats hex as a character string
formatSpec = '%s %s';
% Want the output variable to be 8 rows two columns
sizeA = [8,2];
A = fscanf(fid,formatSpec,sizeA)
fclose(fid);
Matlab is producing the following which I don't expect.
A = 8×8 char array
'03577753'
'00A6F6A0'
'0F84F48F'
'0B21F12B'
'77530CA8'
'F6A00F59'
'F48F007B'
'F12B05EF'
In another variation, I attempted changing the format string like this
formatSpec = '%4c %4c';
Which produced this output:
A =
8×10 char array
'0↵45 F7↵78'
'031A3F65E9'
'00↵80 4A↵B'
'0F52F0183F'
'7BA7B0C20 '
'F 86↵0F F '
'F724700AB '
'F6 1F↵55 '
Still another variation like this:
formatSpec = '%4c %4c';
sizeA = [8,16];
A = fscanf(fid,formatSpec);
Produces a one by 76 character array:
A =
'00007FFF
30FB 7641
5A82 5A827641 30FB
7FFF 0000
7641CF05
5A82 A57E
30FB 89BF'
Would like and expect Matlab to produce a workspace variable with 8 rows and 2 columns.
Have followed the example on the Matlab help area here:
https://www.mathworks.com/help/matlab/ref/fscanf.html
My Matlab code is based on the 'read file contents into an array' section about 1/3 of the way down the page. The example I reference is doing something very similar except that the two columns are one int and one float rather than two characters.
Running Matlab R2017a on Redhat.
Here is the complete code with the solution provided by Azim and comments about
what I learned as a result of posting the question.
fpath = './';
fname = 'temp.dat';
fid = fopen([fpath fname],'r');
formatSpec = '%9c\n';
% specify the output size as the input transposed, NOT the input.
sizeA = [9,8];
A = fscanf(fid,formatSpec,sizeA);
% A' is an 8 by 9 character array, which is the goal matrix size.
% B is an 8 by 1 cell array, each member has this format 'dead beef'.
%
% Cell arrays are data types with indexed data containers called cells,
% where each cell can contain any type of data.
B = cellstr(A');
% split divides str at whitespace characters.
S = split(B)
fclose(fid);
S =
8×2 cell array
'0000' '7FFF'
'30FB' '7641'
'5A82' '5A82'
'7641' '30FB'
'7FFF' '0000'
'7641' 'CF05'
'5A82' 'A57E'
'30FB' '89BF'
It is likely that your 8x2 MATLAB variable would end up being a cell array. This can be done in two steps.
First, your lines have 9 characters so you could use formatSpec = '%9c\n' to read each line. Next you need to adjust the size parameter to read 9 rows and 8 columns; sizeA = [9 8]. This will read in all 9 characters into columns of the output; transposing the output will get you closer.
In the second step you need to convert the result of fscanf into your 8x2 cell array. Since you have R2017a you can then use cellstr and split to get your result.
Finally, if you need the integer values of each hex value you can use hex2dec on each cell in the cell-array.
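If the numeric values are the real goal, a sketch of a shorter route is to let fscanf parse the hex fields itself with the %x conversion. The temporary file name below is illustrative, and the signed 16-bit interpretation at the end is an assumption about the data, not something stated in the question.

```matlab
% Write the sample data to a temporary file (the path is illustrative)
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '0000 7FFF', '30FB 7641', '5A82 5A82', '7641 30FB', ...
                     '7FFF 0000', '7641 CF05', '5A82 A57E', '30FB 89BF');
fclose(fid);

% fscanf fills its output column by column, so read 2 rows and transpose
fid = fopen(fname, 'r');
A = fscanf(fid, '%x %x', [2 Inf]);
fclose(fid);
A = A.';                               % 8-by-2 matrix of doubles
delete(fname);

% Assumption: the values are 16-bit two's complement words
signedA = A;
neg = signedA >= 2^15;
signedA(neg) = signedA(neg) - 2^16;
```

As with the '%9c\n' approach, the size argument describes the transposed layout because of the column-major fill order.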
I have data in a cell array as shown in the variable viewer here:
{[2.13949546690144;56.9515770543056],
[1.98550875192835;50.4110852121618],
...}
I want to split it into two columns with two decimal-point numbers as:
2.13 56.95
1.98 50.41
by removing opening and closing braces and semicolons such as [;]
(to do as like "Text to columns" in Excel).
If your N-element cell array C has 2-by-1 numeric data in each cell, you can easily convert that into an N-by-2 numeric matrix M like so (using the round function to round each element to 2 decimal places):
M = round([C{:}].', 2);
The syntax C{:} creates a comma-separated list of the contents of C, equivalent to C{1}, C{2}, ... C{N}. These are all horizontally concatenated using [ ... ], then the result is transposed using .'.
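The expected values in the question (2.13 and 1.98) look truncated rather than rounded, so here is a small sketch comparing both on the sample data. Whether truncation is actually wanted is an assumption; Mrounded and Mtrunc are illustrative names.

```matlab
% Sample cell array from the question: one 2-by-1 vector per cell
C = {[2.13949546690144; 56.9515770543056], ...
     [1.98550875192835; 50.4110852121618]};

M = [C{:}].';                 % N-by-2 numeric matrix

Mrounded = round(M, 2);       % rounds to nearest: 2.14, 1.99 in the first column
Mtrunc   = fix(M * 100) / 100; % drops extra digits: 2.13, 1.98 in the first column
```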
% let's build a matching example...
c = cell(2,1);
c{1} = [2.13949546690144; 56.9515770543056];
c{2} = [1.98550875192835; 50.4110852121618];
% convert your cell array to a double array...
m = cell2mat(c);
% take the odd rows and place them to the left
% take the even rows and place them to the right
m = [m(1:2:end,:) m(2:2:end,:)];
% round the whole matrix to two decimal digits
m = round(m,2);
Depending on your environment settings, you may still see a lot of trailing zeros after the first two decimal digits... but don't worry, everything is ok (on the precision point of view). If you want to display only the "real" digits of your numbers, use this command:
format short g;
You should use cell2mat:
A={2.14,1.99;56.95,50.41};
B=cell2mat(A);
As for the rounding, you can do:
B=round(100*B)/100;
I've got an ASCII file and I'm trying to import it into MATLAB in order to make some plots. Is there any way of importing the data, even though it uses , (comma) rather than . (dot) as the decimal separator?
00:00:00,000;-2,14;
00:00:00,001;-1,80;
The first column I want to create is the time, corresponding to 00:00:00,001; 00:00:00,002; etc.
The second column should be the amplitude of the sample, i.e. -2,14; -1,80 etc.
Yup. First use importdata so that you can read each row of your text file as a cell in a cell array. After, to allow for the processing of your times to be performed in MATLAB, you'll need to replace each , character with a .. This will allow you to use MATLAB's commands for date and time processing. Specifically, use regular expressions to help you do this. Regular expressions help you find patterns in strings. We can use these patterns to help extract out the data you need. Use regexprep to replace all , characters with a ..
For the purposes of this answer, the example data that I'm going to be using is:
00:00:00,000;-2,14;
00:00:00,001;-1,80;
00:00:00,002;-0,80;
00:00:00,003;2,40;
00:00:00,004;3,78;
Therefore, assuming that your data is stored in a text file called data.txt, do:
%// Load in each row as a cell array
A = importdata('data.txt');
%// Each row has , replaced with .
Arep = regexprep(A, ',', '\.');
Now, what we can do is split up all of the quantities by using ; as the delimiter. We can use regexp to help us split up the quantities. We can further decompose the data by:
Arep_decomp = regexp(Arep, '[^;]+', 'match');
The first parameter is the cell array that contains each of our rows in the text file (with the commas converted to periods). The second parameter is a pattern that specifies what exactly you're trying to look for in each string in the cell array. [^;]+ means that you want to find every run of one or more characters that are not a semi-colon. Once we hit a semi-colon, the current match stops. 'match' means that you want to retrieve the actual matched strings, which will be stored as cell arrays.
The result after the above line's execution gives:
Arep_decomp{1}{1} =
00:00:00.000
Arep_decomp{1}{2} =
-2.14
Arep_decomp{2}{1} =
00:00:00.001
Arep_decomp{2}{2} =
-1.80
Arep_decomp{3}{1} =
00:00:00.002
Arep_decomp{3}{2} =
-0.80
Arep_decomp{4}{1} =
00:00:00.003
Arep_decomp{4}{2} =
2.40
Arep_decomp{5}{1} =
00:00:00.004
Arep_decomp{5}{2} =
3.78
You can see that the output cell array, Arep_decomp is a 5 element cell array, where each cell is a nested 2 element cell array, where the first element is the time, and the second element is the magnitude. Note that these are all strings.
What you can do now is create two numeric arrays that will convert these quantities into numeric representations. Specifically, the time format that you have looks like the form:
HH:MM:SS.FFF
H is for hours, M is for minutes, S is for seconds and F is for milliseconds. Use datenum to allow you to convert these time representations into actual date numbers. You would do this so that you can plot these on a graph, but then you perhaps want to display these times on the plot as well. That can easily be done by manipulating some plot functions. Nevertheless, use cellfun so that we can extract out the time strings as a separate array so we can use this for plotting later, and also use this to convert the time strings into date numbers via datenum, and convert the magnitude numbers into actual numbers.
Therefore:
datestr = cellfun(@(x) x{1}, Arep_decomp, 'uni', 0);
datenums = cellfun(@(x) datenum(x, 'HH:MM:SS.FFF'), datestr);
mags = cellfun(@(x) str2double(x{2}), Arep_decomp);
The first line of code extracts out each of the time strings as a single cell array - the uni=0 flag is important to do this. Next, we convert each time string into a date number, and we convert the magnitude strings into physical numbers by str2double.
Now, all you have to do is plot the data. That can be done by:
plot(datenums, mags);
set(gca, 'XTick', datenums);
set(gca, 'XTickLabel', datestr);
The above code plots the data where the date numbers are on the horizontal axis, the magnitude numbers are on the vertical axis, but we will probably want to rename the horizontal axis to be those time strings that you wanted. Therefore, we use two calls to set to ensure that the only ticks that are visible are from the date numbers themselves, and we relabel the date numbers so that they are the string representations of the times themselves.
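Once we run the above code, we get a plot of the magnitudes against time, with each tick on the horizontal axis labelled by its time string.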
Because the time step in between times is so small, it may clutter the horizontal axis as the labels are long, yet the interval is short. Therefore, you may consider only displaying times at a certain interval and you can do that by doing something like:
step_size = 5;
plot(datenums, mags);
set(gca, 'XTick', datenums(1:step_size:end));
set(gca, 'XTickLabel', datestr(1:step_size:end));
step_size controls how many ticks and labels appear in succession. Obviously, you need to make sure that step_size is smaller than the total number of points in your data.
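Another way to avoid long tick labels, sketched here as an alternative, is to plot against elapsed time since the first sample. The sample times below are made up for illustration, and the rounding assumes the data has millisecond resolution.

```matlab
% Hypothetical time strings with millisecond resolution
ts = {'00:00:00.000'; '00:00:00.001'; '00:00:00.004'};

% Date numbers are in days, so scale the differences to milliseconds
dn = datenum(ts, 'HH:MM:SS.FFF');
elapsed_ms = round((dn - dn(1)) * 86400 * 1000);
```

elapsed_ms can then be passed to plot in place of the date numbers, with the axis labelled in milliseconds.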
For your copying and pasting pleasure, this is what the full code I wrote looks like:
%// Load in each row as a cell array
A = importdata('data.txt');
%// Each row has , replaced with .
Arep = regexprep(A, ',', '\.');
Arep_decomp = regexp(Arep, '[^;]+', 'match');
datestr = cellfun(@(x) x{1}, Arep_decomp, 'uni', 0);
datenums = cellfun(@(x) datenum(x, 'HH:MM:SS.FFF'), datestr);
mags = cellfun(@(x) str2double(x{2}), Arep_decomp);
step_size = 1;
%step_size = 5;
plot(datenums, mags);
set(gca, 'XTick', datenums(1:step_size:end));
set(gca, 'XTickLabel', datestr(1:step_size:end));
I want to do something like
scatter(timesRefined, upProb)
where timesRefined is a cell array in which each entry is a string corresponding to a time moment, such as 8:32:21.122 and upProb is simply a vector of numbers with same length as cell array. What is the most convenient way to do this?
You can convert your timesRefined cell to a numeric representation of date with datenum
>> timesRefined = {'8:32:21.122','9:30:54.123'};
>> datenum(timesRefined)
ans =
734869.355800023
734869.396459757
The resulting number expresses a date as days from the epoch. Since you are not concerned with days, just time, and provided your observations are contained within one day, you can simply take the fractional part of the datenum output:
>> datestr(mod(datenum(timesRefined),1))
ans =
8:32 AM
9:30 AM
and do scatter(mod(datenum(timesRefined),1),upProb)
EDIT:
As pointed out by Pursuit, you can use the result of datenum directly as your x values and use datetick('x','HH:MM:SS.FFF')
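Put together, the datetick variant might look like the sketch below. The upProb values are made up for illustration, and the figure is created invisibly only so that the snippet also runs without a display.

```matlab
% Time strings from the example above and hypothetical probabilities
timesRefined = {'8:32:21.122'; '9:30:54.123'};
upProb = [0.4; 0.7];

% Convert to date numbers (days); the fractional part is the time of day
x = datenum(timesRefined);
x = x(:);                       % make sure x is a column like upProb

figure('Visible', 'off');
scatter(x, upProb);
datetick('x', 'HH:MM:SS.FFF');  % label the x axis with formatted times
```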
strsplit from the MATLAB File Exchange should help. If all values are numeric, you'll get a matrix back.
timestr = '8:32:21.122';
timenum = strsplit(timestr,':');
convmat = [60*60; 60; 1];
% force a column so the element-wise product matches convmat
time_in_seconds = sum(timenum(:) .* convmat);
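For releases that ship a built-in strsplit (which returns a cell array of strings rather than numbers), a hedged equivalent is to convert the pieces with str2double first. The variable names follow the snippet above; parts is an illustrative addition.

```matlab
timestr = '8:32:21.122';

% Built-in strsplit returns a 1-by-3 cell array of strings
parts = strsplit(timestr, ':');

% str2double converts the whole cell array to a 1-by-3 numeric row
timenum = str2double(parts);

% Row times column gives hours*3600 + minutes*60 + seconds
convmat = [60*60; 60; 1];
time_in_seconds = timenum * convmat;
```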