Suppressing unwanted spaces in dates - matlab

After reading in data using code from elsewhere, I have a matrix of dates called 'time' containing unwanted spaces that I want to remove.
I've tried isspace and regexprep with no luck:
time = regexprep(time, '\W', '');
I have about 130000 dates with the following format:
04-July  -2017 09:54:30.000
04-July  -2017 09:54:31.000
etc
There are two spaces between the end of 'July' and the next dash, which I want to remove to get:
04-July-2017 09:54:30.000
04-July-2017 09:54:31.000

Replace two or more spaces with nothing:
>> time = {'04-July  -2017 09:54:30.000'
'04-July  -2017 09:54:31.000'}
>> regexprep(time,' {2,}','')
{'04-July-2017 09:54:30.000'}
{'04-July-2017 09:54:31.000'}
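A slightly more tolerant variant, offered only as a sketch: it removes any run of whitespace that sits directly in front of a dash, so it should work whether one, two or more stray spaces are present. The sample data and the variable name cleaned are assumptions for illustration.

```matlab
% Sample data with two stray spaces before the second dash (assumed input)
time = {'04-July  -2017 09:54:30.000'
        '04-July  -2017 09:54:31.000'};

% Remove any whitespace run that is immediately followed by a dash;
% the space between the date and the time is left untouched
cleaned = regexprep(time, '\s+-', '-');
disp(cleaned)
```

Because the pattern is tied to the dash, the single space separating the date from the time of day survives.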

Unless you just want to correct your input file for later usage, you do not necessarily need to correct the input. There are several ways to parse the time directly with the extra spaces:
Let time be:
time = ['04-July -2017 09:54:31.000';
'04-July -2017 09:54:32.000']
Then, to parse the string representation of the datetime directly into a MATLAB serial date number, you can use:
%% get date in [MATLAB date serial number]
formatIn = 'dd-mmm -yyyy HH:MM:SS.FFF' ;
matlabTime = datenum(time,formatIn)
matlabTime =
736880.412858796
736880.41287037
This serial time representation is not so human readable but it's the fastest thing you can get if you want to do calculations with date/time.
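As a rough illustration of why the serial form is convenient for calculations, the sketch below computes the time elapsed between two timestamps. The two sample strings are assumed (already clean) values, and elapsedSeconds is just an illustrative name.

```matlab
% Two clean timestamps one second apart (assumed sample values)
stamps = {'04-Jul-2017 09:54:31.000'
          '04-Jul-2017 09:54:32.000'};
t = datenum(stamps, 'dd-mmm-yyyy HH:MM:SS.FFF');

% Serial date numbers count days, so multiply by 86400 to get seconds
elapsedSeconds = diff(t) * 86400;
fprintf('Elapsed: %.3f s\n', elapsedSeconds)
```

Since the values are plain doubles, differences are subject to floating-point rounding, so compare them with a tolerance rather than with ==.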
If your goal is simply to correct the string, then you can use the same trick to read the value in, and define exactly which output format you want out:
%% get date in [string]
formatIn = 'dd-mmm -yyyy HH:MM:SS.FFF' ;
formatOut = 'dd-mmm-yyyy HH:MM:SS.FFF' ;
stringTime = datestr(datenum(time,formatIn),formatOut)
stringTime =
04-Jul-2017 09:54:31.000
04-Jul-2017 09:54:32.000
If you want to use the new datetime objects, the input format has a slightly different syntax but the operation is roughly the same:
%% get date in [datetime] objects
formatIn = 'dd-MMM -yyyy HH:mm:ss.SSS' ;
t = datetime(time,'InputFormat',formatIn)
t =
04-Jul-2017 09:54:31
04-Jul-2017 09:54:32
Although the MATLAB console displays t in a human-readable format, t is now a datetime object. Check the documentation if you want to use this.
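A minimal sketch of what working with a datetime object can look like. The value is constructed from numeric components (an assumption, to keep the example independent of string parsing), and asText and later are illustrative names.

```matlab
% Build a datetime directly from numeric components (assumed sample value)
t = datetime(2017, 7, 4, 9, 54, 31);

% Format only controls display and text conversion, not the stored value
t.Format = 'dd-MMM-yyyy HH:mm:ss.SSS';
asText = char(t);

% datetime supports arithmetic with durations
later = t + seconds(1);
disp(asText)
disp(later)
```

The month abbreviation in the text output follows the system locale, so it may not read 'Jul' on every machine.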

Replace only the two spaces that follow the month and precede a dash (-):
>> date = '04-July  -2017 09:54:30.000';
>> regexprep(date, '(\w)  -', '$1-')
ans =
'04-July-2017 09:54:30.000'

Related

Trouble importing csv-data within MATLAB

I am trying to read in a csv-file that contains daily data on EUR/USD exchange rates including the dates specifying year, month and day. The problem is that using readtable(filename) puts single quotes around all table-entries and therefore hinders me using the data at all.
Detect import options:
opts = detectImportOptions('EUR_USD Historische Data.csv');
Read in the data:
EUR_USD = readtable('EUR_USD Historische Data.csv');
Extract dates and transform to datetime variable:
dt = EUR_USD(:,1);
dates = datetime(dt,'InputFormat','yyyyMMdd');
% Does not work because of single quotes
I was able to extract closing prices and make them workable, but I am not sure if this is an elegant way of doing so:
closing_prices = str2double(table2array(EUR_USD(:,5)));
Ultimately the goal is to make the data workable. I need to compare two columns with datetime-variables and if dates do not match between the two columns I need to remove that entry such that in the end both columns match.
This is the vector with dates:
[image: dates vector, wrong]
I need it to look like this:
[image: dates vector, correct]
I think all you need to do is remove the ' character in order to read the data into datetime correctly. Look at the following example:
%stringz is the same as dt here: just the string data
T = table;
T.stringz = string(['''string1'''; '''string2'''; '''string3''']);
stringz = T.stringz;
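%Preallocate the output, then run the for loop to remove the ' chars
strmat = strings(length(stringz), 1);
for i = 1:length(stringz)
    strval = char(stringz(i,1));
    strval = strval(2:end-1);   % drop the first and last character
    strmat(i,1) = string(strval);
end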
%Then load data into datetime after this for loop
dates = datetime(strmat,'InputFormat','yyyyMMdd');
strmat is a 3x1 string array with no ' characters on the outside of each string.
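The same clean-up can probably be done without a loop. The following is a sketch under the assumption that the dates arrive as a cell array of character vectors wrapped in literal single quotes; raw and clean are hypothetical names and the two date values are made up for illustration.

```matlab
% Hypothetical sample: date text wrapped in literal single quotes
raw = {'''20170704'''; '''20170705'''};

% strrep operates on every cell at once, so no loop is needed
clean = strrep(raw, '''', '');

% Parse the cleaned text into datetime values
dates = datetime(clean, 'InputFormat', 'yyyyMMdd');
disp(dates)
```

Note that strrep removes every quote character, not only the outer pair, which is fine for purely numeric date text.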

How to convert a numeric date vector into text date vector?

I have a numeric vector corresponding to dates in the following format yyyymmdd, i.e. for December 24th, 2010 it is 20101224. How can I convert it into text format, i.e. in the following format 'mm-dd-yyyy'?
You should really use datetime rather than convert to strings,
dates = datetime(20101224,'ConvertFrom','yyyymmdd')
The first input can be a numeric vector, assuming it's of the yyyymmdd format.
If you then want to specify a display format use,
dates.Format = 'MM-dd-yyyy'
If you really need them as strings you can then use,
dates = datestr(dates)
MATLAB has a datestr command which might be useful. Example usage:
formatOut = 'mm-dd-yyyy';
datestr(now,formatOut)
For your date, you could convert the input number to a string, convert the string to a date and create a date string with the new format.
formatIn = 'yyyymmdd';
formatOut = 'mm-dd-yyyy';
inStr = num2str(20101224); % Skip this step if the input is already a string
outStr = datestr(datenum(inStr, formatIn), formatOut)
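For a whole vector rather than a single value, the datetime route might look like the sketch below. The sample vector v and the names d and txt are assumptions for illustration.

```matlab
% Assumed sample input: numeric dates in yyyymmdd form
v = [20101224; 20110105];

% Convert the numbers to datetime values
d = datetime(v, 'ConvertFrom', 'yyyymmdd');

% Choose the text layout, then convert to a cell array of char vectors
d.Format = 'MM-dd-yyyy';
txt = cellstr(d);
disp(txt)
```

Keeping d around as a datetime array and converting to text only at the end avoids repeated parsing if the dates are needed for calculations.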

Output timezone with Matlab datestr()

Very simple question. I'm using Matlab's datetime type, so I can carry timezone information. I need to get a specific string representation, to input into a DB. But datestr() does not have any fields to output tz info.
a = datetime('now', 'TimeZone', 'UTC');
%need output in the format 'YYYYMMDDTHH:MM:SS+00:00'
Any thoughts?
You can get the output you want by setting the Format property of the datetime object to display the time zone offset, converting it to a character array, then replacing the space by 'T':
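>> a = datetime('now', 'TimeZone', 'UTC', 'Format', 'yyyyMMdd HH:mm:ssxxxxx');
>> out = strrep(char(a), ' ', 'T')
Here out is a character vector of the form yyyyMMddTHH:mm:ss+00:00.
Take note of the case of the letters in the format string, as it matters for some of them: lowercase ss gives whole seconds, whereas uppercase SS gives fractional-second digits (which can produce impossible-looking values such as 74 in the seconds position).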
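An alternative sketch that puts the literal T into the format itself, so no strrep step is needed. It assumes that quoted literal text is accepted in a datetime Format, and it uses a fixed, made-up timestamp instead of 'now' so the result is reproducible.

```matlab
% Fixed sample instant in UTC (assumed value, instead of 'now')
a = datetime(2017, 10, 2, 21, 37, 14, 'TimeZone', 'UTC', ...
    'Format', 'yyyyMMdd''T''HH:mm:ssxxxxx');

% Literal text in a datetime format is wrapped in single quotes,
% which are doubled inside a MATLAB character vector
out = char(a);
disp(out)
```

The lowercase x offset specifier is what yields +00:00 for UTC, matching the form requested in the question.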

Why is datestr('19-01-2004') = 26-Jun-0024 in MATLAB R2011a?

I also tried the following:
datestr('19-01-2004','dd-mm-yyyy')
ans =
26-06-0024
I am new to MATLAB, so I am not sure what else to check.
In datestr(), the second parameter specifies what the output should look like. It says nothing about the input.
Essentially, you try to perform 2 steps: parse a string and then format the parsed date again.
So you can do
n = datenum('19-01-2004','dd-mm-yyyy')
datestr(n, 'yyyy-mm-dd')
and you'll get an n of 731965 and a final output of 2004-01-19.
You can as well do
v = datevec('19-01-2004','dd-mm-yyyy')
datestr(v, 'yyyy-mm-dd')
and your v becomes [2004 1 19 0 0 0].
So remember: step 1 - parsing of input with the appropriate format string, step 2 - formatting of output with the wanted format string.
If you want to give the date in a "clean" and readable format, you could just do
v = [2004 1 19 0 0 0]
datestr(v, 'yyyy-mm-dd')
datestr(v, 'dd.mm.yyyy')
datestr(v, 'mm/dd/yyyy')
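The parse-then-format idea can be wrapped into a small helper. This is only a sketch; reformatDate is a hypothetical name, not a built-in function.

```matlab
% Hypothetical helper: step 1 parses with formatIn, step 2 formats with formatOut
reformatDate = @(str, formatIn, formatOut) ...
    datestr(datenum(str, formatIn), formatOut);

iso = reformatDate('19-01-2004', 'dd-mm-yyyy', 'yyyy-mm-dd');
disp(iso)   % 2004-01-19
```

Passing the input format explicitly is the point of the design: it removes the guessing that produced the year 24 result in the question.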
When using datestr to convert a date string from one form to another, the format of the input date string is limited to those listed in the datestr documentation. The format of your input '19-01-2004' is 'dd-mm-yyyy', which is not one of the supported formats.
If we change the input string to '01/19/2004', which is the supported format 'mm/dd/yyyy', we get the correct output:
>> datestr('01/19/2004','dd-mm-yyyy')
ans =
19-01-2004
To circumvent the limited number of supported input formats, the documentation recommends using datenum first. So you can map your original input onto itself like:
>> datestr(datenum('19-01-2004','dd-mm-yyyy'),'dd-mm-yyyy')
ans =
19-01-2004
The date MATLAB returns is a consequence of how it handles the unrecognized format.
I suspect that whatever method it uses to settle on a format produces a very small date number, hence the year 24 in the output.

How to check input string format correctness, without reading the string?

I am writing a script that takes two strings of the format 'HH:MM' as inputs. These strings are times in hours (HH) and minutes (MM). I would like to display an error message if the user inputs the wrong format for a time, such as 'HH:MM:SS' if they think the script can interpret seconds as well. I have it set up to accept negative times, so an input like '-HH:MM' will be interpreted correctly. An input like 'HHH:MMM' with variable hour and minute sizes is also OK, actually any input of the form %s:%s should be accepted since errors like '5:30 AM' are dealt with later.
What I need is to test that the inputs are of the form "string colon string" before reading, is this possible? To make the problem clearer, here is code explaining how I read the inputs time1 and time2:
[hour1, min1] = strread(time1, '%s%s', 'delimiter', ':');
[hour2, min2] = strread(time2, '%s%s', 'delimiter', ':');
If time1 and time2 are formatted wrong, strread throws an unhelpful error. I want to display my own error first to explain what the problem was. How can I check the formats of time1 and time2 before actually reading them?
Ideas:
formatSpec = '%s : %s';
input = textscan(time1,formatSpec);
%Compare input to formatSpec somehow to see if they match?
if (no_match)
error('time1 must be formatted as HH:MM');
end
You can try something like this:
time1 = '10:21';
if isempty(regexp(time1,'^\d{2}:\d{2}$'))
    disp('the format is wrong') %won't display because the format is ok
end
And to check another input:
time1 = '100:21';
if isempty(regexp(time1,'^\d{2}:\d{2}$'))
    disp('the format is wrong') %will display because the format is wrong
end
EDIT
If you want to accept 'HHH:MMM' and other cases use:
regexp(time1,'^\d+:\d+$')
And for the negative case ('-HHH:MMM' or other negative cases) use:
regexp(time1,'^-\d+:\d+$')
Second edit
And if you want to test it in only one line:
regexp(time1,'^-?\d+:\d+$') % optional minus sign; this one doesn't support 'HH:MM AM'
regexp(time1,'^-?\d+:\d+[^:]*$') % now supports 'HH:MM AM' but still rejects a second colon
It returns 1 (the match position) for every accepted case you mentioned.
It looks like you accept any numbers as long as there is only one : sign. In other words, perhaps you want to detect the more-than-one-colon case? You could count the number of : signs and raise an error for those cases before processing the string.
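One way to express the "exactly one colon, something on each side" check, as a sketch only: isTimeLike is a hypothetical helper name, and the sample inputs are assumptions chosen to mirror the cases described in the question.

```matlab
% Hypothetical helper: true when the input is "non-colon text : non-colon text"
isTimeLike = @(s) ~isempty(regexp(s, '^[^:]+:[^:]+$', 'once'));

% Assumed sample inputs: the first three should pass, the last two should fail
inputs = {'10:21', '-5:30', '5:30 AM', '10:21:30', '1021'};
for k = 1:numel(inputs)
    fprintf('%-10s -> %d\n', inputs{k}, isTimeLike(inputs{k}));
end
```

In a script this could guard the read, for example by calling error('time1 must be formatted as HH:MM') when isTimeLike(time1) is false, leaving cases like '5:30 AM' to the later checks the question mentions.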