How to import dates correctly from this .csv file into Matlab?

I have a .csv file with the first column containing dates, a snippet of which looks like the following:
date,values
03/11/2020,1
03/12/2020,2
3/14/20,3
3/15/20,4
3/16/20,5
04/01/2020,6
I would like to import this data into Matlab (I think the best way would probably be using the readtable() function). My goal is to bring the dates into Matlab as a datetime array. As you can see above, the problem is that the dates in the original .csv file are not consistently formatted. Some of them are in the format mm/dd/yyyy and some of them are mm/dd/yy.
Simply calling data = readtable('myfile.csv') on the .csv file results in the following, which is not correct:
'03/11/2020' 1
'03/12/2020' 2
'03/14/0020' 3
'03/15/0020' 4
'03/16/0020' 5
'04/01/2020' 6
Does anyone know a way to automatically account for this type of data in the import?
Thank you!
My version: Matlab R2017a
EDIT ---------------------------------------
Following the suggestion of Max, I have tried specifying some of the input options for the read command using the following:
T = readtable('example.csv',...
'Format','%{dd/MM/yyyy}D %d',...
'Delimiter', ',',...
'HeaderLines', 0,...
'ReadVariableNames', true)
which results in:
date values
__________ ______
03/11/2020 1
03/12/2020 2
NaT 3
NaT 4
NaT 5
04/01/2020 6
and you can see that this is not working either.

If you are sure all the dates involved do not go back more than 100 years, you can easily apply the pivot method that was in use in the last century (before the Y2K bug warned the world of the danger of the method).
They used to code dates in 2 digits only, knowing that 87 actually meant 1987. A user (or a computer) would add the missing years automatically.
In your case, you can read the full table, parse the dates, then it is easy to detect which dates are inconsistent. Identify them, correct them, and you are good to go.
With your example:
a = readtable(tfile) ; % read the file
dates = datetime(a.date) ; % extract first column and convert to [datetime]
idx2change = dates.Year < 2000 ; % Find which dates were in the short (2-digit year) format
dates.Year(idx2change) = dates.Year(idx2change) + 2000 ; % Correct truncated years
a.date = dates % reinject corrected [datetime] array into the table
yields:
a =
date values
___________ ______
11-Mar-2020 1
12-Mar-2020 2
14-Mar-2020 3
15-Mar-2020 4
16-Mar-2020 5
01-Apr-2020 6
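
For readers who want to check the pivot logic outside Matlab, here is a minimal sketch in Python. It is a variation on the answer above, not the same code: it tests for a two-digit year before building the date instead of correcting the year afterwards. The function name parse_mixed_date and the pivot_century parameter are made up for this illustration, and the sketch assumes every short year belongs to the 2000s.

```python
from datetime import datetime

def parse_mixed_date(text, pivot_century=2000):
    """Parse 'mm/dd/yyyy' or 'mm/dd/yy' (hypothetical helper).

    Assumption: any year written with fewer than three digits
    belongs to the century given by pivot_century.
    """
    month, day, year = (int(part) for part in text.split("/"))
    if year < 100:  # short form, e.g. "20" meaning 2020
        year += pivot_century
    return datetime(year, month, day)

samples = ["03/11/2020", "3/14/20", "04/01/2020"]
for s in samples:
    print(s, "->", parse_mixed_date(s).date().isoformat())
```

The same caveat applies as in the answer: the method only works if no date in the file can legitimately fall outside the assumed century.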

Instead of specifying the format explicitly (as I also suggested before), one should use import options; in the case of a csv file, use delimitedTextImportOptions:
opts = delimitedTextImportOptions('NumVariables',2,...% how many variables per row?
'VariableNamesLine',1,... % is there a header? If yes, in which line are the variable names?
'DataLines',2,... % in which line does the actual data start?
'VariableTypes',{'datetime','double'})% as what data types should the variables be read
readtable('myfile.csv',opts)
because this neat little feature recognizes the format of the dates automatically, since it knows that the variable must be a datetime object =)

Related

How to transform date in Stata?

I've looked for help on the internet for the following, but I could not find a satisfying answer: for an assignment, I need to plot the time series of a certain variable (the term spread in percentages), with years on the x-axis.
However, we use daily data. Does anybody know a convenient way in which this can be done? The 'date' variable that I've got is formulated in the following way: 20111017 represents the 17th of October 2011.
I tried to extract the first 4 numbers of the variable 'date', by using the substr(date, 1, 4) command, but the message 'type mismatch' popped up. Also, I'm not quite sure if it gives the right information if I only use the years to plot daily data (over the years). It now gives the following graph, which doesn't look that nice.
Answering the question in your title.
The date() function expects a string. If your variable with value 20111017 is in a numeric format you can convert it like this: tostring datenum , gen(datestr).
Then when using the date() function you must provide a mask that tells Stata what format the date string is in. Below is a reproducible example you can run to see how this works.
* Example generated by -dataex-. For more info, type help dataex
clear
input float datenum
20111016
end
* Convert numeric variable to string
tostring datenum , gen(datestr)
* Convert string to date
gen date = date(datestr, "YMD")
* Display date as date
format date %td
If this does not help you, try to provide a reproducible example.
This adds some details to the helpful answer by @TheIceBear.
As he indicates, one way to get a Stata daily date from your run-together date variable is to convert it to a string first. But tostring is just one way to do that and not essential. (I have nothing against tostring, as its original author, but it is better suited to other tasks.)
Here I use daily() not date(): the results are identical, but it's a good idea to use daily(): date() is all too often misunderstood as a generic date function, whereas all it does is produce daily dates (or missings).
To get a numeric year variable, just divide by 10000 and round down. You could convert to a string, extract the first 4 characters, and then convert to numeric, but that's more operations.
clear
set obs 1
gen long date = 20111017
format date %8.0f
gen ddate = daily(strofreal(date, "%8.0f"), "YMD")
format %td ddate
gen year = floor(date/10000)
list
+-----------------------------+
| date ddate year |
|-----------------------------|
1. | 20111017 17oct2011 2011 |
+-----------------------------+
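
The arithmetic behind floor(date/10000) can be checked in any language. The following Python sketch (not Stata) splits a run-together yyyymmdd integer into year, month and day using integer division only; the function name split_yyyymmdd is invented for this example.

```python
from datetime import date

def split_yyyymmdd(value):
    """Turn an integer such as 20111017 into a date (hypothetical helper)."""
    year, rest = divmod(value, 10000)  # 20111017 -> (2011, 1017)
    month, day = divmod(rest, 100)     # 1017     -> (10, 17)
    return date(year, month, day)

d = split_yyyymmdd(20111017)
print(d.isoformat(), d.year)
```

No string conversion is needed at any step, which is the point made above about dividing by 10000 and rounding down.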

SAS: Get current year in YY format

I want to assign the current year in a YY format to either a macro or data set variable.
I am able to use the automatic macro variables &sysdate or &sysdate9 to get the current date. However, extracting the year in a YY format is proving to be a nightmare. Below are some examples of what I've been trying.
There exists the YEARw. format. But when I try to use it I get errors or weird results. For instance, running
data _null_;
yy = year(input("&sysdate9.", year2.));
put yy=;
run;
produces the error
ERROR 48-59: The informat YEAR was not found or could not be loaded.
If I try to format the variable in the output, I get 1965 instead of the current year. The following
data _null_;
yy = year(input("&sysdate9.", date9.));
put yy= yy year2.;
run;
outputs
yy=2016 65
Please help.
This works to get you the 2-digit year number of the current year:
DATA _NULL_;
YEAR = PUT(TODAY(),YEAR2.);
PUT YEAR;
RUN;
/* Returns: 16 */
To break down what I am doing here:
I use TODAY() to get the current date as a DATE type. &SYSDATE needs to be converted to a DATE, but also it is the date that the SAS session started. TODAY() is the current date.
PUT allows us to pass in a non-character (numeric/date) value, which is why it is used with TODAY() as opposed to INPUT.
I think it is worth exploring the issues here in more detail.
First, Formats are patterns for converting numeric values to a human readable format. That's what you want to do here: convert a date value to a human readable format, in this case to a year.
Informats, on the other hand, convert human readable information to numeric values. That's not what you're doing here; you have a value already.
Second, put matches with Formats, and input matches with informats, exclusively.
Third, you get close in your last try: but you misuse the year format. Formats are basically value mappings, so they map every possible numeric value in their range (sometimes "all values" is the range, sometimes not) to a display value (string). You need to know what kind of value is expected on the input. YEARw. expects a date value as input, not a year value: meaning input is "number of days from 1/1/1960", mapped to "year". So you cannot take a value you've already mapped to a year value and map it again with that method; it will not make any sense.
Let's look at it:
data _null_;
yy = year(input("&sysdate9.", date9.));
put yy= yy year2.;
run;
yy contains the result of the year function - 2016. Good so far. Now, you need the 2 digit year (16); you can get that through mod function, if you like, or put/substr/input:
data _null_;
yy = input(substr(put(year(input("&sysdate9.", date9.)),4.),3,2),2.);
put yy=;
run;
mod is probably easier though since it's a number. But of course you could've used the year2. format directly on the date value:
data _null_;
yy = put(input("&sysdate9.", date9.),year2.);
put yy=;
run;
Now, yy is character, so you could wrap that with input(...,2.) or leave it character depending on your purposes.
Finally - a use note on &sysdate9.. You can easily make this a date without input:
"&sysdate9."d
So:
yy = put("&sysdate9."d,year2.);
That's called a date literal (and "..."dt and "..."t also work for datetime,time). They require things in the standard SAS formats to work properly.
And as pointed out in Nicarus' answer, today() is a bit better than &sysdate9 since it is guaranteed to be today. If you're running this in batch or restart your session daily, this won't matter, but it will if you have a long-running session.
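
The mod route mentioned above is only described in words, so here is a small sketch of the arithmetic, written in Python and not in SAS. The function names are made up for the example; the only idea shown is that the two-digit year is the four-digit year modulo 100, zero-padded if it is needed as text.

```python
from datetime import date

def two_digit_year(d):
    """Numeric two-digit year: 2016 -> 16 (hypothetical helper)."""
    return d.year % 100

def two_digit_year_text(d):
    """Same value as zero-padded text, so 2005 gives '05' and not '5'."""
    return f"{two_digit_year(d):02d}"

today = date.today()
print(two_digit_year(today), two_digit_year_text(today))
```

The zero padding matters for years such as 2005, which is one reason to decide early whether the result should be numeric or character.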
Apply the year function to the date variable
Convert to string
Take last 2 digits
EDIT: change input to PUT
Year = substr(put(year(today()), 4.), 3);

Why does Matlab dbf-reader read certain integers wrong?

I use the matlab dbf reader available here.
I've noticed that three-digit integers are sometimes read wrong.
Original data from dbf-file:
LAMAX,DTLD,1,599,727Q9,A,STANDARD,1,18,18,0,2359.5
But looking at the data in Matlab you see that 599 becomes 995.
Why is that?
'LAMAX','DTLD',[1],[995],'727Q9','A','STANDARD','1','18','18','0',[2.3595e+03]
This is how I read the dbf file with matlab code
[dbfData, NAMES] = dbfread(path2file);
where dbfData is the actual data and NAMES are the field names in the dbf-file.
EDIT:
The dbf-file was created with INM
When I open the dbf file using OpenOffice the headers look like this
METRIC_ID,C,6 ; GRID_ID,C,8 ; I_INDEX,N,3,0 ; J_INDEX,N,3,0 ; ACFT_ID,C,12 ; OP_TYPE,C,1 ; PROF_ID1,C,8 ; PROF_ID2,C,1 ; RWY_ID,C,8 ; TRK_ID1,C,8 ; TRK_ID2,C,1 ; DISTANCE,N,9,1
The distorted integers are stored as 3-digit numbers without decimals (J_INDEX,N,3,0).
Have you used the updated version of STR2DOUBLE2CELL?
From the link above:
STR2DOUBLE2CELL subfunction sometimes works incorrectly if number of digits in the input parameter is different

Select specific filenames from an array of filenames containing a date in the name

If I have a group of .wav files and I'm trying to pick files by month, or to compute daily or night-only PSD (power spectral density) averages, how should I go about it? The following are the first 10 .wav filenames from a .txt file that is read into the matlab code:
AMAR168.1.20150823T200235Z.wav
AMAR168.1.20150823T201040Z.wav
AMAR168.1.20150823T201845Z.wav
AMAR168.1.20150823T202650Z.wav
AMAR168.1.20150823T203455Z.wav
AMAR168.1.20150823T204300Z.wav
AMAR168.1.20150823T205105Z.wav
AMAR168.1.20150823T205910Z.wav
AMAR168.1.20150823T210715Z.wav
The yyyymmddTHHMMSSZ.wav part of each filename encodes the date and time, which is what the selection should be based on.
Many thanks
You need to be more specific.
Do all files always start with "AMAR168.1." for instance?
Anyway, here's a general approach to get you started:
AllFilenames = fileread ('filenames.dat');
FileNames = strsplit (AllFilenames, '\n');
for i = FileNames
if ~isempty (strfind (i{:}, '20150823')); disp(i{:}); end
end
Your filename examples aren't very useful because they all have the same date, but, anyway, you get the point.
Alternatively, if the filenames always have the same format and size, you could do, e.g.:
AllFilenames = fileread ('filenames.dat');
AllFilenames = strvcat (strsplit (AllFilenames, '\n'));
LogicalIndices = categorical (cellstr (AllFilenames(:,15:16))) == '08';
to obtain all rows where the month is '08', for instance. This assumes that the month is always at positions 15 to 16 in the string.
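
If fixed character positions feel fragile, another option is to parse the whole timestamp and filter on its fields. The sketch below is in Python, not MATLAB, and rests on assumptions: every name has the form prefix.number.yyyymmddTHHMMSSZ.wav, the September filename is invented for the example, and "night" is arbitrarily taken as 21:00 to 06:00 UTC.

```python
from datetime import datetime

def timestamp_from_name(name):
    """Extract the timestamp from e.g. 'AMAR168.1.20150823T200235Z.wav'."""
    stamp = name.split(".")[2]  # third dot-separated field
    return datetime.strptime(stamp, "%Y%m%dT%H%M%SZ")

names = [
    "AMAR168.1.20150823T200235Z.wav",
    "AMAR168.1.20150823T210715Z.wav",
    "AMAR168.1.20150901T030000Z.wav",  # hypothetical name, not from the question
]

# Files recorded in August
august = [n for n in names if timestamp_from_name(n).month == 8]

# Files recorded at "night" (assumed here to mean 21:00 to 06:00)
night = [n for n in names
         if timestamp_from_name(n).hour >= 21 or timestamp_from_name(n).hour < 6]

print(august)
print(night)
```

Once the timestamps are real date values, grouping by day or month for the PSD averages becomes a matter of comparing fields instead of substrings.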

How to produce a formatted date string in Q/KDB?

How can one produce an ISO date string "yyyy-MM-dd" from a Q date type? I looked at concatenating the various parts but am not even able to get the day/month, e.g. d:2015.12.01;d.month prints 2015.12, i.e. more than just the month.
If you plan to do it on a large scale (i.e. a large vector/list of dates or a column in a table) and you're sure your dates are always well-formed, then you could use a dot-amend:
q)update .[;(::;4 7);:;"-"]string date from ([] date:2#.z.D)
date
------------
"2016-01-04"
"2016-01-04"
This way you wouldn't have to apply to "each" entry of the vector/list, it works on the vector/list itself.
q)"-" sv "." vs string[2015.12.01]
"2015-12-01"
vs (vector from scalar) splits the string by "." above;
sv (scalar from vector) joins the parts with "-" above.
Remember a string is just a char array, so you can grab each part as you require with indexing. But the above is useful because vs gives a 3-length vector that you can manipulate any way you like.
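
For readers less familiar with q, the split-then-join idea above maps directly onto string methods in other languages. This Python sketch is only an analogy, with the q date written out as plain text:

```python
# Text form of the q date 2015.12.01
q_date_text = "2015.12.01"

parts = q_date_text.split(".")  # like "." vs ...: ['2015', '12', '01']
iso = "-".join(parts)           # like "-" sv ...: joins the parts with "-"

print(parts)
print(iso)
```

As in q, the intermediate list of three parts can be reordered or reformatted before joining.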
I believe the shortest (and cleanest) option for ISO8601 UTC timestamp available since at least kdb v3.4 would be to use .h.iso8601 builtin
i.e.
q).h.iso8601 .z.p
"2020-11-09T15:42:19.292301000"
Or, if you just need milliseconds similar to what JS toISOString() does, use:
q).isotime:{(23#.h.iso8601 x),"Z"}
q).isotime[.z.p]
"2020-11-09T16:02:02.601Z"
q).isotime[2015.12.01]
"2015-12-01T00:00:00.000Z"
Note .z.p is important, as .h.iso8601 .z.P would silently give you local time without timezone (+0100 etc) so it would still be interpreted as UTC by compliant ISO8601 parser :(
Check out this GitHub library for datetime formatting. It supports the Excel way of formatting date and time. It might not be the right fit for formatting a large number of objects.
q).dtf.format["yyyy-mm-dd"; 2018.06.08T01:02:03.456]
"2018-06-08"
time formatting :
q).dtf.format["yyyy-mmmm-dd hh:uu AM/PM"; 2018.01.08T01:02:03.456]
"2018-January-08 01:02 AM"
I am using something like this:
q)ymd:{[x;s](4#d),s,(2#-5#d),s,-2#d:string[x]}
q)ymd[.z.D;"-"]
"2016-01-25"
q)ymd[.z.D;"/"]
"2016/01/25"
q)ymd[.z.D;""]
"20160125"
Or for tables:
q)t:([]a:5#1;5#.z.d)
q)update s:ymd[;"-"] each d from t
a d s
-------------------------
1 2016.01.26 "2016-01-26"
1 2016.01.26 "2016-01-26"
1 2016.01.26 "2016-01-26"
1 2016.01.26 "2016-01-26"
1 2016.01.26 "2016-01-26"
Please change the separator like - or / in the update statement.
update s:{ssr[string x;".";y]}'[d;"-"] from ([]a:5#1;5?.z.d)
a d s
-------------------------
1 2010.12.31 "2010-12-31"
1 2012.08.24 "2012-08-24"
1 2004.12.05 "2004-12-05"
1 2000.10.02 "2000-10-02"
1 2006.09.10 "2006-09-10"