How can I accurately import "dates" from .csv files in SAS?

I am trying to import .csv files into SAS, and they include dates and times.
In the csv files, they are stored as "m-d-yyyy hh:mm:ss" (I am not allowed to change the data in Excel; I have to work in SAS only).
The problem is that when SAS reads a value, it treats m as d and d as m. :(
For example, "9-1-2016 8:00:57" in Excel is converted to 09JAN16:08:00:57 in SAS.
I want values like 01SEP16:08:00:57.
How can I accurately import dates from .csv files in SAS?
Thanks.

I see that you've solved this by using R instead. In case anyone wants an answer for this in future:
One thing you can try is to change the DATESTYLE system option so that SAS reads ambiguous dates as MDY instead of DMY. To do this, run the following before the import:
options datestyle=mdy;
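To see why the option matters, here is a minimal sketch of the ambiguity itself, written in Python rather than SAS purely as an illustration. The helper names `parse_datetime` and `sas_style` are made up for this example; the output layout simply mimics the values shown in the question.

```python
from datetime import datetime

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

def parse_datetime(text, order):
    """Parse 'a-b-yyyy hh:mm:ss', reading a-b as month-day ('mdy') or day-month ('dmy')."""
    formats = {"mdy": "%m-%d-%Y %H:%M:%S", "dmy": "%d-%m-%Y %H:%M:%S"}
    return datetime.strptime(text, formats[order])

def sas_style(dt):
    """Render a datetime in the ddMONyy:hh:mm:ss layout shown in the question."""
    month = MONTHS[dt.month - 1]  # fixed English names, independent of the system locale
    return f"{dt.day:02d}{month}{dt.year % 100:02d}:{dt.strftime('%H:%M:%S')}"

raw = "9-1-2016 8:00:57"
as_mdy = sas_style(parse_datetime(raw, "mdy"))
as_dmy = sas_style(parse_datetime(raw, "dmy"))
print(as_mdy)  # 01SEP16:08:00:57 (what the asker wants)
print(as_dmy)  # 09JAN16:08:00:57 (what the asker got)
```

The same text is valid under both orders, so the reader has to be told which one to use; nothing in the value itself settles it.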

Related

MATLAB stops responding when using xlsread to read a large spreadsheet

I am trying to use the xlsread function to read spreadsheets of 6000x2700 (xlsx files).
I have two questions.
First, when I use something like
[num,txt,~]=xlsread(input_file,input_sheet,'A1:CYY6596')
MATLAB keeps showing 'Busy' and stops responding (while I can open the file in Excel within 30 seconds).
Is there any solution if I don't want to loop through ranges of the xlsx file? In other words, can I just load a spreadsheet of this size into MATLAB using xlsread?
Alternatively, maybe I can use loops to read these files range by range, but I cannot identify the last column of each spreadsheet unless I read the whole file first. Without knowing the last column, it is hard to set up the loops and process the file.
So my second question is: is there a way to identify the last column of the spreadsheet without reading the whole spreadsheet?
Thanks.
EDIT: However, if I run similar code that reads only the first 400 columns ('A1:RY6596') of the spreadsheet, the problem doesn't happen.
Which version of MATLAB are you using?
MATLAB has problems loading big Excel files.
Convert the Excel file to csv and use M = csvread(filename).
You can also try converting the .xlsx file to .xls.
You can also try the tool on the File Exchange.

Variables format in SAS proc import

When importing an Excel file into SAS, I find that the import is not done properly because of a wrong variable format.
The table I am trying to import looks like this:
ID Barcode
1 56798274
2 56890263
3 60998217
4 SKU89731
...
The code I am using is the following:
PROC IMPORT OUT= WORK.test
DATAFILE= "D:\Barcode.xlsx"
DBMS=EXCEL REPLACE;
RANGE="Sheet1$";
GETNAMES=YES;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
What happens is that the column "Barcode" gets the best12. format, and therefore rows such as ID=4 get a missing value ('.') because the original values contain both letters and digits.
Since it is not possible to change the format of a variable in a proc step, how can I import the file correctly using only the SAS editor?
EDIT:
Another option that does half the work, and might give some inspiration, is to change the type of the variable dynamically by importing through a data step:
libname excelxls Excel "D:\Barcode.xlsx";
data want;
set excelxls.'Sheet1$'n (DBSASTYPE=(Barcode='CHAR(255)'));
run;
With the above code I force SAS to import the variable with the type I want (char), but missing values are still generated for values such as the one in ID=4.
I think your problem is that you have mixed=NO. Change this to mixed=YES and SAS will check a sample of the observations to see if there are any non-numeric characters in the variable; if it finds one, it will define the variable as character.
Take a look at the link here for more information:
You could convert the file to csv (or maybe xls?) and either:
use the guessingrows= option to increase the number of rows SAS uses to determine the type, or,
if you want complete control of the import, copy the data step code that proc import writes to the log and paste it into your program. You can then modify it to read the data exactly as you want.
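The effect of sampling more rows can be sketched as follows. This is a Python illustration under stated assumptions, not SAS's actual algorithm: `guess_column_type` and its `guessing_rows` parameter are hypothetical names that only mirror the idea behind guessingrows=.

```python
def guess_column_type(values, guessing_rows):
    """Guess 'numeric' or 'character' by looking only at the first guessing_rows values."""
    for value in values[:guessing_rows]:
        try:
            float(value)
        except ValueError:
            # One non-numeric value in the sample is enough to call the column character
            return "character"
    return "numeric"

# The Barcode column from the question: the text value only shows up in row 4
barcodes = ["56798274", "56890263", "60998217", "SKU89731"]

print(guess_column_type(barcodes, guessing_rows=3))  # numeric
print(guess_column_type(barcodes, guessing_rows=4))  # character
```

A sample that stops before the first text value yields a numeric guess, which is why such values end up missing after the import.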

Matlab's Import Tool recognizes a column as numbers but generate %s in formatSpec

I use Matlab's Import Tool to generate a script that will take care of importing several CSV files with the same columns.
The Import Tool successfully manages to recognize the type of each column of my CSV file:
However, in the generated script, the same columns are read as strings (%s = string):
Any idea why?
Surprisingly, it works fine with CSV files that have fewer columns (it works with 70-column CSV files, but the issue arises with 120-column CSV files). Here is one example of a CSV file that triggers the issue.
I use R2014b x64 with Windows 7 SP1 x64 Ultimate.
This is happening because one of the columns in your file contains values that mix numbers and text. The Import Tool predicts that you will want to extract the numbers from this field, so it labels the column as 'NUMBER'. However, standard textscan doesn't allow for this, so the Import Tool has to generate code that reads all of the data as text and does the numeric conversion afterwards. The Import Tool is trying to help you avoid textscan errors.
The result of running the generated code is still numeric, as is shown in the Import Tool.
The specific column is labeled SEGMENT_ID in your example file. It contains data like:
l8K3ziItQ2pRFQ8
L79Y0zA8xF7fVTA
JYYqCCwRrigaUmD

How to determine the date format in MATLAB

I want to know in MATLAB which date pattern Excel uses. This is because I read an Excel file from MATLAB, but depending on the user's machine locale the date is represented as dd-mm-yyyy or mm-dd-yyyy.
CLARIFICATION: Sorry for my poor explanation. This is my scenario. I have an Excel file with dates (and other columns that are not relevant to this problem). I have two computers that need to run my MATLAB application. On the first one, when I use xlsread (in MATLAB), the dates appear in dd-mm-yyyy format because of the regional settings of that computer. On the second one, I read the same file with the same MATLAB version, but the dates are read in mm-dd-yyyy format (again because of the regional settings of computer 2, which differ from those of computer 1).
Now, when I try to use datenum to convert the dates, I can't set the formatIn parameter correctly: if I set formatIn to mm-dd-yyyy, it works correctly on one computer but not on the other, and vice versa.
So I think I need to identify in MATLAB which date pattern Excel uses on the computer, in order to choose the right value for formatIn.
It is impossible to do unless you know your data really well. For instance if you have yearly readings for 01/07/20XX, it is impossible to know if it is 7th Jan or 1st July.
However, you can try the following:
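MyString = '01-23-2012';
% Take the first field of the date string as a number
FirstTwo = str2double(MyString(1:2));
if FirstTwo > 12
    % A value above 12 cannot be a month, so the day comes first
    disp('DD/MM');
else
    % Month first, or ambiguous: assume MM/DD
    disp('MM/DD');
end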
If the first two digits of the date are greater than 12, then you can probably conclude that you have DD/MM/YYYY. You can loop this over all your dates.
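Looping the check over all dates could look like the sketch below. It is written in Python only to show the logic, and `guess_day_month_order` is a made-up helper name. It also checks the second field, which goes slightly beyond the snippet above, and returns nothing when the data cannot settle the question.

```python
def guess_day_month_order(date_strings, sep="-"):
    """Return 'DD-MM', 'MM-DD', or None when the dates are ambiguous or contradictory."""
    first_over_12 = False
    second_over_12 = False
    for text in date_strings:
        first, second, _year = text.split(sep)
        # A field above 12 cannot be a month
        if int(first) > 12:
            first_over_12 = True
        if int(second) > 12:
            second_over_12 = True
    if first_over_12 and not second_over_12:
        return "DD-MM"
    if second_over_12 and not first_over_12:
        return "MM-DD"
    return None  # no field above 12 was seen, or the evidence conflicts

print(guess_day_month_order(["01-07-2012", "01-23-2012"]))  # MM-DD
print(guess_day_month_order(["01-07-2012", "02-07-2012"]))  # None
```

One unambiguous date is enough to decide for the whole column, provided every date in it uses the same order.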
If you're talking about an actual .xls file, I don't know enough to say whether there is some flag for this kind of thing, but one heuristic approach (and possibly the only approach with a CSV format) would be to look for numbers greater than 12. That will immediately tell you which format you have, because such a number can't correspond to a month. Of course, with a small data set, this isn't reliable (strictly, it's never perfectly reliable, but with non-trivial data, it's highly likely to work).
You may be able to do something with Java to tell you the date format.
>> import java.text.DateFormat;
>> import java.text.SimpleDateFormat;
>> df = DateFormat.getDateInstance(DateFormat.SHORT);
>> dateFormat = char(df.toPattern())
dateFormat =
dd/MM/yy
I think xlsread uses this format, although you'll need to test it on both of your machines.
Note there is also a Locale input to getDateInstance that may be useful.
I am a bit confused by your question: both MATLAB and Excel can easily support mm-dd and dd-mm. In Excel, the default depends on where you live. In America it will be mm-dd, and in Europe (and probably most of the rest of the world) it will be dd-mm.
In MATLAB, I am not sure whether it is location dependent like Excel. As an American, the standard for me is of course mm-dd, but you can fully customize how MATLAB parses date strings!
Check out http://www.mathworks.com/help/matlab/ref/datenum.html and go to Input Arguments, then "formatIn"; it lists the ways to read in dates and convert them to a serial date number (or a vector if you want).
EDIT:
Never mind, I misunderstood your question.
I ran into the same issue with computers from Australia and the USA.
This is a workaround, but it is a clean solution.
In Excel, convert the date to text, for example
in the international format 'yyyymmdd':
% B1=TEXT(A1,"yyyymmdd") % This is in excel
% in matlab read excel file 'dates.xlsx'
[data, dates_header] =xlsread('dates.xlsx');
% use datevec to read-in data
t = datevec(dates_header(:,2),'yyyymmdd');

MATLAB: How to import multiple CSV files with mixed data types

I have just started learning MATLAB and am having difficulty importing csv files into a 2-D array.
Here is a sample csv for my needs (all the csv files are in the same format with fixed columns):
Date, Code, Number....
2012/1/1, 00020.x1, 10
2012/1/2, 00203.x1, 0300
...
As csvread() only works with numeric data, should I import the numeric data and the text data separately, or is there a quick way to import multiple csv files with mixed data types?
Thanks a lot!!
What you're looking for may be the function xlsread.
It opens any file recognized by Excel and automatically separates text data from numeric data.
The problem is that the default delimiter, at least on my computer, is ; and not , (that is the default for my locale here in Brazil). So xlsread will try to split the fields of the file on ; and not on a comma as you'd like.
To change that, you have to change your system locale settings to use the comma as the list separator. If you want to do that in Windows Vista, click Start, Control Panel, Regional and Language Options, Customize this format, and change the List Separator from ';' to ','. On other Windows versions the process should be almost the same.
After doing that, typing:
[num, txt, all] = xlsread('your_file.csv');
will return something like:
num =
10
300
txt =
'01/01/2012' ' 00020.x1'
'02/01/2012' ' 00203.x1'
all =
'01/01/2012' ' 00020.x1' [ 10]
'02/01/2012' ' 00203.x1' [300]
Notice that if your locale already has the list separator set to ',', you won't have to change anything on your system to make this work.
If you don't want to change your system just to use the xlsread function, you could use the textscan function described here: http://www.mathworks.com/help/techdoc/ref/textscan.html
The drawback is that it is not as simple as a single call: you have to open the file, iterate over the lines, and tell MATLAB the format of your file explicitly.
Best regards
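As an illustration of the "state the format of each column explicitly" approach, here is a minimal sketch in Python (not MATLAB, and not a textscan call). It assumes exactly three columns, whereas the question's header trails off after Number; `read_mixed_csv` and `SAMPLE` are names made up for this example.

```python
import csv
import io
from datetime import datetime

# Sample rows from the question, assuming only three columns
SAMPLE = """Date, Code, Number
2012/1/1, 00020.x1, 10
2012/1/2, 00203.x1, 0300
"""

def read_mixed_csv(handle):
    """Read rows as (date, text code, integer), converting each column explicitly."""
    reader = csv.reader(handle, skipinitialspace=True)
    next(reader)  # skip the header line
    rows = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        date_text, code, number = row
        rows.append((
            datetime.strptime(date_text, "%Y/%m/%d").date(),  # date column
            code,                                             # kept as text, leading zeros intact
            int(number),                                      # numeric column
        ))
    return rows

rows = read_mixed_csv(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

To process several files with the same layout, the same function can be called once per opened file and the resulting lists concatenated.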
I recently wrote a function that solves exactly this problem. See delimread.
It's worth noting that xlsread on csv files only works on Windows. On Linux or Mac, xlsread works in 'basic' mode, which cannot read csv files. It might not be a great idea in the long run to use xlsread in case you need to migrate across platforms or automate code runs on Linux servers.
xlsread is also much slower than other text parsing functions since it opens an Excel session to read the file.