I am trying to read in some data in date format and the solution is eluding me. Here are four of my tries using the simplest self-contained examples I could devise.
*EDIT: my example was too simplistic. I have spaces in my variables, so I do need to specify column positions (the original answer said to ignore positions entirely). The solution below works, but the date variable is not a date.
data clinical;
input
name $ 1-13
visit_date $ 14-23
group $ 25
;
datalines;
John Turner  03/12/1998 D
Mary Jones   04/15/2008 P
Joe Sims     11/30/2009 J
;
run;
No need to specify column positions: with list input, SAS already treats the values in the datalines as space-delimited. A simple way to apply an informat is the colon (:) modifier. Put a : after the variable name, followed by the informat.
data clinical;
input ID$ visit_date:mmddyy10. group$;
format visit_date mmddyy10.; * Make the date look human-readable;
datalines;
01 03/12/1998 D
02 04/15/2008 P
03 11/30/2009 J
;
run;
Output:
ID visit_date group
01 03/12/1998 D
02 04/15/2008 P
03 11/30/2009 J
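For readers less familiar with SAS, here is a rough Python analogue (not SAS code) of what that list-input statement does. It splits each line on blanks, turns the mm/dd/yyyy text into a SAS date value (the number of days since 1 January 1960), and leaves the display to a format. This is why a date read with an informat looks like "not a date" until a format is attached. The function names are made up for illustration.

```python
from datetime import date, datetime, timedelta

# SAS stores a date as the number of days since 1 January 1960.
SAS_EPOCH = date(1960, 1, 1)


def read_list_record(line: str) -> tuple:
    """Rough analogue of `input ID $ visit_date :mmddyy10. group $;`.

    List input: fields are separated by blanks, so split() finds them.
    """
    record_id, raw_date, group = line.split()
    visit = datetime.strptime(raw_date, "%m/%d/%Y").date()  # like MMDDYY10.
    return record_id, (visit - SAS_EPOCH).days, group


def show_mmddyy10(sas_date: int) -> str:
    """Render a SAS date value the way the MMDDYY10. format would display it."""
    return (SAS_EPOCH + timedelta(days=sas_date)).strftime("%m/%d/%Y")


rec = read_list_record("01 03/12/1998 D")
print(rec)                    # the stored date is just a count of days
print(show_mmddyy10(rec[1]))  # the format only changes how it is shown
```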
A friend of mine suggested this, but it seems odd to have to switch syntax markedly depending on whether the variable is a date or not.
data clinical;
input
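name $ 1-12
@13 visit_date MMDDYY10. /* @n moves the column pointer; #n is a line pointer */
group $ 25 ;
format visit_date mmddyy10.; /* display the numeric SAS date as mm/dd/yyyy */
datalines;
John Turner 03/12/1998  D
Mary Jones  04/15/2008  P
Joe Sims    11/30/2009  J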
;
run;
SAS provides many different ways to input data, depending on what you want to do.
Column input, which is what you started with, is appropriate when this is true:
To read with column input, data values must have these attributes:
- appear in the same columns in all the input data records
- consist of standard numeric form or character form
Your visit_date column does not meet the second condition. 03/12/1998 is not a standard numeric value, so column input can only read it as a character string. You need to use something else.
Formatted input is appropriate to use when you want these features:
With formatted input, an informat follows a variable name and defines how SAS reads the values of this variable. An informat gives the data type and the field width of an input value. Informats also read data that is stored in nonstandard form, such as packed decimal, or numbers that contain special characters such as commas.
Your visit_date column fits this case. You have a specific informat (MMDDYY10.) that you want to use to read the text into a SAS date value.
List input would also work in some cases, especially modified list input. It would not work in your example, though, because the names contain spaces. Here's when you might want to use it:
List input requires that you specify the variable names in the INPUT statement in the same order that the fields appear in the input data records. SAS scans the data line to locate the next value but ignores additional intervening blanks. List input does not require that the data is located in specific columns. However, you must separate each value from the next by at least one blank unless the delimiter between values is changed. By default, the delimiter for data values is one blank space or the end of the input record. List input does not skip over any data values to read subsequent values, but it can ignore all values after a given point in the data record. However, pointer controls enable you to change the order that the data values are read.
(For completeness, there is also Named input, though that's more rare to see, and not helpful here.)
You can mix column and formatted input, but I'd avoid mixing in list input. It doesn't use pointer control in quite the same way, so it is easy to end up with something you don't want. In general, use the input style that fits your data: column input if your data is all text or standard numerics, and formatted input if some fields need a particular informat.
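The following rough Python analogue (not SAS code) illustrates mixing column and formatted input. It assumes the fixed layout from the example (name in columns 1-12, date starting at column 13, group in column 25). Fixed slices keep the space inside "John Turner" intact. The last line shows why plain list input breaks on such names. The helper name is hypothetical.

```python
from datetime import date, datetime, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from this date


def read_fixed_record(line: str) -> dict:
    """Rough analogue of mixed column and formatted input:

        input name $ 1-12 @13 visit_date mmddyy10. group $ 25;

    SAS columns are 1-based, so columns a-b are line[a-1:b] in Python.
    """
    name = line[0:12].strip()     # column input: name $ 1-12 (spaces allowed)
    raw_date = line[12:22]        # formatted input: @13, width 10
    visit = datetime.strptime(raw_date, "%m/%d/%Y").date()
    group = line[24:25]           # column input: group $ 25
    return {"name": name, "visit_date": (visit - SAS_EPOCH).days, "group": group}


# Build records padded to the fixed layout above.
lines = [
    f"{'John Turner':<12}{'03/12/1998':<12}D",
    f"{'Mary Jones':<12}{'04/15/2008':<12}P",
    f"{'Joe Sims':<12}{'11/30/2009':<12}J",
]
for rec in map(read_fixed_record, lines):
    print(rec["name"], SAS_EPOCH + timedelta(days=rec["visit_date"]), rec["group"])

# List input would split the name itself into two fields:
print(lines[0].split())
```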
I have some samples I need to take.
In order to create a good identifier/serial number for the samples, I want it to be a product of its characteristics.
For example, if the sample was taken from India and the temperature was 40 degrees then I would click dropdowns in the form to create those two entries and then a serial number would be spat out in the form "Ind40".
Assuming that your form is bound to a table, you can create a calculated column in the table that concatenates the values from other columns into a single value.
For instance, create a new column and give it a name (for example, SerialNbr). Then for Data Type select "Calculated". An expression builder window will appear:
Enter the columns you'd like to concatenate and separate them with &. Here is an example of how the expression could look:
Left([Country],3) & [Temperature]
This expression takes the first 3 chars from the Country column and combines it with the value from Temperature column to create the value in column SerialNbr. The calculated column will automatically update when values are entered into the other fields. I'd also suggest adding another value to the calculated expression to help avoid duplicates, such as date/time of submission.
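Here is a Python analogue of that expression, including the suggested timestamp to reduce duplicates. It is a sketch of the string logic only, not Access syntax. The "-" separator and the yyyymmddhhmmss timestamp layout are arbitrary choices made for illustration.

```python
from datetime import datetime


def serial_number(country: str, temperature: int, submitted: datetime) -> str:
    """Analogue of Left([Country],3) & [Temperature], plus a submission
    timestamp (hypothetical layout) to make duplicate serials less likely."""
    return f"{country[:3]}{temperature}-{submitted:%Y%m%d%H%M%S}"


print(serial_number("India", 40, datetime(2024, 5, 1, 14, 30)))
```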
I am looking for logic that will help me convert a string to a number in Teradata and Hive.
It should be easy to implement in Teradata, as I don't have permission to deploy a UDF in TD. In Hive, if it is not simple, I can easily write a UDF.
My requirement: let's say I have columns sender_country and receiver_country. I want to generate a number for concat(sender_country, '_', receiver_country).
The number should always be same if the countries appear again.
Below is an illustration:
UID  sender_country  receiver_country  concat  number
1    US              UK                US_UK   198760
2    FR              IN                FR_IN   146785
3    CH              RU                CH_RU   467892
4    US              UK                US_UK   198760
Every unique combination of countries should get a unique value. In the example above, US_UK is repeated, and it has the same corresponding number both times.
I tried hashbucket(hashrow('concat')) in TD, but don't know its equivalent implementation in hive.
Similarly, Hive has a hash() function, but I don't know of an equivalent in TD.
I could not find any hash function that returns the same values in both TD and Hive.
You can simply convert each character into a number:
Ascii(Substr(sender_country,1,1))*1000000+
Ascii(Substr(sender_country,2,1))*10000+
Ascii(Substr(receiver_country,1,1))*100+
Ascii(Substr(receiver_country,2,1))
This returns 85838575 for US, UK (U = 85, S = 83, U = 85, K = 75).
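The same arithmetic in Python follows, as a way to check the idea outside the database. It assumes two-letter country codes. Each upper-case letter has a two-digit ASCII code (65 to 90), so every letter gets its own pair of decimal digits and different pairs cannot collide. The upper-casing is an added assumption, not part of the answer. Lower-case codes 100 to 122 have three digits and would overlap. In SQL, wrapping the columns in UPPER() should serve the same purpose.

```python
def pair_code(sender: str, receiver: str) -> int:
    """Two decimal digits per letter, mirroring the Ascii(Substr(...)) formula.

    Assumes two-letter codes; upper-casing keeps every code in 65-90.
    """
    s, r = sender.upper(), receiver.upper()
    return (ord(s[0]) * 1_000_000 + ord(s[1]) * 10_000
            + ord(r[0]) * 100 + ord(r[1]))


print(pair_code("US", "UK"))
print(pair_code("FR", "IN"))
```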
My end goal is to have a box change color when the last 3 records input into a field (based on the time of input) in FileMaker achieve a certain criteria (ex. variance < 2). I would like to know how to make this happen, or how a calculation/script can be written to only look at the last 3 records.
There are several ways you could approach this. A simple one would be to use a script to:
1. Show all records in the given table;
2. Unsort them (assuming they were entered in chronological order; otherwise, sort them by creation timestamp);
3. Omit all records except the last three;
4. Get the value of a summary field defined as Standard Deviation of your value field;
5. Set a global variable/field to the square of the returned value (the variance is the square of the standard deviation).
Then use the global variable/field to conditionally format your "box".
If you don't want to use a script, you will have to define a relationship in order to get the last three values in the table, regardless of the current found set and/or sort order. Or you may use the ExecuteSQL() function for this.
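The script's logic is sketched in Python below (not FileMaker). It sorts by creation time, keeps the last three values, squares the standard deviation to get the variance, and compares the result against the threshold. The timestamps and values are hypothetical. It uses the sample variance. If your summary field is set to compute by population, statistics.pvariance would be the matching choice.

```python
from datetime import datetime
from statistics import variance


def last_three_variance(records: list) -> float:
    """records: (created_at, value) pairs in any order."""
    recent = sorted(records, key=lambda r: r[0])[-3:]  # last 3 by creation time
    values = [value for _, value in recent]
    return variance(values)  # sample variance = (sample standard deviation) ** 2


records = [  # hypothetical entries
    (datetime(2024, 1, 1, 9, 0), 10.0),
    (datetime(2024, 1, 1, 9, 5), 4.0),
    (datetime(2024, 1, 1, 9, 10), 5.0),
    (datetime(2024, 1, 1, 9, 15), 6.0),
]
v = last_three_variance(records)
highlight = v < 2  # the condition that would drive the box's conditional formatting
print(v, highlight)
```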
I am attempting to merge two data sets without a single key variable. The data looks like this in both data sets:
study_id  round  other variables (different between the two sets)
A000019   R012   etc.
A000019   R013
A000047   R013
A000047   R014
A000047   R015
A000267   R014
This is my code...
DATA RAKAI.complete;
length study_id $ 8;
MERGE hivgps2 rccsdata;
BY study_id round;
RUN;
I've tried to merge by study_id and round, which are the only two variables shared across the data sets, but it just stacks the two sets, creating double the correct number of IDs. The combination of study_id and round provides a unique identifier, but neither variable does on its own. Does it just make the most sense to code a new unique ID by combining the two variables that are shared by both data sets?
Many Thanks
I realized I can post the code that I meant to deal with potential unwanted spaces here.
DATA hivgps2;
SET hivgps2;
study_id = compress(study_id);
round= compress(round);
RUN;
DATA rccsdata;
SET rccsdata;
study_id = compress(study_id);
round=compress(round);
RUN;
Your code is the correct format for merging by multiple variables. Records from both datasets are included, so if none of the keys match then the result will be the same as if you used SET instead of MERGE.
Are you sure that there is any overlap between the two sets of data? Check that your variables are the same length. If they are character then make sure the values are consistent in their use of upper and lower case letters. Make sure that the values do not have leading spaces or other non-printing characters. Also make sure you haven't attached a format to one of the datasets so that the values you see printed are not what is actually in the data.
In your clean-up data steps you should force the lengths of the variables to be consistent. You can also compress more than just spaces from the values. I like to eliminate anything that is not a normal printable 7-bit ASCII character. That gets rid of tabs, non-breaking spaces, nulls and other strange things. In 7-bit ASCII the printable non-blank characters run from ! ('21'x, 33 decimal) to ~ ('7E'x, 126 decimal).
data hivgps2_clean ;
length study_id $10 round $5 ;
set hivgps2;
format study_id round ;
study_id=upcase(compress(study_id,compress(study_id,collate(33,126))));
round=upcase(compress(round,compress(round,collate(33,126))));
run;
proc sort; by study_id round; run;
data rccsdata_clean;
length study_id $10 round $5 ;
set rccsdata;
format study_id round ;
study_id=upcase(compress(study_id,compress(study_id,collate(33,126))));
round=upcase(compress(round,compress(round,collate(33,126))));
run;
proc sort; by study_id round; run;
data want;
merge hivgps2_clean(in=in1) rccsdata_clean(in=in2);
by study_id round;
run;
You can try that, or you can just use a proc sql join:
proc sql;
create table rakai.complete as select
a.*, b.*
from hivgps2 as a
full join rccsdata as b
on a.study_id = b.study_id and a.round = b.round;
quit;
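The following Python sketch (not SAS) shows the cleaning-plus-merge idea. It keeps only characters 33 to 126, upper-cases the keys, and full outer joins on the composite key (study_id, round), so that "A000019 " and "A000019" match. It assumes each key pair appears at most once per data set, as the question states. The variables gps and hiv are hypothetical stand-ins for the other variables.

```python
def clean_key(value: str) -> str:
    """Keep only characters with codes 33-126 ('!' to '~') and upper-case them,
    mirroring upcase(compress(x, compress(x, collate(33,126))))."""
    return "".join(ch for ch in value if 33 <= ord(ch) <= 126).upper()


def full_merge(left: list, right: list) -> list:
    """Full outer join of two lists of dicts on (study_id, round).

    Assumes each key pair occurs at most once per input. On name clashes the
    right-hand value wins, similar to MERGE keeping the value from the last
    data set listed.
    """
    def key(rec):
        return clean_key(rec["study_id"]), clean_key(rec["round"])

    right_by_key = {key(r): r for r in right}
    left_keys = set()
    merged = []
    for rec in left:
        k = key(rec)
        left_keys.add(k)
        row = {**rec, **right_by_key.get(k, {})}
        row["study_id"], row["round"] = k  # store the cleaned keys
        merged.append(row)
    for k, rec in right_by_key.items():
        if k not in left_keys:  # rows found only in the right data set
            row = dict(rec)
            row["study_id"], row["round"] = k
            merged.append(row)
    return sorted(merged, key=lambda r: (r["study_id"], r["round"]))


# Hypothetical sample rows with stray blanks, a tab and lower case in the keys.
hivgps2 = [
    {"study_id": "A000019 ", "round": "R012", "gps": 1},
    {"study_id": "a000047", "round": "R013\t", "gps": 2},
]
rccsdata = [
    {"study_id": "A000047", "round": "R013", "hiv": "neg"},
    {"study_id": "A000267", "round": "R014", "hiv": "pos"},
]
for row in full_merge(hivgps2, rccsdata):
    print(row)
```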
I have a .csv file that I can't read in Octave. In R I just use the command below and everything is read correctly:
myData <- read.csv("myData.csv", stringsAsFactors = FALSE)
However, in Octave the command below doesn't read it properly:
myData = csvread('myData.csv',1,0);
When I open the file with Notepad, the data looks something like the below. Note that there isn't a comma separating the last column name (i.e. Column3) from the first value (i.e. Value1). The same thing happens with the last value of the first row (i.e. Value3) and the first value of the second row (i.e. Value4).
Column1,Column2,Column3Value1,Value2,Value3Value4,Value5,Value6
Column1 holds date values (format yyyy-mm-dd hh:mm:ss); I don't know if that has anything to do with the problem.
Alex's answer already explains why csvread does not work in your case. That function only reads numeric data and returns an array. Since your fields are all strings, you need something that reads a CSV file into a cell array.
That function is named csv2cell and is part of the io package.
As a separate note, if you plan to do operations with those dates, you may want to convert them from strings into serial date numbers (for example with datenum). This lets you store the dates in a numeric array, which allows faster operations and reduces memory usage. The financial package also has many functions for dealing with dates.
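Here is a Python sketch (not Octave) of the two steps this answer describes. The first reads every field as a string, which is what a cell-array reader gives you. The second converts the yyyy-mm-dd hh:mm:ss text into a serial date number on the same scale as Octave's datenum. The sample contents are hypothetical. In Octave itself, datenum would do the conversion directly, so this only shows the arithmetic.

```python
import csv
import io
from datetime import datetime


def to_datenum(text: str) -> float:
    """Convert 'yyyy-mm-dd hh:mm:ss' to an Octave/MATLAB-style serial date number.

    datenum counts days from year 0 while Python's toordinal() counts from
    year 1, so whole days differ by 366; the time of day becomes a fraction.
    """
    dt = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    seconds = dt.hour * 3600 + dt.minute * 60 + dt.second
    return dt.toordinal() + 366 + seconds / 86400


# Hypothetical file contents in the question's layout: a header row, then strings.
sample = "Column1,Column2,Column3\n2000-01-01 12:00:00,Value2,Value3\n"
rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]  # every field comes back as a string
serials = [to_datenum(row[0]) for row in data]
print(header, data, serials)
```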
csvread only reads numeric data, so a date does not qualify unfortunately.
In Octave you might want to check out the dataframe package. In MATLAB you would use readtable.
Otherwise there are also more primitive functions you can use like textscan.