SAS: how to count items in a list (macros)

I would like to count the number of integers in my macro variable, the same way that countw (called through %sysfunc) counts the number of words in a macro variable.
Example:
%let test = 'aaa' 'bbb';
%let ntest = %sysfunc(countw(&test.));
This gives ntest = 2.
My question is how to do this for integers?
Now I have:
%let test2 = 12, 13, 14;
How to get ntest = 3?
How can I get the number of items in &test2.?
I apologize if this is ridiculously simple and I just missed the documentation.

Because the value of your macro variable contains commas, you can wrap it in the %superq macro quoting function so that the commas are not interpreted as argument separators in the countw call. And since your values are separated by both commas and spaces, pass both characters as the delimiter list, wrapped in %str for the same reason: it keeps the comma and the space from being parsed by the macro processor.
%let test2= 12, 13, 14;
%let ntest=%sysfunc(countw(%superq(test2),%str(, )));
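To sanity-check the result you should expect, the same counting rule (any run of commas or spaces separates two items) can be sketched outside SAS. The snippet below is Python, not SAS, and count_items is a hypothetical helper name; it only mirrors the delimiter handling described above and is not how countw is implemented.

```python
import re

def count_items(value: str, delimiters: str = ", ") -> int:
    """Count items separated by any run of the given delimiter characters."""
    # Build a character class from the delimiter characters,
    # similar in spirit to the second argument of countw.
    pattern = "[" + re.escape(delimiters) + "]+"
    # Discard empty strings caused by leading or trailing delimiters.
    return len([item for item in re.split(pattern, value) if item])

print(count_items("12, 13, 14"))  # 3
```

If this count and &ntest. disagree for one of your values, the delimiter list is the first thing to check.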

Related

SAS - Fill in null values for numeric variables but not dates

I use the following to fill in null values for numeric variables with 0, but this does it for date variables as well. How can I fill in null values only for non-date numeric variables?
data mydataset;
set mydataset;
array myarray _numeric_;
do over myarray;
if myarray=. then myarray=0;
end;
run;
There is an excellent post on SAS Communities about determining if a variable is a date or not. It's available here. Your question isn't trivial, because user-defined formats (created with proc format) can also behave like date formats. Let's suppose that all your date variables share one format.
data test;
format x y z datetime8.;
x = .;
y = .;
z = .;
run;
%macro get_vars_format(lib, tab, fmt);
%local names name i;
/* collect the names of all variables that carry the requested format */
proc sql noprint;
select name into :names separated by ' '
from sashelp.vcolumn
where libname = "&lib." and
memname = "&tab." and
format eq "&fmt.";
quit;
%put Names: &names.;
data work.test;
set &lib..&tab.;
/* generate one IF statement per selected variable */
%let i = 1;
%let name = %scan(&names., &i., %str( ));
%do %while(&name. ne );
if &name. eq . then &name. = 0;
%let i = %eval(&i. + 1);
%let name = %scan(&names., &i., %str( ));
%end;
/* no explicit OUTPUT: the implicit output at the end of the step writes each row exactly once */
run;
%mend get_vars_format;
%get_vars_format(lib=WORK, tab=TEST, fmt=DATETIME8.);
This macro takes three arguments:
library name,
data set name,
and desired format name.
I save the names of all matching variables into a macro variable (separated by spaces), and in the next step I iterate over that list in a loop (note the i and name macro variables). For every row, if the value of any variable with the given format is equal to . then it is replaced with 0. Note that if a variable has a date format, 0 will be displayed as Jan 01, 1960, as that's the first day in the SAS date value convention.
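The remark about 0 turning into Jan 01, 1960 follows from how SAS stores dates: a date value is a number of days since 1960-01-01, and a datetime value is a number of seconds since 1960-01-01 00:00:00. A small sketch of that arithmetic, written in Python rather than SAS purely as an illustration (the function names are hypothetical):

```python
from datetime import date, datetime, timedelta

SAS_EPOCH_DATE = date(1960, 1, 1)
SAS_EPOCH_DATETIME = datetime(1960, 1, 1)

def sas_date_to_date(value: int) -> date:
    """Interpret a SAS date value (days since 1960-01-01)."""
    return SAS_EPOCH_DATE + timedelta(days=value)

def sas_datetime_to_datetime(value: float) -> datetime:
    """Interpret a SAS datetime value (seconds since 1960-01-01 00:00:00)."""
    return SAS_EPOCH_DATETIME + timedelta(seconds=value)

print(sas_date_to_date(0))          # 1960-01-01
print(sas_datetime_to_datetime(0))  # 1960-01-01 00:00:00
```

This is why filling a date or datetime variable with 0 silently produces a valid-looking value instead of a missing one.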

Function to split string in matlab and return second number

I have a string and I need two characters to be returned.
I tried with strsplit but the delimiter must be a string and I don't have any delimiters in my string. Instead, I always want to get the second number in my string. The number is always 2 digits.
Example: 001a02.jpg I use the fileparts function to delete the extension of the image (jpg), so I get this string: 001a02
The expected return value is 02
Another example: 001A43a . Return values: 43
Another one: 002A12. Return values: 12
All the filenames are in a matrix 1002x1. Maybe I can use textscan but in the second example, it gives "43a" as a result.
(Just so this question doesn't remain unanswered, here's a possible approach: )
One way to go about this uses splitting with regular expressions (MATLAB's strsplit which you mentioned):
str = '001a02.jpg';
C = strsplit(str,'[a-zA-Z.]','DelimiterType','RegularExpression');
Results in:
C =
'001' '02' ''
In older versions of MATLAB, before strsplit was introduced, similar functionality was achieved using regexp(...,'split').
If you want to learn more about regular expressions (abbreviated as "regex" or "regexp"), there are many online resources (JGI..)
In your case, if you only need to take the 5th and 6th characters from the string you could use:
D = str(5:6);
... and if you want to convert those into numbers you could use:
E = str2double(str(5:6));
If your number is always at a certain position in the string, you can simply index this position.
In the examples you gave, the number is always the 5th and 6th characters in the string.
filename = '002A12';
num = str2double(filename(5:6));
Otherwise, if the formatting is more complex, you may want to use a regular expression. There is a similar question: "matlab - extracting numbers from (odd) string". Modifying the code found there, you can do the following:
all_num = regexp(filename, '\d+', 'match'); % Find all runs of digits in the filename
num = str2double(all_num{2}) % Convert the second run of digits to a number
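The "take the second run of digits" idea is independent of MATLAB. As a cross-check on the three filenames from the question, here is the same logic sketched in Python (not MATLAB; second_number is a hypothetical helper). It returns the digits as text so that a leading zero such as in 02 is preserved.

```python
import re

def second_number(filename: str) -> str:
    """Return the second run of digits in filename, as a string."""
    numbers = re.findall(r"\d+", filename)
    if len(numbers) < 2:
        raise ValueError(f"fewer than two numbers in {filename!r}")
    return numbers[1]

for name in ["001a02.jpg", "001A43a", "002A12"]:
    print(name, "->", second_number(name))
```

Unlike fixed indexing with (5:6), this keeps working if the first number or the letter part changes length.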

SAS Reading multiple records from one line without Line Feed CRLF

I have a file with only one line and no line feeds (CRLF). Instead of a line feed, the records are separated by a string of 4 characters, which in this example is "#A$3". I don't need dlm for now, and I need to import the data from an external file (/files/Example.txt):
JOSH 30JUL1984 1011 SPANISH#A$3RACHEL 29OCT1986 1013 MATH#A$3JOHNATHAN 05JAN1985 1015 chemistry
I need to split this line into 3 lines:
JOSH 30JUL1984 1011 SPANISH
RACHEL 29OCT1986 1013 MATH
JOHNATHAN 05JAN1985 1015 chemistry
How can I do that in SAS?
*Added: Your solutions work with this example, but I have an issue: my real line is longer than the maximum length allowed for a line (32,767 bytes).
For example, suppose the line in the exercise above contained 5,000 records.
Is it possible?
Use the DLMSTR= option on the infile statement; this specifies "#A$3" as the delimiter. Then use a double trailing @ (@@) on the input statement to tell SAS to keep looking for more records on the same line.
data test;
infile "/files/Example.txt" dsd dlmstr='#A$3';
informat var $255.;
input var $ @@;
run;
With your example, you will get a data set with 3 records with 1 variable containing the strings you are looking for.
Adjust the length of var as needed.
You could do something like this:
First import the file as a single row (be sure to adjust the length):
DATA WORK.IMPORTED_DATA;
INFILE "/files/Example.txt" TRUNCOVER;
LENGTH Column1 $ 255;
INPUT #1 Column1 $255.;
RUN;
Then parse imported data into variables using a data step:
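data result (keep=var1-var4);
set WORK.IMPORTED_DATA;
length row $255;
delim = '#A$3';
begin = 1;
/* pos is the position of the next delimiter, 0 when none is left */
do until (pos = 0);
pos = find(Column1, delim, begin);
/* the last record has no trailing delimiter, so take the rest of the line */
if pos > 0 then row = substr(Column1, begin, pos - begin);
else row = substr(Column1, begin);
var1 = scan(row, 1);
var2 = scan(row, 2);
var3 = scan(row, 3);
var4 = scan(row, 4);
begin = pos + length(delim);
output;
end;
run;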
Try this in data step by viewing #A$3 as a multi-character delimiter:
data want (keep=subject);
infile 'C:\sasdata\test.txt';
input;
length line $4500 subject $80;
line=tranwrd(_infile_,"#A$3",'!');
do i=1 by 1 while (scan(line,i,'!') ^= ' ');
subject=scan(line,i,'!');
output;
end;
run;
_infile_ holds the current input line being read in the data step. I converted the multi-character delimiter #A$3 into a single-character delimiter ('!') with tranwrd(), which replaces a substring inside a string, and then used that delimiter inside the scan() function. This assumes that '!' never occurs in the data itself.
Also, if you want to break the values up into separate variables, just scan some more. E.g. put something like B = scan(subject,2); into the do loop and use data want (keep= A B C D);. Cheers.
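To check what any of the approaches above should produce, the expected result for the sample line can be computed independently. The sketch below is Python, not SAS, and split_records is a hypothetical helper; it splits on the 4-character delimiter first and on whitespace second, which is the same two-level split the data steps perform.

```python
DELIMITER = "#A$3"

LINE = (
    "JOSH 30JUL1984 1011 SPANISH#A$3"
    "RACHEL 29OCT1986 1013 MATH#A$3"
    "JOHNATHAN 05JAN1985 1015 chemistry"
)

def split_records(text: str, delimiter: str = DELIMITER) -> list:
    """Split text into records on delimiter, then each record into fields."""
    # Skip empty records, e.g. when the text ends with the delimiter.
    return [rec.split() for rec in text.split(delimiter) if rec.strip()]

for fields in split_records(LINE):
    print(fields)
```

Three records with four fields each is the shape the SAS output data set should have for this example.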

Joining the digits of a numeric vector

I'm fairly new to Matlab, although not to programming. I'm trying to hash a string, and get back a single value that acts as a unique id for that string. I'm using this DataHash function from FileExchange which returns the hash as an integer vector. So far the best solution I've found for converting this to a single numeric value goes:
hash_opts.Format = 'uint8';
hash_vector = DataHash(string, hash_opts);
hash_string = num2str(hash_vector);
% Use a simple regex to remove all whitespace from the string,
% takes it from '1 2 3 4' to '1234'
hash_string = regexprep(hash_string, '[\s]', '');
hashcode = str2double(hash_string);
A reproducible example that doesn't depend on DataHash:
hash_vector = [1, 23, 4, 567];
hash_string = num2str(hash_vector);
% Use a simple regex to remove all whitespace from the string,
% takes it from '1 2 3 4' to '1234'
hash_string = regexprep(hash_string, '[\s]', '');
hashcode = str2double(hash_string); % Output: 1234567
Are there more efficient ways of achieving this, without resorting to a regex?
Yes, Matlab's regex implementation isn't particularly fast. I suggest that you use strrep:
hashcode = str2double(strrep(hash_string,' ',''));
Alternatively, you can use a string creation method that doesn't insert spaces in the first place:
hash_vector = [1, 23, 4, 567];
hashcode = str2double(sprintf('%d',hash_vector))
Just make sure that your hash number is less than 2^53 or the conversion to double might not be exact.
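The 2^53 limit comes from the 53-bit significand of an IEEE 754 double, which is also what Python's float uses, so the effect can be shown outside MATLAB. The following is a Python sketch (survives_double is a hypothetical helper), not MATLAB code:

```python
LIMIT = 2 ** 53

def survives_double(n: int) -> bool:
    """True if the integer n is unchanged by a round trip through a double."""
    return int(float(n)) == n

print(survives_double(LIMIT))      # True
print(survives_double(LIMIT + 1))  # False: rounds to a neighbouring double
```

Every integer up to 2^53 is represented exactly; above it, distinct digit strings can collapse to the same double, so two different hashes could end up with the same id.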
I've seen there's already an answer, though it loses precision because it omits leading zeros (for example, [1 23] and [12 3] both become 123). I'm not really sure if it will cause you trouble, but I wouldn't want to rely on it.
Since you output uint8 anyway, why not use hex values instead? With two hex digits per byte, every byte stays distinguishable. Converting back is also easy using dec2hex.
hash_vector = [1, 23, 4, 253]
hash_str=sprintf('%02x',hash_vector); % always use 2 hex digits per 8-bit value
hash_dig=hex2dec(hash_str)
By the way, your sample hash contains 567, which is an impossible value for uint8.
Having looked at DataHash, the question would also be why not use its base64 or hex output in the first place.
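The leading-zero problem is easy to reproduce in any language. The sketch below is Python, not MATLAB, with hypothetical helper names; it compares plain decimal concatenation with fixed-width hex concatenation for two different byte vectors.

```python
def join_decimal(values):
    """Concatenate the decimal digits of each value (ambiguous)."""
    return "".join(str(v) for v in values)

def join_hex(values):
    """Concatenate each byte as exactly two hex digits (unambiguous)."""
    for v in values:
        if not 0 <= v <= 255:
            raise ValueError(f"{v} does not fit in uint8")
    return "".join(format(v, "02x") for v in values)

a, b = [1, 23], [12, 3]
print(join_decimal(a), join_decimal(b))  # 123 123
print(join_hex(a), join_hex(b))          # 0117 0c03
```

The fixed width is what makes the hex string reversible: every pair of digits maps back to exactly one byte.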

Number to letter swapping in MATLAB

I have a vector, for example, V = [ 1, 2, 3, 4 ]. Is there a way to change this to the letters, [ a,b,c,d ]?
Using 'a' directly instead of ascii codes might be slightly more readable
charString = char(V-1+'a');
Uppercase is then obtained with
charString = char(V-1+'A');
There are two simple ways to do this. One way is a simple index.
C = 'abcdefghijklmnopqrstuvwxyz';
V = [8 5 12 12 15 23 15 18 12 4];
C(V)
ans =
helloworld
Of course, char will do it too. The char answer is better because it does not require you to store a list of letters to index into.
char('a' + V - 1)
ans =
helloworld
This works because adding 'a' to a number converts 'a' to its ASCII code on the fly: +'a' yields 97, the ASCII code of 'a'.
A nice thing is it also works for 'A', so if you wanted caps, just add 'A' instead.
char('A' + V - 1)
ans =
HELLOWORLD
You can find more information about working with strings in MATLAB from these commands:
help strings
doc strings
Something like
C = char(V+ones(size(V)).*(97-1))
should work (97 is the ASCII code for 'a', and it looks like you want 1 to map to 'a').
Using the CHAR function, which turns a number (i.e. ASCII code) into a character:
charString = char(V+96);
EDIT: To go backwards (mapping 'a' to 1, 'b' to 2, etc.), use the DOUBLE function to recast the character back to its ASCII code number:
V = double(charString)-96;
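The offset arithmetic used in all of these answers (letter code = number + code of 'a' - 1) can be verified in both directions. This is a Python sketch with hypothetical function names, not MATLAB code; ord and chr play the roles of double and char.

```python
def numbers_to_letters(values, first="a"):
    """Map 1 -> first, 2 -> the next letter, and so on."""
    return "".join(chr(ord(first) + v - 1) for v in values)

def letters_to_numbers(text, first="a"):
    """Inverse mapping: first -> 1, the next letter -> 2, and so on."""
    return [ord(c) - ord(first) + 1 for c in text]

V = [8, 5, 12, 12, 15, 23, 15, 18, 12, 4]
print(numbers_to_letters(V))             # helloworld
print(numbers_to_letters(V, first="A"))  # HELLOWORLD
print(letters_to_numbers("abcd"))        # [1, 2, 3, 4]
```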