Merge macro tables in SAS - merge

I'm a beginner in SAS so I am unfamiliar with syntax. I have two datasets that were created using macros.
(macro: https://gist.github.com/statgeek/2f27939fd72d1dd7d8c8669cd39d7e67)
DATA test1;
set sashelp.class;
if prxmatch('m/M/oi', sex);
female=ifn( sex='F',1,0);
RUN;
%table_char(test1, height weight age, sex, female, test1_table_char);
DATA test2;
set sashelp.class;
if prxmatch('m/F/oi', sex);
female=ifn( sex='F',1,0);
RUN;
%table_char(test2, height weight age, sex, female, test2_table_char);
Desired Output:
Male Female
Height
Count
Mean
Median
.
.
Weight
Count
Mean
Median
.
.
Sex
M
F
Etc
I would like to merge the two macro tables created with %table_char together by Name. How should I call the two tables so I can merge them?
DATA final_merge;
merge test1_table_char test2_table_char;
by NAME;
RUN;

It looks like what you need to do is append the datasets:
data final;
set test1 test2;
run;
You don't need to split and merge the datasets; you can simply do:
DATA final;
set sashelp.class;
female=ifn( sex='F',1,0);
RUN;
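Then run the macro once on the combined dataset, using the same call pattern as in the question (final_table_char is just an arbitrary name for the output table):
%table_char(final, height weight age, sex, female, final_table_char);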
If you do want to merge, sort the datasets first and then merge them:
proc sort data =test1;
by your_variable;
run;
proc sort data =test2;
by your_variable;
run;
data final;
merge test1 test2;
by your_variable;
run;

Combine or merge test1 and test2 by NAME:
proc sort data=work.test1;
by name;
run;
proc sort data=work.test2;
by name;
run;
data work.test;
merge work.test1 work.test2;
by name;
run;
Producing this result:
Name Sex Age Height Weight female
Alfred M 14 69.0 112.5 0
Alice F 13 56.5 84.0 1
Barbara F 13 65.3 98.0 1
Carol F 14 62.8 102.5 1
Henry M 14 63.5 102.5 0
James M 12 57.3 83.0 0
Jane F 12 59.8 84.5 1
Janet F 15 62.5 112.5 1
Jeffrey M 13 62.5 84.0 0
John M 12 59.0 99.5 0
Joyce F 11 51.3 50.5 1
Judy F 14 64.3 90.0 1
Louise F 12 56.3 77.0 1
Mary F 15 66.5 112.0 1
Philip M 16 72.0 150.0 0
Robert M 12 64.8 128.0 0
Ronald M 15 67.0 133.0 0
Thomas M 11 57.5 85.0 0
William M 15 66.5 112.0 0
Run the macro on the merged output:
%table_char(test, height weight age, sex, female, test_table_char);
Produces the following result:
categorical value
Sex
F 9(47.4%)
M 10(52.6%)
Height
Count(Missing) 19(0)
Mean (SD) 62.3(5.1)
Median (IQR) 62.8(57.5 - 66.5
Range 51.3 - 72.0
90th Percentile 69.0
Weight
Count(Missing) 19(0)
Mean (SD) 100.0(22.8)
Median (IQR) 99.5(84.0 - 112.
Range 50.5 - 150.0
90th Percentile 133.0
Age
Count(Missing) 19(0)
Mean (SD) 13.3(1.5)
Median (IQR) 13.0(12.0 - 15.0
Range 11.0 - 16.0
90th Percentile 15.0
Female 9(47.4%)

Related

KDB: weighted median

How can one compute weighted median in KDB?
I can see that there is a function med for a simple median but I could not find something like wmed similar to wavg.
Thank you very much for your help!
For values v and weights w, med v where w gobbles space for larger values of w.
Instead, sort w into ascending order of v and look for where cumulative sums reach half their sum.
q)show v:10?100
17 23 12 66 36 37 44 28 20 30
q)show w:.001*10?1000
0.418 0.126 0.077 0.829 0.503 0.12 0.71 0.506 0.804 0.012
q)med v where "j"$w*1000
36f
q)w iasc v / sort w into ascending order of v
0.077 0.418 0.804 0.126 0.506 0.012 0.503 0.12 0.71 0.829
q)0.5 1*(sum;sums)#\:w iasc v / half the sum and cumulative sums of w
2.0525
0.077 0.495 1.299 1.425 1.931 1.943 2.446 2.566 3.276 4.105
q).[>]0.5 1*(sum;sums)#\:w iasc v / compared
1111110000b
q)v i sum .[>]0.5 1*(sum;sums)#\:w i:iasc v / weighted median
36
q)\ts:1000 med v where "j"$w*1000
18 132192
q)\ts:1000 v i sum .[>]0.5 1*(sum;sums)#\:w i:iasc v
2 2576
q)wmed:{x i sum .[>]0.5 1*(sum;sums)#\:y i:iasc x}
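Calling the function on the same v and w reproduces the result from above:
q)wmed[v;w]
36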
Some vector techniques worth noticing:
Applying two functions with Each Left, (sum;sums)#\:, then using Apply (.) with an operator on the result rather than setting a variable, e.g. (0.5*sum yi)>sums yi:y i, or defining an inner lambda {sums[x]<0.5*sum x}y i
Grading one list with iasc to sort another (see the small example after this list)
Multiple mappings through juxtaposition: v i sum ..
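A small illustration of the grading idiom, with hypothetical names and scores (not from the thread):
q)show names:`c`a`b
`c`a`b
q)show scores:30 10 20
30 10 20
q)scores iasc names / reorder scores into alphabetical order of names
10 20 30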
You could effectively weight the median by duplicating (using where):
q)med 10 34 23 123 5 56 where 4 1 1 1 1 1
10f
q)med 10 34 23 123 5 56 where 1 1 1 1 1 4
56f
q)med 10 34 23 123 5 56 where 1 2 1 3 2 1
34f
If your weights are percentages (e.g. 0.15 0.10 0.20 0.30 0.25) then convert them to equivalent whole/counting numbers
q)med 1 2 3 4 5 where "i"$100*0.15 0.10 0.20 0.30 0.25
4f

How can I get the cumulative sum between a predefined number of days in SAS EG

I would like to find a way to calculate the cumulative sum between a predefined number of days.
Just to clarify, I am looking to calculate the cumulative sum between days. Those can be consecutive (e.g. 03AUG and 04AUG) or not (e.g. 04AUG and 06AUG).
Below the data I have:
data have;
input ID $ DT:date9. Amount;
format DT date9.;
datalines;
A 09JUL2021 3600
A 03AUG2021 456
A 04AUG2021 33
A 06AUG2021 235
A 07AUG2021 100
A 09AUG2021 86
A 12AUG2021 456
A 24AUG2021 22
A 25AUG2021 987
A 26AUG2021 916
A 27AUG2021 81
;
run;
I want to be able to create a new variable that shows the cumulative amount between 2 or more days.
I should be able every time to select if I want the cumulative amount between 2 days, or 3 days and so on.
Below the data I want, when I select to calculate the sum between two days:
data what_I_want;
input ID $ DT:date9. Amount Sum_Between_Days;
format DT date9.;
datalines;
A 09JUL2021 3600 0
A 03AUG2021 456 0
A 04AUG2021 33 489
A 06AUG2021 235 268
A 07AUG2021 100 335
A 09AUG2021 86 186
A 12AUG2021 456 0
A 24AUG2021 22 0
A 25AUG2021 987 0
A 26AUG2021 916 1925
A 27AUG2021 81 1984
;
run;
Below the data I want when I select to calculate the sum between 3 days:
data what_I_want;
input ID $ DT:date9. Amount Sum_Between_Days;
format DT date9.;
datalines;
A 09JUL2021 3600 0
A 03AUG2021 456 0
A 04AUG2021 33 0
A 06AUG2021 235 724
A 07AUG2021 100 0
A 09AUG2021 86 186
A 12AUG2021 456 0
A 24AUG2021 22 0
A 25AUG2021 987 0
A 26AUG2021 916 0
A 27AUG2021 81 2006
;
run;
Hopefully I'm making sense, but please let me know if not.
Thanks in advance.
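One possible starting point (not from the original thread, just a sketch): if "the sum between N days" is read as a rolling sum over the window from DT minus N days up to the current DT, counted only when at least one other record falls in that window, then a self-join in PROC SQL can compute it. This interpretation does not reproduce every row of the desired output above, so treat the window definition as an assumption to adjust.
%let ndays = 2; /* window size in days - assumed parameter */
proc sql;
create table want as
select a.ID, a.DT format=date9., a.Amount,
/* sum over the window, but 0 when the record stands alone in it */
case when count(b.DT) > 1 then sum(b.Amount) else 0 end as Sum_Between_Days
from have as a
left join have as b
on a.ID = b.ID and b.DT between a.DT - &ndays. and a.DT
group by a.ID, a.DT, a.Amount
order by a.ID, a.DT;
quit;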

KDB+/Q: Custom min max scaler

I'm trying to implement a custom min-max scaler in kdb+/q. I have taken note of the implementation located in the ml package; however, I'm looking to be able to scale data between a custom range, i.e. 0 and 255. What would be an efficient implementation of min-max scaling in kdb+/q?
Thanks
Looking at the link to GitHub on the page you referenced, it looks like you may be able to define a function like so:
minmax255:{[sf;x]sf*(x-mnx)%max[x]-mnx:min x}[255]
Where sf is your scaling factor (here given by 255).
q)minmax255 til 10
0 28.33333 56.66667 85 113.3333 141.6667 170 198.3333 226.6667 255
If you don't like decimals you could round to the nearest whole number like:
q)minmax255round:{[sf;x]floor 0.5+sf*(x-mnx)%max[x]-mnx:min x}[255]
q)minmax255round til 10
0 28 57 85 113 142 170 198 227 255
(the logic here is that if I take a number like 1.7, add 0.5, and floor it, I wind up with 2, whereas a number like 1.2, plus 0.5 and floored, ends up as 1)
If you don't want to start at 0 you could use |, which takes the max of its left and right arguments:
q)minmax255roundlb:{[sf;lb;x]lb|floor sf*(x-mnx)%max[x]-mnx:min x}[255;10]
q)minmax255roundlb til 10
10 28 56 85 113 141 170 198 226 255
Where I'm using lb to mean 'lower bound'
If you want to apply this to a table you could use
q)show testtab:([]a:til 10;b:til 10)
a b
---
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
q)update minmax255 a from testtab
a b
----------
0 0
28.33333 1
56.66667 2
85 3
113.3333 4
141.6667 5
170 6
198.3333 7
226.6667 8
255 9
The following will work nicely
minmaxCustom:{[l;u;x]l + (u - l) * (x-mnx)%max[x]-mnx:min x}
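For example, applied to the same til 10 input used above (shown at the default display precision):
q)minmaxCustom[10;255;til 10]
10 37.22222 64.44444 91.66667 118.8889 146.1111 173.3333 200.5556 227.7778 255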
As petty as it sounds, it is my strong recommendation that you do not follow through with Shehir94's solution for a custom minimum value. Applying a maximum to force the start of the range will mess with the original distribution. A custom min-max scaling should be a simple linear transformation on top of a standard 0-1 min-max transformation.
X' = a + bX
For example, to get a custom scaling of 10-255, that would be b=245 and a=10; we would expect the new mean to follow this formula and the standard deviation to change only by the multiplicative factor, but applying a lower bound messes with this, for example:
q)dummyData:10000?100.0
q)stats:{`transform`minVal`maxVal`avgVal`stdDev!(x;min y;max y; avg y; dev y)}
q)minmax255roundlb:{[sf;lb;x]lb|sf*(x-mnx)%max[x]-mnx:min x}[255;10]
q)minmaxCustom:{[l;u;x]l + (u - l) * (x-mnx)%max[x]-mnx:min x}
q)res:stats'[`orig`lb`linear;(dummyData;minmax255roundlb dummyData;minmaxCustom[10;255;dummyData])]
q)res
transform minVal maxVal avgVal stdDev
-----------------------------------------------
orig 0.02741043 99.98293 50.21896 28.92852
lb 10 255 128.2518 73.45999
linear 10 255 133.024 70.9064
// The transformed average should roughly be
q)10 + ((255-10)%100)*49.97936
132.4494
// The transformed std deviation should roughly be
q)2.45*28.92852
70.87487
To answer the comment, this could be applied over a large number of columns; it would be applied to a table in the following manner:
q)n:10000
q)tab:([]sym:n?`3;col1:n?100.0;col2:n?100.0;col3:n?1000.0)
q)multiColApply:{[tab;scaler;colList]flip ft,((),colList)!((),scaler each (ft:flip tab)[colList])}
q)multiColApply[tab;minmaxCustom[10;20];`col1`col2]
sym col1 col2 col3
------------------------------
cag 13.78461 10.60606 392.7524
goo 15.26201 16.76768 517.0911
eoh 14.05111 19.59596 515.9796
kbc 13.37695 19.49495 406.6642
mdc 10.65973 12.52525 178.0839
odn 16.24697 17.37374 301.7723
ioj 15.08372 15.05051 785.033
mbc 16.7268 20 534.7096
bhj 12.95134 18.38384 711.1716
gnf 19.36005 15.35354 411.597
gnd 13.21948 18.08081 493.1835
khi 12.11997 17.27273 578.5203

Merge overwrite stopping at first match

I currently have three data sets in SAS 9.3.
Data set "Main" contains SKU ID's and Customer ID's as well as various other variables such as week.
Customer_ID week var2 var3 SKU_ID
1 1 x x 1
1 2 x x 1
1 3 x x 1
1 1 x x 2
1 2 x x 2
2 1 x x 1
2 2 x x 1
2 3 x x 1
2 1 x x 2
2 2 x x 2
data set "standard" contains the standard location for each Customer_ID.
data set "overrides" contains data override location (if applicable) for a certain sku for certain customers for instance. Thus, it contains SKU_ID, customer_id and location
standard data set
customer_id location
1 A
1 A
2 C
2 C
override dataset
customer_id sku_id location
1 1 A
1 2 B
When merging all of the data sets, this is what I get:
Customer_ID week var2 var3 SKU_ID location
1 1 x x 1 A
1 2 x x 1 A
1 3 x x 1 A
1 1 x x 2 B
1 2 x x 2 A
2 1 x x 1 C
2 2 x x 1 C
2 3 x x 1 C
versus what I want it to look like:
Customer_ID week var2 var3 SKU_ID location
1 1 x x 1 A
1 2 x x 1 A
1 3 x x 1 A
1 1 x x 2 B
1 2 x x 2 B
2 1 x x 1 C
2 2 x x 1 C
2 3 x x 1 C
proc sort data=overrides; by Location SKU_ID; run;
Proc sort data= main; by Location SKU_ID;
run;
Proc sort data= Standard; by Location;
run;
data Loc_Standard No_LOC;
Merge Main(in = a) Standard(in = b);
by Location;
if a and b then output Loc_standard;
else if b then output No_LOC;
run;
/*overwrites standard location if an override for a sku exist*/
Data Loc_w_overrides;
Merge Loc_standard overrides;
by Location SKU_ID;
run;
That is how SAS combines datasets. When datasets have observations to contribute to a BY group, the values from the datasets are read in the order they appear in the MERGE statement. But when one dataset runs out of new observations for the BY group, SAS stops reading values from it, so the value read from the other dataset is no longer replaced.
Either drop the original variable and just use the value from the second dataset; basically this sets up a one-to-many merge (see the small sketch below).
Or rename the override variable and add your own logic for when to apply the override.
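As a small sketch of the first option, with hypothetical datasets ONE and TWO (not your data): when only the dataset contributing one row per BY group carries the variable, its value is retained across the whole BY group instead of being overwritten.
data one;
input id row;
datalines;
1 1
1 2
1 3
;
data two;
input id x;
datalines;
1 99
;
data both;
merge one two;
by id;
run;
/* both now has x=99 on all three rows for id=1 */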
I am not sure how you are getting the result you posted since you do not have any standards for CUSTOMER_ID=2 in your posted data. If the values of location do not depend on customer_id, then why is that variable in the standards and override datasets?
Perhaps you meant that the standards dataset only has SKU_ID and location?
data main_w_standards;
merge main standards;
by sku_id ;
run;
proc sort data=main_w_standards;
by customer_id sku_id;
run;
data main_w_overrides;
merge main_w_standards overrides(in=in2 rename=(location=override));
by customer_id SKU_ID;
if in2 then location=override;
drop override;
run;
Why not UPDATE the STANDARD (loc) with the OVERRIDE (oride) and then merge with the customer data?
data loc;
input customer_id Sku_id location:$1.;
cards;
1 1 A
1 2 A
;;;;
proc print;
data oride;
input customer_id sku_id location:$1.;
cards;
1 1 A
1 2 B
;;;;
run;
proc print;
data locoride;
update loc oride;
by cu: sk:;
run;

kdb lookup: get value from table that is mapped to smallest val larger than x

Assuming I have a dict
d:flip(100 200 400 800 1600; 1 3 4 6 10)
how can I create a lookup function that returns the value of the smallest key that is larger than x? Given a table
tbl:flip `sym`val!(`a`b`c`d; 50 280 1200 1800)
I would like to do something like
{[x] : update new:fun[x[`val]] from x} each tbl
to end up at a table like this
tbl:flip `sym`val`new!(`a`b`c`d; 50 280 1200 1800; 1 4 10 0N)
sym val new
a 50 1
b 280 4
c 1200 10
d 1800
stepped dictionaries may help
http://code.kx.com/q/cookbook/temporal-data/#stepped-attribute
q)d:`s#-0W 100 200 400 800 1600!1 3 4 6 10 0N
q)d 50 280 1200 1800
1 4 10 0N
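To fill the new column in the table, the same stepped dictionary can be applied directly to the val column, e.g.:
q)update new:d val from tbl
sym val new
------------
a 50 1
b 280 4
c 1200 10
d 1800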
I think you will want to use binr to return the next element greater than or equal to x. Note that you should use a sorted list for this to work correctly. For the examples above, converting d to a dictionary with d:(!). flip d, I came up with:
q)k:asc key d
q)d k k binr tbl`val
1 4 10 0N
q)update new:d k k binr val from tbl
sym val new
------------
a 50 1
b 280 4
c 1200 10
d 1800
Where you get the dictionary keys to use with: k k binr tbl`val.
Edit: if the value in the table needs to be mapped to a value greater than x but not equal to it, you could try:
q)show tbl:update val:100 from tbl where i=0
sym val
--------
a 100
b 280
c 1200
d 1800
q)update new:d k (k-1) binr val from tbl
sym val new
------------
a 100 3
b 280 4
c 1200 10
d 1800