How to merge multiple cases into one in SPSS?

I want to fill in missing values for a case with values from cases in a different file. The corresponding cases have the same reference number, stored in the variable REF. In the end, there should be only one case per reference number, with no missing values in any variable. I already tried Data -> Merge Files -> Add Variables -> many-to-one, but I still end up with multiple cases per reference number, or with no change at all in the table. I can't figure out how this works.
My two data sets:
REF p1 p2 p3
1 5 NA NA
2 3 NA NA
3 4 NA NA
REF p1 p2 p3
1 NA 3 NA
1 NA NA 1
2 NA 2 NA
2 NA NA 4
3 NA 1 NA
3 NA NA 1
Desired output:
REF p1 p2 p3
1 5 3 1
2 3 2 4
3 4 1 1

I suggest you first stack the two files so that all the data is in one table, then use aggregation to get all the data for each case onto one line. Aggregating with the max function works under the assumption that, for every REF, at most one non-missing value exists in each column: the aggregation selects that value and ignores the "competing" missing values.
EDITED to leave only one line per "REF":
add files /file = dataset1 /file = dataset2.
exe.
dataset name gen.
aggregate /outfile=* /break=REF /P1 P2 P3=max(P1 P2 P3).
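
For readers who want to check the logic outside SPSS, the stack-then-aggregate idea can be sketched in plain Python. This is only an illustration under stated assumptions, not SPSS syntax: the function name merge_cases is hypothetical, None stands in for a missing value (NA), and the data is copied from the question.

```python
# Illustrative sketch only: mimics stacking two files and aggregating with MAX.
# None stands in for a missing value; merge_cases is a hypothetical helper.

def merge_cases(rows, key="REF"):
    """Collapse rows sharing the same key, keeping the max non-missing value per column."""
    merged = {}
    for row in rows:
        target = merged.setdefault(row[key], {key: row[key]})
        for col, value in row.items():
            if col == key or value is None:
                continue  # skip the key itself and any missing value
            current = target.get(col)
            target[col] = value if current is None else max(current, value)
    return [merged[k] for k in sorted(merged)]


dataset1 = [
    {"REF": 1, "p1": 5, "p2": None, "p3": None},
    {"REF": 2, "p1": 3, "p2": None, "p3": None},
    {"REF": 3, "p1": 4, "p2": None, "p3": None},
]
dataset2 = [
    {"REF": 1, "p1": None, "p2": 3, "p3": None},
    {"REF": 1, "p1": None, "p2": None, "p3": 1},
    {"REF": 2, "p1": None, "p2": 2, "p3": None},
    {"REF": 2, "p1": None, "p2": None, "p3": 4},
    {"REF": 3, "p1": None, "p2": 1, "p3": None},
    {"REF": 3, "p1": None, "p2": None, "p3": 1},
]

stacked = dataset1 + dataset2   # the "add files" step
result = merge_cases(stacked)   # the "aggregate ... max" step
for row in result:
    print(row)
```

If a column held two different non-missing values for the same REF, this sketch (like the max aggregation) would silently keep the larger one, so the one-value-per-column assumption matters.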

Related

How to replace identical values in multiple cases with the values of another variable? (SPSS)

If my variable REF has the value NA in a certain line (case), I want it to be the value of CASE instead of NA.
My data set:
CASE REF
1 NA
2 NA
3 1
4 NA
5 2
6 1
7 1
8 4
Desired output:
CASE REF
1 1
2 2
3 1
4 4
5 2
6 1
7 1
8 4
I tried "Recode into Same Variables", but somehow I don't know how to reference the variable CASE in there. What is the correct way to use SPSS for this?
I am assuming ref is a numeric variable and that by NA you mean missing values. If this is not the case, let me know in a comment and I will revise the solution accordingly.
Assuming CASE is a variable in your dataset, this should do it:
if missing(ref) ref=case.
If by "CASE" you are referring to the case number and not a variable in the dataset, use this instead:
if missing(ref) ref=$casenum.
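
The same conditional fill can be sketched in Python to confirm the expected result. This is a rough illustration rather than SPSS syntax; None is assumed to represent a missing REF, and the lists are copied from the question.

```python
# Illustrative sketch only: replace a missing REF with the CASE value of the same row.
# None stands in for a missing value.
cases = [1, 2, 3, 4, 5, 6, 7, 8]
refs = [None, None, 1, None, 2, 1, 1, 4]

filled = [case if ref is None else ref for case, ref in zip(cases, refs)]
print(filled)
```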

How to update multiple tables in kdb

Say I have a list of tables. (sym1, sym2, sym3 etc)
How would I add a new column to each table called Sym containing the table name?
Thank you
You could use something like:
q){@[value x;`Sym;:;x]}each tables[]
+`a`b`c`Sym!(0 1 2 3 4;0 1 2 3 4;0 1 2 3 4;`sym1`sym1`sym1`sym1`sym1)
+`a`b`c`Sym!(0 1 2 3 4;0 1 2 3 4;0 1 2 3 4;`sym2`sym2`sym2`sym2`sym2)
+`a`b`c`Sym!(0 1 2 3 4;0 1 2 3 4;0 1 2 3 4;`sym3`sym3`sym3`sym3`sym3)
If you remove value from the first argument of @, this will update the tables in place.
Otherwise, since this returns a list, you can use indexing to return the table you want from the list:
q)({@[value x;`Sym;:;x]}each tables[])0
a b c Sym
----------
0 0 0 sym1
1 1 1 sym1
2 2 2 sym1
3 3 3 sym1
4 4 4 sym1
Hope this helps,
James
Another way to achieve this:
q){update Sym:x from x}each `sym1`sym2`sym3
q)raze (sym1;sym2;sym3)
p s Sym
----------------
2.08725 75 sym1
2.065687 6 sym1
2.058972 63 sym2
2.095509 62 sym2
2.036151 90 sym3
2.090895 63 sym3
If you are getting these tables (sym1, sym2, sym3) as the output of another function call such as:
f each `s1`s2`s3
then I suggest updating the function to add the Sym column just before it returns each individual table.
f:{
/some logic
update Sym:x from t
}
This saves the separate operation of adding the new column afterwards.
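
As a language-neutral illustration of the idea (tag every row with the name of the table it came from, then combine), here is a small Python sketch. It is not q code; the dictionary layout and the names tables, tagged and combined are assumptions made for the example, with values copied from the output above.

```python
# Illustrative sketch only: each "table" is a list of row dicts keyed by table name.
tables = {
    "sym1": [{"p": 2.08725, "s": 75}, {"p": 2.065687, "s": 6}],
    "sym2": [{"p": 2.058972, "s": 63}, {"p": 2.095509, "s": 62}],
    "sym3": [{"p": 2.036151, "s": 90}, {"p": 2.090895, "s": 63}],
}

# Add a Sym column holding the table name to every row of every table.
tagged = {
    name: [dict(row, Sym=name) for row in rows]
    for name, rows in tables.items()
}

# Flatten into one combined table, similar in spirit to raze.
combined = [row for rows in tagged.values() for row in rows]
for row in combined:
    print(row)
```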

Tableau Frequency Distribution - multiple groups

I have data which includes 2 columns, ages, and groups similar to
Group Age
B 1
B 1
B 1
B 4
B 5
B 8
D 2
D 2
D 3
D 3
D 3
D 4
D 6
D 7
D 9
D 9
In Tableau, I wish to plot a line for each group, B and D, showing the percentage of records (observations) within that group against the age range, 1 to 9.
So B at age 1 is 3/6*100, B at age 5 is 1/6*100, and D at age 3 is 3/10*100.
Any help or pointers would be really appreciated.
Enda
Drag the 'Age' measure to Columns.
Drag the 'Group' dimension to 'Color'.
Drag Tableau's default 'Number of Records' measure to Rows. Set its aggregation to 'Sum', add the 'Percent of Total' quick table calculation, and change its 'Compute Using' to 'Age'.
That's it! Hopefully this is what you were trying to do.
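
To sanity-check what the chart should show, the within-group percentages can be computed by hand. The following Python sketch is only a cross-check of the arithmetic, not a Tableau feature; the variable names are made up for the example and the records are the ones listed in the question.

```python
from collections import Counter

# (group, age) records copied from the question.
records = [
    ("B", 1), ("B", 1), ("B", 1), ("B", 4), ("B", 5), ("B", 8),
    ("D", 2), ("D", 2), ("D", 3), ("D", 3), ("D", 3),
    ("D", 4), ("D", 6), ("D", 7), ("D", 9), ("D", 9),
]

group_totals = Counter(group for group, _ in records)  # records per group
pair_counts = Counter(records)                         # records per (group, age)

# Percent of each group's records that fall at each age.
percent = {
    (group, age): 100 * count / group_totals[group]
    for (group, age), count in pair_counts.items()
}

for (group, age), value in sorted(percent.items()):
    print(group, age, round(value, 1))
```

Each group's percentages add up to 100, which is what computing 'Percent of Total' along Age, separately for each Group, is expected to give.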

How to sum across a row in KDB/Q

I have a table rCom which has various columns. I would like to sum across each row.
for example:
Date TypeA TypeB TypeC TypeD
date1 40.5 23.1 45.1 65.2
date2 23.3 32.2 56.1 30.1
How can I write a q query to add a new column 'Total' that sums across each row?
Why not just add the columns directly?
update Total: TypeA+TypeB+TypeC+TypeD from rCom
Sum will work just fine:
q)flip`a`b`c!3 3#til 9
a b c
-----
0 3 6
1 4 7
2 5 8
q)update d:sum(a;b;c) from flip`a`b`c!3 3#til 9
a b c d
--------
0 3 6 9
1 4 7 12
2 5 8 15
sum supports map-reduce, which will be better for a huge table.
One quick point regarding summing across rows: be careful about a null in one column producing a null result for the sum. Borrowing @WooiKent Lee's example,
we put a null into the first position of the a column. Notice how the sum for that row now becomes null:
q)wn:.[flip`a`b`c!3 3#til 9;(0;`a);first 0#] //with null
q)update d:sum (a;b;c) from wn
a b c d
--------
  3 6
1 4 7 12
2 5 8 15
This is a direct effect of the way nulls are treated in q. If you sum across a simple list, the nulls are ignored:
q)sum 1 2 3 0N
6
However, a sum across a general list will not display this behavior:
q)sum (),/:1 2 3 0N
,0N
So, for your table situation, you might want to fill in with a zero beforehand:
q)update d:sum 0^(a;b;c) from wn
a b c d
--------
  3 6 9
1 4 7 12
2 5 8 15
Alternatively, arrange things so that you are actually summing across simple lists rather than general lists:
q)update d:sum each flip (a;b;c) from wn
a b c d
--------
  3 6 9
1 4 7 12
2 5 8 15
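
The two behaviours above (a null poisoning the row total versus nulls being treated as zero) can be mimicked in Python for illustration. This is only an analogy: Python's None is not a q null, and the helper names row_sum_strict and row_sum_fill are hypothetical.

```python
# Illustrative analogy only: None plays the role of a null in column a.

def row_sum_strict(values):
    """Return None if any value is missing, mirroring a null propagating into the total."""
    return None if any(v is None for v in values) else sum(values)


def row_sum_fill(values):
    """Treat missing values as zero before summing, like filling with 0 first."""
    return sum(0 if v is None else v for v in values)


columns = {"a": [None, 1, 2], "b": [3, 4, 5], "c": [6, 7, 8]}
rows = list(zip(*columns.values()))  # turn columns into rows

strict = [row_sum_strict(r) for r in rows]
filled = [row_sum_fill(r) for r in rows]
print(strict)
print(filled)
```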
For a more complete reference on null treatment, please see the reference website.
This is what worked:
select Answer:{[x;y;z;a] x+y+z+a }'[TypeA;TypeB;TypeC;TypeD] from
([] dt:2014.01.01 2014.01.02 2014.01.03; TypeA:4 5 6; TypeB:1 2 3; TypeC:8 9 10; TypeD:3 4 5)

Pass multiple arguments to a function within select

I'd like to calculate a new column which is a function of several columns using select.
My actual application will involve a grouping in the select, so the column entries which I will pass to the function will contain lists. But this simple example illustrates my question:
t:([] a:1 2 3; b:10 20 30; c:5 6 7)
/ Pass one argument, using projection (set first two arguments to 1)
select s:{[x;y;z] x+y+z}[1;1;] each a from t
/ Pass two arguments using each-both (set first arg to 1)
select s:a {[x;y;z] x+y+z}[1;;]'b from t
Now, how can I pass three or more arguments?
Each (') works in general, but it's best to use vector operations where possible. Here I use the . (apply) operator to call the function on whole columns, and \t to time both methods. I store the results in r1 and r2 to show that they are the same:
q)t:([]a:til n;b:til n;c:til n:1200300)
q)\t r1:update d:{x+y+z}'[a;b;c] from t
289
q)\t r2:update d:{x+y+z} . (a;b;c) from t
20
q)r1~r2
1b
q)r2
a b c d
-----------
0 0 0 0
1 1 1 3
2 2 2 6
3 3 3 9
4 4 4 12
5 5 5 15
..
Cheers,
Ryan
The following form works in general:
q)t:([]a:til 10;b:til 10;c:til 10)
q)select d:{x+y+z}'[a;b;c] from t
d
--
0
3
6
9
..
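
For comparison, the same "apply a three-argument function element by element across several columns" pattern looks like this in Python. It is only an analogy to the each form above, not q code; the values are taken from the table t in the question and the function name f is made up.

```python
# Illustrative analogy only: element-wise application of a 3-argument function.
a = [1, 2, 3]
b = [10, 20, 30]
c = [5, 6, 7]


def f(x, y, z):
    return x + y + z


# map accepts several iterables and passes one element from each to f.
s = list(map(f, a, b, c))

# Equivalent form that unpacks each zipped row into the arguments.
s_alt = [f(*args) for args in zip(a, b, c)]
print(s)
```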