How to replace identical values in multiple cases with the values of another variable? (SPSS)

If my variable REF has the value NA in a certain line (case), I want it to be the value of CASE instead of NA.
My data set:
CASE REF
1 NA
2 NA
3 1
4 NA
5 2
6 1
7 1
8 4
Desired output:
CASE REF
1 1
2 2
3 1
4 4
5 2
6 1
7 1
8 4
I tried "Recode into Same Variables", but somehow I don't know how to reference the variable CASE in there. What is the correct way to use SPSS for this?

I am assuming ref is a numeric variable and that by NA you mean missing values. If this is not the case, let me know in a comment and I will revise the solution accordingly.
Assuming CASE is a variable in your dataset, this should do it:
if missing(ref) ref=case.
If by "CASE" you are referring to the case number and not a variable in the dataset, use this instead:
if missing(ref) ref=$casenum.
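For readers who want to check the rule outside SPSS, the same logic can be sketched in Python. This is only an illustration: the list `rows` and the use of `None` for a missing value are assumptions made for this example, not part of the SPSS syntax above.

```python
# Each row is (CASE, REF); None stands in for a missing REF (assumption).
rows = [(1, None), (2, None), (3, 1), (4, None), (5, 2), (6, 1), (7, 1), (8, 4)]

# Mirror of `if missing(ref) ref=case.`: keep REF unless it is missing.
filled = [(case, case if ref is None else ref) for case, ref in rows]

for case, ref in filled:
    print(case, ref)
```

The printed rows should match the desired output in the question.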

Related

Merge 2 lists according to values in a boolean list

I have a method of achieving this which also explains my question.
a:1 2 3 4;
b:5 6 7;
cond:1101001b;
comb:(count cond) # 0N;
comb[where cond]:a;
comb[where not cond]:b
But q has so many utilities for manipulating lists, I am wondering if there is a more direct way of doing this.
rank is what you need.
q)comb
1 2 5 3 6 7 4
q)(b,a)rank cond
1 2 5 3 6 7 4
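Why this works: rank cond gives, for each element, the position it would take in a stable ascending sort, so the 0s map to positions 0..2 (the b part of b,a) and the 1s map to positions 3..6 (the a part). A rough Python sketch of the same idea follows; the helper `rank` is a stand-in written for this example, not q's built-in.

```python
def rank(xs):
    """Position each item would occupy in a stable ascending sort."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])  # sorted() is stable
    ranks = [0] * len(xs)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks

a = [1, 2, 3, 4]
b = [5, 6, 7]
cond = [1, 1, 0, 1, 0, 0, 1]

joined = b + a  # items for the 0s first, then items for the 1s
comb = [joined[r] for r in rank(cond)]
print(comb)
```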
You could write the expression in a single line
comb:@[;where not cond;:;b] @[;where cond;:;a] (count cond)#0N
Alternatively, assuming the numbers of 1s and 0s in cond match the lengths of a and b:
(a,b) iasc where[cond],where not cond

How to merge multiple cases into one in SPSS?

I want to fill in missing values for a case with values from cases in a different file. The corresponding cases have the same reference number, variable REF. In the end, there should be only one case per reference number, with no missing values in any variable. I already tried Data -> Merge files -> Add variables -> many to one, but I still end up with multiple cases per reference number, or with no change at all in the table. I can't figure out how this works.
My two data sets:
REF p1 p2 p3
1 5 NA NA
2 3 NA NA
3 4 NA NA
REF p1 p2 p3
1 NA 3 NA
1 NA NA 1
2 NA 2 NA
2 NA NA 4
3 NA 1 NA
3 NA NA 1
Desired output:
REF p1 p2 p3
1 5 3 1
2 3 2 4
3 4 1 1
What I tried, but did not work:
I suggest you first stack the two files, so that all the data is in one table, then use aggregation to get all the data for each case into one line. I suggest aggregation using the max function under the assumption that for every REF only one value exists in each column, so the aggregation will select this value and leave out the other "competing" missing values.
EDITED to leave only one line per "REF":
add files /file = dataset1 /file = dataset2.
exe.
dataset name gen.
aggregate /outfile=* /break=REF /P1 P2 P3=max(P1 P2 P3).
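To see why stacking followed by a max aggregation collapses the rows, here is a small Python sketch of the same two steps. The names `dataset1`, `dataset2`, and `merged`, and the use of `None` for a missing value, are assumptions for this illustration only.

```python
# Rows are (REF, p1, p2, p3); None stands in for a missing value (assumption).
dataset1 = [(1, 5, None, None), (2, 3, None, None), (3, 4, None, None)]
dataset2 = [
    (1, None, 3, None), (1, None, None, 1),
    (2, None, 2, None), (2, None, None, 4),
    (3, None, 1, None), (3, None, None, 1),
]

merged = {}
for ref, *values in dataset1 + dataset2:  # "add files": stack the rows
    current = merged.setdefault(ref, [None] * len(values))
    for i, value in enumerate(values):
        if value is None:
            continue  # missing values never win
        if current[i] is None or value > current[i]:
            current[i] = value  # keep the max per column within each REF

for ref in sorted(merged):
    print(ref, *merged[ref])
```

As in the answer, this relies on each REF having at most one non-missing value per column; with several competing values the max would silently pick the largest.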

(q/kdb+) Merge items in a list

I have a list of items and need to merge them into a single column, using the list
list:(1 2;3 4 5 7;0 1 3)
index value
0     1 2
1     3 4 5 7
2     0 1 3
my goal is
select from list2
value
1
2
3
4
5
7
0
1
3
The raze function flattens one level of a list.
q) raze (1 2;3 4 5 7;0 1 3)
1 2 3 4 5 7 0 1 3
If the list is nested more than one level deep, use the over adverb (/) with raze:
q) (raze/)(1 2 3;(11 12;33 44);5 6)
To convert that to table column:
q) t:([]c:raze list)
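The difference between flattening one level and flattening repeatedly can be sketched in Python. The functions `raze` and `raze_over` below are hypothetical helpers written for this illustration; they imitate the behaviour described above and are not part of kdb+.

```python
def raze(items):
    """Flatten one level: sublists are spliced in, atoms are kept as they are."""
    out = []
    for item in items:
        if isinstance(item, list):
            out.extend(item)
        else:
            out.append(item)
    return out

def raze_over(items):
    """Apply raze repeatedly until the result stops changing."""
    while True:
        flatter = raze(items)
        if flatter == items:
            return flatter
        items = flatter

print(raze([[1, 2], [3, 4, 5, 7], [0, 1, 3]]))
print(raze_over([[1, 2, 3], [[11, 12], [33, 44]], [5, 6]]))
```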
ungroup would also work provided your table doesn't have multiple columns with different nesting (or strings)
q)ungroup ([]list)
list
----
1
2
3
4
5
7
0
1
3
If you just wanted your list to appear like that I would do the following.
1 cut raze list
I see that you have used a select statement. If you want the flattened list stored as a column in your table, do the following:
a:raze list
tab:([] b:a)
Your output from this should look like this
q)tab
b
-
1
2
3
4
5
7
0
1
3
Overall, a more concise way to achieve what you want to do would be
select from ([]raze list)
To avoid errors, do not name the column 'value': it is a reserved word in kdb+, and when you try to assign it as a column name kdb+ will throw an assign error
`assign
Hope this helps

How to get the difference of matrices without removing repetitions

The function setdiff(A,B,'rows') is used to return the set of rows that are in A but not B, with repetitions removed.
Is there any way to do it without removing the repetitions?
Thanks a lot.
You can use ismember instead of setdiff to find all the rows of A that appear in B.
Because you want only the rows that do NOT appear in B, negate the result with ~ and use it as a logical index into the rows of A:
A =
1 2 3
4 5 6
1 2 3
7 8 9
B =
4 5 6
C=A(~ismember(A,B,'rows'),:)
C =
1 2 3
1 2 3
7 8 9
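The same masking idea, written as a short Python sketch for comparison. The set `rows_of_B` is a name introduced here for illustration; it plays the role of the ismember test, and the list comprehension plays the role of the negated logical index.

```python
A = [[1, 2, 3], [4, 5, 6], [1, 2, 3], [7, 8, 9]]
B = [[4, 5, 6]]

# Rows of B as hashable tuples, so membership can be tested per row of A.
rows_of_B = {tuple(row) for row in B}

# Keep every row of A that is not in B; duplicates and order are preserved.
C = [row for row in A if tuple(row) not in rows_of_B]
print(C)
```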

Pass multiple arguments to a function within select

I'd like to calculate a new column which is a function of several columns using select.
My actual application will involve a grouping in the select, so the column entries that I pass to the function will contain lists. But this simple example illustrates my question:
t:([] a:1 2 3; b:10 20 30; c:5 6 7)
/ Pass one argument, using projection (set first two arguments to 1)
select s:{[x;y;z] x+y+z}[1;1;] each a from t
/ Pass two arguments using each-both (set first arg to 1)
select s:a {[x;y;z] x+y+z}[1;;]'b from t
Now, how can I pass three or more arguments?
Each (') will work in general, but it's best to use vector operations where possible. Here I use the . (apply) operator to call the function once on the whole columns, and \t to time both methods. I store the results in r1 and r2 to show that they are the same:
q)t:([]a:til n;b:til n;c:til n:1200300)
q)\t r1:update d:{x+y+z}'[a;b;c] from t
289
q)\t r2:update d:{x+y+z} . (a;b;c) from t
20
q)r1~r2
1b
q)r2
a b c d
-----------
0 0 0 0
1 1 1 3
2 2 2 6
3 3 3 9
4 4 4 12
5 5 5 15
..
Cheers,
Ryan
The following form works in general
q)t:([]a:til 10;b:til 10;c:til 10)
q)select d:{x+y+z}'[a;b;c] from t
d
--
0
3
6
9
..