Creating a unique ID variable using information from two other variables - group-by

I have a large dataset of children and their parents (could be one or two),
collected over multiple waves.
The child has a unique ID, but the parents
have just been called parent1 "1" or parent2 "2", so they do not have their own unique ID.
I would like to make a new variable like "New ParentID" below which
gives each parent a unique ID.
`
> ChildID = c("1","1","1","1","1","2","2","2","2","3","3","3","3","3","3")
> StudyWave = c("1","2","3","4","5","2","3","4","6","1","2","3","4","5","6")
> ParentID = c("1","2","1","2","1","1","1","1","1","1","2","1","2","1","2")
> NewParentID = c("1","2","1","2","1","3","3","3","3","4","5","4","5","4","5")
> data=cbind.data.frame(ChildID, StudyWave, ParentID, NewParentID)
> data
ChildID StudyWave ParentID NewParentID
1 1 1 1 1
2 1 2 2 2
3 1 3 1 1
4 1 4 2 2
5 1 5 1 1
6 2 2 1 3
7 2 3 1 3
8 2 4 1 3
9 2 6 1 3
10 3 1 1 4
11 3 2 2 5
12 3 3 1 4
13 3 4 2 5
14 3 5 1 4
15 3 6 2 5
`
Many thanks in advance for any suggestions - I am stuck.

Related

KDB+/Q:Input agnostic function for single and multi row tables

I have tried using the following function to derive a table consisting of 3 columns with one column data holding a list of an arbitrary schema.
fn:{
flip `time`data`id!(x`b;(x`a`b`c`d`e);x`a)
};
which works well on input with multiple rows i.e.:
q)x:flip `a`b`c`d`e!(5#enlist 5?10)
q)fn[`time`data`id!(x`b;(x`a`b`c`d`e);x`a)]
time data id
-----------------
8 8 5 2 8 6 8
5 8 5 2 8 6 5
2 8 5 2 8 6 2
8 8 5 2 8 6 8
6 8 5 2 8 6 6
However fails when using input with a single row i.e.
q)x:`a`b`c`d`e!5?10
q)fn[`time`data`id!(x`b;(x`a`b`c`d`e);x`a)]
time data id
------------
8 7 7
8 8 7
8 4 7
8 4 7
8 6 7
which is obviously incorrect.
One might fix this by using enlist i.e.
q)x:enlist `a`b`c`d`e!5?10
q)fn[`time`data`id!(x`b;(x`a`b`c`d`e);x`a)]
time| 8
data| 7 8 4 4 6
id | 7
Which is correct, however if one were to apply this in the function i.e.
fn:{
flip enlist `time`data`id!(x`b;(x`a`b`c`d`e);x`a)
};
...
time| 2 5 8 7 9
data| 2 5 8 7 9 2 5 8 7 9 2 5 8 7 9 2 5 8 7 9 2 5 8 7 9
id | 2 5 8 7 9
Which has the wrong format of data values.
My question here is how might one avert this conversion issue and derive the same field values whether the argument is a multi row or single row table.
Or otherwise what is the canonical implementation of this in kdb+/q
Thanks
Edit:
To clarify: my problem isn't necessarily with the data input as one could just apply enlist if it is only one row. My question pertains to how one might use enlist in the fn function to make single row input conform to the logic seen when using multi row tables. i.e. how to replace fn enlist input with fn data (how to make the function input agnostic) Thanks
Are you meaning to flip the data perpendicular to the rest of the table? Your 5 row example works because there are 5 rows and 5 columns. The single row doesn't work due to 1 row to 5 columns.
Correct me if I'm wrong but I think this is what you want:
fn:{([]time:x`b;data:flip x`a`b`c`d`e;id:x`a)};
--------------------------------------------------
t1:flip `a`b`c`d`e!(5#enlist til 5);
a b c d e
---------
0 0 0 0 0
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
fn[t1]
time data id
-----------------
0 0 0 0 0 0 0
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
--------------------------------------------------
t2:enlist `a`b`c`d`e!til 5;
a b c d e
---------
0 1 2 3 4
fn[t2]
time data id
-----------------
1 0 1 2 3 4 0
Note without the flip you get this:
([]time:t1`b;data:t1`a`b`c`d`e;id:t1`a)
time data id
-----------------
0 0 1 2 3 4 0
1 0 1 2 3 4 1
2 0 1 2 3 4 2
3 0 1 2 3 4 3
4 0 1 2 3 4 4
In this case the time is no longer in line with the data but it works because of 5 row and cols.
Edit - I can't think of a better way to convert a dictionary to a table when needed other than using count first in a conditional. Note if the first key is a nested list this wouldn't work
{ $[1 = count first x;enlist x;x] } `a`b`c`d`e!til 5
Note, your provided function doesn't work with this:
{
flip `time`data`id!(x`b;(x`a`b`c`d`e);x`a)
}{$[1 = count first x;enlist x;x]} `a`b`c`d`e!til 5

Remove rows containing duplicate elements

I have this example matrix:
1 2 4 5 1 3
2 3 5 6 3 4
1 2 3 4 5 6
3 2 4 6 1 5
...
I need to delete each row that contains duplicate elements. In this example I have to delete the first and second rows.
I know how to do this in a for-loop, but I don't want to use a for-loop.
Assuming A as the input matrix, you could do -
A(all(diff(sort(A,2),[],2),2),:)
Sample run -
>> A
A =
1 2 4 5 1 3
2 3 5 6 3 4
1 2 3 4 5 6
3 2 4 6 1 5
>> A(all(diff(sort(A,2),[],2),2),:)
ans =
1 2 3 4 5 6
3 2 4 6 1 5
Alternatively, if you don't mind some bsxFUN -
A(~any(sum(bsxfun(#eq,A,permute(A,[1 3 2])),2)>1,3),:)

convert For syntax to a faster way in Matlab

I have below code and I want to convert it to a faster way but I don't how I can convert For syntax to a faster way in Matlab.
If user count is 5 and item count is 2 and time count is 4, I want to create this matrix:
1 1 1
1 1 2
1 1 3
1 1 4
1 2 1
1 2 2
1 2 3
1 2 4
2 1 1
2 1 2
2 1 3
2 1 4
...
result=zeros(userCount*itemCount*timeCount,4);
j=0;
for i=1:userCount
result(j*itemCount*timeCount+1:j*itemCount*timeCount+itemCount*timeCount,1)=ones(itemCount*timeCount,1)*i;
j=j+1;
end
j=0;
h=1;
for i=1:userCount*itemCount
result(j*timeCount+1:j*timeCount+timeCount,2)=ones(timeCount,1)*(h);
j=j+1;
h=h+1;
if h>itemCount
h=1;
end
end
j=0;
for i=1:userCount*itemCount
result(j*timeCount+1:j*timeCount+timeCount,3)=1:timeCount;
j=j+1;
end
for i=1:size(subs,1)
f=(result(:,1)==subs(i,1)& result(:,2)==subs(i,2));
result(f,:)=[];
end
What you are describing is to enumerate permutations for three independent linear sets. One way to achieve this would be to use ndgrid and unroll each output into a single vector:
userCount = 5; itemCount = 2; timeCount = 4;
[X,Y,Z] = ndgrid(1:timeCount,1:itemCount,1:userCount);
result = [Z(:) Y(:) X(:)];
We get:
result =
1 1 1
1 1 2
1 1 3
1 1 4
1 2 1
1 2 2
1 2 3
1 2 4
2 1 1
2 1 2
2 1 3
2 1 4
2 2 1
2 2 2
2 2 3
2 2 4
3 1 1
3 1 2
3 1 3
3 1 4
3 2 1
3 2 2
3 2 3
3 2 4
4 1 1
4 1 2
4 1 3
4 1 4
4 2 1
4 2 2
4 2 3
4 2 4
5 1 1
5 1 2
5 1 3
5 1 4
5 2 1
5 2 2
5 2 3
5 2 4

sort multiple columns according to particular order

I have a matrix
A = 1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
I have 3 arrays containing the orders in which I separately want to sort the respective columns. Example I1 for column 1, I2 for column 2, ....
I1 = 5 I2 = 4 I3 = 3
4 3 2
3 2 1
2 1 5
1 5 4
After sorting the matrix A I should get:-
If only I1 is used to sort the 1st column
A = 5 1 1
4 2 2
3 3 3
2 4 4
1 5 5
If only I2 is used to sort the 2nd column
A = 1 4 1
2 3 2
3 2 3
4 1 4
5 5 5
If only I3 is used to sort the 3rd column
A = 1 1 3
2 2 2
3 3 1
4 4 5
5 5 4
If only I1,I2,I3 is used to sort the all columns
A = 5 4 3
4 3 2
3 2 1
2 1 5
1 5 4
Please suggest me how to do.
If their dimensions are all the same, this should be what you need:
A([I1 I2 I3]);
If you wish to sort columns individually, you can use this syntax:
A(:,2)=A(I2,2);
Or e.g. columns 2 and 3:
A(:,[2 3]) = [A(I2,2) A(I3,3)];

I want to group the columns of the matrix A which have the same value for the third line in Matlab

For a matrix A (4 rows, 1000 columns). I want to group the columns of the matrix A which have the same value for the third line. so I must have sub matrix with a third row that contains the same value.
for example:
if:
A =
1 4 5 2 2 2 2 1 1 5
1 4 5 4 4 2 2 4 5 2
3 3 3 3 4 1 3 5 3 4
4 5 5 5 4 1 5 5 5 5
then
A1 =
1 4 5 2 2 1
1 4 5 4 2 5
3 3 3 3 3 3
4 5 5 5 5 5
A2 =
2 5
4 2
4 4
4 5
A3 =
2
2
1
1
the result can be in the form of a cell.
here's one possible hack (warning: I haven't been able to check this):
A =
1 4 5 2 2 2 2 1 1 5
1 4 5 4 4 2 2 4 5 2
3 3 3 3 4 1 3 5 3 4
4 5 5 5 4 1 5 5 5 5
specialRow=3;
unqCols = unique(A(specialRow,:));
numUnq = length(unqCols);
sepMats{numUnq}=[];
for i=1:numUnq
sepMats{i} = A(:,A(specialRow,:)==unqCols(i));
end
In the example you shown, there are 4 unique elements in the 3rd row, so you should obtain 4 submatrices, but you only show 3 ?
Here is one way:
clear all;
%data
A = [1 4 5 2 2 2 2 1 1 5;
1 4 5 4 4 2 2 4 5 2;
3 3 3 3 4 1 3 5 3 4;
4 5 5 5 4 1 5 5 5 5
]
%engine
row = 3;
b = unique(A(row,:));
r = arrayfun(#(i) A(:,A(row,:)==b(i)),1:length(b), 'UniformOutput',false);
r{:}
You can make the assignment in a single line using ACCUMARRAY:
A = [1 4 5 2 2 2 2 1 1 5;
1 4 5 4 4 2 2 4 5 2;
3 3 3 3 4 1 3 5 3 4;
4 5 5 5 4 1 5 5 5 5
];
out = accumarray(A(3,:)', (1:size(A,2)), [], #(x){A(:,x)} );
With this, out{i} contains all columns of A where the third row of A equals i (and empty in case there is no valid column).
If you want out{i} to contain columns corresponding to the i-th smallest unique value in the third row of A, you can use GRP2IDX from the statistics toolbox first:
[idx,correspondingEntryInA] = grp2idx(A(3,:)'); %'#
out = accumarray(idx, (1:size(A,2)), [], #(x){A(:,x)} );
Here, out{i} contains the columns corresponding to correspondingEntryInA(i).