PySpark: transpose row values into columns and assign a value

I'm looking to transpose row values into column names, and I'm currently handling 300 million records.
The Value column only takes the static values 1, 2, 3.
My data looks like:
id description Value
5644555 field1 1
23783009 field2 2
2190345 field33 3
2190346 field345 1
2190347 field67 2
2190348 field2 3
2190347 field33 2
2190347 field345 2
23783009 field67 1
2190352 field68 1
Wherever a value is missing, the field value needs to be hard-coded to 0. I also don't know the maximum number of distinct description columns, as it fluctuates between 200 and 300 unique entries. I tried using pivot, but I'm not sure how to make it handle a large set of 300 million records. I'd really appreciate any help here.
load_df.groupBy("id").pivot("description").sum("Value").show()
The expected output looks like:
id field1 field2 field33 field345 field67 field68.........
5644555 1 0 0 0 0 0
23783009 0 2 0 0 1 0
2190345 0 0 3 0 0 0
2190346 0 0 0 1 0 0
2190347 0 0 2 2 2 0
2190348 0 3 0 0 0 0
2190352 0 0 0 0 0 1
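To make the intended result concrete, here is a minimal plain-Python sketch of a pivot with zero fill, run on the sample rows from the question. The helper name `pivot_sum` is hypothetical, and this only models the logic on a small in-memory list; it is not a solution for 300 million records. In PySpark itself, the usual counterpart would be to chain `.na.fill(0)` after the pivot, and passing the known list of descriptions as the second argument to `pivot` should spare Spark the extra pass it needs to discover the distinct values.

```python
from collections import defaultdict

# Sample rows copied from the question: (id, description, Value).
rows = [
    (5644555, "field1", 1),
    (23783009, "field2", 2),
    (2190345, "field33", 3),
    (2190346, "field345", 1),
    (2190347, "field67", 2),
    (2190348, "field2", 3),
    (2190347, "field33", 2),
    (2190347, "field345", 2),
    (23783009, "field67", 1),
    (2190352, "field68", 1),
]


def pivot_sum(rows, fill=0):
    """Group by id, turn each distinct description into a column, sum Value."""
    columns = sorted({desc for _, desc, _ in rows})
    sums = defaultdict(lambda: defaultdict(int))
    for row_id, desc, value in rows:
        sums[row_id][desc] += value
    # Any (id, description) pair that never occurred gets the fill value.
    table = {
        row_id: [cells.get(col, fill) for col in columns]
        for row_id, cells in sums.items()
    }
    return columns, table


columns, table = pivot_sum(rows)
print("id", *columns)
for row_id, values in table.items():
    print(row_id, *values)
```

Each id ends up with one value per description column, and combinations that never appear in the input come out as 0 rather than null.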

Related

Count of flag value 1 between two flag values 0s

I have a flag column containing the values 0 and 1.
I want to count the 1s between two 0s, restarting the count after each 0.
flag
0
0
1
1
0
1
1
1
Expected output
0
0
1
2
0
1
2
3
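The question does not name a language, so the following is only a plain-Python sketch of one way to read the expected output: a running counter that increases on every 1 and resets on every 0. The function name `running_ones` is made up for illustration.

```python
def running_ones(flags):
    """Return a running count of consecutive 1s that resets at every 0."""
    counts = []
    run = 0
    for flag in flags:
        # Extend the current run on a 1, restart it on a 0.
        run = run + 1 if flag == 1 else 0
        counts.append(run)
    return counts


result = running_ones([0, 0, 1, 1, 0, 1, 1, 1])
print(result)
```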

Finding the position of repeated values in a logical array

If I have a vector such as:
0 0 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 1 1 0 0 1 0 1 1 1 1 1 1 0 0 0
How do I find the position of the first occurrence of two consecutive 1s? I.e., the answer for the vector above would be 9.
Thanks!
Can't comment, so will give you a hint here: "Finite State Machines"
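One way to read the finite-state-machine hint, sketched in plain Python under the assumption that positions are 1-based as in the question: keep a two-valued state that remembers whether the previous element was a 1, and stop as soon as a 1 arrives while in that state. The state names and the function name `first_double_one` are illustrative only.

```python
def first_double_one(bits):
    """Return the 1-based position where the first pair of consecutive 1s starts."""
    state = "idle"  # "idle": previous bit was not 1; "seen_one": previous bit was 1
    for position, bit in enumerate(bits, start=1):
        if bit == 1:
            if state == "seen_one":
                # The pair started one position earlier.
                return position - 1
            state = "seen_one"
        else:
            state = "idle"
    return None  # no two consecutive 1s found


vector = [0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
          0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(first_double_one(vector))
```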

How to identify the same number in a matrix? [closed]

Closed 7 years ago.
I have a binary matrix, it looks like this:
A = [ 0 0 0 1 1 1 0 0 0 0 1 1 0 0 0 0 0;
1 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1; ]
But when I put A into a calculation, I can only use vector B, which is the sum of the rows of matrix A (one entry per column). B looks like this:
B=[ 1 1 0 1 1 1 0 1 1 0 1 1 0 0 1 1 1];
But I still want to keep track of which "1" comes from which row of matrix A. Is there any way to add extra information to vector B so that it still records which row of A each "1" comes from?
Assuming that A only contains 0 and 1 values,
[v, B] = max(A,[],1);
B(v==0) = 0;
gives
B =
2 2 0 1 1 1 0 2 2 0 1 1 0 0 2 3 3
If a column contains more than one 1, this gives the row index of the first one.
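For readers following along outside MATLAB, the same idea can be sketched in plain Python, assuming A contains only 0s and 1s: walk the columns and record the 1-based index of the first row holding a 1, or 0 when the column has none. The helper name `first_row_with_one` is hypothetical.

```python
# Matrix A from the question, one list per row.
A = [
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
]


def first_row_with_one(matrix):
    """For each column, return the 1-based row of its first 1, or 0 if none."""
    labels = []
    for column in zip(*matrix):  # zip(*matrix) iterates over columns
        labels.append(column.index(1) + 1 if 1 in column else 0)
    return labels


B = first_row_with_one(A)
print(B)
```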
This is @luis's idea; I'm just adding small changes. I still don't know whether this is what the OP wants.
I created a 3D matrix from Luis's solution, so that both the binary values and the row info are stored in B. If you want the binary values, access slice 1; if you want the row info, access slice 2.
[B(:,:,1), B(:,:,2)] = max(A);
B(1,~all(B,3),:) = 0;
>> B
B(:,:,1) =
1 1 0 1 1 1 0 1 1 0 1 1 0 0 1 1 1
B(:,:,2) =
2 2 0 1 1 1 0 2 2 0 1 1 0 0 2 3 3
If you want a specific binary value and its row index, for example the 8th binary value and its corresponding row index:
>> B(:,8,:)
ans(:,:,1) =
1
ans(:,:,2) =
2

is there a better way of assigning values to a matrix

Say I have a matrix
A=zeros(10,3);
and a vector
ll=[1 1 1 2 2 2 3 1 3 2]';
and in each row of A I want to set the column given by the corresponding entry of ll to 1,
i.e. the output would be
A= 1 0 0
1 0 0
1 0 0
0 1 0
0 1 0
0 1 0
0 0 1
1 0 0
0 0 1
0 1 0
The way I do it now is with a for loop:
for ii=1:length(ll)
A(ii,ll(ii))=1;
end
This should do the trick:
ll=[1 1 1 2 2 2 3 1 3 2]';
A=bsxfun(@eq,ll,1:max(ll))
I'm using bsxfun to check when the entry of ll is equal to an element of the row vector [1 2 3] (in this case). If the entry of ll is 1, it will be equal to the entry in the first column of the [1 2 3] vector and will give a 1 in the first column of A and zeros in the rest of the columns of that row.
Just convert to a linear index:
A((ll-1)*size(A,1) + (1:size(A,1)).') = 1;
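As a cross-check of what both answers compute, here is a plain-Python sketch of the same comparison idea: each entry of ll is compared against the column numbers 1 to max(ll), which gives a 1 in the matching column and 0 elsewhere. This is an illustration of the logic only, not a replacement for the MATLAB one-liners.

```python
ll = [1, 1, 1, 2, 2, 2, 3, 1, 3, 2]
width = max(ll)  # number of columns, 3 for this ll

# Row i has a 1 in column ll[i] (1-based) and 0 elsewhere.
A = [[1 if col == label else 0 for col in range(1, width + 1)] for label in ll]

for row in A:
    print(*row)
```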

Logical operations on matrices columns.

Let's say I have a matrix containing only 0s and 1s. How can I create a new matrix that is the result of a logical operation across the columns?
eg. A =
0 0 0 1 0
1 1 1 1 1
0 1 1 0 0
0 0 0 0 1
1 0 0 1 0
1 1 1 1 1
If all elements of a **row** are equal to 1, the result is 1; if not, 0.
(like AND operation)
Ans= 0
1
0
0
0
1
Thanks!
To solve your problem this would work -
all(A,2)
If you were looking to set elements based on the columnwise data in A, you would do this -
all(A,1)
The documentation for all has more info.
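The difference between the two calls can be sketched in plain Python with the built-in `all`, using the matrix from the question. The variable names `row_all` and `col_all` are made up; `row_all` plays the role of all(A,2) and `col_all` the role of all(A,1).

```python
# Matrix A from the question, one list per row.
A = [
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
]

# AND across each row: 1 only if every element in that row is 1.
row_all = [int(all(row)) for row in A]

# AND down each column: 1 only if every element in that column is 1.
col_all = [int(all(col)) for col in zip(*A)]

print(row_all)
print(col_all)
```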