Add an index column that increases for every row - pyspark

ratings_test = test_data.map(lambda l: l.split()).map(lambda a :
Row(userId=int(a[0]),movieId=int(a[1]),index=i)).cache()
I want the index field to increase by one with every row.
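A minimal sketch of the idea in plain Python, so it runs without a Spark cluster: enumerate() attaches an increasing position to each parsed line. The function name parse_with_index and the sample lines are made up for illustration. In PySpark itself, RDD.zipWithIndex() plays the same role by turning each element into an (element, index) pair, so the chain would presumably look like test_data.map(lambda l: l.split()).zipWithIndex().map(lambda p: Row(userId=int(p[0][0]), movieId=int(p[0][1]), index=p[1])); that variant has not been run here.

```python
def parse_with_index(lines):
    """Split each line into fields and attach a 0-based, increasing index."""
    records = []
    for index, line in enumerate(lines):
        fields = line.split()
        records.append({
            "userId": int(fields[0]),
            "movieId": int(fields[1]),
            "index": index,  # grows by one for every line
        })
    return records


# Made-up sample data in the "userId movieId" layout the question implies.
sample_lines = ["1 10", "2 20", "3 30"]
records = parse_with_index(sample_lines)
print(records)
```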

Related

Remove formula from column with python

I am trying to remove the formula from a column in an existing sheet with Python.
I tried to set the formula to None using the column object (column.formula = None).
It does not work, and my column object remains unchanged. Does anyone have suggestions for solving this? Thank you!
This took me a bit to figure out, but it seems I've found a solution. It turns out that this is a 2-step process:
1. Update the column object to remove the formula (by setting column.formula to an empty string).
2. For each row in the sheet, update the cell within that column to remove the formula (set cell.value to an empty string and cell.formula to None).
Completing STEP 1 removes the formula from the column object, but that cell in each row will still contain the formula. That's why STEP 2 is needed: it removes the formula from the individual cell in each row.
Here's some example code in Python that does what I've described. (Be sure to update the id values to correspond to your sheet.)
STEP 1: Remove formula from the Column
column_spec = smartsheet.models.Column({
'formula': ''
})
# Update column
sheetId = 3932034054809476
columnId = 4793116511233924
result = smartsheet_client.Sheets.update_column(sheetId, columnId, column_spec)
STEP 2: Remove the formula from that cell in each row
Note: This sample code updates only one specific row. In your case, you'll need to update every row in the sheet. Build a row object for each row in the sheet (as shown below), then call smartsheet_client.Sheets.update_rows once, passing in the list of row objects that you've built. This way you call the API only once, which is the most efficient approach.
# Build new cell value
new_cell = smartsheet.models.Cell()
new_cell.column_id = 4793116511233924
new_cell.value = ''
new_cell.formula = None
# Build the row to update
row_to_update = smartsheet.models.Row()
row_to_update.id = 5225480965908356
row_to_update.cells.append(new_cell)
# Update row
sheetId = 3932034054809476
result = smartsheet_client.Sheets.update_rows(sheetId, [row_to_update])
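The note in STEP 2 about building one row object per row could be sketched as follows. This is only an illustration: StandInCell and StandInRow are hypothetical stand-ins that mimic the few attributes used above, so the block runs without the Smartsheet SDK or an API token, and build_rows_clearing_column is a made-up helper name.

```python
class StandInCell:
    """Hypothetical stand-in for smartsheet.models.Cell (only the fields used here)."""

    def __init__(self):
        self.column_id = None
        self.value = None
        self.formula = None


class StandInRow:
    """Hypothetical stand-in for smartsheet.models.Row (only the fields used here)."""

    def __init__(self):
        self.id = None
        self.cells = []


def build_rows_clearing_column(row_ids, column_id,
                               cell_factory=StandInCell, row_factory=StandInRow):
    """Build one row object per row id, each clearing the cell in column_id."""
    rows = []
    for row_id in row_ids:
        cell = cell_factory()
        cell.column_id = column_id
        cell.value = ''       # empty value, as in STEP 2
        cell.formula = None   # drop the formula from the cell
        row = row_factory()
        row.id = row_id
        row.cells.append(cell)
        rows.append(row)
    return rows


# Made-up row ids; the column id is the one used in the example above.
rows_to_update = build_rows_clearing_column([111, 222, 333], 4793116511233924)
print(len(rows_to_update))
```

With the real SDK, the idea would be to pass cell_factory=smartsheet.models.Cell and row_factory=smartsheet.models.Row, take the row ids from smartsheet_client.Sheets.get_sheet(sheetId).rows, and hand the resulting list to a single smartsheet_client.Sheets.update_rows call. That wiring is an assumption and has not been run against the API.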

OpenRefine: How can I offset values? (preceding row to the following row)

Let's suppose I have this list in OpenRefine:
A
B
C
Is there a way to move (offset values) B to A like the following?
A B
B C
With the cross() function and v3.5 of OpenRefine (currently in beta), you can access previous or following rows by not supplying the field name. You can achieve the same in v3.4 by creating an index column.
So, you can use cells.ColumnName.value +" "+ cross(row.index + 1, "", "")[0].cells.ColumnName.value to get the current row's value followed by a space and the value of the same cell in the next row.
Note that this takes the value from the row with the next higher index, which is not necessarily the next row shown in the display if you use sorting.
Regards, Antoine
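For readers who want to see the offset logic outside OpenRefine, here is a minimal Python sketch of the same pairing: each value is joined with the one from the next index. The function name offset_pairs is made up, and unlike the GREL expression it simply stops before the last value, which has no successor.

```python
def offset_pairs(values, separator=" "):
    """Join each value with the value at the next index."""
    return [current + separator + following
            for current, following in zip(values, values[1:])]


pairs = offset_pairs(["A", "B", "C"])
print(pairs)  # ['A B', 'B C']
```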

How to insert a structure within a structure

I have a 1x1 structure called imu_data. It has one field called txyzrxyz1, whose value is a 4877x7 double. I just want to "copy and paste" row 62 into row 63 (double up that row) so that the field becomes a 4878x7 array. I've tried the following, along with other variations, without success:
extra_63 = imu_data.txyzrxyz1(63,:);
imu_data2.txyzrxyz1 = [{imu_data.txyzrxyz1(1:62,:) extra_63 imu_data.txyzrxyz1(63:end,:)}]
Thanks
You can list the row to duplicate twice in the index vector when indexing the matrix:
row_to_duplicate = 63;
yourdata = rand(100,10);
yourstruct.data = yourdata;
yourstruct.data = yourstruct.data([1:row_to_duplicate, row_to_duplicate:end],:)
So in the case of 63, 1:row_to_duplicate creates a row vector of the indices 1 to 63, and row_to_duplicate:end creates a row vector of the indices 63 to 100 in this example. When these are concatenated, 63 occurs twice, hence that row is duplicated.
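The same index trick can be sketched in plain Python, with a list of rows standing in for the matrix. Note that Python indices are 0-based, unlike MATLAB's 1-based indices, and duplicate_row is a made-up helper name.

```python
def duplicate_row(matrix, row):
    """Return a copy of matrix (a list of rows) with the 0-based `row` repeated."""
    # The index list mentions `row` twice: once at the end of the first range
    # and once at the start of the second.
    indices = list(range(0, row + 1)) + list(range(row, len(matrix)))
    return [list(matrix[i]) for i in indices]


data = [[1, 2], [3, 4], [5, 6]]
doubled = duplicate_row(data, 1)
print(doubled)  # [[1, 2], [3, 4], [3, 4], [5, 6]]
```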
You were almost there. You only had to get rid of the {}'s and put the data in the right orientation, using ; instead of a space between matrix entries so that they are concatenated vertically instead of horizontally:
extra_63 = imu_data.txyzrxyz1(63,:);
imu_data2.txyzrxyz1 = [imu_data.txyzrxyz1(1:62,:); extra_63; imu_data.txyzrxyz1(63:end,:)]

Remove rows from a matrix

I have the array "A" with values:
101 101
0 0
61.6320000000000 0.725754779522671
73.7000000000000 0.830301150185882
78.2800000000000 0.490917508345341
81.2640000000000 0.602561200211232
82.6880000000000 0.435568593909153
And I wish to remove this first row and retain the shape of the array (2 columns), thus creating the array
0 0
61.6320000000000 0.725754779522671
73.7000000000000 0.830301150185882
78.2800000000000 0.490917508345341
81.2640000000000 0.602561200211232
82.6880000000000 0.435568593909153
I have used A = A(A~=101); , which removes the values as required - however it packs the array down to one column.
The best way is:
A = A(2:end, :)
But you can also do
A(1,:) = []
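However, it is slightly less efficient (see "Deleting matrix elements by = [] vs reassigning matrix").
If you are looking to delete rows whose first column equals a certain number, try
A = A(A(:,1)~=101,:)
Use all or any to choose whether a row is deleted when any column or when every column equals your value. With all, as below, a row is deleted if any of its columns equals 101; with any in its place, a row is deleted only if every column equals 101:
A = A(all(A~=101,2),:)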
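To make the all/any distinction concrete, here is a small Python sketch using a list of rows in place of the matrix. The helper names and the sample values are made up for illustration.

```python
def drop_rows_with_any(matrix, value):
    """Drop a row if ANY element equals value (keep rows where all differ)."""
    return [row for row in matrix if all(x != value for x in row)]


def drop_rows_with_all(matrix, value):
    """Drop a row only if EVERY element equals value (keep rows where any differs)."""
    return [row for row in matrix if any(x != value for x in row)]


# Made-up sample: first row is all 101, last row contains a single 101.
A = [[101, 101], [0, 0], [7, 101]]
print(drop_rows_with_any(A, 101))  # [[0, 0]]
print(drop_rows_with_all(A, 101))  # [[0, 0], [7, 101]]
```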

Removing rows from all columns based on values of one column

I have a 732x26 data matrix, and one of the columns contains unwanted values. I used
logicalIndex = FOMassFlow > MeanFOMassFlow;
FOMassFlow = FOMassFlow(FOMassFlow ~= 0)
to remove the unwanted values from that particular column. How can I remove the rows containing the unwanted values from the whole 732x26 data? (For example, if an unwanted value is found in column 5, row 6, I would like to remove the entire row 6 from the data of 732 rows by 26 columns.)
To remove an entire row from a matrix (row 6 in your case), simply do:
FOMassFlow(6,:) = [];
This will mutate FOMassFlow so that you have 731 rows with row 6 removed.
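When the unwanted rows are not known in advance, the same removal can be driven by a test on one column. The Python sketch below assumes, as the question's ~= 0 filter suggests, that the unwanted values are zeros; drop_rows_where and the sample data are made up for illustration.

```python
def drop_rows_where(data, column, is_unwanted):
    """Keep only the rows whose value in `column` (0-based) is not unwanted."""
    return [row for row in data if not is_unwanted(row[column])]


# Made-up 3x3 sample; the middle row has an unwanted 0 in column 1.
data = [[1, 5.0, 7], [2, 0.0, 8], [3, 6.5, 9]]
cleaned = drop_rows_where(data, 1, lambda v: v == 0)
print(cleaned)  # [[1, 5.0, 7], [3, 6.5, 9]]
```

Every column of a dropped row goes with it, so the remaining rows stay aligned across all columns.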