OpenRefine: How can I offset values? (preceding row to the following row)

Let's suppose I have this list in OpenRefine:
A
B
C
Is there a way to offset the values so that each row also gets the value from the following row, like this?
A B
B C

With the cross() function in OpenRefine v3.5 (currently in beta), you can access previous or following rows by not supplying the field name. In v3.4 you can achieve the same by first creating an index column.
For example, cells.ColumnName.value + " " + cross(row.index + 1, "", "")[0].cells.ColumnName.value returns the current row's value, followed by a space and the value of the same column in the next row.
Note that this takes the value from the row whose index is one higher, which is not necessarily the next row in the display if you have sorted the data.
Regards, Antoine
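The cross(row.index + 1, ...) expression boils down to "pair each row with the row at the next index". A minimal plain-Python sketch of that pairing, assuming the column is a list of strings in index order (join_with_next is a hypothetical helper, not an OpenRefine API):

```python
from typing import List


def join_with_next(values: List[str], sep: str = " ") -> List[str]:
    """Concatenate each value with the value at the next index (row.index + 1).

    The last value has no successor, so this sketch emits no entry for it.
    """
    # zip(values, values[1:]) pairs index i with index i + 1.
    return [current + sep + following for current, following in zip(values, values[1:])]


print(join_with_next(["A", "B", "C"]))  # ['A B', 'B C']
```

As with the OpenRefine expression, the pairing follows index order, not whatever order a sort shows on screen.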

Related

Is there a function in R to create a new column in a tibble that depends on values from a previous row?

First time poster and quite new to R.
I'm trying to add a new variable to a tibble ("joined") that takes the value of column 22 ("NurseID") from the previous row (i-1) whenever the value of column 3 ("AccountID") in row i matches the one in row i-1.
I can do it with a sorted loop, but this is a large dataset, it takes a long time to run, and I wonder if there is a faster or easier way to do this:
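library(dplyr)  # provides arrange() and re-exports as_tibble()
joined <- arrange(joined, AccountID, date_day, shift)  # arrange() returns a sorted copy, so assign it
tie <- NA  # the first row has no previous row
for (i in 2:nrow(joined))
{
  # if AccountID (column 3) equals the previous row's, take the previous NurseID (column 22)
  if (joined[[3]][i] == joined[[3]][i-1]) temp <- joined[[22]][i-1] else temp <- NA
  tie <- c(tie, temp)
}
temptie <- as.numeric(tie)
joined <- as_tibble(cbind(joined, temptie))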
Any help or input is much appreciated. Please let me know if you need more information about the tibble.
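One likely reason the loop is slow is that tie <- c(tie, temp) copies the whole vector on every iteration; computing the shifted column in a single pass avoids that (in R, dplyr's lag() expresses the shift without an explicit loop). The logic itself is: after sorting, look at the previous row and copy its NurseID only when its AccountID matches. A minimal Python sketch of that logic, assuming the rows are dicts already sorted by AccountID, date_day and shift; the helper name and sample rows are hypothetical:

```python
from typing import Any, Dict, List, Optional


def previous_if_same_key(rows: List[Dict[str, Any]], key: str, field: str) -> List[Optional[Any]]:
    """For each row, return the previous row's `field` if both rows share `key`, else None.

    Assumes rows with the same key are adjacent (the job arrange() does in R).
    """
    if not rows:
        return []
    result: List[Optional[Any]] = [None]  # the first row has no predecessor
    for prev, cur in zip(rows, rows[1:]):
        result.append(prev[field] if prev[key] == cur[key] else None)
    return result


rows = [
    {"AccountID": 1, "NurseID": 10},
    {"AccountID": 1, "NurseID": 11},
    {"AccountID": 2, "NurseID": 12},
    {"AccountID": 2, "NurseID": 13},
]
print(previous_if_same_key(rows, "AccountID", "NurseID"))  # [None, 10, None, 12]
```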

Postgres: How to increment the index (pointer) to access other rows

I have been trying to understand how to increment the reference to some value.
In C I would simply increment the pointer to retrieve the value in the next array location.
How does this mechanism work in Postgres? Is it possible?
For an example, I have created a table with some data in:
create table mathtest (
x int, y int, val int);
insert into mathtest (x,y,val)
values (1,1,10),(2,2,20),(3,3,30),(4,4,40),(5,5,50),(6,6,60),(7,7,70),(8,8,80),(9,9,90),(10,10,100),(11,11,110);
What I want to do is add the val of the current row to the val of the row whose x equals the current x plus 2, and to the val of the row whose x equals the current x plus 4. I realise that I can't assume the rows will be retrieved in a set order, so I can't use 'lead'.
If it was C I would simply increment the pointer.
The output should only include rows where x and y are divisible by certain divisors (this bit works):
select
x base,
(x+2) plus1x,
(x+4) plus2x,
y,
val
from mathtest
where x%2 =0 and y%3 = 0
This outputs the following:
base  plus1x  plus2x  y  val
6     8       10      6  60
The output I would like is:
60 + 80 + 100 = 240
I can't conceptualise how to do it. My mind seems to be stuck in procedural C mode!
Everything I try results in an error.
Can anybody help me get over this hurdle?
Welcome to the world of window functions.
You need an explicit ordering; otherwise it makes no sense to speak of the "previous row".
As a simple example, to get the difference from the previous value (in order of x), you can query:
SELECT val -
lag(val) OVER (ORDER BY x)
FROM mathtest;
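Window functions such as lead(val, 2) OVER (ORDER BY x) count rows, not x values, so they only reach x + 2 when there are no gaps in x. One alternative (swapped in here, not taken from the answer) is a self-join that matches x + 2 and x + 4 explicitly, so row order never matters. A sketch using Python's built-in sqlite3 as a stand-in for Postgres, assuming the same plain joins and % operator behave identically there:

```python
import sqlite3
from typing import List, Tuple


def sums_at_offsets(conn: sqlite3.Connection) -> List[Tuple[int, int]]:
    """Return (x, val(x) + val(x+2) + val(x+4)) for rows passing the modulo filter."""
    query = """
        SELECT a.x, a.val + b.val + c.val
        FROM mathtest AS a
        JOIN mathtest AS b ON b.x = a.x + 2   -- row two x-steps ahead
        JOIN mathtest AS c ON c.x = a.x + 4   -- row four x-steps ahead
        WHERE a.x % 2 = 0 AND a.y % 3 = 0
        ORDER BY a.x
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mathtest (x INT, y INT, val INT)")
conn.executemany(
    "INSERT INTO mathtest (x, y, val) VALUES (?, ?, ?)",
    [(i, i, 10 * i) for i in range(1, 12)],  # same data as the question
)
print(sums_at_offsets(conn))  # [(6, 240)]
```

The inner joins drop any base row whose x + 2 or x + 4 partner is missing; a LEFT JOIN with COALESCE(b.val, 0) would keep such rows with partial sums instead.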

How to add values to last column of a table based on certain conditions in MATLAB?

I have a 29736 x 6 table named table_fault_test_data. Its 6 columns are named wind_direction, wind_speed, air_temperature, air_pressure, density_hubheight and Fault_Condition, respectively. I want to label the data in Fault_Condition (the last column of the table) with either a 1 or a 0, depending on the values in the other columns.
I would like to do the following checks (for example):
If the wind_direction value (column 1) is below 0.0040 or above 359.9940, label the 6th-column entry of that row as 1; otherwise label it as 0.
Do this for the entire table. Similarly, do this check for the other columns, like air_temperature, air_pressure and so on. I know that if-else will be used for these checks, but I am really confused about how to do this for the whole table and add the corresponding value to the 6th column (maybe using a loop or something).
Any help in this regard would be highly appreciated. Many thanks!
EDIT:
Further clarification: I have a 29736 x 6 table named table_fault_test_data. I want to set the values of the 6th column of the table based on conditions as follows:
for i = 1:29736 % Iterating over the whole table row by row
  if (1st column value < x | 1st column value > y)
    % Set the corresponding element of the 6th column, i.e. table_fault_test_data(i,6), to 0
  elseif (2nd column value < x | 2nd column value > y)
    % Set the corresponding element of the 6th column, i.e. table_fault_test_data(i,6), to 0
  elseif ... do this for the other cases as well
  else
    % Set the corresponding element of the 6th column, i.e. table_fault_test_data(i,6), to 1
  end
end
This is the essence of my requirements. I hope this helps in understanding the question better.
You can use logical indexing, which is also supported for tables (for loops should be avoided where possible). For example, suppose you want to implement the first condition, that your x and y are known, and that your table is called t:
logicalIndicesFirstCondition = t{:,1} < x | t{:,1} > y;
You can then refer to the rows that satisfy this condition using logical indexing (see the MATLAB documentation on logical indexing).
For example:
t{logicalIndicesFirstCondition, 6} = t{logicalIndicesFirstCondition, 6} + 1.0;
This adds 1.0 to the 6th column for the rows where the logical condition is true.
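The same mask-then-combine idea, sketched in plain Python for readers without MATLAB: build one boolean mask per column (like t{:,j} < x | t{:,j} > y), OR the masks together, then label each row. The bounds and sample rows are hypothetical, and the 0/1 convention follows the pseudocode in the question (0 when any checked column is out of range, 1 otherwise):

```python
from typing import List, Sequence, Tuple


def fault_labels(rows: Sequence[Sequence[float]],
                 bounds: Sequence[Tuple[float, float]]) -> List[int]:
    """Label each row 0 if any checked column is outside its (low, high) range, else 1.

    bounds[j] is the allowed range for column j; columns without bounds are ignored.
    """
    faulty = [False] * len(rows)
    for j, (low, high) in enumerate(bounds):
        # Per-column mask, analogous to t{:,j} < low | t{:,j} > high in MATLAB.
        mask = [row[j] < low or row[j] > high for row in rows]
        # Combine with the masks of the earlier columns (logical OR).
        faulty = [f or m for f, m in zip(faulty, mask)]
    return [0 if f else 1 for f in faulty]


rows = [(10.0, 5.0), (400.0, 5.0), (10.0, -1.0)]
bounds = [(0.0, 360.0), (0.0, 30.0)]  # hypothetical ranges for columns 1 and 2
print(fault_labels(rows, bounds))  # [1, 0, 0]
```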

Count unique values in list of sub-lists

I have an RDD of the following structure (RDD[(String,Map[String,List[Product with Serializable]])]):
This is a sample data:
(600,Map(base_data -> List((10:00 01-08-2016,600,111,1,1), (10:15 01-08-2016,615,111,1,5)), additional_data -> List((1,2)))
(601,Map(base_data -> List((10:01 01-08-2016,600,111,1,2), (10:02 01-08-2016,619,111,1,2), (10:01 01-08-2016,600,111,1,4)), additional_data -> List((5,6)))
I want to calculate the number of unique values of the 5th field (zero-based index 4) in the sub-lists.
For instance, take the first entry. The list is List((10:00 01-08-2016,600,111,1,1), (10:15 01-08-2016,615,111,1,5)). It contains 2 unique values (1 and 5) in the 5th field of its sub-lists.
The second entry also contains 2 unique values (2 and 4), because 2 appears twice.
The resulting RDD should be of the format RDD[Map[String,Any]].
I tried to solve this task as follows:
val result = myRDD.map({
line => Map(("id",line._1),
("unique_count",line._2.get("base_data").groupBy(l => l).count(_))))
})
However, this code does not do what I need. In fact, I don't know how to properly indicate that I want to group by that field...
You are quite close to the solution. There is no need to call groupBy: you can access the tuple elements by index with productElement (which is zero-based, so productElement(4) is the 5th field), transform the resulting List into a Set, and return the size of the Set, which is the number of unique elements:
("unique_count", line._2("base_data").map(bd => bd.productElement(4)).toSet.size)
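For readers without a Spark setup, here is the same per-record logic in plain Python, with each RDD element modelled as an (id, dict) pair and the Scala tuples as Python tuples (an assumption about how the data would look outside Spark). Index 4 is the 5th field, matching productElement(4):

```python
from typing import Any, Dict, List, Tuple

Record = Tuple[str, Dict[str, List[Tuple[Any, ...]]]]


def unique_counts(records: List[Record]) -> List[Dict[str, Any]]:
    """Map each (id, data) record to {"id": id, "unique_count": n}.

    n is the number of distinct values at index 4 of the base_data tuples.
    """
    return [
        {"id": rec_id, "unique_count": len({t[4] for t in data["base_data"]})}
        for rec_id, data in records
    ]


records: List[Record] = [
    ("600", {"base_data": [("10:00 01-08-2016", 600, 111, 1, 1),
                           ("10:15 01-08-2016", 615, 111, 1, 5)],
             "additional_data": [(1, 2)]}),
    ("601", {"base_data": [("10:01 01-08-2016", 600, 111, 1, 2),
                           ("10:02 01-08-2016", 619, 111, 1, 2),
                           ("10:01 01-08-2016", 600, 111, 1, 4)],
             "additional_data": [(5, 6)]}),
]
print(unique_counts(records))
# [{'id': '600', 'unique_count': 2}, {'id': '601', 'unique_count': 2}]
```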

Extracting rows from .mat table using for loop in MATLAB

I have a variable X that holds a numeric table (matrix) of 9 columns and around 100 rows. Here is an example:
X =
Columns 1 through 7
-2.2869 -1.1168 0.1430 -4.0753 1.7620 -6.3229 -3.1997
-2.2504 -1.1022 0.2046 -3.9865 1.7423 -6.2172 -3.1231
-2.2138 -1.0876 0.2663 -3.8977 1.7226 -6.1115 -3.0465
-2.1772 -1.0730 0.3279 -3.8089 1.7029 -6.0058 -2.9700
I need to create a for loop that extracts the first r rows of the first p columns, for example r = 3 and p = 4.
Any idea on how I can do that?
I suggest you don't use a for-loop, but rather index directly into the matrix:
out = X(1:r,1:p)
returns the first r rows and p columns of X.
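MATLAB ranges are 1-based and inclusive, so X(1:r,1:p) covers rows 1 to r and columns 1 to p. With 0-based, half-open Python slicing the same block is rows [:r] and columns [:p]. A plain-list sketch using the sample rows of X shown above (top_left is a hypothetical helper):

```python
from typing import List, Sequence


def top_left(matrix: Sequence[Sequence[float]], r: int, p: int) -> List[List[float]]:
    """Return the first r rows and first p columns, like MATLAB's X(1:r, 1:p)."""
    return [list(row[:p]) for row in matrix[:r]]


# The rows of X shown in the question (only columns 1 through 7 were displayed).
X = [
    [-2.2869, -1.1168, 0.1430, -4.0753, 1.7620, -6.3229, -3.1997],
    [-2.2504, -1.1022, 0.2046, -3.9865, 1.7423, -6.2172, -3.1231],
    [-2.2138, -1.0876, 0.2663, -3.8977, 1.7226, -6.1115, -3.0465],
    [-2.1772, -1.0730, 0.3279, -3.8089, 1.7029, -6.0058, -2.9700],
]
out = top_left(X, r=3, p=4)  # 3 rows x 4 columns
```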