Is there a function in R to create a new column in a tibble that depends on values from a previous row? - match

First time poster and quite new to R.
I'm trying to add a new variable to a tibble ("joined") that adds value nrow-1 from column 22 ("NurseID"), if the value of the variable in column 3("AccountID") on nrow matches the one on nrow-1.
I can do it with a sorted loop, but this is a large dataset and it takes a long time to run and I wonder if there is a faster/easier way to do this
arrange (joined, AccountID, date_day, shift)
tie <- "."
for (i in 2:nrow(joined))
{
ifelse (joined[i,3]==joined[i-1,3], temp<-joined[i-1,22], temp<-".")
tie <- c(tie,temp)
}
temptie <- as.numeric(tie)
joined <- as_tibble(cbind(joined,temptie))
Any help / input is much appreciated. Please kindly let me know if you need more information on the tibble

Related

OpenRefine: How can I offset values? (preceding row to the following row)

Let's suppose I have this list in OpenRefine:
A
B
C
Is there a way to move (offset values) B to A like the following?
A B
B C
With the cross() function, and v3.5 of OpenRefine (currently in beta) you can access previous or following rows by not supplying the field name. You can achieve the same by creating an index column in v3.4.
So, you can do cells.ColumnName.value +" "+ cross(row.index + 1, "", "")[0].cells.ColumnName.value to get the value of the next row appending the value of that cell in the current row, with a space.
Note that this will take the value of the row with an index higher, not necessally the row following in the display, if you use sorting.
Regards, Antoine

Postgres: How to increment the index (pointer) to access other rows

I have been trying to understand how to increment the reference to some value.
In C I would simply increment the pointer to retrieve a value in the next array location.
How does this mechanism work in Postgres? is it possible?
For an example, I have created a table with some data in:
create table mathtest (
x int, y int, val int)
insert into mathtest (x,y,val)
values (1,1,10),(2,2,20),(3,3,30),(4,4,40),(5,5,50),(6,6,60),(7,7,70),(8,8,80),(9,9,90),(10,10,100),(11,11,110)
What I want to do is add the val value from the current row and then the val value when the x value in the row equals the current x value plus 2, and then plus 4. I realise that I can't assume the next row that is retrieved will be in a set order so I can't use 'lead'
If it was C I would simply increment the pointer.
The data output needs to be when the modulo of x and y = 0 for certain divisors. (this bit works)
select
x base,
(x+2) plus1x,
(x+4) plus2x,
y,
val
from mathtest
where x%2 =0 and y%3 = 0
This outputs the following:
base plus1x plus2x y val
1 6 8 10 6 60
The output I would like is:
60 + 80 +100 = 240
I can't conceptualise how to do it. My mind seems to be stuck in procedural C mode!
Whatever I type and try is an error.
Can any body help me to get over this hurdle?
Welcome to the world of window functions.
You need an explicit ordering, otherwise it makes no sense to speak of the "previous row".
As a simple example, to get the difference to the previous value, you can query like
SELECT val -
lag(val) OVER (ORDER BY x)
FROM mathtest;

partial Distance Based RDA - Centroids vanished from Plot

I am trying to fir a partial db-RDA with field.ID to correct for the repeated measurements character of the samples. However including Condition(field.ID) leads to Disappearance of the centroids of the main factor of interest from the plot (left plot below).
The Design: 12 fields have been sampled for species data in two consecutive years, repeatedly. Additionally every year 3 samples from reference fields have been sampled. These three fields have been changed in the second year, due to unavailability of the former fields.
Additionally some environmental variables have been sampled (Nitrogen, Soil moisture, Temperature). Every field has an identifier (field.ID).
Using field.ID as Condition seem to erroneously remove the F1 factor. However using Sampling campaign (SC) as Condition does not. Is the latter the rigth way to correct for repeated measurments in partial db-RDA??
set.seed(1234)
df.exp <- data.frame(field.ID = factor(c(1:12,13,14,15,1:12,16,17,18)),
SC = factor(rep(c(1,2), each=15)),
F1 = factor(rep(rep(c("A","B","C","D","E"),each=3),2)),
Nitrogen = rnorm(30,mean=0.16, sd=0.07),
Temp = rnorm(30,mean=13.5, sd=3.9),
Moist = rnorm(30,mean=19.4, sd=5.8))
df.rsp <- data.frame(Spec1 = rpois(30, 5),
Spec2 = rpois(30,1),
Spec3 = rpois(30,4.5),
Spec4 = rpois(30,3),
Spec5 = rpois(30,7),
Spec6 = rpois(30,7),
Spec7 = rpois(30,5))
data=cbind(df.exp, df.rsp)
dbRDA <- capscale(df.rsp ~ F1 + Nitrogen + Temp + Moist + Condition(SC), df.exp); ordiplot(dbRDA)
dbRDA <- capscale(df.rsp ~ F1 + Nitrogen + Temp + Moist + Condition(field.ID), df.exp); ordiplot(dbRDA)
You partial out variation due to ID and then you try to explain variable aliased to this ID, but it was already partialled out. The key line in the printed output was this:
Some constraints were aliased because they were collinear (redundant)
And indeed, when you ask for details, you get
> alias(dbRDA, names=TRUE)
[1] "F1B" "F1C" "F1D" "F1E"
The F1? variables were constant within ID which already was partialled out, and nothing was left to explain.

Consolidating a data table in Scala

I am working on a small data analysis tool, and practicing/learning Scala in the process. However I got stuck at a small problem.
Assume data of type:
X Gr1 x_11 ... x_1n
X Gr2 x_21 ... x_2n
..
X GrK x_k1 ... x_kn
Y Gr1 y_11 ... y_1n
Y Gr3 y_31 ... y_3n
..
Y Gr(K-1) ...
Here I have entries (X,Y...) that may or may not exist in up to K groups, with a series of values for each group. What I want to do is pretty simple (in theory), I would like to consolidate the rows that belong to the same "entity" in different groups. so instead of multiple lines that start with X, I want to have one row with all values from x_11 to x_kn in columns.
What makes things complicated however is that not all entities exist in all groups. So wherever there's "missing data" I would like to pad with for instance zeroes, or some string that denotes a missing value. So if I have (X,Y,Z) in up to 3 groups, the type I table I want to have is as follows:
X x_11 x_12 x_21 x_22 x_31 x_32
Y y_11 y_12 N/A N/A y_31 y_32
Z N/A N/A z_21 z_22 N/A N/A
I have been stuck trying to figure this out, is there a smart way to use List functions to solve this?
I wrote this simple loop:
for {
(id, hitlist) <- hits.groupBy(_.acc)
h <- hitlist
} println(id + "\t" + h.sampleId + "\t" + h.ratios.mkString("\t"))
to able to generate the tables that look like the example above. Note that, my original data is of a different format and layout,but that has little to do with the problem at hand, thus I have skipped all steps regarding parsing. I should be able to use groupBy in a better way that actually solves this for me, but I can't seem to get there.
Then I modified my loop mapping the hits to ratios and appending them to one another:
for ((id, hitlist) <- hits.groupBy(_.acc)){
val l = hitlist.map(_.ratios).foldRight(List[Double]()){
(l1: List[Double], l2: List[Double]) => l1 ::: l2
}
println(id + "\t" + l.mkString("\t"))
//println(id + "\t" + h.sampleId + "\t" + h.ratios.mkString("\t"))
}
That gets me one step closer but still no cigar! Instead of a fully padded "matrix" I get a jagged table. Taking the example above:
X x_11 x_12 x_21 x_22 x_31 x_32
Y y_11 y_12 y_31 y_32
Z z_21 z_22
Any ideas as to how I can pad the table so that values from respective groups are aligned with one another? I should be able to use _.sampleId, which holds the "group membersip" for each "hit", but I am not sure how exactly. ´hits´ is a List of type Hit which is practically a wrapper for each row, giving convenience methods for getting individual values, so essentially a tuple which have "named indices" (such as .acc, .sampleId..)
(I would like to solve this problem without hardcoding the number of groups, as it might change from case to case)
Thanks!
This is a bit of a contrived example, but I think you can see where this is going:
case class Hit(acc:String, subAcc:String, value:Int)
val hits = List(Hit("X", "x_11", 1), Hit("X", "x_21", 2), Hit("X", "x_31", 3))
val kMax = 4
val nMax = 2
for {
(id, hitlist) <- hits.groupBy(_.acc)
k <- 1 to kMax
n <- 1 to nMax
} yield {
val subId = "x_%s%s".format(k, n)
val row = hitlist.find(h => h.subAcc == subId).getOrElse(Hit(id, subId, 0))
println(row)
}
//Prints
Hit(X,x_11,1)
Hit(X,x_12,0)
Hit(X,x_21,2)
Hit(X,x_22,0)
Hit(X,x_31,3)
Hit(X,x_32,0)
Hit(X,x_41,0)
Hit(X,x_42,0)
If you provide more information on your hits lists then we could probably come with something a little more accurate.
I have managed to solve this problem with the following code, I am putting it here as an answer in case someone else runs into a similar problem and requires some help. The use of find() from Noah's answer was definitely very useful, so do give him a +1 in case this code snippet helps you out.
val samples = hits.groupBy(_.sampleId).keys.toList.sorted
for ((id, hitlist) <- hits.groupBy(_.acc)) {
val ratios =
for (sample <- samples)
yield hitlist.find(h => h.sampleId == sample).map(_.ratios)
.getOrElse(List(Double.NaN, Double.NaN, Double.NaN, Double.NaN, Double.NaN, Double.NaN))
println(id + "\t" + ratios.flatten.mkString("\t"))
}
I figure it's not a very elegant or efficient solution, as I have two calls to groupBy and I would be interested to see better solutions to this problem.

sum two values from different datasets using lookups in report builder

I have a report that should read values from 2 dataset by Currency:
Dataset1: Production Total
Dataset2: Net Total
Ive tried to use:
Lookup(Fields!Currency_Type.Value,
Fields!Currency_Type1.Value,
Fields!Gross_Premium_Amount.Value,
"DataSet2")
This returns only the first amount from dataset 2.
I've tried Lookupset function as well but it didn't SUM the retrieved values.
Any help would be appreciated.
Thanks Jamie for the reply.
THis is what i have done and it worked perfect:
From Report Properties--> Code , write the below function:
Function SumLookup(ByVal items As Object()) As Decimal
If items Is Nothing Then
Return Nothing
End If
Dim suma As Decimal = New Decimal()
Dim ct as Integer = New Integer()
suma = 0
ct = 0
For Each item As Object In items
suma += Convert.ToDecimal(item)
Next
If (ct = 0) Then return 0 else return suma
End Function
Then you can call the function:
code.SumLookup(LookupSet(Fields!Currency_Type.Value, Fields!Currency_Type1.Value,Fields!Gross_Premium_Amount.Value, "DataSet2"))
Yes, Lookup will only return the first matching value. Three options come to mind:
Change your query, so that you only need to get one value: use a GROUP BY and SUM(...) to combine your two rows in the query. If you are using this query other places, then make a copy and change that.
Is there some difference in the rows? Such as one is for last year and one is for this year? If so, create an artificial lookup key and lookup the two values separately:
=Lookup(Fields!Currency_Type.Value & ","
& YEAR(DATEADD(DateInterval.Year,-1,today())),
Fields!Currency_Type1.Value & ","
& Fields!Year.Value,
Fields!Gross_Premium_Amount.Value,
"DataSet2")
+
Lookup(Fields!Currency_Type.Value & ","
& YEAR(today()),
Fields!Currency_Type1.Value & ","
& Fields!Year.Value,
Fields!Gross_Premium_Amount.Value,
"DataSet2")
Use the LookupSet function as mentioned. With this you'll get a collection of the values back, and then need to add those together. The easiest way to do this is with embedded code in the report. Add this function to the report's code:
Function AddList(ByVal items As Object()) As Double
If items Is Nothing Then
Return 0
End If
Dim Total as Double
Total = 0
For Each item As Object In items
Total = Total + CDbl(item)
Next
Return Total
End Function
Now call that with:
=Code.AddList(LookupSet(Fields!Currency_Type.Value,
Fields!Currency_Type1.Value,
Fields!Gross_Premium_Amount.Value,
"DataSet2"))
(Note: this code was not tested. I just composed it in the Stack Overflow edit window & I'm no fan of VB. But it should give you a good idea of what to do.)