Error in seq.default(1, length(cts_splt), by = 2) : wrong sign in 'by' argument - merge

I am trying to merge two data sets, and to do that I am trying to cut out some NA data with the following code:
vedba <- read.csv(vedba_in)
head(vedba)
vedba$Start <- as.POSIXct(strptime(vedba$Midway,format="%d/%m/%Y %H:%M:%S"),tz="GMT")
head(vedba)
##cut data
cts <- cut(as.numeric(vedba$Start),breaks=c(breaks[1] - 3600,breaks))
## which dive does each vedba event belong to
cts_splt <- strsplit(as.character(cts)[!is.na(cts)],split=",")
cts_splt <- unlist(cts_splt)
cts_splt <- cts_splt[seq(1,length(cts_splt), by=2)]
substring(cts_splt,1) <- "0"
cts_splt <- as.numeric(cts_splt)
dive_no <- match(cts_splt,as.numeric(begin))
Yet when I run it, I receive the following error:
Error in seq.default(1, length(cts_splt), by = 2) :
wrong sign in 'by' argument
I am stumped and can't fix it. I have used this seq() call before without getting the error, so something must be wrong with my data set. Any clues?
I have uploaded an image of what my data looks like.
[Image: my vedba_in data]
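Here is one likely cause, sketched under assumptions. If every element of cts is NA, cts_splt ends up empty, length(cts_splt) is 0, and seq(1, 0, by = 2) cannot count up from 1 to 0 with a positive step. That can happen when the Midway strings do not match "%d/%m/%Y %H:%M:%S", or when no timestamp falls inside breaks. The timestamps and breaks below are hypothetical, not the real vedba_in data.

```r
# Hypothetical timestamps in ISO order, which do NOT match "%d/%m/%Y %H:%M:%S"
midway <- c("2015-03-01 10:00:00", "2015-03-01 10:05:00")
start_time <- as.POSIXct(strptime(midway, format = "%d/%m/%Y %H:%M:%S"), tz = "GMT")
print(all(is.na(start_time)))  # every parse failed

# Hypothetical breaks; with all-NA input, cut() returns only NA
cts <- cut(as.numeric(start_time), breaks = c(0, 1, 2))

# Same splitting steps as in the question
cts_splt <- unlist(strsplit(as.character(cts)[!is.na(cts)], split = ","))
print(length(cts_splt))

# seq(1, 0, by = 2) would have to count up from 1 to 0, so seq() stops
msg <- tryCatch(seq(1, length(cts_splt), by = 2),
                error = function(e) conditionMessage(e))
print(msg)

# Recycling a logical index keeps every other element and is safe for length 0
odd_parts <- cts_splt[c(TRUE, FALSE)]
print(length(odd_parts))
```

With non-empty input, cts_splt[c(TRUE, FALSE)] picks elements 1, 3, 5, ..., just like seq(1, length(cts_splt), by = 2). It only avoids the crash, though. If cts is all NA, it is worth checking the format string and breaks before running match() against begin.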

Related

Error in `[.data.frame`(predict.exercise, , i) : undefined columns selected

Hi, I am trying to run this code and am getting this error.
Code:
for(i in variables) prediction.exercise[, paste0(i,"_lag")] <- shift(prediction.exercise[,i],n=1,type="lag")
Error:
Error in `[.data.frame`(prediction.exercise, , i) :
undefined columns selected
Here prediction.exercise is the dataset. I am trying to create a lag of each variable in the data set.
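This base-R sketch shows the usual cause of this error, assuming it comes from a name in variables that is not a column of prediction.exercise. Indexing with df[, i] stops with "undefined columns selected" as soon as i is missing. The data frame and variables below are hypothetical. The expression c(NA, head(x, -1)) stands in for shift(x, n = 1, type = "lag") from data.table, so the block runs without that package.

```r
# Hypothetical data; "volume" is deliberately not a column
prediction.exercise <- data.frame(sales = c(10, 12, 15), price = c(1, 2, 3))
variables <- c("sales", "price", "volume")

# Reproduce the error for the missing column
err <- tryCatch(prediction.exercise[, "volume"],
                error = function(e) conditionMessage(e))
print(err)

# List the names that are not columns (typos, capitalisation, stray spaces)
missing_cols <- setdiff(variables, names(prediction.exercise))
print(missing_cols)

# Lag only the columns that exist; base-R stand-in for data.table::shift(type = "lag")
for (i in intersect(variables, names(prediction.exercise))) {
  x <- prediction.exercise[, i]
  prediction.exercise[, paste0(i, "_lag")] <- c(NA, head(x, -1))
}
print(prediction.exercise)
```

If missing_cols is not empty, the fix is often in how variables was built rather than in the loop itself.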

R isn't recognizing the date field I have sent as an input to Rstudio in .csv format

[This is my sample data.]
What I am trying to achieve is forecasting in R (via RStudio), with dates read from a CSV input.
When I try to change the data type of the date field in my input using as.Date(my_date_field, "%Y-%m-%d"), class(my_date_field) returns Date, but printing the values of my_date_field shows only NAs.
So I am unable to forecast on a timeline basis at all.
Please help me sort out this issue.
The code I've used for forecasting is:
library(forecast)
library(lubridate)
FitData <- read.csv("~//Power BI//fit.csv")
Fitdataset <- aggregate(FitData$Metric ~ FitData$PED, data = FitData, FUN= sum)
Fitdataset$FitData$PED<- as.Date(Fitdataset$FitData$PED, format="%y-%d-%m")
ts_FitData <- ts(Fitdataset$FitData$Metric, frequency=12, start=c(Fitdataset$FitData$PED`1,1))
decom <- stl(ts_FitData, s.window = "periodic")
pred <- forecast(decom, h = 7)
plot(pred)
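This sketch covers the two problems that seem most likely here, under stated assumptions. First, the formula FitData$Metric ~ FitData$PED names the aggregated columns "FitData$PED" and "FitData$Metric", so Fitdataset$FitData$PED does not refer to either of them. Second, as.Date() returns NA whenever the format string does not match the text, even though the class is still Date. The rows below and their ISO "YYYY-MM-DD" format are hypothetical, since the real fit.csv is only shown as an image.

```r
# Hypothetical rows standing in for fit.csv; ISO "YYYY-MM-DD" dates are an assumption
FitData <- data.frame(
  PED    = c("2017-01-31", "2017-01-31", "2017-02-28", "2017-03-31"),
  Metric = c(5, 7, 4, 6),
  stringsAsFactors = FALSE
)

# A format that does not match the text still gives class "Date", but every value is NA
bad <- as.Date(FitData$PED, format = "%y-%d-%m")
print(class(bad))
print(bad)

# Plain column names in the formula keep the result's names simple: "PED", "Metric"
Fitdataset <- aggregate(Metric ~ PED, data = FitData, FUN = sum)
Fitdataset$PED <- as.Date(Fitdataset$PED, format = "%Y-%m-%d")

# Monthly series starting at the year and month of the earliest period end
first <- min(Fitdataset$PED)
ts_FitData <- ts(Fitdataset$Metric, frequency = 12,
                 start = c(as.integer(format(first, "%Y")),
                           as.integer(format(first, "%m"))))
print(ts_FitData)
```

The rest of the original pipeline (stl() and forecast() from the forecast package) is left out here. Note that stl() needs at least two full periods, that is 24 monthly values, so these four hypothetical rows would not be enough for it.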

Execution order in Scala for/yield block

I make three database calls (that all return Future values) using this syntax:
for {
  a <- databaseCallA
  b <- databaseCallB(a)
  c <- databaseCallC(a)
} yield (a,b,c)
The second and third calls depend on the result of the first, but the two of them could be run in parallel.
How can I get databaseCallC to be issued immediately after databaseCallB (without waiting for the result b)?
Or is this already happening?
This is not happening currently: you have told the Futures to start one after the other. To parallelise the second and third calls, you could use this:
for {
  a <- databaseCallA
  (eventualB, eventualC) = (databaseCallB(a), databaseCallC(a))
  b <- eventualB
  c <- eventualC
} yield(a,b,c)
This starts computing both b and c as soon as a is available, and completes with the triple once all three values are available.

Custom Sort Range

I just need to know how to keep the following code from giving me a type mismatch error. The commented-out last line works, but when I replace Range("B2:B2000") with f, I get a type mismatch error. The reason I am not simply using the last line, even though it works, is that column B becomes column C if I insert a new column at column B. Is there something else I need to add to f to make it work?
f = Application.WorksheetFunction.Match("PCR No.", Range("A1:AZ1"), 0)
ActiveWorkbook.Worksheets("3. PMO Internal View").Sort.SortFields.Add Key:=Cells(1, f)
ActiveWorkbook.Worksheets("3. PMO Internal View").Sort.SortFields.Clear
ActiveWorkbook.Worksheets("3. PMO Internal View").Sort.SortFields.Add Key:= _
f, SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
'Range("B2:B2000"), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal

Remove variable labels attached with foreign/Hmisc SPSS import functions

As usual, I got an SPSS file that I imported into R with the spss.get function from the Hmisc package. I'm bothered by the labelled class that Hmisc::spss.get adds to all variables in the data.frame, so I want to remove it.
The labelled class gives me headaches when I try to run ggplot, or even when I want to do some menial analysis! One solution would be to remove the labelled class from each variable in the data.frame. How can I do that? Is that possible at all? If not, what are my other options?
I really want to avoid re-editing variables "from scratch" with as.data.frame(lapply(x, as.numeric)) and as.character where applicable... And I certainly don't want to run SPSS and remove the labels manually (I don't like SPSS, nor do I care to install it)!
Thanks!
Here's how I get rid of the labels altogether. It is similar to Jyotirmoy's solution but works for a vector as well as a data.frame. (Partial credit to Frank Harrell.)
clear.labels <- function(x) {
  if(is.list(x)) {
    for(i in 1 : length(x)) class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled')
    for(i in 1 : length(x)) attr(x[[i]],"label") <- NULL
  }
  else {
    class(x) <- setdiff(class(x), "labelled")
    attr(x, "label") <- NULL
  }
  return(x)
}
Use as follows:
my.unlabelled.df <- clear.labels(my.labelled.df)
EDIT
Here's a slightly cleaner version of the function, with the same results:
clear.labels <- function(x) {
  if(is.list(x)) {
    for(i in seq_along(x)) {
      class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled')
      attr(x[[i]],"label") <- NULL
    }
  } else {
    class(x) <- setdiff(class(x), "labelled")
    attr(x, "label") <- NULL
  }
  return(x)
}
A belated note/warning regarding class membership in R objects: the correct way to identify "labelled" is not to test with an is function or with equality (==), but with inherits. Methods that test a specific position will not pick up cases where the order of the existing classes is not the one assumed.
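Here is a small illustration of the point about inherits(). It uses a hypothetical object whose class vector lists "labelled" second rather than first. The class name "myclass" is made up for this example.

```r
# Hypothetical object whose class vector lists "labelled" second
x <- structure(c(4, 5, 6), label = "B Label", class = c("myclass", "labelled"))

# A position-based test looks only at the first class and misses "labelled"
print(class(x)[1] == "labelled")

# inherits() searches the whole class vector
print(inherits(x, "labelled"))
```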
You can avoid creating "labelled" variables in spss.get with the argument use.value.labels=FALSE:
w <- spss.get('/tmp/my.sav', use.value.labels=FALSE, datevars=c('birthdate','deathdate'))
The code from Bhattacharya could fail if the class of the labelled vector were simply "labelled" rather than c("labelled", "factor"), in which case it should have been:
class(x[[i]]) <- NULL # NULL removes the class; assigning an empty character vector raises an error
The error you report can be reproduced with this code:
> b <- 4:6
> label(b) <- 'B Label'
> str(b)
Class 'labelled' atomic [1:3] 4 5 6
..- attr(*, "label")= chr "B Label"
> class(b) <- class(b)[-1]
Error in class(b) <- class(b)[-1] :
invalid replacement object to be a class string
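The transcript fails because class(b) is just "labelled", so class(b)[-1] is a zero-length character vector, which class<- rejects. Assigning NULL instead removes the class attribute. Below is a minimal sketch of a per-vector helper that takes this into account. The name unlabel() is hypothetical. The objects are built by hand, with a "label" attribute plus the "labelled" class as the str() output above shows, so the block runs without Hmisc.

```r
# Hypothetical helper: drop "labelled" and the label from one vector
unlabel <- function(v) {
  if (inherits(v, "labelled")) {
    remaining <- setdiff(oldClass(v), "labelled")
    # class<- rejects a zero-length vector, so fall back to NULL
    class(v) <- if (length(remaining) > 0) remaining else NULL
  }
  attr(v, "label") <- NULL
  v
}

# Same shape as b in the transcript: class "labelled" only
b <- structure(4:6, label = "B Label", class = "labelled")
b2 <- unlabel(b)
str(b2)

# A labelled factor keeps its factor class
f <- factor(c("a", "b"))
attr(f, "label") <- "F label"
class(f) <- c("labelled", "factor")
f2 <- unlabel(f)
print(class(f2))
```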
You can try out the read.spss function from the foreign package.
A rough-and-ready way to get rid of the labelled class created by spss.get:
for (i in 1:ncol(x)) {
  z <- class(x[[i]])
  if (z[[1]] == 'labelled') {
    class(x[[i]]) <- z[-1]
    attr(x[[i]], 'label') <- NULL
  }
}
But can you please give an example where labelled causes problems?
If I have a variable MAED in a data frame x created by spss.get, I have:
> class(x$MAED)
[1] "labelled" "factor"
> is.factor(x$MAED)
[1] TRUE
So well-written code that expects a factor (say) should not have any problems.
Suppose:
library(Hmisc)
w <- spss.get('...')
You could remove the labels of a variable called "var1" by using:
attributes(w$var1)$label <- NULL
If you also want to remove the class "labelled", you could do:
class(w$var1) <- NULL
or if the variable has more than one class:
class(w$var1) <- class(w$var1)[class(w$var1) != "labelled"]
Hope this helps!
Well, I figured out that the unclass function can be used to remove classes (who would've thought, eh?!):
library(Hmisc)
# let's presuppose that variable x is gathered through spss.get() function
# and that x is factor
> class(x)
[1] "labelled" "factor"
> foo <- unclass(x)
> class(foo)
[1] "integer"
It's not the neatest solution; just imagine back-converting a bunch of vectors... If anyone tops this, I'll accept it as the answer...
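This is a minimal sketch of doing the cleanup for a whole data frame in one step, instead of back-converting vectors one by one: x[] <- lapply(x, ...) rewrites every column in place. The transcript above shows that unclass() turns a factor into its integer codes. Dropping only "labelled" leaves the factor class intact instead. The data frame, the labels, and the make_labelled() helper are hypothetical stand-ins for spss.get() output.

```r
# Hypothetical helper mimicking spss.get() output: a "label" attribute
# plus "labelled" prepended to the existing class
make_labelled <- function(v, lab) {
  attr(v, "label") <- lab
  class(v) <- c("labelled", oldClass(v))
  v
}

x <- data.frame(MAED = factor(c("low", "high", "low")), AGE = c(30, 41, 55))
x$MAED <- make_labelled(x$MAED, "Mother's education")
x$AGE  <- make_labelled(x$AGE, "Age in years")
print(class(x$MAED))

# unclass() strips every class, so the factor becomes integer codes
print(class(unclass(x$MAED)))

# Drop only "labelled" (and the label) from every column at once
x[] <- lapply(x, function(v) {
  keep <- setdiff(oldClass(v), "labelled")
  class(v) <- if (length(keep) > 0) keep else NULL
  attr(v, "label") <- NULL
  v
})
print(class(x$MAED))
print(class(x$AGE))
```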