mixAK::getProfiles: help or bug?

Could someone give me their view on the getProfiles function in the mixAK package?
n <- 6
# id and day have 18 elements; y, group and x have 6 and are recycled to 18 rows
dat <- data.frame(id = rep(1:n, n/2), y = rnorm(n),
                  group = rep(LETTERS[1:2], n/2),
                  day = 1:18,
                  x = rnorm(n))
library(mixAK)
ip <- getProfiles(t = "day", y = c("y", "x"), id = "id", data = dat)
ip
That doesn't seem right. It looks similar to the bug in a previous Stack Overflow question:
mixAK getProfiles bug?
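For reference, R's data.frame() recycles the six-element columns (y, group, x) to the 18 rows fixed by id and day, so every id appears on three days (id 1 on days 1, 7 and 13) and, because y is recycled too, carries the same y value on all three days. The Python sketch below mimics that recycling and then groups the rows into one time-ordered profile per id. The grouping is only a hedged reading of what getProfiles is meant to return (one data frame per subject), not mixAK's implementation, and the labels y1 to y6 are placeholders for the rnorm() draws.

```python
from collections import defaultdict
from itertools import cycle, islice


def recycle(values, length):
    """Repeat `values` until `length` elements exist, like R's vector recycling."""
    return list(islice(cycle(values), length))


n = 6
n_rows = 18  # day = 1:18 fixes the number of rows

subject_ids = recycle(range(1, n + 1), n_rows)           # rep(1:6, 3)
days = list(range(1, n_rows + 1))                        # 1:18
y = recycle([f"y{k}" for k in range(1, n + 1)], n_rows)  # stands in for rnorm(6)

# Hypothetical per-subject profiles: (day, y) pairs in time order.
profiles = defaultdict(list)
for subject, day, value in zip(subject_ids, days, y):
    profiles[subject].append((day, value))

for subject in sorted(profiles):
    print(subject, profiles[subject])
```

Comparing ip against this shape (six subjects, three time points each) is one way to see whether getProfiles is mislabelling or reordering rows.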


plotly R sankey add_trace

I am reading the documentation at https://plotly.com/r/reference/sankey/ and want to change the link colors of a sankey chart, but I can't quite understand the parameters of the add_trace() function.
Where should I specify the color value?
add_trace(p,type='sankey', color=????)
You haven't provided a minimal reproducible example, so I can't jump right into your code. But I think I can point you in the right direction.
In the documentation you screenshotted, color is one of the keys of the link list, which defines the links in the plot. Using this example from the R plotly documentation on adding links, let's look at where it goes:
library(plotly)
library(rjson)
json_file <- "https://raw.githubusercontent.com/plotly/plotly.js/master/test/image/mocks/sankey_energy.json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))
fig <- plot_ly(
type = "sankey",
domain = list(
x = c(0,1),
y = c(0,1)
),
orientation = "h",
valueformat = ".0f",
valuesuffix = "TWh",
node = list(
label = json_data$data[[1]]$node$label,
color = json_data$data[[1]]$node$color,
pad = 15,
thickness = 15,
line = list(
color = "black",
width = 0.5
)
),
link = list(
source = json_data$data[[1]]$link$source,
target = json_data$data[[1]]$link$target,
value = json_data$data[[1]]$link$value,
label = json_data$data[[1]]$link$label,
#### Color goes here! ####
# either one color for every link, or a vector with one color per link
color = "yellow"
)
)
fig <- fig %>% layout(
title = "Energy forecast for 2050<br>Source: Department of Energy & Climate Change, Tom Counsell via <a href='https://bost.ocks.org/mike/sankey/'>Mike Bostock</a>",
font = list(
size = 10
),
xaxis = list(showgrid = F, zeroline = F),
yaxis = list(showgrid = F, zeroline = F)
)
fig
The plotly documentation can be a bit opaque at times, and I have sometimes found it helpful to review the Python documentation as well. For example, this part of the Python documentation gives more guidance on changing link colors.
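The plotly reference describes link.color as either a single color for all links or an array with one color per link. Below is a minimal sketch in plain Python (no plotly import) that builds such an array by giving each link a semi-transparent version of its source node's color. The helper names, the three-node data and the hex color format are hypothetical. The trace uses the same keys as the R example (type, node, link), and the color list is what you would pass as color inside link = list(...) in R, or inside link=dict(...) for go.Sankey in Python.

```python
def hex_to_rgba(hex_color, alpha):
    """Convert '#RRGGBB' to a CSS 'rgba(r, g, b, a)' string."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r}, {g}, {b}, {alpha})"


def link_colors_by_source(sources, node_colors, alpha=0.4):
    """One semi-transparent color per link, taken from the link's source node."""
    return [hex_to_rgba(node_colors[s], alpha) for s in sources]


# Hypothetical three-node sankey with links 0->1, 0->2 and 1->2.
node_colors = ["#1f77b4", "#ff7f0e", "#2ca02c"]
sources = [0, 0, 1]
targets = [1, 2, 2]

trace = {
    "type": "sankey",
    "node": {"label": ["A", "B", "C"], "color": node_colors},
    "link": {
        "source": sources,
        "target": targets,
        "value": [5, 3, 2],
        # per-link colors belong inside the link specification
        "color": link_colors_by_source(sources, node_colors),
    },
}
print(trace["link"]["color"])
```

Semi-transparent link colors are a common design choice because overlapping links stay readable; a single string such as "yellow" works just as well when every link should look the same.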

Improve PySpark implementation for finding connected components in a graph

I am currently implementing the MapReduce algorithm for finding connected components described in this paper: https://www.cse.unr.edu/~hkardes/pdfs/ccf.pdf
As a beginner in the Big Data world, I started by implementing the CCF-Iterate (with secondary sorting) algorithm on a small graph: 6 edges and 8 nodes. I'm running this code on the free version of Databricks.
It takes one minute to produce a result, which seems far too long for such a small example. How can I reduce this time, and what kinds of optimization are possible? Any advice would be much appreciated.
The goal is to test this algorithm on large graphs.
PySpark code:
# `sc` is the SparkContext that Databricks notebooks provide
graph = sc.parallelize([(2, 3), (1, 2), (2, 4), (3, 5), (6, 7), (7, 8)])
counter_new_pair = sc.accumulator(1)
while counter_new_pair.value > 0:
    counter_new_pair = sc.accumulator(0)
    # CCF-Iterate with secondary sorting
    mapping_1 = graph.map(lambda x: (x[0], x[1]))
    mapping_2 = graph.map(lambda x: (x[1], x[0]))
    fusion = mapping_1.union(mapping_2)
    fusion = fusion.groupByKey().map(lambda x: (x[0], list(x[1])))
    fusion = fusion.map(lambda x: (x[0], sorted(x[1])))
    # keep keys whose smallest neighbour is smaller than the key itself
    values = fusion.filter(lambda x: x[1][0] < x[0])
    key_min_value = values.map(lambda x: (x[0], x[1][0]))
    # (min, remaining neighbours) -> one (neighbour, min) pair per neighbour
    values = values.map(lambda x: (x[1][0], x[1][1:]))
    values = values.filter(lambda x: len(x[1]) != 0)
    values = values.flatMap(lambda x: [(val, x[0]) for val in x[1]])
    # foreach is an action: it runs the job and counts the new pairs
    values.foreach(lambda x: counter_new_pair.add(1))
    joined = values.union(key_min_value)
    # CCF-Dedup
    mapping = joined.map(lambda x: ((x[0], x[1]), None))
    graph = mapping.groupByKey().map(lambda x: (x[0][0], x[0][1]))
Thanks
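To check the algorithm's logic independently of Spark, here is a minimal plain-Python sketch of the same loop: a CCF-Iterate round (the paper's basic reduce step, which takes the minimum over each key's neighbour list instead of relying on secondary sorting), followed by CCF-Dedup, repeated until a round produces no new pairs. The function names are my own. On the six-edge example it converges in four rounds; since each Spark round runs several jobs with shuffles, fixed per-job overhead rather than data size is a likely explanation for the runtime on such a small graph.

```python
from collections import defaultdict


def ccf_iterate(pairs):
    """One CCF-Iterate round: returns the emitted pairs and the new-pair count."""
    # Map phase: emit every edge in both directions, grouped by key.
    neighbours = defaultdict(list)
    for a, b in pairs:
        neighbours[a].append(b)
        neighbours[b].append(a)

    emitted = []
    new_pairs = 0
    # Reduce phase: attach the key and its other neighbours to the smallest id.
    for key, values in neighbours.items():
        min_value = min(values)
        if min_value < key:
            emitted.append((key, min_value))
            for v in values:
                if v != min_value:
                    new_pairs += 1
                    emitted.append((v, min_value))
    return emitted, new_pairs


def connected_components(edges):
    """Repeat CCF-Iterate + CCF-Dedup until no new pairs are produced."""
    pairs = list(edges)
    rounds = 0
    while True:
        rounds += 1
        emitted, new_pairs = ccf_iterate(pairs)
        pairs = sorted(set(emitted))  # CCF-Dedup: drop duplicate pairs
        if new_pairs == 0:
            break
    # Each non-root node now maps to its component's smallest id;
    # add the roots themselves so every node gets a label.
    labels = dict(pairs)
    for root in set(labels.values()):
        labels[root] = root
    return labels, rounds


labels, rounds = connected_components([(2, 3), (1, 2), (2, 4), (3, 5), (6, 7), (7, 8)])
print(rounds, labels)
```

The number of rounds grows with the graph's structure (roughly with component diameter), so on large graphs the per-round cost, not the sketch's logic, is what needs optimizing.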

stats::step fails inside a function because it can't find the data of the lm object

Hi everyone!
I tried using the step function inside my own function, but step seems to look only for global variables, not for variables defined inside the function.
Here is my example code:
library(tidyverse)
# simple test function
my_step_function <- function(model_data, formula) {
mod <- lm(formula, model_data, x = TRUE, y = TRUE)
step_mod <- step(mod, direction = "both", trace = FALSE)
summary(step_mod)
}
# test data
test <- tibble(
x1 = 1:100,
x2 = -49:50+9*rnorm(100),
x3 = 50+5*rnorm(100),
x4 = 10*rnorm(100),
x5 = sqrt(1:100),
y = 5*x1 + 2*x2 + 10*x5 + rnorm(100)
) %>% nest(data = everything())
# fails inside map(); this is where I first found the problem
test %>%
  mutate(RW = map(
    data,
    ~ my_step_function(.x, formula = formula(y ~ .))
  ))
# error: object 'model_data' not found
# fails when called directly
my_step_function(test$data[[1]], formula = (y ~ .))
# error: object 'model_data' not found
# still fails with a differently named global variable
test_data <- test$data[[1]]
my_step_function(test_data, formula = (y ~ .))
# error: object 'model_data' not found
# works only when a global variable has the same name as the function argument
model_data <- test$data[[1]]
my_step_function(model_data, formula = (y ~ .))
# success!
I'd appreciate it if someone could solve this puzzle. Thanks, everyone!

Group_by returns just one row while aggregate returns the expected outcome

I am currently stuck on post-processing some EddyData. Following an example (https://github.com/bgctw/REddyProc/blob/master/vignettes/aggUncertainty.md), I get an unexpected result from group_by that is reproducible, but I don't understand why.
group_by returns just one row, while aggregate gives the expected outcome.
Here is a minimal example:
library(tidyverse)
#create example data frame
date.time <- seq(from=as.POSIXct("2015-01-01 00:30:00"), to=as.POSIXct("2015-01-03 00:30:00"),by="30 mins")
nee <- runif(length(date.time),-200,200)
df <- data.frame(date.time, nee)
#calculate day of the year
df <- df %>% mutate(
date.time = df$date.time
, DoY = as.POSIXlt(date.time - 15*60)$yday # midnight belongs to the previous
)
#trying to summarise nee for each day
aggDay <- df %>% group_by(DoY) %>% summarise(nee=sum(nee))
aggDay
nee
1 322.1195
aggDay returns just one row, whereas aggregate works as expected in this case:
aggregate(df$nee, by=list(df$DoY), sum)
Group.1 x
1 0 -25.15698
2 1 448.13960
3 2 -100.86310
Unfortunately, the original code involves some further calculations, which is why I'd like to stay with group_by.
#original code, not reproducible here
aggDay <- df %>% group_by(DoY) %>%
summarise(
DateTime = first(DateTime)
, nRec = sum( NEE_uStar_fqc == 0, na.rm = TRUE)
, nEff = computeEffectiveNumObs(
resid, effAcf = !!autoCorr, na.rm = TRUE)
, NEE = mean(NEE_uStar_f, na.rm = TRUE)
, sdNEE = if (nEff <= 1) NA_real_ else sqrt(
mean(NEE_uStar_fsd^2, na.rm = TRUE) / (nEff - 1))
, sdNEEuncorr = if (nRec == 0) NA_real_ else sqrt(
mean(NEE_uStar_fsd^2, na.rm = TRUE) / (nRec - 1))
)
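I restarted RStudio and now it works. Don't ask me why. There must have been a conflict with another loaded package. A common culprit for this exact symptom is plyr: if it is loaded after dplyr, plyr's summarise() masks dplyr's and ignores the grouping, returning a single row.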
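The three groups that aggregate() reports follow from the DoY rule in the question: each timestamp is shifted back 15 minutes before taking the 0-based day of year, so the record stamped at midnight counts toward the previous day. A minimal Python sketch of that rule on the same half-hourly series, counting records per group (Python's tm_yday is 1-based, hence the - 1 to match POSIXlt$yday). With grouping working correctly, group_by(DoY) should likewise give three rows.

```python
from collections import Counter
from datetime import datetime, timedelta

start = datetime(2015, 1, 1, 0, 30)
end = datetime(2015, 1, 3, 0, 30)
step = timedelta(minutes=30)

# Same half-hourly series as seq(..., by = "30 mins") in the question.
times = []
t = start
while t <= end:
    times.append(t)
    t += step


def day_of_year(ts):
    """0-based day of year, like POSIXlt$yday, after a 15-minute shift
    so that a record stamped at midnight counts toward the previous day."""
    return (ts - timedelta(minutes=15)).timetuple().tm_yday - 1


records_per_day = Counter(day_of_year(ts) for ts in times)
print(sorted(records_per_day.items()))  # [(0, 48), (1, 48), (2, 1)]
```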

Total distance of route using Leaflet routing machine in rMaps/rCharts

I would like to produce a Shiny app that asks for two addresses, maps an efficient route, and calculates the total distance of the route. This can be done with the Leaflet Routing Machine JavaScript library; however, I would like to do a number of further calculations with the route distance and have everything embedded in a Shiny app.
You can produce the map with rMaps by following Ramnathv's demo here. However, I can't extract the total distance travelled, even though I can see that it has been calculated and shown in the legend or routing control. Another discussion covers how to do this with the JavaScript library (see here); it uses this JavaScript code:
alert('Distance: ' + routes[0].summary.totalDistance);
Here is my working code for the rMaps map. If anyone has ideas on how to extract the total distance of a route and store it, I would be very grateful. Thank you!
# INSTALL DEPENDENCIES IF YOU HAVEN'T ALREADY DONE SO
library(devtools)
install_github("ramnathv/rCharts#dev")
install_github("ramnathv/rMaps")
# CREATE FUNCTION to convert address to coordinates
library(RCurl)
library(RJSONIO)
construct.geocode.url <- function(address, return.call = "json", sensor = "false") {
root <- "http://maps.google.com/maps/api/geocode/"
u <- paste(root, return.call, "?address=", address, "&sensor=", sensor, sep = "")
return(URLencode(u))
}
gGeoCode <- function(address,verbose=FALSE) {
if(verbose) cat(address,"\n")
u <- construct.geocode.url(address)
doc <- getURL(u)
x <- fromJSON(doc)
if(x$status=="OK") {
lat <- x$results[[1]]$geometry$location$lat
lng <- x$results[[1]]$geometry$location$lng
return(c(lat, lng))
} else {
return(c(NA,NA))
}
}
# GET COORDINATES
x <- gGeoCode("Vancouver, BC")
way1 <- gGeoCode("645 East Hastings Street, Vancouver, BC")
way2 <- gGeoCode("2095 Commercial Drive, Vancouver, BC")
# PRODUCE MAP
library(rMaps)
map = Leaflet$new()
map$setView(c(x[1], x[2]), 16)
map$tileLayer(provider = 'Stamen.TonerLite')
mywaypoints = list(c(way1[1], way1[2]), c(way2[1], way2[2]))
map$addAssets(
css = "http://www.liedman.net/leaflet-routing-machine/dist/leaflet-routing-machine.css",
jshead = "http://www.liedman.net/leaflet-routing-machine/dist/leaflet-routing-machine.js"
)
routingTemplate = "
<script>
var mywaypoints = %s
L.Routing.control({
waypoints: [
L.latLng.apply(null, mywaypoints[0]),
L.latLng.apply(null, mywaypoints[1])
]
}).addTo(map);
</script>"
map$setTemplate(
afterScript = sprintf(routingTemplate, RJSONIO::toJSON(mywaypoints))
)
# map$set(width = 800, height = 800)
map
You can easily create a route via the Google Maps API with ggmap's route() function. The returned data frame contains distance information, so you just sum the legs to get the total distance.
library(ggmap)
x <- gGeoCode("Vancouver, BC")
way1txt <- "645 East Hastings Street, Vancouver, BC"
way2txt <- "2095 Commercial Drive, Vancouver, BC"
route_df <- route(way1txt, way2txt, structure = 'route')
dist<-sum(route_df[,1],na.rm=T) # total distance in meters
#
qmap(c(x[2],x[1]), zoom = 12) +
geom_path(aes(x = lon, y = lat), colour = 'red', size = 1.5, data = route_df, lineend = 'round')
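The total-distance step in the answer is a sum over legs that skips missing values, which is what na.rm = T handles. A minimal Python sketch of that arithmetic; the leg values and the dict layout are hypothetical stand-ins, not ggmap's actual output, and distances are assumed to be in metres, as the comment in the answer states.

```python
def total_distance_m(legs):
    """Sum per-leg distances in metres, ignoring legs with no distance (None),
    like sum(route_df[, 1], na.rm = TRUE) in R."""
    return sum(leg["m"] for leg in legs if leg["m"] is not None)


# Hypothetical legs; the last entry has no distance value.
legs = [{"m": 250}, {"m": 1200}, {"m": 430}, {"m": None}]
total_m = total_distance_m(legs)
print(total_m, total_m / 1000)  # metres and kilometres
```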