I'm analyzing how many users have used a particular hashtag and how they have contributed to the total number of tweets. My results are:
Data:
20.68% of tweets related to #HashtagX are created by 20 users. Now, these 20 users only represent 0.001% of the total of 14,432 users who have ever used the hashtag #HashtagX.
What happens if we take the top 100 users by number of tweets? 44% of tweets are created by the top 100 users.
If we extend to the top 500 users by number of users we see that 72% of tweets is created by the top 500.
I am wondering how to implement a slope graph because I think that is a good way to show the relationship between both variables, but it is not a default graph provides for any library.
One of the ways to show the relationship between both variables ("Users" vs "Tweets") is a Slope Chart.
Visualization obtained (solved graph for the question):
Slope Chart
1) Libraries
library(ggplot2)
library(scales)
library(ggrepel)
theme_set(theme_classic())
2) Data example
Country = c('20 accounts', '50 accounts', '100 accounts','200 accounts','300 accounts',
'500 accounts','1000 accounts','14.443 accounts')
January = c(0.14, 0.34, 0.69,1.38,2.07,3.46,6.92,100)
April = c(20.68, 33.61, 44.94, 57.49,64.11,72,80,100)
Tweets_N = c(26797, 43547, 58211, 74472,83052,93259,103898,129529)
a = data.frame(Country, January, April)
left_label <- paste(a$Country, paste0(a$January,"%"),sep=" | ")
right_label <- paste(paste0(round(a$April),"%"),paste0(Tweets_N," tweets"),sep=" | ")
a$color_class <- "green"
3) Plot
p <- ggplot(a) + geom_segment(aes(x=1, xend=2, y=January, yend=April, col=color_class), size=.25, show.legend=F) +
geom_vline(xintercept=1, linetype="dashed", size=.1) +
geom_vline(xintercept=2, linetype="dashed", size=.1) +
scale_color_manual(labels = c("Up", "Down"),
values = c("blue", "red")) +
labs(
x="", y = "Percentage") +
xlim(.5, 2.5) + ylim(0,(1.1*(max(a$January, a$April)))) # X and Y axis limits
# Add texts
p <- p + geom_text_repel(label=left_label, y=a$January, x=rep(1, NROW(a)), hjust=1.1, size=3.5,direction = "y")
p <- p + geom_text(label=right_label, y=a$April, x=rep(2, NROW(a)), hjust=-0.1, size=3.5)
p <- p + geom_text(label="Accounts", x=1, y=1.1*(max(a$January, a$April)), hjust=1.2, size=4, check_overlap = TRUE) # title
p <- p + geom_text(label="Tweeets (% of Total)", x=2, y=1.1*(max(a$January, a$April)), hjust=-0.1, size=4, check_overlap = TRUE)
# title
# Minify theme
p + theme(panel.background = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_blank(),
panel.border = element_blank(),
plot.margin = unit(c(1,2,1,2), "cm"))
Related
I want to add random effects at level 1. Below is the working code with level 2 simulated.
I have two questions i hope folks can help with
How do i get a reasonable estimate of level 2 variance (assuming i have sample data). Can i just square the between person SD on the dv?
how do simulate level 1 variance and how do i determine a reasonable value at level 1.
I've tried: randomeffect = list(int_neighborhood = list(variance = 8, var_level = 2),
weight= list(variance = 8, var_level = 1 ))
but that kicks an error
This code works without level 1
ctrl <- lmeControl(opt='optim');
sim_arguments <- list(
formula = y ~ 1 + weight + age + sex + (1 | neighborhood),
reg_weights = c(4, -0.03, 0.2, 0.33),
fixed = list(weight = list(var_type = 'continuous', mean = 180, sd = 30),
age = list(var_type = 'ordinal', levels = 30:60),
sex = list(var_type = 'factor', levels = c('male', 'female'))),
randomeffect = list(int_neighborhood = list(variance = 8, var_level = 2)),
sample_size = list(level1 = 62, level2 = 60)
)
nested_data <- sim_arguments %>%
simulate_fixed(data = NULL, .) %>%
simulate_randomeffect(sim_arguments) %>%
simulate_error(sim_arguments) %>%
generate_response(sim_arguments)
RandomIntercept <- lme(fixed= y ~1 + weight + age + sex ,
random= ~ 1 | neighborhood,
correlation = corAR1(),
data=nested_data,
control=ctrl,
na.action=na.exclude)
summary(RandomIntercept)
RandomSlope <-lme(fixed= y ~1 + weight + age + sex ,
random= ~ 1 +weight| neighborhood,
correlation = corAR1(),
data=nested_data,
control=ctrl,
na.action=na.exclude)
summary(RandomSlope)
anova(RandomIntercept,RandomSlope)
I’m trying to implement Adam by myself for a learning purpose.
Here is my Adam implementation:
class ADAMOptimizer(Optimizer):
"""
implements ADAM Algorithm, as a preceding step.
"""
def __init__(self, params, lr=1e-3, betas=(0.9, 0.99), eps=1e-8, weight_decay=0):
defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
super(ADAMOptimizer, self).__init__(params, defaults)
def step(self):
"""
Performs a single optimization step.
"""
loss = None
for group in self.param_groups:
#print(group.keys())
#print (self.param_groups[0]['params'][0].size()), First param (W) size: torch.Size([10, 784])
#print (self.param_groups[0]['params'][1].size()), Second param(b) size: torch.Size([10])
for p in group['params']:
grad = p.grad.data
state = self.state[p]
# State initialization
if len(state) == 0:
state['step'] = 0
# Momentum (Exponential MA of gradients)
state['exp_avg'] = torch.zeros_like(p.data)
#print(p.data.size())
# RMS Prop componenet. (Exponential MA of squared gradients). Denominator.
state['exp_avg_sq'] = torch.zeros_like(p.data)
exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
b1, b2 = group['betas']
state['step'] += 1
# L2 penalty. Gotta add to Gradient as well.
if group['weight_decay'] != 0:
grad = grad.add(group['weight_decay'], p.data)
# Momentum
exp_avg = torch.mul(exp_avg, b1) + (1 - b1)*grad
# RMS
exp_avg_sq = torch.mul(exp_avg_sq, b2) + (1-b2)*(grad*grad)
denom = exp_avg_sq.sqrt() + group['eps']
bias_correction1 = 1 / (1 - b1 ** state['step'])
bias_correction2 = 1 / (1 - b2 ** state['step'])
adapted_learning_rate = group['lr'] * bias_correction1 / math.sqrt(bias_correction2)
p.data = p.data - adapted_learning_rate * exp_avg / denom
if state['step'] % 10000 ==0:
print ("group:", group)
print("p: ",p)
print("p.data: ", p.data) # W = p.data
return loss
I think I implemented everything correct however the loss graph of my implementation is very spiky compared to that of torch.optim.Adam.
My ADAM implementation loss graph (below)
torch.optim.Adam loss graph (below)
If someone could tell me what I am doing wrong, I’ll be very grateful.
For the full code including data, graph (super easy to run): https://github.com/byorxyz/AMS_pytorch/blob/master/AdamFails_1dConvex.ipynb
See the code and error. I have already tried Do, For,...and it is not working.
CODE + Error from Mathematica:
Import of survival probabilities _{k}p_x and _{k}p_y (calculated in excel)
px = Import["C:\Users\Eva\Desktop\kpx.xlsx"];
px = Flatten[Take[px, All], 1];
NOTE: The probability _{k}p_x can be found on the position px[[k+2, x -16]
i = 0.04;
v = 1/(1 + i);
JointLifeIndep[x_, y_, n_] = Sum[v^k*px[[k + 2, x - 16]]*py[[k + 2, y - 16]], {k , 0, n - 1}]
Part::pkspec1: The expression 2+k cannot be used as a part specification.
Part::pkspec1: The expression 2+k cannot be used as a part specification.
Part::pkspec1: The expression 2+k cannot be used as a part specification.
General::stop: Further output of Part::pkspec1 will be suppressed during this calculation.
Part of dataset (left corner of the dataset):
k\x 18 19 20
0 1 1 1
1 0.999478086278185 0.999363078716059 0.99927911905056
2 0.998841497412202 0.998642656911039 0.99858030519133
3 0.998121451605207 0.99794428814123 0.99788275311401
4 0.997423447323642 0.997247180349674 0.997174407432264
5 0.996726703362208 0.996539285828369 0.996437857252448
6 0.996019178300768 0.995803204773039 0.99563600297737
7 0.995283481416241 0.995001861216016 0.994823584922968
8 0.994482556091416 0.994189960607964 0.99405569519175
9 0.993671079225432 0.99342255996206 0.993339856748282
10 0.992904079096455 0.992707177451333 0.992611817294026
11 0.992189069953677 0.9919796017009 0.991832027835091
Without having the exact same data files to work with it is often easy for each of us to make mistakes that the other cannot reproduce or understand.
From your snapshot of your data set I used Export in Mathematica to try to reproduce your .xlsx file. Then I tried the following
px = Import["kpx.xlsx"];
px = Flatten[Take[px, All], 1];
py = px; (* fake some py data *)
i = 0.04;
v = 1/(1 + i);
JointLifeIndep[x_, y_, n_] := Sum[v^k*px[[k+2,x-16]]*py[[k+2,y-16]], {k,0,n-1}];
JointLifeIndep[17, 17, 12]
and it displays 362.402
Notice I used := instead of = in my definition of JointLifeIndep. := and = do different things in Mathematica. = will immediately evaluate the right hand side of that definition. This is possibly the reason that you are getting the error that you do.
You should also be careful with your subscript values and make sure that every subscript is between 1 and the number of rows (or columns) in your matrix.
So see if you can try this example with an Excel sheet containing only the snapshot of data that you showed and see if you get the same result that I do.
Hopefully that will be enough for you to make progress.
I would like to produce a shiny app that asks for two addresses, maps an efficient route, and calculates the total distance of the route. This can be done using the Leaflet Routing Machine using the javascript library, however I would like to do a bunch of further calculations with the distance of the route and have it all embedded in a shiny app.
You can produce the map using rMaps by following this demo by Ramnathv here. But I'm not able to pull out the total distance travelled even though I can see that it has been calculated in the legend or controller. There exists another discussion on how to do this using the javascript library - see here. They discuss using this javascript code:
alert('Distance: ' + routes[0].summary.totalDistance);
Here is my working code for the rMap. If anyone has any ideas for how to pull out the total distance of a route and store it, I would be very grateful. Thank you!
# INSTALL DEPENDENCIES IF YOU HAVEN'T ALREADY DONE SO
library(devtools)
install_github("ramnathv/rCharts#dev")
install_github("ramnathv/rMaps")
# CREATE FUNCTION to convert address to coordinates
library(RCurl)
library(RJSONIO)
construct.geocode.url <- function(address, return.call = "json", sensor = "false") {
root <- "http://maps.google.com/maps/api/geocode/"
u <- paste(root, return.call, "?address=", address, "&sensor=", sensor, sep = "")
return(URLencode(u))
}
gGeoCode <- function(address,verbose=FALSE) {
if(verbose) cat(address,"\n")
u <- construct.geocode.url(address)
doc <- getURL(u)
x <- fromJSON(doc)
if(x$status=="OK") {
lat <- x$results[[1]]$geometry$location$lat
lng <- x$results[[1]]$geometry$location$lng
return(c(lat, lng))
} else {
return(c(NA,NA))
}
}
# GET COORDINATES
x <- gGeoCode("Vancouver, BC")
way1 <- gGeoCode("645 East Hastings Street, Vancouver, BC")
way2 <- gGeoCode("2095 Commercial Drive, Vancouver, BC")
# PRODUCE MAP
library(rMaps)
map = Leaflet$new()
map$setView(c(x[1], x[2]), 16)
map$tileLayer(provider = 'Stamen.TonerLite')
mywaypoints = list(c(way1[1], way1[2]), c(way2[1], way2[2]))
map$addAssets(
css = "http://www.liedman.net/leaflet-routing-machine/dist/leaflet-routing-machine.css",
jshead = "http://www.liedman.net/leaflet-routing-machine/dist/leaflet-routing-machine.js"
)
routingTemplate = "
<script>
var mywaypoints = %s
L.Routing.control({
waypoints: [
L.latLng.apply(null, mywaypoints[0]),
L.latLng.apply(null, mywaypoints[1])
]
}).addTo(map);
</script>"
map$setTemplate(
afterScript = sprintf(routingTemplate, RJSONIO::toJSON(mywaypoints))
)
# map$set(width = 800, height = 800)
map
You can easily create a route via the google maps api. The returned data frame will have distance info. Just sum up the legs for total distance.
library(ggmap)
x <- gGeoCode("Vancouver, BC")
way1txt <- "645 East Hastings Street, Vancouver, BC"
way2txt <- "2095 Commercial Drive, Vancouver, BC"
route_df <- route(way1txt, way2txt, structure = 'route')
dist<-sum(route_df[,1],na.rm=T) # total distance in meters
#
qmap(c(x[2],x[1]), zoom = 12) +
geom_path(aes(x = lon, y = lat), colour = 'red', size = 1.5, data = route_df, lineend = 'round')
Lets say that I want to have unlimited path segements and have the get multiply them together such that:
get "/multiply/num1/num2/num3......" do
num1 = params[:num1].to_i
num2 = params[:num2].to_i
....
solution = num1 * num2 * ....
"the solution is = #{solution}"
end
I want the user to be able to type out as many path segments as they want and then get the solution for those numbers multiplied together.
I have found a way to do it:
get "/multiply/*" do
n = params[:splat][0].split('/')
for i in (0...n.length)
n[i] = n[i].to_f
end
n = n.inject{ |sum, n| sum * n }
"solution = #{n}"
end