I want to build a dashboard where I can search by any address in the U.S., have it geocoded by Tableau, and then filter a dataset based on a 1-5 mile radius from that address. Does Tableau have the capability to do this, or do I need a third-party app to accommodate the address search / geocoding feature?
Unfortunately, Tableau's built-in geocoding only supports the identification of things like cities and zip codes. If you want more granular location visuals (street addresses and such), you will need to geocode the data yourself before building your data source (see onlinehelp.tableau.com).
In the past, I've done this with some simple Python code and the ArcGIS geocoder:
import pandas as pd
import numpy as np
import geopy
from geopy.geocoders import ArcGIS
#csv with street addresses, cities, states, and zip
file_loc = "C:/Documents/data.csv"
df = pd.read_csv(file_loc, na_values=None,
                 dtype={'STREET': object, 'CITY': object, 'STATE': object, 'ZIP': object})
df = df.fillna('')
street = df['STREET'].map(str)
city = df['CITY'].map(str)
state = df['STATE'].map(str)
zip = df['ZIP'].map(str)
addr = street + ' ' + city + ' ' + state + ' ' + zip
df['ADDRESS'] = addr
addresses = df['ADDRESS'].unique()
geolocator = ArcGIS()
addr_list = []
for a in addresses:
    temp_dict = {}
    loc = geolocator.geocode(a, timeout=5)
    if loc:
        temp_dict['LAT'] = loc.latitude
        temp_dict['LONG'] = loc.longitude
        temp_dict['ADDR'] = loc.address
        temp_dict['ADDRESS'] = a
    else:
        temp_dict['LAT'] = None
        temp_dict['LONG'] = None
        temp_dict['ADDR'] = None
        temp_dict['ADDRESS'] = a
    addr_list.append(temp_dict)
addr_df = pd.DataFrame(addr_list)
df = df.merge(addr_df, on='ADDRESS', how='left')
# write the geocoded table out to Excel
df.to_excel('outdate.xlsx', sheet_name='Sheet1', index=False)
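For the radius-filter part of the question, once every row has LAT/LONG you can also pre-compute the distance to the searched point before the data reaches Tableau. A minimal haversine sketch, assuming a hypothetical searched/geocoded center point (the coordinates below are placeholders, not from the original post):
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    # great-circle distance between two points, in miles
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3958.8 * asin(sqrt(a))  # 3958.8 = Earth's mean radius in miles

center_lat, center_long = 40.7128, -74.0060  # hypothetical geocoded search address

df['MILES_FROM_CENTER'] = df.apply(
    lambda row: haversine_miles(center_lat, center_long, row['LAT'], row['LONG'])
    if pd.notna(row['LAT']) else None,
    axis=1,
)
within_5_miles = df[df['MILES_FROM_CENTER'] <= 5]
The same distance calculation can also be expressed as a Tableau calculated field once LAT/LONG are in the data source, with the searched point supplied via parameters.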
I'm having some issues while trying to add a new node to a graph (with OSMnx).
I need to calculate some distances in areas that don't have nodes nearby.
Here is my code:
import networkx as nx
import osmnx as ox
from IPython.display import IFrame
import geopandas as gpd
from shapely.geometry import Point
my_dict = {
    '001': {
        'y': -31.640224888841907,
        'x': -60.672566478771884,
        'street_count': 1
    }
}
tmp_list = []
for item_key, item_value in my_dict.items():
    tmp_list.append({
        'geometry': Point(item_value['x'], item_value['y']),
        'osmid': item_key,
        'y': item_value['y'],
        'x': item_value['x'],
        'street_count': item_value['street_count']
    })
my_nodes = gpd.GeoDataFrame(tmp_list)
G = ox.graph_from_place("Santa Fe, Santa Fe, Argentina", network_type="drive", buffer_dist=5000)
nodes= ox.graph_to_gdfs(G, nodes=True, edges=False)
edges= ox.graph_to_gdfs(G, edges=True, nodes=False)
nodes = nodes.append(my_nodes, ignore_index = True)
G2 = ox.graph_from_gdfs(nodes, edges)
m1 = ox.plot_graph_folium(G2, popup_attribute="name", weight=2, color="#8b0000")
dest = (-60.70916, -31.64553)
ori = (-60.66756, -31.63719)
iniciocercano = ox.nearest_nodes(G2, ori[0], ori[1], return_dist=True)
finalcercano = ox.nearest_nodes(G2, dest[0], dest[1], return_dist=True)
pathDistance = nx.shortest_path_length(G2, iniciocercano[0], finalcercano[0], weight="length")
route = nx.shortest_path(G2, iniciocercano[0], finalcercano[0])
And the error that I'm getting is: Input contains NaN.
I also noticed that the original graph (G) has 9423 nodes and 25013 edges, while the new graph (G2) has 18847 nodes and 25013 edges, which is pretty strange. Somehow the nodes are getting duplicated.
Thank you for your time.
Your my_nodes GeoDataFrame is not indexed by osmid the way it needs to be (and the way your nodes GeoDataFrame is).
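A minimal sketch of that fix, assuming the rest of the question's code stays the same: set osmid as the index of my_nodes (and give it the same CRS as the existing nodes) before combining the two GeoDataFrames, so graph_from_gdfs does not renumber the nodes.
import pandas as pd

# index the new nodes by osmid so they line up with the nodes returned by ox.graph_to_gdfs
my_nodes = gpd.GeoDataFrame(tmp_list).set_index('osmid')
my_nodes.crs = nodes.crs  # match the CRS of the existing nodes GeoDataFrame

# combine while preserving the osmid index, instead of append(..., ignore_index=True)
nodes = pd.concat([nodes, my_nodes])
G2 = ox.graph_from_gdfs(nodes, edges)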
I am learning neural network modeling and its uses in time series prediction.
First, thank you for reading this post and for your help :)
On this page there are various NN models (LSTM, CNN etc.) for predicting "traffic volume":
https://michael-fuchs-python.netlify.app/2020/11/01/time-series-analysis-neural-networks-for-forecasting-univariate-variables/#train-validation-split
I got inspired and decided to use/shorten/adapt the code in there for a problem of my own: predicting the bitcoin price.
I have the Bitcoin daily prices starting 1.1.2017, 2024 daily prices in total.
I use the first 85% of the data for training and the rest for validation (except the last 10 observations, which I would like to use as test data to see how good my model is).
I would like to use a feedforward model.
My goal is merely to have code that runs.
I have managed so far to get most of my code to run. However, I get a strange format for my test forecast results: it should simply be an array of 10 numbers (i.e. predicted prices corresponding to the 10 days at the end of my data). To my surprise, what is printed out is a long list of numbers. I need help to find out what changes I need to make to the code to make it run correctly.
Thank you for helping me :)
The code is pasted below, followed by the error:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing #import MinMaxScaler
from sklearn import metrics #import mean_squared_error
import seaborn as sns
sns.set()
import tensorflow as tf
from tensorflow import keras
from keras.layers import Input, Dense, Flatten
from keras.optimizers import Adam
from keras.models import Sequential
from keras.callbacks import EarlyStopping
tf.__version__
df = pd.read_csv('/content/BTC-USD.csv')
def mean_absolute_percentage_error_func(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
def timeseries_evaluation_metrics_func(y_true, y_pred):
    print('Evaluation metric results: ')
    print(f'MSE is : {metrics.mean_squared_error(y_true, y_pred)}')
    print(f'MAE is : {metrics.mean_absolute_error(y_true, y_pred)}')
    print(f'RMSE is : {np.sqrt(metrics.mean_squared_error(y_true, y_pred))}')
    print(f'MAPE is : {mean_absolute_percentage_error_func(y_true, y_pred)}')
    print(f'R2 is : {metrics.r2_score(y_true, y_pred)}', end='\n\n')
def univariate_data_prep_func(dataset, start, end, window, horizon):
    X = []
    y = []
    start = start + window
    if end is None:
        end = len(dataset) - horizon
    for i in range(start, end):
        indicesx = range(i - window, i)
        X.append(np.reshape(dataset[indicesx], (window, 1)))
        indicesy = range(i, i + horizon)
        y.append(dataset[indicesy])
    return np.array(X), np.array(y)
# Generating the test set
test_data = df['close'].tail(10)
df = df.drop(df['close'].tail(10).index)
df.shape
# Defining the target variable
uni_data = df['close']
uni_data.index = df['formatted_date']
uni_data.head()
#scaling
from sklearn import preprocessing
uni_data = uni_data.values
scaler_x = preprocessing.MinMaxScaler()
x_scaled = scaler_x.fit_transform(uni_data.reshape(-1, 1))
# Single Step Style (sss) modeling
univar_hist_window_sss = 50
horizon_sss = 1
# 2014 observations in total
# 2014*0.85=1710 should be part of the training (304 validation)
train_split_sss = 1710
x_train_uni_sss, y_train_uni_sss = univariate_data_prep_func(x_scaled, 0, train_split_sss,
univar_hist_window_sss, horizon_sss)
x_val_uni_sss, y_val_uni_sss = univariate_data_prep_func(x_scaled, train_split_sss, None,
univar_hist_window_sss, horizon_sss)
print ('Length of first Single Window:')
print (len(x_train_uni_sss[0]))
print()
print ('Target horizon:')
print (y_train_uni_sss[0])
BATCH_SIZE_sss = 32
BUFFER_SIZE_sss = 150
train_univariate_sss = tf.data.Dataset.from_tensor_slices((x_train_uni_sss, y_train_uni_sss))
train_univariate_sss = train_univariate_sss.cache().shuffle(BUFFER_SIZE_sss).batch(BATCH_SIZE_sss).repeat()
validation_univariate_sss = tf.data.Dataset.from_tensor_slices((x_val_uni_sss, y_val_uni_sss))
validation_univariate_sss = validation_univariate_sss.batch(BATCH_SIZE_sss).repeat()
n_steps_per_epoch = 55
n_validation_steps = 10
n_epochs = 100
#FFNN architecture
model = tf.keras.models.Sequential([
tf.keras.layers.Dense(8, input_shape=x_train_uni_sss.shape[-2:]),
tf.keras.layers.Dense(units=horizon_sss)])
model.compile(loss='mse',
optimizer='adam')
#fit the model
model_path = '/content/FFNN_model_sss.h5'
keras_callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss',
min_delta=0, patience=10,
verbose=1, mode='min'),
tf.keras.callbacks.ModelCheckpoint(model_path,monitor='val_loss',
save_best_only=True,
mode='min', verbose=0)]
history = model.fit(train_univariate_sss, epochs=n_epochs, steps_per_epoch=n_steps_per_epoch,
validation_data=validation_univariate_sss, validation_steps=n_validation_steps, verbose =1,
callbacks = keras_callbacks)
#validation
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'r', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
# Testing our model
trained_ffnn_model_sss = tf.keras.models.load_model(model_path)
df_temp = df['close']
test_horizon = df_temp.tail(univar_hist_window_sss)
test_history = test_horizon.values
result = []
# Define Forecast length here
window_len = len(test_data)
test_scaled = scaler_x.fit_transform(test_history.reshape(-1, 1))
for i in range(1, window_len + 1):
    test_scaled = test_scaled.reshape((1, test_scaled.shape[0], 1))
    # Inserting the model
    predicted_results = trained_ffnn_model_sss.predict(test_scaled)
    print(f'predicted : {predicted_results}')
    result.append(predicted_results[0])
    test_scaled = np.append(test_scaled[:, 1:], [[predicted_results]])
result_inv_trans = scaler_x.inverse_transform(result)
result_inv_trans
I believe the problem might have to do with the shapes of the data, but I do not yet know exactly how.
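One way to narrow this down (a diagnostic sketch, not a fix) is to print the shapes involved, since the model's output shape determines how many numbers each predict call returns:
# inspect the shapes flowing through the forecast loop
print('model output shape:', trained_ffnn_model_sss.output_shape)
print('x_train_uni_sss shape:', x_train_uni_sss.shape)
print('predicted_results shape:', predicted_results.shape)
print('number of appended results:', len(result))
print('values per appended result:', np.ravel(result[0]).shape)
If the model's output shape is not (None, 1), every predict call returns more than one value per step, which would explain the long list of numbers.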
Data:
click here
Traceback:
click here
I'm trying to follow an exercise on calculating the maximum drawdown and maximum drawdown duration of a market-neutral vs. a long-only trading strategy.
I followed the code to the T and it has worked perfectly up until now, but I seem to be getting a ValueError exception. What code do I need to change for my code to work?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from MaxDD_Function import calculateMaxDD
# CALCULATING MAXDD AND CREATING THE FUNCTION.
def calculateMaxDD(cumret):
    highwatermark = np.zeros(cumret.shape)
    drawdown = np.zeros(cumret.shape)
    drawdownduration = np.zeros(cumret.shape)
    for t in np.arange(1, cumret.shape[0]):
        highwatermark[t] = (np.maximum(highwatermark[t -1]), cumret[t])
        drawdown[t] = ((1+ cumret[t] )/(1 + highwatermark[t]) - 1)
        if drawdown[t] == 0:
            drawdownduration[t] == 0
        else:
            drawdownduration[t] = drawdownduration[t -1] + 1
    maxDD, i = np.min(drawdown, np.argmin(drawdown)) # drawdown < 0 always
    maxDDD = np.max(drawdownduration)
    return (maxDD, maxDDD, i)
# First part of example. Read the csv data and calculate.
#The first dataframe/set for my strategy
df = pd.read_csv('IGE_daily.csv')
#print (df.head())
df.sort_values(by= 'Date', inplace = True)
dailyret = df.loc[:, 'Adj Close'].pct_change()
excessRet = ((dailyret - 0.04)/252)
sharpeRatio = ((np.sqrt(252)*np.mean(excessRet))/np.std(excessRet))
print (sharpeRatio)
#Second part of example
#This is the second dataframe/set for my strategy.
df2 = pd.read_csv('SPY.csv')
#The new data frame, with both datasets.
df = pd.merge (df, df2, on = 'Date', suffixes = ('_IGE', '_SPY'))
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace = True)
df.sort_index(inplace = True)
dailyret = df[['Adj Close_IGE', 'Adj Close_SPY']].pct_change()  # Daily Returns
dailyret.rename(columns={"Adj Close_IGE": "IGE", "Adj Close_SPY": "SPY"}, inplace=True)
netRet = (dailyret['IGE'] - dailyret['SPY'])/2
sharpeRatio = np.sqrt(252) * np.mean(netRet)/np.std(netRet)
print (sharpeRatio)
cumret = np.cumprod(1 + netRet) - 1  # Cumulative return
#print (plt.plot(cumret))
#print (plt.show()) # Remember to always run plt.show to see the plot in the terminal.
maxDrawdown, maxDrawdownDuration, startDrawdownDay = calculateMaxDD(cumret.values)
maxDrawdown = calculateMaxDD(cumret.values)
print (maxDrawdown)
Here are the results I got from the above-mentioned code:
Ivies-MacBook-Pro:Quant_Trading Ivieidahosa$ python Ex3_4.py
-46.10531783058014
0.7743286831426566
Traceback (most recent call last):
File "Ex3_4.py", line 76, in <module>
maxDrawdown = calculateMaxDD(cumret.values)
File "Ex3_4.py", line 15, in calculateMaxDD
highwatermark[t] = (np.maximum(highwatermark[t -1]), cumret[t])
ValueError: invalid number of arguments
I expected the output for maxDrawdown to be -0.09529268047208683, maxDrawdownDuration to be 497, and startDrawdownDay to be 1223.
Q: What code do I need to change for my code to work?
Your code calls a NumPy function whose minimum call signature is np.maximum( <array_like_A>, <array_like_B> ).
The reported line fails to meet that expected behaviour, because only one of the expected pair of arguments is actually passed into the call (note where the closing parenthesis sits):
highwatermark[t] = ( np.maximum( highwatermark[t-1] ), cumret[t] )
On the right-hand side a tuple is being constructed, and its first item is expected to be the value returned by np.maximum(...) called with a single argument; that single-argument call is what raises the error.
You may like to start further bug-tracing by cross-checking the state of the objects and the call signature:
try:
    for t in np.arange( 1, cumret.shape[0] ):
        print( "The shape of <highwatermark[t-1]>-object was: ",
               highwatermark[t-1].shape, " for t == ", t
               )
except:
    print( "The <highwatermark[t-1]>-object was not a numpy array",
           " for t == ", t
           )
finally:
    print( np.maximum.__doc__ )
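For reference, a minimal sketch of how the offending line (and the similarly suspect ones further down) could be written so the function runs; this is a guess at the intended logic rather than the original author's code, so treat the last two fixes in particular as assumptions:
# pass BOTH arrays to np.maximum: previous high-water mark vs. current cumulative return
highwatermark[t] = np.maximum(highwatermark[t - 1], cumret[t])

# assignment (=), not comparison (==), when the drawdown is zero
if drawdown[t] == 0:
    drawdownduration[t] = 0

# after the loop: np.min takes only the array; get the index separately with np.argmin
i = np.argmin(drawdown)  # drawdown < 0 always
maxDD = drawdown[i]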
I am trying to convert some coordinate data with the measurements package indicated below, but I'm not succeeding.
My data:
Long Lat
62ᵒ36.080 58ᵒ52.940
61ᵒ28.020 54ᵒ59.940
62ᵒ07.571 56ᵒ48.873
62ᵒ04.929 57ᵒ33.605
63ᵒ01.419 60ᵒ30.349
63ᵒ09.555 61ᵒ29.199
63ᵒ43.499 61ᵒ23.590
64ᵒ34.175 62ᵒ30.304
63ᵒ16.342 59ᵒ16.437
60ᵒ55.090 54ᵒ49.269
61ᵒ28.013 54ᵒ59.928
62ᵒ07.868 56ᵒ48.040
62ᵒ04.719 57ᵒ32.120
62ᵒ36.083 58ᵒ51.766
63ᵒ01.644 60ᵒ30.714
64ᵒ33.897 62ᵒ30.772
63ᵒ43.604 61ᵒ23.426
63ᵒ09.288 61ᵒ29.888
63ᵒ16.722 59ᵒ16.204
What I'm trying:
library(measurements)
library(readxl)
coord = read.table('coord_converter.txt', header = T, stringsAsFactors = F)
# change the degree symbol to a space
lat = gsub('°','', coord$Lat)
long = gsub('°','', coord$Long)
# convert from decimal minutes to decimal degrees
lat = measurements::conv_unit(lat, from = 'deg_dec_min', to = 'dec_deg')
long = measurements::conv_unit(long, from = 'deg_dec_min', to = 'dec_deg')
What I'm getting with this penultimate line:
Warning messages:
In split(as.numeric(unlist(strsplit(x, " "))) * c(3600, 60), f = rep(1:length(x), : NAs introduced by coercion
In as.numeric(unlist(strsplit(x, " "))) * c(3600, 60) : longer object length is not a multiple of shorter object length
In split.default(as.numeric(unlist(strsplit(x, " "))) * c(3600, : data length is not a multiple of split variable
Can someone point my mistake or make a suggestion of how to proceed?
Thank you!
I think the issue here was that after the gsub call, degrees and minutes were not space-delimited, as required by measurements::conv_unit.
For example, this works fine (for this reproducible example I also changed "ᵒ" to "°"):
library(measurements)
#read your data
txt <-
"Long Lat
62°36.080 58°52.940
61°28.020 54°59.940
62°07.571 56°48.873
62°04.929 57°33.605
63°01.419 60°30.349
63°09.555 61°29.199
63°43.499 61°23.590
64°34.175 62°30.304
63°16.342 59°16.437
60°55.090 54°49.269
61°28.013 54°59.928
62°07.868 56°48.040
62°04.719 57°32.120
62°36.083 58°51.766
63°01.644 60°30.714
64°33.897 62°30.772
63°43.604 61°23.426
63°09.288 61°29.888
63°16.722 59°16.204"
coord <- read.table(text = txt, header = TRUE, stringsAsFactors = F)
# change the degree symbol to a space
lat = gsub('°',' ', coord$Lat)
long = gsub('°',' ', coord$Long)
# convert from decimal minutes to decimal degrees
lat = measurements::conv_unit(lat, from = 'deg_dec_min', to = 'dec_deg')
long = measurements::conv_unit(long, from = 'deg_dec_min', to = 'dec_deg')
yields...
> cbind(long, lat)
long lat
[1,] "62.6013333333333" "58.8823333333333"
[2,] "61.467" "54.999"
[3,] "62.1261833333333" "56.81455"
[4,] "62.08215" "57.5600833333333"
[5,] "63.02365" "60.5058166666667"
[6,] "63.15925" "61.48665"
[7,] "63.7249833333333" "61.3931666666667"
[8,] "64.5695833333333" "62.5050666666667"
[9,] "63.2723666666667" "59.27395"
[10,] "60.9181666666667" "54.82115"
[11,] "61.4668833333333" "54.9988"
[12,] "62.1311333333333" "56.8006666666667"
[13,] "62.07865" "57.5353333333333"
[14,] "62.6013833333333" "58.8627666666667"
[15,] "63.0274" "60.5119"
[16,] "64.56495" "62.5128666666667"
[17,] "63.7267333333333" "61.3904333333333"
[18,] "63.1548" "61.4981333333333"
[19,] "63.2787" "59.2700666666667"
I would like to produce a Shiny app that asks for two addresses, maps an efficient route, and calculates the total distance of the route. This can be done with the Leaflet Routing Machine JavaScript library, but I would like to do a bunch of further calculations with the distance of the route and have it all embedded in a Shiny app.
You can produce the map using rMaps by following this demo by Ramnathv here. But I'm not able to pull out the total distance travelled, even though I can see that it has been calculated in the legend or controller. There is another discussion on how to do this with the JavaScript library - see here. They discuss using this JavaScript code:
alert('Distance: ' + routes[0].summary.totalDistance);
Here is my working code for the rMap. If anyone has any ideas for how to pull out the total distance of a route and store it, I would be very grateful. Thank you!
# INSTALL DEPENDENCIES IF YOU HAVEN'T ALREADY DONE SO
library(devtools)
install_github("ramnathv/rCharts#dev")
install_github("ramnathv/rMaps")
# CREATE FUNCTION to convert address to coordinates
library(RCurl)
library(RJSONIO)
construct.geocode.url <- function(address, return.call = "json", sensor = "false") {
  root <- "http://maps.google.com/maps/api/geocode/"
  u <- paste(root, return.call, "?address=", address, "&sensor=", sensor, sep = "")
  return(URLencode(u))
}
gGeoCode <- function(address, verbose=FALSE) {
  if(verbose) cat(address, "\n")
  u <- construct.geocode.url(address)
  doc <- getURL(u)
  x <- fromJSON(doc)
  if(x$status == "OK") {
    lat <- x$results[[1]]$geometry$location$lat
    lng <- x$results[[1]]$geometry$location$lng
    return(c(lat, lng))
  } else {
    return(c(NA, NA))
  }
}
# GET COORDINATES
x <- gGeoCode("Vancouver, BC")
way1 <- gGeoCode("645 East Hastings Street, Vancouver, BC")
way2 <- gGeoCode("2095 Commercial Drive, Vancouver, BC")
# PRODUCE MAP
library(rMaps)
map = Leaflet$new()
map$setView(c(x[1], x[2]), 16)
map$tileLayer(provider = 'Stamen.TonerLite')
mywaypoints = list(c(way1[1], way1[2]), c(way2[1], way2[2]))
map$addAssets(
css = "http://www.liedman.net/leaflet-routing-machine/dist/leaflet-routing-machine.css",
jshead = "http://www.liedman.net/leaflet-routing-machine/dist/leaflet-routing-machine.js"
)
routingTemplate = "
<script>
var mywaypoints = %s
L.Routing.control({
waypoints: [
L.latLng.apply(null, mywaypoints[0]),
L.latLng.apply(null, mywaypoints[1])
]
}).addTo(map);
</script>"
map$setTemplate(
afterScript = sprintf(routingTemplate, RJSONIO::toJSON(mywaypoints))
)
# map$set(width = 800, height = 800)
map
You can easily create a route via the Google Maps API, using ggmap's route() function. The returned data frame will have distance info; just sum up the legs for the total distance.
library(ggmap)
x <- gGeoCode("Vancouver, BC")
way1txt <- "645 East Hastings Street, Vancouver, BC"
way2txt <- "2095 Commercial Drive, Vancouver, BC"
route_df <- route(way1txt, way2txt, structure = 'route')
dist<-sum(route_df[,1],na.rm=T) # total distance in meters
#
qmap(c(x[2],x[1]), zoom = 12) +
geom_path(aes(x = lon, y = lat), colour = 'red', size = 1.5, data = route_df, lineend = 'round')