How to properly use SQL/Hive variables in the new Databricks Connect - PySpark

I'm testing the new Databricks Connect. I often use SQL variables in my Python scripts on Databricks, but I'm not able to use those variables through dbconnect. The example below works fine in Databricks but not in dbconnect:
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
import pandas as pd
spark = SparkSession.builder.getOrCreate()
sqlContext = SQLContext(spark)
df = spark.createDataFrame(pd.DataFrame({'a':[2,5,8], 'b':[3,5,5]}))
df.createOrReplaceTempView('test_view')
sqlContext.sql("set a_value = 2")
sqlContext.sql("select * from test_view where a = ${a_value}")
In dbconnect I receive the following:
---------------------------------------------------------------------------
ParseException Traceback (most recent call last)
<ipython-input-50-404f4c5b017c> in <module>
10
11 sqlContext.sql("set a_value = 2")
---> 12 sqlContext.sql("select * from test_view where a = ${a_value}")
c:\users\pc\miniconda3\lib\site-packages\pyspark\sql\context.py in sql(self, sqlQuery)
369 [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
370 """
--> 371 return self.sparkSession.sql(sqlQuery)
372
373 @since(1.0)
c:\users\pc\miniconda3\lib\site-packages\pyspark\sql\session.py in sql(self, sqlQuery)
702 [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
703 """
--> 704 return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
705
706 @since(2.0)
c:\users\pc\miniconda3\lib\site-packages\py4j\java_gateway.py in __call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
1304 return_value = get_return_value(
-> 1305 answer, self.gateway_client, self.target_id, self.name)
1306
1307 for temp_arg in temp_args:
c:\users\pc\miniconda3\lib\site-packages\pyspark\sql\utils.py in deco(*a, **kw)
132 # Hide where the exception came from that shows a non-Pythonic
133 # JVM exception message.
--> 134 raise_from(converted)
135 else:
136 raise
c:\users\pc\miniconda3\lib\site-packages\pyspark\sql\utils.py in raise_from(e)
ParseException:
mismatched input '<EOF>' expecting {'(', 'COLLECT', 'CONVERT', 'DELTA', 'HISTORY', 'MATCHED', 'MERGE', 'OPTIMIZE', 'SAMPLE', 'TIMESTAMP', 'UPDATE', 'VERSION', 'ZORDER', 'ADD', 'AFTER', 'ALL', 'ALTER', 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLONE', 'CLUSTER', 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 'CONSTRAINT', 'COPY', 'COPY_OPTIONS', 'COST', 'CREATE', 'CREDENTIALS', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', DATABASES, 'DAY', 'DBPROPERTIES', 'DEEP', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 'DROP', 'ELSE', 'ENCRYPTION', 'END', 'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 'FIELDS', 'FILTER', 'FILEFORMAT', 'FILES', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 'FORMAT', 'FORMAT_OPTIONS', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GLOBAL', 'GRANT', 'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'IF', 'IGNORE', 'IMPORT', 'IN', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 'MINUTE', 'MONTH', 'MSCK', 'NAMESPACE', 'NAMESPACES', 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 'OVERLAY', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PATTERN', 'PERCENT', 'PIVOT', 'PLACING', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 'PROPERTIES', 'PURGE', 'QUERY', 
'RANGE', 'RECORDREADER', 'RECORDWRITER', 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SECOND', 'SELECT', 'SEMI', 'SEPARATED', 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHALLOW', 'SHOW', 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'TABLE', 'TABLES', 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TO', 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', 'TRUE', 'TRUNCATE', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'USE', 'USER', 'USING', 'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'YEAR', '+', '-', '*', 'DIV', '~', STRING, BIGINT_LITERAL, SMALLINT_LITERAL, TINYINT_LITERAL, INTEGER_VALUE, EXPONENT_VALUE, DECIMAL_VALUE, DOUBLE_LITERAL, BIGDECIMAL_LITERAL, IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 34)
== SQL ==
select * from test_view where a =
----------------------------------^^^
So, has anyone managed to make these variables work?
Thanks

You can pass parameters/arguments to your SQL statements by building the SQL string programmatically in Scala/Python and passing it to sqlContext.sql(string). For example, in Python:
a_value = 2
sqlContext.sql(f"select * from test_view where a = {a_value}").show()
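Building the SQL text on the client means nothing depends on ${...} substitution, which is what fails in the traceback above (the placeholder ends up replaced by an empty string). Since values are then spliced straight into SQL, a small helper that renders literals and rejects unexpected identifiers is worth having. A minimal sketch, assuming scalar values, simple unquoted identifiers and Spark's default string-literal parsing (backslash escapes enabled); `sql_literal` and `select_where_equals` are hypothetical helpers, not PySpark APIs.

```python
import math
import re

# Hypothetical helpers (not part of PySpark): build the final SQL text on the
# client so nothing relies on ${...} substitution happening on the cluster.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def sql_literal(value):
    """Render a Python scalar as a Spark SQL literal (None/bool/int/float/str only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, float) and not math.isfinite(value):
        raise ValueError("NaN/inf have no plain SQL literal form")
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Assumes Spark's default parsing, where backslash escapes are processed.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def select_where_equals(view, column, value):
    """Return 'SELECT * FROM <view> WHERE <column> = <literal>' after validating names."""
    for name in (view, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a simple identifier: {name!r}")
    return f"SELECT * FROM {view} WHERE {column} = {sql_literal(value)}"


query = select_where_equals("test_view", "a", 2)
# With a live (dbconnect) session: spark.sql(query).show()
```

The finished string is what gets sent, so the same call should behave the same in a notebook and through dbconnect.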


How to use the @app.callback decorator with two dropdowns as inputs?

Hello everyone!
I have been trying to create an interactive dashboard in Python using the @app.callback decorator with two inputs. My dataset layout can be summarized into 4 main columns (screenshot: https://i.stack.imgur.com/boMKt.png).
I'd like Geography and Time Period to appear as dropdowns (i.e. using dcc.Dropdown).
The first dropdown will filter the dataset by Geography, and the second one will select the time period (MAT, L12W or L4W) within that country. In other words, the second dropdown depends on the first one.
I am familiar with both dcc.Dropdown and @app.callback, but I can't find a script that combines the two. Important note: the desired output is a pie chart showing each manufacturer's (column 2) value share (column 4) for the selected Geography and time period. I suspect the problem lies in the app.layout structure, but I have tried everything and the code still won't work.
You will find the code I have written so far below. The important part is from "#DESIGN APP LAYOUT" onwards.
I'd really appreciate a quick response. Thanks in advance for the help!
import dash
from dash import html
from dash import dcc
from dash import no_update
from dash.dependencies import Input, Output, State
import plotly.express as px
import pandas as pd
pd.options.display.max_columns = None
pd.options.display.max_rows = None
pd.options.display.width = None
data = pd.read_csv(r'C:\Users\Sara.Munoz\OneDrive - Unilever\Documents\Sarita.csv',
                   encoding="ISO-8859-1")
df = data
print(df.head())
cols = df.columns
print(cols)
###RE-ARRANGE DATASET###
df = pd.melt(df, id_vars=['Geography Node Name', 'Geography Id', 'Geography Level',
                          'Category Node Name', 'Category Id', 'Category Level',
                          'Global Manufacturer Name', 'Global Manufacturer Id',
                          'Brand Position Type', 'Brand Position Name', 'Brand Position Id',
                          'Local Brand Name', 'Local Brand Id', 'Measure',
                          'Currency or Unit of Measure', 'Latest Available Date'],
             value_vars=['MAT', 'L12W', 'L4W'], var_name='Period', value_name='Data')
for col in df.columns:
    print(col)
###CLEAN DATASET###
df.rename(columns = {'Geography Node Name':'Geography','Category Node Name':'Category',
'Global Manufacturer Name':'Manufacturer','Geography Level':'GLevel'},inplace = True)
df.drop(["Geography Id", "Category Id","Global Manufacturer Id","Brand Position Type",
"Brand Position Name","Brand Position Id","Local Brand Name","Local Brand Id","Latest Available Date",
"Currency or Unit of Measure"], axis = 1, inplace=True)
print("SEE BELOW NEW DATASET")
print(df.head())
#####FOR VALUE SHARE
print("FOR VALUE SHARE")
df2 = df.loc[df['GLevel'] == 5]
df2 = df2.loc[df2['Measure'] == 'Value Share']
df2 = df2.loc[df2['Category'] == 'Toothpaste']
df2 = df2[df2.Manufacturer != 'ALL MANUFACTURERS']
df2 = df2[df2.Category != 'Oral Care']
df2.drop(["GLevel", "Category","Category Level"], axis = 1, inplace=True)
print(df2.head())
#####FOR VOLUME SHARE
print("FOR VOLUME SHARE")
df3 = df.loc[df['GLevel'] == 5]
df3 = df3.loc[df3['Measure'] == 'Volume Share']
df3 = df3.loc[df3['Category'] == 'Toothpaste']
df3 = df3[df3.Manufacturer != 'ALL MANUFACTURERS']
df3 = df3[df3.Category != 'Oral Care']
df3.drop(["GLevel", "Category","Category Level"], axis = 1, inplace=True)
df3=df3.sort_values(['Geography', 'Period'],ascending = [True, True])
df3 = pd.DataFrame(df3)
df3=df3[['Geography','Period','Manufacturer','Measure','Data']]
print(df3)
###############################################################################
app = dash.Dash(__name__)
app.layout = html.Div(
[
dcc.Dropdown(
id="dropdown-1",
options=[
{'label': 'Indonesia', 'value': 'Indonesia'},
{'label': 'France', 'value': 'France'},
{'label': 'Vietnam', 'value': 'Vietnam'},
{'label': 'Chile', 'value': 'Chile'},
{'label': 'United Arab Emirates', 'value': 'United Arab Emirates'},
{'label': 'Morocco', 'value': 'Morocco'},
{'label': 'Russian Federation', 'value': 'Russian Federation'},
{'label': 'China', 'value': 'China'},
{'label': 'Greece', 'value': 'Greece'},
{'label': 'Netherlands', 'value': 'Netherlands'},
{'label': 'Austria', 'value': 'Austria'},
{'label': 'Germany', 'value': 'Germany'},
{'label': 'Switzerland', 'value': 'Switzerland'},
{'label': 'Italy', 'value': 'Italy'},
{'label': 'Denmark', 'value': 'Denmark'},
{'label': 'Norway', 'value': 'Norway'},
{'label': 'Sweden', 'value': 'Sweden'}
],
multi=True,
),
dcc.Dropdown(
id="dropdown-2",
options=[
{'label': 'MAT', 'value': 'MAT'},
{'label': 'L12W', 'value': 'L12W'},
{'label': 'L4W', 'value': 'L4W'}
],
multi=True,
),
html.Div(id="plot1", children=[])
], style={'display': 'flex'})
@app.callback(
    Output("plot1", "children"),
    [Input("dropdown-1", "value"), Input("dropdown-2", "value")],
    prevent_initial_call=True
)
def get_graph(entered_Geo, entered_Period):
    fd = df2[(df3['Geography']==entered_Geo) &
             (df3['Period']==entered_Period)]
    g1 = fd.groupby(['Manufacturer'], as_index=False). \
        mean()
    g1 = g1
    plot1 = px.pie(g1, values='Data', names='Manufacturer', title="Value MS")
    return [dcc.Graph(figure=plot1)]
if __name__ == '__main__':
    app.run_server()
#DESIGN APP LAYOUT##############################################################################
app.layout = html.Div([
html.Label("Geography:",style={'fontSize':30, 'textAlign':'center'}),
dcc.Dropdown(
id='dropdown1',
options=[{'label': s, 'value': s} for s in sorted(df3.Geography.unique())],
value=None,
clearable=False
),
html.Label("Period:", style={'fontSize':30, 'textAlign':'center'}),
dcc.Dropdown(id='dropdown2',
options=[],
value=[],
multi=False),
html.Div([
html.Div([ ], id='plot1'),
html.Div([ ], id='plot2')
], style={'display': 'flex'}),
])
##############
# Populate the Period dropdown with options and values
@app.callback(
    Output('dropdown2', 'options'),
    Output('dropdown2', 'value'),
    Input('dropdown1', 'value'),
)
def set_period_options(chosen_Geo):
    dff = df3[df3.Geography==chosen_Geo]
    Periods = [{'label': s, 'value': s} for s in df3.Period.unique()]
    values_selected = [x['value'] for x in Periods]
    return Periods, values_selected
# Create graph component and populate with pie chart
@app.callback([Output(component_id='plot1', component_property='children'),
               Output(component_id='plot2', component_property='children')],
              Input('dropdown2', 'value'),
              Input('dropdown1', 'value'),
              prevent_initial_call=True
)
def update_graph(selected_Period, selected_Geo):
    if len(selected_Period) == 0:
        return no_update
    else:
        #Volume Share
        dff3 = df3[(df3.Geography==selected_Geo) & (df3.Period==selected_Period)]
        #Value Share
        dff2 = df2[(df2.Geography==selected_Geo) & (df2.Period==selected_Period)]
        #####
        fig1 = px.pie(dff2, values='Data', names='Manufacturer', title=" Value MS")
        fig2 = px.pie(dff3, values='Data', names='Manufacturer', title=" Volume MS")
        return [dcc.Graph(figure=fig1),
                dcc.Graph(figure=fig2)]
if __name__ == '__main__':
    app.run_server()
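One way to wire two dependent dropdowns is a chain of callbacks: the Geography dropdown's value sets the Period dropdown's options, and the graph callback takes both dropdown values as Inputs. The sketch below assumes a long-format frame like df2 or df3 (columns Geography, Period, Manufacturer, Data), single-select dropdowns and a Dash 2.x-or-later release; the component ids and the helpers `period_options`, `share_by_manufacturer` and `build_app` are hypothetical. The filtering lives in plain pandas functions so it can be checked without starting a server.

```python
import pandas as pd


def period_options(df: pd.DataFrame, geo: str) -> list:
    """Options for the Period dropdown: only periods that exist for `geo`."""
    periods = sorted(df.loc[df["Geography"] == geo, "Period"].unique())
    return [{"label": p, "value": p} for p in periods]


def share_by_manufacturer(df: pd.DataFrame, geo: str, period: str) -> pd.DataFrame:
    """Mean `Data` per manufacturer for one geography/period (one pie slice each)."""
    mask = (df["Geography"] == geo) & (df["Period"] == period)
    return df.loc[mask].groupby("Manufacturer", as_index=False)["Data"].mean()


def build_app(df: pd.DataFrame):
    """Wire the helpers into a Dash app with chained callbacks."""
    # Imported here so the helpers above can be used without Dash/Plotly installed.
    import plotly.express as px
    from dash import Dash, Input, Output, dcc, html, no_update

    app = Dash(__name__)
    app.layout = html.Div([
        html.Label("Geography:"),
        dcc.Dropdown(
            id="geo-dropdown",
            options=[{"label": g, "value": g} for g in sorted(df["Geography"].unique())],
            clearable=False,
        ),
        html.Label("Period:"),
        dcc.Dropdown(id="period-dropdown", options=[], clearable=False),
        dcc.Graph(id="share-pie"),
    ])

    # First link in the chain: the chosen geography decides which periods are offered.
    @app.callback(
        Output("period-dropdown", "options"),
        Output("period-dropdown", "value"),
        Input("geo-dropdown", "value"),
    )
    def update_periods(geo):
        if geo is None:
            return [], None
        options = period_options(df, geo)
        # Preselect the first available period so the pie renders immediately.
        return options, (options[0]["value"] if options else None)

    # Second link: the pie listens to BOTH dropdowns.
    @app.callback(
        Output("share-pie", "figure"),
        Input("geo-dropdown", "value"),
        Input("period-dropdown", "value"),
    )
    def update_pie(geo, period):
        if geo is None or period is None:
            return no_update
        data = share_by_manufacturer(df, geo, period)
        return px.pie(data, values="Data", names="Manufacturer", title=f"{geo}, {period}")

    return app
```

For example, build_app(df2).run(debug=True) would serve the value-share pie (older Dash releases use app.run_server instead). Averaging in share_by_manufacturer mirrors the groupby(...).mean() in the question; because it selects the Data column before aggregating, the text columns are not fed to mean().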

Using dblink extension from pg-promise

I have a PostgreSQL database with dblink extension.
I can use dblink without issues from pgAdmin but not from my nodeJS code using pg-promise.
I have checked that I am on the correct schema and database.
Running SELECT * FROM pg_extension; from my code does show that dblink is installed. However, running a query that uses dblink results in: error: function dblink(unknown, unknown) does not exist
Is there something I should do to make dblink work in this scenario?
This is a basic example of my code:
const query1 = 'SELECT * FROM pg_extension;'
const query2 = `Select *
  FROM dblink('host=XXX user=XXX password=XXXX dbname=XXXX',
              'select name from example_table')
  AS t(name text);`
db.any(query1).then(function(results) {
  console.log('Query1 result:', results)
}).catch(function(err) {
  console.log(`Error in query1 ${err}`)
})
db.any(query2).then(function(results) {
  console.log('Query2 result:', results)
}).catch(function(err) {
  console.log(`Error in query2 ${err}`)
})
Result:
Query1 result: [
{
extname: 'plpgsql',
extowner: 10,
extnamespace: 11,
extrelocatable: false,
extversion: '1.0',
extconfig: null,
extcondition: null
},
{
extname: 'dblink',
extowner: 10,
extnamespace: 2200,
extrelocatable: true,
extversion: '1.2',
extconfig: null,
extcondition: null
},
{
extname: 'timescaledb',
extowner: 10,
extnamespace: 24523,
extrelocatable: false,
extversion: '1.7.1',
extconfig: [
25044, 25042, 25068, 25083,
25081, 25102, 25100, 25118,
25116, 25139, 25155, 25157,
25173, 25175, 25193, 25210,
25246, 25254, 25283, 25293,
25303, 25307, 25324, 25343,
25358, 25472, 25478, 25475
],
extcondition: [
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'WHERE id >= 1000',
'',
'',
"WHERE key='exported_uuid'",
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
''
]
}
]
Error in query2 error: function dblink(unknown, unknown) does not exist
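This looks exactly like what I get when dblink is installed in the "public" schema but "public" is not in the session's search_path. Your query1 output is consistent with that: dblink's extnamespace is 2200, which is normally the OID of the built-in public schema. Either schema-qualify the call (public.dblink(...)) or make sure public is on the search_path of the connection pg-promise uses (you can check it with SHOW search_path;).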

OpenStreetMap using OSMnx: how to get building height?

I am trying to find a way to extract building heights. Here is what I've tried so far:
import osmnx as ox
place_name = "Uptown, Dallas, Texas"
buildings = ox.geometries_from_place(place_name, tags={'building':True})
print(buildings.columns)
outputs:
Index(['amenity', 'geometry', 'nodes', 'addr:housenumber', 'addr:street',
'building', 'building:levels', 'height', 'name', 'office', 'wikidata',
'wikipedia', 'parking', 'addr:city', 'addr:postcode', 'addr:state',
'layer', 'cuisine', 'access', 'addr:country', 'brand', 'brand:wikidata',
'brand:wikipedia', 'opening_hours', 'operator', 'operator:wikidata',
'operator:wikipedia', 'phone', 'ref:walmart', 'shop', 'website',
'wheelchair', 'beds', 'emergency', 'gnis:feature_id', 'healthcare',
'old_name', 'ele', 'gnis:county_name', 'gnis:import_uuid',
'gnis:reviewed', 'source', 'fee', 'smoking', 'roof:levels',
'roof:shape', 'addr:unit', 'tourism', 'short_name', 'contact:website',
'outdoor_seating', 'ways', 'type'],
dtype='object')
The height column is NaN for most buildings. The closest attribute is building:levels, but that is only the number of storeys in each building.
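Since `height` tags are sparse, one common workaround is to use `height` where it is tagged and fall back to `building:levels` multiplied by an assumed storey height otherwise. A minimal sketch, assuming 3 m per level (an arbitrary assumption, not an OSM value) and that `height` values are plain metres such as "12" or "12.5 m"; `parse_height_m` and `estimate_height_m` are hypothetical helpers. It should work on the frame returned above because a GeoDataFrame is a pandas DataFrame subclass.

```python
import re

import pandas as pd

# Assumption: average storey height used when only `building:levels` is tagged.
METERS_PER_LEVEL = 3.0

# Accepts plain metre values such as "12", "12.5" or "12.5 m"; anything else
# (feet, ranges, free text) is treated as unknown.
_HEIGHT_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:m)?\s*$")


def parse_height_m(value) -> float:
    """Parse an OSM `height` tag value to metres, or NaN if missing/unparseable."""
    if pd.isna(value):
        return float("nan")
    match = _HEIGHT_RE.match(str(value))
    return float(match.group(1)) if match else float("nan")


def estimate_height_m(buildings: pd.DataFrame) -> pd.Series:
    """Prefer the tagged height; otherwise fall back to levels * METERS_PER_LEVEL."""
    tagged = buildings["height"].map(parse_height_m)
    levels = pd.to_numeric(buildings["building:levels"], errors="coerce")
    return tagged.fillna(levels * METERS_PER_LEVEL)
```

Usage: buildings["height_est_m"] = estimate_height_m(buildings). Rows with neither tag stay NaN rather than being guessed.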

Spark SQL error : org.apache.spark.sql.catalyst.parser.ParseException: extraneous input '$' expecting

I am building a query with a StringBuilder, like below:
println(dataQuery)
Execution started at 2019-10-31 02:58:24.006019 PST
res245: String =
" SELECT transaction_created_date, txn_mth, txn_mth_id, breakout_y_n, cast($counter as Int) AS arrival_days, cast(date_sub(date_add(transaction_created_date,$counter),day(transaction_created_date)) as String) as Arrival_date,trim(cast(getDayOfWeek(cast(date_sub(date_add(transaction_created_date,$counter),day(transaction_created_date)) as String)) as String)) as weekday,cast(ceil($counter/7)as Int) as week_no, sum(if(arrival_day_base<=$counter,gross,0)) as GROSS, sum(if(arrival_day_base<=$counter,nbc,0)) as NBC, sum(if(arrival_day_base<=$counter,nbr,0)) as NBR, sum(if(arrival_day_base<=$counter,dp,0)) as DP, sum(if(arrival_day_base==$counter,gross,0)) as DAYGROSS, sum(if(arrival_day_base==$counter,nbc,0)) as DAYNBC, sum(if(arrival_day_base==$counter,nbr,0)) as DAYNBR, , sum(if(arrival_day_base==$counter,dp,0)) as DAYDP,
FROM BASE_DLV
GROUP BY transaction_created_date, txn_mth, txn_mth_id, breakout_y_n, arrival_days, arrival_date, weekday, week_no
When executing it with val data3 = spark.sql(dataQuery), I get the error below:
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input '$' expecting {'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'FIRST', 'AFTER', 'LAST', 'ROW', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'DIRECTORY', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'COST', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IGNORE', 'BOTH', 'LEADING', 'TRAILING', 'IF', 'POSITION', 'DIV', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', 'GLOBAL', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 
'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 74)
== SQL ==
SELECT transaction_created_date, txn_mth, txn_mth_id, breakout_y_n, cast($counter as Int) AS arrival_days, cast(date_sub(date_add(transaction_created_date,$counter),day(transaction_created_date)) as String) as Arrival_date,trim(cast(getDayOfWeek(cast(date_sub(date_add(transaction_created_date,$counter),day(transaction_created_date)) as String)) as String)) as weekday,cast(ceil($counter/7)as Int) as week_no, sum(if(arrival_day_base<=$counter,gross,0)) as GROSS, sum(if(arrival_day_base<=$counter,nbc,0)) as NBC, sum(if(arrival_day_base<=$counter,nbr,0)) as NBR, sum(if(arrival_day_base<=$counter,dp,0)) as DP, sum(if(arrival_day_base==$counter,gross,0)) as DAYGROSS, sum(if(arrival_day_base==$counter,nbc,0)) as DAYNBC, sum(if(arrival_day_base==$counter,nbr,0)) as DAYNBR, sum(if(arrival_day_base==$counter,dp,0)) as DAYDP
--------------------------------------------------------------------------^^^
FROM BASE_DLV
GROUP BY transaction_created_date, txn_mth, txn_mth_id, breakout_y_n, arrival_days, arrival_date, weekday, week_no
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:239)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:115)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:641)
... 71 elided
I also tried running the same query directly:
val data2 =spark.sql(s"""SELECT transaction_created_date, txn_mth, txn_mth_id, breakout_y_n,
cast($counter as Int) AS arrival_days,
cast(date_sub(date_add(transaction_created_date,$counter),day(transaction_created_date)) as String) as Arrival_date,
trim(cast(getDayOfWeek(cast(date_sub(date_add(transaction_created_date,$counter),day(transaction_created_date)) as String)) as String)) as weekday,
cast(ceil($counter/7)as Int) as week_no,
sum(if(arrival_day_base<=$counter,gross,0)) as GROSS,
sum(if(arrival_day_base<=$counter,nbc,0)) as NBC,
sum(if(arrival_day_base<=$counter,nbr,0)) as NBR,
sum(if(arrival_day_base<=$counter,dp,0)) as DP,
sum(if(arrival_day_base==$counter,gross,0)) as DAYGROSS,
sum(if(arrival_day_base==$counter,nbc,0)) as DAYNBC,
sum(if(arrival_day_base==$counter,nbr,0)) as DAYNBR,
sum(if(arrival_day_base==$counter,dp,0)) as DAYDP
FROM BASE_DLV
GROUP BY transaction_created_date, txn_mth, txn_mth_id, breakout_y_n, arrival_days, arrival_date, weekday, week_no""")
and it executes successfully:
Execution started at 2019-10-31 02:51:32.451289 PST
data2: org.apache.spark.sql.DataFrame = [transaction_created_date: string, txn_mth: string ... 14 more fields]
Execution completed at 2019-10-31 02:51:34.532190 PST in 2.08 s
but I get the same parse error when trying:
val data3 = spark.sql(s"""$dataQuery""")
Can anyone help me use the StringBuilder query in spark.sql() without this issue?
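dataQuery must have counter defined and evaluated at the point where the string is built. Scala's s interpolator only substitutes $name inside the literal it prefixes, so s"""$dataQuery""" inserts the contents of dataQuery as they are, and the literal $counter text still reaches the SQL parser. Build dataQuery itself with the s interpolator, for example: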
val counter = 10
val dataQuery = s"select $counter as cnt" //gives select 10 as cnt
spark.sql(s"$dataQuery").show()
This shows:
+---+
|cnt|
+---+
| 10|
+---+
I think what you are noticing is that in Scala, multi-line SQL statements need triple quotes (""") around them.
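The same pitfall exists in PySpark whenever the query text with $counter placeholders is assembled at runtime: the placeholders have to be substituted explicitly before the string is passed to spark.sql(). A minimal sketch using Python's string.Template, which replaces $name placeholders in a runtime string; the query is a shortened, hypothetical version of the one above, and `render_query` is a hypothetical helper.

```python
from string import Template

# Shortened, hypothetical version of the StringBuilder query: the text contains
# literal "$counter" placeholders, like the string that reached spark.sql().
QUERY_TEMPLATE = Template(
    "SELECT transaction_created_date, "
    "cast($counter as Int) AS arrival_days, "
    "cast(ceil($counter/7) as Int) AS week_no, "
    "sum(if(arrival_day_base<=$counter, gross, 0)) AS GROSS "
    "FROM BASE_DLV "
    "GROUP BY transaction_created_date, arrival_days, week_no"
)


def render_query(counter: int) -> str:
    """Replace every $counter placeholder before the SQL reaches the parser."""
    if not isinstance(counter, int):
        raise TypeError("counter must be an int")
    # substitute() raises KeyError if any $name placeholder is left without a value.
    return QUERY_TEMPLATE.substitute(counter=counter)


sql = render_query(10)
# With a live session: spark.sql(sql)
```

Using substitute() rather than safe_substitute() is deliberate: a forgotten placeholder fails loudly in Python instead of surfacing later as a Spark ParseException.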

Allauth/facebook won't send/accept https

I've created an app, deployed it on DigitalOcean and enabled HTTPS with certbot. When I try to log in with Facebook I get an error, and the redirect URL that is sent is not HTTPS.
My code:
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'django.contrib.sites',
'social_django',
'allauth',
'allauth.account',
'allauth.socialaccount',
'allauth.socialaccount.providers.facebook',
]
SOCIALACCOUNT_PROVIDERS = {
'facebook': {
'METHOD': 'oauth2',
'SCOPE': ['email', 'public_profile'],
'AUTH_PARAMS': {'auth_type': 'reauthenticate'},
'INIT_PARAMS': {'cookie': True},
'FIELDS': [
'id',
'email',
'name',
'first_name',
'last_name',
'verified',
'locale',
'timezone',
'link',
'gender',
'updated_time',
],
'EXCHANGE_TOKEN': True,
'LOCALE_FUNC': 'path.to.callable',
'VERIFIED_EMAIL': False,
'VERSION': 'v2.12',
}
}
SOCIAL_AUTH_REDIRECT_IS_HTTPS = True
AUTHENTICATION_BACKENDS = (
'django.contrib.auth.backends.ModelBackend',
# 'social_core.backends.facebook.FacebookOAuth2',
'allauth.account.auth_backends.AuthenticationBackend',
)
Login button (template):
{% load socialaccount %}
Facebook OAuth2
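Try adding the following to your settings:
ACCOUNT_DEFAULT_HTTP_PROTOCOL = 'https'
(see this issue on the allauth GitHub: https://github.com/pennersr/django-allauth/issues/1994)
Note that SOCIAL_AUTH_REDIRECT_IS_HTTPS, which you already set, is a python-social-auth (social_django) setting; allauth does not read it, so it has no effect on the URLs allauth generates.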
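A sketch of the HTTPS-related lines in settings.py. ACCOUNT_DEFAULT_HTTP_PROTOCOL is the setting from the answer above. SECURE_PROXY_SSL_HEADER is a standard Django setting, but including it here rests on an assumption not shown in the question: that Django runs behind a reverse proxy (e.g. nginx, as is common with certbot on DigitalOcean) that terminates TLS and always sets X-Forwarded-Proto itself, for instance via proxy_set_header X-Forwarded-Proto $scheme;. If the proxy does not overwrite that header, clients could spoof it, so leave the setting out in that case.

```python
# settings.py (fragment): HTTPS-related settings for allauth behind a TLS proxy.

# Protocol allauth should use when it builds absolute URLs such as the
# Facebook OAuth callback.
ACCOUNT_DEFAULT_HTTP_PROTOCOL = "https"

# Assumption: a reverse proxy (e.g. nginx) terminates TLS and always sets
# X-Forwarded-Proto itself. Only then is it safe for Django to trust this
# header when deciding whether request.is_secure() is True.
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")
```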