Hello everyone!
I have been trying to create an interactive dashboard in Python using the @app.callback decorator with two inputs. My dataset layout can be summarized into 4 main columns (screenshot: https://i.stack.imgur.com/boMKt.png).
I'd like Geography and Time Period to appear as dropdowns (i.e. using the dcc.Dropdown component).
The first dropdown will filter the dataset by Geography, and the second one will define the time period (MAT, L12W or L4W) within that country. In other words, the second dropdown has to be chained to the first one.
I am familiar with both dcc.Dropdown and @app.callback, but I can't find a script that combines them. Important note: the desired output is a pie chart showing the Manufacturers' (column 2) value share (column 4) for the selected Geography and time period. I suspect the problem lies in the app.layout structure. However, I have tried everything and the code won't work.
You will find the code I have so far below. The important bit is from "#DESIGN APP LAYOUT" onwards.
I'd really appreciate a quick response. Thanks in advance for the help!
```python
import dash
from dash import html
from dash import dcc
from dash import no_update
from dash.dependencies import Input, Output, State
import plotly.express as px
import pandas as pd

pd.options.display.max_columns = None
pd.options.display.max_rows = None
pd.options.display.width = None

data = pd.read_csv(r'C:\Users\Sara.Munoz\OneDrive - Unilever\Documents\Sarita.csv',
                   encoding="ISO-8859-1",
                   )
df = data
print(df.head())
cols = df.columns
print(cols)

###RE-ARRANGE DATASET###
df = pd.melt(df, id_vars=['Geography Node Name', 'Geography Id', 'Geography Level',
                          'Category Node Name', 'Category Id', 'Category Level',
                          'Global Manufacturer Name', 'Global Manufacturer Id',
                          'Brand Position Type', 'Brand Position Name', 'Brand Position Id',
                          'Local Brand Name', 'Local Brand Id', 'Measure',
                          'Currency or Unit of Measure', 'Latest Available Date'],
             value_vars=['MAT', 'L12W', 'L4W'], var_name='Period', value_name='Data')
for col in df.columns:
    print(col)

###CLEAN DATASET###
df.rename(columns={'Geography Node Name': 'Geography', 'Category Node Name': 'Category',
                   'Global Manufacturer Name': 'Manufacturer', 'Geography Level': 'GLevel'},
          inplace=True)
df.drop(["Geography Id", "Category Id", "Global Manufacturer Id", "Brand Position Type",
         "Brand Position Name", "Brand Position Id", "Local Brand Name", "Local Brand Id",
         "Latest Available Date", "Currency or Unit of Measure"], axis=1, inplace=True)
print("SEE BELOW NEW DATASET")
print(df.head())

#####FOR VALUE SHARE
print("FOR VALUE SHARE")
df2 = df.loc[df['GLevel'] == 5]
df2 = df2.loc[df2['Measure'] == 'Value Share']
df2 = df2.loc[df2['Category'] == 'Toothpaste']
df2 = df2[df2.Manufacturer != 'ALL MANUFACTURERS']
df2 = df2[df2.Category != 'Oral Care']
df2.drop(["GLevel", "Category", "Category Level"], axis=1, inplace=True)
print(df2.head())

#####FOR VOLUME SHARE
print("FOR VOLUME SHARE")
df3 = df.loc[df['GLevel'] == 5]
df3 = df3.loc[df3['Measure'] == 'Volume Share']
df3 = df3.loc[df3['Category'] == 'Toothpaste']
df3 = df3[df3.Manufacturer != 'ALL MANUFACTURERS']
df3 = df3[df3.Category != 'Oral Care']
df3.drop(["GLevel", "Category", "Category Level"], axis=1, inplace=True)
df3 = df3.sort_values(['Geography', 'Period'], ascending=[True, True])
df3 = pd.DataFrame(df3)
df3 = df3[['Geography', 'Period', 'Manufacturer', 'Measure', 'Data']]
print(df3)

###############################################################################
app = dash.Dash(__name__)
app.layout = html.Div(
    [
        dcc.Dropdown(
            id="dropdown-1",
            options=[
                {'label': 'Indonesia', 'value': 'Indonesia'},
                {'label': 'France', 'value': 'France'},
                {'label': 'Vietnam', 'value': 'Vietnam'},
                {'label': 'Chile', 'value': 'Chile'},
                {'label': 'United Arab Emirates', 'value': 'United Arab Emirates'},
                {'label': 'Morocco', 'value': 'Morocco'},
                {'label': 'Russian Federation', 'value': 'Russian Federation'},
                {'label': 'China', 'value': 'China'},
                {'label': 'Greece', 'value': 'Greece'},
                {'label': 'Netherlands', 'value': 'Netherlands'},
                {'label': 'Austria', 'value': 'Austria'},
                {'label': 'Germany', 'value': 'Germany'},
                {'label': 'Switzerland', 'value': 'Switzerland'},
                {'label': 'Italy', 'value': 'Italy'},
                {'label': 'Denmark', 'value': 'Denmark'},
                {'label': 'Norway', 'value': 'Norway'},
                {'label': 'Sweden', 'value': 'Sweden'}
            ],
            multi=True,
        ),
        dcc.Dropdown(
            id="dropdown-2",
            options=[
                {'label': 'MAT', 'value': 'MAT'},
                {'label': 'L12W', 'value': 'L12W'},
                {'label': 'L4W', 'value': 'L4W'}
            ],
            multi=True,
        ),
        # children given once (passing it both positionally and as a keyword raises a TypeError)
        html.Div(id="plot1", children=[])
    ], style={'display': 'flex'})


@app.callback(
    Output("plot1", "children"),
    [Input("dropdown-1", "value"), Input("dropdown-2", "value")],
    prevent_initial_call=True
)
def get_graph(entered_Geo, entered_Period):
    # nothing to plot until both dropdowns have a selection
    if not entered_Geo or not entered_Period:
        return no_update
    # multi=True dropdowns return lists, so filter with isin() instead of ==;
    # the masks must come from the same frame that is filtered (df2, value share)
    fd = df2[(df2['Geography'].isin(entered_Geo)) &
             (df2['Period'].isin(entered_Period))]
    # average only the numeric share column per manufacturer
    g1 = fd.groupby('Manufacturer', as_index=False)['Data'].mean()
    plot1 = px.pie(g1, values='Data', names='Manufacturer', title="Value MS")
    return [dcc.Graph(figure=plot1)]


if __name__ == '__main__':
    # NOTE: run_server() blocks, so the code below never runs when the script is executed directly
    app.run_server()

#DESIGN APP LAYOUT##############################################################################
app.layout = html.Div([
    html.Label("Geography:", style={'fontSize': 30, 'textAlign': 'center'}),
    dcc.Dropdown(
        id='dropdown1',
        options=[{'label': s, 'value': s} for s in sorted(df3.Geography.unique())],
        value=None,
        clearable=False
    ),
    html.Label("Period:", style={'fontSize': 30, 'textAlign': 'center'}),
    dcc.Dropdown(id='dropdown2',
                 options=[],
                 value=None,  # single-select (multi=False), so no list here
                 multi=False),
    html.Div([
        html.Div([], id='plot1'),
        html.Div([], id='plot2')
    ], style={'display': 'flex'}),
])

##############
# Populate the Period dropdown with options and values
@app.callback(
    Output('dropdown2', 'options'),
    Output('dropdown2', 'value'),
    Input('dropdown1', 'value'),
)
def set_period_options(chosen_Geo):
    # offer only the periods that exist for the chosen geography
    dff = df3[df3.Geography == chosen_Geo]
    Periods = [{'label': s, 'value': s} for s in dff.Period.unique()]
    # dropdown2 is single-select, so pre-select one value rather than a list
    value_selected = Periods[0]['value'] if Periods else None
    return Periods, value_selected


# Create graph component and populate with pie chart
@app.callback([Output(component_id='plot1', component_property='children'),
               Output(component_id='plot2', component_property='children')],
              Input('dropdown2', 'value'),
              Input('dropdown1', 'value'),
              prevent_initial_call=True
              )
def update_graph(selected_Period, selected_Geo):
    if not selected_Period:
        # one no_update per output
        return no_update, no_update
    else:
        #Volume Share
        dff3 = df3[(df3.Geography == selected_Geo) & (df3.Period == selected_Period)]
        #Value Share
        dff2 = df2[(df2.Geography == selected_Geo) & (df2.Period == selected_Period)]
        #####
        fig1 = px.pie(dff2, values='Data', names='Manufacturer', title=" Value MS")
        fig2 = px.pie(dff3, values='Data', names='Manufacturer', title=" Volume MS")
        return [dcc.Graph(figure=fig1),
                dcc.Graph(figure=fig2)]


if __name__ == '__main__':
    app.run_server()
```
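Stripped of Dash, the chaining comes down to two functions: one turns the chosen geography into the options for the period dropdown, and one turns the (geography, period) pair into the manufacturer shares the pie chart plots. Below is a stdlib-only sketch, assuming the data is in the long layout produced by pd.melt above; the rows and the function names are hypothetical. In the app, the first would sit behind the dropdown2 options callback and the second behind the pie-chart callback.

```python
from collections import defaultdict

# Hypothetical rows in the melted (long) layout: one share value per row
ROWS = [
    {"Geography": "France", "Period": "MAT", "Manufacturer": "A", "Data": 40.0},
    {"Geography": "France", "Period": "MAT", "Manufacturer": "B", "Data": 60.0},
    {"Geography": "France", "Period": "L4W", "Manufacturer": "A", "Data": 55.0},
    {"Geography": "France", "Period": "L4W", "Manufacturer": "B", "Data": 45.0},
    {"Geography": "Chile", "Period": "MAT", "Manufacturer": "A", "Data": 30.0},
    {"Geography": "Chile", "Period": "MAT", "Manufacturer": "B", "Data": 70.0},
]


def period_options(rows, geography):
    """Dropdown 1 -> dropdown 2: periods available for one geography, in first-seen order."""
    periods = []
    for row in rows:
        if row["Geography"] == geography and row["Period"] not in periods:
            periods.append(row["Period"])
    return [{"label": p, "value": p} for p in periods]


def pie_slices(rows, geography, period):
    """Both dropdowns -> pie: mean share per manufacturer for one geography/period pair."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row["Geography"] == geography and row["Period"] == period:
            totals[row["Manufacturer"]] += row["Data"]
            counts[row["Manufacturer"]] += 1
    return {m: totals[m] / counts[m] for m in totals}


print(period_options(ROWS, "Chile"))      # [{'label': 'MAT', 'value': 'MAT'}]
print(pie_slices(ROWS, "France", "L4W"))  # {'A': 55.0, 'B': 45.0}
```

Keeping this logic in plain functions makes it testable without starting a server; the callbacks then only pass dropdown values in and wrap the result in dropdown options or a figure.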
Hello, I have a JSON object that contains several values (payload, key, card, ...). I want to get the data I need directly via the key "payload".
var dataFinal= tag.data.toString();
This is what I get when I print my data:
[log] handle {nfca: {identifier: [12, 4, 18, 17], atqa: [4, 0], maxTransceiveLength: 253, sak: 8, timeout: 618}, mifareclassic: {identifier: [99, 4, 150, 17], blockCount: 64, maxTransceiveLength: 253, sectorCount: 16, size: 1024, timeout: 618, type: 0}, ndef: {identifier: [99, 4, 150, 17], isWritable: true, maxSize: 716, canMakeReadOnly: false, cachedMessage: {records: [{typeNameFormat: 1, type: [84], identifier: [], payload: [1,45,989]}]}, type: com.nxp.ndef.mifareclassic}}
How can I get the payload value?
You can check out the function jsonDecode(), which expects a string as a parameter and returns dynamic (a Map in your case):
import 'dart:convert';
Map<String,dynamic> data = jsonDecode(tag.data.toString());
print(data["nfca"]);
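In the printed structure, the payload sits under ndef, then cachedMessage, then each entry of records, then payload. Note that the printed text has unquoted keys, so it is not valid JSON; it looks like the toString() output of a Dart map, in which case the same key path could be applied to tag.data directly instead of decoding a string. The key path is sketched below in Python on a hand-transcribed excerpt of that structure; record_payloads is a hypothetical helper, not part of any NFC library.

```python
def record_payloads(tag_data):
    """Collect the payload of every NDEF record, or [] if there is no cached NDEF message."""
    ndef = tag_data.get("ndef") or {}
    message = ndef.get("cachedMessage") or {}
    return [record["payload"] for record in message.get("records", [])]


# Excerpt of the structure printed in the question (nfca/mifareclassic omitted)
tag_data = {
    "ndef": {
        "isWritable": True,
        "maxSize": 716,
        "cachedMessage": {
            "records": [
                {"typeNameFormat": 1, "type": [84], "identifier": [], "payload": [1, 45, 989]},
            ]
        },
    }
}

print(record_payloads(tag_data))  # [[1, 45, 989]]
```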
I have a project where I need to update multiple rows at once. I found an example of how to do this in the documentation.
I used a ColumnSet because the documentation recommends it, and I declared the key column as ?feature_id so it is only used in the WHERE clause.
My code generates the following error: error: column "created_on" is of type timestamp with time zone but expression is of type text. I have looked at the query that is being generated, and it seems to be in line with the example.
The code also has an INSERT statement for new features, which seems to work fine. The error is only thrown by the UPDATE query.
try {
  const insertValues = [];
  const updateValues = [];
  for (let i = 0; i < features.length; i += 1) {
    const feature = features[i];
    if (!excistingFeaturesIds.includes(feature.id)) {
      insertValues.push({
        plot_id: plotId,
        type: feature.type,
        area: feature.area,
        created_on: currendDate,
        updated_on: currendDate,
        geo_feature: feature.geoFeature,
      });
    } else {
      updateValues.push({
        feature_id: feature.id,
        plot_id: plotId,
        type: feature.type,
        area: feature.area,
        created_on: currendDate,
        updated_on: currendDate,
        geo_feature: feature.geoFeature,
      });
    }
  }
  const insertColumnSet = new pgp.helpers.ColumnSet(['plot_id', 'type', 'area', 'created_on', 'updated_on', 'geo_feature'], { table: 'features' });
  const updateColumnSet = new pgp.helpers.ColumnSet(['?feature_id', 'plot_id', 'type', 'area', 'created_on', 'updated_on', 'geo_feature'], { table: 'features' });
  if (insertValues && insertValues.length > 0) {
    const insertQuery = pgp.helpers.insert(
      insertValues, insertColumnSet,
    );
    await promiseDB.none(insertQuery);
  }
  if (updateValues && updateValues.length > 0) {
    const updateQuery = `${pgp.helpers.update(
      updateValues, updateColumnSet,
    )} WHERE v.feature_id = t.feature_id`;
    console.log(updateQuery);
    await promiseDB.none(updateQuery);
  }
  return res.status(201).json({
    message: 'Features added!',
  });
} catch (err) {
  console.log(err);
  return res.status(400).send(err);
}
The generated queries:
UPDATE
"features" AS t
SET
"plot_id" = v. "plot_id",
"type" = v. "type",
"area" = v. "area",
"created_on" = v. "created_on",
"updated_on" = v. "updated_on",
"geo_feature" = v. "geo_feature"
FROM (
values(1, 3, 'roof', 342.01520314642977, '2021-07-20T09:56:10.007+02:00', '2021-07-20T09:56:10.007+02:00', '{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[coords...]]]},"properties":{"UIDN":6338864,"OIDN":5290477,"VERSIE":1,"BEGINDATUM":"2015-09-23","VERSDATUM":"2015-09-23","TYPE":1,"LBLTYPE":"hoofdgebouw","OPNDATUM":"2015-08-25","BGNINV":5,"LBLBGNINV":"kadastralisatie","type":"roof","tools":"polygon","description":"D1","id":5290477,"area":342.01520314642977,"roofType":"saddle","roofGreen":"normal","database":true},"id":1}'),
(2,
3,
'roof',
181.00725895629216,
'2021-07-20T09:56:10.007+02:00',
'2021-07-20T09:56:10.007+02:00',
'{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[coords...]]]},"properties":{"UIDN":6338518,"OIDN":5290131,"VERSIE":1,"BEGINDATUM":"2015-09-23","VERSDATUM":"2015-09-23","TYPE":1,"LBLTYPE":"hoofdgebouw","OPNDATUM":"2015-08-25","BGNINV":5,"LBLBGNINV":"kadastralisatie","type":"roof","tools":"polygon","description":"D2","id":5290131,"area":181.00725895629216,"roofType":"flat","roofGreen":"normal","database":true},"id":2}'),
(3,
3,
'roof',
24.450163203958745,
'2021-07-20T09:56:10.007+02:00',
'2021-07-20T09:56:10.007+02:00',
'{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[coords...]]]},"properties":{"UIDN":5473377,"OIDN":4708120,"VERSIE":1,"BEGINDATUM":"2014-07-04","VERSDATUM":"2014-07-04","TYPE":2,"LBLTYPE":"bijgebouw","OPNDATUM":"2014-05-27","BGNINV":4,"LBLBGNINV":"bijhouding binnengebieden","type":"roof","tools":"polygon","description":"D3","id":4708120,"area":24.450163203958745,"roofType":"saddle","roofGreen":"normal","database":true},"id":3}'),
(4,
3,
'water',
57.65676046589426,
'2021-07-20T09:56:10.007+02:00',
'2021-07-20T09:56:10.007+02:00',
'{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[coords...]]]},"properties":{"UIDN":473256,"OIDN":199890,"VERSIE":2,"BEGINDATUM":"2017-03-08","VERSDATUM":"2021-05-06","VHAG":-9,"NAAM":"nvt","OPNDATUM":"2017-01-30","BGNINV":4,"LBLBGNINV":"bijhouding binnengebieden","type":"water","tools":"polygon","description":"W1","id":199890,"area":57.65676046589426,"waterType":"natural","database":true},"id":4}'))
AS v ("feature_id",
"plot_id",
"type",
"area",
"created_on",
"updated_on",
"geo_feature")
WHERE
v.feature_id = t.feature_id
INSERT INTO "features" ("plot_id", "type", "area", "created_on", "updated_on", "geo_feature")
values(3, 'roof', 342.01520314642977, '2021-07-20T10:17:04.565+02:00', '2021-07-20T10:17:04.565+02:00', '{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[coords...]]]},"properties":{"UIDN":6338864,"OIDN":5290477,"VERSIE":1,"BEGINDATUM":"2015-09-23","VERSDATUM":"2015-09-23","TYPE":1,"LBLTYPE":"hoofdgebouw","OPNDATUM":"2015-08-25","BGNINV":5,"LBLBGNINV":"kadastralisatie","type":"roof","tools":"polygon","description":"D1","id":5290477,"area":342.01520314642977,"roofType":"saddle","roofGreen":"normal","database":true},"id":1}'),
(3, 'roof', 181.00725895629216, '2021-07-20T10:17:04.565+02:00', '2021-07-20T10:17:04.565+02:00', '{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[coords...]]]},"properties":{"UIDN":6338518,"OIDN":5290131,"VERSIE":1,"BEGINDATUM":"2015-09-23","VERSDATUM":"2015-09-23","TYPE":1,"LBLTYPE":"hoofdgebouw","OPNDATUM":"2015-08-25","BGNINV":5,"LBLBGNINV":"kadastralisatie","type":"roof","tools":"polygon","description":"D2","id":5290131,"area":181.00725895629216,"roofType":"flat","roofGreen":"normal","database":true},"id":2}'),
(3, 'roof', 24.450163203958745, '2021-07-20T10:17:04.565+02:00', '2021-07-20T10:17:04.565+02:00', '{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[coords...]]]},"properties":{"UIDN":5473377,"OIDN":4708120,"VERSIE":1,"BEGINDATUM":"2014-07-04","VERSDATUM":"2014-07-04","TYPE":2,"LBLTYPE":"bijgebouw","OPNDATUM":"2014-05-27","BGNINV":4,"LBLBGNINV":"bijhouding binnengebieden","type":"roof","tools":"polygon","description":"D3","id":4708120,"area":24.450163203958745,"roofType":"saddle","roofGreen":"normal","database":true},"id":3}'),
(3, 'water', 57.65676046589426, '2021-07-20T10:17:04.565+02:00', '2021-07-20T10:17:04.565+02:00', '{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[coords...]]]},"properties":{"UIDN":473256,"OIDN":199890,"VERSIE":2,"BEGINDATUM":"2017-03-08","VERSDATUM":"2021-05-06","VHAG":-9,"NAAM":"nvt","OPNDATUM":"2017-01-30","BGNINV":4,"LBLBGNINV":"bijhouding binnengebieden","type":"water","tools":"polygon","description":"W1","id":199890,"area":57.65676046589426,"waterType":"natural","database":true},"id":4}')
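Your columns created_on and updated_on need SQL type casting, hence the error. In the multi-row UPDATE, the new values arrive through a VALUES list, where PostgreSQL types the quoted timestamps as text, and text is not implicitly converted to timestamp with time zone on assignment.
There is also no need to re-create the same list of table columns.
Altogether, your column sets can be created like this: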
const insertColumnSet = new pgp.helpers.ColumnSet([
'plot_id',
'type',
'area',
{name: 'created_on', cast: 'timestamptz'},
{name: 'updated_on', cast: 'timestamptz'},
'geo_feature'
], { table: 'features' });
const updateColumnSet = insertColumnSet.extend(['?feature_id']);
See the full Column syntax, plus the extend method.
UPDATE
Note that, strictly speaking, only the UPDATE needs the type casting, while your INSERT can infer the types automatically. You can reflect that in the column sets like this:
const insertColumnSet = new pgp.helpers.ColumnSet(['plot_id', 'type', 'area',
'created_on', 'updated_on', 'geo_feature'], { table: 'features' });
const updateColumnSet = insertColumnSet.merge([
{name: 'feature_id', cnd: true}, // or just '?feature_id'
{name: 'created_on', cast: 'timestamptz'},
{name: 'updated_on', cast: 'timestamptz'}
]);
Both approaches will work fine in your case though ;)
See the merge method.
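To make it concrete where the cast has to go, here is a Python sketch that builds the same shape of statement as the generated query above (UPDATE ... FROM (VALUES ...) AS v ... WHERE v.key = t.key), with a cast on selected columns inside the VALUES rows. It illustrates the SQL only and is not pg-promise's implementation; the function name and the psycopg2-style %s placeholders are assumptions. Table and column names are interpolated without escaping, so they must come from trusted code, never from user input.

```python
def multi_row_update_sql(table, key, columns, n_rows):
    """Build a multi-row UPDATE ... FROM (VALUES ...) with psycopg2-style %s placeholders.

    columns: list of (name, cast) pairs, where cast is a PostgreSQL type name or None.
    key: column used only to match rows in WHERE (the role of '?feature_id').
    """
    all_columns = [(key, None)] + list(columns)
    # Casting inside VALUES gives the derived column v.<name> the right type;
    # a quoted timestamp string left uncast would be typed as text there.
    row = "(" + ", ".join("%s" + (f"::{cast}" if cast else "") for _, cast in all_columns) + ")"
    values = ", ".join([row] * n_rows)
    names = ", ".join(f'"{name}"' for name, _ in all_columns)
    assignments = ", ".join(f'"{name}" = v."{name}"' for name, _ in columns)
    return (f'UPDATE "{table}" AS t SET {assignments} '
            f'FROM (VALUES {values}) AS v ({names}) '
            f'WHERE v."{key}" = t."{key}"')


sql = multi_row_update_sql(
    "features", "feature_id",
    [("area", None), ("created_on", "timestamptz"), ("updated_on", "timestamptz")],
    n_rows=2,
)
print(sql)
```

The INSERT does not need this because its VALUES feed the target columns directly, so each quoted literal is coerced straight to the column's type.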
I'm using the student data set from:
https://archive.ics.uci.edu/ml/machine-learning-databases/00320/
If I scale the features in the pipeline, the scaled column loses most of the metadata, which I need later. Below is the basic setup without scaling, which keeps the metadata; the scaling steps are commented out for easy replication.
I'm selecting the numeric and categorical columns I want to use for the model. Here is my data setup and pipeline:
# load data
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('student-performance').getOrCreate()
df_raw = spark.read.options(delimiter=';', header=True, inferSchema=True).csv('student-mat.csv')
# specify columns and filter
cols_cate = ['school', 'sex', 'Pstatus', 'Mjob', 'Fjob', 'famsup', 'activities', 'higher', 'internet', 'romantic']
cols_num = ['age', 'Medu', 'Fedu', 'studytime', 'failures', 'famrel', 'goout', 'Dalc', 'Walc', 'health', 'absences', 'G1', 'G2']
col_label = ['G3']
keep = cols_cate + cols_num + col_label
df_keep = df_raw.select(keep)
# setup pipeline
from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler, MinMaxScaler
cols_assembly = []
stages = []
for col in cols_cate:
    string_index = StringIndexer(inputCol=col, outputCol=col+'-indexed')
    encoder = OneHotEncoder(inputCol=string_index.getOutputCol(), outputCol=col+'-encoded')
    cols_assembly.append(encoder.getOutputCol())
    stages += [string_index, encoder]
# assemble vectors
assembler_input = cols_assembly + cols_num
assembler = VectorAssembler(inputCols=assembler_input, outputCol='features')
stages += [assembler]
# MinMaxScaler option - will need to change 'features' -> 'scaled-features' later
#scaler = MinMaxScaler(inputCol='features', outputCol='scaled-features')
#stages += [scaler]
# apply pipeline
from pyspark.ml import Pipeline
pipeline = Pipeline(stages=stages)
pipelineModel = pipeline.fit(df_keep)
df_pipe = pipelineModel.transform(df_keep)
cols_selected = ['features'] + cols_cate + cols_num + ['G3']
df_pipe = df_pipe.select(cols_selected)
Split the data, fit a model, and get predictions:
from pyspark.ml.regression import LinearRegression
train, test = df_pipe.randomSplit([0.7, 0.3], seed=14)
lr = LinearRegression(featuresCol='features',labelCol='G3', maxIter=10, regParam=0.3, elasticNetParam=0.8)
lrModel = lr.fit(train)
lr_preds = lrModel.transform(test)
The metadata of the "features" column contains a lot of information:
lr_preds.schema['features'].metadata
Output:
{'ml_attr': {'attrs': {'numeric': [{'idx': 16, 'name': 'age'},
{'idx': 17, 'name': 'Medu'},
{'idx': 18, 'name': 'Fedu'},
{'idx': 19, 'name': 'studytime'},
{'idx': 20, 'name': 'failures'},
{'idx': 21, 'name': 'famrel'},
{'idx': 22, 'name': 'goout'},
{'idx': 23, 'name': 'Dalc'},
{'idx': 24, 'name': 'Walc'},
{'idx': 25, 'name': 'health'},
{'idx': 26, 'name': 'absences'},
{'idx': 27, 'name': 'G1'},
{'idx': 28, 'name': 'G2'}],
'binary': [{'idx': 0, 'name': 'school-encoded_GP'},
{'idx': 1, 'name': 'sex-encoded_F'},
{'idx': 2, 'name': 'Pstatus-encoded_T'},
{'idx': 3, 'name': 'Mjob-encoded_other'},
{'idx': 4, 'name': 'Mjob-encoded_services'},
{'idx': 5, 'name': 'Mjob-encoded_at_home'},
{'idx': 6, 'name': 'Mjob-encoded_teacher'},
{'idx': 7, 'name': 'Fjob-encoded_other'},
{'idx': 8, 'name': 'Fjob-encoded_services'},
{'idx': 9, 'name': 'Fjob-encoded_teacher'},
{'idx': 10, 'name': 'Fjob-encoded_at_home'},
{'idx': 11, 'name': 'famsup-encoded_yes'},
{'idx': 12, 'name': 'activities-encoded_yes'},
{'idx': 13, 'name': 'higher-encoded_yes'},
{'idx': 14, 'name': 'internet-encoded_yes'},
{'idx': 15, 'name': 'romantic-encoded_no'}]},
'num_attrs': 29}}
If I add scaling after the VectorAssembler (commented-out above) in the pipeline, retrain, and make predictions again, it loses all of this metadata.
lr_preds.schema['scaled-features'].metadata
Output:
{'ml_attr': {'num_attrs': 29}}
Is there any way to get this metadata back? Thanks in advance!
mck's suggestion of using 'features' from lr_preds works to get the metadata; it's unchanged. Thank you.
The column features should remain in the dataframe lr_preds; maybe you can get the metadata from that column instead?
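Once the metadata is read from the unscaled 'features' column, the attribute groups ('numeric', 'binary', ...) can be flattened into one list of names ordered by vector index, e.g. to label model coefficients. A sketch on a plain dict shaped like the metadata output above; feature_names is a hypothetical helper, not a Spark API.

```python
def feature_names(metadata):
    """Names of the assembled vector's slots, ordered by index, from Spark ML 'ml_attr' metadata.

    Flattens every attribute group present under ml_attr.attrs.
    """
    groups = metadata["ml_attr"]["attrs"]
    indexed = [(attr["idx"], attr["name"]) for group in groups.values() for attr in group]
    return [name for _, name in sorted(indexed)]


# Small excerpt shaped like the metadata output above
meta = {"ml_attr": {"attrs": {
    "numeric": [{"idx": 2, "name": "age"}],
    "binary": [{"idx": 0, "name": "school-encoded_GP"}, {"idx": 1, "name": "sex-encoded_F"}],
}, "num_attrs": 3}}

print(feature_names(meta))  # ['school-encoded_GP', 'sex-encoded_F', 'age']
```

With the real objects, something like dict(zip(feature_names(lr_preds.schema['features'].metadata), lrModel.coefficients.toArray())) would pair names with coefficients. Because MinMaxScaler rescales each vector position independently and keeps the vector length and order, the same names should also line up with a model trained on 'scaled-features'.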
I have the sample collection below:
samplecol:
{
key1: 'value1',
key2: 'value2',
key3: 'value3',
key4: 'value4',
key5: 'value5',
key6: 'value6',
key7: 'value7',
key8: 'value8',
key9: 'value9',
key10: 'value10',
key11: 'value11',
key12: 'value12',
key13: 'value13',
key14: 'value14',
}
I want to retrieve only 'value14'. For that, I can write a query like:
db.samplecol.find(
{},
{key1: 0, key2: 0, key3: 0, key4: 0, key5: 0, key6: 0, key7: 0, key8: 0, key9: 0, key10: 0, key11: 0, key12: 0, key13: 0}
);
To retrieve only key14, I have to set every other key to 0. With 10 or 20 fields I can manage that, but what should I do if I have hundreds of fields?
Is there an easier way, something like
db.samplecol.find({}, {key14: only}) ?
It is simply the following:
db.samplecol.find({}, {key14: 1})
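In that case only key14 will be fetched, together with _id, which MongoDB returns by default; add _id: 0 to the projection to exclude it as well.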
Here is a useful tutorial about Projection on mongodb site: https://docs.mongodb.com/v3.2/tutorial/project-fields-from-query-results/#return-the-specified-fields-and-the-id-field-only
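The same inclusion projection from Python, as a sketch assuming PyMongo, whose Collection.find takes a filter and a projection. fetch_field is a hypothetical helper, and the database name "mydb" in the usage line is also hypothetical.

```python
def fetch_field(collection, field):
    """Return the value of `field` from every document, fetching only that field.

    `collection` is assumed to be a PyMongo Collection (or anything with the same
    find(filter, projection) call). {field: 1} includes just that field; "_id": 0
    drops the _id field that MongoDB would otherwise return by default.
    """
    cursor = collection.find({}, {field: 1, "_id": 0})
    return [doc[field] for doc in cursor if field in doc]
```

Usage, assuming a running server: fetch_field(MongoClient()["mydb"]["samplecol"], "key14"). Listing only the wanted field keeps the projection the same size no matter how many other fields the documents have.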