Changing the size and symbol of scatter chart plot points in ScalaFX - scala

I want to make a linear regression program which visualizes the data to user. I'm using EJML for calculations and ScalaFX for front end. Everything is going fine but when I plot the data using Scatter Chart, the line drawn from the data is set to be rectangles which cover up the original data points. I would like to know how I can change the size, shape and transparency etc. of the plotted points.
Almost all of guides around JavaFX say that I should modify the CSS file (which doesn't automatically exist) in order to style my chart. I don't know how to do that in ScalaFX or even that is it possible to do that way. My result of searching every possible tutorial has been fruitless.
import scalafx.application.JFXApp
import scalafx.scene.Scene
import scalafx.scene.chart.ScatterChart
import scalafx.collections.ObservableBuffer
import scalafx.scene.chart.NumberAxis
import scalafx.scene.chart.XYChart
import scalafx.scene.shape.Line
import org.ejml.simple.SimpleMatrix
import scala.math.pow
import scala.collection.mutable.Buffer
object Plotting extends JFXApp {
/*
* Below are some arbitrary x and y values for a regression line
*/
val xValues = Array(Array(1.0, 1.0, 1.0, 1.0, 1.0, 1.0), Array(14.0, 19.0, 22.0, 26.0, 31.0, 43.0))
val yValues = Array(Array(51.0, 57.0, 66.0, 71.0, 72.0, 84.0))
val temp = yValues.flatten
val wrapper = xValues(1).zip(temp)
/*
* In the lines before stage what happens is that matrices for the x and y values are created, coefficients
* for the regression line are calculated with matrix operations and (x, y) points are calculated for the
* regression line.
*/
val X = new SimpleMatrix(xValues).transpose
val Y = new SimpleMatrix(yValues).transpose
val secondX = new SimpleMatrix(xValues(0).size, 2)
for (i <- 0 until xValues(0).size) {
secondX.set(i, 0, xValues(0)(i))
secondX.set(i, 1, xValues(1)(i))
}
val invertedSecondX = secondX.pseudoInverse()
val B = invertedSecondX.mult(Y)
val graphPoints = Buffer[(Double, Double)]()
for (i <- 0 to xValues(1).max.toInt) {
graphPoints.append((i.toDouble, B.get(0, 0) + i * B.get(1, 0)))
}
stage = new JFXApp.PrimaryStage {
title = "Demo"
scene = new Scene(400, 400) {
val xAxis = NumberAxis()
val yAxis = NumberAxis()
val pData = XYChart.Series[Number, Number](
"Data",
ObservableBuffer(wrapper.map(z => XYChart.Data[Number, Number](z._1, z._2)): _*))
val graph = XYChart.Series[Number, Number](
"RegressionLine",
ObservableBuffer(graphPoints.map(z => XYChart.Data[Number, Number](z._1, z._2)): _*))
val plot = new ScatterChart(xAxis, yAxis, ObservableBuffer(graph, pData))
root = plot
}
}
}

This certainly isn't as well documented as it might be... :-(
Stylesheets are typically placed in your project's resource directory. If you're using SBT (recommended), this would be src/main/resources.
In this example, I've added a stylesheet called MyCharts.css to this directory with the following contents:
/* Blue semi-transparent 4-pointed star, using SVG path. */
.default-color0.chart-symbol {
-fx-background-color: blue;
-fx-scale-shape: true;
-fx-shape: "M 0.0 10.0 L 3.0 3.0 L 10.0 0.0 L 3.0 -3.0 L 0.0 -10.0 L -3.0 -3.0 L -10.0 0.0 L -3.0 3.0 Z ";
-fx-opacity: 0.5;
}
/* Default shape is a rectangle. Here, we round it to become a red circle with a white
* center. Change the radius to control the size.
*/
.default-color1.chart-symbol {
-fx-background-color: red, white;
-fx-background-insets: 0, 2;
-fx-background-radius: 3px;
-fx-padding: 3px;
}
color0 will be used for the first data series (the regression line), color1 for the second (your scatter data). All other series use the default, JavaFX style.
(For more information on using scalable vector graphics (SVG) paths to define custom shapes, refer to the relevant section of the SVG specification.)
To have this stylesheet used by ScalaFX (JavaFX), you have a choice of options. To have them apply globally, add it to the main scene (which is what I've done below). Alternatively, if each chart needs a different style, you can add different stylesheets to specific charts. (BTW, I also added the standard includes import as this prevents many JavaFX-ScalaFX element conversion issues; otherwise, I've made no changes to your sources.)
import scalafx.Includes._
import scalafx.application.JFXApp
import scalafx.scene.Scene
import scalafx.scene.chart.ScatterChart
import scalafx.collections.ObservableBuffer
import scalafx.scene.chart.NumberAxis
import scalafx.scene.chart.XYChart
import scalafx.scene.shape.Line
import org.ejml.simple.SimpleMatrix
import scala.math.pow
import scala.collection.mutable.Buffer
object Plotting extends JFXApp {
/*
* Below are some arbitrary x and y values for a regression line
*/
val xValues = Array(Array(1.0, 1.0, 1.0, 1.0, 1.0, 1.0), Array(14.0, 19.0, 22.0, 26.0, 31.0, 43.0))
val yValues = Array(Array(51.0, 57.0, 66.0, 71.0, 72.0, 84.0))
val temp = yValues.flatten
val wrapper = xValues(1).zip(temp)
/*
* In the lines before stage what happens is that matrices for the x and y values are created, coefficients
* for the regression line are calculated with matrix operations and (x, y) points are calculated for the
* regression line.
*/
val X = new SimpleMatrix(xValues).transpose
val Y = new SimpleMatrix(yValues).transpose
val secondX = new SimpleMatrix(xValues(0).size, 2)
for (i <- 0 until xValues(0).size) {
secondX.set(i, 0, xValues(0)(i))
secondX.set(i, 1, xValues(1)(i))
}
val invertedSecondX = secondX.pseudoInverse()
val B = invertedSecondX.mult(Y)
val graphPoints = Buffer[(Double, Double)]()
for (i <- 0 to xValues(1).max.toInt) {
graphPoints.append((i.toDouble, B.get(0, 0) + i * B.get(1, 0)))
}
stage = new JFXApp.PrimaryStage {
title = "Demo"
scene = new Scene(400, 400) {
// Add our stylesheet.
stylesheets.add("MyCharts.css")
val xAxis = NumberAxis()
val yAxis = NumberAxis()
val pData = XYChart.Series[Number, Number](
"Data",
ObservableBuffer(wrapper.map(z => XYChart.Data[Number, Number](z._1, z._2)): _*))
val graph = XYChart.Series[Number, Number](
"RegressionLine",
ObservableBuffer(graphPoints.map(z => XYChart.Data[Number, Number](z._1, z._2)): _*))
val plot = new ScatterChart(xAxis, yAxis, ObservableBuffer(graph, pData))
root = plot
}
}
}
For further information in the CSS formatting options available (changing shapes, colors, transparency, etc.) refer to the JavaFX CSS Reference Guide.
The result looks like this:

I almost don't dare to add somethig to Mike Allens solution (wich is very good, as always), but this did not work out for me because I could not get my scala to find and/or process the .css file.
I would have done it this way if possible, but I just could not get it to work.
Here is what I came up with:
Suppose I have some data to display:
val xyExampleData: ObservableBuffer[(Double, Double)] = ObservableBuffer(Seq(
1 -> 1,
2 -> 4,
3 -> 9))
Then I convert this to a Series for the LineChart:
val DataPoints = ObservableBuffer(xyExampleData map { case (x, y) => XYChart.Data[Number, Number](x, y) })
val PointsToDisplay = XYChart.Series[Number, Number]("Points", DataPoints)
now I put this again into a Buffer, maybe with some other data from different series.
val lineChartBuffer = ObservableBuffer(PointsToDisplay, ...)
and finally I create my lineChart, wich I call (with lack of creativity) lineChart:
val lineChart = new LineChart(xAxis, yAxis, lineChartBuffer) {...}
The lines between data points can be recoloured now easily with:
lineChart.lookup(".default-color0.chart-series-line").setStyle("-fx-stroke: blue;")
This will change the Line-colour of the FIRST Dataset in the LineChartBuffer.
If you want to change the Line-Properties for the second you call
lineChart.lookup(".default-color1.chart-series-line")...
There is also "-fx-stroke-width: 3px;" to set the with of the line.
"-fx-opacity: 0.1;"
"-fx-stroke-dash-array: 10;"
-fx-fill: blue;"
are also usefull, but dont call the above line repeatedly, because the second call will override the first.
Instead concatenate all the strings into one:
lineChart.lookup(".default-color0.chart-series-line").setStyle("-fx-stroke: blue;-fx-opacity: 0.1;-fx-stroke-dash-array: 10;-fx-fill: blue;")
Now for the formatting of the Symbols at each data-Point:
unfortunately there seems to be no other way than to format each Symbol seperately:
lineChart.lookupAll(".default-color0.chart-line-symbol").asScala foreach { node => node.setStyle("-fx-background-color: blue, white;") }
for this to run you need import scala.collection.JavaConverters._
for the conversion from a java set to a scala set.
One can also make all data-poins from only one data-set invisible, for example:
lineChart.lookupAll(".default-color1.chart-line-symbol").asScala foreach { node => node.setVisible(false) }
To say this is a nice solution would be exaggerated.
And it has the big disadvantage, that you have to recolour or reformat every Symbol after adding a new Datapoint to one of the series in LineChartBuffer. If you don't, the new Symbols will have standard colours and settings.
The Lines stay, ones they are recoloured, I can't say why.
But the good side of it, one can always reformat curves in a Line Chart afterwards like this!

Related

Updating a Dash Callback using RadioItems

I am fairly new to python coding so I apologize in advance for my ignorance. I am trying to create a Dash App that drops outliers using standard deviation. The user selects a standard deviation using RadioItem inputs.
My question is what amendments do I need to make to my code so that the RadioItem value updates max_deviations using a callback?
Import packages, clean the data and define a query
import dash
import plotly.express as px
from dash import Dash, dcc, html, Input, Output, State
import pandas as pd
import numpy as np
app = dash.Dash(__name__)
server = app.server
df=pd.read_csv(r'C:\SVS_GIS\POWER BI\CSV_DATA\QSAS2021.csv', encoding='unicode_escape')
#SET DATE OF VALUATION
df['TIME'] = ((pd.to_datetime(df['Sale Date'], dayfirst=True)
.rsub(pd.to_datetime('01/10/2021', dayfirst=True))
.dt.days
)*-1)
df=df[df['TIME'] >= -365]
df = df.query("(SMA >=1 and SMA <= 3) and (LGA==60)")
prepare dataframe for dropping outliers
data = pd.DataFrame(data=df)
x = df.TIME
y = df.CHANGE
mean = np.mean(y)
standard_deviation = np.std(y)
distance_from_mean = abs(y - mean)
app layout
app.layout = html.Div([
html.Label("Standard Deviation Picker:", style={'fontSize':25, 'textAlign':'center'}),
html.Br(),
html.Label("1.0 = 68%, 2.0 = 95%, 3.0 = 99.7%", style={'fontSize':15,
'textAlign':'center'}),
html.Div(id="radio_items"),
dcc.RadioItems(
options=[{'label': i, 'value': i} for i in [1.0, 2.0, 3.0]],
value=2.0
),
html.Div([
dcc.Graph(id="the_graph")]
)])
callback
#app.callback(
Output("the_graph", "figure"),
Input("radio_items", 'value')
)
def update_graph(max_deviations):
not_outlier = distance_from_mean < max_deviations * standard_deviation
no_outliers = y[not_outlier]
trim_outliers = pd.DataFrame(data=no_outliers)
dff = pd.merge(trim_outliers, df, left_index=True, right_index=True)
return (dff)
fig = px.scatter(dff, x='TIME', y='CHANGE_y',
color ='SMA',
trendline='ols',
size='PV',
height=500,
width=800,
hover_name='SMA',
)
return dcc.Graph(id='the_graph', figure=fig)
if __name__ == '__main__':
app.run_server(debug=False)
Your dcc.RadioItems doesn't have an id prop. Add that, and make sure it matches the ID given in the callback, and you should be good.

plotly r sankey add_trace

i am reading the document https://plotly.com/r/reference/sankey/, and want to change the links color for a sankey chart. But i can't quite understand the parameters in add_trace() function
where should i specify the color value?
add_trace(p,type='sankey', color=????)
You haven't provided a minimal reproducible example, so I can't jump right into your code. But I think I can point you in the right direction.
In the documentation you screenshotted, it's saying that the color argument is one key of the list link that defines links in the plot. Using this example from the R plotly documentation for adding links, let's take a look at where that goes:
library(plotly)
library(rjson)
json_file <- "https://raw.githubusercontent.com/plotly/plotly.js/master/test/image/mocks/sankey_energy.json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))
fig <- plot_ly(
type = "sankey",
domain = list(
x = c(0,1),
y = c(0,1)
),
orientation = "h",
valueformat = ".0f",
valuesuffix = "TWh",
node = list(
label = json_data$data[[1]]$node$label,
color = json_data$data[[1]]$node$color,
pad = 15,
thickness = 15,
line = list(
color = "black",
width = 0.5
)
),
link = list(
source = json_data$data[[1]]$link$source,
target = json_data$data[[1]]$link$target,
value = json_data$data[[1]]$link$value,
label = json_data$data[[1]]$link$label,
#### Color goes here! ####
color = "yellow"
)
)
fig <- fig %>% layout(
title = "Energy forecast for 2050<br>Source: Department of Energy & Climate Change, Tom Counsell via <a href='https://bost.ocks.org/mike/sankey/'>Mike Bostock</a>",
font = list(
size = 10
),
xaxis = list(showgrid = F, zeroline = F),
yaxis = list(showgrid = F, zeroline = F)
)
fig
The plotly documentation can be a bit opaque at times. I have found it helpful to sometimes review the documentation for python. For example, this part of the python documentation does give some more guidance about changing link colors.

scipy.integrate.nquad ignoring opts?

I need to compute a numerical (triple) integral, but do not need very high precision on the value, and would therefore like to sacrifice some precision for speed when using nquad. I thought that I might be able to do this by increasing the epsrel and/or epsabs options, but they seem to have no effect. For example (note, this is just an example integrand - I don't actually need to compute this particular integral...):
import numpy as np
from scipy.integrate import nquad
def integrand(l, b, d, sigma=250):
x = d * np.cos(l) * np.cos(b)
y = d * np.sin(l) * np.cos(b)
z = d * np.sin(b)
return np.exp(-0.5 * z**2 / sigma**2) / np.sqrt(2*np.pi * sigma**2)
ranges = [
(0, 2*np.pi),
(0.5, np.pi/2),
(0, 1000.)
]
# No specification of `opts` - use the default epsrel and epsabs:
result1 = nquad(integrand, ranges=ranges, full_output=True)
# Set some `quad` opts:
result2 = nquad(integrand, ranges=ranges, full_output=True,
opts=dict(epsabs=1e-1, epsrel=0, limit=3))
Both outputs are identical:
>>> print(result1)
(4.252394424844468, 1.525272379143154e-12, {'neval': 9261})
>>> print(result2)
(4.252394424844468, 1.525272379143154e-12, {'neval': 9261})
A full example is included here: https://gist.github.com/adrn/b9aa92c236df011dbcdc131aa94ed9f9
Is this not the right approach, or is scipy.integrate ignoring my inputted opts?
From the scipy.integrate.nquad it is stated that opts can only be passed to quad as can be seen here:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.nquad.html
Example of application:
import numpy as np
from scipy.integrate import quad
def integrand(a, sigma=250):
x = 2 * np.sin(a) * np.cos(a)
return x
# No specification of `opts` - use the default epsrel and epsabs:
result1 = quad(integrand,0, 2*np.pi)
# Set some `quad` opts:
result2 = quad(integrand,0, 4*np.pi,epsabs=1e-6, epsrel=1e-6, limit=40)
returns:
result1: (-1.3690011097614755e-16, 4.4205541621600365e-14)
result2: (-1.7062635631484713e-15, 9.096805257467047e-14)
The reason nquad doesn't complain about the presence of options is because nquad includes quad, dbquad and tplquad.

Using the deep neural network package "Chainer" to train a simple dataset

I'm trying to use the chainer package for a large project I'm working on. I have read through the tutorial on their website which gives an example of applying it to the MNIST dataset, but it doesn't seem to scale easily to other examples, and there's simply not enough documentation otherwise.
Their example code is as follows:
class MLP(Chain):
def __init__(self, n_units, n_out):
super(MLP, self).__init__(
# the size of the inputs to each layer will be inferred
l1=L.Linear(None, n_units), # n_in -> n_units
l2=L.Linear(None, n_units), # n_units -> n_units
l3=L.Linear(None, n_out), # n_units -> n_out
)
def __call__(self, x):
h1 = F.relu(self.l1(x))
h2 = F.relu(self.l2(h1))
y = self.l3(h2)
return y
train, test = datasets.get_mnist()
train_iter = iterators.SerialIterator(train, batch_size=5, shuffle=True)
test_iter = iterators.SerialIterator(test, batch_size=2, repeat=False, shuffle=False)
model = L.Classifier(MLP(100, 10)) # the input size, 784, is inferred
optimizer = optimizers.SGD()
optimizer.setup(model)
updater = training.StandardUpdater(train_iter, optimizer)
trainer = training.Trainer(updater, (4, 'epoch'), out='result')
trainer.extend(extensions.Evaluator(test_iter, model))
trainer.extend(extensions.LogReport())
trainer.extend(extensions.PrintReport(['epoch', 'main/accuracy', 'validation/main/accuracy']))
trainer.extend(extensions.ProgressBar())
trainer.run()
Could someone point me in the direction of how to simple fit a straight line to a few data points in 2D? If I can understand a simple fit such as this I should be able to scale appropriately.
Thanks for the help!
I pasted simple regression modeling here.
You can use original train data and test data as tuple.
train = (data, label)
Here, data.shape = (Number of data, Number of data dimesion)
And, label.shape = (Number of data,)
Both of their data type should be numpy.float32.
import chainer
from chainer.functions import *
from chainer.links import *
from chainer.optimizers import *
from chainer import training
from chainer.training import extensions
from chainer import reporter
from chainer import datasets
import numpy
class MyNet(chainer.Chain):
def __init__(self):
super(MyNet, self).__init__(
l0=Linear(None, 30, nobias=True),
l1=Linear(None, 1, nobias=True),
)
def __call__(self, x, t):
l0 = self.l0(x)
f0 = relu(l0)
l1 = self.l1(f0)
f1 = flatten(l1)
self.loss = mean_squared_error(f1, t)
reporter.report({'loss': self.loss}, self)
return self.loss
def get_optimizer():
return Adam()
def training_main():
model = MyNet()
optimizer = get_optimizer()
optimizer.setup(model)
train, test = datasets.get_mnist(label_dtype=numpy.float32)
train_iter = chainer.iterators.SerialIterator(train, 50)
test_iter = chainer.iterators.SerialIterator(test, 50,
repeat=False,
shuffle=False)
updater = training.StandardUpdater(train_iter, optimizer)
trainer = training.Trainer(updater, (10, 'epoch'))
trainer.extend(extensions.ProgressBar())
trainer.extend(extensions.Evaluator(test_iter, model))
trainer.extend(
extensions.PlotReport(['main/loss', 'validation/main/loss'],
'epoch'))
trainer.run()
if __name__ == '__main__':
training_main()

Exclude items from training set data

I have my data in two colors and excluded_colors.
colors contains all colors
excluded_colors contains some colors that I wish to exclude from my trainingset.
I am trying to split the data into a training and testing set and ensure that the colors in excluded_colors are not in my training set but exist in the testing set.
In order to achieve the above, I did this
var colors = spark.sql("""
select colors.*
from colors
LEFT JOIN excluded_colors
ON excluded_colors.color_id = colors.color_id
where excluded_colors.color_id IS NULL
"""
)
val trainer: (Int => Int) = (arg:Int) => 0
val sqlTrainer = udf(trainer)
val tester: (Int => Int) = (arg:Int) => 1
val sqlTester = udf(tester)
val rsplit = colors.randomSplit(Array(0.7, 0.3))
val train_colors = splits(0).select("color_id").withColumn("test",sqlTrainer(col("color_id")))
val test_colors = splits(1).select("color_id").withColumn("test",sqlTester(col("color_id")))
However, I'm realizing that by doing the above the colors in excluded_colors are completely ignored. They are not even in my testing set.
Question
How can I split the data in 70/30 while also ensuring that the colors in excluded_colors are not in training but are present in testing.
What we want to do is remove the "excluded colors" from the training set but have them in the testing and have a training/test split of 70/30.
What we need is a bit of math.
Given the total dataset (TD) and the excluded colors dataset (E) we can say that for train dataset (Tr) and test dataset (Ts) that:
|Tr| = x * (|TD|-|E|)
|Ts| = |E| + (1-x) * |TD|
We also know that |Tr| = 0.7 |TD|
Hence x = 0.7 |TD| / (|TD| - |E|)
Now that we know the sampling factor x, we can say:
Tr = (TD-E).sample(withReplacement = false, fraction = x)
// where (TD - E) is the result of the SQL expr above
Ts = TD.sample(withReplacement = false, fraction = 0.3)
// we sample the test set from the original dataset