How to fix the order of scale in rCharts nPlot? - visualization

Currently I've got a data set look like this:
df <- data.frame(Time = c("2013-07", "2013-07", "2013-07","2013-10", "2014-01", "2014-05", "2014-05", "2014-05"),
local = "ABC",
Point = c("Point1", "Point2", "Point3", "Point3", "Point3", "Point1", "Point2", "Point3"),
Part1 = c(NaN, NaN, NaN, NaN, NaN, 1, 1, NaN),
Part2 = c(NaN, 2, 11, 4, 2, NaN, 1, 1),
Part3 = c(4, NaN, NaN, NaN, NaN, 1, 1, NaN))
I'm trying to plot a bar plot using rCharts in R studio.
n1 <- nPlot(Part2 ~ Time, group = "Point", data = df, type = "multiBarChart")
n1
The output looks like what I want except one thing.
Ideally the order of x axis should be 2013-07, 2013-10, 2014-01, 2014-05
But the one I got is 2013-07, 2014-05, 2013-10, 2014-01.
I have also tried to convert "Time" variable into a Date format or a POSIXct format. Things turn out to be the same.
So can anybody help me with this?
Is there any help file for rCharts with all possible functions, arguments and customization explanations?
Thanks in advance

I think the primary issue is missing data, which nvd3 does not like. I changed the structure of the data slightly with expand.grid to make sure that there as a point for each date, and in that case nvd3 sorts as expected whether we hand it a number or character date.
Here is the code
library(rCharts)
df <- data.frame(Time = c("2013-07", "2013-07", "2013-07","2013-10", "2014-01", "2014-05", "2014-05", "2014-05"),
local = "ABC",
Point = c("Point1", "Point2", "Point3", "Point3", "Point3", "Point1", "Point2", "Point3"),
Part1 = c(NaN, NaN, NaN, NaN, NaN, 1, 1, NaN),
Part2 = c(NaN, 2, 11, 4, 2, NaN, 1, 1),
Part3 = c(4, NaN, NaN, NaN, NaN, 1, 1, NaN))
#df$Time <- as.Date( paste0(as.character(df$Time),"-01" ) )
df2 <- merge(
structure(expand.grid(unique(df$Time),unique(df$Point)),names=c("Time","Point"))
,df
,all=T
)
#df2[,4:6] <- lapply(df2[,4:6], function(x){ ifelse(is.na(x),0,x) })
n1 <- nPlot(Part2 ~ Time, group = "Point", data = df2, type = "multiBarChart")
n1$xAxis (
#"#! function(d){ return d3.time.format('%Y-%m')(new Date( d*60*60*24*1000 ) ) } !#"
"#! function(d){ return d3.time.format('%Y-%m')(function(d){ return d3.time.format('%Y-%m')(d3.time.format('%Y-%m').parse(d) ) }) } !#"
)
n1

Related

summary row with gtsummary

I am trying to create a table of events with gtsummary and I would like to obtain a final row counting the events of the previous rows. add_overall() and add_n() do add the total but in a column, counting the same event across groups but not the overall events.
I created this example.
x1 <- sample(c("No", "Yes"), 30, replace = TRUE, prob = c(0.85, 0.15))
x2 <- sample(c("No", "Yes"), 30, replace = TRUE, prob = c(0.9, 0.1))
x3 <- sample(c("No", "Yes"), 30, replace = TRUE, prob = c(0.75, 0.25))
y <- sample(c("A", "B"), 30, replace = TRUE, prob = c(0.5, 0.5))
df <- data.frame(as_factor(x1), as_factor(x2), as_factor(x3), as_factor(y))
colnames(df) <-c("event_1", "event_2", "event_3", "group")
tbl_summary(df, by=group, statistic = all_categorical() ~ "{n}")
example
I tried using summary_rows() function from gt package after converting the table to a gt object but there is an error when summarising because these variables are factors.
Any other ideas?
You can do this by adding a new variable to your data frame that is the row sum of each of the events. Then you can display that variable's sum in the summary table. Example below!
library(gtsummary)
#> #Uighur
library(tidyverse)
df <-
data.frame(
event_1 = sample(c(FALSE, TRUE), 30, replace = TRUE, prob = c(0.85, 0.15)),
event_2 = sample(c(FALSE, TRUE), 30, replace = TRUE, prob = c(0.9, 0.1)),
event_3 = sample(c(FALSE, TRUE), 30, replace = TRUE, prob = c(0.75, 0.25)),
group = sample(c("A", "B"), 30, replace = TRUE, prob = c(0.5, 0.5))
) |>
rowwise() |>
mutate(Total = sum(event_1, event_2, event_3))
tbl_summary(
df,
by = group,
type = Total ~ "continuous",
statistic =
list(all_categorical() ~ "{n}",
all_continuous() ~ "{sum}")
) |>
as_kable() # convert to kable to display on stack overflow
Characteristic
A, N = 16
B, N = 14
event_1
4
4
event_2
1
2
event_3
7
6
Total
12
12
Created on 2023-01-12 with reprex v2.0.2
Thank you so much (great package gtsummary). That works! I had some trouble summing over factors. If variables are factors the code
mutate(Total = sum(event_1=="Yes", event_2=="Yes", event_3=="Yes"))
does it.

Polars Dataframe: Apply MinMaxScaler to a column with condition

I am trying to perform the following operation in Polars.
For value in column B which is below 80 will be scaled between 1 and 4, where as for anything above 80, will be set as 5.
df_pandas = pd.DataFrame(
{
"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
"B": [50, 300, 80, 12, 105, 78, 66, 42, 61.5, 35],
}
)
test_scaler = MinMaxScaler(feature_range=(1,4)) # from sklearn.preprocessing
df_pandas.loc[df_pandas['B']<80, 'Test'] = test_scaler.fit_transform(df_pandas.loc[df_pandas['B']<80, "B"].values.reshape(-1,1))
df_pandas = df_pandas.fillna(5)
This is what I did with Polars:
# dt is a dictionary
dt = df.filter(
pl.col('B')<80
).to_dict(as_series=False)
below_80 = list(dt.keys())
dt_scale = list(
test_scaler.fit_transform(
np.array(dt['B']).reshape(-1,1)
).reshape(-1) # reshape back to one dimensional
)
# reassign to dictionary dt
dt['B'] = dt_scale
dt_scale_df = pl.DataFrame(dt)
dt_scale_df
dummy = df.join(
dt_scale_df, how="left", on="A"
).fill_null(5)
dummy = dummy.rename({"B_right": "Test"})
Result:
A
B
Test
1
50.0
2.727273
2
300.0
5.000000
3
80.0
5.000000
4
12.0
1.000000
5
105.0
5.000000
6
78.0
4.000000
7
66.0
3.454545
8
42.0
2.363636
9
61.5
3.250000
10
35.0
2.045455
Is there a better approach for this?
Alright, I have got 3 examples for you that should help you from which the last should be preferred.
Because you only want to apply your scaler to a part of a column, we should ensure we only send that part of the data to the scaler. This can be done by:
window function over a partition
partition_by
when -> then -> otherwise + min_max expression
Window function over partititon
This requires a python function that will be applied over the partitions. In the function itself we then have to check in which partition we are and deal with it accordingly.
df = pl.from_pandas(df_pandas)
min_max_sc = MinMaxScaler((1, 4))
def my_scaler(s: pl.Series) -> pl.Series:
if s.len() > 0 and s[0] > 80:
out = (s * 0 + 5)
else:
out = pl.Series(min_max_sc.fit_transform(s.to_numpy().reshape(-1, 1)).flatten())
# ensure all types are the same
return out.cast(pl.Float64)
df.with_column(
pl.col("B").apply(my_scaler).over(pl.col("B") < 80).alias("Test")
)
partition_by
This partitions the the original dataframe to a dictionary holding the different partitions. We then only modify the partitions as needed.
parts = (df
.with_column((pl.col("B") < 80).alias("part"))
.partition_by("part", as_dict=True)
)
parts[True] = parts[True].with_column(
pl.col("B").map(
lambda s: pl.Series(min_max_sc.fit_transform(s.to_numpy().reshape(-1, 1)).flatten())
).alias("Test")
)
parts[False] = parts[False].with_column(
pl.lit(5.0).alias("Test")
)
pl.concat([df for df in parts.values()]).select(pl.all().exclude("part"))
when -> then -> otherwise + min_max expression
This one I like best. We can make function that creates a polars expression that is the min_max scaling function you need. This will have best performance.
def min_max_scaler(col: str, predicate: pl.Expr):
x = pl.col(col)
x_min = x.filter(predicate).min()
x_max = x.filter(predicate).max()
# * 3 + 1 to set scale between 1 - 4
return (x - x_min) / (x_max - x_min) * 3 + 1
predicate = pl.col("B") < 80
df.with_column(
pl.when(predicate)
.then(min_max_scaler("B", predicate))
.otherwise(5).alias("Test")
)

How to pick an element from matrix (list of list in python) based on decision variables (one for row, and one for column) | OR-Tools, Python

I am new to constraint programming and OR-Tools. A brief about the problem. There are 8 positions, for each position I need to decide which move of type A (move_A) and which move of type B (move_B) should be selected such that the value achieved from the combination of the 2 moves (at each position) is maximized. (This is only a part of the bigger problem though). And I want to use AddElement approach to do the sub setting.
Please see the below attempt
from ortools.sat.python import cp_model
model = cp_model.CpModel()
# value achieved from combination of different moves of type A
# (moves_A (rows)) and different moves of type B (moves_B (columns))
# for e.g. 2nd move of type A and 3rd move of type B will give value = 2
value = [
[ -1, 5, 3, 2, 2],
[ 2, 4, 2, -1, 1],
[ 4, 4, 0, -1, 2],
[ 5, 1, -1, 2, 2],
[ 0, 0, 0, 0, 0],
[ 2, 1, 1, 2, 0]
]
# 6 moves of type A
num_moves_A = len(value)
# 5 moves of type B
num_moves_B = len(value[0])
num_positions = 8
type_move_A_position = [model.NewIntVar(0, num_moves_A - 1, f"move_A[{i}]") for i in range(num_positions)]
type_move_B_position = [model.NewIntVar(0, num_moves_B - 1, f"move_B[{i}]") for i in range(num_positions)]
value_position = [model.NewIntVar(0, 10, f"value_position[{i}]") for i in range(num_positions)]
# I am getting an error when I run the below
objective_terms = []
for i in range(num_positions):
model.AddElement(type_move_B_position[i], value[type_move_A_position[i]], value_position[i])
objective_terms.append(value_position[i])
The error is as follows:
Traceback (most recent call last):
File "<ipython-input-65-3696379ce410>", line 3, in <module>
model.AddElement(type_move_B_position[i], value[type_move_A_position[i]], value_position[i])
TypeError: list indices must be integers or slices, not IntVar
In MiniZinc the below code would have worked
var int: obj = sum(i in 1..num_positions ) (value [type_move_A_position[i], type_move_B_position[i]])
I know in OR-Tools we will have to create some intermediary variables to store results first, so the above approach of minizinc will not work. But I am struggling to do so.
I can always create a 2 matrix of binary binary variables one for num_moves_A * num_positions and the other for num_moves_B * num_positions, add re;evant constraints and achieve the purpose
But I want to learn how to do the same thing via AddElement constraint
Any help on how to re-write the AddElement snippet is highly appreciated. Thanks.
AddElement is 1D only.
The way it is translated from minizinc to CP-SAT is to create an intermediate variable p == index1 * max(index2) + index2 and use it in an element constraint with a flattened matrix.
Following Laurent's suggestion (using AddElement constraint):
from ortools.sat.python import cp_model
model = cp_model.CpModel()
# value achieved from combination of different moves of type A
# (moves_A (rows)) and different moves of type B (moves_B (columns))
# for e.g. 2 move of type A and 3 move of type B will give value = 2
value = [
[-1, 5, 3, 2, 2],
[2, 4, 2, -1, 1],
[4, 4, 0, -1, 2],
[5, 1, -1, 2, 2],
[0, 0, 0, 0, 0],
[2, 1, 1, 2, 0],
]
min_value = min([min(i) for i in value])
max_value = max([max(i) for i in value])
# 6 moves of type A
num_moves_A = len(value)
# 5 moves of type B
num_moves_B = len(value[0])
# number of positions
num_positions = 5
# flattened matrix of values
value_flat = [value[i][j] for i in range(num_moves_A) for j in range(num_moves_B)]
# flattened indices
flatten_indices = [
index1 * len(value[0]) + index2
for index1 in range(len(value))
for index2 in range(len(value[0]))
]
type_move_A_position = [
model.NewIntVar(0, num_moves_A - 1, f"move_A[{i}]") for i in range(num_positions)
]
model.AddAllDifferent(type_move_A_position)
type_move_B_position = [
model.NewIntVar(0, num_moves_B - 1, f"move_B[{i}]") for i in range(num_positions)
]
model.AddAllDifferent(type_move_B_position)
# below intermediate decision variable is created which
# will store index corresponding to the selected move of type A and
# move of type B for each position
# this will act as index in the AddElement constraint
flatten_index_num = [
model.NewIntVar(0, len(flatten_indices), f"flatten_index_num[{i}]")
for i in range(num_positions)
]
# another intermediate decision variable is created which
# will store value corresponding to the selected move of type A and
# move of type B for each position
# this will act as the target in the AddElement constraint
value_position_index_num = [
model.NewIntVar(min_value, max_value, f"value_position_index_num[{i}]")
for i in range(num_positions)
]
objective_terms = []
for i in range(num_positions):
model.Add(
flatten_index_num[i]
== (type_move_A_position[i] * len(value[0])) + type_move_B_position[i]
)
model.AddElement(flatten_index_num[i], value_flat, value_position_index_num[i])
objective_terms.append(value_position_index_num[i])
model.Maximize(sum(objective_terms))
# Solve
solver = cp_model.CpSolver()
status = solver.Solve(model)
solver.ObjectiveValue()
for i in range(num_positions):
print(
str(i)
+ "--"
+ str(solver.Value(type_move_A_position[i]))
+ "--"
+ str(solver.Value(type_move_B_position[i]))
+ "--"
+ str(solver.Value(value_position_index_num[i]))
)
The below version uses AddAllowedAssignments constraint to achieve the same purpose (per Laurent's alternate approach) :
from ortools.sat.python import cp_model
model = cp_model.CpModel()
# value achieved from combination of different moves of type A
# (moves_A (rows)) and different moves of type B (moves_B (columns))
# for e.g. 2 move of type A and 3 move of type B will give value = 2
value = [
[-1, 5, 3, 2, 2],
[2, 4, 2, -1, 1],
[4, 4, 0, -1, 2],
[5, 1, -1, 2, 2],
[0, 0, 0, 0, 0],
[2, 1, 1, 2, 0],
]
min_value = min([min(i) for i in value])
max_value = max([max(i) for i in value])
# 6 moves of type A
num_moves_A = len(value)
# 5 moves of type B
num_moves_B = len(value[0])
# number of positions
num_positions = 5
type_move_A_position = [
model.NewIntVar(0, num_moves_A - 1, f"move_A[{i}]") for i in range(num_positions)
]
model.AddAllDifferent(type_move_A_position)
type_move_B_position = [
model.NewIntVar(0, num_moves_B - 1, f"move_B[{i}]") for i in range(num_positions)
]
model.AddAllDifferent(type_move_B_position)
value_position = [
model.NewIntVar(min_value, max_value, f"value_position[{i}]")
for i in range(num_positions)
]
tuples_list = []
for i in range(num_moves_A):
for j in range(num_moves_B):
tuples_list.append((i, j, value[i][j]))
for i in range(num_positions):
model.AddAllowedAssignments(
[type_move_A_position[i], type_move_B_position[i], value_position[i]],
tuples_list,
)
model.Maximize(sum(value_position))
# Solve
solver = cp_model.CpSolver()
status = solver.Solve(model)
solver.ObjectiveValue()
for i in range(num_positions):
print(
str(i)
+ "--"
+ str(solver.Value(type_move_A_position[i]))
+ "--"
+ str(solver.Value(type_move_B_position[i]))
+ "--"
+ str(solver.Value(value_position[i]))
)

Google OR-Tools doesn't find solution on VRPtw problem

I'm tackling with VRPtw problem and struggling that the solver finds no solution with any data except for artificial small one.
The setting is as below.
There are several depots and locations to visit. Each locations have the time-window. Each vehicles have break time and work time. Also, the locations have some constraints and only the vehicles which satisfy that demand can visit there.
Based on this experiment setting, I wrote the code below.
As I wrote, it looks that it is working with small artificial data, but with real data, it never found the solution. I tried with 5 different data sets.
Although I set the 7200 sec time limit, previously I ran for longer than 10 hours and it was same.
The data's scale is 40~50 vehicles and 200~300 locations.
Does this code have a problem? If not, on what kind of order, should I change the approach(such as initialization, searching method and so on)?
(Edited to use integer for time matrix)
from dataclasses import dataclass
from typing import List, Tuple
from ortools.constraint_solver import pywrapcp
from ortools.constraint_solver import routing_enums_pb2
# TODO: Refactor
BIG_ENOUGH = 100000000
TIME_DIMENSION = 'Time'
TIME_LIMIT = 7200
#dataclass
class DataSet:
time_matrix: List[List[int]]
locations_num: int
vehicles_num: int
vehicles_break_time_window: List[Tuple[int, int, int]]
vehicles_work_time_windows: List[Tuple[int, int]]
location_time_windows: List[Tuple[int, int]]
vehicles_depots_indices: List[int]
possible_vehicles: List[List[int]]
def execute(data: DataSet):
manager = pywrapcp.RoutingIndexManager(data.locations_num,
data.vehicles_num,
data.vehicles_depots_indices,
data.vehicles_depots_indices)
routing_parameters = pywrapcp.DefaultRoutingModelParameters()
routing_parameters.solver_parameters.trace_propagation = True
routing_parameters.solver_parameters.trace_search = True
routing = pywrapcp.RoutingModel(manager, routing_parameters)
def time_callback(source_index, dest_index):
from_node = manager.IndexToNode(source_index)
to_node = manager.IndexToNode(dest_index)
return data.time_matrix[from_node][to_node]
transit_callback_index = routing.RegisterTransitCallback(time_callback)
routing.SetArcCostEvaluatorOfAllVehicles(transit_callback_index)
routing.AddDimension(
transit_callback_index,
BIG_ENOUGH,
BIG_ENOUGH,
False,
TIME_DIMENSION)
time_dimension = routing.GetDimensionOrDie(TIME_DIMENSION)
# set time window for locations start time
# set condition restrictions
possible_vehicles = data.possible_vehicles
for location_idx, time_window in enumerate(data.location_time_windows):
index = manager.NodeToIndex(location_idx + data.vehicles_num)
time_dimension.CumulVar(index).SetRange(time_window[0], time_window[1])
routing.SetAllowedVehiclesForIndex(possible_vehicles[location_idx], index)
solver = routing.solver()
for i in range(data.vehicles_num):
routing.AddVariableMinimizedByFinalizer(
time_dimension.CumulVar(routing.Start(i)))
routing.AddVariableMinimizedByFinalizer(
time_dimension.CumulVar(routing.End(i)))
# set work time window for vehicles
for vehicle_index, work_time_window in enumerate(data.vehicles_work_time_windows):
start_index = routing.Start(vehicle_index)
time_dimension.CumulVar(start_index).SetRange(work_time_window[0],
work_time_window[0])
end_index = routing.End(vehicle_index)
time_dimension.CumulVar(end_index).SetRange(work_time_window[1],
work_time_window[1])
# set break time for vehicles
node_visit_transit = {}
for n in range(routing.Size()):
if n >= data.locations_num:
node_visit_transit[n] = 0
else:
node_visit_transit[n] = 1
break_intervals = {}
for v in range(data.vehicles_num):
vehicle_break = data.vehicles_break_time_window[v]
break_intervals[v] = [
solver.FixedDurationIntervalVar(vehicle_break[0],
vehicle_break[1],
vehicle_break[2],
True,
'Break for vehicle {}'.format(v))
]
time_dimension.SetBreakIntervalsOfVehicle(
break_intervals[v], v, node_visit_transit
)
search_parameters = pywrapcp.DefaultRoutingSearchParameters()
search_parameters.first_solution_strategy = (
routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)
search_parameters.local_search_metaheuristic = (
routing_enums_pb2.LocalSearchMetaheuristic.GREEDY_DESCENT)
search_parameters.time_limit.seconds = TIME_LIMIT
search_parameters.log_search = True
solution = routing.SolveWithParameters(search_parameters)
return solution
if __name__ == '__main__':
data = DataSet(
time_matrix=[[0, 0, 4, 5, 5, 6],
[0, 0, 6, 4, 5, 5],
[1, 3, 0, 6, 5, 4],
[2, 1, 6, 0, 5, 4],
[2, 2, 5, 5, 0, 6],
[3, 2, 4, 4, 6, 0]],
locations_num=6,
vehicles_num=2,
vehicles_depots_indices=[0, 1],
vehicles_work_time_windows=[(720, 1080), (720, 1080)],
vehicles_break_time_window=[(720, 720, 15), (720, 720, 15)],
location_time_windows=[(735, 750), (915, 930), (915, 930), (975, 990)],
possible_vehicles=[[0], [1], [0], [1]]
)
solution = execute(data)
if solution is not None:
print("solution is found")

Find a text and replace it with a value in Matlab

I have some data which look like this:
I would like to pre-process the data in a way that I replace all Mostly false with 1, Mostly true with 2 and Definitely true w/ 3. Is there a find and replace command or what is the best way of doing this?
You can use a map object to do the mapping
m = containers.Map( {'Mostly false', 'Mostly true', 'Definitely true'}, ...
{ 1, 2, 3} );
Then for some example data
data = {'Mostly false', 'Mostly false', 'Mostly true', 'Mostly false', 'Definitely true'};
You can perform the conversion with
data = m.values( data );
% >> data = {1, 1, 2, 1, 3}
This assumes there will always be a match in your map.
Alternatively, you could do the operation manually (for the same example data), this will leave non-matches unaltered, and you could use strcmpi for case-insensitivity:
c = {'Mostly false', 'Mostly true', 'Definitely true'
1, 2, 3};
for ii = 1:size(c,2)
% Make the replacement for each column in 'c'
data( strcmp( data, c{1,ii} ) ) = c(2,ii);
end