I am trying to solve a VRP (vehicle routing problem). I am following this file, which runs smooth as silk.
The problem is that when I modify it to fit my problem, it does not seem to work correctly, and I don't know where the difference may lie.
Some details:
15 vehicles.
7 bases (vehicles depart from these bases).
10 demand locations (17 nodes in total).
Costs for all vehicles are the same.
Demands are all constant and equal to 1.
The duration of the service varies depending on the location.
The time window available for each location is the same (8 am to 5 pm). So there is no tight schedule, as long as the places are visited within the work shift.
The problem:
When I set the capacity of my vehicles to a low value (say, 2), several vehicles go out to perform tasks (with a maximum of 2 per vehicle, of course). This works fine, and the output is satisfactory. However, this is not the scenario I am looking for. My vehicles can actually perform as many tasks as they can fit, as long as the total time required for those tasks does not exceed the daily working hours (8 am to 5 pm). However, if I set the capacities to 100 (effectively infinity for this case), the solution indicates that only one vehicle goes out and does all the tasks, even though all the tasks add up to 31 hours!
Questions:
What parameters am I using wrong?
Why is start_fn not used anywhere? Does it have to do with this?
Could it be related to the first_solution_strategy? This shouldn't be the cause, right? The contradiction is that the vehicle works 31 hours when the maximum period set is 9 hours.
I don't understand why, when we call routing.AddDimension(.....'time'), we pass customers.time_horizon twice.
Why is the duration of the route equal to zero in all cases? This can't be right.
I am lost with this. If someone can help me, I will be very grateful.
So, here is my code:
import numpy as np
from collections import namedtuple
from ortools.constraint_solver import pywrapcp
from ortools.constraint_solver import routing_enums_pb2
from datetime import timedelta
class Ordenes():
    def __init__(self,
                 lats_in,
                 lons_in,
                 num_bases,
                 distmap,
                 ot_time,
                 num_stops,
                 demands,
                 time_horizon = 24 * 60 * 60): # A 24 hour period.
        self.number = num_stops #: The number of customers and depots
        self.distmap = distmap #: the distance matrix
        stops = np.array(range(0, num_stops)) # The 'name' of the stop, indexed from 0 to num_stops-1
        self.num_bases = num_bases
        self.time_horizon = time_horizon
        lats, lons = lats_in,lons_in
        # timedeltas: Demands locations are available from 8am to 6pm.
        stime, etime = int(8 * 60 * 60), int(18 * 60 * 60)
        start_times = [timedelta(seconds = stime) for idx in range(self.number)]
        stop_times = [timedelta(seconds = etime) for idx in range(self.number)]
        # A named tuple for the Orden
        Orden = namedtuple(
            'Orden',
            [
                'index' , # the index of the stop
                'demand' , # the demand for the stop
                'lat' , # the latitude of the stop
                'lon' , # the longitude of the stop
                'tw_open', # timedelta window open
                'tw_close'
            ]) # timedelta window cls
        ziped_ = zip(stops, demands, lats, lons, start_times, stop_times)
        self.ordenes = [
            Orden(idx, dem, lat, lon, tw_open, tw_close)
            for idx, dem, lat, lon, tw_open, tw_close in ziped_]
        # The number of seconds needed to 'unload' 1 unit of goods its variable.
        self.service_time_per_dem = ot_time # seconds

    def set_manager(self, manager):
        self.manager = manager

    def central_start_node(self):
        return range(self.num_bases)

    def make_distance_mat(self):
        return (self.distmap)

    def get_total_demand(self):
        return (sum([c.demand for c in self.ordenes]))

    def return_dist_callback(self, **kwargs):
        self.make_distance_mat()
        def dist_return(from_index, to_index):
            # Convert from routing variable Index to distance matrix NodeIndex.
            from_node = self.manager.IndexToNode(from_index)
            to_node = self.manager.IndexToNode(to_index)
            return (self.distmat[from_node][to_node])
        return dist_return

    def return_dem_callback(self):
        def dem_return(from_index):
            # Convert from routing variable Index to distance matrix NodeIndex.
            from_node = self.manager.IndexToNode(from_index)
            return (self.ordenes[from_node].demand)
        return dem_return

    def zero_depot_demands(self, depot):
        start_depot = self.ordenes[depot]
        self.ordenes[depot] = start_depot._replace(
            demand = 0 ,
            tw_open = None,
            tw_close = None)
        return

    def make_service_time_call_callback(self):
        def service_time_return(a, b):
            return (self.service_time_per_dem[a])
        return service_time_return

    def make_transit_time_callback(self, speed_kmph=30):
        def transit_time_return(a, b):
            return (self.distmat[a][b] / (speed_kmph * 1 / 3600))
        return transit_time_return

class Cuadrillas():
    def __init__(self,
                 capacities,
                 costs,
                 start_nodes):
        Cuadrilla = namedtuple(
            'Cuadrilla',
            ['index','capacity','cost'])
        self.start_nodes = start_nodes
        self.number = np.size(capacities)
        idxs = np.array(range(0, self.number))
        zip_in_ = zip(idxs, capacities, costs)
        self.cuadrillas = [Cuadrilla(idx, cap, cost) for idx, cap, cost in zip_in_ ]

    def get_total_capacity(self):
        return (sum([c.capacity for c in self.cuadrillas]))

    def return_starting_callback(self, ordenes):
        # create a different starting and finishing depot for each vehicle
        self.starts = self.start_nodes
        self.ends = self.starts
        # the depots will not have demands, so zero them.
        for depot in self.starts:
            ordenes.zero_depot_demands(depot)
        for depot in self.ends:
            ordenes.zero_depot_demands(depot)
        def start_return(v):
            return (self.starts[v])
        return start_return

def vehicle_output_string(manager, routing, plan):
    """
    Return a string displaying the output of the routing instance and
    assignment (plan).
    Args: routing (ortools.constraint_solver.pywrapcp.RoutingModel): routing.
    plan (ortools.constraint_solver.pywrapcp.Assignment): the assignment.
    Returns:
    (string) plan_output: describing each vehicle's plan.
    (List) dropped: list of dropped orders.
    """
    dropped = []
    for order in range(routing.Size()):
        if (plan.Value(routing.NextVar(order)) == order):
            dropped.append(str(order))
    capacity_dimension = routing.GetDimensionOrDie('Capacity')
    time_dimension = routing.GetDimensionOrDie('Time')
    plan_output = ''
    for route_number in range(routing.vehicles()):
        order = routing.Start(route_number)
        plan_output += 'Route {0}:'.format(route_number)
        if routing.IsEnd(plan.Value(routing.NextVar(order))):
            plan_output += ' Empty \n'
        else:
            while True:
                load_var = capacity_dimension.CumulVar(order)
                time_var = time_dimension.CumulVar(order)
                node = manager.IndexToNode(order)
                plan_output += \
                    ' {node} Load({load}) Time({tmin}, {tmax}) -> '.format(
                        node=node,
                        load=plan.Value(load_var),
                        tmin=str(timedelta(seconds=plan.Min(time_var))),
                        tmax=str(timedelta(seconds=plan.Max(time_var))))
                if routing.IsEnd(order):
                    plan_output += ' EndRoute {0}. \n'.format(route_number)
                    break
                order = plan.Value(routing.NextVar(order))
        plan_output += '\n'
    return (plan_output, dropped)

# [START solution_printer]
def print_solution(manager, routing, assignment):
    """Prints solution on console."""
    print(f'Objective: {assignment.ObjectiveValue()}')
    # Display dropped nodes.
    dropped_nodes = 'Dropped nodes:'
    for index in range(routing.Size()):
        if routing.IsStart(index) or routing.IsEnd(index):
            continue
        if assignment.Value(routing.NextVar(index)) == index:
            node = manager.IndexToNode(index)
            if node > 16:
                original = node
                while original > 16:
                    original = original - 16
                dropped_nodes += f' {node}({original})'
            else:
                dropped_nodes += f' {node}'
    print(dropped_nodes)
    # Display routes
    time_dimension = routing.GetDimensionOrDie('Time')
    total_time = 0
    for vehicle_id in range(manager.GetNumberOfVehicles()):
        plan_output = f'Route for vehicle {vehicle_id}:\n'
        index = routing.Start(vehicle_id)
        start_time = 0
        while not routing.IsEnd(index):
            time_var = time_dimension.CumulVar(index)
            node = manager.IndexToNode(index)
            if node > 16:
                original = node
                while original > 16:
                    original = original - 16
                plan_output += f'{node}({original})'
            else:
                plan_output += f'{node}'
            plan_output += f' Time:{assignment.Value(time_var)} -> '
            if start_time == 0:
                start_time = assignment.Value(time_var)
            index = assignment.Value(routing.NextVar(index))
        time_var = time_dimension.CumulVar(index)
        node = manager.IndexToNode(index)
        plan_output += f'{node} Time:{assignment.Value(time_var)}\n'
        end_time = assignment.Value(time_var)
        duration = end_time - start_time
        plan_output += f'Duration of the route:{duration}min\n'
        print(plan_output)
        total_time += duration
    print(f'Total duration of all routes: {total_time}min')
# [END solution_printer]

def main():
    # coordinates
    lats = [-45.80359358,-45.76451539,-45.80393496,-45.7719334,-45.76607548,
            -45.89857917,-45.70923876,-46.10321727,-45.81709206,-46.27827033,
            -45.67994619,-45.73426141,-45.89791315,-45.74206645,-46.226577,
            -46.08164013,-45.98688936]
    lons = [-68.20091669, -68.0438965, -68.67399508, -68.11662549, -68.17842196
            -68.32238459, -68.23153574, -68.74653904, -68.7490935 , -68.88576051,
            -68.28244657, -68.29355024, -68.52404867, -68.92559956, -69.00577607,
            -68.51192289, -68.65117288]
    # Demand duration
    ot_time = [0, 0, 0, 0, 0, 0, 0, 5400, 5400, 43200, 2520, 2520, 5400, 2520, 12600, 2520, 2520]
    print(np.sum(ot_time)/3600)
    # Number of Stops:
    num_stops = 17
    # Number of bases: The first 7 nodes
    num_bases = 7
    # demands: ONly one demand for each spot
    demands = np.ones(num_stops)
    # Distance matrix:
    distmat =[
        [ 0, 0, 22, 0, 0, 0, 0, 35, 35, 69, 1, 1, 25, 54, 61, 35, 35],
        [ 0, 0, 22, 0, 0, 0, 0, 35, 35, 69, 1, 1, 25, 54, 62, 35, 35],
        [22, 22, 0, 22, 22, 22, 22, 20, 21, 51, 20, 20, 13, 32, 46, 20, 21],
        [ 0, 0, 22, 0, 0, 0, 0, 35, 35, 69, 1, 1, 25, 54, 61, 35, 35],
        [ 0, 0, 22, 0, 0, 0, 0, 35, 35, 69, 1, 1, 25, 54, 61, 35, 35],
        [ 0, 0, 22, 0, 0, 0, 0, 35, 35, 69, 1, 1, 25, 54, 62, 35, 35],
        [ 0, 0, 22, 0, 0, 0, 0, 35, 35, 69, 1, 1, 25, 54, 62, 35, 35],
        [35, 35, 20, 35, 35, 35, 35, 0, 0, 33, 34, 34, 9, 27, 26, 0, 0],
        [35, 35, 21, 35, 35, 35, 35, 0, 0, 33, 34, 34, 9, 27, 26, 0, 0],
        [69, 69, 51, 69, 69, 69, 69, 33, 33, 0, 68, 68, 43, 31, 10, 33, 33],
        [ 1, 1, 20, 1, 1, 1, 1, 34, 34, 68, 0, 0, 25, 52, 61, 34, 34],
        [ 1, 1, 20, 1, 1, 1, 1, 34, 34, 68, 0, 0, 25, 52, 61, 34, 34],
        [25, 25, 13, 25, 25, 25, 25, 9, 9, 43, 25, 25, 0, 32, 36, 9, 9],
        [54, 54, 32, 54, 54, 54, 54, 27, 27, 31, 52, 52, 32, 0, 33, 27, 27],
        [61, 62, 46, 61, 61, 62, 62, 26, 26, 10, 61, 61, 36, 33, 0, 26, 26],
        [35, 35, 20, 35, 35, 35, 35, 0, 0, 33, 34, 34, 9, 27, 26, 0, 0],
        [35, 35, 21, 35, 35, 35, 35, 0, 0, 33, 34, 34, 9, 27, 26, 0, 0]]
    # Create a set of customer, (and depot) stops.
    customers = Ordenes(
        lats_in = lats,
        lons_in = lons,
        num_bases = num_bases,
        distmap = distmat,
        ot_time = ot_time,
        demands = demands,
        num_stops = num_stops)
    # All vehicule capacities are the same ---> 15 vehicules
    veh_cap = 2 #100 does not work, why?
    capacity = [int(x) for x in np.ones(15)*veh_cap]
    # Constant cost for all vehicules: 100
    cost = [int(x) for x in np.ones(15)*100]
    # Get the starting nodes of "cuadrillas"
    start_n = [
        0,# vehicule 1 departs from and returns to base 0
        1,# vehicule 2 departs from and returns to base 1
        2,# vehicule 3 departs from and returns to base 2
        2,# vehicule 4 departs from and returns to base 2
        2,# vehicule 5 departs from and returns to base 2
        3,# vehicule 6 departs from and returns to base 3
        3,# vehicule 7 departs from and returns to base 3
        4,# vehicule 8 departs from and returns to base 4
        4,# vehicule 9 departs from and returns to base 4
        4,# vehicule 10 departs from and returns to base 4
        5,# vehicule 11 departs from and returns to base 5
        5,# vehicule 12 departs from and returns to base 5
        6,# vehicule 13 departs from and returns to base 6
        6,# vehicule 14 departs from and returns to base 6
        6]# vehicule 15 departs from and returns to base 6
    # Get penalties for dropping nodes
    penalty = [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        8636,
        8636,
        8596,
        8571,
        8571,
        8556,
        8551,
        8495,
        8490,
        8490]
    # Create a set of cuadrillas, the number set by the length of capacity.
    vehicles = Cuadrillas(
        capacities = capacity,
        costs = cost,
        start_nodes = start_n)
    # Set the starting nodes, and create a callback fn for the starting node.
    start_fn = vehicles.return_starting_callback(customers)
    # Create the routing index manager.
    manager = pywrapcp.RoutingIndexManager(
        customers.number, # int number
        vehicles.number, # int number
        vehicles.starts, # List of int start depot
        vehicles.ends) # List of int end depot
    # Get customers a manager attribute
    customers.set_manager(manager)
    # Set model parameters
    model_parameters = pywrapcp.DefaultRoutingModelParameters()
    # Make the routing model instance.
    routing = pywrapcp.RoutingModel(manager, model_parameters)
    parameters = pywrapcp.DefaultRoutingSearchParameters()
    # Setting first solution heuristic (cheapest addition).
    parameters.first_solution_strategy = (
        routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)
    # Routing: forbids use of TSPOpt neighborhood, (this is the default behaviour)
    parameters.local_search_operators.use_tsp_opt = pywrapcp.BOOL_FALSE
    # Disabling Large Neighborhood Search, (this is the default behaviour)
    parameters.local_search_operators.use_path_lns = pywrapcp.BOOL_FALSE
    parameters.local_search_operators.use_inactive_lns = pywrapcp.BOOL_FALSE
    parameters.time_limit.seconds = 20
    parameters.use_full_propagation = True
    #parameters.log_search = True
    # Create callback fns for distances, demands, service and transit-times.
    dist_fn = customers.return_dist_callback()
    dist_fn_index = routing.RegisterTransitCallback(dist_fn)
    dem_fn = customers.return_dem_callback()
    dem_fn_index = routing.RegisterUnaryTransitCallback(dem_fn)
    # Create and register a transit callback.
    serv_time_fn = customers.make_service_time_call_callback()
    transit_time_fn = customers.make_transit_time_callback()
    def tot_time_fn(from_index, to_index):
        """
        The time function we want is both transit time and service time.
        """
        # Convert from routing variable Index to distance matrix NodeIndex.
        from_node = manager.IndexToNode(from_index)
        to_node = manager.IndexToNode(to_index)
        return serv_time_fn(from_node, to_node) + transit_time_fn(from_node, to_node)
    tot_time_fn_index = routing.RegisterTransitCallback(tot_time_fn)
    # Set the cost function (distance callback) for each arc, homogeneous for
    # all vehicles.
    routing.SetArcCostEvaluatorOfAllVehicles(dist_fn_index)
    # Set vehicle costs for each vehicle, not homogeneous.
    for veh in vehicles.cuadrillas:
        routing.SetFixedCostOfVehicle(veh.cost, int(veh.index))
    # Add a dimension for vehicle capacities
    null_capacity_slack = 0
    routing.AddDimensionWithVehicleCapacity(
        dem_fn_index, # demand callback
        null_capacity_slack,
        capacity, # capacity array
        True,
        'Capacity')
    # Add a dimension for time and a limit on the total time_horizon
    routing.AddDimension(
        tot_time_fn_index, # total time function callback
        customers.time_horizon, #28800, # 8am
        customers.time_horizon, #61200, # 5pm
        True,
        'Time')
    time_dimension = routing.GetDimensionOrDie('Time')
    for cust in customers.ordenes:
        if cust.tw_open is not None:
            time_dimension.CumulVar(
                manager.NodeToIndex(cust.index)).SetRange(
                    cust.tw_open.seconds,
                    cust.tw_close.seconds)
    """
    To allow the dropping of orders, we add disjunctions to all the customer
    nodes. Each disjunction is a list of 1 index, which allows that customer to
    be active or not, with a penalty if not. The penalty should be larger
    than the cost of servicing that customer, or it will always be dropped!
    """
    # To add disjunctions just to the customers, make a list of non-depots.
    non_depot = set(range(customers.number))
    non_depot.difference_update(vehicles.starts) # removes the items that exist in both sets.
    non_depot.difference_update(vehicles.ends) # removes the items that exist in both sets.
    nodes = [routing.AddDisjunction([manager.NodeToIndex(c)], penalty[c]) for c in non_depot]
    # Solve the problem !
    assignment = routing.SolveWithParameters(parameters)
    # The rest is all optional for saving, printing or plotting the solution.
    if assignment:
        print('The Objective Value is {0}'.format(assignment.ObjectiveValue()))
        plan_output, dropped = vehicle_output_string(manager, routing, assignment)
        print(plan_output)
        print('dropped nodes: ' + ', '.join(dropped))
        print("\n#####################")
        print("#####################")
        print_solution(manager, routing, assignment)
    else:
        print('No assignment')
    return # main

if __name__ == '__main__':
    main()
The output using a capacity equal to 100:
Objective: 100
Dropped nodes:
Route for vehicle 0:
0 Time:0 -> 0 Time:0
Duration of the route:0min
Route for vehicle 1:
1 Time:0 -> 1 Time:0
Duration of the route:0min
Route for vehicle 2:
2 Time:0 -> 2 Time:0
Duration of the route:0min
Route for vehicle 3:
2 Time:0 -> 2 Time:0
Duration of the route:0min
Route for vehicle 4:
2 Time:0 -> 2 Time:0
Duration of the route:0min
Route for vehicle 5:
3 Time:0 -> 3 Time:0
Duration of the route:0min
Route for vehicle 6:
3 Time:0 -> 3 Time:0
Duration of the route:0min
Route for vehicle 7:
4 Time:0 -> 4 Time:0
Duration of the route:0min
Route for vehicle 8:
4 Time:0 -> 4 Time:0
Duration of the route:0min
Route for vehicle 9:
4 Time:0 -> 4 Time:0
Duration of the route:0min
Route for vehicle 10:
5 Time:0 -> 5 Time:0
Duration of the route:0min
Route for vehicle 11:
5 Time:0 -> 5 Time:0
Duration of the route:0min
Route for vehicle 12:
6 Time:0 -> 6 Time:0
Duration of the route:0min
Route for vehicle 13:
6 Time:0 -> 6 Time:0
Duration of the route:0min
Route for vehicle 14:
6 Time:0 -> 16 Time:0 -> 15 Time:28800 -> 14 Time:28800 -> 13 Time:28800 -> 12 Time:28800 -> 11 Time:28800 -> 10 Time:28800 -> 9 Time:28800 -> 8 Time:28800 -> 7 Time:28800 -> 6 Time:28800
Duration of the route:0min
Total duration of all routes: 0min
The output using a capacity equal to 2:
Objective: 500
Dropped nodes:
Route for vehicle 0:
0 Time:0 -> 0 Time:0
Duration of the route:0min
Route for vehicle 1:
1 Time:0 -> 1 Time:0
Duration of the route:0min
Route for vehicle 2:
2 Time:0 -> 2 Time:0
Duration of the route:0min
Route for vehicle 3:
2 Time:0 -> 2 Time:0
Duration of the route:0min
Route for vehicle 4:
2 Time:0 -> 2 Time:0
Duration of the route:0min
Route for vehicle 5:
3 Time:0 -> 3 Time:0
Duration of the route:0min
Route for vehicle 6:
3 Time:0 -> 3 Time:0
Duration of the route:0min
Route for vehicle 7:
4 Time:0 -> 4 Time:0
Duration of the route:0min
Route for vehicle 8:
4 Time:0 -> 7 Time:28800 -> 4 Time:28800
Duration of the route:0min
Route for vehicle 9:
4 Time:0 -> 9 Time:28800 -> 8 Time:28800 -> 4 Time:28800
Duration of the route:0min
Route for vehicle 10:
5 Time:0 -> 5 Time:0
Duration of the route:0min
Route for vehicle 11:
5 Time:0 -> 11 Time:28800 -> 10 Time:28800 -> 5 Time:28800
Duration of the route:0min
Route for vehicle 12:
6 Time:0 -> 6 Time:0
Duration of the route:0min
Route for vehicle 13:
6 Time:0 -> 13 Time:28800 -> 12 Time:28800 -> 6 Time:28800
Duration of the route:0min
Route for vehicle 14:
6 Time:0 -> 16 Time:0 -> 15 Time:28800 -> 14 Time:28800 -> 6 Time:28800
Duration of the route:0min
Total duration of all routes: 0min
I am using the Euclidean algorithm with p = 163, q = 311 and e = 281. Here is what I have so far:
N = p * q = 50693
φ(N) = 162 x 310 = 50220
1. 50220 = 178(281) + 202
2. 281 = 1 (202) + 79
3. 202 = 2 (79) + 44
4. 79 = 1 (44) + 35
5. 44 = 1 (35) + 9
6. 35 = 3 (9) + 8
7. 9 = 1 (8) + 1
8. 8 = 8 (1) + 0
I then move on to back substitution:
A. 9 = 1(8) + 1, so 1 = 9 - 1(8)
B. 8 = 35 - 3(9)
C. 1 = 1(9) - 1(35 - 3(9))
D. 1 = 3(9) - 1(35) + 1(9) (collect like terms)
E. 1 = 4(9) - 1(35)
9 = 44 - 1(35)
1 = 4(44 - 1(35)) - 1(35)
1 = 4(44) - 4(35) - 1(35)
1 = 4(44) - 5(35)
Take the coefficient next to 35 (5) and subtract it from the totient:
50220 - 5 = 50215
d = 50215
This is wrong, as I used an online calculator to verify. Can anyone point me in the right direction here? I think the back substitution is wrong.
There are two different ways to calculate RSA d values, the φ (phi / totient) method, and the λ (lambda / least common multiple) method. While the original RSA paper (and RFC 2313) use phi, modern implementations (and RFC 2437) use lambda.
The totient value is easy: (p-1)(q-1) = 50220.
For lambda(p-1, q-1) we first need to compute GCD(p-1, q-1). This example uses the subtraction form of the Euclidean algorithm:
GCD(162, 310)
GCD(162, 148)
GCD(14, 148)
GCD(14, 134)
GCD(14, 120)
GCD(14, 106)
GCD(14, 92)
GCD(14, 78)
GCD(14, 64)
GCD(14, 50)
GCD(14, 36)
GCD(14, 22)
GCD(14, 8)
GCD(6, 8)
GCD(6, 2)
GCD(4, 2)
GCD(2, 2)
GCD = 2
The least common multiple of (a, b) is a * b / GCD(a, b). So the lambda value is the totient / GCD, or 25110.
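As a quick cross-check of the values above, here is a minimal Python sketch. The helper name gcd_by_subtraction is made up for this illustration; it mirrors the subtraction trace, and math.gcd from the standard library is used only to confirm the result.

```python
from math import gcd

def gcd_by_subtraction(a, b):
    """Subtraction form of the Euclidean algorithm (hypothetical helper)."""
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
    return a

p, q = 163, 311
phi = (p - 1) * (q - 1)               # totient of N = p * q
g = gcd_by_subtraction(p - 1, q - 1)  # same steps as the GCD(162, 310) trace
lam = phi // g                        # lcm(p - 1, q - 1) = (p-1)(q-1) / gcd

print(phi, g, lam)
```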
Now, to compute dPhi = ModInv(e, phi) or dLambda = ModInv(e, lambda) we can use the Extended Euclidean Algorithm:
ModInverse(281, 50220)
r=50220, newR=281, t= 0, newT= 1
r= 281, newR=202, t= 1, newT= -178
r= 202, newR= 79, t= -178, newT= 179
r= 79, newR= 44, t= 179, newT= -536
r= 44, newR= 35, t= -536, newT= 715
r= 35, newR= 9, t= 715, newT=-1251
r= 9, newR= 8, t=-1251, newT= 4468
r= 8, newR= 1, t= 4468, newT=-5719
r= 1, newR= 0, t=-5719, newT=50220
Correcting the sign of t
dPhi = 44501
ModInverse(281, 25110)
r=25110, newR=281, t= 0, newT= 1
r= 281, newR=101, t= 1, newT= -89
r= 101, newR= 79, t= -89, newT= 179
r= 79, newR= 22, t= 179, newT= -268
r= 22, newR= 13, t= -268, newT= 983
r= 13, newR= 9, t= 983, newT=-1251
r= 9, newR= 4, t=-1251, newT= 2234
r= 4, newR= 1, t= 2234, newT=-5719
r= 1, newR= 0, t=-5719, newT=25110
Correcting the sign of t
dLambda = 19391
You seem to have done the descending step of the Extended Euclidean Algorithm correctly, but since I am unfamiliar with the back-substitution calculation (as opposed to the inline form), I don't see where you made a value or arithmetic error.
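For reference, the inline form used in the tables above can be sketched in Python as follows. The function name mod_inverse is an assumption for this example, not a library call. Working by hand, carrying the back substitution through every remaining quotient (79, 202, 281 and finally 50220) appears to give 1 = 32(50220) - 5719(281), which matches the t = -5719 in the tables, so d = 50220 - 5719.

```python
def mod_inverse(e, m):
    """Return d such that (e * d) % m == 1, using the inline extended
    Euclidean algorithm (same r/newR/t/newT columns as the tables)."""
    r, new_r = m, e
    t, new_t = 0, 1
    while new_r != 0:
        quotient = r // new_r
        r, new_r = new_r, r - quotient * new_r
        t, new_t = new_t, t - quotient * new_t
    if r != 1:
        raise ValueError("e has no inverse modulo m (gcd is not 1)")
    return t % m  # "correcting the sign of t"

d_phi = mod_inverse(281, 50220)
d_lambda = mod_inverse(281, 25110)
print(d_phi, d_lambda)
```

Either value can be checked by confirming that (281 * d) % modulus equals 1.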
Say there is an array of n elements, and among them are some numbers that are much bigger than the rest.
So, I might have:
16, 1, 1, 0, 5, 0, 32, 6, 54, 1, 2, 5, 3
In this case, I'd be interested in 32, 16 and 54.
Or, I might have:
32, 105, 26, 5, 1, 82, 906, 58, 22, 88, 967, 1024, 1055
In this case, I'd be interested in 1024, 906, 967 and 1055.
I'm trying to write a function to extract the numbers of interest. The problem is that I can't define a threshold to determine what's "much greater", and I can't just tell it to get the x biggest numbers because both of these will vary depending on what the function is called against.
I'm a little stuck. Does anyone have any ideas how to attack this?
Just taking all the numbers larger than the mean doesn't cut it all the time. For example, if you have only one number that is much larger, but many more numbers that are close to each other, the one large number won't shift the mean very much, which results in taking too many numbers:
data = [ones(1,10) 2*ones(1,10) 10];
data(data>mean(data))
ans =
2 2 2 2 2 2 2 2 2 2 10
If you look at the differences between numbers, this problem is solved:
>> data = [16, 1, 1, 0, 5, 0, 32, 6, 54, 1, 2, 5, 3];
sorted_data = sort(data);
dd = diff(sorted_data);
mean_dd = mean(dd);
ii = find(dd> 2*mean_dd,1,'first');
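% dd(ii) is the gap between sorted_data(ii) and sorted_data(ii+1),
% so the large numbers start at index ii+1.
large_numbers = sorted_data(ii+1:end)
large_numbers =
16 32 54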
The threshold multiplier (2 in this case) lets you tune how much greater a number has to be.
If it were me, I'd use a little more statistical insight, which would give the most flexibility for the code in the future.
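The same gap-based idea can be sketched in Python. The function name and the default factor of 2 are assumptions for this illustration: it sorts the data, finds the first gap larger than factor times the mean gap, and keeps everything above that gap.

```python
def values_above_first_big_gap(data, factor=2.0):
    """Return the sorted values that lie above the first unusually large gap."""
    ordered = sorted(data)
    # gaps[i] is the difference between ordered[i + 1] and ordered[i]
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return []
    mean_gap = sum(gaps) / len(gaps)
    for i, gap in enumerate(gaps):
        if gap > factor * mean_gap:
            return ordered[i + 1:]
    return []  # no gap stands out

first = values_above_first_big_gap([16, 1, 1, 0, 5, 0, 32, 6, 54, 1, 2, 5, 3])
second = values_above_first_big_gap(
    [32, 105, 26, 5, 1, 82, 906, 58, 22, 88, 967, 1024, 1055])
print(first, second)
```

On the two arrays from the question this picks out 16, 32, 54 and 906, 967, 1024, 1055. It is not robust in general: in the ones-and-twos example, the step from 1 to 2 already exceeds twice the mean gap.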
x = [1 2 3 2 2 1 4 6 15 83 2 4 22 81 0 8 7 7 7 3 1 2 3]
EpicNumbers = x( x>(mean(x) + std(x)) )
Then you can increase or decrease the number of standard deviations to broaden or tighten your threshold.
LessEpicNumbers = x( x>(mean(x) + 2*std(x)) )
MoreEpicNumbers = x( x>(mean(x) + 0.5*std(x)) )
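A rough Python equivalent of the mean-plus-standard-deviation filter, as a sketch only. The function name is made up for this example. statistics.stdev is the sample standard deviation, which should correspond to MATLAB's default std normalization.

```python
from statistics import mean, stdev

def above_mean_plus_k_std(values, k=1.0):
    """Keep values greater than mean + k * (sample standard deviation)."""
    threshold = mean(values) + k * stdev(values)
    return [v for v in values if v > threshold]

x = [1, 2, 3, 2, 2, 1, 4, 6, 15, 83, 2, 4, 22, 81, 0, 8, 7, 7, 7, 3, 1, 2, 3]
epic = above_mean_plus_k_std(x)             # k = 1
less_epic = above_mean_plus_k_std(x, k=2)   # tighter threshold
more_epic = above_mean_plus_k_std(x, k=0.5) # looser threshold
print(epic, less_epic, more_epic)
```

For this particular x the two extreme values dominate the standard deviation, so all three thresholds select the same two numbers. On other data, changing k will change the selection.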
A simple solution would be to use find and a threshold based on the mean value (or multiples thereof):
a = [16, 1, 1, 0, 5, 0, 32, 6, 54, 1, 2, 5, 3]
find(a>mean(a))