ORTOOLS - CPSAT - Objective to minimize a value by intervals

In my model in OR-Tools CP-SAT, I am computing a variable called salary_var (among others). I need to minimize an objective; let's call it "taxes".
To compute the taxes, the formula is not linear but is organised this way:
if salary_var is 10084 or below, the tax rate is 0%
between 10085 and 25710, the rate is 11%
between 25711 and 73516, the rate is 30%
and 41% above that
For example, if salary_var is 30000, then the taxes are:
(25710 - 10085) * 0.11 + (30000 - 25711) * 0.3 = 1718.75 + 1286.7 ≈ 3005
My question: how can I efficiently code my "taxes" objective?
Thanks for your help
Seb
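
For reference, here is a plain-Python sketch of the bracket formula described above (not part of the original post; the helper name is just illustrative). It can be used to sanity-check whatever the model computes:

def taxes(salary):
    # 0% up to 10084, 11% from 10085 to 25710, 30% from 25711 to 73516, 41% above
    t = 0.0
    t += max(0, min(salary, 25710) - 10085) * 0.11
    t += max(0, min(salary, 73516) - 25711) * 0.30
    t += max(0, salary - 73516) * 0.41
    return t

print(taxes(30000))  # ~3005.45, matching the worked example above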

This task looks rather strange: there is not much context, and some parts of it might touch some not-so-nice areas of finite-domain solvers (large domains, or scaling/divisions during solving).
Therefore: consider this as an idea / template!
Code
from ortools.sat.python import cp_model
# Data
INPUT = 30000
INPUT_UB = 1000000
TAX_A = 11
TAX_B = 30
TAX_C = 41
# Helpers
# new variable which is constrained to be equal to: given input-var MINUS constant
# can get negative / wrap-around
def aux_var_offset(model, var, offset):
    aux_var = model.NewIntVar(-INPUT_UB, INPUT_UB, "")
    model.Add(aux_var == var - offset)
    return aux_var

# new variable which is equal to the given input-var IFF >= 0; else 0
def aux_var_nonnegative(model, var):
    aux_var = model.NewIntVar(0, INPUT_UB, "")
    model.AddMaxEquality(aux_var, [var, model.NewConstant(0)])
    return aux_var

# Model
model = cp_model.CpModel()

# vars
salary_var = model.NewIntVar(0, INPUT_UB, "salary")
tax_component_a = model.NewIntVar(0, INPUT_UB, "tax_11")
tax_component_b = model.NewIntVar(0, INPUT_UB, "tax_30")
tax_component_c = model.NewIntVar(0, INPUT_UB, "tax_41")

# constraints
model.AddMinEquality(tax_component_a, [
    aux_var_nonnegative(model, aux_var_offset(model, salary_var, 10085)),
    model.NewConstant(25710 - 10085)])
model.AddMinEquality(tax_component_b, [
    aux_var_nonnegative(model, aux_var_offset(model, salary_var, 25711)),
    model.NewConstant(73516 - 25711)])
model.Add(tax_component_c == aux_var_nonnegative(model,
    aux_var_offset(model, salary_var, 73516)))
tax_full_scaled = tax_component_a * TAX_A + tax_component_b * TAX_B + tax_component_c * TAX_C
# Demo
model.Add(salary_var == INPUT)
solver = cp_model.CpSolver()
status = solver.Solve(model)
print(list(map(lambda x: solver.Value(x), [tax_component_a, tax_component_b, tax_component_c, tax_full_scaled])))
Output
[15625, 4289, 0, 300545]
Remarks
As implemented:
uses scaled solving
produces scaled solution (300545)
no fiddling with non-integral / ratio / rounding stuff BUT large domains
Alternative:
Maybe something around AddDivisionEquality
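An untested sketch of that idea (not part of the code above): since the scaled tax is expressed in percent units, dividing by 100 inside the model via AddDivisionEquality would give whole currency units, at the cost of truncating the remainder.

tax_scaled_var = model.NewIntVar(0, 100 * INPUT_UB, "tax_scaled")
model.Add(tax_scaled_var == tax_full_scaled)
tax_var = model.NewIntVar(0, INPUT_UB, "tax")
model.AddDivisionEquality(tax_var, tax_scaled_var, 100)  # integer division, truncates
model.Minimize(tax_var)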
Edit regarding Laurent's comments
In some scenarios it can make sense to solve the scaled problem while still being able to reason easily about the real, unscaled values.
If I interpret the comment correctly, the following would be a demo (which I was not aware of, and it's cool!):
Updated Demo Code (partial)
# Demo -> Attempt of demonstrating the objective-scaling suggestion
model.Add(salary_var >= 30000)
model.Add(salary_var <= 40000)
model.Minimize(salary_var)
model.Proto().objective.scaling_factor = 0.001 # DEFINE INVERSE SCALING
solver = cp_model.CpSolver()
solver.parameters.log_search_progress = True # SCALED BACK OBJECTIVE PROGRESS
status = solver.Solve(model)
print(list(map(lambda x: solver.Value(x), [tax_component_a, tax_component_b, tax_component_c, tax_full_scaled])))
print(solver.ObjectiveValue()) # SCALED BACK OBJECTIVE
Output (excerpt)
...
...
#1 0.00s best:30 next:[30,29.999] fixed_bools:0/1
#Done 0.00s
CpSolverResponse summary:
status: OPTIMAL
objective: 30
best_bound: 30
booleans: 1
conflicts: 0
branches: 1
propagations: 0
integer_propagations: 2
restarts: 1
lp_iterations: 0
walltime: 0.0039022
usertime: 0.0039023
deterministic_time: 8e-08
primal_integral: 1.91832e-07
[15625, 4289, 0, 300545]
30.0

Related

Decay chain simulation - with significantly different time scales

I would like to simulate a decay chain with Python. Normally, (in a loop over all nuclides) one calculates the number of decays per time step and updates the number of mother and daughter nuclei.
My problem is that the decay chain contains half-lives on very different time scales, i.e.
0.0001643 seconds for Po-214 and 307106512477175.9 seconds (= 1600 years) for Ra-226.
Using the same time step for all nuclides seems useless.
Is there a simulation method, preferably in Python, that can be used to handle this case?
Don't use time steps for this. Use event scheduling.
Half-lives can be expressed as exponential decay, and the conversion between half-life and decay rate is straightforward. Start with the number of both types of nuclei, and schedule exponential inter-event times to figure out when the next decay of each type will occur. Whichever type has the lower time, decrement the corresponding number of nuclei and schedule the next decay for that type (and if need be, increment the count of whatever it decays into).
This can easily be generalized to multiple distinct event types by using a priority queue ordered by time of occurrence to determine which event will be the next one performed. This is the underlying principle behind discrete event simulation.
Update
This approach works with individual decay events, but we can leverage two important properties when we have exponential inter-event times.
The first is to note that exponentially distributed inter-event times mean these are Poisson processes. The superposition property tells us that the union of two independent Poisson processes, each having rate λ, is a Poisson process with rate 2λ. Simple induction shows that if we have n independent Poisson processes with the same rate λ, their superposition is a Poisson process with rate nλ.
The second property is that the exponential distribution is memoryless. This means that when a Poisson event occurs, we can generate the time to the next event by generating a new exponentially distributed time at the current rate and adding it to the current time.
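As a minimal illustration of how these two properties translate into code (using numpy; the numbers are just the Po-214 figures from the question):

import numpy as np

rng = np.random.default_rng()

half_life = 0.0001643          # seconds (Po-214, from the question)
mean = half_life / np.log(2)   # mean lifetime of a single nucleus
n = 1_000_000                  # current number of nuclei of this type

# Superposition: n nuclei, each decaying at rate 1/mean, form a pooled Poisson
# process with rate n/mean, so the time to the next decay anywhere in the
# population is exponential with mean mean/n.
time_to_next_decay = rng.exponential(mean / n)

# Memorylessness: after each event, draw a fresh exponential at the current
# (updated) rate and add it to the clock; no history needs to be remembered.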
You haven't provided any information about what you want in the way of output, so I arbitrarily decided to print a report showing the time and the current numbers of nuclides whenever that number was halved. I also printed a report every 10 years, given the long half-life of Ra-226.
I converted half-lives to rates using the link provided at the top of the post, and then to means, since that's what numpy's exponential generator is parameterized with. That's an easy conversion, since means and rates are inverses of each other.
Here's a Python implementation with comments:
from numpy.random import default_rng
from math import log

rng = default_rng()

# This creates a list of entries of quantities that will trigger a report.
# I've chosen to go with successive halvings of the original quantity.
def generate_report_qtys(n0):
    report_qty = []
    divisor = 2
    while divisor < n0:
        report_qty.append(n0 // divisor)  # append next half-life qty to array
        divisor *= 2
    return report_qty

seconds_per_year = 365.25 * 24 * 60 * 60
po_214_half_life = 0.0001643  # seconds
ra_226_half_life = 1590 * seconds_per_year
log_2 = log(2)
po_mean = po_214_half_life / log_2  # per-nuclide decay rate for po_214
ra_mean = ra_226_half_life / log_2  # ditto for ra_226

po_n = po_n0 = 1_000_000_000
ra_n = ra_n0 = 1_000_000_000
time = 0.0

# Generate a report when the following sets of half-lifes are reached
po_report_qtys = generate_report_qtys(po_n0)
ra_report_qtys = generate_report_qtys(ra_n0)

# Initialize first event times for each type of event:
# - first entry is polonium next event time
# - second entry is radium next event time
# - third entry is next ten year report time
next_event_time = [
    rng.exponential(po_mean / po_n),
    rng.exponential(ra_mean / ra_n),
    10 * seconds_per_year
]

# Print column labels and initial values
print("time,po_214,ra_226,time_in_years")
print(f"{time},{po_n},{ra_n},{time / seconds_per_year}")

while time < ra_226_half_life:
    # Find the index of the next event time. Index tells us the event type.
    min_index = next_event_time.index(min(next_event_time))
    if min_index == 0:
        po_n -= 1  # decrement polonium count
        time = next_event_time[0]  # update clock to the event time
        if po_n > 0:
            next_event_time[0] += rng.exponential(po_mean / po_n)  # determine next event time for po
        else:
            next_event_time[0] = float('Inf')
        # print report if this is a half-life occurrence
        if len(po_report_qtys) > 0 and po_n == po_report_qtys[0]:
            po_report_qtys.pop(0)  # remove this occurrence from the list
            print(f"{time},{po_n},{ra_n},{time / seconds_per_year}")
    elif min_index == 1:
        # same as above, but for radium
        ra_n -= 1
        time = next_event_time[1]
        if ra_n > 0:
            next_event_time[1] += rng.exponential(ra_mean / ra_n)
        else:
            next_event_time[1] = float('Inf')
        if len(ra_report_qtys) > 0 and ra_n == ra_report_qtys[0]:
            ra_report_qtys.pop(0)
            print(f"{time},{po_n},{ra_n},{time / seconds_per_year}")
    else:
        # update clock, print ten year report
        time = next_event_time[2]
        next_event_time[2] += 10 * seconds_per_year
        print(f"{time},{po_n},{ra_n},{time / seconds_per_year}")
Run times are proportional to the number of nuclides. Running with a billion of each took 831.28 s on my M1 MacBook Pro, versus 2.19 s for a million of each. I also ported this to Crystal, a compiled Ruby-like language, which produced comparable results in 32 seconds for a billion of each nuclide. I would recommend a compiled language if you intend to run larger problems, but note that with half-life reporting, as used here, smaller population sizes give virtually identical results and are obtained much more rapidly.
I would also suggest that if you want to use this approach for a more complex model, you should use a priority queue of tuples containing time and type of event to store the set of pending future events rather than a simple list.
Last but not least, here's some sample output:
time,po_214,ra_226,time_in_years
0.0,1000000000,1000000000,0.0
0.0001642985647308265,500000000,1000000000,5.20630734690935e-12
0.0003286071415481526,250000000,1000000000,1.0412931957694901e-11
0.0004929007624958987,125000000,1000000000,1.5619082645571865e-11
0.0006571750701843468,62500000,1000000000,2.082462133319222e-11
0.0008214861652253772,31250000,1000000000,2.6031325741671646e-11
0.0009858208114474198,15625000,1000000000,3.1238776442043114e-11
0.0011502417677631668,7812500,1000000000,3.6448962144243124e-11
0.0013145712145548718,3906250,1000000000,4.165624808460947e-11
0.0014788866075394896,1953125,1000000000,4.686308868670272e-11
0.0016432124609700412,976562,1000000000,5.2070260760325286e-11
0.001807832817519779,488281,1000000000,5.728676507465013e-11
0.001972981254301889,244140,1000000000,6.252000324175124e-11
0.0021372947080755688,122070,1000000000,6.772678239395799e-11
0.002301139510796509,61035,1000000000,7.29187108904514e-11
0.0024642826956509244,30517,1000000000,7.808840645837847e-11
0.0026302282280720344,15258,1000000000,8.33469030620844e-11
0.0027944471221414947,7629,1000000000,8.855068579808016e-11
0.002954014120737834,3814,1000000000,9.3607058861822e-11
0.0031188370035748177,1907,1000000000,9.882998084692174e-11
0.003282466175503322,953,1000000000,1.0401507641592902e-10
0.003457552492113242,476,1000000000,1.0956322699169905e-10
0.003601851131916978,238,1000000000,1.1413577496124477e-10
0.0037747824699194033,119,1000000000,1.1961563838566314e-10
0.0039512825256332275,59,1000000000,1.252085876503038e-10
0.004124330529803301,29,1000000000,1.3069214800248755e-10
0.004337121375518753,14,1000000000,1.3743508300754027e-10
0.004535068261934763,7,1000000000,1.437076413268044e-10
0.004890820999020369,3,1000000000,1.5498076529965425e-10
0.004909065046898487,1,1000000000,1.555588842908994e-10
315576000.0,0,995654793,10.0
631152000.0,0,991322602,20.0
946728000.0,0,987010839,30.0
1262304000.0,0,982711723,40.0
1577880000.0,0,978442651,50.0
1893456000.0,0,974185269,60.0
2209032000.0,0,969948418,70.0
2524608000.0,0,965726762,80.0
2840184000.0,0,961524848,90.0
3155760000.0,0,957342148,100.0
3471336000.0,0,953178898,110.0
3786912000.0,0,949029294,120.0
4102488000.0,0,944898063,130.0
4418064000.0,0,940790494,140.0
4733640000.0,0,936699123,150.0
5049216000.0,0,932622334,160.0
5364792000.0,0,928565676,170.0
5680368000.0,0,924523267,180.0
5995944000.0,0,920499586,190.0
6311520000.0,0,916497996,200.0
6627096000.0,0,912511030,210.0
6942672000.0,0,908543175,220.0
7258248000.0,0,904590364,230.0
7573824000.0,0,900656301,240.0
7889400000.0,0,896738632,250.0
8204976000.0,0,892838664,260.0
8520552000.0,0,888956681,270.0
8836128000.0,0,885084855,280.0
9151704000.0,0,881232862,290.0
9467280000.0,0,877401861,300.0
9782856000.0,0,873581425,310.0
10098432000.0,0,869785364,320.0
10414008000.0,0,866002042,330.0
10729584000.0,0,862234212,340.0
11045160000.0,0,858485627,350.0
11360736000.0,0,854749939,360.0
11676312000.0,0,851032010,370.0
11991888000.0,0,847329028,380.0
12307464000.0,0,843640016,390.0
12623040000.0,0,839968529,400.0
12938616000.0,0,836314000,410.0
13254192000.0,0,832673999,420.0
13569768000.0,0,829054753,430.0
13885344000.0,0,825450233,440.0
14200920000.0,0,821859757,450.0
14516496000.0,0,818284787,460.0
14832072000.0,0,814727148,470.0
15147648000.0,0,811184419,480.0
15463224000.0,0,807655470,490.0
15778800000.0,0,804139970,500.0
16094376000.0,0,800643280,510.0
16409952000.0,0,797159389,520.0
16725528000.0,0,793692735,530.0
17041104000.0,0,790239221,540.0
17356680000.0,0,786802135,550.0
17672256000.0,0,783380326,560.0
17987832000.0,0,779970864,570.0
18303408000.0,0,776576174,580.0
18618984000.0,0,773197955,590.0
18934560000.0,0,769836170,600.0
19250136000.0,0,766488931,610.0
19565712000.0,0,763154778,620.0
19881288000.0,0,759831742,630.0
20196864000.0,0,756528400,640.0
20512440000.0,0,753237814,650.0
20828016000.0,0,749961747,660.0
21143592000.0,0,746699940,670.0
21459168000.0,0,743450395,680.0
21774744000.0,0,740219531,690.0
22090320000.0,0,736999181,700.0
22405896000.0,0,733793266,710.0
22721472000.0,0,730602000,720.0
23037048000.0,0,727427544,730.0
23352624000.0,0,724260327,740.0
23668200000.0,0,721110260,750.0
23983776000.0,0,717973915,760.0
24299352000.0,0,714851218,770.0
24614928000.0,0,711740161,780.0
24930504000.0,0,708645945,790.0
25246080000.0,0,705559170,800.0
25561656000.0,0,702490991,810.0
25877232000.0,0,699436919,820.0
26192808000.0,0,696394898,830.0
26508384000.0,0,693364883,840.0
26823960000.0,0,690348242,850.0
27139536000.0,0,687345934,860.0
27455112000.0,0,684354989,870.0
27770688000.0,0,681379178,880.0
28086264000.0,0,678414567,890.0
28401840000.0,0,675461363,900.0
28717416000.0,0,672522494,910.0
29032992000.0,0,669598412,920.0
29348568000.0,0,666687807,930.0
29664144000.0,0,663787671,940.0
29979720000.0,0,660901676,950.0
30295296000.0,0,658027332,960.0
30610872000.0,0,655164886,970.0
30926448000.0,0,652315268,980.0
31242024000.0,0,649481821,990.0
31557600000.0,0,646656096,1000.0
31873176000.0,0,643841377,1010.0
32188752000.0,0,641041609,1020.0
32504328000.0,0,638253759,1030.0
32819904000.0,0,635479981,1040.0
33135480000.0,0,632713706,1050.0
33451056000.0,0,629962868,1060.0
33766632000.0,0,627223350,1070.0
34082208000.0,0,624494821,1080.0
34397784000.0,0,621778045,1090.0
34713360000.0,0,619076414,1100.0
35028936000.0,0,616384399,1110.0
35344512000.0,0,613702920,1120.0
35660088000.0,0,611035112,1130.0
35975664000.0,0,608376650,1140.0
36291240000.0,0,605729994,1150.0
36606816000.0,0,603093946,1160.0
36922392000.0,0,600469403,1170.0
37237968000.0,0,597854872,1180.0
37553544000.0,0,595254881,1190.0
37869120000.0,0,592663681,1200.0
38184696000.0,0,590085028,1210.0
38500272000.0,0,587517782,1220.0
38815848000.0,0,584961743,1230.0
39131424000.0,0,582420312,1240.0
39447000000.0,0,579886455,1250.0
39762576000.0,0,577362514,1260.0
40078152000.0,0,574849251,1270.0
40393728000.0,0,572346625,1280.0
40709304000.0,0,569856166,1290.0
41024880000.0,0,567377753,1300.0
41340456000.0,0,564908008,1310.0
41656032000.0,0,562450828,1320.0
41971608000.0,0,560005832,1330.0
42287184000.0,0,557570018,1340.0
42602760000.0,0,555143734,1350.0
42918336000.0,0,552729893,1360.0
43233912000.0,0,550326162,1370.0
43549488000.0,0,547932312,1380.0
43865064000.0,0,545550017,1390.0
44180640000.0,0,543178924,1400.0
44496216000.0,0,540814950,1410.0
44811792000.0,0,538462704,1420.0
45127368000.0,0,536123339,1430.0
45442944000.0,0,533792776,1440.0
45758520000.0,0,531469163,1450.0
46074096000.0,0,529157093,1460.0
46389672000.0,0,526854383,1470.0
46705248000.0,0,524564196,1480.0
47020824000.0,0,522282564,1490.0
47336400000.0,0,520011985,1500.0
47651976000.0,0,517751635,1510.0
47967552000.0,0,515499791,1520.0
48283128000.0,0,513257373,1530.0
48598704000.0,0,511022885,1540.0
48914280000.0,0,508798440,1550.0
49229856000.0,0,506582663,1560.0
49545432000.0,0,504379227,1570.0
49861008000.0,0,502186693,1580.0
50176584000.0,0,500000869,1590.0
Expanded for More than 2 Nuclides
I mentioned that for more than a couple of nuclides you'd want to use a priority queue to track which decays occur next. I reorganized the code around functions, but that allowed greater flexibility in expanding the scope of the problem. Here you go:
#!/usr/bin/env python3

from numpy.random import default_rng
from math import log
import heapq

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
LOG_2 = log(2)

rng = default_rng()

def generate_report_qtys(n0):
    report_qty = []
    divisor = 2
    while divisor < n0:
        report_qty.append(n0 // divisor)  # append next half-life qty to array
        divisor *= 2
    return report_qty

po_n0 = 10_000_000
ra_n0 = 10_000_000
mu_n0 = 10_000_000

# mean is half-life / LOG_2
properties = dict(
    po_214 = dict(
        mean = 0.0001643 / LOG_2,
        qty = po_n0,
        report_qtys = generate_report_qtys(po_n0)
    ),
    ra_226 = dict(
        mean = 1590 * SECONDS_PER_YEAR / LOG_2,
        qty = ra_n0,
        report_qtys = generate_report_qtys(ra_n0)
    ),
    made_up = dict(
        mean = 75 * SECONDS_PER_YEAR / LOG_2,
        qty = mu_n0,
        report_qtys = generate_report_qtys(mu_n0)
    )
)

nuclide_names = [name for name in properties.keys()]

def population_mean(nuclide):
    return properties[nuclide]['mean'] / properties[nuclide]['qty']

def report():  # isolate as single point of maintenance even though it's a one-liner
    nuc_qtys = [str(properties[nuclide]['qty']) for nuclide in nuclide_names]
    print(f"{time},{time / SECONDS_PER_YEAR}," + ','.join(nuc_qtys))

def decay_event(nuclide):
    properties[nuclide]['qty'] -= 1
    current_qty = properties[nuclide]['qty']
    if current_qty > 0:
        heapq.heappush(event_q, (time + rng.exponential(population_mean(nuclide)), nuclide))
    rep_qty = properties[nuclide]['report_qtys']
    if len(rep_qty) > 0 and current_qty == rep_qty[0]:
        rep_qty.pop(0)  # remove this occurrence from the list
        report()

def report_event():
    heapq.heappush(event_q, (time + 10 * SECONDS_PER_YEAR, 'report_event'))
    report()

event_q = [(rng.exponential(population_mean(nuclide)), nuclide) for nuclide in nuclide_names]
event_q.append((0.0, "report_event"))
heapq.heapify(event_q)

time = 0.0  # simulated time

print("time(seconds),time(years)," + ','.join(nuclide_names))  # column labels

while time < 1600 * SECONDS_PER_YEAR:
    time, event_id = heapq.heappop(event_q)
    if event_id == 'report_event':
        report_event()
    else:
        decay_event(event_id)
To add more nuclides, add more entries to the properties dictionary, following the template of the current entries.
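For instance, a hypothetical extra entry (the nuclide name and half-life below are made up purely for illustration) would slot in like this; note it has to be in place before nuclide_names and event_q are built:

xx_n0 = 10_000_000
properties['xx_123'] = dict(
    mean = 3.8 * 24 * 60 * 60 / LOG_2,   # e.g. a half-life of 3.8 days, in seconds
    qty = xx_n0,
    report_qtys = generate_report_qtys(xx_n0)
)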

Accuracy is not increasing, though loss is decreasing

I am feeding CNN features into a GPflow model. I am including the relevant chunks of code from my program here. I am using tape.gradient with the Adam optimizer (scheduled lr). My accuracy gets stuck at 47% and, surprisingly, my loss still keeps decreasing. It's very weird. I have debugged the program: the CNN features are OK, but the GP model is not learning. Please can you check the training loop and let me know where I am wrong.
def optimization_step(gp_model: gpflow.models.SVGP, image_data, labels):
    with tf.GradientTape(watch_accessed_variables=False) as tape:
        tape.watch(gp_model.trainable_variables)
        cnn_feat = cnn_model(image_data, training=False)
        cnn_feat = tf.cast(cnn_feat, dtype=default_float())
        labels = tf.cast(labels, dtype=np.int64)
        data = (cnn_feat, labels)
        loss = gp_model.training_loss(data)
        gp_grads = tape.gradient(loss, gp_model.trainable_variables)
    gp_optimizer.apply_gradients(zip(gp_grads, gp_model.trainable_variables))
    return loss, cnn_feat
The training loop is:
def simple_training_loop(gp_model: gpflow.models.SVGP, epochs: int = 3, logging_epoch_freq: int = 10):
    total_loss = []
    features = []
    tf_optimization_step = tf.function(optimization_step, autograph=False)
    for epoch in range(epochs):
        lr.assign(max(args.learning_rate_clip, args.learning_rate * (args.decay_rate ** epoch)))
        data_loader.shuffle_data(args.is_training)
        for b in range(data_loader.n_batches):
            batch_x, batch_y = data_loader.next_batch(b)
            batch_x = tf.convert_to_tensor(batch_x)
            batch_y = tf.convert_to_tensor(batch_y)
            loss, features_CNN = tf_optimization_step(gp_model, batch_x, batch_y)
I am restoring the weights for the CNN from checkpoints saved during transfer learning.
With more epochs, the loss continues to decrease, but the accuracy starts decreasing as well.
The GP model declaration is as follows:
kernel = gpflow.kernels.Matern32() + gpflow.kernels.White(variance=0.01)
invlink = gpflow.likelihoods.RobustMax(C)
likelihood = gpflow.likelihoods.MultiClass(C, invlink=invlink)
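For context (not shown in the original post), a GPflow 2 SVGP built from these pieces would typically look something like the following; the inducing points Z and the number of classes C are assumed to be defined elsewhere:

gp_model = gpflow.models.SVGP(
    kernel=kernel,
    likelihood=likelihood,
    inducing_variable=Z,   # e.g. an (M, D) array of inducing locations
    num_latent_gps=C,      # one latent GP per class
)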
The test function:
cnn_feat=cnn_model(test_x,training=False)
cnn_feat = tf.cast(cnn_feat, dtype=default_float())
mean, var = gp_model.predict_f(cnn_feat)
preds = np.argmax(mean, 1).reshape(test_labels.shape)
correct = (preds == test_labels.numpy().astype(int))
acc = np.average(correct.astype(float)) * 100
Can you please check whether the training loop is correctly written?
The training loop looks fine. However, there are bits that should be modified for clarity and for optimisation's sake.
def simple_training_loop(gp_model: gpflow.models.SVGP, epochs: int = 3, logging_epoch_freq: int = 10):
    total_loss = []
    features = []

    @tf.function
    def compute_cnn_feat(x: tf.Tensor) -> tf.Tensor:
        return tf.cast(cnn_model(x, training=False), dtype=default_float())

    @tf.function
    def optimization_step(cnn_feat: tf.Tensor, labels: tf.Tensor):  # **Change 1.**
        with tf.GradientTape(watch_accessed_variables=False) as tape:
            tape.watch(gp_model.trainable_variables)
            data = (cnn_feat, labels)
            loss = gp_model.training_loss(data)
        gp_grads = tape.gradient(loss, gp_model.trainable_variables)  # **Change 2.**
        gp_optimizer.apply_gradients(zip(gp_grads, gp_model.trainable_variables))
        return loss

    for epoch in range(epochs):
        lr.assign(max(args.learning_rate_clip, args.learning_rate * (args.decay_rate ** epoch)))
        data_loader.shuffle_data(args.is_training)
        for b in range(data_loader.n_batches):
            batch_x, batch_y = data_loader.next_batch(b)
            batch_x = tf.convert_to_tensor(batch_x)
            batch_y = tf.convert_to_tensor(batch_y, dtype=default_float())
            cnn_feat = compute_cnn_feat(batch_x)  # **Change 3.**
            loss = optimization_step(cnn_feat, batch_y)
Change 1. Signature of a function that you wrap with tf.function should not have mutable objects.
Change 2. The gradient tape will track all computations inside the context manager, including the computation of the gradients i.e. tape.gradient(...). In turn, that means your code performs an unnecessary calculation.
Change 3. For the same reason as in "Change 2." I moved the CNN feature extraction outside of the gradient tape.

Implementing Adam in Pytorch

I'm trying to implement Adam myself for learning purposes.
Here is my Adam implementation:
import math
import torch
from torch.optim import Optimizer

class ADAMOptimizer(Optimizer):
    """
    implements ADAM Algorithm, as a preceding step.
    """
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.99), eps=1e-8, weight_decay=0):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        super(ADAMOptimizer, self).__init__(params, defaults)

    def step(self):
        """
        Performs a single optimization step.
        """
        loss = None
        for group in self.param_groups:
            #print(group.keys())
            #print (self.param_groups[0]['params'][0].size()), First param (W) size: torch.Size([10, 784])
            #print (self.param_groups[0]['params'][1].size()), Second param (b) size: torch.Size([10])
            for p in group['params']:
                grad = p.grad.data
                state = self.state[p]

                # State initialization
                if len(state) == 0:
                    state['step'] = 0
                    # Momentum (Exponential MA of gradients)
                    state['exp_avg'] = torch.zeros_like(p.data)
                    #print(p.data.size())
                    # RMSProp component (Exponential MA of squared gradients). Denominator.
                    state['exp_avg_sq'] = torch.zeros_like(p.data)

                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                b1, b2 = group['betas']
                state['step'] += 1

                # L2 penalty. Gotta add to the gradient as well.
                if group['weight_decay'] != 0:
                    grad = grad.add(group['weight_decay'], p.data)

                # Momentum
                exp_avg = torch.mul(exp_avg, b1) + (1 - b1) * grad
                # RMS
                exp_avg_sq = torch.mul(exp_avg_sq, b2) + (1 - b2) * (grad * grad)

                denom = exp_avg_sq.sqrt() + group['eps']

                bias_correction1 = 1 / (1 - b1 ** state['step'])
                bias_correction2 = 1 / (1 - b2 ** state['step'])

                adapted_learning_rate = group['lr'] * bias_correction1 / math.sqrt(bias_correction2)

                p.data = p.data - adapted_learning_rate * exp_avg / denom

                if state['step'] % 10000 == 0:
                    print("group:", group)
                    print("p: ", p)
                    print("p.data: ", p.data)  # W = p.data

        return loss
I think I implemented everything correctly; however, the loss graph of my implementation is very spiky compared to that of torch.optim.Adam.
My ADAM implementation loss graph (below)
torch.optim.Adam loss graph (below)
If someone could tell me what I am doing wrong, I’ll be very grateful.
For the full code including data, graph (super easy to run): https://github.com/byorxyz/AMS_pytorch/blob/master/AdamFails_1dConvex.ipynb
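For comparison (not from the original post), here is a minimal sketch of the textbook Adam update with the moment estimates kept in the optimizer state via in-place tensor updates; it can serve as a reference point when checking the implementation above:

import torch

def adam_step(p, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update for a single parameter tensor p with gradient p.grad.
    grad = p.grad
    state['step'] += 1
    # Update the first and second moment estimates in place so they persist
    # between calls.
    state['exp_avg'].mul_(b1).add_(grad, alpha=1 - b1)
    state['exp_avg_sq'].mul_(b2).addcmul_(grad, grad, value=1 - b2)
    # Bias-corrected moments.
    m_hat = state['exp_avg'] / (1 - b1 ** state['step'])
    v_hat = state['exp_avg_sq'] / (1 - b2 ** state['step'])
    p.data.add_(-lr * m_hat / (v_hat.sqrt() + eps))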

Openmdao V1.7 Sellar MDF

I found something strange with the MDA of the Sellar problem on the doc page of OpenMDAO (http://openmdao.readthedocs.io/en/1.7.3/usr-guide/tutorials/sellar.html).
If I extract the code and only run the MDA (adding counters in the disciplines), I observe that the number of calls differs between the disciplines (twice as many calls to d1 as to d2), which is not expected. Does someone have an answer?
Here is the results
Coupling vars: 25.588303, 12.058488
Number of discipline 1 and 2 calls (10,5)
And here is the code
# For printing, use this import if you are running Python 2.x
from __future__ import print_function

import numpy as np

from openmdao.api import Component
from openmdao.api import ExecComp, IndepVarComp, Group, NLGaussSeidel, \
                         ScipyGMRES


class SellarDis1(Component):
    """Component containing Discipline 1."""

    def __init__(self):
        super(SellarDis1, self).__init__()

        # Global Design Variable
        self.add_param('z', val=np.zeros(2))
        # Local Design Variable
        self.add_param('x', val=0.)
        # Coupling parameter
        self.add_param('y2', val=1.0)
        # Coupling output
        self.add_output('y1', val=1.0)

        self.execution_count = 0

    def solve_nonlinear(self, params, unknowns, resids):
        """Evaluates the equation
        y1 = z1**2 + z2 + x1 - 0.2*y2"""

        z1 = params['z'][0]
        z2 = params['z'][1]
        x1 = params['x']
        y2 = params['y2']

        unknowns['y1'] = z1**2 + z2 + x1 - 0.2*y2
        self.execution_count += 1

    def linearize(self, params, unknowns, resids):
        """ Jacobian for Sellar discipline 1."""
        J = {}

        J['y1', 'y2'] = -0.2
        J['y1', 'z'] = np.array([[2*params['z'][0], 1.0]])
        J['y1', 'x'] = 1.0

        return J


class SellarDis2(Component):
    """Component containing Discipline 2."""

    def __init__(self):
        super(SellarDis2, self).__init__()

        # Global Design Variable
        self.add_param('z', val=np.zeros(2))
        # Coupling parameter
        self.add_param('y1', val=1.0)
        # Coupling output
        self.add_output('y2', val=1.0)

        self.execution_count = 0

    def solve_nonlinear(self, params, unknowns, resids):
        """Evaluates the equation
        y2 = y1**(.5) + z1 + z2"""

        z1 = params['z'][0]
        z2 = params['z'][1]
        y1 = params['y1']

        # Note: this may cause some issues. However, y1 is constrained to be
        # above 3.16, so lets just let it converge, and the optimizer will
        # throw it out
        y1 = abs(y1)

        unknowns['y2'] = y1**.5 + z1 + z2
        self.execution_count += 1

    def linearize(self, params, unknowns, resids):
        """ Jacobian for Sellar discipline 2."""
        J = {}

        J['y2', 'y1'] = .5*params['y1']**-.5

        # Extra set of brackets below ensure we have a 2D array instead of a 1D array
        # for the Jacobian; Note that Jacobian is 2D (num outputs x num inputs).
        J['y2', 'z'] = np.array([[1.0, 1.0]])

        return J


class SellarDerivatives(Group):
    """ Group containing the Sellar MDA. This version uses the disciplines
    with derivatives."""

    def __init__(self):
        super(SellarDerivatives, self).__init__()

        self.add('px', IndepVarComp('x', 1.0), promotes=['x'])
        self.add('pz', IndepVarComp('z', np.array([5.0, 2.0])), promotes=['z'])

        self.add('d1', SellarDis1(), promotes=['z', 'x', 'y1', 'y2'])
        self.add('d2', SellarDis2(), promotes=['z', 'y1', 'y2'])

        self.add('obj_cmp', ExecComp('obj = x**2 + z[1] + y1 + exp(-y2)',
                                     z=np.array([0.0, 0.0]), x=0.0, y1=0.0, y2=0.0),
                 promotes=['obj', 'z', 'x', 'y1', 'y2'])

        self.add('con_cmp1', ExecComp('con1 = 3.16 - y1'), promotes=['y1', 'con1'])
        self.add('con_cmp2', ExecComp('con2 = y2 - 24.0'), promotes=['con2', 'y2'])

        self.nl_solver = NLGaussSeidel()
        self.nl_solver.options['atol'] = 1.0e-12

        self.ln_solver = ScipyGMRES()


from openmdao.api import Problem, ScipyOptimizer

top = Problem()
top.root = SellarDerivatives()

#top.driver = ScipyOptimizer()
#top.driver.options['optimizer'] = 'SLSQP'
#top.driver.options['tol'] = 1.0e-8
#
#top.driver.add_desvar('z', lower=np.array([-10.0, 0.0]),
#                      upper=np.array([10.0, 10.0]))
#top.driver.add_desvar('x', lower=0.0, upper=10.0)
#
#top.driver.add_objective('obj')
#top.driver.add_constraint('con1', upper=0.0)
#top.driver.add_constraint('con2', upper=0.0)

top.setup()

# Setting initial values for design variables
top['x'] = 1.0
top['z'] = np.array([5.0, 2.0])

top.run()

print("\n")
print("Coupling vars: %f, %f" % (top['y1'], top['y2']))

count1 = top.root.d1.execution_count
count2 = top.root.d2.execution_count
print("Number of discipline 1 and 2 calls (%i,%i)" % (count1, count2))
This is a good observation. Whenever you have a cycle, the "head" component runs a second time. The reason is as follows:
If you have a model with components that contain implicit states, a single execution looks like this:
Call solve_nonlinear to execute components
Call apply_nonlinear to calculate the residuals.
We don't have any components with implicit states in this model, but we indirectly created the need for one by having a cycle. Our execution looks like this:
Call solve_nonlinear to execute all components.
Call apply_nonlinear (which caches the unknowns, calls solve_nonlinear, and saves the difference in unknowns) on just the "head" component to generate a residual that we can converge.
Here, the head component is just the first component that is executed based on however it determines what order to run the cycle in. You can verify that only a single head component gets extra runs by building a cycle with more than 2 components.
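As an untested sketch of that verification (built only from the same OpenMDAO 1.x API used in the question; the component itself is made up), one could wire three trivial components into a cycle and compare their counters after a run; only the head component should show the extra executions:

from openmdao.api import Component, Group, Problem, NLGaussSeidel

class Relax(Component):
    """y = 0.5*x + 1, with an execution counter."""
    def __init__(self, in_name, out_name):
        super(Relax, self).__init__()
        self.in_name = in_name
        self.out_name = out_name
        self.add_param(in_name, val=1.0)
        self.add_output(out_name, val=1.0)
        self.execution_count = 0

    def solve_nonlinear(self, params, unknowns, resids):
        unknowns[self.out_name] = 0.5 * params[self.in_name] + 1.0
        self.execution_count += 1

root = Group()
root.add('c1', Relax('v3', 'v1'), promotes=['v3', 'v1'])
root.add('c2', Relax('v1', 'v2'), promotes=['v1', 'v2'])
root.add('c3', Relax('v2', 'v3'), promotes=['v2', 'v3'])
root.nl_solver = NLGaussSeidel()

top = Problem()
top.root = root
top.setup()
top.run()

for name in ['c1', 'c2', 'c3']:
    print(name, getattr(top.root, name).execution_count)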

Clarification between Epoch and iteration

This answer points to the difference between an epoch and an iteration while training a neural network. However, when I look at the source code for the solver API in the Stanford CS231n course (and I'm assuming this is the case for most libraries out there as well), during each iteration, batch_size examples are randomly selected with replacement. Thus, there is no guarantee that all examples will be seen during each epoch, is there?
Does an epoch then mean that all examples would be seen in expectation? Or am I understanding this wrong?
Relevant Source Code:
def _step(self):
    """
    Make a single gradient update. This is called by train() and should not
    be called manually.
    """
    # Make a minibatch of training data
    num_train = self.X_train.shape[0]
    batch_mask = np.random.choice(num_train, self.batch_size)
    X_batch = self.X_train[batch_mask]
    y_batch = self.y_train[batch_mask]

    # Compute loss and gradient
    loss, grads = self.model.loss(X_batch, y_batch)
    self.loss_history.append(loss)

    # Perform a parameter update
    for p, w in self.model.params.iteritems():
        dw = grads[p]
        config = self.optim_configs[p]
        next_w, next_config = self.update_rule(w, dw, config)
        self.model.params[p] = next_w
        self.optim_configs[p] = next_config


def train(self):
    """
    Run optimization to train the model.
    """
    num_train = self.X_train.shape[0]
    iterations_per_epoch = max(num_train / self.batch_size, 1)
    num_iterations = self.num_epochs * iterations_per_epoch

    for t in xrange(num_iterations):
        self._step()

        # Maybe print training loss
        if self.verbose and t % self.print_every == 0:
            print '(Iteration %d / %d) loss: %f' % (
                t + 1, num_iterations, self.loss_history[-1])

        # At the end of every epoch, increment the epoch counter and decay the
        # learning rate.
        epoch_end = (t + 1) % iterations_per_epoch == 0
        if epoch_end:
            self.epoch += 1
            for k in self.optim_configs:
                self.optim_configs[k]['learning_rate'] *= self.lr_decay

        # Check train and val accuracy on the first iteration, the last
        # iteration, and at the end of each epoch.
        first_it = (t == 0)
        last_it = (t == num_iterations + 1)
        if first_it or last_it or epoch_end:
            train_acc = self.check_accuracy(self.X_train, self.y_train,
                                            num_samples=1000)
            val_acc = self.check_accuracy(self.X_val, self.y_val)
            self.train_acc_history.append(train_acc)
            self.val_acc_history.append(val_acc)

            if self.verbose:
                print '(Epoch %d / %d) train acc: %f; val_acc: %f' % (
                    self.epoch, self.num_epochs, train_acc, val_acc)

            # Keep track of the best model
            if val_acc > self.best_val_acc:
                self.best_val_acc = val_acc
                self.best_params = {}
                for k, v in self.model.params.iteritems():
                    self.best_params[k] = v.copy()

    # At the end of training swap the best params into the model
    self.model.params = self.best_params
Thanks.
I believe, as you say, that in the Stanford course they are effectively using "epoch" with the less strict meaning of "expected number of times each example is seen during training". However, in my experience, most implementations consider an epoch as running through every example in the training set once, and I'd say they only chose the sampling with replacement for simplicity. If you have a good amount of data, chances are that you will not see a difference, but still, it is more correct to sample without replacement until there are no more examples.
You can check, for example, how Keras does the training in its source code; it's a bit complicated, but the important point is that make_batches is called to split the (possibly shuffled) examples into batches, which matches your initial idea of "epoch".
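For illustration (not from either post), a minimal sketch of the stricter notion of an epoch, sampling without replacement, in contrast to the np.random.choice call in _step above:

import numpy as np

def epoch_batches(num_train, batch_size, rng=np.random):
    # Shuffle all indices once, then slice them into batches, so every example
    # is seen exactly once per epoch.
    idx = rng.permutation(num_train)
    for start in range(0, num_train, batch_size):
        yield idx[start:start + batch_size]

# usage: for batch_mask in epoch_batches(X_train.shape[0], 128): X_batch = X_train[batch_mask]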