Distributed Search Issues - Max number of local indexes - sphinx

I'm running 1.10-beta and have a distributed index setup. There are 4 base chunks and 4
delta chunks making the dist index look like this:
index dist_idx
{
    type = distributed
    local = b_0
    local = b_1
    local = b_2
    local = b_3
    local = d_0
    local = d_1
    local = d_2
    local = d_3
}
Each chunk is partitioned on the unique id (id MOD 4). My problem is that only the first 7 local indexes are used.
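For reference, the chunk sources presumably look something like the sketch below (hypothetical: table and field names are made up, connection settings omitted):

source src_b_0
{
    # each base chunk indexes one residue class of the unique id
    sql_query = SELECT id, title, content FROM documents WHERE id % 4 = 0
}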
In my test case I have the following data:
b_0 = 4 8
b_1 = 5
b_2 = 2 6
b_3 = 3 7
d_1 = 1
If I change the dist index to contain only those 5 locals and search against dist_idx, all 8 records come back; but if I search against it as first described, only records 2-8 return.
Does anyone have any ideas? I don't really want to combine my deltas, although that would probably work.
REF: http://www.sphinxsearch.com/forum/view.html?id=5163

Related

OR-Tools CP-SAT - Objective to minimize a value by intervals

In my model in OR-Tools CP-SAT, I am computing a variable called salary_var (among others). I need to minimize an objective; let's call it "taxes".
To compute the taxes, the formula is not linear but organized in brackets:
if salary_var is at most 10084, the rate is 0%
between 10085 and 25710, the rate is 11%
between 25711 and 73516, the rate is 30%
and 41% above that
For example, if salary_var is 30000, then the taxes are:
(25710 - 10085) * 0.11 + (30000 - 25711) * 0.3 = 1718.75 + 1286.70 = 3005.45
My question: how can I efficiently code my "taxes" objective?
Thanks for your help
Seb
This task looks rather strange: there is not much context, and some parts of the task touch not-so-nice areas of finite-domain solvers (large domains, scaling, and divisions during solving).
Therefore: consider this as an idea / template!
Code
from ortools.sat.python import cp_model

# Data
INPUT = 30000
INPUT_UB = 1000000
TAX_A = 11
TAX_B = 30
TAX_C = 41

# Helpers
# New variable constrained to equal: given input-var MINUS constant.
# Can become negative.
def aux_var_offset(model, var, offset):
    aux_var = model.NewIntVar(-INPUT_UB, INPUT_UB, "")
    model.Add(aux_var == var - offset)
    return aux_var

# New variable equal to the given input-var iff it is >= 0; else 0.
def aux_var_nonnegative(model, var):
    aux_var = model.NewIntVar(0, INPUT_UB, "")
    model.AddMaxEquality(aux_var, [var, model.NewConstant(0)])
    return aux_var

# Model
model = cp_model.CpModel()

# vars
salary_var = model.NewIntVar(0, INPUT_UB, "salary")
tax_component_a = model.NewIntVar(0, INPUT_UB, "tax_11")
tax_component_b = model.NewIntVar(0, INPUT_UB, "tax_30")
tax_component_c = model.NewIntVar(0, INPUT_UB, "tax_41")

# constraints: each component is the (clamped) part of the salary
# falling into the corresponding bracket
model.AddMinEquality(tax_component_a, [
    aux_var_nonnegative(model, aux_var_offset(model, salary_var, 10085)),
    model.NewConstant(25710 - 10085)])
model.AddMinEquality(tax_component_b, [
    aux_var_nonnegative(model, aux_var_offset(model, salary_var, 25711)),
    model.NewConstant(73516 - 25711)])
model.Add(tax_component_c == aux_var_nonnegative(model,
    aux_var_offset(model, salary_var, 73516)))

# taxes scaled by 100 to stay integral (percentages as integers)
tax_full_scaled = (tax_component_a * TAX_A
                   + tax_component_b * TAX_B
                   + tax_component_c * TAX_C)

# Demo
model.Add(salary_var == INPUT)
solver = cp_model.CpSolver()
status = solver.Solve(model)
print([solver.Value(x) for x in
       [tax_component_a, tax_component_b, tax_component_c, tax_full_scaled]])
Output
[15625, 4289, 0, 300545]
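To relate this scaled output to the hand computation in the question, a plain-Python cross-check (no solver; bracket constants taken from the question) could look like this:

def taxes(salary):
    # clamped amount of salary falling into each bracket
    a = min(max(salary - 10085, 0), 25710 - 10085)   # 11% band
    b = min(max(salary - 25711, 0), 73516 - 25711)   # 30% band
    c = max(salary - 73516, 0)                       # 41% band
    return 0.11 * a + 0.30 * b + 0.41 * c

print(taxes(30000))  # 3005.45, i.e. 300545 / 100 from the solver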
Remarks
As implemented, the model:
uses scaled solving
produces a scaled solution (300545 = 100 * 3005.45)
avoids fiddling with non-integral / ratio / rounding issues, BUT uses large domains
Alternative:
Maybe something around AddDivisionEquality
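An untested sketch of that direction (tax_scaled_var and tax_unscaled are names introduced here, not from the original answer):

# funnel the scaled expression through a variable, then integer-divide by 100
tax_scaled_var = model.NewIntVar(0, 100 * INPUT_UB, "tax_scaled")
model.Add(tax_scaled_var == tax_full_scaled)
tax_unscaled = model.NewIntVar(0, INPUT_UB, "tax")
model.AddDivisionEquality(tax_unscaled, tax_scaled_var, 100)  # truncating division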
Edit regarding Laurent's comments
In some scenarios, solving the scaled problem while still being able to reason about the real, unscaled values might make sense.
If I interpret the comment correctly, the following would be a demo (a feature I was not aware of, and it's cool!):
Updated Demo Code (partial)
# Demo -> attempt to demonstrate the objective-scaling suggestion
model.Add(salary_var >= 30000)
model.Add(salary_var <= 40000)
model.Minimize(salary_var)
model.Proto().objective.scaling_factor = 0.001  # define inverse scaling
solver = cp_model.CpSolver()
solver.parameters.log_search_progress = True  # progress log shows scaled-back objective
status = solver.Solve(model)
print([solver.Value(x) for x in
       [tax_component_a, tax_component_b, tax_component_c, tax_full_scaled]])
print(solver.ObjectiveValue())  # scaled-back objective
Output (excerpt)
...
...
#1 0.00s best:30 next:[30,29.999] fixed_bools:0/1
#Done 0.00s
CpSolverResponse summary:
status: OPTIMAL
objective: 30
best_bound: 30
booleans: 1
conflicts: 0
branches: 1
propagations: 0
integer_propagations: 2
restarts: 1
lp_iterations: 0
walltime: 0.0039022
usertime: 0.0039023
deterministic_time: 8e-08
primal_integral: 1.91832e-07
[15625, 4289, 0, 300545]
30.0

Table sort by month

I have a table in MATLAB with attributes in the first three columns and data from the fourth column onwards. I was trying to sort the entire table based on the first three columns. However, one of the columns (Column C) contains months ('January', 'February' ...etc). The sortrows function would only let me choose 'ascend' or 'descend' but not a custom option to sort by month. Any help would be greatly appreciated. Below is the code I used.
sortrows(Table, {'Column A','Column B','Column C'} , {'ascend' , 'ascend' , '???' } )
As @AnonSubmitter85 suggested, the best thing you can do is to convert your month names to numeric values from 1 (January) to 12 (December), as follows:
c = {
7 1 'February';
1 0 'April';
2 1 'December';
2 1 'January';
5 1 'January';
};
t = cell2table(c,'VariableNames',{'ColumnA' 'ColumnB' 'ColumnC'});
t.ColumnC = month(datenum(t.ColumnC,'mmmm'));
This also gives your ColumnC access to the standard sorting criteria (in this example, ascending):
t = sortrows(t,{'ColumnA' 'ColumnB' 'ColumnC'},{'ascend', 'ascend', 'ascend'});
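As a quick sanity check on the sample data, the rows should come out in the order April, January, December, January, February:

disp(t.ColumnC.')   % 4  1  12  1  2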
If, for any reason unknown to us, you are forced to keep your months as literals, you can use a workaround that consists of sorting a clone of the table using the approach described above, and then applying the resulting indices to the original:
c = {
7 1 'February';
1 0 'April';
2 1 'December';
2 1 'January';
5 1 'January';
};
t_original = cell2table(c,'VariableNames',{'ColumnA' 'ColumnB' 'ColumnC'});
t_clone = t_original;
t_clone.ColumnC = month(datenum(t_clone.ColumnC,'mmmm'));
[~,idx] = sortrows(t_clone,{'ColumnA' 'ColumnB' 'ColumnC'},{'ascend', 'ascend', 'ascend'});
t_original = t_original(idx,:);

How can I sum up functions that are made of elements of the imported dataset?

See the code and error below. I have already tried Do, For, ... and it is not working.
CODE + Error from Mathematica:
Import of survival probabilities _{k}p_x and _{k}p_y (calculated in Excel):
px = Import["C:\Users\Eva\Desktop\kpx.xlsx"];
px = Flatten[Take[px, All], 1];
NOTE: The probability _{k}p_x can be found at position px[[k+2, x-16]].
i = 0.04;
v = 1/(1 + i);
JointLifeIndep[x_, y_, n_] = Sum[v^k*px[[k + 2, x - 16]]*py[[k + 2, y - 16]], {k , 0, n - 1}]
Part::pkspec1: The expression 2+k cannot be used as a part specification.
Part::pkspec1: The expression 2+k cannot be used as a part specification.
Part::pkspec1: The expression 2+k cannot be used as a part specification.
General::stop: Further output of Part::pkspec1 will be suppressed during this calculation.
Part of the dataset (top-left corner):
k\x 18 19 20
0 1 1 1
1 0.999478086278185 0.999363078716059 0.99927911905056
2 0.998841497412202 0.998642656911039 0.99858030519133
3 0.998121451605207 0.99794428814123 0.99788275311401
4 0.997423447323642 0.997247180349674 0.997174407432264
5 0.996726703362208 0.996539285828369 0.996437857252448
6 0.996019178300768 0.995803204773039 0.99563600297737
7 0.995283481416241 0.995001861216016 0.994823584922968
8 0.994482556091416 0.994189960607964 0.99405569519175
9 0.993671079225432 0.99342255996206 0.993339856748282
10 0.992904079096455 0.992707177451333 0.992611817294026
11 0.992189069953677 0.9919796017009 0.991832027835091
Without having the exact same data files to work with, it is easy for each of us to make mistakes that the other cannot reproduce or understand.
From your snapshot of your data set, I used Export in Mathematica to try to reproduce your .xlsx file. Then I tried the following:
px = Import["kpx.xlsx"];
px = Flatten[Take[px, All], 1];
py = px; (* fake some py data *)
i = 0.04;
v = 1/(1 + i);
JointLifeIndep[x_, y_, n_] := Sum[v^k*px[[k+2,x-16]]*py[[k+2,y-16]], {k,0,n-1}];
JointLifeIndep[17, 17, 12]
and it displays 362.402
Notice I used := instead of = in my definition of JointLifeIndep. := and = do different things in Mathematica: = immediately evaluates the right-hand side of the definition, while k is still just a symbol, and that is probably why you are getting the Part::pkspec1 errors.
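A minimal reproduction of the difference, using a small made-up list in place of the imported data:

lst = {10, 20, 30};
f[n_] = Sum[lst[[k]], {k, 1, n}]   (* = evaluates now, with symbolic k -> Part::pkspec1 *)
g[n_] := Sum[lst[[k]], {k, 1, n}]  (* := delays evaluation until n is known *)
g[3]                               (* 60 *)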
You should also be careful with your subscript values and make sure that every subscript is between 1 and the number of rows (or columns) in your matrix.
So see if you can try this example with an Excel sheet containing only the snapshot of data that you showed and see if you get the same result that I do.
Hopefully that will be enough for you to make progress.

Creating an array of a specific size from data in an array of a larger size - averages

I want to find the average value between element (x) and element (x+1) of an array.
for val = 1 : xMid_p-1
    eapDia_p = diaArray_p(1,val);
    baseDia_p = diaArray_p(1,end);
    curDiaArray_p = linspace(eapDia_p, baseDia_p, xMid_p-1);
    curRadArray_p = curDiaArray_p/2;
    maxRad = max(curRadArray_p);
    for val = 1 : xMid_p-1
        ln(1,val) = maxRad(:) - curRadArray_p(val);
        lnE(1,val) = ln(1,val).^3;
        presAn(1,val) = acos(((refDia_p/2)*cos(refPresAng_p))./curRadArray_p(val));
        arcToo(1,val) = 2 * curRadArray_p(val)*((twRefDia_p/refDia_p)+(tan(refPresAng_p)-refPresAng_p)-(tan(presAn(1,val))-presAn(1,val)));
        chor(1,val) = 2 * curRadArray_p(val) * sin(arcToo(1,val)/(curRadArray_p(1,val)*2));
        for val = 1 : xMid_p - 2
            lnM(1,val) = maxRad(:) - curRadArray_p(val);
            lnME(1,val) = lnM(1,val).^3;
        end
    end
    lnCubed(1,:) = ln.^3;
    lnMCubed(1,:) = lnM.^3;
    lnEq = lnCubed(2:end) - lnMCubed;
end
Please see chor(1,val); this gives the values:
chor =
1 2 3 4 5 6 7 8
I want to find the average of adjacent chor values, so the result array will be one element smaller and will give:
aveChor =
1.5 2.5 3.5 4.5 5.5 6.5 7.5
One approach using indexing -
aveChor = (chor(2:end) + chor(1:end-1))/2
Another approach using diff -
aveChor = (2*chor(1:end-1) + diff(chor))/2
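Quick check of either approach with the example values:

chor = 1:8;
aveChor = (chor(2:end) + chor(1:end-1))/2
% aveChor = 1.5  2.5  3.5  4.5  5.5  6.5  7.5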

SystemVerilog array random seed of Shuffle function

I get the same output every time I run the code below.
module array_shuffle;
    integer data[10];

    initial begin
        foreach (data[x]) begin
            data[x] = x;
        end

        $display("------------------------------\n");
        $display("before shuffle, data contains:\n");
        foreach (data[x]) begin
            $display("data[%0d] = %0d", x, data[x]);
        end

        data.shuffle();

        $display("------------------------------\n");
        $display("after shuffle, data contains:\n");
        foreach (data[x]) begin
            $display("data[%0d] = %0d", x, data[x]);
        end
    end
endmodule
Output:
------------------------------
before shuffle, data contains:
data[0] = 0
data[1] = 1
data[2] = 2
data[3] = 3
data[4] = 4
data[5] = 5
data[6] = 6
data[7] = 7
data[8] = 8
data[9] = 9
------------------------------
after shuffle, data contains:
data[0] = 8
data[1] = 6
data[2] = 7
data[3] = 9
data[4] = 5
data[5] = 0
data[6] = 1
data[7] = 4
data[8] = 2
data[9] = 3
Is there a way to seed the randomization of the shuffle function?
shuffle() returns the same result every time because you probably run the simulator with the same seed. This is the intended behavior: when you run a simulation and find a bug, you want to be able to reproduce it, regardless of any design (and, to some extent, testbench) changes. To see a different output, try setting the seed on the simulator command line. For Incisive this is:
irun -svseed 1 // sets the seed to 1
irun -svseed random // will set a random seed
It's also possible to manipulate the seed of the random number generator using set_randstate, but I wouldn't mess with that.
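If you prefer controlling the seed from inside the testbench, here is an untested sketch that reseeds the RNG of the calling process before shuffling (the seed value 42 is arbitrary):

initial begin
    // shuffle() draws from the RNG of the process that calls it
    process::self().srandom(42);
    data.shuffle();
end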