Can we give supply and demand values as a range for the min cost flow problem using Google OR-Tools?

Currently the documentation specifies that supply and demand should be provided as a list of integer values. Is it possible to give supply and demand as a range instead?
Current implementation as shown in the documentation:
supplies = [20, 0, 0, -5, -15]
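For reference, a minimal sketch of that documented list-based form (my own illustrative arc data, not the documentation's example; this uses the classic pywrapgraph wrapper, and newer OR-Tools releases expose the same solver under ortools.graph.python.min_cost_flow):
from ortools.graph import pywrapgraph

smcf = pywrapgraph.SimpleMinCostFlow()
# Illustrative network: node 0 supplies 20 units, nodes 3 and 4 demand 5 and 15.
start_nodes = [0, 0, 1, 1, 2, 2]
end_nodes = [1, 2, 3, 4, 3, 4]
capacities = [20, 20, 10, 10, 10, 10]
unit_costs = [2, 4, 1, 3, 2, 1]
supplies = [20, 0, 0, -5, -15]  # one fixed integer per node, as in the docs

for s, e, c, u in zip(start_nodes, end_nodes, capacities, unit_costs):
    smcf.AddArcWithCapacityAndUnitCost(s, e, c, u)
for node, supply in enumerate(supplies):
    smcf.SetNodeSupply(node, supply)  # a single value per node, not a range

if smcf.Solve() == smcf.OPTIMAL:
    print("Minimum cost:", smcf.OptimalCost())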

Does H2O early stopping only apply to ntrees?

I am reading the following two examples:
https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/tutorials/gbm/gbmTuning.ipynb
https://h2o-release.s3.amazonaws.com/h2o/rel-turing/10/docs-website/h2o-docs/grid-search.html
When both of them set up the grid search, they fix ntrees instead of feeding in a list of values such as [i * 100 for i in range(1, 11)].
Here is my question: I am wondering whether that is because early stopping is set up against ntrees. For example, we can set ntrees = 1000 and score_tree_interval = 100, and then the model performance is evaluated at 100, 200, ..., up to 1000 trees. Do I understand that correctly?
But if my grid search also includes learn_rate and max_depth, will early stopping also evaluate against learn_rate and max_depth? I mean, within the same number of trees (for example ntrees = 500), when it evaluates the different learning rates [0.01, 0.015, 0.025, 0.05, 0.1], will it stop somewhere in that list of learning rates?
The documentation for "stopping_tolerance" (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/stopping_tolerance.html) describes "the model will stop training after reaching three scoring events in a row in which a model's misclassification value does not improve by 1e-3". So what are the three scoring events? Are they three different numbers of trees, or could they be the same number of trees with different learning rates?
As Darren Cook mentioned in the comments, there is early stopping for each model you build and early stopping for the grid search.
For an individual GBM, ntrees (the number of trees) is tuned with early stopping (i.e. using the stopping_tolerance, stopping_rounds, and stopping_metric specified within the algorithm). You can see this if you open up Flow and look at the scoring history plot of your individual model: the number of trees is on the x-axis.
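As a rough sketch of what that looks like in the Python API (the frame and column names are placeholders of mine, not from the question), model-level early stopping is configured on the estimator itself:
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
h2o.init()
# Assumed to exist: train (an H2OFrame with a categorical "response" column)
# and predictors (a list of feature column names).
gbm = H2OGradientBoostingEstimator(
    ntrees=1000,               # upper bound; early stopping usually ends training sooner
    score_tree_interval=100,   # score every 100 trees -- these are the "scoring events"
    stopping_rounds=3,         # stop after 3 scoring events in a row without improvement
    stopping_tolerance=1e-3,
    stopping_metric="misclassification",  # assumes a classification problem
)
gbm.train(x=predictors, y="response", training_frame=train)
gbm.scoring_history()          # number of trees is the x-axis of this history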
For grid search you have the added layer of your hyperparameters. So if you set ntrees = 100 in your GBM model and you grid over learning rate = [0.01, 0.015], you will build two models: one with ntrees = 100 and learn_rate = 0.01, and a second with ntrees = 100 and learn_rate = 0.015. Within each of those models, the scoring history runs over an increasing number of trees while the learning rate stays fixed.
So looking at your specific questions:
1) Yes, this is correct.
2) The grid will see whether there is any improvement between your different learning rates (0.01, 0.015, etc.) and max_depth values. So again, what you are saying is correct: the grid will stop if it is not seeing any improvement across different learn_rate and max_depth values (i.e. it will not continue to build new models).
3) Here you need to separate the model from the grid search. An individual model will stop building (adding trees) if it doesn't see improvement after three scoring events (and here your learn_rate and max_depth are fixed while ntrees changes). Stepping out to the grid, the grid will stop building new models if it doesn't see a user-specified amount of improvement between the individual models it has built.
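Putting the two layers together, here is a sketch along the lines of the H2O grid-search docs (same placeholder data names as in the sketch above; the search_criteria keys are what control grid-level early stopping):
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch
# Model-level early stopping: each model stops adding trees on its own.
base_gbm = H2OGradientBoostingEstimator(
    ntrees=1000, score_tree_interval=100,
    stopping_rounds=3, stopping_tolerance=1e-3, stopping_metric="misclassification",
)
hyper_params = {
    "learn_rate": [0.01, 0.015, 0.025, 0.05, 0.1],
    "max_depth": [3, 5, 7],
}
# Grid-level early stopping: stop launching new models once the recently built
# models no longer improve on the best one found so far.
search_criteria = {
    "strategy": "RandomDiscrete",
    "stopping_rounds": 3,
    "stopping_tolerance": 1e-3,
    "stopping_metric": "misclassification",
}
grid = H2OGridSearch(model=base_gbm, hyper_params=hyper_params,
                     search_criteria=search_criteria)
grid.train(x=predictors, y="response", training_frame=train)  # train/predictors as above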

Looping instead of manual input of multiple initial numbers for a variable in NetLogo

I have a variable (grass) where you set an initial number of grass. I want to see the results for different initial numbers, such as 50, 100, 150, ..., 1000, but it is too troublesome to set each value manually and then run the model one by one. Is there any way I can just set up a loop with 50 as the increment, stopping at 1000, and then report a certain value? Is that possible in NetLogo?
I have tried using for loops in other programming languages, but I do not know whether that works here.
Look at BehaviorSpace - it's in the Tools menu. It is the batch simulation tool that exports the results so you can then do a summary etc. It will also allow you to do multiple repetitions for each of these values so you can calculate the average result for each, which is important when your simulation has randomness. There is a specific manual in the documentation; see https://ccl.northwestern.edu/netlogo/docs/behaviorspace.html

Recall, Recall rate#k and precision in top-k recommendation

According to authors in 1, 2, and 3, Recall is the percentage of relevant items selected out of all the relevant items in the repository, while Precision is the percentage of relevant items out of those items selected by the query.
Therefore, assuming user U gets a top-k recommended list of items, they would be something like:
Recall= (Relevant_Items_Recommended in top-k) / (Relevant_Items)
Precision= (Relevant_Items_Recommended in top-k) / (k_Items_Recommended)
Up to that point everything is clear, but I do not understand the difference between them and recall rate#k. What would the formula to compute recall rate#k be?
Finally, I received an explanation from Prof. Yuri Malheiros (author of paper 1). Although recall rate#k, as used in the papers cited in the question, might seem to be the normal recall metric applied to a top-k list, they are not the same. This metric is also used in paper 2 and paper 3.
The recall rate#k is a percentage that depends on the tests performed, i.e., on the number of recommendations made, where each recommendation is a list of items, some correct and some not. Suppose we made 50 different recommendations; call that number R (regardless of the number of items in each recommendation). To calculate the recall rate, look at each of the R recommendations: if at least one recommended item in a recommendation is correct, increment a counter, call it N. The recall rate#R is then N/R.
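To make the contrast concrete, here is a small sketch (my own illustration, not from the papers): recall@k and precision@k are computed per recommendation list from the counts of relevant items, while recall rate#k only asks whether each list contains at least one correct item.
def recall_at_k(recommended, relevant, k):
    # Recall for one list: relevant items found in the top-k / all relevant items.
    hits = len(set(recommended[:k]) & relevant)
    return hits / len(relevant) if relevant else 0.0

def precision_at_k(recommended, relevant, k):
    # Precision for one list: relevant items found in the top-k / k.
    return len(set(recommended[:k]) & relevant) / k

def recall_rate_at_k(all_recommendations, all_relevant, k):
    # Recall rate#k as described above: the fraction N/R of the R recommendation
    # lists that contain at least one relevant item in their top-k.
    R = len(all_recommendations)
    N = sum(1 for recs, rel in zip(all_recommendations, all_relevant)
            if set(recs[:k]) & rel)
    return N / R if R else 0.0

# Three test recommendations; two of them have at least one hit in the top 2.
recs = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
rel = [{"b", "z"}, {"x"}, {"g"}]
print(recall_rate_at_k(recs, rel, k=2))  # 2/3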

How to use Morton order (Z-order curve) in range search?

How to use Morton order in range search?
From the Wikipedia article, in the section "Use with one-dimensional data structures for range searching", it says:
"the range being queried (x = 2, ..., 3, y = 2, ..., 6) is indicated by the dotted rectangle. Its highest Z-value (MAX) is 45. In this example, the value F = 19 is encountered when searching a data structure in increasing Z-value direction. ... BIGMIN (36 in the example) ... only search in the interval between BIGMIN and MAX ..."
My questions are:
1) Why is F 19? Why should F not be 16?
2) How do we get BIGMIN?
3) Are there any blog posts demonstrating how to do the range search?
EDIT: The AWS Database Blog now has a detailed introduction to this subject.
This blog post does a reasonable job of illustrating the process.
When searching the rectangular space x=[2,3], y=[2,6]:
The minimum Z Value (12) is found by interleaving the bits of the lowest x and y values: 2 and 2, respectively.
The maximum Z value (45) is found by interleaving the bits of the highest x and y values: 3 and 6, respectively.
Having found the min and max Z values (12 and 45), we now have a linear range that we can iterate across that is guaranteed to contain all of the entries inside of our rectangular space. The data within the linear range is going to be a superset of the data we actually care about: the data in the rectangular space. If we simply iterate across the entire range, we are going to find all of the data we care about and then some. You can test each value you visit to see if it's relevant or not.
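A small sketch of that naive scan (my own code, using the same 8x8 example; here x bits sit at even positions and y bits at odd positions, which reproduces the Z values quoted above):
def interleave(x, y):
    # Morton-encode a 2D point (3 bits per coordinate is enough for 0..7).
    z = 0
    for i in range(3):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def deinterleave(z):
    # Inverse of interleave: recover (x, y) from a Z value.
    x = y = 0
    for i in range(3):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y

x_lo, x_hi, y_lo, y_hi = 2, 3, 2, 6
z_min = interleave(x_lo, y_lo)   # 12
z_max = interleave(x_hi, y_hi)   # 45

# Naive scan: walk the whole linear range, keep only points inside the rectangle.
inside = []
for z in range(z_min, z_max + 1):
    x, y = deinterleave(z)
    if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
        inside.append(z)
print(inside)   # [12, 13, 14, 15, 36, 37, 38, 39, 44, 45] -- note the jump at the seam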
An obvious optimization is to try to minimize the amount of superfluous data that you must traverse. This is largely a function of the number of 'seams' that you cross in the data -- places where the 'Z' curve has to make large jumps to continue its path (e.g. from Z value 31 to 32).
This can be mitigated by employing the BIGMIN and LITMAX functions to identify these seams and navigate back to the rectangle. To minimize the amount of irrelevant data we evaluate, we can:
Keep a count of the number of consecutive pieces of junk data we've visited.
Decide on a maximum allowable value (maxConsecutiveJunkData) for this count. The blog post linked at the top uses 3 for this value.
If we encounter maxConsecutiveJunkData pieces of irrelevant data in a row, we initiate BIGMIN and LITMAX. Importantly, at the point at which we've decided to use them, we're now somewhere within our linear search space (Z values 12 to 45) but outside the rectangular search space. In the Wikipedia article, they appear to have chosen a maxConsecutiveJunkData value of 4; they started at Z=12 and walked until they were 4 values outside of the rectangle (beyond 15) before deciding that it was now time to use BIGMIN. Because maxConsecutiveJunkData is left to your tastes, BIGMIN can be used on any value in the linear range (Z values 12 to 45). Somewhat confusingly, the article only shows the area from 19 on as crosshatched because that is the subrange of the search that will be optimized out when we use BIGMIN with a maxConsecutiveJunkData of 4.
When we realize that we've wandered too far outside of the rectangle, we can conclude that the rectangle is non-contiguous along the linear range. BIGMIN and LITMAX are used to identify the nature of the split. Given any value in the linear search space (e.g. 19), BIGMIN finds the smallest value above it that falls back inside the half of the split rectangle with larger Z values (i.e. jumping us from 19 to 36). LITMAX is similar, helping us find the largest value below it that falls inside the half of the split rectangle with smaller Z values. The implementations of BIGMIN and LITMAX are explained in depth in the zdivide function explanation in the linked blog post.
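A brute-force check of those two jumps (again my own sketch; real implementations derive BIGMIN and LITMAX with bit manipulation, as in the linked post's zdivide, but iterating the small example is enough to see what they return):
def deinterleave(z):
    # Same bit layout as the sketch above: x bits at even positions, y bits at odd.
    x = y = 0
    for i in range(3):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y

def in_rect(z, x_lo, x_hi, y_lo, y_hi):
    x, y = deinterleave(z)
    return x_lo <= x <= x_hi and y_lo <= y <= y_hi

def bigmin_brute(z, z_max, rect):
    # Smallest Z value above z that decodes to a point inside the rectangle.
    return next((c for c in range(z + 1, z_max + 1) if in_rect(c, *rect)), None)

def litmax_brute(z, z_min, rect):
    # Largest Z value below z that decodes to a point inside the rectangle.
    return next((c for c in range(z - 1, z_min - 1, -1) if in_rect(c, *rect)), None)

rect = (2, 3, 2, 6)                # x = 2..3, y = 2..6
print(bigmin_brute(19, 45, rect))  # 36, the BIGMIN from the example
print(litmax_brute(19, 12, rect))  # 15, the last in-rectangle value before the seam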
It appears that the quoted example in the Wikipedia article has not been edited to clarify the context and assumptions. The approach used in that example is applicable to linear data structures that only allow sequential (forward and backward) seeking; that is, it is assumed that one cannot randomly seek to a storage cell in constant time using its morton index alone.
With that constraint, one's strategy begins with the full range between the minimum Morton index (16) and the maximum Morton index (45). To make optimizations, one tries to find and eliminate large swaths of subranges that lie outside the query rectangle. The hatched area in the diagram refers to what would have been accessed (sequentially) if no such optimization (eliminating subranges) had been applied.
After discussing the main optimization strategy for linear sequential data structures, the article goes on to talk about other data structures with better seeking capability.

Effective way to display data in a chart

I have an application where some values are stored in a DB, e.g. one value per second. That is 604,800 values per 7 days, and if I want to view these values in a graph I need an efficient way to get only about 800 values from the DB when I have a chart 800px wide.
I use some aggregation logic where a mean value is computed over 2, 3, 4, 5, 6, 10 and 12 minute intervals, and then hour and day aggregates are computed.
I use PostgreSQL, and these aggregations are computed with a statement of the form:
INSERT INTO aggre_table_ ... SELECT sum(...)/count(*) ... WHERE timestamp > ... and timestamp < ...
Is there any better way to do this, or what is the best way to aggregate data for later display in charts?
Is it better to do this with a trigger or by calling stored procedures?
Is there any DB support for aggregations for D3.js, Highcharts or Google Charts?
How to aggregate your data is a large topic that is independent of your technology choices. It depends largely on how sensitive the data is, what the important indicators of the data are, what the implications of those indicators are, etc.
Is a single out of range point significant? Or are you looking for the overall trend? These are big questions with answers that aren't always easy.
My general suggestion:
To display a week's worth of data, aggregate to hourly averages (see the sketch after this list).
Provide a range around that line indicating the distribution of points around each average.
If something significant happened within an aggregated point, indicate it with a separate marker.
Provide drill-down capability for each aggregated point to see the full detail charted, if that level of detail is important (chances are, it's not).
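As a sketch of the first suggestion above (my own example; the samples(ts, value) table and the connection details are assumptions, not from the question), an hourly average plus a min/max band can be pulled straight out of PostgreSQL:
import psycopg2  # any PostgreSQL client works; psycopg2 is just one common choice
conn = psycopg2.connect(dbname="mydb", user="me", password="secret", host="localhost")
query = """
    SELECT date_trunc('hour', ts) AS bucket,
           avg(value) AS avg_value,   -- the line to plot
           min(value) AS min_value,   -- lower edge of the band around the line
           max(value) AS max_value    -- upper edge of the band around the line
    FROM samples
    WHERE ts >= %s AND ts < %s
    GROUP BY bucket
    ORDER BY bucket;
"""
with conn, conn.cursor() as cur:
    cur.execute(query, ("2024-01-01", "2024-01-08"))  # one week
    rows = cur.fetchall()
A week aggregated hourly is only 168 points, comfortably below the roughly 800-point budget mentioned in the question.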
In Highcharts (Highstock, in fact) dataGrouping is used for this kind of approximation (see the demo).
Also, here you can find more about Highstock.