trainCascadeObjectDetector FalseAlarmRate - lower or higher? - matlab

I read info about cascade training in Matlab:
Lower values for FalseAlarmRate increase complexity of each stage.
Increased complexity can achieve fewer false detections but can result
in longer training and detection times. Higher values for
FalseAlarmRate can require a greater number of cascade stages to
achieve reasonable detection accuracy.
Shouldn't the second sentence also say lower values? If lower values increase complexity, then lower values should require a greater number of cascade stages, not higher ones...
So I'm a little bit confused.
https://www.mathworks.com/help/vision/ref/traincascadeobjectdetector.html
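One way to see why both sentences can be true: in the classic cascade formulation, the overall false alarm rate is roughly the per-stage FalseAlarmRate raised to the number of stages, so a higher (more permissive) per-stage rate needs more stages to reach the same overall rate, while a lower per-stage rate does more of the work inside each stage. A rough back-of-the-envelope sketch (the 1e-6 target and the formula are my own illustration, not taken from the MathWorks page):

```
import math

# Back-of-the-envelope sketch: assume the classic cascade relationship
#   overall_false_alarm_rate ~= FalseAlarmRate ** NumCascadeStages
# and ask how many stages are needed to push the overall rate below a target.
target_overall_far = 1e-6   # arbitrary illustrative target, not a MathWorks default

for per_stage_far in (0.1, 0.3, 0.5):
    n_stages = math.ceil(math.log(target_overall_far) / math.log(per_stage_far))
    print(f"per-stage FalseAlarmRate {per_stage_far}: about {n_stages} stages")

# Higher per-stage FalseAlarmRate -> more stages needed for the same overall rate,
# while lower per-stage FalseAlarmRate -> more work (complexity) inside each stage.
```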

Are bounded distributions automatically adjusted to ensure the area under the curve is still equal to 1.0?

For instance, if I'm simulating demand, but the mean is close to zero and the SD is large enough that the normal distribution includes negative values, then negative demand outcomes are possible. So, we use a bounded normal with a min of zero to prevent that from occurring. However, the probabilities of the remaining possible demand values no longer sum to 1.0. So, the curve should be raised up the y-axis just a bit. This is more of a theoretical question because in practice, I can't imagine it making too much of a difference. After all, each demand outcome's probability would simply be increased equally (by an amount equal to the area under the curve < 0 divided by the number of remaining possible demand outcomes), making this mostly a moot point.
Does anyone know if Anylogic automatically adjusts bounded distributions for this? Thanks.
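I can't say what AnyLogic does internally, but for reference, the usual way a truncated ("bounded") normal is handled is to rescale the remaining density by the probability mass that was cut off, so the area is 1 again. A small sketch with scipy (arbitrary demand parameters, nothing AnyLogic-specific):

```
import numpy as np
from scipy import stats

# Sketch of the usual truncated-normal treatment; arbitrary demand parameters
# chosen so that the plain normal puts noticeable mass below zero.
mu, sigma = 2.0, 5.0
a = (0.0 - mu) / sigma                      # truncnorm takes bounds in sigma units
bounded = stats.truncnorm(a, np.inf, loc=mu, scale=sigma)

cut_off = stats.norm(mu, sigma).cdf(0.0)    # mass the plain normal puts below 0
x = 3.0
print(stats.norm(mu, sigma).pdf(x) / (1.0 - cut_off))  # rescaled original density
print(bounded.pdf(x))                                   # equals the line above

# i.e. the remaining density is divided by (1 - cut_off) so it integrates to 1,
# which is a proportional rescaling rather than an equal additive shift.
```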

Could I ensure convergence by setting the min and max boundaries of state variables in Dymola?

I have been struggling to improve the convergence performance of my model in Dymola over the last month. Now I am wondering: if I define the min and max attributes of the state variables, for example a max mass flow rate of 10000 kg/s and a min mass flow rate of 0.01 kg/s, then during the iteration, when the results reach the max or min boundary, does the iteration continue or does it just stop?
I am also wondering whether, if the iteration bounced back in the opposite direction when the result reaches the boundary, that might ensure the convergence of my model.
I plan to run some tests on this idea; if anyone has the same question or an opinion, you are welcome to comment or answer.
Setting min/max for variables is unlikely to significantly improve performance in Dymola.
If min/max assertions are active for a variable, the solver will reject a step with values out of bounds, and currently does not try to map them back to valid values in a clever way. That may skip some computations based on the out-of-bounds values, but it is rare that it matters much, and there is also a cost to rejecting the step, etc.
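As a toy illustration of the "reject the step" idea (a generic damped-Newton sketch, not what Dymola actually does with min/max assertions), one can simply shrink a step that would leave the bounds instead of projecting the iterate back onto them:

```
# Generic damped-Newton sketch of "reject the out-of-bounds step and retry",
# purely to illustrate the idea -- not Dymola's actual handling of min/max.
def solve(f, dfdx, x0, lo, hi, tol=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):
        step = -f(x) / dfdx(x)
        # Reject steps that would leave [lo, hi]: halve them instead of
        # projecting the iterate onto the boundary.
        while not (lo <= x + step <= hi) and abs(step) > tol:
            step *= 0.5
        x += step
        if abs(f(x)) < tol:
            break
    return x

# Example: root of x^2 - 2 with (arbitrary) bounds, starting far from the solution.
print(solve(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=100.0, lo=0.01, hi=1e4))
```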

K-means on Python Spark taking a big chunk of time to run

I have a NumPy array of 0s and 1s with 37k rows and 6k columns.
When I try to run K-means clustering in PySpark, it takes almost forever and I cannot get the output. Is there any way to reduce the processing time, or any other tricks to solve this issue?
I think that you may have too many columns; you may have run into the curse of dimensionality. Wikipedia link
[...] The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. This sparsity is problematic for any method that requires statistical significance. In order to obtain a statistically sound and reliable result, the amount of data needed to support the result often grows exponentially with the dimensionality. [...]
In order to solve this problem, did you consider reducing your columns and using only the relevant ones? Check this Wikipedia link as well
[...] Feature projection transforms the data in the high-dimensional space to a space of fewer dimensions. The data transformation may be linear, as in principal component analysis (PCA), but many nonlinear dimensionality reduction techniques also exist. [...]
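As a rough sketch of that suggestion in PySpark (the toy data, column names and k values below are placeholders, not something from your setup), you could run PCA first and then cluster the reduced features:

```
# Rough PySpark sketch: PCA first, then K-means on the reduced features.
from pyspark.sql import SparkSession
from pyspark.ml.feature import PCA
from pyspark.ml.clustering import KMeans
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()

# Stand-in for your 37k x 6k 0/1 matrix.
rows = [(Vectors.dense([0.0, 1.0, 1.0, 0.0]),),
        (Vectors.dense([1.0, 0.0, 0.0, 1.0]),),
        (Vectors.dense([0.0, 1.0, 0.0, 0.0]),)]
df = spark.createDataFrame(rows, ["features"])

pca = PCA(k=2, inputCol="features", outputCol="pca_features")  # try e.g. 50-200 for 6k columns
reduced = pca.fit(df).transform(df)

kmeans = KMeans(k=2, featuresCol="pca_features", seed=1)
model = kmeans.fit(reduced)
print(model.clusterCenters())
```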

Why is l2 regularization always an addition?

I am reading through info about the L2 regularization of neural network weights. As far as I understand, the intention is that weights get pushed towards zero the larger they become, i.e. large weights receive a high penalty while smaller ones are punished less severely.
The formula is usually:
new_weight = weight * update + lambda * sum(squared(weights))
My question: why is this term always positive? If the weight is already positive, the L2 term will never decrease it; it makes things worse and pushes the weight away from zero. This is the case in almost all the formulae I have seen so far. Why is that?
The formula you presented is very vague about what an 'update' is.
First, what is regularization? Generally speaking, the cost function with L2 regularization is:
C = C0 + (lambda / (2n)) * sum(w^2)
(C0 is the original, unregularized cost, n is the training set size, and lambda scales the influence of the L2 term)
You add an extra term to your original cost function, and this term is also differentiated when computing the weight updates. Intuitively, this punishes big weights, so the algorithm tries to find the best tradeoff between small weights and the chosen cost function. Small weights are associated with finding a simpler model, as the behavior of the network does not change much when given some random outlying values. This means it filters out the noise in the data and comes down to learning the simplest possible solution. In other words, it reduces overfitting.
Coming to your question, let's derive the update rule. For any weight w in the graph, we get
dC/dw = dC0/dw + (lambda / n) * w
Thus, the update formula for the weights can be written as (eta is the learning rate)
w -> w - eta * dC0/dw - (eta * lambda / n) * w = (1 - eta * lambda / n) * w - eta * dC0/dw
Considering only the first term, the weight seems to be driven towards zero regardless of what is happening. But the second term can add to the weight if the partial derivative is negative. All in all, weights can be positive or negative, as you cannot derive a constraint from this expression. The same applies to the derivatives. Think of fitting a line with a negative slope: the weight has to be negative. To answer your question, neither the derivative of the regularized cost nor the weights have to be positive all the time.
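Here is a small numeric sketch of that update rule (the numbers and the function name are my own, just for illustration):

```
import numpy as np

# Numeric sketch of the derived update  w -> (1 - eta*lambda/n) * w - eta * dC0/dw
# (eta, lam, n and the gradients below are made-up illustrative values).
def l2_update(w, grad_c0, eta=0.1, lam=0.5, n=100):
    return (1.0 - eta * lam / n) * w - eta * grad_c0

w = np.array([2.0, -3.0, 0.5])          # weights of either sign
grad_c0 = np.array([0.4, -0.2, 0.1])    # gradient of the unregularized cost
print(l2_update(w, grad_c0))            # the L2 part shrinks every weight toward zero,
                                        # the gradient part can push in either direction
```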
If you need more clarification, leave a comment.

relation between support vectors and accuracy in case of RBF kernel

I am using the MATLAB RBF kernel function.
On a couple of datasets, as I keep increasing the sigma value, the number of support vectors increases and the accuracy increases.
In the case of one dataset, however, as I increase the sigma value, the number of support vectors decreases and the accuracy increases.
I am not able to work out the relation between support vectors and accuracy in the case of the RBF kernel.
The number of support vectors doesn't have a direct relationship to accuracy; it depends on the shape of the data (and your C/nu parameter).
Higher sigma means that the kernel is a "flatter" Gaussian and so the decision boundary is "smoother"; lower sigma makes it a "sharper" peak, and so the decision boundary is more flexible and able to reproduce strange shapes if they're the right answer. If sigma is very high, your data points will have a very wide influence; if very low, they will have a very small influence.
Thus, often, increasing the sigma values will result in more support vectors: for more-or-less the same decision boundary, more points will fall within the margin, because points become "fuzzier." Increased sigma also means, though, that the slack variables "moving" points past the margin are more expensive, and so the classifier might end up with a much smaller margin and fewer SVs. Of course, it also might just give you a dramatically different decision boundary with a completely different number of SVs.
In terms of maximizing accuracy, you should be doing a grid search on many different values of C and sigma and choosing the one that gives you the best performance on e.g. 3-fold cross-validation on your training set. One reasonable approach is to choose from e.g. 2.^(-9:3:18) for C and median_eval * 2.^(-4:2:10); those numbers are fairly arbitrary, but they're ones I've used with success in the past.
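If it helps, here is the same grid-search idea sketched in scikit-learn rather than the MATLAB function from the question (note that sklearn's SVC uses gamma = 1/(2*sigma^2), and the gamma grid below is an arbitrary placeholder rather than a translation of the median_eval-based one):

```
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder data instead of your real set.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

param_grid = {
    "C": 2.0 ** np.arange(-9, 19, 3),      # analogue of 2.^(-9:3:18)
    "gamma": 2.0 ** np.arange(-10, 5, 2),  # arbitrary grid; gamma = 1/(2*sigma^2)
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)  # 3-fold cross-validation
search.fit(X, y)
print(search.best_params_, search.best_score_)
```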