Probability in hashing and load balancing

What is the importance of using probability in hashing and load balancing, rather than other (deterministic) techniques? And more generally, why is probability important in the computing field?

Modulus Switching in SEAL library

Conventionally, modulus switching is primarily used to make the noise growth linear, as opposed to exponential. However, in the BFV examples, it has been introduced as a tool to shave off primes (thereby reducing the bitlength of coefficient modulus) and improve computational efficiency.
Does it help in reducing noise growth in the BFV scheme as well? Will I observe exponential growth in noise without (manually) switching modulus?
In BFV you don't need to do modulus switching, because exponential noise growth is already prevented by the scheme's scale-invariance property. Its main benefit is therefore in improving computational performance and, in some protocols, reducing communication cost.
For example, in a simple protocol Alice might encrypt data and send it to Bob, who computes on it and sends the result back. If Alice only needs to decrypt the result, the parameters can just as well be as small as possible when she receives it, so Bob should switch to the smallest possible parameters before sending the data back to Alice, to minimize the communication cost.
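To put a rough number on that saving: a freshly encrypted BFV ciphertext consists of two degree-n polynomials with coefficients modulo q, so it occupies roughly 2*n*log2(q) bits, and modulus switching shrinks log2(q) by dropping primes from the coefficient modulus chain (in SEAL, via Evaluator::mod_switch_to_next or its in-place variant). A back-of-the-envelope Python sketch; the polynomial degree and prime bit-lengths below are illustrative values, not recommended parameters:

# Rough size of a (fresh, two-component) BFV ciphertext:
# two degree-n polynomials with coefficients mod q -> about 2*n*log2(q) bits.
def ciphertext_kib(poly_degree, prime_bits):
    log2_q = sum(prime_bits)          # bit-length of the RNS modulus chain
    return 2 * poly_degree * log2_q / 8 / 1024

n = 8192
chain = [43, 43, 44, 44, 44]          # illustrative prime bit-lengths only

print(ciphertext_kib(n, chain))       # without switching: ~436 KiB
print(ciphertext_kib(n, chain[:1]))   # after switching to the last level: ~86 KiB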

General rules for training the RNN model when loss stops decreasing

I have an RNN model. After about 10K iterations, the loss stops decreasing, but the loss is not very small yet. Does it always mean the optimization is trapped in a local minimum?
In general, what actions should I take to address this issue? Add more training data? Switch to a different optimization scheme (I am using SGD now)? Or other options?
Many thanks!
JC
If you are training your neural network with a gradient-based algorithm such as Backpropagation or Resilient Propagation, it can stop improving when it finds a local minimum, and this is normal given the nature of this type of algorithm: the propagation algorithm only follows the direction the (gradient) vector is pointing.
As a suggestion, you could add a different strategy during training to explore the search space rather than only follow the gradient, for example a Genetic Algorithm or the Simulated Annealing algorithm. These approaches explore more of the space of possibilities and can escape local minima, potentially finding the global minimum. You could run, say, 10 iterations of the explorative algorithm for every 200 iterations of the propagation algorithm, creating a hybrid strategy. For example (it's just pseudo-code):
int epochs = 0;
do
{
    // one epoch of gradient-based training (e.g. backpropagation)
    train();
    // every 200 epochs, run the explorative step (e.g. GA or simulated annealing)
    if (epochs % 200 == 0)
        trainExplorativeApproach();
    epochs++;
} while (epochs < 10000);
I've used a strategy like this with Multi-Layer Perceptrons and Elman recurrent neural networks on classification and regression problems, and in both cases the hybrid strategy provided better results than propagation training alone.

Clustering Algorithm for average energy measurements

I have a data set which consists of data points having attributes like:
average daily consumption of energy
average daily generation of energy
type of energy source
average daily energy fed into the grid
daily energy tariff
I am new to clustering techniques.
So my question is: which clustering algorithm would be best suited to forming clusters from this kind of data?
I think hierarchical clustering is a good choice. Have a look here: Clustering Algorithms
The simplest way to do clustering is with the k-means algorithm. If all of your attributes are numerical, it is the easiest option; even if they are not, you only need to find a distance measure for the categorical or nominal attributes, and k-means remains a good choice. K-means is a partitional clustering algorithm, so I wouldn't use hierarchical clustering in this case. But that also depends on what you want to do: you need to decide whether you want to find clusters within clusters, or whether the clusters should be completely separate from each other, with none contained in another.
Take care.
1) First, try k-means. If that fulfills your needs, that's it. Play with different numbers of clusters (controlled by the parameter k); a minimal sketch follows this list. There are a number of implementations of k-means, and you can implement your own version if you have good programming skills.
K-means generally works well if the clusters have a roughly circular/spherical shape, which means there is some Gaussianity in the data (the data within each cluster comes from a Gaussian-like distribution).
2) If k-means doesn't fulfill your expectations, it is time to read and think more. I suggest reading a good survey paper. The most common techniques are implemented in several programming languages and data mining frameworks, many of which are free to download and use.
3) If applying state-of-the-art clustering techniques is still not enough, it is time to design a new technique. Then you can work it out yourself or team up with a machine learning expert.
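To ground suggestion 1) above, here is a minimal scikit-learn sketch; the numbers, column layout, and the one-hot encoding of the categorical "type of energy source" attribute are invented purely for illustration:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical feature matrix: one row per customer, columns matching the
# question's attributes; the categorical source type is one-hot encoded.
X = np.array([
    # consumption, generation, fed_to_grid, tariff, is_solar, is_wind
    [12.0,  3.0, 1.0, 0.20, 1, 0],
    [30.0,  0.0, 0.0, 0.25, 0, 0],
    [ 8.0, 10.0, 6.0, 0.18, 0, 1],
    [11.0,  2.5, 0.5, 0.21, 1, 0],
])

# Scaling matters: tariff (~0.2) and consumption (tens of kWh) live on very
# different scales, and k-means uses plain Euclidean distance.
X_scaled = StandardScaler().fit_transform(X)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
print(km.labels_)   # cluster assignment per customer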
Since most of your data is continuous, and it is reasonable to assume that energy consumption and generation are normally distributed, I would use statistical methods for clustering.
Such as:
Gaussian Mixture Models
Bayesian Hierarchical Clustering
The advantage of these methods over metric-based clustering algorithms (e.g. k-means) is that we can take advantage of the fact that we are dealing with averages, and we can make assumptions about the distributions from which those averages were calculated.
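A minimal sketch of the Gaussian Mixture Model suggestion, again with scikit-learn and invented data shaped like the attributes in the question; BIC is one common way to choose the number of components:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture

# Same assumed layout as the k-means sketch above: numeric energy attributes.
X = np.array([[12.0,  3.0, 1.0, 0.20],
              [30.0,  0.0, 0.0, 0.25],
              [ 8.0, 10.0, 6.0, 0.18],
              [11.0,  2.5, 0.5, 0.21],
              [29.0,  0.5, 0.0, 0.24]])
X = StandardScaler().fit_transform(X)

# Fit mixtures with 1..3 components and keep the one with the lowest BIC.
best = min((GaussianMixture(n_components=k, random_state=0).fit(X)
            for k in range(1, 4)),
           key=lambda gm: gm.bic(X))
labels = best.predict(X)             # hard assignments
memberships = best.predict_proba(X)  # soft memberships, unlike k-means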

ELKI - Clustering Statistics

When a data set is analyzed by a clustering algorithm in ELKI 0.5, the program produces a number of statistics: the Jaccard index, F1-Measures, etc. In order to calculate these statistics, there have to be 2 clusterings to compare. What is the clustering created by the algorithm compared to?
The automatic evaluation (note that you can configure the evaluation manually!) is based on labels in your data set. At least in the current version (why are you using 0.5 and not 0.6.0?) it should only automatically evaluate if it finds labels in the data set.
We have not yet published internal evaluation measures. There are some implementations, such as evaluation/clustering/internal/EvaluateSilhouette.java, some of which will be in the next release.
In my experiments, internal evaluation measures were badly misleading. With the silhouette coefficient, for example, the labeled "solution" would often score a negative silhouette coefficient (i.e. worse than not clustering at all).
Also, these measures do not scale. The silhouette coefficient takes O(n^2) time to compute, which usually makes this evaluation more expensive than the actual clustering!
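For reference, here is a minimal NumPy sketch of the silhouette coefficient (not ELKI's implementation) that makes the O(n^2) cost explicit: the full pairwise-distance matrix dominates. It assumes Euclidean distance and at least two clusters:

import numpy as np

def mean_silhouette(X, labels):
    labels = np.asarray(labels)
    # The O(n^2) part: all pairwise distances are needed.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                      # exclude the point itself from a(i)
        if not same.any():                   # singleton cluster: s(i) = 0 by convention
            continue
        a = dist[i, same].mean()             # mean intra-cluster distance
        b = min(dist[i, labels == c].mean()  # nearest other cluster
                for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()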
We do appreciate contributions!
You are more than welcome to contribute your favorite evaluation measure to ELKI, to share with others.

Bayesian belief network/system with Fuzzy Clustering neural networks

Many researchers have argued that Artificial Neural Networks (ANNs) can improve the performance of intrusion detection systems (IDS) compared with traditional methods. However, for ANN-based IDS, detection precision, especially for low-frequency attacks, and detection stability still need to be enhanced. A new approach, called FC-ANN and based on ANN and fuzzy clustering, addresses this problem and helps IDS achieve a higher detection rate, a lower false positive rate, and stronger stability. The general procedure of FC-ANN is as follows: first, a fuzzy clustering technique is used to generate different training subsets. Then, based on these different training subsets, different ANN models are trained to form different base models. Finally, a meta-learner, the fuzzy aggregation module, is employed to aggregate these results. Experimental results on the KDD CUP 1999 dataset show that the proposed approach, FC-ANN, outperforms BPNN and other well-known methods such as decision trees and naïve Bayes in terms of detection precision and detection stability.
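To make that pipeline concrete, here is a highly simplified Python sketch, not the paper's implementation: fuzzy c-means is hand-rolled with NumPy, the base models are scikit-learn MLPClassifiers, and the fuzzy aggregation module is reduced to a membership-weighted average of class probabilities. It assumes every training subset is non-empty and contains examples of every class; all names and parameters are illustrative.

import numpy as np
from sklearn.neural_network import MLPClassifier

def memberships(X, centers, m=2.0):
    # Standard FCM membership update: u_ij proportional to d_ij^(-2/(m-1)).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1) + 1e-12
    U = d ** (-2.0 / (m - 1.0))
    return U / U.sum(axis=1, keepdims=True)

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    # Plain fuzzy c-means; returns cluster centers and the (n, c) membership matrix.
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        U = memberships(X, centers, m)
    return centers, U

def fc_ann_fit_predict(X_train, y_train, X_test, c=3):
    # 1) Fuzzy clustering generates the training subsets.
    centers, U = fuzzy_cmeans(X_train, c)
    hard = U.argmax(axis=1)
    # 2) One base ANN per subset (assumes each subset covers all classes).
    models = [MLPClassifier(max_iter=500, random_state=0)
                  .fit(X_train[hard == j], y_train[hard == j])
              for j in range(c)]
    # 3) Fuzzy aggregation: weight each base model by the test point's membership.
    U_test = memberships(X_test, centers)                             # (n_test, c)
    probs = np.stack([mdl.predict_proba(X_test) for mdl in models])   # (c, n_test, k)
    agg = (U_test.T[:, :, None] * probs).sum(axis=0)                  # (n_test, k)
    return agg.argmax(axis=1)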
Question:
Would it be possible to combine a Bayesian belief network/system with Fuzzy Clustering neural networks for intrusion detection?
Can anyone foresee any problems I may encounter? Your input would be most valuable.