Is there a discretization method available in MATLAB?

I have a set of attributes in my data file:
The selected attributes consist of both discrete and continuous attribute types. The attributes Protocol Type and Service are discrete, and the attributes Src Bytes, Dst Bytes, and Count are continuous.
I want to try implementing the k-means/FCM algorithm for clustering training data for a neural network, but I have to process the dataset over a number of iterations, and the continuous attributes will increase the load on the algorithm and thereby decrease its performance. Hence they should be converted to discrete values, but how can I achieve this in MATLAB?
I also need help understanding discrete versus continuous attributes, and why or how the algorithms mentioned use them.

MATLAB uses double-precision floating point by default. I don't think there's a significant improvement to be had from performing integer arithmetic in MATLAB's interpreter (using integer data types might actually be slower, because MATLAB's functions are optimized for doubles).
Code up your algorithm without worrying about optimization; then, if it's too slow, use the profiler to find which parts of your algorithm are slow.
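If you do decide to discretize the continuous attributes anyway, a minimal sketch of one way to do it is below, assuming a reasonably recent MATLAB with the Statistics and Machine Learning Toolbox. The file name, column indices, bin count and cluster count are placeholders, not values taken from the question.

    % Equal-width binning of the continuous columns, then k-means on the result.
    data    = readmatrix('traffic.csv');     % hypothetical data file
    contCol = [3 4 5];                       % e.g. Src Bytes, Dst Bytes, Count
    nBins   = 10;                            % assumed number of bins

    binned = data;
    for c = contCol
        edges = linspace(min(data(:,c)), max(data(:,c)), nBins + 1);
        binned(:,c) = discretize(data(:,c), edges);   % bin indices 1..nBins
    end

    k = 5;                                   % assumed number of clusters
    [idx, centres] = kmeans(binned, k);

    % As suggested above, profile before assuming the continuous columns are
    % the bottleneck:
    %   profile on; [idx, centres] = kmeans(binned, k); profile viewer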

Related

Is it possible to use evaluation metrics (like NDCG) as a loss function?

I am working on an Information Retrieval model called DPR, which is basically a neural network (2 BERTs) that ranks documents given a query. Currently, this model is trained in a binary manner (documents are either relevant or not relevant) and uses Negative Log Likelihood (NLL) loss. I want to change this binary behaviour and create a model that can handle graded relevance (e.g. 3 grades: relevant, somewhat relevant, not relevant). I have to change the loss function because currently I can only assign one positive target to each query (DPR uses PyTorch's NLLLoss), and this is not what I need.
I was wondering if I could use an evaluation metric like NDCG (Normalized Discounted Cumulative Gain) to calculate the loss. I mean, the whole point of a loss function is to tell how far off our prediction is, and NDCG does the same.
So, can I use such metrics in place of a loss function, with some modifications? In the case of NDCG, I think something like subtracting the result from 1 (1 - NDCG_score) might be a good loss function. Is that true?
With best regards, Ali.
Yes, this is possible. You would want to apply a listwise learning to rank approach instead of the more standard pairwise loss function.
In pairwise loss, the network is provided with example pairs (rel, non-rel) and the ground-truth label is a binary one (say 1 if the first among the pair is relevant, and 0 otherwise).
In the listwise learning approach, however, during training you would provide a list instead of a pair, and the ground-truth value (still binary) would indicate whether this permutation is indeed the optimal one, e.g. the one which maximizes nDCG. In a listwise approach, the ranking objective is thus transformed into a classification of permutations.
For more details, refer to this paper.
Obviously, instead of taking features as input, the network may take BERT vectors of the query and of the documents within a list, similar to ColBERT. Unlike ColBERT, where you feed in vectors from 2 documents (pairwise training), for listwise training you need to feed in vectors from, say, 5 documents.
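For concreteness, here is a minimal sketch of scoring one candidate permutation with nDCG, written in MATLAB only because that is the language used elsewhere on this page; the relevance grades are made up. In the listwise setup above, the permutation whose nDCG matches the ideal ordering is the one labelled as optimal, and the question's 1 - NDCG idea corresponds to the last line.

    rel  = [2 0 1 0 1];                            % graded relevance of 5 docs, in ranked order
    dcg  = @(r) sum((2.^r - 1) ./ log2(1 + (1:numel(r))));
    ndcg = dcg(rel) / dcg(sort(rel, 'descend'));   % equals 1 only for the ideal ordering
    loss = 1 - ndcg;                               % the "1 - NDCG_score" suggested in the question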

Neural Networks w/ Fixed Point Parameters

Most neural networks are trained with floating point weights/biases.
Quantization methods exist to convert the weights from float to int, for deployment on smaller platforms.
Can you build neural networks from the ground up that constrain all parameters, and their updates, to integer arithmetic?
Could such networks achieve a good accuracy?
(I know a bit about fixed-point and have only some rusty NN experience from the 90's so take what I have to say with a pinch of salt!)
The general answer is yes, but it depends on a number of factors.
Bear in mind that floating-point arithmetic is basically the combination of an integer significand with an integer exponent so it's all integer under the hood. The real question is: can you do it efficiently without floats?
Firstly, "good accuracy" is highly dependent on many factors. It's perfectly possible to perform integer operations that have finer granularity than floating-point. For example, a 32-bit signed integer has 31 value bits, while a 32-bit float effectively has only a 24-bit significand. So provided you do not require the added precision that floats give you near zero, it's all about the types that you choose. 16-bit, or even 8-bit, values might suffice for much of the processing.
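As a concrete illustration of choosing a type, here is a tiny fixed-point sketch assuming a Q8.8 format in int16; the format and the values are arbitrary examples, not a recommendation.

    % Q8.8 fixed point: a value v is stored as the integer round(v * 2^8).
    Q = 8;
    w = int16(round( 0.75 * 2^Q));                 % weight  0.75 -> raw  192
    x = int16(round(-1.25 * 2^Q));                 % input  -1.25 -> raw -320
    % Multiply in a wider type, then scale back down to Q8.8.
    y = int16(idivide(int32(w) * int32(x), int32(2^Q)));
    double(y) / 2^Q                                % -0.9375, i.e. 0.75 * -1.25 exactly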
Secondly, accumulating the inputs to a neuron has the issue that unless you know the maximum number of inputs to a node, you cannot be sure what the upper bound is on the values being accumulated. So effectively you must specify this limit at compile time.
Thirdly, the most complicated operation during the execution of a trained network is often the activation function. Again, you first have to think about the range of values within which you will be operating. You then need to implement the function without the aid of an FPU and the advanced mathematical functions it provides. One way to do this is via lookup tables.
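A minimal sketch of the lookup-table idea, assuming activations clamped to [-8, 8] and a Q8 fixed-point representation (all names and sizes are illustrative):

    % Build the table offline; floats are fine here, only the table is shipped.
    Q        = 8;                                  % Q8: value = raw / 2^Q
    xMin     = -8;  xMax = 8;                      % assumed activation range
    nEntries = 256;
    xGrid        = linspace(xMin, xMax, nEntries);
    sigmoidTable = int32(round(2^Q ./ (1 + exp(-xGrid))));

    % At run time, map a Q8 input to a clamped table index and look it up.
    step      = int32((xMax - xMin) * 2^Q / (nEntries - 1));
    toIndex   = @(xQ8) min(nEntries, max(1, idivide(int32(xQ8) - xMin*2^Q, step) + 1));
    sigmoidQ8 = @(xQ8) sigmoidTable(toIndex(xQ8));

    double(sigmoidQ8(0))       / 2^Q   % roughly 0.5
    double(sigmoidQ8(4 * 2^Q)) / 2^Q   % roughly sigmoid(4), about 0.98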
Finally, training involves measuring the error between values and that error can often be quite small. Here is where accuracy is a concern. If the differences you are measuring are too low, they will round down to zero and this may prevent progress. One solution is to increase the resolution of the value by providing more fractional digits.
One advantage that integers have over floating-point here is their even distribution. Where floating-point numbers lose absolute precision as they increase in magnitude, integers maintain a constant spacing. This means that if you are trying to measure very small differences between values that are close to 1, you will have no more trouble than you would if those values were close to 0. The same is not true for floats.
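A quick way to see this uneven spacing in MATLAB (single precision used just for illustration):

    eps(single(1))                              % ~1.19e-07: spacing of floats near 1
    eps(single(2^20))                           % 0.125: the spacing near 2^20 is much coarser
    single(2^24) + single(1) == single(2^24)    % true: the +1 is lost entirely
    int32(2^24)  + int32(1)  == int32(2^24)     % false: integer spacing is always 1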
It's possible to train a network with higher precision types than those used to run the network if training time is not the bottleneck. You might even be able to train the network using floating-point types and run it using lower-precision integers but you need to be aware of differences in behavior that these shortcuts will bring.
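As a rough illustration of the float-train / integer-run route, post-training quantization of one weight matrix might look like the sketch below (symmetric per-tensor scaling; purely illustrative, not a recipe):

    W     = randn(4, 3);                          % trained float weights
    scale = max(abs(W(:))) / double(intmax('int8'));
    Wq    = int8(round(W / scale));               % what gets stored and shipped
    Wdeq  = double(Wq) * scale;                   % what the integer network effectively computes with
    max(abs(W(:) - Wdeq(:)))                      % worst-case rounding error introduced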
In short, the problems involved are by no means insurmountable, but you need to take on some of the mental effort that would normally be saved by using floating-point. However, especially if your hardware is physically constrained, this can be a hugely beneficial approach, as floating-point arithmetic requires as much as 100 times more silicon and power than integer arithmetic.
Hope that helps.

MATLAB - How should I choose the FunctionTolerance of genetic algorithm optimisation?

I am optimising MRI experiment settings in order to get optimal precision of tissue property measurements using my data.
To optimise the settings (i.e. a vector of numbers) I am using the MATLAB genetic algorithm function (ga()). I want to compare the final results of different optimisation experiments that have different parameter upper bounds, but I do not know how to choose the FunctionTolerance.
My current implementation takes several days. I would like to increase FunctionTolerance so that it does not take as long, yet still allows me to make reliable comparisons of the final results of the two different optimisation experiments. In other words, I do not want to stop the optimisation too early. I want to stop it when it gets close to its best results, but not wait for a long time for it to refine the result.
Is there a general rule for choosing FunctionTolerance or does it depend on what is being optimised?
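Whatever value is chosen, it may help to fix the stopping criteria explicitly and identically across the runs being compared. A sketch of doing that in MATLAB is below; the objective, bounds and option values are placeholders, not recommendations.

    % Placeholder problem: 4 variables, toy objective.
    nVars = 4;  lb = zeros(1, nVars);  ub = ones(1, nVars);
    myObjective = @(x) sum(x.^2);

    opts = optimoptions('ga', ...
        'FunctionTolerance',   1e-4, ...        % ga stops when the average relative change in the
        'MaxStallGenerations', 50,   ...        % best fitness over this many generations is <= it
        'PlotFcn',             @gaplotbestf);   % watch the best score flatten out

    [xBest, fBest] = ga(myObjective, nVars, [], [], [], [], lb, ub, [], opts);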

Pattern recognition techniques that allow input as sequences of different lengths

I am trying to classify water end-use events, expressed as time-series sequences, into appropriate categories (e.g. toilet, tap, shower, etc.). My first attempt using an HMM shows quite a promising result, with an average accuracy of 80%. I just wonder if there are any other techniques that allow the training input to be time-series sequences of different lengths, as an HMM does, rather than an extracted feature vector for each sequence. I have tried Conditional Random Fields (CRF) and SVM; however, as far as I know, these two techniques require the input to be a pre-computed feature vector, and the length of all input vectors must be the same for training purposes. I am not sure if I am right or wrong on this point. Any help would be appreciated.
Thanks, Will
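For what it's worth, the reason HMMs cope with different lengths is that classification can be done by comparing sequence log-likelihoods, which are defined for sequences of any length. A minimal MATLAB sketch with made-up per-class models (not the poster's trained models):

    % Two toy per-class HMMs (2 hidden states, 3 observation symbols each).
    TRANS_tap    = [0.9 0.1; 0.2 0.8];   EMIS_tap    = [0.7 0.2 0.1; 0.1 0.3 0.6];
    TRANS_shower = [0.6 0.4; 0.3 0.7];   EMIS_shower = [0.2 0.5 0.3; 0.4 0.4 0.2];

    seq = [1 1 2 3 3 3 2];               % an observed event; any length works

    [~, llTap]    = hmmdecode(seq, TRANS_tap,    EMIS_tap);     % log P(seq | tap model)
    [~, llShower] = hmmdecode(seq, TRANS_shower, EMIS_shower);  % log P(seq | shower model)
    if llTap > llShower, label = 'tap'; else, label = 'shower'; end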

Is normalization useful/necessary in optimization?

I am trying to optimize a device design using the MATLAB Optimization Toolbox (using the fmincon function, to be precise). To get my point across quickly, I am providing a small variable set {l_m, r_m, l_c, r_c}, which at its starting values is equal to {4mm, 2mm, 1mm, 0.5mm}.
Though MATLAB doesn't specifically recommend normalizing the input variables, my professor advised me to normalize the variables to the maximum value of {l_m, r_m, l_c, r_c}. Thus the variables will now take values from 0 to 1 (instead of, say, 3mm to 4.5mm in the case of l_m). Of course, I have to modify my objective function to convert them back to the proper values and then do the calculations.
My question is: do optimization functions like fmincon care if the input variables are normalized? Is it reasonable to expect a change in performance on account of normalization? The point to consider is how the optimizer varies a variable like l_m: in one case it can change it from 4mm to 4.1mm, and in the other it can change it from 0.75 to 0.76.
It is usually much easier to optimize when the input is normalized. You can expect an improvement in both the speed of convergence and the accuracy of the output.
For instance, as you can see in these notes (http://www-personal.umich.edu/~mepelman/teaching/IOE511/Handouts/511notes07-7.pdf), the convergence rate of gradient descent has a better bound when the ratio of the largest to the smallest eigenvalue of the Hessian (its condition number) is small. Typically, when your data is normalized, this ratio is much closer to 1 (the optimum).
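A minimal sketch of how the normalisation could be wired around fmincon; the objective is a toy stand-in, and all bounds except l_m's 3mm-4.5mm range are placeholders.

    x0    = [4 2 1 0.5];                 % l_m, r_m, l_c, r_c in mm
    scale = max(x0);                     % normalise by the largest starting value (4mm)

    deviceMetric = @(xPhys) sum((xPhys - [3.5 2.5 1.2 0.8]).^2);   % toy stand-in objective (mm)
    objNorm      = @(xNorm) deviceMetric(xNorm * scale);           % un-scale inside the objective

    lbNorm = [3   1 0.5 0.2] / scale;    % bounds scaled the same way
    ubNorm = [4.5 3 1.5 1  ] / scale;    % (only l_m's 3-4.5mm comes from the question)

    xNormOpt = fmincon(objNorm, x0 / scale, [], [], [], [], lbNorm, ubNorm);
    xOpt     = xNormOpt * scale;         % back to physical units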