Why can I not choose a beta parameter when conducting LDA with Mallet? - mallet

I have recently been working with Mallet to conduct LDA topic modeling. I noticed that I am able to pass the alpha hyperparameter for the algorithm to Mallet, but the LDAMallet class does not contain any variable for the beta parameter. Can anyone tell me why that is?
I know I can turn on hyperparameter optimization every n intervals, which recalculates optimal values for the parameters, but even then I don't know by what criteria they are optimized.
Best,
Nero

I'm assuming you're referring to the gensim wrapper? You can specify beta values from command-line Mallet, so there's no reason this couldn't be implemented in Python, but you're correct that it's not there now.
In practice, the default value of 0.01 is almost always close to optimal for natural language data, which is why I suspect no one has implemented it in gensim.
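If you do want to set beta explicitly today, a minimal sketch (assuming a standard Mallet install; the paths and filenames below are placeholders for your own setup) is to call the train-topics command directly, e.g. from Python:

import subprocess

# Call the Mallet CLI directly, since the gensim wrapper only exposes alpha.
# Adjust mallet_path and corpus_file to your own setup; the corpus file is the
# output of `mallet import-file` or `mallet import-dir`.
mallet_path = "/path/to/mallet/bin/mallet"
corpus_file = "corpus.mallet"

subprocess.run(
    [
        mallet_path, "train-topics",
        "--input", corpus_file,
        "--num-topics", "20",
        "--alpha", "5.0",             # Dirichlet prior on document-topic distributions (sum over topics)
        "--beta", "0.01",             # Dirichlet prior on topic-word distributions
        "--optimize-interval", "10",  # re-estimate hyperparameters every 10 iterations (optional)
        "--output-state", "topic-state.gz",
        "--output-topic-keys", "topic-keys.txt",
        "--output-doc-topics", "doc-topics.txt",
    ],
    check=True,
)

Note that if hyperparameter optimization is enabled, the beta passed here is only the starting value.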

Related

How to check the accuracy of the results obtained from a numerical simulation which used the ODE solver solve_ivp in SciPy?

I have a question regarding the results we obtain from ODE solvers. I will try my best to explain it briefly. For example, if we run a simulation with ANSYS or any other FEA package, there are many parameters we check to judge the quality of the final results before we draw conclusions.
But in a numerical simulation, we are the ones who give relTol, absTol and other parameter values to the solver to improve the accuracy of the calculation. For example, suppose we select solve_ivp, which is a highly customisable solver available with SciPy.
Q1) How exactly can we make sure the results of the solver are acceptable?
Q2) In what ways can we check the quality of the final results before drawing conclusions based on them?
Q3) How can we further improve the accuracy of the results by changing solver options?
I would highly appreciate it if you could share your ideas with sample code.
IMO, Q1 and Q2 are the same question. The reliability of the results will depend on the accuracy of the mathematical model with respect to the simulated phenomenon (for instance, assuming linearity when linearity is questionable) and on the precision of the algorithm. You need to check whether the method converges and, if it does, whether it converges to a correct solution.
Ideally, you should compare your results to "ground truth" on typical problems. Ground truth can be obtained from a lab experiment, or by using an alternative method known to yield correct results. Without this, you will never be sure that your numerical method is valid, other than by an act of faith.
To understand the effect of the parameters and address Q3, you can solve the same problem with different parameter settings and observe their effects one by one. After a while, you should get a better understanding of the convergence properties in relation to the parameter settings.
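As a concrete sketch of that last point (not from the original answer; the toy ODE and tolerance values are my own choices), here is solve_ivp run at several rtol/atol settings on a problem with a known analytic solution, so the actual error can be measured:

import numpy as np
from scipy.integrate import solve_ivp

# Toy problem with a known analytic solution: y' = -2y, y(0) = 1  =>  y(t) = exp(-2t).
# Comparing against this ground truth at several tolerance settings shows how
# rtol/atol control the error that is actually achieved.
def rhs(t, y):
    return -2.0 * y

t_span = (0.0, 5.0)
t_eval = np.linspace(*t_span, 101)
exact = np.exp(-2.0 * t_eval)

for rtol, atol in [(1e-3, 1e-6), (1e-6, 1e-9), (1e-9, 1e-12)]:
    sol = solve_ivp(rhs, t_span, [1.0], t_eval=t_eval, rtol=rtol, atol=atol)
    err = np.max(np.abs(sol.y[0] - exact))
    print(f"rtol={rtol:.0e}, atol={atol:.0e}: max abs error = {err:.2e}, rhs evaluations = {sol.nfev}")

For a real problem without an analytic solution, the same procedure still applies: tighten the tolerances (or switch methods) and check that the solution stops changing within your required precision.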

Is it possible to use evaluation metrics (like NDCG) as a loss function?

I am working on an Information Retrieval model called DPR, which is basically a neural network (two BERTs) that ranks documents given a query. Currently, this model is trained in a binary manner (documents are either relevant or not relevant) and uses Negative Log Likelihood (NLL) loss. I want to change this binary behavior and create a model that can handle graded relevance (e.g. three grades: relevant, somewhat relevant, not relevant). I have to change the loss function because currently I can only assign one positive target per query (DPR uses PyTorch's NLLLoss), and this is not what I need.
I was wondering if I could use an evaluation metric like NDCG (Normalized Discounted Cumulative Gain) to calculate the loss. After all, the whole point of a loss function is to tell how far off our prediction is, and NDCG does the same.
So, can I use such metrics in place of a loss function, with some modifications? In the case of NDCG, I think something like subtracting the result from 1 (1 - NDCG_score) might be a good loss function. Is that true?
With best regards, Ali.
Yes, this is possible. You would want to apply a listwise learning to rank approach instead of the more standard pairwise loss function.
In pairwise loss, the network is provided with example pairs (rel, non-rel) and the ground-truth label is a binary one (say 1 if the first among the pair is relevant, and 0 otherwise).
In the listwise learning approach, however, during training you provide a list instead of a pair, and the ground-truth value (still binary) indicates whether this permutation is indeed the optimal one, e.g. the one which maximizes nDCG. In a listwise approach, the ranking objective is thus transformed into a classification of permutations.
For more details, refer to this paper.
Obviously, instead of taking features as input, the network may take BERT vectors of the query and of the documents within a list, similar to ColBERT. Unlike ColBERT, where you feed in vectors from two documents (pairwise training), for listwise training you need to feed in vectors from, say, five documents.
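To make the listwise idea concrete, here is a small sketch in PyTorch of a ListNet-style listwise loss, a common differentiable surrogate for ranking metrics such as nDCG. This is not the exact permutation-classification formulation from the paper, and the list size, batch size, and relevance grades below are made up for illustration:

import torch
import torch.nn.functional as F

def listnet_loss(scores, relevance):
    # scores:    (batch, list_size) raw model scores for each candidate document
    # relevance: (batch, list_size) graded relevance labels, e.g. 0 / 1 / 2
    # Turn scores and graded labels into "top-one" probability distributions over
    # the documents in each list, then minimise their cross-entropy.
    pred_log_dist = F.log_softmax(scores, dim=-1)
    true_dist = F.softmax(relevance.float(), dim=-1)
    return -(true_dist * pred_log_dist).sum(dim=-1).mean()

# Hypothetical usage: scores produced by a two-encoder (DPR-style) model for a
# list of 5 candidate documents per query, with 3-level graded relevance.
scores = torch.randn(8, 5, requires_grad=True)   # stand-in for model output
relevance = torch.randint(0, 3, (8, 5))          # 0 = not, 1 = somewhat, 2 = relevant
loss = listnet_loss(scores, relevance)
loss.backward()
print(loss.item())

Because the loss is computed over the whole list, graded relevance drops in naturally, which is exactly what the binary NLL setup cannot express.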

Define initial parameters of a nonlinear fit with no information

I was wondering if there exists a systematic way to choose initial parameters for these kinds of problems (as they can take virtually any form). My question arises from the fact that my solution depends somewhat on the initial parameters (as usual). My fit consists of 10 parameters and approximately 5120 data points (x, y, z), and it has nonlinear constraints. I have been doing this by brute force, that is, trying parameters randomly and trying to observe a pattern, but it has led me nowhere.
I also have tried using MATLAB's Genetic Algorithm (to find a global optimum) but with no success as it seems my function has a ton of local minima.
For the purpose of my problem, I need to justify in some manner the reasons behind the choice of initial parameters.
Without any insight into the model and the likely values of the parameters, the search space is too large for anything feasible. Consider that trying just ten values for each of the ten parameters already corresponds to ten billion combinations.
There is no magical black box.
You can try Bayesian Optimization to find a global optimum for expensive black-box functions. MATLAB describes its implementation, bayesopt, as
Select optimal machine learning hyperparameters using Bayesian optimization
but you can use it to optimize any function. Bayesian Optimization works by updating a prior belief over a distribution of functions with the observed data.
To speed up the optimization I would recommend adding your existing data via the InitialX and InitialObjective input arguments.
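If you end up working in Python rather than MATLAB, a rough analogue (my assumption, not part of the original answer) is scikit-optimize's gp_minimize, which also lets you seed the surrogate with previously evaluated points via x0/y0:

import numpy as np
from skopt import gp_minimize
from skopt.space import Real

# Hypothetical stand-in for the expensive 10-parameter fitting objective, e.g.
# the sum of squared residuals of the model against the (x, y, z) data.
def objective(params):
    params = np.asarray(params)
    return float(np.sum((params - 0.5) ** 2))   # placeholder; replace with your residuals

# One search dimension per fit parameter. The bounds here are made up; use
# whatever physically justified ranges you have, since they shrink the search space.
space = [Real(-5.0, 5.0, name=f"p{i}") for i in range(10)]

# Parameter sets already evaluated during earlier brute-force runs (and their
# objective values) can be passed in to warm-start the optimization.
x0 = [list(np.zeros(10))]
y0 = [objective(x0[0])]

result = gp_minimize(objective, space, n_calls=60, x0=x0, y0=y0, random_state=0)
print(result.x, result.fun)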

When should I use a transformation of the input variables in a neural network?

I have run an ANN in MATLAB to predict a variable based on several response variables. All variables have numerical values. I could not get desirable results, although I changed the number of hidden neurons several times over many runs of the model, and so on. My question is: should I apply a transformation to the input variables to get better results? How can I know which transformation I should choose? Thanks for any help.
I strongly advise you to use some methods from time series analysis, like lagged correlation or windowed lagged correlation (with statistical tests). You can find them in most statistical packages (e.g. in R). From one small picture it is hard to deduce whether your prediction is lagged or not. Testing a large amount of data can help you reveal true dependencies and avoid trusting spurious correlations.
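As a sketch of what such a lagged-correlation check can look like in practice (the data here are synthetic, with the response deliberately delayed by three steps; adapt it to your own input and target series):

import numpy as np

def lagged_correlations(x, y, max_lag=20):
    # Pearson correlation between x and y for each relative shift (lag).
    # A peak at a nonzero lag suggests the target follows the input with a delay.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            xs, ys = x[-lag:], y[:lag]
        elif lag > 0:
            xs, ys = x[:-lag], y[lag:]
        else:
            xs, ys = x, y
        out[lag] = np.corrcoef(xs, ys)[0, 1]
    return out

# Synthetic example: a response that follows the input with a 3-step delay.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.roll(x, 3) + 0.1 * rng.normal(size=500)
corrs = lagged_correlations(x, y)
best_lag = max(corrs, key=corrs.get)
print(best_lag, corrs[best_lag])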

Getting -Inf or NaN as the result of a genetic algorithm using MATLAB

I sometimes get -Inf or NaN as the final value of my target function when I use the MATLAB ga toolbox to do the minimization. But if I run the optimization again with exactly the same option setup, I get a finite answer... Could anyone tell me why this is the case, and how I could solve the problem? Thanks very much!
The documentation and examples for ga are bad about this and barely mention the stochastic nature of the method (though if you're using it, you may already be aware of it). If you wish to have repeatable results, you should always specify a seed value when performing stochastic simulations. This can be done in at least two ways. You can use the rng function:
rng(0);
where 0 is the seed value. Or you can possibly use the 'rngstate' field if you specify the optimization as a problem structure. See more here on reproducing results.
If you're doing any sort of experiments you should be specifying a seed. That way you can repeat a run if necessary to check why something may have happened or to obtain more finely-grained data. Just change the seed value to another positive integer if you want to run again.
The Genetic Algorithm is a stochastic algorithm, which means it does not explore the same problem space every time you run it. On each run it will be trying different solutions, and occasionally it is running into a solution on which your target function is ill-behaved.
Without knowing more about your specific problem, all I can really suggest is that you take a closer look at your target function and see if you can restrict it so that it does not explode to negative infinity. Look at the solution returned by the GA when you get these crazy target values, and see if you can adjust your target function so that it does not return infinite values for such solutions.
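As a small illustration of that last suggestion (a sketch in Python rather than MATLAB; the raw objective here is a made-up stand-in for whatever your target function does), you can wrap the objective so the optimizer never sees -Inf or NaN:

import numpy as np

def raw_objective(x):
    # Stand-in for a target function that can blow up for some candidate solutions,
    # e.g. the log of a quantity that can become zero or negative.
    with np.errstate(invalid="ignore", divide="ignore"):
        return float(np.log(x[0] - x[1]))

def safe_objective(x, penalty=1e12):
    # Replace non-finite values with a large finite penalty, which steers a
    # minimizer away from the ill-behaved region instead of returning -Inf or NaN.
    value = raw_objective(x)
    if not np.isfinite(value):
        return penalty
    return value

print(safe_objective([2.0, 1.0]))   # well-behaved region: log(1) = 0
print(safe_objective([1.0, 2.0]))   # log of a negative number -> penalized

The same guard can be written directly into a MATLAB fitness function before handing it to ga.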