Non-parametric counterpart of nested ANOVA with unbalanced design - hierarchical

Is there a non-parametric counterpart of nested ANOVA for an unbalanced design? My data are not normal even after trying different transformations, so I cannot use a parametric test. I came across the Friedman test, but I understand that it requires a complete and balanced design.
What I want to determine is the effect of A (e.g. gear type) on B (e.g. catch), where A is nested in factor C (e.g. study site).
I'm new to R and I am not an expert in statistics either. Any help would be greatly appreciated. Thank you very much.

Related

Multi criteria objective optimization function that output single solution for each variable?

I am trying to write a MATLAB program that optimizes flexure hinge designs. While researching MATLAB functions for multi-criteria objectives I found several, such as gamultiobj, fgoalattain and paretosearch; however, most of them output arrays of results instead of one result. I am looking for a function that outputs a single result for each variable. So I am trying to use fmincon, but it only accepts a single function to optimize, so I looked for ways to combine multiple objective functions. I found the weighted sum method for combining them (for example f(x) = w1 * f1(x) + w2 * f2(x);). I have tried fminimax as well; however, it always weights towards f1 (the first objective function in the function array) even though f2 can still be reduced. I am hoping for a 50/50 compromise between those two objective functions.
So basically I am looking for functions or methods for a nonlinear multi-criteria problem with nonlinear constraints that, given the objective functions, return a single solution in which the objectives are balanced so that none of them is prioritized above the others (aside from the weighted sum method)?
There are a few ways to approach this problem, but I suspect none will do exactly what you're looking for.
When you have more than one objective, and assuming they are competing, there is a trade-off. You need to decide the balance that you prefer between f1 and f2.
Multi-objective optimization, which you mention, provides you with a Pareto-optimal set of solutions. These solutions are what is called non-dominated: no solution in the set is better than another in terms of both f1 and f2. Each solution represents a different trade-off between these values. It is up to you to look at the resulting set and decide which particular trade-off works best for your application.
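To make "non-dominated" concrete, here is a minimal Python sketch that filters a list of candidate designs down to its Pareto set. It assumes both objectives are minimized; the candidate values and the helper names `dominates` and `pareto_front` are made up for illustration and are not part of any MATLAB toolbox.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points):
    """Return the non-dominated points, keeping the input order."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical (f1, f2) values for five candidate designs.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.5)]
front = pareto_front(candidates)
print(front)  # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

The three surviving points are all "optimal" in the Pareto sense, which is why solvers of this kind return an array rather than one answer. Picking one of them is the trade-off decision.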
As you've already found, you can also use a weighted sum of objectives, converting this to a single-objective problem. However, this requires that you know your desired trade-off between the objectives and weight them accordingly. If you have some baseline solution you are trying to improve, you might use it to normalize your function. For example, f = f1/f10 + f2/f20, where f10 and f20 are the values of f1 and f2 at the baseline design, would give an even balance to improving f1 and f2 relative to your initial design.
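The effect of normalizing by a baseline can be seen on a toy problem. The sketch below is in Python rather than MATLAB, uses two invented one-dimensional objectives on very different scales, and replaces a real solver with a crude grid search. All of that is assumed for illustration; in practice the scalarized function would be what you hand to a single-objective solver such as fmincon.

```python
def f1(x):
    # Hypothetical objective, values of order 1.
    return (x - 1.0) ** 2 + 1.0


def f2(x):
    # Hypothetical objective, values of order 10 to 100.
    return 10.0 * (x - 3.0) ** 2 + 10.0


def grid_argmin(func, lo, hi, steps=4000):
    """Crude stand-in for a real optimizer: evaluate on a grid and keep the best x."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(xs, key=func)


x0 = 0.0                      # baseline design (assumed)
f1_0, f2_0 = f1(x0), f2(x0)   # baseline objective values, the f10 and f20 of the text

x_raw = grid_argmin(lambda x: f1(x) + f2(x), 0.0, 4.0)
x_norm = grid_argmin(lambda x: f1(x) / f1_0 + f2(x) / f2_0, 0.0, 4.0)

print("unnormalized sum:", round(x_raw, 3))
print("normalized sum:  ", round(x_norm, 3))
```

Without normalization the larger-scale objective f2 dominates the sum and the optimum sits close to its minimizer at x = 3. After dividing by the baseline values the optimum moves towards x = 1, because relative improvements in both objectives now count equally.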
A third alternative is to convert one objective to a constraint. For example, if you would be happy with any solution that has f1 < c, you can set this as a constraint and then use only f2 as the objective.
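A minimal Python sketch of this objective-to-constraint idea, using invented toy objectives and a grid search in place of a real constrained solver (both are assumptions for illustration). The bound c is something you choose, and the sketch uses f1 <= c rather than a strict inequality for simplicity.

```python
def f1(x):
    # Hypothetical objective that becomes a constraint.
    return (x - 1.0) ** 2 + 1.0


def f2(x):
    # Hypothetical objective that stays an objective.
    return 10.0 * (x - 3.0) ** 2 + 10.0


def constrained_argmin(objective, constraint, bound, lo, hi, steps=4000):
    """Minimize objective over grid points that satisfy constraint(x) <= bound."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    feasible = [x for x in xs if constraint(x) <= bound]
    if not feasible:
        raise ValueError("no grid point satisfies the constraint")
    return min(feasible, key=objective)


c = 2.0  # largest acceptable value of f1 (assumed)
x_best = constrained_argmin(f2, f1, c, 0.0, 4.0)
print(x_best, f1(x_best), f2(x_best))
```

Here f1 <= 2 restricts x to the interval [0, 2], so the best f2 is found at the edge of that interval. Tightening or loosening c moves the solution along the trade-off curve, one point at a time.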
Each of these methods requires that you determine the trade-off, or some satisfactory value, for f1 and f2. No optimization algorithm can come up with that balance for you.

Number of Q values for a deep reinforcement learning network

I am currently developing a deep reinforcement learning network; however, I have a small doubt about the number of Q-values I will have at the output of the NN. I will have a total of 150 Q-values, which personally seems excessive to me. I have read in several papers and books that this could be a problem. I know that it will depend on the kind of NN I build, but do you think that the number of Q-values is too high? Should I reduce it?
There is no general principle for what counts as "too many". Everything depends on the problem and on the throughput you can get in learning. In particular, the number of actions does not have to matter as long as the internal parametrisation of Q(a, s) is efficient. To give an example, let's assume that the neural network is actually of the form NN(a, s) = Q(a, s); in other words, it accepts the action as an input, together with the state, and outputs the Q-value. If such an architecture can be trained on the problem considered, then it might be able to scale to big action spaces. On the other hand, if the neural net basically has an independent output per action, something of the form NN(s)[a] = Q(a, s), then many actions can lead to a relatively sparse learning signal for the model and thus to slow convergence.
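The two forms can be contrasted in a small Python sketch. Plain linear functions stand in for the neural networks, and the state dimension, the random weights and the way the action is encoded as an input are all assumptions made for illustration. Only the count of 150 actions comes from the question.

```python
import random

N_ACTIONS = 150  # from the question
STATE_DIM = 4    # hypothetical state size

rng = random.Random(0)

# Form NN(s)[a] = Q(a, s): one independent output (here, one weight row) per action.
per_action_weights = [[rng.uniform(-1, 1) for _ in range(STATE_DIM)]
                      for _ in range(N_ACTIONS)]


def q_all_actions(state):
    """Return all 150 Q-values for a state in one pass."""
    return [sum(w * s for w, s in zip(row, state)) for row in per_action_weights]


# Form NN(a, s) = Q(a, s): the action is an extra input and the weights are shared.
shared_weights = [rng.uniform(-1, 1) for _ in range(STATE_DIM + 1)]


def q_single(action, state):
    """Return one Q-value for a given (action, state) pair."""
    features = list(state) + [action / (N_ACTIONS - 1)]  # assumed action encoding
    return sum(w * f for w, f in zip(shared_weights, features))


state = [0.1, -0.3, 0.7, 0.2]
q_values = q_all_actions(state)
greedy_per_output = max(range(N_ACTIONS), key=lambda a: q_values[a])
greedy_shared = max(range(N_ACTIONS), key=lambda a: q_single(a, state))

n_params_per_output = N_ACTIONS * STATE_DIM
n_params_shared = STATE_DIM + 1
print(n_params_per_output, n_params_shared)  # 600 5
```

In the per-output form, a single observed transition only provides a target for one of the 150 rows, which is the sparse learning signal mentioned above. In the shared form every transition updates the same weights, at the price of one evaluation per action when choosing the greedy action.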
Since you are asking about reducing the action space, it sounds like the true problem has complex control (maybe it is a continuous control domain?) and you are looking for some discretization to make it simpler to learn. If this is the case, you will have to follow the typical approach of trial and error: try a simple action space, observe the dynamics, and if the results are not satisfactory, increase the complexity of the problem. This allows iterative improvements, as opposed to going in the opposite direction: starting with a setting too complex to get any results and then having to reduce it without knowing what the "reasonable values" are.
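If the underlying control really is continuous, the "start simple, then refine" approach could look like the following Python sketch. The control range of [-1, 1] and the bin counts are assumptions for illustration.

```python
def discretize(low, high, n_bins):
    """Return n_bins evenly spaced action values covering [low, high]."""
    if n_bins < 2:
        raise ValueError("need at least two bins")
    step = (high - low) / (n_bins - 1)
    return [low + i * step for i in range(n_bins)]


coarse = discretize(-1.0, 1.0, 3)  # first attempt: [-1.0, 0.0, 1.0]
fine = discretize(-1.0, 1.0, 9)    # refine only if the coarse agent is not good enough

print(coarse)
print(fine)
```

Each refinement increases the number of Q-values the network has to output, so the coarse setting gives a baseline against which you can judge whether the extra actions are worth it.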

How many and which parents should we select for crossover in genetic algorithm

I have read many tutorials and papers and I understand the concept of a Genetic Algorithm, but I have some problems implementing it in Matlab.
In summary, I have:
A chromosome containing three genes [ a b c ] with each gene constrained by some different limits.
Objective function to be evaluated to find the best solution
What I did:
Generated random values of a, b and c for a population of, say, 20 individuals, i.e.
[a1 b1 c1] [a2 b2 c2]…..[a20 b20 c20]
At each solution, I evaluated the objective function and ranked the solutions from best to worst.
Difficulties I faced:
Now, why should we go for crossover and mutation? Is the best solution I found not enough?
I know the concept of doing crossover (generating random number, probability…etc) but which parents and how many of them will be selected to do crossover or mutation?
Should I do the crossover for the entire 20 solutions (parents) or only two of them?
Generally a Genetic Algorithm is used to find a good solution to a problem with a huge search space, where finding an exact solution is either very difficult or impossible. Obviously, I don't know the range of your values, but since you have only three genes it's likely that a good solution will be found by a Genetic Algorithm (or a simpler search strategy, for that matter) without any additional operators. Selection and crossover are usually carried out on all chromosomes in the population (although it's not uncommon to carry some of the best from each generation forward as they are). The general idea is that the fitter chromosomes are more likely to be selected and undergo crossover with each other.
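One generation of selection and crossover over the whole population, with the best few carried forward unchanged, might look like the Python sketch below. The gene limits, the objective function, tournament selection and single-point crossover are all assumptions chosen for illustration; other selection schemes (e.g. roulette wheel) fit the same description.

```python
import random

BOUNDS = [(0.0, 10.0), (-5.0, 5.0), (1.0, 2.0)]  # hypothetical limits for a, b, c


def fitness(chrom):
    """Hypothetical objective to minimize."""
    a, b, c = chrom
    return (a - 3.0) ** 2 + (b - 1.0) ** 2 + (c - 1.5) ** 2


def tournament(pop, rng, k=2):
    """Pick k chromosomes at random and return the fitter one."""
    return min(rng.sample(pop, k), key=fitness)


def crossover(p1, p2, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:]


def next_generation(pop, rng, n_elite=2):
    """Keep the n_elite best as they are, fill the rest with children."""
    children = sorted(pop, key=fitness)[:n_elite]
    while len(children) < len(pop):
        children.append(crossover(tournament(pop, rng), tournament(pop, rng), rng))
    return children


rng = random.Random(42)
population = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(20)]
best_before = min(fitness(c) for c in population)
for _ in range(30):
    population = next_generation(population, rng)
best_after = min(fitness(c) for c in population)
print(best_before, best_after)
```

Every chromosome can be drawn as a parent, but fitter ones win tournaments more often, which is the sense in which crossover is applied to the whole population rather than to two hand-picked parents. Because the elite are copied forward, the best fitness can never get worse from one generation to the next.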
Mutation is usually used to stop the Genetic Algorithm from converging prematurely on a non-optimal solution. You should analyse the results without mutation to see if it's needed. Mutation is usually run on the entire population, at every generation, but with a very small probability. Giving every gene a 0.05% chance of mutating isn't uncommon. You usually want to give a small chance of mutation, without it completely overriding the results of selection and crossover.
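A per-gene mutation step of this kind can be sketched in Python as follows. The gene limits are invented, and redrawing a mutated gene uniformly within its limits is just one possible choice, assumed here for simplicity.

```python
import random

BOUNDS = [(0.0, 10.0), (-5.0, 5.0), (1.0, 2.0)]  # hypothetical limits for a, b, c
MUTATION_RATE = 0.0005                           # 0.05% chance per gene


def mutate(chrom, rng, rate=MUTATION_RATE):
    """Redraw each gene within its own limits with probability `rate`."""
    return [rng.uniform(lo, hi) if rng.random() < rate else gene
            for gene, (lo, hi) in zip(chrom, BOUNDS)]


rng = random.Random(1)
population = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(20)]

# Applied to every chromosome in every generation; most genes stay untouched.
mutated = [mutate(c, rng) for c in population]
n_changed = sum(g != h for c, m in zip(population, mutated) for g, h in zip(c, m))
print("genes changed this generation:", n_changed)
```

With 60 genes in the population and such a low rate, most generations change nothing at all, so selection and crossover still drive the search while mutation occasionally reintroduces lost values.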
As has been suggested, I'd do a little bit more general background reading on Genetic Algorithms to get a better understanding of the concepts.
Sharing a bit of advice from the book 'Practical Neural Network Recipes in C++'... It is a good idea to have a significantly larger population for your first epoch, so that you're likely to include features which will contribute to an acceptable solution. Later epochs, which can have smaller populations, will then tune and combine or discard these favourable features.
And Handbook-Multiparent-Eiben seems to indicate that four parents are better than two. However, bed manufacturers have not caught on to this yet and seem to produce only single and double beds.

Choosing Clustering Method based on results

I'm using WEKA for my thesis and have over 1000 lines of data. The database includes demographic information (age, location, status, etc.) followed by names of products (valued 1 or 0). The end result is a recommender system.
I used two methods of clustering, K-Means and DBScan.
When using K-means I tried 3 different numbers of clusters, while with DBScan I chose 3 different epsilons (epsilon 3 = 48 clusters with 17% of the data ignored; epsilon 2.5 = 19 clusters, where cluster 0 holds 229 items, with 6% ignored). This means I have 6 different clustering results for the same data.
How do I choose what best suits my data?
What is "best"?
As some smart people noticed:
the validity of a clustering is often in the eye of the beholder
There is no objectively "better" for clustering, or you are not doing cluster analysis.
Even when a result actually is "better" on some mathematical measure such as separation or silhouette, or even under a supervised evaluation using labels, it's still only better at optimizing towards some mathematical goal, not towards your use case.
K-means finds a locally optimal sum-of-squares assignment for a given k. (And if you increase k, there exists a better assignment!) DBSCAN (it's actually correctly spelled all uppercase) always finds the optimal density-connected components for the given MinPts/Epsilon combination. Yet both just optimize with respect to some mathematical criterion. Unless this criterion aligns with your requirements, it is worthless. So there is no best until you know what you need. But if you knew what you need, you would not need to do cluster analysis.
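The remark that increasing k always allows a better sum-of-squares can be checked on a toy example. The Python sketch below uses made-up one-dimensional data and, instead of the k-means heuristic itself, computes the exact minimum within-cluster sum of squares by dynamic programming (possible in one dimension because optimal clusters are contiguous in sorted order). Both the data and the function names are assumptions for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean of one cluster."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_sse(sorted_values, k):
    """Minimum total SSE when splitting sorted 1-D data into k contiguous clusters."""
    n = len(sorted_values)
    inf = float("inf")
    # cost[j][i] = best SSE for the first i points using exactly j clusters
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            cost[j][i] = min(cost[j - 1][m] + sse(sorted_values[m:i])
                             for m in range(j - 1, i))
    return cost[k][n]


data = sorted([1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 9.1, 9.0, 8.7])
curve = [best_sse(data, k) for k in range(1, len(data) + 1)]
for k, value in enumerate(curve, start=1):
    print(k, round(value, 4))
```

The criterion keeps falling until every point is its own cluster and the sum of squares is zero. So the number itself cannot tell you which k is "best"; that judgement has to come from what the clusters are for.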
So what to do?
Try different algorithms and different parameters, and use your domain knowledge to analyze whether the output helps you with the problem you are trying to solve. If the results help you solve your problem, then they are good. If they do not help, try again.
Over time, you will collect some experience. For example, if the sum-of-squares is meaningless for your domain, don't use k-means. If your data does not have meaningful density, don't use density based clustering such as DBSCAN. It's not that these algorithms fail. They just don't solve your problem, they solve a different problem that you are not interested in. And they might be really good at solving this other problem...

Ensemble classifier with wrapper method

I'm trying to combine multiple classifiers (ANN, SVM, kNN, etc.) using ensemble learning (voting, stacking, etc.).
In order to make a classifier, I'm using more than 20 types of explanatory variables.
However, each classifier has its own best subset of explanatory variables. Thus, after finding the best combination of explanatory variables for each classifier with a wrapper method,
I would like to combine the classifiers (ANN, SVM, kNN, etc.) using ensemble learning (voting, stacking, etc.).
By using the meta-learning with weka, I should be able to use the ensemble itself.
But I can not obtain the best combination of explanatory variables since wrapper method summarizes the prediction of each classifier.
I am not tied to WEKA if this can be solved more easily in, say, MATLAB or R.
With ensemble approaches, the best results have been achieved with very simple classifiers, which also tend to be pretty fast and so make up for the cost of the ensemble.
This may seem counterintuitive at first: one would expect a better input classifier to produce a better output. However, there are two reasons why this does not work.
First of all, with simple classifiers, you can usually tweak them more to get a diverse set of input classifiers. A full-dimensional method plus feature bagging gives you a diverse set of classifiers. A classifier that internally does feature selection or reduction makes feature bagging largely ineffective for getting variety. Secondly, a complex method such as SVM is more likely to optimize/converge towards the very same result. After all, the complex methods are supposed to go through a much larger search space and find the best result in it. But that also means you are more likely to get the same result again.
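As a rough illustration of feature bagging with very simple members, the Python sketch below trains nearest-centroid classifiers on random feature subsets and combines them by majority vote. The synthetic data, the choice of nearest-centroid as the base classifier, and the ensemble sizes are all assumptions for illustration. It is not a WEKA recipe.

```python
import random
from collections import Counter


def fit_centroids(rows, labels, features):
    """Per-class mean of the selected features: a deliberately primitive classifier."""
    counts = Counter(labels)
    sums = {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += row[f]
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_one(centroids, features, row):
    """Predict the class whose centroid is closest on the selected features."""
    def dist(centroid):
        return sum((row[f] - centroid[i]) ** 2 for i, f in enumerate(features))
    return min(centroids, key=lambda label: dist(centroids[label]))


def fit_ensemble(rows, labels, n_members, n_features, rng):
    """Feature bagging: each member sees its own random subset of the features."""
    members = []
    for _ in range(n_members):
        features = rng.sample(range(len(rows[0])), n_features)
        members.append((features, fit_centroids(rows, labels, features)))
    return members


def vote(members, row):
    """Majority vote over the members' predictions."""
    ballots = Counter(predict_one(c, f, row) for f, c in members)
    return ballots.most_common(1)[0][0]


rng = random.Random(0)


def sample(label):
    # Synthetic row with 20 features; class 1 is shifted by +1.0 in each feature.
    return [rng.gauss(float(label), 1.0) for _ in range(20)]


train_y = [i % 2 for i in range(200)]
train_x = [sample(y) for y in train_y]
test_y = [i % 2 for i in range(100)]
test_x = [sample(y) for y in test_y]

members = fit_ensemble(train_x, train_y, n_members=15, n_features=5, rng=rng)
accuracy = sum(vote(members, r) == y for r, y in zip(test_x, test_y)) / len(test_x)
print("held-out accuracy:", accuracy)
```

Each member is weak on its own because it only sees 5 of the 20 features, but the members differ from each other, and that diversity is what the vote exploits. An odd number of members avoids ties in the two-class case.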
Last but not least, when using very primitive classifiers, the errors are better behaved and more likely to even out when the ensemble is combined.