Can we convert an adjacency list model of a cyclic graph into a nested set model in an RDBMS? - rdbms

If a cyclic graph is stored in the adjacency list model, we query it using CTEs, which is very slow. If there were a way to convert the adjacency list model of a cyclic graph into the nested set model, I guess the queries might run a little faster. I know that this conversion is possible for a tree, but I didn't find a method for graphs, especially cyclic graphs.
If we can't convert to the nested set model, what is the best way to store and query cyclic graphs in an RDBMS other than adjacency lists and CTEs?

Related

Can I separately train a classifier (e.g. SVM) with two different types of features and combine the results later?

I am a student working on my first simple machine learning project. The project is about classifying articles as fake or true. I want to use an SVM as the classification algorithm, with two different types of features:
TF-IDF
Lexical Features like the count of exclamation marks and numbers
I have figured out how to use the lexical features and TF-IDF as features separately. However, I have not managed to figure out how to combine them.
Is it possible to train and test two separate learning algorithms (one with TF-IDF and the other with lexical features) and combine the results later?
For example, can I calculate Accuracy, Precision and Recall for both separately and then take the average?
One way of combining two models is called model stacking. The idea behind it is that you take the predictions of both models and feed them into a third model (called a meta-model), which is then trained to make predictions given the output of the first two models. There is also another version of model stacking where you additionally feed the original features into the meta-model.
However, in your case another way to combine both approaches would be to simply feed both the TF-IDF and the lexical features into one model and see how that performs.
As for calculating accuracy, precision and recall for both models separately and then taking the average: this would unfortunately not work, because there is no combined model making the predictions that those averaged metrics would describe.
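The suggestion of feeding both the TF-IDF and the lexical features into one model can be sketched as a plain concatenation of the two feature vectors per article. This is a minimal, standard-library-only illustration, not a definitive implementation: the function names (`tfidf_rows`, `lexical_features`, `combined_features`), the tokenizer, the smoothed IDF formula and the two sample texts are all assumptions made for the example, and in practice a library vectorizer would normally replace the hand-written TF-IDF.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_rows(docs):
    """Return one dense TF-IDF row per document (smoothed IDF, no normalization)."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(docs)
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        rows.append([
            (counts[tok] / total) * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok in vocab
        ])
    return rows


def lexical_features(text):
    """Hand-crafted features: exclamation-mark count and count of numbers."""
    return [float(text.count("!")), float(len(re.findall(r"\d+", text)))]


def combined_features(docs):
    """Concatenate each document's TF-IDF row with its lexical features."""
    return [row + lexical_features(doc) for row, doc in zip(tfidf_rows(docs), docs)]


# Hypothetical sample articles, only for illustration.
docs = ["Shocking!!! 100 reasons they lied", "The council met on Tuesday"]
X = combined_features(docs)
```

Each row of `X` could then be passed to a single SVM together with the article's label. Because raw counts and TF-IDF weights live on different scales, scaling the columns before training is probably worth trying.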

How to improve predictor importance in decision tree ensemble (using TreeBagger class in Matlab)

I'm trying to train a classifier (specifically, a decision forest) using the Matlab 'TreeBagger' class.
I notice from the online documentation for TreeBagger, that there are a couple of methods/properties that could be used to see how important each data point feature is for distinguishing between classes of data point.
The two I found were the ComputeOOBVarImp property and the ClassificationTree.predictorImportance method. Using the latter on a decision forest/bagged ensemble of trees that I'd built, I found that many data point features had zero importance for the classifier.
Is there anything I can do with the TreeBagger class, or in conjunction with it, so that my trees use weak learners/splitting criteria that aren't just bounds on single input data features, but linear combinations of these features, in order to improve the 'information gain' at each node split?
I suppose this comes down to dimensionality reduction of the data, which I have no experience with in Matlab.
Thanks.

Different results from SMO, NaiveBayes, and BayesNet classifiers in Weka

I am trying different Weka classifiers on my data set. I have a small dataset and I am classifying my data into five classes.
My problem is that when I apply cross-validation or a percentage split with different classifiers, I get very different results.
For example, when I use the NaiveBayes or BayesNet classifiers, I get an F-score of around 40 for all classes, but using SMO I get an F-score of 20. The worst result comes from the LibLinear classifier, which gives me F-scores of around 15.
Maybe I should mention that since the LibLinear classifier doesn't accept nominals, I assign a code to each of the possible nominal values and use them as numeric values in my dataset.
Can anybody tell me why I get such different results? I expected all classifiers to have roughly similar results.
In addition, when I use LibLinear on my test set, all data is classified under one of the classes and there are no instances in the other four classes.
Thanks in advance,
Why would you expect similar results? Especially for a small data set, I think different methods could easily lead to different predictions. Also, linear models have a tolerance threshold that can cause termination before convergence. It's something you can play with in LibLINEAR or SMO, for instance.
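The question mentions coding each nominal value as a number so that LibLinear accepts it. A linear model treats such codes as ordered magnitudes, so this coding may be one contributor to the poor LibLinear result; that is an assumption about this dataset, not something the answer above states. A common alternative is one 0/1 indicator column per category, sketched below with a hypothetical helper `one_hot_encode` and made-up values. Weka's NominalToBinary filter performs a comparable transformation.

```python
def one_hot_encode(values):
    """Map each nominal value to a 0/1 indicator row, one column per category."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that belongs to this category
        rows.append(row)
    return categories, rows


# Hypothetical nominal attribute, only for illustration.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
```

With this encoding no category is numerically "larger" than another, which an integer code such as red=1, green=2, blue=3 would imply.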

Dynamic balanced data structure in Matlab?

This answer states
I don't think you (or I) can do dynamic data structures 'in' MATLAB.
We have to use MATLAB OO features and MATLAB classes. Since I think
that these facilities are really a MATLAB wrapper around Java I make
the bold claim that those facilities are outside MATLAB. A matter of
semantics, I concede. If you want to do dynamic data structures with
MATLAB, you have to use OO and classes, you can't do it with what I
think of as the core language, which lacks pointers at the user level.
Now suppose we have a bag. New numbers are added to the bag in random order, and the numbers must still stay ordered. How many numbers there will be is unknown. Hence I need a dynamic data structure: the size of the structure must be able to change. The structure must also be able to stay balanced, i.e. I need to keep it ordered.
Which data structure should I use for the dynamic balanced data-structure requirement in Matlab?
Matlab's matrices are inherently dynamic. If you have a row vector of ordered numbers and want to insert a new number in its proper place (keeping the vector ordered), you can simply do
ind = find(number <= vector, 1, 'first'); % index of the first element not smaller than number
if isempty(ind), ind = numel(vector)+1; end % in this case, insert at the end
vector = [vector(1:ind-1) number vector(ind:end)]; % do the insert, extending the vector
Of course this is not very fast because of the need for memory reallocation.

What is the best way to implement a tree in matlab?

I want to write an implementation of a (not necessarily binary) tree and run some algorithms on it. The reason for using Matlab is that all the other programs are in Matlab, and it would be useful for some analysis and plotting. From an initial search I found that there is nothing like pointers in Matlab. So I'd like to know the best (in terms of convenience) way to do this in Matlab. Or are there any other ways?
You can do this with MATLAB objects but you must make sure you use handle objects and not value objects because your nodes will contain cross-references to other nodes (i.e. parent, next sibling, first child).
This question is very old but still open. So I would just like to point readers to this implementation in plain MATLAB made by yours truly. Here is a tutorial that walks you through its use.
Matlab is very well suited to handling any kind of graph (not only trees) represented as an adjacency matrix or an incidence matrix.
Matrices (representing graphs) can be either dense or sparse, depending on the properties of your graphs.
Last but not least, graph theory and linear algebra are related to each other in very fundamental ways, so Matlab gives you a very nice platform for harnessing such relationships.
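The adjacency-matrix representation and its link to linear algebra can be illustrated with a small tree. The sketch below is written in Python rather than Matlab, uses only the standard library, and relies on hypothetical helpers (`adjacency_matrix`, `matmul`) and a made-up four-node tree. It uses the standard fact that entry (i, j) of the squared adjacency matrix counts the walks of length two from node i to node j.

```python
def adjacency_matrix(n, edges):
    """Build an n-by-n 0/1 matrix with a[u][v] = 1 for each directed edge u -> v."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = 1
    return a


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical tree: node 0 is the root with children 1 and 2; node 1 has child 3.
edges = [(0, 1), (0, 2), (1, 3)]
A = adjacency_matrix(4, edges)

# Children of a node are the non-zero columns of its row.
children_of_root = [j for j, x in enumerate(A[0]) if x]

# A2[i][j] counts walks of length two, i.e. grandparent -> grandchild pairs here.
A2 = matmul(A, A)
```

In this tree the only non-zero entry of `A2` is the one for the pair (0, 3), the single grandparent-grandchild relationship.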