What is the difference between the hierarchical, network, and relational data models?
Hierarchical model
1. One-to-one or one-to-many relationships.
2. Based on a parent-child relationship.
3. Retrieval algorithms are complex and asymmetric.
4. More data redundancy.
Network model
1. Many-to-many relationships.
2. A record can have many parents as well as many children.
3. Retrieval algorithms are complex but symmetric.
4. More data redundancy.
Relational model
1. One-to-one, one-to-many, and many-to-many relationships.
2. Based on relational data structures (tables of rows and columns).
3. Retrieval algorithms are simple and symmetric.
4. Less data redundancy.
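To make the redundancy and retrieval points concrete, here is a small illustrative sketch in plain Python (all the data and field names are invented): a child shared by two parents must be duplicated in a hierarchical structure, while the relational version stores it once and links it through keys.

```python
# Hierarchical: strictly parent -> child, so a child shared by two parents
# must be duplicated under each one (hence the higher redundancy).
hierarchical = {
    "Project A": {"employees": [{"name": "Ann", "phone": "555-0100"},
                                {"name": "Bob", "phone": "555-0101"}]},
    "Project B": {"employees": [{"name": "Ann", "phone": "555-0100"}]},  # Ann duplicated
}

# Relational: flat tables linked by keys; Ann is stored once, and the
# many-to-many relationship lives in a separate junction table.
employees = [{"emp_id": 1, "name": "Ann", "phone": "555-0100"},
             {"emp_id": 2, "name": "Bob", "phone": "555-0101"}]
assignments = [("Project A", 1), ("Project A", 2), ("Project B", 1)]

# A symmetric "join": retrieval works equally well from either side.
for project, emp_id in assignments:
    emp = next(e for e in employees if e["emp_id"] == emp_id)
    print(project, "->", emp["name"])
```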
Is it possible to model a general trend from a population using GPflow and also have individual predictions, as in Hensman et al.?
Specifically, I am trying to fit spatial data from a number of individuals from a clinical assessment. For each individual I am dealing with approximately 20,000 data points (a different number of recordings per individual), which definitely restricts me to a sparse implementation. In addition, it also seems that I need an input-dependent noise model, hence the heteroskedasticity.
I have fitted a heteroskedastic sparse model as in this notebook example, but I am not sure how to scale it up to perform the hierarchical learning. Any ideas would be welcome :)
https://github.com/mattramos/SparseHGP may be helpful. This repo gives GPflow 2 code for modelling a sparse hierarchical GP. Note that there are still some rough edges in the implementation that require an expensive for loop to be constructed.
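For context, here is a minimal sketch of the sparse (SVGP) building block in GPflow 2 that such a model scales up from, with random toy data standing in for one individual's ~20,000 recordings. The heteroskedastic notebook swaps the Gaussian likelihood for a heteroskedastic one; all shapes and hyperparameters below are assumptions.

```python
import numpy as np
import tensorflow as tf
import gpflow

# Toy stand-in for one individual's spatial recordings (hypothetical shapes).
X = np.random.rand(20000, 2)
Y = np.sin(X.sum(axis=1, keepdims=True)) + 0.1 * np.random.randn(20000, 1)

# Sparse variational GP: M inducing points keep the cost manageable
# compared to the O(N^3) exact GP.
M = 100
Z = X[np.random.choice(len(X), M, replace=False)].copy()
model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(lengthscales=[1.0, 1.0]),
    likelihood=gpflow.likelihoods.Gaussian(),  # swap for a heteroskedastic likelihood
    inducing_variable=Z,
    num_data=len(X),  # needed to scale the ELBO under minibatching
)

# Minibatched ELBO training with Adam.
batches = iter(tf.data.Dataset.from_tensor_slices((X, Y)).repeat().shuffle(10000).batch(256))
loss = model.training_loss_closure(batches, compile=True)
opt = tf.optimizers.Adam(0.01)
for _ in range(2000):
    opt.minimize(loss, model.trainable_variables)
```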
I want to do link prediction in a heterogeneous knowledge graph that will evolve over time. The entity types and relations will stay the same, but there will be unseen nodes.
I would like to know whether GraphSAGE is the right choice for this purpose. Does PyTorch Geometric support GraphSAGE for heterogeneous graphs?
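For what it's worth: GraphSAGE is a reasonable fit, because it is inductive (it learns aggregation functions rather than per-node embeddings), so it can embed nodes unseen during training. PyTorch Geometric has no dedicated heterogeneous GraphSAGE class, but its to_hetero transform replicates a homogeneous SAGEConv model for every node and edge type. A minimal sketch, assuming PyG ≥ 2.0 and an invented two-type schema:

```python
import torch
from torch_geometric.data import HeteroData
from torch_geometric.nn import SAGEConv, to_hetero

# Tiny stand-in graph with two node types and one relation (plus its reverse,
# so that both node types receive messages). Schema is hypothetical.
data = HeteroData()
data["user"].x = torch.randn(4, 8)
data["item"].x = torch.randn(5, 16)
data["user", "rates", "item"].edge_index = torch.tensor([[0, 1, 2], [0, 2, 4]])
data["item", "rated_by", "user"].edge_index = torch.tensor([[0, 2, 4], [0, 1, 2]])

class SAGE(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels):
        super().__init__()
        # (-1, -1) lets PyG infer input sizes lazily, which differ per node type.
        self.conv1 = SAGEConv((-1, -1), hidden_channels)
        self.conv2 = SAGEConv((-1, -1), out_channels)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)

# Replicate the model for every node/edge type in the heterogeneous graph.
model = to_hetero(SAGE(64, 32), data.metadata(), aggr="sum")
out = model(data.x_dict, data.edge_index_dict)  # dict: node type -> embeddings
```

For link prediction you would then score a candidate edge from the two endpoint embeddings, e.g. with a dot product.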
As we know, referential constraints are not enforced by Redshift. Should we still opt for dimensional modeling?
If so, how do we get around this limitation and maintain the data integrity of our data warehouse?
Yes, dimensional modelling is feasible and strongly encouraged on Redshift; Redshift is even optimized for star-schema queries:
Optimizing for Star Schemas and Interleaved Sorting on Amazon Redshift
Also refer to this answer: Is dimensional modeling feasible in Amazon RedShift?
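One common pattern is to keep declaring the primary-key and foreign-key constraints (Redshift's planner uses them as hints even though it does not enforce them) and to enforce integrity in the ETL layer before loading. A hedged sketch of such a pre-load check with pandas; the table and column names are invented:

```python
import pandas as pd

# Staging data about to be COPYed into the warehouse (hypothetical frames).
fact_sales = pd.DataFrame({"sale_id": [1, 2, 3], "customer_id": [10, 11, 99]})
dim_customer = pd.DataFrame({"customer_id": [10, 11], "name": ["Ann", "Bob"]})

# Redshift will not reject orphaned foreign keys, so reject them here:
# any fact row whose customer_id is missing from the dimension fails the load.
orphans = fact_sales[~fact_sales["customer_id"].isin(dim_customer["customer_id"])]
if not orphans.empty:
    raise ValueError(f"{len(orphans)} fact rows reference missing customer_ids: "
                     f"{sorted(orphans['customer_id'].unique())}")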
I am performing multiclass classification and am investigating the impact on performance of different types of features. I am using an SVM 1v1 classifier on each set of features separately, and now I want to train a combined model that makes use of all the feature sets I have. What are the ways of creating such a combined model without simply dumping all the features together? My understanding is that this is similar to the idea of an ensemble model; however, I couldn't find examples of ensembles that operate on multiple feature sets.
I should also mention that I am looking for out-of-the-box implementations or libraries, rather than implementing the models myself.
If you have a 1-1 mapping between your abstract objects and the features in each of your sets, then this is actually a classical ensemble model; there is no difference at all. Think of your model as using multiple different feature extractors for the objects, thus
ABSTRACT OBJECTS ------ FEATURES ------ MODELS
\______________________________/           |
    your definition of data      your definition of model
while the typical ML perspective on your approach would be
ABSTRACT OBJECTS ------ FEATURES ------ MODELS
       |                \____________________/
      data                       model
In other words, each pair (features_set, model) defines an actual model, and as you can see, from this perspective you simply use any ensemble technique. The fact that you somehow "hand-crafted" your various feature sets does not change the fact that it is just part of modeling a function from your abstract objects (whatever they are) to actual decisions.
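Concretely, since out-of-the-box tooling was requested: in scikit-learn each (features_set, model) pair can be a pipeline that selects its columns and fits an SVM, and a stacking ensemble combines them. A sketch, with the column split being a stand-in assumption:

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import StackingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical split: feature set A is columns 0..9, set B is columns 10..24.
set_a = list(range(0, 10))
set_b = list(range(10, 25))

def per_set_svm(columns):
    # One (features_set, model) pair: select the set's columns, then an SVC
    # (which uses one-vs-one for multiclass, matching the 1v1 setup).
    return make_pipeline(
        ColumnTransformer([("select", "passthrough", columns)]),
        SVC(probability=True),
    )

ensemble = StackingClassifier(
    estimators=[("svm_a", per_set_svm(set_a)), ("svm_b", per_set_svm(set_b))],
    # The default final_estimator (logistic regression) learns how to
    # weight the per-feature-set predictions.
)
# ensemble.fit(X_train, y_train); ensemble.predict(X_test)
```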
Topic modeling identifies the distribution of topics in a document collection, which effectively identifies the clusters in the collection. So is it right to say that topic modeling is a technique for document clustering?
A topic is quite different from a cluster of docs; after all, a topic is not composed of docs.
However, the two techniques are indeed related. I believe topic modeling is a viable way of deciding how similar documents are, and hence a viable basis for document clustering.
By representing each document as a topic distribution (in effect, a vector), topic modeling techniques reduce the feature dimensionality from the number of distinct words appearing in the corpus to the number of topics. The similarity between documents' topic distributions can then be computed with cosine similarity or many other metrics, and it reflects how similar the documents are in terms of the topics/themes they cover. Based on this quantified similarity measure, many clustering algorithms can be applied to group the documents.
In this sense, I think it is right to say that topic modeling is a technique for document clustering.
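As a minimal sketch of that pipeline in scikit-learn (the corpus, topic count, and cluster count are placeholder choices):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data[:2000]

# Bag-of-words counts -> per-document topic distribution: dimensionality
# drops from thousands of distinct words to n_components topics.
counts = CountVectorizer(max_features=5000, stop_words="english").fit_transform(docs)
topic_dist = LatentDirichletAllocation(n_components=20, random_state=0).fit_transform(counts)

# Cosine similarity between topic distributions, then cluster in topic space.
sims = cosine_similarity(topic_dist[:5])
labels = KMeans(n_clusters=10, random_state=0).fit_predict(topic_dist)
```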
The relation between clustering and classification is very similar to the relation between topic modeling and multi-label classification.
In single-label multi-class classification we assign just one label to each document, and in clustering we put each document in just one group. The difference is that we can't define the clusters in advance the way we define labels; if we ignore this fact, grouping and labeling are essentially the same thing.
However, in real-world problems flat classification is not sufficient: documents are often related to multiple categories/classes, so we leverage multi-label classification. Now we can see topic modeling as the unsupervised version of multi-label classification, since we can put each document under multiple groups/topics. Here again, I'm ignoring the fact that we can't decide in advance which topics to use as labels.
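To illustrate the parallel with a small sketch: the same topic distribution yields a hard clustering via argmax and a multi-label grouping via a threshold (both the data and the threshold below are invented):

```python
import numpy as np

# topic_dist as in the earlier sketch: rows are per-document topic weights.
rng = np.random.default_rng(0)
topic_dist = rng.dirichlet(np.ones(20), size=100)  # stand-in: 100 docs, 20 topics

# Single-label view (hard clustering): keep only the dominant topic.
hard_labels = topic_dist.argmax(axis=1)

# Multi-label view: assign every topic whose weight clears a threshold.
threshold = 0.2  # arbitrary placeholder
multi_labels = [np.flatnonzero(row > threshold).tolist() for row in topic_dist]
```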