In the DataMapper documentation for associations I found an example where one model is nested inside another, like this:
class Person

  class Link

    include DataMapper::Resource

    storage_names[:default] = 'people_links'

    # the person who is following someone
    belongs_to :follower, 'Person', :key => true

    # the person who is followed by someone
    belongs_to :followed, 'Person', :key => true

  end

  include DataMapper::Resource

  property :id, Serial
  property :name, String, :required => true
  ...
Does it have any influence on the result you get back or is it just another notation or format?
Thanks in advance, rufus
No, it doesn't have any influence on the result.
If you put your models in a namespace, the namespace is reflected in the storage names, though. That's why, in the example above, the Link model sets storage_names[:default] = 'people_links': the model lives inside the Person namespace, and the "people_links" table name reflects that.
I am working with TraMineR and I am new to R and TraMineR.
Actually I made a typology of a life course dataset with TraMineR and the cluster library in R.
(used this guide: http://traminer.unige.ch/preview-typology.shtml)
Now I have different Cases sorted into different Types (all in all 4 Types).
I want to get into deeper analysis with a certain Type but I need to know which cases ( I have case numbers) belong to which type.
Is it possible to write the type each case is sorted into back into the dataset itself as a new variable? Or is there another way?
In the example of the referenced guide, the Type is obtained as follows, using optimal matching distances with substitution costs based on transition probabilities:
library(TraMineR)
data(mvad)
mvad.seq <- seqdef(mvad, 17:86)
dist.om1 <- seqdist(mvad.seq, method = "OM", indel = 1, sm = "TRATE")
library(cluster)
clusterward1 <- agnes(dist.om1, diss = TRUE, method = "ward")
cl1.4 <- cutree(clusterward1, k = 4)
cl1.4 is a vector with the cluster membership of the sequences, in the same order as the rows of the mvad dataset. (It could be convenient to transform it into a factor.) Therefore, you can simply add this variable as an additional column to the dataset. For instance, to name this new column Type:
mvad$Type <- cl1.4
tail(mvad[,c("id","Type")]) ## id and Type of the last 6 sequences
## id Type
## 707 707 3
## 708 708 3
## 709 709 4
## 710 710 2
## 711 711 1
## 712 712 4
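The same idea, sketched outside R in plain Python purely as an illustration: because the membership vector is in the same order as the cases, it can be attached position by position and then used to list the case numbers of one type. The ids and types below are the six rows printed above; the variable names are made up for this sketch.

```python
# Case ids and cluster memberships, in the same order
# (the last six rows of the output above).
ids = [707, 708, 709, 710, 711, 712]
membership = [3, 3, 4, 2, 1, 4]

# Attach the membership to each case as a new "Type" field.
cases = [{"id": i, "Type": t} for i, t in zip(ids, membership)]

# Case numbers that belong to one type, here Type 3.
type3_ids = [case["id"] for case in cases if case["Type"] == 3]
print(type3_ids)  # [707, 708]
```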
I need your help with the formula for a calculated field in Tableau (Tableau Prep, to be accurate).
I have a field called [Code Order] which contains only runs of odd numbers (1,3,5,7,9,...), repeated multiple times, so it can look like (1,3,1,3,5,7,1,1,1,3,5,7,9,11).
What I need is to transform these into a normal consecutive sequence, so for my example above I need as a result: (1,2,1,2,3,4,1,1,1,2,3,4,5,6)
In other words, the values in [Code Order] should map as follows:
1 = 1
3 = 2
5 = 3
7 = 4
9 = 5
11 = 6
13 = 7
15 = 8
...
365 = 183
For the moment my maximum is 365, which is position 183. I would like to avoid typing 182 IF formulas if possible. ;)
Thanks in advance for your help.
CYA
Plt.K
This might turn out to be more accurate in case your Code Order series is missing any values along the way.
[Screenshots not preserved: example series, alternate field, Tableau setup]
You want to use the index() calculated field. Create a new field called index. The calculation is just index().
Add [Code Order] to your row shelf and index to your label. You should see something like this.
The following calculation should do the trick
CEILING([Code Order] / 2)
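As a quick sanity check of that formula outside Tableau, here is a small Python sketch that applies the same "divide by two and round up" rule to the example series from the question. It only illustrates the arithmetic; it is not Tableau syntax.

```python
import math

# Example series from the question.
code_order = [1, 3, 1, 3, 5, 7, 1, 1, 1, 3, 5, 7, 9, 11]

# Same rule as CEILING([Code Order] / 2): halve, then round up.
positions = [math.ceil(value / 2) for value in code_order]
print(positions)  # [1, 2, 1, 2, 3, 4, 1, 1, 1, 2, 3, 4, 5, 6]

# The current maximum, 365, lands on position 183.
print(math.ceil(365 / 2))  # 183
```

For odd inputs this is the same as (value + 1) / 2, so no chain of IF statements is needed.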
So my data let's say looks like this.
ItemID  Event      No. of Occurences
15      view       500
15      addtocart  89
15      bought     6
16      view       200
16      addtocart  11
16      bought     2
17      view       450
17      addtocart  43
17      bought     5
The ItemID and Event columns are dimensions, and No. of Occurences is a measure.
So far, I've been able to:
Calculate the percentage of total for each ItemID. For ItemID = 15, we have view = 84.03% (500/(500+89+6) x 100) and addtocart = 14.96% (89/(500+89+6) x 100).
What I want to accomplish is:
I want to show, for each ItemID, the percentage of No. of Occurences which were bought relative to No. of Occurences which were view, and also
I want to show, for each ItemID, the percentage of No. of Occurences which were addtocart relative to No. of Occurences which were view.
So, for ItemID = 15: No. of Occurences(bought) / No. of Occurences(view) = 6/500 x 100 = 1.2%, and No. of Occurences(addtocart) / No. of Occurences(view) = 89/500 x 100 = 17.8%.
I'm a beginner. If the data should be shaped in some other way, please suggest how.
You can easily do this with calculated fields.
Create 3 calculated fields (Analysis > Create Calculated Field).
Event(Bought):
IF [Event] = 'bought' THEN [No. of Occurences] ELSE 0 END
Event(View):
IF [Event] = 'view' THEN [No. of Occurences] ELSE 0 END
Ratio(Bought/View):
SUM([Event(Bought)]) / SUM([Event(View)])
The addtocart/view ratio follows the same pattern, with 'addtocart' in place of 'bought'.
Add your ItemID to the Columns shelf. Here is what it looks like for me.
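To check the numbers the calculated fields should produce, here is a small Python sketch that computes both ratios per ItemID from the table in the question. It mirrors the conditional-sum-then-divide logic only; the variable names are made up for the sketch, and it is not Tableau code.

```python
# Rows from the question: (ItemID, Event, No. of Occurences).
rows = [
    (15, "view", 500), (15, "addtocart", 89), (15, "bought", 6),
    (16, "view", 200), (16, "addtocart", 11), (16, "bought", 2),
    (17, "view", 450), (17, "addtocart", 43), (17, "bought", 5),
]

# Sum occurrences per item and event (the conditional SUMs).
counts = {}
for item_id, event, n in rows:
    per_item = counts.setdefault(item_id, {})
    per_item[event] = per_item.get(event, 0) + n

# Divide by the view count to get each ratio as a percentage.
ratios = {
    item_id: {
        "bought/view": 100 * c["bought"] / c["view"],
        "addtocart/view": 100 * c["addtocart"] / c["view"],
    }
    for item_id, c in counts.items()
}

for item_id, r in ratios.items():
    print(item_id, round(r["bought/view"], 2), round(r["addtocart/view"], 2))
```

For ItemID 15 this gives 1.2% bought/view and 17.8% addtocart/view.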
I am trying to make a ranking of attributes depending on their predictive power by using OneR in WEKA iteratively. At every run I remove the chosen attribute to see what the next best is.
I have done this for all my attributes, and some (3 out of 10) get 'ranked' higher than others although they have a lower percentage of correct predictions, a smaller average ROC Area, and less compact rules.
As I understand, OneR just looks at the frequency tables for the attribute it has and then the class values, so it wouldn't care about whether I take attributes out or not...but I am probably missing something
Would anyone have an idea?
As an alternative, you can use the OneR package (available on CRAN; more information here: OneR - Establishing a New Baseline for Machine Learning Classification Models).
With the option verbose = TRUE you get the accuracy of all attributes, e.g.:
> library(OneR)
> example(OneR)
OneR> data <- optbin(iris)
OneR> model <- OneR(data, verbose = TRUE)
Attribute Accuracy
1 * Petal.Width 96%
2 Petal.Length 95.33%
3 Sepal.Length 74.67%
4 Sepal.Width 55.33%
---
Chosen attribute due to accuracy
and ties method (if applicable): '*'
OneR> summary(model)
Rules:
If Petal.Width = (0.0976,0.791] then Species = setosa
If Petal.Width = (0.791,1.63] then Species = versicolor
If Petal.Width = (1.63,2.5] then Species = virginica
Accuracy:
144 of 150 instances classified correctly (96%)
Contingency table:
Petal.Width
Species (0.0976,0.791] (0.791,1.63] (1.63,2.5] Sum
setosa * 50 0 0 50
versicolor 0 * 48 2 50
virginica 0 4 * 46 50
Sum 50 52 48 150
---
Maximum in each column: '*'
Pearson's Chi-squared test:
X-squared = 266.35, df = 4, p-value < 2.2e-16
(full disclosure: I am the author of this package and I would be very interested in the results you get)
The OneR classifier looks a bit like nearest-neighbor. Given that, the following applies: In the source code of the OneR classifier, it says:
// if this attribute is the best so far, replace the rule
if (noRule || r.m_correct > m_rule.m_correct) {
    m_rule = r;
}
Thus, it should be possible (either in 1-R generally or in this implementation) for an attribute to block another, yet be later removed in your process.
Say you have attributes 1,2, and 3 with the distribution 1: 50%, 2: 30%, 3: 20%. In all cases where attribute 1 is best, attribute 3 is second best.
Thus, when attribute 1 is left out, attribute 3 wins with 70%, even though before attribute 2 ranked as "better" than 3 in the comparison of all three.
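To make the comparison quoted from the source concrete, here is a minimal Python sketch of a 1R-style selection: each attribute gets one rule (every value predicts its majority class), the number of correctly classified training rows is counted, and a later attribute replaces the current best only if it is strictly better. This is not WEKA's code; the function names and the toy data are hypothetical, and details such as discretisation and missing values are ignored.

```python
from collections import Counter, defaultdict

def one_r_correct(column, labels):
    """Training rows classified correctly by a one-attribute rule."""
    by_value = defaultdict(Counter)
    for value, label in zip(column, labels):
        by_value[value][label] += 1
    # Each attribute value predicts its majority class.
    return sum(c.most_common(1)[0][1] for c in by_value.values())

def pick_attribute(data, labels):
    """Return (name, correct) of the winner; ties keep the earlier one."""
    best_name, best_correct = None, -1
    for name, column in data.items():
        correct = one_r_correct(column, labels)
        if correct > best_correct:  # strict '>', as in the quoted source
            best_name, best_correct = name, correct
    return best_name, best_correct

# Hypothetical toy data: three attributes, six rows.
labels = ["y", "y", "n", "n", "y", "n"]
data = {
    "a": ["s", "s", "r", "r", "s", "r"],
    "b": ["h", "h", "h", "l", "l", "l"],
    "c": ["t", "f", "t", "f", "f", "t"],
}

print(pick_attribute(data, labels))  # ('a', 6)

# Remove the winner and run again, as in the iterative ranking.
rest = {name: col for name, col in data.items() if name != "a"}
print(pick_attribute(rest, labels))  # ('b', 4)
```

In this sketch "b" and "c" both classify 4 rows correctly, so "b" wins the second round only because it comes first. The selection score is also a count on the training data, which need not match an accuracy or ROC figure obtained from a separate evaluation.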
First of all, excuse me if I make any mistakes; English is not a language I use very often.
I have a data frame with numbers. A small part of the data frame is this:
nominal  ordinal
2        2
2        1
2        1
2        2
So, I want to use the gower distance function on these numbers.
The documentation here ( http://rgm2.lab.nig.ac.jp/RGM2/R_man-2.9.0/library/StatMatch/man/gower.dist.html ) says that in order to use gower.dist, all nominal variables must be of class "factor" and all ordinal variables of class "ordered".
By default, all the columns are of class "integer" and mode "numeric". To change the class of the columns, I use these commands:
DF=read.table("clipboard",header=TRUE,sep="\t")
# I select all the cells and I copy them to the clipboard.
# Then R, with this command, reads the data from there.
MyHeader=names(DF) # I save the headers of the data frame to a temp matrix
for (i in 1:length(DF)) {
  if (MyHeader[[i]]=="nominal") DF[[i]]=as.factor(DF[[i]])
}
for (i in 1:length(DF)) {
  if (MyHeader[[i]]=="ordinal") DF[[i]]=as.ordered(DF[[i]])
}
The first for/if loop changes the class from integer to factor, which is what I want, but the second changes the class of ordinal variables to: "ordered" "factor".
I need to change all the columns with the header "ordinal" to "ordered", as the gower.dist function says.
Thanks in advance,
B.T.
What you are doing is fine, if perhaps a little inelegant.
With your ordered factor, you have something like:
> foo <- as.ordered(1:10)
> foo
[1] 1 2 3 4 5 6 7 8 9 10
Levels: 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10
> class(foo)
[1] "ordered" "factor"
Notice that it has two classes, indicating that it is an ordered factor and that it is a factor:
> is.ordered(as.ordered(1:10))
[1] TRUE
> is.factor(as.ordered(1:10))
[1] TRUE
In some senses, you might like to think of foo as an ordered factor that also inherits from the factor class. Put another way, if there isn't a specific method that handles ordered factors but there is a method for factors, R will use the factor method. As far as R is concerned, an ordered factor is an object with classes "ordered" and "factor". This is what your function for Gower's distance will require.
You could easily do this with:
DF$nominal <- as.factor(DF$nominal)
DF$ordinal <- as.ordered(DF$ordinal)
which gives you a data frame with the correct structure. If you work with data frames, please stay away from [[]] unless you know very well what you're doing. Take Dirk's advice, and check Owen's R Guide as well. You definitely need it.
If I do the conversion as shown above, gower.dist() works perfectly fine. On a side note, Gower's distance can easily be calculated using the daisy() function as well:
DF <- data.frame(
ordinal= c(1,2,3,1,2,1),
nominal= c(2,2,2,2,2,2)
)
DF$nominal <- as.factor(DF$nominal)
DF$ordinal <- as.ordered(DF$ordinal)
library(cluster)
daisy(DF,metric="gower")
library(StatMatch)
gower.dist(DF)
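For intuition about what these calls compute, here is a rough Python sketch of Gower's coefficient on the same toy data frame. It assumes simple matching (0 or 1) for the nominal variable and level positions scaled by their range for the ordinal variable. That treatment of ordinal variables is an assumption for illustration; daisy() and gower.dist() may differ in detail, so this is not meant to reproduce either package's exact output.

```python
def gower_distance(a, b, kinds, ranges):
    """Average per-variable dissimilarity between two records."""
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "nominal":
            # Simple matching: 0 if equal, 1 otherwise.
            total += 0.0 if x == y else 1.0
        else:
            # Ordinal (assumed): level positions scaled by their range.
            total += abs(x - y) / rng if rng else 0.0
    return total / len(kinds)

# Same values as the data frame above.
ordinal = [1, 2, 3, 1, 2, 1]
nominal = [2, 2, 2, 2, 2, 2]
rows = list(zip(ordinal, nominal))
kinds = ("ordinal", "nominal")
ranges = (max(ordinal) - min(ordinal), None)  # range unused for nominal

d12 = gower_distance(rows[0], rows[1], kinds, ranges)
d13 = gower_distance(rows[0], rows[2], kinds, ranges)
d14 = gower_distance(rows[0], rows[3], kinds, ranges)
print(d12, d13, d14)  # 0.25 0.5 0.0
```

Because the nominal column is constant here, it contributes nothing to any pair; all of the distance comes from the ordinal column.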