Vowpal Wabbit: ignore linear terms, keep only interaction terms - feature-selection

I have a Vowpal Wabbit file with two namespaces, for example:
1.0 |A snow |B ski:10
0.0 |A snow |B walk:10
1.0 |A clear |B walk:10
0.0 |A clear |B walk:5
1.0 |A clear |B walk:100
1.0 |A clear |B walk:15
Using -q AB, I can get the interaction terms. Is there any way for me to keep only the interaction terms and ignore the linear terms?
In other words, the result of vw sample.vw -q AB --invert_hash sample.model right now is this:
....
A^clear:24861:0.153737
A^clear^B^walk:140680:0.015292
A^snow:117127:0.126087
A^snow^B^ski:21312:0.015803
A^snow^B^walk:28234:-0.010592
B^ski:107733:0.015803
B^walk:114655:0.007655
Constant:116060:0.234153
I would like it to be something like this:
....
A^clear^B^walk:140680:0.015292
A^snow^B^ski:21312:0.015803
A^snow^B^walk:28234:-0.010592
Constant:116060:0.234153
The --keep and --ignore options do not produce the desired effect because they appear to be applied before the quadratic terms are generated. Is it possible to do this with vw, or do I need a custom preprocessing step that creates all of the combinations?

John Langford (the main author of VW) wrote:
There is not a good way to do this at present. The easiest approach
would be to make --ignore apply to the foreach_feature<> template in the
source code.
You can use a trick with transforming each original example into four new examples:
1 |first:1 foo bar gah |second:1 loo too rah
-1 |first:1 foo bar gah |second:-1 loo too rah
1 |first:-1 foo bar gah |second:-1 loo too rah
-1 |first:-1 foo bar gah |second:1 loo too rah
This makes the quadratic features all be perfectly correlated with the
label, but the linear features have zero correlation with the label.
Hence a mild l1 regularization should kill off the linear features.
I'm skeptical that this will improve performance enough to care (hence
the design), but if you do find that it's useful, please tell us about it.
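For concreteness, here is a minimal Python sketch (my own, not part of VW) of that example-transformation trick. It assumes signed labels (e.g. +1/-1, as in the example above) and exactly two namespaces per line, and uses VW's namespace-weight syntax (|A:1, |A:-1) to flip the sign of all features in a namespace.

import sys

def expand(line):
    # Assumes lines of the form "label |A feats... |B feats...".
    label_part, ns_a, ns_b = [p.strip() for p in line.strip().split("|")]
    y = float(label_part.split()[0])
    name_a, feats_a = ns_a.split(None, 1)
    name_b, feats_b = ns_b.split(None, 1)
    copies = []
    # (label sign, weight on namespace A, weight on namespace B)
    for s, wa, wb in [(+1, 1, 1), (-1, 1, -1), (+1, -1, -1), (-1, -1, 1)]:
        copies.append("%g |%s:%d %s |%s:%d %s"
                      % (s * y, name_a, wa, feats_a, name_b, wb, feats_b))
    return copies

if __name__ == "__main__":
    for line in sys.stdin:
        if line.strip():
            print("\n".join(expand(line)))

Feeding the expanded file to vw with -q AB and a mild --l1 penalty should then drive the linear weights toward zero while leaving the quadratic weights intact, as described above.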
See the original posts:
https://groups.yahoo.com/neo/groups/vowpal_wabbit/conversations/topics/2964
https://groups.yahoo.com/neo/groups/vowpal_wabbit/conversations/topics/4346

Related

naive Bayes probability formula

In this photo, the probability computed with the Naive Bayes algorithm for Fem:dv/m/s, Young, own, Ex-credpaid being Good is 62%. I calculated the probability as P(Fem:dv/m/s | Good) * P(Young | Good) * P(own | Good) * P(Ex-credpaid | Good) * P(Good) = 1/6 * 2/6 * 5/6 * 3/6 * 0.6 = 0.01389, and I don't know where I went wrong. Could someone please tell me where my error is?
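One step that is easy to miss, and a plausible source of the gap between 0.01389 and the reported 62%, is normalization: the product above is only the unnormalized score P(x | Good) * P(Good), while the value most tools report is the posterior obtained by dividing by the sum of the scores of all classes. Below is a minimal sketch with the factors from the question; the Bad-class score is a placeholder, since the real factors have to be read from the table in the photo.

# Unnormalized Naive Bayes score for the "Good" class, using the factors
# from the question.
score_good = (1/6) * (2/6) * (5/6) * (3/6) * 0.6   # ~0.01389

# HYPOTHETICAL placeholder: the corresponding product for the "Bad" class
# must be computed from the original table.
score_bad = 0.01

# The reported probability is the normalized posterior, not the raw product.
posterior_good = score_good / (score_good + score_bad)
print(round(score_good, 5), round(posterior_good, 3))

With the true Bad-class factors, the normalized value should match the figure quoted in the photo.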

How does perl resolve possible hash collisions in hashes?

As we know, Perl implements its 'hash' type as a table with calculated indexes, where these indexes are truncated hashes.
As we also know, a hashing function can (and will, by probability) collide, giving the same hash to 2 or more different inputs.
Then: how does the Perl interpreter handle it when it finds that a key generated the same hash as another key? Does it handle it at all?
Note: This is not about the algorithm for hashing but about collision resolution in a hash table implementation.
A Perl hash is an array of linked lists.
+--------+      +--------+
|   ----------->|        |
+--------+      +--------+
|        |      |  key1  |
+--------+      +--------+
|   ------+     |  val1  |
+--------+|     +--------+
|        ||
+--------+|     +--------+      +--------+
          +---->|   ----------->|        |
                +--------+      +--------+
                |  key2  |      |  key3  |
                +--------+      +--------+
                |  val2  |      |  val3  |
                +--------+      +--------+
The hashing function produces a value which is used as the array index, then a linear search of the associated linked list is performed.
This means the worst-case lookup is O(N). So why do people say it's O(1)? You could claim O(1) if you kept the lists from exceeding some constant length, and that's what Perl does. It uses two mechanisms to achieve this:
Increasing the number of buckets.
Hashing algorithm perturbing.
Doubling the number of buckets should divide the number of entries in a given bucket by half, on average. For example,
305419896 % 4 = 0 and 943086900 % 4 = 0
305419896 % 8 = 0 and 943086900 % 8 = 4
However, a malicious actor could choose values where this doesn't happen. This is where the hash perturbation comes into play. Each hash has its own random number that perturbs (causes variances in) the output of the hashing algorithm. Since the attacker can't predict the random number, they can't choose values that will cause collisions. When needed, Perl can rebuild the hash using a new random number, causing keys to map to different buckets than before, and thus breaking down long chains.
Sets of key-value pairs where the keys produce the same hash value are stored together in a linked list. The gory details are available in hv.c.
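The same collision-handling scheme - an array of buckets, each holding a chain of entries, with the bucket count doubled as the hash grows - can be sketched in a few lines of Python. This is an illustration of the idea, not Perl's actual hv.c code:

class ChainedHash:
    def __init__(self, nbuckets=4):
        self.buckets = [[] for _ in range(nbuckets)]
        self.size = 0

    def _bucket(self, key):
        # hash() stands in for Perl's hashing function; Python's string hash
        # is itself randomized per process, much like the perturbation above.
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        chain = self._bucket(key)
        for entry in chain:                      # linear search of the chain
            if entry[0] == key:
                entry[1] = value
                return
        chain.append([key, value])
        self.size += 1
        if self.size > 2 * len(self.buckets):    # chains getting long:
            self._grow()                         # double the bucket count

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _grow(self):
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for chain in old:                        # re-distribute every entry
            for k, v in chain:
                self._bucket(k).append([k, v])

h = ChainedHash()
for word in ["one", "two", "three", "four", "five"]:
    h.set(word, len(word))
print(h.get("three"))   # prints 5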

How to find all lines in a set that coincide in a single point?

Suppose I am given a set of lines, how can I partition this set into a number of clusters, such that all the lines in each cluster coincide in a single point?
If the number of lines N is reasonable, then you can use an O(N^3) algorithm:
For every pair of lines (in the form A*x + B*y + C = 0), check whether they intersect; exclude pairs of parallel, non-coincident lines, for which the determinant is zero:
| A1  B1 |
| A2  B2 | = 0
For every intersecting pair, check whether another line passes through the same intersection point, using the determinant:
| A1  B1  C1 |
| A2  B2  C2 | = 0
| A3  B3  C3 |
If N is too large for the cubic algorithm, then calculate all intersection points (up to O(N^2) of them) and add these points to a map structure (a hash table, for example). Check for matching points. Don't forget about numerical error issues.
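A minimal Python sketch of the cubic-time version described above (the function names and the tolerance are mine; groups seeded by different pairs are not merged or deduplicated here):

from itertools import combinations

EPS = 1e-9   # tolerance for the numerical-error issue mentioned above

def det2(l1, l2):
    (a1, b1, _), (a2, b2, _) = l1, l2
    return a1 * b2 - a2 * b1

def det3(l1, l2, l3):
    (a1, b1, c1), (a2, b2, c2), (a3, b3, c3) = l1, l2, l3
    return (a1 * (b2 * c3 - b3 * c2)
            - b1 * (a2 * c3 - a3 * c2)
            + c1 * (a2 * b3 - a3 * b2))

def concurrent_groups(lines):
    # lines are (A, B, C) coefficients of A*x + B*y + C = 0
    groups = []
    for i, j in combinations(range(len(lines)), 2):
        if abs(det2(lines[i], lines[j])) < EPS:   # parallel: no intersection
            continue
        group = {i, j}
        for k in range(len(lines)):
            if k not in (i, j) and abs(det3(lines[i], lines[j], lines[k])) < EPS:
                group.add(k)
        groups.append(group)
    return groups

# x = 0, y = 0 and x + y = 0 all pass through the origin; x + y - 1 = 0 does not.
print(concurrent_groups([(1, 0, 0), (0, 1, 0), (1, 1, 0), (1, 1, -1)]))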

Merge and Dissolve Features in Openlayers?

Hello, I have a map with complex features. Each feature has 4 attributes:
Province | Regency | Sub-District | Village
I am using OpenLayers to display my map.
I need to be able to style this map with colors based on attributes and to filter the features based on their common attributes.
Which is the best way to do this: using merge or dissolve?
Or can I do this with OpenLayers?
For example, I have options to select the attribute scope whose colors are to be displayed.
For example, when I choose scope Village:
Province | Regency | Sub-District | Village
A        | 101     | X1           | Z1
A        | 101     | X2           | Z2
B        | 102     | X3           | Z3
B        | 102     | X4           | Z4
C        | 103     | X5           | Z5
But when I choose scope Regency, the result will be:
Province | Regency
A        | 101
B        | 102
C        | 103
And if I use merge, do the original features disappear after merging?
OpenLayers has some excellent in-built classes that can help you out quite a bit. I think the classes you are looking for are OpenLayers.Strategy.Filter and OpenLayers.StyleMap.
The Filter Strategy allows you to attach a Filter object to a layer so that features which do not match the filter are hidden.
The StyleMap allows you to apply Style objects to features based on attributes or computed attributes (function output).
For both of these, there are great examples online showing these classes in action.

Training LIBSVM with multivariate data in MATLAB

My general question is: how does LIBSVM perform multivariate regression?
In detail, I have some data for a certain number of links (for example, 3 links). Each link has 3 input variables which, when used in a model, give an output Y. I have data collected on these links at some interval.
LinkId | var1 | var2 | var3 | var4(OUTPUT)
1 | 10 | 12.1 | 2.2 | 3
2 | 11 | 11.2 | 2.3 | 3.1
3 | 12 | 12.4 | 4.1 | 1
1 | 13 | 11.8 | 2.2 | 4
2 | 14 | 12.7 | 2.3 | 2
3 | 15 | 10.7 | 4.1 | 6
1 | 16 | 8.6 | 2.2 | 6.6
2 | 17 | 14.2 | 2.3 | 4
3 | 18 | 9.8 | 4.1 | 5
I need to perform a prediction to find the output for (2, 19, 10.2, 2.3).
How can I do that using the above data for training in MATLAB with LIBSVM? Can I pass the whole data set to svmtrain to create a single model, or do I need to train each link separately and use the corresponding model for prediction? Does it make any difference?
NOTE: Notice that each link with the same ID has the same value.
This is not really a MATLAB or LIBSVM question but rather a generic SVM-related one.
My general question is: how does LIBSVM perform multivariate regression?
LIBSVM is just a library which, in particular, implements the Support Vector Regression (SVR) model for regression tasks. In short, in the linear case SVR tries to find a hyperplane such that your data points lie within some margin around it (which is in a sense dual to the classical SVM, which tries to separate the data with as big a margin as possible).
In the non-linear case the kernel trick is used (in the same fashion as in SVM), so it is still looking for a hyperplane, but in the feature space induced by the particular kernel, which results in a non-linear regression in the input space.
Quite a nice introduction to SVR can be found here:
http://alex.smola.org/papers/2003/SmoSch03b.pdf
How can I do that using the above data for training in MATLAB with LIBSVM? Can I pass the whole data set to svmtrain to create a single model, or do I need to train each link separately and use the corresponding model for prediction? Does it make any difference? NOTE: Notice that each link with the same ID has the same value.
You could train an SVR (as this is a regression problem) on the whole data set, but:
it seems that var3 and LinkId encode the same variable (1 -> 2.2, 2 -> 2.3, 3 -> 4.1); if this is the case you should remove the LinkId column,
are the values of var1 unique ascending integers? If so, this is probably also a useless feature (it does not seem to carry any information; it looks like a row ID),
you should preprocess your data before applying SVM so that, e.g., each column contains values from the [0,1] interval; otherwise some features may become more important than others just because of their scale.
Now, if you would like to create a separate model for each link and follow the above clues, you end up with 1 input variable (var2) and 1 output variable (var4), so I would not recommend such a step. In general it seems that you have a very limited feature set; it would be valuable to gather more informative features.
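As a concrete illustration of the single-model approach and the [0,1] scaling advice (using scikit-learn's SVR, which wraps LIBSVM, as a stand-in for the MATLAB svmtrain/svmpredict interface; the rows are the ones from the question):

import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# Columns: LinkId, var1, var2, var3. Per the advice above you would likely
# drop LinkId and var1; they are kept here only to mirror the question.
X = np.array([
    [1, 10, 12.1, 2.2], [2, 11, 11.2, 2.3], [3, 12, 12.4, 4.1],
    [1, 13, 11.8, 2.2], [2, 14, 12.7, 2.3], [3, 15, 10.7, 4.1],
    [1, 16,  8.6, 2.2], [2, 17, 14.2, 2.3], [3, 18,  9.8, 4.1],
])
y = np.array([3, 3.1, 1, 4, 2, 6, 6.6, 4, 5])

scaler = MinMaxScaler()                          # rescale every column to [0, 1]
X_scaled = scaler.fit_transform(X)

model = SVR(kernel="rbf", C=1.0, epsilon=0.1)    # epsilon-SVR with an RBF kernel
model.fit(X_scaled, y)

query = scaler.transform([[2, 19, 10.2, 2.3]])   # the point from the question
print(model.predict(query))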