Understanding --readable_model and --invert_hash in Vowpal Wabbit for neural networks

I was trying to draw a diagram of the weights Vowpal Wabbit has learned, to understand the architecture better, and got very confused about what was happening. I couldn't work out where each of the weights output by Vowpal Wabbit fits in the structure.
My data:
$ cat dat1.vw
1 | a b
2 | a c
When training a neural network with 2 hidden nodes:
vw --nn 2 --invert_hash dat.nn2.ih --readable_model dat.nn2.rm dat1.vw
it produces dat.nn2.ih and dat.nn2.rm, which contain some header information (max, min, checksum, etc.) and the following weights:
From dat.nn2.ih (from --invert_hash):
:29015:-0.3161
Constant:202096:-0.270493
Constant[1]:202097:0.214776
[1]:29016:-0.302343
[2]:156909:-0.479347
a:108232:-0.270493
a[1]:108233:0.214776
b:129036:-0.0849519
b[1]:129037:0.0473027
c:219516:-0.196927
c[1]:219517:0.172029
And from dat.nn2.rm (--readable_model):
29015:-0.3161 # <blank> ?
29016:-0.302343 # [1] ?
108232:-0.270493 # a (from input "a" to hidden node 0)
108233:0.214776 # a[1] (from input "a" to hidden node 1)
129036:-0.0849519 # b (from input "b" to hidden node 0)
129037:0.0473027 # b[1] (from input "b" to hidden node 1)
156909:-0.479347 # [2] ?
156910:0.394566 # <nonexistent> not there in .ih file ?
156911:0.69414 # <nonexistent> not there in .ih file ?
202096:-0.270493 # Constant (bias for hidden node 0)
202097:0.214776 # Constant[1] (bias for hidden node 1)
219516:-0.196927 # c (from input "c" to hidden node 0)
219517:0.172029 # c[1] (from input "c" to hidden node 1)
So I can understand a, a[1], b, b[1], c, c[1], Constant and Constant[1], but I can't figure out what the rest of the hashes are for.
From my understanding, there should be 3 more weights/hashes:
- From hidden node 0 to output node
- From hidden node 1 to output node
- Bias for output node
But instead I see <blank>, [1], [2] and 2 hashes that are in the .rm file but not in the .ih file. What exactly do these weights represent?
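To make the expected weight count concrete, here is a minimal Python sketch of the architecture described above: every input feature and the constant feed both hidden units, and the two hidden units feed one output with its own bias. It writes out the question's mental model only; it is not Vowpal Wabbit's internal weight layout, and the tanh activation and the output-layer values are assumptions for illustration.

```python
import math


def count_weights(n_features, n_hidden):
    """(features + constant) -> hidden, hidden -> output, plus the output bias."""
    return (n_features + 1) * n_hidden + n_hidden + 1


def forward(features, w_in, w_out, b_out):
    """Forward pass of the assumed one-hidden-layer network.

    features: dict feature name -> value, e.g. {"a": 1.0, "b": 1.0}
    w_in:     dict (feature name, hidden index) -> weight; the bias of
              hidden unit h is stored under ("Constant", h)
    w_out:    hidden -> output weights, one per hidden unit
    b_out:    output bias
    """
    x = dict(features, Constant=1.0)  # the constant feature is always 1
    hidden = [
        math.tanh(sum(v * w_in.get((name, h), 0.0) for name, v in x.items()))
        for h in range(len(w_out))
    ]
    return b_out + sum(w * z for w, z in zip(w_out, hidden))


# Input-side weights from the readable model in the question, using the
# question's own annotation (name -> hidden node 0 or 1).
W_IN = {
    ("a", 0): -0.270493, ("a", 1): 0.214776,
    ("b", 0): -0.0849519, ("b", 1): 0.0473027,
    ("c", 0): -0.196927, ("c", 1): 0.172029,
    ("Constant", 0): -0.270493, ("Constant", 1): 0.214776,
}

print(count_weights(3, 2))  # 11
# Hypothetical output layer: w_out and b_out are placeholders, not model values.
print(forward({"a": 1.0, "b": 1.0}, W_IN, w_out=[1.0, 1.0], b_out=0.0))
```

`count_weights(3, 2)` gives the 11 weights the question expects (8 input-side weights plus 3), while the readable model above lists 13 entries, which is exactly the discrepancy being asked about.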

Statistical test to compare 1st/2nd differences based on output from ggpredict / ggeffect

I want to conduct a simple two-sample t-test in R to compare marginal effects generated by ggpredict (or ggeffect).
Both ggpredict and ggeffect provide nice output: (1) a table (predicted probability / standard error / CIs) and (2) a plot. However, neither provides p-values for assessing the statistical significance of the marginal effects (i.e., is the difference between the two predicted probabilities different from zero?). Further, since I'm working with interaction effects, I'm also interested in two-sample t-tests for the first differences (between two marginal effects) and the second differences.
Is there an easy way to run the relevant t-tests on ggpredict/ggeffect output? Other options?
Attached:
- reprex code with fictitious data
- To be specific, I want to test the following first differences (SIZE = 1 minus SIZE = 0, at each level of LOC):
  - .67 - .33 = .34 (different from zero?)
  - .50 - .50 = 0 (different from zero?)
- ...and the following second difference:
  - 0.00 - .34 = -.34 (different from zero?)
See also Figure 12 / Table 3 in Mize 2019 (interaction effects in nonlinear models)
Thanks Scott
library(mlogit)
#> Loading required package: dfidx
#>
#> Attaching package: 'dfidx'
#> The following object is masked from 'package:stats':
#>
#> filter
library(sjPlot)
library(ggeffects)
# create example data set: one row per alternative (2 respondents shown). Each respondent answers 3 choice sets, with 2 alternatives in each set.
cedata.1 <- data.frame( id = c(1,1,1,1,1,1,2,2,2,2,2,2), # respondent ID.
QES = c(1,1,2,2,3,3,1,1,2,2,3,3), # Choice set (with 2 alternatives)
Alt = c(1,2,1,2,1,2,1,2,1,2,1,2), # Alt 1 or Alt 2 in choice set
LOC = c(0,0,1,1,0,1,0,1,1,0,0,1), # attribute describing alternative. binary categorical variable
SIZE = c(1,1,1,0,0,1,0,0,1,1,0,1), # attribute describing alternative. binary categorical variable
Choice = c(0,1,1,0,1,0,0,1,0,1,0,1), # if alternative is Chosen (1) or not (0)
gender = c(1,1,1,1,1,1,0,0,0,0,0,0) # male or female (repeats for each individual)
)
# convert dep var Choice to factor as required by sjPlot
cedata.1$Choice <- as.factor(cedata.1$Choice)
cedata.1$LOC <- as.factor(cedata.1$LOC)
cedata.1$SIZE <- as.factor(cedata.1$SIZE)
# estimate model.
glm.model <- glm(Choice ~ LOC*SIZE, data=cedata.1, family = binomial(link = "logit"))
# estimate MEs for use in IE assessment
LOC.SIZE <- ggpredict(glm.model, terms = c("LOC", "SIZE"))
LOC.SIZE
#>
#> # Predicted probabilities of Choice
#> # x = LOC
#>
#> # SIZE = 0
#>
#> x | Predicted | SE | 95% CI
#> -----------------------------------
#> 0 | 0.33 | 1.22 | [0.04, 0.85]
#> 1 | 0.50 | 1.41 | [0.06, 0.94]
#>
#> # SIZE = 1
#>
#> x | Predicted | SE | 95% CI
#> -----------------------------------
#> 0 | 0.67 | 1.22 | [0.15, 0.96]
#> 1 | 0.50 | 1.00 | [0.12, 0.88]
#> Standard errors are on the link-scale (untransformed).
# plot
# plot(LOC.SIZE, connect.lines = TRUE)
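One common way to get p-values for such differences is the delta method: each predicted probability is a smooth function of the logit coefficients, so a difference such as p(LOC=0, SIZE=1) - p(LOC=0, SIZE=0) has standard error sqrt(g' V g), where g is its gradient with respect to the coefficients and V is their covariance matrix. Note that this is a large-sample z-test, not the two-sample t-test mentioned above. Below is a minimal Python sketch, assuming the coefficient order (Intercept), LOC1, SIZE1, LOC1:SIZE1 that glm reports for Choice ~ LOC*SIZE; `delta_method_test` and the weight dictionaries are hypothetical names.

```python
import math

import numpy as np

# Design rows for logit(p) = b0 + b1*LOC + b2*SIZE + b3*LOC*SIZE, keyed by
# (LOC, SIZE); column order assumed to match coef() for Choice ~ LOC*SIZE.
CELLS = {
    (0, 0): np.array([1.0, 0.0, 0.0, 0.0]),
    (1, 0): np.array([1.0, 1.0, 0.0, 0.0]),
    (0, 1): np.array([1.0, 0.0, 1.0, 0.0]),
    (1, 1): np.array([1.0, 1.0, 1.0, 1.0]),
}

FIRST_DIFF_LOC0 = {(0, 1): 1, (0, 0): -1}  # .67 - .33: effect of SIZE at LOC = 0
FIRST_DIFF_LOC1 = {(1, 1): 1, (1, 0): -1}  # .50 - .50: effect of SIZE at LOC = 1
SECOND_DIFF = {(1, 1): 1, (1, 0): -1, (0, 1): -1, (0, 0): 1}  # LOC1 diff - LOC0 diff


def delta_method_test(beta, vcov, weights):
    """Estimate, SE, z and two-sided p-value for sum(w * p_cell), via the delta method."""
    est = 0.0
    grad = np.zeros(len(beta))
    for cell, w in weights.items():
        x = CELLS[cell]
        p = 1.0 / (1.0 + math.exp(-float(x @ beta)))  # predicted probability
        est += w * p
        grad += w * p * (1.0 - p) * x                 # d p / d beta for a logit
    se = math.sqrt(float(grad @ vcov @ grad))
    z = est / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))      # 2 * (1 - Phi(|z|))
    return est, se, z, p_value
```

In practice, beta and vcov would be the values of coef(glm.model) and vcov(glm.model) exported from R, e.g. `delta_method_test(beta, vcov, SECOND_DIFF)`.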

Encoding Spotify URI to Spotify Codes

Spotify Codes are little barcodes that allow you to share songs, artists, users, playlists, etc.
They encode information in the different heights of the "bars". There are 8 discrete heights that the 23 bars can be, which means 8^23 different possible barcodes.
Spotify generates barcodes based on their URI schema. This URI, spotify:playlist:37i9dQZF1DXcBWIGoYBM5M, gets mapped to this barcode:
[Image: Spotify Code for spotify:playlist:37i9dQZF1DXcBWIGoYBM5M]
The URI has a lot more information (62^22) in it than the code. How would you map the URI to the barcode? It seems like you can't simply encode the URI directly. For more background, see my "answer" to this question: https://stackoverflow.com/a/62120952/10703868
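The size mismatch can be made concrete by comparing bit counts. A quick Python check, where the 20 free bars (3 of the 23 are fixed: a short bar at each end and a long one in the middle) are taken from the answer further down:

```python
import math

LEVELS = 8       # distinct bar heights
BARS = 23        # bars in a Spotify Code
FREE_BARS = 20   # 23 minus the 3 fixed bars (taken from the answer below)
ID_CHARS = 22    # base-62 characters in an ID like 37i9dQZF1DXcBWIGoYBM5M

print(math.log2(LEVELS ** BARS))       # 69.0 bits if every bar were free
print(math.log2(LEVELS ** FREE_BARS))  # 60.0 bits actually available
print(math.log2(62 ** ID_CHARS))       # just under 131 bits in the URI ID
```

Even the full 8^23 space is far smaller than the space of 22-character IDs, which is consistent with the lookup-table explanation in the answers below.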
The patent explains the general process; this is what I have found.
This is a more recent patent
When using the Spotify Code generator, the website makes a request to https://scannables.scdn.co/uri/plain/[format]/[background-color-in-hex]/[code-color-in-text]/[size]/[spotify-URI].
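A minimal sketch of building such a URL in Python. `scannable_url` is a hypothetical helper, and its defaults (png, 000000, white, 640) are illustrative assumptions rather than documented values; the URI is percent-encoded the same way as in the scannables link further down.

```python
from urllib.parse import quote


def scannable_url(uri, fmt="png", background="000000", color="white", size=640):
    """Fill in the scannables URL pattern; default argument values are assumptions."""
    return (
        "https://scannables.scdn.co/uri/plain/"
        f"{fmt}/{background}/{color}/{size}/{quote(uri, safe='')}"
    )


print(scannable_url("spotify:playlist:37i9dQZF1DXcBWIGoYBM5M"))
```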
Using Burp Suite, I found that when you scan a code in the Spotify app, the app sends a request to Spotify's API: https://spclient.wg.spotify.com/scannable-id/id/[CODE]?format=json, where [CODE] is the media reference you are looking for. This request can be made from Python, but only with the [TOKEN] generated by the app, as this is the only way to get the correct scope. The app token expires in about half an hour.
import requests

# [TOKEN] is a placeholder for the bearer token generated by the app
head = {
    "X-Client-Id": "58bd3c95768941ea9eb4350aaa033eb3",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "App-Platform": "iOS",
    "Accept": "*/*",
    "User-Agent": "Spotify/8.5.68 iOS/13.4 (iPhone9,3)",
    "Accept-Language": "en",
    "Authorization": "Bearer [TOKEN]",
    "Spotify-App-Version": "8.5.68"}
# Look up the media reference 26560102031 read from the scanned code
response = requests.get('https://spclient.wg.spotify.com:443/scannable-id/id/26560102031?format=json', headers=head)
print(response)
print(response.json())
Which returns:
<Response [200]>
{'target': 'spotify:playlist:37i9dQZF1DXcBWIGoYBM5M'}
So 26560102031 is the media reference for your playlist.
The patent states that the code is first detected and then possibly converted into 63 bits using a Gray table. For example, 361354354471425226605 is encoded into 010 101 001 010 111 110 010 111 110 110 100 001 110 011 111 011 011 101 101 000 111.
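The digit-to-bits mapping in these examples matches the standard binary-reflected Gray code, g(n) = n XOR (n >> 1), i.e. 0→000, 1→001, 2→011, 3→010, 4→110, 5→111, 6→101, 7→100. A short Python sketch that reproduces the conversion (the helper names are hypothetical):

```python
def gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def heights_to_gray_bits(heights):
    """Map each bar-height digit (0-7) to its 3-bit Gray code."""
    return " ".join(format(gray(int(d)), "03b") for d in heights)


print(heights_to_gray_bits("361354354471425226605"))
# 010 101 001 010 111 110 010 111 110 110 100 001 110 011 111 011 011 101 101 000 111
```

The same function also reproduces the two further examples of raw distances and Gray-table binary given below.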
However, the code sent to the API is 6875667268. I'm unsure how the media reference is generated, but this is the number used in the lookup table.
The reference contains the digits 0-9, whereas the Gray table only uses 0-7, which implies that an algorithm using normal binary has been used. The patent talks about using a convolutional code and then the Viterbi algorithm for error correction, so this may be the output of that. I believe that would be impossible to recreate without knowing the states. However, I'd be interested if you can interpret the patent any better.
This media reference is 10 digits long; others have 11 or 12.
Here are two more examples of the raw distances, the gray table binary and then the media reference:
1.
022673352171662032460
000 011 011 101 100 010 010 111 011 001 100 001 101 101 011 000 010 011 110 101 000
67775490487
2.
574146602473467556050
111 100 110 001 110 101 101 000 011 110 100 010 110 101 100 111 111 101 000 111 000
57639171874
edit:
Some extra info:
There are some posts online describing how you can encode any text, such as spotify:playlist:HelloWorld, into a code; however, this no longer works.
I also discovered through the proxy that you can use the domain to fetch the album art of a track above the code. This suggests a closer integration between Spotify's API and this scannables URL than previously thought, as it not only stores the URIs and their codes but can also validate URIs and return updated album art.
https://scannables.scdn.co/uri/800/spotify%3Atrack%3A0J8oh5MAMyUPRIgflnjwmB
Your suspicion was correct: they're using a lookup table. For all of the fun technical details, the relevant patent is available here: https://data.epo.org/publication-server/rest/v1.0/publication-dates/20190220/patents/EP3444755NWA1/document.pdf
Very interesting discussion. I've always been attracted to barcodes, so I had to take a look. I did some analysis of the barcodes alone (I didn't access the API for the media refs) and think I have the basic encoding process figured out. However, based on the two examples above, I'm not convinced I have the mapping from media ref to 37-bit vector correct (i.e., it works in case 2 but not in case 1). At any rate, if you have a few more pairs, that last part should be simple to work out. Let me know.
For those who want to figure this out, don't read the spoilers below!
It turns out that the basic process outlined in the patent is correct, but lacking in details. I'll summarize below using the example above. I actually analyzed this in reverse, which is why I think the code description is basically correct except for step (1); i.e., I generated 45 barcodes and all of them matched this code.
1. Map the media reference (as an integer) to a 37-bit vector.
Write the number in base 2 with the least significant bit
on the left, zero-padding on the right if necessary.
57639171874 -> 0100010011101111111100011101011010110
2. Calculate CRC-8-CCITT, i.e. generator x^8 + x^2 + x + 1
The following steps are needed to calculate the 8 CRC bits:
Pad with 3 zero bits on the right:
01000100 11101111 11110001 11010110 10110000
Reverse bytes:
00100010 11110111 10001111 01101011 00001101
Calculate CRC as normal (highest order degree on the left):
-> 11001100
Reverse CRC:
-> 00110011
Invert check:
-> 11001100
Finally, append the check bits to the step 1 result:
01000100 11101111 11110001 11010110 10110110 01100
3. Convolutionally encode the 45 bits using the common generator
polynomials (1011011, 1111001) in binary with puncture pattern
110110 (or 101, 110 on each stream). The result of step 2 is
encoded using tail-biting, meaning we begin the shift register
in the state of the last 6 bits of the 45-bit input vector.
Prepend stream with last 6 bits of data:
001100 01000100 11101111 11110001 11010110 10110110 01100
Encode using first generator:
(a) 100011100111110100110011110100000010001001011
Encode using 2nd generator:
(b) 110011100010110110110100101101011100110011011
Interleave bits (abab...):
110100001111110000101110111100110100111100011010111001110001000101011000010110000111001111
Puncture every third bit:
111000111100101111101110111001011100110000100100011100110011
4. Permute data by choosing indices 0, 7, 14, 21, 28, 35, 42, 49,
56, 3, 10..., i.e. incrementing 7 modulo 60. (Note: unpermute by
incrementing 43 mod 60).
The encoded sequence after permuting is
111100110001110101101000011110010110101100111111101000111000
5. The final step is to map back to bar lengths 0 to 7 using the
gray map (000,001,011,010,110,111,101,100). This gives the 20 bar
encoding. As noted before, add three bars: short one on each end
and a long one in the middle.
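The five steps can be checked end to end with a short, dependency-free Python sketch written from the description above (it is not the answer's own code, which follows). The CRC is a hand-rolled MSB-first CRC-8 with polynomial 0x07 and initial value 0, wrapped in the byte reversal, check reversal and inversion from step 2. It only encodes; there is no error correction, and the three fixed bars are not included in the 20 levels.

```python
GRAY = ["000", "001", "011", "010", "110", "111", "101", "100"]  # bar level -> bits


def media_ref_to_bits(ref, length=37):
    """Step 1: binary with the least significant bit first (zero-padded)."""
    return [(ref >> i) & 1 for i in range(length)]


def crc8(data, poly=0x07):
    """Plain MSB-first CRC-8, initial value 0, generator x^8 + x^2 + x + 1."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def spotify_check_bits(bits37):
    """Step 2: pad to 40 bits, reverse each byte, CRC, then reverse and invert."""
    padded = bits37 + [0, 0, 0]
    data = bytes(
        int("".join(map(str, padded[i:i + 8][::-1])), 2) for i in range(0, 40, 8)
    )
    check = [int(b) for b in format(crc8(data), "08b")]
    return [1 - b for b in reversed(check)]


def conv_encode(bits45):
    """Step 3: tail-biting rate-1/2 code with generators 1011011 and 1111001,
    interleaved and punctured by dropping every third bit."""
    gens = ([1, 0, 1, 1, 0, 1, 1], [1, 1, 1, 1, 0, 0, 1])
    padded = bits45[-6:] + bits45  # register starts in the state of the last 6 bits
    streams = [
        [sum(g[k] * padded[i + 6 - k] for k in range(7)) % 2 for i in range(len(bits45))]
        for g in gens
    ]
    interleaved = [bit for pair in zip(*streams) for bit in pair]
    return [b for i, b in enumerate(interleaved) if i % 3 != 2]


def spotify_levels(ref):
    bits37 = media_ref_to_bits(ref)
    encoded = conv_encode(bits37 + spotify_check_bits(bits37))
    permuted = [encoded[7 * i % 60] for i in range(60)]  # Step 4
    triples = ["".join(map(str, permuted[i:i + 3])) for i in range(0, 60, 3)]
    return [GRAY.index(t) for t in triples]  # Step 5


print(spotify_levels(57639171874))
# [5, 7, 4, 1, 4, 6, 6, 0, 2, 4, 3, 4, 6, 7, 5, 5, 6, 0, 5, 0]
```

For the example media reference, the intermediate results match the bit strings shown in steps 1 to 3.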
UPDATE: I've added a barcode (levels) decoder (assuming no errors) and an alternate encoder that follows the description above rather than the equivalent linear algebra method. Hopefully that is a bit more clear.
UPDATE 2: Got rid of most of the hard-coded arrays to illustrate how they are generated.
The linear algebra method defines the linear transformation (spotify_generator) and mask to map the 37-bit input into the 60-bit convolutionally encoded data. The mask is the result of the 8-bit inverted CRC being convolutionally encoded. The spotify_generator is a 37x60 matrix that implements the product of the generators for the CRC (a 37x45 matrix) and the convolutional code (a 45x60 matrix). You can create the generator matrix from an encoding function by applying the function to each row of an appropriately sized identity matrix. For example, a CRC function that adds 8 bits to each 37-bit data vector is applied to each row of a 37x37 identity matrix.
import numpy as np
import crccheck.crc

# Utils for conversion between int, array of binary
# and array of bytes (as ints)
def int_to_bin(num, length, endian):
    if endian == 'l':
        return [num >> i & 1 for i in range(0, length)]
    elif endian == 'b':
        return [num >> i & 1 for i in range(length-1, -1, -1)]

def bin_to_int(bin, length):
    return int("".join([str(bin[i]) for i in range(length-1, -1, -1)]), 2)

def bin_to_bytes(bin, length):
    b = bin[0:length] + [0] * (-length % 8)
    return [(b[i]<<7) + (b[i+1]<<6) + (b[i+2]<<5) + (b[i+3]<<4) +
            (b[i+4]<<3) + (b[i+5]<<2) + (b[i+6]<<1) + b[i+7] for i in range(0, len(b), 8)]

# Return the circular right shift of an array by 'n' positions
def shift_right(arr, n):
    return arr[-n % len(arr):len(arr):] + arr[0:-n % len(arr)]

gray_code = [0,1,3,2,7,6,4,5]
gray_code_inv = [[0,0,0],[0,0,1],[0,1,1],[0,1,0],
                 [1,1,0],[1,1,1],[1,0,1],[1,0,0]]

# CRC using Rocksoft model:
# NOTE: this is not quite any of their predefined CRC's
# 8: number of check bits (degree of poly)
# 0x7: representation of poly without high term (x^8+x^2+x+1)
# 0x0: initial fill of register
# True: byte reverse data
# True: byte reverse check
# 0xff: Mask check (i.e. invert)
spotify_crc = crccheck.crc.Crc(8, 0x7, 0x0, True, True, 0xff)

def calc_spotify_crc(bin37):
    bytes = bin_to_bytes(bin37, 37)
    return int_to_bin(spotify_crc.calc(bytes), 8, 'b')

def check_spotify_crc(bin45):
    data = bin_to_bytes(bin45, 37)
    return spotify_crc.calc(data) == bin_to_bytes(bin45[37:], 8)[0]

# Simple convolutional encoder
def encode_cc(dat):
    gen1 = [1,0,1,1,0,1,1]
    gen2 = [1,1,1,1,0,0,1]
    punct = [1,1,0]
    dat_pad = dat[-6:] + dat  # 6 bits are needed to initialize
                              # register for tail-biting
    stream1 = np.convolve(dat_pad, gen1, mode='valid') % 2
    stream2 = np.convolve(dat_pad, gen2, mode='valid') % 2
    enc = [val for pair in zip(stream1, stream2) for val in pair]
    return [enc[i] for i in range(len(enc)) if punct[i % 3]]

# To create a generator matrix for a code, we encode each row
# of the identity matrix. Note that the CRC is not quite linear
# because of the check mask so we apply the lambda function to
# invert it. Given a 37 bit media reference we can encode by
#     ref * spotify_generator + spotify_mask (mod 2)
_i37 = np.identity(37, dtype=bool)
crc_generator = [_i37[r].tolist() +
                 list(map(lambda x : 1-x, calc_spotify_crc(_i37[r].tolist())))
                 for r in range(37)]
spotify_generator = 1*np.array([encode_cc(crc_generator[r]) for r in range(37)], dtype=bool)
del _i37

spotify_mask = 1*np.array(encode_cc(37*[0] + 8*[1]), dtype=bool)

# The following matrix is used to "invert" the convolutional code.
# In particular, we choose a 45 vector basis for the columns of the
# generator matrix (by deleting those in positions equal to 2 mod 4)
# and then inverting the matrix. By selecting the corresponding 45
# elements of the convolutionally encoded vector and multiplying
# on the right by this matrix, we get back to the unencoded data,
# assuming there are no errors.
# Note: numpy does not invert binary matrices, i.e. GF(2), so we
# hard code the following 3 row vectors to generate the matrix.
conv_gen = [[0,1,0,1,1,1,1,0,1,1,0,0,0,1] + 31*[0],
            [1,0,1,0,1,0,1,0,0,0,1,1,1] + 32*[0],
            [0,0,1,0,1,1,1,1,1,1,0,0,1] + 32*[0]]

conv_generator_inv = 1*np.array([shift_right(conv_gen[(s-27) % 3], s) for s in range(27, 72)], dtype=bool)

# Given an integer media reference, returns list of 20 barcode levels
def spotify_bar_code(ref):
    bin37 = np.array([int_to_bin(ref, 37, 'l')], dtype=bool)
    enc = (np.add(1*np.dot(bin37, spotify_generator), spotify_mask) % 2).flatten()
    perm = [enc[7*i % 60] for i in range(60)]
    return [gray_code[4*perm[i]+2*perm[i+1]+perm[i+2]] for i in range(0, len(perm), 3)]

# Equivalent function but using CRC and CC encoders.
def spotify_bar_code2(ref):
    bin37 = int_to_bin(ref, 37, 'l')
    enc_crc = bin37 + calc_spotify_crc(bin37)
    enc_cc = encode_cc(enc_crc)
    perm = [enc_cc[7*i % 60] for i in range(60)]
    return [gray_code[4*perm[i]+2*perm[i+1]+perm[i+2]] for i in range(0, len(perm), 3)]

# Given 20 (clean) barcode levels, returns media reference
def spotify_bar_decode(levels):
    level_bits = np.array([gray_code_inv[levels[i]] for i in range(20)], dtype=bool).flatten()
    conv_bits = [level_bits[43*i % 60] for i in range(60)]
    cols = [i for i in range(60) if i % 4 != 2]  # columns to invert
    conv_bits45 = np.array([conv_bits[c] for c in cols], dtype=bool)
    bin45 = (1*np.dot(conv_bits45, conv_generator_inv) % 2).tolist()
    if check_spotify_crc(bin45):
        return bin_to_int(bin45, 37)
    else:
        print('Error in levels; Use real decoder!!!')
        return -1
An example:
>>> levels = [5,7,4,1,4,6,6,0,2,4,3,4,6,7,5,5,6,0,5,0]
>>> spotify_bar_decode(levels)
57639171874
>>> spotify_bar_code(57639171874)
[5, 7, 4, 1, 4, 6, 6, 0, 2, 4, 3, 4, 6, 7, 5, 5, 6, 0, 5, 0]

Q-Learning neural network implementation

I was trying to implement Q-learning with neural networks. I've got Q-learning with a Q-table working perfectly fine.
I am playing a little "catch the cheese" game.
It looks something like this:
# # # # # # # #
# . . . . . . #
# . $ . . . . #
# . . . P . . #
# . . . . . . #
# . . . . . . #
# . . . . . . #
# # # # # # # #
The player P spawns somewhere on the map. If he hits a wall, the reward is negative; let's call that negative reward -R for now.
If the player P hits the dollar sign, the reward is positive; let's call this positive reward +R.
In both cases, the game will reset and the player will spawn somewhere randomly on the map.
My neural network architecture looks like this:
-> Input size: [1, 8, 8]
Flattening: [1, 1, 64] (so I can use Dense layers)
Dense layer: [1, 1, 4]
-> Output size: [1, 1, 4]
For the learning, I am storing some game samples in a buffer. The buffer maximum size is b_max.
So my training looks like this:
1. Pick a random number between 0 and 1.
2. If the number is greater than the threshold, choose a random action.
3. Otherwise, pick the action with the highest predicted reward.
4. Take that action and observe the reward.
5. Update the neural network using a batch of game samples from the buffer:
5.1 Iterate through the batch and train the network as follows:
5.2 For each sample in the batch, the input to the network is the game state (0 everywhere except at the player's position).
5.3 The output error is 0 everywhere except at the output neuron corresponding to the action that was taken in that sample.
5.4 For that neuron, the expected output is:
(the reward) + (discount_factor * future_reward), where future_reward = max(neuralNetwork(nextState))
5.5 Repeat everything from the beginning.
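Steps 5.3 and 5.4 can be sketched in a vectorized way with numpy. This is a minimal sketch under stated assumptions, not a full training loop: the 4 outputs are taken to be the Q-values of the 4 moves, `q_learning_targets` is a hypothetical helper, and it applies the standard Q-learning convention that a transition which ends the game (hitting a wall or the $) has no future reward, since the game resets at that point.

```python
import numpy as np


def q_learning_targets(q_current, q_next, actions, rewards, terminal, discount=0.9):
    """Training targets for one batch of stored samples (steps 5.3-5.4).

    q_current: (B, 4) network outputs for each sample's state
    q_next:    (B, 4) network outputs for each sample's next state
    actions:   (B,) index of the action taken in each sample
    rewards:   (B,) observed reward for each sample
    terminal:  (B,) True where the action ended the game (wall or $)
    """
    q_current = np.asarray(q_current, dtype=float)
    q_next = np.asarray(q_next, dtype=float)
    actions = np.asarray(actions, dtype=int)
    rewards = np.asarray(rewards, dtype=float)
    terminal = np.asarray(terminal, dtype=bool)

    # Copying the current outputs makes the error 0 for every action
    # that was not taken (step 5.3).
    targets = q_current.copy()
    # No future reward once the game resets (assumed convention).
    future = np.where(terminal, 0.0, q_next.max(axis=1))
    targets[np.arange(len(actions)), actions] = rewards + discount * future
    return targets
```

The network is then trained on the batch states against these targets (for example with a mean squared error loss); because the targets equal the current outputs everywhere except at the taken action, the loss gradient is nonzero only at that output.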
The problem is that it just doesn't seem to work properly.
I have an idea of how I could change this so it works, but I am not sure if it is "allowed":
Each game decision could be trained until it does exactly what it is supposed to do.
Then I would move on to the next decision, train on that, and so on. How is the training usually done?
I would be very happy if someone could help and give me a detailed explanation of how the training works, especially when it comes to "how many times do I run which loop?".
Greetings,
Finn
This is a map that shows which action the neural network chooses on each field:
# # # # # # # # # #
# 1 3 2 0 2 3 3 3 #
# 1 1 1 1 0 2 2 3 #
# 0 0 $ 1 3 0 1 1 #
# 1 0 1 2 1 0 3 3 #
# 0 1 2 3 1 0 3 0 # // The map is a bit bigger, but you can still see that it is wrong
# 2 0 1 3 1 0 3 0 # //0: right, 1 bottom, 2 left, 3 top
# 1 0 1 0 2 3 2 1 #
# 0 3 1 3 1 3 1 0 #
# # # # # # # # # #

Compare contrasts in linear model in Python (like R's contrast library?)

In R I can do the following to compare two contrasts from a linear model:
url <- "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/spider_wolff_gorb_2013.csv"
filename <- "spider_wolff_gorb_2013.csv"
install.packages("downloader", repos="http://cran.us.r-project.org")
library(downloader)
if (!file.exists(filename)) download(url, filename)
spider <- read.csv(filename, skip=1)
head(spider, 5)
# leg type friction
# 1 L1 pull 0.90
# 2 L1 pull 0.91
# 3 L1 pull 0.86
# 4 L1 pull 0.85
# 5 L1 pull 0.80
fit = lm(friction ~ type + leg, data=spider)
fit
# Call:
# lm(formula = friction ~ type + leg, data = spider)
#
# Coefficients:
# (Intercept) typepush legL2 legL3 legL4
# 1.0539 -0.7790 0.1719 0.1605 0.2813
install.packages("contrast", repos="http://cran.us.r-project.org")
library(contrast)
l4vsl2 = contrast(fit, list(leg="L4", type="pull"), list(leg="L2",type="pull"))
l4vsl2
# lm model parameter contrast
#
# Contrast S.E. Lower Upper t df Pr(>|t|)
# 0.1094167 0.04462392 0.02157158 0.1972618 2.45 277 0.0148
I have found out how to do much of the above in Python:
import pandas as pd
df = pd.read_table("https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/spider_wolff_gorb_2013.csv", sep=",", skiprows=1)
df.head(2)
import statsmodels.formula.api as sm
model1 = sm.ols(formula='friction ~ type + leg', data=df)
fitted1 = model1.fit()
print(fitted1.summary())
Now all that remains is finding the t-statistic for the contrast of leg pair L4 vs. leg pair L2. Is this possible in Python?
statsmodels is still missing some predefined contrasts, but the t_test method, and the wald_test or f_test methods, of the model Results classes can be used to test linear (or affine) restrictions. The restrictions can be given either as arrays or as strings using the parameter names.
Details on how to specify contrasts/restrictions should be in the documentation.
For example:
>>> tt = fitted1.t_test("leg[T.L4] - leg[T.L2]")
>>> print(tt.summary())
Test for Constraints
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
c0 0.1094 0.045 2.452 0.015 0.022 0.197
==============================================================================
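For a single restriction like this, t_test reports the usual OLS contrast: estimate c'β̂, standard error sqrt(c'Vc) with V = σ̂²(X'X)⁻¹ under the default nonrobust covariance, and t = estimate / standard error. A minimal numpy sketch of that computation on made-up toy data (three groups with A as the reference level; not the spider data, and not statsmodels' own implementation):

```python
import numpy as np


def contrast_t(X, y, c):
    """OLS estimate, standard error and t statistic for the contrast c @ beta."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof              # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # nonrobust covariance of beta
    est = c @ beta
    se = np.sqrt(c @ cov @ c)
    return est, se, est / se


# Toy data: three groups, A is the reference level (intercept + two dummies)
groups = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
y = np.array([1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 4.0, 5.0, 6.0])
X = np.column_stack([
    np.ones(len(groups)),
    [g == "B" for g in groups],
    [g == "C" for g in groups],
]).astype(float)
c = np.array([0.0, -1.0, 1.0])  # "C - B", like "leg[T.L4] - leg[T.L2]"

print(contrast_t(X, y, c))
```

The vector c plays the role of the string "leg[T.L4] - leg[T.L2]"; t_test also accepts such a vector (one row per restriction, in the order of fitted1.params) instead of a string.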
The results are available as attributes or methods of the instance returned by t_test. For example, the confidence interval can be obtained with conf_int():
>>> tt.conf_int()
array([[ 0.02157158, 0.19726175]])
t_test is vectorized and treats each restriction or contrast as a separate hypothesis. wald_test treats a list of restrictions as a joint hypothesis:
>>> tt = fitted1.t_test(["leg[T.L3] - leg[T.L2], leg[T.L4] - leg[T.L2]"])
>>> print(tt.summary())
Test for Constraints
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
c0 -0.0114 0.043 -0.265 0.792 -0.096 0.074
c1 0.1094 0.045 2.452 0.015 0.022 0.197
==============================================================================
>>> tt = fitted1.wald_test(["leg[T.L3] - leg[T.L2], leg[T.L4] - leg[T.L2]"])
>>> print(tt.summary())
<F test: F=array([[ 8.10128575]]), p=0.00038081249480917173, df_denom=277, df_num=2>
Aside: this also works with robust covariance matrices if cov_type was specified as an argument to fit.

Model does not support probabiliy estimates in libsvm

I used libsvm in Matlab with the option '-b 1' in both the training and the prediction steps, but it always returns "Model does not support probabiliy estimates", so I don't get any probability or accuracy estimates. I tried a binary-class SVM (not nu-SVM!); it should have worked with '-b 1', but it doesn't. Does anyone know the reason for this problem?
Thanks
Let me show you the usage of svm-predict:
Usage: svm-predict [options] test_file model_file output_file
options:
-b probability_estimates: whether to predict probability estimates, 0 or 1 (default 0); for one-class SVM only 0 is supported
-q : quiet mode (no outputs)
svm-train:
Usage: svm-train [options] training_set_file [model_file]
options:
-s svm_type : set type of SVM (default 0)
  0 -- C-SVC (multi-class classification)
  1 -- nu-SVC (multi-class classification)
  2 -- one-class SVM
  3 -- epsilon-SVR (regression)
  4 -- nu-SVR (regression)
-t kernel_type : set type of kernel function (default 2)
  0 -- linear: u'*v
  1 -- polynomial: (gamma*u'*v + coef0)^degree
  2 -- radial basis function: exp(-gamma*|u-v|^2)
  3 -- sigmoid: tanh(gamma*u'*v + coef0)
  4 -- precomputed kernel (kernel values in training_set_file)
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/num_features)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
-b probability_estimates : whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
-wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)
-v n: n-fold cross validation mode
-q : quiet mode (no outputs)
Counting from the end, the fourth line is the -b option. If you train the model with the '-b 1' option, you get a model that can output probabilities at prediction time. If instead you only use '-b 1' when predicting, and the model was not trained with '-b 1', you will get the error: Model does not support probabiliy estimates
The main point: if you want probability estimates, you must use '-b 1' in both training and testing.
Your question needs more information for a complete answer, but in general, the error comes from this part of the (Java) svm-predict source code:
    try
    {
        BufferedReader input = new BufferedReader(new FileReader(argv[i]));
        DataOutputStream output = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(argv[i+2])));
        svm_model model = svm.svm_load_model(argv[i+1]);
        if(predict_probability == 1)
        {
            if(svm.svm_check_probability_model(model)==0)
            {
                System.err.print("Model does not support probabiliy estimates\n");
                System.exit(1);
            }
        }
        else
        {
            if(svm.svm_check_probability_model(model)!=0)
            {
                System.out.print("Model supports probability estimates, but disabled in prediction.\n");
            }
        }
        predict(input,output,model,predict_probability);
        input.close();
        output.close();
    }
    catch(FileNotFoundException e)
    {
        exit_with_help();
    }
    catch(ArrayIndexOutOfBoundsException e)
    {
        exit_with_help();
    }
}
It means the loaded model does not contain probability information, i.e., it was not trained with '-b 1'.