Naive Bayes probability formula

The photo shows the probability, per the Naive Bayes algorithm, of (Fem:dv/m/s, Young, own, Ex-credpaid | Good) -> 62%, and I calculated the probability as: P(Fem:dv/m/s | Good) * P(Young | Good) * P(own | Good) * P(Ex-credpaid | Good) * P(Good) -> 1/6 * 2/6 * 5/6 * 3/6 * 0.6 = 0.01389. I don't know where I failed; could someone please tell me where my error is?
[Table from the question omitted]
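For what it's worth, the product computed above is only the unnormalized joint score for the Good class, not yet a posterior percentage; to obtain something like 62% it has to be normalized against the corresponding score for the Bad class. A minimal Matlab sketch of that step (the Bad-class counts below are made-up placeholders, not taken from the table):

% Unnormalized class scores: product of conditional probabilities times the prior.
scoreGood = (1/6) * (2/6) * (5/6) * (3/6) * 0.6;   % = 0.01389, as computed above
scoreBad  = (2/4) * (1/4) * (2/4) * (1/4) * 0.4;   % hypothetical Bad-class counts
% Posterior probability of Good given the attributes: normalize the scores.
pGood = scoreGood / (scoreGood + scoreBad);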


How to interpret relative stopping tolerance for optimization?

I'm spinning up on both integer linear programming and Matlab's Problem-Based Workflow. The Mixed Integer Linear Program (MILP) example ends with references to stopping tolerances:
StepTolerance is generally used as a relative bound, meaning iterations end when $|x_i - x_{i+1}| < \text{StepTolerance} \cdot (1 + |x_i|)$, or a similar relative measure
I would have expected a relative tolerance to be used in a manner similar to one of the following:
$|x_i - x_{i+1}| < \text{StepTolerance} \cdot |x_i|$
$|x_i - x_{i+1}| < \text{StepTolerance} \cdot |x_i + x_{i+1}|$
$|x_i - x_{i+1}| < \text{StepTolerance} \cdot |x_i + x_{i+1}| / 2$
Can anyone please explain how the documented behaviour is "relative"?
For me, the same question applies to the (apparently relative) objective FunctionTolerance immediately following the StepTolerance section of the aforementioned stopping tolerances page.
Thanks.
Additional context
The cited MILP example ends with a Matlab message about gap tolerance, which pops up a help box when clicked. In turn, the help box links to the aforementioned stopping tolerances page. It is possible that the stopping tolerances page was too general a reference for the MILP example, and I'll mention this in the original post. However, I'm still curious how to interpret the relativeness of the tolerance in the more general sense, e.g., in a gradient descent context.
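One common reading, sketched below in Matlab, is that the 1 + |x_i| factor makes the test behave like an absolute tolerance when x_i is near zero (so the bound never collapses to zero) and like the relative bound StepTolerance * |x_i| when |x_i| is large:

% How the documented bound StepTolerance * (1 + |x_i|) varies with x_i.
tol = 1e-6;
for xi = [0, 1, 1e6]
    bound = tol * (1 + abs(xi));
    fprintf('x_i = %g  ->  step bound = %g\n', xi, bound);
end
% x_i = 0   gives bound = 1e-6 (pure absolute tolerance),
% x_i = 1e6 gives bound of about 1 (essentially tol * |x_i|, i.e., relative).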

Plot a curve in Grafana

I have a 2-D time series for which I take 1-minute snapshots that I put in my influxdb.
To give a concrete example, consider a yield curve: this is a curve giving the interest rate by maturity date, and it looks like this:
maturity | 1 YEAR | 2 YEARS | 2 YEARS | 3 YEARS | 4 YEARS | 5 YEARS |
interest | 0.5    | 0.75    | 0.83    | 0.99    | 1.01    | 1.05    |
My application takes snapshots of the curve and stores them in influxdb.
Now I want to plot these snapshots in Grafana. So at one particular timestamp I want to plot the curve (the X axis will be my maturities, and the Y axis the corresponding interest rate for each maturity).
Can this be done in Grafana?
To the best of my knowledge, this is not currently possible with Grafana. One of your axes must always be time.

Using Landsat 7 to go from NDVI to Emissivity

I am using Landsat 7 to calculate land surface temperature.
I understand the main concepts behind the conversion; however, I am confused about how to factor emissivity into my model.
I am using the model builder for my calculations and have created several modules that use the instrument's Gain, Bias Offset, Landsat K1, and Landsat K2 correction variables.
I converted the DN to radiance values as well.
Now, I need to factor in the last and probably the most confusing (for me) part: Emissivity.
I would like to calculate Emissivity using the NDVI.
I have a model procedure built to calculate the NDVI layer: (band4 - band3) / (band4 + band3).
I have also calculated Pv, the fraction of vegetation, calculated as Pv = ((NDVI - NDVI_min) / (NDVI_max - NDVI_min))^2.
Now, by using the Vegetation Cover Method, all I need is Ev and Eg.
I do not understand how to find these values to calculate the Total Emissivity value per cell.
Does anyone have any idea on how I can incorporate the Emissivity into my formulation?
I am slightly confused about how to derive this value...
I believe emissivity is frequently included as part of the dataset. Alternatively, emissivity databases do exist (such as the ASTER Global Emissivity Database here: https://lpdaac.usgs.gov/about/news_archive/aster_global_emissivity_database_ged_product_release, and others usually maintained by academic departments).
Values of Ev = 0.99 and Eg = 0.97 are used, and the method of selection discussed, on p. 436 here: ftp://atmosfera.cl/pub/elias/Paula/2004_Sobrino_RSE.pdf (J.A. Sobrino et al., Land surface temperature retrieval from LANDSAT TM 5, Remote Sensing of Environment 90, 2004, pp. 434–440).
Another approach is taken here: http://fromgistors.blogspot.com/2014/01/estimation-of-land-surface-temperature.html
Estimation of Land Surface Temperature
There are several studies about the calculation of land surface temperature. For instance, using NDVI for the estimation of land surface emissivity (Sobrino, et al., 2004), or using a land cover classification for the definition of the land surface emissivity of each class (Weng, et al. 2004).
For instance, the emissivity (e) values of various land cover types are provided in the following table (from Mallick et al., 2012):
| Land cover | e |
| Soil | 0.928 |
| Grass | 0.982 |
| Asphalt | 0.942 |
| Concrete | 0.937 |
Therefore, the land surface temperature can be calculated as (Weng, et al. 2004):
T = TB / [ 1 + (λ * TB / ρ) * ln(e) ]
where:
λ = wavelength of emitted radiance
ρ = h * c / σ (1.438 * 10^-2 m K)
h = Planck's constant (6.626 * 10^-34 J s)
σ = Boltzmann constant (1.38 * 10^-23 J/K)
c = velocity of light (2.998 * 10^8 m/s)
The values of λ for the thermal bands of the Landsat satellites are listed in the following table:
| Satellite | Band | Center wavelength (µm) |
| Landsat 4, 5, and 7 | 6 | 11.45 |
| Landsat 8 | 10 | 10.8 |
| Landsat 8 | 11 | 12 |
For further reading on emissivity selection, see Section 2.3, Emissivity Retrieval, here: https://books.google.com/books?id=XN4uAYlexnsC&lpg=PA51&ots=YQrmDa2S1G&dq=vegetation%20and%20bare%20soil%20emissivity&pg=PA50#v=onepage&q&f=false
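Putting the pieces together, here is a minimal Matlab sketch of the NDVI -> Pv -> emissivity -> LST chain under the Vegetation Cover Method, using the Ev/Eg values from Sobrino et al. (2004). The NDVI thresholds are common defaults rather than scene-specific values, TB is assumed to be the brightness temperature already derived from band 6 via K1/K2, and the cavity correction term is omitted:

% NDVI from Landsat 7 red (band3) and NIR (band4) arrays.
NDVI = (band4 - band3) ./ (band4 + band3);

NDVI_min = 0.2;                    % bare-soil threshold (assumed default)
NDVI_max = 0.5;                    % full-vegetation threshold (assumed default)
Pv = ((NDVI - NDVI_min) ./ (NDVI_max - NDVI_min)).^2;
Pv = min(max(Pv, 0), 1);           % clamp pixels outside the thresholds

Ev = 0.99;  Eg = 0.97;             % vegetation / ground emissivities (Sobrino et al.)
E  = Ev .* Pv + Eg .* (1 - Pv);    % Vegetation Cover Method mixing

lambda = 11.45e-6;                 % band 6 center wavelength (m), from the table above
rho    = 1.438e-2;                 % h * c / sigma (m K)
T = TB ./ (1 + (lambda .* TB ./ rho) .* log(E));   % Weng et al. (2004) formula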

Bootstrap weighted data - Matlab

I have a simple dataset with values and absolute frequencies, like the table below:
value|freq
-----------
1 | 10
3 | 20
4 | 10
3 | 10
And now I'd like to calculate the relative frequency table, like:
value| %
-----------
1 | 1/5
3 | 3/5
4 | 1/5
As a last step, I'd like to compute the bootstrap CI with Matlab. I have a lot of rows in the dataset.
I've calculated the frequency table via the grpstats command in Matlab, but I don't know how I can use it with the bootstrp function.
Any help or suggestions would be really appreciated.
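One way to do this, sketched below (variable names are assumptions), is to skip grpstats entirely: expand the (value, frequency) pairs into raw observations with repelem, then pass those to bootci:

% Expand the weighted table into raw observations.
values = [1; 3; 4; 3];
freqs  = [10; 20; 10; 10];
raw = repelem(values, freqs);              % 50 raw observations

% Relative frequency table (matches the one above: 1 -> 1/5, 3 -> 3/5, 4 -> 1/5).
[u, ~, idx] = unique(raw);
rel = accumarray(idx, 1) / numel(raw);

% 95% bootstrap confidence interval for the mean, 1000 resamples.
ci = bootci(1000, @mean, raw);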

Training LIBSVM with multivariate data in MATLAB

My general question is: how does LIBSVM perform multivariate regression?
In detail, I have data for a certain number of links (for example, 3 links). Each link has 3 variables which, when used in a model, give an output Y. I have data collected on these links at some interval.
LinkId | var1 | var2 | var3 | var4(OUTPUT)
1 | 10 | 12.1 | 2.2 | 3
2 | 11 | 11.2 | 2.3 | 3.1
3 | 12 | 12.4 | 4.1 | 1
1 | 13 | 11.8 | 2.2 | 4
2 | 14 | 12.7 | 2.3 | 2
3 | 15 | 10.7 | 4.1 | 6
1 | 16 | 8.6 | 2.2 | 6.6
2 | 17 | 14.2 | 2.3 | 4
3 | 18 | 9.8 | 4.1 | 5
I need to perform prediction to find the output of
(2,19,10.2,2.3).
How can I do that using the above data for training in Matlab with LIBSVM? Can I feed the whole dataset to svmtrain to create one model, or do I need to train each link separately and use that link's model for prediction? Does it make any difference?
NOTE: Notice that links with the same ID have the same var3 value.
This is not really a Matlab or LIBSVM question but rather a generic SVM-related one.
My general question is: how does LIBSVM perform multivariate regression?
LIBSVM is just a library which, among other things, implements the Support Vector Regression (SVR) model for regression tasks. In short, in the linear case SVR tries to find a hyperplane such that your data points lie within some margin around it (which is in a sense dual to the classical SVM, which tries to separate data with as big a margin as possible).
In the non-linear case the kernel trick is used (in the same fashion as in SVM), so it still looks for a hyperplane, but in the feature space induced by the particular kernel, which results in a non-linear regression in the input space.
A quite nice introduction to SVR can be found here:
http://alex.smola.org/papers/2003/SmoSch03b.pdf
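For reference, the standard epsilon-SVR primal from that tutorial is: minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$ subject to $y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i$, $\langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*$, and $\xi_i, \xi_i^* \ge 0$; points lying inside the $\varepsilon$-tube around the hyperplane incur no loss.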
How can I do that using the above data for training in Matlab with LIBSVM? Can I feed the whole dataset to svmtrain to create one model, or do I need to train each link separately and use that link's model for prediction? Does it make any difference? NOTE: Notice that links with the same ID have the same var3 value.
You could train an SVR (as it is a regression problem) on the whole dataset, but:
it seems that var3 and LinkId encode the same variable (1 -> 2.2, 2 -> 2.3, 3 -> 4.1); if this is the case you should remove the LinkId column,
are the values of var1 unique ascending integers? If so, this is also probably a useless feature (it does not seem to carry any information; it looks like a row id),
you should preprocess your data before applying SVM so that, e.g., each column contains values from the [0,1] interval; otherwise some features may become more important than others just because of their scale.
Now, if you were to create a separate model for each link and follow the above clues, you would end up with one input variable (var2) and one output variable (var4), so I would not recommend such a step. In general it seems that you have a very limited feature set; it would be valuable to gather more informative features. A minimal training sketch follows.
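To make that concrete, here is a minimal Matlab sketch using the LIBSVM interface (svmtrain/svmpredict) on the table above, following the advice to drop LinkId and var3 and to scale the features; the -c, -g, and -p values are illustrative, not tuned:

% Features (var1, var2) and output (var4) from the table above.
X = [10 12.1; 11 11.2; 12 12.4; 13 11.8; 14 12.7; ...
     15 10.7; 16  8.6; 17 14.2; 18  9.8];
Y = [3; 3.1; 1; 4; 2; 6; 6.6; 4; 5];

% Min-max scale each column to [0,1] (uses implicit expansion, R2016b+).
lo = min(X); hi = max(X);
Xs = (X - lo) ./ (hi - lo);

% epsilon-SVR (-s 3) with an RBF kernel (-t 2); parameters are placeholders.
model = svmtrain(Y, Xs, '-s 3 -t 2 -c 1 -g 0.5 -p 0.1');

% Predict the query point (LinkId 2, var1 19, var2 10.2, var3 2.3):
% only var1 and var2 are used, scaled the same way as the training data.
q  = ([19 10.2] - lo) ./ (hi - lo);
yq = svmpredict(0, q, model);   % the 0 is a dummy label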