3D or 4D interpolation to find the corresponding values based on 4 columns of variables - MATLAB

I'm trying to find out whether it's possible to interpolate the corresponding values from this set of variables:
+-------------+-------------+------+------+
|      x      |      y      |  z   |  g   |
+-------------+-------------+------+------+
| 150.8385804 | 183.7613678 | 0.58 | 2    |
| 171.0745381 | 231.7033081 | 2    | 0.58 |
| 179.1394672 | 244.5019837 | 0.8  | 0.8  |
| 149.1849453 | 180.7103271 | 0.8  | 2    |
| 162.5648017 | 212.8121033 | 2    | 0.8  |
| 141.1687115 | 163.4759979 | 0.8  | 3    |
| 140.7505385 | 162.7905884 | 0.9  | 3    |
| 148.1461022 | 180.5486908 | 1.8  | 1.6  |
| 147.1552106 | 178.7599182 | 2    | 1.6  |
+-------------+-------------+------+------+
What would be the corresponding z and g for x=143 and y=179? I do have access to MATLAB if anyone can suggest code for it.
Here is the MATLAB syntax to load the above data into your workspace:
X = [150.8385804 171.0745381 179.1394672 149.1849453 162.5648017 141.1687115 140.7505385 148.1461022 147.1552106].';
Y = [183.7613678 231.7033081 244.5019837 180.7103271 212.8121033 163.4759979 162.7905884 180.5486908 178.7599182].';
Z = [0.58 2 0.8 0.8 2 0.8 0.9 1.8 2].';
G = [2 0.58 0.8 2 0.8 3 3 1.6 1.6].';

You can use scatteredInterpolant to do this for you. scatteredInterpolant performs interpolation on a scattered dataset, which is basically what you have. You can do it twice: once for z and once for g. You specify x and y as the key / control points, with z (or g) as the corresponding output values. scatteredInterpolant creates an object for you, and you can then query each of the z and g interpolants at custom x and y values to get an interpolated answer. The default interpolation method is linear. As such, you'd query at x=143 and y=179 and read off the output z and g.
In other words:
X = [150.8385804 171.0745381 179.1394672 149.1849453 162.5648017 141.1687115 140.7505385 148.1461022 147.1552106].';
Y = [183.7613678 231.7033081 244.5019837 180.7103271 212.8121033 163.4759979 162.7905884 180.5486908 178.7599182].';
Z = [0.58 2 0.8 0.8 2 0.8 0.9 1.8 2].';
G = [2 0.58 0.8 2 0.8 3 3 1.6 1.6].';
%// Create scatteredInterpolant
Zq = scatteredInterpolant(X, Y, Z);
Gq = scatteredInterpolant(X, Y, G);
%// Figure out interpolated values
zInterp = Zq(143, 179);
gInterp = Gq(143, 179);
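Note that scatteredInterpolant also lets you choose the interpolation method ('nearest', 'linear', or 'natural'), and an interpolant can be queried at many points in one call. A minimal sketch; the second query point here is a made-up example:
%// Natural-neighbor interpolation instead of the default linear
Zq = scatteredInterpolant(X, Y, Z, 'natural');
Gq = scatteredInterpolant(X, Y, G, 'natural');
%// Query several (x, y) points in one call (second point is hypothetical)
xq = [143; 150];
yq = [179; 182];
zInterp = Zq(xq, yq);
gInterp = Gq(xq, yq);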

Value of column, based on function over another column in a MATLAB table

I'm interested in the value of result that is in the same row as the minimum value of each column. I have many columns, so I would like to loop over them (or use rowfun, but then I do not know how to get result).
Table A
+----+----+----+----+----+----+--------+
| x1 | x2 | x3 | x4 | x5 | x6 | result |
+----+----+----+----+----+----+--------+
|  1 |  4 | 10 |  3 | 12 |  2 |      8 |
| 10 |  2 |  8 |  1 | 12 |  3 |     10 |
|  5 | 10 |  5 |  4 |  2 | 10 |     12 |
+----+----+----+----+----+----+--------+
Solution
8 10 12 10 12 8
I know that I can apply rowfun, but then I don't know how to get result.
Alternatively, I can do this, but cannot loop over all the columns:
A(cell2mat(A.x1) == min(cell2mat(A.x1)), 7)
I have tried several ways of turning the column into a variable, but I can't make it work, so that:
A(cell2mat(variable) == min(cell2mat(variable)), 7)
Thank you!
Assuming your data is homogeneous, you can use table2array and the second output of min to index your results:
% Set up table
x1 = [1 10 5];
x2 = [4 2 10];
x3 = [10 8 5];
x4 = [3 1 4];
x5 = [12 12 2];
x6 = [2 3 10];
result = [8 10 12];
t = table(x1.', x2.', x3.', x4.', x5.', x6.', result.', ...
'VariableNames', {'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'result'});
% Convert
A = table2array(t);
% When passed a matrix, min finds minimum of each column by default
% Exclude the results column, assumed to be the last
[~, minrow] = min(A(:, 1:end-1));
solution = t.result(minrow)'
Which returns:
solution =
8 10 12 10 12 8
From the documentation for min:
M = min(A) returns the smallest elements of A.
<snip>
If A is a matrix, then min(A) is a row vector containing the minimum value of each column.
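If the table is not homogeneous, so table2array does not apply, a plain loop over the variable names works as well. A minimal sketch, reusing the table t built above and assuming the predictor columns are everything except the last column, result:
% Loop over every column except 'result'
vars = t.Properties.VariableNames(1:end-1);
solution = zeros(1, numel(vars));
for k = 1:numel(vars)
    [~, minrow] = min(t.(vars{k}));   % row index of this column's minimum
    solution(k) = t.result(minrow);   % 'result' value in that row
end
solution  % displays 8 10 12 10 12 8, matching the output above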

YUI 3 Chart Axis Label Positioning?

I have created a chart using YUI and want to display axis labels. The x-axis is fine, but the y-axis label appears inside the data.
Here is what is happening:
10 |
9 |
8 |
7 l |
6 a |
5 b | CHART
4 e |
3 l |
2 |
1 |
0 |_____________________________
Here is what I want to happen:
10 |
9 |
8 |
l 7 |
a 6 |
b 5 | CHART
e 4 |
l 3 |
2 |
1 |
0 |_____________________________
Here is my code for the chart axes:
var chartaxes = {
    timeelapsed: {
        position: "bottom",
        type: "category",
        title: "label"
    },
    kWh: {
        position: "left",
        type: "numeric",
        title: "label"
    }
};
Is there any way to fix this?
You need to set the series that is associated with each axis, using the keys property, for the title to display correctly:
var chartaxes = {
    timeelapsed: {
        position: "bottom",
        type: "category",
        title: "Time Elapsed (minutes)",
        keys: ["category"],
and so on.

Difference between correctly / incorrectly classified instances in decision tree and confusion matrix in Weka

I have been using Weka's J48 decision tree to classify frequencies of keywords
in RSS feeds into target categories, and I think I may have a problem
reconciling the generated decision tree with the number of correctly classified
instances reported and with the confusion matrix.
For example, one of my .arff files contains the following data extracts:
@attribute Keyword_1_nasa_Frequency numeric
@attribute Keyword_2_fish_Frequency numeric
@attribute Keyword_3_kill_Frequency numeric
@attribute Keyword_4_show_Frequency numeric
...
@attribute Keyword_64_fear_Frequency numeric
@attribute RSSFeedCategoryDescription {BFE,FCL,F,M,NCA,SNT,S}
@data
0,0,0,34,0,0,0,0,0,40,0,0,0,0,0,0,0,0,0,0,24,0,0,0,0,13,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
0,0,0,10,0,0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
...
20,0,64,19,0,162,0,0,36,72,179,24,24,47,24,40,0,48,0,0,0,97,24,0,48,205,143,62,78,
0,0,216,0,36,24,24,0,0,24,0,0,0,0,140,24,0,0,0,0,72,176,0,0,144,48,0,38,0,284,
221,72,0,72,0,SNT
...
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,S
And so on: there's a total of 64 keywords (columns) and 570 rows, where each one contains the frequency of a keyword in a feed for a day. In this case, there are 57 feeds over
10 days, giving a total of 570 records to be classified. Each keyword is prefixed
with a surrogate number and suffixed with 'Frequency'.
I run the decision tree with default parameters and 10-fold cross-validation.
Weka reports the following:
Correctly Classified Instances 210 36.8421 %
Incorrectly Classified Instances 360 63.1579 %
With the following confusion matrix:
=== Confusion Matrix ===
a b c d e f g <-- classified as
11 0 0 0 39 0 0 | a = BFE
0 0 0 0 60 0 0 | b = FCL
1 0 5 0 72 0 2 | c = F
0 0 1 0 69 0 0 | d = M
3 0 0 0 153 0 4 | e = NCA
0 0 0 0 90 10 0 | f = SNT
0 0 0 0 19 0 31 | g = S
The tree is as follows:
Keyword_22_health_Frequency <= 0
| Keyword_7_open_Frequency <= 0
| | Keyword_52_libya_Frequency <= 0
| | | Keyword_21_job_Frequency <= 0
| | | | Keyword_48_pic_Frequency <= 0
| | | | | Keyword_63_world_Frequency <= 0
| | | | | | Keyword_26_day_Frequency <= 0: NCA (461.0/343.0)
| | | | | | Keyword_26_day_Frequency > 0: BFE (8.0/3.0)
| | | | | Keyword_63_world_Frequency > 0
| | | | | | Keyword_31_gaddafi_Frequency <= 0: S (4.0/1.0)
| | | | | | Keyword_31_gaddafi_Frequency > 0: NCA (3.0)
| | | | Keyword_48_pic_Frequency > 0: F (7.0)
| | | Keyword_21_job_Frequency > 0: BFE (10.0/1.0)
| | Keyword_52_libya_Frequency > 0: NCA (31.0)
| Keyword_7_open_Frequency > 0
| | Keyword_31_gaddafi_Frequency <= 0: S (32.0/1.0)
| | Keyword_31_gaddafi_Frequency > 0: NCA (4.0)
Keyword_22_health_Frequency > 0: SNT (10.0)
My question concerns reconciling the matrix with the tree, or vice versa. As far as
I understand the results, a rating like (461.0/343.0) indicates that 461 instances have been classified as NCA. But how can that be when the matrix reveals only 153? I am
not sure how to interpret this, so any help is welcome.
Thanks in advance.
The number in parentheses at each leaf should be read as (number of total instances of this classification at this leaf / number of incorrect classifications at this leaf).
In your example for the first NCA leaf, it says there are 461 test instances that were classified as NCA, and of those 461, there were 343 incorrect classifications. So there are 461-343 = 118 correctly classified instances at that leaf.
Looking through your decision tree, note that NCA is also at other leaves. I count 118 + 3 + 31 + 4 = 156 correctly classified instances out of 461 + 3 + 31 + 4 = 499 total classifications of NCA.
Your confusion matrix shows 153 correct classifications of NCA out of 39 + 60 + 72 + 69 + 153 + 90 + 19 = 502 total classifications of NCA.
So there is a slight difference between the tree (156/499) and your confusion matrix (153/502).
Note that if you are running Weka from the command line, it outputs a tree and a confusion matrix for testing on all the training data, and another pair for testing with cross-validation. Be careful that you are looking at the right matrix for the right tree, as it is easy to mix up the two pairs.

MATLAB: Identify if a value is repeated sequentially N times in a vector

I am trying to identify if a value is repeated sequentially in a vector N times. The challenge I am facing is that it could be repeated sequentially N times several times within the vector. The purpose is to determine how many times in a row certain values fall above the mean value. For example:
>> return_deltas
return_deltas =
7.49828129642663
11.5098198572327
15.1776644881294
11.256677995536
6.22315734182976
8.75582103474613
21.0488849115947
26.132605745393
27.0507649089989
...
(I only printed a few values for example but the vector is large.)
>> mean(return_deltas)
ans =
10.50007490258002
>> sum(return_deltas > mean(return_deltas))
ans =
50
So there are 50 instances of a value in return_deltas being greater than the mean of return_deltas.
I need to identify the number of times, sequentially, the value in return_deltas is greater than its mean 3 times in a row. In other words, if the values in return_deltas are greater than its mean 3 times in a row, that is one instance.
For example:
+--------------------+--------------+-----------------+----------+
| return_delta value | mean         | greater or less | sequence |
+--------------------+--------------+-----------------+----------+
| 7.49828129642663   | 10.500074902 | LT              | 1        |
| 11.5098198572327   | 10.500074902 | GT              | 1        |
| 15.1776644881294   | 10.500074902 | GT              | 2        |
| 11.256677995536    | 10.500074902 | GT              | 3 *      |
| 6.22315734182976   | 10.500074902 | LT              | 1        |
| 8.75582103474613   | 10.500074902 | LT              | 2        |
| 21.0488849115947   | 10.500074902 | GT              | 1        |
| 26.132605745393    | 10.500074902 | GT              | 2        |
| 27.0507649089989   | 10.500074902 | GT              | 3 *      |
+--------------------+--------------+-----------------+----------+
The star represents a successful sequence of 3 in a row. The result of this set would be two because there were two occasions where the value was greater than the mean 3 times in a row.
What I am thinking is to create a new vector:
>> a = return_deltas > mean(return_deltas)
which of course contains ones where the values in return_deltas are greater than the mean, and to use it to find how many times, sequentially, the values in return_deltas are greater than the mean 3 times in a row. I am attempting to do this with a built-in function (if there is one, I have not discovered it), or at least while avoiding loops.
Any thoughts on how I might approach this?
With a little work, this snippet finds the starting index of every run of numbers:
[0 find(diff(v) ~= 0)] + 1
An Example:
>> v = [3 3 3 4 4 4 1 2 9 9 9 9 9]; % vector of integers
>> run_starts = [0 find(diff(v) ~= 0)] + 1 % for floating point, consider abs(diff(v)) > tol instead of ~= 0
run_starts =
1 4 7 8 9
To find the length of each run:
>> run_lengths = [diff(run_starts), length(v) - run_starts(end) + 1]
This variable then makes it easy to query which runs were at least a certain length:
>> find(run_lengths >= 4)
ans =
5
>> find(run_lengths >= 2)
ans =
1 2 5
This tells us that the only run of at least four integers in a row was run #5.
However, there were three runs that were at least two integers in a row, specifically runs #1, #2, and #5.
You can reference where each run starts from the run_starts variable.
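Applying the same idea to the original question: build the logical vector a, compute the run starts and lengths, and then keep only the runs whose values were above the mean. A minimal sketch, counting each run of three or more consecutive above-mean values as one instance (which gives two for the example above); numInstances is just an illustrative name:
a = (return_deltas > mean(return_deltas)).';  % row vector so the concatenations below work
run_starts = [0 find(diff(a) ~= 0)] + 1;      % first index of every run
run_lengths = [diff(run_starts), length(a) - run_starts(end) + 1];
above = a(run_starts);                        % true for runs that sit above the mean
numInstances = sum(above & run_lengths >= 3)  % count runs of 3 or more above the mean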

How to import space-formatted tables (PostgreSQL 9.0)?

These are the first three lines of my text file:
Dist   Mv     CL  Typ   LTef   logg  Age  Mass  B-V    U-B    V-I    V-K    V       [Fe/H]  l        b        Av     Mbol
0.033  14.40  5   7.90  3.481  5.10  1    0.15  1.723  1.512  3.153  5.850  17.008  0.13    0.50000  0.50000  0.014  12.616
0.033  7.40   5   6.50  3.637  4.62  7    0.71  1.178  0.984  1.302  2.835  10.047  -0.56   0.50000  0.50000  0.014  6.125
0.052  11.70  5   7.40  3.529  4.94  2    0.31  1.541  1.167  2.394  4.565  15.393  -0.10   0.50000  0.50000  0.028  10.075
Assuming I have the right columns, how do I import this?
Bonus: is it possible/are there tools to create the schema from these kinds of files automatically?
At the lowest level you could just use the COPY command (or \copy in psql if you don't have access to a superuser account, which COPY requires to load data from a file). Unfortunately you have to create the table structure first (there is no built-in guess-by-header feature), but it looks straightforward to write one.
Choose whatever datatype suits your needs, e.g. real for single-precision floating point (IEEE 754), double precision for double-precision floating point, or numeric if you need arbitrary-precision numbers:
CREATE TABLE measurement
(
    "Dist"   double precision,
    "Mv"     double precision,
    "CL"     double precision,
    "Typ"    double precision,
    "LTef"   double precision,
    "logg"   double precision,
    "Age"    double precision,
    "Mass"   double precision,
    "B-V"    double precision,
    "U-B"    double precision,
    "V-I"    double precision,
    "V-K"    double precision,
    "V"      double precision,
    "[Fe/H]" double precision,
    "l"      double precision,
    "b"      double precision,
    "Av"     double precision,
    "Mbol"   double precision
);
Another thing is that your file contains multiple spaces between values, so it's better to transform it into single-tab-delimited entries (there are plenty of tools to do this):
$ sed -E 's/ +/\t/g' import.csv
Dist Mv CL Typ LTef logg Age Mass B-V U-B V-I V-K V [Fe/H] l b Av Mbol
0.033 14.40 5 7.90 3.481 5.10 1 0.15 1.723 1.512 3.153 5.850 17.008 0.13 0.50000 0.50000 0.014 12.616
0.033 7.40 5 6.50 3.637 4.62 7 0.71 1.178 0.984 1.302 2.835 10.047 -0.56 0.50000 0.50000 0.014 6.125
0.052 11.70 5 7.40 3.529 4.94 2 0.31 1.541 1.167 2.394 4.565 15.393 -0.10 0.50000 0.50000 0.028 10.075
Finally you can import your file straight into Postgres database, for example:
=> \copy measurement FROM '/path/import.csv' (FORMAT csv, DELIMITER E'\t', HEADER 'true')
=> TABLE measurement;
Dist | Mv | CL | Typ | LTef | logg | Age | Mass | B-V | U-B | V-I | V-K | V | [Fe/H] | l | b | Av | Mbol
-------+------+----+-----+-------+------+-----+------+-------+-------+-------+-------+--------+--------+-----+-----+-------+--------
0.033 | 14.4 | 5 | 7.9 | 3.481 | 5.1 | 1 | 0.15 | 1.723 | 1.512 | 3.153 | 5.85 | 17.008 | 0.13 | 0.5 | 0.5 | 0.014 | 12.616
0.033 | 7.4 | 5 | 6.5 | 3.637 | 4.62 | 7 | 0.71 | 1.178 | 0.984 | 1.302 | 2.835 | 10.047 | -0.56 | 0.5 | 0.5 | 0.014 | 6.125
0.052 | 11.7 | 5 | 7.4 | 3.529 | 4.94 | 2 | 0.31 | 1.541 | 1.167 | 2.394 | 4.565 | 15.393 | -0.1 | 0.5 | 0.5 | 0.028 | 10.075
(3 rows)