I'm tagging text to search for nouns and adjectives:
text = u"""Developed at the Vaccine and Gene Therapy Institute at the Oregon Health and Science University (OHSU), the vaccine proved successful in about fifty percent of the subjects tested and could lead to a human vaccine preventing the onset of HIV/AIDS and even cure patients currently on anti-retroviral drugs."""
nltk.pos_tag(nltk.word_tokenize(text))
This results in:
[('Developed', 'NNP'), ('at', 'IN'), ('the', 'DT'), ('Vaccine',
'NNP'), ('and', 'CC'), ('Gene', 'NNP'), ('Therapy', 'NNP'),
('Institute', 'NNP'), ('at', 'IN'), ('the', 'DT'), ('Oregon', 'NNP'),
('Health', 'NNP'), ('and', 'CC'), ('Science', 'NNP'), ('University',
'NNP'), ('(', 'NNP'), ('OHSU', 'NNP'), (')', 'NNP'), (',',
','), ('the', 'DT'), ('vaccine', 'NN'), ('proved', 'VBD'),
('successful', 'JJ'), ('in', 'IN'), ('about', 'IN'), ('fifty', 'JJ'),
('percent', 'NN'), ('of', 'IN'), ('the', 'DT'), ('subjects', 'NNS'),
('tested', 'VBD'), ('and', 'CC'), ('could', 'MD'), ('lead', 'VB'),
('to', 'TO'), ('a', 'DT'), ('human', 'NN'), ('vaccine', 'NN'),
('preventing', 'VBG'), ('the', 'DT'), ('onset', 'NN'), ('of', 'IN'),
('HIV/AIDS', 'NNS'), ('and', 'CC'), ('even', 'RB'), ('cure', 'NN'),
('patients', 'NNS'), ('currently', 'RB'), ('on', 'IN'),
('anti-retroviral', 'JJ'), ('drugs', 'NNS'), ('.', '.')]
Is there a built-in way of correctly detecting parentheses when tagging sentences?
If you know what you want to return as the tag value for the parentheses, you can use a RegexpTagger to match them and fall back to the preferred tagger for everything else.
import nltk
from nltk.data import load
_POS_TAGGER = 'taggers/maxent_treebank_pos_tagger/english.pickle'
tagger = load(_POS_TAGGER) # same tagger as using nltk.pos_tag
regexp_tagger = nltk.tag.RegexpTagger([(r'\(|\)', '--')], backoff = tagger)
regexp_tagger.tag(nltk.word_tokenize(text))
Result:
[(u'Developed', 'NNP'), (u'at', 'IN'), (u'the', 'DT'), (u'Vaccine',
'NNP'), (u'and', 'CC'), (u'Gene', 'NNP'), (u'Therapy', 'NNP'),
(u'Institute', 'NNP'), (u'at', 'IN'), (u'the', 'DT'), (u'Oregon',
'NNP'), (u'Health', 'NNP'), (u'and', 'CC'), (u'Science', 'NNP'),
(u'University', 'NNP'), (u'(', '--'), (u'OHSU', 'NNP'), (u')', '--'),
(u',', ','), (u'the', 'DT'), (u'vaccine', 'NN'), (u'proved', 'VBD'),
(u'successful', 'JJ'), (u'in', 'IN'), (u'about', 'IN'), (u'fifty',
'JJ'), (u'percent', 'NN'), (u'of', 'IN'), (u'the', 'DT'),
(u'subjects', 'NNS'), (u'tested', 'VBD'), (u'and', 'CC'), (u'could',
'MD'), (u'lead', 'VB'), (u'to', 'TO'), (u'a', 'DT'), (u'human', 'NN'),
(u'vaccine', 'NN'), (u'preventing', 'VBG'), (u'the', 'DT'), (u'onset',
'NN'), (u'of', 'IN'), (u'HIV/AIDS', 'NNS'), (u'and', 'CC'), (u'even',
'RB'), (u'cure', 'NN'), (u'patients', 'NNS'), (u'currently', 'RB'),
(u'on', 'IN'), (u'anti-retroviral', 'JJ'), (u'drugs', 'NNS'), (u'.',
'.')]
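The same override can also be applied after tagging, which works with any tagger that returns (token, tag) pairs. The following is only a minimal sketch under that assumption: `retag_parens` and `nouns_and_adjectives` are hypothetical helper names, not part of NLTK, and `'--'` is simply the placeholder tag used above.

```python
def retag_parens(tagged, paren_tag="--"):
    """Return a copy of `tagged` with every parenthesis token given `paren_tag`."""
    return [(tok, paren_tag if tok in ("(", ")") else tag) for tok, tag in tagged]


def nouns_and_adjectives(tagged):
    """Keep tokens whose Penn Treebank tag starts with NN (nouns) or JJ (adjectives)."""
    return [tok for tok, tag in tagged if tag.startswith(("NN", "JJ"))]


# A slice of the tagger output shown in the question.
sample = [("University", "NNP"), ("(", "NNP"), ("OHSU", "NNP"), (")", "NNP"),
          (",", ","), ("the", "DT"), ("vaccine", "NN"), ("proved", "VBD"),
          ("successful", "JJ")]

fixed = retag_parens(sample)
print(nouns_and_adjectives(fixed))
# prints ['University', 'OHSU', 'vaccine', 'successful']
```

Without the retagging step, the two parentheses would be returned as proper nouns along with the real ones.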
I have a totally random set of data with around 500 inputs obtained from a physics experiment in excel format. I plotted it successfully but I am not able to create a trendline for my plot. I need to find the trendline for this data.
The Excel data can be found at this Google Drive link.
Alternatively, these are the first 2 columns of the Excel file:
0_x 0_y
0.321816 -82.594828
1.595041 -85.396122
2.868266 -82.085502
3.632201 -84.632133
4.905426 -83.868143
6.178651 -84.122807
7.197231 -85.905448
7.961166 -84.37747
8.470457 -85.141459
9.743682 -82.340165
10.252972 -84.886796
11.271552 -82.594828
13.054067 -84.886796
13.054067 -83.104154
14.327292 -84.632133
15.345872 -86.160112
16.364452 -84.886796
17.383032 -86.160112
18.401612 -83.868143
19.165547 -85.905448
20.438773 -82.849491
21.457353 -85.396122
22.221288 -83.104154
22.475933 -85.650785
23.494513 -85.650785
24.513093 -84.632133
25.277028 -85.141459
26.550253 -84.122807
27.314188 -85.141459
28.587413 -82.849491
29.860638 -84.886796
31.133864 -82.340165
31.897799 -84.886796
32.916379 -83.868143
33.934959 -84.37747
34.698894 -84.886796
35.717474 -84.122807
37.245344 -85.141459
38.263924 -83.104154
39.282504 -84.122807
40.301084 -81.576175
40.810374 -84.122807
42.0836 -82.085502
43.356825 -84.886796
44.12076 -85.396122
45.13934 -85.141459
45.903275 -86.669438
47.1765 -84.632133
47.940435 -85.396122
48.70437 -83.358817
49.977595 -85.396122
50.486885 -82.085502
51.25082 -84.632133
52.2694 -83.868143
52.524045 -83.868143
53.287981 -85.905448
55.070496 -84.37747
56.343721 -86.160112
57.107656 -84.122807
57.871591 -86.160112
58.635526 -83.61348
59.144816 -85.905448
60.163396 -84.122807
61.181976 -85.396122
62.709846 -86.160112
62.964491 -85.396122
64.492362 -86.669438
65.001652 -85.141459
65.765587 -85.905448
66.784167 -84.122807
68.566682 -86.160112
69.585262 -83.358817
70.349197 -85.650785
71.113132 -84.37747
72.386357 -84.632133
73.150292 -85.141459
73.914227 -83.868143
74.932807 -85.141459
75.951388 -82.594828
77.224613 -85.396122
78.752483 -82.340165
79.516418 -85.141459
80.280353 -84.122807
81.553578 -85.905448
82.317513 -85.650785
82.826803 -84.886796
83.845383 -85.905448
83.336093 -83.868143
81.808223 -85.905448
80.534998 -83.104154
79.516418 -85.905448
78.497838 -83.868143
76.969968 -85.396122
75.696743 -86.160112
74.678162 -85.396122
73.659582 -86.669438
72.641002 -85.141459
71.367777 -86.160112
69.585262 -83.61348
68.566682 -86.160112
67.548102 -84.122807
66.274877 -86.160112
65.765587 -85.650785
65.001652 -85.905448
63.728426 -86.414775
61.945911 -84.632133
60.927331 -86.160112
60.418041 -83.61348
59.144816 -85.905448
58.380881 -82.594828
57.362301 -84.886796
56.089076 -82.849491
55.325141 -85.141459
54.561206 -85.141459
53.797271 -83.61348
52.524045 -85.141459
51.505465 -83.358817
50.486885 -85.141459
49.21366 -83.104154
48.19508 -84.886796
47.431145 -82.340165
46.15792 -84.886796
45.13934 -84.37747
44.12076 -85.396122
42.847535 -85.141459
41.319664 -84.886796
40.301084 -85.396122
39.537149 -84.632133
38.009279 -85.905448
37.245344 -82.849491
35.972119 -85.396122
34.953539 -83.868143
34.189604 -85.141459
32.152444 -84.632133
31.388509 -84.632133
30.115283 -86.160112
29.351348 -83.61348
28.078123 -85.396122
27.568833 -82.594828
26.804898 -85.396122
25.786318 -82.085502
24.513093 -85.141459
23.494513 -85.141459
22.730578 -84.37747
21.711998 -84.632133
20.438773 -84.122807
19.674838 -84.632133
18.401612 -82.594828
17.383032 -84.886796
16.619097 -82.594828
15.855162 -84.122807
15.091227 -83.61348
13.818002 -84.122807
12.799422 -84.122807
11.780842 -84.122807
11.016907 -84.886796
9.743682 -82.849491
8.215811 -84.632133
7.706521 -81.830838
6.942586 -84.632133
5.414716 -83.868143
4.141491 -85.396122
3.122911 -85.396122
2.358976 -84.886796
1.085751 -85.905448
0.321816 -84.37747
-0.696764 -85.905448
-1.969989 -83.358817
-2.98857 -85.650785
-4.51644 -82.594828
-5.53502 -85.905448
-6.808245 -85.396122
-7.57218 -86.160112
-8.59076 -86.414775
-10.11863 -85.141459
-10.62792 -86.669438
-11.901145 -84.122807
-12.919725 -86.669438
-13.17437 -83.61348
-13.938306 -85.650785
-15.211531 -84.122807
-15.975466 -85.396122
-17.248691 -86.414775
-18.267271 -85.141459
-19.285851 -86.160112
-20.049786 -85.141459
-21.323011 -84.886796
-22.341591 -82.594828
-23.105526 -85.141459
-24.124106 -82.085502
-25.397332 -84.632133
-26.161267 -84.632133
-27.179847 -84.632133
-27.434492 -85.905448
-28.707717 -84.632133
-29.471652 -85.905448
-30.999522 -83.358817
-32.527392 -85.396122
-33.291327 -82.340165
-34.309907 -85.396122
-35.073842 -83.61348
-35.583132 -85.141459
-37.620293 -85.396122
-38.638873 -84.122807
-39.402808 -85.650785
-40.421388 -84.122807
-41.694613 -86.160112
-43.222483 -82.849491
-44.241063 -86.160112
-45.259643 -83.868143
-46.023578 -85.650785
-47.551449 -85.905448
-48.315384 -85.141459
-49.588609 -85.141459
-50.607189 -84.122807
-51.625769 -85.141459
-52.644349 -82.340165
-53.662929 -84.886796
-54.426864 -82.594828
-54.936154 -84.37747
-56.209379 -83.104154
-56.718669 -83.868143
-57.737249 -84.37747
-59.26512 -83.104154
-60.029055 -84.37747
-61.30228 -81.830838
-62.066215 -83.868143
-62.32086 -82.085502
-63.594085 -84.122807
-64.612665 -82.849491
-66.140535 -84.122807
-66.649825 -84.122807
-68.177695 -83.868143
-69.45092 -85.396122
-70.214856 -83.61348
-71.233436 -85.141459
-72.252016 -82.594828
-73.015951 -84.886796
-74.289176 -82.594828
-76.071691 -84.632133
-76.326336 -85.650785
-77.854206 -84.122807
-78.618141 -86.160112
-79.891366 -85.905448
-80.655301 -86.414775
-81.928527 -83.868143
-82.692462 -86.414775
-83.965687 -83.358817
-84.984267 -85.141459
-86.002847 -83.868143
-86.766782 -84.122807
-87.785362 -84.886796
-88.803942 -83.868143
-89.313232 -85.141459
-90.331812 -83.358817
-91.350392 -85.650785
-92.368973 -82.594828
-93.132908 -84.37747
-94.406133 -83.104154
-95.424713 -84.632133
-96.188648 -85.141459
-96.697938 -84.37747
-97.461873 -85.141459
-98.480453 -84.37747
-99.499033 -85.650785
-100.517613 -83.104154
-101.790838 -85.650785
-102.554773 -83.358817
-103.573354 -85.650785
-104.591934 -85.396122
-105.865159 -85.396122
-107.647674 -85.650785
-108.156964 -84.632133
-108.920899 -85.905448
-110.703414 -83.358817
-111.721994 -86.414775
-112.485929 -83.868143
-113.759154 -85.905448
-114.52309 -85.141459
-115.54167 -86.414775
-116.305605 -86.414775
-117.57883 -85.396122
-118.08812 -86.414775
-118.852055 -84.886796
-120.12528 -86.160112
-120.889215 -83.358817
-121.907795 -85.905448
-122.926375 -83.868143
-123.69031 -85.396122
-124.963535 -85.396122
-125.727471 -84.632133
-126.746051 -85.905448
-127.764631 -83.868143
-128.783211 -84.632133
-129.037856 -82.594828
-130.056436 -85.141459
-131.075016 -82.085502
-132.093596 -84.37747
-133.112176 -84.37747
-133.876111 -84.37747
-135.149336 -84.632133
-136.677207 -83.104154
-137.441142 -85.650785
-138.459722 -83.61348
-139.223657 -84.886796
-140.496882 -82.340165
-141.515462 -84.632133
-142.534042 -83.358817
-143.297977 -84.37747
-144.825847 -85.141459
-145.844427 -84.886796
-146.863007 -86.160112
-147.881588 -84.37747
-148.900168 -85.396122
-149.918748 -83.358817
-150.682683 -85.650785
-151.701263 -83.358817
-152.719843 -85.141459
-153.738423 -85.396122
-153.993068 -84.122807
-155.011648 -84.886796
-156.030228 -84.37747
-157.303453 -84.632133
-158.067388 -82.340165
-159.340614 -85.396122
-159.849904 -82.085502
-159.595259 -84.122807
-158.576678 -82.849491
-157.303453 -84.122807
-156.030228 -84.37747
-155.011648 -83.358817
-153.229133 -84.37747
-152.210553 -82.849491
-151.446618 -84.632133
-149.664103 -81.321512
-148.645523 -84.886796
-147.881588 -82.594828
-146.608362 -84.37747
-145.589782 -84.886796
-144.571202 -84.122807
-143.552622 -85.396122
-143.043332 -84.122807
-142.024752 -85.396122
-140.751527 -83.358817
-139.478302 -85.905448
-138.459722 -83.358817
-137.441142 -85.905448
-136.677207 -85.141459
-135.403981 -85.141459
-134.640046 -85.905448
-134.130756 -84.632133
-132.857531 -84.632133
-131.838951 -83.868143
-130.820371 -85.905448
-129.547146 -82.594828
-128.783211 -85.650785
-127.764631 -83.868143
-125.982116 -85.141459
-124.963535 -85.650785
-123.944955 -84.632133
-122.926375 -85.396122
-121.907795 -83.868143
-120.889215 -85.141459
-119.870635 -82.594828
-118.59741 -85.141459
-117.833475 -82.849491
-116.305605 -85.905448
-114.777735 -85.141459
-114.013799 -84.632133
-113.504509 -85.141459
-112.740574 -83.868143
-111.467349 -85.396122
-110.703414 -83.868143
-109.684834 -85.650785
-108.666254 -83.104154
-107.647674 -86.160112
-106.629094 -84.37747
-105.355869 -85.396122
-104.337289 -85.905448
-103.318709 -85.141459
-102.300128 -86.160112
-102.045483 -83.868143
-101.026903 -86.669438
-100.008323 -83.868143
-98.735098 -86.669438
-97.716518 -84.122807
-96.952583 -85.905448
-95.934003 -86.669438
-94.406133 -85.905448
-93.387553 -85.905448
-92.368973 -84.632133
-91.350392 -85.905448
-90.331812 -83.358817
-89.313232 -85.650785
-88.294652 -83.104154
-87.530717 -85.141459
-86.512137 -83.358817
-85.238912 -84.632133
-84.474977 -84.886796
-83.201752 -84.122807
-82.437817 -84.886796
-80.655301 -84.122807
-79.891366 -85.141459
-78.618141 -82.340165
-77.599561 -84.886796
-76.835626 -83.358817
-75.817046 -85.141459
-75.053111 -85.141459
-73.779886 -84.37747
-73.015951 -86.160112
-72.506661 -83.868143
-71.488081 -85.905448
-70.469501 -82.085502
-69.196275 -85.141459
-67.92305 -83.104154
-67.41376 -85.396122
-66.649825 -85.141459
-65.3766 -85.141459
-65.121955 -85.905448
-64.103375 -84.122807
-63.084795 -84.886796
-62.066215 -83.358817
-61.047635 -85.141459
-60.029055 -82.085502
-59.010475 -84.886796
-58.246539 -83.358817
-56.973314 -84.886796
-55.700089 -85.141459
-54.681509 -83.868143
-53.408284 -84.37747
-52.644349 -83.104154
-51.880414 -84.37747
-50.861834 -81.830838
-49.588609 -84.122807
-48.824674 -81.830838
-47.806094 -84.37747
-46.787513 -84.632133
-45.259643 -84.886796
-43.986418 -84.886796
-43.222483 -83.61348
-42.203903 -85.905448
-41.185323 -83.61348
-40.166743 -86.160112
-39.148163 -83.104154
-38.384228 -85.650785
-37.365648 -85.141459
-36.856358 -84.886796
-35.837777 -84.886796
-35.073842 -83.358817
-33.800617 -84.886796
-33.036682 -82.594828
-31.763457 -85.905448
-31.508812 -83.104154
-30.490232 -86.414775
-29.471652 -84.122807
-28.453072 -86.160112
-27.434492 -86.414775
-26.415912 -85.396122
-25.397332 -86.414775
-24.378751 -84.632133
-23.105526 -86.414775
-21.832301 -83.868143
-21.068366 -85.650785
-19.540496 -83.868143
-19.031206 -85.141459
-18.012626 -85.141459
-16.994046 -84.632133
-15.466176 -85.905448
-14.956886 -83.868143
-13.68366 -85.141459
-12.919725 -82.849491
-11.901145 -84.886796
-10.62792 -82.340165
-9.863985 -85.141459
-8.336115 -83.61348
-7.57218 -84.886796
-6.04431 -86.414775
-5.53502 -85.141459
-5.02573 -85.396122
-4.00715 -83.868143
-2.479279 -85.396122
-1.206054 -82.849491
-0.696764 -85.396122
-0.187474 -83.868143
1.085751 -86.160112
1.849686 -86.160112
3.122911 -85.905448
4.141491 -86.414775
4.905426 -84.632133
5.924006 -86.160112
6.178651 -83.358817
7.197231 -85.905448
8.215811 -82.594828
9.234392 -85.141459
10.252972 -84.632133
11.271552 -84.632133
13.054067 -85.396122
13.563357 -84.122807
14.836582 -85.141459
15.855162 -83.868143
I tried polyval, polyfit, and Basic Fitting (plot -> Tools -> Basic Fitting):
datasheet=xlsread('08_08_2019-11_08_58_.xlsx','Worksheet');
x=datasheet(:,1);
y=datasheet(:,2);
plot(x,y)
What exactly do you mean by a trendline? A straight line p1*x+p2 can be fitted with the following code, which assumes that you have the Curve Fitting Toolbox installed. The polyfit() part further down does not need this toolbox.
If you check the docs for fit and fitoptions, you'll find that other models, such as quadratic polynomials, exponentials, or Fourier (sin/cos) series (and many more), are also available when using the toolbox. The x and y data you posted look sinusoidal to me, so I'd go with the second option below.
Linear fit p1*x+p2
data = xlsread('08_08_2019-11_08_58_.xlsx');
x=data(:,1);
y=data(:,2);
[xData, yData] = prepareCurveData(x, y);
% Set up fittype and options.
ft = fittype('poly1')
% Fit model to data.
[fitresult, gof] = fit(xData, yData, ft);
% Plot fit with data.
figure
h = plot(fitresult, xData, yData);
legend(h, 'data', 'linear 1st order polynomial', 'Location', 'NorthEast');
xlabel('x'); ylabel('y'); grid on
% Extract the equation(ft) and the coefficients
coeffnames(fitresult)
coeffvalues(fitresult)
Using nonlinear least squares to fit a Fourier model a0 + a1*cos(x*w) + b1*sin(x*w)
data = xlsread('08_08_2019-11_08_58_.xlsx');
x=data(:,1);
y=data(:,2);
[xData, yData] = prepareCurveData(x, y);
% Set up fittype and options.
ft = fittype('fourier1')
opts = fitoptions('Method', 'NonlinearLeastSquares');
% Fit model to data.
[fitresult, gof] = fit(xData, yData, ft, opts);
% Plot fit with data.
figure
h = plot(fitresult, xData, yData);
legend(h, 'data', 'Fourier model fit', 'Location', 'NorthEast');
xlabel('x'); ylabel('y'); grid on
% Extract the equation(ft) and the coefficients
coeffnames(fitresult)
coeffvalues(fitresult)
Using the polyfit() and polyval() functions for p1*x+p2:
% using polyfit and polyval (no toolbox needed)
data = xlsread('08_08_2019-11_08_58_.xlsx');
x=data(:,1);
y=data(:,2);
p = polyfit(x, y, 1)
pp = polyval(p, x);
h = plot(x, y, 'o', x, pp);
legend(h, 'data', 'linear 1st order polynomial', 'Location', 'NorthEast');
xlabel('x'); ylabel('y'); grid on
Without the Curve Fitting Toolbox, using the in-plot method you mentioned
data = xlsread('08_08_2019-11_08_58_.xlsx');
x=data(:,1);
y=data(:,2);
plot(x, y)
Then, in the figure window, choose Tools -> Basic Fitting -> linear/polynomial.
This also seems to work for your case.
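For readers working outside MATLAB, the straight-line fit p1*x+p2 that polyfit(x, y, 1) performs is an ordinary least-squares fit, which can be reproduced in a few lines. This is a minimal Python sketch, not a translation of the MATLAB internals; `fit_line` is a hypothetical helper name, and only the first five rows of the posted data are used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = p1*x + p2; returns (p1, p2)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p1 = sxy / sxx
    p2 = mean_y - p1 * mean_x
    return p1, p2


# First five rows of the data posted in the question.
x = [0.321816, 1.595041, 2.868266, 3.632201, 4.905426]
y = [-82.594828, -85.396122, -82.085502, -84.632133, -83.868143]

p1, p2 = fit_line(x, y)
trend = [p1 * xi + p2 for xi in x]  # plays the role of polyval(p, x)
print(p1, p2)
```

For data that looks sinusoidal, a straight line like this will mostly capture the mean level, which is why the Fourier model above is probably the better trendline.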
I'm trying to join two DataFrames, A and B. B has a distinct operation applied right before the join, and one of the columns in B is joined on two columns in A. This specific situation throws an IndexOutOfBoundsException. Has anyone run into this before?
Details below. Thanks in advance!
Environment:
spark-shell standalone mode
Spark version 2.3.1
Code:
val df1 = Seq((1, "one", "one"), (2, "two", "two")).toDF("key1", "val11", "val12")
val df2 = Seq(("one", "first"), ("one", "first"), ("two", "second")).toDF("key2", "val2")
val df3 = df2.distinct
val df4 = df1.join(df3, col("val11") === col("key2") and col("val12") === col("key2"))
df4.show(false)
Exception:
java.lang.IndexOutOfBoundsException: -1
at scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
at scala.collection.immutable.List.apply(List.scala:84)
at org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$reorder$1.apply(EnsureRequirements.scala:233)
at org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$reorder$1.apply(EnsureRequirements.scala:231)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.sql.execution.exchange.EnsureRequirements.reorder(EnsureRequirements.scala:231)
at org.apache.spark.sql.execution.exchange.EnsureRequirements.org$apache$spark$sql$execution$exchange$EnsureRequirements$$reorderJoinKeys(EnsureRequirements.scala:255)
at org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$org$apache$spark$sql$execution$exchange$EnsureRequirements$$reorderJoinPredicates$1.applyOrElse(EnsureRequirements.scala:277)
at org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$org$apache$spark$sql$execution$exchange$EnsureRequirements$$reorderJoinPredicates$1.applyOrElse(EnsureRequirements.scala:273)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at org.apache.spark.sql.execution.exchange.EnsureRequirements.org$apache$spark$sql$execution$exchange$EnsureRequirements$$reorderJoinPredicates(EnsureRequirements.scala:273)
at org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$apply$1.applyOrElse(EnsureRequirements.scala:302)
at org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$apply$1.applyOrElse(EnsureRequirements.scala:294)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.execution.exchange.EnsureRequirements.apply(EnsureRequirements.scala:294)
at org.apache.spark.sql.execution.exchange.EnsureRequirements.apply(EnsureRequirements.scala:37)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:87)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:87)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.execution.QueryExecution.prepareForExecution(QueryExecution.scala:87)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3249)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2484)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2698)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
at org.apache.spark.sql.Dataset.show(Dataset.scala:725)
at org.apache.spark.sql.Dataset.show(Dataset.scala:702)
... 49 elided
Update: Working solution. Thanks @1pluszara!
val df1 = Seq((1, "one", "one"), (2, "two", "two")).toDF("key1", "val11", "val12")
val df2 = Seq(("one", "first"), ("one", "first"), ("two", "second")).toDF("key2", "val2")
val df3 = spark.createDataFrame(df2.rdd.distinct, df2.schema)
val df4 = df1.join(df3, col("val11") === col("key2") and col("val12") === col("key2"))
df4.show(false)
Tried this:
import org.apache.spark.sql.Row  // needed for the pattern match below
val df3 = df2.rdd.distinct().map({
case Row(key2: String, val2: String) => (key2,val2)
}).toDF("key2","val2")
val df4 = df1.join(df3, col("val11") === col("key2") and col("val12") === col("key2"))
df4.show(false)
Output:
+----+-----+-----+----+------+
|key1|val11|val12|key2|val2 |
+----+-----+-----+----+------+
|2 |two |two |two |second|
|1 |one |one |one |first |
+----+-----+-----+----+------+
I'm not sure, though, how the DataFrame version is executed internally.
As part of a new PCI-DSS server deployment, I am configuring a fully auditable NTP time change history. Everything is working as expected; however, I am now seeing audit logs written every single second relating to time change operations. After a lot of searching I'm still no closer to understanding what is going on. The issue shows up in /var/log/messages, where an audit message is written continuously.
My research suggests that "exit=5" in the syscall record means that the clock was not properly synchronised:
adjtimex() syscall response "#define TIME_BAD 5 /* clock not synchronized */".
So, in summary, the clock appears to be synced correctly (as far as my understanding takes me), yet it is constantly being adjusted, which is unexpected with the polling interval set at the default 64s.
Is anyone able to offer suggestions? I've included as much detail as I can muster below:
Audit time rules:
[09:31] callum pci-fram-ipa1 ~ $ sudo cat /etc/audit/rules.d/audit_time_rules.rules
-a always,exit -F arch=b64 -S adjtimex -S settimeofday -k time-change
-a always,exit -F arch=b32 -S adjtimex -S settimeofday -S stime -k time-change
-a always,exit -F arch=b64 -S clock_settime -k time-change
-a always,exit -F arch=b32 -S clock_settime -k time-change
-w /etc/localtime -p wa -k time-change
System time vs clock time:
[09:14] callum pci-fram-ipa1 ~ $ sudo clock;date
Thu 05 Jan 2017 09:14:01 GMT -0.500708 seconds
Thu 5 Jan 09:14:01 GMT 2017
Example audit output:
[09:15] callum pci-fram-ipa1 ~ $ sudo tail -f /var/log/messages|grep time
Jan 5 09:15:25 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607725.390:2328215): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:26 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607726.390:2328216): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:27 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607727.390:2328217): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:28 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607728.390:2328218): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:29 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607729.390:2328219): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:30 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607730.390:2328220): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:31 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607731.390:2328221): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:32 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607732.390:2328222): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:33 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607733.390:2328223): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:34 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607734.390:2328224): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Sync stats:
[09:15] callum pci-fram-ipa1 ~ $ sudo ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
*neon.trippett.o 131.188.3.221 2 u 112 256 377 17.924 -0.704 0.252
+uno.alvm.me 193.79.237.14 2 u 196 256 377 19.737 0.505 0.436
+greenore.zeip.e 140.203.204.77 2 u 165 256 377 19.616 0.019 0.252
+devrandom.pl 87.124.126.49 3 u 124 256 377 19.675 0.371 0.572
Additional info:
[09:17] callum pci-fram-ipa1 ~ $ ntpdc -c sysinfo
system peer: neon.trippett.org
system peer mode: client
leap indicator: 00
stratum: 3
precision: -23
root distance: 0.03258 s
root dispersion: 0.04211 s
reference ID: [178.62.6.103]
reference time: dc188cec.d9ea15c5 Thu, Jan 5 2017 9:14:20.851
system flags: auth ntp stats
jitter: 0.000320 s
stability: 0.000 ppm
broadcastdelay: 0.000000 s
authdelay: 0.000000 s
This could well be expected behavior, given how often ntpd adjusts the clock.
From NTP documentation:
5.1.3.2. How frequently will the System Clock be updated?
As time should be a continuous and steady stream, ntpd updates the clock in small quantities. However, to keep up with clock errors, such corrections have to be applied frequently. If adjtime() is used, ntpd will update the system clock every second (I know this is not adjtimex, but adjtimex can function just like adjtime in ADJ_OFFSET_SINGLESHOT mode; see the adjtimex man page). If ntp_adjtime() is available, the operating system can compensate clock errors automatically, requiring only infrequent updates. See also Section 5.2 and Q: 5.1.6.1..
The polling interval has nothing to do with this, though. It only determines how often the upstream (lower stratum) time server is "queried" for reference.
If the problem is that you're seeing the audit entries and don't want them for the ntp user, and you only want to see nefarious time changes, then follow the advice from this link and exclude the ntp uid/auid.
Also, from the adjtimex man page, it seems that the TIME_BAD return value you see does not necessarily mean that the clock was never correctly slewed:
TIME_ERROR The system clock is not synchronized to a reliable
server. This value is returned when any of the following
holds true:
* Either STA_UNSYNC or STA_CLOCKERR is set.
* STA_PPSSIGNAL is clear and either STA_PPSFREQ or
STA_PPSTIME is set.
* STA_PPSTIME and STA_PPSJITTER are both set.
* STA_PPSFREQ is set and either STA_PPSWANDER or
STA_PPSJITTER is set.
The symbolic name TIME_BAD is a synonym for TIME_ERROR,
provided for backward compatibility.
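To check what the audit records are actually reporting, the syscall number and exit value can be decoded mechanically. The following is a minimal Python sketch, assuming the key=value layout of the records shown in the question; `parse_audit_fields` and `describe` are hypothetical helper names, and the record is an abbreviated copy of one of the logged lines. The state names are the adjtimex(2) return values from the man page.

```python
import re

# Return values of adjtimex(2) as listed in the man page.
ADJTIMEX_STATES = {
    0: "TIME_OK",
    1: "TIME_INS",
    2: "TIME_DEL",
    3: "TIME_OOP",
    4: "TIME_WAIT",
    5: "TIME_ERROR",  # TIME_BAD is the older synonym
}

ADJTIMEX_X86_64 = 159  # syscall number seen in the log above (arch=c000003e)


def parse_audit_fields(line):
    """Return the key=value fields of an audit record as a dict of strings."""
    return {k: v.strip('"') for k, v in re.findall(r'(\w+)=("[^"]*"|\S+)', line)}


def describe(line):
    """Summarise an adjtimex audit record, or return None for other syscalls."""
    fields = parse_audit_fields(line)
    if fields.get("syscall") != str(ADJTIMEX_X86_64):
        return None
    state = ADJTIMEX_STATES.get(int(fields["exit"]), "unknown")
    return "{} (uid {}) adjtimex -> {}".format(
        fields.get("comm"), fields.get("uid"), state)


# Abbreviated copy of one record from the question.
record = ('type=SYSCALL msg=audit(1483607725.390:2328215): arch=c000003e '
          'syscall=159 success=yes exit=5 ppid=1 pid=11479 uid=38 '
          'comm="ntpd" exe="/usr/sbin/ntpd" key="time-change"')
print(describe(record))
# prints: ntpd (uid 38) adjtimex -> TIME_ERROR
```

The uid it reports (38 here) is the one to exclude if the once-per-second ntpd entries are not wanted in the audit trail.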
I want to use the IDW interpolation technique on my data set. As usual, it consists of text files, named Output_00.text to Output_23.text, in one folder.
Each text file consists of three columns: latitude, longitude and temperature. The third column (temperature) contains the value -9999.000, which encodes a missing (NaN) value.
I want to interpolate these missing values in the temperature column using inverse distance weighting.
Here is what I am trying, but I have no idea how to use it:
function[Vint]=IDW(xc,yc,vc,x,y,e,r1,r2)
%%% INPUTS
%xc = stations x coordinates (columns) [vector]
%yc = stations y coordinates (rows) [vector]
%vc = variable values on the point [xc yc]
%x = interpolation points x coordinates [vector]
%y = interpolation points y coordinates [vector]
%e = distance weight
%r1 --- 'fr' = fixed radius ; 'ng' = neighbours
%r2 --- radius length if r1 == 'fr' / number of neighbours if r1 == 'ng'
%%% OUTPUTS
%Vint --- Matrix [length(y),length(x)] with interpolated variable values
%%% EXAMPLES
%%% --> V_spa=IDW(x1,y1,v1,x,y,-2,'ng',length(x1));
% Simone Fatichi -- simonef#dicea.unifi.it
% Copyright 2009
% $Date: 2009/06/19 $
% $Updated 2012/02/24 $
Vint=zeros(length(y),length(x));
xc=reshape(xc,1,length(xc));
yc=reshape(yc,1,length(yc));
vc=reshape(vc,1,length(vc));
if strcmp(r1,'fr')
    if (r2<=0)
        disp('Error: Radius must be positive')
        return
    end
    for i=1:length(x)
        for j=1:length(y)
            D=[]; V=[]; wV =[]; vcc=[];
            D= sqrt((x(i)-xc).^2 +(y(j)-yc).^2);
            if min(D)==0
                disp('Error: One or more stations have the coordinates of an interpolation point')
                return
            end
            vcc=vc(D<r2); D=D(D<r2);
            V = vcc.*(D.^e);
            wV = D.^e;
            if isempty(D)
                V=NaN;
            else
                V=sum(V)/sum(wV);
            end
            Vint(j,i)=V;
        end
    end
else
    if (r2 > length(vc)) || (r2<1)
        disp('Error: Number of neighbours not congruent with data')
        return
    end
    for i=1:length(x)
        for j=1:length(y)
            D=[]; V=[]; wV =[];vcc=[];
            D= sqrt((x(i)-xc).^2 +(y(j)-yc).^2);
            if min(D)==0
                disp('Error: One or more stations have the coordinates of an interpolation point')
                return
            end
            [D,I]=sort(D);
            vcc=vc(I);
            V = vcc(1:r2).*(D(1:r2).^e);
            wV = D(1:r2).^e;
            V=sum(V)/sum(wV);
            Vint(j,i)=V;
        end
    end
end
return
I want to modify this code so that it reads every text file in turn, fills in the NaN values using IDW interpolation, and saves the results as IDW_00.text to IDW_23.text in the same folder.
My data set looks like this.
21.500 60.500 295.867
21.500 61.500 295.828
21.500 62.500 295.828
21.500 63.500 295.867
21.500 64.500 296.102
21.500 65.500 296.234
21.500 66.500 296.352
21.500 67.500 296.336
21.500 68.500 296.305
21.500 69.500 298.281
21.500 70.500 301.828
21.500 71.500 302.094
21.500 72.500 299.469
21.500 73.500 301.711
21.500 74.500 -9999.000
21.500 75.500 -9999.000
21.500 76.500 -9999.000
21.500 77.500 -9999.000
21.500 78.500 -9999.000
22.500 60.500 295.477
22.500 61.500 295.484
22.500 62.500 295.516
22.500 63.500 295.547
22.500 64.500 295.852
22.500 65.500 295.859
22.500 66.500 295.852
22.500 67.500 295.711
22.500 68.500 295.969
22.500 69.500 298.562
22.500 70.500 300.828
22.500 71.500 302.352
22.500 72.500 300.570
22.500 73.500 301.383
22.500 74.500 -9999.000
22.500 75.500 -9999.000
22.500 76.500 -9999.000
22.500 77.500 -9999.000
22.500 78.500 -9999.000
23.500 60.500 294.906
23.500 61.500 294.898
23.500 62.500 295.000
23.500 63.500 295.078
23.500 64.500 295.297
23.500 65.500 295.359
23.500 66.500 295.297
23.500 67.500 295.312
23.500 68.500 296.664
23.500 69.500 298.781
23.500 70.500 299.211
23.500 71.500 300.109
23.500 72.500 301.000
23.500 73.500 301.594
23.500 74.500 302.000
23.500 75.500 -9999.000
23.500 76.500 -9999.000
23.500 77.500 -9999.000
23.500 78.500 -9999.000
24.500 60.500 294.578
24.500 61.500 294.516
24.500 62.500 294.734
24.500 63.500 294.789
24.500 64.500 294.844
24.500 65.500 294.562
24.500 66.500 294.734
24.500 67.500 296.367
24.500 68.500 297.438
24.500 69.500 298.531
24.500 70.500 298.453
24.500 71.500 299.195
24.500 72.500 300.062
24.500 73.500 -9999.000
24.500 74.500 -9999.000
24.500 75.500 -9999.000
24.500 76.500 -9999.000
24.500 77.500 -9999.000
24.500 78.500 -9999.000
25.500 60.500 296.258
25.500 61.500 296.391
25.500 62.500 296.672
25.500 63.500 296.398
25.500 64.500 295.773
25.500 65.500 295.812
25.500 66.500 296.609
25.500 67.500 297.977
25.500 68.500 297.109
25.500 69.500 297.828
25.500 70.500 298.430
25.500 71.500 298.836
25.500 72.500 298.703
25.500 73.500 -9999.000
25.500 74.500 -9999.000
25.500 75.500 -9999.000
25.500 76.500 -9999.000
25.500 77.500 -9999.000
25.500 78.500 299.023
26.500 60.500 -9999.000
26.500 61.500 298.266
26.500 62.500 296.773
26.500 63.500 -9999.000
26.500 64.500 -9999.000
26.500 65.500 -9999.000
26.500 66.500 297.250
26.500 67.500 296.188
26.500 68.500 295.938
26.500 69.500 296.906
26.500 70.500 297.828
26.500 71.500 299.312
26.500 72.500 299.359
26.500 73.500 -9999.000
26.500 74.500 -9999.000
26.500 75.500 -9999.000
26.500 76.500 -9999.000
26.500 77.500 298.875
26.500 78.500 296.773
27.500 60.500 -9999.000
27.500 61.500 -9999.000
27.500 62.500 -9999.000
27.500 63.500 -9999.000
27.500 64.500 -9999.000
27.500 65.500 -9999.000
27.500 66.500 -9999.000
27.500 67.500 295.352
27.500 68.500 295.148
27.500 69.500 295.750
27.500 70.500 295.750
27.500 71.500 296.070
27.500 72.500 295.227
27.500 73.500 -9999.000
27.500 74.500 -9999.000
27.500 75.500 -9999.000
27.500 76.500 -9999.000
27.500 77.500 -9999.000
27.500 78.500 296.609
28.500 60.500 -9999.000
28.500 61.500 -9999.000
28.500 62.500 -9999.000
28.500 63.500 -9999.000
28.500 64.500 -9999.000
28.500 65.500 -9999.000
28.500 66.500 -9999.000
28.500 67.500 295.773
28.500 68.500 295.375
28.500 69.500 295.438
28.500 70.500 294.664
28.500 71.500 294.906
28.500 72.500 294.812
28.500 73.500 295.805
28.500 74.500 -9999.000
28.500 75.500 -9999.000
28.500 76.500 -9999.000
28.500 77.500 -9999.000
28.500 78.500 -9999.000
29.500 60.500 -9999.000
29.500 61.500 -9999.000
29.500 62.500 -9999.000
29.500 63.500 -9999.000
29.500 64.500 -9999.000
29.500 65.500 -9999.000
29.500 66.500 -9999.000
29.500 67.500 295.719
29.500 68.500 296.797
29.500 69.500 293.375
29.500 70.500 294.305
29.500 71.500 294.070
29.500 72.500 293.750
29.500 73.500 295.539
29.500 74.500 -9999.000
29.500 75.500 -9999.000
29.500 76.500 -9999.000
29.500 77.500 -9999.000
29.500 78.500 -9999.000
30.500 60.500 -9999.000
30.500 61.500 -9999.000
30.500 62.500 -9999.000
30.500 63.500 -9999.000
30.500 64.500 -9999.000
30.500 65.500 -9999.000
30.500 66.500 -9999.000
30.500 67.500 -9999.000
30.500 68.500 -9999.000
30.500 69.500 -9999.000
30.500 70.500 293.320
30.500 71.500 292.930
30.500 72.500 293.570
30.500 73.500 294.648
30.500 74.500 295.383
30.500 75.500 -9999.000
30.500 76.500 -9999.000
30.500 77.500 -9999.000
30.500 78.500 -9999.000
31.500 60.500 -9999.000
31.500 61.500 -9999.000
31.500 62.500 -9999.000
31.500 63.500 -9999.000
31.500 64.500 -9999.000
31.500 65.500 -9999.000
31.500 66.500 -9999.000
31.500 67.500 -9999.000
31.500 68.500 -9999.000
31.500 69.500 -9999.000
31.500 70.500 293.992
31.500 71.500 293.422
31.500 72.500 294.438
31.500 73.500 294.141
31.500 74.500 -9999.000
31.500 75.500 -9999.000
31.500 76.500 -9999.000
31.500 77.500 -9999.000
31.500 78.500 -9999.000
32.500 60.500 -9999.000
32.500 61.500 -9999.000
32.500 62.500 -9999.000
32.500 63.500 -9999.000
32.500 64.500 -9999.000
32.500 65.500 -9999.000
32.500 66.500 -9999.000
32.500 67.500 -9999.000
32.500 68.500 -9999.000
32.500 69.500 -9999.000
32.500 70.500 -9999.000
32.500 71.500 294.312
32.500 72.500 294.812
32.500 73.500 -9999.000
32.500 74.500 -9999.000
32.500 75.500 -9999.000
32.500 76.500 -9999.000
32.500 77.500 -9999.000
32.500 78.500 -9999.000
33.500 60.500 -9999.000
33.500 61.500 -9999.000
33.500 62.500 -9999.000
33.500 63.500 -9999.000
33.500 64.500 -9999.000
33.500 65.500 -9999.000
33.500 66.500 -9999.000
33.500 67.500 -9999.000
33.500 68.500 -9999.000
33.500 69.500 -9999.000
33.500 70.500 -9999.000
33.500 71.500 -9999.000
33.500 72.500 -9999.000
33.500 73.500 -9999.000
33.500 74.500 -9999.000
33.500 75.500 -9999.000
33.500 76.500 -9999.000
33.500 77.500 -9999.000
33.500 78.500 -9999.000
34.500 60.500 -9999.000
34.500 61.500 -9999.000
34.500 62.500 -9999.000
34.500 63.500 -9999.000
34.500 64.500 -9999.000
34.500 65.500 -9999.000
34.500 66.500 -9999.000
34.500 67.500 -9999.000
34.500 68.500 -9999.000
34.500 69.500 -9999.000
34.500 70.500 -9999.000
34.500 71.500 -9999.000
34.500 72.500 -9999.000
34.500 73.500 -9999.000
34.500 74.500 -9999.000
34.500 75.500 -9999.000
34.500 76.500 -9999.000
34.500 77.500 -9999.000
34.500 78.500 -9999.000
35.500 60.500 -9999.000
35.500 61.500 -9999.000
35.500 62.500 -9999.000
35.500 63.500 -9999.000
35.500 64.500 -9999.000
35.500 65.500 -9999.000
35.500 66.500 -9999.000
35.500 67.500 -9999.000
35.500 68.500 -9999.000
35.500 69.500 -9999.000
35.500 70.500 -9999.000
35.500 71.500 -9999.000
35.500 72.500 -9999.000
35.500 73.500 -9999.000
35.500 74.500 -9999.000
35.500 75.500 -9999.000
35.500 76.500 -9999.000
35.500 77.500 -9999.000
35.500 78.500 -9999.000
36.500 60.500 276.742
36.500 61.500 274.406
36.500 62.500 -9999.000
36.500 63.500 -9999.000
36.500 64.500 -9999.000
36.500 65.500 272.219
36.500 66.500 273.023
36.500 67.500 275.875
36.500 68.500 -9999.000
36.500 69.500 -9999.000
36.500 70.500 -9999.000
36.500 71.500 -9999.000
36.500 72.500 -9999.000
36.500 73.500 -9999.000
36.500 74.500 -9999.000
36.500 75.500 -9999.000
36.500 76.500 -9999.000
36.500 77.500 -9999.000
36.500 78.500 -9999.000
37.500 60.500 277.406
37.500 61.500 277.547
37.500 62.500 276.375
37.500 63.500 275.484
37.500 64.500 276.820
37.500 65.500 275.312
37.500 66.500 274.875
37.500 67.500 275.875
37.500 68.500 -9999.000
37.500 69.500 -9999.000
37.500 70.500 -9999.000
37.500 71.500 -9999.000
37.500 72.500 -9999.000
37.500 73.500 -9999.000
37.500 74.500 -9999.000
37.500 75.500 -9999.000
37.500 76.500 -9999.000
37.500 77.500 -9999.000
37.500 78.500 -9999.000
Please help; many thanks for your kind assistance.
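As a rough illustration of the idea (not a drop-in replacement for the MATLAB function above), here is a minimal Python sketch of IDW applied directly to the three-column rows. Note that the posted IDW function returns a length(y)-by-length(x) grid, whereas the goal here is only to fill the rows marked -9999.000, so the sketch treats the rows with valid temperatures as the "stations" and the missing rows as the query points. The names idw_fill, power and neighbours are my own; power is the positive counterpart of the negative exponent e in the MATLAB code (e = -2 corresponds to power = 2), and neighbours plays the role of the 'ng' mode. Distances are plain Euclidean distances in degrees, as in the MATLAB code, and I assume no missing point shares its coordinates with a known one.

```python
import math

THRESHOLD = -9998.0  # anything below this is treated as the -9999.000 sentinel


def idw_fill(rows, power=2.0, neighbours=None):
    """Return a copy of `rows` with missing temperatures replaced by IDW estimates.

    rows       -- list of (lat, lon, temp) tuples
    power      -- positive exponent p; each known point gets weight 1 / distance**p
    neighbours -- if given, use only the k nearest known points; None uses all
    (Hypothetical helper written for this example.)
    """
    known = [(la, lo, t) for la, lo, t in rows if t >= THRESHOLD]
    if not known:
        raise ValueError("no known values to interpolate from")

    filled = []
    for la, lo, t in rows:
        if t >= THRESHOLD:
            filled.append((la, lo, t))  # valid value: keep unchanged
            continue
        # Distance from this missing point to every known point, nearest first.
        dist = sorted((math.hypot(la - kla, lo - klo), kt) for kla, klo, kt in known)
        if neighbours is not None:
            dist = dist[:neighbours]
        weights = [d ** -power for d, _ in dist]
        estimate = sum(w * kt for w, (_, kt) in zip(weights, dist)) / sum(weights)
        filled.append((la, lo, estimate))
    return filled


if __name__ == "__main__":
    # Three rows taken from the sample data above.
    rows = [(21.5, 72.5, 299.469), (21.5, 73.5, 301.711), (21.5, 74.5, -9999.0)]
    for row in idw_fill(rows):
        print("%.3f %.3f %.3f" % row)
```

With so many missing cells clustered together (for example latitudes 33.5 to 35.5), limiting neighbours to a small number keeps each estimate local; using all points pulls every gap toward the overall mean.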
I want to use scattered data interpolation to fill in missing (NaN) values in my temperature data. I have text files, each consisting of three columns:
The first column is latitude
The second column is longitude
The third column contains the temperature value
The temperature column uses the value -9999.000 to represent missing (NaN) data. I want to interpolate these values from the remaining known values, using F=scatteredInterpolant(x,y,v),
where x and y are the coordinates of the sample points and v is the corresponding value at each sample point. I would then evaluate F(q), where q is a query point, i.e. a location whose temperature in the third column is missing.
I have prepared this code, but I have no idea how to proceed:
metC = {'linear','cubic','next','pchip','previous','spline','v5cubic','nearest'};
% Read the file data:
S = dir('Output_*.txt');
N = sort({S.name});
nmf = numel(N);
nmr = size(load(N{1},'-ascii'),1);
mat = zeros(nmr,3,nmf);
for k = 1:nmf
mat(:,:,k) = load(N{k},'-ascii');
end
tmp = 0==diff(mat(:,1:2,:),1,3);
assert(all(tmp(:)),'First columns do not match')
% Rearrange:
[VC,NA,IC] = unique(mat(:,1,1));
[VR,NA,IR] = unique(mat(:,2,1));
out = reshape(mat(:,3,:),numel(VR),numel(VC),nmf);
% Detect -9999:
idx = out<(-9998);
From here on I have no idea whether I am going in the right direction or not:
% Interpolate:
vec = 1:nmf; % 24 times
for m = 1:numel(metC) % loop over the methods, let's say 3 times in this case
    metS = metC{m} % the method
    for r = 1:numel(VR) % 19 times
        for c = 1:numel(VC) % 17 times
            idy = squeeze(idx(r,c,:)).'; % remove singleton dimensions; idy marks the files where this cell is -9999
            if any(idy) % check for -9999
                Xold = vec(~idy); % x values of the sample points (files with valid data)
                Yold = squeeze(out(r,c,~idy)).'; % I DO NOT KNOW WHAT THIS LINE IS DOING
                Xnew = vec(idy); % these are the query points which we want to interpolate
                out(r,c,idy) = scatteredInterpolant(Xold,Yold,???????????,Xnew,metC,'extrap');
            end
        end
    end
end
My data set looks like:
21.500 60.500 295.867
21.500 61.500 295.828
21.500 62.500 295.828
21.500 63.500 295.867
21.500 64.500 296.102
21.500 65.500 296.234
21.500 66.500 296.352
21.500 67.500 296.336
21.500 68.500 296.305
21.500 69.500 298.281
21.500 70.500 301.828
21.500 71.500 302.094
21.500 72.500 299.469
21.500 73.500 301.711
21.500 74.500 -9999.000
21.500 75.500 -9999.000
21.500 76.500 -9999.000
21.500 77.500 -9999.000
21.500 78.500 -9999.000
22.500 60.500 295.477
22.500 61.500 295.484
22.500 62.500 295.516
22.500 63.500 295.547
22.500 64.500 295.852
22.500 65.500 295.859
22.500 66.500 295.852
22.500 67.500 295.711
22.500 68.500 295.969
22.500 69.500 298.562
22.500 70.500 300.828
22.500 71.500 302.352
22.500 72.500 300.570
22.500 73.500 301.383
22.500 74.500 -9999.000
22.500 75.500 -9999.000
22.500 76.500 -9999.000
22.500 77.500 -9999.000
22.500 78.500 -9999.000
23.500 60.500 294.906
23.500 61.500 294.898
23.500 62.500 295.000
23.500 63.500 295.078
23.500 64.500 295.297
23.500 65.500 295.359
23.500 66.500 295.297
23.500 67.500 295.312
23.500 68.500 296.664
23.500 69.500 298.781
23.500 70.500 299.211
23.500 71.500 300.109
23.500 72.500 301.000
23.500 73.500 301.594
23.500 74.500 302.000
23.500 75.500 -9999.000
23.500 76.500 -9999.000
23.500 77.500 -9999.000
23.500 78.500 -9999.000
24.500 60.500 294.578
24.500 61.500 294.516
24.500 62.500 294.734
24.500 63.500 294.789
24.500 64.500 294.844
24.500 65.500 294.562
24.500 66.500 294.734
24.500 67.500 296.367
24.500 68.500 297.438
24.500 69.500 298.531
24.500 70.500 298.453
24.500 71.500 299.195
24.500 72.500 300.062
24.500 73.500 -9999.000
24.500 74.500 -9999.000
24.500 75.500 -9999.000
24.500 76.500 -9999.000
24.500 77.500 -9999.000
24.500 78.500 -9999.000
25.500 60.500 296.258
25.500 61.500 296.391
25.500 62.500 296.672
25.500 63.500 296.398
25.500 64.500 295.773
25.500 65.500 295.812
25.500 66.500 296.609
25.500 67.500 297.977
25.500 68.500 297.109
25.500 69.500 297.828
25.500 70.500 298.430
25.500 71.500 298.836
25.500 72.500 298.703
25.500 73.500 -9999.000
25.500 74.500 -9999.000
25.500 75.500 -9999.000
25.500 76.500 -9999.000
25.500 77.500 -9999.000
25.500 78.500 299.023
26.500 60.500 -9999.000
26.500 61.500 298.266
26.500 62.500 296.773
26.500 63.500 -9999.000
26.500 64.500 -9999.000
26.500 65.500 -9999.000
26.500 66.500 297.250
26.500 67.500 296.188
26.500 68.500 295.938
26.500 69.500 296.906
26.500 70.500 297.828
26.500 71.500 299.312
26.500 72.500 299.359
26.500 73.500 -9999.000
26.500 74.500 -9999.000
26.500 75.500 -9999.000
26.500 76.500 -9999.000
26.500 77.500 298.875
26.500 78.500 296.773
27.500 60.500 -9999.000
27.500 61.500 -9999.000
27.500 62.500 -9999.000
27.500 63.500 -9999.000
27.500 64.500 -9999.000
27.500 65.500 -9999.000
27.500 66.500 -9999.000
27.500 67.500 295.352
27.500 68.500 295.148
27.500 69.500 295.750
27.500 70.500 295.750
27.500 71.500 296.070
27.500 72.500 295.227
27.500 73.500 -9999.000
27.500 74.500 -9999.000
27.500 75.500 -9999.000
27.500 76.500 -9999.000
27.500 77.500 -9999.000
27.500 78.500 296.609
28.500 60.500 -9999.000
28.500 61.500 -9999.000
28.500 62.500 -9999.000
28.500 63.500 -9999.000
28.500 64.500 -9999.000
28.500 65.500 -9999.000
28.500 66.500 -9999.000
28.500 67.500 295.773
28.500 68.500 295.375
28.500 69.500 295.438
28.500 70.500 294.664
28.500 71.500 294.906
28.500 72.500 294.812
28.500 73.500 295.805
28.500 74.500 -9999.000
28.500 75.500 -9999.000
28.500 76.500 -9999.000
28.500 77.500 -9999.000
28.500 78.500 -9999.000
29.500 60.500 -9999.000
29.500 61.500 -9999.000
29.500 62.500 -9999.000
29.500 63.500 -9999.000
29.500 64.500 -9999.000
29.500 65.500 -9999.000
29.500 66.500 -9999.000
29.500 67.500 295.719
29.500 68.500 296.797
29.500 69.500 293.375
29.500 70.500 294.305
29.500 71.500 294.070
29.500 72.500 293.750
29.500 73.500 295.539
29.500 74.500 -9999.000
29.500 75.500 -9999.000
29.500 76.500 -9999.000
29.500 77.500 -9999.000
29.500 78.500 -9999.000
30.500 60.500 -9999.000
30.500 61.500 -9999.000
30.500 62.500 -9999.000
30.500 63.500 -9999.000
30.500 64.500 -9999.000
30.500 65.500 -9999.000
30.500 66.500 -9999.000
30.500 67.500 -9999.000
30.500 68.500 -9999.000
30.500 69.500 -9999.000
30.500 70.500 293.320
30.500 71.500 292.930
30.500 72.500 293.570
30.500 73.500 294.648
30.500 74.500 295.383
30.500 75.500 -9999.000
30.500 76.500 -9999.000
30.500 77.500 -9999.000
30.500 78.500 -9999.000
31.500 60.500 -9999.000
31.500 61.500 -9999.000
31.500 62.500 -9999.000
31.500 63.500 -9999.000
31.500 64.500 -9999.000
31.500 65.500 -9999.000
31.500 66.500 -9999.000
31.500 67.500 -9999.000
31.500 68.500 -9999.000
31.500 69.500 -9999.000
31.500 70.500 293.992
31.500 71.500 293.422
31.500 72.500 294.438
31.500 73.500 294.141
31.500 74.500 -9999.000
31.500 75.500 -9999.000
31.500 76.500 -9999.000
31.500 77.500 -9999.000
31.500 78.500 -9999.000
32.500 60.500 -9999.000
32.500 61.500 -9999.000
32.500 62.500 -9999.000
32.500 63.500 -9999.000
32.500 64.500 -9999.000
32.500 65.500 -9999.000
32.500 66.500 -9999.000
32.500 67.500 -9999.000
32.500 68.500 -9999.000
32.500 69.500 -9999.000
32.500 70.500 -9999.000
32.500 71.500 294.312
32.500 72.500 294.812
32.500 73.500 -9999.000
32.500 74.500 -9999.000
32.500 75.500 -9999.000
32.500 76.500 -9999.000
32.500 77.500 -9999.000
32.500 78.500 -9999.000
33.500 60.500 -9999.000
33.500 61.500 -9999.000
33.500 62.500 -9999.000
33.500 63.500 -9999.000
33.500 64.500 -9999.000
33.500 65.500 -9999.000
33.500 66.500 -9999.000
33.500 67.500 -9999.000
33.500 68.500 -9999.000
33.500 69.500 -9999.000
33.500 70.500 -9999.000
33.500 71.500 -9999.000
33.500 72.500 -9999.000
33.500 73.500 -9999.000
33.500 74.500 -9999.000
33.500 75.500 -9999.000
33.500 76.500 -9999.000
33.500 77.500 -9999.000
33.500 78.500 -9999.000
34.500 60.500 -9999.000
34.500 61.500 -9999.000
34.500 62.500 -9999.000
34.500 63.500 -9999.000
34.500 64.500 -9999.000
34.500 65.500 -9999.000
34.500 66.500 -9999.000
34.500 67.500 -9999.000
34.500 68.500 -9999.000
34.500 69.500 -9999.000
34.500 70.500 -9999.000
34.500 71.500 -9999.000
34.500 72.500 -9999.000
34.500 73.500 -9999.000
34.500 74.500 -9999.000
34.500 75.500 -9999.000
34.500 76.500 -9999.000
34.500 77.500 -9999.000
34.500 78.500 -9999.000
35.500 60.500 -9999.000
35.500 61.500 -9999.000
35.500 62.500 -9999.000
35.500 63.500 -9999.000
35.500 64.500 -9999.000
35.500 65.500 -9999.000
35.500 66.500 -9999.000
35.500 67.500 -9999.000
35.500 68.500 -9999.000
35.500 69.500 -9999.000
35.500 70.500 -9999.000
35.500 71.500 -9999.000
35.500 72.500 -9999.000
35.500 73.500 -9999.000
35.500 74.500 -9999.000
35.500 75.500 -9999.000
35.500 76.500 -9999.000
35.500 77.500 -9999.000
35.500 78.500 -9999.000
36.500 60.500 276.742
36.500 61.500 274.406
36.500 62.500 -9999.000
36.500 63.500 -9999.000
36.500 64.500 -9999.000
36.500 65.500 272.219
36.500 66.500 273.023
36.500 67.500 275.875
36.500 68.500 -9999.000
36.500 69.500 -9999.000
36.500 70.500 -9999.000
36.500 71.500 -9999.000
36.500 72.500 -9999.000
36.500 73.500 -9999.000
36.500 74.500 -9999.000
36.500 75.500 -9999.000
36.500 76.500 -9999.000
36.500 77.500 -9999.000
36.500 78.500 -9999.000
37.500 60.500 277.406
37.500 61.500 277.547
37.500 62.500 276.375
37.500 63.500 275.484
37.500 64.500 276.820
37.500 65.500 275.312
37.500 66.500 274.875
37.500 67.500 275.875
37.500 68.500 -9999.000
37.500 69.500 -9999.000
37.500 70.500 -9999.000
37.500 71.500 -9999.000
37.500 72.500 -9999.000
37.500 73.500 -9999.000
37.500 74.500 -9999.000
37.500 75.500 -9999.000
37.500 76.500 -9999.000
37.500 77.500 -9999.000
37.500 78.500 -9999.000
Here is an example using scatteredInterpolant.
% Load the data
data = load('temperature_data.txt');
% separate the data columns, just to make the code clear
Lat = data(:,1); % Column 1 is Latitude
Lon = data(:,2); % Column 2 is Longitude
Tmp = data(:,3); % Column 3 is Temperature
% Find the "good" data points
good_temp = find(Tmp > -9999);
% create the interpolant object using the good data points
T = scatteredInterpolant(Lat(good_temp),Lon(good_temp),Tmp(good_temp),'linear');
% find the "bad" data points
bad_temp = find(Tmp == -9999);
% use the interpolation object to interpolate temperature values
interp_values = T(Lat(bad_temp),Lon(bad_temp));
% replace the bad values with the interpolated values
Tmp(bad_temp) = interp_values;
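The example above handles a single file. Since the question mentions a whole set of Output_*.txt files, the same read / fill / write steps can be wrapped in a loop. Here is a minimal Python sketch of that batch pattern, under a few assumptions of mine: the file pattern and the Filled_ output prefix are parameters I made up, and nearest_fill is only a stand-in interpolator (it copies the value of the nearest known point) so that the example stays self-contained. It is not equivalent to the 'linear' scatteredInterpolant call above; any function that takes and returns a list of (lat, lon, temp) rows can be passed as fill instead.

```python
from pathlib import Path

THRESHOLD = -9998.0  # anything below this is treated as the -9999.000 sentinel


def read_rows(path):
    """Read whitespace-separated 'lat lon temp' lines into a list of tuples."""
    rows = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            lat, lon, tmp = map(float, line.split())
            rows.append((lat, lon, tmp))
    return rows


def write_rows(path, rows):
    """Write rows back in the same three-decimal layout as the input files."""
    Path(path).write_text("".join("%.3f %.3f %.3f\n" % row for row in rows))


def nearest_fill(rows):
    """Stand-in interpolator: give each missing row the nearest known value."""
    known = [r for r in rows if r[2] >= THRESHOLD]
    out = []
    for la, lo, t in rows:
        if t < THRESHOLD:
            t = min(known, key=lambda k: (k[0] - la) ** 2 + (k[1] - lo) ** 2)[2]
        out.append((la, lo, t))
    return out


def process_folder(folder, fill=nearest_fill, pattern="Output_*.txt", prefix="Filled_"):
    """Fill every matching file and write e.g. Output_00.txt -> Filled_00.txt."""
    written = []
    for src in sorted(Path(folder).glob(pattern)):
        rows = fill(read_rows(src))
        dst = src.with_name(prefix + src.name.split("_", 1)[1])
        write_rows(dst, rows)
        written.append(dst)
    return written


if __name__ == "__main__":
    for path in process_folder("."):
        print("wrote", path)
```

Keeping the interpolation step as a separate function means the loop does not change if you later switch methods; in MATLAB the equivalent would be a for loop over dir('Output_*.txt') around the code in the example above.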