SciPy converting polygon to hull object - scipy

I have the following code that takes a series of points from a pandas dataframe (columns x, y) and a scipy.spatial.qhull.ConvexHull object. It returns a boolean mask that is True for every point inside the hull.
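import numpy as np

def in_hull(points, hull):
    # Each row of hull.equations is [normal_x, normal_y, offset] for one facet.
    hulleq = hull.equations
    # Signed distance of every point to every facet (negative means inside).
    dist = np.array(points[['x', 'y']]) @ hulleq[:, :2].T + hulleq[:, 2]
    return np.all(dist < 0, axis=1)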
I'm trying to do the same for a polygon (see below). I would like to reuse the in_hull() function rather than write a point-in-polygon function. Is there any way to convert a polygon into a SciPy hull object so that it can stand in for one?
POLYGON ((580994.4751 4275268.0318, 580994.8389 4275267.4381, 580994.8673 4275267.3736, 580994.3239 4275266.9655, 580993.7005 4275266.3116, 580993.5152 4275266.1903, 580991.8844 4275265.1638, 580991.7139 4275265.2946, 580991.5705 4275265.4891, 580990.4452 4275267.2628, 580990.1548 4275267.7447, 580990.0031 4275268.0023, 580989.8736 4275268.2297, 580990.164 4275268.4583, 580990.2375 4275268.5093, 580990.4965 4275268.6763, 580990.826 4275268.8845, 580990.8388 4275268.8923, 580991.3172 4275269.1658, 580991.9052 4275269.5398, 580992.3238 4275269.7897, 580992.3515 4275269.8057, 580992.4967 4275269.889, 580992.6127000001 4275269.9522, 580992.8403 4275270.0501, 580993.03 4275270.0662, 580993.3365 4275269.7995, 580993.6424 4275269.3079, 580994.0201 4275268.7661, 580994.4751 4275268.0318))
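One thing worth checking first: a ConvexHull can only stand in for a polygon that is itself convex, because the half-space test in in_hull() describes a convex region. The sketch below is only an illustration under that assumption. hull_if_convex and polygon_area are hypothetical helper names, and the two small test polygons are made up rather than taken from the WKT above. It builds a hull from the polygon vertices and keeps it only when the hull area equals the polygon area.

```python
import numpy as np
import pandas as pd
from scipy.spatial import ConvexHull

def in_hull(points, hull):
    # Same half-space test as in the question: a point is inside when its
    # signed distance to every facet is negative.
    hulleq = hull.equations
    dist = np.array(points[['x', 'y']]) @ hulleq[:, :2].T + hulleq[:, 2]
    return np.all(dist < 0, axis=1)

def polygon_area(vertices):
    # Shoelace formula; vertices is an (n, 2) array in ring order.
    x, y = vertices[:, 0], vertices[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def hull_if_convex(vertices, rel_tol=1e-9):
    # Hypothetical helper: return a ConvexHull only when it covers the same
    # area as the polygon, i.e. when the polygon is convex.
    hull = ConvexHull(vertices)
    # For 2-D input, ConvexHull.volume is the enclosed area.
    if abs(hull.volume - polygon_area(vertices)) <= rel_tol * hull.volume:
        return hull
    return None

# Made-up example polygons (not the one from the question).
square = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
l_shape = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0],
                    [1.0, 1.0], [1.0, 2.0], [0.0, 2.0]])

square_hull = hull_if_convex(square)      # a usable hull object
l_shape_hull = hull_if_convex(l_shape)    # None: the L-shape is not convex

pts = pd.DataFrame({'x': [1.0, 3.0], 'y': [1.0, 1.0]})
inside_square = in_hull(pts, square_hull)
```

If the helper returns None, the polygon is not convex and in_hull() would accept points that lie outside it, so a real point-in-polygon test would be needed in that case.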

Related

PySpark cosine similarity between two vectors of TF-IDF values using SparseMatrix + Koalas or pandas API on Spark

I am trying to implement the get_matches_df function from this name-matching cosine-similarity approach in PySpark and pandas-on-Spark (Koalas), and I am struggling to optimize it. I want to avoid calling toPandas() on the dataframes because that would overload the driver, so I need a version that scales. A batch approach like the one in this example would work perfectly; otherwise pandas UDFs, or simple UDFs that take 1 vector and 2 dataframes:
>>> psdf = ps.DataFrame({'a': [1,2,3], 'b':[4,5,6]})
>>> def pandas_plus(pdf):
...     return pdf[pdf.a > 1]  # allow arbitrary length
...
>>> psdf.pandas_on_spark.apply_batch(pandas_plus)
This is the function I am working on optimizing. I have already converted everything else (a custom TfidfVectorizer, cosine scaling and a PySpark sparse-matrix generator), so this part is all I have left. It uses loc and I am not sure how that works. I don't mind if it behaves like pandas, i.e. pulls the whole dataframe to the driver, but ideally will be
import numpy as np
import pandas as pd

def get_matches_df(sparse_matrix, name_vector, top=100):
    # Row and column indices of the non-zero similarity entries.
    non_zeros = sparse_matrix.nonzero()
    sparserows = non_zeros[0]
    sparsecols = non_zeros[1]

    # Limit the number of matches when `top` is given.
    if top:
        nr_matches = top
    else:
        nr_matches = sparsecols.size

    left_side = np.empty([nr_matches], dtype=object)
    right_side = np.empty([nr_matches], dtype=object)
    similairity = np.zeros(nr_matches)

    # Look up the two names and the similarity score for each match.
    for index in range(0, nr_matches):
        left_side[index] = name_vector[sparserows[index]]
        right_side[index] = name_vector[sparsecols[index]]
        similairity[index] = sparse_matrix.data[index]

    return pd.DataFrame({'left_side': left_side,
                         'right_side': right_side,
                         'similairity': similairity})
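As a possible first step before moving anything to Spark, the Python loop can be replaced by array indexing. The following is a minimal sketch, not the original author's method: get_matches_df_vectorized is a hypothetical name, and it assumes sparse_matrix is a SciPy sparse matrix. It converts to COO format so that row, column and value arrays stay aligned, and it caps top at the number of stored entries, which the loop version does not do.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

def get_matches_df_vectorized(sparse_matrix, name_vector, top=100):
    # COO format exposes aligned row, col and data arrays.
    coo = sparse_matrix.tocoo()
    # Assumption: never ask for more matches than there are stored entries.
    n = min(top, coo.nnz) if top else coo.nnz
    names = np.asarray(name_vector, dtype=object)
    return pd.DataFrame({'left_side': names[coo.row[:n]],
                         'right_side': names[coo.col[:n]],
                         'similairity': coo.data[:n]})

# Tiny made-up similarity matrix for two names.
example_matrix = csr_matrix(np.array([[1.0, 0.5], [0.5, 1.0]]))
example_df = get_matches_df_vectorized(example_matrix, ['a', 'b'], top=3)
```

The column name similairity is kept as spelled in the question so that downstream code relying on it keeps working.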

scala return matrix of average pixels

Here's the thing: I want to modify (and then return) a matrix of integers that is passed to the function as a parameter. The function average (of the class MatrixMotionBlur) gives the average of the pixel itself and its upper, lower and left neighbours. It follows this formula:
result(x, y) = (M1(x, y)+M1(x-1, y)+M1(x, y-1)+M1(x, y+1)) / 4
This is the code I've implemented so far:
MatrixMotionBlur - Average function
MotionBlurSingleThread - run
The objective here is to apply the "average" method to alter the matrix values and return that matrix. The problem is that the program gives me an error when I try to insert the value into the matrix.
Any ideas how to do this?
The functional way
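// Build a new matrix instead of mutating the old one.
// Assumes average(i, j) takes a row and a column index and handles border cells.
val updatedData = data.indices.map { i =>
  data(i).indices.map { j =>
    mx.average(i, j)
  }
}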
Note that Seq is an immutable collection type, so you can't modify it in place; you can only create a new, modified collection.
By the way, why do you iterate starting from 1 rather than 0? Are you sure that is what you want?
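To make the "build a new matrix" idea concrete, here is a rough sketch of the formula from the question, written in Python with NumPy rather than Scala. Several things are assumptions on my part: average_blur is a hypothetical name, x is taken to be the first index and y the second, cells whose neighbours would fall outside the matrix are copied unchanged, and the division is integer division because the matrix holds integers.

```python
import numpy as np

def average_blur(m):
    # result(x, y) = (M1(x, y) + M1(x-1, y) + M1(x, y-1) + M1(x, y+1)) / 4
    m = np.asarray(m)
    out = m.copy()                      # the input matrix is never modified
    # Valid cells: x >= 1 (needs x-1) and 1 <= y <= cols-2 (needs y-1, y+1).
    out[1:, 1:-1] = (m[1:, 1:-1]        # M1(x, y)
                     + m[:-1, 1:-1]     # M1(x-1, y)
                     + m[1:, :-2]       # M1(x, y-1)
                     + m[1:, 2:]) // 4  # M1(x, y+1)
    return out

pixels = np.array([[4, 8, 4],
                   [0, 4, 12],
                   [8, 0, 4]])
blurred = average_blur(pixels)
```

Because every output cell is computed from the original values, the result does not depend on the order in which cells are visited, which is the main benefit of not updating the matrix in place.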

MATLAB find row and column index of closest to specified value

I have latitude LAT, longitude LON and windspeed from a netCDF file.
I want to find the windspeed at a given LAT, LON coordinate (location of the met mast: 51.94341, 1.922094888). I am trying to find the values nearest to RefLAT and RefLON in the LAT and LON matrices respectively. Once I have the nearest values in LAT and LON, I will use their indices to look up the windspeed at this location.
When I use the code below, I expect a single value for each of the row and column indices RLAT, CLAT, RLON and CLON. Instead I get each of them as an array of 972 values. The LAT and LON matrices I am searching are 848 x 972.
LAT = ncread('wind_level2.nc','latitude');
LON = ncread('wind_level2.nc','longitude');
wind = ncread('wind_level2.nc','wind');
LAT = double(LAT);
LON = double(LON);
%Location of Met Mast 51.94341,1.922094888
RefLAT=51.94341;
LATcalc = abs(LAT - RefLAT);
[RLAT,CLAT]=find(min(LATcalc));
RefLON=1.922094888;
LONcalc = abs(LON - RefLON);
[RLON,CLON]=find(min(LONcalc));
Any help appreciated. Thanks
Data sample as requested:
LAT: 848x972 double:
51.6652641296387 51.6608505249023 51.6564369201660 51.6520233154297 51.6476097106934 51.6431961059570 51.6387825012207
51.6663322448731 51.6619186401367 51.6575050354004 51.6530914306641 51.6486778259277 51.6442642211914 51.6398506164551
51.6674041748047 51.6629867553711 51.6585731506348 51.6541595458984 51.6497459411621 51.6453323364258 51.6409187316895
51.6684722900391 51.6640548706055 51.6596412658691 51.6552276611328 51.6508140563965 51.6464004516602 51.6419868469238
51.6695404052734 51.6651229858398 51.6607093811035 51.6562957763672 51.6518821716309 51.6474685668945 51.6430549621582
51.6706047058106 51.6661911010742 51.6617774963379 51.6573638916016 51.6529502868652 51.6485366821289 51.6441192626953
51.6716728210449 51.6672592163086 51.6628456115723 51.6584320068359 51.6540145874023 51.6496009826660 51.6451873779297
51.6727409362793 51.6683235168457 51.6639099121094 51.6594963073731 51.6550827026367 51.6506690979004 51.6462554931641
51.6738052368164 51.6693916320801 51.6649780273438 51.6605644226074 51.6561470031738 51.6517333984375 51.6473197937012
51.6748695373535 51.6704559326172 51.6660423278809 51.6616287231445 51.6572151184082 51.6528015136719 51.6483840942383
51.6759376525879 51.6715240478516 51.6671066284180 51.6626930236816 51.6582794189453 51.6538658142090 51.6494522094727
LON 848x972 double:
3.04663085937500 3.04491543769836 3.04320049285889 3.04148554801941 3.03977084159851 3.03805613517761 3.03634166717529
3.03959774971008 3.03788304328918 3.03616857528687 3.03445434570313 3.03274011611938 3.03102612495422 3.02931237220764
3.03256440162659 3.03085041046143 3.02913665771484 3.02742290496826 3.02570939064026 3.02399611473084 3.02228283882141
3.02553081512451 3.02381753921509 3.02210426330566 3.02039122581482 3.01867818832397 3.01696562767029 3.01525306701660
3.01849699020386 3.01678419113159 3.01507163047791 3.01335930824280 3.01164698600769 3.00993490219116 3.00822281837463
3.01146292686462 3.00975084304810 3.00803875923157 3.00632715225220 3.00461530685425 3.00290393829346 3.00119256973267
3.00442862510681 3.00271701812744 3.00100588798523 2.99929451942444 2.99758362770081 2.99587273597717 2.99416208267212
2.99739408493042 2.99568319320679 2.99397253990173 2.99226188659668 2.99055147171021 2.98884129524231 2.98713135719299
2.99035930633545 2.98864889144897 2.98693895339966 2.98522901535034 2.98351931571960 2.98180961608887 2.98010015487671
2.98332428932190 2.98161458969116 2.97990512847900 2.97819590568543 2.97648668289185 2.97477769851685 2.97306895256043
As noted by @excaza, find is not being used appropriately here. min(LATcalc) returns a row vector holding the minimum of each column (972 values), and find without a comparison operator returns the indices of all non-zero elements of its argument, which here is every element. That is not what you want.
Try the following :
[RLAT,CLAT]=find(LATcalc<tol);
where tol is an error tolerance, for instance 0.00001 (avoid calling the variable eps, since eps is MATLAB's built-in floating-point relative accuracy).
You can't ask find for exact equality because the value you are looking for may not be present in the matrix. abs(LAT - RefLAT) gives you a matrix of absolute differences. It may not contain an exact zero either, so you look for the entries closest to zero using a tolerance small enough to catch only the minimum. If no entry falls within the tolerance, find returns empty results, so the tolerance has to match the grid spacing.
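An alternative that avoids picking a tolerance is to take the position of the smallest difference directly. The sketch below shows the idea in Python with NumPy rather than MATLAB, under some assumptions: nearest_grid_index is a hypothetical name, the tiny grid is made up, and "nearest" means the smallest combined squared difference in degrees, which is not a true geodesic distance.

```python
import numpy as np

def nearest_grid_index(lat, lon, ref_lat, ref_lon):
    # Combine both differences so one (row, col) pair is found for the
    # grid cell closest to the reference point.
    dist2 = (lat - ref_lat) ** 2 + (lon - ref_lon) ** 2
    # argmin works on the flattened array; unravel_index maps the flat
    # position back to a (row, col) pair.
    return np.unravel_index(np.argmin(dist2), dist2.shape)

# Made-up 2 x 2 grid standing in for the 848 x 972 LAT and LON matrices.
lat_grid = np.array([[51.0, 51.0], [52.0, 52.0]])
lon_grid = np.array([[1.0, 2.0], [1.0, 2.0]])
row, col = nearest_grid_index(lat_grid, lon_grid, 51.9, 1.9)
```

Searching LAT and LON together also avoids ending up with a row from one matrix and a column from the other, which can point at a cell that is close in latitude only or in longitude only.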

How to make sense of principal component analysis (PCA) in MATLAB

I have a data set of 3 different variables, each variable has 37 data points as follows:
Variable_1 = [0.489274770173646 0.534659090909091 0.496806966618287 0.593160935871933 0.542091836734694 0.514607775477341 0.580715497052410 0.542977656178750 0.624465240641712 0.644904791797447 0.444644611857190 0.464080100125156 0.522286821705426 0.507719139590466 0.612791008830612 0.561735261401557 0.524166666666667 0.526627218934911 0.449009900990099 0.472768878718535 0.488477561567263 0.576187425642902 0.558307692307692 0.609308792372882 0.647109905020352 0.513392857142857 0.454701120797011 0.557692307692308 0.511568509615385 0.440248676030394 0.500000000000000 0.593340146482712 0.518269230769230 0.623676307886835 0.563086974275214 0.609080188679245 0.769444444444444]
Variable_2 = [0.573717948717949 0.489656381486676 0.443821689259645 0.578812453392990 0.678328092243187 0.476432291666667 0.460748792270531 0.593650793650794 0.585645494152717 0.540435139573071 0.536423112870416 0.471528337362342 0.514469014469015 0.459801313718039 0.674409015942826 0.526881720430108 0.437327188940092 0.531890398342160 0.479985035540591 0.449145299145299 0.553381642512077 0.524932614555257 0.652630308880308 0.561587521131090 0.560003234675724 0.537254901960784 0.521990521327014 0.466041489059392 0.571461291800275 0.413770728190339 0.493939393939394 0.458024968229051 0.579528535980149 0.512145748987855 0.567205861018424 0.463562753036437 0.562938596491228]
Variable_3 = [0.630327868852459 0.521367521367521 0.467658730158730 0.485012755102041 0.523217247097844 0.449032738095238 0.574519230769231 0.594594594594595 0.544390243902439 0.581524147097918 0.487662337662338 0.497564726993079 0.417307692307692 0.609668109668110 0.508928571428572 0.511870845204179 0.444067796610169 0.562337662337663 0.494043887147335 0.530476190476191 0.484235294117647 0.502136752136752 0.632418524871355 0.528787878787879 0.619780219780220 0.416958041958042 0.552419354838710 0.586057692307692 0.461351186853317 0.495276653171390 0.524305555555555 0.655671296296296 0.496873496873497 0.462542087542088 0.660491689750693 0.772549019607843 0.558589870903674]
I put all three variables in a matrix, where the columns are the variables and the rows are the 37 data points.
I used the pca function in MATLAB and it gives me the following matrix:
PCA =
    0.6370    0.3070    0.7071
    0.3494    0.7026   -0.6199
    0.6871   -0.6420   -0.3403
First question: What does each row and each column represent in the PCA matrix?
Second question: How can I use this matrix to plot each variable along its principal component in 3 dimensions?
Thank you, I really appreciate any help.
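For what it is worth, in the coefficient matrix returned by MATLAB's pca each column holds the coefficients of one principal component and each row corresponds to one of the original variables. The sketch below reproduces that layout in Python with NumPy via an SVD. It is only an illustration: pca_coeff_and_scores is a hypothetical name, the data is random stand-in data rather than the three variables above, and the signs of the columns may differ from what MATLAB prints.

```python
import numpy as np

def pca_coeff_and_scores(data):
    # data: one row per observation, one column per variable.
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coeff = vt.T                # columns = components, rows = variables
    scores = centered @ coeff   # coordinates of each observation per component
    return coeff, scores

# Random stand-in for the 37 x 3 data matrix (not the real measurements).
rng = np.random.default_rng(0)
data = rng.random((37, 3))
coeff, scores = pca_coeff_and_scores(data)
```

The scores array has one row per data point and one column per component, so its three columns are the coordinates to use for a 3-D scatter plot along the principal axes.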

postgis convert Points to polygon

What is an easy way to convert points to a polygon?
I've tried this query:
SELECT ST_GeomFromText('POLYGON((157 -536.0,157 -537.0,157 -538.0,157 -539.0,157 -540.0,157 -541.0,157 -542.0,157 -543.0,157 -544.0,157 -545.0,158 -545.0,159 -545.0,160 -545.0,161 -545.0,162 -545.0,163 -545.0,164 -545.0,165 -545.0,165 -544.0,165 -543.0,165 -542.0,165 -541.0,165 -540.0,165 -539.0,165 -538.0,165 -537.0,165 -536.0,164 -536.0,163 -536.0,162 -536.0,161 -536.0,160 -536.0,159 -536.0,158 -536.0,157.0 -536.0))');
However, the result is not what I expected (shown below):
It is supposed to look like this:
Your points are evidently not in the correct order to define a polygon and, as the commenter pointed out, you have more than one polygon.
You could divide them into sets that each make up one polygon (manually?), and construct a multipolygon as follows:
SELECT ST_AsText(ST_Collect(ARRAY[ST_GeomFromText('POLYGON(..first polygon...)'), ST_GeomFromText('POLYGON(..2nd polygon...)'), ..., ST_GeomFromText('POLYGON(..last polygon...)')]));