PostgreSQL scoring distance with trigrams

I would like to understand why the distance between two words is so large when just two adjacent letters are swapped.
Example:
SELECT name, searchable, searchable <-> 'fluerie' AS dist FROM shd_appellations WHERE region_id=3 ORDER BY dist LIMIT 1;
Result:
fleurie => 0,6666666
So the distance between fluerie and fleurie is 0.666! I know each word is split into trigrams ("flu", "lue", ... "rie") before comparing, but I need a cutoff in my code and I set the maximum distance to 0.5, so in this case the match is rejected.
I could raise the maximum distance to 0.75, but I don't want to match unrelated terms in other cases.
An idea: would it make sense to allow a larger distance for short words, i.e. to make the cutoff depend on the length of the search term?
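
To see where 0.666 comes from, here is a minimal Python sketch of the score for a single word. It assumes the behaviour described in the pg_trgm documentation: the word is lowercased, padded with two spaces in front and one behind, split into a set of 3-character substrings, and the similarity is the number of shared trigrams divided by the number of distinct trigrams in both words. The function names are made up for illustration, and multi-word strings and non-alphanumeric characters are not handled.

```python
def trigrams(word):
    """Set of pg_trgm-style trigrams for a single word (illustrative sketch)."""
    # Assumed padding rule: two spaces before the word, one after.
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_distance(a, b):
    """1 - (shared trigrams / distinct trigrams in either word)."""
    ta, tb = trigrams(a), trigrams(b)
    similarity = len(ta & tb) / len(ta | tb)
    return 1.0 - similarity


if __name__ == "__main__":
    # Trigrams the two spellings still have in common.
    print(sorted(trigrams("fluerie") & trigrams("fleurie")))
    print(round(trigram_distance("fluerie", "fleurie"), 4))
```

Swapping two letters in the middle of a 7-letter word changes four of its eight trigrams, so only 4 of the 12 distinct trigrams are shared and the distance is 1 - 4/12, about 0.667. The shorter the word, the larger the share of trigrams a single typo destroys.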

Related

Postgres: How is rank aggregated in full text search

I'm trying to figure out how Postgres's full text search rank values are aggregated (assume no normalization). If I knew the rank that Postgres calculated for two halves of a document, could I figure out (ignoring phrases that cross the boundary) the rank for the full document?
Specifically, suppose I know that tsvec_x has rank x and tsvec_y has rank y relative to some tsquery Q. Is there a formula that (approximately) gives me the rank that tsvec_x || tsvec_y would have relative to Q in terms of x and y?
Is the rank additive? Do I take the max? Or is it something more complicated?

Efficient way to select all points inside radius

I'm working on a MEAN stack application, so I have a MongoDB collection that contains worldwide locations. The schema looks as follows.
[{
address : "example address",
langitude : 79.8816,
latitude : 6.773
},
{...}]
What is an efficient way to select all points inside a circle (a predefined center point and a predefined radius)?
This could also be done with an SQL query, but I want an efficient way that does not iterate over the points one by one and check whether each is inside the radius.

Distance d between two points with coordinates {lat1,lon1} and {lat2,lon2} is given by
d = 2*asin(sqrt((sin((lat1-lat2)/2))^2 +
cos(lat1)*cos(lat2)*(sin((lon1-lon2)/2))^2))
The formula above gives d in radians, provided the latitudes and longitudes are also expressed in radians. To get the distance in km, use the formula below.
distance_km ≈ radius_km * distance_radians ≈ 6371 * d
Here, 6371 is the approximate radius of the Earth in km. This is the minimum computation that you will have to do in your case. You can compute this distance for each record and select all the records whose distance is less than your radius value.
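
As a rough illustration, the formula above can be written as a small Python function. This is only a sketch: the function name and the choice to accept degrees (converting to radians internally) are assumptions, not part of the original answer.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate Earth radius, as used in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon1 - lon2)
    h = (math.sin((phi1 - phi2) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2)
    # min() guards against rounding pushing h slightly above 1.
    d = 2 * math.asin(math.sqrt(min(1.0, h)))
    return EARTH_RADIUS_KM * d


if __name__ == "__main__":
    # One degree of longitude along the equator.
    print(haversine_km(0.0, 0.0, 0.0, 1.0))
```

Selecting the points inside the circle then amounts to keeping every record for which `haversine_km(...)` is at most the chosen radius.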
Better approach
You can use Elasticsearch which supports geolocation queries for better performance.

How to define Triangular membership function for fuzzy controller design?

I am designing a fuzzy controller and for that, I have to define 3 triangular function sets. They are:
1 large
2 medium
3 small
But my problem is I have following data only:
Maximum input = 3 Minimum input= 0.1
Maximum output = 5.5 Minimum output= 0.8
How can I define the ranges of the 3 triangular sets based only on this information?

Here is the formula for a triangular membership function
f=0 if x<=a
f=(x-a)/(b-a) if a<=x<=b
f=(c-x)/(c-b) if b<=x<=c
f=0 if x>c
where a is the min, c is the max and b is the midpoint.
In your case, take the top situation where the max is 3 and the min is 0.1. The midpoint is (3+0.1)/2=1.55, so you have
f=0 if x<=0.1
f=(x-0.1)/(1.55-0.1) if 0.1<=x<=1.55
f=(3-x)/(3-1.55) if 1.55<=x<=3
f=0 if x>3
You should be able to work out the second case (the output range) from here, but if not let me know. Something worth pointing out is that the midpoint may not be the ideal b in your situation. Any point between a and c could serve as your b; just know that it is the point where the membership function equals 1.
It is difficult to tell, but it looks like you may have been given parameters for only two of the functions, perhaps small and large or medium and large. You may need to use some judgement for the third membership function.
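
The piecewise formula above could be sketched in Python as follows. The function name is an assumption, and the sketch requires a < b < c; it does not decide how to split the range into small, medium and large, which remains a design choice.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b.

    Assumes a < b < c.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge


if __name__ == "__main__":
    # Input range from the question: min 0.1, max 3, peak at the midpoint 1.55.
    for x in (0.1, 0.825, 1.55, 2.275, 3.0):
        print(x, triangular(x, 0.1, 1.55, 3.0))
```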

MATLAB-How can I randomly select smaller values with higher probabilities?

I have a column vector "distances", and I want to select a value randomly from this vector such that smaller values have a higher probability of being selected. So far I am using the following, where "possible_cells" is the randomly selected value:
w=(fliplr(1:numel(distances)))/100
possible_cells=randsample((sort(distances)),1,true,w)
Basically, I flipped the index vector 1:numel(distances) to create the selection weights "w" (if I am understanding randsample correctly), so that the smallest value gets the largest weight. To check how well this works, I randomly drew 50 values, and a histogram shows that the values are higher than I would expect. Does anyone have an idea of how else to do what I described above?

How about something like this?
Let's start with 10 sample distances no greater than 20, just to demonstrate:
d = randi(20,10,1);
Next, since we want smaller values to be more likely, let's take the reciprocal of those distances:
d_rec = 1./d;
Now, let's normalize so we can create a distribution from which to select our distance:
d_rec_norm = d_rec ./ sum(d_rec);
This new variable reflects the probability with which to select each given distance. Now comes a little trick... we choose the distance like this:
d_i = find(rand < cumsum(d_rec_norm),1);
This will give us the index of our chosen distance. The logic behind this is that when cumulatively summing the normalized values associated with each distance (d_rec_norm) we create "bins" whose widths are proportional to the likelihood of selecting each distance. All that is left is to pick a random number between 0 and 1 (rand) and see which "bin" it falls in.
I'm a new poster here, so let me know if this is unclear and I can try to improve my explanation.
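
For readers who do not use MATLAB, the same cumulative-sum trick can be sketched in Python. The function name and the optional `u` argument (a way to pass the uniform draw explicitly so the result is reproducible) are assumptions for illustration. The returned index is 0-based, and all distances are assumed to be positive.

```python
import bisect
import itertools
import random


def pick_index(distances, u=None):
    """Index of one distance, chosen with probability proportional to 1/distance."""
    if u is None:
        u = random.random()                      # uniform draw in [0, 1)
    weights = [1.0 / d for d in distances]       # smaller distance -> larger weight
    total = sum(weights)
    # Upper edges of the "bins"; the last one is 1 up to rounding.
    edges = list(itertools.accumulate(w / total for w in weights))
    # First bin whose upper edge exceeds u, like find(rand < cumsum(...), 1).
    i = bisect.bisect_right(edges, u)
    return min(i, len(distances) - 1)            # guard against rounding at the top edge


if __name__ == "__main__":
    d = [random.randint(1, 20) for _ in range(10)]
    print(d, d[pick_index(d)])
```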

Most efficient way to find points within a certain radius from a given point

I've read several questions and answers here on SO about this topic, but I can't work out which is the common way (if there is one) to find all the points within a "circle" of a certain radius, centered on a given point.
In particular I found two ways that seem the most convincing:
select id, point
from my_table
where st_Distance(point, st_PointFromText('POINT(-116.768347 33.911404)', 4326)) < 10000;
and:
select id, point
from my_table
where st_Within(point, st_Buffer(st_PointFromText('POINT(-116.768347 33.911404)', 4326), 10000));
Which is the most efficient way to query my database? Is there some other option to consider?

Creating a buffer to find the points is a definite no-no because (1) there is the overhead of creating the geometry that represents the buffer, and (2) the point-in-polygon calculation is much less efficient than a simple distance calculation.
You are obviously working with (longitude, latitude) data, so you should convert it to an appropriate Cartesian coordinate system that has the same unit of measure as your distance of 10,000. If that distance is in meters, you could instead cast the point from the table to geography and calculate directly on the (long, lat) coordinates. Since you only want to identify the points that are within the specified distance, you could use the ST_DWithin() function with calculation on the sphere for added speed (don't do this at very high latitudes or with very long distances):
SELECT id, point
FROM my_table
WHERE ST_DWithin(point::geography,
ST_GeogFromText('POINT(-116.768347 33.911404)'),
10000, false);

I have used the following query:
SELECT *,
ACOS(SIN(latitude) * SIN(Lat) + COS(latitude) * COS(Lat) * COS(longitude - Long)) * 6380 AS distance
FROM Table_tab
WHERE ACOS(SIN(latitude) * SIN(Lat) + COS(latitude) * COS(Lat) * COS(longitude - Long)) * 6380 < 10
In the query above, latitude and longitude are the columns from the database, and Lat and Long are the coordinates of the point we are searching from. The trigonometric functions expect radians, so wrap the values in RADIANS() if they are stored in degrees.
How it works: it calculates the distance (in km) from the search point to every point in the database and checks whether that distance is less than 10 km. It returns all the coordinates within 10 km.

I do not know how PostGIS does it best, but in general:
Depending on your data, it might be best to first search in a square bounding box (which contains the search circle) in order to eliminate a lot of candidates. This should be extremely fast because you can use simple range operators on lon/lat, which are ideally indexed for this.
In a second step, search using the radius.
Also, if the maximum number of points you need is relatively low and you know you have a lot of candidates, you can first make an 'optimistic' attempt with a box inside your circle; if that finds enough points, you are done!
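
The two-step idea can be sketched in Python as below. This is an illustration under stated assumptions, not how PostGIS works internally: the function names are hypothetical, coordinates are in degrees, the Earth is treated as a sphere of radius 6371 km, and the circle is assumed not to reach a pole or cross the 180° meridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate Earth radius (assumed spherical)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((p1 - p2) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon1 - lon2) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def bounding_box(lat, lon, radius_km):
    """(min_lat, max_lat, min_lon, max_lon) of a box that contains the circle."""
    delta = radius_km / EARTH_RADIUS_KM          # angular radius in radians
    dlat = math.degrees(delta)
    # Longitude half-width grows with latitude because meridians converge.
    dlon = math.degrees(math.asin(math.sin(delta) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def points_within(points, lat, lon, radius_km):
    """Points (lat, lon) within radius_km of the center, using a box prefilter."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    # Step 1: cheap range test; in a database this is what lat/lon indexes serve.
    candidates = [(la, lo) for la, lo in points
                  if min_lat <= la <= max_lat and min_lon <= lo <= max_lon]
    # Step 2: exact distance test on the surviving candidates only.
    return [(la, lo) for la, lo in candidates
            if haversine_km(lat, lon, la, lo) <= radius_km]
```

In a database, step 1 would be the indexed WHERE clause on the latitude and longitude columns, and step 2 the distance expression applied only to the rows that survive it.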
