Sphinx search with filtering by coordinates - sphinx

I have this query:
select id, post_category_name, title, description, WEIGHT(),
geodist(50.95, 24.69, latitude, longitude) dist
from serv1 where match('@(title,description) searchText') and dist < 2000000000000;
In my DB, the post has latitude 50.85 and longitude 24.69.
The result gives a distance of 893641, but the real distance is 11119.49 meters.
I also tried converting the input coordinates to radians, but the distance is still incorrect.
What am I doing wrong? Thank you in advance.

Try
geodist(50.95, 24.69, latitude, longitude, {in=deg}) dist
(note the {in=deg} option: by default, geodist expects its coordinates in radians)
It returns a number close to the one you're expecting:
mysql> select geodist(50.95, 24.69, 50.85, 24.69, {in=deg});
+-----------------------------------------------+
| geodist(50.95, 24.69, 50.85, 24.69, {in=deg}) |
+-----------------------------------------------+
|                                  11124.928711 |
+-----------------------------------------------+
1 row in set (0.00 sec)
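
To see why the units matter, here is a minimal Python sketch. It is not Sphinx's own implementation: it uses the haversine formula with an assumed mean Earth radius of 6,371,000 m, and the function name is hypothetical. Passing degree values to a function that expects radians inflates the result by more than an order of magnitude, while converting first gives roughly the expected 11.1 km.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius, not Sphinx's constant


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; all arguments must be in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Degrees passed straight in: the function silently treats them as radians.
wrong = haversine_m(50.95, 24.69, 50.85, 24.69)

# Degrees converted to radians first: the intended calculation.
right = haversine_m(*map(math.radians, (50.95, 24.69, 50.85, 24.69)))

print(f"degrees treated as radians: {wrong:.2f} m")
print(f"converted to radians first: {right:.2f} m")
```

The small gap between this result and the 11124.93 that Sphinx reports is presumably due to a different Earth model.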

Based on the code in one of your other questions, I would do something like
sql_query = select p.id, ... , \
RADIANS(l.Latitude) as latitude, RADIANS(l.Longitude) as longitude FROM ...
using the MySQL RADIANS() function to convert the stored degree values to radians for the attributes.
The
sql_attr_float = latitude
sql_attr_float = longitude
lines would be unchanged.

Related

Counting how many times each distinct value occurs in a column in PySparkSQL Join

I have used PySpark SQL to join together two tables, one containing crime location data with longitude and latitude and the other containing postcodes with their corresponding longitude and latitude.
What I am trying to work out is how to tally up how many crimes have occurred within each postcode. I am new to PySpark and my SQL is rusty so I am unsure where I am going wrong.
I have tried to use COUNT(DISTINCT) but that is simply giving me the total number of distinct postcodes.
mySchema = StructType([StructField("Longitude", StringType(),True), StructField("Latitude", StringType(),True)])
bgl_df = spark.createDataFrame(burglary_rdd, mySchema)
bgl_df.registerTempTable("bgl")
rdd2 = spark.sparkContext.textFile("posttrans.csv")
mySchema2 = StructType([StructField("Postcode", StringType(),True), StructField("Lon", StringType(),True), StructField("Lat", StringType(),True)])
pcode_df = spark.createDataFrame(pcode_rdd, mySchema2)
pcode_df.registerTempTable("pcode")
count = spark.sql("""SELECT COUNT(DISTINCT pcode.Postcode)
FROM pcode RIGHT JOIN bgl
ON (bgl.Longitude = pcode.Lon
AND bgl.Latitude = pcode.Lat)""")
+------------------------+
|count(DISTINCT Postcode)|
+------------------------+
| 523371|
+------------------------+
Instead I want something like:
+--------+---+
|Postcode|Num|
+--------+---+
|LN11 9DA| 2 |
|BN10 8JX| 5 |
| EN9 3YF| 9 |
|EN10 6SS| 1 |
+--------+---+
You can use a groupby count to count how many times each distinct value occurs in a column of the joined DataFrame:
group_df = df.groupby("Postcode").count()
This gives the output you want.
For an SQL query:
query = """
SELECT pcode.Postcode, COUNT(pcode.Postcode) AS Num
FROM pcode
RIGHT JOIN bgl
ON (bgl.Longitude = pcode.Lon AND bgl.Latitude = pcode.Lat)
GROUP BY pcode.Postcode
"""
count = spark.sql(query)
I have also reused your FROM and JOIN clauses so that the query can be copied and pasted as is.
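
The GROUP BY behaviour can be checked without a Spark cluster. The sketch below uses Python's built-in sqlite3 module as a stand-in for Spark SQL, which is an assumption made for illustration; the sample rows are made up. Because older SQLite versions lack RIGHT JOIN, it uses the equivalent bgl LEFT JOIN pcode.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pcode (Postcode TEXT, Lon TEXT, Lat TEXT);
    CREATE TABLE bgl (Longitude TEXT, Latitude TEXT);
    -- Made-up sample rows for illustration only.
    INSERT INTO pcode VALUES ('LN11 9DA', '0.10', '53.30'), ('BN10 8JX', '0.00', '50.80');
    INSERT INTO bgl VALUES ('0.10', '53.30'), ('0.10', '53.30'), ('0.00', '50.80'), ('9.99', '9.99');
""")

# bgl LEFT JOIN pcode keeps every crime row, like pcode RIGHT JOIN bgl.
rows = conn.execute("""
    SELECT pcode.Postcode, COUNT(pcode.Postcode) AS Num
    FROM bgl
    LEFT JOIN pcode
      ON bgl.Longitude = pcode.Lon AND bgl.Latitude = pcode.Lat
    GROUP BY pcode.Postcode
""").fetchall()

counts = dict(rows)
print(counts)
```

Crimes with no matching postcode end up in a NULL group whose count is 0, because COUNT(column) skips NULL values.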

Creating polygon geometry from a text field in the same table in PostGIS

I have a table like this
Table "public.zone_polygons"
Column | Type |
-----------+-------------------------+
id | integer |
zone_id | integer |
zone_name | text |
zone_path | text |
geom | geometry(Geometry,4326) |
Each zone_path holds a list of lat/long pairs as text in this format:
75.2323 30.7423,
75.3432 30.5344,
75.5423 30.2342,
75.9123 30.3122,
75.2323 30.7423
I am trying to generate a geometry from the zone_path values with the query below.
update zone_polygons set geom=ST_SetSRID(ST_MakePolygon(ST_GeomFromText('LINESTRING(zone_path)')), 4326);
I get the below error
ERROR: parse error - invalid geometry
HINT: "LINESTRING(zo" <-- parse error at position 13 within geometry
Is there a way in PostGIS to use one of the fields to create the geometry?
I believe you have a typo and the coordinates are in Long-Lat order (India), not Lat-Long (middle of the Barents Sea). PostGIS expects coordinates as Long-Lat, so if the input list is indeed in Lat-Long, it would need to be swapped. You can either fix the source or use ST_FlipCoordinates.
Since the coordinates are saved in a column, you need to concatenate the LINESTRING( prefix with the column content (not its name) using 'LINESTRING(' || zone_path || ')':
with src as (select '75.2323 30.7423, 75.3432 30.5344, 75.5423 30.2342, 75.9123 30.3122, 75.2323 30.7423' zone_path)
SELECT ST_ASTEXT(
ST_SetSRID(
ST_MakePolygon(
ST_GeomFromText('LINESTRING(' || zone_path || ')')), 4326))
FROM src;
--> POLYGON((75.2323 30.7423,75.3432 30.5344,75.5423 30.2342,75.9123 30.3122,75.2323 30.7423))
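
The same concatenation can be sketched client-side in Python, for example to check the stored text before running the UPDATE. The helper name zone_path_to_wkt is hypothetical; the closed-ring check mirrors the requirement that ST_MakePolygon needs a closed linestring.

```python
def zone_path_to_wkt(zone_path: str) -> str:
    """Turn 'x y, x y, ...' text into a LINESTRING WKT string."""
    # Split on commas and drop blank pieces left by newlines or trailing commas.
    points = [" ".join(p.split()) for p in zone_path.split(",") if p.strip()]
    if len(points) < 4 or points[0] != points[-1]:
        raise ValueError("a polygon ring needs at least 4 points and must be closed")
    return "LINESTRING(" + ", ".join(points) + ")"


zone_path = """75.2323 30.7423,
75.3432 30.5344,
75.5423 30.2342,
75.9123 30.3122,
75.2323 30.7423"""

print(zone_path_to_wkt(zone_path))
```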

How to get lat long from HEXEWKB PostGis?

I'm running this query and want to get the points back from this format. Is this possible? How can I do it?
UPDATE geo2 SET geometry = ST_AsHEXEWKB(ST_GeomFromText('POLYGON((-15.66486 27.91996,-15.60610 27.91820, -15.60359 27.97169, -15.66586 27.97144,-15.66486 27.91996))',4326)) where options->>'koatuu' = '0110392101' ;
Yes, it's possible. You just need to cast it to the geometry type:
SELECT ST_AsHEXEWKB(ST_GeomFromText('POLYGON((-15.66486 27.91996,-15.60610 27.91820, -15.60359 27.97169, -15.66586 27.97144,-15.66486 27.91996))',4326))::geometry FROM geo2 WHERE ....
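Once the value is a geometry, you can read the coordinates with ST_X and ST_Y. These functions accept only points, so for a polygon first expand it into its vertices, for example with ST_DumpPoints:
SELECT ST_X(dp.geom) AS long, ST_Y(dp.geom) AS lat FROM geo2, LATERAL ST_DumpPoints(your_column::geometry) AS dp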
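
If the hex string has to be decoded outside the database, the EWKB layout can be read with Python's struct module. This is a minimal sketch under stated assumptions: it handles only 2D points and polygons, and the function name decode_hexewkb is hypothetical. In the result, x is the longitude and y the latitude.

```python
import struct

SRID_FLAG = 0x20000000  # EWKB type flag: an SRID follows the type word
ZM_FLAGS = 0xC0000000   # EWKB type flags for Z and M dimensions


def decode_hexewkb(hex_str):
    """Decode a 2D Point or Polygon from HEXEWKB into (srid, coordinates)."""
    data = bytes.fromhex(hex_str)
    order = "<" if data[0] == 1 else ">"  # 1 = little-endian, 0 = big-endian
    (type_word,) = struct.unpack_from(order + "I", data, 1)
    if type_word & ZM_FLAGS:
        raise ValueError("Z/M coordinates are not handled in this sketch")
    offset = 5
    srid = None
    if type_word & SRID_FLAG:
        (srid,) = struct.unpack_from(order + "I", data, offset)
        offset += 4
    geom_type = type_word & 0xFF
    if geom_type == 1:  # Point: one (x, y) pair
        return srid, struct.unpack_from(order + "2d", data, offset)
    if geom_type == 3:  # Polygon: ring count, then a point count per ring
        (n_rings,) = struct.unpack_from(order + "I", data, offset)
        offset += 4
        rings = []
        for _ in range(n_rings):
            (n_points,) = struct.unpack_from(order + "I", data, offset)
            offset += 4
            ring = [struct.unpack_from(order + "2d", data, offset + 16 * i)
                    for i in range(n_points)]
            offset += 16 * n_points
            rings.append(ring)
        return srid, rings
    raise ValueError(f"unsupported geometry type {geom_type}")


# HEXEWKB for SRID=4326;POINT(1 2)
srid, (lon, lat) = decode_hexewkb("0101000020E6100000000000000000F03F0000000000000040")
print(srid, lon, lat)
```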

Postgresql earthdistance - earth_box with radius

Please, can you explain this behaviour of the earth_box function, or what I'm doing wrong?
data used
40.749276, -73.985643 = Empire State Building - is in my table
40.689266, -74.044512 = Statue of Liberty - is my current position in the select - 8324 m from the Empire State Building
my table
=> select id, latitude, longitude, title from requests;
id | latitude | longitude | title
----+-----------+------------+-----------------------
1 | 40.749276 | -73.985643 | Empire State Building
distance from Empire State Building to Statue of Liberty
=> SELECT id, latitude, longitude, title, earth_distance(ll_to_earth(40.689266, -74.044512), ll_to_earth(latitude, longitude)) as distance_from_current_location FROM requests ORDER BY distance_from_current_location ASC;
id | latitude | longitude | title | distance_from_current_location
----+-----------+------------+-----------------------+--------------------------------
1 | 40.749276 | -73.985643 | Empire State Building | 8324.42998846164
My current position is the Statue of Liberty, which is more than 8000 m from the Empire State Building, but
the select returns the row with id 1 even when the radius is only 5558 m! Can you explain this behaviour, or what is wrong?
=> SELECT id,latitude,longitude,title FROM requests WHERE earth_box(ll_to_earth(40.689266, -74.044512), 5558) @> ll_to_earth(requests.latitude, requests.longitude);
id | latitude | longitude | title
----+-----------+------------+-----------------------
1 | 40.749276 | -73.985643 | Empire State Building
versions of extensions and postgresql
=> \dx
List of installed extensions
Name | Version | Schema | Description
---------------+---------+------------+--------------------------------------------------------------
cube | 1.0 | public | data type for multidimensional cubes
earthdistance | 1.0 | public | calculate great-circle distances on the surface of the Earth
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
=> select version();
version
--------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.4beta2 on x86_64-apple-darwin13.3.0, compiled by Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn), 64-bit
thank you
noe
The problem here is that earth_box gets Statute Miles.
8324.42998846164 meters are near 5.172560986623845 statute miles
Unit Converter
The solution: convert the radius into Statute Miles units
earth_box(ll_to_earth(40.689266, -74.044512), 5558/1.609) //doesn't return results
earth_box(ll_to_earth(40.689266, -74.044512), 9000/1.609) //does.
As per the docs, the radius is expressed in meters by default.
I just tested this, and as the documentation indicates, you need to use both earth_box and earth_distance together in your WHERE clause to get correct results.
The documentation for the earth_box function also says:
Some points in this box are further than the specified great circle
distance from the location, so a second check using earth_distance
should be included in the query.
So the following will return results
SELECT earth_distance(ll_to_earth(40.749276, -73.985643), ll_to_earth(40.689266,-74.044512)) distance
FROM (SELECT 1) test
WHERE
(earth_box(ll_to_earth(40.749276, -73.985643), 9000) @> ll_to_earth(40.689266,-74.044512))
AND earth_distance(ll_to_earth(40.749276, -73.985643), ll_to_earth(40.689266,-74.044512)) <= 9000
order by distance desc
but this won't, as the actual distance is about 8324.429988461638 meters:
SELECT earth_distance(ll_to_earth(40.749276, -73.985643), ll_to_earth(40.689266,-74.044512)) distance
FROM (SELECT 1) test
WHERE
(earth_box(ll_to_earth(40.749276, -73.985643), 6000) @> ll_to_earth(40.689266,-74.044512))
AND earth_distance(ll_to_earth(40.749276, -73.985643), ll_to_earth(40.689266,-74.044512)) <= 6000
order by distance desc
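
Why the box alone is not enough can be sketched in Python. The functions below re-implement the earthdistance formulas as I understand them, which is an assumption rather than the extension's actual source: a sphere of radius 6378168 m, points stored as 3D coordinates, and earth_box as an axis-aligned cube whose half side is the chord length for the given radius. With a 6000 m radius, the Empire State Building falls inside the cube around the Statue of Liberty even though it is more than 6000 m away.

```python
import math

EARTH_RADIUS_M = 6378168.0  # assumed to match what earthdistance's earth() returns


def ll_to_earth(lat_deg, lon_deg):
    """Latitude/longitude in degrees -> 3D point on the sphere (meters)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        EARTH_RADIUS_M * math.cos(lat) * math.cos(lon),
        EARTH_RADIUS_M * math.cos(lat) * math.sin(lon),
        EARTH_RADIUS_M * math.sin(lat),
    )


def gc_to_sec(dist):
    """Great-circle (surface) distance -> straight-line chord length."""
    return 2 * EARTH_RADIUS_M * math.sin(dist / (2 * EARTH_RADIUS_M))


def earth_distance(p, q):
    """Great-circle distance between two 3D points on the sphere."""
    chord = math.dist(p, q)
    return 2 * EARTH_RADIUS_M * math.asin(chord / (2 * EARTH_RADIUS_M))


def in_earth_box(center, radius, point):
    """True if point lies inside the axis-aligned cube around center."""
    half_side = gc_to_sec(radius)
    return all(abs(c - x) <= half_side for c, x in zip(center, point))


liberty = ll_to_earth(40.689266, -74.044512)
empire = ll_to_earth(40.749276, -73.985643)

distance = earth_distance(liberty, empire)
print(f"distance: {distance:.2f} m")
print("inside 6000 m box:", in_earth_box(liberty, 6000, empire))
print("inside 6000 m radius:", distance <= 6000)
```

The cube check is cheap and index-friendly, but its corners reach farther than the radius, which is why the exact earth_distance comparison is still needed as a second filter.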

How to write a conditional SELECT query in TSQL using arguments in the WHERE/AND clause?

I've got a stored procedure that returns postal codes within a specified radius. The arguments are
ALTER PROCEDURE [dbo].[proximitySearch]
@proximity int = 0,
@country varchar (2) = NULL,
@city varchar (180) = NULL,
@state varchar (100) = NULL,
@stateAbr varchar (2) = NULL,
@postalCode varchar(20) = NULL
AS...
In the proc, the first query needs to select a single record (or no record) that matches whatever was passed in and assign the lat/long to local variables, as I started to write below:
SELECT TOP 1 @Longitude = Longitude, @Latitude = Latitude
FROM PostalCodes
WHERE ...
This is where I get stumped... the WHERE clause needs to be conditional based on what was passed in. Some arguments (or all of them) can be NULL, and I don't want use them in the query if they are.
I was thinking along the lines of:
SELECT TOP 1 @Longitude = Longitude, @Latitude = Latitude
FROM PostalCodes
WHERE Longitude IS NOT NULL
AND CASE WHEN @postalCode IS NOT NULL THEN PostalCode = @postalCode ELSE 1 END
...but this doesn't work. How is something like this typically done? (I'm definitely not a seasoned TSQL guy!!!) Thanks in advance!
There is more than one way of implementing that kind of logic:
1)
SELECT TOP 1 @Longitude = Longitude, @Latitude = Latitude
FROM PostalCodes
WHERE Longitude IS NOT NULL
AND (@postalCode IS NULL OR PostalCode = @postalCode)
2)
SELECT TOP 1 @Longitude = Longitude, @Latitude = Latitude
FROM PostalCodes
WHERE Longitude IS NOT NULL
AND PostalCode = COALESCE(@postalCode, PostalCode)
SELECT TOP 1 @Longitude = Longitude, @Latitude = Latitude
FROM PostalCodes
WHERE
((@postalCode IS NULL) OR (PostalCode = @postalCode))
AND ((@someotherparam IS NULL) OR (someothercolumn = @someotherparam)) etc...
But be aware that this technique can suffer from 'parameter sniffing'.
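
The optional-filter pattern above can be tried out with Python's built-in sqlite3 module. This is only a sketch: SQLite stands in for SQL Server (so LIMIT 1 replaces TOP 1), the table contents are made up, and the function name find_location is hypothetical. Each filter is skipped when its parameter is NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PostalCodes (PostalCode TEXT, City TEXT, Latitude REAL, Longitude REAL);
    -- Made-up sample rows for illustration only.
    INSERT INTO PostalCodes VALUES
        ('10001', 'New York', 40.75, -73.99),
        ('60601', 'Chicago', 41.88, -87.62);
""")


def find_location(postal_code=None, city=None):
    """Return (Longitude, Latitude) of the first match; None arguments are ignored."""
    return conn.execute(
        """
        SELECT Longitude, Latitude
        FROM PostalCodes
        WHERE Longitude IS NOT NULL
          AND (:postal_code IS NULL OR PostalCode = :postal_code)
          AND (:city IS NULL OR City = :city)
        ORDER BY PostalCode
        LIMIT 1
        """,
        {"postal_code": postal_code, "city": city},
    ).fetchone()


print(find_location(postal_code="60601"))
print(find_location(city="New York"))
print(find_location())
print(find_location(postal_code="10001", city="Chicago"))
```

The ORDER BY is there only to make "first match" deterministic in the sketch; the original queries leave the choice of row to the database.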