How to find overlapping points in PostGIS - PostgreSQL

I have a point dataset in PostGIS with x and y coordinates. A few points overlap because they have the same x coordinate and y coordinate. How can I find the overlapping points with a query in PostgreSQL?

There are at least two ways to find duplicate records using aggregate functions. Assume a table my_table with geometry column geom and primary key gid:
First, using a HAVING clause, and collecting the primary keys with array_agg:
SELECT array_agg(gid), count(*)
FROM my_table
GROUP BY geom
HAVING count(gid) > 1;
Second, using a window function to count the rows in each partition:
WITH data AS (
SELECT gid, count(*) OVER (PARTITION BY geom)
FROM my_table
)
SELECT * FROM data WHERE count > 1;
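If the table only has plain x and y columns rather than a geometry column, the same GROUP BY / HAVING idea applies to the coordinate pair. Here is a minimal sketch of that, assuming hypothetical columns x and y, and using SQLite through Python's sqlite3 module as a stand-in for PostgreSQL so it runs without a server (group_concat plays the role of array_agg):

```python
import sqlite3


def find_overlapping_points(rows):
    """Return (x, y, [gids]) for every coordinate pair shared by more than one row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (gid INTEGER PRIMARY KEY, x REAL, y REAL)")
    conn.executemany("INSERT INTO my_table (gid, x, y) VALUES (?, ?, ?)", rows)
    cur = conn.execute(
        """
        SELECT x, y, group_concat(gid) AS gids
        FROM my_table
        GROUP BY x, y
        HAVING count(*) > 1
        ORDER BY x, y
        """
    )
    # group_concat returns a comma-separated string in no guaranteed order,
    # so split it and sort the ids.
    result = [(x, y, sorted(int(g) for g in gids.split(","))) for x, y, gids in cur]
    conn.close()
    return result


# Illustrative data: gids 1, 2 and 4 sit on the same coordinates.
points = [(1, 10.0, 20.0), (2, 10.0, 20.0), (3, 5.0, 5.0), (4, 10.0, 20.0)]
duplicates = find_overlapping_points(points)
```

Note that this compares coordinates for exact equality, just like GROUP BY geom does; points that are merely very close to each other are not reported.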

Related

Sum and average total columns in PostgreSQL

I'm using this query to find duplicate dates, but I'm not sure how to sum the values for each duplicate date, average them, and remove the duplicates.
DB Schema
date_time
datapoint_1
datapoint_2
SQL Query
SELECT date_time, COUNT(date_time)
FROM MYTABLE
GROUP BY date_time
HAVING COUNT(date_time) > 1
ORDER BY COUNT(date_time)
I would create a new table to replace the old one. That is easier and might even perform better:
CREATE TABLE mytable2 (LIKE mytable);
INSERT INTO mytable2 (date_time, datapoint_1, datapoint_2)
SELECT m.date_time, avg(m.datapoint_1), avg(m.datapoint_2)
FROM mytable AS m
GROUP BY m.date_time;
Then you can drop mytable and rename mytable2 to replace it.
To prevent new rows from creating duplicates, you could change the way you insert data:
-- to keep track of counts
ALTER TABLE mytable ADD numval integer DEFAULT 1;
-- to prevent duplicates
ALTER TABLE mytable ADD UNIQUE (date_time);
-- to insert new rows
INSERT INTO mytable (date_time, datapoint_1, datapoint_2)
VALUES ('2021-06-30', 42.0, -34.9)
ON CONFLICT (date_time)
DO UPDATE SET numval = mytable.numval + 1,
datapoint_1 = mytable.datapoint_1 + excluded.datapoint_1,
datapoint_2 = mytable.datapoint_2 + excluded.datapoint_2;
-- to select the averages
SELECT date_time,
datapoint_1 / numval AS datapoint_1,
datapoint_2 / numval AS datapoint_2
FROM mytable;
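To see the running-sum upsert in action, here is a small sketch. It uses SQLite through Python's sqlite3 module instead of PostgreSQL, which is an assumption made only so the example is self-contained; SQLite 3.24 or later accepts the same ON CONFLICT ... DO UPDATE form. The helper names record_measurement and averages are made up for this illustration.

```python
import sqlite3


def record_measurement(conn, date_time, datapoint_1, datapoint_2):
    """Insert a row, or fold it into the running sums if the date already exists."""
    conn.execute(
        """
        INSERT INTO mytable (date_time, datapoint_1, datapoint_2)
        VALUES (?, ?, ?)
        ON CONFLICT (date_time)
        DO UPDATE SET numval = numval + 1,
                      datapoint_1 = datapoint_1 + excluded.datapoint_1,
                      datapoint_2 = datapoint_2 + excluded.datapoint_2
        """,
        (date_time, datapoint_1, datapoint_2),
    )


def averages(conn):
    """Divide the stored sums by the stored counts to recover the averages."""
    return conn.execute(
        "SELECT date_time, datapoint_1 / numval, datapoint_2 / numval, numval "
        "FROM mytable ORDER BY date_time"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "date_time TEXT UNIQUE, datapoint_1 REAL, datapoint_2 REAL, "
    "numval INTEGER DEFAULT 1)"
)
# Two measurements share a date, the third is on its own.
record_measurement(conn, "2021-06-30", 42.0, -34.5)
record_measurement(conn, "2021-06-30", 40.0, -35.5)
record_measurement(conn, "2021-07-01", 10.0, 1.0)
rows = averages(conn)
```

The table stores sums rather than averages, so every new value can be folded in without re-reading the earlier ones; the division only happens at query time.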
When you use GROUP BY, you can also use aggregate functions to reduce multiple rows to a single one (COUNT, which you used, is one such function). In your case the query would be:
SELECT date_time, avg(datapoint_1), avg(datapoint_2)
FROM MYTABLE
GROUP BY date_time
For every distinct date_time you will get a single row with the average of datapoint_1 and datapoint_2.

How to update column with sequential data?

I have a table in Postgres:
create table foo(bar int);
I know I can insert sequential data like this
insert into public.foo (select i from generate_series(1, 10) as i);
Now I want to update rows
update public.foo set bar = sq.t from (select t from generate_series(100, 10000) as t) as sq;
but this sets the column to the same value in every row.
I know I need to use WHERE somehow, but how can I use it when neither side has a primary key to join on?
EDIT:
I will add more real-life detail. I have a complex table with around 20 columns and around 40k rows. I am interested in two columns here: pk (or id, an integer backed by id_seq) and created_date.
I populated this table by duplicating the initial 10 rows, so the created_date values repeat (like 123123123). I want to take a large range of dates from generate_series at a 1-minute interval and put them in the created_date column, so that the data there is sequential. Ideally I would also regenerate the ids starting from 1. How can I do it?
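You're looking for something like this, where N and start are placeholders for the row-count limit and the offset of the new sequence (rows whose number is N or higher are set to NULL, because the CASE has no ELSE branch):
UPDATE foo
SET bar = CASE WHEN new_value < N THEN new_value + start END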
FROM (
SELECT
ctid,
row_number() OVER () as new_value
FROM foo
) AS new_foo
WHERE foo.ctid = new_foo.ctid;
TABLE foo;
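For the created_date part of the question, the same pairing idea (number the rows, then derive each new value from that number) can also be done client side. The following is only a sketch under stated assumptions: it uses SQLite through Python's sqlite3 module, with SQLite's rowid standing in for PostgreSQL's ctid, and the function name assign_sequential_dates is made up for this example.

```python
import sqlite3
from datetime import datetime, timedelta


def assign_sequential_dates(conn, start, step):
    """Give every row of foo its own timestamp: start, start + step, ... in rowid order."""
    rowids = [r[0] for r in conn.execute("SELECT rowid FROM foo ORDER BY rowid")]
    # Pair the i-th row with the i-th value of the generated series.
    updates = [
        ((start + i * step).isoformat(sep=" "), rowid)
        for i, rowid in enumerate(rowids)
    ]
    conn.executemany("UPDATE foo SET created_date = ? WHERE rowid = ?", updates)
    return len(updates)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (created_date TEXT)")
# Four rows that all carry the same date, as after duplicating seed rows.
conn.executemany(
    "INSERT INTO foo (created_date) VALUES (?)",
    [("2020-01-01 00:00:00",)] * 4,
)
updated = assign_sequential_dates(conn, datetime(2021, 1, 1), timedelta(minutes=1))
dates = [r[0] for r in conn.execute("SELECT created_date FROM foo ORDER BY rowid")]
```

In PostgreSQL itself the arithmetic would more naturally stay inside the UPDATE, with the row number multiplied by an interval; the client-side loop here is just the easiest way to show the pairing.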

Does PostgreSQL have any column that maintains uniqueness between table rows, like Oracle's rowid and rownum?

In Oracle we can display rowid and rownum alongside the table columns. Is there anything similar in PostgreSQL that can be displayed alongside the table columns?
Example: select rowid, rownum, id_mytable from mytable;
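You can use the ctid system column, which holds the physical location of the row version. It is unique within a table at any given moment, but it changes when a row is updated or the table is rewritten (for example by VACUUM FULL), so it is not a stable long-term identifier: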
SELECT ctid, *
FROM some_table
LIMIT 10;
There is also the window function row_number() OVER (), but it does not return a row identifier; it only numbers the rows in the query result.

Populate column with query results in Postgresql

Table columns structured like:
longitude, latitude, gid, Hash
-78.885636, 36.854, 1, empty
Using PostgreSQL 9.4 and trying to update column Hash with results of a geohash function:
SELECT ST_GeoHash(ST_SetSRID(ST_MakePoint(longitude::float, latitude::float), 4326))
FROM my_table;
To update the column, I am using:
UPDATE my_table SET Hash = (SELECT ST_GeoHash(ST_SetSRID(ST_MakePoint(longitude::float, latitude::float), 4326))
FROM my_table);
But I get an error:
ERROR: more than one row returned by a subquery used as an expression.
I'm new to this so I may be asking a tedious question. Any help would be appreciated. For now I'll be RTFM-ing.
To update the Hash column with a calculated value, use this query, where gid is the primary key (you may need to change it to match your table):
UPDATE my_table
SET Hash = my_table_2.geo_hash
FROM
(SELECT gid,
ST_GeoHash(ST_SetSRID(ST_MakePoint(longitude::float, latitude::float), 4326)) as geo_hash
FROM my_table) as my_table_2
WHERE my_table.gid = my_table_2.gid
This worked for me:
UPDATE my_table SET "Hash" = (
SELECT ST_GeoHash(ST_SetSRID(ST_MakePoint(longitude::float, latitude::float), 4326))
FROM my_table p
WHERE my_table.gid = p.gid
);

PostgreSQL - return most common value for all columns in a table

I've got a table with a lot of columns in it and I want to run a query to find the most common value in each column.
Ordinarily for a single column, I'd run something like:
SELECT country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1
Does PostgreSQL have a built in function for doing this or can anyone suggest a query I could run to achieve this?
Using the same query, for more than one column you should do:
SELECT *
FROM
(
SELECT country
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) country
,(
SELECT city
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) city
This works for any type and returns all the values in the same row, with each column keeping its original name.
For more columns, just add more subqueries like this:
,(
SELECT someOtherColumn
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) someOtherColumn
Edit:
You could also achieve this with window functions. However, that would be no better in performance or readability.
Starting with PostgreSQL 9.4 there is an aggregate function for this:
mode() WITHIN GROUP (ORDER BY sort_expression)
returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)
And for earlier versions, you could create one...
CREATE OR REPLACE FUNCTION mode_array(anyarray)
RETURNS anyelement AS
$BODY$
SELECT a FROM unnest($1) a GROUP BY 1 ORDER BY COUNT(1) DESC, 1 LIMIT 1;
$BODY$
LANGUAGE SQL IMMUTABLE;
CREATE AGGREGATE mode(anyelement)(
SFUNC = array_append, --Function to call for each row. Just builds the array
STYPE = anyarray,
FINALFUNC = mode_array, --Function to call after everything has been added to array
INITCOND = '{}'--Initialize an empty array when starting
) ;
Usage: SELECT mode(column) FROM table;
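If neither mode() nor a custom aggregate is an option, the single-column query from the question can simply be run once per column from client code. A minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for PostgreSQL; the helper name most_common_per_column and the sample rows are made up for this illustration:

```python
import sqlite3


def most_common_per_column(conn, table, columns):
    """Run the GROUP BY / ORDER BY count(*) query once for each column."""
    result = {}
    for col in columns:
        # Identifiers cannot be bound as parameters, so they are interpolated:
        # only pass trusted table and column names here.
        row = conn.execute(
            f'SELECT "{col}" FROM "{table}" '
            f'GROUP BY "{col}" ORDER BY count(*) DESC, "{col}" LIMIT 1'
        ).fetchone()
        result[col] = row[0] if row else None
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (country TEXT, city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("BR", "Rio", 30),
        ("BR", "Sao Paulo", 25),
        ("US", "NYC", 25),
        ("BR", "Rio", 25),
    ],
)
modes = most_common_per_column(conn, "users", ["country", "city", "age"])
```

Ties are broken by the smallest value, which mirrors the ORDER BY COUNT(1) DESC, 1 in mode_array above. This scans the table once per column, so for very wide tables a single pass with mode() is likely to be cheaper.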
If I were doing this, I'd write a query like this one:
(SELECT 'country', country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1)
UNION ALL
(SELECT 'city', city
FROM users
GROUP BY city
ORDER BY count(*) DESC
LIMIT 1)
-- etc.
It should be noted this only works if all the columns are of compatible types. If they are not, you'll probably need a different solution.
This window function version reads the users table and the derived table once each, whereas the subquery version reads the users table once for each column. If there are many columns, as in the OP's case, my guess is that this is faster. Note that the age it returns is taken from a row of the most frequent country, so it is not necessarily the most frequent age overall.
select distinct on (country_count, age_count) *
from (
select
country,
count(*) over(partition by country) as country_count,
age,
count(*) over(partition by age) as age_count
from users
) s
order by country_count desc, age_count desc
limit 1
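Before relying on the window version, it is worth checking it against the per-column queries on a small data set. The sketch below does that, assuming SQLite (3.25 or later, for window functions) through Python's sqlite3 module as a stand-in for PostgreSQL; DISTINCT ON is left out because SQLite does not have it and, with LIMIT 1, it does not change which row comes first. The sample rows and the helper column_mode are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (country TEXT, age INTEGER)")
# Country A is the most frequent country, age 9 is the most frequent age,
# but no user from country A has age 9.
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("A", 1), ("A", 1), ("A", 2), ("B", 9), ("B", 9), ("C", 9)],
)

# The window version: one row, chosen by country_count first, then age_count.
window_row = conn.execute(
    """
    SELECT country, age
    FROM (
        SELECT country,
               count(*) OVER (PARTITION BY country) AS country_count,
               age,
               count(*) OVER (PARTITION BY age) AS age_count
        FROM users
    ) AS s
    ORDER BY country_count DESC, age_count DESC
    LIMIT 1
    """
).fetchone()


def column_mode(conn, column):
    """Most frequent value of one trusted column name, computed on its own."""
    return conn.execute(
        f'SELECT "{column}" FROM users GROUP BY "{column}" '
        f"ORDER BY count(*) DESC LIMIT 1"
    ).fetchone()[0]


per_column = (column_mode(conn, "country"), column_mode(conn, "age"))
```

On this data the two approaches disagree on the age: the window version can only return values that occur together in one row, while the per-column queries treat each column independently.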