Why does ST_MakeValid() strip SRID from already-defined geometries? - postgresql

I'm attempting to use the PostGIS ST_MakeValid() function on a series of mostly-concentric isodistance multipolygons . . .
. . . which are defined with clear geometry type and SRID (and while they may not be perfectly valid, are still valid enough to render in QGIS as shown above):
trade=# \d tmp1
Table "public.tmp1"
Column | Type | Collation | Nullable | Default
-----------+-----------------------------+-----------+----------+---------
the_geom | geometry(MultiPolygon,4326) | | |
Unfortunately, the ST_MakeValid() function is stripping both SRID and geometry type when I use it to create a new table:
trade=# CREATE TABLE tmp2 AS (SELECT ST_MakeValid(the_geom) AS the_geom_valid FROM tmp1);
SELECT 25
trade=# \d tmp2
Table "public.tmp2"
Column | Type | Collation | Nullable | Default
----------------+----------+-----------+----------+---------
the_geom_valid | geometry | | |
. . . and ST_SetSRID() cannot resolve it, either by creating a new table:
trade=# CREATE TABLE tmp3 AS (SELECT ST_SetSRID(the_geom_valid,4326) AS the_geom_srid FROM tmp2);
SELECT 25
trade=# \d tmp3
Table "public.tmp3"
Column | Type | Collation | Nullable | Default
---------------+----------+-----------+----------+---------
the_geom_srid | geometry | | |
. . . or by nesting functions:
trade=# CREATE TABLE tmp4 AS (SELECT ST_SetSRID(ST_MakeValid(the_geom),4326) AS the_geom_all FROM tmp1);
SELECT 25
trade=# \d tmp4
Table "public.tmp4"
Column | Type | Collation | Nullable | Default
--------------+----------+-----------+----------+---------
the_geom_all | geometry | | |
. . . or even by using everyone's favorite ST_MakeValid() semi-substitute, ST_Buffer():
trade=# CREATE TABLE tmp5 AS (SELECT ST_Buffer(the_geom,0) AS the_geom_buffer FROM tmp1);
SELECT 25
trade=# \d tmp5;
Table "public.tmp5"
Column | Type | Collation | Nullable | Default
-----------------+----------+-----------+----------+---------
the_geom_buffer | geometry | | |
I can't find any documentation that suggests this is expected behavior when using ST_MakeValid() - how can I create valid geometries without losing geom type and SRID?

The SRID is being preserved. For instance, try this:
SELECT st_srid(the_geom_valid) FROM tmp2 LIMIT 1;
and you should see something like:
┌─────────┐
│ st_srid │
├─────────┤
│ 4326 │
└─────────┘
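The underlying geometry values also keep their type; it is only the declared column type that has become generic. For instance (assuming none of the inputs collapsed, as discussed below):
SELECT DISTINCT ST_GeometryType(the_geom_valid) FROM tmp2;
-- typically ST_MultiPolygon for this data; a collection or lower-dimension
-- type would only appear if ST_MakeValid had to collapse something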
The reason for this is that ST_MakeValid() can return any of a number of types, but can't know what they'll be ahead of time. From the documentation:
In case of full or partial dimensional collapses, the output geometry may be a collection of lower-to-equal dimension geometries or a geometry of lower dimension.
So the only option is to return a generic geometry.
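For example, a degenerate polygon whose ring encloses no area collapses to lower-dimension output under repair (a minimal sketch; the exact result varies with the PostGIS/GEOS version and validation method):
SELECT ST_AsText(ST_MakeValid('POLYGON((0 0, 1 1, 2 2, 0 0))'::geometry));
-- returns a line such as LINESTRING(0 0, 2 2) (or an empty geometry),
-- not a polygon, so the result column cannot be typed as MultiPolygon up front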
However, if you are confident that won't happen, you can force it using a cast (or an ALTER statement post hoc):
CREATE TABLE tmp2 AS
SELECT
ST_MakeValid(the_geom)::geometry(MultiPolygon, 4326) AS the_geom_valid
FROM tmp1;
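The post-hoc ALTER route mentioned above would look something like this (a sketch; it assumes every repaired geometry really is a MultiPolygon in SRID 4326, and fails otherwise):
ALTER TABLE tmp2
ALTER COLUMN the_geom_valid
TYPE geometry(MultiPolygon, 4326)
USING the_geom_valid::geometry(MultiPolygon, 4326);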

Related

selecting multiple columns but group by only one in postgres

I have a simple table in postgres:
| remoteaddr     | count |
|----------------|-------|
| 142.4.218.156  | 592   |
| 158.69.26.144  | 613   |
| 167.114.209.28 | 618   |
Which I pulled using the following:
select remoteaddr,
count (remoteaddr)
from domain_visitors
group by remoteaddr
having count (remoteaddr) > 500
How do I select additional columns and still only group by remoteaddr?
Option 1: You could use the array_agg() function to concatenate the additional column values into a grouped list:
SELECT
remoteaddr,
array_agg(DISTINCT username) AS unique_users,
array_agg(username) AS repeated_users,
count(remoteaddr) as remote_count
FROM domain_visitors
GROUP BY remoteaddr;
See this SQL Fiddle. This query would return something like the below:
+----------------+---------------------------------+-----------------------------------------------------------------------------------------------------+--------------+
| remoteaddr | unique_users | repeated_users | remote_count |
+----------------+---------------------------------+-----------------------------------------------------------------------------------------------------+--------------+
| 142.4.218.156 | anotheruser,user9688766,vistor1 | user9688766,anotheruser,vistor1,vistor1,vistor1,vistor1,vistor1,anotheruser,anotheruser,anotheruser | 10 |
| 158.69.26.144 | anotheruser,user9688766 | anotheruser,user9688766,user9688766,user9688766,user9688766 | 5 |
| 167.114.209.28 | vistor1 | vistor1 | 1 |
+----------------+---------------------------------+-----------------------------------------------------------------------------------------------------+--------------+
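If you also want to keep the HAVING filter from your original query, it can be appended unchanged (shown here with the same 500 threshold):
SELECT
remoteaddr,
array_agg(DISTINCT username) AS unique_users,
count(remoteaddr) AS remote_count
FROM domain_visitors
GROUP BY remoteaddr
HAVING count(remoteaddr) > 500;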
Option 2: You could put your first query in a common table expression (aka a "WITH" clause), and join it against the original table, like this:
WITH grouped_addr AS (
SELECT remoteaddr, count(remoteaddr) AS remote_count
FROM domain_visitors
GROUP BY remoteaddr
)
SELECT ga.remoteaddr, dv.username, ga.remote_count
FROM grouped_addr ga
INNER JOIN domain_visitors dv
ON ga.remoteaddr = dv.remoteaddr
WHERE remote_count > 500;
Here is a SQL Fiddle.
Bear in mind that this will return repeated rows for any additional columns (in this example, username), which is usually not what you want. Compare the SELECT examples in the Fiddles to see which best suits your purpose.

PostgreSQL Group By not working as expected - wants too many inclusions

I have a simple PostgreSQL table that I'm trying to query. Imagine a table like this...
| ID | Account_ID | Iteration |
|----|------------|-----------|
| 1 | 100 | 1 |
| 2 | 101 | 1 |
| 3 | 100 | 2 |
I need to get the ID column for each Account_ID where Iteration is at its maximum value. So, you'd think something like this would work
SELECT "ID", "Account_ID", MAX("Iteration")
FROM "Table_Name"
GROUP BY "Account_ID"
And I expect to get:
| ID | Account_ID | MAX(Iteration) |
|----|------------|----------------|
| 2 | 101 | 1 |
| 3 | 100 | 2 |
But when I do this, Postgres complains:
ERROR: column "ID" must appear in the GROUP BY clause or be used in an aggregate function
But when I add ID to the GROUP BY, it just destroys the grouping altogether and gives me the whole table!
Is the best way to approach this using the following?
SELECT DISTINCT ON ("Account_ID") "ID", "Account_ID", "Iteration"
FROM "Marketing_Sparks"
ORDER BY "Account_ID" ASC, "Iteration" DESC;
The GROUP BY statement aggregates rows with the same values in the columns included in the group by into a single row. Because this row isn't the same as the original row, you can't have a column that is not in the group by or in an aggregate function. To get what you want, you will probably have to select without the ID column, then join the result to the original table. I don't know PostgreSQL syntax, but I assume it would be something like the following.
SELECT Table_Name.ID, aggregate.Account_ID, aggregate.MIteration
FROM (SELECT Account_ID, MAX(Iteration) AS MIteration
      FROM Table_Name
      GROUP BY Account_ID) aggregate
LEFT JOIN Table_Name ON aggregate.Account_ID = Table_Name.Account_ID AND
                        aggregate.MIteration = Table_Name.Iteration

Postgres - updates with join gives wrong results

I'm having a hard time understanding what I'm doing wrong.
The result of this query shows the same value for every row instead of each row being updated with its own correct result.
My DATA
I'm trying to update a table of stats over a set of businesses
business_stats ( id SERIAL,
pk integer not null,
b_total integer,
PRIMARY KEY(pk)
);
the details of each business are stored here
business_details (id SERIAL,
category CHARACTER VARYING,
feature_a CHARACTER VARYING,
feature_b CHARACTER VARYING,
feature_c CHARACTER VARYING
);
and here a table that associate the pk with the category
datasets (id SERIAL,
pk integer not null,
category CHARACTER VARYING,
PRIMARY KEY(pk)
);
WHAT I DID (wrong)
UPDATE business_stats
SET b_total = agg.total
FROM business_stats b,
( SELECT d.pk, count(bd.id) total
FROM business_details AS bd
INNER JOIN datasets AS d
ON bd.category = d.category
GROUP BY d.pk
) agg
WHERE b.pk = agg.pk;
The result of this query is
| id | pk | b_total |
+----+----+-----------+
| 1 | 14 | 273611 |
| 2 | 15 | 273611 |
| 3 | 16 | 273611 |
| 4 | 17 | 273611 |
but if I run just the SELECT the results of each pk are completely different
| pk | agg.total |
+----+-------------+
| 14 | 273611 |
| 15 | 407802 |
| 16 | 179996 |
| 17 | 815580 |
THE QUESTION
why is this happening?
why is the WHERE clause not working?
Before writing this question I've used as reference these posts: a, b, c
Do the following (I always recommend against joins in Updates)
UPDATE business_stats bs
SET b_total =
( SELECT count(bd.id) total
FROM business_details AS bd
INNER JOIN datasets AS d
ON bd.category = d.category
where d.pk=bs.pk
)
/*optional*/
where exists (SELECT *
FROM business_details AS bd
INNER JOIN datasets AS d
ON bd.category = d.category
where d.pk=bs.pk)
The issue is your FROM clause. The repeated reference to business_stats means you aren't restricting the join like you expect to. You're joining agg against the second unrelated mention of business_stats rather than the row you want to update.
Something like this is what you are after (warning not tested):
UPDATE business_stats AS b
SET b_total = agg.total
FROM
(...) agg
WHERE b.pk = agg.pk;
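Plugging the aggregate subquery from the question into that form gives something like this (same untested caveat):
UPDATE business_stats AS b
SET b_total = agg.total
FROM ( SELECT d.pk, count(bd.id) AS total
       FROM business_details AS bd
       INNER JOIN datasets AS d
       ON bd.category = d.category
       GROUP BY d.pk
     ) agg
WHERE b.pk = agg.pk;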

Insert output of query into another table in postgres?

I'm working in Postgres 9.4. I have two tables:
Table "public.parcel"
Column | Type | Modifiers
ogc_fid | integer | not null default
wkb_geometry | geometry(Polygon,4326) |
county | character varying |
parcel_area | double precision |
Table "public.county"
Column | Type | Modifiers
--------+------------------------+-----------
name | character(1) |
chull | geometry(Polygon,4326) |
area | double precision |
I would like to find all the unique values of county in parcel, and the total areas of the attached parcels, and then insert them into the county table as name and area respectively.
I know how to do the first half of this:
SELECT county,
SUM(parcel_area) AS area
FROM inspire_parcel
GROUP BY county;
But what I don't know is how to insert these values into county. Can anyone advise?
I think it's something like:
UPDATE county SET name, area = (SELECT county, SUM(parcel_area) AS area
FROM inspire_parcel GROUP BY county)
You use INSERT INTO, with an explicit column list so the values land in name and area rather than being matched positionally against chull. So, something like this:
INSERT INTO county (name, area)
SELECT county, SUM(parcel_area) AS area
FROM inspire_parcel GROUP BY county;

Need cleaner update method in PostgreSQL 9.1.3

Using PostgreSQL 9.1.3 I have a points table like so (What's the right way to show tables here??)
| Column | Type | Table Modifiers | Storage
|--------|-------------------|-----------------------------------------------------|----------|
| id | integer | not null default nextval('points_id_seq'::regclass) | plain |
| name | character varying | not null | extended |
| abbrev | character varying | not null | extended |
| amount | real | not null | plain |
In another table, orders, I have a bunch of columns whose names match values of the abbrev column in the points table, as well as a total_points column
| Column | Type | Table Modifiers |
|--------------|--------|--------------------|
| ud | real | not null default 0 |
| sw | real | not null default 0 |
| prrv | real | not null default 0 |
| total_points | real | default 0 |
So in orders I have the sw column, and in points I'll now have an amount that relates to the row where abbrev = 'sw'
I have about 15 columns like that in the points table, and now I want to set a trigger so that when I create/update an entry in the points table, I calculate a total score. Basically with just those three shown I could do it long-hand like this:
UPDATE orders
SET total_points =
ud * (SELECT amount FROM points WHERE abbrev = 'ud') +
sw * (SELECT amount FROM points WHERE abbrev = 'sw') +
prrv * (SELECT amount FROM points WHERE abbrev = 'prrv')
WHERE ....
But that's just plain ugly and repetative, and like I said there are really 15 of them (right now...). I'm hoping there's a more sophisticated way to handle this.
In general each of those silly names on the orders table represents a type of work associated with the order, and each of those types has a 'cost' to it, which is stores in the points table. I'm not married to this structure if there's a cleaner setup.
"Serialize" the costs for orders:
CREATE TABLE order_cost (
order_cost_id serial PRIMARY KEY
, order_id int NOT NULL REFERENCES orders
, cost_type_id int NOT NULL REFERENCES points
, cost int NOT NULL DEFAULT 0 -- in Cent
);
For a single row:
UPDATE orders o
SET total_points = COALESCE((
SELECT sum(oc.cost * p.amount) AS order_cost
FROM order_cost oc
JOIN points p ON oc.cost_type_id = p.id
WHERE oc.order_id = o.order_id
), 0)
WHERE o.order_id = $<order_id>;  -- your order_id here ...
Never use the lossy type real for currency data. Use exact types like money, numeric, or just integer, where integer stores the amount in cents.
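A sketch of that advice applied to the columns above (assuming the existing values are safe to convert in place):
ALTER TABLE points ALTER COLUMN amount TYPE numeric USING amount::numeric;
ALTER TABLE orders ALTER COLUMN total_points TYPE numeric USING total_points::numeric;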
More advice in this closely related example:
How to implement a many-to-many relationship in PostgreSQL?