Use Query result as element of another query - postgresql

I have these two queries:
SELECT ST_AsText(geom)
FROM areasTable
WHERE "Name" ILIKE 'Kachina';
Let's say that it returns a polygon value of: POLYGON((-XXX.XX XX.XXX, -XXX.XX XX.XXX, -XXX.XX XX.XXX, -XXX.XX XX.XXX)). I then use that value to do another search.
SELECT "ROAD_NAME"
FROM addresses
WHERE ST_Contains(ST_GeomFromText('POLYGON((-XXX.XX XX.XXX, -XXX.XX XX.XXX, -XXX.XX XX.XXX, -XXX.XX XX.XXX))',4326), addresses.geom);
What I have been trying to do is save a step and just find all the roads within a certain area without having to manually copy and paste the polygon of the area. Any ideas?

Try creating a stored procedure: https://www.w3schools.com/sql/sql_stored_procedures.asp
Then write a program in a language that can interact with the SQL database and call the stored procedure in a loop.
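One way to realize that suggestion in PostgreSQL is a set-returning function, which any client language can then call. A minimal sketch, assuming the table and column names from the question and guessing text for the column types:
CREATE OR REPLACE FUNCTION roads_in_area(area_name text)
RETURNS TABLE ("ROAD_NAME" text) AS $$
    SELECT a."ROAD_NAME"
    FROM addresses a
    JOIN areasTable ar ON ST_Contains(ar.geom, a.geom)
    WHERE ar."Name" ILIKE area_name;
$$ LANGUAGE sql STABLE;
-- Usage:
SELECT * FROM roads_in_area('Kachina');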

I am including this to help anyone who may be curious about how I solved this, but technically MaximumBoy pointed me to the direct answer to my question, and for that reason I am going to vote for his answer as the correct one. I couldn't figure out how to do it MaximumBoy's way, but that is because I know very little SQL.
This is how I accomplished what I wanted. Note that I changed the table name from "areasTable" to "areas" in my solution.
The first way:
SELECT
"ROAD_NAME"
FROM
addresses
JOIN areas ON ST_Contains(areas.geom, addresses.geom)
WHERE
areas. "Name" ILIKE 'KACHINA';
The second way to accomplish this is the following:
SELECT
addresses."ROAD_NAME"
FROM
areas, addresses
WHERE
areas."Name" ILIKE 'KACHINA'
AND ST_Contains(areas.geom, addresses.geom);
In my empirical observations, the first query is a little slower than the second, although the two forms are logically equivalent.
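For completeness, the join can also be replaced with a scalar subquery. A sketch, assuming the name matches exactly one area (with multiple matches the subquery would raise an error, and the join forms above are the way to go):
SELECT "ROAD_NAME"
FROM addresses
WHERE ST_Contains(
    (SELECT geom FROM areas WHERE "Name" ILIKE 'KACHINA'),
    addresses.geom
);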

Related

Postgis and Postgres: How to perform a ST_Contains query with geometry array?

I have a boundary which is stored in a geometry array (like {...,...,...}).
My goal is to perform an ST_Contains query: I want to see whether a node is inside that boundary or not.
I tried something like
SELECT ST_Contains(ST_Polygonize((SELECT CAST(bt.geomarray AS geometry[]) FROM boundarytable AS bt)), nodetable.geom)
But I always get errors like "Invalid hex character (,) encountered".
Can anybody show me the right way to do this?
Now that I know how to do it, I'm answering this question by myself.
We do not have to use an array. We step through each node's geom, create the polygon, and store it in polygontable (note: the polygon must be closed, so add the first node again as the last node in boundarytable before you perform the query, otherwise you will get an error):
SELECT ST_MakePolygon(ST_MakeLine(bt.geom)) AS geomboundary
INTO TABLE polygontable
FROM boundarytable AS bt
GROUP BY bt.dummy -- (just a constant value to group all bt.geom rows together)
Then we can perform the ST_Contains query like
SELECT *, ST_Contains((SELECT geomboundary FROM polygontable), anytable.geom)
FROM anytable;
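A variant that returns only the nodes actually inside the boundary (same assumed tables):
SELECT *
FROM anytable
WHERE ST_Contains((SELECT geomboundary FROM polygontable), anytable.geom);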

Perl: Tracking duplicates

I am trying to figure out the best way to locate duplicates in six-column CSV data. The real data has more than a million rows in it.
The six columns are:
Name, address, city, post-code, phone number, machine number
The data does not have a fixed length, and values in certain columns may be missing in some rows.
I am thinking of using Perl to first normalize all the short forms used in names, cities and addresses. Fellow Perl enthusiasts from Stack Overflow have helped me a lot.
But there would still be a lot of data which would be difficult to match.
So I am wondering: is it possible to match content based on "likeness / similarity" (e.g. "google" is similar to "gugl")? The fuzzy matching is needed to overcome errors that crept in while the data was collected.
I have two tasks at hand with respect to the data:
Flag duplicate rows with certain identifier
Mention the percentage match between similar rows.
I would really appreciate suggestions as to what methods could be employed, and which would probably work best given their merits.
You could write a Perl program to do this, but it will be easier and faster to put it into a SQL database and use that.
Most SQL databases have a way to import CSV. For this answer, I suggest PostgreSQL because it has very powerful string functions which you will need to find your fuzzy duplicates. Create your table with an auto incremented ID column if your CSV data doesn't already have unique IDs.
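For instance, a minimal import sketch, assuming a file /tmp/data.csv with a header row; all table and column names here are illustrative:
CREATE TABLE whatever (
    id bigserial PRIMARY KEY,
    name text,
    address text,
    city text,
    post_code text,
    phone_number text,
    machine_number text
);
COPY whatever (name, address, city, post_code, phone_number, machine_number)
FROM '/tmp/data.csv' WITH (FORMAT csv, HEADER true);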
Once the import is done, add indexes on the columns you want to check for duplicates.
CREATE INDEX name ON whatever (name);
You can do a self-join to look for duplicates in whatever way you like. Here's an example that finds duplicate names.
SELECT id
FROM whatever t1
JOIN whatever t2 ON t1.id < t2.id
WHERE t1.name = t2.name
PostgreSQL has powerful string functions including regexes to do the comparisons.
A plain index cannot be used for expressions like lower(t1.name). Depending on the sorts of duplicates you want to work with, you can add indexes on those expressions (expression indexes are a feature of PostgreSQL). For example, if you wanted to search case-insensitively you can add an index on the lower-cased name. (Thanks @asjo for pointing that out.)
CREATE INDEX ON whatever ((lower(name)));
-- This will be much faster
SELECT id
FROM whatever t1
JOIN whatever t2 ON t1.id < t2.id
WHERE lower(t1.name) = lower(t2.name)
A "likeness" match can be achieved in several ways, a simple one would be to use the fuzzystrmatch functions like metaphone(). Same trick as before, add a column with the transformed row and index it.
Other simple things like data normalization are better done on the data itself before adding indexes and looking for duplicates. For example, trim out and squish extra whitespace.
UPDATE whatever SET name = trim(both from name);
UPDATE whatever SET name = regexp_replace(name, '[[:space:]]+', ' ');
Finally, you can use the Postgres trigram module (pg_trgm) to add fuzzy indexing to your table (thanks again to @asjo).
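A sketch of the trigram route, assuming pg_trgm is available; similarity() returns a score between 0 and 1, which also covers the "percentage match between similar rows" requirement:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX ON whatever USING gin (name gin_trgm_ops);
-- % is the "similar enough" operator; its threshold is pg_trgm.similarity_threshold
SELECT t1.id, t2.id, similarity(t1.name, t2.name) AS score
FROM whatever t1
JOIN whatever t2 ON t1.id < t2.id
WHERE t1.name % t2.name;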

How to do a SQL query using columns from a related table?

I've got three related SQL tables, simplified they look like this:
ShopTable
[ShopID]
ShelfTable
[ShelfID]
[ShopID]
InventoryTable
[ShelfID]
[Value]
[ShopID] and [ShelfID] are relations. Now what I want to do is get the SUM of [Value] for one [ShopID], but this obviously won't work since [ShopID] isn't part of InventoryTable:
SELECT SUM([Value]) WHERE [ShopID] = '1'
How do I have to write the query to filter the InventoryTable using the ShopID?
SELECT SUM(i.value)
FROM shelfTable s
JOIN inventoryTable i
ON i.shelfId = s.shelfId
WHERE s.shopId = 1
This is a fundamental question about relations between tables, so I'll provide some detail, hoping that you can use some of these ideas when writing SQL queries in the future.
Let's start with one basic thing first. [ShopID] could refer to two different but related columns, one in [ShopTable] and one in [ShelfTable]. The same things applies to [ShelfID]. It's useful to always specify the table.
You describe [ShopID] and [ShelfID] as "relations." As Damien_The_Unbeliever has commented, those columns are, in fact, two pairs of primary and foreign keys. That is, [ShelfTable].[ShelfID] identifies a "shelf" record, and [InventoryTable].[ShelfID] relates an "inventory item" (whatever that is) to a "shelf." (It's not always possible to interpret rows in a database this naively, but I'm willing to guess I'm not too far off from reality.)
Likewise, each "shelf" belongs to one "shop," and [ShelfTable].[ShopID] refers to that specific "shop." Notice that because we have the value of [ShopID] already (I'll call it "#MyShopID"), we don't even need the [ShopTable] here. We can just use [ShelfTable].[ShopID] to filter for the "shelves" we're interested in.
You're asking to get the sum total of [InventoryTable].[Value] for one [ShopID] value, but [ShopID] doesn't show up in [InventoryTable]. That's where your (inner) join comes into play. You know that you'll be adding up values from [InventoryTable], but you've got to specify the particular "shop." You specify #MyShopID for [ShelfTable].[ShopID], and the join on [ShelfID] carries that filter into [InventoryTable] for you.
One final thing before composing the query. I'm assuming that you haven't oversimplified your tables too much, and that [Value] is the total value of each "inventory item," and not just a unit value. If it wasn't, we'd have to multiply values by quantities, etc., but I'll let you check your own work here.
So, here's what we do:
We select FROM the [InventoryTable]
but we INNER JOIN to the [ShelfTable] on [ShelfID] from both tables
and we only want "shelves" from one "shop," i.e. WHERE [ShelfTable].[ShopID] = #MyShopID
and then we SELECT the SUM([InventoryTable].[Value])
and we're done. In SQL, let's remove the brackets, provide some table aliases, and we'll get a query that looks like this:
SELECT SUM(inv.Value)
FROM InventoryTable AS inv
INNER JOIN ShelfTable AS shf ON shf.ShelfID = inv.ShelfID
WHERE shf.ShopID = #MyShopID
;
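If you later need totals for every shop at once, the same join extends naturally; a sketch under the same assumed schema:
SELECT shf.ShopID, SUM(inv.Value) AS TotalValue
FROM InventoryTable AS inv
INNER JOIN ShelfTable AS shf ON shf.ShelfID = inv.ShelfID
GROUP BY shf.ShopID;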
Here are a few take-away points to consider. Notice we handled the FROM clause first. You'll always want to do that.
You'll also want a "driving table" to start with, in this case, [InventoryTable]. The other tables in your join add extra information and provide you a means to filter, but don't otherwise interfere with your summing up. More complex queries don't offer such an obvious luxury, but we're not getting too fancy here.
You'll also note, just briefly, that because [ShelfID] is a primary key in [ShelfTable], those [ShelfID]'s are unique values in [ShelfTable], and so each "inventory" thing belongs to a single "shelf." So the join won't cause us to double-count values. That's a good thing to remember when you're not dealing with primary and foreign keys, like we're doing here.
Hope that helps. And I hope I didn't come across as too pedantic.

How to put together two queries?

The title says what I need.
CREATE TABLE newTable1 AS SELECT t2.name,t2.the_geom2
FROM t1,t2
WHERE ST_Contains(ST_Expand(t2.the_geom2,0.05),t1.the_geom1)
and t1.gid=2;
CREATE TABLE newTable2 AS SELECT t1.the_geom,t1.label FROM t1 WHERE t1.gid=2;
The first query's result is all points within the polygon, or up to 5 arc-minutes away from it, where this polygon has gid=2. But I also want to display this polygon. I tried writing in the first query
... AS SELECT t2.name, t2.the_geom2, t1.the_geom1, t1.label ... but got only the points, without the polygon.
This question is linked to my already asked question "How to find all points away from some polygon?", which didn't get an answer, so please...
And is ST_Expand an OK solution, or would it be better to use ST_DWithin or ST_Buffer?
You can't combine two CREATE TABLE statements into one. Why are you creating tables if you are just querying data?
It sounds like what you are really trying to do is one query that will give you the points within the polygon and the polygon itself. Something like this?
SELECT
t1.the_geom AS polygon, t1.label AS polygon_label,
t2.the_geom2 AS point, t2.name AS point_name
FROM
t1, t2
WHERE
ST_Contains(ST_Expand(t2.the_geom2,0.05), t1.the_geom1)
AND t1.gid = 2;
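As for ST_Expand versus ST_DWithin: ST_DWithin is generally the better tool for "within distance" filters, because it tests true distance rather than a bounding-box expansion and can use a spatial index. A distance-based sketch (assuming 0.05 is the intended search distance in the data's units; note the result can differ slightly from the bounding-box test above):
SELECT
    t1.the_geom AS polygon, t1.label AS polygon_label,
    t2.the_geom2 AS point, t2.name AS point_name
FROM
    t1, t2
WHERE
    ST_DWithin(t1.the_geom1, t2.the_geom2, 0.05)
    AND t1.gid = 2;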
If this is still not clear, post your complete table definitions and more details about what you are trying to do.

TSQL question: how to iterate columns of a result set

I have a SELECT statement and a cursor to iterate over the rows I get. The problem is that I have many columns (more than 500), so "FETCH .. INTO @variable" is impossible for me. How can I iterate over the columns (one by one; I need to process the data)?
Thanks in advance,
n.b
Two choices.
1/ Use SSIS or ADO.NET to work through your dataset row by row.
2/ Consider what you're actually needing to achieve and find a set-based approach.
My preference is for option 2. Let us know what you need done and we'll find a way.
Rob
You can build a SQL string using sys.columns or INFORMATION_SCHEMA queries. Here's a post I wrote on that.
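For example, a minimal sketch of the dynamic-SQL approach; dbo.MyTable is a placeholder, and the resulting column list would be spliced into a query string for sp_executesql:
DECLARE @cols nvarchar(max);
SELECT @cols = COALESCE(@cols + ', ', '') + QUOTENAME(c.name)
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID('dbo.MyTable')
ORDER BY c.column_id;
PRINT @cols;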