Is ST_GeomFromText better than providing direct geometry? - postgresql

I have been working with PostGIS recently, and in my query, using ST_GeomFromText executes faster than running a sub-query to get the geom.
I thought ST_GeomFromText would be more expensive, but after running many tests I got the faster result every time. My question: is there any explanation for this?
To me, getting the geom directly in a sub-query seems better than passing the geometry as text and wrapping it in ST_GeomFromText.
Thanks,
Sara

The issue is that the two approaches do different work. ST_GeomFromText is an immutable function, so its output depends only on its input. This means the planner can evaluate it once at the start of the query. Running a subquery means you have to look up the geometry, which means disk access, etc. In the first case you have a little bit of CPU activity, executed once for the query; in the second you have disk lookups.
So the answer depends to some extent on what you are doing with it. In general, you can assume that the optimizer handles things like type conversions on input quite well, as long as they do not depend on settings.
Think about it this way.
SELECT * FROM mytable WHERE my_geom = ST_GeomFromText(....);
This gets converted into something like the following pseudocode:
private_geom = ST_GeomFromText(....);
SELECT * FROM mytable WHERE my_geom = private_geom;
Then that query gets planned and executed.
Obviously you don't want to be adding round trips just to avoid in-query lookups, but where you know the geometry, you might as well specify it via ST_GeomFromText(....) in your query.
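For instance, comparing the two query shapes with EXPLAIN ANALYZE makes the difference visible. This is only a rough sketch: mytable and my_geom come from the example above, while other_table, geom and the id value are placeholder names.
-- Immutable function over a literal: evaluated once, before execution.
EXPLAIN ANALYZE
SELECT * FROM mytable
WHERE my_geom = ST_GeomFromText('POINT(1 2)', 4326);

-- Subquery: the geometry has to be read from another table at run time.
EXPLAIN ANALYZE
SELECT * FROM mytable
WHERE my_geom = (SELECT geom FROM other_table WHERE id = 42);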

Related

json_populate_record Performance Issues

I have Postgres 9.4 and am using the json_populate_record function with a type that has 60 columns defined.
The table with the json, old_table, has 100,000 records, and it takes over 2 minutes to run the following parse-and-insert command:
select (json_populate_record(null::v_type, row_to_json_output)).*
into new_table
from old_table
Can I optimize it in any way? Alternatives? Are newer versions better at this?
Thanks
In my hands, using jsonb_populate_record with jsonb values is about 40% faster than using json_populate_record with json values.
Starting in v11, this statement could run in parallel. However, the cost estimate of the function json_populate_record is very low, so to get it to actually choose parallelization, you should probably do something like:
alter function json_populate_record cost 1000;
And the same with jsonb_populate_record.
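For reference, a minimal sketch of the jsonb variant, keeping the names from the question and assuming row_to_json_output is a json column that can simply be cast:
-- jsonb_populate_record with an on-the-fly cast; in practice you may want to
-- store the column as jsonb in the first place and avoid the per-row cast.
SELECT (jsonb_populate_record(NULL::v_type, row_to_json_output::jsonb)).*
INTO new_table
FROM old_table;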
But really, why are you doing this often enough to care whether it takes 2 minutes? Cycling data from table to json and back to table on a routine, ongoing basis doesn't make much sense.

Why can't immutable UDFs be called within certain clauses of a CTE in Redshift?

The issue can be simplified to this: within a view, any CTE referring to some_immutable_func breaks, except in the WHERE and HAVING clauses, resulting in the following error:
create or replace function some_immutable_func ()
returns int
immutable as $$
SELECT 1
$$ language sql;
create view some_view as
WITH some_cte AS (
SELECT some_immutable_func()
)
SELECT * FROM some_cte;
FATAL: Query processing failed due to an internal error.
CONTEXT: SQL function "some_immutable_func"
SSL connection has been closed unexpectedly
The connection to the server was lost. Attempting reset: Succeeded.
-- but this is okay
create view some_view as
WITH some_cte AS (
SELECT * FROM some_table
WHERE ...some_immutable_func()...
HAVING ...some_immutable_func()...
)
SELECT * FROM some_cte;
CREATE VIEW
Built-in immutable UDFs such as ABS(-3) work fine. Simply changing the UDF to be stable fixes the issue, but I am looking to optimize query performance within a complex view, where the stable version of a certain UDF apparently slows it down nearly 100x. Ideally, I would also like to minimize changes to the view's structure, doing a simple find-and-replace instead of rearranging all references to the UDF into WHERE and HAVING clauses.
I believe the problem may be with the query optimizer, but I'm surprised there is so little information out there regarding the finer details of Redshift/Postgres and immutability.
EDIT: I've also discovered that changing the UDF language to Python seems to work fine. However, it has quite slow performance in my specific use case, potentially worse than just using the stable SQL UDF.
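For reference, a Python version of the trivial function above would look roughly like this (a sketch only; plpythonu is Redshift's Python UDF language):
create or replace function some_immutable_func ()
returns int
immutable as $$
    return 1
$$ language plpythonu;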
Just wanted to confirm that this has been resolved since the question was posted.
The provided example now works as expected.

Postgres: Perform statement and limit

I have the following query which I run every night.
perform distinct fn_debtor_summary( clientacc) from client where not clientacc is null;
However, because the function is quite slow, when debugging I like to run it against a small subset of the data, so I use the following query:
perform distinct fn_debtor_summary( clientacc) from client where not clientacc is null limit 10;
However, I find that the LIMIT doesn't work: it runs the function against the whole table.
Any ideas why this is happening and how I could run it against a small subset of the data without creating temporary tables?
PostgreSQL runs the function on every row of the PERFORM query before applying the LIMIT. So even though it returns only 10 rows, it will still run the function more than 10 times.
The solution is to use a subquery. Interestingly, PERFORM doesn't work here, but a plain SELECT does:
select fn_debtor_summary( limitclients.clientacc) from (select clientacc from client limit 1) limitclients;

syntax for COPY in postgresql

INSERT INTO contacts_lists (contact_id, list_id)
SELECT contact_id, 67544
FROM plain_contacts
Here I want to use the COPY command instead of INSERT to reduce the time it takes to insert values. I fetched the data using a SELECT. How can I insert it into a table using the COPY command in PostgreSQL? Could you please give an example? Or is there any other suggestion for reducing the insert time?
As your rows are already in the database (because you apparently can SELECT them), using COPY will not increase the speed in any way.
To be able to use COPY you would first have to write the values to a text file, which is then read back into the database. But if you can SELECT them, writing to a text file is a completely unnecessary step that will slow down your insert, not speed it up.
Your statement is as fast as it gets. The only thing that might speed it up (apart from buying a faster hard disk) is to remove any index on contacts_lists that contains the column contact_id or list_id and re-create the index once the insert is finished.
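As a sketch of that last suggestion (the index name contacts_lists_contact_id_idx is hypothetical; use whatever indexes actually exist on your table):
-- Drop the index, do the bulk insert, then rebuild the index once.
DROP INDEX contacts_lists_contact_id_idx;

INSERT INTO contacts_lists (contact_id, list_id)
SELECT contact_id, 67544
FROM plain_contacts;

CREATE INDEX contacts_lists_contact_id_idx ON contacts_lists (contact_id);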
You can find the syntax described in many places, I'm sure. One of those is this wiki article.
It looks like it would basically be:
COPY (SELECT contact_id, 67544 FROM plain_contacts) TO 'some_file'
And
COPY contacts_lists (contact_id, list_id) FROM 'some_file'
But I'm just reading from the resources that Google turned up. Give it a try and post back if you need help with a specific problem.

Does Firebird need manual reindexing?

I use both Firebird embedded and Firebird Server, and from time to time I need to reindex the tables using a procedure like the following:
CREATE PROCEDURE MAINTENANCE_SELECTIVITY
AS
DECLARE VARIABLE S VARCHAR(200);
BEGIN
  FOR SELECT RDB$INDEX_NAME FROM RDB$INDICES INTO :S DO
  BEGIN
    S = 'SET STATISTICS INDEX ' || S || ';';
    EXECUTE STATEMENT :S;
  END
  SUSPEND;
END
I guess this is normal using embedded, but is it really needed using a server? Is there a way to configure the server to do it automatically when required or periodically?
First, let me point out that I'm no Firebird expert, so I'm answering on the basis of how SQL Server works.
In that case, the answer is both yes and no.
The indexes are of course updated on SQL Server, in the sense that if you insert a new row, all indexes for that table will contain that row, so it will be found. So basically, you don't need to keep reindexing the tables for that part to work. That's the "no" part.
The problem, however, is not with the index, but with the statistics. You're saying that you need to reindex the tables, but then you show code that manipulates statistics, and that's why I'm answering.
The short answer is that statistics slowly go out of whack as time goes by. They might not deteriorate to the point where they're unusable, but they will deteriorate from the perfect state they're in when you recreate/recalculate them. That's the "yes" part.
The main problem with stale statistics is that if the distribution of the keys in the indexes changes drastically, the statistics might not pick that up right away, and thus the query optimizer will pick the wrong indexes, based on the old, stale, statistics data it has on hand.
For instance, let's say one of your indexes has statistics that says that the keys are clumped together in one end of the value space (for instance, int-column with lots of 0's and 1's). Then you insert lots and lots of rows with values that make this index contain values spread out over the entire spectrum.
If you now do a query that uses a join from another table, on a column with low selectivity (also lots of 0's and 1's) against the table with this index of yours, the query optimizer might deduce that this index is good, since it will fetch many rows that will be used at the same time (they're on the same data page).
However, since the data has changed, it'll jump all over the index to find the relevant pieces, and thus not be so good after all.
After recalculating the statistics, the query optimizer might see that this index is sub-optimal for this query, and pick another index instead, which is more suited.
Basically, you need to recalculate the statistics periodically if your data is in flux. If your data rarely changes, you probably don't need to do it very often, but I would still add a maintenance job with some regularity that does this.
As for whether or not it is possible to ask Firebird to do it on its own, I'm on thin ice again, but I suspect there is a way. In SQL Server you can set up maintenance jobs that do this on a schedule, and at the very least you should be able to kick off a batch file from the Windows scheduler to do something similar.
That does not reindex; it recomputes the selectivity statistics for the indexes, which the optimizer uses to pick the best index. You don't need to do that unless the index size changes a lot. If you create the index before you add data, you do need to do the recalculation afterwards.
Embedded and Server should have exactly the same functionality apart from the process model.
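If only a handful of indexes matter, you can also run the recalculation by hand rather than looping over RDB$INDICES (IDX_CLIENT_NAME is a hypothetical index name):
SET STATISTICS INDEX IDX_CLIENT_NAME;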
I wanted to update this answer for newer Firebird. Here is the updated DSQL:
SET TERM ^ ;

CREATE OR ALTER PROCEDURE NEW_PROCEDURE
AS
DECLARE VARIABLE S VARCHAR(300);
BEGIN
  FOR SELECT 'SET STATISTICS INDEX ' || RDB$INDEX_NAME || ';'
      FROM RDB$INDICES
      WHERE RDB$INDEX_NAME <> 'PRIMARY' INTO :S
  DO BEGIN
    EXECUTE STATEMENT :S;
  END
END^

SET TERM ; ^

GRANT EXECUTE ON PROCEDURE NEW_PROCEDURE TO SYSDBA;
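The procedure would then be run manually, or from a scheduled job, with something like:
EXECUTE PROCEDURE NEW_PROCEDURE;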