Postgres query shows up under the 'Most time consuming' section of Heroku - postgresql

I am using the query below for a search:
The current execution time is not bad, but I am still looking for further optimizations. In the EXPLAIN ANALYZE report, the nested loop and nested loop join nodes are highlighted in red, and I would like to know how to reduce their cost. I was thinking of adding an index on the search key, and I would welcome any other suggestions. I have added the EXPLAIN ANALYZE results from three executions in production.

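You could try adding ingredients.name or ingredients.code to an existing index, or creating a new index, so that more rows are filtered out during the index scan on ingredients.
You should also avoid wrapping the column in a function, such as LOWER(ingredients.name), in the search condition, because a plain index on the column cannot be used for that expression. If you need the function, PostgreSQL lets you create an index on the expression itself.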
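To illustrate the point about functions on columns, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made only so the example is self-contained; the planners differ), and the table layout, data, and index names are hypothetical. It compares the query plan for a LOWER(name) filter when only a plain index on name exists with the plan after an expression index is added.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the ingredients table from the question.
conn.execute(
    "CREATE TABLE ingredients (id INTEGER PRIMARY KEY, name TEXT, code TEXT)"
)
conn.executemany(
    "INSERT INTO ingredients (name, code) VALUES (?, ?)",
    [(f"Item{i}", f"C{i}") for i in range(1000)],
)

# A plain index on the column does not match a filter on LOWER(name).
conn.execute("CREATE INDEX idx_name ON ingredients (name)")
query = "SELECT id, code FROM ingredients WHERE LOWER(name) = ?"
plan_plain = plan(conn, query, ("item42",))

# An expression index matches the filter expression exactly.
conn.execute("CREATE INDEX idx_lower_name ON ingredients (LOWER(name))")
plan_expr = plan(conn, query, ("item42",))

print("plain index only: ", plan_plain)
print("expression index: ", plan_expr)
```

In Postgres the equivalent would be an index on the expression LOWER(name); run EXPLAIN ANALYZE on the real query to confirm the planner actually picks it.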

Related

don't use index in pandas-profiling

When I run pandas-profiling on a dataframe, it analyses the index as if it were a variable. Note: my index is a unique key (named UUID).
Is there a way to exclude the index from the report?
I understand I could remove it in pandas, but ideally I would like to write:
ProfileReport(df, use_index=False)
I agree that a use_index=False option in ProfileReport would be nice and clean, but it apparently doesn't exist (yet).
Currently, the only way I can find to keep the index out of the report is to drop it before profiling:
df.reset_index(drop=True, inplace=True)
This gets the job done.
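If you would rather not modify the original dataframe in place, a variant is to profile a copy with the index dropped. This is a minimal sketch with a hypothetical dataframe; the ProfileReport call is left commented out so the snippet runs without pandas-profiling installed.

```python
import pandas as pd

# Hypothetical dataframe whose index is a unique key named UUID.
df = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=pd.Index(["a1", "b2", "c3"], name="UUID"),
)

# reset_index(drop=True) without inplace returns a new dataframe with a
# default integer index; the original df keeps its UUID index.
profiling_input = df.reset_index(drop=True)

# from pandas_profiling import ProfileReport
# report = ProfileReport(profiling_input)

print(profiling_input)
```

The trade-off is an extra copy of the data in memory, which may matter for very large dataframes.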

Understanding SQL query complexity

I'm currently having trouble understanding why a seemingly simple query is taking much longer to return results than a much more complicated (looking) query.
I have a view, performance_summary (which in turn selects from another view). Currently, within psql, when I run a query like
SELECT section
FROM performance_summary
LIMIT 1;
it takes a minute or so to return a result, whereas a query like
SELECT section, version, weighted_approval_rate
FROM performance_summary
WHERE version in ('1.3.10', '1.3.11') AND section ~~ '%WEST'
ORDER BY 1,2;
gets results almost instantly. Without knowing how the view is defined, is there any obvious or common reason why this is?
Not really, not without knowing how the view is defined. The "more complex" query may be able to use an index to select just two rows and then do some trivial grouping and sorting on them. The query without the WHERE clause might have Postgres processing millions of rows and discarding nearly all of them to produce a single row. We can't tell unless you post the view definition and the EXPLAIN output for both queries.
You might be falling into the trap of thinking that a view is a cache of data. It isn't. A view is a stored query that is inserted into the larger query whenever you select from it or include it in another query, so the whole thing must be planned and executed from scratch each time. Creating a view does no pre-planning or pre-computation that later queries build on. It is more like the view's definition being pasted into any query that uses it, with the combined query then run as if you had just written it.
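A small sketch of the "view is a stored query, not a cache" point, using Python's built-in sqlite3 module as a stand-in for Postgres (an assumption for self-containment). The base table results and the view definition are hypothetical, since the real definition of performance_summary was not posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base table and view; the real view definition is unknown.
conn.execute("CREATE TABLE results (section TEXT, approved INTEGER)")
conn.execute(
    "CREATE VIEW performance_summary AS "
    "SELECT section, AVG(approved) AS approval_rate "
    "FROM results GROUP BY section"
)

conn.execute("INSERT INTO results VALUES ('NORTHWEST', 1)")
before = conn.execute(
    "SELECT approval_rate FROM performance_summary"
).fetchall()

# The view stores no data: a new base row changes the next result,
# because the aggregate is recomputed on every query against the view.
conn.execute("INSERT INTO results VALUES ('NORTHWEST', 0)")
after = conn.execute(
    "SELECT approval_rate FROM performance_summary"
).fetchall()

print(before, after)
```

Because the aggregation runs again on every select, even a LIMIT 1 query can end up paying for the whole underlying computation if the planner cannot push the limit down into the view.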

PostgreSQL Probabilities: EXPLAIN on CREATE INDEX

I am using PostgreSQL to compute empirical probability density functions for pairs of variables across all my data. I am trying to determine if/when it is more effective to index before computing the PDF. I run EXPLAIN CREATE INDEX like,
EXPLAIN CREATE INDEX AB ON xrootd ("F.mName", "F.mOpenTime");
CREATE INDEX AB ON xrootd ("F.mName", "F.mOpenTime");
But PSQL complains,
psql:sql/stats.sql:3: ERROR: syntax error at or near "INDEX"
LINE 2: EXPLAIN CREATE INDEX AB ON xrootd ("F.mName", "F.mOpenTime")...
Is there a better way to do what I am trying to do? At the very least, I would like to know whether building the indexes is worthwhile. I have a lot of variables and a lot of data, so speeding this up is crucial.
Checking the cost of CREATE INDEX would tell me whether building the index is too expensive for the gain from using it.
There is no EXPLAIN CREATE INDEX: EXPLAIN works on SELECT and DML statements (INSERT/UPDATE/DELETE) and a few others such as CREATE TABLE AS, but not on CREATE INDEX.
What you are probably looking for is hypothetical indexes, which let you check whether the planner would use an index without actually building it. There is a PostgreSQL extension for that: https://github.com/dalibo/hypopg

Partial index not being used in psql 8.2

I would like to run a query on a large table along the lines of:
SELECT DISTINCT user FROM tasks
WHERE ctime >= '2012-01-01' AND ctime < '2013-01-01' AND parent IS NULL;
There is already an index on tasks(ctime), but most (75%) of rows have a non-NULL parent, so that's not very effective.
I attempted to create a partial index for those rows:
CREATE INDEX CONCURRENTLY task_ctu_np ON tasks (ctime, user)
WHERE parent IS NULL;
but the query planner continues to choose the tasks(ctime) index instead of my partial index.
I'm using postgresql 8.2 on the server, and my psql client is 8.1.
First, I second Richard's suggestion that upgrading should be at the top of your priority list. As I understand it, partial indexes and related planner features have improved significantly since 8.2.
The second thing is you really need the actual query plans with timing information (EXPLAIN ANALYZE) because without these we can't talk about selectivity, etc.
So my order of business if I were you would be to upgrade first and then tune after that.
Now, I understand that 8.3 is a big upgrade (it is the only one that caused us issues in LedgerSMB). You may need some time to address that, but the alternative is to fall further behind and keep asking questions about a version that fewer and fewer people remember well.

Postgres - gin index doesn't work

It seems that my server won't use a GIN index.
I created a new database with one table.
I inserted one row as an example.
I loaded the trigram extension and created a GIN index using trigrams.
But when I check whether the index is used, I can see that it isn't.
Any ideas?
SQL: http://pastebin.com/1yDQQA1Z
P.S. A day ago I followed a tutorial about trigrams. It was basically the same as my example above. The table had 2 columns, numeric(5, 0) and character varying (the one with the GIN trigram index). The query used the LIKE operator with "%", and the index was used (I could see a Bitmap scan in the EXPLAIN output), so I know my server can use the index and that it is properly installed.
Thanks in advance.
Don't test on one row, it is meaningless.
Here's an excerpt of the documentation explaining why, in Examining Index Usage:
Use real data for experimentation. Using test data for setting up
indexes will tell you what indexes you need for the test data, but
that is all.
It is especially fatal to use very small test data sets. While
selecting 1000 out of 100000 rows could be a candidate for an index,
selecting 1 out of 100 rows will hardly be, because the 100 rows
probably fit within a single disk page, and there is no plan that can
beat sequentially fetching 1 disk page.