Sequential scan rather than index scan - postgresql

I have a bunch of tables in PostgreSQL and I run a query as follows:
SELECT DISTINCT ON ...some stuff...
FROM "rent_flats" INNER JOIN "rent_flats_linked_users"
ON "rent_flats_linked_users"."rent_flat_id" = "rent_flats"."id"
INNER JOIN "users"
ON "users"."id" = "rent_flats_linked_users"."user_id"
INNER JOIN "owners"
ON "owners"."id" = "users"."profile_id" AND "users"."profile_type" = 'Owner'
INNER JOIN "phone_numbers"
ON "phone_numbers"."person_id" = "owners"."id" AND "phone_numbers"."person_type" = 'Owner'
INNER JOIN "phone_number_categories"
ON "phone_number_categories"."id" = "phone_numbers"."phone_number_category_id"
INNER JOIN "localities"
ON "localities"."id" = "rent_flats"."locality_id"
INNER JOIN "regions"
ON "regions"."id" = "localities"."region_id"
INNER JOIN "cities"
ON "cities"."id" = "regions"."city_id"
INNER JOIN "property_types"
ON "property_types"."id" = "rent_flats"."property_type_id"
INNER JOIN "apartment_types"
ON "apartment_types"."id" = "rent_flats"."apartment_type_id"
WHERE "rent_flats"."status" = 3
AND (((extract(epoch from age(current_date,rent_flats.date_added))/86400)::int) IN (cities.short_period,cities.long_period))
AND (phone_number_categories.name IN ('SMS','SMS & Mobile'))
ORDER BY rf_id, phone_numbers.priority ASC
Note: The rent_flats table contains around 5 million rows, rent_flats_linked_users contains around 600k rows, and users contains 350k rows. The other tables are small.
The query takes about 6.8 secs to execute, and EXPLAIN ANALYZE shows that around 50% of the total time goes into sequential scans of the rent_flats, users and rent_flats_linked_users tables and another 30% into hash joins.
On setting enable_seqscan to off, the query takes even longer, about 11 secs (in this case Hash and Hash Join take up to 97.5% of the time).
Here's the EXPLAIN ANALYZE query plan.
I have put indices on the fields involved in the inner joins as well as on fields involved in the filters, like phone_numbers.priority, cities.short_period and cities.long_period. But I still get a sequential scan. What could be the reasons, and what are possible ways to speed up the query?

I suspect that if there is a part of that query worth optimising then it is this:
(((extract(epoch from age(current_date,rent_flats.date_added))/86400)::int) IN (cities.short_period,cities.long_period))
You really need to turn that into something like:
rent_flats.date_added in (...)
Then you can index date_added, and maybe index (date_added, status).
The next step would be to make sure that the join columns are indexed.
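As a rough illustration of why moving the arithmetic off the column matters, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table layout, the index name, the fixed "today" and the period values 7 and 30 are all assumptions made for the demo. In PostgreSQL terms, and assuming date_added is a date and the periods are whole numbers of days, the rewritten predicate might look like rent_flats.date_added IN (current_date - cities.short_period, current_date - cities.long_period).

```python
import sqlite3
from datetime import date, timedelta

def plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rent_flats (id INTEGER PRIMARY KEY, status INTEGER, date_added TEXT)"
)
conn.execute("CREATE INDEX idx_rent_flats_date_added ON rent_flats (date_added)")

today = date(2024, 1, 31)           # fixed so the demo is reproducible
short_period, long_period = 7, 30   # hypothetical cities.short_period / long_period

# Three flats, added 7, 30 and 3 days before "today".
conn.executemany(
    "INSERT INTO rent_flats (status, date_added) VALUES (3, ?)",
    [((today - timedelta(days=n)).isoformat(),) for n in (7, 30, 3)],
)

# Variant 1: the column is wrapped in an expression, so the index on
# date_added cannot be used to look up matching rows.
wrapped_sql = (
    "SELECT id FROM rent_flats "
    "WHERE CAST(julianday(?) - julianday(date_added) AS INTEGER) IN (?, ?)"
)
wrapped_params = (today.isoformat(), short_period, long_period)

# Variant 2: the arithmetic is done on the constant side and the bare
# column is compared against the resulting dates.
bare_sql = "SELECT id FROM rent_flats WHERE date_added IN (?, ?)"
bare_params = [
    (today - timedelta(days=n)).isoformat() for n in (short_period, long_period)
]

wrapped_ids = sorted(r[0] for r in conn.execute(wrapped_sql, wrapped_params))
bare_ids = sorted(r[0] for r in conn.execute(bare_sql, bare_params))
wrapped_plan = plan(conn, wrapped_sql, wrapped_params)
bare_plan = plan(conn, bare_sql, bare_params)

print(wrapped_ids, wrapped_plan)
print(bare_ids, bare_plan)
```

Both variants return the same rows, but only the second lets the planner search the index instead of scanning every row. PostgreSQL's planner differs in detail, so the real query should still be checked with EXPLAIN ANALYZE.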

Related

Postgres CTE exponentially saving time?

I would love a clear explanation of the below. I would have thought PG would have optimized the first query to be just as fast as the second query, which uses a CTE, since it's basically using a simple index to filter and join on 2 columns. Everything in the joins and filtering, except "l"."type", has an index. This is on PG 10.
The query below takes 20+ minutes.
SELECT
transactions.id::text AS id,
transactions.amount,
transactions.currency::text AS currency,
transactions.external_id::text AS external_id,
transactions.check_sender_balance,
transactions.created,
transactions.type::text AS type,
transactions.sequence,
transactions.legacy_id::text AS legacy_id,
transactions.reference_transaction::text AS reference_transaction,
a.user_id as user_id
FROM transactions
JOIN lines l ON transactions.id = l.transaction
JOIN accounts a ON l.account = a.id
WHERE l.type='DEBIT'
AND "sequence" > 357550718
AND user_id IN ('5bf4ceb45d27fd2985a000000')
But the following, which I suppose explicitly optimizes accounts via a CTE, finishes in ~2-4 minutes. I would have thought PG would have optimized the first query to match this kind of performance?
WITH "accts" AS (
SELECT "id", "user_id"
FROM "accounts" WHERE "user_id" IN ('5bf4ceb45d27fd2985a000000')
)
SELECT "transactions"."id"::TEXT AS "id",
"transactions"."amount",
"transactions"."currency"::TEXT AS "currency",
"transactions"."external_id"::TEXT AS "external_id",
"transactions"."check_sender_balance",
"transactions"."created",
"transactions"."type"::TEXT AS "type",
"transactions"."sequence",
"transactions"."legacy_id"::TEXT AS "legacy_id",
"transactions"."reference_transaction"::TEXT AS "reference_transaction",
a."user_id" AS "user_id"
FROM "transactions"
JOIN "lines" "l" ON "transactions"."id" = "l"."transaction"
JOIN "accts" "a" ON "a"."id" = "l"."account"
WHERE "l"."type" = 'DEBIT'
AND "sequence" > 357550718
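Both queries filter on the same user_id, but they apply the filter in different places. In the first query it is an ordinary WHERE condition, so the planner is free to choose the join order, and it may pick a poor one if its row estimates are off. In the second query the filter sits inside the CTE, and on PG 10 a CTE is an optimization fence: it is evaluated on its own and materialized, so accounts is reduced to that one user's rows before the join happens. If there is an index on the user_id field, that step is cheap, which is probably what helps your performance. You can run an explain plan on both queries separately by adding EXPLAIN to the beginning of them and see how the plans differ. This will help you figure out why there is a difference.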

Select query became very very very slow in postgresql

I have one table which contains "133,072,194" records and I am trying to execute
SELECT COUNT(test)
FROM mytable
WHERE test = false
but it takes very long (Execution time: 128320.712 ms).
I already have an index on the test column. Could you please let me know what I can optimize or change to make my query faster?
Because of this, my other select query is not working either.
If there are many rows where test is FALSE, you won't be able to get an exact result faster than with a sequential scan, which is slow for big tables.
If you have only a few rows that satisfy the condition, you should create a partial index:
CREATE INDEX mytable_notest_ind ON mytable(id) WHERE NOT test;
(assuming that id is the primary key) and keep mytable autovacuumed often enough that you get an index only scan.
But usually exact results for queries like this are not required.
You could calculate an estimated count from the table statistics with a query like this:
SELECT t.reltuples
* (1 - t.nullfrac)
* mcv.freq AS count_false
FROM pg_stats AS s
CROSS JOIN LATERAL unnest(s.most_common_vals::text::boolean[],
s.most_common_freqs) AS mcv(val, freq)
JOIN pg_class AS t
ON s.tablename = t.relname
AND s.schemaname = t.relnamespace::regnamespace::text
WHERE s.tablename = 'mytable'
AND s.attname = 'test'
AND mcv.val = FALSE;
That would be very fast.
See my blog post for more considerations about the speed of SELECT count(*).
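The partial-index suggestion above can be tried out on a small scale. The following is only a sketch using Python's built-in sqlite3 module rather than PostgreSQL, with a made-up 1,000-row table in which every 100th row has test set to false (stored as 0). SQLite has no autovacuum-dependent index-only scans, so this shows the shape of the idea, not PostgreSQL's behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, test INTEGER NOT NULL)")

# Hypothetical data: 1000 rows, only every 100th row has test = 0 (false).
conn.executemany(
    "INSERT INTO mytable (id, test) VALUES (?, ?)",
    [(i, 0 if i % 100 == 0 else 1) for i in range(1, 1001)],
)

# Partial index: it only contains entries for the rows where test is false,
# so it stays small when such rows are rare.
conn.execute("CREATE INDEX mytable_notest_ind ON mytable (id) WHERE test = 0")

count_sql = "SELECT count(*) FROM mytable WHERE test = 0"
false_count = conn.execute(count_sql).fetchone()[0]

# Ask SQLite how it would run the count; whether the partial index is
# chosen is up to the planner, so inspect the output rather than assume it.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + count_sql)]

print(false_count)
print(plan)
```

The WHERE clause of the query is written to match the WHERE clause of the index, which is what allows a planner to consider the partial index at all.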

Query row and join many rows as JSON array

I am looking to join three tables via ids, the outcome being three json columns with the content from each.
The issue I am facing is that for each cat_request there are many cat_request_fields, but I am currently getting cat_request_fields as one object and not an array of objects.
This query gets me a result set with cat_requests and cat_request_fields.
SELECT
row_to_json("cat_requests") AS cat_request,
array_agg(row_to_json("cat_request_fields")) AS cat_request_fields
FROM
"cat_requests"
LEFT OUTER JOIN "cat_request_fields" ON "cat_requests"."id" = "cat_request_fields"."cat_request_id"
GROUP BY
"cat_requests"."id"
LIMIT 10;
This query gets me a result set with cats and cat_requests.
SELECT
row_to_json("cat_requests") as cat_request,
row_to_json("cats") as cat
FROM
"cat_requests",
"cats"
WHERE
"cat_requests"."cat_id" = "cats"."id"
LIMIT 1;
I'm looking for a query that will give me a combination of the two...
How can I modify this query so that cat_request_fields is mapped to an array of rows and not just one?
SELECT
row_to_json("cat_requests") AS cat_request,
(select row_to_json("cats".*) as cats from "cats" where "cats"."id" = "cat_requests"."cat_id"),
array_agg(row_to_json("cat_request_fields")) AS cat_request_fields
FROM
"cat_requests"
INNER JOIN "cat_request_fields" ON "cat_requests"."id" = "cat_request_fields"."cat_request_id"
GROUP BY
"cat_requests"."id"
LIMIT 6;

SQLITE : Optimize ORDER BY Query

All,
I am an iOS developer. Currently we have 2.5 lakh (250,000) records stored in the database, and we have implemented search functionality on top of them. Below is the query that we are using.
select CustomerMaster.CustomerName ,CustomerMaster.CustomerNumber,
CallActivityList.CallActivityID,CallActivityList.CustomerID,CallActivityList.UserID,
CallActivityList.ActivityType,CallActivityList.Objective,CallActivityList.Result,
CallActivityList.Comments,CallActivityList.CreatedDate,CallActivityList.UpdateDate,
CallActivityList.CallDate,CallActivityList.OrderID,CallActivityList.SalesPerson,
CallActivityList.GratisProduct,CallActivityList.CallActivityDeviceID,
CallActivityList.IsExported,CallActivityList.isDeleted,CallActivityList.TerritoryID,
CallActivityList.TerritoryName,CallActivityList.Hours,UserMaster.UserName,
(FirstName ||' '||LastName) as UserNameFull,UserMaster.TerritoryID as UserTerritory
from
CallActivityList
inner join CustomerMaster
ON CustomerMaster.DeviceCustomerID = CallActivityList.CustomerID
inner Join UserMaster
On UserMaster.UserID = CallActivityList.UserID
where
(CustomerMaster.CustomerName like '%T%' or
CustomerMaster.CustomerNumber like '%T%' or
CallActivityList.ActivityType like '%T%' or
CallActivityList.TerritoryName like '%T%' or
CallActivityList.SalesPerson like '%T%' )
and CallActivityList.IsExported!='2' and CallActivityList.isDeleted != '1'
order by
CustomerMaster.CustomerName
limit 50 offset 0
Without ORDER BY, the query returns its result in 0.5 seconds. But when I add ORDER BY, the time increases to 2 seconds.
I have tried indexing, but it does not make any noticeable difference. Can anyone please help? If this cannot be solved in the query itself, how else can we make it fast?
Thanks in advance.
This is due to the limit. Without ORDER BY, the database can stop as soon as it has found 50 matching records, and any 50 will do. With ORDER BY, all matching records have to be processed in order to determine which ones are the first 50 (in order).
The problem is that the ORDER BY is performed on a joined table. Otherwise you could apply the limit on the main table (I assume it is CallActivityList) first and then join.
SELECT ...
FROM
(SELECT ... FROM CallActivityList ORDER BY ... LIMIT 50 OFFSET 0) AS CAL
INNER JOIN CustomerMaster ON ...
INNER JOIN UserMaster ON ...
ORDER BY ...
This would reduce the costs for joining the tables. If this is not possible, try at least to join CallActivityList with CustomerMaster. Apply the limit to those and finally join with UserMaster.
SELECT ...
FROM
(SELECT ...
FROM
CallActivityList
INNER JOIN CustomerMaster ON ...
ORDER BY CustomerMaster.CustomerName
LIMIT 50 OFFSET 0) AS ActCust
INNER JOIN UserMaster ON ...
ORDER BY ...
Also, in order to make the ordering unambiguous, I would include more columns in the ORDER BY, like call date and call id. Otherwise this could result in inconsistent paging.
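The second rewrite (join CallActivityList with CustomerMaster, apply the limit, then join UserMaster) can be sketched with Python's built-in sqlite3 module. The tables are cut down to a handful of columns and rows invented for the demo, the search filters are left out, and the tie-breaker columns follow the suggestion about call date and call id. The rewrite is only equivalent if every activity row has a matching UserMaster row, which is an assumption here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerMaster (DeviceCustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE UserMaster (UserID INTEGER PRIMARY KEY, UserName TEXT);
CREATE TABLE CallActivityList (
    CallActivityID INTEGER PRIMARY KEY,
    CustomerID INTEGER,
    UserID INTEGER,
    CallDate TEXT
);
-- Made-up sample data; two customers share the name 'Tara' on purpose.
INSERT INTO CustomerMaster VALUES (1, 'Tara'), (2, 'Tim'), (3, 'Tara');
INSERT INTO UserMaster VALUES (10, 'alice'), (11, 'bob');
INSERT INTO CallActivityList VALUES
    (100, 2, 10, '2024-01-03'),
    (101, 1, 11, '2024-01-02'),
    (102, 3, 10, '2024-01-02'),
    (103, 1, 10, '2024-01-01');
""")

PAGE_SIZE = 3

# Original shape: join everything, then sort and limit.
direct = conn.execute("""
    SELECT cal.CallActivityID, cm.CustomerName, um.UserName
    FROM CallActivityList AS cal
    INNER JOIN CustomerMaster AS cm ON cm.DeviceCustomerID = cal.CustomerID
    INNER JOIN UserMaster AS um ON um.UserID = cal.UserID
    ORDER BY cm.CustomerName, cal.CallDate, cal.CallActivityID
    LIMIT ? OFFSET 0
""", (PAGE_SIZE,)).fetchall()

# Rewritten shape: sort and limit the two-table join first, so UserMaster
# is joined against at most PAGE_SIZE rows.
limited_first = conn.execute("""
    SELECT ac.CallActivityID, ac.CustomerName, um.UserName
    FROM (SELECT cal.CallActivityID, cal.UserID, cal.CallDate, cm.CustomerName
          FROM CallActivityList AS cal
          INNER JOIN CustomerMaster AS cm ON cm.DeviceCustomerID = cal.CustomerID
          ORDER BY cm.CustomerName, cal.CallDate, cal.CallActivityID
          LIMIT ? OFFSET 0) AS ac
    INNER JOIN UserMaster AS um ON um.UserID = ac.UserID
    ORDER BY ac.CustomerName, ac.CallDate, ac.CallActivityID
""", (PAGE_SIZE,)).fetchall()

print(direct)
print(limited_first)
```

Because CustomerName alone does not identify a row (two customers are called 'Tara' in the sample data), the extra ORDER BY columns are what make both queries return the same page every time.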

Sql Server Union Query Optimization

I have been given the task of optimizing the SQL query below. Currently the query is timing out and causing a lot of blocking. I just started using T-SQL, so please help me with optimizing the query.
select ExcludedID
from OfferConditions with (NoLock)
where OfferID = 27251
and ExcludedID in (210,223,409,423,447,480,633,...lots and lots of these...,
13346,13362,13380,13396,13407,1,2)
union
select CustomerGroupID as ExcludedID
from CPE_IncentiveCustomerGroups ICG with (NoLock)
inner join CPE_RewardOptions RO with (NoLock)
on RO.RewardOptionID = ICG.RewardOptionID
where RO.IncentiveID = 27251
AND ICG.Deleted = 0 and RO.Deleted = 0
and ExcludedUsers = 1
and CustomerGroupID in (210,223,409,423,447,480,633,...lots and lots of these...,
13346,13362,13380,13396,13407,1,2);
You can try to insert those IDs into a temp table and join against it instead of using the IN list.
The key to solving your problem is NOT to fix the SQL, but to fix the indexes on your tables. For example, you should have a compound index on the OfferConditions table with OfferID and ExcludedID.
When you create the indexes on the other tables, remember that if a field is in the WHERE clause or in the join condition, it should be part of your compound index.
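Both suggestions, the temp table and the compound index, can be combined in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the syntax differs from T-SQL (where the temp table would typically be a #table). The index name, the WantedIDs table, the sample rows and the three-element ID list are all invented for illustration, and only the first half of the UNION is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OfferConditions (OfferID INTEGER, ExcludedID INTEGER)")

# Compound index covering both the filter column and the lookup column.
conn.execute("CREATE INDEX ix_offer_excluded ON OfferConditions (OfferID, ExcludedID)")

conn.executemany(
    "INSERT INTO OfferConditions VALUES (?, ?)",
    [(27251, 210), (27251, 223), (27251, 999), (11111, 210)],
)

wanted_ids = [210, 223, 409]  # stand-in for the long literal ID list

# Load the IDs into a temp table once instead of repeating them inline.
conn.execute("CREATE TEMP TABLE WantedIDs (ID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO WantedIDs (ID) VALUES (?)", [(i,) for i in wanted_ids])

# Join against the temp table.
joined = conn.execute("""
    SELECT oc.ExcludedID
    FROM OfferConditions AS oc
    INNER JOIN WantedIDs AS w ON w.ID = oc.ExcludedID
    WHERE oc.OfferID = ?
    ORDER BY oc.ExcludedID
""", (27251,)).fetchall()

# The original IN-list form, for comparison.
placeholders = ", ".join("?" for _ in wanted_ids)
in_list = conn.execute(
    "SELECT ExcludedID FROM OfferConditions "
    f"WHERE OfferID = ? AND ExcludedID IN ({placeholders}) "
    "ORDER BY ExcludedID",
    [27251, *wanted_ids],
).fetchall()

print(joined)
print(in_list)
```

The two forms return the same rows. Whether the join actually helps on SQL Server depends on the real data and indexes, so it is worth comparing the execution plans there before committing to it.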