I have a simple Postgres Table. A simple query to count total records takes ages. I have 7.5 millions records in table, I using 8 vCPUs, 32 GB memory machine. Database is in same machine.
Edit: add query.
Following query is very slow:
SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000
Output of explain
$ explain SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000
---------------------------------------------------------------------------------------------------------
Limit (cost=5.42..49915.17 rows=10000 width=1985)
-> Index Scan using import_csv_id_idx on import_csv (cost=0.43..19144730.02 rows=3835870 width=1985)
Filter: (NOT processed)
(3 rows)
My table is as below:
Column | Type | Collation | Nullable | Default
-------------------+----------------+-----------+----------+---------
id | integer | | |
name | character(500) | | |
domain | character(500) | | |
year_founded | real | | |
industry | character(500) | | |
size_range | character(500) | | |
locality | character(500) | | |
country | character(500) | | |
linkedinurl | character(500) | | |
employees | integer | | |
processed | boolean | | not null | false
employee_estimate | integer | | |
Indexes:
"import_csv_id_idx" btree (id)
"processed_idx" btree (processed)
Thank you
Edit 3:
# explain analyze SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=5.42..49915.33 rows=10000 width=1985) (actual time=8331.070..8355.556 rows=10000 loops=1)
-> Index Scan using import_csv_id_idx on import_csv (cost=0.43..19144790.06 rows=3835870 width=1985) (actual time=8331.067..8354.874 rows=10001 loops=1)
Filter: (NOT processed)
Rows Removed by Filter: 3482252
Planning time: 0.081 ms
Execution time: 8355.925 ms
(6 rows)
explain (analyze, buffers)
# explain (analyze, buffers) SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=5.42..49915.33 rows=10000 width=1985) (actual time=8236.899..8260.941 rows=10000 loops=1)
Buffers: shared hit=724036 read=2187905 dirtied=17 written=35
-> Index Scan using import_csv_id_idx on import_csv (cost=0.43..19144790.06 rows=3835870 width=1985) (actual time=8236.896..8260.104 rows=10001 loops=1)
Filter: (NOT processed)
Rows Removed by Filter: 3482252
Buffers: shared hit=724036 read=2187905 dirtied=17 written=35
Planning time: 0.386 ms
Execution time: 8261.406 ms
(8 rows)
It is slow because it has to dig through 3482252 rows which fail the processed = False criterion before finding the 10001st on which passes, and apparently all those failing rows are scattered randomly about the table leading to a lot of slow IO.
You either need an index on (processed, id), or on (id) where processed = false
If you do the first of these, you can drop the index on processed alone, as it would no longer be independently useful (if it ever were to start with).
Related
There are two large table num,pre in database :
\d num
Table "public.num"
Column | Type | Collation | Nullable | Default
----------+-------------------------+-----------+----------+---------------------------------
id | integer | | not null | nextval('num_id_seq'::regclass)
adsh | character varying(20) | | |
tag | character varying(256) | | |
version | character varying(20) | | |
coreg | character varying(256) | | |
ddate | date | | |
qtrs | numeric(8,0) | | |
uom | character varying(20) | | |
value | numeric(28,4) | | |
footnote | character varying(1024) | | |
\d pre
Table "public.pre"
Column | Type | Collation | Nullable | Default
----------+------------------------+-----------+----------+---------------------------------
id | integer | | not null | nextval('pre_id_seq'::regclass)
adsh | character varying(20) | | |
report | numeric(6,0) | | |
line | numeric(6,0) | | |
stmt | character varying(2) | | |
inpth | boolean | | |
rfile | character(1) | | |
tag | character varying(256) | | |
version | character varying(20) | | |
plabel | character varying(512) | | |
negating | boolean | | |
Check how many records in the tables:
select count(*) from num;
count
----------
83862587
(1 row)
Time: 204945.436 ms (03:24.945)
select count(*) from pre;
count
----------
36738034
(1 row)
Time: 100604.085 ms (01:40.604)
Execute a long query :
explain analyze select tag,uom,qtrs,value,ddate from num
where adsh='0000320193-22-000108' and tag in
(select tag from pre where stmt='IS' and
adsh='0000320193-22-000108') and ddate='2022-09-30';
It cost almost 7 minutes 30 seconds.
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Semi Join (cost=2000.00..2909871.29 rows=2 width=59) (actual time=357717.922..450523.035 rows=45 loops=1)
Join Filter: ((num.tag)::text = (pre.tag)::text)
Rows Removed by Join Filter: 61320
-> Gather (cost=1000.00..1984125.01 rows=32 width=59) (actual time=190.355..92987.731 rows=678 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on num (cost=0.00..1983121.81 rows=13 width=59) (actual time=348.753..331304.671 rows=226 loops=3)
Filter: (((adsh)::text = '0000320193-22-000108'::text) AND (ddate = '2022-09-30'::date))
Rows Removed by Filter: 27953970
-> Materialize (cost=1000.00..925725.74 rows=43 width=33) (actual time=0.097..527.331 rows=91 loops=678)
-> Gather (cost=1000.00..925725.53 rows=43 width=33) (actual time=65.880..357527.133 rows=96 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on pre (cost=0.00..924721.22 rows=18 width=33) (actual time=201.713..357490.037 rows=32 loops=3)
Filter: (((adsh)::text = '0000320193-22-000108'::text) AND ((stmt)::text = 'IS'::text))
Rows Removed by Filter: 12245979
Planning Time: 0.632 ms
JIT:
Functions: 27
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 5.870 ms, Inlining 272.489 ms, Optimization 367.828 ms, Emission 213.288 ms, Total 859.474 ms
Execution Time: 450524.974 ms
(22 rows)
Time: 450526.084 ms (07:30.526)
How can optimize the database to reduce time on query?Add index and close something (the db running on my local pc without any other users)?
Its performing a full table scan on those columns, as you see from the analysis that says "Parallel Seq Scan" (ie sequential scan) on num and pre.
this means it is looking at every row to check if it should be used.
To speed it up, massively, you need to create an index on the columns used in the where clauses (pre.stmt, pre.adsh, num.adsh and num.ddate). Then the query will use the index to decide which rows to include, and as indexes are specially organised for this task, the performance will increase.
adding a CTE would greatly improve your query :
WITH t AS (
SELECT tag FROM tag WHERE stmt= 'yy' AND adsh = 'xxxx'
)
select tag,uom,qtrs,value,ddate
from num n
JOIN t ON t.tag = q.tag
where adsh='xxxx' and ddate='2022-09-30';
Also you can specialize one of the used columns if the queries used most frequently are very known :
CREATE INDEX idx_num_adshxxxx ON num(adsh) WHERE adsh='xxxx';
This would create a very fast index for just a small portion of the table.
Is important to notice that a indexes have very limited usage for non-preplanned selects, for large tables is often better to create a index scan select and re-query the results from a CTE than affording to load the entire table, as example this is a very common predicate of every day db usage and its problems:
WHERE LOWER(adsh) = 'xxxx' ;
Now notice how important is for your queries to be matching the correct indexing
This means if you change the column being researched your indexes should match or they doenst get used, this is same for integercolumn::text = 'x' or date::text = '2019-10-01'
after adding the proper index this get solved:
as much tables start getting bigger less and less random filters can be allowed, table scan will toggle a S.O memory cache , than later the same data will be doubled in the postgresql cache and only later will stabilize .
Unstable postgresql cached queries will reduce the speed of the previously cached queries whenever a new random query is executed if the cached memory of the server is already on the limit.
The subselect in the where clause needs to be rewritten into a table join which will be much faster.The DISTINCT may not be necessary, but I don't know the cardinality of your data, so I included it.
select DISTINCT
num.tag
, num.uom
, num.qtrs
, num.value
, num.ddate
from num
inner join pre
on num.adsh = pre.adsh
and num.adsh = 'xxxx'
and pre.stmt = 'yy'
and num.ddate = '2022-09-30'
I have a bigint column which is a 19 digit CAS, the first 13 digits of which are a unix timestamp. I need to extract the first 13 digits.
What is the best way to extract the first n digits from the bigint.
Both of these work:
select
cast(left(cast(1641394918760185856 as varchar), 13) as bigint) as withcasting,
1641394918760185856/1000000 as integerdiv;
OUTPUT:1641394918760, 1641394918760
Is there an obviously better way?
Which one of these is better as far as performance?
Which one if any is the canonical way to do it.
Intuitively, I'd think integer division is more performant because it's one operation.
But I like CAS because it's very simple (doesn't require figuring out the divisor) and expresses most clearly what I'm looking to do (extract the leading 13 digits) and also lends itself to a generalized UDF abstraction (LEFT_DIGITS(number,n)).
So I guess, assuming that it is indeed less performant, the question is really how much less performant?
Actually this way is simpler, you really don't need the left function , just cast to n number of character that you need:
select cast(1641394918760185856 as varchar(3))
well, my initial assumption was not right since actually you hav to cast it back to bigint , after looking at the execution plans, you can see actually arithmetic operation is done much easier ( with less memory usage) :
select 1641394918760185856/1000000 as ii
| QUERY PLAN |
| :--------------------------------------------------------------------------------- |
| Result (cost=0.00..0.01 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=1) |
| Output: '1641394918760'::bigint |
| Planning Time: 0.021 ms |
| Execution Time: 0.007 ms |
select cast(cast(1641394918760185856 as varchar(13)) as bigint) as ii
| QUERY PLAN |
| :--------------------------------------------------------------------------------- |
| Result (cost=0.00..0.01 rows=1 width=8) (actual time=0.000..0.001 rows=1 loops=1) |
| Output: '1641394918760'::bigint |
| Planning: |
| Buffers: shared hit=12 read=3 |
| Planning Time: 0.097 ms |
| Execution Time: 0.005 ms |
db<>fiddle here
I have a read-only table with 80 million rows :
Column | Type | Modifiers | Storage | Stats target | Description
-------------+------------------------+-----------+----------+--------------+-------------
id | character(11) | not null | extended | |
gender | character(1) | | extended | |
postal_code | character varying(10) | | extended | |
operator | character varying(5) | | extended | |
Indexes:
"categorised_phones_pkey" PRIMARY KEY, btree (id)
"operator_idx" btree (operator)
"postal_code_trgm_idx" gin (postal_code gin_trgm_ops)
id is Primary Key and contains unique mobile numbers. Table rows looks like this:
id | gender | postal_code | operator
----------------+--------------+----------------+------------
09567849087 | m | 7414776788 | mtn
09565649846 | f | 1268398732 | mci
09568831245 | f | 7412556443 | mtn
09469774390 | m | 5488312790 | mci
This query takes almost ~65 seconds for the first time and ~8 seconds for next times:
select operator,count(*) from categorised_phones where postal_code like '1%' group by operator;
And the output looks like this:
operator | count
----------+---------
mci | 4050314
mtn | 6235778
And the output of explain alanyze :
HashAggregate (cost=1364980.61..1364980.63 rows=2 width=10) (actual time=8257.026..8257.026 rows=2 loops=1)
Group Key: operator
-> Bitmap Heap Scan on categorised_phones (cost=95969.17..1312915.34 rows=10413054 width=2) (actual time=1140.803..6332.534 rows=10286092 loops=1)
Recheck Cond: ((postal_code)::text ~~ '1%'::text)
Rows Removed by Index Recheck: 25105697
Heap Blocks: exact=50449 lossy=237243
-> Bitmap Index Scan on postal_code_trgm_idx (cost=0.00..93365.90 rows=10413054 width=0) (actual time=1129.270..1129.270 rows=10287127 loops=1)
Index Cond: ((postal_code)::text ~~ '1%'::text)
Planning time: 0.540 ms
Execution time: 8257.392 ms
How can I make this query faster?
Any idea would be great appreciated.
P.S:
I'm using PostgreSQL 9.6.1
UPDATE
I just updated the question. I disabled Parallel Query and results changed.
For queries that involve comparisons of the form LIKE '%start', and following PostgreSQL own advice, you can use the following index:
CREATE INDEX postal_code_idx ON categorised_phones (postal_code varchar_pattern_ops) ;
With that index in place, and some simulated data, your execution plan could very likely look like:
| QUERY PLAN |
| :------------------------------------------------------------------------------------------------------------------------------------- |
| HashAggregate (cost=2368.65..2368.67 rows=2 width=12) (actual time=18.093..18.094 rows=2 loops=1) |
| Group Key: operator |
| -> Bitmap Heap Scan on categorised_phones (cost=536.79..2265.83 rows=20564 width=4) (actual time=2.564..12.061 rows=22171 loops=1) |
| Filter: ((postal_code)::text ~~ '1%'::text) |
| Heap Blocks: exact=1455 |
| -> Bitmap Index Scan on postal_code_idx (cost=0.00..531.65 rows=21923 width=0) (actual time=2.386..2.386 rows=22171 loops=1) |
| Index Cond: (((postal_code)::text ~>=~ '1'::text) AND ((postal_code)::text ~<~ '2'::text)) |
| Planning time: 0.119 ms |
| Execution time: 18.122 ms |
You can check it at dbfiddle here
If you have both queries with LIKE 'start%' and LIKE '%middle%', you should add this index, but keep the one already in place. Trigram indexes might prove useful with this second kind of match.
Why?
From PostgreSQL documentation on operator classes:
The operator classes text_pattern_ops, varchar_pattern_ops, and bpchar_pattern_ops support B-tree indexes on the types text, varchar, and char respectively. The difference from the default operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions (LIKE or POSIX regular expressions) when the database does not use the standard "C" locale.
From PostgreSQL documentation on Index Types
The optimizer can also use a B-tree index for queries involving the pattern matching operators LIKE and ~ if the pattern is a constant and is anchored to the beginning of the string — for example, col LIKE 'foo%' or col ~ '^foo', but not col LIKE '%bar'. However, if your database does not use the C locale you will need to create the index with a special operator class to support indexing of pattern-matching queries; see Section 11.9 below. It is also possible to use B-tree indexes for ILIKE and ~*, but only if the pattern starts with non-alphabetic characters, i.e., characters that are not affected by upper/lower case conversion.
UPDATE
If the queries performed involved always a fix (and relatively small) number of LIKE 'x%' expressions, consider using partial indexes.
For instance, for LIKE '1%', you'd have the following index, and the following query plan (it shows about a 3x improvement):
CREATE INDEX idx_1 ON categorised_phones (operator) WHERE postal_code LIKE '1%';
VACUUM categorised_phones ;
| QUERY PLAN |
| :-------------------------------------------------------------------------------------------------------------------------------------------- |
| GroupAggregate (cost=0.29..658.74 rows=3 width=12) (actual time=3.235..6.493 rows=2 loops=1) |
| Group Key: operator |
| -> Index Only Scan using idx_1 on categorised_phones (cost=0.29..554.10 rows=20921 width=4) (actual time=0.028..3.266 rows=22290 loops=1) |
| Heap Fetches: 0 |
| Planning time: 0.293 ms |
| Execution time: 6.517 ms |
Overview
I've been working on Netdot trying to speed up some of the queries. Some of them can benefit from SQL changes because of unneeded joins or broad searches, but some have proven harder to track down.
In this particular case I've got two tables. fwtableentry has 132,233,684 rows. fwtable has 2,178,088 rows.
hardware / software versions
This is a virtual machine running debian_version 7.5 (wheezy). The disks are on a SAN with raid 0+1. The machine has 4GB of ram allocated to it. I/O and memory don't appear to be an issue but I can allocate more resources if need be.
Linux netdot 3.2.0-4-amd64 #1 SMP Debian 3.2.57-3+deb7u2 x86_64 GNU/Linux
Running Postgres 9.1.13-0wheezy1 (debian package version)
Sysctl Options
kernel.shmmax = 1500000000
vm.overcommit_memory = 2
Postgres Options
I started with the defaults. I've now modified them referencing another server that I had previously tuned according to one of the postgres tuning docs. The changes don't seem to help for the slow queries but might be helping for other things.
shared_buffers = 1GB # min 128kB
work_mem = 64MB # min 64kB
maintenance_work_mem = 256MB # min 1MB
wal_buffers = 16MB # min 32kB, -1 sets based on shared_buffers
checkpoint_segments = 32 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 10min # range 30s-1h
checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0
random_page_cost = 2.5 # same scale as above
effective_cache_size = 2GB
Explain showing the problem
netdot=# explain analyze SELECT ft.tstamp
FROM fwtableentry fte, fwtable ft
WHERE fte.physaddr=9115
AND fte.fwtable=ft.id
GROUP BY ft.tstamp
ORDER BY ft.tstamp DESC
LIMIT 10
;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=53610.80..53610.82 rows=10 width=8) (actual time=27436.502..27436.631 rows=10 loops=1)
-> Sort (cost=53610.80..53617.92 rows=2849 width=8) (actual time=27436.220..27436.258 rows=10 loops=1)
Sort Key: ft.tstamp
Sort Method: top-N heapsort Memory: 25kB
-> HashAggregate (cost=53520.74..53549.23 rows=2849 width=8) (actual time=27417.749..27425.805 rows=2876 loops=1)
-> Nested Loop (cost=125.79..53500.91 rows=7933 width=8) (actual time=98.801..27367.988 rows=3562 loops=1)
-> Bitmap Heap Scan on fwtableentry fte (cost=125.79..18909.68 rows=7933 width=8) (actual time=97.718..26942.693 rows=3562 loops=1)
Recheck Cond: (physaddr = 9115)
-> Bitmap Index Scan on "FWTableEntry3" (cost=0.00..123.81 rows=7933 width=0) (actual time=86.433..86.433 rows=3562 loops=1)
Index Cond: (physaddr = 9115)
-> Index Scan using pk_fwtable on fwtable ft (cost=0.00..4.35 rows=1 width=16) (actual time=0.069..0.077 rows=1 loops=3562)
Index Cond: (id = fte.fwtable)
Total runtime: 27449.802 ms
Here are the two tables
fwtable
netdot=# \d fwtable
Table "public.fwtable"
Column | Type | Modifiers
--------+-----------------------------+---------------------------------------------------------------------
device | bigint | not null
id | bigint | not null default nextval('fwtable_id_seq'::regclass)
tstamp | timestamp without time zone | not null default '1970-01-02 00:00:01'::timestamp without time zone
Indexes:
"pk_fwtable" PRIMARY KEY, btree (id)
"fwtable1" UNIQUE CONSTRAINT, btree (device, tstamp)
"FWTable2" btree (device)
"FWTable3" btree (tstamp)
Foreign-key constraints:
"fk_device" FOREIGN KEY (device) REFERENCES device(id) DEFERRABLE
Referenced by:
TABLE "fwtableentry" CONSTRAINT "fk_fwtable" FOREIGN KEY (fwtable) REFERENCES fwtable(id) DEFERRABLE
fwtableentry
netdot=# \d fwtableentry
Table "public.fwtableentry"
Column | Type | Modifiers
-----------+--------+-----------------------------------------------------------
fwtable | bigint | not null
id | bigint | not null default nextval('fwtableentry_id_seq'::regclass)
interface | bigint | not null
physaddr | bigint | not null
Indexes:
"pk_fwtableentry" PRIMARY KEY, btree (id)
"FWTableEntry1" btree (fwtable)
"FWTableEntry2" btree (interface)
"FWTableEntry3" btree (physaddr)
Foreign-key constraints:
"fk_fwtable" FOREIGN KEY (fwtable) REFERENCES fwtable(id) DEFERRABLE
"fk_interface" FOREIGN KEY (interface) REFERENCES interface(id) DEFERRABLE
"fk_physaddr" FOREIGN KEY (physaddr) REFERENCES physaddr(id) DEFERRABLE
Here is a sample of the two tables
first fwtableentry:
fwtable | id | interface | physaddr
---------+-----------+-----------+----------
675157 | 39733332 | 29577 | 9115
674352 | 39686929 | 29577 | 9115
344 | 19298 | 29577 | 9115
1198 | 68328 | 29577 | 9115
1542 | 88107 | 29577 | 9115
675960 | 39779466 | 29577 | 9115
675750 | 39766468 | 39946 | 9115
2994 | 168721 | 29577 | 9115
3895 | 218228 | 29577 | 9115
4795 | 267949 | 29577 | 9115
5695 | 324905 | 29577 | 9115
674944 | 39720652 | 39946 | 9115
6595 | 375149 | 29577 | 9115
7501 | 425045 | 29577 | 9115
8400 | 475265 | 29577 | 9115
9298 | 524985 | 29577 | 9115
10200 | 575136 | 29577 | 9115
11104 | 626065 | 29577 | 9115
12011 | 677963 | 29577 | 9115
676580 | 39814792 | 39946 | 9115
12914 | 731390 | 29577 | 9115
677568 | 39871297 | 29577 | 9115
13821 | 784435 | 29577 | 9115
676760 | 39825496 | 29577 | 9115
fwtable (minus the device column):
id | tstamp
---------+---------------------
2178063 | 2014-06-10 17:00:13
2177442 | 2014-06-10 16:00:06
2176816 | 2014-06-10 15:00:07
2176190 | 2014-06-10 14:00:09
2175566 | 2014-06-10 13:00:07
2174941 | 2014-06-10 12:00:07
2174316 | 2014-06-10 11:00:07
2173689 | 2014-06-10 10:00:06
2173065 | 2014-06-10 09:00:06
2172444 | 2014-06-10 08:00:06
(10 rows)
Problem as far as I can tell
So, the problem is you need to know what ids to send to fwtable, but you can't know which ones match the 10 latest timestamps so you need to send them all, then let the index on fwtable determine which ones to throw away.
netdot=# select count(*) from fwtableentry where physaddr = 9115;
count
-------
3562
(1 row)
This is also the reason that, once cached, the query is fast. The joined datasets aren't huge so once it has an idea of what to do it can cache everything needed.
You might ask why not just pick the latest 10 timestamps and match against those, but the issue is that those timestamps might not have any results with that physaddr, so you need to check the results of the join.
Rewriting the query
select ft.tstamp from fwtable ft where ft.id in (select fwtable from fwtableentry where physaddr = 9115) order by ft.tstamp desc limit 10;
Still gives the same query plan but makes it easier for me to visualize the problems. Actually, ordering the query this way forces using of the plan even if you drop the sort.
You would think an index on fwtable (id, tstamp DESC) might help but it doesn't seem to get used. I can see how any index would be confused since it's taking a bunch of results from all over the place.
I thought it might help to tell the database that the relationship between id and tstamp was 1:1, so I added a unique index for the two. It didn't.
Dropping the limit doesn't affect the plan. It's only the sort that kills the performance.
Short of a materialized view with the three needed columns (which is impractical due to table size I think..) I'm not sure of a way to resolve this, but I might just not be SQL smart enough to realize the real problem.
Final notes
You can drop the GROUP BY in the first query, it's unneeded. I've dropped off a few tables from the joins that weren't needed for this example. They don't affect query speed as much as this issue and aren't needed in general so I'll probably rewrite the code to permanently leave them out.
I should also mention that drastic changes to the schema aren't really something I can do. I may be able to propose it as a last ditch measure if it is the only solution to the problem, but Netdot isn't my program so I don't know how much I can change to fix something that may bother me more than other people.
If you can constrain the query by timestamp too, then you might find an index on (tstamp, physaddr) useful*. It's a question of whether a query of the "top 10 in the last 30 days" is acceptable. As it stands I don't think there's anything smarter the planner could do, there's no reason for it to expect physaddr values to appear anywhere in particular.
* or perhaps (physaddr,tstamp) - it will depend upon the distribution of values.
Try rewiting into an exists(...) instead of an IN(...) Plus remove the (unneeded) GROUP BY; this may avoid aggregation
SELECT ft.tstamp
FROM fwtable ft
WHERE EXISTS(
SELECT *
FROM fwtableentry fte
WHERE fte.physaddr=9115
AND fte.fwtable=ft.id
)
-- GROUP BY ft.tstamp -- you don't need this
ORDER BY ft.tstamp DESC
LIMIT 10 -- LIMIT could kill your performance...
;
I searched for my problem a bit, but couldn't find a solution.
I run a PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit and my Query is pretty simple.
EXPLAIN (ANALYZE) SELECT CUSTOMER, PRICE, BUYDATE FROM dbo.Invoice WHERE CUSTOMER = 11111111 AND BUYDATE BETWEEN '2012-11-01' AND '2013-10-31';
Output:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on Invoice (cost=88193.54..152981.03 rows=20699 width=14) (actual time=987.205..987.242 rows=36 loops=1)
Recheck Cond: ((CUSTOMER = 11111111) AND (BUYDATE >= '2012-11-01'::date) AND (BUYDATE <= '2013-10-31'::date))
-> BitmapAnd (cost=88193.54..88193.54 rows=20699 width=0) (actual time=987.189..987.189 rows=0 loops=1)
-> Bitmap Index Scan on ix_Invoice (cost=0.00..1375.69 rows=74375 width=0) (actual time=0.043..0.043 rows=40 loops=1)
Index Cond: (CUSTOMER = 11111111)
-> Bitmap Index Scan on ix_Invoice3 (cost=0.00..86807.24 rows=4139736 width=0) (actual time=986.562..986.562 rows=4153999 loops=1)
Index Cond: ((BUYDATE >= '2012-11-01'::date) AND (BUYDATE <= '2013-10-31'::date))
Total runtime: 987.294 ms
(8 rows)
The Table Structure:
Column | Type | Modifiers | Storage | Stats target | Description
-----------------+---------------------------+-----------+----------+--------------+-------------
profitcenter | character varying(5) | not null | extended | |
invoicenumber | character varying(10) | not null | extended | |
invoiceposition | character varying(6) | not null | extended | |
buydate | date | not null | plain | |
customer | integer | | plain | |
nettowert | numeric(18,2) | | main | |
Indexes:
"filialbons_key" PRIMARY KEY, btree (profitcenter, invoicenumber, invoiceposition, buydate)
"ix_Invoice" btree (customer)
"ix_Invoice2" btree (invoicenumber)
"ix_Invoice3" btree (buydate)
"ix_Invoice4" btree (articlenumber)
Has OIDs: no
Example Output from the Query:
customer | price | buydate
--------------+-----------+----
11111111 | 8.32 | 2013-02-06
11111111 | 5.82 | 2013-02-06
11111111 | 16.64 | 2013-02-06
I ran the same Query on a MSSQL 2010? Express with the Date Column as varchar() and it was much faster.
Thanks for your help
with index on (customer, buydate) query should work much faster.
you may try to help planner to chose better plan by collecting more statistic data:
ALTER TABLE Invoice ALTER COLUMN customer SET STATISTICS 1000;
ALTER TABLE Invoice ALTER COLUMN buydate SET STATISTICS 1000;
ANALYZE Invoice;