How to reduce index scan by avoid scaning null values - tsql

I am using SQL Server 2008 R2 and I want to add a non-clustered index on a non-unique, nullable field. My index will also have one more column included in order to avoid accessing my clustered index:
CREATE INDEX IX_My_Index
ON MyTable (myBasicField)
INCLUDE (myIncludedField);
In the actual data of myBasicField there will be a lot of NULLs and I was wondering if there is a way I can increase performance by not scanning these NULLs, or prevent NULL values to be stored on my index.
Thanks in advance.

With SQL Server 2008 and newer, you could use a filtered index. See an intro blog post here - syntax would be:
CREATE INDEX IX_My_Index
ON MyTable (myBasicField)
INCLUDE (myIncludedField)
WHERE myBasicField IS NOT NULL;
Any query that contains the same WHERE clause can take advantage of this, and the index will be a lot smaller and thus perform better if you exclude NULL values like this.

You are looking for a filtered index. See:
http://technet.microsoft.com/en-us/library/cc280372.aspx

Related

jsonb data type lookup cost in postgres

This might be an obvious and simple question.
But I read through the jsonb data type documentation, but nowhere it mentions the lookup cost of a key in jsonb data.
For example, let's say I have a table with following schema:
CREATE TABLE A (id character varying (20),
info jsonb);
I want to know how postgres would parse a where query as below:
SELECT * FROM A WHERE info->>'city' = 'portland';
While going through the jsonb field of a row, is the lookup constant time (O(1)) or linear time (checking each key one by one in the row's jsonb dictionary) within that jsonb data dictionary?
My intuition is that it must be constant time (else what's the point of a dictionary style data?) but I can't see it in the official documentation to convince my team.
Any help would be great!
Thanks!
As with any WHERE condition in SQL: if there is no index, the database has to go through all rows of the table to find those that satisfy your condition.
You can either index a specific expression, or you can index the whole json value using a GIN index which then enables Postgres to use the index if any of the supported operators are used.
If you always check for the city, you can create a regular B-Tree index:
create index on a ( (info->>'city') );
If you don't know what you will be looking for, a GIN index might be a better choice:
create index on a using gin (info);
But you will need to change your query to use one of the operators that are supported by a GIN index, e.g. using the contains operator #>
select *
from a
where info #> '{"city": "portland"}::jsonb;
Note that an index lookup is not always the most efficient solution. Sometimes it's faster to simply go through all rows, sometimes the index lookup is faster.
If you want to learn more about indexes in relational database, go through the material here: http://use-the-index-luke.com/

How multiple indexes in postgres work on the same column

I was wondering I'm not really sure how multiple indexes would work on the same column.
So lets say I have an id column and a country column. And on those I have an index on id and another index on id and country. When I do my query plan it looks like its using both those indexes. I was just wondering how that works? Can I force it to use just the id and country index.
Also is it bad practice to do that? When is it a good idea to index the same column multiple times?
It is common to have indexes on both (id) and (country,id), or alternatively (country) and (country,id) if you have queries that benefit from each of them. You might also have (id) and (id, country) if you want the "covering" index on (id,country) to support index only scans, but still need the stand along to enforce a unique constraint.
In theory you could just have (id,country) and still use it to enforce uniqueness of id, but PostgreSQL does not support that at this time.
You could also sensibly have different indexes on the same column if you need to support different collations or operator classes.
If you want to force PostgreSQL to not use a particular index to see what happens with it gone, you can drop it in a transactions then roll it back when done:
BEGIN; drop index table_id_country_idx; explain analyze select * from ....; ROLLBACK;

Use of DB2 Indices with several columns

We are trying to locate a performance problem and wondering if an index is being used.
We have a table with a composite key, "ID" and "Version", both integers.
We have a select that tries to find the max of "ID". (This is done via Entity framework if it makes a difference).
Will this use the index or will it do a table scan?
If the ID column is defined as the first part of a multi-column index, then DB2 will use that index to determine the MAX(). It will still probably try to use the index if you did a MAX(VERSION), but if you have a very large table, this may take quite a bit of processing.
You can confirm this using the explain facilities (link is for Linux/Unix/Windows 9.7).

Postgres Indexing?

I am a newbie in postgres. I have a column named host (string varchar2) in a table which has around 20 million rows. How do I use indexing to optimize my search to find particular host. Also, this column will be updated daily do I need to write trigger indexing at particular interval? If yes, how do I do that? (For Records I am using Ruby and Rails 3)
Assuming you're doing exact matches, you should just be able to create the index and leave it:
CREATE INDEX host_index ON table_name (host)
The query optimizer should just use that automatically.
You may wish to specify other options such as the collation to use.
See the PostgreSQL docs for CREATE INDEX for more information.
I'd suggest using BRIN Index since its introduction from PostgreSQL 9.5 rather than the conventional btree index.
For text search, it is recommended that you use GIN or GiST index types.
https://www.postgresql.org/docs/9.5/static/textsearch-indexes.html
Another possibility is that if you were only performing exact matching in the host column, i.e., no inequality comparisons (>, <) and partial matching (like, wildcard) involved, you may consider converting host to a hash integer to speed up the search significantly.

alternative to bitmap index in postgresql

I have a table with hundreds of millions rows with schema like below.
tabe AA {
id integer primay key,
prop0 boolean not null,
prop1 boolean not null,
prop2 smallint not null,
...
}
The each "property" field (prop0, prop1, ...) has a small number of distinct values. And I usually query to find "id" from the given conditions of properties fields. I think Bitmap index is best for this query. But postgresql seems not support bitmap index.
I tried b-tree index on each field but these indexes are not used according to the query explain.
Is there a good alternative way to do this?
(i'm using postgresql 9)
Your real problem is a bad schema design, not the index. The properties should be placed in a different table and your current table should link to that table using a many to many relation.
The BIT datatype might also be of use, just check the manual.
Create a multicolumn index on properties which are always or almost always queried. Or several multicolumn indexes if needed.
The alternative, when you do not query the same properties almost always, is to make a tsvector column with words describing your data, maintained using trigger, for example
prop0=true
prop1=false
prop2=4
would be
'propzero nopropone proptwo4'::tsvector
index it using GIN and then use full text search for searching:
where tsv ## 'popzero & nopropone & proptwo4'::tsquery
An index is only used if it actually speeds up the query which is not necessarily always the case. Especially with smallish tables (say thousands of rows) a full table scan ("seq scan" in the Postgres execution plan) might indeed be a lot faster.
How many rows did the table have when you tried the statement?
How did the query look like? Maybe there are other conditions that prevent the index usage.
Did you analyze the table to have up-to-date statistics?